
Learning Object Affordances Based on Structural Object Representation
Kadir F. Uyanik and Asil Kaan Bozcuoglu

Kadir F. Uyanik 1444405 kadir@ceng.metu.edu.tr
Asil Kaan Bozcuoglu 1773829 asil@kovan.ceng.metu.edu.tr

Abstract: The concept of affordances was first coined by J. J. Gibson to explain how organisms perceive "values" and "meanings" of things in the environment and how this perception is linked to the action possibilities offered to the organism. This concept has recently been used in cognitive robotics because of its emphasis on the interaction between the organism and the environment. Another theory, explaining how object recognition occurs in humans, is the theory of Recognition by Components proposed by I. Biederman. Although these concepts were introduced in psychology, they have influenced the study of autonomous robotics. In the vast majority of robotics studies, raw or low-level sensory data are taken as the perceptual input to the learning system, based on Gibson's claim that the meanings of objects in the environment are directly apparent to the agent acting in it. In this report, we first review these theories and then propose an affordance learner system that utilizes the relevant structural properties of objects to relate its action repertoire to these structural features and to the outcomes of the applied actions. This system may overcome the representational inadequacies seen in most affordance learner systems and yield a higher-level perceptual representation of objects. Finally, we analyze and discuss the preliminary results obtained while testing each sub-module of the system.

I. INTRODUCTION
Recent studies in robotics with special emphasis on autonomous systems have mainly focused on developing systems that mimic human intelligence. With this goal, robotics research has become increasingly interdisciplinary, making use of developments in cognitive science, psychology, ethology, neuroscience and many other scientific and engineering disciplines. In this study, we draw inspiration from the theory of affordances from Ecological Psychology and the theory of recognition-by-components from Cognitive Psychology to develop an autonomous robotic system that is able to perceive, to some extent, the meanings of objects when it is given a chance to interact with them.

A. Affordances Concept
The concept of affordances was introduced by J. J. Gibson [1] to explain how organisms perceive "values" and "meanings" of things in the environment and how this perception is linked to the action possibilities offered to the organism. According to this concept, organisms do not need to recognize an object, work out which actions it allows and make complex inferences over these meanings before interacting with it. For instance, we do not need to recognize an object when we must interact with it immediately. Instead, we look for a set of features of the object that provides enough information to deduce the corresponding affordance for a specific action and desired outcome. This hypothesis is also supported by recent neuroscientific findings. The ventral pathway of the brain has been found to recognize objects, whereas the dorsal pathway is responsible for the perception of possible actions. In particular, the anterior intra-parietal area is the neural basis of action affordances: in experiments, this area is highly active while the organism is trying to grasp an object [4].

In this work, we consider three main features of affordances that can be associated with robots:
• Affordances are relative. An affordance depends neither on the organism alone nor on the environment alone; rather, it emerges from their interaction. For example, the hold-ability affordance of a stone depends not only on the physical features of the stone but also on the physical features of the organism holding it.
• Affordances provide perceptual economy. Perceptual economy refers to the hypothesis that the organism does not need to process all the perceptual information to accomplish a simple task; it is enough to process only the relevant information. In other words, the organism does not attend to all the perceptual features of an entity, but filters and processes only the information needed to perform a specific action that reaches a desired effect. Hence, this feature of affordances provides minimality and low computational cost for perception and action.
• Affordances provide general information with limited interaction. While the debate on how much interaction between the organism and the environment is needed is still ongoing, it is usually assumed that affordances enable an organism to learn whether a chair it sees for the first time would afford sit-ability.

In this study we use the affordance formalization proposed by Sahin et al. [6], as shown in Fig. 1.

Fig. 1. An affordance is an acquired relation between an (entity, behavior) tuple of an agent and an "effect", such that the application of the "behavior" on the "entity" generates that "effect" [6].

B. Recognition by Components Theory
Recognition by Components (RBC) theory, proposed by Irving Biederman [3], argues that humans recognize a novel or unfamiliar object by parsing it into primitive components and then comparing and matching the spatial and structural relations between these parts with those of known objects. The human visual recognition system can be compared to the speech recognition system, since lexical access during speech perception can be successfully modeled as a process of identifying individual auditory primitive elements, the phonemes, from a relatively small set of primitives [5]. RBC asserts that, as in speech recognition, object recognition is realized using visual primitives called geometrical ions, or geons. Objects are segmented, mostly at regions of sharp concavity, and the resulting parts are matched against the best-fitting primitive (out of approximately 30 different geons), as illustrated in Fig. 3. In the end, objects are decomposed into their constituent parts, as shown symbolically in Fig. 2.

Fig. 2. The objects on the right consist of the geons on the left.
Fig. 3. The overall system described in RBC theory.
Fig. 4. a) Experimental framework. b) An example distance image. c) An example amplitude image.
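The formalization of Sahin et al. [6] adopted in Section I-A (Fig. 1) treats an affordance as an (effect, (entity, behavior)) relation. The Python sketch below only illustrates that relation as a data structure, under our own assumptions: the names Affordance, AffordanceMemory, learn and behaviors_for are hypothetical, and entities, behaviors and effects are plain string labels, whereas in the proposed system the entity would be the structural (graph) representation of the object.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Affordance:
    """One acquired relation: applying `behavior` to `entity` yields `effect`."""
    effect: str
    entity: str
    behavior: str


@dataclass
class AffordanceMemory:
    """Hypothetical store of (effect, (entity, behavior)) relations."""
    relations: set = field(default_factory=set)

    def learn(self, entity: str, behavior: str, effect: str) -> None:
        # Record the observed outcome of one interaction as a relation instance.
        self.relations.add(Affordance(effect, entity, behavior))

    def behaviors_for(self, entity: str, desired_effect: str) -> list:
        # Behaviors that have produced the desired effect on this entity so far.
        return sorted(a.behavior for a in self.relations
                      if a.entity == entity and a.effect == desired_effect)


memory = AffordanceMemory()
memory.learn("mug-with-handle", "grasp-handle", "lifted")
memory.learn("mug-with-handle", "push", "moved")
memory.learn("mug-with-handle", "grasp-handle", "lifted")  # repeated observation
memory.learn("mug-without-handle", "grasp-body", "lifted")

print(memory.behaviors_for("mug-with-handle", "lifted"))  # ['grasp-handle']
```

Keeping each relation as an immutable record makes it hashable, so repeated observations of the same interaction collapse into a single entry.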
C. Experimental Framework
A SwissRanger SR4000 time-of-flight range camera is used to acquire 3D range images of the real world (Fig. 4). The SR4000 grabs range images at 176x144 resolution and returns point cloud, distance, amplitude and confidence data.
Yet Another Robot Platform (YARP) is used to access the camera's output over the network. We implemented a YARP driver for the SR4000 camera, which enables us to reach the camera from any computing node in the YARP network.
In order to process the point cloud data, the Robot Operating System (ROS) is used. ROS is very similar to YARP in terms of its distributed architecture and the way its nodes communicate with each other. The main reason we use both of these similar systems at the same time is that ROS has a very useful stack, the Point Cloud Library (PCL), which includes many statistical computing tools. Another reason is that the robot, iCub, mainly relies on YARP. Hence, the robot is controlled via YARP, and range image processing is done with ROS::PCL. MATLAB is used to implement the graph-based calculations of the proposed work.

II. PROPOSED SYSTEM
A. Overview
The system is divided into three main modules (Fig. 4). The first module is the data processing part, which reads a point cloud from the SR4000 or from a file. After getting the data, this module is responsible for background subtraction and for segmenting objects into components. Finally, it passes the component neighborhoods and the geometric features of each component to the next module.
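A minimal pure-Python sketch of the first module's background subtraction and object clustering is given below. It is not the implementation used in this work: it assumes the background is a single known table plane (unit normal and offset, e.g. from calibration) and that objects are separated by gaps larger than a distance threshold. The function names and the parameters plane_tolerance, cluster_tolerance and min_size are hypothetical, and the brute-force neighbor search stands in for the k-d tree based Euclidean cluster extraction that a library such as PCL provides.

```python
import math
from collections import deque


def remove_plane(points, normal, offset, plane_tolerance=0.01):
    """Background subtraction: drop points within plane_tolerance of the plane
    n . p + offset = 0, where `normal` is assumed to be a unit vector."""
    nx, ny, nz = normal
    return [p for p in points
            if abs(nx * p[0] + ny * p[1] + nz * p[2] + offset) > plane_tolerance]


def euclidean_clusters(points, cluster_tolerance=0.02, min_size=3):
    """Region growing: points chained by gaps <= cluster_tolerance share a cluster."""
    unvisited = set(range(len(points)))
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        queue, cluster = deque([seed]), [seed]
        while queue:
            i = queue.popleft()
            # Brute-force O(n^2) neighbor search; a k-d tree would be used at scale.
            near = [j for j in unvisited
                    if math.dist(points[i], points[j]) <= cluster_tolerance]
            for j in near:
                unvisited.remove(j)
                queue.append(j)
                cluster.append(j)
        if len(cluster) >= min_size:
            clusters.append(sorted(cluster))
    return sorted(clusters)


# Toy scene (metres): a table plane at z = 0 with two small objects on it.
table = [(x * 0.05, y * 0.05, 0.0) for x in range(10) for y in range(4)]
mug_a = [(0.00, 0.00, 0.03), (0.01, 0.00, 0.04), (0.00, 0.01, 0.05)]
mug_b = [(0.30, 0.10, 0.03), (0.31, 0.10, 0.04), (0.30, 0.11, 0.05)]

objects = remove_plane(table + mug_a + mug_b, normal=(0.0, 0.0, 1.0), offset=0.0)
clusters = euclidean_clusters(objects)
print(len(clusters))  # 2
```

Each returned cluster is a list of indices into the filtered cloud; in the proposed system, each such object cluster would then be decomposed further into component clusters.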
The Structural Pattern Recognition System is the second module of the system. As its name suggests, it is the main module, responsible for pattern recognition, i.e., object recognition.
After the second module recognizes the object, it passes the features of the recognized object to the final module, the Affordance Deduction System, which will be implemented in the future and is outside the scope of this report.

B. Point Cloud Analysis and Part Decomposition
This section explains the algorithm used to decompose objects into their constituent parts. Contrary to RBC, parts will not be identified (i.e., fitted to primitive shapes), because of the problems caused by self-occlusion and/or environmental occlusion and the computational complexity of the part-fitting process. Instead, objects will be segmented into point cloud clusters belonging to their primitive parts. Spatial and volumetric features are then extracted from these clusters and passed to the graphical-representation module. The overall algorithm is given in Fig. 5. Briefly, the decomposition algorithm takes raw point cloud data as input and outputs object point clusters (Fig. 6) so that the robot can interact with them. The whole process takes about 2-3 seconds on an Intel i7 quad-core PC without any optimization.

Algorithm 1 Pattern Recognition Algorithm
Input: components and neighbor relation of the object
Input: comparison-graph
  Construct an unattributed graph from the components and neighbor relation of the object
  Decide on the geometrical classification of each node
  Make the graph attributed with this classification
  Construct the match-graph of the two attributed graphs
  Find the maximal-clique of the match-graph
  if node-size-of(maximal-clique) <= 1 then
    return false
  end if
  for each matched node's neighborhood relations do
    Calculate the Mahalanobis distance of that neighborhood in the graph
    Compare it with the corresponding distance in comparison-graph
  end for
  if all calculated Mahalanobis distances of the graph are in the safe interval with respect to comparison-graph then
    return true
  else
    return false
  end if
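Algorithm 1 can be sketched in pure Python as follows. This is an illustrative reading under stated assumptions, not the MATLAB implementation used in this work: the k-NN features are hypothetical 2D tuples, each edge carries the Mahalanobis distance between the two component centroids under the pooled within-component covariance (simplified to its diagonal), the match graph is the usual association graph, the largest maximal clique is found by plain Bron-Kerbosch enumeration, and the "safe interval" is taken to be a relative tolerance (the tolerance parameter) around the comparison graph's distance. All function names, the training samples and the toy cup geometry are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def knn_label(feature, training, k=3):
    """Majority shape label among the k nearest (feature, label) training samples."""
    nearest = sorted(training, key=lambda s: math.dist(feature, s[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def centroid(points):
    return tuple(sum(p[d] for p in points) / len(points) for d in range(3))


def edge_distance(pts_u, pts_v):
    """Mahalanobis distance between two component centroids under the pooled
    within-component covariance, simplified to its diagonal (assumption)."""
    mu_u, mu_v = centroid(pts_u), centroid(pts_v)
    n = len(pts_u) + len(pts_v)
    var = [(sum((p[d] - mu_u[d]) ** 2 for p in pts_u)
            + sum((p[d] - mu_v[d]) ** 2 for p in pts_v)) / n + 1e-9
           for d in range(3)]
    return math.sqrt(sum((mu_u[d] - mu_v[d]) ** 2 / var[d] for d in range(3)))


def attributed_graph(components, neighbors, training):
    """Nodes carry a k-NN shape label; edges carry a centroid distance."""
    labels = {name: knn_label(c["features"], training)
              for name, c in components.items()}
    edges = {frozenset((u, v)): edge_distance(components[u]["points"],
                                              components[v]["points"])
             for u, v in neighbors}
    return labels, edges


def match_graph(g1, g2):
    """Association graph: nodes pair equally labelled nodes of g1 and g2;
    two pairs are linked when their adjacency in g1 and g2 agrees."""
    (labels1, edges1), (labels2, edges2) = g1, g2
    nodes = [(a, b) for a in labels1 for b in labels2 if labels1[a] == labels2[b]]
    adj = {n: set() for n in nodes}
    for (a1, b1), (a2, b2) in combinations(nodes, 2):
        if a1 != a2 and b1 != b2 and (
                (frozenset((a1, a2)) in edges1) == (frozenset((b1, b2)) in edges2)):
            adj[(a1, b1)].add((a2, b2))
            adj[(a2, b2)].add((a1, b1))
    return adj


def largest_clique(adj):
    """Largest of the maximal cliques, by plain Bron-Kerbosch enumeration."""
    best = set()

    def expand(r, p, x):
        nonlocal best
        if not p and not x and len(r) > len(best):
            best = r
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p, x = p - {v}, x | {v}

    expand(set(), set(adj), set())
    return best


def similar(g_new, g_ref, tolerance=0.5):
    """Algorithm 1: match the graphs, then check the matched edge distances."""
    clique = largest_clique(match_graph(g_new, g_ref))
    if len(clique) <= 1:
        return False
    mapping = dict(clique)  # node of g_new -> node of g_ref
    for edge, d_new in g_new[1].items():
        u, v = tuple(edge)
        if u in mapping and v in mapping:
            d_ref = g_ref[1][frozenset((mapping[u], mapping[v]))]
            # "Safe interval": within +/- tolerance of the reference distance.
            if abs(d_new - d_ref) > tolerance * d_ref:
                return False
    return True


TRAINING = [((1.0, 0.1), "cylinder"), ((1.1, 0.2), "cylinder"), ((0.9, 0.1), "cylinder"),
            ((0.2, 0.9), "curved"), ((0.3, 1.0), "curved"), ((0.1, 0.8), "curved")]


def make_cup(handle_x):
    body = [(x, y, z) for x in (-1, 1) for y in (-1, 1) for z in (0, 2)]
    handle = [(handle_x + dx, dy, 1 + dz)
              for dx in (-0.5, 0.5) for dy in (-0.5, 0.5) for dz in (-0.5, 0.5)]
    parts = {"body": {"points": body, "features": (1.0, 0.15)},
             "handle": {"points": handle, "features": (0.2, 0.95)}}
    return attributed_graph(parts, [("body", "handle")], TRAINING)


reference = make_cup(2.0)
print(similar(make_cup(2.0), reference))  # True
print(similar(make_cup(6.0), reference))  # False: handle too far from the body
```

With the handle moved further from the body, the matched edge distance falls outside the tolerance band and the object is rejected even though its components are classified identically.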

C. Structural Pattern Recognition System
In general, structural pattern recognition assumes that the structure of a given pattern is quantifiable and extractable. Typically, structural pattern recognition techniques formulate hierarchical descriptions of complex patterns built up from simpler primitive elements. Since the previous module of the system segments objects into components, these techniques are suitable for our case. In addition, the geometrical shape of each component is determined by a previously trained k-nearest-neighbor (k-NN) classifier using the geometrical features of that component.

When a new array of component point clouds representing an object is given to the module together with the neighborhood relations of the components, the module constructs a graph by assigning a node to each component and an edge between the corresponding nodes for every neighborhood relation.

After constructing the graph, the module checks its similarity to the given comparison graph with the following procedure:
• Decide the geometrical classification of each component using the previously trained k-NN classifier with the geometrical features of that component, and add it to the node as its geometric shape feature.
• Construct the match graph of the two attributed graphs.
• Find the maximal clique of the match graph.
• For each matched node's neighborhood relations, calculate the Mahalanobis distance for the new graph.
• If the edge distances of the new graph are within the given interval of the corresponding distances of the comparison graph, return similar (true); otherwise, return false.
The overall algorithm is shown in Algorithm 1.

III. RESULTS AND DISCUSSION
The experiments on this system are specifically aimed at recognizing different kinds of cups, since one of the aims of this work is to enable a humanoid robot, iCub, to deduce the hold-ability affordance of different cups.

A. Point Cloud Analysis and Segmentation into Parts
The work completed so far covers only the clustering of the objects in the robot's workspace; the part decomposition module is still under development. The results obtained so far are promising, since the robot can easily locate the object to be interacted with.

The decomposition module consists of Delaunay triangulation of the point cloud and calculation of the Gaussian curvature at each vertex of the triangles. After thresholding the curvature change, the different parts will be parsed as described in Biederman's RBC theory (parts are decomposed at the edges of the perceptual information). Since the point clouds belonging to each part of an object are obtained from the Gaussian curvature analysis, we can extract spatial and volumetric information for each part, such as volume, area (of the triangulated surface), principal alignment axis, surface curvature mean and variance, and so on. These features are then passed to the graphical representation module.

B. Structural Pattern Recognition System
When we performed these tests, the segmentation mechanism of the first module was not yet working. Thus, to test this module, we manually separated the points of the handle from the points of the cup body using the Point Cloud Library's visualization software.

Fig. 5. Part decomposition algorithm. "i:/", "f:/" and "o:/" stand for input, function and output, respectively (for each module). Functions may take some input arguments as well.
Fig. 6. Left: the input point cloud, representing a table with two mugs on it. Right: the resulting object clusters, seen from different points of view in the visualization tool.
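The part decomposition described in Section III-A (Fig. 5) relies on the Gaussian curvature at the vertices of a triangulated point cloud. One common discrete estimate, used here only as an assumption since no estimator is fixed above, is the angle deficit: 2π minus the sum of the triangle angles incident to a vertex. It equals the Gaussian curvature integrated over a small neighborhood of the vertex and is meaningful only for interior vertices. The sketch assumes a triangulation (for example a Delaunay triangulation) is already available as vertex-index triples; the function names and the threshold value are hypothetical.

```python
import math


def angle_at(a, b, c):
    """Interior angle at vertex a of the triangle (a, b, c)."""
    u = [b[i] - a[i] for i in range(3)]
    v = [c[i] - a[i] for i in range(3)]
    cos = sum(x * y for x, y in zip(u, v)) / (math.hypot(*u) * math.hypot(*v))
    return math.acos(max(-1.0, min(1.0, cos)))  # clamp against rounding


def angle_deficit(vertices, triangles):
    """Discrete Gaussian curvature per vertex: 2*pi minus the incident angles.
    Only meaningful for interior vertices whose triangles close a full fan."""
    total = [0.0] * len(vertices)
    for tri in triangles:
        for k in range(3):
            a, b, c = tri[k], tri[(k + 1) % 3], tri[(k + 2) % 3]
            total[a] += angle_at(vertices[a], vertices[b], vertices[c])
    return [2 * math.pi - t for t in total]


def high_curvature(deficits, interior, threshold=0.3):
    """Interior vertices whose |deficit| exceeds a hypothetical threshold."""
    return [i for i in interior if abs(deficits[i]) > threshold]


# Four triangles fanned around vertex 0, as a triangulation might provide.
corners = [(1, 1, 0), (-1, 1, 0), (-1, -1, 0), (1, -1, 0)]
fan = [(0, 1, 2), (0, 2, 3), (0, 3, 4), (0, 4, 1)]

flat = angle_deficit([(0, 0, 0)] + corners, fan)   # centre lies in the plane
peak = angle_deficit([(0, 0, 1)] + corners, fan)   # centre raised: pyramid apex
print(high_curvature(flat, [0]), high_curvature(peak, [0]))  # [] [0]
```

A flat fan gives a deficit of zero at its centre, while the apex of a square pyramid gives a positive deficit, so thresholding the absolute deficit flags the apex as a high-curvature vertex where a part boundary could be placed.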
The geometric feature of each component was also assigned manually. For testing, 4 different cups were chosen as the comparison objects (Fig. 7); their orientations are exactly the same as those shown in Fig. 7. Eight different cups in 4 different orientations, a total of 32 samples, were given to the system as test data. Some of these data can be seen in Fig. 8.

As a result, the module successfully identifies 28 of the 32 test samples when the comparison graph is the cup of case a. For case b, 19 cups are identified. Finally, 23 and 20 cups are identified for cases c and d, respectively.

Analyzing the results, we see that the module is more successful when the handle of the cup lies along the z-axis, as in case a. The reason is that the Mahalanobis distance is checked in order to distinguish big objects from small ones. If the handle of the cup has a different orientation, as in case b, the calculated Mahalanobis distance between components becomes more erroneous because the point cloud of the handle has fewer points. This may be a problem for the current version of the system. On the other hand, we are planning to make the robot construct a generic graph for each affordance in the supervised training phase. Once this feature has been implemented, the comparison graph is expected to be complete enough to cover these small variations.

Fig. 7. Amplitude data of comparison objects.
Fig. 8. Amplitude data of some test cases.

IV. FUTURE WORK
The system we propose here aims to offer a complete solution for affordance-based learning; therefore, many of its features are not yet implemented in the current version. In this section, we propose a roadmap for the system. After the development of the part decomposition and graphical representation modules is finalized, the very first step is to connect the two currently implemented modules in an online fashion and to test the overall system.
For the part decomposition module, Gaussian curvature estimation and RANSAC primitive shape extraction methods will be tried, and more robust point cloud clustering methods will be implemented.
For the structural pattern recognition module, a mechanism for constructing generic graphs will be implemented. With this mechanism, after training with a sufficient number of examples, the system is expected to substantially reduce the number of errors in affordance perception.

V. CONCLUSION
In this report, we described some psychological studies, namely the affordance concept and RBC theory. We then proposed an intelligent system, based on these studies, that deduces the possible set of actions for an object. Finally, we gave the details of the current implementation of the system and analyzed the initial experimental results.

REFERENCES
[1] J. J. Gibson, The Ecological Approach to Visual Perception. Lawrence Erlbaum Associates, 1986.
[2] E. J. Gibson, "The world is so full of a number of things: On specification and perceptual learning," Ecological Psychology, vol. 15, pp. 283-288, 2003.
[3] I. Biederman, "Recognition-by-components: A theory of human image understanding," Psychological Review, vol. 94, no. 2, pp. 115-147, 1987.
[4] J. Norman, "Two visual systems and two theories of perception: An attempt to reconcile the constructivist and ecological approaches," Behavioral and Brain Sciences, vol. 25, pp. 73-144, 2002.
[5] W. Marslen-Wilson, "Optimal efficiency in human speech processing," Max-Planck-Institut für Psycholinguistik, Nijmegen, The Netherlands, 1980.
[6] E. Sahin, M. Cakmak, M. R. Dogar, E. Ugur, and G. Ucoluk, "To afford or not to afford: A new formalization of affordances toward affordance-based robot control," Adaptive Behavior, vol. 15, no. 4, pp. 447-472, 2007.

Potrebbero piacerti anche