
Robotics for Artificial Intelligence – KIMROB03

ALICE - A Domestic Service Robot


Group n. 04 — Parth Tiwary and Danny Rogaar

The authors are with the Faculty of Science and Engineering, University of Groningen, The Netherlands. {p.p.tiwary,p.d.rogaar}@student.rug.nl

Abstract— A domestic service robot is tasked with retrieving known objects from fixed locations. We implement the overall behaviour using a navigation algorithm, object recognition and grasping methods. Experiments show the robot's capability in navigation and object recognition. Due to a flaw in object recognition, we were unable to test the robot's grasping ability on the physical platform, although simulations have shown the robot to be able to grasp targeted objects.
I. INTRODUCTION
A domestic service robot is primarily used for performing household chores, for assisted living, domestic cleaning, and surveillance. Building one is a multidisciplinary effort and requires a sophisticated integration of all the involved disciplines, whether computer vision, navigation or grasping. With recent advances in the recognition and grasping of an array of household objects, and the increased accuracy in such tasks [Hodson, 2018], there is a rising demand for service robots. We work with an example of one such domestic service robot in this project.
In this paper, we elaborate on our implementation for the domestic service robot ALICE, which is threefold: navigation, object recognition and grasping. We discuss each of these topics and our implementation in depth in the following sections. For navigation, we implement Dijkstra's path planning algorithm [Dijkstra, 1959] to find the path between two waypoints; our implementation uses the heapq data structure to improve the runtime of the algorithm, pushing and popping nodes with heappush and heappop. For object recognition, we use a convolutional neural network that classifies objects into five pre-defined classes, using two convolutions, each followed by a max-pooling layer, with two fully connected layers in between. For grasping, we implement an action server, the details of which are discussed in the following sections.
For navigation, we implement a hierarchical structure. The implementation is twofold and addresses one core issue faced during the implementation of the final demo behaviour. First, we pre-define a set of waypoints for the final demo, as shown in figure 1. Then, whenever a start and goal state is given to the robot, we use the Dijkstra path planning algorithm to obtain an optimal sequence of waypoints. Once we have obtained this sequence, we provide the waypoints to our navigation sub-behaviour, which again invokes the Dijkstra path planning algorithm to plan a path between each pair of consecutive waypoints, in order to travel from start to goal state. Hierarchical navigation is a crucial and more sophisticated alternative to hardcoding sets of waypoints for each possible route in the final demo.
In the following sections, we further elaborate on the implementation of all three components of the service robot: navigation, object recognition and grasping. In the experiments section, we discuss why some components did not work as expected during the final demo and give a reasonable explanation.

A. Neural object recognition

Object recognition has been an incredibly successful area of research since Krizhevsky showed the performance attainable with neural networks [Krizhevsky et al., 2012]. For image processing with such architectures, convolutional neural networks are typically applied.
Basic fully connected networks are centered around the artificial neuron, which is essentially a weighted sum of inputs minus a certain threshold. The output can then be given to an activation function that manipulates the linear output of the neuron. For example, linear activation means using the output of the weighted sum directly. The Rectified Linear Unit (ReLU) is close to linear; however, a fully linear network has no more expressive power than a shallow version of itself. ReLU avoids being fully linear by truncating the output to zero for negative inputs [Glorot et al., 2011].
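As a minimal sketch (in NumPy, with made-up weights and input), an artificial neuron with a ReLU activation can be written as:

import numpy as np

def neuron(x, w, b):
    # Weighted sum of the inputs minus a threshold (the bias)
    return np.dot(w, x) - b

def relu(z):
    # Identity for positive input, zero otherwise
    return np.maximum(z, 0.0)

x = np.array([0.5, -1.0, 2.0])   # example input
w = np.array([0.2, 0.4, -0.1])   # example weights
print(relu(neuron(x, w, b=0.1)))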
Images are better processed using convolutional neural networks, which exploit the fact that patterns in images are locally connected. 2D convolutions apply a kernel, which holds the weights, to the input image at all possible locations. Since the kernel weights are multiplied with the current patch of the image, the operation can be seen as applying a small fully connected network to every part of the image, which is much more efficient than a single giant fully connected layer.
The output is a feature map that represents different patterns, at a scale that depends on the number of preceding convolutions. That is, early feature maps can only learn simple patterns such as edges, using only one convolution; later feature maps may be able to represent complete classes. Given that deeper layers learn more complex patterns, the required depth of a network is related to the complexity of the given problem, i.e. with few classes, basic features may already suffice for the solution.
Max-pooling can be used to subsample feature maps or images by keeping only the highest value in each small window of the image. This introduces some invariance to location, which is useful for classification but less so for localisation.
Gradient descent for parameter optimisation means adjusting the model parameters such that a predefined loss function decreases, using the derivative of the network function.
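To illustrate the idea, gradient descent on a one-parameter least-squares loss looks as follows (a toy example, not our training code):

import numpy as np

x = np.array([1.0, 2.0, 3.0])        # inputs
t = np.array([2.0, 4.0, 6.0])        # targets, so the ideal w is 2
w, lr = 0.5, 0.05                    # initial parameter, learning rate

for step in range(100):
    y = w * x                        # model prediction
    grad = np.mean(2 * (y - t) * x)  # derivative of the loss w.r.t. w
    w -= lr * grad                   # step against the gradient
print(w)                             # approaches 2.0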
II. METHOD

A. Navigation

For navigating through the environment, ALICE uses a custom implementation of Dijkstra's algorithm for path planning, making sure navigation from one point to another is accomplished through the shortest path. For the robot, the shortest path often means barely scraping the obstacles along the way; the robot then has to slow down to carefully navigate around such encounters. Fortunately, intermediate locations dubbed waypoints can be set, so that path planning happens between the waypoints and not directly between the robot and its target. This steers the path of the robot around obstacles such as tables and closets, in turn preventing the slow movement caused by having to be careful. The input for the navigation is thus a list of waypoints, which are followed in order.
Implementation

As for the specifics of the implementation, a flattened 1D ROS array is provided as the costmap, which we use as-is. Every index in the 1D costmap array corresponds to a node in the typical graph-based formulation of the algorithm. To extract the neighbours of a given node in the 1D array, appropriate offsets are used to index the node's actual 2D neighbours within the flattened array. This neighbour extraction is implemented in the find_neighbours function. Distances are euclidean, though only the 8 neighbouring nodes are considered (adjacent and diagonally adjacent positions). An infinite distance is returned whenever the costmap value is larger than 0, implying the robot would be near an obstacle. This makes sure paths are only considered when they do not encounter obstacles.
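A minimal sketch of this neighbour extraction (the variable names are ours, and the exact signature of find_neighbours may differ):

import math

FREE = 0  # costmap cells with any higher value are treated as obstacles

def find_neighbours(index, width, height, costmap):
    # Yield (neighbour_index, distance) pairs for the 8-connected
    # neighbours of a node in the flattened 1D costmap.
    x, y = index % width, index // width
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            if dx == 0 and dy == 0:
                continue
            nx, ny = x + dx, y + dy
            if 0 <= nx < width and 0 <= ny < height:
                n_index = ny * width + nx
                if costmap[n_index] > FREE:
                    dist = math.inf            # keeps paths off obstacles
                else:
                    dist = math.hypot(dx, dy)  # euclidean: 1 or sqrt(2)
                yield n_index, dist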
The runtime of the Dijkstra implementation is optimised using Python's heapq implementation of a priority queue. Heaps are binary trees in which every parent has a value less than or equal to those of its children. In our implementation, heappush is used to push all neighbours of the current node, with their respective distances, onto the heap, and heappop is used to extract the next node with the shortest distance. heappop extracts the minimum in O(log n) instead of the usual O(n), which significantly improves the runtime of the Dijkstra implementation.
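Put together, the core loop of such a heapq-based Dijkstra could look as follows (a sketch under our own naming, reusing find_neighbours from the sketch above):

import heapq
import math

def dijkstra(costmap, width, height, start, goal):
    # Shortest path over the flattened costmap, with a heap as the
    # priority queue; find_neighbours is the function sketched above.
    dist = {start: 0.0}
    previous = {}
    heap = [(0.0, start)]                 # (distance, node) entries
    while heap:
        d, node = heapq.heappop(heap)     # O(log n) minimum extraction
        if node == goal:
            break
        if d > dist.get(node, math.inf):
            continue                      # stale entry, skip it
        for n_index, step in find_neighbours(node, width, height, costmap):
            new_d = d + step
            if new_d < dist.get(n_index, math.inf):
                dist[n_index] = new_d
                previous[n_index] = node
                heapq.heappush(heap, (new_d, n_index))
    if goal not in previous and goal != start:
        return []                         # goal unreachable
    path, node = [goal], goal
    while node != start:
        node = previous[node]
        path.append(node)
    return path[::-1]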
Hierarchical navigation

When navigating a new environment with a range of different tasks, as in the final demo, it would be infeasible to define a new list of waypoints for each possible start position of the robot and each possible goal. We therefore plan recursively: once over the waypoints, to extract the optimal sequence of waypoints from the robot's start position to the goal state, and then with the same path planning algorithm to plan a path between the waypoints. To be more specific, instead of defining waypoint paths for each possible route, we define a coarse grid of waypoints in the map (see fig. 1 for the actual map and its grid). Whenever the robot is tasked to move, path planning happens between the waypoints, by converting them to a cost map whose costs depend on the existence of connections and the euclidean distance. The result is a set of waypoints that can be used to navigate between the robot and its goal while successfully steering around obstacles. We also set the orientation of the waypoint vectors on the fly, since the robot's direction changes depending on its location and target. The orientations are set along the longest axis of movement towards the relevant waypoint. For the start waypoint and the waypoints next to the tables, the orientations are fixed, as is also seen in the image.

Fig. 1: Waypoint graph for intermediate navigation targets. Tables appear as two dots on the map due to the laser scanner, thus red squares denote their approximate footprint on the map. For waypoints without arrows, a direction is determined on the fly. Legend: A:Start, B:Table1, C:Table2 left, D:Table2 right, E:mid right, F:mid left, G:mid center, H:drop-off.
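The waypoint-level planning can be pictured with a small sketch. The coordinates and connections below are invented for illustration; only the waypoint names follow figure 1:

import heapq
import math

# Hypothetical coarse waypoint grid: name -> (x, y) in map coordinates
waypoints = {'A': (0, 0), 'G': (2, 1), 'E': (4, 1), 'F': (0, 2),
             'B': (2, 3), 'H': (5, 3)}
# Connections between waypoints (no edge = no direct line of travel)
edges = {'A': ['G'], 'G': ['A', 'E', 'F', 'B'], 'E': ['G', 'H'],
         'F': ['G'], 'B': ['G'], 'H': ['E']}

def plan(start, goal):
    # Dijkstra over the waypoint graph with euclidean edge costs
    dist, prev, heap = {start: 0.0}, {}, [(0.0, start)]
    while heap:
        d, wp = heapq.heappop(heap)
        for nxt in edges[wp]:
            cost = d + math.dist(waypoints[wp], waypoints[nxt])
            if cost < dist.get(nxt, math.inf):
                dist[nxt], prev[nxt] = cost, wp
                heapq.heappush(heap, (cost, nxt))
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1]

print(plan('A', 'H'))  # ['A', 'G', 'E', 'H']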
Augmentation

Data is the key to all state-of-the-art deep learning implementations. However, we usually do not have enough data, and hence data augmentation comes in handy. But why would data augmentation help with training deep learning models at all? We can think of augmentations as noise added to the input during the training process; each variation makes sure that our model learns general features and does not over-fit on the given dataset.
The provided dataset includes 5 classes, one for each box: BaseTech, TomatoSoup, Eraserbox, UsbHub and Evergreen, with around 200 images per class. In order to improve the usefulness of our data, we augment images such that the network learns more invariance to the corresponding variations. These augmentations include brightness changes, horizontal and vertical flips, and random crops of the image. Specifically, we used the following augmentations for our dataset (a code sketch follows the list):

• Flip Image Horizontal: In a horizontal flip, we mirror the image around its central vertical axis, which provides invariance to the orientation of the box around that axis.

• Change Brightness: This provides data from which the model learns features that are invariant to variations in image brightness. We convert the RGB image to HSV, randomly choose a value in the range [50, 200], assign it to the V (value) component, and convert the image back to the BGR format as defined in OpenCV, which is returned by the function.

• Random Crop: The ROIs obtained from ALICE are not always accurate; as a result, we often obtain images in which the objects are partially hidden or cut off at the top, bottom or sides. For our model to be invariant to such peculiar cases, we include random crops. An array of augmented images can be seen in figure 2.
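A minimal sketch of these three augmentations with OpenCV (the crop ratio is an arbitrary choice of ours):

import random
import cv2

def flip_horizontal(image):
    # Mirror around the central vertical axis
    return cv2.flip(image, 1)

def change_brightness(image_bgr):
    # Convert to HSV, overwrite the V (value) channel with a random
    # level in [50, 200], and convert back to BGR as used by OpenCV
    hsv = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2HSV)
    hsv[:, :, 2] = random.randint(50, 200)
    return cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR)

def random_crop(image, keep=0.8):
    # Cut a random margin from the sides, mimicking inaccurate ROIs
    h, w = image.shape[:2]
    ch, cw = int(h * keep), int(w * keep)
    y = random.randint(0, h - ch)
    x = random.randint(0, w - cw)
    return image[y:y + ch, x:x + cw]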
Fig. 2: A subset of the training set, including from left to right: original image, brightness, crop and flip augmentation.

Object Recognition

For object recognition, we used a convolutional neural network architecture with two convolutions, each followed by a max-pooling layer, then two fully connected layers, finally connecting to the softmax layer. A description of the architecture is provided in figure 3.

Fig. 3: Object recognition neural network. Convolutions use 5x5 and max-pooling uses 2x2 kernels.

We used a gradient descent optimiser with a learning rate of 0.002. The input to the network are 64x64x3 images, and the output is a softmax array, which we further pass through an argmax to obtain the class of the object present in the image. The network was trained for 200 epochs, and the accuracy on the validation set plateaued around 99%, which seems exceptionally high and hints at over-fitting. We tried to run multiple training cycles with fewer epochs; however, the validation accuracy reaches its peak very quickly, which leaves little room for stopping the training process earlier. We wanted to stop training earlier in order to avoid probable over-fitting; unfortunately, this was not possible.
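In, for instance, Keras, such an architecture could be sketched as follows. The filter counts and the width of the first fully connected layer are assumptions of ours; the kernel sizes, input shape, class count, optimiser and learning rate follow the text and figure 3:

from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Conv2D(16, (5, 5), activation='relu', input_shape=(64, 64, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(32, (5, 5), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dense(5, activation='softmax'),  # one unit per object class
])
model.compile(optimizer=keras.optimizers.SGD(learning_rate=0.002),
              loss='categorical_crossentropy', metrics=['accuracy'])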
Recognition results on ROI images obtained from ALICE: We further tested our network on the ROIs obtained from ALICE during the final demo; the results can be seen in figure 4. Out of the 18 ROI images obtained from ALICE, our network mis-classified 5, as shown in the referenced figure. We also observe that these mis-classifications occur for images in which the object is placed upside down, and such instances were not provided to our network during the training process. Hence, it is reasonable to expect our network to mis-classify these instances; this can be explained by the training process and the augmentations provided during training.

B. Grasping

In the main behaviour there are 4 components, one of which is the grasping action server. For grasping, we define grasps inside moveit.py, where we further define the grasp_pose, pre_grasp_posture, pre_grasp_approach and post_grasp_retreat. These are the four essentials for creating a grasp and are defined inside the create_grasps function.
To briefly describe the grasping pipeline: ALICE approaches the table while trying to detect any objects on it from the point cloud data, which yields ROIs for the objects that might be present on the table. For each ROI obtained, the object recognition server extracts the region, resizes it to a 64x64 image and tries to recognise the object present in it. If the object is classified as an object in the given order, the hardcoded dimensions of the object are used to create a collision model for the object in the scene, after which the grasping process is initiated. This goes to show that object recognition is an essential part of the grasping pipeline. Also, since object detection is performed after every grasp, this action will clear the table of all objects in the order.
For the specifics of the grasps created: for the grasp pose, we define an object_top for each object, defined as follows:

object_top = 0.5 * size[2]    (1)

where size[2] is the z component of the object size, that is, the height of the object. For the grasp pose, the arm is positioned directly above the object using the object_top, which generalises to any object depending on its height (fig. 5). We define an array of parameters for each of the predefined states (pre_grasp_posture, pre_grasp_approach and post_grasp_retreat) for creating a grasp. Grasps are attempted until either the table does not contain any objects in the order, or a certain number of attempts is reached. Even when no valid grasps are found, the robot resets and re-calibrates the box location and orientation, and a follow-up grasp may in fact work.
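A hedged sketch of a single such grasp, using the moveit_msgs Grasp message whose fields the text names; the reference frame, approach directions and distances are our assumptions, not the values in create_grasps:

from moveit_msgs.msg import Grasp
from geometry_msgs.msg import PoseStamped

def create_grasp(object_pose, size):
    # object_pose: geometry_msgs Pose of the object centre; size: (x, y, z)
    object_top = 0.5 * size[2]  # equation (1): half the object height

    grasp = Grasp()
    pose = PoseStamped()
    pose.header.frame_id = 'base_link'   # assumed reference frame
    pose.pose = object_pose
    pose.pose.position.z += object_top   # gripper directly above the top
    grasp.grasp_pose = pose

    # Approach from above, retreat upwards (directions are assumptions)
    grasp.pre_grasp_approach.direction.header.frame_id = 'base_link'
    grasp.pre_grasp_approach.direction.vector.z = -1.0
    grasp.pre_grasp_approach.min_distance = 0.05
    grasp.pre_grasp_approach.desired_distance = 0.10
    grasp.post_grasp_retreat.direction.header.frame_id = 'base_link'
    grasp.post_grasp_retreat.direction.vector.z = 1.0
    grasp.post_grasp_retreat.min_distance = 0.05
    grasp.post_grasp_retreat.desired_distance = 0.10
    return grasp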
C. Behaviour

The main behaviour controls the execution of 4 main processes: the navigation sub-behaviour, and the action servers for grasping, for approaching a table with objects, and for moving back from a table (called reproach). Keeping track of these processes is done with a local state, whose values and transitions are shown in fig. 7. The robot starts by waiting for its order. Such an order contains a list of tables to go to and objects to collect. Given an order, the robot is tasked to go to all tables in said order and to remove all objects matching the order as well. The instructions imply that the robot will be able to go to a specific table and grab a given object, as well as being able to clear a table by adding all its objects to the order. To complete the task, the robot enters (in a sequential and ordered way) a navigation state, an approach state, a grasping state and a backwards-movement state. All states can be entered three times at most, since there are three potential locations for the objects. If the robot has visited a table and no other table is in the order, it also resets the state to waiting for order. After visiting every table, the robot moves to the drop-off point. All objects are mentioned, and the state is reset such that the robot waits for an order again.
The navigation behaviour uses a few states itself (fig. 6) to keep track of each waypoint that is visited. The sub-behaviour receives a set of waypoints to follow before the starting state. It sets its state to waiting and starts moving towards the first waypoint. Successful navigation changes the state to end waypoint, which sets the next waypoint as the navigation target. Failed navigation, however, changes the state to end navigation. Both the end navigation state and successful navigation along all of the given waypoints result in a finished state for the sub-behaviour.
As for the other processes, grasping is described in its own section, and the reproach state precedes a simple command to the alicecontroller action server to move backwards 80 cm. Alice approach is a more complex action: roughly, it uses the camera to search for any objects in front, turns the robot so that driving will place it in front of them, and finally aligns the robot's rotation to look straight ahead.

Fig. 7: The local state of our main behaviour. Colours denote which behaviours are used for the state. From light to dark respectively: navigation sub-behaviour, alice approach action server, grasping action server and the alicecontroller action server. The states 'finished' and 'waiting for order' are handled in the main behaviour.
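The waypoint cycle of the navigation sub-behaviour can be sketched as a small state machine. The state names follow the text; the control flow and the move_to helper are our assumptions:

def navigate(waypoints, move_to):
    # move_to(waypoint) is assumed to return True on success
    remaining = list(waypoints)
    state = 'waiting' if remaining else 'finished'
    while state != 'finished':
        if state == 'waiting':
            ok = move_to(remaining.pop(0))
            state = 'end waypoint' if ok else 'end navigation'
        elif state == 'end waypoint':
            # Success: cycle back with the next waypoint, if any
            state = 'waiting' if remaining else 'finished'
        else:
            # 'end navigation': failure also finishes the sub-behaviour
            state = 'finished'
    return not remaining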
Fig. 5: A grasp pose for an object, as defined in the grasping section.

Fig. 6: Navigation sub-behaviour states. Navigation to a waypoint happens in the waiting state. end waypoint denotes a success and starts the cycle with the next waypoint until none are left (finished). Failed navigation results in the end waiting state.

III. EXPERIMENTS

Various parts of the robot should function, and we observe their performance through our experiments.

A. Navigation

The navigation, including navigation over a grid of waypoints, works very well in simulation. Moving along waypoints made the movements smooth, as the robot does not have to drive slowly to avoid collisions. Moreover, a starting waypoint had been implemented to simulate the position of the robot at the start of the experiments. The start can be set to anywhere and is always followed by the mid center waypoint. After changing our behaviour to be continuous (i.e. resetting the state to waiting for order after object drop-off), we found the navigation to be very robust as well. Specifically, wherever the robot was located, it would always move to start, which was typically quite successful and ensures proper execution of the further navigation behaviour.
Testing the navigation in the real world showed similar results to the simulation. Although waypoints had to be set more carefully and the map was noisier (see e.g. fig. 1), navigation worked similarly to the simulation. We did find some drawbacks in our approach. That is, when the robot sporadically got stuck in its costmap and failed to navigate, manual recovery had to happen, as the robot could no longer move itself away. Instead, the behaviour was finished and the robot started to approach; an action that requires shutdown when no tables with objects are seen. Assuming that the costmap is a fair representation of obstacles to the robot, it would be best to just warn the operators and halt behaviours temporarily. We also found the smoothness of navigation to depend on the map and waypoints, with the map used in the final demo resulting in the robot often trying to rotate to align itself with waypoints. With enough space in the map, alignment rotations more often happened automatically while moving to the waypoint. Although we also found our derived waypoint orientations to be off from the desired orientation by up to 90 degrees, this did not influence the overall navigation negatively in first experiments and is therefore not held responsible for the problems we found with orientation.

B. Object Recognition

The object recognition network was trained to 98.98% training accuracy and 97% accuracy on validation. For a much better test, the performance needed to be evaluated on new images created by the robot. While integrating the network meant for real objects (different from the network used during simulation), differences between our simulation and actual recognition model, together with assumptions about its output, led to an extraneous application of an argmax function.
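As a hypothetical illustration of how an extra argmax can silently produce a constant class (our reconstruction, not the actual demo code):

import numpy as np

probs = np.array([0.05, 0.80, 0.05, 0.05, 0.05])  # softmax output

class_id = np.argmax(probs)   # correct: class 1
faulty = np.argmax(class_id)  # argmax of a scalar is always 0

print(class_id, faulty)       # 1 0 -> a constant prediction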
Without a chance to test recognition before that moment, the issue led to a constant prediction from the network, and the object recognition was faulty. The constant prediction of the network was of a box not included in the final experiment, leading to a constantly wrong prediction. In order to still obtain results on object recognition, we have tested the performance afterwards on the ROIs collected during the final demo. These ROIs appeared with problems in their colour channels, seemingly a problem of mixed RGB channels in cv2's imwrite; however, switching these channels does not fix the issue. We show the results on the given images in fig. 4, since the problem possibly also occurs on the robot. Nonetheless, only 5 out of 18 images were misclassified.

Fig. 4: Predictions on ROIs obtained from ALICE. Out of the 18 images obtained from the ROIs, we have 5 mis-classifications, specifically in the cases where the object is put upside down and our network is not able to base its prediction on any of the features it has learned during the training process, which seems reasonable.
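For reference, OpenCV's imwrite expects BGR channel order, so saving an RGB array swaps red and blue. A minimal guard, assuming the ROI is held in RGB order:

import cv2
import numpy as np

roi_rgb = np.zeros((64, 64, 3), dtype=np.uint8)  # placeholder RGB ROI
roi_rgb[:, :, 0] = 255                           # pure red in RGB order

# cv2.imwrite interprets arrays as BGR; without conversion the saved
# image would appear blue instead of red
cv2.imwrite('roi.png', cv2.cvtColor(roi_rgb, cv2.COLOR_RGB2BGR))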
C. Grasping

The method of grasping, as explained in sec. II-B, has shown to be effective in simulation. As mentioned, grasps were attempted until either the table did not contain any objects in the order, or a certain number of attempts was reached. The former implementation meant that even when no valid grasps were found at first, the robot reset its hypothesis of the box location and orientation, and a follow-up grasp could in fact work. Overall, our robot was able to pick up many objects in simulation; however, the grasping is very dependent on the object recognition. We had not attempted grasping before the final demo, and having our object recognition bugged (see sec. III-B) meant that we only saw the robot grasp one time, so no adequate evaluation could be performed.
IV. DISCUSSION / CONCLUSION

We completed the behaviour of a domestic service robot tasked to retrieve objects from a fixed set of locations. Navigation is implemented using Dijkstra's algorithm, both between target and goal within the waypoint graph and between the given waypoints on the map. The navigation proved quite successful, allowing robust navigation between targets. Edge cases such as the handling of failed navigation could be improved; also, the final demo showed too many robot rotations, although we did not observe them on a better map. Aligning waypoint orientations with the robot seemed not to have a major impact on performance, as failing to do so perfectly did not result in bad rotations of the robot, or inabilities in navigating.
Object recognition was based on a convolutional neural network (fig. 3), which we found (after the experiments) to perform adequately for the task, with 13 out of 18 objects observed during the experiments classified correctly.
Grasping was shown to be successful in simulation, grabbing objects and managing to place them on the robot. Unfortunately, we were unable to evaluate the performance of object grasping on the robot, since our object recognition was flawed at that time.

REFERENCES

[Dijkstra, 1959] Dijkstra, E. W. (1959). A note on two problems in connexion with graphs. Numer. Math., 1(1):269–271.
[Glorot et al., 2011] Glorot, X., Bordes, A., and Bengio, Y. (2011). Deep sparse rectifier neural networks. Journal of Machine Learning Research, 15:315–323.
[Hodson, 2018] Hodson, R. (2018). How robots are grasping the art of gripping. Nature: International Journal of Science, 557:s23–s25.
[Krizhevsky et al., 2012] Krizhevsky, A., Sutskever, I., and Hinton, G. (2012). ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems 25 (NIPS 2012).