
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/LRA.2017.2678541, IEEE Robotics and Automation Letters.

IEEE ROBOTICS AND AUTOMATION LETTERS. PREPRINT VERSION. ACCEPTED FEBRUARY, 2017

A Natural Infrastructure-Less Human-Robot Interaction System


Valeria Villani, Lorenzo Sabattini, Giuseppe Riggio, Cristian Secchi, Marco Minelli, Cesare Fantuzzi

Abstract—This paper introduces a novel methodology for letting a user interact with a robotic system in a natural manner. The main idea is that of defining an interaction system that does not require any specific infrastructure or device, but that relies on commonly utilized objects while leaving the user's hands free. Specifically, we propose to utilize a smartwatch (or any commercial sensorized wristband) for recognizing the motion of the user's forearm. This is achieved by measuring accelerations and angular velocities, which are then elaborated for recognizing gestures and for defining velocity commands for the robot. The proposed interaction system is evaluated experimentally with different users controlling a semi-autonomous quadrotor. Results show that the usability and effectiveness of the proposed natural interaction system provide a significant improvement in the human-robot interaction experience.

Index Terms—Human-Centered Robotics, Human Factors and Human-in-the-Loop, Physical Human-Robot Interaction, Haptics and Haptic Interfaces.

Manuscript received: September 10, 2016; Revised December 14, 2016; Accepted February 7, 2017.
This paper was recommended for publication by Editor Yasuyoshi Yokokohji upon evaluation of the Associate Editor and Reviewers' comments.
The authors are with the Department of Sciences and Methods for Engineering (DISMI), University of Modena and Reggio Emilia, Italy {valeria.villani, lorenzo.sabattini, giuseppe.riggio, cristian.secchi, cesare.fantuzzi}@unimore.it
Digital Object Identifier (DOI): see top of this page.

I. INTRODUCTION

RECENT advances in robot control and safety systems have allowed, in the last few years, the access of robots to our daily life. Robotic systems have been applied to several fields, such as social assistance [1], surveillance [2], tour guiding [3] or floor cleaning [4]. These contexts are very different from traditional industrial applications, in which robots are typically utilized by expert users. In fact, in daily life applications, users are typically inexperienced and do not have any specific education related to robot programming: the design of effective human-robot interaction (HRI) systems is then of paramount importance for these applications.

In the last few years, the concept of natural user interfaces (NUIs) has been developed. The main idea is that of allowing a direct expression of mental concepts by intuitively mimicking real-world behavior. NUIs offer a natural and reality-based interaction by exploiting users' pre-existing knowledge and using actions that correspond to daily practices in the physical world [5]. To achieve this, NUIs allow users to directly manipulate objects and interact with robots rather than instruct them to do so by typing commands. Thus, they represent an evolutionary paradigm that overcomes the access bottleneck of classical interaction devices such as keyboards, mice and joysticks, by resorting to voice, gestures, touch and motion tracking [6], [7].

A. Contribution

In this scenario, in this paper we propose a novel hands-free infrastructure-less natural interface for HRI, based on the recognition of the movements of the user's forearm. We use the term infrastructure-less to indicate an interaction system that exploits an everyday device and does not need additional dedicated external sensors (such as, e.g., external cameras), which have a number of drawbacks. In fact, they might not be portable, they limit the physical area where interaction occurs and the freedom of motion of the user, they make the interaction less natural, and they require a setup phase, thus not being suited for instantaneous use while the user is performing everyday activities. In the application presented in this paper we consider a commercial smartwatch, but common wristbands for activity tracking are also suited. In general, any commercial multipurpose device equipped with accelerometers and gyroscopes, which can therefore measure movements of the forearm of the user, fits the proposed approach. Other than being a multipurpose device, the use of such an interaction means has the additional advantage of providing the user with freedom of movement and letting her/him be immersed in the environment where the robot moves, being able to track it with non-fragmented visibility [7].

Moreover, situation and environment awareness is increased through haptic feedback, provided by vibration with modulated frequency. In particular, vibration is used for two different purposes: for acknowledging the user's command (i.e. a vibration is provided if a gesture has been recognized), and for providing her/him with information on the status of the robot with respect to the environment (i.e. vibration modulated based on the distance from obstacles, or targets).

Given these features, the proposed HRI approach takes the form of a tangible user interface (TUI) [7], in addition to being a NUI. The term TUI encompasses a great variety of interaction systems relying on a coupling between physical objects and digital information, which is physically embodied in concrete form in the environment. Thus, TUIs provide a direct mapping between the behavior of the robot and the usage of such a robot, and between the behavior of control devices and the resulting digital effects. The proposed HRI system relies on embodied interaction, tangible manipulation, physical representation of data and embeddedness in real space, which are the pillars of TUIs.

The interaction system proposed in this paper thus represents a milestone towards interaction with robots for everybody, being (to the best of the authors' knowledge) the first example of a natural and tangible hands-free infrastructure-less HRI system.


B. Application scenario

In the application presented in this paper we consider the need to remotely control a quadrotor. However, the proposed approach is general and possibly holds also for controlling, among others, mobile robots [8], [9] and industrial manipulators [10]. It can be scaled to multi-robot systems [11] and has been preliminarily applied to interact with industrial machines [12]. Depending on the application and the robot considered, the mapping between movement of the forearm and commands to the robot, and between gestures and changes of the state machine, has to be adapted. Moreover, in the case of nonholonomic robots, state feedback linearization has to be considered.

Specifically, the proposed interaction system is advantageous in the following application scenarios. First, we consider those conditions where the user and the robot share the same working area, that is when the distance between the user and the robot is not intrinsic to the application, but would be introduced solely by the human-robot interface. In this scenario, by using the proposed interaction system, the user can control the robot while standing in its proximity and following its movements. Second, we consider the case of remote control of a robot either when the user's view on the robot is partially occluded or in an uncluttered environment. In the first case, just as an example, we consider the inspection of an area inaccessible to the user due to a closed road, debris obstructing the road, or architectural barriers in the presence of disabled users. In this scenario, the haptic feedback for obstacle avoidance compensates the partial occlusion of the field of view of the user. In the case of an uncluttered environment, the smartwatch might be used to drive the robot for an aerial inspection of the area: consider for example the case of a quadrotor hosting an infrared camera that supports search and rescue activities in a hostile environment (like mountains, as in [11], or areas devastated by an earthquake) and provides a haptic feedback when someone is detected. In addition, we consider the necessity of inspecting the roof of a building under construction or maintenance, while the user is on the ground. In this scenario, a hands-free interaction is required due to the fact that users typically wear gloves, which impede interaction with input devices, such as touchscreens, mice and keyboards, or might have greasy hands. In this kind of applications, hands-free control of a quadrotor represents a very effective solution that allows to reach the area of interest and acquire images (or measurements) remotely.

C. Related works

Gesture-based control represents one of the most frequently utilized paradigms for providing intuitive interaction with robotic systems. Despite being a quite new research field, several techniques have been proposed in the literature. The great majority of them relies on the use of vision systems [9], [10], [13]–[15]. Generally speaking, vision-based techniques require proper lighting conditions and camera angles [14], and gestures are recognized either by means of colored markers [14] or by directly detecting the human palm [15]. The common assumption of all these methodologies consists in having controlled and uniform illumination conditions. This represents a strong assumption, and often cannot be verified in real world applications, in particular in outdoor environments. Moreover, this kind of approach is not infrastructure-less, since dedicated vision system instrumentation is required. This implies that, firstly, these interaction systems are not portable, since they are effective only when the user lies in the field of view of the sensors, that is in front of the camera [13]. Secondly, these systems cannot track the robot, since the user can barely move and follow it. Thus, their field of application is greatly reduced: for instance, they are unsuited for any inspection application, where movement in large areas is typically required. This can be partially dealt with by mounting the camera on the robot [8], and letting the user follow the robot in its movements. However, also in this case, the user must stand in front of the camera in order for the gestures to be recognized.

To overcome the limitations of vision-based HRI, physical interaction devices can be exploited for implementing non-vision-based methodologies. One of the most popular techniques consists in recognizing gestures utilizing sensorized gloves [16]. An extensive survey on wearable systems for activity recognition is reported in [17], and classical methods for recognizing gestures, such as hidden Markov models, are also presented therein. Further, in [18] an overview of the most used approaches for recognizing gestures is reported.

An ad-hoc vibrotactile glove is proposed in [19] to guide the user on a semi-autonomous wheelchair by providing a vibration feedback that indicates obstacles or desired directions in the environment. While these devices provide a high measurement precision, they heavily limit the freedom of motion of the user, who is forced to wear uncomfortable ad hoc devices [14]. To overcome this, hands-free techniques have been developed that rely on accelerometer data. In particular, gesture recognition methods are proposed in [20]–[22] for recognizing the user's motion. In [23] the authors use a wristband and a headset to control a fleet of drones by means of gesture and speech recognition. The use of the headset increases the invasiveness of the approach and, moreover, the interaction with the robot is limited to a predefined set of primitives.

Non-vision-based gesture recognition can also be obtained utilizing touch screen technologies. Along these lines, [24] and [25] consider the use of multi-touch hand-held technologies, such as tablets, to control robots. These interfaces require the use of both hands and, hence, are not practical in many circumstances. Moreover, they lack spatial interaction [7] and environment skills [5], thus having limited learnability and intuitiveness.

II. MULTI-MODAL CONTROL ARCHITECTURE FOR THE INTERACTION WITH A ROBOT

A. Overview of the system

According to the interaction and control system considered in this paper, the user wears the smartwatch or the activity tracker wristband: the motion of her/his forearm is then acquired by the device. In particular, characteristic measurements related to the motion are considered: three-dimensional acceleration (raw and with automatically compensated gravity) and three-dimensional angular velocity.


Data are then processed for implementing motion recognition: in particular, the motion of the user is analysed for recognizing gestures (used for imposing high-level commands to the robot), or for defining velocity commands (i.e. the robot velocity is controlled as a function of the user's motion). Feedback is then provided to the user, to inform her/him about the current status of the system and the recognition of gestures. In particular, feedback is provided to the user in terms of modulated vibration frequency of the smartwatch, depending on the position and the status of the robot.

B. Control architecture for a quadrotor

In the following we describe a particular realization of the proposed interaction system, in which a smartwatch is exploited for letting a human operator interact with a quadrotor unmanned aerial vehicle.¹ The overall control architecture utilized for controlling the quadrotor is described by the state machine diagram depicted in Fig. 1. Gestures are utilized for changing the state of the system. Even though the proposed architecture could be implemented with any set of gestures, in this application we consider the following five gestures, without loss of generality: 1) Up: sharp movement upwards, 2) Down: sharp movement downwards, 3) Circle: movement in a circular shape, 4) Left: sharp movement to the left, 5) Right: sharp movement to the right. As will be clarified hereafter, these gestures represent a natural choice, since they can be naturally used for implementing an intuitive coupling between user commands and effects on the robot behavior. In the following, Section IV is devoted to the description of the algorithm used for gesture recognition.

¹ We will hereafter assume that the quadrotor is endowed with a low level control system, which enables it to perform desired trajectories. This can be obtained using standard quadrotor control techniques, such as [26].

Fig. 1. State machine of the control algorithm.

The system is initialized in the Autonomous state. In this state there is no interaction between the user and the quadrotor. Hence, the quadrotor is controlled by means of the internal, preprogrammed control algorithm. If no control algorithm is defined, as in the application presented in this paper, then the quadrotor is stopped.

Using the Up gesture, the user can move to the Hovering state, where the quadrotor autonomously takes off and reaches a stable hovering position. The Down gesture takes the system back to the Autonomous state: if no control algorithm is defined, the quadrotor lands and stops.

From the Hovering state, it is possible to move to the Teleoperated state using the Circle gesture. In the Teleoperated state, the user can directly control the motion of the quadrotor. It is possible to go back to the Hovering state using again the Circle gesture.

From the Teleoperated state, it is possible to utilize the Up gesture to take the system to the Fly up state. In this state, the quadrotor increases its height at a constant velocity. The gesture Down can then be utilized by the operator to take the system back to the Teleoperated state. In a similar manner, the gesture Down can be used to take the system to the Fly down state, in which the quadrotor decreases its height at a constant velocity. The gesture Up can then be utilized by the operator to take the system back to the Teleoperated state.

From the Teleoperated state, it is possible to utilize the Left (Right) gesture for bringing the system to the Left flip (Right flip) state: in this condition, the quadrotor executes a flip maneuver in clockwise (counterclockwise) direction, with a preprogrammed control algorithm, and then automatically goes back to the Teleoperated state.
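For illustration, the gesture-driven state machine of Fig. 1 can be summarized in a few lines of Python. The following is only a sketch of the logic described in this section, not the implementation used in the experiments; the state and gesture names follow Fig. 1, while the class structure and function names are ours.

    # Minimal sketch of the gesture-driven state machine of Fig. 1.
    # States and gestures follow the text; the dispatch code is illustrative.

    TRANSITIONS = {
        ("Autonomous",   "Up"):     "Hovering",      # take off and hover
        ("Hovering",     "Down"):   "Autonomous",    # land and stop (no control algorithm)
        ("Hovering",     "Circle"): "Teleoperated",  # start direct teleoperation
        ("Teleoperated", "Circle"): "Hovering",      # back to hovering
        ("Teleoperated", "Up"):     "Fly up",        # climb at constant velocity
        ("Fly up",       "Down"):   "Teleoperated",
        ("Teleoperated", "Down"):   "Fly down",      # descend at constant velocity
        ("Fly down",     "Up"):     "Teleoperated",
        ("Teleoperated", "Left"):   "Left flip",     # preprogrammed flip, then back
        ("Teleoperated", "Right"):  "Right flip",
    }

    # Flip states return to Teleoperated automatically once the maneuver ends.
    AUTO_RETURN = {"Left flip": "Teleoperated", "Right flip": "Teleoperated"}


    class QuadrotorStateMachine:
        def __init__(self):
            self.state = "Autonomous"  # initial state, no interaction

        def on_gesture(self, gesture):
            """Update the state when a recognized gesture arrives; ignore others."""
            nxt = TRANSITIONS.get((self.state, gesture))
            if nxt is not None:
                self.state = nxt
            return self.state

        def on_maneuver_done(self):
            """Called when a preprogrammed maneuver (e.g. a flip) terminates."""
            self.state = AUTO_RETURN.get(self.state, self.state)
            return self.state


    if __name__ == "__main__":
        sm = QuadrotorStateMachine()
        for g in ["Up", "Circle", "Left"]:
            print(g, "->", sm.on_gesture(g))
        print("flip done ->", sm.on_maneuver_done())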
In the Teleoperated state, the pose of the smartwatch is translated into a velocity command for the quadrotor, and its vibration is exploited to provide the user with a feedback on the current performance of the quadrotor. In detail, the angles of the smartwatch are directly translated into velocity control inputs for the quadrotor, in a natural and intuitive manner.


(a) Angles of the smartwatch. (b) Directions for teleoperating the quadrotor.
Fig. 2. Reference frames for the Teleoperated state.

Referring to Figs. 2(a) and 2(b), the Roll angle is used to control the motion of the quadrotor along the forward/backward direction and the Pitch angle is used to control the motion of the quadrotor along the left/right direction. In particular, let ϑr, ϑp ∈ [−π/2, π/2] be the roll and pitch angle, respectively, and let vx, vy ∈ ℝ be the velocity of the quadrotor along the forward/backward and left/right direction, with positive sign towards the forward and left direction, respectively. Then, the velocity commands are computed as follows:

    vx = Kr ϑr ,   vy = Kp ϑp                                        (1)

where Kr, Kp > 0 are constants defined in such a way that the maximum angle that is achievable with the motion of the forearm corresponds to the maximum velocity of the quadrotor.

Conversely, the Yaw angle of the smartwatch is used to define a setpoint for the Yaw angle of the quadrotor. Namely, let ϑy ∈ [−π, π] be the yaw angle of the smartwatch, and let ϕ ∈ [−π, π] be the yaw angle of the quadrotor. Then, the yaw rate of the quadrotor is controlled as follows:

    ϕ̇ = Ky (ϑy − ϕ)                                                  (2)

where Ky > 0 is an arbitrarily defined constant.
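As an illustration of the mapping in (1)–(2), a minimal Python sketch is reported below. The gain values are placeholders (the paper only requires Kr, Kp, Ky > 0, with Kr and Kp tuned so that the maximum forearm angle corresponds to the maximum quadrotor velocity), and the function name is ours.

    import math

    # Illustrative gains: the text only requires Kr, Kp, Ky > 0, with Kr and Kp
    # chosen so that the largest reachable forearm angle maps to the maximum
    # quadrotor velocity.
    K_R, K_P, K_Y = 1.0, 1.0, 0.5


    def teleoperation_commands(theta_r, theta_p, theta_y, phi):
        """Map smartwatch angles to quadrotor commands, as in Eqs. (1)-(2).

        theta_r, theta_p : roll and pitch of the smartwatch, in [-pi/2, pi/2]
        theta_y          : yaw of the smartwatch, in [-pi, pi]
        phi              : current yaw of the quadrotor, in [-pi, pi]
        Returns (vx, vy, yaw_rate).
        """
        vx = K_R * theta_r                 # forward/backward velocity, Eq. (1)
        vy = K_P * theta_p                 # left/right velocity, Eq. (1)
        yaw_rate = K_Y * (theta_y - phi)   # yaw-rate command, Eq. (2)
        return vx, vy, yaw_rate


    if __name__ == "__main__":
        print(teleoperation_commands(math.pi / 6, -math.pi / 12, 0.3, 0.1))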
III. HAPTIC FEEDBACK

Modulated vibration of the smartwatch is utilized for providing the user with a haptic feedback. In particular, vibration is utilized with two different purposes: for acknowledging the user's command, and for providing her/him with information on the status of the robot.

A. Command acknowledgement

Vibration is utilized for informing the user about recognition of gestures: namely, a single short vibration notifies the user when a gesture has been recognized by the control architecture. This provides an immediate feedback about the current status of the robot, with reference to Fig. 1, and the possible actions, thus increasing the user's situation awareness. Such feedback aims at solving, at least partially, the ephemerality of gestures [27]. Indeed, the ephemeral nature of gestures implies that if a gesture is not recognized or is misrecognized, the user has little information available to understand what happened [27]. This kind of sensory feedback tackles the first condition by informing the user whether a gesture has been recognized. In the current implementation, no information is provided about possible misrecognition: it could be provided by showing on the screen of the smartwatch which gesture has been recognized. However, it is worthwhile noting that, utilizing the gesture recognition algorithm that will be presented in Section IV, we experimentally verified that the gesture misrecognition rate is quite low, which significantly reduces the need for this additional information.

B. Status of the robot

Vibration is also utilized for assisting the user in piloting the quadrotor. In particular, we considered the case where the quadrotor is moving in an environment which is populated by obstacles to be avoided, or targets to be reached. The user then senses a vibration that is modulated based on the current altitude of the quadrotor, and its position with respect to the obstacles, or its distance from the target.

As regards altitude-related vibration, it is mostly important indoors, where practical constraints on the maximum height of flight of the robot are to be considered. In particular, we introduce hM and hm, with hM > hm > 0, as the maximum and minimum allowed values for the height of the quadrotor, respectively. More specifically, let h(t) be the height of the quadrotor at time t: then, we want to ensure that hm ≤ h(t) ≤ hM, ∀t. This requirement is due to the fact that our experiments are performed indoors; for outdoor scenarios it might be relaxed to h(t) ≥ hm. In the Fly up and Fly down states, the vibration frequency increases as the height of the quadrotor approaches hm or hM, in a proportional manner. Specifically, let fv^(A)(t) be the vibration frequency at time t corresponding to the feedback on altitude. In the Fly up state, it is defined as follows:

    fv^(A)(t) = 0                        if h(t) < hM − δ
              = Kv (h(t) − hM + δ)       otherwise                   (3)

where the threshold δ was defined as δ = 0.3 (hM − hm), and Kv = 0.5 is a normalization constant, empirically defined to guarantee that the vibration frequency is sufficiently small at the discontinuity h(t) < hM − δ, thus attenuating the effect of the discontinuity and avoiding possible confusion and stress to the user due to a sudden strong vibration. In a similar way, the vibration frequency was defined as follows in the Fly down state:

    fv^(A)(t) = 0                        if h(t) > hm + δ
              = Kv (−h(t) + hm + δ)      otherwise                   (4)

Furthermore, the user is provided with a feedback related to the distance from either the target to reach or the obstacle to avoid, based on the application. Denoting by fv^(T) and fv^(O) the vibration frequencies of these two haptic feedbacks, they are defined as

    fv^(T)(t) = Kv^(T) / d^(T)(t) ,   fv^(O)(t) = Kv^(O) / d^(O)(t)  (5)

where Kv^(T) and Kv^(O) are normalization constants set to 0.2 following the same empirical criterion as Kv, and d^(T)(t) and d^(O)(t) are the Euclidean distances between the instantaneous quadrotor position and the target and obstacle positions, respectively. Several techniques can be found in the literature for using the on-board cameras to detect the position of targets and obstacles in real time: however, this is out of the scope of the paper, and will therefore not be addressed here. In Subsection V-A, preliminary results of the validation of the haptic feedback for target proximity are presented.
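The vibration laws (3)–(5) can be sketched as follows; the numerical constants are those reported above (Kv = 0.5, δ = 0.3(hM − hm), Kv^(T) = Kv^(O) = 0.2), while the function signatures and names are illustrative assumptions rather than the actual implementation.

    K_V = 0.5            # altitude feedback gain (empirically set, see text)
    K_V_T = K_V_O = 0.2  # target / obstacle proximity feedback gains


    def altitude_vibration(h, h_min, h_max, climbing):
        """Vibration frequency for the altitude feedback, Eqs. (3)-(4).

        h        : current quadrotor height
        climbing : True in the Fly up state, False in the Fly down state
        """
        delta = 0.3 * (h_max - h_min)
        if climbing:                       # Fly up: warn when approaching h_max
            return 0.0 if h < h_max - delta else K_V * (h - h_max + delta)
        else:                              # Fly down: warn when approaching h_min
            return 0.0 if h > h_min + delta else K_V * (-h + h_min + delta)


    def proximity_vibration(distance, gain):
        """Vibration frequency for target/obstacle proximity, Eq. (5)."""
        return gain / distance


    if __name__ == "__main__":
        print(altitude_vibration(2.4, h_min=0.5, h_max=2.5, climbing=True))
        print(proximity_vibration(0.8, K_V_T))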
IV. GESTURE RECOGNITION

To implement the control architecture introduced in Subsection II-B, in the following we will consider gesture recognition by template matching [28], [29]. This classical approach represents a standard algorithm for this kind of applications and, as will be shown in Subsection IV-C, exhibits satisfactory performance. However, we would like to point out that more advanced methods have been proposed in the literature, allowing higher recognition performance.

In the proposed system, gestures are recognized by template matching of signals recorded by accelerometers and gyroscopes in the smartwatch. Hence, an initial training step for template building is required. Then, when the system is activated, the signals acquired by the smartwatch are on-line compared to the templates and the decision whether a gesture has just occurred is taken by computing suited metrics of comparison.


A. Template building

A template is built for each signal recorded by the smartwatch and for each gesture. Let S be the set of signals that the smartwatch can acquire, namely angular displacements and linear accelerations (raw and with automatically compensated gravity), and let G = {Up, Down, Circle, Left, Right} be the set of gestures of interest. Moreover, let Nsig = 9 be the cardinality of S, that is the number of signals acquired by the smartwatch, and let Ngest = 5 be the cardinality of G. Hence, a total of Nsig · Ngest templates is required.

Training data used for the construction of templates consist of Nrep = 60 repetitions of each gesture. Epochs have been automatically extracted from the recorded signals, considering a window of fixed length N = 35. A preliminary alignment step, based on the normalized cross-correlation function [30], is required to correct the approximate selection of the beginning of each epoch. Given two real finite-duration sequences x, y of length N, with x(n), y(n), n = 1, . . . , N, the n-th element of x, y, respectively, the corresponding cross-correlation is defined as

    rxy(l) = Σ_{n=i}^{N−|k|−1} x(n) y(n − l) ,   l = 0, ±1, ±2, . . .          (6)

where i = l, k = 0 for l ≥ 0, and i = 0, k = l for l < 0, and l is the time shift (or lag) parameter. The normalized cross-correlation sequence [30] is defined as

    ρxy(l) = rxy(l) / √( rxx(0) ryy(0) )                                       (7)

where rxx(0) and ryy(0) are the variance of sequences x(n) and y(n), respectively. It ranges from −1 to +1 and for l = 0 it corresponds to the correlation coefficient of x(n) and y(n), i.e., ρxy(0) = ρxy.

The cross-correlation sequence, and its normalized version, are a widely used tool to measure the similarity of two series as a function of the lag of one relative to the other. In this regard, they turn out to be useful to align the epochs within which gestures were performed. In particular, considering arbitrarily as a reference the first epoch, namely x0(n), of each data record, remaining epochs xi(n), i = 1, . . . , Nrep − 1 are shifted by a number li* of lags such that the normalized cross-correlation sequence between the reference and the current epoch is maximized:

    li* = l | ρx0xi(li*) = max_l ρx0xi(l).                                     (8)

A threshold α = 0.7 is set on the value of the maximum normalized cross-correlation sequence ρx0xi(li*): only epochs with ρx0xi(li*) > α are involved in the definition of the template. Gestures not satisfying this constraint are discarded, since they are considered not representative of the correct execution of the gesture. Selected epochs are then averaged to build the template, denoted by x̃^(sig,gest).
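A minimal NumPy sketch of the template-building step is reported below, under the assumption that the roughly segmented epochs of one (signal, gesture) pair are stored as the rows of an array; N = 35, Nrep = 60 and α = 0.7 are the values given above, while the array layout and helper names are ours.

    import numpy as np

    N_SAMPLES = 35   # fixed epoch length N
    ALPHA = 0.7      # alignment threshold on the normalized cross-correlation


    def normalized_xcorr(x, y):
        """Normalized cross-correlation sequence of Eqs. (6)-(7), all lags."""
        r = np.correlate(x, y, mode="full")
        return r / np.sqrt(np.dot(x, x) * np.dot(y, y))


    def build_template(epochs):
        """Build one template from repeated epochs of a single (signal, gesture).

        epochs : array of shape (Nrep, N) with the roughly segmented repetitions.
        Epochs are aligned to the first one by the lag maximizing Eq. (8),
        kept only if that maximum exceeds ALPHA, and then averaged.
        """
        reference = epochs[0]
        selected = [reference]
        for epoch in epochs[1:]:
            rho = normalized_xcorr(reference, epoch)
            best = rho.argmax()
            if rho[best] > ALPHA:
                lag = best - (len(epoch) - 1)     # lag l* maximizing Eq. (8)
                aligned = np.roll(epoch, lag)     # circular shift, adequate for short epochs
                selected.append(aligned)
        return np.mean(selected, axis=0)          # the template x~(sig,gest)


    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        fake_epochs = rng.standard_normal((60, N_SAMPLES))
        print(build_template(fake_epochs).shape)  # -> (35,)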
B. Template matching

Templates are exploited to detect the occurrence of gestures in real-time, that is when interacting with the robot. Basically, each measured signal is compared to the corresponding templates of all possible gestures and a rule is established to decide which gesture is the most likely to have just occurred, if any. As a metric of comparison, we use the correlation coefficient [31]. Denoting by xw,k = x(k, . . . , k + N − 1) the sliding window of signal sig starting at time index k, the correlation coefficient between such window and the template for the gesture gest is given by

    ρ^(sig,gest)_{x̃ xw,k} = σ_{x̃ xw,k} / √( σ²_{x̃ x̃} σ²_{xw,k xw,k} )          (9)

where σ_{x̃ xw,k}, σ²_{x̃ x̃} and σ²_{xw,k xw,k} are the covariance and variances of the signals in subscript. Then, the correlation coefficients are averaged for each gesture, in order to obtain a coefficient, denoted by ρ^(gest)_{x̃ xw,k}, representing how much the movement in the current sliding window resembles each gesture. Thus, we have

    ρ^(gest)_{x̃ xw,k} = (1 / |S^{*,gest}|) Σ_{sig^{*,gest} ∈ S^{*,gest}} ρ^(sig^{*,gest},gest)_{x̃ xw,k}     (10)

where |S^{*,gest}| denotes the cardinality of the set S^{*,gest}.

To exclude false positives (the algorithm detects a gesture, although no gestures have been performed), we set a threshold θ = 0.75 on the correlation coefficient. The value for θ was coarsely set, without fine tuning or optimization. Despite this, as shown in the next Subsection, the accuracy of gesture recognition is satisfactory. Thus, the decision rule is as follows. If, for the current sliding window, all the coefficients are below the threshold, then we decide that no gesture has occurred. Otherwise, if we have ρ^(gest)_{x̃ xw,k} > θ for one gesture, then we claim that the corresponding gesture has occurred. If we have ρ^(gest)_{x̃ xw,k} > θ for more than one gesture, then we pick the gesture with the greatest ρ^(gest)_{x̃ xw,k}.

Finally, after the detection of a gesture, we establish a short latency time window of length 1 s in which we expect the subject not to perform any new gesture, waiting for the robot to be ready in the requested state (see the state machine diagram in Fig. 1). Such time interval starts with the detection of a gesture by the algorithm described above. During the latency window the search for a new gesture by template matching is stopped: data acquired by the smartwatch are then not analyzed. Template matching is restarted at the end of such interval.
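The on-line decision rule can be sketched as follows, assuming that templates maps (signal, gesture) pairs to template arrays (the per-gesture subsets S^{*,gest} are represented implicitly by which templates exist) and that windows maps each signal to its current sliding window; θ = 0.75 and the 1 s latency are the values given above, everything else is an illustrative assumption.

    import numpy as np

    THETA = 0.75        # detection threshold on the averaged correlation coefficient
    LATENCY_S = 1.0     # no new detections for 1 s after a gesture is recognized

    GESTURES = ("Up", "Down", "Circle", "Left", "Right")


    def corrcoef(template, window):
        """Correlation coefficient of Eq. (9) between a template and a window."""
        return np.corrcoef(template, window)[0, 1]


    def detect_gesture(templates, windows):
        """Return the detected gesture, or None, following Eq. (10) and the rule above.

        templates : dict mapping (signal, gesture) -> template array
        windows   : dict mapping signal -> current sliding window (same length)
        """
        scores = {}
        for gest in GESTURES:
            coeffs = [corrcoef(templates[(sig, gest)], windows[sig])
                      for sig in windows if (sig, gest) in templates]
            scores[gest] = np.mean(coeffs) if coeffs else -1.0  # average over S*,gest
        best = max(scores, key=scores.get)
        return best if scores[best] > THETA else None


    class GestureDetector:
        def __init__(self, templates):
            self.templates = templates
            self.blocked_until = -np.inf

        def step(self, windows, now):
            """Run detection unless we are inside the post-detection latency window."""
            if now < self.blocked_until:
                return None
            gesture = detect_gesture(self.templates, windows)
            if gesture is not None:
                self.blocked_until = now + LATENCY_S
            return gesture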
C. Validation of the gesture recognition module

To assess the performance of the algorithm for gesture recognition, we carried out three tests. First, a subject was asked to randomly perform N1 = 100 repetitions of the gestures in a steady state, that is sitting at a desk and holding the arm at rest between two consecutive gestures. Second, the same subject was asked to perform N2 = 100 repetitions of the gestures while moving, that is walking, jogging, (simulating) swimming, exercising, shaking the arm to touch his head, greeting someone, or looking at the watch.


In these experiments, the achieved accuracy is very high (100% and 95%, respectively), even when spurious movements were performed. However, this result might be biased by the fact that gestures were performed in a very controlled situation, and only one subject was involved.

To assess the inter-subject robustness of the gesture recognition, the third experiment involved 20 subjects, not participating in the training set, for a total of N3 = 800 repeated gestures. Preliminarily, each subject was briefly instructed about the system and performed a few gestures to get confident. Then, she/he was asked to perform 40 repetitions of the gestures, while standing, as shown in the first part of the attached video. Fig. 3 reports the performance of the algorithm in terms of confusion matrix.² Although no subject-tailored template building was carried out, results show that the implemented algorithm performs well with subjects not involved in the off-line training, achieving an overall accuracy higher than 86%.

                     RECOGNIZED
    ACTUAL      Up    Down  Circle   Left   Right   none
    Up         78.9    1.2     0      2.5     1.2   16.2
    Down        1.9   84.4     0      0      10.0    3.7
    Circle      0.6    0      90.1    3.1     0      6.2
    Left        0.6    0       0     95.0     0      4.4
    Right      11.2    0       0      0      85.0    3.8

Fig. 3. Confusion matrix of the gesture recognition. Values are in %.

² The confusion matrix is used in machine learning to report results achieved by a classification system [32]. It contains information about actual (rows) and predicted (columns) classifications and can be used to compute the system's accuracy and misclassification rate. The last column (right) of the confusion matrix reports the occurrence of false negatives, i.e., performed gestures not detected by the algorithm.

We would like to remark that this result, and in general any gesture misrecognition, do not affect the safety and attitude control of the robot. Indeed, in the considered system, gestures are only utilized for switching among different states in the state machine, whereas, in the Teleoperated state, the motion of the user directly defines the motion of the quadrotor as a function of the roll, pitch and yaw angles of the smartwatch.
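For completeness, the sketch below shows how a confusion matrix with the additional "none" column of footnote 2 can be accumulated from labelled trials; the toy data at the bottom are made up and only illustrate the format of Fig. 3.

    import numpy as np

    GESTURES = ["Up", "Down", "Circle", "Left", "Right"]
    COLUMNS = GESTURES + ["none"]          # "none" collects false negatives


    def confusion_matrix(actual, recognized):
        """Row-normalized confusion matrix in %, actual gestures vs. recognition.

        actual     : list of performed gestures
        recognized : list of detected gestures, with None for missed detections
        """
        counts = np.zeros((len(GESTURES), len(COLUMNS)))
        for a, r in zip(actual, recognized):
            col = COLUMNS.index(r) if r is not None else COLUMNS.index("none")
            counts[GESTURES.index(a), col] += 1
        return 100.0 * counts / counts.sum(axis=1, keepdims=True)


    if __name__ == "__main__":
        actual = ["Up", "Up", "Down", "Circle", "Left", "Right"]
        recognized = ["Up", None, "Down", "Circle", "Left", "Up"]
        print(np.round(confusion_matrix(actual, recognized), 1))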
V. EXPERIMENTAL VALIDATION

Experiments aimed at evaluating the usability of the proposed interaction mode with the quadrotor and at assessing the effectiveness of the haptic feedback. Two experimental conditions were considered: first, we considered a simulated environment, where we compared the interaction by means of the smartwatch and of a common joypad for videogaming, and assessed the effectiveness of the haptic feedback; second, we considered the proposed interaction system in a real scenario and compared it with the official smartphone application that comes with the quadrotor. The joypad and the smartphone application were selected as standard reference interaction systems. Clearly, they require the use of hands and, in the case of the joypad, are not infrastructure-less. Moreover, they provide a less natural mapping of user movements to quadrotor movements, as corroborated by the numerical results presented below and reported by the subjects involved in the tests.

To experimentally validate the proposed HRI system, we utilized a Parrot AR.Drone 2.0 quadrotor and a Samsung Gear S smartwatch. Due to its limited computation capabilities, it was not possible to implement the gesture recognition and control architecture on-board the quadrotor. Thus, an external computer was utilized, and the architecture was implemented by means of ROS [33]. Wi-Fi was used for communicating with the quadrotor, which was equipped with a vision-based localization system as described in [34], and with the smartwatch.

A. Simulated environment

In this set of experiments, we simulated the quadrotor by using the robot simulator Gazebo³ implemented in ROS [33]. The simulator was running on the same computer that was connected to the smartwatch and where gesture recognition was computed. Two experiments were carried out.

³ http://gazebosim.org/

Twenty-two subjects were involved in the tests (7 females and 15 males, age 31.2 y.o. ± 4.3). They are researchers from our team and students from our university who are not involved in this research topic at all. Some of them have some experience in the use of a joypad for videogaming, but they all have barely any experience in the use of a smartwatch for controlling a quadrotor and are first-time users in this regard.

1) Usability assessment: First, the user was asked to drive the robot through a cluttered environment. In particular, the user had to move the robot in an area where five columns were placed on the ground, as shown in Fig. 4(a). The user was instructed to drive the robot from the initial position to the goal position, performing a slalom path, without touching any column. Each user was requested to perform the experiment with two different interaction systems: the smartwatch and a standard joypad. The same state machine of Fig. 1 was used for the joypad: some buttons were programmed to initiate actions like take-off, landing, hovering, flying up and flying down, as gestures do for the smartwatch. Then, the pose of two analog sticks was translated into a velocity command for the quadrotor, as with the pose of the smartwatch. To compare the two interaction means, we considered the total travel time. The results achieved with the smartwatch are significantly better⁴ than the ones obtained with the joypad. On average, the total travel time in the case of the smartwatch was 35.8 s with standard deviation 12.6 s, whereas in the case of the joypad the total travel time was 41.0 s with standard deviation 13.9 s.

⁴ P-value p = 0.1.


Fig. 4. Assessment of the usability of the proposed interaction system in simulated environment. Left: experimental setup. Right: empirical distribution function of the travel times when piloting the robot with the smartwatch (red) and a joypad (blue).

Fig. 5. Assessment of the effectiveness of the haptic feedback in simulated environment. Left: experimental setup. Right: empirical distribution function of the landing distances from the target with (blue) and without (red) haptic feedback.

Moreover, the effectiveness of the use of the smartwatch has been also assessed in terms of the empirical distribution function [31] of travel time. The empirical distribution function provides a complete statistical description of the effectiveness, measured here as travel time, of the considered interaction approaches, as opposed to the mean, the variance, the median, or synthetic statistical tests, which give only partial information. Fig. 4(b) reports the empirical distribution functions of travel time for the two interaction devices. As the figure highlights, piloting the quadrotor with the smartwatch allows to complete the task in shorter time with higher probability than the case of the joypad.
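The empirical distribution functions used for this comparison can be computed with a few lines; the travel times in the usage example below are placeholders, not the data behind Fig. 4(b).

    import numpy as np


    def ecdf(samples):
        """Empirical distribution function: sorted values and P(X <= x)."""
        x = np.sort(np.asarray(samples, dtype=float))
        p = np.arange(1, x.size + 1) / x.size
        return x, p


    if __name__ == "__main__":
        # Placeholder travel times [s]; not the experimental data of Fig. 4(b).
        smartwatch_times = [28.1, 31.5, 35.0, 36.2, 40.3, 43.8]
        joypad_times = [33.0, 37.4, 39.9, 42.5, 46.1, 55.2]
        for name, data in (("smartwatch", smartwatch_times), ("joypad", joypad_times)):
            x, p = ecdf(data)
            below = p[x <= 40.0]
            print(name, "P(travel time <= 40 s) =", below[-1] if below.size else 0.0)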
2) Haptic feedback: Second, the user was requested to drive the robot around a column and then to visit a target location, identified by a red circle (diameter 40 cm) placed on the ground, as shown in Fig. 5(a). The goal of this test was to assess the effectiveness of the haptic feedback by measuring the distance from the target when the haptic feedback was or was not provided to the user. Experimental results have shown that the haptic feedback provides useful information to users⁵: indeed, in the presence of the haptic feedback, on average, users landed at a mean distance from the target µd(T) = 0.41 m with standard deviation σd(T) = 0.25 m, whereas when haptic feedback was not provided users landed at a mean distance from the target µd(T) = 0.59 m with standard deviation σd(T) = 0.33 m. In terms of the empirical distribution function of measured distances, Fig. 5(b) highlights that the haptic feedback provides useful information to the user, since it allows to land at a lower distance from the target with higher probability.

⁵ P-value p = 0.02.

B. Real environment

The proposed HRI system was validated also by means of experiments involving users. The results of these experiments are shown in the second part of the accompanying video, where a user pilots the quadrotor by means of the smartwatch.

Fig. 6. Assessment of the usability of the proposed architecture in real environment. Left: experimental setup. Right: time to pop a balloon when piloting the quadrotor with the smartwatch (red solid) and the official application (blue dashed).

Specifically, we tested the usability of the proposed system, in comparison with the official smartphone application AR.FreeFlight to pilot the quadrotor. The AR.FreeFlight app implements the same degree of autonomy as the proposed system. Actions like take-off, landing, hovering, flip, flying up and flying down are internally implemented in the app and can be activated by the user by tapping a button in the interface, just as gestures do. The position of two analog sticks is translated into a velocity command for the quadrotor, as is done with the pose of the smartwatch in the Teleoperated state of Fig. 1.

The experiment consisted in asking the user to pop some balloons by piloting the quadrotor, equipped with four needles, to collide with them. A snapshot of the experiment is shown in Fig. 6(a), and a portion of the experiment is reported in the third part of the attached video. As a metric to quantify the effectiveness of interaction, we considered the average time required to pop a balloon, starting from the quadrotor take-off. Two subjects were involved in the experiment and 10 balloons per method were popped. The application ran on a smartphone embedding a 5.2 in touchscreen.


Piloting the quadrotor with the smartwatch proved to be more intuitive, and both performers reported an increased cognitive burden when utilizing the smartphone application rather than the smartwatch. This is confirmed⁶ by the fact that the use of the smartwatch resulted in an average time to pop a balloon of 19.9 s, whereas piloting with the application resulted in an average time of 47.6 s. In Fig. 6(b) we report the time to pop for both the piloting modalities, for each trial (red solid for the smartwatch, blue dashed for the application), sorted in ascending order. The figure shows that, in 4 trials over 10, the use of the application took a time greater than the greatest time (black dashed line) achieved with the smartwatch.

⁶ P-value p = 0.04.

VI. CONCLUSIONS

In this paper we introduced a novel approach for hands-free natural human-robot interaction, based on the use of general purpose devices. In particular, the motion of the user's forearm was recognized by means of a smartwatch, and processed online for recognizing gestures and for defining control inputs for a robot. Modulated vibration was exploited for providing the user with a haptic feedback, informing her/him about detected gestures and the current status of the robot. The usability of the proposed interaction system was experimentally evaluated, both in real and simulated environments, having users control the motion of a semi-autonomous quadrotor.

Future work will consider the implementation of on-line learning methods for adapting the templates to the user, thus further improving the performance of the proposed interaction architecture. Moreover, we plan to improve the usability assessment, measuring the cognitive workload associated with the use of the smartwatch to interact with a quadrotor. This information can be exploited to adapt the robot behavior to the cognitive and emotional state of the user, in a scenario of affective HRI [35].

REFERENCES

[1] R. Bemelmans, G. J. Gelderblom, P. Jonker, and L. De Witte, "Socially assistive robots in elderly care: A systematic review into effects and effectiveness," J. Am. Med. Dir. Assoc., vol. 13, pp. 114–120, 2012.
[2] D. Di Paola, A. Milella, G. Cicirelli, and A. Distante, "An autonomous mobile robotic system for surveillance of indoor environments," Int. J. Adv. Robot. Syst., vol. 7, no. 1, pp. 19–26, 2010.
[3] N. Tomatis, R. Philippsen, B. Jensen, K. O. Arras, G. Terrien, R. Piguet, and R. Y. Siegwart, "Building a fully autonomous tour guide robot," in Proc. 33rd Int. Symp. Robotics (ISR), 2002.
[4] M.-C. Kang, K.-S. Kim, D.-K. Noh, J.-W. Han, and S.-J. Ko, "A robust obstacle detection method for robotic vacuum cleaners," IEEE Trans. Consumer Electron., vol. 60, no. 4, pp. 587–595, Nov. 2014.
[5] R. J. Jacob, A. Girouard, L. M. Hirshfield, M. S. Horn, O. Shaer, E. T. Solovey, and J. Zigelbaum, "Reality-based interaction: A framework for post-WIMP interfaces," in Proc. SIGCHI Conf. Human Factors in Computing Syst. (CHI). ACM Press, 2008, pp. 201–210.
[6] J. Blake, Natural User Interfaces in .NET. Manning, 2011.
[7] E. Hornecker and J. Buur, "Getting a grip on tangible interaction: A framework on physical space and social interaction," in Proc. SIGCHI Conf. Human Factors in Computing Syst. (CHI). ACM Press, 2006, pp. 437–446.
[8] A. Levratti, A. De Vuono, C. Fantuzzi, and C. Secchi, "TIREBOT: A novel tire workshop assistant robot," in Proc. IEEE Int. Conf. Advanced Intelligent Mechatronics (AIM), 2016, pp. 733–738.
[9] S. Waldherr, R. Romero, and S. Thrun, "A gesture based interface for human-robot interaction," Autonomous Robots, vol. 9, pp. 151–173, 2000.
[10] A. De Luca and F. Flacco, "Integrated control for pHRI: Collision avoidance, detection, reaction and collaboration," in Proc. 4th IEEE Int. Conf. Biomedical Robotics and Biomechatronics, 2012, pp. 288–295.
[11] "SHERPA, EU collaborative project ICT-6000958."
[12] V. Villani, L. Sabattini, N. Battilani, and C. Fantuzzi, "Smartwatch-enhanced interaction with an advanced troubleshooting system for industrial machines," IFAC-PapersOnLine, vol. 49, no. 19, pp. 277–282, 2016.
[13] A. Sanna, F. Lamberti, G. Paravati, and F. Manuri, "A Kinect-based natural interface for quadrotor control," Entertainment Computing, vol. 4, pp. 179–186, 2013.
[14] J. L. Raheja, R. Shyam, G. A. Rajsekhar, and P. B. Prasad, Real-Time Robotic Hand Control Using Hand Gestures. InTech, 2012, ch. 20, pp. 413–428.
[15] H. K. Kaura, V. Honrao, P. Sayali, and P. Shetty, "Gesture controlled robot using image processing," Int. J. Adv. Res. Artif. Intell., vol. 2, no. 5, pp. 69–77, 2013.
[16] H. Junker, O. Amft, P. Lukowicz, and G. Tröster, "Gesture spotting with body-worn inertial sensors to detect user activities," Pattern Recognition, vol. 41, pp. 2010–2024, 2008.
[17] D. Roggen, S. Magnenat, M. Waibel, and G. Tröster, "Wearable computing," IEEE Robot. Automat. Mag., vol. 18, no. 2, pp. 83–95, 2011.
[18] S. Mitra and T. Acharya, "Gesture recognition: A survey," IEEE Trans. Syst. Man Cybern. C, Appl. Rev., vol. 37, no. 3, pp. 311–324, 2007.
[19] H. Uchiyama, M. A. Covington, and W. D. Potter, "Vibrotactile glove guidance for semi-autonomous wheelchair operations," in Proc. 46th ACM Southeast Conf. (ACMSE). ACM, 2008, pp. 336–339.
[20] A. Akl and S. Valaee, "Accelerometer-based gesture recognition via dynamic-time warping, affinity propagation, & compressive sensing," in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Process. (ICASSP). IEEE, 2010, pp. 2270–2273.
[21] M. Khan, S. I. Ahamed, M. Rahman, and J.-J. Yang, "GestHaar: An accelerometer-based gesture recognition method and its application in NUI driven pervasive healthcare," in Proc. IEEE Int. Conf. Emerging Signal Process. Applicat. (ESPA). IEEE, 2012, pp. 163–166.
[22] J. Liu, L. Zhong, J. Wickramasuriya, and V. Vasudevan, "uWave: Accelerometer-based personalized gesture recognition and its applications," Pervasive and Mobile Computing, vol. 5, no. 6, pp. 657–675, 2009.
[23] J. Cacace, A. Finzi, and V. Lippiello, "Multimodal interaction with multiple co-located drones in search and rescue missions," arXiv:1605.07316, 2016.
[24] A. W. Evans, J. P. Gray, D. Rudnick, and R. E. Karlsen, "Control solutions for robots using Android and iOS devices," in SPIE Defense, Security, and Sensing. Int. Soc. Optics and Photonics, 2012.
[25] M. Micire, M. Desai, A. Courtemanche, K. M. Tsui, and H. A. Yanco, "Analysis of natural gestures for controlling robot teams on multi-touch tabletop surfaces," in Proc. ACM Int. Conf. Interactive Tabletops and Surfaces (ITS), 2009, pp. 41–48.
[26] D. Mellinger, N. Michael, and V. Kumar, "Trajectory generation and control for precise aggressive maneuvers with quadrotors," Int. J. Robot. Res., vol. 31, no. 5, pp. 664–674, 2012.
[27] D. A. Norman, "Natural user interfaces are not natural," Interactions, vol. 17, no. 3, pp. 6–10, 2010.
[28] R. O. Duda and P. E. Hart, Pattern Classification and Scene Analysis. New York: Wiley, 1973.
[29] R. C. Gonzalez and R. E. Woods, Digital Image Processing, 3rd ed. Prentice Hall, 2008.
[30] J. G. Proakis and D. G. Manolakis, Digital Signal Processing: Principles, Algorithms, and Applications, 3rd ed. Prentice-Hall International, 1996.
[31] H. Kobayashi, B. L. Mark, and W. Turin, Probability, Random Processes, and Statistical Analysis. Cambridge University Press, 2012.
[32] R. Kohavi and F. Provost, "Glossary of terms," Machine Learning, vol. 30, no. 2–3, pp. 271–274, 1998.
[33] M. Quigley, K. Conley, B. Gerkey, J. Faust, T. Foote, J. Leibs, R. Wheeler, and A. Y. Ng, "ROS: An open-source Robot Operating System," in Proc. ICRA Workshop on Open Source Software, vol. 3, no. 3.2, 2009, p. 5.
[34] J. Engel, J. Sturm, and D. Cremers, "Scale-aware navigation of a low-cost quadrocopter with a monocular camera," Robot. Auton. Syst., vol. 62, no. 11, pp. 1646–1656, 2014.
[35] P. Rani, N. Sarkar, C. A. Smith, and L. D. Kirby, "Anxiety detecting robotic system – towards implicit human-robot collaboration," Robotica, vol. 22, no. 1, pp. 85–95, 2004.

