activated, the signals acquired by the smartwatch are compared on-line to the templates, and the decision whether a gesture has just occurred is taken by computing suitable comparison metrics.

A. Template building

A template is built for each signal recorded by the smartwatch and for each gesture. Let S be the set of signals that the smartwatch can acquire, namely angular displacements and linear accelerations (raw and with automatically compensated gravity), and let G = {Up, Down, Circle, Left, Right} be the set of gestures of interest. Moreover, let N_sig = 9 be the cardinality of S, that is, the number of signals acquired by the smartwatch, and let N_gest = 5 be the cardinality of G. Hence, a total of N_sig · N_gest templates is required.

Training data used for the construction of the templates consist of N_rep = 60 repetitions of each gesture. Epochs have been automatically extracted from the recorded signals, considering a window of fixed length N = 35. A preliminary alignment step, based on the normalized cross-correlation function [30], is required to correct the approximate selection of the beginning of each epoch. Given two real finite-duration sequences x, y of length N, with x(n), y(n), n = 1, ..., N, denoting their n-th elements, the corresponding cross-correlation is defined as

    r_{xy}(l) = \sum_{n=i}^{N-|k|-1} x(n)\, y(n-l), \qquad l = 0, \pm 1, \pm 2, \ldots    (6)

where i = l, k = 0 for l ≥ 0; i = 0, k = l for l < 0; and l is the time shift (or lag) parameter. The normalized cross-correlation sequence [30] is defined as

    \rho_{xy}(l) = \frac{r_{xy}(l)}{\sqrt{r_{xx}(0)\, r_{yy}(0)}}    (7)

where r_xx(0) and r_yy(0) are the variances of the sequences x(n) and y(n), respectively. It ranges from −1 to +1 and, for l = 0, it corresponds to the correlation coefficient of x(n) and y(n), i.e., ρ_xy(0) = ρ_xy.

The cross-correlation sequence and its normalized version are widely used tools to measure the similarity of two series as a function of the lag of one relative to the other. In this regard, they turn out to be useful to align the epochs within which gestures were performed. In particular, taking arbitrarily the first epoch of each data record, namely x_0(n), as a reference, the remaining epochs x_i(n), i = 1, ..., N_rep − 1, are shifted by a number l_i^* of lags such that the normalized cross-correlation sequence between the reference and the current epoch is maximized:

    l_i^{*} = \arg\max_{l}\, \rho_{x_0 x_i}(l).    (8)

A threshold α = 0.7 is set on the maximum of the normalized cross-correlation sequence, ρ_{x_0 x_i}(l_i^*): only epochs with ρ_{x_0 x_i}(l_i^*) > α are involved in the definition of the template. Epochs not satisfying this constraint are discarded, since they are considered not representative of the correct execution of the gesture. The selected epochs are then averaged to build the template, denoted by \tilde{x}^{(sig,gest)}.
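To make the alignment-and-averaging step concrete, the following Python/NumPy fragment is a minimal sketch of template building for one (signal, gesture) pair. It is not the authors' implementation: the epoch matrix layout, the circular-shift alignment, and all names are illustrative assumptions.

```python
import numpy as np

N = 35        # epoch length (samples)
ALPHA = 0.7   # threshold on the maximum normalized cross-correlation

def norm_xcorr(x, y):
    """Normalized cross-correlation sequence, Eq. (7)."""
    r = np.correlate(x, y, mode="full")              # r_xy(l), l = -(N-1), ..., N-1
    return r / np.sqrt(np.dot(x, x) * np.dot(y, y))  # divide by sqrt(r_xx(0) r_yy(0))

def build_template(epochs):
    """Align the epochs of one (signal, gesture) pair to the first one and average.

    epochs: array of shape (N_rep, N), roughly segmented repetitions of a gesture.
    """
    ref = epochs[0]                              # first epoch taken as the reference
    selected = [ref]
    for ep in epochs[1:]:
        rho = norm_xcorr(ep, ref)
        lag = int(rho.argmax()) - (N - 1)        # optimal shift l* of Eq. (8)
        if rho.max() > ALPHA:                    # keep only well-aligned epochs
            selected.append(np.roll(ep, -lag))   # circular shift as a simple alignment
    return np.mean(selected, axis=0)             # template for this (sig, gest) pair
```

Repeating this over all signals and gestures yields the N_sig · N_gest templates mentioned above.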
B. Template matching

Templates are exploited to detect the occurrence of gestures in real time, that is, while interacting with the robot. Basically, each measured signal is compared to the corresponding templates of all possible gestures, and a rule is established to decide which gesture is the most likely to have just occurred, if any. As a metric of comparison, we use the correlation coefficient [31]. Denoting by x_{w,k} = x(k, ..., k + N − 1) the sliding window of signal sig starting at time index k, the correlation coefficient between such a window and the template for the gesture gest is given by

    \rho_{\tilde{x} x_{w,k}}^{(sig,gest)} = \frac{\sigma_{\tilde{x} x_{w,k}}}{\sqrt{\sigma_{\tilde{x}\tilde{x}}^{2}\, \sigma_{x_{w,k} x_{w,k}}^{2}}}    (9)

where σ_{x̃ x_{w,k}}, σ²_{x̃x̃} and σ²_{x_{w,k} x_{w,k}} are the covariance and the variances of the signals in the subscripts. Then, the correlation coefficients are averaged for each gesture, in order to obtain a coefficient, denoted by ρ^{(gest)}_{x̃ x_{w,k}}, representing how much the movement in the current sliding window resembles each gesture. Thus, we have

    \rho_{\tilde{x} x_{w,k}}^{(gest)} = \frac{1}{|S^{*,gest}|} \sum_{sig^{*} \in S^{*,gest}} \rho_{\tilde{x} x_{w,k}}^{(sig^{*},gest)}    (10)

where |S^{*,gest}| denotes the cardinality of the set S^{*,gest}.

To exclude false positives (the algorithm detecting a gesture although no gesture has been performed), we set a threshold θ = 0.75 on the correlation coefficient. The value of θ was coarsely set, without fine tuning or optimization. Despite this, as shown in the next subsection, the accuracy of gesture recognition is satisfactory. The decision rule is thus as follows. If, for the current sliding window, all the coefficients are below the threshold, we decide that no gesture has occurred. Otherwise, if ρ^{(gest)}_{x̃ x_{w,k}} > θ for exactly one gesture, we claim that the corresponding gesture has occurred; if ρ^{(gest)}_{x̃ x_{w,k}} > θ for more than one gesture, we pick the gesture with the greatest ρ^{(gest)}_{x̃ x_{w,k}}.

Finally, after the detection of a gesture, we establish a short latency time window of length 1 s in which we expect the subject not to perform any new gesture, waiting for the robot to reach the requested state (see the state machine diagram in Fig. 1). Such a time interval starts with the detection of a gesture by the algorithm described above. During the latency window the search for a new gesture by template matching is stopped: data acquired by the smartwatch are not analyzed. Template matching is restarted at the end of the interval.
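A minimal Python sketch of this on-line decision rule is given below. It is an illustration under assumptions, not the authors' code: the buffer layout, the per-gesture signal sets S^{*,gest}, and all names are hypothetical.

```python
import numpy as np

N = 35        # sliding-window length, equal to the template length
THETA = 0.75  # detection threshold on the averaged correlation coefficient

def detect_gesture(buffers, templates, signal_sets):
    """Decision rule applied to the current sliding windows.

    buffers:     dict signal -> latest samples of that signal
    templates:   dict (signal, gesture) -> template of length N
    signal_sets: dict gesture -> signals used for that gesture (S*,gest)
    Returns the detected gesture, or None if all coefficients are below theta.
    """
    scores = {}
    for gest, sigs in signal_sets.items():
        # correlation coefficient of Eq. (9) per signal, averaged as in Eq. (10)
        coeffs = [np.corrcoef(templates[(s, gest)], buffers[s][-N:])[0, 1]
                  for s in sigs]
        scores[gest] = np.mean(coeffs)
    best = max(scores, key=scores.get)              # gesture with the greatest rho
    return best if scores[best] > THETA else None   # below threshold: no gesture
```

After a detection, the caller would simply stop invoking detect_gesture for the 1 s latency window and resume template matching once it expires.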
C. Validation of the gesture recognition module

To assess the performance of the algorithm for gesture recognition, we carried out three tests. First, a subject was asked to randomly perform N_1 = 100 repetitions of the gestures in a steady state, that is, sitting at a desk and holding the arm at rest between two consecutive gestures. Second, the same subject was asked to perform N_2 = 100 repetitions of the gestures while moving, that is, walking, jogging, (simulating)
[Fig. 4. Assessment of the usability of the proposed interaction system in simulated environment. Left: experimental setup. Right: empirical distribution function (probability vs. time [s]) of the travel times when piloting the robot with the smartwatch (red) and a joypad (blue).]

[Fig. 5. Assessment of the effectiveness of the haptic feedback in simulated environment. Left: experimental setup. Right: empirical distribution function (probability vs. distance [m]) of the landing distances from the target with (blue) and without (red) haptic feedback.]
Moreover, the effectiveness of the use of the smartwatch has also been assessed in terms of the empirical distribution of the considered performance metric, measured here as travel time. As Fig. 4(b) shows, the smartwatch allows the user to complete the task in a shorter time, with higher probability, than in the case of the joypad.
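The empirical distribution functions of Figs. 4(b) and 5(b) are straightforward to compute from the measured samples. A minimal sketch follows; the sample values are hypothetical placeholders, not the paper's data.

```python
import numpy as np

def ecdf(samples):
    """Empirical distribution function: returns sorted x and P(X <= x)."""
    x = np.sort(np.asarray(samples, dtype=float))
    p = np.arange(1, x.size + 1) / x.size
    return x, p

travel_times = [38.2, 41.5, 29.7, 52.0, 33.1]  # hypothetical travel times [s]
x, p = ecdf(travel_times)  # plotting p against x yields a curve like Fig. 4(b)
```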
2) Haptic feedback: Second, the user was requested to drive the robot around a column and then to visit a target location, identified by a red circle (diameter 40 cm) placed on the ground, as shown in Fig. 5(a). The goal of this test was to assess the effectiveness of the haptic feedback by measuring the distance from the target when the haptic feedback was or was not provided to the user. Experimental results have shown that the haptic feedback provides useful information to users (p-value p = 0.02): indeed, in the presence of the haptic feedback, users landed at a mean distance from the target μ_d(T) = 0.41 m with standard deviation σ_d(T) = 0.25 m, whereas when haptic feedback was not provided users landed at a mean distance from the target μ_d(T) = 0.59 m with standard deviation σ_d(T) = 0.33 m. In terms of the empirical distribution function of the measured distances, Fig. 5(b) highlights that the haptic feedback provides useful information to the user, since it allows landing at a lower distance from the target with higher probability.

B. Real environment

The proposed HRI system was also validated by means of experiments involving users. The results of these experiments are shown in the second part of the accompanying video, where a user pilots the quadrotor by means of the smartwatch.

[Fig. 6. Assessment of the usability of the proposed architecture in real environment. Left: experimental setup. Right: time to pop a balloon when piloting the quadrotor with the smartwatch (red solid) and the official application (blue dashed).]

Specifically, we tested the usability of the proposed system in comparison with the official smartphone application AR.FreeFlight for piloting the quadrotor. The AR.FreeFlight app implements the same degree of autonomy as the proposed system. Actions like take-off, landing, hovering, flip, flying up and flying down are internally implemented in the app and can be activated by the user by tapping a button in the interface, just as gestures do in our system. The position of two analog sticks is translated into a velocity command for the quadrotor, as is done with the pose of the smartwatch in the Teleoperated state of Fig. 1.
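As an illustration of this teleoperation mapping, the sketch below turns wrist orientation into a planar velocity command. The paper does not report the exact mapping, so the axis conventions, gain, and saturation are assumptions.

```python
import numpy as np

K_V = 0.8    # linear-velocity gain [m/s per rad] (assumed)
V_MAX = 1.0  # saturation of the commanded velocity [m/s] (assumed)

def pose_to_velocity(roll, pitch):
    """Map smartwatch wrist angles [rad] to a planar velocity command,
    mimicking the two analog sticks of the AR.FreeFlight interface."""
    vx = np.clip(-K_V * pitch, -V_MAX, V_MAX)  # tilt forward -> fly forward
    vy = np.clip(K_V * roll, -V_MAX, V_MAX)    # tilt right -> fly right
    return vx, vy
```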
The experiment consisted in asking the user to pop some balloons by piloting the quadrotor, equipped with four needles, so as to collide with them. A snapshot of the experiment is shown in Fig. 6(a), and a portion of the experiment is reported in the third part of the attached video. As a metric to quantify the effectiveness of the interaction, we considered the average time required to pop a balloon, starting from the quadrotor take-off. Two subjects were involved in the experiment, and 10 balloons per method were popped. The application ran on a smartphone with a 5.2 in touchscreen. Piloting the quadrotor with the smartwatch proved to be more intuitive,
and both performers reported an increased cognitive burden when using the smartphone application rather than the smartwatch. This is confirmed (p-value p = 0.04) by the fact that the use of the smartwatch resulted in an average time to pop a balloon of 19.9 s, whereas piloting with the application resulted in an average time of 47.6 s. In Fig. 6(b) we report the time to pop for both piloting modalities, for each trial (red solid for the smartwatch, blue dashed for the application), sorted in ascending order. The figure shows that, in 4 trials out of 10, the use of the application took longer than the greatest time (black dashed line) achieved with the smartwatch.
VI. CONCLUSIONS

In this paper we introduced a novel approach for hands-free natural human-robot interaction, based on the use of general-purpose devices. In particular, the motion of the user's forearm was captured by means of a smartwatch and processed on-line for recognizing gestures and for defining control inputs for a robot. Modulated vibration was exploited to provide the user with haptic feedback, informing her/him about detected gestures and the current status of the robot. The usability of the proposed interaction system was experimentally evaluated, in both real and simulated environments, with users controlling the motion of a semi-autonomous quadrotor.

Future work will consider the implementation of on-line learning methods for adapting the templates to the user, thus further improving the performance of the proposed interaction architecture. Moreover, we plan to improve the usability assessment, measuring the cognitive workload associated with the use of the smartwatch to interact with a quadrotor. This information can be exploited to adapt the robot behavior to the cognitive and emotional state of the user, in a scenario of affective HRI [35].
REFERENCES

[1] R. Bemelmans, G. J. Gelderblom, P. Jonker, and L. De Witte, "Socially assistive robots in elderly care: A systematic review into effects and effectiveness," J. Am. Med. Dir. Assoc., vol. 13, pp. 114–120, 2012.
[2] D. Di Paola, A. Milella, G. Cicirelli, and A. Distante, "An autonomous mobile robotic system for surveillance of indoor environments," Int. J. Adv. Robot. Syst., vol. 7, no. 1, pp. 19–26, 2010.
[3] N. Tomatis, R. Philippsen, B. Jensen, K. O. Arras, G. Terrien, R. Piguet, and R. Y. Siegwart, "Building a fully autonomous tour guide robot," in Proc. 33rd Int. Symp. Robotics (ISR), 2002.
[4] M.-C. Kang, K.-S. Kim, D.-K. Noh, J.-W. Han, and S.-J. Ko, "A robust obstacle detection method for robotic vacuum cleaners," IEEE Trans. Consumer Electron., vol. 60, no. 4, pp. 587–595, Nov. 2014.
[5] R. J. Jacob, A. Girouard, L. M. Hirshfield, M. S. Horn, O. Shaer, E. T. Solovey, and J. Zigelbaum, "Reality-based interaction: A framework for post-WIMP interfaces," in Proc. SIGCHI Conf. Human Factors in Computing Syst. (CHI), 2008, pp. 201–210.
[6] J. Blake, Natural User Interfaces in .NET. Manning, 2011.
[7] E. Hornecker and J. Buur, "Getting a grip on tangible interaction: A framework on physical space and social interaction," in Proc. SIGCHI Conf. Human Factors in Computing Syst. (CHI), 2006, pp. 437–446.
[8] A. Levratti, A. De Vuono, C. Fantuzzi, and C. Secchi, "TIREBOT: A novel tire workshop assistant robot," in Proc. IEEE Int. Conf. Advanced Intelligent Mechatronics (AIM), 2016, pp. 733–738.
[9] S. Waldherr, R. Romero, and S. Thrun, "A gesture based interface for human-robot interaction," Autonomous Robots, vol. 9, pp. 151–173, 2000.
[10] A. De Luca and F. Flacco, "Integrated control for pHRI: Collision avoidance, detection, reaction and collaboration," in Proc. 4th IEEE Int. Conf. Biomedical Robotics and Biomechatronics, 2012, pp. 288–295.
[11] "SHERPA, EU collaborative project ICT-600958."
[12] V. Villani, L. Sabattini, N. Battilani, and C. Fantuzzi, "Smartwatch-enhanced interaction with an advanced troubleshooting system for industrial machines," IFAC-PapersOnLine, vol. 49, no. 19, pp. 277–282, 2016.
[13] A. Sanna, F. Lamberti, G. Paravati, and F. Manuri, "A Kinect-based natural interface for quadrotor control," Entertainment Computing, vol. 4, pp. 179–186, 2013.
[14] J. L. Raheja, R. Shyam, G. A. Rajsekhar, and P. B. Prasad, Real-Time Robotic Hand Control Using Hand Gestures. InTech, 2012, ch. 20, pp. 413–428.
[15] H. K. Kaura, V. Honrao, P. Sayali, and P. Shetty, "Gesture controlled robot using image processing," Int. J. Adv. Res. Artif. Intell., vol. 2, no. 5, pp. 69–77, 2013.
[16] H. Junker, O. Amft, P. Lukowicz, and G. Tröster, "Gesture spotting with body-worn inertial sensors to detect user activities," Pattern Recognition, vol. 41, pp. 2010–2024, 2008.
[17] D. Roggen, S. Magnenat, M. Waibel, and G. Tröster, "Wearable computing," IEEE Robot. Automat. Mag., vol. 18, no. 2, pp. 83–95, 2011.
[18] S. Mitra and T. Acharya, "Gesture recognition: A survey," IEEE Trans. Syst. Man Cybern. C, Appl. Rev., vol. 37, no. 3, pp. 311–324, 2007.
[19] H. Uchiyama, M. A. Covington, and W. D. Potter, "Vibrotactile glove guidance for semi-autonomous wheelchair operations," in Proc. 46th ACM Southeast Conf. (ACMSE), 2008, pp. 336–339.
[20] A. Akl and S. Valaee, "Accelerometer-based gesture recognition via dynamic-time warping, affinity propagation, & compressive sensing," in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Process. (ICASSP), 2010, pp. 2270–2273.
[21] M. Khan, S. I. Ahamed, M. Rahman, and J.-J. Yang, "GestHaar: An accelerometer-based gesture recognition method and its application in NUI driven pervasive healthcare," in Proc. IEEE Int. Conf. Emerging Signal Process. Applicat. (ESPA), 2012, pp. 163–166.
[22] J. Liu, L. Zhong, J. Wickramasuriya, and V. Vasudevan, "uWave: Accelerometer-based personalized gesture recognition and its applications," Pervasive and Mobile Computing, vol. 5, no. 6, pp. 657–675, 2009.
[23] J. Cacace, A. Finzi, and V. Lippiello, "Multimodal interaction with multiple co-located drones in search and rescue missions," arXiv:1605.07316, 2016.
[24] A. W. Evans, J. P. Gray, D. Rudnick, and R. E. Karlsen, "Control solutions for robots using Android and iOS devices," in Proc. SPIE Defense, Security, and Sensing, 2012.
[25] M. Micire, M. Desai, A. Courtemanche, K. M. Tsui, and H. A. Yanco, "Analysis of natural gestures for controlling robot teams on multi-touch tabletop surfaces," in Proc. ACM Int. Conf. Interactive Tabletops and Surfaces (ITS), 2009, pp. 41–48.
[26] D. Mellinger, N. Michael, and V. Kumar, "Trajectory generation and control for precise aggressive maneuvers with quadrotors," Int. J. Robot. Res., vol. 31, no. 5, pp. 664–674, 2012.
[27] D. A. Norman, "Natural user interfaces are not natural," Interactions, vol. 17, no. 3, pp. 6–10, 2010.
[28] R. O. Duda and P. E. Hart, Pattern Classification and Scene Analysis. New York: Wiley, 1973.
[29] R. C. Gonzalez and R. E. Woods, Digital Image Processing, 3rd ed. Prentice Hall, 2008.
[30] J. G. Proakis and D. G. Manolakis, Digital Signal Processing: Principles, Algorithms, and Applications, 3rd ed. Prentice-Hall International, 1996.
[31] H. Kobayashi, B. L. Mark, and W. Turin, Probability, Random Processes, and Statistical Analysis. Cambridge University Press, 2012.
[32] R. Kohavi and F. Provost, "Glossary of terms," Machine Learning, vol. 30, no. 2–3, pp. 271–274, 1998.
[33] M. Quigley, K. Conley, B. Gerkey, J. Faust, T. Foote, J. Leibs, R. Wheeler, and A. Y. Ng, "ROS: An open-source robot operating system," in Proc. ICRA Workshop on Open Source Software, vol. 3, no. 3.2, 2009, p. 5.
[34] J. Engel, J. Sturm, and D. Cremers, "Scale-aware navigation of a low-cost quadrocopter with a monocular camera," Robot. Auton. Syst., vol. 62, no. 11, pp. 1646–1656, 2014.
[35] P. Rani, N. Sarkar, C. A. Smith, and L. D. Kirby, "Anxiety detecting robotic system – towards implicit human-robot collaboration," Robotica, vol. 22, no. 1, pp. 85–95, 2004.