Documenti di Didattica
Documenti di Professioni
Documenti di Cultura
Miguel Borges? , Andrew Symington† , Brian Coltin† , Trey Smith‡ , Rodrigo Ventura?
I. I NTRODUCTION
The HTC Vive is a consumer headset and accompanying
motion capture system designed for virtual reality (VR)
applications [1]. Motion capture describes the process of
estimating absolute position and orientation — or pose — Fig. 1. Astrobee (1) shown on a level granite surface (2), which is used
in real-time, and has many applications in film, medicine, to simulate a 2D microgravity environment. The prototype has trackers
engineering [2], and notably robotics. mounted to its port and starboard sides (3), and a single lighthouse (4)
mounted overhead.
The Vive system is comprised of lighthouses that emit
synchronized light sweeps, and trackers that use photodiodes
to measure light pulse timings as a proxy for estimating the
horizontal and vertical angles to lighthouses. The trackers
tracking systems (VisualEyez with an in-house developed
fuse angle measurements from a bundle of rigid photodiodes
pose estimation algorithm and QR codes with overhead
together to estimate the pose using a technique similar to
camera).
Angle-of-Arrival [3]. The tracker also has access to motion
data from an incorporated Inertial Measurement Unit (IMU) The first contribution of this paper is an analysis of Vive’s
to maintain a smooth and continuous trajectory. static precision and dynamic precision and accuracy. We
The Vive system provides a compelling means of obtaining show that although the original system has a sub-millimeter
ground truth data for roboticists: it is more affordable than precision with the trackers in a static state, when that state is
competing technologies, it is straightforward to set up and dynamic, the precision worsens by one order of magnitude.
use, and a Robotics Operating System (ROS) driver already We also show experimentally that the accuracy of this system
exists for integration with an ecosystem of robotic tools. can vary from a few millimeters up to a meter in a dynamic
For the reasons above, Vive was chosen as a source situation.
of ground truth for testing the Astrobee robots (Fig. 1). We attribute the high error to the closed-source fusion
Astrobees [4] are free-flying robots that will be deployed algorithm giving higher weight to the inertial measurements,
to the International Space Station in 2018 to be used as a thus minimizing jitter to the VR user. Motivated by this
research platform for free-flying robots1 . Ideally, the system and by not having access to the source code of Vive’s
should exhibit error in the millimeter range in order to be algorithms, the second contribution of this paper is a set
able to benchmark Astrobee’s localization algorithms [5], to of algorithms for Vive that improve on the accuracy and
test Astrobee’s robotic arm and to improve on the available stability while providing an open-source platform that is easy
? Institute for Systems and Robotics - Lisboa, Instituto Superior Técnico, for the user to change. These algorithms are used to compute
Universidade de Lisboa the trackers’ poses and for the calibration procedure that
† SGT Inc., NASA Ames Research Center
relates the lighthouses with the user’s workspace. We show
‡ NASA Ames Research Center
miguel.r.borges@tecnico.ulisboa.pt that our tracking methods, although less smooth, are able to
1 Astrobee flight software available open source at outperform Vive’s built-in algorithms in accuracy by up to
https://github.com/nasa/astrobee two orders of magnitude.
II. M OTION C APTURE
Vive is one of many motion capture systems available
on the market. Examples of other systems include VICON,
OptiTrack and VisualEyez. VICON and OptiTrack use cam-
eras to track reflective markers illuminated by infrared light
sources. VICON quotes accuracy levels of up to 76 µm and
precision (noise) of 15 µm in a four camera configuration [6]. Fig. 2. Side view of HTC Vive working principle with α being the angle
OptiTrack claims that its system can achieve accuracy levels between a sweeping plane and the normal vector to the lighthouse.
of less than 0.3 mm for robotic tracking systems. VisualEyez
uses a three-camera system to track active LED markers, and
is reported to have millimeter-level precision [7]. The key The inertial data is composed of linear accelerations and
issue with these tracking systems is that they are prohibitively angular velocities. But since this comes from a consumer
expensive for general use. grade IMU, the measurements are very noisy.
Reliable motion capture is an essential component of There are multiple frames associated with this problem.
an immersive VR experience. As the technology grows in Both the lighthouse and the tracker have their own frames, l
popularity so the cost of equipment falls. We are now at a and t respectively. For clarity, the lighthouse is represented
point where off-the-shelf VR devices are providing a feasible as li instead of l in Fig. 3, where i is its index. An auxiliary
alternative for motion capture in the context of robotics. frame vive, v, is selected to always be coincident with one
Examples of VR systems that offer motion capture include of the lighthouses’ frames — chosen during the calibration
Oculus Rift [8] and HTC Vive. procedure. This procedure allows the system to relate the
HTC Vive’s pose estimation has a similar working prin- poses of the lighthouses, in the case multiple lighthouses are
ciple to the Angle of Arrival (AoA) localization techniques being used. It also allows the user to choose the world frame
[9] used in Wireless Sensor Networks (WSN). AoA based w. The final output of Vive is a rigid-body transform between
localization resorts to arrays of antennas to estimate the angle a tracker frame and the world frame, represented by a red
of the received signal. From this interaction between multiple arrow in Fig. 3.
nodes, it is possible to estimate their location. Vive’s trackers
however, estimate the angle of the lighthouse through a time
difference, as is explained in the next section.
In [1], the authors evaluate Vive’s accuracy, however they
focus on research with the headset. They do not mention how
the trackers and controllers behave. These two devices are
more appealing for roboticists. It is therefore unclear how
Vive behaves as a ground truth tracking system for robotic
applications.
III. P ROBLEM D ESCRIPTION Fig. 3. Frames involved in Vive’s pose estimation.
The Server then passes that data to the Calibrator or APE h l Pp = p (2)
l x
Pp
(Absolute Pose Estimator), depending on its current state. arctan
lP z
p
These states can be Calibrating — determining the relative
rigid-body transforms between the lighthouses and the world In order to compute the pose of the tracker we use a sum
frame using the Calibrator — or Tracking — real time pose of of squared-differences between the photodiode’s recorded
solving using the APE. angles (αp ) and the estimated angles. This is a non-convex
optimization problem however, using an optimizer 3 , we
are able to get results quickly enough to achieve real-time
tracking. The cost function is the following:
M X
X N
2
fAP E = [hp,l (v Tt ) − αp ] (3)
l=1 p=1
where hp,l (·) is function (2) with (1) as the input argument
after converting it to cartesian coordinates.
Our cost function uses the data from all the lighthouses at
the same time in order to increase the stability of the solution,
however for the horizontal axis, we have to negate the
recorded angles (−αphorizontal ) due to the rotation direction
Fig. 4. Diagram of the system. of the lighthouse’s laser.
Our algorithm also takes advantage of Vive’s high sam-
pling rate by initializing the optimizer at the last computed
2 deepdive is available open source at
https://github.com/asymingt/deepdive 3 Ceres-Solver is available at http://ceres-solver.org
pose as a means of making the estimation process faster. of the relative pose between the trackers and the lighthouse,
For the first estimation done by the algorithm, we use an using the following cost function:
arbitrary starting pose in front of one lighthouses, to make N
sure it doesn’t converge to a pose behind it. X 2
fˆCal = hp l Tt − αp
(5)
In order to prevent outliers and old data from influencing p=1
the estimation we included the following restrictions: all
measured angles with magnitude greater than 60 degrees are where hp is function (2) using as its input:
rejected, there must be at least 4 measured angles from the
l
most recently detected lighthouse and all samples older than P̃p = l Tt t P̃p (6)
50 ms are not used in the case they are not from the most
recently detected lighthouse, otherwise the APE skips the This estimate is used to initialize the final cost function,
estimation of this pose. where we compute, simultaneously, the pose of each light-
The cost function (3) is fairly complex and the optimizer house in the vive frame, but this time with all the trackers
may sporadically diverge or converge into a local minimum. at the same time, as in:
In order to prevent wrong estimations, we included one
K X
M X
N
more verification before providing the solution to the user: X 2
it checks the cost function’s cost and if it is bigger than fCal = [hp,l,t (w Tl ) − αp ] (7)
t=1 l=1 p=1
a threshold linearly related with the number of observed
angles, the algorithm rejects the pose and waits for new data. where the function hp,l,t is similar to (2), however the input
These constraints improve APE’s stability but they also is w Tl which can be obtained from the rigid-body transform
lead to ignoring poses (loss of tracking) on the edges of the l
Tw . To obtain the input of the original function, we resort
workspace, where most of the photodiodes are occluded from to the following expression:
the lighthouse.
All the poses are estimated by the algorithm in the vive l
P̃p = l Tw w
Tt t P̃p (8)
frame instead of the world frame. The vive frame is an
auxiliary frame between the lighthouses’ frames and the After having all the lighthouses’ poses computed, the
world frame. The poses in the world frame are computed procedure chooses the vive frame as one of the lighthouses’
through ROS because the world frame is determined prior to frames and converts the lighthouses’ poses to this new
the pose estimation, as is mentioned in subsection V-B. auxiliary frame. We decided to use this frame in order to
preserve the frame hierarchy in the original ROS driver.
B. Calibration Procedure
VI. R ESULTS
When the Vive system is installed, the lighthouses are
individually mounted wherever it is convenient for the user, In order to evaluate Vive’s performance, we designed
so the registration from lighthouse to lighthouse and from two experiments where we assess the system in different
lighthouse to the world frame of interest is initially unknown. situations. Since we do not have access to Vive’s baseline
Therefore we created a procedure that addresses this issue. algorithm’s (from now on referenced as baseline) input data,
Our calibration procedure consists of a concatenation of in order to compare it with our proposed algorithm (from
rigid-body transforms (4) that leads to the relative poses of now on referenced as proposed), we will have to use different
the lighthouses. It assumes that the trackers are static leading datasets collected in similar conditions. We used however,
to a more accurate process. different lighthouse configurations for each dataset. We used
two set-ups (shown in Fig. 5): for the baseline algorithms,
w
Tl =w Tb b Tt t Tl (4) we used the adjacent walls configuration, which provided a
position standard deviation of 5 mm against 93 mm and a
For Astrobee tracking, as in many robotic applications, maximum deviation of 28 mm against 6271 mm of the other
we have multiple trackers rigidly mounted to the robot configuration; the proposed algorithm performed well for
chassis, pointing in different directions, to provide improved the first lighthouse set-up (lighthouses on opposing walls),
tracking coverage. We use the combined tracker information eliminating the need to reconfigure their locations.
to estimate the position of the robot’s body frame (b). The
user should specify the mounting geometry of the trackers
on the robot as body-to tracker relative poses b Tt . The user
also registers the world frame by taking Vive measurements
with the robot body frame fixed at a known pose w Tb .
In the time interval between the start and end of the data
acquisition, our calibrator records light data at 30 or 60 Hz
(depending on the lighthouse’s mode). After completing the
data acquisition, it starts by computing an initial estimate Fig. 5. Configuration of the lighthouses in the granite surface’s workspace.
A. Static State Results fit to the estimated poses and also the angle between the
We started with a stationary state comparison between plane’s normal vector and the same vector attached to the
algorithms, as described in section IV, where we evaluate tracker’s frame in the first instant. We will compare the
the estimated pose’s standard deviation (maximum standard precision (through the standard deviation) of both algorithms
deviation of the 3D position and standard deviation of again but for this different situation and now we’ll also in-
the angle resorting to an axis-angle representation of the clude an accuracy assessment (through the plane’s maximum
deviation — max — and average deviation — d). ¯ For these
orientation). In this experiment, the tracker was static and
at a distance of 1-2 m from the lighthouses. Table I contains tests, trackers 1 and 2 were mounted on the sides of Astrobee,
the results we obtained for each of the approximately 30 s except for dataset d4, where they were attached to the top
dataset. of the robot, as was tracker 3 when it was used. We include
in the results table a reference to the location of the trackers
TABLE I on the robot for each dataset (s for starboard, p for port and
S TANDARD DEVIATION OF THE POSE IN A STATIC STATE . t for top). All the datasets have a duration of 40-120 s and
Algorithm Dataset σPosition [mm] σOrientation [◦ ] Astrobee performed a trajectory similar to the one in Fig. 6,
s1 0.417 0.00300 manually controlled, with an average linear velocity of 1-6
s2 0.151 0.00586 cm/s.
Baseline s3 0.260 0.000476
s4 0.214 0.0023 TABLE II
s5 0.168 0.00687 D EVIATION FROM THE FITTED PLANE .
s6 4.960 0.010
s7 0.0875 0.000216 Algorithm Dataset Tracker σ [mm] max [mm] d¯ [mm]
Proposed s8 1.149 0.000476 d1 1 (s) 1.08 2.36 0.90
s9 10.052 0.0447 d1 2 (p) 1.51 6.77 2.02
s10 0.851 0.0030 d1 3 (t) 0.74 2.96 0.93
d2 1 (s) 33.44 802.57 43.25
Bsl.
d2 2 (p) 7.73 74.51 8.63
Comparing the average position’s standard deviation from d3 1 (s) 0.76 3.32 1.12
Vive’s built-in algorithms (0.242 mm) and from our algo- d3 2 (p) 2.24 28.79 3.39
d4 1 (t) 71.721 270.628 150.371
rithms (3.419 mm) we can clearly conclude that the former d4 2 (t) 26.629 106.627 48.589
outperforms the latter in a stationary experiment. These d5 3 (t) 1.14 5.05 0.90
results are explained by the fact that our algorithm does not d6 3 (t) 0.39 5.21 0.39
Prop.
use the correlation between consecutive poses. The inertial d7 1 (s) 2.11 22.80 2.94
d7 2 (p) 1.09 12.40 1.07
measurements also help the baseline algorithm with this
Algorithm Dataset Tracker σ [◦ ] max [◦ ] d¯ [◦ ]
correlation. However, this is only valid for a static situation, d1 1 (s) 0.02 0.50 0.01
and the trackers will not be in a constant pose while tracking d1 2 (p) 0.02 0.47 0.02
the robot. d1 3 (t) 0.01 0.26 0.01
d2 1 (s) 0.80 58.31 0.11
Baseline
B. Dynamic State Results d2 2 (p) 0.11 4.7 0.08
d3 1 (s) 0.01 0.26 0.01
The second experiment consists of tracking with the same d3 2 (p) 0.09 4.24 0.02
algorithms but with the trackers in motion, as described in d4 1 (t) 0.04 0.64 0.01
d4 2 (t) 0.04 0.69 0.01
section IV. As Astrobee floats in a perfectly flat surface, it’s
d5 3 (t) 0.11 2.14 0.06
trajectory should be a perfect plane. d6 3 (t) 0.36 4.98 0.17
Proposed
d7 1 (s) 1.05 11.63 0.32
d7 2 (p) 0.29 4.89 0.13