
FUNDAMENTALS OF

Wearable
Computers and
Augmented
Reality
SECOND EDITION
edited by

Woodrow Barfield

Boca Raton London New York

CRC Press is an imprint of the Taylor & Francis Group, an informa business

MATLAB is a trademark of The MathWorks, Inc. and is used with permission. The MathWorks does
not warrant the accuracy of the text or exercises in this book. This book's use or discussion of MATLAB software or related products does not constitute endorsement or sponsorship by The MathWorks
of a particular pedagogical approach or particular use of the MATLAB software.

CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742
© 2016 by Taylor & Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, an Informa business
No claim to original U.S. Government works
Version Date: 20150616
International Standard Book Number-13: 978-1-4822-4351-2 (eBook - PDF)
This book contains information obtained from authentic and highly regarded sources. Reasonable
efforts have been made to publish reliable data and information, but the author and publisher cannot
assume responsibility for the validity of all materials or the consequences of their use. The authors and
publishers have attempted to trace the copyright holders of all material reproduced in this publication
and apologize to copyright holders if permission to publish in this form has not been obtained. If any
copyright material has not been acknowledged please write and let us know so we may rectify in any
future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced,
transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or
hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.
For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222
Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are
used only for identification and explanation without intent to infringe.
Visit the Taylor & Francis Web site at
http://www.taylorandfrancis.com
and the CRC Press Web site at
http://www.crcpress.com

Contents
Preface.......................................................................................................................ix
Acknowledgments......................................................................................................xi
Editor...................................................................................................................... xiii
Contributors.............................................................................................................. xv

Section I Introduction
Chapter 1 Wearable Computers and Augmented Reality: Musings
and Future Directions............................................................................3
Woodrow Barfield
Chapter 2 Wearable Computing: Meeting the Challenge.................................... 13
Thad Starner
Chapter 3 Intimacy and Extimacy: Ethics, Power, and Potential
of Wearable Technologies................................................................... 31
Patricia Flanagan, Despina Papadopoulos, and Georgina Voss

Section II The Technology


Chapter 4 Head-Mounted Display Technologies for Augmented Reality........... 59
Kiyoshi Kiyokawa
Chapter 5 Optics for Smart Glasses, Smart Eyewear, Augmented Reality,
and Virtual Reality Headsets.............................................................. 85
Bernard Kress
Chapter 6 Image-Based Geometric Registration for Zoomable Cameras
Using Precalibrated Information....................................................... 125
Takafumi Taketomi

Chapter 7 Visual Tracking for Augmented Reality in Natural Environments......151


Suya You and Ulrich Neumann
Chapter 8 Urban Visual Modeling and Tracking............................................... 173
Jonathan Ventura and Tobias Höllerer
Chapter 9 Scalable Augmented Reality on Mobile Devices: Applications,
Challenges, Methods, and Software.................................................. 195
Xin Yang and K.T. Tim Cheng
Chapter 10 Haptic Augmented Reality: Taxonomy, Research Status,
and Challenges.................................................................................. 227
Seokhee Jeon, Seungmoon Choi, and Matthias Harders

Section III Augmented Reality


Chapter 11 Location-Based Mixed and Augmented Reality Storytelling........... 259
Ronald Azuma
Chapter 12 Dimensions of Spatial Sound and Interface Styles of Audio
Augmented Reality: Whereware, Wearware, and Everyware.......... 277
Michael Cohen
Chapter 13 Applications of Audio Augmented Reality: Wearware,
Everyware, Anyware, and Awareware..............................................309
Michael Cohen and Julián Villegas
Chapter 14 Recent Advances in Augmented Reality for Architecture,
Engineering, and Construction Applications.................................... 331
Amir H. Behzadan, Suyang Dong, and Vineet R. Kamat
Chapter 15 Augmented Reality Human–Robot Interfaces toward
Augmented Robotics......................................................................... 399
Maki Sugimoto
Chapter 16 Use of Mobile Augmented Reality for Cultural Heritage................. 411
John Krogstie and Anne-Cecilie Haugstvedt

Chapter 17 Applications of Augmented Reality for the Automotive Industry......433


Vincent Gay-Bellile, Steve Bourgeois, Dorra Larnaout,
and Mohamed Tamaazousti
Chapter 18 Visual Consistency in Augmented Reality Compositing.................. 457
Jan Fischer
Chapter 19 Applications of Augmented Reality in the Operating Room............ 485
Ziv Yaniv and Cristian A. Linte
Chapter 20 Augmented Reality for Image-Guided Surgery................................ 519
Marta Kersten-Oertel, Pierre Jannin, and D. Louis Collins

Section IV Wearable Computers and Wearable Technology

Chapter 21 Soft Skin Simulation for Wearable Haptic Rendering................................. 551
Gabriel Cirio, Alvaro G. Perez, and Miguel A. Otaduy
Chapter 22 Design Challenges of Real Wearable Computers............................. 583
Attila Reiss and Oliver Amft
Chapter 23 E-Textiles in the Apparel Factory: Leveraging Cut-and-Sew
Technology toward the Next Generation of Smart Garments........... 619
Lucy E. Dunne, Cory Simon, and Guido Gioberto
Chapter 24 Garment Devices: Integrating Energy Storage into Textiles............. 639
Kristy Jost, Genevieve Dion, and Yury Gogotsi
Chapter 25 Collaboration with Wearable Computers.......................................... 661
Mark Billinghurst, Carolin Reichherzer, and Allaeddin Nassani
Author Index......................................................................................................... 681
Subject Index......................................................................................................... 707

Preface
In the early 1990s, I was a member of the coordinating committee that put together
the first conference on wearable computers, which, interestingly, was followed by
a highly publicized wearable computer fashion show. Speaking at the conference,
I recall making the following comment about wearable computers: "Are we wearing
them, or are they wearing us?" At the time, I was thinking that eventually advances
in prosthetics, sensors, and artificial intelligence would result in computational tools
that would have amazing consequences for humanity. Developments since then have
proven that vision correct. The first edition of Fundamentals of Wearable Computers
and Augmented Reality, published in 2001, helped set the stage for the coming
decade, in which an explosion in research and applications for wearable computers
and augmented reality occurred.
When the first edition was published, much of the research in augmented reality
and wearable computers consisted primarily of proof-of-concept projects; there were
few, if any, commercial products on the market. There was no Google Glass or
handheld smartphones equipped with sensors and the computing power of a mid-1980s supercomputer. And the apps for handheld smartphones that exist now
were nonexistent then. Fast forward to today: the commercial market for wearable
computers and augmented reality is in the millions of dollars and heading toward the
billions. From a technology perspective, much of what is happening now with wearables and augmented reality would not have been possible even five years ago. So,
as an observation, Ray Kurzweil's law of accelerating returns seems to be alive and
well with wearable computer and augmented reality technology, because 14 years
after the first edition of this book, the capabilities and applications of both technologies are orders of magnitude faster, smaller, and cheaper.
As another observation, the research and development of wearable computers
and augmented reality technology that was once dominated by U.S. universities
and research laboratories is truly international in scope today. In fact, the second
edition of Fundamentals of Wearable Computers and Augmented Reality contains
contributions from researchers in the United States, Asia, and Europe. And if one participates in conferences in this field, they are as likely to be held these days in Europe
or Asia as they are in the United States. These are very positive developments and
will lead to even more amazing applications involving the use of wearable computers
and augmented reality technology in the future.
Just as the first edition of this book provided a comprehensive coverage of the
field, the second edition attempts to do the same, specifically by including chapters
from a broad range of topics written by outstanding researchers and teachers within
the field. All of the chapters are new, with an effort to again provide fundamental
knowledge on each topic so that a valuable technical resource is provided to the
community. Specifically, the second edition contains chapters on haptics, visual displays, the use of augmented reality for surgery and manufacturing, technical issues
of image registration and tracking, and augmenting the environment with wearable
audio interfaces. The second edition also contains chapters on the use of augmented
reality in preserving our cultural heritage, on human–computer interaction and
augmented reality technology, on augmented reality and robotics, and on what we
termed in the first edition as "computational clothing." Still, even with this wide range
of applications, the main goal of the second edition is to provide the community with
fundamental information and basic knowledge about the design and use of wearable
computers and augmented reality with the goal to enhance people's lives. I believe
the chapter authors accomplished that goal, showing great expertise and breadth of
knowledge. My hope is that this second edition can also serve as a stimulus for
developments in these amazing technologies in the coming decade.
Woodrow Barfield, PhD, JD, LLM
Chapel Hill, North Carolina
The images for augmented reality and wearable computers are essential for the
understanding of the material in this comprehensive text; therefore, all color images
submitted by the chapter authors are available at http://www.crcpress.com/product/
isbn/9781482243505.
MATLAB is a registered trademark of The MathWorks, Inc. For product information, please contact:
The MathWorks, Inc.
3 Apple Hill Drive
Natick, MA 01760-2098 USA
Tel: 508-647-7000
Fax: 508-647-7001
E-mail: info@mathworks.com
Web: www.mathworks.com

Acknowledgments
I offer special thanks to the following chapter authors for providing images that
appear on the cover of the book: Kiyoshi Kiyokawa, an occlusion-capable optical
see-through head-mounted display; Miguel A. Otaduy, Gabriel Cirio, and Alvaro
G. Perez, simulation of a deformable hand with nonlinear skin mechanics; Vineet
R. Kamat, Amir H. Behzadan, and Suyang Dong, augmented reality visualization
of buried utilities during excavation; Marta Kersten-Oertel, virtual vessels of an
arteriovenous malformation (AVM) (with color-coded vessels [blue for veins,
red for arteries, and purple for the AVM nidus]) overlaid on a live image of a 3D
printed nylon anthropomorphic head phantom; Seokhee Jeon, Seungmoon Choi, and
Matthias Harders, an example of a visuo-haptic augmented reality system performing
modulation of real soft-object stiffness; and Kristy Jost, Genevieve Dion, and Yury
Gogotsi, 3D simulations of knitted smart textiles (rendered on the Shima Seiki Apex
3 Design Software).
Several members of CRC Press contributed in important ways to this book's
publication and deserve recognition. First, I thank Jessica Vakili, senior project
coordinator, for answering numerous questions about the process of editing the book
and those of the chapter authors in a timely, patient, and always efficient manner.
I also thank and acknowledge Cindy Renee Carelli, senior acquisition editor, for
contacting me about editing a second edition, championing the proposal through
the publishers review process, and her timely reminders to meet the deadline. The
project editor, Todd Perry, is thanked for the important task of overseeing the coordination, copyediting, and typesetting of the chapters. Gowthaman Sadhanandham
is also thanked for his work in production and assistance provided to authors.
Most importantly, in my role as editor for the second edition, I acknowledge and
thank the authors for their hard work and creative effort to produce outstanding
chapters. To the extent this book provides the community with a valuable resource
and stimulates further developments in the field, each chapter author deserves much
thanks and credit. In many ways, this book began 14 years ago, when the first edition
was published. To receive contributions from some of the original authors, to see
how their careers developed over the years, and the contributions they made to the
field, was a truly satisfying experience for me. It was a great honor that such a distinguished group again agreed to join the project.
Finally, in memoriam, I thank my parents for the freedom they gave me to follow
my interests and for the Erlenmeyer, distilling, and volumetric flasks when I was a
budding teenage scientist. Further, my niece, Melissa, is an inspiration and serves
as the gold standard in the family. Last but not least, I acknowledge my daughter,
Jessica, student and college athlete, for keeping me young and busy. I look forward
to all she will achieve.

Editor
Woodrow Barfield, PhD, JD, LLM, has served as professor of engineering at the
University of Washington, Seattle, Washington, where he received the National
Science Foundation Presidential Young Investigator Award. Professor Barfield
directed the Sensory Engineering Laboratory, where he was involved in research on
sensors and augmented and virtual reality displays. He has served as a senior editor for Presence: Teleoperators and Virtual Environments and is an associate editor
for Virtual Reality. He has more than 350 publications and presentations, including
invited lectures and keynote talks, and holds two degrees in law.

Contributors
Oliver Amft
ACTLab Research Group
University of Passau
Passau, Germany

Seungmoon Choi
Pohang University of Science and
Technology
Pohang, South Korea

Ronald Azuma
Intel Labs
Santa Clara, California

Gabriel Cirio
Department of Computer Science
Universidad Rey Juan Carlos
Madrid, Spain

Woodrow Barfield
Chapel Hill, North Carolina
Amir H. Behzadan
Department of Civil, Environmental,
and Construction Engineering
University of Central Florida
Orlando, Florida
Mark Billinghurst
Human Interface Technology
Laboratory New Zealand
University of Canterbury
Christchurch, New Zealand
Steve Bourgeois
Vision and Content Engineering
Laboratory
CEA LIST
Gif-sur-Yvette, France
K.T. Tim Cheng
Department of Electrical and Computer
Engineering
University of California, Santa Barbara
Santa Barbara, California

D. Louis Collins
Department of Biomedical Engineering
Department of Neurology &
Neurosurgery
Montreal Neurological Institute
McGill University
Montreal, Canada
Michael Cohen
Computer Arts Laboratory
University of Aizu
Aizu-Wakamatsu, Japan
Genevieve Dion
Shima Seiki Haute Technology
Laboratory
ExCITe Center
Antoinette Westphal College of Media
Arts and Design
Drexel University
Philadelphia, Pennsylvania
Suyang Dong
Department of Civil and Environmental
Engineering
University of Michigan
Ann Arbor, Michigan

Lucy E. Dunne
Department of Design, Housing, and
Apparel
University of Minnesota
St Paul, Minnesota
Jan Fischer
European Patent Office
Munich, Germany
Patricia Flanagan
Wearables Lab
Academy of Visual Arts
Hong Kong Baptist University
Kowloon Tong, Hong Kong
Vincent Gay-Bellile
Vision and Content Engineering
Laboratory
CEA LIST
Gif-sur-Yvette, France
Guido Gioberto
Department of Computer Science and
Engineering
University of Minnesota
Minneapolis, Minnesota
Yury Gogotsi
Department of Materials Science and
Engineering
College of Engineering
A.J. Drexel Nanomaterials Institute
Drexel University
Philadelphia, Pennsylvania
Matthias Harders
University of Innsbruck
Innsbruck, Austria
Anne-Cecilie Haugstvedt
Computas A/S
Lysaker, Norway

Tobias Höllerer
University of California
Santa Barbara, California
Pierre Jannin
INSERM Research Director
LTSI, Inserm UMR 1099
University of Rennes
Rennes, France
Kristy Jost
Department of Materials Science and
Engineering
College of Engineering
A.J. Drexel Nanomaterials Institute
and
Shima Seiki Haute Technology
Laboratory
ExCITe Center
Antoinette Westphal College of Media
Arts and Design
Drexel University
Philadelphia, Pennsylvania
Seokhee Jeon
Kyung Hee University
Seoul, South Korea
Vineet R. Kamat
Department of Civil and Environmental
Engineering
University of Michigan
Ann Arbor, Michigan
Marta Kersten-Oertel
Department of Biomedical Engineering
Montreal Neurological Institute
McGill University
Montreal, Quebec, Canada
Kiyoshi Kiyokawa
Cybermedia Center
Osaka University
Osaka, Japan

Bernard Kress
Google [X] Labs
Mountain View, California
John Krogstie
Department of Computer and
Information Science
Norwegian University of Science and
Technology
Trondheim, Norway
Dorra Larnaout
Vision and Content Engineering
Laboratory
CEA LIST
Gif-sur-Yvette, France
Cristian A. Linte
Department of Biomedical Engineering
Rochester Institute of Technology
Rochester, New York
Allaeddin Nassani
Human Interface Technology
Laboratory New Zealand
University of Canterbury
Christchurch, New Zealand

Alvaro G. Perez
Department of Computer Science
Universidad Rey Juan Carlos
Madrid, Spain
Carolin Reichherzer
Human Interface Technology
Laboratory New Zealand
University of Canterbury
Christchurch, New Zealand
Attila Reiss
Chair of Sensor Technology
University of Passau
Passau, Germany
Cory Simon
Johnson Space Center
National Aeronautics and Space
Administration
Houston, Texas
Thad Starner
School of Interactive Computing
Georgia Institute of Technology
Atlanta, Georgia

Ulrich Neumann
Department of Computer Science
University of Southern California
Los Angeles, California

Maki Sugimoto
Faculty of Science and Technology
Department of Information and
Computer Science
Keio University
Tokyo, Japan

Miguel A. Otaduy
Department of Computer Science
Universidad Rey Juan Carlos
Madrid, Spain

Takafumi Taketomi
Nara Institute of Science and
Technology
Nara, Japan

Despina Papadopoulos
Interactive Telecommunications
Program
New York University
New York, New York

Mohamed Tamaazousti
Vision and Content Engineering
Laboratory
CEA LIST
Gif-sur-Yvette, France

Jonathan Ventura
University of Colorado
Colorado Springs, Colorado
Julián Villegas
Computer Arts Laboratory
University of Aizu
Aizu-Wakamatsu, Japan
Georgina Voss
Science and Technology Policy
Research
University of Sussex
Sussex, United Kingdom
Xin Yang
Department of Electrical and Computer
Engineering
University of California, Santa Barbara
Santa Barbara, California

Ziv Yaniv
TAJ Technologies, Inc.
Mendota Heights, Minnesota
and
Office of High Performance Computing
and Communications
National Library of Medicine
National Institutes of Health
Bethesda, Maryland
Suya You
Department of Computer Science
University of Southern California
Los Angeles, California

Section I
Introduction

1 Wearable Computers
and Augmented Reality
Musings and Future Directions
Woodrow Barfield

CONTENTS
1.1 Public Policy......................................................................................................7
1.2 Toward a Theory of Augmented Reality........................................................... 9
1.3 Challenges and the Future Ahead.................................................................... 10
References................................................................................................................. 11
In this chapter, I briefly introduce the topic of wearable computers and augmented
reality, with the goal to provide the reader a roadmap to the book, a brief historical
perspective, and a glimpse into the future of a sensor-filled, wearable computer and
augmented reality (AR) world. While each technology alone (AR and wearables) is
providing people with amazing applications and technologies to assist them in their
daily life, the combination of the technologies is often additive and, in some cases,
multiplicative, as, for example, when virtual images, spatialized sound, and haptic
feedback are combined with wearable computers to augment the world with information whenever or wherever it is needed.
Let me begin to set the stage by offering a few definitions. Azuma (1997) defined
an augmented reality application as one that combines the real world with the virtual
world, is interactive and in real time, and is registered in three dimensions. Often, the
platform to deliver augmented reality is a wearable device; or in the case of a smart
phone, a hand-held computer. Additionally, most people think of a wearable computer as a computing device that is small and light enough to be worn on one's body
without causing discomfort. And unlike a laptop or a palmtop, a wearable computer is
constantly turned on and is often used to interact with the real world through sensors
that are becoming more ubiquitous each day. Furthermore, information provided
by a wearable computer can be very context and location sensitive, especially when
combined with GPS. In this regard, the computational model of wearable computers
differs from that of laptop computers and personal digital assistants.
In the early days of research in developing augmented reality, many of the same
researchers were also involved in creating immersive virtual environments. We began
to discuss different degrees of reality and virtuality. Early on, Paul Milgram from
the University of Toronto codified the thinking by proposing a virtuality continuum,
which represents a continuous scale ranging between the completely virtual world,
a virtuality, and the completely real world, reality (Milgram et al., 1994). The reality–virtuality
continuum therefore encompasses all possible variations and compositions of real
and virtual objects. The area between the two extremes, where both the real and the
virtual are mixed, is the so-called mixed reality—which Paul indicated consisted
of both augmented reality, where the virtual augments the real, and augmented
virtuality, where the real augments the virtual. Another prominent early researcher
in wearables, and a proponent of the idea of mediating reality, was Steve Mann
(2001, 2002). Steve, now at the University of Toronto, describes wearable computing as miniature, body-borne computational and sensory devices; he expanded the
discussion of wearable computing to include the more expansive term "bearable
computing," by which he meant wearable computing technology that is on or in the
body, and with numerous examples, Steve showed how wearable computing could be
used to augment, mediate, or diminish reality (Mann, 2002).
When I think of the different types of computing technology that may be worn on
or in the body, I envision a continuum that starts with the most basic of wearable computing technology and ends with wearable computing that is actually connected to a
person's central nervous system, that is, their brain (Figure 1.1). In fact, as humans
are becoming more and more equipped with wearable computing technology, the
distinction as to what is thought of as a prosthesis is becoming blurred as we integrate more wearable computing devices into human anatomy and physiology. The
extension of computing integrated into a person's brain could radically enhance
human sensory and cognitive abilities; in fact, in my view, we are just now at the cusp
of wearable computing and sensor technology breaking the skin barrier and moving
into the human body, and eventually into the brain.

FIGURE 1.1 A microchip is used to process brain waves that are used to control a cursor on
a computer screen. (Image courtesy of Wikimedia Commons.)

Already there are experimental
systems (computing technology integrated into a person's brain) in the field now that are
helping those with severe physical disabilities. For example, consider people with
debilitating diseases such that they are essentially locked in their own body. With
the appropriate wearable computing technology consisting of a microchip that is
implanted onto the surface of the brain (where it monitors electronic thought pulses),
such people may use a computer by thought alone, allowing them to communicate
with their family, caregivers, and, through the internet, the world at large. Sadly, in
the United States alone, about 5,000 people yearly are diagnosed with just such a disease
that ultimately shuts down the motor control capabilities of their body—amyotrophic
lateral sclerosis, sometimes called Lou Gehrig's disease. This disease is a rapidly
progressive, invariably fatal neurological disease that attacks the nerve cells responsible for controlling voluntary muscles. I highlight this example to show that, while
many uses of AR/wearables will be for gaming, navigation, shopping, and so on,
there are very transformative uses of wearable computing technology, either being
developed now, or soon to be developed, that will benefit humanity in ways we are
just now beginning to realize.
One of the early adopters of wearable computing technology, especially with
regard to implantable sensors within the body, was Professor Kevin Warwick, who in
1998 at the University of Reading was one of the first people to "hack" his body when
he participated in a series of proof-of-concept studies involving a sensor implanted
into the median nerves of his left arm; a procedure which allowed him to link his
nervous system directly to a computer. Most notably, Professor Warwick was able
to control an electric wheelchair and an artificial hand, using the neural interface.
In addition to being able to measure the signals transmitted along the nerve fibers in
Professor Warwick's left arm, the implant was also able to create artificial sensation
by stimulating the nerves in his arm using individual electrodes. This bi-directional
functionality was demonstrated with the aid of Kevin's wife and a second, less complex implant which connected to her nervous system. According to Kevin, this was the
first solely electronic communication between the nervous systems of two humans;
since then, many have extended Kevin's seminal work in wearable computers using
RFID chips and other implantable sensors (and there is even an anti-chipping statute
enacted in California and other states).
Other types of innovative and interesting wearable devices are being developed
at a rapid pace. For example, researchers at Brown University and Cyberkinetics
in Massachusetts are devising a microchip that is implanted in the motor cortex
just beneath a person's skull that will be able to intercept nerve signals and reroute
them to a computer, which will then wirelessly send a command to any of various
electronic devices, including computers, stereos, and electric wheelchairs. And
neuroscientists and robotics engineers have just recently demonstrated the viability of
direct brain-to-brain communication in humans using electroencephalogram (EEG)
and image-guided transcranial magnetic stimulation (TMS) technologies. Further,
consider a German team that has designed a microvibration device and a wireless
low-frequency receiver that can be implanted in a person's tooth. The vibrator acts
as microphone and speaker, sending sound waves along the jawbone to a person's
eardrum. And in another example of a wearable implantable device, the company
Setpoint is developing computing therapies to reduce systemic inflammation by
stimulating the vagus nerve using an implantable pulse generator. This device works
by activating the body's natural inflammatory reflex to dampen inflammation and
improve clinical signs and symptoms.
Medical necessity, for example, to manage debilitating disease such as diabetes,
is a main reason why people will become equipped with wearable computing technology and sensors that monitor their body's health. In fact, millions of people
worldwide with diabetes could benefit from implantable sensors and wearable
computers designed to monitor their blood-sugar level, because, if not controlled, such
people are at risk for dangerous complications, including damage to the eyes, kidneys, and heart. To help people monitor their blood-sugar level, Smart Holograms,
a spinoff company of Cambridge University, Google, and others are developing eye-worn sensors to assist those with the disease. Google's technology consists of a contact
lens built with special sensors that measure sugar levels in tears, using a tiny wireless chip and a miniature sensor embedded between two layers of soft contact lens
material. As interesting and innovative as this solution to monitoring diabetes is, this
isn't the only example of eye-oriented wearable technology that will be developed.
In the future, we may see people equipped with contact lenses or retinal prostheses that
monitor their health, detect energy in the x-ray or infrared range, and have telephoto
capabilities.
As for developing a telephoto lens, for the approximately 20–25 million people
worldwide who have the advanced form of age-related macular degeneration (AMD),
a disease which affects the region of the retina responsible for central, detailed vision,
and is the leading cause of irreversible vision loss and legal blindness in people over
the age of 65, an implantable telescope could offer hope. In fact, in 2010, the U.S.
FDA approved an implantable miniature telescope (IMT), which works like the telephoto lens of a camera (Figure 1.2).
The IMT technology reduces the impact of the central vision blind spot due to
end-stage AMD and projects the objects the patient is looking at onto the healthy
area of the light-sensing retina not degenerated by the disease.

FIGURE 1.2 The implantable miniature telescope (IMT) is designed to improve vision
for those experiencing age-related macular degeneration. (Images provided courtesy of
VisionCare Ophthalmic Technologies, Saratoga, CA.)

The surgical procedure involves removing the eye's natural lens, as with cataract
surgery, and replacing the lens with the IMT. While telephoto eyes are not coming
soon to an ophthalmologist's office, this is an intriguing step in that direction and a
look into the future of wearable computers. I should point out that in the United
States any device containing a contact lens or other eye-wearable technology is regulated by the Food and Drug Administration as a medical device; the point being that
much of wearable computing technology comes under government regulation.

1.1 PUBLIC POLICY


Although not the focus of this book, an important topic for discussion is the use of
augmented reality and wearable computers in the context of public policy especially in
regard to privacy. For example, Steve Mann presents the idea that wearable computers
can be used to film newsworthy events as they happen or people of authority as they
perform their duties. This example brings up the issues of privacy and whether a person
has a legal right to film other people in public. Consider the following case decided by
the U.S. First Circuit Court of Appeals—but note it is not the only legal dispute involving sensors and wearable computers. In the case, Simon Glik was arrested for using his
cell phone's digital video camera (a wearable computer) to film several police officers
arresting a young man on the Boston Common (Glik v. Cunniffe, 2011). The charges
against Glik, which included violation of the Massachusetts wiretap statute and two
other state-law offenses, were subsequently judged baseless and were dismissed. Glik
then brought suit under a U.S. federal statute (42 U.S.C. § 1983), claiming that his very
arrest for filming the officers constituted a violation of his rights under the First (free
speech) and Fourth (unlawful arrest) Amendments to the U.S. Constitution. The court
held that, based on the facts alleged, Glik was exercising clearly established First
Amendment rights in filming the officers in a public space, and that his clearly established Fourth Amendment rights were violated by his arrest without probable cause.
While the vast amount of information captured by all the wearable digital devices
is valuable on its own, sensor data derived from wearable computers will be even
more powerful when linked to the physical world. On this point, knowing where
a photo was taken, or when a car passed by an automated sensor, will add rich
metadata that can be employed in countless ways. In effect, location information
will link the physical world to the virtual meta-world of sensor data. With sensor
technology, everything from the clothing we wear to the roads we drive on will be
embedded with sensors that collect information on our every move, including our
goals and our desires. Just consider one of the most common technologies equipped
with sensors—a cell phone. It can contain an accelerometer to measure changes in
velocity, a gyroscope to measure orientation, and a camera to record the visual scene.
With these senses, the cell phone can be used to track a person's location and integrate
that information with comprehensive satellite, aerial, and ground maps to generate
multi-layered real-time location-based databases. In addition, body-worn sensors are
being used to monitor blood pressure, heart rate, weight, and blood glucose, and can
link to a smartphone, often wirelessly. Also, given the ability of hackers
to access networks and wireless body-worn devices, the cybersecurity of wearable
devices is becoming a major concern. Another point to make is that sensors on the
outside of the body are rapidly moving under the skin as they begin to connect the
functions of our body to the sensors external to it (Holland etal., 2001).
Furthermore, what about privacy issues and the use of wearable computers to
film people against their will? Consider an extreme case, video voyeurism, which
is the act of filming or disseminating images of a person's private areas under
circumstances in which the person had a reasonable expectation of privacy, regardless
of whether the person is in a private or public location. Video voyeurism is not only
possible but being done using wearable computers (mostly hand-held cameras). In
the United States, such conduct is prohibited under state and federal law (see, e.g.,
the Video Voyeurism Prevention Act of 2004, 18 U.S.C.A. § 1801). Furthermore, what
about the privacy issues associated with other wearable computing technology such
as the ability to recognize a person's face, then search the internet for personal information about the individual (e.g., police record or credit report), and tag that information on the person as they move throughout the environment?
As many of the chapters in this book show, wearable computers
combined with augmented reality capabilities can be used to alter or diminish reality,
in which case a wearable computer can be used to replace or remove clutter, say, for example, an unwanted advertisement on the side of a building. On this topic, I published
an article, "Commercial Speech, Intellectual Property Rights, and Advertising Using
Virtual Images Inserted in TV, Film, and the Real World," in the UCLA Entertainment
Law Review. In the article, I discussed the legal and policy ramifications of placing
ads consisting of virtual images projected in the real world. We can think of virtual
advertising as a form of digital technology that allows advertisers to insert computer-generated brand names, logos, or animated images into television programs or movies; or, with Steve's wearable computer technology and other displays, the real world.
In the case of TV, a reported benefit of virtual advertising is that it allows the action
on the screen to continue while displaying an ad viewable only by the home audience.
What may be worrisome about the use of virtual images to replace portions of
the real world is that corporations and government officials may be able to alter what
people see based on political or economic considerations; an altered reality may then
become the accepted norm, the consequences of which seem to bring up the dystopian
society described in Huxley's Brave New World. Changing directions, another policy
issue to consider for people equipped with networked devices is what liabilities, if
any, would be incurred by those who disrupt the functioning of such a computing prosthesis. For example, would an individual be liable if they interfered with a signal sent
to an individual's wearable computer, if that signal was used to assist the individual
in seeing and perceiving the world? On just this point, former U.S. Vice President
Dick Cheney had the wireless feature of his pacemaker disabled in 2007.
Restaurants have also entered into the debate about the direction of our wearable
computer future. Taking a stance against Google Glass, a Seattle-based restaurant,
Lost Lake Cafe, actually kicked out a patron for wearing Glass. The restaurant is
standing by its no-glass policy, despite mixed responses from the local community.
In another incident, a theater owner in Columbus, Ohio, saw enough of a threat from
Google Glass to call the Department of Homeland Security. The Homeland Security
agents removed the programmer who was wearing Google Glass connected to his
prescription lenses. Further, a San Francisco bar frequented by a high-tech crowd
has banned patrons from wearing Google Glass while inside the establishment. In
fact, San Francisco seems to be ground zero for cyborg disputes, as a social media
consultant who wore Glass inside a San Francisco bar claimed that she was attacked
by patrons objecting to her wearing the device inside the bar. In addition, a reporter
for Business Insider said he had his Google Glass snatched off his face and smashed
to the ground in San Francisco's Mission District.
Continuing the theme of how wearable computers and augmented reality technology impact law and policy, in addition to FDA regulations, some jurisdictions are
just beginning to regulate wearable computing technology if its use poses a danger to
the population. For example, sparsely populated Wyoming is among a small number
of U.S. states eyeing a ban on the use of wearable computers while driving, over
concerns that drivers wearing Google Glass may pay more attention to their email or
other online content than the road. And in a high-profile California case that raised
new questions about distracted driving, a driver wearing Google Glass was ticketed
for wearing the display while driving after being stopped for speeding. The ticket
was for violating a California statute which prohibited a visual monitor in her car
while driving. Later, the ticket was dismissed due to lack of proof the device was
actually operating while she was driving. To show the power and influence of corporations in the debate about our wearable computer/AR future, Google is lobbying
officials in at least three U.S. states to stop proposed restrictions on driving with
headsets such as Google Glass, marking some of the first clashes over the nascent
wearable technology.
By presenting the material in the earlier sections, my goal was to inform the
readers of this book that while the technology presented in the subsequent chapters is
fascinating and even inspiring, there are still policy and legal issues that will have to
be discussed as wearable computer and augmented reality technologies improve and
enter more into the mainstream of society. Thus, I can conclude—while technology
may push society further, there is a feedback loop: technology is also influenced by
society, including its laws and regulations.

1.2 TOWARD A THEORY OF AUGMENTED REALITY


As a final comment, one often hears people discuss the need for theory to provide
an intellectual framework for the work done in augmented reality. When I was on
the faculty at the University of Washington, my students and I built a head-tracked
augmented reality system in which, as one looked around the space of the laboratory,
one saw a corresponding computer-generated image that was rendered such that it
occluded real objects in that space. We noticed that some attributes of the virtual
images allowed the person to more easily view the virtual object and real world in a
seamless manner. Later, I became interested in the topic of how people performed
cognitive operations on computer-generated images. With Jim Foley, now at Georgia
Tech, I performed experiments to determine how people mentally rotated images
rendered with different lighting models. This led to thinking about how virtual
images could be seamlessly integrated into the real world. I asked the question of
whether there was any theory to explain how different characteristics of virtual
images combined to form a seamless whole with the environment they were projected into, or whether virtual images projected in the real world appeared separate
from the surrounding space (floating and disembodied from the real world scene).
I recalled a paper I had read while in college by Garner and Felfoldy (1970) on
the integrality of stimulus dimensions in various types of information processing.
The authors of the paper noted that separable dimensions remain psychologically
distinct when in combination; an example being forms varying in shape and color.
We say that two dimensions (features) are integral when they are perceived holistically, that is, it's hard to visually decode the value of one independently from the
other. A vast amount of converging evidence suggests that people are highly efficient
at selectively attending to separable dimensions. By contrast, integral dimensions
combine into relatively unanalyzable, unitary wholes, an example being colors
varying in hue, brightness, and saturation. Although people can selectively attend
to integral dimensions to some degree, the process is far less efficient than it is
for separable-dimension stimuli (Shepard, 1964). I think that much can be done to
develop a theory of augmented, mediated, or diminished reality using the approach
discussed by Garner and Felfoldy, and Shepard, and I encourage readers of this book
to do so. Such research would have to expand the past work, which was done on single
images, to virtual images projected into the real world.

1.3 CHALLENGES AND THE FUTURE AHEAD


While the chapters in this book discuss innovative applications using wearable
computer technology and augmented reality, the chapters also focus on providing
solutions to some of the difficult design problems in both of these fields. Clearly,
there are still many design challenges to overcome and many amazing applications
yet to develop—such goals are what designing the future is about. For example,
consider a technical problem, image registration: GPS lacks accuracy, but I expect
vast improvements in image registration as the world is filled with more sensors.
I also expect that wearable computing technology will become more and more integrated with the human body, especially for reasons of medical necessity. And with
continuing advances in miniaturization and nanotechnology, head-worn displays
will be replaced with smart contact lenses, and, further into the future, bionic eyes that
record everything a person sees, along with the capability to overlay the world with
graphics (essentially information). Such technology will provide people augmented
reality capabilities that would be considered the subject of science fiction just a few
years ago.
While this chapter focused more on a policy discussion and futuristic view of
wearable computers and augmented reality, the remaining chapters focus far more
on technical and design issues associated with the two technologies. The reader
should keep in mind that the authors of the chapters which follow are inventing the
future, but we should all be involved in determining where technology leads us and
what that future looks like.

REFERENCES
Azuma, R. T., 1997, A survey of augmented reality, Presence: Teleoperators and Virtual
Environments, 6(4), 355–385.
Garner, W. R. and Felfoldy, G. L., 1970, Integrality of stimulus dimensions in various types
of information processing, Cognitive Psychology, 1, 225–241.
Glik v. Cunniffe, 655 F.3d 78 (1st Cir. 2011) (case at the United States Court of Appeals for
the First Circuit that held that a private citizen has the right to record video and audio
of public officials in a public place, and that the arrest of the citizen for a wiretapping
violation violated the citizen's First and Fourth Amendment rights).
Holland, D., Roberson, D. J., and Barfield, W., 2001, Computing under the skin, in Barfield, W.
and Caudell, T. (eds.), Fundamentals of Wearable Computing and Augmented Reality,
pp. 747–792, Lawrence Erlbaum Associates, Inc., Mahwah, NJ.
Mann, S., August 6, 2002, Mediated reality with implementations for everyday life, Presence
Connect, the online companion to the MIT Press journal, PRESENCE: Teleoperators
and Virtual Environments, 11(2), 158–175, MIT Press.
Mann, S. and Niedzviecki, H., 2001, Cyborg: Digital Destiny and Human Possibility in the
Age of the Wearable Computer, Doubleday Canada, Toronto.
Milgram, P., Takemura, H., Utsumi, A., and Kishino, F., 1994, Augmented reality: A class of
displays on the reality–virtuality continuum, in Proceedings of the SPIE Conference on
Telemanipulator and Telepresence Technologies, vol. 2351, pp. 282–292, Boston, MA.
Shepard, R. N., 1964, Attention and the metric structure of the stimulus space, Journal of
Mathematical Psychology, 1, 54–87.

2 Wearable Computing
Meeting the Challenge
Thad Starner

CONTENTS
2.1 Networking...................................................................................................... 14
2.2 Power and Heat................................................................................................ 15
2.3 Mobile Input.................................................................................................... 17
2.4 Display............................................................................................................. 18
2.5 Virtual Reality................................................................................................. 19
2.6 Portable Video Viewers...................................................................................20
2.7 Industrial Wearable Systems........................................................................... 22
2.8 Academic/Maker Systems for Everyday Use..................................................24
2.9 Consumer Devices...........................................................................................26
2.10 Meeting the Challenge.....................................................................................28
References.................................................................................................................28
Wearable computers and head-mounted displays (HMDs) are in the press daily. Why
have they captured our imaginations now, when the technology has been available
for decades? While Fitbits fitness tracking devices are selling in the millions in
2014, what prevented FitSense (see Figure 2.5) from having similar success with
such devices in 2000? Since 1993 I have been wearing a computer with an HMD as
part of my daily life, and Reddy Information Systems had a commercial wearable
with Reflection Technologys Private Eye HMD in 1991 (Eliason 1992). Yet over
20years later, Google Glass is generating more excitement than any of those early
devices.
Many new classes of devices have followed a similar arc of adoption. The fax
machine was invented in 1846 but became popular over 130 years later. In 1994, the
IBM Simon touchscreen smartphone had many features familiar in today's phones,
but it was the Apple iPhone in 2007 that seized the public's imagination (Sager 2012).
Often, the perceived need for a technology lags behind innovation, and sometimes
developers can be surprised by the ways in which users run with a technology. When
the cellular phone was introduced in the early 1980s, who would have guessed that
increasingly we would use it more for texting than talking?
Some pundits look for a "killer app" to drive the adoption of a new class of device.
Yet that can be misleading. As of mid-2014, tablets are outselling laptops in Europe,
yet there is no single killer app that drives adoption. Instead, the tablet offers a different set of affordances (Gibson 1977) than the smartphone or the laptop, making
it more desirable in certain situations. For example, for reading in bed the tablet
is lighter than a laptop and provides an easier-to-read screen than a smartphone.
The tablet is controlled by finger taps and swipes that require less hardware and dexterity than trying to control a mouse and keyboard on a laptop, which also makes it
convenient for use when the user is in positions other than upright at a desk.
Wearable computers have yet a different set of affordances than laptops, tablets,
and smartphones. I often lie on a couch in my office, put the focus of my HMD at the
same depth as the ceiling, and work on large documents while typing using a one-handed keyboard called a Twiddler. This position is very comfortable, much more so
than any other interface I have tried, but students often think that they are waking me
when they walk into my office. In addition, I often use my wearable computer while
walking. I find it helps me think to be moving when I am composing, and no other
device enables such on-the-go use.
On-the-go use is one aspect of wearable computers that makes them distinct from
other devices. In fact, my personal definition of a wearable computer is any body-worn computer that is designed to provide useful services while the user is performing other tasks. Often the wearable's interface is secondary to a user's other tasks
and should require a minimum of user attention. Take, for example, a digital music
player. It is often used while a user is exercising, studying, or commuting, and the
interface is used in short bursts and then ignored.
Such a secondary interface in support of a primary task is characteristic of a
wearable computer and can be seen in smartwatches, some spacesuits, fitness monitors, and even smartphones for some applications. Some of these devices are already
commonplace. However, here I will focus on wearable computers that include an
HMD, as these devices are at the threshold of becoming popular and are perhaps
the most versatile and general-purpose class of wearable computers. Like all wearable computers, those based on HMDs have to address fundamental challenges in
networking (both on and off the body), power and heat, and mobile input. First I will
describe these challenges and show how, until recently, they severely limited what
types of devices could be manufactured. Then I will present five phases of HMD
development that illustrate how improvements in technology allowed progressively
more useful and usable devices.

2.1 NETWORKING
Turn-by-turn navigation, voice-based web search, and cloud-based office tools are
now commonplace on smartphones, but only in the past few years has the latency
of cellular networks been reduced to the point that computing in the cloud is effective. A decade ago, the throughput of a cellular network in cities like Atlanta could
be impressive, yet the latency would severely limit the usability of a user interface
depending on it. Today when sending a message, a Google Glass user might say,
"OK Glass, send a message to Thad Starner. Remember to pick up the instruction
manual," and the experience can be a seamless interplay of local and cloud-based
processing. The three commands "OK Glass," "send a message to," and "Thad Starner"
are processed locally because the speech recognizer simply needs to distinguish
between one of several prompts, but the message content "Remember to pick up
the instruction manual" requires the increased processing power of the cloud to be
recognized accurately. With an LTE cellular connection, the content is processed
quickly, and the user may barely notice a difference in performance between local
and remote services. However, with a GPRS, EDGE, or sometimes even an HSDPA
connection, the wait for processing in the cloud can be intolerable.
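To make the split concrete, here is a small, hedged Python sketch of the idea (not actual Glass code): the fixed prompts are matched against an on-device vocabulary, and only the open-ended message body is shipped to a cloud recognizer, which is the part that suffers when the network round trip is slow. The helper cloud_transcribe is a hypothetical stand-in for a remote speech service.

# Hedged sketch of hybrid local/cloud speech handling; not Glass source code.
# Fixed prompts are matched on-device, free-form content goes to a cloud recognizer.

LOCAL_PROMPTS = {"ok glass", "send a message to", "thad starner"}

def local_match(segment):
    """Cheap on-device check against a small, fixed prompt vocabulary."""
    return segment.strip().lower() in LOCAL_PROMPTS

def cloud_transcribe(audio_bytes):
    """Hypothetical stand-in for a large-vocabulary recognizer in the cloud.
    Over GPRS or EDGE this network round trip dominates the response time."""
    raise NotImplementedError("no network connection in this sketch")

def handle_command(prompt_segments, message_audio):
    """Route recognized prompts locally; defer only the message body to the cloud."""
    if not all(local_match(s) for s in prompt_segments):
        return {"action": "unrecognized"}
    try:
        body = cloud_transcribe(message_audio)
    except NotImplementedError:
        body = "<message body unavailable offline>"
    return {"action": "send_message", "body": body}

print(handle_command(["OK Glass", "send a message to", "Thad Starner"], b"..."))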
WiFi (IEEE 802.11) might seem a viable alternative to commercial cellular networks, but until 2000 open hotspots were rare. Wearable computers in the late 1990s
often used WiFi, but they required adapters that were the size of a small mobile
phone and required significant power. Today, a part of a single chip can provide this
service.
On-body networking has also been a challenge. Bluetooth (IEEE 802.15) was
originally intended as a replacement for RS232 connections on desktop PCs, not as a
body network. The standard was not designed with power as a foremost concern, and
even basic implementations were unstable until 2001. Only recently, with the widespread adoption of Bluetooth Low Energy by the major mobile phone manufacturers,
have wearable devices really had an appropriate body-centered wireless network.
Fundamental issues still remain. Both WiFi and Bluetooth use 2.4 GHz radio, which
is blocked by water and the human body. Thus, a sensor mounted in a shoe to monitor footfalls might have difficulty maintaining connection to an earbud that provides
information as to a runner's performance.
Most positioning systems also involve networks. For example, the location-aware
Active Badge system made by Olivetti Research Laboratory in 1992 used a network
of infrared receivers to detect transmissions from a badge to locate a wearer and to
unlock doors as the user approached them. When the user was walking through the
lab, the system could also re-route phone calls to the nearest phone (Want 2010).
Similarly, the Global Positioning System uses a network of satellites to provide
precisely synchronized radio transmissions that a body-worn receiver can use to
determine its position on the surface of the planet. Today, GPS is probably one of
the most commonly used technologies for on-body devices. It is hard to imagine
life without it, but before 2000, GPS was accurate to within 100 m due to the U.S.
military intentionally degrading the signal with Selective Availability. Turn-by-turn
directions were impossible. Today, civilian accuracy has a median open, outdoor
accuracy of 10 m (Varshavsky and Patel 2010). Modern GPS units can even maintain
connection and tracking through wooden roofs.
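To put those accuracy figures in context, the short Python sketch below uses the standard haversine formula to measure how far a GPS fix lands from a true position; the coordinates are invented for illustration, but the roughly 100 m error typical under Selective Availability is enough to put a pedestrian on the wrong block, while a 10 m error generally keeps turn-by-turn directions on the correct street.

import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points (mean Earth radius)."""
    r = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Invented coordinates for illustration only.
true_pos = (33.74900, -84.38800)
fix_sa = (33.74990, -84.38800)   # offset ~100 m north: pre-2000 Selective Availability error
fix_now = (33.74909, -84.38800)  # offset ~10 m north: typical modern open-sky error

print(round(haversine_m(*true_pos, *fix_sa)), "m error with Selective Availability")
print(round(haversine_m(*true_pos, *fix_now)), "m error today")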

2.2 POWER AND HEAT


In 1993, my first HMD-based wearable computer was powered by a lead-acid gel cell battery that massed 1.3 kg. Today, a lithium-ion camcorder battery stores the same amount of energy but weighs a quarter as much. While that seems like an impressive
improvement, battery life will continue to be a major obstacle to wearable technology, since improvements in battery technology have been modest compared to other
computing trends. For example, while disk storage density increased by a factor of
1200 during the 1990s, battery energy density only increased by a factor of three
(Starner 2003). In a mobile device, the battery will often be one of the biggest and
most expensive components. Since battery technology is unlikely to change during a
normal 18-month consumer product development cycle, the battery should be specified first, as it will often be the most constraining factor on the product's industrial design and will drive the selection of other components.
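As a rough illustration of battery-first design, the sketch below works backward from a target runtime to the implied capacity and cell mass; every power and density figure is an assumed placeholder rather than a measurement of any particular product.

# Back-of-the-envelope battery sizing: sum assumed average draws, pick a target
# runtime, and see what energy, capacity, and cell mass that implies.
AVERAGE_DRAW_MW = {"display": 180, "cpu_and_radio": 220, "sensors": 50}  # assumed values
TARGET_HOURS = 12
NOMINAL_VOLTS = 3.7             # typical lithium-ion cell voltage
ENERGY_DENSITY_WH_PER_KG = 200  # order-of-magnitude lithium-ion figure

total_draw_w = sum(AVERAGE_DRAW_MW.values()) / 1000.0
energy_wh = total_draw_w * TARGET_HOURS
capacity_mah = energy_wh / NOMINAL_VOLTS * 1000.0
cell_mass_g = energy_wh / ENERGY_DENSITY_WH_PER_KG * 1000.0

print(f"~{energy_wh:.1f} Wh -> ~{capacity_mah:.0f} mAh, roughly {cell_mass_g:.0f} g of cells")

Everything else in the industrial design then has to fit around whatever volume and mass that cell dictates.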
One of those components is the DC-DC power converter. A typical converter might accept between 3.4 and 4.2 V from a nominal 3.6 V lithium battery and produce several constant voltages for various components. One improvement in mobile consumer electronics that often goes underappreciated is the efficiency of DC-DC power converters. Before 2000, just the DC-DC converter for Google Glass could mass 30 g (Glass itself is 45 g), and the device might lose 30% of its power as heat. Today, switching DC-DC converters are often more than 95% efficient and mass just a few grams.
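The heat penalty of a less efficient converter is easy to see with a small calculation; the 1 W load below is an assumed figure chosen only to make the comparison concrete.

# Heat generated inside the converter itself for the same delivered load,
# at a roughly pre-2000 efficiency versus a modern switching efficiency.
load_w = 1.0  # assumed average power delivered to the electronics

for efficiency in (0.70, 0.95):
    input_w = load_w / efficiency   # power drawn from the battery
    heat_w = input_w - load_w       # the difference is dissipated in the converter
    print(f"{efficiency:.0%} efficient: {heat_w * 1000:.0f} mW lost as heat")

At 70% efficiency the converter alone wastes roughly 430 mW on this load; at 95% it wastes about 50 mW, which also shrinks the battery needed for a given runtime.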
Due to this efficiency improvement, there is a corresponding reduction in heat
production. Heat often limits how small a mobile device can be. A wearable device is
often in contact with a user's skin, and it must have enough surface area and ventilation
to cool, or it will have to throttle its performance considerably to stay at a comfortable
temperature for the user (Starner and Maguire 1999). This tension between performance and physical size can be quite frustrating to designers of wearable devices.
Users often desire small jewelry-like devices to wear but are also attracted to power-hungry services like creating augmented reality overlays with registered graphics or
transmitting video remotely. Yet in consumer products, fashion is the key. Unless the
consumer is willing to put on the device, it does not matter what benefits it offers, and
physical size and form are major components of the desirability of a device.
In practice, the design of a wearable device is often iterative. Given a battery
size, an industrial designer creates a fashionable package. That package should be
optimized in part for thermal dissipation given its expected use. Will the device
have the ability to perform the expected services and not become uncomfortable
to wear? If not, can the package be made larger to spread the heat, lowering
the temperature at the surface? Or can lower-heat alternatives be found for the
electronics? Unfortunately, many industrial design tools do not model heat, which tends to require highly specialized software. Thus, the iteration cycle between fashion and mechanical engineering constraints can be slow.
One bright spot in designing wearable computers is the considerable effort that
has been invested in smartphone CPUs and the concomitant power benefits. Modern
embedded processors with dynamic voltage scaling can produce levels of computing power equivalent to a late-1980s supercomputer in one instant and then, in the
next moment, can switch to a maintenance mode which draws milliwatts of power
while waiting for user input. Designing system and user software carefully for these
CPUs can have significant benefits. Slower computation over a longer period can
use significantly less power than finishing the same task at a higher speed and then
resting. This slow-and-steady technique has cascading benefits: power converters are
generally more efficient at lower currents, and lithium-ion batteries last longer with
a steady discharge than with bursty uses of power.
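The following sketch illustrates that trade-off under the usual simplified dynamic-power assumption (switching power roughly proportional to frequency times voltage squared, with voltage scaled down alongside frequency); the constants and operating points are illustrative assumptions, not measurements of any specific CPU.

# Compare "race to idle" (finish fast at a high voltage, then idle) with
# "slow and steady" (stretch the same work across the whole window at a low voltage).
WORK_CYCLES = 2.0e9   # assumed fixed workload
WINDOW_S = 2.0        # time available before the result is needed
IDLE_W = 0.005        # assumed idle draw

def dynamic_power_w(freq_hz, volts, k=4.0e-10):
    """Simplified CMOS model: P ~ k * f * V^2, with k an assumed effective constant."""
    return k * freq_hz * volts ** 2

def energy_j(freq_hz, volts):
    busy_s = WORK_CYCLES / freq_hz
    return dynamic_power_w(freq_hz, volts) * busy_s + IDLE_W * (WINDOW_S - busy_s)

race_to_idle = energy_j(2.0e9, 1.1)   # 1 s busy at the high operating point
slow_steady = energy_j(1.0e9, 0.8)    # busy for the full 2 s window at the low one
print(f"race-to-idle ~{race_to_idle:.2f} J, slow-and-steady ~{slow_steady:.2f} J")

In this toy model the stretched-out schedule uses roughly half the energy, before even counting the converter and battery effects mentioned above.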
Similarly, system software can exploit knowledge about its networking to help
flatten the battery load.
Wireless networking requires significant power when the signal is weak. For
non-crucial tasks, waiting for a better signal can save power and heat. Designing
maintenance and background tasks (e.g., caching email and social networking feeds)
to be thermally aware allows more headroom for on-demand interactive tasks. If the
wearable is thought of as a leaky cup, and heat as water filling it, then one goal is
to keep the cup as empty as possible at any given time so that when a power-hungry
task is required, we have as much space as possible to buffer the heat produced and
not overflow the cup.
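A minimal sketch of that leaky-cup idea as a gate for deferrable work follows; the capacity, leak rate, and signal threshold are assumed values, and a real device would track skin or package temperature rather than a simple energy counter.

# "Leaky cup" thermal budget: each task pours in heat, the cup drains over time,
# and deferrable work is postponed when the cup is too full or the radio is weak.
class ThermalBudget:
    def __init__(self, capacity_j=30.0, leak_w=0.5):
        self.capacity_j = capacity_j   # buffered heat tolerated before discomfort
        self.leak_w = leak_w           # assumed passive dissipation to skin and air
        self.level_j = 0.0

    def tick(self, dt_s):
        """Drain the cup as heat dissipates over dt_s seconds."""
        self.level_j = max(0.0, self.level_j - self.leak_w * dt_s)

    def try_run(self, task_heat_j, signal_dbm=-60.0, crucial=False):
        weak_radio = signal_dbm < -90.0  # sending now would cost extra power and heat
        if not crucial and (weak_radio or self.level_j + task_heat_j > self.capacity_j):
            return False                 # defer: keep headroom for interactive tasks
        self.level_j += task_heat_j
        return True

budget = ThermalBudget()
print(budget.try_run(5.0))                 # background sync fits: True
print(budget.try_run(40.0))                # would overflow the cup: False
print(budget.try_run(40.0, crucial=True))  # user-facing task runs regardless: True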

2.3 MOBILE INPUT


Wearable computing interfaces often aspire to be hands-free. This term is a bit of a
misnomer. What the user really wants is an interface that is unencumbering. A wristwatch that senses a wearer's gesture to decline a phone call or to change the track on a digital music player is certainly not hands-free, but it's clearly better for use while jogging than stopping to manipulate a touchscreen. Unfortunately, an on-the-go wearable user has reduced dexterity, eyesight, hearing, attention, and sense of touch compared with when stationary, which makes an unencumbering interface
design particularly challenging.
Speech interfaces seem like an obvious alternative, and with low-latency cellular networks and processing in the cloud, speech recognition on Android and iOS phones has become ubiquitous. Modern, big-data machine learning techniques are
enabling ever-better speech recognition. As enough examples of speech are captured
on mobile devices with a large variety of accents and background noises, recognition
rates are improving. However, dictating personal notes during a business conversation or a university class is not socially appropriate. In fact, there are many situations
in which a user might feel uncomfortable interacting with a device via speech. Thus,
mobile keyboards will continue to be a necessary part of mobile interfaces.
Unfortunately, today's mini-QWERTY and virtual keyboards require a lot of visual attention when mobile. A method of mobile touch typing is needed. To my knowledge, the Twiddler keyboard, first brought on the market in 1992, is still the fastest touch-typing mobile device. Learning the Twiddler requires half the learning time (25 h for 47 wpm on average) of the desktop QWERTY keyboard to achieve the greater-than-40 wpm required for high school typing classes (Lyons et al. 2006). Yet
the device remains a niche market item for dedicated users. Perhaps as more users
type while on-the-go and after the wireless Twiddler 3 is introduced, more people
will learn it. Such silent, eyes-free mobile text entry still remains an opportunity
for innovation, especially for any technology that can accelerate the learning curve.
Navigating interfaces while on-the-go also remains a challenge. Some self-
contained headsets use trackpads or simple d-pad interactions, but some users would
like a more subtle method of interaction. One option is to mount a remote controller elsewhere on the body and use Bluetooth Human Interface Device profiles for connection. In a series of studies, Bruce Thomas's group at the University of South Australia explored both what types of pointing devices are most effective while on-the-go and where they should be mounted (Thomas et al. 2002, Zucco et al. 2009). His results suggest that mini-trackpads and mini-trackballs can be highly effective, even while moving. Launched in 2002, Xybernaut's POMA wearable computer suggested another interesting variant on this theme. A user could run his finger over a wired,
upside-down optical mouse sensor to control a cursor. Perhaps with today's smaller and lower-power components, a wireless version could be made. More recently, Zeagler and Starner explored textile interfaces for mobile input (Komor et al. 2009, Profita et al. 2013), and a plethora of community-funded Bluetooth Human Interface
Devices are being developed, often focusing on rings and bracelets. One device will
not satisfy all needs, and there will be an exciting market for third-party interfaces
for consumer wearable computers.
Traditional windows, icons, menus, pointer (WIMP) interfaces are difficult to use
while on-the-go as they require too much visual and manual attention. Fortunately,
however, smartphones have broken the former monopoly on graphical user interfaces. Swipes, taps, and gestures on phone and tablet touchscreens can be made without much precision, and many of the features of Android and iOS can be accessed
through these cruder gestures. Yet these devices still require a flat piece of glass,
which can be awkward to manipulate while doing other tasks. Instead, researchers
and startups are spending considerable energy creating gestural interfaces using
motion sensors. Besides pointing, these interfaces associate gestures with particular
commands such as silencing a phone or waking up an interface. False triggering,
however, is a challenge in the mobile environment; an interface that keeps triggering
incorrectly throughout the user's workday is annoying at best.
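One common mitigation is to require the gesture's signature to persist for several consecutive samples and then to enforce a refractory period before the next trigger can fire; the sketch below uses assumed accelerometer-magnitude thresholds purely for illustration.

# Debounced gesture trigger: fire only after the signal stays above a threshold for
# enough consecutive samples, then ignore input briefly to avoid repeated triggers.
from collections import deque

class GestureTrigger:
    def __init__(self, threshold_g=2.5, min_samples=5, refractory_samples=25):
        self.threshold_g = threshold_g    # assumed magnitude of a deliberate flick
        self.min_samples = min_samples    # consecutive samples required to accept
        self.refractory_samples = refractory_samples
        self.window = deque(maxlen=min_samples)
        self.cooldown = 0

    def update(self, accel_magnitude_g):
        """Feed one accelerometer sample; returns True when a gesture is accepted."""
        if self.cooldown > 0:
            self.cooldown -= 1
            return False
        self.window.append(accel_magnitude_g > self.threshold_g)
        if len(self.window) == self.min_samples and all(self.window):
            self.cooldown = self.refractory_samples
            self.window.clear()
            return True
        return False

trigger = GestureTrigger()
samples = [0.9, 3.1, 1.0] + [3.2] * 5         # a brief spike, then a sustained motion
print([trigger.update(s) for s in samples])   # only the sustained motion triggers

Tuning the persistence and refractory windows trades responsiveness against the number of accidental activations over a workday.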

2.4 DISPLAY
While visual displays often get the most attention, auditory and tactile displays are
excellent choices for on-the-go users. Almost all mobile phones have a simple vibration motor to alert the user to an incoming call. Unfortunately, a phone vibrating in a
pants pocket or purse can be hard to perceive while walking. In the future, I expect
the closer contact with the skin made available by smartwatches to enable more reliable and expressive tactile interfaces than a simple on/off vibration motor.
Audio displays are another good choice for on-the-go interaction. Smartphones
and mobile music players are almost always shipped with earbuds included, but there
is much room for innovation. Bone conduction, such as is used with Google Glass
and by the military and professional scuba divers, allows the wearer to hear notifications from the computer without blocking the ear canals. Ambient audio interfaces
(Sawhney and Schmandt 2000) allow the wearer to monitor information sources,
like the volume of stock market trading, for sudden changes without devoting much
attention to the process. Rendering audio in 3D can help the user monitor several
ambient information sources at once or can improve the sense of participant presence during conference calls.
HMDs can range from devices meant to immerse the user in a synthetic reality to a device with a few lights to provide feedback regarding the wearer's performance while biking. HMDs can be created using lasers, scanning mirrors, holographic optics, LCDs, CRTs, and many other types of technologies. For any given HMD,
design trade-offs are made between size, weight, power, brightness and contrast,
transparency, resolution, color, eyebox (the 3D region in which the eye can be placed
and still see the entire display in focus), focus, and many other factors. The intended
purpose of the HMD often forces very different form factors and interactions.
For the purpose of discussion, I've clustered these into five categories: virtual reality, portable video viewers, industrial wearable systems, academic/maker wearables for everyday use, and consumer devices. See Kress et al. (2014) for a more technical discussion of the typical optics of these types of displays.

2.5 VIRTUAL REALITY


In the late 1980s and early 1990s, LCD and CRT displays were large, heavy, power-hungry, and required significant support electronics. However, it was during this
time that virtual reality was popularized, and by the mid-1990s, HMDs began to
be affordable. VPL Research, Virtual Research, Virtual I/O, Nintendo, and many
others generated a lot of excitement with virtual reality headsets for professionals
and gamers (Figure 2.1). An example of an early professional system was the 1991
Flight Helmet by Virtual Research. It has a 100-degree diagonal field of view and 240 × 120 pixel resolution. It weighs 1.67 kg and uses 6.9 cm LCD screens with LEEP Systems' wide-angle optics to provide an immersive stereoscopic experience. For its era, the Flight Helmet was competitively priced at $6000. Subsequent Virtual Research devices employed smaller lenses and a reduced field of view to save weight and cost. By 1994, the LCDs in the company's VR4 had twice the resolution at half

FIGURE 2.1 Virtual reality HMDs. (a) Virtual Research's Flight Helmet (1991, $6000). (b) Nintendo Virtual Boy video game console (1995, $180). (c) Virtual i-O i-glasses! Personal
3D viewer head-mounted display (1995, $395). (d) Oculus Rift DK1 (2013, $300). (Images
courtesy of Tavenner Hall.)

the size. However, with today's lighter-weight panels and electronics, the Oculus Rift Developer Kit 1 slightly surpasses the original Flight Helmet's field of view and has 640 × 480 pixel resolution per eye while weighing 379 g. The biggest difference between 1991 and today, though, is the price: the Rift DK1 is only $300, whereas the Flight Helmet, adjusted for inflation, would be the equivalent of over $10,000 today.
The 1995 Nintendo Virtual Boy game console is an interesting contrast to the
Flight Helmet. It costs $180, and with over a million devices sold, it ranks among
the largest-selling HMDs. The Virtual Boy introduced many consumers to immersive gameplay. It is portable and includes the full computing system in the headset
(the wired controller includes the battery pack for the device). As a table-top head
display, the Virtual Boy avoids the problem of too much weight on the head, but it
has no possibility of head tracking or the freedom of motion available with most
VR headsets. It uses Reflection Technology's scanning, mirror-style, monochromatic display in which a column of 224 LEDs is scanned across the eye with an oscillating mirror as the LEDs flash on and off, creating an apparent 384 × 224 pixel resolution
display with persistence of vision. Unlike many consumer VR devices, the Virtual
Boy provides adjustments for focus and inter-eye distance. Still, some users quickly
complain of simulation sickness issues.
Minimizing head weight and simulation sickness continues to be a major concern
with modern VR HMDs. However, power and network are rarely a concern with these
devices, since they are mostly for stationary use and attach to desktop or gaming
systems. The user controls the experience through head tracking and instrumented
gloves as well as standard desktop interfaces such as keyboards and joysticks. While
these VR HMDs are not wearables by my definition, they are examples of early
major efforts in industrial and consumer devices and share many features with the
next class of device, mobile video viewers.

2.6 PORTABLE VIDEO VIEWERS


By the late 1990s, camcorder viewfinders were a major market for small LCD panels. Lightweight and inexpensive LCD-based mobile HMDs were now possible
(Figure 2.2). Unfortunately, there were no popular mobile computing devices that
could output images or videos. Smartphones would only become prevalent after 2007,
and most did not have the capability for controlling an external screen. The video iPod could play video, but it would not be released until 2005. Instead, HMD manufacturers started focusing on portable DVD players for entertaining the traveler.
In-seat entertainment systems were rare, so manufacturers envisioned a small HMD
connected to a portable DVD player, which allowed the wearer to watch a movie
during a flight or car ride. Networking was not required, and user input consisted of
a few button presses. Battery life needed to be at least 2 h and some devices, like the
Eyetop (Figure 2.2), offered packages with an external battery powering both the display and the DVD player. With the Glasstron HMD (and the current HMZ line), Sony
was more agnostic about whether the device should be used while mobile or at home.
One concept was that the headsets could be used in place of a large screen television
for those apartments with little space. However, the Glasstron line did include a place
to mount Sony's rechargeable camcorder batteries for mobile usage.

FIGURE 2.2 Portable video viewers first concentrated on interfacing with portable DVD
players, then flash-based media players like the video iPod, and most recently started integrating enough internal memory to store movies directly. (a) Sony Glasstron PLM-A35 (2000,
$499). (b) Eyetop Centra DVD bundle (2004, $599). (c) MyVu Personal Viewer (2006, $270).
(d) Vuzix iWear (2008, $250). (e) Vuzix Wrap 230 (2010, $170). (f) Epson Moverio BT-100
(2012, $700). (Images courtesy of Tavenner Hall.)

As small, flash memory-based mobile video players became common, portable
video viewers became much more convenient. Companies such as MyVu and Vuzix
sold several models and hundreds of thousands of devices (Figure 2.2), with the
units even making appearances in vending machines at airports. Modern video
viewers, like the Epson Moverio, can be wireless, having an internal battery and
using a micro-SD reader or internal memory for loading the desired movie directly
to the headset.

The Moverio BT-100 (Figure 2.2) is especially interesting as it sits astride three
different classes of device: portable video viewer, industrial wearable, and consumer
wearable. It is self-contained, two-eyed, 2D or 3D, and see-through and can run standard Android applications. It has WiFi and a removable micro-SDHC for loading
movies and other content. Its battery and trackpad controller are in a wired pendant, giving it ease of control and a good battery life. Unfortunately, the HMD itself is a bit bulky and the nose weight is too high, both problems the company is trying to address with the new BT-200 model.
Unlike the modern Moverio, many older devices do not attempt 3D viewing, as
simulator sickness was a potential issue for some users and 3D movies were uncommon until the late 2000s. Instead, these displays play the same image on both eyes,
which can still provide a high-quality experience. Unfortunately, video viewers suffer a certain apathy from consumers. Carrying the headset in addition to a smartphone or digital video player is a burden, and most consumers prefer watching movies on their pocket media players and mobile phones instead of carrying the extra bulk
of a video viewer. An argument could be made that a more immersive system, like an
Oculus Rift, would provide a higher quality experience that consumers would prefer,
but such a wide field of view system is even more awkward to transport. Studies on
mobile video viewing show diminishing returns in perception of quality above 320 × 240 pixel resolution (Weaver et al. 2010), which suggests that once video quality is
good enough, the perceived value of the video system will be more determined by
other factors such as convenience, ease-of-use, and price.

2.7 INDUSTRIAL WEARABLE SYSTEMS


Historically, industrial HMD-based wearable computers have been one-eyed with an
HMD connected to a computer module and battery mounted on the waist (Figure 2.3).
Instead of removing the user from reality, these systems are intended to provide
computer support while the wearer is focused on a task in the physical world such as
inspection, maintenance, repair, and order picking. For example, when repairing a
car, the HMD might show each step in a set of installation instructions.
Improvements in performance for industrial tasks can be dramatic. A study performed at Carnegie Mellon University showed that during Army tank inspections,
an interactive checklist on a one-eyed HMD can cut in half the required personnel
and reduce the required time for completing the task by 70% (Siewiorek et al. 2008).
For order picking, a process during which a worker selects parts from inventory to
deliver to an assembly line or for an outgoing package to a customer, a graphical
guide on a HMD can reduce pick errors by 80% and completion time by 38% over
the current practice of using paper-based parts lists (Guo et al. 2014).
Some HMD uses provide capabilities that are obviously better than current
practice. For instance, when testing an electrical circuit, technicians must often
hold two electrical probes and a test meter. Repairing telephone lines adds the
extra complication of clinging to a telephone pole at the same time. The Triplett
VisualEYEzer 3250 multimeter (Figure 2.3) provides a head-up view of the meter's display, allowing the user to hold a probe in each hand. The result is that the technician can test circuits more quickly and is better able to handle precarious

FIGURE 2.3 Wearable systems designed for industrial, medical, and military applications.
(a) Xybernaut MA-IV computer (1999, $7500). (b) Triplett VisualEYEzer 3250 multimeter
(2000, $500). (c) Xybernaut MA-V computer (2001, $5000). (d) Xybernaut/Hitachi VII/
POMA/WIA computer (2002, $1500). (e) MicroOptical SV-6 display (2003, $1995). (f)Vuzix
Tac-Eye LT head-up display (2010, $3000). (Images courtesy of Tavenner Hall.)

situations. In the operating room, anesthesiologists use HMDs in a similar way. The HMD overlays vital statistics on the doctor's visual field while monitoring the patient (Liu et al. 2009). Current practice often requires anesthesiologists to divert their gaze to monitors elsewhere in the room, which reduces the speed at which dangerous situations are detected and corrected.
With more case studies showing the advantages of HMDs in the workplace,
industry has shown a steady interest in the technology. From the mid-1990s to after
2000, companies such as FlexiPC and Xybernaut provided a general-purpose line
of systems for sale. See Figure 2.3 for the evolution of Xybernaut's line. Meanwhile,
specialty display companies like MicroOptical and Vuzix (Figure 2.3) made displays
designed for industrial purposes but encouraged others to integrate them into systems
for industry. User input to a general purpose industrial system might be in the form
of small vocabulary, isolated-word speech recognition; a portable trackball; a dial;
or a trackpad mounted on the side of the main computer. Wireless networking was
often by 802.11 PCMCIA cards. CDPD, a digital standard implemented on top of
analog AMPS cellular service, was used when the wearer needed to work outside of
the corporate environment. Most on-body components were connected via wires, as
wireless Bluetooth implementations were often unstable or non-existent. Industrial
customers often insisted on Microsoft Windows for compatibility with their other
systems, which dictated many difficult design choices. Windows was not optimized
for mobile use, and x86 processors were particularly bad at power efficiency. Thus,
wearables had to be large to have enough battery life and to dissipate enough heat
during use. The default Windows WIMP user interface required significant hand-eye
coordination to use, which caused wearers to stop what they were doing and focus on
the virtual interface before continuing their task in the physical world. After smartphones and tablets introduced popular, lighter-weight operating systems and user
interfaces designed for grosser gesture-based interactions, many corporate customers began to consider operating systems other than Windows. The popularization of
cloud computing also helped break the Windows monopoly, as corporate customers
considered wearables as thin client interfaces to data stored in the wireless network.
Today, lightweight, self-contained Android-based HMDs like Google Glass,
Vuzix M100, and Optinvent ORA are ideal for manufacturing tasks such as order
picking and quality control, and companies like APX-Labs are adapting these devices
to the traditional wearable industrial tasks of repair, inspection, and maintenance.
Yet many opportunities still exist for improvements; interfaces are evolving quickly,
but mobile input is still a fundamental challenge. Switching to a real-time operating system could help with better battery life, user experience, weight, cost, system
complexity, and the number of parts required to make a full machine. One device is
not suitable for all tasks, and I foresee an array of specialized devices in the future.

2.8 ACADEMIC/MAKER SYSTEMS FOR EVERYDAY USE


Industrial systems focused on devices donned like uniforms to perform a specific
task, but some academics and makers started creating their own systems in the early
1990s that were intended for everyday private use. These devices were worn more
like eyeglasses or clothing. Applications included listening to music, texting, navigation, and scheduling: apps that became mostly the domain of smartphones 15 years later. However, taking notes during classes, meetings, and face-to-face conversations
was a common additional use of these devices beyond what is seen on smartphones
today. Users often explained that having the devices was like having an extra brain
to keep track of detailed information.
Audio and visual displays were often optimized for text, and chording keyboards
such as a Twiddler (shown in Figure 2.4b) or any of the 7- or 8-button chorders
(shown in Figure 2.4a) enabled desktop-level touch typing speeds. Due to the use of

FIGURE 2.4 Some wearable computers designed by academics and makers focused on
creating interfaces that could be used as part of daily life. (a) Herbert 1, designed by Greg
Priest-Dorman in 1994. (b) Lizzy wearable computer, designed by Thad Starner in 1995
(original design 1993). (c) MIThril, designed by Rich DeVaul in 2000. (d) CharmIT, designed
as a commercial, open-hardware wearable computing kit for the community by Charmed,
Inc. in 2000. (Images courtesy of Tavenner Hall.)

lighter-weight interfaces and operating systems, battery life tended to be better than that of the industrial counterparts. Networks included analog dial-up over cellular, amateur
radio, CDPD, and WiFi as they became available. The CharmIT, Lizzy, and Herbert
1 concentrated the electronics into a centralized package, but the MIThril and
Herbert 3 (not shown) distributed the electronics in a vest to create a more balanced
package for wearing.
Displays were mostly one-eyed and opaque, depending on the illusion in the
human visual system by which vision is shared between the two eyes. These displays appear see-through to the user because the image from the occluded eye and
the image of the physical world from the non-occluded eye are merged to create a
perception of both. In general, opaque displays provide better contrast and brightness than transparent displays in daylight environments. The opaque displays might
be mounted up and away from the main line of sight or mounted directly in front
of the eye. Reflection Technology's Private Eye (Figure 2.4b) and MicroOptical's displays (Figure 2.4d) were popular choices due to their relatively low power and
good sharpness for reading text. Several of the everyday users of these homebrew
machines from the 1990s would later join the Google Glass team and help inform
the development of that project.

2.9 CONSUMER DEVICES


Consumer wearable computers are fashion and, above all, must be designed as such.
Unless a user is willing to put on the device, it does not matter what functionality it
promises. Making a device that is both desirable and fashionable places constraints
on the whole system: the size of the battery, heat dissipation, networking, input, and
the design of the HMD itself.
Consumer wearable computers often strive to be aware of the user's context, which requires leveraging low-power modes on CPUs, flash memory, and sensors to monitor the user throughout the day. As opposed to explicit input, these devices may sense the wearer's movement, location, and environmental information in the background.
For example, the Fitbit One (Figure 2.5), clipped on to clothing or stored in a pocket,
monitors steps taken, elevation climbed, and calories burned during the day. This
information is often uploaded to the cloud for later analysis through a paired laptop
or phone using the One's Bluetooth LE radio. The Fitsense FS-1 from 2000 had a similar focus but also included a wristwatch so that the user could refer to his statistics quickly while on-the-go. Since Bluetooth LE did not yet exist when the FS-1 was
created, it used a proprietary, low-power, on-body network to communicate between

FIGURE 2.5 As technology improves, consumer wearable devices continue to gain acceptance. (a) Fitsense heart band, shoe sensor, and wristwatch display (2000, $200). (b) Fitbit
One (2012, $100). (c) Recon MOD Live HMD and watch band controller for skiing (2011,
$400). (d) 2012 Ibex Google Glass prototype. Released Glass Explorer edition (2014, $1500).
(Images courtesy of Tavenner Hall.)

its different components as well as a desktop or laptop. This choice was necessary
because of battery life and the lack of stability of wireless standards-based interfaces
at the time, and it meant that mobile phones could not interface with the device. Now
that Bluetooth LE is becoming common, an increasing number of devices, including
the Recon MOD Live and Google Glass (Figure 2.5), will leverage off-body digital
networks by piggybacking on the connection provided by a smartphone.
Both consumer wristwatches, such as FS-1, and HMDs, such as Recon MOD
and Google Glass, can provide information to the wearer while on the go. Because
these displays are fast to access, they reduce the time between when the user first has the intention to check some information and the action of doing so. Whereas mobile phones might take 23 s to access (to physically retrieve, unlock, and navigate to the appropriate application), wristwatches and HMDs can shorten that delay to only a couple of seconds (Ashbrook et al. 2008). This reduction in time from intention to action allows the user to glance at the display, much like the speedometer in a car's dashboard, and get useful information while performing other tasks.
An HMD has several advantages over a wristwatch, one of which is that it can be
actually hands-free. By definition, a wristwatch requires at least one arm to check
the display and often another hand to manipulate the interface. However, such manual control is easy for the user to learn, is precise, and can be subtle. HMDs are also
mounted closer to the wearer's primary senses of sight and hearing. This location provides a unique first-person view of the world, matching the user's perspective. One use, of course, is pairing the HMD with a camera so that the user can capture what he sees while on-the-go. Being mounted on the head can also allow HMD-based systems to sense many signals unavailable to a wrist interface, including head
motion, eye blinks, eye movement, and even brain signals in ideal circumstances.
On the other hand, a wrist-mounted system can sense the user's hand motions and may even be able to distinguish different types of actions and objects by their sounds (Ward et al. 2006).
The Recon MOD Live takes advantage of both approaches, pairing a wrist-mounted
controller with an opaque HMD and Android computer mounted in a compatible pair
of goggles. The system is designed for use while skiing to provide information like
location, speed, descent, and jump airtime. With the HMD, status information can
be provided in a head-up manner with little to no control required by the user. The information can be shared with others via a Bluetooth connection to a smartphone.
When the user has more attention (and hands) to spare, he can use the wrist interface
to select and scroll through text messages, select music to play, or interact with apps.
Google Glass, another Android-based wearable HMD, uses head motion, speech,
and a multi-touch trackpad on one earpiece for its input. Networking is via 802.11
WiFi or tethering to the user's phone over Bluetooth. The display is transparent and
mounted high. It is easily ignorable and designed for short microinteractions lasting
a few seconds. This focus on microinteractions helps preserve battery life. Common
uses include texting, email, weather, clock, turn-by-turn directions, stock quotes,
calendar, traffic, remembering one's parking location, pictures, videos (10 s in length
by default), and suggestions for restaurants, events, tourist spots, and photo spots.
Glass's interface is designed to be used throughout the day and while on-the-go.
For example, if the user is walking and a text arrives, Glass alerts the user with a
sound. If the user ignores the alert, nothing happens. However, if the user tilts his head up, the screen lights up to show the text. The user can read and dismiss it with a nudge of his head upward. Alternatively, the user can say "OK Glass, reply" and dictate a response. Because Glass displays a limited amount of text on the screen at once, interactions are short or broken into multiple small interactions. Ideally, such on-the-go interactions should be around four seconds or less (Oulasvirta et al. 2005) to help keep the user focused in the physical world.
Glass is also designed to interfere as little as possible with the user's senses. Not only is the display mounted high enough that it keeps both pupils unobstructed for full eye contact while the user is conversing with another person, but sound is rendered by a bone conduction transducer, which sends sound through the user's head directly to the cochlea. The ears are kept clear so that the user maintains normal,
unobstructed, binaural hearing.
Both the Recon MOD Live and Google Glass are monocular systems with
relatively small fields of view. This design choice minimizes size and weight, in particular the weight supported by the nose. Comfort is more important than features
when designing something intended to be worn for an extended period of time, and
current large field of view displays burden the nose and face too much.

2.10 MEETING THE CHALLENGE


The challenges of networking, power and heat, display, and mobile input will continue
for wearable computing for the foreseeable future. However, with improvements in
optics technology, electronics miniaturization, and network standards, self-contained
HMD-based wearable computers can be relatively minimal devices that are comfortable to wear all day. Now that the historical challenges are being addressed, the field
of wearable computing is being confronted with too many opportunities. It will take
ten years and many companies to capitalize on the potential, but I hope to see the
same sort of revolution and improvements in efficiency and lifestyle that happened
around the PC and the smartphone. The challenge now is in taking advantage of this
new way to augment humanity.

REFERENCES
Ashbrook, D., J. Clawson, K. Lyons, T. Starner, and N. Patel. Quickdraw: The impact of mobility and on-body placement on device access time. In: ACM Conference on Human Factors in Computing Systems (CHI), Florence, Italy, April 2008, pp. 219–222.
Eliason, F. A wearable manual called red. New York Times, March 29, 1992, 7.
Gibson, J. The theory of affordances. In: Perceiving, Acting, and Knowing, R. Shaw and J. Bransford (eds.). Erlbaum: Hillsdale, NJ, 1977, pp. 67–82.
Guo, A., S. Raghu, X. Xie, S. Ismail, X. Luo, J. Simoneau, S. Gilliland, H. Baumann, C. Southern, and T. Starner. A comparison of order picking assisted by head-up display (HUD), cart-mounted display (CMD), light, and paper pick list. In: IEEE ISWC, Seattle, WA, September 2014, pp. 71–78.
Komor, N., S. Gilliland, J. Clawson, M. Bhardwaj, M. Garg, C. Zeagler, and T. Starner. Is it gropable? Assessing the impact of mobility on textile interfaces. In: IEEE ISWC, Linz, Austria, September 2009, pp. 71–74.
Kress, B., E. Saeedi, and V. Brac-de-la-Perriere. The segmentation of the HMD market: Optics for smart glasses, smart eyewear, AR and VR headsets. In: Proceedings of the SPIE 9202, Photonics Applications for Aviation, Aerospace, Commercial, and Harsh Environments V, San Diego, CA, September 5, 2014, p. 92020D.
Liu, D., S. Jenkins, and P. Sanderson. Clinical implementation of a head-mounted display of patient vital signs. In: IEEE ISWC, Linz, Austria, September 2009, pp. 47–54.
Lyons, K., T. Starner, and B. Gain. Experimental evaluations of the Twiddler one-handed chording mobile keyboard. HCI Journal 21(4), 2006, 343–392.
Oulasvirta, A., S. Tamminen, V. Roto, and J. Kuorelahti. Interaction in 4-second bursts: The fragmented nature of attentional resources in mobile HCI. In: ACM CHI, Portland, OR, 2005, pp. 919–928.
Profita, H., J. Clawson, S. Gilliland, C. Zeagler, T. Starner, J. Budd, and E. Do. Don't mind me touching my wrist: A case study of interacting with on-body technology in public. In: IEEE ISWC, Zurich, Switzerland, 2013, pp. 89–96.
Sager, I. Before iPhone and Android came Simon, the first smartphone. Bloomberg Businessweek, June 29, 2012, http://www.bloomberg.com/bw/articles/2012-06-29/before-iphone-and-android-came-simon-the-first-smartphone (Accessed March 17, 2015).
Sawhney, N. and C. Schmandt. Nomadic radio: Speech and audio interaction for contextual messaging in nomadic environments. ACM Transactions on Computer–Human Interaction (TOCHI) 7(3), 2000, 353–383.
Siewiorek, D., A. Smailagic, and T. Starner. Application Design for Wearable Computing, Synthesis Lecture Series Monograph. Morgan & Claypool, San Rafael, CA, 2008.
Starner, T. Powerful change part 1: Batteries and possible alternatives for the mobile market. IEEE Pervasive Computing 2(4), 2003, 86–88.
Starner, T. and Y. Maguire. Heat dissipation in wearable computers aided by thermal coupling with the user. ACM Journal on Mobile Networks and Applications (MONET), Special Issue on Wearable Computers 4(1), 1999, 3–13.
Thomas, B., K. Grimmer, J. Zucco, and S. Milanese. Where does the mouse go? An investigation into the placement of a body-attached touchpad mouse for wearable computers. Personal and Ubiquitous Computing 6, 2002, 97–112.
Varshavsky, A. and S. Patel. Location in ubiquitous computing. In: Pervasive Computing, J. Krumm (ed.). CRC Press: Boca Raton, FL, 2010, pp. 285–319.
Want, R. An introduction to ubiquitous computing. In: Pervasive Computing, J. Krumm (ed.). CRC Press: Boca Raton, FL, 2010, pp. 1–35.
Ward, J., P. Lukowicz, G. Troester, and T. Starner. Activity recognition of assembly tasks using body-worn microphones and accelerometers. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI) 28(10), 2006, 1553–1567.
Weaver, K., T. Starner, and H. Hamilton. An evaluation of video intelligibility for novice American Sign Language learners on a mobile device. In: ACM ASSETS, Orlando, FL, October 2010, pp. 107–114.
Zucco, J., B. Thomas, K. Grimmer-Somers, and A. Cockburn. A comparison of menu configurations and pointing devices for use with wearable computers while mobile and stationary. In: IEEE ISWC, Linz, Austria, 2009, pp. 63–70.

3 Intimacy and Extimacy: Ethics, Power, and Potential of Wearable Technologies

Patricia Flanagan, Despina Papadopoulos, and Georgina Voss
CONTENTS
3.1 Introduction..................................................................................................... 32
3.2 Future Scenarios: Ethical and Speculative Implications of How Our
Embodied Materiality Is Affected by Emerging Technologies....................... 33
3.2.1 Garment as Anchor..............................................................................34
3.2.2 Start with Value................................................................................... 35
3.2.3 Think about the System....................................................................... 35
3.2.4 Requirements and Specifications Are for Humans Too...................... 35
3.2.5 Prototypes and Iterative Design........................................................... 36
3.2.6 Experimenting with the Future, Questioning the Present................... 36
3.2.7 Coloring............................................................................................... 37
3.2.8 Life As We Know It: The Qualified Self........................................... 37
3.3 Self and the Social Politic of Wearable Technologies..................................... 39
3.3.1 Personal Technologies, Regional Innovation....................................... 39
3.3.2 Quantifying the Intended User............................................................40
3.3.3 Tracking in the Factories..................................................................... 42
3.3.4 Bodies at Work.................................................................................... 43
3.4 Synaptic Sculpture: Vibrant Materiality and the Interconnected Body.......... 43
3.4.1 Sperm, Stars, and Human-Centric Perception.................................... 43
3.4.2 Inversion of the Design Process.......................................................... 45
3.4.3 Bridging Materiality and Information.................................................46
3.4.4 Merger of the Body and Technology................................................... 48
3.4.5 Conclusion: Synthesis and Synaptics................................................... 52
References................................................................................................................. 53

3.1 INTRODUCTION
The chapter is founded on the premise that current wearable technology design
practices represent a reductionist view of human capacity. The democratization of
technology into work, play, home, and mobile social networks in recent years has
seen traditional human-computer interaction (HCI) design methodology broadened
through the integration of other methodologies and knowledge from the humanities
such as social science, anthropology, and ethnography. The field of HCI is inherently
interdisciplinary and its history is one of the inevitable disciplinary multiculturalisms spawned by the expansive impact of technological growth.
What questions should we be asking to engage a more critical design perspective? This chapter extends traditional functionalist approaches to design to engage
cultural, experience-based, and techno-futurist approaches. Wearable technologies
are therefore discussed in terms of their critical, political, ethical, and speculative potential, and case studies are presented to illustrate and exemplify the ideas
promulgated.
The chapter is organized into three sections. The first section proposes the role
of the designer to be one that includes a cultural approach to designing future scenarios, one that considers ethical and speculative implications of how our embodied materiality is affected by emerging technologies. What is the relationship of the self to the proliferating wearable technologies? How is our sense of
self changing as new technologies mediate the space between our experience of
self and the world? We develop a methodology that asks designers and technologists to build future scenarios and envision how our embodied materiality is
affected by emerging technologies. Using a philosophical framework we explore
design and its implications on the relationship of the self to the self and to social
relationships. We then investigate how technologies such as Google Glass and Quantified Self applications inform our relationship to our self and redefine our
social interactions.
The second section discusses the self and the social politic of wearable technologies from macro to micro perspectives, considering wider supply and production chains and regulatory systems whose existence shapes the production and meaning of wearables: both their material form and design, and the movement of gathered data from the body into wider dispersed networks of power. Moving from the micro
(technology/body) to the macro (systems of production), we consider where control
lies across these networks, at what unit of analysis, and what their impact could be
on the wider world as they are dispersed.
The final section adopts a techno futurist approach proposing synaptic sculpture
as a process for creative design that engages vibrant materiality and the interconnected body. The section describes the emergence of a new paradigm in terms of
our augmented perspective: our perception of scale expanding our awareness and sensitivity across macro- and nanolevels. These new spheres of awareness become our normative environment, ones with an amplified awareness of the instability, fungibility, and interconnectedness of things. This perspective promulgates the space of design to be in the interface as mediator of experience, rather than design of objects or products. We propose the need to develop a connoisseurship of somesthetic
qualities surrounding the design of wearables. This subverts the traditional fashion
design methodology away from the trickle-down theory to one that can enhance a relationship between designer and user, who can become coproducers, and connects materiality to anthropology and the lived experience of the individual.

3.2 FUTURE SCENARIOS: ETHICAL AND SPECULATIVE IMPLICATIONS OF HOW OUR EMBODIED MATERIALITY IS AFFECTED BY EMERGING TECHNOLOGIES
What is the relationship of the self to the proliferating wearable technologies? How
is our sense-of-self changing as new technologies mediate the space between our
experience of self and the world?
As we create more and more wearable devices, it is important that we also
develop a methodology that asks designers and technologists to build future scenarios and envision how our embodied materiality is affected by emerging technologies.
Wearable environments are laden with symbolic, cultural, and emotional meaning
and therefore provide a unique space to investigate questions of physicality, presence, and intimacy. In many ways the wearable environment is that interface that
connects, and at the same time creates a boundary with the world.
Deborah Cohen, in an article for The Atlantic, "Why we look the way we look now" (Cohen, 2014), writes:
Look closely at the emergence of our modern style, and you can see politics in the
fabric seams. Economic collapse and the search for social unity – the conditions that made the New Deal possible – created an unlikely alignment of tastes. Streamlined
clothes appealed to the still prosperous, anxious to hide their wealth, and to the downwardly mobile, who hoped to conceal their slide.

Our clothing has always expressed our relationship to social structures and to the
ways we perceive others and want to be perceived by them. It also reflects ideological relationships not only to means of production (the industrial revolution, after all, ultimately presaged ready-to-wear and a democratization of access to fashion) but also to morality. When the zipper was introduced to male pants in 1901, critics at the time considered it a sign of moral decline. Similarly, corsets, high heels, and casual
Fridays all exemplify our collective attitude toward capability, physicality, and the
way we engage with the world and others.
Today, as we are developing a new range of wearable devices, it would be instructive to use a framework that explores design and its implications for the relationship of the self to the self and to social relationships. This same
framework can be used to investigate how technologies like Google Glass and
Quantified Self applications inform our relationship to our self and redefine our
social interactions.
In the past 20 years we have seen increased development in the realm of wearable technologies. From the early MIT experiments of a cyborgian self (spearheaded by Steve Mann and Thad Starner) to today's Google Glass and Quantified Self
applications, the focus has been on a singular vision of what it means to be human.

Part science fiction, part optimized efficiency, these visions tend to a reductionism of what creates meaning, understanding, awareness, and being in the world.
Counter to this approach, designers experiment with fashion's potential for self-expression, resulting in a series of projects that have focused on the use of light, experimentation with textile technologies, and refocusing notions of connectivity and intimacy. While these experimentations look more critically into the potential uses of technology as an agent of expression and of investigating the interactional possibilities of the wearable environment, they mostly focus on aesthetic potentialities, and no rigorous design methodology has emerged that can be applied to the
development of future devices.
As our uses of technology and the devices that surround us are now a defining
part of our material culture, we need to critically consider what we want our culture
to evolve toward and the ways in which these technologies will mediate the space
between ourselves, others, and with our increasingly complex environments. The
potential of technology to create new materialities has been eclipsed by virtuality,
a causal (and casual) relationship to self and others, and a mostly materialistic and
reductionistic relationship to data and their meaning. Critical design and even ethics
have an important role to play in reframing our uses of technology and in developing a
methodology for design and the building of tangible, wearable devices and interfaces.
Considering foundational questions of ethics and human capabilities, as well as definitions of usefulness and its relationship to design, and a design methodology centered on explorations of physicality, mindfulness, and notions of sociability, can help us ensure more thoughtful and useful applications. Central to
the discourse of ethics are questions of capability, responsibility, and our relationship
to the social. More fundamentally, ethics asks us to consider what is a life worth
living and how we create meaning for ourselves. Can these questions be incorporated in a design and product development practice? In many ways they are already
tacitly, inevitably, incorporated, but often are not specified in an articulated manner.
As we build specification matrices, we can include these fundamental questions to expand the horizon of what wearable devices can offer and how they create meaningful experiences.
What are the responsibilities of designers and technologists when dealing with
the intersecting nodes and dwellings of existence? What are the future worlds we
want to build and inhabit? Why are physicality and tangibility important and what
has design offered in the thinking of our relationship with ourselves, others, and the
world at large?
By framing a discourse within experimentation with physical and computational
materials we can overcome the duality and reductionism that has informed most of
the current and future vision of technology and technological devices. By starting
with human-to-human interaction, we can ground human-to-computer interaction
on principles of sustainability, physicality, and humanism.

3.2.1 Garment as Anchor


As part of this process we should remember that wearables should first and foremost be wearable. They become an extension of the human body, and using the
conventions and techniques used in garments and accessories and approaching them
through the spectrum of the human body should be our starting point. Our relationship to materials and clothing, and how it has evolved into fashion, spans thousands of years, and we must take cues and inspiration from the process of making garments and the rituals of donning clothes.

3.2.2 Start with Value


A recent paper from Endeavour Partners found that one-third of American consumers who have owned a wearable product stopped using it within six months. What's more, while one in 10 American adults owns some form of activity tracker, half of them no longer use it (Endeavour, 2014). This statistic has been repeated
often and calls into focus questions of value. Why is the drop-off rate so high? What
is the actual and perceived value that wearable, and quantified-self applications and
devices deliver?
Questions of value should ground each product development and clearly articulate
the value provided. Value has multiple dimensions, both external and internal, and
while we are accustomed to measuring value in financial terms, it is necessary to
qualify value as a network of relationships and clearly map how these relationships,
interactions, and exchanges evolve over time. These relationships, in human terms,
almost always include our relationship to our self (self-reflection), to those close and
near to us (our intimate relationships and to the social at large), and to our relationship with the world (our urban and natural environment).

3.2.3 Think about the System


In other words, value and the way we formulate relationships are part of a larger
system of interactions. Adopting a view of the system of interactions and mapping
the nodes and points where devices and their features connect to various touchpoints in the system will provide designers and technologists with insights for richer
interactions and help find opportunities for innovation and adoptability. How do
proposed functions and features extend possibility, serendipity, discovery, and sociability? What are the intersecting areas of activity and interest that emerge?
By analyzing current infrastructures and mapping the cultural, physical, social,
and institutional components of a system, we might be able to better understand the
interactions that create, support, and challenge current systems. Working with users
to identify the leverage points for change and growth and to reimagine systems that
enable better flows of value consistently results in designs that allow users to imagine their own uses, therefore overcoming barriers to adoptability.

3.2.4 Requirements and Specifications Are for Humans Too


Too often requirements and specifications account for the needs of the device and not
the needs of the user. There is tremendous opportunity in developing requirements
and specifications that articulate the value for users and how this value is created
and touches their entire system of use and interactions. Google Glass, a little over a
year after it released a version of the device to developers, published a set of social
guidelines, a social etiquette of sorts. The list includes advice such as:
Ask for permission. Standing alone in the corner of a room staring at people while
recording them through Glass is not going to win you any friends.
Glass-out. Glass was built for short bursts of information and interactions that allow you to quickly get back to doing the other things you love. If you find yourself staring off into the prism for long periods of time you're probably looking pretty weird to the people around you. So don't read War and Peace on Glass. Things like that are better done on bigger screens.
Be creepy or rude (aka, a Glasshole). Respect others and if they have questions about Glass don't get snappy. Be polite and explain what Glass does and remember, a quick demo can go a long way. In places where cell phone cameras aren't allowed, the same rules will apply to Glass. If you're asked to turn your phone off, turn Glass off as well. Breaking the rules or being rude will not get businesses excited about Glass and will ruin it for other Explorers.

Google Glass, by releasing its product at an early stage, has been able to generate a vigorous discourse on privacy, socialization, and the way social cues can be
built into interaction design. Could Google Glass be designed in such a way as to make the list of dos and don'ts obsolete? Experimenting with scenarios of use and observing users in the street, in cafes, at parties, and at work can yield insightful observations that can be translated into design decisions and reflected in specifications and
requirements.

3.2.5 Prototypes and Iterative Design


In many ways Google Glass is a widely released early prototype. It will be interesting to see how the insights gathered from this initial release can be used to experiment with social cues and overcome some of the backlash that it has attracted as a
product so far.
While the importance of early prototypes and iterative design is well understood and embraced as part of a design methodology, additional emphasis must be placed on them when developing wearable devices. The relationship we have with our clothes and accessories deeply touches our sense of self, comfort, identity, and expression.
Creating early prototypes that imagine how wearable devices conform to the body
and its rituals reveals opportunities for value and avoids the ergonomic and social
pitfalls that many wearable devices in the market have fallen into.

3.2.6 Experimenting with the Future, Questioning the Present


Imagining the future and engaging in speculative design can help designers and
technologists explore edge-case scenarios and draw out the possible implications of
design and technology decisions. The future is often an exaggeration of the present,
and speculative design can be used to highlight the ramifications of design decisions,
features, and functionality considerations and reveal the potential value and context
of use for current devices.


3.2.7 Coloring
Coloring is a hypothetical consumer health product imagined as launching in the year 2046, developed by School of Visual Arts MFA Interaction Design (SVA NYC,
2014) students Matt Brigante, Melody Quintana, Sam Wander, and Amy Wu as part
of a Future Wearables class. The project assumes that by the year 2046 significant
leaps in psychology and neuroscience research will have taken place, transforming
our understanding of mental health. The project also assumes that innovations in
materials technology will introduce new possibilities for treatment, such as brain
chip implants.
Coloring is imagined as
a skin interface for people who use brain chip implants to track and manage their
mental health. It communicates with the user's brain chip to display a real-time
visualization of their emotional state, right in the palm of their hand. Emotions are
mapped to a 7000-color spectrum. The spectrum is richer and more precise than our
verbal emotional vocabulary, empowering people with a new language to understand
their feelings. Rather than having to use blunt and unpredictable prescription drugs,
users are given the agency to self-medicate when appropriate. They can simply blend
harmonizing colors into their Coloring to balance their mood
Coloring (2014)

The project took as a starting point the work of John Rogers, professor of materials
science and engineering at the University of Illinois at Urbana-Champaign, in
implantable technologies and speculated on future scenarios of use. At the same
time it asks us to consider how our wearable devices can provide us with a new
vocabulary and range for expression and communication. This future scenario can
help explore current opportunities and create a framework for inquiry and extend
what is possible (Figure 3.1).

3.2.8 Life As We Know It: The Qualified Self


Another student project, developed at NYU's (ITP, 2014) graduate Interactive Telecommunications Program, looks critically at the uses of quantified-self applications and devices. Life As We Know It: The Qualified Self is Asli Aydin's graduate thesis. Aydin, fascinated by the quantified-self movement, decided to use a series of life-logging techniques to track herself through a very difficult time in her life following her father's cancer diagnosis and through his death.
The project asks questions such as the following: Why do we collect data? Do data tell us something we don't know about ourselves? Does it change our behavior? Aydin set out to discover whether or not her data could tell the story of her experience. She writes: "The more I tried to put it together, the less I felt like it connected to my experience. I decided to create a book that compared the two states of my data during the process of death and how I felt." Aydin used the following applications and devices to life-log her experience: Jawbone, Openpaths, Reporter, Moodscope, Happiness Survey. After months of intense self-quantification, Aydin concluded that the qualified self is far from the quantified, and she realized that her journal entries


FIGURE 3.1 Coloring by Matt Brigante, Melody Quintana, Sam Wander, and Amy Wu. The color emotion spectrum maps 7000 colors across seven core emotional families drawn from discrete emotion theory (happiness, surprise, disgust, anger, contempt, fear, and sadness), with 1000 colors in each family. These seven core emotions are biologically determined emotional responses whose expression and recognition are fundamentally the same for all individuals regardless of ethnic or cultural differences. The hierarchical structure enables simple, top-level readings without referencing a chart; more fine-tuned readings can be gleaned with the help of an interactive map.

provided the insight and reflection that eluded her devices and applications. At the
end of her thesis presentation she writes:
Every time we experience these moments the self is shaped. They shape our expectations, our confidence, our expression. They shape who we are. The truth is simple and
it is not embedded in a set of data that tells me how many steps I've taken. While data
can be useful with specific set goals, my biggest takeaway throughout this journey has
been to remember to track my soul first. The self is fascinating - that fascination cannot
be quantified.

Her experience and reflections can be used as direct input to create guidelines for
developing wearable devices that aim to change behavior and provide insight into the
human experience.


Thich Nhat Hanh is a Buddhist monk who was invited by Google (Confino, 2013)
to give a series of workshops and provide inspiration to its developers and product
managers on how to engage users and develop applications and devices that can yield the insights that evaded the devices and applications Aydin used, insights whose absence may partly account for the drop-off rate of current wearables. In discussing the goals of these
workshops, Thich Nhat Hanh commented:
When they create electronic devices, they can reflect on whether that new product will
take people away from themselves, their family, and nature. Instead they can create the
kind of devices and software that can help them to go back to themselves, to take care
of their feelings. By doing that, they will feel good because they're doing something
good for society.

Engaging with the totality of human experience, probing into what creates value, and examining the systems that we inhabit and the relationships that we create in them are all fundamental to the creation of meaningful and useful wearable devices. We have adopted
a far too reductionistic approach for too long and have been leading product development based on a mechanistic model of what it is to be human.
Comfort, connectedness, engagement, the delight of a soft material against the
human skin, the rituals of dressing and undressing, form the grounding framework
for creating wearable devices. We stand at the precipice of incredible innovation in
materials, sensors, computational and power technologies. We have the opportunity
to create new models of expression, communication, and reflection, and in order to
do so, we should adopt a methodology that is grounded in humanistic and ethical
principles and critically consider how we want to use these innovations to interact
with our communities and ourselves.

3.3 SELF AND THE SOCIAL POLITIC OF WEARABLE TECHNOLOGIES

3.3.1 Personal Technologies, Regional Innovation
Wearable computing devices are personal, particular, and corporeal. They offer
intimate understandings of the body: its rhythms, its movements, and its biochemical impulses. They offer intimacies across larger systems, networks, and communities; see, for example, the Betwine wristband (Imlab, 2014), which allows distant users to gently nudge and race against each other. Yet these devices are not
the bespoke ornamentations or cumbersome user-designed apparatus of previous
decades (Mann, 1997); the modern wave of wearables has moved on from clunky early-adopter prototypes and spread out into mainstream markets. They are FuelBand, FitBit, Glass; they are mass-produced; they are legion. To consider the ethics of this current generation of wearables, intimate yet manifold, involves bringing together the bodies on which they sit and the bodies that produce them. Which systems of production are employed to bring wearable technologies to mass markets?
Where does control lie across these networks? By considering the sites of production
and supply, we can interrogate how these systems shape the meaning of wearables, in the materiality of their design, the configuration of their intended use, and
the politics of the data that they gather.
Market researchers predict that wearable computing devices will explode in
popularity in coming years to the extent that they will become the norm (ABI,
2010). The numbers are enormous: by 2018, there are expected to be 485 million
annual device shipments, all of which have to be manufactured somewhere. Despite
rhetoric of a shrinking world, regional patterns of innovation and industry remain
embedded into the earth; certain places are better at doing some things than others
(Howells, 1999).
The San Francisco Bay Area in Northern California is home to the Silicon Valley
information technology cluster (Saxenian, 1996): after an early history around
microprocessors and semiconductors, the area transformed into a hub for software
and Internet service companies and plays host to some of the world's largest technology companies. Many of these firms, including Google and Facebook, are now
edging into the wearables market, pulling together teams of designers and engineers
to shape the concept and intent behind these devices.
Seventeen time zones away, the intent becomes material. China is one of the largest and most rapidly developing economies in the world, expanding the industrial
capacity of its high-tech industries to act as the global economy's world factory, answering Western desire for ICT consumer goods (Bound et al., 2013). Many of the current generation of wearables are designed by people in the global North
and made by people in the global South. FitBit is the market leader in the wearable
activity band market. While the founding company is based in San Francisco, FitBit locates its manufacturing in China; while the device retails for around U.S. $100, it costs less than one-fifth of that to make (Electronics360, 2013). Yet these devices are
also designed for users in the global North, with estimates that 61% of the wearable
technology market in 2013 was attributed to sports and activity trackers. FitBit was,
its founder explained, designed as a quiet and personal device:
From early on we promoted a notion of a more introverted technology that is more
about the connection between yourself and your goal, rather than having a third party
like an athletics company telling you how fit you should be and what's the proper
weight for you.
Amit, G. (2014)

In doing so, the technology falls not only into Western trends around commercialized self-improvement (Maguire, 2008) but also into trajectories laid down by the earlier
quantimetric self-tracking movement.

3.3.2 Quantifying the Intended User


Unless something can be measured, it cannot be improved. So we are on a quest to collect as many personal tools that will assist us in quantifiable measurement of ourselves.
We welcome tools that help us see and understand bodies and minds so that we can
figure out what humans are here for.
Kelley (2007)


The term quantified self emerged in 2007 to describe the way that people, initially an elite group of Bay Area inhabitants including editors of WIRED magazine, sought to find answers to cosmic questions (Who are we? What does it mean to
be human?) through rational corporeal self-knowledge, giving rise to tools that
offered insight into the data found within their own bodies. By this framing, wearables became a way of reducing wider physical (and mental) healthcare systems of
infrastructure down to the level of the individual: self-tracking as a form of self-care,
reconfiguring the relationship that might otherwise be formed between a patient and
a medical professional to that between a user, a piece of rubber, a circuit board, and
a software algorithm (while, in the wings, a company sits quietly, waiting to mop the
data up).
Research done by the Centre for Creative and Social Technology (CAST) at
Goldsmiths, University of London, found that 63% of U.K. and 71% of U.S. respondents thought that wearable technology had improved their health and fitness,
with one in three willing to wear a monitor that shared personal data with a healthcare provider (Rackspace, 2013). The business models around the market indicate where the true value of wearables lies: not in the plastic and electronics of the hardware devices themselves but in the fog of data that they extract from the human body. As Chris Bauer, the codirector of CAST, described it:
The rich data created by wearable tech will drive the human cloud of personal data. ... With this comes countless opportunities to tap into this data, whether it's connecting with third parties to provide more tailored and personalized services or working
closer with healthcare institutions to get a better understanding of their patients. We
are already seeing wearable technology being used in the private sector with health
insurance firms encouraging members to use wearable fitness devices to earn rewards
for maintaining a healthier lifestyle.
Bauer (2013)

While the devices themselves are manufactured in their millions, numerous software
apps have also crawled into the world to make sense of this data: see, for example,
the MapMyFitness tool; compatible with devices such as the FitBit and Jawbone,
it has, as of May 2014, 16 million registered users who log over 200,000 health and
fitness activities daily.
For the users of wearable tech in the global North, ethical issues have emerged
around privacy: the tipping point between sousveillance and surveillance. Participants in CAST's research cited privacy concerns as the main barrier to adoption. Questions
have been raised about whether data can be sold on to third parties; whether it is
securely stored; and who, ultimately, owns it (Ng, 2014). These suspicions emerge
from the primacy of the idea of control and choice: that the users who make the choice
to use wearable tech as a way to figure out what humans are here for may unknowingly
and unwittingly relinquish control of the data it generates; that someone else may be
using rational means to see and understand bodies and minds. These are the fears of
the intended user, the perfect persona who chooses to explore self-knowledge through
the body, and who has the leisure time to engage in fitness activities. Control, consent, and choice are key: over half of CAST's respondents felt that wearable technology helped them feel more in control of their lives. Down across the supply chain, however, choice is abstracted and bodies are intended to be surveilled.

3.3.3 Tracking in the Factories


We are machines, we are robots, we plug our scanner in, we're holding it, but we might as well be plugging it into ourselves. We don't think for ourselves; maybe they don't
trust us to think for ourselves as human beings.
Adam Littler, Amazon picker (Littler, 2013)

The notion of the quantified self derives from a core concept of agency and sousveillance, in which the motions of the body are willingly recorded by a participant in
the body's activity. Yet there is a much longer heritage of using rational metrics to measure the activity of the human body, only this time by outside agents. In his work published in 1911, The Principles of Scientific Management, Frederick Taylor described
how the productivity of the workforce could be improved by applying the scientific
method to labor management. These included techniques such as time-and-motion
studies, in which a worker's series of motions around various tasks (bricklaying, moving pig iron) were timed to determine the most efficient way to perform a job. Here,
monitoring is not an autonomous choice made with agency about enlightenment and
self-knowledge, but an act placed onto individuals within the power dimensions of the
workplace itself. The body is quantified, not for self-directed self-improvement, but
as a means to wring maximum physical efficiency out of it for an outside body: the
boss. The British supermarket chain Tesco equipped its employees with data bands
and determined that it thus needed 18% fewer of those same workers (Wilson, 2013).
Wearables in the workplace are becoming more prevalent: CAST reported that 18% of
employees now wear some kind of device, and that 6% of employers provide a wearable
device for their workers. Innovations in this space include Hitachi's Business Microscope, a lanyard packed with sensors that recognize face, body, and rhythm data between
employees, gathering data that can be turned into interaction-based organizational and
network diagrams. A host of software solutions supports this surveillance of workplace
bodies, such as Cogiscan's Tracking and Route Control, which uses real-time information to track the physical location and quantities of all products on the factory floor and, in doing so, minimizes unnecessary movements of employees (Cogiscan, 2014).
As Anna Coote notes, we live in an era of instant communication and mobile technologies with global reach, where people can increasingly work anywhere, and there is no end to what employers can demand (Coote et al., 2014). Yet unlimited work does not necessarily map onto quantified labor; indeed, it is possibly its antithesis. Unsurprisingly,
the bodies at work that are the most quantifiable are those engaged in routine manual
labor: not the creative knowledge-intensive work done in the designing and prototyping
of wearables by engineers and designers, but repetitive replicable tasks that are only an
inch away from being replaced by automated machines that can mimic the actions
of human bodies, but without need for sleep, fuel, or rights (Frey and Osborne, 2013).
Adam Littler's quote, given earlier, was taken from a BBC documentary filmed in the enormous warehouses of the online retailer Amazon, which stock a range of consumer activity trackers including FitBit, Jawbone, and Polar. Littler, an undercover
reporter, took a job as a picker in the warehouse in Swansea, Wales, where he collected orders from across the 800,000 ft² of storage. To assist him, and to track
him, he was given a handset that told him what to collect but that also timed his
motions, counting down the set number of seconds that he had to find and pick
each item; and, if he made a mistake, the scanner beeped. The handsets were
introduced by Amazon to provide analysis of their inventory, but also increase
worker productivity by reducing the time it takes pickers to find products in a vast
distribution center (Master, 2012). For the pickers, the scanners increase productivity by leading to the intensification of tasks, increasing the stress on their own
bodies: Littler himself ended up running around the warehouse during his night shifts, covering nearly eleven miles in a night. There is no incentive for introspective self-betterment and self-knowledge from this device; the scanner observes, it
tracks, and it punishes. Workers who miss the productivity targets set down and
enforced by the technologies of power (McKinlay and Starkey, 1998) face disciplinary action.

3.3.4 Bodies at Work
In their piece, 75 Watt (2013), artists Cohen Van Balen collaborated with a choreographer and Chinese factory workers to create a piece which reverse engineers the
values of a supply chain by creating a useless physical object; the product of the
labor is the dance done by the workers as they assemble the clunky white plastic
device. Seventy-five watts is the average output of energy a human expends in a day,
a measure that could be tracked by sousveillance through a consumer wearable, on
the path to asking questions about the meaning of human life. Yet down along the
supply chain, in the factories and the warehouses, the same transformative power of
digital hardware around wearable technology answers the question for itself: human life is capital, the bodies themselves only actions.

3.4 SYNAPTIC SCULPTURE: VIBRANT MATERIALITY AND THE INTERCONNECTED BODY
As technology is rapidly evolving it is becoming invisible, embodied within the
materials of everyday life. Textiles have a heritage, tradition, and cultural function
that are evolving in a mash-up with science and technology, imbuing them with
capacities to extend our perception of ourselves and of the world and the way we live
in it. Wearables' ability to interconnect changes our perspective and relationships
with others; the ability to focus more explicitly and explore intimately at nanoscopic
levels combined with macroscopic and virtual perspectives opens possibilities of
completely new experiences of being in the world.

3.4.1 Sperm, Stars, and Human-Centric Perception


It was once believed that human spermatozoa contained all the elements needed for
human reproduction. To the human eye it appeared that a tiny figure was visible in
the head of the spermatozoon. A woman's role in reproduction was simply that of a vessel to nurse the spermatozoon until it developed enough to enter the world. The invention of the microscope revealed the process by which spermatozoa fertilize the ovum and
changed the role of women profoundly. Our comprehension of the world is mediated
by technology and is dependent on our ability to adapt and make sense of the information the technologies provide. Early star charts depicted animals, figures, and
objects in the sky. The images and mythologies that went along with them were used
to aid memory and help recall the location of stars in the visible night sky. With the
invention of the telescope the cartographers job changed drastically. Maps became
factual documents plotting out the heavens above in ever increasing detail, in line
with technological advancement. Human consciousness is altered as new technology
enables us to see things differently, for example, when we landed on the moon in
1969, images looking back at the earth were projected into people's living rooms via television, and they enabled us to imagine ourselves as part of a greater whole and to see the earth, rather than as endless and boundless in natural resources, as a delicate
intertwined ecosystem of which we are just a small part.
Floating Eye is a wearable work by Hiroo Iwata performed at Ars Electronica in 2000, in which a floating blimp suspended above the wearer's body supports a camera.
The head of the wearer is encased in a dome, and from the inside they view a panoramic screen projecting what is being filmed. The experience is that of observing
oneself from above; normal vision is superseded and interaction with the environment estranged. This work predates a perspective that we are becoming accustomed
to, that of navigating space by looking down into the screen of a digital device,
guided by a plan view and prompted by Google Maps or the like. Wearable technologies may at first seem to disorient or give a feeling of estrangement, but as we explore
new ways to understand the world around us, we are profoundly changing the way
we live and interact. The shift in perspective that we are fast approaching involves
both time and scale. The wearable technology that surrounds and permeates our
bodies will mediate this experience and augment our senses. We are witnessing the
emergence of a new paradigm in terms of our augmented perspective: our perception of scale is expanding our awareness and sensitivity across macro- and nanospheres that we will learn to accommodate and that will ultimately become our normative environment.
Since the mid-1990s we have lived in environments supported by digitization.
This is long enough to evaluate the theoretical hype of the late 1990s surrounding the
digital and virtual world of the Internet that hypothesized homogenization of culture
and the divorce of information from materiality. The teleological and ocular-centric
faith in technology has deep-seated historical roots. With the invention of photography, journals wrote articles in awe of this new science; it seemed that we had procured the magical ability to capture moments of life in factual documents of light on photosensitive paper. A zealous appeal made by Oliver Wendell Holmes in an article published in 1859 heralds this greatest human triumph over earthly conditions, the divorce of form and substance: "What is to come of the stereoscope and the photograph [...]? Form is henceforth divorced from matter. In fact, matter as a visible object is of no great use any longer, except as the mold on which form is shaped" (Holmes, 1859).


In retrospect, the Internet has ultimately refocused our attention on matter, as the dividing line between digital and material evaporates within our human-technogenesis. The interface inherently involves a coupling between computer-mediated rendering of data and human response. Strings of 0s and 1s in their raw state in the computer have no sensory or cognitive effect without material formations that interface
with our proprioceptors to make sense of the data.
The development of computing is deeply indebted to the development of materials technologies. Textile manufacture and techniques helped conceptualize digital technologies: from the protocol logic of knitting, to the matrix of heddle structures in weaving machines, to the dpi and pixelation of hand techniques such as cross-stitch. Teshome Gabriel's Notes on Weavin' Digital: T(h)inkers at the Loom (Gabriel and Wagmister, 1997) explores traditional non-Western weaving in this light; Otto von Busch's Zen and the Abstract Machine of Knitting (von Busch, 2013) and Sadie Plant's The Future Looms: Weaving Women and Cybernetics (Plant, 1996) and Zeros + Ones: Digital Women and the New Technoculture (Plant, 1997) evidence a strong connection between the material and the digital.

3.4.2 Inversion of the Design Process


We are witnessing a subversion of the traditional fashion design methodology away
from the trickle-down theory to one that can enhance a relationship between designer
and user who become coproducers, at the same time connecting materiality to
anthropology and the lived experience of the individual. Self-mapping and tracking
means that data that were once considered the domain of a third-party specialist to interpret are available for self-reflection and immediate reconfiguration. Combined with the storage capacity of supercomputers, massive data sets of micro-personal information are guiding future strategies for big-business design. The inversion of
the design process from technology-driven design, to need-driven design, and ultimately to concept-driven design takes the design process from one led by the enabling technologies, to applications-driven design (that is, focusing on users, tasks, and evolution), to visionary design driven by concepts and principles.
The Centre for Postnormal Policy at the Hawaii Research Center for Futures
Studies forecasts a postnormal condition of chaos, complexity, and contradictions
under conditions of uncertainty and accelerated change in three future modes: the
extended present, the familiar future, and the un-thought future (Sweeney, 2014).
In the world of contemporary art we have witnessed a transition where the locus of
meaning that once lay within the object, and then in the medium, now lies in the
interface (Poissant, 2007). This holds true throughout the design and fashion sectors; in other words, the evolution has changed the focus from production, to service, to
experience (Table 3.1).
Our relationship with the world is evolving from one in which historically we were hunter-gatherers using the products of the world; then we learned to harness energy in the production of materials, controlling the natural world around us through
industrialization; and now there is a need for us to imagine the future, to design and
craft our own world.


TABLE 3.1
Authors and Concepts That Point to the Growing Prominence of Experience/Interaction/Interface Design (Including HCI)

Author | Past | Present | Future
John Sweeney (2014) | The extended present, governed by trends and weak signals | The familiar future, governed by images of the future(s) | The un-thought future, governed by design and experience
Louise Poissant (2007) | Material object | The medium | The interface
Jannis Angelis and Edson Pinheiro de Lima (2011) | Focus on production | Focus on service | Focus on experience
Neil Gershenfeld (2011) | Computers controlling tools | Machines making machines | Building with materials containing codes
Ishii Hiroshi (1997, 2012) | Technology-driven design | Need-driven design | Concept-driven design
PSFK (2014) | Connected intimacy | Tailored ecosystems | Co-evolved possibilities

3.4.3 Bridging Materiality and Information


In the history of wearable computing the predominant focus has been on ocular-
centric ways of knowledge transfer (Margetts, 1994). Although augmentation of
other senses is being explored, historically the emphasis has been placed on vision
as the means of input and output, and this legacy has informed our perception of
wearables, the classic example being Google Glass. Primary research funding is still
spent in this area. Challenging the dominance of vision, in an analysis of the senses
David Howes cites both Marx's doctrine and etymology when he proposes that late capitalism is much more than a civilization of the image and it cannot be theorized adequately without account being taken of its increasingly multisensory materiality. The difficulty here stems from the sensory bias intrinsic to the very notion of theorization: theory comes from the Greek theorein, meaning to gaze upon. It is high time for all of the senses (not solely vision) to become directly in their practice theoreticians (quotes from Marx, discussed in Howes, 2003). In order to achieve
this, discourse needs to change the language around design away from functional
attributes and technical capacities and to develop a connoisseurship of somesthetic qualities (Schiphorst, 2011). The focus of wearable technology that concerns itself with
experience views technology as the mediator rather than the end product.
What we wear will record bodily data and exchange information with the environment. The line between personal and global data will blur; sensory stimulation will be felt at macro and micro spheres of human-computer engagement and interpersonal communication. What we wear not only expresses our identity, protects us, and
regulates temperature but also is rapidly becoming the substrate to embed sensors,
recorders, actuators, transmitters, diffusers, and integrators (Pold, 2005). These six
elements expand and augment the body's five senses of sight, sound, touch, taste,
and smell. Wearables utilize these 11 parameters as new media to sculpt experience.


Sensors perceive data in the environment; they can be based on the detection of
different parameters such as light, heat, humidity, stress, force, movement, and noise
and come in many forms such as microphones, ultrasound detectors, photovoltaic
sheets, stretch sensors, and data gloves. They can be analogue or digital.
Recorders take samples of reality or traces of activity and collect them, in analogue
formats by fixing them onto substrates like tape, film or photo-paper, and through
numeric coding in digital formats. Recordings can be transformed and altered, and
augment memory.
Actuators can have different mechanisms such as electric, pneumatic, or hydraulic, in order to produce activities such as movement, light, or sound, for example,
a fan, a light, or a buzzer. When combined with materials such as shape memory
alloys, thermochromic inks, or smart textiles, they can appear to embody autonomy
in reaction to changes in conditions.
Transmitters nullify distance; as they have evolved, so has the way we live in the
world. This has been as profound as the effect of the lens on our visual perspective
from micro to macro understandings of the world. They are interfaces ranging from
the telegraph to television, facsimile, radio, the Internet, XBee, etc.; they offer the potential
to reconsider time, space, and interaction.
Diffusers are attachments for broadening the spread of a signal into a more even
and regulated flow, for example, devices that spread the light from a source evenly
across a screen. They could be in the form of an electrostatic membrane or a projection screen such as LCD, plasma, or thermal imaging.
Integrators involve the integration of technologies into living organisms, the
mash-up between biology, medicine, tissue engineering, nanotechnology, and artificial life.
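
As a concrete illustration of how a few of these elements can chain together in a wearable, the following minimal Python sketch couples a sensor (here a simulated stretch sensor), a recorder (an in-memory log of traces of activity), and an actuator (a stand-in for a vibration motor) in a simple feedback loop. The class names, threshold, and simulated readings are illustrative assumptions rather than a description of any particular device; real hardware would replace the read() and drive() bodies.

import random
import time

class StretchSensor:
    def read(self) -> float:
        return random.uniform(0.0, 1.0)   # simulated normalized stretch value

class Recorder:
    def __init__(self):
        self.samples = []                 # an augmented "memory" of activity traces
    def log(self, value: float):
        self.samples.append((time.time(), value))

class VibrationActuator:
    def drive(self, on: bool):
        print("vibration", "on" if on else "off")

sensor, recorder, actuator = StretchSensor(), Recorder(), VibrationActuator()
THRESHOLD = 0.7                           # illustrative trigger level
for _ in range(5):
    value = sensor.read()
    recorder.log(value)
    actuator.drive(value > THRESHOLD)     # stretch beyond the threshold triggers haptic output
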
Translation of data from one type of input expressed through another form of
output is a natural function within the fungible realm of data; it enables exploration of the traditional boundaries that govern human perception. It is well known
that people without or with minimal function of certain senses become more acute
in the function of others. For example, by producing oral clicking noises, Ben
Underwood is able to echolocate and visualize spaces around him even though he
is blind. Tests showed that when he performed echolocation, his calcarine cortex, the part of the brain that normally processes vision, was stimulated
(McCaffrey, 2014).
Neil Harbisson is an artist who has achromatopsia, meaning he cannot see colors. He has legally registered as a cyborg and wears a permanent head-mounted
computer that enables him to hear color by converting light waves to sound waves.
He is a painter and produces artworks based on music. His senses have been augmented and his body adapted to the expanded somesthetic so that he perceives
more than the natural human visible spectrum, extending into the infrared and ultraviolet
(Harbisson, 2012).
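
The following short Python sketch illustrates one generic way such a light-to-sound translation could work: a visible-light frequency is transposed down by octaves (repeated halving) until it falls within the audible range. This is an illustrative sonification exercise under assumed parameters, not the algorithm used by Harbisson's device.

SPEED_OF_LIGHT = 3.0e8  # m/s

def light_to_sound_hz(wavelength_nm: float, audible_max: float = 20000.0) -> float:
    """Transpose a light wavelength (nm) down by octaves into the audible range."""
    freq = SPEED_OF_LIGHT / (wavelength_nm * 1e-9)  # light frequency in Hz
    while freq > audible_max:
        freq /= 2.0                                  # drop one octave at a time
    return freq

for name, nm in [("red", 700), ("green", 530), ("violet", 400)]:
    print(name, round(light_to_sound_hz(nm), 1), "Hz")
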
The body's sensual capacities can adapt to and accommodate new experiences, and wearables provide a platform for experimentation. An example is a wearable device that explores sensory dissonance: Bamboo Whisper translates language from one
wearer into percussive sounds and vibration felt by the wearer of a second device
(Figure 3.2).


FIGURE 3.2 Bamboo Whisper, Tricia Flanagan, and Raune Frankjaer. (Photo: Tricia
Flanagan, 2012.)

By exploring voice in terms of vibration, and stripping away semiotic analysis of language, the sensual capacities to communicate emotion and understanding between
bodies could reveal a predemic universal Ursprache. In this light, Bamboo Whisper
extends earlier experiments by Dadaist Hugo Ball with his nonlexical phonetic
poems and what the Russian futurist poets Khlebnikov and Kruchenykh termed
Zaoum (Watts, 1988). What happens when body rhythms overlay one another? Does
one regulate the activity of the other? Do the bodies adapt to the exchange and find
new rhythms and harmonies in unison?
We experience similar effects when dancing or sleeping close together. Parents
intuitively adopt techniques of rhythm, breathing or singing to nurse babies.
Experimentation in wearable technology can develop and adapt similar strategies
in order to generate fundamental questions and employ them in speculative design.
This methodology represents an alternative to design processes that use design to
answer preconceived questions generated from historical or market-generated data
and formulated as problems to be solved. Affirmative design practices such as the
latter are limited in their capacity and do not support design mavericks. Adopting a
language that enables effective design of emotional experiences and fosters a connoisseurship of the interface, for example, through the use of the eleven parameters
described earlier, is an attempt to address somesthetic issues as primary to design
development where the technology itself does not govern but is a tool in the design
of human experience.

3.4.4 Merger of the Body and Technology


PSFK predicts a future of coevolved possibilities where technologies are evolving alongside human behaviors to augment, replicate or react to natural abilities
and inputs, creating an increasingly connected relationship between people and
their devices (PSFK, 2014). The person as computer embodies new forms of
intuitive computer control. Steve Mann calls this Humanistic Intelligence (Mann, 2001, 2008). Flanagan and Vega's research into Humanistic Intelligence produced
Blinklifier, a wearable device that uses eye gestures to communicate with an onboard
computer. By wearing electroplated false eyelashes and conductive eyeliner, bio-data
from blinking communicates directly with the processor without engaging cognitive action. The body's natural gestures are augmented and amplified into a head-mounted light array (Flanagan and Vega, 2012). We innately understand and interpret information from people's eye gestures; by amplifying these everyday gestures,
Blinklifier leverages the expressive capacity of the body (Figure 3.3).
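
The following minimal Python sketch illustrates a feedback loop of this general kind: a conductance signal from an eyelid sensor is thresholded to detect blinks, and the accumulated blinks drive a simple light-array pattern. The threshold, the simulated signal, and the display logic are illustrative assumptions, not the published Blinklifier implementation.

def detect_blinks(samples, threshold=0.5):
    """Return indices where the signal crosses the threshold upward (treated as a blink)."""
    blinks = []
    for i in range(1, len(samples)):
        if samples[i - 1] < threshold <= samples[i]:
            blinks.append(i)
    return blinks

def led_pattern(blink_count, num_leds=16):
    """Light more of the array as blinks accumulate, amplifying the gesture."""
    lit = min(blink_count * 4, num_leds)
    return "".join("#" if i < lit else "." for i in range(num_leds))

signal = [0.1, 0.2, 0.8, 0.3, 0.1, 0.9, 0.7, 0.2, 0.1, 0.85]  # simulated eyeliner conductance
blinks = detect_blinks(signal)
print(len(blinks), "blinks ->", led_pattern(len(blinks)))
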
Anticipating the merger of the body and technology, Ray Kurzweil proposed the singularity (Kurzweil, 1998, 2006) as the point in the future when the capacity and calculation speeds of computers equal those of human neural activity, and our understanding of how the mind works enables us to replicate its function. Kurzweil promulgates artificial intellects superior to human ones, which poses the question: in the future, will we be outsmarted by our smart clothes? Artificial intellects known as
artilects will conceivably have rights, following the attainment of universal human
rights, and then the rights of animals, landscapes, and trees (Dator, 2008). Nonhuman
entities are already represented in our juridical systems, in the form of corporations,
and artilects could attain rights in a similar manner (Sudia, 2001).
The separation between human and machine intelligence traditionally lies in the
human realm of emotions, long thought of as metaphysical. Recent scientific discovery has given us insight into emotions such as retaliation, empathy, and love, which can now be
understood within the frame of scientific knowledge.

FIGURE 3.3 Blinklifier, Tricia Flanagan, and Katia Vega. (Photo: Dicky Ma. Tricia
Flanagan, 2012.)


Lower levels of the neurotransmitter serotonin may affect your ability to keep
calm when you think someone is treating you badly and promote your tendency to
retaliate (Crockett, 2008). Mirror neurons have been discovered in the brain; they produce the same chemical reaction in your body when you witness an experience happening to another as is being produced in the body you are watching (Keysers,
2009). For example, when someone falls over and hurts himself or herself, you may
instinctively say "ouch" and actually produce small amounts of the same chemical
reaction in your body as if it happened to you. Empathy could therefore be described
as physiological rather than a purely emotional condition. Biologists are endeavoring to interpret emotional states into biological chains of events. Tests indicate that
higher levels of oxytocin in females and vasopressin in males may foster trust and
pair bonding at a quicker rate. Dopamine-related areas of the brain are active when mothers look at photos of their offspring or people look at photographs of their lovers. Dopamine is a neurotransmitter that activates the same circuitry that drugs like nicotine, cocaine, and heroin do to produce euphoria and addiction. Love therefore can be described as an emergent property of a cocktail of
ancient neuropeptides and neurotransmitters (Young, 2009).
We tend to anthropomorphize robots; our mirror neuron systems respond to their behaviors in a similar way to how they respond to human entities (Gazzola et al., 2007). Can the experience of digitally mediated touch produce physiological chemistry in the recipient? CuteCircuit's Hug Shirt senses the pressure and length of a hug,
the heart rate, and skin temperature of the hugger and sends this data via Bluetooth
to a recipient whose corresponding Hug Shirt actuators provide a simulated hug.
Put simply, the sender hugs their own body and a recipient body feels the experience. Can wearables designed to actuate a physical haptic stimulus on another induce a
chemical emotional effect? What are the potential implications for health, medicine,
and well-being? The interconnected networks that mirror neurons imply, between
human but also nonhuman entities, pose fundamental problems to Ray Kurzweil and
Norbert Wiener's (Wiener, 1989) assumptions that by mechanistic analysis of the
materials of the body, we will ultimately understand and replicate them. Quantum
physics proposes that to understand the mind, we must look outside the body, and
consider the interconnected nature of everything as porous.
Textiles of the future merge science and technology. Nobel laureate Alexis Carrel headed the first tissue culture laboratory, exploring one of the most complex of all materials: the skin. Future textiles will be designed with highly engineered specifications, like skin, combining areas that are thicker, thinner, more flexible, or ridged and that have the ability to adapt to the task or the environment. At the SymbioticA lab, Oron Catts has been growing cultured skins from living cells to produce kill-free leather, an approach that tackles ethical and sustainability issues. Stelarc's Ear on Arm (2006-ongoing)
was cultured in the SymbioticA lab. The ear was grown from tissue culture around
a frame and then sutured to Stelarc's forearm. A microphone was then embedded in the prosthetic, enabling visitors to Stelarc's website to listen to whatever his third ear
hears. In a future iteration of the project, he plans to implant a speaker into his mouth,
so that people can speak to him through transmitters, for example, from his website
or a mobile telephone, and he will hear the sounds inside his head, or if he opens his
mouth, someone else's voice could speak from within it.


Through the convergence of biological engineering and nanotechnology, future clothing and accessories could simply grow from our bodies. Nancy Tilbury's speculative designs explore new definitions of design without cloth or conventional fabrication methods (Quinn, 2013). Tilbury's research promulgates notions of garments
formed from gases and nanoelectronic particles that automatically assemble on the
body, liquids that thicken when they come into contact with the body and form a
second skin, and surfaces that emerge from inside the body induced by swallowable
technologies in the form of tablets or nanoprobes that create changes in the color,
pattern, or textural surface of the skin.
The swallowable Peptomics, developed by Johannes Blank and Shaila C. Rössle, convert the alphabetic language used to identify protein code into new configurations of molecular word modeling (Blank and Rössle, 2014). Their project has recreated the seven deadly sins of wrath, greed, sloth, pride, lust, envy, and gluttony as new three-dimensional chains of amino acid monomers linked by peptide bonds, complete with their specific biological functions, and encapsulated them into word pills. Health projects of the future such as Coloring, pictured earlier in the chapter, combined with biotech approaches like Peptomics, provide examples of the merger of electronic and chemical synapses that could create consumer mood-management products closer to realization than we think.
From the inside out, we are capable of 3D printing organs to replace and create
new body parts that will integrate with fashion and technology affecting social and
political agency of the future. Tonita Abeytas Sensate (Lupton and Tobias, 2002)
collection takes the tools of sexual hygiene and makes them into fashionable intimate
apparel made from latex with built in male/female condoms. At a micro scale, the
borders between inside and outside the body become permeable and fluid. Lindsay
Kelleys project Digesting Wetlands considers the space of the body as a micro biome,
a wetland environment, and an interconnected ecosystem. Digestion becomes a
way of figuring the landscape and encountering animals, plants, and environmental
systems, with molecular gastronomy techniques providing metaphoric and literal
frameworks for imagining how bodies and landscapes interrelate (Kelley, 2014).
Kelley designs molecular gastronomy events and edible objects as forms of environmentalist interventionism, for example, to better digest oil spills or absorb heavy metals. From this approach, bodies are viewed as floating islands where metaphors
of atmospheric shift, drift, and blooming open up productive spaces for intervention
where fauna and flora from in and outside our bodies interact with changing populations of viruses, bacteria, and fungi. Digital ecosystems can be viewed in a similar
fluid manner where keywords for future design describe events like calcification,
erosion, sedimentation, swarm behaviors, etc.
Atomic force microscopy (AFM) enables imaging of the world at the nanolevel. Interestingly, as we have traced, the changing perception of the world has been aided by new optical apparatus; the telescope, the microscope, and the stereoscope are all
lens-based technologies. The most dramatic change in our perception is occurring as
we incorporate nanoscale environments, where scanning the surface with AFM produces haptic vibrations that are translated by computer programs into visual images.
One example born from this technology is a self-cleaning textile substrate developed by mimicking the cell structure of the lotus leaf.


Macro perspectives gained through global information networks, cloud computing, and supercomputers that allow instantaneous access to information have
enabled us to envisage an interconnected worldview. Simultaneously, an amplified
awareness of the instability, fungibility, and interconnectedness of things is emerging as we acknowledge the vibrancy of the world at a molecular level. The importance
of haptic engagement within communication, incorporating the body's full potential
of senses, is gaining recognition. Nanoperspectives reveal a world with completely
different parameters and support a reconsideration of the vitalism fundamental to string theory and quantum physics that questions our current understanding of materiality. This perspective further promulgates the space of design to be in the interface, as mediator of experience, rather than in the design of objects or products.

3.4.5 Conclusion: Synthesis and Synaptics


The current way we view HCI is predominantly mediated through a screen and keyboard. Linking computing directly to the body, to biology and technology, goes way beyond consideration of semiotic analysis; it is not simply a process of signs, as Barthes (1973) would have it, but a process involving all the senses.
There is a need to establish a connoisseurship in the somesthetic design of wearable interfaces, be they physical or digital. Synaptic sculpture is an approach that views materials (biological or electronic) in terms of their potential as actants, and bodies and things as agentic. It is a neologism, the combination of three words: haptic, synaptic, and sunopsis. Haptic: of or relating to the sense of touch, in particular relating to the perception and manipulation of objects using the senses of touch and proprioception. Synaptic: of or relating to a synapse or synapses between nerve cells; a synapse is a specialized junction where transmission of information takes place through electrical or chemical signals, a communication device for transmitting information. The term was introduced at the end of the nineteenth century by the British neurophysiologist Charles Sherrington. Traditionally used in biology, it lends itself well for use in relation to biotechnology and the hybrid spaces emerging from the biological (chemical) and electronic (data flow) worlds; it is a relational space at a micro level. Sunopsis: an ancient Greek word meaning sun, to combine or form, plus opsis, to view (Flanagan, 2011).
What is described earlier is indebted to the notion of vitalism, an idea that has
been around for some time and has been explored in the work of Spinoza, Nietzsche,
Thoreau, Darwin, Adorno, Deleuze, Bergson, and Driesch. Across contemporary
literature, theorists are describing a vivacious new landscape: the Internet of things
(Ashton, 2009), a universe of bits (Poissant, 2007), vibrant matter (Bennett, 2010),
tangible bits and radical atoms (Ishii et al., 2012), a postvitalist point of view (Doyle,
2003), and synaptic sculpture (Flanagan, 2011), all of which call for a transvital
approach where all matter (energy, code, viruses, air, and water) is seen as relevant and related (Thomas, 2013).
Materials and thinking are recombining and agency is paramount in our understanding of both human and nonhuman entities. Traditional consumer-based capital
structures are undermined from the bottom up (Howes, 2005) by digital ecology
layering like calcification, where buildup accumulates and changes the environment.


Bio-data and big data combine to produce unprecedented detail of personal information, enabling the tailoring of design to personal desires, while at the other end
of the spectrum, human life is subsumed as a widget in the production line. Lines
of control, borders between public and private, are all to be renegotiated. If you can
print your own IUD, you take control away from legislation in terms of birth control
or make DIY medical decisions because you have access to specific data sets that in
the past were left to teams of experts. The questions of who will control society and
how governance systems will function in these new terrains remain unanswered.
A humanistic intelligence approach to wearable technologies considers a seamless integration, extending the reach of the systems of the body into body coverings
and into the world beyond. The biosphere and the data-sphere become one through
which sustainable design solutions will emerge. The field of wearable technology
explores the function of the mechanistic, as well as that of neural networks and mental representation. The peripheral borders where physical atoms meet digital bits are
fertile new spaces for design.
At the nanolevel it is revealed that everything we thought was fixed and stable
is chaotic and in motion. There is a growing awareness of the porosity of the world
and the fungibility of materials. Future wearable-tech apparel and artifacts will be created with molecular aesthetics; they are synaptic sculptures where experience becomes a material to be molded and shaped in the design of interaction. An
awareness of interconnectedness will prompt designers to create works responsibly and to tackle research problems by proposing creative solutions. Vibrant materials will be crafted into bespoke manifestations of experience: apparel as extensions of
natural systems.

REFERENCES
ABI Research. 2010. Wearable computing devices, like Apple's iWatch, will exceed 485 million
annual shipments by 2018. ABIresearch.com. Accessed May 20, 2014. https://www.
abiresearch.com/press/wearable-computing-devices-like-apples-iwatch-will.
Amit, G. 2014. Wearable technology that ignores emotional needs is a Major Error.
Dezeen. Accessed May 20, 2014. http://www.dezeen.com/2014/03/10/interview-fitbitdesigner-gadi-amit-wearable-technology/.
Angelis, J. and E.P. de Lima. 2011. Shifting from production to service to experience-based
operations. In: Service Design and Delivery, M. Macintyre, G. Parry, and J. Angelis
(eds.), pp. 83-84. New York: Springer.
Ashton, K. 2009. That Internet of Things thing. RFID Journal, June 22, 2009. Accessed June
14, 2014. http://www.rfidjournal.com/articles/view?4986.
Barthes, R. 1973. Mythologies. London, U.K.: Granada.
Bauer quoted in Rackspace. 2013. The human cloud: Wearable technology from novelty to
production. White Paper. San Antonio, TX: Rackspace.
Bennett, J. 2010 (1957). Vibrant Matter: A Political Ecology of Things. Durham, NC: Duke
University Press.
Blank, J. and S.C. Rössle. 2014. Peptomics: Molecular word modeling. Paper presented at
the Third International Conference on Transdisciplinary Imaging at the Intersection of
Art, Science and Culture: Cloud and Molecular Aesthetics, Pera Museum, Istanbul,
Turkey, June 26-28, 2014. Abstract accessed August 30, 2014. http://ocradst.org/
cloudandmolecularaesthetics/peptomics/. See also http://www.peptomics.org.


Bound, K., T. Saunders, J. Wilsdon, and J. Adams. 2013. Chinas absorptive state: Innovation
and research in China. Nesta, London, U.K. Accessed August 30, 2014. http://www.
nesta.org.uk/publications/chinas-absorptive-state-innovation-and-research-china.
Cogiscan. 2014. WIP tracking and route control. Cogiscan.com. Accessed May 20, 2014.
http://www.cogiscan.com/track-trace-control/application-software/wip-tracking-routecontrol/.
Cohen, D. 2014. Why we look the way we look now. The Atlantic Magazine, April 16, 2014.
Accessed May 14, 2014. http://www.theatlantic.com/magazine/archive/2014/05/the-waywe-look-now/359803/.
Coloring. Accessed August 30, 2014. http://interactiondesign.sva.edu/people/project/coloring.
Confino, J. 2013. Google seeks out Wisdom of Zen Master Thich Nhat Hanh. The Guardian,
September 5, 2013. Accessed August 30, 2014. http://www.theguardian.com/sustainable-
business/global-technology-ceos-wisdom-zen-master-thich-nhat-hanh.
Coote, A., A. Simms, and J. Franklin, 2014. 21 Hours: Why a Shorter Working Week Can Help
us All to Flourish in the 21st Century. London, U.K.: New Economics Foundation, p. 10.
Crockett, M. 2008. Psychology: Not fair. Nature 453: 827, June 12, 2008.
Dator, J. 2008. On the rights and rites of humans and artilects. Paper presented at the
International Conference for the Integration of Science and Technology into Society,
Daejeon, Korea, July 14-17, 2008. Accessed August 30, 2014. www.futures.hawaii.edu/
publications/ai/RitesRightsRobots2008.pdf.
Doyle, R. 2003. Wetwares: Experiments in Postvital Living, Vol. 24. Minneapolis, MN:
University of Minnesota Press.
Electronics360. 2013. Teardown: Fitbit flex. Electronics360. Accessed May 20, 2014. http://
electronics360.globalspec.com/article/3128/teardown-fitbit-flex.
Endeavour Partners. 2014. Inside Wearables, January 2014. Accessed August 30, 2014. http://
endeavourpartners.net/white-papers/.
Flanagan, P. 2011. The ethics of collaboration in sunaptic sculpture. Ctr+P Journal of
Contemporary Art 14: 37-50, February.
Flanagan, P. and K. Vega. 2012. Blinklifier: The power of feedback loops for amplifying
expressions through bodily worn objects. Paper presented at the 10th Asia Pacific
Conference on Computer Human Interaction (APCHI 2012), Matsue, Japan. See also,
Accessed August 30, 2014. http://pipa.triciaflanagan.com/portfolio-item/blinklifier/and
https://www.youtube.com/watch?v=VNhnZUNqA6M.
Frey, C. and M. Osborne. 2013. The Future of Employment: How Susceptible Are Jobs to
Computerisation? Oxford, U.K.: OMS Working Paper.
Gabriel, T.H. and F. Wagmister. 1997. Notes on weavin digital: T(h)inkers at the loom. Social
Identities 3(3): 333-344.
Gazzola, V., G. Rizzolatti, B. Wicker, and C. Keysers. 2007. The anthropomorphic brain:
The mirror neuron system responds to human and robotic actions. Neuroimage 35(4):
1674-1684.
Gershenfeld, N. 2011. The making revolution. In: Power of Making: The Importance of
Being Skilled, D. Charny (ed.), pp. 56-65. London, U.K.: V&A Pub. and the Crafts
Council.
Harbisson, N. 2012. I listen to color. TED Global. Accessed November 25, 2012. http://www.
ted.com/talks/neil_harbisson_i_listen_to_color.html.
Holmes, O.W. 1859. The stereoscope and the stereograph. In: Art in Theory, 18151900: An
Anthology of Changing Ideas, C. Harrison, P. Wood, and J. Gaiger (eds.), pp. 668-672.
Malden, MA: Blackwell. Originally published in The Atlantic Monthly, Vol. 3 (Boston,
MA, June 1859): 738-748.
Howells, J. 1999. Regional systems of innovation. In: Innovation Policy in a Global Economy
D. Archibugi, J. Howells, and J. Michie (eds.). Cambridge University Press, Cambridge.

Intimacy and Extimacy

55

Howes, D. 2003. Aestheticization takes command. In: Empire of the Senses: The Sensual
Culture Reader Sensory Formations Series, D. Howes (ed.), pp. 245250. Oxford, U.K.:
Berg.
Howes, D. 2005. HYPERESTHESIA, or, the sensual logic of late capitalism. In: Empire of
the Senses: The Sensual Culture Reader Sensory Formations Series, D. Howes (ed.),
pp.281303. Oxford, U.K.: Berg.
Imlab, C. 2014. Accessed May 20, 2014. https://www.youtube.com/watch?v=pE0rlfBSe7I.
Ishii, H., D. Lakatos, L. Bonanni, and J.-B. Labrune. 2012. Radical Atoms: Beyond Tangible
Bits, Toward Transformable Materials, Vol. 19. New York: ACM.
Ishii, H. and B. Ullmer. 1997. Tangible bits: Towards seamless interfaces between people, bits
and atoms. ITP Tisch. Accessed August 30, 2014. http://itp.nyu.edu/itp/.
Kelley, K. 2007. What is the quantified self? Quantifiedself.com. Accessed May 20, 2014.
http://quantifiedself.com/2007/10/what-is-the-quantifiable-self/.
Kelley, L. 2014. Digesting wetlands. Paper presented at the Third International Conference
on Transdisciplinary Imaging at the Intersection of Art, Science and CultureCloud
and Molecular Aesthetic, Pera Museum, Istanbul, Turkey. Abstract accessed August 30,
2014. ocradst.org/cloudandmolecularaesthetics/digesting-wetlands/.
Keysers, C. 2009. Mirrow neuronsAre we ethical by nature. In: Whats Next?: Dispatches
on the Future of Science: Original Essays from a New Generation of Scientists,
M.Brockman (ed.). New York: Vintage Books.
Kurzweil, R. 1998. The Age of Spiritual Machines: When Computers Exceed Human
Intelligence. New York: Viking Press.
Kurzweil, R. 2006. Singularity: Ubiquity interviews Ray Kurzweil. Ubiquity, January 1, 2006.
Littler, M. 2013. Amazon: The Truth Behind the Click. Produced by Michael Price. London,
U.K.: BBC.
Lupton, E. and J. Tobias, 2002. Skin: Surface, Substance + Design, 36, pp. 7475. New York:
Princeton Architectural Press.
Maguire, J.S. 2008. Leisure and the obligation of self-work: An examination of the fitness
field. Leisure Studies 27: 5975, January.
Mann, S. 1997. Wearable computing: A first step toward personal imaging. Computer 30(2):
2532.
Mann, S. 2001. Wearable computing: Toward humanistic intelligence. IEEE Intelligent
Systems 16(3): 1015.
Mann, S. 2008. Humanistic intelligence/humanistic computing: Wearcomp as a new framework for intelligent signal processing. Proceedings of IEEE 86(11): 21232151.
Margetts, M. 1994. Action not words. In: The Cultural Turn: Scene-Setting Essays on
Contemporary Cultural History, D.C. Chaney (ed.), pp. 3847. New York: Routledge.
Master, N. 2012. Barcode scanners used by amazon to manage distribution centre operations. RFgen. Accessed May 20, 2014. http://www.rfgen.com/blog/bid/241685/
Barcode-scanners-used-by-Amazon-to-manage-distribution-center-operations.
McCaffrey, E. 2014. Extraordinary people: The boy who sees without eyes. Accessed June 29,
2014. http://www.imdb.com/title/tt1273701/.
McKinlay, A. and K. Starkey. 1998. Foucault, Management and Organization Theory: From
Panopticon to Technologies of Self. London, U.K.: Sage.
Ng, C. 2014. Five privacy concerns about wearable technology. Accessed May 20, 2014.
http://blog.varonis.com/5-privacy-concerns-about-wearable-technology/.
Plant, S. 1996. The future looms: Weaving women and cybernetics. In: Clicking in: Hot Links
to a Digital Culture, L. Hershman-Leeson (ed.), pp. 123135. Seattle, WA: Bay Press.
Plant, S. 1997. Zeros and Ones: Digital Women and the New Technoculture. New York: Doubleday.
Poissant, L. 2007. The passage from material to interface. In: Media Art Histories, O. Grau
(ed.), pp. 229251. Cambridge, MA: MIT Press.

56

Fundamentals of Wearable Computers and Augmented Reality

Pold, S. 2005. Interface realisms: The interface as aesthetic form. Postmodern Culture 15(2).
Accessed February 20, 2015. http://muse.jhu.edu.lib-ezproxy.hkbu.edu.hk/journals/pmc/
toc/pmc15.2.html.
PSFK Labs. 2014. The future of wearable tech. Accessed January 8. http://www.slideshare.
net/PSFK/psfk-future-of-wearable-technology-report: PSFK.
Quinn, B. 2013. Textile Visionaries Innovation and Sustainability in Textiles Design, Vol. 11,
pp. 7681. London, U.K.: Laurence King Publishing.
Rackspace. 2013. The human cloud: Wearable technology from novelty to production. White
Paper. San Antonio, NC: Rackspace.
Saxenian, A.L. 1996. Regional Advantage: Culture and Competition in Silicon Valley and
Route 128. Cambridge, MA: Harvard University Press.
Schiphorst, T. 2011. Self-evidence: Applying somatic connoisseurship to experience design.
In: CHI 11 Extended Abstracts on Human Factors in Computing Systems, pp. 145160.
Sudia, F.W. 2001. A jurisprudence of artilects: Blueprint for a synthetic citizen. Accessed
August 30, 2014. http://www.kurzweilai.net/a-jurisprudence-of-artilects-blueprint-fora-synthetic-citizen.
SVA, NYC. Interaction MFA interaction design. Accessed August 12, 2014. http://
interactiondesign.sva.edu/.
Sweeney, J.A. 2014. Artifacts from the three tomorrows. Graduate Institute of Futures Studies,
Tamkang University, Hawaii. Accessed August 30, 2014. https://www.
academia.
edu/7084893/The_Three_Tomorrows_A_Method_for_Postnormal_Times.
Thomas, P. 2013. Nanoart: The Immateriality of Art. Chicago, IL: Intellect.
Von Busch, O. 2013. Zen and the abstract machine of knitting. Textile 11(1): 619.
Wander, S. 2014. Introducing coloring. Accessed January 12, 2014. http://vimeo.com/
81510205.
Watts, H. 1988. The dada event: From trans substantiation to bones and barking. In: Event
Arts and Art Events, S.C. Foster (ed.), Vol. 57, pp. 119131. Ann Arbor, MI: UMI
Research Press.
Wiener, N. 1989. The Human Use of Human Beings: Cybernetics and Society. London, U.K.:
Free Association.
Wilson, J. 2013. Wearables in the workplace. Harvard Business Review Magazine, September.
Young, L.J. 2009. Love: Neuroscience reveals all. Nature 457: 148. https://hbr.org/2013/09/
wearables-in-the-workplace. Accessed February 28, 2015.

Section II
The Technology

4 Head-Mounted Display Technologies for Augmented Reality
Kiyoshi Kiyokawa

CONTENTS
4.1 Introduction
4.2 Brief History of Head-Mounted Displays
4.3 Human Vision System
4.4 HMD-Based AR Applications
4.5 Hardware Issues
    4.5.1 Optical and Video See-Through Approaches
    4.5.2 Ocularity
    4.5.3 Eye-Relief
    4.5.4 Typical Optical Design
    4.5.5 Other Optical Design
4.6 Characteristics of Head-Mounted Displays
    4.6.1 Resolution
    4.6.2 Field of View
    4.6.3 Occlusion
    4.6.4 Depth of Field
    4.6.5 Latency
    4.6.6 Parallax
    4.6.7 Distortions and Aberrations
    4.6.8 Pictorial Consistency
    4.6.9 Multimodality
    4.6.10 Sensing
4.7 Human Perceptual Issues
    4.7.1 Depth Perception
    4.7.2 User Acceptance
    4.7.3 Adaptation
4.8 Conclusion
References


4.1 INTRODUCTION

Ever since Sutherland's first see-through head-mounted display (HMD) in the late 1960s, attempts have been made to develop a variety of HMDs by researchers and manufacturers in the communities of virtual reality (VR), augmented reality (AR), and wearable computers. Because of HMDs' wide application domains and technological limitations, however, no single HMD is perfect. Ideally, visual stimulation should be presented over a field of view (FOV) of 200° (H) × 125° (V), at an angular resolution of 0.5 min of arc, with a dynamic range of 80 dB, at a temporal resolution of 120 Hz, and the device should look like a normal pair of glasses. Such a visual display is difficult to realize, so an appropriate compromise must be made considering a variety of technological trade-offs. This is why it is extremely important to understand the characteristics of different types of HMDs, their capabilities, and their limitations. As an introduction to the following discussion, this section covers three topics related to HMDs: a brief history of HMDs, the human vision system, and application examples of HMDs.

4.2 BRIEF HISTORY OF HEAD-MOUNTED DISPLAYS


The idea of an HMD was first patented by McCollum (1945). Heilig also patented a stereoscopic television HMD in 1960 (Heilig, 1960). He then developed and patented a stationary VR simulator, the Sensorama Simulator, in 1962, which was equipped with a variety of input and output devices, including a binocular display, to give the user virtual experiences. Comeau and Bryan at Philco Corporation built Headsight in 1961, the first functioning HMD (Comeau and Bryan, 1961). This was more like today's telepresence systems. Using a magnetic tracking system and a single cathode ray tube (CRT) monitor mounted on a helmet, Headsight showed a remote video image according to the measured head direction. Bell Helicopter Company studied a servo-controlled camera-based HMD in the 1960s. This display provided the pilot with an augmented view captured by an infrared camera under the helicopter for landing at night. In the sense that the real-world image was augmented in real time, this was the first video see-through AR system, though computer-generated imagery was not yet used.
The first HMD coupled with a head tracking facility and real-time computer-generated image overlay onto the real environment was demonstrated by Sutherland in the late 1960s (Sutherland, 1965, 1968). This tethered display, called the Sword of Damocles, had a set of CRT-based optical see-through relay optics for each eye, allowing each eye to observe a synthetic image and its surrounding real environment simultaneously from a different vantage point.
Since the early 1970s, the U.S. Air Force has studied HMD systems as a way of providing the aircrew with a variety of flight information. As the first system in this regard, the AN/PVS-5 series night vision goggle (NVG) was first tested in 1973. The Honeywell integrated helmet and display sighting system (IHADSS) is one of the most successful see-through systems in army aviation; it was first fielded in 1985 (Rash and Martin, 1988). In 1982, Furness demonstrated the visually coupled airborne systems simulator (VCASS), the U.S. Air Force's super-cockpit VR system (Furness, 1986).


The large expanse extra perspective (LEEP) optical system, developed in 1979 by Howlett, has been widely used in VR. The LEEP system, originally developed for 3-D still photography, provides wide-FOV (~110° (H) × 55° (V)) stereoscopic viewing. Having a wide exit pupil of about 40 mm, the LEEP requires no adjustment mechanism for interpupillary distance (IPD). Employing the LEEP optical system, McGreevy and Fisher developed the Virtual Interface Environment Workstation (VIEW) system at the NASA Ames Research Center in 1985. Using the LEEP optics, VPL Research introduced the first commercial HMD, the EyePhone, in 1989. The EyePhone encouraged VR research at many institutes and laboratories. Since then, a variety of HMDs have been developed and commercialized.

4.3 HUMAN VISION SYSTEM


Vision is the most reliable and complicated sense, providing more than 70% of the total sensory information. Figure 4.1a shows the structure of a human eye. When light travels through the cornea, it enters the pupil. The pupil is a round opening in the center of the iris, which adjusts the pupil's aperture size. After the light travels through the pupil, it enters the crystalline lens, which refracts the light onto the retina. There are two types of photoreceptor cells on the retina: rods and cones. The retina contains about 7 million cone cells and 120 million rod cells. As shown in Figure 4.1b, most cones exist in the fovea, while rods are widely distributed across the retina except for the fovea. Three types of cone cells, corresponding to different peak wavelength sensitivities, cooperatively provide color perception within the spectral region of 400–700 nm. The cones function under daylight (normal) conditions and provide very sharp visual acuity, the ability to resolve spatial detail. The rods function even under dim light conditions, though they provide lower visual acuity than cones do. Normal visual acuity can identify an object that subtends an angle of 1–0.5 min of arc.
The FOV of the human eye is an oval of about 150° (H) by 120° (V). As both eyes' FOVs overlap, the total binocular FOV measures about 200° (H) by 120° (V) (Barfield et al., 1995). The innermost region corresponding to the fovea is only 1.7° in diameter. Outside this region, the visual acuity drops drastically.

Cornea
Nasal

Iris
Density

Pupil
Lens
Ciliary body
and muscle
Blind spot

Optic nerve
(a)

Cones
Rods

Temporal

Degrees
from fovea

Retina
Fovea

Blind spot

(b)

80
Nasal

20 0 20

FIGURE 4.1 (a) Human eye structure and (b) density of cones and rods.

80
Temporal

62

Fundamentals of Wearable Computers and Augmented Reality

to move the eyes and/or the head. An area in the view where fixation can be accomplished without head motion is called the field of fixation, which is roughly circular
with a radius of about 4050. However, head motion will normally accompany to
maintain the rotation angle of the eyes smaller than15. The horizontal FOV slowly
declines with age, from nearly 180(H) at age 20, to 135(H) at age 80.
Depth perception relies on monocular and/or binocular depth cues. These cues can be further categorized into physiological and psychological cues. Physiological monocular depth cues include accommodation, monocular convergence, and motion parallax. Psychological monocular depth cues include apparent size, linear perspective, aerial perspective, texture gradient, occlusion, shades, and shadows. Binocular convergence and stereopsis are typical physiological and psychological binocular depth cues, respectively. Binocular convergence is related to the angle between the two lines of sight from a focused object to the two eyes, while stereopsis refers to the lateral disparity between the left and right retinal images. Stereopsis is the most powerful depth cue for distances up to 6–9 m (Boff et al., 1986), and it can be effective up to a few hundred meters.
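As a rough guide (a compact formulation added here for clarity, not taken from the original text; the symbols IPD, D, and the subscripts are introduced only for illustration), the geometry behind these binocular cues can be written as

\theta(D) \approx 2\arctan\!\left(\frac{\mathrm{IPD}}{2D}\right), \qquad \delta \approx \mathrm{IPD}\left(\frac{1}{D_1}-\frac{1}{D_2}\right),

where \theta is the convergence angle for an object at distance D, and \delta is the (small-angle) binocular disparity between objects at distances D_1 and D_2.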
The human eye has a total dynamic sensitivity of at least 10^10, by changing the pupil diameter from about 2 to 8 mm. According to the intensity of the light, the dynamic range is divided into three types of vision: photopic, mesopic, and scotopic (Bohm and Schranner, 1990). Photopic vision, experienced during daylight, features sharp visual acuity and color perception. In this case, the rods are saturated and not effective. Mesopic vision is experienced at dawn and twilight. In this case, the cones function less actively and provide reduced color perception. At the same time, peripheral vision can be effective for finding dim objects. Scotopic vision is experienced under starlight conditions. In this case, peripheral vision is more dominant than foveal vision, with poor visual acuity and degraded color perception, because only the rods are active.

4.4 HMD-BASED AR APPLICATIONS


As readers may find elsewhere in this book, HMDs have a variety of applications in
AR including military, medicine, scientific visualization, manufacturing, education,
training, navigation, and entertainment. When considering the use of an HMD, it is
important to identify crucial aspects in the target application.
A wide FOV HMD is preferred when the visual information needs to surround the
user. Army aviation is a good example in this regard, where the aviator often needs
to see in every direction. Through the HMD, the aviator sees a variety of situational
information, including pilotage imagery, tactical, and operational data (Buchroeder,
1987). In this case, a monocular display is often sufficient, as most targets are distant.
Size and weight of the HMD are relatively less crucial, as the aviator needs to wear a
helmet anyway which can also be suspended from the cockpit ceiling.
A high-resolution HMD is preferred for dexterous manipulation tasks. For example, angular pixel resolution as well as registration accuracy is crucial in medical AR visualization. Medical AR visualization eliminates the need for frequent gaze switching between the patient's body at hand and live images from the small camera inside the body shown on a monitor during laparoscopic and endoscopic procedures (Rolland et al., 1996).


Stereoscopic view is also important for accurate operations. Wide FOV, on the other
hand, is not crucial, as the image overlay is needed in a small area at hand.
A lightweight, less tiring HMD is especially preferred for end users and/or for tasks with a large workspace. Early examples in this regard include Boeing's AR system for wire harness assembly (Caudell and Mizell, 1992), the KARMA system for end-user maintenance (Feiner et al., 1993), and an outdoor wearable tour guidance system (Feiner et al., 1997). In these systems, moderate pixel resolution and registration accuracy often suffice. Safety and user-acceptance issues, such as peripheral vision and a mechanism for easy attachment/detachment, are of greater importance.

4.5 HARDWARE ISSUES


4.5.1 Optical and Video See-Through Approaches
There are mainly two types of see-through approaches in AR: optical and video. Figure 4.2a shows a typical configuration of an optical see-through display. With an optical see-through display, the real and synthetic images are combined with a partially transmissive and reflective optical device, typically a half-silvered mirror.

FIGURE 4.2 Typical configurations of (a) optical see-through display and (b) video see-through display.


The real world is seen almost intact through the optical combiner, while the synthetic image is optically overlaid on the real image. In most optical see-through HMDs, the optical combiner is placed at the end of the optical path, just in front of the user's eyes. In the case of a half-silvered mirror, the real scene is simply seen through it, whereas the synthetic imagery is reflected off it. The imaging device should not block the real environment from the eyes; instead, it is normally located above the optical combiner, or to the side of the user's head with relay optics. Advantages of optical see-through HMDs include a natural, instantaneous view of the real scene, seamlessness between the aided and peripheral views, and (generally) simple and lightweight structures.
Figure 4.2b shows a typical configuration of a video see-through display. With a video see-through display, the real-world image is first captured by a video camera, then the captured image and the synthetic image are combined electronically, and finally the combined image is presented to the user. Electronic merging can be accomplished by frame grabbers (such as digital cameras) or chroma-keying devices. Compared to optical see-through displays, far fewer video see-through displays are commercially available. As a result, researchers have often had to build them manually, using a closed-view (non-see-through) HMD and one or two small video cameras such as webcams. Advantages of video see-through HMDs over optical see-through HMDs include pictorial consistency between the real and the synthetic views and the availability of a variety of image processing techniques. With appropriate vision-based tracking and synchronous processing of the captured and the rendered images, geometric and temporal consistency can be accomplished.
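The two composition models can be summarized per pixel as follows (a simplified formulation added here for clarity, not taken from the original text; T and R denote the transmittance and reflectance of the optical combiner, and \alpha the coverage of the rendered content):

C_{\mathrm{optical}}(x) = T\,C_{\mathrm{real}}(x) + R\,C_{\mathrm{synthetic}}(x), \qquad C_{\mathrm{video}}(x) = \alpha(x)\,C_{\mathrm{synthetic}}(x) + \bigl(1-\alpha(x)\bigr)\,C_{\mathrm{camera}}(x).

In the optical case the synthetic color is always added on top of attenuated real light, which is why opaque overlays and true black cannot be shown; in the video case the compositor has full per-pixel control, which underlies the occlusion handling discussed in Section 4.6.3.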

4.5.2 Ocularity
Ocularity is another criterion for categorizing HMDs. There are three types of ocularity: monocular, biocular, and binocular. These categories are independent of the type of see-through. Table 4.1 shows the applicability of each combination of ocularity and see-through type in AR.
A monocular HMD has a single viewing device, either see-through or closed. It is relatively small and provides an unaided real view to the other eye. A monocular HMD is preferable, for example, in some outdoor situations, where a less obtrusive real view is crucial and a stereoscopic synthetic image is not necessary. Army aviation and wearable computing are good examples. With a monocular HMD, the two eyes see quite different images. This causes an annoying visual experience called binocular rivalry. This deficiency is most prominent when using a monocular video see-through display.
TABLE 4.1
Combinations of Ocularity and See-Through Types

                        Monocular     Biocular      Binocular (Stereo)
Optical see-through     Good          Confusing     Very good
Video see-through       Confusing     Good          Very good


A biocular HMD provides a single image to both eyes. As both eyes always observe exactly the same synthetic image, binocular rivalry does not occur. This is a typical configuration for consumer HMDs, where 2-D content such as television and video games is the primary target. Some biocular HMDs have optical see-through capability for safety reasons. However, an optical see-through view with a biocular HMD is annoying in AR systems because accurate registration is achievable with only one eye. For AR, biocular video see-through HMDs are preferable for casual applications, where stereo capability is not crucial but a convincing overlaid image is necessary. Entertainment is a good application domain in this regard (Billinghurst et al., 2001).
A binocular HMD has two separate displays with two input channels, one for
each eye. Because of the stereo capability, binocular HMDs are preferred in many
AR systems. There is often confusion between binocular and stereo. A binocular
HMD can function as a stereoscopic HMD only when two different image sources
are properly provided.

4.5.3 Eye-Relief

Most HMDs need to magnify a small image on the imaging device to produce a large virtual screen at a certain distance to cover the user's view (Figure 4.3a). For a small total size and rotational moment of inertia, a short eye-relief (the separation between the eyepiece and the eye) is desirable. However, too small an eye-relief causes the FOV to be partially shaded off, and it is inconvenient for users with eyeglasses. As a compromise, the eye-relief of an HMD is normally set between 20 and 40 mm.
The eye-relief and the actual distance between the eye and the imaging device (or the last image plane) are interlinked, because a magnifying lens (the eyepiece functions as a magnifying lens) normally has equal front and back focal lengths. For example, when the eye-relief is 30 mm, the distance between the eye and the imaging device will be roughly 60 mm. Similarly, the larger the eye-relief becomes, the larger the eyepiece diameter needs to be, which introduces heavier optics but a larger exit pupil. The exit pupil should be as large as possible, at least around 10 mm in diameter. The eyepiece diameter normally cannot exceed the IPD, which varies among individuals from 53 to 73 mm (Robinett and Rolland, 1992).
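As a rough illustration of these relationships, the following sketch (added here; it assumes a simple thin-lens eyepiece, and the focal length, display width, display distance, and eye-relief values are made-up examples) estimates the virtual screen distance and the apparent FOV:

import math

def magnifier_view(focal_mm, display_width_mm, display_dist_mm, eye_relief_mm):
    """Estimate virtual-image distance and apparent FOV for a thin-lens eyepiece.
    display_dist_mm is the display-to-lens distance and should not exceed focal_mm."""
    if display_dist_mm >= focal_mm:
        # Display at (or beyond) the focal plane: virtual image at infinity.
        return float("inf"), 2 * math.degrees(math.atan(display_width_mm / (2 * focal_mm)))
    # Thin-lens equation; the virtual image forms on the display side of the lens.
    image_dist_mm = focal_mm * display_dist_mm / (focal_mm - display_dist_mm)
    magnification = image_dist_mm / display_dist_mm
    image_width_mm = display_width_mm * magnification
    # Angular size of the virtual screen as seen from the eye behind the eyepiece.
    fov_deg = 2 * math.degrees(
        math.atan(image_width_mm / (2 * (image_dist_mm + eye_relief_mm))))
    return image_dist_mm, fov_deg

# Hypothetical numbers: 30 mm focal length, 12 mm wide microdisplay placed 29 mm
# from the lens, 30 mm eye-relief -> virtual screen roughly 0.87 m away, ~23 deg FOV.
print(magnifier_view(30.0, 12.0, 29.0, 30.0))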

4.5.4 Typical Optical Design

There is a variety of optical designs for HMDs, and each design has its pros and cons. For example, the virtual screens formed by an HMD appear at different locations in different optical designs (see Figure 4.3b). Optical designs used for HMDs can be divided into two types: pupil forming and non-pupil forming. Pupil-forming architecture, also known as relay optics, has often been used in early HMDs to allow a large FOV in exchange for large total size and weight (Hayford and Koch, 1989). It produces at least one intermediate image and an exit pupil that is collimated by the eyepiece. Having an intermediate image, the optical design can be flexible regarding, for example, the size of the imaging device. Pupil-forming systems are normally folded and placed

around the head to minimize the rotational moment of inertia. In such systems, the pupil of the eye needs to be positioned within a specific volume, called the eye box, to avoid eclipse.

FIGURE 4.3 (a) Eye-relief and viewing distance and (b) locations of the virtual screen in different types of HMDs.
With the advent of high-resolution, small imaging devices, non-pupil-forming
architecture has become more common, which allows a modest FOV in a lightweight and compact form factor. As a drawback of non-pupil-forming architecture,
optical design is less flexible. Figure 4.4 shows a number of typical eyepiece designs
in non-pupil forming architecture. In early HMDs, refractive optics has been used

67

Head-Mounted Display Technologies for Augmented Reality


Imaging device
Eyeball

Eyepiece

Eyeball

Imaging
device

(a)

(b)

Concave
mirror

Half-silvered
mirror

Imaging device

Eyeball

(c)

Free-form prism

FIGURE 4.4 Typical eyepiece designs. (a) Refractive, (b) catadioptric, and (c) free-form
prism.

(Figure 4.4a). In this case, at least three lenses are normally required for aberration
correction. The depth (thickness) and weight of the optics are difficult to reduce. Optical
see-through capability is achieved by folding the optical path by an optical combiner
placed between the eyepiece and the eye.
Catadioptric designs (Figure 4.4b) contain a concave mirror and a half-silvered mirror. Light emitted from the imaging device is first reflected by the half-silvered mirror toward the concave mirror. The light then bounces off the concave mirror, travels through the half-silvered mirror, and enters the eye. This configuration reduces the size and weight significantly. Moreover, no chromatic aberration (the inability of a lens to focus different colors to the same point) is introduced. Optical see-through capability is achieved by simply making the concave mirror semitransparent. However, the eye receives at most one-fourth of the original light from the imaging device (0.5 reflectance × 0.5 transmittance), because the light must interact with the half-silvered mirror twice. A beam-splitting prism is often used in place of the half-silvered mirror to increase the FOV at the expense of weight.
A free-form prism (Figure 4.4c) reduces the thickness and weight without loss of light efficiency. For example, a 34° horizontal FOV is achieved with a prism thickness of 15 mm. The inner side of the front surface functions as a concave mirror. The inner side of the back surface is carefully angled: first, the light from the imaging device bounces off this surface by total internal reflection; second, the reflected light travels through this surface to the eye because of its smaller incident angle. To provide optical see-through capability, a compensating prism can be attached to the front side (on the right side of Figure 4.4c).



FIGURE 4.5 Examples of (a) HOE-based HMD and (b) waveguide-based HMD.

A holographic optical element (HOE), a kind of diffraction grating, has been used to build lightweight optics for HMDs. Due to its diffractive power, a variety of curved-mirror shapes can be formed on a flat substrate. An HOE can also function as a highly transparent optical combiner due to its wavelength selectivity. Based on these unique characteristics, very thin, lightweight, and bright optical see-through HMDs can be designed (Ando et al., 1998). An example of an HOE-based stereo HMD is illustrated in Figure 4.5a.
An optical waveguide, or light-guide optical element, together with couple-in and couple-out optics, offers compact, lightweight, wide-FOV HMD designs (Allen, 2002; Kasai et al., 2000). As shown in Figure 4.5b, image components from an image source are first coupled into the waveguide by total internal reflection. Those image components are then coupled out of the waveguide using carefully designed semitransparent reflecting material such as an HOE. Some recent HMDs, such as Google Glass and the EPSON Moverio series, use a waveguide-based design.

4.5.5 Other Optical Design


While typical HMDs present a virtual screen at a certain distance in front of the user's eye, some HMDs form no virtual screen (Figure 4.3b). The Virtual Retinal Display (VRD), developed at the University of Washington, scans modulated light directly onto the retina of the eye based on the principle of the Maxwellian view. The VRD eliminates the need for screens and imaging optics, theoretically allowing


for very high-resolution and wide FOV. The VRD assures focused images all the
time regardless of accommodation of the eye, in exchange for a small exit pupil.
Head-mounted projective displays (HMPDs) present a stereo image onto the real environment from a pair of miniature projectors (Fisher, 1996). A typical configuration of an HMPD is shown in Figure 4.6a. From the regions of the real environment that are covered with retroreflective materials, the projected stereo image is bounced back to the corresponding eyes separately. Without the need for an eyepiece, this design is less obtrusive, and it gives smaller aberrations and a larger binocular FOV of up to 120° horizontally.
In 2013, two novel near-eye light field HMDs were proposed. The light field is the set of all light rays at every point in space traveling in every direction. In theory, light field displays can reproduce accommodation, convergence, and binocular disparity depth cues, eliminating the common problem of the accommodation–convergence conflict within a designed depth of field. NVIDIA's non-see-through near-eye light field display (Lanman and Luebke, 2013) is capable of presenting these cues by using an imaging device and a microlens array near the eye, closer than the eye's accommodation distance (see Figure 4.6b). Because of the simple structure and the short distance between the eye and the imaging device, a near-eye light field display can potentially provide a high resolution and a wide FOV in a very thin (~10 mm) and lightweight (~100 g) form factor. The University of North Carolina's near-eye light field display (Maimone and Fuchs, 2013) is optical see-through, supporting a wide FOV, selective occlusion, and multiple simultaneous focal depths in a similarly compact form factor. Their approach requires no reflective, refractive, or diffractive components, but instead relies on a set of optimized patterns to produce a focused image when displayed on a stack of spatial light modulators (LCD panels). Although the image quality of these near-eye light field displays is currently not satisfactory, they are extremely promising because of the unique advantages mentioned earlier.
In 2014, UNC and NVIDIA jointly proposed yet another novel architecture, called pinlight (Maimone et al., 2014). A pinlight display is simply composed of a spatial light modulator (an LCD panel) and an array of point light sources (implemented as an edge-lit, etched acrylic sheet). It forms an array of miniature see-through projectors, thereby offering an arbitrarily wide FOV in a compact form factor. Their prototype display renders a wide FOV (110° diagonal) in real time by using a shader program to rearrange images for the tiled miniature projectors.


FIGURE 4.6 (a) Head-mounted projective display and (b) near-eye light field display.


4.6 CHARACTERISTICS OF HEAD-MOUNTED DISPLAYS


4.6.1 Resolution

The resolution of a display system defines the fidelity of the image. The resolution of the total system is limited by the optics and the imaging device. In the case of video see-through, the resolution of the camera must be taken into consideration as well. A modulation transfer function (MTF) is often used to quantify the way modulation is transferred through the system. If the system is linear, the MTF of the entire system is the product of the individual components' MTFs (equivalently, the convolution of their point spread functions). In practice, however, angular resolution and the total number of pixels are conveniently used to assess each component. Regarding the resolution of the synthetic image, an ideal HMD would need as many as 12,000 × 7,200 pixels to compete with human vision (60 pixels per degree (PPD) over the total FOV of 200° × 120°). This is, unfortunately, not yet easily obtainable with current technology.
To compromise, one needs to choose one of three options: (1) higher angular resolution with a narrower FOV, (2) lower angular resolution with a wider FOV, or (3) arraying multiple screens (tiling). Medical visualization and army aviation are suitable for the first and second options, respectively. The border between the first and second options is not sharp, but 50° horizontally is a reasonable threshold. The third option is promising, but it often suffers from geometric and color discontinuities at display unit borders, as well as increased manufacturing cost, weight, and size of the device. For example, Sensics's piSight provides a wide FOV of 187° (H) × 84° (V) in a 4 × 3 arrangement per eye. Its maximum total input pixel resolution per eye is 1,920 × 1,200, yielding a horizontal PPD of 10.3. Another way of using multiple screens is a combination of the first and second options (Longridge et al., 1989). The idea is to provide a high-resolution screen and a wide FOV screen in a concentric layout. Mimicking the human vision system, this configuration gives the highest resolution where it is needed.
However, as the pixel resolution of flat panels has been steadily increasing, this angular resolution–FOV trade-off is likely to disappear in the future. For example, if a 4K display (3,840 × 2,160) is used to cover 150° of horizontal FOV, its PPD is over 25.6, at which the pixel structure is difficult to notice.
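These pixel-budget figures follow from a simple pixels-per-degree calculation; the short sketch below (added for illustration, with the numbers taken from the text) reproduces them:

def pixels_per_degree(horizontal_pixels, horizontal_fov_deg):
    # Average angular pixel density across the horizontal FOV.
    return horizontal_pixels / horizontal_fov_deg

def required_pixels(fov_deg, target_ppd=60):
    # Pixel count needed to reach a target angular resolution.
    return fov_deg * target_ppd

# A 4K panel spread over 150 degrees of horizontal FOV -> 25.6 PPD.
print(pixels_per_degree(3840, 150))

# Matching foveal acuity (~60 PPD) over a 200 x 120 degree FOV -> 12,000 x 7,200 pixels.
print(required_pixels(200), required_pixels(120))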
In AR systems, the resolution of the real scene is a different story. Optical see-through displays provide close to the best scene resolution, that is, the resolution obtained with the unaided eye; aberrations and distortions introduced by the optical combiner are negligible. Video see-through displays, on the other hand, provide digitized real images. Closed-type HMDs, mentioned earlier, can be used as video see-through HMDs by attaching a digital camera. The resolution of the observed real scene is limited by the resolutions of both the camera and the display. To avoid unnecessary image deterioration, it is desirable that the camera's pixel resolution be comparable or superior to that of the display unit.

4.6.2 Field of View

The field of view of an HMD for AR can be classified into a number of regions. The aided (or overlay) FOV is the most important visual field in AR, where the synthetic image is overlaid onto the real scene. The aided FOV of a stereo HMD typically consists of a stereo FOV and monocular FOVs. Narrow FOV HMDs (<~60° (H)) commonly have 100% overlap, whereas wide FOV HMDs (>~80° (H)) often have a smaller overlap ratio, for example, 50%. The area outside the aided FOV consists of the peripheral FOV and occluded regions blocked by the HMD structure. The real scene is seen directly through the peripheral FOV, whereas neither the real nor the synthetic image is viewed in the occluded regions. The transition of the real view between the aided and peripheral regions should be as seamless as possible, and the occluded regions should be as small as possible.
Closed-type, wide FOV (immersive) HMDs, such as the Oculus Rift, typically have little or no peripheral FOV through which the real scene can be seen. A video see-through option is available on the market for some closed-type wide FOV HMDs, such as the Oculus Rift and the Sensics piSight. By attaching appropriate cameras manually, any closed-type HMD can be used as a video see-through HMD. The InfinitEye V2, which offers a total binocular FOV of 210° (H) × 90° (V) with 90° of stereo overlap, is no exception.
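The monocular FOV, the overlap, and the total binocular FOV are linked by a simple relation (added here for clarity, not stated in the original text):

\mathrm{FOV}_{\mathrm{total}} = 2\,\mathrm{FOV}_{\mathrm{mono}} - \mathrm{FOV}_{\mathrm{overlap}}.

With the InfinitEye numbers above, a 210° total FOV with 90° of overlap implies a monocular FOV of (210° + 90°)/2 = 150° per eye.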
In optical see-through HMDs, overlay FOVs larger than around 60° (H) are difficult to achieve with conventional optical designs due to aberrations and distortions. However, optical see-through HMDs tend to have a simple and compact structure, leaving a wide peripheral FOV for direct observation of the real scene. Nagahara et al. (2003) proposed a very wide FOV HMD (180° (H) × 60° (V) overlap) using a pair of ellipsoidal and hyperboloidal curved mirrors. This configuration can theoretically achieve optical see-through, provided by a half-silvered curved mirror. However, the image is seen only from a very small sweet spot, the focus of the ellipsoid. L-3 Link Simulation and Training's Advanced HMD (AHMD) achieves a wide view of 100° (H) × 50° (V) optically using an ellipsoidal mirror (Sisodia et al., 2006). Kiyokawa (2007) proposed a type of HMPD, the hyperboloidal HMPD (HHMPD) (see Figure 4.7a), which provides a wide FOV by using a pair of semitransparent hyperboloidal mirrors. With this design, a horizontal FOV wider than 180° is easily achievable. Nguyen et al. (2011) extended this design to be usable in a mobile environment by using a semitransparent retroreflective screen (see Figure 4.7b).
Recent advancements in optical design offer completely new paradigms for optical see-through wide FOV HMDs. Pinlight displays, introduced in the previous section, allow an arbitrarily wide FOV in an eyeglass-like compact form factor. Innovega's iOptik architecture also offers an arbitrarily wide FOV by using a custom contact lens. Through the contact lens, one can focus on the backside of the eyeglasses and the real environment at the same time. A wide aided FOV is available if an appropriate image is presented on the backside of the eyeglasses, by micro projectors, for example.
The necessary aided FOV is task dependent. In medical 3-D visualization, such as breast needle biopsy, only a limited region in the visual field needs to be aided. In VR, peripheral vision has proven to be important for situation awareness and navigation tasks (Arthur, 2000). Larger peripheral FOVs reduce the required head motion and searching time. However, the actual effects of a wide FOV display on the perception of AR content have not been widely studied. Kishishita et al. (2014) showed that search performance in a divided attention task either drops
or increases as the FOV increases up to 100° of horizontal FOV, depending on the view management method used, and that the estimated performances converge at approximately 130°.

FIGURE 4.7 A hyperboloidal head-mounted projective display (HHMPD) (a) with and (b) without a semitransparent retroreflective screen and (c) an example of the image.

4.6.3 Occlusion
Occlusion is well known to be a strong depth cue. In the real world, the depth order of objects can be recognized by observing overlaps among them. In terms of cognitive psychology, incorrect occlusion confuses the user. The occlusion capability of a see-through display is important for enhancing the user's perception of the synthetic scene and its visibility and realism. Correct mutual occlusion between the real and the synthetic scenes is often essential in AR applications, such as architectural previewing. To present correct occlusion, depth information for both the real and the synthetic scenes is necessary. Depth information for the synthetic image is normally available from the depth buffer in the graphics pipeline. Real-time depth acquisition in the


real scene has long been a difficult problem, but inexpensive RGB-D cameras are now widely available.
Once the depth information is acquired, occlusion is reproduced differently with
optical and video see-through approaches. In both cases, a partially occluded virtual
object can be presented by depth keying or rendering phantom objects. Similarly, a
partially occluded real object can be presented in a video see-through approach simply by rendering the occluding virtual object over the video background. However,
the same effect is quite difficult to achieve optically, as the real scene is always seen through the partially transmissive optical combiner. Any optical combiner will reflect some percentage of the incoming light and transmit the rest, making it impossible to overlay opaque objects optically. Moreover, each pixel of the synthetic image is affected by the color of the real image at the corresponding point and never directly shows its intended color.
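For the video see-through case, depth keying reduces to a per-pixel comparison between the camera's depth map and the rendering depth buffer. The sketch below (an illustrative example added here; the function and array names are invented for the purpose, and a calibrated, registered RGB-D camera is assumed) composites the nearer surface at each pixel:

import numpy as np

def depth_keyed_composite(camera_rgb, camera_depth, render_rgb, render_depth, render_alpha):
    # camera_rgb:   HxWx3 captured real image
    # camera_depth: HxW metric depth from an RGB-D camera
    # render_rgb:   HxWx3 rendered virtual image
    # render_depth: HxW depth buffer of the rendered image (same metric units)
    # render_alpha: HxW coverage of the virtual content (0 where nothing was drawn)
    # A virtual pixel wins only where virtual content exists and is nearer than the real surface.
    virtual_wins = (render_alpha > 0) & (render_depth < camera_depth)
    mask = virtual_wins[..., np.newaxis].astype(camera_rgb.dtype)
    return mask * render_rgb + (1.0 - mask) * camera_rgb

In an optical see-through display, the same comparison would instead have to drive a light-modulating element (as in the ELMO displays described below), since the combiner cannot simply discard real-world light.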
Approaches to tackle this problem include (1) using luminous synthetic imagery to make the real scene virtually invisible, (2) using a patterned light source in a dark environment to make parts of real objects invisible (e.g., Maimone et al., 2013), and (3) using an HMPD with retroreflective screens. The first approach is common in flight simulators, but it restricts the available colors (to bright ones only). The second and third approaches need a special configuration in the real environment and are thus not available, for example, in mobile situations. Another approach is a transmissive or reflective light-modulating mechanism embedded in the see-through optics. The ELMO displays proposed by Kiyokawa employ a relay design to introduce a transparent LCD panel positioned at an intermediate focus point. The most advanced ELMO display (ELMO-4) features parallax-free optics with a built-in real-time rangefinder (Kiyokawa et al., 2003) (see Figure 4.8). An optical see-through light field display using a stack of LCD panels has a capability of selective occlusion (Maimone and Fuchs, 2013) and is extremely promising, though its image quality needs to be significantly improved. Reflective approaches have also been proposed using a digital micro-mirror device (DMD) or a liquid crystal on silicon (LCoS) (Cakmakci et al., 2004). Although they require a telecentric system, reflective approaches are advantageous in terms of color purity and light efficiency.

4.6.4 Depth of Field

Depth of field refers to the range of distances from the eye (or a camera) in which an object appears in focus. In real life, the eye's accommodation is automatically adjusted to focus on an object according to its distance, and objects outside the depth of field appear blurred. On the other hand, the synthetic image is normally seen at a fixed distance. Therefore, it is impossible to focus on both the real and the synthetic images at the same time with a conventional optical see-through HMD, unless the focused object is at or near the HMD's viewing distance.
This problem does not occur with a video see-through display, though captured real objects can be defocused due to the camera. To avoid blurred video images, the camera should preferably have autofocus or a small aperture. However, the fixed focus of the synthetic image is problematic because accommodation and
convergence are closely interlinked in the human vision system. Adjusting one of these while keeping the other fixed causes eyestrain.

FIGURE 4.8 (a) ELMO-4 optics design, (b) its appearance, and overlay images seen through ELMO-4, (c) without occlusion and (d) with occlusion and real-time range sensing. (Images taken from Kiyokawa, K. et al., An occlusion-capable optical see-through head mount display for supporting co-located collaboration, Proceedings of the International Symposium on Mixed and Augmented Reality (ISMAR) 2003, 133–141, 2003. Copyright 2003 IEEE.)
To focus on both the real and the synthetic images at the same time, a different optical design can be used. Virtual images presented by VRDs and pinlight displays appear clearly in focus regardless of the user's accommodation distance. This is not always advantageous, especially when the content to present is a realistic 3-D scene. On the other hand, a number of varifocal HMDs have been proposed that change the focal depth of the image in real time according to the intended depth of the content. 3DDAC, developed at ATR in the late 1990s, has an eye-tracking device and a lens-shift mechanism (Omura et al., 1996). Its fourth generation, 3DDAC Mk.4, can change its focal length in the range between 0 and 4 diopters in about 0.3 s. In 2001, the University of Washington proposed the True 3-D Display (Schowengerdt and Seibel, 2004), which uses laser scanning by a varifocal mirror and can present a number of

images at different depths in a time-division fashion. In this system, it is difficult to control an image's depth, as all depths are not presented at the same time. In 2008, the University of Arizona proposed a varifocal HMD using a liquid lens (see Figure 4.9) (Liu et al., 2008). This approach is advantageous in terms of size, weight, and cost. Being able to reproduce accommodation cues, near-eye light field displays are the most promising in this regard, although their image quality needs to be improved further.

FIGURE 4.9 A liquid lens-based varifocal HMD. (Courtesy of Hong Hua, University of Arizona, Tucson, AZ.)
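Regarding the diopter range quoted above, a unit conversion may be helpful (added here for reference, not from the original text): focal power D in diopters and focus distance d in meters are related by d = 1/D, so the 0–4 diopter range of 3DDAC Mk.4 corresponds to focus distances from optical infinity down to 0.25 m.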

4.6.5 Latency
Latency in HMD-based systems refers to a temporal lag from the measurement of
head motion to the moment the rendered image is presented to the user. This leads
to inconsistency between visual and vestibular sensations. In an optical see-through
HMD, latency is observed as a severe registration error with head motion, which
further introduces motion sickness, confusion, and disorientation. In such a situation, the synthetic image swings around the real scene. In a video see-through HMD,
this problem can be minimized by delaying the captured real image to synchronize it
with the corresponding synthetic image. This approach eliminates apparent latency
between the real and the synthetic scenes, at the expense of artificial delay introduced in the real scene.
To compensate for latency, prediction filters such as an extended Kalman filter (EKF)
have been successfully used. Frameless rendering techniques can minimize the rendering delay by continuously updating part of the image frame. Taking advantage
of nonuniformity of visual acuity and/or saccadic suppression, limiting regions and/
or resolution of the synthetic image using an eye-tracking device helps reduce the
rendering delay (Luebke and Hallen, 2001). Viewport extraction and image shifting techniques take a different approach. With these techniques, a synthetic image
larger than the screen resolution is first rendered, and then a portion of it is extracted
and presented to the user according to the latest measurement. There exist some

hardware implementations of image shift techniques. Kijima et al. coined the term Reflex HMD, from the vestibulo-ocular reflex, describing an HMD that has a high-speed head pose measurement system independent of the system latency and an image shifting mechanism. They proposed a variety of Reflex HMDs (see Figure 4.10) (Kijima and Ojika, 2002). Their system uses a gyro sensor attached to an HMD to estimate the amount of rotation corresponding to the system latency and adjusts the cropping position and rotation angle of the rendered image. This approach is inexpensive and independent of machines, applications, and OS. A similar mechanism is employed in the Oculus Rift. By using a high-speed (1,000 Hz) inertial measurement unit (IMU) and pixel resampling hardware, it compensates not only for head rotation (inter-frame latency) but also for the rolling shutter effect (intra-frame latency) of the display unit.

FIGURE 4.10 Reflex HMD. (Courtesy of Ryugo Kijima, Gifu University, Gifu, Japan.)
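As a rough illustration of the image-shift idea (a simplified sketch added here; it is not the Reflex HMD or Oculus implementation, and the function name and numbers are assumptions), the horizontal crop offset can be derived from the measured yaw rate, the estimated latency, and the display's pixels per degree:

def late_crop_offset_px(yaw_rate_deg_s, latency_s, pixels_per_degree):
    # Horizontal shift (in pixels) to apply when cropping an over-rendered frame,
    # compensating for the head rotation that occurred during the rendering latency.
    predicted_rotation_deg = yaw_rate_deg_s * latency_s
    return predicted_rotation_deg * pixels_per_degree

# Example: 100 deg/s head rotation, 20 ms motion-to-photon latency, 25 PPD
# -> shift the crop window by about 50 pixels.
print(late_crop_offset_px(100.0, 0.020, 25.0))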

4.6.6 Parallax

Unlike optical see-through systems, video see-through HMDs have difficulty eliminating the parallax between the user's eye and the camera viewpoint. Mounting a stereo camera above the HMD introduces a vertical parallax, causing a false sense of height. Horizontal parallax introduces errors in depth perception. It is desirable that the camera lens be positioned optically at the user's eye to minimize the parallax. Examples of parallax-free video see-through HMDs include Canon's COASTAR (Takagi et al., 2000) and State et al.'s (2005) display, which use a free-form prism and a half-silvered mirror, respectively. On the other hand, the parallax introduced by an optical combiner is negligible and is normally not compensated. As another problem, the viewpoint for rendering must match that of the eye (for optical see-through) or the camera (for video see-through). As a rendering viewpoint, the center of eye rotation is better for position accuracy, whereas the center of the entrance pupil is better for angular accuracy (Vaissie and Rolland, 2000). Although the human IPD alters dynamically because of eye rotation, this dynamic IPD has not yet been compensated in real time to the author's knowledge.

4.6.7 Distortions and Aberrations

Image distortions and aberrations cause incorrect registration, incorrect rendered depths, eyestrain, and disorientation in AR. In a stereo HMD, differences in image distortion between the left and right images must be minimized to achieve correct stereopsis.
Scanning-based displays such as CRTs and VRDs are prone to image distortion.
Because it takes several milliseconds to scan an image, image distortion on the retina will occur with rapid head motion. Rapid head motion also induces annoying
color separation with field-sequential color systems.
Lenses and curved mirrors introduce a variety of optical aberrations. Typical distortions include pincushion, barrel, and trapezoidal. Without introducing additional
optical elements, optical distortions can be corrected electronically by predistorting
the source image. In optical see-through HMDs, distortion must be corrected optically, which may increase weight and size of the optics.
Chromatic aberrations occur due to the refractive power (a prism effect) of the lenses. To compensate, achromatic lenses, which consist of convex and concave elements, are normally used. Reflective optical elements such as concave mirrors do not induce chromatic aberrations. Considering that full-color displays actually have only RGB components, chromatic aberrations can be compensated by separately predistorting the R, G, and B planes at the expense of increased rendering cost. This technique greatly contributes to flexibility in optical design, resulting, for example, in inexpensive wide-FOV designs such as the Oculus Rift.
Spherical aberrations are induced by the spherical shape of the lens surface. With lateral shift of the eye, the image becomes distorted and blurred. Similarly, field curvature causes blurred imagery in the periphery. Predistortion techniques are not effective at correcting these aberrations; instead, aspheric and/or achromatic lenses can be used.
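A common form of electronic correction (a generic radial-distortion sketch added here for illustration; it does not describe any particular product's pipeline, and the coefficient values are made up) predistorts each color plane with its own radial polynomial so that the optics' radial distortion and lateral chromatic aberration are approximately cancelled:

import numpy as np

def predistort_plane(plane, k1, k2):
    # Resample one color plane with a radial polynomial r' = r(1 + k1*r^2 + k2*r^4),
    # using nearest-neighbor lookup for brevity.
    h, w = plane.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
    u = (xs - w / 2) / (w / 2)   # normalized coordinates with the optical
    v = (ys - h / 2) / (h / 2)   # center placed at the image center
    r2 = u * u + v * v
    scale = 1.0 + k1 * r2 + k2 * r2 * r2
    src_x = np.clip(u * scale * (w / 2) + w / 2, 0, w - 1).astype(int)
    src_y = np.clip(v * scale * (h / 2) + h / 2, 0, h - 1).astype(int)
    return plane[src_y, src_x]

def predistort_rgb(image, coeffs):
    # Per-channel coefficients also compensate lateral chromatic aberration.
    return np.dstack([predistort_plane(image[..., c], *coeffs[c]) for c in range(3)])

# Hypothetical per-channel coefficients (R, G, and B differ slightly).
frame = np.zeros((1080, 1200, 3))
corrected = predistort_rgb(frame, {0: (-0.21, 0.05), 1: (-0.22, 0.05), 2: (-0.23, 0.06)})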

4.6.8 Pictorial Consistency

Pictorial consistency between the real and virtual images is important for the sense of reality as well as the visibility of the overlay information. For example, the brightness and contrast of the synthetic image should be adjusted to those of the real image. In an optical see-through HMD, it is difficult to match them over the very wide range of luminance values of the real scene. For example, no imaging device is bright enough to be comparable to direct sunlight. Instead, some products allow transparency control. In video see-through systems, pictorial consistency is more easily achieved; instead, the low contrast (low dynamic range) of the captured image is often a problem. To compensate, real-time high dynamic range (HDR) techniques could be used, though the author is not aware of a successful example in video see-through AR.


4.6.9 Multimodality

Vision is the primary modality in AR; most AR studies and applications are vision oriented. However, other senses are also important. Literally speaking, AR systems target arbitrary sensory information. Receptors of the special senses, including the auditory, olfactory, gustatory, and balance senses, reside in the head; thus, a head-mounted device is a good choice for modulating such sensory information. For example, a noise-canceling earphone can be considered a hear-through head-mounted (auditory) display in the sense that it combines modulated sound from the real world with digital sound. Recently, a variety of HMDs for nonvisual senses have been proposed.
Some sensory information is more difficult to reproduce than others. Interplay between different senses can be used to address this problem. For example, Meta Cookie, developed by Narumi et al. (2011), successfully presents different tastes with the same real cookie by overriding its visual and olfactory stimuli using a head-mounted device. In this way, multimodal displays have great potential for complementing and reinforcing missing senses. It will become more and more important, at least at a lab level, to explore different types of senses in the form of head-mounted devices.

4.6.10 Sensing
Unlike a smartphone or a smart watch, a head-mounted device will be cumbersome if a user needs to put it on and take it off frequently. A typical expectation for a future HMD is that it will become light, small, and comfortable enough that a wearer can use it continuously for an extended period of the day for a variety of purposes. However, an HMD will be useless, or even harmful, when its content is not relevant to the current situation and hinders observation of the imminent real environment behind it. This problem is less prominent with an HMD for wearable computing, where the FOV is relatively small and shown off center of the user's view. It is more crucial with an HMD for AR, which is expected to have a wide FOV covering the user's central field of vision. In such situations, an AR system must be aware of user and environmental contexts, and switch its contents and their presentation style properly and dynamically.
Different types of contextual information need to be recognized to determine if and how the AR content should be presented. Such information includes environmental context, such as location, time, weather, and traffic, as well as user context, such as body motion (Takada et al., 2010), gaze (Toyama et al., 2014), physiological status, and schedule. In this sense, the integration of sensing mechanisms into an HMD will become more important. An HMD can be combined not only with conventional sensors such as a camera and a GPS unit but also with environmental sensors for light, noise, and temperature, as well as biological sensors for EEG, ECG, skin conductance, and body temperature.
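As a concrete illustration of such context-aware switching, the sketch below maps a few sensed quantities to a presentation detail level. It is only a minimal, hypothetical example: the sensor fields, thresholds, and detail levels are invented for illustration and are not taken from the systems cited above.

from dataclasses import dataclass

@dataclass
class Context:
    walking_speed_mps: float   # hypothetical body-motion estimate (IMU)
    ambient_lux: float         # hypothetical ambient-light sensor reading
    gaze_on_display: bool      # hypothetical eye-tracker flag

def choose_detail_level(ctx: Context) -> str:
    """Decide how much AR content to present for the current context."""
    if ctx.walking_speed_mps > 1.5:
        return "minimal"        # user is moving fast: keep the central view clear
    if not ctx.gaze_on_display:
        return "ambient"        # user is not looking: dimmed, peripheral summary
    if ctx.ambient_lux > 10000:
        return "high_contrast"  # bright outdoor scene: fewer, bolder elements
    return "full"               # full annotations and labels

print(choose_detail_level(Context(0.3, 500.0, True)))   # -> "full"
print(choose_detail_level(Context(2.0, 500.0, True)))   # -> "minimal"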
Among the variety of sensing modalities, a large number of attempts have been made at eye tracking. In 2008, Fraunhofer IPMS proposed an HMD, iSTAR, that is capable of both displaying an image and tracking the eye at the same time using an OLED on a CMOS sensor, exploiting the duality of an image sensor and an image display. A user's view as well as the user's gaze is important in the analysis of the user's interest; however, it has been difficult to acquire a wide, parallax-free user's view. Mori et al. (2011) proposed a head-mounted eye camera that achieves this by using a hyperboloidal semitransparent mirror (see Figure 4.11). Eye tracking is also achieved by analyzing the user's eye images captured at the same time as the user's view. Corneal image analysis is a promising alternative to this system for its simple hardware configuration, offering a variety of applications including calibration-free eye tracking (Nakazawa and Nitschke, 2012), interaction-free HMD calibration (Itoh and Klinker, 2014), object recognition, etc. For a multifocal HMD, estimation of the gaze direction may not be enough. It is more desirable to be able to estimate the depth of the attended point in space. Toyama et al. (2014) revealed that a stereo eye tracker can estimate the focused image distance, using a prototypical three-layer monocular optical see-through HMD.

FIGURE 4.11 Wide view eye camera. Appearance (a) and captured image (b).

4.7 HUMAN PERCEPTUAL ISSUES


4.7.1 Depth Perception
Even when geometrical consistency is achieved, it is often difficult to perceive the depths of virtual objects correctly in AR. This is due primarily to (1) an HMD's insufficient capability to support depth cues, (2) the lack of standard rendering approaches, and (3) visual congestion. First, as we have seen in this chapter, standard HMDs do not support every depth cue used in the human vision system. Depth perception can be improved by rendering other types of monocular depth cues, for example, shades, shadows, aerial perspective, and texture gradient, when appropriate.
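As a small illustration of one such cue, the sketch below applies aerial perspective, the standard exponential distance-haze blend from computer graphics, to an object color. The haze density value is arbitrary and not taken from the works cited in this chapter.

import math

def aerial_perspective(color, haze_color, distance_m, density=0.02):
    """Blend an object color toward a haze color with distance (exponential fog)."""
    f = 1.0 - math.exp(-density * distance_m)   # 0 near the viewer, approaches 1 far away
    return tuple((1.0 - f) * c + f * h for c, h in zip(color, haze_color))

near = aerial_perspective((1.0, 0.2, 0.2), (0.7, 0.75, 0.8), 5.0)
far  = aerial_perspective((1.0, 0.2, 0.2), (0.7, 0.75, 0.8), 150.0)
print(near, far)   # the distant object is washed out toward the haze color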
Second, some of those rendering techniques may not be preferable in some situations. Virtual objects in an AR application are often rendered in a simple way (e.g., wire-framed) intentionally, so as not to obstruct the visibility of the real scene. Such objects are less informative in terms of depth perception. The typical x-ray vision effect also causes confusion in depth perception. To support correct depth perception in such situations, many research groups such as Livingston et al. (2003) proposed a variety of combinations of visualization techniques, varying, for example, edge drawing styles and surface opacity.
Third, in some AR applications, virtual annotations and labels may overlap or become congested. Visual congestion degrades the visibility of the object of interest, making it difficult to perceive its distance. To alleviate label overlaps and to increase label visibility, relocation techniques in the screen space have been proposed by many research groups (e.g., Bell et al., 2001; Grasset et al., 2012).

4.7.2 User Acceptance
Inappropriately worn HMDs will induce undesirable symptoms including headaches, shoulder stiffness, motion sickness, or even severe injuries. From an ergonomic point of view, HMDs must be as light, small, and comfortable to wear as possible, as long as the visual performance satisfies the application requirements. The center of mass of an HMD must be positioned as close to that of the user's head as possible. A well-balanced heavy HMD feels much lighter than a poorly balanced lightweight HMD.
Safety issues are of equal importance. By their nature, AR applications distract the user's voluntary attention from the real environment by overlaying synthetic information. Paying too much attention to the synthetic image could be highly dangerous to real-world activity. To prevent catastrophic results, AR applications may need to display minimal information, as long as the target task is assisted satisfactorily. Furthermore, HMDs restrict peripheral vision, which obstructs situation awareness of the surroundings. In video see-through, central vision will be lost under a system failure. To mitigate these problems, a flip-up display design is helpful (Rolland and Fuchs, 2001). When safety issues are of top priority, optical see-through HMDs are recommended.
From a social point of view, HMDs should have a low-profile or cool design to be widely accepted. Video cameras on an HMD raise privacy and security issues. Bass et al. (1997) describe the ultimate test of the obtrusiveness of an HMD as whether or not a wearer is able to gamble in a Las Vegas casino without being challenged.

4.7.3 Adaptation
The human vision system is quite dynamic. It takes some time to adapt to and recover from a new visual experience. For example, wearing an HMD will cause the pupils to dilate slightly. However, complete dilation may take over 20 min, whereas complete constriction may take less than one minute (Alpern and Campbell, 1963). Even when the visual experience is inconsistent with the real world, the human vision system adapts to the new environment very flexibly. For example, a great ability to adapt to an inverted image on the retina has been known for more than 100 years (Stratton, 1896). Similar adaptation occurs in video see-through AR systems with parallax. Biocca and Rolland (1998) found that performance in a depth-pointing task improved significantly over time using a video see-through system with a parallax of 62 mm vertically and 165 mm horizontally. Also found was a negative aftereffect, which can be harmful in some situations.


Long-term use of an HMD will increase the likelihood of the user encountering a variety of deficiencies, such as red eyes, fatigue, double vision, and motion sickness. Therefore, a recovery period should be given to the user whenever needed. The National Institute for Occupational Safety recommends 15 min of rest after each 2 h of continuous use of a video display unit (VDU) (Rosner and Belkin, 1989). Extensive user studies must be conducted to develop similar recommendations for see-through HMDs.

4.8 CONCLUSION
With the advancements in display technologies and an increasing public interest in AR, VR, and wearable computing, both research and business on HMDs are now more active than ever. However, there is, and will be, no single right HMD due to technical limitations and the wide variety of applications. Therefore, an appropriate compromise must be made depending on the target application. The issues discussed in this chapter give some insights into the selection of an HMD. One must first consider whether an optical or a video see-through approach is more suitable for the target task. This is, in short, a trade-off between real-world visibility and pictorial consistency. The next consideration would be a trade-off between the field of view and the angular resolution. When the user needs to observe both near and far overlay information, an accommodation-capable (e.g., near-eye light field displays) or accommodation-free (e.g., VRDs) HMD may be the first choice. If true occlusion within nearly intact real views is necessary, occlusion-capable optical see-through displays such as ELMO-4 should be selected. Novel optical designs such as near-eye light field displays and pinlight displays offer many preferable features at the same time, such as a wide field of view and a compact form factor. Multimodal output and sensing features will become more important as the demand for more advanced AR applications grows and the HMD becomes an indispensable tool.

REFERENCES
Allen, K. (2002). A new fold in microdisplay optics, in emerging displays review, emerging display technologies, Stanford Resources, July, pp. 7–12.
Alpern, M. and Campbell, F. W. (1963). The behavior of the pupil during dark adaptation, Journal of Physiology, 65, 5–7.
Ando, T., Yamasaki, K., Okamoto, M., and Shimizu, E. (1998). Head-mounted display using a holographic optical element, Proceedings of SPIE 3293, Practical Holography XII, San Jose, CA, p. 183. doi:10.1117/12.303654.
Arthur, K. W. (2000). Effects of field of view on performance with head-mounted displays, Doctoral thesis, University of North Carolina at Chapel Hill, Chapel Hill, NC.
Barfield, W., Hendrix, C., Bjorneseth, O., Kaczmarek, K. A., and Lotens, W. (1995). Comparison of human sensory capabilities with technical specifications of virtual environment equipment, Presence, 4(4), 329–356.
Bass, L., Mann, S., Siewiorek, D., and Thompson, C. (1997). Issues in wearable computing: A CHI 97 workshop, ACM SIGCHI Bulletin, 29(4), 34–39.
Bell, B., Feiner, S., and Hollerer, T. (2001). View management for virtual and augmented reality, Proceedings of the ACM UIST 2001, Orlando, FL, pp. 101–110.


Billinghurst, M., Kato, H., and Poupyrev, I. (2001). The MagicBook: Moving seamlessly between reality and virtuality, IEEE Computer Graphics and Applications, 21(3), 6–8.
Biocca, F. A. and Rolland, J. P. (1998). Virtual eyes can rearrange your body: Adaptation to virtual-eye location in see-thru head-mounted displays, Presence: Teleoperators and Virtual Environments (MIT Press), 7(3), 262–277.
Boff, K. R., Kaufman, L., and Thomas, J. P. (1986). Handbook of Perception and Human Performance, John Wiley & Sons, New York.
Bohm, H. D. V. and Schranner, R. (1990). Requirements of an HMS/D for a night-flying helicopter. Helmet-mounted displays II, Proceedings of SPIE, Orlando, FL, 1290, 93–107.
Buchroeder, R. A. (1987). Helmet-mounted displays, tutorial short course notes T2, SPIE Technical Symposium Southeast on Optics, Electro-optics, and Sensors, Orlando, FL.
Cakmakci, O., Ha, Y., and Rolland, J. P. (2004). A compact optical see-through head-worn display with occlusion support, Proceedings of the IEEE and ACM International Symposium on Mixed and Augmented Reality (ISMAR), Arlington, VA, pp. 16–25.
Caudell, T. P. and Mizell, D. W. (1992). Augmented reality: An application of heads-up display technology to manual manufacturing processes, Proceedings of the 1992 IEEE Hawaii International Conference on Systems Sciences, Honolulu, HI, pp. 659–669.
Comeau, C. P. and Bryan, J. S. (1961). Headsight television system provides remote surveillance, Electronics, 34 (10 November), 86–90.
Feiner, S., Macintyre, B., and Seligmann, D. (1993). Knowledge-based augmented reality, Communications of the ACM, 36(7), 53–62.
Feiner, S. B., Macintyre, B., Tobias, H., and Webster, A. (1997). A touring machine: Prototyping 3D mobile augmented reality systems for exploring the urban environment, Proceedings of ISWC 97, Cambridge, MA, pp. 74–81.
Fisher, R. (November 5, 1996). Head-mounted projection display system featuring beam splitter and method of making same, US Patent No. 5572229.
Furness, T. A. (1986). The super cockpit and its human factors challenges, Proceedings of the Human Factors Society, Dayton, OH, 30, 48–52.
Grasset, R., Langlotz, T., Kalkofen, D., Tatzgern, M., and Schmalstieg, D. (2012). Image-driven view management for augmented reality browsers, Proceedings of International Symposium on Mixed and Augmented Reality (ISMAR), Atlanta, GA, pp. 177–186.
Hayford, M. J. and Koch, D. G. (1989). Optical arrangement, US Patent No. 4854688, issued August 8, 1989.
Heilig, M. (1960). Stereoscopic television apparatus for individual use, US Patent No. 2955156, issued October 4, 1960.
Itoh, Y. and Klinker, G. (2014). Interaction-free calibration for optical see-through head-mounted displays based on 3D eye localization, Proceedings of the Ninth IEEE Symposium on 3D User Interfaces (3DUI), Minneapolis, MN, pp. 75–82.
Kasai, I., Tanijiri, Y., Endo, T., and Ueda, H. (2000). A forgettable near eye display, Proceedings of Fourth International Symposium on Wearable Computers (ISWC) 2000, Atlanta, GA, pp. 115–118.
Kijima, R. and Ojika, T. (2002). Reflex HMD to compensate lag and correction of derivative deformation, Proceedings of International Conference on Virtual Reality (VR) 2002, Orlando, FL, pp. 172–179.
Kishishita, N., Kiyokawa, K., Kruijff, E., Orlosky, J., Mashita, T., and Takemura, H. (2014). Analysing the effects of a wide field of view augmented reality display on search performance in divided attention tasks, Proceedings of International Symposium on Mixed and Augmented Reality (ISMAR) 2014, Munich, Germany.
Kiyokawa, K. (2007). A wide field-of-view head mounted projective display using hyperbolic half-silvered mirrors, Proceedings of International Symposium on Mixed and Augmented Reality (ISMAR) 2007, Nara, Japan, pp. 207–210.


Kiyokawa, K., Billinghurst, M., Campbell, B., and Woods, E. (2003). An occlusion-capable optical see-through head mount display for supporting co-located collaboration, Proceedings of International Symposium on Mixed and Augmented Reality (ISMAR) 2003, Tokyo, Japan, pp. 133–141.
Lanman, D. and Luebke, D. (2013). Near-eye light field displays, ACM Transactions on Graphics (TOG), 32(6), 220. Proceedings of SIGGRAPH Asia, Hong Kong, China.
Liu, S., Cheng, D., and Hua, H. (2008). An optical see-through head mounted display with addressable focal planes, Proceedings of IEEE International Symposium on Mixed and Augmented Reality (ISMAR) 2008, Cambridge, UK, pp. 33–42.
Livingston, M. A., Swan, J. E., Gabbard, J. L., Hollerer, T. H., Hix, D., Julier, S. J., Yohan, B., and Brown, D. (2003). Resolving multiple occluded layers in augmented reality, Proceedings of International Symposium on Mixed and Augmented Reality (ISMAR) 2003, Tokyo, Japan, pp. 56–65.
Longridge, T., Thomas, M., Fernie, A., Williams, T., and Wetzel, P. (1989). Design of an eye slaved area of interest system for the simulator complexity testbed, in Area of Interest/Field-Of-View Research Using ASPT, T. Longridge (ed.). National Security Industrial Association, Air Force Human Resources Laboratory, Air Force Systems Command, Washington, DC, pp. 275–283.
Luebke, D. and Hallen, B. (2001). Perceptually-driven simplification for interactive rendering, Proceedings of the ACM 12th Eurographics Workshop on Rendering Techniques, London, UK, pp. 223–234.
Maimone, A. and Fuchs, H. (2013). Computational augmented reality eyeglasses, Proceedings of International Symposium on Mixed and Augmented Reality (ISMAR) 2013, Adelaide, Australia, pp. 29–38.
Maimone, A., Lanman, D., Rathinavel, K., Keller, K., Luebke, D., and Fuchs, H. (2014). Pinlight displays: Wide field of view augmented reality eyeglasses using defocused point light sources, ACM Transactions on Graphics (TOG), 33(4), Article No. 89.
Maimone, A., Yang, X., Dierk, N., State, A., Dou, M., and Fuchs, H. (2013). General-purpose telepresence with head-worn optical see-through displays and projector-based lighting, Proceedings of IEEE Virtual Reality (VR), Orlando, FL, pp. 23–26.
McCollum, H. (1945). Stereoscopic television apparatus, US Patent No. 2,388,170.
Mori, H., Sumiya, E., Mashita, T., Kiyokawa, K., and Takemura, H. (2011). A wide-view parallax-free eye-mark recorder with a hyperboloidal half-silvered mirror and appearance-based gaze estimation, IEEE TVCG, 17(7), 900–912.
Nagahara, H., Yagi, Y., and Yachida, M. (2003). Super wide viewer using catadioptical optics, Proceedings of ACM VRST, Osaka, Japan, pp. 169–175.
Nakazawa, A. and Nitschke, C. (2012). Point of gaze estimation through corneal surface reflection in an active illumination environment, Proceedings of European Conference on Computer Vision (ECCV), Florence, Italy, Vol. 2, pp. 159–172.
Narumi, T., Nishizaka, S., Kajinami, T., Tanikawa, T., and Hirose, M. (2011). Meta cookie: An illusion-based gustatory display, Proceedings of the 14th International Conference on Human-Computer Interaction (HCI International 2011), Orlando, FL, pp. 260–269.
Nguyen, D., Mashita, T., Kiyokawa, K., and Takemura, H. (2011). Subjective image quality assessment of a wide-view head mounted projective display with a semi-transparent retro-reflective screen, Proceedings of the 21st International Conference on Artificial Reality and Telexistence (ICAT 2011), Osaka, Japan.
Omura, K., Shiwa, S., and Kishino, F. (1996). 3-D display with accommodative compensation (3DDAC) employing real-time gaze detection, SID 1996 Digest, San Diego, CA, pp. 889–892.
Rash, C. E. and Martin, J. S. (1988). The impact of the U.S. Army's AH-64 helmet mounted display on future aviation helmet design, USAARL Report No. 88-13. Fort Rucker, AL: U.S. Army Aeromedical Research Laboratory.


Robinett, W. and Rolland, J. P. (1992). A computational model for the stereoscopic optics of a head-mounted display, Presence: Teleoperators and Virtual Environments (MIT Press), 1(1), 45–62.
Rolland, J. P. and Fuchs, H. (2001). Optical versus video see-through head-mounted displays, in Fundamentals of Wearable Computers and Augmented Reality, Barfield, W. and Caudell, T. (eds.). Lawrence Erlbaum Associates: Mahwah, NJ.
Rolland, J. P., Wright, D. L., and Kancherla, A. R. (1996). Towards a novel augmented-reality tool to visualize dynamic 3D anatomy, Proceedings of Medicine Meets Virtual Reality, Vol. 5, San Diego, CA (1997). Technical Report, TR96-02, University of Central Florida, Orlando, FL.
Rosner, M. and Belkin, M. (1989). Video display units and visual function, Survey of Ophthalmology, 33(6), 515–522.
Schowengerdt, B. T. and Seibel, E. J. (2004). True 3D displays that allow viewers to dynamically shift accommodation, bringing objects displayed at different viewing distances into and out of focus, Cyber Psychology & Behavior, 7(6), 610–620.
Sisodia, A., Riser, A., Bayer, M., and McGuire, J. (2006). Advanced helmet mounted display for simulator applications, SPIE Defense & Security Symposium, Helmet- and Head-Mounted Displays XI: Technologies and Applications Conference, Orlando, FL.
State, A., Keller, K. P., and Fuchs, H. (2005). Simulation-based design and rapid prototyping of a parallax-free, orthoscopic video see-through head-mounted display, Proceedings of IEEE/ACM ISMAR, Santa Barbara, CA, pp. 29–31.
Stratton, G. M. (1896). Some preliminary experiments on vision without inversion of the retinal image, Psychological Review, 3, 611–617.
Sutherland, I. (1965). The ultimate display, Information Processing 1965: Proceedings of IFIP Congress, New York, NY, Vol. 2, pp. 506–508.
Sutherland, I. (1968). A head-mounted three-dimensional display, Fall Joint Computer Conference, AFIPS Conference Proceedings, San Francisco, CA, Vol. 33, pp. 757–764.
Takada, D., Ogawa, T., Kiyokawa, K., and Takemura, H. (2010). A context-aware wearable AR system with dynamic information detail control based on body motion, Transaction on Human Interface Society, Japan, 12(1), 47–56 (in Japanese).
Takagi, A., Yamazaki, S., Saito, Y., and Taniguchi, N. (2000). Development of a stereo video see-through HMD for AR systems, Proceedings of International Symposium on Augmented Reality (ISAR) 2000, Munich, Germany, pp. 68–80.
Toyama, T., Orlosky, J., Sonntag, D., and Kiyokawa, K. (2014). Natural interface for multifocal plane head mounted displays using 3D gaze, Proceedings of the 2014 International Working Conference on Advanced Visual Interfaces, Como, Italy, pp. 25–32.
Vaissie, L. and Rolland, J. (2000). Accuracy of rendered depth in head-mounted displays: Choice of eyepoint locations, Proceedings of SPIE AeroSense 2000, Orlando, FL, Vol. 4021, pp. 343–353.

Optics for Smart Glasses, Smart Eyewear, Augmented Reality, and Virtual Reality Headsets

Bernard Kress

CONTENTS
5.1 Introduction..................................................................................................... 86
5.2 HMD/SMART Eyewear Market Segments..................................................... 87
5.3 Optical Requirements...................................................................................... 88
5.4 Optical Architectures for HMDs and Smart Glasses...................................... 91
5.5 Diffractive and Holographic Extractors..........................................................97
5.6 Notions of IPD, Eye Box, Eye Relief, and Eye Pupil.......................................99
5.7 Optical Microdisplays.................................................................................... 102
5.8 Smart Eyewear............................................................................................... 107
5.9 Examples of Current Industrial Implementations......................................... 110
5.9.1 Display-Less Connected Glasses....................................................... 110
5.9.2 Immersion Display Smart Glasses..................................................... 114
5.9.3 See-Through Smart Glasses............................................................... 114
5.9.4 Consumer Immersion VR Headsets.................................................. 115
5.9.5 Consumer AR (See-Through) Headsets............................................ 116
5.9.6 Specialized AR Headsets.................................................................. 117
5.10 Other Optical Architectures Developed in Industry..................................... 117
5.10.1 Contact Lens-Based HMD Systems.................................................. 117
5.10.2 Light Field See-Through Wearable Displays..................................... 118
5.11 Optics for Input Interfaces............................................................................. 118
5.11.1 Voice Control..................................................................................... 119
5.11.2 Input via Trackpad............................................................................. 119
5.11.3 Head and Eye Gestures Sensors........................................................ 119
5.11.4 Eye Gaze Tracking............................................................................. 119
5.11.5 Hand Gesture Sensing....................................................................... 121
5.11.6 Other Sensing Technologies.............................................................. 122
5.12 Conclusion..................................................................................................... 122
References............................................................................................................... 123


This chapter reviews the various optical technologies that have been developed to implement head-mounted displays (HMDs) as augmented reality (AR) devices, virtual reality (VR) devices, and, more recently, as connected glasses, smart glasses, and smart eyewear. We review the typical requirements and optical performances of such devices, categorize them into distinct groups suited for different (and constantly evolving) market segments, and analyze such market segmentation.

5.1 INTRODUCTION
Augmented reality (AR) HMDs (based on see-through optics) have been around for a few decades now, although they were dedicated solely to defense applications until recently (Cakmacki and Rolland, 2006, Hua et al., 2010, Martins et al., 2004, Melzer and Moffitt, 1997, Rash, 1999, Velger, 1998, Wilson and Wright, 2007). Today, AR headsets have been applied to various markets, such as firefighting, police, engineering, logistics, medical, surgery, and more, with emphasis on sensors, specific digital imaging, and strong connectivity. Consumer applications are also emerging rapidly, focused on connectivity and digital imaging capabilities in an attractive and minimalistic package. Such segmentation has been possible thanks to recent technological leaps in the smartphone industry (connectivity, on-board CPU power with miniaturization of ICs, development of complex sensors, novel microdisplays, novel digital imaging techniques, and battery technology).
Virtual reality (VR) HMDs (also called occlusion or immersive displays) have also been around for decades but have been targeted at various market segments, such as flight simulators and battle training for defense applications. Successive attempts to mass distribute VR HMDs as well as AR HMDs to the consumer market have partially failed during the last two decades, mainly because of the lack of adapted displays and microdisplays, embedded sensors, and subsequent problems with high latency (see a few early examples of VR offerings from the 1990s in Figure 5.1).

FIGURE 5.1 Early examples of VR systems from the 1990s.


5.2 HMD/SMART EYEWEAR MARKET SEGMENTS


Traditionally, HMD markets have been split between low-cost gadget occlusion HMDs (low-resolution video players, no or minimal sensors, no connectivity) and high-cost see-through defense HMDs (complex see-through optics, large field of view [FOV], high resolution, and packed with sensors). Today, thanks to the development of smartphones and their associated chips, sensors, and apps, new market segments are emerging, targeted at various applications, such as the following:
Connected glasses: Such eyewear devices usually have no display or have single- or multiple-pixel displays (individual LEDs), but are packed with Bluetooth connectivity and, in some cases, also WiFi connectivity. They may incorporate digital imaging with >8 MP still images and high-resolution video feed, through a tethered smartphone.
Smart glasses: Smart glasses come in different flavors (occlusion or see-through). They have small displays and a small FOV (usually around 10°–20° diagonally). Such smart glasses may also incorporate prescription (Rx) lenses, but the optical combiner is here not part of the Rx lens; instead, it is located outside the Rx lens on the world side (occlusion and see-through combiners) or before the Rx lens on the glass frame (VR and large occlusion displays).
Smart eyewear: Smart eyewear devices integrate the optical combiner into the Rx lenses (which could also be a zero-diopter lens, such as curved sun shades). Smart eyewear is an extension of see-through smart glasses that actually has the look and feel of conventional glasses, with the addition of the Rx lens prescription.
Gaming VR devices: VR HMDs have been with us for some time and still look like the devices first developed in the 1990s (bulky and heavy, with a cable running down the neck; see Figure 5.1). However, their display resolution, computing power, multiple sensors, and very low latency make them quite different from their ancestors. Gaming VR devices will eventually evolve into a new breed of VR devices, smaller and lighter, and more oriented toward new ways of communicating rather than toward pure entertainment. With this evolution, VR devices will tend to merge with large-FOV AR devices. Although the large-FOV display challenge has been solved for VR, see-through optics for a large FOV (>100°) is still a challenge (especially in terms of size and weight). Most of the occlusion VR headsets are binocular (Takahashi and Hiroka, 2008) and provide a 3D stereo experience. Foveal rendering (both in resolution and color) is an active area of research in VR to reduce the computational burden and connectivity bandwidth (i.e., high-resolution rendering over only a few degrees around the fovea, whose direction is sensed through low-latency gaze trackers).
AR headsets for consumer and enterprise: These are niche markets that have proven very effective for specific market segments such as medical, firefighting, engineering, logistics, and distribution. See-through smart glasses provide a contextual display functionality rather than a true AR functionality, which requires a larger FOV and thus bulkier optics. Major challenges lie ahead for binocular consumer and enterprise AR headsets, such as solving the focus/vergence disparity problem (by using light field displays?) as well as providing integrated optical solutions to implement occlusion pixels for realistic augmented reality displays.
Defense HMDs: The defense market will remain a stable market for both large-FOV VR (simulation and training) and large-FOV AR HMDs (for both rotary-wing and fixed-wing aircraft, where they tend to replace bulky head-up displays [HUDs]).

FIGURE 5.2 Some of the current AR HMD and smart glasses products.

Some of the current offerings are depicted in Figure 5.2. For more specific VR headset offerings, refer to Sections 5.9.4 through 5.9.6.

5.3 OPTICAL REQUIREMENTS


The various HMD market segments described in Section 5.2 have very different optical requirements, linked both to the target application and to the form factor constraints, as summarized in Table 5.1.
FOV is one of the requirements that may differ greatly from one application market to another, both for occlusion and see-through displays, and this is clearly expressed in the multitude of FOVs and FOV locations developed by industry (see also Figure 5.3).
Requirements on the size of the FOV and its location are directly linked to the content, to the physical implementation, and to the application (AR, VR, or smart glasses). Eye strain should be one criterion when deciding how large the FOV should be and where to position it within the angular space available to the user. Display technologies (such as scanners or switchable optics) that are able to scale the FOV in real time without losing resolution (i.e., keeping the resolution at the eye's resolving limit of 1.2 arc min), or optical combiner technologies that are able to relocate the entire FOV in real time, or fracture the available FOV/resolution into different locations, are very desirable, but have not yet been implemented in commercial systems.


TABLE 5.1
Requirements for the Various HMD Market Segments
[Table comparing smart glasses, VR headsets, industrial HMDs, and defense HMDs against the following criteria: industrial design, power consumption, cost, weight/size, eye box, Rx glasses integration, full-color operation, FOV, system contrast, environmental stability, see-through quality, and mono-/binocular configuration.]
Note: + means critical; +++ means most critical; − means not critical; −− means least critical.

FIGURE 5.3 Display FOVs (both occlusion and see-through) developed by industry: Oculus Rift (115°), Sony Morpheus (90°), Sony HMZ-T2 (51°), Lumus DK-32 (40°), Zeiss Cinemizer (35°), Optinvent ORA (24°), Epson Moverio (23°), Vuzix M-100 (16°), Recon Jet (16°), and Google Glass (15°).


In order to keep the resolution within or below the angular resolution of the human eye, scaling up a large FOV is today a real challenge for immersive VR headsets, which require a very large FOV and thus also a very dense pixel count. Several major display companies have been developing 2 K and 4 K displays over a regular cell phone display area, which should be able to address a high FOV and a decent angular resolution for VR systems up to a 100° diagonal FOV.
For smart glasses, with an FOV of 15°–20°, nHD (640 × 360 pixels, one ninth of full HD) or at best 720p resolutions are usually sufficient. The FOV and the resulting resolution for various available HMDs today are listed in Table 5.2. Dots per degree (DPD) replaces the traditional resolution criterion of dots per inch (DPI) used for conventional displays. Indeed, a high DPI can still result in a low DPD if the FOV is large. An angular resolution of 50 DPD corresponds roughly to 1.2 arc min, which is the resolution of the human eye (for 20/20 vision). Figure 5.4a shows the angular resolution of some available HMDs in industry as a function of FOV. As one can expect, the angular resolution tends to decrease when the FOV increases, even when the display resolution increases (see also Section 5.4).
The pixel counts needed to achieve the 1.2 arc min angular resolution for increasing FOVs (diagonally measured) can be quite large when attempting to implement 20/20 vision in VR headsets for FOVs over 100°. Today, the densest pixel count display is a 2 K display (QHD at 2560 × 1440 on the Galaxy Note 5), which would allow such resolution over a FOV of 60° only. Next year, 4 K displays (3840 × 2160 by Samsung) will be available, pushing the resolution up to nearly 100°, which is the minimum for VR but quite large already for AR applications. Figure 5.4b shows how a 16:9 aspect ratio pixel count scales with FOV.
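The scaling above can be checked with a few lines of arithmetic. The sketch below (plain Python, written for this text rather than taken from the chapter) computes pixels per degree along the panel diagonal, and the pixel count needed to hold 50 pixels/degree (about 1.2 arc min) over a 16:9 panel for a given diagonal FOV.

import math

def pixels_per_degree(width_px, height_px, diag_fov_deg):
    """Angular resolution along the panel diagonal."""
    return math.hypot(width_px, height_px) / diag_fov_deg

def required_pixels(diag_fov_deg, ppd=50.0, aspect=(16, 9)):
    """Panel size needed to hold 'ppd' pixels/degree over the diagonal FOV."""
    diag_px = ppd * diag_fov_deg
    k = diag_px / math.hypot(*aspect)
    w, h = aspect[0] * k, aspect[1] * k
    return round(w), round(h), w * h / 1e6     # width, height, megapixels

print(pixels_per_degree(2560, 1440, 60))   # QHD over ~60 deg  -> ~49 px/deg
print(pixels_per_degree(3840, 2160, 88))   # 4K holds ~50 px/deg out to ~88 deg
print(required_pixels(100))                # ~4358 x 2451 px, ~10.7 Mpix for 100 deg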
It is also interesting to organize the various existing HMDs on a graph showing the FOV as a function of the target functionality (smart glasses, AR, or VR); see Figure 5.5. As one might expect, a lower FOV is favored by smart glass and smart eyewear applications. As the FOV increases, professional AR applications tend to be favored, and for maximal FOV, occlusion VR gaming devices are the preferred application.
TABLE 5.2
FOV and Resulting Angular Resolution for Various Devices Available Today

Device            FOV    Resolution    Aspect Ratio   Pixels per Degree
Google Glass      15°    640 × 360     16:9           48
Vuzix M100        16°    400 × 240     16:9           28
Epson Moverio     23°    960 × 540     16:9           48
Oculus Rift       115°   800 × 640     1.25:1         9
Zeiss Cinemizer   35°    870 × 500     1.74:1         28
Sony HMZ T2       51°    1280 × 720    16:9           46
Optinvent ORA     24°    640 × 480     4:3            33
Lumus DK40        25°    640 × 480     4:3            32

FIGURE 5.4 (a) Angular resolution as a function of FOV for various existing HMDs, relative to the eye resolution limit of 1.2 arc min (50 pixels/degree); (b) pixel counts required to achieve 1.2 arc min resolution over a 16:9 screen for various FOVs, from nHD (640 × 360) up to 8 K SHD (7680 × 4320).

5.4 OPTICAL ARCHITECTURES FOR HMDs AND SMART GLASSES


We have seen in the previous sections that there are very different application sectors for HMDs, relying on very different optical requirements. It is therefore not surprising that very different optical architectures have been developed to address such different requirements, both in optical performance and in form factor. Most of the tools available in the optical engineer's toolbox have been used to implement the various types of smart glasses, AR, and VR devices. Such optical tools include refractives, reflectives, catadioptrics, immersed reflectives, segmented reflectives, Fresnel optics, diffractives, holographics, lightguides, waveguides, MEMS, etc. However, within this optical zoo, there are only two main ways to implement a see-through (or non-see-through) optical HMD architecture: the pupil-forming or the non-pupil-forming architecture (see Figure 5.6a).

FIGURE 5.5 Smart glasses, AR, and VR as a function of FOV.

FIGURE 5.6 (a) Pupil-forming and non-pupil-forming optical architectures for HMDs; (b) occlusion display magnifiers (VR); (c) see-through free-space combiner optics; (d) see-through light-guide combiner optics; (e) see-through TIR freeform combiner optics; (f) see-through single mirror combiner optic; and (g) see-through cascaded extractor optics.

In the pupil-forming architecture, there is an aerial image of the microdisplay formed by a relay lens. This aerial image becomes the object to be magnified by the eyepiece lens, as in a non-pupil-forming architecture.
Although the non-pupil-forming optical architecture seems to be the simplest and thus the best candidate to implement small and compact HMDs, the pupil-forming architecture has a few advantages, such as the following:
For a large FOV, the microdisplay does not need to be located close to the combiner lens (thus providing free space around the temple side).
As the object is an aerial image (thus directly accessible, not located under a cover plate as in the microdisplay), a diffuser or other element can be placed in that plane to yield an adequate diffusion cone in order to expand, for example, the eye box of the system. Other exit pupil expanders (EPEs) can also be used in that pupil plane (microlens arrays [MLAs], diffractive elements, etc.).
The optical path can be tilted at the aerial image plane, thus providing for head wrap instead of a straight optical path as in the non-pupil-forming architecture. The aerial image can be bounced off at grazing incidence through a mirror or a prism.
Most of the consumer offerings today (see Figure 5.2) are using the non-pupil-forming
architecture. Most of the defense HMDs are using the pupil-forming architecture.
The optical platforms used to implement the optical combining function in smart
glasses, smart eyewear, AR, and VR devices are quite diverse. They can be grouped
roughly into six categories:
1. Immersion display magnifiers (Figure 5.6b): These are magnifiers placed directly on top of the display for maximum FOV (such as in VR devices) or further away in a folded path, such as in smaller-FOV smart glasses. They may be implemented as conventional lenses or as more compact segmented or Fresnel optics, on flat or curved substrates, over single or multiple surfaces.
2. See-through free-space combiner optics (Figure 5.6c): Such optics are usually partially reflective (either through thin metal or dichroic coatings), as thin elements or immersed in a thicker refractive optical element, and operate in off-axis mode, making them more complex surfaces than the standard on-axis surfaces in (1). Such surfaces can also be freeform to implement a large FOV. They might be reflective, segmented (Fresnel-type), or reflective diffractive/holographic (Kress et al., 2009) in order to reduce the curvature and thus their protrusion.
3. See-through lightguide combiner optics (Figure 5.6d): Very often these architectures are not really lightguides, since any light reflecting (through TIR) from the surfaces might produce ghost images (or reduce the contrast) rather than contributing to the desired image. However, the light field is constantly kept inside plastic or glass, keeping it from being affected by hair, scatter from dust, etc. For perfect see-through, reflective optics might be used (right side of Figure 5.6d).
4. See-through freeform TIR combiner optics (Figure 5.6e): This is a classical design used not only in see-through combiners but also in occlusion HMDs (Talha et al., 2008). Typically, this is a three-surface freeform optical element: the first surface is transmissive, the second surface is TIR, and the third surface is partially reflective. It is very desirable in occlusion displays since it allows the relocation of the display on top or on the side and can allow for a larger FOV. In see-through mode, a compensating element has to be cemented on the partially reflective coating. Multiple TIR bounces (>3) have also been investigated with this architecture.
5. See-through single mirror TIR combiner optic (Figure 5.6f): This is a true TIR guide that uses either a partially reflective, flat, or curved mirror as a single extractor, or a leaky diffractive or holographic extractor. When the guide gets thin, the eye box tends to be reduced. The combiner element (flat or curved) as seen by the eye should have the widest extent possible, in order to produce the largest eye box. This is why the combiner mirror (or half-tinted mirror) should be oriented inside the lightguide in such a way that the user sees the largest possible combiner area, producing, therefore, the largest possible eye box, without compromising image resolution, distortion, or efficiency.
6. See-through cascaded waveguide extractor optics (Figure 5.6g): In order to expand the eye box over the previous architectures (especially #5), cascaded extractors (Thomson CSF, 1991) have been investigated, ranging from dichroic mirrors to partially reflective prism arrays and variable-efficiency reflective and transmission holographic extractors (Kress and Meyrueis, 2009).
Most of the HMDs we review in this chapter are monocular designs, although there
has been extensive research and development for stereoscopic displays for the consumer market. The issues related to potential eye strain are more complex when
dealing with bi-ocular or binocular displays (Peli, 1998).

5.5 DIFFRACTIVE AND HOLOGRAPHIC EXTRACTORS


We describe here a particular type of optical combiner that can be implemented either in free space or in waveguide space. The combiner is here a holographic or diffractive optical element. Diffractive (surface relief modulation) and holographic (material index modulation) optics are similar in nature and can implement various optical functionalities, as depicted in Figure 5.7.
Although the optical phenomenon is similar (diffraction through material modulation), the optical effects are very different. For example, the Bragg selectivity of volume holograms (index modulation in the material) cannot be implemented with a surface relief diffractive element. A diffractive element is, however, easier to replicate via embossing or injection molding or a combination of both. The Bragg selectivity of volume holograms is a very desirable feature that has already been implemented in defense HUDs for decades. Only recently have volume holograms been applied to AR headsets and smart glasses (see Figure 5.9).
FIGURE 5.7 Diffractive and holographic optics implementing various optical functionalities.

FIGURE 5.8 Angular and spectral bandwidths of reflection and transmission holograms.

FIGURE 5.9 Examples of holographic and diffractive combiners: (a) free-space Digilens and Composyt Labs smart glasses using volume reflection holograms; (b) flat Nokia/Vuzix/Microsoft and flat Microsoft HoloLens digital diffractive combiners with 2D exit pupil expanders; (c) Konica Minolta full-color holographic vertical lightguide using a single RGB reflective holographic extractor, and Sony monocolor waveguide smart glasses using a 1D reflective holographic incoupler and an exit pupil expander outcoupler.

A free-space operation is depicted in Figure 5.6b and a waveguide operation is depicted in Figure 5.6g. Figure 5.8 shows typical angular and spectral bandwidths derived from Kogelnik's coupled wave theory for reflection and transmission volume holograms, operating in either free-space or TIR waveguide modes. The FOV of the display is thus usually limited by the angular spectrum of the hologram, modulated by the spectral bandwidth. Transmission holograms have wider bandwidths, but also require a higher index modulation, especially when tri-color operation is required.
In order to reduce the spectral spread (when using LED illumination) and increase the angular bandwidth (in order to push through the entire FOV without a uniformity hit), it is necessary to use reflection-type holograms (large angular bandwidth and smaller spectral bandwidth).
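For orientation, the sketch below evaluates the standard on-Bragg diffraction efficiencies from Kogelnik's coupled wave theory for lossless phase holograms (sin-squared of the coupling strength for transmission gratings, tanh-squared for reflection gratings). It is a rough sketch under those textbook assumptions only; the angular and spectral selectivity curves of Figure 5.8 additionally require Kogelnik's detuning parameter, which is not modeled here, and the numerical values are illustrative.

import math

def coupling(delta_n, thickness_um, wavelength_nm=550.0, slant_cos=1.0):
    """Coupling strength nu = pi * dn * d / (lambda * cos(theta)), on-Bragg."""
    lam_um = wavelength_nm * 1e-3
    return math.pi * delta_n * thickness_um / (lam_um * slant_cos)

def eta_transmission(delta_n, thickness_um, **kw):
    # Lossless transmission phase hologram at the Bragg condition.
    return math.sin(coupling(delta_n, thickness_um, **kw)) ** 2

def eta_reflection(delta_n, thickness_um, **kw):
    # Lossless reflection phase hologram at the Bragg condition.
    return math.tanh(coupling(delta_n, thickness_um, **kw)) ** 2

# Example: a 10 um film with a modest index modulation at 550 nm.
print(eta_transmission(0.015, 10))   # ~0.57
print(eta_reflection(0.015, 10))     # ~0.48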

5.6 NOTIONS OF IPD, EYE BOX, EYE RELIEF, AND EYE PUPIL
Although the eye box is one of the most important criteria in an HMD, allowing easy viewing of the entire FOV by users having different interpupillary distances (IPDs) or temple-to-eye distances, it is also the criterion with the loosest definition. The IPD is an important criterion that has to be addressed for consumer smart glasses, in order to cover the 95th percentile of the potential market (see Figure 5.10). Usually, a combination of optical and mechanical adjustments can lead to a large coverage of the IPD range (a large exit eye pupil or eye box). A static system may not address a large enough population.
The eye box is usually referred to as the metric distance over which the user's eye pupil can move in both directions, at the eye relief (or vertex) distance, without losing the edges of the image (display). However, losing the display is quite subjective and involves a combination of resolution, distortion, and illumination uniformity considerations, making it a complex parameter. For obvious aesthetics and wearability reasons, it is desirable to have the thinnest combiner and at the same time the largest eye box or exit pupil (an eye box of 10 mm horizontally and 8 mm vertically is often used as a standard requirement for today's smart glasses).
Interpupillary distance
Adult male (U.S.A.), 5th percentile:     55 mm
Adult male (U.S.A.), 95th percentile:    70 mm
Adult female (U.S.A.), 5th percentile:   53 mm
Adult female (U.S.A.), 95th percentile:  65 mm
Child, low:                              41 mm
Child, high:                             55 mm

FIGURE 5.10 Interpupillary distance (IPD).
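A quick arithmetic sketch shows how the adult IPD range of Figure 5.10 relates to the eye box requirement quoted above. It is only an illustrative check: the 10 mm eye box comes from the text, but the 2 mm mechanical adjustment value is an assumption, not a figure from the chapter.

def covers_population(eyebox_mm=10.0, mech_adjust_mm=2.0,
                      ipd_min_mm=53.0, ipd_max_mm=70.0):
    """Frame centered on the nose: each eye sits (IPD_user - IPD_nominal)/2
    away from the nominal eye position designed for the mean IPD."""
    nominal = (ipd_min_mm + ipd_max_mm) / 2.0
    worst_offset = (ipd_max_mm - nominal) / 2.0        # lateral shift of one eye
    usable_half_width = eyebox_mm / 2.0 + mech_adjust_mm
    return worst_offset <= usable_half_width, worst_offset, usable_half_width

print(covers_population())          # (True, 4.25, 7.0) for the 5th-95th adult range
print(covers_population(4.0, 0.0))  # a small, static eye box falls short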


FIGURE 5.11 Optical combiner thickness as a function of the eye box size for various optical HMD architectures (free-space curved combiner, lightguide with on-axis collimator, waveguide without EPE, and waveguide with EPE), between the aesthetic and mechanical stability constraints and the minimum and full IPD coverage.

Designing a thin optical combiner producing a large eye box is usually not easy: when using conventional free-space optics, the eye box scales with the thickness of the combiner (see, e.g., Figure 5.11), as in most of the architectures presented in the previous section, except for architecture #6 (Figure 5.6g), which is based on waveguide optics using cascaded planar extractors.
For holographic combining and extraction (both free space, Figure 5.6c, and waveguide, Figure 5.6g), various EPE techniques have been investigated to expand the eye box in both directions (Levola, 2006, Urey and Powell, 2005). EPEs are often based on cascaded extractors (conventional optics or holographics) and usually act only in one direction (the horizontal direction). See, for example, Figure 5.9, upper right example (Nokia/Vuzix AR 6000 AR HMD and Microsoft HoloLens), using a diffractive waveguide combiner with both X and Y diffractive waveguide EPEs. However, such complex diffractive structures require subwavelength tilted structures that are difficult to replicate in mass by embossing or injection molding. EPEs can also be implemented in free-space architectures by the use of diffusers or MLAs.
The eye box is also a function of the size of the eye pupil (see Figure 5.12). Typically, a smaller eye pupil (in bright environments) will produce a smaller effective eye box, and a larger eye pupil (in darker environments) will produce a larger eye box. A standard pupil diameter used in industry is usually 4 mm, but it can vary anywhere from 1 to 7 mm depending on the ambient brightness.
The eye box is modeled and measured at the eye relief, the distance from the cornea to the first optical surface of the combiner. If the combiner is integrated within Rx lenses (such as in smart eyewear), the notion of eye relief is then replaced by the notion of vertex distance, the distance from the cornea to the apex of the lens on the eye-side surface. If the combiner is worn with extra lenses (such as in smart glasses), the eye relief remains the distance from the cornea to the exit surface of the combiner, not to the Rx lens.
101

Optics for Smart Glasses, Smart Eyewear, Augmented Reality

Eye box size

Eye box is scaled by eye pupil size

Eye pupil diam


Bright
conditions
Human pupil
23 mm

Low-light
conditions
Human pupil
7 mm

FIGURE 5.12 Eye box size (exit pupil) as a function of the eye pupil diameter.
Eye box scales with eye relief

Design space

Eye box

Non-pupil-forming
architectures
Full IPD coverage
Pupil-forming
architectures

Min IPD coverage


Vertex distance

Aesthetic
constraint

Eye relief/
vertex distance

FIGURE 5.13 Eye box versus eye relief for various optical HMD architectures.

In virtually all HMD configurations, the eye box reduces when the eye relief increases (see Figure 5.13). However, in pupil-forming architectures (refer also to Figure 5.5), the eye box may actually increase until a certain distance (a short distance, usually smaller than the nominal eye relief), and then get smaller. For non-pupil-forming architectures, the eye box reduces as soon as one gets away from the last optical element in the combiner.
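The shrinkage with eye relief for a non-pupil-forming design can be illustrated with a first-order geometric model: if every field angle fills the exit aperture of the magnifier, the region where all field bundles still overlap narrows roughly as aperture minus 2 times eye relief times tan(FOV/2). This is an assumed simplification for illustration, not a design formula from the chapter, and the 30 mm aperture and 40 degree FOV below are arbitrary values.

import math

def eye_box_mm(aperture_mm, eye_relief_mm, diag_fov_deg):
    """First-order eye box of a simple magnifier whose aperture is filled by every field angle."""
    half_fov = math.radians(diag_fov_deg / 2.0)
    return max(0.0, aperture_mm - 2.0 * eye_relief_mm * math.tan(half_fov))

for relief in (10, 15, 20, 25):
    print(relief, round(eye_box_mm(30.0, relief, 40.0), 1))
# 30 mm aperture, 40 deg FOV: the eye box drops from ~22.7 mm at 10 mm relief
# to ~11.8 mm at 25 mm relief.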



FIGURE 5.14 The eye box and FOV share a common space.

The eye box is also shared with the FOV (see Figure 5.14). If the HMD can be switched between various FOVs (by either altering the microdisplay, using a MEMS projector as a display, or changing the position of the microdisplay and the focal length of the magnifier), the eye box may vary from a comfortable eye box (small FOV) to an unacceptable eye box blocking the edges of the image (for a large FOV).
Finally, the effective eye box of a smart glass can be much larger than the real optical eye box when various mechanical adjustments of the combiner are used to match the exit pupil of the combiner to the entrance pupil of the user's eye. However, for any position of the combiner, the eye box has to allow the entire FOV to be seen unaltered, at the target eye relief. It may happen that for a specific position of the combiner, the entire display can be seen indoors (large pupil), but the edges of the display become blurry outside due to the fact that the eye pupil diameter decreases.

5.7 OPTICAL MICRODISPLAYS


The microdisplay is an integral part of any AR or VR HMD or smart eyewear. Various technological platforms have been used, from traditional panels, including transmission LCD displays and reflective liquid crystal on silicon (LCoS) with LED back- or front-light illumination engines, to organic LED (OLED) and inorganic LED panels, to optical scanners such as MEMS or fiber scanners. Reflective LCoS panels are depicted in Figure 5.15.
Traditional illumination engines (see Figure 5.16) range from curved polarization beam splitting (PBS) films (large angular and external bandwidth films; left) to PBS cubes (center) with either free-space LED collimation or a back light, to thin edge-illuminated front lights (right). Although edge-illuminated front lights produce the most compact architecture, they are also the most difficult to implement (front-light illumination layers have also been developed for other display systems, such as the Mirasol MEMS displays by Qualcomm).


FIGURE 5.15 Liquid crystal on silicon microdisplays.

FIGURE 5.16 Illumination engines for LCoS microdisplays.

The efficiency of either LCoS or LCD transmission microdisplays remains low (2%–4% typically). Phase LCoS microdisplay panels can also be used to produce a phase image which, upon coherent laser illumination, will produce an intensity pattern in either the far or the near field (see Figure 5.17a, HoloEye product).
Such phase LCoS panels can be considered as dynamic computer-generated holograms (CGHs) and can therefore implement either Fresnel (near-field) or Fourier (far-field) patterns. Dynamically producing images appearing at different depths without moving any optical element is a very desirable feature in HMDs and smart glasses, providing compensation for visual impairments or various depth cues.
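One classical way to compute such a phase pattern for a far-field (Fourier) CGH is the Gerchberg–Saxton iterative algorithm, sketched below with NumPy. It is offered only as a generic illustration of driving a phase-only panel; it is not the algorithm used by any specific product mentioned in this chapter.

import numpy as np

def gerchberg_saxton(target_intensity, iterations=50, seed=0):
    """Phase-only far-field CGH via the Gerchberg-Saxton iteration."""
    rng = np.random.default_rng(seed)
    target_amp = np.sqrt(target_intensity)
    phase = rng.uniform(0.0, 2.0 * np.pi, target_amp.shape)
    for _ in range(iterations):
        far = np.fft.fft2(np.exp(1j * phase))           # propagate SLM field to the far field
        far = target_amp * np.exp(1j * np.angle(far))   # keep phase, impose target amplitude
        phase = np.angle(np.fft.ifft2(far))             # back-propagate, keep phase only
    return phase                                        # phase map to write on the panel

# Toy target: a bright square in an otherwise dark far field.
target = np.zeros((128, 128))
target[48:80, 48:80] = 1.0
phi = gerchberg_saxton(target)
replay = np.abs(np.fft.fft2(np.exp(1j * phi))) ** 2
print(replay[48:80, 48:80].sum() / replay.sum())  # fraction of energy landing in the square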
In order to integrate them either in an HUD or an HMD system, they have to be
used in combination with a combiner optic (see Figure 5.6a) and an EPE (such as a
diffuser). When using an EPE, an intermediate 2D aerial image has to be produced,
which reduces the attractiveness of that technology.
OLED as well as inorganic LED panels are exciting alternatives to transmission LCD or LCoS microdisplay technologies. Such panels are emissive displays, which do not require an additional backlight (or front light), but they produce a Lambertian illumination, wasting light after the panel. In order to provide efficient light usage and reduce ghosting for most of the architectures described in Figure 5.6a, the emission cone of the microdisplay should remain partially collimated. The directionality of the emission cone is also a desirable variable to control in order to increase the efficiency of the combiner optics.

FIGURE 5.17 (a) Phase LCoS microdisplays (HoloEye PLUTO panel) as dynamic computer-generated holograms for far-field or near-field display; (b) bidirectional OLED panel used in an HMD (Fraunhofer Institute).

Directionality of the emission cone is also a desirable variable to control in order to increase the efficiency of the combiner optics. OLED panels have been used to implement AR HMDs. Such OLED panels can also be used as bidirectional panels, integrating a detector in the same plane and thus enabling easy eye gaze tracking (see Figure 5.17b).
Curved panels, such as those enabled by OLED technology, might help relax the constraints on the optics design, especially in off-axis mode or in large-FOV VR headsets (see Figure 5.18).
Instead of working with a static (planar) object plane, the object plane can then be used as a degree of freedom in the global optimization process of the combiner optics. Today, most OLED microdisplay panels have a silicon backplane, a requirement for the high pixel density (pixels smaller than 10 μm). Because of this pixel density, they use color filters on a single OLED material instead of directly patterned OLED pixels, which reduces the efficiency of the panel. R&D in most OLED companies focuses on removing such color filters and directly patterning OLED pixels at sub-10-μm size.

FIGURE 5.18 Curved OLED panels can relax the design constraints for HMD combiner optics.

(Figure panels: PicoP display engine with MEMS scanning mirror and R, G, B laser layers; combiner optics; projected image; mobile device with embedded PicoP.)

FIGURE 5.19 MEMS micromirror laser sources for Pico projectors and HMDs.

MEMS micromirror laser scanners are desirable image generators for HMDs (since the laser beams are already collimated; see Figure 5.19), but they cannot produce an acceptable eye box without forming an intermediate aerial image plane to build up the eye box (optional diffusion at this image plane may also create parasitic speckle when used with laser illumination). Most scanners use laser light, which can produce speckle (which then has to be removed by an additional despeckler device). However, if there is no diffuser in the optical path, speckle should not appear to the eye.
An alternative to the micromirror MEMS scanner is the vibrating piezo fiber scanner (such as in Magic Leap's AR HMD). Such a fiber scanner can be used either as an image-producing device, and thus integrated in an HMD system (see Figure 5.20), or as a digital image sensor (in reverse mode). These devices are very small, and the laser or LED source can be located away from the fiber end tip (unlike in a MEMS scanner, which operates in free space), making them ideal candidates for HMD pattern generators. Furthermore, as the fiber can also be used in reverse mode, eye gaze tracking can be integrated in a bidirectional scheme, such as in the bidirectional OLED device in Figure 5.17.
One of the advantages of both MEMS and fiber scanners is that the effective FOV can be rescaled and/or relocated in real time without losing efficiency (provided the combiner optic can still do a decent imaging job at such varying angles).

(Figure labels: head strap, brightness control, video camera and IR LEDs, scanning fiber display tube.)

FIGURE 5.20 A vibrating piezo fiber tip produces an image and is integrated in an HMD via a free-space optical combiner.

TABLE 5.3
Microdisplays and Image Generators Used Today in HMD/Smart Glass Devices

Device                        Type          Display              Resolution                          Aspect Ratio
Google Glass                  See-through   LCoS                 640 × 360                           16:9
Vuzix M100                    Opaque        LCD                  400 × 240                           16:9
Epson Moverio                 See-through   LCoS                 960 × 540                           16:9
Oculus Rift DK1               Opaque        LCD                  1280 × 800                          1.25:1
Oculus Rift DK2               Opaque        OLED                 1920 × 1080                         16:9
Silicon microdisplay ST1080   Opaque        LCoS                 1920 × 1080                         16:9
Zeiss Cinemizer               Opaque        OLED                 870 × 500                           1.74:1
Sony HMZ T3                   Opaque        OLED                 1280 × 720                          16:9
Sony Morpheus                 Opaque        OLED                 1920 × 1080                         4:3
Optinvent ORA                 See-through   LCD                  640 × 480                           4:3
Lumus DK40                    See-through   LCoS                 640 × 480                           Can vary
Composyt Labs                 See-through   Laser MEMS scanner   Res. and FOV can vary dynamically   Can vary
Vuzix/Nokia M2000AR           See-through   Laser MEMS scanner   Res. and FOV can vary dynamically   Can vary

This is possible to a certain degree in emissive panels, but not in LCD or LCoS panels, in which the backlight always illuminates the entire display.
Table 5.3 summarizes the various image sources used in some of the current HMD, VR, and smart glass offerings. All of the main image generation technologies are in use today (LCD, LCoS, OLED, MEMS).

5.8 SMART EYEWEAR


The combination of an optical combiner with prescription glasses or plano sunshades is crucial for adoption by consumers. The integration of Rx lenses providing spherical, cylindrical, and prism compensations is a de facto requirement for future smart eyewear (see Figure 5.21).
In order to compensate for simple nearsighted and farsighted vision, many HMD manufacturers allow the display position to be adjusted relative to the magnifier, producing an image that appears at various depths. However, a mechanical adjustment might not be suitable for smart eyewear.
It is a difficult optical exercise if one desires to have a single optical element implementing both functionalities (optical prescription, including cylinder, and optical combining) without any complex mechanical adjustments. Most smart glasses available today use a combination of glasses and optical combiner (two physical optical elements). The glasses can be located either before or after the optical combiner, producing specific constraints for the user's adaptation to the display (see Figure 5.22).
The most straightforward way to combine Rx lenses and a combiner is to place the Rx lens in between the eye and the combiner (case 1.a in Figure 5.22). While this might be sufficient to correct for nearsightedness, it does not produce an acceptable viewing experience for farsightedness.
Furthermore, as the only acceptable shape for an Rx or plano lens is a meniscus (for aesthetic as well as size reasons; see Figure 5.23), planar, plano-convex, concave-plano, or convex-convex lenses may not be used (such lens shapes would, however, allow for easier integration of a flat optical combiner). Integrating a flat optical element inside a curved meniscus will produce a thick lens. While addressing visual impairment issues, it might not be acceptable for aesthetic and weight reasons for the consumer market.
The requirements for medical eyewear are very different from those for consumer eyewear and may allow for much thicker eyewear implementing a large eye box combiner within a thick meniscus lens providing an adequate eyewear prescription (as in the Essilor/Lumus smart glasses providing relief for age-related macular degeneration, or AMD).
Integrating the optical combiner inside a conventional Rx meniscus lens is thus a complex challenge, which may not be fulfilled by using conventional optical elements, and may require more complex optics, such as segmented optics, micro-optics, holographic, or diffractive structures.
Figure 5.24 shows some of the current products that combine two different optical elements, a combiner optic and an Rx lens. The Rx lens is worn between the eye and the combiner (producing an effective correction for nearsightedness for both the world and the digital display, but lacking compensation for farsightedness for the digital display).

FIGURE 5.21 Prescription compensation in combiner optics for smart eyewear to address a large potential consumer market. (The figure shows a sample eyeglass prescription and its effect on nearsightedness (myopia) and farsightedness (hyperopia), uncorrected and corrected with lenses: O.D. and O.S. denote the right and left eye; SPHERE gives the spherical error, with + indicating a farsighted and - a nearsighted prescription, the higher the number the stronger the prescription; CYL and AXIS describe the severity and orientation of any astigmatism; ADD is the value added to the sphere prescription to obtain the near-vision prescription for bifocals.)


1. Rx lens UNDER the combiner
   a. Combiner independent of the Rx lens (1.a)
   b. Combiner shares the outer shape of the Rx lens (1.b)
   Generic uncompensated combiner, but problematic for farsightedness (hyperopia)
2. Rx lens OVER the combiner
   a. Combiner independent of the Rx lens (2.a)
   b. Combiner shares the inner shape of the Rx lens (2.b)
   Combiner to be compensated for nearsightedness (myopia)
3. Combiner INSIDE the Rx lens
   a. Flat or curved combiner requiring TIR and an injection port (3.a)
   b. Flat/curved combiner (no TIR + injection port) (3.b)
   Combiner to be compensated for both myopia and hyperopia

FIGURE 5.22 Integration of optical combiner and prescription lenses.

(Figure panels: positive lenses, all +4 D, and negative lenses, all -4 D, realized in meniscus and other shapes.)

FIGURE 5.23 The only acceptable lens shape for smart eyewear is a meniscus.

(a)

(b)

FIGURE 5.24 Available prescription lens implementations in smart glasses: (a) Google
Glass and (b) Lumus Ltd.

5.9 EXAMPLES OF CURRENT INDUSTRIAL IMPLEMENTATIONS


We review in this section some of the current offerings available on the market for
connected glasses, smart glasses, smart eyewear, AR, and VR headsets.

5.9.1 Display-Less Connected Glasses


These devices include no display, but a high-resolution camera with Bluetooth and/or WiFi connectivity (see Figure 5.25a).

(Products shown include Ion Smart Glasses with a single LED alert light, Geco eyewear, Mita Mamma eyewear, Fun-Iki eyewear, and a Life Logger Bluetooth/WiFi camera headset.)
FIGURE 5.25 (a) Connected glasses available on the market;

(Products shown include devices from MicroOptical Corp. and MyVu Corp., the Vuzix STAR 1200 augmented reality system, Google Glass (Google, Mountain View), OmniVision (Santa Clara), Laster SARL (France), and ChipSiP and Rockchip (Taiwan).)
FIGURE 5.25 (Continued) (b) Occlusion smart glasses available on the market; (c) see-through smart glasses available on the market (Google Glass and various copy cats, plus the Laster SARL product); (d) pseudo see-through tapered combiner device from Olympus;

(Panel (e) shows VR headsets from Oculus, Sony, and the Silicon microdisplay ST1080. Panel (f) illustrates how barrel distortion applied in-engine and pin-cushion distortion from the Rift lenses combine into an undistorted final observed image. Panel (g) plots the number of retinal receptors per square millimeter, rods and cones, against eccentricity angle in degrees, including the blind spot.)
FIGURE 5.25 (Continued) (e) Some of the main VR headsets available today on the market; (f) Oculus latency sensor (left) and optical distortion compensation (right) through software in VR; (g) foveated rendering in high-pixel-count VR systems, linked to gaze tracking;


(Panel (h) compares a real 3D scene, in which the vergence distance and the focus distance coincide at the real object location, with a stereoscopic scene in a VR headset, in which the eyes converge on the apparent object location while focusing on the virtual image of the screen through the VR magnifier lenses. Panel (i) shows how vergence is managed through the IPD (interpupillary distance), the IOD (interocular distance), and the ISD (interscreen distance) between the left and right displays. Panel (j) shows consumer AR systems from Optinvent (France, microprisms), Epson Ltd. (Japan, lightguide), Lumus Ltd. (Israel, cascaded mirrors), and Sony Ltd. (Japan, holographic).)
FIGURE 5.25 (Continued) (h) Focus/vergence accommodation disparity in stereoscopic VR systems; (i) managing the eye convergence in stereoscopic-based VR headsets; (j) consumer AR systems available on the market;

(k)

(l)

FIGURE 5.25 (Continued) (k) specialized AR headsets for law enforcement, firefighting,
and engineering; and (l) specialized AR headsets for medical and surgical environments.

5.9.2 Immersion Display Smart Glasses

Such immersive head-worn displays were the first consumer electronic HMDs available. Some of them (MicroOptical and Vuzix) have been available since the end of the 1990s as personal video players (no connectivity or camera). More recent designs include both camera and connectivity (Bluetooth and WiFi) as well as an operating system such as Android (e.g., the monocular Vuzix M100 smart glass in Figure 5.25b, upper left).

5.9.3 See-Through Smart Glasses

See-through smart glasses combine the characteristics of the previous monocular occlusion smart glasses with small FOV and those of the higher-end AR see-through HMDs, in a smaller package, with the additional feature of Rx lenses. An early example is Google Glass, with a few copy cats (ChipSiP, Rockchip, MicroVision), and other designs such as Laster, in which the combiner is located vertically (see Figure 5.25c). An interesting alternative optical architecture has been developed by Olympus and others (Vuzix Pupil, Sony Smart Glasses, Telepathy One); see Figure 5.25d. The combiner is here an occlusion tapered combiner using a 45° mirror, but the end tip of the combiner is smaller than the usual 4 mm diameter of the eye pupil, making it pseudo-transparent to the far field (much like when one brings a knife edge close to the eye and can see through the edge).
Such see-through smart glasses or smart eyewear are, however, not AR systems: their limited FOV (limited mainly by the size of the optics) and the angular offset of that FOV make them best suited for contextual display applications rather than for true AR applications. AR headset systems require a larger FOV centered on the user's line of sight (see also Section 5.9.5).

5.9.4 Consumer Immersion VR Headsets


VR headsets seem to be a phoenix rising from the ashes of the defunct 1990s VR wave. In fact, although the external aspect and the optics remain similar, the performance is very different, in the content, in the resolution of the display (1080p), and especially in the sensors (gyro, accelerometer, magnetometer) and their latency to display refresh (<20 ms). Figure 5.25e shows the main contenders (Samsung recently disclosed a high-resolution VR headset with Oculus, and Zeiss developed the VR One).
Low latency (<20 ms) and software compensation for optical distortion and lateral chromatic spread are some of the key components of an effective VR system such as the Oculus Rift DK2 (see Figure 5.25f).
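As a rough illustration of how such software distortion compensation works, the sketch below applies a radial scaling to the texture-lookup coordinate of each output pixel so that the rendered frame is barrel-compressed before the magnifier's pincushion distortion expands it again. The function names and coefficient values are illustrative assumptions, not the actual Oculus SDK shader.

import numpy as np

def distortion_scale(r2, k=(1.0, 0.22, 0.24, 0.0)):
    """Radial scale factor k0 + k1*r^2 + k2*r^4 + k3*r^6 applied to the
    texture lookup coordinate (illustrative coefficients only)."""
    return k[0] + k[1] * r2 + k[2] * r2**2 + k[3] * r2**3

def predistort_coords(xy):
    """For each output pixel (normalized coordinates, lens center at 0,0),
    return the coordinate at which the rendered frame should be sampled.
    Sampling farther out shrinks the displayed image toward the center
    (barrel), which the pincushion of the magnifier lens then undoes."""
    r2 = np.sum(xy * xy, axis=-1, keepdims=True)
    return xy * distortion_scale(r2)

# Example: where does a pixel halfway to the edge sample from?
print(predistort_coords(np.array([[0.5, 0.0]])))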
Foveated rendering (see Figure 5.25g) is another key technique, which will help alleviate the computing and bandwidth requirements of next-generation VR systems having very high pixel counts. Eye gaze tracking (see also Section 5.11) is a key component for true foveated rendering in VR systems.
There are many problems on the road to the perfect VR system: some of them are related simply to technology, some to the market, and some to the sickness such systems induce in some people.
VR (as well as AR) sickness occurs at various levels and was partly responsible for the flop of the VR market in the 1990s.
Dense pixel counts and very low latency (<20 ms) have enabled a much smoother experience and thus reduce the sickness related to these factors, especially latency. This part is purely technology related (more pixels, better sensors, faster electronics, smoother transitions, shorter pixel latency). However, other VR-induced sickness issues are more related to the architecture, such as the focus/convergence disparity (see Figure 5.25h) in stereoscopic displays such as those used in most VR systems today.
In a true 3D experience, the eye's focus accommodation does not conflict with the true location of the object and thus yields the correct eye convergence (also called vergence). In a stereoscopic display, such as most VR systems today, the eye focus accommodation conflicts with the eye vergence, producing the sickness sensation for the user. Solving the focus/vergence disparity in conventional VR systems (based on stereoscopic imaging) is a complex task. The focus/vergence disparity is a function of the IPD (interpupillary distance), the ISD (interscreen distance), and the

IOD (interocular distance); see Figure 5.25i. Adjusting the screens and/or the lenses, or using dynamic lenses, can provide a potential solution.
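The magnitude of the conflict can be illustrated with a short numerical sketch (all values are hypothetical): the magnifier lenses fix the focus distance at the virtual image plane, while the stereo disparity drives vergence to a different distance, and the mismatch is commonly expressed in diopters.

import numpy as np

# Illustrative numbers only: a virtual image focused at 2.0 m by the
# magnifier lenses, while the stereo content places an object at 0.5 m.
ipd = 0.063              # interpupillary distance (m)
focus_distance = 2.0     # distance of the virtual image plane (m)
vergence_distance = 0.5  # distance implied by the stereo disparity (m)

# Total vergence angle the two eyes must form to fixate the virtual object.
vergence_angle = 2 * np.arctan((ipd / 2) / vergence_distance)

# Accommodation/vergence mismatch in diopters (1/m), the usual metric.
conflict_diopters = abs(1 / vergence_distance - 1 / focus_distance)

print(f"vergence angle: {np.degrees(vergence_angle):.1f} deg")
print(f"focus/vergence conflict: {conflict_diopters:.2f} D")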
Another way to reduce such disparity in VR and AR systems (and therefore the associated sickness) is to use non-stereoscopic 3D display techniques such as dynamic holographic displays (using Fresnel diffractive patterns producing virtual images at specific distances). Light field displays, such as the ones developed by NVIDIA (Douglas Lanman) or Magic Leap Inc., are another elegant way to produce a true 3D representation accommodating both eye vergence and focus. However, light field display techniques are still in their infancy and require complex rendering computation power as well as higher display resolutions (constraints similar to those of light field imaging, as in the Lytro camera system).

5.9.5 Consumer AR (See-Through) Headsets


AR headsets have been developed for a few decades, especially for the defense market. Consumer AR systems have recently been introduced, under a vast variety of different optical architectures, with FOVs around 30°-40° diagonally, such as the products shown in Figure 5.25j.
The upper left example in Figure 5.6g (Epson Moverio BT-200) is a single half-tone mirror lightguide combiner, relatively thick (10 mm). All three other configurations (Optinvent ORA, upper right, using a microprism array; Lumus, lower left, using cascaded dichroic mirrors; and Sony Smart Glass, lower right, using a holographic waveguide extractor) are waveguide combiners with 1D EPEs, as described also in Figure 5.6g. Microsoft's HoloLens AR combiner is similar to the Nokia/Vuzix diffractive combiner. Such waveguide extractors have the benefit of producing a large horizontal eye box (due to the EPE effect of the linear cascaded combiners) in a relatively thin lightguide (<3 mm).
Due to their moderate FOV (30°-40°) and the position of the virtual image directly in the line of sight, they are good candidates for true AR applications. Although these AR devices are integrated in glass frames, they do not integrate any Rx lenses, the combiners having the annoying requirement of being perfectly flat to function efficiently. Integrating Rx lenses on top of such combiners is a difficult task and often produces clunky-looking glasses with unorthodox lens curvatures. The simplest solution is again to introduce standard Rx lenses in between the combiner and the eye (as described in Section 5.8).
For a true AR experience, it is also important to occlude reality where a digital image is displayed in the field, so that this image appears as a real object and not as a transparency over the field. Integrating such occlusion pixels in AR systems can be done, for example, by using a pupil-forming architecture with an LCD shutter at the image plane. Most of the AR systems today (see Figure 5.25j as well as the figures in the next section) do not implement any occlusion displays, simply because it is very difficult to do with traditional optics.
One (simple) way to implement occlusion pixels in AR systems is to take a video of reality and digitally superimpose onto the video feed any virtual objects, viewed in a conventional VR environment (opaque screen), therefore creating the illusion of seamless integration of reality and virtuality. However, this is of course not true AR.
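The video see-through approach just described boils down to standard alpha compositing of the rendered virtual layer over the camera feed. The minimal sketch below, using synthetic frames, shows why full occlusion is trivial in this scheme: an opaque virtual pixel simply replaces the camera pixel, which is exactly what an optical see-through combiner without occlusion pixels cannot do.

import numpy as np

def composite(camera_frame, virtual_rgba):
    """Alpha-composite a rendered virtual layer (RGBA, floats in 0..1)
    over a camera frame (RGB, floats in 0..1)."""
    rgb, alpha = virtual_rgba[..., :3], virtual_rgba[..., 3:4]
    return alpha * rgb + (1.0 - alpha) * camera_frame

# Tiny synthetic example: a 2x2 gray camera frame and a sparse overlay.
camera = np.full((2, 2, 3), 0.5)
overlay = np.zeros((2, 2, 4))
overlay[0, 0] = [1.0, 0.0, 0.0, 1.0]   # fully opaque red "virtual" pixel
overlay[1, 1] = [0.0, 1.0, 0.0, 0.5]   # semi-transparent green pixel
print(composite(camera, overlay))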

5.9.6 Specialized AR Headsets
A few industrial headsets have been developed for specialized markets such as law enforcement, firefighting, and engineering (Hua and Javidi, 2014; Wilson et al., 2005), on the basis of an opaque combiner located in the lower part of the visual field. Such devices are not consumer devices and have a price tag only accessible to professional markets (>$5K). Figure 5.25k shows some of these devices (Motorola HC1, Kopin-Verizon Golden Eye, and Canon).
They might also include specialized gear such as a FLIR camera and special communication channels. Interestingly, these devices are usually built around the architecture described in Figure 5.6e (non-see-through freeform TIR combiner).
Other specialized markets include medical applications, for patient record viewing or assistance in surgery (monitoring the patient's vital signs during surgery, recording surgery for teaching purposes, etc.). Figure 5.25l shows some implementations based on an Epson Moverio see-through AR headset (also shown in Figure 5.6f) (left), a Motorola HC1 (center), and Google Glass (right).
Medical applications have been a perfect niche for AR and smart glasses, long before the current boom in consumer smart glasses and VR headsets, and will remain a relatively small but steady market for such devices in the future.

5.10 OTHER OPTICAL ARCHITECTURES DEVELOPED IN INDUSTRY


We have reviewed in Section 5.7 the various optical architectures developed in industry to produce most of the devices described in the previous section. There are, however, a few other, more exotic optical implementations that are based on nonconventional combiner architectures. We review these in the following sections.

5.10.1 Contact Lens-Based HMD Systems


As contact lenses are the natural extension of any Rx lens, could contact lenses therefore be the natural extension of smart glasses? This is not so easy: a smart contact lens providing a high-resolution display to the user is still a challenge to be undertaken. That said, contact lenses have already been used to implement various functionalities, such as a telescopic contact lens (Prof. Joe Ford, UCSD, 2012) and the glucose-sensing and insulin-delivery contact lens at Google [X] labs in Mountain View.
There are, however, a few systems developed in research labs that have introduced the use of a contact lens. One example is the single-pixel display contact lens developed by Prof. Babak Parviz at the University of Washington; another is the Innovega HMD based on a dual contact lens/smart glass device (see Figure 5.26).
The Innovega HMD system relies on a contact lens that has an added microlens surface at its center to collimate the digital display field. The orthogonal polarization coating filters on the microlens and on the rest of the contact lens provide an effective way to collimate the polarized display light originating from a microdisplay without altering the see-through experience other than in polarization. In one implementation,

(Figure labels: combo contact lens and glasses (Innovega, U.S.A.); standard OLED or LCD display panel; video/audio IC; eyeglasses; contact lens with embedded optics; outer filter; center filter and display lens; optical filter conditions the display light.)

FIGURE 5.26 Dual smart glass/contact lens see-through HMD from Innovega.

the microdisplay is located on the temple, with a relay lens forming an image on a reflective holographic combiner located at the inner surface of the glass lens, providing a true see-through experience. In another implementation, the microdisplay is located directly on the inner surface of the glass lens, making it a non-see-through HMD. Because the collimation lens is located close to the cornea and actually moves with the eye, this approach provides an excellent eye box and a large FOV.

5.10.2 Light Field See-Through Wearable Displays

Similarly to light field cameras (such as the Lytro integral camera), light field techniques can also provide effective display functionality (Hua and Javidi, 2014). The example in Figure 5.27 produces a light field display from a display panel and an array of microlenses used as the collimator.
Multiple images are presented to the viewer under various angles and combine into a single image at a particular eye relief distance. This particular architecture is non-see-through.

5.11 OPTICS FOR INPUT INTERFACES


The good old keyboard and mouse, along with other devices such as pen tablets and touchscreens, have been implemented over the years as very effective input technologies for consumer electronics (computers, laptops, tablets, smartphones, or smart watches).
For a head-worn computer such as smart glasses and AR/VR HMDs, other types of input mechanisms have been developed. They can be summarized into six different groups: voice control, trackpad input, head and eye gesture sensors, eye gaze tracking, hand gesture sensing, and other miscellaneous sensing techniques.

(Figure panels: bare microdisplay, near-eye light field display, and a close-up photo of the displayed image.)

FIGURE 5.27 Light field occlusion HMD from NVidia. (From Dr. Douglas Lanman at
NVIDIA, Santa Clara, CA.)

5.11.1 Voice Control


Voice control for HMDs has been implemented directly as an extension of similar technologies available to the smartphone industry. Although the microphone(s) are always close to the mouth in an HMD, they may require quiet surroundings to function properly.

5.11.2 Input via Trackpad

Similarly to voice control, a trackpad, often located on the temple side of a smart glass, can provide an effective input mechanism. Although it might be awkward to touch one's temple to control the device, it is very effective under the right operating conditions (hand available, fingers clean, no gloves, no rain, no sweat, etc.).

5.11.3 Head and Eye Gesture Sensors

Thanks to the integrated sensors available in today's smartphones (magnetometers, accelerometers, gyroscopes), the HMD industry was an early adopter of head gesture sensors (even in the early defense HMD years). Eye gesture sensors, such as wink and blink detectors, have also been integrated without the need for an imaging sensor (using a single-pixel detector).

5.11.4 Eye Gaze Tracking

Various optical gaze tracking techniques have been implemented in industry, not only in HMDs but also in older camcorder devices. Technologies range from conventional flood-IR CMOS imaging of glints to scanning via waveguides (Curatu et al., 2005), using dedicated optical paths or the reverse imaging path, with single or multiple IR flood sources or structured laser illumination. Figure 5.28 shows the four successive Purkinje images used in conventional gaze tracking applications.


FIGURE 5.28 The four Purkinje images (P1, P2, P3, and P4) used for gaze tracking (the glint is the first Purkinje image, reflected off the outer surface of the cornea: P1).

The first reflection (the glint) is usually the one most used (see also Figure 5.29). Single or multiple IR flood sources may be used to increase the resolution of the tracking (e.g., four IR sources located symmetrically to produce good vertical and horizontal gaze tracking).
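As an illustration of how glint imaging is typically turned into a gaze estimate, the sketch below fits a second-order polynomial mapping from the pupil-center-minus-glint vector to gaze coordinates using a short calibration sequence. This is a generic pupil-center/corneal-reflection style approach with synthetic data, not the method of any specific product discussed here.

import numpy as np

def fit_gaze_map(pupil_minus_glint, targets):
    """Fit a 2nd-order polynomial mapping from the pupil-center-minus-glint
    vector (x, y) to gaze coordinates, using known fixation targets."""
    x, y = pupil_minus_glint[:, 0], pupil_minus_glint[:, 1]
    A = np.column_stack([np.ones_like(x), x, y, x * y, x**2, y**2])
    coeffs, *_ = np.linalg.lstsq(A, targets, rcond=None)
    return coeffs

def apply_gaze_map(coeffs, v):
    """Map one pupil-minus-glint vector v = (x, y) to gaze coordinates."""
    x, y = v
    return np.array([1.0, x, y, x * y, x**2, y**2]) @ coeffs

# Synthetic calibration data: nine fixation points and a smooth ground truth.
rng = np.random.default_rng(0)
vecs = rng.uniform(-1, 1, size=(9, 2))
targets = vecs * 10 + 0.1 * vecs**2
coeffs = fit_gaze_map(vecs, targets)
print(apply_gaze_map(coeffs, vecs[0]), targets[0])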
In many combiner architectures, it is desirable to use the same optical path for the display as for the glint imaging. However, as the display uses an infinite conjugate (image at infinity) and the gaze tracking uses finite conjugates, it is necessary either to use an additional objective lens on the IR CMOS sensor or to position the source outside the display imaging train (see Figure 5.30). A cold (or hot) mirror allows for the splitting between the finite and infinite conjugates.
More complex optical architectures have been developed to implement the finite conjugate IR imaging task within the infinite conjugate imaging task depicted in Figure 5.30. One of them is based on the optical architecture described in Figure 5.6e (freeform TIR combiner with compensation optic). The freeform surfaces in the combiner optic, combined with an extra objective lens on the IR CMOS sensor, allow for a very compact gaze tracking architecture including both IR flood illumination and IR imaging (see Figure 5.31).

Glints

FIGURE 5.29 The first Purkinje reflections (glints) used in eye gaze tracking.

(Figure panels: microdisplay and combiner lens operating at an infinite conjugate; IR source and IR eye gaze sensor operating at a finite conjugate through the combiner lens and an additional lens.)

FIGURE 5.30 Finite (gaze tracking) and infinite (collimation of the display) conjugates in
HMDs.
(Figure labels: NIR sensor, NIR LED, microdisplay, exit pupil, freeform prism, freeform corrector, and freeform combiner with a complementary piece for see-through operation.)

FIGURE 5.31 Eye gaze tracking in a see-through combiner based on freeform TIR surfaces.

This device has been developed by Canon (Japan). It is, however, relatively thick and therefore not well adapted to consumer smart eyewear, but rather to the professional AR market.

5.11.5 Hand Gesture Sensing


Hand gesture sensing has been implemented in various HMDs as add-ons borrowed
from gesture-sensing applications developed for traditional gaming consoles, TVs,
and computers (see Figure 5.32).

FIGURE 5.32 Some of the current hand gesture sensor developers for gaming, TV, computer, and HMDs.

(a)

(b)

(c)

FIGURE 5.33 Gesture sensors integrated as add-ons to HMDs. (a) Oculus DK2 and Leap
Motion sensor, (b) Oculus Rift and Soft Kinetic, and (c) Soft Kinetic and Meta Glasses.

A few of these technologies have been applied to HMDs, especially VR headsets, such as flood IR illumination and shadow imaging (Leap Motion), time-of-flight sensing (Soft Kinetic), and structured IR illumination (PrimeSense Kinect); see Figure 5.33. All three architectures have been tested with VR systems such as the Oculus Rift DK2 and are now commercially available to the public.

5.11.6 Other Sensing Technologies

A few other, sometimes exotic, input mechanisms have been developed and tested for HMD and smart glass input, such as brain wave sensing, body tapping sensing, etc.

5.12 CONCLUSION
We have reviewed in this chapter the main optical architectures used today to implement smart glasses, smart eyewear, see-through AR HMDs, and occlusion VR HMDs. These various optical architectures produce different size, weight, and functionality constraints for devices that are applied to new emerging markets, stemming from the original defense and VR markets as we have known them for decades. Adapted sensing, such as gaze tracking and hand/eye gesture sensing, is being implemented in various HMDs, especially VR headsets, stemming from the earlier efforts produced for conventional gaming devices. For AR devices, the wide-FOV thin see-through optical combiner remains a challenge, as does the use of optical occlusion pixels; for VR occlusion devices, the challenge resides in reducing the thickness, size, and

weight of the magnifier lens while increasing its performance over a very large FOV. VR sickness in many users also needs to be addressed, for both VR and large-FOV AR systems: although display resolution (pixel count vs. FOV), low sensor/display latency, high GPU speed, etc., have all been dramatically improved since the first VR efforts in the 1990s, many VR sickness issues remain to be addressed today, such as the eye convergence/focus accommodation disparity and motion sickness (inner ear vs. perceived motion disparity).
For consumer products such as smart glasses and smart eyewear, the integration of the optical combiner into high-wrap sunglass-type lenses or true Rx meniscus lenses is one of the most desired features for next-generation smart eyewear and is also one of the most difficult optical challenges.

REFERENCES
Cakmakci, O. and J. Rolland, Head-worn displays: A review, Journal of Display Technology, 2(3), 199-216, 2006.
Curatu, C., H. Hua, and J. Rolland, Projection based head mounted display with eye tracking capabilities, Proceedings of SPIE, 5875, 58750J-1, 2005.
Hua, H., D. Cheng, Y. Wang, and S. Liu, Near-eye displays: State-of-the-art and emerging technologies, Proceedings of SPIE, 7690, 769009, 2010.
Hua, H. and B. Javidi, A 3D integral imaging optical see-through head-mounted display, Optics Express, 22(11), 13484-13491, 2014.
Kress, B. and P. Meyrueis, Applied Digital Optics, from Micro-Optics to Nano-Photonics, John Wiley Publisher, Chichester, UK, 2009.
Kress, B., V. Raulot, and P. Meyrueis, Digital combiner achieves low cost and high reliability for head-up display applications, SPIE Newsroom Illumination and Displays, Bellingham, WA, 2009, http://spie.org/x35062.xml.
Levola, T., Diffractive optics for virtual displays, Journal of the SID (Society for Information Display), 14(5), 467-475, 2006.
Martins, R. et al., Projection based head mounted displays for wearable computers, Proceedings of SPIE, 5442, 104-110, 2004.
Melzer, J. and K. Moffitt, Head Mounted Displays: Designing for the User, McGraw-Hill, 1997, reprinted in 2011.
Peli, E., The visual effects of head-mounted display are not distinguishable from those of desk-top computer display, Vision Research, 38, 2053-2066, 1998.
Rash, C.E., Head mounted display: Design issues for rotary-wing aircraft. United States Army Aeromedical Research Laboratory, US Government Printing Office, Washington, DC, 1999.
Takahashi, H. and S. Hiroka, Stereoscopic see-through retinal projection head-mounted display, Proceedings of SPIE-IS&T Electronic Imaging, 6803, 68031N, 2008.
Talha, M.M., Y. Wang, D. Cheng, and J. Chang, Design of a compact wide field of view HMD optics system using freeform surfaces, Proceedings of SPIE, 6624, 662416-1, 2008.
Thomson CSF, US patent #5,076,664 of December 31, 1991 (Thomson CSF, France).
Urey, H. and K.D. Powell, Microlens array based exit pupil expander for full color displays, Applied Optics, 44(23), 4930-4936, 2005.
Velger, M., Helmet Mounted Displays and Sights, Artech House Publisher, Boston, MA, 1998.
Wilson, J. et al., Design of monocular head mounted displays for increased indoor firefighting safety and efficiency, Proceedings of SPIE, 5800, 103-114, 2005.
Wilson, J. and P. Wright, Design of monocular head-mounted displays, with a case study on fire-fighting, Proceedings of IMechE, Part C: Journal of Mechanical Engineering Science, vol. 221, 2007.

Image-Based Geometric
Registration for Zoomable
Cameras Using
Precalibrated Information
Takafumi Taketomi

CONTENTS
6.1 Background.................................................................................................... 126
6.2 Literature Review.......................................................................................... 126
6.3 Marker-Based Camera Parameter Estimation............................................... 127
6.4 Camera Pose Estimation for Zoomable Cameras.......................................... 129
6.4.1 Parameterization of Intrinsic Camera Parameter Change................. 130
6.4.1.1 Camera Calibration for Each Zoom Value......................... 130
6.4.1.2 Intrinsic Camera Parameter Change Expression Using
Zoom Variable.................................................................... 130
6.4.2 Monocular Camera Case................................................................... 131
6.4.2.1 Definition of Energy Function............................................ 131
6.4.2.2 Energy Term for Epipolar Constraint................................. 131
6.4.2.3 Energy Term for Fiducial Marker Corners......................... 132
6.4.2.4 Energy Term for Continuity of Zoom Value....................... 133
6.4.2.5 Balancing Energy Terms..................................................... 133
6.4.2.6 Camera Pose Estimation by Energy Minimization............ 134
6.4.3 Stereo Camera Case........................................................................... 134
6.4.3.1 Camera Calibration for Camera pair.................................. 135
6.4.3.2 Geometric Model for Stereo Camera Considering
Optical Zoom Lens Movement........................................... 135
6.4.3.3 Camera Pose Estimation Using Zoom Camera and
Base Camera Pair................................................................ 136
6.5 Camera Parameter Estimation Results.......................................................... 137
6.5.1 Camera Calibration Result................................................................ 137
6.5.2 Quantitative Evaluation in Simulated Environment.......................... 139
6.5.2.1 Free Camera Motion........................................................... 140
6.5.2.2 Straight Camera Motion..................................................... 143
6.5.3 Qualitative Evaluation in Real Environment..................................... 145
6.6 Summary....................................................................................................... 147
References............................................................................................................... 148

6.1 BACKGROUND
In video see-through-based augmented reality (AR), estimating camera parameters is important for achieving geometric registration between the real and virtual worlds. In general, extrinsic camera parameters (rotation and translation) are estimated by assuming fixed intrinsic camera parameters (focal length, aspect ratio, principal point, and radial distortion). Early augmented reality applications assumed the use of head-mounted displays (HMDs) for displaying augmented reality images to users. When using HMDs, changes in intrinsic camera parameters such as camera zooming are avoided to prevent unnatural sensations in users; thus, assuming fixed intrinsic camera parameters is not a problem in conventional augmented reality applications. In contrast, mobile augmented reality applications that use smartphones and tablet PCs have been widely developed in recent years. In addition, augmented reality technology is often used in TV programs. In these applications, changes in the intrinsic camera parameters hardly give unnatural sensations. However, most augmented reality applications still assume fixed intrinsic camera parameters. This is due to the difficulty of estimating extrinsic and intrinsic camera parameters simultaneously. Removing the limitation of fixed intrinsic camera parameters in camera parameter estimation opens possibilities for many augmented reality applications.
In this chapter, two methods are introduced to estimate camera parameters while the intrinsic camera parameters change: estimation using a monocular camera and estimation using a stereo camera. More specifically, we focus on marker-based camera parameter estimation because this approach is widely used in augmented reality applications. Both methods are extended versions of the marker-based camera parameter estimation method.
The remainder of this chapter is organized as follows. In Section 6.2, related work is briefly reviewed. Section 6.3 introduces general marker-based camera parameter estimation. The framework for estimating intrinsic and extrinsic camera parameters using precalibrated information is described in Section 6.4, and its effectiveness is quantitatively and qualitatively evaluated in Section 6.5. Finally, Section 6.6 presents the conclusion and future work.

6.2 LITERATURE REVIEW


Many vision-based methods for estimating camera parameters have already been proposed in the fields of AR and computer vision. In these methods, camera parameters are estimated by solving the perspective-n-point (PnP) problem using 2D-3D corresponding pairs. There are two groups of approaches to the PnP problem: camera parameter estimation with known intrinsic camera parameters and with unknown intrinsic camera parameters. Recently, numerous methods have been proposed to solve the PnP problem when the intrinsic camera parameters are known (Fischer and Bolles 1981; Klette et al. 1998; Quan and Lan 1999; Wu and Hu 2006; Lepetit et al. 2009; Hmam and Kim 2010). Most camera parameter estimation methods belong to this category. In AR, 2D-3D corresponding pairs are obtained by using a 3D model of the environment or a feature landmark database (Drummond and Cipolla 2002; Taketomi et al. 2011).


Solutions to the PnP problem when the intrinsic camera parameters are not known have also been proposed (Abidi and Chandra 1995; Triggs 1999). These methods can estimate the absolute extrinsic camera parameters and the focal length from 2D-3D corresponding pairs. However, in these methods, the accuracy of the estimated camera parameters decreases depending on the specific geometric relationship of the points. To solve this problem, Bujnak et al. proposed a method for estimating extrinsic camera parameters and focal length. Their method uses a Euclidean rigidity constraint in object space (Bujnak et al. 2008). Furthermore, they improved the computational cost of the method (Bujnak et al. 2008) by joining planar and nonplanar solvers (Bujnak et al. 2010). The method of Bujnak et al. (2010) can be implemented in real time on a desktop computer. However, the accuracy of the estimated camera parameters still decreases when the optical axis is perpendicular to the plane formed by the 3D points. Kukelova et al. proposed a five-point-based method (Kukelova et al. 2013), which can achieve more stable camera parameter estimation than the method of Bujnak et al. (2010). However, most marker-based applications use a square marker (Kato and Billinghurst 1999). In these applications, camera parameters should be estimated from four 2D-3D corresponding pairs.
Unlike in the PnP problem, for estimating the intrinsic and extrinsic camera parameters, corresponding pairs of 2D image coordinates in multiple images can be used (Hartley and Zisserman 2004; Stewenius et al. 2005; Li 2006). These methods are usually used for 3D reconstruction from multiple images, for example, in the structure-from-motion technique (Snavely et al. 2006). Although these methods do not need any prior knowledge of the target environment, they cannot estimate absolute extrinsic camera parameters. Sturm proposed a self-calibration method for zoom lens cameras, which uses precalibration information (Sturm 1997). The idea of this method is similar to that of the method described in Section 6.4.2. In this method, the intrinsic camera parameters are calibrated and then represented by one parameter. In the online process, the estimation of the intrinsic and extrinsic camera parameters uses this precalibration information and is based on the Kruppa equation. However, the solution of the Kruppa equation is not robust to noise, and this method cannot estimate absolute extrinsic camera parameters. These methods are impractical for some AR applications because they require the user to arrange the CG objects and the coordinate system manually.
In contrast to previous methods, the method that we describe in this chapter can accurately and stably estimate intrinsic and absolute extrinsic camera parameters using an epipolar constraint and a precalibrated intrinsic camera parameter change. In our method, a fiducial marker is used to obtain 2D-3D corresponding pairs. Natural feature points that do not have 3D positions are used to stabilize the camera parameter estimation results. The estimated intrinsic camera parameters are constrained by the precalibrated intrinsic camera parameter change.

6.3 MARKER-BASED CAMERA PARAMETER ESTIMATION


In this section, we introduce a general marker-based camera parameter estimation process. It can be divided into three steps: marker detection, marker identification, and camera parameter estimation. A fiducial marker is detected in an input image using image processing techniques such as binarization. Then, the marker is matched against known markers. After detection and identification, the 3D positions of the fiducial marker features are associated with the 2D positions of the fiducial marker in the input image. These 2D-3D correspondences are used to estimate the camera parameters. In general, fixed intrinsic camera parameters are assumed in this camera parameter estimation process; thus, only the extrinsic camera parameters are estimated as unknown parameters. Most camera parameter estimation methods employ the following cost function:
E = \sum_{i} \| \hat{p}_i - p_i \|^2    (6.1)

where
E is the cost,
p_i is the detected 2D position of a fiducial marker feature in the input image, and
p̂_i is the reprojected position of the 3D point of the fiducial marker feature, as shown in Figure 6.1.
The position of the reprojected point can be calculated using the translation vector t, the rotation matrix R, and the intrinsic camera parameter matrix K, as follows:

\hat{p}_i \simeq K [R \mid t] P_i    (6.2)

where P_i is the 3D position of the fiducial marker feature. Note that the distortion factor is ignored in this formulation. Finally, the extrinsic camera parameters are estimated by minimizing Equation 6.1.
(Figure labels: 3D point P_i in the world coordinate system; its reprojected position p̂_i and detected position p_i in the camera coordinate system; camera pose [R|t].)

FIGURE 6.1 Geometric relationship between a reprojected point and a detected point.
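A minimal sketch of this estimation step is shown below, using NumPy and SciPy: the pose is parameterized as an axis-angle rotation plus a translation, and the reprojection cost of Equation 6.1 is minimized with a Levenberg-Marquardt solver. The intrinsic matrix, marker size, and starting guess are hypothetical, and real implementations (e.g., ARToolKit-style pipelines) add marker detection and robust initialization.

import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def project(K, rvec, tvec, P):
    """Project 3D points P (N, 3) with intrinsics K and pose (rvec, tvec),
    i.e., Equation 6.2 without lens distortion."""
    Pc = P @ Rotation.from_rotvec(rvec).as_matrix().T + tvec
    p = Pc @ K.T
    return p[:, :2] / p[:, 2:3]

def estimate_pose(K, P_marker, p_detected, x0):
    """Minimize Equation 6.1 (sum of squared reprojection errors of the
    marker corners) over the six extrinsic parameters (rvec, tvec)."""
    def residuals(x):
        return (project(K, x[:3], x[3:], P_marker) - p_detected).ravel()
    res = least_squares(residuals, x0, method="lm")
    return res.x[:3], res.x[3:]

# Hypothetical fixed intrinsics and an 80 mm square marker lying at z = 0.
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
P = np.array([[-0.04, -0.04, 0.0], [0.04, -0.04, 0.0],
              [0.04, 0.04, 0.0], [-0.04, 0.04, 0.0]])

# Synthetic "detections": project the corners with a known ground-truth pose.
rvec_true, tvec_true = np.array([0.1, -0.2, 0.05]), np.array([0.02, 0.01, 0.5])
p_obs = project(K, rvec_true, tvec_true, P)

rvec, tvec = estimate_pose(K, P, p_obs, x0=np.array([0, 0, 0, 0, 0, 1.0]))
print(np.round(tvec, 3))   # should recover approximately [0.02, 0.01, 0.5]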

In the past, intrinsic camera parameters have been fixed in the camera parameter estimation process in augmented reality. These intrinsic camera parameters are obtained in advance by using camera calibration methods (Tsai 1986; Zhang 2000). On the other hand, simultaneous intrinsic and extrinsic camera parameter estimation methods have been proposed in the field of computer vision (Bujnak et al. 2008, 2010). These methods can estimate intrinsic and extrinsic camera parameters using 2D-3D correspondences. However, the results are unstable when the marker features lie on the same plane. In addition, the accuracy of camera parameter estimation decreases when the camera moves along the optical axis while zooming. In Section 6.4, two methods for overcoming these problems are introduced.

6.4 CAMERA POSE ESTIMATION FOR ZOOMABLE CAMERAS


In this section, the method for estimating intrinsic and extrinsic camera parameters
is introduced. Figure 6.2 shows the camera parameter estimation framework for
zoomable cameras. The method can be divided into two processes: offline camera calibration and online camera parameter estimation. In the offline process, an
intrinsic camera parameter change is modeled by calibrating the intrinsic camera
parameters for each zoom value. In the online camera parameter estimation process, intrinsic and extrinsic camera parameters are estimated using precalibration
information. Two extrinsic camera parameter estimation methods are introduced: a
monocular-camera-based method and a stereo-camera-based method. The monocular-camera-based method can be used for general marker-based augmented reality
Offline stage:
1. Camera calibration for each magnification of the camera zooming
2. Third-order spline fitting for each parameter change
3. Stereo camera calibration (only for the stereo camera case)

Online stage, monocular camera case:
1. KLT-based natural feature tracking between successive frames
2. Fiducial marker detection
3. Calculation of the energy function
4. Intrinsic and extrinsic camera parameter estimation by minimizing the energy function

Online stage, stereo camera case:
1. Extrinsic camera parameter estimation for the reference camera
2. Estimation of the magnification of the zoom value
3. Calculation of the energy function
4. Refinement of the extrinsic camera parameters by minimizing the energy function

FIGURE 6.2 Flow diagram of camera parameter estimation for zoomable camera.

applications. On the other hand, the stereo-camera-based method can be used for
situations wherein an additional camera can be attached to the camera capturing the
augmented reality background images. Details of these methods are described in the
following sections.

6.4.1 Parameterization of Intrinsic Camera Parameter Change


In this process, the intrinsic camera parameter change is modeled using camera calibration results for each zoom value. By using this model, we can improve the stability and accuracy of online camera parameter estimation.
6.4.1.1 Camera Calibration for Each Zoom Value
In this process, we assume the intrinsic camera parameter matrix as follows.

K = \begin{pmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{pmatrix}    (6.3)

where f_x and f_y represent the focal lengths, and c_x and c_y represent the principal point coordinates. In this method, we assume zero skew and no lens distortion. This assumption is reasonable for most recent camera devices. Thus, the intrinsic camera parameters have four degrees of freedom. In this method, these four values are obtained for each zoom value by using Zhang's camera calibration method (Zhang 2000).
6.4.1.2 Intrinsic Camera Parameter Change Expression Using Zoom Variable
After obtaining the intrinsic camera parameters for each zoom value, they are expressed in terms of the zoom variable m.

K(m) = \begin{pmatrix} f_x(m) & 0 & c_x(m) \\ 0 & f_y(m) & c_y(m) \\ 0 & 0 & 1 \end{pmatrix}    (6.4)

By using this expression, the degrees of freedom of the intrinsic camera parameter matrix are reduced to one. In addition, the relationship between the changes of the individual intrinsic camera parameters is retained. Unlike previous research that handles the intrinsic camera parameters independently (Bujnak et al. 2008, 2010), the method described in this section can achieve stable camera parameter estimation during the online process.
In this method, third-order spline fitting is applied to the camera calibration results to obtain the intrinsic camera parameter change model for each parameter. The third-order spline fit has the properties that the obtained function passes through


all control points and that the polynomial pieces are continuously connected at the borders. These features are suitable for the energy minimization process in the online camera parameter estimation.
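A minimal sketch of this offline parameterization is given below, using SciPy's cubic spline (which interpolates through all control points with continuous derivatives at the borders); the per-zoom calibration values are hypothetical placeholders for the output of Zhang's method.

import numpy as np
from scipy.interpolate import CubicSpline

# Hypothetical calibration results (Zhang's method run at each zoom stop):
# rows are [fx, fy, cx, cy] in pixels for zoom magnifications 1x to 4x.
zooms = np.array([1.0, 1.5, 2.0, 3.0, 4.0])
calib = np.array([[800.0,  800.0, 320.0, 240.0],
                  [1230.0, 1228.0, 322.0, 239.0],
                  [1660.0, 1655.0, 323.0, 238.0],
                  [2520.0, 2515.0, 326.0, 236.0],
                  [3400.0, 3390.0, 330.0, 233.0]])

# Third-order spline through all control points, one spline per parameter.
spline = CubicSpline(zooms, calib)   # interpolates along axis 0

def K_of_m(m):
    """Intrinsic matrix K(m) of Equation 6.4, parameterized by the single
    zoom variable m."""
    fx, fy, cx, cy = spline(m)
    return np.array([[fx, 0.0, cx], [0.0, fy, cy], [0.0, 0.0, 1.0]])

print(np.round(K_of_m(2.5), 1))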

6.4.2 Monocular Camera Case


In this section, the method for estimating intrinsic and extrinsic camera parameters with a monocular camera is introduced (Taketomi et al. 2014). In this method, two energy terms are added to the cost function of conventional marker-based camera parameter estimation. In addition, precalibrated information about the zoomable camera is used in the online camera parameter estimation process.
6.4.2.1 Definition of Energy Function
In the online process, intrinsic and extrinsic camera parameters are estimated based
on an energy minimization framework. In order to estimate camera parameters,
two energy terms are added into the conventional cost function of the marker-based
camera parameter estimation: an energy term based on the epipolar constraint for
tracked natural features and an energy term based on the continuity constraint for
temporal change of zoom values. The cost function Emono for the camera parameter
estimation is defined as follows:
Emono = Eep + wmk Emk + wz Ezoom (6.5)

where wmk and wz are weights for balancing each term. These weights are automatically determined based on the camera parameters of the previous frame. Emk is used
to estimate the absolute extrinsic camera parameters, Eep implicitly gives the 3D
structure information, and Ezoom gives the temporal constraint for zoom values. Eep
and Ezoom help achieve stable estimation of the magnification of the zoom value. In
the following sections, the details of the energy terms and the weights are described.
6.4.2.2 Energy Term for Epipolar Constraint
In this method, the energy term Eep is calculated from the summation of distances
between the epipolar lines and the tracked natural features as shown in Figure 6.3.
Based on epipolar geometry, a corresponding point must be located on the epipolar
(Figure labels: tracked feature point P_i; its position p_i in the key frame; the epipolar line l_i through the epipole e_i in the current frame j; the detected position q_i; and the reprojection error d_i.)

FIGURE 6.3 Reprojection error based on tracked natural features.


line in the other camera image (Hartley and Zisserman 2004). To calculate this distance, frames that satisfy the following criteria are stored as key frames:

1. The distance between the current camera position and the camera position of the previous 10 frames is at a maximum.
2. All the distances between the current camera position and the key frame positions are larger than a threshold.

Note that the first frame is stored as the first key frame in the online process of camera parameter estimation. In addition, natural features are tracked between successive frames using the Kanade-Lucas-Tomasi (KLT) feature tracker (Shi and Tomasi 1994).
More concretely, the energy term Eep is as follows:
E_{ep} = \frac{1}{|S_j|} \sum_{i \in S_j} d_i^2    (6.6)

where
S_j is the set of tracked natural feature points in the jth frame, and
d_i is the reprojection error for natural feature point i.
The reprojection error d_i is defined as the distance between the epipolar line l_i and the detected natural feature position q_i in the input image. The epipolar line l_i can be calculated from the epipole e_i and the projected position p̂_i of the natural feature position p_i detected in the key frame. The epipole e_i and the projected position p̂_i are calculated as:

e_i = K(m_j) T_j P_{key}    (6.7)

\hat{p}_i = K(m_j) T_j \hat{P}_i    (6.8)

where
P_{key} represents the key frame camera position in the world coordinate system, and
T_j represents the extrinsic camera parameter matrix (camera rotation and translation).
The subscript key denotes camera parameters estimated in the key frame. Note that P̂_i in Equation 6.8 has already been transformed into the world coordinate system using the matrices K_{key}(m_{key}) and T_{key}. By using this notation, we can represent the estimation error between the two frames under the epipolar constraint as a reprojection error.
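The reprojection error d_i of Equation 6.6 can be computed directly from the two projected points in homogeneous image coordinates, as in the short sketch below (with synthetic values): the epipolar line is the cross product of the epipole and the projected key-frame point, and d_i is the point-to-line distance of the tracked feature.

import numpy as np

def epipolar_distance(e_h, phat_h, q):
    """Distance d_i between the tracked feature q (2D, current frame) and
    the epipolar line through the epipole e_h and the projected key-frame
    point phat_h, both given in homogeneous image coordinates (e.g., the
    unnormalized results of Equations 6.7 and 6.8)."""
    line = np.cross(e_h, phat_h)          # epipolar line l = e x p_hat
    a, b, c = line
    return abs(a * q[0] + b * q[1] + c) / np.hypot(a, b)

# Synthetic example: epipole at (100, 50), projected point at (400, 300),
# and a tracked feature slightly off the line joining them.
e = np.array([100.0, 50.0, 1.0])
p_hat = np.array([400.0, 300.0, 1.0])
q = np.array([250.0, 178.0])              # the exact on-line y value is 175
print(round(epipolar_distance(e, p_hat, q), 2))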
6.4.2.3 Energy Term for Fiducial Marker Corners
This is almost the same energy term as that used in conventional camera parameter estimation methods. Reprojection errors are calculated from the correspondences between the fiducial marker corners detected in the input image and their reprojected points:
E_{mk} = \sum_{i=1}^{4} \left( K(m_j) T_j P_i - p_i \right)^2    (6.9)


where P_i and p_i are the 3D position of a fiducial marker corner and its detected 2D position in the input image, respectively. Unlike in conventional methods, the zoom magnification parameter m_j appears in the intrinsic camera parameter matrix K in the jth frame.
6.4.2.4 Energy Term for Continuity of Zoom Value
This term is used to achieve stable camera parameter estimation. In augmented reality, camera parameters are estimated from a video feed. In this case, the magnification of the zoom value continuously changes in successive frames. In order to add
this constraint, we use the energy term Ezoom in the energy function:
E_{zoom} = ( m_{j-1} - m_j )^2    (6.10)

With this constraint, a discontinuous change in the zoom value is suppressed.


6.4.2.5 Balancing Energy Terms
In the energy function E, there are three energy terms. Balancing each energy term
is important to achieve accurate and stable camera parameter estimation. In this
method, each energy term is automatically balanced using the estimated camera
parameters in the previous frame. In this section, the details of the auto balancing
framework are described.
In fiducial marker-based camera parameter estimation, the estimated camera
parameters will be unstable when the optical axis of the camera is perpendicular to
the fiducial marker plane. This is caused by the singularity problem in the optimization process of camera parameter estimation. For this reason, as shown in Figure 6.4,
the weight w_{mk} for the energy term E_{mk} is calculated from the angle θ between the optical axis and the fiducial marker plane as follows:

w_{mk}(\theta) = \frac{4\theta^2}{\pi^2} + \epsilon    (6.11)

This weight function was determined experimentally. In addition, ε is a minimal weight for E_{mk}.
On the other hand, to effectively constrain the continuity of the zoom value, the weight w_z is dynamically changed depending on the intrinsic camera parameters estimated in the previous frame.
(Figure labels: optical axis; normal of the fiducial marker plane.)

FIGURE 6.4 Weight for the fiducial marker-based energy term.


In general, the relationship between the zoom value and the intrinsic camera parameters is not proportional. The focal lengths (f_x(m), f_y(m)) change drastically at large image magnifications resulting from the camera zooming. For this reason, if we used a constant weight w_z, its effect might be too strong or too weak in the camera parameter estimation process. Thus, the weight w_z should be controlled adequately. To solve this problem, we employ a weight w_z which depends on f_x(m) as follows:
wz = 1 / fx(mj)    (6.12)

In this term, we use only fx because the change of fx is almost the same as that of fy. By using this weight, we can adequately control wz based on the rate of change of the intrinsic camera parameters.
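To make the auto-balancing concrete, here is a minimal Python sketch, assuming the reconstruction of Equations 6.11 and 6.12 given above; the minimal weight value and the example angle are illustrative assumptions, not values from the chapter.

import numpy as np

def marker_weight(theta, eps=0.05):
    # Equation 6.11: down-weights the marker term near the singular viewing configuration
    return (4.0 / np.pi ** 2) * theta ** 2 + eps

def zoom_weight(fx_prev):
    # Equation 6.12: weakens the zoom-continuity term when the focal length is large
    return 1.0 / fx_prev

w_mk = marker_weight(theta=np.deg2rad(30.0))   # angle relative to the marker (illustrative)
w_z = zoom_weight(fx_prev=900.0)               # f_x(m_{j-1}) from the previous frame, in pixels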
6.4.2.6 Camera Pose Estimation by Energy Minimization
To estimate the intrinsic and extrinsic camera parameters, the energy function
E is minimized by using the Levenberg-Marquardt algorithm. We employ an M-estimator to reduce the effect of mis-tracked natural features in the optimization process. In this method, we employ the Geman-McClure function ρ:

ρ(x) = (x²/2) / (1 + x²)    (6.13)

where x represents the residual. In this optimization process, the zoom value mj−1 estimated in the previous frame and the extrinsic camera parameters estimated by using K(mj−1) are used as initial parameters. The results of camera parameter estimation may converge to a local minimum. Experimentally, we confirmed that the local
minimum problem occurs along the optical axis of the camera. To avoid the local
minimum problem, the optimization process is executed using three different initial
values generated by adding an offset to the initial magnification value of camera
zooming. Finally, the lowest energy value resulting from all trials is chosen, and its
estimated camera parameters K(mj) and Tj are adopted as the final result.
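As an illustration of this optimization strategy, the following sketch (not the authors' implementation) runs a Levenberg-Marquardt solver from three initial zoom values offset around the previous frame's estimate and keeps the lowest-cost solution; the residual function here is a synthetic stand-in for the full energy E.

import numpy as np
from scipy.optimize import least_squares

def geman_mcclure(x):
    # Geman-McClure function of Equation 6.13
    return (x ** 2 / 2.0) / (1.0 + x ** 2)

def residuals(params, prev_zoom):
    # params = [m, 6 pose parameters]; the reprojection residuals below are synthetic
    # stand-ins so that the example runs on its own.
    m, pose = params[0], params[1:]
    reproj = pose - np.linspace(0.1, 0.6, 6)           # placeholder reprojection residuals
    zoom = prev_zoom - m                                # continuity term of Equation 6.10
    return np.concatenate([np.sqrt(geman_mcclure(reproj)), [zoom]])

def estimate(prev_zoom, prev_pose, offset=0.1):
    best = None
    for dz in (-offset, 0.0, offset):                   # three different initial zoom values
        x0 = np.concatenate([[prev_zoom + dz], prev_pose])
        sol = least_squares(residuals, x0, args=(prev_zoom,), method="lm")
        if best is None or sol.cost < best.cost:
            best = sol                                  # keep the lowest-energy solution
    return best.x[0], best.x[1:]

m_j, pose_j = estimate(prev_zoom=1.0, prev_pose=np.zeros(6))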

6.4.3 Stereo Camera Case


In this section, the method for estimating intrinsic and extrinsic camera parameters using a reference camera is introduced. In this method, the reference camera is
fixed on the zoomable camera. Intrinsic camera parameters of the reference camera
are fixed during the camera parameter estimation process of the zoomable camera.
Firstly, in the online process, the extrinsic camera parameters of the reference camera are estimated using the fiducial marker. Intrinsic and extrinsic camera parameters of the zoomable camera are then estimated. In the camera parameter estimation process, the reference and zoomable camera pair is modeled by considering the optical lens movement. By using this model and the estimated extrinsic camera parameters of the reference camera, the intrinsic and extrinsic camera parameters of the zoomable camera can be obtained by estimating the zoom value. Details of the algorithm are described in the following sections.
6.4.3.1 Camera Calibration for Camera Pair
This method assumes an additional camera attached to the zoomable camera as
shown in Figure 6.5. This attached camera is used as a reference to estimate intrinsic
and extrinsic camera parameters of the zoomable camera. Intrinsic camera parameters of the reference camera are calibrated and fixed in the whole process. In this
calibration process, the magnification of the zoom value of the zoomable camera is
set to 1.0 (non-zoom mode). In this setting, the intrinsic camera parameters of the
zoomable camera and the reference camera are known. By using these known intrinsic camera parameters, a relative geometric relationship Trel between the zoomable
camera and the reference camera is calibrated by capturing a calibration pattern.
This relative geometric relationship is used to estimate intrinsic and extrinsic camera
parameters of the zoomable camera.
6.4.3.2 Geometric Model for Stereo Camera Considering Optical Zoom Lens Movement
In the case of camera zooming using an optical zoom lens, the relative geometric relationship Trel changes depending on the optical lens movement because the optical center moves along the optical axis. In this method, the optical lens movement is modeled as a focal length change caused by zooming, using a simple zoomable camera model (Numao et al. 1998). This simple zoomable camera model is shown in Figure 6.6. Concretely, in this model, the relationship between the focal length of each zoom value fi and the minimum focal length fmin is calculated:

Δfi = fi − fmin    (6.14)

FIGURE 6.5 Stereo camera model.

FIGURE 6.6 Modelization of optical lens movement.

A regression line is fitted to the result of this calculation, and then the relationship between lens movement and the focal length change L(f) is obtained:

L(f) = αΔf + β    (6.15)

where α and β are the parameters of the regression line. The relationship between lens movement and focal length change, together with the relationship between the zoomable camera and the reference camera, is used to build the stereo camera model shown in Figure 6.5. In this figure, Tzoom and Tref are the extrinsic camera parameters of the zoomable camera and the reference camera in the world coordinate system, respectively. In addition, F(f) represents the movement of the optical zoom center.

F(f) = [ 1  0  0  0
         0  1  0  0
         0  0  1  L(f) ]    (6.16)
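As a brief illustration, the following sketch fits the regression line of Equation 6.15 and assembles the translation F(f) of Equation 6.16 along the optical axis; the calibration samples are hypothetical values invented for the example, not the chapter's data.

import numpy as np

delta_f = np.array([0.0, 40.0, 120.0, 300.0, 650.0])    # f_i - f_min from Equation 6.14 (hypothetical)
lens_move = np.array([0.0, 0.9, 2.6, 6.4, 13.8])        # measured lens movement in mm (hypothetical)
alpha, beta = np.polyfit(delta_f, lens_move, deg=1)     # regression line of Equation 6.15

def F(f, f_min):
    # translation of the optical center along the optical axis (Equation 6.16)
    L = alpha * (f - f_min) + beta
    return np.array([[1.0, 0, 0, 0],
                     [0, 1.0, 0, 0],
                     [0, 0, 1.0, L]])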

6.4.3.3 Camera Pose Estimation Using Zoom Camera and Base Camera Pair
In the online process, intrinsic and extrinsic camera parameters of the zoomable
camera are estimated using the estimated extrinsic camera parameters of the reference camera, precalibrated information, and the relationship between the focal
length change and the lens movement.
Firstly, the extrinsic camera parameters of the reference camera Tref are estimated
using the known intrinsic camera parameters Kref and the detected fiducial marker pattern. In this extrinsic camera parameter estimation process, the extrinsic camera parameters are estimated similarly to the conventional marker-based camera parameter
estimation process. Extrinsic camera parameters of the zoomable camera Tzoom can
be represented using the estimated extrinsic camera parameters Tref and precalibrated information Trel and F(f):

Tzoom = F(f) Trel Tref    (6.17)


In addition, the projected points pi of 3D points Pi in the zoomable camera images can be represented using the intrinsic camera parameters of the zoomable camera K(m):

pi ≃ K(m) F(fx(m)) Trel Tref Pi    (6.18)

In this equation, all of the parameters are known except for m. Thus, we can estimate the magnification of the zoom value m by minimizing the reprojection errors between the detected marker corners and the reprojected marker corner positions. In this minimization process, the Levenberg-Marquardt method is employed and the zoom value estimated in the previous frame is used as the initial value for optimization.
Finally, Tzoom is refined by minimizing the following cost function using detected
marker corner positions in reference and zoomable camera images:
E = Σi ‖ K(m) Tzoom Pi − pi^zoom ‖² + Σi ‖ Kref Trel⁻¹ F(fx(m))⁻¹ Tzoom Pi − pi^ref ‖²    (6.19)

where pi^ref and pi^zoom represent the detected marker corner positions in the reference and zoomable camera images, respectively. Note that the magnification of the zoom value m is fixed in this optimization process.
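The following condensed sketch illustrates the resulting one-dimensional search: with Tref estimated from the marker, the precalibrated Trel, and a lens-movement model, only the zoom value m is unknown and can be recovered by minimizing the marker reprojection error. The camera-model functions (K_of_m, F_of_m) and the data below are placeholders, not the chapter's code.

import numpy as np
from scipy.optimize import minimize_scalar

def project(P_world, K, T):
    # pinhole projection with a 3x4 camera matrix T
    x = K @ (T @ np.append(P_world, 1.0))
    return x[:2] / x[2]

def K_of_m(m):
    # stand-in for the spline-fitted intrinsics of Section 6.5.1
    f = 800.0 + 150.0 * (m - 1.0)
    return np.array([[f, 0, 320.0], [0, f, 240.0], [0, 0, 1.0]])

def F_of_m(m):
    # stand-in for Equation 6.16 with a toy lens-movement model L(f)
    return np.array([[1.0, 0, 0, 0], [0, 1.0, 0, 0], [0, 0, 1.0, 0.01 * (m - 1.0)]])

def reproj_error(m, corners3d, corners2d, T_rel, T_ref):
    T_zoom = F_of_m(m) @ T_rel @ T_ref                   # Equation 6.17
    return sum(np.sum((project(P, K_of_m(m), T_zoom) - p) ** 2)
               for P, p in zip(corners3d, corners2d))

# hypothetical marker corners and poses
corners3d = [np.array([x, y, 0.0]) for x in (-0.04, 0.04) for y in (-0.04, 0.04)]
T_ref = np.vstack([np.hstack([np.eye(3), [[0], [0], [0.5]]]), [0, 0, 0, 1]])
T_rel = np.eye(4)
corners2d = [project(P, K_of_m(1.3), F_of_m(1.3) @ T_rel @ T_ref) for P in corners3d]

m_est = minimize_scalar(reproj_error, bounds=(1.0, 20.0), method="bounded",
                        args=(corners3d, corners2d, T_rel, T_ref)).x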

6.5 CAMERA PARAMETER ESTIMATION RESULTS


The performance of the methods described in Section 6.4 is presented in this section. First, precalibration information is obtained using the camera calibration process described in Section 6.4.1. Next, the accuracy of the camera parameter estimation of each method is quantitatively and qualitatively evaluated in simulated and real environments. In these evaluations, the estimated camera parameters are compared with those of Bujnak's method (Bujnak et al. 2010). Bujnak's method can estimate the focal length, lens distortion, and extrinsic camera parameters from four 2D-3D corresponding pairs by minimizing the distances between the 3D positions of natural feature points and their observed positions. Although this method can estimate lens distortion, we ignore the lens distortion effect because most consumer camera devices have negligible lens distortion except when a wide-angle lens is mounted. In addition, in the experiments in the real environment, camera parameter estimation results of ARToolKit, which does not handle intrinsic camera parameter changes, are also shown as a reference. It should be noted that all input video sequences start at the non-zoom setting, and that the offset for the initial value was set to 0.1. In all experiments, we used a desktop PC (CPU: Core i7, 2.93 GHz; memory: 4.00 GB).

6.5.1 Camera Calibration Result


In this experiment, we used a Sony HDR-AX2000 video camera, which records 640 × 480 pixel images, with an optical zoom (1-20×) and a progressive scan at 30 fps. The lens distortion of this camera is almost zero (κ1 = 1.4 × 10⁻⁴). This video


camera was used to generate virtual camera motions in the quantitative evaluation
and to acquire actual video sequences in the qualitative evaluation. The range of the
image magnification resulting from camera zooming is divided into 20 intervals, and then the intrinsic camera parameters for each zoom value are obtained using Zhang's camera calibration method (Zhang 2000).
Figures 6.7 and 6.8 show the results of the camera calibration. In these figures,
the lines indicate the spline fitting results. These results show that the focal length
drastically changes when the zoom value is greater than 13. In addition, the center
of the projection changes cyclically because the lens rotates during zooming. In the following experiments, we used the spline fitting results of fx(m), fy(m), u(m), and v(m).
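As a brief illustration, the spline fitting of the per-zoom calibration results might look as follows; the sample values are synthetic stand-ins, not the calibration data of Figures 6.7 and 6.8.

import numpy as np
from scipy.interpolate import CubicSpline

m_samples = np.linspace(1.0, 20.0, 20)                   # 20 calibrated zoom values
fx_samples = 800.0 + 2.0 * (m_samples - 1.0) ** 2        # placeholder for the Zhang-calibrated f_x(m)
fx_of_m = CubicSpline(m_samples, fx_samples)             # smooth curve used during tracking

print(float(fx_of_m(13.5)))                              # focal length at an arbitrary zoom value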

FIGURE 6.7 Calibration result of focal length.

FIGURE 6.8 Calibration result of center of projection.


6.5.2 Quantitative Evaluation in Simulated Environment


The accuracy of the estimated intrinsic and extrinsic camera parameters was quantitatively evaluated in a simulated environment. In this experiment, two virtual camera motions in the simulated environment were acquired using ARToolKit (Kato
and Billinghurst 1999) and video sequences captured in the real environment. In this
virtual camera motion acquisition process, intrinsic camera parameters are fixed at
the smallest magnification of camera zooming. The differences between these two
motions are as follows:
The camera moves freely during camera zooming in the simulated environment (free motion).
The camera moves straight along the optical axis during camera zooming
in the simulated environment (straight motion).
The camera travels 2173 mm in the free camera motion and 1776 mm in the straight camera motion. In this simulation, 100 3D points were randomly generated in 3D space (500 mm × 500 mm × 500 mm), and then the corresponding pairs were obtained by projecting these 3D points into the virtual cameras. Additionally, because there was no noise in the projected points, Gaussian noise was added with a mean equal to zero and a standard deviation of 2.0. Figure 6.9 shows the geometrical
relationship between 3D points and camera motions in the simulated environments.
To quantitatively evaluate the estimated camera parameters, camera position
errors are measured by the Euclidean distance between the estimated camera positions
and true camera positions. On the other hand, estimated camera poses are evaluated
using the difference between the estimated rotation matrix Rest and the true rotation
matrix Rtrue using the following calculation:
1. Multiply the two matrices: Rd = Rtrue^T Rest.
2. Decompose the matrix Rd into a rotation axis vector w and a rotation angle θ using Rodrigues's formula.

Finally, the rotation angle θ is employed as the rotation error of the estimated rotation matrix. This evaluation method is described in the literature (Petit et al. 2011).
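For reference, this rotation error can be computed compactly as in the following sketch, which uses the axis-angle (Rodrigues) relation between the trace of Rd and the rotation angle; the example matrices are illustrative.

import numpy as np

def rotation_error_deg(R_true, R_est):
    # angle of the residual rotation R_d = R_true^T R_est
    R_d = R_true.T @ R_est
    cos_theta = np.clip((np.trace(R_d) - 1.0) / 2.0, -1.0, 1.0)
    return np.degrees(np.arccos(cos_theta))

R_true = np.eye(3)
R_est = np.array([[np.cos(0.02), -np.sin(0.02), 0],
                  [np.sin(0.02),  np.cos(0.02), 0],
                  [0, 0, 1.0]])
print(rotation_error_deg(R_true, R_est))   # about 1.15 degrees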

FIGURE 6.9 Part of the camera paths and 3D points in the simulated environment (free camera motion and straight camera motion).


6.5.2.1 Free Camera Motion


In this case, the camera moves freely in the simulated environment, which includes
a translation, a rotation, and zooming. Figures 6.10 and 6.11 show the results of the
estimated intrinsic parameters (fx, fy, u, v) and the ground truth values for each frame. In these figures, the method for the monocular camera is labeled Method A and the method for the stereo camera is labeled Method B. It should be noted that Bujnak's method (Bujnak et al. 2010) cannot estimate the center of projection; Figure 6.11 therefore shows the results of Methods A and B only. These results confirm that Methods A and B can estimate the focal length more accurately than Bujnak's method.
In addition, Methods A and B can accurately estimate the center of projection.
Figures 6.12 and 6.13 show the errors for the estimated position and rotation.
FIGURE 6.10 Estimation results of focal length for each frame in free camera motion.

FIGURE 6.11 Estimation results of center of projection for each frame in free camera motion.
FIGURE 6.12 Estimated camera position errors for each frame in the case of free camera motion.

FIGURE 6.13 Estimated camera rotation errors for each frame in free camera motion.

These results confirm that the accuracy of the estimated extrinsic camera parameters is drastically improved by Methods A and B. These improvements are considered to


be due to the accurate estimation of intrinsic camera parameters. In addition, we can
confirm that translation errors are highly dependent on estimation errors of the zoom
factor. Table 6.1 shows the average errors for each camera parameter. Although the average reprojection error of Bujnak's method is small, the errors for each camera parameter are still large. This is due to the difficulty in estimating the parameters using only 2D-3D correspondences. In contrast, the average estimation errors for each camera parameter decrease in Methods A and B. We consider that the multiple-frame information and the continuity constraint on the camera zooming were responsible for this improvement. However, the processing times of Methods A and B are longer than that of Bujnak's method. In Methods A and B, the energy minimization process accounts for most of the processing time. In particular, in Method A, the minimization process is executed for three different initial values to avoid the local minimum problem.

TABLE 6.1
Comparison of Accuracy in the Case of Free Camera Motion

                                        Bujnak's Method    Method A    Method B
Average focal length error (mm)         13.08              0.83        0.5
Average position error (mm)             6.1                0.46        0.51
Average rotation error (degree)         1.37               1.31        1.18
Average reprojection error (pixel)      1.36               0.82        0.31
Processing time (s)                     0.0011             0.06        0.04


6.5.2.2 Straight Camera Motion


In this case, the camera moves straight along the optical axis during camera zooming.
In addition, the optical axis is perpendicular to the fiducial marker plane. This condition
cannot easily be handled by Bujnak's method. Figures 6.14 and 6.15 show the results of the estimated intrinsic camera parameters. Figures 6.16 and 6.17 show the errors for the estimated position and rotation. Table 6.2 shows the average errors for each camera parameter. These results show that Methods A and B can estimate accurate intrinsic and extrinsic camera parameters under this difficult condition. Although the reprojection error is small in Bujnak's method, the estimated camera parameters are inaccurate. This is due to the difficulty in estimating camera parameters when using only 2D-3D correspondences.

FIGURE 6.14 Estimation results of focal length for each frame in straight camera motion.

FIGURE 6.15 Estimation results of center of projection for each frame in straight camera motion.

FIGURE 6.16 Estimated camera position errors for each frame in straight camera motion.

FIGURE 6.17 Estimated camera rotation errors for each frame in straight camera motion.

TABLE 6.2
Comparison of Accuracy in the Case of Straight Camera Motion

                                        Bujnak's Method    Method A    Method B
Average focal length error (mm)         13.66              2.13        0.51
Average position error (mm)             7.71               1.1         0.54
Average rotation error (degree)         2.24               1.67        1.73
Average reprojection error (pixel)      1.33               0.79        0.43
Processing time (s)                     0.0012             0.05        0.04

6.5.3 Qualitative Evaluation in Real Environment


The geometric registration results were compared with those of ARToolKit (Kato and Billinghurst 1999) and Bujnak's method (Bujnak et al. 2010). The camera parameter estimation process was executed for two video sequences, one of which was a free camera
motion sequence and the other a straight camera motion sequence. In these sequences,
the image magnification resulting from the camera zooming changes dynamically.
Figure 6.18 shows the results of the geometric registration. A virtual cube is overlaid
on a Rubik's cube. We can confirm that the virtual cube is accurately overlaid in Methods A and B. In contrast, the results of ARToolKit and Bujnak's method involve geometric inconsistency. More specifically, there is a large inconsistency in the geometric registration result of Bujnak's method when the optical axis is perpendicular to the fiducial
marker plane. These results show that Methods A and B can achieve accurate geometric
registration using estimated camera parameters even in such a difficult condition.
Figures 6.19 and 6.20 show the results of the estimated camera paths. Frustums
represent the estimated camera positions and poses. The size of the frustum changes
depending on the focal length.

FIGURE 6.18 Geometric registration results of each method. A virtual cube is overlaid on a Rubik's cube in each frame.

FIGURE 6.19 Estimated camera paths for free camera motion.

FIGURE 6.20 Estimated camera paths for straight camera motion.

These figures confirm that the estimated camera paths of Methods A and B are smoother than those of Bujnak's method; there is large jitter in the estimated camera path of Bujnak's method. We confirmed that Methods A and B can estimate the camera path with more stability than Bujnak's method.

6.6 SUMMARY
In this chapter, methods for estimating intrinsic and extrinsic camera parameters were introduced. In the monocular camera case, camera parameters are estimated by minimizing an energy function. In this method, two additional energy terms are added to the conventional marker-based camera parameter estimation method: the reprojection errors of tracked natural features and a temporal constraint on the zoom value. In the stereo camera case, intrinsic and extrinsic camera parameter estimation of the zoomable camera is achieved using the reference camera. In this method, the optical lens movement is modeled as a focal length change caused by zooming, using a simple zoomable camera model. By using this model and the reference camera, the intrinsic and extrinsic camera parameters can be estimated by solving a one-dimensional optimization problem. These methods can achieve accurate and stable camera parameter estimation. However, the current methods do not consider lens distortion, which must be taken into account when using wide-angle lenses.


REFERENCES
Abidi, M. A., T. Chandra. A new efficient and direct solution for pose estimation using quadrangular targets: Algorithm and evaluation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 17(5), 1995, 534-538.
Bujnak, M., Z. Kukelova, T. Pajdla. A general solution to the P4P problem for camera with unknown focal length. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, AK, June 23-28, 2008, pp. 1-8.
Bujnak, M., Z. Kukelova, T. Pajdla. New efficient solution to the absolute pose problem for camera with unknown focal length and radial distortion. Proceedings of the Asian Conference on Computer Vision, Queenstown, New Zealand, November 8-12, 2010, pp. 11-24.
Drummond, T., R. Cipolla. Real-time visual tracking of complex structure. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(7), 2002, 932-946.
Fischler, M. A., R. C. Bolles. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM, 24(6), 1981, 381-395.
Hartley, R., A. Zisserman. Multiple View Geometry in Computer Vision. Cambridge, U.K.: Cambridge University Press, 2004.
Hmam, H., J. Kim. Optimal non-iterative pose estimation via convex relaxation. International Journal of Image and Vision Computing, 28(11), 2010, 1515-1523.
Kato, H., M. Billinghurst. Marker tracking and HMD calibration for a video-based augmented reality conferencing system. Proceedings of the International Workshop on Augmented Reality, San Francisco, CA, October 20-21, 1999, pp. 85-94.
Klette, R., K. Schluns, A. Koschan, editors. Computer Vision: Three-Dimensional Data from Images. New York: Springer, 1998.
Kukelova, Z., M. Bujnak, T. Pajdla. Real-time solution to the absolute pose problem with unknown radial distortion and focal length. Proceedings of the International Conference on Computer Vision, Sydney, New South Wales, Australia, December 1-8, 2013, pp. 2816-2823.
Lepetit, V., F. Moreno-Noguer, P. Fua. EPnP: An accurate O(n) solution to the PnP problem. International Journal of Computer Vision, 81(2), 2009, 155-166.
Li, H. A simple solution to the six-point two-view focal-length problem. Proceedings of the European Conference on Computer Vision, Graz, Austria, May 7-13, 2006, pp. 200-213.
Numao, T., Y. Nakatani, M. Okutomi. Calibration of a pan/tilt/zoom camera by a simple camera model. Technical Report of IEICE, PRMU, Kanagawa, Japan, 1998, pp. 65-72.
Petit, A., G. Caron, H. Uchiyama, E. Marchand. Evaluation of model based tracking with TrakMark dataset. Proceedings of the International Workshop on AR/MR Registration, Tracking and Benchmarking, Basel, Switzerland, October 26, 2011.
Quan, L., Z.-D. Lan. Linear n-point camera pose determination. IEEE Transactions on Pattern Analysis and Machine Intelligence, 21(8), 1999, 774-780.
Shi, J., C. Tomasi. Good features to track. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Seattle, WA, June 21-23, 1994, pp. 593-600.
Snavely, N., S. M. Seitz, R. Szeliski. Photo tourism: Exploring photo collections in 3D. ACM Transactions on Graphics, 25(3), 2006, 835-846.
Stewenius, H., D. Nister, F. Kahl, F. Schaffalitzky. A minimal solution for relative pose with unknown focal length. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, San Diego, CA, June 20-25, 2005, pp. 789-794.
Sturm, P. Self-calibration of a moving zoom-lens camera by pre-calibration. International Journal of Image and Vision Computing, 15, 1997, 583-589.
Taketomi, T., K. Okada, G. Yamamoto, J. Miyazaki, H. Kato. Camera pose estimation under dynamic intrinsic parameter change for augmented reality. International Journal of Computers and Graphics, 44, 2014, 11-19.
Taketomi, T., T. Sato, N. Yokoya. Real-time and accurate extrinsic camera parameter estimation using feature landmark database for augmented reality. International Journal of Computers and Graphics, 35(4), 2011, 768-777.
Triggs, B. Camera pose and calibration from 4 or 5 known 3D points. Proceedings of the International Conference on Computer Vision, Kerkyra, Greece, September 20-27, 1999, pp. 278-284.
Tsai, R. Y. An efficient and accurate camera calibration technique for 3D machine vision. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Miami Beach, FL, 1986, pp. 364-374.
Wu, Y., Z. Hu. PnP problem revisited. Journal of Mathematical Imaging and Vision, 24(1), 2006, 131-141.
Zhang, Z. A flexible new technique for camera calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(11), 2000, 1330-1334.

7 Visual Tracking for Augmented Reality in Natural Environments

Suya You and Ulrich Neumann

CONTENTS
7.1 Introduction................................................................................................... 152
7.2 Framework for Simultaneous Tracking and Recognition.............................. 152
7.2.1 Offline Stage...................................................................................... 153
7.2.2 Online Stage...................................................................................... 154
7.2.3 Simultaneous Tracking and Recognition........................................... 154
7.3 Camera Pose Tracking with Robust SFM..................................................... 156
7.3.1 Structure from Motion Using Subtrack Optimization....................... 156
7.3.2 System Components.......................................................................... 158
7.3.2.1 Building the Point Cloud.................................................... 158
7.3.2.2 Extracting Keypoint Descriptors........................................ 160
7.3.2.3 Incremental Keypoint Matching......................................... 160
7.3.2.4 Camera Pose Estimation..................................................... 161
7.3.2.5 Incorporating Unmatched Keypoints................................. 161
7.3.2.6 Experiments........................................................................ 162
7.4 Camera Pose Tracking with LiDAR Point Clouds........................................ 164
7.4.1 Automatic Estimation of Initial Camera Pose................................... 164
7.4.1.1 Generate Synthetic Images of Point Clouds....................... 165
7.4.1.2 Extract Keypoint Features.................................................. 165
7.4.1.3 Estimate Camera Pose........................................................ 165
7.4.2 Camera Pose Refinement................................................................... 166
7.4.3 Experiments....................................................................................... 167
7.5 Summary and Conclusions............................................................................ 169
Acknowledgment.................................................................................................... 169
References............................................................................................................... 170


7.1 INTRODUCTION
Augmented Reality (AR) is the process of combining virtual and real objects into
a single spatially coherent view. In most cases, this entails capturing a sequence of
images and determining a camera's spatial pose (position and orientation) at each frame. The camera's position and orientation, along with its internal parameters,
provide the essential information needed to create augmented realities.
Tracking, or camera pose determination, is a primary technical challenge of AR, and
therefore the subject of a large body of AR research and development work (Neumann
and You, 1999; Azuma et al., 2001; Shibata et al., 2010; Uchiyama and Marchand,
2012). Visual tracking is a popular approach to AR pose determination. In its simplest
form, the environment is prepared with artificial markers that can be easily detected and
tracked (Cho and Neumann, 2001; Ababsa and Mallem, 2004; Claus and Fitzgibbon,
2004; Mooser et al., 2006). The marker-based approach, however, is often impractical for use in wide-area environments. A more practical approach is to track naturally occurring elements of the environment (Platonov et al., 2006; Bleser et al., 2006; Comport et al., 2006; Wagner et al., 2008; Hsiao et al., 2010; Uchiyama et al., 2011; Guan et al., 2012), possibly in combination with artificial markers (Jiang et al., 2000).
The advantage of visual tracking for AR, with either artificial markers or natural features, is the ability to capture images, identify visual content, estimate pose, and manage AR display all in one computing device with a camera and display.
This chapter focuses on the problem of robust visual tracking for AR in natural environments. Emphasis is placed on a novel tracking technique that combines
feature matching, visual recognition, and camera pose tracking. A tracking via recognition strategy is presented that performs simultaneous tracking and recognition
within a unified framework. Rather than functioning as a separate initialization
step, recognition is an ongoing process that both aids and benefits from tracking.
Experiments show the advantages of this unified approach in comparison to traditional approaches that treat recognition and tracking as separate processes.
Within the same unified framework, two tracking systems are presented. The first
system employs robust structure from motion (SFM) to build a sparse point cloud
model of the target environment for visual recognition and pose tracking, while the
second system employs dense 3D point cloud data captured with active sensors such as LiDAR (Light Detection and Ranging) scanners. In both cases, by determining a sufficiently large set of 2D-3D feature correspondences between an image and
the model database, an accurate camera pose can be computed and used to render
virtual elements for AR display.

7.2 FRAMEWORK FOR SIMULTANEOUS TRACKING AND RECOGNITION
Object recognition and tracking are closely linked problems in computer vision. Both
frequently employ similar low-level tasks such as feature selection, matching, and
model fitting. Moreover, the two often play complementary roles within a larger system, with object recognition used to initialize or re-initialize tracking, and tracking
supporting the recognition of a target's identity (Mooser et al., 2008; Guan et al., 2012).

FIGURE 7.1 Unified framework for simultaneous visual tracking and recognition.

Figure 7.1 illustrates the developed unified framework for simultaneous visual
recognition and tracking. At a high level, there are two stages. The first, an offline
stage, defines a set of visual features and associated virtual content. The second is an
online stage that processes an input video stream, recognizes visual features, estimates camera pose, and renders final AR output for each frame. These two stages are
connected through a feature database and a virtual object database.

7.2.1 Offline Stage


The offline stage is responsible for defining a set of visible landmarks for visual recognition and tracking. The landmarks can consist of artificial markers, natural features, or
both. In any case, a landmark definition contains both appearance data and geometry
data. Appearance data is used to recognize the features from a video input; geometric
data is used to compute camera pose. Keypoints are extracted from the landmarks and
represented using local descriptors that serve as feature definitions for recognition and
tracking. The output of the feature definition component is stored in the feature database. The process of building this database represents a primary function of the offline stage.
Many methods can generate the features and 3D point cloud models needed for
the feature database. Computer vision methods using stereo, depth cameras, and
SFM can produce accurate but sparse point cloud models of the target scenes, and
are often favored for smaller controllable workspaces. An alternative approach is
to utilize active laser scanners such as LiDAR systems, to quickly acquire dense
3D point cloud models in wide-area environments (Feiner et al., 2011; Pylvänäinen
et al., 2012; Guan et al., 2013). New methods for constructing a feature database
using both approaches are described in Section 7.3.
All virtual content is also defined in the offline stage, and stored in the annotation
database. This data typically consist of 2D or 3D models with rendering attributes
that may include color, texture, animations, and interactive behavior. Virtual annotations are often associated with a set of features in the feature database and will


only be rendered when those features are recognized and visible. The construction
and use of the annotation database is outside of this chapter's scope; however, its
relationship to the feature database is made clear by inclusion in the offline stage.

7.2.2 Online Stage


The online stage processes video input to control the appearance of rendered AR output.
A matching component attempts to match visual features detected in each input frame
to the feature database. This component thus serves in the key role of target recognition.
The camera pose component is responsible for tracking. It utilizes the feature
database geometric data to achieve pose calculation and tracking. Once a sufficient
number of features are recognized, their geometry data are used to estimate the
position and orientation of the camera. There may be frames with too few matches
to compute a pose. Rather than simply failing at these frames, keypoints that fail to
directly match to the feature database are employed to determine camera pose using
a technique called incremental matching, described in Section 7.2.3.
The final component in the online stage retrieves the virtual objects associated
with recognized targets and, using the current camera pose, renders them into the
input image. This rendering produces the final output of AR applications.

7.2.3 Simultaneous Tracking and Recognition


A key goal of the tracking via recognition strategy is to tightly integrate recognition
and tracking into a unified and symbiotic system. First, the extracted visual features
and descriptors are used to incrementally recognize objects of interest and estimate
camera pose. Then, all unmatched features are back-projected using the computed
pose, generating an additional set of correspondences across frames to maintain a
stable camera pose through movements. Rather than functioning solely as a separate
initialization step, recognition is an ongoing process that both aids and benefits from
tracking. The strategy exploits two key insights:
If keypoints are reliably tracked from one frame to the next, then matching
resulting from multiple frames can produce more confident object identification and more robust pose estimation.
With sufficient keypoint matches to compute an object's pose, one can infer
the locations of all other keypoint matches and use those points to aid the
tracking process.
These two insights lead to a process called incremental keypoint matching. It begins
with the feature database of learned keypoints and their descriptors. At each frame,
a small subset of the large collection of tracked keypoints is selected for matching
against the database, thus bounding the CPU processing time used for matching in a
single frame. If an object is visible in the scene, the system can positively identify it
and compute camera pose within a few frames. Once an object has been recognized,
its tracked keypoints are continually matched against the database. Keypoints that
cannot be matched against the database are matched to prior-frame keypoints by
back-projection with the current pose estimate. Keypoints are tested to determine


whether the current pose estimate implies that they lie on the surface of the database
object. If it does, its location in object space coordinates can be back-projected to generate a match. Keypoints that do not belong to the object, perhaps because they lie on
an occluding object, will not fit the pose estimates in future frames and these points
are discarded. In the event the system loses track of some of the originally matched
keypoints, it still tracks the object correctly using the newer matches. Thus, estimated
camera pose is used to compute the positions for new keypoints that have never been
matched to the database. Tracking these new points along with database-matched
keypoints helps maintain accurate pose tracking through subsequent frames.
From this point forward, keypoint matches to the database are called database
matches, and those keypoint matches that are derived from the incremental back-projection algorithm are called projection matches.
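The control flow behind this strategy can be sketched with a purely illustrative, self-contained toy (synthetic data, no real geometry): only a small budget of keypoints is matched against the database per frame, database matches accumulate across frames, and once a pose is available the remaining keypoints consistent with it become projection matches. None of the thresholds or data below come from the chapter.

import random

DB = set(range(0, 400, 4))                 # ids of keypoints stored in the feature database (toy)
keypoints = list(range(400))               # keypoints tracked through the sequence (toy)
matched, projected = set(), set()
MIN_MATCHES, BUDGET = 20, 25

for frame in range(12):
    unmatched = [k for k in keypoints if k not in matched | projected]
    for k in random.sample(unmatched, min(BUDGET, len(unmatched))):   # bounded per-frame matching cost
        if k in DB:                                                   # database match
            matched.add(k)
    if len(matched) >= MIN_MATCHES:                                   # enough matches to fit a pose
        # with a pose available, unmatched keypoints consistent with it become projection matches
        projected.update(k for k in unmatched if k % 2 == 0 and k not in matched)
    print(f"frame {frame}: {len(matched)} database matches, {len(projected)} projection matches")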
Figure 7.2 illustrates the principle of incremental keypoint matching for simultaneous tracking and recognition. The scene contains a book cover viewed by a moving camera. A feature database containing the book cover model and features was
prepared offline. In all frames, the detected dots on the book cover are the keypoints
used for object identification and pose tracking; bold lines indicate database matches;
and thin lines are projection matches. An early frame, shown in Figure 7.2a, finds
a few database matches that are used to estimate an initial camera pose. The pose
estimate is used to draw the white rectangle model of the book. As the sequence
continues, Figure 7.2b shows numerous keypoints matched by incremental back-projection. These are used to produce stable pose estimates. Figure 7.2c shows a frame with substantial object occlusion. Only two of the original database matches remain; however, the projection matches ensure that tracking remains accurate and robust.


FIGURE 7.2 Incremental keypoint matching for simultaneous tracking and recognition.
(a) Initial camera pose is estimated from the database matches. (b) Camera pose is refined
incrementally with back-projection matches. (c) Stable camera pose produced by combining
the database matches and projection matches.


FIGURE 7.3 Illustration of hypothetical keypoint tracks for six camera locations. Each keypoint projects along a ray in space. The six rays do not meet at a single point, so there is
no structure point valid for the entire track. However, subtracks k1,2,3 and k4,5,6 do have valid
structure points. The goal of the optimization algorithm is to reliably perform this partitioning.

7.3 CAMERA POSE TRACKING WITH ROBUST SFM


This section describes a tracking system for markerless AR based on the tracking
via recognition framework. As illustrated in Figure 7.1, the system includes two
main components. The first component, the offline stage, takes a video stream as
input and builds a sparse point cloud model of the target environment using robust
SFM. Keypoints are extracted from the point clouds and represented as a set of local
descriptors. Its output is a database of recognizable keypoints along with 3D descriptions of accompanying virtual objects. The second part, the online stage, processes
input frames sequentially to recognize the previously stored keypoints, recover camera pose, and render associated virtual content.
Both components depend on the ability to simultaneously determine scene structure and camera pose from input images. In principle, the problem is straightforward
given a sufficient number of feature correspondences between frames. In practice,
however, inaccuracies among the feature correspondences make the task far more
challenging. To tackle this problem, an algorithm called subtrack optimization
(Mooser, 2009a) uses dynamic programming to detect incorrect correspondences
and remove them from the output. Experiments show that subtrack optimization
algorithm produces a keypoint database that is both larger and more accurate than
could be achieved using conventional SFM techniques.

7.3.1 Structure from Motion Using Subtrack Optimization


In its general form, SFM can be applied to any set of two or more images. In our
case, we focus solely on images captured as a video sequence from a single camera.
Ordered sequences simplify the problem because consecutive frames are similar and
easier to match with motion estimation approaches such as optical flow. However,
SFM from video presents its own unique challenge because long sequences accumulate small errors that cause drift in the computed camera pose.
Given an ordered sequence of captured images, keypoints can be extracted from
the first frame and tracked using a standard optical flow technique. A variant of
Lucas-Kanade optical flow based on image pyramids is employed. The optical flow


process generates a set of keypoint tracks, each consisting of a keypoint correspondence spanning two or more consecutive frames. Each track continues to grow until
optical flow fails or the keypoint drifts out of view. In practice, tracks can span only
a few frames or several hundred.
Ideally, all keypoints in a given track correspond to the same 3D point in space, in
which case the keypoint track is deemed consistent. Over a long sequence, however,
this seldom holds true. Keypoint tracks are often stable for a few frames, then drift,
and then become stable again.
A simple solution identifies keypoint tracks that do not fit to a single structure
point and remove them from the computation. Traditional outlier detection schemes
such as RANSAC (Random Sample Consensus) may be used to this end. However,
simply labeling entire tracks as an inlier or outlier ultimately discards useful data.
Along keypoint track is generally stable over some portion of its lifetime and a more
powerful approach will identify those sections and use them.
Identifying the sets of frames during which a keypoint track remains stable is
nontrivial. Simply splitting the track into fixed sized partitions, for example, would
only partially address the problem. Partitions, no matter how large or small, may be
individually inconsistent. Moreover, if consecutive partitions are consistent, it would
be preferable to consider them as a whole.
The motivation behind the subtrack optimization algorithm is to solve this partitioning problem optimally. It sets out to identify the longest possible subtracks that
can be deemed consistent. Favoring fewer, longer subtracks is important because it
ensures that they span as wide a baseline as possible. A method that is overly aggressive in partitioning a keypoint track will lose valuable information, and the accuracy
of the resulting structure will suffer accordingly.
This idea is illustrated in Figure 7.3. As a hypothetical camera moves from top to bottom, a keypoint is tracked along a ray in space at each frame. Because those rays do not
meet at a common structure point, the six-frame track is, by definition, inconsistent. The subtracks spanning frames 1-3 and frames 4-6 are consistent, however, and thus represent an optimal partitioning, with both subtracks usable for pose computation.
Each subtrack corresponds to a single structure point, with its consistency determined by the average reprojection error. For keypoint track k, let kj and Pj be the keypoint and camera, respectively, at frame j, and let ka,b be the subtrack spanning frames a to b inclusive. The consistency of ka,b is given by the error function
E(ka,b) = min_X (1/N) Σ_{j=a}^{b} d(Pj X, kj)    (7.1)

where
d is Euclidean distance in pixels
N is the length of the subtrack
The argument, X, that minimizes the equation is the structure point corresponding
to ka,b.


In general, the optimal partitioning p of keypoint track k is defined in terms of a cost function

C(p) = Σ_{ka,b ∈ p} (λ + E(ka,b))    (7.2)

where λ is a constant penalty term ensuring that the optimization favors longer subtracks whenever possible.
The number of possible partitions is exponential in the length of k, so a brute
force search would be intractable. As it turns out, however, given an estimate of the
camera pose at each frame, a partitioning suited to our needs can be found in low
order polynomial time.
The key idea is to define the cost function recursively as

C(p0) = 0
C(p1) = 0
C(pn) = min_{1 ≤ a ≤ n} [C(pa−1) + λ + E(a, n)]    (7.3)

where pn is the optimal partitioning of a track up to frame n only. This recursion


can be computed efficiently from the bottom up using dynamic programming. See
Mooser (2009) for a formal proof of its correctness and analysis of its run time.
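A compact sketch of this dynamic program is shown below, under simplifying assumptions; subtrack_error(a, n) stands in for E(ka,n), which in the real system is obtained by triangulating a structure point and measuring reprojection error.

def optimal_partition(track_len, subtrack_error, penalty=1.0):
    # best_cost[n] corresponds to C(pn) in Equation 7.3; cut[n] remembers the start
    # frame of the last subtrack of the optimal partition up to frame n.
    best_cost = [0.0] * (track_len + 1)
    cut = [0] * (track_len + 1)
    for n in range(2, track_len + 1):
        candidates = [(best_cost[a - 1] + penalty + subtrack_error(a, n), a)
                      for a in range(1, n + 1)]
        best_cost[n], cut[n] = min(candidates)
    # Backtrack to recover the subtracks (a, b) of the optimal partition.
    parts, b = [], track_len
    while b >= 1:
        a = cut[b] if b >= 2 else 1
        parts.append((a, b))
        b = a - 1
    return list(reversed(parts)), best_cost[track_len]

# Toy error function mimicking Figure 7.3: only (1,3) and (4,6) are consistent subtracks.
parts, cost = optimal_partition(6, lambda a, b: 0.2 if (a, b) in ((1, 3), (4, 6)) else 5.0)
# parts == [(1, 3), (4, 6)]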
Although the final partition is optimal in that it minimizes Equation 7.2, it is not
necessarily the case that all subtracks are consistent enough for pose calculation.
Recall that finding long consistent subtracks is an important goal. After optimizing
each keypoint track, those subtracks spanning at least three frames and having
E(ka,b) < 1.0 are deemed consistent; all others are deemed inconsistent. Only the
structure points corresponding to consistent subtracks are included in the final structure. This subtrack optimization process forms the basis for building the keypoint
database during the offline training stage. It also serves an important role in the
online stage as described in the following section.

7.3.2 System Components


This section details the main components of the architecture shown in Figure 7.1 for
the case where the tracking system employs SFM with subtrack optimization.
7.3.2.1 Building the Point Cloud
The keypoint database consists of a cloud of 3D points, each associated with several
descriptor vectors. This data can be created by tracking keypoints through a video
sequence acquired ahead of time. The subtrack optimization algorithm introduced
earlier can determine reliable feature locations and incrementally build a complete
3D point cloud of the scene.
In order to partition a keypoint track, the subtrack optimization algorithm
requires an estimated camera pose at each frame. Initially, no poses are known,
so some method is needed to bootstrap the process. One way to achieve this is to
select features tracked from the first frame to another early frame n and use them


to fit a fundamental matrix. Using known internal parameters, this defines a rigid
transformation relating the two cameras.
For every keypoint tracked from frame 1 to frame n, an initial structure point
Xk is computed by linear triangulation. The initial structure points are then used to
compute poses for the intervening frames, 2 through (n − 1). Each camera pose is a rigid transformation having six degrees of freedom, which is fit using Levenberg-Marquardt optimization.
The resulting short sequence of camera poses, typically 8-10 in our implementation, provides enough information to apply the subtrack optimization algorithm to
each tracked keypoint. This partitions each track into one or more subtracks and
determines which subtracks are inliers. For each inlier subtrack, a 3D point is computed to form an initial point cloud.
Subsequent frames are processed with optical flow applied to each keypoint to
extend the corresponding track. For each new frame, the system initially assumes
that all consistent subtracks in the prior frame are still consistent for the current
frame. Since those subtracks have known 3D structure points, they are used to estimate the initial camera pose of the new frame.
With an initial camera pose for a new frame, subtrack optimization is performed
taking that new pose estimate into account. Each keypoint track is repartitioned into
subtracks, with a new 3D structure point assigned to each new subtrack. So each frame
estimates one camera pose followed by one subtrack optimization. This continues for
the entire input sequence to produce a final 3D reconstruction and feature database.
Figure 7.4 shows a resulting point cloud model for a building sequence, rotated to
provide an overhead view. The structure of the hedges is clearly visible, along with

FIGURE 7.4 Point cloud model of a building along with camera poses produced with the
proposed robust SFM approach.


some keypoints arising from the brick walls and the lawn. The path of the camera in
front of the building is computed and rendered on the left.
7.3.2.2 Extracting Keypoint Descriptors
The point cloud, by itself, contains only geometric information, namely a 3D location for each structure point. To complete the keypoint database, each point must be
associated with a visual descriptor that can be matched during the online stage.
The descriptor uses the Walsh-Hadamard (WH) kernel projection to describe a keypoint, which is highly compact and discriminative. The WH descriptor takes rectangular image patches (typically 32 × 32 pixels) as input and uses the WH kernel transformation to reduce the patches to low-dimensional vectors, with lower dimensions encoding low-frequency information and higher dimensions encoding high-frequency information. Given an image patch p and the ith WH kernel ui, the ith element of the kernel projection is given by the inner product ui^T p. Experimental trials revealed that 20 dimensions are sufficient to retain the most characteristic features of each patch and provide reliable matching results. The first WH kernel simply computes a sum of the patch's intensity values, which contains no discriminative information after normalization. The first kernel is thus discarded, so the 20-dimensional descriptor vector comprises WH kernels u2 through u21.
Given the 20-dimensional descriptor for each keypoint, a Euclidean nearest
neighbor search finds its match in a database of learned keypoints. A distance ratio
filter is applied, that is, a match is accepted only if the distance ratio between its
nearest and second nearest matches is less than a predefined threshold, as detailed
in the following section.
Tracking hundreds of keypoints over hundreds of frames, the resulting database can grow quite large. However, the matching process maintains efficiency by
employing an approximate nearest neighbor search that is, at worst, logarithmic in
the size of the database, as described in the following section.
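As a rough illustration (not the exact kernels used in the chapter), a WH-style descriptor can be computed as follows; SciPy's natural-ordered Hadamard matrix is used here as a stand-in for the frequency-ordered 2D WH kernels described above.

import numpy as np
from scipy.linalg import hadamard

H = hadamard(1024).astype(np.float32)          # kernels for a flattened 32x32 patch

def wh_descriptor(patch, dims=20):
    # project a 32x32 grayscale patch onto the kernels that follow the DC kernel
    v = patch.astype(np.float32).ravel()
    v = (v - v.mean()) / (v.std() + 1e-8)      # normalize so the DC term carries no information
    return H[1:dims + 1] @ v                   # inner products u_i^T p

desc = wh_descriptor(np.random.randint(0, 256, (32, 32)))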
7.3.2.3 Incremental Keypoint Matching
Each frame of the input sequence contains a set of keypoints. As in the offline
stage, these keypoints are initially generated by a detector and tracked using optical flow. Initially, only the 2D image locations of
these keypoints are known. A descriptor is extracted at each keypoint using the
20-dimensional kernel projections, and matched against the database using a
Euclidean nearest neighbor search. To improve speed performance, the approximate nearest neighbor search algorithm called Best-Bin-First (BBF) (Lowe, 2004)
is employed, which returns the exact nearest neighbor with high probability while
only requiring time logarithmic in the database size. Using the BBF algorithm, the
nearest and second nearest matches are retrieved and their matching distances are
computed. If both of these matches are associated with the same structure point,
their matching distance ratio should be smaller than a predefined threshold and the
match is accepted.
Even with fast approximate nearest-neighbor searches, the matching process is
too time-consuming when applied to all keypoints in an image (typically 300-500).


In order to limit this computational cost in any frame, we adopt the incremental keypoint matching approach described in Section 7.2.3. Incremental keypoint matching
only attempts to match a limited number of keypoints in each frame. The selected
set varies each frame so the set of successful matches gradually accumulates. While
a few matches per frame are seldom sufficient to estimate a robust pose, successful
matches are tracked from frame to frame. By selecting new keypoints for matching
and tracking existing matches, the number of tracked matches accumulates. In our
experiments, about 10 frames are sufficient for accumulating enough matches to
recover a robust camera pose. As the sequence continues, additional matched points
may be computed at each frame.
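A simplified sketch of the database lookup with the distance-ratio filter is shown below; the chapter uses a Best-Bin-First approximate search, while SciPy's cKDTree serves here as a stand-in, and the database descriptors are randomly generated for the example.

import numpy as np
from scipy.spatial import cKDTree

db_descriptors = np.random.rand(5000, 20).astype(np.float32)   # hypothetical 20-D feature database
tree = cKDTree(db_descriptors)

def match(descriptor, ratio=0.7):
    dists, idx = tree.query(descriptor, k=2)      # nearest and second-nearest neighbors
    if dists[0] < ratio * dists[1]:               # accept only unambiguous matches
        return int(idx[0])
    return None

hit = match(db_descriptors[42] + 1e-3 * np.random.randn(20).astype(np.float32))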
7.3.2.4 Camera Pose Estimation
Each successful database match produces a correspondence between a 2D image
point and 3D structure point. RANSAC is used to fit a camera pose. Outliers are
removed, as these 3D points are deemed incorrect.
Keypoints are tracked through the sequence to enable pose recovery in subsequent frames. Some of these points are lost as they drift out of view or due to track
failures. At the same time, however, incremental matching adds new matches at each
frame, trying to maintain a match set large enough to compute a reliable pose.
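As a compact illustration of this step, the following sketch fits a camera pose from 2D-3D matches with RANSAC, using OpenCV's solvePnPRansac as a stand-in for the chapter's pose estimator; the correspondences and intrinsics are synthetic.

import numpy as np
import cv2

object_pts = np.random.rand(50, 3).astype(np.float32)            # 3D structure points (hypothetical)
K = np.array([[800, 0, 320], [0, 800, 240], [0, 0, 1]], np.float32)
rvec_true = np.array([[0.1], [0.2], [0.0]], np.float32)
tvec_true = np.array([[0.0], [0.0], [5.0]], np.float32)
image_pts, _ = cv2.projectPoints(object_pts, rvec_true, tvec_true, K, None)

ok, rvec, tvec, inliers = cv2.solvePnPRansac(object_pts, image_pts, K, None)
# Correspondences that do not fit the pose (mis-tracked or mismatched keypoints) are
# excluded via the returned `inliers` index array.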
7.3.2.5 Incorporating Unmatched Keypoints
The process described thus far only uses points whose 3D locations are known in
advance and stored in the feature database. Most of the keypoints in the input image
are never successfully matched against the database. Incorporating these points into
the pose estimation process can significantly improve the quality of the final results.
An unmatched keypoint in a single frame represents a ray in 3D space, and thus
can be associated with any point along that ray. If that same keypoint is tracked over
several frames, each with a known pose, its location in 3D space can be estimated.
In fact, such estimates are precisely the input to the subtrack optimization algorithm
used during the offline stage. Thus, once a pose is recovered for at least 10 consecutive frames, the subtrack optimization algorithm is applied to all keypoint tracks that
are not matched to the database. The partitioned keypoints now have associated 3D
locations and can be used in the pose estimation of future frames. This results in two distinct sets of matches for computing pose: database matches and projection matches.
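The 3D location of a tracked but unmatched keypoint can be estimated by linear triangulation from the frames in which it was observed, once those frames have known poses. The sketch below shows a standard direct linear transformation (DLT) triangulation; it is a simplified stand-in for the subtrack optimization described for the offline stage, and the function name and inputs are hypothetical.

import numpy as np

def triangulate_track(observations, poses, K):
    # observations: list of (x, y) pixel positions of one keypoint across frames
    # poses:        list of 3x4 [R | t] matrices (world-to-camera) for those frames
    # K:            3x3 camera intrinsics
    A = []
    for (x, y), P in zip(observations, poses):
        M = K @ P                                 # 3x4 projection matrix for this frame
        A.append(x * M[2] - M[0])                 # each view gives two linear constraints
        A.append(y * M[2] - M[1])
    _, _, Vt = np.linalg.svd(np.asarray(A))
    X = Vt[-1]                                    # homogeneous solution of A X = 0
    return X[:3] / X[3]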
The advantages of including projection matches in the pose estimation process
are twofold. First, there are times that the camera views an area that was not included
in the model building phase, in which case database matches alone are insufficient
to recover pose. In such cases, pose is computed from projection matches since they
persist without database matches.
The second advantage is that even when many database matches are found, additional matches result in a smoother and more reliable pose. Typically, there are many
times more projection matches than database matches. Residual errors exist in all
keypoint position estimates. A larger number of correspondences makes the final
least-squares fit more reliable and smoothly varying.


7.3.2.6 Experiments
Experiments demonstrate various behaviors of the online and offline stages of the
tracking system. The primary focus of all of these tests is to show that the robust
SFM tracking approach makes a substantial, measurable difference in the end results
(Mooser, 2009). Both stages were thus compared with and without subtrack optimization. In all test cases, one video sequence was captured for the offline stage and a
separate, longer video was used for the online stage.
Three test results are shown to demonstrate the system's performance with different scenes. The first test, the Fuse Box sequence in Figure 7.5a, shows the exterior of
an electrical fuse box in an industrial environment. The scene contains a mixture of
planar and nonplanar surfaces. The second test, the A/C Motor sequence in Figure
7.5b, targets an irregularly shaped object. Although the ground surrounding the
motor is flat, it is mostly covered in gravel, and does not contain many easily identified features. The final case, the building sequence in Figure 7.5c, shows an outer


FIGURE 7.5 Sample tracking and augmentation scenes for all three test cases: (a) the fuse box is tracked from a variety of orientations, not all of which were covered in the training process; (b) the A/C motor is tracked through a nearly 180° rotation, while the keypoint dataset was built from only one side of the motor; and (c) the building scene contains both natural and man-made objects, making the camera tracking extremely challenging.


building scene comprising both natural and man-made objects. The annotations are
virtual labels showing the way to nearby points of interest.
Table 7.1 shows the RMS reprojection error of all database keypoints produced
in the offline stages. Without subtrack optimization, keypoint tracks are never partitioned; tracks are simply terminated when their error exceeds a threshold. Every
track thus corresponds to a single structure point. If the track drifts significantly
over the whole sequence, the structure point is poorly defined and produces a large
reprojection error, as reflected in the results. In all tests, the average optimized error
is lower than in the unoptimized cases.
Moreover, when running the offline stage with no optimization, the keypoint
database has far fewer total keypoints. The reason for this is that a keypoint track
that drifts significantly cannot be fit to any single structure point. Without subtrack optimization, such a track does not contribute any points to the database.
Subtrack optimization, however, may find multiple subtracks that each have valid
structure points, all of which can go into the database.
Table 7.2 shows the results of recognizing and tracking in the online stage.
Without subtrack optimization, pose refinement relies only on database matches and
no projection matches are used. As in the offline stage, pose accuracy is measured
by average reprojection error of all keypoints in all frames.
While there is significant error reduction with optimization, the absolute error is not as
low as in the offline stage. This is largely due to residual errors in the keypoint database.
TABLE 7.1
Offline Stage Error Measurement

Sequence                     Fuse Box    AC Motor    Buildings
No optimization
  Average subtrack length    15.48       24.97       20.89
  Reprojected error          1.17        0.45        0.85
With optimization
  Average subtrack length    21.36       26.23       25.28
  Reprojected error          0.64        0.37        0.54
TABLE 7.2
Online Stage Error Measurement

Sequence              Fuse Box    AC Motor                          Buildings
Length (frames)       600         444                               350
No optimization
  Average inliers     50.38       46.11                             68.79
  Reprojected error   4.43        2.08 (failed after 269 frames)    4.49 (failed after 206 frames)
With optimization
  Average inliers     438.45      300.61                            438.41
  Reprojected error   1.91        0.56                              2.06

Since the set of keypoints detected during the online stage is a subset of those found when building the model, even the best matches can have an error of a few pixels. Table 7.2 also compares the average number of inlier keypoints available with and without subtrack optimization. It shows that the optimization step greatly increases the total number of projection matches available for use in pose calculation. This larger number of observations for estimating pose makes the computation more reliable and robust to individual errors, significantly improving the quality of the final results.
Figure 7.5 shows sample pose tracking and augmentation results for all three test
scenes. Camera poses are accurately estimated, so the virtual objects are well aligned with the real scenes. Note that all the tests involved moving the online camera to areas outside those viewed during the offline stage. During those movements,
the system is unable to generate database matches. Without the use of projection
matches, camera pose and tracking fail immediately. Using projection matches, however, the system is able to continue tracking, although errors will accumulate in the
absence of any visible database features. As database features become visible again,
the accumulated tracking errors are corrected.

7.4 CAMERA POSE TRACKING WITH LiDAR POINT CLOUDS


The SFM-based system described in Section 7.3 can automatically produce accurate, but relatively sparse, 3D point cloud models of varied scenes for use in visual tracking. The ability to produce a larger set of keypoints is a significant advantage: more keypoints lead to a better chance of recognizing the objects and computing an accurate camera pose during the online phase, especially in cluttered and occluded natural environments.
This section describes a tracking system that utilizes active LiDAR to produce
dense point cloud models for object recognition and camera pose tracking. The
system accepts an input of video images and an offline-acquired 3D point cloud
scene model, where each point has both position and color. These are registered
in a common coordinate system, distinguishable objects are identified to establish
correspondences between the images and model, and camera pose is estimated for
rendering virtual content. This system shares many common components with the
SFM-based system, but significant distinctions exist in keypoint selection, correspondence matching, and pose recovery. The system consists of two major steps, automatic pose initialization and iterative pose refinement, which are detailed in the following sections.
Figure 7.6 shows a portion of a 3D colored point cloud model (downtown Los Angeles) captured by a ground-based LiDAR system. The point cloud data represent the scene's appearance and geometry. As in Section 7.3, appearance data is used to
recognize features; geometric data is used to compute camera pose.

7.4.1 Automatic Estimation of Initial Camera Pose


Initial camera pose is automatically estimated in a two-step process. First, colored
3D point clouds are projected into 2D images from predefined virtual viewpoints to
form a set of synthetic view images. Keypoints are detected in the projected images


FIGURE 7.6 Dense 3D point cloud model (a) captured by a ground LiDAR system and a
camera image (b) of a corner in downtown Los Angeles.

and their visual descriptors are computed. These keypoints are then projected back
onto the point cloud data to obtain their 3D positions. In the second step, the camera pose for an input video frame is estimated from correspondences between the image keypoints and the back-projected keypoints.
7.4.1.1 Generate Synthetic Images of Point Clouds
A set of virtual viewpoints is arranged to face the major plane surfaces of the point
cloud model. These views uniformly sample viewing directions and logarithmically
sample viewing distance, as shown in Figure 7.7a. Experiments show that six viewing
directions and three viewing distances are sufficient for facade scenes with one major
plane surface.
Once viewpoints are defined, the 3D point models are rendered onto each 2D
image plane using ray casting and Z-buffers to handle occlusions. Figure 7.7b shows
examples of synthetic images generated from the Los Angeles point cloud dataset
shown in Figure 7.6.
7.4.1.2 Extract Keypoint Features
Keypoint features and their associated SIFT (Scale Invariant Feature Transform) visual
descriptors are extracted in each synthetic view image. The extracted features are reprojected onto the 3D point clouds by finding intersections with the first plane that is obtained
through a plane segmentation method (Stamos and Allen, 2002). It is possible that the
same feature is reprojected to different 3D coordinates from different synthetic views.
Nearby feature points are filtered so that proximate points with similar descriptors are
merged into one feature. The final output is a set of 3D keypoint features with associated
visual descriptors saved in the feature database for online matching and pose estimation.
7.4.1.3 Estimate Camera Pose
Given an input image, its keypoint features are extracted and matched against the 3D
keypoints in the feature database. The surface normal of each matched feature is computed and used to cluster the features. A modified RANSAC method is employed to estimate the camera pose and remove outliers. Rather than maximizing the number of inliers that are in consensus with the pose hypothesis, the following modification is made.


FIGURE 7.7 (a) Virtual viewpoint arrangement and (b) synthetic images produced from a
3D point cloud.

Inliers are clustered according to their normal directions, so that inliers with similar normal directions are grouped into the same cluster. Let N1 and N2 be the numbers of inliers in the two largest clusters. Among all the hypothesized poses, the one maximizing N2 is selected:

[R \mid T] = \arg\max_{[R \mid T]} N_2 \qquad (7.4)

This promotes solutions whose inliers lie in different planes, and avoids the condition where all features lie in a single plane, which makes the calculated pose unstable and sensitive to position errors.
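The sketch below illustrates the scoring rule of Equation 7.4: each pose hypothesis is scored by the size of its second-largest cluster of inlier normal directions, so hypotheses supported by a single plane score zero. The quantization used for clustering and its bin size are assumptions for illustration, not the chapter's exact clustering method.

import numpy as np

def hypothesis_score(inlier_normals):
    # inlier_normals: (M, 3) surface normals of the inliers for one pose hypothesis
    if len(inlier_normals) == 0:
        return 0
    n = np.asarray(inlier_normals, dtype=float)
    n /= np.linalg.norm(n, axis=1, keepdims=True)
    # crude clustering: quantize each unit normal onto a coarse grid (~14 degree bins)
    keys = [tuple(np.round(v * 4).astype(int)) for v in n]
    counts = {}
    for k in keys:
        counts[k] = counts.get(k, 0) + 1
    sizes = sorted(counts.values(), reverse=True)
    return sizes[1] if len(sizes) > 1 else 0      # N2: zero when all inliers share one plane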

7.4.2 Camera Pose Refinement


The estimated initial pose is iteratively refined by progressively incorporating more
keypoint features into the camera pose estimation. Using the pose estimated in the
previous iteration, additional feature correspondences are generated and used to


generate a new improved pose estimate. In the first iteration, SIFT features are used,
but Harris features are employed in the following iterations to improve processing
speed. Harris features are less distinctive but much faster to compute. For each feature point, a normalized intensity histogram within an 8 × 8 patch is computed as a feature descriptor and used for matching. We search for corresponding points within a neighborhood of H × H pixels. Initially, H is set to 64 pixels, and the search size is halved in each iteration, down to 4 × 4 (16 pixels), as more accurate pose estimates
are obtained.
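A small sketch of the two ingredients just described follows: the normalized intensity histogram used as a cheap descriptor in the refinement iterations, and the shrinking search window. The number of histogram bins is an assumption (the chapter does not state it), the image is assumed to be 8-bit grayscale, and the feature is assumed to lie away from the image border.

import numpy as np
import cv2

def intensity_histogram_descriptor(gray, x, y, patch=8, bins=16):
    # normalized intensity histogram of the 8x8 patch centered at (x, y)
    half = patch // 2
    roi = gray[y - half:y + half, x - half:x + half]
    hist = cv2.calcHist([roi], [0], None, [bins], [0, 256]).ravel()
    return hist / (hist.sum() + 1e-9)

def search_window(iteration, h0=64, h_min=4):
    # H starts at 64 pixels and is halved each iteration, down to 4
    return max(h0 // (2 ** iteration), h_min)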
The pose refinement is accomplished through an optimization process that minimizes an error function of feature descriptors derived from the input image and the
projected images of point clouds.
E = \frac{1}{|\{i\}|} \sum_i \left( s\, I_{3D}(i) - I_{2D}(i) \right)^2 \qquad (7.5)

where I_{3D}(i) and I_{2D}(i) are the descriptors of the ith feature in the projected image and the input image, respectively, and s is a scale factor that compensates for reflectance or lighting effects. Choosing s to minimize Equation 7.5 makes it equivalent to
E = \frac{\sum_i I_{3D}(i)^2 \sum_i I_{2D}(i)^2 - \left( \sum_i I_{3D}(i)\, I_{2D}(i) \right)^2}{\sum_i I_{3D}(i)^2} \qquad (7.6)
The final pose optimization then minimizes this error:

[R \mid T] = \arg\min_{[R \mid T]} E \qquad (7.7)
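The closed-form scale makes the two forms of the error easy to verify numerically. The sketch below stacks the per-feature descriptors into flat vectors, solves for the optimal s, and checks that the residual of Equation 7.5 at that s (dropping the normalization by the number of features, which does not change the minimizer) equals the expression in Equation 7.6. The function name and the flattening are assumptions for illustration.

import numpy as np

def descriptor_error(I3d, I2d):
    # I3d, I2d: stacked descriptor values from the projected and input images
    I3d = np.asarray(I3d, dtype=float).ravel()
    I2d = np.asarray(I2d, dtype=float).ravel()
    s = np.dot(I3d, I2d) / np.dot(I3d, I3d)       # closed-form optimal scale factor
    residual = np.sum((s * I3d - I2d) ** 2)       # Equation 7.5 at the optimal s
    closed_form = (np.dot(I3d, I3d) * np.dot(I2d, I2d)
                   - np.dot(I3d, I2d) ** 2) / np.dot(I3d, I3d)
    assert np.isclose(residual, closed_form)      # matches Equation 7.6 above
    return residual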

7.4.3 Experiments
Experiments evaluate the system's performance with various real data. Reprojection error in the image domain is measured to evaluate the accuracy of camera pose estimation. Figure 7.8 demonstrates the behavior of the iterative pose estimation and refinement. The left column of Figure 7.8 shows the rendered image of the 3D point cloud model from the estimated camera pose. The middle column shows the rendered point cloud image aligned with the camera image. The alignment accuracy depends on the accuracy of the camera pose estimate. A pixel difference image is shown in the right column. Alignment errors are clearly reduced
after each iteration.
The number of iterations needed is usually small. Our experiments show that
measured projection errors often remain constant after three iterations. This indicates that sufficient correspondences have been obtained after three iterations for
computing a stable and accurate camera pose.


FIGURE 7.8 Iterative estimation of camera pose: the left column shows input 3D point
cloud model rendered from the estimated camera pose, the middle column shows the model
image aligned with the camera image, and the right column shows the pixel-difference image
that illustrates the accuracy of pose estimation.

Figure 7.9 shows another example of tracking and augmentation results for a
video image using a 3D point cloud model. An initial camera pose is automatically
obtained from keypoint matches. After three iterations of pose refinement, an accurate camera pose is estimated, so the virtual models are well aligned with the real scene.


FIGURE 7.9 A video image (a) and a 3D point cloud model (b) rendered with an initial
camera pose. Final pose is estimated after three iterations (c), allowing accurate alignment of
the 3D model with the image (d).

7.5 SUMMARY AND CONCLUSIONS


This chapter describes methods for visual tracking to support augmented reality in natural environments. These approaches are based on a tracking-by-recognition strategy in which tracking and recognition are performed simultaneously. Integrated feature
matching, visual recognition, and pose estimation provide the robust motion and
camera pose tracking needed for natural settings. Scenes with arbitrary geometry
can be captured, recognized, tracked, and augmented. Within the same framework,
two tracking systems are presented. The first employs robust SFM to build a sparse
point cloud model of the target environment. The second system addresses the use of
dense point cloud data captured with a laser scanner. Experiments demonstrate the
advantages of the integrated techniques in comparison to traditional approaches that
treat recognition and tracking as separate processes.

ACKNOWLEDGMENT
Much of this work is the PhD research of members of the Computer Graphics
and Immersive Technology (CGIT) lab at the University of Southern California.
In particular, we relied on the work of Dr. Wei Guan, Dr. Jonathan Mooser, and


Dr. Quan Wang. We are also grateful to the current and former project sponsors,
including the U.S. Army Research Office (ARO), the Office of Naval Research
(ONR), the National Geospatial-Intelligence Agency (NGA), DARPA, NASA,
Nokia, Airbus, and Korean Air Corp.

REFERENCES
Ababsa, F. and Mallem, M., Robust camera pose estimation using 2D fiducials tracking for real-time augmented reality systems. Proceedings of the 2004 ACM SIGGRAPH International Conference on Virtual Reality Continuum and Its Applications in Industry, Singapore, June 16–18, 2004, pp. 431–435.
Azuma, R., Baillot, Y., Behringer, R., Feiner, S., Julier, S., and MacIntyre, B., Recent advances in augmented reality. IEEE Computer Graphics and Applications, 21(6): 34–47, November/December 2001.
Bleser, G., Wuest, H., and Stricker, D., Online camera pose estimation in partially known and dynamic scenes. International Symposium on Mixed and Augmented Reality, Santa Barbara, CA, October 22–25, 2006, pp. 56–65.
Cho, Y. and Neumann, U., Multi-ring fiducial systems for scalable fiducial-tracking augmented reality. PRESENCE: Teleoperators and Virtual Environments, 10(6): 599–612, December 2001.
Claus, D. and Fitzgibbon, A. W., Reliable fiducial detection in natural scenes. Proceedings of the European Conference on Computer Vision, Prague, May 11–14, 2004, pp. 469–480.
Comport, A. I., Marchand, E., Pressigout, M., and Chaumette, F., Real-time markerless tracking for augmented reality: The virtual visual servoing framework. IEEE Transactions on Visualization and Computer Graphics, 12(4): 615–628, July 2006.
Feiner, S., Korah, T., Murphy, D., Parameswaran, V., Stroila, M., and White, S., Enabling large-scale outdoor mixed reality and augmented reality. International Symposium on Mixed and Augmented Reality, Basel, Switzerland, October 26–29, 2011, p. 1.
Guan, W., You, S., and Neumann, U., Efficient matchings and mobile augmented reality. ACM Transactions on Multimedia Computing, Communications and Applications (TOMCCAP), special issue on 3D Mobile Multimedia, 8(47): 1–15, September 2012.
Guan, W., You, S., and Pang, G., Estimation of camera pose with respect to terrestrial LiDAR data. IEEE Workshop on the Applications of Computer Vision (WACV), Tampa, FL, January 15–17, 2013, pp. 391–398.
Hsiao, E., Collet, A., and Hebert, M., Making specific features less discriminative to improve point-based 3D object recognition. IEEE Conference on Computer Vision and Pattern Recognition, San Francisco, CA, June 13–18, 2010, pp. 2653–2660.
Jiang, B., You, S., and Neumann, U., Camera tracking for augmented reality media. IEEE International Conference on Multimedia and Expo, New York, July 30–August 2, 2000, pp. 1637–1640.
Lowe, D. G., Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60: 91–110, 2004.
Mooser, J., You, S., and Neumann, U., Tricodes: A barcode-like fiducial design for augmented reality media. IEEE International Conference on Multimedia and Expo, Toronto, Ontario, Canada, July 9–12, 2006, pp. 1301–1304.
Mooser, J., You, S., and Neumann, U., Fast simultaneous tracking and recognition using incremental keypoint matching. International Symposium on 3D Data Processing, Visualization, and Transmission, Atlanta, GA, June 18–20, 2008.
Mooser, J., You, S., Neumann, U., Grasset, R., and Billinghurst, M., A dynamic programming approach to structure from motion in video. Asian Conference on Computer Vision, Xi'an, China, September 23–27, 2009a, pp. 1–10.
Mooser, J., You, S., Neumann, U., and Wang, Q., Applying robust structure from motion to markerless augmented reality. IEEE Workshop on Applications of Computer Vision (WACV), Snowbird, UT, December 7–8, 2009b, pp. 1–8.
Neumann, U. and You, S., Natural feature tracking for augmented reality. IEEE Transactions on Multimedia, 1(1): 53–64, 1999.
Platonov, J., Heibel, H., Meier, P., and Grollmann, B., A mobile markerless AR system for maintenance and repair. International Symposium on Mixed and Augmented Reality, Santa Barbara, CA, October 22–25, 2006, pp. 105–108.
Pylvänäinen, T., Berclaz, J., Korah, T., Hedau, V., Aanjaneya, M., and Grzeszczuk, R., 3D city modeling from street-level data for augmented reality applications. International Conference on 3D Imaging, Modeling, Processing, Visualization and Transmission (3DIMPVT), Zurich, Switzerland, October 13–15, 2012, pp. 238–245.
Shibata, F., Ikeda, S., Kurata, T., and Uchiyama, H., An intermediate report of TrakMark WG: International voluntary activities on establishing benchmark test schemes for AR/MR geometric registration and tracking methods. International Symposium on Mixed and Augmented Reality, Seoul, Korea, October 13–16, 2010, pp. 298–302.
Stamos, I. and Allen, P. K., Geometry and texture recovery of scenes of large scale. Computer Vision and Image Understanding, 88(2): 94–118, 2002.
Uchiyama, H. and Marchand, E., Object detection and pose tracking for augmented reality: Recent approaches. 18th Korea–Japan Joint Workshop on Frontiers of Computer Vision, Kawasaki, Japan, February 2012, pp. 1–8.
Wagner, D., Reitmayr, G., Mulloni, A., Drummond, T., and Schmalstieg, D., Pose tracking from natural features on mobile phones. Proceedings of the International Symposium on Mixed and Augmented Reality (ISMAR), Cambridge, U.K., September 15–18, 2008, pp. 125–134.

8 Urban Visual Modeling and Tracking

Jonathan Ventura and Tobias Höllerer

CONTENTS
8.1 Introduction................................................................................................... 174
8.2 Outdoor Panoramic Capture.......................................................................... 176
8.2.1 Guidelines for Capture...................................................................... 176
8.3 Automatic 3D Modeling................................................................................ 177
8.3.1 Image Extraction............................................................................... 177
8.3.2 3D Reconstruction Pipeline............................................................... 179
8.4 Semiautomatic Geo-Alignment..................................................................... 179
8.4.1 Vertical Alignment............................................................................ 180
8.4.2 Ground Plane Determination............................................................. 181
8.4.3 Map Alignment.................................................................................. 181
8.5 Tracking the Model........................................................................................ 181
8.5.1 Image Representation........................................................................ 182
8.5.2 Camera Model................................................................................... 182
8.5.3 Point Correspondence Search............................................................ 182
8.5.4 Pose Update....................................................................................... 183
8.5.5 Success Metric................................................................................... 184
8.5.6 Live Keyframe Sampling................................................................... 184
8.6 Tracker Initialization..................................................................................... 185
8.6.1 Image-Based Method......................................................................... 185
8.6.2 Feature-Based Method....................................................................... 186
8.7 Server/Client System Design......................................................................... 186
8.7.1 Server/Client System Overview......................................................... 186
8.7.2 Latency Analysis............................................................................... 187
8.7.3 Sensor Integration.............................................................................. 188
8.8 Evaluation...................................................................................................... 189
8.8.1 Speed................................................................................................. 189
8.8.2 Accuracy Tests with Differential GPS............................................... 189
8.8.3 Augmentation Examples.................................................................... 190
8.9 Discussion...................................................................................................... 191
8.10 Further Reading............................................................................................. 192
References............................................................................................................... 192


8.1 INTRODUCTION
This chapter explains how to digitally capture, model, and track large outdoor spaces
so that they can be used as environments for mobile augmented reality (AR) applications. The three-dimensional (3D) visual model of the environment is used as a
database for image-based pose tracking with a handheld camera-equipped tablet.
Experimental analysis demonstrates that real-time localization with high accuracy
can be achieved from models created using a small panoramic camera.
Device positioning is a common prerequisite for many AR applications. Indoors,
visual detection of flat, printed markers has proven to be a very successful method for
accurate device positioning, at least for a small workspace. In larger spaces, external
tracking systems allow for precise positioning by use of statically mounted cameras
that observe objects moving in the space. Outdoors, however, we cannot require that
the environment be covered in printed markers or surrounded by mounted cameras. The global positioning system (GPS) provides ubiquitous device tracking from
satellites, but does not guarantee enough accuracy on consumer-level devices for
AR applications. This chapter presents an alternative approach that treats the built
environment like an existing visual marker. By detecting and tracking landmark features on the building facades, the system uses the surrounding buildings for accurate
device positioning in the same way that printed markers are used indoors, except at
a larger scale.
Visual modeling and tracking technology is based on the relationship between
points in the scene and cameras that observe them. Having images of the same point
from multiple known camera positions allows us to determine the 3D location of
the point, as depicted in Figure 8.1. Conversely, observing multiple known points
in a single image allows us to determine the position of the camera, as depicted in
Figure8.2. The first case is useful for building a 3D model of an environment. The
second case is useful for determining the location of a camera with respect to that
model. Researchers in the fields of photogrammetry and multiple-view geometry have
studied the equations and principles governing these relationships extensively. This
chapter describes a system that applies these principles to model a large outdoor space
and track the position of a camera-equipped mobile device moving in that space.
The process for preparing an outdoor environment for tracking usage in AR
applications involves three basic steps. First, the area is captured in many photographs that cover the environment from all possible viewpoints. Second, from this
collection of photographs, feature points are extracted and matched between images,
and their 3D positions are precisely determined in an iterative process. Third, the 3D
points are aligned with a map of building outlines to provide the reconstruction with
a scale in meters and a global position and orientation.
The resulting 3D reconstruction is stored on a server and transferred to the client
device. After computing a tracker initialization on the server, the software on the
mobile client device tracks 3D feature points in the live camera view to continuously
determine the position and orientation of the device.
The following sections provide detailed descriptions and evaluations of these
system components. Sections 8.2 through 8.4 cover the outdoor modeling process.
Sections 8.5 through 8.7 describe the outdoor tracking system. Section 8.8 provides


FIGURE 8.1 Triangulation of a 3D point from observations in two cameras. The distance
between the cameras is called the baseline; the estimate is more accurate with a larger baseline.

FIGURE 8.2 Localization of a camera from three 3D point observations. The estimate is
generally more accurate when the points are nearer to the camera.


quantitative evaluations of the system, and Section 8.9 provides a discussion of the
overall system design and performance. Finally, Section 8.10 gives an annotated reference list for interested readers who would like to explore related work.

8.2 OUTDOOR PANORAMIC CAPTURE


An easy way to capture many different viewpoints of a large environment is to use a
panoramic or omnidirectional camera. Such a camera captures all viewing angles from
one position in a single capture. Examples of such cameras include the professional-grade Point Grey Ladybug, which has six cameras, and the consumer-grade Ricoh
Theta, which has just two cameras, placed back to back, with very wide-angle lenses.
A consumer-grade panorama camera is light enough to be held overhead in one's
hand. Alternatively, a tripod or monopod attached to a backpack serves as an easy
mounting system, if the camera can be remotely triggered.

8.2.1 Guidelines for Capture


By simply walking through an environment and capturing panoramas at regular
intervals, the environment to be augmented is easily captured in images. However,
some care should be taken during the capture process, in order to ensure the success
of the later reconstruction and augmentation steps.
The main issue to consider is how many pictures to take, and where to take them.
To answer this, the characteristics of vision-based 3D reconstruction should be taken
into account. Triangulation of 3D points depends on multiple observations of a
point taken from images in different locations, as shown in Figure 8.1. The distance
between two cameras is called the baseline. A larger baseline gives a more accurate
point triangulation. However, if the images are too far apart, then the appearance
of the object will change too much, which means that the images cannot be automatically matched together. The scale-invariant feature transform (SIFT) descriptor
(Lowe, 2004), which we use for matching, is reported to match well with up to 45° of out-of-plane rotation. However, the matching works better with smaller angles. A simple rule of thumb is that the optimal distance ratio is about 4/10, meaning that the pictures should be about 4 m apart if the buildings are 10 m away. This corresponds to having an angle of about 10° between the rays observing a point.
When recording panoramic video, images are extracted at a fixed rate in order to
have regularly spaced panoramas from the video to use in the reconstruction pipeline. The appropriate interval to achieve the desired 4/10 ratio depends on the speed
of motion and the distance to the buildings. For walking speed in building courtyards, an appropriate sampling rate is about two panoramas per second.
A second consideration is the expected distance between the offline-captured
panorama images and the user's location during online use of the AR application.
Again, the limiting factor is the ability of the image points to be matched. This
depends on both the angle of view and the change in scale. Experiments with the
iPad 2 and the Point Grey Ladybug camera have shown that reasonable localization
performance can be expected in a range within a quarter of the distance to the buildings from the offline panorama capture point (Ventura and Höllerer, 2012b).


8.3 AUTOMATIC 3D MODELING


After canvassing the area to be modeled and collecting imagery from many
viewpoints, the image collection is processed in an automatic 3D modeling
pipeline. This pipeline takes the image sequences as input and outputs the estimated camera positions and 3D triangulated points. The collection of estimated
points is called a point cloud. This section gives some details about how the
pipeline works.

8.3.1 Image Extraction
There are several common panoramic image representations that could be used to
store the image sequences. Mappings such as spherical and cylindrical projection
offer a continuous representation of all camera rays in one image. However, they
nonlinearly distort the perspective view, which would impact performance when
matching to images from a normal perspective camera, as found on a typical mobile
device.
Instead of using spherical or cylindrical projection, perspective views are
extracted from each panorama, such that the collection of extracted views covers
the entire visual field. A typical cube map used as an environment map in rendering uses six images arranged orthogonally, with 90 horizontal and vertical fields of
view in each image. This representation offers perspective views without distortion.
However, in practice, the low field of view in each image hinders the matching and
reconstruction process.
To address this issue, perspective images with wider than 90° horizontal field
of view are used. The top and bottom of the cube are omitted, since they generally have no usable texture. The faces provide overlapping views, which increases
the likelihood of matching across perspective distortion. Eight perspective views per
panorama are used to increase image matching performance by ensuring that all
directions are covered in a view without severe perspective distortion. The views are
arranged at equal rotational increments about the vertical camera axis. Figure 8.3
shows an image from the panorama camera and its extended cube map representation.
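The sketch below renders one perspective view from an equirectangular (spherical-projection) panorama by mapping every output pixel ray back to panorama coordinates; the eight views described above would be produced by varying the yaw angle. The field of view, output resolution, and vertical-axis convention are assumptions, not the exact parameters used in the chapter.

import numpy as np
import cv2

def perspective_view(pano, yaw_deg, hfov_deg=110.0, out_w=960, out_h=720):
    # pano: equirectangular panorama image; yaw_deg: rotation about the vertical axis
    f = 0.5 * out_w / np.tan(np.radians(hfov_deg) / 2)       # pinhole focal length
    xs, ys = np.meshgrid(np.arange(out_w) - out_w / 2.0,
                         np.arange(out_h) - out_h / 2.0)
    rays = np.stack([xs, ys, np.full_like(xs, f)], axis=-1)  # one ray per output pixel
    yaw = np.radians(yaw_deg)
    R = np.array([[np.cos(yaw), 0.0, np.sin(yaw)],
                  [0.0, 1.0, 0.0],
                  [-np.sin(yaw), 0.0, np.cos(yaw)]])
    d = rays @ R.T                                           # rotate rays about the vertical axis
    lon = np.arctan2(d[..., 0], d[..., 2])
    lat = np.arcsin(d[..., 1] / np.linalg.norm(d, axis=-1))
    h, w = pano.shape[:2]
    map_x = ((lon / np.pi + 1.0) * 0.5 * (w - 1)).astype(np.float32)
    map_y = ((lat / (np.pi / 2) + 1.0) * 0.5 * (h - 1)).astype(np.float32)
    return cv2.remap(pano, map_x, map_y, cv2.INTER_LINEAR)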

FIGURE 8.3 (a) An example panorama represented in spherical projection. (b–i) Images extracted from the panorama using perspective projection.

8.3.2 3D Reconstruction Pipeline


After extraction, the perspective views from the panoramas are processed in an incremental structure-from-motion pipeline, which produces a 3D point cloud from image
feature correspondences. This pipeline has four major steps: pair-wise panorama
matching, match verification, reconstruction of an initial pair, and incremental addition of the remaining panoramas. If a linear camera path is assumed, without loop
closures, panoramas are only matched to their neighbors in the sequence. Otherwise,
exhaustive pair-wise matching is employed to test all possible correspondences. The
SIFT detector and descriptor is used for feature matching (Lowe, 2004). Matches are
verified by finding the essential matrix (Nistér, 2004) relating two panoramas using a
progressive sample consensus (PROSAC) loop (Chum and Matas, 2005). After triangulating points using an initial panorama pair, panoramas are incrementally added,
more points are triangulated, and bundle adjustment is performed. This is repeated
until no more panoramas can be added. The relative rotation between perspective
views in a single panorama is fixed, and only the rotation and translation between
panoramas is estimated and refined.
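A minimal sketch of the pair-verification step using OpenCV follows: the essential matrix is estimated from putative matches and the relative pose is recovered from it. This uses OpenCV's standard RANSAC rather than the PROSAC loop cited above, and the confidence, threshold, and minimum inlier count are assumed values.

import numpy as np
import cv2

def verify_pair(pts1, pts2, K, min_inliers=30):
    # pts1, pts2: (N, 2) matched pixel coordinates in the two views; K: 3x3 intrinsics
    E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC,
                                   prob=0.999, threshold=1.0)
    if E is None:
        return None
    n_inliers, R, t, mask = cv2.recoverPose(E, pts1, pts2, K, mask=mask)
    if n_inliers < min_inliers:
        return None                                # reject the pair as unverified
    return R, t, mask.ravel().astype(bool)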

8.4 SEMIAUTOMATIC GEO-ALIGNMENT


Image-based reconstruction by itself produces a metric reconstruction that is internally consistent. However, this process cannot recover the external orientation of
the reconstruction, meaning the direction of gravity, the scale in meters, and the
geographic positions of the cameras and 3D points. This external orientation is very useful for many kinds of AR applications. With a geo-aligned reconstruction, it is then possible to display geo-referenced information such as maps.
Other applications which do not display geo-referenced information still benefit
from geo-alignment, because it can be used to determine the device's height off the
ground and the scale and orientation with which 3D models should be displayed.


To enable these benefits in our AR applications, the semiautomatic geo-alignment procedure described here is employed to determine the external orientation of the reconstruction.

8.4.1 Vertical Alignment


The first step of the alignment procedure is to determine the vertical orientation of the reconstruction. This is determined by two rotation angles that transform the reconstruction to make the negative Z-axis aligned with the direction of
gravity. To automatically estimate this alignment, roughly vertical line segments
are extracted from all images. The LSD line segment detector (Von Gioi et al., 2010) is applied, and all lines with a minimum length of 25% of the image diagonal and an orientation within 45° off-vertical are accepted. Using only roughly
vertical lines makes the assumption that the images are taken with a roughly
upright orientation, and that there are sufficient upright structures having such
lines in the images.
Then, a common vertical vanishing point for all images is determined. Vertical
vanishing point hypotheses are generated by repeatedly sampling a pair of lines
and finding their intersection point. Each hypothesis is tested against all lines to
determine their angular errors with respect to the vanishing point hypothesis. The
hypothesis with the greatest number of inliers is selected as the common vertical
vanishing point for all images. Figure 8.4 shows an example of vertical lines found
to be inliers in one image. After finding the common vertical vanishing point, the
rotation which brings this point to vertical, (0, 0, 1)^T, is determined and applied to the
reconstruction.
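The sketch below shows the hypothesize-and-test estimate of the vertical vanishing point: pairs of line segments propose a candidate intersection, and each candidate is scored by how many segments point toward it within an angular tolerance. The iteration count and tolerance are assumed values, and the angular test is a simplified version of the one described above; the rotation that maps the winning direction to (0, 0, 1)^T would then be applied to the reconstruction.

import numpy as np

def vertical_vanishing_point(segments, iters=500, thresh_deg=2.0, seed=0):
    # segments: list of ((x1, y1), (x2, y2)) endpoints of roughly vertical line segments
    rng = np.random.default_rng(seed)
    homog = lambda p: np.array([p[0], p[1], 1.0])
    lines = [np.cross(homog(p1), homog(p2)) for p1, p2 in segments]    # homogeneous lines
    dirs = [np.asarray(p2, float) - np.asarray(p1, float) for p1, p2 in segments]
    mids = [(np.asarray(p1, float) + np.asarray(p2, float)) / 2 for p1, p2 in segments]
    best_vp, best_count = None, -1
    for _ in range(iters):
        i, j = rng.choice(len(lines), size=2, replace=False)
        vp = np.cross(lines[i], lines[j])          # candidate vanishing point (homogeneous)
        if np.linalg.norm(vp) < 1e-9:
            continue
        count = 0
        for d, m in zip(dirs, mids):
            to_vp = vp[:2] - vp[2] * m             # direction from segment midpoint to candidate
            c = abs(np.dot(d, to_vp)) / (np.linalg.norm(d) * np.linalg.norm(to_vp) + 1e-12)
            if np.degrees(np.arccos(np.clip(c, 0.0, 1.0))) < thresh_deg:
                count += 1                         # this segment supports the candidate
        if count > best_count:
            best_vp, best_count = vp, count
    return best_vp, best_count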

FIGURE 8.4 Lines on the buildings (in white) are used to determine a common vanishing
point and align the vertical axis of the reconstruction with the direction of gravity.


8.4.2 Ground Plane Determination


Once the reconstruction is vertically aligned, the ground plane is determined by
considering the Z-coordinate of all 3D points. Assuming that the reconstruction contains many points on the ground, the ground plane should be a peak in the histogram
of Z values, near the lower end of the range. Erroneous points in the reconstruction
might lie under the ground, so the absolute minimum value should not be used as the
ground height. Instead, the height of the ground is initialized to the 80th percentile
Zvalue. The ground height will then be manually tuned by inspecting the reconstruction visually and confirming that the estimated ground plane meets the bottom
edges of buildings.

8.4.3 Map Alignment
Now there are four remaining degrees of freedom: a rotation about the vertical
axis, a translation on the XY ground plane, and the metric scaling of the reconstruction. An initialization for these remaining transformation parameters is determined
manually by visually comparing an overhead, orthographic view of the reconstruction with a map of building outlines from the area, which can be freely downloaded
from OpenStreetMap. A simple interactive tool renders the point cloud and building
outlines together. The user interactively rotates, translates, and scales the reconstruction until the points roughly match the buildings.
After the user determines a rough initialization, automatic nonlinear optimization is applied to determine the best fit between 3D points and building walls. Each
point is assigned to the nearest building wall according to the 2D point-line distance.
However, if the point-line distance is greater than 4 m, or the projection of the point
onto the line does not lie on the line segment, then the match is discarded. The
rotation, translation, and scale parameters are iteratively updated to minimize the
point-line distance of all matches, using the Huber loss function for robustness to
outliers. The entire optimization procedure is repeated until convergence to find the
best point-line assignment and 3D alignment.
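The optimization just described can be sketched with SciPy's robust least squares, as below. The point-to-wall assignment is assumed to have been done already and passed in as one wall segment per point; projections falling off a segment are simply clamped here rather than discarded, and the parameterization and Huber scale are assumptions for illustration.

import numpy as np
from scipy.optimize import least_squares

def refine_alignment(points_xy, walls, params0):
    # points_xy: (N, 2) reconstruction points; walls: one ((ax, ay), (bx, by)) segment per point
    # params0: initial [theta, tx, ty, log_scale] from the manual alignment
    a = np.array([w[0] for w in walls], dtype=float)
    b = np.array([w[1] for w in walls], dtype=float)

    def residuals(params):
        theta, tx, ty, log_s = params
        c, s = np.cos(theta), np.sin(theta)
        R = np.array([[c, -s], [s, c]])
        p = np.exp(log_s) * points_xy @ R.T + np.array([tx, ty])   # transformed points
        ab = b - a
        t = np.einsum('ij,ij->i', p - a, ab) / np.einsum('ij,ij->i', ab, ab)
        foot = a + np.clip(t, 0.0, 1.0)[:, None] * ab              # closest point on each wall
        return np.linalg.norm(p - foot, axis=1)                    # point-to-wall distances

    result = least_squares(residuals, params0, loss='huber', f_scale=1.0)
    return result.x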
An example reconstruction aligned to OpenStreetMap data is shown in Figure 8.5.
The panoramas for this reconstruction were captured by walking in a straight line
through the center of the Graz Hauptplatz courtyard while holding the Ricoh Theta
camera overhead.

8.5 TRACKING THE MODEL


Given a 3D point cloud reconstruction of the scene, the mobile device's position is
determined in real time by identifying and localizing feature points observed by
the device's camera. The continuously operating tracker maintains in real time the
pose of the mobile phone or tablet with respect to the model. The tracker takes as
input a pose prior, the current camera image, and sensor readings, and outputs a pose
posterior that estimates the current device position and orientation. The pose prior
is provided by the previous tracking iteration. Tracker initialization, and reinitialization after failure, is provided by the procedures discussed in Section 8.6.


FIGURE 8.5 Point cloud reconstruction aligned to OpenStreetMap building data. The gray dots indicate panorama capture locations, and the black points indicate triangulated 3D points.

8.5.1 Image Representation
We refer to the images extracted from the panoramas as keyframes. These keyframes
are extended by preparing an image pyramid, meaning that the image is repeatedly
half-sampled; the stack of images at progressively smaller resolutions is stored and
used to improve patch sampling during tracker operation.

8.5.2 Camera Model


Most mobile phones and tablets have a moderate field-of-view camera with low distortion, thus a simple pinhole camera model without a radial distortion term is sufficient to model it. This model has only one parameter, the focal length, which is
determined in a precalibration step. The center of projection is assumed to be the
center of the image. The same model is used for the panorama keyframes, which are
generated by synthetic warping of the panorama images.

8.5.3 Point Correspondence Search


At each frame, the tracker projects patches from the database of images into the
camera frame according to the pose prior, searches for features in a window around
their expected positions, and then updates the pose using gradient descent to minimize the re-projection error.
First, any points that are predicted to lie behind the camera or outside of the image
are culled and not considered in further steps of the current tracking iteration.


For each point that passes the culling test, an 8 × 8 pixel template patch is
extracted from a keyframe that observes the projected point. This keyframe is called
the source, and the camera image is called the target. A perspective warp is used
to compensate for the parallax between the source and the target. This perspective
warp is determined by the 3D point X, its normal n, and the plane p = (n_1, n_2, n_3, D)^T, where n \cdot X + D = 0. If the target and source projection matrices are P = [I \mid 0] and P_i = [R_i \mid t_i], respectively, the 3 × 3 perspective warp is

W_i = K_{source} \left( R_i + t_i v^T \right) K_{target}^{-1}

where v = -D^{-1} (n_1, n_2, n_3)^T, and K_{source} and K_{target} are the intrinsic calibration matrices of the source and target images, respectively.
The determinant of the warp |Wi| gives the amount of scaling between source and target. This warp is computed for all keyframes that observe the point; the keyframe with
warp scale closest to one is chosen as the source. This ensures the best resolution when
sampling the template patch. The system also chooses the best level of the source image pyramid according to the resolution required when sampling the template patch.
The template patch is used to search the target image for the point's current projected location. We search for this location by computing the normalized cross-correlation between the template patch and patches sampled from the target image at all locations on an 8 × 8 grid around the location given by the pose
prior. The location with the best score is accepted if the score exceeds a threshold
of 0.7, which was experimentally found to adequately separate correct and incorrect
matches. To increase robustness to fast movements, the search for correspondence is
performed over an image pyramid of four levels.
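A minimal sketch of the patch search at a single pyramid level follows: the template is compared by normalized cross-correlation against a window around the predicted location, and the best score must exceed the 0.7 threshold. OpenCV's matchTemplate is used as a stand-in for the grid search described above, and both images are assumed to be grayscale.

import numpy as np
import cv2

def patch_search(target, template, pred_xy, radius=8, threshold=0.7):
    # target: grayscale camera image; template: 8x8 patch warped from the source keyframe
    x, y = int(round(pred_xy[0])), int(round(pred_xy[1]))
    h, w = template.shape
    x0, y0 = x - radius - w // 2, y - radius - h // 2
    x1, y1 = x + radius + (w + 1) // 2, y + radius + (h + 1) // 2
    if x0 < 0 or y0 < 0 or x1 > target.shape[1] or y1 > target.shape[0]:
        return None                                  # search window leaves the image
    window = target[y0:y1, x0:x1]
    scores = cv2.matchTemplate(window, template, cv2.TM_CCOEFF_NORMED)
    _, best, _, loc = cv2.minMaxLoc(scores)
    if best < threshold:
        return None                                  # no sufficiently correlated match
    return (x0 + loc[0] + w // 2, y0 + loc[1] + h // 2)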

8.5.4 Pose Update
After searching for correspondences, the camera pose estimate is updated to fit the
measurements. All correspondences found during patch search are used to update the
camera pose, even if they were not successfully refined to the lowest pyramid level.
For each observed point, we project its search location down to the zeroth pyramid level, giving a measured location x_i for point X_i. Ten iterations of gradient descent over an M-estimator are used to minimize the reprojection error of all points:

e = \sum_i m\left( y_i - x_i \right)

where y_i is the projected location of X_i using the current pose estimate and m(u) is the Tukey loss function (Huber, 1981).
The parameters of the Tukey loss function are recomputed to update the weights
after each of the first five iterations.
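Viewed as iteratively reweighted least squares, the M-estimator assigns each correspondence a weight from the Tukey biweight; residuals beyond the cutoff get zero weight and are effectively treated as outliers. The sketch below uses the conventional cutoff constant scaled by a robust spread estimate, which is an assumption standing in for the parameter recomputation described above.

import numpy as np

def tukey_weights(residuals, c=4.685):
    # residuals: per-point reprojection errors (pixels); c: conventional Tukey constant
    r = np.asarray(residuals, dtype=float)
    scale = 1.4826 * np.median(np.abs(r)) + 1e-9     # robust spread estimate (MAD)
    u = np.abs(r) / (c * scale)
    return np.where(u < 1.0, (1.0 - u ** 2) ** 2, 0.0)   # zero weight beyond the cutoff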


8.5.5 Success Metric
After the pose update, the number N_found of points found to be inliers by the
M-estimator is counted. This is an indicator of the success of the tracker, that is,
whether the pose posterior matches the true pose of the camera. The system requires
that at least 100 points have been successfully tracked (N_found ≥ 100).
To ensure an acceptable frame rate, a limit N_max is placed on the number of points N_attempted that the tracker can attempt to find in a single frame. The system first performs view frustum culling on all points and then selects N_attempted ≤ N_max points to search for. The point selection is performed by choosing some ordering of the visible points and selecting the first N_max points from the ordering to be tracked. The question is, which ordering ensures the best tracking performance?
One commonly used approach is to randomly shuffle all the visible points at each
frame. However, this can lead to pose jitter when the camera is not moving. The
reason is that using different subsets of the points for tracking may result in slightly
different poses found by the gradient descent minimization, because of slight errors
in patch search.
A solution to this problem is to randomly order the points once at system startup.
This provides a fixed, but random, ordering for the points at each frame. The result
is that for a static or slowly moving camera, the tracker will reach a steady state
where the same subset of points is used for tracking and pose update at each frame.
Overall, this sampling procedure reduces pose jitter in comparison to sampling a
new random ordering of points at each frame.

8.5.6 Live Keyframe Sampling


A second source of pose inaccuracy is poor feature correspondence. Errors in the
patch search can prevent the pose update from correctly converging. Direct alignment of the mobile device camera image to the panorama keyframes can cause poor
feature correspondence which leads to inaccurate or jittery pose estimates. This is
most likely because of the difference in imaging characteristics of the two cameras,
such as focal length and sharpness. In contrast, the feature correspondence between
two nearby images from the same camera is less noisy.
To address this problem, live keyframe sampling is incorporated into the tracking
system. During tracker operation, keyframes are collected from the current video
stream and added to the set of images used for patch projection. The tracker preferentially projects patches from the new keyframes, as these lead to more stable pose
estimation.
The decision of when to sample a new keyframe is based on the number N_old of points that are projected from a panorama keyframe in the current camera image, and the number N_new of points projected from a new keyframe. When the ratio N_new/N_old drops below 50%, or when the distance to the nearest new keyframe rises above 2 m, a new keyframe is sampled, and the inlier measurements are associated with the corresponding 3D points to be used for future patch projection.
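The sampling rule reduces to a small test, sketched below with hypothetical argument names; the 50% ratio and 2 m distance thresholds are the values given above.

def should_sample_keyframe(n_new, n_old, dist_to_nearest_new_m,
                           ratio_thresh=0.5, dist_thresh_m=2.0):
    # n_new / n_old: points projected from new vs. panorama keyframes in this frame
    ratio_low = n_old > 0 and (n_new / n_old) < ratio_thresh
    too_far = dist_to_nearest_new_m > dist_thresh_m
    return ratio_low or too_far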


8.6 TRACKER INITIALIZATION


The patch tracking method described in Section 8.5 is fast enough for real-time
operation, but requires an initial position to start the iterative tracking procedure.
The region of convergence for the tracker is too small to make it feasible to use the
GPS and compass reading as an initialization (see Section 8.7.2). Instead, visual
localization procedures are employed that are capable of determining the camera
pose within a wide range of possible positions.
The task of visual localization is challenging in the outdoor case. This is because
the range of possible views is large compared to an indoor setting. When tracking
a desk, for example, the camera can reasonably be expected to move within a small
range of distances from the desk surface. In a building courtyard or street-side setting, however, the camera could be far from the original point of capture, but the
building would still be visible because of its size. This mandates localization strategies that are robust to large changes in perspective and scale.
This section presents two different methods for tracker initialization and reinitialization after tracking failure. The image-based method is fast enough to be computed
in real time but is limited in range. The feature-based method offers a more robust
solution to the visual localization problem, but requires significant computation as
well as storage for the descriptor database. The way these methods are combined is
explained in the system design overview given in Section 8.7.

8.6.1 Image-Based Method
The image-based localization method is relatively simple and is easily implemented.
A cache of recently seen images is stored along with their known poses. When the
tracking system fails and enters the lost state, the system matches the current image
to the cache to find pose hypotheses. The best matching image is used as a pose prior
to start the tracker at the next frame.
The image cache is generated during tracker operation and can be saved for reuse
in future tracking sessions. During tracker operation, a tracked image is added to the
cache when tracking is successful and the closest keyframe in the cache is more than
1 m different in position or 45° different in orientation.
To find image matches, a variant of the small blurry image (SBI) matching procedure is used (Klein and Murray, 2008). Images in the cache are down-sampled to the
fifth image pyramid level (meaning that they are half-sampled four times). This same
down-sampling is applied to the current query image from the camera. Then each
cache image is compared to the query image using the normalized cross-correlation
score (Gonzalez and Wood, 2007). The pose of the image with the highest correlation score is used as the initialization for the tracker in the next frame.
The SBI localization method is suitably fast even for a large number of cache
images. However, it requires the query camera to be relatively close to a cache image.
Beyond a small amount of translation or rotation, the query frame will not match to
any cache image. Thus, this method is impractical for outdoor localization in a large
space, unless a very dense coverage of cache images is acquired.


8.6.2 Feature-Based Method
Alternatively, feature matching can be used to extend the range of poses where
localization can be achieved. Given a query image, the system extracts features and
searches for correspondences in the panorama keyframe database. Then, a robust
sampling procedure is used to find a subset of inlier correspondences that support a
common pose estimate.
Each feature from the query image is matched to its nearest neighbor in the set
of all features in the database according to the Euclidean distance between SIFT
descriptors. Approximate nearest-neighbor search is performed using a kd-tree for
speed (Lowe, 2004). Then the camera pose is robustly estimated using the PROSAC
procedure (Chum and Matas, 2005) and the three-point absolute pose algorithm
(Fischler and Bolles, 1981).
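A condensed sketch of the feature-based query is given below: SIFT features are extracted, matched to the database by nearest neighbor, and a robust absolute pose is fitted. It substitutes plain RANSAC (OpenCV's solvePnPRansac) for the PROSAC loop and three-point solver cited above, assumes OpenCV 4.x for SIFT_create, and uses assumed thresholds.

import numpy as np
import cv2
from scipy.spatial import cKDTree

def localize(query_gray, K, db_descs, db_points3d):
    # db_descs: (N, 128) SIFT descriptors; db_points3d: (N, 3) corresponding 3D points
    sift = cv2.SIFT_create()
    kps, descs = sift.detectAndCompute(query_gray, None)
    if descs is None or len(kps) < 4:
        return None
    nn = cKDTree(db_descs).query(descs, k=1)[1]      # nearest database feature per query feature
    pts2d = np.float32([kp.pt for kp in kps])
    pts3d = np.float32(db_points3d[nn])
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(pts3d, pts2d, K, None,
                                                 reprojectionError=4.0)
    if not ok or inliers is None or len(inliers) < 12:
        return None
    return cv2.Rodrigues(rvec)[0], tvec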
An alternative approach is to apply a document retrieval technique (Sivic and
Zisserman, 2003). A vocabulary tree (Nistér and Stewenius, 2006) is used to hierarchically organize the descriptors so that each descriptor is identified by its cluster (or word).
Given a query frame, standard tf-idf weighted document matching is applied to order the
keyframes by similarity (Sivic and Zisserman, 2003). The top K documents are then subjected to geometric pose verification to find a suitable match. For each top-ranked document, the nearest-neighbor matching and pose estimation procedure described earlier
is performed, using only features from the single image that was retrieved. This image
retrieval approach scales better with database size than performing nearest-neighbor matching against the entire database. However, there must exist in the database a single view
that has enough visual overlap with the query for the procedure to work. Irschara et al. developed a method to increase the set of views in the database synthetically, which increases the range of the image retrieval technique (Irschara et al., 2009).

8.7 SERVER/CLIENT SYSTEM DESIGN


This section gives an explanation of how the various components described earlier
are organized into a complete outdoor tracking system for AR applications.

8.7.1 Server/Client System Overview


The overall system design is illustrated in Figure 8.6. The online tracking and
image-based reinitialization components are computed directly on the mobile client device, as these are relatively lightweight operations that can be computed in
real time on such restricted hardware. The feature-based localization component is
computed on a remote server or computing cloud, where storage and computation are
essentially unrestricted. The system's preparation and its online operation procedure
are described in more detail in the following.
First, the omnidirectional video is processed in the offline reconstruction process
to produce the 3D point cloud model with panorama keyframes and feature descriptors. The point cloud and the keyframes are copied to the mobile client device; the
descriptors are not needed for tracking and thus do not need to be copied onto the
client device.



FIGURE 8.6 Tracking system overview with server/client design.

The tracking system runs in real time on the client device in the following loop.
First, the system tries to track the model using the previous pose estimate. The incremental rotation estimate provided by the inertial sensors in the device is preapplied
to the previous pose estimate to compensate for fast motion. If tracking fails, then the
image cache is searched using the current camera image. Tracking is then tried again
using the pose prior provided by the best match from the image cache. If this fails,
then the system generates a localization query that is sent to the server over the wireless network. While the server processes the query, the system continues attempting
to restart tracking using the image cache. When the query response is received, the
computed pose is used to restart tracking.

8.7.2 Latency Analysis
Due to network communication time and server computation time, feature-based
localization introduces latency between an image query and a pose response. During
this time, the camera might be moved from its query position, introducing error in
the localization pose estimate. Thus, the system needs some ability to handle an
outdated pose estimate from the localization system.


The region of convergence of the tracker determines the amount of error in the
pose prior that the system can tolerate. The continuous pose tracker uses a patch
search method to find a point given a pose prior. This search occurs over a fixed
region around the estimated point projection location, and is run over an image pyramid to expand the search region. This establishes a maximum pixel error in the
projected point location that will still lead to tracker convergence.
We use a simplified analysis here by considering movement in one dimension, to
produce an estimate of the tracker convergence region.
Assuming rotation around the Y-axis (vertical axis), a rotational error of θ_err degrees will cause a pixel offset of x_err pixels:

x_err = f · tan(θ_err)

where f is the focal length parameter of the camera's intrinsic calibration matrix. The maximum projection error can be used to find the maximum rotational pose error θ_max.
The system uses an effective search radius of 4 · 2³ = 32 pixels, and the Apple iPad 2 camera used for testing has a focal length of f = 1179.90. Thus, the maximum rotational pose error is θ_max = 1.55°. This limit could be a problem if localization latency
is 1 s or more.
For the translation case, the maximum translation t_X depends on the distance Z to the observed object:

x_err = f · t_X / Z

For the iPad 2 camera, the maximum translation is t_X / Z = 0.03. Given a building that is 12 m away, the maximum translation would be about 1/3 m. This, too, would be a limitation for localization, given the distance a fast-walking user could cover in 1 s.
This analysis suggests in general that the complete time for the localization query to be sent, processed, and returned (the localization latency) should be within 1 s. Timing data from our experiments are given in Section 8.8.1.
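The numbers above can be reproduced with a short back-of-the-envelope calculation; the script below only restates the arithmetic of this section using the stated focal length and search radius.

```python
# Convergence-region limits for the iPad 2 camera and a 32-pixel search radius.
import math

f = 1179.90          # focal length in pixels (intrinsic calibration)
x_err_max = 32.0     # effective patch-search radius in pixels

theta_max = math.degrees(math.atan(x_err_max / f))
print(f"max rotational error: {theta_max:.2f} deg")          # ~1.55 deg

t_over_z = x_err_max / f
print(f"max translation / depth: {t_over_z:.3f}")            # ~0.027 (about 0.03)
print(f"max translation at 12 m: {t_over_z * 12.0:.2f} m")   # ~0.33 m (about 1/3 m)
```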

8.7.3 Sensor Integration
To overcome the problem of rotational movement during the localization latency period,
the inertial sensors in the device are used to maintain an estimate of rotational movement. The estimated difference in rotation between the localization query and response
is preapplied to the localization response before attempting to initialize the tracker.
A similar approach could be applied to estimate translational movement based
on accelerometer readings. However, the accelerometers found in typical consumer devices, such as the iPad 2, are too noisy to be used for estimating translation, even
over a brief period. Fortunately, translational error during the latency period is not an
issue in larger environments such as typical urban scenes. This is because generally
the distance to the buildings is such that small translational movements do not cause
significant parallax in the image.
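As a concrete illustration of this correction, the sketch below pre-applies a gyro-integrated rotation to the stale pose returned by the server. The pose convention, the composition order, and the names (R_wc, C, dR) are assumptions made for illustration, not the chapter's implementation.

```python
# Pre-apply the rotation accumulated during the localization latency to the
# server's (stale) pose before handing it to the tracker.
import numpy as np

def correct_stale_pose(R_wc, C, dR):
    """R_wc: camera-to-world rotation from the localization response (3x3).
    C: camera center in world coordinates (left unchanged: translational drift
       during the latency causes negligible parallax at urban scene distances).
    dR: gyro-integrated rotation mapping directions in the current camera frame
        to the camera frame at the time of the query (3x3)."""
    return R_wc @ dR, C

# Example with a 5-degree yaw accumulated during the latency:
yaw = np.radians(5.0)
dR = np.array([[np.cos(yaw), 0, np.sin(yaw)], [0, 1, 0], [-np.sin(yaw), 0, np.cos(yaw)]])
R_new, C_new = correct_stale_pose(np.eye(3), np.zeros(3), dR)
```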


8.8 EVALUATION
This section reports on evaluations of several aspects of the system and shows that
it provides sufficient tracking performance to support many kinds of geo-referenced
mobile AR applications.

8.8.1 Speed
Localization queries are processed on a remote server while the mobile tracker continues running. This means that the server does not have to respond in real time, since
the processing happens in the background. However, the processing time should be
as short as possible to provide a smooth user experience, and ideally within 1 s, as
determined in Section 8.7.2.
Average timings were recorded using an Apple Mac Pro with a 2.26 GHz
Quad-Core Intel Xeon and 8 GB RAM. The model tested has 21 panoramas,
3691 points, and 6823 features. Most of the computation time is spent on SIFT
feature extraction (900 ms) and PROSAC pose estimation (500 ms). The time to
transfer a JPEG-compressed image from the device to the server is not a severe
bottleneck, even with a 3G cellular data connection. Transfer time typically
takes 30–40 ms using either a wireless or 3G connection.
Overall, the average localization latency is about one and a half seconds. In practice we have experienced localization times of 2–3 s for a larger model. However, the
processing speed could be greatly improved by using GPU implementations of the
feature extraction and pose estimation steps.
The speed of online tracking on the client device was evaluated using an Apple
iPad 2 tablet. Feature-based tracking on the mobile device consists of three steps that
constitute the majority of computation time per frame: point culling (0.005 ms per
point); patch warp (0.02 ms per point); and patch search (0.033 ms per point). The
total tracking time per frame depends on the total number of points in the model
N_total, the number of points tracked N_track, and the number of pyramid levels L. This gives an approximate tracking time per frame:

t_track = N_total · t_cull + N_track · L · (t_warp + t_search)
With multithreading on the dual-core iPad 2, the processing time is approximately
reduced by half. For a model with 3691 points, 1024 tracked points, and 4 pyramid
levels, this gives a maximum tracking time of approximately 117 ms per frame.
However, typically the number of points tracked decreases at each successive pyramid search level, so the actual tracking time in practice is lower, and frame rates of
15–20 fps tracking are achievable.
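Plugging the measured per-point timings into this cost model reproduces the quoted bound, as the short calculation below shows; it only restates the arithmetic of this section.

```python
# Per-frame tracking time for the 3691-point model on the dual-core iPad 2.
t_cull, t_warp, t_search = 0.005, 0.02, 0.033    # ms per point
N_total, N_track, L = 3691, 1024, 4

t_track = N_total * t_cull + N_track * L * (t_warp + t_search)
print(f"single-threaded: {t_track:.1f} ms")       # ~235.5 ms
print(f"dual-core (x0.5): {t_track / 2:.1f} ms")  # ~117.8 ms per frame
```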

8.8.2 Accuracy Tests with Differential GPS


To test the absolute positional accuracy possible with the system, a differential GPS receiver was attached to the iPad 2. Differential GPS receivers use measurements from GPS satellites as well as a correction signal from a nearby base station in order to attain ground truth positional estimates with accuracy under 10 cm. Because the GPS receiver produces positional readings at a rate of 1 Hz, linear interpolation was used to up-sample the signal to 30 Hz.

FIGURE 8.7 Comparison of the camera position estimates from the visual tracking system with ground truth position estimates from the differential GPS receiver. (Northing and easting, in meters, are plotted against frame number for the GPS and tracker estimates.)
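The up-sampling step mentioned above is plain linear interpolation; a minimal sketch, with illustrative array names and per-fix timestamps assumed available, is given below.

```python
# Up-sample 1 Hz differential GPS fixes to 30 Hz by linear interpolation.
import numpy as np

def upsample_gps(t_gps, easting, northing, fps=30.0):
    t_video = np.arange(t_gps[0], t_gps[-1], 1.0 / fps)   # 30 Hz timestamps
    e = np.interp(t_video, t_gps, easting)
    n = np.interp(t_video, t_gps, northing)
    return t_video, e, n
```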
A test video with the differential GPS receiver was recorded in the Graz Hauptplatz
while observing the Rathaus (City Hall). The panoramic reconstruction of this area
was made from 37 panoramas taken with the Ricoh Theta camera. The resulting reconstruction contains 14,523 points. The semiautomatic alignment method described in
this chapter was used to georegister the model with respect to building outlines from
OpenStreetMap. An overhead view of the point cloud is shown in Figure 8.5.
A comparison of the differential GPS track and the positional track created with
our system is shown in Figure 8.7. The system achieved an average error of 0.72 m
in the easting direction and 0.38 m in the northing direction. This shows that our
system provides better accuracy than consumer GPS, which has an accuracy of about
3 m with a high-quality receiver.

8.8.3 Augmentation Examples
Several prototypes have been developed and tested to evaluate the use of our modeling and tracking system for AR applications. Example screen captures from these
prototypes are shown in Figure 8.8.
The first prototype is a landscape design application. In a large courtyard on the
UC Santa Barbara campus, the user can place virtual trees on the large grassy area
between the buildings. As trees are placed, the user can move around to view how
the trees would look from different angles.
A second prototype tests the use of video game graphics. In this application, a
landing spaceship is rendered into another building courtyard on the UCSB campus
at the spot on the ground where the user touches the screen. Using an assumed position of the sun, accurate shading and shadows are rendered, to increase the realism
of the rendering.
A third prototype was created to test architectural rendering. Here, a reconstruction of a city street (Branch Street in Arroyo Grande, CA) was created by holding the
panorama camera out on the sunroof of a car and driving down the street to capture



FIGURE 8.8 Example images of the tracking system in use with 3D models rendered over
the camera image. (a) Synthetic trees planted in the grass. (b) A spaceship landing in the
courtyard, rendered with lighting and shadow effects. (c) Virtual lamps affixed to the side of
the building.

the buildings on either side. Then, a user standing on the sidewalk can add architectural elements such as virtual lamps to the building facades by simply touching the
screen at the points on the wall where they should be placed.

8.9 DISCUSSION
From these evaluations, it can be concluded that visual modeling and tracking offers
a compelling solution to device pose estimation for mobile AR applications. The
approach enables high-accuracy tracking at real-time rates with consumer hardware.
Experience with the prototype applications suggests that the pose estimation is of
sufficient quality to make objects appear to stick to surfaces, such that they seem
truly attached to a wall or the ground. Using simple rendering techniques such as
shading and shadowing also helps to improve the perceived realism of the rendered
graphics.
The major limitation of this approach is that the system is generally restricted
to operation from viewpoints where the scene is visually distinctive and able to be
recognized by its appearance. For many viewpoints, this is not the case, such as
texture-less building walls, and the sky or the ground. In addition, many scenes contain repetitive textures, such as grids of windows, that confuse the visual localization
system and lead to system failure. One possible solution to these problems would be
to further integrate other position and motion sensors, such as a GPS receiver, accelerometer, gyroscope, and compass, to complement the visual tracker.
The source code for the system described in this chapter is publicly available for
download, testing, and further development at http://www.jventura.net/code.


8.10 FURTHER READING


In this final section, references are provided so that the interested reader can find
more details about this work, as well as canonical references to learn more about this
research area and other approaches to the problem. This reference list is not intended
to be exhaustive, but is instead a starting point for further investigation.
More details about the system described in this chapter can be found in our
research papers (Ventura and Höllerer, 2011, 2012a,b) and Ventura's doctoral dissertation (Ventura, 2012). The 3D reconstruction pipeline is based on that of Snavely (2008) and Snavely et al. (2006), with modifications to handle cameras arranged in a
panoramic rig. The fundamentals of multiview geometry are discussed extensively
in the essential textbook by Hartley and Zisserman (2004).
One classic reference for camera pose estimation is that of Fischler and Bolles,
who introduced a solution to the camera pose estimation problem as well as the
Random Sample Consensus (RANSAC) method for finding a consistent set of observations from noisy data (Fischler and Bolles, 1981). PROSAC is a more efficient variant of RANSAC and is applied in this work (Chum and Matas, 2005). Most modern
methods rely on the SIFT method to detect feature points and kd-trees for approximate nearest-neighbor feature matching (Lowe, 2004). Many researchers have also
investigated more scalable approaches to feature matching (Arth et al., 2009; Li
et al., 2010; Sattler et al., 2011). The document-based approach to image retrieval was introduced by Sivic and Zisserman (2003) and expanded by others to include vocabulary trees (Nistér and Stewenius, 2006), geometric verification (Philbin et al., 2007), and virtual images (Irschara et al., 2009).
Camera-based tracking also has a long history of research. In the AR context, one
canonical work is by Lowe who used SIFT descriptors for initialization and tracking
(Skrypnyk and Lowe, 2004). More recently, the landmark work of Klein and Murray
introduced Parallel Tracking and Mapping (PTAM), where points are triangulated and
tracked simultaneously in an efficient manner (Klein and Murray, 2007). The tracking
method described in this chapter is adapted from this work. Alternatives to the point-based approach are possible, such as using a wireframe (Klein and Murray, 2006) or
textured 3D model (Reitmayr and Drummond, 2006). Researchers in AR systems have
also considered approaches to outdoor pose tracking that use the camera in combination with other dedicated position and velocity sensors (Oskiper et al., 2012).

REFERENCES
Arth, C., Wagner, D., Klopschitz, M., Irschara, A., and Schmalstieg, D. (2009). Wide area localization on mobile phones. In ISMAR '09: Proceedings of the 2009 Eighth IEEE International Symposium on Mixed and Augmented Reality (pp. 73–82). Washington, DC: IEEE Computer Society.
Chum, O. and Matas, J. (2005). Matching with PROSAC – progressive sample consensus. In CVPR 2005: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2005 (Vol. 1, pp. 220–226). Washington, DC: IEEE Computer Society.
Fischler, M. A. and Bolles, R. C. (1981). Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM, 24(6), 381–395.
Gonzalez, R. and Woods, R. E. (2007). Digital Image Processing. Upper Saddle River, NJ: Pearson/Prentice Hall.
Hartley, R. and Zisserman, A. (2004). Multiple View Geometry in Computer Vision. Cambridge, UK: Cambridge University Press.
Huber, P. J. (1981). Robust Statistics. New York: John Wiley & Sons.
Irschara, A., Zach, C., Frahm, J. M., and Bischof, H. (2009). From structure-from-motion point clouds to fast location recognition. In CVPR 2009: IEEE Conference on Computer Vision and Pattern Recognition, 2009 (pp. 2599–2606). Washington, DC: IEEE Computer Society.
Klein, G. and Murray, D. (2006). Full-3D edge tracking with a particle filter. In British Machine Vision Conference (BMVC '06). Manchester, U.K.: British Machine Vision Association.
Klein, G. and Murray, D. (2007). Parallel tracking and mapping for small AR workspaces. In ISMAR '07: Proceedings of the 2007 Sixth IEEE and ACM International Symposium on Mixed and Augmented Reality (pp. 225–234). Washington, DC: IEEE Computer Society.
Klein, G. and Murray, D. (2008). Improving the agility of keyframe-based SLAM. In ECCV '08: Proceedings of the 10th European Conference on Computer Vision: Part II (Vol. 5303 LNCS, pp. 802–815). Berlin, Germany: Springer-Verlag.
Li, Y., Snavely, N., and Huttenlocher, D. (2010). Location recognition using prioritized feature matching. In ECCV '10: Proceedings of the 11th European Conference on Computer Vision: Part II (pp. 791–804). Berlin, Germany: Springer-Verlag.
Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2), 91–110.
Nistér, D. (2004). An efficient solution to the five-point relative pose problem. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(6), 756–777.
Nistér, D. and Stewenius, H. (2006). Scalable recognition with a vocabulary tree. In CVPR '06: Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Vol. 2, pp. 2161–2168). Washington, DC: IEEE Computer Society.
Oskiper, T., Samarasekera, S., and Kumar, R. (2012). Multi-sensor navigation algorithm using monocular camera, IMU and GPS for large scale augmented reality. In ISMAR '12: Proceedings of the 2012 IEEE International Symposium on Mixed and Augmented Reality (ISMAR) (pp. 71–80). Washington, DC: IEEE Computer Society.
Philbin, J., Chum, O., Isard, M., Sivic, J., and Zisserman, A. (2007). Object retrieval with large vocabularies and fast spatial matching. In CVPR '07: IEEE Conference on Computer Vision and Pattern Recognition, 2007 (pp. 1–8). Washington, DC: IEEE Computer Society.
Reitmayr, G. and Drummond, T. W. (2006). Going out: Robust model-based tracking for outdoor augmented reality. In ISMAR '06: Proceedings of the Fifth IEEE and ACM International Symposium on Mixed and Augmented Reality (pp. 109–118). Washington, DC: IEEE Computer Society.
Sattler, T., Leibe, B., and Kobbelt, L. (2011). Fast image-based localization using direct 2D-to-3D matching. In ICCV '11: Proceedings of the 2011 International Conference on Computer Vision (Vol. 43). Washington, DC: IEEE Computer Society.
Sivic, J. and Zisserman, A. (2003). Video Google: A text retrieval approach to object matching in videos. In ICCV '03: Proceedings of the Ninth IEEE International Conference on Computer Vision (Vol. 2, pp. 1470–1477). Washington, DC: IEEE Computer Society.
Skrypnyk, I. and Lowe, D. G. (2004). Scene modelling, recognition and tracking with invariant image features. In ISMAR '04: Proceedings of the Third IEEE/ACM International Symposium on Mixed and Augmented Reality (pp. 110–119). Washington, DC: IEEE Computer Society.
Snavely, N. (2008). Scene reconstruction and visualization from Internet photo collections. Dissertation, University of Washington, Seattle, WA.
Snavely, N., Seitz, S. M., and Szeliski, R. (2006). Photo tourism: Exploring photo collections in 3D. ACM Transactions on Graphics (TOG) – Proceedings of ACM SIGGRAPH 2006, 25(3), 835–846.
Ventura, J. (2012). Wide-Area Visual Modeling and Tracking for Mobile Augmented Reality (T. Höllerer, Ed.). Santa Barbara, CA: University of California.
Ventura, J. and Höllerer, T. (2011). Outdoor mobile localization from panoramic imagery. In ISMAR '11: Proceedings of the 2011 10th IEEE International Symposium on Mixed and Augmented Reality (pp. 247–248). Washington, DC: IEEE Computer Society.
Ventura, J. and Höllerer, T. (2012a). Structure from motion in urban environments using upright panoramas. Virtual Reality, 17(2), 147–156.
Ventura, J. and Höllerer, T. (2012b). Wide-area scene mapping for mobile visual tracking. In ISMAR '12: Proceedings of the 2012 IEEE International Symposium on Mixed and Augmented Reality (ISMAR) (pp. 3–12). Washington, DC: IEEE Computer Society.
Von Gioi, R. G., Jakubowicz, J., Morel, J.-M., and Randall, G. (2010). LSD: A fast line segment detector with a false detection control. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(4), 722–732.

9 Scalable Augmented Reality on Mobile Devices
Applications, Challenges, Methods, and Software
Xin Yang and K.T. Tim Cheng

CONTENTS
9.1 Applications 195
    9.1.1 Computer Vision-Based MAR Apps 196
    9.1.2 Sensor-Based MAR Apps 197
    9.1.3 MAR Apps Based on Hybrid Approaches 198
9.2 Challenges 199
9.3 Pipelines and Methods 200
    9.3.1 Visual Object Recognition and Tracking 201
        9.3.1.1 Marker-Based Methods 201
        9.3.1.2 Feature-Based Method 206
    9.3.2 Sensor-Based Recognition and Tracking 216
        9.3.2.1 Sensor-Based Object Tracking 216
    9.3.3 Hybrid-Based Recognition and Tracking 221
9.4 Software Development Toolkits 222
9.5 Conclusions 223
References 223

9.1 APPLICATIONS
In recent years, mobile devices such as smartphones and tablets have experienced
phenomenal growth. Their computing power has grown enormously and the integration of a wide range of sensors, for example, compass, accelerometer, gyroscope,
GPS, etc., has significantly enriched these devices' functionalities. The connectivity of smartphones has also gone through rapid evolution. A variety of radios, including cellular broadband, Wi-Fi, Bluetooth, and NFC, available in today's smartphones enable users to communicate with other devices, interact with the Internet, and exchange data with, and run computing tasks in, the cloud. These mobile
handheld devices equipped with cameras, sensors, low-latency data networks, and
powerful multicore application processors (APs) can now run very sophisticated
augmented reality (AR) applications. There is already a very rich collection of mobile AR (MAR) apps for visual guidance in assembly, maintenance, and training, interactive books, multimedia-augmented advertising, contextual information-augmented personal navigation, etc., that can be used anytime and anyplace.
A scalable MAR system, which has the ability to identify objects and/or locations from a large database and track the pose of a mobile device with respect to the physical objects, can support a wide range of MAR apps. Such a system can enable
applications such as nationwide multimedia-enhanced advertisements printed on
papers, augmenting millions of book pages in a library, and recognizing millions of
worldwide places of interests and providing navigation or contextual information, etc.
Broadly, scalable MAR apps can be categorized into three classes according to the
underlying techniques they rely on: computer vision-based, sensor-based, and hybrid.

9.1.1 Computer Vision-Based MAR Apps


Vision-based MAR apps rely on images captured by mobile cameras, and use
vision-based recognition and tracking algorithms to identify physical objects and
further link them to appropriate virtual objects. Specially, a conventional visionbased MAR pipeline consists of three main steps: (1) deriving a set of features
from a captured image frame and matching it to a database to recognize the object
(i.e., recognition), (2) tracking the recognized object from frame to frame by
matching features of consecutive frames (i.e., tracking), and (3) building precise
coordinate transforms (e.g., RANdom SAmple Consensus [RANSAC], Fischler
etal. 1981, or PROSAC, Chum etal. 2005) between the current image frame and
the recognized object in the database (i.e., pose estimation). Some details of visionbased MAR approaches will be presented in Section 9.3.1.
With rapid advances in real-time object recognition and tracking capability,
vision-based MAR apps are becoming increasingly popular. To date, we have seen
vision-based apps permeating into marketing (e.g., multimedia-augmented advertising), education (e.g., interactive books), social networks, search engines, and mobile
health. In the following section, we highlight some exemplar vision-based MAR
apps, illustrating the rich and exciting opportunities in these areas.
In marketing and education, vision-based MAR apps can significantly improve
the user experiences by overlaying related multimedia information or digital operations (e.g., mouse click) on paper-based advertisements (such as newspapers, handbills, and movie posters) and books. Picture the following app for reading a book or
newspaper: through the phone camera and touchscreen, you could highlight any text
or figure, which is automatically recognized and used for search on the Web. Such
search results are then displayed on the viewfinder to facilitate the reading. Vision-based MAR technology can overlay extra 3D pictures, videos, and sounds related to
the recognized object on the viewfinder in real time to provide the user an augmented
sense that mixes reality and virtuality. For example, the user can see flowers that are
not actually blooming when pointing the camera to a picture of bare branches or can
link handouts with the augmented digital information. Regarding the latter example,


systems that facilitate the development of scalable mobile-augmented papers can be


found in EMM (Yang et al. 2011a,b), FACT (Liao et al. 2010), and Mixpad (Yang et al. 2012) from FXPAL, and Mobile Retriever (Liu et al. 2008).
A wide range of MAR apps for social mining heavily rely on face recognition
technologies. For instance, SocialCamera from Viewdle allows smartphone users
to take photos with built-in, instant tagging. It uses face recognition technology to
identify people among the friends in your face database, and tag them automatically.
Therefore, it will only take a few clicks to share tagged mobile photos with friends
through Facebook, Line, MMS, or email.
Google Goggles is a MAR app that conducts searches based on pictures taken by
the phone. Its visual search relies on real-time image recognition to identify objects
in the picture as the starting point for search, often referred to as query-by-image.
For example, a user can take a picture of a famous landmark or a painting to search
for information about it, a picture of a product's barcode or a book cover to search
for online stores selling the product/book, a picture of a movie poster to view reviews
or to find tickets at nearby theaters, or a picture of a restaurant menu in French for
translation to English. Such a query-by-image capability allows users to search for
items without typing any text. For its image-based translation capability, the app
recognizes printed text and uses optical character recognition (OCR) to produce a
snippet and then translate it into another language.
There already exist several vision-based MAR apps for mobile health. These apps
could recognize food objects on the camera viewfinder, analyze their calories and
nutrition data, and then display the information using overlaid text and charts on the
viewfinder. While object recognition for most food items is quite feasible, the key
challenge for this app is the ability to estimate object volume (the serving size for
nutrition analysis). Accurate estimation would require knowing the distance between
the phone camera and the object.

9.1.2 Sensor-Based MAR Apps


Sensor-based MAR apps rely on the input from sensors (e.g., compass, GPS, NFC,
accelerometer, and gyroscope) to identify and track the geographical position of a
mobile device and its orientation. Using this information, specific contextual information, such as the route to a destination, nearby shops, points of interest, etc., can
be brought out from a database and overlaid on top of the real-world scene displayed
on the viewfinder. With more types and more accurate sensors integrated into each
new generation of smartphones, sensor-based MAR apps have become increasingly
important and are gaining popularity, especially in the application area of personal
navigation.
An exemplar navigation app for outdoor activity is Theodolite which utilizes
a phone display as a viewfinder to overlay GPS data (coordinates and elevation),
compass heading, attitude, and time data. It serves as a compass, GPS, map,
zoom camera, rangefinder, and inclinometer. The obvious uses for such an app
include backcountry pursuits, skiing, fishing or boating navigation, surveying,
landscaping, tracking the trajectory of objects, as well as search and rescue.
Spyglass is a similar app for outdoor activities. It overlays positional information


using available sensors of the phone and could be used as a waypoints tool, sextant, compass, rangefinder, speedometer, and inclinometer.
Other sensor-based MAR apps which serve for navigation focus on the scenario
of car driving. For instance, Wikitude Drive is a MAR navigation system for which
computer-generated driving instructions are drawn on top of the reality: the real road the user is driving on. Navigation thus takes place in real time in the smartphone's live camera image. This app solves a key problem that existing navigation systems have: the driver no longer needs to take his/her eyes off the road when looking at the navigation system. Other similar apps include Route 66 Maps + Navigation, which provides an amalgamation of comprehensive 3D maps and AR navigation to bring a fun and informative experience to drivers, and Follow Me, which can trace the user's exact route on the road, backed with real-time graphics and a virtual car that leads the user all the way to the destination.
In addition to navigation, sensor-based MAR is also applicable to educational
applications. A famous example is Star Walk, an interactive astronomy guide that overlays an AR view of the sky onto the actual sky outside. Using this app, users can align the
physical sky to the sky shown on the display of the phone/tablet. This allows for pinpoint precision for tracking satellites, finding stars, or finding constellations, offering an attractive educational tool for students, part-time star gazers, or astronomers.
As the user adjusts the orientation of phone/tablet toward the sky, with the help of
the accelerometer and gyroscope sensors, which offer highly accurate positioning,
the sky shown on the display will move along to match the physical sky. Finding celestial objects becomes easy: simply move the viewfinder over the object in the real-world view and click on it. The user can also search for an object (Jupiter, a satellite, or a specific constellation) to have it instantly come up on the viewfinder.
Then the app will guide the user to adjust the orientation of the device to match it up
with the object in the real sky.

9.1.3 MAR Apps Based on Hybrid Approaches


Sensor-based and vision-based methods have distinct advantages and disadvantages. Sensor-based methods can obtain the geographical location and relative pose of a mobile device from built-in inertial sensors, requiring few complex calculations. Therefore, they are attractive for any mobile platform with limited computing and memory resources. However, their accuracy is usually low due to the low-cost inertial sensors used in mobile handhelds. On the other hand, vision-based methods can achieve much better accuracy at the cost of high computational and memory complexity.
Many scalable MAR apps that demand both high accuracy and high efficiency
employ hybrid methods which combine the complementary advantages of both
approaches. InterSense (Naimark et al. 2002) is a good example which uses a
sensor-based inertial tracker to predict the potential positions of markers and
then leverages vision-based image analysis for the candidate regions to refine the
results. There are several other examples using magnetic and gyro sensors to stabilize the tracking systems (Jiang et al. 2004; Reitmayr et al. 2007; Ribo et al. 2002; Uchiyama et al. 2002).


9.2 CHALLENGES
Despite advances in computer vision and signal processing algorithms as well as
mobile hardware, scalable MAR remains very challenging due to the following
reasons:

1. The design objectives of modern mobile APs require more than just performance. Priority is often given to other factors such as low power consumption and a small form factor. Although the performance of mobile
CPUs has achieved greater than 30× improvement within a short period of 5 recent years (e.g., ARM quad-core Cortex-A15 in 2014 vs. ARM11 single-core in 2009), today's mobile CPU cores are still not powerful enough
to perform computationally intensive vision tasks such as sophisticated
feature extraction and image recognition algorithms. Graphics processing units (GPUs), which have been built into most APs, can help speed
up processing via parallel computing (Cheng et al. 2013; Terriberry et al.
2008), but most feature extraction and recognition algorithms are designed
to be executed sequentially and cannot fully utilize GPU capabilities. In
addition, mobile devices have less memory and lower memory bandwidth
than desktop systems. The memory of today's high-end smartphones, such
as Samsung Galaxy S5, is limited to 2 GB of SDRAM and the memory
size of mid- and entry-level phones is even smaller. This level of memory
sizes is not sufficient for performing local object recognition using a large
database. In order to realize efficient object recognition, the entire indexing structure of a database needs to be loaded and reside in main memory.
The total amount of memory usage for an indexing structure usually grows
linearly with the number of database images. For a database of a moderate size (e.g., tens of thousands of images), or a large size (e.g., millions of images), the indexing structure itself could easily exhaust memory
resources. Several scalable MAR systems employ the client-server model
to handle large databases. That is, sending the captured image or processed image data (e.g., image features) to a server (or a cloud) via the
Internet, performing object recognition and pose estimation on the server
side, and then sending the estimated pose and associated digital data back
to the mobile device. While Wi-Fi is a built-in feature for almost all mobile
devices, connection to high-bandwidth access points is still not available everywhere, nor at all times. For connection to data networks, today's mobile
devices rely on a combination of mobile broadband networks including
3G, 3.5G, and 4G. These networks, while providing acceptable network
access speed for most apps, cannot support real-time responses for apps
demanding a large amount of data transfer. Moreover, advanced mobile
broadband networks still have limited availability in sparsely populated areas.
2. For most algorithms, it is very challenging to achieve both good accuracy
and high efficiency. As mentioned in the previous section, sensor-based methods can achieve good efficiency, but their performance is often limited by


the low precision of the sensors used in mobile devices, limiting their applicability for apps demanding a high recognition rate and tracking accuracy.
In addition, sensor-based approaches do not provide information about
the objects in the camera picture. As a result, the presented information
could only be related to the direction and position of the device, not to a
specific object. On the other hand, vision-based methods usually require
significant computation and memory space to process the image which
consists of a large number of pixels, to match against a large database
and to estimate geometric transformation between the object within a
captured image and the recognized database object. A hybrid approach
that integrates vision-based and sensor-based methods can potentially
combine their complementary advantages; however, designing a fusion
solution that optimizes accuracy, efficiency, and robustness is not a trivial
task at all.

9.3 PIPELINES AND METHODS


Augmented reality links proper context with a real-world object/location and adjusts
the pose of the virtual data so as to render it at a correct position and with a correct
orientation on a real-world image. In order to do this, the system needs to know
where the user is and what the user is looking at. More specifically, a MAR system
needs to determine the location and orientation of the mobile device, calculate the
relative pose (location and orientation) of the device in real time, and then render
virtual objects in the correct place. Figure 9.1 illustrates a general pipeline of a scalable MAR system. Given the physical data (e.g., an image, the geographical location,
the direction) captured from a mobile handheld device, the system identifies objects/
locations of interests captured by the mobile device. The identification process is
usually conducted by processing the captured physical data to generate a description delineating the real-world scene, and then matching the description to a large
database. The database object that best matches the features of the captured object is considered the recognized object, and an initial pose is generated. After recognition, the movement of the recognized object is tracked.
FIGURE 9.1 A general pipeline for scalable augmented reality on mobile devices. (Reality data captured by the device, such as visual and audio data, location, direction, altitude, lighting, temperature, and pressure, are passed to object/location identification against a database, possibly in the cloud; the identified result and its associated augmented digital data feed pose tracking, and the relative pose is used to overlay the virtual data on reality to produce the final scene.)


The recognizer and the tracker are executed alternately to complement each other: the recognizer is activated whenever the tracker fails or new objects/locations appear, and the tracker bridges the recognized results and speeds up the process by avoiding the unnecessary recognition task, which is more computationally expensive than tracking. Databases
that contain objects for recognition and associated digital data can be stored either
in local storage space or in the cloud, depending on the size of databases, available
local storage resources, and performance requirement.

9.3.1 Visual Object Recognition and Tracking


As a camera is a built-in component in most MAR systems, methods based on visual
object recognition and tracking are of special interest in MAR. Researchers in
image processing, computer vision, pattern recognition, and machine learning have
developed a considerable number of object recognition and tracking methods. Some
of the successful ones for AR rely on predefined markers that consist of easily detectable
patterns. In a marker-based MAR system, each object of interest is associated with
a particular marker. By detecting, identifying, and tracking the marker using image
analysis techniques, an AR system can recognize the associated object and obtain
the correct scale and pose of the camera relative to the physical object.
Another category of approaches for visual object recognition and tracking is
based on visual feature extraction and matching. These methods use a prebuilt database containing precomputed features for all objects of interest. For each image
frame captured by a camera, the system extracts the same type of features from the
image and then matches the feature to database features. The corresponding object
of the best match in the database is then reported as the recognized object. For
tracking the recognized object, visual features extracted from consecutive frames
are derived and matched to track the movement of the object from frame to frame.
Marker-based methods and feature-based methods, having distinct advantages and
limitations, are not mutually exclusive and thus can be combined. In the following
section, we describe the procedure pipeline for each category and give an overview
of state-of-the-art methods for each step in the pipeline.
9.3.1.1 Marker-Based Methods
A good marker design should facilitate a quick and reliable detection, identification,
and tracking of the marker under various imaging conditions. It has been shown that
black-and-white markers are much more robust to various photometric changes and
background clutters than chromatic markers. Previous studies have also concluded
that a square is the simplest and most suitable shape for a marker. This is because
a system needs at least four pairs of corresponding points between two detected
markers from two image frames to estimate the camera pose. Four corner points of
a square are sufficient for homography estimation and can be reliably detected as
intersections of edge lines. Therefore, most existing MAR systems leverage black-and-white and square-shaped markers for recognition and tracking tasks, and in this section we focus our discussion on this type of marker.
Typically, a marker-based MAR system consists of three key components, as shown in Figure 9.2: (1) marker detection, in which regions that are likely to be markers are localized, (2) marker identification, in which candidate marker regions are verified and recognized, and (3) marker tracking and pose estimation, in which four pairs of corner points are used to estimate the homography transformation.

FIGURE 9.2 Illustration of the marker-based MAR pipeline. (Marker detection processes the captured image through color-to-gray conversion, edge detection, line fitting and corner detection, candidate marker detection, and postverification; the detected markers are passed to marker identification, based on marker templates, 2D barcodes, or imperceptive markers, and to marker tracking and pose estimation, which output the recognized object and camera pose.)
9.3.1.1.1 Marker Detection
The goal of marker detection is to find positions of markers, to delineate the
boundary of each marker, and to localize corner points of markers in an image.
A basic marker detection procedure is illustrated by blue blocks in Figure 9.2.
First, an RGB image is converted into an intensity image. Second, edge detection
is performed on the grayscale image to obtain a list of edge segments. Popular
edge detection operators include Canny edge operator (Canny 1986), Sobel operator (Gonzalez et al. 1992), and Laplacian operator (Haralick 1984). Line fitting
is then conducted based on the detected edges, which detects intersections of
two lines as potential corners. Then, regions which are defined by four straight
connecting lines and consist of four corners are considered as potential marker
candidates. Finally, candidate markers are verified by some effective and fast-to-compute criteria. Candidates that do not pass the verification procedure are
removed as false positives to avoid unnecessary processing for nonmarker regions
in the following identification and tracking steps. A simple yet effective verification scheme is based on the size of a candidate marker region, that is, rejecting
small regions with a limited number of pixels, since small regions are either false
positives or true markers but are too far away from a camera to achieve reliable
pose estimation. Another fast verification criterion is based on the histogram of
a region. As a marker consists of black-and-white colors, its histogram should be
bipolar. Based on this criterion, we could easily remove false positives which have
a relatively uniform histogram. In addition, depending on the particular appearance of a marker, a system could quickly reject obvious nonmarkers with high
confidence. For instance, 2D barcode markers have a number of sharp edges
inside a marker (i.e., edges between white and black cells). A heuristic yet efficient
criterion is to examine the frequency of intensity changes in two perpendicular
directions: if the frequency is below a predefined threshold, this region is rejected
as a false positive.
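As a rough illustration of the detection and verification steps just described, the sketch below uses OpenCV primitives (Canny edges, contour extraction, polygonal approximation to convex quadrilaterals, a minimum-area check, and a bipolar-histogram check). The specific thresholds and bin counts are arbitrary choices for illustration, not values from the chapter.

```python
# Detect candidate square markers in a BGR image and filter obvious non-markers.
import cv2
import numpy as np

def detect_marker_candidates(bgr, min_area=400):
    gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)           # color-to-gray conversion
    edges = cv2.Canny(gray, 50, 150)                        # edge detection
    contours, _ = cv2.findContours(edges, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
    candidates = []
    for c in contours:
        approx = cv2.approxPolyDP(c, 0.03 * cv2.arcLength(c, True), True)
        if len(approx) != 4 or not cv2.isContourConvex(approx):
            continue                                        # keep only convex quadrilaterals
        if cv2.contourArea(approx) < min_area:
            continue                                        # reject regions that are too small
        x, y, w, h = cv2.boundingRect(approx)
        hist = cv2.calcHist([gray[y:y + h, x:x + w]], [0], None, [16], [0, 256]).ravel()
        hist /= max(hist.sum(), 1)
        if hist[:4].sum() + hist[-4:].sum() < 0.6:
            continue                                        # histogram should be roughly bipolar
        candidates.append(approx.reshape(4, 2))
    return candidates
```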


9.3.1.1.2 Marker Identification


A scalable MAR system utilizes a large database, in which each element is linked to
corresponding virtual data, digital interactions, etc. The goal of marker identification is to find a matched element from the database for a captured marker so that the
system knows which virtual data should be overlaid on top of the current real-world
scene. Techniques used for marker identification depend on the marker type. Two
popular types of markers widely used for AR are template-based markers and 2D
barcode markers.
Template markers are black-and-white markers which have a simple image inside
a black border (as shown in Figure 9.3a). Identification of a template marker is typically based on template matching, that is, comparing the marker image captured from
the detection process to a database consisting of all marker templates. Specifically, a
marker region is first cropped from the captured image and then it is rectified to be
a square and scaled to the same size as marker templates in the database. After that,
the rectified and scaled marker is rotated into four different orientations (0°, 90°, 180°, and 270°). For each orientation, an overall best match in the database with the
highest similarity value to the rotated marker is identified. If the highest similarity
value is greater than a threshold, the region is considered as a recognized marker;
otherwise, it is unrecognized and the system rejects it. Several similarity metrics
have been proposed for matching, such as the sum of squared differences (SSD),
mutual information, etc. (Ashby et al. 1988).
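A minimal sketch of this identification step is shown below: the detected quadrilateral is rectified with a perspective warp and compared against each stored template using SSD, rotating the template rather than the rectified marker as an equivalent shortcut. The template size and the acceptance threshold are illustrative assumptions.

```python
# Identify a template marker by SSD matching over four 90-degree rotations.
import cv2
import numpy as np

def identify_template(gray, corners, templates, size=32, max_ssd=0.15):
    dst = np.float32([[0, 0], [size - 1, 0], [size - 1, size - 1], [0, size - 1]])
    H = cv2.getPerspectiveTransform(np.float32(corners), dst)     # rectify the quad
    patch = cv2.warpPerspective(gray, H, (size, size)).astype(np.float32) / 255.0
    best_id, best_ssd = None, np.inf
    for marker_id, tpl in enumerate(templates):                   # tpl: size x size, in [0, 1]
        for k in range(4):                                        # 0, 90, 180, 270 degrees
            d = patch - np.rot90(tpl, k)
            ssd = float(np.mean(d * d))
            if ssd < best_ssd:
                best_id, best_ssd = marker_id, ssd
    return (best_id, best_ssd) if best_ssd < max_ssd else (None, best_ssd)
```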
FIGURE 9.3 Illustration of 2D markers. (a) Template marker. Barcode markers: (b) QR code, (c) DataMatrix, and (d) PDF417.

One major limitation of template matching is that a system needs to match a marker candidate region against all marker templates in the database. Matching each pair of marker images involves processing of all corresponding pixels of the two
images. As a result, the total runtime for template matching is nontrivial and grows
linearly with the number of templates in the database, prohibiting its usage for a
large database. To speed up the matching process, markers are often down-sampled
to a small size, that is, 16 16 or 32 32, to reduce the amount of pixels needed to be
compared. However, the scalability and accuracy of a MAR system could be greatly
restricted by the size of quantized markers. For a quantized image size of 16 × 16 inside a marker, the maximum number of distinct patterns that can be generated is (16 × 16)² = 65,536 (each pixel can be either 1 or 0). In other words, the upper bound
on the number of distinct markers is limited to 65,536. In practice, due to the photometric changes, inaccuracy in the detection process and other noise sources, the
number of markers that can be correctly and reliably detected could be even smaller
than this number. Due to these limitations, template markers are not suitable for scalable MAR apps which use a large database.
2D barcode markers are markers consisting of frequently changing black-and-white data cells and possibly a border or other landmarks (as shown in Figure 9.3b
through d). A system identifies a 2D barcode marker by decoding the encoded information in it. Typically, the decoding process is performed by sampling the pixel
values from the calculated center of each cell and then resolving the cell values from them. The resolved cell value can either be interpreted as a binary number (i.e., 0 or 1) or link to more information (e.g., ASCII characters) via a database.
Popular 2D barcode standards include QR code (Information Technology 2006a),
DataMatrix (Information Technology 2006b), and PDF417 (Information Technology
2006c), which were originally developed for logistics and tagging purposes but
are also used for AR apps. In addition to these three standards that will be briefly
described in the following section, there are many other standards (e.g., MaxiCode,
Aztec Code, SPARQCode) that might be used for tracking in some applications too.
QR code (Figure 9.3b) is a 2D barcode created by the Japanese corporation Denso Wave in 1994. QR is the abbreviation for Quick Response, as the code is intended for
high-speed decoding. QR code became popular for mobile tagging applications and
is the de facto standard in Japan. QR code is flexible and has large storage capacity.
A single QR code symbol can contain up to 7089 numeric characters, 4296 alphanumeric characters, 2953 bytes of binary data, or 1817 Kanji characters. Therefore, QR
codes are very suitable for large-scale MAR apps.
DataMatrix (Figure 9.3c) is another popular barcode marker which is famous for
marking small items such as electronic components. The DataMatrix can encode
up to 3116 characters from the entire ASCII character set with extensions. The
DataMatrix barcode is also used in mobile marketing under the name SemaCode.
PDF417 (Figure 9.3d) was developed in 1991 by Symbol (recently acquired by
Motorola). A single PDF417 symbol can be considered multiple linear barcode rows
stacked above each other. A single PDF417 symbol can theoretically hold up to 1850
alphanumeric characters, 2710 digits, or 1108 bytes. The exact data capacity depends
on the structure of the data to be encoded; this is due to the internal data compression algorithms used during coding. The ratio of the widths of the bars (or spaces) to
each other encodes the information in a PDF417 symbol. For that reason, the printing
accuracy and a suitable printer resolution are important for high-quality PDF417


symbols. This also makes PDF417 the least suitable for AR applications where the
marker is often under perspective transformation.
9.3.1.1.3 Marker Tracking
The main idea of AR is to present virtual objects in a real environment as if they were
part of it. Virtual objects should move and change their pose in accordance with the movement of the mobile camera. Tracking the camera pose (i.e., camera location and orientation) in real time is required in order to render the virtual object in the right scale and
perspective. The pose of a camera relative to a marker in the real scene can be uniquely
determined from a minimum of four corresponding points between the marker in the
real scene and the marker on the camera image plane. Note that the four points used for
determining the camera pose need to be coplanar but noncollinear. Marker-based object
tracking uses the four corners of a square marker, which can be reliably detected, for this
purpose. We define a transformation T between a camera and a marker as

\[
\begin{bmatrix} x \\ y \\ 1 \end{bmatrix}
= T \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}
= \begin{bmatrix}
r_{11} & r_{12} & r_{13} & t_x \\
r_{21} & r_{22} & r_{23} & t_y \\
r_{31} & r_{32} & r_{33} & t_z \\
0 & 0 & 0 & 1
\end{bmatrix}
\begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}
\qquad (9.1)
\]

where
[X Y Z 1]^T is the homogeneous representation of a marker corner's coordinates in the earth coordinate system
[x y 1]^T is its projected coordinates on the image plane
Once an initial camera pose is obtained, the system can keep tracking the marker on the
image plane by constructing corner correspondences between consecutive frames and
computing the transformation matrix between two frames based on the correspondences.
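Rather than solving Equation 9.1 by hand, an implementation can pass the four corner correspondences to an off-the-shelf PnP solver. The sketch below uses OpenCV's solvePnP with the IPPE_SQUARE flag (available in recent OpenCV versions) and assumes a calibrated camera matrix K and a known physical marker size; it is one possible realization, not the chapter's own formulation.

```python
# Estimate the camera-from-marker transformation T from four marker corners.
import cv2
import numpy as np

def marker_pose(image_corners, marker_size, K, dist=None):
    s = marker_size / 2.0
    # 3D corner coordinates in the marker (earth) coordinate system, on the Z = 0 plane,
    # in the corner order required by SOLVEPNP_IPPE_SQUARE.
    object_pts = np.float32([[-s, s, 0], [s, s, 0], [s, -s, 0], [-s, -s, 0]])
    image_pts = np.float32(image_corners)                   # 4x2 detected corners, same order
    ok, rvec, tvec = cv2.solvePnP(object_pts, image_pts, K, dist,
                                  flags=cv2.SOLVEPNP_IPPE_SQUARE)
    if not ok:
        return None
    R, _ = cv2.Rodrigues(rvec)                              # 3x3 rotation [r_ij]
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, tvec.ravel()                   # [R | t; 0 0 0 1], cf. Equation 9.1
    return T
```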
9.3.1.1.4 Discussions
2D barcode identification directly decodes the information in a marker without
demanding an enormous amount of computation for image matching. In addition,
marker-based tracking only needs to detect four corners of a marker and estimate
the camera pose according to Equation 9.1. Therefore, the barcode marker-based
method is time efficient so as to provide real-time performance for many MAR apps.
Furthermore, 2D barcode markers have a large storage capacity and thus can support
applications which require high scalability. However, barcode markers need to be
printed on or attached to objects beforehand for association with specific contents.
Thus, they are visually obtrusive and for some outdoor scenarios (e.g., landmarks)
attaching markers to objects is not feasible. Moreover, marker-based methods are
sensitive to occlusion. These limitations may lead to a poor user experience. Feature-based methods can overcome these limitations, while their slower speed and greater
memory usage could be two major issues.


9.3.1.2 Feature-Based Method


Local features (an example shown in Figure 9.4a) have been used for various computer vision apps, including object recognition, tracking, image registration, etc.
Different from the conventional global feature extraction which generates a single
feature vector for an entire image, local feature extraction generates a set of high-dimensional feature vectors for an image. Local feature extraction typically consists
of two steps: (1) interest point detection, also referred to as local feature detection,
which selects a set of salient points in an image, and (2) interest point description,
also referred to as local feature description, which transforms a small image patch
around a feature point into a vector representation suitable for further processing.
In comparison with a global feature representation, it has been demonstrated that
local features are more robust to various geometric and photometric transformations,
occlusion, and background clutters. As a result, to ensure a more satisfactory user
experience, many existing MAR systems choose a local feature representation for
object recognition and tracking.
A typical flow for a local feature-based MAR is as follows. In the offline phase,
local feature extraction is performed for every database image. An indexing structure which encodes feature descriptors of all database images is constructed. In the
recognition phase, local features of a captured image are first extracted, each of which is then used to query the database using the indexing structure to find a matching local feature in the database. The database image which has the most matching features with the captured image is considered the recognized target. An initial
camera pose is then estimated based on the corresponding matches using RANSAC
or PROSAC algorithms. In the tracking phase, local features between consecutive
frames are compared. Corresponding matches are used to track the movement of
cameras between frames. In the following, we review state-of-the-art methods for
each step.
9.3.1.2.1 Local Feature Extraction
The efficiency, robustness, and distinctiveness of local feature representation significantly affect the user experience and scalability of a MAR system. In this part,
we review relevant work about interest point detection and description, and present
their latest advances for scalable MAR. However, there is an enormous breadth and amount of results in this field. With limited space, we can only afford to review a small subset of representative results that are most relevant to the application of scalable MAR.

FIGURE 9.4 (a) An exemplar image overlaid with detected local features. (b) and (c) are the discretized and cropped Gaussian second-order partial derivatives in the y-direction and the xy-direction, respectively; (d) and (e) are the SURF box-filter approximations for Lyy and Lxy, respectively.
9.3.1.2.1.1 Interest Point Detection  An interest point detector is an operator
which attributes a saliency score to each pixel of an image and then chooses a subset
of pixels with local maximum scores. A good detector should provide points that
have the following properties: (1) repeatability (or robustness), that is, given two
images of the same object under different image conditions, a high percentage of
points on the object can be visible in both images; (2) distinctiveness, that is, the
neighborhood of a detected point should be sufficiently informative so that the point
can be easily distinguished from other detected points; (3) efficiency, that is, the
detection in a new image should be sufficiently fast to support time-critical applications; and (4) quantity, that is, a typical image should contain a sufficient number
of points to cover the target object, so that it can be recognized even under partial
occlusion.
A wide variety of interest point detectors exist in the literature. Some lightweight detectors (Rosten et al. 2006) aim at high efficiency to target applications that demand real-time performance and/or mobile hardware platforms that have limited computing resources. However, the performance of these detectors is relatively poor. As a result, pose verification is required to exclude false matches in the matching phase, which often incurs a nontrivial runtime. On the other hand, several high-quality feature detectors (Bay et al. 2006, 2008; Lowe 2004) have been developed with a primary focus on robustness and distinctiveness. These detectors' ability to accurately localize correct targets from a large database makes them suitable for large-scale object recognition. However, the computational complexity of these detectors is usually very high, making them inefficient on a mobile device. Some recent efforts, for example (Yang et al. 2012a), have been made to adapt these feature detection algorithms to mobile devices and optimize their performance and efficiency for MAR. Due to space limitations, we review only the most representative work for the lightweight detector, the high-quality detector, and algorithm adaptation. A thorough survey of local feature-based detectors can be found in Tuytelaars et al. (2008).
Lightweight detector: FAST. The FAST (features from accelerated segment test)
detector, proposed by Rosten et al. (2006), has become popular recently due to its
highly efficient processing pipeline. The basic idea of FAST is to compare the 16 pixels
located on the boundary of a circle (radius 3) around a central point; each pixel is
labeled with an integer from 1 to 16 clockwise. If the intensities of n (n ≥ threshold)
consecutive pixels are all higher or all lower than that of the central pixel, then the central
pixel is labeled as a potential feature point and n is defined as the response value
for the central pixel. The final set of feature points is determined after applying
a nonmaximum suppression step (i.e., if the response value of a point is the local
maximum within a small region, this point is considered as a feature point). Since
the FAST detector only involves a set of intensity comparisons with few arithmetic
operations, it is highly efficient.
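To make the segment test concrete, the following sketch (our own simplified illustration in Python/NumPy, not the reference FAST implementation; it omits the machine-learned decision tree and the nonmaximum suppression step, and the helper name is_fast_corner is ours) checks whether n contiguous circle pixels are all brighter or all darker than the center by a threshold t.

```python
import numpy as np

# Offsets of the 16 pixels on a Bresenham circle of radius 3, listed clockwise.
CIRCLE = [(0, 3), (1, 3), (2, 2), (3, 1), (3, 0), (3, -1), (2, -2), (1, -3),
          (0, -3), (-1, -3), (-2, -2), (-3, -1), (-3, 0), (-3, 1), (-2, 2), (-1, 3)]

def is_fast_corner(img, r, c, t=20, n=9):
    """Simplified FAST segment test: returns True if n consecutive circle pixels
    are all brighter than I(r, c) + t or all darker than I(r, c) - t."""
    center = int(img[r, c])
    # Label each circle pixel: +1 brighter, -1 darker, 0 similar to the center.
    labels = []
    for dr, dc in CIRCLE:
        p = int(img[r + dr, c + dc])
        labels.append(1 if p > center + t else (-1 if p < center - t else 0))
    # Duplicate the list so contiguous runs that wrap around index 0 are found.
    labels = labels + labels
    for sign in (1, -1):
        run = 0
        for v in labels:
            run = run + 1 if v == sign else 0
            if run >= n:
                return True
    return False

img = (np.random.rand(32, 32) * 255).astype(np.uint8)
print(is_fast_corner(img, 16, 16))
```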


The FAST detector is not invariant to scale changes. To achieve scale invariance,
Rublee et al. (2011) proposed to employ a scale pyramid of an image and detect
FAST feature points at each level in the pyramid. FAST can also incur large responses
along edges, leading to a lower repeatability and distinctiveness compared to high-quality
detectors such as SIFT (Lowe 2004) and SURF (Bay et al. 2006, 2008). To
address this limitation, Rublee et al. employed a Harris corner measure to order the
FAST feature points and discard those with small responses to the Harris measure.
High-quality detector: SURF. The SURF (Speeded Up Robust Feature) detector,
proposed by Bay et al. (2006, 2008), is one of the most popular high-quality point
detectors in the literature. It is scale-invariant and based on the determinant of the
Hessian matrix H(X, σ):

\[
H(X, \sigma) =
\begin{bmatrix}
L_{xx}(X, \sigma) & L_{xy}(X, \sigma) \\
L_{xy}(X, \sigma) & L_{yy}(X, \sigma)
\end{bmatrix}
\tag{9.2}
\]

where
X = (x, y) is a pixel location in an image I
σ is a scale factor
L_xx(X, σ) is the convolution of the Gaussian second-order derivative in the x direction
with the image at X; similarly for L_yy and L_xy (see Figure 9.4b and c).
To speed up the process, a SURF detector approximates the Gaussian second-order
partial derivatives with a combination of box filter responses (see Figure 9.4d and e),
computed using the integral image technique (Simard et al. 1998). The approximated
derivatives are denoted as Dxx, Dxy, and Dyy and accordingly the approximate Hessian
determinant is

\[
\det(H_{\mathrm{approx}}) = D_{xx} D_{yy} - (0.9\, D_{xy})^2
\tag{9.3}
\]

A SURF detector computes Hessian determinant values for every image pixel over
multiple scales, using box filters of successively larger sizes, yielding a determinant
pyramid for the entire image. Then it applies a 3 × 3 × 3 local maximum extraction over
the determinant pyramid to select interest point locations and the corresponding salient
scales.
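The constant-time box sums behind this approximation are easy to see in code. The sketch below (a minimal NumPy illustration under our own assumptions; the function names and box sizes are ours, and the Dyy-style layout is only indicative of Figure 9.4d, not the exact 9 × 9 SURF filter) builds an integral image and evaluates any box sum with four lookups, which is why each response costs only a handful of additions regardless of filter size.

```python
import numpy as np

def integral_image(img):
    # ii[r, c] = sum of img[0:r, 0:c]; padded with a leading row/column of zeros.
    return np.pad(img, ((1, 0), (1, 0)), mode="constant").cumsum(0).cumsum(1)

def box_sum(ii, r0, c0, r1, c1):
    # Sum of img[r0:r1, c0:c1] from four lookups into the integral image.
    return ii[r1, c1] - ii[r0, c1] - ii[r1, c0] + ii[r0, c0]

img = np.random.rand(480, 640).astype(np.float32)
ii = integral_image(img)

# A Dyy-style response: three vertically stacked boxes weighted +1, -2, +1
# (box sizes and placement are illustrative, not the exact SURF filter layout).
r, c = 100, 200
top = box_sum(ii, r - 6, c - 4, r - 2, c + 5)
mid = box_sum(ii, r - 2, c - 4, r + 2, c + 5)
bot = box_sum(ii, r + 2, c - 4, r + 6, c + 5)
print(top - 2 * mid + bot)
```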
To achieve rotation invariance, SURF relies on gradient histograms to identify a
dominant orientation for each detected point. An image patch around each point is
rotated to its dominant orientation before computing a feature descriptor. Specifically,
the dominant orientation of a SURF detector is computed as follows. First, the entire
orientation space is quantized into N histogram bins, each of which represents a
sliding orientation window covering an angle of π/3. Then SURF computes the gradient
responses of every pixel in a circular neighborhood of an interest point. Based on
the gradient orientation of a pixel, SURF maps it to the corresponding histogram
bins and adds its gradient response to these bins. Finally, the bin with the largest
response is used to calculate the dominant orientation of the interest point.


TABLE 9.1
Comparison of FAST and SURF Detectors on a Mobile Device and a PC

Detector          Mobile Device (ms)    PC (ms)    Speed Gap (×)
FAST detector            170               40            4
SURF detector           2156              143           15
Compared to FAST, SURF point detection involves much more complex computations
and, thus, is much slower. The runtime limitation of SURF is
further exacerbated when running a SURF detector on a mobile platform. Table 9.1
compares the runtime performance of a FAST detector and a SURF detector running
on a mobile device (Motorola Xoom1) and a laptop (Thinkpad T420), respectively.
Running a FAST detector takes 170 ms on a Motorola Xoom1 and 40 ms on
an i5-based laptop, yielding a 4× speed gap. However, running a SURF detector on
them takes 2156 and 143 ms, respectively, indicating a 15× speed gap.
Although the FAST detector is more efficient than SURF, it cannot match SURF's
robustness and distinctiveness. As a result, it usually fails to achieve satisfactory
performance for MAR apps that demand high recognition accuracy from a large
database and/or handling of content with large photometric/geometric changes.
Algorithm adaptation: Accelerating SURF on mobile devices. There are several
techniques for improving SURF's efficiency by exploiting coherency between consecutive
frames (Ta et al. 2009), employing GPUs for parallel computing, or optimizing
various aspects of the implementation (Terriberry et al. 2008). An interesting
solution proposed recently (Yang et al. 2012a) is to analyze the causes of a SURF
detector's poor efficiency and large overhead on a mobile platform, and to propose a set
of techniques to adapt the SURF algorithm to a mobile platform. Specifically, two mismatches
between the computations used in the existing SURF algorithm and common
mobile hardware platforms are identified as the sources of significant performance
degradation:
Mismatch between the data access pattern and the small cache size of a mobile
platform. A SURF detector relies on an integral image and accesses it
using a sliding window of successively larger size for different scales. But
a 2D array is stored in a row-based fashion in memory (cache and DRAM),
not in a window-based fashion; pixels in a single sliding window reside
in multiple memory rows (illustrated in Figure 9.5a). The data cache size
of a mobile AP, typically 32 kB for today's devices, is too small to cache
all memory rows for the pixels involved in one sliding window, leading to
cache misses and cache line replacements and, in turn, incurring expensive
memory accesses.
Mismatch between the huge number of data-dependent branches in the algorithm
and the high pipeline hazard penalty of the mobile platform. To identify a dominant
orientation, a SURF detector analyzes gradient histograms.




FIGURE 9.5 Illustration of data locality and access pattern in (a) the original SURF detector and (b) the tiled SURF. Each color represents data stored in a unique DRAM row. In the
original SURF, a sliding window needs to access multiple DRAM rows, leading to frequent
cache misses, while in tiled SURF, all required data within a sliding window can be cached.

During this analysis, every pixel around an interest point is mapped to its
corresponding histogram bins via a set of branch operations, that is, If-then-Else
expressions. The total number of pixels involved in this analysis is huge.
Thus, the entire process involves an enormous number of data-dependent
branch operations. However, the branch predictor and the speculative
out-of-order execution of an ARM-based mobile CPU core are usually
not as advanced as those of a laptop or desktop processor. Consequently,
it incurs high pipeline hazard penalties, yielding significant performance
degradation.
To address the problem caused by the mismatch between the data access pattern of
SURF and the small cache size of a mobile CPU, a tiled SURF was proposed in Yang
et al. (2013a), which divides an image into tiles (illustrated in Figure 9.5b) and performs
point detection for each tile individually to exploit local spatial coherence and
reduce external memory traffic. To avoid pipeline hazard penalties, two solutions
were proposed in Yang et al. (2013a) to remove data-dependent branch operations.
The first solution is to use an alternative implementation: instead of using
If-then-Else expressions, a lookup table is used to store the correlations between each
orientation and the corresponding histogram bins. This solution does not change
the functionality or other computations, but trades memory for speed. The second
solution is to replace the original gradient histogram method with a branching-free
orientation operator based on gradient moments (i.e., GMoment) (Rosin 1999). The
gradient moment-based method may slightly degrade the robustness of a SURF
detector but can greatly improve the speed on mobile platforms.
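The branch-removal idea can be sketched as follows (our own illustration, not the code of Yang et al.; the bin count and 1-degree angle quantization are assumed values): the mapping from a gradient orientation to the π/3-wide histogram windows that cover it is precomputed once into a lookup table, so accumulating a pixel requires only a table read instead of a chain of data-dependent If-then-Else comparisons.

```python
import numpy as np

N_BINS = 36            # number of sliding orientation windows (illustrative)
ANGLE_STEPS = 360      # quantize angles to 1-degree steps for the table

# Precompute, for every quantized angle, the indices of all bins whose
# pi/3-wide (60 degree) window covers that angle.  Built once, reused per pixel.
bin_centers = np.arange(N_BINS) * (360.0 / N_BINS)
lut = []
for a in range(ANGLE_STEPS):
    diff = np.abs((a - bin_centers + 180.0) % 360.0 - 180.0)   # circular distance
    lut.append(np.nonzero(diff <= 30.0)[0])                    # half-width 30 degrees

def accumulate(hist, angle_deg, magnitude):
    """Branch-free bin lookup: one table read replaces the comparison chain."""
    for b in lut[int(angle_deg) % ANGLE_STEPS]:
        hist[b] += magnitude

hist = np.zeros(N_BINS)
accumulate(hist, 47.3, 1.0)
print(np.argmax(hist))
```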
Tables 9.2 and 9.3 compare the runtime cost and the Phone-to-PC runtime ratio
between the original SURF and adapted SURF, respectively (Yang et al. 2012a).


TABLE 9.2
Runtime Cost Comparison on Three Mobile Platforms

Time (ms)                    Droid    Thunderbolt    Xoom1
U-SURF                        1310        525          461
U-SURF tiling                  930        356          243
O-SURF                        7700       2495         2156
O-SURF lookup table           4264       1820         1178
O-SURF GMoment                1516        613          519
O-SURF Tiling + GMoment       1053        404          269

TABLE 9.3
Speed Ratio Comparison on Three Mobile Platforms

Phone-to-PC Ratio (×)        Droid    Thunderbolt    Xoom1
U-SURF                         20          8            7
U-SURF tiling                  14          7            4
O-SURF                         54         17           15
O-SURF lookup table            18          7            6
O-SURF GMoment                 19          8            7
O-SURF Tiling + GMoment        13          7            3

The Phone-to-PC ratio, defined in Equation 9.4, is the runtime of a program running
on a mobile CPU divided by that on a desktop CPU, which reflects the speed gap
between them.

\[
\text{Phone-to-PC ratio} = \frac{\text{Runtime on mobile platform}}{\text{Runtime on x86-based PC}}
\tag{9.4}
\]

The evaluation experiments were performed on three mobile devices: a Motorola
Droid, which features an ARM Cortex-A8 processor; an HTC Thunderbolt, which
uses a Scorpion processor; and a Motorola Xoom1, which uses a dual-core ARM
Cortex-A9 processor. The first two rows of Tables 9.2 and 9.3 compare the runtime
cost and the Phone-to-PC ratio of upright SURF (U-SURF) without and with tiling.
As expected, tiling greatly reduces the runtime cost, by 29%–47%. It reduces
the Phone-to-PC ratio by 12.5%–42.9% on the three devices. The reduction in
Phone-to-PC ratio indicates that the mismatch between data access pattern and
a small cache size of a mobile CPU causes more severe runtime degradation on
mobile CPUs than desktop CPUs. So alleviating this problem is critical for performance optimization when porting algorithms to a mobile CPU. The third to
fifth rows of Tables 9.2 and 9.3 compare the results of oriented SURF (O-SURF)
with branch operations, O-SURF using a lookup table and using GMoment (Rosin
1999), respectively. Results show that using a lookup table or using the GMoment
method can greatly reduce the overall runtime and the Phone-to-PC ratio on all three
platforms. The reduction in the Phone-to-PC ratio further confirms that branch
hazard penalty has a much greater runtime impact on a mobile CPU than on a
desktop CPU. Choosing proper implementations or algorithms to avoid such penalties
is critical for a mobile task. The last rows of Tables 9.2 and 9.3 show the
results with both adaptations applied to O-SURF: compared to the
original SURF, the two adaptations reduce the runtime on mobile platforms
by 6×–8×.
9.3.1.2.1.2 Local Feature Description  Once a set of interest points has been
extracted from an image, their content needs to be encoded in descriptors that are
suitable for matching. In the past decade, the most popular choices for this step have
been the SIFT descriptor and the SURF descriptor. SIFT and SURF have successfully
demonstrated their good robustness and distinctiveness in a variety of computer
vision applications. However, the computational complexity of SIFT is too
high for real-time applications with tight time constraints. Although SURF accelerates
SIFT by 2×–3×, it is still not sufficiently fast for real-time applications running
on a mobile device. In addition, SIFT and SURF are high-dimensional real-valued
vectors, which demand large storage space and high computing power for matching.
Recently, the booming development of real-time mobile apps has stimulated a rapid
development of binary descriptors that are more compact and faster to compute than
SURF-like features while maintaining a satisfactory feature quality. Notable work
includes BRIEF (Calonder et al. 2010) and its variants rBRIEF (Rublee et al. 2011),
BRISK (Leutenegger et al. 2011), FREAK (Alahi et al. 2012), and LDB (Yang et al.
2012b, 2014a,b). In the following section, we review three representative descriptors:
SURF, BRIEF, and LDB.
SURF: Speeded Up Robust Features. The SURF descriptor aims to achieve robustness
to lighting variations and small positional shifts by encoding the image information
in a localized set of gradient statistics. Specifically, each image patch is
divided into 4 × 4 grid cells. In each cell, SURF computes a set of summary statistics,
Σdx, Σ|dx|, Σdy, and Σ|dy|, resulting in a 64-dimensional descriptor. The first-order
derivatives dx and dy can be calculated very efficiently using box filters and
integral images.
Motivated by the success of SURF, a further optimized version has been proposed
in Terriberry et al. (2008) that takes advantage of the computational power
available in current CUDA-enabled graphics cards. This GPU-SURF implementation
has been reported to perform feature extraction on a 640 × 480 image at a
frame rate of up to 20 Hz, thus making feature extraction a truly affordable processing
step. However, to date, most mobile GPU cores do not support CUDA, and thus
porting an implementation from desktop-based GPUs to mobile GPUs remains a
tedious task.
BRIEF: Binary robust independent elementary features. The BRIEF descriptor,
proposed by Calonder et al. (2010), primarily aims at high computational efficiency for construction and matching, and a small footprint for storage. The basic
idea of BRIEF is to directly generate bit strings by simple binary tests comparing


pixel intensities in an image patch. More specifically, a binary test τ is defined and
performed on a patch p of size S × S as

\[
\tau(p; x, y) =
\begin{cases}
1 & \text{if } I(p, x) < I(p, y) \\
0 & \text{otherwise}
\end{cases}
\tag{9.5}
\]

where I(p, x) is the pixel intensity at location x = (u, v)^T. Choosing a set of n_d (x, y)
location pairs uniquely defines the binary test set and consequently leads to an
n_d-dimensional bit string that corresponds to the decimal counterpart of

\[
\sum_{1 \le i \le n_d} 2^{\,i-1}\, \tau(p; x_i, y_i)
\tag{9.6}
\]

By construction, the tests of Equation 9.6 consider only the information at single
pixels; therefore, the resulting BRIEF descriptors are very sensitive to noise. To
increase stability and repeatability, the authors proposed to smooth the pixels of
every pixel pair using Gaussian or box filters before performing the binary tests.
The spatial arrangement of binary tests greatly affects the performance of the
BRIEF descriptor. In Calonder et al. (2010), the authors experimented with five
sampling geometries for determining the spatial arrangement. Experimental results
demonstrate that tests randomly sampled from an isotropic Gaussian
distribution, Gaussian(0, S²/25), with the origin of the coordinate system at the
center of the patch, give the highest recognition rate.
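A minimal sketch of the scheme (ours, not the reference BRIEF code; the patch size, descriptor length, and random seed are arbitrary choices) samples the test locations from the isotropic Gaussian described above, builds the bit string of Equations 9.5 and 9.6, and matches descriptors with XOR-and-bit-count Hamming distance.

```python
import numpy as np

rng = np.random.default_rng(0)
S, ND = 31, 256                                   # patch size and descriptor length

# Sample n_d (x, y) test-location pairs from an isotropic Gaussian(0, S^2/25),
# clipped to the patch, with the patch center as the coordinate origin.
pairs = np.clip(rng.normal(0.0, S / 5.0, size=(ND, 4)), -(S // 2), S // 2).astype(int)

def brief(patch):
    """Binary descriptor: bit i = 1 iff I(x_i) < I(y_i) (Equation 9.5).
    In practice the patch would be smoothed first; omitted here for brevity."""
    c = S // 2
    bits = np.empty(ND, dtype=np.uint8)
    for i, (x1, y1, x2, y2) in enumerate(pairs):
        bits[i] = 1 if patch[c + y1, c + x1] < patch[c + y2, c + x2] else 0
    return np.packbits(bits)                      # 256 bits -> 32 bytes

def hamming(d1, d2):
    # XOR the packed byte strings and count the set bits.
    return int(np.unpackbits(np.bitwise_xor(d1, d2)).sum())

p1 = rng.integers(0, 256, (S, S), dtype=np.uint8)
p2 = np.clip(p1.astype(int) + rng.integers(-5, 6, (S, S)), 0, 255).astype(np.uint8)
print(hamming(brief(p1), brief(p2)), "of", ND, "bits differ")
```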
LDB: Local difference binary. Binary descriptors such as BRIEF and a list of
enhanced versions of BRIEF (Alahi et al. 2012; Leutenegger et al. 2011; Rublee
et al. 2011) are very efficient to compute, to store, and to match (simply computing
the Hamming distance between descriptors via XOR and bit-count operations).
These runtime advantages make them more suitable for real-time applications
and handheld devices. However, these binary descriptors utilize overly simplified
information, that is, only the intensities of a subset of pixels within an image
patch, and thus have low discriminative ability. The lack of distinctiveness incurs
an enormous number of false matches when matching against a large database.
Expensive post-verification methods (e.g., RANSAC, Fischler et al. 1981) are usually
required to discover and validate matching consensus, increasing the runtime
of the entire process.
Local difference binary (LDB), a binary descriptor, achieves computational speed
and robustness similar to BRIEF and other state-of-the-art binary descriptors,
while offering greater distinctiveness. The high quality of LDB is achieved through
three schemes. First, LDB utilizes the average intensity Iavg and the first-order gradients,
dx and dy, of grid cells within an image patch. Specifically, the internal patterns of the
image patch are captured through a set of binary tests, each of which compares the Iavg,
dx, and dy of a pair of grid cells (illustrated in Figure 9.6a and b). The average intensity
and gradients capture both the DC and AC components of a patch; thus, they
provide a more complete description than other binary descriptors. Second, LDB


FIGURE 9.6 Illustration of LDB extraction. (a) An image patch is divided into 3 × 3 equal-sized
grids. (b) The intensity summation (I) and the gradients in the x and y directions (dx and dy) are
computed for each grid cell, and I, dx, and dy are compared between every unique pair of grids.
(c) Three-level gridding (with 2 × 2, 3 × 3, and 4 × 4 grids) is applied to capture information at
different granularities.

employs a multiple gridding strategy to encode the structure at different spatial
granularities (Figure 9.6c). Coarse-level grids can cancel out high-frequency noise while
fine-level grids can capture detailed local patterns, thus enhancing distinctiveness.
Third, LDB leverages a modified AdaBoost method (Yang et al. 2014b) to select
a set of salient bits. The modified AdaBoost targets the fundamental goal of ideal
binary descriptors, minimizing the distance between matches while maximizing it
between mismatches, thereby optimizing the performance of LDB for a given descriptor
length. Computing LDB is very fast: relying on integral images, the average intensity
and first-order gradients of each grid cell can be obtained with only four to eight
add/subtract operations.
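The core LDB tests can be sketched as follows (our own simplified illustration; it uses plain per-cell means instead of integral images and omits the multiple gridding and AdaBoost bit selection): each cell of a 3 × 3 grid contributes (Iavg, dx, dy), and three bits are emitted for every unique pair of cells.

```python
import numpy as np
from itertools import combinations

def ldb_3x3(patch):
    """Simplified LDB over one 3x3 gridding: compute (Iavg, dx, dy) per cell,
    then emit 3 bits per unique cell pair (36 pairs -> 108 bits, byte padded)."""
    g = 3
    h, w = patch.shape[0] // g, patch.shape[1] // g
    feats = []
    for gy in range(g):
        for gx in range(g):
            cell = patch[gy * h:(gy + 1) * h, gx * w:(gx + 1) * w].astype(np.float32)
            iavg = cell.mean()
            dx = cell[:, w // 2:].mean() - cell[:, :w // 2].mean()   # crude x gradient
            dy = cell[h // 2:, :].mean() - cell[:h // 2, :].mean()   # crude y gradient
            feats.append((iavg, dx, dy))
    bits = []
    for a, b in combinations(range(g * g), 2):        # every unique pair of cells
        for k in range(3):                            # compare Iavg, dx, and dy
            bits.append(1 if feats[a][k] > feats[b][k] else 0)
    return np.packbits(np.array(bits, dtype=np.uint8))

patch = np.random.randint(0, 256, (45, 45), dtype=np.uint8)
print(ldb_3x3(patch).size, "bytes")
```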
9.3.1.2.2 Local Feature-Based Object Recognition
To recognize objects in a captured image, a system matches each feature descriptor
extracted from the captured image to the database features in order to find its nearest
neighbor (NN). If a pair of NNs passes the verification criteria (i.e., the similarity
between a feature and its NN is above a predetermined threshold and complies with a
geometric model), this feature pair is considered a matched pair; otherwise, it is
discarded as a false positive. The database object that has the most matched features
to the captured image is considered the recognized object.
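One common instance of such verification is the nearest-neighbor distance-ratio test, sketched below with OpenCV's brute-force matcher (our own example; the random descriptors merely stand in for features extracted as in Section 9.3.1.2.1, and the 0.75 ratio is an assumed threshold rather than a value from the text).

```python
import cv2
import numpy as np

def match_with_ratio_test(query_descriptors, db_descriptors, ratio=0.75):
    """Keep a match only if its nearest neighbor is clearly better than the
    second nearest (one common form of the verification criterion)."""
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING)          # Hamming for binary descriptors
    knn = matcher.knnMatch(query_descriptors, db_descriptors, k=2)
    return [m for m, n in knn if m.distance < ratio * n.distance]

# Placeholder random binary descriptors standing in for real extracted features.
rng = np.random.default_rng(3)
db = rng.integers(0, 256, (500, 32), dtype=np.uint8)   # 500 database descriptors
q = db[:50].copy()                                     # 50 query features that should match
print(len(match_with_ratio_test(q, db)), "verified matches")
```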


Retrieving the NN of a local feature from a large database quickly and accurately
is the key to efficient and accurate object recognition, ensuring a satisfactory user
experience and scalability for MAR apps. Two popular techniques that have been
commonly used for large-scale NN matching are locality sensitive hashing (LSH)
and bag-of-words (BoW) matching.
LSH: Locality sensitive hashing. LSH (Gionis et al. 1999) is a widely used technique
for approximate NN search. The key to LSH is a hash function, which maps
similar descriptors into the same bucket of a hash table and different descriptors into
different buckets. To find the NN of a query descriptor, we first retrieve its matching
bucket and then check all the descriptors within the matched bucket using a brute-force
search.
For binary features, the hash function can simply be a subset of bits from the
original bit string; descriptors with a common sub-bit-string are cast to the
same table bucket. The size of the subset, that is, the hash key size, determines
the upper bound of the Hamming distance among descriptors within the same
buckets. To improve the detection rate of NN search based on LSH, two techniques, namely multi-table and multi-probe, are usually used. The multi-table
technique stores the database descriptors in several hash tables, each of which
leverages a different hash function. In the query phase, the query descriptor is
hashed into a bucket of every hash table and all descriptors in each of these buckets are then further checked for matching. Multi-table improves the detection rate
of NN search at the cost of higher memory usage and longer matching time, which
is linearly proportional to the number of hash tables used. Multi-probe examines
both the bucket in which the query descriptor falls and its neighboring buckets.
While multi-probe would result in more matching checks of database descriptors,
it actually requires fewer hash tables and thus incurs lower memory usage. In
addition, it allows a larger key size and in turn smaller buckets and fewer matches
to check per bucket.
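A toy sketch of multi-table LSH for binary descriptors follows (ours; the key size, table count, and dictionary-based buckets are illustrative choices, and multi-probe is omitted): each table hashes on a different random subset of bits, and only descriptors sharing a bucket with the query are checked by brute force.

```python
import numpy as np

rng = np.random.default_rng(1)
N_BITS, KEY_SIZE, N_TABLES = 256, 16, 4

# Each table uses its own random subset of bit positions as the hash key.
key_bits = [rng.choice(N_BITS, KEY_SIZE, replace=False) for _ in range(N_TABLES)]
tables = [dict() for _ in range(N_TABLES)]

def key(desc_bits, t):
    return tuple(desc_bits[key_bits[t]])

def insert(desc_bits, idx):
    for t in range(N_TABLES):
        tables[t].setdefault(key(desc_bits, t), []).append(idx)

def query(desc_bits, db):
    # Union of candidates from all tables, then a brute-force Hamming check.
    cand = set()
    for t in range(N_TABLES):
        cand.update(tables[t].get(key(desc_bits, t), []))
    best, best_d = None, N_BITS + 1
    for i in cand:
        d = int(np.count_nonzero(db[i] != desc_bits))
        if d < best_d:
            best, best_d = i, d
    return best, best_d

db = rng.integers(0, 2, (10000, N_BITS), dtype=np.uint8)
for i, d in enumerate(db):
    insert(d, i)
q = db[42].copy()
q[rng.choice(N_BITS, 5, replace=False)] ^= 1      # flip 5 bits to simulate noise
print(query(q, db))                               # likely returns (42, 5)
```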
BoW: Bag-of-words matching. BoW matching (Sivic et al. 2003) is an effective
strategy to reduce memory usage and support fast matching via a scalable indexing
scheme such as an inverted file. Typically, BoW matching quantizes local image
descriptors into visual words and then computes the image similarity by counting the frequency of co-occurrences of words. However, it completely ignores
the spatial information; hence it may greatly degrade the accuracy. In order to
enhance the accuracy for BoW matching, several approaches have been proposed
to compensate for the loss of spatial information. For example, geometric verification
(Philbin et al. 2007), which is designed for general image-matching applications, is a
popular scheme that verifies local correspondences by checking
their homography consistency. Wu et al. presented a bundled feature matching
scheme (Wu et al. 2009) for partial-duplicate image detection. In their approach,
sets of local features are bundled into groups by MSER-detected regions (Matas et al.
2002), and robust geometric constraints are then enforced within each
group. Spatial pyramid matching (Lazebnik et al. 2006), which considers approximate
global geometric correspondences, is another scheme to enforce geometric
constraints for more accurate BoW matching. The scheme partitions the image


into increasingly finer subregions and computes histograms of local features found
within each subregion. To compute the similarity between two images, the distances
between the histograms at each spatial level are weighted and summed together.
All these schemes yield more reliable local-region matches by enforcing various
geometric constraints. However, these schemes are very computationally expensive;
thus, when applying them to MAR, the recognition procedure is conducted on
the server side or in the cloud, where abundant computing and memory resources
are available.
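As a rough illustration of BoW matching with an inverted file (our own sketch; the random vocabulary, brute-force quantization, and unweighted word counts are simplifying assumptions, whereas real systems train the vocabulary and apply tf-idf weighting), descriptors are quantized to visual words and candidate images are scored by counting shared words.

```python
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(2)
VOCAB_SIZE, DIM = 200, 64
vocab = rng.normal(size=(VOCAB_SIZE, DIM))          # stand-in for a trained vocabulary

def quantize(descs):
    # Assign each descriptor to its nearest visual word (brute force for clarity).
    d2 = ((descs[:, None, :] - vocab[None, :, :]) ** 2).sum(-1)
    return d2.argmin(axis=1)

# Inverted file: visual word -> {image_id: term count}
inverted = defaultdict(lambda: defaultdict(int))

def index_image(image_id, descs):
    for w in quantize(descs):
        inverted[w][image_id] += 1

def score(query_descs):
    scores = defaultdict(int)
    for w in quantize(query_descs):
        for image_id, cnt in inverted[w].items():   # only touch images sharing a word
            scores[image_id] += cnt
    return sorted(scores.items(), key=lambda kv: -kv[1])

for img_id in range(5):
    index_image(img_id, rng.normal(size=(200, DIM)))
print(score(rng.normal(size=(50, DIM)))[:3])
```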
9.3.1.2.3 Local Feature-Based Object Tracking
A typical flow of local feature-based object tracking is to find corresponding local
features on consecutive frames and then estimate the homography transformation
between image frames from the local feature matches according to Equation 9.1. But
different from marker-based tracking and pose estimation, which utilize only four
reliable corner matches, the local feature-based method often generates a large number
of correspondences, which inevitably can include some outliers. Selecting reliable
matches from a large correspondence set is challenging, and existing solutions often
rely on the RANSAC or PROSAC algorithms to solve this problem. The key idea
of RANSAC and PROSAC is to iteratively estimate parameters of a transformation
model from a set of noisy feature correspondences so that a sufficient number of
consensuses can be obtained. We refer readers to Fischler et al. (1981) and Chum
et al. (2005) for details of the RANSAC and PROSAC algorithms. The quality of
local features is essential for the accuracy of local feature matches. A large number
of false positive matches resulting from low-quality features could lead to an enormous
number of iterations in the RANSAC and PROSAC procedures, yielding an
excessively long runtime.
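In practice this step is commonly delegated to a library. The sketch below (ours) uses OpenCV's ORB features, a readily available stand-in for the binary descriptors discussed earlier, together with Hamming matching and findHomography with RANSAC between two synthetic consecutive frames.

```python
import cv2
import numpy as np

# Synthetic "consecutive frames": the second is a shifted copy of the first.
# In a real MAR pipeline these would be camera frames.
prev = np.random.randint(0, 256, (480, 640), dtype=np.uint8)
curr = np.roll(prev, shift=(4, 7), axis=(0, 1))

orb = cv2.ORB_create(nfeatures=500)
kp1, des1 = orb.detectAndCompute(prev, None)
kp2, des2 = orb.detectAndCompute(curr, None)

# Hamming-distance brute-force matching with cross-checking.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = matcher.match(des1, des2)

src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)

# RANSAC rejects outlier correspondences while estimating the homography.
H, inlier_mask = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)
print("inliers:", int(inlier_mask.sum()), "of", len(matches))
```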

9.3.2 Sensor-Based Recognition and Tracking


A sensor-based method typically leverages GPS to identify the location of a mobile
device and utilizes a compass (alone or in combination with other sensors) to determine
the direction in which the device is heading. Based on the location and direction of a
device, a MAR system can determine which virtual data should be associated with
the current scene. After that, the device's motion is tracked based on motion sensors
(also known as an Inertial Measurement Unit [IMU]). Since location recognition using
GPS is straightforward, in the following section we mainly focus on the common
motion sensors used in today's smart mobile devices, their functionalities, and tracking
algorithms based on these sensors.
9.3.2.1 Sensor-Based Object Tracking
With recent advances in microelectromechanical systems (MEMS) technology,
IMUs are now commonplace in most smart mobile devices. These IMUs are used
by mobile apps for tracking the movement of a mobile device and consequently
enabling the device to interact with its surroundings. The most common IMUs found
in these smart devices today include accelerometers, magnetometers, and gyroscopes.
Each of these sensors provides a unique input to the overall tracking system.


In the following section, we first briefly review the functionalities, hardware, and
biases of these sensors and then present tracking algorithms based on the sensor data.
9.3.2.1.1 Accelerometer
An accelerometer measures the acceleration forces exerted on a mobile device. The
accelerometer reading is a summation of two forces: the gravity force due to the
weight of the device and the acceleration force due to the motion of the device.
Today's mobile devices are equipped with a three-axis accelerometer, which measures
the forces in the x, y, and z directions with respect to the surface plane of the
mobile device.
There are several types of accelerometers and the type used in mobile devices
is the capacitive accelerometer. Figure 9.7a illustrates the structure of a one-axis
MEMS capacitive accelerometer. The accelerometer data is acquired by measuring the force exerted on an object which is able to flex up or down. The amount the
device flexes is monitored by a set of fingers that are attached to a movable inertial
mass and flex with the device. As these fingers/plates move, they get closer to, and
move further apart from, a set of stationary fingers/plates. The proximity of these
fingers/plates can create a change in the measured capacitance between multiple
fingers/plates, which can be monitored to measure the displacement of the center
inertial mass. This structure can be extended to build a three-axis accelerometer for
measuring the displacement along all three axes.
9.3.2.1.2 Gyroscope
A three-axis gyroscope provides a 3D vector which measures the rotational (angular)
velocity of a device around the three axes of the device's coordinate system. The
gyroscope was first introduced into smartphones by Apple in the iPhone 4 in June 2010.
The first Android phone in which a three-axis gyroscope was integrated was Google's
Nexus S in December 2010. Gyroscopes work off the principle of the Coriolis force,
and are usually implemented within an integrated circuit (IC) using a vibrating mass
attached to a set of springs. If the device is rotated about the axis defined by the
first set of springs, the inner frame will be pushed away from the axis of rotation,
causing a compression in the second set of springs due to the Coriolis acceleration
experienced by the vibrating mass. An example of a MEMS gyroscope is depicted
in Figure 9.7b.
These types of gyroscopes are relatively cheap to manufacture; however, they are
often noisy and can introduce significant errors if their measurements are not modeled
properly. In contrast, many advanced inertial navigation systems (INSs)
today have begun using optical gyroscopes instead, which prove to be much more
accurate. Because of their broad application and increasing popularity, more advanced
gyroscopes are being developed and integrated into new devices, which can yield
more accurate results.
Although the latest MEMS gyroscopes have smaller errors than previous generations,
all gyroscopes used in smartphones today still experience a small amount of
bias. Given that the gyroscope measures a rate (change over time), the bias itself is a
rate. A gyroscope bias can be envisioned as the rotational velocity observed by the
device when it is not in motion. This view is somewhat simplified, as a bias can also


FIGURE 9.7 (a) A typical 1D MEMS capacitive accelerometer and (b) a vibrating mass
gyroscope.


occur when the device is moving as well. In addition, this bias is sensitive to several
factors, including temperature, and often varies randomly over time; thus, it is difficult
to compensate for. This bias is often estimated as a random variable by many
filtering algorithms.
9.3.2.1.3 Magnetometer
The magnetometer measures the strength of the earth's magnetic field, which
is a vector pointing toward the magnetic north of the earth. The magnetometer
found in most smart devices is one of two possible types: a Hall
effect magnetometer or an anisotropic magnetoresistive (AMR) magnetometer.
Hall effect magnetometers are the most common; they provide a voltage output
in response to the measured field strength and can also sense polarity.
AMR magnetometers use a thin strip of a special alloy that changes its
resistance whenever there is a change in the surrounding magnetic field. AMR
magnetometers usually yield much better accuracy; however, they are more
expensive.
One primary source of magnetometer error is called the magnetometer bias.
This bias is caused by the surrounding environment (external to the magnetometer
itself) and can cause a wide range of errors in the magnetometer readings. The bias
itself can be separated into one of two types. The first is called hard iron bias. This
type of bias is primarily caused by devices which produce a magnetic field. The
errors caused by a hard iron bias are constant offsets, usually applied to all axes
of the magnetometer equally. This bias is not time or space varying and can be
compensated for by simply adjusting the readings of the magnetometer by some
constant value. The other type of bias commonly experienced by the magnetometer
is called soft iron bias. This type of bias is caused by any distortions in the magnetic
field surrounding the magnetometer, thus can have many forms and is difficult to
compensate for.
9.3.2.1.4 Kalman Filtering for Sensor-Based Tracking
The goal of tracking is to obtain the translation and orientation of a device in
the 3D earth coordinate system. Each of the three sensors alone can provide the
orientation of a mobile device. Once we get the orientation information, we can
derive the gravity force components along three axes of the mobile device and
then subtract the gravity force from the accelerometer data to obtain the motion-induced
acceleration force. By double integration of the motion-induced acceleration
force, we can obtain the translation of the device in the 3D earth coordinate
system.
However, since each type of sensor data is quite noisy, relying on a single type of
sensor cannot achieve accurate tracking. Many approaches apply a filtering-based
method to fuse the three types of sensor data for a more reliable and precise tracking
result. In the following, we overview the most standard filtering method, Kalman
filtering, which is also the algorithm implemented in the Android operating system for
estimating a smartphone's orientation. For more advanced yet computationally expensive
filters, such as unscented Kalman filters or particle filters, please refer to Li et al.
(2013) and Cheon et al. (2007) for details.


The Kalman filtering process for orientation estimation can be broken down into
two primary steps: the prediction step and the updating step. In the predicting phase,
the filter uses the gyroscope measurement to predict the dynamics of the device rotation. The gyroscope measurements are thus integrated directly into the state transition equation and used to provide a predicted state estimate. In particular, the state
equation is defined using a seven-element state vector consisting of the quaternion
q̄(t) (i.e., a four-element orientation vector) and the gyro bias b(t):

\[
X(t) = \begin{bmatrix} \bar{q}(t) \\ b(t) \end{bmatrix}
\tag{9.7}
\]

We define an error angle vector δθ as the small rotation between the estimated and the
true orientation of the device in the earth coordinate system. Similarly, an error bias
vector Δb is defined as the small difference between the estimated and the true bias of
the device. Accordingly, the error state propagation model is given by

\[
\begin{bmatrix} \dot{\delta\theta} \\ \dot{\Delta b} \end{bmatrix}
=
\begin{bmatrix} -[\omega\times] & -I_{3\times3} \\ 0_{3\times3} & 0_{3\times3} \end{bmatrix}
\begin{bmatrix} \delta\theta \\ \Delta b \end{bmatrix}
+
\begin{bmatrix} -I_{3\times3} & 0_{3\times3} \\ 0_{3\times3} & I_{3\times3} \end{bmatrix}
\begin{bmatrix} n_\omega \\ n_b \end{bmatrix}
\tag{9.8}
\]

where
ω is the rotational velocity around the three axes
n_ω and n_b model the gyroscope noise and bias, respectively
In most cases, n_ω is assumed to be independent white Gaussian noise along
each axis of the gyroscope input. Therefore, its expected value is given as E[n_ω] = 0_{3×1}.
The gyroscope bias model is usually defined as ḃ = n_b, where n_b is independent
white Gaussian noise along each axis.
The solution to this differential equation has the closed-form solution found in
Trawny et al. (2005) and yields the following state transition matrix:

\[
\Phi =
\begin{bmatrix}
\Theta & \Psi \\
0_{3\times3} & I_{3\times3}
\end{bmatrix}
\tag{9.9}
\]

where

\[
\Theta = I_{3\times3} - [\omega\times]\,\frac{\sin(|\omega|\Delta t)}{|\omega|}
+ [\omega\times]^2\,\frac{1 - \cos(|\omega|\Delta t)}{|\omega|^2}
\]

\[
\Psi = -I_{3\times3}\Delta t + [\omega\times]\,\frac{1 - \cos(|\omega|\Delta t)}{|\omega|^2}
- [\omega\times]^2\,\frac{|\omega|\Delta t - \sin(|\omega|\Delta t)}{|\omega|^3}
\]

Note that in the state transition matrix, ω = ω_m − b (the measured rate minus the estimated
bias) is treated as a system input which has already been bias corrected.


In the updating phase, the filter combines the previously estimated state with
the recorded accelerometer and magnetometer measurements, which come directly from
sampling the IMUs, to revise the orientation estimate. Each recorded measurement
complies with a model which describes its relationship with the estimated
state and the measurement noise. Specifically, the measurement model
is of the form:

\[
z = R_E^B(\bar{q})\, z_0 + n_z
\tag{9.10}
\]

where
E[n_z] = 0
E[n_z n_z^T] = R
R_E^B(q̄) is the rotation matrix from the earth coordinate system to the predicted
device coordinate system
The rotation matrix is obtained using the propagated quaternion from the process
model. z_0 is a unit vector representation of north in the earth coordinate system. The
measurement residual is defined as r = z − ẑ, where z is the input measurement and ẑ is
the predicted measurement. Since this residual represents the error between the measurement
vector and the predicted vector, it is a close approximation of the error angle
vector δθ. This approximation is
derived in Trawny et al. (2005) and defined as

\[
r \approx
\begin{bmatrix}
\left[ R_E^B(\bar{q})\, z_0 \times \right] & 0_{3\times3}
\end{bmatrix}
\begin{bmatrix} \delta\theta \\ \Delta b \end{bmatrix}
+ n_z
\tag{9.11}
\]

which gives the measurement model H:

\[
H =
\begin{bmatrix}
\left[ R_E^B(\bar{q})\, z_0 \times \right] & 0_{3\times3}
\end{bmatrix}
\tag{9.12}
\]

After the measurement update, the residual obtained in the measurement model is
used to update the quaternion which in turn is used as the filter result.
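A full indirect Kalman filter is beyond a short example, but the spirit of gyroscope prediction plus accelerometer correction can be conveyed with a simple complementary filter (our own sketch of a common lightweight alternative, not the Android implementation described above; the blending weight and the fake IMU samples are assumed values): integrate the gyroscope over the short term and pull the estimate toward the gravity direction reported by the accelerometer over the long term.

```python
import numpy as np

ALPHA = 0.98          # weight on the gyroscope-propagated estimate (illustrative)

def step(roll, pitch, gyro, accel, dt):
    """One update of a roll/pitch complementary filter.
    gyro: angular rates (rad/s) about x and y; accel: 3-axis specific force."""
    # Prediction: integrate the gyroscope rates.
    roll_g = roll + gyro[0] * dt
    pitch_g = pitch + gyro[1] * dt
    # Correction: roll/pitch implied by the gravity direction in the accelerometer.
    roll_a = np.arctan2(accel[1], accel[2])
    pitch_a = np.arctan2(-accel[0], np.hypot(accel[1], accel[2]))
    # Blend: trust the gyro over short intervals, the accelerometer in the long run.
    return (ALPHA * roll_g + (1 - ALPHA) * roll_a,
            ALPHA * pitch_g + (1 - ALPHA) * pitch_a)

roll = pitch = 0.0
for _ in range(100):                       # fake, stationary IMU samples
    gyro = np.array([0.001, -0.002])       # small bias-like rates (rad/s)
    accel = np.array([0.0, 0.0, 9.81])     # gravity only
    roll, pitch = step(roll, pitch, gyro, accel, dt=0.01)
print(np.degrees([roll, pitch]))
```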

9.3.3 Hybrid-Based Recognition and Tracking


At present, the processing power and memory capacity of mobile devices are still
too limited for scalable MAR apps to rely solely on sophisticated visual recognition
and tracking methods. On the other hand, the built-in sensors (e.g., accelerometer,
gyroscope, magnetometer, and GPS) usually lack sufficient accuracy and thus
cannot provide satisfactory performance for recognition and tracking tasks. Several
studies have proposed to combine these vision-based and sensor-based methods. For
instance, GPS positioning alone is insufficient for AR apps, but it can be combined
with a visual tracking method to achieve a desired level of accuracy (Reitmayr et al.
2007). The basic idea behind these methods is to use GPS to identify the position


(i.e., the location on earth), and this information is used to initialize the visual tracking
system, which in turn gives the user's local pose and the view direction. In
Naimark et al. (2002), the authors proposed to combine visual tracking and GPS
for outdoor building visualization. The user can place virtual models on Google
Earth, and the app can retrieve and visualize them based on the user's GPS location.
Another promising direction is to combine vision information with motion sensor
data (i.e., gyroscope, accelerometer, and magnetometer) to provide more accurate
and efficient object tracking. The trend of integrating more sensors into mobile
devices has not stopped yet. For example, Google has just released a new mobile
platform, Tango, which integrates 6-degree-of-freedom motion sensors, depth sensors,
and high-quality cameras. Amazon has announced its new Fire phone, which
includes four cameras tucked into the front corners of the phone, in addition to
other motion sensors. Advances in mobile hardware offer opportunities to gain
richer contextual information about a mobile device's surroundings and, in turn, open
a door for new approaches that best utilize all available multimodal information.

9.4 SOFTWARE DEVELOPMENT TOOLKITS


OpenCV is one of the most popular software development libraries for computer
vision tasks. A mobile version of OpenCV has been released for running on mobile
platforms (OpenCV for Android). Other mature libraries such as Eigen (Eigen main
page) or LAPACK for linear algebra (LAPACK: Linear Algebra PACKage) have also
become available for mobile platforms, even though the support and the optimization
level are still limited.
Qualcomm has released a mobile-optimized computer vision library, named
FastCV (FastCV main page), which includes the most frequently used vision processing
functions and can be used for camera-based mobile apps. The CV functions
offered by FastCV include gesture recognition, text recognition and tracking,
and face detection, tracking, and recognition. FastCV can run on most ARM-based
processors but is particularly tuned for Qualcomm's Snapdragon processors (S2 and
above) and utilizes hardware acceleration to speed up some of the most compute-intensive
vision functions.
Built on top of FastCV, Qualcomm further offers an MAR software development
kit (SDK), named Vuforia (Vuforia main page). Vuforia offers software functions
for app developers that can recognize and maintain a variety of 2D and 3D visual
targets, frame markers, text, and user interactions (e.g., interactions with a virtual
button). In addition, it provides APIs to easily render 3D graphics or video playback
on top of the real scene. To manage visual targets, Vuforia provides two ways to store
target databases: on a mobile device or on the cloud. Device databases do not require
network connectivity for the recognition, and thus can avoid the overhead for data
transfer and are free to use in mobile apps. However, due to the limited storage space
and computing power of mobile devices, device databases can only store a limited
number of targets; so far, the maximum number of targets that can be stored in a device
database is 100. Cloud databases are managed using either the Target Manager UI provided
by Qualcomm or the Vuforia Web Service API. They enable developers to host over one
million targets in the cloud. The Vuforia cloud recognition service is an enterprise-class


solution with various pricing plans determined by your app's total number of
image recognitions per month. Generally speaking, the Vuforia development infrastructure
facilitates, and significantly simplifies, the development of MAR apps.

9.5 CONCLUSIONS
The advancement of mobile technology, in terms of hardware computing power,
seamless connectivity to the cloud, and fast computer vision algorithms, has raised
AR into the mainstream of mobile apps. Following the widespread popularity of a
handful of killer MAR applications already commercially available, it is believed
that MAR will expand exponentially in the next few years. The advent of MAR will
have a profound and lasting impact on the way people use their smartphones and tablets. These emerging MAR apps will turn our everyday world into a fully interactive
digital experience, from which we can see, hear, feel, and even smell the information
in a different way. This emerging direction will push the industry toward truly ubiquitous computing and a technologically converged paradigm.
The scalability, accuracy, and efficiency of the underlying techniques (i.e., object
recognition and tracking) are key factors influencing user experience of MAR apps.
New algorithms in computer vision and pattern recognition, such as lightweight
feature extraction, have been developed to provide efficiency and compactness on
low-power mobile devices while maintaining sufficiently good accuracy. Several
efforts have also been made to analyze the particular hardware limitations of executing
existing recognition and tracking algorithms on mobile devices and to explore adaptation
techniques to address these limitations. In addition to advances in the development
of lightweight computer vision algorithms, a variety of sensors have been integrated
into modern smartphones, enabling location recognition (e.g., via GPS) and device
tracking (e.g., via gyroscope, accelerometer, and magnetometer) at little computational
cost. However, due to the large noise of the low-cost sensors in today's
smartphones, the accuracy of location recognition and device tracking is usually
low and cannot meet the requirements of apps that demand high accuracy. Fusing
visual information with sensor data is a promising direction to achieve both high
accuracy and efficiency, and we shall see an increasing amount of research work
along this direction in the near future.

REFERENCES
Alahi, A., Ortiz, R., and Vandergheynst, P. 2012. FREAK: Fast retina keypoint. In Proceedings
of the Conference on Computer Vision and Pattern Recognition.
Ashby, F.G. and Perrin, N.A. 1988. Toward a unified theory of similarity and recognition.
Psychological Review, 95:124–150.
Bay, H., Ess, A., Tuytelaars, T., and Gool, L.V. 2006. SURF: Speeded-up robust features. In
Proceedings of the European Conference on Computer Vision.
Bay, H., Ess, A., Tuytelaars, T., and Gool, L.V. June 2008. Speeded-up robust features.
Computer Vision and Image Understanding, 110(3):346–359.
Calonder, M., Lepetit, V., Strecha, C., and Fua, P. 2010. BRIEF: Binary robust independent
elementary features. In Proceedings of the European Conference on Computer Vision.


Canny, J. 1986. A computational approach to edge detection. IEEE Transactions on Pattern
Analysis and Machine Intelligence, 8(6):679–698.
Cheng, K.T., Yang, X., and Wang, Y.-C. July 7–9, 2013. Performance optimization of vision
apps on mobile application processor. International Conference on Systems, Signals and
Image Processing (IWSSIP), Bucharest, Romania.
Cheon, Y.J. and Kim, J.H. 2007. Unscented filtering in a unit quaternion space for spacecraft
attitude estimation. In Proceedings of the IEEE International Symposium on Industrial
Electronics, pp. 66–71.
Chum, O. and Matas, J. 2005. Matching with PROSAC: Progressive sample consensus. In
Proceedings of Computer Vision and Pattern Recognition, 1:220–226.
Eigen main page: http://eigen.tuxfamily.org/index.php?title=Main_Page.
FastCV main page: https://developer.qualcomm.com/mobile-development/add-advanced-features/
computer-vision-fastcv.
Fischler, M.A. and Bolles, R.C. 1981. Random sample consensus: A paradigm for model
fitting with applications to image analysis and automated cartography. Communications
of the ACM, 24:381–395.
Gionis, A., Indyk, P., and Motwani, R. 1999. Similarity search in high dimensions via hashing.
In Proceedings of the International Conference on Very Large Databases, 25:518–529.
Gonzalez, R. and Woods, R. 1992. Digital Image Processing, Addison Wesley: Reading, MA,
pp. 414–428.
Haralick, R. 1984. Digital step edges from zero crossing of second directional derivatives.
IEEE Transactions on Pattern Analysis and Machine Intelligence, 6(1):58–68.
Honkamaa, P., Siltanen, S., Jappinen, J., Woodward, C., and Korkalo, O. 2007. Interactive
outdoor mobile augmentation using markerless tracking and GPS. Laval, France.
Information Technology – Automatic Identification and Data Capture Techniques – Data
Matrix Bar Code Symbology Specification. 2006a. ISO/IEC 24720:2006. International
Organization for Standardization.
Information Technology – Automatic Identification and Data Capture Techniques – QR
Code 2005 Barcode Symbology Specification. 2006b. ISO/IEC 18004. International
Organization for Standardization.
Information Technology – Automatic Identification and Data Capture Techniques – PDF417
Barcode Symbology Specification. 2006c. ISO/IEC 15438:2006. International
Organization for Standardization.
Jiang, B., Neumann, U., and Suya, Y. March 2004. A robust hybrid tracking system for outdoor
augmented reality. In Proceedings of Virtual Reality, pp. 3275.
LAPACK – Linear Algebra PACKage: http://www.netlib.org/lapack/.
Lazebnik, S., Schmid, C., and Ponce, J. 2006. Beyond bags of features: Spatial pyramid
matching for recognizing natural scene categories. In Proceedings of Computer Vision
and Pattern Recognition, pp. 2169–2178.
Leutenegger, S., Chli, M., and Siegwart, R. 2011. BRISK: Binary robust invariant scalable
keypoints. In Proceedings of the Conference on Computer Vision and Pattern Recognition.
Li, W.W.L., Iltis, R.A., and Win, M.Z. 2013. Integrated IMU and radiolocation-based navigation
using a Rao-Blackwellized particle filter. In Proceedings of the IEEE International
Conference on Acoustics, Speech and Signal Processing, pp. 5165–5169.
Liao, C.Y., Tang, H., Liu, Q., Chiu, P., and Chen, F. 2010. FACT: Fine-grained cross-media
interaction with documents via a portable hybrid paper-laptop interface. In ACM
Multimedia.
Liu, X. and Doermann, D. 2008. Mobile retriever: Access to digital documents from their
physical source. International Journal of Document Analysis and Recognition, 11(1):
19–27.
Lowe, D.G. 2004. Distinctive image features from scale-invariant keypoints. International
Journal of Computer Vision, 60(2):91–110.


Matas, J., Chum, O., Urban, M., and Pajdla, T. 2002. Robust wide baseline stereo from maximally
stable extremal regions. In Proceedings of the British Machine Vision Conference,
pp. 384–396.
Naimark, L. and Foxlin, E. 2002. Circular data matrix fiducial system and robust image
processing for a wearable vision-inertial self-tracker. In Proceedings of the International
Symposium on Mixed and Augmented Reality, pp. 27–36.
OpenCV for Android: http://opencv.org/platforms/android.html.
Philbin, J., Chum, O., Isard, M., Sivic, J., and Zisserman, A. 2007. Object retrieval with large
vocabularies and fast spatial matching. In Proceedings of Computer Vision and Pattern
Recognition, pp. 1–8.
Reitmayr, G. and Drummond, T.W. 2007. Initialization for visual tracking in urban environments,
pp. 161–172.
Ribo, M., Lang, P., Ganster, H., Brandner, M., Stock, C., and Pinz, A. 2002. Hybrid tracking
for outdoor augmented reality applications. IEEE Computer Graphics and Applications,
22(6):54–63.
Rosin, P.L. 1999. Measuring corner properties. Computer Vision and Image Understanding,
73(2):291–307.
Rosten, E. and Drummond, T. 2006. Machine learning for high speed corner detection. In
Proceedings of the European Conference on Computer Vision.
Rublee, E., Rabaud, V., Konolige, K., and Bradski, G. 2011. ORB: An efficient alternative to
SIFT or SURF. In Proceedings of the International Conference on Computer Vision.
Simard, P., Bottou, L., Haffner, P., and LeCun, Y. 1998. Boxlets: A fast convolution algorithm
for signal processing and neural networks. In Proceedings of Neural Information
Processing Systems (NIPS).
Sivic, J. and Zisserman, A. 2003. Video Google: A text retrieval approach to object
matching in videos. In Proceedings of the International Conference on Computer Vision,
2:1470–1477.
Ta, D.N., Chen, W.C., Gelfand, N., and Pulli, K. 2009. SURFTrac: Efficient tracking and
continuous object recognition using local feature descriptors. In Proceedings of the
Conference on Computer Vision and Pattern Recognition.
Terriberry, T.B., French, L.M., and Helmsen, J. 2008. GPU accelerating speeded-up robust
features. In Proceedings of 3D Data Processing, Visualization and Transmission.
Trawny, N. and Roumeliotis, S. 2005. Indirect Kalman Filter for 3D Attitude Estimation. Tech.
Rep. 2. Department of Computer Science and Engineering, University of Minnesota,
Minneapolis, MN.
Tuytelaars, T. and Mikolajczyk, K. 2008. Local invariant feature detectors: A survey.
Foundations and Trends in Computer Graphics and Vision, 3:177–280.
Uchiyama, S., Takemoto, K., Satoh, K., Yamamoto, H., and Tamura, H. 2002. MR platform: A
basic body on which mixed reality applications are built. In Proceedings of the International
Symposium on Mixed and Augmented Reality, Vol. 00, p. 246.
Vuforia main page: https://www.vuforia.com/.
Wu, Z., Ke, Q.F., Isard, M., and Sun, J. 2009. Bundling features for large scale partial-duplicate
web image search. In Proceedings of Computer Vision and Pattern Recognition,
pp. 25–32.
Yang, X. and Cheng, K.T. June 2012a. Accelerating SURF detector on mobile devices. ACM
International Conference on Multimedia, Nara, Japan.
Yang, X. and Cheng, K.T. 2012b. LDB: An ultrafast feature for scalable augmented reality on
mobile devices. In Proceedings of the International Symposium on Mixed and Augmented
Reality, pp. 49–57.
Yang, X. and Cheng, K.T. 2014a. Local difference binary for ultrafast distinctive feature
description. IEEE Transactions on Pattern Analysis and Machine Intelligence,
36(1):188–194.


Yang, X. and Cheng, K.T. 2014b. Learning optimized local difference binaries for scalable
augmented reality on mobile devices. IEEE Transactions on Visualization and Computer
Graphics, 20(6):852–865.
Yang, X., Liao, C.Y., and Liu, Q. 2012. MixPad: Augmenting paper with mice & keyboards
for bimanual, cross-media and fine-grained interaction with documents. In Proceedings
of ACM Multimedia, pp. 1145–1148.
Yang, X., Liao, C.Y., Liu, Q., and Cheng, K.T. 2011a. Minimum correspondence sets for
improving large-scale augmented paper. In Proceedings of the International Conference on
Virtual Reality Continuum and Its Applications in Industry.
Yang, X., Liao, C.Y., Liu, Q., and Cheng, K.T. 2011b. Large-scale EMM identification with
geometry-constrained visual word correspondence voting. In Proceedings of the ACM
International Conference on Multimedia Retrieval.

10 Haptic Augmented Reality
Taxonomy, Research Status, and Challenges

Seokhee Jeon, Seungmoon Choi, and Matthias Harders

CONTENTS
10.1 Introduction................................................................................................... 227
10.2 Taxonomies.................................................................................................... 229
10.2.1 Visuo-Haptic Reality–Virtuality Continuum.................................... 229
10.2.2 Artificial Recreation and Augmented Perception.............................. 232
10.2.3 Within- and Between-Property Augmentation.................................. 233
10.3 Components Required for Haptic AR........................................................... 234
10.3.1 Interface for Haptic AR..................................................................... 234
10.3.2 Registration between Real and Virtual Stimuli................................. 236
10.3.3 Rendering Algorithm for Augmentation........................................... 237
10.3.4 Models for Haptic AR....................................................................... 238
10.4 Stiffness Modulation..................................................................................... 239
10.4.1 Haptic AR Interface...........................................................................240
10.4.2 Stiffness Modulation in Single-Contact Interaction..........................240
10.4.3 Stiffness Modulation in Two-Contact Squeezing.............................. 243
10.5 Application: Palpating Virtual Inclusion in Phantom with Two Contacts......245
10.5.1 Rendering Algorithm......................................................................... 247
10.6 Friction Modulation.......................................................................................248
10.7 Open Research Topics................................................................................... 249
10.8 Conclusions.................................................................................................... 250
References............................................................................................................... 251

10.1 INTRODUCTION
This chapter introduces an emerging research field in augmented reality (AR), called
haptic AR. As AR enables a real space to be transformed to a semi-virtual space by
providing a user with the mixed sensations of real and virtual objects, haptic AR does
the same for the sense of touch; a user can touch a real object, a virtual object, or a real
object augmented with virtual touch. Visual AR is a relatively mature technology and
is being applied to diverse practical applications such as surgical training, industrial
manufacturing, and entertainment (Azuma et al. 2001). In contrast, the technology
for haptic AR is quite recent and poses a great number of new research problems
ranging from modeling to rendering in terms of both hardware and software.
Haptic AR promises great potential to enrich user interaction in various applications.
For example, suppose that a user is holding a pen-shaped magic tool in the hand, which
allows the user to touch and explore a virtual vase overlaid on a real table. The
user may also draw a picture on the table with an augmented feel of using a paint brush on
a smooth piece of paper, or of using a marker on a stiff whiteboard. In a more practical
setting, medical students can practice cancer palpation skills by exploring a phantom
body while trying to find virtual tumors that are rendered inside the body. A consumer-targeted
application can be found in online stores. Consumers can see clothes displayed
on the touchscreen of a tablet computer and feel their textures with bare fingers, for
which the textural and frictional properties of the touchscreen are modulated to those
of the clothes. Another prominent example is augmentation or guidance of motor skills
by means of external haptic (force or vibrotactile) feedback, for example, shared control or motor learning of complex skills such as driving and calligraphy. Creating such
haptic modulations belongs to the realm of haptic AR. Although we have a long way to
go in order to realize all the envisioned applications of haptic AR, some representative
examples that have been developed in recent years are shown in Figure 10.1.

FIGURE 10.1 Representative applications of haptic AR. (a) AR-based open surgery simulator.
(From Harders, M. et al., IEEE Trans. Visual. Comput. Graph., 15, 138, 2009.) (b) Haptic AR
breast tumor palpation system. (From Jeon, S. and Harders, M., IEEE Trans. Haptics, 99, 1,
2014.) (c) Texture modeling and rendering based on contact acceleration data. (Reprinted from
Romano, J.M. and Kuchenbecker, K.J., IEEE Trans. Haptics, 5, 109, 2011. With permission.)
(d) Conceptual illustration of the haptic AR drawing example.


In this chapter, we first address three taxonomies for haptic AR based on a composite
visuo-haptic reality–virtuality continuum, a functional aspect of haptic AR
applications, and the subject of augmentation (Section 10.2). A number of studies
related to haptic AR are reviewed and classified based on the three taxonomies.
Based on the review, associated research issues along with components needed for
a haptic AR system are elucidated in Section 10.3. Sections 10.4 through 10.6 introduce our approach for the augmentation of real object stiffness and friction, in the
interaction with one or two contact points. A discussion of the open research issues
for haptic AR is provided in Section 10.7, followed by brief conclusions in Section
10.8. We hope that this chapter could prompt more research interest in this exciting,
yet unexplored, area of haptic AR.

10.2 TAXONOMIES
10.2.1 Visuo-Haptic Reality–Virtuality Continuum
General concepts associated with AR, or more generally, mixed reality (MR), were defined earlier by Milgram and Colquhoun Jr. (1999) using the reality–virtuality continuum shown in Figure 10.2a. The continuum includes all possible combinations of purely real and virtual environments, with the intermediate area corresponding to MR. Whether an environment is closer to reality or virtuality depends on the amount of overlay or augmentation that the computer system needs to perform; the more augmentation performed, the closer to virtuality. This criterion allows MR to be further classified into AR (e.g., a heads-up display in an aircraft cockpit) and augmented virtuality (e.g., a computer game employing a virtual dancer with the face image of a famous actress). We note, however, that the current literature does not strictly distinguish between the two terms and uses AR and MR interchangeably.
Extending the concept, we can define a similar reality–virtuality continuum for the sense of touch and construct a visuo-haptic reality–virtuality continuum by compositing the two unimodal continua, as shown in Figure 10.2b. This continuum can be valuable for building the taxonomy of haptic MR. In Figure 10.2b, the whole visuo-haptic continuum is classified into nine categories, and each category is named in an abbreviated form. The shaded regions belong to the realm of MR. In what follows, we review the concepts and instances associated with each category, with more attention to those of MR. Note that the continuum for touch includes all kinds of haptic feedback and does not depend on the specific types of haptic sensations (e.g., kinesthetic, tactile, or thermal) or interaction paradigms (e.g., tool-mediated or bare-handed).
In the composite continuum, the left column has the three categories of haptic reality, vR-hR, vMR-hR, and vV-hR, where the corresponding environments provide only real haptic sensations. Among them, the simplest category is vR-hR,
which represents purely real environments without any synthetic stimuli. The other
end, vV-hR, refers to the conventional visual virtual environments with real touch,
for example, using a tangible prop to interact with virtual objects. Environments
between the two ends belong to vMR-hR, in which a user sees mixed objects but
still touches real objects. A typical example is the so-called tangible AR that has
been actively studied in the visual AR community. In tangible AR, a real prop held


FIGURE 10.2 Reality–virtuality continuum extended to encompass touch. (Figures taken from Jeon, S. and Choi, S., Presence Teleop. Virt. Environ., 18, 387, 2009. With permission.) (a) Original reality–virtuality continuum. (From Milgram, P. and Colquhoun, H. Jr., A taxonomy of real and virtual world display integration, in Mixed Reality: Merging Real and Virtual Worlds, Y. Ohta and H. Tamura (eds.), Springer-Verlag, Berlin, Germany, 1999, pp. 1–16.) (b) Composite visuo-haptic reality–virtuality continuum: a 3 × 3 grid crossing the degree of virtuality in vision (visual reality, visual mixed reality, visual virtuality) with the degree of virtuality in touch (haptic reality, haptic mixed reality, haptic virtuality), yielding the nine categories from vR-hR to vV-hV. (Jeon, S. and Choi, S., Presence Teleop. Virt. Environ., 18, 387, 2009.) Shaded areas in the composite continuum represent the realm of mixed reality.

in the hand is usually used as a tangible interface for visually mixed environments (e.g., the MagicBook in Billinghurst et al. 2001), and its haptic properties are regarded as unimportant for the application. Another example is the projection-augmented model: a computer-generated image is projected onto a real physical model to create a realistic-looking object, and the model can be touched by the bare hand (e.g., see Bennett and Stevens 2006). Since the material properties (e.g., texture) of the real object may not agree with those of its visually augmented model, haptic properties are usually displayed incorrectly in this application.
The categories in the right column of the composite continuum, vR-hV, vMR-hV,
and vV-hV, are for haptic virtuality, corresponding to environments with only virtual
haptic sensations, and have received the most attention from the haptics research
community. Robot-assisted motor rehabilitation can be an example of vR-hV where


synthetic haptic feedback is provided in a real visual environment, while an interactive virtual simulator is an instance of vV-hV where the sensory information of both
modalities is virtual. In the intermediate category, vMR-hV, purely virtual haptic
objects are placed in a visually mixed environment, and are rendered using a haptic interface on the basis of the conventional haptic rendering methods for virtual
objects. Earlier attempts in this category focused on how to integrate haptic rendering of virtual objects into the existing visual AR framework, and they identified
the precise registration between the haptic and the visual coordinate frame as a key
issue (Adcock et al. 2003, Vallino and Brown 1999). For this topic, Kim et al. (2006) applied an adaptive low-pass filter to reduce the trembling error of a low-cost vision-based tracker using ARToolKit and upsampled the tracking data for use with 1 kHz haptic rendering. Bianchi et al. further improved the registration accuracy via intensive calibration of a vision-based object tracker (Bianchi et al. 2006a,b). Their latest work explored the potential of visuo-haptic AR technology for medical training with their highly stable and accurate AR system (Harders et al. 2009). Ott et al. also applied the HMD-based visuo-haptic framework to training processes in industry and demonstrated its potential (Ott et al. 2007). In applications, a half mirror has often been used for constructing a visuo-haptic framework because it provides better collocation of visual and haptic feedback, for example, ImmersiveTouch (Luciano et al. 2005), the Reachin Display (Reachin Technology), the PARIS display (Johnson et al. 2000), and SenseGraphics 3D-IW (SenseGraphics). Such frameworks were, for instance, applied to cranial implant design (Scharver et al. 2004) or an MR painting application (Sandor et al. 2007).
The last categories for haptic MR, vR-hMR, vMR-hMR, and vV-hMR, with which
the rest of this chapter is concerned, lie in the middle column of the composite continuum. A common characteristic of haptic MR is that synthetic haptic signals that
are generated by a haptic interface modulate or augment stimuli that occur due to a
contact between a real object and a haptic interface medium, that is, a tool or a body
part. The VisHap system (Ye etal. 2003) is an instance of vR-hMR that provides
mixed haptic sensations in a real environment. In this system, some properties of a
virtual object (e.g., shape and stiffness) are rendered by a haptic device, while others
(e.g., texture and friction) are supplied by a real prop attached at the end-effector of
the device. Other examples in this category are the SmartTool (Nojima etal. 2002)
and SmartTouch systems (Kajimoto etal. 2004). They utilized various sensors (optical and electrical conductivity sensors) to capture real signals that could hardly be
perceived by the bare hand, transformed the signals into haptic information, and then
delivered them to the user in order to facilitate certain tasks (e.g., peeling off the
white from the yolk in an egg). The MicroTactus system (Yao etal. 2004) is another
example of vR-hMR, which detects and magnifies acceleration signals caused by
the interaction of a pen-type probe with a real object. The system was shown to
improve the performance of tissue boundary detection in arthroscopic surgical training. Asimilar pen-type haptic AR system, Ubi-Pen (Kyung and Lee 2009), embedded miniaturized texture and vibrotactile displays in the pen, adding realistic tactile
feedback for interaction with a touch screen in mobile devices.
On the other hand, environments in vV-hMR use synthetic visual stimuli. For example, Borst and Volz investigated the utility of haptic MR in a visual virtual environment


by adding synthetic force to a passive haptic response for a panel control task (Borst and Volz 2005). Their results showed that mixed force feedback was better than synthetic force alone in terms of task performance and user preference. In vMR-hMR, both modalities rely on mixed stimuli. Ha et al. installed a vibrator in a real tangible prop to produce virtual vibrotactile sensations in addition to the real haptic information of the prop in a visually mixed environment (Ha et al. 2007). They demonstrated that the virtual vibrotactile feedback enhances immersion in an AR-based handheld game. Bayart et al. introduced a teleoperation framework in which force measured at the remote site is presented at the master side with additional virtual force and mixed imagery (Bayart et al. 2007, 2008). In particular, they tried to modulate a certain real haptic property with virtual force feedback for a hole-patching task and a painting application, unlike most of the related studies introduced earlier.
Several remarks need to be made. First, the vast majority of related work, except Bayart et al. (2008), Borst and Volz (2005), and Nojima et al. (2002), has used the term haptic AR without distinguishing vMR-hV and hMR, although the research issues associated with the two categories are fundamentally different. Second, haptic MR can be further classified into haptic AR and haptic augmented virtuality using the same criterion as visual MR. All of the research instances of hMR introduced earlier correspond to haptic AR, since little knowledge regarding the environment is managed by the computer for haptic augmentation. However, despite its potential, attempts to develop systematic and general computational algorithms for haptic AR have been scarce. An instance of haptic augmented virtuality can be haptic rendering systems that use haptic signals captured from a real object (e.g., see Hoever et al. 2009, Okamura et al. 2001, Pai et al. 2001, Romano and Kuchenbecker 2011) in addition to virtual object rendering, although such a concept has not been formalized before. Third, although the taxonomy is defined for composite visuo-haptic configurations, a unimodal case (e.g., no haptic or visual feedback) can also be mapped to the corresponding 1D continuum on the axes in Figure 10.2b.

10.2.2 Artificial Recreation and Augmented Perception


The taxonomy described in the previous section is based on the visuo-haptic reality–virtuality continuum, thereby elucidating the nature of the stimuli provided to users and the associated research issues. Also useful is a taxonomy that specifies the aims of augmentation. Hugues et al. (2011) defined two functional categories for visual AR, artificial recreation (or environment) and augmented perception, which can also be applied to the hMR categories in Figure 10.2. This is in line with the terms used by Bayart and Kheddar (2006): haptic enhancing and enhanced haptics, respectively.
In artificial recreation, haptic augmentation is used to provide a realistic presentation
of physical entities by exploiting the crucial advantage of AR, that is, more efficient
and realistic construction of an immersive environment, compared to VR. Artificial
recreation can be further classified into two sub-categories. It can be either for realistic reproduction of a specific physical environment, for example, the texture display example of clothes described in Section 10.1, or for creating a nonexistent environment, for example, the tumor palpation example in Jeon et al. (2012). The latter is a particularly important area for haptic AR, since it maximizes the advantages of both VR and AR.


In contrast, augmented perception aims at utilizing touch as an additional channel for transferring useful information that can assist decision-making. Since realism is no longer a concern, the form of virtual haptic stimuli in this category varies significantly depending on the target usage. For example, one of the simplest forms is vibration alerts. Synthetic vibratory signals, when mixed with other haptic attributes of the environment, are a powerful means of conveying timing information, for example, mobile phone alarms, driving hazard warnings (Chun et al. 2013), and rhythmic guidance (Lee et al. 2012b). Recently, many researchers have also used vibration to convey spatial information (e.g., Lee and Choi 2014, Sreng et al. 2008) and discrete categorical information (e.g., haptic icons [Rovers and van Essen 2004, Ternes and MacLean 2008]).
Force feedback is another widely used form for augmentation in this category.
The most common example is virtual fixtures used for haptic guidance. They add
guiding or restraining forces to the operator's movement while he or she performs a motor task, in order to improve the safety, accuracy, and speed of task execution (Abbott et al. 2007). The term was originally coined by Rosenberg (1993), and it has been applied to various areas, for example, a manual milling tool (Zoran and Paradiso 2012), the SmartTool (Nojima et al. 2002), and surgical assistance systems (Li et al. 2007).
There have also been attempts that faithfully follow the original meaning of
augmentation of reality. The aforementioned MicroTactus system (Yao et al.
2004) is one example. Sometimes, augmentation is done by mapping nonhaptic
information into haptic cues for the purpose of data perceptualization, for example, color information mapped to tactile stimuli (Kajimoto et al. 2004). Another interesting concept is diminished reality, which hides reality, for example, removing the surface haptic texture of a physical object (Ochiai et al. 2014). This concept of diminished reality can also be applied to hand tremor cancellation in
surgical operations (Gopal et al. 2013, Mitchell et al. 2007). Lastly, in a broad
sense, exoskeletal suits are also an example of augmentation through mixing real
and virtual force.

10.2.3 Within- and Between-Property Augmentation


Various physical properties, such as shape, stiffness, friction, viscosity, and surface
texture, contribute to haptic perception. Depending on the haptic AR scenario, some
object properties may remain intact while the rest may be subject to augmentation.
Here, the augmentation may occur within a property, for example, mixing real and
virtual stiffness for rendering harder virtual nodules inside a tissue phantom (Jeon et al. 2012), or it may be between different properties, for example, adding virtual
stiffness to real surface textures (Yokokohji et al. 1999) or vice versa (Borst and
Volz 2005).
This distinction is particularly useful for gauging the degree, accuracy, and type
of registration needed for augmentation. Consequently, this taxonomy allows the
developer to quantify the amount of environment modeling necessary for registration in preprocessing and rendering steps. The next section further describes issues
and requirements for registration and environment modeling for haptic AR.


TABLE 10.1
Classification of Related Studies Using the Composite Taxonomy

Within-property augmentation
  Artificial Recreation: Jeon and Choi (2009), Jeon and Choi (2011), Jeon et al. (2012), Jeon et al. (2011), Jeon and Harders (2014), Solanki and Raja (2010), Gerling and Thomas (2005), Kurita et al. (2009), Hachisu et al. (2012), Bayart et al. (2008), Bayart et al. (2007)
  Augmented Perception: Abbott and Okamura (2003), Bose et al. (1992), Gopal et al. (2013), Kajimoto et al. (2004), Mitchell et al. (2007), Nojima et al. (2002), Ochiai et al. (2014), Yao et al. (2004), Yang et al. (2008)

Between-property augmentation
  Artificial Recreation: Borst and Volz (2005), Fukumoto and Sugimura (2001), Iwata et al. (2001), Minamizawa et al. (2007), Park et al. (2011), Ye et al. (2003), Yokokohji et al. (1999), Frey et al. (2006), Parkes et al. (2009), Ha et al. (2006), Romano and Kuchenbecker (2011)
  Augmented Perception: Lee et al. (2012a), Brewster and Brown (2004), Brown and Kaaresoja (2006), Kim and Kim (2012), Kyung and Lee (2009), Lee and Choi (2014), Powell and O'Malley (2011), Rosenberg (1993), Spence and Ho (2008), Zoran and Paradiso (2012), Grosshauser and Hermann (2009)

Further, the last two taxonomies are combined to construct a composite taxonomy,
and all relevant literature in the hMR category is classified using this taxonomy in
Table 10.1. Note that most of the haptic AR systems have both within- and between-property characteristics to some degree. For clear classification, we only examined
key augmentation features in Table 10.1.

10.3 COMPONENTS REQUIRED FOR HAPTIC AR


10.3.1 Interface for Haptic AR
A haptic AR framework inherently involves interactions with real environments. Therefore, three systems (a haptic interface, a human operator, and a real environment) react to each other through an interaction tool, leading to tridirectional interaction as shown in Figure 10.3.
During interaction, the interaction tool is coupled with the three components,
and this coupling is the core for the realization of haptic AR, that is, merging the
real and the virtual. Through this coupled tool, relevant physical signals from



FIGURE 10.3 Tridirectional interaction in haptic AR.

both the real environment and the haptic interface are mixed and transmitted to
the user. Therefore, designing this feel-through tool is of substantial importance
in designing a haptic AR interface.
The feel-through can be either direct or indirect. Direct feel-through, analogous
to optical see-through in visual AR, transmits relevant physical signals directly to
the user via a mechanically coupled implement. In contrast, in indirect feel-through
(similar to video see-through), relevant physical signals are sensed, modeled, and
synthetically reconstructed for the user to feel, for example, in master–slave teleoperation. In direct feel-through, preserving the realism of a real environment and mixing real and virtual stimuli is relatively easy, but real signals must be compensated for with great care for augmentation. To this end, the system may need to employ very accurate real-response estimation methods for active compensation or special hardware for passive compensation, for example, using a ball-bearing tip to remove friction (Jeon and Choi 2010) or a deformable tip to compensate for real contact vibration (Hachisu et al. 2012). By contrast, in indirect feel-through, modulating real signals is easier since all the final stimuli are synthesized, but more sophisticated hardware is required for transparent rendering of virtual stimuli with high realism.
Different kinds of coupling may exist. Mechanical coupling is a typical example, such as a force-feedback haptic stylus instrumented with a contact tip (Jeon and
Choi 2011). Other forms such as thermal coupling and electric coupling are also possible depending on the target property. In between-property augmentation, coupling
may not be very tight, for example, only position data and timing are shared (Borst
and Volz 2005).
Haptic AR tools can come in many different forms. In addition to typical styli,
very thin sheath-type tools are also used, for example, sensors on one side and


actuators on the other side of a sheath (Nojima et al. 2002). Sometimes a real object itself is a tool, for example, when both sensing and actuation modules are embedded in a tangible marker (Ha et al. 2006).
The tool and its coupling for haptic AR need to be designed very carefully. Each of
the three components involved in the interaction requires a proper attachment to the
tool, appropriate sensing and actuation capability, and eventually, all of these should
be compactly integrated into the tool in a way that it can be appropriately used by
a user. To this end, the form factors of the sensors, attachment joints, and actuation
parts should be carefully designed to maximize the reliability of sensing and actuation while maintaining a sufficient degree of freedom of movement.

10.3.2 Registration between Real and Virtual Stimuli


An AR system generally faces two registration problems between real and virtual
environments: spatial and temporal registrations. Virtual and real stimuli must be
spatially and temporally aligned with each other with high accuracy and robustness.
In visual AR, proper alignment of virtual graphics (usually in 3D) on real video
streams has been a major research issue (Feng et al. 2008). Tracking an AR camera, a user, and real objects and localizing them in a world coordinate frame are the core technical problems (Harders et al. 2009).
In haptic AR, virtual and real haptic stimuli also have to be spatially and
temporally aligned, for example, adding a virtual force at the right position and
at the right moment. While sharing the same principle, registration in haptic AR
sometimes has different technical requirements. In many haptic AR scenarios,
an area of interest for touching is very small (even one or a couple of points),
and touch usually occurs via a tool. Therefore, large area tracking used in visual
AR is not necessary, and tracking can be simplified, for example, detecting the
moment and location of contact between a haptic tool and a real object using a
mechanical tracker. However, tracking data are directly used for haptic rendering in many cases, so the update rate and accuracy of tracking should be carefully considered.
In addition to such basic position and timing registration, other forms of spatial
and temporal quantities related to the target augmentation property often require
adequate alignment. For example, in order to augment stiffness, the direction of
force for virtual stiffness must be aligned with the response force direction from
real stiffness. Another example is an AR pulse simulator, where the frequency and phase of a virtual heartbeat should match those of the real one. These alignments usually can be done by acquiring the real quantity through sophisticated real-time sensing and/or estimation modules and setting the corresponding virtual values to them. Examining and designing such property-related registration is one of the major research issues in developing haptic AR systems.
The requirements of this property-related registration largely depend on the application area, the target augmentation property, and the physical signals involved. However, the within/between-property taxonomy can provide some clues for judging what kind of registration is needed and how accurate it must be, as the taxonomy gives the degree of association


between virtual and real signals. In the case of within-property augmentation, mixing happens within a single property, and thus virtual signals related to a target property need to be exactly aligned with the corresponding real signals for harmonious merging and a smooth transition along the line between real and virtual. This requires very sophisticated registration, often with estimation of real properties based on sensors and environment models (see Section 10.4 for how we have approached this issue). However, in between-property augmentation, different properties are usually treated separately, and virtual signals of one target property do not have to be closely associated with real signals of the other properties. Thus, the registration may be less accurate in this case.

10.3.3 Rendering Algorithm for Augmentation


A rendering frame of an AR system consists of (1) sensing the real environment, (2) real–virtual registration, (3) merging stimuli, and (4) displaying the stimuli. The following paragraphs overview these steps for haptic AR. Steps 2 and 3 are the core parts for haptic AR.
Step 1 prepares data for Steps 2 and 3 by sensing variables from the real environment. Signal processing can also be applied to the sensor values.
Step 2 conducts a registration process based on the sensed data and pre-identified models (see Section 10.3.4 for examples). This step usually estimates the spatial and temporal state of the tool and the real environment and then conducts the registration as indicated in Section 10.3.2, for example, property-related registration and contact detection between the tool and real objects. Depending on the result of this step, the system decides whether to proceed to Step 3 or go back to Step 1 in this frame.
Step 3 is dedicated to the actual calculation of virtual feedback (in direct feel-through) or mixed feedback (in indirect feel-through). Computational procedures in this step largely depend on the categories of haptic AR (Table 10.1). For artificial recreation, this step simulates the behaviors of the properties involved in the rendering using physically based models. However, augmented perception may need to derive the target signal based on purely sensed signals and/or using simpler rules, for example, doubling the amplitude of measured contact vibration (Yao et al. 2004). In addition, within-property augmentation often requires an estimation of the properties of a real object in order to compensate for or augment them. For instance, modulating the feel of a brush in the AR drawing example first needs the compensation of the real tension and friction of the manipulandum. This estimation can be done either using a model already identified in a preprocessing step, by real-time estimation of the property using sensor values, or both (see Section 10.3.4 for more details). In between-property augmentation, however, this estimation process is generally not required, and providing virtual properties is simpler.
Step 4 sends commands to the haptic AR interface to display the feedback
calculated in Step 3. Sometimes we need techniques for controlling the
hardware for the precise delivery of stimuli.
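As an illustration only, the following Python skeleton mirrors the four steps above for one haptic frame. The device and models objects, their methods, and the force threshold are hypothetical placeholders introduced for this sketch; they are not part of any system described in this chapter.

import numpy as np

CONTACT_FORCE_THRESHOLD = 0.1  # N; assumed threshold for Step 2's contact detection


def render_frame(device, models):
    """One haptic AR rendering frame following Steps 1-4 (sketch only)."""
    # Step 1: sense the real environment (tool pose and reaction force), with filtering.
    pose = device.read_pose()
    force = device.read_filtered_force()

    # Step 2: real-virtual registration, here reduced to force-threshold contact detection.
    if np.linalg.norm(force) < CONTACT_FORCE_THRESHOLD:
        device.command_force(np.zeros(3))   # no contact: nothing to augment this frame
        return

    # Step 3: compute the virtual (or mixed) feedback from the identified models.
    f_virtual = models.compute_feedback(pose, force)

    # Step 4: send the command to the haptic interface.
    device.command_force(f_virtual)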


10.3.4 Models for Haptic AR


As mentioned in Sections 10.3.2 and 10.3.3, haptic AR requires predefined
models for three different purposes. First, models are needed for simulating the
responses of the signals associated with rendering properties, which is the same for
haptic VR rendering. Such computational models have been extensively studied in
haptics and virtual reality. In most cases, they include some degree of simplification
to fulfill the real-time requirement of haptic rendering.
The second model is for real–virtual registration (Step 2 in Section 10.3.3). The most common example is the geometry model of real objects used for contact and surface-normal detection, which is usually built in preprocessing. Employing such a geometry model makes rendering simpler since conventional rendering algorithms for haptic VR can be readily applied. However, acquiring and using such models should be minimized in order to fully utilize the general advantage of AR: efficient construction of a realistic and immersive environment without extensive modeling. Balancing the amount of modeling against the complexity of the rendering algorithm is important. In addition to geometry models, property augmentation sometimes needs models for the estimation of real information. For example, in Section 10.4, we estimate the local geometry and local deformation near the contact point based on a simplified environment model identified in preprocessing, in order to register the stiffness direction.
The last model is for the estimation of real signals in order for modulation in
Step 3 of the rendering (Section 10.3.3). The estimation often has challenging
accuracy requirements while still preserving efficiency for real-time performance.
For properties such as stiffness and friction, estimating physical responses has been
extensively studied in robotics and mechatronics for the purpose of environment
modeling and/or compensation. In general, there are two approaches for this: open-loop model-based estimation and closed-loop sensor-based estimation. One of the research issues is how to adapt those techniques for use in haptic AR, which has the following requirements. The estimation should be perceptually accurate since the errors in estimation can be directly fed into the final stimuli. The identification
process should also be feasible for the application, for example, very quick identification is mandatory for scenarios in which real objects for interaction frequently
change. Lastly, using the same hardware for both identification and rendering is preferred for the usability of the system.
Each category in Table 10.1 has different requirements for models. Systems in the artificial recreation category may need more sophisticated models for both simulation and estimation, while those in the augmented perception category may suffice with simpler models for simulation. Furthermore, systems in the within-property category may have to use very accurate registration and estimation models, while merging between properties may not need models for registration and estimation. In summary, Table 10.2 outlines the rendering and registration characteristics of the categories in the two taxonomies.
In the following sections, we introduce example algorithms for haptic AR, targeting a system that can modulate the stiffness and friction of a real object by systematically adding virtual stiffness and friction.


TABLE 10.2
Characteristics of the Categories

Within Property vs. Between Property (registration and rendering)
  Within Property: position and timing registration as well as property-related registration needed; rendering includes estimation and compensation of real signals and merging of them with virtual signals.
  Between Property: only basic position and timing registration needed; algorithms for haptic VR can be applied.

Artificial Recreation vs. Augmented Perception (models required)
  Artificial Recreation: models for physics simulation, and sometimes models for registration and compensation.
  Augmented Perception: models for registration and compensation.

Direct Feel-Through vs. Indirect Feel-Through (rendering)
  Direct Feel-Through: real-time compensation of the real property needed.
  Indirect Feel-Through: transparent haptic rendering algorithm and interface needed.

10.4 STIFFNESS MODULATION


We initiated our endeavor toward haptic AR with the augmentation or modulation
of real object stiffness, which is one of the most important properties for rendering
the shape and hardness of an object. We summarize a series of our major results on
stiffness modulation (Jeon and Choi 2008, 2009, 2010, 2011; Jeon and Harders 2012)
in the following sections. This topic can be categorized into artificial recreation and
within-property augmentation.
We aim at providing a user with augmented stiffness by adding virtual force
feedback when interacting with real objects. We took two steps for this goal. The
first step was single-point interaction supporting typical exploratory patterns, such
as tapping, stroking, or contour following (Section 10.4.2). The second step extended
the first system to two-point manipulation, focusing on grasping and squeezing
(Section 10.4.3).
Our augmentation methods emphasize minimizing the need for prior knowledge
and preprocessing, for example, the geometric model of a real object, used for registration, while preserving plausible perceptual quality. Our system requires a minimal amount of prior information such as the dynamics model of a real object. This
preserves a crucial advantage of AR; only models for the objects of interest, not the
entire environment, are required, which potentially leads to greater simplicity in
application development.
Our framework considers elastic objects with moderate stiffness for interaction.
Objects made of plastic (e.g., clay), brittle (e.g., glass), or highly stiff (e.g., steel) materials are out of scope due to either complex material behavior or the performance limitations of current haptic devices. In addition, homogeneous dynamic
material responses are assumed for real objects.


FIGURE 10.4 Haptic AR interface. (Reprinted from Jeon, S. and Harders, M., Extending haptic augmented reality: Modulating stiffness during two-point squeezing, in Proceedings of the Haptics Symposium, 2012, pp. 141–146. With permission.)

10.4.1 Haptic AR Interface


We constructed a haptic AR interface using two general impedance-type haptic interfaces (Geomagic; PHANToM Premium model 1.5), each of which has a custom-designed tool for interaction with a real object (see Figure 10.4). The tool is instrumented with a 3D force/torque sensor (ATI Industrial Automation, Inc.; model Nano17) attached between the tool tip and the gimbal joints at the last link of the PHANToM. This allows the system to measure the reaction force from a real object, which is equal to the sum of the force from the haptic interface and the force from the user's hand.

10.4.2 Stiffness Modulation in Single-Contact Interaction


Suppose that a user indents a real object with a probing tool. This makes the object
deform, and the user feels a reaction force. Let the apparent stiffness of the object at time
t be k(t). This is the stiffness that the user perceives when no additional virtual force is rendered. The goal of stiffness augmentation is to systematically change the stiffness that the user perceives, k(t), to a desired stiffness \tilde{k}(t) by providing a virtual force to the user.
As shown in Figure 10.5, two force components, the force that the haptic device exerts on the tool, f_d(t), and the force from the user's hand, f_h(t), deform the object surface and result in a reaction force f_r(t), such that

f_r(t) = f_h(t) + f_d(t). (10.1)

The reaction force fr(t) during contact can be decomposed into two orthogonal
force components, as shown in Figure 10.5:

f_r(t) = f_r^n(t) + f_r^t(t), (10.2)

where f_r^n(t) is the result of object elasticity in the normal direction and f_r^t(t) is the frictional tangential force.


FIGURE 10.5 Variables for single-contact stiffness modulation. (Reprinted from Jeon, S. and Choi, S., Presence Teleop. Virt. Environ., 20, 337, 2011. With permission.)

Let x(t) be the displacement caused by the elastic force component, which represents the distance between the haptic interface tool position, p(t), and the original non-deformed position p_c(t) of a contacted particle on the object surface. If we denote the unit vector in the direction of f_r^n(t) by u_n(t) and the target modulation stiffness by \tilde{k}(t), the force that a user should feel is

\tilde{f}_h(t) = \tilde{k}(t) x(t) u_n(t). (10.3)

Using (10.3), the force that the haptic device needs to exert is

\tilde{f}_d(t) = f_r(t) - \tilde{k}(t) x(t) u_n(t). (10.4)

This equation indicates the tasks that a stiffness modulation algorithm has to do in every loop: (1) detection of the contact between the haptic tool and the real object for spatial and temporal registration, (2) measurement of the reaction force f_r(t), (3) estimation of the direction u_n(t) and magnitude x(t) of the resulting deformation for stiffness augmentation, and (4) control of the device-rendered force f_d(t) to produce the desired force \tilde{f}_d(t). The following section describes how we address these four steps.
In Step 1, we use force sensor readings for contact detection since the entire
geometry of the real environment is not available. A collision is considered to have occurred when the forces sensed during interaction exceed a threshold. To increase the
accuracy, we developed algorithms to suppress noise, as well as to compensate for
the weight and dynamic effects of the tool. See Jeon and Choi (2011) for details.
Step 2 is also handled simply with the force sensor attached to the probing tool.
Step 3 is the key process for stiffness modulation. We first identify the friction and deformation dynamics of a real object in a preprocessing step and use them later during rendering to estimate the unknown variables for merging real and virtual forces.
The details of this process are summarized in the following section.
Before augmentation, we carry out two preprocessing steps. First, the friction
between the real object and the tool tip is identified using the Dahl friction model (Jeon
and Choi 2011). The original Dahl model is transformed to an equivalent discrete-time
difference equation, as described in Mahvash and Okamura (2006). It also includes a velocity-dependent term to cope with viscous friction. The procedure for friction


identification adopts a divide-and-conquer strategy by performing identification separately for the presliding and the sliding regime, which decouples the nonlinear identification problem into two linear problems. Data for lateral displacement, velocity, normal force, and friction force are collected during manual stroking and are then divided into presliding and sliding bins according to the lateral displacement. The data bins for the presliding regime are used to identify the parameters that define the behavior at almost zero velocity, while the others are used for the Coulomb and viscous parameters.
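To make the structure of such a friction model concrete, the following is a minimal sketch of a discrete-time Dahl-type friction simulator with a viscous term. The parameter names (sigma for the presliding stiffness, mu and b for the Coulomb and viscous coefficients) are assumptions of this sketch; the actual model and identification in Jeon and Choi (2011) differ in detail.

import numpy as np

def simulate_dahl_friction(x, v, f_n, sigma, mu, b):
    """Simulate friction along a stroking trajectory with a Dahl-type model.

    x, v, f_n : 1D arrays of lateral position, lateral velocity, and normal force
    sigma     : presliding stiffness (slope of the friction curve near zero displacement)
    mu, b     : Coulomb and viscous friction coefficients (sliding regime)
    """
    f = np.zeros_like(np.asarray(x, dtype=float))
    for k in range(1, len(f)):
        fc = mu * f_n[k]                    # Coulomb level scales with the normal force
        dx = x[k] - x[k - 1]
        # Dahl difference equation: friction creeps toward +/- fc with initial slope sigma
        f[k] = f[k - 1] + sigma * (1.0 - np.sign(v[k]) * f[k - 1] / max(fc, 1e-9)) * dx
    return f + b * np.asarray(v, dtype=float)   # add the viscous contribution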
The second preprocessing step is for identifying the deformation dynamics of the real object. We use the Hunt–Crossley model (Hunt and Crossley 1975) to account for nonlinearity. The model determines the response force magnitude given displacement x(t) and velocity \dot{x}(t) by

f(t) = k x(t)^m + b x(t)^m \dot{x}(t), (10.5)

where k and b are stiffness and damping constants and m is a constant exponent (usually 1 < m < 2). For identification, data triples consisting of displacement, velocity, and reaction force magnitude are collected through repeated presses and releases of a deformable sample in the normal direction. The data are passed to a recursive least-squares algorithm for an iterative estimation of the Hunt–Crossley model parameters (Haddadi and Hashtrudi-Zaad 2008).
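As a concrete illustration of fitting (10.5) to such press-and-release data, the sketch below uses a simplified batch alternative to the recursive least-squares scheme cited above: it grid-searches the exponent m and solves an ordinary least-squares problem for k and b at each candidate m. All names are illustrative.

import numpy as np

def fit_hunt_crossley(x, xdot, f, m_grid=None):
    """Fit f ~ k*x**m + b*(x**m)*xdot to measured (displacement, velocity, force) samples."""
    x, xdot, f = (np.asarray(a, dtype=float) for a in (x, xdot, f))
    if m_grid is None:
        m_grid = np.linspace(1.0, 2.0, 101)       # typical range: 1 < m < 2
    best = None
    for m in m_grid:
        xm = x ** m
        A = np.column_stack([xm, xm * xdot])      # columns multiply k and b
        sol, *_ = np.linalg.lstsq(A, f, rcond=None)
        err = float(np.sum((A @ sol - f) ** 2))
        if best is None or err < best[0]:
            best = (err, sol[0], sol[1], m)
    return {"k": best[1], "b": best[2], "m": best[3]}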
For rendering, the following computational process is executed in every haptic rendering frame. First, two variables, the deformation direction u_n(t) and the magnitude of the deformation x(t), are estimated. The former is derived as follows. Equation 10.2 indicates that the response force f_r(t) consists of two perpendicular force components, f_r^n(t) and f_r^t(t). Since u_n(t) is the unit vector of f_r^n(t), u_n(t) becomes

u_n(t) = (f_r(t) - f_r^t(t)) / ||f_r(t) - f_r^t(t)||. (10.6)

The unknown variable in (10.6) is f_r^t(t). The magnitude of f_r^t(t) is estimated using the identified Dahl model. Its direction is derived from the tangent vector at the current contact point p(t), which is found by projecting p(t) onto u_n(t - \Delta t) and subtracting the projection from p(t).
The next part is the estimation of x(t). The assumption of material homogeneity allows us to directly approximate it from the inverse of the Hunt–Crossley model identified previously. Finally, using the estimated u_n(t) and x(t), \tilde{f}_d(t) is calculated using (10.4) and is then sent to the haptic AR interface.
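To summarize the per-frame computation, here is a minimal sketch combining (10.4) and (10.6). The names are assumptions of this sketch: f_r_tangential stands for the friction force estimated with the identified Dahl model, and x for the deformation magnitude obtained by inverting the Hunt–Crossley model.

import numpy as np

def stiffness_modulation_force(f_r, f_r_tangential, x, k_target):
    """Per-frame virtual force for single-contact stiffness modulation.

    f_r            : measured 3D reaction force
    f_r_tangential : estimated tangential (friction) component of f_r
    x              : estimated deformation magnitude
    k_target       : desired (augmented) stiffness
    """
    f_r = np.asarray(f_r, dtype=float)
    f_n = f_r - np.asarray(f_r_tangential, dtype=float)   # elastic component, numerator of Eq. (10.6)
    u_n = f_n / np.linalg.norm(f_n)                        # deformation direction, Eq. (10.6)
    return f_r - k_target * x * u_n                        # device force, Eq. (10.4)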
In Jeon and Choi (2011), we assessed the physical performance of each component and the perceptual performance of the final rendering result using various real
samples. In particular, the perceptual quality of modulated stiffness evaluated in a


psychophysical experiment showed that rendering errors were less than the human
discriminability of stiffness. This demonstrates that our system can provide perceptually convincing stiffness modulation.

10.4.3 Stiffness Modulation in Two-Contact Squeezing


After confirming the potential of the concept, we moved to a more challenging
scenario: stiffness modulation in two-contact squeezing (Jeon and Harders 2012).
We developed new algorithms to provide stiffness augmentation while grasping and squeezing an object with two probing tools. In this system, we assume that the object is fully lifted from the ground and that the contact points do not change (i.e., no slip occurs). We also do not take inertial effects into account.
While lifting an object, an additional force due to the object's weight, f_w(t) in Figure 10.6, is involved in the system. When the user applies forces f_{h,*}(t) to hold and squeeze the object (* is either 1 or 2, depending on the contact point) and the haptic interfaces exert forces f_{d,*}(t) for modulation, weight forces f_{w,*}(t) are also present at the two contact points. At each contact point, these three force components deform the object and produce a reaction force f_{r,*}(t):

f_{r,*}(t) = f_{h,*}(t) + f_{d,*}(t) + f_{w,*}(t). (10.7)

f_{r,*}(t) can be further decomposed into the pure weight f_{w,*}(t) and a force component in the squeezing direction, f_{sqz,*}(t), as shown in Figure 10.6, resulting in

f_{r,*}(t) = f_{sqz,*}(t) + f_{w,*}(t). (10.8)

Since the displacement and the force along the squeezing direction contribute to stiffness perception, the force component of interest is f_{sqz,*}(t). Then, (10.7) can be rewritten as

f_{sqz,*}(t) = f_{h,*}(t) + f_{d,*}(t). (10.9)

FIGURE 10.6 Variables for two-contact stiffness modulation. (Reprinted from Jeon, S. and Harders, M., Extending haptic augmented reality: Modulating stiffness during two-point squeezing, in Proceedings of the Haptics Symposium, 2012, pp. 141–146. With permission.)


To make the user feel the desired stiffness \tilde{k}(t),

\tilde{f}_{h,*}(t) = \tilde{k}(t) x_*(t) u_*(t), (10.10)

where x_*(t) represents the displacement along the squeezing direction and u_*(t) is the unit vector toward the direction of that deformation. Combining (10.9) and (10.10) results in the virtual force for the haptic interfaces to render for the desired augmentation:

\tilde{f}_{d,*}(t) = f_{sqz,*}(t) - \tilde{k}(t) x_*(t) u_*(t). (10.11)

Here again, (10.11) indicates that we need to estimate the displacement x_*(t) and the deformation direction u_*(t) at each contact point. The known variables are the reaction forces f_{r,*}(t) and the tool tip positions p_*(t). To this end, the following three observations about an object held in the steady state are utilized. First, the magnitudes of the two squeezing forces f_{sqz,1}(t) and f_{sqz,2}(t) are the same, but the directions are opposite (f_{sqz,1}(t) = -f_{sqz,2}(t)). Second, each squeezing force falls on the line connecting the two contact locations. Third, the total weight of the object is equal to the sum of the two reaction force vectors:

f_{r,1}(t) + f_{r,2}(t) = f_{w,1}(t) + f_{w,2}(t).

The first and second observations provide the directions of f_{sqz,*}(t) (= u_*(t), the unit vector along \overrightarrow{p_1(t)p_2(t)} or \overrightarrow{p_2(t)p_1(t)}; also see l(t) in Figure 10.6). The magnitude of f_{sqz,*}(t), f_{sqz,*}(t), is determined as follows. The sum of the reaction forces along the l(t) direction, f_r^{sqz}(t) = f_{r,1}(t) · u_l(t) + f_{r,2}(t) · u_l(t), includes not only the two squeezing forces but also the weight. Thus, f^{sqz}(t) can be calculated by subtracting the effect of the weight along l(t) from f_r^{sqz}(t):

f^{sqz}(t) = f_r^{sqz}(t) - f_w^{sqz}(t), (10.12)

where f_w^{sqz}(t) can be derived based on the third observation such that

f_w^{sqz}(t) = (f_{r,1}(t) + f_{r,2}(t)) · u_l(t). (10.13)

Then, the squeezing force at each contact point can be derived based on the first observation:

f_{sqz,1}(t) = f_{sqz,2}(t) = 0.5 f^{sqz}(t). (10.14)


FIGURE 10.7 Example snapshot of visuo-haptic augmentation. Reaction force (dark gray
arrow), weight (gray arrow), and haptic device force (light gray arrow) are depicted. Examples
with increased stiffness (virtual forces oppose squeezing) and decreased stiffness (virtual
forces assist squeezing) are shown on left and right, respectively.

Steps for the estimation of the displacement x_*(t) in (10.11) are as follows. Let the distance between the two initial contact points on the non-deformed surface (p_{c,1}(t) and p_{c,2}(t) in Figure 10.6) be d_0. It is constant over time due to the no-slip assumption. Assuming homogeneity, x_1(t) is equal to x_2(t), and the displacements can be derived by

x_1(t) = x_2(t) = 0.5 (d_0 - d(t)), (10.15)

where d(t) is ||p_1(t) - p_2(t)||. All the unknown variables are now estimated, and the final virtual force can be calculated using (10.11).
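The sketch below strings together (10.15) and (10.11) for the two contacts. It assumes that the squeezing-force vectors f_sqz1 and f_sqz2 have already been estimated via (10.12) through (10.14); all names are illustrative.

import numpy as np

def two_contact_virtual_forces(p1, p2, f_sqz1, f_sqz2, d0, k_target):
    """Virtual forces for two-contact stiffness modulation, Eqs. (10.11) and (10.15).

    p1, p2         : current tool-tip positions
    f_sqz1, f_sqz2 : estimated squeezing-force vectors at the two contacts
    d0             : initial distance between the contact points (no-slip assumption)
    k_target       : desired stiffness
    """
    p1, p2 = np.asarray(p1, dtype=float), np.asarray(p2, dtype=float)
    u1 = (p2 - p1) / np.linalg.norm(p2 - p1)      # deformation direction at contact 1
    u2 = -u1                                      # opposite direction at contact 2
    x = 0.5 * (d0 - np.linalg.norm(p1 - p2))      # per-contact displacement, Eq. (10.15)
    f_d1 = np.asarray(f_sqz1, dtype=float) - k_target * x * u1   # Eq. (10.11)
    f_d2 = np.asarray(f_sqz2, dtype=float) - k_target * x * u2
    return f_d1, f_d2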
In Jeon and Harders (2012), we also evaluated the system performance through
simulations and a psychophysical experiment. Overall, the evaluation indicated that
our system can provide physically and perceptually sound stiffness augmentation.
In addition, the system has been further integrated with a visual AR framework (Harders et al. 2009). To our knowledge, this is among the first systems that can
augment both visual and haptic sensations. We used the visual system to display
information related to haptic augmentation, such as the force vectors involved in the
algorithm. Figure 10.7 shows exemplar snapshots.

10.5 APPLICATION: PALPATING VIRTUAL INCLUSION IN PHANTOM WITH TWO CONTACTS
This section introduces an example of the applications of our stiffness modulation
framework, taken from Jeon and Harders (2014). We developed algorithms for rendering a stiffer inclusion in a physical tissue phantom during manipulations at more
than one contact location. The basic concept is depicted in Figure 10.8. The goal of
the system is to render forces that give an illusion of a harder inclusion in the mock-up.
In Figure 10.8, fR,*(t) are the reaction forces from the real environment to which
the system adds virtual force feedback fT,*(t) stemming from the simulated tumor


FIGURE 10.8 Overall configuration of rendering a stiffer inclusion in a real mock-up. (Reprinted from Jeon, S. and Harders, M., IEEE Trans. Haptics, 99, 1, 2014. With permission.)

with the consideration of the mutual effects between the contacts. The final combined forces fH,* (t) enable a user to feel augmented sensations of the stiffer inclusion,
given as

f_{H,*}(t) = f_{R,*}(t) + f_{T,*}(t). (10.16)

Here, estimating and simulating fT,*(t) is the key for creating a sound illusion. The
hardware setup we used is the same as the one shown in Figure 10.4.
A two-step, measurement-based approach is taken to model the dynamic behavior
of the inclusion. First, a contact dynamics model representing the pure response of the
inclusion is identified using data captured while palpating a physical mock-up. Then,
another dynamics model is constructed to capture the movement characteristics of the
inclusion in response to external forces. Both models are then used in rendering to determine fT,* (t) in real-time. The procedures are detailed in the following paragraphs.
The first preprocessing step is for identifying the overall contact force resulting purely from an inclusion (the inclusion-only case) as a function of the distance between the inclusion and the contact point. Our approach is to extract the difference between the responses of a sample with a stiffer inclusion (inclusion-embedded) and a sample without it (no-inclusion). To this end, we first identify the Hunt–Crossley model of the no-inclusion sample, using the same identification procedure described in Section 10.4.2. This model is denoted by f = H_{NT}(x, \dot{x}). Then, we obtain data from the inclusion-embedded sample by manually poking along a line from p_{Ts} to p_{T0} (see Figure 10.9 for the involved quantities). This time, we also record the position changes of p_T using a position tracking system (TrackIR; NaturalPoint, Inc.). This gives us the state vector (x_{TE}, \dot{x}_{TE}, f_{TE}, p_T, p_H) when palpating the tumor-embedded sample.
As depicted in Figure 10.8, the force f_{TE}(t) can be decomposed into f_R(t) and f_T(t). Since f = H_{NT}(x, \dot{x}) represents the magnitude of f_R(t), the magnitude of f_T(t) can be obtained by passing all data pairs (x_{TE}, \dot{x}_{TE}) to H_{NT}(x, \dot{x}) and computing the differences using

f_T(x_{TE}, \dot{x}_{TE}) = f_{TE} - H_{NT}(x_{TE}, \dot{x}_{TE}). (10.17)
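As a small illustration, the difference in (10.17) can be computed over the recorded data as follows; H_NT is assumed to be a callable implementing the identified no-inclusion Hunt–Crossley model, and the other names are illustrative.

import numpy as np

def inclusion_only_forces(f_TE, x_TE, xdot_TE, H_NT):
    """Isolate the inclusion-only force magnitudes via Eq. (10.17).

    f_TE          : measured force magnitudes from the inclusion-embedded sample
    x_TE, xdot_TE : corresponding displacements and velocities
    H_NT          : callable (x, xdot) -> predicted no-inclusion force magnitude
    """
    predicted = np.array([H_NT(x, xd) for x, xd in zip(x_TE, xdot_TE)])
    return np.asarray(f_TE, dtype=float) - predicted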



FIGURE 10.9 Variables for inclusion model identification. (Reprinted from Jeon, S. and Harders, M., IEEE Trans. Haptics, 99, 1, 2014. With permission.)

f_T(t) can be expressed as a function of the distance between the inclusion and the tool tip. Let the distance from p_H(t) to p_T(t) be l_{HT}(t), and the initial distance from p_{Ts} to p_{T0} be l_0. Then, the difference, l(t) = l_0 - l_{HT}(t), becomes a relative displacement toward the inclusion. Using the data triples (l, \dot{l}, f_T), a new response model with respect to l(t) can be derived, which is denoted as H_T(l, \dot{l}). This represents the inclusion-only force response at the single contact point p_{Ts}, poking in the direction of p_T.
In the second step, the inclusion movement in response to external forces is characterized. Nonlinear changes of d(t) with respect to an external force f_T(t) can be approximated, again using the Hunt–Crossley model. After determining d(t) using a position tracker and f_T(t) using our rendering algorithm described in the next subsection, vector triples (d, \dot{d}, f_T) are employed to identify three Hunt–Crossley models for the three Cartesian directions, denoted by G_x(d_x, \dot{d}_x), G_y(d_y, \dot{d}_y), and G_z(d_z, \dot{d}_z).

10.5.1 Rendering Algorithm


Rendering begins with making contact with the no-inclusion model. Forces from multiple contacts deform the model as shown in Figure 10.10 and displace the contact points from p_{Hs,*} to p_{H,*}(t) and the inclusion from p_{T0} to p_T(t). The force causing this movement is the same as the inclusion response at the user's hand, f_{T,*}(t) in (10.16). Therefore, the direction of f_{T,*}(t) is from the inclusion position to the tool tip, such that

f_{T,*}(t) = f_{T,*}(t) (p_{H,*}(t) - p_T(t)) / ||p_{H,*}(t) - p_T(t)||, (10.18)

where the scalar f_{T,*}(t) on the right-hand side denotes the force magnitude.

Equation 10.18 indicates that the unknown values, f_{T,*}(t) and p_T(t), should be approximated during rendering.
f_{T,*}(t) is derived based on H_T. To this end, we first scale the current indentation distance to match those observed during the recording:

l_*(t) = (l_{0,*} - l_{HT,*}(t)) (l_0 / l_{0,*}). (10.19)



FIGURE 10.10 Variables for inclusion augmentation rendering. (Reprinted from Jeon, S. and Harders, M., IEEE Trans. Haptics, 99, 1, 2014. With permission.)

Then, we can obtain a linearly normalized indentation length along \overrightarrow{p_{H,*} p_T} with respect to the reference deformation. Finally, f_{T,*}(t) is approximated by f_{T,*}(t) = H_T(l_*(t), \dot{l}_*(t)).
We take a similar approach for the update of d(t), and then eventually p_T(t). Taking the inverse of G_{x,y,z} allows us to approximate d(t) by

d_i(t) = \left( \frac{\sum_{*=1}^{n} f_{T,*,i}(t)}{k + b \dot{d}_i(t)} \right)^{1/m}, i = x, y, z, (10.20)

where n is the number of contact points and m is the exponential parameter in the Hunt–Crossley model. Finally, f_{T,*}(t) is determined using (10.18) and is directly sent to the haptic AR interface.
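Putting (10.18) and (10.19) together, a per-frame sketch of the virtual inclusion forces might look as follows. H_T is assumed to be a callable for the identified inclusion-only model, velocity estimation is omitted, and the update of the inclusion position via (10.20) is left to the caller; all names are illustrative.

import numpy as np

def virtual_inclusion_forces(p_H, p_T, l0, l0_star, H_T):
    """Per-contact virtual forces for the stiffer inclusion, Eqs. (10.18)-(10.19).

    p_H     : list of current tool-tip positions
    p_T     : current (estimated) inclusion position
    l0      : initial tool-inclusion distance used during identification
    l0_star : list of initial tool-inclusion distances, one per contact
    H_T     : callable (l, l_dot) -> inclusion-only force magnitude
    """
    p_T = np.asarray(p_T, dtype=float)
    forces = []
    for p, l0_i in zip(p_H, l0_star):
        p = np.asarray(p, dtype=float)
        l_HT = np.linalg.norm(p - p_T)
        l = (l0_i - l_HT) * l0 / l0_i            # normalized indentation, Eq. (10.19)
        magnitude = H_T(l, 0.0)                  # velocity term omitted in this sketch
        direction = (p - p_T) / l_HT             # from the inclusion toward the tool tip
        forces.append(magnitude * direction)     # Eq. (10.18)
    return forces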
In Jeon and Harders (2014), we compared the simulation results of our algorithm with actual measurement data recorded from eight different real mock-ups via
various interaction methods. Overall, inclusion movements and the mutual effects
between contacts were captured and simulated with reasonable accuracy; the force
simulation errors were less than the force perception thresholds in most cases.

10.6 FRICTION MODULATION


Our next target was the modulation of surface friction (Jeon et al. 2011). Here, we introduce simple and effective algorithms for estimating the inherent friction between a tool tip and a surface and altering it to a desired friction. We also use the same hardware setup for friction modulation.
The specific goal of this work is to alter the real friction force f_real(t) such that a user
perceives a modulated friction force ftarg(t) that mimics the response of a surface
made of a different desired target material when the user strokes the real surface


FIGURE 10.11 Variables for friction modulation. (Reprinted from Jeon, S. et al., Extensions to haptic augmented reality: Modulating friction and weight, in Proceedings of the IEEE World Haptics Conference (WHC), 2011, pp. 227–232. With permission.)

with a tool. As illustrated in Figure 10.11, this is done by adding a modulation friction force f_mod(t) to the real friction force:

f_mod(t) = f_targ(t) - f_real(t). (10.21)

Thus, the task reduces to: (1) simulation of the desired friction response ftarg(t) and (2)
measurement of the real friction force freal(t).
For the simulation of the desired friction force ftarg(t) during rendering, we identify the modified Dahl model describing the friction of a target surface. For the Dahl
model parameter identification, a user repeatedly strokes the target surface with the
probe tip attached to the PHANToM. The identification procedure is the same as
that given in Section 10.4.2. The model is then used to calculate ftarg(t) using the tool
tip position and velocity and the normal contact force during augmented rendering.
freal(t) can be easily derived from force sensor readings after a noise reduction
process. Given the real friction and the target friction, the appropriate modulation
force that needs to be rendered by the device is finally computed using (10.21). The
modulation force is sent to the haptic interface for force control.
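A per-frame sketch of this modulation is given below. Here dahl_target stands for a callable wrapping the Dahl model identified on the desired surface, and the measured real friction is assumed to be already noise-filtered; the names are assumptions of this sketch.

def friction_modulation_force(dahl_target, x, v, f_n, f_real):
    """One friction-modulation step, Eq. (10.21).

    dahl_target : callable (x, v, f_n) -> simulated target friction force
    x, v        : current lateral tool position and velocity
    f_n         : current normal contact force
    f_real      : measured (noise-filtered) real friction force
    """
    f_targ = dahl_target(x, v, f_n)    # simulate the desired friction response
    return f_targ - f_real             # modulation force to render, Eq. (10.21)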
We tested the accuracy of our friction identification and modulation algorithms with four distinct surfaces (Jeon et al. 2011). The results showed that, regardless of the base surface, the friction was modulated to that of the target surface without perceptually significant errors.

10.7 OPEN RESEARCH TOPICS


In spite of our and other groups' endeavors for haptic AR, this field is still young and immature, awaiting persistent research on many intriguing and challenging topics.
For instance, our work regarding stiffness modulation has focused on homogeneous
soft real objects, meaning that the material characteristics of the objects are identical regardless of contact point. However, most natural deformable objects exhibit
inhomogeneity. Such objects show much more complicated deformation and friction


behaviors, and approaches that are based on more in-depth contact mechanics are
necessary for appropriate modeling and augmentation. This has been one direction
of our research, with an initial result that allows the user to model the shape of a soft
object using a haptic interface without the need for other devices (Yim and Choi 2012).
Our work has used a handheld tool for the exploration of real objects. This must be extended to settings that allow the use of bare hands, or at least very similar cases such as thin thimbles enclosing the fingertips. Such an extension will enlarge the application area of haptic AR to a great extent, for example, palpation training on a real phantom that includes virtual organs and lumps. To this end, we have begun to examine the feasibility of sensing not only contact force but also contact pressure in a compact device, and its utility for haptic AR (Kim et al. 2014).
Another important topic is multi-finger interaction. This functionality requires very complicated haptic interfaces that provide multiple, independent forces with a large number of degrees of freedom (see Barbagli et al. 2005), as well as very
sophisticated deformable body rendering algorithms that take into account the interplay between multiple contacts. Research effort on this topic is still ongoing even for
haptic VR.
Regarding material properties, we need methods to augment friction, texture, and temperature. Friction is expected to be relatively easy to both model and render for haptic AR, as long as deformation is properly handled. Temperature modulation is likely to be more challenging, especially due to the difficulty of integrating a temperature display onto the fingertip that touches real objects. This functionality could greatly improve the realism of AR applications.
The last critical topic we wish to mention is texture. Texture is one of the most
salient material properties and determines the identifying tactual characteristics of
an object (Katz 1925). As such, a great amount of research has been devoted to
haptic perception and rendering of surface texture. Texture is also one of the most
complex issues because of the multiple perceptual dimensions involved in texture
perception; surface microgeometry and the material's elasticity, viscosity, and friction all play an important role (Hollins et al. 1993, 2000). See Choi and Tan (2004a,b,
2005, 2007) for a review of texture perception relevant to haptic rendering, Campion
and Hayward (2007) for passive rendering of virtual textures, and Fritz and Barner
(1996), Guruswamy et al. (2011), Lang and Andrews (2011), and Romano and
Kuchenbecker (2011) for various models. All of these studies pertained to haptic VR
rendering. Among these, the work of Kuchenbecker and her colleagues appears the most feasible for application to haptic AR; they have developed a high-quality texture
rendering system that overlays artificial vibrations on a touchscreen to deliver the
textures of real samples (Romano and Kuchenbecker 2011) and an open database of
textures (Culbertson et al. 2014). This research can be a cornerstone for the modeling
and augmentation of real textures.

10.8 CONCLUSIONS
This chapter overviewed the emerging AR paradigm for the sense of touch. We first
outlined the conceptual, functional, and technical aspects of this new paradigm with
three taxonomies and a thorough review of existing literature. Then, we moved to
recent attempts at realizing haptic AR, where hardware and algorithms for augmenting the stiffness and friction of a real object were detailed. These frameworks were applied to medical palpation training, where stiffer virtual inclusions are rendered in a real tissue mock-up. Lastly, we elucidated several challenges and future research topics in this area. We hope that the work introduced in this chapter will pave the way to more diverse and mature research in the exciting field of haptic AR.

REFERENCES
Abbott, J., P. Marayong, and A. Okamura. 2007. Haptic virtual fixtures for robot-assisted manipulation. In Robotics Research, eds. S. Thrun, R. Brooks, and H. Durrant-Whyte, pp. 49–64. Springer-Verlag: Berlin, Germany.
Abbott, J. and A. Okamura. 2003. Virtual fixture architectures for telemanipulation. Proceedings of the IEEE International Conference on Robotics and Automation, pp. 2798–2805. Taipei, Taiwan.
Adcock, M., M. Hutchins, and C. Gunn. 2003. Augmented reality haptics: Using ARToolKit for display of haptic applications. Proceedings of Augmented Reality Toolkit Workshop, pp. 1–2. Tokyo, Japan.
Azuma, R., Y. Baillot, R. Behringer, S. Feiner, S. Julier, and B. MacIntyre. 2001. Recent advances in augmented reality. IEEE Computer Graphics & Applications 21 (6):34–47.
Barbagli, F., D. Prattichizzo, and K. Salisbury. 2005. A multirate approach to haptic interaction with deformable objects–single and multipoint contacts. International Journal of Robotics Research 24 (9):703–716.
Bayart, B., J. Y. Didier, and A. Kheddar. 2008. Force feedback virtual painting on real objects: A paradigm of augmented reality haptics. Lecture Notes in Computer Science (EuroHaptics 2008) 5024:776–785.
Bayart, B., A. Drif, A. Kheddar, and J.-Y. Didier. 2007. Visuo-haptic blending applied to a tele-touch-diagnosis application. Lecture Notes in Computer Science (Virtual Reality, HCII 2007) 4563:617–626.
Bayart, B. and A. Kheddar. 2006. Haptic augmented reality taxonomy: Haptic enhancing and enhanced haptics. Proceedings of EuroHaptics, pp. 641–644. Paris, France.
Bennett, E. and B. Stevens. 2006. The effect that the visual and haptic problems associated with touching a projection augmented model have on object-presence. Presence: Teleoperators and Virtual Environments 15 (4):419–437.
Bianchi, G., C. Jung, B. Knoerlein, G. Szekely, and M. Harders. 2006a. High-fidelity visuo-haptic interaction with virtual objects in multi-modal AR systems. Proceedings of the IEEE and ACM International Symposium on Mixed and Augmented Reality, pp. 187–196. Santa Barbara, USA.
Bianchi, G., B. Knoerlein, G. Szekely, and M. Harders. 2006b. High precision augmented reality haptics. Proceedings of EuroHaptics, pp. 169–168. Paris, France.
Billinghurst, M., H. Kato, and I. Poupyrev. 2001. The MagicBook–Moving seamlessly between reality and virtuality. IEEE Computer Graphics & Applications 21 (3):6–8.
Borst, C. W. and R. A. Volz. 2005. Evaluation of a haptic mixed reality system for interactions with a virtual control panel. Presence: Teleoperators and Virtual Environments 14 (6):677–696.
Bose, B., A. K. Kalra, S. Thukral, A. Sood, S. K. Guha, and S. Anand. 1992. Tremor compensation for robotics assisted microsurgery. Engineering in Medicine and Biology Society, 1992, 14th Annual International Conference of the IEEE, October 29, 1992–November 1, 1992, pp. 1067–1068. Paris, France.
Brewster, S. and L. M. Brown. 2004. Tactons: Structured tactile messages for non-visual information display. Proceedings of the Australasian User Interface Conference, pp. 15–23. Dunedin, New Zealand.
Brown, L. M. and T. Kaaresoja. 2006. Feel who's talking: Using tactons for mobile phone alerts. Proceedings of the Annual SIGCHI Conference on Human Factors in Computing Systems, pp. 604–609. Montréal, Canada.
Campion, G. and V. Hayward. 2007. On the synthesis of haptic textures. IEEE Transactions on Robotics 24 (3):527–536.
Choi, S. and H. Z. Tan. 2004a. Perceived instability of virtual haptic texture. I. Experimental studies. Presence: Teleoperators and Virtual Environments 13 (4):395–415.
Choi, S. and H. Z. Tan. 2004b. Toward realistic haptic rendering of surface textures. IEEE Computer Graphics & Applications (Special Issue on Haptic Rendering–Beyond Visual Computing) 24 (2):40–47.
Choi, S. and H. Z. Tan. 2005. Perceived instability of haptic virtual texture. II. Effects of collision detection algorithm. Presence: Teleoperators and Virtual Environments 14 (4):463–481.
Choi, S. and H. Z. Tan. 2007. Perceived instability of virtual haptic texture. III. Effect of update rate. Presence: Teleoperators and Virtual Environments 16 (3):263–278.
Chun, J., I. Lee, G. Park, J. Seo, S. Choi, and S. H. Han. 2013. Efficacy of haptic blind spot warnings applied through a steering wheel or a seatbelt. Transportation Research Part F: Traffic Psychology and Behaviour 21:231–241.
Culbertson, H., J. J. L. Delgado, and K. J. Kuchenbecker. 2014. One hundred data-driven haptic texture models and open-source methods for rendering on 3D objects. Proceedings of the IEEE Haptics Symposium, pp. 319–325. Houston, TX.
Feng, Z., H. B. L. Duh, and M. Billinghurst. 2008. Trends in augmented reality tracking, interaction and display: A review of ten years of ISMAR. Proceedings of the IEEE/ACM International Symposium on Mixed and Augmented Reality, pp. 193–202. Cambridge, UK.
Frey, M., J. Hoogen, R. Burgkart, and R. Riener. 2006. Physical interaction with a virtual knee joint–The 9 DOF haptic display of the Munich knee joint simulator. Presence: Teleoperators and Virtual Environments 15 (5):570–587.
Fritz, J. P. and K. E. Barner. 1996. Stochastic models for haptic texture. Proceedings of SPIE's International Symposium on Intelligent Systems and Advanced Manufacturing–Telemanipulator and Telepresence Technologies III, pp. 34–44. Boston, MA.
Fukumoto, M. and T. Sugimura. 2001. Active click: Tactile feedback for touch panels. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 121–122. Seattle, WA.
Gerling, G. J. and G. W. Thomas. 2005. Augmented, pulsating tactile feedback facilitates simulator training of clinical breast examinations. Human Factors 47 (3):670–681.
Gopal, P., S. Kumar, S. Bachhal, and A. Kumar. 2013. Tremor acquisition and reduction for robotic surgical applications. Proceedings of International Conference on Advanced Electronic Systems, pp. 310–312. Pilani, India.
Grosshauser, T. and T. Hermann. 2009. Augmented haptics–An interactive feedback system for musicians. Lecture Notes in Computer Science (HAID 2012) 5763:100–108.
Guruswamy, V. L., J. Lang, and W.-S. Lee. 2011. IIR filter models of haptic vibration textures. IEEE Transactions on Instrumentation and Measurement 60 (1):93–103.
Ha, T., Y. Chang, and W. Woo. 2007. Usability test of immersion for augmented reality based product design. Lecture Notes in Computer Science (Edutainment 2007) 4469:152–161.
Ha, T., Y. Kim, J. Ryu, and W. Woo. 2006. Enhancing immersiveness in AR-based product design. Lecture Notes in Computer Science (ICAT 2006) 4282:207–216.
Hachisu, T., M. Sato, S. Fukushima, and H. Kajimoto. 2012. Augmentation of material property by modulating vibration resulting from tapping. Lecture Notes in Computer Science (EuroHaptics 2012) 7282:173–180.
Haddadi, A. and K. Hashtrudi-Zaad. 2008. A new method for online parameter estimation of Hunt-Crossley environment dynamic models. Proceedings of the IEEE International Conference on Intelligent Robots and Systems, pp. 981–986. Nice, France.
Harders, M., G. Bianchi, B. Knoerlein, and G. Szekely. 2009. Calibration, registration, and synchronization for high precision augmented reality haptics. IEEE Transactions on Visualization and Computer Graphics 15 (1):138–149.
Hoever, R., G. Kosa, G. Szekely, and M. Harders. 2009. Data-driven haptic rendering–from viscous fluids to visco-elastic solids. IEEE Transactions on Haptics 2:15–27.
Hollins, M., S. J. Bensmaïa, K. Karlof, and F. Young. 2000. Individual differences in perceptual space for tactile textures: Evidence from multidimensional scaling. Perception & Psychophysics 62 (8):1534–1544.
Hollins, M., R. Faldowski, R. Rao, and F. Young. 1993. Perceptual dimensions of tactile surface texture: A multidimensional scaling analysis. Perception & Psychophysics 54:697–705.
Hugues, O., P. Fuchs, and O. Nannipieri. 2011. New augmented reality taxonomy: Technologies and features of augmented environment. In Handbook of Augmented Reality, ed. B. Furht, pp. 47–63. Springer-Verlag: Berlin, Germany.
Hunt, K. and F. Crossley. 1975. Coefficient of restitution interpreted as damping in vibroimpact. ASME Journal of Applied Mechanics 42:440–445.
Iwata, H., H. Yano, F. Nakaizumi, and R. Kawamura. 2001. Project FEELEX: Adding haptic surface to graphics. Proceedings of ACM SIGGRAPH, pp. 469–476. Los Angeles, CA.
Jeon, S. and S. Choi. 2008. Modulating real object stiffness for haptic augmented reality. Lecture Notes in Computer Science (EuroHaptics 2008) 5024:609–618.
Jeon, S. and S. Choi. 2009. Haptic augmented reality: Taxonomy and an example of stiffness modulation. Presence: Teleoperators and Virtual Environments 18 (5):387–408.
Jeon, S. and S. Choi. 2010. Stiffness modulation for haptic augmented reality: Extension to 3D interaction. Proceedings of the Haptics Symposium, pp. 273–280. Waltham, MA.
Jeon, S. and S. Choi. 2011. Real stiffness augmentation for haptic augmented reality. Presence: Teleoperators and Virtual Environments 20 (4):337–370.
Jeon, S. and M. Harders. 2012. Extending haptic augmented reality: Modulating stiffness during two-point squeezing. Proceedings of the Haptics Symposium, pp. 141–146. Vancouver, Canada.
Jeon, S. and M. Harders. 2014. Haptic tumor augmentation: Exploring multi-point interaction. IEEE Transactions on Haptics 99 (Preprints):11.
Jeon, S., M. Harders, and S. Choi. 2012. Rendering virtual tumors in real tissue mock-ups using haptic augmented reality. IEEE Transactions on Haptics 5 (1):77–84.
Jeon, S., J.-C. Metzger, S. Choi, and M. Harders. 2011. Extensions to haptic augmented reality: Modulating friction and weight. Proceedings of the IEEE World Haptics Conference (WHC), pp. 227–232. Istanbul, Turkey.
Johnson, A., D. Sandin, G. Dawe, T. DeFanti, D. Pape, Z. Qiu, and D. P. S. Thongrong. 2000. Developing the PARIS: Using the CAVE to prototype a new VR display. Proceedings of the ACM Symposium on Immersive Projection Technology.
Kajimoto, H., N. Kawakami, S. Tachi, and M. Inami. 2004. SmartTouch: Electric skin to touch the untouchable. IEEE Computer Graphics & Applications 24 (1):36–43.
Katz, D. 1925. The World of Touch. Hillsdale, NJ: Lawrence Erlbaum Associates.
Kim, H., S. Choi, and W. K. Chung. 2014. Contact force decomposition using tactile information for haptic augmented reality. Proceedings of the IEEE/RSJ International Conference on Robots and Systems, pp. 1242–1247. Chicago, IL.
Kim, S., J. Cha, J. Kim, J. Ryu, S. Eom, N. P. Mahalik, and B. Ahn. 2006. A novel test-bed for immersive and interactive broadcasting production using augmented reality and haptics. IEICE Transactions on Information and Systems E89-D (1):106–110.
Kim, S.-Y. and J. C. Kim. 2012. Vibrotactile rendering for a traveling vibrotactile wave based on a haptic processor. IEEE Transactions on Haptics 5 (1):14–20.
Kurita, Y., A. Ikeda, T. Tamaki, T. Ogasawara, and K. Nagata. 2009. Haptic augmented reality interface using the real force response of an object. Proceedings of the ACM Virtual Reality Software and Technology, pp. 83–86. Kyoto, Japan.
Kyung, K.-U. and J.-Y. Lee. 2009. Ubi-Pen: A haptic interface with texture and vibrotactile display. IEEE Computer Graphics and Applications 29 (1):24–32.
Lang, J. and S. Andrews. 2011. Measurement-based modeling of contact forces and textures for haptic rendering. IEEE Transactions on Visualization and Computer Graphics 17 (3):380–391.
Lee, H., W. Kim, J. Han, and C. Han. 2012a. The technical trend of the exoskeleton robot system for human power assistance. International Journal of Precision Engineering and Manufacturing 13 (8):1491–1497.
Lee, I. and S. Choi. 2014. Vibrotactile guidance for drumming learning: Method and perceptual assessment. Proceedings of the IEEE Haptics Symposium, pp. 147–152. Houston, TX.
Lee, I., K. Hong, and S. Choi. 2012. Guidance methods for bimanual timing tasks. Proceedings of IEEE Haptics Symposium, pp. 297–300. Vancouver, Canada.
Li, M., M. Ishii, and R. H. Taylor. 2007. Spatial motion constraints using virtual fixtures generated by anatomy. IEEE Transactions on Robotics 23 (1):4–19.
Luciano, C., P. Banerjee, L. Florea, and G. Dawe. 2005. Design of the ImmersiveTouch: A high-performance haptic augmented virtual reality system. Proceedings of International Conference on Human-Computer Interaction. Las Vegas, NV.
Mahvash, M. and A. M. Okamura. 2006. Friction compensation for a force-feedback telerobotic system. Proceedings of the IEEE International Conference on Robotics and Automation, pp. 3268–3273. Orlando, FL.
Milgram, P. and H. Colquhoun, Jr. 1999. A taxonomy of real and virtual world display integration. In Mixed Reality–Merging Real and Virtual Worlds, eds. Y. Ohta and H. Tamura, pp. 1–16. Springer-Verlag: Berlin, Germany.
Minamizawa, K., H. Kajimoto, N. Kawakami, and S. Tachi. 2007. Wearable haptic display to present gravity sensation. Proceedings of the World Haptics Conference, pp. 133–138. Tsukuba, Japan.
Mitchell, B., J. Koo, M. Iordachita, P. Kazanzides, A. Kapoor, J. Handa, G. Hager, and R. Taylor. 2007. Development and application of a new steady-hand manipulator for retinal surgery. Proceedings of the IEEE International Conference on Robotics and Automation, pp. 623–629. Rome, Italy.
Nojima, T., D. Sekiguchi, M. Inami, and S. Tachi. 2002. The SmartTool: A system for augmented reality of haptics. Proceedings of the IEEE Virtual Reality Conference, pp. 67–72. Orlando, FL.
Ochiai, Y., T. Hoshi, J. Rekimoto, and M. Takasaki. 2014. Diminished haptics: Towards digital transformation of real world textures. Lecture Notes in Computer Science (EuroHaptics 2014, Part I) 8618:409–417.
Okamura, A. M., M. R. Cutkosky, and J. T. Dennerlein. 2001. Reality-based models for vibration feedback in virtual environments. IEEE/ASME Transactions on Mechatronics 6 (3):245–252.
Ott, R., D. Thalmann, and F. Vexo. 2007. Haptic feedback in mixed-reality environment. The Visual Computer: International Journal of Computer Graphics 23 (9):843–849.
Pai, D. K., K. van den Doel, D. L. James, J. Lang, J. E. Lloyd, J. L. Richmond, and S. H. Yau. 2001. Scanning physical interaction behavior of 3D objects. Proceedings of the Annual Conference on ACM Computer Graphics and Interactive Techniques, pp. 87–96. Los Angeles, CA.
Park, G., S. Choi, K. Hwang, S. Kim, J. Sa, and M. Joung. 2011. Tactile effect design and evaluation for virtual buttons on a mobile device touchscreen. Proceedings of the International Conference on Human-Computer Interaction with Mobile Devices and Services (MobileHCI), pp. 11–20. Stockholm, Sweden.
Parkes, R., N. N. Forrest, and S. Baillie. 2009. A mixed reality simulator for feline abdominal palpation training in veterinary medicine. Studies in Health Technology and Informatics 142:244–246.
Powell, D. and M. K. O'Malley. 2011. Efficacy of shared-control guidance paradigms for robot-mediated training. Proceedings of the IEEE World Haptics Conference, pp. 427–432. Istanbul, Turkey.
Reachin Technology. Reachin Display. http://www.reachin.se/. Accessed March 4, 2015.
Romano, J. M. and K. J. Kuchenbecker. 2011. Creating realistic virtual textures from contact acceleration data. IEEE Transactions on Haptics 5 (2):109–119.
Rosenberg, L. B. 1993. Virtual fixtures: Perceptual tools for telerobotic manipulation. Proceedings of the IEEE Virtual Reality Annual International Symposium, pp. 76–82.
Rovers, L. and H. van Essen. 2004. Design and evaluation of hapticons for enriched instant messaging. Proceedings of EuroHaptics, pp. 498–503. Munich, Germany.
Sandor, C., S. Uchiyama, and H. Yamamoto. 2007. Visuo-haptic systems: Half-mirrors considered harmful. Proceedings of the World Haptics Conference, pp. 292–297. Tsukuba, Japan.
Scharver, C., R. Evenhouse, A. Johnson, and J. Leigh. 2004. Designing cranial implants in a haptic augmented reality environment. Communications of the ACM 47 (8):32–38.
SenseGraphics. 3D-IW. http://www.sensegraphics.se/. Accessed March 4, 2015.
Solanki, M. and V. Raja. 2010. Haptic based augmented reality simulator for training clinical breast examination. Proceedings of the IEEE Conference on Biomedical Engineering and Sciences, pp. 265–269. Kuala Lumpur, Malaysia.
Spence, C. and C. Ho. 2008. Tactile and multisensory spatial warning signals for drivers. IEEE Transactions on Haptics 1 (2):121–129.
Sreng, J., A. Lécuyer, and C. Andriot. 2008. Using vibration patterns to provide impact position information in haptic manipulation of virtual objects. Lecture Notes in Computer Science (EuroHaptics 2008) 5024:589–598.
Ternes, D. and K. E. MacLean. 2008. Designing large sets of haptic icons with rhythm. Lecture Notes in Computer Science (EuroHaptics 2008) 5024:199–208.
Vallino, J. R. and C. M. Brown. 1999. Haptics in augmented reality. Proceedings of the IEEE International Conference on Multimedia Computing and Systems, pp. 195–200. Florence, Italy.
Yang, C., J. Zhang, I. Chen, Y. Dong, and Y. Zhang. 2008. A review of exoskeleton-type systems and their key technologies. Proceedings of the Institution of Mechanical Engineers, Part C: Journal of Mechanical Engineering Science 222 (8):1599–1612.
Yao, H.-Y., V. Hayward, and R. E. Ellis. 2004. A tactile magnification instrument for minimally invasive surgery. Lecture Notes in Computer Science (MICCAI) 3217:89–96.
Ye, G., J. Corso, G. Hager, and A. Okamura. 2003. VisHap: Augmented reality combining haptics and vision. Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, pp. 3425–3431. Washington, D.C.
Yim, S. and S. Choi. 2012. Shape modeling of soft real objects using force-feedback haptic interface. Proceedings of the IEEE Haptics Symposium, pp. 479–484. Vancouver, Canada.
Yokokohji, Y., R. L. Hollis, and T. Kanade. 1999. WYSIWYF display: A visual/haptic interface to virtual environment. Presence: Teleoperators and Virtual Environments 8 (4):412–434.
Zoran, A. and J. A. Paradiso. 2012. The FreeD–A handheld digital milling device for craft and fabrication. Proceedings of the ACM Symposium on User Interface Software and Technology, pp. 3–4. Toronto, Canada.

Section III
Augmented Reality

11

Location-Based Mixed and Augmented Reality Storytelling
Ronald Azuma

CONTENTS
11.1 Motivation
11.2 Reinforcing
11.3 Reskinning
11.4 Remembering
11.5 Conclusion
References

11.1 MOTIVATION
One of the ultimate uses of mixed reality (MR) and augmented reality (AR) will be
to enable new forms of storytelling that enable virtual content to be connected in
meaningful ways to particular locations, whether those are places, people, or objects.
By AR, I refer to experiences that superimpose or composite virtual content in
3D space directly over the real world, in real time (Azuma, 1997). However, this
chapter also includes a broader range of MR experiences that blend real and virtual
in some manner but may not require precise alignment between the two (Milgram
and Kishino, 1994).
Initially, AR applications focused on professional usages that aided perception of
a task that needed to be done in a complex 3D environment, such as medical surgeries or the maintenance and assembly of equipment. This focus was logical, because
in the early days of AR, the equipment required was so specialized and expensive
that only professional applications seemed economically viable. Later, access to MR
and AR technologies became democratized through marker and image-based tracking via cameras attached to desktop and laptop computers, smartphones, and tablets.
This enabled almost everyone to run certain forms of MR and AR on devices that
they already bought for other purposes besides AR. Today, we see a variety of MR
and AR applications that target the mass market for advertising, entertainment, and
educational purposes. In the future, these experiences will advance to the point of
establishing new forms of media that rely upon the combination of real and virtual
to tell stories in new and compelling ways. In traditional media, such as books, radio,
movies, TV, and video games, the content is entirely virtual and disconnected from
the location where the content is experienced. MR and AR storytelling experiences will offer new ways to tell stories, different from those of traditional media, with new advantages and disadvantages compared against established media.
A wide variety of platforms and systems run the AR and MR storytelling experiences that I cover in this chapter. There is no single platform or system that dominates, partially because AR and MR technologies are still evolving and also because
the creators build or adapt custom systems to fit their particular experiences. A system that runs inside the controlled environment of a museum can be very different
from mobile systems that bike riders carry with them as they ride through a city. This
lack of standard platforms increases the challenge of telling stories with AR and MR
technologies, requiring storytellers to also become familiar with the capabilities and
limitations of the underlying technologies.
Storytelling is fundamentally important, and any advancements in media technology that enable people to tell stories in new and potentially more compelling ways
can have profound impact. While almost everyone enjoys good stories as a form of
entertainment, the importance of storytelling runs much deeper than that. Telling
a story is an important method of education and instruction. Stories can contain
lessons, codified bits of wisdom that are passed on in a memorable and enjoyable
form. Technological developments that make the story clearer and more memorable
can aid retention and understanding. I firmly believe that, in the long run, one of the
ultimate uses of MR and AR technologies will be as a new form of location-based
media that enables new storytelling experiences.
The goal of this chapter is to discuss location-based MR and AR storytelling.
I provide a hypothesis of why this might be a powerful new form of media, and specify three approaches for achieving this potential. The bulk of this chapter provides an overview of various experiences or concepts that provide a glimmer of the potential inherent here. While I attempt to discuss a representative sample of approaches in this area, this is not a comprehensive survey, so it does not cover all previous work in this field. To conclude the chapter, I discuss a fundamental challenge limiting this new type of media, along with what an ultimate payoff might be.
MR and AR storytelling is a particular subset of a much broader area of location-based experiences that include ARGs (alternate reality games), puzzle hunts, cross-media and trans-media experiences, pervasive games, and performance art. For a broader discussion of these other types of experiences, please see Benford and Giannachi (2011), Harrigan and Wardrip-Fruin (2007), and Montola et al. (2009).
This chapter focuses on MR and AR systems that explicitly attempt to tell a story
to the participants, rather than ones where the focus is on playing a game, solving a
puzzle, or providing entertainment or an artistic experience.
This chapter also focuses on location-based storytelling experiences that generally occur outside a participant's home and, in many cases, operate only at specific
sites. Therefore, I do not cover the case of augmenting a real book with 3D virtual
content to supplement the story that exists in the traditional book. A commercial
example of an AR book is Anomaly (Brittenham and Haberlin, 2012).
The reason this chapter does not cover AR books is my hypothesis about what
will make AR storytelling compelling: the combination of real and virtual must be
meaningful and powerful, where the core of the experience requires both the real

and virtual components. If the experience is based on reality by itself, with little contributed by the augmentations, then there is no point in using AR. Conversely, if the
core of the experience comes solely from virtual content, then the augmentation part
is only a novelty and it will not be a viable new form of media. Many AR experiences
fall into the latter case. In the case of books, DVD cases, and movie posters, what is
compelling about reality is not the book, DVD, or poster. It is the virtual content represented or embodied by those objects. The compelling content resides in the ideas in
the books and the movies themselves, not the physical objects. Therefore, an experience that augments a book or movie poster with virtual augmentations derives all its
power from purely virtual content. Reality then becomes a backdrop that forms the
context of the experience, and perhaps part of the user interface, but reality is not a
core part of the content.
I hypothesize that there are at least three approaches for AR storytelling where
both real and virtual form critical parts of the experience:
Reinforcing
Reskinning
Remembering
In the next three sections, I discuss these three in more detail and describe examples
and concepts of each approach.

11.2 REINFORCING
In reinforcing, the AR storytelling strategy is to select a real environment, whether
that is an object, person, or location, that is inherently compelling by itself, without
augmentation. Then, the AR augmentations attempt to complement the power that
is inherent in reality itself to form a new type of experience that is more compelling
than either the virtual content or reality by themselves.
Let me provide a conceptual example. Let's assume the goal of an experience is
to educate a participant about the Battle of Gettysburg. A student wishing to learn
about that battle could watch a 1993 movie, called Gettysburg, which had star actors,
superb cinematography, a compelling soundtrack, thousands of Civil War reenactors, and parts of it were even filmed on the site of the battle. However, Gettysburg is
also a real location. If you are so inclined, you can travel there and see the battlefield
yourself, in person. And if you do this, you will see large grassy fields, stone fences,
and many monuments. You will not see any reenactments of the battle or other virtual recreations that take you back to that fateful time in 1863. Yet, if you know why
that spot is important in American history, then simply being there, on the actual
spot where the event happened, is a powerful experience. I remember standing at
the spot of Pickett's Charge, on the Union side, and feeling overcome by emotion. (For the reader unfamiliar with the American Civil War, Pickett's Charge was the
culmination of the Battle of Gettysburg. The Confederates lost both Gettysburg and
Vicksburg, on July 3 and 4, 1863. These two events are generally considered to be
the turning point of that war. Because the Union won the American Civil War, the
United States became one unified, indivisible country rather than separated into two

or more countries.) An AR storytelling experience that was located on the battlefield


itself might be able to draw from the best of both real and virtual to provide a new
type of compelling experience. The virtual content embodied by the film Gettysburg,
emplaced in the actual location where the battle took place, could be more powerful
than experiencing either independently.
My favorite example of this strategy is 110 Stories, which was designed by
Brian August (August, 110 Stories–What's Your Story?). This experience runs on
a mobile phone. If you are near or in Manhattan, the application uses the compass
and tilt sensors in the phone to render an outline of where the Twin Towers in the
World Trade Center should be, against the New York City skyline (Figure 11.1).
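As a rough illustration of the kind of computation such a sensor-driven overlay performs (this is not the actual 110 Stories implementation), the sketch below projects a geo-anchored point onto the phone screen from the compass heading and tilt alone. The flat-earth approximation, the fields of view, and all coordinates below are assumptions made for illustration; a real application would also use the camera intrinsics and compensate for device roll.

```python
import math

def landmark_screen_position(user_lat, user_lon, user_heading_deg, user_pitch_deg,
                             target_lat, target_lon, target_height_m,
                             screen_w, screen_h, hfov_deg=60.0, vfov_deg=45.0):
    """Project a geo-anchored point onto the screen using only the compass
    heading and tilt (pitch). Flat-earth approximation; hypothetical fields of
    view; no roll compensation."""
    # Approximate metres per degree of latitude/longitude near the user.
    m_per_deg_lat = 111_320.0
    m_per_deg_lon = 111_320.0 * math.cos(math.radians(user_lat))
    east = (target_lon - user_lon) * m_per_deg_lon
    north = (target_lat - user_lat) * m_per_deg_lat
    distance = math.hypot(east, north)

    # Bearing to the target and its elevation angle above the horizon.
    bearing = math.degrees(math.atan2(east, north)) % 360.0
    elevation = math.degrees(math.atan2(target_height_m, distance))

    # Angular offsets relative to where the camera is pointing.
    d_yaw = (bearing - user_heading_deg + 180.0) % 360.0 - 180.0
    d_pitch = elevation - user_pitch_deg

    # Map angular offsets to pixels (linear approximation of a pinhole camera).
    x = screen_w / 2 + (d_yaw / hfov_deg) * screen_w
    y = screen_h / 2 - (d_pitch / vfov_deg) * screen_h
    visible = abs(d_yaw) < hfov_deg / 2 and abs(d_pitch) < vfov_deg / 2
    return x, y, visible

# Example call with made-up viewer and landmark values.
x, y, visible = landmark_screen_position(
    user_lat=40.702, user_lon=-74.015, user_heading_deg=30.0, user_pitch_deg=10.0,
    target_lat=40.7115, target_lon=-74.0134, target_height_m=415.0,
    screen_w=1080, screen_h=1920)
```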
To me, there are two design decisions that make 110 Stories particularly poignant.
The first was the choice of how to render the buildings. Even on mobile phones, we
have graphics hardware that could render a detailed, perhaps nearly photorealistic
representation of the Twin Towers. But the application does not do that. Instead, it
renders only the outline of the buildings, as if sketched against the skyline with a
grease pencil. While this reduces the realism, I believe this makes the experience
more effective, because it matches the message that this experience tries to send:
That the towers are no longer there. And they are supposed to be there. The second
aspect is that after this application augments Manhattan with the outline of the towers, it invites the participant to not only capture and upload that augmented image,
but to also submit a brief story. Why did the participant choose to take this picture?
What does it mean to the photographer? If you go to the 110 Stories website, you
will see many stories that participants have chosen to record and share. One that I
have never forgotten tells how the author grew up in Manhattan, and his brother told
him that as long as he could see the Twin Towers, he would never be lost because
they told him which direction was south. They would always guide him home. And
the story concluded by stating that his brother was now forever at home, with the
implication being that his brother was in the Twin Towers on September 11, and that
he missed him.

FIGURE 11.1 110 Stories.

In the Voices of Oakland experience, participants experience stories about people who are buried in Oakland Cemetery, the oldest cemetery in Atlanta (Dow et al., 2005). The core experience is a singular linear story that takes participants along a
2005). The core experience is a singular linear story that takes participants along a
designated path through the cemetery, but with options to explore additional content
if the participant chooses. The creators describe the historic locations as imbued
with an aura that can make such experiences compelling. Because this is a historic
cemetery, the creators could not modify anything about the environment, nor add
markers or other elements to aid tracking. Furthermore, they took care to develop an
experience that was appropriately serious in tone and respectful of the cemetery, the
relatives of the occupants, and all other stakeholders. The virtual augmentations consisted of audio played when the participants were at the correct spots. Professional
actors provided the narration of the character voices. Due to limitations in tracking,
a Wizard of Oz approach was taken, where another person selected the content to be
played based upon the participant's context and actions on the control device.
At Columbia University, Steven Feiner's group developed a technique called Situated Documentaries, in which MR techniques were used to enable a broad range of virtual content to be seen in the context of the real locations where events actually happened. In Columbia's example, they built a location-based experience where a participant would walk around the Columbia campus to see audio, videos, images, web pages, 3D augmentations, and other media describing past events (Höllerer et al.,
1999). Their experiences told stories about the history of the Columbia University
campus, describing a student revolt, how students used tunnels to occupy buildings
that were guarded by police, and how one campus building was the former location
of the Bloomingdale Asylum.
Commercial versions of the Situated Documentaries technique now exist. The
Streetmuseum Londinium app from the Museum of London provides a historical
view of London in Roman times (Museum of London, Londinium App). As the participant walks to designated sites in the city, he or she can experience audio, video,
and imagery based on archaeological finds from that time period (Meyers, 2011).
The Streetmuseum application enables images or videos to appear to be augmented
over real backgrounds in the participants surrounding environment. A similar effort
was started by PhillyHistory.org, a repository of historical images of Philadelphia.
In April 2011, they released a mobile application that uses Layar as the platform
for augmenting a user's view of the city with historical photos aligned with the real
background (PhillyHistory.org, 2011). Hayes (2011) provides further information
about these and other commercial projects.
Dow Day is an AR experience in which participants role play journalists in the
year 1967, investigating student protests at the University of Wisconsin–Madison against the Dow Chemical company for manufacturing napalm used in the Vietnam War (Squire et al., 2007). While set up as a game that requires participants to learn
how journalists perform their job, it also serves to engage them in learning about this
specific historical era and events through personal stories and testimonies, where
these virtual materials were tied to the real locations where the events happened.
Dow Day uses the augmented reality and interactive storytelling (ARIS) platform.
As a final example, I will discuss The Westwood Experience, a location-based
MR experience that I worked on (Azuma, The Westwood Experience by Nokia

Research Center Hollywood, Wither et al., 2010). We conducted this experience in
December 2009 and January 2010 in a part of Westwood, CA, that is south of the
UCLA campus. It was an experiment testing a variety of MR effects to enhance
a location-based, linear story that the participants experienced as they walked the
streets of Westwood. In brief, the participants assembled in a theater where they
met an actor portraying the honorary Mayor of Westwood. They were given mobile
phones and earphones and left the theater on their own, guided by clues to specific
points in the town. First, they experienced effects that visualized the town in the
year 1949, attempting to turn the clock back to that time period. Then they heard
a story of a striking young woman that the protagonist met, loved, and lost. They
experienced this story at the locations where these events were supposed to have happened: at the café where they met, the building where they spent the night together,
a jewelry store where he bought a ring, and the last spot where he saw her as she
disappeared in a taxi.
The payoff, and the reason why this experience was located in Westwood, comes
at the end, when our protagonist informs us that now he wants us to meet the woman
he just told us about in his story. The power comes when the participants realize how
this is going to happen. The woman he talked about was a real person, and the way
they will meet her is not by seeing a video or image of her on their mobile phones,
nor even a 3D virtual model augmented in space. They will meet her, the real person,
by visiting Westwood Village Memorial Park Cemetery. The narration at this point
sets the expectation of how the power of this place will affect their thoughts and
behaviors:
She became what she said she would, a movie star... In the end, Norma came back to Westwood too. She's between engagements now, resting as actors sometimes say. She's not alone, but among many others, some who lived their lives as publicly as she did, many of their names once as familiar as hers. I come to see her every so often, as many others do. I'm taking you to her now. I know you'll be mindful of the customs proper to the place we find her, a place of real people and real endings.

In contrast to Voices of Oakland, which the participants knew would be in a cemetery, this experience surprised most of the participants by ending in a cemetery,
particularly because this one is small and hidden away behind numerous high-rise
buildings. They were guided to a specific crypt, where they discover that the woman
in the story was Norma Jeane Baker, better known to the world by her other name:
Marilyn Monroe. We used a Situated Documentaries technique at this spot, showing
newsreel footage of her funeral. In certain sequences from the newsreel, the participants can see a clear, meaningful and one-to-one correlation between the crypts they
see in the surrounding environment and the footage that plays on the mobile device.
Combined with the somber music that was composed specifically for this spot, the
effect was a powerful and poignant coda to this experience. The emotions that participants reported feeling at this spot were different than if they simply read a tourist
guidebook and then visited the cemetery. By experiencing a story about her prior to
visiting her crypt, they were left to contemplate her life before becoming a movie
star, and to wonder if the story they just experienced might have been real.

Many AR storytelling experiences that rely on the reinforcing strategy use the
technique of connecting the story to the past. Being able to increase the range of
stories that can be told through new AR and MR effects will require advancements
in our ability to track and augment historic outdoor environments. The Archeoguide
project (Vlahakis et al., 2002) was an early effort to develop a platform for augmenting archaeological sites.
Reinforcing as a strategy has strengths and weaknesses. On the positive side,
the experience does not rely solely upon the virtual content by itself. Since reality
itself is compelling on its own, the real world does some of the work of providing
the meaningful experience. It may be easier to design and build virtual content that
complements reality rather than virtual content that must shoulder the entire burden
of being compelling by itself. In 110 Stories and The Westwood Experience, I believe
there are examples demonstrating that this strategy can succeed. On the negative
side, the experience is tied to a specific location. A person wishing to participate in
110 Stories must travel to Manhattan. A specific experience does not scale; it cannot
be experienced at any arbitrary location, but rather only the one it was designed for.
However, many different experiences might be built for different locations around
the world. Furthermore, the story must be tied to the characteristics of the real location. One cannot tell any arbitrary story and expect reinforcing to work; instead, the
story must complement the reality that exists at the chosen site. In The Westwood
Experience, we walked around Westwood with the writer, and he wrote the story
to incorporate real elements in the town, such as a jewelry store. And we were very
aware that our experience ended in a real cemetery. A story that was disrespectful
of that reality or that provided experiences inappropriate to that location would at
best fail to harness the power of that real place, and at worst would be offensive.
Reinforcing requires the story to appropriately complement reality.

11.3 RESKINNING
In reskinning, the strategy is to remake reality to suit the purposes of the story you
wish to tell. Reality is either something that the creators specifically set up and then
augment, or the experience is designed to recharacterize whatever real surroundings
exist. Unlike reinforcing, there may not be anything particularly special or evocative
about the real location, which means experiences based on reskinning can potentially scale to operate in most arbitrary real locations, or reward the participant for
finding locations that work well for the experience. However, most of the power
from the experience must now come from the virtual content and how it adapts and
exploits the real world to fit that virtual content.
Rainbows End is a Hugo Award-winning science fiction book written by Vinge
(2006) that provides one ultimate concept of reskinning. In this book, nearly perfect
AR is ubiquitously available to people who can operate the latest wearable computing systems, which use displays embedded in contact lenses and tracking provided
by a vast infrastructure of smart motes that permeate almost all inhabited locations.
Within this world, there are Belief Circles, which are persistent virtual worlds that
are linked to and overlaid upon real locations. Each Belief Circle has a particular theme, such as a fantasy world set in medieval times. When a user chooses to

subscribe to a Belief Circle, he or she sees the surrounding world changed to fit the
theme. For example, in a medieval Belief Circle, nearby real buildings might appear
to be castles and huts, and people on bicycles might instead appear to be knights
on horseback. A Belief Circle has a large group of people who subscribe to it and
create custom content, and when others view that content, the creators can receive
micropayments. We can view a Belief Circle as a persistent, co-located virtual world
that links directly, one to one, to our real world and uses the principle of reskinning
to change reality to fit the needs of the virtual content and experience.
Unlike the world of Rainbows End, we do not currently have ubiquitous tracking and sensing, so there are few examples of reskinning, and those often rely upon
real environments that were specially created to support the needs of the story. Two
examples are AR Façade and Half Real.
Façade is an interactive story experience where the participant plays the role of a dinner guest visiting a couple whose marriage is just about to break apart (Mateas and Stern, 2007). Façade supports free text entry so that the participant can type in
anything to converse with the two virtual characters while walking around freely
in the virtual environment. The experience is not a linear story. Depending on what
the participant does or says, various story beats are triggered and experienced. For
example, walking up to or commenting on a particular object or picture will trigger
certain narrative sequences. Façade by itself is a virtual environment that runs on a PC and monitor. Researchers built an AR version, called AR Façade, in which they
built a real set that replicated the apartment that is the setting of this experience
(Figure 11.2), and participants wore a wearable AR system and walked around the
real set to see virtual representations of the couple (Dow et al., 2008). The goal was
to provide the participants a greater sense of actually occupying a real apartment
and interacting more naturally with the apartment and its virtual inhabitants. For
example, instead of typing in what they would say, participants now simply said what
they wanted, directly. Rather than relying upon voice recognition, human operators
working behind the scenes then typed in what the participants said into the system.
FIGURE 11.2 The real set of the apartment in AR Façade.

The evaluation did not directly attempt to measure whether AR Façade was more engaging or compelling than Façade by itself. However, there was evidence that AR Façade did affect some participants emotionally. Some chose to quit early rather than participate in an experience that was an uncomfortable social situation that they
were expected to take an active role in. Others became highly engaged, showing visible signs of surprise and emotional connection, such as running to follow one of the
virtual characters when she leaves.
Half Real is a theatrical murder mystery play that uses spatial AR to merge real
actors and a physical set with virtual content and to engage the audience with interactive situations where the audience members vote on how an investigation proceeds
(Marner et al., 2012). Actors were actively tracked so that virtual labels could be
attached to them. Each audience member had a ZigZag handheld controller to vote
when prompted. The real set is painted white so that projective textures can change
the appearance of the set during the performance. The creators had to work out
numerous system issues to provide the reliability, robustness, and transportability
required of a professional stage production. Half Real completed a tour in South
Australia and subsequently played for a 3-week, sold out run in Melbourne. Future
possibilities include using the augmentations to change the appearances of the actors
themselves, rather than just the set and the backgrounds.
Applying the reskinning technique outside of controlled real environments, such as
those of AR Façade and Half Real, requires AR systems that can detect and understand
the real world. Kinect Fusion exploits the depth-sensing capabilities of the Kinect to
build a volumetric representation of a real environment, enabling a system to track off
of that space and augment it more realistically, with correct occlusions (Newcombe
et al., 2011). Such a system represents a step along the directions needed for AR and
MR systems to more commonly enable the reskinning technique. For example, the
Scavenger Bot demonstration that Intel showed at the Consumer Electronics Show (CES)
2014 showed a system that could scan a previously unknown tabletop environment and
then change its appearance by applying a different skin upon the environment (Figure
11.3). This system can also handle dynamic changes in the environment.

FIGURE 11.3 The Intel Scavenger Bot demonstration at CES 2014, reskinning a real environment with a virtual grid pattern.

While we are now seeing systems that can detect and model the real world, these models generally
lack semantic understanding. True reskinning will require systems that can detect and
recognize the semantic characteristics of the environment and objects.
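One concrete reason such geometric models matter for reskinning is occlusion: a virtual "skin" should be hidden wherever real geometry is closer to the camera. The sketch below is an assumption-laden illustration (it is not code from Kinect Fusion or any system discussed here): it composites a virtual layer over a camera image using a per-pixel depth test against a reconstructed depth map of the real scene.

```python
import numpy as np

def composite_with_occlusion(real_rgb, real_depth, virtual_rgb, virtual_depth):
    """Per-pixel depth test: a virtual pixel is shown only where it is closer to
    the camera than the reconstructed real surface. Inputs are HxWx3 colour
    arrays and HxW depth maps in metres; pixels with no virtual content should
    carry a very large virtual depth so they never win the test."""
    closer = virtual_depth < real_depth          # where the virtual content wins
    out = real_rgb.copy()
    out[closer] = virtual_rgb[closer]            # overwrite only the visible virtual pixels
    return out

# Tiny illustrative scene (made-up values): the virtual object is in front of
# the real surface only in the left column of a 2x2 image.
real_rgb = np.zeros((2, 2, 3), dtype=np.uint8)
real_depth = np.array([[1.0, 1.0], [1.0, 1.0]])
virtual_rgb = np.full((2, 2, 3), 255, dtype=np.uint8)
virtual_depth = np.array([[0.5, 2.0], [0.5, 2.0]])
composited = composite_with_occlusion(real_rgb, real_depth, virtual_rgb, virtual_depth)
```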
The University of Central Florida provided an example of reskinning the interior
of a museum to better engage visitors with the exhibits. In the MR Sea Creatures
experience in the Orlando Science Center, visitors saw the museum interior transformed to be underneath the sea, and skeletons of ancient sea creatures on display
then came to life (Hughes et al., 2005). Visitors navigate a virtual rover vehicle to
collect specimens around the museum. At the end, they see an animation of one
dinosaur grabbing a pterodactyl out of the air and holding it in its mouth, which then
transforms back to the real world where the visitors see the real fossil of that dinosaur with the pterodactyl in its mouth.
The Aphasia House project is an exciting new application of MR storytelling
which enables patients suffering from traumatic brain injury to tell their own personal stories to therapists, not for the purpose of entertainment, but as a critical part
of guiding a doctor in determining how to treat a patient (Stapleton, 2014). People suffering from aphasia are impaired in their ability to communicate due to severe brain
injuries. They may lose the ability to speak, read, or write. Preliminary results from
this project indicate that immersive storytelling in an MR environment may enable
some patients to reconnect with their abilities to tell stories, and a doctor involved in
this project testifies that this breakthrough would not have been possible in a purely
virtual environment or without the augmentations provided in the MR environment.
What appeared to be critical was building a real environment (a kitchen) that could
be augmented in a variety of ways to elucidate familiar previous experiences from
the patient: making coffee, eating a bagel, touching countertops, and doing that in
a multimodal way so that he felt, heard, and smelled familiar sensations. This is an
example of applying reskinning to evoke stories out of a patient in the pursuit of a
serious goal: helping patients recover their own abilities to communicate.
Reskinning relies most on the power of the experience coming from the virtual
content, rather than the real environment, so a key strategy may be to exploit virtual content that participants are already familiar with. When this content is created
by professional storytellers, and audiences have already read the books, seen the movies, or otherwise experienced the virtual content, then a new experience that leverages that same content is not starting from scratch. It has an advantage in
that the audience already finds the virtual content compelling. One example of this
is the Wizarding World of Harry Potter at the Universal Studios Orlando theme park
in Florida. While this is not explicitly an example of AR or MR storytelling, it is
an example of this leveraging strategy. Since most visitors are already familiar with
the Harry Potter books or films, when they walk through that area of the theme park
and experience the attractions and shops there, they draw from their memories and
previous knowledge of this fantasy world. Such leveraging is the basis of many cross-media or trans-media approaches, and it can be quite successful. The Wizarding
World of Harry Potter was sufficiently popular that Universal expanded it in the
summer of 2014.
Alice's Adventures in New Media was an early AR narrative experiment that leveraged the world of Alice in Wonderland, written by Lewis Carroll (Moreno et al., 2001).

FIGURE 11.4 A Leviathan demonstration in the Intel booth at CES 2014.

In this system, a participant sat at a table and saw three other characters from the
book. The participant could interact with the characters by performing various
actions such as serving and sipping tea, which affected the narrative snippets.
At CES 2014, Intel ran a series of AR demonstrations based upon the steampunk
fantasy world of Leviathan, written by Westerfeld (2009). These AR demonstrations
were intended to inspire visitors about the potential for AR storytelling that used
this leveraging strategy (Azuma, Leviathan at CES 2014). I was part of a large team
of people who created and ran these demonstrations. The world of Leviathan is set
in an alternate Earth, during World War I, where mankind discovered genetic engineering very early. Therefore, in some countries a biological revolution supplanted
the industrial revolution, and people chose to fabricate new types of living things to
suit their purposes. For example, the Leviathan itself is an enormous flying airship in
the form of a whale, replacing dirigibles. In our demonstrations, we brought virtual
representations of the Leviathan and other creatures inspired by the book into the
real environment, both during the Intel CEO's keynote presentation and in the Intel
booth on the CES show floor (Figure 11.4). While these demonstrations did not tell
stories by themselves, they served as an inspiration of how this leveraging strategy
could result in compelling new storytelling media when applied through the reskinning strategy of AR and MR storytelling.

11.4 REMEMBERING
In remembering, the AR storytelling strategy is to draw upon memories and retell
those stories, generally at the particular place where those memories and stories
happened. The belief is that combining the memories and stories with the actual real
location can result in a new experience that is more powerful than the real location
by itself, or the virtual content by itself. For example, I could revisit the site of my

wedding ceremony and see the gazebo where that occurred. While I have photos
and videos of that event, communicating my personal story of that day and what that
meant to me might be done in a more powerful manner as an AR or MR experience,
merging that virtual content with the actual location where my wedding occurred.
The strategy of remembering is similar to reinforcing, but there are some differences. The locations used in the reinforcing approach have particular meanings and
power that most people agree upon and know. For example, the site of the Battle of
Gettysburg draws its power from a specific event. While interpretations might vary,
the meaning is shared and agreed to by almost all participants, and that constrains
the experiences based on reinforcing to conform to that meaning. Remembering, in
contrast, is generally more personal and individual. With this approach, the potential stories and memories can vary greatly, even at the same location. For example,
Sproul Plaza on the campus of the University of California, Berkeley, could be home
to a wide variety of memories and stories. One person might remember participating
in the Free Speech Movement at that spot, while another knows it as the place where
he first met his future spouse, and yet another has memories of Pet Hugs sessions
where students could hug therapy dogs to reduce their stress.
Even when divorced from a particular location, memories and viewpoints by
themselves can make compelling experiences. Three Angry Men (MacIntyre et al.,
2003) and its successor, Four Angry Men, are experimental AR narrative demonstrations that enable participants to access and experience the memories and thoughts
of characters in a narrative, from their particular points of view. Inspired by the
drama Twelve Angry Men, written by Reginald Rose, these experiences place the
participant in the viewpoint of a jury member deliberating on a case. When seated
at a particular chair at a table, the participant sees the other jurors from the perspective of the juror who is sitting in the chair he or she is occupying (Figure 11.5). The participant not only hears what the other jurors say and what his or her persona is saying, but he or she also hears the inner thoughts of the character sitting in that
chair. The participant is free at any time to switch seats. When he or she does so, the
deliberation continues but the participant now hears and sees things from a different
juror's perspective and hears that juror's inner thoughts. For example, one juror with
liberal leanings sees another African-American juror as a potential ally but the third
juror as prejudiced. When the participant moves to the seat of the prejudiced juror,
the entire experience changes. While the initial juror heard the prejudiced juror as loud and unreasonable, the prejudiced juror hears himself as reasonable, if a bit frustrated. Even the appearances of the jurors change depending on the viewpoint. To the prejudiced juror, the African-American juror's appearance and behaviors transform to conform to his biases. Three Angry Men provides an example of how AR storytelling could communicate, at a first-person level, how stories and memories change based on personal perspectives and biases, which has been called the Rashomon effect after Akira Kurosawa's film.

FIGURE 11.5 Three Angry Men, seeing two other jurors from one point of view.
REXplorer was a system that encouraged participants to explore and learn about
the historic town of Regensburg, Germany, and its well-preserved medieval city center through MR storytelling techniques (Ballagas et al., 2008). Although REXplorer
is primarily a game that asks participants to go on quests to specific locations within
the city center, it motivates these quests through virtual characters, ghosts who used
to inhabit the town, who have requests to make of the participants and stories to tell
them. The participants learn about these characters and their stories and perform
tasks such as carrying a love letter to another character inhabiting a different location in town. By performing these tasks, participants indirectly explore the historical
city center, with the goal of learning history in a more entertaining and enjoyable
manner. Some participants found that using stories in this manner injected life into
a historical tour that otherwise might have been dry and boring.
Rider Spoke is a location-based experience in which bike riders were encouraged
to record personal stories and memories associated with particular locations, at the
spot where those occurred (Benford and Giannachi, 2011). The virtual content consisted of the audio recordings. Riders could add recordings only in spots that did not
already have content associated with that location, ensuring that each location had
unique content. The system provoked the participants to leave significant and evocative memories. For example, one instruction asked a participant to find a spot that his
or her father would like and to talk about that. In this experience, participants were
not just passive consumers of content but active generators, contributing their own
personal stories, as if mapping diary entries to specific spots in the city. Rider Spoke
was conducted in 10 cities across the world.
You Get Me was a 2008 experience in which participants selected one of eight
young people to hear his or her stories and, perhaps, make a connection to that person (Blast Theory, 2008). The eight young people had communication and tracking
equipment and walked around a park. Each person had a key question that he or
she wanted help in answering. Participants went to computer terminals at the Royal
Opera House, about 5 miles away from the park, to select one of the young people
and then explored the park virtually. As the participant moved in the virtual representation of the park, they heard stories relating to that person: the personal geography of how that park mapped to the chosen young person. For example, one person
arranged stories around a swimming pool in which she nearly drowned. The stories
give clues for answering that person's question. The participant can then track down
the person in the park and attempt to answer the question. If the person thinks the
answer is insufficient, he or she can reject it and force the participant to explore more
of the personal geography and stories. But if the person finds the answer intriguing,
he or she can invite the participant for a private chat or phone call. At the conclusion, the person takes a photo of himself or herself with the park in the background and sends it to the participant. You Get Me is a compelling experience, connecting
the participant in an intimate manner, via MR storytelling techniques, to one of
eight real people with real concerns and real stories, overlaid on a real park that each
person uniquely maps as his or her own personal geography.
Since we do not have ubiquitous tracking systems with the desired accuracies
needed to build indoor/outdoor AR and MR experiences, our ability to implement
storytelling experiences based on the remembering approach is constrained, but
the potential is there to enable individuals to create and make available their own
personal stories for others to see in the context where they occurred. Not all stories
have to be written by professional storytellers, aimed at a mass-market audience.
Some stories might be of interest only to your family or a close circle of friends,
or only to specialized audiences. But that does not make them any less important.
Sometimes, the stories that are the most important to us or to others are the ones
that represent ourselves, who we are, what our goals and aspirations are, and where
we came from.

11.5 CONCLUSION
AR storytelling is still in an early, exploratory phase. While there have been many
initial experiments, as this chapter has discussed, I feel that as a form of media,
it is still very early in its development. It reminds me of an early phase of the
development of motion pictures, where some of the first movies featured footage
of moving trains, showing what the technology could do. Advancing the technology of moving pictures from those early days into the art form of cinema that we
know today required progress on many fronts, not just in technology, but also in
art, design, and business models. In AR and MR storytelling, we do not yet have
the equivalents of the early pioneers in cinema, such as Buster Keaton, Sergei
Eisenstein, and D.W. Griffith. These future pioneers will need to overcome some
of the core challenges of this new form of media while simultaneously unlocking
its potential.
One of the most important challenges in AR and MR storytelling is motivating
people to make the necessary effort to participate in these location-based media.
These experiences generally require participants to leave their homes and travel to
particular locations or venues, which requires effort and costs resources, in terms
of time, money, etc. In comparison, one can watch a film or see a TV show almost
anywhere. It takes little effort to turn on the TV and watch a show on the DVR, see
a movie, or play a video game in ones house. Why would someone get off the couch
and instead participate in these new location-based media?
The answer is that AR and MR storytelling experiences must become compelling enough to convince participants that this effort is worthwhile. Despite our
TV sets, game consoles, and comfortable couches, people still leave home to go
to a movie theater, see a sporting event in a stadium, go to theme parks, visit a
museum, travel to distant sites on vacation, etc. Those experiences are attractive enough that people willingly spend the extra effort to participate in them.
Initially, AR and MR storytelling might leverage such situations, augmenting
those experiences that are already proven to draw people out of their homes.
As the medium develops, I look forward to such experiences being sufficient by
themselves to attract participants.
What would be a payoff that would make people eager to participate in
location-based experiences? The Walt Disney Company provided an example in a
2013 Alternate Reality Game called The Optimist (Andersen, 2013). This provided a
series of experiences that Disney fans could participate in, culminating in an elaborate puzzle hunt that took place at the 2013 D23 Expo and in Disneyland. The people
who knew about these events and who chose to attend at the specific locations and
dates were rewarded with access to locations that the general public normally cannot
enter. These locations included the Club 33 private club, Walt Disney's apartment
above the fire station on Main Street, and the Lilly Belle caboose car on one of
the railroad trains. For Disney fans, visiting these locations provided highly desirable and special experiences, ones they would remember and forever cherish. While
compelling, this is a specific approach that requires special locations and does not
generalize or scale to most situations.
A more general approach toward achieving compelling experiences will be to
realize the potential inherent in the medium to see the world around you through
the eyes, viewpoint, and mindset of another person. To me, an ultimate expression
of the potential of AR and MR storytelling is if it can cause you to view the world
in a different way, and if this impact is powerful enough that it actually changes
your own belief system, how you view the world and make decisions. I can give an
example of the desired impact through something that happened to me in real life:
A friend of mine, who worked on several projects with me, had a stroke.
He now requires a powered wheelchair to travel anywhere.
I now view the world differently than I did prior to this incident, because I have
traveled with him to many events. Before, I would not think twice about curbs or
stairs or other things that are insurmountable obstacles to my friend. Now, I am
sensitive to the locations of ramps, elevators, and other items that provide wheelchair access.
AR and MR storytelling experiences have the potential to change how we view
the world, to make us see the world from a different perspective, such as that of my
friend, and in turn to change our belief systems and values. This different perspective can be cultural, political, historical, social, or along any other dimension. But if an
experience can change me, in a way similar to what I just described, that is proof that
the experience is compelling.
We know that traditional media, such as film, plays, and books, have this power
and there are examples in each where people have found those stories memorable,
compelling, and life-altering. When we have equivalent examples in AR and MR storytelling that exploit the new potentials in this form of media, then we will know
that it has matured sufficiently to stand equally with established media. I look forward to this day.
REFERENCES
Andersen, M. The Optimist draws fans into fictionalized Disney history. Wired, July 23,
2013. http://www.wired.com/2013/07/disney-the-optimist-arg/ (accessed May 5, 2014).
August, B. 110 Stories: What's your story? http://110stories.com (accessed May 5, 2014).
Azuma, R. A survey of augmented reality. Presence: Teleoperators and Virtual Environments, 6(4), 1997, 355–385.
Azuma, R. Leviathan at CES 2014. http://ronaldazuma.com/Leviathan_at_CES2014.html
(accessed May 2, 2014).
Azuma, R. The Westwood Experience by Nokia Research Center Hollywood. http://ronaldazuma.com/westwood.html (accessed May 2, 2014).
Ballagas, R., A. Kuntze, and S. Walz. Gaming tourism: Lessons from evaluating REXplorer, a pervasive game for tourists. In Pervasive Computing 2008, Sydney, New South Wales, Australia, May 19–22, 2008, pp. 244–261.
Benford, S. and G. Giannachi. Performing Mixed Reality. Cambridge, MA: MIT Press, 2011.
Blast Theory. You Get Me. http://www.blasttheory.co.uk/projects/you-get-me/ (accessed May
12, 2014).
Brittenham, S. and B. Haberlin. Anomaly. Anomaly Publishing, 2012.
Dow, S., J. Lee, C. Oezbek, B. MacIntyre, J. D. Bolter, and M. Gandy. Exploring spatial narratives and mixed reality experiences in Oakland Cemetery. In Proceedings of the 2005 ACM SIGCHI International Conference on Advances in Computer Entertainment Technology, Valencia, Spain, June 15–17, 2005, pp. 51–60.
Dow, S., B. MacIntyre, and M. Mateas. Styles of play in immersive and interactive story: Case studies from a gallery installation of AR Façade. In Proceedings of the 2008 International Conference on Advances in Computer Entertainment Technology, Yokohama, Japan, December 3–5, 2008, pp. 373–380.
Harrigan, P. and N. Wardrip-Fruin, eds. Second Person: Role-Playing and Story in Games and
Playable Media. Cambridge, MA: MIT Press, 2007.
Hayes, G. Transmedia futures: Situated documentary via augmented reality, 2011. http://
www.personalizemedia.com/transmedia-futures-situated-documentary-via-augmented-
reality/ (accessed May 5, 2014).
Höllerer, T., S. Feiner, and J. Pavlik. Situated documentaries: Embedding multimedia presentations in the real world. In Proceedings of the 3rd IEEE International Symposium on Wearable Computers 1999, San Francisco, CA, October 18–19, 1999, pp. 79–86.
Hughes, C., C. Stapleton, D. Hughes, and E. Smith. Mixed reality in education, entertainment and training. IEEE Computer Graphics and Applications, 25(6), 2005, 24–30.
MacIntyre, B., J. D. Bolter, J. Vaughn et al. Three angry men: An augmented-reality experiment in point-of-view drama. In Proceedings of the First International Conference on Technologies for Interactive Digital Storytelling and Entertainment, Darmstadt, Germany, March 24–26, 2003, pp. 230–236.
Mateas, M. and A. Stern. Writing Façade: A case study in procedural authorship. In Second Person: Role-Playing and Story in Games and Playable Media, P. Harrigan and N. Wardrip-Fruin (eds.). Cambridge, MA: MIT Press, 2007, pp. 183–207.
Marner, M., S. Haren, M. Gardiner, and B. Thomas. Exploring interactivity and augmented reality in theater: A case study of Half Real. In IEEE International Symposium on Mixed and Augmented Reality 2012, Arts, Media and Humanities Proceedings, Atlanta, GA, November 5–8, 2012, pp. 81–86.
Meyers, K. Revealing Londinium Under London: New AR App. Cultural Heritage Informatics Initiative, http://chi.anthropology.msu.edu/2011/07/revealing-londinium-under-london-new-ar-app/ (accessed May 5, 2014).
Milgram, P. and F. Kishino. A taxonomy of mixed reality visual displays. IEICE Transactions on Information Systems, E77-D(12), 1994, 1321–1329.
Montola, M., J. Stenros, and A. Waern. Pervasive Games: Theory and Design. Burlington,
MA: Morgan Kaufmann Publishers, 2009.
Moreno, E., B. MacIntyre, and J. D. Bolter. Alice's adventures in new media: An exploration of interactive narratives in augmented reality. In CAST01, Bonn, Germany, September 21–22, 2001, pp. 149–152.
Museum of London. Londinium App. http://www.museumoflondon.org.uk/Resources/app/
Streetmuseum-Londinium/home.html (accessed May 5, 2014).
Newcombe, R., S. Izadi, O. Hilliges et al. KinectFusion: Real-time dense surface mapping and tracking. In Proceedings of IEEE International Symposium on Mixed and Augmented Reality (ISMAR) 2011, Basel, Switzerland, October 26–29, 2011, pp. 127–136.
PhillyHistory.org. Implementing Mobile Augmented Reality Technology for Viewing Historic Images. An Azavea and City of Philadelphia Department of Records White Paper, 2011. http://www.azavea.com/research/company-research/augmented-reality/ (accessed May 5, 2014).
Squire, K., M. F. Jan, J. Matthews et al. Wherever you go, there you are: Place-based augmented reality games for learning. In The Educational Design and Use of Simulation Computer Games. Rotterdam, The Netherlands: Sense Publishing, 2007, pp. 265–296.
Stapleton, C. Developing stories that heal: A collaboration between Simiosys and the Aphasia House. http://simiosys.com/blog/?p=459 (accessed June 16, 2014).
Vlahakis, V., N. Ioannidis, J. Karigiannis et al. Archeoguide: An augmented reality guide for archaeological sites. IEEE Computer Graphics and Applications, 22(5), 2002, 52–60.
Vinge, V. Rainbows End. New York: Tor Doherty Associates, 2006.
Westerfeld, S. Leviathan. New York: Simon Pulse, 2009.
Wither, J., R. Allen, V. Samanta et al. The Westwood experience: Connecting story to locations via mixed reality. In IEEE International Symposium on Mixed and Augmented Reality 2010, Arts, Media and Humanities Proceedings, Seoul, Korea, October 13–16, 2010, pp. 39–46.

12 Dimensions of Spatial Sound and Interface Styles of Audio Augmented Reality
Whereware, Wearware, and Everyware
Michael Cohen

CONTENTS
12.1 Introduction and Overview............................................................................ 278
12.1.1 Auditory Dimensions......................................................................... 278
12.1.2 Source to Sink Chain......................................................................... 278
12.1.3 Spatial Sound.....................................................................................280
12.1.3.1 Directionalization and Localization................................... 283
12.1.3.2 Spatial Reverberation.......................................................... 285
12.1.4 Spatial Audio Augmented Reality..................................................... 287
12.2 Whereware: Spatial Dimensions.................................................................... 289
12.2.1 Position = Location and Orientation; Changing Pose = Translation and Rotation........................................ 289
12.2.2 Whereware for Augmented Reality................................................... 289
12.2.3 Distance Effects................................................................................. 291
12.2.4 Stereotelephony................................................................................. 292
12.3 Wearware and Everyware: Source and Sink Dimensions............................. 294
12.3.1 Capabilities........................................................................................ 295
12.3.1.1 Mobile and Wearable Auditory Interfaces.......................... 295
12.3.1.2 Form Factors....................................................................... 295
12.3.1.3 Dynamic Responsiveness....................................................300
12.3.1.4 Head Tracking.....................................................................300
12.3.1.5 Broadband Wireless Network Connectivity: 4G,
MIMO, ABC, and SDR...................................................... 301

12.3.2 Special Displays................................................................................. 301


12.3.2.1 Binaural Hearing Aids........................................................ 301
12.3.2.2 Parametric Ultrasonics.......................................................302
12.3.2.3 Distributed Spatial Sound...................................................302
12.3.2.4 Information Furniture......................................................... 303
12.4 Concluding Remarks.....................................................................................304
References...............................................................................................................304

12.1 INTRODUCTION AND OVERVIEW


Time is the core of multimedia. Modern applications are synchronous: dynamic (interactive, runtime), realtime (updates reflected immediately), and online (networked).
Sound and audio, including spatial sound and augmented audio, especially leverage such distributed capabilities: not just modulation of the location of virtual sound sources, but parameterization of the entire source-sink cascade, including synthesis,
directionalization, spatialization, and reception. This chapter reviews spatial sound
in the context of interactive multimedia and virtual reality (VR) and augmented
reality (AR). The theory and practice of spatial sound are surveyed, including psychoacoustic bases of spatial hearing and techniques for creating and displaying augmented audio. Whereware is explained as a class of location- and position-sensitive
interfaces. Wearware and everyware refer respectively to portability of personal
sound terminals, such as contemporary smartphones and tablets, and pervasiveness
of public interfaces, such as speaker arrays.

12.1.1 Auditory Dimensions


VR and AR (Barfield and Furness III 1995, Durlach and Mavor 1995, Shilling and
Shinn-Cunningham 2002, Stanney 2002), usually with coordinated multimodal
input and output, reify information by giving it a realistic manifestation. Auditory
displays deserve full citizenship in the user interface. This section surveys spatial
and nonspatial dimensions of sound. Rendering of virtual sound sources with respect
to sinks in a modeled space is parameterized by such characteristics as position,
intensity, radiation, distance fall-off and filtering, reflections and reverberation, and
diffraction around obstacles.

12.1.2 Source to Sink Chain


To begin, stages in an audio AR (AAR) system are reviewed, including careful
use of the often muddled jargon. Figure 12.1 illustrates an entire chain, a complete
rendering of which models sources, space, and sinks. (The term sink, denoting the
dual of a source, is used instead of listener to distinguish it from an actual human,
including allowing designation of multiple sinks for a single user, as explained in the
following chapter.) Generally, signals are received, captured, or synthesized, then
recorded or buffered, processed, and finally transmitted or rendered. A common
technique in combining real and virtual audio is to model soundscapes as compositions of sound sources and cascaded filters that process them. The effects of the various elements (the sensor, i.e., a microphone when a sound source is not otherwise provided; the room or space; the torso, head, and pinnae of the modeled listener; and the loudspeakers) should all be considered to create a veridical auditory illusion.

FIGURE 12.1 Overview of virtual acoustics and augmented audio: location, orientation, and position tracking (ultrasonic or acoustic, magnetic, optical or infrared, GPS/WAAS, and gyroscopic or accelerometric sensors); sources (environmental sounds, auditory icons and earcons, voice, and music, whether sampled, synthesized, or streamed, parameterized by location, directivity, directional tone color, mute/muzzle (solo), and motion); the space or medium (spreading loss and distance attenuation, atmospheric effects, refraction, transmission loss, propagation delay, obstruction and occlusion, diffraction, reflection and scattering, and reverberation); and sinks (reception and directional synthesis: auralization by panning or HRTF processing, deafen/muffle (attend), and Doppler effects, displayed by earphones, headphones and headsets, bone conduction, nearphones, loudspeakers, or speaker arrays such as 5.1, 7.1, 10.2, 22.2, WFS, and HOA).
A speaker's voice, a musical instrument tone, or another acoustic event (the source) causes small fluctuations of pressure above (compression, a.k.a. condensation) and below (rarefaction) atmospheric pressure. These variations can be sensed by a microphone, transducing acoustic energy into electrical. Such measurement, expressed as a voltage signal, is discretely sampled (in time) and quantized (in amplitude) by an audio interface ADC (analog-to-digital converter), encoding it uniformly (as in LPCM, linear pulse code modulation) or nonlinearly (as in μ- or A-law representations).
Audio sources might alternatively be provided as recorded material, or synthesized or streamed in real time. Recordings, stored as computer files, might need to
be decompressed, for example, from MP3 or AAC (.m4a). Decoded audio data has
a flat PCM encoding, such as that encapsulated by WAV or AIFF files, in which
audio signals are represented as sequences of amplitudes at a constant sampling
rate. Synthesis techniques include those listed below the Sources block at the left
of Figure 12.1. Networked streams are typically remote teleconferees' voices or
internet- and cloud-served music or other material.
These audio signals are filtered by a computer's DSP (digital signal processing). Digital amplification or attenuation is accomplished by adjusting linear gain, a coefficient which modulates (multiplies) a raw signal sequence, scaling the envelope of a notional pulse train for source exposure or sink sensitivity. Balancing or panning a stereo signal involves coupled gain adjustments to a left-right signal pair. Spectrum-based adjustments, such as equalization and aural enhancement, are also possible, typically by specifying frequency-band-specific gain. Such sweetening might include spectral extension, with subharmonic or overtone synthesis, as well as time-based enrichment such as echoes and reverberation.
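As a concrete, deliberately simplified illustration of these gain and panning operations, here is a minimal NumPy sketch; the sine/cosine cross-fade is one common way to keep total power roughly constant (the nonlinear cross-taper idea mentioned later in the pan-pot discussion), and all names and values are illustrative rather than from any particular library.

import numpy as np

def apply_gain(signal, gain):
    # Scale a raw PCM sample sequence by a linear gain coefficient.
    return gain * signal

def constant_power_pan(signal, pan):
    # Pan a mono signal into a left-right pair.
    # pan ranges from -1.0 (hard left) to +1.0 (hard right); the sine/cosine
    # cross-taper keeps total power approximately constant across the sweep.
    theta = (pan + 1.0) * np.pi / 4.0          # map [-1, 1] -> [0, pi/2]
    left = np.cos(theta) * signal
    right = np.sin(theta) * signal
    return np.stack([left, right], axis=0)

# Example: a 1 kHz tone, attenuated by 6 dB and panned halfway to the right.
fs = 44100
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 1000 * t)
stereo = constant_power_pan(apply_gain(tone, 10 ** (-6 / 20)), pan=0.5)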
Power is energy per time, corresponding to the time-averaged square of a linear signal's running value. Intensity is power per area, proportional to the square of the RMS (root mean square, the standard deviation of a centered signal) pressure or voltage. For a digital signal, intensity is equivalent to power (up to an arbitrary, indeterminate factor, since area is unknown or unspecified), and therefore also corresponds to the windowed and averaged sum of squares of a sample sequence. In the frequency domain, squares of the amplitudes of the Fourier components comprise a power spectrum, in which energy equivalently corresponds to intensity (via Parseval's theorem, which is basically a restatement of conservation of energy). The level associated with a sound's subjective loudness or volume (which is not to be confused here with the separate idea of 3D spatial extent) is proportional to the logarithm of the intensity, since human perception of loudness (as measured, for instance, on a sone scale) is predicted by an approximately logarithmic compression of measured intensity.
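For a digital signal these definitions reduce to a few lines of arithmetic. The following minimal NumPy sketch (the reference level and test tone are arbitrary choices, not from the text) computes the RMS of a sample sequence, a logarithmic dB level, and checks Parseval's theorem numerically.

import numpy as np

def rms(x):
    # Root mean square: standard deviation of a centered sample sequence.
    return np.sqrt(np.mean(np.square(x)))

def level_db(x, reference=1.0):
    # Level in dB relative to a full-scale amplitude: the logarithmic
    # compression that roughly tracks perceived loudness.
    return 20 * np.log10(rms(x) / reference)

fs = 44100
t = np.arange(fs) / fs
x = 0.5 * np.sin(2 * np.pi * 440 * t)          # half-scale 440 Hz tone

print(round(level_db(x), 2))                    # about -9.03 dB (0.5 / sqrt(2))

# Parseval's theorem: energy summed in time equals energy summed in frequency.
X = np.fft.fft(x)
assert np.isclose(np.sum(x ** 2), np.sum(np.abs(X) ** 2) / len(x))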
In VR and AR applications, sources are parameterized by spatial position, including location and orientation as well as other attributes. Simple source models project
sound equally in all directions, but more complicated models, such as megaphone
effects, describe deviation from omnidirectional radiation patterns by emphasizing certain directions. Sources and sinks are embedded in a space, which may be
a realistic model of a room or building, or more typically a simplified or abstracted
model. This embedding specifies or implies interaction of the sources with a simulated environment, including echoic reflections off surfaces; reverberation, statistically approximable ambience; as well as occlusions and obstructions (Funkhouser et al. 2002), other objects in a scene around which sound must diffract or bend. Such mediation also includes propagation and attenuation models. Auralization emphasizes realistic acoustic rendering, as in architectural walk-throughs and simulations of concert halls (Kleiner 2011).
At the end of the simulation the signal is received by sinks, parameterized like
sources with positions, as the sequence is processed with filters that model the
effect of each listener's head and torso (Kapralos et al. 2008). All of these effects are modeled as digital signal processes, cascaded in a data chain or aggregated into a monolithic filter. Finally, the multichannel PCM stream can be fed into a DAC (digital-to-analog converter), smoothed out with reconstruction LPFs (low-pass filters), and sent through analog amplifiers to loudspeakers, where the voltage-encoded signal corresponds to excursion of speaker diaphragms, which induce pressure waves that interact with each human listener's real environment before entering ears to be
apprehended as auditory events.

12.1.3 Spatial Sound
In stereo reproduction systems, sound comes only from left and right transducers:
earphones, headphones, and loudspeakers. Such audio systems project only lateral
arrangement of captured and mixed sources, as the apparent direction from which
sound emanates is controlled by panning, shifting the balance of a sound source
between channels with a pan pot (short for panoramic potentiometer), a cross-coupled
dual mixing variable resistor, perhaps with nonlinear cross-tapers to preserve total
power across a distribution. But this technique yields images that are diffuse, located
basically only between loudspeakers and only at distances farther from the listener
than the plane of the speakers, or, if headphones are used, only between the ears
(for intracranial, or IHL, inside-the-head localization). Cyberspatial sound projects
audio media into acoustic space by manipulating signals so that they assume virtual
or projected positions, mapping them from zero space (source channels) into multidimensional space (listeners' perceptual spaces). Panning by cross-fading intensity yields lateralized images, degenerately spatialized in 1D, but more sophisticated processes can make virtual sources directionalized in a 2D, periphonic (360°, 2π radians circumferentially) flat soundscape, or spatialized in a 3D, pantophonic (360° × ±90°, 4π steradian solid angle) soundscape. Spatial audio involves technology that allows virtual sound sources to have not only lateral left-right (sway) attributes (as in a conventional stereo mix), but vertical up-down (heave) and longitudinal back-forth (surge) qualities as well. By applying psychoacoustic effects with DSP, engineers and scientists are developing ways of generating sound fields (Tohyama et al. 1995) and fully 3D sound imagery (Blauert 1997, Gilkey and Anderson 1997, Begault 2004, Rumsey 2006, Pulkki et al. 2011, Suzuki et al. 2011). Such virtual positions
enable auditory localization, estimability of the position in space of virtual sources.
Augmenting a sound system with spatial attributes unfolds extended acoustic dimensions; spatial sound is a sonic analog of 3D graphics. It takes theater-in-the-round
and turns it inside-out, immersing listeners in soundscapes.
The Audium is a unique spatial sound theater (Loy 1985, Shaff 2002), a specially
constructed venue featuring music steered among its 176 speakers in an intimate
(49 seats) setting in San Francisco. One of its literally motivating principles is that
space (like rhythm, melody, or harmony) is a fundamental element of music, and
that controlled rendering of sound in space creates a kinetic perception that must be
a part of a composers musical vocabulary. Sounds are choreographed through their
movement and intensity on multiple trajectories through space. In the words of the
Audium's director,
The volume of space becomes part of the work and a strong sense of sculpting sound
in three dimensions becomes apparent. A melodic line acquires a starting point,
a speed, a determined pathway, and a point to conclusion. Areas in space become
launching sites and meeting stations for converging sound lines. Melodic convolutions
can be physically felt as they flow along spatial planes: vertical, horizontal, diagonal,
circular, and any combination. As each melodic line travels, layers unfold, overlap, and
entwine to reveal a rich audio tapestry. Harmonic tensions between different locations
in space open up unusual timbres. Rhythmic ideas take on new qualities when speed
and direction are enhanced by controlled movement. Live performance of works gives
a human, interactive element to the Audium's spatial electronic orchestra.
Stan Shaff

The most direct way of implementing spatial sound is by simply distributing real
sources in space, as in antiphonal or polychoral music, such as that composed in
Venice during the Renaissance by Andrea Gabrieli, his nephew Giovanni Gabrieli,
and Claudio Monteverdi. Such a literal approach to electroacoustic spatial sound,
acoustic space synthesis, physically associates each source with a loudspeaker, statically placed or perhaps moved around mechanically. Alternatively, spatial sound can
be bluntly captured by a gimbaled dummy head positioned around a fixed speaker
or by physically moving sources and speakers around a mannequin (alternatively
spelled manikin, a dummy head; in German: Kunstkopf), the captured binaural signals from which are presentable to listeners. However, such implementations are
awkward and not portable. Therefore, the rest of this chapter concentrates on DSP
synthesis of spatial cues.
Fully parameterized spatial audio (Begault 1994, Carlile 1996, Jot 1999, Rumsey
2001) allows dynamic, arbitrary placement and movement of multiple sources in
soundscapes, including musical sound characteristics outlined in Table 12.1 and spatial dimensions presented later in Table 12.2, as well as control of extra dimensions (Cohen and Wenzel 1995) shown in Figure 12.2, including apparent extent, directivity, orientation, and environmental characteristics such as envelopment, for cyberspatial capabilities (Cohen et al. 1999).
A sound diffuser or spatializer, a multidimensional mixer, creates the impression
that sound is coming from different sources and different places, just as one would
hear in person. There are two paradigms for AR and VR perspectives: projecting simulated sources into a listener's space, and transporting a listener into another space.
TABLE 12.1
Dimensions of Musical Sound

Frequency content
  Pitch and register: tone, melody, harmony, vibrato (FM)
  Waveform (sinusoid, sawtooth, square, triangle, rectification, etc.), tone color, waveshaping, equalization, and sweetening
  Spectral profile, including envelope and moments (center frequency)
  Spectrotemporal pattern (evolving spectrum), texture, tone color
  LTAS (long-term average spectrum)
Dynamics
  Intensity/volume/loudness
  SNR (signal-to-noise ratio)
  Envelope: attack, decay, sustain, release (musical note shape)
  Temporal envelope, including tremolo (AM)
Timing
  Duration
  Tempo, repetition rate
  Duty cycle
  Rhythm and cadence, including syncopation
Spatial position: location and orientation
  Direction: azimuth, elevation
  Distance (range)
  Directivity: attitude and focus
TABLE 12.2
Physically Spatial Dimensions: Taxonomy of Positional Degrees of Freedom, Including Cinematographic Gestures

Position comprises static posture or pose (location and orientation) and dynamic gesture (translation and rotation).

Location (displacement), translation along an axis:
  Lateral (transverse width or breadth): abscissa x; sway: camera track (crab); directions (force): left-right
  Frontal (longitudinal depth): ordinate y; surge: camera dolly; directions (force): back (aft), out: retreat (drag); forth (fore), in: advance (thrust)
  Vertical (height): altitude z; heave: camera boom (crane); directions (force): up: ascend (lift); down: descend (weight)

Orientation or attitude, rotation about an axis (perpendicular to, and turning within, the named plane):
  Lateral axis, sagittal (median) plane: pitch (tumble, flip): camera tilt; elevation; climb/dive
  Frontal axis, frontal (coronal) plane: roll (bank, flop): camera roll; barrel roll; left/right
  Vertical axis, horizontal (transverse) plane: yaw (whirl, twist): camera pan; azimuth; CW/CCW
Simulations can be rendered of otherwise impractical or impossible situations, such
as inside a musical instrument. Spatial hearing can be stimulated by assigning each
phantom source a virtual position with respect to each sink and simulating auditory
positional cues. In AR applications, virtual position is chosen to align with real-world
features or directions. Audio displays based on such technology exploit human ability
to quickly and preattentively (unconsciously) localize and segregate sound sources.
12.1.3.1 Directionalization and Localization
Binaural localization cues include interaural time (phase) difference (ITD), a function of the interaural distance, as illustrated in Figure 12.3; interaural intensity difference (IID),* a consequence of the head's acoustic shadow; and their generalization as binaural frequency-dependent attenuation. These perceptual cues can be captured by frequency-domain anatomical or head-related transfer functions (HRTFs) or equivalently as time-domain head-related impulse responses (HRIRs), measured for the head, pinnae (outer ears), and torso of humans or mannequins (Hartmann 1999, Wenzel 1992). The bumps, folds, cavities, and ridges of the pinnae cause superposition of direct and reflected sound, direction-dependent interference. This cancellation and reinforcement results in comb filtering, so-called because of spectral modification, heard as tonal coloration, which manifests as characteristic notches and peaks in a frequency plot (Ballou 1991, Watkinson 2001). For each direction, a left-right stereo pair of these HRTF earprints can be captured. Cyberspatial sound can be generated by driving input signals through these filters in a digital signal processor, creating psychoacoustic localization effects by expanding an originally monaural signal into a binaural or multichannel signal with spatial cues. These static perceptual cues are fragile in the presence of conflicting dynamic cues (Martens 2003), so often the earprint selection is parameterized by head tracking.

* IID is also known as ILD. IID emphasizes objective measurement, as intensity has dimensions of power per area and units of W/m², whereas ILD implicitly suggests subjective sensation, since level is logarithmic, in dB.

FIGURE 12.2 Subjective spatial attributes: spatial impression comprises source attributes (position: azimuth, elevation, and distance; dimensions: width, depth, and height) and environment attributes (focus, diffuseness, and intimacy; envelopment; dimensions: width, depth, and height), yielding perceived dimensions of width, depth, and height. (Extended from a taxonomy by R. Mason as reported in Rumsey, F., Spatial Audio, Focal Press, Waltham, MA, 2001, p. 45; Smalley, D., Spectro-morphology and structuring processes, in S. Emmerson (ed.), The Language of Electroacoustic Music, Macmillan-Palgrave, Cambridge, MA, 1986; Smalley, D., Organised Sound, 2, 107–126, 1997.)
Thus, a spatial sound signal processor implements digital filters, the output of
which can be converted to analog signals, amplified, and presented to speakers.
Such systems process arbitrary audio signals, including voices, sound effects, and music, with functions that place signals within the perceptual three-space of each listener, including direction (azimuth and elevation) and distance (range), as explained in the next section. These algorithms are deployed in spatial sound engines such as DirectX's DirectSound and some implementations of OpenAL, using filters
such as the MIT KEMAR database. Depending upon the application, adequate periphonic directionalization can be achieved by modulating only phase and intensity, just ITD (delay) and IID (as in balance panning), without heavier DSP, as predicted by the duplex theory, a simple perceptual model of spatial hearing direction estimation. For instance, the Java Sound Spatial Audio library works this way, and allows original stereo tracks to be so projected, since the HRTF-based filter requirement of a monophonic channel is waived.

FIGURE 12.3 Woodworth's formula for interaural time delay (ITD) is a frequency-independent, far-field, ray-tracing model of a rigid, spherical head: time difference cues are registered at starts and ends of sounds (onsets and offsets). Lag in binaural arrival of a planar wavefront is estimated as τ = r(θ + sin θ)/C, where r is the assumed radius of a head, θ is the bearing of a far-field source, and C is the speed of sound. The radius of a typical adult head is about 10 cm. Primarily based on low-frequency content of a sound signal, ITD is usable (without phase ambiguity from spatial aliasing) for wavelengths up to the Nyquist spacing of the distance between the ears, corresponding to about 1700 Hz. (This model is a simplification. For instance, heads are not spherical, and ears do not symmetrically straddle the diameter, but are slightly rearward, at around 100° and 260°.)
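As a toy illustration of such duplex-theory processing (not any particular library's API; the 10 dB maximum ILD and its sinusoidal dependence on azimuth are rough assumptions for demonstration only), a monophonic signal can be lateralized using Woodworth's ITD formula from Figure 12.3 plus a frequency-independent level difference.

import numpy as np

C = 343.0           # speed of sound (m/s), assuming room temperature
HEAD_RADIUS = 0.10  # typical adult head radius (m), per Figure 12.3

def woodworth_itd(azimuth_rad, r=HEAD_RADIUS, c=C):
    # Interaural time difference tau = r * (theta + sin(theta)) / c.
    return r * (azimuth_rad + np.sin(azimuth_rad)) / c

def lateralize(mono, fs, azimuth_deg, max_ild_db=10.0):
    # Duplex-theory lateralization: delay the far (contralateral) ear by the
    # ITD and attenuate it by a crude, frequency-independent level difference.
    theta = np.radians(azimuth_deg)                   # 0 = front, +90 = right
    delay = int(round(woodworth_itd(abs(theta)) * fs))
    ild_gain = 10 ** (-max_ild_db * abs(np.sin(theta)) / 20)
    near = mono
    far = ild_gain * np.pad(mono, (delay, 0))[: len(mono)]
    left, right = (far, near) if azimuth_deg >= 0 else (near, far)
    return np.stack([left, right])

fs = 44100
noise = np.random.default_rng(0).standard_normal(fs)   # 1 s of test noise
binaural = lateralize(noise, fs, azimuth_deg=45)        # image pulled rightward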
ITDs and IIDs do not specify a unique location, as there are an infinite number
of locations along curves of equal distance from the ears having particular ITDs and
IIDs, the so-called cone of confusion. For example, anywhere on the median plane
ITD and IID vanish. HRTFs generalize and subsume ITDs and IIDs: time delays are
encoded in the phase spectrum, and IID corresponds to relative power of a filter pair.
12.1.3.2 Spatial Reverberation
A dry or free-field model includes no notion of a virtual room, and hence such signals carry no echoes. Artificial spatial reverberation is a wet technique for simulating acoustic information used by listeners of sounds in interior environments, which affects interaural coherence.* A spatial reverberation system, as illustrated by Figure 12.4, creates an acoustic environment by simulating echoes consistent with placement of virtual sources and sinks. There are two modeled classes of generated echoes: early reflections, which are discretely generated (delayed, with frequency-dependent amplitude), and late reflections comprising reverberation, which is more diffuse and usually statistically described. The soundstage impression of the space in which sound is perceived is related to presence, resonance, clarity and definition, envelopment, and immersion. Spaciousness, the perception of environmental characteristics, such as liveness, size, and shape, is correlated with indirect sound. Spatial texture is associated with perception of interaction of sound with its environment, particularly the interval between arrival of direct sound and the first few (early) reflections.
As explained in the following paragraphs, early reflections are the particular echoes generated by each source, and the reverberation tail forms the ambience of the listening environment. Direct sound and discrete early reflections are cascaded with filters for ambience to yield spatial reverberation.

* In similar colloquialisms, reverberant or anechoic spaces are said to be either live or dead, high- or low-frequency signals are said to be bright or dark, and active or inactive signals are said to be hot or cold.

FIGURE 12.4 Spatial sound synthesis: An impulse response is a time-domain representation of a system's reaction to a kick, a momentary introduction of energy, like the trace of an echogram of a clap. The frequency-domain equivalent of a time-based signal is a transfer function, a complex representation of frequency-dependent attenuation and delay or phase shift. The Fourier transform (FT) of an anatomical impulse response, which typically captures effects not only of the head itself but also of the pinnae and torso but is nevertheless referred to as a head-related impulse response, is its frequency-domain equivalent, the anatomical transfer function (imprecisely known as the head-related transfer function, or HRTF). When the anatomical impulse response is convolved with an impulse response that captures reflections, reverberation, and absorption of a room or space (parameterized by positions of sources and sinks), the space or room impulse response (RIR), or when the HRTF is cascaded with (multiplied by) the room transfer function (RTF), the resulting binaural space impulse response and its corresponding binaural space transfer function capture both the dry directionalization and the ambience for full spatialization: location, orientation, and presence.
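A sketch of the cascade Figure 12.4 describes, assuming a dry source, a left/right HRIR pair for the desired direction, and a room impulse response are already available as arrays (the file names below are placeholders for measured or simulated responses, not assets from the text):

import numpy as np
from scipy.signal import fftconvolve

# Placeholder inputs: a dry (anechoic) source, an HRIR pair measured for one
# direction, and a monaural room impulse response.
dry = np.load("dry_source.npy")
hrir_l, hrir_r = np.load("hrir_left.npy"), np.load("hrir_right.npy")
rir = np.load("room_ir.npy")

# Cascading filters is convolution in time (multiplication in frequency):
brir_l = fftconvolve(hrir_l, rir)   # binaural space impulse response, left
brir_r = fftconvolve(hrir_r, rir)   # binaural space impulse response, right

# Convolving the dry source with the BRIR pair yields directionalized,
# reverberant (wet) binaural output for that source-sink arrangement.
wet = np.stack([fftconvolve(dry, brir_l), fftconvolve(dry, brir_r)])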
Early reflections: Representing specific echoes, discrete early reflections (off the floor, walls, and ceiling) provide source position-dependent auditory images of virtual sources to each sink. An image-model or ray-tracing algorithm can be used to simulate the timing and direction of these individual reflections, represented by a finite impulse response (FIR) filter, which are then processed (usually digitally by feed-forward tapped delay lines, like shift registers) as if they were separate sources (Kendall et al. 1986). Each spatialized audio source, incident or reflected, requires
its own DSP channel.
Late reverberation: Late field reverberation typically represents source position-independent ambience of a space. Reverberant implementations of spatial sound
sometimes employ a recursive (feed-backward, or autoregressive), or infinite impulse
response (IIR), section to yield global reverberation effects of exponentially increasing density and decaying amplitude.
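A minimal sketch of these two stages, with hand-picked tap times and gains standing in for an image model, and a single feedback comb filter standing in for a full late-reverberation network (all values are illustrative):

import numpy as np

def early_reflections(x, fs, taps):
    # Feed-forward tapped delay line (FIR): each (delay_s, gain) pair is one
    # discrete early reflection added to the direct sound.
    y = np.copy(x)
    for delay_s, gain in taps:
        d = int(round(delay_s * fs))
        y[d:] += gain * x[: len(x) - d]
    return y

def comb_reverb(x, fs, delay_s=0.045, feedback=0.75):
    # Recursive (IIR) comb filter: a crude stand-in for the late,
    # exponentially decaying reverberation tail.
    d = int(round(delay_s * fs))
    y = np.copy(x)
    for n in range(d, len(x)):
        y[n] += feedback * y[n - d]
    return y

fs = 44100
x = np.zeros(fs)
x[0] = 1.0                                          # impulse: output is an echogram
taps = [(0.011, 0.6), (0.017, 0.5), (0.023, 0.4)]   # illustrative reflections
wet = comb_reverb(early_reflections(x, fs, taps), fs)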
Combined spatialization: A filter that combines early reflections and late field reverberation, sometimes called TDR, for tapped delay + recirculation (not to be confused with time-domain reflectometry, a technique for capturing echograms), captures cues to perceived sound direction, sound distance, and room characteristics. The ratio of direct to indirect sound energy is an important distance cue (as reviewed in Section 12.2.3). Reverberation time is often characterized by RT60, the time for sound to decay 60 dB, to one-millionth the power or one-thousandth the amplitude. Warmth
is used to describe reverberation time at low frequencies, and brilliance refers to
reverberation time at high frequencies. Commercial systems such as the Audyssey
MultEQ are designed to analyze site-specific reflections and adjust rendered audio to
compensate for unwanted distortions. A combination of nonrecursive and recursive
filters allows a spatial reverberation system to accept descriptions of characteristics
such as room dimensions and absorption, as well as time-varying source and sink
positions. Given an audio stream and a specification of source and sink motion in a
modeled room, a spatial reverberator generates the sound field arriving at one's ears.
Spatialized audio is rendered into a soundscape in which listeners localize objects by
perceiving both virtual sources and the simulated environment.

12.1.4 Spatial Audio Augmented Reality


The goal of AR and augmented virtuality is to integrate external and internal data,
mixing real information with virtual so that a composite display facilitates decision-making processes or enriches environments and experience. Like electronic musical
instruments producing tones that are both sampled and synthesized, or a hologram
floating over a stage setting, mixed reality and mixed virtuality systems combine
naturally captured (or transmitted) and artificially generated information.
Audio reinforcement, as in PA (public address) systems, is a kind of AAR. As a
different sort of example, Yamaha Vocaloid uses granular synthesis, compiling synthetic voices from processed snippets of human singing recordings. It gives musical
voice to Hatsune Miku, a humanoid persona and virtual idol, portrayed as a CG
16-year-old girl with long turquoise pigtails, who sometimes performs with real-life
performers such as Lady Gaga.
The concept illustrated by Figure 12.5 exploits wireless capture of mobile phones: a
dummy head equipped with (upside-down) binaurally arranged mobile phones enables portable stereotelephonic telepresence. By transmitting the sound mixture heard by a user, a remote listener can experience the same acoustic panorama. A mobile AAR system (Mynatt et al. 1998, Härmä et al. 2004, Rozier et al. 2004, Sukan et al. 2010) extends such ideas, directionalizing virtual sounds by convolution with HRIRs.

FIGURE 12.5 Poor person's mobile stereotelephony: a pair of inverted mobile phones, deployed as a microphone array attached to a mannequin, simultaneously calling a dual voice line, realizes wireless binaural telepresence.
Visual AR has two classes of head-up, see-through composited display: optical
and video. Optical see-through displays project augmenting information on transparent surfaces such as eyeglass lenses through which a scene is viewed naturally. Video
see-through displays capture a scene via cameras in front of the eyes, compositing
augmenting graphics in a frame buffer before graphic display. The distinction is akin
to that between reflex cameras with real analog images visible in the viewfinder and
mirrorless cameras with digital live preview rastering. Analogously, AAR can display sound intended to mix with naturally apprehended events, or explicitly capture
ambient sounds and mix them with synthesized audio before display via acoustically
opaque earwear such as circumaural headphones and earbuds.
Occluding the ear canal (as is done by the Etymotic Research ER4 MicroPro
earphones) eases the mixing of virtual and real sounds by passively canceling outside
noise and actively controlling signal levels, but listeners find annoying the occlusion
effect, wherein intensity increases at low frequencies, especially when chewing or
swallowing. Alternative hear-through semi-immersive implementations of AAR use
open headphones to eliminate the occlusion effect and simplify design since microphones are not required (Martin et al. 2009).

12.2 WHEREWARE: SPATIAL DIMENSIONS


12.2.1 Position = Location and Orientation; Changing Pose = Translation and Rotation
Location-based entertainment (LBE) usually describes theme park-style installations, but locative media and location-based services (LBS) typically refer to site-specific information delivered to mobile terminals. Placelessness of purely virtual
information cripples applications for LBS. Hyperlocality encourages the use of georeferences, web databases stuffed with geographic coordinates, and geospatial data
usable by AR systems.
Cyberspatial sound is naturally deployed to designate position of sources relative
to sinks. Since audition is omnidirectional but direction-dependent, it is especially
receptive to orientation parameterization. Position is the combination of location and
orientation. Location-based and -aware services do not necessarily require orientation information, but position-based services are explicitly parameterized by angular
bearing (from which can be derived, for example, a course or direction to follow) as
well as place. Location can be tracked by GPS-like systems (including the Russian
GLONASS); orientation is capturable by sensors such as gyroscopes and magnetometers. Position can be cross-referenced to GIS (geographic information systems)
data, electronic maps. As elaborated in the following sections, whereware suggests
using hyperlocal georeferences to allow applications' location awareness; whence- and whitherware suggests the potential of position awareness to enhance navigation and situation awareness, directionalizing and spatializing media streams so that
relative arrangement of sources and sinks corresponds to actual or notional circumstances, especially in real-time communication interfaces such as AR applications.
Combining literal direction effects and metaphorical distance effects in whence- and
whitherware applications invites over-saturation of interface channels, encouraging
interface strategies such as audio windowing, narrowcasting, and multipresence,
described in the following chapter.

12.2.2 Whereware for Augmented Reality


Some services use real-time location information derived from GPS, GIS, and RTLS
(real-time locating systems) to mash-up navigation and social networking. Emerging
interior techniques using indoor GPS or based on acoustic, optical, or electromagnetic tracking promise the same kind of localizability indoors (Ficco et al. 2014).
Location-based and -aware services nominally use translational location to specify a
subject's place, rectangularly representable as latitude, longitude, and altitude or x, y,
and z (sway, surge, and heave). Position, as suggested by its cognate pose, comprises
rotation as well as translation. Orientation in three-space is commonly described as
roll, pitch (or elevation), and yaw (or azimuth). These spatial dimensions are summarized in Table 12.2.
AR systems need full position information, including orientation, to align composited layers, using trackers or machine vision techniques such as fiducials, markers, and optical feature tracking. Besides GPS, mobile devices use cameras along with MEMS (microelectromechanical systems) gyroscopes, accelerometers, and magnetometers (electronic compasses), and dead reckoning (path integration), combined with some kind of sensor fusion, to infer position. A receiver needs (geometric,
photometric, acoustic, etc.) calibration with the real world to align overlaid objects
and scenes. Issues include static (geometric) error and drift, rendering registration
error, and dynamic error (time lag and jitter), all somewhat mitigated by a forgiving
user or a nonliteral user interface, allowing plausible discrepancies within bounds of
suspended disbelief.
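As a toy, single-axis illustration of such sensor fusion (a complementary filter; the blend factor and the sample values are arbitrary choices for demonstration, not recommendations from the text), yaw integrated from a gyroscope can be corrected by a drift-free but noisy magnetometer heading:

def fuse_yaw(yaw_prev_deg, gyro_rate_dps, mag_heading_deg, dt, alpha=0.98):
    # Single-axis complementary filter: trust the integrated gyro rate over
    # short intervals (low noise, but drifts) and the magnetometer over long
    # ones (noisy, but drift-free), blending with factor alpha.
    gyro_estimate = yaw_prev_deg + gyro_rate_dps * dt
    # Unwrap the magnetometer reading toward the gyro estimate before blending.
    error = ((mag_heading_deg - gyro_estimate + 180) % 360) - 180
    return (gyro_estimate + (1 - alpha) * error) % 360

yaw = 0.0
for gyro_rate, mag in [(5.0, 1.0), (5.0, 2.0), (5.0, 2.5)]:   # fabricated samples
    yaw = fuse_yaw(yaw, gyro_rate, mag, dt=0.1)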
Whereware denotes position-aware applications, including LBS and AR applications. Whenceware (from whence, meaning from where) denotes location-aware
applications that reference an origin; whitherware (from whither, meaning to
where) denotes location-aware applications referencing a destination (Cohen and
Villegas 2011). Such functionality is especially relevant to interfaces with spatial
sound capability. For example, whenceware-enhanced voicemail systems could directionalize playback so that each displayed message apparently comes from its sender's location. A real-time streamed voice channel might be processed as part of its display to express the speaker's position. Such applications of spatial sound (Loomis et al. 1990, Holland et al. 2002, May 2004), consistent with and reinforcing one's natural sense of direction, can improve situation awareness. In polyphonic soundscapes with multiple audio channels, spatial sound can enhance the cocktail party effect, allowing listeners to hear out a particular channel from the cacophony, enhancing discriminability and speech intelligibility. Looser mappings are also possible: a virtual source location need not correspond to the geographic location of a sender, but could be mapped into the individualized space of a sink. Important messages might come from a direction in front of a recipient, while less critical voicemail comes from behind. Time-tagged notifications could be projected to clock-associated azimuths, so that, for instance, a three o'clock appointment reminder could come from the right, or a six o'clock message from behind.
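A sketch of that clock-face mapping (purely illustrative; here 12 o'clock is straight ahead and azimuth increases clockwise):

def clock_azimuth_deg(hour, minute=0):
    # Map a clock time to an azimuth: 12:00 -> 0 (ahead), 3:00 -> 90 (right),
    # 6:00 -> 180 (behind), 9:00 -> 270 (left).
    return ((hour % 12) + minute / 60.0) * 30.0 % 360.0

assert clock_azimuth_deg(3) == 90.0      # appointment reminder from the right
assert clock_azimuth_deg(6) == 180.0     # message from behind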
Whence- and whitherware navigation systems, primed with hyperlocal geotags
of locations, can auditorily display sonic beacons: landmarks, warnings of situated hazards, and come-hithers, beckoning travelers to goals or checkpoints.* Spatial sound can be used to enhance the driving experience. Through onboard devices such as
navigation systems, telematic services based on mobile or vehicular communications
infrastructure give drivers access to weather and traffic advisories, as well as information on nearby restaurants, fuel, and entertainment facilities. As illustrated by
Figure 12.6, localized audio sources can be aligned with real-world locations across
various frames-of-reference for increased situation awareness and safety.
* Legend suggests a couple of gruesome examples: The mythical Sirens sang so enchantingly that sailors were lured to shipwreck, running aground on the bird-women's island. When Jason wanted to guide
the Argonauts past them, he had his crew stuff their ears with beeswax (passive attenuation) while
Orpheus played his lyre, drowning out (masking) the otherwise irresistible singing of the sea-nymphs.
In a later fabled era, the Pied Piper was hired by the town of Hamelin to clear a rat infestation. He
played a musical pipe to lure the rats into a river where they drowned. However, the town neglected
to pay the piper, so he played enticingly again to lead his deadbeat patrons' children out of town,
whereupon they disappeared.

FIGURE 12.6 Back-seat driver: localized beacons for vehicular situation awareness (e.g., compass heading, goal, junctions, milestones and checkpoints, accidents, traffic jams, mobile channels and location-based services, door-ajar and blind-spot traffic warnings, other vehicles, sonar parking assist, land line, and home).

12.2.3 Distance Effects
In spatial auditory displays, the direction of virtual sources can be simply literal,
but to make a source audible it must be granted a practical intensity (Fouad 2004)
besides a figurative loudness. Extraordinary sounds like the explosion of Krakatoa
(a volcanic island in the Indonesian archipelago) in 1883 can be audible hundreds of
kilometers away, but ordinary sounds such as speech and music rarely exceed 100
dB SPL and cannot usually be heard much more than a kilometer away. Although
geometric computer graphic models are almost always finally represented in rectilinear coordinates, either 2D Cartesian or rectangular (x, y) or 3D Euclidean (x, y, z), navigation data are typically originally GPS-derived geographic latitude-longitude-altitude
coordinates, and spatial sound is almost always modeled using spherical coordinates,
distinguishing direction (azimuth and elevation) and distance (radius or range).
These frames of reference are naturally coextensive. To reify the notion of a source
projecting audible sound to a sink (a spatial sound receiver represented by an avatar
in a virtual environment, or the subject of an AAR projection), it is usual to display
the direction directly but to take some liberties by scaling the intensity. As in the
visual domain (McAllister 1993), the mechanism for judging distances is a combination of mono (monaural) and stereo (binaural) and stationary and dynamic cues.
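The conversions between these frames of reference are routine; the following sketch assumes a right/forward/up Cartesian frame and azimuth measured clockwise from straight ahead (conventions differ between systems), and omits the geodetic step from latitude-longitude-altitude to local Cartesian coordinates.

```python
import math

def cartesian_to_spherical(x, y, z):
    """Right/forward/up (x, y, z) in meters -> (azimuth deg, elevation deg, range m)."""
    rng = math.sqrt(x * x + y * y + z * z)
    azimuth = math.degrees(math.atan2(x, y)) % 360.0      # clockwise from +y (ahead)
    elevation = math.degrees(math.asin(z / rng)) if rng > 0 else 0.0
    return azimuth, elevation, rng

def spherical_to_cartesian(azimuth_deg, elevation_deg, rng):
    az, el = math.radians(azimuth_deg), math.radians(elevation_deg)
    x = rng * math.cos(el) * math.sin(az)
    y = rng * math.cos(el) * math.cos(az)
    z = rng * math.sin(el)
    return x, y, z

# A source 2 m away, 45 deg to the right, at ear level:
print(spherical_to_cartesian(45.0, 0.0, 2.0))   # ~ (1.41, 1.41, 0.0)
```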

Apparent distance of a sound source (Bronkhorst and Houtgast 1999, Moore and
King 1999) is determined mainly (Villegas and Cohen 2010) by:
• Overall sound level: Intensity of a point source varies as the reciprocal of its squared distance (Mershon and King 1975). Simple models might use this inverse-square free-field spherical intensity attenuation, equivalent to linear gain scaling as a simple inverse and level falling off at 6 dB per doubling of range.
• Interaural level differences: Closer sources present a larger ILD (Brungart and Rabinowitz 1999), especially exaggerated at intimate, near-field whisper ranges.
• Reverberation: Distance perception is affected by lateral reflections (Nielsen 1992).
• Direct-to-reverberant energy ratio: In environments with reflecting surfaces, sources far from a listener yield roughly the same reverberation level, whereas direct sound level attenuates approximately according to the aforementioned 6 dB per distance doubling (Zahorik et al. 2005).
• Head orientation: Range estimations are better when source direction is nearly aligned with the interaural axis (Holt and Thurlow 1969).
• Familiarity with environment and sound source: Distance estimation is worse for unfamiliar sound sources (Coleman 1962).
• Source dullness (sharpness): Distant sources are duller due to high-frequency absorbent effects of air (Coleman 1968, Malham 2001); nature acts like an LPF (low-pass filter).
If a sound source or sink is moving, temporal intensity variation and Doppler shift
(pitch modulation) also contribute to distance estimation (Zakarauskas and Cynader
1991) (see Figure 12.7).
In virtual environments, the gain is usually clamped or railed within a certain near-field distance, and often disregarded beyond a certain range, where it is
assumed to be negligible. Also, distance effects are often exaggerated, as estimation
of range sharpens if level control is driven by models that roll off more rapidly than
the physical inverse (1/d) amplitude law of spherical waves. Virtual sources can be
brought even closer to a listener's head by adjusting ILD to extreme values without
requiring the level to be increased as much as would normally occur at such close
range (Martens 2001).
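A minimal sketch of such range rendering follows, combining inverse-distance attenuation (optionally steepened by a roll-off exponent to exaggerate distance), clamping within a full volume area, and culling beyond the extent, in the spirit of Figure 12.7; the default radii and exponent are illustrative assumptions.

```python
import math

def distance_gain(d, d_fva=1.0, d_e=30.0, rolloff=1.0):
    """Linear gain for a virtual source at distance d (meters).

    d_fva   -- full-volume-area radius: inside it, gain is clamped to 1 (0 dB)
    d_e     -- extent of the source's nimbus: beyond it, the source is culled
    rolloff -- 1.0 gives the physical 1/d law (about -6 dB per distance doubling);
               larger values exaggerate distance to sharpen range estimation
    """
    if d >= d_e:
        return 0.0                      # out of earshot: disregard
    if d <= d_fva:
        return 1.0                      # clamped ("railed") at full volume
    return (d_fva / d) ** rolloff       # free-field-style attenuation

def gain_db(g):
    return 20.0 * math.log10(g) if g > 0 else float("-inf")

# With the 1/d law, each doubling of distance costs about 6 dB.
print(gain_db(distance_gain(2.0)))   # ~ -6 dB
print(gain_db(distance_gain(4.0)))   # ~ -12 dB
```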

12.2.4 Stereotelephony
Stereotelephony means putting multichannel audio into communication networks,
using stereo effects in telephones and spatial sound in groupware. As audio codecs
improve and networks quicken for internet telephony, high-fidelity, high-definition
real-time audio becomes asymptotically
• Broadband, with high sampling rate for full spectrum
• Broadly dynamically ranged, carrying appropriate bit depth for expressive intensity modulation

[Figure 12.7 plots gain (level in dB) against distance from the source, showing clamping at full volume within dfva and fall-off for distance attenuation out to the extent de.]

FIGURE 12.7 Virtual sound projection: The full volume area (dfva) is represented by the dashed circle, within which the sound is heard at full volume, pegged at 0 dB, possibly with near-field range manipulation. The extent (de) of the exposure refers to the nimbus in which a focus-modulated sink can hear a nimbus-modulated source. (From Benford, S. et al., Presence: Teleoperators and Virtual Environments, 4, 364–386, 1995; Greenhalgh, C. and Benford, S., ACM Transactions on Computer–Human Interaction, 2(3), 239–261, 1995.) (Such limits are a little like clipping planes in graphics renderers that cull beyond the sides of viewing frusta.)

• Clear (with minimal noise) and transparent, uncolored by artifacts of sampling, quantization, codecs, or network transmission, and therefore almost indistinguishable from in-person events
• Persistent, always available (24 × 7 or 24 × 365, 360° × 90°: around the clock, around the world)
• Multichannel, with wide streams for polyphony (2 for stereo, 6 for 5.1, etc.)
Stereotelephony encourages directional and spatial sound and can be used seamlessly by AAR systems. Spatialization can be done in a peer-to-peer (P2P) style by
multimedia processors directly exchanging streams at the edge of a network (Cohen
and Gyrbir 2009) or in a client-server style, letting network servers push preprocessed data to thin clients, as in cloud-based audio engines extending the functionality of a PBX (private branch exchange). Even though a conferencing media
server might be called a voice bridge in acknowledgment of tuning for speech, it
can still be applied to other kinds of audio sources. For instance, radio-quality
music, sound effects (SFX), and auditory icons (Blattner et al. 1989, McGookin
et al. 2009) can all be streamed. Differentiated treatment seems to be indicated
(Seo et al. 2010), directly spatializing predetermined clips or loops (locally stored

or synthesized), as well as spatializing monophonic streams (such as telephonic
channels) and mixing soundscapes with network-delivered multichannel streams
(such as stereo music).
A standard can be expected to emerge for sending location and position information upon connecting an ordinary phone call over POTS (plain old telephone service). Such metadata can
already be carried by non-POTS services, such as VOIP, including voice chat for MMORPGs
(massively multiplayer online role-playing games) and other conferencing systems.
SIP systems, which distinguish signaling side channels from realtime media streams,
could be configured to convey such information as well as to support stereotelephony
(Alam et al. 2009).

12.3 WEARWARE AND EVERYWARE: SOURCE AND SINK DIMENSIONS
The proliferation of auditory display configurations is a challenge and opportunity
for audio engineers and sound and user experience designers. Stationary loudspeakers, handheld mobile devices, and eartop headwear span a continuum of form factors. The reproduction apparatus imposes restrictions and complications on the
integration of augmented spatial sound systems. Rapidly adopted technology has
caused several new words to enter the vernacular. For example, prosumer catches the
sense of a class of product and user reconciling the amateur/professional dichotomy,
reflecting increasingly affordable high-performance equipment and more discriminating users, as cycles and bandwidth flirt with human sensitivity. It also has the
alternative meaning of a combination of producer and consumer, in accordance with
bottom-up crowd sourcing. The digital 4C [foresee] convergence is the confluence
of communication devices, computing, consumer electronics, and content. As summarized by Table 12.3, such integration enables ubicomp (ubiquitous computing), the
smooth interaction of devices at different scales, including hybrid AAR leveraging
heterogeneous hardware.

TABLE 12.3
Saturated: Distributed and Pervasive, Continuous and Networked,
Transparent or Invisible (Spatial Hierarchy of Ubicomp or Ambient
Intimacy)
Smart spaces
Cooperative buildings and smart homes
Roomware (software for rooms) and reactive rooms media spaces
Spatially immersive displays
Information furniture
Networked appliances
Handheld, mobile, nomadic, portable, and wireless devices
Wearable computers
Computational clothing (smart clothes)

12.3.1 Capabilities
12.3.1.1 Mobile and Wearable Auditory Interfaces
The dream motivating wireless technology is anytime, anywhere communications.
Mobile communication offers unique requirements and prospects because of interesting form factors (weight, size, interface), susceptibility to noise (less robust network),
restricted bandwidth, and social potential (universality). Wireless computing has gone
beyond laptops, smartphones, and tablets (as well as smartbooks, netbooks, palmtops, ultrabooks, notebooks, clamshells, slates, tables, touchpad/handheld computers,
PDAs, and handheld gaming devices) to include wearable and intimate systems and
smart clothing. Besides the personal sound display systems illustrated by Figure 12.8,
an ultimate consequence of wearware could be talking clothing, like that imagined
by Figure 12.9. AAR mobile browsers are naturally extended by voice interfaces to
allow audio dialogs and mixed-initiative conversations: recognition of multitouch gestures, spoken input (via ASR: automatic speech recognition), and synthesized speech
(via TTS: text-to-speech), whose modern algorithms have outgrown the drunken
Scandinavian robot accents (so to speak) of the recent past.
12.3.1.2 Form Factors
Personal audio interfaces, including intimate, wearable, handheld, mobile (like a
smartphone), nomadic (like a tablet), and portable (like a laptop), represent one end
of a spectrum, the other end of which is marked by social displays. These endpoints
delimit a continuum of useful interfaces, as outlined by Table 12.4. Auditory display
form factors can be ordered according to degree of intimacy along this private-public
dimension, corresponding to the vertical dimension in Figure 12.10.
Stereo earphones, headphones, and headsets arrange eartop transducers straddling one's head at the ears: in (as with earbuds), on (as with supraaural
headphones), or over (as with circumaural headphones). Design, fashion,
personality, style, and uniqueness are important characteristics of such
wearable devices, perhaps especially for younger users. Headphone-like displays, although somewhat cumbersome, allow greater freedom of movement
while maintaining individually controlled audio display, including near-field effects such as whispering. Such intimate sounds in one's sacred space
can evoke the so-called ASMR (autonomous sensory meridian response), a
euphoric tingling sensation sometimes compared to that from binaural beats.
Mobile terminals such as smartphones are portable and repositionable, but require extension to deliver stereo sound. Headphones which block external sounds
are especially well suited for supporting active noise cancellation (ANC),
which adds a polarity-inverted image of ambient noise to displayed signals.*
Open-back headphones are more transparent to ambient sounds; they obviate pseudoacoustic features for which binaural microphone-captured signals
* Polarity inversion can also be used as a crude way to broaden stereo imagery. Interaural cross-correlation (IACC) is related to the diffuseness or solidness of an image and affects the perceived spatial
extent of a source, the auditory source width (ASW). By inverting one side of a stereo pair, IACC is
bluntly subdued, and the resultant soundscape is widened.

[Figure 12.8 illustrates personal sound displays: monophonic (one speaker); monotic or monaural (one ear); diotic (one signal to both ears); stereophonic (two speakers); dichotic, biphonic (separate channels); dichotic, binaural (mannequin microphones); loudspeaker binaural (crosstalk cancellation); nearphones or earspeakers (stereo speakers close to the ears); and pseudophonic (cross-connected dichotic).]

FIGURE 12.8 Personal sound displays. By preconditioning stereo signals, speaker crosstalk can be reduced. A special implementation of this technique is called transaural. (From Bauck, J.L. and Cooper, D.H., Journal of the Audio Engineering Society, 44(9), 683–705, 1996.) Such measures can be obviated by placing the speakers near the ears, nearphones for unencumbered binaural sound. Pseudophonic arrangements allow a striking demonstration of the suggestiveness of head-turning directionalization, as front–back and even up–down disambiguation is flipped, even if the subject can see the source. (Extended from Streicher, R. and Everest, F.A., The New Stereo Soundbook, 3rd edn., Audio Engineering Associates, Pasadena, CA; Marui, A. and Martens, W.L., Spatial character and quality assessment of selected stereophonic image enhancements for headphone playback of popular music, in AES: Audio Engineering Society Convention (120th Convention), Paris, France, 2006.)

FIGURE 12.9 Introducing interactive wear: spatial sound wearware. (Copyright 2015, The New Yorker Collection from cartoonbank.com. All rights reserved.)

must be fed into collocated earphones with pass-through amplification to restore real environmental sounds. Earbuds, such as Bluetooth earpieces,
evoke the one worn by Communications Officer Lt. Uhura in the original classic Star Trek science fiction TV series. Because they occlude the ear canal,
20 dB of passive cancellation or noise exclusion is possible, which is also
combinable with ANC (Heller 2012).
Bone conduction headphones usually conduct acoustic energy through the
mastoid portion of the temporal bone behind the ears. The Panasonic Open-Ear Bone Conduction Wireless Headphones use earpiece speakers placed
in front of the ears, using vibration through one's upper cheekbones. Such
bone knockers do not block ambient sounds and can extend vibrotactile
low-frequency stimulation. Even more intimate than an article of clothing
is a digit in an orifice: Figure 12.11 illustrates utilizing sound transmission
through finger bones.
Nearphones (a.k.a. earspeakers) straddle the head without touching it, but are
close enough to the ears to minimize crosstalk.

TABLE 12.4
Audio and Visual Displays along Private–Public Continua

Proxemic Context | Architecture | Audio Display | Visual Display
Intimate (personal, private) | Headset, wearable computer | Eartop headphones, earbuds | Eyetop HWDs (headworn displays), HMDs (head-mounted displays)
Individual | Chair | Nearphones, earspeakers | Smartphone, tablet, laptop display, desktop monitor
Interpersonal | Couch or bench | Loudspeakers (e.g., stereo dipole, transaural) | HDTV, fishtank VR, NEC VisionDome
Multipersonal (familiar) | Home theater, vehicle, spatially immersive display (e.g., Cave, Cabin) | Surround sound, 5.1, etc. (e.g., Ambisonics) | Projection
Social | Club, theater, reality center | Speaker array (e.g., VBAP, WFS, HOA) | Large-screen display (e.g., IMAX)
Public | Stadium, concert arena | Public address |
[Figure 12.10 arranges three axes: Synthesis (reality, augmented reality, augmented virtuality, virtuality), Location (omnipresent, mobile, location-based, stationary), and Diffusion (single use, multiuser, massively multiuser, potentially everyone).]
FIGURE 12.10 Augmented reality location diffusion taxonomy: augmented reality refers to extension of users' real environments by synthetic (virtual) content. The Synthesis axis is an original MRMV continuum. (From Milgram, P. and Kishino, F., IEICE Transactions on Information and Systems, E77-D(12), 1321–1329, 1994.) Location refers to how and where such AR systems might be actually used; Diffusion refers to degree of concurrent usage. (Adapted and extended from Brull, W. et al., Computer Graphics Animation, 28(4), 40–48, 2008.)

Sound bells are parabolic or hyperbolic ceiling-mounted speaker domes which
focus acoustic beams, suitable for semiprivate audition in public spaces
such as galleries and museums.
Parametric speakers use ultrasonic microspeaker arrays (described later in
Section 12.3.2.2).

FIGURE 12.11 Whisper wearable terminal: By sticking a finger in one's ear, a user can hear sound conducted through finger bones. When fingers on opposite hands, driven by respective sides of a stereophonic signal, are inserted into both ears, a kind of binaural sound can be displayed. The channels are not so sharply separated compared with ordinary stereophonic sound, but the display has a unique surround feel provided by the combination of aerial and bone conduction. (From Fukumoto, M., A finger-ring shaped wearable HANDset based on bone-conduction, in ISWC: Proceedings of the International Symposium on Wearable Computing, pp. 10–13, 2005.)

Stereo loudspeakers often bestride visual media such as televisions, computer
monitors, and projection screens. With crosstalk cancellation or compensation (CTC or XTC), preconditioning of a stereo signal to minimize unavoidable leakage from each speaker to the contralateral ear, such arrangements
can emulate binaural displays.
Multichannel loudspeaker arrays are least invasive and are best suited for
situations where listeners are confined to a predetermined space (the sweet
spot or stereo seat) and delivered sound is independent of each listener's
position. Speaker array systems allow multiple listeners to enjoy a shared
environmental display. Line arrays can be driven for beam-forming, using
amplitude- and phase-modulation to produce directional lobes of emphasis
via constructive and destructive interference. Loudspeaker arrays can be
configured for various kinds of deployments:
Discrete arrangements are used in theatrical or AV home theater surround
sound, in 5.1, 7.1, 10.2, 22.2, etc. configurations (Holman 2008). The cardinal number to the right of the decimal point in such designations counts the
number of bass channels: signals from LFE (low frequency effects) channels are combined with lower frequency bands from other input channels
by a bass management system which drives separate woofers. Since low
frequency sounds are not very directionalizable, arrangement of woofers is

usually independent of other speakers. DSP can also be applied to signals
presented via speaker arrays to enhance spatiality, such as manipulations
of source elevation (Tanno et al. 2010, Kim et al. 2014). Virtual surround
systems such as DTS Headphone:X are intended to emulate with headphone
display the externalized spatial effect of speaker arrays.
VBAP, Vector Base Amplitude Panning, uses only gain adjustment (no DSP)
across surrounding speakers for sound diffusion (Pulkki 1997, 2004); a minimal two-dimensional sketch follows this list.
DirAC, for Directional Audio Coding, is based on spatial frequency sampling
(Pulkki et al. 2009).
Ambisonics and HOA, Higher-Order Ambisonics, are based on spherical harmonics and reprojections of a sound field captured by tetrahedral microphone arrays.
Barco Auro-3D and Dolby Atmos are contemporary theatrical standards.
Recently such public venue development has literally elevated, with deployment not only of speakers at ear level but also extension to height speakers
and ceiling-mounted voice of God zenith speakers.
WFS systems use densely distributed DSP-driven speaker arrays to reconstruct, by Huygens' Principle, complex wavefronts, within the sweet spot
of which localization of virtual sources does not depend upon or change
with listener position (Rabenstein and Spors 2008, Melchior et al. 2010).
Wavefield (numeric solution) and wavefront (analytic solution) synthesis
systems can be thought of as a complement of HOA (Kim and Choi 2013).
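The following is a simplified two-dimensional sketch of pairwise amplitude panning in the spirit of VBAP, distributing constant-power gains across the speaker pair bracketing a source direction; it assumes a horizontal ring of loudspeakers and is not Pulkki's reference implementation.

```python
import numpy as np

def vbap_2d(source_az_deg, speaker_az_deg):
    """Pairwise 2D amplitude panning in the spirit of VBAP (Pulkki 1997).

    Returns one gain per loudspeaker; only the adjacent pair bracketing the
    source direction receives nonzero gain. Angles are azimuths in degrees
    on a horizontal ring of speakers. A simplified sketch.
    """
    spk = sorted(speaker_az_deg)
    n = len(spk)
    gains = np.zeros(n)
    for i in range(n):
        a, b = spk[i], spk[(i + 1) % n]
        width = (b - a) % 360.0
        offset = (source_az_deg - a) % 360.0
        if offset <= width:
            # Solve p = g_a * l_a + g_b * l_b for the speakers' unit vectors.
            base = np.array([[np.cos(np.radians(a)), np.cos(np.radians(b))],
                             [np.sin(np.radians(a)), np.sin(np.radians(b))]])
            p = np.array([np.cos(np.radians(source_az_deg)),
                          np.sin(np.radians(source_az_deg))])
            g = np.clip(np.linalg.solve(base, p), 0.0, None)
            g /= np.linalg.norm(g) or 1.0   # constant-power normalization
            gains[speaker_az_deg.index(a)] = g[0]
            gains[speaker_az_deg.index(b)] = g[1]
            return gains
    return gains

# Quadraphonic ring: a source at 30 deg is shared by the 0 and 90 deg speakers.
print(vbap_2d(30.0, [0.0, 90.0, 180.0, 270.0]))
```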
12.3.1.3 Dynamic Responsiveness
Spatial sound and spatial reverberation systems have varying degrees of responsiveness, according to immediacy of the audio stream and dynamism of the sources and
sinks, orderable into a hierarchy of parameterizability:
• Static non-realtime: ordinary stereo mix with fixed soundscape; binaural recording; spatial music, including musique concrète and acousmatic sound
• Dynamic non-realtime: recorded stereo mix with animated soundscape
• Static realtime: ordinary teleconference with fixed locations
• Dynamic realtime: VR or AR chatspace with moving avatars, sources, and sinks
• Dynamic realtime with head tracking: for AAR applications
Wikitude-style contents, in which crowd-sourced augmentation is prepared offline in
advance, are less flexible for AAR than realtime, dynamically generated, network-delivered audio streams.
12.3.1.4 Head Tracking
Since virtual sources do not actually exist, but are simulated by multiple channels
of audio, a spatialization system must be aware of the position of a listener's ears to
faithfully create a desired sound image. A head tracker, which might be attached to a

FIGURE 12.12 Head tracking: Active listening, including natural audition, uses intentional and unintentional head turning to disambiguate front–back confusion.

set of headphones or a head-mounted display, can continuously detect orientation of
a listener's head, adjusting selection of filters for input signals. This feature is important for robust localization and soundscape stabilization, allowing sources to stay
anchored in an environment as one's head is turned, as illustrated by Figure 12.12.
A sophisticated system synthesizes a soundscape in which sound source images
respect each user's head orientation in real time (Gamper 2014).
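The core of such soundscape stabilization can be as simple as counter-rotating source azimuths by the tracked head yaw before selecting spatialization filters, as in this minimal sketch (azimuth conventions assumed as before):

```python
def head_relative_azimuth(source_az_world_deg, head_yaw_deg):
    """Keep a source anchored in the world frame as the head turns.

    The azimuth used to select spatialization filters is counter-rotated by
    the tracked head yaw, so the rendered source stays put in the environment.
    """
    return (source_az_world_deg - head_yaw_deg) % 360.0

# A source at world azimuth 90 deg is to the right of a listener facing 0 deg;
# when the listener turns to face it (yaw 90 deg), it is rendered straight ahead.
assert head_relative_azimuth(90.0, 0.0) == 90.0
assert head_relative_azimuth(90.0, 90.0) == 0.0
```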
12.3.1.5 Broadband Wireless Network Connectivity: 4G, MIMO, ABC, and SDR
Exponentially increasing speed of roaming communication surpasses that predicted by Moore's law for computation, and seems even more inevitable, thanks
to multiple-input multiple-output (MIMO) architectures which optimize antenna
usage. A catchphrase for fourth-generation mobile is always best connected, suggesting proliferation of persistent sessions, like intercoms to close friends and
relations or intimate colleagues. Anticipated features include wireless technology
integration, linking global systems with local ones such as IEEE 802.11 (a.k.a. Wi-Fi)
and Bluetooth; SDR (cognitive or software-defined radio); and advanced multimedia mobile communications (IPv6, high-resolution video transmission, digital
broadcasting, security, etc.), including 3D VR and AR interfaces. Metropolitan
area network (MAN) systems such as LTE, WiMAX (IEEE 802.16), and ZigBee
(IEEE 802.15.4), along with smart antennas, especially beam-forming phased arrays,
will make theoretical bandwidth and practical throughput across the composite het-net (heterogeneous network) even broader and higher.

12.3.2 Special Displays
12.3.2.1 Binaural Hearing Aids
About 5% of the world's population suffers from significant hearing loss, and such
affliction is expected to worsen in the future. Modern hearing aids are a kind of wearable computer: they capture and digitize signals, perform frequency filtering across
separate bands, suppress noise, dynamically amplify conditioned signals, reconstruct
full-band signals from separated bands, and resynthesize amplified analog signals

which are finally transduced back into sound. Since microphones and speakers are
collocated in such devices, intense signals can cause howling, which limits amplification range: the so-called GBF, gain before feedback. Binaural hearing aids can use
the cross-channels to address that problem, including detection stages in the processing chain so that when oscillation is on one side but not in the contralateral device at
the same frequency, it can be identified as feedback and suppressed (Hamacher et al.
2005). Contemporary models feature modes that allow selection from among several
programs according to the circumstantial environment (conversation in noise, phone
call, television, etc.). Consumer products such as GN ReSound Linx and Starkey Halo
use Bluetooth-connected smartphone interfaces to make directional and spectral
hearing aid adjustments, via a kind of body area network (BAN).
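A toy sketch of the cross-channel feedback-detection idea follows; the spectral-peak and imbalance thresholds are illustrative assumptions, not those of any commercial hearing aid.

```python
import numpy as np

def feedback_candidates(left, right, fs, imbalance_threshold_db=20.0):
    """Flag narrowband tones present in one ear but absent contralaterally.

    Howling from acoustic feedback typically rings in one device only, whereas
    genuine environmental tones reach both microphones. Assumes equal-length
    frames from the two devices; thresholds are illustrative.
    """
    spec_l = np.abs(np.fft.rfft(left))
    spec_r = np.abs(np.fft.rfft(right))
    freqs = np.fft.rfftfreq(len(left), 1.0 / fs)
    eps = 1e-12
    imbalance_db = 20.0 * np.log10((spec_l + eps) / (spec_r + eps))
    peak = np.maximum(spec_l, spec_r)
    strong = peak > 10.0 * np.median(peak)            # locally dominant, narrowband
    suspect = strong & (np.abs(imbalance_db) > imbalance_threshold_db)
    return freqs[suspect]                             # candidate frequencies to suppress
```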
12.3.2.2 Parametric Ultrasonics
Ultrasound has wavelengths shorter than audible sound and can be aimed in beams
tighter than those formed by normal loudspeakers. Such displays create audible signals
through propagation distortion, nonlinear effects on air of ultrasonic signals (nominally above around 20 kHz, around 40–60 kHz in current practice). They have been
researched for decades as parametric acoustic arrays but only in the last decade have
practical systems become commercially available (Ciglar 2010), including Holosonic
Research Labs' Audio Spotlight and American Technology's HyperSonic Sound
System. They work by modulating an audio source onto an ultrasonic carrier, which
is then amplified and projected into the air by ultrasonic transducers. The product of
two ultrasonic signals, a reference carrier and the combination of the carrier and a
variable audio source, decomposes into bands through dispersion in the air:

\cos(c)\,\sin(c + s) = \tfrac{1}{2}\sin(2c + s) + \tfrac{1}{2}\sin(s).

As in a Theremin, the first (sum) intermodulation term is inaudible, but the second
(difference) is not. Audible sound is generated in the air, not at the speakers. The
highly directionalized sound beams are steerable through their focus and controllable spreading (reportedly as low as 3°), and can be bounced off surfaces. If technical issues regarding such systems' lower-frequency response, below 200–400 Hz
(as there is an inherent 12 dB/octave high-pass slope, a consequence of the way that
ultrasound demodulates into the audible range), and concerns about health hazards
(as the inaudible sounds can be very intense, in the range of 140–150 dB SPL) can be
allayed, ultrasonic-based audio displays could be as flexible as analogous light-based
visual displays, allowing personal sound in otherwise quiet spaces such as libraries.
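The identity above can be checked numerically, and a crude low-pass filter stands in for the air's and ear's inability to follow the ultrasonic sum term; the carrier and audio frequencies below are illustrative, not those of any particular product.

```python
import numpy as np

# Illustrative numbers only: a 40 kHz carrier "c" and a 1 kHz audio tone "s".
fs = 400_000                          # sample rate high enough for ultrasound
t = np.arange(0, 0.01, 1 / fs)
c = 2 * np.pi * 40_000 * t
s = 2 * np.pi * 1_000 * t

product = np.cos(c) * np.sin(c + s)               # nonlinear (multiplicative) mixing
expected = 0.5 * np.sin(2 * c + s) + 0.5 * np.sin(s)
assert np.allclose(product, expected)             # the identity stated above

# A moving average over ~0.5 ms acts as a crude low-pass: the ~81 kHz sum term
# is smoothed away, leaving the audible 1 kHz difference term.
kernel = np.ones(200) / 200
audible = np.convolve(product, kernel, mode="same")
```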
12.3.2.3 Distributed Spatial Sound
Another exciting field of application of audio technologies is distributed spatial
sound. Smartphones and other audio terminals could be used not only as remote
controls, but also as distributed displays, ad hoc or spontaneous loudspeaker arrays
embedded among collocated users and helping to overcome the limited power of
individual devices. Similarly, mobile spatial audio could be used more extensively
for artistic purposes. For example, SoundDelta (Mariette et al. 2010) was a multiuser

AAR environment that used ambisonic zones in a way that resembled cells in a
mobile telephony network, rendering spatial audio via headphones. Participants
heard an interactive spatial soundscape while walking around a public area such as
a town square or park.
12.3.2.4 Information Furniture
Internet appliances can be outfitted with spatial sound capability. Besides home
theater, for instance, multimedia baths can be configured for extravagant surround
sound display. Massage chairs (such as those made by Inada) can synchronize shiatsu
with music. A swivel chair can also be deployed as an instance of multimodal information furniture. As an audio output modality, nearphones straddling a headrest
present unencumbered binaural sound with soundscape stabilization for auditory
image localization, directionalizing audio using dynamically selected transfer functions determined by chair rotation (Cohen et al. 2007). As a haptic output modality,
servomotors can twist motorized chairs under networked control, distributing torque
across the internet to direct the attention of seated subjects, orienting users (like
a dark ride amusement park attraction), or nudging them in a particular direction
(Cohen 2003) (see Figure 12.13).

FIGURE 12.13 Information furniture: a pivot (swivel) chair with servomotor deployed as a rotary motion platform and I/O device, shown along with its digital analog. The input modality is orientation tracking, which dynamically selects transfer functions used to spatialize audio in a stable (rotation-invariant) soundscape. Nearphones straddling the headrest provide unencumbered binaural display. Commercially available Pioneer BodySonic configurations embed speakers in the headrest and seat of lounge chairs and sofas, as well as dance floors, to display visceral vibration that can augment audio-channel information. (a) Rotary motion platform (developed with Mechtec, www.mechtec.co.jp); (b) mixed reality simulation compositing panoramic imagery into dynamic CG: a simulated simulator. (Model by Daisuke Kaneko.)

12.4 CONCLUDING REMARKS
AR can leverage fixed-mobile convergence (FMC), the integration of wireline and
wireless interfaces. Indoor sensors and outdoor GPS-like navigation systems will
fuse into seamless tracking that leverages user position to enhance applications and
location-based services: whereware. Wearware (wearable computers and mobile,
nomadic, and portable networked communication devices) and internet appliances
and ubicomp interfaces allow heterogeneous multimodal displays, including AAR:
everyware. Hybrid configurations will emerge, such as loudspeaker arrays in conjunction with eartop displays (via BANs) and ad hoc arrangements of mobile phone
speakers. As a contemporary instance, the Sony IMAX Personal Sound Display
allows individual binaural channels on top of the theatrical six-channel system, for
personalized soundscapes including multilingual narration. The dichotomy between
mobile and LBS is resolved with mobile ambient transmedial interfaces that span
both personal mobile devices and shared locative public resources. The next chapter
surveys various applications of such capabilities, and considers more closely the possibilities of such fluid frames of reference.

REFERENCES
Alam, S., M. Cohen, J. Villegas, and A. Ashir (2009). Narrowcasting in SIP: Articulated privacy
control. In S. A. Ahson and M. Ilyas (eds.), SIP Handbook: Services, Technologies, and
Security of Session Initiation Protocol, Chapter 14, pp. 323–345. CRC Press/Taylor &
Francis. www.crcpress.com/product/isbn/9781420066036.
Ballou, G. M. (1991). Handbook for Sound Engineers (3rd edn.). Focal Press.
Barfield, W. and T. A. Furness III (1995). Virtual Environments and Advanced Interface
Design. Oxford University Press.
Bauck, J. L. and D. H. Cooper (1996, September). Generalized transaural stereo and applications. J. Aud. Eng. Soc. 44(9), 683–705. http://www.aes.org/e-lib/browse.cfm?elib=7888.
Begault, D. R. (1994). 3-D Sound for Virtual Reality and Multimedia. Academic Press.
Begault, D. R. (ed.) (2004). Spatial Sound Techniques, Part 1: Virtual and Binaural Audio
Technologies. Audio Engineering Society.
Benford, S., J. Bowers, L. Fahlén, C. Greenhalgh, J. Mariani, and T. Rodden (1995). Networked
virtual reality and cooperative work. Presence: Teleoperators and Virtual Environments
4(4), 364–386.
Blattner, M. M., D. A. Sumikawa, and R. M. Greenberg (1989). Earcons and Icons: Their
structure and common design principles. Human–Computer Interaction 4(1), 11–44.
Blauert, J. (1997). Spatial Hearing: The Psychophysics of Human Sound Localization (revised
edn.). MIT Press.
Bronkhorst, A. W. and T. Houtgast (1999, February 11). Auditory distance perception in
rooms. Nature 397(6719), 517520, DOI:10.1038/17374.
Brull, W., I. Lindt, I. Herbst, J. Ohlenburg, A.-K. Braun, and R. Wetzel (2008, July/August).
Towards Next-Gen mobile AR games. Computer Graphics Animation 28(4), 40–48.
Brungart, D. S. and W. M. Rabinowitz (1999). Auditory localization of nearby sources. Head-related transfer functions. The Journal of the Acoustical Society of America 106(3),
1465–1479. http://link.aip.org/link/?JAS/106/1465/1, DOI: 10.1121/1.427180.
Carlile, S. (1996). Virtual Auditory Space: Generation and Applications. Springer.
Ciglar, M. (2010, June). An ultrasound based instrument generating audible and tactile sound. In
Proc. NIME: New Instruments for Music Expression, Sydney, New South Wales, Australia.

Cohen, M. (2003). The Internet Chair. International Journal of Human-Computer Interaction.


(Special Issue: Mediated Reality) 15(2), 297311.
Cohen, M., K. Doi, T. Hattori, and Y. Mine (2007, October). Control of navigable panoramic
imagery with information furniture: Chair-driven 2.5D steering through multistandpoint
QTVR Panoramas with automatic window dilation. In T. Miyazaki, I. Paik, and D. Wei
(eds.), Proc. CIT: Seventh International Conference on Computer and Information
Technology, Aizu-Wakamatsu, Japan, pp. 511516.
Cohen, M. and N. Gyrbir (2009, November). Mobile narrowcasting spatial sound. In
Y. Suzuki, D. Brungart, H. Kato, K. Iida, D. Cabrera, and Y. Iwaya (eds.), IWPASH:
Proceedings of the International Workshop on the Principles and Applications of Spatial
Hearing, Zao, Japan.
Cohen, M., J. Herder, and W. L. Martens (1999, November). Cyberspatial audio technology.
Journal of the Acoustical Society of Japan 20(6), 389395.
Cohen, M. and J. Villegas (2011, March). From whereware to whence- and whitherware:
Augmented audio reality for position-aware services. In ISVRI: Proceedings of the
International Symposium on Virtual Reality Innovations, Singapore. isvri2011.org.
Cohen, M. and E. M. Wenzel (1995). The design of multidimensional sound interfaces. In
W. Barfield and T. A. Furness III (eds.), Virtual Environments and Advanced Interface
Design, Chapter 8, pp. 291346. Oxford University Press.
Coleman, P. D. (1962). Failure to localize the source distance of an unfamiliar sound. J.Acoust.
Soc. Am. 34(3), 345346. DOI: 10.1121/1.1928121.
Coleman, P. D. (1968). Dual rôle of frequency spectrum in determination of auditory distance. J. Acoust. Soc. Am. 44(2), 631–632. http://scitation.aip.org/content/asa/journal/
jasa/44/2/10.1121/1.1911132.
Durlach, N. I. and A. S. Mavor (eds.) (1995). Virtual Reality: Science and Technological
Challenges. National Academy Press, National Research Council.
Ficco, M., F. Palmieri, and A. Castiglione (2014, February). Hybrid indoor and outdoor location services for new generation mobile terminals. PUC: Personal and Ubiquitous
Computing 18(2), 271285.
Fouad, H. (2004). Spatialization with stereo loudspeakers. See Greenebaum and Barzel
(2004), pp. 143158.
Fukumoto, M. (2005, October). A finger-ring shaped wearable HANDset based on bone-conduction. In ISWC: Proceedings of the International Symposium on Wearable Computing,
pp. 10–13.
Funkhouser, T., N. Tsingos, and J.-M. Jot (2002, August). Sounds good to me!: Computational
sound for graphics, virtual reality, and interactive systems. SIGGRAPH Course Notes.
Gamper, H. (2014). Enabling technologies for audio augmented reality systems. PhD thesis,
Aalto University, Espoo, Finland. http://urn.fi/URN:ISBN:978-952-60-5622-7.
Gilkey, R. H. and T. R. Anderson (eds.) (1997). Binaural and Spatial Hearing in Real and
Virtual Environments. Mahwah, NJ: Lawrence Erlbaum Associates.
Greenebaum, K. and R. Barzel (eds.) (2004). Audio Anecdotes II. Wellesley, MA: A K Peters.
Greenhalgh, C. and S. Benford (1995, September). Massive: A collaborative virtual environment for teleconferencing. ACM Transactions on Computer–Human Interaction 2(3),
239–261.
Hamacher, V., J. Chalupper, J. Eggers, E. Fischer, U. Kornagel, H. Puder, and U. Rass (2005,
January). Signal processing in high-end hearing aids: State of the art, challenges, and
future trends. EURASIP Journal on Applied Signal Processing 2005, 29152929. DOI:
10.1155/ASP.2005.2915.
Härmä, A., J. Jakka, M. Tikander, M. Karjalainen, T. Lokki, J. Hiipakka, and G. Lorho (2004,
June). Augmented reality audio for mobile and wearable appliances. Journal of the
Audio Engineering Society 52(6), 618–639. www.aes.org/e-lib/browse.cfm?elib=13010.

Hartmann, W. M. (1999, November). How we localize sound. Physics Today 52(11), 2429.
www.aip.org/pt/nov99/locsound.html.
Heller, E. J. (2012). Why You Hear What You Hear. Princeton University Press. www.
whyyouhearwhatyouhear.com, http://press.princeton.edu/titles/9912.html.
Holland, S., D. R. Morse, and H. Gedenryd (2002, September). AudioGPS: Spatial audio navigation with a minimal attention interface. PUC: Personal and Ubiquitous Computing
6(4), 253259.
Holman, T. (2008). Surround Sound: Up and Running (2nd edn.). Oxford, U.K.: Elsevier/
Focal Press.
Holt, R. E. and W. R. Thurlow (1969, December). Subject orientation and judgment of distance
of a sound source. The Journal of the Acoustical Society of America 46(6), 15841585.
Jot, J.-M. (1999). Real-time spatial processing of sounds for music, multimedia and interactive
humancomputer interfaces. Multimedia Systems 7(1), 5569.
Kapralos, B., M. R. Jenkin, and E. Milios (2008, December). Virtual audio systems. Presence:
Teleoperators and Virtual Environments 17(6), 527549. www.mitpressjournals.org/toc/
pres/17/6.
Kendall, G. S., W. L. Martens, D. J. Freed, M. D. Ludwig, and R. W. Karstens (1986). Image
model reverberation from recirculating delays. In AES: Audio Engineering Society
Convention, New York.
Kim, S., M. Ikeda, and W. L. Martens (2014). Reproducing virtually elevated sound via a
conventional home-theater audio system. Journal of the Audio Engineering Society
62(5), 337344.
Kim, Y.-H. and J.-W. Choi (2013). Sound Visualization and Manipulation. Wiley. http://
as.wiley.com/WileyCDA/WileyTitle/productCd-1118368479.html.
Kleiner, M. (2011). Acoustics and Audio Technology (3rd edn.). John Ross Publishing.
Loomis, J. M., C. Hebert, and J. G. Cicinelli (1990, October). Active localization of virtual
sounds. The Journal of the Acoustical Society of America 88(4), 17571763.
Loy, G. (1985). About audium: A conversation with Stanley Shaff. Computer Music Journal
9(2), 41–48.
Malham, D. G. (2001, Winter). Toward reality equivalence in spatial sound diffusion. Computer
Music Journal 25(4), 3138. DOI: 10.1162/01489260152815279.
Mariette, N., B. F. G. Katz, K. Boussetta, and O. Guillerminet (2010, May). SoundDelta: A
study of audio augmented reality using WiFi-distributed ambisonic cell rendering. In
Proceedings of the 128 Audio Engineering Society Convention, London.
Martens, W. (2003, December). Perceptual evaluation of filters controlling source direction:
Customized and generalized HRTFs for binaural synthesis. Acoustical Science and
Technology 24(5), 220232. http://dx.doi.org/10.1250/ast.24.220.
Martens, W. L. (2001, December). Psychophysical calibration for controlling the range of
a virtual sound source: Multidimensional complexity in spatial auditory display. In
Proceedings of ICAD: International Conference on Auditory Display, pp. 197207.
http://legacy.spa.aalto.fi/icad2001/proceedings/papers/martens.pdf.
Martin, A., C. Jin, and A. V. Schaik (2009). Psychoacoustic evaluation of systems for delivering spatialized augmented-reality audio. Journal of the Audio Engineering Society
57(12), 10161027.
Marui, A. and W. L. Martens (2006, May). Spatial character and quality assessment of selected
stereophonic image enhancements for headphone playback of popular music. In AES:
Audio Engineering Society Convention (120th Convention), Paris.
May, M. (2004). Wayfinding, ships and augmented reality. In P. Andersen and L. Qvortrup
(eds.), Virtual Applications: Applications with Virtual Inhabited 3D Worlds, Chapter 10,
pp. 212233. London: Springer.
McAllister, D. F. (1993). Stereo Computer Graphics and Other True 3D Technologies.
Princeton University Press.

McGookin, D. K., S. A. Brewster, and P. Priego (2009). Audio bubbles: Employing nonspeech audio to support tourist wayfinding. In HAID: Proceedings of the International
Conference on Haptic and Audio Interaction Design, Dresden, Germany, pp. 4150.
Springer-Verlag. DOI: 10.1007/978-3-642-04076-4_5.
Melchior, F., J. Ahrens, and S. Spors (2010, November). Spatial audio reproduction: From
theory to production, Part I. In AES: Audio Engineering Society Convention (129th
Convention), San Francisco, California.
Mershon, D. H. and L. E. King (1975). Intensity and reverberation as factors in the auditory
perception of egocentric distance. Perception & Psychophysics 18(6), 409415.
Milgram, P. and F. Kishino (1994, December). A taxonomy of mixed reality visual displays.
IEICE Transactions on Information and Systems E77-D(12), 1321–1329.
Moore, D. R. and A. J. King (1999). Auditory perception: The near and far of sound localization. Current Biology 9(10), R361–R363.
Mynatt, E. D., M. Back, R. Want, M. Baer, and J. B. Ellis (1998, April). Audio aura: Lightweight audio augmented reality. In Proceedings of CHI: Conference on Computer–Human Interaction, Los Angeles, California, pp. 566–573. ACM Press/Addison-Wesley.
DOI: 10.1145/274644.274720, http://dx.doi.org/10.1145/274644.274720.
Nielsen, S. H. (1992, March). Auditory distance perception in different rooms. In Audio
Engineering Society (92nd Convention). www.aes.org/e-lib/browse.cfm?elib=6826.
Pulkki, V. (1997). Virtual source positioning using vector base amplitude panning. Journal of
the Audio Engineering Society 45(6), 456466.
Pulkki, V. (2004, June). Spatialization with multiple loudspeakers. See Greenebaum and
Barzel (2004), pp. 159172.
Pulkki, V., M.-V. Laitinen, J. Vilkamo, J. Ahonen, T. Lokki, and T. Pihlajamäki (2009,
November). Directional audio coding: Perception-based reproduction of spatial
sound. In Y. Suzuki, D. Brungart, H. Kato, K. Iida, and D. Cabrera (eds.), IWPASH:
Proceedings of the International Workshop on the Principles and Applications of Spatial
Hearing, Zao, Japan. eproceedings.worldscinet.com/9789814299312/9789814299312_0056.html.
Pulkki, V., T. Lokki, and D. Rocchesso (2011). Spatial effects. In U. Zölzer (ed.), DAFX:
Digital Audio Effects (2nd edn.), Chapter 5, pp. 139–184. Wiley.
Rabenstein, R. and S. Spors (2008). Sound field reproduction. In J. Benesty, M. M. Sondhi, and
Y. Huang (eds.), Handbook of Speech Processing, Chapter 53, pp. 10951114. Springer.
Rozier, J., K. Karahalios, and J. Donath (2004, April). Hear and there: An augmented reality
system of linked audio. In Proceedings of ICAD: International Conference on Auditory
Display.
Rumsey, F. (2001). Spatial Audio. Focal Press.
Rumsey, F. (ed.) (2006). Spatial Sound Techniques, Part 2: Multichannel Audio Technologies.
Audio Engineering Society.
Seo, B., M. M. Htoon, R. Zimmermann, and C.-D. Wang (2010, November). Spatializer: A
web-based position audio toolkit. In Proceedings of ACE: International Conference on
Advances in Computer Entertainment Technology, Taipei, Republic of China. ace2010.
ntpu.edu.tw.
Shaff, S. (2002). AUDIUM: Sound-sculptured space. Leonardo 35(3), 248. www.
mitpressjournals.org/toc/leon/35/3.
Shilling, R. D. and B. Shinn-Cunningham (2002). Handbook of Virtual Environments: Design,
Implementation, and Applications, Chapter Virtual auditory displays, pp. 65–92.
Human Factors and Ergonomics. Mahwah, New Jersey: Lawrence Erlbaum Associates.
Smalley, D. (1986). Spectro-morphology and structuring processes. In S. Emmerson (ed.), The
Language of Electroacoustic Music. Cambridge, Massachusetts: Macmillan-Palgrave.
Smalley, D. (1997, August). Spectromorphology: Explaining sound-shapes. Organised Sound
2(2), 107126. http://dx.doi.org/10.1017/S1355771897009059.

Stanney, K. M. (2002). Handbook of Virtual Environments: Design, Implementation, and
Applications. Human Factors and Ergonomics. Mahwah, New Jersey: Lawrence Erlbaum
Associates.
Streicher, R. and F. A. Everest (2006). The New Stereo Soundbook (3rd edn.). Pasadena,
California: Audio Engineering Associates.
Sukan, M., O. Oda, X. Shi, M. Entrena, S. Sadalgi, J. Qi, and S. Feiner (2010). Armonica:
A collaborative sonic environment. In ACM Symposium on User Interface Software
and Technology, pp. 401402. DOI: 10.1145/1866218.1866240, http://doi.acm.
org/10.1145/1866218.1866240.
Suzuki, Y., D. Brungart, Y. Iwaya, K. Iida, D. Cabrera, and H. Kato (eds.) (2011). Principles and
Applications of Spatial Hearing. Singapore: World Scientific. www.worldscientific.com/worldscibooks/10.1142/7674.
Tanno, K., A. Saji, H. Li, and J. Huang (2010). A 3-d sound creation system using horizontally arranged loudspeakers. In AES: Audio Engineering Society Convention (129th
Convention), San Francisco, California, Paper 8281.
Tohyama, M., H. Suzuki, and Y. Ando (1995). The Nature and Technology of Acoustic Space.
Academic Press.
Villegas, J. and M. Cohen (2010, December). Hrir: Modulating range in headphone-reproduced spatial audio. In VRCAI: Proceedings of the Ninth International Conference on
Virtual-Reality Continuum and Its Applications in Industry, Seoul vrcai2010.org.
Watkinson, J. (2001). Convergence in Broadcast and Communications Media: The Fundamentals
of Audio, Video, Data Processing and Communications Technologies. Focal Press.
Wenzel, E. M. (1992). Localization in virtual acoustic displays. Presence: Teleoperators and
Virtual Environments 1(1), 80107.
Zahorik, P., D. S. Brungart, and A. W. Bronkhorst (2005, June). Auditory distance perception
in humans: A summary of past and present research. Acta Acustica United with Acustica
91(3), 409420.
Zakarauskas, P. and M. Cynader (1991). Aural intensity for a moving source. Hearing
Research 52(5), 233244.

13
Applications of Audio Augmented Reality
Wearware, Everyware, Anyware, and Awareware
Michael Cohen and Julián Villegas

CONTENTS
13.1 Introduction and Overview............................................................................309
13.2 Applications................................................................................................... 310
13.2.1 Navigation and Location-Awareness Systems................................... 311
13.2.2 Assistive Technology for Visually Impaired..................................... 312
13.2.3 Synesthetic Telepresence................................................................... 313
13.2.4 Security and Scene Analysis............................................................. 313
13.2.5 Motion Coaching via Sonification..................................................... 314
13.2.6 Situated Games.................................................................................. 314
13.2.7 Entertainment and Spatial Music...................................................... 314
13.3 Anyware and Awareware............................................................................... 315
13.3.1 Audio Windowing.............................................................................. 315
13.3.2 Narrowcasting.................................................................................... 315
13.3.3 Multipresence.................................................................................... 317
13.3.4 Layered Soundscapes......................................................................... 319
13.4 Challenges..................................................................................................... 320
13.4.1 Capture and Synthesis....................................................................... 321
13.4.2 Performance....................................................................................... 321
13.4.3 Authoring Standards.......................................................................... 322
13.5 Concluding Remarks..................................................................................... 323
References............................................................................................................... 324

13.1 INTRODUCTION AND OVERVIEW


The previous chapter outlined the psychoacoustic theory behind cyberspatial sound,
recapitulated in Figure 13.1, and the idea of audio augmented reality (AAR), including
review of its various form factors. Whereware was described as a class of location- and
position-aware interfaces, particularly those featuring spatial sound. This chapter
considers application domains, interaction styles, and display configurations to

FIGURE 13.1 Schematic summary of binaural effects: ITD (interaural time delay or
difference), IID (interaural intensity difference), and head shadow (frequency-dependent
binaural attenuation). Ipsilateral signals at the ear closer to a source are stronger and earlier;
contralateral signals at the ear away from a source are weaker and later.

realize AAR. Utility, professional, and leisure application areas are surveyed, including
multimodal augmented reality (AR) interfaces featuring spatial sound. Consideration
of (individual) wearware and (ubicomp) everyware is continued from the previous
chapter, in the context of mobile ambient transmedial interfaces that integrate personal
and public resources. Two more ware terms are introduced: anyware here refers to
multipresence audio windowing interfaces that use narrowcasting to selectively enable
composited sources and soundscape layers, and awareware automatically adjusts such
narrowcasting, maintaining a model of user receptiveness in order to modulate and
distribute privacy and attention across overlaid soundscapes.

13.2 APPLICATIONS
Users of visual AR systems can sometimes be subject to bewildering displays of
information, making it difficult to identify and avoid hazards. Historically, adoption
of novel visual technologies precedes that of audio technologies. Current mobile
technology already offers display resolution finer than human visual acuity at normal viewing distances, and 3D visual displays are deployed on mobile platforms.
Considering such trends, we can expect a great increase in number of applications
using wearable spatial audio technologies. Miniaturization of components will allow
creation of devices small enough to be worn over the ear and controlled by facial
or tongue movements. Such devices will naturally include spatialization options for
audio reproduction. Researchers are exploring alternative ways to present information via complementary sensory modalities, especially audition. In this section we
survey a broad variety of applications of AAR, not only those that explicitly attempt

to desaturate the visual channel. Systems like those outlined in the last chapter are
being deployed in utility, professional, and recreational domains, considered here in
that order.

13.2.1 Navigation and Location-Awareness Systems


Spatialized sound and music can be deployed to aid way-finding (Loomis et al. 1990,
Bederson 1995, Jones et al. 2008). A relaxed understanding of wearability admits
consideration of vehicles as a kind of wearable computer. As a simple example, as
seen in Figure 13.2, a virtual sound source can guide a driver around a corner.
Commercial products featuring location-aware audio rendering, such as Audio
Conexus GPS tour guide systems, are currently used in tourism. Such products trigger content display as a vehicle enters a designated zone. It is easy to extend such
capabilities to enact preprogrammed radio plays. For example, at a historic battleground, shouts, explosions, and spatialized sound effects can enliven the narration of a
tourist guide. An exhaustive review is beyond the scope of this chapter, but a few
examples illustrate such systems. Intelligent Traffic System (ITS) information can
be spatialized using in-vehicle loudspeakers (Takao 2003). We extended some of
Takao's ideas in GABRIEL (Villegas and Cohen 2010), retrofitting a vehicle with
location-aware announcements (tourist information, navigation instructions, and
traffic advisories). These announcements were delivered via wireless headphones for
passengers and bone-conduction headphones for the driver. Besides using landmarks
to trigger audio stream delivery, our prototype used geolocated virtual sources that
reprojected with updates in the position and course of the vehicle, as illustrated in Figure 13.3.

FIGURE 13.2 Localized beacons and directionalization for vehicular way-finding and
way-showing.

[Figure 13.3 depicts a GPS receiver feeding a broadcaster (via RS-232 or USB), which sends OSC messages to an audio server; landmarks and virtual sources parameterize the soundscape, which is delivered to the driver via bone conduction and to tourists via wireless headphones.]
FIGURE 13.3 System architecture of GABRIEL: vehicular GPS information updates the
soundscape.
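A minimal sketch of how a broadcaster might push GPS fixes to an audio server over OSC, in the spirit of the GABRIEL architecture, is shown below; it assumes the third-party python-osc package, and the server address, OSC address, and argument layout are hypothetical rather than GABRIEL's actual protocol.

```python
# pip install python-osc   (assumed; any OSC client library would do)
from pythonosc.udp_client import SimpleUDPClient

client = SimpleUDPClient("192.168.0.10", 9000)   # hypothetical audio-server address

def broadcast_fix(lat, lon, course_deg, speed_mps):
    """Push one GPS fix to the audio server, which reprojects virtual sources.

    The OSC address and argument order are illustrative only.
    """
    client.send_message("/vehicle/fix", [lat, lon, course_deg, speed_mps])

broadcast_fix(37.52, 139.93, 42.0, 8.3)   # illustrative coordinates
```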

These ideas can be applied not only to smart or connected vehicles but
also to mobile pedestrian interfaces featuring visual navigation enhanced with spatial sound displayed via headphones (Sanuki et al. 2014). The goal is to provide users
of wearable computers with complementary cues for way-finding (or way-showing)
and situation awareness, including both static contents and dynamic streams.
Other researchers have concentrated on easing interaction between users of wearable computers and the devices themselves. Motivated by the laborious process of
navigating through series of menus on small screens, menu items can be presented
via spatiotemporally multiplexed speech enhanced with spatial audio (Ikei et al. 2006),
achieving almost perfect recognition when four items are displayed at 60° intervals
in front of the user.

13.2.2 Assistive Technology for Visually Impaired
Another rich field in binaural wearable computing is in assistive technologies for
the visually impaired (Edwards 2011, Katz et al. 2012). Mapping camera-captured
optical frequencies to speaker-displayed audible tones (Foner 1997) induces synesthetic experience, allowing users to hear what they (cannot) see. Note that the
idea of synesthesia has been generalized here, beyond its original sense of a subject experiencing directly apprehended stimulation in another sensory modality, to
apply to sensation invoked by the display of a mediating system mapping stimuli
cross-modally (White and Feiner 2011). For instance, the Eyeborg augmentation
device maps colors into sounds, even for the colorblind (Harbisson 2008). Electronic
travel aids (ETAs) are tools for the blind with which obstacles and paths are displayed haptically or acoustically (Terven et al. 2014). One representative system
(Bujacz et al. 2012) uses personalized head-related impulse responses (HRIRs) for
azimuth and elevation cues, presenting distance cues as an inverse function of pitch
and amplitude and letting sound duration represent the size of objects. Such electronically assistive tools are still controversial in the blind community, as some find
these systems piteous or unnecessary, especially compared to traditional white cane

or guide dog approaches. Indeed, many blind users of such systems regard the resultant experience as more akin to sensory amputation than to sensory augmentation.
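A sketch of the kind of distance-and-size mapping described for such ETAs might look as follows; the constants are illustrative assumptions, not the parameters of Bujacz et al.'s system.

```python
def obstacle_cue(distance_m, size_m):
    """Map an obstacle to (frequency Hz, amplitude 0..1, duration s).

    Nearer obstacles sound higher and louder; bigger obstacles sound longer.
    Constants are illustrative only.
    """
    d = max(distance_m, 0.5)                  # avoid blowing up at zero range
    frequency = 220.0 + 2000.0 / d            # pitch as an inverse function of distance
    amplitude = min(1.0, 1.0 / d)             # amplitude likewise
    duration = min(1.0, 0.1 + 0.1 * size_m)   # duration encodes object size
    return frequency, amplitude, duration

print(obstacle_cue(1.0, 0.5))   # close, small: high, loud, short
print(obstacle_cue(8.0, 2.0))   # far, large: lower, quieter, longer
```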

13.2.3 Synesthetic Telepresence
Mixed reality audio interfaces are not limited to first-person experiences: second-person teleoperated vehicles can exploit AAR displays. For instance, a remotely controlled telerobot (Tachi 2009) or UAV (unmanned aerial vehicle, or drone) exploring
a nuclear power plant might be equipped with sensors such as directional dosimeters
(Geiger counters) as well as binocular cameras and binaural microphones, sharing
its egocentric perspective for telepresent experience. Overlaid on the drone's naturally captured soundscape, an auditory rendering of radioactive hot-spots could
be delivered to human pilots, directionalized audio sources aligned with the actual
hazardous environment. Such multimodal displays support what could be called
synesthetic telepresence, since a one-to-one, unimodal mapping of sensor data to
displayed media is not compulsory: mediation of experience can substitute or include
cross-modal stimulation. The most appropriate telepresence mapping might rely on
crossing modal boundaries, so that, for instance, important data might be sonified
as well as visualized.

13.2.4Security and Scene Analysis


Although spatial audio is usually thought of as a display technique, its principles
can be applied to sound capture and auditory scene analysis (Bregman 1990).
Human ability to localize auditory events can be augmented though hookups from
expanded microphone arrays, applications of which date back more than a century
(Scientific American 1880). Modern wearable spatial audio technologies have made
a fruitful contribution to gunshot position detection. Individual Gunshot Detector
technologies are passive acoustic systems that detect and localize sources of small-arms fire. They are tuned to detect the crack-bang of a shot, recognizing audio
signatures comprising the muzzle blast from rifle fire and the shockwave of a bullet
while screening out other acoustic events. Such detection systems use several microphones to capture a supersonic gunshot, then analyze the recordings to determine
bullet trajectory, speed, and caliber (Duckworth et al. 2001), taking into account
models of ballistics and acoustic radiation for calculating projectile deceleration.
Examples include the Soldier Wearable Shooter Detection System, developed by
Raytheon BBN Technologies, and the Shoulder-Worn Acoustic Targeting System,
developed by QinetiQ North America. These systems display estimated position of
a sniper aurally as well as visually: incoming shot announcements are transmitted to
an earpiece while a wrist display provides range, azimuth, and elevation coordinates
of the shooter's position. GPS-enabled devices can coordinate operations between
multiple units.
Another example of spatial sound capture in the field is the use of portable microphone arrays for reconnaissance and surveillance. Blind signal or source separation (BSS) is the separation of a set of source signals from a set of mixed signals.
A biologically inspired technique processes portable microphone array signals to
extract acoustic features such as spectral content, interaural time delays or differences (ITDs) and interaural intensity differences (IIDs), and pitch periodicity, which
are used to identify and locate acoustic sources in the environment, as many mammals do (Handel 1989, Deligeorges etal. 2009). BSS is the parsing of admixtures,
that is, the analysis of a soundscape. Beyond security, such a system could be used
for characterizing animal behavior, acoustic data logging, and underwater acoustic
monitoring, to mention a few applications.
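As a simplified illustration of two of these features, the sketch below estimates an interaural time difference by cross-correlating a stereo frame and an interaural intensity difference from the channels' RMS level ratio; ordinary NumPy signal processing stands in here for the biologically inspired front end cited above, and the names and defaults are assumptions.

import numpy as np

def itd_iid(left, right, sample_rate=44100, max_lag_s=0.001):
    """Estimate ITD (seconds) and IID (dB) from one stereo frame.

    The lag maximizing the cross-correlation within +/- max_lag_s gives the
    ITD estimate; the RMS level ratio gives the IID. This is an illustrative
    stand-in for the biomimetic feature extraction described in the text.
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    corr = np.correlate(left, right, mode="full")
    lags = np.arange(-len(right) + 1, len(left))
    max_lag = int(max_lag_s * sample_rate)
    window = (lags >= -max_lag) & (lags <= max_lag)
    best_lag = lags[window][np.argmax(corr[window])]
    itd = best_lag / sample_rate
    rms = lambda x: np.sqrt(np.mean(x ** 2) + 1e-12)
    iid_db = 20.0 * np.log10(rms(left) / rms(right))
    return itd, iid_db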

13.2.5 Motion Coaching via Sonification


Sonification is the rendering of non-auditory data as sound. Audio infoviz (information visualization) can be applied, for example, to teaching dance (Grosshauser
etal. 2012), employing sonifications somewhat like the squeaky shoes that delight
toddlers by chorusing their footfalls. In such systems, sensors (e.g., gyroscopes,
accelerometers, and pressure sensors) are worn or used by a dancer to sonify
performance. A teacher can detect problems in a performance in a way that might
be difficult without auditory feedback. Such monitoring could work in combination
with audio players that alter music to reflect motion of rehabilitation patients learning, for example, how to walk with a prosthetic limb. Sonification displays benefit
from spatialization: an event can trigger positional change in music being played
(e.g., left limping triggers an increase of music level on that channel). The game
industry is interested in multimodal systems enriching impressions of motion and
inducing motion changes via synchronous sound effects (Akiyama etal. 2012) and
using vibration to reinforce auditory events (Chi etal. 2008).
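A toy version of the limping example above can be written in a few lines: foot-pressure readings from worn sensors choose which stereo music channel to boost, so the weaker side becomes audible in the feedback. The asymmetry test and gain values are illustrative assumptions, not taken from any cited system.

def limp_feedback_gains(left_pressure, right_pressure, base_gain=0.7, boost=0.3):
    """Map per-foot peak pressures to stereo music gains: the side bearing
    less weight is treated as the limping side and its channel level is
    raised, echoing the example in the text."""
    gains = {"left": base_gain, "right": base_gain}
    if left_pressure < right_pressure:
        gains["left"] = min(base_gain + boost, 1.0)
    elif right_pressure < left_pressure:
        gains["right"] = min(base_gain + boost, 1.0)
    return gains

# Example: a lighter left footfall boosts the left music channel.
print(limp_feedback_gains(left_pressure=180.0, right_pressure=420.0))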

13.2.6 Situated Games
Location-aware games can use spatial audio to increase engagement of players
(Paterson etal. 2011). Situated or location-based games (Gaye etal. 2003, Magerkurth
et al. 2005) are a fertile domain for AAR (Cater et al. 2007) and sound gardens, including cross-modal applications such as those mentioned earlier. decibel 151 (Stewart
etal. 2008, Magas etal. 2009) was an art installation and music interface that used
spatial audio technology and ideas of social networking to turn individuals into
walking soundtracks as they moved around each other in a shared real space and
listened to each other in a shared virtual space. Platforms to build such kinds of interaction are being developed, such as MARA (Mobile Augmented Reality Audio), a
system that allows playback and recording of binaural sounds while tracking user position
(Peltola etal. 2009).

13.2.7 Entertainment and Spatial Music


Spatial sound can be used to enhance musical expression and experience. Some
speciality publishers include spatial music in their catalog, but spatial music is still an
underexploited capability of distributed media in general, including popular encodings such as MP3 and AAC (Quackenbush and Herre 2005, Breebaart and Faller
2007, Breebaart et al. 2008). Radio dramas such as the enactment of Stephen King's
Mist (King 2007) are enlivened by binaural effects. Interactive music browsing can
leverage cyberspatial capability, and hypermedia-encoded musical experiences, such
as that suggested by the IEEE 1599 Music Encoding and Interaction standard (Baggi
and Haus 2009), represent inviting opportunities for spatial sound, including AAR.
Audio mixing techniques can be deployed in user interfaces for acoustic diffusion
(Gibson 2005). Virtual concerts can be optionally presented with perspective, so that
listening position coincides with virtual camera position. Head-tracking systems can
anchor such soundscapes so that they remain fixed in a user's environment, such as
relative to a television (Algazi et al. 2005, Algazi and Duda 2011).

13.3 ANYWARE AND AWAREWARE


Applications featuring AAR are not mutually exclusive. Multitasking presents an
over-tempting invitation to abuse, or at least a potential for sensory overload. As
dilating network bandwidth allows increasingly polyphonic soundscapes, augmented
audio listeners might be overwhelmed with stimuli, as suggested in Figure 13.4.
Interface paradigms such as audio windowing (Cohen and Ludwig 1991a), narrowcasting (Alam et al. 2009), and panoramic or practically panoramic displays (Cohen
and Győrbíró 2008) can soften such information overload.

13.3.1 Audio Windowing
Audio windowing, in analogy to graphical windowing user interfaces, treats soundscapes as articulated elements in a composite display (Begault 1994). In graphical user interfaces (GUIs), application windows can be rearranged on a desktop,
minimized, maximized, and reordered. Audio windowing similarly allows such
configuration of spatial soundscapes. Soundscapes, analogous to layers in graphical applications, can be combined simply by summing, although in practice some
scaling (amplification and attenuation), normalization, equalization, or other conditioning might yield more articulate results. For instance, interior soundscapes might
have reverberation applied, to better distinguish them from outdoor scenes. More
significantly, to make a composited soundscape manageable, some sources might be
muzzled or muted and some sinks might be narrowcastingly muffled or deafened.
(As defined in the last chapter, a sink is the dual of a source, used instead of listener
to distinguish it from an actual human, including allowing designation of multiple
sinks for a single user, as explained later in this chapter.)
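A bare-bones sketch of this compositing step, assuming each soundscape layer has already been rendered to an equal-length buffer: layers are scaled and summed, with a simple peak check standing in for the equalization, reverberation, and narrowcasting-based muting discussed above.

import numpy as np

def composite_soundscapes(layers, gains=None):
    """Combine rendered soundscape buffers (equal-length arrays) by scaled
    summation, then rein in the peak to avoid clipping. Per-layer
    conditioning (equalization, reverberation, muting of narrowcast
    sources) would be applied to each buffer before this step."""
    layers = [np.asarray(x, dtype=float) for x in layers]
    if gains is None:
        gains = [1.0] * len(layers)
    mix = sum(g * x for g, x in zip(gains, layers))
    peak = np.max(np.abs(mix))
    return mix / peak if peak > 1.0 else mix

# Example: an indoor scene attenuated behind a louder outdoor scene.
outdoor = np.random.uniform(-0.5, 0.5, 44100)
indoor = np.random.uniform(-0.5, 0.5, 44100)
mixed = composite_soundscapes([outdoor, indoor], gains=[1.0, 0.4])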

13.3.2 Narrowcasting
As mentioned in the last chapter, a panoramic potentiometer can control the balance
of a channel in a conventional (leftright stereo) mix. By using an audio windowing
system as a mixing board, a multidimensional pan pot, users and applications can set
parameters corresponding to source and sink positions to realize a distributed sound
diffuser. Narrowcasting (Cohen and Ludwig 1991b, Cohen 2000, Fernando et al.
2009), by way of analogy with broad-, uni-, any-, and multicasting, is an idiom
for limiting media streams, formalized by the expressions shown in Figure 13.5, to
distribute, ration, and control privacy, attention, and presence. Anyware models are
separate but combinable scenes, allowing a user to have selective attendance across
multiple spaces. Advanced floor control for chat spaces and conferences is outlined
in Table 13.1.

FIGURE 13.4 Surroundsound: soundscape overload. (Copyright 2015, The New Yorker
Collection from cartoonbank.com. All rights reserved.)
Privacy has two interpretations, the first association being that of avoiding leaks
of confidential information, protecting secrets. But a second interpretation means
freedom from disturbance, in the sense of not being bothered by irrelevance or interruption. Narrowcasting operations manage privacy in both senses, filtering duplex
information flow through an articulated conferencing model. Sources and sinks
are symmetric duals in virtual spaces, respectively representing sound emitters


The general expression of inclusive selection is

    active(x) = ¬exclude(x) ∧ (∃y include(y) ⇒ include(x)).

So, for mute and solo (or select), the source relation is

    active(source_x) = ¬mute(source_x) ∧ (∃y solo(source_y) ⇒ solo(source_x)),

mute explicitly turning off a source, and solo disabling the collocated complement of the selection
(in the spirit of "anything not mandatory is forbidden"). For deafen and attend, the sink relation is

    active(sink_x) = ¬deafen(sink_x) ∧ (∃y attend(sink_y) ⇒ attend(sink_x)).

FIGURE 13.5 Simplified formalization of narrowcasting and selection functions in
predicate calculus notation, where ¬ means not, ∧ means conjunction (logical and),
∃ means there exists, and ⇒ means implies. The duality between source and sink operations is
strong, and the semantics are analogous: an object is inclusively enabled by default unless (a) it
is explicitly excluded with mute (for sources) or deafen (for sinks) or (b) peers are explicitly
included with solo or select (for sources) or attend (for sinks) when the respective object is not.

and collectors. A human user might be represented by both a source and a sink in
a groupware environment, or perhaps by multiple instances of such delegates. In
groupware environments, both one's own and others' sources and sinks are adjusted
for privacy. Audibility of a soundscape is controlled by embedded sinks. Sources
can be explicitly turned off by muting or implicitly ignored by selecting some
others. Similarly, sinks can be explicitly deafened or implicitly desensitized if other
sinks are attended. Modulation of source exposure or sink attention (Benford etal.
1995, Greenhalgh and Benford 1995) need not be all or nothing: nimbus and focus
can be, respectively, partially softened with muzzling and muffling (Cohen 1993).
Narrowcasting attributes can be crossed with spatialization and used for polite calling or awareware, reflecting a sensitivity to one's availability, like the online/
offline status switch of a conferencing service.
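The inclusive-selection relations of Figure 13.5 translate directly into code. The sketch below evaluates them over a dictionary of per-object attributes; the data layout is an illustrative assumption, but the logic follows the figure: an object is active unless explicitly excluded, or implicitly deactivated because a peer is soloed (for sources) or attended (for sinks) while it is not.

def source_active(name, sources):
    """Narrowcasting relation for sources (Figure 13.5): active unless muted,
    or some peer is soloed while this source is not. `sources` maps names to
    attribute dicts such as {"mute": True} or {"solo": True}."""
    attrs = sources[name]
    if attrs.get("mute", False):
        return False
    any_solo = any(a.get("solo", False) for a in sources.values())
    return (not any_solo) or attrs.get("solo", False)

def sink_active(name, sinks):
    """Dual relation for sinks, with deafen and attend replacing mute and solo."""
    attrs = sinks[name]
    if attrs.get("deafen", False):
        return False
    any_attend = any(a.get("attend", False) for a in sinks.values())
    return (not any_attend) or attrs.get("attend", False)

# Soloing one source implicitly deactivates its unsoloed peers:
sources = {"alice": {"solo": True}, "bob": {}, "carol": {"mute": True}}
assert source_active("alice", sources) and not source_active("bob", sources)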

13.3.3 Multipresence
Increasingly fine-grained networked interpersonal communication, from journaling
through microblogging to life-streaming and tweets, suggests the desirability of real-time communication via persistent channels for media streams. Multitasking
users want to have presence in several locations at once. Enriched user interfaces,
especially coupled with position tracking systems, encourage multipresence, the
inhabiting by sources and sinks of multiple spaces simultaneously, allowing a user to
monitor and inhabit many spaces (Cohen 1998). Multipresence is an interface strategy for managing attention and exposure, allowing a single human user to designate
doppelgänger delegates in distributed domains. Being anywhere is better than being
everywhere, since it is selective; multipresence is distilled ubiquity, narrowcasting-enabled audition (for sinks) or address (for sources) of multiple objects of regard.
Display technology can enable such augmented telepresence for spoken telecommunication (Martens and Yoshida 2000).


TABLE 13.1
Narrowcasting for Source Output and Sink Input: Sources and Sinks Are Symmetric
Duals in Virtual Spaces, Respectively Representing Sound Emitters and Collectors

                        Source                          Sink
Function                Radiation                       Reception
Level                   Amplification, attenuation      Sensitivity
Direction               Output (display)                Input (control)
Presence                Nimbus (projection, exposure)   Focus (attention)
Locus
  Instance              Speaker                         Listener
  Transducer            Loudspeaker                     Microphone or dummy-head
  Organ                 Mouth                           Ear
Express                 Megaphone                       Ear trumpet
Include                 Solo (select)                   Attend
Suppress                Muzzle                          Muffle
Exclude                 Mute                            Deafen
Own (reflexive)         (Thumb up)                      (Thumbs down)
Other (transitive)      (Thumb down)                    (Thumbs up)


Independence of location and orientation can be exploited to flatter multipresent auditory localization. An advantage of separating translation and rotation is
that directionalizability can be preserved even across multiple frames of reference.
Such distributed presence can be coupled with a motion platform (like that shown
in Figure 12.13), a vehicle, or position tracking. Moving can twist (but deliberately
not shift) multiple sinks, maintaining consistent proprioceptive sensation. Relaxedly
shared position data can be filtered to adjust objects only relatively, for instance,
using angular displacement instead of absolute azimuth. A technique for integration
(resolving the apparent paradoxes) of such multipresence is explained in Cohen and
Fernando (2009).
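One way to realize the twist-but-not-shift behavior described above is to propagate only the user's angular displacement to each delegated sink, leaving each sink's position in its own scene untouched; the sketch below assumes a simple dictionary representation for sinks (an illustration, not an interface from the cited work).

def twist_multipresent_sinks(sinks, yaw_delta_deg):
    """Apply the same yaw displacement to every delegated sink while leaving
    positions unchanged, so directionalization stays consistent across the
    multiple frames of reference a multipresent user inhabits."""
    for sink in sinks:
        sink["yaw_deg"] = (sink["yaw_deg"] + yaw_delta_deg) % 360.0
    return sinks

# Turning one's head 30 degrees rotates all inhabited soundscapes alike.
sinks = [{"yaw_deg": 0.0, "position": (0.0, 0.0)},
         {"yaw_deg": 90.0, "position": (5.0, 2.0)}]
twist_multipresent_sinks(sinks, 30.0)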

13.3.4 Layered Soundscapes


VR interfaces feature self-referenced displays including immersive, first-person
perspectives and tethered, second-person perspectives, as well as third-person perspectives (such as a map). These various perspectives are characterized by the relationship of the position of individual users to their display. Visual interfaces are
copotentiated by AAR interfaces, particularly those featuring position-aware (and
not just location-aware) spatial sound. Such systems can display direction and metaphorical distance, modulated and composed with audio windowing, narrowcasting,
and multipresence.
A use-case scenario of an errand-running pedestrian illustrates these ideas. The
subject dons an AAR display, wearware, perhaps headphones or nearphones attached
to eyewear, or enjoys some blended combination of personal and public display, ubicomp everyware. He forks his presence, inhabits multiple scenes simultaneously, anyware, with automatic prioritization of the soundscape layers, awareware. Following a
hierarchical taxonomy for visual virtual perspectives (Cohen 1998), these layers can
be sorted into proximal, medial, and distal classes, described here in decreasing order
of intimacy and self-identification, corresponding to the same utility-professional-recreational
classifications used to order applications in the previous section.
A first-person, utility soundscape: A proximal reality browser layer comprises
way-showing, navigation cues: directionalized sounds signifying north,
home or origin, intermediate checkpoints, destinations, etc. Populating this
layer are auditory icons, recorded everyday sounds used with such displays,
and earcons, more abstract sounds signifying scene objects. For instance,
the traveler might hear a snippet of a recording of a relative's voice from
home, or a tone from a milestone marker. Except for nonliteral distance
effects, the soundscape corresponds to what is sometimes imprecisely
called a PoV (point-of-view) visual perspective, tightly bound to the position
of the listeners head. The rendering is responsive to translation (location)
as well as rotation (orientation). This intimate perspective can be described
as endocentric, centered inside a subject.
A second-person, professional soundscape: At the same time, the user monitors intermittent voice chat among his colleagues, whose respective channels are directionalized to suggest arrangement of their desks at the office.
Relaxing the tight correspondence of a first-person, endocentric soundscape
allows a displaced or telepresent experience, analogous to that often used in
allows a displaced or telepresent experience, analogous to that often used in
games in which players may view their own avatar from over-the-shoulder
or behind-the-back. Such medial rendering is sensitive to orientation but
not location: as a subject physically moves about, the soundscape twists
but does not shift. Surrounding voices neither recede nor approach; there
is no parallax. This perspective can be described as egocentric, centered
on (but not within) each subject, a kind of metaphorical tethering. Like a
first-person soundscape, it is idiocentric, centered upon oneself. Respective
sinks are still self-identifiable and personal, but the sensation is more like
an out-of-body experience.
A third-person, recreational soundscape: Our subject also enjoys music
rendered with respect to his head, such as a conventional stereo mix.
This distal perspective, perhaps shared with others, can be described as
exocentric, centered outside a subject, as it is oblivious or indifferent to the
position of the listener. Unlike first- and second-person aural perspectives,
it is allocentric, egalitarian, and nonindividualized. An entire audience
might share a sweet spot.
These layers are combined, basically by adding them, since their perceptual spaces are
coextensive, overlaid on the actual environment of the user. Such a composite soundscape is overcrowded, but narrowcasting provides a user interface strategy for data
reduction, selectively managing and culling hypermedia. Within each soundscape,
sources are individually controllable, either explicitly (via mute) or implicitly (by
selecting some others). Narrowcasting sinks fuse multiple layers while preserving their
individual activatability (via deafen and attend), like a visible/invisible toggle switch in
a graphical application or an "ignore sender or thread" function in e-mail and web-forum browsers. Sinks are designated monitors of respective composited soundscapes;
sources are sound effects, distributed voices, and music populating those layers.
The anyware of such an articulated soundscape can be extended by sensitivity
of each layer to activity in other layers, awareware. An incoming call might automatically muzzle or attenuate sources in the music layer by deafening or muffling
its associated sink. Focusing on an office conference (by attending its sink) could
stifle other soundscapes. Awareware exemplifies intelligent interfaces that maintain
a model of user state, adjusting multimodal input and output (I/O) to accommodate
context and circumstances besides position.
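The differing pose sensitivity of the three layer classes can be summarized in one function: a proximal source is directionalized from the listener's location and orientation, a medial source twists with orientation only, and a distal source keeps its fixed, head-locked mix position. The data layout below is an illustrative assumption.

import math

def render_azimuth(layer, source_xy, source_az_deg, listener_xy, listener_yaw_deg):
    """Return the head-relative azimuth (degrees) for one source according
    to its layer class: proximal layers respond to listener translation and
    rotation, medial layers to rotation only (the soundscape twists but
    does not shift), and distal layers ignore the listener's pose."""
    if layer == "proximal":
        dx = source_xy[0] - listener_xy[0]
        dy = source_xy[1] - listener_xy[1]
        world_az = math.degrees(math.atan2(dy, dx))
        return (world_az - listener_yaw_deg) % 360.0
    if layer == "medial":
        return (source_az_deg - listener_yaw_deg) % 360.0
    return source_az_deg % 360.0  # distal: conventional, head-locked mix

# A navigation earcon (proximal) versus a colleague's voice channel (medial):
render_azimuth("proximal", (10.0, 0.0), 0.0, (0.0, 0.0), 45.0)
render_azimuth("medial", None, 120.0, None, 45.0)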

13.4 CHALLENGES
Personal networked spatial audio technologies inherit the challenges of both wearable computing and artificial spatialization (Kyriakakis 1998). While miniaturization and accuracy of sensors and actuators improve rapidly, extension of battery
capacity is far slower. Longevity solutions are being pursued, and techniques are
emerging in the realm of energy generators such as those collecting solar power,
harvesting human body heat (Leonov and Vullers 2009), or transducing limb and
finger movements, blood pressure, or other biological energy sources (Starner and
Paradiso 2004). Besides such low-level concerns, remaining issues include individualizability of earprints, dynamic soundscape display, and standardization of scene
specification, respectively considered in the following sections.

13.4.1 Capture and Synthesis


Generic earprints usually produce auditory images located inside the head and deviations in elevation that depend upon listener pinnae size (Martens 2003). Alternatives
intended to improve externalizability of virtual sound sources include acquisition of
personalized HRIRs, adaptation of HRIRs entries from a database to match a given
users anthropometrics, and HRIR synthesis (Rund and Saturka 2012).
Capture of individual HRIRs is tedious and expensive. The process involves, for
example, bringing people to an anechoic chamber where each subject sits in a rotary
chair controlled by a computer that synchronizes emission of a reference signal and
its recording from binaural microphones located at the ears, repeating the operation
for many angles, typically at 5° intervals (about five times the minimum audible
angle for low-frequency frontal sources). Once recorded, raw measurements must
be trimmed and equalized (e.g., for diffuse- or free-field reproduction). The entire
process can take several hours for each subject, although more rapid solutions have
been proposed (Zotkin etal. 2002).
Capture of HRIRs is usually at a fixed distance from the subject's center of the
head, so all synthetic sounds created with them are rendered on the surface of this
sphere. Simulating distance changes can be done by modulating intensity for the far-field, but for the near-field this mechanism is ineffective (Brungart 2002, Spagnol
and Avanzini 2009). We addressed this issue by creating an object capable of real-time HRIR filtering with range control into the near-field, between 20 and 160 cm
(Villegas and Cohen 2013).
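For the far-field case mentioned above, intensity modulation reduces to a gain relative to the radius at which the HRIRs were measured; the sketch below applies the usual inverse-distance law beyond that radius and merely clamps inside it, where proper near-field (range-dependent) filtering is needed instead. The radius value is an assumed example.

def far_field_gain(distance_m, measurement_radius_m=1.5, min_distance_m=0.2):
    """Distance gain for a source rendered with HRIRs captured at a fixed
    radius: beyond that radius, apply inverse-distance attenuation; inside
    it, simple intensity scaling is a poor model (hence the range-controlled
    near-field filtering discussed in the text), so the gain is just clamped
    here as a placeholder."""
    d = max(distance_m, min_distance_m)
    if d >= measurement_radius_m:
        return measurement_radius_m / d
    return 1.0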
Synthesizing HRIRs can be achieved by sampling a person's upper body, constructing a 3D model to estimate each HRIR by boundary element methods
(Gumerov et al. 2010, Geronazzo et al. 2011, Spagnol et al. 2013). This technique
needs an accurate surface mesh model of a subject's features to give acceptable
results. Microsoft is experimenting with a procedure that uses its Xbox Kinect, a
set-top console peripheral featuring near-infrared depth cameras and software to
capture actors, to scan each subject's head and torso, matching measured anthropometric parameters against those of other training subjects for whom HRTFs had
been previously captured (Bilinski et al. 2014). Similar to the provision in public offices
of reading glasses with different diopter corrections, one could choose among several HRIR sets, selecting a best-fitting dataset (Algazi et al. 2001). One might also
interpolate between multiple sets to find a good match for ones head, pinnae, and
torso characteristics (Rothbucher etal. 2010).

13.4.2 Performance
Audio spatialization brings its own challenges, such as polyphony (i.e., directionalizing multiple sound sources simultaneously) and, relatedly, intelligibility of multiple
speech sources. Other issues include sensitivity to noisy conditions, and difficulty in
distinguishing sounds coming from the back or front of the listener. System issues
include the need for low latency for richly populated polyphonic soundscapes and
seamless integration of streamed network communication and signal processing.
Distance and movement cues are difficult to express with HRIRs. Humans generally perform poorly at tasks involving localization of moving sources. The minimum audible movement angle ranges from about 8° for slow sources to about 21°
for sources moving at one revolution per second (Perrott and Musicant 1977). This
inability to follow moving sources (binaural sluggishness) is also evident in virtual
scenes where binaural cues need to be approximated at sometimes insufficient rates
and where other cues such as Doppler shift are not readily available (Pörschmann
and Störig 2009). Compromises between computational complexity and realism of
movement illusion have been reached, but there is still no clear way to definitively
solve this problem.

13.4.3 Authoring Standards
Authoring contents for such AR applications can be difficult, and a broadly accepted
standard has yet to emerge (Perey etal. 2011). Spatial Sound Description Interchange
Format (SpatDIF) is a format that describes spatial sound information in a structured way to support real-time and non-real-time applications (Peters etal. 2012).
The Web3D Consortium is working on extending its XML-based X3D (Brutzman
and Daly 2007) to support AR applications. Augmented Reality Markup Language
(ARML) (MacIntyre et al. 2013) is also an XML grammar that lets developers
describe an augmented scene, its objects, and some behavior. A method has been
proposed to model such scenes with a two-stage process (Lemordant and Lasorsa
2010), analogous to rendering documents using HTML and CSS languages.
The challenge of delivering satisfactory binaural experience is reflected in projects such as BiLi (Binaural Listening), a collaborative research project being conducted at IRCAM, focused on assessment of the quality of the experience made
possible by binaural listening, the research and development of solutions for individualizing the listening while avoiding tedious measurements in an anechoic chamber,
and the definition of a format for sharing binaural data in anticipation of an international standard. With initiatives such as SOFA (Spatially Oriented Format for
Acoustics), the problems of storing and sharing HRIRs and BRIRs (binaural room
impulse responses) can hopefully be ameliorated (Majdak etal. 2013).
In channel-based systems, display configuration is predetermined, and channels are rigidly persistent, but object-based models feature transient sounds
tagged with position metadata, rendered at runtime for both flexible speaker
arrangement and interactivity. MPEG-H Part 3 is a compression standard for 3D
audio that can support many loudspeakers. Intended for display of prerendered
contents, it uses a channel-based model, and therefore has a separate focus from
MDA, Multi-Dimensional Audio, an object-oriented representation. Unifying
these paradigms is Tech 3364, the European Broadcasting Union (EBU) Audio Data/
Definition Model (ADM), which integrates audio streams and metadata, including periphonic, binaural, channel-, scene-, and object-based arrangements, hoping for future-proofing by extensibility. The file format is agnostic regarding the
audio assets' payload, and is self-declaratory so that intelligent decoding/replay
will be able to perform optimal rendering of content.
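The object-based model described above can be pictured as an asset plus position metadata that the renderer interprets at playback time. The record below is a minimal illustration; the field names are assumptions and do not reproduce the MPEG-H, MDA, or ADM schemas.

from dataclasses import dataclass, field

@dataclass
class AudioObject:
    """One object-based audio element: the asset is stored once and its
    position metadata is rendered at runtime for whatever loudspeaker
    layout or binaural display is present."""
    asset: str                    # reference to the audio payload
    start_s: float                # onset time within the programme
    azimuth_deg: float = 0.0      # scene-relative position metadata
    elevation_deg: float = 0.0
    distance_m: float = 1.0
    interactive: bool = False     # may be repositioned or muted by the listener
    extra: dict = field(default_factory=dict)  # extensible metadata

scene = [
    AudioObject("dialog_01.wav", start_s=0.0),
    AudioObject("ambience.wav", start_s=0.0, azimuth_deg=180.0, interactive=True),
]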

13.5 CONCLUDING REMARKS


Recapitulating the themes of this chapter and the preceding one, whereware RTLS
(real-time location- [and position-] based services) can parameterize cyberspatial
sound displayed by mobile ambient transmedial blending of personal wearware and
public resources, everyware ubicomp interfaces. Audio windowing can dynamically
adjust composite soundscapes, modulated for attention and exposure by anyware and
awareware narrowcasting interfaces. To control attention (focus), selected sources
can be muted or selected to cull parts of a soundscape, or one's own sinks deafened
or attended to ignore entire scenes; to control exposure (nimbus), one's sources can be
muted or selected or others' sinks deafened or attended for selective privacy.
Applications of AAR span utility, work, and leisure. Navigation interfaces feature
way-finding and -showing and increase situation awareness. Users altering source
and sink parameters can experience the sensation of wandering among conferees in
a conference room. Multipresent concert goers can actively focus on particular channels by sonically hovering over musicians in a virtual recital hall, breaking down
the conventional fourth wall separating performers from audience. Entities in cyberspace and augmented displays may be imbued with auditory qualities as flexibly as
they are assigned visual attributes. Sound presented in such dimensional fashion is
as different from conventional mixes as sculpture is from painting.
Binaural accessories will emerge featuring hearing protection, hearing aid functions, noise reduction, speech enhancement, and spatial filtering functions, emphasizing important sound signals such as warnings and alarms. Anticipations of
specific commercial offerings quickly get stale in printed media such as this book,
but as we go to press some emerging products deserve mention. The Intelligent
Headset features GPS, a gyroscope, magnetometer, accelerometer, and dynamic
binaural audio. Presumably, Google Glass-like eyewear and VR-style HMDs such as
the Sony PlayStation 4 Morpheus and Facebook Oculus Rift will also incorporate
binaural AAR displays, since native tracking used for stabilizing visual scenes can
be applied to soundscapes as well. (Until HMDs become lighter weight, however, the
VR salute, using a hand to grab and steady one's headwear, will probably persist.) In
the medium future, scanning tools like Google Project Tango using machine vision
as well as indoor localization tools (Ficco etal. 2014) like indoor GPS and the Apple
iBeacon will allow modeling of real spaces and tracking within them.
It is ironic that the participle wired, formerly flattering, is now somewhat pejorative, connoting a cumbersome leash. The emergence of wireless form factors
such as mobile networked computing, handheld/nomadic/portable interfaces, global
roaming, software-defined radio, and wearable multimedia computing interfaces
offer opportunities for innovative design and advanced applications, both creative
and re-creative (Streicher and Everest 2006) and unplug and play. Literally sensational spatial sound is around and upon us, to simulate and stimulate.
Tracking sensors, motion capture, and gestural recognition can be used to nudge
computer interfaces toward interaction styles described variously as ambient,
embedded, pervasive, reality-based, ubicomp or calm (Weiser 1991, Weiser and
Brown 1997), tangible (Ishii and Ullmer 1997), tacit (Pedersen et al. 2000), post-WIMP (van Dam 2005), and organic (Rekimoto 2008). Natural interfaces without
mouse or keyboard (Cooperstock et al. 1997), the disappearing or invisible computer
(Streitz etal. 2007), are more convenient and intuitive than traditional devices and
enable new computing environments, including telerobotics, multimedia information furniture, and spatially immersive displays.
Science fiction is a fertile asymptote for engineers (Shedroff and Noessel 2012).
The future never arrives but beckons us toward it. We are still a long way from the
Star Trek holodeck or telepathic interfaces, let alone the so-called singularity when
machine intelligence overcomes that of humans, but fantastic developments await.
As tools such as hearing aids and eyeglasses become prostheses, augmented humans
increasingly become cyborgs (cybernetic organisms), integrating organic and biomechatronic systems. Bionics is the use of devices (Sarpeshkar 2006), such as cochlear
implants* implanted in the nervous system to help the deaf, blind, and paralyzed, and
such technology will increasingly be applied to normally abled as well. Dry and wet
not only metaphorically describe reverberation, but also literally describe silicon- and
carbon-based intelligence. Mind-machine interfaces will hasten the seamlessness
and disintermediation of our connectivity. Hive-like networked sensors will make
hearing more collective. Genetic engineering, telerobotics, and nanotechnology will
enhance audition: perhaps humans can be crafted to have superpinnae (Durlach etal.
1993), or more than two ears (Schnupp and Carr 2009), or ultrasonic sensitivity. Let's
see, and let's hear!

REFERENCES
Akiyama, S., K. Sato, Y. Makino, and T. Maeno (2012, November). Effect on enriching
impressions of motions and physically changing motions via synchronous sound effects.
In Proceedings of the Joint International Conference on Soft Computing and Intelligent
Systems (SCIS) and International Symposium on Advanced Intelligent Systems (ISIS),
pp. 856860. DOI: 10.1109/SCIS-ISIS.2012.6505218.
Alam, S., M. Cohen, J. Villegas, and A. Ashir (2009). Narrowcasting in SIP: Articulated privacy control. In S. A. Ahson and M. Ilyas (eds.), SIP Handbook: Services, Technologies,
and Security of Session Initiation Protocol, chapter 14, pp. 323345. CRC Press/
Taylor& Francis. www.crcpress.com/product/isbn/9781420066036.
Algazi, V., R. Duda, D. Thompson, and C. Avendano (2001). The CIPIC HRTF database. In
Proceedings of the IEEE Workshop on the Applications of Signal Processing to Audio
and Acoustics, pp. 99102. DOI: 10.1109/ASPAA.2001.969552.
Algazi, V. R., R. J. Dalton, Jr., R. O. Duda, and D. M. Thompson (2005, October). Motiontracked binaural sound for personal music players. In AES: Audio Engineering Society
Convention (119th Convention), New York.
Algazi, V. R. and R. O. Duda (2011, January). Headphone-based spatial sound. IEEE Signal
Processing Magazine 28(1), 3342.
* Note that we have not distinguished between external assistive hearing devices (e.g., BTE, behind-the-ear devices), which are properly known as hearing aids (HA), and cochlear implants (CI), which
have an effect like internal headphones. Nor have we discussed bimodal and hybrid configurations
(e.g., HA in one ear, CI in the other). Although there are important differences between them from the
wearable computer perspective, they are similar in function and opportunities.


Baggi, D. and G. Haus (2009, March). IEEE 1599: Music encoding and interaction. Computer
42(3), 8487. DOI: 10.1109/MC.2009.85.
Bederson, B. B. (1995). Audio augmented reality: A prototype automated tour guide.
In Proceedings of the CHI: Conference on ComputerHuman Interaction. DOI:
10.1145/223355.223526.
Begault, D. R. (1994). 3-D Sound for Virtual Reality and Multimedia. Academic Press.
Benford, S., J. Bowers, L. Fahlén, C. Greenhalgh, J. Mariani, and T. Rodden (1995). Networked
virtual reality and cooperative work. Presence: Teleoperators and Virtual Environments
4(4), 364386.
Bilinski, P., J. Ahrens, M. R. P. Thomas, I. J. Tashev, and J. C. Platt (2014, May). HRTF
magnitude synthesis via sparse representation of anthropometric features. In ICASSP:
Proceedings of the International Conference on Acoustics, Speech, and Signal
Processing, Florence. http://www.icassp2014.org.
Breebaart, J., J. Engdegård, C. Falch, O. Hellmuth, J. Hilpert, A. Hoelzer, J. Koppens et al.
(2008, May). Spatial audio object coding (SAOC)The upcoming MPEG standard on
parametric object based audio coding. In AES: Audio Engineering Society Convention
(124th Convention). http://www.aes.org/e-lib/browse.cfm?elib = 14507.
Breebaart, J. and C. Faller (eds.) (2007). Spatial Audio Processing: MPEG Surround and
Other Applications. West Sussex: John Wiley & Sons, Ltd.
Bregman, A. S. (1990). Auditory Scene Analysis: The Perceptual Organization of Sound. MIT
Press.
Brungart, D. S. (2002). Near-field virtual audio displays. Presence: Teleoperators and Virtual
Environments 11(1), 93106.
Brutzman, D. and L. Daly (2007, April). X3D: Extensible 3D Graphics for Web Authors.
Morgan Kaufmann.
Bujacz, M., P. Skulimowski, and P. Strumillo (2012). Naviton: A prototype mobility
aid for auditory presentation of three-dimensional scenes to the visually impaired.
Journal of Audio Engineering Society 60(9), 696708. www.aes.org/e-lib/browse.
cfm?elib=16374.
Cater, K., R. Hull, T. Melamed, and R. Hutchings (2007, April). An investigation into the
use of spatialized sound in locative games. In Proceedings of the CHI: Conference
on ComputerHuman Interaction, San Jose, California, pp. 23152320. http://dx.doi.
org/10.1145/1240866.1241000.
Chi, D., D. Cho, S. Oh, K. Jun, Y. You, H. Lee, and M. Sung (2008, December). Sound-specific
vibration interface using digital signal processing. In Proceedings of the International
Conference on Computer Science and Software Engineering, vol. 4, pp. 114117.
Cohen, M. (1993, August). Throwing, pitching, and catching sound: Audio windowing models
and modes. IJMMS: Journal of PersonComputer Interaction 39(2), 269304.
Cohen, M. (1998). Quantity of presence: Beyond person, number, and pronouns. In T.L.Kunii
and A. Luciani (eds.), Cyberworlds, chapter 19, pp. 289308. Tokyo: Springer-Verlag.
Cohen, M. (2000, February). Exclude and include for audio sources and sinks: Analogs of
mute and solo are deafen and attend. Presence: Teleoperators and Virtual Environments
9(1), 8496. http://www.mitpressjournals.org/doi/pdf/10.1162/105474600566637.
Cohen, M. and O. N. N. Fernando (2009). Awareware: Narrowcasting attributes for selective
attention, privacy, and multipresence. In P. Markopoulos, B. de Ruyter, and W. Mackay
(eds.), Awareness Systems: Advances in Theory, Methodology and Design, Human
Computer Interaction Series, chapter 11, pp. 259289. Springer.
Cohen, M. and N. Győrbíró (2008). Personal and portable, plus practically panoramic:
Mobile and ambient display and control of virtual worlds. Innovation: The Magazine of
Research & Technology 8(3), 3335. www.innovationmagazine.com.
Cohen, M. and L. F. Ludwig (1991a, March). Multidimensional audio window management.
IJMMS: Journal of PersonComputer Interaction 34(3), 319336.


Cohen, M. and L. F. Ludwig (1991b). Multidimensional audio window management. In
S. Greenberg (ed.), Computer Supported Cooperative Work and Groupware, chapter 10,
pp. 193210. London: Academic Press.
Cooperstock, J. R., S. S. Fels, W. Buxton, and K. C. Smith (1997, September). Reactive environments: Throwing away your keyboard and mouse. Communications of the ACM
40(9), 6573.
Deligeorges, S., A. Hubbard, and D. Mountain (2009, February). Biomimetic acoustic detection and localization system. US Patent 7,495,998.
Duckworth, G., J. Barger, and D. Gilbert (2001, January). Acoustic counter-sniper system. US
Patent 6,178,141.
Durlach, N. I., B. Shinn-Cunningham, and R. M. Held (1993). Supernormal auditory localization: I. General background. Presence: Teleoperators and Virtual Environments 2(2),
89103.
Edwards, A. D. N. (2011). Auditory display in assistive technology. In T. Hermann, A.
Hunt, and J. G. Neuhoff (eds.), The Sonification Handbook, chapter 17. Berlin: Logos
Publishing House. http://sonification.de/handbook/index.php/chapters/chapter17/.
Fernando, O. N. N., M. Cohen, U. C. Dumindawardana, and M. Kawaguchi (2009). Duplex
narrowcasting operations for multipresent groupware avatars on mobile devices.
IJWMC: International Journal of Wireless and Mobile Computing 4(2), 280287. www.
inderscience.com/browse/index.php?journalID=46&year=2009&vol=4&issue=2.
Ficco, M., F. Palmieri, and A. Castiglione (2014, February). Hybrid indoor and outdoor location services for new generation mobile terminals. PUC: Personal and Ubiquitous
Computing 18(2), 271285.
Foner, L. (1997, October). Artificial synesthesia via sonification: A wearable augmented sensory system. In Proceedings of the International Symposium on Wearable Computers,
pp.156157. DOI: 10.1109/ISWC.1997.629932.
Gaye, L., R. Mazé, and L. E. Holmquist (2003). Sonic city: The urban environment as a musical interface. In NIME: Proceedings of the Conference on New Interfaces for Musical
Expression, pp. 109115.
Geronazzo, M., S. Spagnol, and F. Avanzini (2011, November). A head-related transfer
function model for real-time customized 3-d sound rendering. In Proceedings of the
International Conference on Signal Image Technology & Internet-Based Systems, Dijon,
France, pp.174179. http://doi.ieeecomputersociety.org/10.1109/SITIS.2011.21.
Gibson, D. (2005). The Art of Mixing: A Visual Guide to Recording, Engineering, and
Production (2nd ed.). Artistpro.
Greenhalgh, C. and S. Benford (1995, September). Massive: A collaborative virtual environment for teleconferencing. ACM Transactions on ComputerHuman Interaction 2(3),
239261.
Grosshauser, T., B. Bläsing, C. Spieth, and T. Hermann (2012). Wearable sensor-based real-time sonification of motion and foot pressure in dance teaching and training. Journal of
Audio Engineering Society 60(7/8), 580589.
Gumerov, N. A., A. E. O'Donovan, R. Duraiswami, and D. N. Zotkin (2010, January).
Computation of the head-related transfer function via the fast multipole accelerated
boundary element method and its spherical harmonic representation. Journal of the
Acoustical Society of America 127(1), 370386. DOI: 10.1121/1.3257598.
Handel, S. (1989). Listening: An Introduction to the Perception of Auditory Events. MIT Press.
Harbisson, N. (2008, June). Painting by ear. Modern Painters, The International Contemporary
Art Magazine, 7073.
Ikei, Y., H. Yamazaki, K. Hirota, and M. Hirose (2006, March). vCocktail: Multiplexed-voice
menu presentation method for wearable computers. In Proceedings of the Virtual Reality
Conference, pp. 183190. DOI: 10.1109/VR.2006.141.


Ishii, H. and B. Ullmer (1997, March). Tangible bits: Towards seamless interfaces between
people, bits and atoms. In Proceedings of the CHI: Conference on ComputerHuman
Interaction, pp. 234241.
Jones, M., S. Jones, G. Bradley, N. Warren, D. Bainbridge, and G. Holmes (2008). Ontrack:
Dynamically adapting music playback to support navigation. PUC: Personal and
Ubiquitous Computing 12(7), 513525. DOI: 10.1007/s00779-007-0155-2.
Katz, B. F. G., S. Kammoun, G. Parseihian, O. Gutierrez, A. Brilhault, M. Auvray, P. Truillet,
M. Denis, S. Thorpe, and C. Jouffrais (2012, November). NAVIG: Augmented reality
guidance system for the visually impaired. Virtual Reality 16(4), 253269. DOI: 10.1007/
s10055-012-0213-6.
King, S. (2007, October). The mist in 3-D sound.
Kyriakakis, C. (1998, May). Fundamental and technological limitations of immersive audio
systems. Proceedings of the IEEE 86(5), 941951. DOI: 10.1109/5.664281, http://
ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=664281.
Lemordant, J. and Y. Lasorsa (2010, May). Augmented reality audio editing. In Audio
Engineering Society Convention (128th Convention), London. www.aes.org/e-lib/
browse.cfm?elib=15439.
Leonov, V. and R. J. M. Vullers (2009). Wearable electronics self-powered by using human
body heat: The state of the art and the perspective. Journal of Renewable and Sustainable
Energy 1(6). http://scitation.aip.org/content/aip/journal/jrse/1/6/10.1063/1.3255465.
Loomis, J. M., C. Hebert, and J. G. Cicinelli (1990, October). Active localization of virtual
sounds. Journal of the Acoustical Society of America 88(4), 17571763.
MacIntyre, B., H. Rouzati, and M. Lechner (2013, May/June). Walled gardens: Apps and data as
barriers to augmented reality. IEEE Computer Graphics and Applications 33(3), 7781.
Magas, M., R. Stewart, and B. Fields (2009, August). decibel 151. In Proc. SIGGRAPH Art
Gallery, New Orleans, Louisiana. DOI: 10.1145/1667265.1667290, www.siggraph.org/
s2009/galleries_experiences/information_aesthetics/index.php.
Magerkurth, C., A. D. Cheok, R. L. Mandryk, and T. Nilsen (2005, July). Pervasive games:
Bringing computer entertainment back to the real world. Computer Entertainment 3(3),
4. DOI: 10.1145/1077246.1077257.
Majdak, P., T. Carpentier, R. Nicol, A. Roginska, Y. Suzuki, K. Watanabe, H. Wierstorf, H.
Ziegelwanger, and M. Noisternig (2013, May). Spatially oriented format for acoustics:
A data exchange format representing head-related transfer functions. In AES: Audio
Engineering Society Convention (134th Convention), Rome. Preprint #8880.
Martens, W. (2003, December). Perceptual evaluation of filters controlling source direction:
Customized and generalized HRTFs for binaural synthesis. Acoustical Science and
Technology 24(5), 220232. http://dx.doi.org/10.1250/ast.24.220.
Martens, W. L. and A. Yoshida (2000, May). Augmenting spoken telecommunication via spatial audio transformation. Journal of the 3D-Forum Society of Japan 14(4), 69175.
Paterson, N., K. Naliuka, T. Carrigy, M. Haahr, and F. Conway (2011, February). Locationaware interactive game audio. In AES: Audio Engineering Society Convention (41st
International Conference): Audio for Games. www.aes.org/e-lib/browse.cfm?elib=15769.
Pedersen, E. R., T. Sokoler, and L. Nelson (2000). Paperbuttons: Expanding a tangible user
interface. In DIS: Proceedings of the third Conference on Designing Interactive Systems,
New York, pp. 216223. ACM. DOI: 10.1145/347642.347723.
Peltola, M., T. Lokki, and L. Savioja (2009, February). Augmented reality audio for location-based games. In AES: Audio Engineering Society Convention (35th International
Conference): Audio for Games. www.aes.org/e-lib/browse.cfm?elib=15176.
Perey, C., T. Engelke, and C. Reed (2011). Current status of standards for augmented reality.
In L. Alem and W. Huang (eds.), Recent Trends of Mobile Collaborative Augmented
Reality Systems, chapter 2, pp. 2138. Springer-Verlag.


Perrott, D. R. and A. D. Musicant (1977). Minimum auditory movement angle: Binaural localization of moving sound sources. Journal of the Acoustical Society of America 62(6),
14631466. DOI: 10.1121/1.381675.
Peters, N., T. Lossius, and J. C. Schacher (2012, July). SpatDIF: Principles, specification, and
examples. In Proceedings of Sound and Music Computing Conference, Copenhagen.
Pörschmann, C. and C. Störig (2009, August). Investigations into the velocity and distance perception of moving sound sources. Acta Acustica United with Acustica 95(4), 696-706.
Quackenbush, S. and J. Herre (2005, October/December). MPEG surround. IEEE Multimedia
12(4), 18-23. http://doi.ieeecomputersociety.org/10.1109/MMUL.2005.76.
Rekimoto, J. (2008). Organic interaction technologies: From stone to skin. Communications
of the ACM 51(6), 3844.
Rothbucher, M., T. Habigt, J. Habigt, T. Riedmaier, and K. Diepold (2010, December). Measuring
anthropometric data for HRTF personalization. In SITIS: Proceedings of the International
Conference on Signal-Image Technology and Internet-Based Systems, pp.102106.
Rund, F. and F. Saturka (2012, July). Alternatives to HRTF measurement. In TSP: Proceedings
of International Conference on Telecommunications and Signal Processing, pp.648652.
Sanuki, W., J. Villegas, and M. Cohen (2014, April). Machi-beacon: An application of spatial
sound on navigation systems. In AES: Audio Engineering Society Convention (136th
International Conference), Berlin.
Sarpeshkar, R. (2006, May). Brain power: Borrowing from biology makes for low-power
computing. IEEE Spectrum 43(5), 2429. http://spectrum.ieee.org/biomedical/devices/
brain-power/.
Schnupp, J. W. H. and C. E. Carr (2009, May). On hearing with more than one ear: Lessons
from evolution. Nature Neuroscience 12(6), 692697. DOI: 10.1038/nn.2325, www.
nature.com/neuro/journal/v12/n6/full/nn.2325.html.
Scientific American (1880, July). Navigation in fogs. Scientific American 43(1). http://www.
gutenberg.org/files/38482/38482-h/38482-h.htm.
Shedroff, N. and C. Noessel (2012). Make It So: Interaction Design Lessons from Science
Fiction. Rosenfeld Media.
Spagnol, S. and F. Avanzini (2009, July). Auditory localization in the near-field. In Proceedings
of SMC: Sound and Music Computing Conference, Porto.
Spagnol, S., M. Geronazzo, and F. Avanzini (2013, March). On the relation between pinna
reflection patterns and head-related transfer function features. IEEE Transactions on
Audio, Speech, and Language Processing 21(3), 508519.
Starner, T. and J. A. Paradiso (2004). Human generated power for mobile electronics. In Low
Power Electronics Design, pp. 135. CRC Press.
Stewart, R., M. Levy, and M. Sandler (2008, September). 3D interactive environment for
music collection navigation. In Proceedings of DAFx: 11th International Conference on
Digital Audio Effects, Espoo.
Streicher, R. and F. A. Everest (2006). The New Stereo Soundbook (3rd edn.). Pasadena,
California: Audio Engineering Associates.
Streitz, N., A. Kameas, and I. Mavrommati (eds.) (2007). The Disappearing Computer:
Interaction Design, System Infrastructures and Applications for Smart Environments.
State-of-the-Art Survey. Springer LNCS 4500.
Tachi, S. (2009). Telexistence. Singapore: World Scientific Publishing Company.
Takao, H. (2003, July). Adapting 3D sounds for auditory user interface on interactive in-car
information tools. PhD thesis, Waseda University, Tokyo. dspace.wul.waseda.ac.jp/
dspace/handle/2065/436.
Terven, J. R., J. Salas, and B. Raducanu (2014, April). New opportunities for computer visionbased assistive technology systems for the visually impaired. IEEJ Transactions on
Electronics, Information and Systems 47(4), 5258. DOI: 10.1109/MC.2013.265, http://
doi.ieeecomputersociety.org/10.1109/MC.2013.265.


van Dam, A. (2005, September/October). Visualization research problems in next-generation
educational software. IEEE Computer Graphics and Applications 25(5), 88-92. DOI:
10.1109/MCG.2005.118.
Villegas, J. and M. Cohen (2010, October). GABRIEL: Geo-Aware BRoadcasting for
In-Vehicle Entertainment and Localizability. In AES 40th International Conference
Spatial Audio: Sense the Sound of Space, Tokyo. www.aes.org/events/40/.
Villegas, J. and M. Cohen (2013, October). Real-time head-related impulse response filtering with distance control. In AES: Audio Engineering Society Convention (135th
Convention), New York, p. EB3-2. http://www.aes.org/events/135/ebriefs/?ID=3760.
Weiser, M. (1991, September). The computer for the 21st century. Scientific American, 94104.
Weiser, M. and J. S. Brown (1997). The coming age of calm technology. In P. J. Denning and
R. M. Metcalfe (eds.), Beyond CalculationThe Next Fifty Years, chapter 6, pp. 7585.
Copernicus (Springer-Verlag).
White, S. and S. Feiner (2011). Dynamic, abstract representations of audio in a mobile
augmented reality conferencing system. In Recent Trends of Mobile Collaborative
Augmented Reality Systems, chapter 13, pp. 149160. Springer.
Zotkin, D. N., R. Duraiswami, and L. S. Davis (2002, July). Customizable auditory displays.
In Proceedings of ICAD: International Conference on Auditory Display, Kyoto.

14

Recent Advances in Augmented Reality for Architecture, Engineering, and Construction Applications
Amir H. Behzadan, Suyang Dong,
and Vineet R. Kamat

CONTENTS
14.1 Introduction................................................................................................... 332
14.1.1 Overview of Augmented Reality in Architecture, Engineering,
and Construction................................................................................ 333
14.1.2 Recent Advances in AR for AEC Applications................................. 335
14.2 Challenges Associated with AR in AEC Applications................................. 337
14.2.1 Spatial Alignment of Real and Virtual Objects (Registration)......... 337
14.2.1.1 Registration Process............................................................ 337
14.2.1.2 Experimental Results.......................................................... 341
14.2.2 Visual Illusion of Virtual and Real-World Coexistence (Occlusion).....344
14.2.2.1 Occlusion Handling Process...............................................344
14.2.2.2 Two-Stage Rendering..........................................................346
14.2.2.3 Implementation Challenges................................................346
14.2.2.4 Experimental Results.......................................................... 349
14.3 Software and Hardware for AR in AEC Applications.................................. 350
14.3.1 Software Interfaces............................................................................ 350
14.3.1.1 ARVISCOPE...................................................................... 350
14.3.1.2 SMART............................................................................... 352
14.3.2 Hardware Platforms........................................................................... 353
14.3.2.1 UM-AR-GPS-ROVER........................................................ 353
14.3.2.2 ARMOR.............................................................................. 355


14.4 Implemented AEC Applications.................................................................... 358


14.4.1 AR-Assisted Building Damage Reconnaissance............................... 358
14.4.1.1 Technical Approach for AR-Based Damage Assessment.....361
14.4.1.2 Vertical Edge Detection...................................................... 362
14.4.1.3 Horizontal Edge Detection.................................................364
14.4.1.4 Corner Detection................................................................. 365
14.4.1.5 IDR Calculation.................................................................. 365
14.4.1.6 Experimental Results.......................................................... 367
14.4.2 AR for Georeferenced Visualization and Emulated Proximity
Monitoring of Buried Utilities........................................................... 370
14.4.2.1 Technical Approach for Visualization of Buried Utility
Geodata in Operator-Perspective AR................................. 373
14.4.2.2 Processing Geodata Vectors and Attributes for AR
Visualization....................................................................... 374
14.4.2.3 Real-Time Proximity Monitoring of Excavators
toBuried Assets.................................................................. 376
14.4.2.4 Monitoring ExcavatorUtility Proximity Using
aGeometric Interference Detection Approach................... 379
14.4.2.5 Experimental Results.......................................................... 381
14.4.3 AR for Collaborative Information Delivery...................................... 383
14.4.3.1 Technical Approach for AR-Based Information Delivery.....385
14.4.3.2 Multiple Views in ARVita.................................................. 390
14.5 Summary and Conclusions............................................................................ 392
References............................................................................................................... 392

14.1 INTRODUCTION
In several science and engineering applications, visualization can enhance a user's
cognition or learning experience especially when the goal is to communicate information about a complex phenomenon or to demonstrate the applicability of an
abstract concept to real-world circumstances. An important category of visualization
is termed virtual reality (VR), which attempts to replace the users physical world
with a completely synthetic environment. There are a wide array of applications
now commonly associated with VR such as computer-aided design (CAD), scientific
visualization, visual simulation, animation, computer games, and virtual training.
In VR, however, the user's sensory receptors (eyes and ears) are isolated from
the real physical world and completely immersed in the synthetic environment that
replicates the physical world to some extent. In addition, even though VR can provide a stable, robust, interactive, and immersive experience, the cost and effort of
constructing a faithful synthetic environment include tasks such as model engineering (the process of creating, refining, archiving, and maintaining 3D models), scene
management, and graphics rendering and can thus be enormous (Brooks 1999).
In contrast to the VR paradigm, another category of visualization techniques, called
augmented reality (AR), attempts to preserve the user's awareness of the real environment by compositing the real-world and the virtual contents in a mixed 3D space. In
particular, AR refers to the visualization technology that blends virtual objects with
the real world (Azuma et al. 2001). For this purpose, AR must not only maintain a
correct and consistent spatial relation between the virtual and real objects, but also
sustain the illusion that they coexist in the augmented space. The blending effect reinforces the connections between people and objects, promotes peoples appreciation
about their context, and provides hints for the users to discover their surroundings.
In addition, the awareness of the real environment in AR and the information
conveyed by the virtual objects help users perform real-world tasks, whereas VR
applications are mainly restricted to designing, running simulations, and training
(Azuma 1997). Furthermore, AR offers a promising alternative to the model engineering challenge inherent in VR by only including entities that capture the essence
of the study (Behzadan and Kamat 2005). These essential entities usually exist in a
complex and dynamic context that is necessary to the model, but costly to replicate
in VR. However, reconstructing the context is rarely a problem in AR, where modelers can take full advantage of the real context (e.g., terrains and existing structures)
and render them as backgrounds, thereby saving a considerable amount of effort and
resources.

14.1.1 Overview of Augmented Reality in Architecture, Engineering, and Construction
AR has significant potential in the architecture, engineering, and construction (AEC)
industry. Shin and Dunston (2008) presented a comprehensive outline for identifying
AR applications in construction. The paper reveals eight work tasks that may potentially benefit from AR (i.e., layout, excavation, positioning, inspection, coordination,
supervision, commenting, and strategizing). Figure 14.1 depicts some example applications from these areas.
The first attempt at visualizing underground utilities was made by Roberts et al. (2002a), who looked beneath the ground and inspected subsurface utilities. Further exploration can be found in Behzadan and Kamat (2009a) and Schall et al. (2008), where the work was extended to improve visual perception for excavation safety and subsurface utilities, respectively. AR serves as a useful inspection assistance method in the sense that it supplements a user's normal experience with context-related or georeferenced virtual objects.
Webster et al. (1996) developed an AR system for improving the inspection and renovation of architectural structures. Users can have x-ray vision and see columns behind a finished wall and rebars inside the columns. A discrepancy check tool was developed by Georgel et al. (2007), which allows users to readily obtain an augmentation in order to find differences between an as-designed 3D model and an as-built facility. Golparvar-Fard et al. (2009) demonstrated an example of applying AR to construction supervision. They implemented a system for visualizing performance metrics, which aims to represent progress deviations through the superimposition of 4D as-planned models over time-lapsed real jobsite photographs.
Dai et al. (2011) presented another supervision example of overlaying as-built drawings onto an aboveground site photo for the purpose of continuous quality investigation of a bored pile construction.

FIGURE 14.1 Example applications of AR in the AEC industry: (a) subsurface utilities (From Schall, G. et al., Virtual redlining for civil engineering in real environments, Proceedings of the IEEE International Symposium on Mixed and Augmented Reality (ISMAR), Cambridge, U.K., 2010.); (b) inspection (From Georgel, P. et al., An industrial augmented reality solution for discrepancy check, Proceedings of the 2007 IEEE and ACM International Symposium on Mixed and Augmented Reality, 2007, pp. 111-115.); (c) supervision (From Golparvar-Fard, M. et al., J. Comput. Civil Eng., 23(6), 418, 2009. With permission from ASCE.); and (d) project feasibility analysis (From Piekarski, W., IEEE Comp. Graphics Appl., 26(1), 14, 2006.)

These works share the characteristic of monitoring discrepancies in chronological order, which is different from the discrepancy check mentioned earlier. Some examples of coordinating and strategizing are the visualization of construction simulations and architectural designs.
Behzadan and Kamat (2007) designed and implemented augmented reality visualization of simulated construction operations (ARVISCOPE), an AR framework for visualizing simulated outdoor construction operations to facilitate the verification and validation of results generated by discrete event simulation (DES). Piekarski (2006) visualized the design of an extension to a building using a mobile AR platform called TINMITH2. Some other construction tasks excluded by Shin and Dunston (2008), which feature high complexity, may also benefit from AR. For example, the quality of welding depends on the welder's experience and skill. Aiteanu et al. (2003) improved the working conditions for welders, as well as the quality control, by developing a welding helmet that augments visual information, such as paper drawings and online quality assistance, before and during the welding process.

14.1.2 Recent Advances in AR for AEC Applications


The Laboratory for Interactive Visualization in Engineering (LIVE) at the University of Michigan has been engaged in AR research with applications related to construction, operations planning, inspection, safety, and education. These AEC applications include visual excavator collision avoidance systems, rapid reconnaissance systems for measuring earthquake-induced building damage, and visualization of operations-level construction processes in both outdoor AR and collaborative tabletop AR environments (Figure 14.2). The developed visual collision avoidance system allows excavator operators to persistently see what utilities lie buried in the vicinity of a digging machine or a human spotter, thus helping prevent accidents caused by utility strikes (Talmaki et al. 2013). With the aid of AR, the rapid postdisaster reconnaissance system for building damage assessment superimposes previously stored building baselines onto the corresponding images of a real structure. On-site inspectors can then estimate the damage by evaluating discrepancies between the baselines and the real building edges (Dong 2012). Moreover, visualization of construction operations in outdoor AR facilitates the verification and validation of the results of simulated construction processes, with minimum effort spent on creating 3D models of the surrounding environment (Behzadan and Kamat 2009b). Lastly, the tabletop collaborative AR visualization helps to bridge the gap between paper-based static information and computer-based graphical models; it reflects the dynamic nature of a jobsite and preserves the convenience of face-to-face collaboration (Dong et al. 2013).

FIGURE 14.2 AR research for AEC applications in LIVE: (a) visual collision avoidance (From Talmaki, S.A. et al., Adv. Eng. Inform., 27(2), 283, 2013.); (b) reconnaissance of damaged buildings (From Dong, S., Scalable and extensible augmented reality with applications in civil infrastructure systems, PhD dissertation, Department of Civil and Environmental Engineering, University of Michigan, Ann Arbor, MI, 2012.); (c) visualization of construction processes (From Behzadan, A.H. and Kamat, V.R., J. Comput. Civil Eng., 23(6), 405, 2009b. With permission from ASCE.); and (d) collaborative AR visualization (From Dong, S. et al., Adv. Eng. Softw., 55, 45, 2013.)

14.2 CHALLENGES ASSOCIATED WITH AR IN AEC APPLICATIONS


14.2.1 Spatial Alignment of Real and Virtual Objects (Registration)
Spatial registration in AR attempts to guarantee that real-world objects and superimposed virtual objects are properly aligned with respect to each other (Behzadan and Kamat 2007). In the absence of proper registration, the illusion that the two worlds coexist inside the user's view of the augmented space is compromised.
14.2.1.1 Registration Process
As shown in Table 14.1, the registration process typically consists of four major steps (Shreiner et al. 2006):
1. Positioning the viewing volume of a user's eyes in the world coordinate system
2. Positioning virtual objects in the world coordinate system
3. Determining the shape of the viewing volume
4. Transforming virtual objects from the world coordinate system to the eye
coordinate system
Since the origin of the world coordinate system coincides with the user's eye coordinate system, which is located at the user's geographical position in each frame, these steps must consider six degrees of freedom (three for position and three for head orientation) measured by tracking devices, as well as the lens parameters of the camera that captures the real-world views.
As shown in Figure 14.3, the world coordinate system uses a right-handed system with the Y-axis pointing in the direction of true north, the X-axis pointing to the east, and the Z-axis pointing upward.


TABLE 14.1
Four Steps of the Registration Process in AR

Step 1, Viewing. Task: position the viewing volume of a user's eyes in the world. Parameters and device: attitude of the camera (electronic compass).
Step 2, Modeling. Task: position the objects in the world. Parameters and device: location of the world origin (RTK-GPS).
Step 3, Creating the viewing frustum. Task: decide the shape of the viewing volume. Parameters and device: lens and aspect ratio of the camera (camera).
Step 4, Projection. Task: project the objects onto the image plane. Parameters and device: perspective projection matrix.


FIGURE 14.3 Definition of the world coordinate system.

The eye coordinate system complies with the OpenSceneGraph (OSG) (Martz 2007) default coordinate system, using a right-handed system with the Z-axis as the up vector and the Y-axis departing from the eye.
As shown in Figure 14.3, the yaw, pitch, and roll angles are used to describe the relative orientation between the world and eye coordinate systems. The z-x-y rotation sequence is picked to construct the transformation matrix between the two coordinate systems. Suppose the eye and world coordinate systems coincide at the beginning. The user's head first rotates around the Z-axis by the yaw angle α ∈ [−180°, +180°] to obtain the new axes X′ and Y′; since the rotation is clockwise under the right-handed system, the corresponding rotation matrix is Rz(α). The head then rotates around the X′-axis by the pitch angle β ∈ [−90°, +90°] to obtain the new axes Y″ and Z″, a counterclockwise rotation Rx(β). Finally, the head rotates around the Y″-axis by the roll angle γ ∈ [−180°, +180°], a counterclockwise rotation Ry(γ), to reach the final attitude.
Converting a virtual object from the world coordinate system to the eye coordinate system is the inverse of this process; the resulting rotation matrix is therefore written as Rz(α)Rx(β)Ry(γ), as shown in Equation 14.1. Since OSG provides quaternions, a simple and robust way to express rotations, the rotation matrix is further constructed as a quaternion by specifying the rotation axes and angles. The procedure is as follows, with the associated equations listed in sequence in Equations 14.2 through 14.5: rotate around the Y″-axis by γ degrees, then around the X′-axis by β degrees, and finally around the Z-axis by α degrees:
[Xe]   [ cos α   sin α   0 ]   [ 1     0        0    ]   [ cos γ   0   −sin γ ]   [Xw]
[Ye] = [−sin α   cos α   0 ] * [ 0   cos β    sin β ] * [   0     1      0   ] * [Yw]        (14.1)
[Ze]   [   0       0     1 ]   [ 0  −sin β    cos β ]   [ sin γ   0    cos γ ]   [Zw]

Pe = Rz(α) * Rx(β) * Ry(γ) * Pw        (14.2)

Z-axis = [0, 0, 1]T        (14.3)

X′-axis = Rz(α) * [1, 0, 0]T = [cos α, −sin α, 0]T        (14.4)

Y″-axis = Rx(β) * Rz(α) * [0, 1, 0]T = [sin α, cos β cos α, −sin β cos α]T        (14.5)
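To make the composition above concrete, the following is a minimal Python sketch of the world-to-eye rotation using the z-x-y sequence of Equation 14.2. It assumes the sign conventions adopted in the reconstruction of Equations 14.1 through 14.5; it is an illustration, not the chapter's actual OSG/quaternion implementation.

import numpy as np

def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c,  s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]])

def rot_x(b):
    c, s = np.cos(b), np.sin(b)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, s], [0.0, -s, c]])

def rot_y(g):
    c, s = np.cos(g), np.sin(g)
    return np.array([[c, 0.0, -s], [0.0, 1.0, 0.0], [s, 0.0, c]])

def world_to_eye(p_world, yaw_deg, pitch_deg, roll_deg):
    """Rotate a world-frame point into the eye frame (z-x-y sequence, Eq. 14.2)."""
    a, b, g = np.radians([yaw_deg, pitch_deg, roll_deg])
    R = rot_z(a) @ rot_x(b) @ rot_y(g)   # Rz * Rx * Ry
    return R @ p_world

# Example: a point 10 m due north of the user, with the head yawed 30 degrees.
print(world_to_eye(np.array([0.0, 10.0, 0.0]), 30.0, 0.0, 0.0))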

Once the rotation sequence and transformation are completed, the next step is to model the virtual objects in their exact locations. The definition of the object coordinate system is determined by the drawing software. The origin is fixed to a pivot point on the object with a user-specified geographical location. The geographical location of the world coordinate origin is also given by position tracking devices (e.g., a GPS sensor) carried by the user. Therefore, the 3D vector between the object and world coordinate origins can be calculated. The method to calculate the distance between geographical coordinates was originally introduced by Vincenty (1975); Behzadan and Kamat (2007) used this approach to design an inverse method that uses a reference point to calculate the 3D vector between two geographical locations.
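The following is a minimal sketch of computing such a 3D vector between two geographic locations. It uses a simple flat-earth (east-north-up) approximation rather than Vincenty's ellipsoidal formulae described above, so it is only a stand-in valid for short baselines; the coordinates and function name are illustrative.

import math

EARTH_RADIUS = 6371000.0  # mean Earth radius in meters

def geo_vector(lat_ref, lon_ref, alt_ref, lat_obj, lon_obj, alt_obj):
    """Approximate east/north/up vector (m) from a reference point to an object.

    Flat-earth approximation for short baselines; production code would use an
    ellipsoidal method such as Vincenty's formulae instead.
    """
    d_lat = math.radians(lat_obj - lat_ref)
    d_lon = math.radians(lon_obj - lon_ref)
    east = d_lon * math.cos(math.radians(lat_ref)) * EARTH_RADIUS
    north = d_lat * EARTH_RADIUS
    up = alt_obj - alt_ref
    return east, north, up

# Hypothetical object roughly 100 m northeast of the user.
print(geo_vector(42.2930, -83.7150, 270.0, 42.2937, -83.7141, 270.0))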
Once a virtual object is modeled inside the user's viewing frustum, any further translation, rotation, and scaling operations are applied to the object. Finally, the user's viewing frustum must be defined. The real world is perceived through perspective projection by both the human eye and the video camera. Four parameters are needed to construct a perspective projection matrix: the vertical angle of view, the horizontal-to-vertical aspect ratio, and the near and far clipping planes. As shown in Figure 14.4, these parameters together form a viewing frustum and decide the virtual content to be displayed in the augmented space. In order to increase computational efficiency, all virtual objects outside of the viewing frustum are either cropped or clipped.
FIGURE 14.4 The viewing frustum defines the virtual content that can be seen.
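As one illustration of how these four parameters define the projection, the following is a minimal sketch that builds an OpenGL-style perspective matrix from a vertical field of view, a width-to-height aspect ratio, and near and far planes. It assumes the standard gluPerspective-style parameterization and is not code from the chapter's implementation.

import numpy as np

def perspective(fov_y_deg, aspect, near, far):
    """OpenGL-style perspective projection matrix (vertical FOV, width/height aspect)."""
    f = 1.0 / np.tan(np.radians(fov_y_deg) / 2.0)
    return np.array([
        [f / aspect, 0.0, 0.0, 0.0],
        [0.0, f, 0.0, 0.0],
        [0.0, 0.0, (far + near) / (near - far), 2.0 * far * near / (near - far)],
        [0.0, 0.0, -1.0, 0.0],
    ])

# A frustum matching a 4:3 camera with a 45 degree vertical field of view.
P = perspective(45.0, 640 / 480, 0.5, 500.0)
print(P)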


14.2.1.2 Experimental Results


Figure 14.5 shows the process followed to calibrate the mechanical attitude discrepancy and validate the registration algorithm. A real box of size 12 cm × 7 cm × 2 cm (length × width × height) is placed at a known pose. A semitransparent 3D model of the same size is created and projected onto the real scene so that the level of alignment can be judged. The virtual box is first projected without adjustments to the attitude measurement, and a discrepancy is thus present. The virtual box is then shifted to align with the real one by adding a compensation value to the attitude measurement. The experiment was further continued to validate the agreement between the real and virtual cameras: if the static registration algorithm works correctly, the virtual box should coincide with the real box when the two are moved together in six degrees of freedom. Figure 14.5 also shows that, overall, the virtual box matched the real one in all tested cases.
Despite achieving satisfactory results in static registration (i.e., when virtual objects are not moving inside the viewing frustum), correct static registration does not necessarily guarantee that the user sees the same correct and stable augmented image when in motion, due to the latency induced by the head orientation sensor itself. In particular, the communication mechanism (PULL vs. PUSH) with the head orientation sensor may cause synchronization latency problems in dynamic registration. Figure 14.6 lists the main steps involved in the PULL and PUSH mechanisms. A more detailed description of these methods can be found in Dong (2012).
FIGURE 14.5 Mechanical attitude calibration result (yaw offset: 4.5°, pitch offset: 7.3°, roll offset: 1.0°) and validation experiment of the registration algorithm.

FIGURE 14.6 Communication stages in (a) the PULL mode (notify the TCM module of the data components needed for each request, request the data, wait for the data packet, check data completion by CRC matching, and parse the yaw, pitch, roll, and other relevant data from the binary packet) and (b) the PUSH mode (notify the TCM module of the needed data components only at the beginning, attach a callback to the OnDataReceived event, wait for the data packet, check data completion by CRC matching, and parse the data).

In the authors' research, in order to determine the latency under the PUSH mode, a series of experiments was performed on a Dell Inspiron machine with an Intel Core 2 Duo T6600 CPU (2.2 GHz) and a 64-bit Windows operating system. In order to minimize the transmission latency between the camera and the host system, an integrated camera was used and the resolution was adjusted to the minimum option of 160 × 120. A TCM compass module was used as the 3D orientation tracking device. Both the camera and the TCM module ran at approximately 30 Hz. The camera update function was written as a callback and executed at every frame, and the system time was recorded when each new frame was captured. The device update function was written as a delegate and registered with the OnDataReceived event, which fires when a new data packet is placed in the buffer.
The system time stamp was also assigned to the angular data each time the event was triggered. As shown in Figure 14.7, the TCM module was held static at the beginning and then rapidly swung to one side at a speed of about 150°/s. The exact instant at which the module started swinging was then identified from the recorded image frames and the TCM module angular data, along with their corresponding time stamps. In this way, the time stamps were compared to find the lag of the TCM module in the PUSH mode.

FIGURE 14.7 Comparison between the TCM-XB data log and the corresponding recorded image frames. The shaded area highlights the exact instant that the module started swinging: (a) static, (b) beginning of swing, (c) second frame of swing, (d) recorded data log.

Six groups of experiments were carried out and the delay in the PUSH mode relative
to the web camera was found to be 5 ms on average. This implies that the communication delay in the PUSH mode was small enough to be neglected.
Another source of latency error in the PUSH mode is the finite impulse response (FIR) filter of the compass module. In particular, the calibration of the magnetometer can compensate for a local static magnetic source within the vicinity of the compass module. However, dynamic magnetic distortion still impacts the module in motion, and the noise magnification depends on the acceleration of the module; usually the noise increases with the acceleration. Among the three degrees of freedom, heading (i.e., yaw) is the most sensitive to this noise. Except for high-frequency vibration noise, the other types of noise can be removed by an FIR Gaussian filter.


FIGURE 14.8 The filter-induced latency when a 32-tap Gaussian filter is used.

The compass module comes with five filtering options: 32, 16, 8, 4, and 0 taps. The higher the number, the more stable the output, but the longer the expected latency. Consider the case of selecting a 32-tap filter, as shown in Figure 14.8. When it is time to send out estimated data at time instant A, the module adds a new sample A to the end of the queue, drops the first one, and applies a Gaussian filter to the queue. However, the filtered result actually reflects the estimated value at time instant A−15. Since the module samples at approximately 30-32 Hz, this induces a 0.5 s delay for a 32-tap filter, a 0.25 s delay for a 16-tap filter, and so on. This is referred to as filter-induced latency, and it applies to both the PULL and PUSH modes. A 0-tap filter implies no filtering, but significant jittering. Dong (2012) provided a detailed account of how filter-induced latency can be avoided by moving the Gaussian FIR filter from the hardware to the software.
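The delay can be reproduced numerically: a symmetric N-tap FIR filter delays its output by roughly (N − 1)/2 samples. The sketch below applies a 32-tap Gaussian window to a simulated heading step at a 30 Hz sample rate; the kernel width and the step signal are illustrative, not the module's actual filter coefficients.

import numpy as np

fs = 30.0                                 # compass sample rate (Hz)
taps = 32                                 # Gaussian FIR length
t = np.arange(0, 4, 1 / fs)               # 4 s of samples
heading = np.where(t < 2.0, 0.0, 90.0)    # step: head swings at t = 2 s

# Normalized Gaussian window used as the FIR kernel (illustrative width).
n = np.arange(taps) - (taps - 1) / 2.0
kernel = np.exp(-0.5 * (n / (taps / 6.0)) ** 2)
kernel /= kernel.sum()

filtered = np.convolve(heading, kernel, mode="full")[:len(t)]

# Group delay of a symmetric FIR filter is (taps - 1) / 2 samples.
print("expected delay: %.2f s" % (((taps - 1) / 2.0) / fs))          # about 0.5 s
print("filtered output crosses 45 deg at t = %.2f s"
      % t[np.argmax(filtered >= 45.0)])                              # about 2.5 s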

14.2.2 Visual Illusion of Virtual and Real-World Coexistence (Occlusion)
In an ideal AR visualization scenario, real and virtual objects must be seamlessly
blended in all three dimensions, instead of virtual objects being simply superimposed on
top of a real-world background, as is the case in most current AR approaches. The result
of composing an AR scene without considering the relative depth of the real and virtual
objects is that the graphical entities in the scene appear to float over the real background,
rather than blend or coexist with real objects in that scene. This phenomenon is commonly referred to as incorrect occlusion. The occlusion problem is more complicated in
outdoor AR where the user expects to navigate the space freely, and where the relative
depth between the involved virtual and real content changes dynamically with time.
Figure 14.9 is a schematic AR scene where a real object (structural column) is closer than the virtual object (forklift) to the viewpoint (Behzadan and Kamat 2008). The right-side image shows visually correct occlusion where the forklift is partially blocked by the structural column. However, the left-side image shows the scene in the absence of occlusion, producing the incorrect illusion that the forklift is in front of the column.
14.2.2.1 Occlusion Handling Process
Several researchers have explored the AR occlusion problem from different perspectives. Wloka and Anderson (1995) implemented a high-speed stereo matching algorithm that infers depth maps from a stereo pair of intensity bitmaps. However, random gross errors cause virtual objects to blink on and off, which is very distracting. Berger (1997) proposed a contour-based approach, but with the major limitation that the contours need to be seen from frame to frame.

FIGURE 14.9 Example of incorrect and correct occlusion in AR visualization: (a) incorrect occlusion; (b) correct occlusion.

Lepetit and Berger (2000) refined the previous method with a semiautomated approach that requires the user to outline the occluding objects in the key views; the system then automatically detects these occluding objects and handles uncertainties in the computed motion between two key frames. Despite the visual improvements, the semiautomated method is only appropriate for postprocessing. Fortin and Hebert (2006) studied both a model-based approach using a bounding box and a depth-based approach using a stereo camera; the former works only with a static viewpoint and the latter struggles in low-textured areas. Ryu et al. (2010) and Louis and Martinez (2012) tried to increase the accuracy of the depth map with a region-of-interest extraction method using background subtraction and stereo depth algorithms, but only simple background examples were demonstrated. Tian et al. (2010) designed an interactive segmentation and object tracking method for real-time occlusion, but their algorithm fails in situations where virtual objects are in front of the real objects.
In the authors' research, a robust AR occlusion algorithm was designed and implemented that uses a real-time time-of-flight (TOF) camera, an RGB video camera, the OpenGL Shading Language (GLSL), and render-to-texture (RTT) techniques to correctly resolve the depth of real and virtual objects in real-time AR visualizations. Compared to previous approaches, this approach enables improvements in three ways:
1. Ubiquity: The TOF camera is capable of suppressing the background illumination (SBI) and enables the designed algorithm to work in both indoor and outdoor environments. It puts the least limitation on context and conditions compared with any previous approach.
2. Robustness: Using the OpenGL depth-buffering method, it can work regardless of the spatial relationship among the involved virtual and real objects.
3. Speed: The processing and sampling of the depth map are parallelized by taking advantage of the GLSL fragment shader and the RTT technique. Koch et al. (2009) described a parallel research effort that adopted a similar approach for TV production in indoor environments, with a 3D model constructed beforehand and the goal of segmenting a moving actor from the background.
A fundamental step toward correct occlusion handling is obtaining an accurate measurement of the distance from the virtual and real objects to the user's eye. In an outdoor AR environment, the distance from the virtual object to the viewpoint can be calculated using the Vincenty algorithm (Vincenty 1975). This algorithm interprets the metric distance based on the geographical locations of the virtual object and the user. The locations of the virtual objects are predefined by the program; in a simulated construction operation, for example, the geographical locations of virtual building components and equipment are extracted from the engineering drawings. The location of the viewpoint, on the other hand, is tracked by a position sensor (e.g., GPS) carried by the user. A TOF camera estimates the distance from the real object to the eye with the help of the TOF principle, which measures the time that a signal travels, at a well-defined speed, from the transmitter to the receiver (Beder et al. 2007).
Specifically, the TOF camera measures radio frequency (RF)-modulated light sources with phase detectors. The modulated outgoing beam is sent out with an RF carrier, and the phase shift of that carrier is measured on the receiver side to compute the distance (Gokturk et al. 2010). Compared to traditional light detection and ranging (LIDAR) scanners and stereo vision, the TOF camera features real-time feedback with high accuracy. It is capable of capturing a complete scene with one shot, at speeds of up to 40 frames per second (fps). However, common TOF cameras are vulnerable to background light (e.g., artificial lighting and the sun), which generates electrons and confuses the receiver. In the authors' research, the SBI method is used to allow the TOF camera to work flexibly in both indoor and outdoor environments (PMD 2010).
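The phase-measurement principle also explains the camera's limited unambiguous range, which is discussed further in Section 14.2.2.4. The sketch below shows the relationship under the assumption of a 20 MHz modulation frequency (an illustrative value chosen because it yields roughly the 7.5 m range mentioned later); it is not the specification of the actual camera used.

import math

C = 299_792_458.0          # speed of light (m/s)

def unambiguous_range(f_mod_hz):
    """Maximum distance a phase-based TOF camera can measure without wrapping."""
    return C / (2.0 * f_mod_hz)

def distance_from_phase(phase_rad, f_mod_hz):
    """Distance implied by the measured phase shift of the RF-modulated carrier."""
    return (phase_rad / (2.0 * math.pi)) * unambiguous_range(f_mod_hz)

f_mod = 20e6                                           # illustrative 20 MHz modulation
print("unambiguous range: %.2f m" % unambiguous_range(f_mod))        # about 7.5 m
print("phase pi/2 -> %.2f m" % distance_from_phase(math.pi / 2.0, f_mod))
print("an 8 m target wraps to %.2f m" % (8.0 % unambiguous_range(f_mod)))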
14.2.2.2 Two-Stage Rendering
Depth buffering, also known as z-buffering, is the standard solution for hidden-surface elimination in OpenGL and is usually performed efficiently in the graphics processing unit (GPU). A depth buffer is a 2D array that shares the same resolution as the color buffer and the viewport. If enabled in the OpenGL drawing stage, the depth buffer keeps a record of the closest depth value to the observer for each pixel. An incoming fragment at a certain pixel is not drawn unless its corresponding depth value is smaller than the stored one; if it is drawn, the corresponding depth value in the depth buffer is replaced by the smaller one. In this way, after the entire scene has been drawn, only those fragments that were not obscured by any others remain visible.
Depth buffering thus provides a promising approach for solving the AR occlusion problem. Figure 14.10 shows the two-stage rendering method. In the first stage, the background of the real scene is drawn as usual, but the depth map retrieved from the TOF camera is written into the depth buffer at the same time. In the second stage, the virtual objects are drawn with depth-buffer testing enabled. Consequently, any invisible part of a virtual object, whether hidden by a real object or by another virtual object, is correctly occluded.

FIGURE 14.10 Two-stage rendering for occlusion handling. (From Dong, S. et al., J. Comput. Civ. Eng., 27(6), 607, 2013.)
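The per-pixel logic of the two stages can be illustrated with a small software analogue of depth-buffered compositing. The sketch below is not the GLSL/RTT implementation described in this chapter; the toy image sizes and depths are made up purely for illustration.

import numpy as np

def composite_ar(background_rgb, real_depth, virtual_rgb, virtual_depth):
    """Software analogue of two-stage, depth-buffered AR compositing.

    Stage 1: the real background fills the color buffer and the TOF depth map
    fills the depth buffer.  Stage 2: virtual fragments pass only where their
    depth is smaller (closer) than the value already stored.
    """
    color = background_rgb.copy()          # stage 1: color buffer
    depth = real_depth.copy()              # stage 1: depth buffer
    passes = virtual_depth < depth         # stage 2: per-pixel depth test
    color[passes] = virtual_rgb[passes]
    depth[passes] = virtual_depth[passes]
    return color, depth

# Toy 4x4 frame: a virtual object at 5 m occupies the right half of the image;
# a real surface at 3 m in the bottom rows occludes it there.
bg = np.zeros((4, 4, 3), dtype=np.uint8)
real_d = np.full((4, 4), 10.0); real_d[2:, :] = 3.0
virt_rgb = np.full((4, 4, 3), 255, dtype=np.uint8)
virt_d = np.full((4, 4), np.inf); virt_d[:, 2:] = 5.0
out_rgb, _ = composite_ar(bg, real_d, virt_rgb, virt_d)
print(out_rgb[..., 0])   # 255 only where the virtual object is not occluded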
14.2.2.3 Implementation Challenges
Despite the straightforward implementation of depth buffering, there are several challenges when integrating the depth buffer with the depth map from the TOF camera:
1. After being processed through the OpenGL graphics pipeline and written into the depth buffer, the distance between the OpenGL camera and the virtual object is no longer the physical distance (Shreiner et al. 2006). The transformation model is explained in Table 14.2 (a numerical sketch of this transformation is given after the list). Therefore, the distance for each pixel from the real object to the viewpoint recorded by the TOF camera has to be processed by the same transformation model before it is written into the depth buffer for comparison.


TABLE 14.2
Transformation Steps Applied to the Raw TOF Depth Image

Ze (distance to the viewpoint): acquired by the TOF camera; range (0, +∞).
Zc (clip coordinate after the projection transformation): Mortho * Mperspective * [Xe Ye Ze We]T, giving Zc = (Ze*(f + n) − 2*f*n*We)/(f − n), where n and f are the near and far planes and We is the homogeneous component in eye coordinates (usually equal to 1); range [−n, f].
Zcvv (canonical view volume coordinate): Zc/Wc, where Wc = Ze is the homogeneous component in clip coordinates, giving Zcvv = (f + n)/(f − n) − 2*f*n/(Ze*(f − n)); range [−1, 1].
Zd (value sent to the depth buffer): (Zcvv + 1)/2, giving Zd = (f + n)/(2*(f − n)) − f*n/(Ze*(f − n)) + 0.5; range [0, 1].

2. There are three cameras for rendering an AR space: a video camera, a TOF
camera, and an OpenGL camera. The video camera captures RGB or intensity
values of the real scene as the background, and its result is written into the color
buffer. The TOF camera acquires the depth map of the real scene, and its result is
written into the depth buffer. The OpenGL camera projects virtual objects on top
of real scenes, with its result being written into both the color and depth buffers.
In order to ensure correct alignment and occlusion, ideally all cameras should
share the same projection parameters (the principal points and focal lengths). Even though the TOF camera provides an integrated intensity image that can be aligned with the depth map by itself, the monocular color channel compromises the visual credibility. On the other hand, if an external video camera is used, the intrinsic and extrinsic parameters of the video camera and the TOF camera may not agree (i.e., different principal points, focal lengths, and distortions). Therefore, image registration methods are required to find the correspondence between the depth map and the RGB image. Dong (2012) provided a detailed description of two methods, a nonlinear homography estimation implementation adapted from Lourakis (2011) and stereo projection, that are used to register the depth map and the RGB image. The projection parameters of the OpenGL camera are adjustable and can accommodate either the RGB or the TOF camera. Figure 14.11 shows snapshots of the occlusion effect achieved in this research by using homography mapping between the TOF and RGB cameras.
3. Traditional OpenGL pixel-drawing commands can be extremely slow when writing a 2D array (i.e., the depth map) into the frame buffer. In the authors' research, an alternative, more efficient approach using OpenGL textures and GLSL is used.
4. The resolution of the TOF depth map is fixed at 200 × 200, while that of the depth buffer can be arbitrary, depending on the resolution of the viewport. This implies the necessity of interpolation between the TOF depth map and the depth buffer. Furthermore, image registration demands an expensive computation budget if a high-resolution viewport is defined. In the authors' research, the RTT technique is used to carry out the interpolation and registration computation in parallel.
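Regarding the first challenge, the Table 14.2 chain can be sketched numerically as follows: a metric TOF distance is mapped to the [0, 1] window-space value that the depth buffer stores. The near/far values and the toy depth map are illustrative only.

import numpy as np

def eye_distance_to_depth_value(z_e, near, far):
    """Map a metric distance to the viewpoint into a [0, 1] depth-buffer value.

    Follows Table 14.2: Zcvv = (f + n)/(f - n) - 2 f n / (Ze (f - n)),
    Zd = (Zcvv + 1) / 2.
    """
    z_cvv = (far + near) / (far - near) - (2.0 * far * near) / (z_e * (far - near))
    return (z_cvv + 1.0) / 2.0

near, far = 0.5, 100.0
tof_depth_m = np.array([[0.5, 2.0], [7.5, 100.0]])   # toy 2x2 TOF depth map (meters)
print(eye_distance_to_depth_value(tof_depth_m, near, far))   # 0.0 at near, 1.0 at far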

FIGURE 14.11 Occlusion effect comparison using homography mapping between the TOF camera and the RGB camera: (a) occlusion disabled and (b) occlusion enabled. (From Dong, S. et al., J. Comput. Civ. Eng., 27(6), 607, 2013.)


14.2.2.4 Experimental Results


Despite the outstanding performance of the TOF camera in speed and accuracy, the biggest technical challenge it faces is modular error, since the receiver determines the distance by measuring the phase offset of the carrier. The maximum valid range is limited by the RF carrier wavelength. For instance, if the standard measurement range is 7.5 m and an object happens to be 8 m away from the camera, its distance is represented as 0.5 m (8 mod 7.5) on the depth map, instead of 8 m. This can create incorrect occlusion in outdoor conditions, where ranges can easily go beyond 7.5 m. In the authors' research, object detection and segmentation were explored as possible options to mitigate this limitation, and the experimental range was intentionally restricted to within 7.5 m.
Two sets of validation experiments were conducted, one in an indoor and one in an outdoor environment. In both cases, the TOF camera was positioned about 7.5 m away, facing the wall. Demonstration videos of both experiments are maintained and can be found at http://pathfinder.engin.umich.edu/videos.htm. All of the virtual models are courtesy of the Google 3D Warehouse community. In the indoor experiment, a forklift picks up a virtual piece of cardboard in front of the virtual stack and maneuvers to put it on top of a physical piece of cardboard. In the meantime, a construction worker passes by with a buggy and then puts a physical bottle beside the virtual cardboard. Figure 14.12 shows snapshots of the indoor experiments used to validate the occlusion correctness.

FIGURE 14.12 Indoor simulated construction processes with occlusion (a) disabled and (b) enabled. (From Dong, S. et al., J. Comput. Civ. Eng., 27(6), 607, 2013.)


FIGURE 14.13 Outdoor simulated construction processes with occlusion (a) disabled and (b) enabled. (From Dong, S. et al., J. Comput. Civ. Eng., 27(6), 607, 2013.)

In the outdoor experiment, a construction worker stands on a virtual scissor lift and paints the wall. The worker then jumps off the scissor lift, pushes the debris to the virtual pile of dirt with a real shovel, and operates the virtual mini dozer. Figure 14.13 shows snapshots of the outdoor experiments used to validate the occlusion correctness. It is clear from the composite visualization that occlusion provides much better spatial cues and realism for outdoor AR visual simulation.

14.3 SOFTWARE AND HARDWARE FOR AR IN AEC APPLICATIONS


14.3.1 Software Interfaces
14.3.1.1 ARVISCOPE
ARVISCOPE was the first-generation expressive, self-contained AR animation authoring language designed to allow an external software process (e.g., a running DES model) to author a dynamic visualization in AR (Behzadan et al. 2012). Sequential statements written in this language can describe a smooth and continuous operation of arbitrary length and complexity. The communicated statements (i.e., events) are interpreted by the visualization engine of the AR application. Appropriate data structures, algorithms, and routines are then invoked to manipulate CAD models and other 3D geometric primitives to present a smooth, accurate representation of the operations.


In order to create a viewing frustum with the user's eye at the center of projection, using the procedure described in Section 14.3.2.1, the user's positional and head orientation data are continuously obtained and processed. Details of the tracking devices used to achieve this are also described in Section 14.3.2.1.
Despite the fact that the ARVISCOPE authoring language is powerful enough to
describe the complexities involved in a typical construction operation, the syntax of
the language is not very complex. Based on their functionality, ARVISCOPE language statements are grouped into scene construction, dynamic, and control statements (Behzadan 2008). These statements can be sequentially recorded into and
interpreted from a text file referred to as the animation trace file. The animation
trace file is sequentially interpreted line by line as soon as the application starts, the
individual statements are processed, and the graphical representation corresponding
to the event in each line of the trace file is simultaneously created and depicted inside
the user's augmented viewing frustum. During this process, the user can freely move
in the animated augmented space.
The animation trace file can be created either manually (for short animations) or
automatically during a simulation run. Manual generation of an animation trace file
is typically not practical except in the case of simple demonstrative examples of short
animated duration. Automatic generation of a trace file is recommended instead, since it requires less time and produces more accurate results. Automatic generation of an
animation trace file requires instrumentation of a simulation model (i.e., including
additional code and statements in a simulation model). For example, Figure 14.14
shows how two new lines are created and added to the animation trace file describing a simple earthmoving operation as a result of a statement added to the simulation input file of the same operation. These two lines will be written to the trace file
numerous times with different arguments (e.g., time tag, duration, object name, route
name) depending on the specific instance of the activity taking place.
The completed trace file will contain other lines of text that will be written out when
other parts of the modeled operation take place. Thus, the time-ordered sequence of
animation statements written out by all the activities in the model during a simulation run constitutes the trace file required to visualize the modeled operations in AR.
FIGURE 14.14 Sample instrumentation of a DES (STROBOSCOPE) input file for automated generation of the corresponding ARVISCOPE animation trace file; the instrumented ONSTART ... PRINT ATF statement of the Return activity writes out trace lines such as:
SIMTIME 12.00;
TRAVEL Hauler1 ReturnRoad 15.00;


FIGURE 14.15 Animated structural steel erection operations in ARVISCOPE.

It should be noted that the simulation input file partially shown in Figure 14.14 can be created by any DES authoring language, such as STROBOSCOPE (State and Resource Based Simulation of Construction Processes). STROBOSCOPE is a programmable and extensible simulation system designed for modeling complex construction operations in detail and for the development of special-purpose simulation tools (Martinez 1996). ARVISCOPE supports animation scalability, which in the context of the authors' research is defined as the ability of the visualization to construct complex scenes that potentially consist of a large number of CAD objects, and to maintain performance levels as the size of the operation increases. Scalability allows the creation of very complex scenes, such as the erection of an entire structural steel frame consisting of several beams and columns, by loading only a few CAD models of steel sections and placing them repeatedly at appropriate locations using multiple transformation nodes. Figure 14.15 shows animation snapshots of a structural steel erection operation that was modeled in STROBOSCOPE and visualized in full-scale outdoor AR using ARVISCOPE. The multistory steel structure shown in this figure was completely modeled and animated in the augmented scene using CAD models of only a few steel sections. The operation consisted of a virtual tower crane that picked up steel sections and installed them in their appropriate locations on the steel structure.
14.3.1.2 SMART
Scalable and Modular Augmented Reality Template (SMART) is an extensible AR
computing framework that is designed to deliver high-accuracy and convincing augmented graphics that correctly place virtual contents relative to a real scene and
robustly resolve the occlusion relationships between them. SMART is built on top
of the previously designed ARVISCOPE platform (Section 14.3.1.1) and is a loosely
coupled interface that is independent of any specific engineering application or
domain. Instead, it can be readily adapted to an array of engineering applications
such as visual collision avoidance of underground facilities, postdisaster reconnaissance of damaged buildings, and visualization of simulated construction processes.


The inbuilt registration algorithm of SMART guarantees high-accuracy static alignment between real and virtual objects. Some efforts have also been made to reduce dynamic misregistration, including the following:
1. In order to reduce synchronization latency, multiple threads are dynamically generated for reading and processing sensor measurements immediately upon data arrival in the host system.
2. The FIR filter applied to the jittering output of the electronic compass leads to filter-induced latency; therefore, an adaptive lag compensation algorithm is designed to eliminate the resulting dynamic misregistration.
The SMART framework follows the classical model-view-controller (MVC) pattern. Scene-graph-controller is the implementation of the MVC pattern in SMART and is described in the following:
1. The model counterpart in SMART is the scene, which utilizes application-specific input/output (I/O) engines to load virtual objects and maintains their spatial and attribute status. The update of a virtual object's status is reflected when it is time to refresh the associated graphs.
2. The graph corresponds to the view and reflects the AR registration results for each frame update event. Given that the user's head can be in continuous motion, the graph always invokes callbacks to rebuild the transformation matrix based on the latest position and attitude measurements, and refreshes the background image.
3. The controller manages all user interface (UI) elements and responds to a user's commands by invoking delegates' member functions, such as those of a scene or a graph.
The SMART framework, which is based on this scene-graph-controller setup, is shown in Figure 14.16 and is constructed in the following way: the main entry of the program is CARApp, which is in charge of CARSensorForeman and CARSiteForeman. The former initializes and manages all tracking devices, such as real-time kinematic (RTK) GPS receivers and electronic compasses, while the latter defines the relation among the scene, graphs, and controller. Once a CARSiteForeman object is initialized, it orchestrates the creation of CARScene, CARController, and CARGraph and the connection of graphs to the appropriate scene. Applications derived from SMART are single document interface (SDI) applications; therefore, there is only one open scene and one controller within a SmartSite. The controller keeps pointers to the graph and the scene.

FIGURE 14.16 SMART framework architecture.
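The relationships described above can be summarized in a schematic Python skeleton. This is only a sketch of the described architecture, not the actual SMART implementation (which follows Figure 14.16 and is not written in Python); the class names are taken from the figure and the method bodies are placeholders.

class CARScene:
    """Model: loads virtual objects via application-specific I/O engines."""
    def __init__(self):
        self.objects = []          # spatial and attribute status of virtual content

class CARGraph:
    """View: rebuilds the registration transform and refreshes the background."""
    def __init__(self, scene):
        self.scene = scene
    def on_frame(self, position, attitude, background_image):
        pass                       # placeholder: rebuild transform, redraw frame

class CARController:
    """Controller: routes UI commands to the scene and its graphs."""
    def __init__(self, scene, graphs):
        self.scene, self.graphs = scene, graphs

class CARSiteForeman:
    """Creates one scene, one controller, and the graphs (SDI application)."""
    def __init__(self):
        self.scene = CARScene()
        self.graphs = [CARGraph(self.scene)]
        self.controller = CARController(self.scene, self.graphs)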

14.3.2 Hardware Platforms
14.3.2.1 UM-AR-GPS-ROVER
The designed software interface must be accompanied by a robust and easy-to-deploy hardware platform that enables users to perform operations in both indoor and outdoor settings. Therefore, a first-generation wearable hardware apparatus called UM-AR-GPS-ROVER was designed, in which GPS and three-DOF head orientation sensors were used to capture a user's position and direction of look (Behzadan et al. 2008). Figure 14.17 shows the configuration of the backpack and the allocation of hardware.
FIGURE 14.17 Overview of the UM-AR-GPS-ROVER hardware framework.

UM-AR-GPS-ROVER was equipped with the following:

1. Computing devices capable of rapid position calculation and image rendering, including an interface for external input (i.e., both user commands and
a video capturing the user's environment)
2. An interface to display the final augmented view to the user
3. External power source for the hardware components to ensure continuous
operation without restricting user mobility
The design also had to take into account ergonomic factors to avoid user discomfort
after long periods of operation. Figure 14.18 shows the main hardware components
of UM-AR-GPS-ROVER that include a head-mounted display (HMD), user registration and tracking peripherals, and a mobile laptop computer to control and facilitate
system operation and user I/O devices.
14.3.2.2 ARMOR
As a prototype design, UM-AR-GPS-ROVER succeeded in reusability and modularity, and produced sufficient results in proof-of-concept simulation animation. However, two primary design defects were inadequately addressed: accuracy and ergonomics. First, the insecure placement of tracking devices disqualified UM-AR-GPS-ROVER from the centimeter-accuracy-level goal. Second, packaging all devices, power panels, and wires into a single backpack made it impossible to accommodate more equipment, such as an RTK rover radio. The backpack was also too heavy for an even distribution of weight around the body. The Augmented Reality Mobile OpeRation (ARMOR) platform evolved from the UM-AR-GPS-ROVER platform.


FIGURE 14.18 Hardware components of UM-AR-GPS-ROVER: (a) Kensington Contour laptop backpack, (b) Sony Vaio laptop, (c) Trimble AgGPS 332 receiver, (d) Powerbase NiMH external portable power pack, (e) Trimble GPS antenna (mounted on the backpack), (f) TCM5 3-axis orientation tracker (hidden inside the helmet), (g) helmet, (h) i-Glasses SVGA Pro head-mounted display, (i) Fire-i digital FireWire camera, (j) Cirque Smart Cat touch pad, (k) WristPC wearable keyboard.

ARMOR introduces high-accuracy and lightweight devices, rigidly places all tracking
instruments with full calibration, and renovates the carrying harness to make it more
wearable. The improvements featured in ARMOR can be broken into four categories:
1. Highly accurate tracking devices with rigid placement and full calibration.
2. Lightweight selection of I/O and computing devices and external power
source.
3. Intuitive user command input.
4. Load-bearing vest to accommodate devices and distribute weight evenly
around the body.
An overview comparison between UM-AR-GPS-ROVER and ARMOR is listed in Table 14.3. ARMOR can work in both indoor and outdoor modes. The indoor mode does not necessarily imply that the GPS signal is unavailable, but rather that a qualified GPS signal is absent. The GPS signal quality can be extracted from the $GGA section of the GPS data string, which follows the National Marine Electronics Association (NMEA) format. The fix quality ranges from 0 to 8; for example, 2 means a differential GPS (DGPS) fix, 4 means an RTK (fixed) solution, and 5 means a float RTK solution. The user can define the standard (i.e., which fix quality is deemed qualified) in the hardware configuration file.

TABLE 14.3
Comparison between UM-AR-GPS-ROVER and ARMOR Platforms

Location tracking. UM-AR-GPS-ROVER: Trimble AgGPS 332 using OmniStar XP correction for the differential GPS method. ARMOR: Trimble AgGPS 332 using CMR correction broadcast by a Trimble AgGPS RTK Base 450/900. Comparison: OmniStar XP provides 10-20 cm accuracy; RTK provides 2.5 cm horizontal and 3.7 cm vertical accuracy.
Orientation tracking. UM-AR-GPS-ROVER: PNI TCM 5. ARMOR: PNI TCM XB. Comparison: same accuracy, but ARMOR places the TCM XB rigidly close to the camera.
Video camera. UM-AR-GPS-ROVER: Fire-i digital FireWire camera. ARMOR: Microsoft LifeCam VX-5000. Comparison: the LifeCam VX-5000 is lightweight, has a small volume, and requires less wiring.
Head-mounted display. UM-AR-GPS-ROVER: i-Glasses SVGA Pro video see-through HMD. ARMOR: eMagin Z800 3DVisor. Comparison: the Z800 3DVisor is lightweight and provides stereovision.
Laptop. UM-AR-GPS-ROVER: Dell Precision M60 notebook. ARMOR: Asus N10J netbook. Comparison: the Asus N10J is lightweight, has a small volume, and is equipped with an NVIDIA GPU.
User command input. UM-AR-GPS-ROVER: WristPC wearable keyboard and Cirque Smart Cat touchpad. ARMOR: Nintendo Wii Remote. Comparison: the Wii Remote is lightweight and intuitive to use.
Power source. UM-AR-GPS-ROVER: Fedco POWERBASE. ARMOR: Tekkeon myPower ALL MP3750. Comparison: the MP3750 is lightweight and has multiple voltage outputs, charging both the GPS receiver and the HMD.
Backpack apparatus. UM-AR-GPS-ROVER: Kensington Contour laptop backpack. ARMOR: load-bearing vest. Comparison: extensible and easy to access equipment.

When a qualified GPS signal is available, the geographical location is extracted from the $GPGGA section of the GPS data string. Otherwise, a preset pseudo-location is used; this pseudo-location can be controlled from the keyboard.
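A minimal sketch of this quality check is shown below: the fix-quality indicator is field 6 of a GGA sentence. The sample sentence is the commonly cited illustrative GGA example (not a recorded ARMOR log), and the accepted set (4, 5) is just one possible definition of "qualified" mirroring the configurable standard described above.

def gga_fix_quality(sentence):
    """Return the fix-quality indicator (field 6) of an NMEA GGA sentence."""
    fields = sentence.split(",")
    return int(fields[6]) if fields[6] else 0

def is_qualified(sentence, accepted=(4, 5)):
    """Treat RTK fixed (4) and float RTK (5) as qualified, as one possible standard."""
    return gga_fix_quality(sentence) in accepted

sample = "$GPGGA,123519,4807.038,N,01131.000,E,1,08,0.9,545.4,M,46.9,M,,*47"
print(gga_fix_quality(sample), is_qualified(sample))   # 1 False (ordinary GPS fix)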
The optimization of all devices in aspects such as volume, weight, and rigidity allows all components to be compacted and secured into one load-bearing vest. Figure 14.19 shows the configuration of the ARMOR backpack and the allocation of hardware. There are three primary pouches: the back pouch accommodates the AgGPS 332 receiver, the SiteNet 900 is stored in the right-side pouch, and the left-side pouch holds the HMD interface box to the PC and the MP3750 battery. An Asus N10J netbook is securely tied to the inner part of the back pouch. All other miscellaneous accessories (e.g., USB-to-serial port hubs, AAA batteries) are distributed in the auxiliary pouches. The wire lengths are customized to the vest, which minimizes outside exposure. The configuration of the vest has several advantages over the Kensington Contour laptop backpack used by ARVISCOPE. First, the design of the pouches allows for an even distribution of weight around the body. Second, the separation of devices allows the user to conveniently access and check the condition of particular hardware. Third, different parts of the loading vest are loosely joined so that the vest can fit any body type and can be worn rapidly even when fully loaded.



FIGURE 14.19 The profile of ARMOR from different perspectives.

ARMOR has been tested by several users during outdoor operations lasting over 30 continuous minutes, without any interruption or reported discomfort.

14.4 IMPLEMENTED AEC APPLICATIONS


14.4.1 AR-Assisted Building Damage Reconnaissance
Following a major seismic event, rapid and accurate evaluation of a building's condition is essential for determining its structural integrity for future occupancy. Current inspection practices usually conform to the ATC-20 postearthquake safety evaluation field manual and its addendum, which provide procedures and guidelines for making on-site evaluations (Rojah 2005). Responders (i.e., inspectors, structural engineers, and other specialists) often conduct visual inspections and designate affected buildings as green (apparently safe), yellow (limited entry), or red (unsafe) for immediate occupancy (Chock 2006). The assessment procedure can take from minutes to days depending on the purpose of the evaluation (Vidal et al. 2009). However, it has been pointed out by researchers (Kamat and El-Tawil 2007, Tubbesing 1989) that this approach is subjective and thus may sometimes suffer from misinterpretation, especially given that building inspectors do not have enough opportunities to conduct building safety assessments and verify their judgments, as earthquakes are infrequent.
Despite the de facto national standard of the ATC-20 convention, researchers have been proposing quantitative measurements for a more effective and reliable assessment of structural hazards. Most of these approaches, especially noncontact ones, build on the premise that significant local structural damage manifests itself as translational displacement between consecutive floors, which is referred to as the interstory drift (Miranda et al. 2006). The interstory drift ratio (IDR), which is the interstory drift divided by the height of the story, is a critical structural performance indicator that correlates the exterior deformation with the internal structural damage. A larger IDR indicates a higher likelihood of damage. For example, a peak IDR larger than 0.025 signals the possibility of a serious threat to human safety, and values larger than 0.06 translate to severe damage (Krishnan 2006).
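The IDR definition and the thresholds quoted above can be illustrated with a short sketch. The residual floor displacements and the 4.2 m story height are hypothetical values chosen for illustration only.

def interstory_drift_ratios(floor_disp_m, story_height_m):
    """IDR per story: drift between consecutive floors divided by the story height."""
    return [(upper - lower) / story_height_m
            for lower, upper in zip(floor_disp_m[:-1], floor_disp_m[1:])]

# Hypothetical residual lateral displacements (m) at ground + 4 floors, 4.2 m stories.
displacements = [0.00, 0.03, 0.10, 0.28, 0.33]
for story, idr in enumerate(interstory_drift_ratios(displacements, 4.2), start=1):
    flag = "severe" if idr > 0.06 else "life-safety concern" if idr > 0.025 else "ok"
    print("story %d: IDR = %.3f (%s)" % (story, idr, flag))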


Calculating the IDR commonly follows contact (specifically, the double integration of acceleration) or noncontact (vision-based or laser scanning) methods. Skolnik and Wallace (2010) discussed that the double integration method may not be well suited for nonlinear responses due to sparse instrumentation or subjective choices of signal processing filters. On the other hand, most vision-based approaches require the preinstallation of a target panel or an emitting light source that may not be widely available and can be subject to damage during long-term maintenance. Examples of these approaches can be found in Wahbeh et al. (2003) (tracking an LED reference system with a high-fidelity camera) and Ji (2010) (using feature markers as reference points for vision reconstruction). Fukuda et al. (2010) tried to eliminate the need for target panels by using an object recognition algorithm called orientation code matching. They performed comparison experiments by tracking a target panel and existing features on bridges, such as bolts, and achieved satisfactory agreement between the two test sets. However, it is not clear whether this approach performs well when monitoring a building's structure, as building surfaces are usually featureless. In addition, deploying laser scanners for continuous or periodic structural monitoring (Alba et al. 2006, Park et al. 2007), in spite of their high accuracy, may not be feasible for rapid evaluation scenarios given the equipment volume and the large collected datasets.
Kamat and El-Tawil (2007) first proposed the approach of projecting the previously
stored building baseline onto the real structure, and using a quantitative method to
count the pixel offset between the augmented baseline and the building edge. Despite
the stability of this method, it required a carefully aligned perpendicular line of sight
from the camera to the wall for pixel counting. Such orthogonal alignment becomes
unrealistic for high-rise buildings, since it demands the camera and the wall to be
at the same height. Dai et al. (2011) removed the premise of orthogonality using a
photogrammetry-assisted quantification method, which established a projection relation between 2D photo images and the 3D object space. They validated this approach
with experiments that were conducted with a two-story reconfigurable aluminum
building frame whose edge could be shifted by displacing the connecting bolts.
However, the issue of automatic edge detection and the feasibility of deploying such a
method at large scales, for example, with high-rise buildings, have not been addressed.
In this Section, a new algorithm called line segment detector (LSD) for automating edge extraction and a new computational framework for automating the damage detection procedure are introduced. In order to verify the effectiveness of these
methods, a synthetic virtual prototyping (VP) environment was designed to profile
the detection algorithms sensitivity to errors inherent in the used tracking devices.
Figure 14.20 shows the schematic overview of measuring earthquake-induced damage
manifested as a detectable drift in a buildings faade. The previously stored building
information is retrieved and superimposed as a baseline wireframe image on the real
building structure after the damage. The sustained damage can be then evaluated by
comparing the key differences between the augmented baseline and the actual drifting
building edge. Figure 14.20 also demonstrates the hardware prototype ARMOR (Dong
and Kamat 2010) on which the developed application can be deployed. The inspector
wears a GPS antenna and an RTK radio that communicates with the RTK base station. Together they can track the inspector's position with centimeter-level accuracy. The estimation procedure and the final results can be shown to the inspector in an HMD.

FIGURE 14.20 Schematic overview of the designed AR-assisted assessment methodology; the inspector's equipment includes a GPS antenna, an electronic compass, a camera, an HMD, and an RTK rover receiver and radio. (From Dong, S. et al., Autom. Construct., 33, 24, 2013.)

Besides being a quantitative means of providing reliable damage estimation results, the vertical baseline of the building structure is also a qualitative alternative for visual inspection of local damage. By observing the graphical discrepancy between the vertical baseline and the real building edge, the on-site reconnaissance team can quickly, if approximately, assess how severe the local damage is in the neighborhood of the visual field. In other words, the larger the graphical discrepancy, the more severe the damage. The two snapshots of Figure 14.21 focus on different key locations of the building, but they are taken from the same angle (i.e., direction). The bottom-right window on each image is a zoom-in view of the key location. The two vertical lines in the zoom-in window represent the detected edge and the vertical baseline, respectively. The fact that the gap between the detected edge and the vertical baseline in Figure 14.21a is smaller than that of Figure 14.21b indicates that the key location in Figure 14.21b suffered more local damage than that of Figure 14.21a.

FIGURE 14.21 Graphical discrepancy between the vertical baseline and the detected building edge provides hints about the magnitude of the local damage. (a) Small gap indicates less local damage. (b) Large gap indicates more local damage. (From Dong, S. et al., Autom. Construct., 33, 24, 2013.)
14.4.1.1 Technical Approach for AR-Based Damage Assessment
A synthetic 3D environment based on VP principles was designed to demonstrate
and evaluate the computational framework, verify the developed algorithms, and
conduct sensitivity analysis. Figure 14.22 shows a 10-story building model simulated
in the VP environment.

FIGURE 14.22 A 10-story graphical and structural building model constructed in the VP environment. (a) Plan view. (b) Elevation view. (From Dong, S. et al., Autom. Construct., 33, 24, 2013.)

The graphical model is entirely reconfigurable and capable of manifesting any level of internal damage on its façade in the form of residual drift,
so that the IDR can be extrapolated for each floor. Given the input IDR, the structural
macro model predicts the potential for structural collapse and the mode of collapse,
should failure occur. The residual drift is represented by translating the joints of the
wireframe model that have been superimposed with a high-resolution faade texture.
The drift is further manifested through the displaced edges on the surface texture
that can be extracted using the LSD method.
Subsequently, the 2D intersections between extracted edges and projected horizontal baselines are used to triangulate the 3D spatial coordinates at key locations
on the building. The 2D image, where extracted edges and baselines are visible, is
taken by the OpenGL camera that is set up at specified corners in the vicinity of the
building. At each corner, the camera's orientation (i.e., pitch) is adjusted to take a
snapshot of each floor in sequence and project the corresponding horizontal baseline.
In reality, the location of the camera may be tracked by an RTK-GPS sensor, and its
orientation may be monitored with an electronic compass. In the simulated VP environment, the location and orientation of the camera are known and can be controlled
via its software interface. Random errors can thus be introduced to simulate the
effects of systemic tracking uncertainty or jitter expected in a field implementation.
In order to imitate the structural damage sustained after the disaster, a uniform distribution drift of [−0.06 m, 0.06 m], in both the x and y directions, is applied to each joint of the building, so that the difference between consecutive floors in either the x or y direction is less than 0.12 m. The selection of the damage model is derived from the requirement on inelastic IDR, which is commonly limited to 2.5% by building codes and is occasionally relaxed to 3% for tall buildings (Hart 2008). Given that the average height of a building floor is 3–4 m, the maximum allowable displacement between two consecutive floors will be 0.09–0.12 m when using the most relaxed IDR of 3%.
In addition, a reasonable assumption is made that unless the internal columns buckle
or collapse, the height of the building remains the same after the damage. Since the
column buckling or collapse situation is not modeled in the simulation, the z value of
the corner coordinate does not change. Each polygon vertex is assigned a 2D texture
coordinate, and the associated clipped texture is pasted onto the surface of the wall
(Figure 14.23). The texture can thus displace with the drifting vertex in the 3D space,
with the goal of estimating the vertex deformation through the displaced texture.
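The damage emulation just described can be summarized in a short sketch. The following Python fragment is a minimal illustration under assumed conventions (one drift sample per floor applied to all of that floor's joints, and the 4.2 m story height of the upper floors in Figure 14.22); it is not the authors' VP implementation.

```python
import random

STORY_HEIGHT = 4.2   # m; story height of the upper floors in the VP model (assumed)
DRIFT_LIMIT = 0.06   # m; uniform residual drift range per floor

def draw_floor_drifts(n_floors):
    """Draw a residual (dx, dy) drift for every floor from U[-0.06 m, 0.06 m]."""
    return [(random.uniform(-DRIFT_LIMIT, DRIFT_LIMIT),
             random.uniform(-DRIFT_LIMIT, DRIFT_LIMIT)) for _ in range(n_floors)]

def apply_drift(floor_joints, drifts):
    """Translate the wireframe joints of each floor; z stays fixed because
    column buckling or collapse is not modeled in the simulation."""
    return [[(x + dx, y + dy, z) for (x, y, z) in floor]
            for floor, (dx, dy) in zip(floor_joints, drifts)]

def extrapolate_idr(drifts, h=STORY_HEIGHT):
    """Inter-story drift ratio of each story from consecutive floor drifts."""
    return [(((dx2 - dx1) ** 2 + (dy2 - dy1) ** 2) ** 0.5) / h
            for (dx1, dy1), (dx2, dy2) in zip(drifts[:-1], drifts[1:])]
```

By construction, two consecutive floors can differ by at most 0.12 m in either direction, which is consistent with the 3% IDR bound discussed above.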
FIGURE 14.23 Internal structural damage (shift of the vertex) is expressed through the displacement of the texture. (From Dong, S. et al., Autom. Construct., 33, 24, 2013.)

14.4.1.2 Vertical Edge Detection
Vertical edge detection of the building wall is the most critical step for locating the key point on the 2D image plane, which happens to be a fundamental problem in
image processing and computer vision domains as well. Many algorithms for edge
detection exist and most of them use the Canny edge detector (Canny 1986) and
Hough transformation (Duda and Hart 1972) as a benchmark. However, standard algorithms are subject to two main limitations. First, they face threshold dependency (their accuracy depends on how the algorithm parameters are tuned). Second, they may generate false positives and negatives by either detecting too many irrelevant small line segments or failing to detect the desirable line segments. False
positives and negatives are highly related to the threshold tuning.
In the authors' research, initially, the graph-cut-based active contour (GCBAC)
algorithm (Xu and Bansal 2003) was used. By employing the concept of contour
neighborhood, GCBAC alleviates the local minima trapping problem suffered by
traditional active contour. However, GCBAC requires manual specification of the
initial contour and contour neighborhood width, quantities that are arbitrary and subjective. Optimization can be achieved by using the original baseline of the damaged
building to numerically calculate both the initial contour and neighborhood width.
GCBAC works best when the image covers the entire outline of the building, which is not practical in real applications. Unfortunately, covering the entire high-rise building surface inevitably results in lower-resolution details. Moreover, frequent
partial occlusion from trees and other buildings can compromise detection accuracy.
The second attempt was a linear-time LSD that gives accurate results, a controlled number of false detections, and (most importantly) requires no parameter
tuning (von Gioi et al. 2010). This method outperforms GCBAC in searching for
localized line segments. However, as shown in Figure 14.24, there may still be multiple line segment candidates in the neighborhood of the actual edge of the building
wall. A filter is used to eliminate those line segments whose slope and boundary
deviate significantly from the original baseline. Manual selection can be used if the
LSD fails to locate a desirable edge; the user can manually transpose the closest line
segment to the desirable position in a short amount of time.
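To illustrate the kind of geometric filter described above, the following Python sketch (an illustration only, not the authors' implementation; the 5° and 20-pixel thresholds are hypothetical) discards LSD segments whose orientation or perpendicular offset deviates too much from the projected baseline.

```python
import math

def slope_angle(seg):
    """Orientation of a segment ((x1, y1), (x2, y2)) in degrees, in [0, 180)."""
    (x1, y1), (x2, y2) = seg
    return math.degrees(math.atan2(y2 - y1, x2 - x1)) % 180.0

def point_to_line_distance(p, line):
    """Perpendicular distance from point p to the infinite line through `line`."""
    (x1, y1), (x2, y2) = line
    px, py = p
    num = abs((y2 - y1) * px - (x2 - x1) * py + x2 * y1 - y2 * x1)
    return num / math.hypot(x2 - x1, y2 - y1)

def filter_segments(segments, baseline, max_angle_diff=5.0, max_offset=20.0):
    """Keep only LSD segments whose slope and position stay close to the
    projected baseline (angle threshold in degrees, offset in pixels)."""
    ref_angle = slope_angle(baseline)
    kept = []
    for seg in segments:
        angle_diff = abs(slope_angle(seg) - ref_angle)
        angle_diff = min(angle_diff, 180.0 - angle_diff)
        offset = max(point_to_line_distance(seg[0], baseline),
                     point_to_line_distance(seg[1], baseline))
        if angle_diff <= max_angle_diff and offset <= max_offset:
            kept.append(seg)
    return kept
```

Any segment that survives this filter can then be accepted automatically, or adjusted manually as described above when no surviving candidate is satisfactory.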

FIGURE 14.24 A geometric filter together with minimal manual reinforcement can efficiently
eliminate most irrelevant line segments. (From Dong, S. et al., Autom. Construct., 33, 24, 2013.)


14.4.1.3 Horizontal Edge Detection


Horizontal edge detection also plays an essential role in deciding the 2D coordinates
of the drifting corner. If the horizontal frames of windows roughly match with the
physical floors separating stories, then the horizontal edge can also be graphically
detected by LSD as the windows' bottom frames (Figure 14.25). However, since such an assumption is not universally true, in the authors' research, the horizontal baseline
(that physically separates stories on the damaged building surface) was numerically
projected to represent the horizontal edge. Such an approach is more generic than the
graphical detection. Since a floor is allowed to drift within the xy plane, its horizontal baseline has to be accordingly shifted before it is projected onto the 2D image so
as to match the real horizontal edge.
If the 2D projected horizontal edge does not strictly match with the real horizontal edge, then a gap is detectable, and it enlarges as the camera moves closer to the
building. Furthermore, the drift between the horizontal baseline and the horizontal
edge contains both parallel and perpendicular components (in x and y directions),
and the detectable gap is caused exclusively by the perpendicular component. The
drift on the z coordinate is not considered here since internal column buckling or
collapse is not considered in the damage model.
Unless the drift is known, it is impractical to deterministically position the edge in the xy plane. Therefore, all possible drifting configurations are exhaustively tested with a computational complexity of O(n^4). This happens because iterating through all the possible shift configurations of the two endpoints on one line segment costs O(n^2), given that only the perpendicular drift component between the edge and the baseline is considered. The union of the two line segments needed in the triangulation has O(n^4) complexity, as shown in Figure 14.26a, where n is the uniform distribution interval divided by the estimation step. A simple approximation can reduce the complexity from O(n^4) to O(n^2) without compromising accuracy.

FIGURE 14.25 The detectable gap between the original baseline and real edge enlarges as the camera gets closer to the building. (From Dong, S. et al., Autom. Construct., 33, 24, 2013.)

FIGURE 14.26 Alignment of horizontal baseline: shifting the two ends of the baseline with (a) different distances costs O(n^4) and (b) the same distance costs O(n^2). (From Dong, S. et al., Autom. Construct., 33, 24, 2013.)
Since the intersection between the baseline and the edge is close to one end point
of the line segment, only the (x, y) of that end point dominates the intersection accuracy, and the impact of the other end diminishes significantly given the ratio of the
drift magnitude over the distance between two endpoints. Therefore, the two points
on one line segment can share the same tested shifting value with O(n) complexity,
and subsequently the complexity of two line segments decreases to O(n^2), as shown
in Figure 14.26b.
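The reduced search can be sketched as follows. This is a minimal illustration, not the published implementation: the `triangulate` and `error_of` callbacks stand in for the projection/triangulation and confidence-scoring code, and the 0.06 m interval and 5 mm step are assumed values.

```python
def candidate_shifts(limit=0.06, step=0.005):
    """Discretize the uniform drift interval [-limit, limit] with a fixed step."""
    n = int(round(2 * limit / step)) + 1
    return [-limit + i * step for i in range(n)]

def search_corner(triangulate, error_of, limit=0.06, step=0.005):
    """Exhaustive O(n^2) search: both endpoints of each baseline share one
    perpendicular shift, so only one shift per image has to be iterated.

    triangulate(s1, s2) -> estimated (x, y, z) corner for shifts s1, s2
    error_of(estimate)  -> scalar confidence score (smaller is better)
    Both callbacks are placeholders for the actual projection/triangulation code.
    """
    best, best_err = None, float("inf")
    for s1 in candidate_shifts(limit, step):        # shift of the baseline in image 1
        for s2 in candidate_shifts(limit, step):    # shift of the baseline in image 2
            estimate = triangulate(s1, s2)
            err = error_of(estimate)
            if err < best_err:
                best, best_err = estimate, err
    return best
```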
14.4.1.4 Corner Detection
The next challenge is to select the best estimation from the n^2 candidates in the aforementioned iteration test. Each pair of tested shifts (Δx, Δy) of the baselines corresponds to an estimated 3D corner position (x′, y′, z′). If the actual 3D corner position is (x, y, z), and the height of the building remains the same after the damage, an intuitive judgment for the confidence of the estimation is min(z′ − z). A better judgment also takes (x′, y′) into account. For example, if the original 3D corner position is (x0, y0, z0), a proper tested shift (Δx, Δy) should be close to the estimated shift (x′ − x0, y′ − y0). In other words, (x′ − x0 − Δx, y′ − y0 − Δy) should be minimized. Based on the hypothesis discussed earlier, two filters are proposed for selecting the estimated corner coordinate. The first one minimizes the square root of the sum of squares of (x′ − x0 − Δx, y′ − y0 − Δy, z′ − z0). The second one sets thresholds for (x′ − x0 − Δx, y′ − y0 − Δy, z′ − z0) and selects the one with the smallest (|Δx|, |Δy|) among the filtering results. Experiment results indicated that there is no major performance gain of one over the other. The algorithm is
described as a flowchart in Figure 14.27.
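The two candidate-selection filters can be written out as below; this is a hedged sketch in which each candidate is a ((Δx, Δy), (x′, y′, z′)) pair, the origin is the pre-damage corner (x0, y0, z0), and the 0.01 m threshold is purely illustrative.

```python
import math

def filter_min_norm(candidates, origin):
    """Filter 1: pick the candidate minimizing the Euclidean norm of
    (x' - x0 - dx, y' - y0 - dy, z' - z0).

    candidates: list of ((dx, dy), (x, y, z)) pairs; origin: (x0, y0, z0).
    """
    x0, y0, z0 = origin
    def residual(item):
        (dx, dy), (x, y, z) = item
        return math.sqrt((x - x0 - dx) ** 2 + (y - y0 - dy) ** 2 + (z - z0) ** 2)
    return min(candidates, key=residual)

def filter_threshold(candidates, origin, tol=0.01):
    """Filter 2: keep candidates whose residual components fall within a
    threshold, then choose the smallest tested shift (|dx|, |dy|)
    (interpreted lexicographically in this sketch)."""
    x0, y0, z0 = origin
    admissible = [
        ((dx, dy), (x, y, z)) for (dx, dy), (x, y, z) in candidates
        if abs(x - x0 - dx) < tol and abs(y - y0 - dy) < tol and abs(z - z0) < tol
    ]
    pool = admissible or candidates
    return min(pool, key=lambda item: (abs(item[0][0]), abs(item[0][1])))
```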
14.4.1.5 IDR Calculation
Once the corner's position is estimated, the calculation of the IDR for each story
is straightforward. For example, in Figure 14.28, the horizontal movement of the
highlighted floor relative to the ceiling is denoted as P1P2, and the floor height is h.
Therefore, the IDR of this story can be calculated as (P1P2)/h.
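In code form (illustrative only; the 2D points and the 4.2 m story height are assumed example values):

```python
import math

def story_idr(p1, p2, h):
    """IDR of a story: the horizontal distance |P1P2| between the floor point P1
    and the corresponding ceiling point P2, divided by the story height h."""
    return math.hypot(p2[0] - p1[0], p2[1] - p1[1]) / h

# A 0.1 m relative horizontal movement over a 4.2 m story gives an IDR of about 2.4%.
print(story_idr((0.0, 0.0), (0.1, 0.0), 4.2))
```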

FIGURE 14.27 Flowchart of the proposed corner detection algorithm. (From Dong, S. et al., Autom. Construct., 33, 24, 2013.)

FIGURE 14.28 IDR calculation for a sample floor. (From Dong, S. et al., Autom. Construct., 33, 24, 2013.)

14.4.1.6 Experimental Results


Two sets of experiments were conducted. The goal of the first set was to test the best performance that the algorithm can achieve with ground-truth camera pose tracking data. Even given the ground-truth tracking data, the estimation accuracy can still be affected by many factors. Therefore, a series of comparison experiments were conducted to find the influence magnitude of each factor. It was found that the accuracy improves when the camera moves away from the building. In general, the lower the slope, the smaller the projected error, because increasing distance helps approximate an orthogonal perspective. Therefore, increasing the distance of the camera from the building has the effect of lowering the slope and attenuating the error.
Another experiment was conducted to understand whether the observing angle could affect the accuracy. The observing angle is the angle formed between the lines of sight of the two cameras (Figure 14.29).

FIGURE 14.29 Observing angle of the camera. (a, b) Covering both sides of the building. (c, d) Covering only one side of the building. (From Dong, S. et al., Autom. Construct., 33, 24, 2013.)

In the first group, two images from two perspectives covered both sides of the building wall, as shown in Figure 14.29a and b. In this case, the observing angle
is closer to a right angle. In the second group, one image covered both sides, while the
other covered only one side. In the third group, both images covered only one side, as
shown in Figure 14.29c and d. In the latter two cases, the observing angle is closer to
180°. It was concluded that the accuracy degenerates significantly when covering only one side of the wall. This indicates that the detection error is minimized when the angle formed by the two lines of sight is close to a right angle, and magnified when the angle is either acute or obtuse. In addition, it was found that a higher image resolution helps improve the accuracy of the LSD algorithm, which in turn increases the overall accuracy.
The goal of the second set of experiments was to test the robustness of the developed algorithm in the presence of instrument errors. The experiments were conducted
with the best configuration, as found from the results of the first set of experiments
(i.e., an 18-megapixel camera was placed about 35 m away from the building and provided coverage of both sides of the building). In the first test, ground-truth orientation data were assumed and only location error was introduced. In Figure 14.30, the z-axis shows the average estimation error in meters. The altitude RMS axis shows the accuracy response to the change in RTK-GPS altitude measurement uncertainty, and the longitude and latitude RMS axis shows the accuracy response to the change in both RTK-GPS longitude and latitude measurement uncertainty.
The results indicated that uncertainty in longitude and latitude has a bigger impact on the displacement error than uncertainty in altitude does, as indicated by the diagonal arrow in Figure 14.30. The results also showed that longitude and latitude uncertainties smaller than 3 mm can achieve a measurement accuracy of 5 mm, as indicated by the left-to-right arrow in Figure 14.30.
Given that the displacement error is linear in the GPS location accuracy, state-of-the-art RTK-GPS can meet the precision requirement. For example, manufacturer-specified accuracy reports an uncertainty of 1 mm (RMS) in latitude and longitude, and 2–3 mm (RMS) in altitude, in which case the displacement error stays below 5 mm (Trimble 2009).

FIGURE 14.30 Sensitivity of computed drift to camera position errors; displacement error (m) is plotted against longitude and latitude RMS (mm) and altitude RMS (mm). (From Dong, S. et al., Autom. Construct., 33, 24, 2013.)
The second test assumed ground-truth location data and introduced error only into the orientation readings. In Figure 14.31, the z-axis shows the average estimation error in meters. The pitch and roll RMS axis shows the accuracy response to the change in electronic compass pitch and roll reading uncertainty, and the yaw RMS axis shows the accuracy response to the change in electronic compass yaw reading uncertainty.
The results indicated that uncertainty in pitch and roll has a more adverse impact on the displacement error than yaw uncertainty does, as shown by the right-to-left arrow in Figure 14.31. Furthermore, a precision of 0.01° (RMS) on all three axes is required to keep the displacement error in the useful range, as indicated by the left-to-right arrow in Figure 14.31. Unfortunately, state-of-the-art electronic compasses mostly cannot satisfy this precision requirement. Most off-the-shelf electronic compasses report uncertainty larger than 0.1° (RMS), thus suggesting the need for survey-grade line-of-sight tracking methods for monitoring the camera's orientation. The third test considered the combined error from both location and orientation readings. As shown in Figure 14.32, the results indicated that the uncertainty from the electronic compass is the critical source of error.
FIGURE 14.31 Sensitivity of computed drift to camera orientation errors; displacement error (m) is plotted against pitch and roll RMS (deg) and yaw RMS (deg). (From Dong, S. et al., Autom. Construct., 33, 24, 2013.)

FIGURE 14.32 Sensitivity of computed drift to camera location and orientation errors. (From Dong, S. et al., Autom. Construct., 33, 24, 2013.)

In summary, the experimental results with ground-truth location and orientation data proved satisfactory for the damage detection requirements. The results also highlight the conditions for achieving the ideal measurement accuracy, for example, observing distance, angle, and image resolution. The experimental results with instrument errors reveal the bottleneck in field implementation. While the state-of-the-art RTK-GPS sensor can meet the location accuracy requirement, the electronic compass is not accurate enough to supply qualified measurement data, suggesting that alternative survey-grade orientation measurement methods must be identified to replace electronic compasses. The conducted sensitivity analysis developed a clear matrix revealing the relationship between instrument accuracy and the accuracy of computed drift, so the practical implementation of the proposed method can evolve with choices made for higher-accuracy instruments than the ones tested.

14.4.2 AR for Georeferenced Visualization and Emulated Proximity Monitoring of Buried Utilities
Most developed and developing countries around the world have a massive network
of underground conveyances that make modern-day life possible. In the United
States alone, underground infrastructure comprises about 20 million miles of pipe,
conduit, and cable (Patel and Chasey 2010). In a report to the Federal Laboratory
Consortium, Sterling (2000) noted that the urban underground has become a spider's web of utility lines, including phones, electricity, gas, fiber optics, traffic signals, streetlighting circuits, drainage and flood control facilities, water mains, and wastewater pipes. In some locations, major oil and gas pipelines, national defense communication lines, and rail and road tunnels also share the underground space. In addition,
the demand for new buried utilities is continuously increasing with new construction and reconstruction being fueled by growth as well as aging infrastructure. For
example, in the United States, the drinking water supply needs an additional $335 billion in infrastructure investments over 20 years for thousands of miles of pipeline to ensure public health and economic well-being (EPA 2007). As a result,
excavation contractors are continuously digging and trenching the ground to install
new utilities or repair existing lines. Since neither the machines (e.g., backhoes, trenchers, augers) nor their (skilled) operators can see what lies buried in their vicinity,
utilities are frequently struck and damaged (Lester and Bernold 2007).


In the United States, an underground utility is hit by an excavator every 60 s, causing billions of dollars of damage each year (Spurgin et al. 2009). From a safety, productivity, and economic perspective, excavation damage to buried utilities is thus a significant national problem. Pipeline and Hazardous Materials Safety Administration
(PHMSA) statistics from 1990 to 2009 identify excavation damage as the single biggest cause of all reported breakdowns in U.S. pipeline systems, accounting for about
35% of all incidents. Hits to utility lines cause interruption to daily life and commerce, and jeopardize the safety of workers, bystanders, and building occupants. For
U.S. pipelines, the 3-year period from 2007 to 2009 saw 1908 serious accidents, 36
fatalities, 174 injuries, and property damage of $800 million (PHMSA 2010). For
the same period (2007–2009), the 801 serious accidents that involved gas distribution lines caused 27 fatalities, 165 injuries, and property damage of $450 million
(PHMSA 2010). The U.S. Congress has identified the significance of the problem
several years ago and stated its concern in the Transportation Equity Act for the
twenty-first century (TEA-21, Title VII, Subtitle C, Section 87301), by noting that unintentional damage to underground facilities during excavation is a significant cause of disruptions in telecommunications, water supply, electric power, and other vital public services, such as hospital and air traffic control operations, and is a leading cause of natural gas and hazardous liquid pipeline accidents. This, in turn, accelerated research in utility location technologies for both new (e.g., Patel et al. 2010)
and existing underground utilities (e.g., Lester and Bernold 2007, SIS 2011, UIT 2010,
VUG 2010). Parallel advances in mapping technology also facilitated the adoption of
digital geodata archival and distribution methods by utility owners (e.g., ESRI 2005,
Porter 2010, UIT 2009b, Wyland 2009) and the evolution of standards for characterizing the quality of mapped geodata (e.g., ASCE 2002). Widespread recognition of
the problem also spawned the national Call Before You Dig campaign in 2000 to
increase public awareness about using 811, or the One-Call system (CGA 2010a).
In the United States, every excavation contractor is required by law to call
a One-Call Center 48 or more hours before digging. The centers serve as a
clearinghouse for excavation activities that are planned close to underground
utilities in the area. For example, Michigan's statute, Public Act 53 of 1974, requires anyone who engages in any type of excavation (e.g., grading, demolition, cultivating, augering, blasting, boring) to provide advance notice of at least
3 full working days to MDS, the One-Call utility notification organization in
Michigan. Figure 14.33 describes the typical steps of the One-Call excavation
damage prevention process using MDS as an example.
FIGURE 14.33 One-Call process for excavation damage prevention: the excavator calls the one-call center with sufficient lead time before digging; the one-call center shares the information with member utility owners; the utility owners dispatch field locators and markers; and the expected utility locations are marked with paint, stakes, or flags.

As soon as a prospective excavator (e.g., contractor) calls MDS with a locating request, MDS creates a request ticket and shares the information with its member
companies who may own buried assets in the proposed excavation area. The affected
utility owners then send out their own teams to the field with location information
to identify the utilities that exist in the planned excavation's vicinity. If reliable location information is not readily available, then the field teams may deploy suitable geophysical technologies to locate the pertinent buried lines and update their database. Geophysical technologies assist field locators in investigating the earth's subsurface. Ground penetrating radar (GPR), acoustic, seismic, RF identifiers, electromagnetic, and gyroscopic duct runners are various geophysical methods available to field locators (Costello et al. 2007). Since multiple geophysical technologies or combinations of devices are needed to locate different types of utilities, multisensory approaches
such as those developed by UIT (2009a), VUG (2010), and SIS (2011) can be used
to simultaneously locate and map different types of buried assets prior to a planned
excavation. In all cases (preexisting location information or geophysical surveys),
once the utilities are located, they are marked on the ground with paint, stakes, or
flags using a predefined color scheme that helps identify the types of buried lines
(CGA 2010b).
Despite advances in geophysical locating technologies, and independent of the
care with which known utility locations may be marked on the ground, there are
some fundamental challenges that make it very difficult for excavator operators to be
spatially aware of their surroundings, and thus avoid hitting utility lines while digging. Inaccurate, incomplete, or missing utility location information is often cited as
a cause of incidents involving excavators striking buried utilities (Patel et al. 2010). The current state of knowledge and the resulting state of practice have two critical limitations when considered from an excavator operator's perspective:

1. Lack of persistent visual guidance for spatial awareness: While the practice of marking utility locations on the ground helps in initial excavation
planning, such surface markings (e.g., paint, stakes, flags) are the first
to be destroyed or dislocated when excavation begins and the top soil or
surface is scraped. This makes it challenging for an excavator operator to
maintain spatial orientation; he or she must then rely on memory and judgment
to recollect expected utility locations as excavation proceeds. Seemingly
innocuous events such as returning to work after a break or stopping to help
another crew can prove detrimental because the operator must now recall
the marked locations before continuing to dig. Thus, excavator operators
and field supervisors have no persistent visual cues that can help them be spatially aware of the underground space surrounding an excavator's digging implement. Minor lapses in orientation or in recollecting marked utility locations can thus easily lead to accidents.
2. Inability to gauge proximity of excavator to buried assets while digging:
Another significant limitation is that an operator has no practical means of
knowing the distance of an excavator's digging implement (e.g., bucket) to
the nearest buried obstructions until they are exposed. Excavation guidelines
in most states including Michigan require buried utilities to be hand exposed


prior to the use of any power equipment (MDS 2007). Failure to follow the
hand exposure guidelines, which happens often out of ignorance or as a
conscious decision, means that the first estimate of proximity an operator
receives is when the digging implement actually touches a buried utility. It is
easy to understand why this first touch can often actually be a strike.
Field locators and markers typically mark only the horizontal location of utilities on
the ground, and often do not include any depth or other attribute information that may
possibly help operators better perceive their location in three dimensions (MDS 2009).
A possible justification for this practice is that locators believe that there are standard
depths where each utility type is typically buried. Second, even if depth and other
attribute information is marked on the ground along with a utility's horizontal location, the markings are destroyed early in the excavation process, placing the burden of remembering a utility's expected depth and orientation on the operator. Excavator-mounted devices such as EZiDig (CD 2010) can be partially useful by providing physical evidence of certain utilities (e.g., metallic pipes) existing in an excavator's path.
However, as noted earlier, accurate geolocation of all utilities may need a multisensory
approach that can be pursued prior to excavation, but is impractical to implement on a
digging excavator (Costello et al. 2007). Thus, without any visual cues or quantitative
feedback, excavator operators find it very challenging to gauge the evolving distance
between a digging machine and any unexposed utilities that lie buried in the vicinity.
14.4.2.1 Technical Approach for Visualization of Buried
Utility Geodata in Operator-Perspective AR
AR visualization can be achieved with optical see-through or video see-through
HMDs worn by the user to view a composite scene (Azuma et al. 2001). The form of AR visualization most suitable for this application is video see-through AR, where a video camera abstracts the user's view, and graphics are registered and superimposed on the video feed to create a continuous AR composite display (Kanbara et al. 2000). In order to achieve this, the line of sight (position and orientation) of the video camera must be continuously tracked so that the projection of the superimposed graphics in each frame can be computed relative to the camera's pose. As shown
in Figure 14.34, using AR visualization, as an excavator digs, the goal is to have
superimposed, color-coded geodata graphics that stay fixed to their intended ground
locations to continuously help orient the operator.
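In essence, the per-frame registration amounts to projecting every georeferenced vertex into the tracked camera's image. The following pinhole-projection sketch is a simplified illustration, not the registration algorithm used in the research; numpy, the parameter names, and the local-frame convention are assumptions.

```python
import numpy as np

def project_point(p_world, cam_pos, R_wc, fx, fy, cx, cy):
    """Project a georeferenced 3D point into pixel coordinates.

    p_world, cam_pos : 3-vectors in a local Cartesian frame (e.g., ENU meters)
    R_wc             : 3x3 rotation taking world coordinates to camera coordinates
    fx, fy, cx, cy   : intrinsic parameters of the calibrated video camera
    Returns (u, v) in pixels, or None if the point lies behind the camera.
    """
    p_cam = R_wc @ (np.asarray(p_world, float) - np.asarray(cam_pos, float))
    if p_cam[2] <= 0:           # behind the image plane; nothing to draw
        return None
    u = fx * p_cam[0] / p_cam[2] + cx
    v = fy * p_cam[1] / p_cam[2] + cy
    return u, v
```

In a working system, cam_pos would come from the RTK-GPS track and R_wc from the orientation sensing described below, refreshed every video frame.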
FIGURE 14.34 Overview of the designed approach for visualization of buried asset geodata in operator-view AR: geodata from archived GIS or geophysical surveys is characterized and converted to annotated 3D models, the position and orientation of the cab-mounted video camera are measured, and the 3D models are registered for visualization in operator-view augmented reality.

However, in practice, a first-person operator's perspective is not always feasible due to the physical encumbrance and potential distractions associated with traditional AR displays (e.g., HMDs). In order to provide a continuous AR display for operator spatial awareness, the most suitable approach is to enable AR visualization on a screen display mounted in the excavator's cabin (Figure 14.35). Such a display, when strategically mounted in the operator's field of view, may serve as a continuous excavation guidance tool, while providing an unobstructed view of the jobsite.
In order to abstract the operator's viewpoint, a high-speed FireWire camera is mounted on the roof of the excavator's cabin. The position of the camera will be continuously tracked using RTK-GPS, providing a location uncertainty of 1–2 in. Tracking of the excavator's 3D attitude (heading, pitch, and roll) can be intuitively done using a magnetic orientation sensor. However, the sensor may produce noisy data due to the excavator's metallic construction. In this case, an alternative approach is to use multiple dual-antenna RTK-GPS receivers. The antennas should be mounted on an excavator's cabin in such a way that their simultaneous position tracking helps interpret a directional vector in 3D space corresponding to the cabin's
articulation. A similar idea based on tracking the GPS locations of multiple points
on a machine has been attempted before for grade control and blade positioning on
bulldozers and graders (Roberts et al. 2002b). Having estimated the location of the
viewpoint (i.e., camera) and the orientation of its line of sight, a trigonometric 3D
registration algorithm is used to compute the pose of the geodata graphics that will
be superimposed at each frame to create a composite AR view.
14.4.2.2 Processing Geodata Vectors and Attributes for AR Visualization
The advent of Geospatial Information System (GIS) storage and retrieval, along with
accurate GPS data collection, has allowed fundamental improvements in methods
used to collect and depict utility data (Anspach 2011). Utility owners now typically
have a data record of their underground assets in some form of a GIS. For example,
in the U.S. state of Michigan, DTE (a major utility provider) archives its geodata as
GIS shapefiles using a proprietary database. In order to visualize such GIS geodata
in AR, the geodata must first be converted to a format suitable for graphical visualization. More importantly, the geodata accuracy and associated reliability must be
characterized before it can be used as information support for excavator operation
and control. In order to address the wide disparity in the source, quality, age, and
completeness of utility records, state-of-the-art utility GIS tools characterize geodata in terms of its precision and its pedigree (ProStar 2011). Precision characterizes
the accuracy that can be associated with the geodata (e.g., Map Grade for subfoot,
or Survey Grade for 4 in.). Pedigree, on the other hand, describes the lineage of the
geodata and its attributes (e.g., as-designed or as-built, type of utility, identity of data
collector, device used, aerial imagery). Together, the precision and pedigree help
characterize the accuracy of a geodata set and the reliability that can be associated
with its source. This information can be used to display not only the expected locations of utilities to an operator, but also the degree of uncertainty (or buffer) associated with the expected locations.
GIS shapefiles can be converted to a graphical 3D format using XML-based
encoding schemes such as the Geography Markup Language (GML) (Burggraf 2006).

FIGURE 14.35 Cabin-mounted screen display for persistent AR visualization.

FIGURE 14.36 Geodata converted to annotated 3D models. (a) Google Earth view. (From Talmaki, S. et al., Geospatial databases and augmented reality visualization for improving safety in urban excavation operations, Proceedings of the 2010 Construction Research Congress: Innovation for Reshaping Construction Practice, American Society of Civil Engineers, Reston, VA, 2010, pp. 91–101.) (b) Open Scene Graph (OSG) AR view.
GML, in particular, was specifically created for geospatial data as an open interchange format, and has been shown to be effective in transcoding utility geodatabases for use in visualization (e.g., Mendez et al. 2008). Figure 14.36 shows snapshots of experiments conducted in this research using converted geodata of a DTE electrical line: Google Earth was used to verify the accuracy of the geodata conversion, and an OSG-based environment was used to visualize the geodata in a 3D AR environment. The geodata used in these experiments was processed through a series of steps to be converted to annotated 3D models following the uniform color code used to identify paint markings made by field locators (CGA 2010b). In Figure 14.36a, the buffer, calculated by interpreting the geodata's precision and pedigree, is represented as a band or halo whose width represents the uncertainty associated with the utility's location. In Figure 14.36b, the buffer is represented by increasing the diameter of the cylindrical geometry representing the utility line.
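A minimal sketch of how such a buffer might be attached during conversion is shown below; the grade-to-buffer mapping and its values are purely illustrative assumptions, not part of any cited standard or dataset.

```python
# Hypothetical mapping from precision grade to buffer half-width (meters).
BUFFER_BY_GRADE = {
    "survey": 0.10,   # roughly the "4 in." survey-grade figure cited in the text
    "map": 0.30,      # sub-foot map grade
    "record": 1.00,   # unverified record drawings (assumed value)
}

def buffered_radius(pipe_radius_m, precision_grade):
    """Radius of the rendered cylinder = physical radius + uncertainty buffer."""
    return pipe_radius_m + BUFFER_BY_GRADE.get(precision_grade, 1.0)

print(buffered_radius(0.15, "map"))   # a 0.15 m conduit drawn with a 0.45 m halo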
14.4.2.3 Real-Time Proximity Monitoring of Excavators to Buried Assets
Proximity monitoring has been studied in many engineering contexts, particularly in automation and robotic applications. Examples in construction and manufacturing include worker–equipment proximity detection for safety alerts (Teizer et al. 2010), equipment–equipment proximity detection for situational awareness (Chi et al. 2008, Oloufa et al. 2003), and equipment–environment proximity detection for autonomous operation/navigation and obstacle avoidance (Kim et al. 2005, Son et al. 2010). Proximity computing methods in such applications can include radio detection and ranging (radar), sonar, GPS, radio-frequency identification (RFID), lasers, LIDAR, cameras (for computer vision), or combinations of these technologies (Ruff 2001, Teizer et al. 2010). Each of these methods requires one or both of the following: (1) the ability to install sensors on both autonomous entities whose proximity is monitored (e.g., Oloufa et al. 2003, Teizer et al. 2010) and/or (2) the ability to


model the environment surrounding an autonomous entity in real time to recognize


specific geometric objects so they can be avoided if needed (e.g., Kim et al. 2005, Kwon et al. 2004). For the application area described in this Section, neither of the
earlier requirements can be met. Installing any type of sensor on an excavator bucket
is impractical because it could be easily damaged. On the other hand, any possible
recognition of buried assets through real-time geometric modeling (e.g., LIDAR)
assumes that the assets are already visible, which is clearly not the case while an
excavator is digging. Therefore, the authors explored a new passive and interpreted
computation method (Figure 14.37) for monitoring the proximity between a digging
excavator and underground assets in the vicinity. In the designed approach, the following technical challenges were addressed:
1. Tracking an excavator's end-effector position with acceptable uncertainty
2. Monitoring the tracked excavator in a graphical 3D world
3. Interpreting the excavator's proximity to utilities known to exist in the vicinity in real time using a geometric approach
Measurement of end-effector position is currently implemented by several original equipment manufacturers (OEMs) such as Leica, Trimble, and Caterpillar (e.g.,
Kini et al. 2009, Roberts et al. 2002b) for grade control of excavating equipment,
providing operator feedback of absolute position of the digging implement in near
real time. As closed, turnkey equipment, these systems are typically not suitable for
interfacing with other instrumentation, and do not provide the flexibility to explore a
variety of sensing technologies and assess their accuracy. To facilitate this research,
the authors designed and assembled a hardware platform that provides interfaces to
several sensor technologies, has well-characterized errors and latencies, and meets
an accuracy target for location of the end effector in the subfoot range. Figure 14.38
is a schematic illustration of the designed excavator articulation tracking system.
Accurate measurement of excavator articulation angles is obtained directly using linear variable differential transformers (LVDTs), encoders, or similar transducers. Inertial sensors, while used in some commercial grade control systems (e.g., Leica, Trimble) to measure articulation angles, may be subject to crosstalk from cabin motion and may have slower dynamic response. Measurement of cabin position requires the use of an RTK-enabled GPS receiver. Also, measurement of cabin orientation is done using both a magnetic orientation tracker and a dual-antenna
RTK-GPS receiver to resolve heading, pitch, and roll angles.
FIGURE 14.37 Proximity monitoring of excavator's digging implement to buried assets: tracking of excavator articulation and the position of the digging implement, emulation in a 3D world populated with buried utility geodata models, detection of proximity to buried assets in the excavator's vicinity, and real-time knowledge-based excavator operation.

FIGURE 14.38 Monitoring excavator articulation; the instrumentation includes an RTK base antenna and receiver, a radio signal transmitter and receiver, an RTK rover antenna, and an LVDT.
Data acquisition and kinematic calculations are performed by a dedicated data acquisition system (DAS). The kinematic state of the excavator will be calculated in
real time, including, but not limited to, the end-effector position, joint positions, and
cabin position and orientation. A real-time data stream of the excavator kinematic
state, the status of the DAS, and calculation algorithm are broadcast to the rendering
computer. This data stream is also logged to allow excavator motions to be streamed
to the rendering computer offline. Latencies between the GPS, sensors, and cameras
are characterized and used to improve synchronization between all data sources in
the rendering process. If the excavator is equipped with an interface (typically CAN
bus) to OEM instrumentation, the DAS can also monitor and log data from that
instrumentation for comparison and evaluation purposes. Furthermore, the camera
mounted on the cabin of the excavator to provide a view of the forward scene is calibrated with respect to the cabin's coordinate system. This camera's trigger is either generated or recorded by the DAS to allow synchronization with other data sources within 100 ms. Additional cameras may also be mounted to provide other views (e.g., operator controls, operator face).
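As an illustration of the kinematic calculation, the following planar forward-kinematics sketch estimates the bucket-tip position from the cabin pose and the measured boom, stick, and bucket angles. The link lengths, angle conventions, and function names are assumptions for illustration, not the DAS implementation.

```python
import math

def bucket_tip_position(cab_xyz, heading_deg, joint_angles_deg,
                        links=(5.7, 2.9, 1.5), boom_pivot_height=2.0):
    """Planar forward kinematics of an excavator arm.

    cab_xyz          : cabin reference point in a local ENU frame (m)
    heading_deg      : cabin heading, clockwise from north
    joint_angles_deg : absolute boom, stick, and bucket angles from horizontal
    links            : boom, stick, and bucket lengths in meters (illustrative)
    Returns the estimated bucket tip position (east, north, up).
    """
    reach, height = 0.0, boom_pivot_height
    for angle, length in zip(joint_angles_deg, links):
        reach += length * math.cos(math.radians(angle))
        height += length * math.sin(math.radians(angle))
    hdg = math.radians(heading_deg)
    east = cab_xyz[0] + reach * math.sin(hdg)
    north = cab_xyz[1] + reach * math.cos(hdg)
    return east, north, cab_xyz[2] + height

# Cabin at the origin, heading due east, with illustrative joint angles.
print(bucket_tip_position((0.0, 0.0, 0.0), 90.0, (30.0, -40.0, -90.0)))
```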
The excavator's tracked kinematic articulation is used for proximity detection to
surrounding assets through monitoring it in a graphical 3D world by representing it
as a virtual avatar (i.e., a geometric model). For this, first, U.S. Geological Survey
(USGS) digital elevation models (DEMs) and satellite imagery are converted into
3D terrain models. Once a terrain model is created, the virtual world is populated
with geometric models representing vicinal utilities, converted using the process
described in the previous Section. The output of this step is a 3D virtual environment that consists of a geographically and visually correct terrain model, geolocated 3D representations of existing underground utilities, and a real-time 3D avatar
of the excavator operating in the area.

FIGURE 14.39 Snapshots of preliminary work in monitoring and proximity querying. (a) Real-time monitoring. (b) Geometric proximity estimation. (From Talmaki, S. and Kamat, V.R., J. Comput. Civ. Eng., 28(3), 04014001, 2014.)

The key capability achieved by implementing this emulated environment is that since all objects contained in this virtual world will be 3D models, geometric proximity detection algorithms can be directly used
to compute the distance between any selected pair of objects. Performing such a
computation on object pairs consisting of the excavator bucket and models representing buried assets thus enables monitoring of the excavator's proximity to known
locations of underground utilities.
Figure 14.39 shows snapshots of a monitored virtual environment comprising several tiles of the Michigan state plane grid. USGS DEM data were imported along
with satellite imagery to create the 3D terrain model of the area. Geodata representing DTE's electricity distribution lines buried in the area were also imported into
the environment and their coordinate frames were reconciled. A vehicle was instrumented with a GPS receiver and driven around the area and continuously tracked.
As shown in Figure 14.39a, while the vehicle was being tracked, its avatar was monitored and represented in the 3D world as an excavator. The goal of this simple experiment was to monitor only the vehicle's motion in real time along the 3D terrain as the first step for equipment–utility proximity estimation.
14.4.2.4 Monitoring Excavator–Utility Proximity Using a Geometric Interference Detection Approach
The designed geometric proximity querying method is applied to pairs of 3D
models, where each pair comprises the excavator's digging implement and
a graphical model of a buried asset in the vicinity. Since the geometric models
of both the monitored excavator and the utilities represent their best known real
locations, estimates of Euclidean distance between the two can be interpreted as
a measure of their true proximity. If the graphical model of a monitored excavator
bucket object comes close to the graphical model of an electrical conduit (or the
buffer zone around its expected location), then the conclusion is that the corresponding real excavator's bucket is also approaching the expected location of the
real buried electric line. A graphical 3D world representing a complex environment


generally consists of N moving and M stationary objects, where both N and M can
be arbitrarily large. In this case, for every monitored excavator (N), the task is to
estimate its proximity to M stationary objects representing buried assets.
The geometric computation of object-to-object proximity is an intensive task.
Efficiency in proximity computing algorithms is thus paramount. In order to address
this challenge, a two-phase, hierarchical approach based on a method originally
presented in Larsen et al. (1999) was investigated. Since activities on an excavation site are typically spread laterally, not all object pairs are possibly in close vicinity at any given time. A two-phase approach is efficient for this task because pairs of excavator–utility objects that are not within interacting distance could be quickly eliminated. This can possibly limit detailed pair-wise proximity tests to only those object pairs that are within interacting distance (i.e., utilities in the vicinity of the digging excavator). Figure 14.40 presents the two-stage proximity detection approach. In particular, following each excavator motion update, a quick approximate test based on loosely fitting bounding volumes called axis-aligned bounding boxes (AABBs) first finds potentially interacting excavator–utility object pairs, using a variation of the N-body sweep and prune approach originally proposed by Cohen et al. (1995). After identifying pairs of potentially interacting objects, an exact two-level test computes the Euclidean distance between the objects in each pair to determine the excavator's proximity to nearby utilities. In this stage, the algorithm constructs another kind of imaginary bounding volume, called swept sphere volumes (SSVs), around the digging implement and each buried asset object (Larsen et al. 1999). In order to improve computational efficiency, these bounding
volumes will then be organized into bounding volume hierarchies (BVHs), on which
proximity tests will be attempted.
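The two-phase idea can be sketched as follows; this is a simplified illustration (not the SSV/BVH implementation of Larsen et al. 1999) in which the broad phase inflates the bucket's AABB by an alert distance and the exact narrow-phase distance query is left to a callback.

```python
def aabb(points, margin=0.0):
    """Axis-aligned bounding box of a point cloud, optionally inflated by a margin."""
    xs, ys, zs = zip(*points)
    return ((min(xs) - margin, min(ys) - margin, min(zs) - margin),
            (max(xs) + margin, max(ys) + margin, max(zs) + margin))

def aabbs_overlap(a, b):
    (amin, amax), (bmin, bmax) = a, b
    return all(amin[i] <= bmax[i] and bmin[i] <= amax[i] for i in range(3))

def proximity_pass(bucket_points, utilities, exact_distance, alert_dist=2.0):
    """Two-phase excavator-utility proximity check.

    utilities      : dict name -> point cloud of the buried-asset model
    exact_distance : callback performing the narrow-phase (e.g., SSV/BVH) query
    Returns (name, distance) pairs for utilities within the alert distance.
    """
    bucket_box = aabb(bucket_points, margin=alert_dist)   # broad phase prune
    hits = []
    for name, pts in utilities.items():
        if not aabbs_overlap(bucket_box, aabb(pts)):
            continue                                      # pruned: far away
        d = exact_distance(bucket_points, pts)            # narrow phase
        if d <= alert_dist:
            hits.append((name, d))
    return hits
```

The broad phase eliminates most object pairs cheaply, so the expensive exact queries run only for the few utilities that could actually be within interacting distance.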
FIGURE 14.40 Two-stage proximity detection: incremental motion in the emulated 3D virtual world triggers pruning of distant object pairs with AABBs; flagged object pairs with possible proximity undergo exact pair-wise proximity tests with SSVs, and object pairs confirmed to be in close proximity feed the analysis and warning response.
The excavator's current articulation and the location of the nearest buried assets are then projected on the display along with the AR view. The computed distance
to the nearest buried assets is also displayed, along with identifying attributes to
help the operator correlate the presented information to utility lines displayed in
the AR view.
14.4.2.5 Experimental Results
The robustness of the designed excavator–utility proximity detection methodology was tested in a series of field experiments using ARMOR and the SMART framework. In particular, electricity conduits in the vicinity of the G.G. Brown Building at the University of Michigan were exported as Keyhole Markup Language (KML)
files from a Geodatabase provided by the DTE Energy Company. The following
procedure interprets KML files and builds conduit models:
1. Extract the spatial and attribute information of pipelines from the KML
file using libkml, a library for parsing, generating, and operating in KML
(Google 2008). For example, the geographical location of pipelines is
recorded under the Geometry element as LineString (Google 2012). A cursor is thus designed to iterate through the KML file, locate LineString elements, and extract the geographical locations.
2. Convert consecutive vertices within one LineString from geographical coordinates to local coordinates in order to improve computational efficiency
during the registration routine. The first vertex on the line string is chosen
as the origin of the local coordinate system, and the local coordinates of
the remaining vertices are determined by calculating the relative 3D vector between the rest of the vertices and the first one, using the Vincenty
algorithm.
3. In order to save storage memory, a unit cylinder is shared by all pipeline
segments as primitive geometry upon which the transformation matrix is
built.
4. Scale, rotate, and translate the primitive cylinder to the correct size, attitude, and position (a sketch of this step follows the list). For simplicity, the normalized vector between two successive vertices is referred to as the pipeline vector. First, the primitive cylinder is scaled along the X- and Y-axes by the radius of the true pipeline, and then scaled along the Z-axis by the distance between the two successive vertices. Second, the scaled cylinder is rotated about the axis formed by the cross product between the vector (0, 0, 1) and the pipeline vector, by the angle between the vector (0, 0, 1) and the pipeline vector (obtained from their dot product). Finally, the center of the rotated cylinder is translated to the midpoint between the two successive vertices. This step is applied to each pair of successive vertices.
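Under the stated convention of a shared unit cylinder, step 4 reduces to composing a scale, a rotation (built with Rodrigues' formula from the cross and dot products), and a translation. The following numpy sketch is illustrative only; the unit-cylinder convention (radius 1, height 1 along +Z, centered at the origin) and the function name are assumptions.

```python
import numpy as np

def segment_transform(v1, v2, radius):
    """4x4 transform placing a unit cylinder (radius 1, height 1 along +Z,
    centered at the origin) between two consecutive pipeline vertices."""
    v1, v2 = np.asarray(v1, float), np.asarray(v2, float)
    length = np.linalg.norm(v2 - v1)
    direction = (v2 - v1) / length                    # the "pipeline vector"
    z = np.array([0.0, 0.0, 1.0])

    axis = np.cross(z, direction)                     # rotation axis
    s, c = np.linalg.norm(axis), float(np.dot(z, direction))
    if s < 1e-9:                                      # already (anti)parallel to Z
        R = np.eye(3) if c > 0 else np.diag([1.0, -1.0, -1.0])
    else:
        k = axis / s
        K = np.array([[0, -k[2], k[1]], [k[2], 0, -k[0]], [-k[1], k[0], 0]])
        R = np.eye(3) + s * K + (1 - c) * (K @ K)     # Rodrigues' formula

    S = np.diag([radius, radius, length])             # scale X, Y by radius; Z by length
    T = np.eye(4)
    T[:3, :3] = R @ S
    T[:3, 3] = (v1 + v2) / 2.0                        # translate to the midpoint
    return T
```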
Figures 14.41 through 14.43 show several snapshots corresponding to different steps
of these field experiments.

FIGURE 14.41 Conduit loading procedure, conduits overlaid on Google Earth and field
experiment results.

FIGURE 14.42 Labeling attribute information and color coding on the underground utilities.

FIGURE 14.43 An x-ray view of the underground utilities.


14.4.3 AR for Collaborative Information Delivery


According to the Raytheon STEM Index, a new U.S. News annual index, despite
some signs of improvement, student aptitude for and interest in science, technology,
engineering, and mathematics (STEM) have been mostly flat for more than a decade,
even as the need for STEM skills continues to grow. The Raytheon Index shows that
after a long period of flat to down indicators, there has been some upward movement,
particularly in the actual number of STEM degrees granted at the undergraduate and
graduate levels. But even with those numbers on the rise, as a proportion of total
degrees granted they still hover close to the same levels that existed in 2000, indicating that the education pipeline to fill the current and future jobs that will require
STEM skills still is not producing enough talent (Alphonse 2014).
With particular regard to construction and civil engineering, a major STEM degree-granting discipline, research shows that a large percentage of students fail to properly
link their classroom learning to real-world engineering tasks and the dynamics
and complexities involved in a typical construction project (Arditi and Polat 2010,
Bowie 2010, Mills and Treagust 2003). Nonetheless, traditional information delivery
techniques (including the use of chalkboards, handouts, memorization, index cards,
and slide presentations), coupled with a focus on simplistic approaches and
unrealistic assumptions to formulate and solve complicated engineering problems
are still considered predominant instructional methods in many engineering curricula (Tener 1996). Several academic surveys have indicated that students complain
about the limited use of emerging technologies and advanced problem-solving tools
in the classroom (Behzadan and Kamat 2012). Engineering students also need to acquire the social and technical skills (e.g., critical thinking, decision making, collaboration, and leadership) required to be competent in the digital age.
One of the fastest emerging technologies in engineering education is visualization. Although instructional methods that take advantage of visualization techniques
have been around for several years, they still rely on traditional media and tools.
For example, students who take a course in construction planning may use drawings, scheduling bar charts, sand table models, and more recently, 3D CAD models.
However, none of these techniques are capable of effectively conveying information
on every aspect of a project. For instance, 2D or 3D models do not reflect temporal
progress, while scheduling bar charts do not demonstrate the corresponding spatial
layout. More recently, some studies have been conducted on linking 3D CAD models
with construction schedules, so as to exploit the dynamic 3D nature of construction
at the project level. This class of visualization technique is commonly known as
4D CAD, where, based upon the planned work sequence (i.e., project schedule), individual CAD components are added to the target facilities as time advances.
At the project level, 4D CAD modeling proves its value in minimizing the misinterpretation of a project sequence by integrating the spatial, temporal, and logical aspects of construction planning information (Koo and Fischer 2000). Unlike
project-level visualization in which only major time-consuming processes are animated, operations-level visualization explicitly represents the interaction between
equipment, labor, materials, and space (e.g., laying bricks, lifting columns). This
visualization approach is especially powerful when there is a need to elaborate on
operational details such as the maneuverability of trucks and backhoes in excavation areas, and the deployment of cranes and materials in steel erection. Such tasks
require careful and detailed planning and validation, so as to maximize resource
utilization and to identify hidden spatial collisions and temporal conflicts. Therefore,
this visualization paradigm can help engineers in validating and verifying operational concepts, checking for design interferences, and estimating overall constructability (Kamat and Martinez 2003).
From the point of view of collaborative learning, however, there are still major
gaps between research advancements in visualization and the integration of
visualization techniques in pedagogical settings, as well as outstanding implementation challenges that need to be properly addressed. This Section describes the latest work by the authors in using AR as an interconnecting medium to bridge the gap
between computer-based dynamic visualization and paper-based collaboratively
shared workspace. AR is one of the most promising candidates because it blends
computer-generated graphics with real scene backgrounds, using real-time registration algorithms. Users can work across the table face-to-face, shift the focus of
shared workspace instantly, and jointly analyze dynamic engineering scenarios. This
idea is developed and implemented in Augmented Reality Vitascope (ARVita), in
which multiple users wearing HMDs can observe and interact with dynamic simulated construction activities laid on the surface of a table.
Compared to VR, AR can enhance the traditional learning experience since
1. The ability to learn concepts and ideas through interacting with a scene
and building one's own knowledge (constructivist learning) facilitates the
generation of knowledge and skills that could otherwise take too long to
accumulate.
2. Traditional methods of learning spatially related content by viewing 2D
diagrams or images create a cognitive filter. This filter exists even when
working with 3D objects on a computer screen because the manipulation of
objects in space is done through mouse clicks. By using 3D immersive AR,
a more direct cognitive path toward understanding the content is possible.
3. Making mistakes during the learning process will have literally no real
consequence for the educator, whereas in traditional learning, the failure to
follow certain rules or precautions while operating machinery or handling a
hazardous material could lead to serious safety and health-related problems.
4. AR supports discovery-based learning, an instructional technique in which
students take control of their own learning process, acquire information,
and use that information in order to experience scenarios that may not be
feasible in reality given the time and space constraints of a typical engineering project.
5. An important objective of all academic curricula is to promote social interaction among students, and to teach them to listen, respect, influence, and
act. By providing multiple students with access to a shared augmented space
populated with real and virtual objects, they are encouraged to become
involved in teamwork and brainstorming activities to solve a problem,
which simultaneously helps them improve their communication skills.

Group discussion cultivates face-to-face conversation, where there is a dynamic and
easy interchange of focus between the shared workspace and the speakers' interpersonal
space. The shared workspace is the common task area between collaborators, while
the interpersonal space is the common communication space. The former is usually
a subset of the latter (Billinghurst and Kato 1999). Educators can use a variety of
nonverbal cues to quickly shift the focus of a shared workspace accordingly, and thus
work more efficiently. Compared to VR, AR by definition better supports the prospect
of collaborative learning and discussion. As previously stated, AR has been widely
studied in construction in areas including but not limited to operations visualization,
computer-aided operations, project schedule supervision, and component inspection.
However, there are few examples in the collaborative AR domain. For instance, Wang
and Dunston (2008) developed an AR face-to-face design review prototype and conducted test cases for collaboratively performing error detection. Hammad et al. (2009)
applied distributed AR for visualizing collaborative construction tasks (e.g., crane
operations) to check spatial and engineering constraints in outdoor jobsites. To date,
none of the previous work in this domain allows users to validate simulated processes
and learn from the results by collaboratively observing animations of dynamic operations. The following Sections include a detailed description of the design requirements,
technical implementation, and capabilities of ARVita, a collaborative AR-based learning system for visualizing and studying dynamic construction operations.
14.4.3.1 Technical Approach for AR-Based Information Delivery
The software architecture of ARVita conforms to the classical MVC pattern as
shown in Figure 14.44. A Model is responsible for initializing, archiving, and updating the VITASCOPE scene node. The manipulation of the VITASCOPE scene
node is possible because the VITASCOPE visualization engine contains a list of
application programming interfaces (APIs) granting developers full control of the
underlying animation process. A Controller communicates the user's I/O commands
to the VITASCOPE API wrapped inside the Model. The communication channel is
powered by the Fast Light Toolkit (FLTK), which translates and dispatches mouse and
keyboard messages to the Model and the View.

FIGURE 14.44 The software architecture of ARVita conforms to the MVC pattern. Each
arrow indicates a "belongs to" relationship.
The View displays the updated Model content, and this is based on the premise
that it can correctly set up projection and ModelView matrices of the OpenGL cameras. First, a camera projection matrix is populated at the start of the program based
on the camera calibration result. This is to make sure that the OpenGL virtual camera and real camera share a consistent view volume. Second, the ModelView matrix
(the pose of the OpenGL camera) is updated in every frame based on the marker
tracking results so that CAD models are transformed to the correct pose relative
to the camera.
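As a concrete illustration of these two steps, the following minimal sketch (assumed helper types and conventions, not ARVita's actual source) populates an OpenSceneGraph camera's projection matrix once from pinhole intrinsics and refreshes its ModelView matrix from the tracked marker pose each frame:

```cpp
// Minimal sketch (assumed types, not ARVita's code): map pinhole intrinsics
// to an OpenGL off-axis frustum once at startup, and copy the marker pose
// into the view (ModelView) matrix every frame.
#include <osg/Camera>
#include <osg/Matrixd>

// Hypothetical calibration result (principal point in pixels from the
// image's top-left corner; focal lengths in pixels).
struct Intrinsics { double fx, fy, cx, cy, width, height; };

void setProjectionFromCalibration(osg::Camera* cam, const Intrinsics& k,
                                  double zNear, double zFar) {
  // Off-axis frustum so the virtual and the real camera share one view volume.
  const double left   = -zNear * k.cx / k.fx;
  const double right  =  zNear * (k.width  - k.cx) / k.fx;
  const double bottom = -zNear * (k.height - k.cy) / k.fy;
  const double top    =  zNear * k.cy / k.fy;
  cam->setProjectionMatrixAsFrustum(left, right, bottom, top, zNear, zFar);
}

void updateModelViewFromMarker(osg::Camera* cam, const osg::Matrixd& markerToCamera) {
  // The tracker reports the camera pose relative to the marker (world) frame;
  // using it as the view matrix places the CAD models correctly in this frame.
  cam->setViewMatrix(markerToCamera);
}
```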
The Tracker reads updated video frames and stores the detected physical marker
descriptor in the Marker. Consequently, the Marker calculates the camera's pose in
the world coordinate system based on the descriptor, and updates the ModelView
transformation node. ARVita chooses to comply with this Tracker and Marker mechanism because it is an abstract layer to separate the tracking and rendering logic.
This makes ARVita versatile in accommodating different tracking (i.e., registration)
procedures. It currently supports two available trackers: ARToolkit (Hirokazu and
Billinghurst 1999), which is a widely used fiducial marker tracking library, and KEG
(Feng and Kamat 2012), which was developed at the University of Michigan and is a
natural marker tracking library.
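The kind of abstraction described here can be sketched as a small interface; the names below are illustrative rather than ARVita's actual classes, and the frame type is only a placeholder:

```cpp
// Illustrative sketch of a Tracker/Marker abstraction (not ARVita's classes):
// every concrete tracker consumes video frames and reports the same kind of
// observation, so rendering code never depends on the registration procedure.
#include <osg/Matrixd>

struct VideoFrame {
  int width = 0, height = 0;
  const unsigned char* pixels = nullptr;   // e.g., RGB data from a web camera
};

struct MarkerObservation {
  bool visible = false;       // was the marker found in this frame?
  osg::Matrixd modelView;     // camera pose relative to the marker
};

class AbstractTracker {
 public:
  virtual ~AbstractTracker() = default;
  // Process a new frame and return the latest marker observation.
  virtual MarkerObservation track(const VideoFrame& frame) = 0;
};
// Concrete subclasses (e.g., a fiducial tracker in the ARToolkit style or a
// KEG-style natural-marker tracker) implement track(); swapping trackers
// therefore does not touch the rendering logic.
```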
The video resource is pasted as dynamic texture on the background and the eagle
window. Despite the stability of the trackers used in ARVita, they all require the
majority of the marker, or even the entire marker, to be visible in the video frame. For
example, in the ARToolkit, the CAD models could immediately disappear as soon as a
tiny corner of the marker is lost from the camera's sight. This limitation is much more
severe when the animated jobsite covers the majority of the screen, which makes it
very difficult to keep the entire marker within the camera's view volume. The eagle window
is thus valuable for mitigating these flaws. It can be toggled on and off by the user.
When the user moves the camera to look for a vantage point, the eagle window can be
toggled such that the user is aware of the visibility of the marker. When the camera is
set to static, and the user is paying attention to the animation, the eagle window can be
toggled off so it does not affect the observation.
FLTK plays the role of the Controller in ARVita, and it translates and dispatches
a user's interaction/input messages to the system. The motivation behind ARVita
is to allow multiple users to observe the animation from different perspectives,
and to promptly shift the focus of the shared working space in a natural way. As
shown in Figure 14.45, these natural interactions include rotating the marker to find
vantage points, and pointing at the model to attract others' attention. Given that the
scale of the model may prevent users from getting close enough to regions of interest,
ARVita provides users with basic zooming and panning functionalities.
FIGURE 14.45 Two users are observing the animation lying on the table.

The focus of the shared working space can be switched not only spatially but also
temporally. Users can choose to observe the animation at different speeds, or jump
instantaneously along the timeline (Figure 14.46). The ARVita Controller wraps
the existing VITASCOPE APIs, such as vitaExecuteViewRatioChange() and
vitaExecuteTimeJump(), in a user-friendly interface, as most media players do,
with features such as fast-forward buttons and a progress bar.
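A minimal sketch of such a media-player style Controller is shown below. The two VITASCOPE calls are the ones named above, but their signatures are not given in the text, so the stubs (and the slider range) are assumptions:

```cpp
#include <cstdio>
#include <FL/Fl.H>
#include <FL/Fl_Window.H>
#include <FL/Fl_Button.H>
#include <FL/Fl_Slider.H>

// Stand-ins for the VITASCOPE calls named above; the real signatures are not
// given in the text, so these stubs (and their parameters) are assumptions.
static void vitaExecuteViewRatioChange(double speedRatio) { std::printf("speed x%.1f\n", speedRatio); }
static void vitaExecuteTimeJump(double simTime)           { std::printf("jump to t=%.1f s\n", simTime); }

// Fast-forward button: play the animation at double speed.
static void onFastForward(Fl_Widget*, void*) { vitaExecuteViewRatioChange(2.0); }

// Progress bar: jump along the simulation timeline to the slider position.
static void onSeek(Fl_Widget* w, void*) { vitaExecuteTimeJump(static_cast<Fl_Slider*>(w)->value()); }

int main(int argc, char** argv) {
  Fl_Window win(340, 90, "ARVita playback controls (sketch)");
  Fl_Button ff(10, 10, 150, 25, "Fast-forward");
  ff.callback(onFastForward);
  Fl_Slider bar(10, 50, 320, 25);
  bar.type(FL_HOR_SLIDER);
  bar.bounds(0.0, 600.0);        // assumed total animation length in seconds
  bar.callback(onSeek);
  win.end();
  win.show(argc, argv);
  return Fl::run();
}
```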
As shown in Figure 14.47, the VITASCOPE scene node (the core logic of the
Model) resides at the bottom of the tree. The vitaProcessTraceFile() function is called in every frame to update the animation logic. Above the scene node
is a coordinate transformation node. Since all tracking algorithms used in ARVita
assume that the z-axis is pointing up and use the right-hand coordinate system, this
transformation re-expresses VITASCOPE's y-up content in ARVita's default right-hand,
z-up system, so that the jobsite model is laid horizontally above the marker.
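Assuming the y-up to z-up remapping described above, a sketch of such a coordinate transformation node might look as follows (illustrative only; if the handedness conventions also differ, an additional axis flip would be required):

```cpp
#include <osg/Math>
#include <osg/MatrixTransform>
#include <osg/ref_ptr>

// Wrap the VITASCOPE scene root in a transform that re-expresses its y-up
// content in the z-up frame assumed by the trackers: a +90 degree rotation
// about the x-axis maps +y to +z, so the jobsite lies flat on the marker
// plane. (Handedness differences would need an extra flip; omitted here.)
osg::ref_ptr<osg::MatrixTransform> makeYUpToZUpNode(osg::Node* vitascopeRoot) {
  osg::ref_ptr<osg::MatrixTransform> xform = new osg::MatrixTransform;
  xform->setMatrix(osg::Matrix::rotate(osg::DegreesToRadians(90.0),
                                       osg::Vec3d(1.0, 0.0, 0.0)));
  xform->addChild(vitascopeRoot);
  return xform;
}
```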
FIGURE 14.46 Steel erection activities at different timestamps.

FIGURE 14.47 The realization of the MVC model with OSG.

The core of the View is the FLTK_OSGViewer node, which inherits methods from both
the FLTK Window class and the osgViewer class and thus functions as the glue between
FLTK and OSG. Under its hood are the ModelView transformation node and video
stream display nodes. The ModelView matrix is updated in each frame by the tracking event callbacks. This approach follows osgART's example (OSG ARToolkit), which
uses the Tracker and Marker updating mechanism to bundle the tracking procedure
(e.g., ARToolkit) and OSG together. As shown in Figure 14.48, both Tracker and
Marker are attached as event callbacks to their respective nodes in the graph.
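The event-callback idea can be sketched as follows, in the osgART style; the pose holder and class names here are hypothetical, not osgART's or ARVita's actual types:

```cpp
// Hypothetical event callback in the osgART style: before each frame is
// rendered, copy the latest tracked pose into the ModelView transform and
// hide the CAD content whenever the marker is lost.
#include <osg/MatrixTransform>
#include <osg/NodeCallback>

// Minimal pose holder assumed to be refreshed by the tracking routine.
struct TrackedPose {
  bool visible = false;
  osg::Matrixd modelView;
};

class MarkerUpdateCallback : public osg::NodeCallback {
 public:
  explicit MarkerUpdateCallback(const TrackedPose* pose) : pose_(pose) {}
  void operator()(osg::Node* node, osg::NodeVisitor* nv) override {
    osg::MatrixTransform* xform = static_cast<osg::MatrixTransform*>(node);
    xform->setNodeMask(pose_->visible ? ~0u : 0u);   // hide models when tracking is lost
    if (pose_->visible) xform->setMatrix(pose_->modelView);
    traverse(node, nv);                              // continue scene-graph traversal
  }
 private:
  const TrackedPose* pose_;
};
// Usage (sketch): modelViewNode->setEventCallback(new MarkerUpdateCallback(&sharedPose));
```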
The fiducial marker tracking method is efficient and fast. This is because the fiducial marker is usually designed as an artificial picture that contains easy-to-extract
visual features (e.g., a set of black and white patterns). The extraction of these patterns (e.g., straight lines, sharp corners, circles) is fast and reliable. ARToolkit is one
of the oldest fiducial marker tracking methods, and is widely considered a benchmark in the AR community. However, it also suffers from the common shortcomings
of a fiducial marker, and requires all four of its corners to be visible to the camera so
that the camera pose can be computed.
This can cause frustration when the user navigates through the animated jobsite only to find the animated graphics blinking on and off due to loss of tracking.

FIGURE 14.48 osgART's tracker and marker updating mechanism.

This disadvantage was the motivation to look into natural markers, which are more
flexible with regard to the marker's coverage. Besides the advantage of partial coverage (Figure 14.49), a natural marker does not depend on special predefined visual
features such as those of the fiducial marker. In other words, the features can be
points, corners, edges, and blobs that appear in a natural image. The extraction of
these features is vital in establishing correspondence between the image observed by
the camera and the marker image, and in estimating the camera's pose.
Therefore, robust estimation usually requires the establishment of ample correspondence, which is a challenging issue. The KEG tracker used in ARVita inherits merits from both detection-based methods such as the Scale Invariant Feature Transform (SIFT) (Lowe 2004) or FERNs (Ozuysal et al. 2007), and tracking-based methods such as Kanade-Lucas-Tomasi (KLT) tracking (Lucas and Kanade 1981, Shi and Tomasi 1994). A robust detection-based method often demands a high computational budget and can hardly meet real-time frame rates. On the other hand, accumulated errors in a tracking-based method can be carried forward along consecutive frames, and thus compromise the tracking quality.
KEG takes advantage of the coding system in the AprilTags (Olson 2011) tracking library to identify the information associated with a certain marker. Even though AprilTags itself is part of the fiducial tracking family, it does not need to be fully covered by the camera once the identity of the marker is confirmed. Table 14.4 summarizes the profiles of detection-based and tracking-based methods.

FIGURE 14.49 The natural marker tracking library is flexible on marker visibility.

TABLE 14.4
Comparison between Two Natural Marker Approaches

Detection-based approach
  Principle: Identify matching feature points on each new frame.
  Relation between consecutive frames: Independent.
  Pros: Failure of estimation in one frame will in no way affect the next frame.
  Cons: Time consuming.

Tracking-based approach
  Principle: Follow the existing feature points from frame to frame.
  Relation between consecutive frames: The current frame is correlated with the previous one.
  Pros: Fast.
  Cons: Errors of estimation in one frame are carried forward, and the accumulated error will eventually lead to loss of tracking.
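To make the trade-off in Table 14.4 concrete, the following minimal sketch shows the generic "detect, then track, re-detect on decay" pattern using standard OpenCV calls. This is not the KEG implementation; the detection stand-in, thresholds, and function name are assumptions for illustration only.

```cpp
// Generic hybrid pattern: expensive detection only when tracking has decayed,
// cheap KLT optical flow every frame, robust homography from marker to image.
#include <opencv2/core.hpp>
#include <opencv2/imgproc.hpp>
#include <opencv2/video/tracking.hpp>
#include <opencv2/calib3d.hpp>
#include <vector>

cv::Mat estimateMarkerHomography(const cv::Mat& markerGray, cv::Mat& prevGray,
                                 const cv::Mat& currGray,
                                 std::vector<cv::Point2f>& markerPts,
                                 std::vector<cv::Point2f>& framePts) {
  if (framePts.size() < 30) {
    // Detection step (expensive). A real natural-marker tracker would match
    // keypoints between the marker and the frame; here we merely seed corners
    // on the marker image and assume the marker roughly fills the frame.
    cv::goodFeaturesToTrack(markerGray, markerPts, 200, 0.01, 7);
    framePts = markerPts;
    prevGray = currGray.clone();
  }
  // Tracking step (cheap, every frame): follow existing points with KLT flow.
  std::vector<cv::Point2f> nextPts;
  std::vector<unsigned char> status;
  cv::calcOpticalFlowPyrLK(prevGray, currGray, framePts, nextPts, status, cv::noArray());
  std::vector<cv::Point2f> keptMarker, keptFrame;
  for (size_t i = 0; i < status.size(); ++i)
    if (status[i]) { keptMarker.push_back(markerPts[i]); keptFrame.push_back(nextPts[i]); }
  markerPts = keptMarker;
  framePts  = keptFrame;
  prevGray  = currGray.clone();
  if (framePts.size() < 4) return cv::Mat();   // lost: the next call re-detects
  // Robust homography from the marker plane to the image (tolerates bad tracks).
  return cv::findHomography(markerPts, framePts, cv::RANSAC);
}
```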
14.4.3.2 Multiple Views in ARVita
The OSGCompositeViewer class is the key to upgrading the single-view version of
ARVita to the multiple-view version. This class is a container of multiple views that
keeps the views correctly synchronized and thread-safe. Each view plays the
same role as the FLTK_OSGViewer does in Figure 14.47, and independently maintains its own video, tracker, marker resources, and ModelView matrix. However,
these views share the same VITASCOPE scene node (Figure 14.50) for two reasons:
to synchronize animation across different views, and to save memory space by maintaining only one copy of each scene node.
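A minimal sketch of this shared-scene arrangement, written against the standard osgViewer::CompositeViewer API (the setup code and window placement are assumptions, not ARVita's source):

```cpp
// Every view gets its own window/camera resources, but all views share one
// scene root, so the animation stays synchronized and is stored only once.
#include <osg/Node>
#include <osg/ref_ptr>
#include <osgViewer/CompositeViewer>
#include <osgViewer/View>

osg::ref_ptr<osgViewer::CompositeViewer> makeSharedSceneViewer(osg::Node* sceneRoot,
                                                               int numViews) {
  osg::ref_ptr<osgViewer::CompositeViewer> composite = new osgViewer::CompositeViewer;
  for (int i = 0; i < numViews; ++i) {
    osg::ref_ptr<osgViewer::View> view = new osgViewer::View;
    view->setUpViewInWindow(50 + 100 * i, 50, 640, 480);   // one window per web camera
    view->setSceneData(sceneRoot);                         // shared VITASCOPE scene node
    // Per-view video background, Tracker, Marker, and ModelView nodes would
    // be attached here (omitted in this sketch).
    composite->addView(view);
  }
  return composite;   // composite->run() would then drive all views together
}
```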
The number of instance views depends on how many video capture devices are
available. Based on their system device ID, ARVita presents users with a list of
available web cameras as the program starts and lets the users choose the number
of views and corresponding web cameras (Figure 14.51). When one user interacts
with the model (through rotating the marker, zooming, or dragging the progress bar),
all of these spatial or temporal updates will be reflected in all other users' augmented
spaces, so that a consistent dynamic model is shared among all users.
FIGURE 14.50 All views possess their own video, tracker, and marker objects, but point to
the same VITASCOPE scene node.

FIGURE 14.51 Users can select multiple cameras that are detected by ARVita as the program starts.

The current version of ARVita supports running multiple views on a single computer, which indirectly limits the maximum number of participants. As more viewers join,
the computer quickly gets overloaded by maintaining too many video resources and
tracking procedures. Work is currently in progress to pursue an alternative distributed computing approach to overcome this limitation. As a generic architecture for
distributed computer simulation systems, the high-level architecture (HLA) can integrate
heterogeneous simulation software and data sources, while communicating between
computers and even platforms. HLA thus presents itself as a promising solution for a
distributed ARVita. However, having multiple views on one computer is still useful.
For example, in the multiview version of ARVita, one can observe animation from
different viewpoints simultaneously and thus acquire a broader comprehension of the
entire simulated processes. The next generation of ARVita will focus on offering students more flexibility and space when they observe a 3D animation model. Two thrusts
of improvements can be made. The first is to enable the tracking library to function in
an unknown environment so that a user's observation range is no longer limited by the
visibility of the marker. In other words, any flat table with an ample amount of features
could be a tracking plane. The second area for improvement relates to the efforts that are
being made to make ARVita comply with the rules of HLA. This compliance will allow
ARVita to be distributed and synchronized across computers. When this happens, students can run multiple instances of ARVita on their own computers, but still collaborate
on the synchronized model. The current version of ARVita software and its source code
can be found on the website: http://pathfinder.engin.umich.edu/software.htm.

14.5 SUMMARY AND CONCLUSIONS


In AEC applications, the requirement of blending synthetic and physical objects distinguishes AR from other visualization technologies in three aspects:
1. It reinforces the connections between people and objects, and promotes
engineers' appreciation of their working context.
2. It allows engineers to perform field tasks with awareness of both the
physical and synthetic environments.
3. It offsets the significant cost of 3D model engineering by including the real-world background.
This Chapter reviewed critical problems in AR and investigated technical approaches
to address the fundamental challenges that prevent this technology from being usefully deployed in AEC applications, such as the continuous alignment of virtual objects with
the real environment across time and space, the faithful blending of virtual entities
with the real background to create a sustained illusion of coexistence, and
the integration of these methods into a scalable and extensible AR computing framework that is openly accessible to the teaching and research community.
The research findings have been evaluated in several challenging AEC applications where the potential for significant economic and social impact is high.
Examples of validation test beds implemented include the following:
1. An AR postdisaster reconnaissance framework that enables building
inspectors to rapidly evaluate and quantify structural damage sustained by
buildings in seismic events such as earthquakes or blasts
2. An AR visual excavator-utility collision avoidance system that enables
workers to see buried utilities hidden under the ground surface, thus helping prevent accidental utility strikes
3. A tabletop collaborative AR visualization framework that allows multiple
users to observe and interact with visual simulations of engineering processes

REFERENCES
Aiteanu, D., Hiller, B., and Graser, A. (2003). A step forward in manual welding: Demonstration
of augmented reality helmet. Proceedings of the 2003 IEEE and ACM International
Symposium on Mixed and Augmented Reality. Tokyo, Japan.
Alba, M., Fregonese, L., Prandi, F., Scaioni, M., Valgoi, P., and Monitoring, D. (2006).
Structural monitoring of a large dam by terrestrial laser. Proceedings of the 2006 ISPRS
Commission V Symposium. Dresden, Germany.


Alphonse, L. (2014). New STEM index finds America's STEM talent pool still too shallow to
meet demand, U.S. News. http://www.usnews.com/news/stem-index/articles/2014/04/23/
new-stem-index-finds-americas-stem-talent-pool-still-too-shallow-to-meetdemand?int=9a5208 (accessed March 3, 2015).
Anspach, J. H. (2011). Utility data management. http://modot.org/business//Utility Data
Management.pdf (accessed September 15, 2011).
Arditi, D. and Polat, G. (2010). Graduate education in construction management. Journal of
Professional Issues in Engineering Education and Practice, 3, 175179.
ASCE (American Society of Civil Engineers) (2002). Standard guideline for the collection
and depiction of existing subsurface utility data. ASCE/CI Standard 38-02, Reston, VA.
Azuma, R. (1997). A survey of augmented reality. Teleoperators and Virtual Environments,
6(4), 355385.
Azuma, R., Baillot, Y., Behringer, R., Feiner, S., Julier, S., and MacIntyre, B. (2001). Recent
advances in augmented reality. Journal of Computer Graphics and Applications,
21(6), 3447.
Beder, C., Bartczak, B., and Koch, R. (2007). A comparison of PMD-cameras and stereovision for the task of surface reconstruction using patchlets. Proceedings of 2007 IEEE
Conference on Computer Vision and Pattern Recognition. Minneapolis, MN.
Behzadan, A. H. (2008). ARVISCOPE: Georeferenced visualization of dynamic construction
processes in three-dimensional outdoor augmented reality. PhD dissertation, Department
of Civil and Environmental Engineering, University of Michigan, Ann Arbor, MI.
Behzadan, A. H., Dong, S., and Kamat, V. R. (2012). Chapter 5: Mobile and pervasive
construction visualization using outdoor augmented reality. In Mobile and Pervasive
Computing in Construction, C. Anumba and X. Wang (eds.), pp. 5485. John Wiley &
Sons: Hoboken, NJ.
Behzadan, A. H. and Kamat, V. R. (2005). Visualization of construction graphics in outdoor
augmented reality. Proceedings of the 2005 Winter Simulation Conference, Institute of
Electrical and Electronics Engineers (IEEE). Orlando, FL.
Behzadan, A. H. and Kamat, V. R. (2007). Georeferenced registration of construction graphics
in mobile outdoor augmented reality. Journal of Computing in Civil Engineering, 21(4),
247258.
Behzadan, A. H. and Kamat, V. R. (2008). Resolving incorrect occlusion in augmented reality animations of simulated construction operations. Proceedings of the 15th Annual
Workshop of the European Group for Intelligent Computing in Engineering (EG-ICE),
European Group for Intelligent Computing in Engineering, Plymouth, U.K., pp.
2435.
Behzadan, A. H. and Kamat, V. R. (2009a). Interactive augmented reality visualization
for improved damage prevention and maintenance of underground infrastructure.
Proceedings of 2009 Construction Research Congress, American Society of Civil
Engineers, pp. 12141222. Seattle, WA.
Behzadan, A. H. and Kamat, V. R. (2009b). Automated generation of operations level construction animations in outdoor augmented reality. Journal of Computing in Civil
Engineering, 23(6), 405417.
Behzadan, A. H. and Kamat, V. R. (2012). A framework for utilizing context-aware augmented
reality visualization in engineering education. Proceedings of the 2012 International
Conference on Construction Applications of Virtual Reality (CONVR). Taipei, Taiwan.
Behzadan, A. H., Timm, B. W., and Kamat, V. R. (2008). General-purpose modular hardware
and software framework for mobile outdoor augmented reality applications in engineering. Advances Engineering Informatics, 22, 90105.
Berger, M.-O. (1997). Resolving occlusion in augmented reality: A contour based approach
without 3D reconstruction. Proceedings of 1997 IEEE Conference on Computer Vision
and Pattern Recognition. San Juan, PR.


Billinghurst, M. and Kato, H. (1999). Collaborative mixed reality. Proceedings of the 1999
International Symposium on Mixed Reality. Yokohama, Japan.
Bowie, J. (2010). Enhancing classroom instruction with collaborative technologies.
http://www.eschoolnews.com/2010/12/20/enhancing - classroom-instruction - with-
collaborative-technologies/ (accessed March 3, 2015).
Brooks, F. P. (1999). What's real about virtual reality? Journal of Computer Graphics and
Applications, 19(6), 1627.
Burggraf, D. S. (2006). Geography markup language. Data Science Journal, 5(October 19), 178.
Cable Detection (CD) (2010). EZiDig fact sheet. http://www.cabledetection.co.uk/ezidig
(accessed September 15, 2010).
Canny, J. (1986). A computational approach to edge detection. IEEE Transactions Pattern
Analysis and Machine Intelligence, PAMI, 8(6), 679698.
Chi, S., Caldas, C. H., and Gong, J. (2008). A crash avoidance framework for heavy equipment control systems using 3D imaging sensors. ITCon Special Issue on Sensors in
Construction and Infrastructure Management, 13, 118133.
Chock, G. (2006). ATC-20 post-earthquake building safety evaluations performed after
the October 15, 2006 Hawaii Earthquakes Summary and Recommendations for
Improvements (updated). Hawaii Structural Engineers Association: Honolulu, HI.
Cohen, J. D., Lin, M. C., Manocha, D., and Ponamgi, M. (1995). I-COLLIDE: An interactive
and exact collision detection system for large-scaled environments. Proceedings of the
ACM International Symposium on Interactive 3D Graphics, Association for Computing
Machinery, New York, pp. 189196.
Common Ground Alliance (CGA) (2010a). 811 campaign. http://www.call811.com/ (accessed
September 15, 2010).
Common Ground Alliance (CGA) (2010b). Locate accurately, dig safely. http://www.common
groundalliance.com/Content/NavigationMenu/Publications_and_Resources/Educational_
Programs/LocateAccuratelyBrochure.pdf (accessed September 15, 2010).
Costello, S. B., Chapman, D. N., Rogers, C. D. F., and Metje, N. (2007). Underground asset
location and condition assessment technologies. Tunneling and Underground Space
Technology, 22, 524542.
Dai, F., Lu, M., and Kamat, V. R. (2011). Analytical approach to augmenting site photos with
3D graphics of underground infrastructure in construction engineering applications.
Journal of Computing in Civil Engineering, 25(1), 6674.
Dong, S. (2012). Scalable and extensible augmented reality with applications in civil infrastructure systems. PhD dissertation, Department of Civil and Environmental Engineering,
University of Michigan, Ann Arbor, MI.
Dong S., Behzadan A. H., Feng C., and Kamat V. R. (2013). Collaborative visualization of
engineering processes using tabletop augmented reality. Elsevier Journal of Advances
in Engineering Software, 55, 4555.
Dong, S. and Kamat, V. R. (2010). Robust mobile computing framework for visualization of simulated processes in augmented reality. Proceedings of the 2010 Winter Simulation Conference,
Institute of Electrical and Electronics Engineers, pp. 31113122. Baltimore, MD.
Duda, R. O. and Hart, P. E. (1972). Use of the Hough transformation to detect lines and curves
in pictures. Communications of the ACM, 15(1), 1115.
EPA, United States Environmental Protection Agency. (2007). Drinking water needs survey
and assessment, fourth report to congress. http://www.epa.gov/safewater/needssurvey/
pdfs/2007/report_needssurvey_2007.pdf (accessed September 14, 2010).
ESRI. (2005). DTE energy, energy currentsGIS for energy, spring 2005, Redlands, CA. http://
www.esri.com/library/reprints/pdfs/enercur-dte-energy.pdf (accessed September 15, 2010).
Feng, C. and Kamat, V. R. (2012). A plane tracker for AEC automation applications. The
2012 International Symposium on Automation and Robotics in Construction, Mining
and Petroleum. Oulu, Finland.


Fortin, P.-A. and Hebert, P. (2006). Handling occlusions in real-time augmented reality:
Dealing with movable real and virtual objects. Proceedings of the 2006 Canadian
Conference on Computer and Robot Vision, pp. 5462. Quebec City, Canada.
Fukuda, Y., Feng, M., Narita, Y., Kaneko, S., and Tanaka, T. (2010). Vision-based displacement sensor for monitoring dynamic response using robust object search algorithm.
IEEE Sensors, 13(12), 19281931.
Georgel, P., Schroeder, P., Benhimane, S., and Hinterstoisser, S. (2007). An industrial augmented reality solution for discrepancy check. Proceedings of the 2007 IEEE and ACM
International Symposium on Mixed and Augmented Reality, pp. 111115. Nara, Japan.
Google (2008). Introducing libkml: A library for reading, writing, and manipulating KML.
http://google-opensource.blogspot.com/2008/03/introducing-libkml-library-for-
reading.html (accessed March 3, 2015).
Google (2012). KML documentation introduction. https://developers.google.com/kml/documentation/ (accessed March 3, 2015).
Gokturk, B. S., Yalcin, H., and Bamji, C. (2010). A time-of-flight depth sensor: System
description, issues and solutions. Proceedings of 2010 IEEE Conference on Computer
Vision and Pattern Recognition Workshop. San Francisco, CA.
Golparvar-Fard, M., Pena-Mora, F., Arboleda, C. A., and Lee, S. (2009). Visualization of construction progress monitoring with 4D simulation model overlaid on time-lapsed photographs. Journal of Computing in Civil Engineering, 23(6), 391404.
Hammad, A., Wang, H., and Mudur, S. P. (2009). Distributed augmented reality for visualizing collaborative construction tasks. Journal of Computing in Civil Engineering, 23(6), 418427.
Hart, G. C. (2008). An alternative procedure for seismic analysis and design of tall buildings located in the Los Angeles region. Los Angeles Tall Buildings Structural Design
Council (LATBSDC), Los Angeles, CA. http://www.tallbuildings.org/PDFFiles/2011LA-CRITERIA-FINAL.pdf (accessed March 3, 2015).
Hirokazu, K. and Billinghurst, M. (1999). Marker tracking and HMD calibration for a videobased augmented reality conferencing system. Proceedings of the 1999 IEEE and ACM
International Workshop on Augmented Reality (IWAR 99). San Francisco, CA.
Ji, Y. (2010). A computer vision-based approach for structural displacement measurement.
Sensors and Smart Structures Technologies for Civil, Mechanical, and Aerospace
Systems, 43(7), 642647.
Kamat, V. R. and El-Tawil, S. (2007). Evaluation of augmented reality for rapid assessment
of earthquake-induced building damage. Journal of Computing in Civil Engineering,
21(5), 303310.
Kamat, V. R. and Martinez, J. C. (2003). Automated generation of dynamic, operations
level virtual construction scenarios. Electronic Journal of Information Technology in
Construction (ITcon), 8, 6584.
Kanbara, M., Okuma, T., Takemura, H., and Yokoya, N. (2000). A stereoscopic video seethrough augmented reality system based on real-time vision-based registration.
Proceedings of the IEEE Virtual Reality 2000, pp. 255262. New Brunswick, NJ.
Kim, C., Haas, C. T., and Liapi, K. A. (2005). Rapid, on-site spatial information acquisition
and its use for infrastructure operation and maintenance. Automation in Construction,
14(5), 666684.
Kini, A. P., King, K., and Kang, S. (2009). Sensor calibration and real-time tracking of a backhoe loader using the iGPS system. Quality Digest Magazine. http://www.qualitydigest.
com/inside/cmsc-article/sensor-calibration-and-real-time-tracking-backhoe-loader-
using-igps-system.html (accessed September 15, 2011).
Koch, R., Schiller, I., Bartczak, B., Kellner, F., and Kser, K. (2009). MixIn3D: 3D mixed reality with
ToF-camera. Proceedings of 2009 Dynamic 3D Imaging (Dyn3D) Workshop, Jena, Germany.
Koo, B. and Fischer, M. (2000). Feasibility study of 4D CAD in commercial construction.
Journal of Construction Engineering and Management, 126(4), 251260.


Krishnan, S. (2006). Case studies of damage to tall steel moment-frame buildings in Southern
California during Large San Andreas Earthquakes. Bulletin of the Seismological Society
of America, 96(4A), 15231537.
Kwon, S., Boche, F., Kim, C., Haas, C. T., and Liapi, K. (2004). Fitting range data to primitives
for rapid local 3D modeling using sparse range point clouds. Journal of Automation in
Construction, 13, 6781.
Larsen, E., Gottschalk, S., Lin, M., and Manocha, D. (1999). Fast proximity queries with
swept sphere volumes. Technical report TR99-018. Department of Computer Science,
UNC-Chapel Hill, Chapel Hill, NC. http://gamma.cs.unc.edu/SSV/ssv.pdf (accessed
September 15, 2010).
Lepetit, V. and Berger, M.-O. (2000). A semi-automatic method for resolving occlusion in augmented reality. Proceedings of 2000 IEEE Conference on Computer Vision and Pattern
Recognition. Hilton Head Island, SC.
Lester, J. and Bernold, L. E. (2007). Innovative process to characterize buried utilities using
ground penetrating radar. Automation in Construction, 16, 546555.
Louis, J. and Martinez, J. (2012). Rendering stereoscopic augmented reality scenes with occlusions using depth from stereo and texture mapping. Proceedings of 2012 Construction
Research Congress, American Society of Civil Engineers, West Lafayette, IN.
Lourakis, M. (2011). Homest: A C/C++ library for robust, non-linear homography estimation.
http://www.ics.forth.gr/~lourakis/homest/ (accessed March 3, 2015).
Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. International
Journal of Computer Vision.
Lucas, B. D. and Kanade, T. (1981). An iterative image registration technique with an application to stereo vision. Proceedings of the Seventh International Joint Conference on
Artificial Intelligence.
Martinez, J. C. (1996). Stroboscope: State and resource based simulation of construction processes. Thesis, University of Michigan, Ann Arbor, MI.
Martz, P. (2007). OpenSceneGraph Quick Start Guide.
Mendez, E., Schall, G., Havemann, S., Junghanns, S., Fellner, D., and Schmalstieg, D.
(2008). Generating semantic 3D models of underground infrastructure. IEEE Computer
Graphics and Applications, 28(3), 4857.
Mills, J. E. and Treagust, D. F. (2003). Engineering education: Is problem-based or project-based learning the answer? Australasian Journal of Engineering Education, 3(2), 116.
Miranda, E., Asce, M., and Akkar, S. D. (2006). Generalized interstory drift spectrum. Journal
of Structural Engineering, 132(6), 840852.
MISS DIG Systems Inc. (MDS) (2007). One-Call Excavation Handbook. http://www.missdig.net/images/Education_items/2007onecall_handbook.pdf (accessed September 15,
2011).
Oloufa, A. A., Ikeda, M., and Oda, H. (2003). Situational awareness of construction equipment using GPS, wireless and web technologies. Automation in Construction, 12(6),
737748.
Olson, E. (2011). AprilTag: A robust and flexible visual fiducial system. Proceedings of the
IEEE International Conference on Robotics and Automation (ICRA). Shanghai, China.
Ozuysal, M., Fua, P., and Lepetit, V. (2007). Fast keypoint recognition in ten lines of code.
Proceedings of 2007 IEEE Conference on Computer Vision and Pattern Recognition, pp.18.
Minneapolis, MN
Park, H. S., Lee, H. M., Adeli, H., and Lee, I. (2007). A new approach for health monitoring of structures: Terrestrial laser scanning. Journal of Computer Aided Civil and
Infrastructure Engineering, 22(1), 1930.
Patel, A. and Chasey, A. (2010). Integrating GPS and laser technology to map underground
utilities installed using open trench method. Proceedings of 2010 Construction Research
Congress, American Society of Civil Engineers, Banff, Canada.


Patel, A., Chasey, A., and Ariaratnam, S. T. (2010). Integrating global positioning system with
laser technology to capture as-built information during open-cut construction. Journal
of Pipeline Systems Engineering and Practice, ASCE, 1(4), 147155.
PHMSA (Pipeline and Hazardous Materials Safety Administration) (2010). Stakeholder communications, consequences to the public and the pipeline industry. http://primis.phmsa.
dot.gov/comm/reports/safety/CPI.html (accessed September 15, 2011).
Piekarski, W. (2006) 3D Modelling with the tinmith mobile outdoor augmented reality system.
IEEE Computer Graphics and Applications, 26(1), 1417.
PMD (2010). PMD CamCube 3.0. http://www.pmdtec.com/news_media/video/camcube.php
(accessed March 3, 2015).
Porter, T. R. (2010). Navigation sensors enhance GIS capabilities. Midstream Business,
Houston, TX, July 1, 2010. http://www.pipelineandgastechnology.com/Operations/
SCADAAutomation/item63959.php (accessed February 10, 2011). Houston, TX.
ProStar (2011). About ProStar. http://www.guardianprostar.com/company.htm (accessed
September 10, 2011).
Roberts, G., Evans, A., Dodson, A. H., Denby, B., Cooper, S., and Hollands, R. (2002a). The
use of augmented reality, GPS and INS for subsurface data visualization. Proceedings of
the 2002 FIG XIII International Congress, pp. 112. Washington, DC.
Roberts, G., Ogundipe, O., and Dodson, A. H. (2002b). Construction plant control using RTK
GPS. Proceedings of the FIG XXII International Congress, Washington, DC.
Rojah, C. (2005). ATC-20-1 Field Manual: Post Earthquake Safety Evaluation of Buildings.
Applied Technology Council: Redwood City, CA.
Ruff, T. M. (2001). Monitoring blind spots: A major concern for haul trucks. Engineering
and Mining Journal, 202(12), 1726.
Ryu, S.-W., Han, J.-H., Jeong, J., Lee, S. H., and Park, J. I. (2010). Real-time occlusion culling for augmented reality. Proceedings of the 2010 Korea-Japan Joint Workshop on
Frontiers of Computer Vision, pp. 498503. Hiroshima, Japan.
Schall, G., Mendez, E., and Schmalstieg, D. (2008). Virtual redlining for civil engineering in
real environments, Proceedings of the IEEE International Symposium on Mixed and
Augmented Reality (ISMAR), Cambridge, U.K., pp. 9598.
Shi, J. and Tomasi, C. (1994). Good features to track. Proceedings of 1994 IEEE Conference
on Computer Vision and Pattern Recognition, pp. 593600. Seattle, WA.
Shin, D. H. and Dunston, P. S. (2008). Identification of application areas for augmented reality in industrial construction based on technology suitability. Journal of Automation in
Construction, 17(7), 882894.
Shreiner, D., Woo, M., Neider, J., and Davis, T. (2006). OpenGL Programming Guide. Pearson
Education. Boston, MA.
Skolnik, D. A. and Wallace, J. W. (2010). Critical assessment of interstory drift measurements.
Journal of Structural Engineering, 136(12), 15741584.
Son, H., Kim, C., and Choi, K. (2010). Rapid 3D object detection and modeling using range
data from 3D range imaging camera for heavy equipment operation. Automation in
Construction, 19(7), 898906.
Spurgin, J. T., Lopez, J., and Kerr, K. (2009). Utility damage prevention: What can your
agency do? Proceedings of the 2009 APWA International Public Works Congress &
Exposition. American Public Works Association (APWA): Kansas City, MO.
Sterling, R. (2000). Utility locating technologies: A summary of responses to a statement of need distributed by the Federal Laboratory Consortium for Technology Transfer. Federal Laboratory
Consortium Special Reports Series No. 9. Louisiana Tech University: Ruston, LA.
Subsurface Imaging Systems (SIS) (2011). Surface penetrating radar systems. http://
subsurfaceimagingsystems.com (accessed February 10, 2011).
Talmaki, S. A., Kamat, V. R., and Cai, H. (2013). Geometric modeling of geospatial data for
visualization-assisted excavation. Advanced Engineering Informatics, 27(2), 283298.


Teizer, J., Allread, B. S., Fullerton, C. E., and Hinze, J. (2010). Autonomous pro-active
real-time construction worker and equipment operator proximity safety alert system.
Automation in Construction, Elsevier, 19(5), 630640.
Tener, R. K. (1996). Industry-university partnerships for construction engineering education.
Journal of Professional Issues in Engineering Education and Practice, 122(4), 156162.
Tian, Y., Guan, T., and Wang, C. (2010). Real-time occlusion handling in augmented reality
based on an object tracking approach. Sensors, 10(4), 28852900.
Trimble (2009). Trimble R8 GNSS. http://www.trimble.com/Survey/trimbler8gnss.aspx
(accessed March 3, 2015).
Tubbesing, S. K. (1989). The Loma Prieta, California, Earthquake of October 17, 1989-Loss
Estimation and Procedures.
Underground Imaging Technologies (UIT) (2009a). Multi-sensor TDEM system. http://www.
uit-systems.com/emi.html (accessed February 10, 2011).
Underground Imaging Technologies (UIT) (2009b). Case studies. http://www.uit-systems.
com/case_studies.html (accessed February 10, 2011).
Underground Imaging Technologies (UIT) (2010). Seeing what can't be seen with 3D subsurface imaging. www.uit-systems.com (accessed February 10, 2011).
Vidal, F., Feriche, M., and Ontiveros, A. (2009). Basic techniques for quick and rapid postearthquake assessments of building safety. Proceedings of the 2009 International
Workshop on Seismic Microzoning and Risk Reduction, pp. 110. Almeria, Spain.
Vincenty, T. (1975). Direct and inverse solutions of geodesics on the ellipsoid with application
of nested equations. Survey Reviews.
Virtual Underground (VUG) (2010). Virtual underground. http://www.virtualug.com (accessed
February 10, 2011).
von Gioi, R., Jakubowicz, J., Morel, J.-M., and Randall, G. (2010). LSD: A Fast Line Segment
Detector with a false detection control. IEEE Transactions on Pattern Analysis and
Machine Intelligence, 32(4), 722732.
Wahbeh, A. M., Caffrey, J. P., and Masri, S. F. (2003). A vision-based approach for the direct
measurement of displacements in vibrating systems. Smart Materials and Structures,
12(5), 785794.
Wang, X. and Dunston, P. (2008). User perspectives on mixed reality tabletop visualization for faceto-face collaborative design review. Journal of Automation in Construction, 17(4), 399412.
Webster, A., Feiner, S., Macintyre, B., and Massie, W. (1996). Augmented reality in architectural construction, inspection, and renovation. Proceedings of 1996 ASCE Congress on
Computing in Civil Engineering. Anaheim, CA.
Wloka, M. M. and Anderson, B. G. (1995). Resolving occlusion in augmented reality.
Proceedings of Symposium on Interactive 3D Graphics, pp. 512. New York, NY.
Wyland, J. (2009). Rio grande electric moves mapping and field work into digital realm. PRWeb,
February 12, 2009. http://www.prweb.com/releases/mapping/utility/prweb2027014.htm
(accessed February 10, 2011).
Xu, N. and Bansal, R. (2003). Object segmentation using graph cuts based active contours.
Proceedings of 2003 IEEE Conference on Computer Vision and Pattern Recognition,
pp. 4653. Madison, WI.

15

Augmented Reality Human-Robot Interfaces toward Augmented Robotics

Maki Sugimoto

CONTENTS
15.1 Introduction................................................................................................... 399
15.2 Video See-Through Augmented Reality Interface for Tele-Operation.........400
15.3 Future Predictive Visual Presentation with Consideration of Real
Environment..................................................................................................404
15.4 Projection-Based Augmented Reality for Gaming Robots...........................405
15.5 Active Tangible Robots in AR Environment.................................................407
15.6 Conclusion.....................................................................................................408
References...............................................................................................................409

15.1 INTRODUCTION
This chapter introduces a set of example applications of human-robot interfaces
with augmented reality (AR) technology. Previously, robots were used in controlled
environments such as factories for automation, performing highly repetitious tasks.
Nowadays, robots are widely distributed throughout society and can be found performing a range of tasks such as daily housekeeping, search-and-rescue missions,
and human-robot communications. Figure 15.1 shows an example of robots ranging
from household appliances to mobile robots. AR technology is able to contribute to
visualizing information between robots and users in those situations.
Figure 15.2 shows examples of information related to robots and users. Robots
are able to capture valuable information with their embedded sensors, such as vision,
thermal, and depth sensors. Furthermore, robots have information related to their behavior,
such as internal motor control plans and trajectory records. It is possible to make
the cooperation between robots and users seamless by projecting such information
onto real scenes with AR technology.


FIGURE 15.1 Variety of robots. (Illustration by Motoko Kanegae.)


FIGURE 15.2 Information around robot and user.

15.2 VIDEO SEE-THROUGH AUGMENTED REALITY INTERFACE FOR TELE-OPERATION
Video see-through-based AR is one of the effective methods for creating visual
interfaces for robots. In particular, video see-through displays are commonly used for
remote operations (Milgram et al., 1995; Hashimoto et al., 2011). This section introduces some of the video see-through AR interfaces.
Tele-operated rescue robots (Tadokoro et al., 2000) have been developed for
search-and-rescue tasks within unknown, disordered regions such as collapsed buildings, post-earthquake debris, and fields of natural disaster. Time Follower's Vision
(Sugimoto et al., 2005) is one of the AR interfaces used for tele-robot operation.
Figure 15.3a shows a snapshot of a rescue robot. An egocentric view camera capturing the first-person viewpoint is typically installed in such remote-controlled robots.
Figure 15.3b shows an image captured by an egocentric view camera on the robot.
FIGURE 15.3 (a) Rescue robot and (b) camera image of egocentric view.

By observing the camera image without an efficient visual presentation interface,
the operator tends to mistake the position and direction of the robot, which leads to
a decrease in the probability of achieving critical mission objectives. In particular,
it is difficult for an operator who is not accustomed to the operation to estimate
the position and direction of the robot, and especially the distance to a target,
based strictly on camera images from the first-person viewpoint. The exocentric
view camera that is physically installed to the robot is very effective in such situations. However, to appropriately attach an exocentric view camera, physical expansion of the robot is needed. Such a large robotic body often disturbs the activity of
the robot due to physical constraints.
Considering the time base of the egocentric camera images, for instance,
when the robot moves forward, a previous image contains more environmental information
around the robot than the current image does. In such a situation, the viewpoint of the
previous image contains the current position of the robot, as shown in Figure 15.4.
In such a situation, by capturing a previous first-person-view image, a comprehensible third-person-perspective image can be generated even without reconstructing any complex environmental model. By superimposing a model of the robot on a
recorded image, it is possible to create a virtual third-person view that shows the current situation of the robot. Figure 15.5a shows fundamental concepts of the system.
Figure 15.5b shows a snapshot of a remote robot operation by Time Follower's Vision.
Figure 15.6a shows the system configuration of the tele-operation interface.
It consists of a tele-operated robot that has a first-person-view camera, a sensor to
record the position and direction of the robot, and a visual presentation interface for
the operator that presents a virtual third-person view. The operator controls the robot
by looking at the image presented on the screen. Figure 15.6b shows a snapshot of a
virtual third-person view generated by Time Follower's Vision.

FIGURE 15.4 Past field of view of the egocentric camera.

FIGURE 15.5 (a) Concept and (b) remote robot operation by Time Follower's Vision.
Through this configuration, the image captured by the camera during remote
operation is stored in a database along with time, position, and direction information.
To generate the virtual third-person view, the system selects the optimal background
image within the database, which provides information on both the current situation
and the surrounding environment of the robot.
The background image is selected by an evaluation function that considers the
field of view, the position of the camera, and the current position of the vehicle.
After a background image has been selected, a CG (Computer Graphics) model of
the vehicle is superimposed on the image to generate a virtual viewpoint as seen from
behind the vehicle. This contributes to the operability of the robot, since both the current
state of the vehicle and its environment are clearly visible to the operator.
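The chapter does not give the actual evaluation function, so the sketch below only illustrates the idea: each stored frame is scored by whether the robot's current position falls inside that frame's field of view and by how favorable the viewing distance is, and the best-scoring image becomes the background. All weights, thresholds, and type names are assumptions.

```cpp
#include <cmath>
#include <limits>
#include <vector>

constexpr double kPi = 3.14159265358979323846;

struct StoredFrame { double camX, camY, camYaw; };   // recorded pose of the past image
struct RobotState  { double x, y; };                 // current robot position

// Higher score = better third-person background for the current robot pose.
double scoreFrame(const StoredFrame& f, const RobotState& robot) {
  const double dx = robot.x - f.camX, dy = robot.y - f.camY;
  const double dist    = std::hypot(dx, dy);
  const double bearing = std::atan2(dy, dx);                          // robot as seen from the old camera
  const double offAxis = std::fabs(std::remainder(bearing - f.camYaw, 2.0 * kPi));
  if (offAxis > 0.5) return -std::numeric_limits<double>::infinity(); // robot outside that field of view
  return -offAxis - 0.2 * std::fabs(dist - 2.0);                      // prefer views from ~2 m behind
}

const StoredFrame* selectBackground(const std::vector<StoredFrame>& db,
                                    const RobotState& robot) {
  const StoredFrame* best = nullptr;
  double bestScore = -std::numeric_limits<double>::infinity();
  for (const StoredFrame& f : db) {
    const double s = scoreFrame(f, robot);
    if (s > bestScore) { bestScore = s; best = &f; }
  }
  return best;   // the CG robot model is then superimposed on this image
}
```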
FIGURE 15.6 (a) Virtual third-person view and (b) system configuration.

Although past images are readily available when the environment is static, problems
arise when the environment is dynamic, due to disparities between the present
environment and the stored images. To apply this method to a dynamic environment,
detecting the difference between them is a topic for future work.
The method explained earlier can be used in various fields. Figure 15.7 shows
snapshots of a four-wheel robot for outdoor environments and the virtual third-
person-view interface with it. A differential GPS (Global Positioning System) unit
was used as the position sensor for this implementation. It was possible to remotely
operate the robot in a closed tarmac course.

FIGURE 15.7 (a) Remote robot outdoors and (b) remote operation in a closed tarmac course.

15.3 FUTURE PREDICTIVE VISUAL PRESENTATION WITH CONSIDERATION OF REAL ENVIRONMENT
Advanced range finder technology provides robots with depth information about their
surrounding environments. Pathfinder Vision (Maeda et al., 2013) adds functionality
to the virtual third-person-view interface to assist future prediction by operators.
It supports short-term path planning by presenting images that consider the physical
interference between models of the surrounding environment and the current state
of a robot. This system generates a predictive robot model from operating information. Also, a model of obstacles in the real environment is obtained by a range finder.
By visualizing the predicted future, it is possible to prevent collisions before they
actually occur. This system helps operators discover more secure future trajectories
of the robot. Figure 15.8a shows a remote-operated robot with an RGBD camera
(a range finder integrated with an RGB camera).
Figure 15.8b shows a model of obstacles in the real environment. Complex environments represent difficult challenges for the operator during remote operation
of the robot. When investigating such environments, the operability takes priority,
and requires educated operators to generate the best performance.

FIGURE 15.8 (a) Tele-operated robot with RGBD camera and (b) model of obstacles.

FIGURE 15.9 Applying physical simulation (a) before and (b) after.

The third-person
viewpoint generated either by projecting the CG-based robot model to the stored
sequence of images or by directly capturing the scene using pole cameras lets the
operator become aware of the relative position of the robot in the surroundings.
However, it is still dependent on the operator's prior experience to predict the near-future event, such as the possibility of a collision or fall that may or may not result
from the current course of robot movement, which thus makes it hard to set the
proper route for the robot.
Pathfinder Vision provides an informative interface that supports the operator
in becoming cognizant of near-future events by generating and presenting images
that depict the predicted events, taking into account the physical interferences and the
current state of vehicle operation (Figure 15.9).
This system predicts where the robot will be after a few seconds, based on the
current state of operation. The prediction of the near-future robot state is governed by the physics simulation of the robot model and the geometric model of
the obstacles. The geometry of the obstacles is obtained by the range finder in real
time. A future prediction image is generated according to the simulation result,
and is presented to the operator through a visual interface for remote operation.
Having this kind of interface augments the ability of remote operators in complex
environments.
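As a simplified illustration of the prediction step: the actual system runs a physics simulation of the robot model against the range-finder geometry, whereas the sketch below merely dead-reckons a differential-drive robot over the prediction horizon and flags the first predicted pose that hits an assumed obstacle map.

```cpp
#include <cmath>
#include <vector>

struct Pose   { double x, y, theta; };
struct Circle { double x, y, r; };

struct ObstacleMap {                       // built from range-finder returns (assumed)
  std::vector<Circle> obstacles;
  bool collides(double x, double y) const {
    for (const Circle& c : obstacles)
      if (std::hypot(x - c.x, y - c.y) < c.r) return true;
    return false;
  }
};

// Integrate the current velocity command (v, omega) over horizonSec seconds.
std::vector<Pose> predictPath(Pose p, double v, double omega, double horizonSec,
                              double dt, const ObstacleMap& map, int* collisionIndex) {
  std::vector<Pose> path;
  *collisionIndex = -1;
  for (double t = 0.0; t < horizonSec; t += dt) {
    p.x += v * std::cos(p.theta) * dt;
    p.y += v * std::sin(p.theta) * dt;
    p.theta += omega * dt;
    path.push_back(p);
    if (*collisionIndex < 0 && map.collides(p.x, p.y))
      *collisionIndex = static_cast<int>(path.size()) - 1;   // shown to the operator as a warning
  }
  return path;   // rendered as the predicted robot model in the AR view
}
```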

15.4 PROJECTION-BASED AUGMENTED REALITY FOR GAMING ROBOTS
Applications of AR techniques for the control of robots are not limited to the video
see-through configuration described earlier. A projection-based configuration is
another possibility for AR robot systems. Augmented Coliseum (Kojima et al., 2006)
is an example of such a projection-based AR in a local environment. Figure 15.10
shows snapshots of Augmented Coliseum.
Figure 15.11 shows the system configuration. It consists of robots that are enhanced
by computer graphics and the virtual environment as it is represented inside the
computer. In the virtual environment, there are models of the robots that are synchronized with the real environment. Hence, it can achieve interactions between the
virtual and real environments.

FIGURE 15.10 Robots in augmented gaming environment. (a) Robots in AR gaming environment and (b) AR explosion effect.
FIGURE 15.11 System configuration.

When a user moves a robot in the real environment, the model of the robot in
the virtual environment is updated by measuring the robot's position and direction.
In order to measure those factors, this system uses a display-based measurement
technique (Sugimoto et al., 2007). The interaction between the robot and a virtual
object that does not exist in the real environment is realized by physics simulation. The physics engine updates virtual forces for each model of robots and virtual
objects. According to the virtual forces, the motions of the virtual objects and the
robots are controlled. Figure 15.12 shows the physics simulation model of Augmented
Coliseum. It calculates reaction forces in the virtual environment by the penalty
method that considers interpenetration of objects.

FIGURE 15.12 Physics simulation model (a spring and damper produce reaction force F from interpenetration x and speed v).
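Assuming the linear spring-damper contact suggested by Figure 15.12, the penalty force can be sketched as F = k x + c v, where x is the interpenetration depth and v is the closing speed; the gains below are illustrative values, not the authors' parameters.

```cpp
// Sketch of a penalty-method contact: push interpenetrating objects apart
// with a spring-damper force, never pull them together.
struct PenaltyContact {
  double k = 400.0;   // spring stiffness (assumed)
  double c = 10.0;    // damping coefficient (assumed)
  double reactionForce(double interpenetration, double closingSpeed) const {
    if (interpenetration <= 0.0) return 0.0;   // objects are separated: no force
    const double f = k * interpenetration + c * closingSpeed;
    return f > 0.0 ? f : 0.0;                  // contacts repel only
  }
};
```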



FIGURE 15.13 Virtual force presentation by actuators of robot.

The calculated virtual forces are presented by the actuators of the robots. If the
robots have omnidirectional wheels, it is very easy to render the virtual force by
them. On the other hand, robots with non-omnidirectional wheels are able to present the forces arbitrarily by changing the direction and position of the robot itself as
shown in Figure 15.13.
For a robot that has two ordinary wheels, it is possible to apply current to both
actuators according to the angle between the virtual force and the direction of the
robot. By applying this physics simulation and the robot control method, it is possible
to consider the robots as a physical display of the virtual forces.
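A minimal sketch of this idea follows (Python, assuming a simple proportional mapping; this is not the actual controller of Augmented Coliseum): the virtual force is split into a component along the robot's heading, which drives both wheels equally, and a turning component proportional to the angle between the force and the heading.

    import math

    def wheel_commands(force_x, force_y, heading, gain_fwd=1.0, gain_turn=0.5):
        # Map a virtual force given in the world frame to left/right wheel commands
        # for a robot with two ordinary wheels whose heading is given in radians.
        magnitude = math.hypot(force_x, force_y)
        if magnitude == 0.0:
            return 0.0, 0.0
        angle = math.atan2(force_y, force_x) - heading        # force relative to heading
        angle = math.atan2(math.sin(angle), math.cos(angle))  # wrap to [-pi, pi]
        forward = gain_fwd * magnitude * math.cos(angle)      # push along the heading
        turn = gain_turn * magnitude * angle                  # rotate toward the force
        return forward - turn, forward + turn                 # (left, right) wheel

    # Example: a force pushing 45 degrees to the left of the current heading.
    print(wheel_commands(1.0, 1.0, heading=0.0))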
Figure 15.14 shows a conceptual image of bilateral augmentation by robots and
virtual objects. In this environment, not only are virtual functions given to the robots
by projection, but it is also possible to augment the presence of virtual objects by
robot motion. The physics simulation gives a seamless sense of reality to the AR environment.

15.5 ACTIVE TANGIBLE ROBOTS IN AR ENVIRONMENT


Synchronizing robots through a physical communication interface is another possibility in AR environments. The concept of local and remote active tangible interactions (Richter et al., 2007) brings robots into a tabletop environment with a distributed
and synchronized configuration. Figure 15.15 shows tabletop robots synchronized
by a display-based robot control technique (Leitner et al., 2009). Users are able to
handle robots directly in this environment. The system can synchronize the robots
in local and remote locations. It is possible to use these kinds of robots for remote
tangible communications.

FIGURE 15.14 (a) Augmentation by CG and (b) augmentation by robots.

FIGURE 15.15 Synchronized robots in tabletop environment.

15.6 CONCLUSION
This chapter introduced a video see-through-based remote robot operation interface
and a projection-based local robot augmentation system as applications of AR technology for robots. It is possible to create a better understanding between the users
and the robots by projecting virtual information onto the real environment. Not
surprisingly, computer graphics are able to support the visualization of information and give
virtual augmentation to the robots. But the robots are also able to behave according
to the information present in the virtual environment through physics simulation.
This contributes to the design of natural and intuitive human-robot interfaces.


REFERENCES
Hashimoto, S., Ishida, A., Inami, M., and Igarashi, T. (2011). TouchMe: An augmented reality based remote robot manipulation, in: Proceedings of the 21st International Conference on Artificial Reality and Telexistence (ICAT'11).
Kojima, M., Sugimoto, M., Nakamura, A., Tomita, M., Nii, H., and Inami, M. (2006). Augmented Coliseum: An augmented game environment with small vehicles, in: Proceedings of the First IEEE International Workshop on Horizontal Interactive Human-Computer Systems (TABLETOP'06), pp. 3–8.
Leitner, J., Haller, M., Yun, K., Woo, W., Sugimoto, M., Inami, M., Cheok, A.D., and Been-Lirn, H.D. (2009). Physical interfaces for tabletop games, Computers in Entertainment, 7(4), Article No. 61.
Maeda, N., Song, C., and Sugimoto, M. (2013). Pathfinder Vision: Future prediction augmented reality interface for vehicle tele-operation using past images, in: Proceedings of the 23rd International Conference on Artificial Reality and Telexistence 2013 (ICAT'13).
Milgram, P., Rastogi, A., and Grodski, J.J. (1995). Telerobotic control using augmented reality, in: Proceedings of the Fourth IEEE International Workshop on Robot and Human Communication (RO-MAN'95).
Richter, J., Thomas, B.H., Sugimoto, M., and Inami, M. (2007). Remote active tangible interactions, in: Proceedings of the First International Conference on Tangible and Embedded Interaction, pp. 39–42.
Sugimoto, M., Kagotani, G., Nii, H., Shiroma, N., Inami, M., and Matsuno, F. (2005). Time Follower's Vision: A tele-operation interface with past images, IEEE Computer Graphics and Applications, 25(1), 54–63.
Sugimoto, M., Kodama, K., Nakamura, A., Kojima, M., and Inami, M. (2007). A display-based tracking system: Display-based computing for measurement systems, in: Proceedings of the 17th International Conference on Artificial Reality and Telexistence (ICAT'07), pp. 31–38.
Tadokoro, S., Kitano, H., Takahashi, T., Noda, I., Matsubara, H., Shinjoh, A., Koto, T. et al. (2000). The RoboCup-Rescue project, in: Proceedings of the 2000 IEEE International Conference on Robotics & Automation, pp. 4090–4095.

16

Use of Mobile
Augmented Reality
for Cultural Heritage
John Krogstie and Anne-Cecilie Haugstvedt

CONTENTS
16.1 Introduction................................................................................................... 411
16.2 Background on Mobile AR for Cultural Heritage......................................... 412
16.3 Application Example: Historical Tour Guide................................................ 416
16.3.1 Overview of Application.................................................................... 418
16.3.2 Technical Details............................................................................... 419
16.4 Evaluation of User Interest of the Application.............................................. 421
16.4.1 Results................................................................................................ 421
16.5 Conclusion..................................................................................................... 427
References............................................................................................................... 429

16.1 INTRODUCTION
Cultural heritage is the legacy of physical artifacts and intangible attributes of a
group or society that are inherited from past generations and maintained in the present for the benefit of current and future generations. An important societal challenge
is to both preserve and make cultural heritage artifacts accessible to the general
public in both short- and long-term time frames. One recent technology that is being
used to help preserve cultural heritage is augmented reality (henceforth AR).
To preserve cultural artifacts, several cultural heritage institutions have developed
their own mobile AR applications using cultural heritage resources. These applications combine AR technology with historical pictures and other cultural artifacts.
A question investigated in this chapter is: how effective is AR technology for presenting cultural heritage information, and how acceptable is the technology from a user's
perspective? A number of studies have examined the acceptance of mobile applications and services (Gao et al. 2011, 2014, Ha et al. 2007, Kaasinen 2005, Liang and
Yeh 2010, Liu and Li 2011, van der Heijden et al. 2005, Verkasalo et al. 2010), in
some cases adding to the traditional technology acceptance model (TAM) based on
limitations of TAM for mobile applications (Gao et al. 2008, Wu and Wang 2005).
Further, while recent studies of mobile technology have examined user acceptance of
mobile tourist guides (Peres et al. 2011, Tsai 2011), we have found only one study that
has examined user acceptance of mobile AR (van Kleef et al. 2010). Therefore, user
acceptance studies of mobile AR applications with cultural heritage resources are
rare. Thus, despite the popularity of AR for cultural heritage applications, and that
user acceptance of technology has been of interest to researchers and practitioners
for decades, little research has been done to study users' acceptance or willingness
to use mobile AR applications combined with cultural heritage resources.
The aim of this chapter is to present an overview of current approaches that use
AR, including mobile AR, for the presentation of cultural heritage. Interestingly,
many studies on AR technology have not looked specifically at the acceptance and
use of AR for different tasks and applications. Therefore, in this chapter, we highlight one recent example of a mobile AR system that was designed specifically for
cultural heritage; we also report the results of a study focusing on factors influencing
the acceptance of mobile AR systems, and we evaluate to what extent there seems to
be interest in AR-based applications, both for tourists and for local community members who have an interest in local cultural history. Finally, cultural heritage includes
tangible culture (such as buildings, monuments, works of art, and artifacts), intangible culture (such as folklore, traditions, language, and knowledge), and natural
heritage (including culturally significant landscapes, and biodiversity). Our focus in
this chapter is on tangible culture, with additional annotations to different artifacts
and information where appropriate.
Following the introduction, in Section 16.2, additional background information is
provided on the use of mobile AR for cultural heritage. In Section 16.3 we present
the results of a detailed case study using mobile AR for the presentation of cultural
heritage, and in Section 16.4 we describe a user study designed to evaluate the AR
application. The chapter concludes with a summary of the use and acceptability of
mobile AR for cultural heritage.

16.2 BACKGROUND ON MOBILE AR FOR CULTURAL HERITAGE


AR aims to enhance our view of the world by superimposing virtual objects on
the real world in a way that persuades the viewer that the virtual object is part of
the real environment (Butchart 2011). Generally, mobile AR technology provides the
same functionality as standard (nonmobile) AR systems, but without constraining
the individual's whereabouts to a specially equipped area (Karimi and Hammad
2004). According to Azuma et al. (2011), mobile AR is one of the fastest-growing
research areas in the AR field; this is due, in part, to the emergence and widespread
use of smartphones and tablets that provide powerful platforms for supporting AR
in a mobile setting. Current smartphones and tablets combine a fast processor with
graphics hardware, a large touch screen, and relevant embedded sensors such as a
high-resolution camera, GPS, Wi-Fi (for indoor positioning; Krogstie 2012), compass, and accelerometer, making mobile AR ideal for both indoor and outdoor applications (Billinghurst and Dünser 2012). Currently, there are two main approaches
to commercial AR applications: AR browsers based on geo-referenced positioning
and image recognition-based AR. As we will discuss in a following section of the
chapter, most of the AR applications for cultural purposes use AR browsers and
rarely use image recognition-based AR systems, such as Google Goggles; but this
may change in the future.
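The geo-referenced approach can be sketched as follows (a simplified Python illustration; real AR browsers such as Layar also use the accelerometer for tilt and perform a full camera projection, and the function and parameter names here are assumptions): each POI is placed horizontally according to the difference between its compass bearing and the device heading.

    import math

    def poi_screen_x(lat, lon, heading_deg, poi_lat, poi_lon,
                     screen_width=1080, fov_deg=60.0):
        # Horizontal screen position (pixels) of a geo-referenced POI, or None
        # if it lies outside the camera's horizontal field of view.
        d_lon = math.radians(poi_lon - lon)
        lat1, lat2 = math.radians(lat), math.radians(poi_lat)
        # Initial bearing from the device to the POI, in degrees from north.
        y = math.sin(d_lon) * math.cos(lat2)
        x = math.cos(lat1) * math.sin(lat2) - math.sin(lat1) * math.cos(lat2) * math.cos(d_lon)
        bearing = math.degrees(math.atan2(y, x)) % 360.0
        offset = (bearing - heading_deg + 180.0) % 360.0 - 180.0  # relative to heading
        if abs(offset) > fov_deg / 2.0:
            return None
        return screen_width / 2.0 + offset / (fov_deg / 2.0) * (screen_width / 2.0)

    # Example: a POI almost straight ahead of the device appears near the screen centre.
    print(poi_screen_x(63.4305, 10.3951, 45.0, 63.4310, 10.3960))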


Whereas other chapters in this book describe AR technology in terms of industry
and medical science applications, the focus of this chapter is on the application of AR
technology to cultural heritage. According to Johnson et al. (2010), "Museum educators, arguably, have always been in the business of AR, creating bridges between
objects, ideas, and visitors." For example, both artifacts and exhibition areas in museums are often accompanied by extra materials such as descriptions, pictures, maps,
or movies. And for archaeological sites, there are on-site guides with pictures of how
a site looks now, printed on normal paper, and images of how the site appeared in the
past printed on transparent material. Audio guides are also used to annotate cultural
artifacts, especially at museums. Some social scientists also use physical mock-ups
with virtual overlays as presented in Laroche et al. (2011). However, mobile AR
applications takes the technology used to preserve cultural heritage a step further,
allowing an institution to provide historical information to the user, in the context
of where the user is located and when the user is there. Mobile AR technology also
generates publicity, and thus helps the institution attract and reach a new audience
(Boyer and Marcus 2011).
The following material should by no means be regarded as a complete list of all
mobile AR projects currently employed in the cultural heritage sector; however, it
describes the shift from projects that use AR systems equipped with head-mounted
displays and special hardware (which tether the user to a specific location) to projects
using contemporary AR systems with smartphone applications. A review of representative AR systems follows.
The Archeoguide system (Vlahakis et al. 2002) used mobile AR technology at
cultural heritage sites, and was launched in 2001. The system was built around the
historical site of Olympia, Greece, and provided personalized contextual information based on the user's position and orientation at the site. Three different mobile
clients were supported within the system: a laptop, a tablet, and a personal digital
assistant (PDA) (the smartphones of those days).
The functionality typically provided by AR technology was only available on the
laptop system and required the use of a see-through head-mounted display with an
external web camera, a digital compass, a backpack with a GPS receiver, a laptop,
wireless communication equipment, and a battery. Further, the laptop display used a
hybrid approach in which a GPS and compass system were used to provide a rough
estimate of the user's position, followed by a vision-based tracking technique that was
used to find the user's exact position and orientation. The vision-based tracking was
based on natural landmarks instead of artificial markers.
The tablets used with mobile AR were more conveniently sized and had a GPS
receiver, digital compass, and wireless communication equipment integrated in a
small box at the bottom of the screen. The tablets did not have a camera, but provided customized information and reconstructed augmented views of the surroundings aligned with the user's position and orientation. Moving on to the PDA system,
while it did not support user-tracking, it did act as an electronic version of a written
tour guide.
The AR system used by PhillyHistory.org (Boyer and Marcus 2011) was developed
as a joint project between Azavea, a software company specializing in Geographic
Information Systems, and the City of Philadelphia Department of Records.


The cultural heritage application was built on top of the Layar platform and is available for both Android and iPhone devices. The system enables users to view historic
photographs of Philadelphia as overlays on the camera view of their smartphones.
The application contains almost 90,000 geo-positioned images, 500 of which can
be viewed in 3D, while a selection of 20 images contain additional explanatory text
developed by local scholars. The entire development process is thoroughly documented in a white paper and numerous blog posts covering technical and cultural
challenges that the designers confronted and overcame as they developed the system.
The Streetmuseum (ML 2014) cultural heritage system represents an AR application for iPhone and was developed by the Museum of London. The application
contains over 200 images from sites across London. Users with an iPhone can view
these images in 3D as ghostly overlays on the present day scene; however, users
with a 3G phone cannot access the complete AR functionality, but are still able to
view the images in 2D. The Streetmuseum application is different from the applications built by other cultural institutions, mainly because the Museum of London was
able to tailor their system for their particular uses, rather than using an existing AR
browser not tailored to specific users. The result is an application that offers a far
better experience than Layar, but only works on a limited number of devices (Chan
2010). As a measure of its success, the system had more than 50,000 downloads in
the first 2 weeks of its use.
The Netherlands Architecture Institute's UAR application (NAI 2011) is a mobile
architecture device developed by the Netherlands Architecture Institute. The application is built on top of Layar and is available for both Android and iPhone devices.
It uses AR to provide information about the built environment and is similar to the
Streetmuseum and the AR system developed by PhillyHistory.org. However, unlike
those systems, UAR also contains design drawings and 3D models of buildings that
were either never built, are under construction, or in the planning stage.
Another cultural heritage system, the Powerhouse Museum's AR system
(Powerhouse 2014) allows visitors to use their mobile phones to visualize Sydney,
Australia as it appeared 100 years ago. The Powerhouse Museum system is not a
custom application, but is implemented as a channel in Layar. It is thus available on
all devices with a Layar browser. Their web page contains detailed instructions on
how to download Layar and search for the Powerhouse Museum channel.
Still another AR system designed to preserve cultural heritage was intelligent
tourism and cultural information through ubiquitous services (iTACITUS) (BMT
2011). This AR system was developed in connection with a European research project that was completed in July 2009. While the project was ongoing, researchers
explored various ways of using AR to provide compelling experiences at cultural
heritage sites, with an additional aim to encourage cultural tourism. One of the systems developed under the iTACITUS program was Zöllner et al.'s (2009) AR presentation system for remote cultural heritage sites. This system has been used
to produce several installations, among them the 20 years representing the Fall of
the Berlin Wall installation at CeBIT in 2009. In that installation, users used Ultra
Mobile PCs (UMPCs) to visualize images of Berlin superimposed on a satellite
image of the city laid out on the floor. By touching the screen, users were able to
navigate through visualizations of Berlin as it appeared in different decades, thus
recognizing the historical geographical location and politics in Berlin before and
after World War II, followed by the construction of the Berlin Wall. The installation
also consisted of an outdoor component where interested users were able to take photos of a building and receive historical overlays from the server that corresponded to
the current view of the site.
In another example of mobile AR, the CityViewAR system (Billinghurst and
Dünser 2012) was developed to be used in a city. A particular goal of this AR
system was to support learning. With the system, students used a mobile phone
application to see buildings in the city of Christchurch as they existed before the
2011 earthquake, a natural event that resulted in much damage to the city. The
application was regarded as user-friendly and was designed to be used by any
citizen.
In Keil et al. (2011) a mobile app was designed to use AR technology to explain
the history and the architectural visual features of a real building viewed outdoors.
In the application Explore! (Ardito et al. 2012), one can create an AR outdoor environment based on 3D models of monuments, places, and objects of historical sites,
and also extend the cultural heritage experience with contextual sounds.
Finally, a recent paper from the European project Tag Cloud (de los Ríos et al.
2014) provides an overview of current trends in information technology that are most
relevant to cultural institutions. The project investigates how AR, storytelling, and
social media can improve a visitor's experience of local culture. Following the overview of techniques for cultural heritage, members of the project note recent developments related to the use of AR in the field.
Lights of St. Etienne (Argon 2014) uses the AR browser Argon to create an embodied, location-based experience. Further, Historypin (2014) is a system that allows
community members to share images of the past. However, most of the applications
and research projects related to cultural heritage are tourism oriented, and do not
consider the importance of engaging local community members about their own
cultural past.
In addition to these systems, there are many image recognition-based AR applications available: one of the most popular is StickyBits, and another is the system described by Holey and
Gaikwad (2014); both are in the process of becoming mature technologies with
the capability to show relevant information about any object in the user's vicinity.
Vuforia (2014) is a Software Development Kit (SDK) for AR image recognition that supports iOS, Android, and Unity 3D. These functionalities are only to a
limited degree being used in cultural heritage MAR applications. One use is culture
and nature travel (kultur- og naturreise), which is a project whose goal is to present cultural heritage and natural phenomena in Norway using mobile technology
(Kulturrådet 2011). The project is being done in collaboration with the Arts Council
Norway (Kulturrådet), the Directorate for Cultural Heritage (Riksantikvaren), the
Norwegian Directorate for Nature Management (Direktoratet for naturforvaltning),
and the Norwegian Mapping Authority (Statens kartverk). The first pilot experiment
in the project used AR and QR codes to present information from the historical river
district in Oslo, Norway. Future pilot projects will be conducted to identify what
information and technology is needed to present information from archives, museums, and databases on smartphones, tablets, and other mobile devices.


16.3 APPLICATION EXAMPLE: HISTORICAL TOUR GUIDE


In this section we present in detail a representative application of the use of mobile
AR for presenting cultural heritage; following this section is a discussion of user interest and acceptance of the AR application presented here. The application was first
presented in Haugstvedt and Krogstie (2012). The presented application allows us to
describe in detail a representative MAR application for cultural heritage, which provides a basis for the important discussion on appropriate adaptations of MAR for cultural heritage (not only the technical possibility of building such solutions).
The application discussed and presented is the Historical Tour Guide. It is a
location-aware mobile information system that uses mobile AR to present local historical photographs of Trondheim, Norway, together with additional data about the houses
and historical inhabitants of the houses. Through the website Trondheimsbilder.no,
people have for a number of years been able to access a large and growing collection
of historical local photographs using an Internet platform. The website is frequently
visited by historians and members of local historical organizations and new pictures are
added to the site regularly. In connection with work occurring during the last 5 years
over the course of the Wireless Trondheim Living Lab project (Andresen et al. 2007),
we have developed and evaluated a number of applications combining local historical
pictures with user positioning, providing access to these pictures on a mobile platform
close to where these pictures were originally taken (Ibrahim 2008). Although technically successful, the system did not generate broad interest among users in
the local community.
In Billinghurst and Dünser (2012) the claim is made that providing AR experiences on mobile devices can provide unique benefits over offering non-AR content on
the same topic; in our project, we thus wanted to investigate the application of mobile
AR for presenting the historical pictures. Feedback on the use of Trondheimsbilder.
no and similar systems has indicated that such systems serve two purposes: to
learn about the city and to offer a certain fun/enjoyment factor. Thus it is hypothesized
that the use of AR applications can have both hedonic and utilitarian purposes.
The following questions guided the research:


1. Do the previously established relationships between the constructs in the TAM, extended with perceived enjoyment, hold for AR applications providing historical pictures and information?
2. Is there an interest among people in using AR applications with historical
pictures and information?
3. If yes to 2, what are the characteristics of users that have a specific interest
in using such AR applications for cultural heritage?

The first research question deals with relationships between constructs discussed in
the TAM acceptance model. Van der Heijden (2004) showed that perceived enjoyment and perceived ease of use were stronger predictors of intention to use a hedonic
system than perceived usefulness. Based on this observation, our goal was to discover whether the same finding held for mobile AR applications that presented historical pictures and information. Information about user acceptance of technology
and its hedonic qualities, pragmatically, can be used to find ways to make cultural
heritage information more acceptable to users.
The second and third questions guiding the research dealt with a user's interest
in using mobile AR technology for accessing cultural heritage. Here the research
aim was to discover whether there is an interest in the technology with respect to its
application for cultural heritage, and if so, whether this interest is dependent on the
specific application being used on a specific type of device. It was also of interest to
discover whether people wanted to use the application that was developed in their
home town, or when visiting a new city (as a tourist). We were also interested in
discovering how previous interest in local history influenced whether people wanted
to use the application or not.
To research these questions, a preliminary study was first conducted to explore
the need for an AR application presenting historical photographs. A number of similar solutions were reviewed and stakeholders from local cultural heritage institutions
were interviewed to gather user and system requirements. Next, a prototype was
developed and evaluated for its usability. Based on the results derived from this analysis, another design and development phase was performed. Furthermore, different
models for technology acceptance were reviewed and a questionnaire was designed
to measure usability. The questionnaire consisted of five major parts:




1. Perceived usefulness
2. Perceived ease of use
3. Perceived enjoyment
4. Behavioral intention
5. Individual variables

Figure 16.1 shows the research model used in this study. Note that this is the TAM
with perceived enjoyment as used by Davis et al. (1992) and van der Heijden (2004).
The measure for perceived usefulness was developed specifically for this project in
line with the thinking of van der Heijden (2004). For some time it has been clear that
mobile applications have certain specific challenges regarding usability that should
be taken into account (Krogstie 2001, Krogstie et al. 2003).
[Figure 16.1 diagram: perceived ease of use (PEOU), perceived enjoyment (PE), and perceived usefulness (PU) are linked to behavioral intention to use (BI) through hypotheses H1-H5.]

FIGURE 16.1 Technology acceptance research model.


Four constructs are included in the model: perceived enjoyment, perceived usefulness, perceived ease of use, and behavioral intention. While it was expected that
the predictive strength of the paths may change, it was also expected that the structure of the relationships from TAM would hold for this model as well. This expectation led to the following hypotheses:
H1: There is a positive relationship between perceived ease of use and intention to use.
H2: There is a positive relationship between perceived usefulness and intention to use.
H3: There is a positive relationship between perceived ease of use and perceived usefulness.
Perceived enjoyment has the same position in the research model as perceived usefulness, which led to the following hypotheses:
H4: There is a positive relationship between perceived ease of use and perceived enjoyment.
H5: There is a positive relationship between perceived enjoyment and intention to use.
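As a rough illustration of how such hypotheses translate into path estimates (a simplified Python sketch using ordinary least squares on averaged construct scores and synthetic placeholder data; the study itself estimated a structural model with t-values, so this is not the authors' procedure), each hypothesis corresponds to a regression path:

    import numpy as np

    def std_paths(y, X):
        # Standardized regression coefficients of y on the columns of X (intercept dropped).
        Xs = (X - X.mean(axis=0)) / X.std(axis=0)
        ys = (y - y.mean()) / y.std()
        coef, *_ = np.linalg.lstsq(np.column_stack([np.ones(len(ys)), Xs]), ys, rcond=None)
        return coef[1:]

    # Synthetic placeholder data standing in for per-respondent construct averages (1-7 scale).
    rng = np.random.default_rng(0)
    peou = rng.uniform(1, 7, 200)
    pe = 0.5 * peou + rng.normal(0, 1, 200)                        # H4: PEOU -> PE
    pu = 0.7 * peou + rng.normal(0, 1, 200)                        # H3: PEOU -> PU
    bi = 0.2 * peou + 0.4 * pu + 0.4 * pe + rng.normal(0, 1, 200)  # H1, H2, H5

    print("PEOU -> PU:", std_paths(pu, peou[:, None]))
    print("PEOU -> PE:", std_paths(pe, peou[:, None]))
    print("PEOU, PU, PE -> BI:", std_paths(bi, np.column_stack([peou, pu, pe])))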

16.3.1 Overview of Application


The concrete system was built atop CroMAR (Mora et al. 2012), a system that uses
mobile AR to support reflection on crowd management. There are three ways to
access information using the application:


1. Clicking on a point of interest (POI)
2. Clicking on a photo in the list of available photos
3. Looking at the map of the area where the user currently is

All three methods can be combined with filtering to allow users to only look at
photos and related information from a specific decade. We next present the main
functionality of the system:
AR view: The AR view is the main view of the application where POIs are
shown as floating icons overlaying the camera feed. The name of the application is shown in the toolbar at the top. The view is shown in Figure 16.2.
Photo overlay: The application provides transparent photo overlays. These
let the user see historical images overlaid over the present day scene. The
buttons in the toolbar at the top of the screen are used to close the overlay or
go to the detailed information view belonging to the picture.
Detailed information view: Each of the photographs in the application has
an associated detailed information view. For a picture it contains a
description of the motif and also lets the user know when the picture was
taken, the source of the photograph, and the name of the photographer.


FIGURE 16.2 AR view of the Historical Tour Guide application.

Timeline: The timeline is always visible at the bottom of the screen. It lets
the user filter the amount of incoming information so they only see photographs from a specific decade. The selected decade is marked in green and
written in the upper left corner of the display.
Map: The map shows the users current position and the position of photos
from the decade selected on the timeline. Each pin is tagged with the name
of the photo and the current distance from the user.
List view: This view shows the user a list of all photographs from the
selected decade and provides a convenient method to open detailed views
without having to locate the associated markers.
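The content behind these views can be thought of as a simple data model; the sketch below (hypothetical Python, whereas the actual application uses Objective-C data model objects, as described in Section 16.3.2) shows a POI record carrying the photograph metadata and the decade used by the timeline filter.

    from dataclasses import dataclass

    @dataclass
    class PointOfInterest:
        name: str
        latitude: float
        longitude: float
        photo_url: str
        description: str   # what the picture depicts
        year_taken: int
        source: str
        photographer: str

        @property
        def decade(self) -> int:
            return (self.year_taken // 10) * 10

    def filter_by_decade(pois, decade):
        # Timeline filter: keep only POIs whose photograph is from the chosen decade.
        return [p for p in pois if p.decade == decade]

    # Example with a made-up record: a 1935 photograph passes the 1930s filter.
    poi = PointOfInterest("Bakklandet", 63.4292, 10.4012, "http://example.org/p.jpg",
                          "Street scene", 1935, "Example archive", "Unknown")
    print(filter_by_decade([poi], 1930))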

16.3.2 Technical Details
As mentioned earlier, the Historical Tour Guide application is built on top of
CroMAR. Therefore, most of the technical details of these two applications are similar. The code for CroMAR is written in Objective-C, the programming language for
native iOS applications. The Historical Tour Guide is written in the same language
but was updated to use automatic reference counting (ARC), a compiler feature of
Xcode that provides automatic memory management of Objective-C objects.
Figure 16.3 shows the key objects in the Historical Tour Guide. The application is
organized around the model-view-controller (MVC) pattern. This pattern separates
the data objects in the model from the views used to present the data. It facilitates the
independent development of different components and makes it possible to swap out
views or data without having to change large amounts of code. The system objects in
the diagram are standard objects that are part of all iOS applications. These are not
subclassed or modified by application developers. This is unlike the custom objects
that are instances of custom classes written for this specific application.


[Figure 16.3 diagram: MVC organization with model (data objects), controller (event loop, UIApplication, application delegate, view controllers), and view (view and UI objects, UIWindow); objects are marked as system objects, custom objects, or either.]

FIGURE 16.3 System architecture.

The UIApplication object is the system object that manages the application event
loop. It receives events from the system and dispatches these to the application's custom classes. It works together with the application delegate, a custom object created
at launch time that is responsible for the initialization of the application. The view
controller objects manage the presentation of the application's content on the screen.
Each of the controller objects manages a single view and its collection of subviews.
The other custom view controllers in the application are subclasses of one or another
of the standard iOS view controllers.
Each view covers a specific area and responds to events within that area. Controls
are a specialized type of view for implementing buttons, check boxes, text fields,
or similar interface objects. Further, the views and view controllers are connected.
When a view controller is presented, it makes its views visible by installing them
in the application's window. This is represented by a system object of the type
UIWindow. The last group of objects is the data model objects. These objects store
the application's content, such as the POIs, photographs, and historical information.
The Historical Tour Guide is launched when the user taps the custom application
icon. At this point in time, the application moves from the not running state to the
active state, passing briefly through the inactive state. As part of this launch cycle,
the system creates a process and a thread for the application and calls the application's main function. The Historical Tour Guide is an event-driven application. The
flow of the program is determined by two types of events:

1. Touch events, generated when users touch the views of the application
2. Motion events, generated when users move the device

Events of the first type are generated when a user presses a button, scrolls in a list, or
interacts with any of the other views. An action message is generated and sent to the
target object that was specified when the view was created.


During initialization, the system also registers as an observer for orientation


changes. As a result, the system receives the notifications that are generated when a
user changes location or moves the device to look in another direction. The application uses this information to redraw the marker icons and update the distance labels
shown on the map and in the list.
Both CroMAR and the original tour guide prototype showed all POIs in the
direction the user was looking. However, this became a problem when the number
of POIs increased and the marker icons started to overlap. In this situation, the users
would have to use the timeline to filter the POIs, not because they were interested in
photographs from a specific decade, but because they needed to reduce the number
of visible icons. The current version of the application implements additional filtering and only shows icons for POIs less than 50 m away.
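This distance filter can be sketched as follows (an illustrative Python version with assumed field names; the application itself does the equivalent computation in Objective-C): the great-circle distance from the user to each POI is computed, and only POIs closer than 50 m keep their marker icons.

    import math

    def distance_m(lat1, lon1, lat2, lon2):
        # Great-circle (haversine) distance in metres between two WGS84 points.
        r = 6371000.0
        p1, p2 = math.radians(lat1), math.radians(lat2)
        dp = math.radians(lat2 - lat1)
        dl = math.radians(lon2 - lon1)
        a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
        return 2 * r * math.asin(math.sqrt(a))

    def visible_pois(user_lat, user_lon, pois, max_distance=50.0):
        # Keep only the POIs whose marker icon should be drawn (closer than 50 m).
        return [p for p in pois
                if distance_m(user_lat, user_lon, p["lat"], p["lon"]) < max_distance]

    # Example with two made-up POIs: one about 40 m away, one several hundred metres away.
    pois = [{"name": "near", "lat": 63.43035, "lon": 10.39510},
            {"name": "far", "lat": 63.43500, "lon": 10.40000}]
    print(visible_pois(63.4300, 10.3951, pois))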
The map in both CroMAR and the Historical Tour Guide is implemented using
Apple's Map Kit framework. This is a framework that uses Google services to provide map data. The framework provides automatic support for the touch events that
let users zoom and pan the map. The application also uses Apple's UIKit, Foundation,
CoreGraphics, and CoreLocation frameworks.

16.4 EVALUATION OF USER INTEREST OF THE APPLICATION


As described earlier, we used an extended version of the TAM (Venkatesh and Davis
2000) to investigate the potential acceptance of the AR application for cultural heritage described earlier. A prototype was developed and analyzed in accordance
with general principles for usability design, and two surveys were conducted to evaluate usability and acceptance. The web survey involved participants from an online
market research panel. Specifically, 200 participants answered a questionnaire after
having watched a video showing the application in use. The video lasted about 1 min
and 30 s, and illustrated the use of the AR application by interspersing explanatory
text with pictures showing the use of the application in practice. A similar survey
has been used by Hella and Krogstie (2011) for validating similar research models,
since the methodology enables one to get feedback from sufficiently many users to
be able to apply the prescribed appropriate statistical techniques. The participants
were recruited from different parts of Norway by a market research company. Given
that the participants were part of the market research company's user pool, we had
no direct contact with the respondents.
In the street survey, 42 participants received the opportunity to try the application in practice in the streets of Trondheim before they answered the questionnaire,
allowing more qualitative input to the evaluation. None of the respondents were
known to us before being stopped on the street, and to our knowledge there was
no overlap with the group used in the web survey.

16.4.1 Results
This section presents the descriptive analysis of the results from the two surveys.
A statistical test of the overall research model is presented by Haugstvedt and Krogstie
(2012), and we only present the main result here. In the street survey, the age range
was 14–60 years, with a mean age of 27.8. Overall, 59.5% of the respondents were
male, and about a fifth of the respondents replied that they had previously completed
a similar questionnaire. In the web survey the age range was 20–45 years, with a mean
age of 33.3. The gender distribution was about equal, but with slightly more female
respondents.
Interestingly, for the street survey, the respondents did not use the entire scale.
Apart from some responding 3, all answers were in the range from 4 to 7, which
indicates that those having the opportunity to test the system properly themselves were
all either neutral or positive. In the web survey the entire scale was used, which partly
explains the shift in the average value of the responses between the two surveys.
The responses to the questions on perceived usefulness are found in Table 16.1. First
the individual question is presented (e.g., PU1: by using the app, I can more quickly
and easily find historical pictures and information), then the gradings from the
web survey and the street survey are presented for each question. In the final two lines we present the average response for all four questions in the
category perceived usefulness. Although the median answer is the same on most questions, we see that the average is generally higher in the street survey. Still, we can regard
the responses on usefulness as quite high compared to what we have experienced in
similar surveys of other mobile applications (Hella and Krogstie 2010).
The responses to the four items on perceived ease of use are found in Table 16.2.
First the individual question is presented (e.g., PEOU1: interaction with the app is
clear and understandable), then the gradings from the web survey and the street
survey are presented for each question. In the final two lines
we present the average response for all four questions in the category perceived ease of use.
TABLE 16.1
Comparing Answers between the Surveys on Perceived Usefulness

Item         N    Min  Max  Mean  Median  STD
PU1: By using the app, I can more quickly and easily find historical pictures and information.
  PU1web     200  1    7    5.35  6       1.181
  PU1street  42   4    7    6.00  6       0.963
PU2: By using the app, I learn more about history in Trondheim.
  PU2web     200  1    7    5.34  6       1.171
  PU2street  42   4    7    6.38  6       0.661
PU3: By using the app, I can quickly find historical pictures and information from places nearby.
  PU3web     200  1    7    5.39  6       1.202
  PU3street  42   4    7    6.24  6       0.726
PU4: By using the app, I am more likely to find historical pictures and information that interest me.
  PU4web     200  1    7    5.08  5       1.393
  PU4street  42   3    7    6.05  6       0.963
PU average
  PUweb      200  1    7    5.29  5.5     1.13
  PUstreet   42   4    7    6.17  6.25    0.648


TABLE 16.2
Comparing Answers between the Surveys on Perceived Ease of Use

Item           N    Min  Max  Mean  Median  STD
PEOU1: Interaction with the app is clear and understandable.
  PEOU1web     200  1    7    5.31  6       1.233
  PEOU1street  42   3    7    5.52  6       1.215
PEOU2: Interaction with the app does not require a lot of mental effort.
  PEOU2web     200  1    7    4.87  5       1.361
  PEOU2street  42   3    7    5.88  6       0.993
PEOU3: I find the app easy to use.
  PEOU3web     200  1    7    5.16  5       1.182
  PEOU3street  42   4    7    5.79  6       1.001
PEOU4: I find it easy to get the app to do what I want it to do.
  PEOU4web     200  1    7    4.97  5       1.223
  PEOU4street  42   4    7    5.57  6       1.016
PEOU average
  PEOUweb      200  1    7    5.08  5.13    1.117
  PEOUstreet   42   4    7    5.69  5.75    0.860

Here we find a similar pattern as earlier, with higher averages and, for several of the questions, higher medians in the street survey. The responses are on average a bit less positive, though; thus, even though the application had first undergone a separate usability
test and had been improved before being used in this specific investigation, this is
an indication that there is still room for improvement. It is positive, though, that the
responses in the street survey are more positive, with averages of around six for
finding the app easy to use and not requiring a lot of mental effort.
For perceived enjoyment the respondents in both surveys used a semantic differential with contrasting adjectives at each end of the scale to rate these items.
The scale used in the street survey was a discrete scale with seven categories while
the scale used in the web survey was continuous. The replies from the continuous
scale were later on coded into seven categories. Results are shown in Table 16.3.
First the individual scale is presented (e.g., PE1: disgusting–enjoyable), then
the gradings from the web survey and the street survey are
presented for each question. In the final two lines we present the average response
for all four scales in the category perceived enjoyment. The data reveal a higher average in the web survey, which indicates quite high perceived enjoyment of this kind
of service, although we find a higher standard deviation on this result than in
the categories perceived ease of use and perceived usefulness, indicating that
the opinions of the respondents are more mixed here.
In Table 16.4, we provide similar data for intention to use. First the individual
question is presented (e.g., BI1: I intend to use the app on a smartphone), then
the gradings from the web survey and the street survey are
presented for each question. In the final two lines we present the average response for all eight questions in the category intention to use.


TABLE 16.3
Comparing Answers between the Surveys on Perceived Enjoyment

Item         N    Min   Max  Mean  Median  STD
PE1: disgusting–enjoyable
  PE1web     200  1     7    6.06  7       1.562
  PE1street  40   3     7    5.83  6       1.130
PE2: dull–exciting
  PE2web     200  1     7    5.60  6.5     1.881
  PE2street  40   3     7    5.45  5       1.131
PE3: unpleasant–pleasant
  PE3web     200  1     7    6.17  7       1.514
  PE3street  40   4     7    5.70  6       1.067
PE4: boring–interesting
  PE4web     200  1     7    5.67  7       1.993
  PE4street  40   3     7    6.00  6       1.038
PE average
  PEweb      200  1     7    5.9   6.5     1.500
  PEstreet   40   3.75  7    5.74  5.75    0.893

Intention to use is the only
area in the street survey where the respondents used the entire scale in answering.
As can be seen from the higher standard deviations on many of these questions,
opinions are quite mixed, although the responses on some modes of
usage (e.g., the use of the app when visiting a city as a tourist) are very positive. We will
discuss some of the other results in more detail in the following.
An important aspect of the web survey was the opportunity to validate the hedonic research model. Figure 16.4 shows the structural model calculated
with data from the web survey. The structural model shows that all five hypotheses
were supported. With the exception of the path between PEOU and BI, all paths were
significant at the p < 0.001 level. The path between PEOU and BI was significant at
the p < 0.05 level. A more detailed treatment of the statistical validity of these results
is found in Haugstvedt and Krogstie (2012).
To summarize the hypotheses relative to the research model in light of Figure 16.4:
H1: There is a positive relationship between perceived ease of use and intention to use. Accepted at p < 0.05.
H2: There is a positive relationship between perceived usefulness and intention to use. Accepted at p < 0.001.
H3: There is a positive relationship between perceived ease of use and perceived usefulness. Accepted at p < 0.001.
H4: There is a positive relationship between perceived ease of use and perceived enjoyment. Accepted at p < 0.001.
H5: There is a positive relationship between perceived enjoyment and intention to use. Accepted at p < 0.001.


TABLE 16.4
Comparing Answers between the Surveys on Intention to Use

Item         N    Min   Max  Mean  Median  STD
BI1: I intend to use the app on a smartphone.
  BI1web     200  1     7    4.58  5       1.760
  BI1street  40   3     7    5.98  6       1.121
BI2: I predict that I will use the app on a smartphone.
  BI2web     200  1     7    4.07  4       1.700
  BI2street  41   3     7    5.80  6       1.229
BI3: I intend to use the app on a tablet.
  BI3web     200  1     7    4.01  4       1.779
  BI3street  41   2     7    5.22  5       1.557
BI4: I predict that I will use the app on a tablet.
  BI4web     200  1     7    3.55  4       1.695
  BI4street  42   1     7    4.81  5       1.784
BI5: I intend to use the app in a city I visit as a tourist.
  BI5web     200  1     7    5.05  5       1.577
  BI5street  42   4     7    6.45  7       0.889
BI6: I predict that I will use the app in a city I visit as a tourist.
  BI6web     200  1     7    4.45  5       1.692
  BI6street  42   3     7    6.12  6       1.109
BI7: I intend to use the app in my hometown.
  BI7web     200  1     7    4.54  5       1.779
  BI7street  42   1     7    5.43  5       1.548
BI8: I predict that I will use the app in my hometown.
  BI8web     200  1     7    4.16  4       1.810
  BI8street  42   1     7    5.24  5       1.620
BI average
  BIweb      200  1     7    4.3   4.5     1.520
  BIstreet   40   3.63  7    5.7   5.5     0.966

On a detailed level, the descriptive results from the street survey and the web survey
have some differences. The street survey participants were generally more positive
about the application than their web survey counterparts, and to a lesser extent used
the whole (negative part of) scale. This might be caused by unwanted bias by the
presence of the investigator, but can also be the effect of using and experimenting
with the app freely themselves. The web survey participants rated the application
higher on the scale for perceived enjoyment, but it is likely that this is due to the
different format that was used for this scale in the web survey. The company that
collected the data used a continuous slider to program this question instead of having
seven distinct categories. The answers were afterward mapped into seven categories.
It is possible that this format caused the respondents to use the endpoints of the scale.


[Figure 16.4 diagram: the research model of Figure 16.1 with standardized path coefficients and t-values estimated from the web survey data; paths marked * are significant at p < 0.05 and paths marked *** at p < 0.001.]

FIGURE 16.4 Structural model with data from web survey.

That would at least explain the high number of scores of 7 in this scale in the web
survey compared to the street survey. The participants in both surveys were more
interested in using the application in a city they visited as a tourist than in using it in
their hometown. This indicates that it is relevant to compare the input of people from
other places with those from locals.
Finally, the generalizability of the results should be investigated. As mentioned
earlier, a similar application had been made on a normal mobile platform with maps
and geo-tagged historical pictures with limited success, not getting past the prototype stage. The application on a mobile AR platform has received better feedback as
reported here. This mirrors the results reported by Billinghurst and Dünser (2012)
claiming that providing AR on mobile devices can have benefits over offering non-AR content on the same topic.
Norwegians are known to quickly adopt new technologies. It is hard to judge if
users in other countries where the use of smartphones and tablets is not so widespread would be less positive toward applications of this sort. Given that the application does not store any private data, aspects of trust that carry different weight in
different cultures (Gao and Krogstie 2011) would not be expected to influence the
results.
Having established the basic research model, we have looked further at relationships between the individual variables and intention to use (BI) using the
web survey. Given that the data are on a Likert scale, we have used nonparametric techniques to investigate correlations and find the following significant
results looking at the web survey. In addition to the overall BI variable (intention to use), we have also looked at variables relative to use of the app on a smartphone (BI-smart), on a tablet (as tested, BI-tablet), as a tourist (BI-tourist), or
as a local (BI-local).
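One common nonparametric choice for this kind of yes/no group comparison on Likert data is the Mann-Whitney U test; the sketch below (using SciPy and made-up scores, since the chapter does not state which specific test or software was used) shows how an entry of Table 16.5 could be computed.

    from statistics import mean
    from scipy.stats import mannwhitneyu

    def compare_groups(test_scores, group_flags):
        # Compare a test variable (e.g. BI) between respondents answering yes/no on a
        # background variable; returns group means, group sizes, and the p-value.
        yes = [s for s, g in zip(test_scores, group_flags) if g]
        no = [s for s, g in zip(test_scores, group_flags) if not g]
        p = mannwhitneyu(yes, no, alternative="two-sided").pvalue
        return mean(yes), mean(no), len(yes), len(no), p

    # Example with made-up scores: BI for respondents with/without interest in local history.
    bi = [5.2, 4.8, 6.0, 3.5, 4.0, 2.8, 5.5, 3.9]
    interest = [True, True, True, False, False, False, True, False]
    print(compare_groups(bi, interest))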
In Table 16.5, the first column shows the test variable, whereas the second shows the grouping variable, which is one of the background variables. The next column indicates the average value of the test variable for those responding yes on the grouping variable, and the column after that the average for those responding no. The following columns give the number of yes and no answers for the grouping variable, and p is the probability that there is no difference between the groups on the test variable.


TABLE 16.5
Significant Relationships between Main Variables and Background Variables

Test Variable  Grouping Variable          A (Yes)  B (No)  N(A)  N(B)  p
BI             Interest in local history  4.53     4.12    89    95    0.048
BI             Have tablet                4.63     4.16    61    139   0.036
BI             Have smartphone            4.44     3.69    163   27    0.006
BI-smart       Have smartphone            4.52     3.44    163   27    0.000
BI-smart       Interest in local history  4.65     4.09    89    95    0.008
BI-tablet      Have tablet                4.40     3.51    61    139   0.001
BI-tourist     Have smartphone            4.90     4.08    163   27    0.004
BI-local       Have smartphone            4.46     3.85    163   27    0.033
PEOU           Have smartphone            5.14     4.78    163   27    0.021
PE             Interest in local history  6.10     5.75    89    95    0.018
PE             Used similar app           5.10     6.00    27    169   0.002

As we see from the table, those who have already expressed an interest in local
history seem more likely to use such applications, which should come as no surprise. On the technology side, we see that those already having a smartphone or
a tablet find it more likely that they will adopt an application of this sort, which
necessarily must be mobile. This can also explain the large differences in answers
on questions BI1-BI4 in Table 16.4. Those having a tablet are more likely to use a
cultural heritage application on a tablet. Similarly, those having a smartphone are
more inclined to use a cultural heritage application on a smartphone. We also see
a positive correlation between interest in local history and the usage of the app on
a smartphone. Smartphone users may desire to use the app both as a tourist and
as a local. Finally, those having a smartphone perceived the tablet app to be easier
to use than those who did not, which can also explain the spread of the
responses on usability. As more people get used to using tablets, applications
exploiting the possibilities of tablets will also be experienced as easier to use.
Those already having an interest in local history expressed significantly higher perceived enjoyment. Somewhat surprising is that those not familiar with such an app
had a higher perceived enjoyment than the (quite few) who had used such applications before. Possibly, this can be explained by a novelty factor.

16.5 CONCLUSION
As has been illustrated in this chapter, mobile AR has been implemented for
a large range of situations, both for indoor and outdoor use, applying a large
variety of techniques to enhance the spectator's view and experience of cultural heritage. As the technology is developed further, such that a person can


eventually use standard technology for viewing cultural heritage information, it
is likely that the actual usage of such solutions will increase.
To illustrate the last point, in the chapter we evaluated the acceptance of a particular application of mobile AR. Further, we applied a TAM for combined utilitarian
and hedonic systems to examine the determinants of intention to use an AR application with historical information and pictures. An application was built and used to
demonstrate the concept for the participants in the street survey that was conducted
as part of the project. A questionnaire was used for data collection. The same questionnaire was also used in a web survey, conducted online using a market research
panel. The participants in the web survey watched a video presentation of the use of
the application before answering the questionnaire.
Returning to our research questions we can conclude the following:

1. Do the previously established relationships between the constructs in the TAM, extended with perceived enjoyment, hold for AR applications providing historical pictures and information? As illustrated in Figure 16.4, it
seems this established research model for applications with both utilitarian
and hedonic aspects is useful for this type of application.
2. Is there an interest among people in using AR applications with historical
pictures and information? Based on the responses as reported in Tables 16.1
through 16.4, we can say that there is an interest in such applications, but
that this varies quite a bit between different people.
3. If yes to 2, what are the characteristics of users that have a specific interest in
using such AR applications for cultural heritage? As illustrated in Table 16.5,
an existing interest in local history, and thus in cultural heritage relative to local
history, seems to be important for intention to use. Also, already
having a device of the sort that the application is provided on seems to have a
significant impact on intention to use. Although many would like to use such
apps locally, the interest seems to be even higher for the usage of such apps
when one is visiting other places as a tourist.

This chapter also discussed the differences between results based on feedback from a mock-up seen outside the actual usage environment and experiences with a real
application in the real environment. This comparison allows us to
judge the results from the evaluation of a more low-fidelity solution, where it is easier to
get feedback from many users at an early stage (here originally done to validate the
research model and to investigate the significant correlations between the variables in
the research model and the background variables). It appears that the responses in the
low-fidelity case were more conservative. One exception was the result on perceived
enjoyment, which seems to have been influenced by how this question was implemented with sliders rather than discrete values in the questionnaire. On the other
hand, we saw limitations with the more realistic setting, in the sense that there could be
investigator bias and that the respondents reached at the local site might be tourists
rather than locals. Thus, when investigating
the more detailed mechanisms for acceptance, we have focused in particular on the
results from the web-based survey.

Use of Mobile Augmented Reality forCultural Heritage

429

REFERENCES
Andresen, S. H., J. Krogstie, and T. Jelle. 2007. Lab and research activities at Wireless
Trondheim. In: Fourth International Symposium on Wireless Communication Systems,
Trondheim, Norway.
Ardito, C., M. F. Costabile, A. De Angeli, and R. Lanzilotti. 2012. Enriching archaeological parks with contextual sounds and mobile technology. ACM Transactions on Computer-Human Interaction, 19(4): 1–30, ISSN: 1073-0516, DOI: 10.1145/2395131.2395136.
Argon. 2015. Georgia Tech augmented environments. http://argon.gatech.edu/ (accessed June 7, 2014).
Azuma, R., M. Billinghurst, and G. Klinker. 2011. Special section on mobile augmented reality. Computers & Graphics, 35: vii–viii.
Billinghurst, M. and A. Dünser. 2012. Augmented reality in the classroom. IEEE Computer, 45(7): 56–63.
BMT. 2011. BMT research and development directorate. iTacitus: Intelligent tourism and cultural information through ubiquitous services. http://www.itacitus.org/ (accessed June 22, 2014).
Boyer, D. and J. Marcus. 2011. Implementing mobile augmented reality applications for
cultural institutions. In: J. Trant and D. Bearman (eds.), Museums and the Web 2011:
Proceedings (MW2011), Toronto, Ontario, Canada.
Butchart, B. 2011. Augmented reality for smartphones. Technical report, JISC observatory.
Brighton, UK.
Chan, S. 2010. On augmented reality (again)Time with UAR, Layar, streetmuseum & the CBA.
http://www.powerhousemuseum.com/dmsblog/index.php/2010/10/26/on-augmentedreality-again-time-with-uar-layar-streetmuseum-the-cba/ (accessed June 22, 2014).
Davis, F. D., R. P. Bagozzi, and P. R. Warshaw. 1992. Extrinsic and intrinsic motivation to use
computers in the workplace. Journal of Applied Social Psychology, 22(14): 11111132.
de los Ros, S., M. F. Cabrera-Umpirrez, M. T. Arredondo, M. Pramo, B. Baranski, J. Meis,
M. Gerhard, B. Prados, L. Prez, and M. del Mar Villafranca. 2014. Using augmented
reality and social media in mobile applications to engage people on cultural sites
universal access in humancomputer interaction. Universal access to information and
knowledge. Lecture Notes in Computer Science, 8514: 662672.
Gao, S. and J. Krogstie. 2011. Explaining the adoption of mobile information services from a
cultural perspective. In: Proceedings ICMB 2011, June 2021, Como, Italy, pp. 243252.
Gao, S., J. Krogstie, and P. A. Gransther. 2008. Mobile services acceptance model. Paper
presented at the International Conference on Convergence and Hybrid Information
Technology (ICHIT2008), Daejeon, Korea.
Gao, S., J. Krogstie, and K. Siau. 2011. Developing an instrument to measure the adoption
of mobile services. International Journal on Mobile Information Systems, 7(1): 4567.
Gao, S., J. Krogstie, and K. Siau. 2014. Adoption of mobile information services: An empirical
study. International Journal on Mobile Information Systems, 10(2): 147171.
Ha, I., Y. Yoon, and M. Choi. 2007. Determinants of adoption of mobile games under mobile
broadband wireless access environment. Information and Management, 44(3): 276286.
Haugstvedt, A.-C. and J. Krogstie. 2012. Mobile augmented reality for cultural heritage:
A technology acceptance study. In: Proceedings of the International Symposium on
Mixed and Augmented Reality (ISMAR), November 58, Atlanta, GA.
Hella, L. and J. Krogstie. 2011. Personalisations by semantic web technology in food s hopping.
In: Proceedings of WIMS 2011, Sogndal, Norway.
Historypin. 2014. A global community collaborating around history. http://www.historypin.
com (accessed June 7, 2014).
Holey, P. and V. Gaikwad. 2014. Google glass technology. International Journal of Advanced
Research in Computer Science and Management Studies, 2(3): 278.


Ibrahim, M. 2008. Utvikling og evaluering av ulike mobile, lokasjonsbaserte tjenester (in Norwegian). Technical report, IDI, NTNU, Trondheim, Norway.
Johnson, L. F., H. Witchey, R. Smith, A. Levine, and K. Haywood. 2010. The 2010 Horizon Report: Museum Edition. The New Media Consortium, Austin, TX.
Kaasinen, E. 2005. User Acceptance of Mobile Services: Value, Ease of Use, Trust and Ease of Adoption. VTT Publications 566, Espoo, Finland.
Karimi, H. and A. Hammad (eds.). 2004. Mobile augmented reality. Telegeoinformatics: Location-Based Computing and Services, Chapter 9. CRC Press LLC, Boca Raton, FL.
Keil, J., M. Zöllner, M. Becker, F. Wientapper, T. Engelke, and H. Wuest. 2011. The house of Olbrich: An augmented reality tour through architectural history. In: Proceedings of 10th International Conference on Mobile Business (ICMB 2011), pp. 243–252, Como, Italy.
Krogstie, J. 2001. Requirements engineering for mobile information systems. Paper presented at the Seventh International Workshop on Requirements Engineering: Foundations for Software Quality (REFSQ'01), Interlaken, Switzerland.
Krogstie, J. 2012. Bridging research and innovation by applying living labs for design science research. In: Proceedings SCIS 2012, August 17–20, Uppsala, Sweden.
Krogstie, J., K. Lyytinen, A. L. Opdahl, B. Pernici, K. Siau, and K. Smolander. 2003. Mobile information systems: Research challenges on the conceptual and logical level. Paper presented at the MobiMod'02 workshop, Tampere, Finland.
Kulturrådet. 2011. The Norwegian Directorate for Nature Management, Arts Council Norway, the Directorate for Cultural Heritage, and the Norwegian Mapping Authority. Kultur- og naturreise. http://www.kulturognaturreise.no/ (accessed June 22, 2014).
Laroche, F., M. Servières, D. Lefèvre, and J.-L. Kerouanton. 2011. Where virtual enhances physical mock-up: A way to understand our heritage. In: Proceedings of ISMAR 2011, Basel, Switzerland, pp. 1–6.
Liang, T.-P. and Y.-H. Yeh. 2010. Effect of use contexts on the continuous use of mobile services: The case of mobile games. Personal and Ubiquitous Computing, 15(2): 187–196.
Liu, Y. and H. Li. 2011. Exploring the impact of use context on mobile hedonic services adoption: An empirical study on mobile gaming in China. Computers in Human Behavior, 27(2): 890–898.
Mora, S., A. Boron, and M. Divitini. 2012. CroMAR: Mobile augmented reality for supporting reflection on crowd management. International Journal of Mobile Human Computer Interaction, 4(2): 88–101.
ML. 2014. Museum of London. http://www.museumoflondon.org.uk/Resources/app/you-are-here-app/home.html (accessed June 7, 2014).
NAI. 2011. The Netherlands Architecture Institute. See what is not (yet) there with the NAI and augmented reality. http://en.nai.nl/museum/architecture_app/item/_pid/kolom2-1/_rp_kolom2-1_elementId/1_601695 (accessed June 22, 2014).
Peres, R., A. Correia, and M. Moital. 2011. The indicators of intention to adopt mobile electronic tourist guides. Journal of Hospitality and Tourism Technology, 2(2): 120–138.
Powerhouse. 2014. Layar: Augmented reality browsing of Powerhouse Museum around Sydney. http://www.powerhousemuseum.com/layar/ (accessed June 22, 2014).
Tsai, C.-Y. 2011. An analysis of usage intentions for mobile travel guide systems. Journal of Business Management, 4(13): 2962–2970.
Van der Heijden, H. 2004. User acceptance of hedonic information systems. MIS Quarterly, 28(4): 695–704.
Van der Heijden, H., M. Ogertschnig, and L. van der Gast. 2005. Effects of context relevance and perceived risk on user acceptance of mobile information services. In: ECIS 2005 Proceedings, Paper 7. http://aisel.aisnet.org/ecis2005/7, Regensburg, Germany.


van Kleef, N., J. Noltes, and S. van der Spoel. 2010. Success factors for augmented reality business models. In: Study Tour Pixel 2010. University of Twente, Twente, the Netherlands, pp. 1–36.
Venkatesh, V. and F. D. Davis. 2000. A theoretical extension of the technology acceptance model: Four longitudinal field studies. Management Science, 46(2): 186–204.
Verkasalo, H., C. López-Nicolás, F. J. Molina-Castillo, and H. Bouwman. 2010. Analysis of users and non-users of smartphone applications. Telematics and Informatics, 27(3): 242–255.
Vlahakis, V., N. Ioannidis, J. Karigiannis, M. Tsotros, and M. Gounaris. 2002. Virtual reality and information technology for archaeological site promotion. In: Proceedings of the Fifth International Conference on Business Information Systems (BIS'02), Poznan, Poland.
Vuforia. 2014. https://www.qualcomm.com/products/vuforia (accessed March 3, 2015).
Wu, J.-H. and S.-C. Wang. 2005. What drives mobile commerce? An empirical evaluation of the revised technology acceptance model. Information & Management, 42: 719–729.
Wüst, H., M. Zöllner, J. Keil, and D. Pletinckx. 2009. An augmented reality presentation system for remote cultural heritage sites. In: Proceedings of the 10th International Symposium on Virtual Reality, Archaeology and Cultural Heritage VAST (2009), Atlantic City, NJ.

17

Applications of
Augmented Reality for
the Automotive Industry
Vincent Gay-Bellile, Steve Bourgeois,
Dorra Larnaout, and Mohamed Tamaazousti

CONTENTS
17.1 Introduction................................................................................................... 434
17.2 Potential Benefits of Augmented Reality for the Automotive Industry......... 434
17.2.1 Vehicle Design and Conception......................................................... 434
17.2.2 Factory Planning................................................................................ 435
17.2.3 Vehicle Production............................................................................. 435
17.2.4 Sales Support..................................................................................... 436
17.2.5 Driving Assistance............................................................................. 436
17.2.6 User Manual and Maintenance Support............................................ 438
17.3 Technological Challenges for a Large-Scale Deployment............................ 438
17.4 Tracking a Vehicle or One of Its Components..............................440
17.4.1 State of the Art..................................................................................440
17.4.2 Our Solution: VSLAM Constrained by a CAD Model..................... 443
17.4.2.1 Principle.............................................................................. 443
17.4.2.2 Implementation...................................................................444
17.4.3 Discussion.......................................................................................... 445
17.5 Vehicle Localization for Aided Navigation in an Urban Context.................446
17.5.1 State of the Art..................................................................................446
17.5.2 Constrained VSLAM for Large-Scale Vehicle Localization............ 447
17.5.2.1 Constraining VSLAM with GPS........................................448
17.5.2.2 Improving the In-Plane Accuracy with GIS Constraint.....449
17.5.2.3 Improving the Out-Plane Accuracy with a GIS Constraint...........452
17.5.2.4 Overview of Our Complete Solution for Vehicle Localization........................ 452
17.5.3 Discussion.......................................................................................... 453
17.6 Conclusion..................................................................................................... 454
Acknowledgments................................................................................................... 454
References............................................................................................................... 455


17.1 INTRODUCTION
For a number of reasons, the automotive industry is heavily involved in the development of augmented reality (AR) and its applications. Since the late 1990s, car manufacturers and original equipment manufacturers (OEMs) have explored the benefits of AR through major collaborative projects, such as ARVIKA with Daimler Chrysler, Ford, Audi, Volkswagen, and Siemens AG (ARVIKA Project 1999–2003); ARTESAS with Siemens AG and BMW (ARTESAS Project 2004–2006); EFA2014 with Audi, BMW, Continental, and Bosch (EFA2014 Project n.d.); AVILUS with Daimler, Siemens, and Volkswagen; and finally, EGYPT with PSA. During these projects, the use of AR for the whole product life cycle has been studied, from car conception to customer assistance services. Still, the deployment of AR applications in the automotive industry has remained limited due to various technical challenges.
In this chapter, we first provide an overview of the different uses of AR in the
automotive industry and their relative requirements. Then, we focus on recent tracking technology advances that may lead to large-scale deployment of AR solutions in
the automotive industry.

17.2 POTENTIAL BENEFITS OF AUGMENTED REALITY FOR THE AUTOMOTIVE INDUSTRY
In the following sections, we briefly illustrate the potential benefits of AR during the different stages of the life cycle of a vehicle. However, the resulting list of applications is not meant to be exhaustive but representative of the uses of AR in the automotive industry.

17.2.1 Vehicle Design and Conception


Design and ergonomics of a vehicle are the key elements in the purchase decision of
many customers. Therefore, in the vehicle conception phase, engineers must carefully consider customer assessments of various design features as early as possible.
However, due to the multiple variations of a vehicle (equipment, material, etc.), production of a real-size physical mock-up for each variant suggested by a purchaser
would be both expensive and slow. In this context, virtual reality (VR) provides an
efficient solution to allow a potential buyer to observe many variants of a vehicle
using the same display device (e.g., VR glasses, CAVE). However, compared to the
use of VR, in many steps of the design process, physical mock-ups are still being
used since they provide an accurate evaluation of the human factor, which includes
the ability to touch the real object and to better visualize the objects shape and
dimensions.
Compared to VR as a visualization tool, AR offers additional benefits as it allows
virtual elements of an automobile to be visualized while also allowing the end user
to remain in his or her natural environment (Menk etal. 2010; Porter etal. 2010).
Indeed, a mock-up of the target object can be virtually customized in real time
through an AR device (video stream of a camera, AR glasses or projection onto


the mock-up). Further, because it uses a tangible prototype, AR (compared to VR) provides a better perception of the dimensions and the volume of an automobile. The
use of AR also allows the potential customer to move naturally around the object
and even interact with it. Finally, since the fine design elements of an automobile
(e.g., buttons, handles, prints) can be added virtually to the real object, with AR only
a low-fidelity mock-up is required. Compared to the traditional mock-up approach,
the overall cost of production using AR is reduced since one model can be used to
assess many design configurations.

17.2.2 Factory Planning


To remain competitive, that is, to maintain reduced production costs and adapt to evolving consumer tastes, the automotive industry needs a quick renewal pace for its vehicles. This shorter product life cycle implies that the manufacturer must upgrade its installations frequently. However, suspending a production chain to check the validity of a new process is expensive since it reduces the productivity of the current production chain. Maintaining a high availability of the production while
providing flexibility is the target of factory planning. Therefore, virtual simulation
is widely used in factory planning since it allows engineers to simulate the entire
production system, detect discrepancies (e.g., collision between a robotic arm and
its environment), and evaluate the impact of the new workspace without disturbing
the current production chain. However, this solution requires a complete and accurate 3D model of the existing environment, which is difficult to obtain for a plant in
operation.
In the factory planning context, AR can provide an in situ visualization of the virtual simulation (Pentenrieder et al. 2007; Park et al. 2008) where the operator is
invited to move through the plant while observing the virtual simulation integrated
into the real environment. The consistency of the simulation with respect to the real
environment can thus be visually assessed.

17.2.3 Vehicle Production


Although the production lines are increasingly automated, tasks that require complex manipulations, or whose accessibility is limited, are still performed by human
operators. Also, when a vehicle is produced in limited quantity, such as the case for
customized utility vehicles, the use of robots in the production cycle can be prohibitively expensive. In these cases, automobile assembly is achieved by a human technician following a specific order and with dedicated tools. Because the use of operation
documents is time consuming, operators tend to rely on their memory instead of
visually checking the step order. This results in a higher error rate, and a loss of
both quality and time. AR offers an alternative to paper documentation. By directly
superimposing the instruction sequence into the operator's field of view, AR can provide the required information at the right time, at the right place, without requiring
any significant mental effort from the operator. Moreover, the information provided
by AR is not limited to just the description of what to do, but also how to do it: the
tool to use and the correct physical gesture. AR can also improve the perception the


operator forms of the design by showing him or her hidden elements that he or she
otherwise could not directly see, such as the position of an electrical wire behind a
metal panel. This virtual information can be generated from existing CAD data of
the product, or also from data arising from nondestructive testing (e.g., ultrasonic
imagery, tomography).
Summarizing, AR can reduce the risk of human mistakes but also increase the efficiency of the operator. Indeed, with AR the operator can remain focused on an important area of intervention since he or she no longer needs to switch between the documentation and his or her work area. This approach, the use of AR within the automotive industry, was evaluated for tasks such as picking (Reif and Günthner 2009), assembly (Reiners et al. 1998; Regenbrecht et al. 2005), and quality control (Zhou et al. 2012). In addition, while AR technology can be used to make complex operations
easier, it can also be used for training purposes, in which case AR could free senior
staff from basic training duties or accelerate the replacement of an absent worker.

17.2.4 Sales Support
During the process of a sale, an automotive salesperson faces various challenges. First, he or she may have to present to the potential customer the various models and options of a vehicle; yet only one or two models may be available in the showroom. Moreover, even if the vehicle is available, some specific characteristics of the vehicle cannot be easily observed, such as the turning radius or the braking distance. One current solution to this problem is to use a VR configurator. However, this approach has problems since the virtual representation is limited, given that the user's perception of the dimensions and volumes of the automobile is distorted.
AR provides an interesting alternative to this problem. Similar to a virtual configurator, with AR the vehicle can be customized interactively (Gay-Bellile et al.
2012). Importantly, with this approach, since the augmentation is achieved on a real
vehicle, the perception of dimensions and volumes are preserved. With AR the end
user experience can be further improved by integrating some interactions between
the real environment and the virtual elements. For example, if the customer chooses
to change the color of the bodywork, it is important to provide a rendering that respects the current lighting conditions but that also keeps the reflections of the surrounding environment on the bodywork. And finally, as illustrated in Figure 17.1, while customization through AR is useful for cars purchased by the average customer, it is even more useful for utility vehicles due to their extremely customizable interior furnishings.

17.2.5 Driving Assistance


In driving conditions, perception of the surrounding environment is a key element of
the driving task. AR can be used to improve this perception in various ways. First,
AR can superimpose, with respect to the driver's field of view, his or her current
trajectory in order to support him or her during a maneuver. AR can also suggest a
trajectory to help the driver during his or her navigation decisions (see Figure 17.2).
In this latter case, AR can reduce the cognitive effort of the driver since, contrary


FIGURE 17.1 AR for sales assistance in an automotive showroom: while the constrained VSLAM accurately aligns the CAD model of the vehicle with the video stream (top left), AR is used to customize the interior furnishings of the vehicle (bottom left). The whole processing is done in real time on a mobile tablet (bottom right). (Courtesy of www.diotasoft.com.)

FIGURE 17.2 Driving assistance in AR. While the vehicle is accurately localized with a constrained VSLAM using a camera, a standard GPS, and a coarse 3D city model (top left), AR is used to display the path to follow (top right), safety information such as dangerous intersections (top right) and crosswalks (bottom left), or tourist information such as hotels (bottom right). (Courtesy of CEATECH.)


to current aided navigation systems, the driver would not have to mentally transpose a schematic representation of the road network onto his or her current perception of the environment. AR can also be used to focus the driver's attention on potential dangers, such as crosswalks and intersections (see Figure 17.2), and other road users.
AR can also improve the driver's perception of the roadway environment when observation conditions are degraded due to weather or other reasons. For example, with AR, the edges of the road can be highlighted in foggy conditions; and vehicles occluded by a building (Barnum et al. 2009) or by another vehicle (Gomes et al. 2013) can be displayed to the driver using AR. Finally, the development of autonomous vehicles will probably favor the development of infotainment applications, such as tourist guides, and AR will be especially useful in providing this information.

17.2.6 User Manual and Maintenance Support


AR also provides an interesting alternative to the paper-based user and technical manual that is provided with a vehicle. Indeed, in such manuals, even if the
instructions are provided through schematics or pictures, the transcription of
these instructions onto the real vehicle is not necessarily intuitive. By providing
these instructions directly onto the real vehicle (Friedrich 2002; Gay-Bellile et al. 2012), AR avoids confusion (see Figure 17.3). Moreover, if the user does not look
at the area of the vehicle needing his or her attention, AR can guide him or her to
the correct location. And finally, if the manual is not sufficient, a remote expert
can provide dedicated instruction by annotating, online, the live video stream of
an AR device (Reitmayr et al. 2007).

17.3 TECHNOLOGICAL CHALLENGES FOR A LARGE-SCALE DEPLOYMENT
While studies have demonstrated the potential benefits of AR in the automotive field, most of the proposed solutions have remained at the level of proof of concept. To advance another step further, it is necessary to develop prototypes that demonstrate the ability of AR technology to meet the following requirements:
Service quality: The virtual elements of an AR system should provide clear
and unambiguous information, and accurate and stable registration of these
elements onto the real environment; otherwise the system would lead to
errors or would require a significant cognitive effort for the end user to
compensate for these inaccuracies.
Service continuity: The AR system should provide a high availability ratio since it affects the productivity or the brand image of the car manufacturer. Robustness to variations in its environmental conditions of use is a main concern.
Ease of use and ergonomics: The AR system should be intuitive enough to be used by the targeted end user, who, realistically, is not expected to be an expert in AR technology. Moreover, the AR system should be compatible


FIGURE 17.3 AR for maintenance and repair. (Courtesy of www.diotasoft.com.)

with the task it is required to support. For example, assembly-assistance or driving-assistance applications require a hands-free device.
Ease of deployment: The cost for establishing and operating the system
should be compatible with the benefits provided by its use. For a large-scale
deployment, the equipment and data required for its exploitation must be
available at low cost.
To reach these objectives, the main challenge for developing AR for the automobile industry is localization technology. Indeed, as we will underline in the following sections, most of the current solutions rely on tracking systems that cannot be deployed at large scale due to their lack of accuracy, their lack of robustness, or their prohibitive cost.


Of course, localization is not the only challenge to implementing AR in the automotive industry. Providing a hands-free AR device remains a key challenge for the
solution's ergonomics, as illustrated by the research activity on semitransparent glasses and windshields. However, depending on the targeted application, the use of a separate screen (such as the screen of a tablet or the screen of Google Glass) or of a video projector (spatial AR) can provide an imperfect but nevertheless acceptable alternative.
Consequently, in the following sections, we will focus our attention on the localization issue and introduce the recent advances we have made on this subject. Also,
we will distinguish tracking solutions with respect to their applicative context: AR on
the vehicle and AR from the vehicle. The former covers all the applications for which
the car, or one of its components, is the target of the augmentation, while the latter
covers mainly the driving-assistance applications.

17.4 TRACKING A VEHICLE OR ONE OF ITS COMPONENTS


The automotive industry remains a challenging context for current tracking technologies since it requires an accurate localization of potentially complex objects by
a nonexpert user. In the following, we will first introduce the current state-of-the-art
solutions and their relative limitations with respect to the requirements outlined in
Section 17.3. Then, we will introduce our tracking solution and discuss its benefits
in the automotive context.

17.4.1 State of the Art


Within the automotive industry, various tracking solutions have been attempted. A first solution consists of exploiting the 6DoF mechanical measurement arms that
are already in use in the automotive industry (Kahn and Kuijper 2012). As long
as the arm is required for the targeted task, this solution provides extremely accurate and robust localization. However, in most of the applicative contexts, such an
approach does not provide an efficient solution. Indeed, the volume covered by such
an arm is limited by nature and its use induces a limitation in terms of accessibility
(collision between an arm segment and an element of the environment). Moreover,
its cost represents a prohibitive drawback for its deployment.
A second approach relies on optical tracking solutions that use 2D (Calvet et al.
2012) or 3D markers (Pintaric and Kaufmann 2007). These solutions consist of rigidly attaching a set of markers onto the target object (inside-out approach) or onto
the observer (outside-in approach). Within the inside-out approach, the pose of the
object with respect to the camera is directly estimated from an image of the marker
in the camera's video stream. Since at least one marker must be visible to provide localization, the volume covered by this approach is limited by the number of markers. However, even attaching a single marker onto the target object can be unacceptable in some contexts. For example, sales-support applications require that the
appearance of the car remains unchanged. In such contexts, the outside-in approach
provides an alternative solution. With this solution, the marker is no longer attached
to the target object, but onto the observer. A constellation of static cameras is used to


localize the observer with respect to the object. For that, an off-line calibration process has to be performed beforehand to determine the pose of the target object with
respect to the camera constellation. Similar to the inside-out approach, the outside-in
approach requires the marker to be visible by at least one camera of the constellation. Therefore, the volume covered by such an approach is limited. In both cases,
the necessity of instrumenting the environment with markers or cameras implies a
deployment process whose complexity is incompatible with most of the applicative
contexts. For the outside-in approach, the cost of the camera constellation can also
be a further limitation.
To prevent these deployment problems, optical solutions that rely exclusively on
natural features of the target object have been developed. They are referred to as
model-based tracking and rely on a 3D model that describes the natural features of
the target object (assumption of known object). These 3D features can be geometrical, such as edges of a 3D mesh model (Drummond and Cipolla 2002; Wuest et al. 2007), or photo-geometric, such as a collection of 3D texture patches (Rothganger et al. 2003; Gordon and Lowe 2006). The pose P of the object can be estimated from a camera image by matching a subset $\{Q_i\}_{i=1}^{n}$ of these 3D model features with their corresponding 2D features $\{q_{i,j}\}_{i=1}^{n}$ in the jth camera image. The object pose is then defined as the one that minimizes the reprojection error, that is, the distance between the 2D projection of the 3D features according to the object pose and the camera's intrinsic parameters and the position of their associated 2D features in the image:

\arg\min_{P} \sum_{i=1}^{n} d^2\left(q_{i,j}, \pi(P, Q_i)\right)

where
$d^2$ is the squared 2D Euclidean distance
$\pi$ is the camera projection function
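To make this minimization concrete, the following Python sketch estimates an object pose from known 2D-3D correspondences by minimizing the reprojection error with OpenCV's iterative PnP solver. OpenCV, the function names, and the data layout are illustrative assumptions on our part and are not part of the systems described in this chapter.

# Minimal sketch (not the authors' implementation): estimate the pose P of a known
# object from matched 3D model features Q_i and their 2D observations q_i by
# minimizing the reprojection error, here delegated to OpenCV's iterative PnP solver.
import numpy as np
import cv2

def estimate_object_pose(model_points, image_points, K):
    # model_points: (n, 3) array of 3D features Q_i expressed in the object frame
    # image_points: (n, 2) array of their 2D observations q_i in the current image
    # K: (3, 3) camera intrinsic matrix
    dist = np.zeros(4)  # assume an undistorted (or pre-rectified) image
    ok, rvec, tvec = cv2.solvePnP(
        model_points.astype(np.float64),
        image_points.astype(np.float64),
        K.astype(np.float64),
        dist,
        flags=cv2.SOLVEPNP_ITERATIVE,  # iterative minimization of the reprojection error
    )
    if not ok:
        raise RuntimeError("pose estimation failed")
    projected, _ = cv2.projectPoints(model_points.astype(np.float64), rvec, tvec,
                                     K.astype(np.float64), dist)
    residual = np.linalg.norm(projected.reshape(-1, 2) - image_points, axis=1)
    return rvec, tvec, residual.mean()  # rotation (Rodrigues), translation, mean error (pixels)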
While model-based tracking solutions are easy to deploy and result in accurate localization, their lack of robustness is not compatible with the quality and continuity of
service required.
Indeed, model-based tracking is subject to two major drawbacks. First, the
matching of the 3D features with 2D observations can only be achieved under
restrictive conditions of use. On the one hand, the information encoded in geometric
features is not discriminant enough to easily distinguish between two observations.
To succeed, a usual solution consists in reducing the matching complexity problem
by introducing a small motion assumption. However, this assumption restricts the conditions of use of the system. On the other hand, photometric features are discriminant enough to achieve the matching under fast motion. However, because the
appearance of an object varies with the lighting conditions, a 3D model based on
photometric features is only valid under specific lighting conditions. This constraint
reduces the use of this solution to an environment where the lighting conditions are
controllable.


Second, the pose of the camera is not accurately estimated when the object is
small in the image, subject to a large occlusion, or out of the field of view, since
those configurations do not provide enough geometric constraints: the number of 2D
features matched with the 3D model is small and/or these 2D features are located in
a small area of the image.
Among the optical tracking solutions, a last approach provides a greater robustness to lighting condition variations and large viewpoint variations. This solution
is usually referred to as visual simultaneous localization and mapping (VSLAM) or
structure from motion (SfM), depending on the particular scientific community
(Mouragnon et al. 2006; Klein and Murray 2007).
While the camera pose is estimated in a similar way to a model-based tracking solution, VSLAM approaches do not use an a priori model of an object but
reconstruct, online and in real time, a 3D model of the whole scene (assumption of
unknown environment). To achieve this reconstruction, VSLAM uses the principle of
multiview geometry (Hartley and Zisserman 2004) in order to assess the 3D position
of scene features, such as 3D points, from their apparent 2D motion
in the video stream. Since a long-enough 2D displacement is required to estimate
the depths of the features, this reconstruction process is not achieved in each frame.
The frames at which the reconstruction process is achieved are usually referred to
as keyframes. To reach an optimal trajectory and scene reconstruction with respect
to the multiview geometry constraints, both of them are optimized simultaneously
with a nonlinear optimization process, referred to as bundle adjustment (BA). This
optimization process minimizes the error R that corresponds to the sum of square
differences between the 2D projection of each 3D point and its associated 2D observations in each keyframe (also referred to as reprojection error):
R(\chi) = \sum_{i=1}^{n} \sum_{j \in A_i} d^2\left(q_{i,j}, \pi(P_j, Q_i)\right)

where
$\chi$ stands for the scene parameters optimized by the BA (i.e., the coordinates $\{Q_i\}_{i=1}^{n}$ of the reconstructed 3D features and the pose parameters $\{P_j\}_{j=1}^{l}$ at keyframes)
$d^2$ is the squared Euclidean distance
$\pi$ is the projection function of a point $Q_i$ with respect to a camera pose $P_j$
$A_i$ is the set of keyframe indexes in which the point $Q_i$ is associated to an observation $q_{i,j}$
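For concreteness, a minimal NumPy sketch of this cost is given below; the data layout (a dictionary of observations keyed by point and keyframe indexes, encoding the sets A_i) is an assumption made purely for illustration.

# Minimal sketch (illustration only): the bundle adjustment cost R(chi) as a double sum
# over reconstructed 3D points Q_i and the keyframes j in A_i where they are observed.
import numpy as np

def reprojection_error(points_3d, keyframe_poses, observations, K):
    # points_3d: list of 3D points Q_i (world frame)
    # keyframe_poses: list of (R, t) pairs, one per keyframe pose P_j (world -> camera)
    # observations: dict {(i, j): q_ij} encoding the sets A_i
    # K: (3, 3) intrinsic matrix; the pinhole projection plays the role of pi
    total = 0.0
    for (i, j), q_ij in observations.items():
        R, t = keyframe_poses[j]
        p_cam = R @ points_3d[i] + t          # Q_i in the camera frame of keyframe j
        p_img = K @ p_cam
        q_proj = p_img[:2] / p_img[2]         # pi(P_j, Q_i)
        total += float(np.sum((np.asarray(q_ij) - q_proj) ** 2))  # squared 2D distance
    return total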
Since the camera localization is estimated from a 3D model that covers the whole
observed scene, the VSLAM approach provides an excellent robustness to a large
and fast camera motion and partial scene occlusion. Moreover, since the 3D model of
the scene is built online, the photometric appearance of the model features is, by construction, consistent with the current lighting conditions. Unfortunately, the VSLAM
approach provides poor localization accuracy. Indeed, in spite of the BA, the reconstruction process is usually subject to error accumulation due to its incremental nature.
Moreover, VSLAM localization is expressed in an arbitrarily chosen coordinate


frame and, when a single camera is used, with an arbitrary scale factor. However, this last
drawback can be partially resolved by bootstrapping the 3D model of the scene with
the help of a model-based tracking approach. While the model-based solution is used
to estimate the pose of the first keyframe, an initial reconstruction of the scene is
achieved by back-projecting the 2D image features onto the CAD model of the target
object (Bleser et al. 2006). While this initial reconstruction process provides a coordinate frame and a scale factor to the VSLAM localization, their accuracy remains limited
since the reconstruction is achieved from a single point of view.
In the following, we will introduce our optical tracking approach that goes one
step further toward the combination of model-based tracking and VSLAM, and provides an accurate and robust localization while remaining easy to deploy.

17.4.2 Our Solution: VSLAM Constrained by a CAD Model


The VSLAM approach uses the whole scene to estimate the 3D camera motion but assumes that no a priori knowledge of this environment is available (assumption of an unknown environment), whereas model-based approaches use exclusively the elements of the scene for which an a priori model is available (assumption of a known object). However, in most automotive applications, the scene is constituted of a known object (the car or one of its components, for which a CAD model is available) surrounded by an unknown environment (assumption of a partially known environment). In this context, it would be unfortunate to neglect the information provided by either the unknown environment or the CAD model. To benefit from both sources of information (unknown environment and CAD model), we propose to merge the VSLAM solution and the model-based solution in a new approach referred to as constrained VSLAM. More specifically, since car components are usually textureless, we will consider a model-based approach based on a purely geometric model.
To develop a tracking solution that is accurate, robust, and easy to deploy with this
new approach, we consider the following assumptions:
A CAD model of the target object is available.
The target object remains static with respect to its surrounding environment
during the tracking process.
We propose that both of these assumptions are reasonable in most automotive applications. Indeed, CAD models are widely available in the automotive industry and it
is unusual to move a car during a sale, for quality control, or a maintenance process.
17.4.2.1 Principle
Similar to Bleser (Bleser et al. 2006), a model-based approach based on a CAD
model is used to bootstrap a VSLAM process. However, while Bleser ignores the
CAD model during the remaining tracking process, we propose to use this a priori
knowledge during the whole tracking process to protect VSLAM from the problem of error accumulation.
Indeed, if only the multiview constraints are used to estimate the motion of the
camera, the resulting camera trajectory and scene reconstruction will be subject to


error accumulation. The resulting drift can be observed, for example, at keyframes
by projecting the CAD model onto the image with the camera pose estimated by the VSLAM process. The error on the camera pose will generate an offset between the sharp edges of the CAD model and their corresponding contours in the image.
If the constraints provided by the CAD model were strictly respected, these offsets
would be null.
To prevent drift, we introduce the constraints provided by the CAD model directly
into the VSLAM process. Consequently, the optimal trajectory and environment
reconstruction must minimize simultaneously the multiview constraints (i.e., the
reprojection errors of the 3D features reconstructed online) but also the constraints
provided by the CAD model (i.e., the offset between the projection of the 3D sharp
edges of the CAD model and their corresponding contours in the image).
17.4.2.2 Implementation
We propose to use model-based constraints provided by the sharp edges of the 3D
model (i.e., for a polygonal model, edges used by two triangles and whose dihedral
angle is superior to a threshold). Similar to Drummond and Cipolla (2002), these
sharp edges are sampled into a set of oriented points $\{L_i\}_{i=1}^{s}$, usually referred to as edgelets, each point being parameterized by its 3D position $M_i$ and the 3D direction $D_i$ of the sharp edge from which it was extracted. An edge-based model for the bodywork of a car is illustrated in Figure 17.4.
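As an indication of how such an edge-based model might be built from a mesh, the sketch below selects the edges shared by two triangles whose adjacent face normals differ by more than a threshold and samples them into edgelets; the threshold and sampling density are arbitrary illustrative choices, not values from the chapter.

# Minimal sketch (illustration only): extract sharp mesh edges (angle between the two
# adjacent face normals above a threshold) and sample them into edgelets (M_i, D_i).
import numpy as np
from collections import defaultdict

def extract_edgelets(vertices, faces, angle_thresh_deg=30.0, samples_per_edge=5):
    v = np.asarray(vertices, dtype=float)      # (V, 3) vertex positions
    faces = np.asarray(faces, dtype=int)       # (F, 3) triangle vertex indices

    # Unit normal of each triangle
    n = np.cross(v[faces[:, 1]] - v[faces[:, 0]], v[faces[:, 2]] - v[faces[:, 0]])
    n /= np.linalg.norm(n, axis=1, keepdims=True)

    # Map each undirected edge to the faces that share it
    edge_faces = defaultdict(list)
    for f_idx, (a, b, c) in enumerate(faces):
        for e in ((a, b), (b, c), (c, a)):
            edge_faces[tuple(sorted(e))].append(f_idx)

    edgelets = []
    cos_thresh = np.cos(np.radians(angle_thresh_deg))
    for (a, b), adj in edge_faces.items():
        if len(adj) != 2:
            continue                           # keep only edges shared by exactly two triangles
        if np.dot(n[adj[0]], n[adj[1]]) > cos_thresh:
            continue                           # normals nearly parallel: not a sharp edge
        direction = v[b] - v[a]
        direction /= np.linalg.norm(direction)
        for s in np.linspace(0.0, 1.0, samples_per_edge):
            edgelets.append((v[a] + s * (v[b] - v[a]), direction))  # (M_i, D_i)
    return edgelets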
To introduce the edge-based constraint into a VSLAM process while keeping its
real-time performance, we propose to modify exclusively the BA process. Indeed,
since it can be achieved as a background process (Klein and Murray 2007), the performance of the tracking process will not be altered by our modifications. However,
modifying the BA will affect the result provided by the tracking process since the
BA refines the position of the 3D features used to estimate the camera pose.
In the BA, the edge-based constraint takes the form of an additional term hCAD
in the cost function. This term corresponds to the orthogonal distance between the

FIGURE 17.4 Illustration of an edge-based model for the bodywork of a car.


projection of the edgelet $M_i$ under the camera pose $P_j$ and its corresponding contour point $m_{i,j}$ in the image. This corresponding contour point is usually defined as the nearest contour point located along the normal of the edgelet direction, with an orientation similar to that of the projected edgelet. Consequently, the deviation $h_{CAD}$ of the scene with respect to the CAD-model constraints is defined as follows:

h_{CAD}(\chi) = \sum_{i=1}^{s} \sum_{j \in B_i} \left( n_{i,j}^{\top} \left( m_{i,j} - \pi(P_j, M_i) \right) \right)^2

where
$n_{i,j}$ is the normal to the projected direction of $D_i$ under the camera pose $P_j$
$\pi$ is the camera projection function of a 3D edgelet $M_i$ with respect to a camera pose $P_j$
$B_i$ is the set of keyframe indexes for which the edgelet $M_i$ is associated to a 2D contour point $m_{i,j}$
The optimal scene reconstruction $\chi_{optim}$ is therefore defined as the scene that simultaneously minimizes the multiview and CAD-model constraints:

\chi_{optim} = \arg\min_{\chi} \left( R(\chi) + h_{CAD}(\chi) \right)

This biobjective cost function is optimized through a BA. More details about its implementation can be found in Gay-Bellile et al. (2012).
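The sketch below shows how the CAD-model term might be evaluated once the edgelet-to-contour associations are available; added to the reprojection error R(χ) of Section 17.4.1, it yields the biobjective cost above. The data structures are illustrative assumptions and do not reflect the authors' actual implementation.

# Minimal sketch (illustration only): the CAD-model term h_CAD(chi), assuming the
# edgelet-to-contour associations (matched contour point m_ij and edgelet normal n_ij)
# have already been established in each keyframe.
import numpy as np

def cad_constraint(edgelets, keyframe_poses, contour_matches, K):
    # edgelets: list of (M_i, D_i) pairs, 3D position and direction of each edgelet
    # keyframe_poses: list of (R, t) pairs, one per keyframe pose P_j
    # contour_matches: dict {(i, j): (m_ij, n_ij)} encoding the sets B_i
    total = 0.0
    for (i, j), (m_ij, n_ij) in contour_matches.items():
        M_i, _ = edgelets[i]
        R, t = keyframe_poses[j]
        p_cam = R @ np.asarray(M_i) + t
        p_img = K @ p_cam
        p_proj = p_img[:2] / p_img[2]                          # projected edgelet position
        offset = float(np.dot(n_ij, np.asarray(m_ij) - p_proj))  # signed offset along the normal
        total += offset ** 2
    return total

# The constrained bundle adjustment then minimizes R(chi) + h_CAD(chi) over the 3D
# points and keyframe poses (chi), for example with a nonlinear least-squares solver.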

17.4.3 Discussion
The combination of model-based and VSLAM approaches provides two main benefits.
On the one hand, the constraints provided by the CAD model bring accuracy to the VSLAM process. Indeed, the CAD model defines a reference frame and also prevents the drift of the scale factor and the error accumulation, which are the main drawbacks of the standard VSLAM process. Moreover, our solution provides a better robustness to coarse initialization than usual solutions that bootstrap a VSLAM process with a model-based solution (Bleser et al. 2006). In fact, the error that a coarse initialization induces on the online
reconstruction will be progressively corrected since the model-based constraint is optimized for several keyframes simultaneously. Consequently, our constrained VSLAM
process is easier to deploy and provides a more accurate online reconstruction.
On the other hand, the reconstruction of the unknown environment realized by
the VSLAM process provides the robustness that was missing in the standard modelbased approach. Therefore, the features used to estimate the camera poses are distributed over the whole image since the whole environment is used to estimate the
pose of the camera. Even if the object is small in the image, occluded, or out of
the field of view, its location with respect to the camera can still be estimated from
the observation of its surrounding environment. Moreover, because the edgelet-to-
contour association requires a small motion assumption, standard model-based solutions are not robust to fast motion. However, in our constrained VSLAM approach,


a first motion estimation is provided by the VSLAM process. Since the edgelet-to-contour matching process is achieved from the camera pose provided by this first estimation, the small motion assumption is always satisfied.
Also, since VSLAM and geometric model-based tracking are both robust to illumination variations, the resulting constrained VSLAM maintains this robustness.
Consequently, our solution provides both accuracy and robustness, thus meeting the requirements of service quality and continuity. On the other hand, the ease of deployment requirement is also fulfilled by our constrained VSLAM solution since it uses only a standard camera and a CAD model that are widely used in the automotive
industry, and since it can be coarsely initialized.
Our constrained VSLAM solution was successfully applied on various parts of a
vehicle, such as the bodywork, a cylinder head, and the automobile's interior. Some
results are shown in Figure 17.1 where our approach was used for sales assistance
using AR technology. Notice that these scenarios would have been extremely challenging for the usual model-based or VSLAM approaches since the bodywork provides few sharp edges, while its texture is unstable because it is mostly generated by
specular reflections of the surrounding environment.

17.5 VEHICLE LOCALIZATION FOR AIDED NAVIGATION IN AN URBAN CONTEXT
While standard navigation aiding systems often settle for a coarse localization provided
by a standard GPS system, the use of AR requires an accurate 6DoF localization at high
frequency. Indeed, while a driver can mentally compensate for an inaccurate localization in
a usual exocentric top-down view of the road network, this is no longer the case when the
road to follow is displayed in an egocentric view. In the latter case, navigational instructions must be perfectly aligned with the real world in order not to mislead the driver.
In the following material, we present an overview of the accurate localization
solutions for large-scale localization in a city. Among these existing solutions, we
then describe in detail the fusion of VSLAM with GPS data. Finally, we present our
solution that uses the GIS models to improve the localization accuracy of the fusion
of VSLAM with GPS data.

17.5.1 State of the Art


Accurate vehicle localization is a very active field of research among the autonomous
vehicle community. However, most of the solutions require sensors (LIDAR, GPS
RTK, etc.) that are too expensive for our context. Consequently, we will mostly focus
on vision-based solutions that can be initiated with low-cost cameras.
A first family of vision-based approaches relies on the exploitation of a geo-referenced landmark database that has been previously built off-line. Among
these solutions, we can usually distinguish two approaches. The first one uses a
database of geo-referenced 2D pictures, such as Google StreetView. The vehicle
localization is therefore defined as the location of database images whose appearance is the most similar to the appearance of the live video stream (Cummins and
Newman 2011). However, those solutions, usually referred to as topological VSLAM,

FIGURE 17.5 (a) The in-plane degrees of freedom, that is, the x and y position and the yaw angle. (b) The out-plane degrees of freedom, that is, the altitude, the pitch, and the roll angle.

do not provide an accurate 6DoF localization and, consequently, do not meet the
quality of service criterion expressed earlier.
The second approach relies on a database of 3D visual features, such as 3D points
associated with a description of their visual appearance. This approach corresponds
to a photo-geometric 3D model that can be used online by model-based tracking
solutions (Lothe et al. 2009), previously introduced in Section 17.4.1, to estimate the 6DoF of the camera pose at high frequency. However, this approach is subject to serious drawbacks. First, the databases are not widely distributed and their construction is usually expensive since it requires collecting data from a dedicated vehicle that drives along all the streets of all the cities. Second, the localization process is usually not robust to large illumination variations with respect to the illumination conditions observed during database construction. Therefore, the continuity of service cannot be guaranteed with this solution.
To avoid the drawbacks induced by a landmark database, another family of vision-based approaches relies on the exploitation of VSLAM (cf. Section 17.4.1). However,
as already mentioned, this approach is subject to error accumulation, and scale factor
drift when a single camera is used. To prevent these problems, most of the solutions
try to fuse the motion estimated by VSLAM with the location provided by GPS
(Schleicher et al. 2009). While these solutions provide high-frequency localization, the estimation remains inaccurate. Indeed, the accuracy of the in-plane parameters (cf. Figure 17.5) is usually limited to the GPS accuracy, while the out-plane
parameters (cf. Figure 17.5) are not estimated or result in high uncertainty.
Consequently, none of the previously mentioned solutions provide at the same
time the service quality/continuity and ease of deployment. However, in the following, we will demonstrate that the VSLAM approach can fulfill all these requirements
when it is correctly constrained with additional data, such as that derived from a GPS
and a geographic information system (GIS).

17.5.2 Constrained VSLAM for Large-Scale Vehicle Localization


To provide a solution that fulfills both the quality and ease of deployment requirements, we propose to combine the VSLAM approach with data provided by a GPS


and a GIS, both being low cost and widely used. Similar to the approach that we introduced for tracking vehicle components, our solution relies on a constrained VSLAM
framework. The two solutions differ only in the nature of the constraints used.
In the following, we will first describe how the in-plane DoF of a VSLAM can
be coarsely constrained with GPS data (Section 17.5.2.1). Then, we will demonstrate
that a standard GIS can be used to refine both the in-plane (Section 17.5.2.2) and out-plane DoF (Section 17.5.2.3) of the localization.
17.5.2.1 Constraining VSLAM with GPS
The GPS constraint to consider is that the camera positions provided by the VSLAM must follow the positions provided by the GPS. In fact, if we consider that the camera is rigidly attached to the GPS receiver and that the distance between these two sensors is negligible, then this constraint can be observed by analyzing the gap between the GPS measurements and the camera positions. This results in the following GPS constraint:

h_{GPS}(\chi) = \sum_{j=1}^{l} \left[ \left( (t_j)_x - (g_j)_x \right)^2 + \left( (t_j)_y - (g_j)_y \right)^2 \right]

where
$((t_j)_x, (t_j)_y)$ is the jth camera's position and $((g_j)_x, (g_j)_y)$ its associated GPS data
$l$ is the number of camera poses optimized in the BA
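A direct transcription of this term is sketched below; the array shapes are assumptions chosen for illustration.

# Minimal sketch (illustration only): the GPS term h_GPS(chi), the squared horizontal
# offsets between the optimized camera positions t_j and the GPS measurements g_j.
import numpy as np

def gps_constraint(camera_positions, gps_positions):
    # camera_positions: (l, 3) array of camera positions t_j (x, y, z) at keyframes
    # gps_positions:    (l, 2) array of associated GPS measurements g_j (x, y)
    t = np.asarray(camera_positions, dtype=float)
    g = np.asarray(gps_positions, dtype=float)
    return float(np.sum((t[:, :2] - g) ** 2))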
Including this GPS constraint in the BA gives

A(\chi) = R(\chi) + w \, h_{GPS}(\chi)

A weight w must be added to take into account the uncertainty of the GPS data
and its variation over time (which is not the case for the CAD model constraints
since it is assumed error free). The presence of aberrant data makes the weight estimation a challenging problem. To avoid this problem, Lhuillier (2012) proposes a
new formulation of the constraint that allows more robustness against inaccurate
data. The principle is to progressively introduce the additional constraints (GPS, 3D
model) while not degrading significantly the multiview geometry (i.e., the reprojection error) beyond a certain threshold:

\arg\min_{\chi} h(\chi) \quad \text{subject to} \quad R(\chi) < e_t

where
$\chi$ is a vector containing all the parameters optimized in the local BA (the n 3D points and the l camera poses)
$R(\chi)$ is the reprojection error previously introduced in Section 17.4.1


The inequality constraint $R(\chi) < e_t$ can be formalized through a cost function that contains, in addition to the constraint term $h(\chi)$, a regularization term prohibiting a degradation of the reprojection error beyond the predefined threshold $e_t$. This regularization term, computed from the standard reprojection error, has a negligible value when the mentioned condition is respected and tends toward infinity when the condition is close to being broken. Consequently, the resulting cost function is given by

I(\chi) = \frac{w}{e_t - R(\chi)} + h(\chi)

where $w$ is a constant that attributes a negligible influence to the regularization term when the degradation of the reprojection error is low.
To realize a BA with the inequality constraint, two steps are required. The first consists in performing a classic BA where the standard cost function $R(\chi)$ is minimized. This step allows us to determine the maximal degradation of the reprojection error $e_t$. Next, the additional data are introduced through the constrained BA with the inequality constraint by minimizing $I(\chi)$.
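The barrier-style cost of this two-step procedure can be sketched as follows; the default weight value is a placeholder, not a setting from the chapter.

# Minimal sketch (illustration only): the cost I(chi) used in the second step of the
# constrained bundle adjustment, following Lhuillier's inequality-constrained formulation.
def barrier_cost(R_chi, h_chi, e_t, w=1e-6):
    # R_chi: current reprojection error R(chi)
    # h_chi: current value of the additional constraint h(chi) (GPS, 3D model, ...)
    # e_t:   maximal allowed reprojection error, fixed after a classic BA (step 1)
    # w:     small constant keeping the barrier negligible while R(chi) stays well below e_t
    if R_chi >= e_t:
        return float("inf")        # the inequality constraint would be broken
    return w / (e_t - R_chi) + h_chi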
Fusing GPS data with VSLAM through the constrained BA described earlier results in a 6DoF geolocalization. However, even if aberrant GPS data are filtered due to the inequality constraints, the accuracy of the resulting camera positions is broadly equivalent to the GPS uncertainty. Furthermore, adding GPS data to the BA process constrains only the camera position and, implicitly, the yaw angle (or heading) of the camera orientation; we call these parameters the in-plane camera parameters
(cf. Figure 17.5). The other degrees of freedom (altitude, pitch, and roll angle) that
we call out-plane parameters (cf. Figure 17.5) are not constrained and may drift over
time. Consequently, constraining a VSLAM with a GPS is not sufficient to reach the
required accuracy. In the following, constraints provided by a GIS will be added to
solve this remaining problem.
17.5.2.2 Improving the In-Plane Accuracy with GIS Constraint
Nowadays, GISs consist of several layers, such as a road map, a digital elevation model (DEM) of the ground, or a 3D buildings model. It is this last layer that we will use as an additional constraint to improve the in-plane accuracy of our VSLAM process.
An intuitive approach would consist of adding the model-based constraints introduced in Section 17.4.2 to the BA constrained with GPS data (cf. Section 17.5.2). However, there are two major difficulties with this intuitive approach. The first is inherent to the quality of the buildings models, which are less accurate and detailed than CAD models of small objects. Even if high-quality models of cities exist, they are currently expensive and not available everywhere. The most widespread models are obtained by aerial photogrammetry, which limits their accuracy and may cause errors of up to 2 m. Furthermore, the resulting models are very simple (no details: doors, windows, etc.) and textureless. For these reasons, exploiting 3D features extracted from a city model in a constrained BA is not feasible. We propose in Section 17.5.2.2.1 another constraint that is well suited to the simplicity of buildings models.


The second difficulty is that the GPS and buildings constraints both act (explicitly or implicitly) on the camera position and have different optimal solutions due to the large GPS uncertainty in dense urban areas. Thus, merging these two constraints in a single constrained BA can lead to convergence problems. Therefore, we propose a solution that uses the buildings models to remove the bias that affects the GPS data before including them in the BA process.
17.5.2.2.1 Buildings Constraint
The principle of the constraint provided by the 3D buildings models is based on the
following hypothesis: a perfect VSLAM reconstruction (i.e., without any drift) must lead to a 3D point cloud that is almost aligned with the 3D buildings model of the observed scene. Consequently, it is possible to evaluate the compliance or noncompliance with the buildings constraint by measuring the difference between the positions of the reconstructed 3D points and their corresponding facades in the 3D buildings model. However, not all the 3D points reconstructed by the VSLAM algorithm belong to a facade. In fact, some 3D points may represent elements belonging to parked cars, trees, and road signs. This set of points is not concerned by the buildings constraint. Consequently, the first step to establish this constraint is to identify the set M of 3D points that belong to building fronts and associate each one with its corresponding facade. The constraint term associated with the buildings model must measure the distance between each point $Q_i \in M$ and its corresponding facade $h_i$. For that, each point $Q_i \in M$ is expressed in the coordinate frame of its facade:

\begin{pmatrix} Q_i^{h_i} \\ 1 \end{pmatrix} = T_{h_i} \begin{pmatrix} Q_i \\ 1 \end{pmatrix}

where $T_{h_i}$ is the (4 × 4) transfer matrix from the world coordinate frame to the facade $h_i$ coordinate frame. In this coordinate frame, the distance between the 3D points and the buildings model is simply given by the z coordinate of $Q_i^{h_i}$. Therefore, the compliance of a point cloud with respect to the buildings constraint can be estimated as follows:
h_{bdg}(\chi) = \sum_{i \in M} \left( \left( Q_i^{h_i} \right)_z \right)^2

To deal with the many wrong 3D point/buildings model associations, a robust estimator works better than the L2 norm. More details on the buildings constraint are given in Tamaazousti et al. (2011).
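A literal transcription of the buildings term is sketched below; the facade transforms and the point-to-facade associations are assumed to be given, and the plain square would be replaced by a robust loss in practice, as noted above.

# Minimal sketch (illustration only): the buildings term h_bdg(chi). Each 3D point
# associated with a facade is moved into that facade's frame, where the z coordinate
# is its distance to the facade plane.
import numpy as np

def buildings_constraint(points_3d, facade_transforms, associations):
    # points_3d: list/array of reconstructed 3D points Q_i (world frame)
    # facade_transforms: list of 4x4 world-to-facade matrices T_hi
    # associations: dict {point index i: facade index h_i} for the points in the set M
    total = 0.0
    for i, h_i in associations.items():
        T = np.asarray(facade_transforms[h_i])
        q = T @ np.append(points_3d[i], 1.0)   # homogeneous point in the facade frame
        total += float(q[2]) ** 2              # squared distance to the facade
        # in practice, a robust loss (e.g., Huber) would replace this plain square
    return total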
17.5.2.2.2 Improving GPS Accuracy
It is a common assumption that the uncertainties of the GPS data can approximately
be modeled by a bias and small Gaussian noise. The magnitude of this bias depends
on many parameters, for example, the constellation of visible satellites, but can
usually be approximated as locally constant. Determining and correcting locally the
bias of the GPS will improve the accuracy of the VSLAM and GPS fusion.


Recent studies propose to correct the bias of standard GPS by using geo-referenced
information provided by a GIS. In Kichun et al. (2013), road markings are used: white
lines delimiting the road are detected in the images and matched with the numerical
model of the road to estimate the lateral bias of the GPS. Crosswalks are also exploited
with the same principle to calculate the bias in the direction of the vehicle displacement. However, white lines are not present on all roads and crosswalks are not frequent
enough to estimate regularly the GPS bias. In addition, these road markings can be regularly occluded by other cars. Finally, this type of information (white lines and crosswalks) is currently not common in GIS, which makes this solution difficult to deploy.
To overcome these problems, we propose to estimate the bias of the GPS from the
reconstruction obtained by the fusion of VSLAM and GPS data, by exploiting the buildings
that are widely visible in urban areas and for which 3D models are widely available in
GIS. The bias of the GPS is not directly observable, unlike the VSLAM reconstruction
error generated by this bias, which results in a misalignment between the reconstructed 3D
point cloud and the buildings models. Our solution is based on the following hypothesis:
the error locally affecting the VSLAM reconstruction after fusion with GPS data
(i.e., the n last camera positions and the 3D points they observe) corresponds to a rigid
transformation in the ground plane, that is, with 3DoF (position and yaw angle). This
assumption appears to be a sufficient approximation for a coarse correction.
We add a correction module to the constrained VSLAM process that estimates the
GPS bias at each keyframe, as shown in Figure 17.6. This module takes as input
the buildings model and the VSLAM reconstruction, which is not perfectly aligned
with the buildings model due to the GPS bias. The first step of this module is to
identify which points of the 3D point cloud come from building facades and which
come from the rest of the environment, such as parked cars or trees. Once this segmentation is
performed, the 3DoF rigid transformation is estimated by minimizing the distances
between the 3D points associated with the buildings model and their associated
facades. This transformation is then locally applied to the GPS data associated with the
n last camera positions to correct their bias.
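As a rough illustration of this correction step, the sketch below estimates a 3DoF (x, y, yaw) ground-plane transform that re-aligns the facade-associated points with their facade planes. It uses a generic optimizer (scipy) rather than the authors' estimator, and all names are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

def estimate_ground_plane_correction(points_w, facade_transforms):
    """Estimate (tx, ty, yaw) minimizing the facade distances of the given points.
    Illustrative stand-in for the bias correction module described in the text."""

    def apply(params, pts):
        tx, ty, yaw = params
        c, s = np.cos(yaw), np.sin(yaw)
        R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
        return pts @ R.T + np.array([tx, ty, 0.0])

    def cost(params):
        moved = apply(params, points_w)
        err = 0.0
        for q, T in zip(moved, facade_transforms):
            err += (T @ np.append(q, 1.0))[2] ** 2   # squared facade-plane distance
        return err

    res = minimize(cost, x0=np.zeros(3), method="Nelder-Mead")
    return res.x  # (tx, ty, yaw) correction to apply to the recent GPS positions
```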

[Figure 17.6 is a block diagram: the buildings model, the DEM, and the GPS data are combined with VSLAM through a constrained bundle adjustment and a local GPS correction module to produce a geo-referenced VSLAM reconstruction.]

FIGURE 17.6 Overview of our proposed solution for accurate localization in urban environments. It combines VSLAM with GPS data and GIS models (buildings and DEM).


17.5.2.3 Improving the Out-Plane Accuracy with a GIS Constraint


To improve the accuracy of the out-plane DoF, we propose to constrain the VSLAM
trajectory with the DEM layer of a GIS. The DEM is a 3D geometric model that
provides a simple representation of the altitude variations of the road network.
In addition to its availability in both urban and rural areas, this model has
the advantage of informing us about the out-plane degrees of freedom of the camera.
In fact, since the camera is rigidly embedded in the vehicle, its altitude, roll, and
pitch angles can be considered constant relative to the road. In the following, since
we use a very simple DEM that does not provide the road inclination, we use only an
altitude constraint and exploit the fact that the camera height h relative to the ground
is constant (i.e., we assume that the height variations caused by shocks are
negligible). This implies that the trajectory of the camera must belong to a surface
parallel to the road and located at a distance h from the ground. It is thus necessary to
identify the road of the DEM on which each camera pose is located. This association is
based on a proximity criterion: a camera pose is associated with the nearest road k_j
in the DEM in terms of orthogonal distance. In the constrained BA, the constraint
term associated with the DEM corresponds to the difference between the camera
altitude $(t_j^{k_j})_z$, expressed in the coordinate frame of its associated road plane, and the desired height h:
$$h_{DEM}(\cdot) = \sum_{j} \left( \left( t_j^{k_j} \right)_z - h \right)^2$$

where

$$\begin{pmatrix} t_j^{k_j} \\ 1 \end{pmatrix} = L_{k_j} \begin{pmatrix} t_j \\ 1 \end{pmatrix}$$

with $L_{k_j}$ the (4 × 4) transfer matrix from the world coordinate frame to the coordinate
frame of the road k_j associated with the jth camera. More details
on the DEM constraint are given in Larnaout et al. (2013a).
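A minimal sketch of how this altitude term could be evaluated is given below; the road transforms and the constant camera height h are assumed inputs, and the code is only illustrative, not the authors' implementation.

```python
import numpy as np

def dem_constraint(cam_positions_w, road_transforms, h):
    """Sum of squared differences between each camera altitude, expressed in the
    frame of its associated road segment, and the constant camera height h.
    road_transforms[j] is the (4, 4) matrix L_kj mapping world coordinates to the
    frame of road k_j, whose surface is the plane z = 0."""
    total = 0.0
    for t, L in zip(cam_positions_w, road_transforms):
        t_road = L @ np.append(t, 1.0)   # camera position in the road frame
        total += (t_road[2] - h) ** 2    # deviation from the expected height
    return total
```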
17.5.2.4 Overview of Our Complete Solution for Vehicle Localization
To summarize, the proposed solution is a VSLAM approach with a constrained BA
that includes buildings, DEM, and GPS constraints. The resulting cost function of
the constrained BA is given by

$$I(\cdot) = w\,\varepsilon_{t,R}(\cdot) + h_{GPS}(\cdot) + h_{DEM}(\cdot) + h_{bdg}(\cdot)$$

To overcome the convergence problem, the GPS data are coarsely corrected beforehand
through an additional module that acts as a differential GPS in which the geo-referenced
antennas are replaced by a buildings model. Figure 17.6 presents an overview of the
proposed solution.
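The way the individual terms combine can be sketched as follows; the callables are placeholders for the weighted reprojection error and the h_GPS, h_DEM, and h_bdg terms described above, not the actual implementation.

```python
def constrained_ba_cost(params, reprojection, constraints, w=1.0):
    """Evaluate I(params) = w * reprojection(params) + sum of the constraint terms.
    'reprojection' and each entry of 'constraints' are callables returning the
    corresponding error for the current camera poses and 3D points in 'params'."""
    return w * reprojection(params) + sum(h(params) for h in constraints)

# Toy usage with stand-in terms (real terms would be h_GPS, h_DEM, h_bdg):
if __name__ == "__main__":
    params = [0.0]
    cost = constrained_ba_cost(params,
                               reprojection=lambda p: 1.5,
                               constraints=[lambda p: 0.2, lambda p: 0.1, lambda p: 0.3])
    print(cost)  # 2.1
```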
We performed many experiments to evaluate our vehicle geolocalization solution.
For these experiments, sequences of several kilometers of roadway were acquired
in France. To obtain these sequences, the vehicle was equipped with a standard GPS
running at 1 Hz and an RGB camera providing 30 frames per second. The GIS models used
are provided by the French National Geographic Institute, with an uncertainty that
does not exceed 2 m. We present results for one of these sequences in Figure 17.7.


FIGURE 17.7 (a) The raw GPS data and (b) the GPS data coarsely corrected with the approach
described in Section 17.5.2.2.2. (c) Results obtained by fusing VSLAM and the raw GPS data
and (d) results obtained with the proposed solution (the triangles are the camera positions for
each keyframe and the 3D point cloud is the sparse reconstruction of the observed environment).

The localization accuracy is clearly improved with the proposed solution; this is
highlighted by the good alignment between the resulting 3D point cloud and the buildings
model, whereas with the fusion of VSLAM and the raw GPS data, gaps
can be observed between the resulting 3D points and the buildings model. Using a
complete GIS model (buildings and DEM) to improve the fusion of VSLAM and
GPS data yields an accurate 6DoF localization that fulfills the quality criteria required
by aided-navigation applications, as illustrated in Figure 17.2.

17.5.3 Discussion
Using GIS models can improve the localization accuracy resulting from the fusion of
VSLAM with GPS data. This fulfills both the quality and the ease-of-deployment requirements
of navigation-aided applications, since GIS is now widely available and some
GIS databases are free (e.g., OpenStreetMap 3D). Furthermore, since our approach
relies on VSLAM, it provides a continuity of service that solutions based
on visual feature databases cannot guarantee. On the other hand, the localization
quality provided by our solution is still less accurate than approaches based on a
database. However, the accuracy provided by our solution remains sufficient for most


navigation-aided applications. The localization accuracy reached with the proposed
solution could be further improved by using other low-cost sensors such as an odometer
or an inertial sensor. Indeed, the information on the vehicle trajectory provided by these
sensors can easily be introduced into our generic constrained VSLAM framework by
adding a new constraint term to the BA.
It is also possible to exploit the reconstruction provided by the constrained
VSLAM as an initial database of 3D visual features that can be transferred to a
server and then refined off-line to improve its accuracy (Larnaout et al. 2013b).
Thus, subsequent users driving through this area would be able to locate themselves
via model-based solutions. Our approach would be used to localize the vehicle and
simultaneously update and extend the database only when a user enters an area that
is not mapped in the database or that is mapped under incompatible lighting conditions.
Our approach thus results in a collaborative solution to the database creation issue while
maintaining continuity of service in areas devoid of a database.

17.6 CONCLUSION
Applications of AR in the automotive industry are numerous. Many AR techniques
for the automobile industry have been studied, but few have actually been deployed. Even though
recent advances in tracking technologies, such as constrained VSLAM, allow
engineers to remove the main technological barriers, some challenges still remain.
The first challenge relates to the ergonomics of the solution. For example, most of
the applications require a hands-free device. While solutions are already available,
such as spatial AR or semitransparent glasses, their ergonomics (e.g., width of the field
of view, dynamic focus distance, luminosity, and contrast) should be improved to
reach high end-user acceptance. But ergonomic issues are not limited to hardware.
The displayed information should also be designed to facilitate the work of the end
user without disturbing him or her or introducing potential dangers. For example, the
iconography used to display information on a windshield must be designed to reduce
the risk of hiding pedestrians or vehicles from the driver's view.
The second challenge concerns the integration of AR in the product life-cycle management
process. For example, one goal is to lower the cost of content creation for an AR
application and to facilitate its updates; therefore, product data management (PDM)
should also integrate the needs of AR applications. Further, the development of 3D
documentation could benefit from the 3D models and animations that were created
during the design stage of the vehicle. Consequently, the development of new
norms and standards concerning 3D models, animations, interactions, and documentation
will probably be necessary. In summary, this chapter presented an overview of
the use of AR in the automotive industry; we expect more integration of AR in the
product life cycle of automobiles in the future.

ACKNOWLEDGMENTS
We would like to thank Diotasoft for supporting our research with AR applications
for sales support in dealerships and for vehicle maintenance. We also


thank Valeo and Renault for their financial support that allowed the development of
AR-based aided-navigation solutions.
We are grateful to our colleagues at the Vision and Content Engineering Laboratory
and to Michel Dhome from the Institut Pascal for many useful comments and insights
that helped us to develop and refine this work.

REFERENCES
ARTESAS Project. Advanced augmented reality technologies for industrial service and applications. http://www.wzl.rwth-aachen.de/en/aa272c5cc77694f6c12570fb00676ba1.htm. 2004–2006.
ARVIKA Project. Augmented reality for development, production and servicing. http://cg.cis.upenn.edu/hms/research/AVIS/papers/flyer_e.pdf. 1999–2003.
Barnum, P. C., Y. Sheikh, A. Datta, and T. Kanade. Dynamic seethroughs: Synthesizing hidden views of moving objects. International Symposium on Mixed and Augmented Reality. Orlando, FL, 2009.
Bleser, G., H. Wuest, and D. Stricker. Online camera pose estimation in partially known and dynamic scenes. International Symposium on Mixed and Augmented Reality. Santa Barbara, CA, 2006.
Calvet, L., P. Gurdjos, and V. Charvillat. Camera tracking using concentric circle markers: Paradigms and algorithms. International Conference on Image Processing. Orlando, FL, 2012.
Cummins, M. and P. Newman. Appearance-only SLAM at large scale with FAB-MAP 2.0. The International Journal of Robotics Research. 30(9), 2011: 1100–1123.
Drummond, T. and R. Cipolla. Real-time visual tracking of complex structures. IEEE Transactions on Pattern Analysis and Machine Intelligence. 24, 2002: 932–946.
EFA 2014 Project. n.d. http://www.strategiekreis-elektromobilitaet.de/public/projekte/eva2014/other/research-project-energy-efficient-driving-2014.
Friedrich, W. ARVIKA: Augmented reality for development, production and service. International Symposium on Mixed and Augmented Reality. Darmstadt, Germany, 2002.
Gay-Bellile, V., S. Bourgeois, M. Tamaazousti, and S. Naudet-Collette. A mobile markerless augmented reality system for the automotive field. Workshop on Tracking Methods and Applications. Atlanta, GA, 2012.
Gomes, P., M. Ferreira, M. K.-S. Kruger-Silveria, and F. V. Vieira. Augmented reality driving supported by vehicular ad hoc networking. International Symposium on Mixed and Augmented Reality. Adelaide, SA, 2013.
Gordon, I. and D. G. Lowe. What and where: 3D object recognition with accurate pose. Toward Category-Level Object Recognition. 4170, 2006: 67–82.
Hartley, R. and A. Zisserman. Multiple View Geometry in Computer Vision, 2nd edn. Cambridge University Press, Cambridge, U.K., 2004.
Kahn, S. and A. Kuijper. Fusing real-time depth imaging with high precision pose estimation by a measurement arm. International Conference on Cyberworlds. Darmstadt, Germany, 2012, pp. 256–260.
Kichun, J., C. Keounyup, and S. Myoungho. GPS-bias correction for precise localization of autonomous vehicles. In Intelligent Vehicles Symposium, Gold Coast, Australia, 2013.
Klein, G. and D. Murray. Parallel tracking and mapping for small AR workspaces. International Symposium on Mixed and Augmented Reality. Nara, Japan, 2007.
Larnaout, D., V. Gay-Bellile, S. Bourgeois, and M. Dhome. Vehicle 6-DoF localization based on SLAM constrained by GPS and digital elevation model information. International Conference on Image Processing. Melbourne, Victoria, Australia, 2013a, pp. 2504–2508.


Larnaout, D., V. Gay-Bellile, S. Bourgeois, and B. Labbe. Fast and automatic city-scale environment modeling for an accurate 6DOF vehicle localization. International Symposium on Mixed and Augmented Reality. Adelaide, South Australia, Australia, 2013b.
Lhuillier, M. Incremental fusion of structure-from-motion and GPS using constrained bundle adjustment. IEEE Transactions on Pattern Analysis and Machine Intelligence. 34, 2012: 2489–2495.
Lothe, P., S. Bourgeois, F. Dekeyser, and M. Dhome. Towards geographical referencing of monocular SLAM reconstruction using 3D city models: Application to real-time accurate vision-based localization. IEEE Conference on Computer Vision and Pattern Recognition. Miami, FL, 2009, pp. 2882–2889.
Menk, C., E. Jundt, and R. Koch. Evaluation of geometric registration methods for using spatial augmented reality in the automotive industry. Vision, Modeling, and Visualization Workshop. Siegen, Germany, 2010, pp. 243–250.
Mouragnon, E., M. Lhuillier, M. Dhome, F. Dekeyser, and P. Sayd. 3D reconstruction of complex structures with bundle adjustment: An incremental approach. International Conference on Robotics and Automation. Orlando, FL, 2006.
Park, H.-S., H.-W. Choi, and J.-W. Park. Augmented reality based cockpit module assembly system. International Conference on Smart Manufacturing Application. Gyeonggi-do, South Korea, 2008.
Pentenrieder, K., C. Bade, F. Doil, and P. Meier. Augmented reality-based factory planning: An application tailored to industrial needs. International Symposium on Mixed and Augmented Reality. Nara, Japan, 2007.
Pintaric, T. and H. Kaufmann. Affordable infrared-optical pose tracking for virtual and augmented reality. Workshop on Trends and Issues in Tracking for Virtual Environments. Charlotte, NC, 2007.
Porter, S. R., M. R. Marner, R. T. Smith, J. E. Zucco, and B. H. Thomas. Validating spatial augmented reality for interactive rapid prototyping. IEEE International Symposium on Mixed and Augmented Reality. Seoul, South Korea, 2010, pp. 265–266.
Regenbrecht, H., G. Baratoff, and W. Wilke. Augmented reality projects in the automotive and aerospace industries. Computer Graphics and Applications. 25(6), 2005: 48–56.
Reif, R. and W. A. Günthner. Pick-by-vision: Augmented reality supported order picking. The Visual Computer. 25(5–7), 2009: 461–467.
Reiners, D., D. Stricker, G. Klinker, and S. Müller. Augmented reality for construction tasks: Doorlock assembly. International Workshop on Augmented Reality. Natick, MA, 1998.
Reitmayr, G., E. Eade, and T. Drummond. Semi-automatic annotations in unknown environments. International Symposium on Mixed and Augmented Reality. Nara, Japan, 2007, pp. 67–70.
Rothganger, F., S. Lazebnik, C. Schmid, and J. Ponce. 3D object modeling and recognition using affine-invariant patches and multi-view spatial constraints. Conference on Computer Vision and Pattern Recognition. Madison, WI, 2003, pp. 272–280.
Schleicher, D., L. Bergasa, M. Ocana, R. Barea, and E. Lopez. Real-time hierarchical GPS aided visual SLAM on urban environments. International Conference on Robotics and Automation. Kobe, Japan, 2009, pp. 4381–4386.
Tamaazousti, M., V. Gay-Bellile, S. Naudet-Collette, S. Bourgeois, and M. Dhome. Nonlinear refinement of structure from motion reconstruction by taking advantage of a partial knowledge of the environment. Computer Vision and Pattern Recognition. Providence, RI, 2011, pp. 3073–3080.
Wuest, H., F. Wientapper, and D. Stricker. Adaptable model-based tracking using analysis-by-synthesis techniques. In Computer Analysis of Images and Patterns. W. G. Kropatsch, M. Kampel, and A. Hanbury (eds.), Springer, Berlin, Germany, 2007, pp. 20–27.
Zhou, J., I. Lee, B. Thomas, R. Menassa, A. Farrant, and A. Sansome. In-situ support for automotive manufacturing using spatial augmented reality. International Journal of Virtual Reality. 11(1), 2012: 33–41.

18 Visual Consistency in Augmented Reality Compositing

Jan Fischer

CONTENTS
18.1 Introduction................................................................................................... 458
18.1.1 Image Compositing in Video See-Through Augmented Reality....... 458
18.1.2 Image Inconsistencies: Camera Imperfections and Artifacts............ 458
18.1.3 Specific Challenges of Real–Virtual Compositing............................ 459
18.1.4 Summary of Effects Causing Visual Discrepancies.......................... 461
18.2 Creating Visual Consistency: Two Complementary Approaches to
Unified Visual Realism in AR....................................................................... 461
18.2.1 Emulation of Camera Realism........................................................... 462
18.2.2 Artistic or Illustrative Stylization...................................................... 463
18.2.3 Summary of the Two Complementary Strategies.............................. 463
18.3 Related Work and Specific Challenges in AR...............................................464
18.4 Emulating Photographic Imperfections in AR..............................................465
18.4.1 Camera Image Noise.........................................................................466
18.4.2 Motion Blur........................................................................................468
18.4.3 Defocus Blur (Depth of Field)...........................................................468
18.5 Specific Challenges of Real–Virtual Compositing........................................469
18.5.1 Aliasing at the Border of Virtual Objects..........................................469
18.5.1.1 Pixel Averaging in the Original Resolution........................ 470
18.5.1.2 Real–Virtual Antialiasing Using Supersampling............... 470
18.5.2 Occlusion Handling........................................................................... 473
18.5.2.1 Occlusion Handling Using a Time-of-Flight Range
Sensor.................................................................................. 473
18.6 Stylized AR................................................................................................... 474
18.6.1 Real-Time Cartoon-Like Stylization on the GPU............................. 475
18.6.2 Other Stylization Approaches............................................................ 476
18.6.3 Applications of Stylized AR.............................................................. 476
18.6.4 Psychophysical Evaluation of Stylized AR........................................ 478
18.7 Conclusion..................................................................................................... 478
References............................................................................................................... 479


18.1 INTRODUCTION
18.1.1 Image Compositing in Video See-Through Augmented Reality
In augmented reality (AR), virtual graphical elements generated by a computer are
overlaid over a view of the real world. The types of additional graphical elements are
as varied as the spectrum of AR applications. In a pedestrian or bicycle navigation
AR application, the overlay may consist of abstract 3D arrows and some text; in a
medical AR application, organ structures from a CT scan may be shown; and in an
AR game, an entire parallel game universe may coexist within the real surroundings
of the player.
In the case of video see-through AR, which is the focus of this chapter, this visual
integration of virtual objects into the users environment is accomplished with the
help of a digital video camera (Azuma et al., 2001). This digital video camera may
for instance be a camera built into a smart phone, tablet, or laptop, or it may be contained in a wearable device, such as a head-mounted display. This camera generally
points in a direction roughly corresponding to the users viewing direction, and it
continually captures images of the real scene that is in front of the user. The captured
video stream is processed by a computer, which uses each captured video frame as
a background image over which it renders the additional graphical elements. Finally,
the combination of the captured camera image with the superimposed virtual objects
is shown to the user on a screen, which is often built into the same device that also
contains the video camera. This process is continually repeated at real-time frame
rates, generating the impression that the user can "see through" the screen to observe
the augmented environment.
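A bare-bones outline of this per-frame loop might look as follows; the capture, tracking, rendering, and display calls are placeholders rather than the API of any particular AR framework.

```python
import numpy as np

def video_see_through_loop(capture_frame, estimate_pose, render_virtual, display):
    """Minimal outline of a video see-through AR loop.  The four callables are
    placeholders: capture_frame() returns an RGB image, estimate_pose(frame)
    returns the camera pose, render_virtual(pose, shape) returns an RGBA overlay,
    and display(image) shows the composited result."""
    while True:
        frame = capture_frame()                      # background camera image
        pose = estimate_pose(frame)                  # camera tracking
        overlay = render_virtual(pose, frame.shape)  # virtual content (RGBA)
        alpha = overlay[..., 3:4] / 255.0
        composite = (1.0 - alpha) * frame + alpha * overlay[..., :3]
        display(composite.astype(np.uint8))          # shown at real-time rates
```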

18.1.2 Image Inconsistencies: Camera Imperfections and Artifacts


Naturally, the image portions captured with the camera and the computer-generated
portions in the augmented video stream have very different properties. The image
recorded by the camera shows the real world with its diverse range of optical effects;
for instance, complex illumination phenomena such as lighting and shadows from
several light sources, specular (mirror-like) reflections, and mutual diffuse (scattered) reflections among surfaces in the environment. (The theory of global illumination is discussed in Dutré et al. (2006).) Moreover, varied and complicated objects
may be observed in the real scene, consisting of intricate shapes and covered with
complex color patterns.
This rich natural scene is captured by the digital camera. Numerous distortions
and artifacts are introduced by the image acquisition pipeline of the camera. These
include initial nonlinear geometric transformations caused by the geometry of the
camera lens and imperfections due to technical limitations of the image sensor.
A further loss of image quality is caused by processing stages in the latter part of
the pipeline, which often involve color space transformations and image compression
steps. Of course, camera parameters such as exposure time, ISO sensitivity, and
white balance (which may be manually or automatically adjustable) also have a
strong impact on the captured image. Camera image distortions and imperfections
are often quite marked in the images captured by AR systems: Since AR applications


often run on small and mobile devices such as smart phones, tablets, and wearable
displays, the camera hardware used is typically small and usually also rather inexpensive. A limited image quality generally has to be accepted due to the restrictions of the components of these small cameras in comparison with larger and more
expensive video equipment. Klein and Murray (2010) provide an exhaustive overview of the artifacts introduced in the image acquisition pipeline of a small, low-cost
digital video camera.
The computer-generated virtual objects in an AR scene, on the other hand, are
artificially generated. They are normally synthesized using the standard real-time rendering algorithms common in computer graphics. The fast reaction times,
and therefore short latencies, which are required in an interactive AR system, mean
that only relatively simple rendering approaches can be used, in particular in view of
the still rather limited performance of many mobile devices. The models shown as
virtual objects are often rather plain and may in some applications even be generated
procedurally (i.e., during runtime by an algorithm). Illumination parameters such as
the number and position of light sources typically do not match the conditions in the
real environment, which are moreover normally variable. Of course, the computer-rendered image portions also lack the imperfections contained in the camera image,
such as image noise and motion blur. Instead, the virtual objects are drawn perfectly
within the capabilities of their rendering algorithms. An introduction into the type of
real-time rendering techniques often utilized in AR is given by Munshi etal. (2008).
As a consequence, the augmented environment presented to the user actually
combines image portions with two entirely different kinds of visual realism: On the
one hand, the visually rich real scene with its complex illumination seen through
an imperfect and distorting imaging pipeline. On the other hand, the synthetically
perfect virtual elements generated from simple models and artificial, static lighting parameters. Ultimately, rather than displaying an integrated augmented environment to the user, the output of an AR system often reveals quite clearly what it
really is: simple computer graphics pasted over a camera image of moderate quality
(see Figure 18.1). It is thus not surprising that a study has shown that people can tell
very reliably whether a particular object is real or virtual in conventional AR (see
Section 18.6.4).

18.1.3 Specific Challenges of Real–Virtual Compositing


In AR systems, a combined scene is generated, which consists of real and virtual
elements. In addition to the different levels of visual realism depicted in these two
types of image portions, the different types of information available for them pose
another significant difficulty. It is a central problem that no depth information generally exists for the pixels of the camera image. While the simulated depth can easily
be calculated for the computer graphics objects, the AR system does not readily
know whether any given object in the camera image is close or far away.
For this reason, in a naïve AR implementation, the virtual elements are always
simply painted over the camera image. This simplistic approach leads to visual
inconsistencies that are immediately apparent: Real objects that are closer to the
user remain hidden behind virtual elements that are supposedly farther away, an


FIGURE 18.1 Comparison of the visual realism of real and computer-generated image elements in conventional augmented reality: Which one is real?

effect that is even more noticeable when the respective objects are in motion along
the depth axis. A particularly common instance of this problem concerns the hands
of the user. They are by nature close to the user's viewpoint, and they are often
in view, particularly in applications requiring user interaction. In the naïve AR
approach, virtual objects always occlude the user's hands, although they are often
supposedly farther away.
It is worth noting that within each of the real and the virtual realms, mutual occlusions are depicted correctly: In the real image portions, objects naturally occlude
each other, while the mutual occlusion of virtual objects is handled by standard
computer graphics algorithms such as the z-buffer (Catmull, 1974; Straßer, 1974).
Only when real and virtual objects are visually integrated in the same image does the
lack of depth information in the camera image become apparent. This occlusion
problem is a long-standing challenge of AR, of which Wloka and Anderson (1995)
and Fuhrmann et al. (1999) provided early discussions.
For the same reason, global illumination effects caused by the interaction between
real and virtual elements in the augmented environment generally cannot be
rendered correctly. In particular, it is normally impossible to represent reflections
and shadowing properly, since a simulation of these effects also requires knowledge
about the 3D structure of the observed real scene. For example, a virtual sphere hovering over a real desk might be expected to cast a shadow on the surface of the desk,
depending on the position of light sources in the scene. However, correctly rendering
the shadow requires knowledge about the exact spatial position and orientation of
the desk, which cannot be easily extracted from a 2D camera image. Similarly, the
reflections in a virtual mirror (or other reflective object) positioned in the augmented
scene can only be displayed properly if the spatial structure of the real environment
is known. This complex problem of rendering reflections and shadows in AR was
discussed, for instance, by Agusanto et al. (2003).


Another visual inconsistency arising from the real–virtual compositing process


is aliasing at the edges of virtual objects. Aliasing is an artifact that is caused by
the way in which computer graphics algorithms render graphical primitives. The
geometrical shapes that constitute a 3D model are displayed on a screen consisting of rectangular pixels. Using straightforward rendering methods, this often leads
to jagged, staircase-like edges at the border of graphical primitives, which can be
aesthetically displeasing, in particular if the contrast between neighboring objects is
large. Aliasing can be handled quite effectively using antialiasing methods within
purely computer-generated scenes. These methods often rasterize graphical primitives locally at a higher resolution (supersampling) and then average the resulting
subpixels for generating a smooth edge in the resulting output image, as described
for instance in Schilling (1991). This is obviously not easily possible in AR, where
the resolution of the real-world image is limited to what is provided by the digital
video camera. Sharp and jagged edges therefore frequently appear at the transition
between virtual and real image areas in AR.

18.1.4 Summary of Effects Causing Visual Discrepancies


The preceding sections have given an introduction into the effects that cause discrepancies between the visual appearance of real and virtual image portions in video
see-through AR. Table 18.1 contains a summary of these effects, which are classified
into the photographic (camera) imperfections outlined in Section 18.1.2, as well as
the specific compositing challenges and global illumination problems described in
Section 18.1.3.
Some aspects of visual consistency in video see-through AR have been rather
thoroughly investigated since the field emerged in its modern form in the 1990s.
These include the handling of lens distortion (Bajura and Neumann, 1995; Tuceryan
et al., 1995) and attempts at reconstructing the global illumination conditions of the
real scene (Drettakis et al., 1997; Agusanto et al., 2003; Grosch et al., 2007; Aittala,
2010; de Sorbier and Saito, 2014).
This chapter will mostly focus on the other, nonconventional aspects of visual
inconsistency in AR, as listed in Table 18.1 (see the table entries marked with *).

18.2 CREATING VISUAL CONSISTENCY: TWO COMPLEMENTARY APPROACHES TO UNIFIED VISUAL REALISM IN AR
The various image imperfections and compositing problems discussed earlier lead to
an AR environment in which real and virtual elements appear visually inconsistent,
often markedly. The challenge addressed in this chapter is therefore the following:
How to generate an AR video stream in which the real and virtual parts of the scene
are reproduced in a visually consistent manner? In other words, how can an equalized level of visual realism be achieved in real-time AR, rather than presenting the
user with the incongruous styles of an abstract, synthetically perfect computer rendering pasted over a noisy, but visually rich camera image?
Two complementary approaches yielding fundamentally different types of visual
realism have been proposed to address this issue.


TABLE 18.1
Reasons for Visual Discrepancies between Real and Virtual Image Portions in AR

Photographic imperfections
Camera noise* (and other sensor artifacts): Noise contained in the camera image, as well as further artifacts caused by the acquisition pipeline of small/inexpensive digital video cameras. For more information, see Section 18.4.1.
Motion blur*: Blurring of objects that are rapidly moving during the recording of a single video frame. For more information, see Section 18.4.2.
Defocus blur (depth of field)*: Limited depth range in a captured camera image in which recorded objects appear acceptably sharp. For more information, see Section 18.4.3.
Lens distortion: Nonlinear distortions in a captured image caused by the geometry of the camera optics. For more information, see Bajura and Neumann (1995) and Tuceryan et al. (1995).

Specific challenges of real–virtual compositing
Aliasing*: Jagged edges at the border between real and virtual image elements. For more information, see Section 18.5.1.
Occlusion handling*: Incorrect representation of mutual occlusions between real and virtual objects due to lack of depth information in the real scene. For more information, see Section 18.5.2.

Global illumination challenges
Mismatched lighting: The real-world illumination conditions are unknown, and the synthetic lighting of the computer-generated elements is therefore inconsistent. For more information, see Drettakis et al. (1997), Agusanto et al. (2003), Grosch et al. (2007), Aittala (2010), and de Sorbier and Saito (2014).
Shadows and reflections: Shadows and reflections between real and virtual objects cannot be correctly rendered. For more information, see Agusanto et al. (2003), Sugano et al. (2003), and de Sorbier and Saito (2014).

The entries marked with an asterisk (*) are discussed in more detail in this chapter.

18.2.1 Emulation of Camera Realism


The first possible answer to the question formulated earlier is the attempt to emulate
the look of the camera image as closely as possible when rendering the computer
graphics elements. This basic idea of striving for a kind of camera realism in AR
has been investigated since the 1990s, when the field emerged in its modern form.
In particular, the analysis of the illumination conditions in the real scene and their
consideration in the real–virtual compositing process has long been considered a
primary objective, as discussed earlier (see for instance Drettakis et al., 1997).
More recently, AR research has also begun to address the aforementioned photographic imperfections contained in the camera images of video see-through AR systems. By trying to mimic the shortcomings of the image acquisition pipeline when


rendering the virtual objects (Fischer et al., 2006a; Klein and Murray, 2010), the
visual realism of the camera image can be better emulated, even if this means that
the output image is made worse in a certain sense.
Methods for the emulation of camera realism in AR will be discussed in Section 18.4.

18.2.2 Artistic or Illustrative Stylization


A complementary approach for solving the problem of visual inconsistencies in AR,
which aims at achieving an entirely different kind of visual style, was originally proposed by the author (Fischer et al., 2005c). In stylized AR, the entire AR video image
is filtered to produce an output mimicking an artistic or illustrative visual style.
Stylized AR systems present to the user an environment that resembles an animated
pencil drawing, watercolor painting, technical illustration, or many other styles, for
example. Such artistic and illustrative stylizations have the property that they simplify the underlying images, often significantly. These simplifications, which are
applied equally to the real and virtual elements in the augmented image, can effectively conceal the effects that cause visual inconsistencies in conventional AR.
Many stylization filters aim at generating largely homogenous image regions, for
example, using nonlinear image smoothing in a cartoon-like stylization or the hatchings that are often employed in technical illustration styles. Such filters therefore
inherently remove photographic imperfections such as image noise and blur. They
also mask incongruous physical illumination effects and simplify even the underlying richness of shapes and textures in the real scene itself.
Other artistic stylization filters actually introduce their own visual components
into the output image, such as paint brush strokes and dots, pencil strokes, and paper
or canvas textures. Moreover, artistic stylization often also entails a transformation
of image colors, for instance into a predefined artistic color palette, or even a reduction to gray or black-and-white scales. Since these artistic visual style elements are
applied equally to the real and virtual image portions, the visual inconsistencies
between them are largely concealed.
Obviously, the loss of image detail and overall change of visual style mean that
stylized AR cannot be employed in every application area. However, its use has been
demonstrated in a range of applications (e.g., see Zöllner et al., 2008; Dai, 2011), as
further discussed in Section 18.6.3. In some fields such as entertainment and culture,
the presentation of the augmented environment in an alternative visual style may
actually be interesting by and of itself.
Section 18.6 discusses methods for generating stylized augmented environments,
as well as example applications of stylized AR and the experimental evaluation of
the approach.

18.2.3 Summary of the Two Complementary Strategies


In summary, two fundamentally different strategies can be employed for handling
visual inconsistencies in video see-through AR. Figure 18.2 illustrates how the complementary approaches of a refined camera realism and stylized AR are derived
from straightforward conventional AR compositing.



[Figure 18.2 is a diagram relating the two approaches: starting from conventional AR (low visual consistency), simulating photographic imperfections and camera-adaptive compositing lead toward camera realism, while applying stylization filters to the entire image leads toward stylized AR in an artistic/illustrative style; both directions increase visual consistency.]

FIGURE 18.2 Complementary approaches for dealing with visual inconsistencies in video see-through augmented reality.

18.3 RELATED WORK AND SPECIFIC CHALLENGES IN AR


Of course, the challenge of combining real and virtual image portions in a visually
consistent manner is not unique to AR. Similar tasks have to be performed in many
applications dealing with the processing of images or videos. These include, for
instance, the professional editing of photos and, in particular, visual effects (VFX)
in movie production.
Therefore, the adaptation of the illumination of synthetically generated objects to
the conditions in a real environment has been investigated thoroughly, for instance,
by Fournier et al. (1993) and Debevec (2008).
Corresponding to the problem of the image noise of small digital cameras in AR,
the integration of computer graphics into analog photo and film material must deal
with film grain. Film grain is the mostly random optical texture of processed film
caused by its microscopic chemical structure. Solutions for handling film grain when
integrating computer graphics elements into analog film images were presented, for
example, by Cooper et al. (2004) and Grubba Software (2014).


Likewise, the artistic stylization of video sequences has been an area of ongoing
research and development for many years both in academia (e.g., see Hertzmann and
Perlin, 2000; Kyprianidis et al., 2013) and the video-processing software industry
(Boris, 2014).
These related developments from the fields of general image and video processing
can provide an important technological basis for achieving corresponding effects in
video see-through AR. However, it must be noted that a set of unique constraints has
to be fulfilled when addressing these problems in AR:
Real-time: Any video-processing filter and specialized rendering method
applied in AR must be able to deliver real-time frame rates to ensure
interactivity.
Competition for system resources: Typical AR systems must additionally
perform a number of computationally expensive tasks, such as image-based
pose estimation or sensor fusion, further reducing the processing resources
available to any video postprocessing step.
Limited hardware capabilities: AR systems are often based on small and
portable hardware platforms such as smart phones, tablets, or wearable
devices, which usually offer significantly less computational power than a
graphical workstation, a desktop computer or even a laptop.
Fully automatic processing: Unlike in many other application areas, the
video-processing steps applied to an AR video stream must generally be
fully automatic. The user cannot make manual adjustments to filter parameters or interactively select different filters depending on circumstances
while the AR system is in operation.
This unique set of constraints means that some image and video-processing methods
developed for other applications may not be suitable for AR at all, while others have
to be modified in order to be useful in this specific context.

18.4 EMULATING PHOTOGRAPHIC IMPERFECTIONS IN AR


As mentioned earlier, the emulation of photographic imperfections in the camera
image in order to achieve a better visual integration in AR is a relatively recent
research area. The basic approach is to adapt the visual representation of virtual
objects in the AR scene to the effects found in the camera image. This can be accomplished by altering the actual rendering process of the virtual objects or by postprocessing their rendered images using a specialized image filter. As is generally the
case in AR applications, it is a particular challenge that this mimicking of camera
image imperfections must be performed with a minimal impact on the overall processing load of the system in order to ensure real-time interactivity. Moreover, the
adapted rendering and postprocessing steps must run fully automatically, possibly
except for an initial one-time calibration procedure.


18.4.1 Camera Image Noise


The video frames delivered by a digital camera contain a certain amount of image
noise. Even when capturing an entirely static scene, the acquired color values for a
given pixel vary slightly from one frame to the next. Although they may not be large,
the dynamic nature of these variations leads to a distinct difference in appearance
between the camera image and the virtual objects, which do not contain any such
noise. Moreover, compared to the more sophisticated cameras used in other application areas, the small and inexpensive image sensors typically employed in AR applications often deliver noisier video streams.
A first approach for the real-time emulation of camera image noise in AR was
proposed in Fischer et al. (2006a). The model presented therein assumes that a given
digital camera delivers a distinctive type of image noise. In the literature, precise
theoretical descriptions of image sensor noise have been discussed (see Withagen
et al., 2005). However, such an exact noise model can only be established using
elaborate measurement procedures under controlled conditions. A simplified noise
model was therefore proposed, which can be estimated based on an uncomplicated
one-time calibration procedure.
In this simplified noise model, the variation of intensity in each color channel is
assumed to be normally distributed. Each color channel (R, G, B) is treated separately, which leads to three different sets of normal distribution parameters (i.e., mean
values and standard deviations). As a refinement of the basic noise model, different
normal distributions depending on the overall pixel intensity are determined, since
some effects contributing to image noise are proportional to the amount of incoming light (e.g., so-called shot noise). This is modeled by estimating several sets of
normal distribution parameters. The average color channel intensity of each pixel
is quantized into one of N intensity bins, for each of which three separate normal
distributions (R, G, B) are estimated.
The one-time calibration step for camera image noise is based on the principle
of capturing arbitrary but static scenes with the camera over a duration of several
frames. An average image is computed for each of these reference scenes. It is then
possible to determine the noise in each pixel of each captured frame relative to its
associated average image. For each color channel and each intensity bin, the noise
distribution parameters are estimated.
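A simplified sketch of this calibration step is shown below, assuming the static-scene frames are available as a NumPy array; the bin count and layout are illustrative choices, not the exact procedure of Fischer et al. (2006a).

```python
import numpy as np

def calibrate_noise(frames, n_bins=4):
    """Estimate per-channel noise statistics from frames of a static scene: for
    each of n_bins intensity bins and each color channel, the deviation from the
    temporal average image is modeled as a normal distribution.

    frames : (F, H, W, 3) uint8 array of captured frames of one static scene.
    Returns (means, stds), each of shape (n_bins, 3).
    """
    frames = frames.astype(np.float32)
    avg = frames.mean(axis=0)                          # per-pixel average image
    noise = frames - avg                               # per-frame deviations
    intensity = avg.mean(axis=2)                       # average channel intensity
    bins = np.minimum((intensity / 256.0 * n_bins).astype(int), n_bins - 1)

    means = np.zeros((n_bins, 3), dtype=np.float32)
    stds = np.ones((n_bins, 3), dtype=np.float32)
    for b in range(n_bins):
        mask = bins == b                               # pixels falling into bin b
        if mask.any():
            samples = noise[:, mask]                   # (F, n_pixels_in_bin, 3)
            means[b] = samples.mean(axis=(0, 1))
            stds[b] = samples.std(axis=(0, 1))
    return means, stds
```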
The computed noise distribution parameters are used in an adapted rendering
pipeline that displays virtual objects with overlaid noise. Noise textures are generated during program initialization as a preparation step for the actual rendering
algorithm. The noise textures are generated as RGB textures, in which each texel
holds random channel variation values. These channel variations are generated such
that they have a distribution corresponding to the measured noise mean values and
standard deviations.
A special random number generator is employed, which delivers random values
with a normal distribution of a given mean and standard deviation, rather than the
uniform distributions delivered by conventional random number generators. For
each of the intensity bins defined in the calibration step, a separate noise texture is
generated.

FIGURE 18.3 Example of image noise emulation for a virtual object. (a) Virtual butterfly object (shown without background camera image). (b) Example of random noise texture. (c) Butterfly wing detail without image noise emulation. (d) Same image detail with overlaid noise emulation (contrast enhanced for better visibility).

The noise textures are used in an adapted AR rendering pipeline. When rendering virtual objects over the background camera image, the color values of their
pixels are modified based on the noise textures. Between frames, the noise textures
are translated and rotated randomly so that dynamically varying noise is emulated.
In order to better match the characteristics of image noise found in the camera
image, the scaling of the noise textures, the magnitude of the noise modulation,
and the speed of the random noise animation are adjustable with user-definable
parameters.
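The following sketch illustrates the idea of pre-generating such a noise texture and overlaying it on the virtual pixels with a random per-frame shift (standing in for the random translation and rotation of the textures); it is a CPU approximation for illustration only, whereas the actual method works inside the rendering pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_noise_texture(mean, std, size=256):
    """Pre-generate an RGB noise texture whose texels follow the per-channel normal
    distributions estimated during calibration (one texture per intensity bin)."""
    return rng.normal(mean, std, size=(size, size, 3)).astype(np.float32)

def add_noise_to_virtual(rendered, virtual_mask, noise_tex, strength=1.0):
    """Overlay animated noise on the virtual pixels only.  Between frames the
    texture is shifted by a random offset to emulate dynamically varying noise."""
    h, w = rendered.shape[:2]
    dy, dx = rng.integers(0, noise_tex.shape[0], size=2)
    shifted = np.roll(np.roll(noise_tex, dy, axis=0), dx, axis=1)
    reps = (int(np.ceil(h / shifted.shape[0])), int(np.ceil(w / shifted.shape[1])), 1)
    tiled = np.tile(shifted, reps)[:h, :w]             # tile the texture over the frame
    out = rendered.astype(np.float32)
    out[virtual_mask] += strength * tiled[virtual_mask]
    return np.clip(out, 0, 255).astype(np.uint8)
```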
Figure 18.3 shows an example of camera image noise emulation applied to a virtual object. The effect is more apparent in a video than in a still image, since the
emulated noise, like the camera noise itself, is dynamic and therefore continually
changes between frames.
A significantly more advanced emulation of a large number of photographic
imperfections caused by the specific properties of the image acquisition pipeline of
small cameras was developed by Klein and Murray (2010). Their system models and
simulates distortions, chromatic aberrations, blur, Bayer masking, noise, sharpening,


and color-space compression, while maintaining real-time frame rates. A similar


advanced image processing pipeline for image degradation using noise emulation
and a range of other effects in AR was also described by Aittala (2010).

18.4.2 Motion Blur
A particularly apparent type of blur in the camera image occurs when moving objects
are observed. Motion blur results from the temporal integration of light intensity in the
image sensor. If there is fast movement in the observed scene, a blurred camera image
is captured. This is in particular also true when the camera itself is moved. An adapted
display technique for virtual objects in AR that mimics motion blur was also presented
in Fischer et al. (2006a). In order to be able to simulate the effects of motion blur in
the camera image, the magnitude and direction of the blurring have to be known. The
motion blur vector can be approximated using the camera pose information, which is
already determined in the AR system for the correct placement of the virtual objects.
This motion blur rendering technique for AR computes the geometric center of
each virtual model in a preprocessing step. This is achieved by finding the bounding
box of the model and calculating its center. During the runtime of the AR system,
the blur vector is estimated in every frame. This approximated blur vector is defined
as the 2D motion of the center of the virtual object in image space. It is computed by
projecting the object center into image space according to the current camera pose.
The difference between the projected object centers in two consecutive frames is
used as the blur vector.
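A minimal sketch of this blur vector approximation is given below, assuming a pinhole camera model with intrinsic matrix K and world-to-camera poses; the function names are illustrative.

```python
import numpy as np

def project(K, pose, point_w):
    """Pinhole projection of a 3D world point into image space.
    K is the 3x3 intrinsic matrix, pose the (4, 4) world-to-camera transform."""
    p_cam = (pose @ np.append(point_w, 1.0))[:3]
    p_img = K @ p_cam
    return p_img[:2] / p_img[2]

def blur_vector(K, pose_prev, pose_curr, object_center_w):
    """Approximate the motion blur vector as the 2D displacement of the projected
    object center between two consecutive frames, as described above."""
    return (project(K, pose_curr, object_center_w)
            - project(K, pose_prev, object_center_w))
```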
Motion blur rendering is only activated if the length of the blur vector is greater
than a predefined threshold (typically between 5 and 10 pixels). Moreover, excessively long motion blur vectors resulting from an erratic pose estimation are ignored.
If motion blur rendering is to be applied in a time step, an adapted display method is
used. The simulated motion blur effect is created by repeatedly blending the virtual
object image over the camera image at different positions. These positions are generated along the computed blur vector, which is centered at the current projection of
the object center. Alpha blending is used during the rendering of the individual copies of the virtual object image. This way, the virtual object appears blurred from its
current position along the blur vector. Figure 18.4 shows an example of motion blur
simulation in AR using this technique.
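The blending step can be illustrated with the following rough CPU sketch; the copy count, thresholds, and per-copy alpha weighting are illustrative assumptions rather than the exact parameters of the original method.

```python
import numpy as np

def blend_motion_blur(camera_img, virtual_rgba, blur_vec, n_copies=8,
                      min_len=5.0, max_len=100.0):
    """Approximate motion blur by alpha-blending several shifted copies of the
    rendered virtual object along the blur vector, centered on its current position.
    camera_img : (H, W, 3) background frame; virtual_rgba : (H, W, 4) rendering of
    the virtual objects with alpha channel; blur_vec : 2D vector in pixels."""
    length = float(np.hypot(*blur_vec))
    out = camera_img.astype(np.float32)
    if length < min_len or length > max_len:       # skip blur (see thresholds above)
        a = virtual_rgba[..., 3:4] / 255.0
        return ((1 - a) * out + a * virtual_rgba[..., :3]).astype(np.uint8)
    for t in np.linspace(-0.5, 0.5, n_copies):
        dx, dy = (t * np.asarray(blur_vec)).round().astype(int)
        shifted = np.roll(virtual_rgba, (dy, dx), axis=(0, 1)).astype(np.float32)
        a = (shifted[..., 3:4] / 255.0) / n_copies  # weaker alpha for each copy
        out = (1 - a) * out + a * shifted[..., :3]
    return np.clip(out, 0, 255).astype(np.uint8)
```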
Motion blur emulation methods based on an external motion estimate are also
employed in Klein and Murray (2010) and Aittala (2010). A different (and potentially more accurate) approach is to estimate the size of motion blur in the camera
image using computer vision methods. Such analytical motion blur simulations were
proposed by Okumura et al. (2006) and Park et al. (2009). In both of these methods,
vision-based motion blur estimation is made possible by integration into the actual
vision-based camera tracking algorithm.

18.4.3 Defocus Blur (Depth of Field)


Objects captured by a camera only appear sharp in the resulting image at certain
depths. This is a well-known effect: When a photographer focuses on a certain object
in preparation for taking a photo, objects that are closer or farther away from the


FIGURE 18.4 Motion blur simulation in augmented reality. (a) An example of strong
motion blur caused by fast camera rotation in a camera image without virtual objects. Virtual
objects rendered with simulated motion blur in an AR scene: (b) virtual hamburger and (c)
butterfly model.

camera become blurred (i.e., they are out of focus). The range of depths in which captured objects appear acceptably sharp is called depth of field. The resulting defocus
blurring of objects in the camera image is another visual discrepancy in AR, since
virtual objects always appear to be perfectly in focus.
The problem of simulating defocus blur in AR has been addressed by several
researchers. The method of Okumura et al. (2006) handles defocus blur together
with motion blur in the context of their marker tracking algorithm. More recently,
Kán and Kaufmann (2012) and Xueting and Ogawa (2013) have proposed specific
AR rendering methods capable of simulating defocus blur.

18.5 SPECIFIC CHALLENGES OF REAL–VIRTUAL COMPOSITING


18.5.1 Aliasing at the Border of Virtual Objects
Aliasing between real and virtual image areas is another effect that makes virtual
objects easily distinguishable: At the outer boundary of graphical models, the typical
hard, jagged edges generated by a polygonal renderer are visible. Sharp, aesthetically displeasing edges are therefore apparent, in particular if a large contrast exists
between the virtual objects in an AR scene and the surrounding real environment.
This problem is further worsened by the typically limited resolution of the output
image in AR.


18.5.1.1 Pixel Averaging in the Original Resolution


In Fischer et al. (2006a), an algorithm was introduced, which performs a smooth
blending between the boundary pixels of virtual objects and the adjacent camera
pixels, which results in reduced aliasing. Unlike the antialiasing methods commonly
employed in computer graphics, this approach performs an averaging of pixels in
the original resolution, that is, without the use of supersampling. The resolution of
the output AR image usually matches the original resolution of the real-time camera
image delivered by the digital camera and is thus relatively small.
This algorithm performs a smooth blending between virtual objects and the camera image by detecting where the outer edges of virtual objects are located. In order
to perform the blending, a 3 × 3 neighborhood of pixels is considered for each output
pixel. For each of the pixels in this area, it is determined whether a part of a virtual
object or the camera image is visible at that location.
If a camera image pixel is detected at the currently regarded output location
(i.e., the center of the neighborhood), this original camera image pixel is used as
output without modifications. Otherwise, the blending operation is executed. The
blending operation is designed so that an averaging of color values is only performed
for boundary pixels, that is, those virtual object pixels that have adjacent camera
pixels in their 3 × 3 neighborhood. For such a virtual object boundary pixel, only
the neighboring camera image pixels are taken into account for the blending, while
adjacent virtual object pixels are ignored.
A sum of camera image pixels is computed over the associated 3 × 3 neighborhood. Whereas virtual object pixels in the surrounding neighborhood are ignored,
the color of the central virtual object pixel is also added. The resulting summed-up
color data is then divided by the number of pixels that were included in the summation (i.e., the number of adjacent camera image pixels plus one). The final result is the
antialiased virtual object boundary pixel color. Due to this averaging of color values
between virtual object boundary pixels and adjacent camera image pixels, a smooth
blending is performed at the object boundaries.
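The following sketch reproduces this blending rule on the CPU for illustration (the original runs in the rendering pipeline); the inputs are a composited AR frame and a mask of virtual pixels.

```python
import numpy as np

def blend_virtual_boundaries(composite, virtual_mask):
    """Smooth the border of virtual objects by averaging each virtual boundary pixel
    with its adjacent camera pixels in a 3x3 neighborhood; camera pixels and interior
    virtual pixels are kept unchanged.
    composite    : (H, W, 3) AR image with virtual objects already drawn.
    virtual_mask : (H, W) boolean array, True where a virtual pixel is visible."""
    h, w = virtual_mask.shape
    out = composite.astype(np.float32).copy()
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            if not virtual_mask[y, x]:
                continue                                  # camera pixel: keep as is
            neigh = virtual_mask[y - 1:y + 2, x - 1:x + 2]
            cam_neighbors = ~neigh                        # adjacent camera pixels
            if not cam_neighbors.any():
                continue                                  # interior virtual pixel
            block = composite[y - 1:y + 2, x - 1:x + 2].astype(np.float32)
            s = block[cam_neighbors].sum(axis=0) + composite[y, x]
            out[y, x] = s / (cam_neighbors.sum() + 1)     # camera neighbors plus center
    return out.astype(np.uint8)
```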
Figure 18.5 shows an example demonstrating the smoothed transitions between a
virtual object (in this example, a virtual butterfly model) and the camera image that
are generated by this antialiasing method.
18.5.1.2 Real–Virtual Antialiasing Using Supersampling
A more advanced approach to real–virtual antialiasing was proposed in Fischer
(2008). Here, the images contributing to the composited output image are processed
at a resolution that is significantly higher than the resolution of the resulting output
image. This supersampling is rather easily achieved for the virtual scene elements,
which are simply synthesized at the corresponding higher resolution. This is more
challenging for the camera image, because its resolution is predetermined by the
capabilities of the image acquisition hardware.
Real–virtual supersampling is therefore made possible by first upsampling the
real video image, that is, enlarging it to a higher resolution, and then rendering a
high resolution version of the virtual content over it. This results in a combined high
resolution image, which is used as input for the final image composition step.




FIGURE 18.5 Antialiasing at virtual object boundaries by pixel averaging in the original resolution. The edges of a virtual butterfly object (a) are shown without (b) and with antialiasing (c).

An adaptive edge-directed image upscaling algorithm generates the higher resolution version of the camera image. This advanced upscaling method, which is
executed on the graphics processing unit (GPU), is based on principles described by
Kraus et al. (2007). This sophisticated upsampling scheme generates an enlarged
image that approximates more closely what a higher resolution version of the input
image would likely look like, compared to simple upscaling approaches such as
nearest neighbor or other straightforward interpolation schemes. An example of
a photograph upsampled with the edge-directed scheme is shown in Figure 18.6.
The underlying assumption is that by using a more realistic upsampling scheme
for the camera image, a better result for the real-virtual supersampling can be
achieved.
In the final image composition step, camera image and virtual objects are rendered
at the original, smaller resolution. However, for pixels at the boundary between real
and virtual image regions, the combined higher resolution image is accessed, and the
pixel colors at the corresponding location are averaged.
The averaged high resolution pixels are used as boundary pixels for the combined
image, which leads to an antialiased output. Like the pixel averaging in the original
resolution described earlier, this more advanced real-virtual antialiasing was implemented using image processing shaders executed on the GPU, resulting in real-time frame rates even for large image resolutions and upsampling factors.
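A rough outline of this compositing step is given below, under the simplifying assumption that the camera image is enlarged with a generic, caller-supplied upscale routine rather than the edge-directed GPU scheme of Kraus et al. (2007); all names are illustrative and the sketch runs on the CPU, not as a shader.

import numpy as np

def supersampled_composite(camera_lo, virtual_hi, mask_hi, factor, upscale):
    # camera_lo: H x W x 3 camera image; virtual_hi, mask_hi: rendering and coverage
    # mask at (H*factor) x (W*factor); upscale: function enlarging an image by factor.
    camera_hi = upscale(camera_lo, factor)
    combined_hi = np.where(mask_hi[..., None], virtual_hi, camera_hi)
    h, w = camera_lo.shape[:2]
    blocks_mask = mask_hi.reshape(h, factor, w, factor)
    covered = blocks_mask.any(axis=(1, 3))   # low-res pixels touched by virtual content
    full = blocks_mask.all(axis=(1, 3))      # low-res pixels fully covered by it
    boundary = covered & ~full               # mixed real/virtual pixels
    # ordinary low-resolution composite everywhere ...
    out = np.where(covered[..., None], virtual_hi[::factor, ::factor], camera_lo).astype(float)
    # ... but average the high-resolution samples at the boundary pixels
    block_mean = combined_hi.reshape(h, factor, w, factor, 3).mean(axis=(1, 3))
    out[boundary] = block_mean[boundary]
    return out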
A comparison of antialiasing methods for integrating virtual elements into a
camera image is depicted in Figure 18.7. In this figure, results obtained without
antialiasing (see Figure 18.7a) are shown next to pixel averaging in the original
resolution as described in the preceding section (see Figure 18.7b) and real-virtual
supersampling (see Figure 18.7c). The figure demonstrates that by using supersampling, even small-scale structures can be integrated into the camera image with significantly fewer artifacts and an aesthetically superior result compared to the other
approaches.


FIGURE 18.6 Edge-directed upscaling of an example photograph using the method of Kraus et al. (2007): (a) original image size with (b) enlarged detail, (c) upscaled photograph with (d) enlarged detail.

FIGURE 18.7 Comparison of different methods for integrating a virtual element into a camera image. A virtual stop sign object is drawn over a real street scene: (a) no antialiasing, (b) pixel averaging in the original resolution, and (c) real-virtual supersampling (4×).


18.5.2 Occlusion Handling
One of the most disturbing problems in conventional video see-through AR is the
incorrect representation of occlusions. The fact that the spatial relationships between
real and virtual objects in an augmented environment are not correctly rendered is
not only aesthetically displeasing; the resulting disorientation can make it difficult or
even impossible for the user to perform useful interactive tasks.
Consequently, the high relevance of the occlusion problem has inspired the development of many different approaches for solving it since the first AR systems were
introduced. Some of these occlusion-handling methods rely on the reconstruction
of scene depths using a pair of stereo cameras (Wloka and Anderson, 1995) or on
predefined models of physical occluders in the real environment, so-called phantom
models (Fuhrmann et al., 1999; Fischer et al., 2004). Other approaches use computer
vision for establishing a rudimentary model of depth relationships in the real environment (Berger, 1997), possibly with the help of predefined surfaces in the environment in front of which user interaction is supposed to take place (dynamic occlusion
backgrounds, see Fischer et al., 2003).
18.5.2.1 Occlusion Handling Using a Time-of-Flight Range Sensor
In Fischer et al. (2007), a specialized, recently developed type of hardware was utilized for the first time for addressing the occlusion problem in AR. A time-of-flight
range sensor was introduced into the AR image generation pipeline. This sensor
produces a 2D map of the distances to real objects in the environment. The distance
map is registered with high resolution color images delivered by a digital video camera. When displaying the virtual models in AR, the distance map is used in order
to decide whether the camera image or the virtual object is visible at any given
position. This way, the occlusion of virtual models by real objects can be correctly
represented.
A 3D time-of-flight camera from manufacturer pmdtechnologies (pmdtechnologies, 2014) was chosen, namely the PMD [vision] 19k. The camera uses a 160 × 120
pixel photonic mixer device (PMD) sensor array that acquires distance data using
the time-of-flight principle with active illumination. An array of LEDs sends out
invisible modulated near-infrared light. For each pixel, the PMD sensor delivers distance and intensity information simultaneously, where the distance data is computed
from the phase shift of the reflected light directly inside the camera. Since both the
intensity and the depth values are captured through the same optical system, they
are perfectly aligned.
The camera works at a frame-rate of up to 20 fps and is therefore well suited
for real-time AR applications. However, the image data of the PMD camera itself
is not adequate for AR applications due to its low resolution and the limitation
to grayscale images. In order to enhance the visual quality, the time-of-flight
camera was combined with a standard high-resolution camera, namely a Matrix
Vision BlueFox with a 1600 × 1200 pixel sensor (MATRIX VISION, 2014). The
resulting horizontal field-of-view of about 34° is similar to that of the PMD camera. This ensures an easy calibration of both cameras and only a small loss of
information.


The PMD depth image and the high resolution color image were registered with
standard methods. Using the stereo system calibration method of the Camera Calibration Toolbox for MATLAB (Bouguet, 2013), the extrinsic as well as the intrinsic
parameters of this system were calculated. By applying the resulting translation and
rotation to the 3D data calculated from the depth data, 3D coordinates in the reference frame of the high resolution camera were obtained.
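In code, this transfer amounts to a pinhole back-projection of the depth map followed by a rigid transformation into the color camera's frame. The sketch below assumes the depth map stores depth along the optical axis in millimeters and that intrinsics (fx, fy, cx, cy) and the calibrated rotation R and translation t are available; all names are illustrative.

import numpy as np

def depth_to_color_frame(depth_mm, fx, fy, cx, cy, R, t):
    h, w = depth_mm.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth_mm.astype(float)
    x = (u - cx) * z / fx            # pinhole back-projection
    y = (v - cy) * z / fy
    pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    # rigid transform into the high-resolution color camera frame
    pts_color = pts @ R.T + t
    return pts_color.reshape(h, w, 3)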
The occlusion-handling scheme is based on the comparison of the absolute depths
of real and virtual objects. This is possible because the depth maps delivered by the
time-of-flight camera and the coordinate system of the AR system are both calibrated to operate in units corresponding to real millimeters.
Whenever a graphical primitive constituting a virtual model in the augmented
environment is rendered, a special shader program executed on the GPU is activated. The shader compares the depth of each pixel of the graphical primitive with
the depth information stored in the time-of-flight depth map. If the depth measured
by the time-of-flight sensor is smaller than the primitive depth at this location, the
display of the virtual object pixel is suppressed. This way, the occlusion of virtual
objects by real objects is correctly depicted, without requiring any prior knowledge
about the real occluders.
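Reduced to its essence, the shader performs a per-pixel depth comparison. The following CPU sketch (array names are placeholders; both depth maps are assumed to be in millimeters and registered to the same view) conveys the idea:

import numpy as np

def occlusion_composite(camera_rgb, virtual_rgb, virtual_mask, real_depth_mm, virtual_depth_mm):
    # show a virtual pixel only where it lies in front of the measured real surface
    show_virtual = virtual_mask & (virtual_depth_mm < real_depth_mm)
    return np.where(show_virtual[..., None], virtual_rgb, camera_rgb)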
For logistical reasons, experiments were performed using an offline image generation pipeline. The depth maps and color images created by the process described
earlier were stored on disk and then imported into adapted AR image generation
software. Figure 18.8 shows examples of AR images with occlusion handling based
on depth maps acquired with the time-of-flight camera, as well as the associated
combined depth maps themselves.
Recently, depth cameras in smaller form factors have become widespread in the
consumer market. These more practical depth sensors are also being exploited for
occlusion handling in AR, for instance the Kinect camera (Izadi et al., 2011) and the
Leap Motion controller (Regenbrecht et al., 2013).

18.6 STYLIZED AR
Stylized AR systems present to their users a view of the augmented environment
transformed to resemble an artistic or illustrative style. These stylizations inherently
mask many or even all of the effects causing visual inconsistencies between the real
and virtual elements of an AR scene. As with all advanced rendering approaches
for AR, the methods employed for the stylization must run fast enough to guarantee
real-time frame rates and must operate without requiring user interaction.
The stylized AR system described in Fischer et al. (2005c) relied on a combination of video filtering of the camera image with an adapted rendering method for
the virtual objects in order to achieve an integrated cartoon-like effect. More recent
methods generally implement stylization in AR as a pure video postprocessing filter: First, a normal AR image is generated by rendering the virtual objects over
the camera image in the conventional manner, and then the stylization is applied
to the entire combined image. This is made possible by the programmable GPUs
that are nowadays contained not only in desktop computers but also in many wearable devices. Video stylization filters can be implemented on modern GPUs with a minimal impact on the frame rate and latency of the AR system, and without requiring computational resources of the central processing unit (CPU) of the device.

FIGURE 18.8 Examples of occlusion handling in AR using a time-of-flight camera. A real piece of cardboard was used to occlude parts of virtual objects: (a) a virtual plane model is partially occluded by the cardboard piece, (c) a virtual Easter Island statue is intersected by the cardboard piece. The corresponding combined depth maps containing the time-of-flight data and synthetic depth values of the virtual objects are shown in (b) and (d), respectively.

18.6.1 Real-Time Cartoon-Like Stylization on the GPU


Such a GPU-based stylization filter for AR was presented in Fischer et al. (2005b)
and discussed in greater detail in Fischer and Bartz (2005). For each frame, a standard AR pipeline first generates an output image containing the camera image with
overlaid virtual objects. This original AR frame is rendered using the graphics hardware and resides in its local frame buffer memory. A postprocessing filter is then
applied to it, which is executed by the graphics processing unit.
The stylization filter consists of two steps. In the first step, a simplified color
image is computed from the original AR frame. The simplified color image is composed of regions that are approximately uniformly colored. A nonlinear filter using a photometric weighting of pixels is the basis for this computation. This nonlinear
filter is inspired by bilateral filtering (Tomasi and Manduchi 1998), which is a widespread method for creating uniformly colored regions in an image. The photometric
filter is applied to a shrunk version of the input image. This way, a better color simplification is achieved, and the required computation time can be reduced. Several
filtering iterations are consecutively applied to the image. The repetition of the filter
operation is necessary in order to achieve a sufficiently good color simplification.
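A minimal sketch of such a photometric (range-only) filter, applied repeatedly to the shrunk frame, is shown below. The 3 × 3 kernel, the weighting constant sigma, and the iteration count are illustrative choices, not the values used in the original shader implementation.

import numpy as np

def photometric_filter(img, sigma=20.0, iterations=3):
    # iteratively average each pixel with its 3 x 3 neighbors, weighting neighbors
    # only by color similarity (a bilateral-like range term without a spatial term)
    out = img.astype(float)
    h, w = out.shape[:2]
    for _ in range(iterations):
        padded = np.pad(out, ((1, 1), (1, 1), (0, 0)), mode="edge")
        acc = np.zeros_like(out)
        weight_sum = np.zeros((h, w, 1))
        for dy in range(3):
            for dx in range(3):
                nb = padded[dy:dy + h, dx:dx + w]
                diff = np.linalg.norm(nb - out, axis=-1, keepdims=True)
                wgt = np.exp(-(diff ** 2) / (2.0 * sigma ** 2))  # photometric weight
                acc += wgt * nb
                weight_sum += wgt
        out = acc / weight_sum
    return out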
The second stage of the nonphotorealistic filter is an edge detection step using a
Sobel filter. The simplified color image is the primary input for this operation. This
way, the generated silhouette lines are located between similarly colored regions in the
image, which is an approximation of a cartoon-like rendering style. To a lesser degree,
edges detected in the original AR frame are also taken into account when drawing the
silhouette lines. The higher resolution of the original image compared to the shrunk
color image can contribute some additional detail to the edge detection result. In the
typical configuration, most of the input for the edge detection step is taken from the
simplified color image. It consists of mostly uniformly colored regions generated by
the photometric filter. Therefore, edges detected in the simplified color image typically
correspond quite well to the outer boundaries of physical or virtual objects.
Finally, the simplified color image is combined with the edge detection results.
The color image is enlarged to the size of the original input image. The combined
responses of the edge detection filters are drawn over the enlarged image as black
lines. A specific weight function is used for computing a transparency for the detected
edge pixels, which produces a smooth blending over the color image.
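The edge detection and the final combination can be sketched as follows; the nearest-neighbor enlargement and the linear transparency weighting are simplified stand-ins for the shader implementation, and edges from the original AR frame are omitted for brevity.

import numpy as np

def sobel_magnitude(gray):
    # gradient magnitude using the standard 3 x 3 Sobel kernels
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    h, w = gray.shape
    padded = np.pad(gray.astype(float), 1, mode="edge")
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    for dy in range(3):
        for dx in range(3):
            window = padded[dy:dy + h, dx:dx + w]
            gx += kx[dy, dx] * window
            gy += ky[dy, dx] * window
    return np.hypot(gx, gy)

def cartoon_composite(simplified_small, factor, edge_scale=0.01):
    # enlarge the simplified color image and draw detected edges over it as
    # smoothly blended black lines
    enlarged = np.repeat(np.repeat(simplified_small, factor, axis=0), factor, axis=1)
    edges = sobel_magnitude(enlarged.mean(axis=-1))
    alpha = np.clip(edges * edge_scale, 0.0, 1.0)[..., None]  # edge transparency
    return (1.0 - alpha) * enlarged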
The GPU-based implementation of the cartoon-like filter for stylized AR is fast
and delivers real-time frame rates. Figure 18.9 shows examples of stylized AR
images generated with the real-time cartoon-like filter.

18.6.2 Other Stylization Approaches


Various other methods for generating stylized augmented environments have been
proposed, producing output images in a range of different styles. These more recent
methods generally rely on image postprocessing on the GPU as their main operation.
Due to the massively increased performance of contemporary graphics processing
units, they can generate output images using more sophisticated effects. Moreover,
they also typically process the video images at full resolution and do not need to work
in a reduced image size, as was done in Fischer et al. (2005b) discussed earlier.
Table 18.2 contains an overview of further styles that have been applied to real-time augmented environments. (An earlier overview of stylization approaches for
AR and applications of stylized AR was also given in Fischer et al. (2008b).)

18.6.3 Applications of Stylized AR
The fact that artistic stylization significantly alters not only the appearance of the
virtual objects but also the camera image means that stylized AR is not suitable for
every application area. However, sometimes a seamless visual integration between
real and virtual elements is more important than an unaltered view of the augmented environment. Moreover, in fields such as entertainment and culture, an artistically modified perception of the AR scene may actually be desirable in and of itself.

FIGURE 18.9 Examples of real-time cartoon-like stylization in augmented reality: (a) virtual coffee maker model, conventional view, (b) the same in stylized AR; (c) virtual plate model, conventional view, (d) the same in stylized AR.

TABLE 18.2
Overview of Further Real-Time Artistic and Illustrative Stylization Approaches for Augmented Reality

References              Artistic Style                        Remarks
Fischer et al. (2005a)  Pointillism                           Use of a static Voronoi diagram (CPU implementation)
Haller et al. (2005)    Sketchy                               Outlines drawn using particle system; background paper texture
Fischer et al. (2006b)  Technical illustration                Black-and-white display; GPU-based hatching
Chen et al. (2008)      Watercolor                            Temporal coherence through the use of a dynamic Voronoi diagram
Wang et al. (2010)      Line drawing and abstracted shading   Temporally coherent stylization; focus control of abstraction
Chen et al. (2012)      Painterly                             Advanced methods for temporal coherence

A range of different example applications of stylized AR techniques has been
presented. In the field of art and culture, Fischer et al. (2006b) demonstrated
The Augmented Painting, an interactive installation in which users could switch between different types of stylizations while observing a stylized augmented environment. A system for exploring buildings and paintings integrated into their
historical context in a seamlessly stylized AR representation was described by
Zöllner et al. (2008).
Another area in which stylized AR has been applied is technical and design visualization. Kalkofen et al. (2007) presented techniques for a coherent real-virtual
stylization when inspecting technical and medical 3D models. The selective application of real-virtual stylization in a tangible AR system was described by Fischer
et al. (2008a). Dai (2011) demonstrated the use of stylized AR techniques in virtual
furniture layout in order to effectively harmonize the 3D virtual furniture and the
real-world scene.
Other example applications include the use of a stylized AR representation for
a personal geographical navigation application (Knödel et al., 2007). Lang et al.
(2008) utilized stylization in a massively multiplayer online world based on AR.

18.6.4 Psychophysical Evaluation of Stylized AR


A psychophysical study on the effectiveness of stylized AR was performed in order
to assess whether it can actually achieve the aim of equalizing the visual realism
of real and virtual image elements (Fischer et al., 2006c). In this study, a number of participants were asked to decide whether objects shown in AR scenes are
virtual or real. Conventionally rendered as well as stylized AR images and short
video clips were presented to the participants. Each of these images and video clips
showed either a real or a virtual object placed on top of an AR tracking marker.
The real-time cartoon-like stylization filter described earlier in Section 18.6.1 was
used for the experiment. Figure 18.9 shows example images containing virtual
objects used in the study in both the conventional (Figure 18.9a and c) and stylized
(Figure 18.9b and d) modes.
The correctness of the participants' responses and their reaction times were
recorded. The results of this study showed that an equalized level of realism is
achieved by using stylized AR, that is, it is significantly more difficult to distinguish
virtual objects from real objects. Presenting the scenes in a stylized manner successfully reduced the detectable differences between physical and virtual objects. The
majority of the errors consisted of falsely believing that some of the physical objects
presented in stylized AR were actually virtual objects. This highlights the success
of the stylization algorithm in generating a consistent representation of virtual and
physical image elements.

18.7 CONCLUSION
This chapter has given an introduction to the problems that need to be solved when
a seamless visual integration of real and virtual objects in an augmented environment is desired. The significance of this challenge depends on the requirements of a
given application. In some cases, a unified visual appearance may only be a secondary matter. In such applications, fast processing, short latencies, a reduced implementation effort, and an unmodified view of the real scene may be more important than the desire to address visual inconsistencies. The described strategies for emulating camera image imperfections, as well as the artistic filters employed in stylized
AR, require the implementation of sophisticated image processing algorithms. Even
in their most optimized form, these postprocessing steps still have some impact on
the frame rate and latencies of the AR system. Moreover, they alter the composited
AR image in its conventional form. In a certain sense, it could therefore be said that
these solutions for visual inconsistencies make the AR image "worse" or even alter
its visual appearance completely.
However, it must be noted that there is no such thing as a "natural" visual style for
augmented environments. Naïve implementations of AR generate visually incongruous images (see Figure 18.1). These incompatible levels of visual realism between
real and virtual elements should not be considered as the self-evident, logical style
of rendering in AR. Rather, the common approach to rendering in AR is a legacy of
the real-time techniques that were available de facto since the emergence of the field.
These were originally developed to display purely synthetic images on computers
with very limited performance.
In particular, those application areas in which presence (a sense of immersion)
in AR is often more critical, namely entertainment, gaming, interactive experiences, arts,
and culture, have become increasingly important in recent years. Such applications may benefit greatly from techniques that make real and virtual elements in the
environment less distinguishable, and that therefore deliver a more integrated real-virtual
experience to the user.
This chapter has provided an overview of techniques that address some of the
underlying problems. Real-time algorithms have been developed for emulating camera image imperfections, for solving the specific challenges of real-virtual compositing, and for the artistic stylization of entire AR video streams. These developments
have been greatly aided by the rapid emergence of ever-faster graphics processing
units capable of performing programmable video postprocessing. Some researchers have even demonstrated integrated pipelines for emulating a large number of
the most relevant effects distinguishing the camera image from the virtual objects.
Future developments will certainly bring us even closer to achieving the aim of presenting to the user a truly seamless AR, whether in the style of a camera image or
in an artistic style.

REFERENCES
Agusanto, K., Li, L., Chuangui, Z., and Sing, N. (2003), Photorealistic rendering for augmented reality using environment illumination. In: Proceedings of the Second IEEE
and ACM International Symposium on Mixed and Augmented Reality (ISMAR), Tokyo,
Japan. IEEE, Washington, DC, pp.208216.
Aittala, M. (2010), Inverse lighting and photorealistic rendering for augmented reality. The
Visual Computer, 26(6-8): 669-678.
Azuma, R., Baillot, Y., Behringer, R., Feiner, S., Julier, S., and MacIntyre, B. (2001), Recent
advances in augmented reality. IEEE Computer Graphics and Applications, 21(6):
3447.
Bajura, M. and Neumann, U. (1995), Dynamic registration correction in video-based augmented reality systems. IEEE Computer Graphics and Applications, 15(5): 5260.


Berger, M. (1997), Resolving occlusion in augmented reality: A contour based approach
without 3D reconstruction. In: Proceedings of the IEEE Computer Society Conference
on Computer Vision and Pattern Recognition (CVPR), San Juan, Puerto Rico. IEEE,
Washington, DC, pp. 9196.
Boris FX. (2014), Art looks. Accessed June 20, 2014. http://www.borisfx.com/units/ArtLooks.php.
Bouguet, J.-Y. (2013), Camera calibration toolbox for Matlab. Last modified December 2,
2013. http://www.vision.caltech.edu/bouguetj/calib_doc/.
Catmull, E. (1974), A subdivision algorithm for computer display of curved surfaces. PhD
thesis, University of Utah, Salt Lake City, UT.
Chen, J., Turk, G., and MacIntyre, B. (2008), Watercolor inspired non-photorealistic rendering
for augmented reality. In: Proceedings of the 2008 ACM Symposium on Virtual Reality
Software and Technology, Bordeaux, France. ACM, New York, NY, pp. 231234.
Chen, J., Turk, G., and MacIntyre, B. (2012), A non-photorealistic rendering framework
with temporal coherence for augmented reality. In: Proceedings of IEEE International
Symposium on Mixed and Augmented Reality (ISMAR) 2012, Atlanta, GA. IEEE,
Washington, DC, pp. 151160.
Cooper, J., Boyce, J., Tourapis, A., Gomila, C., Llach, J., and Yin, P. (2004), Technique for
film grain simulation using a database of film grain patterns. U.S. Patent Application
10/581,151. The United States Patent and Trademark Office.
Dai, D. (2011), Stylized rendering for virtual furniture layout. In: Proceedings of Second
International Conference on Multimedia Technology (ICMT), Hangzhou, China. IEEE,
Washington, DC, pp. 780782.
Debevec, P. (2008), Rendering synthetic objects into real scenes: Bridging traditional and
image-based graphics with global illumination and high dynamic range photography. In:
ACM SIGGRAPH 2008 Classes, Los Angeles, CA. ACM, New York, NY, p. 32.
de Sorbier, F. and Saito, H. (2014), Stereoscopic augmented reality with pseudo-realistic
global illumination effects. In: Proceedings of IS&T/SPIE Electronic Imaging 2014
(Stereoscopic Displays and Applications XXV), San Francisco, CA, p. 90111W.
Drettakis, G., Robert, L., and Bougnoux, S. (1997), Interactive common illumination for
computer augmented reality. In: Proceedings of the Eighth Eurographics Workshop
on Rendering Techniques 97 (Proceedings of the 8th Eurographics Workshop on
Rendering), St. Etienne, France. Springer, Vienna, Austria, pp. 4556.
Dutre, P., Bala, K., Bekaert, P., and Shirley, P. (2006), Advanced Global Illumination, 2nd edn.
A K Peters/CRC Press, Wellesley, MA.
Fischer, J. (2008), Real-virtual antialiasing. ACM SIGGRAPH 2008 Posters. ACM, Redwood
City, CA, p. 70.
Fischer, J. and Bartz, D. (2005), Real-time cartoon-like stylization of AR video streams on
the GPU. Technical Report. Eberhard-Karls-Universität Tübingen, Tübingen, Germany.
http://tobias-lib.uni-tuebingen.de/volltexte/2005/1987/. Accessed February 26, 2015.
Fischer, J., Bartz, D., and Straßer, W. (2004), Occlusion handling for medical augmented reality using a volumetric phantom model. In: Proceedings of the ACM Symposium on Virtual
Reality Software and Technology (VRST), Hong Kong, China. ACM, New York, pp. 174177.
Fischer, J., Bartz, D., and Straßer, W. (2005a), Artistic reality: Fast brush stroke stylization for
augmented reality. In: Proceedings of the ACM Symposium on Virtual Reality Software
and Technology (VRST), Monterey, CA. ACM, New York, pp. 155158.
Fischer, J., Bartz, D., and Straßer, W. (2005b), Reality tooning: Fast non-photorealism for augmented video streams. In: Proceedings of Fourth IEEE and ACM International Symposium
on Mixed and Augmented Reality (ISMAR), Vienna, Austria. IEEE, Washington, DC,
pp.186187.


Fischer, J., Bartz, D., and Straßer, W. (2005c), Stylized augmented reality for improved immersion. In: Proceedings of IEEE Virtual Reality 2005, Bonn, Germany. IEEE, Washington,
DC, pp. 195202.
Fischer, J., Bartz, D., and Straßer, W. (2006a), Enhanced visual realism by incorporating camera image effects. In: Proceedings of the Fifth IEEE and ACM International Symposium
on Mixed and Augmented Reality (ISMAR), Santa Barbara, CA. IEEE, Washington, DC,
pp. 205208.
Fischer, J., Bartz, D., and Straßer, W. (2006b), The augmented painting. In: ACM SIGGRAPH
2006 Emerging Technologies, Boston, MA. ACM, New York, p. 2.
Fischer, J., Cunningham, D., Bartz, D., Wallraven, C., Bülthoff, H., and Straßer, W. (2006c),
Measuring the discernability of virtual objects in conventional and stylized augmented
reality. In: Proceedings of the 12th Eurographics Conference on Virtual Environments
(EGVE), Lisbon, Portugal. Eurographics Association, Geneva, Switzerland, pp. 5361.
Fischer, J., Flohr, D., and Straßer, W. (2008a), Selective stylization for visually uniform tangible
AR. In: Proceedings of the 14th Eurographics Symposium on Virtual Environments
(EGVE), Eindhoven, Netherlands. Eurographics Association, Geneva, Switzerland,
pp.18.
Fischer, J., Haller, M., and Thomas, B. (2008b), Stylized depiction in mixed reality.
International Journal of Virtual Reality (IJVR), 7(4): 7179. IPI Press.
Fischer, J., Huhle, B., and Schilling, A. (2007), Using time-of-flight range data for occlusion
handling in augmented reality. In: Proceedings of the 13th Eurographics Conference on
Virtual Environments (EGVE), Weimar, Germany. Eurographics Association, Geneva,
Switzerland, pp. 109116.
Fischer, J., Regenbrecht, H., and Baratoff, G. (2003), Detecting dynamic occlusion in front of
static backgrounds for AR scenes. In: Proceedings of the Ninth Eurographics Workshop
on Virtual Environments (EGVE), pp. 153162. Eurographics Association, Aire-la-Ville,
Switzerland.
Fournier, A., Gunawan, A., and Romanzin, C. (1993), Common illumination between real
and computer generated scenes. In: Proceedings of Graphics Interface 93, Toronto,
Ontario, Canada. Canadian Information Processing Society, pp. 254262. Mississauga,
Canada.
Fuhrmann, A., Hesina, G., Faure, F., and Gervautz, M. (1999), Occlusion in collaborative
augmented environments. Computers & Graphics, 23(6): 809819.
Grosch, T., Eble, T., and Mueller, S. (2007), Consistent interactive augmentation of live
camera images with correct near-field illumination. In: Proceedings of the 2007 ACM
Symposium on Virtual Reality Software and Technology (VRST), Newport Beach, CA.
ACM, New York, pp. 125132.
Grubba Software. (2014), TrueGrainAccurate black & white film simulation for digital photography. Accessed June 19, 2014. http://grubbasoftware.com/.
Haller, M., Landerl, F., and Billinghurst, M. (2005), A loose and sketchy approach in a mediated reality environment. In: Proceedings of the Third International Conference on
Computer Graphics and Interactive Techniques in Australasia and South East Asia
(GRAPHITE), Dunedin, New Zealand. ACM, New York, pp. 371379.
Hertzmann, A. and Perlin, K. (2000), Painterly rendering for video and interaction. In:
Proceedings of the First International Symposium on Non-photorealistic Animation and
Rendering (NPAR), Annecy, France. ACM, New York, pp. 712.
Izadi, S., Kim, D., Hilliges, O., Molyneaux, D., Newcombe, R., Kohli, P., Shotton, J. et al.
(2011), KinectFusion: Real-time 3D reconstruction and interaction using a moving
depth camera. In: Proceedings of the 24th Annual ACM Symposium on User Interface
Software and Technology (UIST), Santa Barbara, CA. ACM, New York, pp. 559568.


Kalkofen, D., Mendez, E., and Schmalstieg, D. (2007), Interactive focus and context visualization for augmented reality. In: Proceedings of the Sixth IEEE and ACM International
Symposium on Mixed and Augmented Reality (ISMAR), Nara, Japan. IEEE, Washington,
DC, pp. 110.
Kán, P. and Kaufmann, H. (2012), Physically-based depth of field in augmented reality.
Eurographics 2012 (Short Papers), Cagliari, Italy. Eurographics Association, Geneva,
Switzerland, pp. 8992.
Klein, G. and Murray, D. (2010), Simulating low-cost cameras for augmented reality compositing. IEEE Transactions on Visualization and Computer Graphics, 16(3): 369380.
Knödel, S., Hachet, M., and Guitton, P. (2007), Enhancing the mapping between real and virtual
world on mobile devices through efficient rendering. In: Proceedings of Mixed Reality
User Interfaces 2007 (IEEE VR Workshop), Charlotte, NC. IEEE, Washington, DC.
Kraus, M., Eissele, M., and Strengert, M. (2007), GPU-based edge-directed image interpolation. In: Proceedings of the 15th Scandinavian Conference on Image Analysis, Aalborg,
Denmark. Springer, Berlin, Germany, pp. 532541.
Kyprianidis, J., Collomosse, J., Wang, T., and Isenberg, T. (2013), State of the Art: A taxonomy of artistic stylization techniques for images and video. IEEE Transactions on
Visualization and Computer Graphics (TVCG), 19(5): 866885.
Lang, T., MacIntyre, B., and Zugaza, I. (2008), Massively multiplayer online worlds as a
platform for augmented reality experiences. In: Proceedings of IEEE Virtual Reality
Conference (VR), Reno, NV. IEEE, Washington, DC, pp. 67-70.
MATRIX VISION. (2014), MATRIX VISION GmbH. Accessed June 22, 2014. http://www.
matrix-vision.com/home-en.html.
Munshi, A., Ginsburg, D., and Shreiner, D. (2008), OpenGL ES 2.0 Programming Guide.
Pearson Education, Boston, MA.
Okumura, B., Kanbara, M., and Yokoya, N. (2006), Augmented reality based on estimation of
defocusing and motion blurring from captured images. In: Proceedings of the Fifth IEEE
and ACM International Symposium on Mixed and Augmented Reality (ISMAR), Santa
Barbara, CA. IEEE, Washington, DC, pp. 219225.
Park, Y., Lepetit, V., and Woo, W. (2009), ESM-Blur: Handling & rendering blur in 3D tracking
and augmentation. In: Proceedings of Eighth IEEE International Symposium on Mixed
and Augmented Reality (ISMAR), Orlando, FL. IEEE, Washington, DC, pp. 163166.
pmdtechnologies. (2014), pmdcan you imagine: The world of pmd. Accessed June 22,
2014. http://www.pmdtec.com/.
Regenbrecht, H., Collins, J., and Hoermann, S. (2013), A leap-supported, hybrid AR interface approach. In: Proceedings of the 25th Australian Computer-Human Interaction
Conference: Augmentation, Application, Innovation, Collaboration, Adelaide, South
Australia, Australia. ACM, New York, pp. 281284.
Schilling, A. (1991), A new simple and efficient antialiasing with subpixel masks. Computer
Graphics, 25(4): 133141. (SIGGRAPH 1991 Proceedings), ACM.
Straßer, W. (1974), Schnelle Kurven- und Flächendarstellung auf graphischen Sichtgeräten.
Dissertation, Technical University of Berlin, Berlin, Germany.
Sugano, N., Kato, H., and Tachibana, K. (2003), The effects of shadow representation of
virtual objects in augmented reality. In: Proceedings of the Second IEEE and ACM
International Symposium on Mixed and Augmented Reality (ISMAR), Tokyo, Japan.
IEEE, Washington, DC, pp. 7683.
Tomasi, C. and Manduchi, R. (1998), Bilateral filtering for gray and color images. In: Sixth
International Conference on Computer Vision, Bombay, India. IEEE, pp. 839846.
Washington, DC.
Tuceryan, M., Greer, D., Whitaker, R., Breen, D., Crampton, C., Rose, E., and Ahlers, K.
(1995), Calibration requirements and procedures for a monitor-based augmented reality
system. IEEE Transactions on Visualization and Computer Graphics, 1(3): 255273.


Wang, S., Cai, K., Lu, J., Liu, X., and Wu, E. (2010), Real-time coherent stylization for augmented reality. The Visual Computer, 26(6-8): 445-455.
Withagen, P., Groen, F., and Schutte, K. (2005), CCD characterization for a range of color
cameras. In: Proceedings of Instrumentation and Measurement Technology Conference
(IMTC 2005), Ottawa, Ontario, Canada. IEEE, Washington, DC, pp. 22322235.
Wloka, M. and Anderson, B. (1995), Resolving occlusion in augmented reality. In: Proceedings
of the 1995 ACM Symposium on Interactive 3D Graphics, Monterey, CA. ACM, New
York, pp. 512.
Xueting, L. and Ogawa, T. (2013), A depth cue method based on blurring effect in augmented
reality. In: Proceedings of the Fourth Augmented Human International Conference,
Stuttgart, Germany. ACM, New York, pp. 8188.
Zöllner, M., Pagani, A., Pastarmov, Y., Wuest, H., and Stricker, D. (2008), Reality filtering:
A visual time machine in augmented reality. In: Proceedings of the Ninth International
Symposium on Virtual Reality, Archaeology and Cultural Heritage (VAST), Braga,
Portugal. Eurographics Association, pp. 7177. Geneva, Switzerland.

19 Applications of Augmented Reality in the Operating Room
Ziv Yaniv and Cristian A. Linte

CONTENTS
19.1 Introduction................................................................................................... 486
19.1.1 Minimally Invasive Interventions...................................................... 486
19.1.2 Augmented, Virtual, and Mixed Realities......................................... 486
19.1.3 Augmenting Visualization for Surgical Navigation.......................... 487
19.2 Image Guidance Infrastructure..................................................................... 488
19.2.1 Medical Imaging................................................................................ 488
19.2.2 Segmentation and Modeling.............................................................. 488
19.2.3 Instrument Localization and Tracking.............................................. 489
19.2.4 Registration and Data Fusion............................................................. 489
19.2.5 Data Visualization and Feedback Method......................................... 490
19.3 Guided Tour of AR in the Operating Room.................................................. 490
19.3.1 Understanding the OR Requirements................................................ 490
19.3.2 Commercially Available Systems...................................................... 490
19.3.2.1 Augmented X-Ray Guidance.............................................. 490
19.3.2.2 Augmented Ultrasound Guidance...................................... 491
19.3.2.3 Augmented Video and SPECT Guidance........................... 492
19.3.2.4 Augmented Endoscopic Video Guidance........................... 493
19.3.2.5 Augmented Tactile Feedback.............................................. 495
19.3.3 Laboratory Prototypes....................................................................... 495
19.3.3.1 Screen-Based Display....................................................... 496
19.3.3.2 Binocular Display............................................................. 501
19.3.3.3 Head-Mounted Display..................................................... 503
19.3.3.4 Semitransparent (Half-Silvered Mirror) Display..............504
19.3.3.5 Direct Patient Overlay Display......................................... 505
19.4 Limitations and Challenges........................................................................... 507
19.4.1 Optimal Information Dissemination and User Perception
Performance....................................................................................... 507
19.4.2 Accommodation for Tissue Motion and Organ Deformation............ 508
19.4.3 Compatibility with Clinical Workflow and Environment................. 508
19.4.4 Cost-Effectiveness............................................................................. 508
19.5 Future Perspectives........................................................................................509
References............................................................................................................... 510

19.1 INTRODUCTION
19.1.1 Minimally Invasive Interventions
Most surgical procedures and therapeutic interventions have been traditionally
performed by gaining direct access to the internal anatomy, using direct visual inspection to deliver therapy and treat the conditions. In the meantime, a wide variety of
medical imaging modalities have been employed to diagnose the condition, plan the
procedure, and monitor the patient during the intervention; nevertheless, the tissue
manipulation and delivery of therapy have been performed through rather invasive incisions that permit direct visualization of the surgical site and ample access inside the
body cavity. Over the past several decades, significant efforts have been dedicated to
minimizing invasiveness associated with surgical interventions, most of which have
been possible thanks to the developments in medical imaging, surgical navigation,
visualization and display technologies.

19.1.2 Augmented, Virtual, and Mixed Realities


In parallel with the aspirations toward less invasive therapy delivery on the medical
side, many engineering applications have been employing computer-generated models
to design, simulate, and visualize the interaction between different components
within an assembly prior to the global system implementation. One such approach
has focused on complementing the user's visual field with information that facilitates
the performance of a particular task, a technique broadly introduced and described
by Milgram et al. as "augmenting natural feedback to the operator with simulated
cues" (Milgram et al. 1994) and later becoming known as augmented reality (AR).
While the term augmented reality may be used somewhat loosely and in a more
inclusive sense than it had originally been intended, it refers to visualization environments that combine real-world visualization with computer-generated information
aimed to show information that direct vision cannot, resulting in a more comprehensive, enhanced view of the real world. The term mixed reality has been suggested
as a better descriptor of such environments, as, depending on the extent of real and
computer-generated information, the resulting environments can lie anywhere on
the spectrum of the reality-virtuality continuum (Kaneko et al. 1993; Metzger 1993;
Milgram and Kishino 1994; Takemura and Kishino 1992; Utsumi et al. 1994).
The real component of a typical mixed reality environment may consist of either a
direct view of the field observed by the user's eyes (i.e., optical-based AR), or a view
of the field captured using a video camera and displayed to the user (i.e., video-based
AR). As synthetic (i.e., computer-generated) data is added to the environment, the
mixed reality may become less of a traditional AR and more of an augmented
virtuality, yet still sufficiently different from a fully immersed virtual reality environment, in which the user has no access to the real-world view.
Initial applications of such visualization techniques have been adopted in response
to efforts in facilitating task performance in industry. While at first, computer-
generated models were displayed on screens and available to the workers as guides,
a more revolutionary approach resorted to the use of see-through displays mounted
on head-set devices worn by the workers. This approach enabled the superposition of the computer models onto the real view of the physical parts (Caudell 1994), hence
facilitating the workers' task by truly augmenting the user's view with computer-simulated cues.

19.1.3 Augmenting Visualization for Surgical Navigation


Although originally designed and implemented for industrial applications, mixed
reality visualization environments soon gained traction in the medical world,
motivated primarily by the goal to improve clinical outcome and procedure safety
by improving accuracy and reducing variability, reduce radiation exposure to both
patient and clinical staff, as well as reduce procedure morbidity, recovery time and
associated costs. In addition, the challenges associated with the trend toward minimally invasive procedures quickly revealed themselves, in terms of surgical navigation and target tissue manipulation under restricted access conditions and limited
vision, and hence the need for adequate intuitive visualization became critical for the
performance of the procedure.
Computers have become an integral part of medicine, enabling the acquisition,
processing, analysis, and visualization of medical images and their integration into
diagnosis and therapy planning (Ettinger et al. 1998), surgical training (Botden and
Jakimowicz 2009; Feifer et al. 2008; Kerner et al. 2003; Rolland et al. 1997), pre- and intraoperative data visualization (Kaufman et al. 1997; Koehring et al. 2008;
Lovo et al. 2007), and intraoperative navigation (Nakamoto et al. 2008; Teber et al.
2009; Vogt et al. 2004; Vosburgh and San José Estépar 2007). These technologies
have empowered clinicians to not only perform procedures that were rarely successful decades ago, but also embrace the use of less invasive techniques in an effort to
reduce procedure morbidity and patient trauma.
Besides providing faithful diagnosis, medical imaging has also enabled a variety
of minimally invasive procedures; however, the success of the procedure depends
upon the clinician's ability to mentally recreate the view of the surgical scene based
on the intraoperative images. These images provide a limited field of view of the
internal anatomy and also feature lower quality than the preoperative images used
for diagnosis. Moreover, depending on the employed imaging modality, the surgical
instruments used during therapy may not be easily visible in the intraoperative
images, raising the need for additional information.
To provide accurate guidance while avoiding critical anatomical structures, several
data types acquired from different sources at different stages of the procedure are
integrated within a common image guidance workflow. High-quality preoperative
images and anatomical models are employed to provide the big picture of the internal anatomy that help the surgeon navigate from the point of access to the target to be
treated, serving as a road map. The employed surgical tools are typically instrumented
with tracking (i.e., localization) sensors that encode the tool position and orientation,
and, provided the patient anatomy is registered to the preoperative images/models
(i.e., image/model-to-patient registration typically achieved via the tracking system),
the virtual representation of the surgical instruments can be visualized in the same
coordinate system as the road map, in a similar fashion as using a GPS navigation system to obtain positioning information along a route. To compensate for the
limited intraoperative faithfulness provided by the slightly outdated preoperative
data, intraoperatively acquired images are also integrated into the guidance environment to provide accurate and precise target identification and on-target instrument
positioning based on real-time information. Following fusion of pre- and intraoperative images and instrument tracking information, the physician performs the tool-to-target navigation using the preoperative images/models augmented with the virtual
tool representations, followed by the on-target instrument positioning under real-time image guidance complemented by real-time instrument tracking.
The multimodality guidance and navigation information can be displayed to the
surgeon via traditional 2D display screens available in the interventional suites, in the
form of an AR display, either directly overlaid onto the patient's skin (i.e., optical-based
AR) or by augmenting a video display of the patient (i.e., video-based AR), via tracked
head-mounted (stereoscopic) displays or recently developed and commercially available 3D displays (Linte et al. 2013). Less common modes of augmented tactile and
auditory feedback are also in use and will be described.

19.2 IMAGE GUIDANCE INFRASTRUCTURE


Having set the stage, we now briefly present the key technologies involved in the
development of medical navigation guidance systems (Cleary and Peters 2010;
Peters and Cleary 2008; Yaniv and Cleary 2006), including (1) medical imaging,
(2) segmentation and modeling, (3) localization and tracking, (4) registration, and
(5) visualization and other feedback methods.

19.2.1 Medical Imaging


In the context of medical AR, images may represent either the real or virtual part
of the environment, depending on their source and acquisition time. Information
obtained from preoperative images, such as computed tomography (CT), is often
displayed as virtual models of the anatomy, while real-time intraoperative imaging, such as endoscopic video, would constitute the real component. High-quality
preoperative images are typically acquired for diagnostic purposes and traditionally their value was minimal during the intervention. Preoperative diagnostic imaging modalities include CT and magnetic resonance imaging (MRI) for anatomical
imaging, and positron emission tomography (PET) and single photon emission computed tomography (SPECT) for functional imaging. Intraoperative imaging modalities include x-ray fluoroscopy, cone-beam CT (CBCT), ultrasound (US), and video
endoscopy, all of which provide anatomical imaging. A more recent addition is a
commercial system for intraoperative SPECT that is part of an AR system, which
will be described later on.

19.2.2 Segmentation and Modeling


Following their acquisition, the datasets can be quite large, making them challenging
to manipulate in real time, especially when not the entire extent of the data is required
for a specific application. A common approach is to segment the information of interest and generate models that provide interactive visualization in the region of interest.
A large number of approaches to image segmentation have been developed. These can
be divided into low-level and model-based approaches. The most common low-level
segmentation approach is thresholding, which is readily applicable for segmenting
bone structures in CT. Often combinations of multiple low-level operations are used for
specific segmentation tasks, but these are not generalizable. Model-based approaches
have been more successful and generalizable. Among others these include deformable
models, level set based segmentation, statistical shape and appearance models, and
more recently statistical methods for segmenting ensembles of organs or structures
using models of organ variation and the spatial relationships between structures.
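As a concrete illustration of the simplest of these approaches, the thresholding mentioned above, segmenting bone from a CT volume can be written in a few lines; the 300 HU threshold is an assumed, illustrative value, and the volume is taken to be a NumPy array in Hounsfield units.

import numpy as np

def segment_bone(ct_volume_hu, threshold_hu=300):
    # low-level segmentation by thresholding: voxels at or above the threshold
    # are labeled as bone; returns a boolean mask of the same shape as the volume
    return np.asarray(ct_volume_hu) >= threshold_hu

The resulting mask would typically be converted to a surface model (for example, via marching cubes) before being used for visualization or planning.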

19.2.3 Instrument Localization and Tracking


Considering the surgical scene is not directly visible during most minimally invasive
interventions, the position and orientation of the surgical instrument with respect
to the anatomy must be precisely known at all times. To address this requirement,
spatial localizers have become an integral part of image-guided navigation platforms, enabling the tracking of all interventional tools within the same coordinate
system attached to the patient. The common tracking technologies found in the medical setting are either optical or electromagnetic. These are standard devices with
well-understood advantages and limitations. Broadly speaking, optical systems are
highly accurate but require an unobstructed line-of-sight between the cameras and
the markers. As a result, these systems are appropriate for tracking tools outside the
body or rigid tools with the markers mounted outside the body. Electromagnetic
tracking systems are accurate and can track both rigid and flexible tools inside and
outside the body. However, these systems are susceptible to measurement distortions
in the presence of ferromagnetic materials. A less common tracking approach that is
gaining traction is to use the medical images, endoscopic video or US, to track tools
and anatomical structures in real time. Currently, this approach is still limited to
prototype research systems.
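To make the role of the localizer concrete, the sketch below shows how a tracked tool tip is mapped into the coordinate system of the preoperative images by chaining homogeneous transforms; the 4 × 4 matrices and the tip offset are hypothetical placeholders for quantities reported by the tracker and obtained from calibration and registration.

import numpy as np

def tool_tip_in_image(T_image_from_tracker, T_tracker_from_tool, tip_in_tool):
    # p_image = T_image_from_tracker @ T_tracker_from_tool @ p_tool (homogeneous)
    p_tool = np.append(np.asarray(tip_in_tool, dtype=float), 1.0)
    p_image = T_image_from_tracker @ T_tracker_from_tool @ p_tool
    return p_image[:3]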

19.2.4 Registration and Data Fusion


Registration is the enabling technology for spatial data integration and it plays a vital
role in aligning images, features, and/or models with each other, as well as establishing the relationship between the virtual environment (pre- and intraoperative images
and surgical tool representations), and the physical patient. The most common registration approach found in the clinical setting is rigid registration using a paired-point
approach. This is the only registration method that has an analytic solution. In some
cases, point identification and pairing is performed automatically, and in others manually. Nonrigid registration algorithms are also in use, but they all require some form of
initialization. While we most often judge registration algorithms solely based on their
accuracy, in the clinical setting one must also take into account the computation time,
robustness, and amount of user interaction. The combination of these constraints determines if an algorithm is truly applicable in a specific setting. As an example, a 2mm
registration error may be sufficiently accurate for one procedure, but not for another.
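The analytic solution mentioned above is commonly computed with a singular value decomposition of the cross-covariance of the paired points (a standard least-squares formulation; the chapter does not prescribe a particular algorithm). A compact sketch:

import numpy as np

def paired_point_rigid_registration(moving, fixed):
    # least-squares rigid transform (R, t) such that R @ moving[i] + t ~ fixed[i];
    # moving and fixed are N x 3 arrays of corresponding points
    mc, fc = moving.mean(axis=0), fixed.mean(axis=0)
    H = (moving - mc).T @ (fixed - fc)                 # cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # avoid a reflection
    R = Vt.T @ D @ U.T
    t = fc - R @ mc
    return R, t

In practice, the residual distances between the transformed and fixed points (the fiducial registration error) would be reported alongside the transform.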


19.2.5 Data Visualization and Feedback Method


Once all images and segmented structures are available, along with the spatial relationships between anatomy and devices, this information needs to be displayed to
the clinician. Most often this is done visually, either using orthogonal multiplanar
display of the 3D images, surface renderings of segmented structures, volume rendering of the anatomy, or nonphotorealistic rendering of the objects; less common
options include tactile feedback or auditory feedback. The choice of feedback or
rendering method should not be based on subjective criteria, such as whether the
operator prefers a specific visualization, but rather on the operator's performance on
the task at hand given the specific feedback mechanism.

19.3 GUIDED TOUR OF AR IN THE OPERATING ROOM


19.3.1 Understanding the OR Requirements
Developing an AR guidance system for the operating room is a challenging task.
In this high-risk setting, guidance systems are expected to exhibit robust behavior
with intuitive operation and without interfering with the correct functionality of multiple other devices that are concurrently in use. One such example is the performance
and effect of using infrared-based optical tracking, the most common tracking
approach currently found in the clinical setting. It is well known that surgical lights
and lights used by operating microscopes can reduce tracking accuracy (Langlotz
2004); on the other hand, the infrared light emitted by the tracking device has been
shown to affect the measurements recorded by pulse oximeters (Mathes et al. 2008).
As a consequence, when developing an AR system for the clinical setting, a holistic
approach should be taken to identify all potential effects of the AR system on other
equipment and vice versa. Finally, given the highly regulated safety domain, transition into clinical care requires adherence to the development practices specified by
regulatory agencies (e.g., U.S. Food and Drug Administration, Health Canada, etc.).
Most likely these challenges are in part the reason that there is a significant gap
between the number of systems developed in the laboratory setting and those that are
able to transition into clinical care, a gap that is often referred to as the "valley of
death" (Coller and Califf 2009). It is much harder to get a system into the clinic than
to develop a prototype in the laboratory and evaluate it on phantoms, animal models,
or a small number of patients. We thus divide our tour of AR in the OR into two parts:
systems that are commercially available and systems that are laboratory prototypes.

19.3.2 Commercially Available Systems


19.3.2.1 Augmented X-Ray Guidance
One of the first commercially available navigation systems was for all intents and
purposes an AR system. The system augmented fluoroscopic x-ray images, displayed on a standard screen, by projecting iconic representations of tracked tools, as
shown in Figure 19.1. Among others, these systems were available as ION Fluoronav
StealthStation (Medtronic, USA), and Fluologics (Praxim, France).

FIGURE 19.1 Virtual fluoroscopy system interface. Projections of the tracked tool and its virtual extension are dynamically overlaid onto the intraoperative x-ray images.

This form of augmentation was originally introduced in orthopedics as virtual fluoroscopy (Foley et al. 2001). Prior to the introduction of this system, the location of a tool relative to
the anatomy was performed by acquiring multiple x-ray images from different poses
that the physician used to create a 3D mental map. To monitor dynamic processes, for
example, a drill trajectory, imaging was repeated multiple times at discrete intervals.
As a result, the patient and medical staff are exposed to a nonnegligible amount of
ionizing radiation, with accuracy dependent upon the physician's skill at interpreting
the images. As such, virtual fluoroscopy systems mimicked the standard practice
without the need for repetitive imaging and provided continuous updates by calibrating the x-ray device and tracking it and all tools. Instead of acquiring images at
multiple discrete intervals, the same set of images is used to provide continuous feedback. For each x-ray image the projection parameters and imager pose are known.
The tools are continuously tracked using an optical system and their representations
are then projected onto the x-ray images, similar to continuous x-ray fluoroscopy.
It should be noted that the end result is still highly dependent on the physician's ability to interpret these 2D projection images. These systems were shown to provide
more accurate results than traditional fluoroscopy while reducing radiation exposure
(Merloz et al. 2007). Unfortunately, they also resulted in longer operating times, and,
since their introduction, they have fallen out of favor, with physicians shifting toward
other forms of navigation guidance (Mavrogenis et al. 2013).
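The core geometric operation of virtual fluoroscopy, re-projecting the tracked tool into a previously acquired x-ray image whose projection parameters are known, can be sketched as a pinhole projection; the 3 × 4 projection matrix P and the tool points are hypothetical stand-ins for the calibrated quantities.

import numpy as np

def project_tool(P, tool_points_3d):
    # project 3D tool points (e.g., tip and virtual extension) into a calibrated
    # x-ray image using its 3 x 4 projection matrix P; returns N x 2 pixel coordinates
    pts = np.hstack([tool_points_3d, np.ones((len(tool_points_3d), 1))])
    proj = pts @ P.T
    return proj[:, :2] / proj[:, 2:3]

The overlay is then redrawn on the stored x-ray frame each time the tracker reports a new tool pose, which is what provides continuous feedback without additional radiation.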
19.3.2.2 Augmented Ultrasound Guidance
A similar approach has been recently used in the context of US guidance for procedures requiring needle insertion. These systems augment the US video stream,
displayed on a standard screen, with iconic representations of the needle. The challenges addressed stem from the nature of 2D US. This tomographic modality limits
the needle insertion to the plane defined by the image. In addition, identifying the
location of the needle tip in the image is not trivial due to the noisy nature of US
images. AR guidance systems are thought to improve in-plane needle insertion
accuracy and to enable out-of-plane insertions by augmenting the US image. When
working in-plane, the needle location and a virtual extension are overlaid onto
the image. When working out-of-plane, a graphic denoting the intersection of the
needle trajectory with the imaging plane is overlaid onto the image. These systems
use electromagnetic tracking to localize the US probe and customized needles with
embedded sensors. Among others, this class of systems is available as the SonixGPS

FIGURE 19.2 Needle guidance system from InnerOptic Technology. The system provides guidance for insertion of an electromagnetically tracked ablation probe into a tumor using a stereoscopic display. (a) Physical setup and endoscopic view: (A) US probe, (B) ablation needle. (b) Augmented US video stream with the needle, its virtual extension, and the ablation region. (Photo courtesy of S. Razzaque, InnerOptic Technology, Hillsborough, NC.)

These systems have been used to guide lesion biopsy procedures (Hakime et al. 2012), tumor ablations (Hildebrand et al. 2007), liver resection (Kleemann et al. 2006), and regional anesthesia (Umbarje et al. 2013; Wong et al. 2013). A limiting factor, however, is that these systems require customized needles with embedded sensors or external adaptors rigidly attached to the needle, as used in the UltraGuide system. To address this issue, the Clear Guide ONE system (Clear Guide Medical, USA)* uses a similar augmentation approach but does not require customized needles; instead, it uses a device mounted directly onto the US probe, which includes a structured light system to track the needle and a combination of optical and inertial sensors for tracking the US probe (Stolka et al. 2011).

* Not yet approved for clinical use.
Finally, a different approach to augmenting the user's perception is provided by the AIM and InVision systems (InnerOptic Technology, USA). These use electromagnetic and optical tracking, respectively, with augmentation provided via a stereoscopic display and a realistic representation of the needle. The operator is required to wear polarized glasses to perceive a full 3D scene. These systems have been clinically used for delivering ablative therapy (Sindram et al. 2010, 2011). Figure 19.2 shows the physical setup and augmented guidance view.
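For the out-of-plane case described above, the overlaid graphic is essentially the intersection of the tracked needle axis with the tracked image plane. A minimal sketch of that computation follows (all tracked poses are hypothetical; this is an illustration of the geometry, not any vendor's code):

import numpy as np

def needle_plane_intersection(needle_tip, needle_dir, plane_point, plane_normal):
    # Intersect the needle axis (point plus direction) with the US image plane.
    # All quantities are 3-vectors expressed in the tracker coordinate frame.
    # Returns the 3D intersection point, or None if the needle is parallel to the plane.
    needle_dir = needle_dir / np.linalg.norm(needle_dir)
    denom = np.dot(plane_normal, needle_dir)
    if abs(denom) < 1e-9:                  # needle (nearly) parallel to the image plane
        return None
    t = np.dot(plane_normal, plane_point - needle_tip) / denom
    return needle_tip + t * needle_dir

# Hypothetical tracked poses, for illustration only (units in mm).
hit = needle_plane_intersection(needle_tip=np.array([0.0, 20.0, -30.0]),
                                needle_dir=np.array([0.0, -0.5, 1.0]),
                                plane_point=np.array([0.0, 0.0, 0.0]),
                                plane_normal=np.array([0.0, 0.0, 1.0]))
print(hit)   # this point would then be converted to US image pixel coordinates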
19.3.2.3 Augmented Video and SPECT Guidance
The systems described earlier are used to directly guide the intervention with
AR being the key technical component. The declipseSPECT (SurgicEye GmbH, Germany) system uses AR to guide data acquisition and clinical decision making, although it is not the system's key novel aspect. The clinical context is the use of
nuclear medicine in the surgical setting. A radiopharmaceutical tracer (i.e., a ligand linked to a gamma-ray-emitting substance) is injected into the bloodstream and binds to tumor cells more readily than to normal cells, increasing the concentration of radiation-emitting substance in the tumor. A gamma probe is then used to localize the region exhibiting higher radiation counts. Previously, localization was done in a qualitative manner, with gamma probes used to evaluate the presence of radioactive material via continuous auditory and numerical feedback (Povoski et al. 2009), providing limited knowledge about the spatial distribution of the tracer. The novelty of the declipseSPECT system consists of the reconstruction of a 3D volume based on the concentration of radioactive tracer measured with an optically tracked, handheld gamma probe. Hence, the system does the job of a SPECT imaging system using a handheld device rather than the large diagnostic SPECT machine that cannot be used in the operating room setting. To provide AR guidance, the system combines a video camera with a commercial optical tracking system. This combination device is calibrated during assembly. Once calibrated, any spatial information known in the tracking system's coordinate frame is readily projected onto the video.
As the quality of the 3D SPECT reconstruction is highly dependent on the data gathered with the tracked gamma probe, the system displays the amount of information acquired in the region of interest overlaid onto the video stream of the patient. This information guides the surgeon to place the gamma probe at locations that have insufficient measurements. Once the SPECT volume is computed, its projection is overlaid onto the video of the patient, as shown in Figure 19.3. This overlay guides the surgeon during biopsy or resection of the tumor (Navab et al. 2012). The system was initially evaluated in a study including 85 patients undergoing lymph node biopsy after being diagnosed with invasive breast cancer (Wendler et al. 2010). The study identified the correlation between the quality of the 3D reconstruction and the quality of data acquisition, emphasizing that better acquisitions resulted in improved guidance.

FIGURE 19.3 Interface of the declipseSPECT system from SurgicEye GmbH. The system uses an optically tracked gamma probe (a) to obtain a SPECT volume that is overlaid onto the video stream (b). Computations are in a patient-centric coordinate system, a tracked reference frame that is attached to the patient's skin (c). (Photo courtesy of J. Traub, SurgicEye GmbH, München, Germany.)
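To give a feel for how a tracked gamma probe can be turned into a volumetric picture, the deliberately naive sketch below accumulates count readings into a voxel grid along the probe axis. It is only an illustration of the idea; the actual declipseSPECT reconstruction solves a proper tomographic inverse problem, and the grid size, weighting, and coordinates here are all made up.

import numpy as np

GRID_SHAPE = (64, 64, 64)      # voxels covering a hypothetical region of interest
VOXEL_SIZE = 2.0               # mm per voxel

def accumulate_reading(volume, probe_tip, probe_dir, counts, depth_mm=60.0, step_mm=2.0):
    # Deposit one gamma-count reading into the voxels along the probe axis,
    # with a crude weighting that decays with distance from the probe tip.
    probe_dir = probe_dir / np.linalg.norm(probe_dir)
    for d in np.arange(0.0, depth_mm, step_mm):
        p = probe_tip + d * probe_dir
        idx = np.floor(p / VOXEL_SIZE).astype(int)
        if np.all(idx >= 0) and np.all(idx < np.array(GRID_SHAPE)):
            volume[tuple(idx)] += counts / (1.0 + d)

volume = np.zeros(GRID_SHAPE)
# One hypothetical tracked reading: tip position (mm), pointing direction, measured counts.
accumulate_reading(volume, np.array([64.0, 64.0, 10.0]), np.array([0.0, 0.0, 1.0]), counts=350)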
Having described three forms of commercially available AR systems, we point
out that all three share an important characteristic: they do not involve registration. These systems are based on calibration of the imaging device (x-ray, US, video
camera) with respect to a tracking system. The augmented video consists of intraoperative imaging overlaid with additional data also acquired intraoperatively; the lack
of pre- to intraoperative registration is an important aspect in the clinical setting, as
the registration often disrupts and slows the clinical workflow, significantly reducing
the chances of user acceptance.
19.3.2.4 Augmented Endoscopic Video Guidance
A guidance system that does require registration of preoperative data is the Scopis
Hybrid Navigation system (SCOPIS Medical, Germany), developed in the context of
ear-nose-throat (ENT) surgery. This system uses a preoperative CT volume of the patient in which structures of interest, surgical planning information such as desired osteotomy lines, and targets are defined. Both the CT and a live endoscopic video stream are used to guide the surgeon (Winne et al. 2011). Intraoperatively, the CT is registered to the patient using rigid registration based on corresponding points identified in the CT and on the patient. The calibrated endoscope is then tracked using either optical or electromagnetic tracking, and the information defined preoperatively in the CT coordinate system is overlaid onto the video stream. To strike a balance between overlaying information onto the image and obscuring the anatomy, the system overlays information in a minimalistic manner, using outlines as opposed to solid semitransparent objects. An interesting feature of endoscopic images is that they exhibit severe barrel distortion. This is a feature associated with the lenses used in endoscopes, which are required to provide a wide-angle view. Thus, to augment endoscopic video, one must either correct the distorted images or distort the projected data prior to merging the two information sources. The approach adopted by the Scopis system is to distort the projected data, an approach consistent with the understanding that clinicians prefer to view the original, unprocessed images, and one that is also more computationally efficient.
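Distorting the projected data rather than undistorting the video can be done with a standard radial (Brown) lens model. The sketch below applies such a model to ideal (undistorted) overlay points; the intrinsics and distortion coefficients are invented for illustration and are not those of any particular endoscope or of the Scopis system.

import numpy as np

def distort_points(points_px, fx, fy, cx, cy, k1, k2):
    # Apply a radial (Brown) distortion model to undistorted pixel coordinates
    # so that overlay points line up with the raw, barrel-distorted endoscopic image.
    x = (points_px[:, 0] - cx) / fx          # normalized image coordinates
    y = (points_px[:, 1] - cy) / fy
    r2 = x * x + y * y
    factor = 1.0 + k1 * r2 + k2 * r2 * r2    # radial distortion factor
    xd, yd = x * factor, y * factor
    return np.stack([xd * fx + cx, yd * fy + cy], axis=1)

# Hypothetical intrinsics and coefficients (k1 < 0 gives barrel distortion).
pts = np.array([[100.0, 120.0], [400.0, 300.0]])
print(distort_points(pts, fx=500.0, fy=500.0, cx=320.0, cy=240.0, k1=-0.3, k2=0.08))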
All commercially available AR systems described so far use standard or stereoscopic 2D screens to display the information, most likely due to the availability of screens in the clinical setting and the familiarity of clinicians with this mode of interaction. Stepping into the modern-day interventional setting, one immediately observes that the medical staff focuses on various screens, while almost no one looks directly at the patient. This approach is not ideal, as it poses a challenge for hand-eye coordination, a known issue both in US-guided needle insertion (Adhikary et al. 2013) and in laparoscopic surgery (Wilson et al. 2010), and one that is tentatively addressed by some of the AR systems described in the following sections.
19.3.2.5 Augmented Tactile Feedback
An AR domain often overlooked is haptic AR, which uses tactile feedback to augment the user's performance. In orthopedics, the concept of augmenting an operator's tactile experience was introduced in the context of bone milling for partial or total knee replacement using cooperative control of a robotic device. The original robots included the Acrobot (Acrobot Ltd, UK) and the RIO (MAKO Surgical Corp., USA),* both of which involved the surgeon manually moving a robotically held milling tool with the robot providing haptic feedback, allowing the operator to freely move the tool in certain locations and preventing motion in others (Lang et al. 2011; Rodriguez et al. 2005). The regions are defined in a planning phase and do not correspond to physical obstacles, but rather consist of bone regions where the robot should move freely and others where it should not. This form of defining virtual obstacles is referred to in the robotics literature as virtual fixtures (Rosenberg 1993). Using these constraints, the surgeon is able to perform milling through a small incision, executing the preoperative plan specified on a CT scan of the knee. This cooperative approach is better suited for clinical acceptance because the surgeons stay in control of the process while their accuracy is improved and variability is reduced. It should be noted that these systems require highly accurate registration between the robot and the bony structures, as no intraoperative imaging is used during the workflow.

* The Acrobot and RIO systems are currently owned by Stryker Corp.
Registration is performed by attaching a pointer tool with known geometry to the robot and digitizing points on the bone surface. These points are registered to the bone surface defined in the CT volume. To maintain a valid registration throughout the procedure, the anatomy is either immobilized or instrumented with optically tracked reference markers. Initial clinical use appears promising (Lonner et al. 2010), with more accurate and less variable implant placements achieved with the robotic device. While the technical goal of these systems may have been achieved, it is yet to be determined if they make a clinical difference. Will the improved accuracy and reduced variability improve implant function and longevity? The requirement for almost perfect registration has been identified as a key issue with the use of the RIO system, and, along with robot positioning, these constraints are cited as the main reasons for longer surgery times (Banger et al. 2013).
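The virtual fixture idea can be illustrated with a very small sketch: the surgeon's commanded motion is left untouched inside the allowed milling region, and its outward component is removed at the boundary. This is only a cartoon of the general concept, with an invented spherical region; it is not the control law of the Acrobot or RIO systems.

import numpy as np

ALLOWED_CENTER = np.array([0.0, 0.0, 0.0])   # hypothetical allowed region (registered CT frame, mm)
ALLOWED_RADIUS = 15.0

def constrain_motion(tip_position, commanded_velocity, dt=0.001):
    # Forbidden-region virtual fixture: remove the velocity component that would
    # push the tool tip out of the allowed region; leave motion inside it free.
    next_tip = tip_position + commanded_velocity * dt
    offset = next_tip - ALLOWED_CENTER
    if np.linalg.norm(offset) <= ALLOWED_RADIUS:
        return commanded_velocity                        # free motion inside the region
    outward = offset / np.linalg.norm(offset)            # direction leading out of the region
    outward_speed = np.dot(commanded_velocity, outward)
    if outward_speed <= 0.0:
        return commanded_velocity                        # already heading back inside
    return commanded_velocity - outward_speed * outward  # keep only the tangential part

# Tool at the boundary, pushed outward and sideways: only the sideways motion survives.
print(constrain_motion(np.array([15.0, 0.0, 0.0]), np.array([5.0, 2.0, 0.0])))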

19.3.3 Laboratory Prototypes
The number of laboratory AR systems is exceedingly large as compared to the few
that have made it into the commercial domain. The following sections provide brief
descriptions of such systems. Most notably, all of these prototypes, except one, are
based on visual augmentation and are therefore divided into subcategories based
on their display technology: a screen, a medical binocular system, a head-mounted
display (HMD), a half-silvered mirror display, or direct projection onto the patient.

We start with an exception: a virtual reality system for guiding endoscopic skull base surgery that also incorporates auditory augmentation (Dixon et al. 2014). This system provides auditory cues in addition to virtual reality views of the underlying anatomy. Critical anatomical structures are segmented from an intraoperative CBCT data set that is rigidly registered to the patient. Structure- and distance-dependent auditory cues are used to alert the operator of proximity to specific critical structures. The system was evaluated by seven surgeons, who preferred that the auditory alarms be customizable, with the option of turning them off once the location of the critical structures had been identified using the endoscopic or virtual reality views. This speaks to the need for context awareness in these systems, as discussed in Section 19.4.
In this tour, similar systems are grouped and presented in chronological order according to their development, with the relevant publications cited for each group.
19.3.3.1 Screen-Based Display
Stepping into a modern operating room, one immediately notices multiple screens
displaying various pre- and intraoperative data. The second observation is that most
of the time the clinical staff is looking at the screens and not directly at the patient.
Thus, introducing an additional screen is clinically acceptable, possibly explaining
why this display choice is the most common approach used by AR systems.
It should be noted that this tour does not include systems that utilize a separate-screen approach: one screen showing a virtual reality scene depicting the underlying spatial relationships between surgical tools and anatomy, with additional screen(s) showing the physical scene via intraoperative imaging (e.g., endoscopic or US video streams; Ellsmere et al. 2004). This approach relies on the operator to integrate the information from the virtual and real worlds and thus should be classified not as an AR system but as a VR system used in conjunction with real-time imaging; it will not be discussed further.
One of the first image-guided navigation systems for neurosurgical procedures was the system presented in Grimson et al. (1996). This system augmented video from a standard camera with structures segmented from preoperative CT or MR, rigidly registered to the intraoperative surgical scene. The surgeon physically traced contours of the relevant structures onto the patient's scalp using the augmented video. The system did not include tracking of the patient or tools, assuming that the patient was stationary until all structures were physically marked.
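Many of the systems in this chapter report a rigid registration of preoperative data to the patient. When corresponding point pairs are available (digitized landmarks or fiducials), the standard closed-form least-squares solution uses the singular value decomposition (the Arun/Horn method). The sketch below is a generic illustration of that computation, not the specific registration used by any one system; the landmark coordinates are invented.

import numpy as np

def rigid_register(points_image, points_patient):
    # Closed-form least-squares rigid registration of paired points (Arun/Horn).
    # Returns (R, t) such that R @ p_image + t best matches p_patient.
    c_img = points_image.mean(axis=0)
    c_pat = points_patient.mean(axis=0)
    H = (points_image - c_img).T @ (points_patient - c_pat)   # cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.linalg.det(Vt.T @ U.T)])        # guard against reflections
    R = Vt.T @ D @ U.T
    t = c_pat - R @ c_img
    return R, t

# Hypothetical landmark pairs: the patient points are a rotated, translated copy.
img = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0], [0.0, 10.0, 0.0], [0.0, 0.0, 10.0]])
R_true = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
pat = img @ R_true.T + np.array([5.0, 2.0, -3.0])
R, t = rigid_register(img, pat)
print(np.round(R, 3), np.round(t, 3))   # recovers R_true and the translation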
A related system for liver tumor ablation was described in Nicolau et al. (2009). This system overlaid models of the liver segmented from CT onto standard video. Rigid registration was used to align the CT acquired at end-exhalation with the physical patient. As the liver moves and deforms with respiration, the registration and guidance information were only valid, and therefore only used, at end-exhalation. In this system, the distal end of the ablation needle was optically tracked using a printed marker, assuming a rigid needle with a fixed transform between the needle's tracked distal end and its tip. Following clinical evaluation, it was determined that the needle deflection was not negligible in 25% of the cases, an issue that can be mitigated by using electromagnetic tracking and embedding the tracked sensor close to the needle's tip.


A system that overlays a rendering of the tumor volume onto standard video without the need for registration was described in Sato et al. (1998). This system was developed in the context of breast tumor resection. A standard video camera and a 2D US probe were optically tracked, and a 3D US volume was acquired with the tracked US, therefore providing information on the tumor location relative to the tracked camera. The system assumes a stationary tumor, which is achieved by suspending the patient's respiration for a short period.
A more recent system aiming to improve puncture site selection for needle insertion in epidural anesthesia was described in Al-Deen Ashab et al. (2013). Using an optically tracked US probe, the system creates a panoramic image of the spinal column and identifies the vertebral levels. The optical tracking system also provides a live video stream and the calibration parameters for the video camera, which are used in conjunction with the known 3D locations of the vertebrae from US to overlay corresponding lines onto the video of the patient's back without the need for registration. The overlay is iconic, with lines denoting each vertebra's location and anatomical name.
Finally, a system that uses AR to guide the positioning of a robotic device was described in Joskowicz et al. (2006). This system was used in the context of neurosurgery and relied on a preoperative CT or MR of the head that was rigidly registered relative to a camera located in a fixed position. The location of the desired robot mounting is projected onto the camera's video stream, allowing the user to position the physical mount accordingly.
Several groups have proposed the use of a tracked standard video camera to guide maxillofacial and neurosurgery procedures (Kockro et al. 2009; Mischkowski et al. 2006; Pandya et al. 2005; Weber et al. 2003) and, more recently, for guidance of kidney stone removal (Müller et al. 2013). Tracking of the camera pose is performed using either a commercial optical tracking system (Kockro et al. 2009; Mischkowski et al. 2006; Weber et al. 2003), a mechanical arm (Pandya et al. 2005), or directly from the video stream using pose estimation (Müller et al. 2013). The camera video feed is then augmented with projections of anatomical structures segmented from preoperative CT scans and rigidly registered to the intraoperative patient. The systems described in Weber et al. (2003) and Mischkowski et al. (2006) mount the camera on the back of a small handheld LCD screen, which is equivalent to the use of a tablet with a built-in camera described in Müller et al. (2013). In these systems, the operator positions the screen between themselves and the patient as if looking through a window, providing an intrinsic rough alignment of the operator and camera viewing directions and ensuring that the augmented information overlaid onto the visible surface is in the correct location relative to the operator's viewing angle.
In the system described in Kockro et al. (2009), the tracked camera is mounted inside a surgical pointer and the augmented views are displayed on a standard screen. As a consequence, the camera's and surgeon's viewing directions may differ significantly, resulting in erroneous perception of the location of the overlaid structures on the skin surface, as illustrated in Figure 19.4. This effect became apparent when surgeons attempted to use a pen to physically mark the structure locations on the patient's skin, a similar issue to that arising when projecting the underlying anatomical structures directly onto the patient (Section 19.3.3.4).
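The size of this misperception can be approximated with simple geometry: if a structure lies at a depth d below the skin and its outline is drawn on the skin along the camera's viewing direction, an observer whose viewing direction differs by an angle theta sees the mark displaced by roughly d * tan(theta) from where the structure appears to lie. The numbers below are assumed, purely for illustration.

import math

def apparent_offset_mm(depth_mm, angle_deg):
    # Approximate lateral offset between a mark projected onto the skin and the
    # structure's apparent position for an observer viewing at angle_deg off-axis.
    return depth_mm * math.tan(math.radians(angle_deg))

# A structure 40 mm deep viewed 20 degrees off the projection axis appears ~14.6 mm off.
print(round(apparent_offset_mm(40.0, 20.0), 1))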


FIGURE 19.4 When the surgeon's viewing direction and the camera/projector's viewing direction differ significantly, projecting an internal structure onto the skin surface directly, or onto its picture, provides misleading guidance.

The following group of systems augments endoscopic video with additional information acquired either pre- or intraoperatively. Before proceeding, it is worthwhile
noting a unique characteristic of all endoscopic systems: they exhibit severe radial
distortions. This is due to their use of fish-eye lenses, a constraint imposed by the
need to have a wide field of view while minimizing the size of the tool inserted into
the body, therefore requiring correction of the distorted video images or distortion
of the augmented information before fusion.
An early example of augmented endoscopy was described in Freysinger et al. (1997). This system was developed in the context of ENT surgery. The endoscope's path is specified on a preoperative CT or MR. Both the patient and the endoscope are localized using electromagnetic tracking. The preoperative image is rigidly registered to the patient, and an iconic representation of the desired endoscope path, in the form of rectangles, is overlaid onto the video.
A similar system developed for guiding resection of pituitary tumors was described in Kawamata et al. (2002). Critical structures such as the carotid arteries and optic nerves are segmented preoperatively in MR or CT. The patient and endoscope are optically tracked and, following rigid registration, all of the segmented structures are projected as wireframes onto the endoscopic image. It should be noted that this form of augmentation all but occluded the content of the endoscopic image.
A more recent development is the system for lymph node biopsy guidance presented in Higgins et al. (2008). This system ensures the physician obtains the biopsy from the correct location, which is not visible on the endoscopic video, by overlaying the segmented lymph node identified in preoperative CT onto the endoscopic view. A unique feature of this system is that it does not use external tracking devices; instead, the location of the endoscope is estimated by rigidly registering a virtual view generated from CT to the endoscopic video. Comparison between the real and synthetic images is performed using an information theory-based metric.
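Information theory-based image similarity metrics of this kind are commonly some form of mutual information computed from the joint intensity histogram of the real and rendered images. The sketch below shows one such measure; it is a generic illustration, not necessarily the exact metric used by Higgins et al. (2008).

import numpy as np

def mutual_information(img_a, img_b, bins=32):
    # Mutual information between two equally sized grayscale images,
    # estimated from their joint intensity histogram.
    joint, _, _ = np.histogram2d(img_a.ravel(), img_b.ravel(), bins=bins)
    pxy = joint / joint.sum()                     # joint probability table
    px = pxy.sum(axis=1, keepdims=True)           # marginal of image A
    py = pxy.sum(axis=0, keepdims=True)           # marginal of image B
    nonzero = pxy > 0
    return float(np.sum(pxy[nonzero] * np.log(pxy[nonzero] / (px @ py)[nonzero])))

# An image compared with itself scores higher than against an unrelated image.
rng = np.random.default_rng(0)
a = rng.random((64, 64))
print(mutual_information(a, a) > mutual_information(a, rng.random((64, 64))))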
Having identified intraoperative registration of soft tissue organs as a complex
challenge, some groups have decided to sidestep this hurdle. Instead of using data
from preoperative volumetric images, these groups acquire volumetric or tomographic data intraoperatively in a manner that ensures that the pose of the endoscope
is known with respect to the intraoperative coordinate system. One such system
developed in the context of liver surgery was described in Konishi et al. (2007). The system uses a combined magneto-optical tracking system to track a US probe and the endoscope. The tracked 2D US is used to acquire a volume intraoperatively when the patient is at end-exhalation. Vessels and tumor locations are volume rendered from these data and overlaid onto the endoscopic video. No registration is required, as the relative pose of the US and endoscope is known in the external tracking system's coordinate frame.
Another system targeting liver tumor resection was described in Feuerstein et al. (2008). That system employed an optical tracking system, with intraoperative volumetric imaging acquired using a CBCT machine. A contrast-enhanced CBCT image is acquired before resection with the patient at end-exhalation. The structures visible in CBCT are then overlaid onto the endoscopic video so that the surgeon is aware of the location of the blood vessels and bile ducts that go into and out of the liver segment of interest. Augmentation is only valid during the end-exhalation respiratory phase and prior to the beginning of resection.
A system that is not limited to guidance during end-exhalation was described in Shekhar et al. (2010). Localization is done with an optical tracking system, and intraoperative volumetric images are acquired with a CT machine whose pose relative to the tracking system is determined via calibration. To enable augmentation throughout the respiratory cycle, an initial high-quality contrast-enhanced volume is acquired. This volume is then nonrigidly registered to subsequently acquired low-dose CTs. The deformed structures are overlaid onto the endoscopic image. The main issue with this approach is the increased radiation exposure, making it less likely to be adopted clinically. This approach later evolved into the use of optically tracked 2D US that is continuously fused with live tracked stereo endoscopy (Kang et al. 2014). This system does not require registration and implicitly accommodates organ motion and deformation.
While the majority of endoscopy-based AR systems use commercial tracking devices, the system described in Teber et al. (2009) and Simpfendörfer et al. (2011) is unique, in that it uses the medical images and implanted markers to perform endoscopic tracking. Note that this approach is invasive, as the markers are implanted into the organ of interest under standard endoscopic guidance. Retrieval of the markers is also an issue, unless the procedure entails resection of tissue and the markers are implanted in the gross region that will be removed during the procedure. An additional issue with this approach is that it assumes that the organ does not deform, limiting its applicability to a subset of soft tissue interventions. The system was utilized in radical prostatectomy and partial kidney resection. In the former application, a 3D surface model obtained from intraoperative 3D transrectal US was overlaid onto the video images, while in the latter, anatomical models were obtained from intraoperative CBCT data.
While virtual fluoroscopy as a commercial product has somewhat fallen out of favor in the clinic, variants of this approach that provide augmented x-ray views of registered anatomical structures have continued to be developed in research labs. In the context of orthopedic surgery, an AR system was developed to aid in the reduction of long bone fractures (Zheng et al. 2008). This system registers cylindrical models to the two fragments of the long bone using two x-ray images. Each bone segment is optically tracked using rigidly implanted reference frames. As the surgeon manipulates the bone fragments, the image is modified based on the models without the need for additional x-rays. In the context of cardiac procedures, a variation on the virtual fluoroscopy approach is used, with preoperative data from MRI augmenting live fluoroscopy (De Buck et al. 2005; Dori et al. 2011; George et al. 2011). These procedures are traditionally guided using fluoroscopy as the imaging modality, which only enables clear catheter visualization, with the heart only clearly visible under contrast agent administration. As contrast can cause acute renal failure, the clinician has to strike a balance between the need to observe the spatial relationships between catheters and anatomy and the amount of administered contrast. Both the systems described in De Buck et al. (2005) and Dori et al. (2011) rigidly register the MRI data to contrast-enhanced x-rays using manual alignment. While the former uses a segmented model, the latter uses a volume rendering of the MRI dataset. A system using automatic fiducial-based rigid registration is described in George et al. (2011), with the x-ray images augmented with a segmented model from MRI. It should be noted that all these systems use rigid registration, with the augmented data being valid only at a specific point of the cardiac and respiratory cycles. Thus, live fluoroscopy remains in clinical demand, as it reflects the dynamic physical reality, while the additional augmented information is not necessarily synchronized with it.
Finally, a recent method for initializing 2D/3D registration based on augmentation of fluoroscopic images was described in Gong et al. (2013). An anatomical model obtained from preoperative CT or MR is virtually attached to an optically tracked pointer. The operator uses the pointer to control the overlay of the model onto intraoperative fluoroscopic images of the anatomy. Overlay is performed using the projection parameters of the x-ray device. When the model is visually aligned with the fluoroscopic images, it is physically in the same location as the anatomical structure. Once alignment is visually determined, the pose of the tracked pointer yields the desired transformation. Figure 19.5 shows this system in the laboratory setting.

FIGURE 19.5 The operator uses an (a) optically tracked (b) probe tool to align a volume of the spine to (c) multiple x-ray images. The spine model is virtually attached to the tracked probe.
While all of the systems described earlier augmented fluoroscopic x-ray images, they involved cumbersome user interactions or required the use of additional hardware components such as tracking systems. This is less desirable in the intraoperative environment, where space is often physically limited and time is a precious commodity. An elegant AR system combining video and x-ray fluoroscopy in a clinically appropriate form factor was described in Navab et al. (2010) and Chen et al. (2013). This system is based on a modified portable fluoroscopy unit, a C-arm. A video camera and a double mirror system are attached to the C-arm and calibrated such that the optical axis of the video camera is aligned with that of the x-ray machine. Following this one-time calibration, the x-ray images are overlaid onto the video, placing them in the physical context. Thus, the exact location of underlying anatomy that is readily visible in the x-ray is also visible with respect to the skin surface seen in the video.
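Once the optical axes are aligned and the two images share a pixel grid, the augmentation itself reduces to blending them. A minimal sketch (illustrative only, not the published system's code):

import numpy as np

def blend_xray_onto_video(video_rgb, xray_gray, alpha=0.4):
    # Alpha-blend a co-registered grayscale x-ray image onto a color video frame.
    # video_rgb: HxWx3 array in [0, 1]; xray_gray: HxW array in [0, 1], already
    # resampled onto the video pixel grid by the one-time calibration.
    xray_rgb = np.repeat(xray_gray[:, :, None], 3, axis=2)
    return (1.0 - alpha) * video_rgb + alpha * xray_rgb

video = np.zeros((480, 640, 3))          # hypothetical dark video frame
xray = np.ones((480, 640)) * 0.8         # hypothetical bright x-ray content
print(blend_xray_onto_video(video, xray).max())   # 0.32 with the default alpha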
The last system in this category is a model-enhanced, US-assisted guidance system for cardiac interventions. The system integrates electromagnetically tracked transesophageal US imaging, providing real-time visualization, with augmented models of the cardiac anatomy segmented from preoperative CT or MRI and virtual representations of the electromagnetically tracked delivery instruments (Linte et al. 2008, 2010). The interface for these guidance systems is shown in Figure 19.6. The anatomical models were rigidly registered to the intraoperative setting. While this registration is not accurate enough for navigation based on the preoperative models alone, the models provide missing context while the intraoperative US provides the live navigation guidance. The system was later adapted to an augmented display for mitral valve repair to guide the positioning of a novel therapeutic device, the NeoChord, as described in Moore et al. (2013).
19.3.3.2 Binocular Display
Medical binocular systems are a natural vehicle for AR as they already provide a
mediated stereo view to the surgeon. Thus, the introduction of AR systems into
procedures utilizing these systems is more natural and has a smaller impact on the
existing workflow, increasing the chances of user adoption. Unlike the screen-based
display in which information is overlaid onto a single image, the systems in this category need to overlay the information on two views in a manner that creates correct
depth perception.


FIGURE 19.6 Augmented views from the US-assisted system for guidance of cardiac interventions: (a) view for guiding direct-access mitral valve implantation and septal defect repair and (b) view for guidance of transapical mitral valve repair.

One of the earlier systems in this category was the microscope-assisted guided interventions (MAGI) system (Edwards et al. 2000). This system was developed in the context of neurosurgery, augmenting the visible surface with semitransparent renderings of critical structures segmented in preoperative MR. Both the patient and the microscope are optically tracked, with the patient rigidly registered to the intraoperative setting. Unfortunately, the depth perception created by the overlay of semitransparent surfaces, even with highly accurate projection parameters and registration, was ambiguous and unstable (Johnson et al. 2003). Another system that can be described as a medical HMD is the Varioscope AR (Birkfellner et al. 2003). By adding a beam splitter to a clinical head-mounted binocular system, additional structures segmented from preoperative CT are injected into the surgeon's line of sight. The surgeon's head location, tools, and patient are all optically tracked. The preoperative data is rigidly registered to the patient, with the intended use being neurosurgery applications.
The da Vinci robotic surgical system is a commercial master-slave robotic system for soft tissue interventions (DiMaio et al. 2011). The surgeon controls the system using visual feedback provided by a stereo endoscope on the patient side, with the images viewed through a binocular system on the surgeon's side, as shown in Figure 19.7. A number of groups have proposed to augment the surgeon's view by overlaying semitransparent surface models onto the stereo images. Models are obtained from segmentation of CT and are rigidly registered to the intraoperative setting. Among others, these include overlay of a reconstructed coronary tree for cardiac procedures using a one-time manual adjustment of the original rigid registration without any further updates (Falk et al. 2005), overlay of a semitransparent kidney and collecting system for partial kidney resection with continuous rigid registration to account for motion (Su et al. 2009), overlay of blood vessels for liver tumor resection (Buchs et al. 2013) without any updates after the initial registration,* and overlay of the semitransparent facial nerve and cochlea for cochlear implant surgery (Liu et al. 2014), again without any updates after the initial registration.

* Note that the system was used on patients that were cirrhotic; the liver is rigid due to disease.

FIGURE 19.7 da Vinci optics: (a) stereo endoscope on the patient side and (b) binocular system on the surgeon side.
19.3.3.3 Head-Mounted Display
HMDs are in many ways equivalent to the medical binocular systems described earlier. They differ in several aspects: they are not part of the current clinical setup, they have historically been rather intrusive, and the augmentation is only visible to a single user. With the newer generation of devices such as Google Glass, Epson's Moverio BT-200, and the Oculus Rift, which are much less intrusive, systems utilizing such displays may possibly be revived in the clinical setting.
One of the pioneering clinical AR systems did use an HMD for in situ display of US images (Bajura et al. 1992), and a decade later an updated version of this system was used to provide guidance for breast biopsy (Rosenthal et al. 2002). In this system, the HMD, US probe, and biopsy needle are optically tracked, with no intraoperative registration required. This guidance concept later evolved into the AIM and InVision commercial systems described earlier, but with the display consisting of a stereoscopic monitor as opposed to an HMD.
A system for biopsy guidance using an optically tracked HMD and needle was described in Vogt et al. (2006) and Wacker et al. (2006). The system overlays information from preoperative or, less commonly, intraoperative MR onto the field of view, rigidly registered to the patient. A more recent, cost-effective AR system for guiding needle insertion in vertebroplasty was described in Abe et al. (2013). The system uses an outside-in optical tracking approach based on a low-cost web camera attached to the HMD. Registration is performed via visual alignment using x-ray imaging, with guidance provided via a needle trajectory planned on a preoperative CT.


19.3.3.4 Semitransparent (Half-Silvered Mirror) Display


Half-silvered mirror devices overlay information in the line of sight of the operator such that the operator perceives the augmented image in its correct spatial location. This approach allows the operator to maintain focus on the interventional site, improving hand-eye coordination.
One of the earlier examples of using AR in the interventional setting displayed CT data overlaid onto a physical phantom using a half-silvered mirror and a stereoscopic display that alternated images between the left and right eye with corresponding shutter glasses (Blackwell et al. 2000). The system was intended for educational purposes. The operator's head, the display, and the phantom were tracked using an optical tracking system, with the corresponding CT image of the phantom rigidly registered to its physical counterpart.
The Sonic flashlight is a system designed for guiding US-based needle interventions (Chang et al. 2002, 2005; Stetten and Chib 2001). A miniature display and a half-silvered mirror are mounted onto a standard 2D US probe. The operator looks directly at the insertion site and sees the tomographic information overlaid onto the anatomy in a natural manner. This enhances hand-eye coordination compared to a screen-based display, does not require registration or tracking, and is a minimal modification of the existing clinical setup. The system was successfully used by clinicians in two clinical trials (Amesur et al. 2009; Wang et al. 2009). Figure 19.8 shows the system in use.
A similar system, albeit at a larger scale, was described in Fichtinger et al. (2005) and Fischer et al. (2007). In this case, instead of attaching the screen and half-silvered mirror to a US probe, they were attached to CT and MR scanners. This approach provided a means for needle insertion guidance with the tomographic image being either CT or MR. It should be noted that these imaging modalities are primarily used for diagnosis and less for guidance of interventional procedures due to their costs, and, as a consequence, such a device is less likely to gain widespread clinical acceptance.

FIGURE 19.8 Sonic flashlight used in a cadaver study to place a needle in a deep vein in the arm. (a) Miniature display and (b) half-silvered mirror. (Photo courtesy of G. Stetten, University of Pittsburgh, Pittsburgh, PA.)
The Medarpa system, developed at the Fraunhofer Institute, Germany, overlays
CT data onto the patient, similar to the previous system, with the advantage that
the display is mobile (Khan et al. 2006). The intended use is needle guidance for
biopsy procedures. To display the relevant information, the head of the operator
and the mobile display are optically tracked, while the needle is electromagnetically tracked. The system requires patient registration and calibration of the two
tracking systems, so that the transformation between them is known. These steps
are less desirable as they increase the time a procedure takes and involve increased
cost and technological footprint in the interventional suite given the use of multiple
tracking systems.
Finally, a unique 3D autostereoscopic system in combination with a half-silvered mirror was described in the context of guidance of brain tumor interventions in an open MRI scanner (Liao et al. 2010) and guidance of dental interventions using CT data (Wang et al. 2014). This system is unique in that it provides the perception of 3D without the need for custom glasses. This is achieved by using integral photography, placing an array of lenses in front of the screen that creates the correct 3D perception from all viewing directions. In the MRI-based version of the system, the image volume was rigidly registered to the patient, and the integral videography device and tools were tracked optically using a commercial tracking system. In the CT-based version, the tracking and registration were customized for the procedure using an optical tracking system developed in-house, with the volumetric image registered to the patient using the visible dental structures.
19.3.3.5 Direct Patient Overlay Display
Directly overlaying information onto the patient became possible with the introduction of high-quality projection systems that have a sufficiently small form factor. Similar to the use of a half-silvered mirror, this approach facilitates better hand-eye coordination, as the surgeon's attention is not divided between a screen and the interventional site. An implicit assumption of a projection-based display is that there is a direct line of sight between the projector and the patient. In the physically crowded operating room such a setup is not always feasible, as the clinical staff and additional equipment will often block the projector's field of view. An additional issue with this display approach is that it can lead to incorrect perception of the location of deep-seated structures if the projection direction and the surgeon's viewing direction differ significantly, as illustrated in Figure 19.4.
One of the earlier projection-based augmentation approaches used a laser light and the persistence-of-vision effect to overlay the shape of a craniotomy onto a head phantom (Glossop et al. 2003). A CT volume was rigidly registered to the phantom using an optically tracked pointer, after which the craniotomy outline defined in the CT was projected as a set of points in quick succession, creating the perception of an outline overlaid onto the skull. Another system developed in the context of craniofacial surgery overlaid borehole geometry, bone-cutting trajectories, and tumor margins onto the patient's head (Marmulla et al. 2005; Wörn et al. 2005). A rigid registration was performed by matching the surface from a CT scan to an intraoperative surface scan acquired using a coded light pattern.
A system for guiding needle insertion in brachytherapy, the insertion of radioactive implants into a tumor, was described in Krempien et al. (2008). The system uses a fixed projector that is utilized both for guidance and to project a structured light pattern for acquiring 3D intraoperative surface structures. The intraoperative surface is rigidly registered to a preoperative surface extracted from CT. Information is presented incrementally, first guiding the operator to the insertion point, then aligning the needle along the insertion trajectory, and then controlling insertion depth. The system used iconic information and color coding to provide guidance, while tracking both the patient and needle using a commercial optical tracking system.
A system for guiding port placement in endoscopic procedures was presented in Sugimoto et al. (2010). A volume rendering of internal anatomical structures from preoperative CT is projected onto the patient's skin using a projector fixed above the operating table. The CT data is rigidly registered to the patient using anatomical structures on the skin surface (e.g., the umbilicus). Registration accuracy was about 5 mm, with the result used as a rough roadmap helping the surgeon to select appropriate locations for port placement. The procedure itself was then guided using standard endoscopy. A similar system facilitating port placement in endoscopic liver surgery performed using the da Vinci robotic system was described in Volonté et al. (2011). Registration and visualization are similar in both systems. In the da Vinci case, clinicians noted that this form of guidance was primarily useful in obese patients who required modification of the standard port locations.
The development of small form-factor projectors enabled the projection of information directly onto the patient using a handheld, optically tracked pico-projector (Gavaghan et al. 2011, 2012). This system was used to project vessel and tumor locations onto the liver surface for guiding open liver surgery, to project tumor localization for guiding orthopedic tumor resection, and to guide needle insertions by projecting iconic information (cross hairs and color) to indicate location and distance from the target. Preoperative information was obtained from CT, and rigid registration was used to align this information to the intraoperative setting. This system does not take into account the viewer's pose and the associated issues of depth perception when projecting anatomical structures. Rather, it adopts a practical solution, overlaying iconic guidance information instead of projecting anatomical structures when the structures lie deep beneath the skin surface.
A more recent, fixed-projector display system that implicitly accounts for motion and deformation was described in Szabó et al. (2013). This system overlays infrared temperature maps directly onto the anatomy. The intent is to facilitate identification of decreased blood flow to the heart muscle during cardiac surgery. The system uses an infrared camera and a projector that are calibrated so that their optical axes are aligned. The system does not require registration, as all data is acquired intraoperatively and, as long as the acquisition and projection latency is sufficiently short, it readily accommodates the moving and deforming anatomy.


19.4 LIMITATIONS AND CHALLENGES


The goal of AR environments in clinical applications is to enhance the physician's view of the anatomy and surgical site as a means to facilitate tool-to-target navigation and on-target instrument positioning. Nevertheless, most systems face several challenges, some of which may delay or limit their clinical acceptance.

19.4.1 Optimal Information Dissemination and User Perception Performance
Most image guidance platforms integrate data and signals from several sources, including pre- and intraoperative images, functional (i.e., electrophysiology) data, and surgical tracking information, all incorporated into a common coordinate frame and display. The operator's performance is dependent on their perception and interpretation of this information. Thus, even a technically optimal system is still limited by the human observer's perception of the presented information.
One of the first publications to identify perception-related issues with medical AR systems was Johnson et al. (2003). That work identified depth perception issues when using semitransparent structure overlays onto the visible surface in a stereoscopic setup. A later study evaluated the effect of seven rendering methods on stereoscopic depth perception (Sielhorst et al. 2006). The best performance was achieved with semitransparent surface rendering and when using a virtual window overlaid onto the skin surface. Others have proposed various rendering and interaction methods to improve depth perception in both stereoscopic and monocular views. In Bichlmeier et al. (2009), the concept of using a virtual mirror is introduced. They show that by introducing a user-controllable virtual mirror into the scene, the operator is able to achieve improved stereoscopic depth perception. In monocular AR, Kalkofen et al. (2009) describe the use of a focus + context approach to improve depth perception. Edges from the context structure in the original image are overlaid onto the augmented scene, partially occluding the augmented focus information, leading to improved depth perception. A novel nonphotorealistic approach for conveying depth in monocular views is described in Hansen et al. (2010). Standard rendering methods are replaced by illustrative rendering, with the depth of structures conveyed by modifying stroke thickness and style. A more recent evaluation of rendering methods for enhancing depth perception of a vessel tree structure was described in Kersten-Oertel et al. (2014). Seven depth cues were evaluated, with rendering using depth-dependent color and the use of aerial perspective shown to give the best depth cues.
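Depth-dependent coloring of a vessel tree, one of the cues that performed best in that evaluation, can be illustrated by mapping each rendered vertex's distance from the viewpoint to a color. The sketch below is a hypothetical example of the idea, not the rendering used in the cited study.

import numpy as np

def depth_to_color(depths_mm, near_color=(1.0, 0.2, 0.2), far_color=(0.2, 0.2, 1.0)):
    # Map per-vertex depths to colors: near vessel segments toward red, far ones toward blue.
    d = np.asarray(depths_mm, dtype=float)
    t = (d - d.min()) / max(d.max() - d.min(), 1e-9)     # normalize depths to [0, 1]
    near = np.array(near_color)
    far = np.array(far_color)
    return (1.0 - t)[:, None] * near + t[:, None] * far  # linear interpolation per vertex

print(depth_to_color([30.0, 45.0, 60.0]))   # shades from red toward blue with depth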
Finally, even an ideal system from the technical and perceptual standpoint may present us with new challenges. In Dixon et al. (2013), the issue of inattentional blindness was studied. That is, by providing a compelling AR environment, we are increasing the clinician's focus on specific regions while reducing their ability to detect unexpected findings nearby. In a randomized study using endoscopic AR, with the control being standard endoscopy, it was shown that the AR system led to more accurate results, but that it significantly reduced the operator's ability to identify a complication or a foreign body located in close proximity to the target. This is a critical issue in systems where the operator is the only person viewing the scene (e.g., systems using a medical binocular), where no other member of the clinical staff can alert them to such issues.

19.4.2 Accommodation for Tissue Motion and Organ Deformation


Most interventions involving soft tissues are prone to anatomical differences between
the preoperative image dataset, and any anatomical models derived from these data,
and the intraoperative anatomy. Most such changes are simply due to a different
patient position or tissue relaxation when accessing the internal organs. To better
estimate and/or account for tissue and organ deformation that is not reflected in the
typical rigid-body image-to-patient registration performed prior to the start of a typical image-guided procedure, several techniques for surface estimation and tracking
have been explored (Maier-Hein et al. 2013). Stoyanov et al. (2008) proposed image stabilization and augmentation with surface tracking and motion models, using the operator's gaze to identify the region that should be stabilized. A recent survey of the current state-of-the-art AR systems for partial kidney resection (Hughes-Hallett et al. 2014) noted that accounting for organ and tissue deformation remains a major research challenge.

19.4.3 Compatibility with Clinical Workflow and Environment


Most interventional navigation systems that have been successfully adopted in the clinic do not require a great deal of new equipment and integrate easily with the existing equipment in the interventional suite, in a context-aware manner. In addition, more technology leads to more data sources and more data being displayed, which in turn may become harmful by occluding information visible in the intraoperative images. One approach to limit the overwhelming technological footprint and its impact on the interventional workflow and data interpretation is via context-aware AR systems (Katić et al. 2013, 2014). This requires that we study the existing procedure workflow, identifying its different stages using input from various unobtrusive sensors (Navab et al. 2007), as well as the content of the intraoperative medical images and the relative locations of tools and anatomy reported by a tracking system. In Jannin and Morandi (2007), the authors define and use an ontology to model surgical procedures, facilitating identification of parameters that enable prediction of the course of surgery. In their recent work, Padoy et al. (2012) proposed the use of hidden Markov models and dynamic time warping to identify surgical phases, an approach similar to the one proposed in Bouarfa et al. (2011), which relies on an embedded Bayesian hidden Markov model to identify surgical phases.
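As a concrete illustration of the phase-recognition idea (and not the actual models of Padoy et al. or Bouarfa et al.), the sketch below runs a log-space Viterbi decoder over a toy hidden Markov model whose hidden states are surgical phases and whose observations are discretized sensor readings; all of the probabilities are invented.

import numpy as np

def viterbi(obs, log_pi, log_A, log_B):
    # Most likely hidden state sequence (surgical phases) for a discrete HMM.
    # obs: list of observation indices; log_pi: initial log-probabilities;
    # log_A: transition log-probabilities; log_B: emission log-probabilities.
    delta = log_pi + log_B[:, obs[0]]
    backpointers = []
    for o in obs[1:]:
        scores = delta[:, None] + log_A            # scores[i, j]: best path ending in j via i
        backpointers.append(scores.argmax(axis=0))
        delta = scores.max(axis=0) + log_B[:, o]
    path = [int(delta.argmax())]
    for ptr in reversed(backpointers):
        path.append(int(ptr[path[-1]]))
    return path[::-1]

# Toy model: three phases and two observation symbols (e.g., instrument idle/active).
pi = np.log(np.array([0.9, 0.05, 0.05]))
A = np.log(np.array([[0.8, 0.2, 0.0],
                     [0.0, 0.8, 0.2],
                     [0.0, 0.0, 1.0]]) + 1e-12)
B = np.log(np.array([[0.9, 0.1],
                     [0.2, 0.8],
                     [0.7, 0.3]]))
print(viterbi([0, 0, 1, 1, 1, 0, 0], pi, A, B))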

19.4.4 Cost-Effectiveness
Based on past performance, the introduction of new technologies into the operating room has more often than not increased the cost of providing healthcare (Bodenheimer 2005). Government agencies are aware of this, as reflected by the goals of the Affordable Care Act in the United States, which aims to reduce healthcare costs (Davies 2013). From a financial perspective, only a few studies have focused on evaluating the cost-effective use of virtual reality image-guidance and robotic systems (Costa et al. 2013; Desai et al. 2011; Margier et al. 2014; Novak et al. 2007; Swank et al. 2009), and unfortunately none of these studies reported a clear financial benefit for using the proposed navigation systems. With regard to evaluating the cost-effectiveness of AR systems, the only evaluation identified was that of the RIO augmented haptic surgery system (Swank et al. 2009). While this system was shown to be cost-effective, the analysis also showed an increased number of patients undergoing the procedure, possibly attracted by the novelty of the technology.
Thus, successfully transitioning from laboratory implementation and testing to clinical care becomes not only a matter of providing improved healthcare, but also of being cost-effective. If the intent is to develop systems that will be clinically adopted, then cost should be considered during the research phase and not ignored until the clinical implementation phase, as seems to have been the case for most recently developed systems, which have gained little traction in terms of clinical translation.

19.5 FUTURE PERSPECTIVES


In the research setting, focus has shifted toward modeling and accommodating soft tissue motion and deformation and, in some cases, toward developing systems that attempt to sidestep this challenge. Keeping in mind that the overall goal of this domain is to provide improved patient care, and not to develop algorithms and systems for their mathematical beauty and novelty, sidestepping an issue may sometimes be the correct choice. Nevertheless, further research into motion and deformation modeling is merited.
A subject that has not been sufficiently addressed by researchers developing image-guided navigation and AR systems is the design of optimal human-computer interfaces, which is currently only addressed from the information display perspective. Researchers have objectively designed display methods for neurosurgical procedures (Kersten-Oertel et al. 2014) and for interventional radiology procedures (Varga et al. 2013), but in most cases the display choice is arbitrary. In addition, the conclusion that overwhelming the operator with the complex information available is actually detrimental has stimulated workflow analysis and the investigation of context-aware systems, focused on displaying the right amount of information when it is actually needed. Unfortunately, the operator's interaction with the system is often very cumbersome, with the system being controlled by proxy: the system developer performs the interaction based on the clinician's requests. A plausible explanation for this approach is the lack of intuitive system control by anyone besides the developers. Extreme cases described in Grätzel et al. (2004) reveal that clinical staff required close to 7 min to execute a desired selection, which consisted only of a mouse click. This aspect of system design requires further efforts on the part of the research community.
Finally, the cost aspect of novel systems should influence the design of future AR
systems. Costly novel solutions will most often not be widely adopted. If the choice
is between using a costly imaging modality, for example, intraoperative MR, in combination with simple algorithms, as opposed to a cost-effective imaging modality,
for example, 3D US, in combination with complex algorithms, the latter has higher
chances of widespread adoption.


Transitioning from the laboratory to the commercial domain remains a challenge, as illustrated by the ratio of commercial systems to laboratory prototypes
presented here. On the other hand, medicine is one of the domains where AR systems have made inroads, overcoming both technical and regulatory challenges that
are not faced in other application domains. Given the active research in medical AR
and the large number of laboratory systems, we expect to see additional AR systems or systems incorporating elements of AR in commercial products. While we
use the term medical AR to describe the domain, in practice the important aspect
of these systems is that they augment the physician's abilities to carry out complex
procedures in a minimally invasive manner, improving the quality of healthcare for
all of us.

REFERENCES
Abe, Y., S. Sato, K. Kato, T. Hyakumachi, Y. Yanagibashi, M. Ito, and K. Abumi. 2013. A novel 3D guidance system using augmented reality for percutaneous vertebroplasty: Technical note. Journal of Neurosurgery: Spine 19(4) (October): 492–501.
Adhikary, S. D., A. Hadzic, and P. M. McQuillan. 2013. Simulator for teaching hand-eye coordination during ultrasound-guided regional anaesthesia. British Journal of Anaesthesia 111(5) (November): 844–845.
Al-Deen Ashab, H., V. A. Lessoway, S. Khallaghi, A. Cheng, R. Rohling, and P. Abolmaesumi. 2013. An augmented reality system for epidural anesthesia (AREA): Prepuncture identification of vertebrae. IEEE Transactions on Bio-Medical Engineering 60(9) (September): 2636–2644.
Amesur, N., D. Wang, W. Chang, D. Weiser, R. Klatzky, G. Shukla, and G. Stetten. 2009. Peripherally inserted central catheter placement using the sonic flashlight. Journal of Vascular and Interventional Radiology 20(10) (October): 1380–1383.
Bajura, M., H. Fuchs, and R. Ohbuchi. 1992. Merging virtual objects with the real world: Seeing ultrasound imagery within the patient. In Proceedings of the 19th Annual Conference on Computer Graphics and Interactive Techniques, SIGGRAPH '92, Chicago, IL. New York: ACM, pp. 203–210.
Banger, M., P. J. Rowe, and M. Blyth. 2013. Time analysis of MAKO RIO UKA procedures
in comparison with the Oxford UKA. Bone & Joint Journal Orthopaedic Proceedings
Supplement 95-B(Suppl. 28) (August 1): 89.
Bichlmeier, C., S. M. Heining, M. Feuerstein, and N. Navab. 2009. The virtual mirror: A new interaction paradigm for augmented reality environments. IEEE Transactions on Medical Imaging 28(9) (September): 1498–1510.
Birkfellner, W., M. Figl, C. Matula, J. Hummel, R. Hanel, H. Imhof, F. Wanschitz, A. Wagner,
F. Watzinger, and H. Bergmann. 2003. Computer-enhanced stereoscopic vision in a
head-mounted operating binocular. Physics in Medicine and Biology 48(3) (February 7): N49–N57.
Blackwell, M., C. Nikou, A. M. DiGioia, and T. Kanade. 2000. An image overlay system for
medical data visualization. Medical Image Analysis 4(1) (March): 6772.
Bodenheimer, T. 2005. High and rising health care costs. Part 2: Technologic innovation.
Annals of Internal Medicine 142(11) (June 7): 932937.
Botden, S. M. and J. J. Jakimowicz. 2009. What is going on in augmented reality simulation
in laparoscopic surgery? Surgical Endoscopy 23: 1693–1700.
Bouarfa, L., P. P. Jonker, and J. Dankelman. 2011. Discovery of high-level tasks in the operating room. Journal of Biomedical Informatics 44(3) (June): 455462 (Biomedical
complexity and error).

Buchs, N. C., F. Volonte, F. Pugin, C. Toso, M. Fusaglia, K. Gavaghan, P. E. Majno, M. Peterhans, S. Weber, and P. Morel. 2013. Augmented environments for the targeting of hepatic lesions during image-guided robotic liver surgery. The Journal of Surgical Research 184(2) (October): 825–831.
Caudell, T. P. 1994. Introduction to augmented and virtual reality. Proceedings of the SPIE
1994: Telemanipulator and Telepresence Technology 2351: 272281.
Chang, W. M., M. B. Horowitz, and G. D. Stetten. 2005. Intuitive intraoperative ultrasound
guidance using the sonic flashlight: A novel ultrasound display system. Neurosurgery
56(Suppl. 2) (April): 434437; discussion 434437.
Chang, W. M., G. D. Stetten, L. A. Lobes Jr, D. M. Shelton, and R. J. Tamburo. 2002. Guidance
of retrobulbar injection with real-time tomographic reflection. Journal of Ultrasound in
Medicine 21(10) (October): 11311135.
Chen, X., L. Wang, P. Fallavollita, and N. Navab. 2013. Precise x-ray and video overlay for
augmented reality fluoroscopy. International Journal of Computer Assisted Radiology
and Surgery 8(1) (January): 2938.
Cleary, K. and T. M. Peters. 2010. Image-guided interventions: Technology review and clinical
applications. Annual Review of Biomedical Engineering 12(August 15): 119142.
Coller, B. S. and R. M. Califf. 2009. Traversing the valley of death: A guide to assessing
prospects for translational success. Science Translational Medicine 1(10) (December9):
10cm9.
Costa, F., E. Porazzi, U. Restelli, E. Foglia, A. Cardia, A. Ortolina, M. Tomei, M. Fornari,
and G. Banfi. 2013. Economic study: A cost-effectiveness analysis of an intraoperative
compared with a preoperative image-guided system in lumbar pedicle screw fixation
in patients with degenerative spondylolisthesis. The Spine Journal 14(8) (October 31):
17901796.
Davies, E. 2013. Obama promises to act on medicare costs, medical research, and gun control.
BMJ (Clinical Research Ed.) 346: f1034.
De Buck, S., F. Maes, J. Ector, J. Bogaert, S. Dymarkowski, H. Heidbuchel, and P. Suetens.
2005. An augmented reality system for patient-specific guidance of cardiac catheter
ablation procedures. IEEE Transactions on Medical Imaging 24(11) (November):
15121524.
Desai, A. S., A. Dramis, D. Kendoff, and T. N. Board. 2011. Critical review of the current
practice for computer-assisted navigation in total knee replacement surgery: Cost-effectiveness and clinical outcome. Current Reviews in Musculoskeletal Medicine 4(1): 11–15.
DiMaio, S., M. Hanuschik, and U. Kreaden. 2011. The da Vinci surgical system. In Surgical
Robotics, (eds.) J. Rosen, B. Hannaford, and R. M. Satava, pp. 199217. New York:
Springer.
Dixon, B. J., M. J. Daly, H. Chan, A. D. Vescan, I. J. Witterick, and J. C. Irish. 2013. Surgeons
blinded by enhanced navigation: The effect of augmented reality on attention. Surgical
Endoscopy 27(2) (February): 454461.
Dixon, B. J., M. J. Daly, H. Chan, A. Vescan, I. J. Witterick, and J. C. Irish. 2014. Augmented
real-time navigation with critical structure proximity alerts for endoscopic skull base
surgery. The Laryngoscope 124(4) (April): 853859.
Dori, Y., M. Sarmiento, A. C. Glatz, M. J. Gillespie, V. M. Jones, M. A. Harris, K. K. Whitehead,
M. A. Fogel, and J. J. Rome. 2011. X-ray magnetic resonance fusion to internal markers and utility in congenital heart disease catheterization. Circulation Cardiovascular
Imaging 4(4) (July): 415424.
Edwards, P. J., A. P. King, C. R. Maurer Jr, D. A. de Cunha, D. J. Hawkes, D. L. Hill, R. P. Gaston et al. 2000. Design and evaluation of a system for microscope-assisted guided interventions (MAGI). IEEE Transactions on Medical Imaging 19(11) (November): 1082–1093.

Ellsmere, J., J. Stoll, W. Wells 3rd, R. Kikinis, K. Vosburgh, R. Kane, D. Brooks, and D. Rattner. 2004. A new visualization technique for laparoscopic ultrasonography. Surgery 136(1) (July): 84–92.
Ettinger, G. L., M. E. Leventon, W. E. Grimson, R. Kikinis, L. Gugino, W. Cote, L. Sprung
et al. 1998. Experimentation with a transcranial magnetic stimulation system for functional brain mapping. Medical Image Analysis 2: 477–486.
Falk, V., F. Mourgues, L. Adhami, S. Jacobs, H. Thiele, S. Nitzsche, F. W. Mohr, and E. Coste-Manière. 2005. Cardio navigation: Planning, simulation, and augmented reality in robotic assisted endoscopic bypass grafting. The Annals of Thoracic Surgery 79(6) (June): 2040–2047.
Feifer, A., J. Delisle, and M. Anidjar. 2008. Hybrid augmented reality simulator: Preliminary
construct validation of laparoscopic smoothness in a urology residency program.
Journal of Urology 180: 14551459.
Feuerstein, M., T. Mussack, S. M. Heining, and N. Navab. 2008. Intraoperative laparoscope
augmentation for port placement and resection planning in minimally invasive liver
resection. IEEE Transactions on Medical Imaging 27(3) (March): 355369.
Fichtinger, G., A. Deguet, K. Masamune, E. Balogh, G. S. Fischer, H. Mathieu, R. H. Taylor,
S. J. Zinreich, and L. M. Fayad. 2005. Image overlay guidance for needle insertion
in CT scanner. IEEE Transactions on Bio-Medical Engineering 52(8) (August):
14151424.
Fischer, G. S., A. Deguet, C. Csoma, R. H. Taylor, L. Fayad, J. A. Carrino, S. J. Zinreich, and
G. Fichtinger. 2007. MRI image overlay: Application to arthrography needle insertion.
Computer Aided Surgery 12(1) (January): 214.
Foley, K. T., D. A. Simon, and Y. R. Rampersaud. 2001. Virtual fluoroscopy: Computer-assisted fluoroscopic navigation. Spine 26(4) (February 15): 347–351.
Freysinger, W., A. R. Gunkel, and W. F. Thumfart. 1997. Image-guided endoscopic ENT
surgery. European Archives of Otorhinolaryngology 254(7): 343346.
Gavaghan, K. A., M. Peterhans, T. Oliveira-Santos, and S. Weber. 2011. A portable image
overlay projection device for computer-aided open liver surgery. IEEE Transactions on
Bio-Medical Engineering 58(6) (June): 18551864.
Gavaghan, K., T. Oliveira-Santos, M. Peterhans, M. Reyes, H. Kim, S. Anderegg, and
S.Weber. 2012. Evaluation of a portable image overlay projector for the visualisation of
surgical navigation data: Phantom studies. International Journal of Computer Assisted
Radiology and Surgery 7(4) (July): 547556.
George, A. K., M. Sonmez, R. J. Lederman, and A. Z. Faranesh. 2011. Robust automatic rigid
registration of MRI and x-ray using external fiducial markers for XFM-guided interventional procedures. Medical Physics 38(1) (January): 125141.
Glossop, N., C. Wedlake, J. Moore, T. Peters, and Z. Wang. 2003. Laser projection augmented reality system for computer assisted surgery. In Medical Image Computing and
Computer-Assisted InterventionMICCAI 2003, (eds.) R. E. Ellis and T. M. Peters,
pp.239246. Lecture Notes in Computer Science 2879. Berlin, Germany: Springer.
Gong, R. H., Ö. Güler, M. Kürklüoglu, J. Lovejoy, and Z. Yaniv. 2013. Interactive initialization
of 2D/3D rigid registration. Medical Physics 40(12) (December): 121911.
Grätzel, C., T. Fong, S. Grange, and C. Baur. 2004. A non-contact mouse for surgeon-computer interaction. Technology and Health Care 12(3): 245–257.
Grimson, W. L., G. J. Ettinger, S. J. White, T. Lozano-Perez, W. M. Wells, and R. Kikinis.
1996. An automatic registration method for frameless stereotaxy, image guided surgery, and enhanced reality visualization. IEEE Transactions on Medical Imaging 15(2): 129–140.
Hakime, A., F. Deschamps, E. G. M. D. Carvalho, A. Barah, A. Auperin, and T. D. Baere.
2012. Electromagnetic-tracked biopsy under ultrasound guidance: Preliminary results.
CardioVascular and Interventional Radiology 35(4) (August 1): 898905.

Hansen, C., J. Wieferich, F. Ritter, C. Rieder, and H.-O. Peitgen. 2010. Illustrative visualization of 3D planning models for augmented reality in liver surgery. International Journal
of Computer Assisted Radiology and Surgery 5(2) (March): 133141.
Higgins, W. E., J. P. Helferty, K. Lu, S. A. Merritt, L. Rai, and K.-C. Yu. 2008. 3D CT-video
fusion for image-guided bronchoscopy. Computerized Medical Imaging and Graphics
32(3) (April): 159173.
Hildebrand, P., M. Kleemann, U. J. Roblick, L. Mirow, C. Bürk, and H.-P. Bruch. 2007. Technical aspects and feasibility of laparoscopic ultrasound navigation in radiofrequency ablation of unresectable hepatic malignancies. Journal of Laparoendoscopic & Advanced Surgical Techniques. Part A 17(1) (February): 53–57.
Hughes-Hallett, A., E. K. Mayer, H. J. Marcus, T. P. Cundy, P. J. Pratt, A. W. Darzi, and
J. A. Vale. 2014. Augmented reality partial nephrectomy: Examining the current status and future perspectives. Urology 83(2) (February): 266–273.
Jannin, P. and X. Morandi. 2007. Surgical models for computer-assisted neurosurgery.
NeuroImage 37(3) (September 1): 783791.
Johnson, L. G., P. Edwards, and D. Hawkes. 2003. Surface transparency makes stereo overlays
unpredictable: The implications for augmented reality. Studies in Health Technology
and Informatics 94: 131136.
Joskowicz, L., R. Shamir, M. Freiman, M. Shoham, E. Zehavi, F. Umansky, and Y. Shoshan.
2006. Image-guided system with miniature robot for precise positioning and targeting in
keyhole neurosurgery. Computer Aided Surgery 11(4) (July): 181193.
Kalkofen, D., E. Mendez, and D. Schmalstieg. 2009. Comprehensible visualization for augmented reality. IEEE Transactions on Visualization and Computer Graphics 15(2)
(April): 193204.
Kaneko, M., F. Kishino, K. Shimamura, and H. Harashima. 1993. Toward the new era of visual
communication. IEICE Transactions on Communications E76-B(6): 577591.
Kang, X., M. Azizian, E. Wilson, K. Wu, A. D. Martin, T. D. Kane, C. A. Peters, K. Cleary,
and R. Shekhar. 2014. Stereoscopic augmented reality for laparoscopic surgery. Surgical
Endoscopy 28(7) (February 1): 22272235.
Katić, D., P. Spengler, S. Bodenstedt, G. Castrillon-Oberndorfer, R. Seeberger, J. Hoffmann, R. Dillmann, and S. Speidel. 2014. A system for context-aware intraoperative augmented reality in dental implant surgery. International Journal of Computer Assisted Radiology and Surgery 10(1) (April 27): 101–108.
Katić, D., A.-L. Wekerle, J. Görtler, P. Spengler, S. Bodenstedt, S. Röhl, S. Suwelack et al. 2013. Context-aware augmented reality in laparoscopic surgery. Computerized Medical Imaging and Graphics 37(2) (March): 174–182.
Kaufman, S., I. Poupyrev, E. Miller, M. Billinghurst, P. Oppenheimer, and S. Weghorst. 1997.
New interface metaphors for complex information space visualization: An ECG monitor
object prototype. Studies in Health Technology and Informatics 39: 131140.
Kawamata, T., H. Iseki, T. Shibasaki, and T. Hori. 2002. Endoscopic augmented reality navigation system for endonasal transsphenoidal surgery to treat pituitary tumors: Technical
note. Neurosurgery 50(6) (June): 13931397.
Kerner, K. F., C. Imielinska, J. Rolland, and H. Tang. 2003. Augmented reality for teaching
endotracheal intubation: MR imaging to create anatomically correct models. In
Proceedings of the Annual AMIA Symposium, Washington, DC, pp. 888889.
Kersten-Oertel, M., S. J.-S. Chen, and D. L. Collins. 2014. An evaluation of depth enhancing
perceptual cues for vascular volume visualization in neurosurgery. IEEE Transactions
on Visualization and Computer Graphics 20(3) (March): 391403.
Khan, M. F., S. Dogan, A. Maataoui, S. Wesarg, J. Gurung, H. Ackermann, M. Schiemann,
G. Wimmer-Greinecker, and T. J. Vogl. 2006. Navigation-based needle puncture of a
cadaver using a hybrid tracking navigational system. Investigative Radiology 41(10)
(October): 713720.

Kleemann, M., P. Hildebrand, M. Birth, and H. P. Bruch. 2006. Laparoscopic ultrasound navigation in liver surgery: Technical aspects and accuracy. Surgical Endoscopy 20(5) (May): 726–729.
Kockro, R. A., Y. T. Tsai, I. Ng, P. Hwang, C. Zhu, K. Agusanto, L. X. Hong, and L. Serra.
2009. Dex-ray: Augmented reality neurosurgical navigation with a handheld video
probe. Neurosurgery 65(4) (October): 795807; discussion 807808.
Koehring, A., J. L. Foo, G. Miyano, T. Lobe, and E. Winer. 2008. A framework for interactive visualization of digital medical images. Journal of Laparoendoscopic & Advanced
Surgical Techniques 18: 697706.
Konishi, K., M. Nakamoto, Y. Kakeji, K. Tanoue, H. Kawanaka, S. Yamaguchi, S. Ieiri et al. 2007. A real-time navigation system for laparoscopic surgery based on three-dimensional ultrasound using magneto-optic hybrid tracking configuration. International Journal of Computer Assisted Radiology and Surgery 2(1) (June 1): 1–10.
Krempien, R., H. Hoppe, L. Kahrs, S. Daeuber, O. Schorr, G. Eggers, M. Bischof, M. W. Munter, J. Debus, and W. Harms. 2008. Projector-based augmented reality for intuitive intraoperative guidance in image-guided 3D interstitial brachytherapy. International Journal of Radiation Oncology, Biology, Physics 70(3) (March 1): 944–952.
Lang, J. E., S. Mannava, A. J. Floyd, M. S. Goddard, B. P. Smith, A. Mofidi, T. M. Seyler, and
R. H. Jinnah. 2011. Robotic systems in orthopaedic surgery. The Journal of Bone and
Joint Surgery 93(10) (October): 12961299.
Langlotz, F. 2004. Potential pitfalls of computer aided orthopedic surgery. Injury 35(Suppl. 1)
(June): S-A17S-A23.
Liao, H., T. Inomata, I. Sakuma, and T. Dohi. 2010. 3-D augmented reality for MRI-guided
surgery using integral videography autostereoscopic image overlay. IEEE Transactions
on Bio-Medical Engineering 57(6) (June): 14761486.
Linte, C. A., K. P. Davenport, K. Cleary, C. Peters, K. G. Vosburgh, N. Navab, P. E. Edwards
et al. 2013. On mixed reality environments for minimally invasive therapy guidance:
Systems architecture, successes and challenges in their implementation from laboratory
to clinic. Computerized Medical Imaging and Graphics 37(2) (March): 8397.
Linte, C. A., J. Moore, C. Wedlake, and T. M. Peters. 2010. Evaluation of model-enhanced
ultrasound-assisted interventional guidance in a cardiac phantom. IEEE Transactions on
Bio-Medical Engineering 57(9) (September): 22092218.
Linte, C. A., J. Moore, A. Wiles, C. Wedlake, and T. Peters. 2008. Virtual reality-enhanced
ultrasound guidance: A novel technique for intracardiac interventions. Computer Aided
Surgery 13: 8294.
Liu, W. P., M. Azizian, J. Sorger, R. H. Taylor, B. K. Reilly, K. Cleary, and D. Preciado. 2014.
Cadaveric feasibility study of da Vinci Si-assisted cochlear implant with augmented
visual navigation for otologic surgery. JAMA Otolaryngology–Head & Neck Surgery 140(3) (March 1): 208–214.
Lonner, J. H., T. K. John, and M. A. Conditt. 2010. Robotic arm-assisted UKA improves
tibial component alignment: A pilot study. Clinical Orthopaedics and Related Research
468(1) (January): 141146.
Lovo, E. E., J. C. Quintana, M. C. Puebla, G. Torrealba, J. L. Santos, I. H. Lira, and P. Tagle.
2007. A novel, inexpensive method of image coregistration for applications in image-guided surgery using augmented reality. Neurosurgery 60: 366–371.
Maier-Hein, L., P. Mountney, A. Bartoli, H. Elhawary, D. Elson, A. Groch, A. Kolb et al. 2013. Optical techniques for 3D surface reconstruction in computer-assisted laparoscopic surgery. Medical Image Analysis 17(8) (December): 974–996.
Margier, J., S. D. Tchouda, J.-J. Banihachemi, J.-L. Bosson, and S. Plaweski. 2014.
Computer-assisted navigation in ACL reconstruction is attractive but not yet cost
efficient. Knee Surgery, Sports Traumatology, Arthroscopy: Official Journal of the
ESSKA (January 21). DOI 10.1007/s00167-013-2831-2.

Marmulla, R., H. Hoppe, J. Mühling, and S. Hassfeld. 2005. New augmented reality concepts for craniofacial surgical procedures. Plastic and Reconstructive Surgery 115(4) (April): 1124–1128.
Mathes, A. M., S. Kreuer, S. O. Schneider, S. Ziegeler, and U. Grundmann. 2008. The performance of six pulse oximeters in the environment of neuronavigation. Anesthesia and
Analgesia 107(2) (August): 541544.
Mavrogenis, A. F., O. D. Savvidou, G. Mimidis, J. Papanastasiou, D. Koulalis, N. Demertzis,
and P. J. Papagelopoulos. 2013. Computer-assisted navigation in orthopedic surgery.
Orthopedics 36(8) (August): 631642.
Merloz, P., J. Troccaz, H. Vouaillat, C. Vasile, J. Tonetti, A. Eid, and S. Plaweski. 2007.
Fluoroscopy-based navigation system in spine surgery. Proceedings of the Institution of
Mechanical Engineers. Part H, Journal of Engineering in Medicine 221(7) (October):
813820.
Metzger, P. J. 1993. Adding reality to the virtual. In Proceedings of the IEEE Virtual Reality
International Symposium, Seattle, WA, pp. 713.
Milgram, P. and F. Kishino. 1994. A taxonomy of mixed reality visual displays. IEICE Transactions on Information and Systems E77-D(12): 1321–1329.
Milgram, P., H. Takemura, A. Utsumi, and F. Kishino. 1994. Augmented reality: A class of displays on the reality–virtuality continuum. In Proceedings of the SPIE 1994: Telemanipulator and Telepresence Technology, Boston, MA, vol. 2351, pp. 282–292.
Mischkowski, R. A., M. J. Zinser, A. C. Kübler, B. Krug, U. Seifert, and J. E. Zöller. 2006. Application of an augmented reality tool for maxillary positioning in orthognathic surgery – A feasibility study. Journal of Cranio-Maxillo-Facial Surgery 34(8) (December): 478–483.
Moore, J. T., M. W. A. Chu, B. Kiaii, D. Bainbridge, G. Guiraudon, C. Wedlake, M. Currie,
M. Rajchl, R. V. Patel, and T. M. Peters. 2013. A navigation platform for guidance
of beating heart transapical mitral valve repair. IEEE Transactions on Bio-Medical
Engineering 60(4) (April): 10341040.
Müller, M., M.-C. Rassweiler, J. Klein, A. Seitel, M. Gondan, M. Baumhauer, D. Teber, J. J. Rassweiler, H.-P. Meinzer, and L. Maier-Hein. 2013. Mobile augmented reality for computer-assisted percutaneous nephrolithotomy. International Journal of Computer Assisted Radiology and Surgery 8(4) (July): 663–675.
Nakamoto, M., K. Nakada, Y. Sato, K. Konishi, M. Hashizume, and S. Tamura. 2008.
Intraoperative magnetic tracker calibration using a magneto-optic hybrid tracker for 3-D
ultrasound-based navigation in laparoscopic surgery. IEEE Transactions on Medical
Imaging 27: 255270.
Navab, N., T. Blum, L. Wang, A. Okur, and T. Wendler. 2012. First deployments of augmented
reality in operating rooms. Computer 45(7) (July): 4855.
Navab, N., S.-M. Heining, and J. Traub. 2010. Camera augmented mobile C-Arm (CAMC):
Calibration, accuracy study, and clinical applications. IEEE Transactions on Medical
Imaging 29(7) (July): 14121423.
Navab, N., J. Traub, T. Sielhorst, M. Feuerstein, and C. Bichlmeier. 2007. Action- and workflow-driven augmented reality for computer-aided medical procedures. IEEE Computer Graphics and Applications 27(5) (October): 10–14.
Nicolau, S. A., X. Pennec, L. Soler, X. Buy, A. Gangi, N. Ayache, and J. Marescaux. 2009. An
augmented reality system for liver thermal ablation: Design and evaluation on clinical
cases. Medical Image Analysis 13(3) (June): 494506.
Novak, E. J., M. D. Silverstein, and K. J. Bozic. 2007. The cost-effectiveness of computer-assisted navigation in total knee arthroplasty. The Journal of Bone and Joint Surgery 89(11) (November): 2389–2397.
Padoy, N., T. Blum, S.-A. Ahmadi, H. Feussner, M.-O. Berger, and N. Navab. 2012. Statistical
modeling and recognition of surgical workflow. Medical Image Analysis 16(3) (April):
632641.

Pandya, A., M.-R. Siadat, and G. Auner. 2005. Design, implementation and accuracy of a prototype for medical augmented reality. Computer Aided Surgery 10(1) (January): 2335.
Peters, T. and K. Cleary (ed.). 2008. Image-Guided Interventions: Technology and Applications.
Berlin, Germany: Springer.
Povoski, S. P., R. L. Neff, C. M. Mojzisik, D. M. O'Malley, G. H. Hinkle, N. C. Hall, D. A. Murrey Jr, M. V. Knopp, and E. W. Martin Jr. 2009. A comprehensive overview
of radioguided surgery using gamma detection probe technology. World Journal of
Surgical Oncology 7(1) (December 1): 163.
Rodriguez, F., S. Harris, M. Jakopec, A. Barrett, P. Gomes, J. Henckel, J. Cobb, and B. Davies.
2005. Robotic clinical trials of uni-condylar arthroplasty. The International Journal of
Medical Robotics + Computer Assisted Surgery 1(4) (December): 2028.
Rolland, J. P., D. L. Wright, and A. R. Kancherla. 1997. Towards a novel augmented-reality
tool to visualize dynamic 3-D anatomy. Studies in Health Technology and Informatics
39: 337348.
Rosenberg, L. B. 1993. Virtual fixtures: Perceptual tools for telerobotic manipulation. In 1993
IEEE Virtual Reality Annual International Symposium, Seattle, WA, pp. 7682.
Rosenthal, M., A. State, J. Lee, G. Hirota, J. Ackerman, K. Keller, E. Pisano, M. Jiroutek,
K. Muller, and H. Fuchs. 2002. Augmented reality guidance for needle biopsies: An initial randomized, controlled trial in phantoms. Medical Image Analysis 6(3) (September): 313–320.
Sato, Y., M. Nakamoto, Y. Tamaki, T. Sasama, I. Sakita, Y. Nakajima, M. Monden, and
S. Tamura. 1998. Image guidance of breast cancer surgery using 3-D ultrasound images and augmented reality visualization. IEEE Transactions on Medical Imaging 17(5) (October): 681–693.
Shekhar, R., O. Dandekar, V. Bhat, M. Philip, P. Lei, C. Godinez, E. Sutton et al. 2010. Live augmented reality: A new visualization method for laparoscopic surgery using continuous volumetric computed tomography. Surgical Endoscopy 24(8) (August): 1976–1985.
Sielhorst, T., C. Bichlmeier, S. M. Heining, and N. Navab. 2006. Depth perception – A major issue in medical AR: Evaluation study by twenty surgeons. Medical Image Computing and Computer-Assisted Intervention 9(Pt 1): 364–372.
Simpfendörfer, T., M. Baumhauer, M. Müller, C. N. Gutt, H.-P. Meinzer, J. J. Rassweiler, S. Guven, and D. Teber. 2011. Augmented reality visualization during laparoscopic radical prostatectomy. Journal of Endourology/Endourological Society 25(12) (December): 1841–1845.
Sindram, D., I. H. McKillop, J. B. Martinie, and D. A. Iannitti. 2010. Novel 3-D laparoscopic magnetic ultrasound image guidance for lesion targeting. HPB 12(10) (December): 709716.
Sindram, D., R. Z. Swan, K. N. Lau, I. H. McKillop, D. A. Iannitti, and J. B. Martinie. 2011.
Real-time three-dimensional guided ultrasound targeting system for microwave ablation
of liver tumours: A human pilot study. HPB 13(3) (March): 185191.
Stetten, G. D. and V. S. Chib. 2001. Overlaying ultrasonographic images on direct vision.
Journal of Ultrasound in Medicine 20(3) (March): 235240.
Stolka, P. J., X. L. Wang, G. D. Hager, and E. M. Boctor. 2011. Navigation with local sensors in handheld 3D ultrasound: Initial in-vivo experience. In SPIE Medical Imaging:
Ultrasonic Imaging, Tomography, and Therapy, Lake Buena Vista, FL, (eds.) J. Dhooge
and M. M. Doyley, p. 79681J-9.
Stoyanov, D., G. P. Mylonas, M. Lerotic, A. J. Chung, and G.-Z. Yang. 2008. Intra-operative
visualizations: Perceptual fidelity and human factors. Journal of Display Technology
4(4) (December): 491501.
Su, L.-M., B. P. Vagvolgyi, R. Agarwal, C. E. Reiley, R. H. Taylor, and G. D. Hager. 2009.
Augmented reality during robot-assisted laparoscopic partial nephrectomy: Toward
real-time 3D-CT to stereoscopic video registration. Urology 73(4) (April): 896900.

Sugimoto, M., H. Yasuda, K. Koda, M. Suzuki, M. Yamazaki, T. Tezuka, C. Kosugi et al. 2010. Image overlay navigation by markerless surface registration in gastrointestinal, hepatobiliary and pancreatic surgery. Journal of Hepato-Biliary-Pancreatic Sciences 17(5) (September): 629–636.
Swank, M. L., M. Alkire, M. Conditt, and J. H. Lonner. 2009. Technology and cost-
effectiveness in knee arthroplasty: Computer navigation and robotics. American Journal
of Orthopedics (Belle Mead, N.J.) 38(Suppl. 2) (February): 3236.
Szabó, Z., S. Berg, S. Sjökvist, T. Gustafsson, P. Carleberg, M. Uppsäll, J. Wren, H. Ahn, and Ö. Smedby. 2013. Real-time intraoperative visualization of myocardial circulation using augmented reality temperature display. The International Journal of Cardiovascular Imaging 29(2) (February): 521–528.
Takemura, H. and F. Kishino. 1992. Cooperative work environment using virtual workspace.
In Proceedings of the Computer Supported Cooperative Work, Toronto, ON, Canada,
pp. 226232.
Teber, D., S. Guven, T. Simpfendorfer, M. Baumhauer, E. O. Güven, F. Yencilek, A. S. Gozen, and J. Rassweiler. 2009. Augmented reality: A new tool to improve surgical accuracy during laparoscopic partial nephrectomy? Preliminary in vitro and in vivo results. European Urology 56(2) (August): 332–338.
Umbarje, K., R. Tang, R. Randhawa, A. Sawka, and H. Vaghadia. 2013. Out-of-plane brachial
plexus block with a novel SonixGPS(TM) needle tracking system. Anaesthesia 68(4)
(April): 433434.
Utsumi, A., P. Milgram, H. Takemura, and F. Kishino. 1994. Investigation of errors in
perception of stereoscopically presented virtual object locations in real display space. In
Proceedings of the Human Factors and Ergonomics Society, Nashville, TN.
Varga, E., P. M. T. Pattynama, and A. Freudenthal. 2013. Manipulation of mental models
of anatomy in interventional radiology and its consequences for design of human–computer interaction. Cognition, Technology & Work 15(4) (November 1): 457–473.
Vogt, S., A. Khamene, H. Niemann, and F. Sauer. 2004. An AR system with intuitive user
interface for manipulation and visualization of 3D medical data. Studies in Health
Technology and Informatics 98: 397403 (in Proceedings of the MMVR).
Vogt, S., A. Khamene, and F. Sauer. 2006. Reality augmentation for medical procedures:
System architecture, single camera marker tracking, and system evaluation. International
Journal of Computer Vision 70(2) (November 1): 179190.
Volonté, F., F. Pugin, P. Bucher, M. Sugimoto, O. Ratib, and P. Morel. 2011. Augmented reality and image overlay navigation with OsiriX in laparoscopic and robotic surgery: Not only a matter of fashion. Journal of Hepato-Biliary-Pancreatic Sciences 18(4) (July): 506–509.
Vosburgh, K. G. and R. San José Estépar. 2007. Natural orifice transluminal endoscopic surgery (NOTES): An opportunity for augmented reality guidance. Studies in Health Technology and Informatics 125: 485–490 (in Proceedings of the MMVR).
Wacker, F. K., S. Vogt, A. Khamene, J. A. Jesberger, S. G. Nour, D. R. Elgort, F. Sauer,
J. L. Duerk, and J. S. Lewin. 2006. An augmented reality system for MR image-guided needle biopsy: Initial results in a swine model. Radiology 238(2) (February): 497–504.
Wang, D., N. Amesur, G. Shukla, A. Bayless, D. Weiser, A. Scharl, D. Mockel et al. 2009. Peripherally inserted central catheter placement with the sonic flashlight: Initial clinical trial by nurses. Journal of Ultrasound in Medicine 28(5) (May): 651–656.
Wang, J., H. Suenaga, K. Hoshi, L. Yang, E. Kobayashi, I. Sakuma, and H. Liao. 2014.
Augmented reality navigation with automatic marker-free image registration using 3-D
image overlay for dental surgery. IEEE Transactions on Biomedical Engineering 61(4)
(April): 12951304.

Weber, S., M. Klein, A. Hein, T. Krueger, T. C. Lueth, and J. Bier. 2003. The navigated image viewer – Evaluation in maxillofacial surgery. In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2003, (eds.) R. E. Ellis and T. M. Peters, pp. 762–769. Lecture Notes in Computer Science, vol. 2878. Berlin, Germany: Springer.
Wendler, T., K. Herrmann, A. Schnelzer, T. Lasser, J. Traub, O. Kutter, A. Ehlerding et al.
2010. First demonstration of 3-D lymphatic mapping in breast cancer using freehand SPECT. European Journal of Nuclear Medicine and Molecular Imaging 37(8)
(August 1): 1452–1461.
Wilson, M., M. Coleman, and J. McGrath. 2010. Developing basic hand-eye coordination skills for laparoscopic surgery using gaze training. BJU International 105(10) (May): 1356–1358.
Winne, C., M. Khan, F. Stopp, E. Jank, and E. Keeve. 2011. Overlay visualization in endoscopic ENT surgery. International Journal of Computer Assisted Radiology and Surgery
6(3) (May): 401406.
Wong, S. W., A. U. Niazi, K. J. Chin, and V. W. Chan. 2013. Real-time ultrasound-guided
spinal anesthesia using the SonixGPS needle tracking system: A case report. Canadian
Journal of Anaesthesia 60(1) (January): 5053.
Wörn, H., M. Aschke, and L. A. Kahrs. 2005. New augmented reality and robotic based methods for head-surgery. The International Journal of Medical Robotics + Computer Assisted Surgery 1(3) (September): 49–56.
Yaniv, Z. and K. Cleary. 2006. Image-guided procedures: A review. Technical report, CAIMR
TR-2006-3. Washington, DC: Georgetown University.
Zheng, G., X. Dong, and P. A. Gruetzner. 2008. Reality-augmented virtual fluoroscopy for
computer-assisted diaphyseal long bone fracture osteosynthesis: A novel technique and
feasibility study results. Proceedings of the Institution of Mechanical Engineers, Part H:
Journal of Engineering in Medicine 222(1) (January 1): 101115.

20
Augmented Reality for Image-Guided Surgery
Marta Kersten-Oertel, Pierre Jannin, and D. Louis Collins

CONTENTS
20.1 Introduction................................................................................................... 520
20.1.1 Image-Guided Surgery...................................................................... 521
20.1.1.1 Registration......................................................................... 521
20.1.1.2 Tracking.............................................................................. 522
20.2 Describing Augmented Reality IGS Systems: Data, Visualization
Processing, View........................................................................................... 523
20.2.1 Data.................................................................................................... 523
20.2.2 Visualization Processing................................................................... 525
20.2.3 View................................................................................................... 526
20.3 Surgical Applications of AR.......................................................................... 527
20.3.1 Neurosurgery..................................................................................... 527
20.3.1.1 Data..................................................................................... 531
20.3.1.2 Visualization Processing..................................................... 531
20.3.1.3 View.................................................................................... 531
20.3.2 Craniofacial, Maxillofacial, and Dental Surgery.............................. 533
20.3.2.1 Data..................................................................................... 533
20.3.2.2 Visualization Processing..................................................... 534
20.3.2.3 View.................................................................................... 534
20.3.3 Internal and Soft Tissue Surgery (Heart, Breast, and Liver)............. 535
20.3.3.1 Data..................................................................................... 535
20.3.3.2 Visualization Processing..................................................... 536
20.3.3.3 View.................................................................................... 536
20.3.4 Endoscopic and Laparoscopic Surgery.............................................. 537
20.3.4.1 Data..................................................................................... 537
20.3.4.2 Visualization Processing..................................................... 537
20.3.4.3 View.................................................................................... 538
20.3.5 Orthopedic Surgery........................................................................... 538
20.3.5.1 Data..................................................................................... 538
20.3.5.2 Visualization Processing..................................................... 539
20.3.5.3 View.................................................................................... 539

20.4 Validation and Evaluation of AR IGS Systems............................................. 539


20.5 Summary.......................................................................................................540
20.5.1 Data....................................................................................................540
20.5.2 Visualization Processing................................................................... 541
20.5.3 View................................................................................................... 541
20.5.4 Conclusions........................................................................................ 542
References............................................................................................................... 543

20.1 INTRODUCTION
Image-guided surgery (IGS), a form of minimally invasive computer-assisted surgery, was first used in the field of neurosurgery in the mid-1990s. Since then, IGS has gained wide acceptance and is used in numerous other surgical domains, having shown improvements in patient outcomes with lower morbidity and mortality rates, smaller incisions and reduced trauma to the patient, faster recovery times, and a reduction in costs and hospital stays.
Concomitant with the growing use of less invasive surgical procedures such
as IGS, there has been a growing need for new visualization methods that allow
surgeons to gain as much (or more) visual information as open or exploratory
surgery, but with minimally invasive techniques. IGS has met some of these needs by guiding surgeons and allowing them to navigate within the surgical field of view based on computer models of preoperative patient data (e.g., magnetic resonance images [MRI] or computed tomography [CT] images). This type of surgical guidance, which we will examine more closely in the next section, is achieved
by using tracking systems to localize surgical tools and then visualize them in the
context of computer models of patient anatomy on a monitor within the operating
room (OR).
One drawback that remains with traditional IGS systems is that the burden lies
with the surgeon to map the preoperative patient data displayed on the monitor of the
navigation system to the patient lying on the OR table. This mapping is not trivial, is
time consuming, and may be prone to error. In order to address this issue, a number
of research groups have explored the use of augmented reality (AR) visualizations
that combine in one field of view preoperative virtual data in the form of anatomical regions or objects of interest with the live patient or live images of the patient
(Kersten-Oertel, 2013b).
In this chapter, we examine augmented reality within the unique context of
IGS. We use the term augmented reality as defined by Milgram and Kishino (1994): a position on the mixed reality continuum where virtual objects are added to a real environment. Due to technological advances, the AR paradigm has
become wider and may now be extended to the notion of mixed reality. This would
also encompass the notion of augmented virtuality (a position on the continuum
where real objects are merged with a virtual environment). In a recent survey of
AR techniques in IGS (Kersten-Oertel, 2013b), 82 of 84 papers surveyed used the
term augmented reality when referring to either augmented virtuality or mixed
reality systems.

20.1.1 Image-Guided Surgery


In IGS, surgical instruments are tracked in order to correlate in real time the surgical field of view with the preoperative images of the patient (Figure 20.1). This is achieved by (1) registering the patient to preoperative images, (2) tracking surgical instruments, and (3) visualizing them on the preoperative patient data. Once registered, a surgeon can then use a tracked probe or other tracked surgical instrument to point at a region of interest on the patient and then look at the representation of the instrument with respect to the preoperative images (on the monitor of the navigation system). This allows for the visualization of the exact location of a selected tool with respect to the surrounding anatomy and guides the surgeon in their task. The following sections describe the components of the IGS system.
20.1.1.1 Registration
Registration refers to bringing images of the same object into spatial alignment. In
terms of IGS, registration is used to bring into alignment the preoperative images of
the patient, which are loaded onto the navigation system, with the anatomy of the
actual patient lying on the OR table (i.e., patient-to-image registration).
One method for aligning preoperative patient images to the patient in the OR is
to use a mechanical device called a stereotactic frame. This method, which requires
attaching a stereotactic frame to the patient by drilling into the skull, has been limited to applications in neurosurgery. The frame is attached to the patient prior to
imaging in order to identify the target within the image with respect to the external frame. The targets, as well as a trajectory to the target, are planned based on
the images. As the frame puts the head in a fixed position with reference to the
frame's coordinate system, and frame fiducials seen in the images enable estimation of the image-to-frame transformation, any target or point within the brain can be described with Cartesian or polar coordinates. During surgery, surgical instruments are attached to the frame, allowing the surgeon to accurately approach the target.

FIGURE 20.1 A surgeon uses a tracked surgical probe to point to a location of interest on the patient. A virtual tool model is displayed with respect to the preoperative patient images on the navigation system. This allows the surgeon to locate the tool with respect to surrounding anatomy that is not directly visible on the patient. (Photo courtesy of Sean Chen, Montreal Neurological Hospital, Montreal, Quebec, Canada.)
More commonly, the virtual and real patients are brought into alignment using homologous landmark registration (Alp et al., 1998; Eggers et al., 2006; Wolfsberger et al., 2002). The transformation between landmarks, represented as a point set on the preoperative data and a corresponding point set chosen on the patient, is computed and optimized. The landmarks may be external markers placed on the patient (i.e., fiducials) that are also visible on the preoperative images (typically CT or MRI) or anatomical landmarks that are chosen both on the images of the patient and the actual patient (e.g., the bridge of the nose, the external canthus of the eye, or the meatus of the ear).
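As a concrete illustration of this step, the sketch below estimates a rigid (rotation plus translation) patient-to-image transform from paired landmarks with a least-squares fit based on the singular value decomposition (the Kabsch/Horn approach). It is an illustrative outline under stated assumptions rather than the algorithm of any particular navigation system, and the landmark coordinates are invented for the example.

```python
import numpy as np

def rigid_landmark_registration(patient_pts, image_pts):
    """Least-squares rigid transform mapping patient-space landmarks to image space.

    patient_pts, image_pts: (N, 3) arrays of corresponding landmark positions (mm).
    Returns a 3x3 rotation R and translation t such that R @ p + t approximates q.
    """
    p_mean = patient_pts.mean(axis=0)
    q_mean = image_pts.mean(axis=0)
    # Cross-covariance of the demeaned point sets
    H = (patient_pts - p_mean).T @ (image_pts - q_mean)
    U, _, Vt = np.linalg.svd(H)
    # Guard against a reflection (determinant -1) in the least-squares solution
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    t = q_mean - R @ p_mean
    return R, t

# Hypothetical fiducial positions (mm): picked on the patient and in the CT/MR volume.
patient = np.array([[10.0, 0.0, 0.0], [0.0, 12.0, 0.0], [0.0, 0.0, 9.0], [8.0, 8.0, 8.0]])
image = np.array([[12.1, 1.9, 0.3], [2.3, 13.8, 0.1], [2.0, 1.8, 9.2], [10.2, 9.7, 8.4]])

R, t = rigid_landmark_registration(patient, image)
residuals = (patient @ R.T + t) - image
fre = np.sqrt(np.mean(np.sum(residuals ** 2, axis=1)))  # root-mean-square error
print("Fiducial registration error (mm):", round(float(fre), 2))
```

A residual of this kind (often reported as the fiducial registration error) is what gives the surgical team a rough, though imperfect, indication of registration quality after landmark selection.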
20.1.1.2 Tracking
Tracking systems localize objects by determining their position and orientation in
space. In IGS, acoustic, optical, and electromagnetic tracking systems have been
used.
Optical tracking systems localize objects by measuring the light that is transmitted from an object. Typically, surgical tools, reference frames, and any other objects in the OR that need to be tracked are outfitted with a tracker: three or four spherical infrared-reflecting balls placed in a specific spatial configuration. An optical infrared camera system, which has an infrared light source, measures the light reflected by the markers with two cameras in stereo and determines the position of the tracked tool in 3-D space. The optical tracking technology used in the most common IGS systems is the Polaris Optical Tracking System from Northern Digital Inc. (NDI).*
Electromagnetic tracking systems measure the magnetic strength of transmitters
placed at fixed locations. Sensors (in which voltage is induced by the magnetic field)
measure the location and orientation of moving objects to which they are attached
(e.g., surgical instruments). An advantage of electromagnetic tracking systems over
optical tracking systems is that they do not need to be in the line of sight of the emitter
and receiver; this means that surgical tools within the body cavity can also be tracked.
On the other hand, electromagnetic systems are not wireless, which can cause some
inconvenience in sterile areas. Furthermore, electromagnetic signals may suffer from
artifacts if any new tools or objects (made of conducting material) are introduced
into the magnetic field or if there is a magnetic scanner nearby (e.g., an intraoperative
MRI). For these reasons, optical systems are more commonly used in IGS.
Acoustic tracking systems have also been studied in the context of IGS. In one ultrasound tracking system (Hata et al., 1997), an emitter was attached to the patient and microphones at various positions in the OR picked up the transmitted ultrasound waves. Due to the different travel times of the ultrasound waves to the various microphones, triangulation could be used to determine the position of the patient. Acoustic tracking has not gained wide acceptance in IGS, perhaps due to the lack of accuracy shown to date and the susceptibility of ultrasound systems to changes in room temperature that result in changes in the speed of sound (Eggers et al., 2006).
* http://www.ndigital.com.
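To make the role of tracking concrete, the short sketch below shows how a navigation system might chain the transforms involved: the tracker reports the pose of the reference frame attached to the patient and the pose of a tool, and the patient-to-image registration (for example, the landmark fit above) maps the tool tip into the coordinate system of the preoperative images. The 4 x 4 matrices and the tip offset are illustrative values, not output from any real tracking device.

```python
import numpy as np

def pose(R, t):
    """Build a 4x4 homogeneous transform from a rotation matrix and a translation."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

# Illustrative poses reported by the tracker (tracker coordinates, mm).
tracker_T_reference = pose(np.eye(3), [100.0, 50.0, 1200.0])  # reference frame on the patient
tracker_T_tool = pose(np.eye(3), [160.0, 40.0, 1150.0])       # tracked tool body

# Illustrative patient-to-image registration (e.g., from the landmark fit above).
image_T_patient = pose(np.eye(3), [-20.0, 5.0, 30.0])

# Tool tip expressed in the tool's own coordinate system (from a tip calibration).
tip_in_tool = np.array([0.0, 0.0, 150.0, 1.0])

# Chain: tool -> tracker -> patient reference -> image space.
patient_T_tool = np.linalg.inv(tracker_T_reference) @ tracker_T_tool
tip_in_image = image_T_patient @ patient_T_tool @ tip_in_tool
print("Tool tip in image coordinates (mm):", tip_in_image[:3])
```

The assumption here is that the reference frame is rigidly attached to the patient, so its coordinate system can stand in for patient space; real systems add calibration and error handling around this basic chain.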

Augmented Reality for Image-Guided Surgery

523

20.2 DESCRIBING AUGMENTED REALITY IGS SYSTEMS: DATA, VISUALIZATION PROCESSING, VIEW
In 2012, the data, visualization processing, and view (DVV) taxonomy for describing
augmented reality IGS systems was proposed (Kersten-Oertel et al., 2012). This taxonomy defined a common language with which to describe and discuss AR systems for IGS. Three major factors (data, visualization processing, and view), the relationships between them, and their classes and subclasses were described (Figure 20.2).
In addition to the three major components of the DVV taxonomy, it is important to
define and consider the surgical scenario.
The surgical scenario defines the type of surgery considered and a detailed analysis of the surgical steps performed during the surgery. Each surgical step helps to
define the action to be done, its associated precision, and its completion time. By
considering for each surgical step (1) what data should be visualized during that
step, (2) where that data should be viewed, and (3) how it can be interacted with,
we can ensure that a system adapts to the needs of the surgeon as different tasks are
performed.
We briefly describe the major components of the taxonomy in this section and use
the taxonomy to describe augmented reality systems for differing surgical contexts
in the following section.
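To make the structure of the taxonomy easier to follow, the sketch below models its three factors and the surgical scenario as simple data structures. The class and attribute names are our own illustrative choices based on the description in this section, not an official implementation of the DVV taxonomy.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class VisuallyProcessedData:
    name: str            # e.g., "tumor surface" or "vessel volume"
    dimensionality: str  # e.g., "3-D"
    semantics: str       # "anatomical", "operational", or "strategic"
    is_virtual: bool     # virtual object versus real (live) imagery

@dataclass
class VisualizationProcessing:
    techniques: List[str]  # e.g., ["color coding", "decluttering", "fog depth cue"]

@dataclass
class View:
    display: str              # e.g., "surgical microscope", "monitor", "HMD"
    perception_location: str  # e.g., "in the surgical field of view"
    interaction_tools: List[str]

@dataclass
class SurgicalStep:
    action: str
    data: List[VisuallyProcessedData]
    processing: VisualizationProcessing
    view: View

@dataclass
class SurgicalScenario:
    surgery_type: str
    steps: List[SurgicalStep] = field(default_factory=list)
```

Framed this way, the question posed above for each surgical step (what data is shown, where it is viewed, and how it is interacted with) amounts to filling in one SurgicalStep record.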

20.2.1Data
An important consideration for AR IGS systems is the data that is processed and
visualized for the end user. Data falls into two main classes: patient-specific data
and visually processed data. The different subclasses of data may be directly viewed
or may undergo one or more transformations to become visually processed data.
Patient-specific data may include clinical scores, patient demographics, and signal
or raw imaging data. Here we focus on visually processed data, which may be raw
imaging data, analyzed imaging data, prior knowledge data, or derived data. The
most important characteristics of visually processed data are the dimensionality of
the data (i.e., 1-D, 2-D/2.5-D, 3-D, or 4-D), whether in the AR IGS system the data is represented as a real or virtual object, and the semantics of the data.
The notion of semantics refers to the meaning of the data at a particular surgical step. For example, visually processed data may have a meaning that is strategic,
operational, or anatomical. The most common semantic of visually processed data is
anatomical, or dealing with the physiology and pathology of a patient. Typically, in
AR IGS systems, anatomical models of patient data such as organs, vessels, tumors, or
other objects of interest are combined with live images of the patient to allow a surgeon
to see beyond the exposed anatomical surface of the patient. Data with an operational
semantic are used to represent surgical actions or tasks. As an example, consider the
use of a data primitive to represent different states in tumor biopsy surgery; as a needle approaches a tumor, the tumor's representation changes from a surface to a wireframe mesh. Another typical example of an operational semantic is the use of color to represent the location of a surgical tool: as the tool approaches a high-risk area, the color of the tool or another indicator may change to red to alert the surgeon.

FIGURE 20.2 The three factors of our visualization taxonomy (i.e., data, visualization processing, and view), as well as the classes and subclasses (solid-line arrows) that represent them and the relationships between them (dashed-line arrows), are shown. Numbers in the figure specify the cardinality of the relationships. The surgical scenario is associated with both the visually processed data and view classes. The view is the component that is interacted with by the user and, therefore, the component that is most limited by the constraints of the operating room, the surgeon, the surgical task, etc. Asterisks represent cardinality: * represents a many relationship, 0..* means optional many-to-many on both sides, and 1..* means one-to-many.


Lastly, data may have a strategic semantic that is concerned with planning and guidance. In IGS, plans are often visualized and displayed for the surgical team; these may
include the representation of virtual tools and their planned paths.
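The operational-semantic example above (a tool changing color as it nears a high-risk structure) boils down to a simple distance check in image space. The sketch below shows one way such logic could look; the distance thresholds and structure position are invented for illustration and are not taken from any cited system.

```python
import numpy as np

# Hypothetical centre of a high-risk structure (e.g., a major vessel) in image space (mm).
HIGH_RISK_CENTRE = np.array([42.0, -10.5, 88.0])

# Illustrative thresholds (mm) separating the three display states.
WARN_MM, ALERT_MM = 15.0, 5.0

def tool_display_state(tip_in_image):
    """Map the tracked tool tip position to an operational display state."""
    distance = float(np.linalg.norm(np.asarray(tip_in_image) - HIGH_RISK_CENTRE))
    if distance < ALERT_MM:
        return distance, "red"      # alert: tool very close to the structure
    if distance < WARN_MM:
        return distance, "yellow"   # warn: tool approaching the structure
    return distance, "green"        # safe: normal rendering

distance, colour = tool_display_state([50.0, -8.0, 95.0])
print(f"Tool is {distance:.1f} mm from the structure; render it {colour}.")
```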
Raw imaging data is data acquired from a particular acquisition sensor, for example, MRI data, CT data, x-ray image data, and microscope data. After raw imaging data undergoes a transformation, it becomes analyzed imaging data, which is visualized for the end user of the system. As an example, whereas the raw data could be the
direct output of the computed tomography angiography (CTA), the corresponding
analyzed imaging data could be the slices rendered as a volume showing only the
vessels. The most common attribute of analyzed imaging data is the data primitive
that represents the data, for example, a point, a line, a surface, or a volume.
Prior knowledge data is derived from generic models, such as atlases, labels, surgery roadmaps, surgical tool models, or accuracy or uncertainty information about
the IGS system. An example of the use of prior knowledge data in IGS is the surgical
tool models that are visualized with respect to patient anatomy on the navigation
system.
Derived data comes from processing either patient-specific data or prior knowledge data. Typical examples of data deriving from patient-specific data include
uncertainty measurements due to segmentation or registration, measurements such
as tumor volumes, and distances between regions of interest. Data derived from prior
knowledge could be brain regions segmented using an anatomical atlas.
In order to best determine how to visually display the data, it is important to
take into consideration the surgical step at which the data will be shown and the
characteristics and attributes of the different types of data. Some of these attributes
may include the imaging modality of the data, whether the data is preoperative or
intraoperative, and the semantics of the data.

20.2.2 Visualization Processing


The visualization processing component of the DVV taxonomy defines how the data
is transformed and the specific visualization techniques used to provide the best possible illustrative representation of the data. Numerous computer graphics and image
processing techniques have been proposed for visualizing medical imaging data
in the context of IGS systems. Some of these include nonphotorealistic rendering
(NPR) techniques, color coding, surface shading, volume rendering, adding depth
cues, or using saliency methods (e.g., highlighting).
In Figure 20.3, we show a number of different visualization techniques in the
context of augmented reality neurovascular surgery (Kersten-Oertel et al., 2013a). When the entire vasculature of the patient is shown overlaid onto the skull of the patient (before decluttering), it is very difficult for the surgeon to get comprehensible information because many vessels overlap at different depths. However, when the vessels are decluttered, only the most relevant information is shown, in a way that is easy to perceive and understand. Color coding the vessels based on whether they are veins or arteries allows the surgeon to differentiate them intraoperatively. Lastly, the depth cue of fog is added, giving the surgeon a better understanding of the relative depth of the vessels.

FIGURE 20.3 An exploration of different volume rendering visualization techniques for augmented reality neurovascular surgery (panels: color-coding, decluttering, fog). By color-coding vessels, the surgeon can more easily differentiate intraoperatively between veins and arteries. By showing only the most relevant vessels around the malformation, we reduce the amount of information and make the image more perceptible. Lastly, by adding the depth cue of fog, a better understanding of the relative depth of vessels is given.

By carefully considering and determining the most appropriate visualization technique, it is possible to increase the diagnostic value of the original data. It may be possible and desirable to reduce the complexity of the data in such a way that there is enough data to inform the surgeon and that this information is quickly perceivable and understandable. For example, rather than rendering an entire brain volume around a tumor, the data may be reduced to segmented strips of eloquent cortical areas around the tumor.
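As a small illustration of one of the techniques mentioned here, the sketch below shows a simple way depth could be encoded as a fog cue: each rendered vessel fragment is blended toward a fog color according to its normalized depth, so deeper structures appear more washed out. This is a generic illustration of the idea, not the rendering pipeline used in the cited system, and the near/far limits and colors are arbitrary.

```python
import numpy as np

def apply_depth_fog(colors, depths, fog_color=(0.7, 0.7, 0.7), near=0.0, far=120.0):
    """Blend per-fragment RGB colors toward a fog color with increasing depth (mm).

    colors: (N, 3) array of RGB values in [0, 1]; depths: (N,) array of depths.
    """
    # Normalize depth to [0, 1] and use it directly as the fog blending factor.
    f = np.clip((np.asarray(depths) - near) / (far - near), 0.0, 1.0)[:, None]
    return (1.0 - f) * np.asarray(colors) + f * np.asarray(fog_color)

# Two vessel fragments: an artery (red) near the surface and a vein (blue) deeper in.
colors = [[0.9, 0.1, 0.1], [0.1, 0.2, 0.9]]
depths = [10.0, 90.0]
print(apply_depth_fog(colors, depths))
```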

20.2.3 View
The view factor of the DVV taxonomy describes (1) what the end user sees, (2) where the end user sees it, and (3) how the end user can interact with the data and the system. As such, the view comprises three subcomponents: the display used, the perception
location, and the interaction tools. The view is strongly linked to the surgical scenario;
each of the visually processed data for a given surgical step is presented at a particular
perception location, on a particular display, and may have only a subset of possible
interactions associated with it.
The constraints of IGS and the OR may require domain-specific solutions for the
view factor. For example, both the display and perception location are limited by the
physical constraints of the OR, including the need for sterile equipment in a particular area of the OR, the limited space available for equipment, and the assurance that the surgeon is in no way impeded from moving and behaving as he/she would normally. The interaction tools are also important in the OR and may also require domain-specific solutions that take into account the constraints of the surgeon or surgical team, such as the cognitive load of the end user (including the amount of relevant information that can be presented at any point in time), the reaction of the user to system errors, and the user's workflow.
In AR IGS systems, a number of different display devices have been proposed
including the surgical microscope, a monitor, a head-mounted display (HMD), a projector, and a half-silvered mirror. Depending on the display, the end user will get
either a 2-D (e.g., a computer monitor) or 3-D impression of the augmented reality
visualization. Three-dimensional technologies can either be autostereoscopic or binocular stereoscopic. Whereas binocular stereoscopic visualization requires the use
of special glasses or headwear, autostereoscopic displays, such as multiview lenticular displays, videography, and holography, do not.
The perception location describes where the end user looks to take advantage of
the augmented reality visualization; this may be on or in the patient, a part of the
environment such as the wall, a surgical tool, or another digital device such as a
monitor. The perception location falls into one of two categories, those that require
the surgeon to look away from the surgical field of view and those that project the
virtual objects into the surgical field of view.
Interaction tools fall into two subclasses: hardware interaction tools and virtual
interaction tools. Hardware interaction tools are the physical devices used by the
end-user to interact with the system. A partial list includes keyboards, mice, surgical
tools, tangible objects, and data gloves. The virtual interaction tools describe how
the end user can interact with the data and visualization. Typical examples of interactions used in IGS systems include: tuning volume rendering transfer functions,
volume cutting, voxel peeling, using clipping planes, turning data visibility on and off, and, in general, adjusting data properties such as color, brightness, contrast, and
transparency.
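To give a flavor of the first interaction listed (tuning a volume rendering transfer function), the sketch below implements a minimal one-dimensional transfer function that maps image intensities to color and opacity by linear interpolation between user-adjustable control points. The control-point values are arbitrary and purely illustrative, not taken from any particular IGS system.

```python
import numpy as np

# Control points: intensity -> (R, G, B, opacity). Adjusting these is the "tuning".
control_points = {
    0:    (0.0, 0.0, 0.0, 0.0),   # air/background: fully transparent
    400:  (0.8, 0.4, 0.3, 0.1),   # soft tissue: faint reddish, mostly transparent
    1200: (1.0, 1.0, 0.9, 0.8),   # bone: bright and mostly opaque
}

def apply_transfer_function(intensities, points):
    """Map scalar intensities to RGBA by linear interpolation between control points."""
    xs = np.array(sorted(points))
    table = np.array([points[x] for x in xs])  # shape (K, 4)
    rgba = np.empty((len(intensities), 4))
    for channel in range(4):
        rgba[:, channel] = np.interp(intensities, xs, table[:, channel])
    return rgba

voxels = np.array([0, 250, 800, 1500])
print(apply_transfer_function(voxels, control_points))
```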

20.3 SURGICAL APPLICATIONS OF AR


Augmented reality has been studied for use in many different computer-assisted
surgery applications. In this section, we describe the most popular surgical applications for augmented reality and use the DVV taxonomy to give domain-specific examples of the types of data, visualization processing, and views that have been proposed (Figure 20.4).

20.3.1Neurosurgery
Neurosurgery was one of the first applications of computer-assisted surgery systems and is currently the most common application for augmented reality IGS
systems. There are two main reasons for this. First, in neurosurgery, surgeons must resect the smallest possible volumes of tissue in a very narrow
operative field while trying to minimize damage to the eloquent areas of the brain
(Paul et al., 2005; Shuhaiber et al., 2004). Second, in neurosurgery, the surgical
anatomy is constrained within a fixed space (the skull), which allows for feasible
registration (Shuhaiber, 2004). Other soft tissues require more complex registration techniques.

FIGURE 20.4 Augmented reality systems for different surgical applications. (a) Neurosurgery: augmented reality view in image-guided neurovascular surgery. Volume-rendered, color-coded vessels are overlaid on top of the patient's dura to guide the surgeon and allow for localization of vessels of interest. (From Kersten-Oertel, M. et al., Augmented reality in neurovascular surgery: First experiences, Proceedings of AE-CAI, Boston, MA, LNCS, vol. 8678, pp. 80–89, 2014.) (b) Liver surgery: augmented reality laparoscopic liver surgery. (From Haouchine, N. et al., Image-guided simulation of heterogeneous tissue deformation for augmented reality during hepatic surgery, ISMAR: IEEE International Symposium on Mixed and Augmented Reality, Adelaide, SA, pp. 199–208, 2013.) A real-time biomechanical model is overlaid on a wireframe-rendered liver, the tumor is depicted in magenta, and different veins are shown in blue, green, and purple. (Image courtesy of N. Haouchine.) (Continued)

FIGURE 20.4 (Continued) Augmented reality systems for different surgical applications. (c) Orthopedic surgery: augmented reality view for application in orthopedic surgery; the virtual spine object segmented from CT is merged with the real. A virtual mirror is used to see the spine from different perspectives. (From Bichlmeier, C., Immersive, Interactive and Contextual In-Situ Visualization for Medical Applications, Technische Universität München, München, Germany, 2010; Image courtesy of C. Bichlmeier and P. Fallavolita.) (d) Dental surgery: augmented reality system for dental implants; cylinders overlaid on a skull phantom represent positioning information for implant placement. (From Katić, D. et al., Knowledge-based situation interpretation for context-aware augmented reality in dental implant surgery, in: Liao, H., Eddie Edwards, P. J., Pan, X., Fan, Y., and Yang, G.-Z. (eds.), Medical Imaging and Augmented Reality, Lecture Notes in Computer Science, vol. 6326, Springer, Berlin, Germany, 2010b, pp. 531–540; Image courtesy of D. Katić.) (Continued)

(e) Keyhole surgery: augmented reality view of thorax for minimally invasive port-based surgery. (From Bichlmeier, C., Immersive, Interactive and Contextual In-Situ Visualization for Medical Applications, Technische Universität München, München, Germany, 2010; Image courtesy of C. Bichlmeier and P. Fallavolita.) (f) Laparoscopic gallbladder surgery: augmented view of laparoscopic image in gallbladder surgery. (From Katić, D. et al., Off. J. Comput. Med. Imaging Soc., 37, 174, 2013.) The virtual object represents an anatomical area to avoid. (Image courtesy of D. Katić.)

One of the first augmented reality neurosurgical applications was proposed by Gleason et al. (1994), where 3-D segmented virtual objects from preoperative patient scans (e.g., tumors, ventricles, the brain surface) were combined with live video images of the patient. This augmented reality view of the patient, with superimposed anatomical data objects, could guide surgeons to locate particular areas of interest and help a surgeon to plan a resection corridor to a lesion. Over the last 20 years,
many other specific applications in neurosurgery have been proposed for augmented
reality visualization. Some of these are: transsphenoidal surgery (where for example
pituitary tumors are removed through the nose and the sphenoid bone); microscope
augmented reality IGS systems for neurosurgery, otolaryngology and ENT (ear,
nose, and throat) surgery, and craniotomy planning.
20.3.1.1 Data
The most common type of analyzed data that has been used as virtual objects to
be projected into the optical path of the microscope, mixed with live video images,
or rendered on a HMD for image-guided neurosurgery (IGNS) is anatomical data.
Typically, lesions and tumors, eloquent brain areas to be avoided, vessels, or other
anatomical structures or functional data segmented and processed from MRI, CT,
and/or angiographies are displayed. This type of analyzed data has been typically
rendered as either surface or wireframe objects.
Only a few research groups have described the visualization of derived data. In Kawamata et al.'s (2002) AR neurosurgical system for transsphenoidal surgery, the distance of a tool to the tumor is shown numerically in a bar graph that represents the status of the approach to the tumor. In a proposed AR IGNS system for needle biopsies (Sauer et al., 2002; Wacker et al., 2006), the derived data comes in the form of extrapolated needle paths. Whereas the visualization of the needle (a blue cylinder) is prior-knowledge data, the extrapolation of the needle path (depicted in yellow), which depicts the needle's computed trajectory, is considered derived data. Prior-knowledge data in IGNS systems has typically taken the form of surgical tools that are visualized on preoperative data. Another form of prior-knowledge data is preoperative plans that are visualized during the surgery. In the work by Wörn et al. (2005), trajectories for bone cutting, points for biopsy, and planned boreholes are all visualized to aid the surgeon in carrying out the preoperative plan.
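To make the distinction between prior-knowledge and derived data concrete, the following minimal sketch (Python with NumPy; the poses and the target position are hypothetical values, not taken from the cited systems) extrapolates a tracked needle axis into a candidate path and computes one derived scalar, the closest approach of that path to a target.

# Sketch: an extrapolated needle path as "derived data" computed from the
# tracked needle pose ("prior-knowledge data"). All values are illustrative.
import numpy as np

def extrapolate_needle_path(tip, direction, length=120.0, n_samples=50):
    """Extrapolate the needle trajectory from its tracked tip and axis."""
    direction = direction / np.linalg.norm(direction)
    t = np.linspace(0.0, length, n_samples)        # millimetres along the axis
    return tip + t[:, None] * direction             # (n_samples, 3) path points

# Hypothetical tracked needle pose in the patient coordinate frame.
tip = np.array([10.0, -5.0, 40.0])
axis = np.array([0.1, 0.05, 1.0])
path = extrapolate_needle_path(tip, axis)

# Derived scalar shown to the surgeon: closest approach to a biopsy target.
target = np.array([15.0, -2.0, 140.0])
closest_distance = np.min(np.linalg.norm(path - target, axis=1))
print(f"closest approach to target: {closest_distance:.1f} mm")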
20.3.1.2 Visualization Processing
Visualization processing in IGNS systems, and in IGS systems in general, has largely been limited to a few simple techniques. The most commonly described visualization techniques are color-coding objects of interest (based on the type of object or object state) and using transparency to blend objects together. A few other techniques that have been used in IGNS are depth cues such as hidden line removal and stereo projection (Edwards et al., 1995), lighting and shading (King et al., 2000), and saliency techniques such as highlighting objects of interest (Pandya et al., 2005).
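The two dominant techniques, color coding and transparency blending, amount to very little computation per pixel, as the following NumPy sketch illustrates (the frame, mask, and color are synthetic placeholders, not data from any of the cited systems).

# Sketch of color-coding a segmented object and alpha-blending it with the
# live video frame; arrays and values are illustrative assumptions only.
import numpy as np

def overlay(frame, mask, color=(255, 0, 0), alpha=0.5):
    """Blend a color-coded binary mask (the virtual object) into a video frame."""
    out = frame.astype(np.float32)
    color = np.array(color, dtype=np.float32)
    out[mask] = (1.0 - alpha) * out[mask] + alpha * color   # transparency blend
    return out.astype(np.uint8)

frame = np.zeros((480, 640, 3), dtype=np.uint8)            # stand-in video frame
mask = np.zeros((480, 640), dtype=bool)
mask[200:280, 300:380] = True                               # stand-in tumor silhouette
augmented = overlay(frame, mask, color=(255, 0, 0), alpha=0.4)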
20.3.1.3 View
The most common display device for neurosurgical AR applications is the surgical microscope. In many neurosurgical procedures a microscope is used to get a magnified view of the patient's brain. Choosing the microscope as the display device is a logical solution, as the device is already present in the operating theater and therefore there is little or no additional cost in using it. Furthermore, using the microscope reduces the disruption to the surgeon's workflow.
A number of research groups have developed AR image-guided neurosurgery systems that use a surgical microscope or a head-mounted variation of a microscope.

One example is the MAGI (microscope-assisted guided intervention) system, which uses a stereomicroscope as the display device. In the MAGI system a surface- or wireframe-rendered anatomical object of interest is projected into each of the eyepieces of the microscope, enabling stereoscopic visualization of virtual features (Edwards et al., 1995, 1999; Nijmeh et al., 2005). A related work is the Varioscope AR (Birkfellner et al., 2002, 2003; Figl et al., 2002), a custom-built, head-mounted operating microscope that allows for correct stereoscopic visualization of virtual features. The virtual features are projected by means of two miniature displays onto the focal plane of the main lens of the microscope. Another example of microscope use in AR IGNS was proposed by Aschke et al. (2003). In their work a neurosurgical microscope was outfitted with a micro optical bench connected to the microscope's beam splitters, allowing for correct stereoscopic visualization of the virtual features presented in the optical path of the microscope.
Although microscope-based augmented reality IGNS systems have been studied for use in neurosurgery since the 1990s, only recently have such systems come onto the market. The Zeiss OPMI 1 microscope* may be equipped with an infrared tracker allowing for monoscopic overlay of virtual contours onto the optical path of the microscope. Although some suggest that monoscopic overlays are problematic due to the lack of perceptual cues present, a recent study showed that surgeons performed best on a positioning accuracy task when using a microscope with monocular image injection in comparison to a 2-D or 3-D monitor (Rodriguez Palma et al., 2012). As the authors did not compare stereoscopic microscope AR in this study, it may be that surgeons performed better because they did not have to look away from the task at hand, not due to the dimensionality of the displays.
Other display devices that have been used for neurosurgery (in order of popularity) are HMDs, projectors, and computer monitors. One example of a HMD system comes from the work of Sauer et al. (2002). A video see-through HMD is worn by the surgeon, and live video images of the patient are augmented with segmented anatomical objects of interest. One of the first AR IGNS systems comes from the work of Grimson et al. (1999), where a monitor was used as the display device; anatomical structures, segmented from MRI, were overlaid on video images of the patient's head.
Given the tendency of using the microscope as the display device for neuronavigation, the perception location is most commonly the patient. When the computer monitor is used, the perception location is typically the monitor that is associated with the IGNS system. A few systems, for example, the one proposed by Paul et al. (2005), allow the user to choose between the surgical microscope and the monitor that is part of the IGNS system as the perception location.
The majority of publications describing AR IGNS systems do not mention the use of particular hardware interaction tools or the ways in which a user can interact with the AR system (software interaction tools). There are a few exceptions. Hardware interaction tools were described by Grimson et al. (1999), who used a mouse, and by Wörn et al. (2005), who used either a mouse or a 6DOF Phantom device connected to a virtual tool to interact with their AR system.
* http://corporate.zeiss.com/content/dam/Corporate/pressandmedia/downloads/Carl_Zeiss_Innovation_English_Issue_14.pdf.


In terms of software interaction tools, Grimson et al. (1999) developed a graphical user interface that allows a user to rotate and translate data in order to map the 3-D anatomical model to laser range data of the patient. The user can also remove laser data point outliers. Furthermore, the user can control the selection and opacity of anatomical structures. In the work by Lorensen et al. (1993), it is an operator (not the surgeon) who controls the AR view by mixing live video with anatomical models. In the AR IGNS system developed by Giraldez et al. (2006) the surgeon can adjust the brightness and color of the overlay image using a menu. In the system proposed by Paul et al. (2005), the user can toggle each element in the scene to be visible or not visible, and change the opacity, color, and pose of the overall scene.
The graphical user interface of IGNS systems has not generally been described well in the literature. One exception comes from the system developed by Kosaka et al. (2000); the interface has a main window that depicts the virtual object merged with the microscope view, in addition to three slice views of the patient's preoperative images showing the overlaid surgical instruments.

20.3.2 Craniofacial, Maxillofacial, and Dental Surgery


The second most common application of augmented reality IGS systems is craniofacial and maxillofacial surgery. There have also been a few AR systems proposed for
dental surgery, which also falls under this category. AR in these types of surgeries
enables visualization of osteotomy lines or tumor margins from surgical plans on
the patient. By visualizing deep structures beyond the surface of the patient, the
invasiveness of this type of surgery is reduced (Shuhaiber, 2004).
20.3.2.1 Data
The most common forms of analyzed data for craniofacial, maxillofacial, and dental surgery are anatomical data and planning data. Most typically, osteotomy lines, tumor boundaries, nerves, and surgical plans are rendered as the virtual objects that are merged with live images of the patient. Unlike other surgical applications where surfaces and volumes are often rendered, in these types of surgery the object type is most commonly simple points, lines, or contours. This is likely the case for two reasons: first, only these types of objects (i.e., vector objects) can be displayed on the patient with a projector, and second, these types of objects are sufficient for guiding surgery dealing with rigid objects (facial bones, jaws, teeth, etc.).
Derived data have not often been discussed in publications relating to augmented reality systems for these types of surgery. One exception comes from the work of Trevisan et al. (2006), where a dynamic sphere shows the distance of a tool from a dental implant site and changes color based on its position (i.e., changes to green when the tool is on the correct path). Katić et al. (2010), in their augmented reality system for guidance of dental surgery, proposed a context-aware visualization for derived data in the form of color-coding objects of interest (virtual screws, instruments, etc.) based on the distances between them. Such a visualization, which is based on the current surgical scenario, is a move toward systems that enable choosing the best visualization method for the most relevant virtual objects based on
analyzing the current surgical step. Such systems can simplify the user interface
and thus lower the cognitive burden of the surgeon-user.
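A minimal sketch of such distance-driven color coding is given below (Python/NumPy; the tolerances and colors are assumptions for illustration and do not reproduce the cited systems).

# Sketch: map the tool-to-target distance to a guidance color, in the spirit
# of Trevisan et al. (2006) and Katić et al. (2010); thresholds are assumed.
import numpy as np

def distance_color(tool_tip, target, on_path_tol=2.0, warn_tol=5.0):
    """Return an RGB color for the guidance cue given the tool-target distance (mm)."""
    d = float(np.linalg.norm(np.asarray(tool_tip) - np.asarray(target)))
    if d <= on_path_tol:
        return (0.0, 1.0, 0.0), d      # green: tool is on the planned path
    if d <= warn_tol:
        return (1.0, 1.0, 0.0), d      # yellow: approaching the tolerance limit
    return (1.0, 0.0, 0.0), d          # red: off the planned path

color, distance = distance_color([12.0, 3.0, 55.0], [11.0, 2.5, 54.0])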
The types of prior-knowledge data that have been visualized in AR systems for
craniofacial, maxillofacial, and dental surgery are the tracked surgical tools used
(e.g., saws).
20.3.2.2 Visualization Processing
Perhaps due to the use of primarily vector objects, visualization processing of data
for these types of surgery is not typically described. When surface models have been
used, only transparency (Liévin and Keeve, 2001; Trevisan et al., 2006) and color coding (Trevisan et al., 2006) have been noted as visualization techniques.
20.3.2.3 View
The most common display device used in craniofacial, dental, and maxillofacial surgery is a projector. One example comes from the work by Marmulla et al. (2004, 2005), where osteotomy lines and tumor boundaries segmented from CT images were projected onto the patient to guide and help the surgeon navigate with respect to his/her surgical plan. Other display devices that have been used for these types of surgeries include head-mounted displays (HMDs), portable screens, and monitors. An example where a portable LCD screen (the X-Scope) was used comes from the work of Mischkowski et al. (2005, 2006). In this system, the surgeon holds the portable screen and walks around the patient, taking advantage of the combination of a volume-rendered patient model from CT or MRI with the live images of the patient. Salb et al. (2002) used an optical see-through HMD in their proposed system for craniofacial surgery, and Katić et al. (2010, 2014) used one in their dental AR system.
The majority of display devices commonly used for these types of surgery (except for the monitor) allow the perception location to be the patient. This suggests the importance of not disrupting the surgeon's workflow; using the patient as the perception location ensures that the surgeon need not look away from the surgical field of view to benefit from the AR visualization. Furthermore, when virtual objects are projected onto the patient, the surgeon does not need to mentally transform guidance images from the navigation system to the anatomy of the patient on the table. In their AR system for dental surgery, Katić et al. (2010) proposed an interesting solution that allows the perception location to be either the patient or an area within the surgeon's field of view but not on the patient. Their system offers the user two view methods: (1) an AR view where the virtual objects are overlaid on the real anatomical structures, and (2) an analog view where the information is visualized at a fixed position, for example, the left corner of the view, so that it does not occlude the patient anatomy. This proposed solution avoids distracting the surgeon at particular steps in the surgery and occluding the patient with virtual objects when that information is not needed.
Interactions, whether software or hardware, have not typically been described in the literature pertaining to dental, craniofacial, and maxillofacial surgery. One exception is the work of Trevisan et al. (2006), where manipulations of the virtual objects (rotations, scaling, and zooming) are explicitly described as being possible at all times.


20.3.3 Internal and Soft Tissue Surgery (Heart, Breast, and Liver)
Minimally invasive cardiovascular surgery has been shown to benefit from the application of augmented reality visualization techniques in the OR. Terry Peters' group at the Robarts Research Institute developed an AR visualization system for minimally invasive cardiac procedures, integrating real-time ultrasound and virtual models of the patient's beating heart with tracked surgical instruments (Bainbridge et al., 2008; Linte et al., 2008; Lo et al., 2005). Traub et al. (2004) also described a system for optimal port placement in minimally invasive cardiovascular surgery. Liver surgery, as well as breast cancer and breast-conserving surgery, has also been examined as an application for augmented reality IGS systems.
Although AR IGS systems have been proposed for internal and soft tissue surgeries, developing AR navigation systems for the liver and intestines (pliable organs) or soft tissues such as the breast is difficult because they are nonrigid and deform due to heartbeat, breathing, pressure from laparoscopic instruments, and probing (Shuhaiber, 2004). Deformation is not as significant a problem with surgery of semirigid organs such as bone and brain (Blackwell et al., 2000). This may explain the smaller representation of soft tissue surgeries in the field of augmented reality navigation systems.
20.3.3.1 Data
Analyzed data in soft tissue surgery typically have an anatomical semantic. For liver surgery, segmented surface-rendered tumors, vessels, and whole livers have served as the anatomical virtual data that are merged with the patient image. Rather than using surface-rendered objects, Splechtna et al.'s (2002) system uses volume-rendered intraoperative ultrasound scans that are visualized along with surface-rendered vessel trees for liver surgery. For breast cancer and breast-conserving surgery, typically wireframe tumor and breast models are visualized. Lastly, in cardiac surgery, the IGS system developed by Peters' group (Bainbridge et al., 2008; Linte et al., 2008; Lo et al., 2005) has depicted cardiac surface models as well as 3-D free-form deformation fields that describe the trajectories of the points on the surface model through a cardiac cycle.
A unique form of derived data comes from Sato et al.'s (1998) breast cancer surgical navigation system, where measurements of the cancer's intraductal spread are depicted (i.e., whether or not abnormal cells have spread outside the breast duct). Points are visualized as red to indicate the presence of intraductal spread or green if no intraductal spread is found. In Tomikawa et al.'s (2010) breast-conserving surgery system the puncture line of the surgical needle is depicted. In the liver guidance system described by Splechtna et al. (2002), during the surgery a radiologist places annotation markers (in the form of colored spheres) on the vessel tree of the liver to note regions of interest for the surgeon. In cardiac surgery, Linte et al. (2008) graphically represent uncertainty derived from the target registration error using 95% confidence ellipsoids.
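The geometry of such an uncertainty ellipsoid follows directly from the error covariance: the 95% region of a 3-D Gaussian error is an ellipsoid whose semi-axes are the square roots of the covariance eigenvalues scaled by the 95% chi-square quantile with 3 degrees of freedom. The sketch below (Python/NumPy, with a hypothetical TRE covariance) computes the axes a renderer would then draw; it is a generic construction, not the implementation of Linte et al. (2008).

# Sketch: semi-axes of a 95% confidence ellipsoid from a target registration
# error (TRE) covariance; the covariance values are illustrative assumptions.
import numpy as np

def confidence_ellipsoid(cov, chi2_95=7.815):
    """Semi-axis lengths and directions of the 95% ellipsoid for a 3-D Gaussian error."""
    eigvals, eigvecs = np.linalg.eigh(np.asarray(cov, dtype=float))
    semi_axes = np.sqrt(chi2_95 * eigvals)   # chi-square 95% quantile, 3 DOF
    return semi_axes, eigvecs                 # axis lengths (mm) and directions

tre_cov = np.diag([1.2, 0.8, 2.5])            # hypothetical TRE covariance in mm^2
axes, directions = confidence_ellipsoid(tre_cov)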
Similar to other IGS techniques, prior knowledge data in soft tissue and internal
surgery has been limited to the depiction of virtual tool models.


20.3.3.2 Visualization Processing


For soft tissue and internal surgery, both color coding of structures and transparency for the superimposition of virtual and real data have most commonly been used, for example, by Soler et al. (2004). In addition to color coding and transparency, Scheuering et al. (2003) used lighting in the form of Phong or surface shading for their volume-rendered tumors, vessels, and objects of interest. In the work of Samset et al. (2008), occlusion is used to aid in the perception of virtual and real objects; edges from the live video of the patient are enhanced and rendered with the overlay and virtual object.
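A simple way to approximate this edge-preserving compositing is sketched below using OpenCV (an assumed toolkit; the thresholds and synthetic images are placeholders, and the sketch is not the authors' pipeline).

# Sketch: blend the virtual overlay with the live frame, then re-draw strong
# edges of the real scene on top so real structures still read as in front.
import cv2
import numpy as np

def edge_preserving_overlay(frame_bgr, overlay_bgr, alpha=0.5, canny_lo=50, canny_hi=150):
    blended = cv2.addWeighted(frame_bgr, 1.0 - alpha, overlay_bgr, alpha, 0.0)
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, canny_lo, canny_hi)          # edges of the real scene
    blended[edges > 0] = frame_bgr[edges > 0]            # keep real pixels on edges
    return blended

frame = np.full((480, 640, 3), 90, dtype=np.uint8)       # stand-in camera frame
overlay = np.zeros_like(frame)
overlay[:, :, 2] = 180                                    # stand-in virtual rendering
result = edge_preserving_overlay(frame, overlay)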
20.3.3.3 View
For internal and soft tissue surgery, two display devices have typically been used: HMDs and monitors. In an interesting work, Suthau et al. (2002) examined different AR display solutions in the context of liver surgery and compared video see-through HMDs, optical see-through HMDs, and virtual retinal displays/retinal projection HMDs. The authors concluded that there is a need for better see-through technologies and that current displays are suboptimal. The authors also mentioned that HMDs were too heavy and awkward to use and wear. Since the publication of that work, many new augmented reality digital see-through headwear devices have been developed. Lightweight devices such as smart glasses (e.g., Google Glass* or EyeTap), which are optical HMDs that allow for an augmented reality overlay, are beginning to be used for medical applications. These types of devices can reflect projected digital images from CT or x-ray as well as allow the user to see through them.
Given the split between HMDs and monitors used in these types of surgery, the perception location has been divided between the patient and a monitor.
Virtual interaction tools have been described in a number of image-guided liver surgery systems. In the work of Splechtna et al. (2002), a radiologist interacts with and controls the AR visualization to support and guide the surgeon. The radiologist views the same scene on a monitor as the surgeon sees in the HMD and registers the preoperative models of the vessel tree to the real-world liver. The radiologist and surgeon are both video- and audio-linked; if different views of the liver are needed to orient the vessel tree, the radiologist can tell the surgeon to reorient the HMD. In a work by Soler et al. (2004), the user can interact with each of the anatomical structures, change the transparency of objects, navigate through the surgical scene, and use cutting planes.
Hardware interaction tools have also been described in a number of works pertaining to liver surgery. In Paloc et al.'s (2004) mixed reality system for liver surgery, rather than using a mouse, key presses allow the surgeon to switch between monoscopic and stereoscopic views or between real and virtual images. In Splechtna et al.'s (2002) AR system for liver surgery, the radiologist can interact with the system by using both a three-button mouse and a SpaceMouse (a 6-degrees-of-freedom [6DoF] interaction device).

http://www.googleglasssurgeon.com/surgery.
http://eyetap.org/.


20.3.4 Endoscopic and Laparoscopic Surgery


In endoscopic surgery, including laparoscopic and arthroscopic surgery, a telescopic rod with a video camera is passed through a small incision or body orifice and is used to guide surgical instruments. Endoscopic surgery is a popular surgical application for augmented reality systems. This may be because, with this type of minimally invasive surgery, the surgical field of view is greatly reduced compared to traditional open surgery, and because surgeons are used to seeing the surgical field on a display monitor. By augmenting the video images, the surgeon may be able to get a better understanding of the surrounding physiology of the patient. Augmented reality systems for these types of surgery have been used in the specific domains of liver, digestive, abdominal, prostate, urologic, and robotic surgery.
20.3.4.1 Data
Analyzed data in endoscopic surgeries have typically been wireframe, surface-, or volume-rendered anatomical models that are overlaid on the video coming from the camera at the end of the telescopic rod.
In a number of different AR systems for endoscopic surgeries, derived data have been described. In the laparoscopic system proposed by Soler et al. (2004) for digestive surgery, the distance to a tumor (represented by a green sphere) is numerically displayed. In the arthroscopic system for knee surgery proposed by Tonet et al. (2000), a red highlighted area is depicted on the anatomical surface model to represent the area seen in the arthroscopic view. In the robotic endoscopic AR abdominal surgery system developed by Suzuki et al. (2008), haptic sense information from the right and left robot, a forceps indicator based on hardness, and the patient's vital signs are all visualized for the surgeon. In a recent laparoscopic system for liver surgery proposed by Haouchine et al. (2013), a biomechanical model of a liver built from CT data is used to compute a volumetric displacement field. This allows for the real-time update of the overlay images of the liver and its internal structures (e.g., veins, tumors) caused by motion, for example, due to interaction with a surgical instrument.
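Conceptually, this overlay update step takes the deformation observed (or simulated) at the organ surface and propagates it to the hidden internal structures. The sketch below uses a crude inverse-distance interpolation purely for illustration; Haouchine et al. (2013) instead rely on a real-time biomechanical model, and all point sets here are synthetic.

# Sketch: propagate an observed surface displacement to internal structure
# points so their overlay can be updated (an illustrative stand-in only).
import numpy as np

def propagate_displacement(surface_pts, surface_disp, internal_pts, eps=1e-6):
    """Interpolate surface displacements onto internal points (inverse-distance weights)."""
    d = np.linalg.norm(internal_pts[:, None, :] - surface_pts[None, :, :], axis=2)
    w = 1.0 / (d + eps)
    w /= w.sum(axis=1, keepdims=True)
    return internal_pts + w @ surface_disp

surface = np.random.rand(200, 3) * 100.0          # tracked liver surface points (mm)
disp = np.tile([0.0, 0.0, 3.0], (200, 1))          # hypothetical observed motion
tumor_pts = np.random.rand(50, 3) * 100.0          # internal tumor model points
updated_tumor = propagate_displacement(surface, disp, tumor_pts)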
Prior-knowledge data have not typically been depicted in AR systems for endoscopic surgery. Tool models, for example, are not typically visualized, likely due to the small areas shown in the camera views and the lack of space to depict virtual tool models. Instead, only the location of the tool tip is depicted, typically as a crosshair or arrow.
20.3.4.2 Visualization Processing
As with other types of surgery, color-coding regions or objects of interest and using transparency to overlay camera images with virtual objects are the most common visualization techniques used in endoscopic surgery. Depth cues, however, have also been used to improve perception of the ordering of objects. In the work by Fuchs et al. (1998), the cue of occlusion is used; that is, a pixel is only painted if it lies above the surface of the closest object. Suzuki et al. (2008) employ the cue of stereopsis in the form of stereo pairs. The superimposed anatomy (virtual object) is shown in two windows; the top view shows the left-eye image and the bottom view, the right-eye image. Lastly, a saliency method that highlights in red the anatomical area of a 3-D model in the corresponding arthroscopic view is used by Tonet et al. (2000).
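The occlusion cue reduces to a per-pixel depth comparison between the real surface and the virtual object, as the following NumPy sketch shows (the depth maps and images are synthetic placeholders, not the surface reconstruction used in the cited work).

# Sketch: paint virtual pixels only where the virtual surface is closer to
# the camera than the real surface; all inputs are synthetic placeholders.
import numpy as np

def composite_with_occlusion(real_rgb, real_depth, virt_rgb, virt_depth):
    """Per-pixel nearer-than test between virtual and real depth maps."""
    out = real_rgb.copy()
    visible = virt_depth < real_depth
    out[visible] = virt_rgb[visible]
    return out

h, w = 240, 320
real_rgb = np.zeros((h, w, 3), dtype=np.uint8)
virt_rgb = np.zeros((h, w, 3), dtype=np.uint8)
virt_rgb[:, :, 1] = 255                        # green virtual anatomy
real_depth = np.full((h, w), 100.0)            # mm from the camera
virt_depth = np.full((h, w), 120.0)            # virtual anatomy behind the surface
virt_depth[100:140, 150:200] = 80.0            # except where it protrudes
composited = composite_with_occlusion(real_rgb, real_depth, virt_rgb, virt_depth)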


20.3.4.3 View
In endoscopic surgical systems, including laparoscopic and arthroscopic systems, the display is, with few exceptions, a monitor. The use of a monitor is a natural decision for an AR system for these types of surgery, as traditionally an external monitor is used to view the image coming from the video camera at the end of the scope. In a few systems other display devices, such as HMDs, have been proposed. For example, in Fuchs et al.'s (1998) laparoscopic system, the camera image is augmented with preoperative anatomy and displayed to the end user via a HMD. In the thoracoscopic AR system proposed by Sauer et al. (2006), a HMD is used to display surface models of the spine and objects of interest on the patient, with the thoracoscopic view inset in the end user's AR view. Another example comes from Sudra et al.'s system (2007), where a HMD is used to augment reality in endoscopic robotic surgery.
For endoscopic and laparoscopic navigation systems, the monitor was the most
common perception location. As a monitor is traditionally employed in such surgeries,
its use as the perception location allows the easy extension of traditional navigation
tools and requires no training on the surgeon's part. The patient, however, has also
served as the perception location in numerous endoscopic and laparoscopic AR systems where a HMD was used. This may suggest a shift from using a monitor as
perception location in IGS systems to using the patient as perception location, even in
surgeries where the surgeon traditionally does not look directly at the patient.
A few different interaction tools have been discussed in the literature pertaining to AR systems for endoscopic and laparoscopic surgery. In Teber et al.'s (2009) laparoscopic system for urological surgery, an independent operator uses implanted landmarks to register and combine preoperative CT images with the real-time video from the endoscope. An interesting solution to system interaction was proposed by Sudra et al. (2007). In their endoscopic robotic system, both speech- and gesture-based interactions are possible. In gesture-based interfaces, some set of motions or configurations of the hands or body is recognized by the system as commands. The surgeon can use speech or gestures to switch between visualization methods, for example, to change parameters or to turn annotation information on and off.

20.3.5 Orthopedic Surgery
AR IGS systems have also been proposed for use in orthopedic surgery. Particular
applications for which AR has been used in orthopedic surgery include spinal surgery, hip replacement, long bone fractures, and knee surgery.
20.3.5.1 Data
Analyzed data in AR image-guided orthopedic surgery are most commonly segmented bones from CT scans. In terms of derived data, the AR long bone fracture navigation system proposed by Zheng et al. (2008) shows outlines and centerlines of bone fragments on intraoperative fluoroscopic images. Prior-knowledge data for image-guided orthopedic surgery have only been described in the form of surgical tool models.


20.3.5.2 Visualization Processing


Beyond the use of transparency and color coding, an interesting visualization technique used in a computer-assisted AR virtual fluoroscopic surgical system for long bone fractures was nonphotorealistic rendering (NPR). Zheng et al. (2008) found that when they used photorealistic rendering of the implant models, important aspects of the fluoroscopic images were occluded. They improved the visualization by using a more illustrative rendering style in which only the silhouette lines of the surgical implants were drawn, thereby showing more image details and not occluding the fluoroscopic image during navigation.
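Silhouette extraction of this kind can be implemented by drawing only those mesh edges shared by a front-facing and a back-facing triangle with respect to the current view. The sketch below (Python/NumPy, with a toy tetrahedron standing in for an implant model) illustrates the test; it is a generic construction, not the rendering code of Zheng et al. (2008).

# Sketch: silhouette edges of a closed triangle mesh for a given view direction.
import numpy as np

def silhouette_edges(vertices, faces, view_dir):
    """Return the mesh edges (as vertex-index pairs) that lie on the silhouette."""
    v = np.asarray(vertices, dtype=float)
    f = np.asarray(faces, dtype=int)
    # Per-face normals from the triangle winding (assumed consistently outward).
    normals = np.cross(v[f[:, 1]] - v[f[:, 0]], v[f[:, 2]] - v[f[:, 0]])
    front = normals @ np.asarray(view_dir, dtype=float) < 0.0   # front-facing flags
    edge_faces = {}
    for f_idx, face in enumerate(f):
        for a, b in ((0, 1), (1, 2), (2, 0)):
            edge = tuple(sorted((int(face[a]), int(face[b]))))
            edge_faces.setdefault(edge, []).append(f_idx)
    # Silhouette edges: shared by one front-facing and one back-facing triangle.
    return [e for e, adj in edge_faces.items()
            if len(adj) == 2 and front[adj[0]] != front[adj[1]]]

# Toy, consistently outward-wound tetrahedron standing in for an implant model.
verts = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
faces = [(0, 2, 1), (0, 1, 3), (1, 2, 3), (0, 3, 2)]
edges = silhouette_edges(verts, faces, view_dir=(-1.0, -1.0, -1.0))
print(edges)   # the three edges of the single front-facing face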
20.3.5.3 View
Blackwell et al. (2000) proposed an image overlay system for both surgical teaching and intraoperative guidance of surgeries such as total hip replacement. In their work, they used a semitransparent or half-silvered mirror as the display. A computer-generated image was reflected through the mirror such that the generated images were projected onto the patient. The mirror was used in combination with tracked 3-D shutter glasses so that the virtual objects could be seen in the correct position and in stereo. Bastien et al. (1997) also used a silvered mirror for AR image-guided spinal surgery. More recently, Wu et al. (2014) used a camera-projector system to superimpose preoperative 3-D images onto the patient for guidance in vertebroplasty spinal surgery. Other devices that have been proposed in AR orthopedic surgery are HMDs and monitors.
Although a wide range of display devices has been proposed for orthopedic surgery, similar to dental, craniofacial, and maxillofacial surgery, the perception location has typically been the patient, rather than a monitor or the environment. By projecting anatomical patient models and surgical plans onto the patient's anatomy, the surgeon need not look away when using such surgical tools as drills and saws.
Only a few systems have mentioned the use of interaction tools in image-guided orthopedic surgery. Zheng et al.'s (2008) system discusses a hardware interaction tool; a mouse is used to correct any segmentation errors of the anatomy of interest before overlaying virtual models onto fluoroscopic images. In their work on a HMD AR spinal system, Sauer et al. (2006) mention the use of virtual interaction tools. The surgeon can adjust the transparency of objects and turn object visibility on and off. Furthermore, the surgeon can toggle between the AR view, the thoracoscopic view, and a picture-in-picture mode (where he can move the view inset laterally and in depth so that it does not obstruct important features of the real view).

20.4 VALIDATION AND EVALUATION OF AR IGS SYSTEMS


With the vast amount of research in augmented reality IGS and the many systems
that have been proposed, it is at first glance surprising that augmented reality
techniques have not made it into commercial systems and daily surgical practice. Yet, few research systems have been sufficiently validated and evaluated
to show that there is a benefit to using them, either for the surgeon in the OR or
on patient outcomes. Most systems have been built as proof of concept and have not been sufficiently evaluated and validated. Without proper evaluations of integrated systems, as well as evaluation of each of the components described in the DVV taxonomy, which would show the added benefit of using them, these systems will not become regularly used.
In a review of close to a hundred papers on AR IGS systems, it was found that only in 4% of the cited works was the system's effect on surgical outcomes evaluated (Kersten-Oertel et al., 2013b). Furthermore, the review showed that the majority of systems developed to date have only been validated in terms of the accuracy of the system, whether as a whole or in part, for example, for patient-to-image registration, calibration, or overlay accuracy of the real and virtual images. Validations have typically been performed using numerical methods or phantoms; very few systems have been evaluated on patients in a real clinical setting, and even fewer have been evaluated prospectively.
Based on the paradigm that there is a three-way relationship in the OR between the surgeon, the patient, and the IGS system, Jannin and Korb (2008) proposed that to assess an IGS system one should look at these three components and their interrelationships. Therefore, the assessment criteria fall under measures related to the patient, the surgeon, the IGS system, and the interactions between them: the surgeon and the patient, the IGS system and the surgeon, and the IGS system and the patient. Furthermore, six levels of assessment of IGS systems have been proposed: the technical parameters, reliability in a clinical setting, efficacy in terms of surgical performance, effectiveness in terms of patient outcome, economic aspects of its use, and, lastly, the social, legal, and ethical aspects of its use (Jannin and Korb, 2008).
The specifications of these criteria and assessment levels outline the complexity of validating and evaluating augmented reality IGS systems. Validating systems based on patient-related criteria such as patient outcomes is particularly challenging and time-consuming, as it involves planning and carrying out clinical trials and determining metrics that can objectively compare techniques to determine whether improvements are related to a particular technology or technique used. These difficulties are reflected in the lack of systems that have been used in real clinical settings and evaluated on real patient data.

20.5 SUMMARY
The examination of the DVV components of augmented reality surgical systems
allows for the use of a common language to describe an IGS system based on what
type of data should be visualized, how it should be visualized, at what point in the
surgery it should be visualized, and how the user can interact with the data both in
terms of manipulation on screen and hardware devices for interaction.

20.5.1 Data
When looking at the type of data that is visualized, it is not surprising that across all
surgical domains the most commonly analyzed data is anatomical. Although this is
the case, it is important to highlight the importance of data as a factor, particularly in terms of deciding which of the abundant available patient data should be visualized for the surgeon. For example, in neurosurgery for tumor resection, a tumor and the surrounding anatomy can be segmented from MRI, eloquent cortex adjacent to the tumor can be identified with positron emission tomography (PET) and functional MRI (fMRI), vessels can be visualized with angiography images, and diffusion tensor imaging (DTI) can be used to identify white-matter fiber tracts. Given all of this possible data, it is important to create a coherent visualization with only the relevant information visualized at each surgical step.
Furthermore, future efforts should not only focus on displaying imaging data but also examine the visualization of derived data such as tags, labels, atlas information, and uncertainty information.

20.5.2 Visualization Processing


Given the large amount of research on graphical rendering and visualization of medical image data, it is surprising that proposed techniques in these fields have not
typically been used in AR and IGS in general. The majority of visualized data in AR
IGS systems is color coded and blended with video images using transparency alone.
Yet, the use of color coding and transparency is not ideal, as these simple techniques
do not easily support the understanding of the structure and spatial relationships of
medical imaging data. The use of transparency alone to combine data can complicate the perception of relative depth distances and spatial relationships between surfaces (Hansen et al., 2010). Furthermore, when using stereo viewing in applications such as AR, it has been shown that the use of transparency makes the perception of the depth of the stereo images ambiguous (Johnson et al., 2002). More sophisticated techniques are needed for better understanding of, and interaction with, complex medical and multimodal data sets.
Drawing on research in the computer graphics and visualization communities,
more advanced rendering strategies such as NPR, using halos, lighting, and shading,
and adding additional depth information should be considered to allow for better and
quicker understanding of medical data in the OR.

20.5.3 View
Combined with the trend of preferring particular devices in a specific surgical domain, the variety of display devices used across the various surgical domains suggests that there is no one ideal IGS solution. In fact, a number of studies have shown that choosing the appropriate intraoperative display technology is highly application, task, and user specific (Cooperstock and Wang, 2009; Traub et al., 2008). Therefore, an analysis of the surgical domain, the surgical tasks, the surgeon's workflow, and the OR environment must be carefully done to define the requirements of the display device for a particular surgical application.
Technological advances in display devices are making their way into the OR. Mobile devices, such as the iPod touch and iPad (Apple Computer, Cupertino, California), have been used to display preoperative images in the surgical field of view.
One example of this is Smith & Nephew's Dash Smart Instrument System,* which
is a portable navigation system powered by BrainLab (Feldkirchen, Germany). This
orthopedic navigation system guides the surgeon to accurately place knee and hip
implants. Such ubiquitous devices, which are small and portable, are increasingly
being used across many medical domains and may soon become standard display
devices in the OR.
Although the majority of AR IGS systems continue to use classical interaction
hardware paradigms, such as the mouse and keyboard, these are not appropriate
solutions for the surgical environment (Bichlmeier et al., 2009). However, there has
been little focus on modernizing and developing new solutions for interacting with
visually processed data. One exception, which looked at finding a novel solution to
replace the use of a touch screen or mouse with an IGS system, comes from Fischer
et al. (2005). In their work, surgeons use surgical tools that are already present and
tracked in the OR to perform gestures that are recognized by the tracking system.
The user may click on simple menu markers to load patient data, choose to draw
points or lines, and change the color of virtual objects. In a similar vein, Onceanu
and Stewart (2011) proposed a device into which a tracked surgical probe can be
placed, which allows for joystick-like interaction with a navigation system. With the
advent of new devices such as the Kinect, gesture-based interactions are becoming
more commonly proposed as solutions for interacting with IGS systems.
Two future avenues for research within the view component are likely. First, the
development of appropriate hardware devices or gesture-based techniques that can
be used to interact with visually processed data. These will facilitate interactions
with the surgical navigation system. Second, as surgical modeling and workflow
monitoring techniques advance to allow for recognition of a particular stage of surgery, interfaces will be created that require little to no interaction, again facilitating the surgeon's task. Recognition of situational awareness will allow for the automatic
optimization of not only interaction solutions with respect to the current surgical
activity but also the displayed data and visualization techniques used.
In an ideal IGS system, the view would change without interaction and without
disturbing the surgical workflow; the view solutions would be surgeon specific and
allow for natural interactions with the system. Furthermore, suitable representations
of the most appropriate data would be presented at any given stage of the surgery for
a given task.

20.5.4 Conclusions
The purpose of augmented reality in IGS is to provide surgeons with a better understanding of the connection between preoperative patient models and the operative field. Numerous AR technologies, techniques, and solutions have been proposed for many different types of surgery; yet these solutions have yet to make it into daily use. The specific domain of the OR makes it challenging to evaluate these proof-of-concept systems. However, with new research that has begun to focus on surgical
workflow modeling methods, and novel evaluation metrics specific to OR evaluations of IGS systems, these systems will begin to be properly evaluated and the added benefit of using AR IGS systems in the OR will be demonstrated. Furthermore, we predict that the trend of using augmented reality and ubiquitous computing solutions will continue to increase in daily life. These two factors should push the adoption and diffusion of AR technologies in the surgical community.

* https://www.brainlab.com/surgery-products/overview-platform-products/dash-smart-instrument/.
http://www.xbox.com/en-ca/kinect.

REFERENCES
Alp, M.S., Dujovny, M., Misra, M., Charbel, F.T., Ausman, J.I. (1998) Head registration techniques for image-guided surgery. Neurology Research, 20:3137.
Aschke, M., Wirtz, C.R., Raczkowsky, J., Wörn, H., Kunze, S. (2003) Augmented reality in
operating microscopes for neurosurgical interventions. In: First International IEEE EMBS
Conference on Neural Engineering, Capri Island, Italy, March 2022, 2003, pp. 652655.
Bainbridge, D., Jones, D.L., Guiraudon, G.M., Peters, T.M. (2008) Ultrasound image and augmented reality guidance for off-pump, closed, beating, intracardiac surgery. Artificial
Organs, 32:840845.
Bastien, S., Peuchot, B., Tanguy, A. (1997) Augmented reality in spine surgery: Critical
appraisal and status of development. Studies in Health Technology and Informatics,
88:153156.
Bichlmeier, C. (2010) Immersive, Interactive and Contextual In-Situ Visualization for Medical Applications. München, Germany: Technische Universität München.
Bichlmeier, C., Heining, S.M., Feuerstein, M., Navab, N. (2009) The virtual mirror: A new
interaction paradigm for augmented reality environments. IEEE Transactions on
Medical Imaging, 28:14981510.
Birkfellner, W., Figl, M., Huber, K., Watzinger, F., Wanschitz, F., Hummel, J., Hanel, R. etal. (2002)
A head-mounted operating binocular for augmented reality visualization in medicine
Design and initial evaluation. IEEE Transactions on Medical Imaging, 21:991997.
Birkfellner, W., Figl, M., Matula, C., Hummel, J., Hanel, R., Imhof, H., Wanschitz, F.,
Wagner,A., Watzinger, F., Bergmann, H. (2003) Computer-enhanced stereoscopic vision
in a head-mounted operating binocular. Physics in Medicine and Biology, 48:N49N57.
Blackwell, M., Nikou, C., DiGioia, A.M., Kanade, T. (March 2000) An image o verlay system
for medical data visualization. Medical Image Analysis, 4(1):6772.
Cooperstock, J., Wang, G. (2009) Stereoscopic display technologies, interaction paradigms, and rendering approaches for neurosurgical visualization. In: SPIE Proceedings Stereoscopic Displays and Applications, San Jose, CA, vol. 7237, pp. 723703–723703-11.
Edwards, P.J., Hawkes, D.J., Hill, D.L., Jewell, D., Spink, R., Strong, A.J., Gleeson, M.J.
(1995) Augmentation of reality using an operating microscope for otolaryngology and
neurosurgical guidance. Journal of Image Guided Surgery, 1:172178.
Edwards, P.J., King, A.P., Hawkes, D.J., Fleig, O., Maurer, C.R., Jr., Hill, D.L., Fenlon, M.R.
et al. (1999) Stereo augmented reality in the surgical microscope. Studies in Health
Technology and Informatics, 62:102108.
Eggers, G., Muhling, J., Marmulla, R. (2006) Image-to-patient registration techniques in head
surgery. International Journal of Oral and Maxillofacial Surgery, 35:10811095.
Figl, M., Birkfellner, W., Watzinger, F., Wanschitz, F., Hummel, J., Hanel, R., Ewers, R.,
Bergmann, H. (2002) PC-based control unit for a head mounted operating microscope
for augmented reality visualization in surgical navigation. In: Medical Image Computing
and Computer-Assisted InterventionMICCAI, Tokyo, Japan, vol. 2489, pp. 4451.
Fischer, J., Bartz, D., Straßer, W. (2005) Intuitive and lightweight user interaction for medical augmented reality. In: Proceedings of Vision, Modeling and Visualization (VMV), Erlangen, Germany, November 16–18, pp. 375–382.

544

Fundamentals of Wearable Computers and Augmented Reality

Fuchs, H., Livingston, M.A., Raskar, R., Colucci, D., Keller, K., State, A., Crawford, J.R.,
Rademacher, P., Drake, S.H., Meyer, A.A. (1998) Augmented reality visualization
for laparoscopic surgery. In: Medical Image Computing and Computer-Assisted
InterventionMICCAI, Cambridge, MA, vol. 1496, pp. 934943.
Giraldez, J.G., Talib, H., Caversaccio, M., Ballester, M.A.G. (2006) Multimodal augmented
reality system for surgical microscopy. In: Medical Imaging 2006: Visualization, ImageGuided Procedures, and Display, San Diego, CA, vol. 6141, p. S1411.
Gleason, P.L., Kikinis, R., Altobelli, D., Wells, W., Alexander, E., 3rd, Black, P.M., Jolesz, F.
(1994) Video registration virtual reality for nonlinkage stereotactic surgery. Stereotactic
and Functional Neurosurgery, 63:139143.
Grimson, W.E.L., Kikinis, R., Jolesz, F.A., Black, P.M. (1999) Image-guided surgery. Scientific
American, 280:6269.
Hansen, C., Wieferich, J., Ritter, F., Rieder, C., Peitgen, H.O. (2010) Illustrative visualization
of 3D planning models for augmented reality in liver surgery. International Journal of
Computer Assisted Radiology and Surgery, 5:133141.
Haouchine, N., Dequidt, J., Peterlik, I., Kerrien, E., Berger, M.-O., Cotin, S. (2013) Image-guided simulation of heterogeneous tissue deformation for augmented reality during hepatic surgery. In: ISMAR–IEEE International Symposium on Mixed and Augmented Reality, Adelaide, SA, pp. 199–208.
Hata, N., Dohi, T., Iseki, H., Takakura, K. (1997) Development of a frameless and armless
stereotactic neuronavigation system with ultrasonographic registration. Neurosurgery,
41:608613; discussion 613614.
Jannin, P., Korb, W. (2008) Assessments of image-guided interventions. In: Peters, P., Cleary,K.
(eds.) Image-Guided Interventions: Technology and Applications. New York: Springer,
pp. 531547.
Johnson, L., Edwards, P., Hawkes, D. (2002) Surface transparency makes stereo overlays
unpredictable: The implications for augmented reality. Studies in Health Technology
and Informatics, 94:131136.
Katić, D., Spengler, P., Bodenstedt, S., Castrillon-Oberndorfer, G., Seeberger, R., Hoffmann, J., Dillmann, R., Speidel, S. (2014) A system for context-aware intraoperative augmented reality in dental implant surgery. International Journal of Computer Assisted Radiology and Surgery, 10:101–108.
Katić, D., Sudra, G., Speidel, S., Castrillon-Oberndorfer, G., Eggers, G., Dillmann, R. (2010) Knowledge-based situation interpretation for context-aware augmented reality in dental implant surgery. In: Liao, H., Eddie Edwards, P.J., Pan, X., Fan, Y., Yang, G.-Z. (eds.) Medical Imaging and Augmented Reality, Lecture Notes in Computer Science, vol. 6326. Berlin, Germany: Springer, pp. 531–540.
Katić, D., Wekerle, A.L., Gortler, J., Spengler, P., Bodenstedt, S., Rohl, S., Suwelack, S. et al. (2013) Context-aware augmented reality in laparoscopic surgery. Computerized Medical Imaging and Graphics: The Official Journal of the Computerized Medical Imaging Society, 37:174–182.
Kawamata, T., Iseki, H., Shibasaki, T., Hori, T. (2002) Endoscopic augmented reality navigation system for endonasal transsphenoidal surgery to treat pituitary tumors: Technical
note. Neurosurgery, 50:13931397.
Kersten-Oertel, M., Chen, S.J., Collins, D.L. (2013a) An evaluation of depth enhancing perceptual cues for vascular volume visualization in neurosurgery. IEEE Transactions on
Visualization and Computer Graphics, 20:391403.
Kersten-Oertel, M., Gerard, I., Drouin, S., Mok, K., Sirhan, D., Sinclair, D., Collins, D.L.
(2014) Augmented reality in neurovascular surgery: First experiences. In: Proceedings
of AE-CAI, Boston, MA, LNCS, vol. 8678, pp. 80–89.

Augmented Reality for Image-Guided Surgery

545

Kersten-Oertel, M., Jannin, P., Collins, D.L. (2012) DVV: A taxonomy for mixed reality visualization in image guided surgery. IEEE Transactions on Visualization and Computer
Graphics, 18:332352.
Kersten-Oertel, M., Jannin, P., Collins, D.L. (2013b) The state of the art of visualization in
mixed reality image guided surgery. Computerized Medical Imaging and Graphics: The
Official Journal of the Computerized Medical Imaging Society, 37:98112.
King, A.P., Edwards, P.J., Maurer, C.R., de Cunha, D.A., Gaston, R.P., Clarkson, M., Hill,
D.L.G. et al. (2000) Stereo augmented reality in the surgical microscope. PresenceTeleoperators and Virtual Environments, 9:360368.
Kosaka, A., Saito, A., Furuhashi, Y., Shibasaki, T. (2000) Augmented reality system for surgical navigation using robust target vision. In: IEEE Conference on Computer Vision and
Pattern Recognition, Hilton Head Island, South Carolina, vol. 2, pp. 187194.
Liévin, M., Keeve, E. (2001) Stereoscopic augmented reality system for computer-assisted surgery. In: CARS, vol. 1230, pp. 107–111.
Linte, C.A., Moore J., Wiles A.D., Wedlake, C., Peters, T.M. (March 2008) Virtual realityenhanced ultrasound guidance: A novel technique for intracardiac interventions.
Computer Aided Surgery, 13(2):8294.
Lo, J., Moore, J., Wedlake, C., Guiraudon, G.M., Eagleson, R., Peters, T. (2005) Surgeoncontrolled visualization techniques for virtual reality-guided cardiac surgery. Studies in
Health Technology and Informatics, 142:162167.
Lorensen, W., Kikinis, R., Cline, H., Altobelli, D., Nafis, C., Gleason, L. (1993) Enhancing
reality in the operating-room. In: Proceedings of the IEEE Conference on Visualization
'93, San Jose, CA, pp. 410–415.
Marmulla, R., Hoppe, H., Mühling, J., Eggers, G. (2005) An augmented reality system for image-guided surgery. International Journal of Oral and Maxillofacial Surgery, 34:594–596.
Marmulla, R., Hoppe, H., Mühling, J., Hassfeld, S. (2004) New augmented reality concepts for craniofacial surgical procedures. Plastic and Reconstructive Surgery, 115:1124–1128.
Milgram, P., Kishino, F. (1994) A taxonomy of mixed reality visual displays. IEICE Transactions on Information and Systems, E77-D:1321–1329.
Mischkowski, R.A., Zinser, M., Kubler, A., Seifert, U., Zoller, J.E. (2005) Clinical and experimental evaluation of an augmented reality system in cranio-maxillofacial surgery. In:
CARS 2005: Computer Assisted Radiology and Surgery, vol. 1281, pp. 565570.
Mischkowski, R.A., Zinser, M.J., Kubler, A.C., Krug, B., Seifert, U., Zoller, J.E. (2006)
Application of an augmented reality tool for maxillary positioning in orthognathic surgeryA feasibility study. Journal of Craniomaxillofacial Surgery, Cologne,
Germany, 34:478483.
Nijmeh, A.D., Goodger, N.M., Hawkes, D., Edwards, P.J., McGurk, M. (2005) Image-guided
navigation in oral and maxillofacial surgery. British Journal of Oral and Maxillofacial
Surgery, 43:294302.
Onceanu, D., Stewart, A.J. (2011) Direct surgeon control of the computer in the operating
room. Medical image computing and computer-assisted intervention. In: MICCAI:
International Conference on Medical Image Computing and Computer-Assisted
Intervention, vol. 14, pp. 121128. Toronto, Canada.
Paloc, C., Carrasco, E., Macia, I., Gomez, R., Barandiaran, I., Jimenez, J.M., Rueda, O., di
Urbina, J.O., Valdivieso, A., Sakas, G. (2004) Computer-aided surgery based on autostereoscopic augmented reality. In: Proceedings of the Eighth International Conference
on Information Visualisation, Austin, TX, pp. 189193.
Pandya, A., Siadat, M.R., Auner, G. (2005) Design, implementation and accuracy of a prototype for medical augmented reality. Computer Aided Surgery, 10:2335.

546

Fundamentals of Wearable Computers and Augmented Reality

Paul, P., Fleig, O., Jannin P. (November 2005) Augmented virtuality based on stereoscopic
reconstruction in multimodal image-guided neurosurgery: Methods and performance
evaluation. IEEE Transactions on Medical Imaging, 24(11):15001511.
Rodriguez Palma, S., Becker, B.C., Lobes, L.A., Riviere, C.N. (2012) Comparative evaluation
of monocular augmented-reality display for surgical microscopes. In: Engineering in
Medicine and Biology Society (EMBC), 2012 Annual International Conference of the
IEEE, San Diego, CA: IEEE, pp.14091412.
Salb, T., Brief, J., Burgert, O., Gockel, T., Hassfeld, S., Mühling, J., Dillmann, R. (2002) Intraoperative presentation of surgical planning and simulation results: Augmented reality for craniofacial surgery. In: SPIE Electronic Imaging, International Conference on Stereoscopic Displays and Virtual Reality Systems, vol. 5006, Santa Clara, CA, May 2003.
Samset, E., Schmalstieg, D., Sloten, J.V., Freudenthal, A., Declerck, J., Casciaro, S., Rideng,O.,
Gersak, B. (2008) Augmented reality in surgical procedures. In: Human Vision and
Electronic Imaging XIII, vol. 6806, p. 68060K, San Jose, CA.
Sato, Y., Nakamoto, M., Tamaki, Y., Sasama, T., Sakita, I., Nakajima, Y., Monden, M., Tamura,
S. (1998) Image guidance of breast cancer surgery using 3-D ultrasound images and
augmented reality visualization. IEEE Transactions on Medical Imaging, 17:681693.
Sauer, F., Khamene, A., Bascle, B., Vogt, S., Rubino, G. (2002) Augmented-reality visualization in iMRI operating room: System description and preclinical testing. In:
Proceedings of the SPIE, Medical Imaging: Visualization and Image-Guided Procedures,
San Diego, CA, vol. 4681, pp. 446–454.
Sauer, F., Vogt, S., Khamene, A., Heining, S., Euler, E., Schneberger, M., Zuerl, K., Mutschler,
W. (March 2006) Augmented reality visualization for thoracoscopic spine surgery.
In: Proceedings on SPIE 6141, Medical Imaging 2006: Visualization, Image-Guided
Procedures, and Display, vol. 6141, pp. 430437. San Diego, CA, February 11, 2006.
Scheuering, M., Schenk, A., Schneider, A., Preim, B., Greiner, G. (2003) Intraoperative augmented reality for minimally invasive liver interventions. In: Medical Imaging 2003:
Visualization, Image-Guided Procedures, and Display, vol. 5029, pp. 407417.
Shuhaiber, J.H. (2004) Augmented reality in surgery. Archives of Surgery, 139:170174.
Soler, L., Ayache, N., Nicolau, S., Pennec, X., Forest, C., Delingette, H., Mutter, D.,
Marescaux, J. (2004) Virtual reality, augmented reality and robotics in surgical procedures of the liver. In: Buzug, T.M., Lueth, T.C. (eds.) Perspectives in Image Guided
Surgery. Singapore: World Scientific Publishing Company, pp. 476484.
Splechtna, R.C., Fuhrmann, A.L., Wegenkittl, R. (2002) ARAS: Augmented reality aided surgery system description. VRVis Research Center Technical Report.
Sudra, G., Speidel, S., Fritz, D., Muller-Stich, B.P., Gutt, C., Dillmann, R. (2007) MEDIASSIST: MEDIcal ASSistance for intraoperative skill transfer in minimally invasive surgery using augmented reality. In: Medical Imaging 2007: Visualization and Image-Guided Procedures, vol. 6509, p. 65091O. San Diego, CA.
Suthau, T., Vetter, M., Hassenpflug, P., Meinzer, H.-P., Hellwich, O. (2002) A concept work
for augmented reality visualization based on a medical application in liver surgery.
Medical Application in Liver Surgery, Technical University Berlin, Berlin, Germany,
Commission V, WG V/3.
Suzuki, N., Hattori, A., Hashizume, M. (2008) Benefits of augmented reality function for laparoscopic and endoscopic surgical robot systems. In: MICCAI Workshop: AMI-ARCS,
New York, pp. 5360.
Teber, D., Guven, S., Simpfendorfer, T., Baumhauer, M., Guven, E.O., Yencilek, F., Gozen,A.S.,
Rassweiler, J. (2009) Augmented reality: A new tool to improve surgical accuracy during
laparoscopic partial nephrectomy? Preliminary in vitro and in vivo results. European
Urology, 56:332338.

Augmented Reality for Image-Guided Surgery

547

Tomikawa, M., Hong, J., Shiotani, S., Tokunaga, E., Konishi, K., Ieiri, S., Tanoue, K.,
Akahoshi, T., Maehara, Y., Hashizume, M. (2010) Real-time 3-dimensional virtual
reality navigation system with open MRI for breast-conserving surgery. Journal of the
American College of Surgeons, 210:927933.
Tonet, O., Megali, G., DAttanasio, S., Dario, P., Carrozza, M.C., Marcacci, M., Martelli,S.,
La Palombara, P.F. (2000) An augmented reality navigation system for computer assisted
arthroscopic surgery of the knee. Medical Image Computing and Computer-Assisted
InterventionMICCAI, 1935:11581162.
Traub, J., Feuerstein, M., Bauer, M., Schirmbeck, E.U., Najafi, H., Bauernschmitt, R.,
Klinker, G. (2004) Augmented reality for port placement and navigation in robotically
assisted minimally invasive cardiovascular surgery. In: CARS 2004: Computer Assisted
Radiology and Surgery, Proceedings, vol. 1268, pp. 735740.
Traub, J., Sielhorst, T., Heining, S.M., Navab, N. (2008) Advanced display and visualization concepts for image guided surgery. Journal of Display Technology, Chicago, IL,
4:483490.
Trevisan, D.G., Nedel, L.P., Macq, B., Vanderdonckt, J. (2006) Detecting interaction variables in a mixed reality system for maxillofacial-guided surgery. In: SVR 2006-SBC
Symposium on Virtual Reality, Belm, Par, Brazil, vol. 1, pp. 3950.
Wacker, F.K., Vogt, S., Khamene, A., Jesberger, J.A., Nour, S.G., Elgort, D.R., Sauer, F.,
Duerk, J.L., Lewin, J.S. (2006) An augmented reality system for MR image-guided
needle biopsy: Initial results in a swine model. Radiology, 238:497504.
Wolfsberger, S., Rossler, K., Regatschnig, R., Ungersbock, K. (2002) Anatomical landmarks
for image registration in frameless stereotactic neuronavigation. Neurosurgical Review,
25:6872.
Worn, H., Aschke, M., Kahrs, L.A. (2005) New augmented reality and robotic based methods
for head-surgery. International Journal of Medical Robotics, 1:4956.
Wu, J.R., Wang, M.L., Liu, K.C., Hu, M.H., Lee, P.Y. (2014) Real-time advanced spinal surgery via visible patient model and augmented reality system. Computer Methods and
Programs in Biomedicine, 113:869881.
Zheng, G., Dong, X., Gruetzner, P.A. (2008) Reality-augmented virtual fluoroscopy for
computer-assisted diaphyseal long bone fracture osteosynthesis: A novel technique and
feasibility study results. Proceedings of the Institution of Mechanical Engineers. Part H,
Journal of Engineering in Medicine, 222:101115.

Section IV

Wearable Computers and Wearable Technology

21 Soft Skin Simulation for Wearable Haptic Rendering

Gabriel Cirio, Alvaro G. Perez, and Miguel A. Otaduy

CONTENTS
21.1 Related Work................................................................................................. 554
21.1.1 Wearable Haptic Devices................................................................... 554
21.1.2 Soft Skin Simulation.......................................................................... 555
21.1.2.1 Rigid Articulated Hands..................................................... 555
21.1.2.2 Deformable Hands.............................................................. 556
21.2 Full-Hand Deformation Using Linear Elasticity........................................... 557
21.2.1 Skeleton.............................................................................................. 557
21.2.2 Flesh................................................................................................... 558
21.2.2.1 Elasticity Model.................................................................. 559
21.2.2.2 Tetrahedral Discretization.................................................. 559
21.2.2.3 Elastic Force Computation.................................................. 560
21.2.3 Coupling of Skeleton and Flesh......................................................... 560
21.2.4 Haptic Rendering............................................................................... 562
21.2.5 Results................................................................................................ 563
21.3 Strain-Limiting for Nonlinear Soft Skin Deformation..................................564
21.3.1 Formulation of Strain-Limiting Constraints......................................564
21.3.2 Constraint Jacobians.......................................................................... 566
21.3.3 Constrained Dynamics...................................................................... 566
21.3.4 Contact and Friction.......................................................................... 567
21.3.5 Error Metrics..................................................................................... 567
21.3.6 Haptic Coupling................................................................................. 568
21.3.7 Results................................................................................................ 568
21.4 Anisotropic Soft Skin Deformation............................................................... 570
21.4.1 Definition of Strain Limits................................................................ 571
21.4.2 Hyperbolic Projection Function......................................................... 572
21.4.3 Constraint Formulation...................................................................... 573
21.4.4 Constraint Jacobians.......................................................................... 573
21.4.5 Comparison with Other Approaches................................................. 574
21.4.6 Results................................................................................................ 576
21.5 Conclusion..................................................................................................... 576
Acknowledgments................................................................................................... 577
References............................................................................................................... 577

Just as the synthesizing and rendering of visual images defines the area of computer
graphics, the science of developing devices and algorithms that synthesize computer-generated force-feedback and tactile cues is the concern of computer haptics (Lin
and Otaduy 2008). Haptics broadly refers to touch interactions (physical contact)
that occur for the purpose of perception of virtual environments and, more broadly
speaking, for the transmission of tactile cues.
Wearable haptics focuses on haptic devices, with their corresponding control
algorithms, that are worn by the user. Wearable haptic systems generate tactile feedback and apply it directly on the human body to interact, communicate, and cooperate with real and virtual environments. Although the sensory system responsible for
haptic perception, the somatosensory system, is distributed across the entire body, a
large portion of existing work in wearable haptics has focused on hand-based wearable haptics (notably fingertip tactile stimuli and finger kinesthetic feedback), mainly
due to the fundamental importance of manual interaction for most tasks.
Wearable haptics offers new ways of interacting with real and virtual environments. Leveraging touch as a primordial sense through which subjects can communicate easily and instantly, wearable haptic stimulation can be used to establish contact with cognitively impaired patients, or to augment social media interactions with additional sensory channels. Wearable haptics can improve the cooperation of humans and robots in a team, in scenarios such as search and rescue. Videogames, a multibillion-dollar industry, as well as serious games for professional training, could also benefit from wearable haptic technology for an increased degree of immersion without requiring expensive and cumbersome ground-based devices, as already shown by much simpler consumer devices such as Nintendo's Wiimote.
The design of wearable devices is constrained by the wearability factor, thus limiting the weight, volume, shape, and form factor of the device (Gemperle etal. 1998).
Grounded kinesthetic devices require an external robot to simulate contact interaction. Similarly, exoskeletons are body-grounded (Bergamasco etal. 1994; Biggs and
Srinivasan 2002), producing forces and counterbalancing forces both felt by the user.
Wearable haptics, on the other hand, should be intrinsically distributed and multidegree-of-freedom (DoF), but necessarily largely underactuated and undersensed to
improve the wearability factor (Prattichizzo etal. 2013).
In order to simulate realistic haptic sensations with wearable devices, cutaneous
feedback is often used to replace kinesthetic feedback while trying to preserve the
sensations. This is an inherently difficult task. Yet, wearable haptic feedback systems producing realistic results have been designed, mainly applied to fingerpads
(Minamizawa etal. 2007, 2008; Nakamura and Fukui 2007; Aoki etal. 2009). These
devices have confirmed that some kinesthetic information, such as grasping force
and object weight, can be reasonably reproduced through tactile sensations alone.
However, these ungrounded haptic displays still have limitations, mainly in terms of
maximum weight and force they can render, due to the absence of real kinesthetic
feedback.
Wearable haptic devices produce haptic feedback through the activation of force
and deformation cues on the users skin. For a realistic perception of these cues, the


appropriate sensing channels must be excited with the correct intensity and adequate
timing, while everything is synchronized with the other modalities. Achieving such
a complex goal requires the use of a cognitive layer that predicts deformation and
force cues based on a biomechanical model of the users body. Therefore, wearable
haptic systems require the development of reliable and efficient numerical models
of human touch biomechanics. The model should include the interaction with the
environment while accounting for skin deformation and constraints coming from
articulation joints and contacts.
Therefore, to render high-fidelity haptic feedback for direct interaction with the
skin, which is often the case of wearable haptic devices, the command and actuation of haptic devices must rely on a thorough understanding of the forces and
deformations present at contact locations. This implies the interactive computation
of accurate forces and deformations of the skin. In the past, research on haptic rendering has produced excellent methods to support kinesthetic rendering of tool-based interaction, but in recent years we have also witnessed the invention of multiple cutaneous haptic devices, using a variety of technologies and skin stimulation
principles (Chubb etal. 2010; Scilingo etal. 2010; Wang and Hayward 2010; Gleeson
etal. 2011; Solazzi etal. 2011; Chinello etal. 2012). This progress in hardware design
calls for novel methods to compute accurate forces and deformations on the skin for
wearable haptic rendering.
This chapter summarizes the progress on a computationally efficient approach
to simulate accurate soft skin contact. These results are intended to be part of a
model-based control strategy for wearable haptic rendering of direct hand interaction, in which the forces and/or deformations needed to command the wearable
haptic device are computed by resolving the interaction between a skin model and
simulated objects or materials. The computation of high-fidelity forces and deformations during skin contact, which are then used to command wearable haptic devices,
is challenged by two major difficulties: frictional contact and the extreme nonlinear
elasticity of the skin.
We first survey existing wearable haptic devices and early approaches related to
hand and soft skin simulation, ranging from rigid articulated hand skeletons to more
sophisticated flesh and bone models allowing compliant frictional contact. However,
while off-line approaches are too computationally expensive, real-time techniques
do not account for the highly nonlinear and anisotropic elastic behavior of soft skin,
or the efficient two-way computation of haptic feedback. We therefore describe in
more detail a set of techniques that address these issues in the context of haptic
rendering. We first describe a technique that allows two-way coupling between flesh
and bones, fundamental for haptic rendering of hand interaction through contact
(Garre et al. 2011). We then show how to take into account the extreme nonlinearity of skin elasticity in an efficient way through strain-limiting constraints (Perez et al. 2013), and how to incorporate anisotropic constraints to correctly model the
anisotropic behavior of finger flesh (Hernandez etal. 2013). These models capture
the deformations and forces of the skin upon contact. Therefore, they constitute a
central component of a model-based strategy for the control of wearable devices.


21.1 RELATED WORK


21.1.1 Wearable Haptic Devices
Prattichizzo etal. (2013) classify wearable haptic devices into three main categories
according to the way haptic feedback is generated: vibrotactile systems, pin-array
systems, and force systems.
Vibrotactile systems, of which the Wii Remote motion controller (Nintendo Co.
Ltd., Japan) is a simple yet well-known example, transmit haptic cues through vibrations. Many devices were designed to convey general information, such as directional data for guidance purposes through back skin (Traylor and Tan 2002) or
arm (Lieberman and Breazeal 2007) stimuli, foot-based driver safety information
(Kim et al. 2006), and human–robot interaction in leader–follower formation tasks
through a bracelet (Scheggi etal. 2012). Other devices give the illusion of a force or a
torque modulating the frequency and amplitude of the vibration of a mass or a rotor,
through handheld (Amemiya etal. 2007) or fingertip (Nakamura and Fukui 2007)
actuators. Leveraging sound synthesis mechanisms, several researchers embedded
vibrotactile transducers in shoe soles to reproduce different ground textures such as
grass, snow, and crumpling materials (Nordahl etal. 2010; Papetti etal. 2010; Turchet
etal. 2013). Although sometimes compelling, these approaches cannot convey force
cues from simple everyday interactions due to the inherently limited bandwidth of
vibrotactile devices. Pin-array systems locally deform the skin through the motion of
individual pins. Existing devices can convey Braille cell patterns (Yang etal. 2007)
and shape cues (Yang etal. 2007, 2009; Sarakoglou etal. 2012). The large number of
DoF (one per pin) makes these devices very flexible, but at the same time poses severe
challenges in terms of wearability due to the required number of actuators.
The third category, force systems, applies forces at one or more contact points
(Prattichizzo etal. 2013), similar to grounded interfaces. The best-known example
is arguably the commercial CyberGrasp (CyberGlove Systems LLC, San Jose, CA)
and other glove haptic displays, providing forces to each finger. Its complexity, however, comes at the cost of limited wearability. Minamizawa etal. (2007) presented
a wearable ungrounded haptic display that produces weight sensations of virtual
objects solely based on cutaneous stimulations. The device leverages the illusion
of weight through cutaneous sensations without any kinesthetic feedback, using an
actuated belt that applies normal and shear forces to the fingertip in an open loop
fashion. The initial single finger device was then extended to a glove for feedback on
all fingers in (Minamizawa etal. 2008). However, control did not take into account
the viscoelastic parameters of fingerpads specific to each individual, thus limiting its
accuracy. The haptic ring (Aoki etal. 2009) produces cutaneous force sensations on
the tip of the finger by pulling on three thin wires. The actuators are located on the
nail side of the finger, and the device is worn like a ring. Winfree et al. (2009) introduce a new
device based on the gyroscopic effect to produce high torques on the hand in two
directions. Solazzi etal. (2010) designed a device that displays directly to the fingertip the transition between contact and noncontact situations together with the local
geometry of the virtual object under contact. The same authors have also designed a
fingertip device (Solazzi etal. 2011) that uses shape memory alloy wires to produce


tangential displacements of the skin and therefore display navigation cues to the
user. Tangential skin displacements are also addressed in Gleeson etal. (2010) with
a 2-DoF fingertip device, creating planar motion through a compliant flexure stage.
Leveraging the bielasticity of a fabric, Bianchi etal. (2010) improved the rendering
of softness with a device that conveys both cutaneous and kinesthetic information.
Prattichizzo et al. (2013) designed a 3-DoF wearable device for cutaneous force feedback through a mobile platform that applies forces on the finger pad. The platform
is attached to a static one located on the back of the finger and is driven by three
motors. The authors assume a linear relationship between platform displacement and
the resultant wrench, therefore using a very simplified model of soft skin mechanics.
In general, none of these devices rely on soft skin deformation simulations to
compute haptic feedback. Yet, the behavior of flesh is extremely complex, mainly
due to its high deformation nonlinearity. Accounting for the object that is being
stimulated, which is often the fingers or the fingertip, through some sort of biomechanical model of articulated hand and/or soft skin mechanics can greatly increase
the precision of force feedback computations, and allow the use of a model-based
control strategy for wearable haptic rendering of direct hand interaction.

21.1.2 Soft Skin Simulation


Direct haptic interaction with the hand places important challenges on the simulation of hand biomechanics. The simulation must be computed at high frame rates for
stable and high-fidelity haptic feedback, and it must accurately capture forces and
deformations to send realistic commands to the haptic device. This section quickly
surveys existing techniques for the simulation of hand models.
21.1.2.1 Rigid Articulated Hands
Several researchers have already addressed haptic interaction with an animated hand
model by the use of articulated bodies, using interconnected rigid bodies to model
the palm and each phalange. While these approaches ignore flesh deformation, they
exhibit some level of physical accuracy by relying on rigid body dynamics and constrained mechanics.
Articulated hand interaction with force feedback was first introduced by Borst
and Indugula (2005). They coupled a tracked hand configuration with an articulated
hand using linear and torsional springs. By performing collision detection on the articulated model, they allowed real-time grasping with haptic feedback. Ott et al. (2010)
showed that this approach allowed dexterous interaction on the palm and fingers. The
major drawback was the use of artificially high friction coefficients, due to the lack of
soft tissue deformation, often resulting in excessively sticky contacts. This approach
was improved by Jacobs etal. (2012), leveraging the 6-DoF god-object method (Ortega
et al. 2007). The resulting God-hand model uses Gauss's principle of least constraint
to solve the interdependencies between the different rigid bodies of the hand.
Although rigid articulated hands can be very fast to compute, and are therefore
well fitted for haptic interaction, they do not take into account the soft skin deformation required for accurate simulation of frictional contact.


21.1.2.2 Deformable Hands


The hand is a complex structure made of a skeleton, muscles, tendons, and skin, all
with their own properties, making it challenging to accurately simulate the deformation of the hand as a whole during motion and contact. Both off-line and real-time
techniques have been developed to account for the elastic mechanics of the hand.
Off-line computer graphics solutions for hand animation focus on visual realism.
Anatomically inspired biomechanical models produce highly realistic animations,
such as the work of Sueda etal. (2008), which simulates the biomechanical behavior
of the hand down to the level of tendons and muscles. However, they are too computationally expensive for haptic rendering. Purely geometric methods (Kry etal.
2002; Kurihara and Miyata 2004), on the other hand, do not support local skin deformations due to contact interactions. Other solutions leverage real data to simulate
grasping realistically (Pollard and Zordan 2005; Kry and Pai 2006; Li etal. 2007),
but do not consider the deformation of the hand's skin under contact.
Real-time solutions, on the other hand, require a trade-off between computation
time and realism. Simulating the complex, irregular, and layered structure of a finger
can be done accurately by relying on continuum mechanics. However, using physically based approaches has a high computational cost, which makes it difficult to
use for interactive simulations. For this reason, different analytical models for soft
finger frictional contacts have been developed in the past, which greatly reduce
the complexity of the simulation while keeping some level of finger compliance.
These models, called soft finger models, usually account for tangential and torsional
friction from a single contact. For instance, Barbagli et al. (2004) studied human
fingertips and derived a soft finger model for point-based contact, allowing two-finger
grasping, thanks to tangential and torsional friction forces. Frisoli etal. (2006) then
coupled these two sets of friction forces in the soft finger model using the concept of
limit curve (Howe and Cutkosky 1996). Ciocarlie etal. (2007) account for nonplanar contacts by using approximations of the local geometry of the finger and object
in contact and relying on linear complementarity formulations to solve frictional
contact mechanics. Soft finger models are often used in conjunction with extensions
of the god-object algorithm (Zilles and Salisbury 1995) for haptic interaction.
The most realistic approaches so far imitate the structure of a real hand by coupling articulated rigid bones with deformable flesh. Several approaches rely on geometric deformations for the flesh. Jacobs and Froehlich (2011) link soft body pads to
a rigid articulated structure, and rely on lattice shape matching (Rivers and James
2007) for the deformation of the pads. Duriez etal. (2008) add skinning to the rigid
articulated hand structure, and compute frictional contact mechanics directly on the
deformed skin surface. Other approaches use physically based simulation algorithms
for deformation. Although such methods are computationally expensive, real-time hand simulation models started to emerge as computational power increased. Pouliquen et al.
(2005) used a rigid articulated skeleton coupled to linear elastic pads simulated
through a corotational finite element method (FEM) and time-stepped by formulating a
linear complementarity problem (LCP). FEM allows the simulation of skin and flesh
deformation through accurate discretization of continuum mechanics. Pads were
linked to the bones through fixed nodes, and the motion was driven by the skeleton.


Force feedback was computed using a virtual coupling mechanism and transmitted


to the user through a wearable kinesthetic device attached to the wrist.
More recent techniques improve the articulated hand/deformable flesh approach
in several ways, notably by introducing two-way coupling between flesh and bones,
as well as nonlinear and anisotropic elasticity to better simulate the real behavior of
finger flesh deformation. We discuss these techniques, as well as their incorporation
of haptic feedback, in the remainder of this chapter.

21.2 FULL-HAND DEFORMATION USING LINEAR ELASTICITY


In this section, we describe an approach that tackles two-way coupling between flesh
and skeleton for the full hand in real time. We build an anatomically based model
that includes the skeletal bone structure, a deformable flesh model, and the coupling
between skeleton and flesh based on a skinning approach. Thanks to two-way coupling between flesh and skeleton, the model enables the computation of joint forces,
distributed deformation, and frictional forces on fingertips during simulated grasping operations.

21.2.1 Skeleton
Let us start by considering a simple skeleton consisting of two bones a and b connected through a joint. The dynamics of the two bones can be described by the Newton–Euler equations:

$M_a \dot{v}_a = F_a(q_a, v_a, q_b, v_b) \quad \text{and} \quad M_b \dot{v}_b = F_b(q_a, v_a, q_b, v_b),$  (21.1)

and a joint constraint


$C(q_a, q_b) = 0.$  (21.2)

where
q includes both the position and orientation of a rigid body
v includes its linear and angular velocity
M is a 6 × 6 mass matrix
F is the vector of forces and torques
These may include gravitational and inertial forces, joint elasticity and damping, and
constraint forces.
For a skeleton with an arbitrary number of bones and joints, we group all bone
velocities in a vector $v_s$, and then the constrained Newton–Euler equations can be expressed as

$M_s \dot{v}_s = F_s(q_s, v_s), \qquad C_s(q_s) = 0.$  (21.3)


FIGURE 21.1 (a) Articulated hand skeleton with 16 bones. Phalanxes are linked
through hinge joints, and the phalanxes and the palm are linked through universal joints.
(b) Surface model for rendering purposes. (c) Embedding tetrahedral mesh to model flesh
elasticity.

We have modeled the hand's skeleton using 16 rigid bodies (3 for each finger and the thumb, and 1 for the palm) and 15 joints (10 hinge joints between phalanxes, and 5 universal joints at the junctions of phalanxes and the palm), as shown in Figure 21.1. We have modeled joint limits by activating stiff angular springs after
admissible joint rotations are reached.
To ensure stable simulation under the combination of high stiffness, low mass,
and large time steps, we discretize the constrained Newton–Euler equations with backward Euler implicit integration (Baraff and Witkin 1998). We approximate the resulting nonlinear system with a linearization of forces at the beginning of each time step, and then Equation 21.3 is transformed into the linear system of equations:

$A_s v_s + J_s^T \lambda_s = b_s, \qquad J_s v_s = b_s^{\lambda},$  (21.4)

with $A_s = M_s - h\,\partial F_s/\partial v_s - h^2\,\partial F_s/\partial q_s$ and $b_s = M_s v_{s,0} + h F_s - h\,(\partial F_s/\partial v_s)\, v_{s,0}$.

We denote with h the size of the simulation step. The discretized joint constraints are obtained by linearizing them at the beginning of the time step, with $J_s = \partial C_s/\partial q_s$ and $b_s^{\lambda} = -(1/h)\, C_s(q_{s,0})$. In all equations, the suffix 0 refers to values at the beginning
of the time step. Since the constraints are expressed on (linearized) positions, they
do not suffer drift.
The system matrix $A_s$ is sparse, of size $6m \times 6m$, with m = 16 the number of
bones. It contains nonzero blocks in the diagonal and in off-diagonal terms associated to pairs of adjacent bones. These nonzero off-diagonal blocks are induced by the
elastic forces that model joint limits.
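To make the time stepping concrete, the following sketch (Python with NumPy; all names are illustrative and not taken from the chapter's implementation) assembles and solves the linearized system of Equation 21.4 as a single saddle-point system, assuming the force Jacobians and the joint-constraint Jacobian have already been evaluated at the beginning of the step. A real implementation would exploit the sparsity of $A_s$ noted above rather than forming a dense block matrix.

import numpy as np

def skeleton_step(M_s, dF_dv, dF_dq, F_s, J_s, C_s0, v_s0, h):
    # One backward-Euler step of the constrained skeleton dynamics (Equation 21.4).
    # M_s: 6m x 6m mass matrix; dF_dv, dF_dq: force Jacobians; J_s: joint-constraint
    # Jacobian; C_s0: constraint values at the start of the step; v_s0: bone velocities.
    A_s = M_s - h * dF_dv - h**2 * dF_dq
    b_s = M_s @ v_s0 + h * F_s - h * (dF_dv @ v_s0)
    b_lam = -C_s0 / h                      # drives the linearized constraints to zero
    n, m = A_s.shape[0], J_s.shape[0]
    # Saddle-point system [[A_s, J_s^T], [J_s, 0]] [v_s; lambda_s] = [b_s; b_lam]
    K = np.block([[A_s, J_s.T], [J_s, np.zeros((m, m))]])
    sol = np.linalg.solve(K, np.concatenate([b_s, b_lam]))
    return sol[:n], sol[n:]                # new bone velocities and constraint multipliers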

21.2.2 Flesh
Flesh is simulated using an elasticity model derived from continuum mechanics.
We first introduce the formulation of continuum elasticity, followed by a description
of the finite element discretization using tetrahedral meshes. Finally, we describe the
computation of elastic forces.


21.2.2.1 Elasticity Model


In continuum mechanics, object deformation is described by a displacement map $u: x_0 \rightarrow x$, from initial (material) coordinates $x_0$ to deformed (world) coordinates x. We consider elasticity models for which internal forces are a function of the deformation gradient $G(x_0) = \partial x / \partial x_0$, along with material properties. In particular, in this work we assume an isotropic Hookean material, where stress linearly depends on strain as follows:

$\sigma(u) = \mathbf{E}\,\varepsilon = \mathbf{E}\left(\tfrac{1}{2}(G + G^T) - I\right),$  (21.5)

where $\varepsilon$ is the so-called Cauchy (linear) strain tensor:

$\varepsilon = \tfrac{1}{2}(\nabla u + \nabla u^T) = \tfrac{1}{2}(G + G^T) - I,$  (21.6)

and $\mathbf{E}$ is a factor that is solely determined by material properties, namely Young modulus Y and Poisson's ratio $\nu$. Under these assumptions, elastic forces can be easily derived from the stress field as $f_{\mathrm{elastic}} = \nabla \cdot \sigma$.
To discretize the elasticity equations, FEM partitions the material space into
elements e, such as tetrahedra or hexahedra. This partition provides a
framework to interpolate variables inside the volume of each element from values
defined at its vertices (i.e., nodes). The vector of nodal forces f can then be defined as
a linear function of the vector of nodal displacements u, through a stiffness matrix K:
f = K u. (21.7)

21.2.2.2 Tetrahedral Discretization


Let us first consider a discretization based on a tetrahedral mesh with linear basis
functions. Under these assumptions, the strain and stress tensors are constant inside
each tetrahedral element. Given the four nodes $\{x_1, x_2, x_3, x_4\}$ of a tetrahedral element, we define its volume matrix

$X = (x_1 - x_4 \quad x_2 - x_4 \quad x_3 - x_4).$  (21.8)

For convenience, we express the inverse of the rest-state volume matrix based on its
rows:
$X_0^{-1} = \begin{pmatrix} r_1 \\ r_2 \\ r_3 \end{pmatrix}.$  (21.9)

It is also convenient to define a fictitious row $r_4 = -(r_1 + r_2 + r_3)$. Using the volume matrix, the deformation gradient $G = \partial x / \partial x_0$ of a tetrahedron can be computed as

$G = X\, X_0^{-1}.$  (21.10)
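As a small illustration of Equations 21.8 through 21.10, the following Python/NumPy sketch (names are illustrative) computes the deformation gradient of one tetrahedron from its rest and deformed node positions.

import numpy as np

def deformation_gradient(x, x0):
    # x, x0: 4x3 arrays with the deformed and rest positions of the four nodes.
    X = np.column_stack([x[0] - x[3], x[1] - x[3], x[2] - x[3]])         # Equation 21.8
    X0 = np.column_stack([x0[0] - x0[3], x0[1] - x0[3], x0[2] - x0[3]])
    X0_inv = np.linalg.inv(X0)      # its rows are r_1, r_2, r_3 of Equation 21.9
    G = X @ X0_inv                  # Equation 21.10
    return G, X0_inv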


21.2.2.3 Elastic Force Computation


For tetrahedra with linear basis functions, the per-element stiffness matrix Ke integrated over the volume of the tetrahedron is:
$K_e = \int_{e,\,x_0} B^T \mathbf{E}\, B\; dV_{x_0},$  (21.11)

where B is a matrix of derivatives of shape functions, which turns out to be constant.
To better handle large rotations, we apply a corotational strain formulation
(Müller and Gross 2004), in which a rotation matrix $R_e$ is estimated per element, and the strain is measured in the unrotated setting. Then, the per-element matrix is effectively warped as $\tilde{K}_e = R_e K_e R_e^T$. We estimate the rotation of a tetrahedron
using the polar decomposition of the deformation gradient. To maximize efficiency,
we have opted to design a coarse tetrahedral mesh that embeds the surface of the
hand, as shown in Figure 21.1. The positions of surface points are defined by simple
barycentric combination in the embedding tetrahedra. The tetrahedral mesh also
occupies the regions of bone tissue, which again simplifies the mesh and maximizes
the efficiency of the model. Our model approximates the complex layered structure
of the tissues in the hand with a homogeneous elastic material, but it preserves the
skeletal structure thanks to the coupling between bones and flesh, which is defined
in the next section.
Similar to the skeleton, we discretize the dynamics equations using backward
Euler implicit integration, leading to the system

$A_f v_f = b_f.$  (21.12)

The velocity vector $v_f$ concatenates the velocities of all mesh nodes. The system matrix of the flesh is $A_f = M_f + h D_f + h^2 K_f$, with $D_f$ and $K_f$ damping and
stiffness matrices, respectively. It contains nonzero blocks in the diagonal and
in off-diagonal terms associated to pairs of nodes belonging to the same tetrahedron. Full details of the linear corotational formulation are given in Müller and
Gross (2004).
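A minimal sketch of the corotational warping step, assuming the per-element rotation is extracted from the polar decomposition of the deformation gradient via an SVD (Python/NumPy; names are illustrative):

import numpy as np

def corotational_warp(K_e, G):
    # K_e: 12x12 element stiffness matrix; G: 3x3 deformation gradient of the element.
    U, _, Vt = np.linalg.svd(G)
    R_e = U @ Vt                     # rotation factor of the polar decomposition
    if np.linalg.det(R_e) < 0.0:     # guard against reflections
        U[:, -1] *= -1.0
        R_e = U @ Vt
    R_blk = np.kron(np.eye(4), R_e)  # block rotation applied to all four nodes
    return R_blk @ K_e @ R_blk.T     # warped element matrix R_e K_e R_e^T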

21.2.3 Coupling of Skeleton and Flesh


One of the major virtues of our simulation approach is the simple yet efficient
way in which bones and flesh are interconnected, producing a model that captures both the skeletal and soft properties of a human hand. The overall motion of
the hand is determined by the skeletal configuration; therefore, we have designed
a skinning strategy that, given a skeletal configuration, defines target positions
for the flesh nodes. For simplicity, we have selected linear blend skinning (LBS)


(Magnenat-Thalmann et al. 1988). Then, the target position $f_i$ of the ith flesh node is defined by a convex combination of the bones' rigid transformations:

$f_i(q_s) = \sum_j w_{ij}\,(x_j + R_j\, x_{ij}),$  (21.13)

where
(xj, Rj) are the position and orientation of the jth bone
xij is the (constant) target position of the flesh node in the local reference system
of the bone
{wij} are the skinning weights.
A realistic hand should enjoy bidirectional coupling between skeleton and flesh. The
skeletal motion must be transmitted to the flesh, and contact on the flesh must be
transmitted as well to the skeleton. Moreover, the stiffness of the flesh should impose
a resistance on the skeleton under joint rotations. We achieve bidirectional coupling
by inserting zero-length springs between flesh nodes and their target skinning positions. Then, the coupling of the ith flesh node produces a force Fi on the node and a
wrench (i.e., force and torque) Fj on the jth bone given by
$F_i = -k\,(x_i - f_i),$  (21.14)

$F_j = -k \left(\frac{\partial f_i}{\partial q_j}\right)^{\!T} (f_i - x_i).$  (21.15)

The Jacobian matrix $\partial f_i/\partial q_j = \left(w_{ij}\, I_{3 \times 3},\; -w_{ij}\, (R_j x_{ij})^*\right)$ relates bone velocities to target skinning velocities, with * representing the skew-symmetric matrix for a cross
product. Due to the addition of coupling forces between skeleton and flesh, the
numerical integration of skeleton and flesh velocities must be computed through a
single system of equations that couples Equations 21.4 and 21.12:

$\begin{pmatrix} A_s & A_{sf} & J_s^T \\ A_{fs} & A_f & 0 \\ J_s & 0 & 0 \end{pmatrix} \begin{pmatrix} v_s \\ v_f \\ \lambda_s \end{pmatrix} = \begin{pmatrix} b_s \\ b_f \\ b_s^{\lambda} \end{pmatrix}.$  (21.16)

The coupling forces contribute to the terms Asf and Afs, through the derivatives of
flesh node forces with respect to bone configurations, and vice versa. Moreover,
the coupling forces also contribute terms to off-diagonal blocks of the As matrix
for pairs of bones that affect the target skinning position of the same flesh node.
In other words, for two bones a and b affecting the same flesh node $x_i$, then $\partial F_a/\partial q_b = -k\,(\partial f_i/\partial q_a)^T (\partial f_i/\partial q_b) \neq 0$.
In practice, we limit the complexity added by the coupling by enforcing that each
flesh node may only be affected by at most two bones, and those bones must be adjacent. Then, the force Jacobian terms added by the flesh–skeleton coupling to the $A_s$
matrix coincide with those added by joint limits.
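The coupling of one flesh node can be summarized by the following sketch, which evaluates the skinning target of Equation 21.13 and the zero-length-spring force and wrenches of Equations 21.14 and 21.15 (Python/NumPy; all names are illustrative):

import numpy as np

def skew(a):
    # Skew-symmetric matrix a* such that (a*) b = a x b.
    return np.array([[0.0, -a[2], a[1]],
                     [a[2], 0.0, -a[0]],
                     [-a[1], a[0], 0.0]])

def node_coupling(x_i, bones, w, x_loc, k):
    # bones: list of bone poses (x_j, R_j); w: skinning weights w_ij;
    # x_loc: constant node positions in each bone frame; k: coupling stiffness.
    f_i = sum(w_ij * (x_j + R_j @ xl) for w_ij, (x_j, R_j), xl in zip(w, bones, x_loc))
    F_i = -k * (x_i - f_i)                                            # Equation 21.14
    wrenches = []
    for w_ij, (x_j, R_j), xl in zip(w, bones, x_loc):
        J_ij = np.hstack([w_ij * np.eye(3), -w_ij * skew(R_j @ xl)])  # df_i/dq_j
        wrenches.append(-k * J_ij.T @ (f_i - x_i))                    # Equation 21.15
    return F_i, wrenches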


21.2.4 Haptic Rendering


So far we have described the stand-alone simulation of a human hand. Haptic
interaction requires a bidirectional coupling between the simulated hand model and
a haptic device, involving transfer of positions and velocities in one direction, and
forces in the other. In our implementation, we have followed an impedance architecture, where we track positions and velocities of the haptic device, use these positions
to command the simulated hand, compute reaction forces, and send these forces to
the haptic device.
To ensure stable haptic rendering, we extend the virtual coupling algorithm
(Colgate et al. 1995) to our setting. This algorithm tracks the state qh of a haptic
device, simulates a virtual tool with state qt, and models bidirectional interaction
using a viscoelastic coupling between device and tool. If the simulation of the tool
remains passive, stability of the complete rendering algorithm is easily guaranteed by
setting the viscoelasticity parameters inside an admissible range.
Our hand simulation setting faces two difficulties for the application of the
virtual coupling algorithm. The dynamics solver cannot maintain haptic update rates, and the tracked haptic state and the tool state are high dimensional but different. We address these two difficulties with a multirate simulation algorithm and a high-dimensional viscoelastic coupling. Due to hardware limitations, we have applied the hand simulation algorithm only to kinesthetic haptic
rendering, but nothing prevents us from extending it to cutaneous rendering,
because the simulation model computes contact forces and deformations on the
skin. In our experiments, we have tracked a user's hand using a glove device, which provides positions and orientations of the phalanxes and the palm. In a haptic thread, running at the device control rate, we simulate a set of proxy bones composed of the rigid bodies corresponding to the phalanxes and the palm. For each pair of tracked and proxy bones, we set a 6D viscoelastic coupling (i.e., linear and torsional viscoelastic springs) (McNeely et al. 1999). For
efficiency reasons, and given that the user's hand itself helps maintain the skeletal shape, we do not model joint constraints for the proxy bones in the haptic
thread. In the visual loop, we simulate the full hand in contact with the virtual
environment. We set viscoelastic couplings between the bones of the simulated
hand and the proxy bones in the haptic loop. Simulating two sets of bones, one
in the visual loop (the articulated skeleton) and another one in the haptic loop
(the proxies), provides a good trade-off between the stiffness rendered to the
user and the smoothing produced by the viscoelastic couplings. Note that the
maximum stiffness rendered to the user is imposed by the haptic update rate,
not the visual update rate. Figure 21.2 shows a situation where the hand configuration is constrained by contact, and it naturally deviates from the configuration
read from the glove.
We have tested the display of interaction forces using the wearable haptic
glove Cybergrasp. This device supports only forces normal to the finger pads,
hence we project the coupling forces of distal proxy bones onto the direction
normal to each finger pad. Other types of haptic devices could allow the display
of richer forces.
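For illustration only, a 6D viscoelastic coupling between a tracked bone and its proxy might be computed as follows (Python/NumPy sketch; the gains, the quaternion convention, and all names are assumptions rather than the chapter's implementation):

import numpy as np

def coupling_wrench(x_p, q_p, v_p, w_p, x_t, q_t, v_t, w_t,
                    k_lin, d_lin, k_ang, d_ang):
    # The proxy pose (x_p, q_p) is pulled toward the tracked pose (x_t, q_t).
    # Quaternions are unit length, in [w, x, y, z] order.
    force = k_lin * (x_t - x_p) + d_lin * (v_t - v_p)
    # Relative rotation q_rel = q_t * conj(q_p), turned into an axis-angle error.
    w0, v0 = q_p[0], -np.asarray(q_p[1:])
    w1, v1 = q_t[0], np.asarray(q_t[1:])
    q_rel = np.concatenate(([w1 * w0 - v1 @ v0],
                            w1 * v0 + w0 * v1 + np.cross(v1, v0)))
    angle = 2.0 * np.arctan2(np.linalg.norm(q_rel[1:]), q_rel[0])
    axis = q_rel[1:] / (np.linalg.norm(q_rel[1:]) + 1e-12)
    torque = k_ang * angle * axis + d_ang * (w_t - w_p)
    return force, torque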


FIGURE 21.2 Haptic interaction with a virtual environment through the wearable haptic
glove Cybergrasp using our full-hand linear deformation model.

21.2.5 Results
We have executed several experiments, all on a quad-core 2.4 GHz PC with 3 GB of
memory and a GeForce 8800 GTS. Tracking of the hand configuration is performed
using a Cyberglove, as shown in Figure 21.2, and haptic feedback is achieved using a
Cybergrasp haptic device. During free-space motions, the hand follows the configuration read from the glove without apparent lag. In contact situations, the skeleton
and flesh deviate from the glove configuration as expected, due to the virtual coupling. More importantly, contact situations produce interesting bulging deformations
on the hand's flesh. These deformations are possible thanks to the coupling of a soft flesh to the hand's skeleton.
We have also measured the computational performance of several components
of the simulation algorithm. Timings were recorded using the 16-bone skeleton, a
597-tetrahedra mesh for flesh deformation, a 1,441-triangle mesh for collision detection, and a textured 23,640-triangle mesh for rendering. The computations in the
haptic loop are dominated by the solution of rigid body dynamics for the 16 proxy
bones. Nevertheless, these computations run at just 95 μs per step on average. In the
visual loop, the computations are dominated by the simulation of the hand and the
computation of collision response. The solution of articulated body dynamics runs
at 7 ms per time step on average, while the solution of the unconstrained flesh and
bone motion runs at 38 ms per time step on average. For the scene shown above, a
full simulation step including collision response runs at 53 ms per time step on average. In our examples, contact was rarely the dominating cost, because we used rather
simple meshes for collision detection. We expect that our model would produce more
realistic deformations with finer simulation and collision meshes, but it would not
run interactively on current commodity PCs.
Although this approach allows the simulation of a full deformable hand in real
time with force feedback, it models the flesh as a linear elastic material, which is


not capable of capturing the range of behaviors of the skin. We address this issue
in the following section by introducing strain-limiting constraints and a constraint
optimization approach.

21.3 STRAIN-LIMITING FOR NONLINEAR


SOFT SKIN DEFORMATION
The hand tissue is extremely nonlinear, particularly under compression. It is very
compliant under light loading, but soon becomes almost rigid. Correct modeling of
skin tissue requires capturing nonlinear elasticity. However, nonlinear hyperelastic
models can exhibit a very high numerical stiffness, which requires very small simulation time steps. Instead, we propose to model the extreme nonlinearity of the skin
using strain-limiting constraints, which in essence eliminate DoF from the computations. Strain-limiting methods enable larger time steps, and they turn the complexity
into the enforcement of constraints.
Strain-limiting was initially applied to cloth simulation based on the mass–spring model (Provot 1995; Bridson et al. 2003), and later extended to FEMs
(Thomaszewski et al. 2009). In the finite element setting, it requires the computation of principal strains for mesh elements, which are later constrained
to predefined limits. Principal strains are also computed for the simulation of
invertible hyperelastic materials (Irving etal. 2004), and gradients of principal
strains are needed for robust implicit integration of such hyperelastic materials
(Sin etal. 2011).
In this section, we present a hand skin model that captures efficiently the complex nonlinear behavior of skin mechanics. As shown in Figure 21.3, it eliminates
the limitations of linear materials, while avoiding the computational cost of traditional hyperelastic models. It is based on the addition of strain-limiting constraints
to linear corotational elastic models under stable implicit integration. We first present the basic formulation of strain-limiting constraints for tetrahedral meshes, and
then show how to integrate strain-limiting constraints with contact constraints in a
unified constrained dynamics solver. The overall simulation algorithm is simple and
relies on standard solvers, allowing the solution of dynamics with robust implicit
integration, constraint-based contact, and Coulomb friction. To display haptic feedback through a haptic device, we apply a modular approach that separates the simulation of the skin from the computation of feedback forces, and we connect both
modules using a virtual coupling mechanism (Colgate etal. 1995).

21.3.1 Formulation of Strain-Limiting Constraints


For small deformations, the finger tissue is very compliant. For such small deformations, the linear elastic material described earlier captures this compliance well.
However, the finger tissue soon becomes very stiff. This highly nonlinear behavior
of the finger has been measured experimentally (Serina et al. 1997). We approximate this sudden stiffening effect using strain-limiting constraints. We limit strain


FIGURE 21.3 As shown on the left, the finger suffers severe artifacts when we use a compliant linear material. On top, bottom view of flipped tetrahedra due to friction forces, and on
the bottom, side view of collapsed finger under pressing forces. As shown on the right, these
situations are robustly handled with our strain-limiting model.

effectively by limiting the deformation gradient of each tetrahedron in the finite element mesh. To this end, we compute a singular value decomposition (SVD) of the
deformation gradient of each tetrahedron:
$G = U S V^T, \qquad S = \begin{pmatrix} s_1 & 0 & 0 \\ 0 & s_2 & 0 \\ 0 & 0 & s_3 \end{pmatrix} = U^T G V,$  (21.17)

where the singular values $(s_1 \ge s_2 \ge s_3)$ capture deformations along principal axes. For
convenience, we express the rotations U and V based on their columns:

$U = (u_1 \quad u_2 \quad u_3), \qquad V = (v_1 \quad v_2 \quad v_3).$  (21.18)

Unit singular values in all directions (i.e., si = 1) imply no deformation, while a unit
determinant (i.e., det(G) = s1 s2 s3 = 1) implies volume preservation. We enforce
strain limiting by applying a lower limit smin (i.e., compression constraint) and an
upper limit smax (i.e., stretch constraint) on each singular value of the deformation
gradient:

$s_{\min} \le s_i \le s_{\max}.$  (21.19)
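In code, checking the limits of Equation 21.19 for one element amounts to an SVD of its deformation gradient (a Python/NumPy sketch with illustrative names):

import numpy as np

def strain_limit_violation(G, s_min, s_max):
    # Singular values of the deformation gradient (Equation 21.17) and the signed
    # violation of the compression and stretch limits of Equation 21.19.
    U, s, Vt = np.linalg.svd(G)
    compression = np.minimum(s - s_min, 0.0)   # negative where s_i < s_min
    stretch = np.minimum(s_max - s, 0.0)       # negative where s_i > s_max
    return s, U, Vt.T, compression, stretch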


21.3.2 Constraint Jacobians
We enforce strain-limiting constraints following a constrained optimization formulation described in the next section. This formulation requires the computation of constraint Jacobians with respect to the generalized coordinates of the system (i.e., the nodal positions of the finite element mesh) for two reasons. First, constraints are nonlinear, and we locally linearize them in each simulation step. Second,
we enforce constraints using the method of Lagrange multipliers, which applies
forces in the direction normal to the constraints.
To define constraint Jacobians, we take, for example, one compression constraint
of one tetrahedron (the formulation is analogous for stretch constraints):
$C_i = s_i - s_{\min} \ge 0.$  (21.20)

From the definitions of the deformation gradient in Equations 21.8 through 21.10 and
its singular values in Equations 21.17 and 21.18, the Jacobians of the constraint with
respect to the four nodes of the tetrahedron can be computed as

$\frac{\partial C_i}{\partial x_j} = \frac{\partial s_i}{\partial x_j} = (r_j\, v_i)\, u_i^T, \qquad j \in \{1, 2, 3, 4\}.$  (21.21)

The derivation follows easily from the fact that $\partial s_k / \partial g_{ij} = u_{ik} v_{jk}$ (Papadopoulo and
Lourakis 2000).
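A direct transcription of Equation 21.21 (Python/NumPy; illustrative names; X0_inv is the inverse rest-state volume matrix of Equation 21.9):

import numpy as np

def compression_jacobian(U, V, X0_inv, i):
    # Rows of dC_i/dx_j for the four nodes of a tetrahedron, for C_i = s_i - s_min.
    u_i, v_i = U[:, i], V[:, i]
    r = np.vstack([X0_inv, -X0_inv.sum(axis=0)])  # rows r_1, r_2, r_3 and r_4 = -(r_1 + r_2 + r_3)
    return np.array([(r[j] @ v_i) * u_i for j in range(4)])  # each row is (r_j v_i) u_i^T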

21.3.3 Constrained Dynamics
The unconstrained dynamics of deformable bodies can be expressed in matrix form
as $M \dot{v} = F$, where v is a vector that concatenates all nodal velocities, F is a vector with all nodal forces, including gravity, elasticity, etc., and M is the mass matrix. Given positions $x_0$ and velocities $v_0$ at the beginning of a simulation step, we integrate the equations with backward Euler implicit integration and linearized forces, which amounts to solving the following linear system:

$A v^* = b, \quad \text{with} \quad A = M - h\,\frac{\partial F}{\partial v} - h^2\,\frac{\partial F}{\partial x} \quad \text{and} \quad b = \left(M - h\,\frac{\partial F}{\partial v}\right) v_0 + h F.$  (21.22)

With these unconstrained velocities v*, we integrate positions x* = x0 + hv*, and


check whether strain-limiting constraints are violated. We group all constraints in
one large vector C, and linearize them at the beginning of the simulation step (C0),
using the generalized constraint Jacobian $\partial C / \partial x = J$:

$J v \ge -\frac{1}{h}\, C_0.$  (21.23)

The constrained dynamics problem is a quadratic program (QP) that consists of finding the closest velocity to the unconstrained one, subject to the constraints, that is,

$v = \arg\min\; (v - v^*)^T A\, (v - v^*), \quad \text{s.t. } J v \ge -\frac{1}{h}\, C_0.$  (21.24)


This QP is equivalent to the following LCP:

$0 \le J A^{-1} J^T \lambda + J A^{-1} b + \frac{1}{h}\, C_0 \;\perp\; \lambda \ge 0,$  (21.25)

with constrained velocities computed as

$A v = b + J^T \lambda.$  (21.26)

In our examples, we solve the LCP using projected Gauss–Seidel (PGS) relaxation
(Cottle etal. 1992).
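A minimal projected Gauss–Seidel relaxation for the LCP of Equation 21.25 might look as follows (Python/NumPy sketch; B = J A^{-1} J^T and c = J A^{-1} b + C_0/h are assumed to be precomputed, and names are illustrative):

import numpy as np

def projected_gauss_seidel(B, c, iterations=50):
    # Solves 0 <= B lam + c, complementary to lam >= 0, by sweeping the unknowns.
    lam = np.zeros_like(c)
    for _ in range(iterations):
        for i in range(len(c)):
            residual = c[i] + B[i] @ lam - B[i, i] * lam[i]  # contribution of the other multipliers
            lam[i] = max(0.0, -residual / B[i, i])           # unconstrained update projected onto lam_i >= 0
    return lam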

21.3.4 Contact and Friction


Given two points pa and pb defining a collision, and the outward normal nb at pb, we
formulate a nonpenetration constraint as

$C = n_b^T (p_a - p_b) \ge 0.$  (21.27)

Nonpenetration constraints are linearized as in Equation 21.23, added to the vector


of constraints C together with strain-limiting constraints, and solved simultaneously.
Many of the interesting skin deformations appear due to friction. Figure 21.4
shows the nonlinear deformations produced during finger sliding, thanks to strain-limiting finger mechanics. We incorporate contact friction in the earlier constrained dynamics problem using Coulomb's model. In practice, in each iteration of PGS, after the normal force $\lambda_n$ of a contact is computed, we add friction forces $\lambda_t$ in the tangent plane of contact. To compute the forces, we maximize the dissipation of tangential velocity subject to the Coulomb cone constraint, $\|\lambda_t\| \le \mu\, \lambda_n$, where $\mu$
is the friction coefficient. We approximate the cone constraint using a four-sided
pyramid.
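Inside each PGS sweep, the pyramid approximation reduces to clamping the two tangential multipliers to a box whose size is set by the normal multiplier, for example (illustrative sketch):

import numpy as np

def clamp_friction(lam_t, lam_n, mu):
    # Four-sided pyramid approximation of the Coulomb cone: each tangential
    # component is limited independently to [-mu * lam_n, mu * lam_n].
    bound = mu * max(lam_n, 0.0)
    return np.clip(lam_t, -bound, bound)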

21.3.5 Error Metrics
The algorithm described earlier is a standard approach for simulating dynamics
under nonpenetration contact constraints (Kaufman etal. 2008; Otaduy etal. 2009).
Then, the simultaneous simulation of strain-limiting and contact constraints is simply carried out by merging strain-limiting constraints of the form (21.20) and contact constraints of the form (21.27) into the same set of constraints C.

FIGURE 21.4 Three screen captures (side and bottom views) of interactive deformations of a finger model under frictional contact, with unconstrained (reference, middle), backward (left), and forward (right) motions of the finger. The finger was simulated with a Young modulus of 2000 kPa and isotropic strain-limiting of 0.9 < si < 1.1.
However, the convergence of the PGS solver for the LCP in Equation 21.25
requires appropriate weighting of the various constraint errors. We measure the error
of the constrained problem as
$\text{error} = \sum_i w_i \left|\min(C_i, 0)\right|.$  (21.28)

To weight the constraints, we simply express constraint errors as distances in the units
of the workspace. Contact constraints as in Equation 21.27 are already expressed in
distance units; therefore, we set wi = 1 for them. Strain-limiting constraints as in
Equation 21.20 are dimensionless and indicate a relative scaling of tetrahedra. To
transform them to distance units, we scale each strain-limiting constraint by the
average edge length e of its corresponding tetrahedron, that is, wi = e.
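As a sketch of the weighting just described (Python/NumPy; the exact aggregation of the per-constraint violations is an assumption consistent with Equation 21.28):

import numpy as np

def weighted_constraint_error(C, w):
    # w_i = 1 for contact constraints, and w_i = average tetra edge length for
    # strain-limiting constraints, so every term is measured in distance units.
    return np.sum(w * np.abs(np.minimum(C, 0.0)))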

21.3.6 Haptic Coupling


In our current implementation, we provide haptic interaction through kinesthetic
devices, but we plan to integrate our skin model with wearable haptic devices. To
couple the device and the simulation, we follow a virtual coupling approach (Colgate
et al. 1995). The complete deformable finger plays the role of haptic tool, and to
couple the haptic device we associate a rigid haptic handle to the finger (Garre etal.
2011). The handle is connected to the haptic device through a virtual coupling mechanism. We initialize the haptic handle with the same size, mass, and position as the
finger, and we connect the tool and handle using springs, which provide bidirectional
coupling.
In our current implementation, we simulate the finger model in a visual thread,
whose frame rate is limited by the simulation updates. To improve the quality of
haptic feedback, we simulate a proxy handle in a haptic thread running at 1 kHz,
and we set up virtual coupling mechanisms between the handle and its proxy, and
the proxy and the haptic device. Further improvements to haptic feedback could be
possible by substituting the virtual coupling between the handle and the proxy with
a handle-space linearization of the spring forces between the handle and the tool
(Garre etal. 2011).
To test our simulation algorithm, we have used two haptic devices. As shown in
Figure 21.5, a Phantom Premium with a thimble-type end-effector provides direct
kinesthetic feedback on the fingertip. However, the thimble-type end-effector lacks
tracking of orientations, and to test our model under full 6-DoF tracking we have
also integrated the simulation with a Phantom Omni haptic device.

21.3.7 Results
To show the behavior of our strain-limiting model and characterize its performance, we show examples of nonlinear soft tissue deformation (Figures 21.4, 21.5, and 21.6), and compare
the behavior to linear elastic materials (Figure 21.3). Most importantly, we have


FIGURE 21.5 A user manipulates a haptic device with a thimble end-effector. The split
screen shows, on the left, a first-person view of a finger model tapping a wooden table, and on
the right, a close-up bottom view of the fingertip.

applied our algorithm to the simulation of soft finger contact with haptic feedback
of direct finger interaction, as shown in Figure 21.5. A human finger model is
simulated while tracking the haptic device, and forces and deformations on the
finger's skin are computed interactively to command the feedback forces of the
haptic device.
Finally, we have tested our simulation algorithm with a full-hand model. We have
used a Cyberglove device to track the configuration of the user's hand, and we use PD controllers to command the hand's skeleton based on input bone transformations.
In the future, mechanical tracking will be replaced with vision-based markerless
hand tracking, and the simulation will be integrated with wearable devices.
With the linear corotational deformation model, realistic bulging is only possible
under moderate forces, and the hand skin soon suffers artifacts. The top row of
Figure 21.6 shows clear artifacts of the linear corotational model when the deformations are large, in particular at finger joints, where the skin is excessively compressed. Our strain-limiting model produces the desired nonlinear skin behavior
instead, and deformations appear bounded in practice, as shown in the second row
of Figure 21.6.
With the addition of strain-limiting constraints, the simulation is no longer interactive. In the four frames shown in Figure 21.6, the number of active strain-limiting
constraints is 331 on average, and the simulation runs at an average of 376 ms per
time step. Future work will focus on the design of interactive nonlinear skin models
based on strain-limiting, and the current solution will be used for validation purposes. The current constraint-based skin model is interactive for models of moderate
size. Future developments in the project will focus on the estimation of model parameters from acquired data, the design of more efficient solvers for the constraint-based
skin model, and the integration of the skin model and the frictional fingertip contact
model. The current results will serve as a baseline for comparisons, and will be used


FIGURE 21.6 Hand animation and simulation of grasping. The snapshots compare the simulation of a full-hand model with and without strain-limiting constraints. Top row: simulation
of flesh using a linear corotational FEM model without strain-limiting. The fingers deform
excessively, particularly at joints. Second row: the same simulation using our nonlinear skin
model based on strain-limiting constraints. Third row: embedding tetrahedral mesh without
constraints. Fourth row: embedding tetrahedral mesh with constraints.

for validation purposes of more efficient models. In addition, the final model will be
used not only in the context of the model-based device control strategy, but also for
a model-based analysis of perceptual processes.

21.4 ANISOTROPIC SOFT SKIN DEFORMATION


Many nonlinear stiff materials exhibit anisotropic behaviors. In human anatomy,
for instance, the presence of flesh, skin, and bones generates a highly nonlinear and
anisotropic behavior, with different amounts of deformation depending on the position and direction of the applied loads. A real-world finger is a clear example of
anisotropic nonlinear elastic behavior, particularly under compression. It is very
compliant under light loading, but soon becomes almost rigid. This is true when the
fingertip is pressed flat against a surface. When pressed on the side, there is hardly
any deformation, showing a high anisotropy.
This anisotropic behavior has not been addressed in the past in the context of constraint-based strain limiting, yet many materials exhibit different material responses
depending on the deformation direction. Anisotropic behaviors are hard to implement in edge-based strain-limiting approaches (Provot 1995; Bridson et al. 2003),


since edges need to be aligned with the deformation direction that is being constrained,
requiring extensive remeshing. In continuum-based approaches, Thomaszewski etal.
(2009) use different limits for each strain value component of a cloth simulation
(weft, warp, and shear strains). With this approach, limits and strain values are always
defined on undeformed axes; hence they do not distinguish well the various deformation modes under large deformations. Picinbono etal. (2003) allow transverse anisotropic strain-limiting (with a transverse and a radial privileged direction) by adding
an energy term to a hyperelasticity formulation, penalizing stretch deformations in
the transverse direction. This formulation does not suffer from the same problems
as full anisotropy, since strain-limiting is only enforced on one axis, the radial axis
being free to deform. Therefore, no interpolation is required, but only a projection of
the strain tensor along the transverse direction. There are simple and straightforward
approaches used in other contexts to model anisotropy, but these solutions produce
unrealistic results for strain-limiting.
In this chapter, we describe a novel hyperbolic projection function to compute
stretch and compress limits along any deformation direction, and formulate the
strain-limiting constraints based on this projection. We compare our approach to
naïve solutions and different approaches found in the literature, and show that our
approach produces predictable and more realistic results.

21.4.1 Definition of Strain Limits


In the isotropic case, computing the limits for any principal axis of deformation is
straightforward, since the limits are all the same no matter the direction.
In the anisotropic case, however, limits and deformation values are defined on different sets of axes. Stretch and compress limits are defined on each axis of the global reference frame ($s_{\max}^j$ and $s_{\min}^j$, with $j \in \{1,2,3\}$). Deformation values are defined along the principal axes of deformation computed through the SVD ($s_i$, with $i \in \{1,2,3\}$). In general, the frames do not match. Yet, we need to know the value of the stretch and compress limits along the principal axes of deformation to be able to formulate the constraints as in Equation 21.19. In the following section, we describe our method for the computation of deformation limits along the principal axes from deformation limits given on a global frame.

Figure 21.7 illustrates the problem and our solution. Let $F_d$ be the orthonormal frame representing the principal axes of deformation. According to the SVD decomposition in Equation 21.17, in order to transform a vector from the global frame (where the limits are defined) to the frame $F_d$ (where the deformation values are defined), the vector has to be rotated by matrix $V^T$. Since the limits are defined on the global frame, which uses a canonical basis $(\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3)$, $V^T$ provides the three directions along which the limits are known in $F_d$. However, the deformation values to be limited are known along the axes of $F_d$. Hence, our problem is reduced to finding what the limits are along these axes.
For the general case, we require a function $p$ that projects each rotated limit onto the axes of $F_d$, thus providing stretch and compress limits to apply to each deformation value $s_i$. Since there are three directions $(\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3)$ with two limits each (stretch and compress), and each direction has to be projected on each axis of $F_d$, there is a total of 18 limits to be computed (6 for each deformation value $s_i$).

FIGURE 21.7 Illustration of our hyperbolic projection method, which projects the limits from the rotated global axes onto the principal axes of deformation.

21.4.2 Hyperbolic Projection Function


Naïve approaches for $p$, such as orthogonal projection or linear interpolation, result in incorrect or unrealistic results, as we will show later. Naturally, we want a nonlinear interpolation where the limit remains unchanged if deformation and limit directions match, and where the limit vanishes (i.e., becomes infinitely large) when deformation and limit directions are orthogonal. Therefore, we define $p$ as

$$p(\alpha) = \frac{1}{|\cos(\alpha)|}, \qquad (21.29)$$

where $\alpha$ is the angle between a given rotated limit direction and a given axis of $F_d$.

Let us consider, for instance, axis $\mathbf{e}_j$ of the global frame, where $s_{\min}^j$ and $s_{\max}^j$ are defined. The limit direction in $F_d$ is $V^T \mathbf{e}_j$, and the axes of $F_d$ are $((1,0,0)^T, (0,1,0)^T, (0,0,1)^T) = (\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3)$. This results in the following stretch and compress values for each deformation value $s_i$:

$$s_{\min}^{j,i} = 1 + \frac{s_{\min}^j - 1}{|\mathbf{e}_i^T V^T \mathbf{e}_j|}, \qquad (21.30)$$

$$s_{\max}^{j,i} = 1 + \frac{s_{\max}^j - 1}{|\mathbf{e}_i^T V^T \mathbf{e}_j|}. \qquad (21.31)$$

Equations 21.30 and 21.31 provide stretch and compress values for each limit defined on a global axis ($j \in \{1,2,3\}$) and each principal axis of deformation ($i \in \{1,2,3\}$).
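To make the limit computation concrete, the following NumPy sketch (our illustration, not code from the chapter's authors; all names are ours) evaluates Equations 21.30 and 21.31 for every pair of global axis j and principal axis i, given the rotation V from the SVD of an element's deformation gradient. The example limit values follow the finger setup used later in Section 21.4.6.

```python
import numpy as np

def anisotropic_limits(V, s_min_global, s_max_global, eps=1e-9):
    """Hyperbolic projection of global-frame limits onto the principal
    axes of deformation (Equations 21.30 and 21.31).

    V             -- 3x3 rotation from the SVD F = U * diag(s) * V^T
    s_min_global  -- compress limits (s_min^1, s_min^2, s_min^3)
    s_max_global  -- stretch limits  (s_max^1, s_max^2, s_max^3)
    Returns two 3x3 arrays: limits[j, i] applies to deformation value s_i.
    """
    s_min = np.empty((3, 3))
    s_max = np.empty((3, 3))
    for j in range(3):          # global axis e_j where the limits are defined
        for i in range(3):      # principal axis of deformation
            # |e_i^T V^T e_j| = |V[j, i]|; eps avoids division by zero
            c = max(abs(V[j, i]), eps)
            s_min[j, i] = 1.0 + (s_min_global[j] - 1.0) / c
            s_max[j, i] = 1.0 + (s_max_global[j] - 1.0) / c
    return s_min, s_max

# Example: deformation gradient of a compressed, slightly sheared element
F = np.array([[1.0, 0.1, 0.0],
              [0.0, 0.8, 0.0],
              [0.0, 0.0, 1.0]])
U, s, Vt = np.linalg.svd(F)
lims_min, lims_max = anisotropic_limits(Vt.T, [0.95, 0.75, 0.98],
                                        [1.05, 1.25, 1.02])
print(s, lims_min, lims_max, sep="\n")
```

When a limit direction is nearly orthogonal to a principal axis, the projected limit becomes very large, which is exactly the behavior the hyperbolic projection is designed to produce.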

21.4.3 Constraint Formulation

In the isotropic case, the constraints are defined as

$$C_{\min}^{i} = s_i - s_{\min} \geq 0, \qquad (21.32)$$

$$C_{\max}^{i} = s_{\max} - s_i \geq 0. \qquad (21.33)$$

Based on Equations 21.30 and 21.31, we reformulate our constraints to take into account each interpolated limit, resulting in

$$C_{\min}^{j,i} = |\mathbf{e}_i^T V^T \mathbf{e}_j|\,(s_i - 1) - (s_{\min}^j - 1) \geq 0, \qquad (21.34)$$

$$C_{\max}^{j,i} = (s_{\max}^j - 1) - |\mathbf{e}_i^T V^T \mathbf{e}_j|\,(s_i - 1) \geq 0. \qquad (21.35)$$
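A small companion sketch (again illustrative, with our own names) evaluates the reformulated constraints of Equations 21.34 and 21.35 directly in product form, so no division by a possibly tiny $|\mathbf{e}_i^T V^T \mathbf{e}_j|$ is needed; nonnegative entries indicate satisfied constraints.

```python
import numpy as np

def strain_limit_constraints(s, V, s_min_global, s_max_global):
    """Evaluate the anisotropic strain-limiting constraints
    (Equations 21.34 and 21.35). Nonnegative values mean 'satisfied'.

    s -- singular values (s_1, s_2, s_3) of the deformation gradient
    V -- 3x3 rotation from the SVD, columns are right singular vectors
    """
    C_min = np.empty((3, 3))
    C_max = np.empty((3, 3))
    for j in range(3):
        for i in range(3):
            w = abs(V[j, i])                      # |e_i^T V^T e_j|
            C_min[j, i] = w * (s[i] - 1.0) - (s_min_global[j] - 1.0)
            C_max[j, i] = (s_max_global[j] - 1.0) - w * (s[i] - 1.0)
    return C_min, C_max
```

Only entries that come out negative would be activated and handed to the Lagrange-multiplier solver in that simulation step.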

21.4.4 Constraint Jacobians

We enforce strain-limiting constraints following the constrained dynamics algorithm described in the previous section. This formulation requires the computation of constraint Jacobians with respect to the generalized coordinates of the system (i.e., the nodal positions of the finite element mesh) for two reasons. First, constraints are nonlinear, and we locally linearize them in each simulation step. Second, we enforce constraints using the method of Lagrange multipliers, which applies forces in the direction normal to the constraints.

Taking the derivatives of Equations 21.34 and 21.35 with respect to a node $\mathbf{x}_n$ requires computing the derivatives of $s_i$ and $V^T$ with respect to $\mathbf{x}_n$. For the differentiation of $s_i$, recall from Equation 21.21 that

$$\frac{\partial s_i}{\partial \mathbf{x}_n} = \mathbf{r}_n^T \mathbf{v}_i\, \mathbf{u}_i^T. \qquad (21.36)$$

Papadopoulo and Lourakis (2000) define the derivative of $V$ with respect to each component $g_{kl}$ of the deformation gradient $G$ as

$$\frac{\partial V}{\partial g_{kl}} = V\,\Omega_{k,l}, \qquad (21.37)$$

where $\Omega_{k,l}$ is found by solving a $2 \times 2$ linear system for each $g_{kl}$. Since we need the derivative of the transpose of $V$, and knowing that $\Omega_{k,l}$ is antisymmetric, we have

$$\frac{\partial V^T}{\partial g_{kl}} = -\Omega_{k,l}\,V^T. \qquad (21.38)$$

We can now use the chain rule to get the derivatives with respect to tetrahedral nodes $\mathbf{x}_n$. To avoid dealing with rank-3 tensors, we directly formulate the derivatives of $V^T \mathbf{e}_j$ instead:

$$\frac{\partial (V^T \mathbf{e}_j)}{\partial \mathbf{x}_n} = \begin{pmatrix} \Omega_{1,l}\,V^T \mathbf{e}_j & \Omega_{2,l}\,V^T \mathbf{e}_j & \Omega_{3,l}\,V^T \mathbf{e}_j \end{pmatrix} r_{n,l}. \qquad (21.39)$$

Using Equations 21.36 and 21.39, we can compute the derivatives of the constraints in Equations 21.34 and 21.35 with respect to the nodal positions of the mesh:

$$\frac{\partial C_{\min}^{j,i}}{\partial \mathbf{x}_n} = (s_i - 1)\,\mathrm{sign}\!\left(\mathbf{e}_i^T V^T \mathbf{e}_j\right) \mathbf{e}_i^T \frac{\partial (V^T \mathbf{e}_j)}{\partial \mathbf{x}_n} + |\mathbf{e}_i^T V^T \mathbf{e}_j| \frac{\partial s_i}{\partial \mathbf{x}_n}, \qquad (21.40)$$

$$\frac{\partial C_{\max}^{j,i}}{\partial \mathbf{x}_n} = (1 - s_i)\,\mathrm{sign}\!\left(\mathbf{e}_i^T V^T \mathbf{e}_j\right) \mathbf{e}_i^T \frac{\partial (V^T \mathbf{e}_j)}{\partial \mathbf{x}_n} - |\mathbf{e}_i^T V^T \mathbf{e}_j| \frac{\partial s_i}{\partial \mathbf{x}_n}. \qquad (21.41)$$
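Analytic Jacobians such as Equations 21.40 and 21.41 are easy to get wrong in code, so a finite-difference check is a useful sanity test. The sketch below is a generic central-difference gradient check (our illustration; `constraint_value` stands in for any scalar constraint evaluated from the nodal positions of one tetrahedron).

```python
import numpy as np

def numerical_constraint_gradient(constraint_value, x_nodes, n, h=1e-6):
    """Central-difference estimate of dC/dx_n for node n of a tetrahedron.

    constraint_value -- callable mapping a (4, 3) array of nodal
                        positions to a scalar constraint value C
    x_nodes          -- current nodal positions, shape (4, 3)
    n                -- index of the node to differentiate with respect to
    Returns a 3-vector approximating one row of the constraint Jacobian.
    """
    grad = np.zeros(3)
    for k in range(3):
        xp = x_nodes.copy(); xp[n, k] += h
        xm = x_nodes.copy(); xm[n, k] -= h
        grad[k] = (constraint_value(xp) - constraint_value(xm)) / (2.0 * h)
    return grad
```

Comparing this estimate against the analytic expressions for a few random configurations quickly reveals sign or indexing mistakes.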

21.4.5 Comparison with Other Approaches

In order to justify the use of our hyperbolic projection function for the computation of limits along an arbitrary direction, in this section we show that straightforward approaches do not yield correct results. We compare our hyperbolic projection method with the two simple but naïve approaches among the projection and the interpolation categories: orthogonal projection and linear interpolation.

Orthogonal projection works by simply rotating the limits defined in the global frame to frame $F_d$, and then projecting these limits onto the axes of $F_d$, where the deformation values are defined. Therefore, there is a total of 18 limits and constraints, as in our approach, with three stretch limits $s\_orthoproj_{\max}^{j,i}$ and three compress limits $s\_orthoproj_{\min}^{j,i}$ per principal axis of deformation:

$$s\_orthoproj_{\min}^{j,i} = 1 + (s_{\min}^j - 1)\,|\mathbf{e}_i^T V^T \mathbf{e}_j|, \qquad (21.42)$$

$$s\_orthoproj_{\max}^{j,i} = 1 + (s_{\max}^j - 1)\,|\mathbf{e}_i^T V^T \mathbf{e}_j|. \qquad (21.43)$$

Linear interpolation, on the other hand, interpolates the values defined in the global frame to find the limits along an arbitrary direction. Instead of rotating the global frame to $F_d$, we proceed the other way around: we apply the inverse rotation to $F_d$ to get the principal axes of deformation in the global frame. This allows us to easily compute the interpolations by simply computing the intersection of the line defined by each principal axis of deformation with the ellipsoid defined by the global frame and its limits. Therefore, there is a total of six limits and constraints, as in the isotropic case, with a stretch limit $s\_linearint_{\max}^{i}$ and a compress limit $s\_linearint_{\min}^{i}$ per principal axis of deformation:

$$s\_linearint_{\min}^{i} = \left\| \begin{pmatrix} s_{\min}^1 & 0 & 0 \\ 0 & s_{\min}^2 & 0 \\ 0 & 0 & s_{\min}^3 \end{pmatrix} V \mathbf{e}_i \right\|, \qquad (21.44)$$

$$s\_linearint_{\max}^{i} = \left\| \begin{pmatrix} s_{\max}^1 & 0 & 0 \\ 0 & s_{\max}^2 & 0 \\ 0 & 0 & s_{\max}^3 \end{pmatrix} V \mathbf{e}_i \right\|. \qquad (21.45)$$

We highlight the limitations of both aforementioned approaches in the simple scenario of a compliant vertical beam ($E = 10$ kPa), fixed at its bottom, and compressing due to gravity. Poisson's ratio is set to $\nu = 0.3$. The vertical axis and one of the transverse axes are unrestricted. The remaining transverse axis can only deform up to 5% (i.e., $0.95 < s_i < 1.05$).
In the case of orthogonal projection, constraints are already violated during the
first frame of simulation, thus clearly yielding an overly stiff material. The reason
behind this erroneous behavior is the absence of weights to reduce the influence of
the limits defined on axes that are far from the principal axes of deformation. In our
scenario, the SVD decomposition computed a vertical principal axis of deformation matching the global vertical axis. Therefore, the other two global axes, where
stretch and compress limits are defined, are orthogonal to the vertical principal axis
of deformation. Since the orthogonal projection between orthogonal axes is zero,
according to Equations 21.42 and 21.43 there are two stretch and two compress limits on the vertical principal axis of deformation that are equal to one, meaning that
no deformation is allowed along that axis. The beam is therefore frozen in its initial
configuration.
In the case of linear interpolation, constraints are violated very late, when the
beam has almost completely collapsed on itself and artifacts start to appear, clearly
beyond the expected 5% maximal transversal deformation. This is due to the
weighted combination of unrestrictive limits and very restrictive ones. Since values
are interpolated, the very restrictive limit (in this case, the 5% limit) is progressively
relaxed to the unrestrictive limit as the principal axis of deformation moves from the
restricted to the unrestricted axis. Since in this vertical beam scenario the rotation V
resulted in a 180° rotation around the vertical axis, the transversal unrestrictive limit
overly relaxed the transversal restrictive limit, thus resulting in an overly compliant
material.
When using our approach, the state of the beam is consistent with the 5% transversal deformation limit.
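The qualitative difference between the three mappings can be reproduced with a few lines of code. The sketch below (an illustration with hypothetical limit values, not the chapter's beam experiment) maps a 5% compress limit defined on one global axis onto a principal axis rotated away from it by an angle: orthogonal projection drives the effective limit toward 1 (no deformation allowed), linear interpolation relaxes it toward the other, unrestricted axis limit, and the hyperbolic projection only removes the restriction as the axes become orthogonal.

```python
import numpy as np

def orthogonal_projection(limit, alpha):
    # Eq. 21.42/21.43 restricted to a single plane: weight by |cos(alpha)|
    return 1.0 + (limit - 1.0) * abs(np.cos(alpha))

def linear_interpolation(limit, other_limit, alpha):
    # Eq. 21.44/21.45 in the plane spanned by the two global axes
    return np.hypot(limit * np.cos(alpha), other_limit * np.sin(alpha))

def hyperbolic_projection(limit, alpha):
    # Eq. 21.30/21.31: divide by |cos(alpha)| instead of multiplying
    return 1.0 + (limit - 1.0) / max(abs(np.cos(alpha)), 1e-9)

restricted, unrestricted = 0.95, 0.50   # hypothetical compress limits
for deg in (0, 30, 60, 85):
    a = np.radians(deg)
    print(deg,
          round(orthogonal_projection(restricted, a), 3),
          round(linear_interpolation(restricted, unrestricted, a), 3),
          round(hyperbolic_projection(restricted, a), 3))
```

At 0° all three mappings return the defined limit; as the angle grows, only the hyperbolic projection reproduces the intended behavior of leaving the orthogonal direction unconstrained.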


FIGURE 21.8 Finger skin deformations under anisotropic strain-limiting. A finger is pressed against a table in three different configurations. Top: the finger has anisotropic limits simulating the behavior of a real finger (compliant when pressed flat, stiff otherwise). Bottom: the finger with isotropic compliant limits.

21.4.6 Results

We simulate these highly nonlinear, highly anisotropic conditions using our anisotropic strain-limiting approach on the finger model. The finger model is initialized with its longitudinal direction aligned with the horizontal axis ($\mathbf{e}_1$), and the nail facing up along the vertical axis ($\mathbf{e}_2$). Limits are defined as $0.95 < s_1 < 1.05$ (stiff along $\mathbf{e}_1$), $0.75 < s_2 < 1.25$ (compliant along $\mathbf{e}_2$), and $0.98 < s_3 < 1.02$ (almost incompressible along $\mathbf{e}_3$). The aforementioned simulation parameters were selected by trial and error to approximately match the behavior of a real finger. Figure 21.8 shows some results of the deformations when the finger is pressed against a table along each axis. We compared the anisotropic model with the isotropic one using $0.75 < s_1 < 1.25$. As expected, for the same motions we obtained similar results along $\mathbf{e}_2$, and overly compliant behavior along the other axes.
Overall, these results could have a strong impact on haptic rendering of direct
hand interaction too. In a model-based control strategy to command tactile devices, it
is important to compute realistic contact variables such as contact area, friction, force
magnitude, and force distribution. We have compared contact variables with a linear elastic model and our isotropic strain-limiting approach in an experiment where
the full finger is pressed flat against a plane: our strain-limiting approach shows the
expected fast increase of the contact force once a certain contact area is reached.

21.5 CONCLUSION
In this chapter, we presented a set of models for efficiently simulating the highly
nonlinear elastic deformations of soft skin under frictional contact, as well as a full
human hand model with its articulated skeleton and soft flesh, allowing realistic
grasping manipulations. This work is a step toward a model-based control strategy
for wearable haptic rendering of direct hand interaction. This ultimate goal requires
a realistic hand model, allowing the real-time computation of high-fidelity forces and
nonlinear deformations during skin contact with simulated objects and materials.


We first quickly surveyed existing devices and techniques allowing rigid and
deformable hand interaction and grasping, highlighting the need for novel techniques
that accounted for the highly nonlinear and anisotropic behavior of flesh, as well as
efficient haptic coupling mechanisms. We then presented a hand model allowing
efficient two-way coupling between flesh and bones for haptic feedback, followed by
a set of constraint-based models addressing the nonlinear and anisotropic behavior
of hand deformations.
Haptic interaction requires very fast update rates, and our current bottleneck is
the dynamics solver of the constrained optimization. We are exploring the use of
more efficient solvers, ideally reaching interactive rates for high-resolution models
and allowing more detailed haptic feedback.
We are also working on improving the way deformation limits are chosen and set. We are exploring the estimation of limits from real force-deformation measurements, in order to mimic the behavior of real-world materials. The automatic estimation and placement of limits in a given model using real-world data (Bickel et al. 2009) would avoid ad hoc tuning and would improve the quality of the deformations.
In our work, we have focused on the improvement of the elastic behavior of finger
simulation models, but accurate modeling of the finger can leverage additional recent
findings about its mechanical behavior (Wiertlewski and Hayward 2012).
This work was largely motivated by the use of an accurate soft skin simulation
as a model-based control strategy in the command of cutaneous haptic devices.
Currently, we have successfully tested the simulation with kinesthetic haptic devices,
and we plan to test it as well with wearable cutaneous devices. To this end, it is
important to identify the particular forces and/or deformations needed to command
specific cutaneous devices.

ACKNOWLEDGMENTS
The authors would like to thank Carlos Garre and Fernando Hernández for their
contributions. This work was supported in part by the EU FP7 project Wearhap
(601165), the European Research Council (ERC-2011-StG-280135 Animetrics), and
the Spanish Ministry of Economy (TIN2012-35840).

REFERENCES
Amemiya, T., H. Ando, and T. Maeda. March 2007. Hand-held force display with spring-cam
mechanism for generating asymmetric acceleration. In EuroHaptics Conference, 2007
and Symposium on Haptic Interfaces for Virtual Environment and Teleoperator Systems.
World Haptics 2007. Second Joint, Tsukaba, Japan, pp. 572573.
Aoki, T., H. Mitake, S. Hasegawa, and M. Sato. 2009. Haptic ring: Touching virtual creatures in mixed reality environments. In SIGGRAPH09: Posters, SIGGRAPH09,
New Orleans, LA. New York: ACM, vol. 100, p. 1100:1.
Baraff, D. and A. Witkin. 1998. Large steps in cloth simulation. In Computer Graphics
(Proceedings of SIGGRAPH 98), Orlando, FL. New York: ACM, pp. 4354.
Barbagli, F., A. Frisoli, K. Salisbury, and M. Bergamasco. 2004. Simulating human fingers:
A soft finger proxy model and algorithm. In Proceedings of the 12th International
Symposium on HAPTICS04, Chicago, IL, vol. 1, pp. 917.


Bergamasco, M., B. Allotta, L. Bosio, L. Ferretti, G. Parrini, G.M. Prisco, F. Salsedo, and
G. Sartini. May 1994. An arm exoskeleton system for teleoperation and virtual environments applications. In Proceedings of the 1994 IEEE International Conference on
Robotics and Automation, San Diego, CA, vol. 2, pp. 14491454.
Bianchi, M., A. Serio, E.P. Scilingo, and A Bicchi. March 2010. A new fabric-based softness
display. In 2010 IEEE Haptics Symposium, Waltham, MA, pp. 105112.
Bickel, B., M. Bächer, M.A. Otaduy, W. Matusik, H. Pfister, and M. Gross. July 2009. Capture
and modeling of non-linear heterogeneous soft tissue. ACM Transactions on Graphics
28 (3): 89:189:9.
Biggs, K. and M.A. Srinivasan. 2002. Haptic interfaces. In K.M. Stanney (ed.), Handbook of
Virtual Environments: Design, Implementation, and Applications, vol. 1. Boca Raton,
FL: CRC Press, pp. 93116.
Borst, C.W. and A.P. Indugula. 2005. Realistic virtual grasping. In Proceedings of the IEEE
Virtual Reality Conference, Bonn, Germany, pp. 9198.
Bridson, R., S. Marino, and R. Fedkiw. 2003. Simulation of clothing with folds and wrinkles.
In Proceedings of the ACM SIGGRAPH/Eurographics Symposium on Computer
Animation, San Diego, CA, pp. 2836.
Chinello, F., M. Malvezzi, C. Pacchierotti, and D. Prattichizzo. 2012. A three DoFs wearable
tactile display for exploration and manipulation of virtual objects. In Proceedings of the
IEEE Haptics Symposium, Vancouver, BC, pp. 7176.
Chubb, E.C., J.E. Colgate, and M.A. Peshkin. 2010. ShiverPaD: A glass haptic surface that
produces shear force on a bare finger. IEEE Transactions on Haptics 3 (3): 189198.
Ciocarlie, M., C. Lackner, and P. Allen. 2007. Soft finger model with adaptive contact
geometry for grasping and manipulation tasks. In World Haptics Conference, Tsukaba,
Japan, pp. 219224.
Colgate, J.E., M.C. Stanley, and J.M. Brown. 1995. Issues in the haptic display of tool use.
In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and
Systems, Pittsburgh, PA, pp. 140145.
Cottle, R., J. Pang, and R. Stone. 1992. The Linear Complementarity Problem. Boston, MA:
Academic Press.
Duriez, C., H. Courtecuisse, J.-P. de la Plata Alcalde, and P.-J. Bensoussan. 2008. Contact
skinning. In Eurographics Conference (short paper). Crete, Greece.
Frisoli, A., F. Barbagli, E. Ruffaldi, K. Salisbury, and M. Bergamasco. 2006. A limit-curve
based soft finger god-object algorithm. In 14th Symposium on Haptic Interfaces for
Virtual Environment and Teleoperator Systems, Arlington, VA, pp. 217223.
Garre, C., F. Hernandez, A. Gracia, and M.A. Otaduy. 2011. Interactive simulation of a
deformable hand for haptic rendering. In Proceedings of the World Haptics Conference,
Istanbul, Turkey, pp. 239244.
Gemperle, F., C. Kasabach, J. Stivoric, M. Bauer, and R. Martin. October 1998. Design for
wearability. In Second International Symposium on Wearable Computers, 1998. Digest
of Papers, Pittsburgh, PA, pp. 116122.
Gleeson, B.T., S.K. Horschel, and W.R. Provancher. October 2010. Design of a fingertip-mounted tactile display with tangential skin displacement feedback. IEEE Transactions
on Haptics 3 (4): 297301.
Gleeson, B.T., C.A. Stewart, and W.R. Provancher. 2011. Improved tactile shear feedback: Tactor
design and an aperture-based restraint. IEEE Transactions on Haptics 4 (4): 253262.
Hernandez, F., G. Cirio, A.G. Perez, and M.A. Otaduy. 2013. Anisotropic strain limiting.
InProceedings of the CEIG. Madrid, Spain.
Howe, R.D. and M.R. Cutkosky. December 1996. Practical force-motion models for sliding
manipulation. International Journal of Robotics Research 15 (6): 557572.


Irving, G., J. Teran, and R. Fedkiw. 2004. Invertible finite elements for robust simulation of
large deformation. In Proceedings of the ACM SIGGRAPH/Eurographics Symposium on
Computer Animation. Aire-la-Ville, Switzerland: Eurographics Association, pp. 131140.
Jacobs, J. and B. Froehlich. March 2011. A soft hand model for physically-based manipulation of virtual objects. In 2011 IEEE Virtual Reality Conference (VR), Singapore,
pp.1118.
Jacobs, J., M. Stengel, and B. Froehlich. March 2012. A generalized god-object method for
plausible finger-based interactions in virtual environments. In 2012 IEEE Symposium on
3D User Interfaces (3DUI), Costa Mesa, CA, pp. 4351.
Kaufman, D.M., S. Sueda, D.L. James, and D.K. Pai. 2008. Staggered projections for frictional
contact in multibody systems. In Proceedings of the ACM SIGGRAPH Asia, Singapore,
vol. 27, pp.164:1164:11.
Kim, H., C. Seo, J. Lee, J. Ryu, S. Yu, and S. Lee. September 2006. Vibrotactile display
for driving safety information. In IEEE Intelligent Transportation Systems Conference,
2006. ITSC06, Toronto, Ontario, Canada, pp. 573577.
Kry, P.G., D.L. James, and D.K. Pai. July 2002. EigenSkin: Real time large deformation character skinning in hardware. In ACM SIGGRAPH Symposium on Computer Animation,
San Antonio, TX. New York: ACM, pp. 153160.
Kry, P.G. and D.K. Pai. July 2006. Interaction capture and synthesis. ACM Transactions on
Graphics 25 (3): 872880.
Kurihara, T. and N. Miyata. July 2004. Modeling deformable human hands from medical
images. In 2004 ACM SIGGRAPH/Eurographics Symposium on Computer Animation.
Aire-la-Ville, Switzerland: Eurographics Association, pp. 355363.
Li, Y., J.L. Fu, and N.S. Pollard. July/August 2007. Data-driven grasp synthesis using shape
matching and task-based pruning. IEEE Transactions on Visualization and Computer
Graphics 13 (4): 732747.
Lieberman, J. and C. Breazeal. October 2007. TIKL: Development of a wearable vibrotactile
feedback suit for improved human motor learning. IEEE Transactions on Robotics 23
(5): 919926.
Lin, M.C. and M.A. Otaduy (eds.). July 2008. Haptic Rendering: Foundations, Algorithms
and Applications, Illustrated edition. Wellesley, MA: A K Peters Ltd.
Magnenat-Thalmann, N., R. Laperrière, and D. Thalmann. June 1988. Joint-dependent local
deformations for hand animation and object grasping. In Graphics Interface88. Toronto,
Ontario, Canada: Canadian Information Processing Society, pp. 2633.
McNeely, W.A., K.D. Puterbaugh, and J.J. Troy. August 1999. Six degrees-of-freedom haptic
rendering using voxel sampling. In Proceedings of the SIGGRAPH 99, Computer
Graphics Proceedings, Los Angeles, CA, vol. 18, pp. 401408.
Minamizawa, K., S. Fukamachi, H. Kajimoto, N. Kawakami, and S. Tachi. 2007. Gravity
grabber: Wearable haptic display to present virtual mass sensation. In ACM SIGGRAPH
2007 Emerging Technologies, SIGGRAPH07. New York: ACM.
Minamizawa, K., S. Kamuro, S. Fukamachi, N. Kawakami, and S. Tachi. 2008. GhostGlove:
Haptic existence of the virtual world. In ACM SIGGRAPH 2008 New Tech Demos,
SIGGRAPH08, Los Angeles, CA. New York: ACM, vol. 18, Article No. 8, p. 118:1.
Müller, M. and M. Gross. 2004. Interactive virtual materials. In Proceedings of the Graphics
Interface, Canadian HumanComputer Communications Society School of Computer
Science, University of Waterloo, Waterloo, Ontario, Canada, pp. 239246.
Nakamura, N. and Y. Fukui. March 2007. Development of fingertip type non-grounding
force feedback display. In EuroHaptics Conference, 2007 and Symposium on Haptic
Interfaces for Virtual Environment and Teleoperator Systems. World Haptics 2007.
Second Joint, Tsukaba, Japan, pp. 582583.


Nordahl, R., A. Berrezag, S. Dimitrov, L. Turchet, V. Hayward, and S. Serafin. 2010.


Preliminary experiment combining virtual reality haptic shoes and audio synthesis.
InA. Kappers, J. van Erp, W.B. Tiest, and F. van der Helm (eds.), Haptics: Generating
and Perceiving Tangible Sensations, vol. 6192 of Lecture Notes in Computer Science.
Berlin, Germany: Springer, pp. 123129. doi: 10.1007/978-3-642-14075-4-18.
Ortega, M., S. Redon, and S. Coquillart. 2007. A six degree-of-freedom god-object method
for haptic display of rigid bodies with surface properties. IEEE Transactions on
Visualization and Computer Graphics 13 (3): 458469.
Otaduy, M.A., R. Tamstorf, D. Steinemann, and M. Gross. April 2009. Implicit contact
handling for deformable objects. Computer Graphics Forum 28 (2): 559568.
Ott, R., F. Vexo, and D. Thalmann. 2010. Two-handed haptic manipulation for CAD and VR
applications. Computer Aided Design & Applications 7 (1): 125138.
Papadopoulo, T. and M.I.A. Lourakis. 2000. Estimating the jacobian of the singular value
decomposition: Theory and applications. In European Conference on Computer Vision,
Dublin, Ireland. London, U.K.: Springer-Verlag, pp. 554570.
Papetti, S., F. Fontana, M. Civolani, A. Berrezag, and V. Hayward. 2010. Audio-tactile
display of ground properties using interactive shoes. In R. Nordahl, S. Serafin,
F. Fontana, and S. Brewster (eds.), Haptic and Audio Interaction Design, vol. 6306
of Lecture Notes in Computer Science. Berlin, Germany: Springer, pp. 117128. doi:
10.1007/978-3-642-15841-4-13.
Perez, A.G., G. Cirio, F. Hernandez, C. Garre, and M.A. Otaduy. 2013. Strain limiting for
soft finger contact simulation. In Proceedings of the IEEE World Haptics Conference,
Daejeon, South Korea, pp. 7984.
Picinbono, G., H. Delingette, and N. Ayache. 2003. Non-linear anisotropic elasticity for real-time surgery simulation. Graphical Models 65 (5): 305321.
Pollard, N.S. and V.B. Zordan. July 2005. Physically based grasping control from example.
In 2005 ACM SIGGRAPH/Eurographics Symposium on Computer Animation,
LosAngeles, CA. NewYork: ACM, pp. 311318.
Pouliquen, M., C. Duriez, C. Andriot, A Bernard, L. Chodorge, and F. Gosselin. March 2005.
Real-time finite element finger pinch grasp simulation. In Eurohaptics Conference, 2005
and Symposium on Haptic Interfaces for Virtual Environment and Teleoperator Systems,
2005. World Haptics 2005. First Joint, Pisa, Italy, pp. 323328.
Prattichizzo, D., F. Chinello, C. Pacchierotti, and M. Malvezzi. October 2013. Towards wearability in fingertip haptics: A 3-DoF wearable device for cutaneous force feedback. IEEE
Transactions on Haptics 6 (4): 506516.
Provot, X. 1995. Deformation constraints in a mass-spring model to describe rigid cloth
behavior. In Proceedings of the Graphics Interface, Quebec, QC.
Rivers, A.R. and D.L. James. 2007. FastLSM: Fast lattice shape matching for robust real-time
deformation. In ACM Transactions on Graphics (Proceedings of SIGGRAPH 2007),
26(3), Article 82.
Sarakoglou, I., N. Garcia-Hernandez, N.G. Tsagarakis, and D.G. Caldwell. 2012. A high
performance tactile feedback display and its integration in teleoperation. IEEE Transactions on Haptics 5 (3): 252263.


Scheggi, S., F. Chinello, and D. Prattichizzo. 2012. Vibrotactile haptic feedback for human-robot interaction in leader-follower tasks. In Proceedings of the Fifth International
Conference on PErvasive Technologies Related to Assistive Environments, PETRA12,
Crete, Greece. New York: ACM, vol. 51, p. 151:4.
Scilingo, E.P., M. Bianchi, G. Grioli, and A. Bicchi. 2010. Rendering softness: Integration of
kinesthetic and cutaneous information in a haptic device. IEEE Transactions on Haptics
3 (2): 109118.


Serina, E.R., C.D. Mote, and D. Rempel. 1997. Force response of the fingertip pulp to repeated
compression: Effects of loading rate, loading angle and anthropometry. Journal of
Biomechanics 30 (10): 10351040.
Sin, F., Y. Zhu, Y. Li, D. Schroeder, and J. Barbič. 2011. Invertible isotropic hyperelasticity
using SVD gradients. In ACM SIGGRAPH/Eurographics Symposium on Computer
Animation (Posters), Vancouver, BC.
Solazzi, M., A. Frisoli, and M. Bergamasco. September 2010. Design of a cutaneous fingertip
display for improving haptic exploration of virtual objects. In 2010 IEEE RO-MAN,
Viareggio, Italy, pp. 16.
Solazzi, M., W.R. Provancher, A. Frisoli, and M. Bergamasco. 2011. Design of an SMA-actuated
2-DoF tactile device for displaying tangential skin displacement. In IEEE World Haptics
Conference, Istanbul, Turkey, pp. 3136.
Sueda, S., A. Kaufman, and D.K. Pai. August 2008. Musculotendon simulation for hand
animation. ACM Transactions on Graphics 27 (3). Article No. 8.
Thomaszewski, B., S. Pabst, and W. Strasser. 2009. Continuum-based strain limiting. Computer
Graphics Forum 28 (2): 569576.
Traylor, R. and H.Z. Tan. 2002. Development of a wearable haptic display for situation
awareness in altered-gravity environment: Some initial findings. In Proceedings of
the 10th Symposium on Haptic Interfaces for Virtual Environment and Teleoperator
Systems, 2002. HAPTICS 2002, Orlando, FL, pp. 159164.
Turchet, L., P. Burelli, and S. Serafin. 2013. Haptic feedback for enhancing realism of walking
simulations. IEEE Transactions on Haptics 6 (1): 3545.
Wang, Q. and V. Hayward. 2010. Biomechanically optimized distributed tactile transducer
based on lateral skin deformation. International Journal of Robotics Research 29 (4):
323335.
Wiertlewski, M. and V. Hayward. 2012. Mechanical behavior of the fingertip in the range
of frequencies and displacements relevant to touch. Journal of Biomechanics 45 (11):
18691874.
Winfree, K.N., J. Gewirtz, T. Mather, J. Fiene, and K.J. Kuchenbecker. March 2009. A high
fidelity ungrounded torque feedback device: The iTorqU 2.0. In EuroHaptics conference,
2009 and Symposium on Haptic Interfaces for Virtual Environment and Teleoperator
Systems. World Haptics 2009. Third Joint, Salt Lake City, UT, pp. 261266.
Yang, G.-H., K.-U. Kyung, M.A. Srinivasan, and D.-S. Kwon. March 2007. Development
of quantitative tactile display device to provide both pin-array-type tactile feedback
and thermal feedback. In EuroHaptics Conference, 2007 and Symposium on Haptic
Interfaces for Virtual Environment and Teleoperator Systems. World Haptics 2007.
Second Joint, Tsukaba, Japan, pp. 578579.
Yang, T.-H., S.-Y. Kim, C.-H. Kim, D.-S. Kwon, and W.J. Book. March 2009. Development of
a miniature pin-array tactile module using elastic and electromagnetic force for mobile
devices. In EuroHaptics conference, 2009 and Symposium on Haptic Interfaces for
Virtual Environment and Teleoperator Systems. World Haptics 2009. Third Joint, Salt
Lake City, UT, pp. 1317.
Zilles, C.B. and J.K. Salisbury. 1995. A constraint-based god-object method for haptic display.
In IEEE/RSJ International Conference on Intelligent Robots and Systems, vol. 3. Los
Alamitos, CA: IEEE Computer Society, p. 3146.

22 Design Challenges of Real Wearable Computers

Attila Reiss and Oliver Amft

CONTENTS
22.1 Introduction................................................................................................. 584
22.1.1 Review Inclusion and Exclusion Criteria....................................... 585
22.2 Garment-Based Wearable Computers......................................................... 585
22.2.1 Placement on Human Body........................................................... 590
22.2.1.1 Head and Neck Region................................................... 590
22.2.1.2 Torso Region................................................................... 590
22.2.1.3 Arms and Hands Region................................................. 593
22.2.1.4 Legs Region.................................................................... 593
22.2.1.5 Feet Region..................................................................... 593
22.3 Accessory-Based Wearable Computers....................................................... 594
22.3.1 Placement on Human Body........................................................... 595
22.3.1.1 Head and Neck Region................................................... 595
22.3.1.2 Torso Region................................................................... 595
22.3.1.3 Arm Region..................................................................... 598
22.3.1.4 Wrist Region................................................................... 599
22.3.1.5 Hand and Finger Region................................................. 599
22.3.1.6 Legs and Feet Region......................................................600
22.4 Lessons Learned and Best Practices...........................................................600
22.4.1 User Interface................................................................................. 600
22.4.2 Sensing Modalities......................................................................... 603
22.4.3 Data and Power.............................................................................. 607
22.4.4 Wearability..................................................................................... 608
22.4.5 Social Acceptance and Aesthetics.................................................. 609
22.4.6 Robustness and Reliability............................................................610
22.4.7 Extensibility................................................................................... 611
22.4.8 Cost................................................................................................ 612
22.4.9 Safety............................................................................................. 612
22.4.10 On-Body Computing..................................................................... 612
22.5 Future Directions in Wearable Computing.................................................. 613
Acknowledgment.................................................................................................... 615
References............................................................................................................... 615

22.1 INTRODUCTION

The foundations of wearable computing may lie in pocket and wrist-worn watches that were invented in the sixteenth century. Thorp and Shannon are credited with being among the first researchers attempting to build wearable computers in 1961. Their motivation was to beat statistics and increase the chances of winning in card games and roulette (Thorp, 1998). While their wearable systems were rudimentary by today's technology standards, they battled with integrating critical features that remain key components of a wearable computer today: sensing or information input, computing, and some form of actuation, feedback, or communication of retrieved information. Thorp and Shannon's solution was based on game status input using a shoe-integrated button, computing using a carry-on device hidden in clothes, and information feedback using an audio ear pad, all to avoid being detected by the game clerk. Between the 1990s and today, wearable computers gained commercial and research interest, and the first on-body computing products appeared in the markets. In the 1990s, on-body computer solutions were often derived from standard computing components available at the time, such as the half-keyboard (Matias et al., 1994) and backpack or side-case worn embedded computing units (Lizzy, 1993; Starner, 1993). As more integrated sensors became available, the view on market opportunities shifted from using on-body computers for mobile data entry applications, such as in warehouse management, to fitness, sports, and the quantified self. However, a substantial share of today's commercial on-body systems, such as the Nike running gear (shoe sensor plus mobile carry-on device), still follows a similar technical approach as the early solution of Thorp and Shannon: the on-body computer solution is centered around a mobile carry-on device that provides computing, and often also sensing and actuation/communication functions. Similarly, many research efforts during the past years considered smartphones and other carry-on devices as the basis for wearable computers. Devices are often placed in pockets, attached to glasses, or strapped to body parts, without actually considering the integration into a wearable system.
The vision of invisible computing set out by Mark Weiser (1991) suggests that technology shall not require particular consideration during everyday life, be virtually invisible, and thus not hinder physical activity. Consequently, electronic devices that add features to wearable systems, including computing, sensors, etc., must be unobtrusively embedded in a user's outfit. In their ultimate form, real wearable computers thus become part of a regular garment or accessory that is already used. Toward the integration of wearable computers, various challenges exist that affect function, robustness, and other design considerations. For example, when integrating computing into a finger ring, space constraints critically affect the design, besides the aesthetic requirements of the ring as an accessory. Rather than space, wearable computers integrated in garments are mostly constrained by textile material properties such as breathability and stretchability.

In this chapter, we review projects that investigate the integration of wearable computers in garments and accessories. Our goal is to provide readers with an overview and in-depth understanding of the technical challenges and best practices of wearable computer integration, that is, wearables that could become useful in everyday
life situations. In particular, we summarize technical design challenges and lessons learned, and provide directions for future research on real wearable computers.
The analysis emphasizes work toward the integration of sensing, computing, and
actuation with (1) textile fabric to obtain garment-based wearable computers and (2)
commonly used body attachments to obtain accessory-based wearable computers as
detailed in the following section.

22.1.1 Review Inclusion and Exclusion Criteria


Various technical challenges exist for integrating wearable computers in garments
and accessories, which is the focus of this review of research literature. An essential
prerequisite for real wearable computers is that the base purpose of a garment or
accessory must be retained when integrating electronics:
Garment-based wearable computers. Garments need to provide cover and
protection from environmental effects, such as water, temperature, and fire,
when adding wearable computing. Similarly, garments may serve as a fashion showcase. Examples include shirts, gloves, pants, and shoes.
Accessory-based wearable computers. Accessories shall retain their purpose
for assistance or aesthetics when adding wearable computing. Examples
include goggles, rings, and belts.
Similar to smartphones, various other carry-on devices exist, such as belt clip-on
embedded computers, activity trackers, and the Google Glass. Following our focus
to identify integration solutions and open challenges, carry-on devices were not considered. Furthermore, work that emphasizes only one aspect, such as sensor integration in textiles or sensor data processing, without considering the complete wearable
computer solution, was excluded from the analysis. While we mention commercial
products throughout this work, our analyses focused on research investigations along
the objectives mentioned earlier.
Based on web searches, we included 20 individual projects on garment-based
wearable computers and 13 projects on accessory-based wearable computers that
were published between the first Fundamentals of Wearable Computers and
Augmented Reality book edition in 2001 and June 2014. Projects were then categorized according to wearing position and subsequently analyzed.

22.2 GARMENT-BASED WEARABLE COMPUTERS


Garment-based wearable computers have been designed and implemented for a
variety of applications, including remote health monitoring, physical activity monitoring, and general user interaction. The following summary highlights important
applications and selected projects. Subsequently, we discuss the placement-specific
analysis in this section. Table 22.1 summarizes the individual garment-based projects
included in the analysis.


TABLE 22.1
Garment-Based Wearable Computer Projects Analyzed in This Review

Location: head and neck; Form: headband
Project: U-healthcare: smart headband for remote health monitoring (Kim et al., 2008)
Architecture: Computing: microcontroller; Sensors: PPG, accelerometer, GPS; UI: none; Communication: ZigBee

Location: head and neck; Form: neckband
Project: capacitive neckband for activity recognition and nutrition monitoring (Cheng et al., 2010)
Architecture: Computing: MSP430 microcontroller; Sensors: capacitive electrodes; UI: none; Communication: ZigBee

Location: torso; Form: shirt
Project: SMASH: distributed computer for body posture monitoring (Harms et al., 2008)
Architecture: Computing: MSP430 microcontroller; Sensors: accelerometers; UI: buttons, LEDs; Communication: I2C, SPI, Bluetooth

Location: torso; Form: shirt
Project: CHRONIUS: instrumented shirt for physiological monitoring (Rosso et al., 2010)
Architecture: Computing: microcontroller; Sensors: ECG, respiration rate monitor, pulse oximeter, temperature sensor, accelerometer, microphone; UI: none; Communication: Bluetooth

Location: torso; Form: vest
Project: MIThril: computing architecture for wearable context-aware research (DeVaul et al., 2001)
Architecture: Computing: MPC823 microprocessor, SA 1110 StrongARM microprocessor; Sensors: microphone, accelerometers; UI: Twiddler, Palm keyboard, HMD; Communication: I2C, USB, Dallas Semiconductor one-wire protocol, 10 Mbs Ethernet, 802.11 WLAN

Location: torso; Form: vest
Project: WearARM: computing core for wearable applications (Lukowicz et al., 2001)
Architecture: Computing: SA 1110 StrongARM microprocessor, TMS320 DSP; Sensors: grayscale cameras, proximity sensor; UI: keyboard, mouse, HMD; Communication: I2C, USB, serial port, IrDA port, PS/2, VGA, 10 Mbs Ethernet, 802.11 WLAN

Location: torso; Form: vest
Project: SensVest: vest-integrated activity monitoring system (Knight et al., 2005)
Architecture: Computing: no information; Sensors: HR monitor, temperature sensor, accelerometer; UI: LED, LCD display; Communication: RF transmission (not specified)

Location: torso; Form: jacket
Project: wearIT@work: motion jacket to support production and maintenance tasks (Stiefmeier et al., 2008)
Architecture: Computing: no information; Sensors: IMUs, force sensitive resistors, ultrawideband tags; UI: none; Communication: data bus, Bluetooth

Location: torso; Form: suit
Project: e-SUIT: business-suit-integrated control of an information management application (Toney et al., 2002)
Architecture: Computing: microcontroller, StrongARM microprocessor; Sensors: none; UI: buttons, slide control, LEDs, LCD display, pager motor; Communication: data bus, WLAN

Location: torso; Form: underclothes
Project: VTAMN: communicating underclothes for remote health monitoring (Noury et al., 2004)
Architecture: Computing: no information; Sensors: ECG, respiration rate monitor, temperature sensor, GPS; UI: button; Communication: I2C, GSM

Location: torso; Form: underclothes
Project: MagIC: instrumented underclothes for remote health monitoring (Di Rienzo et al., 2005)
Architecture: Computing: no information; Sensors: ECG, respiration rate monitor, accelerometer; UI: none; Communication: RF transmission (not specified)

Location: torso; Form: underclothes
Project: WEALTHY: instrumented underclothes for remote health monitoring (Paradiso et al., 2008)
Architecture: Computing: no information; Sensors: ECG, respiration rate monitor (piezoresistive/impedance pneumography), piezoresistive elbow/shoulder joint movement monitor; UI: buttons, LEDs, buzzer; Communication: GPRS

Location: torso, feet; Form: shirt, jacket, and pair of boots
Project: ProeTEX: smart garments for monitoring the health and environmental state of emergency-disaster personnel (Curone et al., 2010)
Architecture: Computing: microcontroller; Sensors: HR monitor, respiration rate monitor, temperature sensor, accelerometer, heat flux sensor, CO/CO2 concentration, GPS; UI: visual and acoustic alarm; Communication: RS485 data bus, ZigBee, 802.11 WLAN

Location: arms and hands; Form: glove
Project: StrinGlove: data glove for sign language recognition (Kuroda et al., 2004)
Architecture: Computing: DSP; Sensors: bend sensors, contact sensors; UI: none; Communication: no information

Location: arms and hands; Form: glove
Project: Airwriting: glove-integrated system for 3D-handwriting recognition (Amma et al., 2013)
Architecture: Computing: microcontroller; Sensors: accelerometer, gyroscope; UI: none; Communication: Bluetooth

Location: legs; Form: pair of pants
Project: instrumented pair of pants for monitoring lower extremity joints (Liu et al., 2008)
Architecture: Computing: Atmel AVR Atmega8 microcontroller; Sensors: accelerometers, gyroscopes, bend sensors; UI: none; Communication: I2C, Bluetooth

Location: feet; Form: shoes
Project: Footnotes: instrumented shoes to manipulate real-time interactive musical outputs (Paradiso, 2002)
Architecture: Computing: PIC 16C711 microcontroller; Sensors: accelerometer, gyroscope, force sensitive resistor, bend sensor; UI: none; Communication: RF transmission (not specified)

Location: feet; Form: shoes
Project: Shoe-mouse: instrumented shoes as alternative (foot-controlled) input device (Ye et al., 2005)
Architecture: Computing: Atmel AT90S8515 microcontroller; Sensors: accelerometer, gyroscope, force sensitive resistor, bend sensor; UI: none; Communication: GPRS

Location: feet; Form: shoes
Project: GaitShoe: instrumented shoes for gait analysis (Bamberg et al., 2008)
Architecture: Computing: C8051F206 microcontroller; Sensors: accelerometer, gyroscope, force sensitive resistor, bend sensor, electric field sensor; UI: none; Communication: 916.50 MHz RF transmission (RF Monolithics DR3000-1)

Location: feet; Form: bootee
Project: instrumented infant shoe for wireless pulse oximetry monitoring (Weber et al., 2007)
Architecture: Computing: no information; Sensors: pulse oximeter, accelerometer; UI: none; Communication: sub-1 GHz RF transmission (Nordic nRF9E5)

Remote health monitoring. In remote health monitoring, wearable computers often measure physiological parameters and activities of patients in their daily life. For example, Rosso et al. (2010) used a T-shirt to monitor physiological parameters, focusing on elderly patients with chronic diseases such as chronic obstructive pulmonary disease (COPD) and chronic kidney disease (CKD). A bootee-integrated system for monitoring infants was presented by Weber et al. (2007). Garment-based wearable computers were applied in rehabilitation after medical treatment, for example, for cardiac and chronic respiration patients (Paradiso et al., 2008) and for applications in movement rehabilitation (Harms et al., 2008). Curone et al. (2010) equipped emergency-disaster personnel (e.g., firefighters, civil-protection authorities) with sensing and computing garments. Moreover, a few commercial products exist in this field, such as the LifeShirt system by VivoMetrics.*
Physical activity monitoring. Knight et al. (2005) described the design process of a vest-integrated system, used for monitoring school children's activities. Liu et al. (2008) used a pair of instrumented pants to assess the dynamic stability of motion-impaired elderly. Bamberg et al. (2008) presented an instrumented shoe for gait analysis outside of traditional motion analysis laboratories.
User interfaces. DeVaul et al. (2001) presented a wearable system called Memory Glasses, integrated in the lining of a zip-in vest, for context-aware reminder delivery. Toney et al. (2002) incorporated a wearable computer into a traditional business suit to control personal information management applications. An instrumented shoe was presented by Paradiso (2002) to manipulate real-time interactive musical outputs. Ye et al. (2005) described an alternative input device using a shoe-based system for impaired people who have difficulties using their hands for computer interaction. Kuroda et al. (2004) described a data glove used for sign language recognition.
* http://vivonoetics.com/products/sensors/lifeshirt/.


Other applications. Further applications of garment-based wearable computers include sports (e.g., sensor shirts manufactured by Hexoskin* or different commercially available shoe-based systems). Motion monitoring was addressed for entertainment (e.g., the Xsens MVN†) and for worker support during production and maintenance tasks, e.g., with a motion jacket (Stiefmeier et al., 2008). Cheng et al. (2010) presented a neck collar for
dietary behavior monitoring.
Clearly, such a widespread collection of application scenarios defines different system requirements. Nevertheless, most garment-based wearable computers include the
key components for sensing, computing, user interface, (wireless) data transmission,
and power supply. Table 22.1 lists all analyzed garment-based wearable computer projects. Detailed information on a system's architecture was not always available.
For example, many projects mentioned computing units but did not provide details
on memory or controller models. Power supply was omitted from the architecture
overview as all projects reported to use batteries.

22.2.1 Placement on Human Body


Most garment-based wearable computers in this analysis were placed on the torso.
Typical garments worn on the torso and thus serving as the basis of wearable systems
include shirts, vests, jackets, and underclothes. Garment-form factors for other
body areas include headbands, neckbands, gloves, wristlets, pants, and shoes. This
section analyzes garment-based wearable computer projects optimized for placement at different body locations. Figure 22.1 summarizes all included projects and
placements.
22.2.1.1 Head and Neck Region
Few projects for the head and neck region were found. Limited space, ergonomic considerations, and visibility may be the most challenging factors for wearable computers at the head and neck region. Kim et al. (2008) presented a headband that measures heart rate and uses an accelerometer for step detection, including a pulse oximeter at the forehead, a microcontroller for signal preprocessing, a ZigBee module for wireless data transfer, and a rechargeable battery. Cheng et al. (2010) introduced capacitive textile electrodes in a neckband, with an attached microprocessor and a ZigBee transceiver. As textile electrodes were used, an unobtrusive integration in scarves, ties, or collars was considered.
22.2.1.2 Torso Region
Garments on the torso provide a large substrate area and are centrally located regarding signaling distances and measurement requirements. Moreover, torso garments are close to the body's center of mass, thus supporting heavier system components such as batteries.
* http://www.hexoskin.com/.
† http://www.xsens.com/products/xsens-mvn/.


FIGURE 22.1 Garment-based wearable computer projects and their placement on the human body. (Body map: head and neck: [Kim2008] [Cheng2010]; torso: [Harms2008] [Rosso2010] [DeVaul2001] [Lukowicz2001] [Knight2005] [Stiefmeier2008] [Toney2002] [Noury2004] [DiRienzo2005] [Paradiso2008] [Curone2010]; arms and hands: [Kuroda2004] [Amma2013]; legs: [Liu2008]; feet: [Paradiso2002] [Ye2005] [Bamberg2008] [Weber2007].)

Wearable computers at the torso were typically distributed systems, utilizing the waist, chest, back, or shoulders for components. In most cases, sensing at the relevant body locations was separated from the main system components, such as batteries and computing. A prototypical example of a distributed torso-worn system was described by Harms et al. (2008). Their system was implemented into a loose-fitting long-sleeve shirt, hence covering the lower arms and wrist locations too. The wearable computer design was hierarchical, consisting of a central master, region-specific gateways, and outer peripherals (terminals) that provide sensing and I/O functionality. Essential system components, including the master, gateways, and wiring, were glued onto the inner side of the garment, while a replaceable battery was placed in a pocket. Each gateway provided standardized interfaces to different terminals, such as accelerometers, buttons, and LEDs. Harms et al. (2008) positioned four gateways to maintain a balanced distribution of terminals over the entire body and to limit wiring stretches to below 85 cm. Figure 22.2 shows an example shirt and the architecture schematic. Another shirt-based wearable system was presented by Rosso et al. (2010). They integrated a set of sensors (ECG electrodes, temperature sensors, accelerometers, etc.) into a T-shirt. In addition, an electronic module was attached to the T-shirt for data collection, analysis, and wireless transmission.

FIGURE 22.2 Example of a garment-based wearable computer: SMASH shirt-integrated sensing and computing infrastructure and system schematic. The schematic shows three levels: terminals (signal level: sampling, digitization, signal pre-processing), gateways (feature level: feature extraction and selection, sensor fusion), and the Konnex master (recognition level: classification), with feedback to the user and ambient services. (From Harms, H. et al., SMASH: A distributed sensing and processing garment for the classification of upper body postures, Proceedings of Third International Conference on Body Area Networks (BodyNets), ICST, Brussels, Belgium, 2008.)
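The layered signal-feature-recognition organization described for SMASH generalizes to many garment-based systems. The following minimal Python sketch is purely illustrative (it is not SMASH code, and the class and sensor names are invented for this example); it shows how raw sensor windows could flow from terminals through a gateway to a master classifier.

```python
"""Illustrative sketch (not SMASH code) of a hierarchical
signal -> feature -> recognition pipeline."""
from statistics import mean, pstdev

class Terminal:                      # signal level: sampling, pre-processing
    def __init__(self, name):
        self.name = name
    def sample(self, raw_window):
        return [x / 1024.0 for x in raw_window]   # e.g., normalize ADC counts

class Gateway:                       # feature level: extraction and fusion
    def __init__(self, terminals):
        self.terminals = terminals
    def features(self, raw_windows):
        feats = []
        for t, w in zip(self.terminals, raw_windows):
            sig = t.sample(w)
            feats += [mean(sig), pstdev(sig)]
        return feats

class Master:                        # recognition level: classification
    def classify(self, feature_vector):
        # stand-in rule; a real system would run a trained classifier
        return "active" if max(feature_vector) > 0.5 else "idle"

gw = Gateway([Terminal("upper_arm_acc"), Terminal("lower_arm_acc")])
print(Master().classify(gw.features([[300, 700, 650], [120, 130, 140]])))
```

In a real garment, the terminal and gateway stages would run on the distributed microcontrollers, so that only compact feature vectors travel over the body bus to the master.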

Vest-based designs were preferred in many early wearable computer projects for practical reasons, with the vest's pockets used to carry bulky components. Moreover, shirt designs were found cumbersome to put on and take off (Knight et al., 2005). DeVaul et al. (2001) described the MIThril system, included in a chassis that acts as a lining in a zip-in vest. The MIThril architecture included two computing cores, a multi-protocol body bus, a range of I2C-based sensors, and interface devices, such as a clip-on HMD and a Twiddler chording keyboard. Lukowicz et al. (2001) extended MIThril with a modular computing core called WearARM, supporting reconfigurable data processing and efficient power management. In Knight et al. (2005), vests were used to include sensors, core components, and output devices (LCD, LEDs). Similar to vest-based systems, jacket-based solutions allowed users fast dressing and undressing. Stiefmeier et al. (2008) described a motion jacket system, integrating inertial sensors and force-sensitive resistors together with processing and communication features. Toney et al. (2002) integrated components into a traditional business suit, including input and output interfaces connected to a PDA in the suit's inner pocket.

Underclothes were found particularly beneficial for physiological sensors that require direct skin contact. For example, Noury et al. (2004) developed a medical monitoring system integrated in an undercloth, including sensing components (such as ECG electrodes and temperature sensors), a fall detection module consisting of an accelerometer and a microcontroller, and wiring and interconnection busses. In addition, the system included a belt, containing processing and communication components and batteries, wired to the garment. Di Rienzo et al. (2005) presented an undercloth-based system including textile sensors for ECG and respiration. Another undercloth-based wearable computer was presented by Paradiso et al. (2008). The garment, together with nine textile electrodes for cardiopulmonary monitoring, was realized in one knitting procedure. Sensors were connected to an electronics module, placed in a pocket at the lower back of the cloth.


Curone et al. (2010) used three garment components: a T-shirt as inner garment, a jacket as outer garment, and a pair of boots. Each component was sensor-equipped for real-time monitoring of physiological, activity, and environmental parameters. The inner garment included textile sensors connected to an electronic module via textile conductive cables. The outer garment included additional sensors to measure, for example, external temperature, CO concentration, and absolute position, as well as a processing module for collecting and preprocessing data from different sensor nodes. Moreover, electronic communication and alarm modules were attached for sending data to an operations coordinator and providing visual and acoustic warnings when dangerous situations were detected. The boots included CO2 sensors and a ZigBee module.
22.2.1.3 Arms and Hands Region
While various glove-based devices were developed in past years (cf. the survey of Dipietro et al., 2008), many of the related reports focus on information processing rather than integration (e.g., Park et al., 2008), and thus were not analyzed. Kuroda et al. (2004) introduced a data glove, called StrinGlove, integrating 24 bend sensors for hand posture monitoring and 9 contact sensors to detect contact between fingertips. Data processing was performed by a signal processor unit mounted onto the glove. Amma et al. (2013) described a wearable computer in a thin glove and a wristlet. Their system consisted of a wireless data glove with inertial sensors at the hand's back, and a microcontroller, Bluetooth module, and power supply at the wristlet. Besides the wearable computer, their system included an external module for intensive data processing tasks.
Some glove designs only addressed sensing or data input, such as hand tracking with a color glove (Wang and Popovic, 2009). Other glove-based projects are either commercial developments (e.g., the MoCap Glove from CyberGlove Systems, http://www.cyberglovesystems.com/, or data gloves from 5DT, http://www.5dt.com/) or rely on one of the available products. A few garment-based wearable computers for the torso covered the upper limbs, too. For example, the long sleeve shirt of Harms et al. (2008) included terminals at the upper and lower arms. Glove-like systems without fabric integration, such as the SCURRY system (Kim et al., 2005), are covered as accessories in Section 22.3.
22.2.1.4 Legs Region
Liu et al. (2008) introduced trousers with printed circuit boards located at the hips and leg joints, including an inertial sensor, a microcontroller with I2C interface, and an A/D converter. In addition, bend sensors were attached at the trousers' knee locations. In other projects, trousers were often used as part of a larger system for sensing only, and thus are not discussed here.
22.2.1.5 Feet Region
Shoe-based wearable computers often included insole-integrated sensors, as well as
an attached module with additional sensors and system components for data processing, communication, and power supply. Typically, only basic data processing, for example, signal conditioning, was performed on the shoe wearable computer and
then data was sent to an external system for further processing. Foot-mounted system designs were often constrained by space, required robust wear-resistant attachments, and needed autonomous operation to avoid wiring from the foot/ankle to a trousers- or waist-mounted computing unit, for example. The feet region was chosen in applications of gait analysis or for using the feet as an input device.
Using insole sensors, dynamic pressure was frequently measured at the heel and great toe, as well as sole bending, for example, in Paradiso (2002). Bamberg et al. (2008) included a capacitive sensor in the insole to estimate foot height above floor level. Additional sensors, including accelerometers and gyroscopes, were included in the shoe-attached module. Paradiso (2002) placed the shoe-attached module at the shoe's side, Bamberg et al. (2008) used the shoe's back, while Ye et al. (2005) fully integrated the components inside the shoe. An infant shoe (bootee) for wireless monitoring of pulse oximetry was presented by Weber et al. (2007). Electronic components, including an oximetry module, RF transceiver, and power supply, were contained in a box integrated into the thick sole of the bootee.

22.3 ACCESSORY-BASED WEARABLE COMPUTERS


Application areas of accessory-based wearable computers largely overlap with
those of garment-based systems. Examples of remote health monitoring applications include the continuous medical monitoring and alert system of Anliker etal.
(2004) targeting high-risk cardiac and respiratory patients, and the multimodal
physiological monitoring device of Malhi etal. (2012). With the e-AR ear-worn
system, it was demonstrated how a similar design can be used for different applications: Wang etal. (2007) used the e-AR concept as a ubiquitous heart-rate monitoring device, while Jarchi etal. (2014) presented a gait analysis system. Examples
of accessory-based wearable computers for user interaction were described by
Tamaki etal. (2009) and Bulling etal. (2009). Kim etal. (2005) presented an input
system named SCURRY used either as a wearable finger mouse or as a wearable
keyboard.
A multifunctional wearable computer was presented in Amft etal. (2004), targeting medical aiding systems, mobile worker assistance, security and rescue applications, and sport exercise monitoring. The wearable autonomous microsystem of
Bharatula et al. (2004) was intended for different context-awareness applications.
Numerous applications were considered for smartwatches, from generic pager and
activity trackers to information appliances. IBM's early wristwatch computer was developed for personal information management applications (Narayanaswami et al., 2002). The eWatch of Maurer et al. (2006) served as a wearable sensing and notification platform, for example, for location recognition using audio and light sensor data. Wrist-worn devices are meanwhile widespread. A multitude of commercial smartwatches exist, including the Pebble (https://getpebble.com/) and Neptune Pine (http://getneptune.com/), that give access to internet-based services, gather data from other portable devices, or even serve as a simple smartphone replacement.
Table 22.2 lists the accessory-based wearable computer projects analyzed and further discussed later per body region. While we aimed at extracting most architectural information from the projects' publications, details were often missing. The power supply was omitted from the architecture overview since most systems used batteries, except for the solar cells in a sensor button system (Bharatula et al., 2004).

22.3.1 Placement on Human Body


While the torso region was preferred for garment-based wearable computer projects, accessory-based systems were preferably placed on the head, neck, and arms (cf. Figure 22.3). One important reason is that device size and weight do not exceed the perceived limits for these body regions. This section describes existing accessory-based wearable projects related to the different body areas.
22.3.1.1 Head and Neck Region
Ear-worn, hearing-aid-like devices are well suited as wearable computers integrated in accessories. The e-AR is a lightweight device (weighing less than 10 g) that can be attached onto the ear. While the e-AR was mostly considered without full integration into an accessory, we included this project as a prototypical example, as it provided insights into wearable computers in ear-worn accessories, such as earrings. The e-AR included signal amplifiers, a power supply, a wireless transceiver, and an application-dependent sensing modality. Wang et al. (2007) used the e-AR system for reflective PPG measurement, thus embedding multiple LEDs and photodiodes into the device. Jarchi et al. (2014) integrated a three-axis accelerometer for gait analysis. Recently, our research group combined head measurement approaches in a regular eyeglasses design (Amft et al., 2015). Figure 22.4 illustrates the eyeglasses design.
Tamaki et al. (2009) presented the Brainy Hand system, combining an ear-worn single-color camera for 3D hand gesture recognition, a laser line as a visual marker indicating the camera range, and an earphone for audio feedback. Matsushita (2001) presented a head-mounted peripheral device, integrating two microcontrollers and further components into a headset. The microcontrollers were used for audio and inertial signal processing and connected via a 1 Mbps serial data bus integrated within the headset. Bulling et al. (2009) used goggles as the accessory basis for their EOG-based eye tracker system, consisting of dry electrodes attached via steel springs to the goggles. Amplification electronics, an accelerometer sensor, and the electrodes were wired to a processing and communication unit in an upper arm pocket.
22.3.1.2 Torso Region
Amft etal. (2004) presented a belt-integrated wearable computer called QBIC, where
main system electronics running Linux had been integrated into the belt buckle,
including microcontroller, memory, and wireless interfaces. The belt itself was used
as extension bus and mechanical support for interfaces, consisting of two layers of
leather with a flex-print wiring system in between. Belt interfaces included a head-mounted display connector, as well as RS232 and USB ports. The QBIC system was used for different field studies, including daily routine monitoring and sports. The system is shown in Figure 22.5. Bharatula et al. (2004) presented a button concept as an autonomous microsystem, integrated in a button-like form. Sensors for light, sound, and acceleration, a microprocessor, and an RF transceiver were included, and a solar cell and lithium-polymer battery were considered for powering. The button could replace regular buttons in clothes at various locations.

TABLE 22.2
Accessory-Based Wearable Computer Projects Analyzed in This Review

Location: head and neck; Form: ear-worn device
Project: e-AR, ear-worn activity and heart rate monitoring device (Jarchi et al., 2014)
Architecture: Computing: Intel 8051 microcontroller; Sensors: accelerometer, PPG; UI: none; Communication: 2.4 GHz RF transmission (Nordic nRF24E1)

Location: head and neck; Form: ear-worn device
Project: Brainy Hand, ear-worn interaction device based on hand gestures (Tamaki et al., 2009)
Architecture: Computing: no information; Sensors: camera; UI: earphone, laser line; Communication: no information

Location: head and neck; Form: headset
Project: Headset-based wearable computer for context-aware applications (Matsushita, 2001)
Architecture: Computing: AT90S8515 microcontroller, PIC16LF877 microcontroller; Sensors: accelerometer, gyroscope; UI: speaker, microphone; Communication: Bluetooth

Location: head and neck; Form: goggles
Project: EOG-based wearable eye tracking system (Bulling et al., 2009)
Architecture: Computing: 16-bit dsPIC microcontroller (Microchip); Sensors: EOG, accelerometer, light sensor; UI: none; Communication: Bluetooth

Location: torso; Form: belt
Project: QBIC, belt-integrated wearable computing platform (Amft et al., 2004)
Architecture: Computing: XScale microcontroller (Intel PXA263B1C400); Sensors: none; UI: none; Communication: USB, PS/2, VGA, RS-232, Bluetooth

Location: torso; Form: button
Project: Autonomous microsystem for context-aware applications (Bharatula et al., 2004)
Architecture: Computing: CoolRisc 88 microcontroller; Sensors: accelerometer, light sensor, microphone; UI: none; Communication: 868 MHz RF transmission

Location: arm; Form: arm-band
Project: MP3 player realized as a textile arm-band (Lee et al., 2010)
Architecture: Computing: no information; Sensors: none; UI: buttons, LED array; Communication: no information

Location: wrist; Form: watch
Project: Linux Watch, smartwatch for, e.g., personal information management applications (Narayanaswami et al., 2002)
Architecture: Computing: ARM7 microcontroller (Cirrus Logic EP 7211); Sensors: none; UI: OLED touch display, roller wheel; Communication: Bluetooth, infrared

Location: wrist; Form: watch
Project: eWatch, smartwatch as a wearable sensing, notification, and computing platform (Maurer et al., 2006)
Architecture: Computing: ARM7TDMI microcontroller (Philips LPC2106); Sensors: accelerometer, temperature sensor, light sensor, microphone; UI: buttons, LCD display, buzzer, vibrating motor; Communication: Bluetooth, infrared

Location: wrist; Form: bracelet-like device
Project: AMON, remote health monitoring and alert system (Anliker et al., 2004)
Architecture: Computing: ARM7TDMI microcontroller (Atmel AT91R40807); Sensors: ECG, pulse oximeter, blood pressure system, accelerometer; UI: buttons, LCD display; Communication: GSM

Location: wrist; Form: bracelet-like device
Project: Remote health monitoring system (Malhi et al., 2012)
Architecture: Computing: C8051F020 microcontroller; Sensors: HR monitor, temperature sensor, accelerometer; UI: none; Communication: ZigBee

Location: hand and finger; Form: glove-like device
Project: SCURRY, hand-worn input device (Kim et al., 2005)
Architecture: Computing: PIC microcontroller; Sensors: accelerometers, gyroscopes; UI: none; Communication: 2.4 GHz RF transmission

Location: hand and finger; Form: ring
Project: Finger-worn device for remote health monitoring (Asada et al., 2003)
Architecture: Computing: PIC microcontroller; Sensors: PPG sensor; UI: none; Communication: 915 MHz RF transmission


FIGURE 22.3 Accessory-based wearable computer projects and their placement on the human body: head and neck (Jarchi et al., 2014; Tamaki et al., 2009; Matsushita, 2001; Bulling et al., 2009), torso (Amft et al., 2004; Bharatula et al., 2004), arm (Lee et al., 2010), wrist (Narayanaswami et al., 2002; Maurer et al., 2006; Anliker et al., 2004; Malhi et al., 2012), and hand and finger (Kim et al., 2005; Asada et al., 2003).

FIGURE 22.4 Example of an accessory-based wearable computer recently developed by our group: various activity and physiology measurements integrated into regular eyeglasses, combining a light sensor, an optical heart rate sensor, frame-integrated wiring, and a microcontroller, wireless interface, inertial sensors, and battery (top view). (Amft et al., IEEE Perv. Comput., 2015, in press.)

22.3.1.3 Arm Region


Lee et al. (2010) presented an upper-arm-worn wearable computer in which the music player function was integrated into a textile using direct chip integration, implemented as a sports armband. Two screen-printed layers of fabric patches were used. The first layer included a microcontroller, an MP3 decoder, an SD memory socket, and an earphone socket. The second layer was used to connect the power and ground lines.

FIGURE 22.5 Example of accessory-based wearable computer: QBIC belt-integrated computer and system schematic, showing the QBIC buckle, connector ports, belt-internal battery, and external battery. (From Amft, O. et al., Design of the QBIC wearable computing platform, Proceedings of 15th IEEE International Conference on Application-Specific Systems, Architectures, and Processors (ASAP), Galveston, TX, 2004, pp. 398–410.)
22.3.1.4 Wrist Region
Smartwatches are the predominant and commercially successful wearable computers at the wrist in recent years. Their success may be attributed to the reuse of an established accessory location: the user's wrist has hosted watches for a long time. The location is easily accessible and is used for visual expressions of fashion or trendiness. Narayanaswami et al. (2002) introduced the IBM Linux Watch, one of the earliest smartwatches. The Linux Watch was a complete computer running Linux, displaying X11 graphics, and with wireless connectivity. The system consisted of a main board with the processor, a communications board including a Bluetooth module, and a display board including an OLED display. Maurer et al. (2006) presented the eWatch device, a wearable sensing, notification, and computing platform built into a wristwatch form factor. eWatch provided tactile, audio, and visual notification while monitoring light, motion, sound, and temperature. Besides sensors, user interface, and computing components, eWatch used Bluetooth to communicate with a cellphone or a stationary computer. Anliker et al. (2004) presented a bracelet-like device for continuous monitoring and evaluation of multiple physiological parameters, including blood pressure, ECG, and oxygen saturation. The enclosure included all sensors, processing and communication modules, power supply, and user interface. Medical emergency detection was performed on the device, and analyzed data was sent to a medical center unit. Another bracelet-based system was presented by Malhi et al. (2012), integrating sensors into the device to measure temperature and heart rate and aiming at detecting falls.
22.3.1.5 Hand and Finger Region
The human hand was also considered as a location for accessory-based wearable computers. Kim et al. (2005) described a glove-like device called SCURRY, composed of a base module and four ring-type modules containing sensors, communication, and microcontroller components. The base module included two gyroscopes on the back of the hand, while the ring modules included two-axis accelerometers. Asada et al. (2003) considered a finger-worn ring accessory and integrated a PPG sensor, a microcontroller for LED modulation, data acquisition, filtering, and RF communication, and an RF transmitter. All components were encapsulated within the ring in a compact body and powered by a tiny cell battery of the type used in wristwatches.
22.3.1.6 Legs and Feet Region
No accessory-based wearable computers were found for the lower limbs region. Most wearable computer designs that target the lower limbs are integrated into trousers or shoes and are thus considered in Sections 22.2.1.4 and 22.2.1.5.

22.4 LESSONS LEARNED AND BEST PRACTICES


Various technical challenges must be considered when designing and integrating garment-based and accessory-based wearable computers, including aspects related to safety, ergonomics, social acceptance, reliability, powering, and cost. In this section, technical challenges are discussed and lessons learned from the analyzed projects are summarized. We identified and categorized challenges based on (1) considerations from individual investigations found in the projects and (2) grouping of identified best practices. Table 22.3 provides an overview of lessons learned and best practices.
Most technical challenges can be directly linked to a seamless integration of wearable computers into daily life situations, where a user's outfit, system availability, and wearing comfort matter. Consequently, some best practices overlap between garment-based and accessory-based systems and are discussed here in parallel. Differences between the two variants mainly originate from garment-based systems being mostly distributed and placed on a textile substrate, while accessory-based systems are encapsulated in small packages and placed at accessory locations on exposed body parts. Consequences of these differences will also be analyzed within this section. Most of the analyzed projects addressed selected technical challenges, centered on wearability and sensor functions while neglecting cost and safety considerations. More extensive design and implementation reviews were gathered for garment-based systems by Toney et al. (2002), Knight et al. (2005), and Harms et al. (2008), and for accessory-based systems by Amft et al. (2004).

22.4.1 User Interface
As they expose a system's functionality through I/O modalities, user interfaces are key to the user experience. Multifunctional wearable computers typically support a range of peripherals, including audio and visual outputs, touch surfaces, keys, and buttons. Peripherals shall be easily accessible, interrupting users only at suitable moments, yet remain unobtrusive. DeVaul et al. (2001) stated that user interfaces shall maximize the provided information value while minimizing the physical and cognitive burdens imposed by accessing it.


TABLE 22.3
Technical Challenges of Wearable Computer Integration: Summary of Lessons Learned and Best Practices

User interface:
- Buttons and touch interfaces are easily deployable input options in garments and require low cognitive load.
- Multimodal controls and alternative options are deployed for information output, providing users with adequate feedback depending on the situation.

Sensing modalities:
- Electrode-based physiological measurements can be replaced by conductive textile solutions using embroidery and knitting. Most physiological signals can be acquired in garments and accessories.
- Current microsystems and sensors can be integrated in small accessories and garments. Errors in inertial and orientation estimates are expected when measuring at loose-fitting garments.
- Sensor redundancy, using multiple signal channels or sensing at different body locations, can compensate for a low signal-to-noise ratio.

Data and power:
- Sensors/peripherals in garments communicate to a central unit using low-bandwidth, low-power wired busses as commonly found in embedded systems, e.g., I2C and SPI.
- High-channel garment-electronics interfaces are necessary to realize detachable electronics.

Wearability:
- Flexible components in garments should be flat and may consume relatively large surface area, but require similar properties to their fabric substrate, such as breathability and stretchability.
- Rigid components should be placed on the trunk, at locations of minimal shape change when moving, e.g., the upper chest, the upper back/shoulder region, and the hips. The heaviest components may be placed near the body's center of mass.
- Textile electronics technology provides many solutions for integration, including textile electrodes, elongation sensors, wires, etc.
- Accessory-based wearable computers need a trade-off between sensor placement and signal quality, where wearability often directs toward placements with a lower signal-to-noise ratio.

Social acceptance and aesthetics:
- Providing usage options and system variants improves compatibility with different clothes and wearer preferences.
- Aesthetic design can spur market success. Technology may be visible.
- When systems cannot be invisibly integrated, providing indications of being unused, e.g., during conversations, may favor social acceptance.

Robustness and reliability:
- Component protection from environmental stressors (heat, dust, liquid, etc.) is often accomplished by gluing components or by a detachable design.
- Considerations are required to protect wiring and connectors to withstand long-term mechanical stress due to body motion, in both garment-based and accessory-based systems.
- Selected materials shall have low toxicity, low chemical reactivity, and be thermally stable.

Extensibility:
- Modular and hierarchical system architectures provide extensibility for multipurpose wearable computers, e.g., to add new sensors or peripherals without modifying main system components.

Cost:
- Off-the-shelf reusable components can minimize material cost. However, careful evaluations in the target environment are needed when prior expertise and performance data are missing.
- Market margins depend on business segments rather than functionality. In present consumer applications, wearable computers may add only a small premium to the garment/accessory.

Safety:
- Minimize the physical presence and cognitive load imposed by the system.
- Safe wiring, contact insulation, and skin-friendly material choice are essential.
- Context awareness could be used to manage cognitive load.
- Prototypes should be evaluated during long-term, realistic trials with nonexpert users.

On-body computing:
- Distributed on-body processing reduces transmission and storage needs.
- Heterogeneous architectures, combining general-purpose and special-purpose subsystems, could deal with application-specific processing needs.
- Using a standard operating system expedites application development and abstracts hardware, e.g., across different application-specific implementations.

Toney et al. (2002) evaluated input options regarding cognitive load and external perception. Their investigations clearly favor garment-integrated hidden buttons, as these do not lead to a loss of eye contact in social situations when accessed and impose limited cognitive load on the wearer. The authors implemented capacitive touch sensors using embroidered metal threads inside the hems and cuffs of a suit jacket. The touch-sensitive buttons and sliders could be used to inconspicuously operate personal information management applications with common gestures when sitting and standing. Buttons of different designs were frequently used in many other garment-based systems too (Paradiso et al., 2005; Harms et al., 2008; Lee et al., 2010). Textile-integrated buttons could be constructed via multilayer fabrics, as in the MP3 player of Lee et al. (2010). As an example of a more versatile wearable computer, the MIThril architecture supported various input devices, including audio via microphone and text entry via Twiddler and Palm folding keyboards.
While Maurer et al. (2006) used push buttons on their smartwatch, Narayanaswami et al. (2002) and others deliberately designed input interfaces without buttons, motivating this decision with ease of use and a more elegant appearance. The IBM Linux Watch included a touch screen and a roller wheel. Due to display size limits, only four quadrants with a potential fifth zone at the display's center were used. Instead, the roller wheel was used to navigate lists, corresponding to a middle mouse wheel.


While touch and button interfaces dominate among input options, a trend toward multimodal output controls is observable for information feedback. Toney et al. (2002) investigated output options regarding cognitive load and external perception too, and implemented LEDs in suit cuffs to indicate the priority of available information. Furthermore, they considered tactile feedback (a pager motor) sewn into the jacket shoulder providing different vibration patterns, and a watch computer with a programmable LCD for short messages. Toney et al. (2002) considered that sound output required context information to enable it only when appropriate. In the scenario of Curone et al. (2010), garments included an alarm module for visual and acoustic warnings in emergencies. Moreover, user warnings are key in medical monitoring applications. LEDs and buzzers were used in various implementations to give immediate feedback (Paradiso et al., 2005; Harms et al., 2008). For physical activity monitoring, Knight et al. (2005) used LEDs indicating pulse measurements and an LCD display to show data.
In the eWatch, user notifications were delivered via a 128 × 64 pixel LCD display, an LED, a vibrating motor, and a tone-generating buzzer (Maurer et al., 2006). For the IBM Linux Watch, an OLED display was preferred over an LCD for readability, lower power consumption than a backlit LCD, and aesthetics. Yellow was chosen for the OLED due to its higher contrast compared with blue or green. Narayanaswami et al. (2002) found no significant difference whether graphics were presented in landscape or portrait format, but, for example, for a phone book the landscape mode was preferred as fewer lines with more characters per line are easier to read. For particular applications, however, rather specific feedback options are well conceivable. For example, Tamaki et al. (2009) used an earphone for audio feedback and a laser line/mini-projector to indicate the camera view in their ear-worn system.

22.4.2 Sensing Modalities
Sensors are a key asset of wearable computers, which not only process direct user input but also gather information from measurable phenomena. Various sensing modalities were integrated in garment-based and accessory-based wearable computers. Table 22.4 summarizes signals, modalities, and integration approaches used in the included projects.
Sensors measuring physiological parameters were frequently found in underclothes as they require direct skin contact or skin proximity. Sensors to measure motion, activity, and environmental parameters could be placed on outer garments or accessories too, as long as a garment is tight-fitting (Harms et al., 2008). In garment-based systems, sensors were frequently attached (glued, sewn, fixed with Velcro straps) or integrated as yarn into the fabric itself, such as for textile electrodes. Since we review wearable computer projects, sensor modalities and integration techniques are certainly not exhaustive. Nevertheless, the projects included here provide a useful summary of established sensing approaches.
TABLE 22.4
Overview of Sensing Modalities Found in the Wearable Computer Projects
(Each row lists the type of signal; the sensing approach; and the type and integration of the sensor. Reference letters are resolved below the table.)

- ECG, heart rate | ECG leads: precordial, Einthoven, and Wilson | Conductive fabric electrodes realized with knitting within underclothes (a,b)
- ECG, heart rate | One-lead ECG with three electrodes | Gold electrodes integrated into a wrist-worn device (c)
- ECG, heart rate | Two electrodes at thorax level or four according to Einthoven | Conductive fabric electrodes woven into underclothes (d,e,f)
- Heart rate | Reflective PPG measured at finger/forehead/ear | PPG sensor integrated into a finger ring (g)/headband (h)/earpiece (i)
- Heart rate, blood oxygen saturation | Transmissive PPG measured at the infant's foot | Pulse oximeter integrated into a bootee (j)
- Blood oxygen saturation | Reflectance sensor placed on top of the wrist/next to the left vertebral ribs | Pulse oximeter integrated into a wrist-worn device (c)/attached to the internal side of a shirt (d)
- Blood pressure | Standard oscillation method; the wrist and its vasculature are compressed | Encircling inflatable compression cuff (pump and valve) integrated into a wrist-worn device (c)
- Respiration rate, respiration volume | Inductive measurement of the abdomen and/or thorax volume displacement | Inductive coil(s) on the inside of an inner garment, woven (d,e) or embroidered (f)
- Respiration rate | Piezoresistive sensors at thoracic and abdomen locations | Fabric piezoresistive sensors realized with knitting within underclothes (a)
- Respiration rate | Impedance pneumography: four electrodes placed at thoracic position | Conductive fabric electrodes realized with knitting within underclothes (a)
- Body temperature | Temperature probe (deduce core body temperature from the external temperature of the skin) | Temperature sensor placed at the armpit (d)/attached to a bracelet ensuring skin contact (k)
- Blink detection, eye movement | EOG-based eye tracking | EOG electrodes integrated into goggles (l)
- Chewing and swallowing detection | Active capacitive sensing principle | Four conductive textile-based electrodes sewed in between textile layers of a neckband (m)
- Cough and snoring detection | Near-body audio signal analysis | Microphone placed externally onto a shirt, over the sternum (d)
- Head movement | Active capacitive sensing principle | Four conductive textile-based electrodes sewed in between textile layers of a neckband (m)
- Upper body movement | Inertial measurement of each upper body part | IMUs attached to a jacket (n)
- Elbow and shoulder joint movement | Piezoresistive sensors at elbow and shoulder joints | Fabric piezoresistive sensors realized with knitting in the sleeves of underclothes (a)
- Lower arm muscle activity | Detection of muscle contraction with force-sensitive resistors (FSRs) | Array of FSRs integrated into a strap worn around the lower arm (n)
- Hand and finger movement | Inertial measurement of hand and fingers | Accelerometers and gyroscopes integrated into a wrist-worn (o)/hand-worn device (p)
- Finger flexure | Magnetic bend sensors | Bend sensors integrated into the fabric of a glove (q)
- Contact between fingertips | Magnetic contact sensors | Magnetic coils integrated into the fabric of a glove (q)
- Lower body movement | Inertial measurement of each lower body joint | Accelerometer/gyroscope sewed into pants at joint locations (r)
- Knee bending | Piezoelectric bend strips | Piezoelectric bend sensors sewed to the inside of pants at the knee location (r)
- Foot movement | Inertial measurement of the foot | Accelerometer/gyroscope attached to the shoe (s,t,u,v)
- Dynamic pressure of the heel and toes | Force-sensitive resistors in front of the shoe and under the heel | Piezoelectric strips integrated within a shoe insole (t,u,v)
- Flexion at the metatarsal-phalangeal joint | Bidirectional resistive bend sensor under the foot | Bend sensor integrated within a shoe insole (t,u,v)
- Plantar flexion and dorsiflexion | Vertical bidirectional resistive bend sensor at the back of the foot/ankle | Bend sensor attached to the back of the shoe (t)
- Overall body movement, fall detection, gait analysis | Measurement of torso/arm acceleration | Accelerometer attached to the body-worn computing unit (w), placed over the sternum onto a shirt (x), or integrated into a button (y)/earpiece (z)/wrist-worn device (aa)
- Height of the foot above floor | Estimation of the elevation of the foot via capacitive loading from the floor | Capacitive electric field sensor integrated into a shoe insole (t)
- Relative position to an object | Reference ultra-wideband transmitters placed around the object and tags on the wearer | Ultra-wideband tags attached onto a jacket's shoulder region (ab)
- Absolute position | GPS | GPS module attached to an outer garment (ac) or electronic module (ad)
- Ambient temperature | Temperature probe | Temperature sensor placed on an outer garment (x,ac,ad)/integrated into a wrist-worn device (o)
- Ambient humidity | Humidity sensor | Humidity sensor placed externally onto a shirt (x)
- Ambient light | Photodiode light sensor | Light sensor integrated into a wrist-worn device (o)/button (y)
- Ambient sound | Microphone | Microphone integrated into a wrist-worn device (o)/button (y)
- CO/CO2 concentration | Potentiometric measuring principle | CO/CO2 sensor attached to an outer garment (ac)

References: (a) Paradiso et al. (2005); (b) Paradiso et al. (2008); (c) Anliker et al. (2004); (d) Rosso et al. (2010); (e) Di Rienzo et al. (2005); (f) Noury et al. (2004); (g) Asada et al. (2003); (h) Kim et al. (2008); (i) Wang et al. (2007); (j) Weber et al. (2007); (k) Malhi et al. (2012); (l) Bulling et al. (2009); (m) Cheng et al. (2010); (n) Stiefmeier et al. (2008); (o) Maurer et al. (2006); (p) Kim et al. (2005); (q) Kuroda et al. (2004); (r) Liu et al. (2008); (s) Weber et al. (2007); (t) Bamberg et al. (2008); (u) Paradiso (2002); (v) Ye et al. (2005); (w) Di Rienzo et al. (2005); (x) Rosso et al. (2010); (y) Bharatula et al. (2004); (z) Jarchi et al. (2014); (aa) Anliker et al. (2004); (ab) Stiefmeier et al. (2008); (ac) Curone et al. (2010); (ad) Noury et al. (2004).

Paradiso et al. (2005) knitted sensors into garments, including fabric ECG electrodes using stainless-steel yarn, and elongation sensors using piezoresistive yarns for respiration measurement (as expansion and contraction of the thoracic and abdominal regions) and for activity monitoring in sleeves. They used knitted electrodes at the thoracic region for impedance pneumography too. Textile-woven ECG sensors from conductive fibers were integrated at thorax level in the vest of Di Rienzo et al. (2005). The vest included a textile transducer for measuring thorax volume changes, thus assessing respiratory frequency. Rosso et al. (2010) also used textile electrodes, but woven wires for the inductive measurement of abdomen and thorax volume change. Cheng et al. (2010) deployed active capacitive sensing utilizing conductive textile electrodes to assess volume changes at the neck. They found that conductive fabrics are flexible and can be easily cut and sewn into a neckband textile.
In accessories, ECG and respiration were often measured using dry electrodes. Anliker et al. (2004) integrated a one-lead ECG using three integrated gold electrodes, besides cuff-based blood pressure measurement, in their wrist-worn system. Reflective pulse oximetry was implemented on the wrist top. In the ring wearable computer of Asada et al. (2003), reflective pulse oximetry was deployed, where the ring ensured a proper level of pressure and optically shielded the sensor unit. Rosso et al. (2010) placed the reflection sensor on the inner side of their T-shirt to measure oxygen saturation. Pulse measurements at the fingertip or wrist are easily distorted by hand movement. Some projects considered alternative placements that may be considered comfortable and socially acceptable in particular applications, for example, a headband to measure at the forehead (Kim et al., 2008). To measure body temperature, placements on a garment's inside (Rosso et al., 2010) and in a bracelet (Malhi et al., 2012) were considered. Skin temperature at the wrist is affected by environmental conditions, however, such that its correlation with the core body temperature is rather low (Anliker et al., 2004). Bulling et al. (2009) measured the EOG with dry electrodes integrated into a goggles frame and compensated motion artifacts using an accelerometer.
Inertial sensors are widely used to monitor physical activity in garment-based and accessory-based systems. Examples include step and fall detection (Kim et al., 2008; Rosso et al., 2010), upper body posture and motion capture (Harms et al., 2008; Stiefmeier et al., 2008), finger motion (Kim et al., 2005), and physical activity monitoring (Anliker et al., 2004). Besides inertial sensors, shoe-based wearable systems typically include pressure and bend sensors directly integrated into insoles, such as piezoelectric strips (Paradiso, 2002; Bamberg et al., 2008) and bidirectional bend and capacitive sensors (Bamberg et al., 2008). Additional sensor modalities found in wearable computers include GPS and CO and CO2 concentration (Curone et al., 2010), ultra-wideband radar (Stiefmeier et al., 2008), and ambient temperature and humidity (Rosso et al., 2010).
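Many of the modalities in Table 22.4, such as PPG for heart rate or piezoresistive elongation sensing for respiration, reduce to estimating a periodic rate from a noisy one-dimensional signal. The following minimal sketch, written for illustration with a synthetic signal, shows the basic idea of smoothing, threshold crossing, and refractory-period counting; the sampling rate, threshold, and window values are assumptions, and deployed systems use considerably more robust filtering and artifact handling.

```python
# Minimal sketch of rate estimation (e.g., heart or respiration rate) from a sampled
# one-dimensional sensor signal. The synthetic input and all parameters below are
# illustrative assumptions.
import numpy as np

def estimate_rate(signal, fs, threshold=0.5, smooth_s=0.2, refractory_s=0.4):
    """Smooth the signal, count rising threshold crossings, return events per minute."""
    win = max(1, int(smooth_s * fs))
    smoothed = np.convolve(signal, np.ones(win) / win, mode="same")
    above = smoothed > threshold
    rising = np.flatnonzero(~above[:-1] & above[1:])   # rising-edge sample indices
    min_gap, kept, last = int(refractory_s * fs), 0, -10**9
    for idx in rising:
        if idx - last >= min_gap:   # refractory period avoids double counting one peak
            kept += 1
            last = idx
    return kept / (len(signal) / fs / 60.0)

if __name__ == "__main__":
    fs = 50.0                        # assumed sampling rate in Hz
    t = np.arange(0, 30, 1 / fs)     # 30 s analysis window
    rng = np.random.default_rng(0)
    pulse = np.sin(2 * np.pi * 1.2 * t)                # 1.2 Hz, i.e., 72 events/min
    noisy = pulse + 0.1 * rng.standard_normal(t.size)
    # Prints an estimate close to 72 per minute for this synthetic input.
    print("estimated rate: %.1f per minute" % estimate_rate(noisy, fs))
```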

22.4.3 Data and Power


In garment-based wearable computers, communication of sensors or peripherals to a central computing unit was often accomplished using low-bandwidth and low-power connections commonly found in embedded systems, such as the two-line I2C bus. I2C is extensible regarding communicating nodes (Noury et al., 2004) and the addition of custom nodes (DeVaul et al., 2001). Alternatives include other typical embedded systems busses, such as the four-wire SPI. Harms et al. (2008) connected components via SPI in a redundant star topology to compensate for line breakages in the textile. Curone et al. (2010) used RS485 to connect components in the inner and outer garments to a central master and to distribute power. While USB provides higher bandwidth, it is more complex to implement and thus remains an option for multipurpose wearable computers connecting off-the-shelf components, such as USB cameras, drives, or microphones, as in the QBIC system (Amft et al., 2004). Wireless communication was used to transmit data from the wearable system to remote processing units where necessary. The most common protocols are Bluetooth (Amma et al., 2013) and ZigBee (Kim et al., 2008).
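To make the role of such low-bandwidth busses concrete, the sketch below frames a single accelerometer sample as it might travel from a garment node to the central unit. The nine-byte packet layout, field order, and checksum are illustrative assumptions and do not reproduce the wire format of any system reviewed here.

```python
# Minimal sketch of framing a sensor sample for a low-bandwidth on-garment bus
# (I2C/SPI/RS485-style master polling). The packet layout and the simple additive
# checksum are illustrative assumptions, not a protocol used by the reviewed systems.
import struct

def pack_sample(node_addr, seq, ax, ay, az):
    """Pack a 3-axis accelerometer sample: address, sequence, 3 x int16, checksum."""
    body = struct.pack("<BB3h", node_addr, seq & 0xFF, ax, ay, az)
    checksum = sum(body) & 0xFF
    return body + bytes([checksum])

def unpack_sample(frame):
    body, checksum = frame[:-1], frame[-1]
    if sum(body) & 0xFF != checksum:
        raise ValueError("checksum mismatch, drop frame")
    node_addr, seq, ax, ay, az = struct.unpack("<BB3h", body)
    return {"node": node_addr, "seq": seq, "accel": (ax, ay, az)}

if __name__ == "__main__":
    frame = pack_sample(node_addr=0x12, seq=7, ax=120, ay=-340, az=1010)
    print(len(frame), "bytes on the bus:", frame.hex())
    print(unpack_sample(frame))
```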
Combined data and power distribution provides efficient system solutions, for example, using an RS485 bus (Curone et al., 2010). Multipurpose wearable computer implementations often provide a dedicated data and power infrastructure. Amft et al. (2004) integrated data and power lines in a flex-print inside the belt of the QBIC system. Peripherals could access the lines at different connection points. The MIThril body bus (DeVaul et al., 2001) used a branching single-connection power/data bus. While early garment-based wearable systems distributed data and power via cables routed on the textile and fixed, for example, by Velcro straps, more recent garments use fabric-integrated wiring, for example, by embroidery (Noury et al., 2004). Lee et al. (2010) realized power and ground lines as separate layers. Recently, the development of textile-electronics connectors has gained interest, as more data lines need to be connected. For example, Curone et al. (2010) used nine-pin and four-pin connectors, both composed of a washable garment-integrated part and a nonwashable electronics attachment part.


22.4.4 Wearability
Wearability and ergonomics have been key challenges since the first wearable computers, involving system size, shape, and weight (Gemperle et al., 1998; Knight and Deen-Williams, 2006). Wearing a garment-based or accessory-based wearable computer shall not alter the usual perception of a garment or accessory, nor the wearer's usual posture and movement. Comfortable wearing, easy setup, and maintenance are concurrent requirements. Ideally, a user puts the system on in the morning, takes it off in the evening, and completely forgets about it in between, except when actively using its functionality (DeVaul et al., 2001).
Knight et al. (2005) suggested that anthropometric data should be considered and gathered design choices for smart garments: (1) components shall be flat but may consume relatively large surface area (cf. the recommendations of Gemperle et al., 1998) and (2) components should be located on the trunk at locations of minimal shape change when bending or moving. According to Knight et al. (2005), appropriate body regions include the upper chest, the upper back/shoulder region, and the hips. They reported that their initial shirt design was impractical during dressing and stripping, and thus changed to a vest design that closes on the chest. Bharatula et al. (2004) suggested that body areas often in contact with objects, for example, the underside of arms and hands, the back, the bottom, and the feet, should only contain seamlessly integrated components within the fabric's thickness. In a similar vein, Paradiso (2002) noted that components can be placed at the shoe's outer side without constraining common movement. Textile electrodes, elongation sensors, and other embroidered or knitted transducers, wires, etc., are suitable components for comfortable wearable computers; however, textile components require similar properties to their fabric substrate, such as breathability and stretchability.
Functional yarns have been integrated into the fabric structure of garment-based systems, for example, to acquire ECG and respiration. However, garments cannot be entirely constructed from conductive fabric, as conductive regions may be too rigid and uncomfortable. Hence, metal threads are usually twisted around standard textile yarn. The amount of metal in a fabric is a compromise between the required electrical properties and the necessity to maintain mechanical behavior compatible with textile applications (Paradiso and Pacelli, 2011). Lukowicz et al. (2001) targeted a garment that is soft on the inside for comfortable wear and rigid on the outside for robustness, for example, to protect system components. They experimented with different material combinations. However, soft fabric garments may be difficult to put on and take off, as they do not maintain their shape and hold sensors (Kuroda et al., 2004).
Compared to garment-based wearable computers, the miniaturization and integration of wearable computers in common accessories still appear as open challenges. Current projects often made trade-offs toward a wearable design. Anliker et al. (2004) integrated various vital monitoring functions into their wrist-worn AMON system, thus easing integration into everyday life activities. However, this all-in-one design disfavors optimal sensor positioning. For example, acquiring ECG became a hard challenge for the AMON system compared to chest-worn solutions. A trade-off between wearability and signal quality is frequently found in wearable computers. For example, in EOG goggles, signal amplification and A/D conversion are ideally performed at the electrodes, while weight and size considerations resulted in wiring the EOG electrodes to an upper arm processing unit. Due to the longer wires, the analog high-impedance EOG signals pick up an increased amount of noise (Bulling et al., 2009). The ring-based system of Asada et al. (2003) impressively shows miniaturization; however, it still remains larger than typical finger rings. Besides size, weight remains a critical design consideration. DeVaul et al. (2001) reported a total weight of ~2 kg for the MIThril system. They addressed weight distribution using a zip-in vest design that balanced weight between the shoulders. While overall system size decreased in recent systems, weight is determined by powering needs, hence the battery. Noury et al. (2004) placed larger and heavier system components, including battery, computing, and communication units, separately from the garment on a belt. Amft et al. (2004) used the belt to integrate the complete wearable computer, weighing ~350 g. Matsushita's (2001) headset system weighed 220 g, and the system of Bulling et al. (2009) was 188 g, with the goggles weighing only 60 g. To minimize energy needs, dynamic context-aware power management appears to be an important topic for further research.
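As a rough illustration of what dynamic, context-aware power management could look like, the sketch below adapts the sampling profile of an inertial sensor to a coarse activity estimate. The profiles, thresholds, and current-draw numbers are invented for illustration and are not measurements from any of the reviewed systems.

```python
# Minimal sketch of context-aware power management: adapt the sampling rate of a
# body-worn inertial sensor to the detected activity level. All rates, thresholds,
# and current figures are illustrative assumptions.

PROFILES = {
    "rest":     {"rate_hz": 5,   "current_ma": 0.4},
    "moderate": {"rate_hz": 25,  "current_ma": 1.5},
    "vigorous": {"rate_hz": 100, "current_ma": 4.0},
}

def classify_activity(accel_std_g):
    """Map the standard deviation of recent acceleration (in g) to an activity level."""
    if accel_std_g < 0.05:
        return "rest"
    if accel_std_g < 0.3:
        return "moderate"
    return "vigorous"

def schedule(accel_std_history):
    """Return the chosen profile per window and the resulting average current."""
    chosen = [PROFILES[classify_activity(s)] for s in accel_std_history]
    avg_ma = sum(p["current_ma"] for p in chosen) / len(chosen)
    return chosen, avg_ma

if __name__ == "__main__":
    # One value per 10 s window: a mostly sedentary hour with a short walk.
    history = [0.02] * 300 + [0.2] * 50 + [0.02] * 10
    _, avg_ma = schedule(history)
    print("average current: %.2f mA (vs. %.1f mA if always at 100 Hz)"
          % (avg_ma, PROFILES["vigorous"]["current_ma"]))
```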

22.4.5 Social Acceptance and Aesthetics


Aesthetically pleasing design may be key to the consumer market success of accessory-based wearable computers, as demonstrated by recent commercial smartwatches. Similar features may help garment-based systems become successful too, by hiding privacy-sensitive and technical details, such as alarm buttons and electrodes, while promoting comfort and commodity functions.
The overall objective for wearable computer aesthetics is a seamless integration of all system components with the wearer's everyday outfit. For example, DeVaul et al. (2001) presented a zip-in vest design that is compatible with a wide range of outerwear options, from shirts to suits. Amft et al. (2004) created a variety of belts with different colors and textures; the buckle can be detached from the belt and transferred between different belt types. Knight et al. (2005) preferred a vest over a shirt, since a longer vest provided social comfort to the wearer. They further used zippers instead of Velcro straps to avoid gaps when closing the vest.
An alternative integration strategy, alleviating fashion- and taste-dependent design considerations, may be to integrate garment-based systems in underclothes (Noury et al., 2004). By contrast, accessory-based systems, such as wristwatches (Maurer et al., 2006) or finger rings (Asada et al., 2003), are continuously exposed and are therefore substantially affected by aesthetic considerations.
Social acceptance of wearable computers clearly addresses factors beyond physical presence. Toney et al. (2002) describe the social weight of an item or technology as "the measure of the degradation of social interaction that occurs between the user and other people caused by the use of that item or technology." Evidently, a piece of technology that is not noticeable to other people has an inherently low social weight. Wearable computers that incur social weight could provide indications of being unused during social interactions, that is, to avoid distracting conversation partners. Further factors, such as technology apprehension or the cognitive load induced by interacting with the wearable system, should be considered as well when designing wearable computers (Toney et al., 2002).


22.4.6 Robustness and Reliability


As part of a user's everyday outfit, wearable computers are exposed to severe stress, including different environmental conditions, moisture, sweat, cleaning, and activity-related mechanical stress and vibrations. These stressors affect both garment-based and accessory-based systems.
An established strategy is to design sensitive electronic units to be removable from garments. For example, Paradiso et al. (2008) made underclothes washable by detaching the portable patient unit. Cleaning is simplified when using recent textile conductive sensors. Di Rienzo et al. (2005) developed a fully washable vest. Harms et al. (2008) glued components onto the inside of their SMASH shirt, to sustain wearing and cleaning. They used silicone gel for gluing, which has various benefits for garment-based systems: low toxicity, electrical insulation, thermal stability, and low chemical reactivity (resistant to oxygen and ozone). Figure 22.6 illustrates the steps for integrating electronics in the SMASH shirt. Rigid areas improved component protection in clothing (Lukowicz et al., 2001). Knight et al. (2005) used 100% polyester twill for their vest, chosen for its strength, durability, and ease of washing. Moreover, they used nylon webbing for required straps, due to its strength and nonslip characteristics.
Accessory-based systems require similar robustness considerations. In the QBIC mechanical design, Amft et al. (2004) used a tight buckle closing and sealed housings to prevent dust or liquid from reaching electronic components. To withstand strong forces, the belt side of the buckle consists of a stainless steel section.
Wiring and connectors should be specifically considered and suitable methods chosen, for example, for strain relief, contact sealing, and parts reliability assessment. In particular, as garments and accessories are constantly flexing during use, continuous mechanical stress could cause rapid fatigue and thus lead to failures. DeVaul et al. (2001) reported that a complete re-engineering of their body-bus system was necessary due to connector failure after less than a year of use. The QBIC system (Amft et al., 2004) was used in various daily routine and sports monitoring studies. The flex-print belt connections to the buckle showed cracks in signal lines after only a few weeks of use and thus needed continuous replacement and repair.

FIGURE 22.6 Example of integration process using silicone depositing (a) and final result (b) of the SMASH shirt. (Images courtesy of Holger Harms.)

FIGURE 22.7 Example of wearout effects due to mechanical and chemical stress in the flex-print (a) and connector (b) of the QBIC belt computer. (Private images of the authors.)

Figure 22.7 shows examples of the line cracks leading to data and power connection
failures. The connectors were subsequently tinned to reduce the effect.
Foot-worn systems are particularly affected by mechanical stress due to foot
impact shocks and high accelerations during movement, requiring components to be
well attached or latched down. Paradiso (2002) covered the electronics board with a
protective Plexiglas shield. Sensors placed in insoles should be protected from abrasion, moisture, etc., which can be addressed by sealing and placing sensor insoles
beneath a regular insole in shoes (Paradiso, 2002).

22.4.7 Extensibility
Aiming at multipurpose wearable computers, DeVaul et al. (2001) suggested that systems should support the widest range of users and applications, which requires physical and functional reconfiguration options by design. In many projects, the system design was optimized for a particular purpose, hence reducing the need for extensibility. For multipurpose garment-based systems, extensibility primarily addresses adding further electronic components. Reconfiguration of the textile-integrated functions was not sufficiently considered, seemingly due to missing base-fabric technology.
The garment designs of Harms et al. (2008) and Curone et al. (2010) follow a modular architecture, where extensions such as adding sensors do not require modifying the central computing module. The vest design of Knight et al. (2005) included two originally unused pockets, intended to house future additions. Harms et al. (2008) deployed a hierarchical architecture consisting of three layers: terminals, gateways, and a central master, where gateways provide interfaces for sensors and peripherals. Hubs could extend the terminal count at a gateway to 127; thus the system can be equipped with ~500 terminals in total.
Among the accessory-based wearable computers, Amft et al. (2004) addressed extensibility by providing access to the QBIC system bus inside the belt for additional peripheral devices. In addition, the buckle contained main and extension boards. The latter included peripheral and wireless interfaces that could be replaced depending on the application and a specific belt configuration.
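On the software side, the modular extensibility described above can be mirrored by a small registry in which new sensor drivers announce themselves, so the central acquisition loop never changes. The sketch below is a hypothetical illustration; the class and function names do not correspond to any of the reviewed platforms.

```python
# Minimal sketch of an extensible sensor registry in the spirit of modular garment
# architectures: new sensor drivers register themselves, and the central loop stays
# unchanged. All names here are hypothetical illustrations.

SENSOR_REGISTRY = {}

def register_sensor(name):
    def decorator(cls):
        SENSOR_REGISTRY[name] = cls
        return cls
    return decorator

class Sensor:
    def read(self):
        raise NotImplementedError

@register_sensor("accelerometer")
class FakeAccelerometer(Sensor):
    def read(self):
        return {"ax": 0.01, "ay": -0.02, "az": 0.98}

@register_sensor("temperature")
class FakeTemperature(Sensor):
    def read(self):
        return {"celsius": 33.1}

def central_master_cycle(enabled):
    """Poll whatever sensors the current garment configuration enables."""
    return {name: SENSOR_REGISTRY[name]().read() for name in enabled}

if __name__ == "__main__":
    # Adding a new modality only requires another @register_sensor class.
    print(central_master_cycle(["accelerometer", "temperature"]))
```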


22.4.8 Cost
Except for smartwatches and activity trackers, current garment-based and accessory-based wearable computers must yet be seen as niche products. Market price, and hence production cost, is a key concern for adoption, in particular for mass-market garment and accessory products. Among the included projects, only a few considered cost aspects. A common approach was to build only on off-the-shelf components and to minimize the total component count.
Toney et al. (2002) estimated that their suit could be mass-produced at $17 to $20 for the integrated electronics. Wearable computers provide a rich design space to choose implementation options for cost-sensitive designs; however, they require dedicated component evaluations in the targeted environment or application when prior experience and performance data are missing. For example, Knight et al. (2005) considered several alternatives for heart-rate monitoring and selected an insert microphone and pressure bulb as these were the cheapest options. However, the components needed to be replaced with more expensive alternatives due to motion artifact sensitivity.
Current garment-based wearable computers are frequently found in niches, where higher market prices can be established. One main issue is the special production process required for textile-integrated components. Accessory-based systems may be less affected by production and technology-related risks, which enabled vendors to successfully promote smartwatches.

22.4.9 Safety
As new functionality is added, wearable computers are disruptive to classical uses of garments and accessories. While reported cases of accidents while wearing custom wearable computers exist, for example, DeVaul et al. (2001), current safety considerations are still immature. Since wearable computers include electronic modules, wiring, batteries, etc., fundamental electrical safety considerations shall be applied and possibly extended for the needs of wearable systems in the future.
Key factors affecting safety include the physical presence of the system and the cognitive load imposed on the wearer. Bulky and rigid designs affect physical presence and should be avoided. Similarly, wiring inside the outfit and contact insulation are essential for safe handling and to prevent failures. For example, Knight et al. (2005) sewed wire channels into their vest to pass leads through. Matsushita (2001) lowered the Bluetooth radio power of their headset system to reduce microwave irradiation into the user's head.
Cognitive load is best addressed by user interfaces that minimize disruptive interruptions and attention demands during operation. Interruption moments could be determined from processing of user context information. To further reduce safety concerns, wearable system prototypes and their user interfaces should be evaluated during long-term, realistic trials with nonexperts wearing the system in their daily routine.

22.4.10 On-Body Computing
In the headset-based system of Matsushita (2001), step detection was directly performed on the system. The AMON wrist-worn device analyzed measurements online, including filtering signals, converting measured values into medical values, for example, for blood pressure and RR distance, and performing automated evaluations for emergency detection (Anliker et al., 2004). The QBIC system was used to run the CRNT streaming framework to recognize various activities (Bannach et al., 2008). Rosso et al. (2010) used decision-tree algorithms on a PDA attached to their sensor-embedded T-shirt to recognize worsening conditions and provide immediate wearer feedback, while computationally intensive algorithms were executed remotely. Curone et al. (2010) distributed signal processing and information extraction at the sensor level in their outer garment.
Distributed processing and information extraction in wearable computers reduce overall computational complexity and the amount of data to be communicated. Lukowicz et al. (2001) observed that many computationally intensive tasks are application-specific, for example, signal or image processing, for which general-purpose processors are not optimally suited. Their WearARM design consisted of a heterogeneous, distributed architecture with general-purpose and special-purpose subsystems; the latter included low-power DSPs that perform such computations in a fraction of a general-purpose processor's time. Harms et al. (2008) distributed tasks onto a hierarchical network of different garment-integrated nodes: terminals equipped with an 8-bit microprocessor preprocessed and translated sensor signals, gateways equipped with a 16-bit microcontroller concentrated data from several attached terminals and extracted features, and, finally, a central master unit processed the feature data for online classification using a nearest centroid classifier on a 16-bit microcontroller.
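To make the final classification step concrete: a nearest centroid classifier only compares the incoming feature vector against a small table of stored class centroids, which is why it fits comfortably on a 16-bit microcontroller. The following integer-only sketch is a generic illustration under assumed feature dimensions and scaling, not the implementation used in SMASH:

#include <stdint.h>

#define N_FEATURES 8
#define N_CLASSES  4

/* Illustrative centroids, e.g. learned offline and stored in flash.
 * Feature values are assumed to be scaled to roughly +/-1000 so that the
 * 32-bit distance accumulator below cannot overflow. */
static const int16_t centroids[N_CLASSES][N_FEATURES] = {
    {  12, 340,  -5,  90,   0,  15, 200,  -3 },
    { 410,  22,  80, -60,  35, 120,  10,   7 },
    {  -8, -90, 500, 210, 150,  60,  45, 330 },
    { 250, 250, 250, 250, 250, 250, 250, 250 },
};

/* Return the class whose centroid has the smallest squared Euclidean
 * distance to the feature vector. No floating point is required. */
uint8_t nearest_centroid(const int16_t features[N_FEATURES])
{
    uint8_t best_class = 0;
    uint32_t best_dist = UINT32_MAX;

    for (uint8_t c = 0; c < N_CLASSES; c++) {
        uint32_t dist = 0;
        for (uint8_t f = 0; f < N_FEATURES; f++) {
            int32_t d = (int32_t)features[f] - centroids[c][f];
            dist += (uint32_t)(d * d);
        }
        if (dist < best_dist) {
            best_dist = dist;
            best_class = c;
        }
    }
    return best_class;
}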
A few accessory-based wearable computers used operating systems to expedite application development, abstract the hardware, and reuse existing libraries and data-processing frameworks. Both IBM's smartwatch and the QBIC belt system ran GNU/Linux, and the EOG goggles ran FreeRTOS. For garment-based wearable computers, the design and implementation of a dedicated operating system, called GarmentOS, was proposed by Cheng et al. (2013).

22.5 FUTURE DIRECTIONS IN WEARABLE COMPUTING


Over the past 30 years, wearable computer technology has made profound progress, resulting in recent market successes. While the introduction of smartphones could be considered a game-changing breakthrough for wearable computers, it primarily helped advance individual technical challenges, including sensing, user interfaces, and on-body computing. Our review concentrated on projects that integrate electronics into functional garments and accessories, in order to assess progress toward the invisible-computing paradigm for wearable systems. This section summarizes our conclusions and future research directions in wearable computing.
Market potential and sustainability. Current commercial wearable devices, including smartwatches and activity trackers, show low sustainability. According to a 2014 Endeavour Partners report (http://endeavourpartners.net/white-papers/), one-third of American consumers who have owned a wearable product stopped using it within 6 months. Reasons cited include the lack of actionable feedback for users: while many systems display measurement data, concrete lifestyle-related recommendations are missing. A more fundamental reason may be the lack of thorough validation and performance reporting for many marketed devices. Robust evaluation standards are needed against which wearable systems can be tested, and the resulting reports should be made publicly accessible. Based on such device performance data, relevant feedback and actuation could be designed for both novel garment-based and accessory-based wearable computers.
Garment-based wearable computers. The application field for garment-based systems is cluttered with many highly specialized designs and functions that address individual niches. Individual applications, however, lack the volume needed for textile mass production and hence incur high manufacturing cost. One central approach, investigated by the recent European project SimpleSkin (http://www.simpleskin.org), is to standardize textile functionality similar to general-purpose electronics and to reconfigure textile functions via software. The concept builds on an operating system for garments, called GarmentOS (Cheng et al., 2013), which provides a hardware abstraction for application developers similar to that of smartphone apps. Such garment apps are envisioned to reconfigure many components of the wearable computer, including the textile sensing functions, data communication, and others. If successful, the approach could enable textile manufacturers to mass-produce fabric that suits many applications of garment-based wearable computers.
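To illustrate what such a hardware abstraction could look like, the interface sketch below shows how a garment app might open, reconfigure, and read a textile sensing region without knowing the underlying fabric technology. All names and functions here are invented for illustration; they are not the actual GarmentOS interface described by Cheng et al. (2013):

#include <stddef.h>
#include <stdint.h>

/* Hypothetical handle to a reconfigurable textile sensing region. */
typedef struct garment_sensor garment_sensor_t;

/* Sensing modes a fabric region might be switched into by software. */
typedef enum { TEXTILE_PRESSURE, TEXTILE_STRETCH, TEXTILE_CAPACITIVE } textile_mode_t;

/* Illustrative API: the same mass-produced fabric could be reused by
 * different garment apps simply by reconfiguring regions in software. */
garment_sensor_t *garment_open(const char *region_name);
int  garment_configure(garment_sensor_t *s, textile_mode_t mode, uint16_t rate_hz);
int  garment_read(garment_sensor_t *s, int16_t *buf, size_t n_samples);
void garment_close(garment_sensor_t *s);

Under such an abstraction, an app for posture monitoring and an app for touch input could target the same garment, differing only in how they configure and interpret the regions.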
Accessory-based wearable computers. As our analysis showed, integrating wearable computers into real-life accessories is particularly challenging due to space constraints and the miniaturization required of electronics and power supplies. For devices such as rings and necklaces, standard electronic components may still be too large to achieve perfect integration. Given these persisting limitations, stick-on devices, including skin-attached plasters, may gain further momentum; for example, MC10 (http://www.mc10inc.com) markets some of the first stick-on sensor plasters for physiological monitoring.
User interaction. Currently, button and touch interfaces are the predominant interaction and control approaches for wearable computing. Minimizing the cognitive burden of a context switch (from or to interacting with the wearable computer) remains a challenge. As Google Glass and similar developments show, processing context information can help to identify relevant information as well as sensible moments at which to interrupt users.
Sensing and on-body computing. While the performance of textile-integrated sensors gradually improves, the fundamental limitation remains that wearable computers obtain noisy measurements due to imperfect sensor placement or measurement conditions. Recent work has shown that adding sensor channels and modalities increases the available information and the potential for artifact compensation, and wearable computers provide the processing resources needed for novel artifact-handling algorithms. Future wearable computers require more elaborate context-analysis functionality to differentiate customer offerings and increase user value.
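One generic way to exploit such redundancy, shown here as a minimal sketch rather than a method taken from the cited works, is to fuse channels that measure the same quantity with weights inversely proportional to their recent variance, so that a channel currently corrupted by motion artifacts contributes less:

#define N_CH 3   /* redundant channels measuring the same quantity */

/* Inverse-variance weighting: each channel's sample is weighted by the
 * reciprocal of its running variance estimate (var[i] > 0), so noisy,
 * artifact-laden channels are automatically de-emphasized. */
float fuse_channels(const float sample[N_CH], const float var[N_CH])
{
    float acc = 0.0f, wsum = 0.0f;
    for (int i = 0; i < N_CH; i++) {
        float w = 1.0f / var[i];
        acc  += w * sample[i];
        wsum += w;
    }
    return acc / wsum;
}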
Power management. Our review has shown that current wearable computers generally rely on rechargeable batteries. As energy-harvesting solutions gain momentum, future developments may combine batteries and harvesters to extend battery lifetime; however, further development of harvesting solutions is needed before wearable computers can eventually be powered continuously. Wearable computer power consumption could also be reduced by context-aware dynamic power management, that is, by scaling computing, peripherals, and sensor operation according to the present situation and trends.
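A minimal sketch of such context-aware duty cycling is given below; the user states and sampling rates are illustrative assumptions, but the principle is simply to let the recognized situation decide how hard the sensing subsystem has to work:

#include <stdint.h>

typedef enum { USER_SLEEPING, USER_SEDENTARY, USER_ACTIVE } user_state_t;

/* Choose an accelerometer sampling rate from the recognized user state.
 * Lower rates let the sensor, processor, and radio sleep longer. */
uint16_t select_sample_rate_hz(user_state_t state)
{
    switch (state) {
    case USER_SLEEPING:  return 1;    /* coarse monitoring only        */
    case USER_SEDENTARY: return 10;   /* enough to detect transitions  */
    case USER_ACTIVE:    return 50;   /* full-rate activity tracking   */
    default:             return 10;
    }
}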
Wearability, extensibility, cost. Evaluating wearable computer systems is still a time- and effort-intensive undertaking, and development costs may rise when prior performance data for a given device in a particular application are lacking. Novel rapid-prototyping approaches are needed to expedite the evaluation process; recent work on joint sensor and garment simulation can provide performance estimates before systems are actually implemented (Harms et al., 2012).
Robustness and safety. For accessory-based systems, sealing the device can minimize chemical stress, and careful mechanical design can help withstand the forces of everyday use. For garment-based systems, continuous wearing and cleaning cycles remain a critical issue. Detachable electronics can mitigate the cost implications of more frequent garment replacement; however, they require robust dense-channel connectors in the garment, and further research in connection technology is needed to find and standardize connector designs. Similarly, safety standards still need to be established for wearable computers, including handling instructions on how to operate the garment and its electronics.
Social acceptance and aesthetics. While the overall acceptance of body-worn technology has increased in recent years, wearable computers are still not an established part of our daily outfit. Given the continuous progress being made, it seems conceivable that real wearable computers will enter evaluation stages within the next few years and thus continue the momentum created by current carry-on and stick-on devices.

ACKNOWLEDGMENT
The authors are thankful to Holger Harms for providing images. This work was
partially supported by the collaborative project SimpleSkin under contract with the
European Commission (#323849) in the FP7 FET Open framework.

REFERENCES
Amft, O., M. Lauffer, S. Ossevoort, F. Macaluso, P. Lukowicz, and G. Tröster (2004). Design of the QBIC wearable computing platform. In Proceedings of 15th IEEE International Conference on Application-Specific Systems, Architectures, and Processors (ASAP), Galveston, TX, pp. 398–410.


Amft, O., F. Wahl, S. Ishimaru, and K. Kunze (2015). Making regular eyeglasses smart. IEEE Pervasive Computing, in press.
Amma, C., M. Georgi, and T. Schultz (2013). Airwriting: A wearable handwriting recognition system. Personal and Ubiquitous Computing 18 (1), 191–203.
Anliker, U., J. A. Ward, P. Lukowicz, G. Tröster, F. Dolveck, M. Baer, F. Keita et al. (2004). AMON: A wearable multiparameter medical monitoring and alert system. IEEE Transactions on Information Technology in Biomedicine 8 (4), 415–427.
Asada, H. H., P. Shaltis, A. Reisner, S. Rhee, and R. C. Hutchinson (2003). Mobile monitoring with wearable photoplethysmographic biosensors. IEEE Engineering in Medicine and Biology Magazine 22 (3), 28–40.
Bamberg, S. J. M., A. Y. Benbasat, D. M. Scarborough, D. E. Krebs, and J. A. Paradiso (2008). Gait analysis using a shoe-integrated wireless sensor system. IEEE Transactions on Information Technology in Biomedicine 12 (4), 413–423.
Bannach, D., O. Amft, and P. Lukowicz (2008). Rapid prototyping of activity recognition applications. IEEE Pervasive Computing 7 (2), 22–31.
Bharatula, N. B., S. Ossevoort, M. Stäger, and G. Tröster (2004). Towards wearable autonomous microsystems. In Proceedings of Second International Conference on Pervasive Computing (PERVASIVE), Vienna, Austria, pp. 225–237.
Bulling, A., D. Roggen, and G. Tröster (2009). Wearable EOG goggles: Seamless sensing and context-awareness in everyday environments. Journal of Ambient Intelligence and Smart Environments 1 (2), 157–171.
Cheng, J., O. Amft, and P. Lukowicz (2010). Active capacitive sensing: Exploring a new wearable sensing modality for activity recognition. In Proceedings of Eighth International Conference on Pervasive Computing (PERVASIVE), Helsinki, Finland, pp. 319–336.
Cheng, J., P. Lukowicz, N. Henze, A. Schmidt, O. Amft, G. A. Salvatore, and G. Tröster (2013). Smart textiles: From niche to mainstream. IEEE Pervasive Computing 12 (3), 81–84.
Curone, D., E. L. Secco, A. Tognetti, G. Loriga, G. Dudnik, M. Risatti, R. Whyte, A. Bonfiglio, and G. Magenes (2010). Smart garments for emergency operators: The ProeTEX project. IEEE Transactions on Information Technology in Biomedicine 14 (3), 694–701.
DeVaul, R. W., S. J. Schwartz, and A. Pentland (2001). MIThril: Context-aware computing for daily life. Technical report, MIT Media Lab, Cambridge, MA.
Di Rienzo, M., F. Rizzo, G. Parati, G. Brambilla, M. Ferratini, and P. Castiglioni (2005). MagIC system: A new textile-based wearable device for biological signal monitoring. Applicability in daily life and clinical setting. In Proceedings of 27th Annual International IEEE EMBS Conference, Shanghai, China, pp. 7167–7169.
Dipietro, L., A. M. Sabatini, and P. Dario (2008). A survey of glove-based systems and their applications. IEEE Transactions on Systems, Man, and Cybernetics 38 (4), 461–482.
Gemperle, F., C. Kasabach, J. Stivoric, M. Bauer, and R. Martin (1998). Design for wearability. In Proceedings of Second International Symposium on Wearable Computers (ISWC), Pittsburgh, PA, pp. 116–122.
Harms, H., O. Amft, D. Roggen, and G. Tröster (2008). SMASH: A distributed sensing and processing garment for the classification of upper body postures. In Proceedings of Third International Conference on Body Area Networks (BodyNets), ICST, Brussels, Belgium.
Harms, H., O. Amft, and G. Tröster (2012). Does loose fitting matter? Predicting sensor performance in smart garments. In Proceedings of Seventh International Conference on Body Area Networks (BodyNets), ICST, Brussels, Belgium, pp. 1–4.
Jarchi, D., B. Lo, E. Ieong, D. Nathwani, and G.-Z. Yang (2014). Validation of the e-AR sensor for gait event detection using the Parotec foot insole with application to post-operative recovery monitoring. In Proceedings of 11th International Conference on Wearable and Implantable Body Sensor Networks (BSN), Zurich, Switzerland, pp. 127–131.


Kim, S. H., D. W. Ryoo, and C. Bae (2008). U-healthcare system using smart headband. In Proceedings of 30th Annual International IEEE EMBS Conference, Vancouver, British Columbia, Canada, pp. 1557–1560.
Kim, Y. S., B. S. Soh, and S.-G. Lee (2005). A new wearable input device: SCURRY. IEEE Transactions on Industrial Electronics 52 (6), 1490–1499.
Knight, J. and D. Deen-Williams (2006). Assessing the wearability of wearable computers. In Proceedings of 10th IEEE International Symposium on Wearable Computers (ISWC), Montreux, Switzerland, pp. 75–82.
Knight, J. F., A. Schwirtz, F. Psomadelis, C. Baber, H. W. Bristow, and T. N. Arvanitis (2005). The design of the SensVest. Personal and Ubiquitous Computing 9 (1), 6–19.
Kuroda, T., Y. Tabata, A. Goto, H. Ikuta, and M. Murakami (2004). Consumer price data-glove for sign language recognition. In Proceedings of Fifth International Conference on Disability, Virtual Reality and Associated Technologies (ICDVRAT), Oxford, U.K., pp. 253–258.
Lee, S., B. Kim, T. Roh, S. Hong, and H.-J. Yoo (2010). Arm-band type textile-MP3 player with multi-layer Planar Fashionable Circuit Board (P-FCB) techniques. In Proceedings of 14th IEEE International Symposium on Wearable Computers (ISWC), Seoul, South Korea, pp. 1–7.
Liu, J., T. E. Lockhart, M. Jones, and T. Martin (2008). Local dynamic stability assessment of motion impaired elderly using electronic textile pants. IEEE Transactions on Automation Science and Engineering 5 (4), 696–702.
Lizzy (1993). Lizzy: MIT's wearable computer design. http://www.media.mit.edu/wearables/lizzy/lizzy/index.html. Last accessed: July 22, 2014.
Lukowicz, P., U. Anliker, G. Tröster, S. J. Schwartz, and R. W. DeVaul (2001). The WearARM modular, low-power computing core. IEEE Micro 21 (3), 16–28.
Malhi, K., S. C. Mukhopadhyay, J. Schnepper, M. Haefke, and H. Ewald (2012). A Zigbee-based wearable physiological parameters monitoring system. IEEE Sensors Journal 12 (3), 423–430.
Matias, E., I. S. MacKenzie, and W. Buxton (1994). Half-QWERTY: Typing with one hand using your two-handed skills. In Companion Proceedings of the Conference on Human Factors in Computing Systems (CHI), Boston, MA, pp. 51–52.
Matsushita, S. (2001). A headset-based minimized wearable computer. IEEE Intelligent Systems 16 (3), 28–32.
Maurer, U., A. Rowe, A. Smailagic, and D. P. Siewiorek (2006). eWatch: A wearable sensor and notification platform. In Proceedings of International Workshop on Wearable and Implantable Body Sensor Networks (BSN), Cambridge, MA, pp. 142–145.
Narayanaswami, C., N. Kamijoh, M. Raghunath, T. Inoue, T. Cipolla, J. Sanford, E. Schlig et al. (2002). IBM's Linux watch: The challenge of miniaturization. IEEE Computer 35 (1), 33–41.
Noury, N., A. Dittmar, C. Corroy, R. Baghai, J. Weber, D. Blanc, F. Klefstat, A. Blinovska, S. Vaysse, and B. Comet (2004). A smart cloth for ambulatory telemonitoring of physiological parameters and activity: The VTAMN project. In Proceedings of Sixth International Workshop on Enterprise Networking and Computing in Healthcare Industry (Healthcom), Odawara-shi, Japan, pp. 155–160.
Paradiso, J. A. (2002). Footnotes: Personal reflections on the development of instrumented dance shoes and their musical applications. In Proceedings of International Conference on New Interfaces for Musical Expression (NIME), Dublin, Ireland, pp. 34–49.
Paradiso, R., A. Alonso, D. Cianflone, A. Milsis, T. Vavouras, and C. Malliopoulos (2008). Remote health monitoring with wearable non-invasive mobile system: The HealthWear project. In Proceedings of 30th Annual International IEEE EMBS Conference, Vancouver, British Columbia, Canada, pp. 1699–1702.


Paradiso, R., G. Loriga, and N. Taccini (2005). A wearable health care system based on knitted integrated sensors. IEEE Transactions on Information Technology in Biomedicine 9 (3), 337–344.
Paradiso, R. and M. Pacelli (2011). Textile electrodes and integrated smart textile for reliable biomonitoring. In Proceedings of 33rd Annual International IEEE EMBS Conference, Boston, MA, pp. 3274–3277.
Park, I.-K., J.-H. Kim, and K.-S. Hong (2008). An implementation of an FPGA-based embedded gesture recognizer using a data glove. In Proceedings of Second International Conference on Ubiquitous Information Management and Communication (ICUIMC), Suwon, Korea, pp. 496–500.
Rosso, R., G. Munaro, O. Salvetti, S. Colantonio, and F. Ciancitto (2010). CHRONIOUS: An open, ubiquitous and adaptive chronic disease management platform for chronic obstructive pulmonary disease (COPD), chronic kidney disease (CKD) and renal insufficiency. In Proceedings of 32nd Annual International IEEE EMBS Conference, Buenos Aires, Argentina, pp. 6850–6853.
Starner, T. (1993). The cyborgs are coming, or, the real personal computers. Technical Report TR 318, MIT. Written for Wired (unpublished). Obsolete by: TR 355, Feb. 1994; original text Nov. 1993; images June 1995.
Stiefmeier, T., D. Roggen, G. Ogris, P. Lukowicz, and G. Tröster (2008). Wearable activity tracking in car manufacturing. IEEE Pervasive Computing 7 (2), 42–50.
Tamaki, E., T. Miyaki, and J. Rekimoto (2009). Brainy hand: An ear-worn hand gesture interaction device. In CHI Extended Abstracts, Boston, MA, pp. 4255–4260.
Thorp, E. O. (1998). The invention of the first wearable computer. In Proceedings of Second International Symposium on Wearable Computers (ISWC), Pittsburgh, PA, pp. 4–8.
Toney, A., B. Mulley, B. H. Thomas, and W. Piekarski (2002). Minimal social weight user interactions for wearable computers in business suits. In Proceedings of Sixth IEEE International Symposium on Wearable Computers (ISWC), Seattle, WA, pp. 57–64.
Wang, L., B. Lo, and G.-Z. Yang (2007). Reflective photoplethysmograph earpiece sensor for ubiquitous heart rate monitoring. In Proceedings of Fourth International Workshop on Wearable and Implantable Body Sensor Networks (BSN), Aachen, Germany, pp. 179–183.
Wang, R. Y. and J. Popovic (2009). Real-time hand-tracking with a color glove. ACM Transactions on Graphics (SIGGRAPH 2009) 28 (3).
Weber, J.-L., Y. Rimet, E. Mallet, D. Ronayette, C. Rambaud, C. Terlaud, Y. Brusquet et al. (2007). Evaluation of a new, wireless pulse oximetry monitoring system in infants: The BBA bootee. In Proceedings of Fourth International Workshop on Wearable and Implantable Body Sensor Networks (BSN), Aachen, Germany, pp. 143–148.
Weiser, M. (1991). The computer for the 21st century. Scientific American International Edition 265 (3), 66–75.
Wilhelm, F. H., W. T. Roth, and M. A. Sackner (2003). The LifeShirt: An advanced system for ambulatory measurement of respiratory and cardiac function. Behavior Modification 27 (5), 671–691. DOI: 10.1177/0145445503256321.
Ye, W., Y. Xu, and K. K. Lee (2005). Shoe-Mouse: An integrated intelligent shoe. In Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Edmonton, Canada, pp. 1163–1167.

23 E-Textiles in the Apparel Factory: Leveraging Cut-and-Sew Technology toward the Next Generation of Smart Garments

Lucy E. Dunne, Cory Simon, and Guido Gioberto

CONTENTS
23.1 Introduction................................................................................................... 620
23.2 Background: Textile Integration Strategies................................................... 621
23.2.1 Advanced Electronic-Textile Manufacturing.................................... 621
23.2.2 Surface Attachment and Conductor Integration................................ 621
23.2.3 PCB Attachment and Encapsulation................................................. 621
23.3 Background: Stitching Technologies............................................................. 623
23.3.1 Single-Needle Machines.................................................................... 623
23.3.2 Multineedle Machines....................................................................... 624
23.3.3 CNC Machines.................................................................................. 625
23.4 Routing Interconnects in Garments............................................................... 627
23.4.1 Pattern and Marker Layout................................................................ 627
23.4.2 Production Design and Order of Operations..................................... 628
23.4.3 Seam Crossing Methods.................................................................... 629
23.4.4 Trace Crossings.................................................................................. 631
23.5 Component Attachment................................................................................. 632
23.5.1 Through-Hole Components and Crimped Methods.......................... 632
23.5.2 Surface-Mount Components and Reflow Methods............................ 633
23.6 Textile Components....................................................................................... 634
23.6.1 Stitched Stretch and Bend Sensors.................................................... 634
23.6.2 Sensor Insulation and Durability....................................................... 636
23.7 Conclusion..................................................................................................... 637
References............................................................................................................... 638


23.1 INTRODUCTION
The last 15–20 years have seen many significant advances in the development of on-body technologies for sensing, interface, and communication. The state of the art in areas like human–computer interaction, signal processing, and context awareness has progressed dramatically since the first crude prototypes. However, most of this progress has been driven primarily by the engineering community. Perhaps consequently, the development of manufacturing techniques has similarly focused on leveraging techniques and technologies well established in the production of small electronic
devices. Aside from being a well-known and often convenient method of producing
technology, in many ways this is a useful focus: electronic components and circuits
benefit significantly from the structure of hard goods. Durability and reliability are
dramatically improved when the electronic part of a system is well-structured and
insulated, protected from wear and tear as well as moisture and other contaminants.
Unfortunately, many of the qualities of hard goods that benefit electronic technologies are at odds with qualities that promote human comfort. Further, a large number
of body-worn devices can require excessive upkeep, maintenance, and memory on the
part of the user. On the other hand, reliance on encapsulating electronics in a single
unit significantly limits the body areas that can be accessed by a wearable technology,
and forces things like sensors to be confined to a very limited number of physical locations. In an application like activity monitoring, the number and specificity of activities
that can be sensed using a single measurement point (like the torso) is far smaller than
what can be achieved using sensors scattered over the body surface (including limbs,
etc.). Integrating electronics into garments can allow body access without requiring
the tending of a large number of body-worn units, and things like power and networking can be simplified. Finally, many exciting applications of wearable technology lie
in the realm of augmenting the functions of clothing, rather than distributing the functions of communications and information technologies over the body.
All of these factors lead to interesting potential for garment-integrated technologies.
However, perhaps one of the largest barriers to successful garment integration of electronics is manufacture. While the electronics industry is highly automated, with most
assembly being done by machine, the apparel industry is still a very traditional industry that relies heavily on manual labor. Due to the extremely short product cycle (with
new products introduced at intervals of 3 months or shorter), it is an industry often
without the ability to invest in larger-scale changes to processes, skill sets, or timelines.
However, there remains significant potential to work within the existing structures, technologies, and methods used in apparel production. In many ways, adapting
the existing infrastructure to the production of smart garments may have distinct
benefits, especially in the near term. Here, we discuss some of the basic technologies
of apparel production, with a focus on the most common type of apparel factory, the
Cut-Make-Trim (CMT) factory. CMT operations are responsible for some combination of cutting textile goods, assembling a garment or other product, and applying
finishings such as trims, tags, or other embellishments. The exact capabilities vary
from factory to factory, and may include things like dyeing and printing as well as
front-end services like design, sourcing, and patternmaking, but core capabilities are
centered around cutting, sewing, pressing, and low-tech finishing processes.


23.2 BACKGROUND: TEXTILE INTEGRATION STRATEGIES


23.2.1 Advanced Electronic-Textile Manufacturing
It must be noted that, in parallel with the development of systems and interfaces for wearable technology applications, much attention has been paid to novel fibers and textile production methods that enable fully integrated electronic textiles (e-textiles). While perhaps the greatest development effort has been in the area of conductive fibers and textiles for forming electrical interconnects in fabric and tape structures, these approaches go as far as developing discrete electronic components (such as transistors) in fiber form (Lee and Subramanian 2005). Methods for forming interconnects between crossing fibers/components and for fabricating flexible printed circuit boards (PCBs) in weaveable strips have also been developed (Bhattacharya et al. 2012; Locher et al. 2004; Zysset et al. 2012).
However, these methods require that garment production methods be adapted
upstream of the CMT processes, and many have follow-on implications for CMT
operations. As such, they represent more significant changes for the entire apparel
production ecosystem.

23.2.2 Surface Attachment and Conductor Integration


To date, most commercial garments with electronic functionality have relied on surface attachment of electronic systems, in which electronics are housed in pockets and
cables are run through conduits or channels in the garment. In some cases, cables
are adhered or fused to the surface of the garment, but in many cases the electronic
system is made in such a way that it can be entirely removed from the garment. Such
systems can be produced entirely separately from the garment, and inserted manually in final steps of production (in some cases, the two are combined by the end user
post-purchase).
Another popular approach is to selectively integrate conductors into the garment,
which are used to connect specific sensors (commonly physiological sensors like
heart rate electrodes or breathing sensors) to a detachable processing unit. For some
garments, conductors can be easily integrated by weaving or knitting. These processes usually require conductors to be oriented orthogonally within the garment/textile, but in cases where only a small number of conductors are required in specific locations, orthogonal placement is not necessarily an obstacle.

23.2.3 PCB Attachment and Encapsulation


The next step toward full textile integration is attachment of prefabricated PCBs to
the surface of a textile. This kind of hybrid approach combines the benefits of easy-to-assemble PCB manufacture with the comfort properties of flexible, yarn-based
conductors. The craft e-textiles community has made significant use of sew-through
PCBs, in which PCB connections are broken out to plated through-holes with larger
plated flanges (Buechley and Eisenberg 2007). Conductive thread is used to stitch
through and around the through-hole, making both an electrical and mechanical
connection in one operation, as shown in Figure 23.1.


FIGURE 23.1 Stitching PCB connections using conductive thread.

Linz et al. have used a similar approach to attach PCBs by machine using a CNC (Computer Numerical Control, or programmable) embroidery machine (Linz et al. 2005). In this process, the embroidery machine lays out a registration stitch to mark the PCB location and then stitches electrical connections to the board, which pass over and around the PCB through-holes.
However, the interface between a soft textile and the rigid edges of a PCB can create problematic stress points on the conductive stitching. These points are the most common failure points for a textile-integrated circuit. To alleviate stress on the conductor (and to simultaneously protect the PCB from moisture, wear, and contaminants), textile-integrated PCBs are often encapsulated in an impermeable coating, as shown in Figure 23.2. Depending on the size of the PCB/component, this could be as small as a glob-top-type coating or as large as a molded encapsulation (Kirstein 2013; Linz et al. 2005).

FIGURE 23.2 PCB encapsulation on a textile surface.


23.3 BACKGROUND: STITCHING TECHNOLOGIES


The sewing machine revolutionized apparel production when it was first introduced in the mid-1800s. Prior to that point, apparel was predominantly custom-made for an individual consumer and stitched by hand. The ability to produce more efficiently using a sewing machine opened the door to mass production of apparel. However, it took the extreme circumstance of the Civil War in the United States for the hurdle of creating a useful sizing system to be overcome, enabling true mass production of ready-to-wear apparel.
The first sewing machine used in a production setting was a chain-stitch machine. From that beginning, many variants of sewing technology emerged, starting with more complex stitches for specific functions and progressing to today's fully automated processes like pocket setting, which require a human operator only to insert components into the machine.
Sewing machinery today ranges from multipurpose machines, like single-needle lockstitch machines, to specialized machinery that performs only one operation, like a buttonhole stitcher. A typical CMT factory has some assortment of sewing machines depending on the type of product it most commonly produces. Here, we focus on the common classes of machinery and on the more advanced machines that are particularly useful for e-textile fabrication processes.

23.3.1 Single-Needle Machines


Single-needle sewing machines use (as the name implies) only one needle to create
a stitch. They come in two major types: machines that use only the needle thread to
create a chain stitch and machines that use the needle thread in combination with a
bobbin thread to create a lockstitch.
A single-thread chain stitch (ISO class 100) is formed when the needle thread is
punched through the fabric, forming a loop through which the next needle punch is
inserted, as shown in Figure 23.3. This chain of loops requires only one thread, but
can be easily undone by pulling the thread trailing from the most recently formed
loop. Chain stitches are used to form finishing seams that are not weight-bearing
(like structural seams would be), such as hems or trim attachments.
FIGURE 23.3 The chain stitch.

FIGURE 23.4 The lockstitch (top thread, fabric, and bobbin thread).

Lockstitch machines (ISO class 301) form a lockstitch in which the needle thread and the bobbin thread cross or lock in place, as shown in Figure 23.4. A lockstitch is much more secure than a chain stitch and unravels only slowly when the two pieces of fabric are pulled apart. The lockstitch is perhaps the most common sewing operation, used
to make body seams as well as finishing seams, trim attachments, and many other
operations. Its biggest drawback is that it does not stretch (nor does a chain stitch)
and therefore cannot be used to form extensible seams (such as in stretchy materials).
As seen in Figure 23.4, the interlock between the bobbin and needle thread is
typically located in the middle of the fabric piece or seam. This reduces wear and
tear on the twist and creates a stitch that is tight and has little slack. However, the
location of this interlock can pull to one side of the fabric or the other, depending on
the tension in each thread.

23.3.2 Multineedle Machines


Multineedle machines typically form stitches with looped components, in which one
or more looper threads are intertwined with or caught by one or more needle threads.
Perhaps the simplest multineedle stitch is the coverstitch (ISO class 406), which is
essentially a multineedle lockstitch in which one bobbin thread is used for two or
more needle threads. The bobbin locks with loops of each needle thread, introducing some play into the stitch structure (where loops can be extended or contracted),
which allows it to stretch. A three-thread coverstitch (composed of two needles and
one looper bobbin) is very commonly used to form hems and edge finishes on t-shirts
and other stretch garments. A fourth thread (also a looper thread) can be introduced
on top of the stitch to cover the top side of the fabric (ISO class 602). This thread
is typically not used structurally in the stitch, but merely brought back and forth
between the needle threads. A top cover can be used decoratively or to finish a raw
edge on top of the fabric (Figure 23.5).
Perhaps the most common form of multineedle stitch is an overlock or serged
stitch (Figure 23.6). Overlocked stitches are almost exclusively used to form an
edge finish or seam, due to the fact that they trim the raw edge of the fabric as
the stitch is formed. Overlock stitches (ISO class 500, many varieties) use one or
two needles in combination with one to three loopers to form an interlocked stitch
where loop intersections pass through the fabric (forming a seam) as well as at the
edge of the fabric (forming an edge finish that encloses a raw edge). Because the
structure is looped, it is usually extensible and is commonly used to form seams in
stretchy garments.



FIGURE 23.5 Multineedle coverstitch (needle threads 1 and 2, looper, and top and bottom cover threads).

FIGURE 23.6 Multineedle overlock stitch, ISO 514 (inner and outer needles, upper and lower loopers, and the fabric edge).

23.3.3 CNC Machines
Progressive stitches are most commonly formed by lifting the machine's presser foot (which presses the fabric to the bed of the machine) slightly between stitches and moving the fabric using the machine's feed dogs (sawtooth strips located beneath the presser foot). Because of this, most sewing machines form a stitch that progresses linearly, perpendicular to the orientation of the machine and parallel to the orientation of the feed dogs. Placing stitches in patterns other than a straight line often relies on the operator moving the fabric as stitches are formed. For e-textiles, it is often important to be able to sew more complex patterns automatically, without relying on operator skill.


To form stitches in other directions, either the needle or the fabric must move. Many lockstitch machines (especially home sewing machines) have the ability to move the needle position laterally within a range of about 1/4–1/2 in. This can be used to form simple zig-zag stitches, or to form more complex stitch patterns, in combination with fabric movement via the feed dogs.
Importantly, changing the direction of the stitch can have a significant impact on stitch tension. Tension is controlled by a complex series of spring-mounted plates and by mechanical take-up of the loose thread. Because the machine is calibrated to dispense a given amount of thread per stitch, when the stitch travels laterally the relationship between linear distance and thread consumption changes. If the stitch becomes unbalanced, the top or bottom thread may wrap to the opposite side of the fabric, pulled by the more dominant tension in the opposite thread. The lower the ratio between the width of the stitch and the length of the stitch, the more the tension balance will differ from that of a straight lockstitch.
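The effect can be quantified with simple geometry: if the feed advances the fabric by a distance a between needle penetrations while the needle swings laterally by a distance w, each stitch lays down roughly sqrt(a^2 + w^2) of thread at the surface instead of a, so wide, densely spaced covering stitches depart furthest from the straight-stitch calibration. A minimal sketch with illustrative dimensions only:

#include <math.h>
#include <stdio.h>

/* Thread laid per stitch relative to a straight stitch with the same feed
 * advance: sqrt(advance^2 + swing^2) / advance. This ignores thread taken
 * up through the fabric thickness, so it is only a rough approximation. */
double thread_ratio(double advance_mm, double swing_mm)
{
    return sqrt(advance_mm * advance_mm + swing_mm * swing_mm) / advance_mm;
}

int main(void)
{
    /* A gentle zig-zag versus a dense covering stitch with a tiny advance
     * and a wide lateral swing. */
    printf("gentle zig-zag  (2.5 mm advance, 1.0 mm swing): %.2f\n",
           thread_ratio(2.5, 1.0));
    printf("covering stitch (0.4 mm advance, 3.0 mm swing): %.2f\n",
           thread_ratio(0.4, 3.0));
    return 0;
}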
CNC embroidery machines, which are used to sew embellishments like logos
and graphics, use short zig-zag lockstitches (with very low width-to-length ratios)
to cover large areas with thread. Each color is prethreaded on an individual needle
(to avoid the need to rethread the machine), and the active needle changes as each
color is stitched, each needle interacting with the same bobbin thread. The zig-zag
creates coverage of a specific color, while CNC-controlled movement of the fabric
places this covering stitch in the appropriate place on the fabric. While embroidery
machines do also sew straight stitches, they are optimized for covering stitches.
Hence, the bobbin thread is typically much heavier than the needle threads, and the
tension balance is tighter on the bobbin than the needle thread, which causes the
needle thread to wrap to the back of the fabric. However, in an embroidery pattern,
the face of the fabric is most important and the appearance of the back is not taken
into consideration. (Embroideries are almost never weight-bearing, so durability is
not a significant issue.) Therefore, an imbalance in the stitch tension ensures that
the needle thread covers fully (and often wraps to the back of the fabric), creating
uniform color fill.
By contrast, pattern sewing machines typically have only one needle and one
bobbin to create a straight lockstitch, which is used to create nonlinear patterns by
moving only the fabric, and rarely the needle. While the stitch balance may still
be slightly tighter on the bobbin thread to ensure full coverage, it is much closer to
balanced than what is commonly seen in embroidery stitches. Pattern sewers are used
to sew reinforcement patterns or more linear designs. As these may be load-bearing,
the structural integrity of the stitch is much more important than in embroidery.
Both embroidery machines and pattern sewers (and increasingly, other machine
types) often have the ability to also trim the thread at the end of an operation, sometimes combined with an automatic back-tack stitch that reverses the stitch direction
for a few stitches in order to lock the trailing threads against force-induced separation. For embroidery machines, this feature is commonly used when switching colors. However, since embroidery stitches are rarely back-stitched (a backstitch would
create a buildup of thread), the machine is often designed to leave a much longer
thread tail than would be present on other kinds of trimmed stitches. The long tail is
then caught up underneath subsequent stitches, preventing the stitch from unraveling.


23.4 ROUTING INTERCONNECTS IN GARMENTS


23.4.1 Pattern and Marker Layout
In the production of cut-and-sewn garments, the product begins as a 2D pattern.
The pattern not only holds the shape of each piece but also holds production markings and placements. Edges of the pattern are notched to indicate crossing seams,
to identify and disambiguate pieces that may be accidentally inverted or swapped
in production, and to indicate placement of features like gathers, pleats, and darts
(Figure 23.7a). Internal (within-piece) placements for things like dart ends, pockets, or embellishments are marked by drilling through the fabric (in cases where the hole will be covered by the embellishment) or by marking with chalk (in cases where a hole would be visible). For garments cut from fabrics with patterns that must match across seams, a match stripe is placed on each piece.

FIGURE 23.7 A garment pattern piece (a) and a marker (b), showing match stripes.
Garments are most commonly cut in batches, and all of the pieces to be cut from
a given textile are cut at once. These may or may not be all of the pieces in a garment (depending on how many different fabrics are used in the garment), nor must
they be all from the same garment. Usually, many sizes of a garment are cut at once,
and sometimes pieces of different garments that use the same textile may be mixed.
The more piece shapes that are cut at once, the better the fabric yield (waste is minimized), as the pieces can usually be packed more tightly. Since fabrics are cut many plies at a time, waste adds up quickly. Garments cut from textiles with large repeating patterns (such as plaids) usually have a much lower fabric yield, as the placement of match points is a significant constraint on cutting efficiency.
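Cutting-room waste is commonly tracked as marker efficiency, the fraction of the marker area actually occupied by pattern pieces; whatever is not occupied becomes waste on every ply of the spread. The sketch below uses invented numbers purely for illustration:

#include <stdio.h>

/* Marker efficiency: total area of the pattern pieces laid into the marker
 * divided by the area of the marker itself. The complement is fabric waste,
 * incurred on every ply of the spread. */
double marker_efficiency(double piece_area_m2, double marker_area_m2)
{
    return piece_area_m2 / marker_area_m2;
}

int main(void)
{
    double eff = marker_efficiency(14.2, 17.0);   /* hypothetical marker  */
    double waste_per_ply = 17.0 - 14.2;           /* m^2 wasted per ply   */
    printf("efficiency: %.1f%%, waste over 60 plies: %.1f m^2\n",
           eff * 100.0, waste_per_ply * 60.0);
    return 0;
}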
The cutting process begins by laying a plan for cutting, called a marker (Figure
23.7b). The marker can be planned by hand or by computer, and can be printed/drawn
on paper or fed directly to a CNC cutting machine. (It should be noted that paper
markers cut by hand are more common.) Pieces must be laid into the marker according to their orientation requirements: for almost all knit and woven fabrics, pieces
have a clear grainline and must be placed so that they are appropriately oriented with
the grain. Match points for repeating patterns must also be taken into account in the
marker layout. Markers are usually drawn 1 in. less than the full width of the fabric, so that the edge can be cut away and discarded. For conventional fabrics, this is for
several reasons: (1) the edge of many fabrics is stabilized and held under tension in the
loom while the fabric is woven, and has a very different hand than the rest of the fabric; (2) textiles are stored and shipped in rolls and the top and bottom may be exposed
to much more wear and tear than the interior of the fabric, and consequently may be
dirty or damaged; and (3) the textile production process may introduce irregularities
into the width of the fabric (most commonly from irregular tension in rolling the
fabric onto bolts). However, for e-textiles, this means that conductors or components
cannot be routed around the edge of the fabric, or they may be severed during cutting.
The fabric is spread on a cutting table to match the marker in length and number
of plies. In the case of a paper marker, the marker is placed on top, and the pieces are
cut using a reciprocating knife or rotary blade. In the case of a CNC cutting machine,
pieces are cut using a laser or reciprocating knife. Cut pieces are removed from the
cutting table and tied into bundles, which are then delivered to the production line.

23.4.2 Production Design and Order of Operations


Production is planned to maximize efficiency within the constraints of the factory's person power and machine assortment, and to preserve two basic principles. First, operations proceed from the inside of a piece outward: interior embellishments on the surface of a piece, like darts, pockets, or decorations, are completed while the piece is attached to the fewest number of other pieces. Second, operations are
sequenced to preserve the garment in a flat configuration as long as possible. This is
because, in general, it is easier to sew flat pieces together than to sew 3D components


(such as to insert a tubular sleeve into a tubular opening). However, there are structural requirements for order of operations as dictated by the geometry and design
of the garment. It may be impossible to perform certain operations before another
operation has taken place (e.g., the garment front and back must be attached before a
collar can be attached). The inverse is true as well: it may be impossible to perform
certain operations once others have been performed (e.g., once a sleeve is tubular, it
may be difficult or impossible to stitch an embellishment or trim down the length of
the sleeve due to the orientation of the machine bed).
For e-textiles, these requirements pose interesting challenges. If a designer intends a stitched embellishment to progress from a collar onto the body of a garment and down the sleeve, it is feasible, using notches and placement marks, to pre-embroider each piece before the garment is assembled. However, if that embellishment conducts electricity, it must achieve electrical as well as aesthetic integrity as it crosses seams.

23.4.3 Seam Crossing Methods


Routing interconnects in garments is essential to fabric area networks (FANs) and to
any garment with distributed functionality powered by a central power source and/
or processor. There are many approaches to integrating conductors and cables into
garments and textiles, some of which were discussed earlier in this chapter. Here, we
will focus on techniques feasibly used by CMT factories to achieve seam crossings
given an array of potential textile structures and garment designs.
Perhaps the most highly textile-integrated method for embedding conductors is
to use a direct weaving or knitting approach. In this case, the textile would arrive at
the CMT factory with conductors pre-embedded. For woven textiles, these could be
insulated or uninsulated. For knitted textiles, uninsulated conductors are much more
commonly used (due to the tight bend radii and fine yarns used in knitting processes).
Conductors may be laid in custom patterns that correspond to the marker layout, to
route specific connections on individual pattern pieces, or may be laid in a regularly
spaced grid or pattern, which is accounted for with match points on the marker.
In the case of uninsulated conductors, connections can be made across seams
by careful alignment of the bare conductors. In this case, a tight, short stitch may
provide enough mechanical stability to make an electrical connection, depending on
the requirements of the circuit.
Uninsulated conductors can be insulated in several ways. Common techniques include using intarsia knitting processes or multilayer weaving processes to produce an interconnect within two or more layers of nonconductive textile (Li et al. 2010), sealing with a surface application such as a tape (Berglund et al. 2014) or paint (Buechley and Eisenberg 2007), or stitching an additional cover layer of textile over the conductor.
In the case of insulated conductors, the insulation must be pierced or stripped in
order to make a connection. This is a more difficult process, especially within the
constraints of CMT production. Having operators manually strip insulation is likely
to be prohibitively expensive, and the need for an electrical connection remains.
Soldering in an apparel factory is not impossible but certainly not typical. Lehn et al. have explored interconnection methods for insulated conductors using various techniques, including soldering, taping, and attachment of mechanical connectors. Although their focus was on attaching PCBs to the surface of a textile rather than on crossing seams, the principles are similar (Lehn et al. 2004).
Slade et al. (2012) have implemented an alternative technique, which uses an
ultrasonic process to weld conductors. The ultrasonic bonding process strips insulation, makes an electrical connection, and produces a welded seam in one operation.
It should be noted, however, that welded seaming is only possible on textiles with
thermoplastic fiber content. For natural fibers, this process could be used to make
conductor connections but not to seam the base fabric.
Finally, it is important to note that insulated conductors pose additional hazards
in sewn production due to the risk of piercing a cable with a sewing needle. In cases
where stitches are placed over insulated conductors, stitchers must be trained to avoid
piercing the conductor. In very precise operations it may be possible to calibrate the
stitch length and seam length such that the needle holes skip over the conductor, but
this would be very difficult within the tolerances typical of most sewn products.
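In principle the feasibility check is simple geometry; the hypothetical sketch below only tests whether any needle penetration would land within a clearance distance of a conductor at a known position along the seam. The practical difficulty noted above is that real seams cannot hold the positional tolerance such a calculation assumes:

#include <math.h>
#include <stdbool.h>

/* Return true if no needle penetration falls within clearance_mm of a
 * conductor located conductor_pos_mm from the seam start, for a seam of
 * seam_len_mm sewn with a stitch pitch of stitch_len_mm (> 0). */
bool conductor_is_cleared(double seam_len_mm, double stitch_len_mm,
                          double conductor_pos_mm, double clearance_mm)
{
    if (stitch_len_mm <= 0.0)
        return false;
    for (double x = 0.0; x <= seam_len_mm; x += stitch_len_mm) {
        if (fabs(x - conductor_pos_mm) < clearance_mm)
            return false;   /* a needle hole would hit or nick the conductor */
    }
    return true;
}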
An alternative to using a textile with prerouted interconnects is to apply interconnects during CMT production. This approach has benefits in that it can be applied to
almost any textile (reducing the overhead of having a textile custom-designed), and
that it allows much more flexibility in the location and orientation of interconnects.
There are two major approaches to surface application of conductors that have been
widely explored: printing and stitching.
Printed processes apply a film or coating of conductive material to the fabric.
In some cases, this process is too advanced to take place in a CMT factory (e.g., vapor deposition or reel-to-reel deposition of conductive materials; Bonderover and Wagner 2004; Lee and Subramanian 2005; Yamashita et al. 2012). However,
traditional screen printing processes have been used in apparel production for many
decades, and while not exactly typical, they are not rare in CMT factories. In screen
printing, an impermeable resist pattern is applied to a fine mesh screen. Ink is applied
to the top of the screen and drawn down with a squeegee, which forces a thin layer
through the screen and onto the fabric.
Commercial screen-printing operations are often still manual processes. Importantly for e-textiles, printing usually must be done to a flat (or flattened) garment area.
It is much more difficult to print around a body or body part in circumference, as the
garment part must be flattened and individual prints must be perfectly registered in
order to produce a continuous appearance (or in the case of e-textiles, a good electrical connection). An alternative approach is to print garment pieces before the garment
is assembled. It is also possible to print conductive inks during textile production, but
in this case the impact on CMT processes would be similar to uninsulated woven or
knitted conductors.
Stitching is another way to attach a conductive trace to the surface of a textile. In
e-textiles, the conductive thread is typically used as the bobbin thread in a lockstitch
machine, for durability reasons. The needle thread in a lockstitch machine is pulled
down under the bobbin carriage to form the interlocking loop. The excess thread is
pulled down by the bobbin hook and then taken up by the take-up lever during the
formation of each stitch. During this process, it passes through the needle and back


again, causing each point on the thread to pass through the needle (under tension)
up to 70 times before it is embedded in the fabric. This exerts significant wear and
tear on the thread, and can cause thicker or more brittle conductive threads to fray
and break. By contrast, the bobbin thread is fed more or less continuously to the
fabric, with little back-and-forth motion.
Stitching allows traces to be placed in almost any orientation and position on a
garment piece or fully assembled garment. Most stitched processes use uninsulated
conductors, and therefore seam crossings are subject to the methods described previously. As opposed to methods using knitted or woven conductors, stitched methods
can cross preformed seams with ease. However, as previously noted, the need for a
seam-crossing conductor must be taken into account during production planning.

23.4.4 Trace Crossings
While in some products only one or two connections may be needed in the garment
itself, for garments using a central processor to control a larger array of peripherals,
it may be necessary for traces to cross in the garment. In a PCB, this is done by using
multiple board layers to allow traces to travel under and over each other. In fabric,
many layers quickly get bulky. Insulated conductors can be woven in orthogonal patterns, but this is significantly limiting for component placement unless interconnects
can be made between woven conductors. Intarsia processes can lay conductors into a
textile in more complex patterns, but may not be possible with insulated conductors.
Stitched processes are another method of allowing traces to cross without shorting, using a sewing machine. In practice, this process ends up similar to a couching
technique used in embroidery, in which an embellishment is stitched to the surface
of a textile using loops of finer thread.
Because lockstitch machines use two threads to form a stitch, one thread can be
used to couch a conductive thread to one surface of the textile. This relies on adjusting the tension balance of the two threads (shown in Figure 23.4), such that one
floats on the surface of the textile while the other is pulled through from one side
to the other. Provided that the textile is nonconductive, it can serve as the insulator
between conductors on each side. In this way, conductors can be stitched on either
side of the fabric.
As with PCBs, e-textile circuits often require interconnections between traces
on different layers. For insulated conductors (particularly those that pass directly
under and over each other with no textile or yarn layer between), interconnects can
be formed using methods previously discussed in this chapter. For uninsulated conductors, connections may need to be made using techniques similar to those used for
PCBs, by creating a perpendicular interconnection like a via or through-hole. In the
CMT factory, such via connections can be made using crimped processes, in which a
connector with metallic legs is inserted through the fabric, and the legs are clamped
around the fabric from the reverse side (see a stud example in Figure 23.8). In apparel
production there are many instances of crimped components, including things like
snap fasteners, studs, and zipper stops. As such, the attachment of crimped fasteners
is a well-established practice, and indeed specialized hand tools and simple machinery exist for this purpose.



FIGURE 23.8 Crimped via connection between layers of a stitched circuit (conductive crimp fastener, stitched pads, top trace, fabric insulator, and bottom trace).

23.5 COMPONENT ATTACHMENT


23.5.1 Through-Hole Components and Crimped Methods
The process of attaching a crimped via connection described earlier uses a fastener
or interconnect that resembles in many ways a through-hole dual-in-line (DIP)
integrated circuit package. Because DIP packages have protruding legs that can
be crimped through fabrics, they can be used to connect components directly to
the surface of a textile and to simultaneously make an electrical connection with a
stitched conductor, as shown in Figure 23.9. However, two design parameters must
be optimized for this connection to be successful. First, the strength of the package
leads must be such that they can exert sufficient mechanical strength in a crimped connection to form a solid electrical connection and to ensure the security of the mechanical connection. Second, the stitch pattern must be designed such that there is sufficient surface area of conductive thread to which the lead can be crimped.

FIGURE 23.9 Example of a multilayer crimped DIP stitched circuit (crimped DIP LED, crimped DIP pushbutton, two-sided trace crossings, stitched placement pad, and crimped stud via). (From Dunne, L. E. et al., Multi-layer E-textile circuits, in Proceedings of the ACM International Conference on Ubiquitous Computing (UbiComp), Pittsburgh, PA, 2012.)
To ensure good coverage of the stitched conductor, a lateral covering stitch pattern provides more surface area than a running stitch. However, as discussed earlier,
lateral covering stitches often result in an imbalance in tension between the bobbin
and the needle threads. For a conductive stitch pattern this can be difficult to manage: either the conductive bobbin is contracted as the needle thread wraps to the
opposite side, reducing the amount of surface area covered, or the conductive bobbin
wraps to the needle side, potentially shorting with opposite-side stitches. Further,
if a single machine performs both running stitch trace layout and covering stitch
pad layout, it is unlikely that one tension setting could be successfully used for both
operations. Programmable tension settings on sewing machines are not a common
feature, making this a significant problem.

23.5.2 Surface-Mount Components and Reflow Methods


With the imbalanced lockstitch method to lay circuit traces and interconnects, surface-mount device (SMD) components can also be used. Because surface-mount leads are
too fine to create mechanical connections using CMT tools, alternative methods are
necessary. Two promising methods are the use of adhesives and soldering.
Linz et al. showed that with adequate force a good electrical contact can be
formed between a component lead and a conductive yarn using only a nonconductive adhesive, in a method similar to the flip-chip process for PCBs that similarly
uses only nonconductive adhesive. The force applied to press the chip to the textile
displaces the adhesive from between the contact pads, and when the adhesive cures
it secures the mechanical and electrical connection. In their approach, they demonstrate using a thermoplastic insulator on the embedded conductor, such that the force
and temperature of compression during chip attachment also displaces the insulation
from the conductor.
While the kind of machinery needed for this process is not necessarily typical for
CMT operations, it is possible that such an approach could be modified or adapted
for use in flat press irons. Similarly, a surface-mount reflow technique can also be
adapted for use with flat press irons, heat guns, or curing ovens. Using uninsulated
solderable conductive yarns, chip pads and traces can be stitched using CNC sewing
machines. Because SMD packages have very small leads, the need for surface area
coverage in stitching a pad on the base textile is eliminated. Many SMD component
leads are similar in size to the diameter of sewing threads. Using solder paste or presoldered components (e.g., flip-chip packages), components can be soldered directly
to conductors with heat alone (Figure 23.10).
For both adhesive and soldered approaches, one of the benefits of developing methods of attaching SMD components directly to fabric is that as the chip gets smaller, its
size approaches the bend radius of the fabric. Because of this, the textile (and perhaps
more significantly, the integrated conductor) starts to act as strain relief on the component interconnect. For larger PCBs, the junction between the PCB and the conductor is
a significant stress point and often the location of electrical failure. However, there is some limitation on the minimum spacing of conductor layout, relative to the diameter of the integrated conductor. Particularly for stitched methods, very fine conductors are often too weak to be used in standard sewing machines. To date we have successfully integrated multipin SOIC (small outline integrated circuit) packages and SMD packages down to the 0402 size using stitched conductors and reflow soldering techniques.

FIGURE 23.10 (a) Surface-mount IC package soldered to stitched conductors and (b) reflow-soldered connection.

23.6 TEXTILE COMPONENTS


While leveraging well-developed electronic components and standard electronic devices has clear benefits in efficiency and accuracy, in some cases electronic packages are not suited to the needs of e-textiles and smart clothing. A good example is the case of sensors used to detect bend and stretch (strain gauges). Many
of this is the case of sensors used to detect bend and stretch (strain gauges). Many
commonly used devices are rigid or semirigid planar structures, often of limited
length and predefined size. Integrating such sensors into apparel often introduces
a stiff or bulky area of the garment that is not conducive to meeting wearer expectations for comfort and aesthetics. Just as CMT technologies can be leveraged to develop softer electronic circuits, as discussed previously, they can also be leveraged to develop softer components. In the following section, we will discuss the
use of multineedle sewing machines to create stretch and bend sensors.

23.6.1 Stitched Stretch and Bend Sensors


Textile stretch and bend sensors are typically formed with one of two approaches:
either with a surface application of a piezoresistive material (like a carbon-loaded
rubber), which is applied in a process similar to screen printing as described earlier,
or by forming loops of conductive yarn that slide over each other when the textile is
deformed. Here, we will focus on the latter approach.
While many looped-conductor approaches to stretch and bend sensing use knitting to form loops (since a knit structure is already formed of loops that slide and
deform as the textile is stretched and bent), there are limitations to this approach.
The most significant of these limitations has been previously discussed in describing the use of knitting to integrate interconnections: the sensor is most easily laid
orthogonal to the direction of knitting and its location and configuration must be
preplanned at the textile development stage.
By contrast, stitched methods can be used to place a looped-conductor sensor anywhere on the garment surface, in any orientation. Many industrial sewing
machines can be used to form looped-conductor sensors: any machine that uses a
looper to form a looping or sinusoidal thread pattern is a candidate. To form the sensor, an uninsulated conductive yarn is used in place of one or more looping threads.
Depending on the geometry of the loop and the mechanics of its deformation, shorts
formed by self-intersections of the conductor cause a shortening or lengthening of
the electrical path as the stitch is deformed.
Figure 23.11 shows the electrical equivalent model of three stitch structures used to create stretch sensors using cover stitch and overlock machines.

FIGURE 23.11 Three stitched stretch sensors (bottom cover thread, top cover thread, and overlock thread) with equivalent electrical circuits.

FIGURE 23.12 Relative responses to elongation for three types of stitched stretch sensor (bottom-thread coverstitched, top-thread coverstitched, and overlock sensors); sensor response (Ohm) plotted against time (s).

In the case of the
bottom-cover thread, loops slide toward each other and the electrical path shortens
as the stitch is stretched. In the case of the overlock and top-cover threads, loops
separate as the stitch is stretched, resulting in an increase in resistance as the electrical path lengthens.
Figure 23.12 shows the relative responses for sensors of similar length and the
same stimulus across the three types described earlier. Because the loops of the top-cover and overlock sensors will all eventually separate, their response saturates.
By contrast, the bottom-thread sensor loops will continue to deform until the fabric
tears. Therefore, there is no saturation region in the bottom-thread response.
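The loop-shorting mechanism lends itself to a simple lumped model. The sketch below is only an illustrative approximation and is not the model used in the work described here: it treats the stitch as a chain of loops in series and varies the fraction of loop-to-loop contacts with strain, and the loop count and per-loop resistance are hypothetical values chosen for the example.

# Minimal sketch (not the authors' model): a stitched stretch sensor idealized as a
# chain of N conductive loops in series, each contributing r_loop of resistance.
# In a bottom-cover-style sensor, stretching pushes neighboring loops into contact and
# shorts them out, shortening the electrical path; in top-cover/overlock-style sensors,
# stretching separates contacts, lengthening the path until every contact has opened
# and the response saturates.

def sensor_resistance(n_loops, r_loop, shorted_fraction):
    # Resistance of the loop chain when a given fraction of the loops are shorted out.
    active_loops = n_loops * (1.0 - shorted_fraction)
    return active_loops * r_loop

N_LOOPS, R_LOOP = 200, 0.4  # hypothetical: 200 loops at 0.4 ohm per loop

for strain in (0.0, 0.1, 0.2, 0.3):
    # Bottom-cover-like: more strain -> more loops shorted -> lower resistance.
    bottom = sensor_resistance(N_LOOPS, R_LOOP, min(1.0, 2.0 * strain))
    # Top-cover/overlock-like: contacts open with strain -> resistance rises, then saturates.
    overlock = sensor_resistance(N_LOOPS, R_LOOP, max(0.0, 0.5 - 2.0 * strain))
    print(f"strain {strain:.1f}: bottom-cover ~{bottom:5.1f} ohm, overlock ~{overlock:5.1f} ohm")

Under these assumptions the bottom-cover-style resistance falls monotonically with strain, while the overlock-style resistance rises and then saturates once all contacts have opened, mirroring the qualitative behavior shown in Figure 23.12.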
The same stitch structures can be used to detect bend. Because the top- and
bottom-cover threads of a coverstitch are electrically isolated by the fabric between
them, a method such as that used by Tognetti et al. (2014) can be used to differentiate
bend from stretch by comparing the responses of two sensors coupled on opposite
sides of the same substrate. However, we have found that deformation of a single sensor on one side of the fabric also produces an analog resistance response to bending
(Gioberto et al. 2013). This is likely due to forces on the yarn bringing individual
fibers closer together and farther apart as the structure is bent.

23.6.2 Sensor Insulation and Durability


As is the case for uninsulated stitched conductors of all types, the lack of insulation
can cause problems for the electrical circuit (due to the risk of shorting across traces
or within traces when the fabric is folded) as well as for the durability of the conductor. In the case of stitched sensors, we see drift in the relaxed resistance of the sensor
as well as a decrease in the peak-to-peak response over time.


FIGURE 23.13 Stitched bend sensor insulated with fusible polyurethane film.

Techniques for insulating bare conductors post-integration can mitigate some of these effects while preserving the ease of integration afforded by a lack of insulation.
Techniques for post-integration insulation that have been explored include couching or insulating with a covering stitch, applying an insulative paint or coating to
cover the conductive yarn, and covering with an additional textile layer. Important
variables to consider in insulation techniques are the effect of insulation on the hand
of the textile (especially in the case of a sensor like the stitched structure described
here, a stiff or bulky insulation layer can negate the positive qualities of the stitching
approach) and permeability (in some cases keeping air and moisture out can protect
the sensor, in others it is not necessary to fully encapsulate).
An insulation approach that has proven successful for protecting the stitched
stretch/bend sensors is the use of a fusible seam tape. Fusible tapes are not uncommon in CMT processes, where they are used for purposes such as sealing impermeable or selectively permeable materials (after they have been punctured by sewing
a seam), bonding materials without stitching (when needle punctures are not appropriate for the material integrity or end use), and reinforcing or flattening stitched
areas. Many dedicated machines and machine attachments exist that allow tapes to
be applied and fused as the seam is formed.
We have used an extensible polyurethane stitchless bonding film (3M 7012) to
insulate our sensors by fusing the film on top of the sensor, as seen in Figure 23.13.
Although the film adheres to the surface of the sensor, it allows enough movement
that the sensor remains responsive to both stretch and bend.

23.7 CONCLUSION
Garment integration of wearable technologies has significant advantages in comfort,
aesthetics, and usability. Further, it expands the range of body areas that can be
accessed by worn technologies. However, textile integration of electronics requires
adaptation of production processes for successful manufacturing, particularly at
larger scales. Potential exists in both directions: in adapting the processes standard
to electronics fabrication and in adapting the processes standard to apparel production. The tools and techniques used to produce sewn products have some interesting
advantages for soft goods and can be effective in making the production of e-textiles
and wearable technology more feasible for sewn product manufacturers.

REFERENCES
Berglund, M. E., J. Coughlin, G. Gioberto, and L. E. Dunne. 2014. Washability of E-textile stretch
sensors and sensor insulation. In Proceedings of the 2014 ACM International Symposium
on Wearable Computers, Seattle, WA, pp. 127–128, ISWC '14. New York: ACM.
Bhattacharya, R., L. Van Pieterson, and K. van Os. 2012. Improving conduction and mechanical reliability of woven metal interconnects. IEEE Transactions on Components,
Packaging and Manufacturing Technology 2 (1): 165–168.
Bonderover, E. and S. Wagner. 2004. A woven inverter circuit for e-textile applications. IEEE
Electron Device Letters 25 (5): 295–297.
Buechley, L. and M. Eisenberg. 2007. Fabric PCBs, electronic sequins, and socket buttons:
Techniques for E-textile craft. Personal and Ubiquitous Computing 13 (2): 133–150.
Dunne, L. E., K. Bibeau, L. Mulligan, A. Frith, and C. Simon. 2012. Multi-layer E-textile
circuits. In Proceedings of the ACM International Conference on Ubiquitous Computing
(UbiComp), Pittsburgh, PA.
Gioberto, G., J. Coughlin, K. Bibeau, and L. E. Dunne. 2013. Detecting bends and fabric folds
using stitched sensors. In Proceedings of the 14th International Symposium on Wearable
Computers, Zurich, Switzerland.
Kirstein, T. 2013. Multidisciplinary Know-How for Smart-Textiles Developers, 1st edn.
Oxford, U.K.: Woodhead Publishing.
Lee, J. B. and V. Subramanian. 2005. Weave patterned organic transistors on fiber for E-textiles.
IEEE Transactions on Electron Devices 52 (2): 269–275.
Lehn, D., C. Neely, K. Schoonover, T. Martin, and M. Jones. 2004. E-TAGs: E-textile
attached gadgets. In Proceedings of Communication Networks and Distributed Systems:
Modeling and Simulation. San Diego, CA.
Li, L., W. M. Au, Y. Li, K. M. Wan, S. H. Wan, and K. S. Wong. 2010. Design of intelligent
garment with transcutaneous electrical nerve stimulation function based on the intarsia
knitting technique. Textile Research Journal 80 (3): 279–286.
Linz, T., C. Kallmayer, R. Aschenbrenner, H. Reichl, I. Z. M. Fraunhofer, and G. Berlin.
2005. Embroidering electrical interconnects with conductive yarn for the integration of
flexible electronic modules into fabric. In Proceedings of the Ninth IEEE International
Symposium on Wearable Computers, Osaka, Japan, pp. 86–89.
Locher, I., T. Kirstein, and G. Troster. 2004. Routing methods adapted to E-textiles. In
Proceedings of the 37th International Symposium on Microelectronics (IMAPS 2004).
Slade, J., A. Houde, and P. Wilson. 2012. Electrically active textiles, articles made there from,
and associated methods. http://www.google.com/patents/US20120030935 (Accessed May 3, 2015).
Tognetti, A., F. Lorussi, G. D. Mura, N. Carbonaro, M. Pacelli, R. Paradiso, and D. De Rossi.
2014. New generation of wearable goniometers for motion capture systems. Journal of
Neuro Engineering and Rehabilitation 11 (April).
Yamashita, T., S. Takamatsu, K. Miyake, and T. Itoh. 2012. Fabrication of conductive polymer
coated elastomer contact structures using a reel-to-reel continuous fiber process. IEICE
Electronics Express 9 (17): 1442–1447.
Zysset, C., T. W. Kinkeldei, N. Munzenrieder, K. Cherenack, and G. Troster. 2012. Integration
method for electronics in woven textiles. IEEE Transactions on Components, Packaging
and Manufacturing Technology 2 (7): 1107–1117.

24 Garment Devices: Integrating Energy Storage into Textiles
Kristy Jost, Genevieve Dion, and Yury Gogotsi

CONTENTS
24.1 Introduction................................................................................................... 639
24.1.1 Design and Material Parameters for Wearable Electronics...............640
24.1.2 Brief Introduction to Energy Storage Devices................................... 642
24.1.2.1 Energy Storage Components............................................... 642
24.1.2.2 Energy Storage Devices...................................................... 642
24.1.3 Introduction to Textile Structures...................................................... 645
24.2 Work in the Field...........................................................................................646
24.2.1 Coated Devices.................................................................................. 647
24.2.2 Knitted Carbon Fiber Electrodes....................................................... 650
24.2.3 Fibers and Yarns for Energy Storage................................................. 652
24.2.4 Custom Textile Architectures for Supercapacitors............................ 655
24.3 Conclusions.................................................................................................... 657
Acknowledgments................................................................................................... 657
References............................................................................................................... 657

24.1INTRODUCTION
Portable electronics have evolved rapidly over the last 10 years and now wearable technologies are following the same trend. While multifunctional clothes are
appearing on the market with a multitude of electronic devices incorporated onto the
fabric, garment devices are articles of clothing with inherent electronic properties.
With garment devices, the garment itself is the device: a new kind of technology, also referred to as e-textiles or smart garments (Quinn, 2010; Seymour, 2008, 2010). Cutting-edge research on garment and textile devices (Figure 24.1), ranging from sensing and illuminating garments to computer-like garments, continues to appear in the literature, and this chapter explores how these devices could be powered. New techniques for integrating energy storage (i.e., batteries and capacitors [Simon and Gogotsi, 2008]) into textiles are described, and new methods for generating energy are briefly explored. Figure 24.1 illustrates the concept of a garment device incorporating various electronic components by custom designing a knitted textile using conductive materials (Dion, 2013; Kirsch et al., 2009; Sim et al., 2012).

FIGURE 24.1 Design concept for a smart power bodysuit. (a) Piezoelectric patch converts body movements to electrical energy; (b) textile antennas to transmit communications; (c) textile electrochemical energy storage to store energy from harvesting devices; (d) integrated conductive yarns act as leads to transmit energy or information throughout the garment; (e) this design is simulated with realism in the textile structure to show that different materials can be integrated as part of a fabric. (From Jost, K. et al., J. Mater. Chem. A, 2, 10776, 2014a.)

Commercially available devices include the Adidas Mi-Coach, the Hi-Call Phone
glove, and the Under Armour heart rate monitoring shirt. However, many of these
wearable technologies still use solid coin cells or pouch cell lithium batteries, which
can be cumbersome, bulky, and are typically stitched or glued into the garment after
assembly. It has been proposed (Dion, 2013) that garment devices would have batteries integrated into the clothing that were indiscernible from regular textiles. This
chapter describes textiles capable of storing energy, fabricated with traditional and
advanced textile manufacturing methods (Figure 24.1).
However, what kind of battery technology and fabric structure will be ideal for
garment devices? We must first consider the design parameters and limitations a
garment device will have.

24.1.1 Design and Material Parameters for Wearable Electronics


Safety: It is always the top concern for researchers developing conventional or wearable batteries and capacitors. Therefore, the materials must be chemically inert
(e.g., noncorrosive, not capable of self-ignition, nontoxic), and the system must be
designed to avoid shocking the wearer (i.e., electrically insulated, or operational
below a threshold dangerous for human usea few volts).


Nanoparticles are also a concern for wearable devices since the long-term effects
from exposure to these new materials are unknown. However, materials with
controlled nanoscale structures are safe and can be used. Activated carbons (ACs)
or carbide-derived carbons (CDC) (Chmiola et al., 2006; Lin et al., 2009) are particles of micrometer (µm) size that can be developed with controlled pore sizes,
tunable by one-tenth of a nanometer. These materials are widely used for water filtration or for poison control in pill or powder form where pores can be tuned to
selectively adsorb specific impurities, for example. This is one of many instances
where nanotechnology does not pose safety concerns (Gogotsi, 2003). ACs are also
used in double layer capacitors (Section 24.1.2), and typically such energy storage
devices, including any nanoparticles used, are encased in a liquid or gel electrolyte.
Washability: The most common question asked about garment devices is can
they be washed? Washing batteries and electronics the way we wash our clothes
is typically avoided. While some components can be waterproofed, many of the
materials and technologies used in smart garments today are those used in conventional portable electronics such as smartphones and these would never be soaked
in water. Therefore, much like a good wool suit, these technologically enhanced
garments require special care when cleaning. In addition, a process like dry-cleaning
can better preserve garments compared to conventional wet-washing and machine
drying over the long term.
Reliability: If garment devices are to last years, the chosen battery technology must
be reliable for the predicted lifetime of the garments, requiring replacement only if
damaged. For techniques incorporating the battery into the textile material, a device
failure would mean replacing the entire garment.
Durability: Similarly to regular garments, garment devices incorporating battery
fabrics must be able to withstand normal wear and tear from everyday use. Therefore
many researchers include electrochemical testing of their devices not only when flat
but also when bent or stretched. These tests will be described in Section 24.1.2.
Cost: Some battery and supercapacitor systems are composed of rare metals;
they may also require complex and expensive manufacturing processes. Given that
these must be converted into textiles, abundant materials have a greater chance of
successful commercialization. In particular, many of the works described in this
chapter utilize carbon materials; carbon is one of the most abundant elements on the planet.
Different forms of carbon vary in cost; activated carbon and graphite are relatively
inexpensive materials frequently used in supercapacitor and battery electrodes.
Fabrication: As previously mentioned, choosing manufacturing techniques that
already exist in the fashion and textile industry to produce energy storing fabrics
will allow for a smoother transition from lab-scale testing to large-scale manufacturing. This also means that the type of energy storing fabric should be designed
with commonly available materials, as well as based on the simplest conventional
electrode configurations. For example, if a device requires more types of material than a fabric-making process can incorporate at one time, then it is likely
not a feasible system. This chapter will explore both printing and knitting techniques
for fabricating energy storing textiles.
Given the design parameters described earlier, understanding the basic principles
of different storage technologies will inform which technologies are best suited for
wearable applications.

24.1.2 Brief Introduction to Energy Storage Devices


24.1.2.1 Energy Storage Components
Electrode: It is the charge-storing material, storing charge either through chemical bonds or through double layer capacitance. Typical materials include activated carbon for supercapacitors,
and lithium cobalt oxide (LiCoO2) and graphite for lithium-ion batteries.
Current collector: It is a sheet of metal that the electrode is rolled/adhered to in order
to improve the electrical conductivity.
Electrolyte: It is a solution (aqueous 1 M NaCl, organic solvent, polymer, etc.) that
transports ions from one electrode to another to perform redox processes or form a
double layer capacitance.
Separator: Divides two electrodes and current collectors in a device assembly sandwiched on top of each other. The separator electrically insulates the electrodes from
each other so they do not short, and allows electrolyte ions to pass through the membrane. The closer the electrodes are to each other without electrically shorting, the
higher the capacitance because ions do not have to travel as far between electrodes.
This also means they can charge faster, that is, have a higher power. Typical separators include Gore (polytetrafluoroethylene) or Celgard (polypropylene) membranes
that have nanopores on the order of 50–100 nm and are 20–50 µm thick.
24.1.2.2 Energy Storage Devices
Electrochemical Capacitors (ECs): These store less charge than batteries but have
the unique ability to be composed entirely of nontoxic materials (Dyatkin et al.,
2013), last for hundreds of thousands of cycles, and are composed of highly abundant
materials (e.g., activated carbon, polymer, and aluminum foil).
Electric Double Layer Capacitors (EDLCs): These adsorb ions on the surface of
a conductive electrode material (Figure 24.2a), forming the so-called double layer charge, which is the mechanism by which energy is stored (Figure 24.3a and b) (Gogotsi and Simon, 2011; Simon and Gogotsi, 2008; Taberna et al., 2003). Typical electrode
materials are carbon based (e.g., activated carbon, carbon nanotubes [CNTs], graphene), and they are porous and conductive enough to store electrical charge. If the
conductivity of the electrode material is not sufficient, conductive additives can be
mixed into the film, or the electrodes can be adhered to a metallic current collector. Commercially available capacitors use acetonitrile-based electrolytes and extend
the voltage window up to 2.7 V but are not suitable for wearing. Nontoxic aqueous
or polymer-based gel electrolytes can be used in garment devices, but have a more
limited voltage window around 1 V. These devices are typically tested with voltage
or current charge–discharge techniques, such as cyclic voltammetry or galvanostatic cycling (Figure 24.3e). They usually display very rectangular voltammograms, where the area under the curve is proportional to the charge stored.

FIGURE 24.2 Basic schematics for (a) an all-carbon double layer capacitor (left), (b) a pseudocapacitor (MnO2 depicted, center), and (c) a lithium-ion battery (right). All devices have an active material (e.g., carbon, MnO2, LiCoO2), a current collector, a separating membrane, and an electrolyte (e.g., Na2SO4 or LiPF6 solutions). (From Jost, K. et al., J. Mater. Chem. A, 2, 10776, 2014a.)

FIGURE 24.3 Comparing batteries to supercapacitors: (a–d) The different mechanisms of capacitive energy storage are illustrated. Double-layer capacitance develops at electrodes comprising (a) carbon particles or (b) porous carbon. The double layer shown here arises from adsorption of negative ions from the electrolyte on the positively charged electrode. Pseudocapacitive mechanisms include (c) redox pseudocapacitance, as occurs in hydrous RuO2, and (d) intercalation pseudocapacitance, where Li+ ions are inserted into the host material. (e–h) Electrochemical characteristics distinguish capacitor and battery materials. Cyclic voltammograms distinguish a capacitor material, where the response to a linear change in potential is a constant current (e), as compared to a battery material, which exhibits faradaic redox peaks (f). Galvanostatic discharge behavior (where Q is charge) for a MnO2 pseudocapacitor is linear for both bulk and nanoscale material (g), but a LiCoO2 nanoscale material exhibits a linear response while the bulk material shows a voltage plateau (h). (From Simon, P. et al., Science, 343, 1210, 2014.)
Pseudocapacitors (Figure 24.2b schematic): These store charge through fast redox
and intercalation processes (Figure 24.3c). They typically store more energy than a
double layer capacitor but less than a battery, and can last for ~10,000 cycles. Unlike
a battery, they can be charged at rates comparable to double layer capacitors, on
the order of seconds or minutes. Typical cyclic voltammograms for these devices
may have corresponding peaks indicating that reversible surface reactions are taking
place or are featureless just like those of double layer capacitors (Figure 24.3f).
Primary batteries: These are nonrechargeable batteries (e.g., alkaline or zinc-air)
commonly used in small electronics. They are packed with liquid and sometimes
corrosive electrolytes, and are single-use batteries. Since they cannot be recharged,
these systems are not being considered for use in wearable electronics.
Secondary (rechargeable) batteries (Figure 24.2c): These are most commonly used in
laptops, phones, and in some hybrid-electric vehicles. The most popular battery system is currently the lithium-ion battery, commonly composed of lithium-cobalt-oxide
(LiCoO2), a graphite anode, and lithium-hexafluorophosphate (LiPF6) electrolyte.
They operate by shuttling lithium ions between the graphite anode and the oxide
cathode (Figure 24.3d). They have a high energy density, are highly reversible, and
can last for hundreds and sometimes thousands of cycles, potentially lasting for the
lifetime of the garment. However, these electrolytes are hazardous; finding nontoxic
alternatives would make them viable for garment devices.
The next section covers energy density, which, in addition to the storage components and devices, is one of the most important metrics; it is governed by the materials selection and the active mass loading and dictates which applications the device is best suited for.
Energy density: While batteries have the highest energy density (Figure 24.4),
capacitors have the highest power. This means that capacitors provide bursts of
energy for the short term but discharge quickly, and batteries provide less energy
at one time, but last for a few hours. The energy density of the battery or capacitor must be high enough that the device can fit into a single garment. Therefore, high energy
density is a must.
The charge stored by different types of supercapacitor materials (i.e., woven,
knitted, yarn, fiber, or conventional films) will be reported as capacitance per gram
of material (F/g), as well as volume (F/cm3). Capacitance per gram essentially
estimates how much of the electrode material is actually contributing to the overall capacitance, and can give insight into whether or not there is a difference in
the electronic or ionic conductivity. For yarn and fabric capacitors or batteries,
these metrics are also generally reported in order to compare with conventional
devices. Capacitance per area (F/cm2) of the fabric surface, and capacitance per
length (F/cm) of a yarn are also reported for full fabrics and individual yarn/fiber
capacitors, respectively. These energy densities per area or length will inform how
to design garments and textiles with a specified capacitance. Reporting energy
density per volume will help to differentiate between fabrics of similar density per area, but vastly different in thicknesses. The volumetric capacitance of a thin fabric will be higher than that of a thick fabric having the same finite capacitance.

FIGURE 24.4 Specific power against specific energy (Ragone plot) for various electrical energy storage devices. If a supercapacitor is used in an electric vehicle, the specific power shows how fast one can go, and the specific energy shows how far one can go on a single charge. Times shown are the time constants of the devices, obtained by dividing the energy density by the power. (From Simon, P. and Gogotsi, Y., Nat. Mater., 7, 845, 2008.)
Active mass loading: The amount of active material that comprises an electrode is
directly proportional to the energy it will store per area, volume, or length. In order
to store the maximum energy possible, electrodes should be highly dense. Because
fabric supercapacitors and batteries are yet to reach the energy density and performance of conventional devices, high mass loading is a crucial aspect if the battery or capacitor is to be contained in a garment. In practice, electrodes are porous
to accommodate diffusion of electrolyte into the material, sacrificing some of the
stored energy for faster charging and discharging.

24.1.3 Introduction to Textile Structures


Most textiles are composed of fibers, which can be spun into yarns, and then knitted
and woven into full fabrics.
Fibers: Fibers are small linear strands of polymer, typically with thicknesses on the
order of 10 µm up to 100 µm. Fibers can have cylindrical, corkscrew, bi- or trilobal,
and other cross-sectional structures that occur either naturally or are synthetically
extruded. Short fibers that are less than 3 in. are typically referred to as staple fibers,
and longer fibers are so-called filament fibers.

FIGURE 24.5 Illustration depicting fabric structures and their proper names. (a) Staple fibers, (b) filament fibers, (c) illustration of a staple spun yarn, (d) illustration of a filament spun yarn, (e) monofilament yarn is a single fiber with sufficient strength to also act as a yarn, (f) illustration of a 2-ply yarn, (g) realistic simulation of different knit structures, with a single yarn in dark gray to depict the path of the yarn, and (h) realistic weave simulations. Modeled on the Shima Seiki Apex-3 Design software. (From Jost, K. et al., J. Mater. Chem. A, 2, 10776, 2014a.)

Yarns: Fibers can be spun into a variety of yarn structures as seen in Figure 24.5c
through f.
Woven fabrics: These are sheets of material composed of yarns that are intertwined
over and under each other. Typically warp (vertical) yarns are prethreaded and held
tight while the weft yarns are woven back and forth over and under the warp yarns.
These sequences can be done in different orders to generate plain woven, twill, or
satin weaves (Figure 24.5h).
Weft knitted fabrics: Unlike woven fabrics, knitted fabrics are not composed of independent yarns; a full piece of fabric can be made entirely from a single strand of yarn
intermeshed row by row, back upon itself in a snake chain configuration (Figure 24.5g).
Weft knitted fabrics have much more stretch in the weft direction (horizontal), and
less in the warp direction (vertical), making them not only physically anisotropic but also electrically anisotropic in the case of metal- or carbon-based yarns. Because the yarn is continuous
along the width of the fabric, electrons can flow through the material itself. However,
rows are electrically connected in the vertical direction by intertwined loops. Knitting
typically requires less material and set-up time to fabricate samples, and has been
explored as a main fabrication technique for smart textiles (Jost et al., 2013; Li et al., 2010; Soin et al., 2014).
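A rough way to picture this conductive anisotropy is to treat current flow along a course (row) as conduction through the continuous yarn, and current flow in the wale (vertical) direction as conduction through loop-to-loop contacts. The sketch below is a deliberately simplified, hypothetical estimate (independent parallel paths, no full two-dimensional network), not a validated model of any particular knit; all values are assumed for illustration.

# Simplified, hypothetical estimate of knit conductive anisotropy (not a fabric model).
# Along a course (row), current follows the continuous conductive yarn; between rows
# (wale direction), it must cross loop-to-loop contacts, which are usually more resistive.

R_YARN_PER_LOOP = 0.5   # ohm of yarn traversed per loop along a course (assumed)
R_CONTACT = 5.0         # ohm per loop-to-loop contact between rows (assumed)
LOOPS_PER_ROW = 100     # loops across the fabric width
ROWS = 50               # number of courses

# Course direction: one row is LOOPS_PER_ROW loop-lengths of yarn in series;
# the ROWS rows act roughly as parallel paths across the width.
r_course = (LOOPS_PER_ROW * R_YARN_PER_LOOP) / ROWS

# Wale direction: current crosses (ROWS - 1) contact interfaces in series;
# the LOOPS_PER_ROW columns of contacts act roughly as parallel paths.
r_wale = ((ROWS - 1) * R_CONTACT) / LOOPS_PER_ROW

print(f"course-direction resistance ~{r_course:.2f} ohm")
print(f"wale-direction resistance  ~{r_wale:.2f} ohm")

Even this crude estimate shows the asymmetry: the wale-direction path, dominated by loop contacts, ends up noticeably more resistive than the course direction carried by the continuous yarn.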

24.2 WORK IN THE FIELD


Literature in the field can be categorized into three approaches: coating preexisting textiles, fabricating fibers and yarns, and designing custom knit or woven
structures that incorporate the battery or capacitor into the textile structure. Each
approach has its advantages and challenges in its integration and operation when
incorporated into garments. Some devices will have to be stitched on and sealed
after the garment is made, while others use manufacturing methods that knit or
weave the garment and device simultaneously.

24.2.1 Coated Devices
In 2010, Yi Cui's research group at Stanford University reported on the fabrication
of dip-coated (or dyed) CNT textiles that showed excellent
performances as
supercapacitor electrodes (Hu et al., 2010). The group developed a water-based
ink mixed with surfactant and single-walled carbon nanotubes (SWCNTs) that
allowed the CNTs to conformally wrap around cotton fibers in commercially
available woven or knitted cotton fabrics. This conformal coating was highly conductive, and the group was able to demonstrate ~480 mF/cm2 of stored capacitance.
This particular study also used lithium hexafluorophosphate as their electrolyte,
which is corrosive and nonwearable. However, they later published a new study
that focused on using benign neutral sodium and lithium sulfate electrolytes (Pasta
et al., 2010), and loaded 0.4–2.2 mg/cm2 of SWCNT ink. This work sparked many
researchers to explore the incorporation of nanomaterials into textiles for wearable
supercapacitors.
In our initial work (Jost et al., 2011), rather than dyeing the fabric, we chose to
screen print carbon materials into cotton and polyester. Screen printing is a process where ink is pushed through a screen that has a shape masked into it (like a
stencil) and the resulting shape can be printed onto any surface, like printing logos
on a T-shirt. Screen printing is advantageous because it loaded larger quantities of carbon (~4.9 mg/cm2) into the fabrics compared to the same textiles being dip-coated (~0.4–2.2 mg/cm2) (Jost et al., 2011), since the screen printing ink was
more dense with carbon than the dip-coating solutions (0.2 vs. 0.01 g/mL, respectively). Cotton and polyester woven fabrics were chosen for screen printing because
they absorbed the most carbon material when dip-coated compared to other cotton
twill and nylon fabrics.
Some of the key challenges to much of the early work on textile energy storage
were toxicity, poor flexibility, and leakage. We chose to use activated carbon (AC)
instead of CNTs, graphene, or other nanoparticles, because it is well known to be
nontoxic (thus safe to wear), and is also the most commonly used electrode material
in the supercapacitor industry because of its low cost and high specific surface area.
Therefore, we could also draw direct comparisons between AC fabric supercapacitors and conventional AC film supercapacitors (Section 24.1.2).
Similarly to Pasta et al. (2010), our work utilized aqueous 1 M sodium sulfate
(Na2SO4) and 2 M lithium sulfate (Li2SO4) electrolytes. These were still liquid and
posed challenges to the wearability of the devices; later work in the field explored
the use of solid and gel electrolytes. One advantage the Yi Cui group (Hu et al.,
2010) had while using CNTs was high conductivity, which meant they did not need
to use metal current collectors. ACs are too resistive to stand on their own as both
electrode and current collector, so we adhered the fabric electrodes to stainless
steel sheets. Because conventional hard supercapacitors also use stainless steel or nickel, the textile electrodes' performance could be directly compared to conventional supercapacitors.


Scanning electron microscopy (SEM) was conducted to observe the morphology of the native textile as well as the screen printed AC within the textile's fibrous
structure. Textiles have a hierarchical structure, with the largest porosity within
the textile weave/knit structure, then between individual fibers within the yarns,
and finally in the fibers themselves (Figure 24.6a). We found AC to be well integrated into all levels of the fabric structure. The activated carbon also has multiple

ar

ga

rm

en
t

Woven fabric

Yarn

Porous carbon

Fiber

(a)

300 m

300 m
(b)

(c)

30 m
(d)

6 m
(e)

FIGURE 24.6 (a) Design concept of a porous textile supercapacitor integrated into a smart
garment, demonstrating porous carbon impregnation from the weave, to the yarn, to the
fibers. Below: SEM images of weaves and their corresponding fibers (b) polyester microfiber
twill weave before coating. (c) Cotton lawn plain weave before coating. (d) Polyester fiber
screen printed with activated carbon (Kuraray, Japan). (e) Cotton fiber screen printed with
activated carbon (Kuraray, Japan). (From Jost, K. etal., Energ. Environ. Sci., 4, 5060, 2011.)

649

Garment Devices

l evels of porosity between the large micron size particles and its ~1nm micropores.
This hierarchical structure allowed for the electrolyte to access the carbon material quickly, analogous to cars initially traveling on large highways, then smaller
streets, and finally adsorbing in the smallest pores, like a car parking in a driveway. Figure24.6b and c shows SEM micrographs of the native polyester microfiber
(b) and cotton lawn (c) fabrics, where this hierarchical spacing can be observed.
It can be seen from Figure 24.6d and e that the activated carbon is well coated
around individual polyester and cotton fibers.
FIGURE 24.7 Capacitance vs scan rate: (a) Gravimetric capacitance vs voltage obtained from cyclic voltammetry of cotton lawn tested in 1 M Na2SO4, at 10 and 100 mV/s. (b) Cyclic voltammogram of polyester microfiber tested in 1 M Na2SO4 shows more resistive behavior compared to cotton. (c) Normalized capacitance vs scan rate. (d) Cyclic voltammogram of a YP17 film tested in 1 M Na2SO4 under the same conditions as the textile electrodes. (From Jost, K. et al., Energ. Environ. Sci., 4, 5060, 2011.)

We found that screen printed electrodes performed better at faster scan rates compared to the standard AC films when tested under the same conditions (Figure 24.7). The textile devices stored ~0.25 F/cm2 and had ~4 mg of active material per cm2, which translated to a specific capacitance of ~88 F/g (Figure 24.7a and b). Cyclic voltammetry
was conducted for all samples in 1 M Na2SO4 at a range of scan rates from 10 to
100 mV/s. Figure 24.7c shows how the textile devices retain a higher capacitance at
increasing scan rates compared to the AC film, which loses almost 50% of its capacitance when the rate is increased from 1 to 100 mV/s. It is clear from the CVs (cyclic voltammograms) in Figure 24.7d that the AC film electrodes do not retain high capacitance when the scan rate is increased (reducing the time allotted for charging). The
textile devices have added porosity in their woven structures, while the AC film only
has spacing between AC particles, thus electrolyte cannot diffuse as quickly through
the films as the fabrics. We believe this is why the textiles perform better at faster scan
rates. However, the AC film still stores much more energy per area than the fabrics
because it has more AC per area.
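For reference, capacitance values such as these are extracted from cyclic voltammograms by integrating the current over the sweep: the charge is the integral of |I| with respect to voltage divided by the scan rate, and dividing by twice the voltage window (one full cycle sweeps the window twice) gives the capacitance; dividing further by the active mass gives F/g. The sketch below applies that arithmetic to a synthetic, ideal CV trace; it is not the analysis code used for the data above, and the mass value is hypothetical.

import numpy as np

def capacitance_from_cv(voltage, current, scan_rate):
    # Average capacitance (F) from one full CV cycle (forward + reverse sweep).
    # Charge passed ~ integral(|I| dt) = integral(|I| |dV|) / scan_rate; one cycle
    # sweeps the voltage window twice, so C = charge / (2 * window).
    window = voltage.max() - voltage.min()
    dv = np.abs(np.diff(voltage))
    i_mid = 0.5 * (np.abs(current[:-1]) + np.abs(current[1:]))  # trapezoidal rule
    charge = np.sum(i_mid * dv) / scan_rate
    return charge / (2.0 * window)

# Synthetic, idealized CV of a 1 F capacitor cycled over 0-0.8 V at 10 mV/s:
# an ideal capacitor draws |I| = C * scan_rate.
scan_rate = 0.010                                   # V/s
v_fwd = np.linspace(0.0, 0.8, 401)
voltage = np.concatenate([v_fwd, v_fwd[::-1]])
current = np.concatenate([np.full(401, 1.0 * scan_rate),
                          np.full(401, -1.0 * scan_rate)])

c_device = capacitance_from_cv(voltage, current, scan_rate)
mass_g = 0.010                                      # hypothetical 10 mg of active carbon
print(f"device capacitance ~{c_device:.2f} F, ~{c_device / mass_g:.0f} F/g of active material")

A resistive electrode distorts the ideal rectangular loop at high scan rates, which reduces the integrated area and hence the apparent capacitance, which is exactly the rate-dependent loss seen for the AC film in Figure 24.7.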
This technique is viable for applications that require printing the electronics onto
surfaces, and certainly has shown good performance. However, this system still
incorporates solid steel current collectors and a liquid electrolyte. Therefore, new
material systems were explored in a later paper (Jost et al., 2013) and will be outlined
in Sections 24.2.2 and 24.2.3. Since this work was published, other works explored
similar methods for incorporating graphene materials into cotton textiles (Li et al., 2013; Liu et al., 2012a).

24.2.2 Knitted Carbon Fiber Electrodes


In 2013, we combined screen printing with machine knitting to produce a porous
carbon fiber (CF) electrode. Moreover, because the textile was knitted with conductive CF, the entire supercapacitor system was made of active materials, as opposed to using coated cotton or polyester that did not contribute to the capacitance. Additionally, the CF was much lighter and more flexible than the steel current collectors, making it more suitable for wearable applications. The knitted CF was also more porous than the woven cotton or polyester textile, allowing for more AC to be incorporated into the structure and hence resulting in more energy storage per area. The CF was knitted into small 3 cm × 3 cm squares surrounded by inactive wool fabric (Figure 24.8)
produced on a Shima Seiki SSG-122SV 12 gauge knitting machine. For comparison
with the knitted CF, woven CF was also screen printed with AC ink. All samples
were screen printed with the same AC ink as in our previous work (Jost etal., 2011).
Cotton and polyester fabrics were observed to have ~4 mg/cm2 of activated carbon
mass; woven CF was shown to have 6 mg/cm2; and lastly the knitted CF was loaded
with 12 mg/cm2. Standard AC films for supercapacitor electrodes are typically made
with ~15 mg/cm2 activated carbon mass, meaning these new knit devices were likely
to have comparable performances and capacitances to a conventional device while
still being flexible and wearable.
When electrochemically tested, capacitances were measured as high as 510 mF/cm2
(10 mV/s and ~0.4 A/g), which is very comparable to an AC film electrode tested in
the same gel electrolyte, having 660 mF/cm2. For textile devices, this is one of the
highest reported electrode mass loadings. The main limitation to the textile devices
reported in this work is that the combination of a slow-diffusing gel electrolyte with a microporous activated carbon results in devices that charge more slowly
compared to conventional supercapacitors. This, however, is not a limitation of the textile, but of the active carbon material and the available nontoxic/nonliquid electrolytes. Both can be optimized for garment device applications.

FIGURE 24.8 (a) Continuous length of knitted carbon fiber squares in green wool, (b) carbon fiber square coming out of the knitting machine, (c) close-up of carbon fiber electrode screen printed with activated carbon ink, (d) testing setup of layered fabric structure coated in a polymer electrolyte and an image of the assembled device, (e) cyclic voltammograms of knitted vs woven CF tested at 5 mV/s, (f) galvanostatic curves tested at 0.4 A/g showing knitted CF vs woven CF, both screen printed with the same AC ink. The knitted CF has a higher mass loading due to its porous structure, while AC on the woven CF sat more on top of the textile structure. (From Jost, K. et al., Energ. Environ. Sci., 6, 2698, 2013.)
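The mF/cm2 figures quoted above are obtained from galvanostatic curves like those in Figure 24.8f: the capacitance is the applied current multiplied by the discharge time and divided by the voltage change, then normalized by the electrode area. The snippet below shows that arithmetic with assumed numbers (the current, time, voltage drop, and area are all hypothetical and merely chosen to land in the same range as the devices discussed); it is not the measurement itself, and in practice the initial IR drop is usually excluded before taking the slope.

def areal_capacitance_mF_per_cm2(current_A, discharge_time_s, voltage_drop_V, area_cm2):
    # C = I * dt / dV for a (nearly) linear galvanostatic discharge, reported per unit area.
    capacitance_F = current_A * discharge_time_s / voltage_drop_V
    return 1000.0 * capacitance_F / area_cm2

# Hypothetical example: a 3 cm x 3 cm knitted CF electrode (9 cm^2) discharged at a
# constant 43 mA (roughly 0.4 A/g for ~108 mg of loaded carbon), falling 0.85 V in 90 s.
print(f"~{areal_capacitance_mF_per_cm2(0.043, 90.0, 0.85, 9.0):.0f} mF/cm2")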
At present, polyvinyl alcohol (PVA)-based gel electrolytes (Gao and Lian, 2011)
are the most conductive, compared to ion-conducting polymers like polyphenylene oxide (PPO) and polyethylene oxide (PEO). The PVA gel electrolyte has water
trapped in the polymer matrix, and ions travel through the water, much like in a typical aqueous electrolyte. For this reason, many different kinds of aqueous electrolytes
can be incorporated into gels as long as they are nonreactive with the PVA.
The advantage of knitting some of the conductive elements in a garment device
is that the knitting program can be finalized and immediately sent to a factory to be knitted. Small components like these square patches, shown in Figure 24.8, can also be
modified and incorporated into a garment for manufacturing.
This system still requires two electrodes being stacked one on top of the other,
and would likely have applications in outerwear garments that also have multiple
layers of fabric, as compared to a fine knit T-shirt. However, this paper is one
of only a few to manufacture a custom textile for an electrochemical application.
A majority of battery researchers do not have access to industrial knitting or weaving equipment; therefore, many began to work on developing yarn and fiber supercapacitors (Fu et al., 2012, 2013; Kwon et al., 2012; Le et al., 2013; Li et al., 2013; Meng et al., 2013).

24.2.3 Fibers and Yarns for Energy Storage


Fibers and yarns are the fundamental components that are stitched, knitted, or
woven together to form full fabrics. This means that energy-storing fibers and yarns can be processed like conventional yarns into full fabrics and garments. Moreover, textiles can be designed
to incorporate a desired amount of energy storing yarn, meaning garment devices
with specified energy density and resistance can be easily fabricated, with little to
no postprocessing.


Some of the first reports on fiber- or yarn-like capacitors and batteries began
to appear in 2011 from Z.L. Wang's group at the Georgia Institute of Technology,
Atlanta, GA, where nanowires were grown on Kevlar to act as pseudocapacitive fiber
electrodes (Bae et al., 2011). The electrodes were twisted around each other (similar
to Figure 24.5f) and used a gel electrolyte to keep them electrically separated from
each other. The authors reported capacitances on par with micro-supercapacitors
used to power on-chip circuit components. This group has also reported nanowire-based energy harvesting fibers that can be used in textiles (Wang and Wu, 2012).
Many more papers have since appeared in the literature demonstrating a variety
of graphene (Carretero-González et al., 2012; Cheng et al., 2014; Lee et al., 2013), graphite (Fu et al., 2012), CNT (Wang et al., 2013), and pseudocapacitive (Bae et al., 2011; Fu et al., 2013) yarns with comparable performances to commercially available
capacitors or batteries. Graphene yarns in particular are increasingly popular because
they are highly conductive (no need for an additional metal current collector) while
also having high surface area to promote high capacitance. Work from Prof. Wallace
(Aboutalebi et al., 2014) and Prof. Baughman (Lee et al., 2013) has demonstrated very
high conductivity as well as high volumetric capacitance for graphene yarns. Wallace
and other groups have also explored the method of wet-spinning these graphene
fibers, where they produce large quantities of material (Meng et al., 2013). They have also conducted extensive flexibility testing with resulting electrochemical data. It has
generally been found from graphite, graphene, and CNT yarns/fibers, which are
continuous strands of materials, that there is little difference in the electrochemical
performance when bent or deformed. Systems that use larger micron-sized particles
show more variation as the particles may be displaced with movement, potentially
breaking conductive bonds. However, bending does not seem to have a significant
effect on the charge storing mechanisms. Recently, new works have demonstrated stretchable fibers and yarns (Ren et al., 2014; Yang et al., 2013).
Because supercapacitors and batteries must incorporate an electrode, current collector, separator, and electrolyte into their systems, researchers have demonstrated different geometries for fiber and yarn ECs. A single fiber can be coated in multiple layers of electrode, current collector, etc., forming a coaxial style EC (Meng et al., 2013; Yang et al., 2013) as seen in Figure 24.9a through c. However, sometimes two
electrode fibers are plied/twisted together and separated by a membrane or solid
electrolyte (Figure 24.9d). Any of these fiber geometries can be modified for asymmetric capacitors or batteries, where one electrode is much larger than the other.
However, much like insulated wire, these pose challenges to connecting components
when embedded into fabrics, since the end would have to be stripped and soldered
after the fabric is constructed. Therefore, developing connectors or other methods
that do not completely insulate the components is a field that requires more research.
Lastly, some strips of electrode material were layered into a full EC geometry and
scrolled along their length into a yarn (Gorgutsa et al., 2012). This structure is actually similar to rolled electrodes in commercially available supercapacitors.
One of the main challenges that researchers are yet to fully address is the conductance along lengths of yarns. Since the energy stored will be proportional to the
length, more length may be desirable. However, the longer the supercapacitor (or any
wire), the more resistive it becomes. Eventually, that resistance will become too high to allow for functional operation of the yarn supercapacitor or battery. Therefore, fully insulated ECs may not be ideal for preprocessing into fabrics. As has been previously reported (Li et al., 2010), knitting multiple rows of conductive yarn can act like wiring resistors in parallel with each other. Insulating the ECs after knitting or weaving therefore means multiple rows of electrode yarns can be knitted together to tune the resistance and capacitance accordingly.

FIGURE 24.9 Cross-sectional diagrams of electrode configurations in yarns. (a) Core-shell electrode and electrolyte, where the electrode is conductive enough to also act as the current collector, (b) single electrode with a current-collecting core, (c) full coaxial style supercapacitor in a single fiber, and (d) two electrodes where the current collector and electrode yarns are twisted together and then insulated in electrolyte.
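The parallel-resistor behavior noted above can be worked through directly: if each knitted row of electrode yarn contributes a resistance and a capacitance proportional to its length, connecting n identical rows in parallel divides the resistance by n and multiplies the capacitance by n. The sketch below uses assumed per-length values (loosely of the same order as the mF/cm-scale yarns discussed in this section) purely to illustrate the trade-off; it does not describe a reported device.

# Illustration of tuning a knitted patch by wiring rows of electrode yarn in parallel.
# Per-length properties of a single electrode yarn (assumed, for illustration only):
R_PER_CM = 2.0        # ohm per cm of yarn
C_PER_CM = 0.010      # F per cm of yarn (10 mF/cm)
ROW_LENGTH_CM = 10.0  # length of each knitted row

def patch_properties(n_rows):
    # Each row is a series run of yarn; identical rows in parallel: R divides, C adds.
    r_row = R_PER_CM * ROW_LENGTH_CM
    c_row = C_PER_CM * ROW_LENGTH_CM
    return r_row / n_rows, c_row * n_rows

for n in (1, 5, 20):
    r, c = patch_properties(n)
    print(f"{n:2d} rows: ~{r:4.1f} ohm equivalent series resistance, ~{c:3.1f} F total")

Under these assumptions, adding rows simultaneously lowers the series resistance and raises the total capacitance, which is why leaving the yarns uninsulated until after knitting gives the designer a straightforward knob for hitting a target resistance and capacitance.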
In addition to addressing resistive networks, few researchers have processed their
EC yarns into fabrics, and therefore it is not known if they are mechanically strong
enough to withstand industrial knitting or weaving. However, most of the devices
reported in the literature still have low mass loadings, and in some cases even if plied
together, a few hundred strands of yarn may be needed to have the same capacitance
as a conventional supercapacitor per area.
Natural fiber welded electrode yarns for knitted textile supercapacitors: The
next step in our research was to develop carbon embedded yarns capable of being
processed on industrial knitting equipment. In collaboration with Prof. Trulove,
commercially available and knittable yarns were embedded with activated carbon
through a process called natural fiber welding (NFW) (Haverhals et al., 2010, 2012).
Unlike typical coating techniques (i.e., dip coating, screen printing, chemical vapor
deposition, etc.), NFW swells cellulose microfibrils to soften the fibers during the
coating process, and subsequently carbon particles can embed themselves in the
swelled polymer. Then the cellulose is reconstituted in a water bath, which permanently traps the carbon particles in the fibers. This process can also produce large
quantities of yarn (up to 200 ft) at a time on an in-house setup. In addition to producing large quantities of yarn, the main goal of this work was to demonstrate supercapacitive yarns knitted into full textile devices. Therefore, different modifications
of the welded yarns were processed on an industrial knitting machine to determine
their viability for full fabric production (Jost et al., 2014b).
Cotton, linen, bamboo, and viscose yarns were all subjected to NFW with activated carbon; all had an increase in strength but a reduction in elasticity, as shown
by tensile testing. However, only the cotton yarn was unable to be knitted. We determined that the cotton fibers were shorter than the linen, bamboo, and viscose fibers,
and in the 12 gauge needle loops, the cotton fibers separated resulting in a break in
the yarn. When the same force is applied to longer fibers, the strength of the polymer
plays a more significant role in preventing breakage.
All yarns were electrochemically tested in geometries similar to Figure 24.9d prior
to knitting. Initially, cotton yarns NFW with AC were twisted with steel yarn to increase
the conductivity, and resulted in yarns having capacitance up to 8 mF/cm, which was
much higher than previously reported works, which were typically around 0.5 mF/cm. However,
when the plain cotton and steel yarns were twisted prior to NFW with AC, the mass
loading increased from ~0.3 to 0.6 mg/cm, and resulted in yarns with capacitance as
high as 37 mF/cm. This is currently the highest reported capacitance per length for any
carbon-based yarns. Only batteries and redox active yarns surpass this value (Kwon
et al., 2012; Liu et al., 2013).
Upon discovering the exceptional properties of these electrode yarns, we proceeded to knit them into stripes. We discovered very quickly that the cotton yarns were too brittle to withstand industrial knitting. Multiple welding modifications were explored to make the yarns softer, but they did not knit without breaking apart completely. It was at this point that linen, bamboo, and viscose yarns were also explored as electrodes and were NFW with AC. The viscose did not hold as much material and only had a capacitance of ~1.5 mF/cm. The linen and bamboo were on par with the cotton yarns, having a capacitance of ~8–11 mF/cm. After twisting them with steel yarn, they were processed on the knitting machine. The linen, bamboo, and viscose fibers were all significantly longer and therefore, unlike the shorter cotton fibers, were not pulled apart during the knitting process. More work is ongoing to characterize the performance of the yarns knitted into fabrics. It is clear that researchers are beginning to transform their novel yarn materials into fully woven or knitted fabric devices (Gaikwad et al., 2012; Soin et al., 2014; Zhou et al., 2014).

24.2.4 Custom Textile Architectures for Supercapacitors


Some works have begun to build woven or knitted fabrics that, by the nature of their structure and materials selection, may store energy. One of the first examples found in the literature is from M. Skorobogatiy (Gorgutsa et al., 2012; Liu et al., 2012b). In two works, the group initially wove capacitive fibers into fabric to act as a fabric touch sensor, similar to the capacitive touch sensing found in many smartphones or laptops. The group later moved on to developing battery strips (Figure 24.10a) capable of being hand woven (Figure 24.10b).

[Figure 24.10 images; in-figure labels include conductive textile electrode, battery stripe, conductive threads, Ag/PA66 (nylon), PVDF spacer, and a 3 mm scale bar.]
FIGURE 24.10 (a) Depicts coated strips of battery electrode fabric, demonstrating their
flexibility and stretchability, (b) depicts a hand woven structure where lengths of these strips
are woven alongside conductive yarns (red and blue) to form a fully functional battery fabric,
and (c) depicts a 3D knitted spacer fabric with piezoelectric properties.

In the woven structure, the red and blue yarns are conductive, and the battery strips are woven in between them to provide the positive and negative connections.
Our work on knitted and screen printed supercapacitors (Jost et al., 2013)
also used knitting to build CF current collectors in specific geometries (Figure
24.8). We also demonstrated the ability to incorporate these into full garments.
Continuing with knitting, other researchers have also knitted piezoelectric spacer
fabrics (Soin et al., 2014), where the top layer of this three-layered fabric is a positive terminal, the bottom is negative, and the interlacing spacer yarn in between is the piezoelectric polyvinylidene fluoride (PVDF) monofilament yarn. The authors made use of the textile structure to create functional layers in a single piece of fabric. This kind of creativity is an excellent example of where the field is heading. Moreover, for energy storage, methods that can scale up quickly and easily are essential to the success of energy-storing textiles. Fabrics that can provide structural solutions to the arrangement of specialized materials can be integrated into garments with greater ease, and can be carefully designed to have the desired energy and power densities. In the future, we can expect to see materials and devices being manufactured more like traditional fabrics. With this approach, electronic fabrics are likely to become visually indistinguishable from everyday textiles.

24.3 CONCLUSIONS
Textile energy storage is an exciting field of research with much promise; however, even as wearable electronics have begun to appear on the market, it remains an underdeveloped component. Understanding the design parameters for both fabrics and energy storage devices is crucial to pushing the field forward and finding creative solutions for integrating energy into textiles. Many challenges remain, including finding nontoxic options for battery electrolytes, further increasing the energy density of fabric batteries and supercapacitors, and integrating these devices into garments using scalable and cost-effective techniques. However, it is clear that the field is growing quickly, with creative and competitive solutions appearing regularly in the literature. A seamless solution is bound to appear.

ACKNOWLEDGMENTS
The authors thank Dr. Paul C. Trulove and colleagues at the U.S. Naval Academy,
Department of Chemistry. K. Jost recognizes support from the Department of
Defense National Science and Engineering Graduate Fellowship (DoD-NDSEG).

REFERENCES
Aboutalebi, S. H., Jalili, R., Esrafilzadeh, D., Salari, M., Gholamvand, Z., Yamini, S. A. et al. (2014). High-performance multifunctional graphene yarns: Toward wearable all-carbon energy storage textiles. ACS Nano, 8(3), 2456–2466.
Bae, J., Song, M. K., Park, Y. J., Kim, J. M., Liu, M. L., and Wang, Z. L. (2011). Fiber supercapacitors made of nanowire-fiber hybrid structures for wearable/flexible energy storage. Angewandte Chemie International Edition, 50(7), 1683–1687.
Carretero-González, J., Castillo-Martínez, E., Dias-Lima, M., Acik, M., Rogers, D. M., Sovich, J. et al. (2012). Oriented graphene nanoribbon yarn and sheet from aligned multi-walled carbon nanotube sheets. Advanced Materials, 24, 5695–5701.
Cheng, H., Hu, C., Zhao, Y., and Qu, L. (2014). Graphene fiber: A new material platform for unique applications. Asia Materials, 6, e113.
Chmiola, J., Yushin, G., Gogotsi, Y., Portet, C., Simon, P., and Taberna, P. L. (2006). Anomalous increase in carbon capacitance at pore sizes less than 1 nanometer. Science, 313(5794), 1760–1763.

Dion, G. (2013). Garment device: Challenges to fabrication of wearable technology. Proceedings of the Eighth International Conference on Body Area Networks, Boston, MA.
Dyatkin, B., Presser, V., Heon, M., Lukatskaya, M. R., Beidaghi, M., and Gogotsi, Y. (2013). Development of a green supercapacitor composed entirely of environmentally friendly materials. ChemSusChem, 6, 2269–2280.
Fu, Y., Cai, X., Wu, H., Lv, Z., Hou, S., Peng, M. et al. (2012). Fiber supercapacitors utilizing pen ink for flexible/wearable energy storage. Advanced Materials, 24(42), 5713–5718.
Fu, Y., Wu, H., Ye, S., Cai, X., Yu, X., Hou, S. et al. (2013). Integrated power fiber for energy conversion and storage. Energy & Environmental Science, 6, 805–812.
Gaikwad, A. M., Zamarayeva, A. M., Rousseau, J., Chu, H. W., Derin, I., and Steingart, D. A. (2012). Highly stretchable alkaline batteries based on an embedded conductive fabric. Advanced Materials, 24(37), 5071–5076.
Gao, H. and Lian, K. (2011). High rate all-solid electrochemical capacitors using proton conducting polymer electrolytes. Journal of Power Sources, 196(20), 8855–8857.
Gogotsi, Y. (2003). How safe are nanotubes and other nanofilaments? Materials Research Innovations, 7(4), 192–194.
Gogotsi, Y. and Simon, P. (2011). True performance metrics in electrochemical energy storage. Science, 334(6058), 917–918.
Gorgutsa, S., Gu, J. F., and Skorobogatiy, M. (2012). A woven 2D touchpad sensor and a 1D slide sensor using soft capacitor fibers. Smart Materials and Structures, 21(1), 015010.
Haverhals, L. M., Reichert, W. M., De Long, H. C., and Trulove, P. C. (2010). Natural fiber welding. Macromolecular Materials and Engineering, 295(5), 425–430.
Haverhals, L. M., Sulpizio, H. M., Fayos, Z. A., Trulove, M. A., Reichert, W. M., Foley, M. P. et al. (2012). Process variables that control natural fiber welding: Time, temperature, and amount of ionic liquid. Cellulose, 19(1), 13–22.
Hu, L. B., Pasta, M., La Mantia, F., Cui, L. F., Jeong, S., Deshazer, H. D. et al. (2010). Stretchable, porous, and conductive energy textiles. Nano Letters, 10(2), 708–714.
Jost, K., Dion, G., and Gogotsi, Y. (2014a). Textile energy storage in perspective. Journal of Materials Chemistry A, 2, 10776–10787.
Jost, K., Durkin, D. P., Haverhals, L. M., Brown, E. K., Langenstein, M., De Long, H. C. et al. (2014b). Natural fiber welded electrode yarns for knitted textile supercapacitors. Advanced Energy Materials, in press.
Jost, K., Perez, C. R., McDonough, J. K., Presser, V., Heon, M., Dion, G. et al. (2011). Carbon coated textiles for flexible energy storage. Energy and Environmental Science, 4, 5060–5067.
Jost, K., Stenger, D., Perez, C. R., McDonough, J. K., Lian, K., Gogotsi, Y. et al. (2013). Knitted and screen printed carbon-fiber supercapacitors for applications in wearable electronics. Energy & Environmental Science, 6, 2698–2705.
Kirsch, N. J., Vacirca, N. A., Plowman, E. E., Kurzweg, T. P., Fontecchio, A. K., and Dandekar, K. R. (2009). Optically transparent conductive polymer RFID meandering dipole antenna. Paper presented at the 2009 IEEE International Conference on RFID, Orlando, FL.
Kwon, Y. H., Woo, S.-W., Jung, H.-R., Yu, H. K., Kim, K., Oh, B. H. et al. (2012). Cable-type flexible lithium ion battery based on hollow multi-helix electrodes. Advanced Materials, 24(38), 5192–5197.
Le, V. T., Kim, H., Ghosh, A., Kim, J., Chang, J., Vu, Q. A. et al. (2013). Coaxial fiber supercapacitor using all-carbon material electrodes. ACS Nano, 7, 5940–5947.
Lee, J. A., Shin, M. K., Kim, S. H., Cho, H. U., Spinks, G. M., Wallace, G. G. et al. (2013). Ultrafast charge and discharge biscrolled yarn supercapacitors for textiles and microdevices. Nature Communications, 4, 1970.
Li, L., Au, W. M., Wan, K. M., Wan, S. H., Chung, W. Y., and Wong, K. S. (2010). A resistive network model for conductive knitting stitches. Textile Research Journal, 80(10), 935–947.

Li, X., Zang, X., Li, Z., Li, X., Li, P., Sun, P. et al. (2013). Large-area flexible core-shell graphene/porous carbon woven fabric films for fiber supercapacitor electrodes. Advanced Functional Materials, 23, 4862–4869.
Lin, R., Huang, P., Ségalini, J., Largeot, C., Taberna, P. L., Chmiola, J., Gogotsi, Y., and Simon, P. (2009). Solvent effect on the ion adsorption from ionic liquid electrolyte into sub-nanometer carbon pores. Electrochimica Acta, 54, 7025–7032.
Liu, N., Ma, W., Tao, J., Zhang, X., Su, J., Li, L. et al. (2013). Cable-type supercapacitors of three-dimensional cotton thread based multi-grade nanostructures for wearable energy storage. Advanced Materials, 25, 4925–4931.
Liu, W. W., Yan, X. B., Lang, J. W., Peng, C., and Xue, Q. J. (2012a). Flexible and conductive nanocomposite electrode based on graphene sheets and cotton cloth for supercapacitor. Journal of Materials Chemistry, 22(33), 17245–17253.
Liu, Y., Gorgutsa, S., Santato, C., and Skorobogatiy, M. (2012b). Flexible, solid electrolyte-based lithium battery composed of LiFePO4 cathode and Li4Ti5O12 anode for applications in smart textiles. Journal of the Electrochemical Society, 159(4), A349–A356.
Meng, Y., Zhao, Y., Hu, C., Cheng, H., Hu, Y., Zhang, Z. et al. (2013). All-graphene core-sheath microfibers for all-solid-state, stretchable fibriform supercapacitors and wearable electronic textiles. Advanced Materials, 25, 2326–2331.
Pasta, M., La Mantia, F., Hu, L. B., Deshazer, H. D., and Cui, Y. (2010). Aqueous supercapacitors on conductive cotton. Nano Research, 3(6), 452–458.
Quinn, B. (2010). Textile Futures. Oxford, U.K.: Berg.
Ren, J., Zhang, Y., Bai, W., Chen, X., Zhang, Z., Fang, X. et al. (2014). Elastic and wearable wire-shaped lithium-ion battery with high electrochemical performance. Angewandte Chemie International Edition, 126(30), 7998–8003.
Seymour, S. (2008). Fashionable Technology: The Intersection of Design, Fashion, Science, and Technology. New York: Springer.
Seymour, S. (2010). Functional Aesthetics: Visions in Fashionable Technology. New York: Springer.
Sim, C. Y. D., Tseng, C. W., and Leu, H. J. (2012). Embroidered wearable antenna for ultrawideband applications. Microwave and Optical Technology Letters, 54(11), 2597–2600.
Simon, P. and Gogotsi, Y. (2008). Materials for electrochemical capacitors. Nature Materials, 7(11), 845–854.
Simon, P., Gogotsi, Y., and Dunn, B. (2014). Where do batteries end and supercapacitors begin? Science, 343(6176), 1210–1211.
Soin, N., Shah, T. H., Anand, S. C., Geng, J. F., Pornwannachai, W., Mandal, P. et al. (2014). Novel 3-D spacer all fibre piezoelectric textiles for energy harvesting applications. Energy & Environmental Science, 7(5), 1670–1679.
Taberna, P. L., Simon, P., and Fauvarque, J. F. (2003). Electrochemical characteristics and impedance spectroscopy studies of carbon-carbon supercapacitors. Journal of the Electrochemical Society, 150(3), A292–A300.
Wang, K., Meng, Q., Zhang, Y., Wei, Z., and Miao, M. (2013). High-performance two-ply yarn supercapacitors based on carbon nanotubes and polyaniline nanowire arrays. Advanced Materials, 25, 1494–1498.
Wang, Z. L. and Wu, W. Z. (2012). Nanotechnology-enabled energy harvesting for self-powered micro-/nanosystems. Angewandte Chemie International Edition, 51(47), 11700–11721.
Yang, Z., Deng, J., Chen, X., Ren, J., and Peng, H. (2013). A highly stretchable, fiber-shaped supercapacitor. Angewandte Chemie International Edition, 52, 13453–13457.
Zhou, T., Zhang, C., Han, C., Fan, F., Tang, W., and Wang, Z. L. (2014). Woven structured triboelectric nanogenerator for wearable devices. ACS Applied Materials & Interfaces, 6, 14695–14701.

25

Collaboration with
Wearable Computers
Mark Billinghurst, Carolin Reichherzer,
and Allaeddin Nassani

CONTENTS
25.1 Introduction................................................................................................... 661
25.2 Related Work................................................................................................. 662
25.2.1 Collaborative Wearable Systems....................................................... 663
25.2.2 Communication Theory.....................................................................666
25.2.3 Environment Capture......................................................................... 667
25.2.4 Summary...........................................................................................668
25.3 Social Panoramas.......................................................................................... 670
25.3.1 Prototype System............................................................................... 671
25.3.1.1 Panorama Viewing.............................................................. 671
25.3.1.2 Remote Awareness.............................................................. 672
25.3.1.3 User Interaction................................................................... 673
25.3.1.4 Networking......................................................................... 673
25.3.1.5 User Experience.................................................................. 674
25.4 Pilot Study..................................................................................................... 674
25.5 Conclusion..................................................................................................... 676
25.5.1 Future Work....................................................................................... 677
References............................................................................................................... 677

25.1 INTRODUCTION
Since the first days of wearable computers in the 1970s, research in the field has largely been on how wearable systems can enhance a single user's interaction with the world around them. In an early definition, Mann stated that a wearable computer was a computer that is "Ephemeral, Eudaemonic, and Existential," or always on, part of the user, and under the user's complete control (Mann 1997). In this case, Mann was focusing on the potential of wearable computers for enhancing personal imaging. Similar definitions of wearable computers (e.g., Rhodes 1997, Starner 1999) focused on the benefits that wearables could provide to the individual user and on mediating their interaction with the real world. For example, the Remembrance Agent
(Rhodes 1997) demonstrated how a wearable system could supplement the user's own memory and data retrieval, while the Touring Machine (Feiner et al. 1997) was a wearable computer that provided an Augmented Reality experience to show virtual information in place.
However, wearable computers can also be used to support remote collaboration. For example, in the CamNet system (British Telecom 1993) a wearable
computer combined with a head-worn camera and display was able to transmit
live video, audio, and still images from an ambulance worker to a doctor waiting at the hospital. Similarly, the Netman wearable computer allowed a field technician to stream video and sensor data (IR, network monitoring) back to a remote expert, enabling them to monitor network status (Bauer et al. 1998). Work has also been done showing how wearable computers can support remote 3D manipulation tasks and provide an increased sense of remote awareness.
Despite these early projects, there is a need to conduct more research on how
wearable computers can be used to enhance collaboration. In 2001, Starner listed
eight important topics for future work in wearable computers, and highlighted
collaboration as one of those areas (Starner 2001). He identified three projects as
typical of what should be researched for collaborative applications: (1) live video view-sharing and remote technical assistance (Kraut et al. 1996); (2) body-stabilized information display for 3D collaboration (Billinghurst et al. 1998); and (3) wearable agents acting on behalf of their owners (Kortuem et al. 1999b). While there has been some research carried out on remote expert assistance, there has been less work done on information displays and wearable agents. Modern wearable computers have more processing, input, and output capabilities than those of 10 years ago and so could
be used to develop a far wider range of collaborative experiences that move beyond
these initial application areas.
In this chapter, we review previous research on collaboration with wearable computers, discuss current work in the area, and identify areas of future research. Many
of the demonstrated collaborative wearable systems are fairly straightforward extensions of remote desktop applications, so we are particularly interested in wearable
applications that permit new types of collaboration and create an increased sense of
remote presence. Unlike earlier systems that were mostly focused on collaboration
in professional settings, we are also interested in the use of wearable computers for
social collaboration. To demonstrate the types of systems that might be possible, in
the second half of the chapter we describe a prototype application we have developed
called Social Panoramas: a new type of wearable interface that enables sharing of
social spaces. Finally, we conclude this work with suggestions for future research
and other types of wearable interfaces that could be explored.

25.2 RELATED WORK


In our research, we are interested in developing new types of collaborative wearable
interfaces, and so our work is influenced by earlier research in collaborative wearable systems, communication theory, and environment capture. In this section we
review each of these topics in turn and identify possible research gaps.

25.2.1 Collaborative Wearable Systems


Despite most research in wearable computing being focused on enhancing individual
performance, there has been some research on capturing and sharing experiences with
wearables. One of the first was the CamNet system developed by British Telecom in
1993 (see Figure 25.1), which combined a head-worn display with camera, audio, and
networking. CamNet allowed an emergency services worker to connect to a remote
doctor from the scene of an accident and be guided on what to do. The doctor was able
to use a mouse to point to portions of medical images that were shown in the head-mounted display (HMD), while viewing video from the accident site. Although very
early, this work showed that being able to share voice, still imagery, video, audio, and
a shared pointer may be sufficient for many remote collaboration tasks.
Mann's WearCam and WearComp systems (Mann 1997) allowed a user to stream
live video to a remote collaborator, and to capture and stitch together panoramas of
their surroundings. Mann produced systems that supported two-way communication
between a user and a remote person, allowing them to share voice and video along with simple on-screen text message cues (Mann 2000). He also explored one-way broadcasting, and at one point was live streaming the view from his head-worn
cameras on the Internet, allowing anyone who visited the WearCam website to see
his current activities.
Kraut et al. (1996) and others have explored how head-mounted cameras, HMDs, and annotations on live video and still images can be used to enhance remote collaboration.

FIGURE 25.1 CamNet remote collaboration hardware. (From British Telecom, CamNet Promotional Video, 1993.)

They found that using a wearable computer to connect to a remote expert significantly reduced the time to complete a task. However, there was no
difference in task performance between remote collaboration conditions that used
audio only, or audio and video, but the different technology used had a significant
effect on the communication style. They completed a number of other rigorous user
studies that showed how remote pointing and drawing could assist in collaboration
(Fussell et al. 2000). Their studies showed how important it was for the wearable
computer user to be able to share video and audio with a remote collaborator, not
just video alone. The ability to annotate on the video or display a shared pointer
also significantly improved communication and shared awareness. Many of the early
collaborative wearable systems had the ability to share video, audio, and graphical annotations (Drugge et al. 2004, Hestnes et al. 2001, Kortuem et al. 1999a).
Researchers explored how these could be used in a number of fields such as equipment maintenance, health care, military, etc.
In most of these systems, the remote user's view is limited to the live camera feed from the head-mounted camera. This means that they can effectively see through the remote user's eyes, enabling the remote expert to clearly understand what the wearable computer user is looking at and the task they are performing. However, this fixed viewpoint can also reduce awareness of the wearable user's surroundings: the remote user isn't able to independently turn their remote view around to gain greater situational awareness.
This is an example of the communication asymmetries in collaborative wearable interfaces (Billinghurst et al. 1999) that can affect communication. They identify three types of asymmetries:
1. Implementation: The physical properties of the interface are not identical (e.g., different resolutions in display modes).
2. Functional: An imbalance in the functions (e.g., one user sharing video, the other not).
3. Social: The ability of people to communicate is different (e.g., only one person sees the face of the other).
The use of a head-worn camera introduces a functional asymmetry in that the wearable user can see a wide field of view of the real world, and is able to turn their head
to see different viewpoints, while the remote user only sees the world through the
camera, and their viewpoint is fixed to that of the wearable user.
TAC (Bottecchia et al. 2010) addresses this problem by using wider field-of-view cameras to support greater situational awareness. The TAC system is a wearable interface designed for remote collaboration on industrial maintenance tasks. The system aims to create a shared visual view by including views from two cameras: (1) a wide-angle normal camera and (2) a narrow-angle (30°) orthoscopic camera that matches the field of view of the HMD. The use of the orthoscopic camera on the HMD creates a system called MOVST (monocular orthoscopic video see-through). The remote user is able to freely swap between the inputs of the two cameras depending on whether they would like to see exactly what the user is seeing in their HMD, or have a wider view of the user's task space.

[Figure 25.2 photographs; panel (a) labels: camera, motion sensor, HMD, headset (headphone, microphone); panel (b) labels: WACL, camera, laser pointer.]

FIGURE 25.2 Comparing an HMD camera (a) to the WACL hardware (b) with shoulder-worn robotic camera and laser pointer.

Another approach for providing greater situational awareness is to allow the remote
user to have independent control of the camera viewpoint. Mayol et al. (2000) developed
a similar remote collaboration system with a wearable camera on a pan/tilt mechanism
that allows the remote expert to control its orientation. The remote expert receives a
live stream from the camera and is able to set his or her own independent viewpoint
with the camera. They also used computer vision to achieve an image-stabilized view,
regardless of how the wearable computer user was moving. In user studies they found
that the highest score for the combination of steadiness and field of view was gained
by wearing the camera on the shoulder instead of the head, hand, or chest.
The wearable active camera with laser pointer (WACL) (Kurata et al. 2004)
also used a shoulder-mounted system that the worker would wear (see Figure 25.2).
However, in this case a laser pointer is combined with a camera, enabling the remote
expert to point at locations and objects. Thus, the remote user was able to look around
the worker's environment independently of the worker's viewpoint and highlight real
objects in their workspace. A user study compared WACL to a fixed head-mounted
camera system and found that there was no difference in performance on an object
search and assembly task, but that users reported that it was more comfortable, less
tiring, and more eye-friendly to wear.
WACL used a laser pointer to convey gesture information to the real world. Other
systems have explored a wide range of different methods to convey remote gesture
cues. For example, GestureCam (Kuzuoka et al. 1994) allows a remote person to
control a robot at the local site that can execute pointing instructions with a laser
pointer. By pointing on the live camera feed, the remote expert is able to move the
robot pointer in the worker's environment to indicate an area of interest.
In the drawing over video environment (DOVE) (Ou et al. 2003), hand gestures
are shown inside the video stream of the remote helper and on a display to the local
worker to facilitate remote gesturing. Similarly, Kirk and Fraser (2005) built a system
where the hands of the helper were sent via a video stream to the local worker and
projected onto his desk. Poelman et al. (2012) describe a system to support collaboration at a crime scene, where investigators at the scene are remotely supported by experts. The colleagues at the scene wear an HMD with stereo cameras, which maps the entire scene in 3D, gives the expert user the ability to have an independent viewpoint, and provides a shared augmented space. Additionally, the local user's hand gestures are tracked and streamed to the remote user. The lack of presence, however, made it difficult to communicate effectively, and users often ended up interrupting each other. HandsOnVideo (Alem et al. 2011) is a project that further explores the richness of hand gestures. Here, the worker wears a display close to his or her eyes and can see a representation of the helper's hands. The remote collaborator can see the viewpoint of the worker on a large touch-enabled display; however, the display is too large to be portable.
These systems show the importance of providing support for remote gesturing
in collaborative wearable systems. Many early systems just provided remote pointing functionality, but as shown earlier, with newer functionality it is possible to go
beyond this and show natural hand motion and rich gestural communication.

25.2.2 Communication Theory


When using technology for remote collaboration, one of the main goals is to enable
remote users to establish shared understanding or common ground. This occurs
through a process called grounding where each person actively collaborates with
the other in offering conversational assertions and feedback, showing that it has
been understood before offering another assertion (Clark and Wilkes-Gibbs 1986).
In face-to-face communication, a wide variety of verbal and nonverbal cues are used
to establish grounding, such as speech, facial expression, body language, and interacting with objects (Krauss and Weinheimer 1966). However, in teleconferencing
interfaces these communication cues may not be transmitted as effectively, and so the grounding process may be affected.
Kraut et al. (1996) showed how grounding theory can be applied to wearable
interfaces. In this case, they used an HMD and head-mounted camera to share communication cues between a person trying to repair a bicycle and a remote expert.
They found that varying the remote cues provided by the system did indeed change
the communication behaviors of the participants.
In designing collaborative wearable systems that facilitate communication grounding, there are a number of approaches that can be taken. Bottecchia et al. (2008) advocate following the picking-outlining-adding (POA) paradigm, based on designing for three activities:
1. Picking: Selecting an object
2. Outlining: Maintaining attention on an object and providing additional information about it
3. Adding: Illustrating actions that can be performed (usually with gestures)
Thus, ideal interfaces should support methods for POA. For example, Bottecchia
et al. (2009) created a system for remote collaboration with a wearable computer that
allowed a person to point to a location on the shared video with a virtual arrow, place
virtual outlines on an object to highlight it, and use directional arrows to show how
actions should be performed.
In a broader sense, Kortuem identifies primitives that provide simple collaborative functions that can be combined to build complete collaborative interfaces (Kortuem 1998). These primitives include the following:
1. Remote awareness: Users must be aware of who else is participating in the remote collaboration.
2. Remote presence: Remote users must be represented in the interface in some way to be able to share communication cues.
3. Remote presentation: The remote user must have some way to present information in the view of the other participant.
4. Remote pointing: The ability to control a remote cursor to point at an object in the user's field of view.
5. Remote manipulation: The ability of a remote user to manipulate objects in the user's field of view.
6. Remote sensing: The ability of a remote user to have direct access to the sensors on the local user's wearable computer.
These papers show that wearable collaborative systems should support as many
communication cues as possible to facilitate grounding and establishing shared
understanding.

25.2.3 Environment Capture
As seen in Section 25.2.1, many wearable systems stream video or images from the wearer to a remote person, enabling them to capture a small piece of their environment. However, in most of these systems the remote user's view is limited to the live feed from the head-worn camera, reducing awareness of the user's surroundings. Systems like WACL allow the remote user to have camera control and so can increase their situational awareness.
There are other tools that can be used to provide a more immersive view of the wearable computer user's environment. One of these is the use of image panoramas. Work on panoramic imagery on desktop computers has been around since the 1990s, and with increasing technological advancements, panorama applications have become popular in recent years. Applications such as Google Street View (Google 2014) or Microsoft Streetside (Microsoft 2014) offer 360° imagery of street scenes. These can be used to explore remote locations in a manner similar to a virtual tour. However, these immersive panoramic scenes require special hardware and well-calibrated cameras to capture the panoramic image, which comes at a high cost and thus is not accessible to the end user.
Recently, software for creating panoramas has begun to appear on consumer
devices such as mobile phones and digital cameras (Pece et al. 2013). Advances
in hardware for mobile devices and smartphone platforms have resulted in widespread use of portable devices equipped with high-quality cameras. There has been
significant interest in the ability to quickly and easily create panorama imagery at any time and anywhere (Au et al. 2012). After years of computer vision research on algorithms that can quickly and easily create panoramas, it is now common for mobile phones to offer functionality for combining or stitching collections of images together (Xu and Mulligan 2013). One such example is Ztitch (Au et al. 2012), an application for Windows Phone that lets users create, modify, and upload panoramic imagery onto an online portal to share with other users. Other existing panorama applications include Photo Sphere,* TourWrist, and Photosynth. However, interaction with these platforms is limited to online viewing and there is no support for real-time collaboration.
Panorama imagery provides an easy way to capture a remote user's environment, but there has been little previous research on the use of panorama imagery on wearables for remote collaboration. One of the exceptions is the work of Cheng et al. (1998), who developed a system for collaboration that stitched together pictures from the wearable computer camera to create image mosaics. These mosaics were shared with the remote expert on a PC, allowing them to place virtual annotations in the wearable user's environment. However, their work focused on image tracking, and they didn't generate true immersive panorama views or evaluate the usability of the system.
Similarly, the WACL system (Kurata et al. 2004) used knowledge about the camera orientation to create pseudo-panorama images of the wearer's environment, stitching together overlapping images. The WACL remote expert interface provided this view along with a live camera view to enable the expert to have increased awareness of the user's environment (see Figure 25.3). Using this interface, the expert can click on the panorama image to project a laser spot in the real world. However, as with Cheng's work, the system does not create a truly immersive panorama that can be freely viewed by the expert. The focus of the WACL remote expert interface is on helping the expert understand what the wearable user is doing and to communicate back with them, not on capturing the user's space and presenting it in an immersive view.
More recently, new depth sensing technology such as the Microsoft Kinect
hardware has been used to capture 3D representations of space. For example, the
commercially available Occipital sensor can be combined with tablets to create a
handheld scanning solution for environment capture. These systems have not yet
been integrated into wearable systems, but it is expected that the next generation of
wearable devices will move beyond panoramas to full 3D environment capture.

25.2.4 Summary
In this section, we have provided a brief overview of the evolution of wearable computer interfaces for remote collaboration. As can be seen from this earlier work, most
* Photo Sphere: http://www.google.co.nz/maps/about/contribute/photosphere/; TourWrist: http://www.tourwrist.com/; Photosynth: http://photosynth.net/; Microsoft Kinect: http://www.microsoft.com/en-us/kinectforwindows/; Occipital: https://occipital.com/.

[Figure 25.3 interface screenshot: regions labeled objective image for stabilization, live video, and pseudo-panoramic view.]
FIGURE 25.3 WACL remote expert view, showing image stitching.

research on using wearable computers for remote collaboration has been focused
on sharing first-person live video. Many of these systems have support for remote
pointing or gesture-based communication. Some research has also supported use
of independent camera views through body-mounted cameras. There has also been
some research on how to apply communication theory to inform the development of
collaborative wearable interfaces.
From this earlier research it is clear that if we want to develop a wearable system
for remote collaboration, it should have the following features:




• Have the ability to share video, audio, and graphical annotations
• Provide support for natural gesture-based interaction
• Enable environmental capture and sharing
• Provide support for remote presence and awareness
• Use a grounding communication model for interface design

In the next section, we describe a wearable interface we have developed that has
these properties. However, there are some important differences between our wearable system and earlier research. Our research explores the use of real-time panorama sharing from a wearable computer for remote collaboration, and pointing and
drawing interaction methods in shared panoramas. It studies the effects of presence
and awareness inside a panoramic image presented on HMDs and tablet interfaces.
Most importantly, there is a focus on shared social experiences, compared to the earlier work that used collaborative wearable interfaces for industrial and professional
applications. We describe this in more detail in the following sections.

25.3 SOCIAL PANORAMAS


Most of the wearable systems reviewed in Section 25.2 have been focused on industrial or professional applications, such as remote maintenance. This is partly driven
by the high cost and technical nature of early wearable computers, and also by the
odd-looking hardware that makes them up. However, recently there have been wearable computers developed for the general consumer market. Systems such as Google
Glass* and the Recon Mod are more wearable and less socially isolating than these
earlier research prototypes. They are designed for everyday use in social and sporting situations. This means that there is an opportunity to develop wearable applications for sharing social experiences and not just industrial collaboration.
In our research we are interested in how wearable computer users could rapidly capture and share their surroundings using panorama imagery and live video. As shown
in Figure 25.4, a user with a wearable computer and camera (such as Google Glass)
could pan their head around and capture their surroundings and then immediately
share it with a remote collaborator as an immersive panorama. Once shared, both
users could view the space independently, talk to each other, and point or draw on
the image to easily communicate about what they are seeing. Since users are sharing
and adding virtual annotations to a captured view of reality, this is an example of mixed reality collaboration. Panoramic imagery can help overcome the limited field of view and provide an increased understanding of the user's environment. We call
this a Social Panorama because it uses panoramas to enable people to have shared
social experiences. In this section, we report on early progress toward this vision.
Like other wearable computers, Google Glass has the ability to stream video from
its camera to remote collaborators. Using Google Hangouts, Glass users can conduct first-person video conferencing allowing remote users to see through their eyes.
Other applications, such as Livestream, allow a Glass user to broadcast their view to
multiple remote people. However, these applications provide limited spatial cues and environmental awareness, and lack communication cues such as remote pointing or annotation on the shared views. Our work moves beyond this by focusing on
creating a shared social environment facilitating rich communication.

FIGURE 25.4 Conceptual image of using a wearable computer to create Social Panoramas.
Users can share their space and view remote annotations.
* Google Glass: https://www.google.com/glass/start/; Recon Instruments: http://www.reconinstruments.com/; Livestream: http://new.livestream.com/.

25.3.1 Prototype System
In order to explore how panoramas and wearable computing can be used to support
real-time collaboration, we developed a prototype system that connects a person
using a Google Glass wearable computer with a second user on an Android tablet.
The overall goal of our research is to develop a system that will allow a user to capture a panorama of the space around them and then share it in real time with a remote
collaborator. However, Glass has limited processing power that makes it difficult to
perform real-time panorama stitching, so in the initial prototype we assume the panorama has been captured and focus on supporting remote collaboration, presence,
and interaction.
The current prototype has been developed using Processing* with the Ketai library for sensor support. Processing is a software development environment that makes it very easy to create prototype mobile applications. It builds the code into an Android APK file that is then pushed to run on the target device, either Glass or tablet. The code was split into two separate code bases: one for Glass and one for the tablet, which were mostly similar except for touch interaction and network connection. The Ketai
library provides complete access to all of the sensors on an Android device, such as
camera, orientation sensors, accelerometers, etc. Using this we are able to create an
immersive panorama view that responds to user viewpoint orientation.
25.3.1.1 Panorama Viewing
The system uses a previously captured panorama image mapped onto a virtual cylinder. This simulates a live capture system. To display a panoramic image to the user,
a virtual cylindrical 3D model is rendered with 32 edges using the QUAD_STRIP
shape definition from Processing. Once the cylinder is created, it is textured with the
desired panorama image. The height of the cylinder is set to match the height of the
image, and the width and radius were calculated to maintain the aspect ratio of the
image. The panorama image textures were taken of the local laboratory environment
in which the user evaluations were conducted (see Figure 25.5). This was designed to
mimic the experience of the users capturing their own surroundings.
The cylinder is translated to surround the camera view (i.e., the user's view). Google Glass contains an orientation sensor, and using the Ketai library to read its values, the user can view the panorama by rotating their head; the integrated orientation sensor is used to rotate the panorama cylinder accordingly to match the user's orientation change. We also developed a viewing application for the

FIGURE 25.5 Sample cylindrical panorama image used.


* Processing: http://www.processing.org/; Ketai: https://code.google.com/p/ketai/.

Android tablet that uses the tablet orientation to rotate the tablet view. Currently we
just map rotations about the vertical axis.
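To make the rendering approach concrete, the following is a minimal Processing sketch of a cylindrical panorama viewer in the spirit of the description above. It is an illustrative reconstruction rather than the prototype's code: the image file name, the field of view, and the mouse-based stand-in for the Ketai orientation callback are assumptions.

// Minimal Processing sketch of the panorama-viewing approach described above.
// Illustrative only: the image file name, cylinder resolution, and the yaw
// variable (fed by the orientation sensor on Glass) are assumptions.

PImage pano;          // previously captured cylindrical panorama
float  yaw = 0;       // horizontal view angle in radians (would come from Ketai)
int    sides = 32;    // number of edges used for the cylinder
float  radius, cylHeight;

void setup() {
  size(640, 360, P3D);
  pano = loadImage("panorama.jpg");      // assumed file name
  cylHeight = pano.height;
  // choose the radius so the unrolled cylinder keeps the image aspect ratio
  radius = pano.width / TWO_PI;
}

void draw() {
  background(0);
  camera(0, 0, 0, cos(yaw), 0, sin(yaw), 0, 1, 0);  // look outward from the center
  perspective(radians(50), float(width) / height, 1, radius * 4);

  // textured cylinder built as a QUAD_STRIP, with the viewer at its center
  noStroke();
  beginShape(QUAD_STRIP);
  texture(pano);
  for (int i = 0; i <= sides; i++) {
    float a = TWO_PI * i / sides;
    float u = pano.width * i / (float) sides;       // texture column for this edge
    vertex(radius * cos(a), -cylHeight / 2, radius * sin(a), u, 0);
    vertex(radius * cos(a),  cylHeight / 2, radius * sin(a), u, pano.height);
  }
  endShape();
}

// On Glass, a Ketai orientation callback would update yaw; here the mouse
// stands in for head rotation so the sketch also runs on a desktop.
void mouseDragged() {
  yaw += (mouseX - pmouseX) * 0.01f;
}

On the actual devices, the same cylinder geometry is reused, with the Glass head orientation or the tablet orientation driving the yaw value instead of the mouse.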
25.3.1.2 Remote Awareness
In a collaborative interface, effective collaboration is based on mutual understanding and grounding, and is connected with the concepts of social presence and shared awareness of the other user's actions. Social presence is the feeling of being with another person in the same communication (Siitonen and Olbertz-Siitonen 2013). In the field of human-computer interaction (HCI), studies have shown that social presence is affected by interface types (Biocca et al. 2003) and interactivity (Trendafilov et al. 2011). Thus it is important for the interface to provide cues that give awareness of what the remote user is doing.
In the Social Panorama interface, we were particularly interested in showing where the remote user is looking. Even though they are sharing the same
panorama view, both connected people could be looking at different portions of
the image. So providing awareness of where the remote person is looking is very
important to facilitate effective communication. There are several methods for
doing this, and in developing the prototype interface, we explored the following (see Figure 25.6):
1. Centered radar: An interface cue built using a radar metaphor. A top-down radar display is shown in the center of the screen, with different colored wedges drawn on it showing the view angles of the two people connected into the shared panorama. If the wedges are overlapping, then the two users are seeing the same parts of the panorama.
2. Context compass: A line shown at the top of the screen represents the viewpoint into the panorama from 0° to 360°. Different colored rectangular boxes are drawn on top of this line representing each of the users' viewpoints. They move as the users rotate their heads around, and when the boxes are overlapping the users share the same viewpoint.
In addition to these remote awareness cues, we also added a different colored circular dot in the center of the screen for each of the users. When one user saw the
others dot coming into view then they knew the other person was beginning to share

FIGURE 25.6 Two different awareness cues: (a) radar display and (b) context compass.

the same viewpoint as them. We also explored complementing the center dot with a
rectangular border around the user's view, also showing when the viewpoints began
to overlap.
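To illustrate how such an awareness cue can be drawn, the following is a small, self-contained Processing sketch of the context compass idea (the radar cue would be drawn analogously with wedges on a circle). It is a sketch only: the colors, sizes, and example angles are assumptions, and in the prototype the two yaw values would come from the local orientation sensor and from the network.

// Illustrative context compass overlay, drawn in screen space on top of the
// panorama view. Angles, colors, and box sizes are assumed for the example.

float localYawDeg  = 40;    // would come from the local orientation sensor
float remoteYawDeg = 75;    // would arrive over the network from the other device

void setup() {
  size(640, 360);
}

void draw() {
  background(40);
  drawContextCompass(localYawDeg, color(0, 200, 0));     // local user in green
  drawContextCompass(remoteYawDeg, color(0, 120, 255));  // remote user in blue

  // demo only: sweep the remote viewpoint so the boxes eventually overlap
  remoteYawDeg = (remoteYawDeg + 0.2f) % 360;
}

// Draw one user's viewpoint as a box on a 0-360 degree line at the top of the screen.
void drawContextCompass(float yawDeg, int userColor) {
  float lineY = 20;
  stroke(255);
  line(20, lineY, width - 20, lineY);                    // the 0-360 degree line
  float x = map(yawDeg % 360, 0, 360, 20, width - 20);   // yaw -> screen position
  noStroke();
  fill(userColor, 180);
  rectMode(CENTER);
  rect(x, lineY, 40, 12);                                // box marking the viewpoint
}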
25.3.1.3 User Interaction
From Section 25.2.4, we know that interfaces for remote collaboration should support
the ability to share graphical annotations and natural gestures. In the Social Panorama
interface, we provide support for shared pointing and drawing, enabling participants
to easily refer to surrounding objects and the environment. The aim was for drawing
to mimic traditional sketching with paper and pencil, providing communication that
feels natural. Drawing and pointing are two modes of interaction that have previously been shown to be good gesture surrogates (Fussell et al. 2004).
Using Google Glass, users are able to touch the touchpad on the side of the Glass
display to point or draw on the panorama. The interface could either be in a pointing or drawing mode, and the user can tap on the touchpad with two fingers to swap
between modes. Once selected, the pointing mode maps the x, y position of the user's
finger on the touchpad to the corresponding position on the region of the panorama
in view. As they move their finger on the touchpad, a virtual pointer moves on the
panorama. In a similar way, when they are in drawing mode, touching the touchpad
will draw lines on the panorama that remain until they are erased. Any pointing or
drawing points are sent wirelessly to the tablet so that the remote user can see the
Glass user's input. In a similar way, the tablet user can touch and draw on the tablet
and have their annotations appear in a different color on the panorama. When the
user touches either the tablet or Glass touchpad with three fingers, the drawings are
erased. Figure 25.7 shows typical annotations added to the panorama view.
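The key step in the pointing and drawing modes is converting a touchpad position, together with the current view orientation, into a location in panorama space so that it can be drawn locally and sent to the remote device. The following helper is a hedged sketch of that mapping; the field of view, touchpad resolution, and panorama dimensions are assumed constants rather than values reported for the prototype, and the vertical mapping is deliberately simplified.

// Hedged sketch of the touch-to-panorama mapping described above. All of the
// constants below are assumptions made for the example.

final float FOV_DEG     = 50;    // assumed horizontal field of view of the display
final int   PANO_WIDTH  = 4096;  // assumed panorama texture width in pixels
final int   PANO_HEIGHT = 1024;  // assumed panorama texture height in pixels
final int   PAD_WIDTH   = 1366;  // assumed touchpad coordinate range (x)
final int   PAD_HEIGHT  = 187;   // assumed touchpad coordinate range (y)

void setup() {
  // quick check: a touch at the center of the pad while facing 90 degrees
  int[] p = touchToPanorama(PAD_WIDTH / 2.0f, PAD_HEIGHT / 2.0f, 90);
  println("panorama pixel: " + p[0] + ", " + p[1]);
}

// Convert a touchpad position and the current view yaw (degrees) into pixel
// coordinates on the panorama texture, so the annotation can be stored and
// sent to the remote device in panorama space.
int[] touchToPanorama(float touchX, float touchY, float viewYawDeg) {
  // offset of the touch from the center of the touchpad, as a fraction [-0.5, 0.5]
  float fx = touchX / PAD_WIDTH - 0.5f;
  float fy = touchY / PAD_HEIGHT - 0.5f;

  // the touch selects an angle inside the currently visible slice of the panorama
  float yawDeg = viewYawDeg + fx * FOV_DEG;
  yawDeg = (yawDeg % 360 + 360) % 360;                // wrap into [0, 360)

  int panoX = (int) (yawDeg / 360.0f * PANO_WIDTH);
  int panoY = (int) ((0.5f + fy) * PANO_HEIGHT);      // simplified vertical mapping
  return new int[] { panoX, panoY };
}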
25.3.1.4 Networking
In order to share pointing, drawing, and remote awareness cues, the tablet and Glass
applications are connected together over a Wi-Fi network. This enables data to be
freely shared between them. A software module is used to connect the tablet and the Glass via the TCP/IP networking library oscP5.* The Glass device starts listening on a TCP port, while the tablet device attempts to connect to the Glass IP address and port number.

FIGURE 25.7 Drawing and pointing annotations appearing on the panorama image.
* oscP5: https://code.google.com/p/oscp5/.

Once the connection is established, both devices start sending their local orientation to the remote device every half second. The format of each orientation message consists of x, y, z, representing the orientation around the x-axis, y-axis, and z-axis, respectively.
The connection component also updates the location of drawing and pointing
points to and from the remote device. The format of each collaboration message
consists of the following (a minimal code sketch of these messages appears after the list):
• Touch mode: Drawing/pointing
• a, b: Representing the x and y of the previous touch point
• x, y: Representing the x and y of the current touch point
• o: The orientation at which the touch point was recorded
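The sketch below illustrates how such an exchange can be written with oscP5. It is an assumption-laden illustration rather than the prototype's code: the OSC address patterns ("/orientation" and "/touch"), the port number, the remote IP address, and the mode encoding are invented for the example; plain UDP is used for brevity even though the prototype connected the devices over TCP; and getYaw/getPitch/getRoll are placeholders standing in for the Ketai sensor readings.

// Minimal oscP5 sketch of the message exchange described above (illustrative
// reconstruction; addresses, port, IP, and encodings are assumptions).

import oscP5.*;
import netP5.*;

OscP5 osc;
NetAddress remote;
int lastSend = 0;
float remoteYaw;    // most recent remote orientation, used to draw awareness cues
int remoteMode;     // most recent remote touch mode

void setup() {
  osc = new OscP5(this, 12000);                    // listen for incoming messages
  remote = new NetAddress("192.168.1.10", 12000);  // assumed address of the other device
}

void draw() {
  // send the local orientation roughly every half second, as in the prototype
  if (millis() - lastSend > 500) {
    OscMessage m = new OscMessage("/orientation");
    m.add(getYaw()); m.add(getPitch()); m.add(getRoll());
    osc.send(m, remote);
    lastSend = millis();
  }
}

// send one collaboration message: mode, previous point, current point, orientation
void sendTouch(int mode, float a, float b, float x, float y, float o) {
  OscMessage m = new OscMessage("/touch");
  m.add(mode); m.add(a); m.add(b); m.add(x); m.add(y); m.add(o);
  osc.send(m, remote);
}

// handle messages arriving from the remote device
void oscEvent(OscMessage m) {
  if (m.checkAddrPattern("/orientation")) {
    remoteYaw = m.get(0).floatValue();
  } else if (m.checkAddrPattern("/touch")) {
    remoteMode = m.get(0).intValue();    // 0 = pointing, 1 = drawing (assumed encoding)
  }
}

// placeholders standing in for the Ketai orientation sensor readings
float getYaw()   { return 0; }
float getPitch() { return 0; }
float getRoll()  { return 0; }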

25.3.1.5 User Experience


The overall user experience with the prototype creates the illusion that a user is able
to share their surroundings with a remote collaborator. When the application starts,
the Google Glass user is able to see a panorama of the office they are standing in
shown in their Glass display (see Figure 25.5). As they turn their head around they are able to see different portions of the panorama that match what they can see with their natural eyes in the real world. Similarly, the tablet user sees the same panorama image of the remote office. They can view different portions of the panorama by
turning their tablet around.
The Glass display and tablet are networked together, so as both users look around
the space they can use the remote awareness cues provided to get an idea of where
their partner is looking. There is an audio connection that allows both users to talk to
each other. Similarly they are able to use touch gestures to show a pointer or draw a
line on the image. In this way they are able to communicate about what they are seeing.
As can be seen, the Social Panorama interface is very lightweight and easy to use.
Compared to some of the earlier wearable interfaces for remote collaboration, this
interface makes it easy to communicate about the Glass user's surrounding space.
The interface could be used in a wide variety of settings, such as enabling a wearable
user in beautiful surroundings to share the view with a remote friend still at work, or
allowing a person to share the view inside a museum, or from a concert.
However, with this interface there are many important research questions that
need to be addressed, such as how different awareness cues could affect the remote collaboration, or whether supporting pointing and drawing functionality increases the sense of social presence. In the next section, we report on an initial pilot study conducted with the prototype interface that explores some of these aspects.

25.4 PILOT STUDY


While developing the Social Panorama prototype we conducted an initial pilot study,
exploring which awareness tools would increase the sense of remote presence. The
pilot study was conducted with four participants. A panorama was precaptured of the
room where the Glass user was seated. A second user was sitting in a different room
with a handheld tablet. Both users were connected via Wi-Fi and an audio connection.

Four pairs of subjects communicated using Glass or the Android tablet showing
one of the following three different conditions:
Audio only: Both users could view the panorama image but only talk about it.
Audio + radar: In addition to audio, a radar display was used to provide remote
awareness. The radar display is an exocentric awareness cue that shows a
triangular view for each user that moves in a circular motion according to
the orientation of the device they are using.
Audio + view rectangle: An egocentric rectangle shows where each collaborator was looking. The rectangle would show each user's field of view and so
would overlap when facing in the same direction.
Both the radar and view rectangle cues also had circles appearing at the center of the
screen showing the local user's center point of view. A second circle of a different color
would appear when the remote user is starting to face the same portion of the panorama. If both circles are lined up, then the two users are facing in the same direction.
The task for each pair of subjects was to discuss for 2 min the room that the Glass
user was in and answer a series of interior design questions, such as where they
would put lights to best light the room. It was a within-subjects study so each pair
experienced all three conditions with five different interior design questions. After
each condition, each participant was asked the following questions about how well they thought they had collaborated:



Q1. How easy was it to work with your partner?
Q2. How easily did your partner work with you?
Q3. How easy was it to be aware of what your partner was doing?
Q4. How much did you feel like you were in the same room as your partner?
Q5. To what extent did your partner seem real?
Questions 1-3 were answered on a Likert scale from 1 to 7, with 1 being "Not Very Easy" and 7 "Very Easy." Question 4 ranged from "Not like being in the same room at all" to "A lot like being in the same room," and Question 5 provided "Not real at all" and "Very real" as possible answers.
Figure 25.8 shows the average scores for each question. Since this is a pilot study
we have too few subjects for a detailed statistical analysis, but in general users felt
that they had the best awareness of their partner in the radar condition. They felt that
this gave them continual awareness of where their partner was looking and enabled
them to easily align their views when needed. Subjects also commented that the box
cue forced them to move around a lot to locate their partner, and was confusing.
Some people felt that using audio only was better than using the view rectangle.
However, the center point was considered useful, providing more specific insight into how to exactly align both views. In general the exocentric cue was used to get an overview, while the egocentric view was used to align in detail. Based on these results we decided to provide all interfaces in future experiments with both egocentric and exocentric remote awareness cues.


[Figure 25.8 bar chart: average scores (1-7 scale) for questions Q1-Q5 under the Audio, Box, and Radar conditions.]
FIGURE 25.8 Survey question results across awareness conditions.

In observing how people used the interface it was interesting to note that many
of them changed their communication behavior depending on the type of awareness cues that were provided. For example, in the audio-only condition people
would often describe at length the portion of the room that they were looking at,
until they were sure that the other user understood the direction they were facing.
Subjects also felt that the interfaces in general were intuitive to use, especially the
Glass application that just required them to look in the direction that they were
interested in.

25.5 CONCLUSION
In this chapter, we have provided a review of wearable computer interfaces for
remote collaboration and then described a prototype interface we have developed
for sharing social spaces. From earlier work in this field it is apparent that most of
the related work is focused on collaboration in a professional setting for applications
such as remote maintenance, rather than purely social interaction. However, this
work does provide useful design guidelines such as the importance of having the
ability to share video, audio, and graphical annotations, and to consider awareness
and communication cues.
Panoramic imagery can provide an immersive and holistic impression of an environment. With the ubiquity of smartphones equipped with high-quality cameras and
the desire of people to share their experiences and feel connected, a panorama can support this by giving a remotely located user the impression of being close to another, with the same freedom to look around independently as in a real environment. However, the use of panoramic imagery in a mobile collaborative environment
has not been researched thoroughly, leaving many gaps in interface guidelines and
on how the sensation of connection and shared experiences can be utilized in such
an omnidirectional view setting.
This research aims to contribute to the field of wearable computing and to the sharing of personal experiences by using panoramic imagery and exploring remote and
collaborative interaction modalities and their impact on presence. To achieve this
goal, a prototype was developed that simulates an already captured panorama that
is presented on a tablet and on an HMD. The implemented user interface supported
the awareness necessary for successful grounding, and the addition of pointing and
drawing provided tools to reference objects or locations in the panorama image
quickly.
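A minimal sketch of such a pointing tool, under the assumption of an equirectangular panorama and a simple linear approximation of the tablet's view (the function and parameters are illustrative, not the prototype's actual code):

def tap_to_panorama(tap_x, tap_y, view_yaw, view_pitch,
                    screen_w, screen_h, fov_h, fov_v,
                    pano_w, pano_h):
    """Convert a tap on the tablet's current view into panorama pixel
    coordinates, so a pointer can be rendered at the same spot on the HMD.
    The view is treated as a flat crop (a linear approximation that ignores
    lens projection)."""
    # Offset of the tap from the view center, in degrees.
    dyaw = (tap_x / screen_w - 0.5) * fov_h
    dpitch = (0.5 - tap_y / screen_h) * fov_v
    yaw = (view_yaw + dyaw) % 360.0
    pitch = max(-90.0, min(90.0, view_pitch + dpitch))
    x = int(yaw / 360.0 * pano_w) % pano_w
    y = int((90.0 - pitch) / 180.0 * pano_h)
    return x, min(y, pano_h - 1)

# A tap in the middle of the screen maps to the current view center.
print(tap_to_panorama(640, 360, 90.0, 0.0, 1280, 720, 60.0, 35.0, 4096, 2048))
# -> (1024, 1024)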
This Social Panorama wearable application is one of the first wearable interfaces for remote collaboration in which the focus is on creating a social space and sharing the user's environment. A pilot study quickly showed the importance of combining exocentric and egocentric cues, which resulted in interfaces that provide an overall view of the space and of where the remote user is currently looking, as well as a more detailed view once the two views overlap. In addition, established interaction methods for remote collaboration were implemented for pointing and drawing.

25.5.1 Future Work


This research is a first step into a new field of collaboration using Social Panoramas between HMDs and tablet devices, and a number of areas of future work could be explored. For example, a more detailed user study, following the initial pilot study, is needed to evaluate the pointing and drawing interaction methods and to determine whether they have any impact on the feeling of social presence.
Audio has been shown to be an integral part of communication and social presence. Future research could explore the use of spatial audio, where the voice of the other user is spatialized according to his or her viewpoint. Engaging more of a user's senses in this way could give a better understanding of where the other user is looking and potentially increase the sense of presence, with visual cues confirming what the user heard.
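As a sketch of how such view-dependent audio could work (an assumption, using a basic equal-power pan law rather than full HRTF-based spatialization; the function is illustrative), the remote user's voice could be panned according to the angle between the two view directions:

import math

def stereo_pan(local_yaw, remote_yaw):
    """Equal-power stereo gains for the remote user's voice, panned by the
    angle between the local and remote view directions (yaw in degrees).
    Returns (left_gain, right_gain)."""
    rel = (remote_yaw - local_yaw + 180.0) % 360.0 - 180.0   # -180..180
    # Map the relative angle to a pan position in [0, 1] (0 = hard left).
    pan = (max(-90.0, min(90.0, rel)) + 90.0) / 180.0
    return math.cos(pan * math.pi / 2.0), math.sin(pan * math.pi / 2.0)

print(stereo_pan(0.0, 0.0))    # partner straight ahead -> (~0.71, ~0.71)
print(stereo_pan(0.0, 90.0))   # partner to the right   -> (~0.0, 1.0)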
Another possibility for future work would be to offer a livestream of the current
field of view of the local user, leveraging a mixed reality view to show a combination of captured image and live video. These new interface elements will have to be
evaluated in further experiments that focus on usability and social presence.
As the current prototype has only been tested in a controlled indoor environment, a higher-fidelity prototype should be tested outdoors under real-world conditions. An outdoor environment introduces several additional challenges, such as ambient noise, moving backgrounds, and more complex surroundings.
Finally, future work could include the possibility of using spherical panoramas.
This would require further research regarding how the interface would operate.
A complete implementation of the proposed system, including real-time panorama
capture, stitching, and sharing, would make the idea of Social Panoramas real. While
it is not yet known how consumers will use wearable devices, such as Google Glass,
they do offer new possibilities for collaboration, with much left to explore.

REFERENCES
Alem, L., Tecchia, F., and Huang, W. (2011). Hands on video: Towards a gesture based mobile
AR system for remote collaboration. In Alem, L. and Huang, W. (Eds.). Recent Trends of
Mobile Collaborative Augmented Reality Systems (pp. 135148). Springer: New York.
Au, A. and Liang, J. (2012). Ztitch: A mobile phone application for immersive panorama creation, navigation, and social sharing. In IEEE 14th International Workshop on Multimedia
Signal Processing (MMSP), September 1719, 2012, pp. 1318. Banff, Canada.
Bauer, M., Heiber, T., Kortuem, G., and Segall, Z. (1998). A collaborative wearable system with remote sensing. In 2nd International Symposium on Wearable Computers (ISWC '98) (pp. 10–17). IEEE Computer Society, October 19–20, 1998, Pittsburgh, PA.
Billinghurst, M., Bowskill, J., and Morphett, J. (1998a). WearCom: A wearable communication space. In Proceedings of CVE'98: Collaborative Virtual Environments 1998, June
17th19th, 1998, Manchester, UK, pp. 123130.
Billinghurst, M., Kato, H., Bee, S., and Bowskill, J. (1998). Asymmetries in collaborative wearable interfaces. In International Symposium on Wearable Computers (pp. 133–133). IEEE Computer Society.
Billinghurst, M., Weghorst, S., and Furness III, T. (1997). Wearable computers for three dimensional CSCW. In 3rd International Symposium on Wearable Computers (pp. 39–39). IEEE Computer Society, October 18th–19th, San Francisco, CA.
Biocca, F., Harms, C., and Burgoon, J. K. (2003). Toward a more robust theory and measure of social presence: Review and suggested criteria. Presence: Teleoperators and Virtual Environments, 12(5), 456–480.
Bottecchia, S., Cieutat, J. M., Merlo, C., and Jessel, J. P. (2009). A new AR interaction paradigm
for collaborative teleassistance system: The POA. International Journal on Interactive
Design and Manufacturing (IJIDeM), 3(1), 3540.
Bottecchia, S., Cieutat, J. M., and Jessel, J. P. (2010). TAC: Augmented reality system for
collaborative tele-assistance in the field of maintenance through internet. In Proceedings
of the First Augmented Human International Conference, 2010, p. 14. April 2nd3rd,
2010, Megeve, France.
British Telecom. (1993). CamNet Promotional Video.
Cheng, L.-T. and Robinson, J. (1998). Dealing with speed and robustness issues for video-based registration on a wearable computing platform. In Wearable Computers, 1998. Digest of Papers. Second International Symposium on, October 19–20, 1998, Pittsburgh, PA, pp. 84–91.
Clark, H. H. and Wilkes-Gibbs, D. (1986). Referring as a collaborative process. Cognition,
22, 139.
Drugge, M., Nilsson, M., Parviainen, R., and Parnes, P. (2004). Experiences of using wearable
computers for ambient telepresence and remote interaction. In Proceedings of the 2004
ACM SIGMM Workshop on Effective Telepresence (pp. 211). ACM, October 1016,
New York, NY.
Feiner, S., MacIntyre, B., Höllerer, T., and Webster, A. (1997). A touring machine: Prototyping 3D mobile augmented reality systems for exploring the urban environment. Personal Technologies, 1(4), 208–217.
Fussell, S. R., Kraut, R. E., and Siegel, J. (2000). Coordination of communication: Effects of shared visual context on collaborative work. In Proceedings of the 2000 ACM Conference on Computer Supported Cooperative Work (pp. 21–30). ACM, December 2–6, Philadelphia, PA.
Fussell, S., Setlock, L., Yang, J., Ou, J., Mauer, E., and Kramer, A. (2004). Gestures over video streams to support remote collaboration on physical tasks. Human–Computer Interaction, 19(3), 273–309.
Google Street View. (2014). http://www.instantstreetview.com/. Accessed October 14, 2014.
Hestnes, B., Heiestad, S., Brooks, P., and Drageset, L. (2001). Real situations of wearable computers used for video conferencing and for terminal and network design.
In Wearable Computers, 2001. Proceedings of Fifth International Symposium on
(pp.8593). IEEE October 79, Zurich, Switzerland.
Kirk, D. S. and Fraser, D. S. (2005). The effects of remote gesturing on distance instruction.
In Proceedings of the Conference on Computer Supported Collaborative Learning, May
30June 4, Taipei, Taiwan, 2005 (pp. 301310).
Kortuem, G. (1998). Some issues in the design of user-interfaces for collaborative wearable
computers. In IEEE Virtual Reality Annual International Symposium, March 1418,
Atlanta, GA.
Kortuem, G., Bauer, M., and Segall, Z. (1999a). NETMAN: The design of a collaborative
wearable computer system. Mobile Networks and Applications, 4(1), 4958.
Kortuem, G., Schneider, J., Suruda, J., Fickas, S., and Segall, Z. (1999b). When cyborgs meet:
Building communities of cooperating wearable agents. In Wearable Computers, 1999.
Digest of Papers. The Third International Symposium on October 1819, San Francisco,
CA, (pp. 124132). IEEE.
Krauss, R. M. and Weinheimer, S. (1966). Concurrent feedback, confirmation and the encoding of referents in verbal communication. Journal of Personality and Social Psychology, 4, 343–346.
Kraut, R. E., Miller, M. D., and Siegel, J. (1996). Collaboration in performance of physical tasks: Effects on outcomes and communication. In Proceedings of the 1996 ACM
Conference on Computer Supported Cooperative Work (CSCW 96), November 1620,
Boston, MA, (pp. 5766). ACM.
Kurata, T., Sakata, N., Kourogi, M., Kuzuoka, H., and Billinghurst, M. (2004). Remote
collaboration using a shoulder-worn active camera/laser. In Wearable Computers, 2004.
ISWC 2004. Eighth International Symposium on October 31November 3, Arlington,
VA, (vol. 1, pp. 6269). IEEE.
Kuzuoka, H., Kosuge, T., and Tanaka, M. (1994). GestureCam: A video communication system for sympathetic remote collaboration. In Proceedings of the 1994 ACM Conference on Computer Supported Cooperative Work (pp. 35–43). ACM.
Mann, S. (1997). Wearable computing: A first step toward personal imaging. Computer, 30(2),
2532.
Mann, S. (2000). Telepointer: Hands-free completely self-contained wearable visual augmented reality without headwear and without any infrastructural reliance. In Wearable
Computers, The Fourth International Symposium on (ISWC 2000), October 1617,
Atlanta, GA, (pp. 177178). IEEE.
Mayol, W. W., Tordoff, B., and Murray, D. W. (2000). Wearable visual robots. In The Fourth International Symposium on Wearable Computers, Atlanta, GA, October 16–17, 2000 (pp. 95–102). IEEE Computer Society.
Microsoft StreetSide. (2014). http://www.microsoft.com/maps/streetside.aspx, accessed on
October 14th, 2014.
Ou, J., Fussell, S. R., Chen, X., Setlock, L. D., and Yang, J. (2003). Gestural communication
over video stream: Supporting multimodal interaction for remote collaborative physical
tasks. In Proceedings of the Fifth International Conference on Multimodal Interfaces
(ICMI 2003), November 57, Vancouver, Canada, pp. 242249.
Pece, F., Steptoe, W., Wanner, F., Julier, S., Weyrich, T., Kautz, J., and Steed, A. (2013).
Panoinserts: Mobile spatial teleconferencing. In Proceedings of the SIGCHI Conference
on Human Factors in Computing Systems (CHI '13). ACM, New York, NY, 13191328.
Poelman, R., Akman, O., Lukosch, S., and Jonker, P. (2012). As if being there: Mediated reality for crime scene investigation. In Proceedings of the ACM 2012 Conference on Computer Supported Cooperative Work (CSCW '12), February 11–15, Seattle, WA, pp. 1267–1276.
Rhodes, B. J. (1997). The wearable remembrance agent: A system for augmented memory.
Personal Technologies, 1(4), 218224.
Siitonen, M. and Olbertz-Siitonen, M. (2013). I am right here with you – Constructing presence in distributed teams. In Proceedings of International Conference on Making Sense of Converging Media (AcademicMindTrek '13), October 1–4, Tampere, Finland, pp. 11–16.
Starner, T. (2001). The challenges of wearable computing: Part 1. IEEE Micro, 21(4), 4452.
Starner, T. E. (1999). Wearable computing and contextual awareness. Doctoral dissertation,
Massachusetts Institute of Technology, Cambridge, MA.
Trendafilov, D., Vazquez-Alvarez, Y., Lemmelä, S., and Murray-Smith, R. (2011). Can we work this out? An evaluation of remote collaborative interaction in a mobile shared environment. In Proceedings of the 13th International Conference on Human Computer Interaction with Mobile Devices and Services (MobileHCI '11) (pp. 499–502). ACM: New York.
Xu, W. and Mulligan, J. (2013). Panoramic video stitching from commodity HDTV cameras.
Multimedia System, 19, 407426.

Author Index
A
Aanjaneya, M., 153
Ababsa, F., 152
Abbott, J., 233234
Abe, Y., 503
Abidi, M.A., 127
Abolmaesumi, P., 497
Aboutalebi, S.H., 653
Abumi, K., 503
Acik, M., 653
Ackerman, J., 503
Ackermann, H., 505
Adams, J., 40
Adcock, M., 231
Adeli, H., 359
Adhami, L., 502
Adhikary, S.D., 495
Agarwal, R., 502
Agusanto, K., 460462, 497
Ahlers, K., 461462
Ahmadi, S.-A., 508
Ahn, B., 231
Ahn, H., 506
Ahonen, J., 300
Ahrens, J., 300, 321
Aiteanu, D., 334
Aittala, M., 461462, 468
Akahoshi, T., 535
Akiyama, S., 314
Akkar, S.D., 358
Akman, O., 666
Alahi, A., 212213
Alam, S., 294, 315
Alba, M., 359
Al-Deen Ashab, H., 497
Alem, L., 666
Alexander, E., 530
Algazi, V.R., 315, 321
Alkire, M., 509
Allen, K., 68
Allen, P.K., 165, 556
Allen, R., 264
Allotta, B., 552
Allread, B.S., 376
Alonso, A., 587, 589
Alp, M.S., 522
Alpern, M., 80
Alphonse, L., 383
Altobelli, D., 530, 533

Amemiya, T., 554


Amesur, N., 504
Amft, O., 583615
Amit, G., 40
Amma, C., 588, 593, 607
Anand, S., 234
Anand, S.C., 646, 655656
Anderegg, S., 506
Andersen, M., 273
Anderson, B.G., 344, 460, 473
Anderson, T.R., 281
Ando, H., 554
Ando, T., 68
Ando, Y., 281
Andresen, S.H., 416
Andrews, S., 250
Andriot, C., 233, 556
Angelis, J., 46
Anidjar, M., 487
Anliker, U., 586, 594, 597, 599, 606608,
610,613
Anspach, J.H., 374
Aoki, T., 552, 554
Arboleda, C.A., 333334
Arditi, D., 383
Ardito, C., 415
Ariaratnam, S.T., 371372
Arredondo, M.T., 415
Arth, C., 192
Arthur, K.W., 71
Arvanitis, T.N., 587, 589, 592, 600, 603,
608612
Asada, H.H., 597, 600, 606, 609
Asce, M., 358
Aschenbrenner, R., 622, 633
Aschke, M., 506, 531532
Ashbrook, D.J., 27
Ashby, F.G., 203
Ashir, A., 294, 315
Ashton, K., 52
Au, A., 668
Au, W.M., 629, 646, 654
August, B., 262
Auner, G., 497, 531
Auperin, A., 492
Ausman, J.I., 522
Auvray, M., 312
Avanzini, F., 321
Avendano, C., 321
Ayache, N., 496, 536537, 571

681

682
Azizian, M., 499, 503
Azuma, R.T., 3, 152, 227, 259273, 333, 373,
412, 458

B
Baber, C., 587, 589, 592, 600, 603, 608612
Bcher, M., 577
Bachhal, S., 233234
Back, M., 288
Bade, C., 435
Bae, C., 586, 590, 606607
Bae, J., 653
Baer, M., 288, 594, 597, 599, 606608, 613
Baere, T.D., 492
Baggi, D., 315
Baghai, R., 587, 592, 606607, 609
Bagozzi, R.P., 417
Bai, W., 653
Baillie, S., 234
Baillot, Y., 152, 227, 333, 373, 458
Bainbridge, D., 311, 501, 535
Bajura, M., 461462, 503
Bala, K., 458
Ballagas, R., 271
Ballester, M.A.G., 533
Ballou, G.M., 283
Balogh, E., 504
Bamberg, S.J.M., 588589, 594, 607
Bamji, C., 346
Banerjee, P., 231
Banfi, G., 509
Banger, M., 495
Banihachemi, J.-J., 509
Bannach, D., 613
Bansal, R., 363
Baraff, D., 558
Barah, A., 492
Barandiaran, I., 536
Baranski, B., 415
Baratoff, G., 436, 473
Barbagli, F., 250, 556
Barbi, J., 564
Barea, R., 447
Barfield, W., 811, 61, 278
Barger, J., 313
Barner, K.E., 250
Barnum, P.C., 438
Barrett, A., 495
Bartczak, B., 345346
Barthes, R., 52
Bartoli, A., 508
Bartz, D., 463, 466, 468, 470, 473478, 542
Barzel, R.,
Bascle, B., 531532
Bass, L., 80
Bastien, S., 539

Author Index
Bauck, J.L., 296
Bauer, M., 535, 552, 608, 662, 664
Bauernschmitt, R., 535
Baumann, H., 22
Baumhauer, M., 487, 497, 499, 538
Baur, C., 509
Bay, H., 207208
Bayart, B., 232, 234
Bayer, M., 71
Bayless, A., 504
Becker, B.C., 532
Becker, M., 415
Beder, C., 346
Bederson, B.B., 311
Been-Lirn, H.D., 407
Begault, D.R., 281282, 315
Behringer, R., 152, 227, 333, 373, 458
Behzadan, A.H., 331392
Beidaghi, M., 642
Bekaert, P., 458
Belkin, M., 81
Bell, B., 80
Benbasaty, A.Y., 588589, 594, 607
Benford, S., 260, 271, 293, 317
Benhimane, S., 333
Bennett, E., 230
Bennett, J., 52
Bensmi, S.J., 250
Bensoussan, P.-J., 556
Berclaz, T.J., 153
Berg, S., 506
Bergamasco, M., 552554, 556
Bergasa, L., 447
Berger, M.-O., 344345, 473, 508, 528, 537
Berglund, M.E., 629
Bergmann, H., 502, 532
Berlin, G., 622, 633
Bernard, A., 556
Bernold, L.E., 370371
Berrezag, A., 554
Bharatula, N.B., 594597, 608
Bhardwaj, M., 18
Bhat, V., 499
Bhattacharya, R., 621
Bianchi, G., 231
Bianchi, M., 553, 555
Bibeau, K., 632, 636
Bicchi, A., 553, 555
Bichlmeier, C., 507508, 529530, 542
Bickel, B., 577
Bier, J., 497
Biggs, K., 552
Bilinski, P., 321
Billinghurst, M., 65, 73, 127, 156, 230, 385386,
412, 415416, 426, 477, 487, 661677
Biocca, F., 672
Biocca, F.A., 80

683

Author Index
Birkfellner, W., 502, 532
Birth, M., 492
Bischof, H., 186, 192
Bischof, M., 506
Bjorneseth, O., 61
Black, P.M., 530, 532533
Blackwell, M., 504, 535, 539
Blanc, D., 587, 592, 606607, 609
Blank, J., 51
Blsing, B., 314
Blattner, M.M., 293
Blauert, J., 281
Bleser, G., 152, 443, 445
Blinovska, A., 587, 592, 606607, 609
Blum, T., 493, 508
Blyth, M., 495
Board, T.N., 509
Boche, F., 377
Boctor, E.M., 492
Bodenheimer, T., 508
Bodenstedt, S., 508, 530, 534
Boff, K.R., 62
Bogaert, J., 500
Bohm, H.D.V., 62
Bolles, R.C., 126, 186, 192, 196, 213, 216
Bolter, J.D., 263, 268, 270
Bonanni, L., 46, 52
Bonderover, E., 630
Bonfiglio, A., 588589, 593, 603, 607, 611, 613
Book, W.J., 554
Boron, A., 418
Borst, C.W., 231235, 555
Bose, B., 234
Bosio, L., 552
Bosson, J.-L., 509
Botden, S.M., 487
Bottecchia, S., 666
Bottou, L., 208
Bouarfa, L., 508
Bougnoux, S., 461462
Bouguet, J.-Y., 474
Bound, K., 40
Bourgeois, S., 433455
Boussetta, K., 303
Bouwman, H., 411
Bowers, J., 293, 317
Bowie, J., 383
Bowskill, J., 662
Boyce, J., 464
Boyer, D., 413
Bozic, K.J., 509
Brac-de-la-Perriere, V., 19
Bradley, G., 311
Bradski, G., 208, 212213
Brambilla, G., 587, 592, 606, 610
Brandner, M., 198
Braun, A.-K., 298

Breazeal, C., 554


Breebaart, J., 314
Breen, D., 461462
Bregman, A.S., 313
Brewster, S.A., 234, 293
Bridson, R., 564, 570
Brief, J., 534
Brilhault, A., 312
Bristow, H.W., 587, 589, 592, 600, 603, 608612
Brittenham, S., 260
Bronkhorst, A.W., 292
Brooks, D., 496
Brooks, F.P., 332
Brooks, P., 664
Brown, C.M., 231
Brown, D., 79
Brown, E.K., 655
Brown, J.M., 562, 564, 568
Brown, J.S., 324
Brown, L.M., 234
Bruch, H.-P., 492
Brull, W., 298
Brungart, D.S., 281, 292, 321
Brusquet, Y., 589, 594, 606
Brutzman, D., 322
Bryan, J.S., 60
Bucher, P., 506
Buchroeder, R.A., 62
Buchs, N.C., 502
Budd, J., 18
Buechley, L., 621, 629
Bujacz, M., 312
Bujnak, M., 127, 129130, 137, 140, 145
Bulling, A., 594596, 607, 609
Blthoff, H., 478
Burelli, P., 554
Burgert, O., 534
Burggraf, D.S., 374
Burgkart, R., 234
Burgoon, J.K., 672
Brk, C., 492
Butchart, B., 412
Buxton, W., 324, 584
Buy, X., 496

C
Cabrera, D., 281
Cabrera-Umpirrez, M.F., 415
Caffrey, J.P., 359
Cai, H., 335
Cai, K., 477
Cai, X., 652653
Cakmakci, O., 73
Caldas, C.H., 376
Caldwell, D.G., 554
Califf, R.M., 490

684
Calonder, M., 212213
Calvet, L., 440
Campbell, B., 73
Campbell, F.W., 80
Campion, G., 250
Canny, J., 202, 383
Carbonaro, N., 636
Cardia, A., 509
Carleberg, P., 506
Carlile, S., 282
Caron, G., 139
Carpentier, T., 322
Carr, C.E., 324
Carrasco, E., 536
Carretero-Gonzlez, J., 653
Carrigy, T., 314
Carrino, J.A., 504
Carrozza, M.C., 537
Carvalho, E.G.M.D., 492
Casciaro, S., 536
Castiglione, A., 289, 323
Castiglioni, P., 587, 592, 606, 610
Castillo-Martnez, E., 653
Castrillon-Oberndorfer, G., 508, 529, 533534
Cater, K., 314
Catmull, E., 460
Caudell, T.P., 63, 487
Caversaccio, M., 533
Cha, J., 231
Chalupper, J., 302
Chan, H., 496, 507
Chan, S., 414
Chan, V.W., 492
Chandra, T., 127
Chang, J., 96, 652
Chang, W.M., 504
Chang, Y., 232
Chapman, D.N., 372373
Charbel, F.T., 522
Charvillat, V., 440
Chasey, A., 370372
Chaumette, F., 152
Chen, F., 197
Chen, I., 234
Chen, J., 477
Chen, S.J.-S., 507, 525
Chen, W.C., 209
Chen, X., 501, 653, 665
Cheng, A., 497
Cheng, D., 75, 86, 96
Cheng, H., 652653
Cheng, J., 586, 590, 606, 613614
Cheng, K.T., 195223
Cheng, L.-T., 668
Cheok, A.D., 314, 407
Cheon, Y.J., 219
Cherenack, K., 621

Author Index
Chi, D., 314
Chi, S., 376
Chib, V.S., 504
Chin, K.J., 492
Chinello, F., 552555
Chiu, P., 197
Chli, M., 212213
Chmiola, J., 641
Cho, D., 314
Cho, H.U., 653
Cho, Y., 152
Chock, G., 358
Chodorge, L., 556
Choi, H.-W., 435
Choi, J.-W., 300
Choi, K., 376
Choi, M., 411
Choi, S., 227251
Chu, H.W., 655
Chu, M.W.A., 501
Chuangui, Z., 460462
Chubb, E.C., 553
Chum, O., 179, 186, 192, 196, 216
Chun, J., 233
Chung, A.J., 508
Chung, W.K., 250
Chung, W.Y., 646, 654
Ciancitto, F., 586, 589, 591, 606607, 613
Cianflone, D., 587, 589
Cicinelli, J.G., 290, 311
Cieutat, J.M., 666
Ciglar, M., 302
Ciocarlie, M., 556
Cipolla, R., 441, 444
Cipolla, T., 594, 597, 599, 602603
Cirio, G., 552577
Civolani, M., 554
Clark, H.H., 666
Clarkson, M., 531
Claus, D., 152
Clawson, J., 18, 27
Cleary, K., 488, 499, 503
Cline, H., 533
Cobb, J., 495
Cockburn, A., 17
Cohen, D., 33
Cohen, J.D., 380
Cohen, M., 278304, 309324
Colantonio, S., 586, 589, 591, 606607, 613
Coleman, M., 495
Coleman, P.D., 292
Colgate, J.E., 553, 562, 564, 568
Coller, B.S., 490
Collet, A., 152
Collins, D.L., 507, 520543
Collins, J., 474
Collomosse, J., 465

685

Author Index
Colquhoun, H. Jr., 229230
Colucci, D., 537538
Comeau, C.P., 60
Comet, B., 587, 592, 606607, 609
Comport, I., 152
Conditt, M.A., 495, 509
Conway, F., 314
Cooper, D.H., 296
Cooper, J., 464
Cooper, S., 333
Cooperstock, J.R., 324, 541
Coote, A., 42
Coquillart, S., 555
Correia, A., 411
Corroy, C., 587, 592, 606607, 609
Corso, J., 231, 234
Costa, F., 509
Costabile, M.F., 415
Costello, S.B., 372373
Coste-Manire, E., 502
Cote, W., 487
Cotin, S., 528, 537
Cottle, R., 567
Coughlin, J., 629, 636
Courtecuisse, H., 556
Crampton, C., 461462
Crawford, J.R., 537538
Crockett, M., 50
Crossley, F., 242, 246248
Csoma, C., 504
Cui, L.F., 647
Cui, Y., 647
Culbertson, H., 250
Cummins, M., 446
Cundy, T.P., 508
Cunningham, D., 478
Curatu, C., 119
Curone, D., 588589, 593, 603, 607, 611, 613
Currie, M., 501
Cutkosky, M.R., 232, 234, 241, 556
Cynader, M., 292

D
Daeuber, S., 506
Dai, D., 463, 478
Dai, F., 333, 359
Dalton, R.J. Jr., 315
Daly, L., 322
Daly, M.J., 496, 507
Dandekar, K.R., 639
Dandekar, O., 499
Dankelman, J., 508
Dario, P., 537, 593
Darzi, A.W., 508
Dator, J., 49
Datta, A., 438

DAttanasio, S., 537


Davenport, K.P., 488
Davies, B., 495
Davies, E., 508
Davis, F.D., 417, 421
Davis, L.S., 321
Davis, T., 337, 346
Dawe, G., 231
De Anegeli, A., 415
De Buck, S., 500
de Cunha, D.A., 502, 531
de la Plata Alcalde, J.-P., 556
de Lima, E.P., 46
De Long, H.C., 654655
de los Ros, S., 415
De Rossi, D., 636
de Sorbier, F., 461462
Debevec, P., 464
Debus, J., 506
Declerck, J., 536
Deen-Williams, D., 608
DeFanti, T., 231
Deguet, A., 504
Dekeyser, F., 442, 447
del Mar Villafranca, M., 415
Delgado, J.J.L., 250
Deligeorges, S., 314
Delingette, H., 536537, 571
Delisle, J., 487
Demertzis, N., 491
Denby, B., 333
Deng, J., 653
Denis, M., 312
Dennerlein, J.T., 232, 234, 241
Dequidt, J., 528, 537
Derin, I., 655
Desai, A.S., 509
Deschamps, F., 492
Deshazer, H.D., 647
DeVaul, R.W., 586, 589, 592, 600, 607613
Dhome, M., 442, 447, 450, 454
Di Rienzo, M., 587, 592, 606, 610
di Urbina, J.O., 536
Dias-Lima, M., 653
Didier, J.Y., 232, 234
Diepold, K., 321
Dierk, N., 69, 73
DiGioia, A.M., 504, 535, 539
Dillmann, R., 508, 529, 533536
DiMaio, S., 502
Dimitrov, S., 554
Dion, G., 639657
Dipietro, L., 593
Dittmar, A., 587, 592, 606607, 609
Divitini, M., 418
Dixon, B.J., 496, 507
Do, E., 18

686
Dodson, A.H., 333, 374, 377
Doermann, D., 197
Dogan, S., 505
Dohi, T., 505, 522
Doi, K., 303
Doil, F., 435
Dolveck, F., 594, 597, 599, 606608, 613
Donath, J., 288
Dong, S., 331392
Dong, X., 500, 538539
Dong, Y., 234
Dori, Y., 500
Dou, M., 69, 73
Dow, S., 263, 266
Doyle, R., 52
Drageset, L., 664
Drake, S.H., 537538
Dramis, A., 509
Drettakis, G., 461462
Drif, A., 232, 234
Drouin, S., 528
Drugge, M., 664
Drummond, T.R., 126
Drummond, T.W., 152, 192, 198, 207, 221,
438,441, 444
Duckworth, G., 313
Duda, R.O., 315, 321, 363
Dudnik, G., 588589, 593, 603, 607, 611, 613
Duerk, J.L., 503, 531
Duh, H.B.L., 236
Dujovny, M., 522
Dumindawardana, U.C., 315
Dunn, B., 643
Dunne, L.E., 619638
Dnser, A., 412, 415416, 426
Dunston, P.S., 333334, 385
Duraiswami, R., 321
Duriez, C., 556
Durkin, D.P., 655
Durlach, N.I., 278, 324
Dutre, P., 458
Dyatkin, B., 642
Dymarkowski, S., 500

E
Eade, E., 438
Eagleson, R., 535
Eble, T., 461462
Ector, J., 500
Edwards, A.D.N., 312
Edwards, P.E., 488
Edwards, P.J., 502, 531532, 541
Eggers, G., 506, 522, 529, 533534
Eggers, J., 302
Ehlerding, A., 493
Eid, A., 491

Author Index
Eisenberg, M., 621, 629
Eissele, M., 471472
Elgort, D.R., 503, 531
Elhawary, H., 508
Eliason, F.A., 13
Ellis, J.B., 288
Ellis, R.E., 231, 233234, 237
Ellsmere, J., 496
Elson, D., 508
El-Tawil, S., 358359
Endo, T., 68
Engdegrd, J., 314
Engelke, T., 322, 415
Entrena, M., 288
Eom, S., 231
Esrafilzadeh, D., 653
Ess, A., 207208
Ettinger, G.J., 496
Ettinger, G.L., 487
Euler, E., 538539
Evans, A., 333
Evenhouse, R., 231
Everest, F.A., 296, 323
Ewald, H., 594, 597, 599, 606
Ewers, R., 532

F
Fahln, L., 293, 317
Falch, C., 314
Faldowski, R., 250
Falk, V., 502
Fallavollita, P., 501
Faller, C., 314
Fan, F., 655
Fang, X., 653
Faranesh, A.Z., 500
Farrant, A., 436
Faure, F., 460, 473
Fauvarque, J.F., 642
Fayad, L.M., 504
Fayos, Z.A., 654
Fedkiw, R., 564, 570
Feifer, A., 487
Feiner, S., 63, 80, 153, 227, 263, 288, 312, 333,
373, 458, 662
Feiner, S.B., 63
Felfoldy, G.L., 10
Fellner, D., 376
Fels, S.S., 324
Feng, C., 335, 386
Feng, M., 359
Feng, Z., 236
Fenlon, M.R., 532
Feriche, M., 358
Fernando, O.N.N., 315, 319
Fernie, A., 70

687

Author Index
Ferratini, M., 587, 592, 606, 610
Ferreira, M., 438
Ferretti, L., 552
Feuerstein, M., 499, 507508, 535, 542
Feussner, H., 508
Ficco, M., 289, 323
Fichtinger, G., 504
Fickas, S., 662
Fields, B., 314
Fiene, J., 554
Figl, M., 502, 532
Fischer, E., 302
Fischer, G.S., 504
Fischer, J., 457479, 542
Fischer, M., 383
Fischer, M.A., 126
Fischler, M.A., 186, 192, 196, 213, 216
Fisher, R., 69
Fitzgibbon, A.W., 152
Flanagan, P., 3153
Fleig, O., 527, 532533
Flohr, D., 478
Florea, L., 231
Floyd, A.J., 495
Fogel, M.A., 500
Foglia, E., 509
Foley, K.T., 491
Foley, M.P., 654
Foner, L., 312
Fong, T., 509
Fontana, F., 554
Fontecchio, A.K., 639
Foo, J.L., 487
Forest, C., 536537
Fornari, M., 509
Forrest, N.N., 234
Fortin, P.-A., 345
Fouad, H., 291
Fournier, A., 464
Foxlin, E., 198, 222
Frahm, J.M., 186, 192
Franklin, J., 42
Fraser, D.S., 665
Fraunhofer, I.Z.M, 622, 633
Freed, D.J., 287
Fregonese, L., 359
Freiman, M., 497
French, L.M., 199, 209, 212
Freudenthal, A., 509, 536
Frey, C., 42
Frey, M., 234
Freysinger, W., 498
Friedrich, W., 438
Frisoli, A., 553554, 556
Frith, A., 632
Fritz, D., 538
Fritz, J.P., 250

Froehlich, B., 555556


Fu, J.L., 556
Fu, Y., 652653
Fua, F., 126
Fua, P., 212213, 389
Fuchs, H., 69, 73, 76, 503, 537538
Fuchs, P., 232
Fuhrmann, A., 460, 473
Fuhrmann, A.L., 535536
Fukamachi, S., 552, 554
Fukuda, Y., 359
Fukui, Y., 552, 554
Fukumoto, M., 234, 299
Fukushima, S., 234235
Fullerton, C.E., 376
Funkhouser, T., 280
Furness III, T.A., 60, 278
Furuhashi, Y., 533
Fusaglia, M., 502
Fussell, S.R., 664665, 673

G
Gabbard, J.L., 79
Gabriel, T.H., 45
Gaikwad, A.M., 655
Gaikwad, V., 415
Gain, B., 17
Gamper, H., 301
Gandy, M., 263
Gangi, A., 496
Ganster, H., 198
Gao, H., 652
Gao, S., 411, 426
Garcia-Hernandez, N., 554
Gardiner, M., 267
Garg, M., 18
Garner, W.R., 10
Garre, C., 552577
Gaston, R.P., 502, 531
Gavaghan, K.A., 502, 506
Gay-Bellile, V., 433455
Gaye, L., 314
Gazzola, V., 50
Gedenryd, H., 290
Gelfand, N., 209
Gemperle, F., 552, 608
Geng, J.F., 646, 655656
George, A.K., 500
Georgel, P., 333
Georgi, M., 588, 593, 607
Gerard, I., 528
Gerhard, M., 415
Gerling, G.J., 234
Geronazzo, M., 321
Gersak, B., 536
Gershenfeld, N., 46

688
Gervautz, M., 460, 473
Gewirtz, J., 554
Gholamvand, Z., 653
Ghosh, A., 652
Giannachi, G., 260, 271
Gibson, D., 315
Gibson, J., 13
Gilbert, G., 313
Gilkey, R.H., 281
Gillespie, M.J., 500
Gilliland, S., 18, 22
Ginsburg, D., 459
Gioberto, G., 619638
Gionis, A., 215
Giraldez, J.G., 533
Glatz, A.C., 500
Gleason, L., 533
Gleason, P.L., 530
Gleeson, B.T., 553, 555
Gleeson, M.J., 531532
Glossop, N., 505
Gockel, T., 534
Goddard, M.S., 495
Godinez, C., 499
Gogotsi, Y., 639657
Gokturk, B.S., 346
Golparvar-Fard, M., 333334
Gomes, P., 438, 495
Gomez, R., 536
Gomila, C., 464
Gondan, M., 497
Gong, J., 376
Gong, R.H., 500
Gonzalez, R., 185, 202
Goodger, N.M., 532
Gool, L.V., 207208
Gopal, P., 233234
Gordon, I., 441
Gorgutsa, S., 653, 655
Grtler, J., 508, 530
Gosselin, F., 556
Goto, A., 589, 593, 608, 5888
Gottschalk, S., 380
Gounaris, M., 413
Gozen, A.S., 487, 538
Gracia, A., 553, 568
Grange, S., 509
Gransther, P.A., 411
Graser, A., 334
Grasset, R., 80, 156
Grtzel, C., 509
Greenberg, R.M., 293
Greenebaum, K.,
Greenhalgh, C., 293, 317
Greer, D., 461462
Greiner, G., 536
Grimmer, K., 17

Author Index
Grimmer-Somers, K., 17
Grimson, W.E., 487
Grimson, W.E.L., 532533
Grimson, W.L., 496
Grioli, G., 553
Groch, A., 508
Grodski, J.J., 400
Groen, F., 466
Grollmann, B., 152
Grosch, T., 461462
Gross, M., 552577
Grosshauser, T., 234, 314
Gruetzner, P.A., 500, 538539
Grundmann, U., 490
Grzeszczuk, R., 153
Gu, J.F., 653
Guan, T., 345
Guan, W., 152153, 169
Gugino, L., 487
Guha, S.K., 234
Guillerminet, O., 303
Guiraudon, G.M., 501, 535
Guitton, P., 478
Gler, ., 500
Gumerov, N.A., 321
Gunawan, A., 464
Gunkel, A.R., 498
Gunn, C., 231
Gnthner, W.A., 436
Guo, A., 22
Gurdjos, P., 440
Gurung, J., 505
Guruswamy, V.L., 250
Gustafsson, T., 506
Gutierrez, O., 312
Gutt, C.N., 499, 535536
Guven, E.O., 487, 538
Guven, S., 487, 499, 538
Gyorbir, N., 293, 315

H
Ha, I., 411
Ha, T., 232, 234, 236
Ha, Y., 73
Haahr, M., 314
Haas, C.T., 376377
Haberlin, B., 260
Habigt, J., 321
Habigt, T., 321
Hachet, M., 478
Hachisu, T., 234235
Haddadi, A., 242
Hadzic, A., 495
Haefke, M., 594, 597, 599, 606
Haffner, P., 208
Hager, G.D., 233234, 492, 502

Author Index
Hakime, A., 492
Hall, N.C., 493
Hallen, B., 75
Haller, M., 407, 476477
Hamacher, V., 302
Hamilton, H, 22
Hammad, A., 385, 412
Han, C., 234, 655
Han, J.-H., 234, 345
Han, S.H., 233
Handa, J., 233234
Handel, S., 314
Hanel, R., 502, 532
Hansen, C., 507, 541
Hanuschik, M., 502
Haouchine, N., 528, 537
Haralick, R., 202
Harashima, H., 486
Harbisson, N., 47, 312
Harders, M., 227251
Haren, S., 267
Hrm, A., 288
Harms, C., 672
Harms, H., 586, 589, 591593, 600, 602603,
607, 610611, 613, 615
Harms, W., 506
Harrigan, P., 260
Harris, M.A., 500
Harris, S., 495
Hart, G.C., 362
Hart, P.E., 363
Hartle, R., 127, 132
Hartley, R., 192, 442
Hartmann, W.M., 283
Hasegawa, S., 552, 554
Hashimoto, S., 400
Hashizume, M., 487, 535, 537
Hashtrudi-Zaad, K., 242
Hassenpflug, P., 536
Hassfeld, S., 506, 534
Hata, N., 522
Hattori, A., 537
Hattori, T., 303
Haugstvedt, A.-C., 411428
Haus, G., 315
Havemann, S., 376
Haverhals, L.M., 654655
Hawkes, D.J., 502, 531532, 541
Hayes, G., 263
Hayford, M.J., 65
Hayward, V., 250, 553554, 577
Haywood, K., 413
Hebert, C., 290, 311
Hebert, M., 152
Hebert, P., 345
Hedau, V., 153
Heibel, H., 152

689
Heiber, T., 662
Heidbuchel, H., 500
Heiestad, S., 664
Heilig, M., 60
Hein, A., 497
Heining, S.M., 499, 501, 507, 538539, 541542
Held, R.M., 324
Helferty, J.P., 498
Hella, L., 421
Heller, E.J., 297
Hellmuth, O., 314
Hellwich, O., 536
Helmsen, J., 199, 209, 212
Henckel, J., 495
Hendrix, C., 61
Henze, N., 613614
Heon, M., 642, 647650
Herbst, I., 298
Herder, J., 282
Hermann, T., 234, 314
Hernandez, F., 552577
Herre, J., 314
Herrmann, K., 493
Hertzmann, A., 465
Hesina, G., 460, 473
Hestnes, B., 664
Higgins, W.E., 498
Hiipakka, J., 288
Hildebrand, P., 492
Hill, D.L.G., 502, 531532
Hiller, B., 334
Hilliges, O., 267, 474
Hilpert, J., 314
Hinkle, G.H., 493
Hinterstoisser, S., 333
Hinze, J., 376
Hiroka, S., 87
Hirokazu, K., 386
Hirose, M., 78, 312
Hirota, G., 503
Hirota, K., 312
Hix, D., 79
Hmam, H., 126
Ho, C., 234
Hoelzer, A., 314
Hoermann, S., 474
Hoever, R., 232
Hoffmann, J., 508, 534
Holey, P., 415
Holland, D., 8
Holland, S., 290
Hollands, R., 333
Hllerer, T.H., 7980, 174192, 263, 662
Hollins, M., 250
Hollis, R.L., 233234
Holman, T., 299
Holmes, G., 311

690
Holmes, O.W., 44
Holmquist, L.E., 314
Holt, R.E., 292
Hong, J., 535
Hong, K., 233234
Hong, K.-S., 593
Hong, L.X., 497
Hong, S., 596, 598, 602, 607
Honkamaa, P.,
Hoogen, J., 234
Hoppe, H., 506, 534
Hori, T., 498, 531
Horowitz, M.B., 504
Horschel, S.K., 555
Hoshi, K., 505
Hoshi, T., 233234
Hou, S., 652653
Houde, A., 630
Houtgast, T., 292
Howe, R.D., 556
Howells, J., 40
Howes, D., 46, 52
Hsiao, E., 152
Htoon, M.M., 293
Hu, C., 652653
Hu, L.B., 647
Hu, M.H., 539
Hu, Y., 652653
Hu, Z., 126
Hua, H., 75, 86, 117119
Huang, J., 300
Huang, P., 641
Huang, W., 666
Hubbard, A., 314
Huber, K., 532
Huber, P.J., 181, 183
Hughes, C., 268
Hughes, D., 268
Hughes-Hallett, A., 508
Hugues, O., 232
Huhle, B., 473
Hull, R., 314
Hummel, J., 502, 532
Hunt, K., 242, 246248
Hutchings, R., 314
Hutchins, M., 231
Hutchinson, R.C., 597, 600, 606, 609
Huttenlocher, D., 192
Hwang, K., 234
Hwang, P., 497
Hyakumachi, T., 503

I
Iannitti, D.A., 492
Ibrahim, M., 416
Ieiri, S., 499, 535

Author Index
Ieong, E., 594596
Igarashi, T., 400
Iida, K., 281
Ikeda, A., 234
Ikeda, M., 300, 376
Ikeda, S., 152
Ikei, Y., 312
Ikuta, H., 588589, 593, 608
Iltis, R.A., 219
Imhof, H., 502, 532
Imielinska, C., 487
Imlab, C., 39
Inami, M., 231, 233234, 400, 405407
Indugula, A.P., 555
Indyk, P., 215
Inomata, T., 505
Inoue, T., 594, 597, 599, 602603
Ioannidis, N., 265, 413
Iordachita, M., 233234
Irish, J.C., 496, 507
Irschara, A., 186, 192
Irving, G., 564
Isard, M., 192, 215
Iseki, H., 498, 522, 531
Isenberg, T., 465
Ishida, A., 400
Ishii, H., 46, 52, 324
Ishii, M., 233
Ishimaru, S., 595, 598
Ismail, S., 22
Ito, M., 503
Itoh, T., 630
Itoh, Y., 79
Iwata, H., 234
Iwaya, Y., 281
Izadi, S., 267, 474

J
Jacobs, J., 555556
Jacobs, S., 502
Jakimowicz, J.J., 487
Jakka, J., 288
Jakopec, M., 495
Jakubowicz, J., 180, 363
Jalili, R., 653
James, D.L., 232, 556, 567
Jan, M.F., 263
Jank, E., 493
Jannin, P., 508, 520543
Jappinen, J.,
Jarchi, D., 594596
Javidi, B, 117118
Jelle, T., 416
Jenkin, M.R., 280
Jenkins, S., 23
Jeon, S., 227251

691

Author Index
Jeong, J., 345
Jeong, S., 647
Jesberger, J.A., 503, 531
Jessel, J.P., 666
Jewell, D., 531532
Ji, Y., 359
Jiang, B., 152, 198
Jimenez, J.M., 536
Jin, C., 288
Jinnah, R.H., 495
Jiroutek, M., 503
John, T.K., 495
Johnson, A., 231
Johnson, L.F., 413
Johnson, L.G., 502, 507, 541
Jolesz, F.A., 530, 532533
Jones, D.L., 535
Jones, M., 311, 588589, 593, 629630
Jones, S., 311
Jones, V.M., 500
Jonker, P., 666
Jonker, P.P., 508
Joskowicz, L., 497
Jost, K., 639657
Jot, J.-M., 280, 282
Jouffrais, C., 312
Joung, M., 234
Julier, S.J., 79, 152, 227, 333, 373, 458, 667
Jun, K., 314
Jundt, E., 434
Jung, C., 231
Jung, H.-R., 652, 655
Junghanns, S., 376

K
Kaaresoja, T., 234
Kaasinen, E., 411
Kaczmarek, K.A., 61
Kagotani, G., 400
Kahl, F., 127
Kahn, S., 440
Kahrs, L.A., 506, 531532
Kajimoto, H., 231, 233234, 552, 554
Kajinami, T., 78
Kakeji, Y., 499
Kalkofen, D., 80, 478, 507
Kallmayer, C., 622, 633
Kalra, A.K., 234
Kamat, R.V., 383
Kamat, V.R., 331392
Kameas, A., 324
Kamijoh, N., 594, 597, 599, 602603
Kammoun, S., 312
Kamuro, S., 552, 554
Kn, P., 469
Kanade, T., 233234, 389, 438, 504, 535, 539

Kanbara, M., 373, 468469


Kancherla, A.R., 487
Kane, R., 496
Kane, T.D., 499
Kaneko, M., 486
Kaneko, S., 359
Kang, S., 377
Kang, X., 499
Kapoor, A., 233234
Kapralos, B., 280
Karahalios, K., 288
Karigiannis, J., 265, 413
Karimi, H., 412
Karjalainen, M., 288
Karlof, K., 250
Karstens, R.W., 287
Kasabach, C., 552, 608
Kasai, I., 68
Kati, D., 508, 529530, 533534
Kato, H., 65, 127, 131, 230, 281, 385, 462
Kato, K., 503
Katz, B.F.G., 303, 312
Katz, D., 250
Kaufman, A., 556
Kaufman, D.M., 567
Kaufman, L., 62
Kaufman, S., 487
Kaufmann, H., 440, 469
Kautz, J., 667
Kawaguchi, M., 315
Kawakami, N., 231, 233234, 552, 554
Kawamata, T., 498, 531
Kawamura, R., 234
Kawanaka, H., 499
Kazanzides, P., 233234
Ke, Q.F., 215
Keeve, E., 493, 534
Keil, J., 414415
Keita, F., 594, 597, 599, 606608, 613
Keller, K.P., 73, 76, 503, 537538
Kelley, K., 40
Kelley, L., 51
Kellner, F., 345
Kendall, G.S., 287
Kendoff, D., 509
Keounyup, C., 451
Kerner, K.F., 487
Kerouanton, J.-L., 413
Kerr, K., 371
Kerrien, E., 528, 537
Kersten-Oertel, M., 507, 520543
Keysers, C., 50
Khallaghi, S., 497
Khamene, A., 487, 503, 531532, 538539
Khan, M., 493
Khan, M.F., 505
Kheddar, A., 232

692
Kiaii, B., 501
Kichun, J., 451
Kijima, R., 76
Kikinis, R., 487, 496, 530, 532533
Kim, B., 596, 598, 602, 607
Kim, C.-H., 376377, 554
Kim, D., 474
Kim, H., 250, 506, 554, 652
Kim, J., 126, 231, 652
Kim, J.C., 234
Kim, J.H., 219
Kim, J.-H., 593
Kim, J.M., 653
Kim, K., 652, 655
Kim, S., 234, 300
Kim, S.H., 586, 590, 606607, 653
Kim, S.-Y., 234, 554
Kim, W., 234
Kim, Y., 231
Kim, Y.-H., 300
Kim, Y.S., 593594, 597, 599
King, A.J., 292
King, A.P., 502, 531532
King, K., 377
King, L.E., 292
King, S., 314315
Kini, A.P., 377
Kinkeldei, T.W., 621
Kirk, D.S., 665
Kirsch, N.J., 639
Kirstein, T., 621622
Kishino, F., 4, 74, 259, 298, 486, 520
Kishishita, N., 71
Kitano, H., 400
Kiyokawa, K., 6081
Klatzky, R., 504
Kleemann, M., 492
Klefstat, F., 587, 592, 606607, 609
Klein, G., 185, 192, 442, 444, 459, 463, 467468
Klein, J., 497
Klein, M., 497
Kleiner, M., 280
Klette, R., 126
Klinker, G., 79, 412, 436, 535
Klopschitz, M., 192
Knight, J.F., 587, 589, 592, 600, 603, 608612
Kndel, S., 478
Knoerlein, B., 231
Knopp, M.V., 493
Kobayashi, E., 505
Kobbelt, L., 192
Koch, D.G., 65
Koch, R., 345346, 434
Kockro, R.A., 497
Koda, K., 506
Kodama, K., 406
Koehring, A., 487

Author Index
Kohli, P., 474
Kojima, M., 405406
Kolb, A., 508
Komor, N., 18
Konishi, K., 487, 499, 535
Konolige, K., 208, 212213
Koo, B., 383
Koo, J., 233234
Koppens, J., 314
Korah, T., 153
Korb, W., 540
Korkalo, O.,
Kornagel, U., 302
Kortuem, G., 662, 664, 667
Kosa, G., 232
Kosaka, A., 533
Koschan, A., 126
Kser, K., 345
Kosuge, T., 665
Kosugi, C., 506
Koto, T., 400
Koulalis, D., 491
Kourogi, M., 665, 668
Kramer, A., 673
Kraus, M., 471472
Krauss, R.M., 666
Kraut, R.E., 662664, 666
Kreaden, U., 502
Krebs, D.E., 588589, 594, 607
Krempien, R., 506
Kress, B., 19, 86123
Kreuer, S., 490
Krishnan, S., 358
Krogstie, J., 411428
Krueger, T., 497
Krug, B., 497, 534
Kruger-Silveria, M.K.-S., 438
Kruijff, E., 71
Kry, P.G., 556
Kbler, A.C., 497, 534
Kuchenbecker, K.J., 228, 232, 234, 250, 554
Kuijper, A., 440
Kukelova, Z., 127, 129130, 137, 140, 145
Kumar, A., 233234
Kumar, R., 192
Kumar, S., 233234
Kuntze, A., 271
Kunze, K., 595, 598
Kunze, S., 532
Kuorelahti, J., 28
Kurata, T., 152, 665, 668
Kurihara, T., 556
Kurita, Y., 234
Krkloglu, M., 500
Kuroda, T., 588589, 593, 608
Kurzweg, T.P., 639
Kurzweil, R., 4950

693

Author Index
Kutter, O., 493
Kuzuoka, H., 665, 668
Kwon, D.-S., 554
Kwon, S., 377
Kwon, Y.H., 652, 655
Kyprianidis, J., 465
Kyriakakis, C., 320
Kyung, K.-U., 231, 234, 554

L
La Mantia, F., 647
La Palombara, P.F., 537
Labbe, B., 454
Labrune, J.-B., 46, 52
Lackner, C., 556
Laitinen, M.-V., 300
Lakatos, D., 46, 52
Lan, Z.-D., 126
Landerl, F., 477
Lang, J., 250
Lang, J.E., 495
Lang, J.W., 650
Lang, P., 198
Lang, T., 478
Langenstein, M., 655
Langlotz, F., 490
Langlotz, T., 80
Lanman, D., 69, 73
Lanzilotti, R., 415
Laperrire, R., 561
Largeot, C., 641
Larnaout, D., 433455
Laroche, F., 413
Larsen, E., 380
Lasorsa, Y., 322
Lasser, T., 493
Lau, K.N., 492
Lauffer, M., 594596, 600, 607, 609611
Lazebnik, S., 215, 441
Le, V.T., 652
Lechner, M., 322
LeCun, Y., 208
Lecuyer, A., 233
Lederman, R.J., 500
Lee, H., 234, 314
Lee, H.M., 359
Lee, I., 233234, 359, 436
Lee, J., 263, 503, 554
Lee, J.A., 653
Lee, J.B., 621, 638
Lee, J.-Y., 231
Lee, K.K., 588589, 594
Lee, P.Y., 539
Lee, S., 333334, 554, 596, 598, 602, 607
Lee, S.-G., 593594, 597, 599
Lee, S.H., 345

Lee, W.-S., 250


Lefvre, D., 413
Lehn, D., 629630
Lei, P., 499
Leibe, B., 192
Leigh, J., 231
Leitner, J., 407
Lemmel, S., 672
Lemordant, J., 322
Leonov, V., 320
Lepetit, V., 126, 212213, 345, 389, 468
Lerotic, M., 508
Lessoway, V.A., 497
Lester, J., 370371
Leu, H.J., 639
Leutenegger, S., 212213
Leventon, M.E., 487
Levine, A., 413
Levola, T., 100
Levy, M., 314
Lewin, J.S., 503, 531
Lhuillier, M., 442, 448
Li, H., 127, 300, 411
Li, L., 460462, 629, 646, 654655
Li, M., 233
Li, P., 650, 652
Li, W.W.L., 219
Li, X., 650, 652
Li, Y., 192, 556, 564, 629
Li, Z., 650, 652
Lian, K., 646, 650, 652, 656
Liang, J., 668
Liang, T.-P., 411
Liao, C.Y., 197
Liao, H., 505
Liapi, K.A., 376377
Lieberman, J., 554
Livin, M., 534
Lin, M.C., 380, 552
Lin, R., 641
Lindt, I., 298
Linte, C.A., 486510, 535
Linz, T., 622, 633
Lira, I.H., 487
Littler, M., 42
Liu, D., 23
Liu, J., 588589, 593
Liu, K.C., 539
Liu, M.L., 653
Liu, N., 655
Liu, Q., 197
Liu, S., 75, 86
Liu, W.P., 503
Liu, W.W., 650
Liu, X., 197, 477
Liu, Y., 411, 655
Livingston, M.A., 79, 537538

694
Llach, J., 464
Lloyd, J.E., 232
Lo, B., 594596, 606
Lo, J., 535
Lobe, T., 487
Lobes, L.A. Jr., 504, 532
Locher, I., 621
Lockhart, T.E., 588589, 593
Lokki, T., 281, 288, 300, 314
Longridge, T., 70
Lonner, J.H., 495, 509
Loomis, J.M., 290, 311
Lopez, E., 447
Lopez, J., 371
Lpez-Nicolsb, C., 411
Lorensen, W., 533
Lorho, G., 288
Loriga, G., 588589, 593, 603, 607, 611, 613
Lorussi, F., 636
Lossius, T., 322
Lotens, W., 61
Lothe, P., 447
Louis, J., 345
Lourakis, M., 348
Lourakis, M.I.A., 566, 573
Lovejoy, J., 500
Lovo, E.E., 487
Lowe, D.G., 160, 176, 179, 186, 192, 207208,
389, 441
Loy, G., 281
Lozano-Perez, T., 496
Lu, J., 477
Lu, K., 498
Lu, M., 333, 359
Lucas, B.D., 389
Luciano, C., 231
Ludwig, L.F., 315
Ludwig, M.D., 287
Luebke, D., 69, 73, 75
Lueth, T.C., 497
Lukatskaya, M.R., 642
Lukosch, S., 666
Lukowicz, P., 27, 586587, 590, 592,
594597, 599600, 606611,
613614
Luo, X., 22
Lupton, E., 51
Lv, Z., 652653
Lyons, K., 17, 27
Lyytinen, K., 417

M
Ma, W., 655
Maataoui, A., 505
Macaluso, F., 594596, 600, 607, 609611
Macia, I., 536

Author Index
Macintyre, B., 63, 152, 227, 263, 266, 268, 270,
322, 333, 373, 458, 477478, 662
MacKenzie, I.S., 584
MacLean, K.E., 233
Macq, B., 533535
Maeda, N., 404
Maeda, T., 554
Maehara, Y., 535
Maeno, T., 314
Maes, F., 500
Magas, M., 314
Magenes, G., 588589, 593, 603, 607, 611, 613
Magerkurth, C., 314
Magnenat-Thalmann, N., 561
Maguire, J.S., 40
Maguire, Y., 16
Mahalik, N.P., 231
Mahvash, M., 241
Maier-Hein, L., 497, 508
Maimone, A., 69, 73
Majdak, P., 322
Majno, P.E., 502
Makino, Y., 314
Malham, D.G., 292
Malhi, K., 594, 597, 599, 606
Mallem, M., 152
Mallet, E., 589, 594, 606
Malliopoulos, C., 587, 589
Malvezzi, M., 552555
Mandal, P., 646, 655656
Mandryk, R.L., 314
Manduchi, R., 476
Mann, S., 4, 33, 39, 48, 80, 661, 663
Mannava, S., 495
Manocha, D., 380
Marayong, P., 233
Marcacci, M., 537
Marchand, E., 139, 152
Marchand, H., 152
Marcus, H.J., 508
Marcus, J., 413
Marescaux, J., 496, 536537
Margetts, M., 46
Margier, J., 509
Mariani, J., 293, 317
Mariette, N., 303
Marino, S., 564, 570
Marmulla, R., 506, 522, 534
Marner, M.R., 267, 434
Martelli, S., 537
Martens, W.L., 282, 284, 287, 292, 296, 300,
317, 321
Martin, A., 288
Martin, A.D., 499
Martin, E.W. Jr., 493
Martin, J.S., 60
Martin, R., 552, 608

Author Index
Martin, T., 588589, 593, 629630
Martinez, J., 345
Martinez, J.C., 352, 384
Martinie, J.B., 492
Martins, R., 86
Martz, P., 339
Marui, A., 296
Masamune, K., 504
Mashita, T., 71, 79
Masri, S.F., 359
Massie, W., 333
Master, N., 43
Matas, J., 179, 186, 192, 215
Mateas, M., 266
Mather, T., 554
Mathes, A.M., 490
Mathieu, H., 504
Matias, E., 584
Matsubara, H., 400
Matsuno, F., 400
Matsushita, S., 595596, 612
Matthews, J., 263
Matula, C., 502, 532
Matusik, W., 577
Mauer, E., 673
Maurer, C.R. Jr., 502, 531532
Maurer, U., 594, 597, 599, 602603, 609
Mavor, A.S., 278
Mavrogenis, A.F., 491
Mavrommati, I., 324
May, M., 290
Mayer, E.K., 508
Mayol, W.W., 665
Maz, R., 314
McAllister, D.F., 291
McCaffrey, E., 47
McCollum, H., 60
McDonough, J.K., 646650, 656
McGookin, D.K., 293
McGrath, J., 495
McGuire, J., 71
McGurk, M., 532
McKillop, I.H., 492
McKinlay, A., 43
McNeely, W.A., 562
McQuillan, P.M., 495
Megali, G., 537
Meier, P., 152, 435
Meinzer, H.-P., 497, 499, 536
Meis, J., 415
Melamed, T., 314
Melchior, F., 300
Melzer, J., 86
Menassa, R., 436
Mendez, E., 333334, 376, 478, 507
Meng, Q., 653
Meng, Y., 652653

695
Menk, C., 434
Merlo, C., 666
Merloz, P., 491
Merritt, S.A., 498
Mershon, D.H., 292
Metje, N., 372373
Metzger, J.-C., 227251
Metzger, P.J., 486
Meyer, A.A., 537538
Meyers, K., 263
Meyrueis, P., 97
Mhling, J., 534
Miao, M., 653
Mikolajczyk, K., 207
Milanese, S., 17
Milgram, P., 4, 229230, 259, 298, 400, 486, 520
Milios, E., 280
Miller, E., 487
Miller, M.D., 662663, 666
Mills, J.E., 383
Milsis, A., 587, 589
Mimidis, G., 491
Minamizawa, K., 234, 552, 554
Mine, Y., 303
Miranda, E., 358
Mirow, L., 492
Mischkowski, R.A., 497, 534
Misra, M., 522
Mitake, H., 552, 554
Mitchell, B., 233234
Miyake, K., 630
Miyaki, T., 594596, 603
Miyano, G., 487
Miyata, N., 556
Miyazaki, J., 131
Mizell, D.W., 63
Mockel, D., 504
Moffitt, K., 86
Mofidi, A., 495
Mohr, F.W., 502
Moital, M., 411
Mojzisik, C.M., 493
Mok, K., 528
Molina-Castillo, F.J., 411
Molyneaux, D., 474
Monden, M., 497, 535
Monitoring, D., 359
Montola, M., 260
Moore, D.R., 292
Moore, J., 501, 505, 535
Moore, J.T., 501
Mooser, J., 152, 156, 158, 162, 169
Mora, S., 418
Morandi, X., 508
Morel, J.-M., 180, 363
Morel, P., 502, 506
Moreno, E., 268

696
Moreno-noguer, F., 126
Mori, H., 79
Morphett, J., 662
Morse, D.R., 290
Mote, C.D., 564
Motwani, R., 215
Mountain, D., 314
Mountney, P., 508
Mouragnon, E., 442
Mourgues, F., 502
Mudur, S.P., 385
Mueller, S., 461462
Mhling, J., 506, 522, 534
Mukhopadhyay, S.C., 594, 597, 599, 606
Muller, K., 503
Mller, M., 497, 499, 560
Mller, S., 436
Muller-Stich, B.P., 535536
Mulley, B., 587, 589, 592, 600, 602603, 609, 612
Mulligan, J., 668
Mulligan, L., 632
Mulloni, A., 152
Munaro, G., 586, 589, 591, 606607, 613
Munshi, A., 459
Munter, M.W., 506
Munzenrieder, N., 621
Mura, G.D., 636
Murakami, M., 589, 593, 608, 5888
Murphy, D., 153
Murray, D.W., 185, 192, 442, 444, 459, 463,
467468, 665
Murray-Smith, R., 672
Murrey, D.A. Jr., 493
Musicant, A.D., 322
Mussack, T., 499
Mutschler, W., 538539
Mutter, D., 536537
Mylonas, G.P., 508
Mynatt, E.D., 288
Myoungho, S., 451

N
Nafis, C., 533
Nagahara, H., 71
Nagata, K., 234
Naimark, L., 198, 222
Najafi, H., 535
Nakada, K., 487
Nakaizumi, F., 234
Nakajima, Y., 497, 535
Nakamoto, M., 487, 497, 499, 535
Nakamura, A., 405406
Nakamura, N., 552, 554
Nakatani, Y., 135
Nakazawa, A., 79
Naliuka, K., 314

Author Index
Nannipieri, O., 232
Narayanaswami, C., 594, 597, 599, 602603
Narita, Y., 359
Narumi, T., 78
Nassani, A., 661677
Nathwani, D., 594596
Naudet-Collette, S., 436, 438, 445, 450
Navab, N., 488, 493, 499, 501, 507508, 541542
Nedel, L.P., 533535
Neely, C., 629630
Neff, R.L., 493
Neider, J., 337, 346
Nelson, L., 324
Neumann, U., 152169, 198, 461462
Newcombe, R., 267, 474
Newman, P., 446
Ng, C., 41
Ng, I., 497
Nguyen, D., 71
Niazi, A.U., 492
Nicol, R., 322
Nicolau, S.A., 496, 536537
Niedzviecki, H., 4
Nielsen, S.H., 292
Niemann, H., 487
Nii, H., 400, 405
Nijmeh, A.D., 532
Nikou, C., 504, 535, 539
Nilsen, T., 314
Nilsson, M., 664
Nishizaka, S., 78
Nistr, D., 179, 186, 192
Nister, F., 127
Nitschke, C., 79
Nitzsche, S., 502
Noda, I., 400
Noessel, C., 324
Noisternig, M., 322
Nojima, T., 231234, 236
Noltes, J., 411
Nordahl, R., 554
Nour, S.G., 503, 531
Noury, N., 587, 592, 606607, 609
Novak, E.J., 509
Numao, T., 135

O
Ocana, M., 447
Ochiai, Y., 233234
Oda, H., 376
Oda, O., 288
ODonovan, A.E., 321
Oezbek, C., 263
Ogasawara, T., 234
Ogawa, T., 78, 469
Ogertschnig, M., 411

Author Index
Ogris, G., 587, 590, 592, 607
Ogundipe, O., 374, 377
Oh, B.H., 652, 655
Oh, S., 314
Ohbuchi, R., 503
Ohlenburg, J., 298
Ojika, T., 76
Okada, K., 131
Okamoto, M., 68
Okamura, A.M., 232, 234, 241
Okuma, T., 373
Okumura, B., 468469
Okur, A., 493
Okutomi, M., 135
Olbertz-siitonen, M., 672
Oliveira-Santos, T., 506
Oloufa, A.A., 376
Olson, E., 390
OMalley, D.M., 493
OMalley, M.K., 234
Omura, K., 74
Onceanu, D., 542
Ontiveros, A., 358
Opdahl, A.L., 417
Oppenheimer, P., 487
Orlosky, J., 71, 7879
Ortega, M., 555
Ortiz, R., 212213
Ortolina, A., 509
Osborne, M., 42
Oskiper, T., 192
Ossevoort, S., 594597, 600, 607611
Otaduy, M.A., 552577
Ott, R., 231, 555
Ou, J., 665, 673
Oulasvirta, A., 28
Ozuysal, M., 389

P
Pabst, S., 564, 571
Pacchierotti, C., 552555
Pacelli, M., 608, 636
Padoy, N., 508
Pagani, A., 463, 478
Pai, D.K., 232, 556, 567
Pajdla, T., 127, 129130, 137, 140, 145, 215
Palmieri, F., 289, 323
Paloc, C., 536
Pandya, A., 497, 531
Pang, G., 153
Pang, J., 567
Papadopoulo, T., 566, 573
Papadopoulos, D., 3153
Papagelopoulos, P.J., 491
Papanastasiou, J., 491
Pape, D., 231

697
Papetti, S., 554
Paradiso, J.A., 233234, 320321, 588589, 594,
606608, 611
Paradiso, R., 587, 589, 603, 608, 636
Parameswaran, V., 153
Pramo, M., 415
Parati, G., 587, 592, 606, 610
Park, G., 234
Park, H.S., 359
Park, H.-S., 435
Park, I.-K., 593
Park, J.I., 345
Park, J.-W., 435
Park, Y., 468
Park, Y.J., 653
Parkes, R., 234
Parnes, P., 664
Parrini, G., 552
Parseihian, G., 312
Parviainen, R., 664
Pasta, M., 647
Pastarmov, Y., 463, 478
Patel, A., 370372
Patel, N., 27
Patel, R.V., 501
Patel, S., 15
Paterson, N., 314
Pattynama, P.M.T., 509
Paul, P., 527, 532533
Pavlik, J., 263
Pece, F., 667
Pedersen, E.R., 324
Peitgen, H.-O., 507, 541
Peli, E., 97
Peltola, M., 314
Pena-Mora, F., 333334
Peng, C., 650
Peng, H., 653
Peng, M., 652653
Pennec, X., 496, 536537
Pentenrieder, K., 435
Pentland, A., 586, 589, 592, 600, 607612
Peres, R., 411
Perey, C., 322
Perez, A.G., 552577
Perez, C.R., 646650, 656
Prez, L., 415
Perlin, K., 465
Pernici, B., 417
Perrin, N.A., 203
Perrott, D.R., 322
Peshkin, M.A., 553
Peterhans, M., 502, 506
Peterlik, I., 528, 537
Peters, C.A., 488, 499
Peters, N., 322
Peters, T.M., 488, 501, 505, 535

698
Petit, A., 139
Peuchot, B., 539
Pfister, H., 577
Philbin, J., 192, 215
Philip, M., 499
Picinbono, G., 571
Piekarski, W., 334, 587, 589, 592, 600, 602603,
609, 612
Pihlajamki, T., 300
Pintaric, T., 440
Pinz, A., 198
Pisano, E., 503
Plant, S., 45
Platonov, J., 152
Platt, J.C., 321
Plaweski, S., 491, 509
Pletinckx, D., 414
Plowman, E.E., 639
Poelman, R., 666
Poissant, L., 4546, 52
Polat, G., 383
Pold, S., 46
Pollard, N.S., 556
Ponamgi, M., 380
Ponce, J., 215, 441
Popovic, J., 593
Porazzi, E., 509
Pornwannachai, W., 646, 655656
Prschmann, C., 322
Porter, S.R., 434
Porter, T.R., 371
Portet, C., 641
Pouliquen, M., 556
Poupyrev, I., 65, 230, 487
Povoski, S.P., 493
Powell, D., 234
Powell, K.D., 100
Prados, B., 415
Prandi, F., 359
Pratt, P.J., 508
Prattichizzo, D., 250, 552555
Preciado, D., 503
Preim, B., 536
Presser, V., 642, 647650
Pressigout, M., 152
Priego, P., 293
Prisco, G.M., 552
Profita, H., 18
Provancher, W.R., 553555
Provot, X., 564, 570
Psomadelis, F., 587, 589, 592, 600,
603,608612
Puder, H., 302
Puebla, M.C., 487
Pugin, F., 502, 506
Pulkki, V., 281, 300
Pulli, K., 209

Author Index
Puterbaugh, K.D., 562
Pylvninen, T., 153

Q
Qi, J., 288
Qiu, Z., 231
Qu, L., 653
Quackenbush, S., 314
Quan, L., 126
Quinn, B., 639
Quinn, S., 51
Quintana, J.C., 487

R
Rabaud, V., 208, 212213
Rabenstein, R., 300
Rabinowitz, W.M., 292
Raczkowsky, J., 532
Rademacher, P., 537538
Raducanu, B., 312
Raghu, S., 22
Raghunath, M., 594, 597, 599, 602603
Rai, L., 498
Raja, V., 234
Rajchl, M., 501
Rambaud, C., 589, 594, 606
Rampersaud, Y.R., 491
Randall, G., 180, 363
Randhawa, R., 492
Rao, R., 250
Rash, C.E., 60, 86
Raskar, R., 537538
Rass, U., 302
Rassweiler, J., 487, 538
Rassweiler, J.J., 497, 499
Rassweiler, M.-C., 497
Rastogi, A., 400
Rathinavel, K., 73
Ratib, O., 506
Rattner, D., 496
Raulot, V., 97
Redon, S., 555
Reed, C., 322
Regatschnig, R., 522
Regenbrecht, H., 436, 473474
Reichert, W.M., 654
Reichherzer, C., 661677
Reichl, H., 622, 633
Reif, R., 436
Reiley, C.E., 502
Reilly, B.K., 503
Reiners, D., 436
Reisner, A., 597, 600, 606, 609
Reiss, A.A., 583615
Reitmayr, G., 152, 192, 198, 221, 438

699

Author Index
Rekimoto, J., 233234, 324, 594596, 603
Rempel, D., 564
Ren, J., 653
Restelli, U., 509
Reyes, M., 506
Rhee, S., 597, 600, 606, 609
Rhodes, B.J., 661662
Ribo, M., 198
Richmond, J.L., 232
Richter, J., 407
Rideng, O., 536
Rieder, C., 507, 541
Riedmaier, T., 321
Riener, R., 234
Rimet, Y., 589, 594, 606
Risatti, M., 588589, 593, 603, 607, 611, 613
Riser, A., 71
Ritter, F., 507, 541
Rivers, A.R., 556
Riviere, C.N., 532
Rizzo, F., 587, 592, 606, 610
Rizzolatti, G., 50
Roberson, D.J., 8
Robert, L., 461462
Roberts, G., 333, 374, 377
Robinett, W., 65
Robinson, J., 668
Roblick, U.J., 492
Rocchesso, D., 281
Rodde, T., 293, 317
Rodriguez, F., 495
Rodriguez Palma, S., 532
Rogers, C.D.F., 372373
Rogers, D.M., 653
Roggen, D., 586587, 589596, 600, 602603,
607, 609611, 613
Roginska, A., 322
Roh, T., 596, 598, 602, 607
Rhl, S., 508, 530
Rohling, R., 497
Rojah, C., 358
Rolland, J.P., 65, 73, 77, 80, 119, 487
Romano, J.M., 228, 232, 234, 250
Romanzin, C., 464
Rome, J.J., 500
Ronayette, D., 589, 594, 606
Rose, E., 461462
Rosenberg, L.B., 233234, 495
Rosenthal, M., 503
Rosin, P.L., 210211
Rosner, M., 81
Rssle, S.C., 51
Rossler, K., 522
Rosso, R., 586, 589, 591, 606607, 613
Rosten, E., 207
Rothbucher, M., 321
Rothganger, F., 441

Roto, V., 28
Roumeliotis, S., 220221
Rousseau, J., 655
Rouzati, H., 322
Rovers, L., 233
Rowe, A., 594, 597, 599, 602603, 609
Rowe, P.J., 495
Rozier, J., 288
Rubino, G., 531532
Rublee, E., 208, 212213
Rueda, O., 536
Ruff, T.M., 376
Ruffaldi, E., 556
Rumsey, F., 281282
Rund, F., 321
Ryoo, D.W., 586, 590, 606607
Ryu, J., 232, 234, 236, 554
Ryu, S.-W., 345

S
Sa, J., 234
Sabatini, A.M., 593
Sadalgi, S., 288
Saeedi, E., 19
Sager, I., 13
Saito, A., 533
Saito, H., 461462
Saito, Y., 76
Saji, A., 300
Sakas, G., 536
Sakata, N., 665, 668
Sakita, I., 497, 535
Sakuma, I., 505
Salari, M., 653
Salas, J., 312
Salb, T., 534
Salisbury, J.K., 556
Salisbury, K., 250, 556
Salsedo, F., 552
Salvatore, G.A., 613614
Salvetti, O., 586, 589, 591, 606607, 613
Samanta, V., 264
Samarasekera, S., 192
Samset, E., 536
San Jos Estpar, R., 487
Sanderson, P., 23
Sandin, D., 231
Sandler, M., 314
Sandor, C., 231
Sanford, J., 594, 597, 599, 602603
Sansome, A., 436
Santato, C., 655
Santos, J.L., 487
Sanuki, W., 312
Sarakoglou, I., 554
Sarmiento, M., 500

700
Sarpeshkar, R., 324
Sartini, G., 552
Sasama, T., 497, 535
Sato, K., 314
Sato, M., 234235, 552, 554
Sato, S., 503
Sato, Y., 487, 497, 535
Satoh, K., 198
Sattler, T., 192
Saturka, F., 321
Sauer, F., 487, 503, 531532, 538539
Saunders, T., 40
Savioja, L., 314
Savvidou, O.D., 491
Sawhney, N., 18
Sawka, A., 492
Saxenian, A.L., 40
Sayd, P., 442
Scaioni, M., 359
Scarborough, D.M., 588589, 594, 607
Schacher, J.C., 322
Schaffalitzky, A., 127
Schaik, A.V., 288
Schall, G., 333334, 376
Scharl, A., 504
Scharver, C., 231
Scheggi, S., 554
Schenk, A., 536
Scheuering, M., 536
Schiemann, M., 505
Schiller, I., 345
Schilling, A., 461, 473
Schiphorst, T., 46
Schirmbeck, E.U., 535
Schleicher, D., 447
Schlig, E., 594, 597, 599, 602603
Schluns, K., 126
Schmalstieg, D., 80, 152, 192, 333334, 376, 478,
507, 536
Schmandt, C., 18
Schmid, C., 215, 441
Schmidt, A., 613614
Schneberger, M., 538539
Schneider, A., 536
Schneider, J., 662
Schneider, S.O., 490
Schnelzer, A., 493
Schnepper, J., 594, 597, 599, 606
Schnupp, J.W.H., 324
Schoonover, K., 629630
Schorr, O., 506
Schowengerdt, B.T., 74
Schranner, R., 62
Schroeder, D., 564
Schroeder, P., 333
Schultz, T., 588, 593, 607
Schutte, K., 466

Author Index
Schwartz, S.J., 586, 589, 592, 600, 607613
Schwirtz, A., 587, 589, 592, 600, 603, 608612
Scilingo, E.P., 553, 555
Secco, E.L., 588589, 593, 603, 607, 611, 613
Seeberger, R., 508, 534
Sgalini, J., 641
Segall, Z., 662, 664
Seibel, E.J., 74
Seifert, U., 497, 534
Seitel, A., 497
Seitz, S.M., 127, 192
Sekiguchi, D., 231234, 236
Seligmann, D., 63
Seo, B., 293
Seo, C., 554
Seo, J., 233
Serafin, S., 554
Serina, E.R., 564
Serio, A., 555
Serra, L., 497
Servirest, M., 413
Setlock, L.D., 665, 673
Seyler, T.M., 495
Seymour, S., 639
Shaff, S., 281
Shah, T.H., 646, 655656
Shaltis, P., 597, 600, 606, 609
Shamir, R., 497
Shedroff, N., 324
Sheikh, Y., 438
Shekhar, R., 499
Shelton, D.M., 504
Shepard, R.N., 10
Shi, J., 132, 390
Shi, X., 288
Shibasaki, T., 498, 531, 533
Shibata, F., 152
Shilling, R.D., 278
Shimamura, K., 486
Shimizu, E., 68
Shin, D.H., 333334
Shin, M.K., 653
Shinjoh, A., 400
Shinn-Cunningham, B., 278, 324
Shiotani, S., 535
Shirley, P., 458
Shiroma, N., 400
Shiwa, S., 74
Shoham, M., 497
Shoshan, Y., 497
Shotton, H., 474
Shreiner, D., 337, 346, 459
Shuhaiber, J.H., 527, 533, 535
Shukla, G., 504
Siadat, M.-R., 497, 531
Siau, K., 411, 417
Siegel, J., 662664, 666

Author Index
Siegwart, R., 212213
Sielhorst, T., 507508, 541
Siewiorek, D.P., 22, 80, 594, 597, 599,
602603,609
Siitonen, M., 672
Siltanen, S.,
Silverstein, M.D., 509
Sim, C.Y.D., 639
Simard, P., 208
Simms, A., 42
Simon, C., 619638
Simon, D.A., 491
Simon, P., 639, 641643, 645
Simoneau, J., 22
Simpfendrfer, T., 487, 499, 538
Sin, F., 564
Sinclair, D., 528
Sindram, D., 492
Sing, N., 460462
Sirhan, D., 528
Sisodia, A., 71
Sivic, J., 186, 192, 215
Sjkvist, S., 506
Skolnik, D.A., 359
Skorobogatiy, M., 653, 655
Skrypnyk, I., 192
Skulimowski, P., 312
Slade, J., 630
Sloten, J.V., 536
Smailagic, A., 22, 594, 597, 599,
602603,609
Smalley, D., 282
Smedby, ., 506
Smith, B.P., 495
Smith, E., 268
Smith, K.C., 324
Smith, R., 413
Smith, R.T., 434
Smolander, K., 417
Snavely, K., 192
Snavely, N., 127, 192
Soh, B.S., 593594, 597, 599
SoIanki, M., 234
Soin, N., 646, 655656
Sokoler, T., 324
Solazzi, M., 553554
Soler, L., 496, 536537
Son, H., 376
Song, C., 404
Song, M.K., 653
Sonmez, M., 500
Sonntag, D., 7879
Sood, A., 234
Sorger, J., 503
Southern, C., 22
Sovich, J., 653
Spagnol, S., 321

701
Speidel, S., 508, 529, 533534, 538
Spence, C., 234
Spengler, P., 508, 530, 534
Spieth, C., 314
Spink, R., 531532
Spinks, G.M., 653
Splechtna, R.C., 535536
Spors, S., 300
Sprung, L., 487
Spurgin, J.T., 371
Squire, K., 263
Sreng, J., 233
Srinivasan, M.A., 552, 554
Stger, M., 594597, 608
Stamos, I., 165
Stanley, M.C., 562, 564, 568
Stanney, K.M., 278
Stapleton, C., 268
Starkey, K., 43
Starner, T.E., 1328, 320321, 584, 661662
State, A., 69, 73, 76, 503, 537538
Steed, A., 667
Steinemann, D., 552577
Steingart, D.A., 655
Stengel, M., 555
Stenger, D., 646, 650, 656
Stenros, J., 260
Steptoe, W., 667
Sterling, R., 370
Stern, A., 266
Stetten, G.D., 504
Stevens, B., 230
Stewart, A.J., 542
Stewart, C.A., 553
Stewart, R., 314
Stewenius, H., 186, 192
Stewenius, H.D., 127
Stiefmeier, T., 587, 590, 592, 607
Stivoric, J., 552, 608
Stock, C., 198
Stolka, P.J., 492
Stoll, J., 496
Stone, R., 567
Stopp, F., 493
Strig, C., 322
Stoyanov, D., 508
Straßer, W., 460, 463, 466, 468, 470, 473–478, 542
Strasser, W., 564, 571
Stratton, G.M., 80
Strecha, C., 212213
Streicher, R., 296, 323
Streitz, N., 324
Strengert, M., 471472
Stricker, D., 152, 436, 441, 443, 445, 463, 478
Stroila, M., 153
Strong, A.J., 531532
Strumillo, P., 312

Sturm, P., 127
Su, J., 655
Su, L.-M., 502
Subramanian, V., 621, 638
Sudia, F.W., 49
Sudra, G., 529, 533534, 538
Sueda, S., 556, 567
Suenaga, H., 505
Suetens, P., 500
Sugano, N., 462
Sugimoto, M., 399409, 506
Sugimura, T., 234
Sukan, M., 288
Sulpizio, H.M., 654
Sumikawa, D.A., 293
Sumiya, E., 79
Sun, J., 215
Sun, P., 650, 652
Sung, M., 314
Suruda, J., 662
Suthau, T., 536
Sutherland, I., 60
Sutton, E., 499
Suwelack, S., 508, 530
Suya, Y., 198
Suzuki, H., 281
Suzuki, M., 506
Suzuki, N., 537
Suzuki, Y., 281, 322
Swan, J.E., 79
Swan, R.Z., 492
Swank, M.L., 509
Sweeney, J.A., 4546
Szabó, Z., 506
Szekely, G., 231
Szeliski, R., 127, 192

T
Ta, D.N., 209
Tabata, Y., 588–589, 593, 608
Taberna, P.L., 641642
Taccini, N., 602, 603, 606
Tachi, S., 231, 233234, 313, 552, 554
Tachibana, K., 462
Tadokoro, S., 400
Tagle, P., 487
Takada, D., 78
Takagi, A., 76
Takahashi, H., 87
Takahashi, T., 400
Takakura, K., 522
Takamatsu, S., 630
Takao, H., 311
Takasaki, M., 233234
Takemoto, K., 198

Takemura, H., 4, 71, 7879, 373, 486
Taketomi, T., 126147
Talha, M.M., 96
Talib, H., 533
Talmaki, S.A., 335
Tamaazousti, M., 433455
Tamaki, E., 594596, 603
Tamaki, T., 234
Tamaki, Y., 497, 535
Tamburo, R.J., 504
Tamminen, S., 28
Tamstorf, R., 552577
Tamura, H., 198
Tamura, S., 487, 497, 535
Tan, H.Z., 250, 554
Tanaka, M., 665
Tanaka, T., 359
Tang, H., 197, 487
Tang, R., 492
Tang, W., 655
Tanguy, A., 539
Taniguchi, N., 76
Tanijiri, Y., 68
Tanikawa, T., 78
Tanno, K., 300
Tanoue, K., 499, 535
Tao, J., 655
Tashev, I.J., 321
Tatzgern, M., 80
Taylor, R., 233234
Taylor, R.H., 233, 502504
Tchouda, S.D., 509
Teber, D., 487, 497, 499, 538
Tecchia, F., 666
Teizer, J., 376
Tener, R.K., 383
Teran, J., 564
Terlaud, C., 589, 594, 606
Ternes, D., 233
Terriberry, T.B., 199, 209, 212
Terven, J.R., 312
Tezuka, T., 506
Thalmann, D., 231, 555, 561
Thiele, H., 502
Thomas, B.H., 17, 267, 407, 434, 436, 476,
587, 589, 592, 600, 602603,
609, 612
Thomas, G.W., 234
Thomas, J.P., 62
Thomas, M., 70
Thomas, M.R.P., 321
Thomas, P., 52
Thomaszewski, B., 564, 571
Thompson, C., 80
Thompson, D.M., 315, 321
Thongrong, D.P.S., 231

Thorp, E.O., 584
Thorpe, S., 312
Thukral, S., 234
Thumfart, W.F., 498
Thurlow, W.R., 292
Tian, Y., 345
Tikander, M., 288
Timm, B.W., 355
Tobias, H., 63
Tobias, J., 51
Tognetti, A., 588589, 593, 603, 607, 611,
613,636
Tohyama, M., 281
Tokunaga, E., 535
Tomasi, C., 132, 391, 476
Tomei, M., 509
Tomikawa, M., 535
Tomita, M., 405
Tonet, O., 537
Tonetti, J., 491
Toney, A., 587, 589, 592, 600, 602603,
609, 612
Torrealba, G., 487
Toso, C., 502
Tourapis, A., 464
Toyama, T., 7879
Traub, J., 493, 501, 508, 535, 541
Trawny, N., 220221
Traylor, R., 554
Trdoff, B., 665
Treagust, D.F., 383
Trendafilov, D., 672
Trevisan, D.G., 533535
Triggs, B., 127
Troccaz, J., 491
Troester, G., 27
Trost, G., 621
Tröster, G., 586–587, 589–597, 599–600, 602–603, 606–611, 613–615, 621
Troy, J.J., 562
Truillet, P., 312
Trulove, M.A., 654
Trulove, P.C., 654
Tsagarakis, N.G., 554
Tsai, C.-Y., 411
Tsai, R.Y., 129
Tsai, Y.T., 497
Tseng, C.W., 639
Tsingos, N., 280
Tsotros, M., 413
Tubbesing, S.K., 358
Tuceryan, M., 461462
Turchet, L., 554
Turk, G., 477
Tuytelaars T., 207

U
Uchiyama, H., 139, 152
Uchiyama, S., 198, 231
Ueda, H., 68
Ullmer, B., 324
Umansky, F., 497
Umbarje, K., 492
Ungersbock, K., 522
Uppsll, M., 506
Urban, M., 215
Urey, H., 100
Utsumi, A., 4, 486

V
Vacirca, N.A., 639
Vaghadia, H., 492
Vagvolgyi, B.P., 502
Vaissie, L., 77
Valdivieso, A., 536
Vale, J.A., 508
Valgoi, P., 359
Vallino, J.R., 231
van Dam, A., 324
van den Doel, K., 232
van der Gast, L., 411
Van der Heijden, H., 411, 416417
van der Spoel, S., 411
van Essen, H., 233
van Kleef, N., 411
van Os, K., 621
Van Pieterson, I., 621
Vanderdonckt, J., 533535
Vandergheynst, P., 212213
Varga, E., 509
Varshavsky, A., 15
Vasile, C., 491
Vaughn, J., 270
Vavouras, T., 587, 589
Vaysse, A., 587, 592, 606607, 609
Vazquez-Alvarez, Y., 672
Vega, K., 4849
Velger, M., 86
Venkatesh, V., 421
Ventura, J., 174192
Verkasalo, H., 411
Vescan, A.D., 496, 507
Vetter, M., 536
Vexo, F., 231, 555
Vidal, F., 358
Vieira, F.V., 438
Vilkamo, J., 300
Villegas, J., 290, 292, 294, 309324
Vincenty, T., 340, 346
Vinge, V.

Vlahakis, V., 265, 413
Vogl, T.J., 505
Vogt, S., 487, 503, 531532, 538539
Volonté, F., 502, 506
Volz, R.A., 232235
Von Busch, O., 45
von Gioi, R.G., 180, 363
Vosburgh, K.G., 487488, 496
Voss, D., 3153
Vouaillat, H., 491
Vu, Q.A., 652
Vullers, R.J.M., 320

W
Wacker, F.K., 503, 531
Waern, A., 260
Wagmister, F., 45
Wagner, A., 502, 532
Wagner, D., 152, 192
Wagner, S., 630
Wahbeh, A.M., 359
Wahl, F., 595, 598
Wallace, G.G., 653
Wallace, J.W., 359
Wallraven, C., 478
Walz, S., 271
Wan, K.M., 629, 646, 654
Wan, S.H., 629, 646, 654
Wander, S., 3738
Wang, C., 345
Wang, C.-D., 293
Wang, D., 504
Wang, G., 541
Wang, H., 385
Wang, J., 505
Wang, K., 653
Wang, L., 493, 501, 594595, 606
Wang, M.L., 539
Wang, Q., 170, 553
Wang, R.Y., 593
Wang, S., 477
Wang, S.-C., 411
Wang, T., 465
Wang, X., 385
Wang, X.L., 492
Wang, Y., 86, 96
Wang, Y.-C., 199
Wang, Z., 505
Wang, Z.L., 653, 655
Wanner, F., 667
Wanschitz, F., 502, 532
Want, R., 15, 288
Ward, J.A., 27, 594, 597, 599, 606608, 613
Wardrip-Fruin, N., 260
Warren, N., 311

Warshaw, P.R., 417
Watanabe, K., 322
Watkinson, J., 283
Watts, H., 48
Watzinger, F., 502, 532
Weaver, K., 22
Weber, J.-L., 587, 589, 592, 594, 606607, 609
Weber, S., 497, 502, 506
Webster, A., 63, 333, 662
Wedlake, C., 501, 505, 535
Wegenkittl, R., 535536
Weghorst, S., 487
Wei, Z., 653
Weinheimer, S., 666
Weiser, D., 504
Weiser, M., 324, 584
Wekerle, A.-L., 508, 530
Wells, W., 530
Wells W. III, 496
Wells, W.M., 496
Wendler, T., 493
Wenzel, E.M., 282283
Wesarg, S., 505
Westerfeld, S.
Wetzel, P., 70
Wetzel, R., 298
Weyrich, T., 667
Whitaker, R., 461462
White, S., 153, 312
White, S.J., 496
Whitehead, K.K., 500
Whyte, R., 588589, 593, 603, 607, 611, 613
Wicker, B., 50
Wieferich, J., 507, 541
Wiener, N., 50
Wientapper, F., 415, 441
Wierstorf, H., 322
Wiertlewski, M., 577
Wiles, A.D., 501, 535
Wilke, W., 436
Wilkes-Gibbs, D., 666
Williams, T., 70
Wilsdon, J., 40
Wilson, E., 499
Wilson, J., 42, 117
Wilson, M., 495
Wilson, P., 630
Wimmer-Greinecker, G., 505
Win, M.Z., 219
Winer, E., 487
Winfree, K.N., 554
Winne, C., 493
Wirtz, C.R., 532
Witchey, H., 413
Withagen, P., 466
Wither, J., 264

Witkin, A., 558
Witterick, I.J., 496, 507
Wloka, M.M., 344, 460, 473
Wolfsberger, S., 522
Wong, K.S., 629, 646, 654
Wong, S.W., 492
Woo, M., 337, 346
Woo, S.-W., 652, 655
Woo, W., 232, 234, 236, 407, 468
Wood, R.E., 185
Woods, E., 73
Woods, R., 202
Woodward, C.,
Wörn, H., 506, 531–532
Wren, J., 506
Wright, D.L., 65, 80, 487
Wright, P., 86
Wu, E., 477
Wu, H., 652653
Wu, J.-H., 411
Wu, J.R., 539
Wu, K., 499
Wu, W.Z., 653
Wu, Y., 126
Wu, Z., 215
Wuest, H., 152, 415, 441, 443, 445, 463, 478
Wüst, H., 414
Wyland, J., 371

X
Xie, X., 22
Xu, N., 363
Xu, W., 668
Xu, Y., 588589, 594
Xue, Q.J., 650
Xueting, L., 469

Y
Yachida, M., 71
Yagi, Y., 71
Yalcin, H., 346
Yamaguchi, S., 499
Yamamoto, G., 131
Yamamoto, H., 198, 231
Yamasaki, K., 68
Yamashita, T., 630
Yamazaki, H., 312
Yamazaki, M., 506
Yamazaki, S., 76
Yamini, S., 653
Yan, X.B., 650
Yanagibashi, Y., 503
Yang, C., 234
Yang, G.-H., 554
Yang, G.-Z., 508, 594–596, 606
Yang, J., 665, 673
Yang, L., 505
Yang, T.-H., 554
Yang, X., 69, 73, 195223
Yang, Z., 653
Yaniv, Z., 486510
Yano, H., 234
Yao, H.-Y., 231, 233234, 237
Yasuda, H., 506
Yau, S.H., 232
Ye, G., 231, 234
Ye, S., 653
Ye, W., 588589, 594
Yeh, Y.-H., 411
Yencilek, F., 487, 538
Yim, S., 250
Yin, P., 464
Yohan, B., 79
Yokokohji, Y., 233234
Yokoya, N., 373, 468469
Yoo, H.-Z., 596, 598, 602, 607
Yoon, Y., 411
Yoshida, A., 317
You, S., 152169
You, Y., 314
Young, F., 250
Young, L.J., 50
Yu, H.K., 652, 655
Yu, K.-C., 498
Yu, S., 554
Yu, X., 653
Yun, K., 407
Yushin, G., 641

Z
Zach, C., 186, 192
Zahorik, P., 292
Zakarauskas, P., 292
Zamarayeva, A.M., 655
Zang, X., 650, 652
Zeagler, C., 18
Zehavi, E., 497
Zhang, C., 655
Zhang, J., 234
Zhang, X., 655
Zhang, Y., 234, 653
Zhang, Z., 129130, 138, 652653
Zhao, Y., 652653
Zheng, G., 500, 538539
Zhou, J., 436
Zhou, T., 655
Zhu, C., 497
Zhu, Y., 564
Ziegeler, S., 490

Ziegelwanger, H., 322
Zilles, C.B., 556
Zimmermann, R., 293
Zinreich, S.J., 504
Zinser, M.J., 497, 534
Zisserman, A., 127, 132, 186, 192, 215, 442
Zöller, J.E., 497, 534
Zöllner, M., 415, 463, 478

Zoran, A., 233234
Zordan, V.B., 556
Zotkin, D.N., 321
Zucco, J., 17
Zucco, J.E., 434
Zuerl, K., 538539
Zugaza, I., 478
Zysset, C., 621

Subject Index
A
AAR system, see Audio augmented
reality(AAR) system
Accessory-based wearable computers
multifunctional wearable computer, 594
projects analysis, 595597
projects and placements, 595, 598
arm region, 598599
hand and finger region, 599600
head and neck region, 595, 598
legs and feet region, 600
torso region, 595, 597, 599
wrist region, 599
Acoustic tracking systems, 522
Activated carbons (ACs), 641, 647650, 654655
AEC applications, see Architecture, engineering,
and construction (AEC) applications
Age-related macular degeneration (AMD),
67,107
Anisotropic magnetoresistive (AMR)
magnetometer, 219
Anisotropic nonlinear elastic behavior
constraint formulation, 573
constraint Jacobians, 573574
finger skin deformations, 576
hyperbolic projection function, 572573
linear interpolation, 574575
orthogonal projection, 574575
real-world finger, 570
strain limits, 571572
Architecture, engineering, and
construction(AEC) applications
buried utilities
digging implement, 372373
excavation contractors, 370
field locators and markers, 373
GIS, 374376
KML, 381
MDS, 371372
One-Call excavation damage prevention
process, 371
operator-perspective, 373374
proximity monitoring (see Proximity
monitoring)
spatial awareness, 372
underground utility, 371, 381382
collaborative learning
ARVita, 390392
computer-based visualization, 383–385
eagle window, 386
error detection, 385
FLTK_OSGViewer node, 386388
4D CAD modeling, 383384
KEG, 390
natural marker, 388390
OpenGL camera, 386
paper-based shared workspace, 384385
Raytheon STEM Index, 383
tracker and marker mechanism, 386,
388389
VITASCOPE scene node, 385388
discrepancy check tool, 333334
hardware platforms
ARMOR, 355358
UM-AR-GPS-ROVER, 353356
LIVE, 335337
safety and subsurface utilities, 333334
software interfaces
ARVISCOPE, 350352
SMART, 352353
spatial registration
PULL mode, 341342
PUSH mode, 342344
registration process, 337340
structural damage
ATC-20 postearthquake safety
evaluation,358
augmented baseline and building
edge,359
camera orientation errors, 369370
camera position errors, 368370
corner detection, 365366
graphical discrepancy, 360361
ground true tracking data, 367
horizontal edge detection, 364365
IDR calculation, 358359, 365, 367
instrument errors, 368
LSD, 359360
observing angle, 367368
quantitative measurements, 358
target panels, 359
2D intersections, 362
vertex deformation, 362
vertical edge detection, 362363
VP environment, 361362
TINMITH2, 334335
visual illusion
bounding box, 345
GLSL, 345

implementation, 346348
incorrect and correct occlusion, 344345
RTT techniques, 345
semiautomated method, 344345
stereo camera, 345
TOF camera, 345346
two-stage rendering method, 346347
validation experiments, 349350
Vincenty algorithm, 345346
x-ray vision, 333
AR Façade, 266–267
ARMOR platform, see Augmented Reality
Mobile OpeRation (ARMOR)
platform
ARVISCOPE, see Augmented
reality visualization of
simulated construction
operations(ARVISCOPE)
Atomic force microscopy (AFM), 51
Audio augmented reality (AAR) system, 278
anyware and awareware, 316
audio windowing, 315
layered soundscapes, 319320
multipresence, 317, 319
narrowcasting (see Narrowcasting)
applications
assistive technologies, 312313
entertainment and spatial music, 314315
location-aware games, 314
navigation and location-awareness
systems, 311312
sonification, 314
synesthetic telepresence, 313
challenges
authoring standards, 322323
capture and synthesis, 321
performance, 321322
Audio data/definition model (ADM), 322
Audio windowing, 310, 315, 323
Audium, 281282
Augmented Reality Markup
Language (ARML), 322
Augmented Reality Mobile
OpeRation(ARMOR) platform
backpack, 355, 357358
improvements, 356
insecure placement, tracking, 355
vs. UM-AR-GPS-ROVER platforms, 356357
Augmented reality visualization of
simulated construction
operations(ARVISCOPE)
animation scalability, 352
animation trace file, 351
automatic generation, 351
manual generation, 351
smooth and continuous operation, 350351
time-ordered sequence, 351352

Augmented Reality Vitascope (ARVita),
384387, 389392
Augmented x-ray guidance system, 490491
Automatic reference counting (ARC), 419
Automotive industry
large-scale deployment, 438440
potential benefits
design and conception, 434435
driving assistance, 436438
factory planning, 435
maintenance support, 438
sales support, 436
user manual, 438
vehicle production, 435436
tracking solutions
large occlusion, 442
model-based tracking solutions, 441
6DoF mechanical measurement
arms,440
2D/3D markers, 440441
VSLAM (see Visual
simultaneous localization and
mapping(VSLAM))
vehicle localization
accuracy, 447
geo-referenced landmark database,
446447
3D visual features, 447
VSLAM constraints (see Visual
simultaneous localization and
mapping (VSLAM))
Autonomous sensory meridian
response (ASMR), 295

B
Bag-of-words matching (BoW) matching,
215216
Barco Auro-3D, 300
Best-Bin-First (BBF) algorithm, 160161
Binary robust independent elementary
features(BRIEF), 212213
Binaural hearing aids, 301302
Bujnak's method
center of projection, 138
focal length, 138
lens distortion, 137138
quantitative evaluation
ARToolKit, 139
estimated rotation matrix, 139
Euclidian distance, 139
free camera motion, 140142
real environment, 145147
straight camera motion, 143145
true rotation matrix, 139
video sequences, 139
spline fitting, 138

C
CAD model, see Computer-aided design (CAD)
model
Camera parameter estimation
vs. Bujnak's method (see Bujnak's method)
fiducial marker, 127129
2D3D corresponding pairs, PnP problem,
126127
zoomable camera (see Zoomable camera
model)
CamNet system, 662663
Carbide-derived carbons (CDC), 641
Carbon fiber (CF) electrode, 650652
Carbon nanotubes (CNTs), 647, 653
Chromatic aberrations, 67, 77, 467
CMT, see Cut-Make-Trim (CMT)
CNC machines, see Computer Numerical
Control(CNC) machines
Collaborative learning
ARVita, 390392
computer-based visualization, 383385
eagle window, 386
error detection, 385
FLTK_OSGViewer node, 386388
4D CAD modeling, 383384
KEG, 390
natural marker, 388390
OpenGL camera, 386
paper-based shared workspace, 384385
Raytheon STEM Index, 383
tracker and marker mechanism, 386, 388389
VITASCOPE scene node, 385388
Collaborative wearable computers
CamNet system, 662663
communication asymmetries, 664
communication theory, 666667
DOVE, 665
features, 669
HMD, 663665
Social Panoramas (see Social Panoramas,
prototype system)
TAC system, 664
user's environment, 667–669
WACL, 665
WearCam and WearComp systems, 663
Communication theory, 662, 666667, 669
Computed tomography angiography (CTA), 525
Computer-aided design (CAD) model, 350, 352,
383, 386, 443445, 448
Computer graphics (CG) model, 402, 405, 408,
459462, 464, 470, 552
Computer Numerical Control (CNC) machines,
622, 625626, 628, 633
Cone-beam computed tomography (CBCT), 496,
499500
Couching technique, 631, 637
Craniofacial surgery, 505, 533–534
Cultural heritage
Historical Tour Guide
ARC, 419
AR view, 418
CroMAR, 421
detailed information view, 418
events, 420
initialization, 421
Internet platform, 416
list view, 419
local history, 417
map, 419
photo overlay, 418
system architecture, 419420
TAM, 416417
technology acceptance research model,
417418
timeline, 419
UIApplication object, 420
mobile technology
Archeoguide system, 413
artifacts and exhibition areas, museums, 413
CityViewAR system, 415
European project Tag Cloud, 415
head-mounted displays, 413
image recognition-based applications, 415
iTACITUS, 414415
laptop system, 413
Layar platform, 413414
location-based experience, 415
Powerhouse Museum system, 414
smartphones and tablets, 412413
Streetmuseum system, 414
UAR, 414
usability and acceptance, evaluation
intention, 423425
perceived ease of use, 422423
perceived enjoyment, 423424
perceived usefulness, 422
prototype usability analysis, 421
street survey, 421422
structural model, 424426
test variable, 426427
web survey, 422
Cut-Make-Trim (CMT), 620621, 623, 629631
Cybergrasp haptic device, 562563
Cyclic voltammetry, 649650

D
Dahl friction model, 241242
Data acquisition system (DAS), 378
Data, visualization processing, and view (DVV)
taxonomy
block diagram, 523524
craniofacial surgery, 533534

dental surgery, 533534
derived data, 525, 541
endoscopic surgery, 537538
factors, 523524
laparoscopic surgery, 537538
maxillofacial surgery, 533534
minimally invasive cardiovascular surgery,
535536
neurosurgery, 531533
orthopedic surgery, 538539
patient-specific data, 523
prior knowledge data, 525
raw imaging data, 525
semantics, 523
view factor, 526527, 541542
visualization processing, 525526, 541
visually processed data, 523
da Vinci robotic surgical system, 502503
declipseSPECT system, 492494
Defocus blur, 468469
Dental surgery, 527, 529, 533534
Depth buffering, 345347
Depth of field, 468469
Depth perception, 62, 76, 79, 501502, 506507
Differential global positioning system (DGPS),
182, 189190, 356, 403, 452
Digital micro-mirror device (DMD), 73
Directional audio coding (DirAC), 300
DirectX's DirectSound, 284
Distributed spatial sound, 302303
Dolby Atmos, 300
Drawing over video environment (DOVE), 665
Dual-in-line (DIP) packages, 632
Duplex theory, 284
DVV taxonomy, see Data, visualization
processing, and view (DVV)
taxonomy

E
Electric double layer capacitors (EDLCs),
642644
Electrochemical capacitors (ECs), 642, 653654
Electromagnetic tracking systems, 489, 522
Electronic textiles (e-textiles)
CMT, 620
durability and reliability, 620
integration strategies
conductor integration, 621
manufacturing methods, 621
PCB, 621622
surface attachment, 621
routing in garments
marker layout, 627628
order of operations, 629
pattern piece, 627628
production design, 628629

seam crossing methods, 629631
trace crossings, 631632
SMD components, 633634
stitching technologies
CNC machines, 625626
multineedle machines, 624625
single-thread chain stitch, 623624
textile components
durability, 636637
sensor insulation, 636637
stretch and bend sensors, 634636
through-hole components, 632633
Electronic travel aids (ETAs), 312
Endoscopic surgery, 537538
Endoscopic video guidance system, 493495
Energy density, 644645, 652
ER4 MicroPro earphones, 288
European Broadcast Union (EBU), 322
Extimacy, see Human–computer
interaction (HCI)
Eye box
eye pupil diameter, 100102
vs. eye relief, 100101
FOV, 102
holographic combiner and extraction, 95, 100
optical combiner thickness, 100
Eye tracking, 74, 7879

F
Feature-based tracking method
algorithm adaptation, 209212
local feature extraction
BoW matching, 215216
BRIEF, 212213
database object, 214215
efficiency, robustness, and distinctiveness,
206207
interest point detection (see Interest point
detection)
LDB, 213214
LSH, 215
object tracking, 216
SURF, 206, 212
RANSAC/PROSAC algorithms, 206
tracker initialization, 186187, 189
Features from accelerated segmented test (FAST)
detector, 207209
Field of view (FOV)
angular resolution, 7072
depth perception, 62
eye box, 102
field of fixation, 62
light intensity, 62
optical requirements
angular resolution, 9091
constraints, 8889

dot per degree, 90
dot per inch, 90
occlusion and see-through displays,
8889
pixel counts, 9091
target functionality, 90, 92
visual acuity, 6162
Filament fibers, 645646
Film grain, 464
Finite element method (FEM), 556, 559, 564
Finite impulse response (FIR) filter, 287,
342344, 353
Fitbit One, 26
Fitsense FS-1, 2627
Force systems, 554
4G LTE, 301
FOV, see Field of view (FOV)

G
GABRIEL system architecture, 311312
Garment-based wearable computers
definition, 585
physical activity monitoring, 589
project analysis, 585589
projects and placements, 590591
arms and hands region, 593
feet region, 593594
head and neck region, 590
legs region, 593
torso region, 590593
remote health monitoring, 589
user interfaces, 589
Garment devices
coated devices
cyclic voltammetry, 649650
porous textile supercapacitor, 648
screen printing, 647
SEM, 648649
SWCNT, 647
custom textile architectures, 655657
cutting edge research, 639640
definition, 639
energy storage devices
components, 642
ECs, 642
EDLCs, 642644
energy density, 644645
mass loading, 645
primary batteries, 644
pseudocapacitors, 643644
secondary (rechargeable) batteries,
643644
fibers and yarns, 645646
electrode configurations, 653654
graphene, 653
NFW, 654–655
knitted CF electrode, 650–652
wearable electronics
cost, 641
durability, 641
fabrication, 641642
reliability, 641
safety, 640641
washability, 641
weft knitted fabrics, 646
woven fabrics, 646
Geographic information system (GIS), 289, 374,
376, 413, 448449, 451453
database creation issue, 454
in-plane accuracy, 449451
low cost, 447448
navigation application, 452453
odometer/inertial sensor, 454
out-plane accuracy, 452
Geospatial information
system (GIS), 374–376
Gettysburg, 261262, 270
GIS, see Geographic information
system(GIS); Geospatial
information system (GIS)
Global positioning system (GPS), 403
buildings model, 437, 452453
camera positions, 448
degrees of freedom, 447, 449
inequality constraint, 448449
measurements, 448
navigation application, 452453
Google Glass, 8, 1314, 2628, 3536, 46, 90,
110, 585, 670671, 673
Graph-cut based active contour (GCBAC), 363
Graphics processing units (GPUs), 189, 199, 209,
212, 346, 471, 474477

H
Half-silvered mirror devices, 6364, 67, 495,
504505, 527, 539
HandsOnVideo, 666
Haptic augmented reality
breast tumor palpation system, 228
components
interaction, 234236
models, 238239
registration problems, 236237
rendering frame, 237
friction modulation, 248249
multi-finger interaction, 250
pen-shaped magic tool, 228
stiffer inclusion
configuration, 245246
measurement-based approach, 246
rendering algorithm, 247248
variables, 246247

stiffness modulation
Geomagic, 240
PHANToM premium model 1.5, 240
single-contact interaction, 240243
two-contact squeezing, 243245
taxonomies
artificial recreation, 232
augmented perception, 233
composite visuo-haptic reality–virtuality
continuum, 229–230
MicroTactus system, 233
within and between property, 233234
visuo-haptic reality–virtuality
continuum, 229–230
vMR-hMR, 231232
vMR-hR, 229230
vMR-hV, 230231
vR-hMR, 231232
vR-hR, 229230
vR-hV, 230231
vV-hMR, 231232
vV-hR, 229230
vV-hV, 230231
Haptic rendering, 231, 238, 250, 556, 562563,576
Haptic ring, 554
HCI, see Human–computer interaction (HCI)
Head-mounted displays (HMDs), 86
applications, 6263, 8788
breast biopsy, 503
camera parameters (see Camera parameter
estimation)
collaborative wearable systems, 663
connected glasses, 110
depth of field, 7375
diffractive and holographic optics
angular and spectral bandwidths, 9899
Bragg selectivity, 9798
optical functionality, 97
eye box
eye pupil diameter, 100102
vs. eye relief, 100101
FOV, 102
holographic combiner and extraction,
95,100
optical combiner thickness, 99100
FOV, 7072
hardware issues
catadioptric designs, 67
eye-relief, 65
free-form prism, 67
HMPD, 69
HOE, 68
non-pupil-forming architecture, 6667
NVIDIA, 69
ocularity, 6465
optical see-through display, 6364
optical waveguide, 68

Subject Index
pupil forming architecture, 6566
UNC, 69
video see-through display, 64
VRD, 6869
history, 6061
image distortions and aberrations, 77
input interfaces
body tapping sensing, 122
brain wave sensing, 122
eye gesture sensors, 119
hand gesture sensing, 121122
head gesture sensor, 119
head-worn computer, 118
optical gaze tracking techniques, 119121
trackpad, 119
voice control, 119
IPD, 99
latency, 7576
microdisplay
HUD, 93, 103
illumination engines, 102103
LCD transmission, 102103
LCoS, 102103
LED, 102103
MEMS/fiber scanners, 102, 105107
OLED panels, 103104
multimodality, 78
neurosurgery, 532
occlusion, 7273
optical architecture
contact lenses, 117118
light field occlusion, 118119
non-pupil-forming architecture, 93, 9596
optical platforms, 9397
pupil-forming architecture, 91, 93, 9596
tools, 91
optical requirements
angular resolution, 9091
constraints, 8889
dot per degree, 90
dot per inch, 90
occlusion and see-through displays,
8889
pixel counts, 9091
target functionality, 90, 92
parallax, 7677
perceptual issues
adaptation, 8081
depth perception, 7980
user acceptance, 80
pictorial consistency, 77
resolution, 70
sensing, 7879
smart eyewear
optical combiner and prescription lenses,
107, 109
potential consumers, 107108

requirements, 107
Rx/plano lens, 107, 109110
smart glasses (see Smart glasses)
vision, 6162
Head-mounted projective displays (HMPD),
69,71, 73
Head-related impulse responses (HRIRs),
283284, 288, 312, 321322
Head-related transfer
functions (HRTFs), 283–286
Heads-up display (HUD), 88, 93, 97, 103
Head tracking system, 20, 284, 300301, 315
Historical Tour Guide
ARC, 419
AR view, 418
CroMAR, 421
detailed information view, 418
events, 420
initialization, 421
Internet platform, 416
list view, 419
local history, 417
map, 419
photo overlay, 418
system architecture, 419420
TAM, 416417
technology acceptance research model, 417418
timeline, 419
UIApplication object, 420
HMDs, see Head-mounted displays (HMDs)
Holographic optical element (HOE), 68, 97
HRIRs, see Head-related impulse
responses(HRIRs)
Human–computer interaction (HCI)
ethics and speculation
actual and perceived value, 35
coloring, 37
garment anchor, 3435
Google Glass, 3334
iterative design, 36
prototypes, 36
quantified self applications, 3334, 3739
requirements and specifications, 3536
speculative design, 36
system of interactions, 35
social politics
FuelBand, FitBit, Glass, 3940
physical object, 43
quantified self, 4042
regional innovation, 40
tracking and route control, factory floor,
4243
synaptic sculpture
Blinklifier (see Humanistic intelligence)
bridging materiality and
information,4648
digitization, 4445

floating eye, 44
inversion, design process, 4546
materials technologies, 45
spermatozoa fertilization, 4344
star charts, 44
Humanistic intelligence
AFM, 51
anthropomorphize robots, 50
artificial intellects, 49
macro perspectives, 52
nanoperspectives, 52
natural gestures, 4849
neurotransmitter serotonin, 50
Peptomics, 51
singularity, 49
textiles, 5051
3D printing organs, 51
Human–robot interfaces
active tangible robots, 407408
future predictive visual presentation,
404405
projection-based configuration, 405407
robots and users, information, 399400
robot types, 399400
video see-through-based AR
CG model, 402
egocentric view camera image, 401
four-wheel robot, 403404
fundamental concepts, 401402
system configuration, 401, 403
tele-operated rescue robots, 400401
Time Follower's Vision, 401–402
virtual third-person view, 402403
Hunt–Crossley model, 242, 246–248
Hybrid methods
applications, 198
design objectives, 199
good accuracy and high efficiency, 199200
recognition and tracking, 221222
Hypersonic sound system, 302

I
IID, see Interaural intensity difference (IID)
Image-based method, 185
Image-guided neurosurgery (IGNS), 531533
Image-guided surgery (IGS)
acoustic tracking systems, 522
craniofacial surgery, 533534
definition, 520
dental surgery, 527, 529, 533534
drawback, 520
DVV taxonomy
derived data, 525, 541
factors, 523524
patient-specific data, 523
prior knowledge data, 525

raw imaging data, 525
semantics, 523
view factor, 526527, 541542
visualization processing, 525526, 541
visually processed data, 523
electromagnetic tracking systems, 522
endoscopic surgery, 537538
laparoscopic surgery, 530, 537538
maxillofacial surgery, 533534
minimally invasive cardiovascular surgery,
527, 530, 535536
neurosurgery, 527528
analyzed data, 531
derived data, 531
prior-knowledge data, 531
superimposed anatomical data
objects,530
view factor, 531533
visualization processing, 531
optical tracking systems, 522
orthopedic surgery, 527, 529, 538539
registration, 521522
tracked surgical probe, 521
validation and evaluation, 539540
Implantable miniature telescope (IMT), 67
Incremental matching, 154, 161
Inertial measurement unit (IMU), 76, 216
Information furniture, 303
Intelligent tourism and cultural
information through ubiquitous
services (iTACITUS), 414–415
Intelligent traffic system (ITS), 311
Intel Scavenger Bot, 267
Interaural intensity difference (IID), 283285,
310, 314
Interaural time (phase) difference (ITD),
283285, 310, 314
Interest point detection
high-quality detector, 208209
lightweight detector, 207208
properties, 207
Interpupillary distances (IPDs), 61, 65, 77, 99
Intimacy, see Human–computer interaction (HCI)
ITD, see Interaural time (phase) difference (ITD)

K
Kanade–Lucas–Tomasi (KLT) feature tracker,
132, 389390
Keyhole Modeling Language (KML), 381
Knitted carbon fiber electrode, 650652

L
Laboratory for Interactive Visualization in
Engineering (LIVE), 335337
Laparoscopic surgery, 495, 530, 537538

Large expanse extra perspective (LEEP) system,
19, 61
LCD, see Liquid-crystal display (LCD)
LCoS, see Liquid crystal on silicon (LCoS)
LED, see Light-emitting diode (LED)
Levenberg–Marquardt method, 134, 137, 159
Leviathan, 269
Light Detection and Ranging (LiDAR) scanners
initial camera pose
estimation, 165166
keypoint features, 165
synthetic images, 165166
3D colored point cloud model, 164165
iterative estimation, 167168
pose refinement, 166167
video image, 168169
Light-emitting diode (LED), 102103, 105,
600,603
Linear blend skinning (LBS), 560561
Liquid-crystal display (LCD), 1819, 47,
102103, 107, 497, 534, 587, 603
Liquid crystal on silicon (LCoS), 73,
102103,107
Local difference binary (LDB), 212214
Locality sensitive hashing (LSH), 215
Location-based entertainment (LBE), 289
Location-based MR and AR storytelling
reinforcing
Battle of Gettysburg, 261
Dow Day, 263
110 Stories, 262
Situated Documentaries, 263
Streetmuseum Londinium, 263
strengths and weaknesses, 265
Voices of Oakland, 263264
The Westwood Experience, 263264
Wizard of Oz approach, 263
remembering
Four Angry Men, 270
REXplorer, 271
Rider Spoke, 271
Three Angry Men, 270271
Twelve Angry Men, 270
You Get Me, 271272
reskinning
Alices Adventures in New Media,
268269
Aphasia House project, 268
Façade, 266–267
Half Real, 267
Intel Scavenger Bot, 267
Leviathan, 269
MR Sea Creatures, 268
Rainbows End, 265266
Wizarding World of Harry Potter, 268
Lockstitch machines, 623624, 626, 631
Lou Gehrig's disease, 5

M
MAGI system, see Microscope-assisted guided
interventions (MAGI) system
MapMyFitness tool, 41
MAR, see Mobile augmented reality (MAR)
Maxillofacial surgery, 497, 533534, 539
Medarpa system, 505
Medical binocular systems, 495, 501503
Medical navigation guidance system
data fusion, 489
data visualization, 489
feedback method, 490
medical imaging, 488
registration, 489
segmentation and modeling, 488489
tracking approach, 489
Mesopic vision, 62
Metropolitan area network (MAN) systems, 301
Microelectromechanical systems (MEMS), 102,
104107, 290
accelerometer, 217218
gyroscope, 217219
IMU, 216
Kalman filtering, 219221
magnetometer, 219
MicroOptical's displays, 24–25, 114
Microscope-assisted guided interventions(MAGI)
system, 502, 532
MicroTactus system, 231, 233
Minimally invasive cardiovascular surgery, 530,
535536
MIThril system, 592, 609
Mixed reality (MR), 34, 230, 287, 303, 313,
486487, 677
Mobile augmented reality (MAR)
hybrid methods
applications, 198
design objectives, 199
good accuracy and high efficiency,
199200
recognition and tracking, 221222
sensor-based methods (see Sensor-based
methods)
vision-based methods (see Vision-based
methods)
Mobile technology
Archeoguide system, 413
artifacts and exhibition areas, museums, 413
CityViewAR system, 415
European project Tag Cloud, 415
head-mounted displays, 413
image recognition-based applications, 415
iTACITUS, 414415
laptop system, 413
Layar platform, 413414
location-based experience, 415
Powerhouse Museum system, 414
smartphones and tablets, 412413
Streetmuseum system, 414
UAR, 414
Monocular camera
balancing, 133134
definition, 131
epipolar constraint, 131132
fiducial marker corners, 132133
zoom value, continuity, 133
Monocular orthoscopic video see through
(MOVST), 664
Motion blur, 459, 468469
Moverio BT-100 model, 2122
MR, see Mixed reality (MR)
Multichannel loudspeaker arrays, 299
Multineedle machines, 624625

N
Narrowcasting
anyware models, 316
floor control, 316, 318
groupware environments, 317
predicate calculus notation, 315, 317
privacy, 316
Natural fiber welding (NFW), 654655
Nearest neighbor (NN), 160, 186, 214216
Nearphones, 297298, 303, 319
Netherlands Architecture Institute's (NAI) UAR
application, 414
Neurosurgery, 527528
analyzed data, 531
derived data, 531
prior-knowledge data, 531
superimposed anatomical data objects, 530
view factor, 531533
visualization processing, 531
NFW, see Natural fiber welding (NFW)

O
110 Stories, 262, 265
OpenCV, 222
OpenStreetMap, 181182, 190
Operating room, AR
binocular display, 501503
direct patient overlay display, 505506
endoscopic video guidance system, 493495
half-silvered mirror devices, 504505
HMDs, 503
limitations
clinical workflow, 508
cost-effectiveness, 508509
optimal information
dissemination,507508
organ deformation, 508

perception-related issues, 507508
soft tissues, 508
medical navigation guidance system (see
Medical navigation guidance system)
minimally invasive interventions, 486
mixed reality, 486
requirements, 490
screen-based display
augmented endoscopy system, 498
camera/projector system, 497498
endoscopic tracking system, 499500
image-guided navigation systems, 496
liver surgery, 499
liver tumor resection, 499
lymph node biopsy guidance system,
498499
model-enhanced US-assisted guidance
system, 501
optical tracking system, 497
robotic device, 497
2D/3D registration, 500501
virtual fluoroscopy, 500
tactile feedback, 495
ultrasound guidance system, 491492
video and SPECT guidance system, 492494
virtual reality, 486
visualization environments, 487488
x-ray guidance system, 490491
Optical architecture
contact lenses, 117118
light field occlusion, 118119
non-pupil-forming architecture, 93, 9596
optical platforms, 9397
pupil-forming architecture, 91, 93, 9596
tools, 91
Optical tracking systems, 440, 442443,
492493, 497, 499, 505, 522
The Optimist, 273
Organic light-emitting diode (OLED),
102107,603
Orientation code matching, 359
Orthopedic surgery, 500, 527, 529, 538539

P
Panasonic Open-Ear Bone Conduction Wireless
Headphones, 297
Panorama imagery, 668, 670
Parametric speakers, 298
Parametric ultrasonics, 302
PCBs, see Printed circuit boards (PCBs)
Peripheral vision, 62, 71, 80
Perspective-n-point (PnP) problem, 126127
Photonic mixer device (PMD), 473474
Photopic vision, 62
Physical activity monitoring, 589, 603, 607
Picking outlining adding (POA) paradigm, 666

Pin-array systems, 554
Point cloud model, see Light Detection and
Ranging (LiDAR) scanners
Polarization beam splitting (PBS) films, 102103
Polyvinyl alcohol (PVA) gel electrolyte, 652
Printed circuit boards (PCBs), 593, 621622, 631
Progressive sample consensus (PROSAC) loop,
179, 186, 189
Proximity monitoring
accurate measurement, 377
autonomous entities, 376377
buried assets, 377
construction and manufacturing, 376
DAS, 378
end-effector position, 377
excavator articulation tracking system,
377378
geometric proximity querying method,
379381
kinematic articulation, 378379
monitored virtual environment, 379
Pseudocapacitors, 643644

Q
QBIC system, 595, 597, 599, 607, 610611, 613
Qualcomm, 102, 222

R
Rainbows End, 265266
Random Sample Consensus (RANSAC) method,
157, 161, 165, 192, 196
Real-time high dynamic range techniques, 77
Real-time locating systems (RTLS), 289, 323
Real-virtual compositing process
aliasing virtual objects, 461
pixel averaging, 470471
real-virtual supersampling, 470472
global illumination effects, 460
occlusion handling methods, 473475
occlusion problem, 460
Realvirtual supersampling, 470472
Recon MOD Live, 2628
Reference camera
camera calibration, 135
camera pose estimation, 136137
fiducial marker, 134135
optical zoom lens movement, geometric
model, 135136
Reflection Technology's Private Eye, 13, 25
Relay optics, 60, 6566
Remote health monitoring, 585, 589, 594
REXplorer, 271
RGBD camera, 404
Rider Spoke, 271
RTLS, see Real-time locating systems (RTLS)

S
Scalable and Modular Augmented Reality
Template (SMART), 352353
Scale-invariant feature transform (SIFT),
165,167, 176, 179, 189, 192, 208,
212, 389
Scanning electron microscopy (SEM), 648649
Scotopic vision, 62
Screen-based display
augmented endoscopy system, 498
camera/projector system, 497498
endoscopic tracking system, 499500
image-guided navigation systems, 496
liver surgery, 499
liver tumor resection, 499
lymph node biopsy guidance system,
498499
model-enhanced US-assisted guidance
system, 501
optical tracking system, 497
robotic device, 497
virtual fluoroscopy, 500
Sensor-based methods
applications, 197198
design objectives, 199
good accuracy and high efficiency, 199200
object tracking
accelerometer, 217218
gyroscope, 217219
IMU, 216
Kalman filtering, 219221
magnetometer, 219
Server/client system design
latency analysis, 187188
offline reconstruction, 186187
online operation, 186187
point cloud, 186187
pose estimate, 187
query response, 187
sensor integration, 188
SFM, see Structure from motion (SFM)
Shoulder-worn acoustic targeting system, 313
SIFT, see Scale-invariant feature
transform(SIFT)
Single photon emission computed
tomography (SPECT), 488, 492–493
Single-thread chain stitch machine, 623624
Single-walled carbon nanotubes (SWCNTs), 647
Small blurry image (SBI) localization
method,185
Smart eyewear
optical combiner and prescription lenses,
107, 109
potential consumers, 107108
requirements, 107
Rx/plano lens, 107, 109110

Smart glasses
consumer headsets, 113, 116
eye box
eye pupil diameter, 100102
vs. eye relief, 100101
FOV, 102
holographic combiner and extraction,
95,100
optical combiner thickness, 100
focus/convergence disparity, 112113,
115116
immersion display, 111, 114
IPD, 99
optical architecture
non-pupil-forming architecture,
93,9596
optical platforms, 9397
pupil-forming architecture, 91, 93, 9596
tools, 91
optical requirements
angular resolution, 9091
constraints, 8889
dot per degree, 90
dot per inch, 90
occlusion and see-through displays,
8889
pixel counts, 9091
target functionality, 90, 92
products, 8788
see-through smart glasses, 111, 114115
specialized headsets, 114, 117
Social Panoramas, prototype system, 670
cylindrical panorama image, 671672
Glass user, 674
prototype system, 671
remote awareness, 672673
TCP/IP networking, 673674
user interaction, 673
Wi-Fi network, 673
Soft finger models, 556
Soft skin simulation/deformation
anisotropic behaviors
constraint formulation, 573
constraint Jacobians, 573574
finger skin deformations, 576
hyperbolic projection function, 572573
linear interpolation, 574575
orthogonal projection, 574575
real-world finger, 570
strain limits, 571572
Cybergrasp haptic device, 562563
deformable hands, 556557
flesh
elastic force computation, 560
elasticity model, 559
skeleton, coupling, 560561
tetrahedral discretization, 559

haptic rendering, 562563
rigid articulated hands, 555
skeletal bone structure, 557558
strain-limiting model, 564565
constrained dynamics, 566567
constraint Jacobians, 566
constraints, 564565
contact friction, 567
error metrics, 567568
hand animation, 569570
haptic coupling, 568
human finger model, 568569
Software-defined radio (SDR), 301, 323
Soldier wearable shooter detection
system,313
Sonic flashlight, 504
Sonification, 314
Sound bells, 298
Source and sink dimensions
auditory display form factors
ambisonics and HOA, 300
ASMR, 295
Barco Auro-3D, 300
bone conduction headphones, 297, 299
DirAC, 300
Dolby Atmos, 300
mobile terminals, 295, 297
multichannel loudspeaker arrays,
299300
nearphones, 297
parametric speakers, 298
sound bells, 298
stereo earphones, headphones, and
headsets, 295
stereo loudspeakers, 299
VBAP, 300
WFS systems, 300
broadband wireless network
connectivity,301
digital 4C [foresee] convergence, 294
dynamic responsiveness, 300
head tracking, 300301
mobile and wearable auditory interfaces,
295297
personal audio interfaces, 295
ubicomp/ambient intimacy, 294
Spatial dimensions, 283
distance effects, 291292
GIS, 289
GPS, 289
localized audio sources, 290291
MEMS, 290
position, 289
RTLS, 289
stereotelephony, 292294
whenceware, 290
whitherware, 290

Spatially Oriented Format for
Acoustics (SOFA), 322
Spatial sound
audio reinforcement, 287
auditory displays, 278
Audium, 281282
binaural hearing aids, 301302
directionalization and localization,
283285
distributed spatial sound, 302303
information furniture, 303
mixed reality and mixed virtuality
systems,287
mobile AAR system, 288
musical sound characteristics, 282
optical and video, 288
parametric ultrasonics, 302
sink chain, 278280
spatial reverberation system, 285287
stereo reproduction systems, 280281
subjective spatial attributes, 284
wearware and everyware (see Source and
sink dimensions)
whereware, spatial dimensions, 284
distance effects, 291292
GIS, 289
GPS, 289
localized audio sources, 290291
MEMS, 290
position, 289
RTLS, 289
stereotelephony, 292294
whenceware, 290
whitherware, 290
Spatial Sound Description Interchange
Format(SpatDIF), 322
SPECT, see Single photon emission computed
tomography (SPECT)
Speech interfaces, 17
Speeded Up Robust Feature (SURF) detector
algorithm adaptation
GMoment method, 211212
gradient moments, 210
implementation, 210, 212
performance degradation, 209210
Phone-to-PC ratio, 210211
runtime cost, 210211
Hessian matrix, 208209
Spherical aberrations, 77
Staple fibers, 645646
State and resource based
simulation of construction
processes (STROBOSCOPE), 351–352
Stereo camera model, see Reference camera
Stereo loudspeakers, 299
Stereotactic frame, 521
Stereotelephony, 288, 292294

Strain-limiting model, 564565
constrained dynamics, 566567
constraint Jacobians, 566
constraints, 564565
contact friction, 567
error metrics, 567568
hand animation, 569570
haptic coupling, 568
human finger model, 568569
Structure from motion (SFM); see also Visual
simultaneous localization and
mapping (VSLAM)
A/C motor sequence, 162, 164
building scene, 162164
camera pose estimation, 161
error reduction, 163164
fuse box sequence, 162, 164
incremental keypoint matching, 160161
keypoint descriptors, 160
point cloud, 158160
pose accuracy, 163
RMS reprojection error, 163
subtrack optimization, 156158
unified framework, 153, 156
unmatched keypoints, 161
Stylized AR systems, 463
applications, 476478
GPUs, 474477
psychophysical evaluation, 478
realtime augmented environments, 476477
Subtrack optimization, 156159, 161164
Surface-mount (SMD) components, 633634
SURF detector, see Speeded Up Robust
Feature(SURF) detector
Surgical navigation, 487488
SWCNTs, see Single-walled carbon
nanotubes(SWCNTs)
Synesthetic telepresence, 313

T
Tabletop synchronized robots, 407–408
TAC system, 664
Technology acceptance model (TAM), 411, 416–418, 421, 428
Template markers, 203–204
3D autostereoscopic system, 505
Three-dimensional (3D) visual model
automatic 3D modeling pipeline, 177–179
multiple known points, 174–175
panoramic/omnidirectional camera, 176
quantitative evaluations
architectural rendering, 190–191
differential GPS, 182, 189–190
landscape design application, 190–191
speed, 189
video game graphics, 190–191
semiautomatic geo-alignment
benefits, 179–180
external orientation, 179
ground plane determination, 181
map alignment, 181–182
vertical alignment, 180
server/client system design
latency analysis, 187–188
offline reconstruction, 186–187
online operation, 186–187
point cloud, 186–187
pose estimate, 187
query response, 187
sensor integration, 188
tracking system, 181
camera model, 182
image representation, 182
initialization and reinitialization, 185–186
live keyframe sampling, 184
metrics, 184
point correspondence search, 182–183
pose updation, 183
Through-hole components, 632–633
Time-of-flight range sensor, 473–475
Tracking system, 181
camera model, 182
image representation, 182
initialization and reinitialization, 185–186
live keyframe sampling, 184
metrics, 184
point correspondence search, 182–183
pose updation, 183
Triplett VisualEYEzer 3250 multimeter, 22–23
Twelve Angry Men, 270
Twiddler, 14, 17, 24–25, 592, 602
2D barcode markers, 202–205

U
Ultrasonic bonding process, 630
Ultrasound guidance system, 491–492
University of North Carolina (UNC), 69
User interfaces (UI), 222, 353, 586–589, 596–597

V
Vector base amplitude panning (VBAP),
298,300
Vibrotactile system, 554
Virtual interaction tools, 527, 536
Virtual reality (VR); see also Head-mounted
displays (HMDs)
Facebook Oculus Rift, 323
FOV function, 92
occlusion/immersive displays, 86
sickness issues, 115

Sony PlayStation 4 Morpheus, 323
systems, 86
Virtual retinal display (VRD), 6869
Vision-based methods
applications, 196197
design objectives, 199
good accuracy and high efficiency, 199200
object recognition and tracking
components, 201202
detection, 202
edge lines, intersections, 201
feature-based methods (see Feature-based
tracking method)
limitations, 205
marker identification, 203205
marker tracking, 205
pipeline, 200201
processes, 200
recognizer, 200201
software development libraries, 222223
Visual consistency
artistic and illustrative stylizations, 463
camera realism, 462463
complementary strategies, 463464
computer-generated virtual objects,
458460
emulating photographic imperfections
camera image noise, 466468
defocus blur, 468469
motion blur, 468469
film grain, 464
fully automatic processing, 465
hardware capabilities, 465
real-time processing, 465
real-virtual compositing (see Real-virtual
compositing)
stylized AR systems, 463
applications, 476478
GPUs, 474477
psychophysical evaluation, 478
realtime augmented environments,
476477
system resources, 465
video see-through AR, 458
visual discrepancies, 461462
Visual illusion
implementation, 346348
incorrect and correct occlusion, 344345
occlusion handling process, 344346
two-stage rendering method, 346
validation experiments, 349350
Visual modeling, see Three-dimensional (3D)
visual model
Visual simultaneous localization and
mapping(VSLAM)
accuracy and robustness, 446
CAD model, 443445

error accumulation, 445
GIS
database creation issue, 454
in-plane accuracy, 449451
low cost, 447448
navigation application, 452453
odometer/inertial sensor, 454
out-plane accuracy, 452
GPS constraints
buildings model, 437, 452453
camera positions, 448
degrees of freedom, 447, 449
inequality constraint, 448449
measurements, 448
navigation application, 452453
keyframes, 442
multiview geometry, 442
sales assistance, 437, 446
scale factor, 442443, 445
small motion assumption, 445446
Visual tracking; see also Three-dimensional (3D)
visual model
LiDAR point clouds (see Light Detection and
Ranging (LiDAR) scanners)
object recognition and tracking
database matches, 155
incremental keypoint matching,
154156
keypoints, 154155
offline stage, 153154
online stage, 154
projection matches, 155
unified framework, 152153
robust SFM
A/C motor sequence, 162, 164
building scene, 162164
camera pose estimation, 161
error reduction, 163164
fuse box sequence, 162, 164
incremental keypoint matching, 160161
keypoint descriptors, 160
point cloud, 158160
pose accuracy, 163
RMS reprojection error, 163
subtrack optimization, 156158
unified framework, 153, 156
unmatched keypoints, 161
VSLAM, see Visual simultaneous localization
and mapping (VSLAM)
Vuforia cloud recognition service, 222223

W
Walsh–Hadamard (WH) kernel projection, 160
Wearable active camera with laser
pointer (WACL), 665, 668–669
Wearable computer integration

accessory-based wearable computers (see
Accessory-based wearable computers)
cost effectiveness, 600, 602, 612
data and power, 600601, 607
extensibility, 600, 602, 611
garment-based wearable computers (see
Garment-based wearable computers)
on-body computing, 600, 602, 612613
robustness and reliability, 600601, 610611
safety considerations, 600, 602, 612
sensing modalities, 600601, 603607
social acceptance and aesthetics,
600601,609
user interface, 600603
wearability, 600601, 608609
Wearable computing
academic/maker system, 2425
audio displays, 1819
computer-generated images, 9
consumer devices, 2628
diabetes, monitoring, 6
efficiency, improvements, 28
image registration, 10
IMT technology, 67
individual electrodes, 5
industrial wearable systems, 2224
Lou Gehrig's disease, 5
microchip, 45
microvibration device, 56
mixed-reality, 34
mobile input, 1718
networking, 1415
neural interface, 5
portable video viewers, 2022
power and heat, 1517
public policy, 79
virtual reality, 1920
visual displays, 1819

Wearable haptic systems
definition, 552
force systems, 554
grounded kinesthetic devices, 552
haptic feedback, 552553
haptic ring, 554
pin-array systems, 554
soft skin simulation (see Soft skin simulation/
deformation)
vibrotactile system, 554
WFS systems, 300
Whence- and whitherware navigation
systems,290
WiMAX, 301
Windows, icon, menu, pointer (WIMP)
interfaces, 18, 24, 324

X
Xybernaut, 17, 2224

Y
Yamaha Vocaloid, 287
You Get Me, 271272

Z
z-buffering, 346347
Zhang's camera calibration method, 130, 137–138
ZigBee, 301, 590, 593, 607
Zoomable camera model
offline process, 129
online process
intrinsic camera parameter change, 130131
monocular camera (see Monocular camera)
reference camera, 134137
PnP problem, 126127
