FUNDAMENTALS OF
Wearable Computers and Augmented Reality
SECOND EDITION
edited by
Woodrow Barfield
MATLAB is a trademark of The MathWorks, Inc. and is used with permission. The MathWorks does
not warrant the accuracy of the text or exercises in this book. This book's use or discussion of
MATLAB software or related products does not constitute endorsement or sponsorship by The
MathWorks of a particular pedagogical approach or particular use of the MATLAB software.
CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742
© 2016 by Taylor & Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, an Informa business
No claim to original U.S. Government works
Version Date: 20150616
International Standard Book Number-13: 978-1-4822-4351-2 (eBook - PDF)
This book contains information obtained from authentic and highly regarded sources. Reasonable
efforts have been made to publish reliable data and information, but the author and publisher cannot
assume responsibility for the validity of all materials or the consequences of their use. The authors and
publishers have attempted to trace the copyright holders of all material reproduced in this publication
and apologize to copyright holders if permission to publish in this form has not been obtained. If any
copyright material has not been acknowledged, please write and let us know so we may rectify in any
future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced,
transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or
hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.
For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222
Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are
used only for identification and explanation without intent to infringe.
Visit the Taylor & Francis Web site at
http://www.taylorandfrancis.com
and the CRC Press Web site at
http://www.crcpress.com
Contents
Preface
Acknowledgments
Editor
Contributors
Section I Introduction
Chapter 1 Wearable Computers and Augmented Reality: Musings
and Future Directions
Woodrow Barfield

Chapter 2 Wearable Computing: Meeting the Challenge
Thad Starner

Chapter 3 Intimacy and Extimacy: Ethics, Power, and Potential
of Wearable Technologies
Patricia Flanagan, Despina Papadopoulos, and Georgina Voss
Preface
In the early 1990s, I was a member of the coordinating committee that put together
the first conference on wearable computers, which, interestingly, was followed by
a highly publicized wearable computer fashion show. Speaking at the conference,
I recall making the following comment about wearable computers: "Are we wearing
them, or are they wearing us?" At the time, I was thinking that eventually advances
in prosthetics, sensors, and artificial intelligence would result in computational tools
that would have amazing consequences for humanity. Developments since then have
proven that vision correct. The first edition of Fundamentals of Wearable Computers
and Augmented Reality, published in 2001, helped set the stage for the coming
decade, in which an explosion in research and applications for wearable computers
and augmented reality occurred.
When the first edition was published, much of the research in augmented reality
and wearable computers was primarily proof-of-concept projects; there were
few, if any, commercial products on the market. There was no Google Glass, and no
handheld smartphones equipped with sensors and the computing power of a mid-1980s
supercomputer. And the apps for handheld smartphones that exist now were
nonexistent then. Fast forward to today: the commercial market for wearable
computers and augmented reality is in the millions of dollars and heading toward the
billions. From a technology perspective, much of what is happening now with
wearables and augmented reality would not have been possible even five years ago.
So, as an observation, Ray Kurzweil's law of accelerating returns seems to be alive
and well with wearable computer and augmented reality technology: 14 years after
the first edition of this book, the capabilities and applications of both technologies
are orders of magnitude faster, smaller, and cheaper.
As another observation, the research and development of wearable computers
and augmented reality technology that was once dominated by U.S. universities
and research laboratories is truly international in scope today. In fact, the second
edition of Fundamentals of Wearable Computers and Augmented Reality contains
contributions from researchers in the United States, Asia, and Europe. And
conferences in this field are as likely to be held these days in Europe or Asia as they
are in the United States. These are very positive developments and will lead to even
more amazing applications involving the use of wearable computers and augmented
reality technology in the future.
Just as the first edition of this book provided a comprehensive coverage of the
field, the second edition attempts to do the same, specifically by including chapters
from a broad range of topics written by outstanding researchers and teachers within
the field. All of the chapters are new, with an effort to again provide fundamental
knowledge on each topic so that a valuable technical resource is provided to the
community. Specifically, the second edition contains chapters on haptics, visual displays, the use of augmented reality for surgery and manufacturing, technical issues
of image registration and tracking, and augmenting the environment with wearable
audio interfaces. The second edition also contains chapters on the use of augmented
reality in preserving our cultural heritage, on human–computer interaction and
augmented reality technology, on augmented reality and robotics, and on what we
termed in the first edition "computational clothing." Still, even with this wide range
of applications, the main goal of the second edition is to provide the community with
fundamental information and basic knowledge about the design and use of wearable
computers and augmented reality with the goal of enhancing people's lives. I believe
the chapter authors accomplished that goal, showing great expertise and breadth of
knowledge. My hope is that this second edition can also serve as a stimulus for
developments in these amazing technologies in the coming decade.
Woodrow Barfield, PhD, JD, LLM
Chapel Hill, North Carolina
The images for augmented reality and wearable computers are essential for the
understanding of the material in this comprehensive text; therefore, all color images
submitted by the chapter authors are available at http://www.crcpress.com/product/
isbn/9781482243505.
MATLAB is a registered trademark of The MathWorks, Inc. For product information, please contact:
The MathWorks, Inc.
3 Apple Hill Drive
Natick, MA 01760-2098 USA
Tel: 508-647-7000
Fax: 508-647-7001
E-mail: info@mathworks.com
Web: www.mathworks.com
Acknowledgments
I offer special thanks to the following chapter authors for providing images that
appear on the cover of the book: Kiyoshi Kiyokawa, an occlusion-capable optical
see-through head-mounted display; Miguel A. Otaduy, Gabriel Cirio, and Alvaro
G. Perez, simulation of a deformable hand with nonlinear skin mechanics; Vineet
R. Kamat, Amir H. Behzadan, and Suyang Dong, augmented reality visualization
of buried utilities during excavation; Marta Kersten-Oertel, virtual vessels of an
arteriovenous malformation (AVM) (with color-coded vessels [blue for veins,
red for arteries, and purple for the AVM nidus]) overlaid on a live image of a 3D
printed nylon anthropomorphic head phantom; Seokhee Jeon, Seungmoon Choi, and
Matthias Harders, an example of a visuo-haptic augmented reality system
modulating the stiffness of a real soft object; and Kristy Jost, Genevieve Dion, and Yury
Gogotsi, 3D simulations of knitted smart textiles (rendered on the Shima Seiki Apex
3 Design Software).
Several members of CRC Press contributed in important ways to this book's
publication and deserve recognition. First, I thank Jessica Vakili, senior project
coordinator, for answering numerous questions about the process of editing the book
and those of the chapter authors in a timely, patient, and always efficient manner.
I also thank and acknowledge Cindy Renee Carelli, senior acquisition editor, for
contacting me about editing a second edition, championing the proposal through
the publishers review process, and her timely reminders to meet the deadline. The
project editor, Todd Perry, is thanked for the important task of overseeing the coordination, copyediting, and typesetting of the chapters. Gowthaman Sadhanandham
is also thanked for his work in production and assistance provided to authors.
Most importantly, in my role as editor for the second edition, I acknowledge and
thank the authors for their hard work and creative effort to produce outstanding
chapters. To the extent this book provides the community with a valuable resource
and stimulates further developments in the field, each chapter author deserves much
thanks and credit. In many ways, this book began 14 years ago, when the first edition
was published. To receive contributions from some of the original authors, and to
see how their careers have developed over the years and the contributions they have
made to the field, was a truly satisfying experience for me. It was a great honor that
such a distinguished group again agreed to join the project.
Finally, in memoriam, I thank my parents for the freedom they gave me to follow
my interests and for the Erlenmeyer, distilling, and volumetric flasks when I was a
budding teenage scientist. Further, my niece, Melissa, is an inspiration and serves
as the gold standard in the family. Last but not least, I acknowledge my daughter,
Jessica, student and college athlete, for keeping me young and busy. I look forward
to all she will achieve.
Editor
Woodrow Barfield, PhD, JD, LLM, has served as professor of engineering at the
University of Washington, Seattle, Washington, where he received the National
Science Foundation Presidential Young Investigator Award. Professor Barfield
directed the Sensory Engineering Laboratory, where he was involved in research on
sensors and augmented and virtual reality displays. He has served as a senior editor
for Presence: Teleoperators and Virtual Environments and is an associate editor for
Virtual Reality. He has more than 350 publications and presentations, including
invited lectures and keynote talks, and holds two degrees in law.
Contributors
Oliver Amft
ACTLab Research Group
University of Passau
Passau, Germany
Seungmoon Choi
Pohang University of Science and
Technology
Pohang, South Korea
Ronald Azuma
Intel Labs
Santa Clara, California
Gabriel Cirio
Department of Computer Science
Universidad Rey Juan Carlos
Madrid, Spain
Woodrow Barfield
Chapel Hill, North Carolina
Amir H. Behzadan
Department of Civil, Environmental,
and Construction Engineering
University of Central Florida
Orlando, Florida
Mark Billinghurst
Human Interface Technology
Laboratory New Zealand
University of Canterbury
Christchurch, New Zealand
Steve Bourgeois
Vision and Content Engineering
Laboratory
CEA LIST
Gif-sur-Yvette, France
K.T. Tim Cheng
Department of Electrical and Computer
Engineering
University of California, Santa Barbara
Santa Barbara, California
D. Louis Collins
Department of Biomedical Engineering
Department of Neurology &
Neurosurgery
Montreal Neurological Institute
McGill University
Montreal, Canada
Michael Cohen
Computer Arts Laboratory
University of Aizu
Aizu-Wakamatsu, Japan
Genevieve Dion
Shima Seiki Haute Technology
Laboratory
ExCITe Center
Antoinette Westphal College of Media
Arts and Design
Drexel University
Philadelphia, Pennsylvania
Suyang Dong
Department of Civil and Environmental
Engineering
University of Michigan
Ann Arbor, Michigan
Lucy E. Dunne
Department of Design, Housing, and
Apparel
University of Minnesota
St Paul, Minnesota
Jan Fischer
European Patent Office
Munich, Germany
Patricia Flanagan
Wearables Lab
Academy of Visual Arts
Hong Kong Baptist University
Kowloon Tong, Hong Kong
Vincent Gay-Bellile
Vision and Content Engineering
Laboratory
CEA LIST
Gif-sur-Yvette, France
Guido Gioberto
Department of Computer Science and
Engineering
University of Minnesota
Minneapolis, Minnesota
Yury Gogotsi
Department of Materials Science and
Engineering
College of Engineering
A.J. Drexel Nanomaterials Institute
Drexel University
Philadelphia, Pennsylvania
Matthias Harders
University of Innsbruck
Innsbruck, Austria
Anne-Cecilie Haugstvedt
Computas A/S
Lysaker, Norway
Tobias Höllerer
University of California
Santa Barbara, California
Pierre Jannin
INSERM Research Director
LTSI, Inserm UMR 1099
University of Rennes
Rennes, France
Kristy Jost
Department of Materials Science and
Engineering
College of Engineering
A.J. Drexel Nanomaterials Institute
and
Shima Seiki Haute Technology
Laboratory
ExCITe Center
Antoinette Westphal College of Media
Arts and Design
Drexel University
Philadelphia, Pennsylvania
Seokhee Jeon
Kyung Hee University
Seoul, South Korea
Vineet R. Kamat
Department of Civil and Environmental
Engineering
University of Michigan
Ann Arbor, Michigan
Marta Kersten-Oertel
Department of Biomedical Engineering
Montreal Neurological Institute
McGill University
Montreal, Quebec, Canada
Kiyoshi Kiyokawa
Cybermedia Center
Osaka University
Osaka, Japan
Bernard Kress
Google [X] Labs
Mountain View, California
John Krogstie
Department of Computer and
Information Science
Norwegian University of Science and
Technology
Trondheim, Norway
Dorra Larnaout
Vision and Content Engineering
Laboratory
CEA LIST
Gif-sur-Yvette, France
Cristian A. Linte
Department of Biomedical Engineering
Rochester Institute of Technology
Rochester, New York
Allaeddin Nassani
Human Interface Technology
Laboratory New Zealand
University of Canterbury
Christchurch, New Zealand
Alvaro G. Perez
Department of Computer Science
Universidad Rey Juan Carlos
Madrid, Spain
Carolin Reichherzer
Human Interface Technology
Laboratory New Zealand
University of Canterbury
Christchurch, New Zealand
Attila Reiss
Chair of Sensor Technology
University of Passau
Passau, Germany
Cory Simon
Johnson Space Center
National Aeronautics and Space
Administration
Houston, Texas
Thad Starner
School of Interactive Computing
Georgia Institute of Technology
Atlanta, Georgia
Ulrich Neumann
Department of Computer Science
University of Southern California
Los Angeles, California
Maki Sugimoto
Faculty of Science and Technology
Department of Information and
Computer Science
Keio University
Tokyo, Japan
Miguel A. Otaduy
Department of Computer Science
Universidad Rey Juan Carlos
Madrid, Spain
Takafumi Taketomi
Nara Institute of Science and
Technology
Nara, Japan
Despina Papadopoulos
Interactive Telecommunications
Program
New York University
New York, New York
Mohamed Tamaazousti
Vision and Content Engineering
Laboratory
CEA LIST
Gif-sur-Yvette, France
Jonathan Ventura
University of Colorado
Colorado Springs, Colorado
Julián Villegas
Computer Arts Laboratory
University of Aizu
Aizu-Wakamatsu, Japan
Georgina Voss
Science and Technology Policy
Research
University of Sussex
Sussex, United Kingdom
Xin Yang
Department of Electrical and Computer
Engineering
University of California, Santa Barbara
Santa Barbara, California
Ziv Yaniv
TAJ Technologies, Inc.
Mendota Heights, Minnesota
and
Office of High Performance Computing
and Communications
National Library of Medicine
National Institutes of Health
Bethesda, Maryland
Suya You
Department of Computer Science
University of Southern California
Los Angeles, California
Section I
Introduction
Wearable Computers
and Augmented Reality
Musings and Future Directions
Woodrow Barfield
CONTENTS
1.1 Public Policy
1.2 Toward a Theory of Augmented Reality
1.3 Challenges and the Future Ahead
References
In this chapter, I briefly introduce the topic of wearable computers and augmented
reality, with the goal of providing the reader a roadmap to the book, a brief historical
perspective, and a glimpse into the future of a sensor-filled, wearable computer and
augmented reality (AR) world. While each technology alone (AR and wearables) is
providing people with amazing applications and technologies to assist them in their
daily lives, the combination of the technologies is often additive and, in some cases,
multiplicative, as, for example, when virtual images, spatialized sound, and haptic
feedback are combined with wearable computers to augment the world with
information whenever or wherever it is needed.
Let me begin to set the stage by offering a few definitions. Azuma (1997) defined
an augmented reality application as one that combines the real world with the virtual
world, is interactive and in real-time, and is registered in three dimensions. Often, the
platform to deliver augmented reality is a wearable device or, in the case of a
smartphone, a handheld computer. Additionally, most people think of a wearable
computer as a computing device that is small and light enough to be worn on one's
body without causing discomfort. And unlike a laptop or a palmtop, a wearable
computer is constantly turned on and is often used to interact with the real world
through sensors that are becoming more ubiquitous each day. Furthermore,
information provided by a wearable computer can be very context and location
sensitive, especially when combined with GPS. In this regard, the computational
model of wearable computers differs from that of laptop computers and personal
digital assistants.
In the early days of research in developing augmented reality, many of the same
researchers were also involved in creating immersive virtual environments. We began
to discuss different degrees of reality and virtuality. Early on, Paul Milgram from
the University of Toronto codified the thinking by proposing a virtuality continuum,
which represents a continuous scale ranging between a completely virtual world,
virtuality, and a completely real world, reality (Milgram et al., 1994). The
reality–virtuality continuum therefore encompasses all possible variations and
compositions of real and virtual objects. The area between the two extremes, where
both the real and the virtual are mixed, is the so-called mixed reality, which Paul
indicated consisted of both augmented reality, where the virtual augments the real,
and augmented virtuality, where the real augments the virtual. Another prominent
early researcher in wearables, and a proponent of the idea of mediating reality, was
Steve Mann (2001, 2002). Steve, now at the University of Toronto, describes
wearable computing as "miniature body-borne computational and sensory devices";
he expanded the discussion of wearable computing to include the more expansive
term "bearable computing," by which he meant wearable computing technology that
is on or in the body, and with numerous examples, Steve showed how wearable
computing could be used to augment, mediate, or diminish reality (Mann, 2002).
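Azuma's three-part definition and Milgram's continuum lend themselves to a compact encoding. The sketch below is purely illustrative and is not drawn from the chapter or the cited papers: the `System` class, the scalar `virtuality` score, and the 0.5 split between the two mixed-reality regions are my assumptions, offered only to make the two taxonomies concrete.

```python
from dataclasses import dataclass

@dataclass
class System:
    name: str
    combines_real_and_virtual: bool  # Azuma criterion 1
    interactive_in_real_time: bool   # Azuma criterion 2
    registered_in_3d: bool           # Azuma criterion 3
    virtuality: float                # 0.0 = wholly real, 1.0 = wholly virtual

def is_augmented_reality(s: System) -> bool:
    """Azuma (1997): all three criteria must hold for a system to count as AR."""
    return (s.combines_real_and_virtual
            and s.interactive_in_real_time
            and s.registered_in_3d)

def continuum_region(s: System) -> str:
    """Coarse placement on Milgram et al.'s (1994) reality-virtuality continuum."""
    if s.virtuality == 0.0:
        return "reality"
    if s.virtuality == 1.0:
        return "virtuality"
    # Everything in between is mixed reality: augmented reality when the
    # virtual augments the real, augmented virtuality when the reverse holds.
    return "augmented reality" if s.virtuality < 0.5 else "augmented virtuality"

hud = System("see-through HMD overlay", True, True, True, 0.2)
print(is_augmented_reality(hud), continuum_region(hud))  # True augmented reality
```

A non-registered heads-up readout (say, a clock pinned to the corner of a display) would fail Azuma's third criterion yet still sit in the mixed region of Milgram's continuum, which is why the two framings are complementary rather than redundant.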
When I think of the different types of computing technology that may be worn on
or in the body, I envision a continuum that starts with the most basic wearable
computing technology and ends with wearable computing that is actually connected
to a person's central nervous system, that is, their brain (Figure 1.1). In fact, as
humans become more and more equipped with wearable computing technology, the
distinction as to what is thought of as a prosthesis is becoming blurred as we
integrate more wearable computing devices into human anatomy and physiology.
The extension of computing integrated into a person's brain could radically enhance
human sensory and cognitive abilities; in fact, in my view, we are just now at the
cusp of wearable computing and sensor technology breaking the skin barrier and
moving
FIGURE 1.1 A microchip is used to process brain waves that are used to control a cursor on
a computer screen. (Image courtesy of Wikimedia Commons.)
into the human body, and eventually into the brain. Already there are experimental
systems (computing technology integrated into a person's brain) in the field now
that are helping those with severe physical disabilities. For example, consider
people with debilitating diseases such that they are essentially locked in their own
bodies. With the appropriate wearable computing technology, consisting of a
microchip that is implanted onto the surface of the brain (where it monitors
electronic thought pulses), such people may use a computer by thought alone,
allowing them to communicate with their family, caregivers, and, through the
internet, the world at large. Sadly, in the United States alone, about 5,000 people
yearly are diagnosed with just such a disease that ultimately shuts down the motor
control capabilities of the body: amyotrophic lateral sclerosis, sometimes called
Lou Gehrig's disease. This disease is a rapidly progressive, invariably fatal
neurological disease that attacks the nerve cells responsible for controlling
voluntary muscles. I highlight this example to show that, while many uses of
AR/wearables will be for gaming, navigation, shopping, and so on, there are very
transformative uses of wearable computing technology, either being developed now
or soon to be developed, that will benefit humanity in ways we are just now
beginning to realize.
One of the early adopters of wearable computing technology, especially with
regard to implantable sensors within the body, was Professor Kevin Warwick, who
in 1998 at the University of Reading was one of the first people to "hack" his body
when he participated in a series of proof-of-concept studies involving a sensor
implanted into the median nerves of his left arm, a procedure that allowed him to
link his nervous system directly to a computer. Most notably, Professor Warwick
was able to control an electric wheelchair and an artificial hand using the neural
interface. In addition to being able to measure the signals transmitted along the
nerve fibers in Professor Warwick's left arm, the implant was also able to create
artificial sensation by stimulating the nerves in his arm using individual electrodes.
This bidirectional functionality was demonstrated with the aid of Kevin's wife and
a second, less complex implant, which connected to her nervous system. According
to Kevin, this was the first solely electronic communication between the nervous
systems of two humans; since then, many have extended Kevin's seminal work in
wearable computers using RFID chips and other implantable sensors (and there is
even an anti-chipping statute enacted in California and other states).
Other types of innovative and interesting wearable devices are being developed
at a rapid pace. For example, researchers at Brown University and Cyberkinetics in
Massachusetts are devising a microchip, implanted in the motor cortex just beneath
a person's skull, that will be able to intercept nerve signals and reroute them to a
computer, which will then wirelessly send a command to any of various electronic
devices, including computers, stereos, and electric wheelchairs. And neuroscientists
and robotics engineers have just recently demonstrated the viability of direct brain-
to-brain communication in humans using electroencephalogram (EEG) and image-
guided transcranial magnetic stimulation (TMS) technologies. Further, consider a
German team that has designed a microvibration device and a wireless low-
frequency receiver that can be implanted in a person's tooth. The vibrator acts as
microphone and speaker, sending sound waves along the jawbone to a person's
eardrum. And in another example of a wearable implantable device, consider
people with diabetes, who may be equipped with computers designed to monitor
their blood-sugar level; if the disease is not controlled, such people are at risk for
dangerous complications, including damage to the eyes, kidneys, and heart. To help
people monitor their blood-sugar level, Smart Holograms, a spinoff company of
Cambridge University, Google, and others are developing eye-worn sensors to
assist those with the disease. Google's technology consists of a contact lens built
with special sensors that measure sugar levels in tears using a tiny wireless chip
and a miniature sensor embedded between two layers of soft contact lens material.
As interesting and innovative as this solution to monitoring diabetes is, it isn't the
only example of eye-oriented wearable technology that will be developed. In the
future, we may see people equipped with contact lenses or retinal prostheses that
monitor their health, detect energy in the x-ray or infrared range, and have
telephoto capabilities.
As for developing a telephoto lens, an implantable telescope could offer hope for
the approximately 20 to 25 million people worldwide who have the advanced form
of age-related macular degeneration (AMD), a disease that affects the region of the
retina responsible for central, detailed vision and is the leading cause of irreversible
vision loss and legal blindness in people over the age of 65. In fact, in 2010, the
U.S. FDA approved an implantable miniature telescope (IMT), which works like
the telephoto lens of a camera (Figure 1.2).
The IMT technology reduces the impact of the central vision blind spot due to
end-stage AMD and projects the objects the patient is looking at onto the healthy
area of the light-sensing retina not degenerated by the disease.
FIGURE 1.2 The implantable miniature telescope (IMT) is designed to improve vision
for those experiencing age-related macular degeneration. (Images provided courtesy of
VisionCare Ophthalmic Technologies, Saratoga, CA.)
The surgical procedure involves removing the eye's natural lens, as with cataract
surgery, and replacing the lens with the IMT. While telephoto eyes are not coming
soon to an ophthalmologist's office, this is an intriguing step in that direction and a
look into the future of wearable computers. I should point out that, in the United
States, any device containing a contact lens or other eye-wearable technology is
regulated by the Food and Drug Administration as a medical device; the point being
that much of wearable computing technology comes under government regulation.
devices is becoming a major concern. Another point to make is that sensors on the
outside of the body are rapidly moving under the skin as they begin to connect the
functions of our body to the sensors external to it (Holland et al., 2001).
Furthermore, what about privacy issues and the use of wearable computers to
film people against their will? Consider an extreme case, video voyeurism, which is
the act of filming or disseminating images of a person's private areas under
circumstances in which the person had a reasonable expectation of privacy,
regardless of whether the person is in a private or public location. Video voyeurism
is not only possible but being done using wearable computers (mostly hand-held
cameras). In the United States, such conduct is prohibited under state and federal
law (see, e.g., Video Voyeurism Prevention Act of 2004, 18 U.S.C.A. § 1801).
Furthermore, what about the privacy issues associated with other wearable
computing technology, such as the ability to recognize a person's face, then search
the internet for personal information about the individual (e.g., police record or
credit report), and tag that information on the person as they move throughout the
environment?
As many of the chapters in this book show, wearable computers combined with
augmented reality capabilities can be used to alter or diminish reality, in which a
wearable computer can be used to replace or remove clutter, say, for example, an
unwanted advertisement on the side of a building. On this topic, I published an
article, "Commercial Speech, Intellectual Property Rights, and Advertising Using
Virtual Images Inserted in TV, Film, and the Real World," in the UCLA
Entertainment Law Review. In the article, I discussed the legal and policy
ramifications of placing ads consisting of virtual images projected in the real world.
We can think of virtual advertising as a form of digital technology that allows
advertisers to insert computer-generated brand names, logos, or animated images
into television programs or movies; or, with Steve's wearable computer technology
and other displays, into the real world. In the case of TV, a reported benefit of
virtual advertising is that it allows the action on the screen to continue while
displaying an ad viewable only by the home audience.
What may be worrisome about the use of virtual images to replace portions of
the real world is that corporations and government officials may be able to alter
what people see based on political or economic considerations; an altered reality
may then become the accepted norm, the consequences of which bring to mind the
dystopian society described in Huxley's Brave New World. Changing directions,
another policy issue to consider for people equipped with networked devices is
what liabilities, if any, would be incurred by those who disrupt the functioning of
their computing prosthesis. For example, would an individual be liable if they
interfered with a signal sent to an individual's wearable computer, if that signal was
used to assist the individual in seeing and perceiving the world? On just this point,
former U.S. Vice President Dick Cheney had the wireless feature of his pacemaker
disabled in 2007.
Restaurants have also entered into the debate about the direction of our wearable
computer future. Taking a stance against Google Glass, a Seattle-based restaurant,
Lost Lake Cafe, actually kicked out a patron for wearing Glass. The restaurant is
standing by its no-glass policy, despite mixed responses from the local community.
In another incident, a theater owner in Columbus, Ohio, saw enough of a threat from
Google Glass to call the Department of Homeland Security. The Homeland Security
agents removed the programmer who was wearing Google Glass connected to his
images combined to form a seamless whole with the environment they were projected into, or whether virtual images projected in the real world appeared separate
from the surrounding space (floating and disembodied from the real world scene).
I recalled a paper I had read while in college by Garner and Felfoldy (1970) on
the integrality of stimulus dimensions in various types of information processing.
The authors of the paper noted that separable dimensions remain psychologically
distinct when in combination; an example being forms varying in shape and color.
We say that two dimensions (features) are integral when they are perceived holistically, that is, it's hard to visually decode the value of one independently from the
other. A vast amount of converging evidence suggests that people are highly efficient
at selectively attending to separable dimensions. By contrast, integral dimensions
combine into relatively unanalyzable, unitary wholes, an example being colors
varying in hue, brightness, and saturation. Although people can selectively attend
to integral dimensions to some degree, the process is far less efficient than occurs
for separable-dimension stimuli (Shepard, 1964). I think that much can be done to
develop a theory of augmented, mediated, or diminished reality using the approach
discussed by Garner and Felfoldy, and Shepard, and I encourage readers of this book
to do so. Such research would have to extend the past work, which was done on single
images, to virtual images projected into the real world.
REFERENCES
Azuma, R. T., 1997, A survey of augmented reality, Presence: Teleoperators and Virtual
Environments, 6(4), 355–385.
Garner, W. R. and Felfoldy, G. L., 1970, Integrality of stimulus dimensions in various types
of information processing, Cognitive Psychology, 1, 225–241.
Glik v. Cunniffe, 655 F.3d 78 (1st Cir. 2011) (case at the United States Court of Appeals for
the First Circuit that held that a private citizen has the right to record video and audio
of public officials in a public place, and that the arrest of the citizen for a wiretapping
violation violated the citizen's First and Fourth Amendment rights).
Holland, D., Roberson, D. J., and Barfield, W., 2001, Computing under the skin, in Barfield, W.
and Caudell, T. (eds.), Fundamentals of Wearable Computing and Augmented Reality,
pp. 747–792, Lawrence Erlbaum Associates, Inc., Mahwah, NJ.
Mann, S., August 6, 2002, Mediated reality with implementations for everyday life, Presence
Connect, the online companion to the MIT Press journal, PRESENCE: Teleoperators
and Virtual Environments, 11(2), 158–175, MIT Press.
Mann, S. and Niedzviecki, H., 2001, Cyborg: Digital Destiny and Human Possibility in the
Age of the Wearable Computer, Anchor Canada/Doubleday Canada, Toronto.
Milgram, P., Takemura, H., Utsumi, A., and Kishino, F., 1994, Augmented reality: A class of
displays on the reality–virtuality continuum, in Proceedings of the SPIE Conference on
Telemanipulator and Telepresence Technologies, vol. 2351, pp. 282–292, Boston, MA.
Shepard, R. N., 1964, Attention and the metric structure of the stimulus space, Journal of
Mathematical Psychology, 1, 54–87.
Wearable Computing
Thad Starner
CONTENTS
2.1 Networking
2.2 Power and Heat
2.3 Mobile Input
2.4 Display
2.5 Virtual Reality
2.6 Portable Video Viewers
2.7 Industrial Wearable Systems
2.8 Academic/Maker Systems for Everyday Use
2.9 Consumer Devices
2.10 Meeting the Challenge
References
Wearable computers and head-mounted displays (HMDs) are in the press daily. Why
have they captured our imaginations now, when the technology has been available
for decades? While Fitbit's fitness tracking devices are selling in the millions in
2014, what prevented FitSense (see Figure 2.5) from having similar success with
such devices in 2000? Since 1993 I have been wearing a computer with an HMD as
part of my daily life, and Reddy Information Systems had a commercial wearable
with Reflection Technology's Private Eye HMD in 1991 (Eliason 1992). Yet over
20 years later, Google Glass is generating more excitement than any of those early
devices.
Many new classes of devices have followed a similar arc of adoption. The fax
machine was invented in 1846 but became popular over 130 years later. In 1994, the
IBM Simon touchscreen smartphone had many features familiar in today's phones,
but it was the Apple iPhone in 2007 that seized the public's imagination (Sager 2012).
Often, the perceived need for a technology lags behind innovation, and sometimes
developers can be surprised by the ways in which users run with a technology. When
the cellular phone was introduced in the early 1980s, who would have guessed that
increasingly we would use it more for texting than talking?
Some pundits look for a "killer app" to drive the adoption of a new class of device.
Yet that can be misleading. As of mid-2014, tablets are outselling laptops in Europe,
yet there is no single killer app that drives adoption. Instead, the tablet offers a different set of affordances (Gibson 1977) than the smartphone or the laptop, making
it more desirable in certain situations. For example, for reading in bed the tablet
is lighter than a laptop and provides an easier-to-read screen than a smartphone.
The tablet is controlled by finger taps and swipes that require less hardware and dexterity than trying to control a mouse and keyboard on a laptop, which also makes it
convenient for use when the user is in positions other than upright at a desk.
Wearable computers have yet a different set of affordances than laptops, tablets,
and smartphones. I often lie on a couch in my office, put the focus of my HMD at the
same depth as the ceiling, and work on large documents while typing using a one-handed keyboard called a Twiddler. This position is very comfortable, much more so
than any other interface I have tried, but students often think that they are waking me
when they walk into my office. In addition, I often use my wearable computer while
walking. I find it helps me think to be moving when I am composing, and no other
device enables such on-the-go use.
On-the-go use is one aspect of wearable computers that makes them distinct from
other devices. In fact, my personal definition of a wearable computer is any body-worn computer that is designed to provide useful services while the user is performing other tasks. Often the wearable's interface is secondary to a user's other tasks
and should require a minimum of user attention. Take, for example, a digital music
player. It is often used while a user is exercising, studying, or commuting, and the
interface is used in short bursts and then ignored.
Such a secondary interface in support of a primary task is characteristic of a
wearable computer and can be seen in smartwatches, some spacesuits, fitness monitors, and even smartphones for some applications. Some of these devices are already
commonplace. However, here I will focus on wearable computers that include an
HMD, as these devices are at the threshold of becoming popular and are perhaps
the most versatile and general-purpose class of wearable computers. Like all wearable computers, those based on HMDs have to address fundamental challenges in
networking (both on and off the body), power and heat, and mobile input. First I will
describe these challenges and show how, until recently, they severely limited what
types of devices could be manufactured. Then I will present five phases of HMD
development that illustrate how improvements in technology allowed progressively
more useful and usable devices.
2.1 NETWORKING
Turn-by-turn navigation, voice-based web search, and cloud-based office tools are
now commonplace on smartphones, but only in the past few years has the latency
of cellular networks been reduced to the point that computing in the cloud is effective. A decade ago, the throughput of a cellular network in cities like Atlanta could
be impressive, yet the latency would severely limit the usability of a user interface
depending on it. Today when sending a message, a Google Glass user might say,
"OK Glass, send a message to Thad Starner. Remember to pick up the instruction
manual," and the experience can be a seamless interplay of local and cloud-based
processing. The three commands "OK Glass," "send a message to," and "Thad Starner"
are processed locally because the speech recognizer simply needs to distinguish
between one of several prompts, but the message content "Remember to pick up
the instruction manual" requires the increased processing power of the cloud to be
recognized accurately. With an LTE cellular connection, the content is processed
quickly, and the user may barely notice a difference in performance between local
and remote services. However, with a GPRS, EDGE, or sometimes even an HSDPA
connection, the wait for processing in the cloud can be intolerable.
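The local-versus-cloud split described above can be sketched as a simple dispatch rule. This is a hypothetical illustration of the idea, not Glass's actual API; the grammar set and `route` function are my own:

```python
# Hypothetical sketch: route fixed-grammar prompts to an on-device recognizer
# and free-form content to a cloud service, as Glass-style voice input does.
LOCAL_GRAMMAR = {"ok glass", "send a message to", "thad starner"}

def route(utterance: str) -> str:
    """Return which recognizer should handle this utterance."""
    # A fixed prompt only has to be distinguished from a small set of
    # alternatives, so a low-power local matcher suffices.
    if utterance.lower() in LOCAL_GRAMMAR:
        return "local"
    # Open-vocabulary dictation needs the larger acoustic and language
    # models hosted in the cloud; latency then depends on the cellular link.
    return "cloud"

print(route("OK Glass"))                                     # local
print(route("Remember to pick up the instruction manual"))   # cloud
```

The design point is that the expensive network round-trip is paid only for the part of the utterance that genuinely needs it.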
WiFi (IEEE 802.11) might seem a viable alternative to commercial cellular networks, but until 2000 open hotspots were rare. Wearable computers in the late 1990s
often used WiFi, but they required adapters the size of a small mobile
phone that drew significant power. Today, a part of a single chip can provide this
service.
On-body networking has also been a challenge. Bluetooth (IEEE 802.15) was
originally intended as a replacement for RS232 connections on desktop PCs, not as a
body network. The standard was not designed with power as a foremost concern, and
even basic implementations were unstable until 2001. Only recently, with the widespread adoption of Bluetooth Low Energy by the major mobile phone manufacturers,
have wearable devices really had an appropriate body-centered wireless network.
Fundamental issues still remain. Both WiFi and Bluetooth use 2.4 GHz radio, which
is blocked by water and the human body. Thus, a sensor mounted in a shoe to monitor footfalls might have difficulty maintaining connection to an earbud that provides
information as to a runners performance.
Most positioning systems also involve networks. For example, the location-aware
Active Badge system made by Olivetti Research Laboratory in 1992 used a network
of infrared receivers to detect transmissions from a badge to locate a wearer and to
unlock doors as the user approached them. When the user was walking through the
lab, the system could also re-route phone calls to the nearest phone (Want 2010).
Similarly, the Global Positioning System uses a network of satellites to provide
precisely synchronized radio transmissions that a body-worn receiver can use to
determine its position on the surface of the planet. Today, GPS is probably one of
the most commonly used technologies for on-body devices. It is hard to imagine
life without it, but before 2000, GPS was accurate to within 100 m due to the U.S.
military intentionally degrading the signal with Selective Availability. Turn-by-turn
directions were impossible. Today, civilian accuracy has a median open, outdoor
accuracy of 10 m (Varshavsky and Patel 2010). Modern GPS units can even maintain
connection and tracking through wooden roofs.
2.2 POWER AND HEAT
normal 18-month consumer product development cycle, the battery should be specified first as it will often be the most constraining factor on the product's industrial
design and will drive the selection of other components.
One of those components is the DC–DC power converter. A typical converter
might accept between 3.4 and 4.2 V from a nominal 3.6 V lithium battery and
produce several constant voltages for various components. One improvement in
mobile consumer electronics that often goes underappreciated is the efficiency of
DC–DC power converters. Before 2000, just the DC–DC converter for Google Glass
could mass 30 g (Glass itself is 45 g), and the device might lose 30% of its power as
heat. Today, switching DC–DC converters are often more than 95% efficient and mass
just a few grams.
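The heat savings are easy to estimate. As a back-of-the-envelope sketch with illustrative numbers (the helper function is my own), compare the waste heat of a 70%-efficient converter, the pre-2000 case above, with a 95%-efficient modern one driving a 1 W load:

```python
# Illustrative arithmetic: heat dissipated by a DC-DC converter
# at two efficiency levels for the same 1 W output load.
def converter_heat(load_w: float, efficiency: float) -> float:
    """Watts lost as heat for a given output load and converter efficiency."""
    input_w = load_w / efficiency     # power drawn from the battery
    return input_w - load_w           # the difference becomes heat

print(round(converter_heat(1.0, 0.70), 3))  # ~0.429 W wasted (pre-2000 class)
print(round(converter_heat(1.0, 0.95), 3))  # ~0.053 W wasted (modern switching)
```

Roughly an eight-fold reduction in converter heat for the same delivered power, which is why the efficiency gain matters so much for small, skin-contact devices.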
Due to this efficiency improvement, there is a corresponding reduction in heat
production. Heat often limits how small a mobile device can be. A wearable device is
often in contact with a users skin, and it must have enough surface area and ventilation
to cool, or it will have to throttle its performance considerably to stay at a comfortable
temperature for the user (Starner and Maguire 1999). This tension between performance and physical size can be quite frustrating to designers of wearable devices.
Users often desire small jewelry-like devices to wear but are also attracted to power-hungry services like creating augmented reality overlays with registered graphics or
transmitting video remotely. Yet in consumer products, fashion is the key. Unless the
consumer is willing to put on the device, it does not matter what benefits it offers, and
physical size and form are major components of the desirability of a device.
In practice, the design of a wearable device is often iterative. Given a battery
size, an industrial designer creates a fashionable package. That package should be
optimized in part for thermal dissipation given its expected use. Will the device
have the ability to perform the expected services and not become uncomfortable
to wear? If not, can the package be made larger to spread the heat, lowering
the temperature at the surface? Or can lower-heat alternatives be found for the
electronics? Unfortunately, many industrial design tools do not model heat, which
tends to require highly specialized software. Thus, the iteration cycle between fashion and mechanical engineering constraints can be slow.
One bright spot in designing wearable computers is the considerable effort that
has been invested in smartphone CPUs and the concomitant power benefits. Modern
embedded processors with dynamic voltage scaling can produce levels of computing power equivalent to a late-1980s supercomputer in one instant and then, in the
next moment, can switch to a maintenance mode which draws milliwatts of power
while waiting for user input. Designing system and user software carefully for these
CPUs can have significant benefits. Slower computation over a longer period can
use significantly less power than finishing the same task at a higher speed and then
resting. This slow-and-steady technique has cascading benefits: power converters are
generally more efficient at lower currents, and lithium-ion batteries last longer with
a steady discharge than with bursty uses of power.
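A rough model shows why slow-and-steady wins. Under dynamic voltage scaling, supply voltage scales roughly with clock frequency, so dynamic power grows roughly as the cube of frequency, and the energy for a fixed amount of work grows as its square. The numbers below are illustrative, not from any particular CPU datasheet, and the model deliberately ignores static leakage and idle power, which shift the balance in practice:

```python
# Sketch of the dynamic-voltage-scaling argument: power ~ f * V^2 and
# V scales roughly with f, so active power P ~ k * f**3. Finishing a
# fixed workload W at frequency f takes t = W / f, so the active energy
# E = P * t ~ k * f**2 -- halving the clock quarters the energy.
def active_energy(work_cycles: float, freq_hz: float, k: float = 1e-27) -> float:
    power_w = k * freq_hz**3          # simplified DVS power model
    time_s = work_cycles / freq_hz    # slower clock, longer run
    return power_w * time_s

fast = active_energy(1e9, 1e9)        # race to idle at 1 GHz
slow = active_energy(1e9, 0.5e9)      # same work at 500 MHz
print(slow / fast)                    # 0.25: one quarter of the energy
```

In a real device the comparison also has to include idle draw while waiting, but as the text notes, the slower schedule gains further from converter efficiency at low currents and gentler battery discharge.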
Similarly, system software can exploit knowledge about its networking to help
flatten the battery load.
Wireless networking requires significant power when the signal is weak. For
non-crucial tasks, waiting for a better signal can save power and heat. Designing
Wearable Computing
17
maintenance and background tasks (e.g., caching email and social networking feeds)
to be thermally aware allows more headroom for on-demand interactive tasks. If the
wearable is thought of as a leaky cup, and heat as water filling it, then one goal is
to keep the cup as empty as possible at any given time so that when a power-hungry
task is required, we have as much space as possible to buffer the heat produced and
not overflow the cup.
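A minimal sketch of this leaky-cup scheduling policy, with arbitrary units and thresholds of my own choosing:

```python
# The "leaky cup" heat model described above: heat fills the cup, the
# package leaks it away at a fixed rate, and the scheduler runs deferrable
# background work only when enough headroom remains for interactive bursts.
CAPACITY = 10.0   # units of heat the package can buffer comfortably
LEAK = 1.0        # units dissipated per tick (surface area, ventilation)
HEADROOM = 6.0    # space kept free for on-demand, power-hungry tasks

def step(level: float, background_heat: float) -> tuple[float, bool]:
    """Advance one tick; run the background task only if headroom allows."""
    ran = level + background_heat <= CAPACITY - HEADROOM
    if ran:
        level += background_heat
    return max(0.0, level - LEAK), ran

level, log = 0.0, []
for _ in range(5):
    level, ran = step(level, 3.0)
    log.append(ran)
print(log)  # background work is deferred whenever the cup is too full
```

Running the loop above interleaves background bursts with forced idle ticks (`[True, False, True, False, False]`), keeping the cup empty enough that an interactive task could always be serviced without overflowing it.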
2.3 MOBILE INPUT
upside-down optical mouse sensor to control a cursor. Perhaps with today's smaller
and lower-power components, a wireless version could be made. More recently,
Zeagler and Starner explored textile interfaces for mobile input (Komar et al. 2009,
Profita et al. 2013), and a plethora of community-funded Bluetooth Human Interface
Devices are being developed, often focusing on rings and bracelets. One device will
not satisfy all needs, and there will be an exciting market for third-party interfaces
for consumer wearable computers.
Traditional windows, icon, menu, pointer (WIMP) interfaces are difficult to use
while on-the-go as they require too much visual and manual attention. Fortunately,
however, smartphones have broken the former monopoly on graphical user interfaces. Swipes, taps, and gestures on phone and tablet touchscreens can be made without much precision, and many of the features of Android and iOS can be accessed
through these cruder gestures. Yet these devices still require a flat piece of glass,
which can be awkward to manipulate while doing other tasks. Instead, researchers
and startups are spending considerable energy creating gestural interfaces using
motion sensors. Besides pointing, these interfaces associate gestures with particular
commands such as silencing a phone or waking up an interface. False triggering,
however, is a challenge in the mobile environment; an interface that keeps triggering
incorrectly throughout the user's workday is annoying at best.
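One common mitigation, sketched here as a hypothetical example rather than any particular product's algorithm, is to require the gesture signal to persist above a threshold for a minimum dwell time before firing, so that brief everyday motions (adjusting glasses, reaching for a coffee cup) are ignored:

```python
# Hedged sketch of a dwell-time filter for motion-based gesture detection.
# `samples` is a normalized gesture-likelihood signal in [0, 1].
def detect(samples, threshold=0.8, min_run=3):
    """Fire only if `min_run` consecutive samples exceed `threshold`."""
    run = 0
    for value in samples:
        run = run + 1 if value > threshold else 0  # reset on any dip
        if run >= min_run:
            return True
    return False

print(detect([0.1, 0.9, 0.2, 0.95, 0.1]))    # False: isolated spikes
print(detect([0.1, 0.9, 0.95, 0.92, 0.1]))   # True: sustained gesture
```

The trade-off is latency: a longer required run suppresses more false triggers but delays recognition of a deliberate gesture by the same number of samples.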
2.4 DISPLAY
While visual displays often get the most attention, auditory and tactile displays are
excellent choices for on-the-go users. Almost all mobile phones have a simple vibration motor to alert the user to an incoming call. Unfortunately, a phone vibrating in a
pants pocket or purse can be hard to perceive while walking. In the future, I expect
the closer contact with the skin made available by smartwatches to enable more reliable and expressive tactile interfaces than a simple on/off vibration motor.
Audio displays are another good choice for on-the-go interaction. Smartphones
and mobile music players are almost always shipped with earbuds included, but there
is much room for innovation. Bone conduction, such as is used with Google Glass
and by the military and professional scuba divers, allows the wearer to hear notifications from the computer without blocking the ear canals. Ambient audio interfaces
(Sawhney and Schmandt 2000) allow the wearer to monitor information sources,
like the volume of stock market trading, for sudden changes without devoting much
attention to the process. Rendering audio in 3D can help the user monitor several
ambient information sources at once or can improve the sense of participant presence during conference calls.
HMDs can range from devices meant to immerse the user in a synthetic reality to
a device with a few lights to provide feedback regarding the wearers performance
while biking. HMDs can be created using lasers, scanning mirrors, holographic
optics, LCDs, CRTs, and many other types of technologies. For any given HMD,
design trade-offs are made between size, weight, power, brightness and contrast,
transparency, resolution, color, eyebox (the 3D region in which the eye can be placed
and still see the entire display in focus), focus, and many other factors. The intended
purpose of the HMD often forces very different form factors and interactions.
For the purpose of discussion, I've clustered these into five categories: virtual reality,
portable video viewers, industrial wearable systems, academic/maker wearables for
everyday use, and consumer devices. See Kress et al. (2014) for a more technical
discussion of typical optics of these types of displays.
FIGURE 2.1 Virtual reality HMDs. (a) Virtual Research's Flight Helmet (1991, $6000).
(b) Nintendo Virtual Boy video game console (1995, $180). (c) Virtual i-O i-glasses! Personal
3D viewer head-mounted display (1995, $395). (d) Oculus Rift DK1 (2013, $300). (Images
courtesy of Tavenner Hall.)
the size. However, with today's lighter weight panels and electronics, the Oculus
Rift Developer Kit 1 slightly surpasses the original Flight Helmet's field of view and
has 640 × 480 pixel resolution per eye while weighing 379 g. The biggest difference
between 1991 and today, though, is the price: the Rift DK1 is only $300, whereas the
Flight Helmet, adjusted for inflation, would be the equivalent of over $10,000 today.
The 1995 Nintendo Virtual Boy game console is an interesting contrast to the
Flight Helmet. It cost $180, and with over a million devices sold, it ranks among
the largest-selling HMDs. The Virtual Boy introduced many consumers to immersive gameplay. It is portable and includes the full computing system in the headset
(the wired controller includes the battery pack for the device). As a table-top head
display, the Virtual Boy avoids the problem of too much weight on the head, but it
has no possibility of head tracking or the freedom of motion available with most
VR headsets. It uses Reflection Technology's scanning, mirror-style, monochromatic
display in which a column of 224 LEDs is scanned across the eye with an oscillating
mirror as the LEDs flash on and off, creating an apparent 384 × 224 pixel resolution
display with persistence of vision. Unlike many consumer VR devices, the Virtual
Boy provides adjustments for focus and inter-eye distance. Still, some users quickly
complain of simulation sickness issues.
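The scanning arithmetic gives a sense of the timing involved. Assuming a nominal 50 Hz refresh for illustration (the actual Virtual Boy timing differs slightly), the 224-LED column must be re-flashed once for each of the 384 apparent columns on every sweep:

```python
# Rough arithmetic for a scanned-column display: how quickly the LED column
# must be updated so that 384 flashed columns fuse into one image through
# persistence of vision. The 50 Hz figure is an assumed round number.
refresh_hz = 50
columns = 384

column_rate = refresh_hz * columns        # columns flashed per second
print(column_rate)                        # 19200
print(round(1e6 / column_rate, 1))        # ~52.1 microseconds per column
```

In practice the usable window is tighter still, since the mirror's oscillation leaves only part of each cycle for the active sweep.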
Minimizing head weight and simulation sickness continues to be a major concern
with modern VR HMDs. However, power and network are rarely a concern with these
devices, since they are mostly for stationary use and attach to desktop or gaming
systems. The user controls the experience through head tracking and instrumented
gloves as well as standard desktop interfaces such as keyboards and joysticks. While
these VR HMDs are not wearables by my definition, they are examples of early
major efforts in industrial and consumer devices and share many features with the
next class of device, mobile video viewers.
FIGURE 2.2 Portable video viewers first concentrated on interfacing with portable DVD
players, then flash-based media players like the video iPod, and most recently started integrating enough internal memory to store movies directly. (a) Sony Glasstron PLM-A35 (2000,
$499). (b) Eyetop Centra DVD bundle (2004, $599). (c) MyVu Personal Viewer (2006, $270).
(d) Vuzix iWear (2008, $250). (e) Vuzix Wrap 230 (2010, $170). (f) Epson Moverio BT-100
(2012, $700). (Images courtesy of Tavenner Hall.)
The Moverio BT-100 (Figure 2.2) is especially interesting as it sits astride three
different classes of device: portable video viewer, industrial wearable, and consumer
wearable. It is self-contained, two-eyed, 2D or 3D, and see-through and can run standard Android applications. It has WiFi and a removable micro-SDHC for loading
movies and other content. Its battery and trackpad controller are in a wired pendant,
giving it ease of control and good battery life. Unfortunately, the HMD itself is a
bit bulky and the nose weight is too high, both problems the company is trying to
address with the new BT-200 model.
Unlike the modern Moverio, many older devices do not attempt 3D viewing, as
simulator sickness was a potential issue for some users and 3D movies were uncommon until the late 2000s. Instead, these displays play the same image on both eyes,
which can still provide a high-quality experience. Unfortunately, video viewers suffer
a certain apathy from consumers. Carrying the headset in addition to a smartphone
or digital video player is a burden, and most consumers prefer watching movies
on their pocket media players and mobile phones instead of carrying the extra bulk
of a video viewer. An argument could be made that a more immersive system, like an
Oculus Rift, would provide a higher quality experience that consumers would prefer,
but such a wide field of view system is even more awkward to transport. Studies on
mobile video viewing show diminishing returns in perception of quality above 320 ×
240 pixel resolution (Weaver et al. 2010), which suggests that once video quality is
good enough, the perceived value of the video system will be more determined by
other factors such as convenience, ease-of-use, and price.
FIGURE 2.3 Wearable systems designed for industrial, medical, and military applications.
(a) Xybernaut MA-IV computer (1999, $7500). (b) Triplett VisualEYEzer 3250 multimeter
(2000, $500). (c) Xybernaut MA-V computer (2001, $5000). (d) Xybernaut/Hitachi VII/
POMA/WIA computer (2002, $1500). (e) MicroOptical SV-6 display (2003, $1995). (f) Vuzix
Tac-Eye LT head-up display (2010, $3000). (Images courtesy of Tavenner Hall.)
situations. In the operating room, anesthesiologists use HMDs in a similar way. The
HMD overlays vital statistics on the doctors visual field while monitoring the patient
(Liu et al. 2009). Current practice often requires anesthesiologists to divert their gaze
to monitors elsewhere in the room, which reduces the speed at which dangerous situations are detected and corrected.
With more case studies showing the advantages of HMDs in the workplace,
industry has shown a steady interest in the technology. From the mid-1990s to after
2000, companies such as FlexiPC and Xybernaut provided a general-purpose line
of systems for sale. See Figure 2.3 for the evolution of Xybernaut's line. Meanwhile,
specialty display companies like MicroOptical and Vuzix (Figure 2.3) made displays
designed for industrial purposes but encouraged others to integrate them into systems
for industry. User input to a general purpose industrial system might be in the form
of small vocabulary, isolated-word speech recognition; a portable trackball; a dial;
or a trackpad mounted on the side of the main computer. Wireless networking was
often by 802.11 PCMCIA cards. CDPD, a digital standard implemented on top of
analog AMPS cellular service, was used when the wearer needed to work outside of
the corporate environment. Most on-body components were connected via wires, as
wireless Bluetooth implementations were often unstable or non-existent. Industrial
customers often insisted on Microsoft Windows for compatibility with their other
systems, which dictated many difficult design choices. Windows was not optimized
for mobile use, and x86 processors were particularly bad at power efficiency. Thus,
wearables had to be large to have enough battery life and to dissipate enough heat
during use. The default Windows WIMP user interface required significant hand-eye
coordination to use, which caused wearers to stop what they were doing and focus on
the virtual interface before continuing their task in the physical world. After smartphones and tablets introduced popular, lighter-weight operating systems and user
interfaces designed for grosser gesture-based interactions, many corporate customers began to consider operating systems other than Windows. The popularization of
cloud computing also helped break the Windows monopoly, as corporate customers
considered wearables as thin client interfaces to data stored in the wireless network.
Today, lightweight, self-contained Android-based HMDs like Google Glass,
Vuzix M100, and Optinvent ORA are ideal for manufacturing tasks such as order
picking and quality control, and companies like APX-Labs are adapting these devices
to the traditional wearable industrial tasks of repair, inspection, and maintenance.
Yet many opportunities still exist for improvements; interfaces are evolving quickly,
but mobile input is still a fundamental challenge. Switching to a real-time operating system could help with better battery life, user experience, weight, cost, system
complexity, and the number of parts required to make a full machine. One device is
not suitable for all tasks, and I foresee an array of specialized devices in the future.
FIGURE 2.4 Some wearable computers designed by academics and makers focused on
creating interfaces that could be used as part of daily life. (a) Herbert 1, designed by Greg
Priest-Dorman in 1994. (b) Lizzy wearable computer, designed by Thad Starner in 1995
(original design 1993). (c) MIThril, designed by Rich DeVaul in 2000. (d) CharmIT, designed
as a commercial, open-hardware wearable computing kit for the community by Charmed,
Inc. in 2000. (Images courtesy of Tavenner Hall.)
lighter-weight interfaces and operating systems, battery life tended to be better than
that of their industrial counterparts. Networks included analog dial-up over cellular, amateur
radio, CDPD, and WiFi as they became available. The CharmIT, Lizzy, and Herbert
1 concentrated the electronics into a centralized package, but the MIThril and
Herbert 3 (not shown) distributed the electronics in a vest to create a more balanced
package for wearing.
Displays were mostly one-eyed and opaque, depending on the illusion in the
human visual system by which vision is shared between the two eyes. These displays appear see-through to the user because the image from the occluded eye and
the image of the physical world from the non-occluded eye are merged to create a
perception of both. In general, opaque displays provide better contrast and brightness than transparent displays in daylight environments. The opaque displays might
be mounted up and away from the main line of sight or mounted directly in front
of the eye. Reflection Technology's Private Eye (Figure 2.4b) and MicroOptical's
displays (Figure 2.4d) were popular choices due to their relatively low power and
good sharpness for reading text. Several of the everyday users of these homebrew
machines from the 1990s would later join the Google Glass team and help inform
the development of that project.
FIGURE 2.5 As technology improves, consumer wearable devices continue to gain acceptance. (a) FitSense heart band, shoe sensor, and wristwatch display (2000, $200). (b) Fitbit
One (2012, $100). (c) Recon MOD Live HMD and watch band controller for skiing (2011,
$400). (d) 2012 Ibex Google Glass prototype. Released Glass Explorer edition (2014, $1500).
(Images courtesy of Tavenner Hall.)
its different components as well as a desktop or laptop. This choice was necessary
because of battery life and the lack of stability of wireless standards-based interfaces
at the time, and it meant that mobile phones could not interface with the device. Now
that Bluetooth LE is becoming common, an increasing number of devices, including
the Recon MOD Live and Google Glass (Figure 2.5), will leverage off-body digital
networks by piggybacking on the connection provided by a smartphone.
Both consumer wristwatches, such as the FS-1, and HMDs, such as the Recon MOD
and Google Glass, can provide information to the wearer while on the go. Because
these displays are fast to access, they reduce the time between when the user first has
the intention to check some information and the action to do so. Whereas mobile
phones might take 23 s to access (to physically retrieve, unlock, and navigate to the
appropriate application), wristwatches and HMDs can shorten that delay to only a
couple of seconds (Ashbrook et al. 2008). This reduction in time from intention to
action allows the user to glance at the display, much like the speedometer in a car's
dashboard, and get useful information while performing other tasks.
An HMD has several advantages over a wristwatch, one of which is that it can be
actually hands-free. By definition, a wristwatch requires at least one arm to check
the display and often another hand to manipulate the interface. However, such manual control is easy for the user to learn, is precise, and can be subtle. HMDs are also
mounted closer to the wearer's primary senses of sight and hearing. This location
provides a unique first-person view of the world, matching the user's perspective.
One use, of course, is pairing the HMD with a camera so that the user can capture
what he sees while on-the-go. Being mounted on the head can also allow HMDbased systems to sense many signals unavailable to a wrist interface, including head
motion, eye blinks, eye movement, and even brain signals in ideal circumstances.
On the other hand, a wrist-mounted system can sense the users hand motions and
may even be able to distinguish different types of actions and objects by their sounds
(Ward etal. 2006).
The Recon MOD Live takes advantage of both approaches, pairing a wrist-mounted controller with an opaque HMD and Android computer mounted in a compatible pair of goggles. The system is designed for use while skiing, providing information such as location, speed, descent, and jump airtime. With the HMD, status information can be presented in a head-up manner with little to no control required of the user. The information can be shared with others via a Bluetooth connection to a smartphone. When the user has more attention (and hands) to spare, he can use the wrist interface to select and scroll through text messages, select music to play, or interact with apps.
Google Glass, another Android-based wearable HMD, uses head motion, speech, and a multi-touch trackpad on one earpiece for its input. Networking is via 802.11 WiFi or tethering to the user's phone over Bluetooth. The display is transparent and mounted high. It is easy to ignore and is designed for short microinteractions lasting a few seconds. This focus on microinteractions helps preserve battery life. Common uses include texting, email, weather, clock, turn-by-turn directions, stock quotes, calendar, traffic, remembering one's parking location, pictures, videos (10 s in length by default), and suggestions for restaurants, events, tourist spots, and photo spots. Glass's interface is designed to be used throughout the day and while on the go. For example, if the user is walking and a text arrives, Glass alerts the user with a
sound. If the user ignores the alert, nothing happens. However, if the user tilts his head up, the screen lights up to show the text. The user can read the message and dismiss it with a nudge of his head upward. Alternatively, the user can say "OK Glass, reply" and dictate a response. Because Glass displays a limited amount of text on the screen at once, interactions are short or broken into multiple small interactions. Ideally, such on-the-go interactions should be around four seconds or less (Oulasvirta et al. 2005) to help keep the user focused on the physical world.
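The alert-glance-dismiss flow just described can be sketched as a small state machine. The following is an illustrative reconstruction only, not Glass's actual implementation; the state names, event handlers, and "OK Glass, reply" trigger handling are invented for the example.

```python
# Illustrative sketch of a Glass-style notification microinteraction.
# NOT Glass's actual code; states and event names are assumptions.

IDLE, ALERTED, SHOWING = "idle", "alerted", "showing"

class GlanceNotifier:
    """Models: audible alert -> optional head tilt to view -> head nudge
    to dismiss, or voice dictation to reply. Ignoring an alert leaves
    the display dark, keeping the interaction optional and brief."""

    def __init__(self):
        self.state = IDLE
        self.log = []

    def on_text_arrived(self):
        self.log.append("chime")          # audible alert only; screen stays off
        self.state = ALERTED

    def on_head_tilt_up(self):
        if self.state == ALERTED:
            self.log.append("show text")  # glanceable display lights up
            self.state = SHOWING

    def on_head_nudge_up(self):
        if self.state == SHOWING:
            self.log.append("dismiss")    # quick nudge returns to idle
            self.state = IDLE

    def on_voice(self, phrase):
        if self.state == SHOWING and phrase.startswith("ok glass, reply"):
            self.log.append("dictate reply")
            self.state = IDLE
```

Note how every path back to `IDLE` is one gesture or one utterance long, which is what keeps each interaction within the few-second budget the text describes.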
Glass is also designed to interfere as little as possible with the user's senses. Not only is the display mounted high enough to keep both pupils unobstructed for full eye contact while the user is conversing with another person, but sound is rendered by a bone conduction transducer, which sends sound through the user's head directly to the cochlea. The ears are kept clear so that the user maintains normal, unobstructed, binaural hearing.
Both the Recon MOD Live and Google Glass are monocular systems with relatively small fields of view. This design choice minimizes size and weight, in particular the weight supported by the nose. Comfort is more important than features when designing something intended to be worn for an extended period of time, and current large-field-of-view displays burden the nose and face too much.
REFERENCES
Ashbrook, D., J. Clawson, K. Lyons, T. Starner, and N. Patel. Quickdraw: The impact of mobility and on-body placement on device access time. In: ACM Conference on Human Factors in Computing Systems (CHI), April 2008, pp. 219–222, Florence, Italy.
Eliason, F. A wearable manual called red. New York Times, March 29, 1992, 7.
Gibson, J. The theory of affordances. In: Perceiving, Acting, and Knowing, R. Shaw, J. Bransford (eds.). Erlbaum: Hillsdale, NJ, 1977, pp. 67–82.
Guo, A., S. Raghu, X. Xie, S. Ismail, X. Luo, J. Simoneau, S. Gilliland, H. Baumann, C. Southern, and T. Starner. A comparison of order picking assisted by head-up display (HUD), cart-mounted display (CMD), light, and paper pick list. In: IEEE ISWC, Seattle, WA, September 2014, pp. 71–78.
Komor, N., S. Gilliland, J. Clawson, M. Bhardwaj, M. Garg, C. Zeagler, and T. Starner. Is it gropable? Assessing the impact of mobility on textile interfaces. In: IEEE ISWC, Linz, Austria, September 2009, pp. 71–74.
Kress, B., E. Saeedi, and V. Brac-de-la-Perriere. The segmentation of the HMD market: Optics for smart glasses, smart eyewear, AR and VR headsets. In: Proceedings of the SPIE 9202, Photonics Applications for Aviation, Aerospace, Commercial, and Harsh Environments V, September 5, 2014, p. 92020D, San Diego, CA.
Liu, D., S. Jenkins, and P. Sanderson. Clinical implementation of a head-mounted display of patient vital signs. In: IEEE ISWC, Linz, Austria, September 2009, pp. 47–54.
Lyons, K., T. Starner, and B. Gain. Experimental evaluations of the Twiddler one-handed chording mobile keyboard. HCI Journal 21(4), 2006, 343–392.
Oulasvirta, A., S. Tamminen, V. Roto, and J. Kuorelahti. Interaction in 4-second bursts: The fragmented nature of attentional resources in mobile HCI. In: ACM CHI, Portland, Oregon, 2005, pp. 919–928.
Profita, H., J. Clawson, S. Gilliland, C. Zeagler, T. Starner, J. Budd, and E. Do. Don't mind me touching my wrist: A case study of interacting with on-body technology in public. In: IEEE ISWC, Zurich, Switzerland, 2013, pp. 89–96.
Sager, I. Before iPhone and Android came Simon, the first smartphone. Bloomberg Businessweek, June 29, 2012, http://www.bloomberg.com/bw/articles/2012-06-29/before-iphone-and-android-came-simon-the-first-smartphone (Accessed March 17, 2015).
Sawhney, N. and C. Schmandt. Nomadic radio: Speech and audio interaction for contextual messaging in nomadic environments. ACM Transactions on Computer–Human Interaction (TOCHI) 7(3), 2000, 353–383.
Siewiorek, D., A. Smailagic, and T. Starner. Application Design for Wearable Computing, Synthesis Lecture Series Monograph. Morgan & Claypool, San Rafael, CA, 2008.
Starner, T. Powerful change part 1: Batteries and possible alternatives for the mobile market. IEEE Pervasive Computing 2(4), 2003, 86–88.
Starner, T. and Y. Maguire. Heat dissipation in wearable computers aided by thermal coupling with the user. ACM Journal on Mobile Networks and Applications (MONET), Special Issue on Wearable Computers 4(1), 1999, 3–13.
Thomas, B., K. Grimmer, J. Zucco, and S. Milanese. Where does the mouse go? An investigation into the placement of a body-attached touchpad mouse for wearable computers. Personal and Ubiquitous Computing 6, 2002, 97–112.
Varshavsky, A. and S. Patel. Location in ubiquitous computing. In: Pervasive Computing, J. Krumm (ed.). CRC Press: Boca Raton, FL, 2010, pp. 285–319.
Want, R. An introduction to ubiquitous computing. In: Pervasive Computing, J. Krumm (ed.). CRC Press: Boca Raton, FL, 2010, pp. 1–35.
Ward, J., P. Lukowicz, G. Troester, and T. Starner. Activity recognition of assembly tasks using body-worn microphones and accelerometers. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI) 28(10), 2006, 1553–1567.
Weaver, K., T. Starner, and H. Hamilton. An evaluation of video intelligibility for novice American Sign Language learners on a mobile device. In: ACM ASSETS, Orlando, FL, October 2010, pp. 107–114.
Zucco, J., B. Thomas, K. Grimmer-Somers, and A. Cockburn. A comparison of menu configurations and pointing devices for use with wearable computers while mobile and stationary. In: IEEE ISWC, Linz, Austria, 2009, pp. 63–70.
of Wearable Technologies
Patricia Flanagan, Despina Papadopoulos,
and Georgina Voss
CONTENTS
3.1 Introduction
3.2 Future Scenarios: Ethical and Speculative Implications of How Our Embodied Materiality Is Affected by Emerging Technologies
3.2.1 Garment as Anchor
3.2.2 Start with Value
3.2.3 Think about the System
3.2.4 Requirements and Specifications Are for Humans Too
3.2.5 Prototypes and Iterative Design
3.2.6 Experimenting with the Future, Questioning the Present
3.2.7 Coloring
3.2.8 Life As We Know It: The Qualified Self
3.3 Self and the Social Politic of Wearable Technologies
3.3.1 Personal Technologies, Regional Innovation
3.3.2 Quantifying the Intended User
3.3.3 Tracking in the Factories
3.3.4 Bodies at Work
3.4 Synaptic Sculpture: Vibrant Materiality and the Interconnected Body
3.4.1 Sperm, Stars, and Human-Centric Perception
3.4.2 Inversion of the Design Process
3.4.3 Bridging Materiality and Information
3.4.4 Merger of the Body and Technology
3.4.5 Conclusion: Synthesis and Synaptics
References
3.1 INTRODUCTION
This chapter is founded on the premise that current wearable technology design practices represent a reductionist view of human capacity. The democratization of technology into work, play, home, and mobile social networks in recent years has seen traditional human–computer interaction (HCI) design methodology broadened through the integration of other methodologies and knowledge from the humanities, such as social science, anthropology, and ethnography. The field of HCI is inherently interdisciplinary, and its history is one of the inevitable disciplinary multiculturalisms spawned by the expansive impact of technological growth.
What questions should we be asking to engage a more critical design perspective? This chapter extends traditional functionalist approaches to design to engage cultural, experience-based, and techno-futurist approaches. Wearable technologies are therefore discussed in terms of their critical, political, ethical, and speculative potential, and case studies are presented to illustrate and exemplify the ideas promulgated.
The chapter is organized into three sections. The first section proposes that the role of the designer includes a cultural approach to designing future scenarios, one that considers the ethical and speculative implications of how our embodied materiality is affected by emerging technologies. What is the relationship of the self to the proliferating wearable technologies? How is our sense of self changing as new technologies mediate the space between our experience of self and the world? We develop a methodology that asks designers and technologists to build future scenarios and envision how our embodied materiality is affected by emerging technologies. Using a philosophical framework, we explore design and its implications for the relationship of the self to the self and to social relationships. We then investigate how technologies such as Google Glass and Quantified Self applications inform our relationship to our self and redefine our social interactions.
The second section discusses the self and the social politic of wearable technologies from macro to micro perspectives, considering the wider supply and production chains and regulatory systems whose existence shapes the production and meaning of wearables: both their material form and design, and the movement of gathered data from the body into widely dispersed networks of power. Moving from the micro (technology/body) to the macro (systems of production), we consider where control lies across these networks, at what unit of analysis, and what their impact could be on the wider world as they are dispersed.
The final section adopts a techno-futurist approach, proposing synaptic sculpture as a process for creative design that engages vibrant materiality and the interconnected body. The section describes the emergence of a new paradigm in terms of our augmented perspective: our perception of scale expanding our awareness and sensitivity across macro- and nanolevels. These new spheres of awareness become our normative environment, one with an amplified awareness of the instability, fungibility, and interconnectedness of things. This perspective relocates the space of design to the interface as mediator of experience, rather than the design of objects or products. We propose the need to develop a connoisseurship of the somesthetic qualities surrounding the design of wearables. This subverts the traditional fashion design methodology away from trickle-down theory toward one that can enhance the relationship between designer and user, who become coproducers, and that connects materiality to anthropology and the lived experience of the individual.
Our clothing has always expressed our relationship to social structures and to the ways we perceive others and want to be perceived by them. It also reflects ideological relationships not only to the means of production (the industrial revolution, after all, ultimately presaged ready-to-wear and a democratization of access to fashion) but also to morality. When the zipper was introduced to male pants in 1901, critics at the time considered it a sign of moral decline. Similarly, corsets, high heels, and casual Fridays all exemplify our collective attitude toward capability, physicality, and the way we engage with the world and others.
Today, as we develop a new range of wearable devices, it would be instructive to use a framework that explores design and its implications for the relationship of the self to the self and to social relationships. This same framework can be used to investigate how technologies like Google Glass and Quantified Self applications inform our relationship to our self and redefine our social interactions.
In the past 20 years we have seen increased development in the realm of wearable technologies. From the early MIT experiments with a cyborgian self (spearheaded by Steve Mann and Thad Starner) to today's Google Glass and Quantified Self applications, the focus has been on a singular vision of what it means to be human.
year after it released a version of the device to developers, published a set of social
guidelines, a social etiquette of sorts. The list includes advice such as:
Ask for permission. Standing alone in the corner of a room staring at people while recording them through Glass is not going to win you any friends.
Glass-out. Glass was built for short bursts of information and interactions that allow you to quickly get back to doing the other things you love. If you find yourself staring off into the prism for long periods of time you're probably looking pretty weird to the people around you. So don't read War and Peace on Glass. Things like that are better done on bigger screens.
Don't be creepy or rude (aka, a "Glasshole"). Respect others and if they have questions about Glass don't get snappy. Be polite and explain what Glass does and remember, a quick demo can go a long way. In places where cell phone cameras aren't allowed, the same rules will apply to Glass. If you're asked to turn your phone off, turn Glass off as well. Breaking the rules or being rude will not get businesses excited about Glass and will ruin it for other Explorers.
Google Glass, by releasing its product at an early stage, has been able to generate a vigorous discourse on privacy, socialization, and the way social cues can be built into interaction design. Could Google Glass be designed in such a way as to make the list of dos and don'ts obsolete? Experimenting with scenarios of use and observing users in the street, in cafes, at parties, and at work can yield insightful observations that can be translated into design decisions and reflected in specifications and requirements.
3.2.7 Coloring
Coloring is a hypothetical consumer health product, launched in the year 2046, developed by School of Visual Arts MFA Interaction Design (SVA NYC, 2014) students Matt Brigante, Melody Quintana, Sam Wander, and Amy Wu as part of a Future Wearables class. The project assumes that by the year 2046 significant leaps in psychology and neuroscience research will have taken place, transforming our understanding of mental health. The project also assumes that innovations in materials technology will introduce new possibilities for treatment, such as brain chip implants.
Coloring is imagined as
a skin interface for people who use brain chip implants to track and manage their mental health. It communicates with the user's brain chip to display a real-time visualization of their emotional state, right in the palm of their hand. Emotions are mapped to a 7000-color spectrum. The spectrum is richer and more precise than our verbal emotional vocabulary, empowering people with a new language to understand their feelings. Rather than having to use blunt and unpredictable prescription drugs, users are given the agency to self-medicate when appropriate. They can simply blend harmonizing colors into their Coloring to balance their mood.
Coloring (2014)
The project took as a starting point the work of John Rogers, professor of materials
science and engineering at the University of Illinois at Urbana-Champaign, in
implantable technologies and speculated on future scenarios of use. At the same
time it asks us to consider how our wearable devices can provide us with a new
vocabulary and range for expression and communication. This future scenario can
help explore current opportunities and create a framework for inquiry and extend
what is possible (Figure 3.1).
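One way to make the 7000-color spectrum concrete is to treat it as 7 emotion families times 1000 intensity steps, each mapped to a distinct hue. The sketch below is purely illustrative: the Coloring project does not publish an encoding, and the hue assignments and linear mapping here are invented assumptions.

```python
# Hypothetical sketch of an emotion-to-color mapping like Coloring's:
# 7 emotion families x 1000 intensity steps = a 7000-color spectrum.
# The hue bands and linear layout are invented for illustration.
import colorsys

FAMILIES = ["happiness", "surprise", "disgust", "anger",
            "contempt", "fear", "sadness"]
STEPS_PER_FAMILY = 1000  # 7 * 1000 = 7000 distinct colors

def emotion_to_rgb(family, intensity):
    """Map an emotion family and an intensity in [0, 1) to an RGB triple."""
    band = FAMILIES.index(family)              # which of the 7 families
    step = int(intensity * STEPS_PER_FAMILY)   # position within the family
    index = band * STEPS_PER_FAMILY + step     # 0..6999 over the spectrum
    hue = index / (len(FAMILIES) * STEPS_PER_FAMILY)
    r, g, b = colorsys.hsv_to_rgb(hue, 1.0, 1.0)
    return tuple(round(c * 255) for c in (r, g, b))
```

The point of the exercise is the granularity: a 7000-step palette distinguishes shades of an emotion that a verbal vocabulary of a few dozen emotion words cannot.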
FIGURE 3.1 Coloring by Matt Brigante, Melody Quintana, Sam Wander, and Amy Wu. (The figure shows 7000 colors grouped into 7 families corresponding to the core emotions of Discrete Emotion Theory: Happiness, Surprise, Disgust, Anger, Contempt, Fear, and Sadness. These seven core emotions are biologically determined emotional responses whose expression and recognition are fundamentally the same for all individuals regardless of ethnic or cultural differences.)
provided the insight and reflection that eluded her devices and applications. At the end of her thesis presentation she writes:
Every time we experience these moments the self is shaped. They shape our expectations, our confidence, our expression. They shape who we are. The truth is simple and it is not embedded in a set of data that tells me how many steps I've taken. While data can be useful with specific set goals, my biggest takeaway throughout this journey has been to remember to track my soul first. The self is fascinating; that fascination cannot be quantified.
Her experience and reflections can be used as direct input to create guidelines for developing wearable devices that aim to change behavior and provide insight into the human experience.
Thich Nhat Hanh is a Buddhist monk who was invited by Google (Confino, 2013) to give a series of workshops and provide inspiration to its developers and product managers on how to engage users and develop applications and devices that can yield the insights that evaded the devices and applications Aydin used, and possibly account for the drop-off rate of current wearables. In discussing the goals of these workshops, Thich Nhat Hanh commented:
When they create electronic devices, they can reflect on whether that new product will take people away from themselves, their family, and nature. Instead they can create the kind of devices and software that can help them to go back to themselves, to take care of their feelings. By doing that, they will feel good because they're doing something good for society.
Engaging with the totality of human experience, probing into what creates value, the systems that we inhabit, and the relationships that we create within them are all fundamental to the creation of meaningful and useful wearable devices. We have taken a far too reductionist approach for too long and have been leading product development based on a mechanistic model of what it is to be human.
Comfort, connectedness, engagement, the delight of a soft material against the human skin, and the rituals of dressing and undressing form the grounding framework for creating wearable devices. We stand at the precipice of incredible innovation in materials, sensors, and computational and power technologies. We have the opportunity to create new models of expression, communication, and reflection, and in order to do so, we should adopt a methodology that is grounded in humanistic and ethical principles and critically consider how we want to use these innovations to interact with our communities and ourselves.
both in the materiality of their design and the configuration of their intended use, and in the politics of the data that they gather.
Market researchers predict that wearable computing devices will explode in popularity in coming years, to the extent that they will become the norm (ABI, 2010). The numbers are enormous: by 2018, there are expected to be 485 million annual device shipments, all of which have to be manufactured somewhere. Despite rhetoric of a shrinking world, regional patterns of innovation and industry remain embedded in the earth; certain places are better at doing some things than others (Howells, 1999).
The San Francisco Bay Area in Northern California is home to the Silicon Valley information technology cluster (Saxenian, 1996): after an early history around microprocessors and semiconductors, the area transformed into a hub for software and Internet service companies and plays host to some of the world's largest technology companies. Many of these firms, including Google and Facebook, are now edging into the wearables market, pulling together teams of designers and engineers to haul together the concept and intent around these devices.
Seventeen time zones away, the intent becomes material. China is one of the largest and most rapidly developing economies in the world, expanding the industrial capacity of its high-tech industries to act as the global economy's world factory, answering Western desire for ICT consumer goods (Bound et al., 2013). Many of the current generation of wearables are designed by people in the global North and made by people in the global South. FitBit is the market leader in the wearable activity band market. While the company is based in San Francisco, FitBit locates its manufacturing in China; while the device retails for around U.S. $100, it costs less than one-fifth of that to make (Electronica, 2013). Yet these devices are also designed for users in the global North, with estimates that 61% of the wearable technology market in 2013 was attributable to sports and activity trackers. FitBit was, its founder explained, designed as a quiet and personal device:
From early on we promoted a notion of a more introverted technology that is more about the connection between yourself and your goal, rather than having a third party like an athletics company telling you how fit you should be and what's the proper weight for you.
Amit, G. (2014)
In doing so, the technology falls not only into Western trends around commercialized self-improvement (Maguire, 2008) but also into trajectories laid down by the earlier quantimetric self-tracking movement.
The term quantified self emerged in 2007 to describe the way that people, initially an elite group of Bay Area inhabitants, including editors of WIRED magazine, sought to find answers to cosmic questions (Who are we? What does it mean to be human?) through rational corporeal self-knowledge, giving rise to tools that offered insight into the data found within their own bodies. By this framing, wearables became a way of reducing wider physical (and mental) healthcare systems of infrastructure down to the level of the individual: self-tracking as a form of self-care, reconfiguring the relationship that might otherwise be formed between a patient and a medical professional into one between a user, a piece of rubber, a circuit board, and a software algorithm (while, in the wings, a company sits quietly, waiting to mop the data up).
Research done by the Centre for Creative and Social Technology (CAST) at Goldsmiths, University of London, found that 63% of U.K. and 71% of U.S. respondents thought that wearable technology had improved their health and fitness, with one in three willing to wear a monitor that shared personal data with a healthcare provider (Rackspace, 2013). The business models around the market indicated where the true value of wearables lies: not in the plastic and electronics of the hardware devices themselves but in the fog of data that they extract from the human body. As Chris Bauer, the codirector of CAST, described it:
The rich data created by wearable tech will drive the human cloud of personal data. With this comes countless opportunities to tap into this data; whether it's connecting with third parties to provide more tailored and personalized services or working closer with healthcare institutions to get a better understanding of their patients. We are already seeing wearable technology being used in the private sector with health insurance firms encouraging members to use wearable fitness devices to earn rewards for maintaining a healthier lifestyle.
Bauer (2013)
While the devices themselves are manufactured in their millions, numerous software apps have also crawled into the world to make sense of this data: see, for example, the MapMyFitness tool. Compatible with devices such as the FitBit and Jawbone, it has, as of May 2014, 16 million registered users who log over 200,000 health and fitness activities daily.
For the users of wearable tech in the global North, ethical issues have emerged around privacy: the tipping point between sousveillance and surveillance. Participants in CAST's research cited privacy concerns as the main barrier to adoption. Questions have been raised about whether data can be sold on to third parties; whether it is securely stored; and who, ultimately, owns it (Ng, 2014). These suspicions emerge from the primacy of the idea of control and choice: that the users who choose to use wearable tech as a way to figure out what humans are here for may unknowingly and unwittingly relinquish control of the data it generates; that someone else may be using rational means to see and understand bodies and minds. These are the fears of the intended user, the perfect persona who chooses to explore self-knowledge through the body and who has the leisure time to engage in fitness activities. Control, consent, and choice are key: over half of CAST's respondents felt that wearable technology
helped them feel more in control of their lives. Down across the supply chain, however, choice is abstracted and bodies are intended to be surveilled.
The notion of the quantified self derives from a core concept of agency and sousveillance, in which the motions of the body are willingly recorded by a participant in the body's activity. Yet there is a much longer heritage of using rational metrics to measure the activity of the human body, only by outside agents. In The Principles of Scientific Management, published in 1911, Frederick Taylor described how the productivity of the workforce could be improved by applying the scientific method to labor management. His techniques included time-and-motion studies, in which a worker's series of motions around various tasks (bricklaying, moving pig iron) was timed to determine the most efficient way to perform a job. Here, monitoring is not an autonomous choice made with agency about enlightenment and self-knowledge, but an act imposed on individuals within the power dimensions of the workplace itself. The body is quantified not for self-directed self-improvement, but as a means to wring maximum physical efficiency out of it for an outside body: the boss. The British supermarket chain Tesco equipped its employees with data bands and determined that it thus needed 18% fewer of those same workers (Wilson, 2013).
Wearables in the workplace are becoming more prevalent: CAST reported that 18% of employees now wear some kind of device, and that 6% of employers provide a wearable device for their workers. Innovations in this space include Hitachi's Business Microscope, a lanyard packed with sensors that recognize face, body, and rhythm data between employees, gathering data that can be turned into interaction-based organizational and network diagrams. A host of software solutions supports this surveillance of workplace bodies, such as Cogiscan's Tracking and Route Control, which uses real-time information to track the physical location and quantities of all products on the factory floor and, in doing so, minimizes unnecessary movements of employees (Cogiscan, 2014).
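The step from badge-level sensing to an organizational network diagram is essentially an aggregation of pairwise encounters. The sketch below illustrates that idea only; the event format, function name, and threshold are invented, not Hitachi's or Cogiscan's actual data model.

```python
# Illustrative sketch: turning badge-to-badge encounter detections into
# weighted edges of an interaction network, in the spirit of the
# organizational diagrams described above. Data format is an assumption.
from collections import Counter

def interaction_graph(events, min_meetings=2):
    """events: iterable of (employee_a, employee_b) face-to-face detections.

    Returns weighted, undirected edges as {frozenset({a, b}): count},
    keeping only pairs who met at least min_meetings times (to filter
    out incidental hallway encounters)."""
    counts = Counter(frozenset(pair) for pair in events)
    return {pair: n for pair, n in counts.items() if n >= min_meetings}
```

Using `frozenset` makes the edge undirected, so a detection logged as ("ann", "bo") and one logged as ("bo", "ann") count as the same relationship; a graph-drawing tool could then render the retained edges with widths proportional to their counts.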
As Anna Coote notes, we live in an era of instant communication and mobile technologies with global reach, where people can increasingly work anywhere, and there is no end to what employers can demand (Coote et al., 2014). Yet unlimited work does not necessarily map onto quantified labor; indeed, it is possibly its antithesis. Unsurprisingly, the bodies at work that are the most quantifiable are those engaged in routine manual labor: not the creative, knowledge-intensive work done in the designing and prototyping of wearables by engineers and designers, but repetitive, replicable tasks that are only an inch away from being replaced by automated machines that can mimic the actions of human bodies, but without need for sleep, fuel, or rights (Frey and Osborne, 2013).
Adam Littler's quote, given earlier, was taken from a BBC documentary filmed in the enormous warehouses of the online retailer Amazon, which stock a range of consumer activity trackers including FitBit, Jawbone, and Polar. Littler, an undercover
3.3.4 Bodies at Work
In their piece 75 Watt (2013), the artists Cohen Van Balen collaborated with a choreographer and Chinese factory workers to create a work that reverse-engineers the values of a supply chain by creating a useless physical object; the product of the labor is the dance done by the workers as they assemble the clunky white plastic device. Seventy-five watts is the average output of energy a human expends in a day, a measure that could be tracked by sousveillance through a consumer wearable, on the path to asking questions about the meaning of human life. Yet down along the supply chain, in the factories and the warehouses, the same transformative power of digital hardware around wearable technology answers the question: the human life is capital; the bodies themselves, only actions.
nurse the spermatozoon until it developed enough to enter the world. The invention of the microscope revealed the process of spermatozoa fertilizing ova and changed the role of women profoundly. Our comprehension of the world is mediated by technology and is dependent on our ability to adapt to and make sense of the information the technologies provide. Early star charts depicted animals, figures, and objects in the sky. The images and mythologies that went along with them were used to aid memory and help recall the location of stars in the visible night sky. With the invention of the telescope the cartographer's job changed drastically. Maps became factual documents plotting out the heavens above in ever increasing detail, in line with technological advancement. Human consciousness is altered as new technology enables us to see things differently: for example, when we landed on the moon in 1969, images looking back at the earth were projected into people's living rooms via television, and they enabled us to imagine ourselves as part of a greater whole and to see the earth, rather than as endless and boundless in natural resources, as a delicate intertwined ecosystem of which we are just a small part.
Floating Eye is a wearable work by Hiroo Iwata performed at Ars Electronica in
2000, in which a blimp suspended above the wearer's body supports a camera.
The head of the wearer is encased in a dome, and from the inside they view a panoramic screen projecting what is being filmed. The experience is that of observing
oneself from above; normal vision is superseded and interaction with the environment is estranged. This work predates a perspective that we are becoming accustomed
to, that of navigating space by looking down into the screen of a digital device,
guided by a plan view and prompted by Google Maps or the like. Wearable technologies may at first seem to disorient or give a feeling of estrangement, but as we explore
new ways to understand the world around us, we are profoundly changing the way
we live and interact. The shift in perspective that we are fast approaching involves
both time and scale. The wearable technology that surrounds and permeates our
bodies will mediate this experience and augment our senses. We are witnessing the
emergence of a new paradigm in terms of our augmented perspective: our perception of scale is expanding our awareness and sensitivity across macro- and nanospheres
that we will learn to accommodate and that will ultimately become our normative
environment.
Since the mid-1990s we have lived in environments supported by digitization.
This is long enough to evaluate the theoretical hype of the late 1990s surrounding the
digital and virtual world of the Internet that hypothesized homogenization of culture
and the divorce of information from materiality. The teleological and ocular-centric
faith in technology has deep-seated historical roots. With the invention of photography, journals wrote articles in awe of this new science; it seemed that we had
procured the magical ability to capture moments of life in factual documents of light
on photo-sensitive paper. A zealous appeal made in the comments by Oliver Wendell
Holmes in an article published in 1859 heralds this greatest human triumph over
earthly conditions, the divorce of form and substance: "What is to come of the
stereoscope and the photograph [...] Form is henceforth divorced from matter. In
fact, matter as a visible object is of no great use any longer, except as the mold on
which form is shaped" (Holmes, 1859).
TABLE 3.1
Authors and Concepts That Point to the Growing Prominence of Experience/
Interaction/Interface Design (Including HCI)
Author: John Sweeney (2014)
Past: Computers controlling tools; Machines making machines
Present: Technology-driven design; Need-driven design
Future: Connected intimacy; Tailored ecosystems; Co-evolved possibilities
Sensors perceive data in the environment; they can be based on the detection of
different parameters such as light, heat, humidity, stress, force, movement, and noise
and come in many forms such as microphones, ultrasound detectors, photovoltaic
sheets, stretch sensors, and data gloves. They can be analogue or digital.
Recorders take samples of reality or traces of activity and collect them, in analogue
formats by fixing them onto substrates like tape, film, or photo-paper, and through
numeric coding in digital formats. Recordings can be transformed and altered, and
augment memory.
Actuators can have different mechanisms such as electric, pneumatic, or hydraulic, in order to produce activities such as movement, light, or sound, for example,
a fan, a light, or a buzzer. When combined with materials such as shape memory
alloys, thermochromic inks, or smart textiles, they can appear to embody autonomy
in reaction to changes in conditions.
Transmitters nullify distance; as they have evolved, so has the way we live in the
world. This has been as profound as the effect of the lens on our visual perspective,
from micro to macro understandings of the world. They are interfaces ranging from
the telegraph to television, facsimile, radio, Internet, XBee, etc.; they offer potential
to reconsider time, space, and interaction.
Diffusers are attachments for broadening the spread of a signal into a more even
and regulated flow, for example, devices that spread the light from a source evenly
across a screen. They could be in the form of an electrostatic membrane or a projection screen such as LCD, plasma, or thermal imaging.
Integrators involve the integration of technologies into living organisms, the
mash-up between biology, medicine, tissue engineering, nanotechnology, and artificial life.
Translation of data from one type of input expressed through another form of
output is a natural function within the fungible realm of data; it enables exploration of the traditional boundaries that govern human perception. It is well known
that people with minimal or no function of certain senses become more acute
in the function of others. For example, by producing oral clicking noises, Ben
Underwood is able to echolocate and visualize spaces around him even though he
is blind. Tests showed that when he performed echolocation his visual cortex,
the part of the brain that normally deals with visuals, was stimulated
(McCaffrey, 2014).
Neil Harbisson is an artist with achromatopsia, meaning he cannot see colors. He has legally registered as a cyborg and wears a permanent head-mounted
computer that enables him to hear color by converting light waves to sound waves.
He is a painter and produces artworks based on music. His senses have been augmented and his body adapted to the expanded somesthetic, so that he perceives
more than the natural human visible spectrum, extended to include infrared and ultraviolet
(Harbisson, 2012).
The body's sensual capacities can adapt and accommodate new experiences, and
wearables provide a platform for experimentation. An example is a wearable device
that explores sensory dissonance: Bamboo Whisper translates language from one
wearer into percussive sounds and vibration felt by the wearer of a second device
(Figure 3.2).
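Cross-modal translation of this kind, one sensor modality re-expressed through another output modality, can be illustrated with a minimal sketch. The specific mapping below (ambient light level onto tone frequency) and the function name are hypothetical, chosen only to make the idea concrete; Bamboo Whisper itself maps speech to percussion and vibration.

```python
# Illustrative sketch of cross-modal "translation": one input modality
# (a normalized light-sensor reading) mapped onto an output modality
# (tone frequency). The mapping and names are hypothetical examples.
def light_to_tone_hz(light_level, lo_hz=220.0, hi_hz=880.0):
    """Map a normalized light level in [0, 1] linearly onto a pitch range."""
    level = min(max(light_level, 0.0), 1.0)   # clamp to the valid range
    return lo_hz + level * (hi_hz - lo_hz)

# A dark room maps to the low end, bright light to the high end:
print(light_to_tone_hz(0.0))  # 220.0
print(light_to_tone_hz(1.0))  # 880.0
```

Any monotonic mapping would serve; the design point is only that data, once digitized, is fungible between senses.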
FIGURE 3.2 Bamboo Whisper, Tricia Flanagan, and Raune Frankjaer. (Photo: Tricia
Flanagan, 2012.)
2001, 2008). Flanagan and Vega's research into Humanistic Intelligence produced
Blinklifier, a wearable device that uses eye gestures to communicate with an onboard
computer. By wearing electroplated false eyelashes and conductive eyeliner, bio-data
from blinking communicates directly with the processor without engaging cognitive action. The body's natural gestures are augmented and amplified into a head-mounted light array (Flanagan and Vega, 2012). We innately understand and interpret information from people's eye gestures; by amplifying these everyday gestures,
Blinklifier leverages the expressive capacity of the body (Figure 3.3).
Anticipating the merger of the body and technology, Ray Kurzweil proposed the singularity (Kurzweil, 1998, 2006) as the point in the future when the capacity and
calculation speeds of computers equal those of human neural activity, and our understanding of how the mind works enables us to replicate its function. Kurzweil promulgates artificial intellects superior to human ones, which poses the question: In
the future will we be outsmarted by our smart-clothes? Artificial intellects known as
artilects will conceivably have rights, following the attainment of universal human
rights, and then the rights of animals, landscapes, and trees (Dator, 2008). Nonhuman
entities are already represented in our juridical systems, in the form of corporations,
and artilects could attain rights in a similar manner (Sudia, 2001).
The separation between human and machine intelligence traditionally lies in the
human realm of emotions, thought of as metaphysical. Recent scientific discoveries have
given us insight into emotions such as retaliation, empathy, and love, which can now be
understood within the frame of scientific knowledge.
FIGURE 3.3 Blinklifier, Tricia Flanagan, and Katia Vega. (Photo: Dicky Ma. Tricia
Flanagan, 2012.)
Lower levels of the neurotransmitter serotonin may affect your ability to keep
calm when you think someone is treating you badly and promote your tendency to
retaliate (Crockett, 2008). Mirror neurons have been discovered in the brain, which
produce the same chemical reaction in your body when you witness an experience
happening to another as is being produced by the body you are watching (Keysers,
2009). For example, when someone falls over and hurts himself or herself, you may
instinctively say ouch and actually produce small amounts of the same chemical
reaction in your body as if it happened to you. Empathy could therefore be described
as physiological rather than a purely emotional condition. Biologists are endeavoring to interpret emotional states as biological chains of events. Tests indicate that
higher levels of oxytocin in females and vasopressin in males may foster trust and
pair bonding at a quicker rate. Dopamine-related areas of the brain
are active when mothers look at photos of their offspring or people look at photographs of their lovers. Dopamine is a neurotransmitter that activates the same
circuitry that drugs like nicotine, cocaine, and heroin do to produce euphoria and
addiction. Love therefore can be described as an emergent property of a cocktail of
ancient neuropeptides and neurotransmitters (Young, 2009).
We tend to anthropomorphize robots; our mirror neurons react to their behaviors in a
similar way to how they react to those of human entities (Gazzola et al.,
2007). Can the experience of digitally mediated touch produce physiological chemistry in the recipient? CuteCircuit's Hug Shirt senses the pressure and length of a hug,
the heart rate, and skin temperature of the hugger and sends this data via Bluetooth
to a recipient whose corresponding Hug Shirt actuators provide a simulated hug.
Put simply, the sender hugs their own body and a recipient body feels the experience. Can wearables, designed to actuate physical haptic stimuli on another, induce
chemical emotional effects? What are the potential implications for health, medicine,
and well-being? The interconnected networks that mirror neurons imply, between
human but also nonhuman entities, pose fundamental problems to Ray Kurzweil and
Norbert Wiener's (Wiener, 1989) assumptions that by mechanistic analysis of the
materials of the body, we will ultimately understand and replicate them. Quantum
physics proposes that to understand the mind, we must look outside the body, and
consider the interconnected nature of everything as porous.
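The sense-transmit-actuate data flow that the Hug Shirt description suggests can be sketched minimally as follows. All field names, value ranges, and the actuation stub are illustrative assumptions for this sketch, not CuteCircuit's actual protocol or API.

```python
# Hypothetical sketch of a hug-transmission data flow: sensed parameters
# are bundled into a packet, sent to the recipient (transport omitted),
# and replayed by actuators. Names and units are assumptions.
from dataclasses import dataclass

@dataclass
class HugPacket:
    pressure: float        # normalized hug pressure, 0..1
    duration_s: float      # how long the hug lasted, in seconds
    heart_rate_bpm: int    # hugger's heart rate
    skin_temp_c: float     # hugger's skin temperature

def actuate(packet: HugPacket) -> str:
    """Drive the recipient's actuators from a received packet (stubbed)."""
    strength = round(packet.pressure * 100)
    return f"replaying {packet.duration_s}s hug at {strength}% strength"

sent = HugPacket(pressure=0.7, duration_s=3.0, heart_rate_bpm=72, skin_temp_c=33.5)
print(actuate(sent))  # replaying 3.0s hug at 70% strength
```

The point of the sketch is the asymmetry the text notes: the sender's gesture is reduced to a few sensed parameters, and the "hug" the recipient feels is a reconstruction from that data.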
Textiles of the future merge science and technology. Nobel laureate Alexis Carrel headed
the first tissue culture laboratory, exploring one of the most complex of all materials:
the skin. Future textiles will be designed with highly engineered specifications, like
skin, combining areas that are thicker, thinner, more flexible, or rigid and that have
the ability to adapt to the task or the environment. At SymbioticA lab, Oron Catts has
been growing cultured skins from enzymes to produce kill-free leather, an approach
that tackles ethical and sustainability issues. Stelarc's Ear on Arm (2006-ongoing)
was cultured in the SymbioticA lab. The ear was grown from tissue culture around
a frame and then sutured to Stelarc's forearm. A microphone was then embedded in
the prosthetic, enabling visitors to Stelarc's website to listen to whatever his third ear
hears. In a future iteration of the project, he plans to implant a speaker into his mouth,
so that people can speak to him through transmitters, for example, from his website
or a mobile telephone, and he will hear the sounds inside his head; or, if he opens his
mouth, someone else's voice could speak from within it.
Macro perspectives gained through global information networks, cloud computing, and supercomputers that allow access to information instantaneously have
enabled us to envisage an interconnected worldview. Simultaneously, an amplified
awareness of the instability, fungibility, and interconnectedness of things is emerging
as we acknowledge the vibrancy of the world at a molecular level. The importance
of haptic engagement within communication, incorporating the body's full potential
of senses, is gaining recognition. Nanoperspectives reveal a world with completely
different parameters and support a reconsideration of vitalism, fundamental to string
theory and quantum physics, that questions our current understanding of materiality.
This perspective further locates the space of design in the interface as
mediator of experience, rather than in the design of objects or products.
Bio-data and big-data combine to produce unprecedented detail of personal information, enabling the tailoring of design to personal desires, while at the other end
of the spectrum, human life is subsumed as a widget in the production line. Lines
of control, borders between public and private, are all to be renegotiated. If you can
print your own IUD, you take control away from legislation in terms of birth control,
or make DIY medical decisions because you have access to specific data sets that in
the past were left to teams of experts; the questions of who will control society and
how governance systems will function in these new terrains remain unanswered.
A humanistic intelligence approach to wearable technologies considers a seamless integration, extending the reach of the systems of the body into body coverings
and into the world beyond. The biosphere and the data-sphere become one through
which sustainable design solutions will emerge. The field of wearable technology
explores the function of the mechanistic, as well as that of neural networks and mental representation. The peripheral borders where physical atoms meet digital bits are
fertile new spaces for design.
At the nanolevel it is revealed that everything we thought was fixed and stable
is chaotic and in motion. There is a growing awareness of the porosity of the world
and the fungibility of materials. Future wearable tech apparel and artifacts
will be created with molecular aesthetics; they are synaptic sculptures where experience becomes a material to be molded and shaped in the design of interaction. An
awareness of interconnectedness will prompt designers to create works responsibly
and to tackle research problems by proposing creative solutions. Vibrant materials
will be crafted into bespoke manifestations of experience: apparel as extensions of
natural systems.
REFERENCES
ABI Research. 2010. Wearable computing devices, like Apple's iWatch, will exceed 485 million
annual shipments by 2018. ABIresearch.com. Accessed May 20, 2014. https://www.
abiresearch.com/press/wearable-computing-devices-like-apples-iwatch-will.
Amit, G. 2014. Wearable technology that ignores emotional needs is a Major Error.
Dezeen. Accessed May 20, 2014. http://www.dezeen.com/2014/03/10/interview-fitbit-designer-gadi-amit-wearable-technology/.
Angelis, J. and E.P. de Lima. 2011. Shifting from production to service to experience-based
operations. In: Service Design and Delivery, M. Macintyre, G. Parry, and J. Angelis
(eds.), pp. 83-84. New York: Springer.
Ashton, K. 2009. That Internet of Things thing. RFID Journal, June 22, 2009. Accessed June
14, 2014. http://www.rfidjournal.com/articles/view?4986.
Barthes, R. 1973. Mythologies. London, U.K.: Granada.
Bauer quoted in Rackspace. 2013. The human cloud: Wearable technology from novelty to
production. White Paper. San Antonio, TX: Rackspace.
Bennett, J. 2010 (1957). Vibrant Matter: A Political Ecology of Things. Durham, NC: Duke
University Press.
Blank, J. and S.C. Rössle. 2014. Peptomics: Molecular word modeling. Paper presented at
the Third International Conference on Transdisciplinary Imaging at the Intersection of
Art, Science and Culture: Cloud and Molecular Aesthetics, Pera Museum, Istanbul,
Turkey, June 26-28, 2014. Abstract accessed August 30, 2014. http://ocradst.org/
cloudandmolecularaesthetics/peptomics/. See also http://www.peptomics.org.
Bound, K., T. Saunders, J. Wilsdon, and J. Adams. 2013. China's absorptive state: Innovation
and research in China. Nesta, London, U.K. Accessed August 30, 2014. http://www.
nesta.org.uk/publications/chinas-absorptive-state-innovation-and-research-china.
Cogiscan. 2014. WIP tracking and route control. Cogiscan.com. Accessed May 20, 2014.
http://www.cogiscan.com/track-trace-control/application-software/wip-tracking-route-control/.
Cohen, D. 2014. Why we look the way we look now. The Atlantic Magazine, April 16, 2014.
Accessed May 14, 2014. http://www.theatlantic.com/magazine/archive/2014/05/the-way-we-look-now/359803/.
Coloring. Accessed August 30, 2014. http://interactiondesign.sva.edu/people/project/coloring.
Confino, J. 2013. Google seeks out Wisdom of Zen Master Thich Nhat Hanh. The Guardian,
September 5, 2013. Accessed August 30, 2014. http://www.theguardian.com/sustainable-
business/global-technology-ceos-wisdom-zen-master-thich-nhat-hanh.
Coote, A., A. Simms, and J. Franklin, 2014. 21 Hours: Why a Shorter Working Week Can Help
us All to Flourish in the 21st Century. London, U.K.: New Economics Foundation, p. 10.
Crockett, M. 2008. Psychology: Not fair. Nature 453: 827, June 12, 2008.
Dator, J. 2008. On the rights and rites of humans and artilects. Paper presented at the
International Conference for the Integration of Science and Technology into Society,
Daejeon, Korea, July 14-17, 2008. Accessed August 30, 2014, www.futures.hawaii.edu/
publications/ai/RitesRightsRobots2008.pdf.
Doyle, R. 2003. Wetwares: Experiments in Postvital Living, Vol. 24. Minneapolis, MN:
University of Minnesota Press.
Electronics360. 2013. Teardown: Fitbit flex. Electronics360. Accessed May 20, 2014. http://
electronics360.globalspec.com/article/3128/teardown-fitbit-flex.
Endeavour Partners. 2014. Inside Wearables, January 2014. Accessed August 30, 2014. http://
endeavourpartners.net/white-papers/.
Flanagan, P. 2011. The ethics of collaboration in synaptic sculpture. Ctrl+P Journal of
Contemporary Art 14: 37-50, February.
Flanagan, P. and K. Vega. 2012. Blinklifier: The power of feedback loops for amplifying
expressions through bodily worn objects. Paper presented at the 10th Asia Pacific
Conference on Computer Human Interaction (APCHI 2012), Matsue, Japan. See also,
Accessed August 30, 2014. http://pipa.triciaflanagan.com/portfolio-item/blinklifier/ and
https://www.youtube.com/watch?v=VNhnZUNqA6M.
Frey, C. and M. Osborne. 2013. The Future of Employment: How Susceptible Are Jobs to
Computerisation? Oxford, U.K.: OMS Working Paper.
Gabriel, T.H. and F. Wagmister. 1997. Notes on weavin digital: T(h)inkers at the loom. Social
Identities 3(3): 333-344.
Gazzola, V., G. Rizzolatti, B. Wicker, and C. Keysers. 2007. The anthropomorphic brain:
The mirror neuron system responds to human and robotic actions. Neuroimage 35(4):
1674-1684.
Gershenfeld, N. 2011. The making revolution. In: Power of Making: The Importance of
Being Skilled, D. Charny (ed.), pp. 56-65. London, U.K.: V&A Pub. and the Crafts
Council.
Harbisson, N. 2012. I listen to color. TED Global. Accessed November 25, 2012. http://www.
ted.com/talks/neil_harbisson_i_listen_to_color.html.
Holmes, O.W. 1859. The stereoscope and the stereograph. In: Art in Theory, 1815-1900: An
Anthology of Changing Ideas, C. Harrison, P. Wood, and J. Gaiger (eds.), pp. 668-672.
Malden, MA: Blackwell. Originally published in The Atlantic Monthly, Vol. 3 (Boston,
MA, June 1859): 738-748.
Howells, J. 1999. Regional systems of innovation. In: Innovation Policy in a Global Economy
D. Archibugi, J. Howells, and J. Michie (eds.). Cambridge University Press, Cambridge.
Howes, D. 2003. Aestheticization takes command. In: Empire of the Senses: The Sensual
Culture Reader Sensory Formations Series, D. Howes (ed.), pp. 245-250. Oxford, U.K.:
Berg.
Howes, D. 2005. HYPERESTHESIA, or, the sensual logic of late capitalism. In: Empire of
the Senses: The Sensual Culture Reader Sensory Formations Series, D. Howes (ed.),
pp. 281-303. Oxford, U.K.: Berg.
Imlab, C. 2014. Accessed May 20, 2014. https://www.youtube.com/watch?v=pE0rlfBSe7I.
Ishii, H., D. Lakatos, L. Bonanni, and J.-B. Labrune. 2012. Radical Atoms: Beyond Tangible
Bits, Toward Transformable Materials, Vol. 19. New York: ACM.
Ishii, H. and B. Ullmer. 1997. Tangible bits: Towards seamless interfaces between people, bits
and atoms. ITP Tisch. Accessed August 30, 2014. http://itp.nyu.edu/itp/.
Kelley, K. 2007. What is the quantified self? Quantifiedself.com. Accessed May 20, 2014.
http://quantifiedself.com/2007/10/what-is-the-quantifiable-self/.
Kelley, L. 2014. Digesting wetlands. Paper presented at the Third International Conference
on Transdisciplinary Imaging at the Intersection of Art, Science and CultureCloud
and Molecular Aesthetic, Pera Museum, Istanbul, Turkey. Abstract accessed August 30,
2014. ocradst.org/cloudandmolecularaesthetics/digesting-wetlands/.
Keysers, C. 2009. Mirror neurons: Are we ethical by nature? In: What's Next?: Dispatches
on the Future of Science: Original Essays from a New Generation of Scientists,
M. Brockman (ed.). New York: Vintage Books.
Kurzweil, R. 1998. The Age of Spiritual Machines: When Computers Exceed Human
Intelligence. New York: Viking Press.
Kurzweil, R. 2006. Singularity: Ubiquity interviews Ray Kurzweil. Ubiquity, January 1, 2006.
Littler, M. 2013. Amazon: The Truth Behind the Click. Produced by Michael Price. London,
U.K.: BBC.
Lupton, E. and J. Tobias, 2002. Skin: Surface, Substance + Design, 36, pp. 74-75. New York:
Princeton Architectural Press.
Maguire, J.S. 2008. Leisure and the obligation of self-work: An examination of the fitness
field. Leisure Studies 27: 59-75, January.
Mann, S. 1997. Wearable computing: A first step toward personal imaging. Computer 30(2):
25-32.
Mann, S. 2001. Wearable computing: Toward humanistic intelligence. IEEE Intelligent
Systems 16(3): 10-15.
Mann, S. 2008. Humanistic intelligence/humanistic computing: Wearcomp as a new framework for intelligent signal processing. Proceedings of the IEEE 86(11): 2123-2151.
Margetts, M. 1994. Action not words. In: The Cultural Turn: Scene-Setting Essays on
Contemporary Cultural History, D.C. Chaney (ed.), pp. 38-47. New York: Routledge.
Master, N. 2012. Barcode scanners used by Amazon to manage distribution centre operations. RFgen. Accessed May 20, 2014. http://www.rfgen.com/blog/bid/241685/
Barcode-scanners-used-by-Amazon-to-manage-distribution-center-operations.
McCaffrey, E. 2014. Extraordinary people: The boy who sees without eyes. Accessed June 29,
2014. http://www.imdb.com/title/tt1273701/.
McKinlay, A. and K. Starkey. 1998. Foucault, Management and Organization Theory: From
Panopticon to Technologies of Self. London, U.K.: Sage.
Ng, C. 2014. Five privacy concerns about wearable technology. Accessed May 20, 2014.
http://blog.varonis.com/5-privacy-concerns-about-wearable-technology/.
Plant, S. 1996. The future looms: Weaving women and cybernetics. In: Clicking in: Hot Links
to a Digital Culture, L. Hershman-Leeson (ed.), pp. 123-135. Seattle, WA: Bay Press.
Plant, S. 1997. Zeros and Ones: Digital Women and the New Technoculture. New York: Doubleday.
Poissant, L. 2007. The passage from material to interface. In: Media Art Histories, O. Grau
(ed.), pp. 229-251. Cambridge, MA: MIT Press.
Pold, S. 2005. Interface realisms: The interface as aesthetic form. Postmodern Culture 15(2).
Accessed February 20, 2015. http://muse.jhu.edu.lib-ezproxy.hkbu.edu.hk/journals/pmc/
toc/pmc15.2.html.
PSFK Labs. 2014. The future of wearable tech. Accessed January 8. http://www.slideshare.
net/PSFK/psfk-future-of-wearable-technology-report: PSFK.
Quinn, B. 2013. Textile Visionaries: Innovation and Sustainability in Textile Design, Vol. 11,
pp. 76-81. London, U.K.: Laurence King Publishing.
Rackspace. 2013. The human cloud: Wearable technology from novelty to production. White
Paper. San Antonio, TX: Rackspace.
Saxenian, A.L. 1996. Regional Advantage: Culture and Competition in Silicon Valley and
Route 128. Cambridge, MA: Harvard University Press.
Schiphorst, T. 2011. Self-evidence: Applying somatic connoisseurship to experience design.
In: CHI '11 Extended Abstracts on Human Factors in Computing Systems, pp. 145-160.
Sudia, F.W. 2001. A jurisprudence of artilects: Blueprint for a synthetic citizen. Accessed
August 30, 2014. http://www.kurzweilai.net/a-jurisprudence-of-artilects-blueprint-for-a-synthetic-citizen.
SVA, NYC. Interaction MFA interaction design. Accessed August 12, 2014. http://
interactiondesign.sva.edu/.
Sweeney, J.A. 2014. Artifacts from the three tomorrows. Graduate Institute of Futures Studies,
Tamkang University, Hawaii. Accessed August 30, 2014. https://www.academia.edu/7084893/The_Three_Tomorrows_A_Method_for_Postnormal_Times.
Thomas, P. 2013. Nanoart: The Immateriality of Art. Chicago, IL: Intellect.
Von Busch, O. 2013. Zen and the abstract machine of knitting. Textile 11(1): 6-19.
Wander, S. 2014. Introducing coloring. Accessed January 12, 2014. http://vimeo.com/
81510205.
Watts, H. 1988. The dada event: From transubstantiation to bones and barking. In: Event
Arts and Art Events, S.C. Foster (ed.), Vol. 57, pp. 119-131. Ann Arbor, MI: UMI
Research Press.
Wiener, N. 1989. The Human Use of Human Beings: Cybernetics and Society. London, U.K.:
Free Association.
Wilson, J. 2013. Wearables in the workplace. Harvard Business Review Magazine, September. https://hbr.org/2013/09/wearables-in-the-workplace. Accessed February 28, 2015.
Young, L.J. 2009. Love: Neuroscience reveals all. Nature 457: 148.
Section II
The Technology
Head-Mounted Display
Technologies for
Augmented Reality
Kiyoshi Kiyokawa
CONTENTS
4.1 Introduction.....................................................................................................60
4.2 Brief History of Head-Mounted Displays........................................................60
4.3 Human Vision System..................................................................................... 61
4.4 HMD-Based AR Applications......................................................................... 62
4.5 Hardware Issues............................................................................................... 63
4.5.1 Optical and Video See-Through Approaches...................................... 63
4.5.2 Ocularity..............................................................................................64
4.5.3 Eye-Relief............................................................................................ 65
4.5.4 Typical Optical Design........................................................................ 65
4.5.5 Other Optical Design........................................................................... 68
4.6 Characteristics of Head-Mounted Displays..................................................... 70
4.6.1 Resolution............................................................................................ 70
4.6.2 Field of View....................................................................................... 70
4.6.3 Occlusion............................................................................................. 72
4.6.4 Depth of Field...................................................................................... 73
4.6.5 Latency................................................................................................ 75
4.6.6 Parallax................................................................................................ 76
4.6.7 Distortions and Aberrations................................................................. 77
4.6.8 Pictorial Consistency........................................................................... 77
4.6.9 Multimodality...................................................................................... 78
4.6.10 Sensing................................................................................................. 78
4.7 Human Perceptual Issues................................................................................. 79
4.7.1 Depth Perception................................................................................. 79
4.7.2 User Acceptance..................................................................................80
4.7.3 Adaptation............................................................................................80
4.8 Conclusion....................................................................................................... 81
References................................................................................................................. 81
4.1 INTRODUCTION
Ever since Sutherland's first see-through head-mounted display (HMD) in the late
1960s, attempts have been made to develop a variety of HMDs by researchers and
manufacturers in the communities of virtual reality (VR), augmented reality (AR), and
wearable computers. Because of HMDs' wide application domains and technological
limitations, however, no single HMD is perfect. Ideally, visual stimulation should be
presented in a field of view (FOV) of 200°(H) × 125°(V), at an angular resolution of
0.5 min of arc, with a dynamic range of 80 dB, at a temporal resolution of 120 Hz, and
the device should look like a normal pair of glasses. Such a visual display is difficult to
realize and therefore an appropriate compromise must be made considering a variety
of technological trade-offs. This is why it is extremely important to understand the characteristics of different types of HMDs, their capabilities, and limitations. As an introduction to the following discussion, this section introduces three issues related to HMDs:
a brief history of HMDs, the human vision system, and application examples of HMDs.
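The gap between this ideal specification and practical hardware is easy to quantify with a back-of-envelope sketch: at 0.5 min of arc per pixel (i.e., 120 pixels per degree), the 200° × 125° ideal FOV implies an enormous pixel budget.

```python
# Back-of-envelope pixel budget for the "ideal" HMD specified above:
# 200 deg (H) x 125 deg (V) FOV at 0.5 arcmin angular resolution.
ARCMIN_PER_DEG = 60

def pixel_budget(fov_h_deg, fov_v_deg, res_arcmin):
    """Pixels needed if every pixel subtends `res_arcmin` minutes of arc."""
    px_per_deg = ARCMIN_PER_DEG / res_arcmin   # 120 px/deg at 0.5 arcmin
    h = round(fov_h_deg * px_per_deg)
    v = round(fov_v_deg * px_per_deg)
    return h, v, h * v

h, v, total = pixel_budget(200, 125, 0.5)
print(f"{h} x {v} = {total / 1e6:.0f} megapixels per eye")
# -> 24000 x 15000 = 360 megapixels per eye
```

Roughly 360 megapixels per eye, refreshed at 120 Hz, is far beyond any display panel, which is why the compromises discussed in this chapter are unavoidable.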
The large expanse extra perspective (LEEP) optical system, developed in 1979
by Howlett, has been widely used in VR. The LEEP system, originally developed
for 3-D still photography, provides wide-FOV (~110°(H) × 55°(V)) stereoscopic
viewing. Having a wide exit pupil of about 40 mm, the LEEP requires no adjustment mechanism for interpupillary distance (IPD). Employing the LEEP optical
system, McGreevy and Fisher developed the Virtual Interactive Environment
Workstation (VIEW) system at the NASA Ames Research Center in 1985. Using
the LEEP optics, VPL Research introduced the first commercial HMD, the EyePhone,
in 1989. The EyePhone encouraged VR research at many institutes and laboratories.
Since then a variety of HMDs have been developed and commercialized.
FIGURE 4.1 (a) Human eye structure and (b) density of cones and rods.
to move the eyes and/or the head. An area in the view where fixation can be accomplished without head motion is called the field of fixation, which is roughly circular
with a radius of about 40°-50°. However, head motion will normally accompany eye movement to
maintain the rotation angle of the eyes smaller than 15°. The horizontal FOV slowly
declines with age, from nearly 180°(H) at age 20, to 135°(H) at age 80.
Depth perception relies on monocular and/or binocular depth cues. These cues
can be further categorized into physiological and psychological cues. Physiological
monocular depth cues include accommodation, monocular convergence, and motion
parallax. Psychological monocular depth cues include apparent size, linear perspective, aerial perspective, texture gradient, occlusion, shades, and shadows. Binocular
convergence and stereopsis are the typical physiological and psychological binocular
depth cues, respectively. Binocular convergence is related to the angle between the two
lines from a focused object to the eyes, whereas stereopsis is based on the lateral
disparity between the left and right images. Stereopsis is the most powerful depth cue
for distances up to 6–9 m (Boff et al., 1986), and it can be effective up to a few hundred meters.
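The few-hundred-meter range can be checked with a small-angle calculation: the binocular disparity of an object at distance d subtends roughly IPD/d radians. The following sketch assumes a 65 mm IPD and a stereo acuity threshold of about 20 arcseconds; both are representative values assumed here, not figures from the text.

```python
import math

IPD_M = 0.065          # assumed interpupillary distance, meters
THRESHOLD_ARCSEC = 20  # assumed stereo acuity threshold, arcseconds

def disparity_arcsec(distance_m):
    """Approximate binocular disparity angle (small-angle), in arcseconds."""
    return math.degrees(IPD_M / distance_m) * 3600

for d in (6, 9, 100, 500, 1000):
    usable = disparity_arcsec(d) > THRESHOLD_ARCSEC
    print(f"{d:5d} m: {disparity_arcsec(d):8.1f} arcsec, stereo usable: {usable}")
```

With these assumed numbers, disparity stays above threshold out to roughly 600–700 m, consistent with the "few hundred meters" figure quoted above.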
The human eye has a total dynamic sensitivity of at least 10^10, achieved in part by changing the
pupil diameter from about 2 to 8 mm. According to the intensity of the light, this
dynamic range is divided into three types of vision: photopic, mesopic, and scotopic (Bohm and Schranner, 1990). Photopic vision, experienced during daylight,
features sharp visual acuity and color perception; in this case, the rods are saturated
and not effective. Mesopic vision is experienced at dawn and twilight; the
cones function less actively and provide reduced color perception, while
peripheral vision becomes effective for finding dim objects. Scotopic vision is experienced
under starlight conditions; peripheral vision dominates
foveal vision, with poor visual acuity and degraded color perception, because only the
rods are active.
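A quick calculation shows that the pupil alone cannot account for this range: going from 2 to 8 mm in diameter changes the light-gathering area by only a factor of 16, so almost all of the 10^10 range must come from photochemical and neural adaptation in the retina.

```python
import math

def pupil_area_mm2(diameter_mm):
    """Circular pupil area from its diameter."""
    return math.pi * (diameter_mm / 2) ** 2

gain = pupil_area_mm2(8.0) / pupil_area_mm2(2.0)
print(f"pupil area gain: {gain:.0f}x")        # (8/2)^2 = 16
print(f"range left for retinal adaptation: {1e10 / gain:.1e}")
```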
A stereoscopic view is also important for accurate operations. A wide FOV, on the other
hand, is not crucial, as the image overlay is needed only in a small area at hand.
A lightweight, less-tiring HMD is especially preferred for end users and/or for
tasks with a large workspace. Early examples in this regard include Boeing's AR
system for wire harness assembly (Caudell and Mizell, 1992), the KARMA system for
end-user maintenance (Feiner et al., 1993), and an outdoor wearable tour guidance
system (Feiner et al., 1997). In these systems, moderate pixel resolution and registration accuracy often suffice; safety and user-acceptance issues, such as peripheral
vision and a mechanism for easy attachment/detachment, are of greater importance.
FIGURE 4.2 Typical configurations of (a) optical see-through display and (b) video see-through display.
4.5.2 Ocularity
Ocularity is another criterion for categorizing HMDs. There are three types of ocularity: monocular, biocular, and binocular. These categories are independent of the
see-through type. Table 4.1 shows the applicability of each combination of ocularity
and see-through type in AR.
A monocular HMD has a single viewing device, either see-through or closed. It is
relatively small and leaves an unaided real-world view to the other eye. A monocular HMD
is preferable, for example, in some outdoor situations where an unobstructed real view
is crucial and a stereoscopic synthetic image is not necessary; army aviation and
wearable computing are good examples. With a monocular HMD, the two eyes see quite
different images, which causes an annoying visual experience called binocular rivalry.
This deficiency is most prominent when using a monocular video see-through display.
TABLE 4.1
Combinations of Ocularity and See-Through Types

                      Optical See-Through    Video See-Through
Monocular             Good                   Confusing
Biocular              Confusing              Good
Binocular (Stereo)    Very good              Very good
A biocular HMD provides a single image to both eyes. Because both eyes always
observe exactly the same synthetic image, the problem of binocular rivalry does not
occur. This is a typical configuration for consumer HMDs, whose primary target content is
2-D imagery such as television and video games. Some biocular HMDs have
optical see-through capability for safety reasons. However, an optical see-through
view with a biocular HMD is annoying in AR systems because accurate registration
is achievable with only one eye. For AR, biocular video see-through HMDs are preferable for casual applications where stereo capability is not crucial but a convincing overlaid image is necessary; entertainment is a good application domain in this
regard (Billinghurst et al., 2001).
A binocular HMD has two separate displays with two input channels, one for
each eye. Because of this stereo capability, binocular HMDs are preferred in many
AR systems. There is often confusion between binocular and stereo: a binocular
HMD functions as a stereoscopic HMD only when two different image sources
are properly provided.
4.5.3 Eye-Relief
Most HMDs need to magnify a small image on the imaging device to produce a large
virtual screen at a certain distance covering the user's view (Figure 4.3a). For a small
total size and rotational moment of inertia, a short eye-relief (the separation between
the eyepiece and the eye) is desirable. However, too small an eye-relief causes the
FOV to be partially shaded off and is inconvenient for users with eyeglasses. As a
compromise, the eye-relief of an HMD is normally set between 20 and 40 mm.
Eye-relief and the actual distance between the eye and the imaging device (or
the last image plane) are interlinked, because a magnifying lens (the
eyepiece functions as a magnifying lens) normally has equal front and back
focal lengths. For example, when the eye-relief is 30 mm, the distance between
the eye and the imaging device will be roughly 60 mm. Similarly, the larger the eye-relief
becomes, the larger the eyepiece diameter needs to be, which means heavier
optics but a larger exit pupil. The exit pupil should be as large as possible,
at least around 10 mm in diameter. The eyepiece diameter normally cannot exceed the
IPD, which varies among individuals from 53 to 73 mm (Robinett and
Rolland, 1992).
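The eyepiece-as-magnifier relationship can be checked with the thin-lens equation: with the imaging device just inside the focal point, a large virtual screen appears at a comfortable viewing distance, and the eye-to-device separation comes out near twice the eye-relief. A sketch under thin-lens assumptions; the 30 mm focal length and 29 mm object distance are illustrative values, not from the text.

```python
def virtual_image_distance_mm(f_mm, object_mm):
    """Thin-lens magnifier: an object inside the focal length forms a virtual
    image; returns the (positive) image distance from the lens in mm."""
    assert object_mm < f_mm, "magnifier use requires the object inside the focus"
    return object_mm * f_mm / (f_mm - object_mm)

f = 30.0           # illustrative eyepiece focal length, mm (~ eye-relief)
eye_relief = 30.0  # mm
device = 29.0      # imaging device just inside the focal point, mm

screen = virtual_image_distance_mm(f, device)
print(f"virtual screen ~{(eye_relief + screen) / 1000:.2f} m from the eye")
print(f"eye-to-device distance ~{eye_relief + device:.0f} mm (about 2 x eye-relief)")
```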
FIGURE 4.3 (a) Eye-relief and viewing distance and (b) locations of the virtual screen in different types of HMDs.
around the head to minimize rotational moment of inertia. In such systems, the
pupil of the eye needs to be positioned within a specific volume, called the eye box, to
avoid eclipse.
With the advent of high-resolution, small imaging devices, the non-pupil-forming
architecture has become more common; it allows a modest FOV in a lightweight, compact form factor. As a drawback of the non-pupil-forming architecture,
the optical design is less flexible. Figure 4.4 shows a number of typical eyepiece designs
in the non-pupil-forming architecture. In early HMDs, refractive optics was used
FIGURE 4.4 Typical eyepiece designs. (a) Refractive, (b) catadioptric, and (c) free-form prism.
(Figure 4.4a). In this case, at least three lenses are normally required for aberration
correction, and the depth and weight of the optics are difficult to reduce. Optical
see-through capability is achieved by folding the optical path with an optical combiner
placed between the eyepiece and the eye.
Catadioptric designs (Figure 4.4b) contain a concave mirror and a half-silvered
mirror. Light emitted from the imaging device is first reflected by the half-silvered
mirror toward the concave mirror. The light then bounces off the concave mirror, travels through the half-silvered mirror, and enters the eye. This configuration
reduces the size and weight significantly. Moreover, it introduces no chromatic aberration (the inability of a lens to focus different colors to the same point).
Optical see-through capability is achieved by simply making the concave mirror
semitransparent. However, the eye receives at most one-fourth of the original light of
the imaging device, because the light must interact with the half-silvered
mirror twice. A beam-splitting prism is often used in place of the half-silvered mirror to increase the FOV at the expense of weight.
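The one-fourth figure follows directly from the two passes through the half-silvered mirror: a fraction r of the light is reflected on the first pass and a fraction 1 − r is transmitted on the second, so at most r(1 − r) of the display's light reaches the eye, which is maximized at r = 0.5.

```python
def catadioptric_efficiency(r):
    """Fraction of display light surviving reflect-then-transmit at a
    beam splitter with reflectance r (losses elsewhere ignored)."""
    return r * (1.0 - r)

for r in (0.3, 0.5, 0.7):
    print(f"reflectance {r:.1f}: efficiency {catadioptric_efficiency(r):.2f}")
```

Any reflectance other than 50% only lowers the product, so one-fourth is the best case.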
A free-form prism (Figure 4.4c) reduces the thickness and weight without loss of
light efficiency. For example, a 34° horizontal FOV is achieved with a prism thickness of 15 mm. The inner side of the front surface functions as a concave mirror, and the
inner side of the back surface is carefully angled. The light from the imaging device first bounces off this back surface with total internal reflection; the reflected light
then travels through the same surface to the eye, because of its small incident angle. To provide
optical see-through capability, a compensating prism can be attached to the front
side (on the right side of Figure 4.4c).
FIGURE 4.5 Examples of (a) HOE-based HMD and (b) waveguide-based HMD.
A holographic optical element (HOE), a kind of diffraction grating, has been used
for lightweight optics in HMDs. Due to its diffractive power, a variety of curved-mirror
shapes can be formed on a flat substrate. An HOE can also function as a highly transparent optical combiner due to its wavelength selectivity. Based on these unique characteristics, very thin, lightweight, and bright optical see-through HMDs can be designed
(Ando et al., 1998). An example of an HOE-based stereo HMD is illustrated in Figure 4.5a.
An optical waveguide, or light-guide optical element, together with couple-in
and couple-out optics, offers compact, lightweight, wide-FOV HMD designs
(Allen, 2002; Kasai et al., 2000). As shown in Figure 4.5b, image components from
an image source are first coupled into the waveguide with total internal reflection.
These image components are then coupled out of the waveguide by carefully
designed semitransparent reflecting material, such as an HOE. Some recent HMDs,
such as Google Glass and the EPSON Moverio series, use a waveguide-based design.
for very high resolution and wide FOV. The VRD assures focused images at all times,
regardless of the eye's accommodation, in exchange for a small exit pupil.
Head-mounted projective displays (HMPDs) present a stereo image onto the real
environment from a pair of miniature projectors (Fisher, 1996). A typical configuration of an HMPD is shown in Figure 4.6a. From the regions in the real environment that
are covered with retroreflective materials, the projected stereo image is bounced
back to the corresponding eyes separately. Without the need for an eyepiece, this design
is less obtrusive, gives smaller aberrations, and offers a larger binocular FOV, up to
120° horizontally.
In 2013, two novel near-eye light field HMDs were proposed. The light field
is the set of all light rays at every point in space traveling in every direction. In theory, light
field displays can reproduce accommodation, convergence, and binocular disparity depth cues, eliminating the common problem of the accommodation–convergence
conflict within a designed depth of field. NVIDIA's non-see-through near-eye light
field display (Lanman and Luebke, 2013) is capable of presenting these cues by
using an imaging device and a microlens array near the eye, closer than the eye's
accommodation distance (see Figure 4.6b). Because of the simple structure and the
short distance between the eye and the imaging device, a near-eye light field display
can potentially provide high resolution and a wide FOV in a very thin (~10 mm) and
lightweight (~100 g) form factor. The University of North Carolina's near-eye light field
display (Maimone and Fuchs, 2013) is optical see-through, supporting a wide FOV,
selective occlusion, and multiple simultaneous focal depths in a similarly compact
form factor. Their approach requires no reflective, refractive, or diffractive components, but instead relies on a set of optimized patterns that produce a focused image
when displayed on a stack of spatial light modulators (LCD panels). Although the image
quality of these near-eye light field displays is currently not satisfactory, they are
extremely promising because of the unique advantages mentioned earlier.
In 2014, UNC and NVIDIA jointly proposed yet another novel architecture, called
pinlight (Maimone et al., 2014). A pinlight display is simply composed of a spatial
light modulator (an LCD panel) and an array of point light sources (implemented as
an edge-lit, etched acrylic sheet). It forms an array of miniature see-through projectors, thereby offering an arbitrarily wide FOV in a compact form factor. Their
prototype display renders a wide FOV (110° diagonal) in real time by using a shader
program to rearrange images for the tiled miniature projectors.
FIGURE 4.6 (a) Head-mounted projective display and (b) near-eye light field display.
4.6.2 Field of View
The field of view of an HMD for AR can be classified into a number of regions.
The aided (or overlay) FOV is the most important visual field in AR, where the synthetic image is overlaid onto the real scene. The aided FOV of a stereo HMD typically
consists of a stereo FOV and monocular FOVs. Narrow-FOV HMDs (below roughly 60°(H))
commonly have 100% overlap, whereas wide-FOV HMDs (above roughly 80°(H)) often have
a small overlap ratio, for example, 50%. The area outside the aided FOV consists of the
peripheral FOV and occluded regions blocked by the HMD structure. The real scene
is seen directly through the peripheral FOV, whereas neither the real nor the synthetic
image is visible in the occluded regions. The transition of the real view between the aided
and peripheral regions should be as seamless as possible, and the occluded regions
must be as small as possible.
Closed-type, wide-FOV (immersive) HMDs, such as the Oculus Rift, typically have
little or no peripheral FOV through which the real scene can be seen. A video see-through
option is available on the market for some closed-type wide-FOV HMDs, such as the Oculus
Rift and the Sensics piSight. By attaching appropriate cameras manually, any closed-type HMD can be used as a video see-through HMD; the InfinitEye V2, which offers
a total binocular FOV of 210°(H) × 90°(V) with 90° of stereo overlap, is no
exception.
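The relationship between per-eye FOV, stereo overlap, and total binocular FOV is simple arithmetic: total = 2 × per-eye − overlap. A small sketch; the 150° per-eye figure for the InfinitEye V2 is inferred from its published totals rather than stated in the text.

```python
def total_binocular_fov(per_eye_deg, overlap_deg):
    """Total horizontal binocular FOV from per-eye FOV and stereo overlap."""
    return 2 * per_eye_deg - overlap_deg

def per_eye_fov(total_deg, overlap_deg):
    """Invert the relation to recover the per-eye FOV."""
    return (total_deg + overlap_deg) / 2

# InfinitEye V2: 210 deg total with 90 deg stereo overlap -> 150 deg per eye
print(per_eye_fov(210, 90))          # 150.0
# A narrow-FOV HMD with 100% overlap: total equals per-eye FOV
print(total_binocular_fov(60, 60))   # 60
```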
In optical see-through HMDs, overlay FOVs larger than around 60°(H) are difficult to achieve with conventional optical designs due to aberrations and distortions. However, optical see-through HMDs tend to have a simple and compact
structure, leaving a wide peripheral FOV for direct observation of the real scene.
Nagahara et al. (2003) proposed a very wide FOV HMD (180°(H) × 60°(V) overlap)
using a pair of ellipsoidal and hyperboloidal curved mirrors. This configuration
can theoretically achieve optical see-through with a half-silvered curved
mirror. However, the image is seen only from a very small sweet spot, the focus
of the ellipsoid. L-3 Link Simulation and Training's Advanced HMD (AHMD)
optically achieves a wide view of 100°(H) × 50°(V) using an ellipsoidal mirror
(Sisodia et al., 2006). Kiyokawa (2007) proposed a type of HMPD, the hyperboloidal HMPD (HHMPD) (see Figure 4.7a), which provides a wide FOV by using a
pair of semitransparent hyperboloidal mirrors. With this design, a horizontal FOV
wider than 180° is easily achievable. Nguyen et al. (2011) extended this design for
use in mobile environments by using a semitransparent retroreflective
screen (see Figure 4.7b).
Recent advancements in optical design offer completely new paradigms for
optical see-through wide-FOV HMDs. Pinlight displays, introduced in the previous section, allow an arbitrarily wide FOV in an eyeglass-like compact form factor.
Innovega's iOptik architecture also offers an arbitrarily wide FOV by means of a custom contact lens. Through the contact lens, one can focus on the backside of the eyeglasses
and on the real environment at the same time. A wide aided FOV is available if an
appropriate image is presented on the backside of the eyeglasses, for example, by micro
projectors.
The necessary aided FOV is task dependent. In medical 3-D visualization, such
as breast needle biopsy, only a limited region in the visual field needs to be aided.
In VR, peripheral vision has proven to be important for situation awareness and
navigation tasks (Arthur, 2000). Larger peripheral FOVs reduce the required head
motion and search time. However, the actual effects of a wide-FOV display
on the perception of AR content have not been widely studied. Kishishita et al.
(2014) showed that search performance in a divided attention task either drops
FIGURE 4.7 A hyperboloidal head-mounted projective display (HHMPD) (a) with and (b) without a semitransparent retroreflective screen and (c) an example of an image.
4.6.3 Occlusion
Occlusion is well known to be a strong depth cue. In the real world, the depth order of objects
can be recognized by observing overlaps among them; in terms of cognitive
psychology, incorrect occlusion confuses the user. The occlusion capability of a see-through display is important in enhancing the user's perception of, and the visibility and realism
of, the presented synthetic scene. Correct mutual occlusion between the real and
synthetic scenes is often essential in AR applications such as architectural previewing. To present correct occlusion, depth information for both the real and the synthetic
scenes is necessary. Depth information for the synthetic image is normally available
from the depth buffer in the graphics pipeline. Real-time depth acquisition in the
real scene was long a difficult problem, but inexpensive RGB-D cameras are now widely
available.
Once the depth information is acquired, occlusion is reproduced differently with
optical and video see-through approaches. In both cases, a partially occluded virtual
object can be presented by depth keying or by rendering phantom objects. Similarly, a
partially occluded real object can be presented in a video see-through approach simply by rendering the occluding virtual object over the video background. However,
the same effect is quite difficult to achieve optically, as the real scene is
always seen through the partially transmissive optical combiner. Any optical combiner reflects some percentage of the incoming light and transmits the rest, making it impossible to overlay opaque objects optically. Moreover, each pixel of
the synthetic image is affected by the color of the real image at the corresponding
point, and never shows exactly its intended color.
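In a video see-through pipeline, depth keying reduces to a per-pixel comparison between the camera's depth map and the rendering depth buffer. A minimal NumPy sketch; the array names and toy values are illustrative, and a real system would also handle depth-sensor noise and camera/depth calibration.

```python
import numpy as np

def composite_with_occlusion(video_rgb, real_depth, virt_rgb, virt_depth):
    """Show a virtual pixel only where it is closer than the real scene."""
    virt_wins = virt_depth < real_depth   # boolean mask, one entry per pixel
    out = video_rgb.copy()
    out[virt_wins] = virt_rgb[virt_wins]  # keep video where the real scene wins
    return out

# 1x2-pixel example: the virtual object is in front of the real surface
# only in pixel 0 (depth 1.5 m vs 2.0 m), behind it in pixel 1 (1.5 m vs 1.0 m)
video  = np.array([[[0, 0, 0], [0, 0, 0]]], dtype=np.uint8)
virt   = np.array([[[255, 0, 0], [255, 0, 0]]], dtype=np.uint8)
real_d = np.array([[2.0, 1.0]])
virt_d = np.array([[1.5, 1.5]])
print(composite_with_occlusion(video, real_d, virt, virt_d)[0])
```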
Approaches to tackle this problem include (1) using luminous synthetic
imagery to make the real scene virtually invisible, (2) using a patterned light source
in a dark environment to make part of the real objects invisible (e.g., Maimone et al.,
2013), and (3) using an HMPD with retroreflective screens. The first approach is common
in flight simulators, but it restricts the available colors (to only bright ones). The second
and third approaches need a special configuration of the real environment and are thus
unavailable, for example, in mobile situations. Another approach is a transmissive or
reflective light-modulating mechanism embedded in the see-through optics. The ELMO
displays proposed by Kiyokawa employ a relay design to introduce a transparent
LCD panel positioned at an intermediate focus point. The most advanced ELMO display (ELMO-4) features parallax-free optics with a built-in real-time rangefinder
(Kiyokawa et al., 2003) (see Figure 4.8). An optical see-through light field display
using a stack of LCD panels has a capability of selective occlusion (Maimone and
Fuchs, 2013) and is extremely promising, though its image quality needs to be significantly improved. Reflective approaches have also been proposed using a digital
micro-mirror device (DMD) or a liquid crystal on silicon (LCoS) device (Cakmakci et al.,
2004). Although they require a telecentric system, reflective approaches are advantageous in terms of color purity and light efficiency.
4.6.4 Depth of Field
Depth of field refers to the range of distances from the eye (or a camera) in which an
object appears in focus. In real life, the eye's accommodation automatically
adjusts to focus on an object according to its distance, and objects outside the
depth of field appear blurred. The synthetic image, on the other hand, is normally
seen at a fixed distance. Therefore, it is impossible to focus on both the real and the
synthetic images at the same time with a conventional optical see-through HMD,
unless the focused object is at or near the HMD's viewing distance.
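The mismatch is easiest to express in diopters (reciprocal meters). For instance, if the HMD's virtual screen sits at 2 m and the real object of interest at 0.5 m, the eye must bridge |1/0.5 − 1/2| = 1.5 D, well outside a typical per-eye depth-of-field tolerance of a few tenths of a diopter. The 2 m, 0.5 m, and ±0.3 D figures below are illustrative assumptions, not values from the text.

```python
def diopters(distance_m):
    """Optical power corresponding to a focus distance, in diopters."""
    return 1.0 / distance_m

def focus_mismatch_d(screen_m, object_m):
    """Accommodation gap between the HMD screen and a real object."""
    return abs(diopters(object_m) - diopters(screen_m))

TOLERANCE_D = 0.3  # assumed depth-of-field tolerance of the eye
mismatch = focus_mismatch_d(screen_m=2.0, object_m=0.5)
print(f"mismatch: {mismatch:.1f} D -> both in focus: {mismatch < TOLERANCE_D}")
```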
This problem does not occur with a video see-through display, though captured
real objects can be defocused due to the camera. To avoid blurred video images,
the camera should preferably have autofocus or a small aperture. However,
the fixed focus of the synthetic image is problematic, because accommodation and convergence are closely interlinked in the human vision system; adjusting one of them while keeping the other fixed causes eyestrain.
FIGURE 4.8 (a) ELMO-4 optics design, (b) its appearance, and overlay images seen through ELMO-4 (c) without occlusion and (d) with occlusion and real-time range sensing. (Images taken from Kiyokawa, K. et al., An occlusion-capable optical see-through head mount display for supporting co-located collaboration, Proceedings of International Symposium on Mixed and Augmented Reality (ISMAR) 2003, 133–141, 2003. Copyright (2003) IEEE.)
To focus on both the real and the synthetic images at the same time, a different
optical design can be used. Virtual images presented by VRDs and pinlight displays appear clearly in focus regardless of the user's accommodation distance. This is not
always advantageous, specifically when the content to be presented is a realistic 3-D scene.
On the other hand, a number of varifocal HMDs have been proposed that change
the focal depth of the image in real time according to the intended depth of the
content. 3DDAC, developed at ATR in the late 1990s, has an eye-tracking device and a
lens-shift mechanism (Omura et al., 1996). Its fourth generation, 3DDAC Mk.4, can
change its focal length in the range between 0 and 4 diopters in about 0.3 s. In 2001,
the University of Washington proposed the True 3-D Display (Schowengerdt and
Seibel, 2004), which uses laser scanning with a varifocal mirror that can present a number of
FIGURE 4.9 A liquid lens-based varifocal HMD. (Courtesy of Hong Hua, University of
Arizona, Tucson, AZ.)
4.6.5 Latency
Latency in HMD-based systems refers to the temporal lag from the measurement of
head motion to the moment the rendered image is presented to the user. It leads
to inconsistency between the visual and vestibular sensations. In an optical see-through
HMD, latency is observed as a severe registration error under head motion, which
further introduces motion sickness, confusion, and disorientation: the synthetic image swings around the real scene. In a video see-through HMD,
this problem can be minimized by delaying the captured real image to synchronize it
with the corresponding synthetic image. This approach eliminates the apparent latency
between the real and synthetic scenes, at the expense of an artificial delay introduced into the real scene.
To compensate for latency, prediction filters such as the extended Kalman filter (EKF)
have been used successfully. Frameless rendering techniques can minimize the rendering delay by continuously updating parts of the image frame. Taking advantage
of the nonuniformity of visual acuity and/or saccadic suppression, limiting the regions and/or resolution of the synthetic image using an eye-tracking device also helps reduce the
rendering delay (Luebke and Hallen, 2001). Viewport extraction and image-shifting techniques take a different approach: a synthetic image
larger than the screen resolution is first rendered, and then a portion of it is extracted
and presented to the user according to the latest measurement. There exist some
FIGURE 4.10 Reflex HMD. (Courtesy of Ryugo Kijima, Gifu University, Gifu, Japan.)
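Viewport extraction can be sketched in a few lines: render a frame larger than the display, then crop it at scan-out according to the head-pose error accumulated since render time. A NumPy sketch assuming a fixed pixels-per-degree mapping and yaw/pitch-only correction; all names and values here are illustrative, not from any particular system.

```python
import numpy as np

PX_PER_DEG = 10.0  # assumed display resolution per degree of visual angle

def extract_viewport(big_frame, out_w, out_h, yaw_err_deg, pitch_err_deg):
    """Crop the oversized frame, shifted by the head rotation since render time."""
    H, W = big_frame.shape[:2]
    # shift the crop center by the pose error, converted to pixels
    cx = W // 2 + int(round(yaw_err_deg * PX_PER_DEG))
    cy = H // 2 + int(round(pitch_err_deg * PX_PER_DEG))
    # clamp so the crop stays inside the rendered area
    x0 = int(np.clip(cx - out_w // 2, 0, W - out_w))
    y0 = int(np.clip(cy - out_h // 2, 0, H - out_h))
    return big_frame[y0:y0 + out_h, x0:x0 + out_w]

big = np.arange(200 * 200).reshape(200, 200)  # stand-in for an oversized render
view = extract_viewport(big, 100, 100, yaw_err_deg=2.0, pitch_err_deg=0.0)
print(view.shape)  # (100, 100)
```

The rendered margin bounds how much late head motion can be absorbed; beyond it, the crop clamps and registration error reappears.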
4.6.6 Parallax
Unlike optical see-through systems, video see-through HMDs have difficulty eliminating the parallax between the user's eye and the camera viewpoint. Mounting a stereo camera above the HMD introduces a vertical parallax, causing a false sense of
height, while horizontal parallax introduces errors in depth perception. It is desirable that
the camera lens be positioned optically at the user's eye to minimize the parallax.
Examples of parallax-free video see-through HMDs include Canon's COASTAR
(Takagi et al., 2000) and State et al.'s (2005) display, which use a free-form prism and a
half-silvered mirror, respectively. On the other hand, the parallax introduced by an optical combiner is negligible and normally not compensated. As another problem, the
viewpoint for rendering must match that of the eye (for optical see-through) or the
camera (for video see-through). As a rendering viewpoint, the center of eye rotation
is better for positional accuracy, whereas the center of the entrance pupil is better for
angular accuracy (Vaissie and Rolland, 2000). Although the human IPD alters dynamically because of eye rotation, this dynamic IPD has not yet been compensated in real
time, to the author's knowledge.
4.6.8 Pictorial Consistency
Pictorial consistency between the real and virtual images is important for the sense
of reality as well as for the visibility of the overlaid information. For example, the brightness
and contrast of the synthetic image should be matched to those of the real image.
In an optical see-through HMD, it is difficult to match them over the very wide range
of luminance values of the real scene; for example, no imaging device is bright
enough to be comparable to direct sunlight. Instead, some products allow transparency control. In video see-through systems, pictorial consistency is more easily achieved; instead, the low contrast (low dynamic range) of the captured image is
often a problem. To compensate, real-time high dynamic range (HDR) techniques
could be used, though the author is not aware of a successful example in video
see-through AR.
4.6.9 Multimodality
Vision is the primary modality in AR, and most AR studies and applications are vision
oriented. However, the other senses are also important; strictly speaking, AR systems
target arbitrary sensory information. The receptors of the special senses, including hearing,
smell, taste, and the sense of balance, reside in the head, so a head-mounted
device is a good choice for modulating such sensory information. For example, a
noise-canceling earphone can be considered a hear-through head-mounted (auditory) display in the sense that it combines modulated real-world sound with digital sound.
Recently, a variety of HMDs for nonvisual senses have been proposed.
Some sensory information is more difficult to reproduce than others, and the interplay
of different senses can be used to address this problem. For example, Meta Cookie,
developed by Narumi et al. (2011), successfully presents different tastes with
the same real cookie by overriding its visual and olfactory stimuli using a head-mounted device. In this way, multimodal displays have great potential for complementing and reinforcing missing senses. It will become more and more important, at least
at the laboratory level, to explore different senses in the form of head-mounted
devices.
4.6.10 Sensing
Unlike a smartphone or a smartwatch, a head-mounted device will be cumbersome
if the user needs to put it on and take it off frequently. A typical expectation for a future HMD
is that it will become light, small, and comfortable enough that a wearer can use it continuously
for extended periods of the day for a variety of purposes. However, an
HMD will be useless, or even harmful, when its content is not relevant to the current situation and hinders observation of the imminent real environment behind it. This
problem is less prominent with an HMD for wearable computing, where the FOV is
relatively small and shown off-center of the user's view. It is more crucial
with an HMD for AR, which is expected to have a wide FOV covering the user's central
field of vision. In such situations, an AR system must be aware of the user and
environmental contexts and switch its content and presentation style properly and
dynamically.
Different types of contextual information need to be recognized to determine
whether and how the AR content should be presented. Such information includes environmental context, such as location, time, weather, and traffic, as well as user context, such as
body motion (Takada et al., 2010), gaze (Toyama et al., 2014), physiological status,
and schedule. In this sense, the integration of sensing mechanisms into an HMD will
become more important. An HMD can be combined not only with conventional
sensors such as a camera and a GPS unit but also with environmental sensors for
light, noise, and temperature, as well as biological sensors for EEG, ECG, skin conductance, and body temperature.
Among the variety of sensing information, a large number of attempts have been
made at eye tracking. In 2008, Fraunhofer IPMS proposed an HMD, iSTAR,
that is capable of both displaying an image and tracking the eye at the same time, using
an OLED on a CMOS sensor and exploiting the duality of an image sensor and an
image display.
FIGURE 4.11 Wide view eye camera. Appearance (a) and captured image (b).
The user's view as well as the user's gaze is important in the analysis of user interest; however, it has been difficult to acquire a wide, parallax-free user's view.
Mori et al. (2011) proposed a head-mounted eye camera that achieves this by using a
hyperboloidal semitransparent mirror (see Figure 4.11). Eye tracking is also achieved
by analyzing the user's eye images, captured at the same time as the user's view. Corneal
image analysis is a promising alternative to this system for its simple hardware configuration, offering a variety of applications including calibration-free eye tracking (Nakazawa and Nitschke, 2012), interaction-free HMD calibration (Itoh and
Klinker, 2014), object recognition, and so on. For a multifocal HMD, estimating the gaze
direction may not be enough; it is more desirable to estimate the depth of
the attended point in space. Toyama et al. (2014) showed that a stereo eye tracker
can estimate the focused image distance, using a prototypical three-layer monocular optical see-through HMD.
4.7.2 User Acceptance
An inappropriately worn HMD will induce undesirable symptoms including headaches,
shoulder stiffness, motion sickness, or even severe injury. From an ergonomic point
of view, HMDs must be as light, small, and comfortable to wear as possible, as long
as the visual performance satisfies the application requirements. The center of mass
of an HMD must be positioned as close to that of the user's head as possible; a well-balanced heavy HMD feels much lighter than a poorly balanced lightweight HMD.
Safety issues are of equal importance. By its nature, an AR application diverts the
user's voluntary attention from the real environment by overlaying synthetic information, and paying too much attention to the synthetic image can be highly dangerous to
real-world activity. To prevent catastrophic results, AR applications may need
to display minimal information, as long as the target task is still assisted satisfactorily.
Furthermore, HMDs restrict peripheral vision, which obstructs situation awareness
of the surroundings, and in video see-through, central vision will be lost under a system
failure. To accommodate these problems, a flip-up display design is helpful (Rolland
and Fuchs, 2001). When safety issues are the top priority, optical see-through HMDs
are recommended.
From a social point of view, HMDs should have a low-profile or "cool" design to be
widely accepted, and video cameras on an HMD raise privacy and security issues. Bass
et al. (1997) describe the ultimate test of the obtrusiveness of an HMD as whether or
not a wearer is able to gamble in a Las Vegas casino without challenge.
4.7.3Adaptation
The human vision system is quite dynamic. It takes some time to adapt to, and
recover from, a new visual experience. For example, wearing an HMD causes the
pupils to dilate slightly. However, complete dilation may take over 20 min, whereas
complete constriction may take less than 1 min (Alpern and Campbell, 1963).
Even when the visual experience is inconsistent with the real world, the human
vision system adapts to the new environment very flexibly. For example, a great ability to adapt to an inverted retinal image has been demonstrated for more than
100 years (Stratton, 1896). Similar adaptation occurs with AR systems that exhibit parallax
in video see-through configurations. Biocca and Rolland (1998) found that performance
in a depth-pointing task improved significantly over time using a video see-through system with a parallax of 62 mm vertically and 165 mm horizontally. They also
found a negative aftereffect, which can be harmful in some situations.
4.8 CONCLUSION
With the advancements in display technologies and an increasing public interest in
AR, VR, and wearable computing, both research and business on HMDs are now
more active than ever. However, there is, and will be, no single right HMD, due to
technical limitations and the wide variety of applications. Therefore, an appropriate compromise must be made depending on the target application. The issues discussed in this
chapter give some insights into the selection of an HMD. One must first consider
whether an optical or a video see-through approach is more suitable for the target task.
This is, in short, a trade-off between real-world visibility and pictorial consistency. The next consideration is the trade-off between the field of view and angular
resolution. When the user needs to observe both near and far overlay information, an
accommodation-capable (e.g., near-eye light field displays) or accommodation-free
(e.g., VRDs) HMD may be the first choice. If true occlusion within nearly intact real
views is necessary, occlusion-capable optical see-through displays such as ELMO-4
should be selected. Novel optical designs such as near-eye light field displays and
pinlight displays offer many preferable features at the same time, such as a wide field
of view and a compact form factor. Multimodal output and sensing features will become
more important as the demand for more advanced AR applications grows and the HMD
becomes an indispensable tool.
REFERENCES
Allen, K. (2002). A new fold in microdisplay optics, in Emerging Displays Review, Emerging
Display Technologies, Stanford Resources, July, pp. 7–12.
Alpern, M. and Campbell, F. W. (1963). The behavior of the pupil during dark adaptation,
Journal of Physiology, 65, 5–7.
Ando, T., Yamasaki, K., Okamoto, M., and Shimizu, E. (1998). Head-mounted display using
a holographic optical element, Proceedings of SPIE 3293, Practical Holography XII,
San Jose, CA, p. 183. doi:10.1117/12.303654.
Arthur, K. W. (2000). Effects of field of view on performance with head-mounted displays,
Doctoral thesis, University of North Carolina at Chapel Hill, Chapel Hill, NC.
Barfield, W., Hendrix, C., Bjorneseth, O., Kaczmarek, K. A., and Lotens, W. (1995). Comparison
of human sensory capabilities with technical specifications of virtual environment equipment, Presence, 4(4), 329–356.
Bass, L., Mann, S., Siewiorek, D., and Thompson, C. (1997). Issues in wearable computing: A
CHI '97 workshop, ACM SIGCHI Bulletin, 29(4), 34–39.
Bell, B., Feiner, S., and Hollerer, T. (2001). View management for virtual and augmented reality, Proceedings of the ACM UIST 2001, Orlando, FL, pp. 101–110.
Billinghurst, M., Kato, H., and Poupyrev, I. (2001). The MagicBook: Moving seamlessly
between reality and virtuality, IEEE Computer Graphics and Applications, 21(3), 6–8.
Biocca, F. A. and Rolland, J. P. (1998). Virtual eyes can rearrange your body: Adaptation to
virtual-eye location in see-thru head-mounted displays, Presence: Teleoperators and
Virtual Environments (MIT Press), 7(3), 262–277.
Boff, K. R., Kaufman, L., and Thomas, J. P. (1986). Handbook of Perception and Human
Performance, John Wiley & Sons, New York.
Bohm, H. D. V. and Schranner, R. (1990). Requirements of an HMS/D for a night-flying helicopter, Helmet-Mounted Displays II, Proceedings of SPIE, Orlando, FL, 1290, 93–107.
Buchroeder, R. A. (1987). Helmet-mounted displays, tutorial short course notes T2, SPIE
Technical Symposium Southeast on Optics, Electro-optics, and Sensors, Orlando, FL.
Cakmakci, O., Ha, Y., and Rolland, J. P. (2004). A compact optical see-through head-worn
display with occlusion support, Proceedings of the IEEE and ACM International
Symposium on Mixed and Augmented Reality (ISMAR), Arlington, VA, pp. 16–25.
Caudell, T. P. and Mizell, D. W. (1992). Augmented reality: An application of heads-up display
technology to manual manufacturing processes, Proceedings of the 1992 IEEE Hawaii
International Conference on Systems Sciences, Honolulu, HI, pp. 659–669.
Comeau, C. P. and Bryan, J. S. (1961). Headsight television system provides remote surveillance, Electronics, 34 (November 10), 86–90.
Feiner, S., Macintyre, B., and Seligmann, D. (1993). Knowledge-based augmented reality,
Communications of the ACM, 36(7), 53–62.
Feiner, S. B., Macintyre, B., Tobias, H., and Webster, A. (1997). A touring machine: Prototyping
3D mobile augmented reality systems for exploring the urban environment, Proceedings
of ISWC '97, Cambridge, MA, pp. 74–81.
Fisher, R. (November 5, 1996). Head-mounted projection display system featuring beam splitter and method of making same, US Patent No. 5,572,229.
Furness, T. A. (1986). The super cockpit and its human factors challenges, Proceedings of the
Human Factors Society, Dayton, OH, 30, 48–52.
Grasset, R., Langlotz, T., Kalkofen, D., Tatzgern, M., and Schmalstieg, D. (2012). Image-driven view management for augmented reality browsers, Proceedings of International
Symposium on Mixed and Augmented Reality (ISMAR), Atlanta, GA, pp. 177–186.
Hayford, M. J. and Koch, D. G. (1989). Optical arrangement, US Patent No. 4,854,688, issued
August 8, 1989.
Heilig, M. (1960). Stereoscopic television apparatus for individual use, US Patent No.
2,955,156, issued October 4, 1960.
Itoh, Y. and Klinker, G. (2014). Interaction-free calibration for optical see-through head-mounted displays based on 3D eye localization, Proceedings of the Ninth IEEE
Symposium on 3D User Interfaces (3DUI), Minneapolis, MN, pp. 75–82.
Kasai, I., Tanijiri, Y., Endo, T., and Ueda, H. (2000). A forgettable near eye display, Proceedings
of Fourth International Symposium on Wearable Computers (ISWC) 2000, Atlanta, GA,
pp. 115–118.
Kijima, R. and Ojika, T. (2002). Reflex HMD to compensate lag and correction of derivative
deformation, Proceedings of International Conference on Virtual Reality (VR) 2002,
Orlando, FL, pp. 172–179.
Kishishita, N., Kiyokawa, K., Kruijff, E., Orlosky, J., Mashita, T., and Takemura, H. (2014).
Analysing the effects of a wide field of view augmented reality display on search performance in divided attention tasks, Proceedings of International Symposium on Mixed
and Augmented Reality (ISMAR) 2014, Munich, Germany.
Kiyokawa, K. (2007). A wide field-of-view head mounted projective display using hyperbolic half-silvered mirrors, Proceedings of International Symposium on Mixed and
Augmented Reality (ISMAR) 2007, Nara, Japan, pp. 207–210.
Kiyokawa, K., Billinghurst, M., Campbell, B., and Woods, E. (2003). An occlusion-capable
optical see-through head mount display for supporting co-located collaboration,
Proceedings of International Symposium on Mixed and Augmented Reality (ISMAR)
2003, Tokyo, Japan, pp. 133–141.
Lanman, D. and Luebke, D. (2013). Near-eye light field displays, ACM Transactions on
Graphics (TOG), 32(6), Article 220. Proceedings of SIGGRAPH Asia, Hong Kong, China.
Liu, S., Cheng, D., and Hua, H. (2008). An optical see-through head mounted display with
addressable focal planes, Proceedings of IEEE International Symposium on Mixed and
Augmented Reality (ISMAR) 2008, Cambridge, UK, pp. 33–42.
Livingston, M. A., Swan, J. E., Gabbard, J. L., Hollerer, T. H., Hix, D., Julier, S. J., Yohan, B.,
and Brown, D. (2003). Resolving multiple occluded layers in augmented reality,
Proceedings of International Symposium on Mixed and Augmented Reality (ISMAR)
2003, Tokyo, Japan, pp. 56–65.
Longridge, T., Thomas, M., Fernie, A., Williams, T., and Wetzel, P. (1989). Design of an eye slaved
area of interest system for the simulator complexity testbed, in Area of Interest/Field-of-View
Research Using ASPT, T. Longridge (ed.). National Security Industrial Association, Air Force
Human Resources Laboratory, Air Force Systems Command, Washington, DC, pp. 275–283.
Luebke, D. and Hallen, B. (2001). Perceptually-driven simplification for interactive rendering, Proceedings of the ACM 12th Eurographics Workshop on Rendering Techniques,
London, UK, pp. 223–234.
Maimone, A. and Fuchs, H. (2013). Computational augmented reality eyeglasses, Proceedings
of International Symposium on Mixed and Augmented Reality (ISMAR) 2013, Adelaide,
Australia, pp. 29–38.
Maimone, A., Lanman, D., Rathinavel, K., Keller, K., Luebke, D., and Fuchs, H. (2014).
Pinlight displays: Wide field of view augmented reality eyeglasses using defocused
point light sources, ACM Transactions on Graphics (TOG), 33(4), Article No. 89.
Maimone, A., Yang, X., Dierk, N., State, A., Dou, M., and Fuchs, H. (2013). General-purpose
telepresence with head-worn optical see-through displays and projector-based lighting,
Proceedings of IEEE Virtual Reality (VR), Orlando, FL, pp. 23–26.
McCollum, H. (1945). Stereoscopic television apparatus, US Patent No. 2,388,170.
Mori, H., Sumiya, E., Mashita, T., Kiyokawa, K., and Takemura, H. (2011). A wide-view parallax-free
eye-mark recorder with a hyperboloidal half-silvered mirror and appearance-based gaze estimation, IEEE TVCG, 17(7), 900–912.
Nagahara, H., Yagi, Y., and Yachida, M. (2003). Super wide viewer using catadioptical optics,
Proceedings of ACM VRST, Osaka, Japan, pp. 169–175.
Nakazawa, A. and Nitschke, C. (2012). Point of gaze estimation through corneal surface
reflection in an active illumination environment, Proceedings of European Conference
on Computer Vision (ECCV), Florence, Italy, Vol. 2, pp. 159–172.
Narumi, T., Nishizaka, S., Kajinami, T., Tanikawa, T., and Hirose, M. (2011). Meta cookie:
An illusion-based gustatory display, Proceedings of the 14th International Conference
on Human-Computer Interaction (HCI International 2011), Orlando, FL, pp. 260–269.
Nguyen, D., Mashita, T., Kiyokawa, K., and Takemura, H. (2011). Subjective image quality
assessment of a wide-view head mounted projective display with a semi-transparent
retro-reflective screen, Proceedings of the 21st International Conference on Artificial
Reality and Telexistence (ICAT 2011), Osaka, Japan.
Omura, K., Shiwa, S., and Kishino, F. (1996). 3-D display with accommodative compensation (3DDAC) employing real-time gaze detection, SID 1996 Digest, San Diego, CA,
pp. 889–892.
Rash, C. E. and Martin, J. S. (1988). The impact of the U.S. Army's AH-64 helmet mounted
display on future aviation helmet design, USAARL Report No. 88-13. Fort Rucker, AL:
U.S. Army Aeromedical Research Laboratory.
Robinett, W. and Rolland, J. P. (1992). A computational model for the stereoscopic optics of a
head-mounted display, Presence: Teleoperators and Virtual Environments (MIT Press),
1(1), 45–62.
Rolland, J. P. and Fuchs, H. (2001). Optical versus video see-through head-mounted displays,
in Fundamentals of Wearable Computers and Augmented Reality, Barfield, W. and
Caudell, T. (eds.). Lawrence Erlbaum Associates: Mahwah, NJ.
Rolland, J. P., Wright, D. L., and Kancherla, A. R. (1996). Towards a novel augmented-reality
tool to visualize dynamic 3D anatomy, Proceedings of Medicine Meets Virtual Reality,
Vol. 5, San Diego, CA (1997). Technical Report, TR96-02, University of Central
Florida, Orlando, FL.
Rosner, M. and Belkin, M. (1989). Video display units and visual function, Survey of
Ophthalmology, 33(6), 515–522.
Schowengerdt, B. T. and Seibel, E. J. (2004). True 3D displays that allow viewers to dynamically shift accommodation, bringing objects displayed at different viewing distances
into and out of focus, CyberPsychology & Behavior, 7(6), 610–620.
Sisodia, A., Riser, A., Bayer, M., and McGuire, J. (2006). Advanced helmet mounted display
for simulator applications, SPIE Defense & Security Symposium, Helmet- and Head-Mounted Displays XI: Technologies and Applications Conference, Orlando, FL.
State, A., Keller, K. P., and Fuchs, H. (2005). Simulation-based design and rapid prototyping
of a parallax-free, orthoscopic video see-through head-mounted display, Proceedings of
IEEE/ACM ISMAR, Santa Barbara, CA, pp. 29–31.
Stratton, G. M. (1896). Some preliminary experiments on vision without inversion of the retinal image, Psychological Review, 3, 611–617.
Sutherland, I. (1965). The ultimate display, Information Processing 1965: Proceedings of IFIP
Congress, New York, NY, Vol. 2, pp. 506–508.
Sutherland, I. (1968). A head-mounted three-dimensional display, Fall Joint Computer
Conference, AFIPS Conference Proceedings, San Francisco, CA, Vol. 33, pp. 757–764.
Takada, D., Ogawa, T., Kiyokawa, K., and Takemura, H. (2010). A context-aware wearable
AR system with dynamic information detail control based on body motion, Transaction
on Human Interface Society, Japan, 12(1), 47–56 (in Japanese).
Takagi, A., Yamazaki, S., Saito, Y., and Taniguchi, N. (2000). Development of a stereo
video see-through HMD for AR systems, Proceedings of International Symposium on
Augmented Reality (ISAR) 2000, Munich, Germany, pp. 68–80.
Toyama, T., Orlosky, J., Sonntag, D., and Kiyokawa, K. (2014). Natural interface for multifocal plane head mounted displays using 3D gaze, Proceedings of the 2014 International
Working Conference on Advanced Visual Interfaces, Como, Italy, pp. 25–32.
Vaissie, L. and Rolland, J. (2000). Accuracy of rendered depth in head-mounted displays:
Choice of eyepoint locations, Proceedings of SPIE AeroSense 2000, Orlando, FL, Vol.
4021, pp. 343–353.
CONTENTS
5.1 Introduction..................................................................................................... 86
5.2 HMD/SMART Eyewear Market Segments..................................................... 87
5.3 Optical Requirements...................................................................................... 88
5.4 Optical Architectures for HMDs and Smart Glasses...................................... 91
5.5 Diffractive and Holographic Extractors..........................................................97
5.6 Notions of IPD, Eye Box, Eye Relief, and Eye Pupil.......................................99
5.7 Optical Microdisplays.................................................................................... 102
5.8 Smart Eyewear............................................................................................... 107
5.9 Examples of Current Industrial Implementations......................................... 110
5.9.1 Display-Less Connected Glasses....................................................... 110
5.9.2 Immersion Display Smart Glasses..................................................... 114
5.9.3 See-Through Smart Glasses............................................................... 114
5.9.4 Consumer Immersion VR Headsets.................................................. 115
5.9.5 Consumer AR (See-Through) Headsets............................................ 116
5.9.6 Specialized AR Headsets.................................................................. 117
5.10 Other Optical Architectures Developed in Industry..................................... 117
5.10.1 Contact Lens-Based HMD Systems.................................................. 117
5.10.2 Light Field See-Through Wearable Displays..................................... 118
5.11 Optics for Input Interfaces............................................................................. 118
5.11.1 Voice Control..................................................................................... 119
5.11.2 Input via Trackpad............................................................................. 119
5.11.3 Head and Eye Gestures Sensors........................................................ 119
5.11.4 Eye Gaze Tracking............................................................................. 119
5.11.5 Hand Gesture Sensing....................................................................... 121
5.11.6 Other Sensing Technologies.............................................................. 122
5.12 Conclusion..................................................................................................... 122
References............................................................................................................... 123
This chapter reviews the various optical technologies that have been developed to
implement head-mounted displays (HMDs), as augmented reality (AR) devices, virtual reality (VR) devices, and more recently as connected glasses, smart glasses, and
smart eyewear. We review the typical requirements and optical performances of such
devices and categorize them into distinct groups, suited for different (and constantly
evolving) market segments, and analyze such market segmentation.
5.1 INTRODUCTION
Augmented reality (AR) HMDs (based on see-through optics) have been around for a
few decades now, although they were dedicated solely to defense applications until recently
(Cakmakci and Rolland, 2006, Hua et al., 2010, Martins et al., 2004, Melzer and Moffitt,
1997, Rash, 1999, Velger, 1998, Wilson and Wright, 2007). Today, AR headsets are
applied in various markets, such as firefighting, police work, engineering, logistics, medicine, and surgery, with an emphasis on sensors, specialized digital imaging, and strong
connectivity. Consumer applications are also emerging rapidly, focused on connectivity
and digital imaging capabilities in an attractive and minimalistic package. Such segmentation has been made possible by recent technological leaps in the smartphone industry
(connectivity, on-board CPU power with miniaturization of ICs, development of complex
sensors, novel microdisplays, novel digital imaging techniques, and battery technology).
Virtual reality (VR) HMDs (also called occlusion or immersive displays) have
also been around for decades, targeted at market segments such as flight simulators
and battle training for defense applications. Successive attempts to mass-distribute
VR HMDs, as well as AR HMDs, to the consumer market have partially failed
during the last two decades, mainly because of the lack of adapted displays and
microdisplays and of on-board sensors, and the resulting problems with high
latency (see a few early examples of VR offerings from the 1990s in Figure 5.1).
FIGURE 5.2 Some of the current AR HMD and smart glasses products.
TABLE 5.1
Requirements for the Various HMD Market Segments
[Table layout lost in extraction. The table rates four market segments (smart glasses, VR headsets, industrial HMDs, and defense HMDs) on the criticality of industrial design, power consumption, cost, weight/size (with a "forgettable" emphasis for smart glasses), eye box (dial-in adjustment, or helmet-mounted for defense), Rx glasses integration, full-color operation (mono-/multicolor), FOV (>90° for VR, >30° otherwise), system contrast (>500:1), environmental stability, see-through quality, and monocular/binocular (2D/3D) configuration.]
Note: + means critical; ++++ means most critical; − means not critical; −− means least critical.
FIGURE 5.3 Display FOVs (both occlusion and see-through) developed by industry (e.g., Google Glass 15°, Vuzix M-100 16°, Recon Jet 16°).
In order to keep the resolution at or below the angular resolution of the human
eye, scaling to a large FOV is today a real challenge for immersive VR headsets, which
require a very large FOV and thus also a very dense pixel count. Several major display
companies have been developing 4 K displays over a regular cell phone display
area, which should be able to address a high FOV with decent angular resolution for
VR systems up to a 100° diagonal FOV.
For smart glasses, with FOVs of 15°–20°, nHD (640 × 360 pixels, one ninth of full
HD) or at best 720p resolution is usually sufficient. The FOV and the resulting
resolution for various HMDs available today are listed in Table 5.2. Dots per degree
(DPD) replaces the traditional resolution criterion of dots per inch (DPI) used for conventional displays. Indeed, a high DPI can still result in a low DPD when the FOV is large.
An angular resolution of 50 DPD corresponds roughly to 1.2 arc min, which is the
resolution of the human eye (for 20/20 vision). Figure 5.4a shows the angular resolution of some HMDs available in industry as a function of FOV. As one might expect,
the angular resolution tends to decrease as the FOV increases, even when the
display resolution increases (see also Section 5.4).
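The DPD bookkeeping above is easy to check numerically. The sketch below uses only the 60-arc-minutes-per-degree conversion; the helper names are ours, for illustration:

```python
# Convert between dots per degree (DPD) and angular pixel pitch in arc minutes.

def arcmin_per_pixel(dpd: float) -> float:
    """Angular size of one pixel, in arc minutes, for a given pixel density."""
    return 60.0 / dpd  # 1 degree = 60 arc minutes

def dpd_for_acuity(arcmin: float = 1.0) -> float:
    """Pixel density (DPD) needed so one pixel subtends `arcmin` arc minutes."""
    return 60.0 / arcmin

print(arcmin_per_pixel(50))   # 50 DPD corresponds to 1.2 arc min per pixel
print(dpd_for_acuity(1.2))    # and conversely, 1.2 arc min requires 50 DPD
```

This reproduces the figure quoted in the text: 50 DPD is just at the 1.2 arc min acuity of a 20/20 eye.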
The pixel counts needed to achieve 1.2 arc min angular resolution at increasing
(diagonally measured) FOVs can be quite large when attempting to implement 20/20
vision in VR headsets with FOVs over 100°. Today, the densest pixel count display
is a 2 K display (QHD at 2560 × 1440 on the Galaxy Note 5), which would allow such
resolution over a FOV of only 60°. Next year, 4 K displays (3840 × 2160 by Samsung)
will be available, pushing the resolution up to nearly 100°, which is the minimum for
VR but already quite large for AR applications. Figure 5.4b shows how the pixel count
of a 16:9 aspect ratio display scales with FOV.
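The pixel-budget argument can be reproduced with a short script. It assumes a 16:9 panel, a simple proportional mapping of pixels to degrees (real optics distort this), and the 50 pixels/degree target, so treat the numbers as order-of-magnitude estimates rather than the chapter's own data:

```python
import math

def pixels_for_fov(diag_fov_deg: float, px_per_deg: float = 50.0,
                   aspect: tuple = (16, 9)) -> tuple:
    """Horizontal/vertical pixel counts (and Mpix) needed to hold `px_per_deg`
    over a given diagonal FOV, assuming angle scales like screen length."""
    ax, ay = aspect
    diag = math.hypot(ax, ay)
    h_fov = diag_fov_deg * ax / diag   # horizontal share of the diagonal FOV
    v_fov = diag_fov_deg * ay / diag   # vertical share
    w = round(h_fov * px_per_deg)
    h = round(v_fov * px_per_deg)
    return w, h, w * h / 1e6

for fov in (20, 60, 100):
    w, h, mpix = pixels_for_fov(fov)
    print(f"{fov:3d} deg diagonal: {w} x {h} px ({mpix:.1f} Mpix)")
```

For a 60° diagonal FOV this returns roughly 2600 × 1470 pixels, consistent with the claim above that a QHD (2560 × 1440) panel just reaches 20/20 resolution at 60°, while 100° lands slightly beyond a 4 K panel.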
It is also interesting to organize the various existing HMDs on a graph showing
the FOV as a function of the target functionality (smart glasses, AR, or VR); see
Figure 5.5.
As one might expect, lower FOVs are favored by smart glass and smart eyewear
applications. As the FOV increases, professional AR applications tend to be
favored, and at maximal FOVs, occlusion VR gaming devices are the preferred
application.
TABLE 5.2
FOV and Resulting Angular Resolution for Various Devices Available Today

Device            FOV (°)   Resolution    Aspect Ratio   Angular Resolution (pixels/°)
Google Glass      15        640 × 360     16:9           48
Vuzix M100        16        400 × 240     16:9           28
Epson Moverio     23        960 × 540     16:9           48
Oculus Rift       115       800 × 640     1.25:1         9
Zeiss Cinemizer   35        870 × 500     1.74:1         28
Sony HMZ T2       51        1280 × 720    16:9           46
Optinvent ORA     24        640 × 480     4:3            33
Lumus DK40        25        640 × 480     4:3            32
FIGURE 5.4 (a) Angular resolution as a function of FOV for various existing HMDs; (b) pixel counts as a function of FOV.
FIGURE 5.5 FOV of existing HMDs, from connected glasses and smart glasses through AR devices to VR headsets, as a function of the target functionality.
FIGURE 5.6 (a) Pupil-forming and non-pupil-forming optical architectures for HMDs; (b) occlusion display magnifiers (VR); (c) see-through free-space combiner optics; (Continued)
FIGURE 5.6 (Continued) (d) see-through lightguide combiner optics; (e) see-through TIR freeform combiner optics; (Continued)
FIGURE 5.6 (Continued) (f) see-through single mirror combiner optic; and (g) see-through cascaded extractor optics.
an aerial image of the microdisplay formed by a relay lens. This aerial image becomes
the object to be magnified by the eyepiece lens, as in a non-pupil-forming architecture.
Although the non-pupil-forming optical architecture seems to be the simplest,
and thus the best, candidate for implementing small and compact HMDs, the pupil-forming
architecture has a few advantages, such as the following:
• For a large FOV, the microdisplay does not need to be located close to the
combiner lens (thus providing free space around the temple side).
• As the object is an aerial image (and thus directly accessible, not located under a
cover plate as in the microdisplay), a diffuser or other element can be placed in
that plane to yield an adequate diffusion cone in order to expand, for example,
the eye box of the system. Other exit pupil expanders (EPEs) can also be used
in that pupil plane (microlens arrays [MLAs], diffractive elements, etc.).
• The optical path can be tilted at the aerial image plane, thus allowing the
optics to wrap around the head instead of following the straight optical path of
the non-pupil-forming architecture. The aerial image can be bounced off at grazing incidence through
a mirror or a prism.
Most of the consumer offerings today (see Figure 5.2) use the non-pupil-forming
architecture. Most defense HMDs use the pupil-forming architecture.
The optical platforms used to implement the optical combining function in smart
glasses, smart eyewear, AR, and VR devices are quite diverse. They can be grouped
roughly into six categories:
1. Immersion display magnifiers (Figure 5.6b): These are magnifiers placed
directly on top of the display for maximum FOV (as in VR devices) or
further away in a folded path, as in smaller-FOV smart glasses. They
may be implemented as conventional lenses or as more compact segmented or
Fresnel optics, on flat or curved substrates, over single or multiple surfaces.
2. See-through free-space combiner optics (Figure 5.6c): Such optics are usually partially reflective (through either thin metal or dichroic coatings), as
thin elements or immersed in a thicker refractive optical element, and operate in off-axis mode, making them more complex surfaces than the standard
on-axis surfaces in (1). Such surfaces can also be freeform to implement a
large FOV. They might be reflective, segmented (Fresnel-type), or reflective
diffractive/holographic (Kress et al., 2009) in order to reduce the curvature
and thus their protrusion.
3. See-through lightguide combiner optics (Figure 5.6d): Very often these
architectures are not really lightguides, since any light reflecting (through
TIR) from the surfaces might produce ghost images (or reduce the contrast)
rather than contribute to the desired image. However, the light field is
constantly kept inside plastic or glass, keeping it from being affected by
hair, scatter from dust, etc. For perfect see-through, reflective optics might
be used (right side of Figure 5.6d).
4. See-through freeform TIR combiner optics (Figure 5.6e): This is a classical
design used not only in see-through combiners but also in occlusion HMDs
(Talha et al., 2008). Typically, this is a three-surface freeform optical element: the first surface transmissive, the second surface TIR, and the third surface partially reflective. It is very desirable in occlusion displays since it allows
the relocation of the display to the top or the side and can allow a larger
FOV. In see-through mode, a compensating element has to be cemented on
the partially reflective coating. Multiple TIR bounces (>3) have also been
investigated with this architecture.
5. See-through single mirror TIR combiner optic (Figure 5.6f): This is a true TIR
guide that uses either a partially reflective, flat, or curved mirror as a single
extractor, or a leaky diffractive or holographic extractor. When the guide gets
thin, the eye box tends to be reduced. The combiner element (flat or curved)
as seen by the eye should have the widest extent possible, in order to produce
the largest eye box. This is why the combiner mirror (or half-tinted mirror)
should be oriented inside the lightguide in such a way that the user sees the
largest possible combiner area, producing therefore the largest possible eye box, without compromising image resolution, distortion, or efficiency.
6. See-through cascaded waveguide extractor optics (Figure 5.6g): In order to
expand the eye box over the previous architectures (especially #5), cascaded
extractors (Thomson CSF, 1991) have been investigated, ranging from dichroic
mirrors to partially reflective prism arrays and variable-efficiency reflective and
transmission holographic extractors (Kress and Meyrueis, 2009).
Most of the HMDs we review in this chapter are monocular designs, although there
has been extensive research and development of stereoscopic displays for the consumer market. The issues related to potential eye strain are more complex when
dealing with bi-ocular or binocular displays (Peli, 1998).
FIGURE 5.7 Diffractive and holographic optics implementing various optical functionalities (beam splitters, engineered diffusers, DOE/aspheric lenses, microlens arrays [MLAs], CGHs, grating/beam redirection for custom pattern projection, and beam shaping/homogenizing).
FIGURE 5.8 Angular and spectral bandwidths of reflection and transmission holograms.
FIGURE 5.9 Examples of holographic and diffractive combiners: (a) free-space Digilens and Composyt Labs smart glasses using volume reflection holograms; (b) flat Nokia/Vuzix/Microsoft and flat Microsoft HoloLens digital diffractive combiners with 2D exit pupil expanders; (c) Konica Minolta full-color holographic vertical lightguide using a single RGB reflective holographic extractor, and Sony monocolor waveguide smart glasses using a 1D reflective holographic incoupler and an exit pupil expander outcoupler.
smart glasses (see Figure 5.9). A free-space operation is depicted in Figure 5.6b and a
waveguide operation in Figure 5.6g. Figure 5.8 shows typical angular and
spectral bandwidths, derived from Kogelnik's coupled-wave theory, for reflection and transmission volume holograms operating in either free-space or TIR waveguide modes. The
FOV of the display is thus usually limited by the angular spectrum of the hologram, modulated by its spectral bandwidth. Transmission holograms have wider spectral bandwidths, but
also require a higher index modulation, especially when tri-color operation is required.
In order to reduce spectral spread (when using LED illumination) and increase
angular bandwidth (in order to push through the entire FOV without a uniformity
hit), it is necessary to use reflection-type holograms (large angular bandwidth and
smaller spectral bandwidth).
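The trade-off can be illustrated with first-order selectivity rules of thumb derived from Kogelnik's coupled-wave theory. The exact expressions depend on geometry and index modulation, and the grating parameters used below are illustrative, not values from this chapter:

```python
import math

def selectivity_estimates(period_um: float, thickness_um: float,
                          bragg_deg: float, wavelength_nm: float = 550.0):
    """Order-of-magnitude Kogelnik-style selectivity estimates for a volume
    grating of fringe period L and thickness d at Bragg angle theta_B.
    Rules of thumb: a transmission grating detunes in angle as ~L/d (rad)
    and in wavelength as dl/l ~ (L/d)*cot(theta_B); a reflection grating
    has dl/l ~ L/d (its angular bandwidth is much broader)."""
    ratio = period_um / thickness_um
    theta = math.radians(bragg_deg)
    dtheta_deg = math.degrees(ratio)                       # angular FWHM, transmission
    dlam_trans = wavelength_nm * ratio / math.tan(theta)   # spectral FWHM, transmission
    dlam_refl = wavelength_nm * ratio                      # spectral FWHM, reflection
    return dtheta_deg, dlam_trans, dlam_refl

# Illustrative grating: 0.5 um period, 10 um thick, 30 deg Bragg angle
dth, dl_t, dl_r = selectivity_estimates(0.5, 10.0, 30.0)
print(f"transmission: ~{dth:.1f} deg angular, ~{dl_t:.0f} nm spectral")
print(f"reflection:   ~{dl_r:.0f} nm spectral")
```

With these numbers the reflection hologram comes out spectrally narrower than the transmission one, matching the qualitative point above: reflection-type holograms trade spectral bandwidth for angular bandwidth.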
5.6 NOTIONS OF IPD, EYE BOX, EYE RELIEF, AND EYE PUPIL
Although the eye box is one of the most important criteria in an HMD, allowing easy
viewing of the entire FOV by users having different interpupillary distances (IPDs) or
temple-to-eye distances, it is also the criterion with the loosest definition. The IPD is
an important parameter that has to be addressed for consumer smart glasses, in order to
cover the 95th percentile of the potential market (see Figure 5.10). Usually, a combination
of optical and mechanical adjustments can lead to large coverage of the IPD range (a large
exit eye pupil, or eye box). A static system may not address a large enough population.
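The percentile argument can be made concrete with a simple normal-distribution model of adult IPD. The mean and standard deviation below (about 63 mm and 3.6 mm) are commonly quoted anthropometric values, not figures from this chapter, so treat the result as a sketch:

```python
from statistics import NormalDist

# Illustrative adult IPD model (mean/sigma are assumptions, not chapter data).
ipd = NormalDist(mu=63.0, sigma=3.6)  # millimetres

# Central 95% of the modelled population:
lo, hi = ipd.inv_cdf(0.025), ipd.inv_cdf(0.975)
print(f"95% of modelled adults fall in roughly {lo:.0f}-{hi:.0f} mm")

# Fraction covered by a hypothetical static design that only fits 58-68 mm:
covered = ipd.cdf(68.0) - ipd.cdf(58.0)
print(f"A static 58-68 mm design covers about {covered:.0%} of this model")
```

Under this model a fixed 10 mm mechanical range leaves a noticeable fraction of users uncovered, which is why the combination of optical and mechanical adjustment mentioned above matters.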
The eye box usually refers to the metric distance over which the user's eye
pupil can move in both directions, at the eye relief (or vertex) distance, without losing
the edges of the image (display). However, "losing the display" is quite subjective and
involves a combination of resolution, distortion, and illumination uniformity considerations, making the eye box a complex parameter. For obvious aesthetics and wearability reasons,
it is desirable to have the thinnest combiner and, at the same time, the largest eye box
FIGURE 5.10 Interpupillary distance statistics, from child low (~41 mm) and child high (~55 mm) to adult values of roughly 53-70 mm.
FIGURE 5.11 Optical combiner thickness as a function of the eye box size for various optical HMD architectures (free-space curved combiners, lightguides with collimators, waveguides without EPE, and waveguides with EPE), showing the aesthetic constraint on thickness and the eye box required for full IPD coverage.
or exit pupil (an eye box of 10 mm horizontally and 8 mm vertically is often used as
a standard requirement for today's smart glasses). Designing a thin optical combiner
producing a large eye box is usually not easy: when using conventional free-space
optics, the eye box scales with the thickness of the combiner (see, e.g., Figure 5.11), as in
most of the architectures presented in the previous section, except for architecture #6
(Figure 5.6g), which is based on waveguide optics using cascaded planar extractors.
For holographic combiner and extraction (both free space, Figure 5.6c, and
waveguide, Figure 5.6g), various EPE techniques have been investigated to expand
the eye box in both directions (Levola, 2006; Urey and Powell, 2005). EPEs are often
based on cascaded extractors (conventional or holographic optics) and usually act
in only one direction (the horizontal direction). See, for example, Figure 5.9, upper
right (the Nokia/Vuzix AR 6000 AR HMD and Microsoft HoloLens), using
a diffractive waveguide combiner with both X and Y diffractive waveguide EPEs.
However, such complex diffractive structures require subwavelength tilted structures
that are difficult to replicate in mass production by embossing or injection molding. EPEs can also
be implemented in free-space architectures by the use of diffusers or MLAs.
The eye box is also a function of the size of the eye pupil (see Figure 5.12).
Typically, a smaller eye pupil (in bright environments) will produce a smaller effective eye box, and a larger eye pupil (in darker environments) will produce a larger
eye box. A standard pupil diameter used in industry is 4 mm, but it can vary
anywhere from 1 to 7 mm depending on the ambient brightness.
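A toy overlap model reproduces the trend in Figure 5.12; the additive form and the function name are illustrative assumptions, not the book's model:

```python
def effective_eyebox(exit_pupil_mm, eye_pupil_mm):
    """Toy model of the trend in Figure 5.12: the image remains visible
    as long as the eye pupil still overlaps the optical exit pupil, so
    the range of usable pupil-center positions (the effective eye box)
    grows with the eye pupil diameter."""
    return exit_pupil_mm + eye_pupil_mm

# bright sunlight (~2 mm pupil) vs. low light (~7 mm pupil), 10 mm exit pupil
print(effective_eyebox(10, 2), effective_eyebox(10, 7))
```

Under this model the same combiner gives a noticeably larger effective eye box indoors than in bright sunlight, matching the indoor/outdoor behavior described later in this section.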
The eye box is modeled and measured at the eye relief, the distance from the
cornea to the first optical surface of the combiner. If the combiner is integrated
within Rx lenses (such as in smart eyewear), the notion of eye relief is then replaced
by the notion of vertex distance, the distance from the cornea to the apex of the lens
on the eye side surface. If the combiner is worn with extra lenses (such as in smart
glasses), the eye relief remains the distance from the cornea to the exit surface of
FIGURE 5.12 Eye box size (exit pupil) as a function of the eye pupil diameter (the human pupil reaches about 7 mm in low-light conditions).
FIGURE 5.13 Eye box versus eye relief (or vertex distance) for pupil-forming and non-pupil-forming HMD architectures: the eye box scales with eye relief, and the design space is bounded by the aesthetic constraint and by full IPD coverage.
the combiner, not to the Rx lens. In virtually all HMD configurations, the eye box
reduces when the eye relief increases (see Figure 5.13).
However, in pupil-forming architectures (refer also to Figure 5.5), the eye box may
actually increase until a certain distance (short distance, usually smaller than the
nominal eye relief), and then get smaller. For non-pupil-forming architectures, the eye
box reduces as soon as one gets away from the last optical element in the combiner.
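For the non-pupil-forming case, the shrink with eye relief can be sketched with simple geometry; the function name and the example numbers are assumptions for illustration:

```python
import math

def eyebox_at_relief(exit_pupil_mm, relief_mm, half_fov_deg):
    """Geometric sketch for a NON-pupil-forming combiner: every field
    angle up to the half FOV must still reach the eye pupil, so the
    unvignetted eye box shrinks linearly as the eye moves away from
    the last optical surface."""
    shrink = 2.0 * relief_mm * math.tan(math.radians(half_fov_deg))
    return max(0.0, exit_pupil_mm - shrink)

# 20 mm optical aperture, 18 mm eye relief, 2 x 15 deg FOV
print(f"{eyebox_at_relief(20.0, 18.0, 15.0):.2f} mm")
```

This is why increasing the eye relief (for example, to clear prescription glasses) directly eats into the eye box in non-pupil-forming architectures.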
FIGURE 5.14 The eye box and FOV share a common space.
The eye box also shares a common space with the FOV (see Figure 5.14). If the HMD can be
switched between various FOVs (by altering the microdisplay, using a MEMS
projector as a display, or changing the position of the microdisplay and the focal
length of the magnifier), the eye box may vary from a comfortable one (small
FOV) to an unacceptable one that blocks the edges of the image (large FOV).
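This trade can be expressed as a first-order etendue-style invariant; the conservation assumption and the names below are illustrative, not the chapter's formula:

```python
import math

def eyebox_after_fov_change(eyebox_mm, half_fov_deg_old, half_fov_deg_new):
    """First-order sketch of the Figure 5.14 trade-off: for a fixed
    optical train, the product (eye box x sin(half FOV)) is roughly
    conserved, so enlarging the FOV shrinks the available eye box."""
    invariant = eyebox_mm * math.sin(math.radians(half_fov_deg_old))
    return invariant / math.sin(math.radians(half_fov_deg_new))

# 12 mm eye box at 2 x 15 deg FOV -> eye box at 2 x 25 deg FOV
print(f"{eyebox_after_fov_change(12, 15, 25):.2f} mm")
```

Widening the FOV from 30 to 50 degrees in this sketch costs roughly 40% of the eye box, which is the qualitative behavior described above for switchable-FOV systems.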
Finally, the effective eye box of smart glasses can be much larger than the real
optical eye box when various mechanical adjustments of the combiner can be used to
match the exit pupil of the combiner to the entrance pupil of the user's eye. However,
for any position of the combiner, the eye box has to allow the entire FOV to be seen
unaltered at the target eye relief. For a specific position of the combiner, the entire display might be seen indoors (large pupil), while the edges of the display become blurry outdoors because the eye pupil diameter decreases.
FIGURE 5.17 (a) Phase LCoS microdisplays (HoloEye PLUTO panel) as dynamic computer-generated holograms for far-field or near-field display; (b) bidirectional OLED panel used in an HMD (Fraunhofer Institute).
FIGURE 5.18 Curved OLED panels can relax the design constraints for HMD combiner optics.
FIGURE 5.19 MEMS micromirror laser scanner (PicoP display engine) projecting an image through combiner optics, for Pico projectors and HMDs.
MEMS micromirror laser scanners are desirable image generators for HMDs
(since the laser beams are already collimated; see Figure 5.19), but they cannot produce an
acceptable eye box without forming an intermediate aerial image plane to build up the
eye box (optional diffusion at this image plane may also create parasitic speckle when
used with laser illumination). Most scanners use laser light, which can produce speckle
(which then has to be removed by an additional despeckler device). However, if
there is no diffuser in the optical path, speckle should not appear to the eye.
An alternative to the micromirror MEMS scanner is the vibrating piezo fiber scanner
(such as in Magic Leap's AR HMD). Such a fiber scanner can be used either as an
image-producing device integrated in an HMD system (see Figure 5.20) or as
a digital image sensor (in reverse mode). These devices are very small, and
the laser or LED source can be located away from the fiber end tip (unlike in a MEMS
scanner, which operates in free space), making them ideal candidates for HMD pattern generators.
Furthermore, as the fiber can also be used in reverse mode, eye gaze tracking can be integrated in a bidirectional scheme, as in the bidirectional OLED device in Figure 5.17.
One of the advantages of both MEMS and fiber scanners is that the effective FOV
can be rescaled and/or relocated in real time without losing efficiency (provided the
FIGURE 5.20 Vibrating piezo fiber tip (scanning fiber display tube, with head strap, brightness control, and video camera with IR LEDs) producing an image and integrated in an HMD via a free-space optical combiner.
TABLE 5.3
Microdisplays and Image Generators Used Today in HMD/Smart Glass Devices

Device                         Type         Display             Resolution                        Aspect Ratio
Google Glass                   See-through  LCOS                640 x 360                         16:9
Vuzix M100                     Opaque       LCD                 400 x 240                         16:9
Epson Moverio                  See-through  LCOS                960 x 540                         16:9
Oculus Rift DK1                Opaque       LCD                 1280 x 800                        1.25:1
Oculus Rift DK2                Opaque       OLED                1920 x 1080                       16:9
Silicon Micro Display ST1080   Opaque       LCOS                1920 x 1080                       16:9
Zeiss Cinemizer                Opaque       OLED                870 x 500                         1.74:1
Sony HMZ T3                    Opaque       OLED                1280 x 720                        16:9
Sony Morpheus                  Opaque       OLED                1920 x 1080                       16:9
Optinvent ORA                  See-through  LCD                 640 x 480                         4:3
Lumus DK40                     See-through  LCOS                640 x 480                         4:3
Composyt Labs                  See-through  Laser MEMS scanner  Res. and FOV can vary dynamically Can vary
Vuzix/Nokia M2000AR            See-through  Laser MEMS scanner  Res. and FOV can vary dynamically Can vary
combiner optic can still do a decent imaging job at such varying angles). This is possible to a certain degree in emissive panels, but not in LCD or LCoS panels in which
the backlight would always illuminate the entire display.
Table 5.3 summarizes the various image sources used for some of the current
HMD, VR, and smart glass offerings. All of the main image generation technologies are in use
today (LCD, LCoS, OLED, MEMS).
FIGURE 5.21 Prescription compensation in combiner optics for smart eyewear to address a large potential consumer market: example Rx chart (sphere, cylinder, axis, and add values for each eye) alongside uncorrected nearsighted (myopia) and farsighted (hyperopia) vision.
FIGURE 5.23 Positive and negative lens forms of equal power (meniscus, biconvex/biconcave, and planoconvex/planoconcave): the only acceptable lens shape for smart eyewear is a meniscus.
FIGURE 5.24 Available prescription lens implementations in smart glasses: (a) Google Glass and (b) Lumus Ltd.
FIGURE 5.25 (a) Camera headsets available on the market (Geco eyewear, Life Logger, and a Bluetooth WiFi camera headset); (Continued)
FIGURE 5.25 (Continued) (b) occlusion smart glasses available on the market (MicroOptical Corp., MyVu Corp., and the Vuzix Corp. STAR 1200 Augmented Reality System); (c) see-through smart glasses available on the market (Google Glass and various copycats, plus the Laster SARL product); (d) pseudo see-through tapered combiner device from Olympus; (Continued)
FIGURE 5.25 (Continued) (e) some of the main VR headsets available today on the market (Oculus, Sony, and the Silicon Micro Display ST1080); (f) Oculus latency sensor (left) and optical distortion compensation through software in VR (right): barrel distortion applied in-engine cancels the pincushion distortion from the Rift lenses, yielding an undistorted final observed image; (g) foveated rendering in high-pixel-count VR systems, linked to gaze tracking and to the falloff of retinal receptor (rod and cone) density away from the fovea; (Continued)
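The software distortion compensation illustrated in Figure 5.25f is commonly done with a radial polynomial model. The sketch below is a generic radial pre-distortion, with illustrative coefficients (not Rift values); the function name is ours:

```python
def predistort(x, y, k1=-0.15, k2=0.0):
    """Radial (barrel) pre-distortion applied in-engine so that the
    pincushion distortion of the VR magnifier lens cancels it.
    (x, y) are normalized image coordinates centered on the lens axis;
    k1, k2 are illustrative placeholder coefficients."""
    r2 = x * x + y * y
    scale = 1.0 + k1 * r2 + k2 * r2 * r2  # polynomial radial gain
    return x * scale, y * scale
```

A negative k1 pulls the corners of the rendered frame inward (barrel), so the lens's outward (pincushion) stretch restores a rectilinear image at the eye.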
FIGURE 5.25 (Continued) (h) vergence-accommodation conflict in VR: through the VR magnifier lenses, the focus distance (screen location) differs from the vergence distance of the virtual image, unlike for a real object; (i) IPD (interpupillary distance), IOD (interocular distance), and ISD (interscreen distance) in a stereoscopic headset; (j) microprism combiner (Optinvent, France); (Continued)
FIGURE 5.25 (Continued) (k) specialized AR headsets for law enforcement, firefighting, and engineering; and (l) specialized AR headsets for medical and surgical environments.
and other designs such as Laster, in which the combiner is located vertically (see
Figure 5.25c). An interesting alternative optical architecture has been developed by
Olympus and others (Vuzix Pupil, Sony Smart Glasses, Telepathy One); see Figure
5.25d. The combiner is here an occlusion tapered combiner using a 45° mirror, but
the end tip of the combiner is smaller than the usual 4 mm diameter of the eye pupil,
making it pseudo transparent to the far field (much as when one holds a knife
edge close to the eye and can see through the edge).
Such see-through smart glasses or smart eyewear are, however, not AR systems:
their limited FOV (limited mainly by the size of the optics) and the angular offset of
that FOV make them best suited for contextual display applications rather than for
true AR applications. AR headset systems require a larger FOV centered on the user's
line of sight (see also Section 5.9.5).
IOD (interocular distance), see Figure 5.25i. Adjusting the screens and/or the lenses
or using dynamic lenses can provide a potential solution.
Another way to reduce such disparity in VR and AR systems (and therefore
the associated sickness) is to use non-stereoscopic 3D display techniques such as
dynamic holographic displays (using Fresnel diffractive patterns that produce virtual images at specific distances). Light field displays, such as the ones developed
by NVIDIA (Douglas Lanman) or Magic Leap Inc., are another elegant way to
produce a true 3D representation accommodating both eye vergence and focus.
However, light field display techniques are still in their infancy and require complex rendering computation power as well as higher display resolutions (similar
constraints to those of light field imaging, such as in the Lytro camera system).
5.9.6 Specialized AR Headsets
A few industrial headsets have been developed for specialized markets such as law
enforcement, firefighting, and engineering (Hua and Javidi, 2014; Wilson, 2005),
on the basis of an opaque combiner located in the lower part of the visual field. Such
devices are not consumer devices and have a price tag accessible only to professional markets (>$5K). Figure 5.25k shows some of these devices (Motorola HC1,
Kopin-Verizon Golden Eye, and Canon).
They might also include specialized gear such as a FLIR camera and special communication channels. Interestingly, these devices are usually built around the architecture described in Figure 5.6e (non-see-through freeform TIR combiner).
Other specialized markets include medical applications, for patient record viewing or assistance in surgery (monitoring the patient's vital signs during surgery,
recording surgery for teaching requirements, etc.). Figure 5.25l shows some implementations based on an Epson Moverio see-through AR headset (also shown in
Figure 5.6f) (left), a Motorola HC1 (center), and Google Glass (right).
Medical applications have been a perfect niche application for AR and smart
glasses, long before the current boom in consumer smart glass and VR headsets, and
will remain a relatively small but steady market for such devices in the future.
FIGURE 5.26 Dual smart glass/contact lens see-through HMD from Innovega (U.S.A.): a contact lens with an optical filter that conditions the display light, worn together with eyeglasses integrating a standard OLED or LCD display panel, a video/audio IC, and an outer filter.
the microdisplay is located on the temple, with a relay lens forming an image on a
reflective holographic combiner located at the inner surface of the glass lens, providing
a true see-through experience. In another implementation, the microdisplay is located
directly on the inner surface of the glass lens, making it a non-see-through HMD.
Because the collimation lens is located close to the cornea and actually moves with it,
this approach provides an excellent eye box and a large FOV.
FIGURE 5.27 Near-eye light field occlusion HMD from NVIDIA: a bare microdisplay and the displayed image (close-up photo). (From Dr. Douglas Lanman at NVIDIA, Santa Clara, CA.)
FIGURE 5.28 The four Purkinje images (P1, P2, P3, and P4) used for gaze tracking (the glint is the first Purkinje image, P1, reflected off the outer corneal surface).
The first reflection (the glint) is usually the most used one (see also Figure 5.29).
Single or multiple IR flood sources may be used to increase the resolution of the
tracking (e.g., 4 IR sources located symmetrically to produce good vertical and horizontal gaze tracking).
In many combiner architectures, it is desirable to use the same optical path for the
display as for the glint imaging. However, as the display uses an infinite conjugate
(image at infinity) and the gaze tracking uses finite conjugates, it is necessary either to use
an additional objective lens on the IR CMOS sensor or to position the source outside
the display imaging train (see Figure 5.30). A cold (or hot) mirror allows the
splitting between the finite and infinite conjugates.
More complex optical architectures have been developed to implement the finite
conjugate IR imaging task within the infinite conjugate imaging train depicted in
Figure 5.30. One of them is based on the optical architecture described in Figure 5.6e
(freeform TIR combiner with compensation optic). The freeform surfaces in the
combiner optic, combined with an extra objective lens on the IR CMOS sensor, allow for
a very compact gaze tracking architecture including both IR flood illumination and
IR imaging (see Figure 5.31).
FIGURE 5.29 The first Purkinje reflections (glints) used in eye gaze tracking.
FIGURE 5.30 Finite (gaze tracking) and infinite (collimation of the display) conjugates in HMDs: the microdisplay and the IR source both work at an infinite conjugate through the combiner lens, while the IR eye gaze sensor operates at a finite conjugate.
FIGURE 5.31 Eye gaze tracking in a see-through combiner based on freeform TIR surfaces: a freeform prism with a freeform corrector (complement piece for see-through operation) relays the microdisplay to the exit pupil, while an NIR LED and NIR sensor share the same path.
This device has been developed by Canon (Japan). It is, however, relatively thick
and therefore not well adapted to consumer smart eyewear, but rather to the professional AR market.
FIGURE 5.32 Some of the current hand gesture sensor developers for gaming, TV, computer, and HMDs.
(a)
(b)
(c)
FIGURE 5.33 Gesture sensors integrated as add-ons to HMDs. (a) Oculus DK2 and Leap
Motion sensor, (b) Oculus Rift and Soft Kinetic, and (c) Soft Kinetic and Meta Glasses.
5.12 CONCLUSION
We have reviewed in this chapter the main optical architectures used today to implement smart glasses, smart eyewear, see-through AR HMDs, and occlusion VR HMDs.
These various optical architectures impose different size, weight, and functionality
constraints on devices aimed at new emerging markets, stemming from the
original defense and VR markets as we have known them for decades. Adapted sensing such as gaze tracking and hand/eye gesture sensing is being implemented in
various HMDs, especially VR headsets, building on the earlier efforts produced
for conventional gaming devices. For AR devices, the wide-FOV thin see-through
optical combiner remains a challenge, as does the use of optical occlusion pixels;
for VR occlusion devices, the challenge resides in reducing the thickness, size, and
weight of the magnifier lens while increasing its performance over a very large FOV.
VR sickness in many users also needs to be addressed for both VR and large FOV
AR systems: although display resolution (pixel counts vs. FOV), low sensor/display
latency, high GPU speed, etc. have all been dramatically improved since the first VR
efforts in the 1990s, many VR sickness issues remain to be addressed today, such as
eye convergence/focus accommodation disparity and motion sickness (internal ear
vs. perceived motion disparity).
For consumer products such as smart glasses and smart eyewear, the integration
of the optical combiner into high wrap sunglass-type lenses or true Rx meniscus
lenses is one of the most desired features for next-generation smart eyewear and is
also one of the most difficult optical challenges.
REFERENCES
Cakmakci, O. and J. Rolland, Head-worn displays: A review, Journal of Display Technology, 2(3), 199–216, 2006.
Curatu, C., H. Hua, and J. Rolland, Projection based head mounted display with eye tracking capabilities, Proceedings of SPIE, 5875, 58750J-1, 2005.
Hua, H., D. Cheng, Y. Wang, and S. Liu, Near-eye displays: State-of-the-art and emerging technologies, Proceedings of SPIE, 7690, 769009, 2010.
Hua, H. and B. Javidi, A 3D integral imaging optical see-through head-mounted display, Optics Express, 22(11), 13484–13491, 2014.
Kress, B. and P. Meyrueis, Applied Digital Optics, from Micro-Optics to Nano-Photonics, John Wiley, Chichester, UK, 2009.
Kress, B., V. Raulot, and P. Meyrueis, Digital combiner achieves low cost and high reliability for head-up display applications, SPIE Newsroom Illumination and Displays, Bellingham, WA, 2009, http://spie.org/x35062.xml.
Levola, T., Diffractive optics for virtual displays, Journal of the Society for Information Display, 14(5), 467–475, 2006.
Martins, R. et al., Projection based head mounted displays for wearable computers, Proceedings of SPIE, 5442, 104–110, 2004.
Melzer, J. and K. Moffitt, Head Mounted Displays: Designing for the User, McGraw-Hill, 1997, reprinted in 2011.
Peli, E., The visual effects of head-mounted display are not distinguishable from those of desk-top computer display, Vision Research, 38, 2053–2066, 1998.
Rash, C.E., Head mounted display: Design issues for rotary-wing aircraft, United States Army Aeromedical Research Laboratory, US Government Printing Office, Washington, DC, 1999.
Takahashi, H. and S. Hiroka, Stereoscopic see-through retinal projection head-mounted display, Proceedings of SPIE-IS&T Electronic Imaging, 6803, 68031N, 2008.
Talha, M.M., Y. Wang, D. Cheng, and J. Chang, Design of a compact wide field of view HMD optics system using freeform surfaces, Proceedings of SPIE, 6624, 662416-1, 2008.
Thomson CSF, US Patent 5,076,664, December 31, 1991 (Thomson CSF, France).
Urey, H. and K.D. Powell, Microlens array based exit pupil expander for full color displays, Applied Optics, 44(23), 4930–4936, 2005.
Velger, M., Helmet Mounted Displays and Sights, Artech House, Boston, MA, 1998.
Wilson, J. et al., Design of monocular head mounted displays for increased indoor firefighting safety and efficiency, Proceedings of SPIE, 5800, 103–114, 2005.
Wilson, J. and P. Wright, Design of monocular head-mounted displays, with a case study on fire-fighting, Proceedings of IMechE, Part C: Journal of Mechanical Engineering Science, 221, 2007.
Image-Based Geometric
Registration for Zoomable
Cameras Using
Precalibrated Information
Takafumi Taketomi
CONTENTS
6.1 Background.................................................................................................... 126
6.2 Literature Review.......................................................................................... 126
6.3 Marker-Based Camera Parameter Estimation............................................... 127
6.4 Camera Pose Estimation for Zoomable Cameras.......................................... 129
6.4.1 Parameterization of Intrinsic Camera Parameter Change................. 130
6.4.1.1 Camera Calibration for Each Zoom Value......................... 130
6.4.1.2 Intrinsic Camera Parameter Change Expression Using
Zoom Variable.................................................................... 130
6.4.2 Monocular Camera Case................................................................... 131
6.4.2.1 Definition of Energy Function............................................ 131
6.4.2.2 Energy Term for Epipolar Constraint................................. 131
6.4.2.3 Energy Term for Fiducial Marker Corners......................... 132
6.4.2.4 Energy Term for Continuity of Zoom Value....................... 133
6.4.2.5 Balancing Energy Terms..................................................... 133
6.4.2.6 Camera Pose Estimation by Energy Minimization............ 134
6.4.3 Stereo Camera Case........................................................................... 134
6.4.3.1 Camera Calibration for Camera Pair.................................. 135
6.4.3.2 Geometric Model for Stereo Camera Considering
Optical Zoom Lens Movement........................................... 135
6.4.3.3 Camera Pose Estimation Using Zoom Camera and
Base Camera Pair................................................................ 136
6.5 Camera Parameter Estimation Results.......................................................... 137
6.5.1 Camera Calibration Result................................................................ 137
6.5.2 Quantitative Evaluation in Simulated Environment.......................... 139
6.5.2.1 Free Camera Motion........................................................... 140
6.5.2.2 Straight Camera Motion..................................................... 143
6.5.3 Qualitative Evaluation in Real Environment..................................... 145
6.6 Summary....................................................................................................... 147
References............................................................................................................... 148
6.1 BACKGROUND
In video see-through-based augmented reality (AR), estimating camera parameters
is important for achieving geometric registration between real and virtual worlds.
In general, extrinsic camera parameters (rotation and translation) are estimated by
assuming fixed intrinsic camera parameters (focal length, aspect ratio, principal
points, and radial distortions). Early augmented reality applications assumed the use
of head-mounted displays (HMDs) for displaying augmented reality images to users.
When using HMDs, changes in intrinsic camera parameters such as camera zooming
are avoided so as not to cause unnatural sensations in users; thus, assuming fixed intrinsic
camera parameters is not a problem in conventional augmented reality applications.
In contrast, mobile augmented reality applications that use smartphones and tablet
PCs have been widely developed in recent years. In addition, augmented reality
technology is often used in TV programs. In these applications, changes in the intrinsic camera parameters hardly cause unnatural sensations. However, most augmented reality applications still assume fixed intrinsic camera parameters. This is
due to the difficulty of estimating extrinsic and intrinsic camera parameters simultaneously. Removing the limitation of fixed intrinsic camera parameters in camera
parameter estimation opens possibilities in many augmented reality applications.
In this chapter, to estimate camera parameters while the intrinsic camera parameters change, two methods are introduced: estimation using a monocular camera and
estimation using a stereo camera. More specifically, we focus on the marker-based
camera parameter estimation method because it is widely used in augmented reality applications. Both methods are extended versions of the marker-based camera parameter estimation method.
The remainder of this chapter is organized as follows. In Section 6.2, related works
are briefly reviewed. Section 6.3 introduces general marker-based camera parameter
estimation. The framework for estimating intrinsic and extrinsic camera parameters
using precalibrated information is described in Section 6.4, and its effectiveness is
quantitatively and qualitatively evaluated in Section 6.5. Finally, Section6.6 presents
the conclusion and the future work.
Solutions for the PnP problem when the intrinsic camera parameters are not
known have also been proposed (Abidi and Chandra 1995; Triggs 1999). These methods can estimate the absolute extrinsic camera parameters and focal length from 2D–3D corresponding pairs. However, in these methods, the accuracy of the estimated
camera parameters decreases depending on the specific geometric relationship of
the points. To solve this problem, Bujnak et al. proposed a method for estimating
extrinsic camera parameters and focal length. Their method uses a Euclidean rigidity constraint in object space (Bujnak et al. 2008). Furthermore, they improved the
computational cost of the method (Bujnak et al. 2008) by joining planar and nonplanar solvers (Bujnak et al. 2010). The method (Bujnak et al. 2010) can be implemented
in real time on a desktop computer. However, the accuracy of the estimated camera
parameters still decreases in this method when the optical axis is perpendicular to the
plane formed by the 3D points. Kukelova et al. proposed a five-point-based method
(Kukelova et al. 2013), which achieves more stable camera parameter estimation than the method in Bujnak et al. (2010). However, most marker-based
applications use a square marker (Kato and Billinghurst 1999). In these applications,
camera parameters must be estimated from four 2D–3D corresponding pairs.
Unlike in the PnP problem, for estimating the intrinsic and extrinsic camera parameters, corresponding pairs of 2D image coordinates in multiple images are used
(Hartley and Zisserman 2004; Stewenius et al. 2005; Li 2006). These methods are
usually used in 3D reconstruction from multiple images, for example, the structure-from-motion technique (Snavely et al. 2006). Although these methods do not need
any prior knowledge of the target environment, they cannot estimate absolute extrinsic
camera parameters. Sturm proposed a self-calibration method for zoom lens cameras, which uses precalibration information (Sturm 1997). The idea of this method
is similar to that of the method described in Section 6.4.2. In this method, intrinsic
camera parameters are calibrated and then represented by one parameter. In the
online process, the estimation of the intrinsic and extrinsic camera parameters uses
this precalibration information and is based on the Kruppa equation. However, the
solution of the Kruppa equation is not robust to noise, and this method cannot estimate absolute extrinsic camera parameters. These methods are impractical for some
AR applications because they require the user to arrange the CG objects and coordinate system manually.
In contrast to previous methods, the method that we describe in this section can
accurately and stably estimate intrinsic and absolute extrinsic camera parameters
using an epipolar constraint and a precalibrated intrinsic camera parameter change.
In our method, a fiducial marker is used to obtain 2D–3D corresponding pairs.
Natural feature points that do not have 3D positions are used to stabilize the camera
parameter estimation results. Estimated intrinsic camera parameters are constrained
by the precalibrated intrinsic camera parameter change.
A fiducial marker is detected from an input image using image processing techniques
such as binarization. Then, the marker is matched against known markers. After
detection and identification, the 3D positions of the fiducial marker features are associated
with the 2D positions of the fiducial marker in the input image. These 2D–3D correspondences are used to estimate camera parameters. In general, fixed intrinsic camera
parameters are assumed in this camera parameter estimation process; thus, only the extrinsic camera parameters are estimated as unknowns. Most camera parameter estimation methods employ the following cost function:
E = \sum_i \| p_i - \hat{p}_i \|^2 (6.1)

where E is the cost, p_i is the detected 2D position of the fiducial marker feature in the input image, and \hat{p}_i is the reprojected position of the 3D point of the fiducial marker feature, as shown in Figure 6.1.
The position of the reprojected point can be calculated using the translation vector t, the rotation matrix R, and the intrinsic camera parameter matrix K, as follows:

\hat{p}_i \sim K [R | t] P_i (6.2)

where P_i is the 3D position of the fiducial marker feature. Note that the distortion factor is ignored in this formulation. Finally, the extrinsic camera parameters are estimated by minimizing Equation 6.1.
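A direct NumPy transcription of the cost in Equations 6.1 and 6.2 might look like this (the helper name and array layout are ours):

```python
import numpy as np

def reprojection_error(K, R, t, P_world, p_detected):
    """Cost of Equation 6.1: sum of squared distances between detected
    marker features p_i and their reprojections K[R|t]P_i (Equation 6.2).
    P_world: (N, 3) 3D marker feature positions; p_detected: (N, 2) pixels.
    Lens distortion is ignored, as in the text."""
    P_cam = R @ P_world.T + t.reshape(3, 1)  # world -> camera coordinates
    proj = K @ P_cam                         # homogeneous pixel coordinates
    p_hat = (proj[:2] / proj[2]).T           # perspective divide
    return np.sum((p_hat - p_detected) ** 2)
```

A nonlinear least-squares solver (e.g., Levenberg-Marquardt) would minimize this cost over R and t for the fixed-intrinsics case described above.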
FIGURE 6.1 Geometric relationship between a reprojected point and a detected point: the 3D marker feature P_i in the world coordinate system is transformed by [R|t] into the camera coordinate system and reprojected to \hat{p}_i, near the detected point p_i.
In the past, intrinsic camera parameters have been fixed in the camera parameter estimation process in augmented reality. These intrinsic camera parameters are obtained in
advance by using camera calibration methods (Tsai 1986; Zhang 2000). On the other
hand, simultaneous intrinsic and extrinsic camera parameter estimation methods
have been proposed in the field of computer vision (Bujnak et al. 2008, 2010). These
methods can estimate intrinsic and extrinsic camera parameters using 2D–3D correspondences. However, the results are unstable when the marker features lie on the same
plane. In addition, the accuracy of camera parameter estimation decreases when
the camera zooms while moving along the optical axis. In Section 6.4,
two methods for overcoming these problems are introduced.
FIGURE 6.2 Flow diagram of camera parameter estimation for zoomable camera.
applications. On the other hand, the stereo-camera-based method can be used for
situations wherein an additional camera can be attached to the camera capturing the
augmented reality background images. Details of these methods are described in the
following sections.
K = \begin{pmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{pmatrix} (6.3)

where f_x and f_y represent the focal lengths and c_x and c_y represent the principal points.
In this method, we assume zero skew and no lens distortion. This assumption is
reasonable for most recent camera devices. Thus, the intrinsic camera parameters have four degrees of freedom. These four values are obtained for each zoom
value by using Zhang's camera calibration method (Zhang 2000).
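The result of this per-zoom calibration is a table mapping each sampled zoom value to the four intrinsics, from which K (Equation 6.3) can be assembled. A minimal sketch — the table values, dictionary layout, and function name are illustrative assumptions, not the chapter's measured data:

```python
import numpy as np

# Hypothetical calibration table: for each sampled zoom value m,
# Zhang's method yields (f_x, f_y, c_x, c_y) in pixels.
CALIB = {
    0.0: (800.0, 798.0, 320.0, 240.0),
    0.5: (1150.0, 1147.0, 322.0, 238.0),
    1.0: (1600.0, 1595.0, 325.0, 236.0),
}

def K_at(m):
    """Assemble the intrinsic matrix of Equation 6.3 (zero skew, no
    distortion) for a calibrated zoom value m."""
    fx, fy, cx, cy = CALIB[m]
    return np.array([[fx, 0.0, cx],
                     [0.0, fy, cy],
                     [0.0, 0.0, 1.0]])
```

The next subsection turns this discrete table into a continuous function of the zoom variable m.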
6.4.1.2 Intrinsic Camera Parameter Change Expression
Using Zoom Variable
After getting the intrinsic camera parameters for each zoom value, the intrinsic camera parameters are expressed in terms of the zoom variable m.
           [ fx(m)   0      cx(m) ]
    K(m) = [  0      fy(m)  cy(m) ]   (6.4)
           [  0      0      1     ]
By using this expression, the degree of freedom of the intrinsic camera parameter matrix is reduced to one. In addition, the relationship between the changes of the individual intrinsic camera parameters is retained. Unlike previous research that handles the intrinsic camera parameters independently (Bujnak et al. 2008, 2010), the method described in this section can achieve stable camera parameter estimation during the online process.
In this method, a third-order spline is fitted to the camera calibration results to obtain the intrinsic camera parameter change model for each parameter. The third-order spline has the properties that the fitted function passes through all of the control points and that the polynomial pieces connect continuously at the borders. These properties make it suitable for the energy minimization process in the online camera parameter estimation.
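The per-parameter spline model can be built, for instance, with SciPy's `CubicSpline` (the zoom samples and calibrated focal lengths below are illustrative values, not measurements from the chapter):

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Hypothetical calibration results: fx measured at discrete zoom values m.
m_samples = np.array([1.0, 1.5, 2.0, 2.5, 3.0])
fx_samples = np.array([520.0, 610.0, 740.0, 930.0, 1210.0])  # illustrative

# Third-order spline: passes through all control points, and the cubic
# pieces join with continuous first and second derivatives at the borders.
fx_of_m = CubicSpline(m_samples, fx_samples)

fx_interp = fx_of_m(1.75)     # fx at an intermediate zoom value
dfx_dm = fx_of_m(1.75, 1)     # first derivative, usable in energy minimization
```

The same construction would be repeated for fy(m), cx(m), and cy(m); the smooth derivative is what makes the model convenient inside a gradient-based energy minimization.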
where wmk and wz are weights for balancing each term. These weights are automatically determined based on the camera parameters of the previous frame. Emk is used to estimate the absolute extrinsic camera parameters, Eep implicitly provides the 3D structure information, and Ezoom imposes a temporal constraint on the zoom value. Eep and Ezoom help achieve stable estimation of the magnification of the zoom value. In the following sections, the details of the energy terms and the weights are described.
6.4.2.2 Energy Term for Epipolar Constraint
In this method, the energy term Eep is calculated from the summation of distances
between the epipolar lines and the tracked natural features as shown in Figure 6.3.
Based on epipolar geometry, a corresponding point must be located on the epipolar
line in the other camera image (Hartley and Zisserman 2004). To calculate this distance, frames that satisfy the following criteria are stored as key frames:
1. The distance between the current camera position and the camera positions of the previous 10 frames is at a maximum.
2. All distances between the current camera position and the key frame positions are larger than a threshold.
Note that the first frame is stored as the first key frame in the online process of camera parameter estimation. In addition, natural features are tracked between successive frames using the Kanade–Lucas–Tomasi feature tracker (Shi and Tomasi 1994).
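A minimal sketch of the two key-frame criteria might look as follows. All names, the oldest-first ordering of the position buffer, and the exact reading of criterion 1 (the oldest of the previous 10 frames is the farthest, i.e. the camera has moved away monotonically) are assumptions, not the authors' implementation:

```python
import numpy as np

def should_store_keyframe(cur_pos, prev_positions, keyframe_positions, threshold):
    """Decide whether the current frame becomes a key frame (hedged sketch).

    cur_pos: current camera position, shape (3,)
    prev_positions: positions of the previous 10 frames, oldest first, (10, 3)
    keyframe_positions: positions of already stored key frames
    """
    # Criterion 1: the distance to the current position is maximal at the
    # oldest of the previous 10 frames (baseline keeps growing).
    dists = np.linalg.norm(prev_positions - cur_pos, axis=1)
    if dists.argmax() != 0:
        return False
    # Criterion 2: sufficiently far from every stored key frame.
    for kf in keyframe_positions:
        if np.linalg.norm(cur_pos - kf) <= threshold:
            return False
    return True
```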
More concretely, the energy term Eep is as follows:
Eep = (1/|Sj|) Σ_{i∈Sj} di²   (6.6)
where
Sj is the set of tracked natural feature points in the jth frame
di is the reprojection error for the natural feature point i
The reprojection error di is defined as the distance between the epipolar line li and the detected natural feature position qi in the input image. The epipolar line li can be calculated from the epipole ei and the projected position pi of the natural feature position Pi in the key frame. The epipole ei and the projected position pi are calculated as:
ei = K(mj) Tj Pkey   (6.7)
pi = K(mj) Tj Pi   (6.8)
where
Pkey represents the key frame camera position in the world coordinate system
Tj represents the extrinsic camera parameter matrix (camera rotation and
translation)
The subscript represents the estimated camera parameters in the key frame. Note that
Pi in Equation 6.8 is already transformed into the world coordinate system using the
matrices Kkey(mkey) and Tkey. By using this notation, we can represent the estimation
error for the two frames with the epipolar constraint as the reprojection error.
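The reprojection error di of Equations 6.6 through 6.8 reduces to a point-to-line distance once the epipolar line is formed. A small sketch in homogeneous coordinates (the function name is hypothetical):

```python
import numpy as np

def epipolar_reprojection_error(e, p, q):
    """Distance d_i between the epipolar line and a tracked feature (Figure 6.3).

    e: epipole in the current image, homogeneous (3,)
    p: projected key-frame feature position, homogeneous (3,)
    q: detected (tracked) feature position, inhomogeneous (2,)
    """
    l = np.cross(e, p)                     # line through epipole and projection
    q_h = np.array([q[0], q[1], 1.0])
    return abs(l @ q_h) / np.hypot(l[0], l[1])   # point-line distance in pixels
```

The cross product of two homogeneous points yields the homogeneous line through them, and the normalization by the first two line coefficients turns the algebraic residual into a Euclidean pixel distance.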
6.4.2.3 Energy Term for Fiducial Marker Corners
This term is almost the same as the energy term calculated in conventional camera parameter estimation methods. Reprojection errors are calculated from the correspondences between the fiducial marker corners in an input image and their reprojected points:
Emk = Σi ( K(mj) Tj Pi − pi )²   (6.9)
where Pi and pi are the 3D positions of the fiducial marker corners and their detected positions in the input image, respectively. Unlike in the conventional methods, the magnification parameter m of the camera zooming appears in the intrinsic camera parameter matrix K in the jth frame.
6.4.2.4 Energy Term for Continuity of Zoom Value
This term is used to achieve stable camera parameter estimation. In augmented reality, camera parameters are estimated from a video feed. In this case, the magnification of the zoom value changes continuously in successive frames. In order to add this constraint, we use the energy term Ezoom in the energy function:
Ezoom = (mj−1 − mj)²   (6.10)
wmk ( ) =
4 2
+ (6.11)
2
parameters in the previous frame. In general, the relationship between the zoom values and the intrinsic camera parameters is not proportional. The focal lengths (fx(m), fy(m)) change drastically at large image magnifications resulting from camera zooming. For this reason, if we used a constant weight wz, its effect might be too strong or too weak in the camera parameter estimation process. Thus, the weight wz should be controlled adequately. To solve this problem, we employ a weight wz that depends on fx(m) as follows:
wz = 1 / fx(mj)   (6.12)
In this term, we use only fx because the change of fx is almost the same as that of fy. By using this weight, we can adequately control wz based on the rate of change of the intrinsic camera parameters.
6.4.2.6 Camera Pose Estimation by Energy Minimization
To estimate the intrinsic and extrinsic camera parameters, the energy function E is minimized by using the Levenberg–Marquardt algorithm. We employ an M-estimator to reduce the effect of mis-tracked natural features in the optimization process. In this method, we employ the Geman–McClure function:
ρ(x) = (x²/2) / (1 + x²)   (6.13)
where x represents the residual. In this optimization process, the zoom value mj−1 estimated in the previous frame and the extrinsic camera parameters estimated by using K(mj−1) are used as the initial parameters. The results of camera parameter estimation may converge to a local minimum. Experimentally, we confirmed that the local minimum problem occurs along the optical axis of the camera. To avoid it, the optimization process is executed using three different initial values generated by adding an offset to the initial magnification value of camera zooming. Finally, the trial with the lowest energy value is chosen, and its estimated camera parameters K(mj) and Tj are adopted as the final result.
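A sketch of this robust multi-start optimization using SciPy. Note two assumptions: SciPy's `least_squares` accepts a custom loss as rho(z), rho'(z), rho''(z) of the squared residual z, and its `'lm'` (Levenberg–Marquardt) method does not support robust losses, so the `'trf'` solver stands in for it here; the offsets and the toy residual are likewise assumptions:

```python
import numpy as np
from scipy.optimize import least_squares

def geman_mcclure(z):
    """Geman-McClure loss in SciPy's convention: given z = residual**2, return
    rho(z), rho'(z), rho''(z). Equivalent to rho(x) = (x**2/2)/(1 + x**2)."""
    return np.array([0.5 * z / (1 + z),
                     0.5 / (1 + z) ** 2,
                     -1.0 / (1 + z) ** 3])

def estimate_with_restarts(residual_fn, x0, zoom_index=0, offsets=(0.0, -0.2, 0.2)):
    """Optimize from three initial zoom values and keep the lowest-energy result."""
    best = None
    for off in offsets:
        x_init = np.asarray(x0, dtype=float).copy()
        x_init[zoom_index] += off           # perturb the zoom magnification only
        res = least_squares(residual_fn, x_init, loss=geman_mcclure, method="trf")
        if best is None or res.cost < best.cost:
            best = res
    return best
```

Because the Geman–McClure loss saturates for large residuals, mis-tracked features contribute a bounded amount of energy, which is exactly the effect the M-estimator is introduced for above.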
for the reference camera, intrinsic and extrinsic camera parameters for the zoomable
camera can be obtained by estimating the zoom value. Details of the algorithm are
described in the following sections.
6.4.3.1 Camera Calibration for Camera Pair
This method assumes an additional camera attached to the zoomable camera as
shown in Figure 6.5. This attached camera is used as a reference to estimate intrinsic
and extrinsic camera parameters of the zoomable camera. Intrinsic camera parameters of the reference camera are calibrated and fixed in the whole process. In this
calibration process, the magnification of the zoom value of the zoomable camera is
set to 1.0 (non-zoom mode). In this setting, the intrinsic camera parameters of the
zoomable camera and the reference camera are known. By using these known intrinsic camera parameters, a relative geometric relationship Trel between the zoomable
camera and the reference camera is calibrated by capturing a calibration pattern.
This relative geometric relationship is used to estimate intrinsic and extrinsic camera
parameters of the zoomable camera.
6.4.3.2 Geometric Model for Stereo Camera Considering
Optical Zoom Lens Movement
In the case of camera zooming with an optical zoom lens, the relative geometric relationship Trel changes depending on the optical lens movement, because the optical center moves along the optical axis. In this method, the optical lens movement is modeled as the focal length change caused by zooming, using a simple zoomable camera model (Numao et al. 1998). This simple zoomable camera model is shown in Figure 6.6. Concretely, in this model, the difference between the focal length at each zoom value fi and the minimum focal length fmin is calculated:

Δfi = fi − fmin   (6.14)
[Figure 6.5: Stereo camera model, showing the zoomable camera and the reference camera with the transformations Trel, Tzoom, and Tref, the optical center movement F(f), and the focal length range fmin to fmax.]
[Figure 6.6: Simple zoomable camera model; the lens moves by L(f) [mm] in front of the image sensor between the minimum (fmin) and maximum (fmax) zoom values.]
A regression line is fitted to the result of this calculation, and then the relationship
between lens movement and the focal length change L(f) is obtained:
L(f) = αf + β   (6.15)

where α and β are the parameters of the regression line. The relationship between
lens movement and the focal length change and the relationship between the zoomable camera and the reference camera are used to model the stereo camera model
as shown in Figure 6.5. In this figure, Tzoom and Tref are extrinsic camera parameters
of the zoomable camera and the reference camera in the world coordinate system,
respectively. In addition, F(f) is the amount of optical zoom center movement.
           [ 1  0  0  0    ]
    F(f) = [ 0  1  0  0    ]   (6.16)
           [ 0  0  1  L(f) ]
           [ 0  0  0  1    ]
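The regression of Equation 6.15 and the optical-center offset of Equation 6.16 can be sketched as follows. The calibration pairs are illustrative values, and the 4 × 4 homogeneous form of F(f), translating the optical center by L(f) along the optical axis, is a reconstruction:

```python
import numpy as np

# Fit the regression line L(f) = alpha * f + beta (Equation 6.15) to
# hypothetical (focal length, lens movement) calibration pairs.
f_samples = np.array([500.0, 700.0, 900.0, 1100.0])
L_samples = np.array([0.0, 1.1, 2.0, 3.1])          # lens movement in mm
alpha, beta = np.polyfit(f_samples, L_samples, 1)

def F(f):
    """Homogeneous transform moving the optical center by L(f) along the
    optical axis (a reconstruction of Equation 6.16)."""
    T = np.eye(4)
    T[2, 3] = alpha * f + beta
    return T
```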
In this equation, all of the parameters are known except for the zoom value m. Thus, we can estimate the magnification of the zoom value m by minimizing the reprojection errors between the detected marker corner positions and the reprojected marker corner positions. In this minimization process, the Levenberg–Marquardt method is employed, and the zoom value of the previous frame is used as the initial value for the optimization.
Finally, Tzoom is refined by minimizing the following cost function using detected
marker corner positions in reference and zoomable camera images:
E = Σi ( K(m) Tzoom Pi − pizoom )² + Σi ( Kref Trel⁻¹ F(fx(m)) Tzoom Pi − piref )²
where piref and pizoom represent the detected marker corner positions in the reference and zoomable camera images, respectively. Note that the magnification of the zoom value m is fixed in this optimization process.
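Since only the zoom magnification remains unknown, its estimation is a one-dimensional problem. A sketch using a bounded scalar search in place of the Levenberg–Marquardt step described above (the function names, bounds, and search radius are assumptions):

```python
from scipy.optimize import minimize_scalar

def estimate_zoom(reproj_error_of_m, m_prev, search_radius=0.5):
    """One-dimensional search for the zoom magnification m.

    reproj_error_of_m: function returning the marker reprojection error for a
    candidate zoom value; m_prev: value estimated in the previous frame,
    used to bound the search (a stand-in for the LM initialization).
    """
    res = minimize_scalar(reproj_error_of_m,
                          bounds=(m_prev - search_radius, m_prev + search_radius),
                          method="bounded")
    return res.x
```

Restricting the search to a window around the previous frame's estimate plays the same stabilizing role as the temporal zoom constraint in the monocular method.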
camera was used to generate virtual camera motions in the quantitative evaluation
and to acquire actual video sequences in the qualitative evaluation. The range of the
image magnification resulting from camera zooming is divided into 20 intervals,
and then, the intrinsic camera parameters for each zoom value are obtained using
Zhang's camera calibration method (Zhang 2000).
Figures 6.7 and 6.8 show the results of the camera calibration. In these figures,
the lines indicate the spline fitting results. These results show that the focal length
drastically changes when the zoom value is greater than 13. In addition, the center
of the projection changes cyclically because the lens rotates during zooming. In the
following experiments, we used the spline fitting results of fx(m), fy(m), u(m), and v(m).
[Figure 6.7: Calibrated focal lengths fx(m) and fy(m) plotted against the magnification of camera zooming, with the spline fitting results.]
[Figure 6.8: Calibrated centers of projection u(m) and v(m) plotted against the magnification of camera zooming.]
Finally, the rotation angle of the estimated rotation matrix is employed as the rotation error. This evaluation method is described in the literature (Petit et al. 2011).
FIGURE 6.9 Part of the camera paths and 3D points in the simulated environment.
FIGURE 6.10 Estimation results of focal length for each frame in free camera motion.
FIGURE 6.11 Estimation results of center of projection for each frame in free camera
motion.
FIGURE 6.12 Estimated camera position errors for each frame in the case of free camera
motion.
FIGURE 6.13 Estimated camera rotation errors for each frame in free camera motion.
TABLE 6.1
Comparison of Accuracy in the Case of Free Camera Motion

                                     Bujnak's Method   Method A   Method B
Average focal length error (mm)      13.08             0.83       0.5
Average position error (mm)          6.1               0.46       0.51
Average rotation error (degree)      1.37              1.31       1.18
Average reprojection error (pixel)   1.36              0.82       0.31
Processing time (s)                  0.0011            0.06       0.04
FIGURE 6.14 Estimation results of focal length for each frame in straight camera motion.
FIGURE 6.15 Estimation results of center of projection for each frame in straight camera
motion.
FIGURE 6.16 Estimated camera position errors for each frame in straight camera motion.
FIGURE 6.17 Estimated camera rotation errors for each frame in straight camera motion.
TABLE 6.2
Comparison of Accuracy in the Case of Straight Camera Motion

                                     Bujnak's Method   Method A   Method B
Average focal length error (mm)      13.66             2.13       0.51
Average position error (mm)          7.71              1.1        0.54
Average rotation error (degree)      2.24              1.67       1.73
Average reprojection error (pixel)   1.33              0.79       0.43
Processing time (s)                  0.0012            0.05       0.04
[Figure 6.18 panels: Method A, Method B, and ARToolKit, each shown under zoom and non-zoom conditions.]
FIGURE 6.18 Geometric registration results of each method. A virtual cube is overlaid on a Rubik's cube in each frame.
This figure confirms that the estimated camera paths of Methods A and B are smoother than that of Bujnak's method, whose estimated camera path exhibits large jitter. We confirmed that Methods A and B can estimate the camera path more stably than Bujnak's method.
6.6 SUMMARY
In this chapter, the methods for estimating intrinsic and extrinsic camera parameters
were introduced. In the monocular camera case, camera parameters are estimated
by minimizing the energy function. In this method, two additional energy terms
are added to the conventional marker-based camera parameter estimation method:
reprojection errors of tracked natural features and temporal constraint of zoom
value. On the other hand, in the stereo camera case, intrinsic and extrinsic camera parameter estimation of the zoomable camera is achieved using the reference
camera. In this method, the optical lens movement is modeled as the focal length
change by zooming using a simple zoomable camera model. By using this model and
the reference camera, intrinsic and extrinsic camera parameters can be estimated
by solving a one-dimensional optimization problem. These methods can achieve
accurate and stable camera parameter estimation. However, the current methods
do not consider the lens distortion. Lens distortion must be considered when using
wide-angle lenses.
REFERENCES
Abidi, M. A., T. Chandra. A new efficient and direct solution for pose estimation using quadrangular targets: Algorithm and evaluation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 17(5), 1995, 534–538.
Bujnak, M., Z. Kukelova, T. Pajdla. A general solution to the P4P problem for camera with unknown focal length. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, AK, June 23–28, 2008, pp. 1–8.
Bujnak, M., Z. Kukelova, T. Pajdla. New efficient solution to the absolute pose problem for camera with unknown focal length and radial distortion. Proceedings of the Asian Conference on Computer Vision, Queenstown, New Zealand, November 8–12, 2010, pp. 11–24.
Drummond, T., R. Cipolla. Real-time visual tracking of complex structures. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(7), 2002, 932–946.
Fischler, M. A., R. C. Bolles. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM, 24(6), 1981, 381–395.
Hartley, R., A. Zisserman. Multiple View Geometry in Computer Vision. Cambridge, U.K.: Cambridge University Press, 2004.
Hmam, H., J. Kim. Optimal non-iterative pose estimation via convex relaxation. International Journal of Image and Vision Computing, 28(11), 2010, 1515–1523.
Kato, H., M. Billinghurst. Marker tracking and HMD calibration for a video-based augmented reality conferencing system. Proceedings of the International Workshop on Augmented Reality, San Francisco, CA, October 20–21, 1999, pp. 85–94.
Klette, R., K. Schluns, A. Koschan, editors. Computer Vision: Three-Dimensional Data from Images. New York: Springer, 1998.
Kukelova, Z., M. Bujnak, T. Pajdla. Real-time solution to the absolute pose problem with unknown radial distortion and focal length. Proceedings of the International Conference on Computer Vision, Sydney, New South Wales, Australia, December 1–8, 2013, pp. 2816–2823.
Lepetit, V., F. Moreno-Noguer, P. Fua. EPnP: An accurate O(n) solution to the PnP problem. International Journal of Computer Vision, 81(2), 2009, 155–166.
Li, H. A simple solution to the six-point two-view focal-length problem. Proceedings of the European Conference on Computer Vision, Graz, Austria, May 7–13, 2006, pp. 200–213.
Numao, T., Y. Nakatani, M. Okutomi. Calibration of a pan/tilt/zoom camera by a simple camera model. Technical Report of IEICE, PRMU, 1998, Kanagawa, Japan, pp. 65–72.
Petit, A., G. Caron, H. Uchiyama, E. Marchand. Evaluation of model based tracking with TrakMark dataset. Proceedings of the International Workshop on AR/MR Registration, Tracking and Benchmarking, Basel, Switzerland, October 26, 2011.
Quan, L., Z.-D. Lan. Linear n-point camera pose determination. IEEE Transactions on Pattern Analysis and Machine Intelligence, 21(8), 1999, 774–780.
Shi, J., C. Tomasi. Good features to track. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Seattle, WA, June 21–23, 1994, pp. 593–600.
Snavely, N., S. M. Seitz, R. Szeliski. Photo tourism: Exploring photo collections in 3D. ACM Transactions on Graphics, 25(3), 2006, 835–846.
Stewenius, H., D. Nister, F. Kahl, F. Schaffalitzky. A minimal solution for relative pose with unknown focal length. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, San Diego, CA, June 20–25, 2005, pp. 789–794.
Sturm, P. Self-calibration of a moving zoom-lens camera by pre-calibration. International Journal of Image and Vision Computing, 15, 1997, 583–589.
Taketomi, T., K. Okada, G. Yamamoto, J. Miyazaki, H. Kato. Camera pose estimation under dynamic intrinsic parameter change for augmented reality. International Journal of Computers and Graphics, 44, 2014, 11–19.
Taketomi, T., T. Sato, N. Yokoya. Real-time and accurate extrinsic camera parameter estimation using feature landmark database for augmented reality. International Journal of Computers and Graphics, 35(4), 2011, 768–777.
Triggs, B. Camera pose and calibration from 4 or 5 known 3D points. Proceedings of the International Conference on Computer Vision, Kerkyra, Greece, September 20–27, 1999, pp. 278–284.
Tsai, R. Y. An efficient and accurate camera calibration technique for 3D machine vision. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Miami Beach, FL, 1986, pp. 364–374.
Wu, Y., Z. Hu. PnP problem revisited. Journal of Mathematical Imaging and Vision, 24(1), 2006, 131–141.
Zhang, Z. A flexible new technique for camera calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(11), 2000, 1330–1334.
CONTENTS
7.1 Introduction................................................................................................... 152
7.2 Framework for Simultaneous Tracking and Recognition.............................. 152
7.2.1 Offline Stage...................................................................................... 153
7.2.2 Online Stage...................................................................................... 154
7.2.3 Simultaneous Tracking and Recognition........................................... 154
7.3 Camera Pose Tracking with Robust SFM..................................................... 156
7.3.1 Structure from Motion Using Subtrack Optimization....................... 156
7.3.2 System Components.......................................................................... 158
7.3.2.1 Building the Point Cloud.................................................... 158
7.3.2.2 Extracting Keypoint Descriptors........................................ 160
7.3.2.3 Incremental Keypoint Matching......................................... 160
7.3.2.4 Camera Pose Estimation..................................................... 161
7.3.2.5 Incorporating Unmatched Keypoints................................. 161
7.3.2.6 Experiments........................................................................ 162
7.4 Camera Pose Tracking with LiDAR Point Clouds........................................ 164
7.4.1 Automatic Estimation of Initial Camera Pose................................... 164
7.4.1.1 Generate Synthetic Images of Point Clouds....................... 165
7.4.1.2 Extract Keypoint Features.................................................. 165
7.4.1.3 Estimate Camera Pose........................................................ 165
7.4.2 Camera Pose Refinement................................................................... 166
7.4.3 Experiments....................................................................................... 167
7.5 Summary and Conclusions............................................................................ 169
Acknowledgment.................................................................................................... 169
References............................................................................................................... 170
7.1 INTRODUCTION
Augmented Reality (AR) is the process of combining virtual and real objects into a single spatially coherent view. In most cases, this entails capturing a sequence of images and determining a camera's spatial pose (position and orientation) at each frame. The camera's position and orientation, along with its internal parameters, provide the essential information needed to create augmented realities.
Tracking, or camera pose determination, is a primary technical challenge of AR, and therefore the subject of a large body of AR research and development work (Neumann and You, 1999; Azuma et al., 2001; Shibata et al., 2010; Uchiyama and Marchand, 2012). Visual tracking is a popular approach to AR pose determination. In its simplest form, the environment is prepared with artificial markers that can be easily detected and tracked (Cho and Neumann, 2001; Ababsa and Mallem, 2004; Claus and Fitzgibbon, 2004; Mooser et al., 2006). The marker-based approach, however, is often impractical for use in wide-area environments. A more practical approach is to track naturally occurring elements of the environment (Platonov et al., 2006; Bleser et al., 2006; Comport et al., 2006; Wagner et al., 2008; Hsiao et al., 2010; Uchiyama et al., 2011; Guan et al., 2012), possibly in combination with artificial markers (Jiang et al., 2000). The advantage of visual tracking for AR, with either artificial markers or natural features, is the ability to capture images, identify visual content, estimate pose, and manage the AR display all in one computing device with a camera and display.
This chapter focuses on the problem of robust visual tracking for AR in natural environments. Emphasis is placed on a novel tracking technique that combines feature matching, visual recognition, and camera pose tracking. A tracking-via-recognition strategy is presented that performs simultaneous tracking and recognition within a unified framework. Rather than functioning as a separate initialization step, recognition is an ongoing process that both aids and benefits from tracking. Experiments show the advantages of this unified approach in comparison to traditional approaches that treat recognition and tracking as separate processes.
Within the same unified framework, two tracking systems are presented. The first system employs robust structure from motion (SFM) to build a sparse point cloud model of the target environment for visual recognition and pose tracking, while the second system employs dense 3D point cloud data captured with active sensors such as LiDAR (Light Detection and Ranging) scanners. In both cases, by determining a sufficiently large set of 2D–3D feature correspondences between an image and the model database, an accurate camera pose can be computed and used to render virtual elements for AR display.
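The 2D–3D correspondences mentioned above determine the camera projection. A minimal direct linear transform (DLT) sketch, without the coordinate normalization and RANSAC a production system would add:

```python
import numpy as np

def dlt_projection_matrix(P3d, p2d):
    """Estimate a 3x4 projection matrix from n >= 6 2D-3D correspondences
    via the direct linear transform (a minimal sketch)."""
    A = []
    for (X, Y, Z), (u, v) in zip(P3d, p2d):
        A.append([X, Y, Z, 1, 0, 0, 0, 0, -u*X, -u*Y, -u*Z, -u])
        A.append([0, 0, 0, 0, X, Y, Z, 1, -v*X, -v*Y, -v*Z, -v])
    _, _, Vt = np.linalg.svd(np.asarray(A, float))
    return Vt[-1].reshape(3, 4)      # null-space vector = projection matrix

def project(M, P3d):
    """Apply a 3x4 projection matrix to 3D points (N x 3)."""
    P_h = np.hstack([P3d, np.ones((len(P3d), 1))])
    p = (M @ P_h.T).T
    return p[:, :2] / p[:, 2:3]
```

Given known intrinsics, the estimated matrix can be decomposed into the rotation and translation that an AR renderer needs; dedicated PnP solvers (such as the EPnP method cited later) solve the same problem more robustly.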
FIGURE 7.1 Unified framework for simultaneous visual tracking and recognition.
Figure 7.1 illustrates the developed unified framework for simultaneous visual
recognition and tracking. At a high level, there are two stages. The first, an offline
stage, defines a set of visual features and associated virtual content. The second is an
online stage that processes an input video stream, recognizes visual features, estimates camera pose, and renders final AR output for each frame. These two stages are
connected through a feature database and a virtual object database.
only be rendered when those features are recognized and visible. The construction and use of the annotation database is outside of this chapter's scope; however, its relationship to the feature database is made clear by its inclusion in the offline stage.
whether the current pose estimate implies that they lie on the surface of the database object. If so, a keypoint's location in object space coordinates can be back-projected to generate a match. Keypoints that do not belong to the object, perhaps because they lie on an occluding object, will not fit the pose estimates in future frames, and these points
are discarded. In the event the system loses track of some of the originally matched
keypoints, it still tracks the object correctly using the newer matches. Thus, estimated
camera pose is used to compute the positions for new keypoints that have never been
matched to the database. Tracking these new points along with database-matched
keypoints helps maintain accurate pose tracking through subsequent frames.
From this point forward, keypoint matches to the database are called database matches, and keypoint matches derived from the incremental back-projection algorithm are called projection matches.
Figure 7.2 illustrates the principle of incremental keypoint matching for simultaneous tracking and recognition. The scene contains a book cover viewed by a moving camera. A feature database containing the book cover model and features was
prepared offline. In all frames, the detected dots on the book cover are the keypoints
used for object identification and pose tracking; bold lines indicate database matches;
and thin lines are projection matches. An early frame, shown in Figure 7.2a, finds
a few database matches that are used to estimate an initial camera pose. The pose
estimate is used to draw the white rectangle model of the book. As the sequence
continues, Figure 7.2b shows numerous keypoints matched by incremental back-projection. These are used to produce stable pose estimates. Figure 7.2c shows a frame
with substantial object occlusion. Only two of the original database matches remain,
however the projection matches ensure that tracking remains accurate and robust.
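For a planar target such as the book cover, a projection match can be generated by intersecting an unmatched keypoint's viewing ray with the object plane. A hedged sketch: the planar z = 0 object model, the rectangular extent, and all names are assumptions for illustration:

```python
import numpy as np

def back_project_to_plane(K, R, t, keypoint, extent):
    """Create a projection match for an unmatched keypoint (sketch).

    Intersects the keypoint's viewing ray with the object plane z = 0 in
    object space; returns the 3D point if it falls on the object, else None.
    extent = (width, height) of the planar target.
    """
    ray_cam = np.linalg.inv(K) @ np.array([keypoint[0], keypoint[1], 1.0])
    origin = -R.T @ t                   # camera center in object coordinates
    direction = R.T @ ray_cam           # viewing ray in object coordinates
    if abs(direction[2]) < 1e-9:
        return None                     # ray parallel to the object plane
    s = -origin[2] / direction[2]
    X = origin + s * direction
    if s <= 0 or not (0 <= X[0] <= extent[0] and 0 <= X[1] <= extent[1]):
        return None                     # behind the camera or off the object
    return X
```

The returned 3D point, paired with the keypoint's image position, is exactly the kind of projection match that keeps tracking alive when database matches are occluded.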
FIGURE 7.2 Incremental keypoint matching for simultaneous tracking and recognition.
(a) Initial camera pose is estimated from the database matches. (b) Camera pose is refined
incrementally with back-projection matches. (c) Stable camera pose produced by combining
the database matches and projection matches.
FIGURE 7.3 Illustration of hypothetical keypoint tracks for six camera locations. Each keypoint projects along a ray in space. The six rays do not meet at a single point, so there is
no structure point valid for the entire track. However, subtracks k1,2,3 and k4,5,6 do have valid
structure points. The goal of the optimization algorithm is to reliably perform this partitioning.
process generates a set of keypoint tracks, each consisting of a keypoint correspondence spanning two or more consecutive frames. Each track continues to grow until
optical flow fails or the keypoint drifts out of view. In practice, tracks can span only
a few frames or several hundred.
Ideally, all keypoints in a given track correspond to the same 3D point in space, in
which case the keypoint track is deemed consistent. Over a long sequence, however,
this seldom holds true. Keypoint tracks are often stable for a few frames, then drift,
and then become stable again.
A simple solution identifies keypoint tracks that do not fit a single structure point and removes them from the computation. Traditional outlier detection schemes such as RANSAC (Random Sample Consensus) may be used to this end. However, simply labeling entire tracks as inliers or outliers ultimately discards useful data. A long keypoint track is generally stable over some portion of its lifetime, and a more powerful approach will identify those sections and use them.
Identifying the sets of frames during which a keypoint track remains stable is
nontrivial. Simply splitting the track into fixed sized partitions, for example, would
only partially address the problem. Partitions, no matter how large or small, may be
individually inconsistent. Moreover, if consecutive partitions are consistent, it would
be preferable to consider them as a whole.
The motivation behind the subtrack optimization algorithm is to solve this partitioning problem optimally. It sets out to identify the longest possible subtracks that
can be deemed consistent. Favoring fewer, longer subtracks is important because it
ensures that they span as wide a baseline as possible. A method that is overly aggressive in partitioning a keypoint track will lose valuable information, and the accuracy
of the resulting structure will suffer accordingly.
This idea is illustrated in Figure 7.3. As a hypothetical camera moves from top to bottom, a keypoint is tracked along a ray in space at each frame. Because those rays do not meet at a common structure point, the six-frame track is, by definition, inconsistent. The subtracks spanning frames 1–3 and frames 4–6 are consistent, however, and thus represent an optimal partitioning, with both subtracks usable for pose computation.
Each subtrack corresponds to a single structure point, with its consistency determined by average reprojection error. For keypoint track k, let k_j and P_j be the keypoint
and camera, respectively, at frame j, and let k_{a,b} be the subtrack spanning frames a to b
inclusive. The consistency of k_{a,b} is given by the error function

\[
E(k_{a,b}) = \min_{X} \frac{1}{N} \sum_{j=a}^{b} d(P_j X,\, k_j) \tag{7.1}
\]

where d is the Euclidean distance in pixels and N is the length of the subtrack.
The argument, X, that minimizes the equation is the structure point corresponding
to k_{a,b}.
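In practice, the inner minimization can be approximated by linear (DLT) triangulation of X followed by measuring the mean reprojection distance. A minimal sketch, assuming 3×4 projection matrices and pixel keypoints; note that DLT minimizes an algebraic error rather than the exact geometric minimum of (7.1):

```python
import numpy as np

def triangulate(Ps, kps):
    """Linear (DLT) triangulation of one 3D point from its projections.
    Ps: list of 3x4 camera matrices; kps: matching (x, y) pixel keypoints."""
    A = []
    for P, (x, y) in zip(Ps, kps):
        A.append(x * P[2] - P[0])
        A.append(y * P[2] - P[1])
    _, _, Vt = np.linalg.svd(np.asarray(A))
    X = Vt[-1]
    return X / X[3]                      # homogeneous -> (X, Y, Z, 1)

def subtrack_error(Ps, kps):
    """Approximate E(k_{a,b}): mean pixel distance between each keypoint
    and the reprojection of the triangulated structure point."""
    X = triangulate(Ps, kps)
    errs = []
    for P, k in zip(Ps, kps):
        proj = P @ X
        errs.append(np.linalg.norm(proj[:2] / proj[2] - np.asarray(k)))
    return float(np.mean(errs))
```

A consistent subtrack yields an error near zero; a drifting one yields an error that no single X can reduce.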
The cost of a full partitioning p is then the sum over its subtracks,

\[
C(p) = \sum_{k_{a,b} \in p} \bigl( \lambda + E(k_{a,b}) \bigr) \tag{7.2}
\]

where \lambda is a constant penalty term ensuring that the optimization favors longer subtracks whenever possible.
The number of possible partitions is exponential in the length of k, so a brute-force
search would be intractable. As it turns out, however, given an estimate of the
camera pose at each frame, a partitioning suited to our needs can be found in
low-order polynomial time.
The key idea is to define the cost function recursively as

\[
C(p_0) = 0, \qquad C(p_1) = 0, \qquad
C(p_n) = \min_{1 \le a \le n} \bigl[ C(p_{a-1}) + \lambda + E(a, n) \bigr] \tag{7.3}
\]
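Given a way to evaluate E(a, n) from the estimated camera poses, the recursion can be implemented directly with dynamic programming; a minimal sketch (the penalty value is illustrative, not the chapter's constant):

```python
def optimal_partition(n, subtrack_error, penalty=1.0):
    """Dynamic-programming partition of a keypoint track of length n.

    subtrack_error(a, b) returns the consistency error E of the subtrack
    spanning frames a..b (1-indexed, inclusive). The penalty constant
    favors fewer, longer subtracks. Returns (total_cost, list of (a, b)).
    """
    INF = float("inf")
    cost = [0.0] + [INF] * n          # cost[i] = best cost for frames 1..i
    split = [0] * (n + 1)             # split[i] = start frame of last subtrack
    for i in range(1, n + 1):
        for a in range(1, i + 1):
            c = cost[a - 1] + penalty + subtrack_error(a, i)
            if c < cost[i]:
                cost[i], split[i] = c, a
    # backtrack the chosen subtracks
    parts, i = [], n
    while i > 0:
        parts.append((split[i], i))
        i = split[i] - 1
    return cost[n], parts[::-1]
```

The double loop makes the search quadratic in track length (times the cost of evaluating E), rather than exponential.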
to fit a fundamental matrix. Using known internal parameters, this defines a rigid
transformation relating the two cameras.
For every keypoint tracked from frame 1 to frame n, an initial structure point
Xk is computed by linear triangulation. The initial structure points are then used to
compute poses for the intervening frames, 2 through (n − 1). Each camera pose is a
rigid transformation having six degrees of freedom, which is fit using Levenberg–Marquardt optimization.
The resulting short sequence of camera poses, typically 8–10 in our implementation, provides enough information to apply the subtrack optimization algorithm to
each tracked keypoint. This partitions each track into one or more subtracks and
determines which subtracks are inliers. For each inlier subtrack, a 3D point is computed to form an initial point cloud.
Subsequent frames are processed with optical flow applied to each keypoint to
extend the corresponding track. For each new frame, the system initially assumes
that all consistent subtracks in the prior frame are still consistent for the current
frame. Since those subtracks have known 3D structure points, they are used to estimate the initial camera pose of the new frame.
With an initial camera pose for a new frame, subtrack optimization is performed
taking that new pose estimate into account. Each keypoint track is repartitioned into
subtracks, with a new 3D structure point assigned to each new subtrack. Each frame
thus involves one camera pose estimation followed by one subtrack optimization. This continues for
the entire input sequence to produce a final 3D reconstruction and feature database.
Figure 7.4 shows a resulting point cloud model for a building sequence, rotated to
provide an overhead view. The structure of the hedges is clearly visible, along with
some keypoints arising from the brick walls and the lawn. The path of the camera in
front of the building is computed and rendered on the left.

FIGURE 7.4 Point cloud model of a building along with camera poses produced with the
proposed robust SFM approach.
7.3.2.2 Extracting Keypoint Descriptors
The point cloud, by itself, contains only geometric information, namely a 3D location for each structure point. To complete the keypoint database, each point must be
associated with a visual descriptor that can be matched during the online stage.
The descriptor uses the Walsh–Hadamard (WH) kernel projection to describe
a keypoint, which is highly compact and discriminative. The WH descriptor takes
rectangular image patches (typically 32 × 32 pixels) as input and uses the WH kernel
transformation to reduce the patches to low-dimensional vectors, with lower dimensions encoding low-frequency information and higher dimensions encoding high-frequency information. Given an image patch p and the ith WH kernel u_i, the ith element of the kernel projection is given by the inner product u_i^T p. Experimental trials
revealed that 20 dimensions are sufficient to retain the most characteristic features of
each patch and provide reliable matching results. The first WH kernel simply computes a sum of the patch's intensity values, which contains no discriminative information after normalization. The first kernel is thus discarded, so the 20-dimensional
descriptor vector comprises WH kernels u_2 through u_21.
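A sketch of such a projection, using a Sylvester-ordered Hadamard basis with a diagonal scan as a stand-in for the true frequency ordering (the chapter does not specify the exact kernel order, so this ordering is an assumption):

```python
import numpy as np

def hadamard(n):
    """Sylvester construction of an n x n Hadamard matrix (n a power of two)."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H

def wh_descriptor(patch, dims=20):
    """Project a square patch onto 2D Walsh-Hadamard kernels.

    Sketch only: kernels are visited in a diagonal low-index-first order
    as a proxy for low-frequency-first; the first (DC) kernel is dropped,
    since it carries no discriminative information after normalization.
    """
    n = patch.shape[0]
    H = hadamard(n)
    # 2D kernel u_{r,c} = outer(H[r], H[c]); the inner product u^T p equals
    # the (r, c) entry of H @ patch @ H.T
    coeffs = H @ patch @ H.T
    order = sorted(((r, c) for r in range(n) for c in range(n)),
                   key=lambda rc: (rc[0] + rc[1], rc[0]))
    vec = np.array([coeffs[r, c] for r, c in order[1:dims + 1]])
    return vec / (np.linalg.norm(vec) + 1e-12)   # unit-normalize
```

Because the 2D transform factors into two matrix products, all kernel responses are computed at once, which is what makes the descriptor cheap to extract.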
Given the 20-dimensional descriptor for each keypoint, a Euclidean nearest
neighbor search finds its match in a database of learned keypoints. A distance ratio
filter is applied, that is, a match is accepted only if the distance ratio between its
nearest and second nearest matches is less than a predefined threshold, as detailed
in the following section.
When hundreds of keypoints are tracked over hundreds of frames, the resulting database can grow quite large. However, the matching process maintains efficiency by
employing an approximate nearest neighbor search that is, at worst, logarithmic in
the size of the database, as described in the following section.
7.3.2.3 Incremental Keypoint Matching
Each frame of the input sequence contains a set of keypoints. These keypoints are
initially generated by a detector and tracked using optical flow, exactly as in the
offline stage. Initially, only the 2D image locations of these keypoints are known.
A descriptor is extracted at each keypoint using the 20-dimensional kernel projections
and matched against the database using a Euclidean nearest neighbor search. To
improve speed, the approximate nearest neighbor search algorithm called Best-Bin-First (BBF) (Lowe, 2004)
is employed, which returns the exact nearest neighbor with high probability while
only requiring time logarithmic in the database size. Using the BBF algorithm, the
nearest and second nearest matches are retrieved and their matching distances are
computed. If both of these matches are associated with the same structure point,
their matching distance ratio should be smaller than a predefined threshold and the
match is accepted.
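The distance-ratio filter itself is straightforward; a brute-force sketch (the actual system uses the BBF search for logarithmic query time, and the 0.8 threshold here is an illustrative assumption):

```python
import numpy as np

def ratio_test_match(db_desc, query_desc, ratio=0.8):
    """Nearest-neighbor matching with the distance-ratio filter.

    db_desc: (N, D) array of database descriptors; query_desc: (M, D).
    Returns (query_index, db_index) pairs for accepted matches.
    """
    matches = []
    for qi, q in enumerate(query_desc):
        dists = np.linalg.norm(db_desc - q, axis=1)
        i1, i2 = np.argsort(dists)[:2]
        if dists[i1] < ratio * dists[i2]:     # accept only unambiguous matches
            matches.append((qi, int(i1)))
    return matches
```

Queries whose nearest and second-nearest database distances are similar are rejected as ambiguous, which is the point of the filter.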
Even with fast approximate nearest-neighbor searches, the matching process is
too time-consuming when applied to all keypoints in an image (typically 300–500).
In order to limit this computational cost in any frame, we adopt the incremental keypoint matching approach described in Section 7.2.3. Incremental keypoint matching
only attempts to match a limited number of keypoints in each frame. The selected
set varies each frame so the set of successful matches gradually accumulates. While
a few matches per frame are seldom sufficient to estimate a robust pose, successful
matches are tracked from frame to frame. By selecting new keypoints for matching
and tracking existing matches, the number of tracked matches accumulates. In our
experiments, about 10 frames are sufficient for accumulating enough matches to
recover a robust camera pose. As the sequence continues, additional matched points
may be computed at each frame.
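The rotating per-frame selection can be sketched as follows (the budget of 30 keypoints per frame is an illustrative assumption, not the chapter's value):

```python
def select_keypoints_to_match(keypoints, already_matched, budget=30, cursor=0):
    """Incremental matching sketch: each frame, attempt at most `budget`
    unmatched keypoints, rotating a cursor through the candidate list so
    the selected set varies per frame and matches accumulate over time."""
    candidates = [k for k in keypoints if k not in already_matched]
    if not candidates:
        return [], cursor
    selected = []
    for i in range(min(budget, len(candidates))):
        selected.append(candidates[(cursor + i) % len(candidates)])
    new_cursor = (cursor + budget) % len(candidates)
    return selected, new_cursor
```

Successful matches leave the candidate pool (they are tracked thereafter), so over roughly ten frames the match set grows large enough for robust pose recovery.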
7.3.2.4 Camera Pose Estimation
Each successful database match produces a correspondence between a 2D image
point and 3D structure point. RANSAC is used to fit a camera pose. Outliers are
removed, as these 3D points are deemed incorrect.
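The RANSAC loop is generic over the model being fit; for camera pose the minimal solver would be a three-point (P3P) pose solver. A sketch with a pluggable fit/error interface, containing no pose-specific code:

```python
import random

def ransac(data, fit, error, n_min, n_iters=100, thresh=2.0, seed=0):
    """Generic RANSAC: repeatedly fit a model to a minimal random sample
    and keep the hypothesis with the most inliers (error below thresh),
    then refit on the full inlier set."""
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(n_iters):
        sample = rng.sample(data, n_min)
        model = fit(sample)
        if model is None:                 # degenerate sample
            continue
        inliers = [d for d in data if error(model, d) < thresh]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = model, inliers
    if best_inliers:
        best_model = fit(best_inliers)    # final least-squares refit
    return best_model, best_inliers
```

For pose estimation, `data` would be the 2D-3D correspondences, `fit` the minimal pose solver, and `error` the reprojection distance in pixels.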
Keypoints are tracked through the sequence to enable pose recovery in subsequent frames. Some of these points are lost as they drift out of view or due to track
failures. At the same time, however, incremental matching adds new matches at each
frame, trying to maintain a match set large enough to compute a reliable pose.
7.3.2.5 Incorporating Unmatched Keypoints
The process described thus far only uses points whose 3D locations are known in
advance and stored in the feature database. Most of the keypoints in the input image
are never successfully matched against the database. Incorporating these points into
the pose estimation process can significantly improve the quality of the final results.
An unmatched keypoint in a single frame represents a ray in 3D space, and thus
can be associated with any point along that ray. If that same keypoint is tracked over
several frames, each with a known pose, its location in 3D space can be estimated.
In fact, such estimates are precisely the input to the subtrack optimization algorithm
used during the offline stage. Thus, once a pose is recovered for at least 10 consecutive frames, the subtrack optimization algorithm is applied to all keypoint tracks that
are not matched to the database. The partitioned keypoints now have associated 3D
locations and can be applied to the pose estimation of future frames. This results in
two distinct sets of matches to compute pose, that is, database matches and projection matches.
The advantages of including projection matches in the pose estimation process
are twofold. First, there are times that the camera views an area that was not included
in the model building phase, in which case database matches alone are insufficient
to recover pose. In such cases, pose is computed from projection matches since they
persist without database matches.
The second advantage is that even when many database matches are found, additional matches result in a smoother and more reliable pose. Typically, there are many
times more projection matches than database matches. Residual errors exist in all
keypoint position estimates, so a larger number of correspondences makes the final
least-squares fit more reliable and smoothly varying.
7.3.2.6 Experiments
Experiments demonstrate various behaviors of the online and offline stages of the
tracking system. The primary focus of all of these tests is to show that the robust
SFM tracking approach makes a substantial, measurable difference in the end results
(Mooser, 2009). Both stages were thus compared with and without subtrack optimization. In all test cases, one video sequence was captured for the offline stage and a
separate, longer video was used for the online stage.
Three test results are shown to demonstrate the system's performance with different scenes. The first test, the Fuse Box sequence in Figure 7.5a, shows the exterior of
an electrical fuse box in an industrial environment. The scene contains a mixture of
planar and nonplanar surfaces. The second test, the AC Motor sequence in Figure
7.5b, targets an irregularly shaped object. Although the ground surrounding the
motor is flat, it is mostly covered in gravel, and does not contain many easily identified features. The final case, the building sequence in Figure 7.5c, shows an outer
building scene comprising both natural and man-made objects. The annotations are
virtual labels showing the way to nearby points of interest.

FIGURE 7.5 Sample tracking and augmentation scenes for all three test cases: (a) the fuse
box is tracked from a variety of orientations, not all of which were covered in the training
process; (b) the AC motor is tracked through a nearly 180° rotation, while the keypoint dataset
was built from only one side of the motor; and (c) the building scene contains both natural and
man-made objects, making the camera tracking extremely challenging.
Table 7.1 shows the RMS reprojection error of all database keypoints produced
in the offline stage. Without subtrack optimization, keypoint tracks are never partitioned; tracks are simply terminated when their error exceeds a threshold. Every
track thus corresponds to a single structure point. If the track drifts significantly
over the whole sequence, the structure point is poorly defined and produces a large
reprojection error, as reflected in the results. In all tests, the average optimized error
is lower than in the unoptimized cases.
Moreover, when running the offline stage with no optimization, the keypoint
database has far fewer total keypoints. The reason for this is that a keypoint track
that drifts significantly cannot be fit to any single structure point. Without subtrack optimization, such a track does not contribute any points to the database.
Subtrack optimization, however, may find multiple subtracks that each have valid
structure points, all of which can go into the database.
Table 7.2 shows the results of recognizing and tracking in the online stage.
Without subtrack optimization, pose refinement relies only on database matches and
no projection matches are used. As in the offline stage, pose accuracy is measured
by average reprojection error of all keypoints in all frames.
While there is significant error reduction with optimization, the absolute error is not as
low as in the offline stage. This is largely due to residual errors in the keypoint database.
TABLE 7.1
Offline Stage Error Measurement

                             Fuse Box    AC Motor    Buildings
No optimization
  Average subtrack length      15.48       24.97       20.89
  Reprojected error             1.17        0.45        0.85
With optimization
  Average subtrack length      21.36       26.23       25.28
  Reprojected error             0.64        0.37        0.54
TABLE 7.2
Online Stage Error Measurement

                             Fuse Box    AC Motor                         Buildings
Length (frames)                 600         444                              350
No optimization
  Average inliers              50.38       46.11                            68.79
  Reprojected error             4.43        2.08 (failed after 269 frames)   4.49 (failed after 206 frames)
With optimization
  Average inliers             438.45      300.61                           438.41
  Reprojected error             1.91        0.56                             2.06
Since the set of keypoints detected during the online stage is a subset of those found
when building the model, even the best matches can have an error of a few pixels. Table
7.2 also compares the average number of inlier keypoints available with and without
subtrack optimization. It shows that the optimization step greatly increases the total
number of projection matches available for use in pose calculation. This larger number
of observations to estimate pose makes the computation more reliable and robust to
individual errors, significantly improving the quality of the final results.
Figure 7.5 shows sample pose tracking and augmentation results for all three test
scenes. Camera poses are accurately estimated so the virtual objects are well aligned
with the real scenes. Note that all the tests involved moving the online camera to
areas outside of the areas viewed during the offline stage. During those movements,
the system is unable to generate database matches. Without the use of projection
matches, camera pose and tracking fail immediately. Using projection matches, however, the system is able to continue tracking, although errors will accumulate in the
absence of any visible database features. As database features become visible again,
the accumulated tracking errors are corrected.
FIGURE 7.6 Dense 3D point cloud model (a) captured by a ground LiDAR system and a
camera image (b) of a corner in downtown Los Angeles.
and their visual descriptors are computed. These keypoints are then projected back
onto the point cloud data to obtain their 3D positions. In a second step, the camera
pose for an input video frame is estimated by corresponding image keypoints and
the back-projected keypoints.
7.4.1.1 Generate Synthetic Images of Point Clouds
A set of virtual viewpoints is arranged to face the major plane surfaces of the point
cloud model. These views uniformly sample viewing directions and logarithmically
sample viewing distances, as shown in Figure 7.7a. Experiments show that six viewing
directions and three viewing distances are sufficient for facade scenes with one major
plane surface.
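A sketch of such viewpoint placement for a single facade plane (all numeric values here are illustrative assumptions, not the chapter's parameters):

```python
import numpy as np

def sample_viewpoints(center, normal, n_dirs=6, d_min=5.0, d_max=40.0, n_dists=3):
    """Place virtual viewpoints facing a planar facade.

    Directions are sampled uniformly in a fan about the plane normal;
    distances are sampled logarithmically between d_min and d_max.
    """
    dists = np.geomspace(d_min, d_max, n_dists)      # logarithmic spacing
    angles = np.linspace(-60, 60, n_dirs)            # degrees about the normal
    views = []
    for d in dists:
        for a in np.radians(angles):
            # rotate the (unit) normal about the vertical (y) axis by angle a
            c, s = np.cos(a), np.sin(a)
            R = np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])
            views.append(center + d * (R @ normal))
    return np.array(views)
```

With six directions and three distances this yields 18 virtual cameras per facade, matching the sampling density the experiments found sufficient.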
Once viewpoints are defined, the 3D point models are rendered onto each 2D
image plane using ray casting and Z-buffers to handle occlusions. Figure 7.7b shows
examples of synthetic images generated from the Los Angeles point cloud dataset
shown in Figure 7.6.
7.4.1.2 Extract Keypoint Features
Keypoint features and their associated SIFT (Scale Invariant Feature Transform) visual
descriptors are extracted in each synthetic view image. The extracted features are reprojected onto the 3D point clouds by finding intersections with the first plane that is obtained
through a plane segmentation method (Stamos and Allen, 2002). It is possible that the
same feature is reprojected to different 3D coordinates from different synthetic views.
Nearby feature points are filtered so that proximate points with similar descriptors are
merged into one feature. The final output is a set of 3D keypoint features with associated
visual descriptors saved in the feature database for online matching and pose estimation.
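The merging of proximate reprojected features can be sketched greedily (the position and descriptor tolerances are illustrative assumptions):

```python
import numpy as np

def merge_duplicate_features(points, descs, pos_tol=0.05, desc_tol=0.3):
    """Greedy merge of reprojected features: points closer than pos_tol
    whose descriptor distance is under desc_tol collapse into one
    averaged feature, so the same physical point reprojected from
    several synthetic views yields a single database entry."""
    kept_pts, kept_descs = [], []
    for p, d in zip(points, descs):
        for i, (q, e) in enumerate(zip(kept_pts, kept_descs)):
            if np.linalg.norm(p - q) < pos_tol and np.linalg.norm(d - e) < desc_tol:
                kept_pts[i] = (q + p) / 2          # merge by averaging
                kept_descs[i] = (e + d) / 2
                break
        else:
            kept_pts.append(p.copy())
            kept_descs.append(d.copy())
    return np.array(kept_pts), np.array(kept_descs)
```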
7.4.1.3 Estimate Camera Pose
Given an input image, its keypoint features are extracted and matched against the 3D
keypoints in the feature dataset. The surface normal of each matched feature is computed
and used for clustering features. A modified RANSAC method is employed to estimate the camera pose and remove outliers. Rather than maximizing the number of
inliers in consensus with the pose hypothesis, we make the following modifications.
FIGURE 7.7 (a) Virtual viewpoint arrangement and (b) synthetic images produced from a
3D point cloud.
Inliers are clustered according to their normal directions, so that inliers with
close normal directions are grouped into the same cluster. Let N1 and N2 be the numbers
of inliers in the largest two clusters. Among all the hypothesized poses, we want to
maximize the value of N2:

\[
[R \,|\, T] = \arg\max_{[R \,|\, T]} N_2 \tag{7.4}
\]

This promotes solutions with inliers that lie in different planes, avoiding the condition where all features lie in a single plane, which makes the calculated pose unstable and
sensitive to position errors.
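Scoring a hypothesis by N2 only requires clustering its inlier normals; a greedy sketch (the 20° angular threshold is an illustrative assumption):

```python
import numpy as np

def second_cluster_score(normals, angle_thresh_deg=20.0):
    """Score a pose hypothesis by N2, the size of the second-largest
    cluster of inlier surface normals (greedy angular clustering)."""
    cos_t = np.cos(np.radians(angle_thresh_deg))
    centers, counts = [], []
    for n in normals:
        n = n / np.linalg.norm(n)
        for i, c in enumerate(centers):
            if abs(np.dot(n, c)) > cos_t:        # close normal direction
                counts[i] += 1
                break
        else:
            centers.append(n)                    # start a new cluster
            counts.append(1)
    counts.sort(reverse=True)
    return counts[1] if len(counts) > 1 else 0   # N2; 0 if one plane only
```

A hypothesis whose inliers all share one normal direction (a single plane) scores zero, so it can never win over a two-plane solution.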
generate a new improved pose estimate. In the first iteration, SIFT features are used,
but Harris features are employed in the following iterations to improve processing
speed. Harris features are less distinctive but much faster to compute. For each feature point, a normalized intensity histogram within an 8 × 8 patch is computed as a
feature descriptor and used for matching. We search for corresponding points within
a neighborhood of H × H pixels. Initially, H is set to 64 pixels. The search size is
reduced by half in each iteration, down to 4 × 4 (16 pixels), as more accurate pose estimates
are obtained.
The pose refinement is accomplished through an optimization process that minimizes an error function of feature descriptors derived from the input image and the
projected images of point clouds:

\[
E = \frac{1}{N} \sum_{i} \bigl( s\, I_{3D}(i) - I_{2D}(i) \bigr)^2 \tag{7.5}
\]

where
I_{3D}(i) and I_{2D}(i) are descriptors for the ith feature on the projected image and input
image, respectively, N is the number of features, and
s is a scale factor compensating for reflectance or lighting effects.
s takes the value for which Equation 7.5 is equivalent to

\[
E = \frac{1}{N} \cdot \frac{\sum_i I_{3D}(i)^2 \sum_i I_{2D}(i)^2 - \bigl( \sum_i I_{3D}(i)\, I_{2D}(i) \bigr)^2}{\sum_i I_{3D}(i)^2} \tag{7.6}
\]
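The value of s follows from setting the derivative of Equation 7.5 with respect to s to zero; a short derivation in the notation above:

```latex
\frac{\partial E}{\partial s}
  = \frac{2}{N}\sum_i I_{3D}(i)\,\bigl(s\,I_{3D}(i) - I_{2D}(i)\bigr) = 0
\quad\Longrightarrow\quad
s = \frac{\sum_i I_{3D}(i)\, I_{2D}(i)}{\sum_i I_{3D}(i)^{2}}.
```

Substituting this s back into Equation 7.5 yields Equation 7.6.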
7.4.3 Experiments
Experiments evaluate the system's performance with various real data. Reprojection
error in the image domain is measured to evaluate the accuracy of camera pose estimation. Figure 7.8 demonstrates the behavior of iterative pose estimation and refinement in the tracking system. The left column of Figure 7.8 shows the rendered
image of the 3D point cloud model from the estimated camera pose. The middle
column shows the rendered point cloud image aligned with the camera image. The
alignment accuracy depends on the accuracy of the camera pose estimate. A pixel-difference image is shown in the right column. Alignment errors are clearly reduced
after each iteration.
The number of iterations needed is usually small. Our experiments show that
measured projection errors often remain constant after three iterations. This indicates that sufficient correspondences have been obtained after three iterations for
computing a stable and accurate camera pose.
FIGURE 7.8 Iterative estimation of camera pose: the left column shows input 3D point
cloud model rendered from the estimated camera pose, the middle column shows the model
image aligned with the camera image, and the right column shows the pixel-difference image
that illustrates the accuracy of pose estimation.
Figure 7.9 shows another example of tracking and augmentation results for a
video image using a 3D point cloud model. An initial camera pose is automatically
obtained from keypoint matches. After three iterations of pose refinements, accurate
camera poses are estimated so the virtual models are well aligned with the real
scenes.
FIGURE 7.9 A video image (a) and a 3D point cloud model (b) rendered with an initial
camera pose. Final pose is estimated after three iterations (c), allowing accurate alignment of
the 3D model with the image (d).
ACKNOWLEDGMENT
Much of this work is the PhD research of members of the Computer Graphics
and Immersive Technology (CGIT) lab at the University of Southern California.
In particular, we relied on the works of Dr. Wei Guan, Dr. Jonathan Mooser, and
Dr. Quan Wang. We are also grateful to the current and former project sponsors,
including the U.S. Army Research Office (ARO), the Office of Naval Research
(ONR), the National Geospatial-Intelligence Agency (NGA), DARPA, NASA,
Nokia, Airbus, and Korean Air Corp.
REFERENCES
Ababsa, F. and Mallem, M., Robust camera pose estimation using 2D fiducials tracking
for real-time augmented reality systems. Proceedings of the 2004 ACM SIGGRAPH
International Conference on Virtual Reality Continuum and Its Applications in Industry,
Singapore, June 16–18, 2004, pp. 431–435.
Azuma, R., Baillot, Y., Behringer, R., Feiner, S., Julier, S., and MacIntyre, B., Recent advances
in augmented reality. IEEE Computer Graphics and Applications, 21(6): 34–47,
November/December 2001.
Bleser, G., Wuest, H., and Stricker, D., Online camera pose estimation in partially known
and dynamic scenes. International Symposium on Mixed and Augmented Reality, Santa
Barbara, CA, October 22–25, 2006, pp. 56–65.
Cho, Y. and Neumann, U., Multi-ring fiducial systems for scalable fiducial-tracking augmented reality. PRESENCE: Teleoperators and Virtual Environments, 10(6): 599–612,
December 2001.
Claus, D. and Fitzgibbon, A. W., Reliable fiducial detection in natural scenes. Proceedings
of European Conference on Computer Vision, Prague, May 11–14, 2004, pp. 469–480.
Comport, A. I., Marchand, E., Pressigout, M., and Chaumette, F., Real-time markerless tracking for augmented reality: The virtual visual servoing framework. IEEE Transactions on
Visualization and Computer Graphics, 12(4): 615–628, July 2006.
Feiner, S., Korah, T., Murphy, D., Parameswaran, V., Stroila, M., and White, S., Enabling
large-scale outdoor mixed reality and augmented reality. International Symposium on
Mixed and Augmented Reality, Basel, Switzerland, October 26–29, 2011, p. 1.
Guan, W., You, S., and Neumann, U., Efficient matchings and mobile augmented reality.
ACM Transactions on Multimedia Computing, Communications and Applications
(TOMCCAP), special issue on 3D Mobile Multimedia, 8(47): 1–15, September 2012.
Guan, W., You, S., and Pang, G., Estimation of camera pose with respect to terrestrial LiDAR
data. IEEE Workshop on the Applications of Computer Vision (WACV), Tampa, FL,
January 15–17, 2013, pp. 391–398.
Hsiao, E., Collet, A., and Hebert, M., Making specific features less discriminative to improve
point-based 3D object recognition. IEEE Conference on Computer Vision and Pattern
Recognition, San Francisco, CA, June 13–18, 2010, pp. 2653–2660.
Jiang, B., You, S., and Neumann, U., Camera tracking for augmented reality media. IEEE
International Conference on Multimedia and Expo, New York, July 30–August 2, 2000,
pp. 1637–1640.
Lowe, D. G., Distinctive image features from scale-invariant keypoints. International Journal
of Computer Vision, 60: 91–110, 2004.
Mooser, J., You, S., and Neumann, U., Tricodes: A barcode-like fiducial design for augmented
reality media. IEEE International Conference on Multimedia and Expo, Toronto,
Ontario, Canada, July 9–12, 2006, pp. 1301–1304.
Mooser, J., You, S., and Neumann, U., Fast simultaneous tracking and recognition using
incremental keypoint matching. International Symposium on 3D Data Processing,
Visualization, and Transmission, Atlanta, GA, June 18–20, 2008.
Mooser, J., You, S., Neumann, U., Grasset, R., and Billinghurst, M., A dynamic programming
approach to structure from motion in video. Asian Conference on Computer Vision,
Xi'an, China, September 23–27, 2009a, pp. 1–10.
Mooser, J., You, S., Neumann, U., and Wang, Q., Applying robust structure from motion to
markerless augmented reality. IEEE Workshop on Applications of Computer Vision
(WACV), Snowbird, UT, December 7–8, 2009b, pp. 1–8.
Neumann, U. and You, S., Natural feature tracking for augmented reality. IEEE Transactions
on Multimedia, 1(1): 53–64, 1999.
Platonov, J., Heibel, H., Meier, P., and Grollmann, B., A mobile markerless AR system for
maintenance and repair. International Symposium on Mixed and Augmented Reality,
Santa Barbara, CA, October 22–25, 2006, pp. 105–108.
Pylvänäinen, T., Berclaz, J., Korah, T., Hedau, V., Aanjaneya, M., and Grzeszczuk, R., 3D
city modeling from street-level data for augmented reality applications. International
Conference on 3D Imaging, Modeling, Processing, Visualization and Transmission
(3DIMPVT), Zurich, Switzerland, October 13–15, 2012, pp. 238–245.
Shibata, F., Ikeda, S., Kurata, T., and Uchiyama, H., An intermediate report of TrakMark WG: International voluntary activities on establishing benchmark test schemes for AR/MR
geometric registration and tracking methods. International Symposium on Mixed and
Augmented Reality, Seoul, Korea, October 13–16, 2010, pp. 298–302.
Stamos, I. and Allen, P. K., Geometry and texture recovery of scenes of large scale. Computer
Vision and Image Understanding, 88(2): 94–118, 2002.
Uchiyama, H. and Marchand, E., Object detection and pose tracking for augmented reality:
Recent approaches. 18th Korea-Japan Joint Workshop on Frontiers of Computer Vision,
Kawasaki, Japan, February 2012, pp. 1–8.
Wagner, D., Reitmayr, G., Mulloni, A., Drummond, T., and Schmalstieg, D., Pose tracking
from natural features on mobile phones. Proceedings of the International Symposium
on Mixed and Augmented Reality (ISMAR), Cambridge, U.K., September 15–18, 2008,
pp. 125–134.
CONTENTS
8.1 Introduction
8.2 Outdoor Panoramic Capture
8.2.1 Guidelines for Capture
8.3 Automatic 3D Modeling
8.3.1 Image Extraction
8.3.2 3D Reconstruction Pipeline
8.4 Semiautomatic Geo-Alignment
8.4.1 Vertical Alignment
8.4.2 Ground Plane Determination
8.4.3 Map Alignment
8.5 Tracking the Model
8.5.1 Image Representation
8.5.2 Camera Model
8.5.3 Point Correspondence Search
8.5.4 Pose Update
8.5.5 Success Metric
8.5.6 Live Keyframe Sampling
8.6 Tracker Initialization
8.6.1 Image-Based Method
8.6.2 Feature-Based Method
8.7 Server/Client System Design
8.7.1 Server/Client System Overview
8.7.2 Latency Analysis
8.7.3 Sensor Integration
8.8 Evaluation
8.8.1 Speed
8.8.2 Accuracy Tests with Differential GPS
8.8.3 Augmentation Examples
8.9 Discussion
8.10 Further Reading
References
8.1 INTRODUCTION
This chapter explains how to digitally capture, model, and track large outdoor spaces
so that they can be used as environments for mobile augmented reality (AR) applications. The three-dimensional (3D) visual model of the environment is used as a
database for image-based pose tracking with a handheld camera-equipped tablet.
Experimental analysis demonstrates that real-time localization with high accuracy
can be achieved from models created using a small panoramic camera.
Device positioning is a common prerequisite for many AR applications. Indoors,
visual detection of flat, printed markers has proven to be a very successful method for
accurate device positioning, at least for a small workspace. In larger spaces, external
tracking systems allow for precise positioning by use of statically mounted cameras
that observe objects moving in the space. Outdoors, however, we cannot require that
the environment be covered in printed markers or surrounded by mounted cameras. The global positioning system (GPS) provides ubiquitous device tracking from
satellites, but does not guarantee enough accuracy on consumer-level devices for
AR applications. This chapter presents an alternative approach that treats the built
environment like an existing visual marker. By detecting and tracking landmark features on the building facades, the system uses the surrounding buildings for accurate
device positioning in the same way that printed markers are used indoors, except at
a larger scale.
Visual modeling and tracking technology is based on the relationship between
points in the scene and cameras that observe them. Having images of the same point
from multiple known camera positions allows us to determine the 3D location of
the point, as depicted in Figure 8.1. Conversely, observing multiple known points
in a single image allows us to determine the position of the camera, as depicted in
Figure 8.2. The first case is useful for building a 3D model of an environment. The
second case is useful for determining the location of a camera with respect to that
model. Researchers in the fields of photogrammetry and multiple-view geometry have
studied the equations and principles governing these relationships extensively. This
chapter describes a system that applies these principles to model a large outdoor space
and track the position of a camera-equipped mobile device moving in that space.
The process for preparing an outdoor environment for use in AR tracking
involves three basic steps. First, the area is captured in many photographs that cover the environment from all possible viewpoints. Second, from this
collection of photographs, feature points are extracted and matched between images,
and their 3D positions are precisely determined in an iterative process. Third, the 3D
points are aligned with a map of building outlines to provide the reconstruction with
a scale in meters and a global position and orientation.
The resulting 3D reconstruction is stored on a server and transferred to the client
device. After computing a tracker initialization on the server, the software on the
mobile client device tracks 3D feature points in the live camera view to continuously
determine the position and orientation of the device.
The following sections provide detailed descriptions and evaluations of these
system components. Sections 8.2 through 8.4 cover the outdoor modeling process.
Sections 8.5 through 8.7 describe the outdoor tracking system. Section 8.8 provides
FIGURE 8.1 Triangulation of a 3D point from observations in two cameras. The distance
between the cameras is called the baseline; the estimate is more accurate with a larger baseline.
FIGURE 8.2 Localization of a camera from three 3D point observations. The estimate is
generally more accurate when the points are nearer to the camera.
quantitative evaluations of the system, and Section 8.9 provides a discussion of the
overall system design and performance. Finally, Section 8.10 gives an annotated reference list for interested readers who would like to explore related work.
8.3.1 Image Extraction
There are several common panoramic image representations that could be used to
store the image sequences. Mappings such as spherical and cylindrical projection
offer a continuous representation of all camera rays in one image. However, they
nonlinearly distort the perspective view, which would impact performance when
matching to images from a normal perspective camera, as found on a typical mobile
device.
Instead of using spherical or cylindrical projection, perspective views are
extracted from each panorama, such that the collection of extracted views covers
the entire visual field. A typical cube map used as an environment map in rendering uses six images arranged orthogonally, with 90° horizontal and vertical fields of
view in each image. This representation offers perspective views without distortion.
However, in practice, the low field of view in each image hinders the matching and
reconstruction process.
To address this issue, perspective images with wider than 90° horizontal field
of view are used. The top and bottom of the cube are omitted, since they generally have no usable texture. The faces provide overlapping views, which increases
the likelihood of matching across perspective distortion. Eight perspective views per
panorama are used to increase image matching performance by ensuring that all
directions are covered in a view without severe perspective distortion. The views are
arranged at equal rotational increments about the vertical camera axis. Figure 8.3
shows an image from the panorama camera and its extended cube map representation.
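The view-extraction step can be sketched as follows. This assumes an equirectangular panorama and nearest-neighbor sampling; the 110° field of view and 512-pixel output size are illustrative choices, since the text specifies only views wider than 90° at 45° increments.

```python
import numpy as np

def extract_view(pano, yaw_deg, hfov_deg=110.0, size=512):
    """Sample one perspective view from an equirectangular panorama.
    pano: H x W grayscale equirectangular image.
    yaw_deg: rotation about the vertical axis; eight views at
    45-degree increments cover the full horizon with overlap."""
    H, W = pano.shape[:2]
    f = (size / 2) / np.tan(np.radians(hfov_deg) / 2)  # focal length, px
    # Pixel grid of the output view, centered on the principal point.
    u, v = np.meshgrid(np.arange(size) - size / 2,
                       np.arange(size) - size / 2)
    # Rays in camera coordinates, rotated by the view's yaw.
    yaw = np.radians(yaw_deg)
    x = np.cos(yaw) * u + np.sin(yaw) * f
    y = v
    z = -np.sin(yaw) * u + np.cos(yaw) * f
    # Convert rays to spherical panorama coordinates.
    lon = np.arctan2(x, z)                     # [-pi, pi]
    lat = np.arctan2(y, np.sqrt(x**2 + z**2))  # [-pi/2, pi/2]
    px = ((lon / (2 * np.pi) + 0.5) * (W - 1)).astype(int)
    py = ((lat / np.pi + 0.5) * (H - 1)).astype(int)
    return pano[py, px]

# Eight overlapping views at 45-degree yaw increments.
views = [extract_view(np.zeros((512, 1024)), yaw) for yaw in range(0, 360, 45)]
```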
FIGURE 8.3 (a) An image from the panorama camera. (b–i) Images extracted from the panorama using perspective projection.
FIGURE 8.4 Lines on the buildings (in white) are used to determine a common vanishing
point and align the vertical axis of the reconstruction with the direction of gravity.
8.4.3 Map Alignment
Four degrees of freedom remain: a rotation about the vertical axis, a translation on the XY ground plane, and the metric scaling of the reconstruction. An initialization for these remaining transformation parameters is determined
manually by visually comparing an overhead, orthographic view of the reconstruction with a map of building outlines from the area, which can be freely downloaded
from OpenStreetMap. A simple interactive tool renders the point cloud and building
outlines together. The user interactively rotates, translates, and scales the reconstruction until the points roughly match the buildings.
After the user determines a rough initialization, automatic nonlinear optimization is applied to determine the best fit between 3D points and building walls. Each
point is assigned to the nearest building wall according to the 2D point-line distance.
However, if the point-line distance is greater than 4 m, or the projection of the point
onto the line does not lie on the line segment, then the match is discarded. The
rotation, translation, and scale parameters are iteratively updated to minimize the
point-line distance of all matches, using the Huber loss function for robustness to
outliers. The entire optimization procedure is repeated until convergence to find the
best point-line assignment and 3D alignment.
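The alternating assignment-and-optimization loop might look like the following sketch. The 4 m cutoff and the rejection of matches whose projection falls off the wall segment come from the text; the use of SciPy's `least_squares` with its built-in Huber loss, the log-scale parameterization, and the fixed number of outer iterations are assumptions for illustration.

```python
import numpy as np
from scipy.optimize import least_squares

def point_segment_distance(p, a, b):
    """2D distance from point p to segment ab, or None when the
    projection of p onto the line falls outside the segment
    (such matches are discarded)."""
    ab = b - a
    t = np.dot(p - a, ab) / np.dot(ab, ab)
    if t < 0.0 or t > 1.0:
        return None
    return np.linalg.norm(p - (a + t * ab))

def align(points, walls, x0, max_dist=4.0, outer_iters=5):
    """Fit (theta, tx, ty, log_s): rotation about the vertical axis,
    XY translation, and scale. Each evaluation re-assigns points to
    their nearest wall; the fit uses a robust Huber loss."""
    points = np.asarray(points, dtype=float)
    x = np.asarray(x0, dtype=float)

    def residuals(x):
        th, tx, ty, ls = x
        R = np.array([[np.cos(th), -np.sin(th)],
                      [np.sin(th),  np.cos(th)]])
        q = np.exp(ls) * points @ R.T + np.array([tx, ty])
        r = []
        for p in q:
            ds = [d for a, b in walls
                  if (d := point_segment_distance(p, a, b)) is not None
                  and d < max_dist]
            # Unmatched points contribute nothing (discarded matches).
            r.append(min(ds) if ds else 0.0)
        return np.array(r)

    for _ in range(outer_iters):
        x = least_squares(residuals, x, loss='huber').x
    return x
```

The initialization `x0` corresponds to the rough manual alignment produced with the interactive tool described above.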
An example reconstruction aligned to OpenStreetMap data is shown in Figure 8.5.
The panoramas for this reconstruction were captured by walking in a straight line
through the center of the Graz Hauptplatz courtyard while holding the Ricoh Theta
camera overhead.
FIGURE 8.5 Point cloud reconstruction aligned to OpenStreetMap building data. The gray dots indicate panorama capture locations, and the black points indicate triangulated 3D points.
8.5.1 Image Representation
We refer to the images extracted from the panoramas as keyframes. These keyframes
are extended by preparing an image pyramid, meaning that the image is repeatedly
half-sampled; the stack of images at progressively smaller resolutions is stored and
used to improve patch sampling during tracker operation.
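A minimal sketch of the pyramid construction, using a 2 × 2 box filter for the half-sampling (the text does not specify the downsampling filter):

```python
import numpy as np

def build_pyramid(image, levels=4):
    """Half-sample the image repeatedly with a 2 x 2 box filter;
    level 0 is the full-resolution keyframe."""
    pyramid = [image.astype(np.float32)]
    for _ in range(levels - 1):
        prev = pyramid[-1]
        h, w = (prev.shape[0] // 2) * 2, (prev.shape[1] // 2) * 2
        half = (prev[0:h:2, 0:w:2] + prev[1:h:2, 0:w:2] +
                prev[0:h:2, 1:w:2] + prev[1:h:2, 1:w:2]) / 4.0
        pyramid.append(half)
    return pyramid

pyr = build_pyramid(np.ones((480, 640)), levels=4)
print([p.shape for p in pyr])  # (480, 640), (240, 320), (120, 160), (60, 80)
```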
For each point that passes the culling test, an 8 × 8 pixel template patch is extracted from a keyframe that observes the projected point. This keyframe is called the source, and the camera image is called the target. A perspective warp is used to compensate for the parallax between the source and the target. This perspective warp is determined by the 3D point, X, and its normal, n, which define the plane p = (n1, n2, n3, D)^T, where n·X + D = 0. If the target and source projection matrices are P = [I | 0] and Pi = [Ri | ti], respectively, the 3 × 3 perspective warp is

Wi = Ksource (Ri + ti v^T) Ktarget^-1

where

v = D^-1 (n1, n2, n3)^T

and Ksource and Ktarget are the intrinsic calibration matrices of the source and target images, respectively.

The determinant of the warp, |Wi|, gives the amount of scaling between source and target. This warp is computed for all keyframes that observe the point; the keyframe with warp scale closest to one is chosen as the source. This ensures the best resolution when sampling the template patch. The system also chooses the best level of the source image pyramid according to the resolution required when sampling the template patch.
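The warp computation and source-keyframe selection can be sketched as follows. The warp formula is the plane-induced homography given above; the dictionary layout used for a keyframe is a hypothetical convenience for this sketch.

```python
import numpy as np

def warp_matrix(K_source, K_target, R_i, t_i, n, D):
    """Plane-induced homography W_i = K_source (R_i + t_i v^T) K_target^-1,
    with v = n / D for the plane n . X + D = 0 and target camera P = [I | 0]."""
    v = np.asarray(n, dtype=float) / D
    return K_source @ (np.asarray(R_i) + np.outer(t_i, v)) @ np.linalg.inv(K_target)

def choose_source(keyframes, K_target, n, D):
    """Among the keyframes observing a point, pick the one whose warp
    scale |det W_i| is closest to one, giving the best resolution for
    sampling the template. Keyframes are dicts with 'K', 'R', 't'."""
    best, best_gap = None, float('inf')
    for kf in keyframes:
        W = warp_matrix(kf['K'], K_target, kf['R'], kf['t'], n, D)
        gap = abs(abs(np.linalg.det(W)) - 1.0)
        if gap < best_gap:
            best, best_gap = kf, gap
    return best

# A keyframe coincident with the target yields W = I (scale exactly 1),
# so it is preferred over a keyframe with a different zoom.
kf_same = {'K': np.eye(3), 'R': np.eye(3), 't': np.zeros(3)}
kf_zoom = {'K': np.diag([2.0, 2.0, 1.0]), 'R': np.eye(3), 't': np.zeros(3)}
print(choose_source([kf_zoom, kf_same], np.eye(3), [0.0, 0.0, 1.0], -5.0) is kf_same)  # True
```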
The template patch is used to search the target image for the point's current projected location. We search for this location by computing the normalized cross-correlation between the template patch and patches sampled from the target image at all locations on an 8 × 8 grid around the location given by the pose prior. The location with the best score is accepted if the score exceeds a threshold
prior. The location with the best score is accepted if the score exceeds a threshold
of 0.7, which was experimentally found to adequately separate correct and incorrect
matches. To increase robustness to fast movements, the search for correspondence is
performed over an image pyramid of four levels.
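A single-level sketch of this patch search (the tracker repeats it over four pyramid levels); the 8 × 8 search grid around the prior and the 0.7 threshold come from the text, while the exhaustive double loop is a simplification.

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation of two equal-size patches."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return (a * b).sum() / denom if denom > 0 else 0.0

def search_patch(template, image, cx, cy, radius=4, threshold=0.7):
    """Evaluate NCC at an 8 x 8 grid of candidate positions around the
    prior location (cx, cy); accept the best score above the threshold,
    otherwise report no match."""
    h, w = template.shape
    best_score, best_pos = -1.0, None
    for dy in range(-radius, radius):
        for dx in range(-radius, radius):
            y, x = cy + dy, cx + dx
            if y < 0 or x < 0 or y + h > image.shape[0] or x + w > image.shape[1]:
                continue
            s = ncc(template, image[y:y + h, x:x + w])
            if s > best_score:
                best_score, best_pos = s, (x, y)
    return best_pos if best_score >= threshold else None
```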
8.5.4 Pose Update
After searching for correspondences, the camera pose estimate is updated to fit the
measurements. All correspondences found during patch search are used to update the
camera pose, even if they were not successfully refined to the lowest pyramid level.
For each observed point, we project its search location down to the zero pyramid level, giving a measured location xi for point Xi. Ten iterations of gradient descent over an M-estimator are used to minimize the re-projection error of all points:

e = Σi m(yi − xi)

where
yi is the projected location of Xi using the current pose estimate
m(u) is the Tukey loss function (Huber, 1981)

The parameters of the Tukey loss function are recomputed to update the weights after each of the first five iterations.
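The reweighting scheme can be illustrated with a toy version of the pose update that estimates only a 2D image-space offset rather than a full 6DOF pose. The Tukey biweight and the recompute-the-scale-for-five-iterations schedule follow the text; the median-based scale estimator and the cutoff constant 4.685 are standard choices assumed here.

```python
import numpy as np

def tukey_weight(r, c):
    """Tukey biweight: smoothly down-weights residuals and gives zero
    weight beyond the cutoff c, so gross outliers are ignored."""
    w = (1.0 - (r / c) ** 2) ** 2
    return np.where(np.abs(r) < c, w, 0.0)

def robust_offset(projected, measured, iters=10):
    """Toy pose update: estimate a 2D offset between projected points
    y_i and measured points x_i by iteratively reweighted least squares
    under the Tukey loss. The cutoff is recomputed from the residual
    scale during the first five iterations only."""
    t = np.zeros(2)
    c = 1.0
    for i in range(iters):
        r = measured - (projected + t)     # per-point residuals
        norms = np.linalg.norm(r, axis=1)
        if i < 5:                          # re-estimate the robust scale early on
            c = 4.685 * np.median(norms) / 0.6745 + 1e-9
        w = tukey_weight(norms, c)
        t = t + (w[:, None] * r).sum(axis=0) / (w.sum() + 1e-9)
    return t

rng = np.random.default_rng(0)
pts = rng.uniform(0, 100, (50, 2))
meas = pts + np.array([3.0, -2.0]) + rng.normal(0, 0.1, (50, 2))
meas[:5] += 40.0                 # five gross outliers
print(robust_offset(pts, meas))  # close to [3, -2]
```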
8.5.5 Success Metric
After the pose update, the number Nfound of points found to be inliers by the
M-estimator is counted. This is an indicator of the success of the tracker, that is,
whether the pose posterior matches the true pose of the camera. The system requires that at least 100 points have been successfully tracked (Nfound ≥ 100).
To ensure an acceptable frame rate, a limit Nmax is placed on the number of points
Nattempted that the tracker can attempt to find in a single frame. The system first performs view frustum culling on all points, and then selects Nattempted ≤ Nmax points to
search for. The point selection is performed by choosing some ordering of visible
points and selecting the first Nmax points from the ordering to be tracked. The question is, which ordering ensures the best tracking performance?
One commonly used approach is to randomly shuffle all the visible points at each
frame. However, this can lead to pose jitter when the camera is not moving. The
reason is that using different subsets of the points for tracking may result in slightly
different poses found by the gradient descent minimization, because of slight errors
in patch search.
A solution to this problem is to randomly order the points once at system startup.
This provides a fixed, but random, ordering for the points at each frame. The result
is that for a static or slowly moving camera, the tracker will reach a steady state
where the same subset of points is used for tracking and pose update at each frame.
Overall, this sampling procedure reduces pose jitter in comparison to sampling a
new random ordering of points at each frame.
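A sketch of this fixed-random-ordering selection (the class layout is illustrative):

```python
import random

class PointSelector:
    """Shuffle the point ordering once at startup; each frame, take the
    first n_max visible points in that fixed order. A static camera then
    tracks the same subset every frame, which reduces pose jitter."""

    def __init__(self, num_points, seed=42):
        self.order = list(range(num_points))
        random.Random(seed).shuffle(self.order)  # once, at startup

    def select(self, visible_ids, n_max):
        visible = set(visible_ids)
        return [i for i in self.order if i in visible][:n_max]

sel = PointSelector(10)
a = sel.select(range(10), 4)
b = sel.select(range(10), 4)
print(a == b)  # True: deterministic for the same visible set
```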
8.6.1 Image-Based Method
The image-based localization method is relatively simple and is easily implemented.
A cache of recently seen images is stored along with their known poses. When the
tracking system fails and enters the lost state, the system matches the current image
to the cache to find pose hypotheses. The best matching image is used as a pose prior
to start the tracker at the next frame.
The image cache is generated during tracker operation and can be saved for reuse
in future tracking sessions. During tracker operation, a tracked image is added to the
cache when tracking is successful and the closest keyframe in the cache is more than
1 m different in position or 45° different in orientation.
To find image matches, a variant of the small blurry image (SBI) matching procedure is used (Klein and Murray, 2008). Images in the cache are down-sampled to the
fifth image pyramid level (meaning that they are half-sampled four times). This same
down-sampling is applied to the current query image from the camera. Then each
cache image is compared to the query image using the normalized cross-correlation score (Gonzalez and Woods, 2007). The pose of the image with the highest correlation score is used as the initialization for the tracker in the next frame.
The SBI localization method is suitably fast even for a large number of cache
images. However, it requires the query camera to be relatively close to a cache image.
Beyond a small amount of translation or rotation, the query frame will not match to
any cache image. Thus, this method is impractical for outdoor localization in a large
space, unless a very dense coverage of cache images is acquired.
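A sketch of the SBI relocalization step, using a 2 × 2 box filter for the four half-samplings (the filter choice is an assumption):

```python
import numpy as np

def small_blurry(image, halvings=4):
    """Down-sample to the fifth pyramid level (half-sampled four times)
    using a 2 x 2 box filter."""
    img = image.astype(np.float32)
    for _ in range(halvings):
        h, w = (img.shape[0] // 2) * 2, (img.shape[1] // 2) * 2
        img = (img[0:h:2, 0:w:2] + img[1:h:2, 0:w:2] +
               img[0:h:2, 1:w:2] + img[1:h:2, 1:w:2]) / 4.0
    return img

def relocalize(query, cache):
    """cache is a list of (sbi, pose) pairs; return the pose whose small
    blurry image has the highest normalized cross-correlation with the
    query's."""
    q = small_blurry(query)
    qz = q - q.mean()

    def score(sbi):
        sz = sbi - sbi.mean()
        d = np.sqrt((qz * qz).sum() * (sz * sz).sum())
        return (qz * sz).sum() / d if d > 0 else 0.0

    return max(cache, key=lambda entry: score(entry[0]))[1]
```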
8.6.2 Feature-Based Method
Alternatively, feature matching can be used to extend the range of poses where
localization can be achieved. Given a query image, the system extracts features and
searches for correspondences in the panorama keyframe database. Then, a robust
sampling procedure is used to find a subset of inlier correspondences that support a
common pose estimate.
Each feature from the query image is matched to its nearest neighbor in the set
of all features in the database according to the Euclidean distance between SIFT
descriptors. Approximate nearest-neighbor search is performed using a kd-tree for
speed (Lowe, 2004). Then the camera pose is robustly estimated using the PROSAC
procedure (Chum and Matas, 2005) and the three-point absolute pose algorithm
(Fischler and Bolles, 1981).
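The descriptor-matching stage can be sketched with SciPy's kd-tree (exact rather than approximate nearest neighbors, for brevity). The distance-ratio test is an added assumption here, since the text specifies only nearest-neighbor matching, and the PROSAC pose stage is not shown.

```python
import numpy as np
from scipy.spatial import cKDTree

def match_features(query_desc, db_desc, db_points, ratio=0.8):
    """Match each query descriptor to its nearest database descriptor
    via a kd-tree, returning (query_index, 3D point) pairs that pass
    a distance-ratio test. These 2D-3D matches would then feed the
    PROSAC + three-point pose stage."""
    tree = cKDTree(db_desc)
    dists, idx = tree.query(query_desc, k=2)   # two nearest neighbors
    matches = []
    for qi, ((d1, d2), (i1, _)) in enumerate(zip(dists, idx)):
        if d1 < ratio * d2:                    # keep unambiguous matches only
            matches.append((qi, db_points[i1]))
    return matches

matches = match_features(np.array([[0.1, 0.1]]),
                         np.array([[0.0, 0.0], [10.0, 10.0], [20.0, 0.0]]),
                         ['A', 'B', 'C'])
print(matches)  # [(0, 'A')]
```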
An alternative approach is to apply a document retrieval technique (Sivic and
Zisserman, 2003). A vocabulary tree (Nistr and Stewenius, 2006) is used to hierarchically organize the descriptors so that each descriptor is identified by its cluster (or word).
Given a query frame, standard tf-idf weighted document matching is applied to order the
keyframes by similarity (Sivic and Zisserman, 2003). The top K documents are then subjected to geometric pose verification to find a suitable match. For each top-ranked document, the nearest-neighbor matching and pose estimation procedure described earlier
is performed, using only features from the single image that was retrieved. This image
retrieval approach scales better with database size than performing nearest-neighbor matching against the entire database. However, there must exist in the database a single view that has enough visual overlap with the query for the procedure to work. Irschara et al. developed a method to increase the set of views in the database synthetically, which increases the range of the image retrieval technique (Irschara et al., 2009).
[System overview diagram: an omnidirectional video is stored on the server, where offline reconstruction produces keyframes and 3D points that are copied to the mobile client. On the device, the camera's video stream, the orientation sensors' 3DOF relative rotation, patch projection, and live keyframe sampling feed the tracker, which estimates a dynamic 6DOF absolute pose for the AR display. Localization requests and responses are exchanged with the server over the wireless network.]
The tracking system runs in real time on the client device in the following loop.
First, the system tries to track the model using the previous pose estimate. The incremental rotation estimate provided by the inertial sensors in the device is preapplied
to the previous pose estimate to compensate for fast motion. If tracking fails, then the
image cache is searched using the current camera image. Tracking is then tried again
using the pose prior provided by the best match from the image cache. If this fails,
then the system generates a localization query that is sent to the server over the wireless network. While the server processes the query, the system continues attempting
to restart tracking using the image cache. When the query response is received, the
computed pose is used to restart tracking.
8.7.2 Latency Analysis
Due to network communication time and server computation time, feature-based
localization introduces latency between an image query and a pose response. During
this time, the camera might be moved from its query position, introducing error in
the localization pose estimate. Thus, the system needs some ability to handle an
outdated pose estimate from the localization system.
The region of convergence of the tracker determines the amount of error in the
pose prior that the system can tolerate. The continuous pose tracker uses a patch
search method to find a point given a pose prior. This search occurs over a fixed
region around the estimated point projection location, and is run over an image pyramid to expand the search region. This establishes a maximum pixel error in the
projected point location that will still lead to tracker convergence.
We use a simplified analysis here by considering movement in one dimension, to
produce an estimate of the tracker convergence region.
Assuming rotation around the Y-axis (vertical axis), a rotational error of θerr degrees will cause a pixel offset of xerr pixels:

xerr = f tan(θerr)

where f is the focal length parameter of the camera's intrinsic calibration matrix. The maximum projection error can be used to find the maximum rotational pose error θmax.

The system uses an effective search radius of 4 × 2^3 = 32 pixels, and the Apple iPad 2 camera used for testing has a focal length of f = 1179.90 pixels. Thus, the maximum rotational pose error is θmax = 1.55°. This limit could be a problem if localization latency is 1 s or more.
For the translation case, the maximum translation tX depends on the distance Z to the observed object:

xerr = f tX / Z

For the iPad 2 camera, the maximum translation is tX/Z = 0.03. Given a building that is 12 m away, the maximum translation would be about 1/3 m. This, too, would be a limitation for localization, given the distance a fast-walking user could cover in 1 s.
This analysis suggests in general that the complete time for the localization query to be sent, processed, and returned (the localization latency) should be within 1 s. Timing data from our experiments is given in Section 8.8.1.
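The arithmetic above can be checked directly:

```python
import math

f = 1179.90          # iPad 2 focal length, in pixels
x_err = 4 * 2**3     # effective search radius: 32 pixels over 4 pyramid levels

# Rotation: x_err = f * tan(theta_err)  ->  theta_max in degrees
theta_max = math.degrees(math.atan(x_err / f))
print(round(theta_max, 2))       # 1.55

# Translation: x_err = f * t_x / Z  ->  maximum t_x / Z
tx_over_z = x_err / f
print(round(tx_over_z, 2))       # 0.03
print(round(tx_over_z * 12, 2))  # about 1/3 m for a building 12 m away
```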
8.7.3 Sensor Integration
To overcome the problem of rotational movement during the localization latency period,
the inertial sensors in the device are used to maintain an estimate of rotational movement. The estimated difference in rotation between the localization query and response
is preapplied to the localization response before attempting to initialize the tracker.
A similar approach could be applied to estimate translational movement based on accelerometer readings. However, the accelerometers found in typical consumer devices, such as the iPad 2, are too noisy to be used for estimating translation, even
over a brief period. Fortunately, translational error during the latency period is not an
issue in larger environments such as typical urban scenes. This is because generally
the distance to the buildings is such that small translational movements do not cause
significant parallax in the image.
8.8 EVALUATION
This section reports on evaluations of several aspects of the system and shows that
it provides sufficient tracking performance to support many kinds of geo-referenced
mobile AR applications.
8.8.1 Speed
Localization queries are processed on a remote server while the mobile tracker continues running. This means that the server does not have to respond in real time, since
the processing happens in the background. However, the processing time should be
as short as possible to provide a smooth user experience, and ideally within 1 s, as
determined in Section 8.7.2.
Average timings were recorded using an Apple Mac Pro with a 2.26 GHz
Quad-Core Intel Xeon and 8 GB RAM. The model tested has 21 panoramas,
3691 points, and 6823 features. Most of the computation time is spent on SIFT
feature extraction (900 ms) and PROSAC pose estimation (500 ms). The time to
transfer a JPEG-compressed image from the device to the server is not a severe
bottleneck, even with a 3G cellular data connection. Transfer time typically
takes 30–40 ms using either a wireless or 3G connection.
Overall, the average localization latency is about one and a half seconds. In practice, we have experienced localization times of 2–3 s for a larger model. However, the
processing speed could be greatly improved by using GPU implementations of the
feature extraction and pose estimation steps.
The speed of online tracking on the client device was evaluated using an Apple
iPad 2 tablet. Feature-based tracking on the mobile device consists of three steps that
constitute the majority of computation time per frame: point culling (0.005 ms per
point); patch warp (0.02 ms per point); and patch search (0.033 ms per point). The
total tracking time per frame depends on the total number of points in the model
Ntotal, the number of points tracked Ntrack, and the number of pyramid levels L. This
gives an approximate tracking time per frame:
ttrack = Ntotaltcull + NtrackL(twarp + tsearch)
With multithreading on the dual-core iPad 2, the processing time is approximately
reduced by half. For a model with 3691 points, 1024 tracked points, and 4 pyramid
levels, this gives a maximum tracking time of approximately 117 ms per frame.
However, typically the number of points tracked decreases at each successive pyramid search level, so the actual tracking time in practice is lower, and tracking frame rates of 15–20 fps are achievable.
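Plugging the measured per-point timings into the formula reproduces the figure quoted above:

```python
# Per-point costs measured on the iPad 2, in milliseconds.
t_cull, t_warp, t_search = 0.005, 0.02, 0.033
N_total, N_track, L = 3691, 1024, 4  # model points, tracked points, pyramid levels

t_track = N_total * t_cull + N_track * L * (t_warp + t_search)
print(round(t_track, 1))      # 235.5 ms single-threaded
print(round(t_track / 2, 1))  # ~117.8 ms with two cores
```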
FIGURE 8.7 Comparison of the camera position estimates from the visual tracking system
with ground truth position estimates from the differential GPS receiver.
to attain ground truth positional estimates with accuracy under 10 cm. Because the GPS receiver produces positional readings at a rate of 1 Hz, linear interpolation was used to up-sample the signal to 30 Hz.
A test video with the differential GPS receiver was recorded in the Graz Hauptplatz
while observing the Rathaus (City Hall). The panoramic reconstruction of this area
was made from 37 panoramas taken with the Ricoh Theta camera. The resulting reconstruction contains 14,523 points. The semiautomatic alignment method described in
this chapter was used to georegister the model with respect to building outlines from
OpenStreetMap. An overhead view of the point cloud is shown in Figure 8.5.
A comparison of the differential GPS track and the positional track created with
our system is shown in Figure 8.7. The system achieved an average error of 0.72 m
in the easting direction and 0.38 m in the northing direction. This shows that our
system provides better accuracy than consumer GPS, which has an accuracy of about
3 m with a high-quality receiver.
8.8.3 Augmentation Examples
Several prototypes have been developed and tested to evaluate the use of our modeling and tracking system for AR applications. Example screen captures from these
prototypes are shown in Figure 8.8.
The first prototype is a landscape design application. In a large courtyard on the
UC Santa Barbara campus, the user can place virtual trees on the large grassy area
between the buildings. As trees are placed, the user can move around to view how
the trees would look from different angles.
A second prototype tests the use of video game graphics. In this application, a
landing spaceship is rendered into another building courtyard on the UCSB campus
at the spot on the ground where the user touches the screen. Using an assumed position of the sun, the application renders accurate shading and shadows to increase the realism of the rendering.
A third prototype was created to test architectural rendering. Here, a reconstruction of a city street (Branch Street in Arroyo Grande, CA) was created by holding the panorama camera out of the sunroof of a car and driving down the street to capture
FIGURE 8.8 Example images of the tracking system in use with 3D models rendered over
the camera image. (a) Synthetic trees planted in the grass. (b) A spaceship landing in the
courtyard, rendered with lighting and shadow effects. (c) Virtual lamps affixed to the side of
the building.
the buildings on either side. Then, a user standing on the sidewalk can add architectural elements such as virtual lamps to the building facades by simply touching the
screen at the points on the wall where they should be placed.
8.9 DISCUSSION
From these evaluations, it can be concluded that visual modeling and tracking offers
a compelling solution to device pose estimation for mobile AR applications. The
approach enables high-accuracy tracking at real-time rates with consumer hardware.
Experience with the prototype applications suggests that the pose estimation is of
sufficient quality to make objects appear to stick to surfaces, such that they seem truly attached to a wall or the ground. Using simple rendering techniques such as
shading and shadowing also helps to improve the perceived realism of the rendered
graphics.
The major limitation of this approach is that the system is generally restricted
to operation from viewpoints where the scene is visually distinctive and able to be
recognized by its appearance. For many viewpoints, this is not the case, such as
texture-less building walls, and the sky or the ground. In addition, many scenes contain repetitive textures, such as grids of windows, that confuse the visual localization
system and lead to system failure. One possible solution to these problems would be
to further integrate other position and motion sensors, such as a GPS receiver, accelerometer, gyroscope, and compass, to complement the visual tracker.
The source code for the system described in this chapter is publicly available for
download, testing, and further development at http://www.jventura.net/code.
REFERENCES
Arth, C., Wagner, D., Klopschitz, M., Irschara, A., and Schmalstieg, D. (2009). Wide area localization on mobile phones. In ISMAR '09: Proceedings of the 2009 Eighth IEEE International Symposium on Mixed and Augmented Reality (pp. 73–82). Washington, DC: IEEE Computer Society.
Chum, O. and Matas, J. (2005). Matching with PROSAC: Progressive sample consensus. In CVPR 2005: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Vol. 1, pp. 220–226). Washington, DC: IEEE Computer Society.
Fischler, M. A. and Bolles, R. C. (1981). Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM, 24(6), 381–395.
Gonzalez, R. C. and Woods, R. E. (2007). Digital Image Processing. Upper Saddle River, NJ: Pearson/Prentice Hall.
Hartley, R. and Zisserman, A. (2004). Multiple View Geometry in Computer Vision. Cambridge, UK: Cambridge University Press.
Huber, P. J. (1981). Robust Statistics. New York: John Wiley & Sons.
Irschara, A., Zach, C., Frahm, J. M., and Bischof, H. (2009). From structure-from-motion point clouds to fast location recognition. In CVPR 2009: IEEE Conference on Computer Vision and Pattern Recognition (pp. 2599–2606). Washington, DC: IEEE Computer Society.
Klein, G. and Murray, D. (2006). Full-3D edge tracking with a particle filter. In British Machine Vision Conference (BMVC '06). Manchester, U.K.: British Machine Vision Association.
Klein, G. and Murray, D. (2007). Parallel tracking and mapping for small AR workspaces. In ISMAR '07: Proceedings of the 2007 Sixth IEEE and ACM International Symposium on Mixed and Augmented Reality (pp. 225–234). Washington, DC: IEEE Computer Society.
Klein, G. and Murray, D. (2008). Improving the agility of keyframe-based SLAM. In ECCV '08: Proceedings of the 10th European Conference on Computer Vision: Part II (Vol. 5303 LNCS, pp. 802–815). Berlin, Germany: Springer-Verlag.
Li, Y., Snavely, N., and Huttenlocher, D. (2010). Location recognition using prioritized feature matching. In ECCV '10: Proceedings of the 11th European Conference on Computer Vision: Part II (pp. 791–804). Berlin, Germany: Springer-Verlag.
Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2), 91–110.
Nistér, D. (2004). An efficient solution to the five-point relative pose problem. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(6), 756–777.
Nistér, D. and Stewenius, H. (2006). Scalable recognition with a vocabulary tree. In CVPR '06: Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Vol. 2, pp. 2161–2168). Washington, DC: IEEE Computer Society.
Oskiper, T., Samarasekera, S., and Kumar, R. (2012). Multi-sensor navigation algorithm using monocular camera, IMU and GPS for large scale augmented reality. In ISMAR '12: Proceedings of the 2012 IEEE International Symposium on Mixed and Augmented Reality (ISMAR) (pp. 71–80). Washington, DC: IEEE Computer Society.
Philbin, J., Chum, O., Isard, M., Sivic, J., and Zisserman, A. (2007). Object retrieval with large vocabularies and fast spatial matching. In CVPR '07: IEEE Conference on Computer Vision and Pattern Recognition (pp. 1–8). Washington, DC: IEEE Computer Society.
Reitmayr, G. and Drummond, T. W. (2006). Going out: Robust model-based tracking for outdoor augmented reality. In ISMAR '06: Proceedings of the Fifth IEEE and ACM International Symposium on Mixed and Augmented Reality (pp. 109–118). Washington, DC: IEEE Computer Society.
Sattler, T., Leibe, B., and Kobbelt, L. (2011). Fast image-based localization using direct 2D-to-3D matching. In ICCV '11: Proceedings of the 2011 International Conference on Computer Vision. Washington, DC: IEEE Computer Society.
Sivic, J. and Zisserman, A. (2003). Video Google: A text retrieval approach to object matching in videos. In ICCV '03: Proceedings of the Ninth IEEE International Conference on Computer Vision (Vol. 2, pp. 1470–1477). Washington, DC: IEEE Computer Society.
Skrypnyk, I. and Lowe, D. G. (2004). Scene modelling, recognition and tracking with invariant image features. In ISMAR '04: Proceedings of the Third IEEE/ACM International Symposium on Mixed and Augmented Reality (pp. 110–119). Washington, DC: IEEE Computer Society.
Snavely, N. (2008). Scene reconstruction and visualization from Internet photo collections. Dissertation, University of Washington, Seattle, WA.
Snavely, N., Seitz, S. M., and Szeliski, R. (2006). Photo tourism: Exploring photo collections in 3D. ACM Transactions on Graphics (TOG), Proceedings of ACM SIGGRAPH 2006, 25(3), 835–846.
Ventura, J. (2012). Wide-Area Visual Modeling and Tracking for Mobile Augmented Reality (T. Höllerer, Ed.). Santa Barbara, CA: University of California.
Ventura, J. and Höllerer, T. (2011). Outdoor mobile localization from panoramic imagery. In ISMAR '11: Proceedings of the 2011 10th IEEE International Symposium on Mixed and Augmented Reality (pp. 247–248). Washington, DC: IEEE Computer Society.
Ventura, J. and Höllerer, T. (2012a). Structure from motion in urban environments using upright panoramas. Virtual Reality, 17(2), 147–156.
Ventura, J. and Höllerer, T. (2012b). Wide-area scene mapping for mobile visual tracking. In ISMAR '12: Proceedings of the 2012 IEEE International Symposium on Mixed and Augmented Reality (ISMAR) (pp. 3–12). Washington, DC: IEEE Computer Society.
Von Gioi, R. G., Jakubowicz, J., Morel, J.-M., and Randall, G. (2010). LSD: A fast line segment detector with a false detection control. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(4), 722–732.
Scalable Augmented
Reality on Mobile
Devices
Applications, Challenges,
Methods, and Software
Xin Yang and K.T. Tim Cheng
CONTENTS
9.1 Applications
9.1.1 Computer Vision-Based MAR Apps
9.1.2 Sensor-Based MAR Apps
9.1.3 MAR Apps Based on Hybrid Approaches
9.2 Challenges
9.3 Pipelines and Methods
9.3.1 Visual Object Recognition and Tracking
9.3.1.1 Marker-Based Methods
9.3.1.2 Feature-Based Method
9.3.2 Sensor-Based Recognition and Tracking
9.3.2.1 Sensor-Based Object Tracking
9.3.3 Hybrid-Based Recognition and Tracking
9.4 Software Development Toolkits
9.5 Conclusions
References
9.1 APPLICATIONS
In recent years, mobile devices such as smartphones and tablets have experienced phenomenal growth. Their computing power has grown enormously, and the integration of a wide range of sensors (e.g., compass, accelerometer, gyroscope, and GPS) has significantly enriched these devices' functionalities. The connectivity of smartphones has also evolved rapidly. A variety of radios available in today's smartphones, including cellular broadband, Wi-Fi, Bluetooth, and NFC, enable users to communicate with other devices, interact with the Internet, exchange data with the cloud, and run computing tasks in the cloud. These mobile
handheld devices, equipped with cameras, sensors, low-latency data networks, and powerful multicore application processors (APs), can now run very sophisticated augmented reality (AR) applications. There is already a rich collection of mobile AR (MAR) apps, for visual guidance in assembly, maintenance, and training, interactive books, multimedia-augmented advertising, contextual information-augmented personal navigation, etc., that can be used anytime and anyplace.
A scalable MAR system, which can identify objects and/or locations from a large database and track the pose of a mobile device with respect to the physical objects, can support a wide range of MAR apps. Such a system can enable applications such as nationwide multimedia-enhanced advertisements printed on paper, augmenting millions of book pages in a library, and recognizing millions of worldwide places of interest to provide navigation or contextual information.
Broadly, scalable MAR apps can be categorized into three classes according to the
underlying techniques they rely on: computer vision-based, sensor-based, and hybrid.
using available sensors of the phone and could be used as a waypoint tool, sextant, compass, rangefinder, speedometer, and inclinometer.
Other sensor-based MAR apps serving navigation focus on the scenario of car driving. For instance, Wikitude Drive is a MAR navigation system in which computer-generated driving instructions are drawn on top of reality: the real road the user is driving on. Navigation thus takes place in real time in the smartphone's live camera image. This app solves a key problem of existing navigation systems: the driver no longer needs to take his/her eyes off the road when looking at the navigation system. Other similar apps include Route 66 Maps + Navigation, which combines comprehensive 3D maps with AR navigation to offer drivers a fun and informative experience, and Follow Me, which traces the user's exact route on the road, backed with real-time graphics and a virtual car that leads the user all the way to the destination.
In addition to navigation, sensor-based MAR is also applicable to educational applications. A famous example is Star Walk, an interactive astronomy guide that aligns an AR view of the sky with the actual sky outside. Using this app, users can align the physical sky to the sky shown on the display of the phone/tablet. This allows for pinpoint precision when tracking satellites, finding stars, or finding constellations, offering an attractive educational tool for students, part-time stargazers, and astronomers. As the user adjusts the orientation of the phone/tablet toward the sky, with the help of the accelerometer and gyroscope sensors, which offer highly accurate positioning, the sky shown on the display moves along to match the physical sky. Finding celestial objects becomes easy: simply move the viewfinder over the object in the real-world view and click on it. The user can also search for an object (Jupiter, a satellite, or a specific constellation) to have it instantly come up on the viewfinder. The app then guides the user to adjust the orientation of the device to match it up with the object in the real sky.
9.2 CHALLENGES
Despite advances in computer vision and signal processing algorithms as well as
mobile hardware, scalable MAR remains very challenging due to the following
reasons:
1. The design objectives of modern mobile APs require more than just performance. Priority is often given to other factors such as low power consumption and a small form factor. Although the performance of mobile CPUs has improved by more than 30× within the recent 5 years (e.g., ARM quad-core Cortex A-15 in 2014 vs. ARM11 single-core in 2009), today's mobile CPU cores are still not powerful enough
to perform computationally intensive vision tasks such as sophisticated feature extraction and image recognition algorithms. Graphics processing units (GPUs), which have been built into most APs, can help speed up processing via parallel computing (Cheng et al. 2013; Terriberry et al. 2008), but most feature extraction and recognition algorithms are designed to be executed sequentially and cannot fully utilize GPU capabilities. In addition, mobile devices have less memory and lower memory bandwidth than desktop systems. The memory of today's high-end smartphones, such as the Samsung Galaxy S5, is limited to 2 GB of SDRAM, and the memory size of mid- and entry-level phones is even smaller. This amount of memory is not sufficient for performing local object recognition using a large
database. In order to realize efficient object recognition, the entire indexing structure of a database needs to be loaded and reside in main memory.
The total amount of memory usage for an indexing structure usually grows
linearly with the number of database images. For a database of a moderate size (e.g., tens of thousands of images), or a large size (e.g., millions of images), the indexing structure itself could easily exhaust memory
resources. Several scalable MAR systems employ the client-server model
to handle large databases. That is, sending the captured image or processed image data (e.g., image features) to a server (or a cloud) via the
Internet, performing object recognition and pose estimation on the server
side, and then sending the estimated pose and associated digital data back
to the mobile device. While Wi-Fi is a built-in feature of almost all mobile devices, connection to high-bandwidth access points is still not available everywhere or at all times. For connection to data networks, today's mobile devices rely on a combination of mobile broadband networks including 3G, 3.5G, and 4G. These networks, while providing acceptable network access speed for most apps, cannot support real-time responses for apps demanding a large amount of data transfer. Moreover, advanced mobile broadband networks still have limited availability in sparsely populated areas.
2. For most algorithms, it is very challenging to achieve both good accuracy and high efficiency. As mentioned in the previous section, sensor-based methods can achieve good efficiency, but their performance is often limited by
the low precision of sensors used in mobile devices, limiting their applicability for apps demanding high recognition rate and tracking accuracy.
In addition, sensor-based approaches do not provide information about
the objects in the camera picture. As a result, the presented information
could only be related to the direction and position of the device, not to a
specific object. On the other hand, vision-based methods usually require
significant computation and memory space to process the image which
consists of a large number of pixels, to match against a large database
and to estimate geometric transformation between the object within a
captured image and the recognized database object. A hybrid approach
that integrates vision-based and sensor-based methods can potentially
combine their complementary advantages; however, designing a fusion
solution that optimizes accuracy, efficiency, and robustness is not a trivial
task at all.
FIGURE 9.1 A general pipeline for scalable augmented reality on mobile devices: object/location identification against a cloud-hosted database yields an identified result and its associated digital data; pose tracking provides the relative pose; and the final scene is produced by overlaying the virtual data on reality.
FIGURE 9.2 Pipeline of marker-based recognition and tracking: a captured image goes through edge detection, line fitting and corner detection, candidate marker detection, marker identification (marker template, 2D barcode, imperceptible markers), postverification, and marker tracking and pose estimation, yielding the recognized object and camera pose.
FIGURE 9.3 Illustration of 2D markers. (a) Template marker. Barcode markers: (b) QR
code, (c) DataMatrix, and (d) PDF417.
pair of marker images involves processing all corresponding pixels of the two images. As a result, the total runtime for template matching is nontrivial and grows linearly with the number of templates in the database, prohibiting its use with a large database. To speed up the matching process, markers are often down-sampled to a small size, such as 16 × 16 or 32 × 32, to reduce the number of pixels that must be compared. However, the scalability and accuracy of a MAR system can be greatly restricted by the size of the quantized markers. For a quantized image size of 16 × 16 inside a marker, the maximum number of distinct patterns that can be generated is (16 × 16)² = 65,536 (each pixel can be either 1 or 0). In other words, the upper bound on the number of distinct markers is limited to 65,536. In practice, due to photometric changes, inaccuracy in the detection process, and other noise sources, the number of markers that can be correctly and reliably detected could be even smaller than this number. Due to these limitations, template markers are not suitable for scalable MAR apps which use a large database.
2D barcode markers are markers consisting of frequently changing black-and-white data cells and possibly a border or other landmarks (as shown in Figure 9.3b through d). A system identifies a 2D barcode marker by decoding the information encoded in it. Typically, the decoding process samples the pixel values from the calculated center of each cell and then resolves the cell values from them. A resolved cell value can either be interpreted as a binary number (i.e., 0 or 1) or link to more information (e.g., ASCII characters) via a database. Popular 2D barcode standards include QR code (Information Technology 2006a), DataMatrix (Information Technology 2006b), and PDF417 (Information Technology 2006c), which were originally developed for logistics and tagging purposes but are also used for AR apps. In addition to these three standards, which are briefly described in the following section, there are many other standards (e.g., MaxiCode, Aztec Code, SPARQCode) that might also be used for tracking in some applications.
QR code (Figure 9.3b) is a 2D barcode created by the Japanese corporation Denso Wave in 1994. QR is the abbreviation for Quick Response, as the code is intended for high-speed decoding. QR code became popular for mobile tagging applications and is the de facto standard in Japan. QR code is flexible and has a large storage capacity. A single QR code symbol can contain up to 7,089 numeric characters, 4,296 alphanumeric characters, 2,953 bytes of binary data, or 1,817 Kanji characters. Therefore, QR codes are very suitable for large-scale MAR apps.
DataMatrix (Figure 9.3c) is another popular barcode marker which is famous for
marking small items such as electronic components. The DataMatrix can encode
up to 3116 characters from the entire ASCII character set with extensions. The
DataMatrix barcode is also used in mobile marketing under the name SemaCode.
PDF417 (Figure 9.3d) was developed in 1991 by Symbol (recently acquired by Motorola). A single PDF417 symbol can be considered multiple linear barcode rows stacked above each other. A single PDF417 symbol can theoretically hold up to 1,850 alphanumeric characters, 2,710 digits, or 1,108 bytes. The exact data capacity depends on the structure of the data to be encoded, due to the internal data compression algorithms used during coding. The ratios of the widths of the bars (or spaces) to each other encode the information in a PDF417 symbol. For that reason, printing accuracy and a suitable printer resolution are important for high-quality PDF417
symbols. This also makes PDF417 the least suitable for AR applications where the
marker is often under perspective transformation.
9.3.1.1.3 Marker Tracking
The main idea of AR is to present virtual objects in a real environment as if they were part of it. Virtual objects should move and change their pose according to the movement of the mobile camera. Tracking the camera pose (i.e., camera location and orientation) in real time is required in order to render a virtual object at the right scale and perspective. The pose of a camera relative to a marker in the real scene can be uniquely determined from a minimum of four corresponding points between the marker in the real scene and the marker on the camera's image plane. Note that the four points used for determining the camera pose need to be coplanar but noncollinear. Marker-based object tracking uses the four corners of a square marker, which can be reliably detected, for this purpose. We define a transformation T between a camera and a marker as
\[
\begin{bmatrix} x \\ y \\ 1 \end{bmatrix}
\sim
T \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}
=
\begin{bmatrix}
r_{11} & r_{12} & r_{13} & t_x \\
r_{21} & r_{22} & r_{23} & t_y \\
r_{31} & r_{32} & r_{33} & t_z \\
0 & 0 & 0 & 1
\end{bmatrix}
\begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}
\tag{9.1}
\]

where
[X Y Z 1]^T is the homogeneous representation of a marker corner's coordinates in the earth coordinate system, and
[x y 1]^T is its projected coordinates on the image plane; the relation holds up to a homogeneous scale factor.
Once an initial camera pose is obtained, the system can keep tracking the marker on the
image plane by constructing corner correspondences between consecutive frames and
computing the transformation matrix between two frames based on the correspondences.
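As a concrete sketch of the four-point idea, the planar mapping between marker corners and their image projections can be estimated with a direct linear transform (a minimal NumPy sketch with hypothetical corner coordinates; recovering the full metric pose T from this homography additionally requires the camera intrinsics):

```python
import numpy as np

def homography_from_4pts(src, dst):
    """Estimate the 3x3 homography mapping src -> dst (each a 4x2 array)
    via the direct linear transform: stack two equations per point
    correspondence and take the null space of the resulting 8x9 system."""
    rows = []
    for (X, Y), (x, y) in zip(src, dst):
        rows.append([-X, -Y, -1, 0, 0, 0, x * X, x * Y, x])
        rows.append([0, 0, 0, -X, -Y, -1, y * X, y * Y, y])
    A = np.asarray(rows)
    # Null-space vector = right singular vector of the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]          # fix the homogeneous scale

# Marker corners in the marker plane and a hypothetical projection of them.
marker = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], float)
image  = np.array([[10, 12], [110, 18], [104, 122], [6, 115]], float)

H = homography_from_4pts(marker, image)
p = H @ np.array([1.0, 0.0, 1.0])   # reproject corner (1, 0)
print(p[:2] / p[2])                 # recovers the observed corner [110, 18]
```

Four correspondences give exactly the eight constraints needed for the eight degrees of freedom of a homography, which is why four coplanar, noncollinear corners suffice.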
9.3.1.1.4 Discussions
2D barcode identification directly decodes the information in a marker without demanding an enormous amount of computation for image matching. In addition, marker-based tracking only needs to detect the four corners of a marker and estimate the camera pose according to Equation 9.1. Therefore, the barcode marker-based method is time efficient and can provide real-time performance for many MAR apps. Furthermore, 2D barcode markers have a large storage capacity and thus can support applications which require high scalability. However, barcode markers need to be printed on or attached to objects beforehand for association with specific contents. Thus, they are visually obtrusive, and for some outdoor scenarios (e.g., landmarks) attaching markers to objects is not feasible. Moreover, marker-based methods are sensitive to occlusion. These limitations may lead to a poor user experience. Feature-based methods can overcome these limitations, though their slower speed and greater memory usage are two major issues.
FIGURE 9.4 (a) An exemplar image overlaid with detected local features. (b) and (c) are the discretized and cropped Gaussian second-order partial derivatives in the y-direction and the xy-direction, respectively; (d) and (e) are the SURF box-filter approximations for Lyy and Lxy, respectively.
amount of results in this field. With limited space, we can only afford to review a small subset of representative results that are most relevant to the application of scalable MAR.
9.3.1.2.1.1 Interest Point Detection
An interest point detector is an operator which assigns a saliency score to each pixel of an image and then chooses a subset of pixels with locally maximal scores. A good detector should provide points with the following properties: (1) repeatability (or robustness), that is, given two images of the same object under different imaging conditions, a high percentage of the points on the object should be visible in both images; (2) distinctiveness, that is, the neighborhood of a detected point should be sufficiently informative so that the point can be easily distinguished from other detected points; (3) efficiency, that is, detection in a new image should be sufficiently fast to support time-critical applications; and (4) quantity, that is, a typical image should contain a sufficient number of points to cover the target object, so that it can be recognized even under partial occlusion.
A wide variety of interest point detectors exist in the literature. Some lightweight detectors (Rosten et al. 2006) aim at high efficiency, targeting applications that demand real-time performance and/or mobile hardware platforms that have limited computing resources. However, the performance of these detectors is relatively poor. As a result, pose verification is required to exclude false matches in the matching phase, which often incurs a nontrivial runtime. On the other hand, several high-quality feature detectors (Bay et al. 2006, 2008; Lowe 2004) have been developed with a primary focus on robustness and distinctiveness. These detectors' ability to accurately localize correct targets in a large database makes them suitable for large-scale object recognition. However, the computational complexity of these detectors is usually very high, making them inefficient on a mobile device. Some recent efforts, for example Yang et al. (2012a), have adapted these feature detection algorithms to mobile devices and optimized their performance and efficiency for MAR. Due to space limitations, we review only the most representative work for the lightweight detector, the high-quality detector, and algorithm adaptation. A thorough survey of local feature detectors can be found in Tuytelaars et al. (2008).
Lightweight detector: FAST. The FAST (features from accelerated segment test) detector, proposed by Rosten et al. (2006), has become popular recently due to its highly efficient processing pipeline. The basic idea of FAST is to compare the 16 pixels located on the boundary of a circle (of radius 3) around a central point, each labeled with an integer from 1 to 16 clockwise: if the intensities of n (n ≥ threshold) consecutive pixels are all higher or all lower than the central pixel's, then the central pixel is labeled as a potential feature point, and n is defined as the response value for the central pixel. The final set of feature points is determined after applying a nonmaximum suppression step (i.e., if the response value of a point is the local maximum within a small region, this point is considered a feature point). Since the FAST detector involves only a set of intensity comparisons with few arithmetic operations, it is highly efficient.
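The segment test can be sketched in a few lines (a simplified scalar version with n = 9 and no nonmaximum suppression; `fast_response` is illustrative, not the authors' implementation):

```python
import numpy as np

# Offsets (dc, dr) of the 16 pixels on a Bresenham circle of radius 3, clockwise.
CIRCLE = [(0, -3), (1, -3), (2, -2), (3, -1), (3, 0), (3, 1), (2, 2), (1, 3),
          (0, 3), (-1, 3), (-2, 2), (-3, 1), (-3, 0), (-3, -1), (-2, -2), (-1, -3)]

def fast_response(img, r, c, t=20, n=9):
    """Segment test at pixel (r, c): if at least n consecutive circle pixels
    are all brighter than center + t or all darker than center - t, return
    the length of the longest such run; otherwise return 0."""
    center = int(img[r, c])
    labels = []
    for dc, dr in CIRCLE:
        v = int(img[r + dr, c + dc])
        labels.append(1 if v > center + t else (-1 if v < center - t else 0))
    best = 0
    for sign in (1, -1):
        run = 0
        # Duplicate the ring so runs that wrap around the circle are counted.
        for lab in labels * 2:
            run = run + 1 if lab == sign else 0
            best = max(best, min(run, 16))
    return best if best >= n else 0

img = np.zeros((20, 20), dtype=np.uint8)
img[8:, 8:] = 200                      # bright square with a corner at (8, 8)
print(fast_response(img, 8, 8))        # strong response at the corner
print(fast_response(img, 14, 14))      # flat interior: no response
```

Note how the whole test is intensity comparisons and counting, which is exactly why FAST is cheap on mobile CPUs.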
The FAST detector is not invariant to scale changes. To achieve scale invariance, Rublee et al. (2011) proposed to employ a scale pyramid of an image and detect FAST feature points at each level of the pyramid. FAST can also incur large responses along edges, leading to lower repeatability and distinctiveness compared to high-quality detectors such as SIFT (Lowe 2004) and SURF (Bay et al. 2006, 2008). To address this limitation, Rublee et al. employed a Harris corner measure to order the FAST feature points and discard those with small responses to the Harris measure.
High-quality detector: SURF. The SURF (Speeded Up Robust Features) detector, proposed by Bay et al. (2006, 2008), is one of the most popular high-quality point detectors in the literature. It is scale-invariant and based on the determinant of the Hessian matrix H(X, σ):

\[
H(X, \sigma) =
\begin{bmatrix}
L_{xx}(X, \sigma) & L_{xy}(X, \sigma) \\
L_{xy}(X, \sigma) & L_{yy}(X, \sigma)
\end{bmatrix}
\tag{9.2}
\]

where
X = (x, y) is a pixel location in an image I,
σ is a scale factor, and
Lxx(X, σ) is the convolution of the Gaussian second-order derivative in the x direction with the image I at X; similarly for Lyy and Lxy (see Figure 9.4b and c).
To speed up the process, a SURF detector approximates the Gaussian second-order partial derivatives with a combination of box filter responses (see Figure 9.4d and e), computed using the integral image technique (Simard et al. 1998). The approximated derivatives are denoted Dxx, Dxy, and Dyy, and accordingly the approximate Hessian determinant is

\[
\det(H_{\mathrm{approx}}) = D_{xx} D_{yy} - (0.9\, D_{xy})^2
\tag{9.3}
\]

where the factor 0.9 compensates for the box-filter approximation. A SURF detector computes Hessian determinant values for every image pixel over a range of scales using box filters of successively larger sizes, yielding a determinant pyramid for the entire image. It then applies a 3 × 3 × 3 local maximum extraction over the determinant pyramid to select interest point locations and their salient scales.
To achieve rotation invariance, SURF relies on gradient histograms to identify a dominant orientation for each detected point. An image patch around each point is rotated to its dominant orientation before computing a feature descriptor. Specifically, the dominant orientation of a SURF detector is computed as follows. First, the entire orientation space is quantized into N histogram bins, each of which represents a sliding orientation window covering an angle of π/3. Then SURF computes the gradient responses of every pixel in a circular neighborhood of an interest point. Based on the gradient orientation of a pixel, SURF maps it to the corresponding histogram bins and adds its gradient response to those bins. Finally, the bin with the largest response is used to calculate the dominant orientation of the interest point.
TABLE 9.1
Comparison of FAST and SURF Detectors on a Mobile Device and a PC

Detector         Mobile (ms)    PC (ms)    Speed Gap (×)
FAST detector    170            40         4
SURF detector    2156           143        15
Compared to FAST, SURF point detection involves much more complex computations and, thus, is much slower. The runtime limitation of SURF is further exacerbated when running a SURF detector on a mobile platform. Table 9.1 compares the runtime performance of a FAST detector and a SURF detector running on a mobile device (Motorola Xoom1) and a laptop (Thinkpad T420), respectively. Running a FAST detector takes 170 ms on a Motorola Xoom1 and 40 ms on an i5-based laptop, a 4× speed gap. However, running a SURF detector on them takes 2156 and 143 ms, respectively, a 15× speed gap.
Although the FAST detector is more efficient than SURF, it cannot match SURF's robustness and distinctiveness. As a result, it usually fails to achieve satisfactory performance for MAR apps that demand high recognition accuracy against a large database and/or handle content with large photometric/geometric changes.
Algorithm adaptation: Accelerating SURF on mobile devices. There are several techniques for improving SURF's efficiency: exploiting coherency between consecutive frames (Ta et al. 2009), employing GPUs for parallel computing, or optimizing various aspects of the implementation (Terriberry et al. 2008). An interesting solution proposed recently (Yang et al. 2012a) is to analyze the causes of a SURF detector's poor efficiency and large overhead on a mobile platform, and to propose a set of techniques to adapt the SURF algorithm to it. Specifically, two mismatches between the computations used in the existing SURF algorithm and common mobile hardware platforms are identified as the sources of significant performance degradation:
Mismatch between the data access pattern and the small cache size of a mobile platform. A SURF detector relies on an integral image and accesses it using a sliding window of successively larger size for different scales. But a 2D array is stored in a row-based fashion in memory (cache and DRAM), not in a window-based fashion; pixels in a single sliding window reside in multiple memory rows (illustrated in Figure 9.5a). The data cache of a mobile AP, typically 32 kB for today's devices, is too small to cache all memory rows for the pixels involved in one sliding window, leading to cache misses and cache line replacements and, in turn, incurring expensive memory accesses.
Mismatch between a huge number of data-dependent branches in the algorithm and the high pipeline hazard penalty of the mobile platform. To identify a dominant orientation, a SURF detector analyzes gradient histograms.
FIGURE 9.5 Illustration of data locality and access pattern in (a) the original SURF detector and (b) the tiled SURF. Each color represents data stored in a unique DRAM row. In the
original SURF, a sliding window needs to access multiple DRAM rows, leading to frequent
cache misses, while in tiled SURF, all required data within a sliding window can be cached.
TABLE 9.2
Runtime Cost Comparison on Three Mobile Platforms

Time (ms)                  Droid    Thunderbolt    Xoom1
U-SURF                     1310     525            461
U-SURF tiling              930      356            243
O-SURF                     7700     2495           2156
O-SURF lookup table        4264     1820           1178
O-SURF GMoment             1516     613            519
O-SURF Tiling + GMoment    1053     404            269
TABLE 9.3
Speed Ratio Comparison on Three Mobile Platforms

Phone-to-PC Ratio (×)      Droid    Thunderbolt    Xoom1
U-SURF                     20       8              7
U-SURF tiling              14       7              4
O-SURF                     54       17             15
O-SURF lookup table        18       7              6
O-SURF GMoment             19       8              7
O-SURF Tiling + GMoment    13       7              3

Phone-to-PC ratio = runtime on the mobile platform / runtime on the PC.
method can greatly reduce the overall runtime and the phone-to-PC ratio on all three platforms. The reduction in the phone-to-PC ratio further confirms that the branch hazard penalty has a much greater runtime impact on a mobile CPU than on a desktop CPU. Choosing proper implementations or algorithms to avoid such penalties is critical for a mobile task. The last rows of Tables 9.2 and 9.3 show the results of applying both adaptations to O-SURF: compared to the original SURF, the two adaptations reduce the runtime on the mobile platforms by 6×–8×.
9.3.1.2.1.2 Local Feature Description
Once a set of interest points has been extracted from an image, their content needs to be encoded in descriptors that are suitable for matching. In the past decade, the most popular choices for this step have been the SIFT descriptor and the SURF descriptor. SIFT and SURF have successfully demonstrated good robustness and distinctiveness in a variety of computer vision applications. However, the computational complexity of SIFT is too high for real-time applications with tight time constraints. Although SURF accelerates SIFT by 2×–3×, it is still not sufficiently fast for real-time applications running on a mobile device. In addition, SIFT and SURF are high-dimensional real-valued vectors, which demand large storage space and high computing power for matching. Recently, the booming development of real-time mobile apps has stimulated a rapid development of binary descriptors that are more compact and faster to compute than SURF-like features while maintaining satisfactory feature quality. Notable work includes BRIEF (Calonder et al. 2010) and its variants rBRIEF (Rublee et al. 2011), BRISK (Leutenegger et al. 2011), FREAK (Alahi et al. 2012), and LDB (Yang et al. 2012b, 2014a,b). In the following section, we review three representative descriptors: SURF, BRIEF, and LDB.
SURF: Speeded Up Robust Features. The SURF descriptor aims to achieve robustness to lighting variations and small positional shifts by encoding the image information in a localized set of gradient statistics. Specifically, each image patch is divided into 4 × 4 grid cells. In each cell, SURF computes the set of summary statistics Σdx, Σ|dx|, Σdy, and Σ|dy|, resulting in a 64-dimensional descriptor. The first-order derivatives dx and dy can be calculated very efficiently using box filters and integral images.
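The 4 × 4 cell layout can be sketched as follows (a simplified NumPy version using finite differences in place of Haar wavelets, and omitting Gaussian weighting and rotation to the dominant orientation; `surf_like_descriptor` is illustrative only):

```python
import numpy as np

def surf_like_descriptor(patch):
    """Sketch of the SURF descriptor layout: split a square patch (side a
    multiple of 4) into a 4x4 grid of cells and, per cell, record the four
    summary statistics sum(dx), sum(|dx|), sum(dy), sum(|dy|), giving
    4 * 4 * 4 = 64 values in total."""
    dy, dx = np.gradient(patch.astype(float))
    s = patch.shape[0] // 4
    desc = []
    for i in range(4):
        for j in range(4):
            cx = dx[i*s:(i+1)*s, j*s:(j+1)*s]
            cy = dy[i*s:(i+1)*s, j*s:(j+1)*s]
            desc += [cx.sum(), np.abs(cx).sum(), cy.sum(), np.abs(cy).sum()]
    desc = np.array(desc)
    norm = np.linalg.norm(desc)
    return desc / norm if norm > 0 else desc   # normalize for contrast invariance

rng = np.random.default_rng(2)
patch = rng.integers(0, 256, size=(20, 20))
d = surf_like_descriptor(patch)
print(d.shape)  # (64,)
```

Keeping both the signed sums and the sums of absolute values is what lets the descriptor distinguish, for example, a uniformly bright-to-dark ramp from alternating stripes.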
Motivated by the success of SURF, a further optimized version was proposed in Terriberry et al. (2008) that takes advantage of the computational power available in CUDA-enabled graphics cards. This GPU-SURF implementation has been reported to perform feature extraction on a 640 × 480 image at a frame rate of up to 20 Hz, making feature extraction a truly affordable processing step. However, to date, most mobile GPU cores do not support CUDA, and porting an implementation from desktop GPUs to mobile GPUs remains a tedious task.
BRIEF: Binary robust independent elementary features. The BRIEF descriptor,
proposed by Calonder et al. (2010), primarily aims at high computational efficiency for construction and matching, and a small footprint for storage. The basic
idea of BRIEF is to directly generate bit strings by simple binary tests comparing
pixel intensities in an image patch. More specifically, a binary test τ on a patch p of size S × S is defined as

\[
\tau(p;\, x, y) =
\begin{cases}
1 & \text{if } I(p, x) < I(p, y) \\
0 & \text{otherwise}
\end{cases}
\tag{9.5}
\]

where I(p, x) is the pixel intensity at location x = (u, v)^T. Choosing a set of n_d (x, y)-location pairs uniquely defines the binary test set and consequently leads to an n_d-dimensional bit string that corresponds to the decimal counterpart of

\[
\sum_{1 \le i \le n_d} 2^{\,i-1}\, \tau(p;\, x_i, y_i)
\tag{9.6}
\]
By construction, the tests of Equation 9.6 consider only the information at single pixels; therefore, the resulting BRIEF descriptors are very sensitive to noise. To increase stability and repeatability, the authors proposed to smooth the pixels of every pixel pair using Gaussian or box filters before performing the binary tests.
The spatial arrangement of the binary tests greatly affects the performance of the BRIEF descriptor. In Calonder et al. (2010), the authors experimented with five sampling geometries for determining the spatial arrangement. Experimental results demonstrate that tests randomly sampled from an isotropic Gaussian distribution, Gaussian(0, S²/25), where the origin of the coordinate system is the center of the patch, give the highest recognition rate.
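Equations 9.5 and 9.6 translate almost directly into code (a NumPy sketch using the Gaussian(0, S²/25) geometry; the names are illustrative, and the pre-smoothing step is left out):

```python
import numpy as np

def make_brief_tests(n_bits=256, patch_size=33, seed=7):
    """Sample the (x, y) test locations once, from the isotropic Gaussian
    N(0, S^2/25) centered on the patch (the best-performing geometry)."""
    rng = np.random.default_rng(seed)
    sigma = patch_size / 5.0
    pts = rng.normal(0.0, sigma, size=(n_bits, 2, 2))
    # Clip to the patch and shift the origin to the patch corner.
    half = patch_size // 2
    pts = np.clip(np.round(pts), -half, half)
    return (pts + half).astype(int)

def brief(patch, tests):
    """Binary descriptor: bit i is 1 iff I(x_i) < I(y_i) on the patch."""
    x = patch[tests[:, 0, 0], tests[:, 0, 1]]
    y = patch[tests[:, 1, 0], tests[:, 1, 1]]
    return (x < y).astype(np.uint8)

def hamming(a, b):
    """Matching cost: count of differing bits."""
    return int(np.count_nonzero(a != b))

rng = np.random.default_rng(3)
patch = rng.integers(0, 256, size=(33, 33))
tests = make_brief_tests()
d1 = brief(patch, tests)
d2 = brief(patch, tests)
print(hamming(d1, d2))  # 0: identical patches give identical bit strings
```

The test locations are sampled once and reused for every patch, so descriptor extraction reduces to 256 pixel comparisons.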
LDB: Local difference binary. Binary descriptors such as BRIEF and a series of enhanced versions of BRIEF (Alahi et al. 2012; Leutenegger et al. 2011; Rublee et al. 2011) are very efficient to compute, store, and match (matching simply computes the Hamming distance between descriptors via XOR and bit-count operations). These runtime advantages make them well suited for real-time applications and handheld devices. However, these binary descriptors utilize overly simplified information, that is, only the intensities of a subset of pixels within an image patch, and thus have low discriminative ability. This lack of distinctiveness incurs an enormous number of false matches when matching against a large database. Expensive postverification methods (e.g., RANSAC, Fischler et al. 1981) are usually required to discover and validate matching consensus, increasing the runtime of the entire process.
Local difference binary (LDB), a binary descriptor, achieves computational speed and robustness similar to BRIEF and other state-of-the-art binary descriptors, yet offers greater distinctiveness. The high quality of LDB is achieved through three schemes. First, LDB utilizes the average intensity Iavg and first-order gradients, dx and dy, of grid cells within an image patch. Specifically, the internal patterns of the image patch are captured through a set of binary tests, each of which compares the Iavg, dx, and dy of a pair of grid cells (illustrated in Figure 9.6a and b). The average intensity and gradients capture both the DC and AC components of a patch; thus, they provide a more complete description than other binary descriptors. Second, LDB
FIGURE 9.6 Illustration of LDB extraction. (a) An image patch is divided into 3 × 3 equal-sized grids. (b) Compute the intensity summation (I) and the gradients in the x and y directions (dx and dy) of each grid cell, and compare I, dx, and dy between every unique pair of grids. (c) Three-level gridding (with 2 × 2, 3 × 3, and 4 × 4 grids) is applied to capture information at different granularities.
employs a multiple gridding strategy to encode the structure at different spatial granularities (Figure 9.6c). Coarse-level grids can cancel out high-frequency noise, while
fine-level grids capture detailed local patterns, thus enhancing distinctiveness.
Third, LDB leverages a modified AdaBoost method (Yang and Cheng 2014b) to select
a set of salient bits. The modified AdaBoost targets the fundamental goal of ideal
binary descriptors: minimizing the distance between matched pairs while maximizing it
between mismatched pairs, optimizing the performance of LDB for a given descriptor
length. Computing LDB is very fast: relying on integral images, the average intensity and first-order gradients of each grid cell can be obtained with only four to eight
add/subtract operations.
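As an illustration of these integral-image computations, the following sketch derives a 3 × 3 grid's (I, dx, dy) features and the pairwise binary tests; the helper names, half-cell gradient approximation, and bit ordering are our assumptions, not the LDB authors' implementation (cell sums stand in for averages, which is equivalent under comparison of equal-sized cells).

```python
# Illustrative sketch of LDB-style binary tests over a 3x3 grid, using an
# integral image (summed-area table) so each cell sum costs four lookups.

def integral_image(img):
    """img: 2D list of intensities. Returns an (h+1)x(w+1) summed-area table."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        for x in range(w):
            ii[y + 1][x + 1] = (img[y][x] + ii[y][x + 1]
                                + ii[y + 1][x] - ii[y][x])
    return ii

def cell_sum(ii, x0, y0, x1, y1):
    """Sum of img[y0:y1][x0:x1] via four lookups in the summed-area table."""
    return ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]

def ldb_bits(img, n=3):
    """Bit list from pairwise tests on (I, dx, dy) of the n*n grid cells."""
    ii = integral_image(img)
    h, w = len(img), len(img[0])
    ch, cw = h // n, w // n
    feats = []
    for gy in range(n):
        for gx in range(n):
            x0, y0 = gx * cw, gy * ch
            x1, y1 = x0 + cw, y0 + ch
            s = cell_sum(ii, x0, y0, x1, y1)
            # First-order gradients: difference between half-cell sums.
            dx = (cell_sum(ii, x0 + cw // 2, y0, x1, y1)
                  - cell_sum(ii, x0, y0, x0 + cw // 2, y1))
            dy = (cell_sum(ii, x0, y0 + ch // 2, x1, y1)
                  - cell_sum(ii, x0, y0, x1, y0 + ch // 2))
            feats.append((s, dx, dy))
    bits = []
    for i in range(len(feats)):
        for j in range(i + 1, len(feats)):  # every unique pair of cells
            for k in range(3):              # compare I, dx, and dy
                bits.append(1 if feats[i][k] > feats[j][k] else 0)
    return bits
```

For a 3 × 3 grid this yields 9 cells, 36 unique pairs, and 3 tests per pair, i.e., 108 bits.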
9.3.1.2.2 Local Feature-Based Object Recognition
To recognize objects in a captured image, a system matches each feature descriptor
extracted from the image to the database features in order to find its nearest neighbor (NN). If
a pair of NNs passes the verification criteria (i.e., the similarity between a feature and
its NN is above a predetermined threshold and complies with a geometric model),
this feature pair is considered a matched pair; otherwise, it is discarded as a false
positive. The database object with the most features matched to the captured image
is considered the recognized object.
Quickly and accurately retrieving the NN of a local feature from a large database
is the key to efficient and accurate object recognition, ensuring a satisfactory user
experience and scalability for MAR apps. Two popular techniques that have been
commonly used for large-scale NN matching are locality sensitive hashing (LSH)
and bag-of-words (BoW) matching.
LSH: Locality sensitive hashing. LSH (Gionis et al. 1999) is a widely used technique for approximate NN search. The key of LSH is a hash function that maps
similar descriptors into the same bucket of a hash table and dissimilar descriptors into
different buckets. To find the NN of a query descriptor, we first retrieve its matching
bucket and then check all the descriptors within the matched bucket using a brute-force search.
For binary features, the hash function can simply be a subset of bits from the
original bit string; descriptors with a common sub-bit-string are cast to the
same table bucket. The size of the subset, that is, the hash key size, determines
the upper bound of the Hamming distance among descriptors within the same
bucket. To improve the detection rate of LSH-based NN search, two techniques, namely multi-table and multi-probe, are usually used. The multi-table
technique stores the database descriptors in several hash tables, each of which
leverages a different hash function. In the query phase, the query descriptor is
hashed into a bucket of every hash table and all descriptors in each of these buckets are then further checked for matching. Multi-table improves the detection rate
of NN search at the cost of higher memory usage and longer matching time, which
is linearly proportional to the number of hash tables used. Multi-probe examines
both the bucket in which the query descriptor falls and its neighboring buckets.
While multi-probe would result in more matching checks of database descriptors,
it actually requires fewer hash tables and thus incurs lower memory usage. In
addition, it allows a larger key size and in turn smaller buckets and fewer matches
to check per bucket.
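As a concrete illustration, the multi-table scheme just described can be sketched in a few lines. This is a hedged toy implementation: the class name, key size, and table count are chosen arbitrarily rather than taken from any cited system, and descriptors are modeled as Python integers holding the bit string.

```python
# Toy multi-table LSH for binary descriptors. Each table's hash function is a
# random subset of bit positions; descriptors sharing that sub-bit-string land
# in the same bucket, which is then searched by brute force.

import random
from collections import defaultdict

class BinaryLSH:
    def __init__(self, n_bits, key_size, n_tables, seed=0):
        rng = random.Random(seed)
        # One random bit-position subset (hash function) per table.
        self.keys = [rng.sample(range(n_bits), key_size)
                     for _ in range(n_tables)]
        self.tables = [defaultdict(list) for _ in range(n_tables)]

    def _hash(self, desc, key):
        return tuple((desc >> b) & 1 for b in key)

    def add(self, desc, label):
        for key, table in zip(self.keys, self.tables):
            table[self._hash(desc, key)].append((desc, label))

    def query(self, desc):
        """Brute-force check only the candidates in the matching buckets."""
        best, best_d = None, None
        for key, table in zip(self.keys, self.tables):
            for cand, label in table[self._hash(desc, key)]:
                d = bin(desc ^ cand).count("1")  # Hamming distance
                if best_d is None or d < best_d:
                    best, best_d = label, d
        return best, best_d
```

A larger key size makes buckets smaller (fewer brute-force checks) but lowers the chance that a noisy query lands in its NN's bucket, which is exactly the trade-off that multi-table and multi-probe address.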
BoW: Bag-of-words matching. BoW matching (Sivic and Zisserman 2003) is an effective
strategy to reduce memory usage and support fast matching via a scalable indexing
scheme such as an inverted file. Typically, BoW matching quantizes local image
descriptors into visual words and then computes the image similarity by counting the frequency of co-occurrences of words. However, it completely ignores
spatial information and hence may greatly degrade accuracy. To
enhance the accuracy of BoW matching, several approaches have been proposed
to compensate for the loss of spatial information. For example, geometric verification (Philbin et al. 2007), designed for general image-matching applications, is a popular scheme that verifies local correspondences by checking
their homography consistency. Wu et al. presented a bundled feature matching
scheme (Wu et al. 2009) for partial-duplicate image detection. In their approach,
sets of local features are bundled into groups by MSER-detected regions (Matas et al. 2002), and robust geometric constraints are then enforced within each
group. Spatial pyramid matching (Lazebnik et al. 2006), which considers approximate global geometric correspondences, is another scheme that enforces geometric
constraints for more accurate BoW matching. The scheme partitions the image
into increasingly finer subregions and computes histograms of the local features found
within each subregion. To compute the similarity between two images, the distances between the histograms at each spatial level are weighted and summed.
All these schemes yield more reliable local-region matches by enforcing various
geometric constraints. However, they are computationally expensive; thus, when applying them to MAR, the recognition procedure is conducted on
the server side or in the cloud, where abundant computing and memory resources
are available.
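The basic BoW pipeline with an inverted file can be sketched as follows. The tiny vocabulary, function names, and simple co-occurrence scoring are illustrative assumptions for this sketch, not the cited systems' implementations (real systems train large vocabularies and use tf-idf weighting).

```python
# Minimal BoW matching with an inverted file. Visual words are assigned by
# nearest centroid (a stand-in for a trained vocabulary); the inverted file
# maps word id -> images containing it, so scoring touches only images that
# share at least one word with the query.

from collections import defaultdict, Counter

def quantize(desc, vocabulary):
    """Return the id of the nearest visual word (squared Euclidean distance)."""
    return min(range(len(vocabulary)),
               key=lambda i: sum((a - b) ** 2
                                 for a, b in zip(desc, vocabulary[i])))

def build_inverted_file(db_images, vocabulary):
    """db_images: {name: [descriptor, ...]}. Returns (inverted file, histograms)."""
    inv = defaultdict(set)
    word_hist = {}
    for name, descs in db_images.items():
        hist = Counter(quantize(d, vocabulary) for d in descs)
        word_hist[name] = hist
        for w in hist:
            inv[w].add(name)
    return inv, word_hist

def match(query_descs, vocabulary, inv, word_hist):
    """Score database images by word co-occurrence; return the best (name, score)."""
    q = Counter(quantize(d, vocabulary) for d in query_descs)
    scores = Counter()
    for w, cnt in q.items():
        for name in inv.get(w, ()):                   # only images sharing word w
            scores[name] += min(cnt, word_hist[name][w])  # co-occurrence count
    return scores.most_common(1)[0] if scores else None
```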
9.3.1.2.3 Local Feature-Based Object Tracking
A typical flow of local feature-based object tracking is to find corresponding local
features on consecutive frames and then estimate the homography transformation
between image frames from the local feature matches according to Equation 9.1.
Unlike marker-based tracking and pose estimation, which utilize only four
reliable corner matches, local feature-based methods often generate a large number
of correspondences, which inevitably include some outliers. Selecting reliable
matches from a large correspondence set is challenging, and existing solutions often
rely on the RANSAC or PROSAC algorithm to solve this problem. The key idea
of RANSAC and PROSAC is to iteratively estimate the parameters of a transformation
model from a set of noisy feature correspondences until a sufficient number of
consensuses is obtained. We refer readers to Fischler et al. (1981) and Chum
et al. (2005) for details of the RANSAC and PROSAC algorithms. The quality of
local features is essential for the accuracy of local feature matches: a large number
of false positive matches resulting from low-quality features can lead to an enormous number of iterations in the RANSAC and PROSAC procedures, yielding an
excessively long runtime.
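The iterative consensus idea behind RANSAC can be sketched as follows. For brevity this sketch fits a pure 2D translation (a one-correspondence model) instead of the full 8-DoF homography of Equation 9.1, so the function name and parameters are illustrative only; the hypothesize-then-count structure is the same.

```python
# Simplified RANSAC loop over feature correspondences: repeatedly draw a
# minimal sample, hypothesize a model, count inliers, and keep the model with
# the largest consensus set; finally refine over the inliers.

import random

def ransac_translation(matches, n_iters=200, tol=2.0, seed=0):
    """matches: list of ((x, y), (x', y')) pairs, possibly with outliers."""
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(n_iters):
        (x, y), (u, v) = rng.choice(matches)   # minimal sample: one match
        model = (u - x, v - y)                 # hypothesized translation
        inliers = [m for m in matches
                   if abs(m[1][0] - m[0][0] - model[0]) <= tol
                   and abs(m[1][1] - m[0][1] - model[1]) <= tol]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = model, inliers
    # Refine: average the translation over the consensus set.
    if best_inliers:
        dx = sum(b[0] - a[0] for a, b in best_inliers) / len(best_inliers)
        dy = sum(b[1] - a[1] for a, b in best_inliers) / len(best_inliers)
        best_model = (dx, dy)
    return best_model, best_inliers
```

A homography version follows the same loop, but each hypothesis is fitted from four correspondences, which is why a high outlier ratio inflates the required iteration count so quickly.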
In the following section, we first briefly review the functionalities, hardware, and
biases of these sensors and then present tracking algorithms based on the sensor data.
9.3.2.1.1 Accelerometer
An accelerometer measures the acceleration forces exerted on a mobile device. The
accelerometer reading is the sum of two forces: the gravity force due to the
weight of the device and the acceleration force due to the motion of the device.
Today's mobile devices are equipped with a three-axis accelerometer that measures the forces in the x, y, and z directions with respect to the surface plane of the
mobile device.
There are several types of accelerometers and the type used in mobile devices
is the capacitive accelerometer. Figure 9.7a illustrates the structure of a one-axis
MEMS capacitive accelerometer. The accelerometer data is acquired by measuring the force exerted on an object which is able to flex up or down. The amount the
device flexes is monitored by a set of fingers that are attached to a movable inertial
mass and flex with the device. As these fingers/plates move, they get closer to, and
move further apart from, a set of stationary fingers/plates. The proximity of these
fingers/plates can create a change in the measured capacitance between multiple
fingers/plates, which can be monitored to measure the displacement of the center
inertial mass. This structure can be extended to build a three-axis accelerometer for
measuring the displacement along all three axes.
9.3.2.1.2 Gyroscope
A three-axis gyroscope provides a 3D vector measuring the rotational (angular) velocity of a device around the three axes of the device's coordinate system. The
gyroscope was first introduced into smartphones by Apple in the iPhone 4 in June 2010;
the first Android phone to integrate a three-axis gyroscope was Google's
Nexus S in December 2010. Gyroscopes work on the principle of the Coriolis force
and are usually implemented within an integrated circuit (IC) using a vibrating mass
attached to a set of springs. If the device is rotated about the axis defined by the
first set of springs, the inner frame will be pushed away from the axis of rotation,
causing a compression in the second set of springs due to the Coriolis acceleration
experienced by the vibrating mass. An example of a MEMS gyroscope is depicted
in Figure 9.7b.
These types of gyroscopes are relatively cheap to manufacture; however, they are
often noisy and can introduce significant errors if their measurements are not modeled properly. In contrast, many advanced inertial navigation systems (INSs)
today have begun using optical gyroscopes instead, which prove to be much more
accurate. Because of their broad application and increasing popularity, more advanced
gyroscopes are being developed and integrated into new devices, which can yield
more accurate results.
Although the latest MEMS gyroscopes have smaller errors than previous generations, all gyroscopes used in smartphones today still experience a small amount of
bias. Given that the gyroscope measures a rate (change over time), the bias itself is a
rate. A gyroscope bias can be envisioned as the rotational velocity observed by the
device when it is not in motion. This view is somewhat simplified, as a bias can also
FIGURE 9.7 (a) A typical 1D MEMS capacitive accelerometer and (b) a vibrating mass gyroscope.
occur when the device is moving as well. In addition, this bias is sensitive to several
factors, including temperature, and often varies randomly over time; it is thus difficult to compensate for. This bias is often estimated as a random variable by many
filtering algorithms.
9.3.2.1.3 Magnetometer
The magnetometer measures the strength of the earth's magnetic field, which
is a vector pointing toward the magnetic north of the earth. The magnetometer found in most smart devices is primarily one of two possible types: a Hall
effect magnetometer or an anisotropic magnetoresistive (AMR) magnetometer.
Hall effect magnetometers are the most common; they provide a voltage output in response to the measured field strength and can also sense polarity. AMR
magnetometers use a thin strip of a special alloy that changes its
resistance whenever there is a change in the surrounding magnetic field. AMR
magnetometers usually yield much better accuracy; however, they are more
expensive.
One primary source of a magnetometer's error is the magnetometer bias.
This bias is caused by the surrounding environment (external to the magnetometer
itself) and can cause a wide range of errors in the magnetometer readings. The bias
itself can be separated into two types. The first is called hard iron bias. This
type of bias is primarily caused by nearby devices that produce a magnetic field. The
errors introduced by a hard iron bias are constant offsets, usually applied to all axes
of the magnetometer equally. This bias is not time or space varying and can be
compensated for by simply adjusting the readings of the magnetometer by a
constant value. The other type of bias commonly experienced by the magnetometer
is called soft iron bias. This type of bias is caused by distortions in the magnetic
field surrounding the magnetometer; it can thus take many forms and is difficult to
compensate for.
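The constant-offset compensation for hard iron bias can be sketched as follows. The min/max midpoint estimator is one common calibration heuristic (rotate the device through many orientations and center the readings), and the function names are illustrative assumptions.

```python
# Sketch of hard-iron compensation: since hard iron adds a constant offset to
# the readings, the offset can be estimated as the per-axis midpoint of the
# minimum and maximum readings collected while rotating the device.

def hard_iron_offset(samples):
    """samples: list of (mx, my, mz) magnetometer readings. Returns the offset."""
    return tuple((min(s[i] for s in samples) + max(s[i] for s in samples)) / 2
                 for i in range(3))

def compensate(reading, offset):
    """Subtract the estimated constant offset from one reading."""
    return tuple(r - o for r, o in zip(reading, offset))
```

Soft iron bias, by contrast, warps the sphere of readings into an ellipsoid, so a constant subtraction like this cannot correct it.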
9.3.2.1.4 Kalman Filtering for Sensor-Based Tracking
The goal of tracking is to obtain the translation and orientation of a device in
the 3D earth coordinate system. Each of the three sensors alone can provide the
orientation of a mobile device. Once we obtain the orientation, we can
derive the gravity force components along the three axes of the mobile device and
then subtract the gravity force from the accelerometer data to obtain the motion-induced acceleration force. By double integration of the motion-induced acceleration
force, we can obtain the translation of the device in the 3D earth coordinate
system.
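This gravity-subtraction and double-integration pipeline can be sketched as follows; names and values are illustrative, and in practice sensor noise makes unaided double integration drift quickly, which is why filtering is needed.

```python
# Sketch of the translation pipeline: given the orientation estimate, express
# gravity in the device frame, subtract it from each accelerometer reading,
# and double-integrate (Euler steps) the remaining motion-induced acceleration.

def integrate_position(accel_samples, gravity_device, dt):
    """accel_samples: per-axis accelerometer readings in the device frame.
    gravity_device: gravity vector in the device frame (from the orientation
    estimate). Returns (velocity, position) after all samples."""
    vel = [0.0, 0.0, 0.0]
    pos = [0.0, 0.0, 0.0]
    for a in accel_samples:
        linear = [a[i] - gravity_device[i] for i in range(3)]  # remove gravity
        for i in range(3):
            vel[i] += linear[i] * dt   # first integration  -> velocity
            pos[i] += vel[i] * dt      # second integration -> position
    return vel, pos
```

Any constant error left in the gravity estimate integrates into a quadratically growing position error, which illustrates why accurate orientation is a prerequisite for translation tracking.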
However, since each type of sensor data is quite noisy, relying on a single type of
sensor cannot achieve accurate tracking. Many approaches apply a filtering-based
method to fuse the three types of sensor data for a more reliable and precise tracking
result. In the following, we overview the most standard filtering method, Kalman
filtering, which is also the algorithm implemented in the Android operating system for
estimating a smartphone's orientation. For more advanced yet computationally expensive filters, such as unscented Kalman filters or particle filters, please refer to Li et al.
(2013) and Cheon et al. (2007) for details.
The Kalman filtering process for orientation estimation can be broken down into
two primary steps: the prediction step and the updating step. In the prediction phase,
the filter uses the gyroscope measurement to predict the dynamics of the device rotation. The gyroscope measurements are thus integrated directly into the state transition equation and used to provide a predicted state estimate. In particular, the state
equation is defined using a seven-element state vector consisting of the quaternion
q(t) (i.e., a four-element orientation vector) and the gyro bias b(t):

X(t) = [q(t); b(t)]    (9.7)
We define an error angle vector δθ as the small rotation between the estimated and the
true orientation of a device in the earth coordinate system. Similarly, an error bias
vector δb is defined as the small difference between the estimated and the true bias of
the device. Accordingly, the error state propagation model is given by

[ δθ̇ ]   [ −⌊ω×⌋   −I_{3×3} ] [ δθ ]   [ −I_{3×3}   0_{3×3} ] [ n_ω ]
[ δḃ  ] = [ 0_{3×3}   0_{3×3} ] [ δb ] + [ 0_{3×3}    I_{3×3} ] [ n_b ]    (9.8)
where ω is the rotation velocity around the three axes, ⌊ω×⌋ is its skew-symmetric matrix,
and n_ω and n_b model the gyroscope noise and bias, respectively.
In most cases, n_ω is assumed to be an independent white Gaussian distribution along
each axis of the gyroscope input; therefore, its expected value is E[n_ω] = 0_{3×1}.
The gyroscope bias model is usually defined as δḃ = n_b, where n_b is an independent
white Gaussian distribution along each axis.
The solution to this differential equation has the closed-form solution found in
Trawny and Roumeliotis (2005) and yields the following state transition matrix:

Φ = [ Θ        Ψ       ]
    [ 0_{3×3}  I_{3×3} ]    (9.9)

where

Θ = I_{3×3} − ⌊ω×⌋ (sin(|ω|Δt) / |ω|) + ⌊ω×⌋² ((1 − cos(|ω|Δt)) / |ω|²)

Ψ = −I_{3×3} Δt + ⌊ω×⌋ ((1 − cos(|ω|Δt)) / |ω|²) − ⌊ω×⌋² ((|ω|Δt − sin(|ω|Δt)) / |ω|³)
In the updating phase, the filter combines the previously estimated state with
the recorded accelerometer and magnetometer measurements, which come directly from
sampling the IMU, to revise the orientation estimate. Each recorded measurement complies with a model that describes its relationship with the estimated
states and the measurement noise. Specifically, the measurement model
is of the form

z = REB(q) z_0 + n_z    (9.10)

where
E[n_z] = 0
E[n_z n_z^T] = R
REB(q) is a rotation matrix from the earth coordinate system to the predicted
device coordinate system

The rotation matrix is obtained using the propagated quaternion from the process
model, and z_0 is a unit vector representation of north in the earth coordinate system. The
measurement residual is defined as r = z − ẑ, where z is the input measurement. Since
this residual represents the error between the measurement vector and the predicted
vector, it is a close approximation of the error angle vector δθ. This approximation is
derived in Trawny and Roumeliotis (2005) and defined as

r ≈ [ ⌊REB(q) z_0 ×⌋   0_{3×3} ] [ δθ ]
                                 [ δb ] + n_z    (9.11)

which yields the measurement matrix

H = [ ⌊REB(q) z_0 ×⌋   0_{3×3} ]    (9.12)
After the measurement update, the residual obtained from the measurement model is
used to update the quaternion, which in turn serves as the filter output.
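As a didactic stand-in for the full quaternion filter, the same predict/correct structure can be shown with a one-axis complementary filter: the gyro rate (minus its estimated bias) drives the prediction, and the accelerometer-derived angle supplies the correction. This simplified sketch is our illustration, not the Android implementation or the cited indirect Kalman filter.

```python
# One-axis sensor fusion with a complementary filter. Each step mirrors the
# Kalman structure above: integrate the gyro to predict, then pull the
# estimate toward the accelerometer-derived tilt angle to correct drift.

import math

def fuse_tilt(gyro_rates, accels, dt, alpha=0.98, bias=0.0):
    """gyro_rates: rad/s about one axis; accels: (ax, az) pairs.
    alpha weights the gyro prediction against the accelerometer correction."""
    angle = math.atan2(accels[0][0], accels[0][1])  # initialize from gravity
    for w, (ax, az) in zip(gyro_rates, accels):
        predicted = angle + (w - bias) * dt          # prediction step (gyro)
        measured = math.atan2(ax, az)                # update step (accel)
        angle = alpha * predicted + (1 - alpha) * measured
    return angle
```

The constant alpha plays the role of a fixed Kalman gain: a proper Kalman filter instead adapts this weighting from the noise covariances and estimates the gyro bias online.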
(i.e., the location on earth) and this information is used to initialize the visual tracking system, which in turn gives the user's local pose and view direction. In
Naimark et al. (2002), the authors proposed combining visual tracking and GPS
for outdoor building visualization: the user can place virtual models on Google
Earth, and the app can retrieve and visualize them based on the user's GPS location.
Another promising direction is to combine vision information with motion sensor
data (i.e., gyroscope, accelerometer, and magnetometer) to provide more accurate and efficient object tracking. The trend of integrating more sensors into mobile
devices has not stopped yet. For example, Google has released a new mobile
platform, Tango, which integrates six-degree-of-freedom motion sensors, depth sensors, and high-quality cameras. Amazon has announced its new Fire phone, which
includes four cameras tucked into the front corners of the phone, in addition to
other motion sensors. Advances in mobile hardware offer opportunities to gain
richer contextual information surrounding a mobile device and in turn open a door
for new approaches that best utilize all available multimodal information.
class solution with various pricing plans determined by your app's total number of
image recognitions per month. Generally speaking, the Vuforia development infrastructure facilitates, and significantly simplifies, the development of MAR apps.
9.5 CONCLUSIONS
The advancement of mobile technology, in terms of hardware computing power,
seamless connectivity to the cloud, and fast computer vision algorithms, has raised
AR into the mainstream of mobile apps. Following the widespread popularity of a
handful of killer MAR applications already commercially available, it is believed
that MAR will expand exponentially in the next few years. The advent of MAR will
have a profound and lasting impact on the way people use their smartphones and tablets. These emerging MAR apps will turn our everyday world into a fully interactive
digital experience, from which we can see, hear, feel, and even smell the information
in a different way. This emerging direction will push the industry toward truly ubiquitous computing and a technologically converged paradigm.
The scalability, accuracy, and efficiency of the underlying techniques (i.e., object
recognition and tracking) are key factors influencing user experience of MAR apps.
New algorithms in computer vision and pattern recognition, such as lightweight feature extraction, have been developed to provide efficiency and compactness on low-power mobile devices while maintaining sufficiently good accuracy. Several
efforts have also been made to analyze the particular hardware limitations of executing existing recognition and tracking algorithms on mobile devices and to explore adaptation
techniques to address these limitations. In addition to advances in the development
of lightweight computer vision algorithms, a variety of sensors have been integrated
into modern smartphones, enabling location recognition (e.g., via GPS) and device
tracking (e.g., via gyroscope, accelerometer, and magnetometer) at little computational cost. However, due to the large noise of the low-cost sensors in today's
smartphones, the accuracy of location recognition and device tracking is usually
low and cannot meet the requirements of apps that demand high accuracy. Fusing
visual information with sensor data is a promising direction to achieve both high
accuracy and efficiency, and we shall see an increasing amount of research work
along this direction in the near future.
REFERENCES
Alahi, A., Ortiz, R., and Vandergheynst, P. 2012. FREAK: Fast retina keypoint. In Proceedings
of the Conference on Computer Vision and Pattern Recognition.
Ashby, F.G. and Perrin, N.A. 1988. Toward a unified theory of similarity and recognition.
Psychological Review, 95:124–150.
Bay, H., Ess, A., Tuytelaars, T., and Gool, L.V. 2006. SURF: Speeded-up robust features. In
Proceedings of the European Conference on Computer Vision.
Bay, H., Ess, A., Tuytelaars, T., and Gool, L.V. June 2008. Speeded-up robust features (SURF).
Computer Vision and Image Understanding, 110(3):346–359.
Calonder, M., Lepetit, V., Strecha, C., and Fua, P. 2010. BRIEF: Binary robust independent
elementary features. In Proceedings of the European Conference on Computer Vision.
Matas, J., Chum, O., Urban, M., and Pajdla, T. 2002. Robust wide baseline stereo from maximally stable extremal regions. In Proceedings of the British Machine Vision Conference,
pp. 384–396.
Naimark, L. and Foxlin, E. 2002. Circular data matrix fiducial system and robust image
processing for a wearable vision-inertial self-tracker. In Proceedings of the International
Symposium on Mixed and Augmented Reality, pp. 27–36.
OpenCV for Android: http://opencv.org/platforms/android.html.
Philbin, J., Chum, O., Isard, M., Sivic, J., and Zisserman, A. 2007. Object retrieval with large
vocabularies and fast spatial matching. In Proceedings of Computer Vision and Pattern
Recognition, pp. 1–8.
Reitmayr, G. and Drummond, T.W. 2007. Initialization for visual tracking in urban environments. In Proceedings of the International Symposium on Mixed and Augmented Reality, pp. 161–172.
Ribo, M., Lang, P., Ganster, H., Brandner, M., Stock, C., and Pinz, A. 2002. Hybrid tracking for outdoor augmented reality applications. IEEE Computer Graphics and Applications,
22(6):54–63.
Rosin, P.L. 1999. Measuring corner properties. Computer Vision and Image
Understanding, 73(2):291–307.
Rosten, E. and Drummond, T. 2006. Machine learning for high speed corner detection. In
Proceedings of the European Conference on Computer Vision.
Rublee, E., Rabaud, V., Konolige, K., and Bradski, G. 2011. ORB: An efficient alternative to
SIFT or SURF. In Proceedings of the International Conference on Computer Vision.
Simard, P., Bottou, L., Haffner, P., and LeCun, Y. 1998. Boxlets: A fast convolution algorithm for signal processing and neural networks. In Proceedings of Neural Information
Processing Systems (NIPS).
Sivic, J. and Zisserman, A. 2003. Video Google: A text retrieval approach to object
matching in videos. In Proceedings of the International Conference on Computer Vision,
2:1470–1477.
Ta, D.N., Chen, W.C., Gelfand, N., and Pulli, K. 2009. SURFTrac: Efficient tracking and
continuous object recognition using local feature descriptors. In Proceedings of the
Conference on Computer Vision and Pattern Recognition.
Terriberry, T.B., French, L.M., and Helmsen, J. 2008. GPU accelerating speeded-up robust
features. In Proceedings of 3D Data Processing, Visualization and Transmission.
Trawny, N. and Roumeliotis, S. 2005. Indirect Kalman Filter for 3D Attitude Estimation. Tech.
Rep. 2. Department of Computer Science and Engineering, University of Minnesota,
Minneapolis, MN.
Tuytelaars, T. and Mikolajczyk, K. 2008. Local invariant feature detectors: A survey.
Foundations and Trends in Computer Graphics and Vision, 3:177–280.
Uchiyama, S., Takemoto, K., Satoh, K., Yamamoto, H., and Tamura, H. 2002. MR platform: A
basic body on which mixed reality applications are built. In Proceedings of the International
Symposium on Mixed and Augmented Reality, p. 246.
Vuforia main page: https://www.vuforia.com/.
Wu, Z., Ke, Q.F., Isard, M., and Sun, J. 2009. Bundling features for large scale partial-duplicate web image search. In Proceedings of Computer Vision and Pattern Recognition,
pp. 25–32.
Yang, X. and Cheng, K.T. June 2012a. Accelerating SURF detector on mobile devices. ACM
International Conference on Multimedia, Nara, Japan.
Yang, X. and Cheng, K.T. 2012b. LDB: An ultrafast feature for scalable augmented reality on
mobile device. In Proceedings of the International Symposium on Mixed and Augmented
Reality, pp. 49–57.
Yang, X. and Cheng, K.T. 2014a. Local difference binary for ultrafast distinctive feature description. IEEE Transactions on Pattern Analysis and Machine Intelligence,
36(1):188–194.
Yang, X. and Cheng, K.T. 2014b. Learning optimized local difference binaries for scalable
augmented reality on mobile devices. IEEE Transactions on Visualization and Computer
Graphics, 20(6):852–865.
Yang, X., Liao, C.Y., and Liu, Q. 2012. MixPad: Augmenting paper with mice & keyboards
for bimanual, cross-media and fine-grained interaction with documents. In Proceedings
of ACM Multimedia, pp. 1145–1148.
Yang, X., Liao, C.Y., Liu, Q., and Cheng, K.T. 2011a. Minimum correspondence sets for
improving large-scale augmented paper. In Proceedings of the International Conference on
Virtual Reality Continuum and Its Applications in Industry.
Yang, X., Liao, C.Y., Liu, Q., and Cheng, K.T. 2011b. Large-scale EMM identification with
geometry-constrained visual word correspondence voting. In Proceedings of the ACM
International Conference on Multimedia Retrieval.
10 Taxonomy, Research
CONTENTS
10.1 Introduction................................................................................................... 227
10.2 Taxonomies.................................................................................................... 229
10.2.1 Visuo-Haptic Reality–Virtuality Continuum.................................... 229
10.2.2 Artificial Recreation and Augmented Perception.............................. 232
10.2.3 Within- and Between-Property Augmentation.................................. 233
10.3 Components Required for Haptic AR........................................................... 234
10.3.1 Interface for Haptic AR..................................................................... 234
10.3.2 Registration between Real and Virtual Stimuli................................. 236
10.3.3 Rendering Algorithm for Augmentation........................................... 237
10.3.4 Models for Haptic AR....................................................................... 238
10.4 Stiffness Modulation..................................................................................... 239
10.4.1 Haptic AR Interface...........................................................................240
10.4.2 Stiffness Modulation in Single-Contact Interaction..........................240
10.4.3 Stiffness Modulation in Two-Contact Squeezing.............................. 243
10.5 Application: Palpating Virtual Inclusion in Phantom with Two Contacts......245
10.5.1 Rendering Algorithm......................................................................... 247
10.6 Friction Modulation.......................................................................................248
10.7 Open Research Topics................................................................................... 249
10.8 Conclusions.................................................................................................... 250
References............................................................................................................... 251
10.1 INTRODUCTION
This chapter introduces an emerging research field in augmented reality (AR), called
haptic AR. As AR enables a real space to be transformed to a semi-virtual space by
providing a user with the mixed sensations of real and virtual objects, haptic AR does
the same for the sense of touch; a user can touch a real object, a virtual object, or a real
object augmented with virtual touch. Visual AR is a relatively mature technology and
is being applied to diverse practical applications such as surgical training, industrial
manufacturing, and entertainment (Azuma et al. 2001). In contrast, the technology
for haptic AR is quite recent and poses a great number of new research problems
ranging from modeling to rendering in terms of both hardware and software.
Haptic AR promises great potential to enrich user interaction in various applications.
For example, suppose that a user is holding a pen-shaped magic tool in the hand, which
allows the user to touch and explore a virtual vase overlaid on a real table. Alternatively, the
user may draw a picture on the table with an augmented feel of using a paint brush on
a smooth piece of paper, or of using a marker on a stiff whiteboard. In a more practical
setting, medical students can practice cancer palpation skills by exploring a phantom
body while trying to find virtual tumors that are rendered inside the body. A consumer-targeted application can be found in online stores: consumers can see clothes displayed
on the touchscreen of a tablet computer and feel their textures with bare fingers, for
which the textural and frictional properties of the touchscreen are modulated to those
of the clothes. Another prominent example is augmentation or guidance of motor skills
by means of external haptic (force or vibrotactile) feedback, for example, shared control or motor learning of complex skills such as driving and calligraphy. Creating such
haptic modulations belongs to the realm of haptic AR. Although we have a long way to
go in order to realize all the envisioned applications of haptic AR, some representative
examples that have been developed in recent years are shown in Figure 10.1.
FIGURE 10.1 Representative applications of haptic AR. (a) AR-based open surgery simulator. (From Harders, M. et al., IEEE Trans. Visual. Comput. Graph., 15, 138, 2009.) (b) Haptic AR breast tumor palpation system. (From Jeon, S. and Harders, M., IEEE Trans. Haptics, 99, 1, 2014.) (c) Texture modeling and rendering based on contact acceleration data. (Reprinted from Romano, J.M. and Kuchenbecker, K.J., IEEE Trans. Haptics, 5, 109, 2011. With permission.) (d) Conceptual illustration of the haptic AR drawing example.
In this chapter, we first address three taxonomies for haptic AR based on a composite visuo-haptic reality–virtuality continuum, the functional aspects of haptic AR
applications, and the subject of augmentation (Section 10.2). A number of studies
related to haptic AR are reviewed and classified based on the three taxonomies.
Based on the review, the associated research issues, along with the components needed for
a haptic AR system, are elucidated in Section 10.3. Sections 10.4 through 10.6 introduce our approach for the augmentation of real object stiffness and friction in
interactions with one or two contact points. A discussion of the open research issues
for haptic AR is provided in Section 10.7, followed by brief conclusions in Section
10.8. We hope that this chapter will prompt more research interest in this exciting,
yet unexplored, area of haptic AR.
10.2 TAXONOMIES
10.2.1 Visuo-Haptic Reality–Virtuality Continuum
General concepts associated with AR, or more generally, mixed reality (MR) were
defined earlier by Milgram and Colquhoun Jr. (1999) using the reality–virtuality
continuum shown in Figure 10.2a. The continuum includes all possible combinations
of purely real and virtual environments, with the intermediate area corresponding to
MR. Whether an environment is closer to reality or virtuality depends on the amount
of overlay or augmentation that the computer system needs to perform; the more augmentation performed, the closer to virtuality. This criterion allows MR to be further
classified into AR (e.g., a heads-up display in an aircraft cockpit) and augmented
virtuality (e.g., a computer game employing a virtual dancer with the face image
of a famous actress). We note, however, that the current literature does not strictly
discriminate between the two terms and uses AR and MR interchangeably.
Extending the concept, we can define a similar realityvirtuality continuum for
the sense of touch and construct a visuo-haptic realityvirtuality continuum by compositing the two unimodal continua shown in Figure 10.2b. This continuum can be
valuable for building the taxonomy of haptic MR. In Figure 10.2b, the whole visuohaptic continuum is classified into nine categories, and each category is named in an
abbreviated form. The shaded regions belong to the realm of MR. In what follows, we
review the concepts and instances associated with each category, with more attention
to those of MR. Note that the continuum for touch includes all kinds of haptic feedback and does not depend on the specific types of haptic sensations (e.g., kinesthetic,
tactile, or thermal) or interaction paradigms (e.g., tool-mediated or bare-handed).
In the composite continuum, the left column has the three categories of haptic reality, vR-hR, vMR-hR, and vV-hR, where the corresponding environments provide only real haptic sensations. Among them, the simplest category is vR-hR,
which represents purely real environments without any synthetic stimuli. The other
end, vV-hR, refers to the conventional visual virtual environments with real touch,
for example, using a tangible prop to interact with virtual objects. Environments
between the two ends belong to vMR-hR, in which a user sees mixed objects but
still touches real objects. A typical example is the so-called tangible AR that has
been actively studied in the visual AR community. In tangible AR, a real prop held
[Figure 10.2: (a) The reality–virtuality continuum, spanning from a purely real to a purely virtual environment, with MR, AR, and augmented virtuality in between. (b) The composite visuo-haptic reality–virtuality continuum, with nine categories formed by crossing visual reality, mixed reality, and virtuality with haptic reality, mixed reality, and virtuality (vR/vMR/vV × hR/hMR/hV).]
in the hand is usually used as a tangible interface for visually mixed environments (e.g., the MagicBook in Billinghurst et al. 2001), and its haptic properties are regarded as unimportant for the application. Another example is the projection-augmented model: a computer-generated image is projected onto a real physical model to create a realistic-looking object, and the model can be touched with the bare hand (e.g., see Bennett and Stevens 2006). Since the material properties (e.g., texture) of the real object may not agree with its visually augmented model, haptic properties are usually displayed incorrectly in this application.
The categories in the right column of the composite continuum, vR-hV, vMR-hV,
and vV-hV, are for haptic virtuality, corresponding to environments with only virtual
haptic sensations, and have received the most attention from the haptics research
community. Robot-assisted motor rehabilitation can be an example of vR-hV where
synthetic haptic feedback is provided in a real visual environment, while an interactive virtual simulator is an instance of vV-hV where the sensory information of both modalities is virtual. In the intermediate category, vMR-hV, purely virtual haptic objects are placed in a visually mixed environment and are rendered using a haptic interface on the basis of conventional haptic rendering methods for virtual objects. Earlier attempts in this category focused on how to integrate haptic rendering of virtual objects into existing visual AR frameworks, and they identified the precise registration between the haptic and the visual coordinate frames as a key issue (Adcock et al. 2003, Vallino and Brown 1999). For this topic, Kim et al. (2006) applied an adaptive low-pass filter to reduce the trembling error of a low-cost vision-based tracker using ARToolKit, and upsampled the tracking data for use with 1 kHz haptic rendering. Bianchi et al. further improved the registration accuracy via intensive calibration of a vision-based object tracker (Bianchi et al. 2006a,b). Their later work explored the potential of visuo-haptic AR technology for medical training with a highly stable and accurate AR system (Harders et al. 2009). Ott et al. also applied the HMD-based visuo-haptic framework to training processes in industry and demonstrated its potential (Ott et al. 2007). In applications, a half mirror has often been used for constructing a visuo-haptic framework owing to the better collocation of visual and haptic feedback, for example, ImmersiveTouch (Luciano et al. 2005), the Reachin Display (Reachin Technology), the PARIS display (Johnson et al. 2000), and SenseGraphics 3D-IW (SenseGraphics). Such frameworks were, for instance, applied to cranial implant design (Scharver et al. 2004) and an MR painting application (Sandor et al. 2007).
The last categories for haptic MR, vR-hMR, vMR-hMR, and vV-hMR, with which the rest of this chapter is concerned, lie in the middle column of the composite continuum. A common characteristic of haptic MR is that synthetic haptic signals generated by a haptic interface modulate or augment stimuli that occur due to contact between a real object and a haptic interface medium, that is, a tool or a body part. The VisHap system (Ye et al. 2003) is an instance of vR-hMR that provides mixed haptic sensations in a real environment. In this system, some properties of a virtual object (e.g., shape and stiffness) are rendered by a haptic device, while others (e.g., texture and friction) are supplied by a real prop attached at the end-effector of the device. Other examples in this category are the SmartTool (Nojima et al. 2002) and SmartTouch (Kajimoto et al. 2004) systems. They utilized various sensors (optical and electrical conductivity sensors) to capture real signals that could hardly be perceived by the bare hand, transformed the signals into haptic information, and then delivered them to the user in order to facilitate certain tasks (e.g., peeling the white from the yolk of an egg). The MicroTactus system (Yao et al. 2004) is another example of vR-hMR, which detects and magnifies acceleration signals caused by the interaction of a pen-type probe with a real object. The system was shown to improve the performance of tissue boundary detection in arthroscopic surgical training. A similar pen-type haptic AR system, Ubi-Pen (Kyung and Lee 2009), embedded miniaturized texture and vibrotactile displays in the pen, adding realistic tactile feedback for interaction with a touch screen in mobile devices.
On the other hand, environments in vV-hMR use synthetic visual stimuli. For example, Borst et al. investigated the utility of haptic MR in a visual virtual environment by adding synthetic force to a passive haptic response for a panel control task (Borst and Volz 2005). Their results showed that mixed force feedback was better than synthetic force alone in terms of task performance and user preference. In vMR-hMR, both modalities rely on mixed stimuli. Ha et al. installed a vibrator in a real tangible prop to produce virtual vibrotactile sensations in addition to the real haptic information of the prop in a visually mixed environment (Ha et al. 2007). They demonstrated that the virtual vibrotactile feedback enhances immersion in an AR-based handheld game. Bayart et al. introduced a teleoperation framework where the force measured at the remote site is presented at the master side with additional virtual force and mixed imagery (Bayart et al. 2007, 2008). In particular, unlike most of the related studies introduced earlier, they tried to modulate a certain real haptic property with virtual force feedback for a hole-patching task and a painting application.
Several remarks need to be made. First, the vast majority of related work, except Bayart et al. (2008), Borst and Volz (2005), and Nojima et al. (2002), has used the term haptic AR without distinguishing vMR-hV and hMR, although the research issues associated with the two categories are fundamentally different. Second, haptic MR can be further classified into haptic AR and haptic augmented virtuality using the same criterion as for visual MR. All of the research instances of hMR introduced earlier correspond to haptic AR, since little knowledge regarding the environment is managed by the computer for haptic augmentation. However, despite its potential, attempts to develop systematic and general computational algorithms for haptic AR have been scarce. An instance of haptic augmented virtuality can be haptic rendering systems that use haptic signals captured from a real object (e.g., see Hoever et al. 2009, Okamura et al. 2001, Pai et al. 2001, Romano and Kuchenbecker 2011) in addition to virtual object rendering, although such a concept has not been formalized before. Third, although the taxonomy is defined for composite visuo-haptic configurations, a unimodal case (e.g., no haptic or visual feedback) can also be mapped to the corresponding 1D continuum on the axes in Figure 10.2b.
TABLE 10.1 Classification of Related Studies Using the Composite Taxonomy
[The table classifies the related literature along two axes: within-property versus between-property augmentation, and artificial recreation versus augmented perception.]
Further, the last two taxonomies are combined to construct a composite taxonomy, and all relevant literature in the hMR category is classified using this taxonomy in Table 10.1. Note that most haptic AR systems have both within- and between-property characteristics to some degree. For clear classification, we only examined the key augmentation features in Table 10.1.
[Figure: Structure of a haptic AR rendering system. The interaction tool is coupled to the real object when in contact and is coupled to the haptic interface, whose sensing and actuation are managed by a computer; the human user closes the loop through perception, the brain, the sensorimotor system, and action.]
both the real environment and the haptic interface are mixed and transmitted to the user. Designing this feel-through tool is therefore of substantial importance in the design of a haptic AR interface.
The feel-through can be either direct or indirect. Direct feel-through, analogous to optical see-through in visual AR, transmits the relevant physical signals directly to the user via a mechanically coupled implement. In contrast, in indirect feel-through (similar to video see-through), the relevant physical signals are sensed, modeled, and synthetically reconstructed for the user to feel, for example, in master–slave teleoperation. In direct feel-through, preserving the realism of the real environment and mixing real and virtual stimuli is relatively easy, but real signals must be compensated for with great care for augmentation. To this end, the system may need to employ very accurate estimation methods of the real response for active compensation, or special hardware for passive compensation, for example, a ball-bearing tip to remove friction (Jeon and Choi 2010) or a deformable tip to compensate for real contact vibration (Hachisu et al. 2012). In indirect feel-through, on the contrary, modulating real signals is easier since all the final stimuli are synthesized, but more sophisticated hardware is required for transparent rendering of virtual stimuli with high realism.
Different kinds of coupling may exist. Mechanical coupling is a typical example, for instance, a force-feedback haptic stylus instrumented with a contact tip (Jeon and Choi 2011). Other forms, such as thermal coupling and electric coupling, are also possible depending on the target property. In between-property augmentation, the coupling may not be very tight; for example, only position data and timing are shared (Borst and Volz 2005).
Haptic AR tools can come in many different forms. In addition to typical styli, very thin sheath-type tools are also used, for example, with sensors on one side and actuators on the other side of a sheath (Nojima et al. 2002). Sometimes a real object itself is the tool, for example, when both sensing and actuation modules are embedded in a tangible marker (Ha et al. 2006).
A tool and coupling for haptic AR need to be designed very carefully. Each of the three components involved in the interaction requires a proper attachment to the tool and appropriate sensing and actuation capability, and eventually all of these should be compactly integrated into the tool in a way that lets a user handle it naturally. To this end, the form factors of the sensors, attachment joints, and actuation parts should be carefully designed to maximize the reliability of sensing and actuation while maintaining sufficient freedom of movement.
between virtual and real signals. In the case of within-property augmentation, mixing happens in a single property, and thus the virtual signals related to a target property need to be exactly aligned with the corresponding real signals for harmonious merging and a smooth transition along the line between real and virtual. This requires very sophisticated registration, often with the estimation of real properties based on sensors and environment models (see Section 10.4 for how we have approached this issue). In between-property augmentation, however, different properties are usually treated separately, and the virtual signals of one target property do not have to be closely associated with the real signals of the other properties. Thus, the registration may be of lesser accuracy in this case.
TABLE 10.2 Characteristics of the Categories (registration and rendering requirements for the within-property and between-property cases are discussed in the text)

Models required:
- Artificial Recreation: models for physics simulation; sometimes also models for registration and compensation.
- Augmented Perception: models for registration and compensation.

Rendering:
- Direct Feel-Through: real-time compensation of the real property is needed.
- Indirect Feel-Through: a transparent haptic rendering algorithm and interface are needed.
FIGURE 10.4 Haptic AR interface, consisting of two PHANToM Premium 1.5 devices and a Nano17 force sensor. (Reprinted from Jeon, S. and Harders, M., Extending haptic augmented reality: Modulating stiffness during two-point squeezing, in Proceedings of the Haptics Symposium, 2012, pp. 141–146. With permission.)
fr(t) = fh(t) + fd(t). (10.1)

The reaction force fr(t) during contact can be decomposed into two orthogonal force components, as shown in Figure 10.5:

fr(t) = frn(t) + frt(t), (10.2)

where frn(t) is the result of object elasticity in the normal direction and frt(t) is the frictional tangential force.
FIGURE 10.5 Variables for single-contact stiffness modulation. (Reprinted from Jeon, S. and Choi, S., Presence: Teleoperators and Virtual Environments, 20, 337, 2011. With permission.)
Let x(t) be the displacement caused by the elastic force component, which represents the distance between the haptic interface tool position, p(t), and the original non-deformed position pc(t) of a contacted particle on the object surface. If we denote the unit vector in the direction of frn(t) by un(t) and the target modulation stiffness by k̃(t), the force that a user should feel is

fh(t) = k̃(t) x(t) un(t) + frt(t). (10.3)

This equation indicates the tasks that a stiffness modulation algorithm has to carry out in every loop: (1) detection of the contact between the haptic tool and the real object for spatial and temporal registration, (2) measurement of the reaction force fr(t), (3) estimation of the direction un(t) and magnitude x(t) of the resulting deformation for stiffness augmentation, and (4) control of the device-rendered force fd(t) to produce the desired force fh(t). The following section describes how we address these four steps.
In Step 1, we use force sensor readings for contact detection since the entire geometry of the real environment is not available. A collision is regarded to have occurred when the forces sensed during interaction exceed a threshold. To increase the accuracy, we developed algorithms to suppress noise, as well as to compensate for the weight and dynamic effects of the tool. See Jeon and Choi (2011) for details. Step 2 is also simply done with the force sensor attached to the probing tool.

Step 3 is the key process for stiffness modulation. We first identify the friction and deformation dynamics of a real object in a preprocessing step, and use them later during rendering to estimate the unknown variables for merging real and virtual forces. The details of this process are summarized in the following section.
Before augmentation, we carry out two preprocessing steps. First, the friction between the real object and the tool tip is identified using the Dahl friction model (Jeon and Choi 2011). The original Dahl model is transformed to an equivalent discrete-time difference equation, as described in Mahvash and Okamura (2006). It also includes a velocity-dependent term to cope with viscous friction. The procedure for friction identification is detailed in Jeon and Choi (2011). Second, the nonlinear deformation response of the real object is identified using the Hunt–Crossley model:

f(t) = k x(t)^m + b x(t)^m ẋ(t), (10.5)

where k and b are stiffness and damping constants and m is a constant exponent (usually 1 < m < 2).
For identification, data triples consisting of displacement, velocity, and reaction force magnitude are collected through repeated presses and releases of a deformable sample in the normal direction. The data are passed to a recursive least-squares algorithm for iterative estimation of the Hunt–Crossley model parameters (Haddadi and Hashtrudi-Zaad 2008).
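To make the identification step concrete, the following Python sketch fits the Hunt–Crossley parameters to recorded press-and-release data. The chapter's system uses a recursive least-squares estimator (Haddadi and Hashtrudi-Zaad 2008); the batch variant below, which grid-searches the exponent m and solves a linear least-squares problem for k and b at each candidate, is a simplified stand-in, and the function name and grid resolution are our own assumptions.

```python
import numpy as np

def identify_hunt_crossley(x, xdot, f, m_grid=np.linspace(1.0, 2.0, 101)):
    """Fit f = k*x**m + b*(x**m)*xdot to measured (x, xdot, f) samples.

    For each candidate exponent m the model is linear in (k, b), so a
    small least-squares problem is solved; the m with the lowest
    residual is kept.
    """
    best = None
    for m in m_grid:
        xm = x**m
        A = np.column_stack([xm, xm * xdot])      # regressors for k and b
        sol, *_ = np.linalg.lstsq(A, f, rcond=None)
        err = np.linalg.norm(A @ sol - f)         # fit residual for this m
        if best is None or err < best[0]:
            best = (err, sol[0], sol[1], m)
    _, k, b, m = best
    return k, b, m
```

With synthetic data generated from known parameters, the routine recovers them, which is a quick sanity check before applying it to sensor recordings.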
For rendering, the following computational process is executed in every haptic rendering frame. First, two variables, the deformation direction un(t) and the magnitude of the deformation x(t), are estimated. The former is derived as follows. Equation 10.2 indicates that the response force fr(t) consists of two perpendicular force components, frn(t) and frt(t). Since un(t) is the unit vector of frn(t), un(t) becomes

un(t) = (fr(t) − frt(t)) / ||fr(t) − frt(t)||. (10.6)

The unknown variable in (10.6) is frt(t). The magnitude of frt(t) is estimated using the identified Dahl model. Its direction is derived from the tangent vector at the current contact point p(t), which is found by projecting p(t) onto un(t − Δt), the normal direction of the previous frame, and subtracting the projection from p(t).
The next part is the estimation of x(t). The assumption of material homogeneity allows us to approximate it directly from the inverse of the previously identified Hunt–Crossley model. Finally, using the estimated un(t) and x(t), fd(t) is calculated using (10.4) and then sent to the haptic AR interface.
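The per-frame computation can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes the tangential force and deformation magnitude have already been estimated (by the Dahl model and the inverse Hunt–Crossley model, respectively), and it assumes the device renders the difference between the target elastic response and the estimated real one while leaving real friction untouched; all names are hypothetical.

```python
import numpy as np

def stiffness_modulation_force(f_r, f_rt, x, k_tilde, k_est):
    """One rendering step of single-contact stiffness modulation (sketch).

    f_r     : measured reaction force vector (force sensor)
    f_rt    : estimated tangential friction force vector (e.g., Dahl model)
    x       : estimated deformation magnitude (e.g., inverse Hunt-Crossley)
    k_tilde : target stiffness the user should feel
    k_est   : estimated real stiffness of the object
    Returns the commanded device force vector.
    """
    f_rn = f_r - f_rt                        # normal component (cf. Eq. 10.6)
    u_n = f_rn / np.linalg.norm(f_rn)        # contact normal direction
    # Device adds the difference between desired and real elastic response.
    return (k_tilde - k_est) * x * u_n
```

For instance, with a measured reaction of 5 N along the normal and a target stiffness twice the estimated real one, the device would add another 5 N along the same direction.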
In Jeon and Choi (2011), we assessed the physical performance of each component and the perceptual performance of the final rendering result using various real samples. In particular, the perceptual quality of modulated stiffness, evaluated in a psychophysical experiment, showed that rendering errors were less than the human discriminability of stiffness. This demonstrates that our system can provide perceptually convincing stiffness modulation.
fr,*(t) can be further decomposed into the pure weight fw,*(t) and a force component in the squeezing direction, fsqz,*(t), as shown in Figure 10.6, resulting in

fr,*(t) = fw,*(t) + fsqz,*(t). (10.7)

Since the displacement and the force along the squeezing direction contribute to stiffness perception, the force component of interest is fsqz,*(t). Then, (10.7) can be rewritten as

fsqz,*(t) = fr,*(t) − fw,*(t). (10.8)
FIGURE 10.6 Variables for two-contact stiffness modulation. (Reprinted from Jeon, S. and Harders, M., Extending haptic augmented reality: Modulating stiffness during two-point squeezing, in Proceedings of the Haptics Symposium, 2012, pp. 141–146. With permission.)
fh,*(t) = k̃(t) x*(t) u*(t), (10.10)

where x*(t) represents the displacement along the squeezing direction and u*(t) is the unit vector toward the direction of that deformation. Combining (10.9) and (10.10) results in the virtual force for the haptic interfaces to render for the desired augmentation:

fd,*(t) = k̃(t) x*(t) u*(t) − fsqz,*(t). (10.11)
Here again, (10.11) indicates that we need to estimate the displacement x*(t) and the deformation direction u*(t) at each contact point. The known variables are the reaction forces fr,*(t) and the tool tip positions p*(t). To this end, the following three observations about an object held in a steady state are utilized. First, the magnitudes of the two squeezing forces fsqz,1(t) and fsqz,2(t) are the same, but their directions are opposite (fsqz,1(t) = −fsqz,2(t)). Second, each squeezing force falls on the line connecting the two contact locations. Third, the total weight of the object is equal to the sum of the two reaction force vectors:

fr,1(t) + fr,2(t) = fw,1(t) + fw,2(t).
The first and second observations provide the directions of fsqz,*(t): u*(t) is the unit vector along p1(t)p2(t) or p2(t)p1(t) (see also l(t) in Figure 10.6). The magnitude fsqz,*(t) is determined as follows. The sum of the magnitudes of the reaction forces projected along the l(t) direction,

frsqz(t) = |fr,1(t) · ul(t)| + |fr,2(t) · ul(t)|,

includes not only the two squeezing forces, but also the weight. Thus, fsqz(t) can be calculated by subtracting the effect of the weight along l(t) from frsqz(t):

fsqz(t) = frsqz(t) − fwsqz(t), (10.12)

where fwsqz(t) can be derived based on the third observation such that

fwsqz(t) = |(fr,1(t) + fr,2(t)) · ul(t)|. (10.13)

Then, the squeezing force at each contact point can be derived based on the first observation:

fsqz,1(t) = fsqz,2(t) = 0.5 fsqz(t). (10.14)
FIGURE 10.7 Example snapshots of visuo-haptic augmentation. The reaction force (dark gray arrow), weight (gray arrow), and haptic device force (light gray arrow) are depicted. Examples with increased stiffness (virtual forces oppose squeezing) and decreased stiffness (virtual forces assist squeezing) are shown on the left and right, respectively.
The steps for the estimation of the displacement x*(t) in (10.11) are as follows. Let the distance between the two initial contact points on the non-deformed surface (pc,1(t) and pc,2(t) in Figure 10.6) be d0. It is constant over time due to the no-slip assumption. Assuming homogeneity, x1(t) is equal to x2(t), and the displacements can be derived by

x1(t) = x2(t) = 0.5 (d0 − d(t)), (10.15)

where d(t) = ||p1(t) − p2(t)||. All the unknown variables are now estimated, and the final virtual force can be calculated using (10.11).
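Under the three steady-state observations above, the per-frame estimation can be sketched as follows. The handling of the weight term is simplified, and the function name and sign conventions are our own assumptions rather than the authors' implementation.

```python
import numpy as np

def squeeze_state(p1, p2, f_r1, f_r2, d0):
    """Estimate the per-contact squeezing force magnitude and displacement
    while an object is held between two tool tips (sketch).

    p1, p2     : tool-tip positions of the two contacts
    f_r1, f_r2 : measured reaction force vectors at the two contacts
    d0         : initial distance between the undeformed contact points
    """
    u_l = (p2 - p1) / np.linalg.norm(p2 - p1)   # line between the contacts
    # Projections of the reactions on l(t) contain squeezing and weight.
    f_rsqz = abs(np.dot(f_r1, u_l)) + abs(np.dot(f_r2, u_l))
    # Weight along l(t): the total weight equals f_r1 + f_r2 (steady state).
    f_wsqz = abs(np.dot(f_r1 + f_r2, u_l))
    f_sqz = f_rsqz - f_wsqz                     # total squeezing force
    f_sqz_each = 0.5 * f_sqz                    # equal magnitude per contact
    d = np.linalg.norm(p1 - p2)                 # current contact distance
    x = 0.5 * (d0 - d)                          # per-contact displacement
    return u_l, f_sqz_each, x
```

When the two reactions are symmetric about the line between the contacts, the weight term along l(t) cancels and the estimate reduces to the projected reaction magnitude at each tip.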
In Jeon and Harders (2012), we also evaluated the system performance through simulations and a psychophysical experiment. Overall, the evaluation indicated that our system can provide physically and perceptually sound stiffness augmentation. In addition, the system has been further integrated with a visual AR framework (Harders et al. 2009). To our knowledge, this is among the first systems that can augment both visual and haptic sensations. We used the visual system to display information related to the haptic augmentation, such as the force vectors involved in the algorithm. Figure 10.7 shows example snapshots.
[Figure 10.8: Forces during two-contact palpation of a silicone tissue mock-up containing a virtual tumor. At each contact, the force fH,* delivered to the user combines the real response fR,* of the mock-up and the virtual tumor force fT,*.]
with the consideration of the mutual effects between the contacts. The final combined forces fH,*(t) enable a user to feel augmented sensations of the stiffer inclusion, given as

fH,*(t) = fR,*(t) + fT,*(t). (10.16)

Here, estimating and simulating fT,*(t) is the key to creating a sound illusion. The hardware setup we used is the same as the one shown in Figure 10.4.
A two-step, measurement-based approach is taken to model the dynamic behavior of the inclusion. First, a contact dynamics model representing the pure response of the inclusion is identified using data captured while palpating a physical mock-up. Then, another dynamics model is constructed to capture the movement characteristics of the inclusion in response to external forces. Both models are then used in rendering to determine fT,*(t) in real time. The procedures are detailed in the following paragraphs.
The first preprocessing step identifies the overall contact force resulting purely from the inclusion (inclusion-only case) as a function of the distance between the inclusion and the contact point. Our approach is to extract the difference between the responses of a sample with a stiffer inclusion (inclusion-embedded) and a sample without it (no-inclusion). To this end, we first identify the Hunt–Crossley model using the no-inclusion sample, following the same identification procedure described in Section 10.4.2. This model is denoted by f = H_NT(x, ẋ). Then, we obtain data from the inclusion-embedded sample by manually poking along a line from pTs to pT0 (see Figure 10.9 for the involved quantities). This time, we also record the position changes of pT using a position tracking system (TrackIR; NaturalPoint, Inc.). This gives us the state vector (x_TE, ẋ_TE, f_TE, pT, pH) when palpating the tumor-embedded model.

As depicted in Figure 10.8, the force f_TE(t) can be decomposed into fR(t) and fT(t). Since f = H_NT(x, ẋ) represents the magnitude of fR(t), the magnitude of fT(t) can be obtained by passing all data pairs (x_TE, ẋ_TE) to H_NT(x, ẋ) and computing the differences:

fT(t) = f_TE(t) − H_NT(x_TE(t), ẋ_TE(t)). (10.17)
FIGURE 10.9 Variables for inclusion model identification. (Reprinted from Jeon, S. and Harders, M., IEEE Trans. Haptics, 99, 1, 2014. With permission.)
fT(t) can be expressed as a function of the distance between the inclusion and the tool tip. Let the distance from pH(t) to pT(t) be lHT(t), and the initial distance from pTs to pT0 be l0. Then, the difference l(t) = l0 − lHT(t) becomes the relative displacement toward the inclusion. By using the data triples (l, l̇, fT), a new response model with respect to l(t) can be derived, denoted by H_T(l, l̇). This represents the inclusion-only force response at the single contact point pTs, poking in the direction of pT.

In the second step, the inclusion movement in response to external forces is characterized. The nonlinear changes of d(t) with respect to an external force fT(t) are approximated, again using the Hunt–Crossley model. After determining d(t) using a position tracker and fT(t) using our rendering algorithm described in the next subsection, vector triples (d, ḋ, fT) are employed to identify three Hunt–Crossley models for the three Cartesian directions, denoted by G_x(d_x, ḋ_x), G_y(d_y, ḋ_y), and G_z(d_z, ḋ_z).
fT,*(t) = fT,*(t) (pH,*(t) − pT(t)) / |pH,*(t) − pT(t)|, (10.18)

where fT,*(t) on the right-hand side denotes the force magnitude. Equation 10.18 indicates that the unknown values, fT,*(t) and pT(t), should be approximated during rendering.
fT,*(t) is derived based on H_T. To this end, we first scale the current indentation distance to match those during the recording:

l*(t) = (l0,* − lHT,*(t)) (l0 / l0,*). (10.19)
FIGURE 10.10 Variables for inclusion augmentation rendering. (Reprinted from Jeon, S. and Harders, M., IEEE Trans. Haptics, 99, 1, 2014. With permission.)
di(t) = [ Σ*=1..n fT,*,i(t) / (k + b ḋi(t)) ]^(1/m), i = x, y, z, (10.20)

where n is the number of contact points and m is the exponential parameter in the Hunt–Crossley model. Finally, fT,*(t) is determined using (10.18) and directly sent to the haptic AR interface.
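A minimal sketch of the per-contact virtual force computation follows, under a quasistatic simplification: the identified inclusion model H_T is treated as a function of the scaled indentation only, ignoring its velocity term, and the names and array shapes are our own assumptions rather than the authors' implementation.

```python
import numpy as np

def render_inclusion_forces(p_H, p_T, l0, l0_star, H_T):
    """Virtual inclusion force at each contact (sketch of Eqs. 10.18-10.19).

    p_H     : (n, 3) array of tool-tip positions
    p_T     : current inclusion position (from the movement model)
    l0      : tool-inclusion distance recorded during identification
    l0_star : (n,) initial tool-inclusion distances of the current contacts
    H_T     : identified inclusion-only response model, here a callable
              mapping a scaled indentation to a force magnitude
    """
    forces = []
    for p, l0s in zip(p_H, l0_star):
        l_HT = np.linalg.norm(p - p_T)
        l_star = (l0s - l_HT) * (l0 / l0s)   # Eq. (10.19): rescale indentation
        mag = max(H_T(l_star), 0.0)          # no attractive inclusion force
        u = (p - p_T) / l_HT                 # Eq. (10.18): from tumor to tip
        forces.append(mag * u)
    return np.array(forces)
```

In a full system, p_T would itself be updated each frame by inverting the per-axis movement models, as in Eq. (10.20).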
In Jeon and Harders (2014), we compared the simulation results of our algorithm with actual measurement data recorded from eight different real mock-ups via various interaction methods. Overall, inclusion movements and the mutual effects between contacts were captured and simulated with reasonable accuracy; the force simulation errors were less than the force perception thresholds in most cases.
FIGURE 10.11 Variables for friction modulation. (Reprinted from Jeon, S. et al., Extensions to haptic augmented reality: Modulating friction and weight, in Proceedings of the IEEE World Haptics Conference (WHC), 2011, pp. 227–232. With permission.)
with a tool. As illustrated in Figure 10.11, this is done by adding a modulation friction force fmod(t) to the real friction force:

fmod(t) = ftarg(t) − freal(t). (10.21)

Thus, the task reduces to (1) simulation of the desired friction response ftarg(t) and (2) measurement of the real friction force freal(t).

For the simulation of the desired friction force ftarg(t) during rendering, we identify the modified Dahl model describing the friction of a target surface. For the Dahl model parameter identification, a user repeatedly strokes the target surface with the probe tip attached to the PHANToM. The identification procedure is the same as that given in Section 10.4.2. The model is then used to calculate ftarg(t) from the tool tip position and velocity and the normal contact force during augmented rendering.

freal(t) can be easily derived from force sensor readings after a noise reduction process. Given the real friction and the target friction, the appropriate modulation force that needs to be rendered by the device is finally computed using (10.21). The modulation force is sent to the haptic interface for force control.
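The friction modulation loop can be sketched with a basic discrete-time Dahl model. The chapter's system uses a modified Dahl model with a velocity-dependent term identified from user strokes; the minimal variant below, with hypothetical class and parameter names, only illustrates the structure: simulate the target friction, measure the real friction, and command their difference.

```python
import numpy as np

class DahlFriction:
    """Basic discrete-time Dahl friction model with a viscous term (sketch)."""
    def __init__(self, sigma, f_c, b_v):
        self.sigma = sigma    # rest stiffness of the friction state
        self.f_c = f_c        # Coulomb (saturation) friction level
        self.b_v = b_v        # viscous friction coefficient
        self.f = 0.0          # internal friction state

    def step(self, v, dt):
        # dF/dt = sigma * v * (1 - F/f_c * sign(v)); Euler integration
        self.f += self.sigma * v * (1.0 - self.f / self.f_c * np.sign(v)) * dt
        return self.f + self.b_v * v

def modulation_force(f_targ, f_real):
    """Virtual friction force the device must add: f_mod = f_targ - f_real."""
    return f_targ - f_real
```

Stroking at constant velocity, the state converges to the Coulomb level f_c, which is the saturation behavior the identification step fits to recorded strokes.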
We tested the accuracy of our friction identification and modulation algorithms with four distinctive surfaces (Jeon et al. 2011). The results showed that, regardless of the base surface, the friction was modulated to that of a target surface without perceptually significant errors.
behaviors, and approaches based on more in-depth contact mechanics are necessary for appropriate modeling and augmentation. This has been one direction of our research, with an initial result that allows the user to model the shape of a soft object using a haptic interface without the need for other devices (Yim and Choi 2012).

Our work has used a handheld tool for the exploration of real objects. This must be extended to setups that allow the use of bare hands, or at least very similar cases such as thin thimbles enclosing the fingertips. Such an extension will enlarge the application area of haptic AR to a great extent, for example, to palpation training on a real phantom that includes virtual organs and lumps. To this end, we have begun to examine the feasibility of sensing not only contact force but also contact pressure in a compact device, and its utility for haptic AR (Kim et al. 2014).
Another important topic is multi-finger interaction. This functionality requires very complicated haptic interfaces that provide multiple, independent forces with a large number of degrees of freedom (see Barbagli et al. 2005), as well as very sophisticated deformable-body rendering algorithms that take into account the interplay between multiple contacts. Research on this topic is still ongoing even for haptic VR.
Regarding material properties, we need methods to augment friction, texture, and temperature. Friction is expected to be relatively easy in both modeling and rendering for haptic AR, as long as deformation is properly handled. Temperature modulation is likely to be more challenging, especially due to the difficulty of integrating a temperature display into the fingertip that touches real objects. This functionality could greatly improve the realism of AR applications.
The last critical topic we wish to mention is texture. Texture is one of the most salient material properties and determines the identifying tactual characteristics of an object (Katz 1925). As such, a great amount of research has been devoted to the haptic perception and rendering of surface texture. Texture is also one of the most complex issues because of the multiple perceptual dimensions involved in texture perception; surface microgeometry as well as the material's elasticity, viscosity, and friction all play an important role (Hollins et al. 1993, 2000). See Choi and Tan (2004a,b, 2005, 2007) for a review of texture perception relevant to haptic rendering, Campion and Hayward (2007) for passive rendering of virtual textures, and Fritz and Barner (1996), Guruswamy et al. (2011), Lang and Andrews (2011), and Romano and Kuchenbecker (2011) for various models. All of these studies pertain to haptic VR rendering. Among them, the work of Kuchenbecker and her colleagues appears the most feasible for application to haptic AR; they have developed a high-quality texture rendering system that overlays artificial vibrations on a touchscreen to deliver the textures of real samples (Romano and Kuchenbecker 2011), as well as an open database of textures (Culbertson et al. 2014). This research can be a cornerstone for the modeling and augmentation of real textures.
10.8 CONCLUSIONS
This chapter overviewed the emerging AR paradigm for the sense of touch. We first
outlined the conceptual, functional, and technical aspects of this new paradigm with
three taxonomies and a thorough review of the existing literature. Then, we moved to
recent attempts at realizing haptic AR, detailing hardware and algorithms for augmenting the stiffness and friction of a real object. These frameworks
were applied to medical palpation training, where stiffer virtual inclusions are rendered in a real tissue mock-up. Lastly, we elucidated several challenges and future
research topics in this area. We hope that the work introduced in this
chapter will pave the way to more diverse and mature research in the exciting field
of haptic AR.
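The core feedforward idea behind the stiffness-modulation work summarized above can be sketched in a few lines. The sketch below is our own simplification of the general approach (a real object already provides part of the contact force, so the device renders only the difference); the linear contact model, symbols, and function name are illustrative assumptions, not the authors' published algorithm.

```python
def stiffness_augmentation_force(k_real, k_desired, penetration):
    """Feedforward stiffness modulation for haptic AR (illustrative sketch).

    A real object already pushes back with roughly F_real = k_real * x at
    penetration depth x (a linear approximation). To make the object feel
    like stiffness k_desired, the haptic device adds only the difference,
    so the user feels F_real + F_device = k_desired * x.
    """
    if penetration <= 0.0:  # no contact, no augmentation
        return 0.0
    return (k_desired - k_real) * penetration
```

In practice the real stiffness `k_real` must itself be estimated online (e.g., from measured force and position during contact), which is one reason the hardware in such systems carries a force sensor at the tool tip.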
REFERENCES
Abbott, J., P. Marayong, and A. Okamura. 2007. Haptic virtual fixtures for robot-assisted manipulation. In Robotics Research, eds. S. Thrun, R. Brooks, and H. Durrant-Whyte, pp. 49–64. Springer-Verlag: Berlin, Germany.
Abbott, J. and A. Okamura. 2003. Virtual fixture architectures for telemanipulation. Proceedings of the IEEE International Conference on Robotics and Automation, pp. 2798–2805. Taipei, Taiwan.
Adcock, M., M. Hutchins, and C. Gunn. 2003. Augmented reality haptics: Using ARToolKit for display of haptic applications. Proceedings of the Augmented Reality Toolkit Workshop, pp. 1–2. Tokyo, Japan.
Azuma, R., Y. Baillot, R. Behringer, S. Feiner, S. Julier, and B. MacIntyre. 2001. Recent advances in augmented reality. IEEE Computer Graphics & Applications 21 (6):34–47.
Barbagli, F., D. Prattichizzo, and K. Salisbury. 2005. A multirate approach to haptic interaction with deformable objects: Single and multipoint contacts. International Journal of Robotics Research 24 (9):703–716.
Bayart, B., J. Y. Didier, and A. Kheddar. 2008. Force feedback virtual painting on real objects: A paradigm of augmented reality haptics. Lecture Notes in Computer Science (EuroHaptics 2008) 5024:776–785.
Bayart, B., A. Drif, A. Kheddar, and J.-Y. Didier. 2007. Visuo-haptic blending applied to a tele-touch-diagnosis application. Lecture Notes in Computer Science (Virtual Reality, HCII 2007) 4563:617–626.
Bayart, B. and A. Kheddar. 2006. Haptic augmented reality taxonomy: Haptic enhancing and enhanced haptics. Proceedings of EuroHaptics, pp. 641–644. Paris, France.
Bennett, E. and B. Stevens. 2006. The effect that the visual and haptic problems associated with touching a projection augmented model have on object-presence. Presence: Teleoperators and Virtual Environments 15 (4):419–437.
Bianchi, G., C. Jung, B. Knoerlein, G. Szekely, and M. Harders. 2006a. High-fidelity visuo-haptic interaction with virtual objects in multi-modal AR systems. Proceedings of the IEEE and ACM International Symposium on Mixed and Augmented Reality, pp. 187–196. Santa Barbara, CA.
Bianchi, G., B. Knoerlein, G. Szekely, and M. Harders. 2006b. High precision augmented reality haptics. Proceedings of EuroHaptics, pp. 169–168. Paris, France.
Billinghurst, M., H. Kato, and I. Poupyrev. 2001. The MagicBook: Moving seamlessly between reality and virtuality. IEEE Computer Graphics & Applications 21 (3):6–8.
Borst, C. W. and R. A. Volz. 2005. Evaluation of a haptic mixed reality system for interactions with a virtual control panel. Presence: Teleoperators and Virtual Environments 14 (6):677–696.
Bose, B., A. K. Kalra, S. Thukral, A. Sood, S. K. Guha, and S. Anand. 1992. Tremor compensation for robotics assisted microsurgery. Proceedings of the 14th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, October 29–November 1, 1992, pp. 1067–1068. Paris, France.
Brewster, S. and L. M. Brown. 2004. Tactons: Structured tactile messages for non-visual information display. Proceedings of the Australasian User Interface Conference, pp. 15–23. Dunedin, New Zealand.
Brown, L. M. and T. Kaaresoja. 2006. Feel who's talking: Using tactons for mobile phone alerts. Proceedings of the Annual SIGCHI Conference on Human Factors in Computing Systems, pp. 604–609. Montréal, Canada.
Campion, G. and V. Hayward. 2007. On the synthesis of haptic textures. IEEE Transactions on Robotics 24 (3):527–536.
Choi, S. and H. Z. Tan. 2004a. Perceived instability of virtual haptic texture. I. Experimental studies. Presence: Teleoperators and Virtual Environments 13 (4):395–415.
Choi, S. and H. Z. Tan. 2004b. Toward realistic haptic rendering of surface textures. IEEE Computer Graphics & Applications (Special Issue on Haptic Rendering: Beyond Visual Computing) 24 (2):40–47.
Choi, S. and H. Z. Tan. 2005. Perceived instability of virtual haptic texture. II. Effects of collision detection algorithm. Presence: Teleoperators and Virtual Environments 14 (4):463–481.
Choi, S. and H. Z. Tan. 2007. Perceived instability of virtual haptic texture. III. Effect of update rate. Presence: Teleoperators and Virtual Environments 16 (3):263–278.
Chun, J., I. Lee, G. Park, J. Seo, S. Choi, and S. H. Han. 2013. Efficacy of haptic blind spot warnings applied through a steering wheel or a seatbelt. Transportation Research Part F: Traffic Psychology and Behaviour 21:231–241.
Culbertson, H., J. J. L. Delgado, and K. J. Kuchenbecker. 2014. One hundred data-driven haptic texture models and open-source methods for rendering on 3D objects. Proceedings of the IEEE Haptics Symposium, pp. 319–325. Houston, TX.
Feng, Z., H. B. L. Duh, and M. Billinghurst. 2008. Trends in augmented reality tracking, interaction and display: A review of ten years of ISMAR. Proceedings of the IEEE/ACM International Symposium on Mixed and Augmented Reality, pp. 193–202. Cambridge, UK.
Frey, M., J. Hoogen, R. Burgkart, and R. Riener. 2006. Physical interaction with a virtual knee joint: The 9 DOF haptic display of the Munich knee joint simulator. Presence: Teleoperators and Virtual Environments 15 (5):570–587.
Fritz, J. P. and K. E. Barner. 1996. Stochastic models for haptic texture. Proceedings of SPIE's International Symposium on Intelligent Systems and Advanced Manufacturing: Telemanipulator and Telepresence Technologies III, pp. 34–44. Boston, MA.
Fukumoto, M. and T. Sugimura. 2001. Active click: Tactile feedback for touch panels. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 121–122. Seattle, WA.
Gerling, G. J. and G. W. Thomas. 2005. Augmented, pulsating tactile feedback facilitates simulator training of clinical breast examinations. Human Factors 47 (3):670–681.
Gopal, P., S. Kumar, S. Bachhal, and A. Kumar. 2013. Tremor acquisition and reduction for robotic surgical applications. Proceedings of the International Conference on Advanced Electronic Systems, pp. 310–312. Pilani, India.
Grosshauser, T. and T. Hermann. 2009. Augmented haptics: An interactive feedback system for musicians. Lecture Notes in Computer Science (HAID 2009) 5763:100–108.
Guruswamy, V. L., J. Lang, and W.-S. Lee. 2011. IIR filter models of haptic vibration textures. IEEE Transactions on Instrumentation and Measurement 60 (1):93–103.
Ha, T., Y. Chang, and W. Woo. 2007. Usability test of immersion for augmented reality based product design. Lecture Notes in Computer Science (Edutainment 2007) 4469:152–161.
Ha, T., Y. Kim, J. Ryu, and W. Woo. 2006. Enhancing immersiveness in AR-based product design. Lecture Notes in Computer Science (ICAT 2006) 4282:207–216.
Hachisu, T., M. Sato, S. Fukushima, and H. Kajimoto. 2012. Augmentation of material property by modulating vibration resulting from tapping. Lecture Notes in Computer Science (EuroHaptics 2012) 7282:173–180.
Haddadi, A. and K. Hashtrudi-Zaad. 2008. A new method for online parameter estimation of Hunt–Crossley environment dynamic models. Proceedings of the IEEE International Conference on Intelligent Robots and Systems, pp. 981–986. Nice, France.
Harders, M., G. Bianchi, B. Knoerlein, and G. Szekely. 2009. Calibration, registration, and synchronization for high precision augmented reality haptics. IEEE Transactions on Visualization and Computer Graphics 15 (1):138–149.
Hoever, R., G. Kosa, G. Szekely, and M. Harders. 2009. Data-driven haptic rendering: From viscous fluids to visco-elastic solids. IEEE Transactions on Haptics 2:15–27.
Hollins, M., S. J. Bensmaïa, K. Karlof, and F. Young. 2000. Individual differences in perceptual space for tactile textures: Evidence from multidimensional scaling. Perception & Psychophysics 62 (8):1534–1544.
Hollins, M., R. Faldowski, R. Rao, and F. Young. 1993. Perceptual dimensions of tactile surface texture: A multidimensional scaling analysis. Perception & Psychophysics 54:697–705.
Hugues, O., P. Fuchs, and O. Nannipieri. 2011. New augmented reality taxonomy: Technologies and features of augmented environment. In Handbook of Augmented Reality, ed. B. Furht, pp. 47–63. Springer-Verlag: Berlin, Germany.
Hunt, K. and F. Crossley. 1975. Coefficient of restitution interpreted as damping in vibroimpact. ASME Journal of Applied Mechanics 42:440–445.
Iwata, H., H. Yano, F. Nakaizumi, and R. Kawamura. 2001. Project FEELEX: Adding haptic surface to graphics. Proceedings of ACM SIGGRAPH, pp. 469–476. Los Angeles, CA.
Jeon, S. and S. Choi. 2008. Modulating real object stiffness for haptic augmented reality. Lecture Notes in Computer Science (EuroHaptics 2008) 5024:609–618.
Jeon, S. and S. Choi. 2009. Haptic augmented reality: Taxonomy and an example of stiffness modulation. Presence: Teleoperators and Virtual Environments 18 (5):387–408.
Jeon, S. and S. Choi. 2010. Stiffness modulation for haptic augmented reality: Extension to 3D interaction. Proceedings of the Haptics Symposium, pp. 273–280. Waltham, MA.
Jeon, S. and S. Choi. 2011. Real stiffness augmentation for haptic augmented reality. Presence: Teleoperators and Virtual Environments 20 (4):337–370.
Jeon, S. and M. Harders. 2012. Extending haptic augmented reality: Modulating stiffness during two-point squeezing. Proceedings of the Haptics Symposium, pp. 141–146. Vancouver, Canada.
Jeon, S. and M. Harders. 2014. Haptic tumor augmentation: Exploring multi-point interaction. IEEE Transactions on Haptics 99 (Preprints):1–1.
Jeon, S., M. Harders, and S. Choi. 2012. Rendering virtual tumors in real tissue mock-ups using haptic augmented reality. IEEE Transactions on Haptics 5 (1):77–84.
Jeon, S., J.-C. Metzger, S. Choi, and M. Harders. 2011. Extensions to haptic augmented reality: Modulating friction and weight. Proceedings of the IEEE World Haptics Conference (WHC), pp. 227–232. Istanbul, Turkey.
Johnson, A., D. Sandin, G. Dawe, T. DeFanti, D. Pape, Z. Qiu, and D. P. S. Thongrong. 2000. Developing the PARIS: Using the CAVE to prototype a new VR display. Proceedings of the ACM Symposium on Immersive Projection Technology.
Kajimoto, H., N. Kawakami, S. Tachi, and M. Inami. 2004. SmartTouch: Electric skin to touch the untouchable. IEEE Computer Graphics & Applications 24 (1):36–43.
Katz, D. 1925. The World of Touch. Hillsdale, NJ: Lawrence Erlbaum Associates.
Kim, H., S. Choi, and W. K. Chung. 2014. Contact force decomposition using tactile information for haptic augmented reality. Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 1242–1247. Chicago, IL.
Kim, S., J. Cha, J. Kim, J. Ryu, S. Eom, N. P. Mahalik, and B. Ahn. 2006. A novel test-bed for immersive and interactive broadcasting production using augmented reality and haptics. IEICE Transactions on Information and Systems E89-D (1):106–110.
Kim, S.-Y. and J. C. Kim. 2012. Vibrotactile rendering for a traveling vibrotactile wave based on a haptic processor. IEEE Transactions on Haptics 5 (1):14–20.
Kurita, Y., A. Ikeda, T. Tamaki, T. Ogasawara, and K. Nagata. 2009. Haptic augmented reality interface using the real force response of an object. Proceedings of the ACM Symposium on Virtual Reality Software and Technology, pp. 83–86. Kyoto, Japan.
Kyung, K.-U. and J.-Y. Lee. 2009. Ubi-Pen: A haptic interface with texture and vibrotactile display. IEEE Computer Graphics and Applications 29 (1):24–32.
Lang, J. and S. Andrews. 2011. Measurement-based modeling of contact forces and textures for haptic rendering. IEEE Transactions on Visualization and Computer Graphics 17 (3):380–391.
Lee, H., W. Kim, J. Han, and C. Han. 2012a. The technical trend of the exoskeleton robot system for human power assistance. International Journal of Precision Engineering and Manufacturing 13 (8):1491–1497.
Lee, I. and S. Choi. 2014. Vibrotactile guidance for drumming learning: Method and perceptual assessment. Proceedings of the IEEE Haptics Symposium, pp. 147–152. Houston, TX.
Lee, I., K. Hong, and S. Choi. 2012. Guidance methods for bimanual timing tasks. Proceedings of the IEEE Haptics Symposium, pp. 297–300. Vancouver, Canada.
Li, M., M. Ishii, and R. H. Taylor. 2007. Spatial motion constraints using virtual fixtures generated by anatomy. IEEE Transactions on Robotics 23 (1):4–19.
Luciano, C., P. Banerjee, L. Florea, and G. Dawe. 2005. Design of the ImmersiveTouch: A high-performance haptic augmented virtual reality system. Proceedings of the International Conference on Human-Computer Interaction. Las Vegas, NV.
Mahvash, M. and A. M. Okamura. 2006. Friction compensation for a force-feedback telerobotic system. Proceedings of the IEEE International Conference on Robotics and Automation, pp. 3268–3273. Orlando, FL.
Milgram, P. and H. Colquhoun, Jr. 1999. A taxonomy of real and virtual world display integration. In Mixed Reality: Merging Real and Virtual Worlds, eds. Y. Ohta and H. Tamura, pp. 1–16. Springer-Verlag: Berlin, Germany.
Minamizawa, K., H. Kajimoto, N. Kawakami, and S. Tachi. 2007. Wearable haptic display to present gravity sensation. Proceedings of the World Haptics Conference, pp. 133–138. Tsukuba, Japan.
Mitchell, B., J. Koo, M. Iordachita, P. Kazanzides, A. Kapoor, J. Handa, G. Hager, and R. Taylor. 2007. Development and application of a new steady-hand manipulator for retinal surgery. Proceedings of the IEEE International Conference on Robotics and Automation, pp. 623–629. Rome, Italy.
Nojima, T., D. Sekiguchi, M. Inami, and S. Tachi. 2002. The SmartTool: A system for augmented reality of haptics. Proceedings of the IEEE Virtual Reality Conference, pp. 67–72. Orlando, FL.
Ochiai, Y., T. Hoshi, J. Rekimoto, and M. Takasaki. 2014. Diminished haptics: Towards digital transformation of real world textures. Lecture Notes in Computer Science (EuroHaptics 2014, Part I) 8618:409–417.
Okamura, A. M., M. R. Cutkosky, and J. T. Dennerlein. 2001. Reality-based models for vibration feedback in virtual environments. IEEE/ASME Transactions on Mechatronics 6 (3):245–252.
Ott, R., D. Thalmann, and F. Vexo. 2007. Haptic feedback in mixed-reality environment. The Visual Computer: International Journal of Computer Graphics 23 (9):843–849.
Pai, D. K., K. van den Doel, D. L. James, J. Lang, J. E. Lloyd, J. L. Richmond, and S. H. Yau. 2001. Scanning physical interaction behavior of 3D objects. Proceedings of the ACM Annual Conference on Computer Graphics and Interactive Techniques, pp. 87–96. Los Angeles, CA.
Park, G., S. Choi, K. Hwang, S. Kim, J. Sa, and M. Joung. 2011. Tactile effect design and evaluation for virtual buttons on a mobile device touchscreen. Proceedings of the International Conference on Human-Computer Interaction with Mobile Devices and Services (MobileHCI), pp. 11–20. Stockholm, Sweden.
Parkes, R., N. N. Forrest, and S. Baillie. 2009. A mixed reality simulator for feline abdominal palpation training in veterinary medicine. Studies in Health Technology and Informatics 142:244–246.
Powell, D. and M. K. O'Malley. 2011. Efficacy of shared-control guidance paradigms for robot-mediated training. Proceedings of the IEEE World Haptics Conference, pp. 427–432. Istanbul, Turkey.
Reachin Technology. Reachin Display. http://www.reachin.se/. Accessed March 4, 2015.
Romano, J. M. and K. J. Kuchenbecker. 2011. Creating realistic virtual textures from contact acceleration data. IEEE Transactions on Haptics 5 (2):109–119.
Rosenberg, L. B. 1993. Virtual fixtures: Perceptual tools for telerobotic manipulation. Proceedings of the IEEE Virtual Reality Annual International Symposium, pp. 76–82.
Rovers, L. and H. van Essen. 2004. Design and evaluation of hapticons for enriched instant messaging. Proceedings of EuroHaptics, pp. 498–503. Munich, Germany.
Sandor, C., S. Uchiyama, and H. Yamamoto. 2007. Visuo-haptic systems: Half-mirrors considered harmful. Proceedings of the World Haptics Conference, pp. 292–297. Tsukuba, Japan.
Scharver, C., R. Evenhouse, A. Johnson, and J. Leigh. 2004. Designing cranial implants in a haptic augmented reality environment. Communications of the ACM 47 (8):32–38.
SenseGraphics. 3D-IW. http://www.sensegraphics.se/. Accessed March 4, 2015.
Solanki, M. and V. Raja. 2010. Haptic based augmented reality simulator for training clinical breast examination. Proceedings of the IEEE Conference on Biomedical Engineering and Sciences, pp. 265–269. Kuala Lumpur, Malaysia.
Spence, C. and C. Ho. 2008. Tactile and multisensory spatial warning signals for drivers. IEEE Transactions on Haptics 1 (2):121–129.
Sreng, J., A. Lecuyer, and C. Andriot. 2008. Using vibration patterns to provide impact position information in haptic manipulation of virtual objects. Lecture Notes in Computer Science (EuroHaptics 2008) 5024:589–598.
Ternes, D. and K. E. MacLean. 2008. Designing large sets of haptic icons with rhythm. Lecture Notes in Computer Science (EuroHaptics 2008) 5024:199–208.
Vallino, J. R. and C. M. Brown. 1999. Haptics in augmented reality. Proceedings of the IEEE International Conference on Multimedia Computing and Systems, pp. 195–200. Florence, Italy.
Yang, C., J. Zhang, I. Chen, Y. Dong, and Y. Zhang. 2008. A review of exoskeleton-type systems and their key technologies. Proceedings of the Institution of Mechanical Engineers, Part C: Journal of Mechanical Engineering Science 222 (8):1599–1612.
Yao, H.-Y., V. Hayward, and R. E. Ellis. 2004. A tactile magnification instrument for minimally invasive surgery. Lecture Notes in Computer Science (MICCAI 2004) 3217:89–96.
Ye, G., J. Corso, G. Hager, and A. Okamura. 2003. VisHap: Augmented reality combining haptics and vision. Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, pp. 3425–3431. Washington, DC.
Yim, S. and S. Choi. 2012. Shape modeling of soft real objects using force-feedback haptic interface. Proceedings of the IEEE Haptics Symposium, pp. 479–484. Vancouver, Canada.
Yokokohji, Y., R. L. Hollis, and T. Kanade. 1999. WYSIWYF display: A visual/haptic interface to virtual environment. Presence: Teleoperators and Virtual Environments 8 (4):412–434.
Zoran, A. and J. A. Paradiso. 2012. The FreeD: A handheld digital milling device for craft and fabrication. Proceedings of the ACM Symposium on User Interface Software and Technology, pp. 3–4. Toronto, Canada.
Section III
Augmented Reality
11
Location-Based Mixed
and Augmented
Reality Storytelling
Ronald Azuma
CONTENTS
11.1 Motivation...................................................................................................... 259
11.2 Reinforcing.................................................................................................... 261
11.3 Reskinning..................................................................................................... 265
11.4 Remembering................................................................................................. 269
11.5 Conclusion..................................................................................................... 272
References............................................................................................................... 274
11.1 MOTIVATION
One of the ultimate uses of mixed reality (MR) and augmented reality (AR) will be
to enable new forms of storytelling in which virtual content is connected in
meaningful ways to particular locations, whether those are places, people, or objects.
By AR, I refer to experiences that superimpose or composite virtual content in
3D space directly over the real world, in real time (Azuma, 1997). However, this
chapter also includes a broader range of MR experiences that blend real and virtual
in some manner but may not require precise alignment between the two (Milgram
and Kishino, 1994).
Initially, AR applications focused on professional uses that aided perception of
a task that needed to be done in a complex 3D environment, such as medical surgery or the maintenance and assembly of equipment. This focus was logical, because
in the early days of AR, the equipment required was so specialized and expensive
that only professional applications seemed economically viable. Later, access to MR
and AR technologies became democratized through marker- and image-based tracking via cameras attached to desktop and laptop computers, smartphones, and tablets.
This enabled almost everyone to run certain forms of MR and AR on devices that
they had already bought for other purposes. Today, we see a variety of MR
and AR applications that target the mass market for advertising, entertainment, and
educational purposes. In the future, these experiences will advance to the point of
establishing new forms of media that rely upon the combination of real and virtual
to tell stories in new and compelling ways. In traditional media, such as books, radio,
movies, TV, and video games, the content is entirely virtual and disconnected from
and virtual components. If the experience is based on reality by itself, with little contributed by the augmentations, then there is no point in using AR. Conversely, if the
core of the experience comes solely from virtual content, then the augmentation part
is only a novelty and it will not be a viable new form of media. Many AR experiences
fall into the latter case. In the case of books, DVD cases, and movie posters, what is
compelling about reality is not the book, DVD, or poster. It is the virtual content represented or embodied by those objects. The compelling content resides in the ideas in
the books and the movies themselves, not the physical objects. Therefore, an experience that augments a book or movie poster with virtual augmentations derives all its
power from purely virtual content. Reality then becomes a backdrop that forms the
context of the experience, and perhaps part of the user interface, but reality is not a
core part of the content.
I hypothesize that there are at least three approaches for AR storytelling where
both real and virtual form critical parts of the experience:
Reinforcing
Reskinning
Remembering
In the next three sections, I discuss these three in more detail and describe examples
and concepts of each approach.
11.2 REINFORCING
In reinforcing, the AR storytelling strategy is to select a real environment, whether
that is an object, person, or location, that is inherently compelling by itself, without
augmentation. Then, the AR augmentations attempt to complement the power that
is inherent in reality itself to form a new type of experience that is more compelling
than either the virtual content or reality by themselves.
Let me provide a conceptual example. Let's assume the goal of an experience is
to educate a participant about the Battle of Gettysburg. A student wishing to learn
about that battle could watch a 1993 movie, called Gettysburg, which had star actors,
superb cinematography, a compelling soundtrack, thousands of Civil War reenactors, and parts of it were even filmed on the site of the battle. However, Gettysburg is
also a real location. If you are so inclined, you can travel there and see the battlefield
yourself, in person. And if you do this, you will see large grassy fields, stone fences,
and many monuments. You will not see any reenactments of the battle or other virtual recreations that take you back to that fateful time in 1863. Yet, if you know why
that spot is important in American history, then simply being there, on the actual
spot where the event happened, is a powerful experience. I remember standing at
the spot of Pickett's Charge, on the Union side, and feeling overcome by emotion.
(For the reader unfamiliar with the American Civil War, Pickett's Charge was the
culmination of the Battle of Gettysburg. The Confederates lost both Gettysburg and
Vicksburg, on July 3 and 4, 1863. These two events are generally considered to be
the turning point of that war. Because the Union won the American Civil War, the
United States became one unified, indivisible country rather than separated into two
In contrast to Voices of Oakland, which the participants knew would be in a cemetery, this experience surprised most of the participants by ending in a cemetery,
particularly because this one is small and hidden away behind numerous high-rise
buildings. They were guided to a specific crypt, where they discover that the woman
in the story was Norma Jeane Baker, better known to the world by her other name:
Marilyn Monroe. We used a Situated Documentaries technique at this spot, showing
newsreel footage of her funeral. In certain sequences from the newsreel, the participants can see a clear, meaningful and one-to-one correlation between the crypts they
see in the surrounding environment and the footage that plays on the mobile device.
Combined with the somber music that was composed specifically for this spot, the
effect was a powerful and poignant coda to the experience. The emotions that participants reported feeling at this spot were different from those they would have felt had they simply read a tourist
guidebook and then visited the cemetery. By experiencing a story about her prior to
visiting her crypt, they were left to contemplate her life before becoming a movie
star, and to wonder if the story they just experienced might have been real.
Many AR storytelling experiences that rely on the reinforcing strategy use the
technique of connecting the story to the past. Being able to increase the range of
stories that can be told through new AR and MR effects will require advancements
in our ability to track and augment historic outdoor environments. The Archeoguide
project (Vlahakis et al., 2002) was an early effort to develop a platform for augmenting archaeological sites.
Reinforcing as a strategy has strengths and weaknesses. On the positive side,
the experience does not rely solely upon the virtual content by itself. Since reality
itself is compelling on its own, the real world does some of the work of providing
the meaningful experience. It may be easier to design and build virtual content that
complements reality rather than virtual content that must shoulder the entire burden
of being compelling by itself. In 110 Stories and The Westwood Experience, I believe
there are examples demonstrating that this strategy can succeed. On the negative
side, the experience is tied to a specific location. A person wishing to participate in
110 Stories must travel to Manhattan. A specific experience does not scale; it cannot
be experienced at any arbitrary location, but rather only the one it was designed for.
However, many different experiences might be built for different locations around
the world. Furthermore, the story must be tied to the characteristics of the real location. One cannot tell any arbitrary story and expect reinforcing to work; instead, the
story must complement the reality that exists at the chosen site. In The Westwood
Experience, we walked around Westwood with the writer, and he wrote the story
to incorporate real elements in the town, such as a jewelry store. And we were very
aware that our experience ended in a real cemetery. A story that was disrespectful
of that reality or that provided experiences inappropriate to that location would at
best fail to harness the power of that real place, and at worst would be offensive.
Reinforcing requires the story to appropriately complement reality.
11.3 RESKINNING
In reskinning, the strategy is to remake reality to suit the purposes of the story you
wish to tell. Reality is either something that the creators specifically set up and then
augment, or the experience is designed to recharacterize whatever real surroundings
exist. Unlike reinforcing, there may not be anything particularly special or evocative
about the real location, which means experiences based on reskinning can potentially scale to operate in most arbitrary real locations, or reward the participant for
finding locations that work well for the experience. However, most of the power
from the experience must now come from the virtual content and how it adapts and
exploits the real world to fit that virtual content.
Rainbows End is a Hugo-award winning science fiction book written by Vinge
(2006) that provides one ultimate concept of reskinning. In this book, nearly perfect
AR is ubiquitously available to people who can operate the latest wearable computing systems, which use displays embedded in contact lenses and tracking provided
by a vast infrastructure of smart motes that permeate almost all inhabited locations.
Within this world, there are Belief Circles, which are persistent virtual worlds that
are linked to and overlaid upon real locations. Each Belief Circle has a particular theme, such as a fantasy world set in medieval times. When a user chooses to
subscribe to a Belief Circle, he or she sees the surrounding world changed to fit the
theme. For example, in a medieval Belief Circle, nearby real buildings might appear
to be castles and huts, and people on bicycles might instead appear to be knights
on horseback. A Belief Circle has a large group of people who subscribe to it and
create custom content, and when others view that content, the creators can receive
micropayments. We can view a Belief Circle as a persistent, co-located virtual world
that links directly, one to one, to our real world and uses the principle of reskinning
to change reality to fit the needs of the virtual content and experience.
Unlike the world of Rainbows End, we do not currently have ubiquitous tracking and sensing, so there are few examples of reskinning, and those often rely upon
real environments that were specially created to support the needs of the story. Two
examples are AR Façade and Half Real.
Façade is an interactive story experience where the participant plays the role of
a dinner guest visiting a couple whose marriage is just about to break apart (Mateas
and Stern, 2007). Façade supports free text entry so that the participant can type in
anything to converse with the two virtual characters while walking around freely
in the virtual environment. The experience is not a linear story. Depending on what
the participant does or says, various story beats are triggered and experienced. For
example, walking up to or commenting on a particular object or picture will trigger
certain narrative sequences. Façade by itself is a virtual environment that runs on a
PC and monitor. Researchers built an AR version, called AR Façade, in which they
built a real set that replicated the apartment that is the setting of this experience
(Figure 11.2), and participants wore a wearable AR system and walked around the
real set to see virtual representations of the couple (Dow etal., 2008). The goal was
to provide the participants a greater sense of actually occupying a real apartment
and interacting more naturally with the apartment and its virtual inhabitants. For
example, instead of typing in what they would say, participants now simply said what
they wanted, directly. Rather than relying upon voice recognition, human operators
working behind the scenes then typed in what the participants said into the system.
The evaluation did not directly attempt to measure whether AR Façade was more
engaging or compelling than Façade by itself. However, there was evidence that AR
Façade did affect some participants emotionally. Some chose to quit early rather
than participate in an experience that was an uncomfortable social situation that they
were expected to take an active role in. Others became highly engaged, showing visible signs of surprise and emotional connection, such as running to follow one of the
virtual characters when she leaves.
Half Real is a theatrical murder mystery play that uses spatial AR to merge real
actors and a physical set with virtual content and to engage the audience with interactive situations where the audience members vote on how an investigation proceeds
(Marner et al., 2012). Actors were actively tracked so that virtual labels could be
attached to them. Each audience member had a ZigZag handheld controller to vote
when prompted. The real set was painted white so that projective textures could change
the appearance of the set during the performance. The creators had to work out
numerous system issues to provide the reliability, robustness, and transportability
required of a professional stage production. Half Real completed a tour in South
Australia and subsequently played a 3-week, sold-out run in Melbourne. Future
possibilities include using the augmentations to change the appearances of the actors
themselves, rather than just the set and the backgrounds.
Applying the reskinning technique outside of controlled real environments, such as
those of AR Façade and Half Real, requires AR systems that can detect and understand
the real world. KinectFusion exploits the depth-sensing capabilities of the Kinect to
build a volumetric representation of a real environment, enabling a system to track
within that space and augment it more realistically, with correct occlusions (Newcombe
et al., 2011). Such a system represents a step toward AR and MR systems that can
more commonly enable the reskinning technique. For example, the Scavenger Bot
demonstration that Intel showed at the Consumer Electronics Show (CES) 2014
could scan a previously unknown tabletop environment and then change its
appearance by applying a different skin to the environment (Figure 11.3).
This system can also handle dynamic changes in the environment. While we are
FIGURE 11.3 The Intel Scavenger Bot demonstration at CES 2014, reskinning a real environment with a virtual grid pattern.
now seeing systems that can detect and model the real world, these models generally
lack semantic understanding. True reskinning will require systems that can detect and
recognize the semantic characteristics of the environment and objects.
The University of Central Florida provided an example of reskinning the interior
of a museum to better engage visitors with the exhibits. In the MR Sea Creatures
experience in the Orlando Science Center, visitors saw the museum interior transformed to be underneath the sea, and skeletons of ancient sea creatures on display
then came to life (Hughes et al., 2005). Visitors navigated a virtual rover vehicle to
collect specimens around the museum. At the end, they saw an animation of one
dinosaur grabbing a pterodactyl out of the air and holding it in its mouth; the scene
then transformed back to the real world, where the visitors saw the real fossil of that dinosaur with the pterodactyl in its mouth.
The Aphasia House project is an exciting new application of MR storytelling
that enables patients suffering from traumatic brain injury to tell their own personal stories to therapists, not for the purpose of entertainment but as a critical part
of guiding a doctor in determining how to treat a patient (Stapleton, 2014). People suffering from aphasia are impaired in their ability to communicate due to severe brain
injuries. They may lose the ability to speak, read, or write. Preliminary results from
this project indicate that immersive storytelling in an MR environment may enable
some patients to reconnect with their abilities to tell stories, and a doctor involved in
this project testifies that this breakthrough would not have been possible in a purely
virtual environment or without the augmentations provided in the MR environment.
What appeared to be critical was building a real environment (a kitchen) that could
be augmented in a variety of ways to elicit familiar previous experiences from
the patient (making coffee, eating a bagel, touching countertops) in a multimodal
way, so that he felt, heard, and smelled familiar sensations. This is an
example of applying reskinning to evoke stories from a patient in the pursuit of a
serious goal: helping patients recover their own abilities to communicate.
Reskinning relies most on the power of the experience coming from the virtual
content, rather than the real environment, so a key strategy may be to exploit virtual content that participants are already familiar with. When this content is created
by professional storytellers, and audiences have already read the books, seen
the movies, or otherwise experienced the virtual content, then a new experience
that leverages that same content is not starting from scratch. It has an advantage in
that the audience already finds the virtual content compelling. One example of this
is the Wizarding World of Harry Potter at the Universal Studios Orlando theme park
in Florida. While this is not explicitly an example of AR or MR storytelling, it is
an example of this leveraging strategy. Since most visitors are already familiar with
the Harry Potter books or films, when they walk through that area of the theme park
and experience the attractions and shops there, they draw from their memories and
previous knowledge of this fantasy world. Such leveraging is the basis of many cross-media or transmedia approaches, and it can be quite successful. The Wizarding
World of Harry Potter was sufficiently popular that Universal expanded it in the
summer of 2014.
Alice's Adventures in New Media was an early AR narrative experiment that leveraged the world of Alice in Wonderland, written by Lewis Carroll (Moreno et al., 2001).
In this system, a participant sat at a table and saw three other characters from the
book. The participant could interact with the characters by performing various
actions such as serving and sipping tea, which affected the narrative snippets.
At CES 2014, Intel ran a series of AR demonstrations based upon the steampunk
fantasy world of Leviathan, written by Westerfeld (2009). These AR demonstrations
were intended to inspire visitors about the potential for AR storytelling that used
this leveraging strategy (Azuma, Leviathan at CES 2014). I was part of a large team
of people who created and ran these demonstrations. The world of Leviathan is set
in an alternate Earth, during World War I, where mankind discovered genetic engineering very early. Therefore, in some countries a biological revolution supplanted
the industrial revolution, and people chose to fabricate new types of living things to
suit their purposes. For example, the Leviathan itself is an enormous flying airship in
the form of a whale, replacing dirigibles. In our demonstrations, we brought virtual
representations of the Leviathan and other creatures inspired by the book into the
real environment, both during the Intel CEO's keynote presentation and in the Intel
booth on the CES show floor (Figure 11.4). While these demonstrations did not tell
stories by themselves, they served as an inspiration for how this leveraging strategy
could result in compelling new storytelling media when applied through the reskinning strategy of AR and MR storytelling.
11.4 REMEMBERING
In remembering, the AR storytelling strategy is to draw upon memories and retell
those stories, generally at the particular place where those memories and stories
happened. The belief is that combining the memories and stories with the actual real
location can result in a new experience that is more powerful than the real location
by itself, or the virtual content by itself. For example, I could revisit the site of my
wedding ceremony and see the gazebo where that occurred. While I have photos
and videos of that event, communicating my personal story of that day and what that
meant to me might be done in a more powerful manner as an AR or MR experience,
merging that virtual content with the actual location where my wedding occurred.
The strategy of remembering is similar to reinforcing, but there are some differences. The locations used in the reinforcing approach have particular meanings and
power that most people agree upon and know. For example, the site of the Battle of
Gettysburg draws its power from a specific event. While interpretations might vary,
the meaning is shared and agreed to by almost all participants, and that constrains
the experiences based on reinforcing to conform to that meaning. Remembering, in
contrast, is generally more personal and individual. With this approach, the potential stories and memories can vary greatly, even at the same location. For example,
Sproul Plaza on the campus of the University of California, Berkeley, could be home
to a wide variety of memories and stories. One person might remember participating
in the Free Speech Movement at that spot, while another knows it as the place where
he first met his future spouse, and yet another has memories of Pet Hugs sessions
where students could hug therapy dogs to reduce their stress.
Even when divorced from a particular location, memories and viewpoints by
themselves can make compelling experiences. Three Angry Men (MacIntyre et al.,
2003) and its successor, Four Angry Men, are experimental AR narrative demonstrations that enable participants to access and experience the memories and thoughts
of characters in a narrative, from their particular points of view. Inspired by the
drama Twelve Angry Men, written by Reginald Rose, these experiences place the
participant in the viewpoint of a jury member deliberating on a case. When seated
at a particular chair at the table, the participant sees the other jurors from the perspective of the juror sitting in that chair (Figure 11.5). The
participant not only hears what the other jurors say and what his or her persona is
saying, but also hears the inner thoughts of the character sitting in that
chair. The participant is free at any time to switch seats. When he or she does so, the
deliberation continues, but the participant now hears and sees things from a different
juror's perspective and hears that juror's inner thoughts. For example, one juror with
liberal leanings sees another African-American juror as a potential ally but the third
juror as prejudiced. When the participant moves to the seat of the prejudiced juror,
the entire experience changes. While the initial juror heard the prejudiced juror as
FIGURE 11.5 Three Angry Men, seeing two other jurors from one point of view.
loud and unreasonable, the prejudiced juror hears himself as reasonable, if a bit frustrated. Even the appearances of the jurors change depending on the viewpoint. To the
prejudiced juror, the African-American juror's appearance and behaviors transform
to conform to his biases. Three Angry Men provides an example of how AR storytelling could communicate, at a first-person level, how stories and memories change
based on personal perspectives and biases, which has been called the Rashomon
effect after Akira Kurosawa's film.
REXplorer was a system that encouraged participants to explore and learn about
the historic town of Regensburg, Germany, and its well-preserved medieval city center through MR storytelling techniques (Ballagas et al., 2008). Although REXplorer
is primarily a game that asks participants to go on quests to specific locations within
the city center, it motivates these quests through virtual characters, ghosts who used
to inhabit the town, who have requests to make of the participants and stories to tell
them. The participants learn about these characters and their stories and perform
tasks such as carrying a love letter to another character inhabiting a different location in town. By performing these tasks, participants indirectly explore the historical
city center, with the goal of learning history in a more entertaining and enjoyable
manner. Some participants found that using stories in this manner injected life into
a historical tour that otherwise might have been dry and boring.
Rider Spoke is a location-based experience in which bike riders were encouraged
to record personal stories and memories associated with particular locations, at the
spot where those occurred (Benford and Giannachi, 2011). The virtual content consisted of the audio recordings. Riders could add recordings only in spots that did not
already have content associated with that location, ensuring that each location had
unique content. The system prompted participants to leave significant and evocative memories. For example, one instruction asked a participant to find a spot that his
or her father would like and to talk about that. In this experience, participants were
not just passive consumers of content but active generators, contributing their own
personal stories, as if mapping diary entries to specific spots in the city. Rider Spoke
was conducted in 10 cities across the world.
You Get Me was a 2008 experience in which participants selected one of eight
young people to hear his or her stories and, perhaps, make a connection to that person (Blast Theory, 2008). The eight young people had communication and tracking
equipment and walked around a park. Each person had a key question that he or
she wanted help in answering. Participants went to computer terminals at the Royal
Opera House, about 5 miles away from the park, to select one of the young people
and then explore the park virtually. As a participant moved through the virtual representation of the park, he or she heard stories relating to that person: the personal geography of how the park mapped to the chosen young person. For example, one person
arranged stories around a swimming pool in which she nearly drowned. The stories
gave clues for answering that person's question. The participant could then track down
the person in the park and attempt to answer the question. If the person thought the
answer insufficient, he or she could reject it and force the participant to explore more
of the personal geography and stories. But if the person found the answer intriguing,
he or she could invite the participant for a private chat or phone call. At the conclusion, the person took a photo of himself or herself with the park in the background.
11.5 CONCLUSION
AR storytelling is still in an early, exploratory phase. While there have been many
initial experiments, as this chapter has discussed, I feel that as a form of media,
it is still very early in its development. It reminds me of an early phase of the
development of motion pictures, where some of the first movies featured footage
of moving trains, showing what the technology could do. Advancing the technology of moving pictures from those early days into the art form of cinema that we
know today required progress on many fronts, not just in technology, but also in
art, design, and business models. In AR and MR storytelling, we do not yet have
the equivalents of the early pioneers in cinema, such as Buster Keaton, Sergei
Eisenstein, and D.W. Griffith. These future pioneers will need to overcome some
of the core challenges of this new form of media while simultaneously unlocking
its potential.
One of the most important challenges in AR and MR storytelling is motivating
people to make the necessary effort to participate in these location-based media.
These experiences generally require participants to leave their homes and travel to
particular locations or venues, which requires effort and costs resources, in terms
of time, money, etc. In comparison, one can watch a film or see a TV show almost
anywhere. It takes little effort to turn on the TV and watch a show on the DVR, see
a movie, or play a video game in ones house. Why would someone get off the couch
and instead participate in these new location-based media?
The answer is that AR and MR storytelling experiences must become compelling enough to convince participants that this effort is worthwhile. Despite our
TV sets, game consoles, and comfortable couches, people still leave home to go
to a movie theater, see a sporting event in a stadium, go to theme parks, visit a
museum, travel to distant sites on vacation, etc. Those experiences are attractive enough that people willingly spend the extra effort to participate in them.
Initially, AR and MR storytelling might leverage such situations, augmenting
those experiences that are already proven to draw people out of their homes.
As the medium develops, I look forward to such experiences being sufficient by
themselves to attract participants.
What would be a payoff that would make people eager to participate in
location-based experiences? The Walt Disney Company provided an example in a
2013 Alternate Reality Game called The Optimist (Andersen, 2013). This provided a
series of experiences that Disney fans could participate in, culminating in an elaborate puzzle hunt that took place at the 2013 D23 Expo and in Disneyland. The people
who knew about these events and who chose to attend at the specific locations and
dates were rewarded with access to locations that the general public normally cannot
enter. These locations included the Club 33 private club, Walt Disney's apartment
above the fire station on Main Street, and the Lilly Belle caboose car on one of
the railroad trains. For Disney fans, visiting these locations provided highly desirable and special experiences, ones they would remember and forever cherish. While
compelling, this is a specific approach that requires special locations and does not
generalize or scale to most situations.
A more general approach toward achieving compelling experiences will be to
realize the potential inherent in the medium to see the world around you through
the eyes, viewpoint, and mindset of another person. To me, an ultimate expression
of the potential of AR and MR storytelling is if it can cause you to view the world
in a different way, and if this impact is powerful enough that it actually changes
your own belief system, how you view the world and make decisions. I can give an
example of the desired impact through something that happened to me in real life:
A friend of mine, who worked on several projects with me, had a stroke.
He now requires a powered wheelchair to travel anywhere.
I now view the world differently than I did prior to this incident, because I have
traveled with him to many events. Before, I would not think twice about curbs or
stairs or other things that are insurmountable obstacles to my friend. Now, I am
sensitive to the locations of ramps, elevators, and other items that provide wheelchair access.
AR and MR storytelling experiences have the potential to change how we view
the world, to make us see the world from a different perspective, such as that of my
friend, and to in turn change our belief systems and values. This different perspective can be cultural, political, historical, social, or any other dimension. But if an
experience can change me, in a way similar to what I just described, that is proof that
the experience is compelling.
We know that traditional media, such as films, plays, and books, have this power,
and there are examples in each where people have found those stories memorable,
compelling, and life-altering. When we have equivalent examples in AR and MR
storytelling that exploit the new potentials of this form of media, then we will know
that it has matured sufficiently to stand equally with established media. I look forward to that day.
REFERENCES
Andersen, M. The Optimist draws fans into fictionalized Disney history. Wired, July 23,
2013. http://www.wired.com/2013/07/disney-the-optimist-arg/ (accessed May 5, 2014).
August, B. 110 Stories: What's your story? http://110stories.com (accessed May 5, 2014).
Azuma, R. A survey of augmented reality. Presence: Teleoperators and Virtual Environments,
6(4), 1997, 355–385.
Azuma, R. Leviathan at CES 2014. http://ronaldazuma.com/Leviathan_at_CES2014.html
(accessed May 2, 2014).
Azuma, R. The Westwood Experience by Nokia Research Center Hollywood. http://ronaldazuma.com/westwood.html (accessed May 2, 2014).
Ballagas, R., A. Kuntze, and S. Walz. Gaming tourism: Lessons from evaluating REXplorer, a
pervasive game for tourists. In Pervasive Computing 2008, Sydney, New South Wales,
Australia, May 19–22, 2008, pp. 244–261.
Benford, S. and G. Giannachi. Performing Mixed Reality. Cambridge, MA: MIT Press, 2011.
Blast Theory. You Get Me. http://www.blasttheory.co.uk/projects/you-get-me/ (accessed May
12, 2014).
Brittenham, S. and B. Haberlin. Anomaly. Anomaly Publishing, 2012.
Dow, S., J. Lee, C. Oezbek, B. MacIntyre, J. D. Bolter, and M. Gandy. Exploring spatial
narratives and mixed reality experiences in Oakland cemetery. In Proceedings of the
2005 ACM SIGCHI International Conference on Advances in Computer Entertainment
Technology, Valencia, Spain, June 15–17, 2005, pp. 51–60.
Dow, S., B. MacIntyre, and M. Mateas. Styles of play in immersive and interactive story: Case
studies from a gallery installation of AR Façade. In Proceedings of the 2008 International
Conference on Advances in Computer Entertainment Technology, Yokohama, Japan,
December 3–5, 2008, pp. 373–380.
Harrigan, P. and N. Wardrip-Fruin, eds. Second Person: Role-Playing and Story in Games and
Playable Media. Cambridge, MA: MIT Press, 2007.
Hayes, G. Transmedia futures: Situated documentary via augmented reality, 2011. http://www.personalizemedia.com/transmedia-futures-situated-documentary-via-augmented-reality/ (accessed May 5, 2014).
Höllerer, T., S. Feiner, and J. Pavlik. Situated documentaries: Embedding multimedia presentations in the real world. In Proceedings of the 3rd IEEE International Symposium on
Wearable Computers 1999, San Francisco, CA, October 18–19, 1999, pp. 79–86.
Hughes, C., C. Stapleton, D. Hughes, and E. Smith. Mixed reality in education, entertainment
and training. IEEE Computer Graphics and Applications, 25(6), 2005, 24–30.
MacIntyre, B., J. D. Bolter, J. Vaughn et al. Three angry men: An augmented-reality experiment in point-of-view drama. In Proceedings of the First International Conference
on Technologies for Interactive Digital Storytelling and Entertainment, Darmstadt,
Germany, March 24–26, 2003, pp. 230–236.
Mateas, M. and A. Stern. Writing Façade: A case study in procedural authorship. In Second
Person: Role-Playing and Story in Games and Playable Media, P. Harrigan and
N. Wardrip-Fruin (eds.). Cambridge, MA: MIT Press, 2007, pp. 183–207.
Marner, M., S. Haren, M. Gardiner, and B. Thomas. Exploring interactivity and augmented
reality in theater: A case study of Half Real. In IEEE International Symposium on Mixed
and Augmented Reality 2012, Arts, Media and Humanities Proceedings, Atlanta, GA,
November 5–8, 2012, pp. 81–86.
Meyers, K. Revealing Londinium Under London: New AR App. Cultural Heritage Informatics
Initiative, http://chi.anthropology.msu.edu/2011/07/revealing-londinium-under-londonnew-ar-app/ (accessed May 5, 2014).
Milgram, P. and F. Kishino. A taxonomy of mixed reality visual displays. IEICE Transactions
on Information Systems, E77-D(12), 1994, 1321–1329.
Montola, M., J. Stenros, and A. Waern. Pervasive Games: Theory and Design. Burlington,
MA: Morgan Kaufmann Publishers, 2009.
Moreno, E., B. MacIntyre, and J. D. Bolter. Alice's adventures in new media: An exploration
of interactive narratives in augmented reality. In CAST01, Bonn, Germany, September
21–22, 2001, pp. 149–152.
Museum of London. Londinium App. http://www.museumoflondon.org.uk/Resources/app/
Streetmuseum-Londinium/home.html (accessed May 5, 2014).
Newcombe, R., S. Izadi, O. Hilliges et al. KinectFusion: Real-time dense surface mapping and
tracking. In Proceedings of IEEE International Symposium on Mixed and Augmented
Reality (ISMAR) 2011, Basel, Switzerland, October 26–29, 2011, pp. 127–136.
PhillyHistory.org, Implementing Mobile Augmented Reality Technology for Viewing Historic
Images. An Azavea and City of Philadelphia Department of Records White Paper.
2011. http://www.azavea.com/research/company-research/augmented-reality/ (accessed
May 5, 2014).
Squire, K., M. F. Jan, J. Matthews et al. Wherever you go, there you are: Place-based augmented reality games for learning. In The Educational Design and Use of Simulation
Computer Games. Rotterdam, The Netherlands: Sense Publishing, 2007, pp. 265–296.
Stapleton, C. Developing stories that heal: A collaboration between Simiosys and the Aphasia
House. http://simiosys.com/blog/?p=459 (accessed June 16, 2014).
Vlahakis, V., N. Ioannidis, J. Karigiannis et al. Archeoguide: An augmented reality guide for
archaeological sites. IEEE Computer Graphics and Applications, 22(5), 2002, 52–60.
Vinge, V. Rainbows End. New York: Tor Doherty Associates, 2006.
Westerfeld, S. Leviathan. New York: Simon Pulse, 2009.
Wither, J., R. Allen, V. Samanta et al. The Westwood experience: Connecting story to locations
via mixed reality. In IEEE International Symposium on Mixed and Augmented Reality
2010, Arts, Media and Humanities Proceedings, Seoul, Korea, October 13–16, 2010,
pp. 39–46.
12
Dimensions of Spatial
Sound and Interface
Styles of Audio
Augmented Reality
Whereware, Wearware,
and Everyware
Michael Cohen
CONTENTS
12.1 Introduction and Overview............................................................................ 278
12.1.1 Auditory Dimensions......................................................................... 278
12.1.2 Source to Sink Chain......................................................................... 278
12.1.3 Spatial Sound.....................................................................................280
12.1.3.1 Directionalization and Localization................................... 283
12.1.3.2 Spatial Reverberation.......................................................... 285
12.1.4 Spatial Audio Augmented Reality..................................................... 287
12.2 Whereware: Spatial Dimensions.................................................................... 289
12.2.1 Position = Location and Orientation; Changing
Pose = Translation and Rotation..................................... 289
12.2.2 Whereware for Augmented Reality................................................... 289
12.2.3 Distance Effects................................................................................. 291
12.2.4 Stereotelephony................................................................................. 292
12.3 Wearware and Everyware: Source and Sink Dimensions............................. 294
12.3.1 Capabilities........................................................................................ 295
12.3.1.1 Mobile and Wearable Auditory Interfaces.......................... 295
12.3.1.2 Form Factors....................................................................... 295
12.3.1.3 Dynamic Responsiveness....................................................300
12.3.1.4 Head Tracking.....................................................................300
12.3.1.5 Broadband Wireless Network Connectivity: 4G,
MIMO, ABC, and SDR...................................................... 301
FIGURE 12.1 (diagram contents) Sensors: ultrasonic or acoustic; magnetic; optical, infrared; GPS/WAAS; gyroscopic, accelerometric. Sources: object generation (environmental sounds; auditory icons, earcons; voice; music). Nonspatial (anechoic: dry) sources: sampling (microphones); additive and subtractive synthesis; AM and FM; physical modeling; granular synthesis; speech synthesis (including TTS); nonlinear wave-shaping; waveguide synthesis; hybrid algorithms; network- or cloud-served streaming. Parameters: location (direction, i.e., heading, and distance); directivity; directional tone color; mute/muzzle (solo); motion; space (medium); radiation/propagation. Sinks: reception and directional synthesis. Auralization: location and direction (orientation); panning, HRTF processing; deafen/muffle (attend); Doppler effects. Display: earphones, headphones, and headsets; bone conduction; nearphones; loudspeakers; speaker arrays (5.1, 7.1, 10.2, 22.2, WFS, HOA, etc.).
the various elements (the sensor, i.e., a microphone when a sound source is not otherwise provided; the room or space; the torso, head, and pinnae of the modeled listener; and the loudspeakers) should all be considered to create a veridical auditory illusion.
A speaker's voice, a musical instrument tone, or another acoustic event (the source)
causes small fluctuations of pressure above (compression, a.k.a. condensation) and
below (rarefaction) atmospheric pressure. These variations can be sensed by a microphone, transducing acoustic energy into electrical energy. Such measurement, expressed as
a voltage signal, is discretely sampled (in time) and quantized (in amplitude) by an
audio interface ADC (analog-to-digital converter), encoding it uniformly (as in LPCM,
linear pulse code modulation) or nonlinearly (as in µ- or A-law representations).
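As an illustrative sketch of the nonlinear case, µ-law companding compresses each normalized sample before uniform quantization, allotting finer resolution to quiet sounds. The compression and expansion formulas follow the standard µ-law definition with µ = 255, but the 8-bit quantizer here is a simplified illustration, not a bit-exact G.711 codec:

```python
import math

MU = 255.0  # standard mu-law companding constant

def mu_law_compress(x):
    """Compress a sample in [-1, 1]: quiet samples occupy more of the code space."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mu_law_expand(y):
    """Invert the companding, recovering the original amplitude."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

def quantize(y, bits=8):
    """Uniformly quantize a compressed sample to a signed integer code."""
    return round(y * (2 ** (bits - 1) - 1))

# A quiet sample survives 8-bit quantization with little error.
x = 0.01
recovered = mu_law_expand(quantize(mu_law_compress(x)) / 127)
assert abs(recovered - x) < 1e-3
```

Without companding, the same 8-bit uniform quantizer would represent the 0.01 sample as 1/127 ≈ 0.0079 (over 20% relative error); companding keeps the relative error roughly constant across amplitude levels.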
Audio sources might alternatively be provided as recorded material, or synthesized or streamed in real time. Recordings, stored as computer files, might need to
be decompressed, for example, from MP3 or AAC (.m4a). Decoded audio data has
a flat PCM encoding, such as that encapsulated by WAV or AIFF files, in which
audio signals are represented as sequences of amplitudes at a constant sampling
rate. Synthesis techniques include those listed below the Sources block at the left
of Figure 12.1. Networked streams are typically remote teleconferees' voices or
Internet- and cloud-served music or other material.
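As a concrete sketch of this flat representation, the following uses Python's standard wave and struct modules (the filename tone.wav is arbitrary) to write a 440 Hz tone as 16-bit LPCM in a WAV container and read the amplitude sequence back unchanged:

```python
import math
import struct
import wave

RATE = 44100          # constant sampling rate (Hz)
FREQ = 440.0          # tone frequency (Hz)
N = RATE // 10        # 0.1 s of audio

# Synthesize one amplitude per sample instant, scaled to the 16-bit range.
samples = [int(32767 * 0.5 * math.sin(2 * math.pi * FREQ * n / RATE))
           for n in range(N)]

with wave.open("tone.wav", "wb") as w:
    w.setnchannels(1)     # mono
    w.setsampwidth(2)     # 16-bit LPCM
    w.setframerate(RATE)
    w.writeframes(struct.pack("<%dh" % N, *samples))

with wave.open("tone.wav", "rb") as r:
    decoded = struct.unpack("<%dh" % r.getnframes(),
                            r.readframes(r.getnframes()))

# The decoded data is the same flat sequence of amplitudes.
assert list(decoded) == samples
```

Unlike MP3 or AAC, nothing here needs decompression: the container simply wraps the raw amplitude sequence with its sampling parameters.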
These audio signals are filtered by a computer's DSP (digital signal processing).
Digital amplification or attenuation is accomplished by adjusting linear gain, a
coefficient which modulates (multiplies) a raw signal sequence, scaling the envelope
of a notional pulse train for source exposure or sink sensitivity. Balancing or panning a stereo signal involves coupled gain adjustments to a left-right signal pair.
Spectrum-based adjustments, such as equalization and aural enhancement, are
also possible, typically by specifying frequency-band-specific gain. Such sweetening
12.1.3 Spatial Sound
In stereo reproduction systems, sound comes only from left and right transducers:
earphones, headphones, and loudspeakers. Such audio systems project only lateral
arrangement of captured and mixed sources, as the apparent direction from which
sound emanates is controlled by panning, shifting the balance of a sound source
281
between channels with a pan pot (short for panoramic potentiometer), a cross-coupled
dual mixing variable resistor, perhaps with nonlinear cross-tapers to preserve total
power across a distribution. But this technique yields images that are diffuse, located
basically only between loudspeakers and only at distances farther from the listener
than the plane of the speakers, or, if headphones are used, only between the ears
(for intracranial, or IHL, inside-the-head localization). Cyberspatial sound projects
audio media into acoustic space by manipulating signals so that they assume virtual
or projected positions, mapping them from zero space (source channels) into multidimensional space (listeners' perceptual spaces). Panning by cross-fading intensity
yields lateralized images, degenerately spatialized in 1D, but more sophisticated processes can make virtual sources directionalized in a 2D, pantophonic (360°, 2π radians
circumferentially) flat soundscape, or spatialized in a 3D, periphonic (360° × ±90°,
4π steradian solid angle) soundscape. Spatial audio involves technology that allows
virtual sound sources to have not only lateral left-right (sway) attributes (as in a
conventional stereo mix), but vertical up-down (heave) and longitudinal back-forth
(surge) qualities as well. By applying psychoacoustic effects with DSP, engineers
and scientists are developing ways of generating sound fields (Tohyama et al. 1995)
and fully 3D sound imagery (Blauert 1997, Gilkey and Anderson 1997, Begault
2004, Rumsey 2006, Pulkki et al. 2011, Suzuki et al. 2011). Such virtual positions
enable auditory localization, estimability of the position in space of virtual sources.
Augmenting a sound system with spatial attributes unfolds extended acoustic dimensions; spatial sound is a sonic analog of 3D graphics. It takes theater-in-the-round
and turns it inside-out, immersing listeners in soundscapes.
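The constant-power cross-taper mentioned above can be sketched directly. Mapping the pan position onto a quarter circle (one common choice of nonlinear taper, used here purely as an illustration) keeps the summed power of the left and right gains constant, avoiding the level dip at center that a naive linear cross-fade produces:

```python
import math

def constant_power_pan(pan):
    """Map pan in [-1 (hard left), +1 (hard right)] to (left, right) gains
    on a quarter circle, so left**2 + right**2 == 1 at every position."""
    theta = (pan + 1.0) * math.pi / 4.0   # sweeps 0 .. pi/2
    return math.cos(theta), math.sin(theta)

def linear_pan(pan):
    """Naive linear cross-fade, for comparison."""
    return (1.0 - pan) / 2.0, (1.0 + pan) / 2.0

# The constant-power taper preserves total power at center...
left, right = constant_power_pan(0.0)
assert abs(left ** 2 + right ** 2 - 1.0) < 1e-9

# ...whereas the linear taper loses half its power (-3 dB) there.
left, right = linear_pan(0.0)
assert abs(left ** 2 + right ** 2 - 0.5) < 1e-9
```

This center dip is exactly what the cross-coupled nonlinear tapers of a hardware pan pot are designed to avoid.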
The Audium is a unique spatial sound theater (Loy 1985, Shaff 2002), a specially
constructed venue featuring music steered among its 176 speakers in an intimate
(49 seats) setting in San Francisco. One of its literally motivating principles is that
space (like rhythm, melody, or harmony) is a fundamental element of music, and
that controlled rendering of sound in space creates a kinetic perception that must be
a part of a composer's musical vocabulary. Sounds are choreographed through their
movement and intensity on multiple trajectories through space. In the words of the
Audium's director,

The volume of space becomes part of the work and a strong sense of sculpting sound
in three dimensions becomes apparent. A melodic line acquires a starting point,
a speed, a determined pathway, and a point to conclusion. Areas in space become
launching sites and meeting stations for converging sound lines. Melodic convolutions
can be physically felt as they flow along spatial planes: vertical, horizontal, diagonal,
circular, and any combination. As each melodic line travels, layers unfold, overlap, and
entwine to reveal a rich audio tapestry. Harmonic tensions between different locations
in space open up unusual timbres. Rhythmic ideas take on new qualities when speed
and direction are enhanced by controlled movement. Live performance of works gives
a human, interactive element to the Audium's spatial electronic orchestra.
Stan Shaff
The most direct way of implementing spatial sound is by simply distributing real
sources in space, as in antiphonal or polychoral music, such as that composed in
Venice during the Renaissance by Andrea Gabrieli, his nephew Giovanni Gabrieli,
and Claudio Monteverdi. Such a literal approach to electroacoustic spatial sound,
acoustic space synthesis, physically associates each source with a loudspeaker, statically placed or perhaps moved around mechanically. Alternatively, spatial sound can
be bluntly captured by a gimbaled dummy head positioned around a fixed speaker
or by physically moving sources and speakers around a mannequin (alternatively
spelled manikin, a dummy head; in German: Kunstkopf), the captured binaural signals from which are presentable to listeners. However, such implementations are
awkward and not portable. Therefore, the rest of this chapter concentrates on DSP
synthesis of spatial cues.
Fully parameterized spatial audio (Begault 1994, Carlile 1996, Jot 1999, Rumsey
2001) allows dynamic, arbitrary placement and movement of multiple sources in
soundscapes, including musical sound characteristics outlined in Table 12.1 and
spatial dimensions presented later in Table 12.2, as well as control of extra dimensions (Cohen and Wenzel 1995) shown in Figure 12.2, including apparent extent,
directivity, orientation, and environmental characteristics such as envelopment, for
cyberspatial capabilities (Cohen et al. 1999).
A sound diffuser or spatializer, a multidimensional mixer, creates the impression
that sound is coming from different sources and different places, just as one would
hear in person. There are two paradigms for AR and VR perspectives: projecting simulated sources into a listener's space, and transporting a listener into another space.
TABLE 12.1
Dimensions of Musical Sound
Frequency content
Pitch and register: tone, melody, harmony, vibrato (FM)
Waveform (sinusoid, sawtooth, square, triangle, rectification, etc.), tone color, waveshaping,
equalization, and sweetening
Spectral profile, including envelope and moments (center frequency)
Spectrotemporal pattern (evolving spectrum), texture, tone color
LTAS (long-term average spectrum)
Dynamics
Intensity/volume/loudness
SNR (signal-to-noise ratio)
Envelope: attack, decay, sustain, release (musical note shape)
Temporal envelope, including tremolo (AM)
Timing
Duration
Tempo, repetition rate
Duty cycle
Rhythm and cadence, including syncopation
Spatial position: location and orientation
Direction: azimuth, elevation
Distance (range)
Directivity: attitude and focus
TABLE 12.2
Physically Spatial Dimensions: Taxonomy of Positional Degrees of Freedom,
Including Cinematographic Gestures

Location (displacement): static (posture, pose) scalar; dynamic (gesture) translation, with camera motion
Lateral (transverse width or breadth): abscissa x; sway: track (crab)
Frontal (longitudinal depth): ordinate y; surge: dolly
Vertical (height): altitude z; heave: boom (crane)

Orientation or attitude: rotation in a plane, about the axis perpendicular to that plane; directions; dynamic (gesture) camera motion
In the sagittal (median) plane: elevation, climb/dive; pitch (tumble, flip): tilt
In the frontal (coronal) plane: left/right, barrel roll; roll (bank, flop): (roll)
In the horizontal (transverse) plane: azimuth, CW/CCW; yaw (whirl, twist): pan
[Figure: spatial impression, comprising source position (azimuth, elevation, distance), source dimensions (width, depth, height), environment characteristics (focus, diffuseness, intimacy) and dimensions (width, depth, height), envelopment, and perceived dimensions (width, depth, height).]
the head, pinnae (outer ears), and torso of humans or mannequins (Hartmann 1999,
Wenzel 1992). The bumps, folds, cavities, and ridges of a pinna cause superposition
of direct and reflected sound, direction-dependent interference. This cancellation
and reinforcement results in comb filtering, so called because the spectral modification, heard as tonal coloration, manifests as characteristic notches and peaks
in a frequency plot (Ballou 1991, Watkinson 2001). For each direction, a left-right
stereo pair of these HRTF earprints can be captured. Cyberspatial sound can be
generated by driving input signals through these filters in a digital signal processor,
creating psychoacoustic localization effects by expanding an originally monaural
signal into a binaural or multichannel signal with spatial cues. These static perceptual cues are fragile in the presence of conflicting dynamic cues (Martens 2003), so
often the earprint selection is parameterized by head tracking.
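The filtering described above amounts to convolving a monaural signal with a left and right head-related impulse response (HRIR). A minimal sketch, using toy three-tap impulse responses rather than measured HRTF data:

```python
def convolve(signal, impulse_response):
    """Direct-form FIR convolution (time-domain filtering)."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for n, x in enumerate(signal):
        for k, h in enumerate(impulse_response):
            out[n + k] += x * h
    return out

def binauralize(mono, hrir_left, hrir_right):
    """Expand a monaural signal into a binaural pair by filtering
    it through a left/right HRIR 'earprint' for one direction."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)

# Toy HRIRs: the right ear hears a delayed, attenuated copy,
# crudely mimicking ITD and IID for a source on the listener's left.
hrir_l = [1.0, 0.0, 0.0]
hrir_r = [0.0, 0.0, 0.5]
left, right = binauralize([1.0, 0.5], hrir_l, hrir_r)
```

Real systems use measured responses (such as the MIT KEMAR set mentioned below) of hundreds of taps per direction, and typically convolve with FFT-based fast convolution rather than this direct form.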
Thus, a spatial sound signal processor implements digital filters, the output of
which can be converted to analog signals, amplified, and presented to speakers.
Such systems process arbitrary audio signals, including voices, sound effects,
and music, with functions that place signals within the perceptual three-space of
each listener, including direction (azimuth and elevation) and distance (range), as
explained in the next section. These algorithms are deployed in spatial sound engines
such as DirectX's DirectSound and some implementations of OpenAL, using filters
such as the MIT KEMAR database. Depending upon the application, adequate pantophonic directionalization can be achieved by modulating only phase and intensity,
just ITD (delay) and IID (as in balance panning), without heavier DSP, as predicted
by the duplex theory, a simple perceptual model of spatial hearing direction estimation. For instance, the Java Sound Spatial Audio library works this way, and allows
FIGURE 12.3 Woodworth's formula for interaural time delay (ITD) is a frequency-independent, far-field, ray-tracing model of a rigid, spherical head: time difference cues are
registered at starts and ends of sounds (onsets and offsets). Lag in binaural arrival of a planar
wavefront is estimated as τ = r(θ + sin θ)/c, where r is the assumed radius of a head, θ is the
bearing of a far-field source, and c is the speed of sound. The radius of a typical adult head is
about 10 cm. Primarily based on low-frequency content of a sound signal, ITD is usable (without phase ambiguity from spatial aliasing) for frequencies up to about 1700 Hz, at which the
wavelength approaches the Nyquist spacing of the distance between the ears. (This model is a simplification.
For instance, heads are not spherical, and ears do not symmetrically straddle the diameter,
but are slightly rearward, at around 100° and 260°.)
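Woodworth's estimate can be computed directly under the caption's assumptions (rigid spherical head, far-field source); the helper below is an illustrative sketch:

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 degrees C

def woodworth_itd(bearing_deg, head_radius=0.10):
    """Woodworth's frequency-independent, far-field ITD estimate
    for a rigid spherical head: tau = r*(theta + sin(theta))/c,
    where theta is the source bearing off the median plane."""
    theta = math.radians(bearing_deg)
    return head_radius * (theta + math.sin(theta)) / SPEED_OF_SOUND

# A source straight ahead produces no delay; a source at 90 degrees
# (on the interaural axis) produces the maximum delay.
itd_max = woodworth_itd(90.0)
```

For the 10 cm head radius assumed in the caption, the maximum delay works out to roughly 750 microseconds.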
In similar colloquialisms, reverberant or anechoic spaces are said to be either live or dead, high- or
low-frequency signals are said to be bright or dark, and active or inactive signals are said to be
hot or cold.
[Figure: spatial sound processing chain, with a source captured by mic and ADC; time-domain impulse responses (HRIR, RIR) related by FT/IFT to frequency-domain transfer functions (HRTF, RTF); reflection, reverberation, and absorption rendered by convolution.]
placement of virtual sources and sinks. There are two modeled classes of generated
echoes: early reflections, which are discretely generated (delayed, with frequency-dependent amplitude), and late reflections comprising reverberation, which is more
diffuse and usually statistically described. The soundstage impression of the space
in which sound is perceived is related to presence, resonance, clarity and definition, envelopment, and immersion. Spaciousness, the perception of environmental
characteristics, such as liveness, size, and shape, is correlated with indirect sound.
Spatial texture is associated with perception of the interaction of sound with its environment, particularly the interval between arrival of direct sound and the first few (early)
reflections.
As explained in the following paragraphs, early reflections are the particular
echoes generated by each source, and the reverberation tail forms the ambience of
the listening environment. Direct sound and discrete early reflections are cascaded
with filters for ambience to yield spatial reverberation.
Early reflections: Representing specific echoes, discrete early reflections, off the
floor, walls, and ceiling, provide source position-dependent auditory images of
FIGURE 12.5 Poor person's mobile stereotelephony: a pair of inverted mobile phones,
deployed as a microphone array attached to a mannequin, simultaneously calling a dual voice
line, realizes wireless binaural telepresence.
with microelectromechanical systems (MEMS), gyroscopes, accelerometers, magnetometers (electronic compasses), and dead reckoning (path integration), combined with some kind of sensor fusion, to infer position. A receiver needs (geometric,
photometric, acoustic, etc.) calibration with the real world to align overlaid objects
and scenes. Issues include static (geometric) error and drift, rendering registration
error, and dynamic error (time lag and jitter), all somewhat mitigated by a forgiving
user or a nonliteral user interface, allowing plausible discrepancies within bounds of
suspended disbelief.
Whereware denotes position-aware applications, including LBS and AR applications. Whenceware (from whence, meaning from where) denotes location-aware
applications that reference an origin; whitherware (from whither, meaning to
where) denotes location-aware applications referencing a destination (Cohen and
Villegas 2011). Such functionality is especially relevant to interfaces with spatial
sound capability. For example, whenceware-enhanced voicemail systems could
directionalize playback so that each displayed message apparently comes from
its sender's location. A real-time streamed voice channel might be processed as
part of its display to express the speaker's position. Such applications of spatial
sound (Loomis et al. 1990, Holland et al. 2002, May 2004), consistent with
and reinforcing one's natural sense of direction, can improve situation awareness. In polyphonic soundscapes with multiple audio channels, spatial sound
can enhance the cocktail party effect, allowing listeners to hear out a particular channel from the cacophony, enhancing discriminability and speech intelligibility. Looser mappings are also possible: a virtual source location need not
correspond to the geographic location of a sender, but could be mapped into the individualized space of a sink. Important messages might come from a direction
in front of a recipient, while less critical voicemail comes from behind. Time-tagged notifications could be projected to clock-associated azimuths, so that, for
instance, a three o'clock appointment reminder could come from the right, or a
six o'clock message from behind.
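Such a clock-face mapping might be sketched as follows; this is a hypothetical helper, and the convention of measuring azimuth clockwise from straight ahead is an assumption for illustration:

```python
def clock_to_azimuth(hour):
    """Map a 12-hour clock position to an azimuth in degrees,
    clockwise from straight ahead (12 o'clock = 0 degrees)."""
    return (hour % 12) * 30.0  # 360 degrees / 12 hours

# A three o'clock reminder is directionalized to the listener's right;
# a six o'clock message comes from directly behind.
az_three = clock_to_azimuth(3)
az_six = clock_to_azimuth(6)
```

A whenceware system would then feed the resulting azimuth to whatever directionalization stage it uses (ITD/IID panning or HRTF filtering, as discussed earlier).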
Whence- and whitherware navigation systems, primed with hyperlocal geotags
of locations, can auditorily display sonic beacons: landmarks, warnings of situated hazards, and come-hithers beckoning travelers to goals or checkpoints.* Spatial
sound can also be used to enhance the driving experience. Through onboard devices such as
navigation systems, telematic services based on mobile or vehicular communications
infrastructure give drivers access to weather and traffic advisories, as well as information on nearby restaurants, fuel, and entertainment facilities. As illustrated by
Figure 12.6, localized audio sources can be aligned with real-world locations across
various frames-of-reference for increased situation awareness and safety.
* Legend suggests a couple of gruesome examples: The mythical Sirens sang so enchantingly that sailors
were lured to shipwreck, running aground on the bird-women's island. When Jason wanted to guide
the Argonauts past them, he had his crew stuff their ears with beeswax (passive attenuation) while
Orpheus played his lyre, drowning out (masking) the otherwise irresistible singing of the sea-nymphs.
In a later fabled era, the Pied Piper was hired by the town of Hamelin to clear a rat infestation. He
played a musical pipe to lure the rats into a river where they drowned. However, the town neglected
to pay the piper, so he played enticingly again to lead his deadbeat patrons' children out of town,
whereupon they disappeared.
FIGURE 12.6 Back-seat driver: localized beacons for vehicular situation awareness, including a compass; goals, junctions, milestones, and checkpoints; accidents, traffic jams, and blind-spot traffic from other vehicles; location-based services; mobile channels and land lines (home); and vehicle alerts such as a door ajar.
12.2.3 Distance Effects
In spatial auditory displays, the direction of virtual sources can be simply literal,
but to make a source audible it must be granted a practical intensity (Fouad 2004)
besides a figurative loudness. Extraordinary sounds like the explosion of Krakatoa
(a volcanic island in the Indonesian archipelago) in 1883 can be audible hundreds of
kilometers away, but ordinary sounds such as speech and music rarely exceed 100
dB SPL and cannot usually be heard much more than a kilometer away. Although
geometric computer graphic models are almost always finally represented in rectilinear 2D, Cartesian or rectangular (x, y), or 3D, Euclidean (x, y, z), coordinates, navigation data are typically originally GPS-derived geographic latitude-longitude-altitude
coordinates, and spatial sound is almost always modeled using spherical coordinates,
distinguishing direction (azimuth and elevation) and distance (radius or range).
These frames-of-reference are naturally coextensive. To reify the notion of a source
projecting audible sound to a sink, a spatial sound receiver represented by an avatar
in a virtual environment or the subject of an AAR projection, it is usual to display
the direction directly but to take some liberties by scaling the intensity. As in the
visual domain (McAllister 1993), the mechanism for judging distances is a combination of mono (monaural) and stereo (binaural) and stationary and dynamic cues.
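The conversion between these frames can be sketched as below; the convention of +y pointing forward with azimuth measured clockwise from ahead is an assumption for illustration:

```python
import math

def cartesian_to_spherical(x, y, z):
    """Convert rectilinear (x, y, z) coordinates to spatial-audio
    spherical coordinates: azimuth (degrees clockwise from the
    forward +y axis), elevation (degrees above horizontal), range."""
    r = math.sqrt(x*x + y*y + z*z)
    azimuth = math.degrees(math.atan2(x, y))
    elevation = math.degrees(math.asin(z / r)) if r else 0.0
    return azimuth, elevation, r

# A source 1 m to the listener's right at ear height:
# azimuth about 90 degrees, elevation 0, range 1.
az, el, rng = cartesian_to_spherical(1.0, 0.0, 0.0)
```

Projecting GPS-derived latitude-longitude-altitude into a local Cartesian frame first (e.g., an east-north-up tangent plane) would make the same conversion applicable to navigation data.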
Apparent distance of a sound source (Bronkhorst and Houtgast 1999, Moore and
King 1999) is determined mainly (Villegas and Cohen 2010) by
Overall sound level: Intensity of a point source varies as the reciprocal of
its squared distance (Mershon and King 1975). Simple models might use
this inverse-square free-field spherical intensity attenuation, equivalent to
linear gain scaling as a simple inverse and level falling off at 6 dB/range
doubling.
Interaural level differences: Closer sources present a larger ILD (Brungart
and Rabinowitz 1999), especially exaggerated at intimate, near-field whisper ranges.
Reverberation: Distance perception is affected by lateral reflections (Nielsen
1992).
Direct-to-reverberant energy ratio: In environments with reflecting surfaces,
sources far from a listener yield roughly the same reverberation level,
whereas direct sound level attenuates approximately according to the aforementioned 6 dB/distance doubling (Zahorik et al. 2005).
Head orientation: Range estimations are better when source direction is nearly
aligned with the interaural axis (Holt and Thurlow 1969).
Familiarity with environment and sound source: Distance estimation is worse
for unfamiliar sound sources (Coleman 1962).
Source dullness (sharpness): Distant sources are duller due to high-frequency
absorbent effects of air (Coleman 1968, Malham 2001); nature acts like a low-pass filter (LPF).
If a sound source or sink is moving, temporal intensity variation and Doppler shift
(pitch modulation) also contribute to distance estimation (Zakarauskas and Cynader
1991) (see Figure 12.7).
In virtual environments, the gain is usually clamped or railed within a certain near-field distance, and often disregarded beyond a certain range, where it is
assumed to be negligible. Also, distance effects are often exaggerated, as estimation
of range sharpens if level control is driven by models that roll off more rapidly than
the physical inverse 1/d amplitude law of spherical waves. Virtual sources can be
brought even closer to a listener's head by adjusting ILD to extreme values without
requiring the level to be increased as much as would normally occur at such close
range (Martens 2001).
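The clamping and exaggerated roll-off just described might be sketched as follows; the function name, parameters, and defaults are illustrative assumptions:

```python
import math

def distance_gain(d, full_volume_radius=1.0, cutoff=50.0, rolloff=1.0):
    """Range-dependent gain for a virtual source.

    Within full_volume_radius the gain is clamped (pegged at unity,
    i.e., 0 dB); beyond cutoff the source is disregarded as negligible;
    in between, gain falls off as 1/d**rolloff. rolloff=1 matches the
    physical spherical-wave 1/d amplitude law (-6 dB per distance
    doubling); rolloff > 1 exaggerates range for sharper estimation.
    """
    if d <= full_volume_radius:
        return 1.0  # clamped ("railed") in the near field
    if d > cutoff:
        return 0.0  # disregarded beyond maximum audible range
    return (full_volume_radius / d) ** rolloff

# The physical law: each doubling of distance drops the level ~6 dB.
db_drop = 20.0 * math.log10(distance_gain(2.0) / distance_gain(4.0))
```

This resembles the clamped inverse-distance attenuation models found in spatial sound engines such as OpenAL, with the roll-off exponent serving as the exaggeration knob mentioned above.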
12.2.4 Stereotelephony
Stereotelephony means putting multichannel audio into communication networks,
using stereo effects in telephones and spatial sound in groupware. As audio codecs
improve and networks quicken for internet telephony, high-fidelity, high-definition
real-time audio becomes asymptotically:
Broadband, with high sampling rate for full spectrum
Broadly dynamically ranged, carrying appropriate bit depth for expressive
intensity modulation
FIGURE 12.7 Virtual sound projection: The full volume area (dfva) is represented by the
dashed circle, within which the sound is heard at full volume, pegged at 0 dB, possibly with
near-field range manipulation. The extent (de) of the exposure refers to the nimbus in which a
focus-modulated sink can hear a nimbus-modulated source. (From Benford, S. et al., Presence:
Teleoperators and Virtual Environments, 4, 364-386, 1995; Greenhalgh, C. and Benford, S.,
ACM Transactions on Computer-Human Interaction, 2(3), 239-261, 1995.) (Such limits are
a little like clipping planes in graphics renderers that cull beyond the sides of viewing frusta.)
TABLE 12.3
Saturated: Distributed and Pervasive, Continuous and Networked,
Transparent or Invisible (Spatial Hierarchy of Ubicomp or Ambient
Intimacy)
Smart spaces
Cooperative buildings and smart homes
Roomware (software for rooms), reactive rooms, and media spaces
Spatially immersive displays
Information furniture
Networked appliances
Handheld, mobile, nomadic, portable, and wireless devices
Wearable computers
Computational clothing (smart clothes)
12.3.1 Capabilities
12.3.1.1 Mobile and Wearable Auditory Interfaces
The dream motivating wireless technology is anytime, anywhere communications.
Mobile communication offers unique requirements and prospects because of interesting form factors (weight, size, interface), susceptibility to noise (less robust network),
restricted bandwidth, and social potential (universality). Wireless computing has gone
beyond laptops, smartphones, and tablets (as well as smartbooks, netbooks, palmtops, ultrabooks, notebooks, clamshells, slates, tables, touchpad/handheld computers,
PDAs, and handheld gaming devices) to include wearable and intimate systems and
smart clothing. Besides the personal sound display systems illustrated by Figure 12.8,
an ultimate consequence of wearware could be talking clothing, like that imagined
by Figure 12.9. AAR mobile browsers are naturally extended by voice interfaces to
allow audio dialogs and mixed-initiative conversations: recognition of multitouch gestures, spoken input (via ASR: automatic speech recognition), and synthesized speech
(via TTS: text-to-speech), whose modern algorithms have outgrown the drunken,
Scandinavian-robot accents (so to speak) of the recent past.
12.3.1.2 Form Factors
Personal audio interfaces, including intimate, wearable, handheld, mobile (like a
smartphone), nomadic (like a tablet), and portable (like a laptop), represent one end
of a spectrum, the other end of which is marked by social displays. These endpoints
delimit a continuum of useful interfaces, as outlined by Table 12.4. Auditory display
form factors can be ordered according to degree of intimacy along this private-public
dimension, corresponding to the vertical dimension in Figure 12.10.
Stereo earphones, headphones, and headsets arrange eartop transducers straddling one's head at the ears: in (as with earbuds), on (as with supraaural
headphones), or over (as with circumaural headphones). Design, fashion,
personality, style, and uniqueness are important characteristics of such
wearable devices, perhaps especially for younger users. Headphone-like displays, although somewhat cumbersome, allow greater freedom of movement
while maintaining individually controlled audio display, including near-field effects such as whispering. Such intimate sounds in one's sacred space
can evoke the so-called ASMR (autonomous sensory meridian response), a
euphoric tingling sensation sometimes compared to that from binaural beats.
Mobile terminals such as smartphones are portable and repositionable, but require extension to deliver stereo sound. Headphones which block external sounds
are especially well suited for supporting active noise cancellation (ANC),
which adds a polarity-inverted image of ambient noise to displayed signals.*
Open-back headphones are more transparent to ambient sounds; they obviate pseudoacoustic features for which binaural microphone-captured signals
* Polarity inversion can also be used as a crude way to broaden stereo imagery. Interaural cross-correlation (IACC) is related to the diffuseness or solidness of an image and affects the perceived spatial
extent of a source, the auditory source width (ASW). By inverting one side of a stereo pair, IACC is
bluntly subdued, and the resultant soundscape is widened.
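The footnote's polarity-inversion trick can be illustrated by computing zero-lag interaural cross-correlation before and after flipping one channel; the samples below are toy values for illustration only:

```python
def correlation(a, b):
    """Zero-lag normalized cross-correlation of two channels."""
    num = sum(x * y for x, y in zip(a, b))
    den = (sum(x * x for x in a) * sum(y * y for y in b)) ** 0.5
    return num / den

left = [0.9, -0.3, 0.6, -0.1]
right = [0.8, -0.2, 0.5, -0.2]   # similar content: high IACC
inverted = [-y for y in right]   # polarity-inverted right channel

# Inverting one side flips the sign of the zero-lag correlation,
# bluntly subduing IACC and widening the apparent source image.
iacc_before = correlation(left, right)
iacc_after = correlation(left, inverted)
```

(True IACC as used in room acoustics is the maximum over a small lag window; the zero-lag version here is enough to show the sign flip.)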
FIGURE 12.8 Personal sound displays: monophonic (one speaker); monotic or monaural
(one ear); diotic (one signal to both ears); stereophonic (two speakers); dichotic-biphonic
(separate channels); dichotic-binaural (mannequin microphones); loudspeaker binaural
(crosstalk cancellation); nearphones or earspeakers (stereo speakers close to ears); and
pseudophonic (cross-connected dichotic). By preconditioning stereo signals, speaker crosstalk can be reduced. A special implementation of this technique is called transaural. (From
Bauck, J.L. and Cooper, D.H., Journal of the Audio Engineering Society, 44(9), 683-705,
1996.) Such measures can be obviated by placing the speakers near the ears, nearphones for
unencumbered binaural sound. Pseudophonic arrangements allow a striking demonstration
of the suggestiveness of head-turning directionalization, as front-back and even up-down
disambiguation is flipped, even if the subject can see the source. (Extended from Streicher,
R. and Everest, F.A., The New Stereo Soundbook, 3rd edn., Audio Engineering Associates,
Pasadena, CA; Marui, A. and Martens, W.L., Spatial character and quality assessment of
selected stereophonic image enhancements for headphone playback of popular music, in
AES: Audio Engineering Society Convention (120th Convention), Paris, France, 2006.)
TABLE 12.4
Audio and Visual Displays along Private-Public Continua
(Columns: proxemic context; architecture; audio display; visual display)
Intimate (personal, private), individual: chair; nearphones, earspeakers
Interpersonal: couch or bench
Multipersonal, familiar
Social, public: large-screen display (e.g., IMAX)
FIGURE 12.10 Augmented reality location diffusion taxonomy: augmented reality refers
to extension of users' real environments by synthetic (virtual) content. The Synthesis axis
(reality, augmented reality, augmented virtuality, virtuality) is an original mixed reality (MR)
reality-virtuality (RV) continuum. (From Milgram, P. and Kishino, F., IEICE Transactions on
Information and Systems, E77-D(12), 1321-1329, 1994.) Location (stationary, location-based,
mobile, omnipresent) refers to how and where such AR systems might actually be used;
Diffusion (single use, multiuser, massively multiuser, potentially everyone) refers to degree
of concurrent usage. (Adapted and extended from Broll, W. et al., IEEE Computer Graphics
and Applications, 28(4), 40-48, 2008.)
FIGURE 12.11 Whisper wearable terminal: By sticking a finger in one's ear, a user can hear
sound conducted through finger bones. When fingers on opposite hands, driven by respective sides
of a stereophonic signal, are inserted into both ears, a kind of binaural sound can be displayed.
The channels are not so sharply separated compared with ordinary stereophonic sound, but the
display has a unique surround feel provided by the combination of aerial and bone conduction.
(From Fukumoto, M., A finger-ring shaped wearable HANDset based on bone-conduction, in
ISWC: Proceedings of the International Symposium on Wearable Computing, pp. 10-13, 2005.)
FIGURE 12.12 Head tracking: Active listening, including natural audition, uses intentional and unintentional head turning to disambiguate front-back confusion.
12.3.2 Special Displays
12.3.2.1 Binaural Hearing Aids
About 5% of the world's population suffers from significant hearing loss, and such
affliction is expected to worsen in the future. Modern hearing aids are a kind of wearable computer: they capture and digitize signals, perform frequency filtering across
separate bands, suppress noise, dynamically amplify conditioned signals, reconstruct
full-band signals from separated bands, and resynthesize amplified analog signals
which are finally transduced back into sound. Since microphones and speakers are
collocated in such devices, intense signals can cause howling, which limits amplification range, the so-called GBF (gain before feedback). Binaural hearing aids can use
the cross-channels to address that problem, including detection stages in the processing chain so that when oscillation is on one side but not in the contralateral device at
the same frequency, it can be identified as feedback and suppressed (Hamacher et al.
2005). Contemporary models feature modes that allow selection from among several
programs according to the circumstantial environment (conversation in noise, phone
call, television, etc.). Consumer products such as GN ReSound LiNX and Starkey Halo
use Bluetooth-connected smartphone interfaces to make directional and spectral
hearing aid adjustments, via a kind of body area network (BAN).
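The contralateral feedback check described above might be sketched with a single-bin Goertzel detector; this is an illustrative model, not an actual hearing-aid algorithm, and the names and the 10:1 power ratio are assumptions:

```python
import math

def goertzel_power(samples, freq, rate):
    """Power of one frequency bin, via the Goertzel algorithm."""
    w = 2.0 * math.pi * freq / rate
    coeff = 2.0 * math.cos(w)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev2**2 + s_prev**2 - coeff * s_prev * s_prev2

def is_feedback(left, right, freq, rate, ratio=10.0):
    """Flag howling: a narrowband tone strong in one hearing aid
    but absent in the contralateral device at the same frequency."""
    pl = goertzel_power(left, freq, rate)
    pr = goertzel_power(right, freq, rate)
    return pl > ratio * pr or pr > ratio * pl

rate = 16000
tone = [math.sin(2*math.pi*3000*n/rate) for n in range(256)]
quiet = [0.001*math.sin(2*math.pi*440*n/rate) for n in range(256)]
oscillating = is_feedback(tone, quiet, 3000, rate)  # one-sided tone
```

The same tone in both ears would instead suggest a genuine environmental sound, so the detector stays quiet, which is the cross-channel logic the text attributes to binaural devices.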
12.3.2.2 Parametric Ultrasonics
Ultrasound has wavelengths shorter than audible sound and can be aimed in beams
tighter than those formed by normal loudspeakers. Such displays create audible signals
through propagation distortion, nonlinear effects on air of ultrasonic signals (nominally above around 20 kHz, around 40-60 kHz in current practice). They have been
researched for decades as parametric acoustic arrays, but only in the last decade have
practical systems become commercially available (Ciglar 2010), including Holosonic
Research Labs' Audio Spotlight and American Technology's HyperSonic Sound
System. They work by modulating an audio source onto an ultrasonic carrier, which
is then amplified and projected into the air by ultrasonic transducers. The product of
two ultrasonic signals, a reference carrier and the combination of the carrier and a
variable audio source, decomposes into bands through dispersion in the air:
cos(ω_c t) sin((ω_c + ω_s)t) = ½ sin((2ω_c + ω_s)t) + ½ sin(ω_s t).
As in a Theremin, the first (sum) intermodulation term is inaudible, but the second
(difference) is not. Audible sound is generated in the air, not at the speakers. The
highly directionalized sound beams are steerable through their focus and controllable spreading (reportedly as low as 3°), and can be bounced off surfaces. If technical issues regarding such systems' lower-frequency response, below 200-400 Hz
(as there is an inherent 12 dB/octave high-pass slope, a consequence of the way that
ultrasound demodulates into the audible range), and concerns about health hazards
(as the inaudible sounds can be very intense, in the range of 140-150 dB SPL) can be
allayed, ultrasonic-based audio displays could be as flexible as analogous light-based
visual displays, allowing personal sound in otherwise quiet spaces such as libraries.
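The product-to-sum decomposition above can be checked numerically; the carrier and source frequencies below are assumed for illustration:

```python
import math

fc = 40000.0  # ultrasonic carrier, Hz (assumed)
fs = 1000.0   # audible source, Hz (assumed)

def product(t):
    """Product of the carrier and the carrier-plus-source signal."""
    return math.cos(2*math.pi*fc*t) * math.sin(2*math.pi*(fc + fs)*t)

def sum_of_terms(t):
    """Half the inaudible sum term plus half the audible difference."""
    return (0.5*math.sin(2*math.pi*(2*fc + fs)*t)
            + 0.5*math.sin(2*math.pi*fs*t))

# The identity holds at every instant; nonlinear propagation then
# leaves only the audible sin(2*pi*fs*t) difference component.
max_err = max(abs(product(n/96000) - sum_of_terms(n/96000))
              for n in range(1000))
```

The sum term at 2fc + fs (81 kHz here) lies far above hearing, which is why only the demodulated difference tone at fs survives perceptually.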
12.3.2.3 Distributed Spatial Sound
Another exciting field of application of audio technologies is distributed spatial
sound. Smartphones and other audio terminals could be used not only as remote
controls, but also as distributed displays, ad hoc or spontaneous loudspeaker arrays
embedded among collocated users and helping to overcome the limited power of
individual devices. Similarly, mobile spatial audio could be used more extensively
for artistic purposes. For example, SoundDelta (Mariette et al. 2010) was a multiuser
AAR environment that used ambisonic zones in a way that resembled cells in a
mobile telephony network, rendering spatial audio via headphones. Participants
heard an interactive spatial soundscape while walking around a public area such as
a town square or park.
12.3.2.4 Information Furniture
Internet appliances can be outfitted with spatial sound capability. Besides home
theater, for instance, multimedia baths can be configured for extravagant surround
sound display. Massage chairs (such as those made by Inada) can synchronize shiatsu
with music. A swivel chair can also be deployed as an instance of multimodal information furniture. As an audio output modality, nearphones straddling a headrest
present unencumbered binaural sound with soundscape stabilization for auditory
image localization, directionalizing audio using dynamically selected transfer functions determined by chair rotation (Cohen et al. 2007). As a haptic output modality,
servomotors can twist motorized chairs under networked control, distributing torque
across the internet to direct the attention of seated subjects, orienting users (like
a dark ride amusement park attraction), or nudging them in a particular direction
(Cohen 2003) (see Figure 12.13).
FIGURE 12.13 Information furniture: a pivot (swivel) chair with servomotor deployed as a
rotary motion platform and I/O device, shown along with its digital analog. The input modality
is orientation tracking, which dynamically selects transfer functions used to spatialize audio
in a stable (rotation-invariant) soundscape. Nearphones straddling the headrest provide unencumbered binaural display. Commercially available Pioneer BodySonic configurations
embed speakers in the headrest and seat of lounge chairs and sofas, as well as dance floors, to
display visceral vibration that can augment audio-channel information. (a) Rotary motion platform (developed with Mechtec, www.mechtec.co.jp); (b) mixed reality simulation compositing panoramic imagery into dynamic CG: a simulated simulator. (Model by Daisuke Kaneko.)
REFERENCES
Alam, S., M. Cohen, J. Villegas, and A. Ashir (2009). Narrowcasting in SIP: Articulated privacy control. In S. A. Ahson and M. Ilyas (eds.), SIP Handbook: Services, Technologies, and Security of Session Initiation Protocol, Chapter 14, pp. 323–345. CRC Press/Taylor & Francis. www.crcpress.com/product/isbn/9781420066036.
Ballou, G. M. (1991). Handbook for Sound Engineers (3rd edn.). Focal Press.
Barfield, W. and T. A. Furness III (1995). Virtual Environments and Advanced Interface Design. Oxford University Press.
Bauck, J. L. and D. H. Cooper (1996, September). Generalized transaural stereo and applications. J. Aud. Eng. Soc. 44(9), 683–705. http://www.aes.org/e-lib/browse.cfm?elib=7888.
Begault, D. R. (1994). 3-D Sound for Virtual Reality and Multimedia. Academic Press.
Begault, D. R. (ed.) (2004). Spatial Sound Techniques, Part 1: Virtual and Binaural Audio Technologies. Audio Engineering Society.
Benford, S., J. Bowers, L. Fahlén, C. Greenhalgh, J. Mariani, and T. Rodden (1995). Networked virtual reality and cooperative work. Presence: Teleoperators and Virtual Environments 4(4), 364–386.
Blattner, M. M., D. A. Sumikawa, and R. M. Greenberg (1989). Earcons and icons: Their structure and common design principles. Human–Computer Interaction 4(1), 11–44.
Blauert, J. (1997). Spatial Hearing: The Psychophysics of Human Sound Localization (revised edn.). MIT Press.
Bronkhorst, A. W. and T. Houtgast (1999, February 11). Auditory distance perception in rooms. Nature 397(6719), 517–520. DOI: 10.1038/17374.
Broll, W., I. Lindt, I. Herbst, J. Ohlenburg, A.-K. Braun, and R. Wetzel (2008, July/August). Toward next-gen mobile AR games. IEEE Computer Graphics and Applications 28(4), 40–48.
Brungart, D. S. and W. M. Rabinowitz (1999). Auditory localization of nearby sources. Head-related transfer functions. The Journal of the Acoustical Society of America 106(3), 1465–1479. http://link.aip.org/link/?JAS/106/1465/1, DOI: 10.1121/1.427180.
Carlile, S. (1996). Virtual Auditory Space: Generation and Applications. Springer.
Ciglar, M. (2010, June). An ultrasound based instrument generating audible and tactile sound. In Proc. NIME: New Instruments for Music Expression, Sydney, New South Wales, Australia.
Hartmann, W. M. (1999, November). How we localize sound. Physics Today 52(11), 24–29. www.aip.org/pt/nov99/locsound.html.
Heller, E. J. (2012). Why You Hear What You Hear. Princeton University Press. www.whyyouhearwhatyouhear.com, http://press.princeton.edu/titles/9912.html.
Holland, S., D. R. Morse, and H. Gedenryd (2002, September). AudioGPS: Spatial audio navigation with a minimal attention interface. PUC: Personal and Ubiquitous Computing 6(4), 253–259.
Holman, T. (2008). Surround Sound: Up and Running (2nd edn.). Oxford, U.K.: Elsevier/Focal Press.
Holt, R. E. and W. R. Thurlow (1969, December). Subject orientation and judgment of distance of a sound source. The Journal of the Acoustical Society of America 46(6), 1584–1585.
Jot, J.-M. (1999). Real-time spatial processing of sounds for music, multimedia and interactive human–computer interfaces. Multimedia Systems 7(1), 55–69.
Kapralos, B., M. R. Jenkin, and E. Milios (2008, December). Virtual audio systems. Presence: Teleoperators and Virtual Environments 17(6), 527–549. www.mitpressjournals.org/toc/pres/17/6.
Kendall, G. S., W. L. Martens, D. J. Freed, M. D. Ludwig, and R. W. Karstens (1986). Image model reverberation from recirculating delays. In AES: Audio Engineering Society Convention, New York.
Kim, S., M. Ikeda, and W. L. Martens (2014). Reproducing virtually elevated sound via a conventional home-theater audio system. Journal of the Audio Engineering Society 62(5), 337–344.
Kim, Y.-H. and J.-W. Choi (2013). Sound Visualization and Manipulation. Wiley. http://as.wiley.com/WileyCDA/WileyTitle/productCd-1118368479.html.
Kleiner, M. (2011). Acoustics and Audio Technology (3rd edn.). John Ross Publishing.
Loomis, J. M., C. Hebert, and J. G. Cicinelli (1990, October). Active localization of virtual sounds. The Journal of the Acoustical Society of America 88(4), 1757–1763.
Loy, G. (1985). About AUDIUM: A conversation with Stanley Shaff. Computer Music Journal 9(2), 41–48.
Malham, D. G. (2001, Winter). Toward reality equivalence in spatial sound diffusion. Computer Music Journal 25(4), 31–38. DOI: 10.1162/01489260152815279.
Mariette, N., B. F. G. Katz, K. Boussetta, and O. Guillerminet (2010, May). SoundDelta: A study of audio augmented reality using WiFi-distributed ambisonic cell rendering. In Proceedings of the 128th Audio Engineering Society Convention, London.
Martens, W. (2003, December). Perceptual evaluation of filters controlling source direction: Customized and generalized HRTFs for binaural synthesis. Acoustical Science and Technology 24(5), 220–232. http://dx.doi.org/10.1250/ast.24.220.
Martens, W. L. (2001, December). Psychophysical calibration for controlling the range of a virtual sound source: Multidimensional complexity in spatial auditory display. In Proceedings of ICAD: International Conference on Auditory Display, pp. 197–207. http://legacy.spa.aalto.fi/icad2001/proceedings/papers/martens.pdf.
Martin, A., C. Jin, and A. van Schaik (2009). Psychoacoustic evaluation of systems for delivering spatialized augmented-reality audio. Journal of the Audio Engineering Society 57(12), 1016–1027.
Marui, A. and W. L. Martens (2006, May). Spatial character and quality assessment of selected stereophonic image enhancements for headphone playback of popular music. In AES: Audio Engineering Society Convention (120th Convention), Paris.
May, M. (2004). Wayfinding, ships and augmented reality. In P. Andersen and L. Qvortrup (eds.), Virtual Applications: Applications with Virtual Inhabited 3D Worlds, Chapter 10, pp. 212–233. London: Springer.
McAllister, D. F. (1993). Stereo Computer Graphics and Other True 3D Technologies. Princeton University Press.
McGookin, D. K., S. A. Brewster, and P. Priego (2009). Audio bubbles: Employing non-speech audio to support tourist wayfinding. In HAID: Proceedings of the International Conference on Haptic and Audio Interaction Design, Dresden, Germany, pp. 41–50. Springer-Verlag. DOI: 10.1007/978-3-642-04076-4_5.
Melchior, F., J. Ahrens, and S. Spors (2010, November). Spatial audio reproduction: From theory to production, Part I. In AES: Audio Engineering Society Convention (129th Convention), San Francisco, California.
Mershon, D. H. and L. E. King (1975). Intensity and reverberation as factors in the auditory perception of egocentric distance. Perception & Psychophysics 18(6), 409–415.
Milgram, P. and F. Kishino (1994, December). A taxonomy of mixed reality visual displays. IEICE Transactions on Information and Systems E77-D(12), 1321–1329.
Moore, D. R. and A. J. King (1999). Auditory perception: The near and far of sound localization. Current Biology 9(10), R361–R363.
Mynatt, E. D., M. Back, R. Want, M. Baer, and J. B. Ellis (1998, April). Audio aura: Light-weight audio augmented reality. In Proceedings of CHI: Conference on Computer–Human Interaction, Los Angeles, California, pp. 566–573. ACM Press/Addison-Wesley. DOI: 10.1145/274644.274720, http://dx.doi.org/10.1145/274644.274720.
Nielsen, S. H. (1992, March). Auditory distance perception in different rooms. In Audio Engineering Society (92nd Convention). www.aes.org/e-lib/browse.cfm?elib=6826.
Pulkki, V. (1997). Virtual source positioning using vector base amplitude panning. Journal of the Audio Engineering Society 45(6), 456–466.
Pulkki, V. (2004, June). Spatialization with multiple loudspeakers. See Greenebaum and Barzel (2004), pp. 159–172.
Pulkki, V., M.-V. Laitinen, J. Vilkamo, J. Ahonen, T. Lokki, and T. Pihlajamäki (2009, November). Directional audio coding: Perception-based reproduction of spatial sound. In Y. Suzuki, D. Brungart, H. Kato, K. Iida, and D. Cabrera (eds.), IWPASH: Proceedings of the International Workshop on the Principles and Applications of Spatial Hearing, Zao, Japan. eproceedings.worldscinet.com/9789814299312/9789814299312_0056.html.
Pulkki, V., T. Lokki, and D. Rocchesso (2011). Spatial effects. In U. Zölzer (ed.), DAFX: Digital Audio Effects (2nd edn.), Chapter 5, pp. 139–184. Wiley.
Rabenstein, R. and S. Spors (2008). Sound field reproduction. In J. Benesty, M. M. Sondhi, and Y. Huang (eds.), Handbook of Speech Processing, Chapter 53, pp. 1095–1114. Springer.
Rozier, J., K. Karahalios, and J. Donath (2004, April). Hear and there: An augmented reality system of linked audio. In Proceedings of ICAD: International Conference on Auditory Display.
Rumsey, F. (2001). Spatial Audio. Focal Press.
Rumsey, F. (ed.) (2006). Spatial Sound Techniques, Part 2: Multichannel Audio Technologies. Audio Engineering Society.
Seo, B., M. M. Htoon, R. Zimmermann, and C.-D. Wang (2010, November). Spatializer: A web-based position audio toolkit. In Proceedings of ACE: International Conference on Advances in Computer Entertainment Technology, Taipei, Republic of China. ace2010.ntpu.edu.tw.
Shaff, S. (2002). AUDIUM: Sound-sculptured space. Leonardo 35(3), 248. www.mitpressjournals.org/toc/leon/35/3.
Shilling, R. D. and B. Shinn-Cunningham (2002). Virtual auditory displays. In Handbook of Virtual Environments: Design, Implementation, and Applications, Human Factors and Ergonomics, pp. 65–92. Mahwah, New Jersey: Lawrence Erlbaum Associates.
Smalley, D. (1986). Spectro-morphology and structuring processes. In S. Emmerson (ed.), The Language of Electroacoustic Music. Cambridge, Massachusetts: Macmillan-Palgrave.
Smalley, D. (1997, August). Spectromorphology: Explaining sound-shapes. Organised Sound 2(2), 107–126. http://dx.doi.org/10.1017/S1355771897009059.
13
Applications of Audio
Augmented Reality
Wearware, Everyware,
Anyware, and Awareware
Michael Cohen and Julián Villegas
CONTENTS
13.1 Introduction and Overview............................................................................309
13.2 Applications................................................................................................... 310
13.2.1 Navigation and Location-Awareness Systems................................... 311
13.2.2 Assistive Technology for Visually Impaired..................................... 312
13.2.3 Synesthetic Telepresence................................................................... 313
13.2.4 Security and Scene Analysis............................................................. 313
13.2.5 Motion Coaching via Sonification..................................................... 314
13.2.6 Situated Games.................................................................................. 314
13.2.7 Entertainment and Spatial Music...................................................... 314
13.3 Anyware and Awareware............................................................................... 315
13.3.1 Audio Windowing.............................................................................. 315
13.3.2 Narrowcasting.................................................................................... 315
13.3.3 Multipresence.................................................................................... 317
13.3.4 Layered Soundscapes......................................................................... 319
13.4 Challenges..................................................................................................... 320
13.4.1 Capture and Synthesis....................................................................... 321
13.4.2 Performance....................................................................................... 321
13.4.3 Authoring Standards.......................................................................... 322
13.5 Concluding Remarks..................................................................................... 323
References............................................................................................................... 324
FIGURE 13.1 Schematic summary of binaural effects: ITD (interaural time delay or
difference), IID (interaural intensity difference), and head shadow (frequency-dependent
binaural attenuation). Ipsilateral signals at the ear closer to a source are stronger and earlier;
contralateral signals at the ear away from a source are weaker and later.
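The ITD summarized in Figure 13.1 can be approximated analytically. Below is a minimal Python sketch using Woodworth's classic spherical-head formula, assuming a nominal head radius; real heads vary, which is one reason individualized HRTFs remain a challenge later in this chapter.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 degrees C
HEAD_RADIUS = 0.0875    # m; a commonly used average, individual heads differ

def woodworth_itd(azimuth_deg: float) -> float:
    """Approximate interaural time difference (seconds) for a distant source
    at the given azimuth (0 = straight ahead, 90 = hard right), using
    Woodworth's spherical-head formula: ITD = (a/c) * (theta + sin(theta))."""
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS / SPEED_OF_SOUND) * (theta + math.sin(theta))
```

A source hard to one side (90°) yields the maximum ITD, on the order of 0.65 ms, while a source dead ahead yields none, consistent with the ipsilateral/contralateral asymmetry described in the caption.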
realize AAR. Utility, professional, and leisure application areas are surveyed, including
multimodal augmented reality (AR) interfaces featuring spatial sound. Consideration
of (individual) wearware and (ubicomp) everyware is continued from the previous
chapter, in the context of mobile ambient transmedial interfaces that integrate personal
and public resources. Two more ware terms are introduced: anyware here refers to
multipresence audio windowing interfaces that use narrowcasting to selectively enable
composited sources and soundscape layers, and awareware automatically adjusts such
narrowcasting, maintaining a model of user receptiveness in order to modulate and
distribute privacy and attention across overlaid soundscapes.
13.2 APPLICATIONS
Users of visual AR systems can sometimes be subject to bewildering displays of
information, making it difficult to identify and avoid hazards. Historically, adoption
of novel visual technologies precedes that of audio technologies. Current mobile
technology already offers display resolution finer than human visual acuity at normal viewing distances, and 3D visual displays are deployed on mobile platforms.
Considering such trends, we can expect a great increase in the number of applications
using wearable spatial audio technologies. Miniaturization of components will allow
creation of devices small enough to be worn over the ear and controlled by facial
or tongue movements. Such devices will naturally include spatialization options for
audio reproduction. Researchers are exploring alternative ways to present information via complementary sensory modalities, especially audition. In this section we
survey a broad variety of applications of AAR, not only those that explicitly attempt
to desaturate the visual channel. Systems like those outlined in the last chapter are
being deployed in utility, professional, and recreational domains, considered here in
that order.
FIGURE 13.2 Localized beacons and directionalization for vehicular way-finding and
way-showing.
FIGURE 13.3 System architecture of GABRIEL: vehicular GPS information updates the soundscape. (Diagram components: GPS receiver, driver, RS-232/USB link, broadcaster, OSC messaging, audio server, wireless link, bone-conduction display, landmarks as virtual sources, tourists.)
Figure 13.3. These ideas can be applied not only to smart or connected vehicles but also to mobile pedestrian interfaces featuring visual navigation enhanced with spatial sound displayed via headphones (Sanuki et al. 2014). The goal is to provide users of wearable computers with complementary cues for way-finding (or way-showing) and situation awareness, including both static contents and dynamic streams.
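At its core, such way-showing reduces to computing the bearing of a landmark relative to the listener's heading, which the audio server can then directionalize as a beacon. The sketch below is illustrative only (function and argument names are not GABRIEL's actual API) and uses a flat-earth approximation, adequate for nearby landmarks.

```python
import math

def relative_bearing(lat, lon, heading_deg, lm_lat, lm_lon):
    """Bearing of a landmark relative to the listener's heading, in degrees
    in (-180, 180]: 0 = dead ahead, positive = to the right.
    Flat-earth (equirectangular) approximation for nearby landmarks."""
    dx = math.radians(lm_lon - lon) * math.cos(math.radians(lat))  # eastward
    dy = math.radians(lm_lat - lat)                                # northward
    absolute = math.degrees(math.atan2(dx, dy))  # compass bearing from north
    return (absolute - heading_deg + 180.0) % 360.0 - 180.0
```

A landmark due north of a north-facing listener is rendered dead ahead; if the listener turns to face east, the same landmark swings 90° to the left, exactly the rotation-invariant stabilization that orientation tracking provides.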
Other researchers have concentrated on easing interaction between users of wearable computers and the devices themselves. Motivated by the laborious process of navigating through series of menus on small screens, menu items can be presented via spatiotemporally multiplexed speech enhanced with spatial audio (Ikei et al. 2006), achieving almost perfect recognition when four items are displayed at 60° intervals in front of the user.
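The layout used in that experiment is easy to reproduce: n items spread symmetrically across the frontal field at a fixed angular interval. A tiny illustrative helper (the function name is ours, not from the cited system):

```python
def menu_azimuths(n_items: int, interval_deg: float = 60.0):
    """Azimuths in degrees (0 = straight ahead, negative = left) for n menu
    items spread symmetrically about the median plane at a fixed interval."""
    span = interval_deg * (n_items - 1)
    return [-span / 2 + i * interval_deg for i in range(n_items)]
```

Four items at 60° intervals land at ±30° and ±90°, spanning the listener's frontal hemifield.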
or guide dog approaches. Indeed, many blind users of such systems regard the resultant experience as more akin to sensory amputation than to sensory augmentation.
13.2.3 Synesthetic Telepresence
Mixed reality audio interfaces are not limited to first-person experiences: second-person teleoperated vehicles can exploit AAR displays. For instance, a remotely controlled telerobot (Tachi 2009) or UAV (unmanned aerial vehicle, or drone) exploring
a nuclear power plant might be equipped with sensors such as directional dosimeters
(Geiger counters) as well as binocular cameras and binaural microphones, sharing
its egocentric perspective for telepresent experience. Overlaid on the drone's naturally captured soundscape, an auditory rendering of radioactive hot spots could be delivered to human pilots, directionalized audio sources aligned with the actual
hazardous environment. Such multimodal displays support what could be called
synesthetic telepresence, since a one-to-one, unimodal mapping of sensor data to
displayed media is not compulsory: mediation of experience can substitute or include
cross-modal stimulation. The most appropriate telepresence mapping might rely on
crossing modal boundaries, so that, for instance, important data might be sonified
as well as visualized.
extract acoustic features such as spectral content, interaural time delays or differences (ITDs) and interaural intensity differences (IIDs), and pitch periodicity, which are used to identify and locate acoustic sources in the environment, as many mammals do (Handel 1989, Deligeorges et al. 2009). BSS is the parsing of admixtures,
that is, the analysis of a soundscape. Beyond security, such a system could be used
for characterizing animal behavior, acoustic data logging, and underwater acoustic
monitoring, to mention a few applications.
13.2.6 Situated Games
Location-aware games can use spatial audio to increase engagement of players (Paterson et al. 2011). Situated or location-based games (Gaye et al. 2003, Magerkurth et al. 2005) are a fertile domain for AAR (Cater et al. 2007), including sound gardens and cross-modal applications such as those mentioned earlier. decibel 151 (Stewart et al. 2008, Magas et al. 2009) was an art installation and music interface that used spatial audio technology and ideas of social networking to turn individuals into walking soundtracks as they moved around each other in a shared real space and listened to each other in a shared virtual space. Platforms to build such interactions are being developed, such as MARA (Mobile Augmented Reality Audio), a system that tracks user position to allow playback and recording of binaural sounds (Peltola et al. 2009).
Mist (King 2007) are enlivened by binaural effects. Interactive music browsing can
leverage cyberspatial capability, and hypermedia-encoded musical experiences, such
as that suggested by the IEEE 1599 Music Encoding and Interaction standard (Baggi
and Haus 2009), represent inviting opportunities for spatial sound, including AAR.
Audio mixing techniques can be deployed in user interfaces for acoustic diffusion
(Gibson 2005). Virtual concerts can be optionally presented with perspective, so that
listening position coincides with virtual camera position. Head-tracking systems can
anchor such soundscapes so that they remain fixed in a user's environment, such as relative to a television (Algazi et al. 2005, Algazi and Duda 2011).
13.3.1 Audio Windowing
Audio windowing, in analogy to graphical windowing user interfaces, treats soundscapes as articulated elements in a composite display (Begault 1994). In graphical user interfaces (GUIs), application windows can be rearranged on a desktop,
minimized, maximized, and reordered. Audio windowing similarly allows such
configuration of spatial soundscapes. Soundscapes, analogous to layers in graphical applications, can be combined simply by summing, although in practice some
scaling (amplification and attenuation), normalization, equalization, or other conditioning might yield more articulate results. For instance, interior soundscapes might
have reverberation applied, to better distinguish them from outdoor scenes. More
significantly, to make a composited soundscape manageable, some sources might be
muzzled or muted and some sinks might be narrowcastingly muffled or deafened.
(As defined in the last chapter, a sink is the dual of a source, used instead of listener
to distinguish it from an actual human, including allowing designation of multiple
sinks for a single user, as explained later in this chapter.)
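The compositing just described, summation with per-layer scaling and conditioning, can be sketched in a few lines. This is a pure-Python illustration with sample lists standing in for audio buffers; a real audio windowing system would operate on streams or arrays, and might equalize or reverberate layers as well.

```python
def composite_soundscapes(layers, gains=None, limit=1.0):
    """Mix equal-length soundscape layers by scaled summation, then
    normalize only if the sum would clip beyond the given limit."""
    if gains is None:
        gains = [1.0] * len(layers)
    n = len(layers[0])
    mix = [sum(g * layer[i] for g, layer in zip(gains, layers))
           for i in range(n)]
    peak = max(abs(s) for s in mix)
    if peak > limit:                      # attenuate only when clipping
        mix = [s * (limit / peak) for s in mix]
    return mix
```

Setting a layer's gain to zero is the simplest form of muting it out of the composite; the graduated gains correspond to the muzzling and muffling described below.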
13.3.2 Narrowcasting
As mentioned in the last chapter, a panoramic potentiometer can control the balance of a channel in a conventional (left–right stereo) mix. By using an audio windowing
system as a mixing board, a multidimensional pan pot, users and applications can set
parameters corresponding to source and sink positions to realize a distributed sound
diffuser. Narrowcasting (Cohen and Ludwig 1991b, Cohen 2000, Fernando et al. 2009), by way of analogy with broad-, uni-, any-, and multicasting, is an idiom
for limiting media streams, formalized by the expressions shown in Figure 13.5, to
FIGURE 13.4 Surroundsound: soundscape overload. (Copyright 2015, The New Yorker
Collection from cartoonbank.com. All rights reserved.)
distribute, ration, and control privacy, attention, and presence. Anyware models are
separate but combinable scenes, allowing a user to have selective attendance across
multiple spaces. Advanced floor control for chat spaces and conferences is outlined
in Table 13.1.
Privacy has two interpretations. The first association is avoiding leaks of confidential information, protecting secrets; the second is freedom from disturbance, in the sense of not being bothered by irrelevance or interruption. Narrowcasting operations manage privacy in both senses, filtering duplex
information flow through an articulated conferencing model. Sources and sinks
are symmetric duals in virtual spaces, respectively representing sound emitters
and collectors. A human user might be represented by both a source and a sink in
a groupware environment, or perhaps by multiple instances of such delegates. In
groupware environments, both ones own and others sources and sinks are adjusted
for privacy. Audibility of a soundscape is controlled by embedded sinks. Sources can be explicitly turned off by muting or implicitly ignored by selecting (soloing) some others. Similarly, sinks can be explicitly deafened or implicitly desensitized if other sinks are attended. Modulation of source exposure or sink attention (Benford et al. 1995, Greenhalgh and Benford 1995) need not be all or nothing: nimbus and focus can be, respectively, partially softened with muzzling and muffling (Cohen 1993). Narrowcasting attributes can be crossed with spatialization and used for polite calling or awareware, reflecting a sensitivity to one's availability, like the online/offline status switch of a conferencing service.
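The narrowcasting semantics just described (Cohen 2000) can be stated compactly: an object is active unless explicitly suppressed, or implicitly excluded because some peer is selected and it is not. The sketch below uses hypothetical dictionary-based attributes, not any published API.

```python
def source_active(source, sources):
    """Narrowcasting rule for sources: a source sounds unless explicitly
    muted, or implicitly excluded because a peer is soloed (selected)."""
    any_solo = any(s.get("solo", False) for s in sources)
    return (not source.get("mute", False)
            and (source.get("solo", False) or not any_solo))

def sink_active(sink, sinks):
    """Dual rule for sinks: deafen is the analog of mute; attend, of solo."""
    any_attend = any(s.get("attend", False) for s in sinks)
    return (not sink.get("deafen", False)
            and (sink.get("attend", False) or not any_attend))
```

For example, soloing one source implicitly mutes its unsoloed peers, realizing the implicit exclusion described above; graduated muzzling and muffling would replace these boolean gates with attenuation factors.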
13.3.3 Multipresence
Increasingly fine-grained networked interpersonal communication, from journaling through microblogging to life-streaming and tweets, suggests the desirability of real-time communication via persistent channels for media streams. Multitasking
users want to have presence in several locations at once. Enriched user interfaces,
especially coupled with position tracking systems, encourage multipresence, the
inhabiting by sources and sinks of multiple spaces simultaneously, allowing a user to
monitor and inhabit many spaces (Cohen 1998). Multipresence is an interface strategy for managing attention and exposure, allowing a single human user to designate
doppelgänger delegates in distributed domains. Being anywhere is better than being everywhere, since it is selective; multipresence is distilled ubiquity, narrowcasting-enabled audition (for sinks) or address (for sources) of multiple objects of regard.
Display technology can enable such augmented telepresence for spoken telecommunication (Martens and Yoshida 2000).
TABLE 13.1
Narrowcasting for sOUrces and siNKs, OUTput and INput: Sources and Sinks Are Symmetric Duals in Virtual Spaces, Respectively Representing Sound Emitters and Collectors

                        Source                           Sink
Function                Radiation                        Reception
Level                   Amplification, attenuation       Sensitivity
Direction               Output (display)                 Input (control)
Presence                Nimbus (projection, exposure)    Focus (attention)
Locus
  Instance              Speaker                          Listener
  Transducer            Loudspeaker                      Microphone or dummy-head
  Organ                 Mouth                            Ear
                        Megaphone                        Ear trumpet
Suppress, exclude:
  own (reflexive)       Muzzle, mute (thumb down)        Muffle, deafen (thumbs down)
Express, include:
  other (transitive)    Solo (select) (thumb up)         Attend (thumbs up)
Independence of location and orientation can be exploited to flatter multipresent auditory localization. An advantage of separating translation and rotation is
that directionalizability can be preserved even across multiple frames of reference.
Such distributed presence can be coupled with a motion platform (like that shown
in Figure 12.13), a vehicle, or position tracking. Moving can twist (but deliberately
not shift) multiple sinks, maintaining consistent proprioceptive sensation. Relaxedly
shared position data can be filtered to adjust objects only relatively, for instance,
using angular displacement instead of absolute azimuth. A technique for integration
(resolving the apparent paradoxes) of such multipresence is explained in Cohen and
Fernando (2009).
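The twist-but-deliberately-not-shift policy can be sketched directly: the user's angular displacement is applied to the orientation of every delegate sink while positions stay put, so directionalizability is preserved across frames of reference. Field names here are illustrative, not from the cited system.

```python
def turn_sinks(sinks, delta_deg):
    """Apply the user's angular displacement to every multipresent sink:
    orientations twist together, positions deliberately stay put."""
    return [{"pos": s["pos"], "yaw": (s["yaw"] + delta_deg) % 360.0}
            for s in sinks]
```

Because only the relative displacement is shared, each sink keeps its own absolute azimuth frame, maintaining consistent proprioceptive sensation as the user turns.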
13.4 CHALLENGES
Personal networked spatial audio technologies inherit the challenges of both wearable computing and artificial spatialization (Kyriakakis 1998). While miniaturization and accuracy of sensors and actuators improve rapidly, extension of battery
capacity is far slower. Longevity solutions are being pursued, and techniques are
emerging in the realm of energy generators such as those collecting solar power,
harvesting human body heat (Leonov and Vullers 2009), or transducing limb and
finger movements, blood pressure, or other biological energy sources (Starner and Paradiso 2004). Besides such low-level concerns, remaining issues include individualizability of earprints, dynamic soundscape display, and standardization of scene
specification, respectively considered in the following sections.
13.4.2 Performance
Audio spatialization brings its own challenges, such as polyphony (i.e., directionalizing multiple sound sources simultaneously) and relatedly, intelligibility of multiple
speech sources. Other issues include sensitivity to noisy conditions and difficulty in distinguishing sounds coming from the back or front of the listener. System issues
include the need for low latency for richly populated polyphonic soundscapes and
seamless integration of streamed network communication and signal processing.
Distance and movement cues are difficult to express with HRIRs. Humans generally perform poorly at tasks involving localization of moving sources. The minimum audible movement angle ranges from about 8° for slow sources to about 21° for sources moving at one revolution per second (Perrott and Musicant 1977). This inability to follow moving sources (binaural sluggishness) is also evident in virtual scenes where binaural cues need to be approximated at sometimes insufficient rates and where other cues such as Doppler shift are not readily available (Pörschmann and Störig 2009). Compromises between computational complexity and realism of
movement illusion have been reached, but there is still no clear way to definitively
solve this problem.
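As a concrete example of a movement cue a renderer might supply, the Doppler shift for a stationary listener follows the textbook relation f' = f · c/(c − v), with v the source's radial approach speed. A minimal sketch:

```python
SPEED_OF_SOUND = 343.0  # m/s in air

def doppler_shift(freq_hz, radial_speed):
    """Perceived frequency for a moving virtual source and a stationary
    listener; radial_speed > 0 means the source is approaching."""
    return freq_hz * SPEED_OF_SOUND / (SPEED_OF_SOUND - radial_speed)
```

A 440 Hz source approaching at a tenth the speed of sound is heard near 489 Hz; a receding source (negative speed) is shifted down. Even so simple a cue must be recomputed at a sufficient rate to avoid the sluggish, stepped motion described above.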
13.4.3 Authoring Standards
Authoring contents for such AR applications can be difficult, and a broadly accepted standard has yet to emerge (Perey et al. 2011). The Spatial Sound Description Interchange Format (SpatDIF) describes spatial sound information in a structured way to support real-time and non-real-time applications (Peters et al. 2012).
The Web3D Consortium is working on extending its XML-based X3D (Brutzman
and Daly 2007) to support AR applications. Augmented Reality Markup Language
(ARML) (MacIntyre et al. 2013) is also an XML grammar that lets developers describe an augmented scene, its objects, and some behavior. A method has been
proposed to model such scenes with a two-stage process (Lemordant and Lasorsa
2010), analogous to rendering documents using HTML and CSS languages.
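To give the flavor of such structured scene description, the sketch below formats an OSC-style source-position message loosely modeled on SpatDIF's address namespace; the exact address layout shown is illustrative, not normative, and the function is ours rather than part of any SpatDIF library.

```python
def spatdif_position_message(source_name, x, y, z):
    """Format an OSC-style (address, arguments) pair for a named source's
    position, in the spirit of SpatDIF's /spatdif/source/... namespace.
    The address pattern here is illustrative only."""
    address = "/spatdif/source/{}/position".format(source_name)
    return (address, (x, y, z))
```

An authoring tool or renderer exchanging such messages stays agnostic about loudspeaker layout, the essence of the object-based model discussed below.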
The challenge of delivering satisfactory binaural experience is reflected in projects such as BiLi (Binaural Listening), a collaborative research project conducted at IRCAM, focused on assessing the quality of the experience made possible by binaural listening, developing solutions for individualizing the listening while avoiding tedious measurements in an anechoic chamber, and defining a format for sharing binaural data in anticipation of an international standard. With initiatives such as SOFA (Spatially Oriented Format for
Acoustics), the problems of storing and sharing HRIRs and BRIRs (binaural room
impulse responses) can hopefully be ameliorated (Majdak et al. 2013).
In channel-based systems, display configuration is predetermined, and channels are rigidly persistent, but object-based models feature transient sounds
tagged with position metadata, rendered at runtime for both flexible speaker
arrangement and interactivity. MPEG-H Part 3 is a compression standard for 3D
audio that can support many loudspeakers. Intended for display of prerendered
contents, it uses a channel-based model, and therefore has a separate focus from
MDA, Multi-Dimensional Audio, an object-oriented representation. Unifying these paradigms is Tech 3364, the European Broadcasting Union (EBU) Audio Definition Model (ADM), which integrates audio streams and metadata, including periphonic, binaural, channel-, scene-, and object-based arrangements, hoping for future-proofing by extensibility. The file format is agnostic regarding the
REFERENCES
Akiyama, S., K. Sato, Y. Makino, and T. Maeno (2012, November). Effect on enriching impressions of motions and physically changing motions via synchronous sound effects. In Proceedings of the Joint International Conference on Soft Computing and Intelligent Systems (SCIS) and International Symposium on Advanced Intelligent Systems (ISIS), pp. 856–860. DOI: 10.1109/SCIS-ISIS.2012.6505218.
Alam, S., M. Cohen, J. Villegas, and A. Ashir (2009). Narrowcasting in SIP: Articulated privacy control. In S. A. Ahson and M. Ilyas (eds.), SIP Handbook: Services, Technologies, and Security of Session Initiation Protocol, Chapter 14, pp. 323–345. CRC Press/Taylor & Francis. www.crcpress.com/product/isbn/9781420066036.
Algazi, V., R. Duda, D. Thompson, and C. Avendano (2001). The CIPIC HRTF database. In Proceedings of the IEEE Workshop on the Applications of Signal Processing to Audio and Acoustics, pp. 99–102. DOI: 10.1109/ASPAA.2001.969552.
Algazi, V. R., R. J. Dalton, Jr., R. O. Duda, and D. M. Thompson (2005, October). Motion-tracked binaural sound for personal music players. In AES: Audio Engineering Society Convention (119th Convention), New York.
Algazi, V. R. and R. O. Duda (2011, January). Headphone-based spatial sound. IEEE Signal Processing Magazine 28(1), 33–42.
* Note that we have not distinguished between external assistive hearing devices (e.g., BTE, behind-the-ear devices), which are properly known as hearing aids (HA), and cochlear implants (CI), which have an effect like internal headphones. Nor have we discussed bimodal and hybrid configurations (e.g., HA in one ear, CI in the other). Although there are important differences between them from the wearable computer perspective, they are similar in function and opportunities.
Baggi, D. and G. Haus (2009, March). IEEE 1599: Music encoding and interaction. Computer 42(3), 84–87. DOI: 10.1109/MC.2009.85.
Bederson, B. B. (1995). Audio augmented reality: A prototype automated tour guide. In Proceedings of the CHI: Conference on Computer–Human Interaction. DOI: 10.1145/223355.223526.
Begault, D. R. (1994). 3-D Sound for Virtual Reality and Multimedia. Academic Press.
Benford, S., J. Bowers, L. Fahlén, C. Greenhalgh, J. Mariani, and T. Rodden (1995). Networked virtual reality and cooperative work. Presence: Teleoperators and Virtual Environments 4(4), 364–386.
Bilinski, P., J. Ahrens, M. R. P. Thomas, I. J. Tashev, and J. C. Platt (2014, May). HRTF magnitude synthesis via sparse representation of anthropometric features. In ICASSP: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, Florence. http://www.icassp2014.org.
Breebaart, J., J. Engdegård, C. Falch, O. Hellmuth, J. Hilpert, A. Hoelzer, J. Koppens et al. (2008, May). Spatial audio object coding (SAOC): The upcoming MPEG standard on parametric object based audio coding. In AES: Audio Engineering Society Convention (124th Convention). http://www.aes.org/e-lib/browse.cfm?elib=14507.
Breebaart, J. and C. Faller (eds.) (2007). Spatial Audio Processing: MPEG Surround and Other Applications. West Sussex: John Wiley & Sons, Ltd.
Bregman, A. S. (1990). Auditory Scene Analysis: The Perceptual Organization of Sound. MIT Press.
Brungart, D. S. (2002). Near-field virtual audio displays. Presence: Teleoperators and Virtual Environments 11(1), 93–106.
Brutzman, D. and L. Daly (2007, April). X3D: Extensible 3D Graphics for Web Authors. Morgan Kaufmann.
Bujacz, M., P. Skulimowski, and P. Strumillo (2012). Naviton: A prototype mobility aid for auditory presentation of three-dimensional scenes to the visually impaired. Journal of the Audio Engineering Society 60(9), 696–708. www.aes.org/e-lib/browse.cfm?elib=16374.
Cater, K., R. Hull, T. Melamed, and R. Hutchings (2007, April). An investigation into the use of spatialized sound in locative games. In Proceedings of the CHI: Conference on Computer–Human Interaction, San Jose, California, pp. 2315–2320. http://dx.doi.org/10.1145/1240866.1241000.
Chi, D., D. Cho, S. Oh, K. Jun, Y. You, H. Lee, and M. Sung (2008, December). Sound-specific vibration interface using digital signal processing. In Proceedings of the International Conference on Computer Science and Software Engineering, vol. 4, pp. 114–117.
Cohen, M. (1993, August). Throwing, pitching, and catching sound: Audio windowing models and modes. IJMMS: Journal of Person–Computer Interaction 39(2), 269–304.
Cohen, M. (1998). Quantity of presence: Beyond person, number, and pronouns. In T. L. Kunii and A. Luciani (eds.), Cyberworlds, Chapter 19, pp. 289–308. Tokyo: Springer-Verlag.
Cohen, M. (2000, February). Exclude and include for audio sources and sinks: Analogs of mute and solo are deafen and attend. Presence: Teleoperators and Virtual Environments 9(1), 84–96. http://www.mitpressjournals.org/doi/pdf/10.1162/105474600566637.
Cohen, M. and O. N. N. Fernando (2009). Awareware: Narrowcasting attributes for selective attention, privacy, and multipresence. In P. Markopoulos, B. de Ruyter, and W. Mackay (eds.), Awareness Systems: Advances in Theory, Methodology and Design, Human–Computer Interaction Series, Chapter 11, pp. 259–289. Springer.
Cohen, M. and N. Győrbíró (2008). Personal and portable, plus practically panoramic: Mobile and ambient display and control of virtual worlds. Innovation: The Magazine of Research & Technology 8(3), 33–35. www.innovationmagazine.com.
Cohen, M. and L. F. Ludwig (1991a, March). Multidimensional audio window management. IJMMS: Journal of Person–Computer Interaction 34(3), 319–336.
326
327
Ishii, H. and B. Ullmer (1997, March). Tangible bits: Towards seamless interfaces between
people, bits and atoms. In Proceedings of the CHI: Conference on ComputerHuman
Interaction, pp. 234241.
Jones, M., S. Jones, G. Bradley, N. Warren, D. Bainbridge, and G. Holmes (2008). Ontrack:
Dynamically adapting music playback to support navigation. PUC: Personal and
Ubiquitous Computing 12(7), 513525. DOI: 10.1007/s00779-007-0155-2.
Katz, B. F. G., S. Kammoun, G. Parseihian, O. Gutierrez, A. Brilhault, M. Auvray, P. Truillet,
M. Denis, S. Thorpe, and C. Jouffrais (2012, November). NAVIG: Augmented reality
guidance system for the visually impaired. Virtual Reality 16(4), 253269. DOI: 10.1007/
s10055-012-0213-6.
King, S. (2007, October). The mist in 3-D sound.
Kyriakakis, C. (1998, May). Fundamental and technological limitations of immersive audio
systems. Proceedings of the IEEE 86(5), 941951. DOI: 10.1109/5.664281, http://
ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=664281.
Lemordant, J. and Y. Lasorsa (2010, May). Augmented reality audio editing. In Audio
Engineering Society Convention (128th Convention), London. www.aes.org/e-lib/
browse.cfm?elib=15439.
Leonov, V. and R. J. M. Vullers (2009). Wearable electronics self-powered by using human
body heat: The state of the art and the perspective. Journal of Renewable and Sustainable
Energy 1(6). http://scitation.aip.org/content/aip/journal/jrse/1/6/10.1063/1.3255465.
Loomis, J. M., C. Hebert, and J. G. Cicinelli (1990, October). Active localization of virtual
sounds. Journal of the Acoustical Society of America 88(4), 17571763.
MacIntyre, B., H. Rouzati, and M. Lechner (2013, May/June). Walled gardens: Apps and data as
barriers to augmented reality. IEEE Computer Graphics and Applications 33(3), 7781.
Magas, M., R. Stewart, and B. Fields (2009, August). decibel 151. In Proc. SIGGRAPH Art
Gallery, New Orleans, Louisiana. DOI: 10.1145/1667265.1667290, www.siggraph.org/
s2009/galleries_experiences/information_aesthetics/index.php.
Magerkurth, C., A. D. Cheok, R. L. Mandryk, and T. Nilsen (2005, July). Pervasive games:
Bringing computer entertainment back to the real world. Computer Entertainment 3(3),
4. DOI: 10.1145/1077246.1077257.
Majdak, P., T. Carpentier, R. Nicol, A. Roginska, Y. Suzuki, K. Watanabe, H. Wierstorf, H.
Ziegelwanger, and M. Noisternig (2013, May). Spatially oriented format for acoustics:
A data exchange format representing head-related transfer functions. In AES: Audio
Engineering Society Convention (134th Convention), Rome. Preprint #8880.
Martens, W. (2003, December). Perceptual evaluation of filters controlling source direction:
Customized and generalized HRTFs for binaural synthesis. Acoustical Science and
Technology 24(5), 220232. http://dx.doi.org/10.1250/ast.24.220.
Martens, W. L. and A. Yoshida (2000, May). Augmenting spoken telecommunication via spatial audio transformation. Journal of the 3D-Forum Society of Japan 14(4), 69175.
Paterson, N., K. Naliuka, T. Carrigy, M. Haahr, and F. Conway (2011, February). Locationaware interactive game audio. In AES: Audio Engineering Society Convention (41st
International Conference): Audio for Games. www.aes.org/e-lib/browse.cfm?elib=15769.
Pedersen, E. R., T. Sokoler, and L. Nelson (2000). Paperbuttons: Expanding a tangible user
interface. In DIS: Proceedings of the third Conference on Designing Interactive Systems,
New York, pp. 216223. ACM. DOI: 10.1145/347642.347723.
Peltola, M., T. Lokki, and L. Savioja (2009, February). Augmented reality audio for location-based games. In AES: Audio Engineering Society Convention (35th International
Conference): Audio for Games. www.aes.org/e-lib/browse.cfm?elib=15176.
Perey, C., T. Engelke, and C. Reed (2011). Current status of standards for augmented reality.
In L. Alem and W. Huang (eds.), Recent Trends of Mobile Collaborative Augmented
Reality Systems, chapter 2, pp. 2138. Springer-Verlag.
328
Perrott, D. R. and A. D. Musicant (1977). Minimum auditory movement angle: Binaural localization of moving sound sources. Journal of the Acoustical Society of America 62(6),
14631466. DOI: 10.1121/1.381675.
Peters, N., T. Lossius, and J. C. Schacher (2012, July). SpatDIF: Principles, specification, and
examples. In Proceedings of Sound and Music Computing Conference, Copenhagen.
Prschmann, C. and C. Strig (2009, August). Investigations into the velocity and distance perception of moving sound sources. Acta Acustica United with Acustica 95(4), 696706.
Quackenbush, S. and J. Herre (2005, October/December). MPEG surround. IEEE Multimedia
12(4), 1823. http://doi.ieeecomputersociety.org/10.1109/MMUL.2005.76. 5.
Rekimoto, J. (2008). Organic interaction technologies: From stone to skin. Communications
of the ACM 51(6), 3844.
Rothbucher, M., T. Habigt, J. Habigt, T. Riedmaier, and K. Diepold (2010, December). Measuring
anthropometric data for HRTF personalization. In SITIS: Proceedings of the International
Conference on Signal-Image Technology and Internet-Based Systems, pp.102106.
Rund, F. and F. Saturka (2012, July). Alternatives to HRTF measurement. In TSP: Proceedings
of International Conference on Telecommunications and Signal Processing, pp.648652.
Sanuki, W., J. Villegas, and M. Cohen (2014, April). Machi-beaconAn application of spatial
sound on navigation systems. In AES: Audio Engineering Society Convention (136th
International Conference), Berlin.
Sarpeshkar, R. (2006, May). Brain power: Borrowing from biology makes for low-power
computing. IEEE Spectrum 43(5), 2429. http://spectrum.ieee.org/biomedical/devices/
brain-power/.
Schnupp, J. W. H. and C. E. Carr (2009, May). On hearing with more than one ear: Lessons
from evolution. Nature Neuroscience 12(6), 692697. DOI: 10.1038/nn.2325, www.
nature.com/neuro/journal/v12/n6/full/nn.2325.html.
Scientific American (1880, July). Navigation in fogs. Scientific American 43(1). http://www.
gutenberg.org/files/38482/38482-h/38482-h.htm.
Shedroff, N. and C. Noessel (2012). Make It So: Interaction Design Lessons from Science
Fiction. Rosenfeld Media.
Spagnol, S. and F. Avanzini (2009, July). Auditory localization in the near-field. In Proceedings
of SMC: Sound and Music Computing Conference, Porto.
Spagnol, S., M. Geronazzo, and F. Avanzini (2013, March). On the relation between pinna
reflection patterns and head-related transfer function features. IEEE Transactions on
Audio, Speech, and Language Processing 21(3), 508519.
Starner, T. and J. A. Paradiso (2004). Human generated power for mobile electronics. In Low
Power Electronics Design, pp. 135. CRC Press.
Stewart, R., M. Levy, and M. Sandler (2008, September). 3D interactive environment for
music collection navigation. In Proceedings of DAFx: 11th International Conference on
Digital Audio Effects, Espoo.
Streicher, R. and F. A. Everest (2006). The New Stereo Soundbook (3rd edn.). Pasadena,
California: Audio Engineering Associates.
Streitz, N., A. Kameas, and I. Mavrommati (eds.) (2007). The Disappearing Computer:
Interaction Design, System Infrastructures and Applications for Smart Environments.
State-of-the-Art Survey. Springer LNCS 4500.
Tachi, S. (2009). Telexistence. Singapore: World Scientific Publishing Company.
Takao, H. (2003, July). Adapting 3D sounds for auditory user interface on interactive in-car
information tools. PhD thesis, Waseda University, Tokyo. dspace.wul.waseda.ac.jp/
dspace/handle/2065/436.
Terven, J. R., J. Salas, and B. Raducanu (2014, April). New opportunities for computer visionbased assistive technology systems for the visually impaired. IEEJ Transactions on
Electronics, Information and Systems 47(4), 5258. DOI: 10.1109/MC.2013.265, http://
doi.ieeecomputersociety.org/10.1109/MC.2013.265.
329
14
Recent Advances in Augmented Reality for Architecture, Engineering, and Construction Applications
Amir H. Behzadan, Suyang Dong, and Vineet R. Kamat
CONTENTS
14.1 Introduction...332
  14.1.1 Overview of Augmented Reality in Architecture, Engineering, and Construction...333
  14.1.2 Recent Advances in AR for AEC Applications...335
14.2 Challenges Associated with AR in AEC Applications...337
  14.2.1 Spatial Alignment of Real and Virtual Objects (Registration)...337
    14.2.1.1 Registration Process...337
    14.2.1.2 Experimental Results...341
  14.2.2 Visual Illusion of Virtual and Real-World Coexistence (Occlusion)...344
    14.2.2.1 Occlusion Handling Process...344
    14.2.2.2 Two-Stage Rendering...346
    14.2.2.3 Implementation Challenges...346
    14.2.2.4 Experimental Results...349
14.3 Software and Hardware for AR in AEC Applications...350
  14.3.1 Software Interfaces...350
    14.3.1.1 ARVISCOPE...350
    14.3.1.2 SMART...352
  14.3.2 Hardware Platforms...353
    14.3.2.1 UM-AR-GPS-ROVER...353
    14.3.2.2 ARMOR...355
14.1 INTRODUCTION
In several science and engineering applications, visualization can enhance a user's cognition or learning experience, especially when the goal is to communicate information about a complex phenomenon or to demonstrate the applicability of an abstract concept to real-world circumstances. An important category of visualization is termed virtual reality (VR), which attempts to replace the user's physical world with a completely synthetic environment. There is a wide array of applications now commonly associated with VR, such as computer-aided design (CAD), scientific visualization, visual simulation, animation, computer games, and virtual training.
In VR, however, the user's sensory receptors (eyes and ears) are isolated from the real physical world and completely immersed in the synthetic environment that replicates the physical world to some extent. In addition, even though VR can provide a stable, robust, interactive, and immersive experience, the cost and effort of constructing a faithful synthetic environment, which includes tasks such as model engineering (the process of creating, refining, archiving, and maintaining 3D models), scene management, and graphics rendering, can be enormous (Brooks 1999).
In contrast to the VR paradigm, another category of visualization techniques, called augmented reality (AR), attempts to preserve the user's awareness of the real environment by compositing the real world and the virtual contents in a mixed 3D space. In particular, AR refers to the visualization technology that blends virtual objects with the real world (Azuma et al. 2001). For this purpose, AR must not only maintain a correct and consistent spatial relation between the virtual and real objects, but also sustain the illusion that they coexist in the augmented space. The blending effect reinforces the connections between people and objects, promotes people's appreciation of their context, and provides hints for users to discover their surroundings.
In addition, the awareness of the real environment in AR and the information conveyed by the virtual objects help users perform real-world tasks, whereas VR applications are mainly restricted to designing, running simulations, and training (Azuma 1997). Furthermore, AR offers a promising alternative to the model engineering challenge inherent in VR by only including entities that capture the essence of the study (Behzadan and Kamat 2005). These essential entities usually exist in a complex and dynamic context that is necessary to the model, but costly to replicate in VR. However, reconstructing the context is rarely a problem in AR, where modelers can take full advantage of the real context (e.g., terrains and existing structures) and render them as backgrounds, thereby saving a considerable amount of effort and resources.
FIGURE 14.1 Example applications of AR in the AEC industry. (a) Subsurface utilities. (From Schall, G. et al., Virtual redlining for civil engineering in real environments, Proceedings of the IEEE International Symposium on Mixed and Augmented Reality (ISMAR), Cambridge, U.K., 2010.) (b) Inspection. (From Georgel, P. et al., An industrial augmented reality solution for discrepancy check, Proceedings of the 2007 IEEE and ACM International Symposium on Mixed and Augmented Reality, 2007, pp. 111–115.) (c) Supervision. (From Golparvar-Fard, M. et al., J. Comput. Civil Eng., 23(6), 418, 2009. With permission from ASCE.) (d) Project feasibility analysis. (From Piekarski, W., IEEE Comp. Graphics Appl., 26(1), 14, 2006.)
developing a welding helmet that augments visual information, such as paper drawings
and online quality assistance, before and during the welding process.
FIGURE 14.2 AR research for AEC applications in LIVE. (a) Visual collision avoidance. (From Talmaki, S.A. et al., Adv. Eng. Inform., 27(2), 283, 2013.) (Continued)
FIGURE 14.2 (Continued) AR research for AEC applications in LIVE. (b) Reconnaissance of damaged building. (From Dong, S., Scalable and extensible augmented reality with applications in civil infrastructure systems, PhD dissertation, Department of Civil and Environmental Engineering, University of Michigan, Ann Arbor, MI, 2012.) (c) Visualization of construction processes. (From Behzadan, A.H. and Kamat, V.R., J. Comput. Civil Eng., 23(6), 405, 2009b. With permission from ASCE.) (Continued)
FIGURE 14.2 (Continued) AR research for AEC applications in LIVE. (d) Collaborative AR visualization. (From Dong, S. et al., Elsevier J. Adv. Eng. Softw., 55, 45, 2013.)
TABLE 14.1
Four Steps of the Registration Process in AR

Step                       Task                                            Parameters and Device
Viewing                    Position the viewing volume of a user's         Attitude of the camera
                           eyes in the world.                              (electronic compass)
Modeling                   Position the objects in the world.              Location of the world origin
                                                                           (RTK-GPS)
Creating viewing frustum                                                   Lens and aspect ratio of camera
                                                                           (camera)
Projection                 Project the objects onto the image plane.       Perspective projection matrix
FIGURE 14.3 Relative orientation between the world (Xw, Yw, Zw), object (Xo, Yo, Zo), and eye (Xe, Ye, Ze) coordinate systems, with the world axes referenced to east and true north.
the OpenSceneGraph (OSG) (Martz 2007) default coordinate system, using a right-handed system with the Z-axis as the up vector and the Y-axis departing from the eye.
As shown in Figure 14.3, the yaw, pitch, and roll angles are used to describe the relative orientation between the world and eye coordinate systems. The z-x-y rotation sequence is picked to construct the transformation matrix between the two coordinate systems. Suppose the eye and world coordinate systems coincide at the beginning. The user's head rotates around the Z-axis by the yaw angle α ∈ [−180°, +180°] to get the new axes X′ and Y′. Since the rotation is clockwise under the right-handed system, the rotation matrix is Rz(α). Then, the head rotates around the X′-axis by the pitch angle β ∈ [−90°, +90°] to get the new axes Y″ and Z″, with a counterclockwise rotation of Rx(β). Finally, the head rotates around the Y″-axis by the roll angle γ ∈ [−180°, +180°], with a counterclockwise rotation of Ry(γ), to reach the final attitude.
Converting a virtual object from the world coordinate system to the eye coordinate system is the inverse of rotating from the world coordinate system to the eye coordinate system; therefore, the rotation matrix is written as Rz(α)Rx(β)Ry(γ), as shown in Equation 14.1. Since OSG provides quaternions, a simple and robust way to express rotation, the rotation matrix is further constructed as a quaternion by specifying the rotation axes and angles. The procedure is explained as follows, and the associated equations are listed in sequence in Equations 14.2 through 14.5: rotating around the Y-axis by γ degrees, then rotating around the X-axis by β degrees, and finally rotating around the Z-axis by α degrees:
\begin{bmatrix} X_e \\ Y_e \\ Z_e \end{bmatrix}
=
\begin{bmatrix} \cos\alpha & \sin\alpha & 0 \\ -\sin\alpha & \cos\alpha & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\beta & \sin\beta \\ 0 & -\sin\beta & \cos\beta \end{bmatrix}
\begin{bmatrix} \cos\gamma & 0 & -\sin\gamma \\ 0 & 1 & 0 \\ \sin\gamma & 0 & \cos\gamma \end{bmatrix}
\begin{bmatrix} X_w \\ Y_w \\ Z_w \end{bmatrix}  (14.1)

Pe = Rz(α) * Rx(β) * Ry(γ) * Pw  (14.2)

Z-axis = [0, 0, 1]^T  (14.3)
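The composition in Equations 14.1 and 14.2 can be sketched numerically. The snippet below is an illustration, not the chapter's implementation: it uses α (yaw), β (pitch), and γ (roll) with one consistent set of sign conventions, since the extracted text lost the original symbols and signs.

```python
import math

def rz(a):
    """Rotation about the Z-axis (yaw); clockwise under the right-handed system."""
    c, s = math.cos(a), math.sin(a)
    return [[c, s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]]

def rx(b):
    """Rotation about the X-axis (pitch)."""
    c, s = math.cos(b), math.sin(b)
    return [[1.0, 0.0, 0.0], [0.0, c, s], [0.0, -s, c]]

def ry(g):
    """Rotation about the Y-axis (roll)."""
    c, s = math.cos(g), math.sin(g)
    return [[c, 0.0, -s], [0.0, 1.0, 0.0], [s, 0.0, c]]

def matmul(m, n):
    """3x3 matrix product."""
    return [[sum(m[i][k] * n[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def world_to_eye(yaw, pitch, roll, p_w):
    """P_e = Rz(yaw) * Rx(pitch) * Ry(roll) * P_w (angles in radians)."""
    m = matmul(matmul(rz(yaw), rx(pitch)), ry(roll))
    return tuple(sum(m[i][k] * p_w[k] for k in range(3)) for i in range(3))
```

For example, a 90° yaw maps the world X-axis onto the negative eye Y-axis under these conventions. In OSG the same transform would typically be built as an `osg::Quat` from the three axis-angle pairs rather than as explicit matrices.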
Once the rotation sequence and transformation are completed, the next step is to model the virtual objects in their exact locations. The definition of the object coordinate system is determined by the drawing software. The origin is fixed to a pivot point on the object with a user-specified geographical location. The geographical location of the world coordinate origin is also given by position tracking devices (e.g., a GPS sensor) carried by the user. Therefore, the 3D vector between the object and world coordinate origins can be calculated. The method to calculate the distance between geographical coordinates was originally introduced by Vincenty (1975). Behzadan and Kamat (2007) used this approach to design an inverse method that uses a reference point to calculate the 3D vector between two geographical locations.
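Vincenty's full inverse algorithm is too long to reproduce here, but the idea of turning two geographical coordinates into a local 3D vector can be sketched with a much simpler spherical-Earth approximation. This is a hypothetical stand-in, accurate only over short distances, not the Behzadan and Kamat (2007) method:

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius; Vincenty uses the full ellipsoid

def local_vector(lat0, lon0, alt0, lat1, lon1, alt1):
    """Approximate east/north/up vector (meters) from point 0 to point 1.

    Small-area equirectangular approximation: longitude differences are
    scaled by cos(latitude); Vincenty's formulas instead solve the inverse
    geodesic problem on the ellipsoid.
    """
    east = math.radians(lon1 - lon0) * EARTH_RADIUS_M * math.cos(math.radians(lat0))
    north = math.radians(lat1 - lat0) * EARTH_RADIUS_M
    up = alt1 - alt0
    return (east, north, up)
```

One degree of latitude comes out at roughly 111.2 km, which matches the geodesic value to within the approximation's error over such scales.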
Once a virtual object is modeled inside the user's viewing frustum, any further translation, rotation, and scaling operations are applied on the object. Finally, the user's viewing frustum must be defined. The real world is perceived through perspective projection by the human eye and the video camera. Four parameters are needed to construct a perspective projection matrix: horizontal angle of view, horizontal and vertical aspect ratio, and near and far planes. As shown in Figure 14.4, these parameters together form a viewing frustum and decide the virtual content to be displayed in the augmented space. In order to increase computational efficiency, all virtual objects outside of the viewing frustum are either cropped or clipped.
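As a sketch of how those parameters define the frustum, the following builds a standard OpenGL-style symmetric perspective matrix and tests whether a point in eye coordinates survives clipping. The vertical-field-of-view parameterization is an assumption (the common `gluPerspective` convention; the chapter mentions the horizontal angle of view), as is the row-major layout:

```python
import math

def perspective(fov_y_deg, aspect, near, far):
    """Symmetric perspective projection matrix (gluPerspective-style, row-major)."""
    f = 1.0 / math.tan(math.radians(fov_y_deg) / 2.0)
    return [
        [f / aspect, 0.0, 0.0, 0.0],
        [0.0, f, 0.0, 0.0],
        [0.0, 0.0, (far + near) / (near - far), 2.0 * far * near / (near - far)],
        [0.0, 0.0, -1.0, 0.0],
    ]

def in_frustum(m, point):
    """Clip test: a point survives if -w <= x, y, z <= w after projection.

    Eye space looks down the negative Z-axis, so visible points have z < 0.
    """
    v = (point[0], point[1], point[2], 1.0)
    x, y, z, w = (sum(m[r][c] * v[c] for c in range(4)) for r in range(4))
    return w > 0.0 and abs(x) <= w and abs(y) <= w and abs(z) <= w
```

Points failing this test are exactly the ones the pipeline culls or clips, which is where the computational savings mentioned above come from.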
FIGURE 14.4 The viewing frustum, bounded by the near, far, left, right, top, and bottom planes (aspect ratio = width/height), defines the virtual content that can be seen.
FIGURE 14.5 Mechanical attitude calibration result and validation experiment of the registration algorithm.
FIGURE 14.6 Communication stages in (a) the PULL mode and (b) the PUSH mode.
Core™ 2 Duo CPU T6600 2.2 GHz and 64-bit Windows operating system. In order to minimize the transmission latency between the camera and the host system, an integrated camera was used and the resolution was adjusted to the minimum option of 160 × 120. A TCM compass module was used as a 3D orientation tracking device. Both the camera and the TCM module ran at approximately 30 Hz. The camera update function was written as a callback and executed at every frame. The system time was recorded when each new frame was captured. The device update function was written as a delegate and registered with the OnDataReceived event, triggered when a new data packet was placed in the buffer.
The system time stamp was also assigned to the angular data each time the event was triggered. As shown in Figure 14.7, the TCM module was held static at the beginning and then rapidly swung to one side at a speed of about 150°/s. Later, the exact instant that the module started swinging was identified from the recorded image frames and the TCM module angular data, along with their corresponding time stamps. In this way, the time stamps were compared to find the lag of the TCM module PUSH mode.
FIGURE 14.7 Comparison between the TCM-XB data log and the corresponding recorded image frames. The shaded area highlights the exact instant that the module started swinging. (a) Static. (b) Begin to swing. (c) Second frame of swing. (d) Recorded data log.
Six groups of experiments were carried out, and the delay in the PUSH mode relative to the web camera was found to be 5 ms on average. This implies that the communication delay in the PUSH mode was small enough to be neglected.
Another source of latency error in the PUSH mode is the finite impulse response (FIR) filter of the compass module. In particular, the calibration of the magnetometer can compensate for a local static magnetic source within the vicinity of the compass module. However, dynamic magnetic distortion still impacts the module in motion, and the noise magnification depends on the acceleration of the module; usually, the noise increases with the acceleration. Among the three degrees of freedom, heading (i.e., yaw) is the most sensitive to the noise. Except for high-frequency vibration noise, other types of noise can be removed by an FIR Gaussian filter. The compass
FIGURE 14.8 The filter-induced latency when a 32-tap Gaussian filter is used.
module comes with five options for filtering: 32-, 16-, 8-, 4-, and 0-tap filters. The higher the number, the more stable the output, but the longer the expected latency.
Consider the case of selecting a 32-tap filter, as shown in Figure 14.8. When it is time to send out estimated data at time instant A, the module adds a new sample A to the end of the queue, with the first one being dropped, and applies a Gaussian filter to the queue. However, the filtered result actually reflects the estimated value at time instant A−15. Since the module samples at approximately 30–32 Hz, this induces a 0.5 s delay for a 32-tap filter, a 0.25 s delay for a 16-tap filter, and so on. This is referred to as filter-induced latency, and it applies to both the PULL and PUSH modes. A 0-tap filter implies no filtering, but significant jittering. Dong (2012) provided a detailed account of how filter-induced latency can be avoided by moving the Gaussian FIR filter from the hardware to the software.
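The delays quoted above follow from the group delay of a symmetric FIR filter, (N − 1)/2 samples. A minimal sketch, assuming the module's roughly 32 Hz sampling rate:

```python
def filter_latency_s(num_taps, sample_rate_hz):
    """Group delay of a symmetric FIR filter: (N - 1) / 2 samples."""
    return max(num_taps - 1, 0) / 2.0 / sample_rate_hz

# At ~32 Hz, a 32-tap filter delays the estimate by roughly half a second,
# a 16-tap filter by roughly a quarter second, matching the text.
delays = {taps: filter_latency_s(taps, 32.0) for taps in (32, 16, 8, 4, 0)}
```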
FIGURE 14.9 Example of incorrect and correct occlusion in AR visualization. (a) Incorrect occlusion. (b) Correct occlusion.
to be seen from frame to frame. Lepetit and Berger (2000) refined the previous method with a semiautomated approach that requires the user to outline the occluding objects in the key views; the system then automatically detects these occluding objects and handles uncertainties in the computed motion between two key frames. Despite the visual improvements, the semiautomated method is only appropriate for postprocessing. Fortin and Hebert (2006) studied both a model-based approach using a bounding box and a depth-based approach using a stereo camera. The former works only with a static viewpoint, and the latter is subject to low-textured areas. Ryu et al. (2010) and Louis and Martinez (2012) tried to increase the accuracy of the depth map with a region-of-interest extraction method using background subtraction and stereo depth algorithms. However, only simple background examples were demonstrated. Tian et al. (2010) designed an interactive segmentation and object tracking method for real-time occlusion, but their algorithm fails when virtual objects are in front of the real objects.
In the authors' research, a robust AR occlusion algorithm was designed and implemented that uses a real-time time-of-flight (TOF) camera, an RGB video camera, the OpenGL Shading Language (GLSL), and render-to-texture (RTT) techniques to correctly resolve the depth of real and virtual objects in real-time AR visualizations. Compared to previous approaches, this approach enables improvements in three ways:

1. Ubiquity: The TOF camera is capable of suppressing the background illumination (SBI), which enables the designed algorithm to work in both indoor and outdoor environments. It puts the least limitation on context and conditions compared with any previous approach.
2. Robustness: Using the OpenGL depth-buffering method, it can work regardless of the spatial relationship among the involved virtual and real objects.
3. Speed: The processing and sampling of the depth map are parallelized by taking advantage of the GLSL fragment shader and the RTT technique. Koch et al. (2009) described a parallel research effort that adopted a similar approach for TV production in indoor environments, with a 3D model constructed beforehand, aimed at segmenting a moving actor from the background.
A fundamental step to correct occlusion handling is obtaining an accurate measurement of the distance from the virtual and real objects to the user's eye. In an outdoor AR environment, the distance from the virtual object to the viewpoint can be calculated using the Vincenty algorithm (Vincenty 1975). This algorithm interprets the metric distance based on the geographical locations of the virtual object and the user. The locations of the virtual objects are predefined by the program. In a simulated construction operation, for example, the geographical locations of virtual building components and equipment are extracted from the engineering drawings. The location of the viewpoint, on the other hand, is tracked by a position sensor (e.g., GPS) carried by the user. A TOF camera estimates the distance from the real object to the eye with the help of the TOF principle, which measures the time that a signal travels, at a well-defined speed, from the transmitter to the receiver (Beder et al. 2007).
Specifically, the TOF camera measures radio frequency (RF)-modulated light sources with phase detectors. The modulated outgoing beam is sent out with an RF carrier, and the phase shift of that carrier is measured on the receiver side to compute the distance (Gokturk et al. 2010). Compared to traditional light detection and ranging (LIDAR) scanners and stereo vision, the TOF camera features real-time feedback with high accuracy. It is capable of capturing a complete scene with one shot, at speeds of up to 40 frames per second (fps). However, common TOF cameras are vulnerable to background light (e.g., artificial lighting and the sun), which generates electrons and confuses the receiver. In the authors' research, the SBI method is used to allow the TOF camera to work flexibly in both indoor and outdoor environments (PMD 2010).
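The phase-based distance computation can be illustrated as follows. This is the textbook form of the TOF principle, not the camera's firmware: the light travels out and back, so d = c·Δφ/(4π·f_mod), and phase wrapping at 2π limits the unambiguous range to c/(2·f_mod). The 20 MHz modulation frequency in the comment is an assumed, typical value:

```python
import math

C = 299792458.0  # speed of light (m/s)

def tof_distance_m(phase_shift_rad, mod_freq_hz):
    """Distance from the measured phase shift of the returned modulated carrier.

    The signal travels the distance twice (out and back), hence the factor 4*pi.
    """
    return C * phase_shift_rad / (4.0 * math.pi * mod_freq_hz)

def ambiguity_range_m(mod_freq_hz):
    """Phase wraps at 2*pi, so the unambiguous range is c / (2 * f_mod)."""
    return C / (2.0 * mod_freq_hz)

# At an assumed 20 MHz modulation, the unambiguous range is about 7.5 m;
# a measured phase shift of pi corresponds to half that range.
```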
14.2.2.2 Two-Stage Rendering
Depth buffering, also known as z-buffering, is the solution for hidden-surface elimination in OpenGL and is usually done efficiently in the graphics processing unit (GPU). A depth buffer is a 2D array that shares the same resolution as the color buffer and the viewport. If enabled in the OpenGL drawing stage, the depth buffer keeps a record of the closest depth value to the observer for each pixel. An incoming fragment at a certain pixel will not be drawn unless its corresponding depth value is smaller than the previous one. If it is drawn, then the corresponding depth value in the depth buffer is replaced by the smaller one. In this way, after the entire scene has been drawn, only those fragments that were not obscured by any others remain visible.
Depth buffering thus provides a promising approach for solving the AR occlusion problem. Figure 14.10 shows a two-stage rendering method. In the first stage of rendering, the background of the real scene is drawn as usual, but with the depth map retrieved from the TOF camera written into the depth buffer at the same time. In the second stage, the virtual objects are drawn with depth buffer testing enabled. Consequently, the invisible part of a virtual object, whether hidden by a real object or another virtual object, will be correctly occluded.
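The two-stage logic can be mimicked in software with plain arrays: stage 1 writes the video frame into the color buffer and the TOF depth map into the depth buffer; stage 2 draws virtual fragments with the depth test enabled. This CPU sketch only illustrates the comparison rule, not the GPU implementation:

```python
def composite(background, tof_depth, virtual_frags):
    """Two-stage rendering sketch with a software depth buffer.

    background:    2D list of colors (the RGB video frame)
    tof_depth:     2D list of per-pixel distances from the TOF camera
    virtual_frags: iterable of (row, col, depth, color) virtual fragments
    """
    # Stage 1: draw the real scene and seed the depth buffer with TOF depths.
    color = [row[:] for row in background]
    depth = [row[:] for row in tof_depth]
    # Stage 2: draw virtual fragments with depth testing enabled.
    for r, c, z, frag_color in virtual_frags:
        if z < depth[r][c]:          # virtual point is nearer: draw it
            depth[r][c] = z
            color[r][c] = frag_color
    return color                     # real objects now correctly occlude virtual ones
```

A virtual fragment behind a measured real surface is simply never written, which is exactly the effect shown in Figure 14.9b.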
14.2.2.3 Implementation Challenges
Despite the straightforward implementation of depth buffering, there are several challenges when integrating the depth buffer with the depth map from the TOF camera:

1. After being processed through the OpenGL graphics pipeline and written into the depth buffer, the distance between the OpenGL camera and the virtual object is no longer the physical distance (Shreiner et al. 2006). The transformation model is explained in Table 14.2. Therefore, the distance for each pixel from the real object to the viewpoint recorded by the TOF camera has to be processed by the same transformation model before it is written into the depth buffer for comparison.

FIGURE 14.10 Two-stage rendering for occlusion handling. (From Dong, S. et al., J. Comput. Civ. Eng., 27(6), 607, 2013.)

TABLE 14.2
Transformation Steps Applied to the Raw TOF Depth Image

Name   Meaning                            Operation and Expression                              Range
Ze     Distance to the viewpoint          Raw measurement                                       (0, +∞)
Zc     Clip coordinate after projection   Mortho * Mperspective * [Xe Ye Ze We]^T:              [−n, f]
       transformation                     Zc = Ze(f + n)/(f − n) − 2fn·We/(f − n)
Zcvv   Canonical view volume              Zcvv = (f + n)/(f − n) − 2fn/(Ze(f − n))              [−1, +1]
Zd     Value sent to depth buffer         Zd = (Zcvv + 1)/2                                     [0, 1]
                                             = (f + n)/(2(f − n)) − fn/(Ze(f − n)) + 0.5
2. There are three cameras for rendering an AR space: a video camera, a TOF
camera, and an OpenGL camera. The video camera captures RGB or intensity
values of the real scene as the background, and its result is written into the color
buffer. The TOF camera acquires the depth map of the real scene, and its result is
written into the depth buffer. The OpenGL camera projects virtual objects on top
of real scenes, with its result being written into both the color and depth buffers.
In order to ensure correct alignment and occlusion, ideally all cameras should
share the same projection parameters (theprinciple points and focal lengths).
348
Even though the TOF camera provides an integrated intensity image that can be
aligned with the depth map by itself, the monocular color channel compromises
the visual credibility. On the other hand, if an external video camera is used, then the intrinsic and extrinsic parameters of the video camera and TOF camera may not agree (i.e., different principal points, focal lengths, and distortions). Therefore, image registration methods are required to find the correspondence between the depth map and the RGB image. Dong (2012) provided a detailed description of two such methods, a nonlinear homography estimation implementation adapted from Lourakis (2011) and stereo projection, that are used to register the depth map and RGB image. The projection parameters of the OpenGL camera are adjustable and can accommodate either the RGB or TOF camera. Figure 14.11 shows snapshots of the occlusion effect achieved in this research by using homography mapping between the TOF and RGB cameras.
3. Traditional OpenGL pixel-drawing commands can be extremely slow when writing a 2D array (i.e., the depth map) into the frame buffer. In the authors' research, an alternative and efficient approach using OpenGL textures and GLSL is used.
4. The resolution of the TOF depth map is fixed at 200 × 200, while that of the depth buffer can be arbitrary, depending on the resolution of the viewport. This implies the necessity of interpolation between the TOF depth map and the depth buffer. Furthermore, image registration demands an expensive computation budget if a high-resolution viewport is defined. In the authors' research, the RTT technique is used to carry out the interpolation and registration computation in parallel.
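As an illustration of the registration step in challenge 2 above, once a 3 × 3 homography H between the TOF depth image and the RGB image has been estimated (e.g., with the nonlinear method adapted from Lourakis 2011), each depth-map pixel is mapped into RGB coordinates by a projective transform. A minimal sketch, with a hypothetical hand-written matrix rather than one estimated from real correspondences:

```python
def map_pixel(H, x, y):
    """Map pixel (x, y) through the 3x3 homography H (row-major lists)."""
    xh = H[0][0] * x + H[0][1] * y + H[0][2]
    yh = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return xh / w, yh / w  # back from homogeneous coordinates

# Hypothetical homography: scale the 200 x 200 depth map by 3.2 and
# shift it to the RGB image origin (a pure similarity, for illustration).
H = [[3.2, 0.0, 40.0],
     [0.0, 3.2, 12.0],
     [0.0, 0.0, 1.0]]
```

In practice every depth-map pixel would be warped this way (or equivalently the inverse mapping sampled on the GPU) before its transformed depth value is compared in the depth buffer.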
FIGURE 14.11 Occlusion effect comparison using homography mapping between the TOF camera and the RGB camera: (a) occlusion disabled and (b) occlusion enabled. (From Dong, S. et al., J. Comput. Civ. Eng., 27(6), 607, 2013.)
FIGURE 14.12 Indoor simulated construction processes with occlusion (a) disabled and (b) enabled. (From Dong, S. et al., J. Comput. Civ. Eng., 27(6), 607, 2013.)
FIGURE 14.13 Outdoor simulated construction processes with occlusion (a) disabled and (b) enabled. (From Dong, S. et al., J. Comput. Civ. Eng., 27(6), 607, 2013.)
In order to create a viewing frustum with the user's eye at the center of projection, using the procedure described in Section 14.3.2.1, the user's positional and head orientation data are continuously obtained and processed. Details of the tracking devices used to achieve this are also described in Section 14.3.2.1.
Despite the fact that the ARVISCOPE authoring language is powerful enough to
describe the complexities involved in a typical construction operation, the syntax of
the language is not very complex. Based on their functionality, ARVISCOPE language statements are grouped into scene construction, dynamic, and control statements (Behzadan 2008). These statements can be sequentially recorded into and
interpreted from a text file referred to as the animation trace file. The animation
trace file is sequentially interpreted line by line as soon as the application starts, the
individual statements are processed, and the graphical representation corresponding
to the event in each line of the trace file is simultaneously created and depicted inside the user's augmented viewing frustum. During this process, the user can freely move in the animated augmented space.
The animation trace file can be created either manually (for short animations) or
automatically during a simulation run. Manual generation of an animation trace file
is typically not practical except in the case of simple demonstrative examples of short
animated duration. Automatic generation of a trace file is recommended instead, since it requires less time and produces more accurate results. Automatic generation of an
animation trace file requires instrumentation of a simulation model (i.e., including
additional code and statements in a simulation model). For example, Figure 14.14
shows how two new lines are created and added to the animation trace file describing a simple earthmoving operation as a result of a statement added to the simulation input file of the same operation. These two lines will be written to the trace file
numerous times with different arguments (e.g., time tag, duration, object name, route
name) depending on the specific instance of the activity taking place.
The completed trace file will contain other lines of text that will be written out when
other parts of the modeled operation take place. Thus, the time-ordered sequence of
animation statements written out by all the activities in the model during a simulation run constitutes the trace file required to visualize the modeled operations in AR.
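As a sketch of what such instrumentation looks like, the statement pair shown in Figure 14.14 can be emitted by a small helper attached to the simulation's activity events. The helper's name and signature are hypothetical, but the two output statements are those shown in the figure:

```python
def emit_travel(trace, sim_time, obj_name, route, duration):
    """Append a time-stamped ARVISCOPE TRAVEL statement to the trace."""
    trace.append(f"SIMTIME {sim_time:.2f};")
    trace.append(f"TRAVEL {obj_name} {route} {duration:.2f};")

# Each time the Return activity starts for a hauler instance, the
# instrumented model writes the pair of statements to the trace file.
trace = []
emit_travel(trace, 12.0, "Hauler1", "ReturnRoad", 15.0)
```

Called repeatedly with different time tags, durations, object names, and route names, this produces the time-ordered sequence of animation statements described above.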
FIGURE 14.14 Sample instrumentation of a DES input file for automated generation of the corresponding ARVISCOPE animation trace file: a STROBOSCOPE model of a load-haul-dump-return earthmoving cycle (Loader, Hauler, Soil) generates the ARVISCOPE statements SIMTIME 12.00; and TRAVEL Hauler1 ReturnRoad 15.00;.
It should be noted that the simulation input file partially shown in Figure 14.14 can be created by any DES authoring language such as STROBOSCOPE (State and Resource
Based Simulation of Construction Processes). STROBOSCOPE is a programmable and
extensible simulation system designed for modeling complex construction operations
in detail and for the development of special-purpose simulation tools (Martinez 1996).
ARVISCOPE supports animation scalability, which in the context of the authors' research is defined as the ability of the visualization to construct complex scenes that
potentially consist of a large number of CAD objects, and to maintain performance
levels as the size of the operation increases. Scalability allows the creation of very
complex scenes such as the erection of an entire structural steel frame consisting of
several beams and columns by loading only a few CAD models of steel sections,
and placing them repeatedly at appropriate locations using multiple transformation
nodes. Figure 14.15 shows animation snapshots of a structural steel erection operation that was modeled in STROBOSCOPE and visualized in full-scale outdoor AR
using ARVISCOPE. The multistory steel structure shown in this figure was completely modeled and animated in the augmented scene using CAD models of only
a few steel sections. The operation consisted of a virtual tower crane that picked up
steel sections and installed them in their appropriate locations on the steel structure.
14.3.1.2 SMART
Scalable and Modular Augmented Reality Template (SMART) is an extensible AR
computing framework that is designed to deliver high-accuracy and convincing augmented graphics that correctly place virtual contents relative to a real scene and
robustly resolve the occlusion relationships between them. SMART is built on top
of the previously designed ARVISCOPE platform (Section 14.3.1.1) and is a loosely
coupled interface that is independent of any specific engineering application or
domain. Instead, it can be readily adapted to an array of engineering applications
such as visual collision avoidance of underground facilities, postdisaster reconnaissance of damaged buildings, and visualization of simulated construction processes.
The inbuilt registration algorithm of SMART guarantees high-accuracy static alignment between real and virtual objects. Efforts have also been made to reduce dynamic misregistration, including the following:
1. In order to reduce synchronization latency, multiple threads are dynamically generated for reading and processing sensor measurements immediately upon data arrival in the host system.
2. The FIR filter applied to the jittering output of the electronic compass leads to filter-induced latency; therefore, an adaptive lag compensation algorithm is designed to eliminate the dynamic misregistration.
The SMART framework follows the classical model-view-controller (MVC) pattern. Scene-graph-controller is the implementation of the MVC pattern in SMART and is described in the following:
1. The model counterpart in SMART is the scene, which utilizes application-specific input/output (I/O) engines to load virtual objects, and which maintains their spatial and attribute status. The update of a virtual object's status is reflected when it is time to refresh the associated graphs.
2. The graph corresponds to the view and reflects the AR registration results for each frame update event. Given that the user's head can be in continuous motion, the graph always invokes callbacks to rebuild the transformation matrix based on the latest position and attitude measurements, and refreshes the background image.
3. The controller manages all user interface (UI) elements, and responds to a user's commands by invoking the member functions of its delegates, such as a scene or a graph.
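The division of labor above can be illustrated with a deliberately simplified sketch (in Python, although SMART itself is a C++ framework; all class and method names here are illustrative, not SMART's actual API):

```python
class Scene:
    """Model: loads virtual objects and maintains their status."""
    def __init__(self):
        self.objects = {}  # name -> world position (x, y)

    def load(self, name, position):
        self.objects[name] = position


class Graph:
    """View: re-registers every object against the latest head pose."""
    def __init__(self, scene):
        self.scene = scene
        self.registered = {}

    def refresh(self, head_pose):
        hx, hy = head_pose
        self.registered = {name: (x - hx, y - hy)
                           for name, (x, y) in self.scene.objects.items()}


class Controller:
    """Responds to UI commands by delegating to the scene and graph."""
    def __init__(self, scene, graph):
        self.scene, self.graph = scene, graph

    def on_load_command(self, name, position, head_pose):
        self.scene.load(name, position)  # update the model ...
        self.graph.refresh(head_pose)    # ... then refresh the view
```

The essential point is the direction of the dependencies: the controller writes to the scene, and the graph reads from the scene on every frame using the latest tracked pose.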
The SMART framework that is based on the scene-graph-controller setup is shown in Figure 14.16 and is constructed in the following way: the main entry of the program is CARApp, which is in charge of CARSensorForeman and CARSiteForeman.
The former initializes and manages all tracking devices such as real-time kinematic
(RTK) GPS receivers and electronic compasses, while the latter defines the relation
among scene, graphs, and controller. Once a CARSiteForeman object is initialized, it orchestrates the creation of CARScene, CARController, and CARGraph
and the connection of graphs to the appropriate scene. Applications derived from
SMART are single document interface (SDI). Therefore, there is only one open
scene and one controller within a SmartSite. The controller keeps pointers to the
graph and the scene.
14.3.2 Hardware Platforms
14.3.2.1 UM-AR-GPS-ROVER
The designed software interface must be accompanied by a robust and easy-to-deploy hardware platform that enables users to perform operations in both indoor and outdoor settings. Therefore, a first-generation wearable hardware apparatus called
FIGURE 14.16 UML class diagram of the SMART framework, showing CARApp, CARSensorForeman, CARSiteManager, SmartSite, CARTrackerCallback, CARControllerA, CARSceneA, CARGraphA, CARMotionTracker, CARStatementProcessor, CARAnimation, CARLocation, CAROrientation, and SMARTVideo and their associations.
FIGURE 14.17 Configuration of the UM-AR-GPS-ROVER backpack (labeled components: GPS sensor, tracker, video camera, head-mounted display, laptop, touch pad).
UM-AR-GPS-ROVER was designed in which GPS and three-DOF head orientation sensors were used to capture a user's position and direction of look (Behzadan et al. 2008). Figure 14.17 shows the configuration of the backpack and the allocation of
hardware. UM-AR-GPS-ROVER was equipped with the following:
1. Computing devices capable of rapid position calculation and image rendering, including an interface for external input (i.e., both user commands and a video capturing the user's environment)
2. An interface to display the final augmented view to the user
3. External power source for the hardware components to ensure continuous
operation without restricting user mobility
The design also had to take into account ergonomic factors to avoid user discomfort
after long periods of operation. Figure 14.18 shows the main hardware components
of UM-AR-GPS-ROVER that include a head-mounted display (HMD), user registration and tracking peripherals, and a mobile laptop computer to control and facilitate
system operation and user I/O devices.
14.3.2.2 ARMOR
As a prototype design, UM-AR-GPS-ROVER succeeded in terms of reusability and modularity, and produced sufficient results in proof-of-concept simulation animations. However, two primary design defects were inadequately addressed: accuracy and ergonomics. First, the insecure placement of tracking devices disqualifies the UM-AR-GPS-ROVER from the centimeter-accuracy-level goal. Second, packaging all devices, power panels, and wires into a single backpack makes it impossible to accommodate more equipment such as an RTK rover radio. The backpack was also too heavy for even distribution of weight around the body. The Augmented Reality Mobile OpeRation (ARMOR) platform evolved from the UM-AR-GPS-ROVER platform.
FIGURE 14.18 Main hardware components of UM-AR-GPS-ROVER (helmet components, backpack components, and input devices).
ARMOR introduces high-accuracy and lightweight devices, rigidly places all tracking
instruments with full calibration, and renovates the carrying harness to make it more
wearable. The improvements featured in ARMOR can be broken into four categories:
1. Highly accurate tracking devices with rigid placement and full calibration.
2. Lightweight selection of I/O and computing devices and external power
source.
3. Intuitive user command input.
4. Load-bearing vest to accommodate devices and distribute weight evenly
around the body.
An overview comparison between UM-AR-GPS-ROVER and ARMOR is listed in
Table 14.3. ARMOR can work in both indoor and outdoor modes. The indoor mode
does not necessarily imply that the GPS signal is unavailable, but that a qualified GPS signal is absent. The GPS signal quality can be extracted from the $GPGGA section of the GPS data string, which follows the National Marine Electronics Association (NMEA) format. The fix quality ranges from 0 to 8. For example, 2 means differential GPS (DGPS) fix, 4 means RTK fix, and 5 means float RTK. The user can define the standard (i.e., which fix quality is deemed qualified) in the hardware
TABLE 14.3
Comparison between UM-AR-GPS-ROVER and ARMOR Platforms

Location tracking: OmniStar XP (UM-AR-GPS-ROVER) versus RTK (ARMOR). OmniStar XP provides 10-20 cm accuracy, while RTK provides 2.5 cm horizontal accuracy and 3.7 cm vertical accuracy.
Orientation tracking: PNI TCM 5 versus PNI TCM XB. Same accuracy, but ARMOR places the TCM XB rigidly close to the camera.
Video camera: ARMOR uses the Microsoft LifeCam VX-5000, which is lightweight, has small volume, and requires less wiring.
Head-mounted display: ARMOR uses the Z800 3DVisor, which is lightweight with stereovision.
Laptop: ARMOR uses the Asus N10J, which is lightweight, has small volume, and is equipped with an NVIDIA GPU.
User command input: WristPC wearable keyboard and Cirque Smart Cat touchpad versus Wii Remote. The Wii Remote is lightweight and intuitive to use.
Power source: Fedco POWERBASE versus Tekkeon myPower ALL MP3750.
Backpack apparatus: Kensington contour laptop backpack versus load-bearing vest.
configuration file. When a qualified GPS signal is available, the geographical location is extracted from the $GPGGA section of the GPS data string. Otherwise, a preset
pseudo-location is used, and this pseudo-location can be controlled by a keyboard.
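The quality check described above amounts to reading field 6 (the fix-quality digit) of the GGA sentence. A minimal sketch, where the sample sentence and the accepted-quality set are illustrative:

```python
def gga_fix_quality(sentence):
    """Return the fix-quality digit (0-8) from an NMEA GGA sentence."""
    fields = sentence.split(",")
    if not fields[0].endswith("GGA"):
        raise ValueError("not a GGA sentence")
    return int(fields[6])  # field 6 of GGA is the fix quality


def is_qualified(sentence, accepted=(4, 5)):
    """Example policy: accept only RTK (4) or float-RTK (5) fixes."""
    return gga_fix_quality(sentence) in accepted


sample = "$GPGGA,123519,4807.038,N,01131.000,E,4,08,0.9,545.4,M,46.9,M,,*47"
```

If `is_qualified` returns False, ARMOR would fall back to the preset pseudo-location as described above; the accepted set itself is what the hardware configuration file lets the user define.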
The optimization of all devices in aspects such as volume, weight, and rigidity allows all components to be compacted and secured into one load-bearing vest. Figure 14.19 shows the configuration of the ARMOR backpack and the allocation of hardware. There are three primary pouches: the back pouch accommodates the AgGPS 332 receiver, the SiteNet 900 is stored in the right-side pouch, and the left-side pouch holds the HMD connect interface box to the PC and the MP3750 battery. An Asus N10J netbook is securely tied to the inner part of the back pouch. All other miscellaneous accessories (e.g., USB-to-serial-port hubs, AAA batteries) are distributed in the auxiliary pouches. The wire lengths are customized to the vest, which minimizes outside exposure. The configuration of the vest has several advantages over the Kensington Contour laptop backpack used by ARVISCOPE. First, the design of the pouches allows for an even distribution of weight around the body. Second, the separation of devices allows the user to conveniently access and check the condition of certain hardware. Third, different parts of the loading vest are loosely joined
FIGURE 14.19 Configuration of the ARMOR load-bearing vest (labeled components: GPS antenna, electronic compass, HMD, camera, RTK rover radio antenna, RTK rover radio, netbook).
so that the vest can fit any body type, and be worn rapidly even when fully loaded.
ARMOR has been tested by several users for outdoor operations that lasted for over
30 continuous minutes, without any interruption or reported discomfort.
Calculating the IDR commonly follows contact (specifically the double integration
of acceleration) or noncontact (vision-based or laser scanning) methods. Skolnik and
Wallace (2010) discussed that the double integration method may not be well suited for
nonlinear responses due to sparse instrumentation or subjective choices of signal processing filters. On the other hand, most vision-based approaches require the preinstallation of
a target panel or emitting light source that may not be widely available and can be subject
to damage during long-term maintenance. Examples of these approaches can be found in
Wahbeh et al. (2003) (tracking an LED reference system with a high-fidelity camera) and Ji (2010) (using feature markers as reference points for vision reconstruction).
Fukuda etal. (2010) tried to eliminate the need for target panels by using an object
recognition algorithm called orientation code matching. They performed comparison experiments by tracking a target panel and existing features on bridges such
as bolts, and achieved satisfactory agreement between the two test sets. However,
it is not clear whether this approach performs well when monitoring a building's structure, as building surfaces are usually featureless. In addition, deploying laser scanners for continuous or periodic structural monitoring (Alba et al. 2006, Park et al. 2007), in spite of their high accuracy, may not be feasible for rapid evaluation scenarios given the equipment volume and the large collected datasets.
Kamat and El-Tawil (2007) first proposed the approach of projecting the previously
stored building baseline onto the real structure, and using a quantitative method to
count the pixel offset between the augmented baseline and the building edge. Despite
the stability of this method, it required a carefully aligned perpendicular line of sight
from the camera to the wall for pixel counting. Such orthogonal alignment becomes
unrealistic for high-rise buildings, since it demands the camera and the wall to be
at the same height. Dai et al. (2011) removed the premise of orthogonality using a
photogrammetry-assisted quantification method, which established a projection relation between 2D photo images and the 3D object space. They validated this approach
with experiments that were conducted with a two-story reconfigurable aluminum
building frame whose edge could be shifted by displacing the connecting bolts.
However, the issue of automatic edge detection and the feasibility of deploying such a
method at large scales, for example, with high-rise buildings, have not been addressed.
In this section, a new algorithm called the line segment detector (LSD) for automating edge extraction and a new computational framework for automating the damage detection procedure are introduced. In order to verify the effectiveness of these methods, a synthetic virtual prototyping (VP) environment was designed to profile the detection algorithm's sensitivity to errors inherent in the tracking devices used. Figure 14.20 shows the schematic overview of measuring earthquake-induced damage manifested as a detectable drift in a building's façade. The previously stored building information is retrieved and superimposed as a baseline wireframe image on the real building structure after the damage. The sustained damage can then be evaluated by comparing the key differences between the augmented baseline and the actual drifting building edge. Figure 14.20 also demonstrates the hardware prototype ARMOR (Dong and Kamat 2010) on which the developed application can be deployed. The inspector wears a GPS antenna and an RTK radio that communicates with the RTK base station. Together they can track the inspector's position up to the centimeter-accuracy level. The estimation procedure and the final results can be shown to the inspector in an HMD.
FIGURE 14.21 Graphical discrepancy between the vertical baseline and the detected building edge provides hints about the magnitude of the local damage. (a) Small gap indicates less local damage. (b) Large gap indicates more local damage. (From Dong, S. et al., Autom. Construct., 33, 24, 2013.)
The fact that the gap between the detected edge and the vertical baseline in Figure 14.21a is smaller than that of Figure 14.21b indicates that the key location in Figure 14.21b suffered more local damage than that in Figure 14.21a.
14.4.1.1 Technical Approach for AR-Based Damage Assessment
A synthetic 3D environment based on VP principles was designed to demonstrate
and evaluate the computational framework, verify the developed algorithms, and
conduct sensitivity analysis. Figure 14.22 shows a 10-story building model simulated
in the VP environment. The graphical model is entirely reconfigurable and capable
FIGURE 14.22 A 10-story graphical and structural building model constructed in the VP environment. (a) Plan view. (b) Elevation view. (From Dong, S. et al., Autom. Construct., 33, 24, 2013.)
of manifesting any level of internal damage on its façade in the form of residual drift, so that the IDR can be extrapolated for each floor. Given the input IDR, the structural
macro model predicts the potential for structural collapse and the mode of collapse,
should failure occur. The residual drift is represented by translating the joints of the
wireframe model that have been superimposed with a high-resolution faade texture.
The drift is further manifested through the displaced edges on the surface texture
that can be extracted using the LSD method.
Subsequently, the 2D intersections between extracted edges and projected horizontal baselines are used to triangulate the 3D spatial coordinates at key locations on the building. The 2D image, where extracted edges and baselines are visible, is taken by the OpenGL camera that is set up at specified corners in the vicinity of the building. At each corner, the camera's orientation (i.e., pitch) is adjusted to take a snapshot of each floor in sequence and project the corresponding horizontal baseline. In reality, the location of the camera may be tracked by an RTK-GPS sensor, and its orientation may be monitored with an electronic compass. In the simulated VP environment, the location and orientation of the camera are known and can be controlled via its software interface. Random errors can thus be introduced to simulate the effects of systematic tracking uncertainty or jitter expected in a field implementation.
In order to imitate the structural damage sustained after the disaster, a uniformly distributed drift of [-0.06 m, 0.06 m], in both the x and y directions, is applied to each joint of the building so that the difference between consecutive floors in either the x or y direction is less than 0.12 m. The selection of the damage model is driven by the requirement on inelastic IDR, which is commonly limited to 2.5% by building codes and is occasionally relaxed to 3% for tall buildings (Hart 2008). Given that the average height of a building floor is 3-4 m, the maximum allowable displacement between two consecutive floors will be 0.09-0.12 m when using the most relaxed IDR of 3%.
In addition, a reasonable assumption is made that unless the internal columns buckle
or collapse, the height of the building remains the same after the damage. Since the
column buckling or collapse situation is not modeled in the simulation, the z value of
the corner coordinate does not change. Each polygon vertex is assigned a 2D texture
coordinate, and the associated clipped texture is pasted onto the surface of the wall
(Figure 14.23). The texture can thus displace with the drifting vertex in the 3D space,
with the goal of estimating the vertex deformation through the displaced texture.
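The drift bound used above follows directly from the IDR definition: the relative horizontal displacement of two consecutive floors divided by the story height. A minimal sketch:

```python
def interstory_drift_ratio(d_lower, d_upper, story_height):
    """IDR: relative horizontal displacement between two consecutive
    floors, divided by the story height (all in meters)."""
    return abs(d_upper - d_lower) / story_height

# With the most relaxed 3% IDR and a 4 m story, the allowable relative
# displacement is 0.03 * 4.0 = 0.12 m, matching the simulated drift bound.
```

Evaluating this per floor from the triangulated corner coordinates yields the IDR profile that the structural macro model takes as input.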
14.4.1.2 Vertical Edge Detection
Vertical edge detection of the building wall is the most critical step for locating the
key point on the 2D image plane, which happens to be a fundamental problem in
FIGURE 14.23 Internal structural damage (shift of the vertex) is expressed through the displacement of the texture. (From Dong, S. et al., Autom. Construct., 33, 24, 2013.)
image processing and computer vision domains as well. Many algorithms for edge
detection exist and most of them use the Canny edge detector (Canny 1986) and
Hough transformation (Duda and Hart 1972) as a benchmark. However, standard
algorithms are subject to two main limitations. First, they face threshold dependency
(resulting from adjusting algorithm parameters that influence its accuracy). Second,
they may generate false positives and negatives by either detecting too many irrelevant small line segments or failing to interpret the desirable line segments. False
positives and negatives are highly related to the threshold tuning.
In the authors' research, the graph-cut-based active contour (GCBAC) algorithm (Xu and Bansal 2003) was initially used. By employing the concept of contour neighborhood, GCBAC alleviates the local minima trapping problem suffered by traditional active contours. However, GCBAC requires manual specification of the initial contour and contour neighborhood width, quantities that are arbitrary and subjective. Optimization can be achieved by using the original baseline of the damaged building to numerically calculate both the initial contour and neighborhood width. GCBAC works best when the image covers the entire outline of the building, which is not practical in real applications. Unfortunately, covering the entire high-rise building surface inevitably results in lower-resolution details. Moreover, frequent partial occlusion from trees and other buildings can compromise detection accuracy.
The second attempt was a linear-time LSD that gives accurate results, a controlled number of false detections, and (most importantly) requires no parameter tuning (von Gioi et al. 2010). This method outperforms GCBAC in searching for localized line segments. However, as shown in Figure 14.24, there may still be multiple line segment candidates in the neighborhood of the actual edge of the building wall. A filter is used to eliminate those line segments whose slope and boundary deviate significantly from the original baseline. Manual selection can be used if the LSD fails to locate a desirable edge; the user can manually transpose the closest line segment to the desirable position in a short amount of time.
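A sketch of such a geometric filter (the thresholds here are illustrative, not the values used in the authors' implementation): keep only the segments whose orientation is within a few degrees of the baseline and whose midpoint lies within a pixel band around it.

```python
import math


def filter_segments(segments, baseline, max_angle_deg=5.0, max_offset=20.0):
    """Keep line segments close to the projected baseline in both slope
    and position. Segments and baseline are ((x1, y1), (x2, y2))."""
    (bx1, by1), (bx2, by2) = baseline
    b_angle = math.atan2(by2 - by1, bx2 - bx1)
    kept = []
    for (x1, y1), (x2, y2) in segments:
        # Undirected angular deviation from the baseline orientation.
        d = abs(math.atan2(y2 - y1, x2 - x1) - b_angle) % math.pi
        d = min(d, math.pi - d)
        if math.degrees(d) > max_angle_deg:
            continue
        # Perpendicular distance of the segment midpoint to the baseline.
        mx, my = (x1 + x2) / 2, (y1 + y2) / 2
        num = abs((bx2 - bx1) * (by1 - my) - (bx1 - mx) * (by2 - by1))
        if num / math.hypot(bx2 - bx1, by2 - by1) <= max_offset:
            kept.append(((x1, y1), (x2, y2)))
    return kept
```

A horizontal spurious segment fails the angle test, and a parallel segment on a neighboring façade feature fails the offset test, leaving only the candidates near the true building edge.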
FIGURE 14.24 A geometric filter together with minimal manual reinforcement can efficiently eliminate most irrelevant line segments. (From Dong, S. et al., Autom. Construct., 33, 24, 2013.)
FIGURE 14.25 The detectable gap between the original baseline and real edge enlarges as the camera gets closer to the building. (From Dong, S. et al., Autom. Construct., 33, 24, 2013.)
FIGURE 14.26 Alignment of the horizontal baseline with the building edge: shifting the two ends of the baseline by (a) different distances costs O(n⁴) and (b) the same distance costs O(n²). (From Dong, S. et al., Autom. Construct., 33, 24, 2013.)
366
NO
NO
YES
Have 2D intersection
points from two aspects?
YES
Triangulate the 3D coordinate
of the corner from
intersections on two images
YES
Filter the reconstruction
results to find
the best estimation
FIGURE 14.27 Flowchart of the proposed corner detection algorithm. (From Dong, S. etal.,
Autom. Construct., 33, 24, 2013.)
367
P1
P2
FIGURE 14.28 IDR calculation for a sample floor. (From Dong, S. etal., Autom. Construct.,
33, 24, 2013.)
FIGURE 14.29 Observing angle of the camera: (a) covering both sides of the building, (b) covering both sides of the building, (c) covering only one side of the building, and (d) covering only one side of the building. (From Dong, S. et al., Autom. Construct., 33, 24, 2013.)
is closer to a right angle. In the second group, one image covered both sides, while the other covered only one side. In the third group, both images covered only one side, as shown in Figure 14.29c and d. In the latter two cases, the observing angle is closer to 180°. It was concluded that the accuracy degenerates significantly when covering only one side of the wall. This indicates that the detection error is minimized when the angle formed by the two lines is close to a right angle, and magnified when the angle is either acute or obtuse. In addition, it was found that a higher image resolution helps promote the accuracy of the LSD algorithm, which in turn increases the overall accuracy.
The goal of the second set of experiments was to test the robustness of the developed algorithm in the presence of instrument errors. The experiments were conducted
with the best configuration, as found from the results of the first set of experiments
(i.e., the camera of 18 megapixels was placed about 35 m away from the building and
provided coverage of both sides of the building). In the first test, ground-truth orientation data were assumed and only location error was introduced. In Figure 14.30, the z-axis shows the average estimation error in meters. The altitude RMS axis shows the accuracy response to the change in RTK-GPS altitude measurement uncertainty, and the longitude and latitude RMS axis shows the accuracy response to the change in both RTK-GPS longitude and latitude measurement uncertainty.
The results indicated that uncertainty in longitude and latitude has a bigger impact on the displacement error than altitude uncertainty does, as indicated by the diagonal arrow in Figure 14.30. The results also showed that longitude and latitude uncertainties smaller than 3 mm can achieve a measurement accuracy of 5 mm, as indicated by the left-to-right arrow in Figure 14.30.
Given that the displacement error is linear in the GPS location accuracy, state-of-the-art RTK-GPS can meet the precision requirement. For example, manufacturer-specified
FIGURE 14.30 Sensitivity of computed drift to camera position errors. (From Dong, S. et al., Autom. Construct., 33, 24, 2013.)
accuracy reports uncertainty of 1 mm (RMS) in latitude and longitude, and 2-3 mm (RMS) in altitude, in which case the displacement error stays below 5 mm (Trimble 2009).
The second test assumed ground truth location data, and only introduced error to orientation. In Figure 14.31, the z-axis shows the average estimation error in meters. The
pitch and roll RMS-axis shows the accuracy response to the change in electronic compass pitch and roll reading uncertainty, and the yaw RMS-axis shows the accuracy
response to the change in electronic compass yaw reading uncertainty.
The results indicated that uncertainty in pitch and roll has a more adverse impact
on the displacement error than the yaw does, as shown by the right-to-left arrow in
Figure14.31. Furthermore, a precision of 0.01 (RMS) on all three axes is required to
keep the displacement error in the useful range, as indicated by the left-to-right arrow in
Figure 14.31. Unfortunately, state-of-the-art electronic compasses mostly cannot satisfy
this precision requirement. Most off-the-shelf electronic compasses report uncertainty
bigger than 0.1 (RMS), thus suggesting the need for survey-grade line-of-sight tracking
methods for monitoring the cameras orientation. The third test considered the combined
error from both location and orientation readings. As shown in Figure 14.32, results
indicated that the uncertainty from the electronic compass is the critical source of error.
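The leverage that angular uncertainty has over the displacement error follows from small-angle geometry: at an observing distance D, an orientation error θ shifts the projected line of sight by roughly D·tan θ. A minimal sketch (the distance and tolerance values here are illustrative, not taken from the experiments):

```python
import math

def displacement_error(distance_m: float, angle_error_deg: float) -> float:
    """Approximate lateral error (m) caused by an angular error at a
    given observing distance, using small-angle pinhole geometry."""
    return distance_m * math.tan(math.radians(angle_error_deg))

# A 0.1 deg (RMS) compass error at a 10 m observing distance already
# exceeds a 5 mm accuracy budget, while 0.01 deg stays well within it.
print(displacement_error(10.0, 0.1) * 1000)   # lateral error in mm
print(displacement_error(10.0, 0.01) * 1000)  # lateral error in mm
```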
In summary, the experimental results with ground truth location and orientation data proved satisfactory for damage detection requirements. The results
also highlight the conditions for achieving the ideal measurement accuracy, for
example, observing distance, angle, and image resolution. The experimental results
with instrumental errors reveal the bottleneck in field implementation. While the
state-of-the-art RTK-GPS sensor can meet the location accuracy requirement, the
electronic compass is not accurate enough to supply qualified measurement data,
suggesting that alternative survey-grade orientation measurement methods must
be identified to replace electronic compasses. The conducted sensitivity analysis
FIGURE 14.31 Sensitivity of computed drift to camera orientation errors. (From Dong, S. et al., Autom. Construct., 33, 24, 2013.)
FIGURE 14.32 Sensitivity of computed drift to camera location and orientation errors. (From Dong, S. et al., Autom. Construct., 33, 24, 2013.)
[Figure: One-call center shares information with member utility owners → Utility owners dispatch field locators and markers → Expected utility locations are marked with paint/stakes/flags.]
1. Lack of persistent visual guidance for spatial awareness: While the practice of marking utility locations on the ground helps in initial excavation planning, such surface markings (e.g., paint, stakes, flags) are the first to be destroyed or dislocated when excavation begins and the top soil or surface is scraped. This makes it challenging for an excavator operator to maintain spatial orientation; the operator must then rely on memory and judgment to recollect expected utility locations as excavation proceeds. Seemingly innocuous events such as returning to work after a break or stopping to help another crew can prove detrimental because the operator must now recall the marked locations before continuing to dig. Thus, excavator operators and field supervisors have no persistent visual clues that can help them be spatially aware of the underground space surrounding an excavator's digging implement. Minor lapses in orientation or in recollecting marked utility locations can thus easily lead to accidents.
2. Inability to gauge proximity of excavator to buried assets while digging: Another significant limitation is that an operator has no practical means of knowing the distance of an excavator's digging implement (e.g., bucket) to the nearest buried obstructions until they are exposed. Excavation guidelines in most states, including Michigan, require buried utilities to be hand exposed prior to the use of any power equipment (MDS 2007). Failure to follow the hand exposure guidelines, which happens often out of ignorance or as a conscious decision, means that the first estimate of proximity an operator receives is when the digging implement actually touches a buried utility. It is easy to understand why this first touch can often actually be a strike.
Field locators and markers typically mark only the horizontal location of utilities on the ground, and often do not include any depth or other attribute information that may possibly help operators better perceive their location in three dimensions (MDS 2009). A possible justification for this practice is that locators believe that there are standard depths at which each utility type is typically buried. Second, even if depth and other attribute information is marked on the ground along with a utility's horizontal location, the markings are destroyed early in the excavation process, placing the burden of remembering a utility's expected depth and orientation on the operator. Excavator-mounted devices such as EZiDig (CD 2010) can be partially useful by providing physical evidence of certain utilities (e.g., metallic pipes) existing in an excavator's path. However, as noted earlier, accurate geolocation of all utilities may need a multisensory approach that can be pursued prior to excavation, but is impractical to implement on a digging excavator (Costello et al. 2007). Thus, without any visual cues or quantitative feedback, excavator operators find it very challenging to gauge the evolving distance between a digging machine and any unexposed utilities that lie buried in the vicinity.
14.4.2.1 Technical Approach for Visualization of Buried Utility Geodata in Operator-Perspective AR
AR visualization can be achieved with optical see-through or video see-through HMDs worn by the user to view a composite scene (Azuma et al. 2001). The form of AR visualization most suitable for this application is video see-through AR, where a video camera abstracts the user's view, and graphics are registered and superimposed on the video feed to create a continuous AR composite display (Kanbara et al. 2000). In order to achieve this, the line of sight (position and orientation) of the video camera must be continuously tracked so that the projection of the superimposed graphics in each frame can be computed relative to the camera's pose. As shown in Figure 14.34, using AR visualization, as an excavator digs, the goal is to have superimposed, color-coded geodata graphics that stay fixed to their intended ground locations to continuously help orient the operator.
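The registration step just described can be sketched with a simple pinhole model: rotate and translate each geodata vertex into the camera frame using the tracked pose, then project it into pixel coordinates. The Z-forward camera convention, the Z-Y-X Euler order, and all parameter names are assumptions of this sketch, not details from the chapter:

```python
import numpy as np

def rotation_matrix(yaw, pitch, roll):
    """World-to-camera rotation from yaw (about z), pitch (about y),
    and roll (about x) in radians, using a Z-Y-X convention (assumed)."""
    cy, sy = np.cos(yaw), np.sin(yaw)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cr, sr = np.cos(roll), np.sin(roll)
    Rz = np.array([[cy, -sy, 0], [sy, cy, 0], [0, 0, 1]])
    Ry = np.array([[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]])
    Rx = np.array([[1, 0, 0], [0, cr, -sr], [0, sr, cr]])
    return (Rz @ Ry @ Rx).T  # transpose: camera-to-world -> world-to-camera

def project(point_w, cam_pos, yaw, pitch, roll, f_px, cx, cy):
    """Pinhole projection of a world point into pixel coordinates,
    assuming the camera looks along its +Z axis."""
    R = rotation_matrix(yaw, pitch, roll)
    p_cam = R @ (np.asarray(point_w, float) - np.asarray(cam_pos, float))
    u = cx + f_px * p_cam[0] / p_cam[2]
    v = cy + f_px * p_cam[1] / p_cam[2]
    return u, v

# A point 5 m straight ahead of an un-rotated camera lands at the
# principal point of the image.
print(project([0, 0, 5], [0, 0, 0], 0, 0, 0, 800, 320, 240))
```

Re-evaluating this projection for every geodata vertex at every frame is what keeps the superimposed graphics fixed to their ground locations as the camera pose changes.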
[Figure 14.34 flowchart: Geodata from archived GIS or geophysical surveys → Characterize and convert geodata to annotated 3D models → Measure position and orientation of cab-mounted video camera → Register 3D models for visualization in operator-view augmented reality.]
FIGURE 14.34 Overview of the designed approach for visualization of buried asset geodata in operator-view AR.
However, in practice, a first-person operator's perspective is not always feasible due to the physical encumbrance and potential distractions associated with traditional AR displays (e.g., HMDs). In order to provide a continuous AR display for operator spatial awareness, the most suitable approach is to enable AR visualization on a screen display mounted in the excavator's cabin (Figure 14.35). Such a display, when strategically mounted in the operator's field of view, may serve as a continuous excavation guidance tool while providing an unobstructed view of the jobsite.
In order to abstract the operator's viewpoint, a high-speed FireWire camera is mounted on the roof of the excavator's cabin. The position of the camera will be continuously tracked using RTK-GPS, providing a location uncertainty of 1–2 in. Tracking of the excavator's 3D attitude (heading, pitch, and roll) can be intuitively done using a magnetic orientation sensor. However, the sensor may produce
noisy data due to the excavator's metallic construction. In this case, an alternative approach is to use multiple dual-antenna RTK-GPS receivers. The antennas should be mounted on an excavator's cabin in such a way that their simultaneous position tracking helps interpret a directional vector in 3D space corresponding to the cabin's articulation. A similar idea based on tracking the GPS locations of multiple points on a machine has been attempted before for grade control and blade positioning on bulldozers and graders (Roberts et al. 2002b). Having estimated the location of the viewpoint (i.e., camera) and the orientation of its line of sight, a trigonometric 3D registration algorithm is used to compute the pose of the geodata graphics that will be superimposed at each frame to create a composite AR view.
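As a hedged sketch of the dual-antenna idea, the baseline vector between two simultaneously tracked antennas yields heading and pitch directly; roll requires a third non-collinear tracked point, which this sketch omits. Coordinates are assumed to be local east-north-up (ENU) in meters:

```python
import math

def attitude_from_antennas(e1, n1, u1, e2, n2, u2):
    """Heading and pitch (degrees) of the baseline vector between two
    RTK-GPS antennas, given their local east-north-up coordinates.
    Heading is measured clockwise from north."""
    de, dn, du = e2 - e1, n2 - n1, u2 - u1
    heading = math.degrees(math.atan2(de, dn)) % 360.0
    pitch = math.degrees(math.atan2(du, math.hypot(de, dn)))
    return heading, pitch

# A level baseline pointing due east: heading 90 deg, pitch 0 deg.
print(attitude_from_antennas(0, 0, 0, 2, 0, 0))
```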
14.4.2.2 Processing Geodata Vectors and Attributes for AR Visualization
The advent of Geographic Information System (GIS) storage and retrieval, along with accurate GPS data collection, has allowed fundamental improvements in the methods used to collect and depict utility data (Anspach 2011). Utility owners now typically have a data record of their underground assets in some form of a GIS. For example, in the U.S. state of Michigan, DTE (a major utility provider) archives its geodata as GIS shapefiles using a proprietary database. In order to visualize such GIS geodata in AR, the geodata must first be converted to a format suitable for graphical visualization. More importantly, the geodata accuracy and associated reliability must be characterized before it can be used as information support for excavator operation and control. In order to address the wide disparity in the source, quality, age, and completeness of utility records, state-of-the-art utility GIS tools characterize geodata in terms of its precision and its pedigree (ProStar 2011). Precision characterizes the accuracy that can be associated with the geodata (e.g., Map Grade for subfoot, or Survey Grade for 4 in.). Pedigree, on the other hand, describes the lineage of the geodata and its attributes (e.g., as-designed or as-built, type of utility, identity of data collector, device used, aerial imagery). Together, the precision and pedigree help characterize the accuracy of a geodata set and the reliability that can be associated with its source. This information can be used to display not only the expected locations of utilities to an operator, but also the degree of uncertainty (or buffer) associated with the expected locations.
GIS shapefiles can be converted to a graphical 3D format using XML-based
encoding schemes such as the Geography Markup Language (GML) (Burggraf 2006).
FIGURE 14.36 Geodata converted to annotated 3D models. (a) Google Earth view. (From Talmaki, S. et al., Geospatial databases and augmented reality visualization for improving safety in urban excavation operations, Proceedings of the 2010 Construction Research Congress: Innovation for Reshaping Construction Practice, American Society of Civil Engineers, Reston, VA, 2010, pp. 91–101.) (b) Open Scene Graph (OSG) AR.
[Figure: Emulation in 3D world populated with buried utility geodata models → Detection of proximity to buried assets in excavator vicinity → Real-time knowledge-based excavator operation.]
generally consists of N moving and M stationary objects, where both N and M can
be arbitrarily large. In this case, for every monitored excavator (N), the task is to
estimate its proximity to M stationary objects representing buried assets.
The geometric computation of object-to-object proximity is an intensive task.
Efficiency in proximity computing algorithms is thus paramount. In order to address
this challenge, a two-phase, hierarchical approach based on a method originally presented in Larsen et al. (1999) was investigated. Since activities on an excavation site are typically spread laterally, not all object pairs are possibly in close vicinity at any given time. A two-phase approach is efficient for this task because pairs of excavator–utility objects that are not within interacting distance can be quickly eliminated. This can possibly limit detailed pair-wise proximity tests to only those object pairs that are within interacting distance (i.e., utilities in the vicinity of the digging excavator). Figure 14.40 presents the two-stage proximity detection approach. In particular, following each excavator motion update, a quick approximate test based on loosely fitting bounding volumes called axis-aligned bounding boxes (AABBs) first finds potentially interacting excavator–utility object pairs, using a variation of the N-body sweep and prune approach originally proposed by Cohen et al. (1995). After identifying pairs of potentially interacting objects, an exact two-level test computes the Euclidean distance between the objects in each pair to determine the excavator's proximity to nearby utilities. In this stage, the algorithm constructs another kind of imaginary bounding volume called swept sphere volumes (SSVs) around the digging implement and each buried asset object (Larsen et al. 1999). In order to improve computational efficiency, these bounding volumes will then be organized into bounding volume hierarchies (BVHs), on which proximity tests will be attempted.
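The two phases can be illustrated in miniature: an AABB overlap test stands in for the broad-phase pruning, and an exact point-to-segment distance stands in for the SSV narrow phase. Real SSV/BVH tests, as in Larsen et al. (1999), operate on whole equipment and pipe geometries; the simplification to a single bucket-tip point is mine:

```python
from math import dist

def aabb_overlap(a, b):
    """Overlap test for axis-aligned boxes given as
    ((xmin, ymin, zmin), (xmax, ymax, zmax))."""
    return all(a[0][i] <= b[1][i] and b[0][i] <= a[1][i] for i in range(3))

def broad_phase(excavator_box, utility_boxes):
    """Phase 1: keep only utilities whose AABB can interact with the
    excavator's AABB (a reduced, single-moving-object form of pruning)."""
    return [k for k, box in enumerate(utility_boxes)
            if aabb_overlap(excavator_box, box)]

def point_segment_distance(p, a, b):
    """Phase 2: exact Euclidean distance from the digging-implement tip
    p to a pipeline segment a-b (a stand-in for the full SSV test)."""
    ab = [b[i] - a[i] for i in range(3)]
    ap = [p[i] - a[i] for i in range(3)]
    denom = sum(c * c for c in ab)
    t = 0.0 if denom == 0 else max(0.0, min(1.0,
        sum(ap[i] * ab[i] for i in range(3)) / denom))
    closest = [a[i] + t * ab[i] for i in range(3)]
    return dist(p, closest)
```

Only the (usually few) pairs surviving the broad phase pay the cost of the exact distance computation, which is what makes the hierarchy worthwhile when M is large.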
[Figure 14.40 flowchart: Incremental motion in the emulated 3D virtual world → Prune distant object pairs with AABBs → Flagged object pairs (possible proximity) → Exact pair-wise proximity tests with SSVs → Object pairs in close proximity (confirmed proximity) → Analysis and warning response with response parameters; no-proximity pairs loop back to the next motion update.]
The excavator's current articulation and the location of the nearest buried assets are then projected on the display along with the AR view. The computed distance to the nearest buried assets is also displayed, along with identifying attributes to help the operator correlate the presented information to the utility lines displayed in the AR view.
14.4.2.5 Experimental Results
The robustness of the designed excavator–utility proximity detection methodology was tested in a series of field experiments using ARMOR and the SMART framework. In particular, electricity conduits in the vicinity of the G.G. Brown Building at the University of Michigan were exported as Keyhole Markup Language (KML) files from a Geodatabase provided by the DTE Energy Company. The following procedure interprets KML files and builds conduit models:
1. Extract the spatial and attribute information of pipelines from the KML file using libkml, a library for parsing, generating, and operating on KML (Google 2008). For example, the geographical location of pipelines is recorded under the Geometry element as a LineString (Google 2012). A cursor is thus designed to iterate through the KML file, locate LineString elements, and extract the geographical locations.
2. Convert consecutive vertices within one LineString from the geographical
coordinate to the local coordinate in order to raise computational efficiency
during the registration routine. The first vertex on the line string is chosen
as the origin of the local coordinate system, and the local coordinates of
the remaining vertices are determined by calculating the relative 3D vector between the rest of the vertices and the first one, using the Vincenty
algorithm.
3. In order to save storage memory, a unit cylinder is shared by all pipeline segments as the primitive geometry upon which each transformation matrix is built.
4. Scale, rotate, and translate the primitive cylinder to the correct size, attitude, and position. For simplicity, the normalized vector between two successive vertices is named the pipeline vector. First, the primitive cylinder is scaled along the X- and Y-axes by the radius of the true pipeline, and then scaled along the Z-axis by the distance between the two successive vertices. Second, the scaled cylinder is rotated about the axis formed by the cross product between the vector (0, 0, 1) and the pipeline vector, by the angle whose cosine is the dot product between the vector (0, 0, 1) and the pipeline vector. Finally, the center of the rotated cylinder is translated to the midpoint between the two successive vertices. This step is applied to each pair of successive vertices.
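Steps 1 through 4 can be sketched end to end. Two substitutions keep the sketch self-contained and are mine, not the chapter's: the stdlib ElementTree replaces libkml in step 1, and a flat-earth approximation replaces the Vincenty algorithm in step 2 (adequate only over jobsite-scale extents). Step 4's scale-rotate-translate order is preserved via Rodrigues' rotation formula:

```python
import math
import xml.etree.ElementTree as ET
import numpy as np

NS = "{http://www.opengis.net/kml/2.2}"
R_EARTH = 6_378_137.0  # WGS-84 equatorial radius (m)

def read_linestrings(kml_text):
    """Step 1: iterate the KML document and pull (lon, lat, alt) vertex
    triples out of every LineString's <coordinates> element."""
    root = ET.fromstring(kml_text)
    return [[tuple(map(float, triple.split(",")))
             for triple in ls.find(NS + "coordinates").text.split()]
            for ls in root.iter(NS + "LineString")]

def to_local(vertices):
    """Step 2: convert vertices to meters relative to the first vertex
    (flat-earth stand-in for the Vincenty algorithm)."""
    lon0, lat0, alt0 = vertices[0]
    k = math.cos(math.radians(lat0))
    return [np.array([math.radians(lon - lon0) * R_EARTH * k,
                      math.radians(lat - lat0) * R_EARTH,
                      alt - alt0]) for lon, lat, alt in vertices]

def segment_transform(p1, p2, radius):
    """Steps 3-4: 4x4 matrix that scales, rotates, and translates a
    shared unit cylinder (axis +Z, height 1, centered at the origin)
    onto the segment p1-p2."""
    v = p2 - p1
    length = float(np.linalg.norm(v))
    v_hat = v / length
    S = np.diag([radius, radius, length, 1.0])          # scale
    axis = np.cross([0.0, 0.0, 1.0], v_hat)
    s, c = float(np.linalg.norm(axis)), float(v_hat[2])
    R = np.eye(4)
    if s > 1e-12:                                        # rotate (Rodrigues)
        k = axis / s
        K = np.array([[0, -k[2], k[1]], [k[2], 0, -k[0]], [-k[1], k[0], 0]])
        R[:3, :3] = np.eye(3) + s * K + (1 - c) * (K @ K)
    elif c < 0:                                          # segment along -Z
        R = np.diag([1.0, -1.0, -1.0, 1.0])
    T = np.eye(4)
    T[:3, 3] = (p1 + p2) / 2.0                           # translate to midpoint
    return T @ R @ S
```

A conduit model is then just the list of segment_transform(p, q, radius) matrices for consecutive local-coordinate vertex pairs of each extracted line string.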
Figures 14.41 through 14.43 show several snapshots corresponding to different steps
of these field experiments.
FIGURE 14.41 Conduit loading procedure, conduits overlaid on Google Earth and field
experiment results.
FIGURE 14.42 Labeling attribute information and color coding on the underground utilities.
operational details such as the maneuverability of trucks and backhoes in excavation areas, and the deployment of cranes and materials in steel erection. Such tasks require careful and detailed planning and validation, so as to maximize resource utilization and to identify hidden spatial collisions and temporal conflicts. Therefore, this visualization paradigm can help engineers in validating and verifying operational concepts, checking for design interferences, and estimating overall constructability (Kamat and Martinez 2003).
From the point of view of collaborative learning, however, there are still major gaps between research advancements in visualization and the integration of visualization techniques in pedagogical settings, as well as outstanding implementation challenges that need to be properly addressed. This section describes the latest work by the authors in using AR as an interconnecting medium to bridge the gap between computer-based dynamic visualization and a paper-based collaboratively shared workspace. AR is one of the most promising candidates because it blends computer-generated graphics with real scene backgrounds using real-time registration algorithms. Users can work across the table face-to-face, shift the focus of the shared workspace instantly, and jointly analyze dynamic engineering scenarios. This idea is developed and implemented in Augmented Reality Vitascope (ARVita), in which multiple users wearing HMDs can observe and interact with dynamic simulated construction activities laid on the surface of a table.
Compared to VR, AR can enhance the traditional learning experience because
1. The ability to learn concepts and ideas through interacting with a scene and building one's own knowledge (constructivist learning) facilitates the generation of knowledge and skills that could otherwise take too long to accumulate.
2. Traditional methods of learning spatially related content by viewing 2D diagrams or images create a cognitive filter. This filter exists even when working with 3D objects on a computer screen because the manipulation of objects in space is done through mouse clicks. By using 3D immersive AR, a more direct cognitive path toward understanding the content is possible.
3. Making mistakes during the learning process has no real consequence for the educator, whereas in traditional learning, the failure to follow certain rules or precautions while operating machinery or handling a hazardous material could lead to serious safety and health-related problems.
4. AR supports discovery-based learning, an instructional technique in which students take control of their own learning process, acquire information, and use that information to experience scenarios that may not be feasible in reality given the time and space constraints of a typical engineering project.
5. An important objective of all academic curricula is to promote social interaction among students, and to teach them to listen, respect, influence, and act. By providing multiple students with access to a shared augmented space populated with real and virtual objects, they are encouraged to become involved in teamwork and brainstorming activities to solve a problem, which simultaneously helps them improve their communication skills.
FIGURE 14.44 The software architecture of ARVita conforms to the MVC pattern. Each arrow indicates a "belongs to" relationship.
FIGURE 14.45 Two users are observing the animation lying on the table.
instantaneously along the timeline (Figure 14.46). The ARVita Controller wraps the existing VITASCOPE APIs, such as vitaExecuteViewRatioChange() and vitaExecuteTimeJump(), in a user-friendly interface, as most media players do with features such as fast-forward buttons and a progress bar.
As shown in Figure 14.47, the VITASCOPE scene node (the core logic of the Model) resides at the bottom of the tree. The vitaProcessTraceFile() function is called in every frame to update the animation logic. Above the scene node is a coordinate transformation node. Since all tracking algorithms used in ARVita assume that the z-axis is pointing up and use the right-hand coordinate system, this transformation converts VITASCOPE's y-up frame into ARVita's default z-up, right-handed system, so that the jobsite model is laid horizontally above the marker.
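Assuming the usual OSG conventions (y-up, right-handed), the coordinate transformation node amounts to a constant +90° rotation about the x-axis; the matrix below is an illustrative reconstruction, not code from ARVita:

```python
import numpy as np

# +90 deg rotation about X: maps a y-up model frame to a z-up frame,
# so the jobsite model lies flat on the horizontal marker plane while
# keeping the coordinate system right-handed.
Y_UP_TO_Z_UP = np.array([
    [1.0, 0.0,  0.0, 0.0],
    [0.0, 0.0, -1.0, 0.0],
    [0.0, 1.0,  0.0, 0.0],
    [0.0, 0.0,  0.0, 1.0],
])

model_up = np.array([0.0, 1.0, 0.0, 0.0])  # "up" in VITASCOPE's y-up frame
world_up = Y_UP_TO_Z_UP @ model_up          # becomes the world +Z axis
```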
The core of the View is the FLTK_OSGViewer node, which inherits methods from both the FLTK Window class and the osgViewer class and thus functions as the glue between
the FLTK and OSG. Under its hood are the ModelView transformation node and the video stream display nodes. The ModelView matrix is updated in each frame by the tracking event callbacks. This approach follows osgART's example (OSG ARToolkit), which uses the Tracker and Marker updating mechanism to bundle the tracking procedure (e.g., ARToolkit) and OSG together. As shown in Figure 14.48, both Tracker and Marker are attached as event callbacks to the respective nodes in the graph.
The fiducial marker tracking method is efficient and fast. This is because the fiducial marker is usually composed as an artificial picture that contains easy-to-extract visual features (e.g., a set of black and white patterns). The extraction of these patterns (e.g., straight lines, sharp corners, circles) is fast and reliable. ARToolkit is one of the oldest fiducial marker tracking methods, and is widely considered a benchmark in the AR community. However, it also suffers from the common shortcomings of a fiducial marker, and requires all four of its corners to be visible to the camera so that the camera pose can be computed.
This can cause frustration when the user navigates through the animated jobsite only to find the animated graphics blinking on and off due to loss of tracking.
This disadvantage was the motivation to look into natural markers, which are more flexible with regard to the marker's coverage. Besides the advantage of partial coverage (Figure 14.49), the natural marker does not depend on special predefined visual features such as those of the fiducial marker. In other words, the features can be points, corners, edges, and blobs that appear in a natural image. The extraction of these features is vital in establishing correspondence between the image observed by the camera and the marker image, and in estimating the camera's pose. Therefore, robust estimation usually requires the establishment of ample correspondence, which is a challenging issue. The KEG tracker used in ARVita inherits merits from both detection-based methods such as the Scale Invariant Feature Transform (SIFT) (Lowe 2004) or FERNs (Ozuysal et al. 2007), and tracking-based methods such as Kanade–Lucas–Tomasi (KLT) (Lucas and Kanade 1981, Shi and Tomasi 1994).
FIGURE 14.49 The natural marker tracking library is flexible on marker visibility.
TABLE 14.4
Comparison between Two Natural Marker Approaches

Detection-Based
  Principle: Identify matching feature points on each new frame.
  Relation between consecutive frames: Independent.
  Pros: Failure of estimation in one frame will in no way affect the next frame.
  Cons: Time consuming.

Tracking-Based
  Principle: Follow the existing feature points from frame to frame.
  Relation between consecutive frames: The current frame is correlated with the previous one.
  Pros: Fast.
  Cons: Error of estimation in one frame is carried forward, and the accumulated error will eventually lead to loss of tracking.
FIGURE 14.50 All views possess their own video, tracker, and marker objects, but point to
the same VITASCOPE scene node.
FIGURE 14.51 Users can select multiple cameras that are detected by ARVita as the program starts.
being made to make ARVita comply with the rules of HLA. This compliance will allow
ARVita to be distributed and synchronized across computers. When this happens, students can run multiple instances of ARVita on their own computers, but still collaborate
on the synchronized model. The current version of ARVita software and its source code
can be found on the website: http://pathfinder.engin.umich.edu/software.htm.
REFERENCES
Aiteanu, D., Hiller, B., and Graser, A. (2003). A step forward in manual welding: Demonstration
of augmented reality helmet. Proceedings of the 2003 IEEE and ACM International
Symposium on Mixed and Augmented Reality. Tokyo, Japan.
Alba, M., Fregonese, L., Prandi, F., Scaioni, M., and Valgoi, P. (2006). Structural monitoring of a large dam by terrestrial laser scanning. Proceedings of the 2006 ISPRS Commission V Symposium. Dresden, Germany.
Alphonse, L. (2014). New STEM index finds America's STEM talent pool still too shallow to meet demand, U.S. News. http://www.usnews.com/news/stem-index/articles/2014/04/23/new-stem-index-finds-americas-stem-talent-pool-still-too-shallow-to-meetdemand?int=9a5208 (accessed March 3, 2015).
Anspach, J. H. (2011). Utility data management. http://modot.org/business//Utility Data
Management.pdf (accessed September 15, 2011).
Arditi, D. and Polat, G. (2010). Graduate education in construction management. Journal of Professional Issues in Engineering Education and Practice, 3, 175–179.
ASCE (American Society of Civil Engineers) (2002). Standard guideline for the collection
and depiction of existing subsurface utility data. ASCE/CI Standard 38-02, Reston, VA.
Azuma, R. (1997). A survey of augmented reality. Presence: Teleoperators and Virtual Environments, 6(4), 355–385.
Azuma, R., Baillot, Y., Behringer, R., Feiner, S., Julier, S., and MacIntyre, B. (2001). Recent advances in augmented reality. IEEE Computer Graphics and Applications, 21(6), 34–47.
Beder, C., Bartczak, B., and Koch, R. (2007). A comparison of PMD-cameras and stereovision for the task of surface reconstruction using patchlets. Proceedings of 2007 IEEE
Conference on Computer Vision and Pattern Recognition. Minneapolis, MN.
Behzadan, A. H. (2008). ARVISCOPE: Georeferenced visualization of dynamic construction
processes in three-dimensional outdoor augmented reality. PhD dissertation, Department
of Civil and Environmental Engineering, University of Michigan, Ann Arbor, MI.
Behzadan, A. H., Dong, S., and Kamat, V. R. (2012). Chapter 5: Mobile and pervasive construction visualization using outdoor augmented reality. In Mobile and Pervasive Computing in Construction, C. Anumba and X. Wang (eds.), pp. 54–85. John Wiley & Sons: Hoboken, NJ.
Behzadan, A. H. and Kamat, V. R. (2005). Visualization of construction graphics in outdoor
augmented reality. Proceedings of the 2005 Winter Simulation Conference, Institute of
Electrical and Electronics Engineers (IEEE). Orlando, FL.
Behzadan, A. H. and Kamat, V. R. (2007). Georeferenced registration of construction graphics
in mobile outdoor augmented reality. Journal of Computing in Civil Engineering, 21(4),
247–258.
Behzadan, A. H. and Kamat, V. R. (2008). Resolving incorrect occlusion in augmented reality animations of simulated construction operations. Proceedings of the 15th Annual
Workshop of the European Group for Intelligent Computing in Engineering (EG-ICE),
European Group for Intelligent Computing in Engineering, Plymouth, U.K., pp.
24–35.
Behzadan, A. H. and Kamat, V. R. (2009a). Interactive augmented reality visualization
for improved damage prevention and maintenance of underground infrastructure.
Proceedings of 2009 Construction Research Congress, American Society of Civil
Engineers, pp. 1214–1222. Seattle, WA.
Behzadan, A. H. and Kamat, V. R. (2009b). Automated generation of operations level construction animations in outdoor augmented reality. Journal of Computing in Civil
Engineering, 23(6), 405–417.
Behzadan, A. H. and Kamat, V. R. (2012). A framework for utilizing context-aware augmented
reality visualization in engineering education. Proceedings of the 2012 International
Conference on Construction Applications of Virtual Reality (CONVR). Taipei, Taiwan.
Behzadan, A. H., Timm, B. W., and Kamat, V. R. (2008). General-purpose modular hardware
and software framework for mobile outdoor augmented reality applications in engineering. Advanced Engineering Informatics, 22, 90–105.
Berger, M.-O. (1997). Resolving occlusion in augmented reality: A contour based approach
without 3D reconstruction. Proceedings of 1997 IEEE Conference on Computer Vision
and Pattern Recognition. San Juan, PR.
394
Billinghurst, M. and Kato, H. (1999). Collaborative mixed reality. Proceedings of the 1999
International Symposium on Mixed Reality. Yokohama, Japan.
Bowie, J. (2010). Enhancing classroom instruction with collaborative technologies.
http://www.eschoolnews.com/2010/12/20/enhancing-classroom-instruction-with-collaborative-technologies/ (accessed March 3, 2015).
Brooks, F. P. (1999). What's real about virtual reality? IEEE Computer Graphics and Applications, 19(6), 16–27.
Burggraf, D. S. (2006). Geography markup language. Data Science Journal, 5(October 19), 178.
Cable Detection (CD) (2010). EZiDig fact sheet. http://www.cabledetection.co.uk/ezidig
(accessed September 15, 2010).
Canny, J. (1986). A computational approach to edge detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-8(6), 679–698.
Chi, S., Caldas, C. H., and Gong, J. (2008). A crash avoidance framework for heavy equipment control systems using 3D imaging sensors. ITCon Special Issue on Sensors in
Construction and Infrastructure Management, 13, 118–133.
Chock, G. (2006). ATC-20 post-earthquake building safety evaluations performed after the October 15, 2006 Hawaii earthquakes: Summary and recommendations for improvements (updated). Hawaii Structural Engineers Association: Honolulu, HI.
Cohen, J. D., Lin, M. C., Manocha, D., and Ponamgi, M. (1995). I-COLLIDE: An interactive and exact collision detection system for large-scale environments. Proceedings of the ACM International Symposium on Interactive 3D Graphics, Association for Computing Machinery, New York, pp. 189–196.
Common Ground Alliance (CGA) (2010a). 811 campaign. http://www.call811.com/ (accessed
September 15, 2010).
Common Ground Alliance (CGA) (2010b). Locate accurately, dig safely. http://www.common
groundalliance.com/Content/NavigationMenu/Publications_and_Resources/Educational_
Programs/LocateAccuratelyBrochure.pdf (accessed September 15, 2010).
Costello, S. B., Chapman, D. N., Rogers, C. D. F., and Metje, N. (2007). Underground asset
location and condition assessment technologies. Tunnelling and Underground Space Technology, 22, 524–542.
Dai, F., Lu, M., and Kamat, V. R. (2011). Analytical approach to augmenting site photos with
3D graphics of underground infrastructure in construction engineering applications.
Journal of Computing in Civil Engineering, 25(1), 66–74.
Dong, S. (2012). Scalable and extensible augmented reality with applications in civil infrastructure systems. PhD dissertation, Department of Civil and Environmental Engineering,
University of Michigan, Ann Arbor, MI.
Dong, S., Behzadan, A. H., Feng, C., and Kamat, V. R. (2013). Collaborative visualization of engineering processes using tabletop augmented reality. Advances in Engineering Software, 55, 45–55.
Dong, S. and Kamat, V. R. (2010). Robust mobile computing framework for visualization of simulated processes in augmented reality. Proceedings of the 2010 Winter Simulation Conference,
Institute of Electrical and Electronics Engineers, pp. 3111–3122. Baltimore, MD.
Duda, R. O. and Hart, P. E. (1972). Use of the Hough transformation to detect lines and curves
in pictures. Communications of the ACM, 15(1), 11–15.
EPA, United States Environmental Protection Agency. (2007). Drinking water needs survey
and assessment, fourth report to congress. http://www.epa.gov/safewater/needssurvey/
pdfs/2007/report_needssurvey_2007.pdf (accessed September 14, 2010).
ESRI. (2005). DTE energy, energy currentsGIS for energy, spring 2005, Redlands, CA. http://
www.esri.com/library/reprints/pdfs/enercur-dte-energy.pdf (accessed September 15, 2010).
Feng, C. and Kamat, V. R. (2012). A plane tracker for AEC automation applications. The
2012 International Symposium on Automation and Robotics in Construction, Mining
and Petroleum. Oulu, Finland.
395
Fortin, P.-A. and Hebert, P. (2006). Handling occlusions in real-time augmented reality:
Dealing with movable real and virtual objects. Proceedings of the 2006 Canadian
Conference on Computer and Robot Vision, pp. 5462. Quebec City, Canada.
Fukuda, Y., Feng, M., Narita, Y., Kaneko, S., and Tanaka, T. (2010). Vision-based displacement sensor for monitoring dynamic response using robust object search algorithm.
IEEE Sensors, 13(12), 19281931.
Georgel, P., Schroeder, P., Benhimane, S., and Hinterstoisser, S. (2007). An industrial augmented reality solution for discrepancy check. Proceedings of the 2007 IEEE and ACM
International Symposium on Mixed and Augmented Reality, pp. 111115. Nara, Japan.
Google (2008). Introducing libkml: A library for reading, writing, and manipulating KML.
http://google-opensource.blogspot.com/2008/03/introducing-libkml-library-for-
reading.html (accessed March 3, 2015).
Google (2012). KML documentation introduction. https://developers.google.com/kml/documentation/ (accessed March 3, 2015).
Gokturk, B. S., Yalcin, H., and Bamji, C. (2010). A time-of-flight depth sensorSystem
description, issues and solutions. Proceedings of 2010 IEEE Conference on Computer
Vision and Pattern Recognition Workshop. San Francisco, CA.
Golparvar-Fard, M., Pena-Mora, F., Arboleda, C. A., and Lee, S. (2009). Visualization of construction progress monitoring with 4D simulation model overlaid on time-lapsed photographs. Journal of Computing in Civil Engineering, 23(6), 391404.
Hammad, A., Wang, H., and Mudur, S. P. (2009). Distributed augmented reality for visualizing collaborative construction tasks. Journal of Computing in Civil Engineering, 23(6), 418427.
Hart, G. C. (2008). An alternative procedure for seismic analysis and design of tall buildings located in the Los Angeles region. Los Angeles Tall Buildings Structural Design
Council (LATBSDC), Los Angeles, CA. http://www.tallbuildings.org/PDFFiles/2011LA-CRITERIA-FINAL.pdf (accessed March 3, 2015).
Hirokazu, K. and Billinghurst, M. (1999). Marker tracking and HMD calibration for a videobased augmented reality conferencing system. Proceedings of the 1999 IEEE and ACM
International Workshop on Augmented Reality (IWAR 99). San Francisco, CA.
Ji, Y. (2010). A computer vision-based approach for structural displacement measurement.
Sensors and Smart Structures Technologies for Civil, Mechanical, and Aerospace
Systems, 43(7), 642647.
Kamat, V. R. and El-Tawil, S. (2007). Evaluation of augmented reality for rapid assessment
of earthquake-induced building damage. Journal of Computing in Civil Engineering,
21(5), 303310.
Kamat, V. R. and Martinez, J. C. (2003). Automated generation of dynamic, operations
level virtual construction scenarios. Electronic Journal of Information Technology in
Construction (ITcon), 8, 6584.
Kanbara, M., Okuma, T., Takemura, H., and Yokoya, N. (2000). A stereoscopic video seethrough augmented reality system based on real-time vision-based registration.
Proceedings of the IEEE Virtual Reality 2000, pp. 255262. New Brunswick, NJ.
Kim, C., Haas, C. T., and Liapi, K. A. (2005). Rapid, on-site spatial information acquisition
and its use for infrastructure operation and maintenance. Automation in Construction,
14(5), 666684.
Kini, A. P., King, K., and Kang, S. (2009). Sensor calibration and real-time tracking of a backhoe loader using the iGPS system. Quality Digest Magazine. http://www.qualitydigest.
com/inside/cmsc-article/sensor-calibration-and-real-time-tracking-backhoe-loader-
using-igps-system.html (accessed September 15, 2011).
Koch, R., Schiller, I., Bartczak, B., Kellner, F., and Kser, K. (2009). MixIn3D: 3D mixed reality with
ToF-camera. Proceedings of 2009 Dynamic 3D Imaging (Dyn3D) Workshop, Jena, Germany.
Koo, B. and Fischer, M. (2000). Feasibility study of 4D CAD in commercial construction.
Journal of Construction Engineering and Management, 126(4), 251260.
396
Krishnan, S. (2006). Case studies of damage to tall steel moment-frame buildings in Southern
California during Large San Andreas Earthquakes. Bulletin of the Seismological Society
of America, 96(4A), 15231537.
Kwon, S., Boche, F., Kim, C., Haas, C. T., and Liapi, K. (2004). Fitting range data to primitives
for rapid local 3D modeling using sparse range point clouds. Journal of Automation in
Construction, 13, 6781.
Larsen, E., Gottschalk, S., Lin, M., and Manocha, D. (1999). Fast proximity queries with
swept sphere volumes. Technical report TR99-018. Department of Computer Science,
UNC-Chapel Hill, Chapel Hill, NC. http://gamma.cs.unc.edu/SSV/ssv.pdf (accessed
September 15, 2010).
Lepetit, V. and Berger, M.-O. (2000). A semi-automatic method for resolving occlusion in augmented reality. Proceedings of 2000 IEEE Conference on Computer Vision and Pattern
Recognition. Hilton Head Island, SC.
Lester, J. and Bernold, L. E. (2007). Innovative process to characterize buried utilities using
ground penetrating radar. Automation in Construction, 16, 546555.
Louis, J. and Martinez, J. (2012). Rendering stereoscopic augmented reality scenes with occlusions using depth from stereo and texture mapping. Proceedings of 2012 Construction
Research Congress, American Society of Civil Engineers, West Lafayette, IN.
Lourakis, M. (2011). Homest: A C/C++ library for robust, non-linear homography estimation.
http://www.ics.forth.gr/~lourakis/homest/ (accessed March 3, 2015).
Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. International
Journal of Computer Vision.
Lucas, B. D. and Kanade, T. (1981). An iterative image registration technique with an application to stereo vision. Proceedings of the Seventh International Joint Conference on
Artificial Intelligence.
Martinez, J. C. (1996). Stroboscope: State and resource based simulation of construction processes. Thesis, University of Michigan, Ann Arbor, MI.
Martz, P. (2007). OpenSceneGraph Quick Start Guide.
Mendez, E., Schall, G., Havemann, S., Junghanns, S., Fellner, D., and Schmalstieg, D.
(2008). Generating semantic 3D models of underground infrastructure. IEEE Computer
Graphics and Applications, 28(3), 4857.
Mills, J. E. and Treagust, D. F. (2003). Engineering educationIs problem-based or projectbased learning the answer? Australasian Journal of Engineering Education, 3(2), 116.
Miranda, E., Asce, M., and Akkar, S. D. (2006). Generalized interstory drift spectrum. Journal
of Structural Engineering, 132(6), 840852.
MISS DIG Systems Inc. (MDS) (2007). One-Call Excavation Handbook. http://www.missdig.net/images/Education_items/2007onecall_handbook.pdf (accessed September 15,
2011).
Oloufa, A. A., Ikeda, M., and Oda, H. (2003). Situational awareness of construction equipment using GPS, wireless and web technologies. Automation in Construction, 12(6),
737748.
Olson, E. (2011). AprilTag: A robust and flexible visual fiducial system. Proceedings of the
IEEE International Conference on Robotics and Automation (ICRA). Shanghai, China.
Ozuysal, M., Fua, P., and Lepetit, V. (2007). Fast keypoint recognition in ten lines of code.
Proceedings of 2007 IEEE Conference on Computer Vision and Pattern Recognition, pp.18.
Minneapolis, MN
Park, H. S., Lee, H. M., Adeli, H., and Lee, I. (2007). A new approach for health monitoring of structures: Terrestrial laser scanning. Journal of Computer Aided Civil and
Infrastructure Engineering, 22(1), 1930.
Patel, A. and Chasey, A. (2010). Integrating GPS and laser technology to map underground
utilities installed using open trench method. Proceedings of 2010 Construction Research
Congress, American Society of Civil Engineers, Banff, Canada.
397
Patel, A., Chasey, A., and Ariaratnam, S. T. (2010). Integrating global positioning system with
laser technology to capture as-built information during open-cut construction. Journal
of Pipeline Systems Engineering and Practice, ASCE, 1(4), 147155.
PHMSA (Pipeline and Hazardous Materials Safety Administration) (2010). Stakeholder communications, consequences to the public and the pipeline industry. http://primis.phmsa.
dot.gov/comm/reports/safety/CPI.html (accessed September 15, 2011).
Piekarski, W. (2006) 3D Modelling with the tinmith mobile outdoor augmented reality system.
IEEE Computer Graphics and Applications, 26(1), 1417.
PMD (2010). PMD CamCube 3.0. http://www.pmdtec.com/news_media/video/camcube.php
(accessed March 3, 2015).
Porter, T. R. (2010). Navigation sensors enhance GIS capabilities. Midstream Business,
Houston, TX, July 1, 2010. http://www.pipelineandgastechnology.com/Operations/
SCADAAutomation/item63959.php (accessed February 10, 2011). Houston, TX.
ProStar (2011). About ProStar. http://www.guardianprostar.com/company.htm (accessed
September 10, 2011).
Roberts, G., Evans, A., Dodson, A. H., Denby, B., Cooper, S., and Hollands, R. (2002a). The
use of augmented reality, GPS and INS for subsurface data visualization. Proceedings of
the 2002 FIG XIII International Congress, pp. 112. Washington, DC.
Roberts, G., Ogundipe, O., and Dodson, A. H. (2002b). Construction plant control using RTK
GPS. Proceedings of the FIG XXII International Congress, Washington, DC.
Rojah, C. (2005). ATC-20-1 Field Manual: Post Earthquake Safety Evaluation of Buildings.
Applied Technology Council: Redwood City, CA.
Ruff, T. M. (2001). Monitoring blind spotsA major concern for haul trucks. Engineering
and Mining Journal, 202(12), 1726.
Ryu, S.-W., Han, J.-H., Jeong, J., Lee, S. H., and Park, J. I. (2010). Real-time occlusion culling for augmented reality. Proceedings of the 2010 Korea-Japan Joint Workshop on
Frontiers of Computer Vision, pp. 498503. Hiroshima, Japan.
Schall, G., Mendez, E., and Schmalstieg, D. (2008). Virtual redlining for civil engineering in
real environments, Proceedings of the IEEE International Symposium on Mixed and
Augmented Reality (ISMAR), Cambridge, U.K., pp. 9598.
Shi, J. and Tomasi, C. (1994). Good features to track. Proceedings of 1994 IEEE Conference
on Computer Vision and Pattern Recognition, pp. 593600. Seattle, WA.
Shin, D. H. and Dunston, P. S. (2008). Identification of application areas for augmented reality in industrial construction based on technology suitability. Journal of Automation in
Construction, 17(7), 882894.
Shreiner, D., Woo, M., Neider, J., and Davis, T. (2006). OpenGL Programming Guide. Pearson
Education. Boston, MA.
Skolnik, D. A. and Wallace, J. W. (2010). Critical assessment of interstory drift measurements.
Journal of Structural Engineering, 136(12), 15741584.
Son, H., Kim, C., and Choi, K. (2010). Rapid 3D object detection and modeling using range
data from 3D range imaging camera for heavy equipment operation. Automation in
Construction, 19(7), 898906.
Spurgin, J. T., Lopez, J., and Kerr, K. (2009). Utility damage preventionWhat can your
agency do? Proceedings of the 2009 APWA International Public Works Congress &
Exposition. American Public Works Association (APWA): Kansas City, MO.
Sterling, R. (2000). Utility locating technologies: A summary of responses to a statement of need distributed by the Federal Laboratory Consortium for Technology Transfer. Federal Laboratory
Consortium Special Reports Series No. 9. Louisiana Tech University: Ruston, LA.
Subsurface Imaging Systems (SIS) (2011). Surface penetrating radar systems. http://
subsurfaceimagingsystems.com (accessed February 10, 2011).
Talmaki, S. A., Kamat, V. R., and Cai, H. (2013). Geometric modeling of geospatial data for
visualization-assisted excavation. Advanced Engineering Informatics, 27(2), 283298.
398
Teizer, J., Allread, B. S., Fullerton, C. E., and Hinze, J. (2010). Autonomous pro-active
real-time construction worker and equipment operator proximity safety alert system.
Automation in Construction, Elsevier, 19(5), 630640.
Tener, R. K. (1996). Industry-university partnerships for construction engineering education.
Journal of Professional Issues in Engineering Education and Practice, 122(4), 156162.
Tian, Y., Guan, T., and Wang, C. (2010). Real-time occlusion handling in augmented reality
based on an object tracking approach. Sensors, 10(4), 28852900.
Trimble (2009). Trimble R8 GNSS. http://www.trimble.com/Survey/trimbler8gnss.aspx
(accessed March 3, 2015).
Tubbesing, S. K. (1989). The Loma Prieta, California, Earthquake of October 17, 1989-Loss
Estimation and Procedures.
Underground Imaging Technologies (UIT) (2009a). Multi-sensor TDEM system. http://www.
uit-systems.com/emi.html (accessed February 10, 2011).
Underground Imaging Technologies (UIT) (2009b). Case studies. http://www.uit-systems.
com/case_studies.html (accessed February 10, 2011).
Underground Imaging Technologies (UIT) (2010). Seeing what cant be seen with 3D subsurface imaging. www.uit-systems.com (accessed February 10, 2011).
Vidal, F., Feriche, M., and Ontiveros, A. (2009). Basic techniques for quick and rapid postearthquake assessments of building safety. Proceedings of the 2009 International
Workshop on Seismic Microzoning and Risk Reduction, pp. 110. Almeria, Spain.
Vincenty, T. (1975). Direct and inverse solutions of geodesics on the ellipsoid with application
of nested equations. Survey Reviews.
Virtual Underground (VUG) (2010). Virtual underground. http://www.virtualug.com (accessed
February 10, 2011).
von Gioi, R., Jakubowicz, J., Morel, J.-M., and Randall, G. (2010). LSD: A Fast Line Segment
Detector with a false detection control. IEEE Transactions on Pattern Analysis and
Machine Intelligence, 32(4), 722732.
Wahbeh, A. M., Caffrey, J. P., and Masri, S. F. (2003). A vision-based approach for the direct
measurement of displacements in vibrating systems. Smart Materials and Structures,
12(5), 785794.
Wang, X. and Dunston, P. (2008). User perspectives on mixed reality tabletop visualization for faceto-face collaborative design review. Journal of Automation in Construction, 17(4), 399412.
Webster, A., Feiner, S., Macintyre, B., and Massie, W. (1996). Augmented reality in architectural construction, inspection, and renovation. Proceedings of 1996 ASCE Congress on
Computing in Civil Engineering. Anaheim, CA.
Wloka, M. M. and Anderson, B. G. (1995). Resolving occlusion in augmented reality.
Proceedings of Symposium on Interactive 3D Graphics, pp. 512. New York, NY.
Wyland, J. (2009). Rio grande electric moves mapping and field work into digital realm. PRWeb,
February 12, 2009. http://www.prweb.com/releases/mapping/utility/prweb2027014.htm
(accessed February 10, 2011).
Xu, N. and Bansal, R. (2003). Object segmentation using graph cuts based active contours.
Proceedings of 2003 IEEE Conference on Computer Vision and Pattern Recognition,
pp. 4653. Madison, WI.
15
Augmented Reality
Human–Robot
Interfaces toward
Augmented Robotics
Maki Sugimoto
CONTENTS
15.1 Introduction................................................................................................... 399
15.2 Video See-Through Augmented Reality Interface for Tele-Operation.........400
15.3 Future Predictive Visual Presentation with Consideration of Real
Environment..................................................................................................404
15.4 Projection-Based Augmented Reality for Gaming Robots...........................405
15.5 Active Tangible Robots in AR Environment.................................................407
15.6 Conclusion.....................................................................................................408
References...............................................................................................................409
15.1 INTRODUCTION
This chapter introduces a set of example applications of human–robot interfaces with augmented reality (AR) technology. Previously, robots were used in controlled environments such as factories, performing highly repetitive automation tasks. Nowadays, robots are widely distributed throughout society and can be found performing a range of tasks such as daily housekeeping, search-and-rescue missions, and human–robot communication. Figure 15.1 shows examples of robots ranging from household appliances to mobile robots. AR technology can contribute to visualizing information between robots and users in those situations.
Figure 15.2 shows examples of information related to robots and users. Robots are able to capture valuable information with their embedded sensors, such as vision, thermal, and depth sensors. Furthermore, robots hold information related to their behavior, such as internal motor control plans and trajectory records. Projecting such information onto real scenes with AR technology can make the cooperation between robots and users seamless.
FIGURE 15.2 Information related to robots and users: task, sensor information, attention, action plan, recorded trajectory, and battery information flowing through the human–robot interface.

15.2 VIDEO SEE-THROUGH AUGMENTED REALITY INTERFACE FOR TELE-OPERATION

FIGURE 15.3 (a) Rescue robot and (b) camera image of egocentric view.
the position and direction of the robot, especially because the distance to a target is estimated strictly from camera images taken from the first-person viewpoint. An exocentric view camera physically installed on the robot is very effective in such situations. However, appropriately attaching an exocentric view camera requires a physical extension of the robot, and such an enlarged robot body often hampers the robot's activity due to physical constraints.
Consider the time base of the egocentric camera images: when the robot has advanced, a previous image contains more of the robot's surrounding environment than the current image does. In such a situation, the field of view of the previous image contains the current position of the robot, as shown in Figure 15.4.
By reusing a previous first-person-view image, a comprehensible third-person-perspective image can therefore be generated without reconstructing any complex environmental model. By superimposing a model of the robot on a recorded image, it is possible to create a virtual third-person view that shows the current situation of the robot. Figure 15.5a shows the fundamental concept of the system, and Figure 15.5b shows a snapshot of a remote robot operation by Time Follower's Vision (Sugimoto et al., 2005).
Figure 15.6a shows the system configuration of the tele-operation interface. It consists of a tele-operated robot that has a first-person-view camera, a sensor to record the position and direction of the robot, and a visual presentation interface that presents a virtual third-person view to the operator. The operator controls the robot
FIGURE 15.4 Field of view of the camera at a past position covering the current position of the robot along its moving direction.
FIGURE 15.5 (a) Concept (a wireframe model of the robot superimposed on a past image) and (b) remote robot operation by Time Follower's Vision.
by looking at the image presented on the screen. Figure 15.6b shows a snapshot of the virtual third-person view generated by Time Follower's Vision.
Through this configuration, the image captured by the camera during remote
operation is stored in a database along with time, position, and direction information.
To generate the virtual third-person view, the system selects the optimal background
image within the database, which provides information on both the current situation
and the surrounding environment of the robot.
The background image is selected by an evaluation function that considers the field of view, the position of the camera, and the current position of the vehicle. After a background image has been selected, a CG (computer graphics) model of the vehicle is mapped onto the image to generate a virtual viewpoint as seen from behind the vehicle. This contributes to the operability of the robot, since both the current state of the vehicle and its environment are clearly visible to the operator.
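The background-image selection described above can be sketched as an evaluation function over the stored frames. The Python sketch below is illustrative only: the `Frame` representation, the weights, and the preferred viewing distance of 2 m are assumptions, not the chapter's actual implementation.

```python
import math
from dataclasses import dataclass

@dataclass
class Frame:
    """A stored camera image with the pose recorded at capture time."""
    x: float        # camera position when captured
    y: float
    heading: float  # camera direction (radians)
    fov: float      # horizontal field of view (radians)

def score(frame: Frame, robot_x: float, robot_y: float) -> float:
    """Score a candidate background frame: the robot's current position
    must lie inside the frame's field of view, ideally a short distance
    behind the robot and close to the optical axis."""
    dx, dy = robot_x - frame.x, robot_y - frame.y
    dist = math.hypot(dx, dy)
    if dist < 1e-6:
        return float("-inf")  # frame taken at the robot's own spot
    bearing = math.atan2(dy, dx)
    # smallest angle between the camera axis and the direction to the robot
    off_axis = abs(math.atan2(math.sin(bearing - frame.heading),
                              math.cos(bearing - frame.heading)))
    if off_axis > frame.fov / 2:
        return float("-inf")  # robot outside the frame's view
    # prefer frames roughly 2 m behind, looking straight at the robot
    return -abs(dist - 2.0) - off_axis

def select_background(frames, robot_x, robot_y):
    """Pick the best stored frame for the current robot position."""
    return max(frames, key=lambda f: score(f, robot_x, robot_y))
```

The wireframe model of the vehicle would then be rendered at the robot's current pose on top of the selected frame.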
Although past images are readily available when the environment is static,
problems arise when the environment is dynamic due to disparities in the present
FIGURE 15.6 (a) Virtual third-person view and (b) system configuration: the vehicle's camera images and 3D position/direction sensor readings pass through a DirectShow filter into an image database on the PC; a Direct3D application queries the database, renders the wireframe model of the vehicle over the selected background texture, and presents the view to the operator, whose commands go to the control unit.
environment and the stored images. Detecting such differences, so that the method can be applied in dynamic environments, is left for future work.
The method explained earlier can be used in various fields. Figure 15.7 shows snapshots of a four-wheel robot for outdoor environments and the virtual third-person-view interface used with it. A differential GPS (Global Positioning System) unit was used as the position sensor in this implementation, and it was possible to operate the robot remotely on a closed tarmac course.
FIGURE 15.7 (a) Remote robot outdoors and (b) remote operation on a closed tarmac course.
15.3 FUTURE PREDICTIVE VISUAL PRESENTATION WITH CONSIDERATION OF REAL ENVIRONMENT
FIGURE 15.8 (a) Tele-operated robot with RGBD camera and (b) model of obstacles.
FIGURE 15.9 Applying physical simulation (a) before and (b) after.
viewpoint generated either by projecting the CG-based robot model onto the stored sequence of images or by directly capturing the scene using pole cameras lets the operator become aware of the relative position of the robot in its surroundings. However, predicting a near-future event, such as a possible collision or fall resulting from the current course of robot movement, still depends on the operator's prior experience, which makes it hard to set a proper route for the robot.
Pathfinder Vision (Maeda et al., 2013) provides an informative interface that helps the operator become cognizant of near-future events by generating and presenting images that depict the predicted events, considering the physical interferences and the current state of vehicle operation (Figure 15.9).
This system predicts where the robot will be after a few seconds, based on the current state of operation. The prediction of the near-future robot state is governed by a physics simulation of the robot model and a geometric model of the obstacles, whose geometry is obtained by the range finder in real time. A future prediction image is generated according to the simulation result and presented to the operator through a visual interface for remote operation. Such an interface augments the ability of remote operators in complex environments.
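The core of such a prediction can be approximated by rolling the current drive command forward through a simple motion model and checking it against the obstacle geometry. The sketch below is an assumption-laden stand-in for the chapter's physics simulation: it uses a unicycle motion model, a 1 m occupancy grid for the obstacles, and a fixed prediction horizon.

```python
import math

def predict_path(x, y, heading, v, omega, obstacles, horizon=3.0, dt=0.1):
    """Roll the current drive command (speed v, turn rate omega) forward
    for `horizon` seconds with a unicycle model. `obstacles` is a set of
    occupied (ix, iy) grid cells (1 m cells) from the range finder.
    Returns the predicted path and the time of the first collision, if any."""
    path, collision_time = [], None
    for step in range(int(horizon / dt)):
        x += v * math.cos(heading) * dt
        y += v * math.sin(heading) * dt
        heading += omega * dt
        path.append((x, y))
        if (round(x), round(y)) in obstacles:
            collision_time = (step + 1) * dt  # first predicted contact
            break
    return path, collision_time
```

The predicted path, and the collision event if one is found, can then be rendered into the operator's view as the future prediction image.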
15.4 PROJECTION-BASED AUGMENTED REALITY FOR GAMING ROBOTS
Applications of AR techniques to the control of robots are not limited to the video see-through configuration described earlier; a projection-based configuration is another possibility for AR robot systems. Augmented Coliseum (Kojima et al., 2006) is an example of such projection-based AR in a local environment. Figure 15.10 shows snapshots of Augmented Coliseum.
Figure 15.11 shows the system configuration. It consists of robots that are enhanced by computer graphics and a virtual environment represented inside the computer. The virtual environment contains models of the robots that are synchronized with the real environment; hence, the system can achieve interactions between the virtual and real environments.
FIGURE 15.10 Robots in augmented gaming environment. (a) Robots in AR gaming environment and (b) AR explosion effect.
FIGURE 15.11 System configuration of Augmented Coliseum: a computer generates video signals containing fiducial images and CG objects for a display device (projector); the user operates the real robots, whose position and direction are captured by a display-based measurement device and used to update the robot models in the virtual environment, where a physics simulation couples them with models of virtual objects.
When a user moves a robot in the real environment, the model of the robot in the virtual environment is updated by measuring the robot's position and direction. To measure these quantities, the system uses a display-based measurement technique (Sugimoto et al., 2007). The interaction between a robot and a virtual object that does not exist in the real environment is realized by physics simulation: the physics engine updates the virtual forces acting on each model of the robots and virtual objects, and the motions of the virtual objects and the robots are controlled according to these forces. Figure 15.12 shows the physics simulation model of Augmented Coliseum, which calculates reaction forces in the virtual environment by the penalty method, considering the interpenetration of objects.
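The penalty method can be written down compactly: the reaction force grows with the interpenetration (spring term) and the closing speed (damper term). The gains below are illustrative placeholders, not values from Augmented Coliseum.

```python
def penalty_force(interpenetration: float, speed: float,
                  k: float = 200.0, c: float = 10.0) -> float:
    """Spring-damper penalty model: the reaction force F combines a spring
    term (stiffness k times interpenetration x) with a damper term
    (coefficient c times closing speed v). No contact means no force."""
    if interpenetration <= 0.0:
        return 0.0
    return k * interpenetration + c * speed
```

With per-step forces of this form, deeper or faster interpenetrations automatically push objects apart more strongly at the next simulation step.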
FIGURE 15.12 Physics simulation model: the reaction force F is produced by a spring-damper (penalty) model from the interpenetration x and speed v between a CG object and a robot.

FIGURE 15.13 Virtual force and the force by the wheel.
The calculated virtual forces are presented through the actuators of the robots. If a robot has omnidirectional wheels, rendering the virtual force with them is straightforward. Robots with non-omnidirectional wheels can still present arbitrary forces by changing the direction and position of the robot itself, as shown in Figure 15.13. For a robot with two ordinary wheels, it is possible to drive both actuators with currents set according to the angle between the virtual force and the direction of the robot. By applying this physics simulation and robot control method, the robots can be regarded as a physical display of the virtual forces.
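A minimal sketch of this control idea for a two-wheeled robot: decompose the virtual force into a component along the robot's heading (drive) and a perpendicular component (turn), then mix the two into left/right wheel commands. The mixing scheme and gain are assumptions for illustration, not the actuation law used in Augmented Coliseum.

```python
import math

def wheel_commands(force_x, force_y, robot_heading, gain=1.0):
    """Map a virtual force onto a robot with two ordinary wheels: the
    component along the heading drives forward/backward, while the
    perpendicular component turns the robot toward the force."""
    angle = math.atan2(force_y, force_x) - robot_heading
    magnitude = math.hypot(force_x, force_y)
    forward = magnitude * math.cos(angle)  # push along the heading
    turn = magnitude * math.sin(angle)     # steer toward the force
    left = gain * (forward - turn)
    right = gain * (forward + turn)
    return left, right
```

A force straight ahead yields equal wheel commands, while a sideways force produces opposite commands that rotate the robot until it can drive along the force.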
Figure 15.14 shows a conceptual image of bilateral augmentation by robots and virtual objects. In this environment, not only are virtual functions given to the robots by projection, but the presence of the virtual objects can also be augmented by robot motion. The physics simulation gives the AR environment a seamless sense of reality.
FIGURE 15.14 Conceptual image of bilateral augmentation between robots and computer graphics.
15.6 CONCLUSION
This chapter introduced a video see-through-based remote robot operation interface and a projection-based local robot augmentation system as applications of AR technology for robots. Projecting virtual information onto the real environment can create a better understanding between users and robots. Not surprisingly, computer graphics are able to support the visualization of information and give virtual augmentation to the robots; but the robots are also able to behave according to the information present in the virtual environment through physics simulation. This contributes to the design of natural and intuitive human–robot interfaces.
REFERENCES
Hashimoto, S., Ishida, A., Inami, M., and Igarashi, T. (2011). TouchMe: An augmented reality
based remote robot manipulation, in: Proceedings of the 21st International Conference
on Artificial Reality and Telexistence (ICAT '11).
Kojima, M., Sugimoto, M., Nakamura, A., Tomita, M., Nii, H., and Inami, M. (2006).
Augmented Coliseum: An augmented game environment with small vehicles, in:
Proceedings of the First IEEE International Workshop on Horizontal Interactive
Human–Computer Systems (TABLETOP '06), pp. 3–8.
Leitner, J., Haller, M., Yun, K., Woo, W., Sugimoto, M., Inami, M., Cheok, A.D., and Been-Lirn, H.D. (2009). Physical interfaces for tabletop games, Computers in Entertainment, 7(4), Article No. 61.
Maeda, N., Song, C., and Sugimoto, M. (2013). Pathfinder Vision: Future prediction augmented reality interface for vehicle tele-operation using past images, in: Proceedings of
the 23rd International Conference on Artificial Reality and Telexistence 2013 (ICAT '13).
Milgram, P., Rastogi, A., and Grodski, J.J. (1995). Telerobotic control using augmented reality, in: Proceedings of Fourth IEEE International Workshop on Robot and Human
Communication (RO-MAN '95).
Richter, J., Thomas, B.H., Sugimoto, M., and Inami, M. (2007). Remote active tangible
interactions, in: Proceedings of the First International Conference on Tangible and
Embedded Interaction, pp. 39–42.
Sugimoto, M., Kagotani, G., Nii, H., Shiroma, N., Inami, M., and Matsuno, F. (2005). Time Follower's Vision: A tele-operation interface with past images, IEEE Computer Graphics and Applications, 25(1), 54–63.
Sugimoto, M., Kodama, K., Nakamura, A., Kojima, M., and Inami, M. (2007). A display-based
tracking system: Display-based computing for measurement systems, in: Proceedings
of the 17th International Conference on Artificial Reality and Telexistence (ICAT '07), pp. 31–38.
Tadokoro, S., Kitano, H., Takahashi, T., Noda, I., Matsubara, H., Shinjoh, A., Koto, T. et al. (2000). The RoboCup-Rescue project, in: Proceedings of the 2000 IEEE International Conference on Robotics & Automation, pp. 4090–4095.
16
Use of Mobile
Augmented Reality
for Cultural Heritage
John Krogstie and Anne-Cecilie Haugstvedt
CONTENTS
16.1 Introduction................................................................................................... 411
16.2 Background on Mobile AR for Cultural Heritage......................................... 412
16.3 Application Example: Historical Tour Guide................................................ 416
16.3.1 Overview of Application.................................................................... 418
16.3.2 Technical Details............................................................................... 419
16.4 Evaluation of User Interest of the Application.............................................. 421
16.4.1 Results................................................................................................ 421
16.5 Conclusion..................................................................................................... 427
References............................................................................................................... 429
16.1 INTRODUCTION
Cultural heritage is the legacy of physical artifacts and intangible attributes of a
group or society that are inherited from past generations and maintained in the present for the benefit of current and future generations. An important societal challenge
is to both preserve and make cultural heritage artifacts accessible to the general
public in both short- and long-term time frames. One recent technology that is being
used to help preserve cultural heritage is augmented reality (henceforth AR).
To preserve cultural artifacts, several cultural heritage institutions have developed their own mobile AR applications using cultural heritage resources. These applications combine AR technology with historical pictures and other cultural artifacts. A question investigated in this chapter is how effective AR technology is for presenting cultural heritage information, and how acceptable the technology is from a user's perspective. A number of studies have examined the acceptance of mobile applications and services (Gao et al. 2011, 2014, Ha et al. 2007, Kaasinen 2005, Liang and Yeh 2010, Liu and Li 2011, van der Heijden et al. 2005, Verkasalo et al. 2010), in some cases extending the traditional technology acceptance model (TAM) based on the limitations of TAM for mobile applications (Gao et al. 2008, Wu and Wang 2005). Further, while recent studies of mobile technology have examined user acceptance of mobile tourist guides (Peres et al. 2011, Tsai 2011), we have found only one study that has examined user acceptance of mobile AR (van Kleef et al. 2010). Therefore, user
The cultural heritage application was built on top of the Layar platform and is available for both Android and iPhone devices. The system enables users to view historic
photographs of Philadelphia as overlays on the camera view of their smartphones.
The application contains almost 90,000 geo-positioned images, 500 of which can
be viewed in 3D, while a selection of 20 images contain additional explanatory text
developed by local scholars. The entire development process is thoroughly documented in a white paper and numerous blog posts covering technical and cultural
challenges that the designers confronted and overcame as they developed the system.
The Streetmuseum (ML 2014) cultural heritage system is an AR application for the iPhone developed by the Museum of London. The application
contains over 200 images from sites across London. Users with an iPhone can view
these images in 3D as ghostly overlays on the present-day scene; however, users
with a 3G phone cannot access the complete AR functionality, but are still able to
view the images in 2D. The Streetmuseum application is different from the applications built by other cultural institutions, mainly because the Museum of London was
able to tailor their system for their particular uses, rather than using an existing AR
browser not tailored to specific users. The result is an application that offers a far
better experience than Layar, but only works on a limited number of devices (Chan
2010). As a measure of its success, the system had more than 50,000 downloads in
the first 2 weeks of its use.
The Netherlands Architecture Institute's UAR application (NAI 2011) is a mobile architecture application developed by the Netherlands Architecture Institute. It is built on top of Layar and is available for both Android and iPhone devices.
It uses AR to provide information about the built environment and is similar to the
Streetmuseum and the AR system developed by PhillyHistory.org. However, unlike
those systems, UAR also contains design drawings and 3D models of buildings that
were either never built, are under construction, or in the planning stage.
Another cultural heritage system, the Powerhouse Museum's AR system (Powerhouse 2014), allows visitors to use their mobile phones to visualize Sydney, Australia, as it appeared 100 years ago. The Powerhouse Museum system is not a
custom application, but is implemented as a channel in Layar. It is thus available on
all devices with a Layar browser. Their web page contains detailed instructions on
how to download Layar and search for the Powerhouse Museum channel.
Still another AR system designed to preserve cultural heritage was Intelligent Tourism and Cultural Information through Ubiquitous Services (iTACITUS) (BMT 2011). This AR system was developed in connection with a European research project that was completed in July 2009. While the project was ongoing, researchers
explored various ways of using AR to provide compelling experiences at cultural
heritage sites, with an additional aim of encouraging cultural tourism. One of the systems developed under the iTACITUS program was Zöllner et al.'s (2009) AR presentation system for remote cultural heritage sites. This system has been used to produce several installations, among them an installation at CeBIT in 2009 marking 20 years since the fall of the Berlin Wall. In that installation, visitors used Ultra
Mobile PCs (UMPCs) to visualize images of Berlin superimposed on a satellite
image of the city laid out on the floor. By touching the screen, users were able to
navigate through visualizations of Berlin as it appeared in different decades, conveying the geography and politics of Berlin before and after World War II and the subsequent construction of the Berlin Wall. The installation also included an outdoor component where interested users were able to take photos of a building and receive historical overlays from the server that corresponded to the current view of the site.
In another example of mobile AR, the CityViewAR system (Billinghurst and Dünser 2012) was developed to be used in a city. A particular goal of this AR system was to support learning. With the system, students used a mobile phone application to see buildings in the city of Christchurch as they existed before the 2011 earthquake, a natural event that caused much damage to the city. The application was regarded as user-friendly and was designed to be usable by any citizen.
In Keil et al. (2011), a mobile app was designed to use AR technology to explain the history and the architectural visual features of a real building viewed outdoors. In the application Explore! (Ardito et al. 2012), one can create an AR outdoor environment based on 3D models of monuments, places, and objects of historical sites, and also extend the cultural heritage experience with contextual sounds.
Finally, a recent paper from the European project Tag Cloud (de los Ríos et al. 2014) provides an overview of the current trends in information technology that are most relevant to cultural institutions. The project investigates how AR, storytelling, and social media can improve a visitor's experience of local culture. Following the overview of techniques for cultural heritage, members of the project note recent developments related to the use of AR in the field.
Lights of St. Etienne (Argon 2014) uses the AR browser Argon to create an embodied, location-based experience. Further, Historypin (2014) is a system that allows community members to share images of the past. However, most of the applications and research projects related to cultural heritage are tourism oriented and do not consider the importance of engaging local community members with their own cultural past.
In addition to these systems, there are many image recognition-based AR applications available: one of the most popular is StickyBits, and another is described by Holey and Gaikwad (2014); both are in the process of becoming mature technologies with the capability to show relevant information about any object in the user's vicinity. Vuforia (2014) is a software development kit (SDK) for AR image recognition that supports iOS, Android, and Unity 3D. These functionalities are so far used only to a limited degree in cultural heritage MAR applications. One use is culture and nature travel (Kultur- og naturreise), a project whose goal is to present cultural heritage and natural phenomena in Norway using mobile technology (Kulturrådet 2011). The project is being done in collaboration with the Arts Council Norway (Kulturrådet), the Directorate for Cultural Heritage (Riksantikvaren), the Norwegian Directorate for Nature Management (Direktoratet for naturforvaltning), and the Norwegian Mapping Authority (Statens kartverk). The first pilot experiment in the project used AR and QR codes to present information from the historical river district in Oslo, Norway. Future pilot projects will be conducted to identify what information and technology are needed to present information from archives, museums, and databases on smartphones, tablets, and other mobile devices.
The first research question deals with relationships between constructs discussed in
the TAM acceptance model. Van der Heijden (2004) showed that perceived enjoyment and perceived ease of use were stronger predictors of intention to use a hedonic
system than perceived usefulness. Based on this observation, our goal was to discover whether the same finding held for mobile AR applications that presented historical pictures and information. Information about user acceptance of technology
and its hedonic qualities, pragmatically, can be used to find ways to make cultural
heritage information more acceptable to users.
The second and third questions guiding the research dealt with a user's interest
in using mobile AR technology for accessing cultural heritage. Here the research
aim was to discover whether there is an interest in the technology with respect to its
application for cultural heritage, and if so, whether this interest is dependent on the
specific application being used on a specific type of device. It was also of interest to
discover whether people wanted to use the application that was developed in their
home town, or when visiting a new city (as a tourist). We were also interested in
discovering how previous interest in local history influenced whether people wanted
to use the application or not.
To research these questions, a preliminary study was first conducted to explore
the need for an AR application presenting historical photographs. A number of similar solutions were reviewed and stakeholders from local cultural heritage institutions
were interviewed to gather user and system requirements. Next, a prototype was
developed and evaluated for its usability. Based on the results derived from this analysis, another design and development phase was performed. Furthermore, different
models for technology acceptance were reviewed and a questionnaire was designed
to measure usability. The questionnaire consisted of five major parts:
1. Perceived usefulness
2. Perceived ease of use
3. Perceived enjoyment
4. Behavioral intention
5. Individual variables
Figure 16.1 shows the research model used in this study. Note that this is the TAM
with perceived enjoyment as used by Davis et al. (1992) and van der Heijden (2004).
The measure for perceived usefulness was developed specifically for this project in
line with the thinking of van der Heijden (2004). For some time it has been clear that
mobile applications have certain specific challenges regarding usability that should
be taken into account (Krogstie 2001, Krogstie et al. 2003).
FIGURE 16.1 The research model: perceived ease of use (PEOU), perceived enjoyment (PE), and perceived usefulness (PU) as predictors of behavioral intention to use (BI), with hypothesized paths H1 (PEOU-BI), H2 (PU-BI), H3 (PEOU-PU), H4 (PEOU-PE), and H5 (PE-BI).
Four constructs are included in the model: perceived enjoyment, perceived usefulness, perceived ease of use, and behavioral intention. While it was expected that the predictive strength of the paths might change, it was also expected that the structure of the relationships from TAM would hold for this model as well. This led to the following hypotheses:
H1: There is a positive relationship between perceived ease of use and intention to use.
H2: There is a positive relationship between perceived usefulness and intention to use.
H3: There is a positive relationship between perceived ease of use and perceived usefulness.
Perceived enjoyment has the same position in the research model as perceived usefulness, which led to the following hypotheses:
H4: There is a positive relationship between perceived ease of use and perceived enjoyment.
H5: There is a positive relationship between perceived enjoyment and intention to use.
All three methods can be combined with filtering to allow users to only look at
photos and related information from a specific decade. We next present the main
functionality of the system:
AR view: The AR view is the main view of the application where POIs are
shown as floating icons overlaying the camera feed. The name of the application is shown in the toolbar at the top. The view is shown in Figure 16.2.
Photo overlay: The application provides transparent photo overlays. These let the user see historical images overlaid on the present-day scene. The
buttons in the toolbar at the top of the screen are used to close the overlay or
go to the detailed information view belonging to the picture.
Detailed information view: Each of the photographs in the application has an associated detailed information view. It contains a description of the subject and also lets the user know when the picture was taken, the source of the photograph, and the name of the photographer.
Timeline: The timeline is always visible at the bottom of the screen. It lets
the user filter the amount of incoming information so they only see photographs from a specific decade. The selected decade is marked in green and
written in the upper left corner of the display.
Map: The map shows the users current position and the position of photos
from the decade selected on the timeline. Each pin is tagged with the name
of the photo and the current distance from the user.
List view: This view shows the user a list of all photographs from the
selected decade and provides a convenient method to open detailed views
without having to locate the associated markers.
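The timeline, map, and list views all rest on the same two operations: filtering the geo-positioned photographs by the selected decade and ordering them by distance from the user's position. A minimal sketch of these operations in Python (the app itself is written in Objective-C; the POI fields and function names here are illustrative assumptions, not the app's actual code):

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 coordinates."""
    r = 6371000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def pins_for_decade(pois, decade, user_lat, user_lon):
    """Return (name, distance) pairs for POIs from the selected decade,
    nearest first -- the information shown on each map pin."""
    selected = [p for p in pois if decade <= p["year"] < decade + 10]
    pins = [(p["name"], haversine_m(user_lat, user_lon, p["lat"], p["lon"]))
            for p in selected]
    return sorted(pins, key=lambda pin: pin[1])
```

At city scale a flat-earth approximation would also do; the haversine form is used here simply because it stays accurate for arbitrary distances.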
16.3.2 Technical Details
As mentioned earlier, the Historical Tour Guide application is built on top of
CroMAR. Therefore, most of the technical details of these two applications are similar. The code for CroMAR is written in Objective-C, the programming language for
native iOS applications. The Historical Tour Guide is written in the same language
but was updated to use automatic reference counting (ARC), a compiler feature of Xcode that provides automatic memory management of Objective-C objects.
Figure 16.3 shows the key objects in the Historical Tour Guide. The application is
organized around the model-view-controller (MVC) pattern. This pattern separates
the data objects in the model from the views used to present the data. It facilitates the
independent development of different components and makes it possible to swap out
views or data without having to change large amounts of code. The system objects in
the diagram are standard objects that are part of all iOS applications. These are not
subclassed or modified by application developers. This is unlike the custom objects
that are instances of custom classes written for this specific application.
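The division of responsibilities that MVC enforces can be sketched in a few lines (Python rather than Objective-C, with invented class and field names, purely for illustration):

```python
class Photo:
    """Model: a data object; knows nothing about presentation."""
    def __init__(self, title, year, photographer):
        self.title, self.year, self.photographer = title, year, photographer

class DetailView:
    """View: renders whatever fields the controller hands it."""
    def render(self, fields):
        return "\n".join(f"{k}: {v}" for k, v in fields.items())

class DetailViewController:
    """Controller: mediates between one model object and one view."""
    def __init__(self, photo, view):
        self.photo, self.view = photo, view

    def show(self):
        # The controller decides which model fields the view displays;
        # swapping the view or the model requires no change to the other.
        return self.view.render({"Title": self.photo.title,
                                 "Taken": self.photo.year,
                                 "Photographer": self.photo.photographer})
```

Because the view never touches the model directly, either side can be replaced independently, which is exactly the property the chapter attributes to the pattern.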
FIGURE 16.3 The key objects in the Historical Tour Guide, organized according to the MVC pattern: the system-provided UIApplication object with its event loop and the UIWindow; the custom application delegate and the data objects of the model; and the view controllers and view/UI objects, which can be either system or custom objects.
The UIApplication object is the system object that manages the application event loop. It receives events from the system and dispatches them to the application's custom classes. It works together with the application delegate, a custom object created at launch time that is responsible for the initialization of the application. The view controller objects manage the presentation of the application's content on the screen. Each of the controller objects manages a single view and its collection of subviews. The other custom view controllers in the application manage either subclasses of, or instances of, the other standard iOS view controllers.
Each view covers a specific area and responds to events within that area. Controls
are a specialized type of view for implementing buttons, check boxes, text fields,
or similar interface objects. Further, the views and view controllers are connected. When a view controller is presented, it makes its views visible by installing them in the application's window. This is represented by a system object of the type UIWindow. The last group of objects is the data model objects. These objects store the application's content, such as the POIs, photographs, and historical information.
The Historical Tour Guide is launched when the user taps the custom application icon. At this point, the application moves from the not-running state to the active state, passing briefly through the inactive state. As part of this launch cycle, the system creates a process and a thread for the application and calls the application's main function. The Historical Tour Guide is an event-driven application. The
flow of the program is determined by two types of events:
1. Touch events, generated when users touch the views of the application
2. Motion events, generated when users move the device
Events of the first type are generated when a user presses a button, scrolls in a list, or
interacts with any of the other views. An action message is generated and sent to the
target object that was specified when the view was created.
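The target-action dispatch described above can be sketched as follows (again in Python; UIKit's real API differs, so treat the names as illustrative only):

```python
class Control:
    """A simplified control that stores (target, action) pairs,
    mirroring the target-action pattern described above."""
    def __init__(self):
        self._targets = []

    def add_target(self, target, action):
        # 'action' is the name of a method on the target object.
        self._targets.append((target, action))

    def send_actions(self, event):
        # Dispatch the event to every registered (target, action) pair,
        # as the event loop does when a touch lands on this control.
        for target, action in self._targets:
            getattr(target, action)(self, event)

class ListController:
    """A target object: receives action messages from controls."""
    def __init__(self):
        self.opened = []

    def row_tapped(self, sender, event):
        self.opened.append(event["row"])
```

The target is specified when the view is created, so the control itself needs no knowledge of what the action does.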
16.4.1 Results
This section presents the descriptive analysis of the results from the two surveys. A statistical test of the overall research model is presented by Haugstvedt and Krogstie (2012), and we only present the main result here. In the street survey, the age range
was 14-60 years, with a mean age of 27.8. Overall, 59.5% of the respondents were male, and about a fifth of the respondents replied that they had previously completed a similar questionnaire. In the web survey the age range was 20-45 years, with a mean
age of 33.3. The gender distribution was about equal, but with slightly more female
respondents.
Interestingly, for the street survey, the respondents did not use the entire scale. Apart from a few responses of 3, all answers were in the range from 4 to 7, which indicates that those who had the opportunity to properly test the system themselves were all either neutral or positive. On the web survey the entire scale was used, which partly explains the shift in the average value of the responses between the two surveys.
The responses to the questions on perceived usefulness are found in Table 16.1. First the individual question is presented (e.g., PU1: by using the app, I can more quickly and easily find historical pictures and information), followed first by the gradings from the web survey and then by the gradings from the street survey. In the final two lines we present the average response for all four questions in the category perceived usefulness. Although the median answer is the same on most questions, we see that the average is generally higher in the street survey. Still, we can regard the responses on usefulness as quite high compared to what we have experienced in similar surveys of other mobile applications (e.g., Hella and Krogstie 2011).
The responses to the four items on perceived ease of use are found in Table 16.2. First the individual question is presented (e.g., PEOU1: interaction with the app is clear and understandable), followed first by the gradings from the web survey and then by the gradings from the street survey. In the final two lines we present the average response for all four questions in the category perceived
TABLE 16.1
Comparing Answers between the Surveys on Perceived Usefulness

Item          N    Min  Max  Mean  Median  STD
PU1: By using the app, I can more quickly and easily find historical pictures and information.
  PU1web     200    1    7   5.35    6     1.181
  PU1street   42    4    7   6.00    6     0.963
PU2:
  PU2web     200    1    7   5.34    6     1.171
  PU2street   42    4    7   6.38    6     0.661
PU3: By using the app, I can quickly find historical pictures and information from places nearby.
  PU3web     200    1    7   5.39    6     1.202
  PU3street   42    4    7   6.24    6     0.726
PU4: By using the app, I am more likely to find historical pictures and information that interest me.
  PU4web     200    1    7   5.08    5     1.393
  PU4street   42    3    7   6.05    6     0.963
PUaverage
  PUweb      200    1    7   5.29    5.5   1.13
  PUstreet    42    4    7   6.17    6.25  0.648
TABLE 16.2
Comparing Answers between the Surveys on Perceived Ease of Use

Item            N    Min  Max  Mean  Median  STD
PEOU1: Interaction with the app is clear and understandable.
  PEOU1web     200    1    7   5.31    6     1.233
  PEOU1street   42    3    7   5.52    6     1.215
PEOU2: Interaction with the app does not require a lot of mental effort.
  PEOU2web     200    1    7   4.87    5     1.361
  PEOU2street   42    3    7   5.88    6     0.993
PEOU3:
  PEOU3web     200    1    7   5.16    5     1.182
  PEOU3street   42    4    7   5.79    6     1.001
PEOU4:
  PEOU4web     200    1    7   4.97    5     1.223
  PEOU4street   42    4    7   5.57    6     1.016
PEOUaverage
  PEOUweb      200    1    7   5.08    5.13  1.117
  PEOUstreet    42    4    7   5.69    5.75  0.860
ease of use. Here we find a similar pattern as before, with higher averages in the street survey and, for several of the questions, higher medians. The responses are on average somewhat less positive, though; thus, even if the application had first undergone a separate usability test and been improved before being used in this investigation, this is an indication that there is still room for improvement. It is positive, though, that the responses in the street survey are more favorable, with averages of around six for finding the app easy to use and not requiring a lot of mental effort.
For perceived enjoyment, the respondents in both surveys used a semantic differential with contrasting adjectives at each end of the scale to rate the items. The scale used in the street survey was a discrete scale with seven categories, while the scale used in the web survey was continuous. The replies from the continuous scale were later coded into seven categories. Results are shown in Table 16.3. First the individual scale is presented (e.g., PE1: disgusting-enjoyable), followed first by the gradings from the web survey and then by the gradings from the street survey. In the final two lines we present the average response for all four scales in the category perceived enjoyment. The data reveal a higher average in the web survey, which indicates quite high perceived enjoyment of this kind of service, although we find a higher standard deviation here than in the categories perceived ease of use and perceived usefulness, pointing to the opinions of the respondents being more mixed.
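The recoding of the web survey's continuous slider readings into seven categories can be done by equal-width binning, sketched below (the slider range of 0-100 is an assumption; the chapter does not state the actual range). Note that the endpoints of the slider map directly onto categories 1 and 7:

```python
def to_seven_point(value, lo=0.0, hi=100.0):
    """Map a continuous slider reading in [lo, hi] onto the discrete
    1..7 scale by equal-width binning (the slider range is an assumption)."""
    if not lo <= value <= hi:
        raise ValueError("slider value out of range")
    width = (hi - lo) / 7.0
    category = int((value - lo) // width) + 1
    return min(category, 7)  # the right endpoint falls into category 7
```

Because any reading at or near an endpoint lands in an extreme category, a continuous slider recoded this way can plausibly yield more 1s and 7s than a discrete seven-point scale.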
In Table 16.4, we provide similar data for intention to use. First the individual question is presented (e.g., BI1: I intend to use the app on a smartphone), followed first by the gradings from the web survey and then by the gradings from the street survey. In the final two lines we present the average response
TABLE 16.3
Comparing Answers between the Surveys on Perceived Enjoyment

Item          N    Min   Max  Mean  Median  STD
PE1: disgusting-enjoyable
  PE1web     200    1     7   6.06    7     1.562
  PE1street   40    3     7   5.83    6     1.130
PE2: dull-exciting
  PE2web     200    1     7   5.60    6.5   1.881
  PE2street   40    3     7   5.45    5     1.131
PE3: unpleasant-pleasant
  PE3web     200    1     7   6.17    7     1.514
  PE3street   40    4     7   5.70    6     1.067
PE4: boring-interesting
  PE4web     200    1     7   5.67    7     1.993
  PE4street   40    3     7   6.00    6     1.038
PEaverage
  PEweb      200    1     7   5.9     6.5   1.500
  PEstreet    40   3.75   7   5.74    5.75  0.893
for all eight questions in the category intention to use. Intention to use is the only area in the street survey where the respondents used the entire scale in answering. As can be seen from the higher standard deviations on many of these questions, opinions are quite mixed, although the responses to questions on some modes of usage (e.g., the use of the app when visiting a city as a tourist) are very positive. We discuss some of the other results in more detail in the following.
An important aspect of the web survey was the opportunity to validate the hedonic research model. Figure 16.4 shows the structural model calculated with data from the web survey. The structural model shows that all five hypotheses were supported. With the exception of the path between PEOU and BI, all paths were significant at the p < 0.001 level. The path between PEOU and BI was significant at the p < 0.05 level. A more detailed treatment of the statistical validity of these results is found in Haugstvedt and Krogstie (2012).
To summarize the hypotheses relative to the research model in light of Figure 16.4:
H1: There is a positive relationship between perceived ease of use and intention to use. Accepted at p < 0.05.
H2: There is a positive relationship between perceived usefulness and intention to use. Accepted at p < 0.001.
H3: There is a positive relationship between perceived ease of use and perceived usefulness. Accepted at p < 0.001.
H4: There is a positive relationship between perceived ease of use and perceived enjoyment. Accepted at p < 0.001.
H5: There is a positive relationship between perceived enjoyment and intention to use. Accepted at p < 0.001.
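The path coefficients behind these hypothesis tests were estimated with a structural equation model (Haugstvedt and Krogstie 2012). As a rough, purely illustrative approximation, coefficients of the same kind can be obtained from ordinary least-squares regressions on standardized construct scores; the function below and its synthetic data are assumptions, not the chapter's actual analysis:

```python
import numpy as np

def standardize(x):
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

def path_coefficients(peou, pe, pu, bi):
    """Approximate the research model with two simple regressions
    (PEOU -> PE, PEOU -> PU) and one multiple regression
    ({PEOU, PU, PE} -> BI) on standardized scores."""
    peou, pe, pu, bi = map(standardize, (peou, pe, pu, bi))
    h4 = np.polyfit(peou, pe, 1)[0]            # PEOU -> PE
    h3 = np.polyfit(peou, pu, 1)[0]            # PEOU -> PU
    X = np.column_stack([peou, pu, pe])        # predictors of BI
    beta, *_ = np.linalg.lstsq(X, bi, rcond=None)
    h1, h2, h5 = beta                          # PEOU, PU, PE -> BI
    return {"H1": h1, "H2": h2, "H3": h3, "H4": h4, "H5": h5}
```

On data generated with positive underlying effects, all five coefficients come out positive, matching the sign pattern of the hypotheses; the magnitudes will of course differ from the SEM estimates in Figure 16.4.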
TABLE 16.4
Comparing Answers between the Surveys on Intention to Use

Item         N    Min   Max  Mean  Median  STD
BI1: I intend to use the app on a smartphone.
  BI1web    200    1     7   4.58    5     1.760
  BI1street  40    3     7   5.98    6     1.121
BI2:
  BI2web    200    1     7   4.07    4     1.700
  BI2street  41    3     7   5.80    6     1.229
BI3:
  BI3web    200    1     7   4.01    4     1.779
  BI3street  41    2     7   5.22    5     1.557
BI4:
  BI4web    200    1     7   3.55    4     1.695
  BI4street  42    1     7   4.81    5     1.784
BI5:
  BI5web    200    1     7   5.05    5     1.577
  BI5street  42    4     7   6.45    7     0.889
BI6: I predict that I will use the app in a city I visit as a tourist.
  BI6web    200    1     7   4.45    5     1.692
  BI6street  42    3     7   6.12    6     1.109
BI7:
  BI7web    200    1     7   4.54    5     1.779
  BI7street  42    1     7   5.43    5     1.548
BI8:
  BI8web    200    1     7   4.16    4     1.810
  BI8street  42    1     7   5.24    5     1.620
BIaverage
  BIweb     200    1     7   4.3     4.5   1.520
  BIstreet   40   3.63   7   5.7     5.5   0.966
On a detailed level, the descriptive results from the street survey and the web survey show some differences. The street survey participants were generally more positive about the application than their web survey counterparts, and used the (negative part of the) scale to a lesser extent. This might be caused by unwanted bias from the presence of the investigator, but can also be an effect of using and experimenting with the app freely themselves. The web survey participants rated the application higher on the scale for perceived enjoyment, but it is likely that this is due to the different format used for this scale in the web survey. The company that collected the data used a continuous slider for this question instead of seven distinct categories. The answers were afterward mapped into seven categories. It is possible that this format caused the respondents to use the endpoints of the scale.
FIGURE 16.4 The structural model calculated with data from the web survey, showing standardized path coefficients with t-values in parentheses (*p < 0.05; ***p < 0.001). The path from PEOU to BI is 0.152 (2.060*); the remaining paths are significant at the p < 0.001 level.
That would at least explain the high number of scores of 7 on this scale in the web survey compared to the street survey. The participants in both surveys were more interested in using the application in a city they visited as a tourist than in using it in their hometown. This indicates that it is relevant to compare the input of people from other places with that from locals.
Finally, the generalizability of the results should be considered. As mentioned earlier, a similar application had been made on a normal mobile platform with maps and geo-tagged historical pictures, with limited success, not getting past the prototype stage. The application on a mobile AR platform has received better feedback, as reported here. This mirrors the results reported by Billinghurst and Dünser (2012), who claim that providing AR on mobile devices can have benefits over offering non-AR content on the same topic.
Norwegians are known to quickly adopt new technologies. It is hard to judge whether users in other countries, where the use of smartphones and tablets is not as widespread, would be less positive toward applications of this sort. Given that the application does not store any private data, aspects of trust that carry different weight in different cultures (Gao and Krogstie 2011) would not be expected to influence the results.
Having established the basic research model, we have looked further at the relationships between the individual variables and intention to use (BI) using the web survey. Given that the data are on a Likert scale, we have used nonparametric techniques to investigate correlations and report the significant results from the web survey below. In addition to the overall variable (intention to use), we have also looked at variables relating to use of the app on a smartphone (BI-smart), on a tablet (as tested, BI-tablet), as a tourist (BI-tourist), or as a local (BI-local).
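For comparisons of ordinal Likert scores between two groups (yes/no on a background variable), a standard nonparametric choice is the Mann-Whitney U test; the chapter does not name the exact test used, so the following self-contained sketch is an assumption:

```python
import math

def mann_whitney_u(a, b):
    """Two-sided Mann-Whitney U test using a normal approximation
    with tie correction; suited to ordinal Likert responses."""
    data = sorted([(x, 0) for x in a] + [(x, 1) for x in b])
    n = len(data)
    ranks = [0.0] * n
    i = 0
    while i < n:                      # assign average ranks to tie groups
        j = i
        while j + 1 < n and data[j + 1][0] == data[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1
        i = j + 1
    na, nb = len(a), len(b)
    r_a = sum(r for r, (_, grp) in zip(ranks, data) if grp == 0)
    u = r_a - na * (na + 1) / 2       # U statistic for sample a
    mu = na * nb / 2
    counts = {}
    for x, _ in data:
        counts[x] = counts.get(x, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    sigma = math.sqrt(na * nb / 12 * ((n + 1) - tie_term / (n * (n - 1))))
    z = (u - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))   # two-sided p-value
    return u, p
```

With 200 web respondents per group comparison, the normal approximation used here is reasonable; exact methods matter only for very small samples.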
In Table 16.5, the first column shows the test variable, whereas the second shows the grouping variable, which is one of the background variables. The next column indicates the average value of the test variable for those responding yes to the
TABLE 16.5
Significant Relationships between Main Variables and Background Variables

Test Variable  Grouping Variable  A (Yes)  B (No)  N(A)  N(B)     p
BI                                  4.53     4.12    89         0.048
BI                                  4.63     4.16    61         0.036
BI                                  4.44     3.69   163         0.006
BI-smart                            4.52     3.44   163         0.000
BI-smart                            4.65     4.09    89         0.008
BI-tablet                           4.40     3.51    61         0.001
BI-tourist                          4.90     4.08   163         0.004
BI-local                            4.46     3.85    95         0.033
PEOU                                5.14     4.78   139         0.021
PE                                  6.10     5.75    27         0.018
PE                                  5.10     6.00    27         0.002
grouping variable. The next column is the average for the no answers, the next columns indicate the number of yes and no answers for the grouping variable, and p is the probability that there is no difference between the groups on the test variable.
As we see from the table, those who have already expressed an interest in local history seem more likely to use such applications, which should come as no surprise. On the technology side, we see that those already having a smartphone or a tablet find it more likely that they will adopt an application of this sort, which necessarily must be mobile. This can also explain the large differences in the answers to questions BI1-BI4 in Table 16.4. Those having a tablet are more likely to use a cultural heritage application on a tablet. Similarly, those having a smartphone are more inclined to use a cultural heritage application on a smartphone. We also see a positive correlation between interest in local history and the usage of the app on a smartphone. Smartphone users may desire to use the app both as a tourist and as a local. Finally, those having a smartphone perceived the tablet app to be easier to use than those not having a tablet, which can also explain the spread of the responses on usability. As more people get used to using tablets, applications exploiting the possibilities of tablets will also be experienced as easier to use. Those already having an interest in local history expressed significantly higher perceived enjoyment. Somewhat surprising is that those not familiar with such an app had a higher perceived enjoyment than those (quite few) who had used such applications before. Possibly, this can be explained by a novelty factor.
16.5 CONCLUSION
As has been illustrated in this chapter, mobile AR has been implemented for a large range of situations, both for indoor and outdoor use, applying a large variety of techniques to enhance spectators' experience and view of cultural heritage. As the technology is developed further, such that a person can
This chapter also discussed the differences between results based on feedback on a mock-up seen outside the actual usage environment and experiences with a real application in the real environment. This comparison makes it possible to judge results from the evaluation of a lower-fidelity solution, where it is easier to get feedback from many users at an early stage (here originally done to validate the research model and investigate the significant correlations between the variables in the research model and the background variables). It appears that the responses in the low-fidelity case were more conservative. One exception was the result on perceived enjoyment, which seems to have been influenced by how this question was implemented with sliders rather than discrete values in the questionnaire. On the other hand, we saw limitations with the more realistic setting, in the sense that possible investigator bias may lead to not reaching local respondents even at the local site (the respondents there might be tourists and not locals). Thus, when investigating the more detailed mechanisms of acceptance, we have focused in particular on the results from the web-based survey.
REFERENCES
Andresen, S. H., J. Krogstie, and T. Jelle. 2007. Lab and research activities at Wireless Trondheim. In: Fourth International Symposium on Wireless Communication Systems,
Trondheim, Norway.
Ardito, C., M. F. Costabile, A. De Angeli, and R. Lanzilotti. 2012. Enriching archaeological parks with contextual sounds and mobile technology. ACM Transactions on Computer-Human Interaction, 19(4): 1-30, ISSN: 1073-0516, DOI: 10.1145/2395131.2395136.
Argon. 2015. Georgia tech augmented environments. http://argon.gatech.edu/ (accessed June
7, 2014).
Azuma, R., M. Billinghurst, and G. Klinker. 2011. Special section on mobile augmented reality. Computers and Graphics, 35: vii-viii.
Billinghurst, M. and A. Dünser. 2012. Augmented reality in the classroom. IEEE Computer, 45(7): 56-63.
BMT. 2011. BMT research and development directorate. iTACITUS: Intelligent tourism and cultural information through ubiquitous services. http://www.itacitus.org/ (accessed June 22, 2014).
Boyer, D. and J. Marcus. 2011. Implementing mobile augmented reality applications for
cultural institutions. In: J. Trant and D. Bearman (eds.), Museums and the Web 2011:
Proceedings (MW2011), Toronto, Ontario, Canada.
Butchart, B. 2011. Augmented reality for smartphones. Technical report, JISC observatory.
Brighton, UK.
Chan, S. 2010. On augmented reality (again): Time with UAR, Layar, Streetmuseum & the CBA. http://www.powerhousemuseum.com/dmsblog/index.php/2010/10/26/on-augmentedreality-again-time-with-uar-layar-streetmuseum-the-cba/ (accessed June 22, 2014).
Davis, F. D., R. P. Bagozzi, and P. R. Warshaw. 1992. Extrinsic and intrinsic motivation to use computers in the workplace. Journal of Applied Social Psychology, 22(14): 1111-1132.
de los Ros, S., M. F. Cabrera-Umpirrez, M. T. Arredondo, M. Pramo, B. Baranski, J. Meis,
M. Gerhard, B. Prados, L. Prez, and M. del Mar Villafranca. 2014. Using augmented
reality and social media in mobile applications to engage people on cultural sites
universal access in humancomputer interaction. Universal access to information and
knowledge. Lecture Notes in Computer Science, 8514: 662672.
Gao, S. and J. Krogstie. 2011. Explaining the adoption of mobile information services from a
cultural perspective. In: Proceedings ICMB 2011, June 2021, Como, Italy, pp. 243252.
Gao, S., J. Krogstie, and P. A. Gransther. 2008. Mobile services acceptance model. Paper
presented at the International Conference on Convergence and Hybrid Information
Technology (ICHIT2008), Daejeon, Korea.
Gao, S., J. Krogstie, and K. Siau. 2011. Developing an instrument to measure the adoption
of mobile services. International Journal on Mobile Information Systems, 7(1): 4567.
Gao, S., J. Krogstie, and K. Siau. 2014. Adoption of mobile information services: An empirical
study. International Journal on Mobile Information Systems, 10(2): 147171.
Ha, I., Y. Yoon, and M. Choi. 2007. Determinants of adoption of mobile games under mobile
broadband wireless access environment. Information and Management, 44(3): 276286.
Haugstvedt, A.-C. and J. Krogstie. 2012. Mobile augmented reality for cultural heritage:
A technology acceptance study. In: Proceedings of the International Symposium on
Mixed and Augmented Reality (ISMAR), November 58, Atlanta, GA.
Hella, L. and J. Krogstie. 2011. Personalisations by semantic web technology in food s hopping.
In: Proceedings of WIMS 2011, Sogndal, Norway.
Historypin. 2014. A global community collaborating around history. http://www.historypin.
com (accessed June 7, 2014).
Holey, P. and V. Gaikwad. 2014. Google glass technology. International Journal of Advanced
Research in Computer Science and Management Studies, 2(3): 278.
430
431
van Kleef, N., J. Noltes, and S. van der Spoel. 2010. Success factors for augmented reality business models. In: Study Tour Pixel 2010. University of Twente, Twente, the
Netherlands, pp. 136.
Venkatesh, V. and F. D. Davis. 2000. A theoretical extension of the technology acceptance
model: Four longitudinal field studies. Management Science, 46(2): 186204.
Verkasalo, H., C. Lpez-Nicolsb, F. J. Molina-Castillo, and H. Bouwman. 2010. Analysis
of users and non-users of smartphone applications. Telematics and Informatics, 27(3):
242255.
Vlahakis, V., N. Ioannidis, J. Karigiannis, M. Tsotros, and M. Gounaris. 2002. Virtual reality and information technology for archaeological site promotion. In: Proceedings of
the Fifth International Conference on Business Information Systems (BIS02), Poznan,
Poland.
Vuforia. 2014. https://www.qualcomm.com/products/vuforia (accessed March 3, 2015).
Wu, J.-H. and S.-C. Wang. 2005. What drives mobile commerce? An empirical evaluation of
the revised technology acceptance model. Information Management, 42: 719729.
Wst, H., M. Zllner, J. Keil, and D. Pletinckx. 2009. An augmented reality presentation system
for remote cultural heritage site. In: Proceedings of the 10th International Symposium
on Virtual Reality, Archaeology and Cultural Heritage VAST (2009). Atlantic City, NJ.
17
Applications of
Augmented Reality for
the Automotive Industry
Vincent Gay-Bellile, Steve Bourgeois,
Dorra Larnaout, and Mohamed Tamaazousti
CONTENTS
17.1 Introduction................................................................................................... 434
17.2 Potential Benefits of Augmented Reality for the Automotive Industry......... 434
17.2.1 Vehicle Design and Conception......................................................... 434
17.2.2 Factory Planning................................................................................ 435
17.2.3 Vehicle Production............................................................................. 435
17.2.4 Sales Support..................................................................................... 436
17.2.5 Driving Assistance............................................................................. 436
17.2.6 User Manual and Maintenance Support............................................ 438
17.3 Technological Challenges for a Large-Scale Deployment............................ 438
17.4 Tracking a Vehicle or One of Its Components................................................440
17.4.1 State of the Art..................................................................................440
17.4.2 Our Solution: VSLAM Constrained by a CAD Model..................... 443
17.4.2.1 Principle.............................................................................. 443
17.4.2.2 Implementation...................................................................444
17.4.3 Discussion.......................................................................................... 445
17.5 Vehicle Localization for Aided Navigation in an Urban Context.................446
17.5.1 State of the Art..................................................................................446
17.5.2 Constrained VSLAM for Large-Scale Vehicle Localization............ 447
17.5.2.1 Constraining VSLAM with GPS........................................448
17.5.2.2 Improving the In-Plane Accuracy with GIS Constraint.....449
17.5.2.3 Improving the Out-Plane Accuracy with a GIS
Constraint...........................................................452
17.5.2.4 Overview of Our Complete Solution for Vehicle
Localization........................................................................ 452
17.5.3 Discussion.......................................................................................... 453
17.6 Conclusion..................................................................................................... 454
Acknowledgments................................................................................................... 454
References............................................................................................................... 455
17.1 INTRODUCTION
For a number of reasons, the automotive industry is heavily involved in the development of augmented reality (AR) and its applications. Since the late 1990s, car manufacturers and original equipment manufacturers (OEMs) have explored the benefits of AR through major collaborative projects, such as ARVIKA with Daimler Chrysler, Ford, Audi, Volkswagen, and Siemens AG (ARVIKA Project 1999–2003); ARTESAS with Siemens AG and BMW (ARTESAS Project 2004–2006); EFA2014 with Audi, BMW, Continental, and Bosch (EFA2014 Project n.d.); AVILUS with Daimler, Siemens, and Volkswagen; and finally, EGYPT with PSA. During these projects, the use of AR across the whole product life cycle has been studied, from car conception to customer assistance services. Still, the deployment of AR applications in the automotive industry has remained limited due to various technical challenges.
In this chapter, we first provide an overview of the different uses of AR in the automotive industry and their respective requirements. Then, we focus on recent advances in tracking technology that may lead to large-scale deployment of AR solutions in the automotive industry.
operator forms of the design by showing him or her hidden elements that he or she could not otherwise see directly, such as the position of an electrical wire behind a metal panel. This virtual information can be generated from existing CAD data of the product, or from data arising from nondestructive testing (e.g., ultrasonic imagery, tomography).
In summary, AR can reduce the risk of human error, but also increase the efficiency of the operator. Indeed, with AR the operator can remain focused on the area of intervention, since he or she no longer needs to switch between the documentation and his or her work area. This use of AR within the automotive industry has been evaluated for tasks such as picking (Reif and Günthner 2009), assembly (Reiners et al. 1998; Regenbrecht et al. 2005), and quality control (Zhou et al. 2012). In addition, if AR technology can make complex operations easier, it can also be used for training purposes, in which case AR could free senior staff from basic training duties or accelerate the replacement of an absent worker.
17.2.4 Sales Support
During the sales process, an automotive salesperson faces various challenges. First, he or she may have to present to the potential customer the various models and options of a vehicle, yet only one or two models may be available in the showroom. Moreover, even if the vehicle is available, some of its characteristics cannot be easily observed, such as the turning radius or the braking distance. One current solution to this problem is to use a VR configurator. However, this approach has drawbacks, since the virtual representation is limited and the user's perception of the dimensions and volumes of the automobile is distorted.
AR provides an interesting alternative. As with a virtual configurator, with AR the vehicle can be customized interactively (Gay-Bellile et al. 2012). Importantly, with this approach, since the augmentation is achieved on a real vehicle, the perception of dimensions and volumes is preserved. The end-user experience can be further improved by integrating interactions between the real environment and the virtual elements. For example, if the customer chooses to change the color of the bodywork, it is important to provide a rendering that respects the current lighting conditions, but that also keeps the reflections of the surrounding environment on the bodywork. Finally, as illustrated in Figure 17.1, if customization through AR is useful for cars purchased by the average customer, it is even more useful for utility vehicles due to their highly customizable interior furnishings.
FIGURE 17.2 Driving assistance in AR. While the vehicle is accurately localized with a constrained VSLAM using a camera, standard GPS, and a coarse 3D city model (top left), AR is used to display the path to follow (top right), safety information such as dangerous intersections (top right) and crosswalks (bottom left), or touristic information such as hotels (bottom right). (Courtesy of CEA Tech.)
to current aided navigation systems, the driver would not have to mentally transpose a schematic representation of the road network onto his or her current perception of the environment. AR can also be used to focus the driver's attention on potential dangers, such as crosswalks and intersections (see Figure 17.2), and other road users.
AR can also improve the driver's perception of the roadway environment when observation conditions are degraded due to weather or other reasons. For example, with AR, the edges of the road can be highlighted in foggy conditions, and vehicles occluded by a building (Barnum et al. 2009) or by another vehicle (Gomes et al. 2013) can be displayed to the driver. Finally, the development of autonomous vehicles will probably favor the development of infotainment applications, such as touristic guides, and AR will be especially useful in providing this information.
Of course, localization is not the only challenge to implementing AR in the automotive industry. Providing a hands-free AR device remains a key challenge for the solution's ergonomics, as illustrated by the research activity on semitransparent glasses and windshields. However, depending on the targeted application, the use of a remote screen (such as the screen of a tablet or of Google Glass) or of a video projector (spatial AR) can provide an imperfect but nevertheless acceptable alternative.
Consequently, in the following sections, we will focus our attention on the localization issue and introduce the recent advances we have made on this subject. We will also distinguish tracking solutions with respect to their applicative context: AR on the vehicle and AR from the vehicle. The former covers all the applications for which the car, or one of its components, is the target of the augmentation, while the latter covers mainly the driving-assistance applications.
localize the observer with respect to the object. For that, an off-line calibration process has to be performed beforehand to determine the pose of the target object with respect to the camera constellation. Similar to the inside-out approach, the outside-in approach requires the marker to be visible to at least one camera of the constellation; therefore, the volume covered by such an approach is limited. In both cases, the necessity of instrumenting the environment with markers or cameras implies a deployment process whose complexity is incompatible with most applicative contexts. For the outside-in approach, the cost of the camera constellation can also be a further limitation.
To prevent these deployment problems, optical solutions that rely exclusively on natural features of the target object have been developed. They are referred to as model-based tracking and rely on a 3D model that describes the natural features of the target object (assumption of a known object). These 3D features can be geometric, such as the edges of a 3D mesh model (Drummond and Cipolla 2002; Wuest et al. 2007), or photo-geometric, such as a collection of 3D texture patches (Rothganger et al. 2003; Gordon and Lowe 2006). The pose P of the object can be estimated from a camera image by matching a subset {Q_i}_{i=1}^n of these 3D model features with their corresponding 2D features {q_{i,j}}_{i=1}^n in the jth camera image. The object pose is then defined as the one that minimizes the reprojection error, that is, the distance between the 2D projection of each 3D feature, according to the object pose and the camera's intrinsic parameters, and the position of its associated 2D feature in the image:
\[
\arg\min_{P} \sum_{i=1}^{n} d^2\big(q_{i,j}, \Pi(P, Q_i)\big)
\]
where
d^2 is the squared 2D Euclidean distance
\Pi is the camera projection function
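As a rough illustration of this pose estimation, the following sketch minimizes the reprojection error over a 6-DoF pose with a generic least-squares solver. The pinhole intrinsics K, the synthetic point sets, and the initial guess are illustrative assumptions, not part of the chapter's implementation.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

# Assumed pinhole intrinsics (illustrative values only).
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])

def project(pose, Q):
    """Camera projection Pi(P, Q): 3D points in the object frame -> pixels.
    pose is a 6-vector (rotation vector, then translation)."""
    rvec, t = pose[:3], pose[3:]
    Qc = Rotation.from_rotvec(rvec).apply(Q) + t   # object -> camera frame
    uvw = Qc @ K.T
    return uvw[:, :2] / uvw[:, 2:3]                # perspective divide

def estimate_pose(q_obs, Q, pose0):
    """arg min_P sum_i d^2(q_i, Pi(P, Q_i)) via nonlinear least squares."""
    residuals = lambda p: (project(p, Q) - q_obs).ravel()
    return least_squares(residuals, pose0).x
```

Starting from a coarse pose0, the solver converges to the pose whose reprojections best match the observed 2D features, which is exactly the small-motion regime discussed below.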
While model-based tracking solutions are easy to deploy and result in accurate localization, their lack of robustness is not compatible with the required quality and continuity of service.
Indeed, model-based tracking is subject to two major drawbacks. First, the matching of the 3D features with 2D observations can only be achieved under restrictive conditions of use. On the one hand, the information encoded in geometric features is not discriminative enough to easily distinguish between two observations. To succeed, a usual solution consists in reducing the complexity of the matching problem by introducing a small-motion assumption. However, this assumption restricts the conditions of use of the system. On the other hand, photometric features are discriminative enough to achieve the matching under fast motion. However, because the appearance of an object varies with the lighting conditions, a 3D model based on photometric features is only valid under specific lighting conditions. This constraint restricts the use of this solution to environments where the lighting conditions are controllable.
Second, the pose of the camera is not accurately estimated when the object is
small in the image, subject to a large occlusion, or out of the field of view, since
those configurations do not provide enough geometric constraints: the number of 2D
features matched with the 3D model is small and/or these 2D features are located in
a small area of the image.
Among the optical tracking solutions, a last approach provides greater robustness to lighting condition variations and large viewpoint variations. This solution is usually referred to as visual simultaneous localization and mapping (VSLAM) or structure from motion (SfM), depending on the scientific community (Mouragnon et al. 2006; Klein and Murray 2007).
While the camera pose is estimated in a similar way to a model-based tracking solution, VSLAM approaches do not use an a priori model of an object but reconstruct, online and in real time, a 3D model of the whole scene (assumption of an unknown environment). To achieve this reconstruction, VSLAM uses the principles of multiview geometry (Hartley and Zisserman 2004) to assess the 3D position of scene features, such as 3D points, from their apparent 2D motion in the video stream. Since a long-enough 2D displacement is required to estimate the depths of the features, this reconstruction process is not performed at every frame. The frames at which the reconstruction process is performed are usually referred to as keyframes. To reach an optimal trajectory and scene reconstruction with respect to the multiview geometry constraints, both are optimized simultaneously with a nonlinear optimization process referred to as bundle adjustment (BA). This optimization process minimizes the error R that corresponds to the sum of squared differences between the 2D projection of each 3D point and its associated 2D observations in each keyframe (also referred to as the reprojection error):
\[
R(\theta) = \sum_{i=1}^{n} \sum_{j \in A_i} d^2\big(q_{i,j}, \Pi(P_j, Q_i)\big)
\]
where
\theta stands for the scene parameters optimized by the BA (i.e., the coordinates {Q_i}_{i=1}^n of the reconstructed 3D features and the pose parameters {P_j}_{j=1}^l at keyframes)
d^2 is the squared Euclidean distance
\Pi is the projection function of a point Q_i with respect to a camera pose P_j
A_i is the set of keyframe indexes in which the point Q_i is associated with an observation q_{i,j}
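A minimal sketch of this reprojection-error computation, using a toy projection with unit intrinsics and synthetic data (the pose representation is an assumption for illustration):

```python
import numpy as np

def project(pose, Q):
    """Toy projection Pi(P_j, Q_i): pose = (R, t), unit intrinsics."""
    R, t = pose
    Qc = R @ Q + t
    return Qc[:2] / Qc[2]

def reprojection_error(points, poses, observations):
    """R(theta) = sum_i sum_{j in A_i} d^2(q_ij, Pi(P_j, Q_i)).
    observations maps (i, j) -> q_ij; its keys encode the visibility sets A_i."""
    return sum(np.sum((q - project(poses[j], points[i])) ** 2)
               for (i, j), q in observations.items())
```

Bundle adjustment then minimizes this scalar jointly over the 3D points and the keyframe poses.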
Since the camera localization is estimated from a 3D model that covers the whole observed scene, the VSLAM approach provides excellent robustness to large and fast camera motions and to partial scene occlusion. Moreover, since the 3D model of the scene is built online, the photometric appearance of the model features is, by construction, consistent with the current lighting conditions. Unfortunately, the VSLAM approach provides poor localization accuracy. Indeed, in spite of the BA, the reconstruction process is usually subject to error accumulation due to its incremental nature. Moreover, VSLAM localization is expressed in an arbitrarily chosen coordinate
frame, and with an arbitrary scale factor when a single camera is used. However, this last drawback can be partially resolved by bootstrapping the 3D model of the scene with the help of a model-based tracking approach. While the model-based solution is used to estimate the pose of the first keyframe, an initial reconstruction of the scene is achieved by back-projecting the 2D image features onto the CAD model of the target object (Bleser et al. 2006). While this initial reconstruction process provides a coordinate frame and a scale factor for the VSLAM localization, their accuracy remains limited since the reconstruction is achieved from a single point of view.
In the following, we will introduce our optical tracking approach that goes one step further toward the combination of model-based tracking and VSLAM, and provides accurate and robust localization while remaining easy to deploy.
error accumulation. The resulting drift can be observed, for example, at keyframes by projecting the CAD model onto the image with the camera pose estimated by the VSLAM process. The error on the camera pose will generate an offset between the sharp edges of the CAD model and their corresponding contours in the image. If the constraints provided by the CAD model were strictly respected, these offsets would be null.
To prevent drift, we introduce the constraints provided by the CAD model directly
into the VSLAM process. Consequently, the optimal trajectory and environment
reconstruction must minimize simultaneously the multiview constraints (i.e., the
reprojection errors of the 3D features reconstructed online) but also the constraints
provided by the CAD model (i.e., the offset between the projection of the 3D sharp
edges of the CAD model and their corresponding contours in the image).
17.4.2.2 Implementation
We propose to use model-based constraints provided by the sharp edges of the 3D model (i.e., for a polygonal model, edges shared by two triangles and whose dihedral angle is above a threshold). Similar to Drummond and Cipolla (2002), these sharp edges are sampled into a set of oriented points {L_i}_{i=1}^s, usually referred to as edgelets, each point being parameterized by its 3D position M_i and the 3D direction D_i of the sharp edge from which it was extracted. An edge-based model for the bodywork of a car is illustrated in Figure 17.4.
To introduce the edge-based constraint into a VSLAM process while keeping its real-time performance, we propose to modify exclusively the BA process. Indeed, since the BA can be run as a background process (Klein and Murray 2007), the performance of the tracking process will not be altered by our modifications. However, modifying the BA will affect the result provided by the tracking process, since the BA refines the positions of the 3D features used to estimate the camera pose.
In the BA, the edge-based constraint takes the form of an additional term h_CAD in the cost function. This term corresponds to the orthogonal distance between the
projection \Pi(P_j, M_i) of the edgelet and its corresponding contour point m_{i,j} in the image. This corresponding contour point is usually defined as the nearest contour point located along the normal of the edgelet direction, with an orientation similar to that of the projected edgelet. Consequently, the deviation h_CAD of the scene with respect to the CAD-model constraints is defined as follows:
\[
h_{\mathrm{CAD}}(\theta) = \sum_{i=1}^{s} \sum_{j \in B_i} \Big( n_{i,j}^{\top} \big( m_{i,j} - \Pi(P_j, M_i) \big) \Big)^2
\]
where
n_{i,j} is the normal of the projected direction of D_i under pose P_j
\Pi is the camera projection function of a 3D edgelet M_i with respect to a camera pose P_j
B_i is the set of keyframe indexes for which the edgelet M_i is associated with a 2D contour point m_{i,j}
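The edgelet term can be sketched as follows; this is a simplified 2D computation whose variable names mirror the equation, with synthetic data standing in for the projected edgelets and matched contours:

```python
import numpy as np

def edgelet_residual(m, proj_point, proj_dir):
    """n^T (m - Pi(P_j, M_i)): signed orthogonal distance between the matched
    contour point m and the projected edgelet (proj_point, proj_dir)."""
    d = proj_dir / np.linalg.norm(proj_dir)
    n = np.array([-d[1], d[0]])          # 2D normal of the projected direction
    return float(n @ (m - proj_point))

def h_cad(associations):
    """Sum of squared edgelet-to-contour distances over (m, point, dir) triples."""
    return sum(edgelet_residual(m, p, d) ** 2 for m, p, d in associations)
```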
The optimal scene reconstruction \theta_{optim} is therefore defined as the one that minimizes simultaneously the multiview and CAD-model constraints:
\[
\theta_{\mathrm{optim}} = \arg\min_{\theta} \big( R(\theta) + h_{\mathrm{CAD}}(\theta) \big)
\]
This biobjective cost function is optimized through a BA. More details about its implementation can be found in Gay-Bellile et al. (2012).
17.4.3 Discussion
The combination of model-based and VSLAM approaches provides two main benefits.
On the one hand, the constraints provided by the CAD model bring accuracy to the VSLAM process. Indeed, they define a reference frame but also prevent the drift of the scale factor and the error accumulation, which are the main drawbacks of the standard VSLAM process. Moreover, our solution provides better robustness to a coarse initialization than the usual solutions that bootstrap a VSLAM process with a model-based solution (Bleser et al. 2006). In fact, the error that a coarse initialization induces on the online reconstruction will be progressively corrected, since the model-based constraint is optimized over several keyframes simultaneously. Consequently, our constrained VSLAM process is easier to deploy and provides a more accurate online reconstruction.
On the other hand, the reconstruction of the unknown environment realized by the VSLAM process provides the robustness that was missing in the standard model-based approach. The features used to estimate the camera poses are distributed over the whole image, since the whole environment is used to estimate the pose of the camera. Even if the object is small in the image, occluded, or out of the field of view, its location with respect to the camera can still be estimated from the observation of its surrounding environment. Moreover, because the edgelet-to-contour association requires a small-motion assumption, standard model-based solutions are not robust to fast motion. However, in our constrained VSLAM approach,
a first motion estimate is provided by the VSLAM process. Since the edgelet-to-contour matching is achieved from the camera pose provided by this first estimate, the small-motion assumption is always satisfied.
Also, since VSLAM and geometric model-based tracking are both robust to illumination variations, the resulting constrained VSLAM maintains this robustness. Consequently, our solution provides both accuracy and robustness, thus meeting the requirements of quality and continuity of service. The ease-of-deployment requirement is also fulfilled by our constrained VSLAM solution, since it uses only a standard camera and the CAD models that are widely used in the automotive industry, and since it can be coarsely initialized.
Our constrained VSLAM solution was successfully applied to various parts of a vehicle, such as the bodywork, a cylinder head, and the automobile's interior. Some results are shown in Figure 17.1, where our approach was used for sales assistance with AR technology. Notice that these scenarios would have been extremely challenging for the usual model-based or VSLAM approaches, since the bodywork provides few sharp edges while its texture is unstable, being mostly generated by specular reflections of the surrounding environment.
FIGURE 17.5 (a) The in-plane degrees of freedom, that is, the x and y position and the yaw angle. (b) The out-of-plane degrees of freedom, that is, the altitude, the pitch, and the roll angle.
do not provide an accurate 6-DoF localization and, consequently, do not meet the quality-of-service criterion expressed earlier.
The second approach relies on a database of 3D visual features, such as 3D points associated with a description of their visual appearance. This corresponds to a photo-geometric 3D model that can be used online by the model-based tracking solutions (Lothe et al. 2009) introduced in Section 17.4.1 to estimate the 6 DoF of the camera pose at high frequency. However, this approach has serious drawbacks. First, such databases are not widely available and their construction is usually expensive, since it requires a dedicated vehicle to collect data along all the streets of all the cities covered. Second, the localization process is usually not robust to large illumination variations with respect to the illumination conditions observed during the database construction. Therefore, the continuity of service cannot be guaranteed with this solution.
To avoid the drawbacks induced by landmark databases, another family of vision-based approaches relies on the exploitation of VSLAM (cf. Section 17.4.1). However, as already mentioned, this approach is subject to error accumulation, and to scale-factor drift when a single camera is used. To prevent these problems, most solutions fuse the motion estimated by VSLAM with the location provided by GPS (Schleicher et al. 2009). While these solutions provide high-frequency localization, the estimation remains inaccurate. Indeed, the accuracy of the in-plane parameters (cf. Figure 17.5) is usually limited to the GPS accuracy, while the out-of-plane parameters (cf. Figure 17.5) are either not estimated or estimated with high uncertainty.
Consequently, none of the previously mentioned solutions provides at the same time the quality/continuity of service and the ease of deployment. However, in the following, we will demonstrate that the VSLAM approach can fulfill all these requirements when it is correctly constrained with additional data, such as those derived from a GPS and a geographic information system (GIS).
and a GIS, both being low cost and widely available. Similar to the approach that we introduced for tracking vehicle components, our solution relies on a constrained VSLAM framework; the two solutions differ only in the nature of the constraints used.
In the following, we will first describe how the in-plane DoF of a VSLAM can be coarsely constrained with GPS data (Section 17.5.2.1). Then, we will demonstrate that a standard GIS can be used to refine both the in-plane (Section 17.5.2.2) and out-of-plane DoF (Section 17.5.2.3) of the localization.
17.5.2.1 Constraining VSLAM with GPS
The GPS constraint to consider is that the camera positions provided by the VSLAM must follow those provided by the GPS. In fact, if we consider that the camera is rigidly attached to the GPS and that the distance between the two sensors is negligible, then this constraint can be enforced by analyzing the gap between the GPS measurements and the camera positions. This results in the following GPS constraint:
\[
h_{\mathrm{GPS}}(\theta) = \sum_{j=1}^{l} \Big( \big( (t_j)_x - (g_j)_x \big)^2 + \big( (t_j)_y - (g_j)_y \big)^2 \Big)
\]
where
((t_j)_x, (t_j)_y) is the jth camera position and ((g_j)_x, (g_j)_y) its associated GPS data
l is the number of cameras optimized in the BA
Including this GPS constraint in the BA gives
\[
A(\theta) = R(\theta) + w \, h_{\mathrm{GPS}}(\theta)
\]
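A sketch of this fused cost; the camera centres, GPS fixes, and weight below are illustrative values:

```python
import numpy as np

def h_gps(cam_positions, gps_positions):
    """h_GPS(theta): squared planar (x, y) gap between the VSLAM camera
    centres t_j and their associated GPS fixes g_j."""
    diff = np.asarray(cam_positions)[:, :2] - np.asarray(gps_positions)[:, :2]
    return float(np.sum(diff ** 2))

def fused_cost(R_theta, cam_positions, gps_positions, w):
    """A(theta) = R(theta) + w * h_GPS(theta)."""
    return R_theta + w * h_gps(cam_positions, gps_positions)
```

Only the x and y coordinates enter the term, which is why the GPS constrains the in-plane parameters but leaves the out-of-plane ones unobserved.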
A weight w must be added to take into account the uncertainty of the GPS data and its variation over time (which is not the case for the CAD-model constraints, since the model is assumed error free). The presence of aberrant data makes the weight estimation a challenging problem. To avoid this problem, Lhuillier (2012) proposes a new formulation of the constraint that provides more robustness against inaccurate data. The principle is to progressively introduce the additional constraints (GPS, 3D model) while not degrading the multiview geometry (i.e., the reprojection error) beyond a certain threshold:
\[
R(\theta) < e_t
\]
where
\theta is a vector containing all the parameters optimized in the local BA (the n 3D points and the l camera poses)
R(\theta) is the reprojection error previously introduced in Section 17.4.1
The inequality constraint R(\theta) < e_t can be formalized through a cost function that contains, in addition to the constraint term h(\theta), a regularization term prohibiting a degradation of the reprojection error beyond the predefined threshold e_t. This regularization term, computed from the standard reprojection error, has a negligible value when the condition is respected and tends toward infinity when the condition is close to being violated. Consequently, the resulting cost function is given by
\[
I(\theta) = \frac{w}{e_t - R(\theta)} + h(\theta)
\]
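A sketch of this barrier-style cost; the values of w and e_t below are illustrative:

```python
def barrier_cost(R_theta, h_theta, w, e_t):
    """I(theta) = w / (e_t - R(theta)) + h(theta).
    The barrier term is negligible while R(theta) is far below the threshold
    e_t and tends to infinity as R(theta) approaches e_t."""
    if R_theta >= e_t:
        raise ValueError("reprojection error must remain below the threshold e_t")
    return w / (e_t - R_theta) + h_theta
```

Minimizing I(θ) thus pulls the solution toward the additional constraint h(θ) while preventing the reprojection error from degrading past e_t.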
The second difficulty is that the GPS and building constraints both act (explicitly or implicitly) on the camera position and have different optimal solutions, due to the significant GPS uncertainty in dense urban areas. Thus, merging these two constraints in a single constrained BA can lead to convergence problems. Therefore, we propose a solution that uses the building models to remove the bias that affects the GPS data before including them in the BA process.
17.5.2.2.1 Buildings Constraint
The principle of the constraint provided by the 3D building models is based on the following hypothesis: a perfect VSLAM reconstruction (i.e., without any drift) must lead to a 3D point cloud that is almost aligned with the 3D building models of the observed scene. Consequently, it is possible to evaluate the compliance or noncompliance with the buildings constraint by measuring the difference between the positions of the reconstructed 3D points and their corresponding facades in the 3D building model. However, not all the 3D points reconstructed by the VSLAM algorithm belong to a facade; some 3D points may represent elements such as parked cars, trees, and road signs, and this set of points is not concerned by the buildings constraint. Consequently, the first step in establishing this constraint is to identify the set M of 3D points that belong to building facades and to associate each one with its corresponding facade. The constraint term associated with the building models must measure the distance between each point Q_i \in M and its corresponding facade h_i. For that, each point Q_i \in M is expressed in the coordinate frame of its facade:
\[
\begin{pmatrix} Q_i^{h_i} \\ 1 \end{pmatrix} = T_{h_i} \begin{pmatrix} Q_i \\ 1 \end{pmatrix}
\]
where T_{h_i} is the (4 × 4) transfer matrix from the world coordinate frame to the coordinate frame of facade h_i. In this coordinate frame, the distance between a 3D point and the building model is simply given by the z coordinate of Q_i^{h_i}. Therefore, the compliance of a point cloud with the buildings constraint can be estimated as follows:
h_bdg(x) = Σ_{i∈M} ((Q_i^{h_i})_z)²
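In code, this constraint can be evaluated directly from the two relations above. The following sketch uses plain Python with illustrative list-based 4 × 4 matrices and 3-tuples for points; it expresses each facade-associated point in its facade frame and sums the squared z coordinates.

```python
def to_facade_frame(T, Q):
    """Express 3D point Q in a facade frame via the 4x4 transfer matrix T,
    using homogeneous coordinates (Q, 1)."""
    x, y, z = Q
    p = [T[r][0] * x + T[r][1] * y + T[r][2] * z + T[r][3] for r in range(4)]
    return (p[0] / p[3], p[1] / p[3], p[2] / p[3])

def h_bdg(points, transforms):
    """Buildings constraint: sum of squared z coordinates (distances to the
    facade plane) over the facade-associated points Q_i in the set M."""
    total = 0.0
    for Q, T in zip(points, transforms):
        total += to_facade_frame(T, Q)[2] ** 2
    return total
```

A point lying exactly on its facade contributes nothing; the farther a point drifts from its facade plane, the larger its quadratic penalty.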
Recent studies propose to correct the bias of standard GPS by using geo-referenced information provided by GIS. In Kichun et al. (2013), road markings are used: white lines delimiting the road are detected in the images and matched with the numerical model of the road to estimate the lateral bias of the GPS. Crosswalks are exploited with the same principle to calculate the bias in the direction of the vehicle displacement. However, white lines are not present on all roads, and crosswalks are not frequent enough to estimate the GPS bias regularly. In addition, these road markings can be regularly occluded by other cars. Finally, this type of information (white lines and crosswalks) is currently not common in GIS, which makes this solution difficult to deploy.
To overcome these problems, we propose to estimate the bias of the GPS from the reconstruction obtained by the fusion of VSLAM and GPS data, exploiting the buildings that are highly visible in urban areas and for which 3D models are widely available in GIS. The bias of the GPS is not directly observable, unlike the VSLAM reconstruction error it generates, which results in a misalignment between the reconstructed 3D point cloud and the buildings models. Our solution is based on the following hypothesis: the error locally affecting the VSLAM reconstruction after fusion with GPS data (i.e., over the n last camera positions and the 3D points they observe) corresponds to a rigid transformation in the ground plane, that is, with 3 DoF (2D position and yaw angle). This assumption appears to be a sufficient approximation for a coarse correction.
We add a correction module to the constrained VSLAM process that estimates the GPS bias at each keyframe, as seen in Figure 17.6. This module takes as input the buildings model and the VSLAM reconstruction, which is not perfectly aligned with the buildings model because of the GPS bias. The first step of this module is to identify which points of the 3D point cloud come from buildings facades and which come from the rest of the environment, such as parked cars or trees. Once this segmentation is performed, the 3-DoF rigid transformation is estimated by minimizing the distances between the 3D points associated with the buildings model and their associated facades. This transformation is then locally applied to the GPS data associated with the n last camera positions to correct their bias.
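A minimal sketch of the estimation step is given below. It assumes facades are represented as ground-plane lines with unit normals and that the facade-associated points are already projected to the ground plane; the function names and the coarse grid search are illustrative assumptions, not the authors' implementation (which minimizes point-to-facade distances more efficiently).

```python
import math

def apply_se2(p, tx, ty, yaw):
    """Apply a 3-DoF ground-plane rigid transform (translation + yaw) to p."""
    c, s = math.cos(yaw), math.sin(yaw)
    return (c * p[0] - s * p[1] + tx, s * p[0] + c * p[1] + ty)

def facade_distance(p, line):
    """Signed distance of ground-plane point p to a facade line given as
    (nx, ny, d) with unit normal (nx, ny)."""
    nx, ny, d = line
    return nx * p[0] + ny * p[1] + d

def estimate_bias(points, lines, radius=1.0, step=0.1):
    """Coarse grid search for the 3-DoF transform that best aligns the
    facade-associated points with their associated facade lines."""
    n = int(2 * radius / step) + 1
    shifts = [i * step - radius for i in range(n)]
    yaws = [i * 0.01 - 0.05 for i in range(11)]   # small yaw range (rad)
    best, best_cost = (0.0, 0.0, 0.0), float("inf")
    for tx in shifts:
        for ty in shifts:
            for yaw in yaws:
                cost = sum(facade_distance(apply_se2(p, tx, ty, yaw), l) ** 2
                           for p, l in zip(points, lines))
                if cost < best_cost:
                    best, best_cost = (tx, ty, yaw), cost
    return best
```

The recovered transform is then applied, with opposite sign, to the recent GPS fixes before they re-enter the constrained BA.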
FIGURE 17.6 Overview of our proposed solution for accurate localization in urban environment. It combines VSLAM with GPS data and GIS models (buildings and DEM).
Similar to the buildings constraint, the DEM constraint ties each camera position to the road surface:

h_DEM(x) = Σ_{j=1}^{m} ((t_j^{k_j})_z)²

where (t_j^{k_j}, 1)^T = L_{k_j} (t_j, 1)^T, with t_j the position of the jth camera and L_{k_j} the 4 × 4 transfer matrix from the world coordinate frame to the coordinate frame of the road k_j associated with the jth camera. More details on the DEM constraints are given in Larnaout et al. (2013a).
17.5.2.4 Overview of Our Complete Solution for Vehicle Localization
To summarize, the proposed solution is a VSLAM approach with a constrained BA that includes buildings, DEM, and GPS constraints. The resulting cost function of the constrained BA is given by
I(x) = w·ε_repr(x) + h_GPS(x) + h_DEM(x) + h_bdg(x)
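Assembling the complete objective is then straightforward. In this hedged sketch, the four terms are passed in as callables, stand-ins for the reprojection error and the three constraint terms described in the text:

```python
def constrained_ba_cost(eps_repr, h_gps, h_dem, h_bdg, x, w=1.0):
    """Complete constrained-BA objective: weighted reprojection error plus
    the GPS, DEM, and buildings constraint terms, evaluated at parameters x."""
    return w * eps_repr(x) + h_gps(x) + h_dem(x) + h_bdg(x)
```

An optimizer (e.g., Levenberg-Marquardt over the camera poses and 3D points) would minimize this scalar at each keyframe.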
To overcome the convergence problem, GPS data are coarsely corrected beforehand
through an additional module that acts as a differential GPS where geo-referenced
antennas are replaced by a buildings model. Figure 17.6 presents an overview of the
proposed solution.
We performed many experiments to evaluate our vehicle geolocalization solution. For our experiments, sequences of several kilometers of roadway were acquired in France. To obtain these sequences, the vehicle was equipped with a standard 1 Hz GPS receiver and an RGB camera providing 30 frames per second. The GIS models used are provided by the French National Geographic Institute with an uncertainty that does not exceed 2 m. We present results for one of these sequences in Figure 17.7.
FIGURE 17.7 (a) The raw GPS data and (b) the GPS data coarsely corrected with the approach
described in Section 17.5.2.2.2. (c) Results obtained by fusing VSLAM and the raw GPS data
and (d) results obtained with the proposed solution (the triangles are the camera position for
each keyframe and the 3D point cloud is the sparse reconstruction of the observed environment).
The localization accuracy is clearly improved with the proposed solution. This is confirmed by the good alignment between the resulting 3D point cloud and the buildings model, whereas with the fusion of VSLAM and the raw GPS data, gaps can be observed between the resulting 3D points and the buildings model. Using a complete GIS model (buildings and DEM) to improve the fusion of VSLAM and GPS data yields an accurate 6-DoF localization that fulfills the quality criteria required by aided navigation applications, as illustrated in Figure 17.2.
17.5.3 Discussion
Using GIS models can improve the localization accuracy resulting from the fusion of VSLAM with GPS data. This fulfills both the quality and ease-of-deployment requirements of navigation-aided applications, since GIS is now widely available and some GIS databases are free (e.g., OpenStreetMap 3D). Furthermore, since our approach relies on a VSLAM approach, it provides a continuity of service that solutions based on visual feature databases cannot guarantee. On the other hand, the localization quality provided by our solution is still less accurate than that of approaches based on a database. However, the accuracy provided by our solution remains sufficient for most aided navigation applications.
17.6 CONCLUSION
Applications of AR in the automotive industry are numerous. Many AR techniques for the automobile industry have been studied, but few have actually been deployed. Even if recent advances in tracking technologies, such as constrained VSLAM, allow engineers to remove the main technological barriers, some challenges will still remain.
The first challenge relates to the ergonomics of the solution. For example, most of the applications require a hands-free device. While solutions are already available, such as spatial AR or semitransparent glasses, their ergonomics (e.g., width of the field of view, dynamic focus distance, luminosity, and contrast) should be improved to reach high end-user acceptance. But ergonomic issues are not limited to hardware. The displayed information should also be designed to facilitate the work of the end user without disturbing him or her or introducing potential dangers. For example, the iconography used to display information on a windshield must be designed to reduce the risk of hiding pedestrians or vehicles from the driver's view.
The second challenge concerns the integration of AR in the product life cycle management process. For example, the goal is to lower the cost of content creation for an AR application and to facilitate its updates; therefore, product data management (PDM) should also integrate the needs of AR applications. Further, the development of 3D documentation could benefit from the 3D models and animations that were created during the conception stage of the vehicle. Consequently, the development of new norms and standards concerning 3D models, animations, interactions, and documentation will probably be necessary. In summary, this chapter presented an overview of the use of AR in the automotive industry; we expect more integration of AR in the product life cycle of automobiles in the future.
ACKNOWLEDGMENTS
We would like to thank Diotasoft for supporting our research with applications in sales support in dealerships and in vehicle maintenance through AR. We also thank Valeo and Renault for their financial support, which allowed the development of AR-based aided navigation solutions.
We are grateful to our colleagues at the Vision and Content Engineering Laboratory and to Michel Dhome from the Institut Pascal for many useful comments and insights that helped us to develop and refine this work.
REFERENCES
ARTESAS Project. Advanced augmented reality technologies for industrial service and applications. http://www.wzl.rwth-aachen.de/en/aa272c5cc77694f6c12570fb00676ba1.htm. 2004-2006.
ARVIKA Project. Augmented reality for development, production and servicing. http://cg.cis.upenn.edu/hms/research/AVIS/papers/flyer_e.pdf. 1999-2003.
Barnum, P. C., Y. Sheikh, A. Datta, and T. Kanade. Dynamic seethroughs: Synthesizing hidden views of moving objects. International Symposium on Mixed and Augmented Reality. Orlando, FL, 2009.
Bleser, G., H. Wuest, and D. Stricker. Online camera pose estimation in partially known and dynamic scenes. International Symposium on Mixed and Augmented Reality. Santa Barbara, CA, 2006.
Calvet, L., P. Gurdjos, and V. Charvillat. Camera tracking using concentric circle markers: Paradigms and algorithms. International Conference on Image Processing. Orlando, FL, 2012.
Cummins, M. and P. Newman. Appearance-only SLAM at large scale with FAB-MAP 2.0. The International Journal of Robotics Research. 30(9), 2011: 1100-1123.
Drummond, T. and R. Cipolla. Real-time visual tracking of complex structures. IEEE Transactions on Pattern Analysis and Machine Intelligence. 24, 2002: 932-946.
EFA 2014 Project. n.d. http://www.strategiekreis-elektromobilitaet.de/public/projekte/eva2014/other/research-project-energy-efficient-driving-2014.
Friedrich, W. ARVIKA: Augmented reality for development, production and service. International Symposium on Mixed and Augmented Reality. Darmstadt, Germany, 2002.
Gay-Bellile, V., S. Bourgeois, M. Tamaazousti, and S. Naudet-Collette. A mobile markerless augmented reality system for the automotive field. Workshop on Tracking Methods and Applications. Atlanta, GA, 2012.
Gomes, P., M. Ferreira, M. K.-S. Kruger-Silveria, and F. V. Vieira. Augmented reality driving supported by vehicular ad hoc networking. International Symposium on Mixed and Augmented Reality. Adelaide, SA, 2013.
Gordon, I. and D. G. Lowe. What and where: 3D object recognition with accurate pose. Toward Category-Level Object Recognition. 4170, 2006: 67-82.
Hartley, R. and A. Zisserman. Multiple View Geometry in Computer Vision, 2nd edn. Cambridge University Press, Cambridge, U.K., 2004.
Kahn, S. and A. Kuijper. Fusing real-time depth imaging with high precision pose estimation by a measurement arm. International Conference on Cyberworlds. Darmstadt, Germany, 2012, pp. 256-260.
Kichun, J., C. Keounyup, and S. Myoungho. GPS-bias correction for precise localization of autonomous vehicles. Intelligent Vehicles Symposium. Gold Coast, Australia, 2013.
Klein, G. and D. Murray. Parallel tracking and mapping for small AR workspaces. International Symposium on Mixed and Augmented Reality. Nara, Japan, 2007.
Larnaout, D., V. Gay-Bellile, S. Bourgeois, and M. Dhome. Vehicle 6-DoF localization based on SLAM constrained by GPS and digital elevation model information. International Conference on Image Processing. Melbourne, Victoria, Australia, 2013a, pp. 2504-2508.
Larnaout, D., V. Gay-Bellile, S. Bourgeois, and B. Labbe. Fast and automatic city-scale environment modeling for an accurate 6DOF vehicle localization. International Symposium on Mixed and Augmented Reality. Adelaide, South Australia, Australia, 2013b.
Lhuillier, M. Incremental fusion of structure-from-motion and GPS using constrained bundle adjustment. IEEE Transactions on Pattern Analysis and Machine Intelligence. 34, 2012: 2489-2495.
Lothe, P., S. Bourgeois, F. Dekeyser, and M. Dhome. Towards geographical referencing of monocular SLAM reconstruction using 3D city models: Application to real-time accurate vision-based localization. IEEE Conference on Computer Vision and Pattern Recognition. Miami, FL, 2009, pp. 2882-2889.
Menk, C., E. Jundt, and R. Koch. Evaluation of geometric registration methods for using spatial augmented reality in the automotive industry. Vision, Modeling, and Visualization Workshop. Siegen, Germany, 2010, pp. 243-250.
Mouragnon, E., M. Lhuillier, M. Dhome, F. Dekeyser, and P. Sayd. 3D reconstruction of complex structures with bundle adjustment: An incremental approach. International Conference on Robotics and Automation. Orlando, FL, 2006.
Park, H.-S., H.-W. Choi, and J.-W. Park. Augmented reality based cockpit module assembly system. International Conference on Smart Manufacturing Application. Gyeonggi-do, South Korea, 2008.
Pentenrieder, K., C. Bade, F. Doil, and P. Meier. Augmented reality-based factory planning: An application tailored to industrial needs. International Symposium on Mixed and Augmented Reality. Nara, Japan, 2007.
Pintaric, T. and H. Kaufmann. Affordable infrared-optical pose tracking for virtual and augmented reality. Workshop on Trends and Issues in Tracking for Virtual Environments. Charlotte, NC, 2007.
Porter, S. R., M. R. Marner, R. T. Smith, J. E. Zucco, and B. H. Thomas. Validating spatial augmented reality for interactive rapid prototyping. IEEE International Symposium on Mixed and Augmented Reality. Seoul, South Korea, 2010, pp. 265-266.
Regenbrecht, H., G. Baratoff, and W. Wilke. Augmented reality projects in the automotive and aerospace industries. Computer Graphics and Applications. 25(6), 2005: 48-56.
Reif, R. and W. A. Günthner. Pick-by-vision: Augmented reality supported order picking. The Visual Computer. 25(5-7), 2009: 461-467.
Reiners, D., D. Stricker, G. Klinker, and S. Müller. Augmented reality for construction tasks: Doorlock assembly. International Workshop on Augmented Reality. Natick, MA, 1998.
Reitmayr, G., E. Eade, and T. Drummond. Semi-automatic annotations in unknown environments. International Symposium on Mixed and Augmented Reality. Nara, Japan, 2007, pp. 67-70.
Rothganger, F., S. Lazebnik, C. Schmid, and J. Ponce. 3D object modeling and recognition using affine-invariant patches and multi-view spatial constraints. Conference on Computer Vision and Pattern Recognition. Madison, WI, 2003, pp. 272-280.
Schleicher, D., L. Bergasa, M. Ocana, R. Barea, and E. Lopez. Real-time hierarchical GPS aided visual SLAM on urban environments. International Conference on Robotics and Automation. Kobe, Japan, 2009, pp. 4381-4386.
Tamaazousti, M., V. Gay-Bellile, S. Naudet-Collette, S. Bourgeois, and M. Dhome. Nonlinear refinement of structure from motion reconstruction by taking advantage of a partial knowledge of the environment. Computer Vision and Pattern Recognition. Providence, RI, 2011, pp. 3073-3080.
Wuest, H., F. Wientapper, and D. Stricker. Adaptable model-based tracking using analysis-by-synthesis techniques. In Computer Analysis of Images and Patterns. W. G. Kropatsch, M. Kampel, and A. Hanbury (eds.), Springer, Berlin, Germany, 2007, pp. 20-27.
Zhou, J., I. Lee, B. Thomas, R. Menassa, A. Farrant, and A. Sansome. In-situ support for automotive manufacturing using spatial augmented reality. International Journal of Virtual Reality. 11(1), 2012: 33-41.
18 Visual Consistency in Augmented Reality Compositing
Jan Fischer
CONTENTS
18.1 Introduction................................................................................................... 458
18.1.1 Image Compositing in Video See-Through Augmented Reality....... 458
18.1.2 Image Inconsistencies: Camera Imperfections and Artifacts............ 458
18.1.3 Specific Challenges of Real-Virtual Compositing............................ 459
18.1.4 Summary of Effects Causing Visual Discrepancies.......................... 461
18.2 Creating Visual Consistency: Two Complementary Approaches to
Unified Visual Realism in AR....................................................................... 461
18.2.1 Emulation of Camera Realism........................................................... 462
18.2.2 Artistic or Illustrative Stylization...................................................... 463
18.2.3 Summary of the Two Complementary Strategies.............................. 463
18.3 Related Work and Specific Challenges in AR...............................................464
18.4 Emulating Photographic Imperfections in AR..............................................465
18.4.1 Camera Image Noise.........................................................................466
18.4.2 Motion Blur........................................................................................468
18.4.3 Defocus Blur (Depth of Field)...........................................................468
18.5 Specific Challenges of Real-Virtual Compositing........................................469
18.5.1 Aliasing at the Border of Virtual Objects..........................................469
18.5.1.1 Pixel Averaging in the Original Resolution........................ 470
18.5.1.2 Real-Virtual Antialiasing Using Supersampling............... 470
18.5.2 Occlusion Handling........................................................................... 473
18.5.2.1 Occlusion Handling Using a Time-of-Flight Range
Sensor.................................................................................. 473
18.6 Stylized AR................................................................................................... 474
18.6.1 Real-Time Cartoon-Like Stylization on the GPU............................. 475
18.6.2 Other Stylization Approaches............................................................ 476
18.6.3 Applications of Stylized AR.............................................................. 476
18.6.4 Psychophysical Evaluation of Stylized AR........................................ 478
18.7 Conclusion..................................................................................................... 478
References............................................................................................................... 479
18.1 INTRODUCTION
18.1.1 Image Compositing in Video See-Through Augmented Reality
In augmented reality (AR), virtual graphical elements generated by a computer are
overlaid over a view of the real world. The types of additional graphical elements are
as varied as the spectrum of AR applications. In a pedestrian or bicycle navigation
AR application, the overlay may consist of abstract 3D arrows and some text; in a
medical AR application, organ structures from a CT scan may be shown; and in an
AR game, an entire parallel game universe may coexist within the real surroundings
of the player.
In the case of video see-through AR, which is the focus of this chapter, this visual integration of virtual objects into the user's environment is accomplished with the help of a digital video camera (Azuma et al., 2001). This digital video camera may, for instance, be a camera built into a smart phone, tablet, or laptop, or it may be contained in a wearable device, such as a head-mounted display. This camera generally points in a direction roughly corresponding to the user's viewing direction, and it continually captures images of the real scene that is in front of the user. The captured video stream is processed by a computer, which uses each captured video frame as a background image over which it renders the additional graphical elements. Finally, the combination of the captured camera image with the superimposed virtual objects is shown to the user on a screen, which is often built into the same device that also contains the video camera. This process is continually repeated at real-time frame rates, generating the impression that the user can "see through" the screen to observe the augmented environment.
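The loop just described can be summarized in a few lines. The sketch below is schematic: the `camera`, `tracker`, `renderer`, and `display` objects and their methods are hypothetical placeholders for whatever capture, pose estimation, rendering, and display components a concrete AR system provides.

```python
def video_see_through_step(camera, tracker, renderer, display):
    """One iteration of the video see-through AR pipeline: capture a frame,
    estimate the camera pose from it, render the virtual overlay from that
    viewpoint, composite, and display the result."""
    frame = camera.capture()             # background image of the real scene
    pose = tracker.estimate_pose(frame)  # camera pose for correct registration
    overlay = renderer.render(pose)      # virtual objects seen from that pose
    composite = renderer.composite(frame, overlay)
    display.show(composite)              # the user "sees through" the screen
    return composite
```

Running this step continually at real-time frame rates produces the see-through impression described above.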
Because AR applications often run on small and mobile devices such as smart phones, tablets, and wearable displays, the camera hardware used is typically small and usually also rather inexpensive. A limited image quality generally has to be accepted because the components of these small cameras are restricted in comparison with larger and more expensive video equipment. Klein and Murray (2010) provide an exhaustive overview of the artifacts introduced in the image acquisition pipeline of a small, low-cost digital video camera.
The computer-generated virtual objects in an AR scene, on the other hand, are artificially generated. They are normally synthesized using standard real-time rendering algorithms that are common in computer graphics. The fast reaction times, and therefore short latencies, required in an interactive AR system mean that only relatively simple rendering approaches can be used, particularly in view of the still rather limited performance of many mobile devices. The models shown as virtual objects are often rather plain and may, in some applications, even be generated procedurally (i.e., during runtime by an algorithm). Illumination parameters such as the number and position of light sources typically do not match the conditions in the real environment, which are, moreover, normally variable. Of course, the computer-rendered image portions also lack the imperfections contained in the camera image, such as image noise and motion blur. Instead, the virtual objects are drawn perfectly within the capabilities of their rendering algorithms. An introduction to the type of real-time rendering techniques often utilized in AR is given by Munshi et al. (2008).
As a consequence, the augmented environment presented to the user actually combines image portions with two entirely different kinds of visual realism: on the one hand, the visually rich real scene with its complex illumination seen through an imperfect and distorting imaging pipeline; on the other hand, the synthetically perfect virtual elements generated from simple models and artificial, static lighting parameters. Ultimately, rather than displaying an integrated augmented environment to the user, the output of an AR system often reveals quite clearly what it really is: simple computer graphics pasted over a camera image of moderate quality (see Figure 18.1). It is thus not surprising that a study has shown that people can tell very reliably whether a particular object is real or virtual in conventional AR (see Section 18.6.4).
FIGURE 18.1 Comparison of the visual realism of real and computer-generated image elements in conventional augmented reality: Which one is real?
This results in an effect that is even more noticeable when the respective objects are in motion along the depth axis. A particularly common instance of this problem concerns the hands of the user. They are by nature close to the user's viewpoint, and they are often in view, particularly in applications requiring user interaction. In the naïve AR approach, virtual objects always occlude the user's hands, although they are often supposedly farther away.
It is worth noting that within each of the real and the virtual realms, mutual occlusions are depicted correctly: in the real image portions, objects naturally occlude each other, while the mutual occlusion of virtual objects is handled by standard computer graphics algorithms such as the z-buffer (Catmull, 1974; Straßer, 1974). Only when real and virtual objects are visually integrated in the same image does the lack of depth information in the camera image become apparent. This occlusion problem is a long-standing challenge of AR, of which Wloka and Anderson (1995) and Fuhrmann et al. (1999) provided early discussions.
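If a dense depth map of the real scene were available (e.g., from a depth sensor, as discussed later in Section 18.5.2), the same per-pixel depth test could resolve real-virtual occlusions. The following sketch illustrates the idea on hypothetical per-pixel nested lists: a virtual pixel is kept only where its depth is smaller than the measured depth of the real surface.

```python
def composite_with_depth(cam_img, cam_depth, virt_img, virt_depth):
    """Occlusion-aware compositing: draw a virtual pixel only where it is
    closer to the camera than the real surface at that pixel. virt_depth
    holds None where no virtual object was rendered."""
    h, w = len(cam_img), len(cam_img[0])
    out = [[cam_img[y][x] for x in range(w)] for y in range(h)]
    for y in range(h):
        for x in range(w):
            vd = virt_depth[y][x]
            if vd is not None and vd < cam_depth[y][x]:
                out[y][x] = virt_img[y][x]  # virtual object occludes the real
    return out
```

Without `cam_depth`, the comparison cannot be made, which is precisely why the naïve approach always draws virtual objects on top.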
For the same reason, global illumination effects caused by the interaction between real and virtual elements in the augmented environment generally cannot be rendered correctly. In particular, it is normally impossible to represent reflections and shadowing properly, since a simulation of these effects also requires knowledge about the 3D structure of the observed real scene. For example, a virtual sphere hovering over a real desk might be expected to cast a shadow on the surface of the desk, depending on the position of light sources in the scene. However, correctly rendering the shadow requires knowledge about the exact spatial position and orientation of the desk, which cannot be easily extracted from a 2D camera image. Similarly, the reflections in a virtual mirror (or other reflective object) positioned in the augmented scene can only be displayed properly if the spatial structure of the real environment is known. This complex problem of rendering reflections and shadows in AR was, for instance, discussed by Agusanto et al. (2003).
TABLE 18.1
Reasons for Visual Discrepancies between Real and Virtual Image Portions in AR
Photographic imperfections:
- Camera noise* (and other sensor artifacts)
- Motion blur*
- Defocus blur (depth of field)*
- Lens distortion
By simulating the camera's photographic imperfections when rendering the virtual objects (Fischer et al., 2006a; Klein and Murray, 2010), the visual realism of the camera image can be better emulated, even if this means that the output image is made worse in a certain sense.
Methods for the emulation of camera realism in AR will be discussed in Section 18.4.
[Figure 18.2 diagram: starting from conventional AR (low visual consistency), two complementary paths lead toward high visual consistency: simulating photographic imperfections yields camera realism, while applying stylization filters to the entire image yields stylized AR in an artistic/illustrative style.]
FIGURE 18.2 Complementary approaches for dealing with visual inconsistencies in video
see-through augmented reality.
Likewise, the artistic stylization of video sequences has been an area of ongoing research and development for many years, both in academia (e.g., see Hertzmann and Perlin, 2000; Kyprianidis et al., 2013) and in the video-processing software industry (Boris, 2014).
These related developments from the fields of general image and video processing
can provide an important technological basis for achieving corresponding effects in
video see-through AR. However, it must be noted that a set of unique constraints has
to be fulfilled when addressing these problems in AR:
- Real-time: Any video-processing filter and specialized rendering method applied in AR must be able to deliver real-time frame rates to ensure interactivity.
- Competition for system resources: Typical AR systems must additionally perform a number of computationally expensive tasks, such as image-based pose estimation or sensor fusion, further reducing the processing resources available to any video postprocessing step.
- Limited hardware capabilities: AR systems are often based on small and portable hardware platforms such as smart phones, tablets, or wearable devices, which usually offer significantly less computational power than a graphical workstation, a desktop computer, or even a laptop.
- Fully automatic processing: Unlike in many other application areas, the video-processing steps applied to an AR video stream must generally be fully automatic. The user cannot make manual adjustments to filter parameters or interactively select different filters depending on circumstances while the AR system is in operation.
This unique set of constraints means that some image and video-processing methods
developed for other applications may not be suitable for AR at all, while others have
to be modified in order to be useful in this specific context.
FIGURE 18.3 Example of image noise emulation for a virtual object. (a) Virtual butterfly object (shown without background camera image). (b) Example of random noise texture. (c) Butterfly wing detail without image noise emulation. (d) Same image detail with overlaid noise emulation (contrast enhanced for better visibility).
The noise textures are used in an adapted AR rendering pipeline. When rendering virtual objects over the background camera image, the color values of their
pixels are modified based on the noise textures. Between frames, the noise textures
are translated and rotated randomly so that dynamically varying noise is emulated.
In order to better match the characteristics of image noise found in the camera
image, the scaling of the noise textures, the magnitude of the noise modulation,
and the speed of the random noise animation are adjustable with user-definable
parameters.
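A minimal sketch of this modulation step is shown below in plain Python. The nearest-neighbour texture lookup, the per-frame offset and rotation parameters, and the magnitude scaling are illustrative assumptions rather than the exact shader used in the adapted rendering pipeline.

```python
import math

def sample_noise(tex, u, v):
    """Nearest-neighbour lookup in a square noise texture (list of rows),
    with wrap-around so the texture tiles the whole image."""
    n = len(tex)
    return tex[int(v) % n][int(u) % n]

def noisy_pixel(pixel, tex, x, y, offset, angle, magnitude):
    """Modulate a rendered virtual pixel with the noise texture. Varying
    `offset` and `angle` randomly between frames animates the noise."""
    c, s = math.cos(angle), math.sin(angle)
    u = c * x - s * y + offset[0]
    v = s * x + c * y + offset[1]
    n = sample_noise(tex, u, v)  # noise value, e.g., in [-1, 1]
    return tuple(max(0, min(255, int(ch + magnitude * n))) for ch in pixel)
```

Applying this function only to pixels covered by virtual objects perturbs them with the same kind of dynamic noise that the camera sensor adds to the real image portions.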
Figure 18.3 shows an example of camera image noise emulation applied to a virtual object. The effect is more apparent in a video than in a still image, since the emulated noise, like the camera noise itself, is dynamic and therefore continually changes between frames.
A significantly more advanced emulation of a large number of photographic imperfections caused by the specific properties of the image acquisition pipeline of small cameras was developed by Klein and Murray (2010). Their system models and simulates distortions, chromatic aberrations, blur, Bayer masking, noise, and sharpening.
18.4.2 Motion Blur
A particularly apparent type of blur in the camera image occurs when moving objects
are observed. Motion blur results from the temporal integration of light intensity in the
image sensor. If there is fast movement in the observed scene, a blurred camera image
is captured. This is in particular also true when the camera itself is moved. An adapted display technique for virtual objects in AR that mimics motion blur was also presented in Fischer et al. (2006a). In order to simulate the effects of motion blur in the camera image, the magnitude and direction of the blurring have to be known. The motion blur vector can be approximated using the camera pose information, which is already determined in the AR system for the correct placement of the virtual objects.
This motion blur rendering technique for AR computes the geometric center of
each virtual model in a preprocessing step. This is achieved by finding the bounding
box of the model and calculating its center. During the runtime of the AR system,
the blur vector is estimated in every frame. This approximated blur vector is defined
as the 2D motion of the center of the virtual object in image space. It is computed by
projecting the object center into image space according to the current camera pose.
The difference between the projected object centers in two consecutive frames is
used as the blur vector.
Motion blur rendering is only activated if the length of the blur vector is greater
than a predefined threshold (typically between 5 and 10 pixels). Moreover, excessively long motion blur vectors resulting from an erratic pose estimation are ignored.
If motion blur rendering is to be applied in a time step, an adapted display method is
used. The simulated motion blur effect is created by repeatedly blending the virtual
object image over the camera image at different positions. These positions are generated along the computed blur vector, which is centered at the current projection of
the object center. Alpha blending is used during the rendering of the individual copies of the virtual object image. This way, the virtual object appears blurred from its
current position along the blur vector. Figure 18.4 shows an example of motion blur
simulation in AR using this technique.
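The blur vector estimation and the placement of the blended copies can be sketched as follows, in pure Python with a simple pinhole projection. The pose representation, copy count, and threshold value are illustrative assumptions in the spirit of the description above.

```python
def project(point, pose, focal):
    """Project a 3D point into image space with a simple pinhole model.
    `pose` is (R, t) with R a 3x3 rotation (list of rows) and t a 3-vector."""
    R, t = pose
    cam = [sum(R[r][c] * point[c] for c in range(3)) + t[r] for r in range(3)]
    return (focal * cam[0] / cam[2], focal * cam[1] / cam[2])

def blur_positions(center, prev_pose, cur_pose, focal,
                   n_copies=8, threshold=5.0):
    """Approximate the motion blur vector as the image-space displacement of
    the object's bounding-box center between two consecutive poses, and
    return the positions at which alpha-blended copies would be drawn."""
    p0 = project(center, prev_pose, focal)
    p1 = project(center, cur_pose, focal)
    vec = (p1[0] - p0[0], p1[1] - p0[1])
    length = (vec[0] ** 2 + vec[1] ** 2) ** 0.5
    if length <= threshold:          # too little motion: no blur rendering
        return [p1]
    # positions along the blur vector, centered on the current projection;
    # each copy would be alpha-blended over the camera image at its position
    return [(p1[0] + vec[0] * (k / (n_copies - 1) - 0.5),
             p1[1] + vec[1] * (k / (n_copies - 1) - 0.5))
            for k in range(n_copies)]
```

A renderer would then draw one semitransparent copy of the virtual object image at each returned position, producing the smeared appearance along the blur direction.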
Motion blur emulation methods based on an external motion estimate are also employed in Klein and Murray (2010) and Aittala (2010). A different, and potentially more accurate, approach is to estimate the size of the motion blur in the camera image using computer vision methods. Such analytical motion blur simulations were proposed by Okumura et al. (2006) and Park et al. (2009). In both of these methods, vision-based motion blur estimation is made possible by integration into the actual vision-based camera tracking algorithm.
FIGURE 18.4 Motion blur simulation in augmented reality. (a) An example of strong
motion blur caused by fast camera rotation in a camera image without virtual objects. Virtual
objects rendered with simulated motion blur in an AR scene: (b) virtual hamburger and (c)
butterfly model.
camera become blurred (i.e., they are out of focus). The range of depths in which captured objects appear acceptably sharp is called depth of field. The resulting defocus
blurring of objects in the camera image is another visual discrepancy in AR, since
virtual objects always appear to be perfect in focus.
The problem of simulating defocus blur in AR has been addressed by several
researchers. The method of Okumura et al. (2006) handles defocus blur together
with motion blur in the context of their marker tracking algorithm. More recently,
Kán and Kaufmann (2012) and Xueting and Ogawa (2013) have proposed specific
AR rendering methods capable of simulating defocus blur.
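As an illustration of the underlying optics, the amount of defocus blur a real camera produces can be approximated with the thin-lens circle-of-confusion model. The sketch below uses hypothetical names and a plain separable Gaussian as a stand-in for a GPU blur shader; it is not taken from any of the cited systems:

```python
import numpy as np

def coc_radius(depth, focus_dist, focal_len, aperture, px_per_mm):
    """Thin-lens circle-of-confusion radius in pixels for a point at
    `depth`. All distances in mm; parameter names are illustrative."""
    c = aperture * focal_len * abs(depth - focus_dist) \
        / (depth * (focus_dist - focal_len))
    return 0.5 * c * px_per_mm

def gaussian_blur1d(img, sigma):
    """Separable Gaussian blur of a single-channel image, a simple
    stand-in for the defocus filter a GPU shader would apply."""
    if sigma <= 0:
        return img
    r = int(3 * sigma)
    x = np.arange(-r, r + 1)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    k /= k.sum()
    out = np.apply_along_axis(lambda m: np.convolve(m, k, mode='same'), 0, img)
    return np.apply_along_axis(lambda m: np.convolve(m, k, mode='same'), 1, out)
```

A depth-of-field renderer would evaluate `coc_radius` per virtual fragment and blur it accordingly, so that virtual objects outside the camera's focal range no longer appear perfectly sharp.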
FIGURE 18.5 Antialiasing at virtual object boundaries by pixel averaging in the original resolution. The edges of a virtual butterfly object (a) are shown without (b) and with antialiasing (c).
An adaptive edge-directed image upscaling algorithm generates the higher resolution version of the camera image. This advanced upscaling method, which is
executed on the graphics processing unit (GPU), is based on principles described by
Kraus et al. (2007). This sophisticated upsampling scheme generates an enlarged
image that approximates more closely what a higher resolution version of the input
image would likely look like, compared to simple upscaling approaches such as
nearest neighbor or other straightforward interpolation schemes. An example of
a photograph upsampled with the edge-directed scheme is shown in Figure 18.6.
The underlying assumption is that by using a more realistic upsampling scheme
for the camera image, a better result for the real-virtual supersampling can be
achieved.
In the final image composition step, camera image and virtual objects are rendered
at the original, smaller resolution. However, for pixels at the boundary between real
and virtual image regions, the combined higher resolution image is accessed, and the
pixel colors at the corresponding location are averaged.
The averaged high-resolution pixels are used as boundary pixels for the combined
image, which leads to an antialiased output. Like the pixel averaging in the original
resolution described earlier, this more advanced real-virtual antialiasing was implemented using image processing shaders executed on the GPU, resulting in real-time frame rates even for large image resolutions and upsampling factors.
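A CPU sketch of this boundary treatment follows; nearest-neighbor upscaling stands in for the edge-directed method of Kraus et al., and all names and the 0.5 coverage cutoff are illustrative:

```python
import numpy as np

def composite_supersampled(cam_lo, virt_hi, mask_hi, scale):
    """Composite camera image and virtual objects with real-virtual
    supersampling: boundary pixels are averaged from a combined
    higher-resolution image.

    cam_lo: HxWx3 camera image; virt_hi/mask_hi: virtual rendering and
    its binary coverage at `scale` times the camera resolution.
    """
    H, W, _ = cam_lo.shape
    # upscale the camera image (stand-in for edge-directed upscaling)
    cam_hi = np.repeat(np.repeat(cam_lo, scale, axis=0), scale, axis=1)
    comb_hi = np.where(mask_hi[..., None] > 0, virt_hi, cam_hi)
    # fraction of high-res subpixels covered by virtual objects
    cov = mask_hi.reshape(H, scale, W, scale).mean(axis=(1, 3))
    # naive low-resolution composite (no antialiasing)
    virt_lo = virt_hi.reshape(H, scale, W, scale, 3).mean(axis=(1, 3))
    out = np.where((cov > 0.5)[..., None], virt_lo, cam_lo)
    # boundary pixels: use the averaged high-resolution block instead
    boundary = (cov > 0) & (cov < 1)
    block_avg = comb_hi.reshape(H, scale, W, scale, 3).mean(axis=(1, 3))
    out[boundary] = block_avg[boundary]
    return out
```

Only pixels straddling the real-virtual boundary touch the high-resolution image, which is what keeps the method cheap enough for real-time use.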
A comparison of antialiasing methods for integrating virtual elements into a
camera image is depicted in Figure 18.7. In this figure, results obtained without
antialiasing (see Figure 18.7a) are shown next to pixel averaging in the original
resolution as described in the preceding section (see Figure 18.7b) and real-virtual
supersampling (see Figure 18.7c). The figure demonstrates that by using supersampling, even small-scale structures can be integrated into the camera image with significantly fewer artifacts and an aesthetically superior result compared to the other
approaches.
FIGURE 18.7 Comparison of different methods for integrating a virtual element into a
camera image. A virtual stop sign object is drawn over a real street scene. (a) No antialiasing,
(b) pixel averaging in the original resolution, and (c) real-virtual supersampling (4×).
18.5.2 Occlusion Handling
One of the most disturbing problems in conventional video see-through AR is the
incorrect representation of occlusions. The fact that the spatial relationships between
real and virtual objects in an augmented environment are not correctly rendered is
not only aesthetically displeasing; the resulting disorientation can make it difficult or
even impossible for the user to perform useful interactive tasks.
Consequently, the high relevance of the occlusion problem has inspired the development of many different approaches for solving it since the first AR systems were
introduced. Some of these occlusion-handling methods rely on the reconstruction
of scene depths using a pair of stereo cameras (Wloka and Anderson, 1995) or on
predefined models of physical occluders in the real environment, so-called phantom
models (Fuhrmann et al., 1999; Fischer et al., 2004). Other approaches use computer
vision for establishing a rudimentary model of depth relationships in the real environment (Berger, 1997), possibly with the help of predefined surfaces in the environment in front of which user interaction is supposed to take place (dynamic occlusion
backgrounds, see Fischer et al., 2003).
18.5.2.1 Occlusion Handling Using a Time-of-Flight Range Sensor
In Fischer et al. (2007), a specialized, recently developed type of hardware was utilized for the first time for addressing the occlusion problem in AR. A time-of-flight
range sensor was introduced into the AR image generation pipeline. This sensor
produces a 2D map of the distances to real objects in the environment. The distance
map is registered with high resolution color images delivered by a digital video camera. When displaying the virtual models in AR, the distance map is used in order
to decide whether the camera image or the virtual object is visible at any given
position. This way, the occlusion of virtual models by real objects can be correctly
represented.
A 3D time-of-flight camera from manufacturer pmdtechnologies (pmdtechnologies, 2014) was chosen, namely the PMD [vision] 19k. The camera uses a 160 × 120
pixel photonic mixer device (PMD) sensor array that acquires distance data using
the time-of-flight principle with active illumination. An array of LEDs sends out
invisible modulated near-infrared light. For each pixel, the PMD sensor delivers distance and intensity information simultaneously, where the distance data is computed
from the phase shift of the reflected light directly inside the camera. Since both the
intensity and the depth values are captured through the same optical system, they
are perfectly aligned.
The camera works at a frame-rate of up to 20 fps and is therefore well suited
for real-time AR applications. However, the image data of the PMD camera itself
is not adequate for AR applications due to its low resolution and the limitation
to grayscale images. In order to enhance the visual quality, the time-of-flight
camera was combined with a standard high-resolution camera, namely a Matrix
Vision BlueFox with a 1600 × 1200 pixel sensor (MATRIX VISION, 2014). The
resulting horizontal field of view of about 34° is similar to that of the PMD camera. This ensures an easy calibration of both cameras and only a small loss of
information.
The PMD depth image and the high resolution color image were registered with
standard methods. Using the stereo system calibration method of the Camera Calibration Toolbox for MATLAB (Bouguet, 2013), the extrinsic as well as the intrinsic
parameters of this system were calculated. By applying the resulting translation and
rotation to the 3D data calculated from the depth data, 3D coordinates in the reference frame of the high resolution camera were obtained.
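The coordinate transfer amounts to applying the calibrated rigid transform and then the pinhole projection of the color camera. A minimal sketch, with illustrative names and the column-vector convention p_color = R·p_depth + t:

```python
import numpy as np

def depth_to_color_frame(pts_depth, R, t):
    """Map 3D points (N x 3, in mm) from the time-of-flight camera
    frame into the high-resolution color camera frame using stereo
    extrinsics (R, t) estimated with a calibration toolbox."""
    return pts_depth @ R.T + t

def project_to_pixels(pts, fx, fy, cx, cy):
    """Pinhole projection of camera-frame points into pixel coordinates
    of the color image (fx, fy, cx, cy are the intrinsics)."""
    u = fx * pts[:, 0] / pts[:, 2] + cx
    v = fy * pts[:, 1] / pts[:, 2] + cy
    return np.stack([u, v], axis=1)
```

With both cameras calibrated this way, each time-of-flight depth sample can be looked up at the matching color pixel, which is the registration the occlusion test below relies on.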
The occlusion-handling scheme is based on the comparison of the absolute depths
of real and virtual objects. This is possible because the depth maps delivered by the
time-of-flight camera and the coordinate system of the AR system are both calibrated to operate in units corresponding to real millimeters.
Whenever a graphical primitive constituting a virtual model in the augmented
environment is rendered, a special shader program executed on the GPU is activated. The shader compares the depth of each pixel of the graphical primitive with
the depth information stored in the time-of-flight depth map. If the depth measured
by the time-of-flight sensor is smaller than the primitive depth at this location, the
display of the virtual object pixel is suppressed. This way, the occlusion of virtual
objects by real objects is correctly depicted, without requiring any prior knowledge
about the real occluders.
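The per-pixel depth test performed by the shader can be expressed as follows; this is a CPU sketch with hypothetical names, assuming both depth maps are registered and given in millimeters, as in the calibrated setup described above:

```python
import numpy as np

def occlusion_composite(cam_rgb, cam_depth_mm, virt_rgb, virt_depth_mm,
                        virt_mask):
    """Show a virtual pixel only where no real surface lies in front.

    cam_depth_mm: per-pixel distances from the time-of-flight sensor;
    virt_depth_mm: depth of the rendered virtual primitives;
    virt_mask: boolean coverage of the virtual objects.
    """
    # suppress virtual pixels whose real depth is smaller (real occludes)
    visible = virt_mask & (virt_depth_mm < cam_depth_mm)
    out = cam_rgb.copy()
    out[visible] = virt_rgb[visible]
    return out
```

No geometric model of the real occluders is needed; the measured depth map alone decides visibility, exactly as in the GPU shader described in the text.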
For logistical reasons, experiments were performed using an offline image generation pipeline. The depth maps and color images created by the process described
earlier were stored on disk and then imported into an adapted AR image generation
software. Figure 18.8 shows examples of AR images with occlusion handling based
on depth maps acquired with the time-of-flight camera, as well as the associated
combined depth maps themselves.
Recently, depth cameras in smaller form factors have become widespread in the
consumer market. These more practical depth sensors are also being exploited for
occlusion handling in AR, for instance the Kinect camera (Izadi et al., 2011) and the
Leap Motion controller (Regenbrecht et al., 2013).
18.6 STYLIZED AR
Stylized AR systems present to their users a view of the augmented environment
transformed to resemble an artistic or illustrative style. These stylizations inherently
mask many or even all of the effects causing visual inconsistencies between the real
and virtual elements of an AR scene. As with all advanced rendering approaches
for AR, the methods employed for the stylization must run fast enough to guarantee
real-time frame rates and must not require user interaction.
The stylized AR system described in Fischer et al. (2005c) relied on a combination of video filtering of the camera image with an adapted rendering method for
the virtual objects in order to achieve an integrated cartoon-like effect. More recent
methods generally implement stylization in AR as a pure video postprocessing filter: First, a normal AR image is generated by rendering the virtual objects over
the camera image in the conventional manner, and then the stylization is applied
to the entire combined image. This is made possible by the programmable GPUs
that are nowadays contained not only in desktop computers but also in many wearable devices. Video stylization filters can be implemented on modern GPUs with a
minimal impact on the frame rate and latency of the AR system, and without requiring computational resources of the central processing unit (CPU) of the device.
a photometric weighting of pixels is the basis for this computation. This nonlinear
filter is inspired by bilateral filtering (Tomasi and Manduchi 1998), which is a widespread method for creating uniformly colored regions in an image. The photometric
filter is applied to a shrunk version of the input image. This way, a better color simplification is achieved, and the required computation time can be reduced. Several
filtering iterations are consecutively applied to the image. The repetition of the filter
operation is necessary in order to achieve a sufficiently good color simplification.
The second stage of the nonphotorealistic filter is an edge detection step using a
Sobel filter. The simplified color image is the primary input for this operation. This
way, the generated silhouette lines are located between similarly colored regions in the
image, which is an approximation of a cartoon-like rendering style. To a lesser degree,
edges detected in the original AR frame are also taken into account when drawing the
silhouette lines. The higher resolution of the original image compared to the shrunk
color image can contribute some additional detail to the edge detection result. In the
typical configuration, most of the input for the edge detection step is taken from the
simplified color image. It consists of mostly uniformly colored regions generated by
the photometric filter. Therefore, edges detected in the simplified color image typically
correspond quite well to the outer boundaries of physical or virtual objects.
Finally, the simplified color image is combined with the edge detection results.
The color image is enlarged to the size of the original input image. The combined
responses of the edge detection filters are drawn over the enlarged image as black
lines. A specific weight function is used for computing a transparency for the detected
edge pixels, which produces a smooth blending over the color image.
The GPU-based implementation of the cartoon-like filter for stylized AR is fast
and delivers real-time frame rates. Figure 18.9 shows examples of stylized AR
images generated with the real-time cartoon-like filter.
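Putting the two stages together, a compact CPU approximation of such a cartoon-like postprocessing filter might look as follows. Parameter values are illustrative, and the `np.roll`-based neighborhoods wrap around at image borders, which a real implementation would handle explicitly:

```python
import numpy as np

def photometric_filter(img, sigma_r=0.1):
    """One bilateral-inspired iteration: average each pixel with its
    4-neighborhood, weighted by photometric (color) similarity."""
    acc = img.copy()
    wsum = np.ones(img.shape[:2])
    for dy, dx in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
        n = np.roll(img, (dy, dx), axis=(0, 1))
        w = np.exp(-((n - img) ** 2).sum(-1) / (2 * sigma_r ** 2))
        acc += w[..., None] * n
        wsum += w
    return acc / wsum[..., None]

def conv2(img, k):
    """Tiny 2D convolution via shifted adds (wrap-around borders)."""
    out = np.zeros_like(img)
    r = k.shape[0] // 2
    for i in range(k.shape[0]):
        for j in range(k.shape[1]):
            out += k[i, j] * np.roll(img, (r - i, r - j), axis=(0, 1))
    return out

def cartoon_filter(ar_frame, iterations=3, edge_thresh=1.0):
    """Color simplification on a shrunk image, Sobel edges, black lines."""
    small = ar_frame[::2, ::2]           # shrink for color simplification
    for _ in range(iterations):
        small = photometric_filter(small)
    big = np.repeat(np.repeat(small, 2, axis=0), 2, axis=1)
    big = big[:ar_frame.shape[0], :ar_frame.shape[1]]
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float)
    gray = big.mean(-1)
    edges = np.hypot(conv2(gray, kx), conv2(gray, kx.T))
    alpha = np.clip(edges / edge_thresh, 0.0, 1.0)[..., None]
    return (1 - alpha) * big             # blend black silhouette lines
```

Because every step is a local, data-parallel image operation, the same pipeline maps directly onto fragment shaders, which is how the real-time GPU implementation achieves its performance.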
18.6.3 Applications of Stylized AR
The fact that artistic stylization significantly alters not only the appearance of the
virtual objects but also the camera image means that stylized AR is not suitable for
every application area. However, sometimes a seamless visual integration between
real and virtual elements is more important than an unaltered view of the augmented
TABLE 18.2
Overview of Further Real-Time Artistic and Illustrative Stylization Approaches for Augmented Reality (columns: References, Artistic Style, Remarks)
Artistic styles covered: pointillism, sketchy, technical illustration, and watercolor.
Remarks include: use of a static Voronoi diagram (CPU implementation); outlines drawn using a particle system, with a background paper texture; black-and-white display with GPU-based hatching; temporal coherence through the use of a dynamic Voronoi diagram; temporally coherent stylization with focus control of abstraction; and advanced methods for temporal coherence.
18.7 CONCLUSION
This chapter has given an introduction into the problems that need to be solved when
a seamless visual integration of real and virtual objects in an augmented environment is desired. The significance of this challenge depends on the requirements of a
given application. In some cases, a unified visual appearance may only be a secondary matter. In such applications, fast processing, short latencies, a reduced implementation effort, and an unmodified view of the real scene may be more important
than the desire to address visual inconsistencies. The described strategies for emulating camera image imperfections, as well as the artistic filters employed in stylized
AR, require the implementation of sophisticated image processing algorithms. Even
in their most optimized form, these postprocessing steps still have some impact on
the frame rate and latencies of the AR system. Moreover, they alter the composited
AR image in its conventional form. In a certain sense, it could therefore be said that
these solutions for visual inconsistencies make the AR image "worse" or even alter
its visual appearance completely.
However, it must be noted that there is no such thing as a natural visual style for
augmented environments. Naïve implementations of AR generate visually incongruous images (see Figure 18.1). These incompatible levels of visual realism between
real and virtual elements should not be considered as the self-evident, logical style
of rendering in AR. Rather, the common approach to rendering in AR is a legacy of
the real-time techniques that were available de facto since the emergence of the field.
These were originally developed to display purely synthetic images on computers
with very limited performance.
In particular, those application areas in which presence (a sense of immersion)
in AR is often more critical (entertainment, gaming, interactive experiences, arts,
and culture) have become increasingly important in recent years. Such applications may benefit greatly from techniques that make real and virtual elements in the
environment less distinguishable, and that therefore deliver a more integrated real-virtual experience to the user.
This chapter has provided an overview of techniques that address some of the
underlying problems. Real-time algorithms have been developed for emulating camera image imperfections, for solving the specific challenges of real-virtual compositing, and for the artistic stylization of entire AR video streams. These developments
have been greatly aided by the rapid emergence of ever-faster graphics processing
units capable of performing programmable video postprocessing. Some researchers have even demonstrated integrated pipelines for emulating a large number of
the most relevant effects distinguishing the camera image from the virtual objects.
Future developments will certainly bring us even closer to achieving the aim of presenting to the user a truly seamless AR, whether in the style of a camera image or
in an artistic style.
REFERENCES
Agusanto, K., Li, L., Chuangui, Z., and Sing, N. (2003), Photorealistic rendering for augmented reality using environment illumination. In: Proceedings of the Second IEEE and ACM International Symposium on Mixed and Augmented Reality (ISMAR), Tokyo, Japan. IEEE, Washington, DC, pp. 208-216.
Aittala, M. (2010), Inverse lighting and photorealistic rendering for augmented reality. The Visual Computer, 26(6-8): 669-678.
Azuma, R., Baillot, Y., Behringer, R., Feiner, S., Julier, S., and MacIntyre, B. (2001), Recent advances in augmented reality. IEEE Computer Graphics and Applications, 21(6): 34-47.
Bajura, M. and Neumann, U. (1995), Dynamic registration correction in video-based augmented reality systems. IEEE Computer Graphics and Applications, 15(5): 52-60.
Fischer, J., Bartz, D., and Straßer, W. (2005c), Stylized augmented reality for improved immersion. In: Proceedings of IEEE Virtual Reality 2005, Bonn, Germany. IEEE, Washington, DC, pp. 195-202.
Fischer, J., Bartz, D., and Straßer, W. (2006a), Enhanced visual realism by incorporating camera image effects. In: Proceedings of the Fifth IEEE and ACM International Symposium on Mixed and Augmented Reality (ISMAR), Santa Barbara, CA. IEEE, Washington, DC, pp. 205-208.
Fischer, J., Bartz, D., and Straßer, W. (2006b), The augmented painting. In: ACM SIGGRAPH 2006 Emerging Technologies, Boston, MA. ACM, New York, p. 2.
Fischer, J., Cunningham, D., Bartz, D., Wallraven, C., Bülthoff, H., and Straßer, W. (2006c), Measuring the discernability of virtual objects in conventional and stylized augmented reality. In: Proceedings of the 12th Eurographics Conference on Virtual Environments (EGVE), Lisbon, Portugal. Eurographics Association, Geneva, Switzerland, pp. 53-61.
Fischer, J., Flohr, D., and Straßer, W. (2008a), Selective stylization for visually uniform tangible AR. In: Proceedings of the 14th Eurographics Symposium on Virtual Environments (EGVE), Eindhoven, Netherlands. Eurographics Association, Geneva, Switzerland, pp. 1-8.
Fischer, J., Haller, M., and Thomas, B. (2008b), Stylized depiction in mixed reality. International Journal of Virtual Reality (IJVR), 7(4): 71-79. IPI Press.
Fischer, J., Huhle, B., and Schilling, A. (2007), Using time-of-flight range data for occlusion handling in augmented reality. In: Proceedings of the 13th Eurographics Conference on Virtual Environments (EGVE), Weimar, Germany. Eurographics Association, Geneva, Switzerland, pp. 109-116.
Fischer, J., Regenbrecht, H., and Baratoff, G. (2003), Detecting dynamic occlusion in front of static backgrounds for AR scenes. In: Proceedings of the Ninth Eurographics Workshop on Virtual Environments (EGVE). Eurographics Association, Aire-la-Ville, Switzerland, pp. 153-162.
Fournier, A., Gunawan, A., and Romanzin, C. (1993), Common illumination between real and computer generated scenes. In: Proceedings of Graphics Interface '93, Toronto, Ontario, Canada. Canadian Information Processing Society, Mississauga, Canada, pp. 254-262.
Fuhrmann, A., Hesina, G., Faure, F., and Gervautz, M. (1999), Occlusion in collaborative augmented environments. Computers & Graphics, 23(6): 809-819.
Grosch, T., Eble, T., and Mueller, S. (2007), Consistent interactive augmentation of live camera images with correct near-field illumination. In: Proceedings of the 2007 ACM Symposium on Virtual Reality Software and Technology (VRST), Newport Beach, CA. ACM, New York, pp. 125-132.
Grubba Software. (2014), TrueGrain: Accurate black & white film simulation for digital photography. Accessed June 19, 2014. http://grubbasoftware.com/.
Haller, M., Landerl, F., and Billinghurst, M. (2005), A loose and sketchy approach in a mediated reality environment. In: Proceedings of the Third International Conference on Computer Graphics and Interactive Techniques in Australasia and South East Asia (GRAPHITE), Dunedin, New Zealand. ACM, New York, pp. 371-379.
Hertzmann, A. and Perlin, K. (2000), Painterly rendering for video and interaction. In: Proceedings of the First International Symposium on Non-photorealistic Animation and Rendering (NPAR), Annecy, France. ACM, New York, pp. 7-12.
Izadi, S., Kim, D., Hilliges, O., Molyneaux, D., Newcombe, R., Kohli, P., Shotton, J. et al. (2011), KinectFusion: Real-time 3D reconstruction and interaction using a moving depth camera. In: Proceedings of the 24th Annual ACM Symposium on User Interface Software and Technology (UIST), Santa Barbara, CA. ACM, New York, pp. 559-568.
Kalkofen, D., Mendez, E., and Schmalstieg, D. (2007), Interactive focus and context visualization for augmented reality. In: Proceedings of the Sixth IEEE and ACM International Symposium on Mixed and Augmented Reality (ISMAR), Nara, Japan. IEEE, Washington, DC, pp. 1-10.
Kán, P. and Kaufmann, H. (2012), Physically-based depth of field in augmented reality. Eurographics 2012 (Short Papers), Cagliari, Italy. Eurographics Association, Geneva, Switzerland, pp. 89-92.
Klein, G. and Murray, D. (2010), Simulating low-cost cameras for augmented reality compositing. IEEE Transactions on Visualization and Computer Graphics, 16(3): 369-380.
Knödel, S., Hachet, M., and Guitton, P. (2007), Enhancing the mapping between real and virtual world on mobile devices through efficient rendering. In: Proceedings of Mixed Reality User Interfaces 2007 (IEEE VR Workshop), Charlotte, NC. IEEE, Washington, DC.
Kraus, M., Eissele, M., and Strengert, M. (2007), GPU-based edge-directed image interpolation. In: Proceedings of the 15th Scandinavian Conference on Image Analysis, Aalborg, Denmark. Springer, Berlin, Germany, pp. 532-541.
Kyprianidis, J., Collomosse, J., Wang, T., and Isenberg, T. (2013), State of the Art: A taxonomy of artistic stylization techniques for images and video. IEEE Transactions on Visualization and Computer Graphics (TVCG), 19(5): 866-885.
Lang, T., MacIntyre, B., and Zugaza, I. (2008), Massively multiplayer online worlds as a platform for augmented reality experiences. In: Proceedings of IEEE Virtual Reality Conference (VR), Reno, NV. IEEE, Washington, DC, pp. 67-70.
MATRIX VISION. (2014), MATRIX VISION GmbH. Accessed June 22, 2014. http://www.matrix-vision.com/home-en.html.
Munshi, A., Ginsburg, D., and Shreiner, D. (2008), OpenGL ES 2.0 Programming Guide. Pearson Education, Boston, MA.
Okumura, B., Kanbara, M., and Yokoya, N. (2006), Augmented reality based on estimation of defocusing and motion blurring from captured images. In: Proceedings of the Fifth IEEE and ACM International Symposium on Mixed and Augmented Reality (ISMAR), Santa Barbara, CA. IEEE, Washington, DC, pp. 219-225.
Park, Y., Lepetit, V., and Woo, W. (2009), ESM-Blur: Handling & rendering blur in 3D tracking and augmentation. In: Proceedings of Eighth IEEE International Symposium on Mixed and Augmented Reality (ISMAR), Orlando, FL. IEEE, Washington, DC, pp. 163-166.
pmdtechnologies. (2014), pmd - can you imagine: The world of pmd. Accessed June 22, 2014. http://www.pmdtec.com/.
Regenbrecht, H., Collins, J., and Hoermann, S. (2013), A leap-supported, hybrid AR interface approach. In: Proceedings of the 25th Australian Computer-Human Interaction Conference: Augmentation, Application, Innovation, Collaboration, Adelaide, South Australia, Australia. ACM, New York, pp. 281-284.
Schilling, A. (1991), A new simple and efficient antialiasing with subpixel masks. Computer Graphics, 25(4): 133-141. (SIGGRAPH 1991 Proceedings), ACM.
Straßer, W. (1974), Schnelle Kurven- und Flächendarstellung auf graphischen Sichtgeräten. Dissertation, Technical University of Berlin, Berlin, Germany.
Sugano, N., Kato, H., and Tachibana, K. (2003), The effects of shadow representation of virtual objects in augmented reality. In: Proceedings of the Second IEEE and ACM International Symposium on Mixed and Augmented Reality (ISMAR), Tokyo, Japan. IEEE, Washington, DC, pp. 76-83.
Tomasi, C. and Manduchi, R. (1998), Bilateral filtering for gray and color images. In: Sixth International Conference on Computer Vision, Bombay, India. IEEE, Washington, DC, pp. 839-846.
Tuceryan, M., Greer, D., Whitaker, R., Breen, D., Crampton, C., Rose, E., and Ahlers, K. (1995), Calibration requirements and procedures for a monitor-based augmented reality system. IEEE Transactions on Visualization and Computer Graphics, 1(3): 255-273.
Wang, S., Cai, K., Lu, J., Liu, X., and Wu, E. (2010), Real-time coherent stylization for augmented reality. The Visual Computer, 26(6-8): 445-455.
Withagen, P., Groen, F., and Schutte, K. (2005), CCD characterization for a range of color cameras. In: Proceedings of Instrumentation and Measurement Technology Conference (IMTC 2005), Ottawa, Ontario, Canada. IEEE, Washington, DC, pp. 2232-2235.
Wloka, M. and Anderson, B. (1995), Resolving occlusion in augmented reality. In: Proceedings of the 1995 ACM Symposium on Interactive 3D Graphics, Monterey, CA. ACM, New York, pp. 5-12.
Xueting, L. and Ogawa, T. (2013), A depth cue method based on blurring effect in augmented reality. In: Proceedings of the Fourth Augmented Human International Conference, Stuttgart, Germany. ACM, New York, pp. 81-88.
Zöllner, M., Pagani, A., Pastarmov, Y., Wuest, H., and Stricker, D. (2008), Reality filtering: A visual time machine in augmented reality. In: Proceedings of the Ninth International Symposium on Virtual Reality, Archaeology and Cultural Heritage (VAST), Braga, Portugal. Eurographics Association, Geneva, Switzerland, pp. 71-77.
19
Applications of Augmented Reality in the Operating Room
Ziv Yaniv and Cristian A. Linte
CONTENTS
19.1 Introduction................................................................................................... 486
19.1.1 Minimally Invasive Interventions...................................................... 486
19.1.2 Augmented, Virtual, and Mixed Realities......................................... 486
19.1.3 Augmenting Visualization for Surgical Navigation.......................... 487
19.2 Image Guidance Infrastructure..................................................................... 488
19.2.1 Medical Imaging................................................................................ 488
19.2.2 Segmentation and Modeling.............................................................. 488
19.2.3 Instrument Localization and Tracking.............................................. 489
19.2.4 Registration and Data Fusion............................................................. 489
19.2.5 Data Visualization and Feedback Method......................................... 490
19.3 Guided Tour of AR in the Operating Room.................................................. 490
19.3.1 Understanding the OR Requirements................................................ 490
19.3.2 Commercially Available Systems...................................................... 490
19.3.2.1 Augmented X-Ray Guidance.............................................. 490
19.3.2.2 Augmented Ultrasound Guidance...................................... 491
19.3.2.3 Augmented Video and SPECT Guidance........................... 492
19.3.2.4 Augmented Endoscopic Video Guidance........................... 493
19.3.2.5 Augmented Tactile Feedback.............................................. 495
19.3.3 Laboratory Prototypes....................................................................... 495
19.3.3.1 Screen-Based Display....................................................... 496
19.3.3.2 Binocular Display............................................................. 501
19.3.3.3 Head-Mounted Display..................................................... 503
19.3.3.4 Semitransparent (Half-Silvered Mirror) Display..............504
19.3.3.5 Direct Patient Overlay Display......................................... 505
19.4 Limitations and Challenges........................................................................... 507
19.4.1 Optimal Information Dissemination and User Perception
Performance....................................................................................... 507
19.4.2 Accommodation for Tissue Motion and Organ Deformation............ 508
19.4.3 Compatibility with Clinical Workflow and Environment................. 508
19.4.4 Cost-Effectiveness............................................................................. 508
19.5 Future Perspectives........................................................................................509
References............................................................................................................... 510
19.1 INTRODUCTION
19.1.1 Minimally Invasive Interventions
Most surgical procedures and therapeutic interventions have traditionally been
performed by gaining direct access to the internal anatomy, using direct visual inspection to deliver therapy and treat the condition. In the meantime, a wide variety of
medical imaging modalities have been employed to diagnose the condition, plan the
procedure, and monitor the patient during the intervention; nevertheless, the tissue
manipulation and delivery of therapy have been performed through rather invasive incisions that permit direct visualization of the surgical site and ample access inside the
body cavity. Over the past several decades, significant efforts have been dedicated to
minimizing the invasiveness associated with surgical interventions, most of which have
been made possible by developments in medical imaging, surgical navigation,
visualization, and display technologies.
the computer models onto the real view of the physical parts (Caudell 1994), hence
facilitating the worker's task by truly augmenting the user's view with computer-simulated cues.
and generate models that provide interactive visualization in the region of interest.
A large number of approaches to image segmentation have been developed. These can
be divided into low-level and model-based approaches. The most common low-level
segmentation approach is thresholding, which is readily applicable for segmenting
bone structures in CT. Often, combinations of multiple low-level operations are used for
specific segmentation tasks, but these are not generalizable. Model-based approaches
have been more successful and generalizable. Among others, these include deformable
models, level-set-based segmentation, statistical shape and appearance models, and,
more recently, statistical methods for segmenting ensembles of organs or structures
using models of organ variation and the spatial relationships between structures.
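For CT, the thresholding mentioned above reduces to a simple intensity test in Hounsfield units; the window bounds below are illustrative values, not taken from the text:

```python
import numpy as np

def threshold_bone(ct_hu, low=300, high=3000):
    """Binary bone mask from a CT volume given in Hounsfield units (HU).
    Cortical bone is much denser (higher HU) than soft tissue, so a
    simple window [low, high] already isolates most bone voxels."""
    return (ct_hu >= low) & (ct_hu <= high)
```

This works for bone precisely because its intensity range barely overlaps soft tissue; for most other organs, the model-based approaches listed above are needed.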
FIGURE 19.1 Virtual fluoroscopy system interface. Projections of the tracked tool and its
virtual extension are dynamically overlaid onto the intraoperative x-ray images.
FIGURE 19.2 Needle guidance system from InnerOptic Technology. The system provides
guidance for insertion of an electromagnetically tracked ablation probe into a tumor using
a stereoscopic display. (a) Physical setup and endoscopic view: (A) US probe, (B) ablation
needle. (b) Augmented US video stream with the needle, its virtual extension, and ablation
region. (Photo courtesy of S. Razzaque, InnerOptic Technology, Hillsborough, NC.)
(Ultrasonix, Canada), and LOGIQ E9 (GE Healthcare, USA), and a defunct product,
the US Guide 2000 (UltraGuide, Israel). These systems have been used to guide
lesion biopsy procedures (Hakime et al. 2012), tumor ablations (Hildebrand et al.
2007), liver resection (Kleemann et al. 2006), and regional anesthesia (Umbarje
et al. 2013; Wong et al. 2013). A limiting factor, however, is that these systems require
customized needles with embedded sensors or external adaptors rigidly attached to
the needle, as used in the UltraGuide system. To address this issue, the Clear Guide
ONE system (Clear Guide Medical, USA) uses a similar augmentation approach, but
does not require customized needles; instead, it uses a device mounted directly onto
the US probe, which includes a structured light system to track the needle and a combination of optical and inertial sensors for tracking the US probe (Stolka et al. 2011).
Finally, a different approach to augmenting the user's perception is provided by
the AIM and InVision systems (InnerOptic Technology, USA). These use electromagnetic and optical tracking, respectively, with augmentation provided via a stereoscopic display and a realistic representation of the needle. The operator is required
to wear polarized glasses to perceive a full 3D scene. These systems have been clinically used for delivering ablative therapy (Sindram et al. 2010, 2011). Figure 19.2
shows the physical setup and augmented guidance view.
19.3.2.3 Augmented Video and SPECT Guidance
The systems described earlier are used to directly guide the intervention with
AR being the key technical component. The declipseSPECT (SurgicEye GmbH,
Germany) system uses AR to guide data acquisition and clinical decision making, although AR is not the system's key novel aspect. The clinical context is the use of
nuclear medicine in the surgical setting. A radiopharmaceutical tracer (i.e., a ligand linked to a gamma-ray-emitting substance) is injected into the bloodstream and binds to tumor cells more readily than to normal cells, increasing the concentration of radiation-emitting substance in the tumor. A gamma probe is then used to localize the region exhibiting higher radiation counts. Previously, localization was done in a qualitative manner, with gamma probes used to evaluate the presence of radioactive material via continuous auditory and numerical feedback (Povoski et al. 2009),
providing limited knowledge about the spatial distribution of the tracer. The novelty
of the declipseSPECT system consists of the reconstruction of a 3D volume based on
the concentration of radioactive tracer measured with an optically tracked, handheld gamma probe. Hence, the system does the job of a SPECT imaging system using a handheld device rather than the large diagnostic SPECT machine, which cannot be used in the operating room setting. To provide AR guidance, the system combines a video camera with a commercial optical tracking system. This combination device is calibrated during assembly. Once calibrated, any spatial information known in the tracking system's coordinate frame is readily projected onto the video.
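To make this overlay step concrete, the computation can be sketched as follows. This is a generic illustration of projecting tracker-frame data onto a calibrated camera's video, not the declipseSPECT implementation; the function names, intrinsic matrix, and numeric values are assumptions.

```python
import numpy as np

def to_homogeneous(p):
    """Append 1 to a 3D point for use with 4x4 rigid transforms."""
    return np.append(p, 1.0)

def project_tracked_point(p_tracker, T_cam_tracker, K):
    """Project a 3D point known in the tracker frame onto the video image.

    p_tracker     : (3,) point in tracker coordinates (e.g., a SPECT voxel center)
    T_cam_tracker : (4,4) rigid transform from the one-time calibration,
                    tracker frame -> camera frame
    K             : (3,3) camera intrinsic matrix
    returns       : (u, v) pixel coordinates
    """
    p_cam = T_cam_tracker @ to_homogeneous(p_tracker)  # into the camera frame
    uvw = K @ p_cam[:3]                                # perspective projection
    return uvw[0] / uvw[2], uvw[1] / uvw[2]

# Example: identity extrinsics and a simple pinhole camera with focal length
# 800 px and principal point (320, 240); a point 0.5 m straight ahead lands
# on the principal point.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
u, v = project_tracked_point(np.array([0.0, 0.0, 0.5]), np.eye(4), K)
print(u, v)  # -> 320.0 240.0
```

Because the camera and tracker are calibrated once at assembly, this projection requires no intraoperative registration, which is exactly the property highlighted above.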
As the quality of the 3D SPECT reconstruction is highly dependent on the data
gathered with the tracked gamma probe, the system displays the amount of information acquired in the region of interest overlaid onto the video stream of the patient.
This information guides the surgeon to place the gamma probe at locations that have
insufficient measurements. Once the SPECT volume is computed, its projection is
overlaid onto the video of the patient as shown in Figure 19.3. This overlay guides the
surgeon during biopsy or resection of the tumor (Navab et al. 2012). The system was initially evaluated in a study including 85 patients undergoing lymph node biopsy after being diagnosed with invasive breast cancer (Wendler et al. 2010). The study identified
the correlation between the quality of the 3D reconstruction and the quality of data
acquisition, emphasizing that better acquisitions resulted in improved guidance.
Having described three forms of commercially available AR systems, we point
out that all three share an important characteristic: they do not involve registration. These systems are based on calibration of the imaging device (x-ray, US, video
camera) with respect to a tracking system. The augmented video consists of intraoperative imaging overlaid with additional data also acquired intraoperatively; the lack
of pre- to intraoperative registration is an important aspect in the clinical setting, as
the registration often disrupts and slows the clinical workflow, significantly reducing
the chances of user acceptance.
19.3.2.4 Augmented Endoscopic Video Guidance
A guidance system that does require registration of preoperative data is the Scopis Hybrid Navigation system (SCOPIS Medical, Germany), developed in the context of ear-nose-throat (ENT) surgery. This system uses a preoperative CT volume of the patient in which structures of interest, surgical planning information such as desired osteotomy lines, and targets are defined. Both the CT and a live endoscopic video stream are used to guide the surgeon (Winne et al. 2011). Intraoperatively, the CT is registered to the patient using rigid registration based on corresponding points
FIGURE 19.3 Interface of the declipseSPECT system from SurgicEye GmbH. The system uses an optically tracked gamma probe (a) to obtain a SPECT volume that is overlaid onto the video stream (b). Computations are in a patient-centric coordinate system, a tracked reference frame that is attached to the patient's skin (c). (Photo courtesy of J. Traub, SurgicEye GmbH, München, Germany.)
identified in the CT and on the patient. The calibrated endoscope is then tracked
using either optical or electromagnetic tracking, and the information defined preoperatively in the CT coordinate system is overlaid onto the video stream. To strike a
balance between overlaying information onto the image and obscuring the anatomy,
the system overlays information in a minimalistic manner, using outlines as opposed
to solid semitransparent objects. An interesting feature of endoscopic images is that they exhibit severe barrel distortion, a consequence of the lenses used in endoscopes, which are required to provide a wide-angle view. Thus, to augment endoscopic video, one must either correct the distorted images or distort the projected data prior to merging the two information sources. The approach adopted by the Scopis system is to distort the projected data, an approach consistent with the understanding that clinicians prefer to view the original unprocessed images, and one that is also more computationally efficient.
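The corresponding-point rigid registration step used to align the CT to the patient can be sketched with the classic least-squares solution based on the singular value decomposition of the cross-covariance matrix. This is a generic illustration, not the Scopis implementation; the fiducial coordinates below are made up.

```python
import numpy as np

def rigid_register(pts_ct, pts_patient):
    """Least-squares rigid registration of paired points.

    pts_ct, pts_patient : (N,3) arrays of corresponding points identified in
    the CT and touched on the patient (e.g., with a tracked pointer).
    Returns R (3x3) and t (3,) such that R @ ct_point + t ~= patient_point.
    """
    c_ct = pts_ct.mean(axis=0)
    c_pat = pts_patient.mean(axis=0)
    H = (pts_ct - c_ct).T @ (pts_patient - c_pat)  # cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                       # guard against a reflection
        Vt[-1, :] *= -1
        R = Vt.T @ U.T
    t = c_pat - R @ c_ct
    return R, t

# Example: recover a known 90-degree rotation about z plus a translation.
R_true = np.array([[0.0, -1.0, 0.0],
                   [1.0, 0.0, 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([10.0, -5.0, 2.0])
ct = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1.0]])
patient = ct @ R_true.T + t_true
R, t = rigid_register(ct, patient)
print(np.allclose(R, R_true), np.allclose(t, t_true))  # -> True True
```

In practice the residual error at the fiducials (the fiducial registration error) is monitored to judge whether the registration is acceptable before the overlay is trusted.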
All commercially available AR systems described so far use standard or stereoscopic 2D screens to display the information, most likely due to the availability of
screens in the clinical setting, and the familiarity of clinicians with this mode of
interaction. Stepping into the modern-day interventional setting, one immediately observes that the medical staff focuses on various screens, while almost no one looks directly at the patient. This approach is not ideal, as it poses a challenge for hand-eye coordination.
19.3.3 Laboratory Prototypes
The number of laboratory AR systems is exceedingly large as compared to the few
that have made it into the commercial domain. The following sections provide brief
descriptions of such systems. Most notably, all of these prototypes, except one, are
based on visual augmentation and are therefore divided into subcategories based
on their display technology: a screen, a medical binocular system, a head-mounted
display (HMD), a half-silvered mirror display, or direct projection onto the patient.
* The Acrobot and RIO systems are currently owned by Stryker Corp.
We start with the one exception, a virtual reality system for guiding endoscopic skull base surgery that also incorporates auditory augmentation (Dixon et al. 2014). This
system provides auditory cues in addition to virtual reality views of the underlying
anatomy. Critical anatomical structures are segmented from an intraoperative CBCT
data set that is rigidly registered to the patient. Structure- and distance-dependent
auditory cues are used to alert the operator to proximity to specific critical structures. The system was evaluated by seven surgeons, who preferred that the auditory alarms be customizable, with the option of turning them off once the location of
the critical structures had been identified using the endoscopic or virtual reality
views. This speaks to the need for context awareness in these systems, as discussed
in Section 19.4.
In this tour, similar systems are grouped and presented in chronological order according to their development, with the relevant publications cited for each group.
19.3.3.1 Screen-Based Display
Stepping into a modern operating room, one immediately notices multiple screens
displaying various pre- and intraoperative data. The second observation is that most
of the time the clinical staff is looking at the screens and not directly at the patient.
Thus, introducing an additional screen is clinically acceptable, possibly explaining
why this display choice is the most common approach used by AR systems.
It should be noted that this tour does not include systems that utilize a separate-screen approach: one screen showing a virtual reality scene depicting the underlying spatial relationships between surgical tools and anatomy, with additional screen(s) showing the physical scene via intraoperative imaging (e.g., endoscopic or US video streams (Ellsmere et al. 2004)). This approach relies on the operator to integrate the information from the virtual and real worlds and thus should be classified not as an AR system but as a VR system used in conjunction with real-time imaging; it will not be discussed.
One of the first image-guided navigation systems for neurosurgical procedures
was the system presented in Grimson et al. (1996). This system augmented video from a standard camera with structures segmented from preoperative CT or MR, rigidly registered to the intraoperative surgical scene. The surgeon physically traced contours of the relevant structures onto the patient's scalp using the augmented video. The system did not include tracking of the patient or tools, assuming that the patient was stationary until all structures were physically marked.
A related system for liver tumor ablation was described in Nicolau et al. (2009).
This system overlaid models of the liver segmented from CT onto standard video.
Rigid registration was used to align the CT acquired at end-exhalation with the
physical patient. As the liver moves and deforms with respiration, the registration
and guidance information were only valid and therefore used at end-exhalation.
In this system, the distal end of the ablation needle was optically tracked using a printed marker, assuming a rigid needle with a fixed transform between the needle's tracked distal end and its tip. Following clinical evaluation, it was determined that the needle deflection was not negligible in 25% of the cases, an issue that can be mitigated by using electromagnetic tracking and embedding the tracked sensor close to the needle's tip.
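Under the rigid-needle assumption, the tip position follows from the tracked distal-end pose and a fixed, calibrated tip offset. The sketch below is a generic illustration with made-up values, not the system's actual code; any shaft deflection breaks exactly this computation.

```python
import numpy as np

def needle_tip(T_tracker_marker, tip_offset):
    """Infer the needle tip position from the tracked distal-end pose.

    T_tracker_marker : (4,4) pose of the printed marker in the tracker frame
    tip_offset       : (3,) fixed tip position in the marker's own frame,
                       measured once during calibration
    Returns the tip position in tracker coordinates.
    """
    return (T_tracker_marker @ np.append(tip_offset, 1.0))[:3]

# Example: marker 100 mm up the shaft from the tip, tip along the marker's -z
# axis, marker currently at (50, 0, 200) mm in the tracker frame.
T = np.eye(4)
T[:3, 3] = [50.0, 0.0, 200.0]
tip = needle_tip(T, np.array([0.0, 0.0, -100.0]))
# tip is at (50, 0, 100) mm in tracker coordinates
```

Embedding an electromagnetic sensor near the tip, as suggested above, replaces this extrapolation with a direct measurement and removes the rigid-shaft assumption.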
A system that overlays a rendering of the tumor volume onto standard video without the need for registration was described in Sato et al. (1998). This system was developed in the context of breast tumor resection. A standard video camera and a 2D US probe are optically tracked, and a 3D US volume is acquired with the tracked US, thereby providing information on the tumor location relative to the tracked camera. The system assumes a stationary tumor, which is achieved by suspending the patient's respiration for a short period.
A more recent system aiming to improve puncture site selection for needle insertion in epidural anesthesia was described in Al-Deen Ashab et al. (2013). Using
an optically tracked US probe, the system creates a panoramic image of the spinal
column and identifies the vertebral levels. The optical tracking system also provides a live video stream and the calibration parameters for the video camera, which are used in conjunction with the known 3D location of the vertebrae from US to overlay corresponding lines onto the video of the patient's back without the need for registration. The overlay is iconic, with lines denoting the vertebral location and anatomical name.
Finally, a system that uses AR to guide the positioning of a robotic device was described in Joskowicz et al. (2006). This system was used in the context of neurosurgery and relied on a preoperative CT or MR of the head that was rigidly registered relative to a camera located in a fixed position. The location of the desired robot mounting is projected onto the camera's video stream, allowing the user to position the physical mount accordingly.
Several groups have proposed the use of a tracked standard video camera to guide maxillofacial and neurosurgery procedures (Kockro et al. 2009; Mischkowski et al. 2006; Pandya et al. 2005; Weber et al. 2003), and more recently for guidance of kidney stone removal (Müller et al. 2013). Tracking of the camera pose is performed using either a commercial optical tracking system (Kockro et al. 2009; Mischkowski et al. 2006; Weber et al. 2003), a mechanical arm (Pandya et al. 2005), or directly from the video stream using pose estimation (Müller et al. 2013). The camera video
feed is then augmented with projections of anatomical structures segmented from
preoperative CT scans and rigidly registered to the intraoperative patient. The systems described in Weber et al. (2003) and Mischkowski et al. (2006) mount the
camera on the back of a small handheld LCD screen, which is equivalent to the use
of a tablet with built-in camera described in Müller et al. (2013). In these systems,
the operator positions the screen between themselves and the patient as if looking through a window, providing an intrinsic rough alignment of the operator's and camera's viewing directions and ensuring that the augmented information overlaid onto the visible surface is in the correct location relative to the operator's viewing angle.
In the system described in Kockro et al. (2009), the tracked camera is mounted inside a surgical pointer and the augmented views are displayed on a standard screen. As a consequence, the camera's and surgeon's viewing directions may differ significantly, resulting in erroneous perception of the location of the overlaid structures on the skin surface, as illustrated in Figure 19.4. This effect became apparent when surgeons attempted to use a pen to physically mark the structure locations on the patient's skin, an issue similar to that arising when projecting the underlying anatomical structures directly onto the patient (Section 19.3.3.4).
FIGURE 19.4 When the surgeon's viewing direction and the camera/projector's viewing direction differ significantly, projecting an internal structure onto the skin surface directly, or onto its picture, provides misleading guidance.
The following group of systems augments endoscopic video with additional information acquired either pre- or intraoperatively. Before proceeding, it is worthwhile
noting a unique characteristic of all endoscopic systems: they exhibit severe radial distortions. This is due to their use of fish-eye lenses, a constraint imposed by the need for a wide field of view while minimizing the size of the tool inserted into the body; augmentation therefore requires correcting the distorted video images or distorting the augmented information before fusion.
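Distorting the overlay geometry to match the barrel-distorted endoscopic image can be sketched with the standard polynomial radial-distortion model; the coefficients below are made-up illustrative values, not those of any particular endoscope.

```python
def distort_point(x, y, k1, k2):
    """Apply polynomial radial distortion to a normalized image point (x, y).

    k1, k2 are the radial distortion coefficients from camera calibration;
    barrel distortion corresponds to negative k1, which pulls points toward
    the image center.
    """
    r2 = x * x + y * y
    scale = 1.0 + k1 * r2 + k2 * r2 * r2
    return x * scale, y * scale

# A point halfway out along the x-axis is pulled inward under barrel
# distortion, so an overlay contour must be shrunk the same way before
# it is composited onto the raw endoscopic frame.
xd, yd = distort_point(0.5, 0.0, k1=-0.2, k2=0.0)
```

Applying this to each vertex of a projected contour warps the overlay into the same geometry as the uncorrected video, which is the computationally cheaper of the two options discussed above.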
An early example of augmented endoscopy was described in Freysinger et al.
(1997). This system was developed in the context of ENT surgery. The endoscope's path is specified on a preoperative CT or MR. Both patient and endoscope are localized using electromagnetic tracking. The preoperative image is rigidly registered to the patient, and an iconic representation of the desired endoscope path, in the form of rectangles, is overlaid onto the video.
A similar system developed for guiding resection of pituitary tumors was described in Kawamata et al. (2002). Critical structures such as the carotid arteries and optic nerves are segmented preoperatively in MR or CT. The patient and endoscope are optically tracked and, following rigid registration, all of the segmented structures are projected as wireframes onto the endoscopic image. It should be noted that this form of augmentation all but occluded the content of the endoscopic image.
A more recent development is the system for lymph node biopsy guidance presented in Higgins et al. (2008). This system ensures the physician obtains the biopsy
from the correct location, which is not visible on the endoscopic video, by overlaying
the segmented lymph node identified in preoperative CT onto the endoscopic view.
A unique feature of this system is that it does not use external tracking devices;
instead, the location of the endoscope is estimated by rigidly registering a virtual
view generated from CT to the endoscopic video. Comparison between the real and synthetic images is performed using an information-theory-based metric.
Having identified intraoperative registration of soft tissue organs as a complex
challenge, some groups have decided to sidestep this hurdle. Instead of using data
from preoperative volumetric images, these groups acquire volumetric or tomographic data intraoperatively in a manner that ensures that the pose of the endoscope
is known with respect to the intraoperative coordinate system. One such system
developed in the context of liver surgery was described in Konishi et al. (2007). The
system uses a combined magneto-optical tracking system to track an US probe and
the endoscope. The tracked 2D US is used to acquire a volume intraoperatively when
the patient is at end-exhalation. Vessels and tumor locations are volume rendered
from this data and overlaid onto the endoscopic video. No registration is required, as the relative pose of the US probe and endoscope is known in the external tracking system's coordinate frame.
Another system targeting liver tumor resection was described in Feuerstein et al. (2008). That system employed an optical tracking system, with intraoperative volumetric imaging acquired using a CBCT machine. A contrast-enhanced CBCT image is acquired before resection with the patient at end-exhalation. The structures visible in CBCT are then overlaid onto the endoscopic video so that the surgeon is aware of the location of the blood vessels and bile ducts that enter and exit the liver segment of interest. Augmentation is only valid during the end-exhalation respiratory phase and prior to the beginning of resection.
A system that is not limited to guidance during end-exhalation was described in Shekhar et al. (2010). Localization is done with an optical tracking system, and intraoperative volumetric images are acquired with a CT machine whose pose relative to the tracking system is determined via calibration. To enable augmentation throughout the respiratory cycle, an initial high-quality contrast-enhanced volume is acquired. This volume is then nonrigidly registered to subsequently acquired low-dose CTs. The deformed structures are overlaid onto the endoscopic image. The main issue with this approach is the increased radiation exposure, making it less likely to be adopted clinically. This approach later evolved into the use of optically tracked 2D US that is continuously fused with live tracked stereo endoscopy (Kang et al. 2014). This system does not require registration and implicitly accommodates
organ motion and deformation.
While the majority of endoscopy-based AR systems use commercial tracking devices, the system described in Teber et al. (2009) and Simpfendörfer et al. (2011) is unique in that it uses the medical images and implanted markers to perform endoscopic tracking. Note that this approach is invasive, as the markers are implanted into the organ of interest under standard endoscopic guidance. Retrieval of the markers is also an issue, unless the procedure entails resection of tissue and the markers are implanted in the gross region that will be removed during the procedure. An additional issue with this approach is that it assumes that the organ does not
deform, limiting its applicability to a subset of soft tissue interventions. The system
was utilized in radical prostatectomy and partial kidney resection. In the former
application, a 3D surface model obtained from intraoperative 3D transrectal US was
overlaid onto the video images, while in the latter, anatomical models were obtained
from intraoperative CBCT data.
While virtual fluoroscopy as a commercial product has somewhat fallen out of favor in the clinic, variants of this approach that provide augmented x-ray
views of registered anatomical structures have continued to be developed in research
labs. In the context of orthopedic surgery, an AR system was developed to aid in the
reduction of long bone fractures (Zheng et al. 2008). This system registers cylindrical models to the two fragments of the long bone using two x-ray images. Each
bone segment is optically tracked using rigidly implanted reference frames. As the
surgeon manipulates the bone fragments, the image is modified based on the models
without the need for additional x-rays. In the context of cardiac procedures, a variation on the virtual fluoroscopy approach is used with preoperative data from MRI
augmenting live fluoroscopy (De Buck et al. 2005; Dori et al. 2011; George et al. 2011). These procedures are traditionally guided using fluoroscopy as the imaging
modality, which only enables clear catheter visualization, with the heart only clearly
visible under contrast agent administration. As contrast can cause acute renal failure,
the clinician has to strike a balance between the need to observe the spatial relationships between catheters and anatomy and the amount of administered contrast. Both
the systems described in De Buck et al. (2005) and Dori et al. (2011) rigidly register the MRI data to contrast-enhanced x-rays using manual alignment. While the former
uses a segmented model, the latter uses a volume rendering of the MRI dataset.
A system using automatic fiducial-based rigid registration is described in George et al. (2011), with the x-ray images augmented with a segmented model from MRI.
It should be noted that all these systems use rigid registration, with the augmented data being valid only at a specific point of the cardiac and respiratory cycles. Thus, live fluoroscopy remains in clinical demand, as it reflects the dynamic physical reality, to which the additional augmented information is not necessarily synchronized.
Finally, a recent method for initializing 2D/3D registration based on augmentation of fluoroscopic images was described in Gong et al. (2013). An anatomical
model obtained from preoperative CT or MR is virtually attached to an optically
tracked pointer. The operator uses the pointer to control the overlay of the model
onto intraoperative fluoroscopic images of the anatomy. Overlay is performed using
the projection parameters of the x-ray device. When the model is visually aligned
with the fluoroscopic images, it is physically in the same location as the anatomical
structure. Once alignment is visually determined, the pose of the tracked pointer
yields the desired transformation. Figure 19.5 shows this system in the laboratory
setting.
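The final step can be sketched as a simple composition of transforms: at the moment of visual alignment, the anatomy's pose in the tracker frame equals the pointer pose composed with the fixed transform that virtually attaches the model to the pointer. The matrices below are hypothetical, not taken from the authors' system.

```python
import numpy as np

def registration_from_alignment(T_tracker_pointer, T_pointer_model):
    """Recover the registration once visual alignment is achieved.

    T_tracker_pointer : (4,4) tracked pointer pose at the moment of alignment
    T_pointer_model   : (4,4) fixed transform attaching the model to the pointer
    Returns the transform mapping model (CT/MR) coordinates into the tracker
    frame, i.e., the desired initialization of the 2D/3D registration.
    """
    return T_tracker_pointer @ T_pointer_model

# Toy check: a model fixed 30 mm along the pointer axis, with the pointer
# held at (100, 0, 0) mm in the tracker frame at the moment of alignment.
T_tp = np.eye(4); T_tp[:3, 3] = [100.0, 0.0, 0.0]
T_pm = np.eye(4); T_pm[:3, 3] = [0.0, 0.0, 30.0]
T_tm = registration_from_alignment(T_tp, T_pm)
# model origin lands at (100, 0, 30) mm in tracker coordinates
```

Because the overlay is rendered with the x-ray device's own projection parameters, visual alignment in two or more views pins down the 3D pose, and no further computation is needed beyond this composition.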
While all of the systems described earlier augmented fluoroscopic x-ray images,
they involved cumbersome user interactions or required the use of additional hardware
components such as tracking systems. This is less desirable in the intraoperative
environment where space is often physically limited and time is a precious commodity. An elegant AR system combining video and x-ray fluoroscopy in a clinically
FIGURE 19.5 The operator uses an (a) optically tracked (b) probe tool to align a volume of the spine to (c) multiple x-ray images. The spine model is virtually attached to the tracked probe.
appropriate form factor was described in Navab et al. (2010) and Chen et al. (2013).
This system is based on a modified portable fluoroscopy unit, a C-arm. A video camera and a double mirror system are attached to the C-arm and calibrated such that the
optical axis of the video camera and the x-ray machine are aligned. Following this
one-time calibration, the x-ray images are overlaid onto the video, placing them in
the physical context. Thus, the exact location of underlying anatomy that is readily
visible in the x-ray is also visible with respect to the skin surface seen in the video.
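Because the one-time calibration aligns the two optical axes and places both modalities on a shared pixel grid, run-time fusion reduces to per-pixel blending. A toy sketch with synthetic arrays, not the actual system's software:

```python
import numpy as np

def blend(video, xray, alpha=0.4):
    """Overlay the x-ray image onto the video frame with opacity alpha.

    Both inputs are assumed to be co-registered grayscale images on the
    same pixel grid, as guaranteed here by the mirror construction and
    the one-time calibration.
    """
    return (1.0 - alpha) * video.astype(float) + alpha * xray.astype(float)

video = np.full((2, 2), 100.0)  # stand-in for a grayscale video frame
xray = np.full((2, 2), 200.0)   # stand-in for the co-registered x-ray
fused = blend(video, xray, alpha=0.5)
print(fused[0, 0])  # -> 150.0
```

The key design point is that all geometric work happens once, at calibration time; each intraoperative frame then requires only this cheap compositing step.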
The last system in this category is a model-enhanced US-assisted guidance system for cardiac interventions. The system integrates electromagnetically tracked
transesophageal US imaging providing real-time visualization, with augmented
models of the cardiac anatomy segmented from preoperative CT or MRI and virtual
representations of the electromagnetically tracked delivery instruments (Linte et al. 2008, 2010). The interface for these guidance systems is shown in Figure 19.6. The anatomical models were rigidly registered to the intraoperative setting. While this registration is not accurate enough for navigation based on the preoperative models alone, it provides missing context while the intraoperative US provides the live navigation guidance. The system was later adapted to an augmented display for mitral valve repair to guide the positioning of a novel therapeutic device, the NeoChord, as described in Moore et al. (2013).
19.3.3.2 Binocular Display
Medical binocular systems are a natural vehicle for AR as they already provide a
mediated stereo view to the surgeon. Thus, the introduction of AR systems into
procedures utilizing these systems is more natural and has a smaller impact on the
existing workflow, increasing the chances of user adoption. Unlike the screen-based
display in which information is overlaid onto a single image, the systems in this category need to overlay the information on two views in a manner that creates correct
depth perception.
FIGURE 19.6 Augmented views from the US-assisted system for guidance of cardiac interventions. (a) View for guiding direct access mitral valve implantation and septal defect repair, and (b) view for guidance of transapical mitral valve repair.
One of the earlier systems in this category was the microscope-assisted guided interventions (MAGI) system (Edwards et al. 2000). This system was developed in the context of neurosurgery, augmenting the visible surface with semitransparent renderings of critical structures segmented in preoperative MR. Both patient and microscope are optically tracked, with the patient rigidly registered to the intraoperative setting. Unfortunately, the depth perception created by the overlay of semitransparent surfaces, even with highly accurate projection parameters and registration, was ambiguous and unstable (Johnson et al. 2003). Another system that can be described as a medical HMD is the Varioscope AR (Birkfellner et al. 2003). By adding a beam splitter to a clinical head-mounted binocular system, additional structures segmented from preoperative CT are injected into the surgeon's line of sight. The surgeon's head location, tools, and patient are all optically tracked. The preoperative data is rigidly registered to the patient, with the intended use being neurosurgery applications.
The da Vinci robotic surgical system is a commercial master-slave robotic system for soft tissue interventions (DiMaio et al. 2011). The surgeon controls the system using visual feedback provided by a stereo endoscope on the patient side, with the images viewed through a binocular system on the surgeon's side, as shown in Figure 19.7. A number of groups have proposed to augment the surgeon's view by overlaying semitransparent surface models onto the stereo images. Models are obtained from segmentation of CT and are rigidly registered to the intraoperative setting. Among others, these include overlay of the reconstructed coronary tree for cardiac procedures using a one-time manual adjustment of the original rigid registration without any further updates (Falk et al. 2005), overlay of the semitransparent kidney and collecting system for partial kidney resection with continuous rigid registration to account for motion (Su et al. 2009), overlay of blood vessels for liver tumor resection (Buchs et al. 2013) without any updates after the initial registration,* and overlay
* Note that the system was used on patients who were cirrhotic; the liver is rigid due to disease.
FIGURE 19.7 da Vinci optics: (a) stereo endoscope on the patient side and (b) binocular
system on the surgeon side.
of semitransparent facial nerve and cochlea for cochlear implant surgery (Liu et al. 2014), again without any updates after the initial registration.
19.3.3.3 Head-Mounted Display
HMDs are in many ways equivalent to the medical binocular systems described earlier. They differ in several aspects, as they are not part of the current clinical setup;
historically they have been rather intrusive, and the augmentation is only visible to
a single user. With the newer, much less intrusive generation of devices such as Google Glass, Epson's Moverio BT-200, and the Oculus Rift, systems utilizing HMDs may be revived in the clinical setting.
One of the pioneering clinical AR systems used an HMD for in situ display of US images (Bajura et al. 1992), and a decade later an updated version of this system was used to provide guidance for breast biopsy (Rosenthal et al. 2002). In this system the HMD, US probe, and biopsy needle are optically tracked, with no intraoperative registration required. This guidance concept later evolved into the AIM and InVision commercial systems described earlier, but with the display consisting of a stereoscopic monitor as opposed to an HMD.
A system for biopsy guidance using an optically tracked HMD and needle was described in Vogt et al. (2006) and Wacker et al. (2006). The system overlays information from preoperative or, less commonly, intraoperative MR onto the field of view, rigidly registered to the patient. A more recent, cost-effective AR system for guiding needle insertion in vertebroplasty was described in Abe et al. (2013). That system uses an outside-in optical tracking approach based on a low-cost web camera attached to the HMD. Registration is performed via visual alignment using x-ray imaging, with guidance provided via a needle trajectory planned on a preoperative CT.
19.3.3.4 Half-Silvered Mirror Display
FIGURE 19.8 Sonic flashlight used in a cadaver study to place a needle in a deep vein in
the arm. (a) Miniature display and (b) half-silvered mirror. (Photo courtesy of G. Stetten,
University of Pittsburgh, Pittsburgh, PA.)
used for diagnosis and less for guidance of interventional procedures due to their costs; as a consequence, such a device is less likely to gain widespread clinical acceptance.
The Medarpa system, developed at the Fraunhofer Institute, Germany, overlays
CT data onto the patient, similar to the previous system, with the advantage that
the display is mobile (Khan et al. 2006). The intended use is needle guidance for
biopsy procedures. To display the relevant information, the head of the operator
and the mobile display are optically tracked, while the needle is electromagnetically tracked. The system requires patient registration and calibration of the two
tracking systems, so that the transformation between them is known. These steps
are less desirable as they increase the time a procedure takes and involve increased
cost and technological footprint in the interventional suite given the use of multiple
tracking systems.
Finally, a unique 3D autostereoscopic system in combination with a half-silvered mirror was described in the context of guidance of brain tumor interventions in an open MRI scanner (Liao et al. 2010) and guidance of dental interventions using CT data (Wang et al. 2014). This system is unique in that it provides the perception of 3D without the need for custom glasses. This is achieved using integral photography: an array of lenses placed in front of the screen creates the correct 3D perception from all viewing directions. In the
MRI-based version of the system, the image volume was rigidly registered to the patient, and the integral videography device and tools were tracked optically using a commercial tracking system. In the CT-based version, the tracking and
registration were customized for the procedure using an optical tracking system developed in-house, with the volumetric image registered to the patient using the visible dental structures.
19.3.3.5 Direct Patient Overlay Display
Directly overlaying information onto the patient became possible with the introduction of high-quality projection systems that have a sufficiently small form factor. Similar to the use of a half-silvered mirror, this approach facilitates better hand-eye coordination, as the surgeon's attention is not divided between a screen and the interventional site. An implicit assumption of a projection-based display is that there is a direct line of sight between the projector and the patient. In the physically crowded operating room such a setup is not always feasible, as the clinical staff and additional equipment will often block the projector's field of view. An additional issue with this display approach is that it can lead to incorrect perception of the location of deep-seated structures if the projection direction and the surgeon's viewing direction differ significantly, as illustrated in Figure 19.4.
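The magnitude of this depth-dependent error can be quantified with a small geometric sketch. The flat-skin assumption, depths, and angles below are illustrative, not values from the chapter:

```python
import numpy as np

def overlay_error(depth_mm, proj_dir, view_dir):
    """
    Apparent displacement (mm) on the skin between where a structure
    `depth_mm` below the surface is projected along `proj_dir` and where
    the surgeon's line of sight along `view_dir` would intersect it.
    Directions are unit vectors pointing from the surface into the body;
    the skin is modeled as the local plane z = 0.
    """
    proj_dir = np.asarray(proj_dir, float)
    proj_dir = proj_dir / np.linalg.norm(proj_dir)
    view_dir = np.asarray(view_dir, float)
    view_dir = view_dir / np.linalg.norm(view_dir)

    def skin_point(d):
        # Trace a ray from the structure at (0, 0, depth_mm) back along
        # direction d until it crosses the skin plane z = 0.
        t = depth_mm / d[2]
        p = np.array([0.0, 0.0, depth_mm]) - t * d
        return p[:2]

    return np.linalg.norm(skin_point(proj_dir) - skin_point(view_dir))

# Structure 50 mm deep, projector overhead, surgeon viewing 30 degrees off-axis:
theta = np.radians(30)
err = overlay_error(50.0, [0, 0, 1], [np.sin(theta), 0, np.cos(theta)])
print(round(err, 1))  # 28.9  (i.e., 50 * tan(30 deg) mm)
```

A 30-degree mismatch over a structure 50 mm deep already shifts its apparent location by almost 3 cm, which is why the viewing geometry matters for anything below the skin surface.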
One of the earlier projection-based augmentation approaches used a laser light and the persistence-of-vision effect to overlay the shape of a craniotomy onto a head phantom (Glossop et al. 2003). A CT volume was rigidly registered to the phantom using an optically tracked pointer, after which the craniotomy outline defined in the CT was projected as a set of points in quick succession, creating the perception of an outline overlaid onto the skull. Another system, developed in the context of craniofacial surgery, overlaid borehole geometry, bone-cutting trajectories, and tumor margins onto the patient's head (Marmulla et al. 2005; Wörn et al. 2005). A rigid registration was performed by matching the surface from a CT scan to an intraoperative surface scan acquired using a coded light pattern.
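Surface-to-surface rigid registration of this kind is commonly solved with variants of the iterative closest point (ICP) algorithm. The toy sketch below, with brute-force matching and synthetic point sets, illustrates the idea only; it is not the cited authors' implementation:

```python
import numpy as np

def best_rigid(A, B):
    """Least-squares rotation/translation mapping point set A onto B (Kabsch)."""
    ca, cb = A.mean(0), B.mean(0)
    H = (A - ca).T @ (B - cb)
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    return R, cb - R @ ca

def icp(source, target, iterations=20):
    """Toy iterative-closest-point alignment of `source` onto `target`."""
    src = source.copy()
    R_total, t_total = np.eye(3), np.zeros(3)
    for _ in range(iterations):
        # Brute-force closest point in target for every source point.
        d = np.linalg.norm(src[:, None, :] - target[None, :, :], axis=2)
        matched = target[d.argmin(axis=1)]
        R, t = best_rigid(src, matched)
        src = src @ R.T + t
        R_total, t_total = R @ R_total, R @ t_total + t
    return R_total, t_total

# Synthetic check: recover a known small rotation and translation.
rng = np.random.default_rng(0)
ct_surface = rng.uniform(-50, 50, size=(200, 3))   # stand-in for CT surface points
angle = np.radians(3)
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle),  np.cos(angle), 0.0],
                   [0.0, 0.0, 1.0]])
intraop = ct_surface @ R_true.T + np.array([3.0, -1.0, 2.0])

R, t = icp(ct_surface, intraop)
aligned = ct_surface @ R.T + t
print(f"max alignment residual: {np.abs(aligned - intraop).max():.1e} mm")
```

Production systems replace the brute-force matching with k-d trees, reject outlier pairs, and need a reasonable initial alignment, but the alternation between matching and closed-form rigid fitting is the same.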
A system for guiding needle insertion in brachytherapy, the insertion of radioactive implants into a tumor, was described in Krempien et al. (2008). The system uses a fixed projector that serves both for guidance and for projecting a structured light pattern to acquire 3D intraoperative surface structures. The intraoperative surface is rigidly registered to a preoperative surface extracted from CT. Information is presented incrementally, first guiding the operator to the insertion point, then aligning the needle along the insertion trajectory, and finally indicating insertion depth. The system used iconic information and color coding to provide guidance, while tracking both the patient and needle using a commercial optical tracking system.
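Staged, color-coded guidance of this kind could be sketched as follows. The stage names, thresholds, and color ramp are illustrative assumptions, not details of the Krempien et al. system:

```python
import numpy as np

def guidance_color(distance_mm, threshold_mm=2.0, max_mm=30.0):
    """
    Map distance-to-target to a red-to-green ramp, the kind of simple
    color coding used to tell the operator how close they are.
    Returns (r, g, b) in [0, 1]; pure green when within threshold.
    """
    if distance_mm <= threshold_mm:
        return (0.0, 1.0, 0.0)
    f = min(distance_mm / max_mm, 1.0)
    return (f, 1.0 - f, 0.0)

def guidance_stage(tip, target, axis_error_deg, depth_error_mm):
    """Hypothetical three-stage protocol: entry point, then angle, then depth."""
    if np.linalg.norm(np.asarray(tip, float) - np.asarray(target, float)) > 2.0:
        return "move tip to insertion point"
    if axis_error_deg > 1.0:
        return "align needle with planned trajectory"
    if abs(depth_error_mm) > 1.0:
        return "advance to planned depth"
    return "at target"

print(guidance_color(30.0))                             # (1.0, 0.0, 0.0) - far: red
print(guidance_stage([0, 0, 0], [0, 0, 0], 0.5, 0.0))   # at target
```

Presenting one degree of freedom at a time, as the cited system does, keeps the projected display simple even though the underlying pose error is six-dimensional.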
A system for guiding port placement in endoscopic procedures was presented in Sugimoto et al. (2010). A volume rendering of internal anatomical structures from preoperative CT is projected onto the patient's skin using a projector fixed above the operating table. The CT data is rigidly registered to the patient using anatomical structures on the skin surface (e.g., the umbilicus). Registration accuracy was about 5 mm, with the result used as a rough roadmap helping the surgeon select appropriate locations for port placement. The procedure itself was then guided using standard endoscopy. A similar system facilitating port placement in endoscopic liver surgery performed using the da Vinci robotic system was described in Volonté et al. (2011). Registration and visualization are similar in both systems. In the da Vinci case, clinicians noted that this form of guidance was primarily useful in obese patients who required modification of the standard port locations.
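Registration from matched anatomical landmarks, as used here, has a closed-form solution. The sketch below uses hypothetical landmark coordinates and simulated picking noise, and reports the mean fiducial registration error, the usual summary of accuracy for such a registration:

```python
import numpy as np

def paired_point_register(moving, fixed):
    """Closed-form rigid registration from matched landmark pairs (Kabsch/Horn)."""
    cm, cf = moving.mean(0), fixed.mean(0)
    U, _, Vt = np.linalg.svd((moving - cm).T @ (fixed - cf))
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    return R, cf - R @ cm

# Hypothetical skin landmarks (mm) picked in the CT and on the patient.
ct_landmarks = np.array([[0.0, 0.0, 0.0],
                         [80.0, 0.0, 0.0],
                         [0.0, 120.0, 0.0],
                         [40.0, 60.0, 30.0]])
patient_landmarks = ct_landmarks + np.array([3.0, -2.0, 1.0])        # pure shift ...
patient_landmarks = patient_landmarks + \
    np.random.default_rng(1).normal(0.0, 2.0, (4, 3))                # ... plus picking noise

R, t = paired_point_register(ct_landmarks, patient_landmarks)
fre = np.linalg.norm(ct_landmarks @ R.T + t - patient_landmarks, axis=1).mean()
print(f"fiducial registration error: {fre:.1f} mm")
```

Note that fiducial error at the skin is only a proxy: the error at an internal target can be larger, which is why such registrations serve as a rough roadmap rather than precise guidance.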
The development of small form-factor projectors enabled the projection of information directly onto the patient using a handheld, optically tracked pico-projector (Gavaghan et al. 2011, 2012). This system was used to project vessel and tumor locations onto the liver surface for guiding open liver surgery, tumor localization for guiding orthopedic tumor resection, and iconic information, crosshairs and color, indicating location and distance from target for guiding needle insertions. Preoperative information was obtained from CT, and rigid registration was used to align this information to the intraoperative setting. This system does not take into account the viewer's pose and the associated issues of depth perception when projecting anatomical structures. Rather, it adopts a practical solution: for structures that lie deep beneath the skin surface, it overlays iconic guidance information instead of projecting the anatomical structures themselves.
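Rendering a 3D point through a tracked projector amounts to a standard pinhole projection after transforming the point into the projector frame. The intrinsics and pose below are made-up values, not the Gavaghan et al. calibration:

```python
import numpy as np

def project_point(p_world, T_world_projector, K):
    """
    Map a 3D point (tracker/world frame, mm) to projector pixel coordinates.
    T_world_projector: 4x4 pose of the projector reported by the tracking
    system; K: 3x3 projector intrinsics obtained by calibration.
    """
    T_proj_world = np.linalg.inv(T_world_projector)
    p = T_proj_world @ np.append(p_world, 1.0)
    uvw = K @ p[:3]
    return uvw[:2] / uvw[2]

# Hypothetical calibration: focal length 1000 px, principal point (640, 360).
K = np.array([[1000.0, 0.0, 640.0],
              [0.0, 1000.0, 360.0],
              [0.0, 0.0, 1.0]])
T = np.eye(4)  # projector pose: at the tracker origin, looking along +z

# A registered tumor point 500 mm in front of the projector, 50 mm off-axis:
print(project_point(np.array([50.0, 0.0, 500.0]), T, K))  # [740. 360.]
```

Because the projector is handheld, T_world_projector changes every frame; the pixel coordinates must be recomputed at the tracker's update rate for the overlay to stay locked to the anatomy.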
A more recent fixed-projector system that implicitly accounts for motion and deformation was described in Szabó et al. (2013). This system overlays infrared temperature maps directly onto the anatomy. The intent is to facilitate identification of decreased blood flow to the heart muscle during cardiac surgery. The system uses an infrared camera and a projector that are calibrated so that their optical axes are aligned. The system does not require registration, as all data is acquired intraoperatively; as long as the acquisition and projection latency is sufficiently short, it readily accommodates the moving and deforming anatomy.
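With the camera and projector calibrated to a common optical axis, mapping a thermal-image pixel to the corresponding projector pixel reduces to a fixed 2D warp. The homography below is a made-up calibration result, not a value from the Szabó et al. system:

```python
import numpy as np

def thermal_to_projector(uv, H):
    """
    Map a thermal-camera pixel to the corresponding projector pixel via a
    3x3 homography H (valid when the camera and projector share an optical
    axis, or for an approximately planar target).
    """
    p = H @ np.array([uv[0], uv[1], 1.0])
    return p[:2] / p[2]

# With aligned optical axes the mapping reduces to a scale plus offset;
# hypothetical calibration result:
H = np.array([[2.0, 0.0, 10.0],
              [0.0, 2.0, 20.0],
              [0.0, 0.0, 1.0]])
print(thermal_to_projector((100, 50), H))  # [210. 120.]
```

Since the warp is fixed at calibration time, the per-frame work is only capture, color mapping, and projection, which is what keeps the end-to-end latency low enough to follow the beating heart.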
(e.g., systems using a medical binocular) where no other member of the clinical staff can alert them to such issues.
19.4.4 Cost-Effectiveness
Based on past performance, the introduction of new technologies into the operating room has more often increased the cost of providing healthcare (Bodenheimer 2005). Government agencies are aware of this, as reflected by the goals of the Affordable Care Act in the United States, which aims to reduce healthcare costs (Davies 2013). From a financial perspective, only a few studies have focused on evaluating the cost-effective use of virtual reality image-guidance and robotic systems (Costa et al. 2013; Desai et al. 2011; Margier et al. 2014; Novak et al. 2007; Swank et al. 2009), and unfortunately none of these studies reported a clear financial benefit for using the proposed navigation systems. With regard to evaluating the cost-effectiveness of AR systems, the only evaluation identified was that of the RIO augmented haptic surgery system (Swank et al. 2009). While this system was shown to be cost-effective, the analysis also showed an increased number of patients undergoing the procedure, possibly attracted by the novelty of the technology.
Thus, successfully transitioning from laboratory implementation and testing to clinical care becomes not only a matter of providing improved healthcare, but also of being cost-effective. If the intent is to develop systems that will be clinically adopted, then cost should also be considered during the research phase and not deferred to the clinical implementation phase, which seems to be the case for most recently developed systems that have experienced low traction in terms of their clinical translation.
Transitioning from the laboratory to the commercial domain remains a challenge, as illustrated by the ratio of commercial systems to laboratory prototypes presented here. On the other hand, medicine is one of the domains where AR systems have made inroads, overcoming both technical and regulatory challenges that are not faced in other application domains. Given the active research in medical AR and the large number of laboratory systems, we expect to see additional AR systems, or systems incorporating elements of AR, in commercial products. While we use the term medical AR to describe the domain, in practice the important aspect of these systems is that they augment the physician's abilities to carry out complex procedures in a minimally invasive manner, improving the quality of healthcare for all of us.
REFERENCES
Abe, Y., S. Sato, K. Kato, T. Hyakumachi, Y. Yanagibashi, M. Ito, and K. Abumi. 2013. A novel 3D guidance system using augmented reality for percutaneous vertebroplasty: Technical note. Journal of Neurosurgery. Spine 19(4) (October): 492–501.
Adhikary, S. D., A. Hadzic, and P. M. McQuillan. 2013. Simulator for teaching hand-eye coordination during ultrasound-guided regional anaesthesia. British Journal of Anaesthesia 111(5) (November): 844–845.
Al-Deen Ashab, H., V. A. Lessoway, S. Khallaghi, A. Cheng, R. Rohling, and P. Abolmaesumi. 2013. An augmented reality system for epidural anesthesia (AREA): Prepuncture identification of vertebrae. IEEE Transactions on Bio-Medical Engineering 60(9) (September): 2636–2644.
Amesur, N., D. Wang, W. Chang, D. Weiser, R. Klatzky, G. Shukla, and G. Stetten. 2009. Peripherally inserted central catheter placement using the sonic flashlight. Journal of Vascular and Interventional Radiology 20(10) (October): 1380–1383.
Bajura, M., H. Fuchs, and R. Ohbuchi. 1992. Merging virtual objects with the real world: Seeing ultrasound imagery within the patient. In Proceedings of the 19th Annual Conference on Computer Graphics and Interactive Techniques, SIGGRAPH '92, Chicago, IL. New York: ACM, pp. 203–210.
Banger, M., P. J. Rowe, and M. Blyth. 2013. Time analysis of MAKO RIO UKA procedures in comparison with the Oxford UKA. Bone & Joint Journal Orthopaedic Proceedings Supplement 95-B(Suppl. 28) (August 1): 89.
Bichlmeier, C., S. M. Heining, M. Feuerstein, and N. Navab. 2009. The virtual mirror: A new interaction paradigm for augmented reality environments. IEEE Transactions on Medical Imaging 28(9) (September): 1498–1510.
Birkfellner, W., M. Figl, C. Matula, J. Hummel, R. Hanel, H. Imhof, F. Wanschitz, A. Wagner, F. Watzinger, and H. Bergmann. 2003. Computer-enhanced stereoscopic vision in a head-mounted operating binocular. Physics in Medicine and Biology 48(3) (February 7): N49–N57.
Blackwell, M., C. Nikou, A. M. DiGioia, and T. Kanade. 2000. An image overlay system for medical data visualization. Medical Image Analysis 4(1) (March): 67–72.
Bodenheimer, T. 2005. High and rising health care costs. Part 2: Technologic innovation. Annals of Internal Medicine 142(11) (June 7): 932–937.
Botden, S. M. and J. J. Jakimowicz. 2009. What is going on in augmented reality simulation in laparoscopic surgery? Surgical Endoscopy 23: 1693–1700.
Bouarfa, L., P. P. Jonker, and J. Dankelman. 2011. Discovery of high-level tasks in the operating room. Journal of Biomedical Informatics 44(3) (June): 455–462 (Biomedical complexity and error).
Ellsmere, J., J. Stoll, W. Wells 3rd, R. Kikinis, K. Vosburgh, R. Kane, D. Brooks, and D. Rattner. 2004. A new visualization technique for laparoscopic ultrasonography. Surgery 136(1) (July): 84–92.
Ettinger, G. L., M. E. Leventon, W. E. Grimson, R. Kikinis, L. Gugino, W. Cote, L. Sprung et al. 1998. Experimentation with a transcranial magnetic stimulation system for functional brain mapping. Medical Image Analysis 2: 477–486.
Falk, V., F. Mourgues, L. Adhami, S. Jacobs, H. Thiele, S. Nitzsche, F. W. Mohr, and E. Coste-Manière. 2005. Cardio navigation: Planning, simulation, and augmented reality in robotic assisted endoscopic bypass grafting. The Annals of Thoracic Surgery 79(6) (June): 2040–2047.
Feifer, A., J. Delisle, and M. Anidjar. 2008. Hybrid augmented reality simulator: Preliminary construct validation of laparoscopic smoothness in a urology residency program. Journal of Urology 180: 1455–1459.
Feuerstein, M., T. Mussack, S. M. Heining, and N. Navab. 2008. Intraoperative laparoscope augmentation for port placement and resection planning in minimally invasive liver resection. IEEE Transactions on Medical Imaging 27(3) (March): 355–369.
Fichtinger, G., A. Deguet, K. Masamune, E. Balogh, G. S. Fischer, H. Mathieu, R. H. Taylor, S. J. Zinreich, and L. M. Fayad. 2005. Image overlay guidance for needle insertion in CT scanner. IEEE Transactions on Bio-Medical Engineering 52(8) (August): 1415–1424.
Fischer, G. S., A. Deguet, C. Csoma, R. H. Taylor, L. Fayad, J. A. Carrino, S. J. Zinreich, and G. Fichtinger. 2007. MRI image overlay: Application to arthrography needle insertion. Computer Aided Surgery 12(1) (January): 2–14.
Foley, K. T., D. A. Simon, and Y. R. Rampersaud. 2001. Virtual fluoroscopy: Computer-assisted fluoroscopic navigation. Spine 26(4) (February 15): 347–351.
Freysinger, W., A. R. Gunkel, and W. F. Thumfart. 1997. Image-guided endoscopic ENT surgery. European Archives of Otorhinolaryngology 254(7): 343–346.
Gavaghan, K. A., M. Peterhans, T. Oliveira-Santos, and S. Weber. 2011. A portable image overlay projection device for computer-aided open liver surgery. IEEE Transactions on Bio-Medical Engineering 58(6) (June): 1855–1864.
Gavaghan, K., T. Oliveira-Santos, M. Peterhans, M. Reyes, H. Kim, S. Anderegg, and S. Weber. 2012. Evaluation of a portable image overlay projector for the visualisation of surgical navigation data: Phantom studies. International Journal of Computer Assisted Radiology and Surgery 7(4) (July): 547–556.
George, A. K., M. Sonmez, R. J. Lederman, and A. Z. Faranesh. 2011. Robust automatic rigid registration of MRI and x-ray using external fiducial markers for XFM-guided interventional procedures. Medical Physics 38(1) (January): 125–141.
Glossop, N., C. Wedlake, J. Moore, T. Peters, and Z. Wang. 2003. Laser projection augmented reality system for computer assisted surgery. In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2003, (eds.) R. E. Ellis and T. M. Peters, pp. 239–246. Lecture Notes in Computer Science 2879. Berlin, Germany: Springer.
Gong, R. H., Ö. Güler, M. Kürklüoglu, J. Lovejoy, and Z. Yaniv. 2013. Interactive initialization of 2D/3D rigid registration. Medical Physics 40(12) (December): 121911.
Grätzel, C., T. Fong, S. Grange, and C. Baur. 2004. A non-contact mouse for surgeon–computer interaction. Technology and Health Care 12(3): 245–257.
Grimson, W. L., G. J. Ettinger, S. J. White, T. Lozano-Perez, W. M. Wells, and R. Kikinis. 1996. An automatic registration method for frameless stereotaxy, image guided surgery, and enhanced reality visualization. IEEE Transactions on Medical Imaging 15(2): 129–140.
Hakime, A., F. Deschamps, E. G. M. D. Carvalho, A. Barah, A. Auperin, and T. D. Baere. 2012. Electromagnetic-tracked biopsy under ultrasound guidance: Preliminary results. CardioVascular and Interventional Radiology 35(4) (August 1): 898–905.
Hansen, C., J. Wieferich, F. Ritter, C. Rieder, and H.-O. Peitgen. 2010. Illustrative visualization of 3D planning models for augmented reality in liver surgery. International Journal of Computer Assisted Radiology and Surgery 5(2) (March): 133–141.
Higgins, W. E., J. P. Helferty, K. Lu, S. A. Merritt, L. Rai, and K.-C. Yu. 2008. 3D CT-video fusion for image-guided bronchoscopy. Computerized Medical Imaging and Graphics 32(3) (April): 159–173.
Hildebrand, P., M. Kleemann, U. J. Roblick, L. Mirow, C. Bürk, and H.-P. Bruch. 2007. Technical aspects and feasibility of laparoscopic ultrasound navigation in radiofrequency ablation of unresectable hepatic malignancies. Journal of Laparoendoscopic & Advanced Surgical Techniques. Part A 17(1) (February): 53–57.
Hughes-Hallett, A., E. K. Mayer, H. J. Marcus, T. P. Cundy, P. J. Pratt, A. W. Darzi, and J. A. Vale. 2014. Augmented reality partial nephrectomy: Examining the current status and future perspectives. Urology 83(2) (February): 266–273.
Jannin, P. and X. Morandi. 2007. Surgical models for computer-assisted neurosurgery. NeuroImage 37(3) (September 1): 783–791.
Johnson, L. G., P. Edwards, and D. Hawkes. 2003. Surface transparency makes stereo overlays unpredictable: The implications for augmented reality. Studies in Health Technology and Informatics 94: 131–136.
Joskowicz, L., R. Shamir, M. Freiman, M. Shoham, E. Zehavi, F. Umansky, and Y. Shoshan. 2006. Image-guided system with miniature robot for precise positioning and targeting in keyhole neurosurgery. Computer Aided Surgery 11(4) (July): 181–193.
Kalkofen, D., E. Mendez, and D. Schmalstieg. 2009. Comprehensible visualization for augmented reality. IEEE Transactions on Visualization and Computer Graphics 15(2) (April): 193–204.
Kaneko, M., F. Kishino, K. Shimamura, and H. Harashima. 1993. Toward the new era of visual communication. IEICE Transactions on Communications E76-B(6): 577–591.
Kang, X., M. Azizian, E. Wilson, K. Wu, A. D. Martin, T. D. Kane, C. A. Peters, K. Cleary, and R. Shekhar. 2014. Stereoscopic augmented reality for laparoscopic surgery. Surgical Endoscopy 28(7) (February 1): 2227–2235.
Katić, D., P. Spengler, S. Bodenstedt, G. Castrillon-Oberndorfer, R. Seeberger, J. Hoffmann, R. Dillmann, and S. Speidel. 2014. A system for context-aware intraoperative augmented reality in dental implant surgery. International Journal of Computer Assisted Radiology and Surgery 10(1) (April 27): 101–108.
Katić, D., A.-L. Wekerle, J. Görtler, P. Spengler, S. Bodenstedt, S. Röhl, S. Suwelack et al. 2013. Context-aware augmented reality in laparoscopic surgery. Computerized Medical Imaging and Graphics 37(2) (March): 174–182.
Kaufman, S., I. Poupyrev, E. Miller, M. Billinghurst, P. Oppenheimer, and S. Weghorst. 1997. New interface metaphors for complex information space visualization: An ECG monitor object prototype. Studies in Health Technology and Informatics 39: 131–140.
Kawamata, T., H. Iseki, T. Shibasaki, and T. Hori. 2002. Endoscopic augmented reality navigation system for endonasal transsphenoidal surgery to treat pituitary tumors: Technical note. Neurosurgery 50(6) (June): 1393–1397.
Kerner, K. F., C. Imielinska, J. Rolland, and H. Tang. 2003. Augmented reality for teaching endotracheal intubation: MR imaging to create anatomically correct models. In Proceedings of the Annual AMIA Symposium, Washington, DC, pp. 888–889.
Kersten-Oertel, M., S. J.-S. Chen, and D. L. Collins. 2014. An evaluation of depth enhancing perceptual cues for vascular volume visualization in neurosurgery. IEEE Transactions on Visualization and Computer Graphics 20(3) (March): 391–403.
Khan, M. F., S. Dogan, A. Maataoui, S. Wesarg, J. Gurung, H. Ackermann, M. Schiemann, G. Wimmer-Greinecker, and T. J. Vogl. 2006. Navigation-based needle puncture of a cadaver using a hybrid tracking navigational system. Investigative Radiology 41(10) (October): 713–720.
Marmulla, R., H. Hoppe, J. Mühling, and S. Hassfeld. 2005. New augmented reality concepts for craniofacial surgical procedures. Plastic and Reconstructive Surgery 115(4) (April): 1124–1128.
Mathes, A. M., S. Kreuer, S. O. Schneider, S. Ziegeler, and U. Grundmann. 2008. The performance of six pulse oximeters in the environment of neuronavigation. Anesthesia and Analgesia 107(2) (August): 541–544.
Mavrogenis, A. F., O. D. Savvidou, G. Mimidis, J. Papanastasiou, D. Koulalis, N. Demertzis, and P. J. Papagelopoulos. 2013. Computer-assisted navigation in orthopedic surgery. Orthopedics 36(8) (August): 631–642.
Merloz, P., J. Troccaz, H. Vouaillat, C. Vasile, J. Tonetti, A. Eid, and S. Plaweski. 2007. Fluoroscopy-based navigation system in spine surgery. Proceedings of the Institution of Mechanical Engineers. Part H, Journal of Engineering in Medicine 221(7) (October): 813–820.
Metzger, P. J. 1993. Adding reality to the virtual. In Proceedings of the IEEE Virtual Reality International Symposium, Seattle, WA, pp. 7–13.
Milgram, P. and F. Kishino. 1994. A taxonomy of mixed reality visual displays. IEICE Transactions on Information and Systems E77-D(12): 1321–1329.
Milgram, P., H. Takemura, A. Utsumi, and F. Kishino. 1994. Augmented reality: A class of displays on the reality–virtuality continuum. In Proceedings of the SPIE 1994: Telemanipulator and Telepresence Technology, Boston, MA, vol. 2351, pp. 282–292.
Mischkowski, R. A., M. J. Zinser, A. C. Kübler, B. Krug, U. Seifert, and J. E. Zöller. 2006. Application of an augmented reality tool for maxillary positioning in orthognathic surgery – A feasibility study. Journal of Cranio-Maxillo-Facial Surgery 34(8) (December): 478–483.
Moore, J. T., M. W. A. Chu, B. Kiaii, D. Bainbridge, G. Guiraudon, C. Wedlake, M. Currie, M. Rajchl, R. V. Patel, and T. M. Peters. 2013. A navigation platform for guidance of beating heart transapical mitral valve repair. IEEE Transactions on Bio-Medical Engineering 60(4) (April): 1034–1040.
Müller, M., M.-C. Rassweiler, J. Klein, A. Seitel, M. Gondan, M. Baumhauer, D. Teber, J. J. Rassweiler, H.-P. Meinzer, and L. Maier-Hein. 2013. Mobile augmented reality for computer-assisted percutaneous nephrolithotomy. International Journal of Computer Assisted Radiology and Surgery 8(4) (July): 663–675.
Nakamoto, M., K. Nakada, Y. Sato, K. Konishi, M. Hashizume, and S. Tamura. 2008. Intraoperative magnetic tracker calibration using a magneto-optic hybrid tracker for 3-D ultrasound-based navigation in laparoscopic surgery. IEEE Transactions on Medical Imaging 27: 255–270.
Navab, N., T. Blum, L. Wang, A. Okur, and T. Wendler. 2012. First deployments of augmented reality in operating rooms. Computer 45(7) (July): 48–55.
Navab, N., S.-M. Heining, and J. Traub. 2010. Camera augmented mobile C-Arm (CAMC): Calibration, accuracy study, and clinical applications. IEEE Transactions on Medical Imaging 29(7) (July): 1412–1423.
Navab, N., J. Traub, T. Sielhorst, M. Feuerstein, and C. Bichlmeier. 2007. Action- and workflow-driven augmented reality for computer-aided medical procedures. IEEE
Pandya, A., M.-R. Siadat, and G. Auner. 2005. Design, implementation and accuracy of a prototype for medical augmented reality. Computer Aided Surgery 10(1) (January): 23–35.
Peters, T. and K. Cleary (ed.). 2008. Image-Guided Interventions: Technology and Applications. Berlin, Germany: Springer.
Povoski, S. P., R. L. Neff, C. M. Mojzisik, D. M. O'Malley, G. H. Hinkle, N. C. Hall, D. A. Murrey Jr, M. V. Knopp, and E. W. Martin Jr. 2009. A comprehensive overview of radioguided surgery using gamma detection probe technology. World Journal of Surgical Oncology 7(1) (December 1): 163.
Rodriguez, F., S. Harris, M. Jakopec, A. Barrett, P. Gomes, J. Henckel, J. Cobb, and B. Davies. 2005. Robotic clinical trials of uni-condylar arthroplasty. The International Journal of Medical Robotics + Computer Assisted Surgery 1(4) (December): 20–28.
Rolland, J. P., D. L. Wright, and A. R. Kancherla. 1997. Towards a novel augmented-reality tool to visualize dynamic 3-D anatomy. Studies in Health Technology and Informatics 39: 337–348.
Rosenberg, L. B. 1993. Virtual fixtures: Perceptual tools for telerobotic manipulation. In 1993 IEEE Virtual Reality Annual International Symposium, Seattle, WA, pp. 76–82.
Rosenthal, M., A. State, J. Lee, G. Hirota, J. Ackerman, K. Keller, E. Pisano, M. Jiroutek, K. Muller, and H. Fuchs. 2002. Augmented reality guidance for needle biopsies: An initial randomized, controlled trial in phantoms. Medical Image Analysis 6(3) (September): 313–320.
Sato, Y., M. Nakamoto, Y. Tamaki, T. Sasama, I. Sakita, Y. Nakajima, M. Monden, and S. Tamura. 1998. Image guidance of breast cancer surgery using 3-D ultrasound images and augmented reality visualization. IEEE Transactions on Medical Imaging 17(5) (October): 681–693.
Shekhar, R., O. Dandekar, V. Bhat, M. Philip, P. Lei, C. Godinez, E. Sutton et al. 2010. Live augmented reality: A new visualization method for laparoscopic surgery using continuous volumetric computed tomography. Surgical Endoscopy 24(8) (August): 1976–1985.
Sielhorst, T., C. Bichlmeier, S. M. Heining, and N. Navab. 2006. Depth perception – A major issue in medical AR: Evaluation study by twenty surgeons. Medical Image Computing and Computer-Assisted Intervention 9(Pt 1): 364–372.
Simpfendörfer, T., M. Baumhauer, M. Müller, C. N. Gutt, H.-P. Meinzer, J. J. Rassweiler, S. Guven, and D. Teber. 2011. Augmented reality visualization during laparoscopic radical prostatectomy. Journal of Endourology/Endourological Society 25(12) (December): 1841–1845.
Sindram, D., I. H. McKillop, J. B. Martinie, and D. A. Iannitti. 2010. Novel 3-D laparoscopic magnetic ultrasound image guidance for lesion targeting. HPB 12(10) (December): 709–716.
Sindram, D., R. Z. Swan, K. N. Lau, I. H. McKillop, D. A. Iannitti, and J. B. Martinie. 2011. Real-time three-dimensional guided ultrasound targeting system for microwave ablation of liver tumours: A human pilot study. HPB 13(3) (March): 185–191.
Stetten, G. D. and V. S. Chib. 2001. Overlaying ultrasonographic images on direct vision. Journal of Ultrasound in Medicine 20(3) (March): 235–240.
Stolka, P. J., X. L. Wang, G. D. Hager, and E. M. Boctor. 2011. Navigation with local sensors in handheld 3D ultrasound: Initial in-vivo experience. In SPIE Medical Imaging: Ultrasonic Imaging, Tomography, and Therapy, Lake Buena Vista, FL, (eds.) J. D'hooge and M. M. Doyley, p. 79681J-9.
Stoyanov, D., G. P. Mylonas, M. Lerotic, A. J. Chung, and G.-Z. Yang. 2008. Intra-operative visualizations: Perceptual fidelity and human factors. Journal of Display Technology 4(4) (December): 491–501.
Su, L.-M., B. P. Vagvolgyi, R. Agarwal, C. E. Reiley, R. H. Taylor, and G. D. Hager. 2009. Augmented reality during robot-assisted laparoscopic partial nephrectomy: Toward real-time 3D-CT to stereoscopic video registration. Urology 73(4) (April): 896–900.
Sugimoto, M., H. Yasuda, K. Koda, M. Suzuki, M. Yamazaki, T. Tezuka, C. Kosugi et al. 2010. Image overlay navigation by markerless surface registration in gastrointestinal, hepatobiliary and pancreatic surgery. Journal of Hepato-Biliary-Pancreatic Sciences 17(5) (September): 629–636.
Swank, M. L., M. Alkire, M. Conditt, and J. H. Lonner. 2009. Technology and cost-effectiveness in knee arthroplasty: Computer navigation and robotics. American Journal of Orthopedics (Belle Mead, N.J.) 38(Suppl. 2) (February): 32–36.
Szabó, Z., S. Berg, S. Sjökvist, T. Gustafsson, P. Carleberg, M. Uppsäll, J. Wren, H. Ahn, and Ö. Smedby. 2013. Real-time intraoperative visualization of myocardial circulation using augmented reality temperature display. The International Journal of Cardiovascular Imaging 29(2) (February): 521–528.
Takemura, H. and F. Kishino. 1992. Cooperative work environment using virtual workspace. In Proceedings of the Computer Supported Cooperative Work, Toronto, ON, Canada, pp. 226–232.
Teber, D., S. Guven, T. Simpfendorfer, M. Baumhauer, E. O. Güven, F. Yencilek, A. S. Gozen, and J. Rassweiler. 2009. Augmented reality: A new tool to improve surgical accuracy during laparoscopic partial nephrectomy? Preliminary in vitro and in vivo results. European Urology 56(2) (August): 332–338.
Umbarje, K., R. Tang, R. Randhawa, A. Sawka, and H. Vaghadia. 2013. Out-of-plane brachial plexus block with a novel SonixGPS(TM) needle tracking system. Anaesthesia 68(4) (April): 433–434.
Utsumi, A., P. Milgram, H. Takemura, and F. Kishino. 1994. Investigation of errors in perception of stereoscopically presented virtual object locations in real display space. In Proceedings of the Human Factors and Ergonomics Society, Nashville, TN.
Varga, E., P. M. T. Pattynama, and A. Freudenthal. 2013. Manipulation of mental models of anatomy in interventional radiology and its consequences for design of human–computer interaction. Cognition, Technology & Work 15(4) (November 1): 457–473.
Vogt, S., A. Khamene, H. Niemann, and F. Sauer. 2004. An AR system with intuitive user interface for manipulation and visualization of 3D medical data. Studies in Health Technology and Informatics 98: 397–403 (in Proceedings of the MMVR).
Vogt, S., A. Khamene, and F. Sauer. 2006. Reality augmentation for medical procedures: System architecture, single camera marker tracking, and system evaluation. International Journal of Computer Vision 70(2) (November 1): 179–190.
Volonté, F., F. Pugin, P. Bucher, M. Sugimoto, O. Ratib, and P. Morel. 2011. Augmented reality and image overlay navigation with OsiriX in laparoscopic and robotic surgery: Not only a matter of fashion. Journal of Hepato-Biliary-Pancreatic Sciences 18(4) (July): 506–509.
Vosburgh, K. G. and R. San José Estépar. 2007. Natural orifice transluminal endoscopic surgery (NOTES): An opportunity for augmented reality guidance. Studies in Health Technology and Informatics 125: 485–490 (in Proceedings of the MMVR).
Wacker, F. K., S. Vogt, A. Khamene, J. A. Jesberger, S. G. Nour, D. R. Elgort, F. Sauer, J. L. Duerk, and J. S. Lewin. 2006. An augmented reality system for MR image-guided needle biopsy: Initial results in a swine model. Radiology 238(2) (February): 497–504.
Wang, D., N. Amesur, G. Shukla, A. Bayless, D. Weiser, A. Scharl, D. Mockel et al. 2009. Peripherally inserted central catheter placement with the sonic flashlight: Initial clinical trial by nurses. Journal of Ultrasound in Medicine 28(5) (May): 651–656.
Wang, J., H. Suenaga, K. Hoshi, L. Yang, E. Kobayashi, I. Sakuma, and H. Liao. 2014. Augmented reality navigation with automatic marker-free image registration using 3-D image overlay for dental surgery. IEEE Transactions on Biomedical Engineering 61(4) (April): 1295–1304.
Weber, S., M. Klein, A. Hein, T. Krueger, T. C. Lueth, and J. Bier. 2003. The navigated image viewer – Evaluation in maxillofacial surgery. In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2003, (eds.) R. E. Ellis and T. M. Peters, pp. 762–769. Lecture Notes in Computer Science, vol. 2878. Berlin, Germany: Springer.
Wendler, T., K. Herrmann, A. Schnelzer, T. Lasser, J. Traub, O. Kutter, A. Ehlerding et al. 2010. First demonstration of 3-D lymphatic mapping in breast cancer using freehand SPECT. European Journal of Nuclear Medicine and Molecular Imaging 37(8) (August 1): 1452–1461.
Wilson, M., M. Coleman, and J. McGrath. 2010. Developing basic hand-eye coordination skills for laparoscopic surgery using gaze training. BJU International 105(10) (May): 1356–1358.
Winne, C., M. Khan, F. Stopp, E. Jank, and E. Keeve. 2011. Overlay visualization in endoscopic ENT surgery. International Journal of Computer Assisted Radiology and Surgery 6(3) (May): 401–406.
Wong, S. W., A. U. Niazi, K. J. Chin, and V. W. Chan. 2013. Real-time ultrasound-guided spinal anesthesia using the SonixGPS needle tracking system: A case report. Canadian Journal of Anaesthesia 60(1) (January): 50–53.
Wörn, H., M. Aschke, and L. A. Kahrs. 2005. New augmented reality and robotic based methods for head-surgery. The International Journal of Medical Robotics + Computer Assisted Surgery 1(3) (September): 49–56.
Yaniv, Z. and K. Cleary. 2006. Image-guided procedures: A review. Technical report, CAIMR TR-2006-3. Washington, DC: Georgetown University.
Zheng, G., X. Dong, and P. A. Gruetzner. 2008. Reality-augmented virtual fluoroscopy for computer-assisted diaphyseal long bone fracture osteosynthesis: A novel technique and feasibility study results. Proceedings of the Institution of Mechanical Engineers, Part H: Journal of Engineering in Medicine 222(1) (January 1): 101–115.
20
CONTENTS
20.1 Introduction
  20.1.1 Image-Guided Surgery
    20.1.1.1 Registration
    20.1.1.2 Tracking
20.2 Describing Augmented Reality IGS Systems: Data, Visualization Processing, View
  20.2.1 Data
  20.2.2 Visualization Processing
  20.2.3 View
20.3 Surgical Applications of AR
  20.3.1 Neurosurgery
    20.3.1.1 Data
    20.3.1.2 Visualization Processing
    20.3.1.3 View
  20.3.2 Craniofacial, Maxillofacial, and Dental Surgery
    20.3.2.1 Data
    20.3.2.2 Visualization Processing
    20.3.2.3 View
  20.3.3 Internal and Soft Tissue Surgery (Heart, Breast, and Liver)
    20.3.3.1 Data
    20.3.3.2 Visualization Processing
    20.3.3.3 View
  20.3.4 Endoscopic and Laparoscopic Surgery
    20.3.4.1 Data
    20.3.4.2 Visualization Processing
    20.3.4.3 View
  20.3.5 Orthopedic Surgery
    20.3.5.1 Data
    20.3.5.2 Visualization Processing
    20.3.5.3 View
20.1 INTRODUCTION
Image-guided surgery (IGS), a form of minimally invasive computer-assisted surgery, was first used in the field of neurosurgery in the mid-1990s. Since then, IGS has gained wide acceptance and is used in numerous other surgical domains, having shown improvements in patient outcomes with lower morbidity and mortality rates, smaller incisions and reduced trauma to the patient, faster recovery times, and reductions in costs and hospital stays.
Concomitant with the growing use of less invasive surgical procedures such as IGS, there has been a growing need for new visualization methods that allow surgeons to gain as much (or more) visual information as open or exploratory surgery, but with minimally invasive techniques. IGS has met some of these needs by guiding surgeons and allowing them to navigate within the surgical field of view based on computer models of preoperative patient data (e.g., magnetic resonance images [MRI] or computed tomography [CT] images). This type of surgical guidance, which we will examine more closely in the next section, is achieved by using tracking systems to localize surgical tools and then visualizing them in the context of computer models of patient anatomy on a monitor within the operating room (OR).
One drawback that remains with traditional IGS systems is that the burden lies
with the surgeon to map the preoperative patient data displayed on the monitor of the
navigation system to the patient lying on the OR table. This mapping is not trivial, is
time consuming, and may be prone to error. In order to address this issue, a number
of research groups have explored the use of augmented reality (AR) visualizations that combine, in a single field of view, preoperative virtual data (in the form of anatomical regions or objects of interest) with the live patient or live images of the patient (Kersten-Oertel, 2013b).
In this chapter, we examine augmented reality within the unique context of IGS. We use the term augmented reality as defined by Milgram and Kishino (1994): a position on the mixed reality continuum where virtual objects are added to a real environment. Due to technological advances, the AR paradigm has broadened and may now extend to the notion of mixed reality, which also encompasses augmented virtuality (a position on the continuum where real objects are merged with a virtual environment). In a recent survey of AR techniques in IGS (Kersten-Oertel, 2013b), 82 of 84 papers surveyed used the term augmented reality when referring to either augmented virtuality or mixed reality systems.
FIGURE 20.1 A surgeon uses a tracked surgical probe to point to a location of interest on
the patient. A virtual tool model is displayed with respect to the preoperative patient images
on the navigation system. This allows the surgeon to locate the tool with respect to surrounding anatomy that is not directly visible on the patient. (Photo courtesy of Sean Chen, Montreal
Neurological Hospital, Montreal, Quebec, Canada.)
of the image-to-frame transformation, any target or point within the brain can be described with Cartesian or polar coordinates. During surgery, surgical instruments are attached to the frame, allowing the surgeon to accurately approach the target.
More commonly, the virtual and real patients are brought into alignment using homologous landmark registration (Alp et al., 1998; Eggers et al., 2006; Wolfsberger et al., 2002). The transformation between landmarks, represented as a point set on the preoperative data and a corresponding point set chosen on the patient, is computed and optimized. The landmarks may be external markers placed on the patient (i.e., fiducials) that are also visible on the preoperative images (typically CT or MRI) or anatomical landmarks that are chosen both on the images of the patient and on the actual patient (e.g., the bridge of the nose, the external canthus of the eye, or the meatus of the ear).
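The homologous landmark registration step described above can be illustrated with the closed-form least-squares rigid transform (the SVD-based Kabsch/Umeyama solution). The sketch below is illustrative only; the function names and the root-mean-square fiducial registration error helper are our own, not taken from any cited system:

```python
import numpy as np

def rigid_landmark_registration(image_pts, patient_pts):
    """Least-squares rigid transform (rotation R, translation t) mapping
    image-space fiducials onto their patient-space counterparts
    (closed-form Kabsch/Umeyama solution)."""
    src = np.asarray(image_pts, float)
    dst = np.asarray(patient_pts, float)
    src_c, dst_c = src.mean(0), dst.mean(0)          # centroids
    H = (src - src_c).T @ (dst - dst_c)              # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))           # guard against reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t

def fiducial_registration_error(R, t, src, dst):
    """RMS distance between transformed image fiducials and patient fiducials."""
    residual = (np.asarray(src, float) @ R.T + t) - np.asarray(dst, float)
    return float(np.sqrt((residual ** 2).sum(axis=1).mean()))
```

In practice the residual error (here the fiducial registration error) is what the navigation system reports after the surgeon picks corresponding landmarks.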
20.1.1.2 Tracking
Tracking systems localize objects by determining their position and orientation in
space. In IGS, acoustic, optical, and electromagnetic tracking systems have been
used.
Optical tracking systems localize objects by measuring the light that is transmitted from an object. Typically, surgical tools, reference frames, and any other objects in the OR that need to be tracked are outfitted with a tracker: three or four spherical infrared-reflecting balls placed in a specific spatial configuration. An optical infrared camera system, which has an infrared light source, measures the light reflected by the markers with two cameras in stereo and determines the position of the tracked tool in 3-D space. The optical tracking technology used in most common IGS systems is the Polaris Optical Tracking System from Northern Digital Inc. (NDI).*
Electromagnetic tracking systems measure the magnetic strength of transmitters
placed at fixed locations. Sensors (in which voltage is induced by the magnetic field)
measure the location and orientation of moving objects to which they are attached
(e.g., surgical instruments). An advantage of electromagnetic tracking systems over optical tracking systems is that they do not require a line of sight between emitter and receiver; this means that surgical tools within the body cavity can also be tracked.
On the other hand, electromagnetic systems are not wireless, which can cause some
inconvenience in sterile areas. Furthermore, electromagnetic signals may suffer from
artifacts if any new tools or objects (made of conducting material) are introduced
into the magnetic field or if there is a magnetic scanner nearby (e.g., an intraoperative
MRI). For these reasons, optical systems are more commonly used in IGS.
Acoustic tracking systems have also been studied in the context of IGS. In one ultrasound tracking system (Hata et al., 1997), an emitter was attached to the patient and microphones at various positions in the OR picked up the transmitted ultrasound waves. Due to the different travel times of the ultrasound waves to the various microphones, triangulation could be used to determine the position of the patient. Acoustic tracking has not gained wide acceptance in IGS, perhaps due to the lack of accuracy shown to date and the susceptibility of ultrasound systems to changes in room temperature that result in changes in the speed of sound (Eggers et al., 2006).
* http://www.ndigital.com.
20.2.1 Data
An important consideration for AR IGS systems is the data that is processed and
visualized for the end user. Data falls into two main classes: patient-specific data
and visually processed data. The different subclasses of data may be directly viewed
or may undergo one or more transformations to become visually processed data.
Patient-specific data may include clinical scores, patient demographics, and signal
or raw imaging data. Here we focus on visually processed data, which may be raw
imaging data, analyzed imaging data, prior knowledge data, or derived data. The
most important characteristics of visually processed data are the dimensionality of the data (i.e., 1-D, 2-D, 2.5-D, 3-D, or 4-D), whether the data is represented in the AR IGS system as a real or virtual object, and the semantics of the data.
The notion of semantics refers to the meaning of the data at a particular surgical step. For example, visually processed data may have a meaning that is strategic,
operational, or anatomical. The most common semantic of visually processed data is
anatomical, or dealing with the physiology and pathology of a patient. Typically, in
AR IGS systems, anatomical models of patient data such as organs, vessels, tumors, or
other objects of interest are combined with live images of the patient to allow a surgeon
to see beyond the exposed anatomical surface of the patient. Data with an operational semantic are used to represent surgical actions or tasks. As an example, consider the use of a data primitive to represent different states in tumor biopsy surgery: as the needle approaches the tumor, the tumor's representation changes from a surface to a wireframe mesh. Another typical example of an operational semantic is the use of color to represent the location of a surgical tool: as the tool approaches a high-risk area, the color of the tool or another indicator may change to red to alert the surgeon.
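A proximity-driven representation change like the one just described can be sketched as a simple distance threshold. The function, thresholds, and color names below are purely illustrative, not drawn from any cited system:

```python
import numpy as np

def operational_state(tool_tip, target_center, warn_mm=10.0, danger_mm=3.0):
    """Map the tool-to-target distance to a (representation, colour) pair,
    mimicking proximity-driven operational cues. Thresholds are in mm and
    purely illustrative."""
    d = float(np.linalg.norm(np.asarray(tool_tip, float)
                             - np.asarray(target_center, float)))
    if d <= danger_mm:
        return "wireframe", "red"      # tool very close: strongest alert
    if d <= warn_mm:
        return "wireframe", "yellow"   # approaching: switch representation
    return "surface", "grey"           # far away: default rendering
```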
FIGURE 20.2 The three factors of our visualization taxonomy (i.e., data, visualization processing, and view), the classes and subclasses that represent them (solid-line arrows), and the relationships between them (dashed-line arrows). Numbers in the figure specify the cardinality of the relationships: 0..* denotes an optional many-to-many relationship and 1..* denotes one-to-many. The surgical scenario is associated with both the visually processed data and view classes. The view is the component that the user interacts with and is therefore the component most limited by the constraints of the operating room, the surgeon, the surgical task, etc.
Lastly, data may have a strategic semantic that is concerned with planning and guidance. In IGS, plans are often visualized and displayed for the surgical team; these may
include the representation of virtual tools and their planned paths.
Raw imaging data is data acquired from a particular acquisition sensor, for example, MRI data, CT data, x-ray image data, and microscope data. After raw imaging data undergoes a transformation, it becomes analyzed imaging data, which is visualized for the end user of the system. As an example, whereas raw data could be the
direct output of the computed tomography angiography (CTA), the corresponding
analyzed imaging data could be the slices rendered as a volume showing only the
vessels. The most common attribute of analyzed imaging data is the data primitive
that represents the data, for example, a point, a line, a surface, or a volume.
Prior knowledge data is derived from generic models, such as atlases, labels, surgery roadmaps, surgical tool models, or accuracy or uncertainty information about
the IGS system. An example of the use of prior knowledge data in IGS is the surgical
tool models that are visualized with respect to patient anatomy on the navigation
system.
Derived data comes from processing either patient-specific data or prior knowledge data. Typical examples of data deriving from patient-specific data include
uncertainty measurements due to segmentation or registration, measurements such
as tumor volumes, and distances between regions of interest. Data derived from prior
knowledge could be brain regions segmented using an anatomical atlas.
In order to best determine how to visually display the data, it is important to
take into consideration the surgical step at which the data will be shown and the
characteristics and attributes of the different types of data. Some of these attributes
may include the imaging modality of the data, whether the data is preoperative or
intraoperative, and the semantics of the data.
20.2.3 View
The view factor of the DVV taxonomy describes (1) what the end user sees, (2) where the end user sees it, and (3) how the end user can interact with the data and the system. As such, the view comprises three subcomponents: the display used, the perception
location, and the interaction tools. The view is strongly linked to the surgical scenario;
each of the visually processed data for a given surgical step is presented at a particular
perception location, on a particular display, and may have only a subset of possible
interactions associated with it.
The constraints of IGS and the OR may require domain-specific solutions for the
view factor. For example, both the display and perception location are limited by the
physical constraints of the OR, including the need for sterile equipment in a particular area of the OR, the limited space available for equipment, and the assurance that the surgeon is in no way impeded from moving and behaving as he/she normally would. The
interaction tools are also important in the OR and may also require domain-specific
solutions that take into account the constraints of the surgeon or surgical team such
as the cognitive load of the end user (including the amount of relevant information
that can be presented at any point in time), the reaction of the user to system errors,
and the user's workflow.
In AR IGS systems, a number of different display devices have been proposed
including the surgical microscope, a monitor, a head-mounted display (HMD), a projector, and a half-silvered mirror. Depending on the display, the end user will get
either a 2-D (e.g., a computer monitor) or 3-D impression of the augmented reality
visualization. Three-dimensional technologies can either be autostereoscopic or binocular stereoscopic. Whereas binocular stereoscopic visualization requires the use
of special glasses or headwear, autostereoscopic displays, such as multiview lenticular displays, videography, and holography, do not.
The perception location describes where the end user looks to take advantage of
the augmented reality visualization; this may be on or in the patient, a part of the
environment such as the wall, a surgical tool, or another digital device such as a
monitor. The perception location falls into one of two categories, those that require
the surgeon to look away from the surgical field of view and those that project the
virtual objects into the surgical field of view.
Interaction tools fall into two subclasses: hardware interaction tools and virtual
interaction tools. Hardware interaction tools are the physical devices used by the
end-user to interact with the system. A partial list includes keyboards, mice, surgical
tools, tangible objects, and data gloves. The virtual interaction tools describe how the end user can interact with the data and visualization. Typical examples of interactions used in IGS systems include tuning volume rendering transfer functions, volume cutting, voxel peeling, using clipping planes, turning data visibility on and off, and, in general, adjusting data properties such as color, brightness, contrast, and transparency.
20.3.1 Neurosurgery
Neurosurgery was one of the first applications of computer-assisted surgery systems and is currently the most common application for augmented reality IGS systems. There are two main reasons for this. First, in neurosurgery surgeons must resect the smallest possible volume of tissue in a very narrow operative field while trying to minimize damage to the eloquent areas of the brain (Paul et al., 2005; Shuhaiber et al., 2004). Second, in neurosurgery the surgical anatomy is constrained within a fixed space (the skull), which makes registration feasible (Shuhaiber, 2004). Other soft tissues require more complex registration techniques.
FIGURE 20.4 Augmented reality systems for different surgical applications. (a) Neurosurgery: augmented reality view in image-guided neurovascular surgery. Volume-rendered, color-coded vessels are overlaid on top of the patient's dura to guide the surgeon and allow for localization of vessels of interest. (From Kersten-Oertel, M. et al., Augmented reality in neurovascular surgery: First experiences, Proceedings of AE-CAI, Boston, MA, LNCS, vol. 8678, pp. 80–89, 2014.) (b) Liver surgery: augmented reality laparoscopic liver surgery. (From Haouchine, N. et al., Image-guided simulation of heterogeneous tissue deformation for augmented reality during hepatic surgery, ISMAR: IEEE International Symposium on Mixed and Augmented Reality, Adelaide, SA, pp. 199–208, 2013.) A real-time biomechanical model is overlaid on a wireframe-rendered liver; the tumor is depicted in magenta, and different veins are shown in blue, green, and purple. (Image courtesy of N. Haouchine.)
(Continued)
FIGURE 20.4 (Continued) Augmented reality systems for different surgical applications. (c) Orthopedic surgery: augmented reality view for application in orthopedic surgery; the virtual spine object segmented from CT is merged with the real view. A virtual mirror is used to see the spine from different perspectives. (From Bichlmeier, C., Immersive, Interactive and Contextual In-Situ Visualization for Medical Applications, Technische Universität München, München, Germany, 2010; Image courtesy of C. Bichlmeier and P. Fallavolita.) (d) Dental surgery: augmented reality system for dental implants; cylinders overlaid on a skull phantom represent positioning information for implant placement. (From Katić, D. et al., Knowledge-based situation interpretation for context-aware augmented reality in dental implant surgery, in: Liao, H., Eddie Edwards, P.J., Pan, X., Fan, Y., and Yang, G.-Z. (eds.), Medical Imaging and Augmented Reality, Lecture Notes in Computer Science, vol. 6326, Springer, Berlin, Germany, 2010b, pp. 531–540; Image courtesy of D. Katić.) (Continued)
FIGURE 20.4 (Continued) Augmented reality systems for different surgical applications. (e) Keyhole surgery: augmented reality view of the thorax for minimally invasive port-based surgery. (From Bichlmeier, C., Immersive, Interactive and Contextual In-Situ Visualization for Medical Applications, Technische Universität München, München, Germany, 2010; Image courtesy of C. Bichlmeier and P. Fallavolita.) (f) Laparoscopic gallbladder surgery: augmented view of a laparoscopic image in gallbladder surgery. (From Katić, D. et al., Off. J. Comput. Med. Imaging Soc., 37, 174, 2013.) The virtual object represents an anatomical area to avoid. (Image courtesy of D. Katić.)
Many other specific applications in neurosurgery have been proposed for augmented reality visualization. Some of these are transsphenoidal surgery (where, for example, pituitary tumors are removed through the nose and the sphenoid bone); microscope-based augmented reality IGS systems for neurosurgery, otolaryngology, and ENT (ear, nose, and throat) surgery; and craniotomy planning.
20.3.1.1 Data
The most common type of analyzed data that has been used as virtual objects to
be projected into the optical path of the microscope, mixed with live video images,
or rendered on a HMD for image-guided neurosurgery (IGNS) is anatomical data.
Typically, lesions and tumors, eloquent brain areas to be avoided, vessels, or other
anatomical structures or functional data segmented and processed from MRI, CT,
and/or angiographies are displayed. This type of analyzed data has been typically
rendered as either surface or wireframe objects.
Only a few research groups have described the visualization of derived data. In Kawamata et al.'s (2002) AR neurosurgical system for transsphenoidal surgery, the distance of a tool to the tumor is shown numerically in a bar graph that represents the status of the approach to the tumor. In a proposed AR IGNS system for needle biopsies (Sauer et al., 2002; Wacker et al., 2006), the derived data comes in the form of extrapolated needle paths. Whereas the visualization of the needle (a blue cylinder) is prior-knowledge data, the extrapolation of the needle path (depicted in yellow) that depicts the needle's computed trajectory is considered derived data. Prior-knowledge data in IGNS systems has typically taken the form of surgical tools that are visualized on preoperative data. Another form of prior-knowledge data is preoperative plans that are visualized during the surgery. In the work by Wörn et al. (2005), trajectories for bone cutting, points for biopsy, and planned boreholes are all visualized to aid the surgeon in carrying out the preoperative plan.
20.3.1.2 Visualization Processing
Visualization processing in IGNS systems, and in IGS systems in general, has largely been limited to a few simple techniques. The most commonly described visualization techniques are color-coding objects of interest (based on the type of object or the object's state) and using transparency to blend objects together. A few other techniques that have been used in IGNS are depth cues such as hidden-line removal and stereo projection (Edwards et al., 1995), lighting and shading (King et al., 2000), and saliency techniques such as highlighting objects of interest (Pandya et al., 2005).
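The transparency blending mentioned above amounts to per-pixel alpha compositing of the rendered virtual object onto the live camera image. The function below is a minimal, generic sketch under our own names, not any cited system's renderer:

```python
import numpy as np

def blend_overlay(live_rgb, virtual_rgb, alpha_mask):
    """Per-pixel alpha blend of a rendered virtual object onto the live
    image: out = alpha * virtual + (1 - alpha) * live. alpha_mask is an
    HxW array in [0, 1]; the images are HxWx3 arrays."""
    live = np.asarray(live_rgb, float)
    virtual = np.asarray(virtual_rgb, float)
    a = np.asarray(alpha_mask, float)[..., None]   # broadcast over RGB
    return a * virtual + (1.0 - a) * live
```

Setting the mask to 0 outside the segmented object leaves the live image untouched there, while intermediate values let the anatomy beneath the overlay remain visible.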
20.3.1.3 View
The most common display device for neurosurgical AR applications is the surgical microscope. In many neurosurgical procedures a microscope is used to get a magnified view of the patient's brain. Choosing the microscope as the display device is a logical solution, as the device is already present in the operating theater, and therefore there is little or no additional cost in using it. Furthermore, using the microscope reduces the disruption to the surgeon's workflow.
A number of research groups have developed AR image-guided neurosurgery
systems that use a surgical microscope or a head-mounted variation of a microscope.
analyzing the current surgical step. Such systems can simplify the user interface
and thus lower the cognitive burden of the surgeon-user.
The types of prior-knowledge data that have been visualized in AR systems for
craniofacial, maxillofacial, and dental surgery are the tracked surgical tools used
(e.g., saws).
20.3.2.2 Visualization Processing
Perhaps due to the use of primarily vector objects, visualization processing of data for these types of surgery is not typically described. When surface models have been used, only transparency (Liévin and Keeve, 2001; Trevisan et al., 2006) and color coding (Trevisan et al., 2006) have been noted as visualization techniques.
20.3.2.3 View
The most common display device used in craniofacial, dental, and maxillofacial surgery is a projector. One example comes from the work by Marmulla et al. (2004, 2005), where osteotomy lines and tumor boundaries segmented from CT images were projected onto the patient to guide and help the surgeon navigate with respect to his/her surgical plan. Other display devices that have been used for these types of surgeries include head-mounted displays (HMDs), portable screens, and monitors. An example where a portable LCD screen (the X-Scope) was used comes from the work of Mischkowski et al. (2005, 2006). In this system, the surgeon holds the portable screen and walks around the patient, taking advantage of the combination of a volume-rendered patient model from CT or MRI with the live images of the patient. Salb et al. (2002) used an optical see-through HMD in their proposed system for craniofacial surgery, and Katić et al. (2014, 2010) used one in their dental AR system.
The majority of display devices commonly used for these types of surgery (except for the monitor) allow the perception location to be the patient. This suggests the importance of not disrupting the surgeon's workflow; using the patient as the perception location ensures that the surgeon need not look away from the surgical field of view to benefit from the AR visualization. Furthermore, when virtual objects are projected onto the patient, the surgeon does not need to mentally transform guidance images from the navigation system to the anatomy of the patient on the table. In their AR system for dental surgery, Katić et al. (2010) proposed an interesting solution that allows the perception location to be either the patient or an area within the field of view of the surgeon but not on the patient. Their system offers the user two view methods: (1) an AR view where the virtual objects are overlaid on the real anatomical structures, and (2) an analog view where information is visualized in a fixed position, for example, the left corner of the view, and does not occlude the patient anatomy. This proposed solution avoids distracting the surgeon at particular steps in the surgery and occluding the patient with virtual objects when that information is not needed.
Interactions, whether software or hardware, have not typically been described in the literature pertaining to dental, craniofacial, and maxillofacial surgery. One exception is the work of Trevisan et al. (2006), where manipulations of the virtual objects (rotations, scaling, and zooming) are explicitly described as being possible at all times.
20.3.3 Internal and Soft Tissue Surgery (Heart, Breast, and Liver)
Minimally invasive cardiovascular surgery has been shown to benefit from the application of augmented reality visualization techniques in the OR. Terry Peters' group at the Robarts Research Institute developed an AR visualization system for minimally invasive cardiac procedures, integrating real-time ultrasound and virtual models of the patient's beating heart with tracked surgical instruments (Bainbridge et al., 2008; Linte et al., 2008; Lo et al., 2005). Traub et al. (2004) also described a system for optimal port placement in minimally invasive cardiovascular surgery. Liver surgery, breast cancer surgery, and breast-conserving surgery have also been surgical applications examined for augmented reality IGS systems.
Although AR IGS systems have been proposed for internal and soft tissue type
surgeries, developing AR navigation systems for the liver and intestines (pliable
organs) or soft tissues such as the breast is difficult because they are nonrigid and
deform due to heartbeat, breathing, pressure from laparoscopic instruments, and
from being probed (Shuhaiber, 2004). Deformation is not as significant a problem
with surgery of semirigid organs such as bone and brain (Blackwell, 2000). This
may explain the smaller representation of soft tissue surgeries in the field of augmented reality navigation systems.
20.3.3.1 Data
Analyzed data in soft tissue surgery typically have an anatomical semantic. For liver surgery, segmented surface-rendered tumors, vessels, and whole livers have served as the anatomical virtual data that is merged with the patient image. Rather than using surface-rendered objects, Splechtna et al.'s (2002) system uses volume-rendered intraoperative ultrasound scans that are visualized along with surface-rendered vessel trees for liver surgery. For breast cancer and breast-conserving surgery, typically wireframe tumor and breast models are visualized. Lastly, in cardiac surgery the IGS system developed by Peters' group (Bainbridge et al., 2008; Linte et al., 2008; Lo et al., 2005) has depicted cardiac surface models as well as 3-D free-form deformation fields that describe the trajectories of the points on the surface model through a cardiac cycle.
A unique form of derived data comes from Sato et al.'s (1998) breast cancer surgical navigation system, where measurements of the cancer's intraductal spread are depicted (i.e., whether or not abnormal cells have spread outside the breast duct). Points are visualized as red to indicate the presence of intraductal spread or green if no intraductal spread is found. In Tomikawa et al.'s (2010) breast-conserving surgery system, the puncture line of the surgical needle is depicted. In the liver guidance system described by Splechtna et al. (2002), during the surgery a radiologist places annotation markers (in the form of colored spheres) on the vessel tree of the liver to note regions of interest for the surgeon. In cardiac surgery, Linte et al. (2008) graphically represent uncertainty derived from the target registration error using 95% confidence ellipsoids.
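A 95% confidence ellipsoid of the kind Linte et al. describe can be computed from the covariance of the 3-D registration error: the eigenvectors of the covariance give the axis directions, and the square roots of the eigenvalues scaled by the chi-square quantile give the semi-axis lengths. This is a generic sketch of that construction, not the authors' implementation:

```python
import numpy as np

CHI2_3DOF_95 = 7.815  # 95% quantile of the chi-square distribution, 3 dof

def confidence_ellipsoid(covariance, quantile=CHI2_3DOF_95):
    """Principal axes of the 95% confidence ellipsoid of a 3-D Gaussian
    error with the given covariance. Returns (semi_axes, directions):
    semi-axis lengths sqrt(quantile * eigenvalue) in ascending order and
    the matching unit eigenvectors as columns."""
    values, vectors = np.linalg.eigh(np.asarray(covariance, float))
    semi_axes = np.sqrt(quantile * values)
    return semi_axes, vectors
```

Rendering the ellipsoid around a target point then gives the surgeon a direct visual sense of how far the true target may lie from its displayed position.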
Similar to other IGS techniques, prior knowledge data in soft tissue and internal
surgery has been limited to the depiction of virtual tool models.
20.3.4.3 View
In endoscopic surgical systems, including laparoscopic and arthroscopic systems, the display is, with few exceptions, a monitor. The use of a monitor is a natural decision for an AR system for these types of surgery, as traditionally an external monitor is used to view the image coming from the video camera at the end of the scope. In a few systems other display devices, such as HMDs, have been proposed. For example, in Fuchs et al.'s (1998) laparoscopic system, the camera image is augmented with preoperative anatomy and displayed to the end user via a HMD. In the thoracoscopic AR system proposed by Sauer et al. (2006), a HMD is used to display surface models of the spine and objects of interest on the patient, with the thoracoscopic view inset in the end user's AR view. Another example comes from Sudra et al.'s system (2007), where a HMD is used to augment reality in endoscopic robotic surgery.
For endoscopic and laparoscopic navigation systems, the monitor was the most common perception location. As a monitor is traditionally employed in such surgeries, its use as the perception location allows the easy extension of traditional navigation tools and requires no training on the surgeon's part. The patient, however, has also served as the perception location in numerous endoscopic and laparoscopic AR systems where a HMD was used. This may suggest a shift from using a monitor as the perception location in IGS systems to using the patient as the perception location, even in surgeries where the surgeon traditionally does not look directly at the patient.
A few different interaction tools have been discussed in the literature pertaining to AR systems for endoscopic and laparoscopic surgery. In Teber et al.'s (2009) laparoscopic system for urological surgery, an independent operator uses implanted landmarks to register and combine preoperative CT images with the real-time video from the endoscope. An interesting solution to system interaction was proposed by Sudra et al. (2007). In their endoscopic robotic system, both speech- and gesture-based interactions are possible. In gesture-based interfaces, some set of motions or configurations of the hands or body are recognized by the system as commands. The surgeon can use speech or gestures to switch between visualization methods, for example, to change parameters or to turn annotation information on and off.
20.3.5 Orthopedic Surgery
AR IGS systems have also been proposed for use in orthopedic surgery. Particular
applications for which AR has been used in orthopedic surgery include spinal surgery, hip replacement, long bone fractures, and knee surgery.
20.3.5.1 Data
Analyzed data in AR image-guided orthopedic surgery are most commonly segmented bones from CT scans. In terms of derived data, the AR long bone fracture navigation system proposed by Zheng et al. (2008) shows outlines and centerlines of bone fragments on intraoperative fluoroscopic images. Prior knowledge data for image-guided orthopedic surgery have only been described in the form of surgical tool models.
on patient outcomes. Most systems have been built as proof of concept and have
not been sufficiently evaluated and validated. Without proper evaluation of integrated systems, as well as of each component described in the DVV
taxonomy, which would show the added benefit of using them, these systems will
not become regularly used.
In a review of close to a hundred papers on AR IGS systems, it was found that
only in 4% of the cited works was the system's effect on surgical outcomes evaluated (Kersten-Oertel et al., 2013b). Furthermore, the review showed that the majority of
systems developed to date have only been validated in terms of accuracy, whether
of the system as a whole or in part, for example, for patient-to-image registration,
calibration, or overlay accuracy of the real and virtual images. The validations that have
been done have typically used numerical methods or phantoms; very few
systems have been evaluated on patients in a real clinical setting, and even fewer
have been evaluated prospectively.
Based on the paradigm that there is a three-way relationship in the OR between
the surgeon, the patient, and the IGS system, Jannin and Korb (2008) proposed
that to assess an IGS system one should look at these three components and their
interrelationships. The assessment criteria therefore fall under measures related
to the patient, the surgeon, the IGS system, and the interactions between them:
the surgeon and the patient, the IGS system and the surgeon, and the IGS system and
the patient. Furthermore, six levels of assessment of IGS systems have been proposed: the technical parameters, reliability in a clinical setting, efficacy in terms
of surgical performance, effectiveness in terms of patient outcome, the economic
aspects of its use, and, lastly, the social, legal, and ethical aspects of its use (Jannin
and Korb, 2008).
The specification of these criteria and assessment levels outlines the complexity of validating and evaluating augmented reality IGS systems. Validating systems
based on patient-related criteria such as patient outcomes is particularly challenging
and time consuming, as it involves planning and carrying out clinical trials and determining metrics that can objectively compare techniques, in order to establish whether improvements are related to a particular technology or technique. These difficulties are
discernible in the scarcity of systems that have been used in real clinical settings and
evaluated on real patient data.
20.5 SUMMARY
The examination of the DVV components of augmented reality surgical systems
allows for the use of a common language to describe an IGS system based on what
type of data should be visualized, how it should be visualized, at what point in the
surgery it should be visualized, and how the user can interact with the data, both in
terms of on-screen manipulation and in terms of the hardware devices used for interaction.
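Purely as an illustration, the DVV aspects just listed can be collected into a small record. The field names follow the taxonomy's terms, but the data structure itself and the example values are assumptions of this sketch, not part of any cited system:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class DVVDescription:
    """Illustrative DVV-style description of one AR IGS system."""
    data: List[str]            # analyzed, derived, and prior-knowledge data
    visualization: List[str]   # visualization techniques applied to the data
    view: List[str]            # perception location and display device
    interaction: List[str]     # tools for manipulating the visualized data
    surgical_stage: str        # point in the surgery where it is shown

# Invented example: a hypothetical AR neurosurgery setup described in DVV terms.
ar_neuro = DVVDescription(
    data=["segmented vessels (analyzed)", "tool model (prior knowledge)"],
    visualization=["color-coded depth cues"],
    view=["patient as perception location", "HMD"],
    interaction=["tracked pointer"],
    surgical_stage="resection",
)
```

Describing two systems with the same record makes them directly comparable component by component, which is the practical benefit of the common language.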
20.5.1 Data
When looking at the type of data that is visualized, it is not surprising that across all
surgical domains the most commonly analyzed data is anatomical. Although this is
20.5.3 View
Combined with the trend of preferring particular devices in a specific surgical
domain, the variety of display devices used across the various surgical domains suggests that there is no one ideal IGS solution. In fact, a number of studies have shown
that choosing the appropriate intraoperative display technology is highly application,
task, and user specific (Cooperstock and Wang, 2009; Traub et al., 2008). Therefore,
an analysis of the surgical domain, the surgical tasks, the surgeon's workflow, and
the OR environment must be carefully done to define the requirements of the display
device for a particular surgical application.
Technological advances in display devices are making their way into the OR.
Mobile devices, such as the iPod touch and iPad (Apple Computer, Cupertino,
California), have been used to display preoperative images in the surgical field of view.
One example of this is Smith & Nephew's Dash Smart Instrument System,* which
is a portable navigation system powered by BrainLab (Feldkirchen, Germany). This
orthopedic navigation system guides the surgeon to accurately place knee and hip
implants. Such ubiquitous devices, which are small and portable, are increasingly
being used across many medical domains and may soon become standard display
devices in the OR.
Although the majority of AR IGS systems continue to use classical interaction
hardware paradigms, such as the mouse and keyboard, these are not appropriate
solutions for the surgical environment (Bichlmeier et al., 2009). However, there has
been little focus on modernizing and developing new solutions for interacting with
visually processed data. One exception, which looked at finding a novel solution to
replace the use of a touch screen or mouse with an IGS system, comes from Fischer
et al. (2005). In their work, surgeons use surgical tools that are already present and
tracked in the OR to perform gestures that are recognized by the tracking system.
The user may click on simple menu markers to load patient data, choose to draw
points or lines, and change the color of virtual objects. In a similar vein, Onceanu
and Stewart (2011) proposed a device into which a tracked surgical probe can be
placed, allowing joystick-like interaction with a navigation system. With the
advent of new devices such as the Kinect, gesture-based interactions are increasingly
being proposed as solutions for interacting with IGS systems.
Two future avenues for research within the view component are likely. First, the
development of appropriate hardware devices or gesture-based techniques that can
be used to interact with visually processed data; these will facilitate interactions
with the surgical navigation system. Second, as surgical modeling and workflow
monitoring techniques advance to allow for recognition of a particular stage of surgery, interfaces will be created that require little to no interaction, again facilitating
the surgeon's task. Recognition of situational awareness will allow for the automatic
optimization not only of interaction solutions with respect to the current surgical
activity but also of the displayed data and the visualization techniques used.
In an ideal IGS system, the view would change without interaction and without
disturbing the surgical workflow; the view solutions would be surgeon specific and
allow for natural interactions with the system. Furthermore, suitable representations
of the most appropriate data would be presented at any given stage of the surgery for
a given task.
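The interaction-free ideal just described can be sketched as a lookup from the recognized surgical stage to a display configuration, so the view changes as the workflow recognizer reports a new stage, with no explicit surgeon input. The stage names and settings below are invented for illustration and do not describe any cited system:

```python
# Hypothetical mapping from a recognized surgical stage to the data and
# visualization technique presented at that stage. All entries are invented.
STAGE_VIEWS = {
    "craniotomy": {"data": ["skull surface"],    "technique": "contour overlay"},
    "resection":  {"data": ["tumor", "vessels"], "technique": "ghosted overlay"},
    "closing":    {"data": [],                   "technique": "plain video"},
}

def view_for_stage(stage: str) -> dict:
    """Return the display configuration for the recognized stage.

    Falls back to plain video when the workflow recognizer reports an
    unknown stage, so the surgeon is never shown stale guidance data.
    """
    return STAGE_VIEWS.get(stage, {"data": [], "technique": "plain video"})

cfg = view_for_stage("resection")
```

In a real system, the table entries would themselves be surgeon-specific, reflecting the point above that view solutions should adapt to the individual user.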
20.5.4 Conclusions
The purpose of augmented reality in IGS is to provide surgeons with a better understanding of the connection between preoperative patient models and the operative
field. Numerous AR technologies, techniques, and solutions have been proposed for
many different types of surgery, yet these solutions have yet to make it into daily
use. The specific domain of the OR makes it challenging to evaluate these proof-of-concept systems. However, with new research that has begun to focus on surgical
workflow modeling methods and novel evaluation metrics specific to OR evaluations of IGS systems, these systems will begin to be properly evaluated, and the added
benefit of using AR IGS systems in the OR will be demonstrated. Furthermore, we
predict that the use of augmented reality and ubiquitous computing solutions
in daily life will continue to increase. These two factors should push the adoption
and diffusion of AR technologies in the surgical community.
* https://www.brainlab.com/surgery-products/overview-platform-products/dash-smart-instrument/.
http://www.xbox.com/en-ca/kinect.
REFERENCES
Alp, M.S., Dujovny, M., Misra, M., Charbel, F.T., Ausman, J.I. (1998) Head registration techniques for image-guided surgery. Neurology Research, 20:31–37.
Aschke, M., Wirtz, C.R., Raczkowsky, J., Wörn, H., Kunze, S. (2003) Augmented reality in operating microscopes for neurosurgical interventions. In: First International IEEE EMBS Conference on Neural Engineering, Capri Island, Italy, March 20–22, 2003, pp. 652–655.
Bainbridge, D., Jones, D.L., Guiraudon, G.M., Peters, T.M. (2008) Ultrasound image and augmented reality guidance for off-pump, closed, beating, intracardiac surgery. Artificial Organs, 32:840–845.
Bastien, S., Peuchot, B., Tanguy, A. (1997) Augmented reality in spine surgery: Critical appraisal and status of development. Studies in Health Technology and Informatics, 88:153–156.
Bichlmeier, C. (2010) Immersive, Interactive and Contextual In-Situ Visualization for Medical Applications. München, Germany: Technische Universität München.
Bichlmeier, C., Heining, S.M., Feuerstein, M., Navab, N. (2009) The virtual mirror: A new interaction paradigm for augmented reality environments. IEEE Transactions on Medical Imaging, 28:1498–1510.
Birkfellner, W., Figl, M., Huber, K., Watzinger, F., Wanschitz, F., Hummel, J., Hanel, R. et al. (2002) A head-mounted operating binocular for augmented reality visualization in medicine: Design and initial evaluation. IEEE Transactions on Medical Imaging, 21:991–997.
Birkfellner, W., Figl, M., Matula, C., Hummel, J., Hanel, R., Imhof, H., Wanschitz, F., Wagner, A., Watzinger, F., Bergmann, H. (2003) Computer-enhanced stereoscopic vision in a head-mounted operating binocular. Physics in Medicine and Biology, 48:N49–N57.
Blackwell, M., Nikou, C., DiGioia, A.M., Kanade, T. (March 2000) An image overlay system for medical data visualization. Medical Image Analysis, 4(1):67–72.
Cooperstock, J., Wang, G. (2009) Stereoscopic display technologies, interaction paradigms, and rendering approaches for neurosurgical visualization. In: SPIE Proceedings Stereoscopic Displays and Applications, San Jose, CA, vol. 7237, pp. 723703-1–723703-11.
Edwards, P.J., Hawkes, D.J., Hill, D.L., Jewell, D., Spink, R., Strong, A.J., Gleeson, M.J. (1995) Augmentation of reality using an operating microscope for otolaryngology and neurosurgical guidance. Journal of Image Guided Surgery, 1:172–178.
Edwards, P.J., King, A.P., Hawkes, D.J., Fleig, O., Maurer, C.R., Jr., Hill, D.L., Fenlon, M.R. et al. (1999) Stereo augmented reality in the surgical microscope. Studies in Health Technology and Informatics, 62:102–108.
Eggers, G., Mühling, J., Marmulla, R. (2006) Image-to-patient registration techniques in head surgery. International Journal of Oral and Maxillofacial Surgery, 35:1081–1095.
Figl, M., Birkfellner, W., Watzinger, F., Wanschitz, F., Hummel, J., Hanel, R., Ewers, R., Bergmann, H. (2002) PC-based control unit for a head mounted operating microscope for augmented reality visualization in surgical navigation. In: Medical Image Computing and Computer-Assisted Intervention (MICCAI), Tokyo, Japan, vol. 2489, pp. 44–51.
Fischer, J., Bartz, D., Straßer, W. (2005) Intuitive and lightweight user interaction for medical augmented reality. In: Proceedings of Vision, Modeling and Visualization (VMV), Erlangen, Germany, November 16–18, pp. 375–382.
Fuchs, H., Livingston, M.A., Raskar, R., Colucci, D., Keller, K., State, A., Crawford, J.R., Rademacher, P., Drake, S.H., Meyer, A.A. (1998) Augmented reality visualization for laparoscopic surgery. In: Medical Image Computing and Computer-Assisted Intervention (MICCAI), Cambridge, MA, vol. 1496, pp. 934–943.
Giraldez, J.G., Talib, H., Caversaccio, M., Ballester, M.A.G. (2006) Multimodal augmented reality system for surgical microscopy. In: Medical Imaging 2006: Visualization, Image-Guided Procedures, and Display, San Diego, CA, vol. 6141, p. S1411.
Gleason, P.L., Kikinis, R., Altobelli, D., Wells, W., Alexander, E., 3rd, Black, P.M., Jolesz, F. (1994) Video registration virtual reality for nonlinkage stereotactic surgery. Stereotactic and Functional Neurosurgery, 63:139–143.
Grimson, W.E.L., Kikinis, R., Jolesz, F.A., Black, P.M. (1999) Image-guided surgery. Scientific American, 280:62–69.
Hansen, C., Wieferich, J., Ritter, F., Rieder, C., Peitgen, H.O. (2010) Illustrative visualization of 3D planning models for augmented reality in liver surgery. International Journal of Computer Assisted Radiology and Surgery, 5:133–141.
Haouchine, N., Dequidt, J., Peterlik, I., Kerrien, E., Berger, M.-O., Cotin, S. (2013) Image-guided simulation of heterogeneous tissue deformation for augmented reality during hepatic surgery. In: ISMAR: IEEE International Symposium on Mixed and Augmented Reality, Adelaide, SA, pp. 199–208.
Hata, N., Dohi, T., Iseki, H., Takakura, K. (1997) Development of a frameless and armless stereotactic neuronavigation system with ultrasonographic registration. Neurosurgery, 41:608–613; discussion 613–614.
Jannin, P., Korb, W. (2008) Assessments of image-guided interventions. In: Peters, T., Cleary, K. (eds.) Image-Guided Interventions: Technology and Applications. New York: Springer, pp. 531–547.
Johnson, L., Edwards, P., Hawkes, D. (2002) Surface transparency makes stereo overlays unpredictable: The implications for augmented reality. Studies in Health Technology and Informatics, 94:131–136.
Katić, D., Spengler, P., Bodenstedt, S., Castrillon-Oberndorfer, G., Seeberger, R., Hoffmann, J., Dillmann, R., Speidel, S. (2014) A system for context-aware intraoperative augmented reality in dental implant surgery. International Journal of Computer Assisted Radiology and Surgery, 10:101–108.
Katić, D., Sudra, G., Speidel, S., Castrillon-Oberndorfer, G., Eggers, G., Dillmann, R. (2010) Knowledge-based situation interpretation for context-aware augmented reality in dental implant surgery. In: Liao, H., Eddie Edwards, P.J., Pan, X., Fan, Y., Yang, G.-Z. (eds.) Medical Imaging and Augmented Reality, Lecture Notes in Computer Science, vol. 6326. Berlin, Germany: Springer, pp. 531–540.
Katić, D., Wekerle, A.L., Gortler, J., Spengler, P., Bodenstedt, S., Rohl, S., Suwelack, S. et al. (2013) Context-aware augmented reality in laparoscopic surgery. Computerized Medical Imaging and Graphics: The Official Journal of the Computerized Medical Imaging Society, 37:174–182.
Kawamata, T., Iseki, H., Shibasaki, T., Hori, T. (2002) Endoscopic augmented reality navigation system for endonasal transsphenoidal surgery to treat pituitary tumors: Technical note. Neurosurgery, 50:1393–1397.
Kersten-Oertel, M., Chen, S.J., Collins, D.L. (2013a) An evaluation of depth enhancing perceptual cues for vascular volume visualization in neurosurgery. IEEE Transactions on Visualization and Computer Graphics, 20:391–403.
Kersten-Oertel, M., Gerard, I., Drouin, S., Mok, K., Sirhan, D., Sinclair, D., Collins, D.L. (2014) Augmented reality in neurovascular surgery: First experiences. In: Proceedings of AE-CAI, Boston, MA, LNCS, vol. 8678, pp. 80–89.
Kersten-Oertel, M., Jannin, P., Collins, D.L. (2012) DVV: A taxonomy for mixed reality visualization in image guided surgery. IEEE Transactions on Visualization and Computer Graphics, 18:332–352.
Kersten-Oertel, M., Jannin, P., Collins, D.L. (2013b) The state of the art of visualization in mixed reality image guided surgery. Computerized Medical Imaging and Graphics: The Official Journal of the Computerized Medical Imaging Society, 37:98–112.
King, A.P., Edwards, P.J., Maurer, C.R., de Cunha, D.A., Gaston, R.P., Clarkson, M., Hill, D.L.G. et al. (2000) Stereo augmented reality in the surgical microscope. Presence: Teleoperators and Virtual Environments, 9:360–368.
Kosaka, A., Saito, A., Furuhashi, Y., Shibasaki, T. (2000) Augmented reality system for surgical navigation using robust target vision. In: IEEE Conference on Computer Vision and Pattern Recognition, Hilton Head Island, South Carolina, vol. 2, pp. 187–194.
Liévin, M., Keeve, E. (2001) Stereoscopic augmented reality system for computer-assisted surgery. In: CARS, vol. 1230, pp. 107–111.
Linte, C.A., Moore, J., Wiles, A.D., Wedlake, C., Peters, T.M. (March 2008) Virtual reality-enhanced ultrasound guidance: A novel technique for intracardiac interventions. Computer Aided Surgery, 13(2):82–94.
Lo, J., Moore, J., Wedlake, C., Guiraudon, G.M., Eagleson, R., Peters, T. (2005) Surgeon-controlled visualization techniques for virtual reality-guided cardiac surgery. Studies in Health Technology and Informatics, 142:162–167.
Lorensen, W., Kikinis, R., Cline, H., Altobelli, D., Nafis, C., Gleason, L. (1993) Enhancing reality in the operating room. In: Proceedings of the IEEE Conference on Visualization '93, San Jose, CA, pp. 410–415.
Marmulla, R., Hoppe, H., Mühling, J., Eggers, G. (2005) An augmented reality system for image-guided surgery. International Journal of Oral and Maxillofacial Surgery, 34:594–596.
Marmulla, R., Hoppe, H., Mühling, J., Hassfeld, S. (2004) New augmented reality concepts for craniofacial surgical procedures. Plastic and Reconstructive Surgery, 115:1124–1128.
Milgram, P., Kishino, F. (1994) A taxonomy of mixed reality visual displays. IEICE Transactions on Information and Systems, E77d:1321–1329.
Mischkowski, R.A., Zinser, M., Kübler, A., Seifert, U., Zöller, J.E. (2005) Clinical and experimental evaluation of an augmented reality system in cranio-maxillofacial surgery. In: CARS 2005: Computer Assisted Radiology and Surgery, vol. 1281, pp. 565–570.
Mischkowski, R.A., Zinser, M.J., Kübler, A.C., Krug, B., Seifert, U., Zöller, J.E. (2006) Application of an augmented reality tool for maxillary positioning in orthognathic surgery: A feasibility study. Journal of Craniomaxillofacial Surgery, 34:478–483.
Nijmeh, A.D., Goodger, N.M., Hawkes, D., Edwards, P.J., McGurk, M. (2005) Image-guided navigation in oral and maxillofacial surgery. British Journal of Oral and Maxillofacial Surgery, 43:294–302.
Onceanu, D., Stewart, A.J. (2011) Direct surgeon control of the computer in the operating room. In: MICCAI: International Conference on Medical Image Computing and Computer-Assisted Intervention, Toronto, Canada, vol. 14, pp. 121–128.
Paloc, C., Carrasco, E., Macia, I., Gomez, R., Barandiaran, I., Jimenez, J.M., Rueda, O., di Urbina, J.O., Valdivieso, A., Sakas, G. (2004) Computer-aided surgery based on autostereoscopic augmented reality. In: Proceedings of the Eighth International Conference on Information Visualisation, Austin, TX, pp. 189–193.
Pandya, A., Siadat, M.R., Auner, G. (2005) Design, implementation and accuracy of a prototype for medical augmented reality. Computer Aided Surgery, 10:23–35.
Paul, P., Fleig, O., Jannin, P. (November 2005) Augmented virtuality based on stereoscopic reconstruction in multimodal image-guided neurosurgery: Methods and performance evaluation. IEEE Transactions on Medical Imaging, 24(11):1500–1511.
Rodriguez Palma, S., Becker, B.C., Lobes, L.A., Riviere, C.N. (2012) Comparative evaluation of monocular augmented-reality display for surgical microscopes. In: Engineering in Medicine and Biology Society (EMBC), 2012 Annual International Conference of the IEEE, San Diego, CA: IEEE, pp. 1409–1412.
Salb, T., Brief, J., Burgert, O., Gockel, T., Hassfeld, S., Mühling, J., Dillmann, R. (2002) Intraoperative presentation of surgical planning and simulation results: Augmented reality for craniofacial surgery. In: SPIE Electronic Imaging, International Conference on Stereoscopic Displays and Virtual Reality Systems, vol. 5006, Santa Clara, CA, May 2003.
Samset, E., Schmalstieg, D., Sloten, J.V., Freudenthal, A., Declerck, J., Casciaro, S., Rideng, O., Gersak, B. (2008) Augmented reality in surgical procedures. In: Human Vision and Electronic Imaging XIII, San Jose, CA, vol. 6806, p. 68060K.
Sato, Y., Nakamoto, M., Tamaki, Y., Sasama, T., Sakita, I., Nakajima, Y., Monden, M., Tamura, S. (1998) Image guidance of breast cancer surgery using 3-D ultrasound images and augmented reality visualization. IEEE Transactions on Medical Imaging, 17:681–693.
Sauer, F., Khamene, A., Bascle, B., Vogt, S., Rubino, G. (2002) Augmented-reality visualization in iMRI operating room: System description and preclinical testing. In: Proceedings of the SPIE, Medical Imaging: Visualization and Image-Guided Procedures, San Diego, CA, vol. 4681, pp. 446–454.
Sauer, F., Vogt, S., Khamene, A., Heining, S., Euler, E., Schneberger, M., Zuerl, K., Mutschler, W. (March 2006) Augmented reality visualization for thoracoscopic spine surgery. In: Proceedings on SPIE 6141, Medical Imaging 2006: Visualization, Image-Guided Procedures, and Display, San Diego, CA, February 11, 2006, vol. 6141, pp. 430–437.
Scheuering, M., Schenk, A., Schneider, A., Preim, B., Greiner, G. (2003) Intraoperative augmented reality for minimally invasive liver interventions. In: Medical Imaging 2003: Visualization, Image-Guided Procedures, and Display, vol. 5029, pp. 407–417.
Shuhaiber, J.H. (2004) Augmented reality in surgery. Archives of Surgery, 139:170–174.
Soler, L., Ayache, N., Nicolau, S., Pennec, X., Forest, C., Delingette, H., Mutter, D., Marescaux, J. (2004) Virtual reality, augmented reality and robotics in surgical procedures of the liver. In: Buzug, T.M., Lueth, T.C. (eds.) Perspectives in Image Guided Surgery. Singapore: World Scientific Publishing Company, pp. 476–484.
Splechtna, R.C., Fuhrmann, A.L., Wegenkittl, R. (2002) ARAS: Augmented reality aided surgery system description. VRVis Research Center Technical Report.
Sudra, G., Speidel, S., Fritz, D., Muller-Stich, B.P., Gutt, C., Dillmann, R. (2007) MEDIASSIST: MEDIcal ASSistance for intraoperative skill transfer in minimally invasive surgery using augmented reality. In: Medical Imaging 2007: Visualization and Image-Guided Procedures, San Diego, CA, vol. 6509, p. 65091O.
Suthau, T., Vetter, M., Hassenpflug, P., Meinzer, H.-P., Hellwich, O. (2002) A concept work for augmented reality visualization based on a medical application in liver surgery. Technical University Berlin, Berlin, Germany, Commission V, WG V/3.
Suzuki, N., Hattori, A., Hashizume, M. (2008) Benefits of augmented reality function for laparoscopic and endoscopic surgical robot systems. In: MICCAI Workshop: AMI-ARCS, New York, pp. 53–60.
Teber, D., Guven, S., Simpfendorfer, T., Baumhauer, M., Guven, E.O., Yencilek, F., Gozen, A.S., Rassweiler, J. (2009) Augmented reality: A new tool to improve surgical accuracy during laparoscopic partial nephrectomy? Preliminary in vitro and in vivo results. European Urology, 56:332–338.
Tomikawa, M., Hong, J., Shiotani, S., Tokunaga, E., Konishi, K., Ieiri, S., Tanoue, K., Akahoshi, T., Maehara, Y., Hashizume, M. (2010) Real-time 3-dimensional virtual reality navigation system with open MRI for breast-conserving surgery. Journal of the American College of Surgeons, 210:927–933.
Tonet, O., Megali, G., D'Attanasio, S., Dario, P., Carrozza, M.C., Marcacci, M., Martelli, S., La Palombara, P.F. (2000) An augmented reality navigation system for computer assisted arthroscopic surgery of the knee. Medical Image Computing and Computer-Assisted Intervention (MICCAI), 1935:1158–1162.
Traub, J., Feuerstein, M., Bauer, M., Schirmbeck, E.U., Najafi, H., Bauernschmitt, R., Klinker, G. (2004) Augmented reality for port placement and navigation in robotically assisted minimally invasive cardiovascular surgery. In: CARS 2004: Computer Assisted Radiology and Surgery, Proceedings, vol. 1268, pp. 735–740.
Traub, J., Sielhorst, T., Heining, S.M., Navab, N. (2008) Advanced display and visualization concepts for image guided surgery. Journal of Display Technology, 4:483–490.
Trevisan, D.G., Nedel, L.P., Macq, B., Vanderdonckt, J. (2006) Detecting interaction variables in a mixed reality system for maxillofacial-guided surgery. In: SVR 2006: SBC Symposium on Virtual Reality, Belém, Pará, Brazil, vol. 1, pp. 39–50.
Wacker, F.K., Vogt, S., Khamene, A., Jesberger, J.A., Nour, S.G., Elgort, D.R., Sauer, F., Duerk, J.L., Lewin, J.S. (2006) An augmented reality system for MR image-guided needle biopsy: Initial results in a swine model. Radiology, 238:497–504.
Wolfsberger, S., Rössler, K., Regatschnig, R., Ungersböck, K. (2002) Anatomical landmarks for image registration in frameless stereotactic neuronavigation. Neurosurgical Review, 25:68–72.
Wörn, H., Aschke, M., Kahrs, L.A. (2005) New augmented reality and robotic based methods for head surgery. International Journal of Medical Robotics, 1:49–56.
Wu, J.R., Wang, M.L., Liu, K.C., Hu, M.H., Lee, P.Y. (2014) Real-time advanced spinal surgery via visible patient model and augmented reality system. Computer Methods and Programs in Biomedicine, 113:869–881.
Zheng, G., Dong, X., Gruetzner, P.A. (2008) Reality-augmented virtual fluoroscopy for computer-assisted diaphyseal long bone fracture osteosynthesis: A novel technique and feasibility study results. Proceedings of the Institution of Mechanical Engineers, Part H, Journal of Engineering in Medicine, 222:101–115.
Section IV
Wearable Computers and
Wearable Technology
21
CONTENTS
21.1 Related Work................................................................................................. 554
21.1.1 Wearable Haptic Devices................................................................... 554
21.1.2 Soft Skin Simulation.......................................................................... 555
21.1.2.1 Rigid Articulated Hands..................................................... 555
21.1.2.2 Deformable Hands.............................................................. 556
21.2 Full-Hand Deformation Using Linear Elasticity........................................... 557
21.2.1 Skeleton.............................................................................................. 557
21.2.2 Flesh................................................................................................... 558
21.2.2.1 Elasticity Model.................................................................. 559
21.2.2.2 Tetrahedral Discretization.................................................. 559
21.2.2.3 Elastic Force Computation.................................................. 560
21.2.3 Coupling of Skeleton and Flesh......................................................... 560
21.2.4 Haptic Rendering............................................................................... 562
21.2.5 Results................................................................................................ 563
21.3 Strain-Limiting for Nonlinear Soft Skin Deformation..................................564
21.3.1 Formulation of Strain-Limiting Constraints......................................564
21.3.2 Constraint Jacobians.......................................................................... 566
21.3.3 Constrained Dynamics...................................................................... 566
21.3.4 Contact and Friction.......................................................................... 567
21.3.5 Error Metrics..................................................................................... 567
21.3.6 Haptic Coupling................................................................................. 568
21.3.7 Results................................................................................................ 568
21.4 Anisotropic Soft Skin Deformation............................................................... 570
21.4.1 Definition of Strain Limits................................................................ 571
21.4.2 Hyperbolic Projection Function......................................................... 572
21.4.3 Constraint Formulation...................................................................... 573
21.4.4 Constraint Jacobians.......................................................................... 573
21.4.5 Comparison with Other Approaches................................................. 574
21.4.6 Results................................................................................................ 576
21.5 Conclusion..................................................................................................... 576
Acknowledgments................................................................................................... 577
References............................................................................................................... 577
Just as the synthesizing and rendering of visual images defines the area of computer
graphics, the science of developing devices and algorithms that synthesize computer-generated force-feedback and tactile cues is the concern of computer haptics (Lin
and Otaduy 2008). Haptics broadly refers to touch interactions (physical contact)
that occur for the purpose of perception of virtual environments and, more broadly
speaking, for the transmission of tactile cues.
Wearable haptics focuses on haptic devices, with their corresponding control
algorithms, that are worn by the user. Wearable haptic systems generate tactile feedback and apply it directly on the human body to interact, communicate, and cooperate with real and virtual environments. Although the sensory system responsible for
haptic perception, the somatosensory system, is distributed across the entire body, a
large portion of existing work in wearable haptics has focused on hand-based wearable haptics (notably fingertip tactile stimuli and finger kinesthetic feedback), mainly
due to the fundamental importance of manual interaction for most tasks.
Wearable haptics offers new ways of interacting with real and virtual environments. Leveraging touch as a primordial sense through which subjects can communicate easily and instantly, wearable haptic stimulations can be used to come
into contact with cognitively impaired patients, or to augment social media interactions
with additional sensory channels. Wearable haptics can improve the cooperation of
humans and robots in a team, in scenarios such as search and rescue. Videogames,
a multibillion-dollar industry, as well as serious games for professional training, could also
benefit from wearable haptic technology for an increased degree of immersion without requiring expensive and cumbersome ground-based devices, as already shown
by much simpler consumer devices such as Nintendo's Wiimote.
The design of wearable devices is constrained by the wearability factor, thus limiting the weight, volume, shape, and form factor of the device (Gemperle et al. 1998).
Grounded kinesthetic devices require an external robot to simulate contact interaction. Similarly, exoskeletons are body-grounded (Bergamasco et al. 1994; Biggs and
Srinivasan 2002), producing forces and counterbalancing forces that are both felt by the user.
Wearable haptics, on the other hand, should be intrinsically distributed and multi-degree-of-freedom (DoF), but necessarily largely underactuated and undersensed to
improve the wearability factor (Prattichizzo et al. 2013).
In order to simulate realistic haptic sensations with wearable devices, cutaneous
feedback is often used to replace kinesthetic feedback while trying to preserve the
sensations. This is an inherently difficult task. Yet, wearable haptic feedback systems producing realistic results have been designed, mainly applied to fingerpads
(Minamizawa et al. 2007, 2008; Nakamura and Fukui 2007; Aoki et al. 2009). These
devices have confirmed that some kinesthetic information, such as grasping force
and object weight, can be reasonably reproduced through tactile sensations alone.
However, these ungrounded haptic displays still have limitations, mainly in terms of
the maximum weight and force they can render, due to the absence of real kinesthetic
feedback.
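As a toy illustration of this sensory substitution (not the design of any cited device), the kinesthetic cue of an object's weight might be approximated by commanding a proportional normal force at each fingertip of a cutaneous device, saturated at the actuator's limit. The gain and force limit below are invented values:

```python
# Invented actuator force limit in newtons; real cutaneous devices have
# their own (typically small) maximum renderable force, which is exactly
# the limitation discussed in the text.
MAX_DEVICE_FORCE_N = 4.0

def fingertip_force(object_weight_n: float, n_fingers: int = 2,
                    gain: float = 1.0) -> float:
    """Per-finger normal force commanded for a grasped object's weight.

    The weight is split across the grasping fingers, scaled by a tuning
    gain, and clamped at the device's maximum output.
    """
    share = gain * object_weight_n / n_fingers
    return min(share, MAX_DEVICE_FORCE_N)

light = fingertip_force(2.0)    # within the renderable range
heavy = fingertip_force(20.0)   # saturates at the actuator limit
```

The saturation branch makes the limitation concrete: heavy objects all feel the same once the actuator limit is reached, since there is no kinesthetic channel to carry the difference.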
Wearable haptic devices produce haptic feedback through the activation of force
and deformation cues on the user's skin. For a realistic perception of these cues, the
appropriate sensing channels must be excited with the correct intensity and adequate
timing, while everything is synchronized with the other modalities. Achieving such
a complex goal requires a cognitive layer that predicts deformation and
force cues based on a biomechanical model of the user's body. Therefore, wearable
haptic systems require the development of reliable and efficient numerical models
of human touch biomechanics. The model should include the interaction with the
environment while accounting for skin deformation and for constraints coming from
articulation joints and contacts.
Therefore, to render high-fidelity haptic feedback for direct interaction with the
skin, which is often the case of wearable haptic devices, the command and actuation of haptic devices must rely on a thorough understanding of the forces and
deformations present at contact locations. This implies the interactive computation
of accurate forces and deformations of the skin. In the past, research on haptic
rendering has produced excellent methods to support kinesthetic rendering of tool-based interaction, but in recent years we have also witnessed the invention of multiple cutaneous haptic devices, using a variety of technologies and skin stimulation
principles (Chubb et al. 2010; Scilingo et al. 2010; Wang and Hayward 2010; Gleeson
et al. 2011; Solazzi et al. 2011; Chinello et al. 2012). This progress in hardware design
calls for novel methods to compute accurate forces and deformations on the skin for
wearable haptic rendering.
This chapter summarizes the progress on a computationally efficient approach
to simulate accurate soft skin contact. These results are intended to be part of a
model-based control strategy for wearable haptic rendering of direct hand interaction, in which the forces and/or deformations needed to command the wearable
haptic device are computed by resolving the interaction between a skin model and
simulated objects or materials. The computation of high-fidelity forces and deformations during skin contact, which are then used to command wearable haptic devices,
is challenged by two major difficulties: frictional contact and the extremely nonlinear
elasticity of the skin.
We first survey existing wearable haptic devices and early approaches related to
hand and soft skin simulation, ranging from rigid articulated hand skeletons to more
sophisticated flesh and bone models allowing compliant frictional contact. However,
while off-line approaches are too computationally expensive, real-time techniques
do not account for the highly nonlinear and anisotropic elastic behavior of soft skin,
or the efficient two-way computation of haptic feedback. We therefore describe in
more detail a set of techniques that address these issues in the context of haptic
rendering. We first describe a technique that allows two-way coupling between flesh
and bones, fundamental for haptic rendering of hand interaction through contact
(Garre et al. 2011). We then show how to take into account the extreme nonlinearity
of skin elasticity in an efficient way through strain-limiting constraints (Perez
et al. 2013), and how to incorporate anisotropic constraints to correctly model the
anisotropic behavior of finger flesh (Hernandez et al. 2013). These models capture
the deformations and forces of the skin upon contact. Therefore, they constitute a
central component of a model-based strategy for the control of wearable devices.
tangential displacements of the skin and therefore display navigation cues to the
user. Tangential skin displacements are also addressed in Gleeson et al. (2010) with
a 2-DoF fingertip device, creating planar motion through a compliant flexure stage.
Leveraging the bielasticity of a fabric, Bianchi et al. (2010) improved the rendering
of softness with a device that conveys both cutaneous and kinesthetic information.
Prattichizzo et al. (2013) designed a 3-DoF wearable device for cutaneous force feedback through a mobile platform that applies forces on the finger pad. The platform
is attached to a static one located on the back of the finger and is driven by three
motors. The authors assume a linear relationship between platform displacement and
the resultant wrench, therefore using a very simplified model of soft skin mechanics.
In general, none of these devices rely on soft skin deformation simulations to
compute haptic feedback. Yet, the behavior of flesh is extremely complex, mainly
due to its high deformation nonlinearity. Accounting for the object that is being
stimulated, which is often the fingers or the fingertip, through some sort of biomechanical model of articulated hand and/or soft skin mechanics can greatly increase
the precision of force feedback computations, and allow the use of a model-based
control strategy for wearable haptic rendering of direct hand interaction.
21.2.1 Skeleton
Let us start by considering a simple skeleton consisting of two bones a and b connected
through a joint. The dynamics of the two bones can be described by the Newton–Euler
equations:
M_a \dot{v}_a = F_a(q_a, v_a, q_b, v_b) \quad \text{and} \quad M_b \dot{v}_b = F_b(q_a, v_a, q_b, v_b), \qquad (21.1)

C(q_a, q_b) = 0. \qquad (21.2)
where
q includes both the position and orientation of a rigid body
v includes its linear and angular velocity
M is a 6 × 6 mass matrix
F is the vector of forces and torques
These may include gravitational and inertial forces, joint elasticity and damping, and
constraint forces.
For a skeleton with an arbitrary number of bones and joints, we group all bone
velocities in a vector v_s, and then the constrained Newton–Euler equations can be
expressed as
M_s \dot{v}_s = F_s(q_s, v_s), \qquad C_s(q_s) = 0. \qquad (21.3)
FIGURE 21.1 (a) Articulated hand skeleton with 16 bones. Phalanxes are linked
through hinge joints, and the phalanxes and the palm are linked through universal joints.
(b) Surface model for rendering purposes. (c) Embedding tetrahedral mesh to model flesh
elasticity.
We have modeled the hand's skeleton using 16 rigid bodies (3 for each finger and
the thumb, and 1 for the palm) and 15 joints (10 hinge joints between phalanxes,
and 5 universal joints at the junctions of phalanxes and the palm), as shown in
Figure 21.1. We have modeled joint limits by activating stiff angular springs after
admissible joint rotations are reached.
To ensure stable simulation under the combination of high stiffness, low mass,
and large time steps, we discretize the constrained Newton–Euler equations with
backward Euler implicit integration (Baraff and Witkin 1998). We approximate
the resulting nonlinear system with a linearization of forces at the beginning
ofeach time step, and then Equation 21.3 is transformed into the linear system of
equations:
A_s v_s + J_s^T \lambda_s = b_s, \qquad J_s v_s = \bar{b}_s. \qquad (21.4)
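To make the structure of Equation 21.4 concrete, the following NumPy sketch (our own illustration, not the authors' implementation) assembles and solves the saddle-point system for a toy one-constraint example; a real implementation would exploit the sparsity of A_s and J_s instead of building a dense matrix.

```python
import numpy as np

def solve_constrained_step(A_s, J_s, b_s, b_bar):
    """Solve the saddle-point system of Equation 21.4:
        A_s v + J_s^T lam = b_s
        J_s v             = b_bar
    Dense assembly for clarity only."""
    n, m = A_s.shape[0], J_s.shape[0]
    K = np.zeros((n + m, n + m))
    K[:n, :n] = A_s
    K[:n, n:] = J_s.T
    K[n:, :n] = J_s
    sol = np.linalg.solve(K, np.concatenate([b_s, b_bar]))
    return sol[:n], sol[n:]  # velocities, constraint multipliers

# Toy example: a single 1-DoF "bone" pinned by the constraint J v = 0,
# so the applied impulse is absorbed entirely by the multiplier.
v, lam = solve_constrained_step(np.array([[2.0]]), np.array([[1.0]]),
                                np.array([4.0]), np.array([0.0]))
```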
21.2.2 Flesh
Flesh is simulated using an elasticity model derived from continuum mechanics.
We first introduce the formulation of continuum elasticity, followed by a description
of the finite element discretization using tetrahedral meshes. Finally, we describe the
computation of elastic forces.
\varepsilon = \frac{1}{2}\left(\nabla u + \nabla u^T\right) = \frac{1}{2}\left(G + G^T\right) - I, \qquad (21.6)
X = \left( x_1 - x_4 \quad x_2 - x_4 \quad x_3 - x_4 \right). \qquad (21.8)
For convenience, we express the inverse of the rest-state volume matrix based on its
rows:

X_0^{-1} = \begin{pmatrix} r_1 \\ r_2 \\ r_3 \end{pmatrix}. \qquad (21.9)

It is also convenient to define a fictitious row r_4 = -(r_1 + r_2 + r_3). Using the volume
matrix, the deformation gradient G = \partial x / \partial x_0 of a tetrahedron can be computed as

G = X X_0^{-1}. \qquad (21.10)
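As a concrete illustration of Equations 21.8 through 21.10, a minimal NumPy sketch (the function name is ours) that builds the volume matrices and the deformation gradient of a single tetrahedron:

```python
import numpy as np

def deformation_gradient(x, x0):
    """Deformation gradient G = X X0^{-1} of a tetrahedron (Eq. 21.10).
    x, x0: 4x3 arrays of deformed and rest-state node positions."""
    X  = np.column_stack([x[i]  - x[3]  for i in range(3)])   # Eq. 21.8
    X0 = np.column_stack([x0[i] - x0[3] for i in range(3)])
    return X @ np.linalg.inv(X0)

# A rigid translation of the nodes yields G = I: no measured deformation.
rest = np.array([[0.0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]])
G = deformation_gradient(rest + np.array([1.0, 2.0, 3.0]), rest)
```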
K_e = \int_{x_0} B^T E B \, dV, \qquad (21.11)

A_f v_f = b_f. \qquad (21.12)
The velocity vector v_f concatenates the velocities of all mesh nodes. The system matrix of the flesh is A_f = M_f + hD_f + h^2 K_f, with D_f and K_f the damping and
stiffness matrices, respectively. It contains nonzero blocks in the diagonal and
in off-diagonal terms associated to pairs of nodes belonging to the same tetrahedron. Full details of the linear corotational formulation are given in Müller and
Gross (2004).
f_i = \sum_j w_{ij} \left( x_j + R_j \bar{x}_{ij} \right), \qquad (21.13)

where
(x_j, R_j) are the position and orientation of the jth bone
\bar{x}_{ij} is the (constant) target position of the flesh node in the local reference system
of the bone
{w_{ij}} are the skinning weights.
A realistic hand should enjoy bidirectional coupling between skeleton and flesh. The
skeletal motion must be transmitted to the flesh, and contact on the flesh must be
transmitted as well to the skeleton. Moreover, the stiffness of the flesh should impose
a resistance on the skeleton under joint rotations. We achieve bidirectional coupling
by inserting zero-length springs between flesh nodes and their target skinning positions. Then, the coupling of the ith flesh node produces a force Fi on the node and a
wrench (i.e., force and torque) Fj on the jth bone given by
F_i = -k \left( x_i - f_i \right), \qquad (21.14)

F_j = -k \left( \frac{\partial f_i}{\partial q_j} \right)^T \left( f_i - x_i \right). \qquad (21.15)

The Jacobian matrix \partial f_i / \partial q_j = \left( w_{ij} I_{3\times3}, \; -w_{ij} (R_j \bar{x}_{ij})^* \right) relates bone velocities to target
skinning velocities, with * representing the skew-symmetric matrix for a cross
product. Due to the addition of coupling forces between skeleton and flesh, the
numerical integration of skeleton and flesh velocities must be computed through a
single system of equations that couples Equations 21.4 and 21.12:
\begin{pmatrix} A_s & A_{sf} & J_s^T \\ A_{fs} & A_f & 0 \\ J_s & 0 & 0 \end{pmatrix} \begin{pmatrix} v_s \\ v_f \\ \lambda_s \end{pmatrix} = \begin{pmatrix} b_s \\ b_f \\ \bar{b}_s \end{pmatrix}. \qquad (21.16)
The coupling forces contribute to the terms Asf and Afs, through the derivatives of
flesh node forces with respect to bone configurations, and vice versa. Moreover,
the coupling forces also contribute terms to off-diagonal blocks of the As matrix
for pairs of bones that affect the target skinning position of the same flesh node.
In other words, for two bones a and b affecting the same flesh node x_i,
\partial F_a / \partial q_b = -k \left( \partial f_i / \partial q_a \right)^T \left( \partial f_i / \partial q_b \right) \neq 0.
In practice, we limit the complexity added by the coupling by enforcing that each
flesh node may only be affected by at most two bones, and those bones must be adjacent. Then, the force Jacobian terms added by the flesh–skeleton coupling to the A_s
matrix coincide with those added by joint limits.
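The coupling described by Equations 21.13 through 21.15 can be sketched for a single influencing bone as follows (a NumPy illustration with names of our choosing, not the authors' code):

```python
import numpy as np

def skew(a):
    """Skew-symmetric matrix a* such that a* b = a x b."""
    return np.array([[0.0, -a[2], a[1]],
                     [a[2], 0.0, -a[0]],
                     [-a[1], a[0], 0.0]])

def coupling_force_and_wrench(k, x_i, x_j, R_j, xbar_ij, w_ij):
    """Zero-length coupling spring for one bone j and one flesh node i
    (Eqs. 21.13 through 21.15)."""
    f_i = w_ij * (x_j + R_j @ xbar_ij)      # skinning target (Eq. 21.13)
    F_i = -k * (x_i - f_i)                  # force on the flesh node (Eq. 21.14)
    # Jacobian of f_i w.r.t. bone (linear, angular) velocity components.
    J = np.hstack([w_ij * np.eye(3), -w_ij * skew(R_j @ xbar_ij)])
    F_j = -k * J.T @ (f_i - x_i)            # wrench on the bone (Eq. 21.15)
    return f_i, F_i, F_j

# Flesh node displaced one unit from its target: the spring pulls it
# back, and the bone receives the opposite linear force.
f_i, F_i, F_j = coupling_force_and_wrench(
    10.0, np.array([1.0, 0.0, 0.0]), np.zeros(3),
    np.eye(3), np.zeros(3), 1.0)
```

Note that the linear part of the bone wrench exactly balances the force on the flesh node, as expected for an internal spring force.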
FIGURE 21.2 Haptic interaction with a virtual environment through the wearable haptic
glove Cybergrasp using our full-hand linear deformation model.
21.2.5 Results
We have executed several experiments, all on a quad-core 2.4 GHz PC with 3 GB of
memory and a GeForce 8800 GTS. Tracking of the hand configuration is performed
using a Cyberglove, as shown in Figure 21.2, and haptic feedback is achieved using a
Cybergrasp haptic device. During free-space motions, the hand follows the configuration read from the glove without apparent lag. In contact situations, the skeleton
and flesh deviate from the glove configuration as expected, due to the virtual coupling. More importantly, contact situations produce interesting bulging deformations
on the hand's flesh. These deformations are possible thanks to the coupling of a soft
flesh to the hand's skeleton.
We have also measured the computational performance of several components
of the simulation algorithm. Timings were recorded using the 16-bone skeleton, a
597-tetrahedra mesh for flesh deformation, a 1,441-triangle mesh for collision detection, and a textured 23,640-triangle mesh for rendering. The computations in the
haptic loop are dominated by the solution of rigid body dynamics for the 16 proxy
bones. Nevertheless, these computations run at just 95 μs per step on average. In the
visual loop, the computations are dominated by the simulation of the hand and the
computation of collision response. The solution of articulated body dynamics runs
at 7 ms per time step on average, while the solution of the unconstrained flesh and
bone motion runs at 38 ms per time step on average. For the scene shown above, a
full simulation step including collision response runs at 53 ms per time step on average. In our examples, contact was rarely the dominating cost, because we used rather
simple meshes for collision detection. We expect that our model would produce more
realistic deformations with finer simulation and collision meshes, but it would not
run interactively on current commodity PCs.
Although this approach allows the simulation of a full deformable hand in real
time with force feedback, it models the flesh as a linear elastic material, which is
not capable of capturing the range of behaviors of the skin. We address this issue
in the following section by introducing strain-limiting constraints and a constraint
optimization approach.
FIGURE 21.3 As shown on the left, the finger suffers severe artifacts when we use a compliant linear material. On top, bottom view of flipped tetrahedra due to friction forces, and on
the bottom, side view of collapsed finger under pressing forces. As shown on the right, these
situations are robustly handled with our strain-limiting model.
effectively by limiting the deformation gradient of each tetrahedron in the finite element mesh. To this end, we compute a singular value decomposition (SVD) of the
deformation gradient of each tetrahedron:
G = U S V^T, \qquad S = \begin{pmatrix} s_1 & 0 & 0 \\ 0 & s_2 & 0 \\ 0 & 0 & s_3 \end{pmatrix} = U^T G V, \qquad (21.17)
where the singular values (s_1 \ge s_2 \ge s_3) capture deformations along principal axes. For
convenience, we express the rotations U and V based on their columns:
U = \left( u_1 \quad u_2 \quad u_3 \right), \qquad V = \left( v_1 \quad v_2 \quad v_3 \right). \qquad (21.18)
Unit singular values in all directions (i.e., s_i = 1) imply no deformation, while a unit
determinant (i.e., \det(G) = s_1 s_2 s_3 = 1) implies volume preservation. We enforce
strain limiting by applying a lower limit s_{min} (i.e., compression constraint) and an
upper limit s_{max} (i.e., stretch constraint) on each singular value of the deformation
gradient:

s_{min} \le s_i \le s_{max}. \qquad (21.19)
21.3.2 Constraint Jacobians
We enforce strain-limiting constraints following a constrained optimization formulation described in the next section. This formulation requires the computation
of constraint Jacobians with respect to the generalized coordinates of the system
(i.e., the nodal positions of the finite element mesh) due to two reasons. First, constraints are nonlinear, and we locally linearize them in each simulation step. Second,
we enforce constraints using the method of Lagrange multipliers, which applies
forces in the direction normal to the constraints.
To define constraint Jacobians, we take, for example, one compression constraint
of one tetrahedron (the formulation is analogous for stretch constraints):
C_i = s_i - s_{min} \ge 0. \qquad (21.20)
From the definitions of the deformation gradient in Equations 21.8 through 21.10 and
its singular values in Equations 21.17 and 21.18, the Jacobians of the constraint with
respect to the four nodes of the tetrahedron can be computed as
\frac{\partial C_i}{\partial x_j} = \frac{\partial s_i}{\partial x_j} = (r_j v_i)\, u_i^T. \qquad (21.21)
The derivation follows easily from the fact that \partial s_k / \partial g_{ij} = u_{ik} v_{jk} (Papadopoulo and
Lourakis 2000).
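A minimal NumPy sketch of the compression constraints and their Jacobians (Equations 21.17, 21.20, and 21.21); the function names are ours:

```python
import numpy as np

def compression_constraints(G, s_min=0.9):
    """Compression constraints C_i = s_i - s_min >= 0 (Eq. 21.20),
    evaluated from the SVD of the deformation gradient (Eq. 21.17)."""
    U, s, Vt = np.linalg.svd(G)
    return s - s_min, U, Vt.T

def constraint_jacobian(i, U, V, rows):
    """Jacobian of C_i w.r.t. the tetrahedron nodes (Eq. 21.21):
    dC_i/dx_j = (r_j . v_i) u_i^T, where r_j are the rows of X0^{-1}
    and the fictitious row r_4 = -(r_1 + r_2 + r_3)."""
    return [float(r @ V[:, i]) * U[:, i] for r in rows]

# A 20% compression along one axis violates its constraint (C_3 < 0),
# while the nearly undeformed axes stay feasible.
C, U, V = compression_constraints(np.diag([1.0, 0.95, 0.8]))
J = constraint_jacobian(0, U, V, [np.array([1.0, 0.0, 0.0])])
```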
21.3.3 Constrained Dynamics
The unconstrained dynamics of deformable bodies can be expressed in matrix form
as M \dot{v} = F, where v is a vector that concatenates all nodal velocities, F is a vector
with all nodal forces, including gravity, elasticity, etc., and M is the mass matrix.
Given positions x0 and velocities v0 at the beginning of a simulation step, we integrate the equations with backward Euler implicit integration and linearized forces,
which amounts to solving the following linear system:
A v^* = b, \quad \text{with} \quad A = M - h\frac{\partial F}{\partial v} - h^2\frac{\partial F}{\partial x} \quad \text{and} \quad b = \left(M - h\frac{\partial F}{\partial v}\right) v_0 + hF. \qquad (21.22)

Linearizing the constraints C(x) \ge 0 at the beginning of the step yields the velocity-level condition

J v \ge -\frac{1}{h} C_0. \qquad (21.23)
The constrained dynamics problem is a quadratic program (QP) that consists of finding the closest velocity to the unconstrained one, subject to the constraints, that is,

v = \arg\min_v \; (v - v^*)^T A (v - v^*), \quad \text{s.t.} \quad J v \ge -\frac{1}{h} C_0. \qquad (21.24)
0 \le \lambda \;\perp\; J A^{-1} J^T \lambda + J A^{-1} b + \frac{1}{h} C_0 \ge 0. \qquad (21.25)
In our examples, we solve the LCP using projected Gauss–Seidel (PGS) relaxation
(Cottle et al. 1992).
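A dense, illustrative projected Gauss–Seidel loop for an LCP of the form in Equation 21.25 (a sketch only; a production solver would work on the sparse system without forming A^{-1} explicitly):

```python
import numpy as np

def projected_gauss_seidel(M, q, iters=200):
    """Projected Gauss-Seidel relaxation for the LCP
        0 <= lam  complementary to  M lam + q >= 0
    (dense, illustrative version; see Cottle et al. 1992)."""
    lam = np.zeros_like(q)
    for _ in range(iters):
        for i in range(len(q)):
            # Residual excluding the diagonal contribution of lam[i].
            r = q[i] + M[i] @ lam - M[i, i] * lam[i]
            lam[i] = max(0.0, -r / M[i, i])
    return lam

# Toy LCP with M = I and q = (-1, 2): the first multiplier activates,
# the second is clamped to zero.
lam = projected_gauss_seidel(np.eye(2), np.array([-1.0, 2.0]))
```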
21.3.5 Error Metrics
The algorithm described earlier is a standard approach for simulating dynamics
under nonpenetration contact constraints (Kaufman et al. 2008; Otaduy et al. 2009).
Then, the simultaneous simulation of strain-limiting and contact constraints is
FIGURE 21.4 Three screen captures (side and bottom views) of interactive deformations
of a finger model under frictional contact, with unconstrained (reference, middle), backward
(left), and forward (right) motions of the finger. The finger was simulated with a Young's modulus of 2000 kPa and isotropic strain-limiting of 0.9 < s_i < 1.1.
simply carried out by merging strain-limiting constraints of the form (21.20) and
contact constraints of the form (21.27) into the same set of constraints C.
However, the convergence of the PGS solver for the LCP in Equation 21.25
requires appropriate weighting of the various constraint errors. We measure the error
of the constrained problem as
\text{error} = \sum_i w_i \left| \min(C_i, 0) \right|. \qquad (21.28)
To weight the constraints, we simply express constraint errors as distances in the units
of the workspace. Contact constraints as in Equation 21.27 are already expressed in
distance units; therefore, we set wi = 1 for them. Strain-limiting constraints as in
Equation 21.20 are dimensionless and indicate a relative scaling of tetrahedra. To
transform them to distance units, we scale each strain-limiting constraint by the
average edge length e of its corresponding tetrahedron, that is, wi = e.
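The error metric and its weighting can be sketched as follows (the function name is ours; the weighting follows Equation 21.28, with w_i = 1 for contact constraints and w_i equal to the average tetrahedron edge length for strain-limiting constraints):

```python
import numpy as np

def weighted_constraint_error(C, weights):
    """Weighted violation error of Equation 21.28: only negative
    (violated) constraint values contribute, each scaled to distance
    units by its weight."""
    C = np.asarray(C, dtype=float)
    w = np.asarray(weights, dtype=float)
    return float(np.sum(w * np.abs(np.minimum(C, 0.0))))

# Two contact constraints (w = 1) and one strain-limiting constraint
# scaled by an average edge length of 0.5: only violated entries count.
err = weighted_constraint_error([0.02, -0.01, -0.1], [1.0, 1.0, 0.5])
```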
21.3.7 Results
To show its behavior and characterize its performance, we show examples of
nonlinear soft tissue deformation (Figures 21.4 through 21.6), and compare
the behavior to linear elastic materials (Figure 21.3). Most importantly, we have
FIGURE 21.5 A user manipulates a haptic device with a thimble end-effector. The split
screen shows, on the left, a first-person view of a finger model tapping a wooden table, and on
the right, a close-up bottom view of the fingertip.
applied our algorithm to the simulation of soft finger contact with haptic feedback
of direct finger interaction, as shown in Figure 21.5. A human finger model is
simulated while tracking the haptic device, and forces and deformations on the
finger's skin are computed interactively to command the feedback forces of the
haptic device.
Finally, we have tested our simulation algorithm with a full-hand model. We have
used a Cyberglove device to track the configuration of the user's hand, and we use
PD controllers to command the hand's skeleton based on input bone transformations.
In the future, mechanical tracking will be replaced with vision-based markerless
hand tracking, and the simulation will be integrated with wearable devices.
With the linear corotational deformation model, realistic bulging is only possible
under moderate forces, and the hand skin soon suffers artifacts. The top row of
Figure 21.6 shows clear artifacts of the linear corotational model when the deformations are large, in particular at finger joints, where the skin is excessively compressed. Our strain-limiting model produces the desired nonlinear skin behavior
instead, and deformations appear bounded in practice, as shown in the second row
of Figure 21.6.
With the addition of strain-limiting constraints, the simulation is no longer interactive. In the four frames shown in Figure 21.6, the number of active strain-limiting
constraints is 331 on average, and the simulation runs at an average of 376 ms per
time step. Future work will focus on the design of interactive nonlinear skin models
based on strain-limiting, and the current solution will be used for validation purposes. The current constraint-based skin model is interactive for models of moderate
size. Future developments in the project will focus on the estimation of model parameters from acquired data, the design of more efficient solvers for the constraint-based
skin model, and the integration of the skin model and the frictional fingertip contact
model. The current results will serve as a baseline for comparisons, and will be used
FIGURE 21.6 Hand animation and simulation of grasping. The snapshots compare the simulation of a full-hand model with and without strain-limiting constraints. Top row: simulation
of flesh using a linear corotational FEM model without strain-limiting. The fingers deform
excessively, particularly at joints. Second row: the same simulation using our nonlinear skin
model based on strain-limiting constraints. Third row: embedding tetrahedral mesh without
constraints. Fourth row: embedding tetrahedral mesh with constraints.
for validation purposes of more efficient models. In addition, the final model will be
used not only in the context of the model-based device control strategy, but also for
a model-based analysis of perceptual processes.
since edges need to be aligned with the deformation direction that is being constrained,
requiring extensive remeshing. In continuum-based approaches, Thomaszewski et al.
(2009) use different limits for each strain value component of a cloth simulation
(weft, warp, and shear strains). With this approach, limits and strain values are always
defined on undeformed axes; hence they do not distinguish well the various deformation modes under large deformations. Picinbono et al. (2003) allow transverse anisotropic strain-limiting (with a transverse and a radial privileged direction) by adding
an energy term to a hyperelasticity formulation, penalizing stretch deformations in
the transverse direction. This formulation does not suffer from the same problems
as full anisotropy, since strain-limiting is only enforced on one axis, the radial axis
being free to deform. Therefore, no interpolation is required, but only a projection of
the strain tensor along the transverse direction. There are simple and straightforward
approaches used in other contexts to model anisotropy, but these solutions produce
unrealistic results for strain-limiting.
In this chapter, we describe a novel hyperbolic projection function to compute
stretch and compress limits along any deformation direction, and formulate the
strain-limiting constraints based on this projection. We compare our approach to
naïve solutions and different approaches found in the literature, and show that our
approach produces predictable and more realistic results.
FIGURE 21.7 Illustration of our hyperbolic projection method, which projects the limits
from the rotated global axes onto the principal axes of deformation.
(stretch and compress), and each direction has to be projected on each axis of F_d;
thus, there is a total of 18 limits to be computed (6 for each deformation value s_i).
p(\theta) = \frac{1}{|\cos(\theta)|}, \qquad (21.29)
where \theta is the angle between a given rotated limit direction and a given axis of F_d.
Let us consider, for instance, axis e_j of the global frame, where s_{min}^j and s_{max}^j are
defined. The limit direction in F_d is V^T e_j, and the axes of F_d are
((1,0,0)^T \; (0,1,0)^T \; (0,0,1)^T) = (e_1 \; e_2 \; e_3). This results in the following stretch and compress values for each
deformation value s_i:
s_{min}^{j,i} = 1 + \frac{s_{min}^j - 1}{\left| e_i^T V^T e_j \right|}, \qquad (21.30)

s_{max}^{j,i} = 1 + \frac{s_{max}^j - 1}{\left| e_i^T V^T e_j \right|}. \qquad (21.31)
Equations 21.30 and 21.31 provide stretch and compress values for each limit defined
on a global axis (j \in \{1,2,3\}) and each principal axis of deformation (i \in \{1,2,3\}).
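The projection of Equations 21.30 and 21.31 can be sketched as follows (NumPy, names ours). Note that the limit relaxes hyperbolically as the alignment |e_i^T V^T e_j| decreases, and a limit axis orthogonal to the deformation direction imposes no constraint (the projected limit diverges):

```python
import numpy as np

def projected_limit(V, s_lim, j, i):
    """Hyperbolic projection (Eqs. 21.30 and 21.31) of a limit s_lim[j],
    defined on global axis e_j, onto principal deformation axis i:
        s^{j,i} = 1 + (s_lim[j] - 1) / |e_i^T V^T e_j|.
    V is the right factor of the SVD of the deformation gradient.
    As |e_i^T V^T e_j| -> 0 the projected limit diverges, so a limit
    imposes no constraint on an orthogonal direction."""
    c = abs(V.T[i, j])  # alignment |e_i^T V^T e_j|
    return 1.0 + (s_lim[j] - 1.0) / c

# With principal axes aligned to the global frame (V = I), the
# projected limit reduces to the original one.
s = projected_limit(np.eye(3), [0.95, 0.75, 0.98], 0, 0)
```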
21.4.3 Constraint Formulation
In the isotropic case, the constraints are defined as

C_{min}^{i} = s_i - s_{min} \ge 0, \qquad (21.32)

C_{max}^{i} = s_{max} - s_i \ge 0. \qquad (21.33)
Based on Equations 21.30 and 21.31, we reformulate our constraints to take into
account each interpolated limit, resulting in

C_{min}^{j,i} = \left| e_i^T V^T e_j \right| (s_i - 1) - (s_{min}^j - 1) \ge 0, \qquad (21.34)

C_{max}^{j,i} = (s_{max}^j - 1) - \left| e_i^T V^T e_j \right| (s_i - 1) \ge 0. \qquad (21.35)
21.4.4 Constraint Jacobians
We enforce strain-limiting constraints following the constrained dynamics algorithm described in the previous section. This formulation requires the computation
of constraint Jacobians with respect to the generalized coordinates of the system
(i.e., the nodal positions of the finite element mesh) due to two reasons. First, constraints are nonlinear, and we locally linearize them in each simulation step. Second,
we enforce constraints using the method of Lagrange multipliers, which applies
forces in the direction normal to the constraints.
Taking the derivatives of Equations 21.34 and 21.35 with respect to a node x_n
requires computing the derivatives of s_i and V^T with respect to x_n. For the differentiation of s_i, recall from Equation 21.21 that

\frac{\partial s_i}{\partial x_n} = (r_n v_i)\, u_i^T. \qquad (21.36)
Papadopoulo and Lourakis (2000) define the derivative
of V with respect to each component g_{kl} of the deformation gradient G as

\frac{\partial V}{\partial g_{kl}} = V\, \Omega_{k,l}^{v}, \qquad (21.37)

\frac{\partial V^T}{\partial g_{kl}} = -\Omega_{k,l}^{v}\, V^T, \qquad (21.38)

with \Omega_{k,l}^{v} an antisymmetric matrix.
We can now use the chain rule to get the derivatives with respect to the tetrahedral
nodes x_n. To avoid dealing with rank-3 tensors, we directly formulate the derivatives
of V^T e_j instead:

\frac{\partial (V^T e_j)}{\partial x_n} = -\sum_l r_{n,l} \left( \Omega_{1,l}^{v} V^T e_j \quad \Omega_{2,l}^{v} V^T e_j \quad \Omega_{3,l}^{v} V^T e_j \right). \qquad (21.39)
Using Equations 21.36 and 21.39, we can compute the derivatives of the constraints
in Equations 21.34 and 21.35 with respect to the nodal positions of the mesh:

\frac{\partial C_{min}^{j,i}}{\partial x_n} = (s_i - 1)\, \mathrm{sign}\!\left( e_i^T V^T e_j \right) e_i^T \frac{\partial (V^T e_j)}{\partial x_n} + \left| e_i^T V^T e_j \right| \frac{\partial s_i}{\partial x_n}, \qquad (21.40)

\frac{\partial C_{max}^{j,i}}{\partial x_n} = (1 - s_i)\, \mathrm{sign}\!\left( e_i^T V^T e_j \right) e_i^T \frac{\partial (V^T e_j)}{\partial x_n} - \left| e_i^T V^T e_j \right| \frac{\partial s_i}{\partial x_n}. \qquad (21.41)
s\_orthoproj_{min}^{j,i} = 1 + (s_{min}^j - 1) \left| e_i^T V^T e_j \right|, \qquad (21.42)

s\_orthoproj_{max}^{j,i} = 1 + (s_{max}^j - 1) \left| e_i^T V^T e_j \right|. \qquad (21.43)
Linear interpolation, on the other hand, interpolates the values defined in the
global frame to find the limits along an arbitrary direction. Instead of rotating
the global frame to F_d, we proceed the other way around: we apply the inverse rotation to F_d to get the principal axes of deformation in the global frame. This allows us
to easily compute the interpolations by simply computing the intersection of the line
defined by each principal axis of deformation with the ellipsoid defined by the global
frame and its limits. Therefore, there is a total of six limits and constraints, as in the
isotropic case, with a stretch limit s\_linearint_{max}^{i} and a compress limit s\_linearint_{min}^{i}
per principal axis of deformation:

s\_linearint_{min}^{i} = \left\| \begin{pmatrix} s_{min}^{1} & 0 & 0 \\ 0 & s_{min}^{2} & 0 \\ 0 & 0 & s_{min}^{3} \end{pmatrix} V e_i \right\|, \qquad (21.44)

s\_linearint_{max}^{i} = \left\| \begin{pmatrix} s_{max}^{1} & 0 & 0 \\ 0 & s_{max}^{2} & 0 \\ 0 & 0 & s_{max}^{3} \end{pmatrix} V e_i \right\|. \qquad (21.45)
21.4.6 Results
We simulate these highly nonlinear, highly anisotropic conditions using our anisotropic strain-limiting approach on the finger model. The finger model is initialized with its longitudinal direction aligned with the horizontal axis (e_1), and the
nail facing up along the vertical axis (e_2). Limits are defined as 0.95 < s_1 < 1.05
(stiff along e_1), 0.75 < s_2 < 1.25 (compliant along e_2), and 0.98 < s_3 < 1.02 (almost
incompressible along e_3). The aforementioned simulation parameters were selected
by trial and error to approximately match the behavior of a real finger. Figure 21.8
shows some results of the deformations when the finger is pressed against a table
along each axis. We compared the anisotropic model with the isotropic one using
0.75 < s_1 < 1.25. As expected, for the same motions we obtained similar results along
e_2, and overly compliant behavior along the other axes.
Overall, these results could have a strong impact on haptic rendering of direct
hand interaction too. In a model-based control strategy to command tactile devices, it
is important to compute realistic contact variables such as contact area, friction, force
magnitude, and force distribution. We have compared contact variables with a linear elastic model and our isotropic strain-limiting approach in an experiment where
the full finger is pressed flat against a plane: our strain-limiting approach shows the
expected fast increase of the contact force once a certain contact area is reached.
21.5 CONCLUSION
In this chapter, we presented a set of models for efficiently simulating the highly
nonlinear elastic deformations of soft skin under frictional contact, as well as a full
human hand model with its articulated skeleton and soft flesh, allowing realistic
grasping manipulations. This work is a step toward a model-based control strategy
for wearable haptic rendering of direct hand interaction. This ultimate goal requires
a realistic hand model, allowing the real-time computation of high-fidelity forces and
nonlinear deformations during skin contact with simulated objects and materials.
We first quickly surveyed existing devices and techniques allowing rigid and
deformable hand interaction and grasping, highlighting the need for novel techniques
that accounted for the highly nonlinear and anisotropic behavior of flesh, as well as
efficient haptic coupling mechanisms. We then presented a hand model allowing
efficient two-way coupling between flesh and bones for haptic feedback, followed by
a set of constraint-based models addressing the nonlinear and anisotropic behavior
of hand deformations.
Haptic interaction requires very fast update rates, and our current bottleneck is
the dynamics solver of the constrained optimization. We are exploring the use of
more efficient solvers, ideally reaching interactive rates for high-resolution models
and allowing more detailed haptic feedback.
We are also working on improving the way deformation limits are chosen and
set. We are exploring the estimation of limits from real force-deformation measurements, thereby mimicking the behavior of real-world materials. The automatic estimation and placement of limits in a given model using real-world data (Bickel et al.
2009) would avoid ad hoc tuning and would improve the quality of the deformations.
In our work, we have focused on the improvement of the elastic behavior of finger
simulation models, but accurate modeling of the finger can leverage additional recent
findings about its mechanical behavior (Wiertlewski and Hayward 2012).
This work was largely motivated by the use of an accurate soft skin simulation
as a model-based control strategy in the command of cutaneous haptic devices.
Currently, we have successfully tested the simulation with kinesthetic haptic devices,
and we plan to test it as well with wearable cutaneous devices. To this end, it is
important to identify the particular forces and/or deformations needed to command
specific cutaneous devices.
ACKNOWLEDGMENTS
The authors would like to thank Carlos Garre and Fernando Hernández for their
contributions. This work was supported in part by the EU FP7 project Wearhap
(601165), the European Research Council (ERC-2011-StG-280135 Animetrics), and
the Spanish Ministry of Economy (TIN2012-35840).
REFERENCES
Amemiya, T., H. Ando, and T. Maeda. March 2007. Hand-held force display with spring-cam mechanism for generating asymmetric acceleration. In EuroHaptics Conference, 2007 and Symposium on Haptic Interfaces for Virtual Environment and Teleoperator Systems. World Haptics 2007. Second Joint, Tsukuba, Japan, pp. 572–573.
Aoki, T., H. Mitake, S. Hasegawa, and M. Sato. 2009. Haptic ring: Touching virtual creatures in mixed reality environments. In SIGGRAPH '09: Posters, New Orleans, LA. New York: ACM, p. 100:1.
Baraff, D. and A. Witkin. 1998. Large steps in cloth simulation. In Computer Graphics (Proceedings of SIGGRAPH 98), Orlando, FL. New York: ACM, pp. 43–54.
Barbagli, F., A. Frisoli, K. Salisbury, and M. Bergamasco. 2004. Simulating human fingers: A soft finger proxy model and algorithm. In Proceedings of the 12th International Symposium on HAPTICS '04, Chicago, IL, vol. 1, pp. 9–17.
Bergamasco, M., B. Allotta, L. Bosio, L. Ferretti, G. Parrini, G.M. Prisco, F. Salsedo, and G. Sartini. May 1994. An arm exoskeleton system for teleoperation and virtual environments applications. In Proceedings of the 1994 IEEE International Conference on Robotics and Automation, San Diego, CA, vol. 2, pp. 1449–1454.
Bianchi, M., A. Serio, E.P. Scilingo, and A. Bicchi. March 2010. A new fabric-based softness display. In 2010 IEEE Haptics Symposium, Waltham, MA, pp. 105–112.
Bickel, B., M. Bächer, M.A. Otaduy, W. Matusik, H. Pfister, and M. Gross. July 2009. Capture and modeling of non-linear heterogeneous soft tissue. ACM Transactions on Graphics 28 (3): 89:1–89:9.
Biggs, K. and M.A. Srinivasan. 2002. Haptic interfaces. In K.M. Stanney (ed.), Handbook of Virtual Environments: Design, Implementation, and Applications, vol. 1. Boca Raton, FL: CRC Press, pp. 93–116.
Borst, C.W. and A.P. Indugula. 2005. Realistic virtual grasping. In Proceedings of the IEEE Virtual Reality Conference, Bonn, Germany, pp. 91–98.
Bridson, R., S. Marino, and R. Fedkiw. 2003. Simulation of clothing with folds and wrinkles. In Proceedings of the ACM SIGGRAPH/Eurographics Symposium on Computer Animation, San Diego, CA, pp. 28–36.
Chinello, F., M. Malvezzi, C. Pacchierotti, and D. Prattichizzo. 2012. A three DoFs wearable tactile display for exploration and manipulation of virtual objects. In Proceedings of the IEEE Haptics Symposium, Vancouver, BC, pp. 71–76.
Chubb, E.C., J.E. Colgate, and M.A. Peshkin. 2010. ShiverPaD: A glass haptic surface that produces shear force on a bare finger. IEEE Transactions on Haptics 3 (3): 189–198.
Ciocarlie, M., C. Lackner, and P. Allen. 2007. Soft finger model with adaptive contact geometry for grasping and manipulation tasks. In World Haptics Conference, Tsukuba, Japan, pp. 219–224.
Colgate, J.E., M.C. Stanley, and J.M. Brown. 1995. Issues in the haptic display of tool use. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, Pittsburgh, PA, pp. 140–145.
Cottle, R., J. Pang, and R. Stone. 1992. The Linear Complementarity Problem. Boston, MA: Academic Press.
Duriez, C., H. Courtecuisse, J.-P. de la Plata Alcalde, and P.-J. Bensoussan. 2008. Contact skinning. In Eurographics Conference (short paper), Crete, Greece.
Frisoli, A., F. Barbagli, E. Ruffaldi, K. Salisbury, and M. Bergamasco. 2006. A limit-curve based soft finger god-object algorithm. In 14th Symposium on Haptic Interfaces for Virtual Environment and Teleoperator Systems, Arlington, VA, pp. 217–223.
Garre, C., F. Hernandez, A. Gracia, and M.A. Otaduy. 2011. Interactive simulation of a deformable hand for haptic rendering. In Proceedings of the World Haptics Conference, Istanbul, Turkey, pp. 239–244.
Gemperle, F., C. Kasabach, J. Stivoric, M. Bauer, and R. Martin. October 1998. Design for wearability. In Second International Symposium on Wearable Computers, 1998. Digest of Papers, Pittsburgh, PA, pp. 116–122.
Gleeson, B.T., S.K. Horschel, and W.R. Provancher. October 2010. Design of a fingertip-mounted tactile display with tangential skin displacement feedback. IEEE Transactions on Haptics 3 (4): 297–301.
Gleeson, B.T., C.A. Stewart, and W.R. Provancher. 2011. Improved tactile shear feedback: Tactor design and an aperture-based restraint. IEEE Transactions on Haptics 4 (4): 253–262.
Hernandez, F., G. Cirio, A.G. Perez, and M.A. Otaduy. 2013. Anisotropic strain limiting.
InProceedings of the CEIG. Madrid, Spain.
Howe, R.D. and M.R. Cutkosky. December 1996. Practical force-motion models for sliding
manipulation. International Journal of Robotics Research 15 (6): 557572.
579
Irving, G., J. Teran, and R. Fedkiw. 2004. Invertible finite elements for robust simulation of
large deformation. In Proceedings of the ACM SIGGRAPH/Eurographics Symposium on
Computer Animation. Aire-la-Ville, Switzerland: Eurographics Association, pp. 131140.
Jacobs, J. and B. Froehlich. March 2011. A soft hand model for physically-based manipulation of virtual objects. In 2011 IEEE Virtual Reality Conference (VR), Singapore,
pp.1118.
Jacobs, J., M. Stengel, and B. Froehlich. March 2012. A generalized god-object method for
plausible finger-based interactions in virtual environments. In 2012 IEEE Symposium on
3D User Interfaces (3DUI), Costa Mesa, CA, pp. 4351.
Kaufman, D.M., S. Sueda, D.L. James, and D.K. Pai. 2008. Staggered projections for f rictional
contact in multibody systems. In Proceedings of the ACM SIGGRAPH Asia, Singapore,
vol. 27, pp.164:1164:11.
Kim, H., C. Seo, J. Lee, J. Ryu, S. Yu, and S. Lee. September 2006. Vibrotactile display
for driving safety information. In IEEE Intelligent Transportation Systems Conference,
2006. ITSC06, Toronto, Ontario, Canada, pp. 573577.
Kry, P.G., D.L. James, and D.K. Pai. July 2002. EigenSkin: Real time large deformation character skinning in hardware. In ACM SIGGRAPH Symposium on Computer Animation,
San Antonio, TX. New York: ACM, pp. 153160.
Kry, P.G. and D.K. Pai. July 2006. Interaction capture and synthesis. ACM Transactions on
Graphics 25 (3): 872880.
Kurihara, T. and N. Miyata. July 2004. Modeling deformable human hands from medical
images. In 2004 ACM SIGGRAPH/Eurographics Symposium on Computer Animation.
Aire-la-Ville, Switzerland: Eurographics Association, pp. 355363.
Li, Y., J.L. Fu, and N.S. Pollard. July/August 2007. Data-driven grasp synthesis using shape
matching and task-based pruning. IEEE Transactions on Visualization and Computer
Graphics 13 (4): 732747.
Lieberman, J. and C. Breazeal. October 2007. TIKL: Development of a wearable vibrotactile
feedback suit for improved human motor learning. IEEE Transactions on Robotics 23
(5): 919926.
Lin, M.C. and M.A. Otaduy (eds.). July 2008. Haptic Rendering: Foundations, Algorithms
and Applications, Illustrated edition. Wellesley, MA: A K Peters Ltd.
Magnenat-Thalmann, N., R. Laperrire, and D. Thalmann. June 1988. Joint-dependent local
deformations for hand animation and object grasping. In Graphics Interface88. Toronto,
Ontario, Canada: Canadian Information Processing Society, pp. 2633.
McNeely, W.A., K.D. Puterbaugh, and J.J. Troy. August 1999. Six degrees-of-freedom haptic
rendering using voxel sampling. In Proceedings of the SIGGRAPH 99, Computer
Graphics Proceedings, Los Angeles, CA, vol. 18, pp. 401408.
Minamizawa, K., S. Fukamachi, H. Kajimoto, N. Kawakami, and S. Tachi. 2007. Gravity
grabber: Wearable haptic display to present virtual mass sensation. In ACM SIGGRAPH
2007 Emerging Technologies, SIGGRAPH07. New York: ACM.
Minamizawa, K., S. Kamuro, S. Fukamachi, N. Kawakami, and S. Tachi. 2008. GhostGlove:
Haptic existence of the virtual world. In ACM SIGGRAPH 2008 New Tech Demos,
SIGGRAPH08, Los Angeles, CA. New York: ACM, vol. 18, Article No. 8, p. 118:1.
Mller, M. and M. Gross. 2004. Interactive virtual materials. In Proceedings of the Graphics
Interface, Canadian HumanComputer Communications Society School of Computer
Science, University of Waterloo, Waterloo, Ontario, Canada, pp. 239246.
Nakamura, N. and Y. Fukui. March 2007. Development of fingertip type non-grounding
force feedback display. In EuroHaptics Conference, 2007 and Symposium on Haptic
Interfaces for Virtual Environment and Teleoperator Systems. World Haptics 2007.
Second Joint, Tsukaba, Japan, pp. 582583.
580
581
Serina, E.R., C.D. Mote, and D. Rempel. 1997. Force response of the fingertip pulp to repeated
compression: Effects of loading rate, loading angle and anthropometry. Journal of
Biomechanics 30 (10): 10351040.
Sin, F., Y. Zhu, Y. Li, D. Schroeder, and J. Barbi. 2011. Invertible isotropic hyperelasticity
using SVD gradients. In ACM SIGGRAPH/Eurographics Symposium on Computer
Animation (Posters), Vancouver, BC.
Solazzi, M., A. Frisoli, and M. Bergamasco. September 2010. Design of a cutaneous fingertip
display for improving haptic exploration of virtual objects. In 2010 IEEE RO-MAN,
Viareggio, Italy, pp. 16.
Solazzi, M., W.R. Provancher, A. Frisoli, and M. Bergamasco. 2011. Design of a SMA a ctuated
2-DoF tactile device for displaying tangential skin displacement. In IEEE World Haptics
Conference, Istanbul, Turkey, pp. 3136.
Sueda, S., A. Kaufman, and D.K. Pai. August 2008. Musculotendon simulation for hand
animation. ACM Transactions on Graphics 27 (3). Article No. 8.
Thomaszewski, B., S. Pabst, and W. Strasser. 2009. Continuum-based strain limiting. Computer
Graphics Forum 28 (2): 569576.
Traylor, R. and H.Z. Tan. 2002. Development of a wearable haptic display for situation
awareness in altered-gravity environment: Some initial findings. In Proceedings of
the 10th Symposium on Haptic Interfaces for Virtual Environment and Teleoperator
Systems, 2002. HAPTICS 2002, Orlando, FL, pp. 159164.
Turchet, L., P. Burelli, and S. Serafin. 2013. Haptic feedback for enhancing realism of walking
simulations. IEEE Transactions on Haptics 6 (1): 3545.
Wang, Q. and V. Hayward. 2010. Biomechanically optimized distributed tactile transducer
based on lateral skin deformation. International Journal of Robotics Research 29 (4):
323335.
Wiertlewski, M. and V. Hayward. 2012. Mechanical behavior of the fingertip in the range
of frequencies and displacements relevant to touch. Journal of Biomechanics 45 (11):
18691874.
Winfree, K.N., J. Gewirtz, T. Mather, J. Fiene, and K.J. Kuchenbecker. March 2009. A high
fidelity ungrounded torque feedback device: The iTorqU 2.0. In EuroHaptics conference,
2009 and Symposium on Haptic Interfaces for Virtual Environment and Teleoperator
Systems. World Haptics 2009. Third Joint, Salt Lake City, UT, pp. 261266.
Yang, G.-H., K.-U. Kyung, M.A. Srinivasan, and D.-S. Kwon. March 2007. Development
of quantitative tactile display device to provide both pin-array-type tactile feedback
and thermal feedback. In EuroHaptics Conference, 2007 and Symposium on Haptic
Interfaces for Virtual Environment and Teleoperator Systems. World Haptics 2007.
Second Joint, Tsukaba, Japan, pp. 578579.
Yang, T.-H., S.-Y. Kim, C.-H. Kim, D.-S. Kwon, and W.J. Book. March 2009. Development of
a miniature pin-array tactile module using elastic and electromagnetic force for mobile
devices. In EuroHaptics conference, 2009 and Symposium on Haptic Interfaces for
Virtual Environment and Teleoperator Systems. World Haptics 2009. Third Joint, Salt
Lake City, UT, pp. 1317.
Zilles, C.B. and J.K. Salisbury. 1995. A constraint-based god-object method for haptic display.
In IEEE/RSJ International Conference on Intelligent Robots and Systems, vol. 3. Los
Alamitos, CA: IEEE Computer Society, p. 3146.
22
Design Challenges
of Real Wearable
Computers
Attila Reiss and Oliver Amft
CONTENTS
22.1 Introduction................................................................................................. 584
22.1.1 Review Inclusion and Exclusion Criteria....................................... 585
22.2 Garment-Based Wearable Computers......................................................... 585
22.2.1 Placement on Human Body........................................................... 590
22.2.1.1 Head and Neck Region................................................... 590
22.2.1.2 Torso Region................................................................... 590
22.2.1.3 Arms and Hands Region................................................. 593
22.2.1.4 Legs Region.................................................................... 593
22.2.1.5 Feet Region..................................................................... 593
22.3 Accessory-Based Wearable Computers....................................................... 594
22.3.1 Placement on Human Body........................................................... 595
22.3.1.1 Head and Neck Region................................................... 595
22.3.1.2 Torso Region................................................................... 595
22.3.1.3 Arm Region..................................................................... 598
22.3.1.4 Wrist Region................................................................... 599
22.3.1.5 Hand and Finger Region................................................. 599
22.3.1.6 Legs and Feet Region......................................................600
22.4 Lessons Learned and Best Practices...........................................................600
22.4.1 User Interface................................................................................. 600
22.4.2 Sensing Modalities......................................................................... 603
22.4.3 Data and Power.............................................................................. 607
22.4.4 Wearability..................................................................................... 608
22.4.5 Social Acceptance and Aesthetics.................................................. 609
22.4.6 Robustness and Reliability............................................................610
22.4.7 Extensibility................................................................................... 611
22.4.8 Cost................................................................................................ 612
22.4.9 Safety............................................................................................. 612
22.4.10 On-Body Computing..................................................................... 612
22.5 Future Directions in Wearable Computing.................................................. 613
Acknowledgment.................................................................................................... 615
References............................................................................................................... 615
22.1 INTRODUCTION
The foundations of wearable computing may lie in the pocket and wrist-worn watches invented in the sixteenth century. Thorp and Shannon are credited as being among the first researchers to attempt building wearable computers, in 1961. Their motivation was to beat the odds and increase their chances of winning in card games and roulette (Thorp, 1998). While their wearable systems were rudimentary by today's technology standards, they battled with integrating the critical features that remain key components of a wearable computer today: sensing or information input, computing, and some form of actuation, feedback, or communication of retrieved information. Thorp and Shannon's solution was based on game status input using a shoe-integrated button, computing using a carry-on device hidden in clothes, and information feedback using an audio ear pad, all to avoid being detected by the game clerk. Between the 1990s and today, wearable computers gained commercial and research interest, and the first on-body computing products appeared on the market. In the 1990s, on-body computer solutions were often derived from the standard computing components available at the time, such as the half-keyboard (Matias et al., 1994) and backpack- or side-case-worn embedded computing units (Lizzy, 1993; Starner, 1993). As more integrated sensors became available, the view on market opportunities shifted from using on-body computers for mobile data entry applications, such as in warehouse management, to fitness, sports, and the quantified self. However, a substantial share of today's commercial on-body systems, such as the Nike running gear (shoe sensor plus mobile carry-on device), still follows a technical approach similar to the early solution of Thorp and Shannon: the on-body computer solution is centered around a mobile carry-on device that provides computing, and often also sensing and actuation/communication functions. Similarly, many research efforts during the past years considered smartphones and other carry-on devices as the basis for wearable computers. Devices are often placed in pockets, attached to glasses, or strapped to body parts, without actually considering the integration into a wearable system.
The vision of invisible computing set out by Mark Weiser (1991) suggests that technology should not require particular attention during everyday life, should be virtually invisible, and thus should not hinder physical activity. Consequently, electronic devices that add features to wearable systems, including computing, sensors, etc., must be unobtrusively embedded in a user's outfit. In its ultimate form, a real wearable computer thus becomes part of a regular garment or accessory that is already used. Toward the integration of wearable computers, various challenges exist that affect function, robustness, and various other design considerations. For example, when integrating computing into a finger ring, space constraints critically affect the design, besides the aesthetic requirements of the ring as an accessory. Rather than by space, wearable computers integrated in garments are mostly constrained by textile material properties such as breathability and stretchability.
In this chapter, we review projects that investigate the integration of wearable computers in garments and accessories. Our goal is to provide readers with an overview and in-depth understanding of the technical challenges and best practices of wearable computer integration, that is, of wearables that could become useful in everyday life.
TABLE 22.1
Garment-Based Wearable Computer Projects Analyzed in This Review
Placement, Form Factor
Location: torso
Form: shirt
Location: torso
Form: shirt
Location: torso
Form: vest
Location: torso
Form: vest
Architecture Components
Computing: microcontroller
Sensors: PPG, accelerometer,
GPS
UI: none
Communication: ZigBee
Computing: MSP430
microcontroller
Sensors: capacitive electrodes
UI: none
Communication: ZigBee
Computing: MSP430
microcontroller
Sensors: accelerometers
UI: buttons, LEDs
Communication: I2C, SPI,
Bluetooth
Computing: microcontroller
Sensors: ECG, respiration rate
monitor, pulse oximeter,
temperature sensor,
accelerometer, microphone
UI: none
Communication: Bluetooth
Computing: MPC823
microprocessor, SA 1110
StrongARM microprocessor
Sensors: microphone,
accelerometers
UI: Twiddler, Palm keyboard,
HMD
Communication: I2C, USB,
Dallas Semiconductor
one-wire protocol, 10 Mb/s
Ethernet, 802.11 WLAN
Computing: SA 1110
StrongARM microprocessor,
TMS320 DSP
Sensors: grayscale cameras,
proximity sensor
UI: keyboard, mouse, HMD
Communication: I2C, USB, serial
port, IrDA port, PS/2, VGA,
10 Mb/s Ethernet, 802.11 WLAN
Location: torso
Form: vest
Location: torso
Form: jacket
Location: torso
Form: suit
e-SUIT: business-suit-integrated
control of an information
management application (Toney
et al., 2002)
Location: torso
Form: underclothes
Location: torso
Form: underclothes
Location: torso
Form: underclothes
Architecture Components
Computing: no information
Sensors: HR monitor,
temperature sensor,
accelerometer
UI: LED, LCD display
Communication: RF
transmission (not specified)
Computing: no information
Sensors: IMUs, force sensitive
resistors, ultrawideband tags
UI: none
Communication: data bus,
Bluetooth
Computing: microcontroller,
StrongARM microprocessor
Sensors: none
UI: buttons, slide control,
LEDs, LCD display, pager
motor
Communication: data bus,
WLAN
Computing: no information
Sensors: ECG, respiration rate
monitor, temperature sensor,
GPS
UI: button
Communication: I2C, GSM
Computing: no information
Sensors: ECG, respiration rate
monitor, accelerometer
UI: none
Communication: RF
transmission (not specified)
Computing: no information
Sensors: ECG, respiration rate
monitor (piezoresistive/
impedance pneumography),
piezoresistive elbow/
shoulder joint movement
monitor
UI: buttons, LEDs, buzzer
Communication: GPRS
Airwriting: glove-integrated
system for 3D-handwriting
recognition (Amma et al.,
2013)
Instrumented pair of pants for
monitoring lower extremity
joints (Liu et al., 2008)
Location: legs
Form: pair of pants
Location: feet
Form: shoes
Location: feet
Form: shoes
Shoe-mouse: instrumented shoes as an alternative (foot-controlled) input device (Ye et al., 2005)
Location: feet
Form: shoes
Architecture Components
Computing: microcontroller
Sensors: HR monitor, respiration rate
monitor, temperature sensor,
accelerometer, heat flux sensor,
CO/CO2 concentration, GPS
UI: visual and acoustic alarm
Communication: RS485 data bus,
ZigBee, 802.11 WLAN
Computing: DSP
Sensors: bend sensors, contact sensors
UI: none
Communication: no information
Computing: microcontroller
Sensors: accelerometer, gyroscope
UI: none
Communication: Bluetooth
Computing: Atmel AVR Atmega8
microcontroller
Sensors: accelerometers, gyroscopes,
bend sensors
UI: none
Communication: I2C, Bluetooth
Computing: PIC 16C711 microcontroller
Sensors: accelerometer, gyroscope,
force sensitive resistor, bend sensor
UI: none
Communication: RF transmission (not
specified)
Computing: Atmel AT90S8515
microcontroller
Sensors: accelerometer, gyroscope,
force sensitive resistor, bend sensor
UI: none
Communication: GPRS
Computing: C8051F206
microcontroller
Sensors: accelerometer, gyroscope,
force sensitive resistor, bend sensor,
electric field sensor
UI: none
Communication: 916.50MHz RF
transmission (RF Monolithics
DR3000-1)
Architecture Components
Computing: no information
Sensors: pulse oximeter,
accelerometer
UI: none
Communication: sub 1 GHz
RFtransmission (Nordic
nRF9E5)
http://vivonoetics.com/products/sensors/lifeshirt/.
Other applications. Further applications of garment-based wearable computers include sports (e.g., sensor shirts manufactured by Hexoskin* or different commercially available shoe-based systems). Motion monitoring was addressed for entertainment (e.g., the Xsens MVN) and for worker support during production and maintenance tasks, e.g., as a motion jacket (Stiefmeier et al., 2008). Cheng et al. (2010) presented a neck collar for dietary behavior monitoring.
Clearly, such a widespread collection of application scenarios defines different system requirements. Nevertheless, most garment-based wearable computers include the key components for sensing, computing, user interface, (wireless) data transmission, and power supply. Table 22.1 lists all analyzed garment-based wearable computer projects. Detailed information on a system's architecture was not always available. For example, many projects mentioned computing units but did not provide details on memory or controller models. Power supply was omitted from the architecture overview, as all projects reported using batteries.
http://www.hexoskin.com/.
http://www.xsens.com/products/xsens-mvn/.
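The survey characterizes every project by the same five component categories (sensing, computing, user interface, communication, power supply). As a minimal sketch of how such Table 22.1 entries could be organized for analysis, the record type below mirrors those categories; all names and the example entry are illustrative, not taken from any project's original documentation.

```python
from dataclasses import dataclass, field

# One survey record per reviewed system, using the five component
# categories of the architecture overview in Table 22.1.
@dataclass
class WearableSystem:
    placement: str                      # body location, e.g. "torso"
    form_factor: str                    # e.g. "shirt", "vest", "shoe"
    computing: list = field(default_factory=list)
    sensors: list = field(default_factory=list)
    ui: list = field(default_factory=list)
    communication: list = field(default_factory=list)
    power: str = "battery"              # all garment-based projects used batteries

def systems_by_placement(systems, placement):
    """Group survey entries the way the chapter does: by body region."""
    return [s for s in systems if s.placement == placement]

# Example entry modeled loosely on the vest-based systems in the table.
vest = WearableSystem(
    placement="torso",
    form_factor="vest",
    computing=["microcontroller"],
    sensors=["accelerometer"],
    ui=["buttons", "LEDs"],
    communication=["I2C", "SPI", "Bluetooth"],
)
```

Grouping by placement then reproduces the per-region discussion structure used in the remainder of the section.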
FIGURE 22.1 Garment-based wearable computer projects and their placement on the human body. Torso: [Harms2008], [Rosso2010], [DeVaul2001], [Lukowicz2001], [Knight2005], [Stiefmeier2008], [Toney2002], [Noury2004], [DiRienzo2005], [Paradiso2008], [Curone2010]. Arms and hands: [Kuroda2004], [Amma2013]. Legs: [Liu2008]. Feet: [Paradiso2002], [Ye2005], [Bamberg2008], [Weber2007].
Wearable computers at the torso were typically distributed systems, utilizing the waist, chest, back, or shoulders for components. Mostly, sensing at relevant body locations was separated from main system components such as batteries and computing. A prototypical example of a distributed torso-worn system was described by Harms et al. (2008). Their system was implemented in a loose-fitting long-sleeve shirt, hence covering the lower arms and wrist locations too. The wearable computer design was hierarchical, consisting of a central master, region-specific gateways, and outer peripherals (terminals) that provide sensing and I/O functionality. Essential system components, including the master, gateways, and wiring, were glued onto the inner side of the garment, while a replaceable battery was placed in a pocket. Each gateway provided standardized interfaces to different terminals, such as accelerometers, buttons, and LEDs. Harms et al. (2008) positioned four gateways to maintain a balanced distribution of terminals over the entire body while limiting wiring stretches to below 85 cm. Figure 22.2 shows an example shirt and the architecture schematic. Another shirt-based wearable system was presented by Rosso et al. (2010). They integrated a set of sensors (ECG electrodes, temperature sensors, accelerometers, etc.) into a T-shirt. In addition, an electronic module was attached to the T-shirt for data collection, analysis, and wireless transmission.
FIGURE 22.2 Example shirt and architecture schematic (labels: gateways, battery, Konnex, physical phenomenon).
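The master–gateway–terminal hierarchy described above can be sketched in a few lines. This is a behavioral illustration under assumed interfaces, not the original firmware of Harms et al. (2008); all class and method names are invented for the example.

```python
# Sketch of a hierarchical on-garment architecture: a central master talks
# to region-specific gateways, and each gateway multiplexes several
# terminals (sensors or I/O devices) behind a standardized interface.

class Terminal:
    def __init__(self, name, read_fn):
        self.name = name
        self._read = read_fn       # e.g. sample an accelerometer, poll a button

    def sample(self):
        return self._read()

class Gateway:
    """Region-specific hub presenting a uniform interface to its terminals."""
    def __init__(self, region):
        self.region = region
        self.terminals = []

    def attach(self, terminal):
        self.terminals.append(terminal)

    def poll(self):
        # One reading per attached terminal, tagged with its name.
        return {t.name: t.sample() for t in self.terminals}

class Master:
    """Central node aggregating all gateway readings into one body snapshot."""
    def __init__(self):
        self.gateways = []

    def snapshot(self):
        return {g.region: g.poll() for g in self.gateways}

# Usage: an upper-arm gateway carrying a single accelerometer terminal.
master = Master()
arm = Gateway("upper_arm")
arm.attach(Terminal("accel", lambda: (0.0, 0.0, 9.81)))
master.gateways.append(arm)
```

Adding a terminal then only requires attaching it to the nearest gateway, which is the property that kept wiring runs short in the garment design.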
Vest-based designs were preferred in many early wearable computer projects for practical reasons, with the vest's pockets used to carry bulky components. Moreover, shirt designs were found cumbersome to put on and take off (Knight et al., 2005). DeVaul et al. (2001) described the MIThril system, included in a chassis that acts as a lining in a zip-in vest. The MIThril architecture included two computing cores, a multi-protocol body bus, a range of I2C-based sensors, and interface devices such as a clip-on HMD and a Twiddler chording keyboard. Lukowicz et al. (2001) extended MIThril with a modular computing core called WearARM, supporting reconfigurable data processing and efficient power management. In Knight et al. (2005), vests were used to include sensors, core components, and output devices (LCD, LEDs). Similar to vest-based systems, jacket-based solutions allowed users fast dressing and undressing. Stiefmeier et al. (2008) described a motion jacket system, integrating inertial sensors and force-sensitive resistors together with processing and communication features. Toney et al. (2002) integrated components into a traditional business suit, including input and output interfaces connected to a PDA in the suit's inner pocket.
Underclothes were found particularly beneficial for physiological sensors that require direct skin contact. For example, Noury et al. (2004) developed a medical monitoring system integrated in an undercloth, including sensing components (such as ECG electrodes and temperature sensors), a fall detection module consisting of an accelerometer and a microcontroller, and wiring and interconnection buses. In addition, the system included a belt, wired to the garment, containing processing and communication components and batteries. Di Rienzo et al. (2005) presented an undercloth-based system including textile sensors for ECG and respiration. Another undercloth-based wearable computer was presented by Paradiso et al. (2008). The garment, together with nine textile electrodes for cardiopulmonary monitoring, was realized in one knitting procedure. Sensors were connected to an electronics module placed in a pocket at the lower back of the cloth.
Curone et al. (2010) used three garment components: a T-shirt as inner garment, a jacket as outer garment, and a pair of boots. Each component was sensor-equipped for real-time monitoring of physiological, activity, and environmental parameters. The inner garment included textile sensors connected to an electronic module via textile conductive cables. The outer garment included additional sensors to measure, for example, external temperature, CO concentration, and absolute position, as well as a processing module for collecting and preprocessing data from the different sensor nodes. Moreover, electronic communication and alarm modules were attached for sending data to an operations coordinator and for providing visual and acoustic warnings when dangerous situations were detected. The boots included CO2 sensors and a ZigBee module.
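A system that raises warnings when dangerous situations are detected implies a rule layer that compares sensor readings against limits. The sketch below shows one minimal form such logic could take; the parameter names and threshold values are illustrative assumptions, not values from Curone et al. (2010).

```python
# Minimal rule-based alarm check: flag every monitored parameter whose
# current reading exceeds its configured limit. Thresholds are placeholders.

def check_alarms(readings, limits):
    """Return the names of all parameters exceeding their limit."""
    return [name for name, value in readings.items()
            if name in limits and value > limits[name]]

# Illustrative limits for environmental parameters of an outer garment.
limits = {"co_ppm": 35.0, "external_temp_c": 60.0}

# One simulated snapshot: CO concentration is above its limit.
readings = {"co_ppm": 50.0, "external_temp_c": 25.0, "heart_rate": 92}
alarms = check_alarms(readings, limits)
```

A real system would debounce such alarms over several samples before notifying the operations coordinator, to avoid false positives from sensor noise.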
22.2.1.3 Arms and Hands Region
While various glove-based devices were developed in past years (cf. the survey of Dipietro et al., 2008), many of the related reports focus on information processing rather than integration (e.g., Park et al., 2008) and thus were not analyzed.
Kuroda et al. (2004) introduced a data glove, called StrinGlove, integrating 24 bend sensors for hand posture monitoring and 9 contact sensors to detect contact between fingertips. Data processing was performed by a signal processor unit mounted onto the glove. Amma et al. (2013) described a wearable computer in a thin glove and a wristlet. Their system consisted of a wireless data glove with inertial sensors at the back of the hand, and a microcontroller, Bluetooth module, and power supply at the wristlet. Besides the wearable computer, their system included an external module for intensive data processing tasks.
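This split between on-body node and external processing module typically means the wearable side only frames raw samples for transfer. As a hedged sketch, the code below packs one 6-axis inertial sample into a fixed binary frame; the packet layout is an assumption for illustration, not the format used by the Amma et al. (2013) system.

```python
import struct

# On-body side: frame one 6-axis inertial sample for wireless transfer.
# Assumed layout: little-endian 2-byte sequence number + six float32 values
# (three accelerometer axes, three gyroscope axes) = 26 bytes per frame.

def pack_imu_sample(seq, ax, ay, az, gx, gy, gz):
    """Frame one sample: sequence number plus six float32 sensor values."""
    return struct.pack("<H6f", seq, ax, ay, az, gx, gy, gz)

def unpack_imu_sample(frame):
    """External-module side: recover sequence number and sensor tuple."""
    seq, *values = struct.unpack("<H6f", frame)
    return seq, tuple(values)

frame = pack_imu_sample(7, 0.1, -0.2, 9.8, 0.01, 0.0, -0.03)
seq, values = unpack_imu_sample(frame)
```

The sequence number lets the external module detect dropped frames on a lossy wireless link, which is all the reliability the lightweight on-body side provides in this sketch.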
Some glove designs only included sensors or data input, such as hand tracking with a color glove (Wang and Popovic, 2009). Other glove-based projects are either commercial developments (e.g., the MoCap Glove from CyberGlove Systems* or data gloves from 5DT) or rely on one of the available products. A few garment-based wearable computers for the torso covered the upper limbs, too. For example, the long-sleeve shirt of Harms et al. (2008) included terminals at the upper and lower arms. Glove-like systems without fabric integration, such as the SCURRY system (Kim et al., 2005), are covered as accessories in Section 22.3.
22.2.1.4 Legs Region
Liu et al. (2008) introduced trousers with printed circuit boards located at the hips and leg joints, including an inertial sensor, a microcontroller with I2C interface, and an A/D converter. In addition, bend sensors were attached at the trousers' knee locations. In other projects, trousers were often used as part of a larger system for sensing only, and thus are not discussed here.
22.2.1.5 Feet Region
Shoe-based wearable computers often included insole-integrated sensors, as well as an attached module with additional sensors and system components for data processing, communication, and power supply. Typically, only basic data processing, for example, signal conditioning, was performed on the shoe wearable computer, and the data was then sent to an external system for further processing. Foot-mounted system designs were often constrained by space and required robust, wear-resistant attachments and autonomous operation, to avoid wiring from the foot/ankle to a trouser- or waist-mounted computing unit, for example. The feet region was chosen in applications of gait analysis or for using the feet as an input device.
* http://www.cyberglovesystems.com/.
http://www.5dt.com/.
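Basic on-shoe signal conditioning could be as simple as an exponential moving average applied to raw samples before transmission. The sketch below illustrates this under an arbitrary smoothing factor; it is a generic example, not code from any of the cited shoe systems.

```python
# Exponential moving average: a cheap smoothing filter suitable for a
# shoe-mounted microcontroller before data leaves the node. The smoothing
# factor alpha is an illustrative choice, not a value from the literature.

def ema(samples, alpha=0.3):
    """Exponentially smooth a sample stream; alpha in (0, 1]."""
    smoothed = []
    state = None
    for x in samples:
        state = x if state is None else alpha * x + (1 - alpha) * state
        smoothed.append(state)
    return smoothed

raw = [0.0, 10.0, 10.0, 10.0]      # a step in a raw accelerometer channel
conditioned = ema(raw)             # the step is attenuated gradually
```

Keeping only one state variable per channel is what makes such conditioning feasible on the small microcontrollers listed for the shoe-based systems.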
Using insole sensors, dynamic pressure was frequently measured at the heel and great toe, as well as sole bending, for example, in Paradiso (2002). Bamberg et al. (2008) included a capacitive sensor in the insole to estimate foot height above floor level. Additional sensors, including accelerometers and gyroscopes, were included in the shoe-attached module. Paradiso (2002) placed the shoe-attached module at the shoe's side, Bamberg et al. (2008) used the shoe's back, while Ye et al. (2005) fully integrated the components inside the shoe. An infant shoe (bootee) for wireless monitoring of pulse oximetry was presented by Weber et al. (2007). Electronic components, including the oximetry module, RF transceiver, and power supply, were contained in a box integrated into the thick sole of the bootee.
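Heel pressure measured in the insole supports simple gait-event detection of the kind these systems target. The sketch below flags heel strikes as upward threshold crossings; the threshold and the sample trace are illustrative, not data from the cited projects.

```python
# Detect heel strikes in an insole pressure trace as upward crossings of a
# fixed threshold. Threshold value and trace are illustrative placeholders.

def heel_strikes(pressure, threshold=50.0):
    """Return sample indices where heel pressure rises through the threshold."""
    strikes = []
    for i in range(1, len(pressure)):
        if pressure[i - 1] < threshold <= pressure[i]:
            strikes.append(i)
    return strikes

# Simulated trace containing two steps: pressure rises above threshold twice.
trace = [5, 10, 80, 90, 20, 8, 75, 85, 15]
events = heel_strikes(trace)
```

Counting crossings rather than high samples means each step is reported once, however long the stance phase lasts.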
https://getpebble.com/.
http://getneptune.com/.
internet-based services, gather data from other portable devices, or even serve as a simple smartphone replacement.
Table 22.2 lists the accessory-based wearable computer projects analyzed and further discussed later per body region. While we aimed at extracting most architectural information from the projects' publications, details were often missing. The power supply was omitted from the architecture overview since most systems used batteries, except for the solar cells in a sensor button system (Bharatula et al., 2004).
TABLE 22.2
Accessory-Based Wearable Computer Projects Analyzed in This Review
Placement, Form Factor
Architecture Components
Computing: no information
Sensors: camera
UI: earphone, laser line
Communication: no information
Headset-based wearable
computer for context-aware
applications (Matsushita,
2001)
Computing: AT90S8515
microcontroller, PIC16LF877
microcontroller
Sensors: accelerometer, gyroscope
UI: speaker, microphone
Communication: Bluetooth
Location: torso
Form: belt
QBIC: belt-integrated
wearable computing platform
(Amft et al., 2004)
Location: torso
Form: button
Computing: CoolRisc 88
microcontroller
Sensors: accelerometer, light sensor,
microphone
UI: none
Communication: 868 MHz RF
transmission
Location: arm
Form: arm-band
Computing: no information
Sensors: none
UI: buttons, LED array
Communication: no information
Location: wrist
Form: watch
Location: wrist
Form: watch
eWatch: smartwatch as a
wearable sensing,
notification, and computing
platform (Maurer et al., 2006)
Location: wrist
Form: bracelet-like
device
Location: wrist
Form: bracelet-like
device
Architecture Components
Computing: ARM7 microcontroller
(Cirrus Logic EP 7211)
Sensors: none
UI: OLED touch display, roller wheel
Communication: Bluetooth, infrared
Computing: ARM7TDMI
microcontroller (Philips LPC2106)
Sensors: accelerometer, temperature
sensor, light sensor, microphone
UI: buttons, LCD display, buzzer,
vibrating motor
Communication: Bluetooth, infrared
Computing: ARM7TDMI
microcontroller (Atmel AT91R40807)
Sensors: ECG, pulse oximeter, blood
pressure system, accelerometer
UI: buttons, LCD display
Communication: GSM
Computing: C8051F020 microcontroller
Sensors: HR monitor, temperature
sensor, accelerometer
UI: none
Communication: ZigBee
Computing: PIC microcontroller
Sensors: accelerometers, gyroscopes
UI: none
Communication: 2.4 GHz RF
transmission
Computing: PIC microcontroller
Sensors: PPG sensor
UI: none
Communication: 915 MHz RF
transmission
used for different field studies, including daily routine monitoring and sports. The system is shown in Figure 22.5. Bharatula et al. (2004) presented a button concept as an autonomous microsystem, integrated in a button-like form. Sensors for light, sound, and acceleration, a microprocessor, and an RF transceiver were included, and a solar cell and a lithium polymer battery were considered for powering. The button could replace regular buttons in clothes at various locations.
FIGURE 22.3 Accessory-based wearable computer projects and their placement on the
human body.
FIGURE 22.5 Example of an accessory-based wearable computer: QBIC belt-integrated computer and system schematic, showing the QBIC buckle, connector ports, and belt-internal battery. (From Amft, O. et al., Design of the QBIC wearable computing platform, Proceedings of 15th IEEE International Conference on Application-Specific Systems, Architectures, and Processors (ASAP), Galveston, TX, 2004, pp. 398–410.)
socket, and an earphone socket. The second layer was used to connect power and
ground lines.
22.3.1.4 Wrist Region
Smartwatches have been the predominant and commercially most successful wearable computers at the wrist in recent years. Their success may be attributed to the reuse of an established accessory location: the user's wrist has carried watches for a long time. The location is easily accessible and often chosen for visual expressions of fashion or trendiness. Narayanaswami et al. (2002) introduced the IBM Linux Watch, one of the earliest smartwatches. The Linux Watch was a complete computer running Linux, displaying X11 graphics, and offering wireless connectivity. The system consisted of a main board with the processor, a communications board including a Bluetooth module, and a display board including an OLED display. Maurer et al. (2006) presented the eWatch device, a wearable sensing, notification, and computing platform built into a wristwatch form factor. eWatch provided tactile, audio, and visual notification while monitoring light, motion, sound, and temperature. Besides sensors, user interface, and computing components, eWatch used Bluetooth to communicate with a cell phone or a stationary computer. Anliker et al. (2004) presented a bracelet-like device for continuous monitoring and evaluation of multiple physiological parameters, including blood pressure, ECG, and oxygen saturation. The enclosure included all sensors, processing and communication modules, power supply, and user interface. Medical emergency detection was performed on the device, and analyzed data were sent to a medical center unit. Another bracelet-based system was presented by Malhi et al. (2012), integrating sensors into the device to measure temperature and heart rate and aiming to detect falls.
22.3.1.5 Hand and Finger Region
The human hand was also considered for wearing accessory-based wearable computers. Kim et al. (2005) described a glove-like device called SCURRY, composed of a base module and four ring-type modules containing sensors, communication, and microcontroller components. The base module included two gyroscopes on the back of the hand, while the ring modules included two-axis accelerometers. Asada et al. (2003) considered a finger-worn ring accessory and integrated a PPG sensor, a microcontroller for LED modulation, data acquisition, filtering, and RF communication, and an RF transmitter. All components were encapsulated within the ring in a compact body and powered by a tiny cell battery of the type used in wristwatches.
22.3.1.6 Legs and Feet Region
No accessory-based wearable computers were found for the lower limb region. Most wearable computer designs aiming at the lower limbs are integrated into trousers or shoes and are thus considered in Sections 22.2.1.4 and 22.2.1.5.
22.4.1 User Interface
By exposing a system's functionality through I/O modalities, user interfaces are key to the user experience. Multifunctional wearable computers typically support a range of peripherals, including audio and visual channels, touch surfaces, keys, and buttons. Peripherals should be easily accessible and unobtrusive, interrupting users only at suitable moments. DeVaul et al. (2001) stated that user interfaces should maximize the value of the information provided while minimizing the physical and cognitive burdens imposed by accessing it.
TABLE 22.3
Technical Challenges of Wearable Computer Integration: Summary of Lessons Learned and Best Practices

Technical challenges addressed: user interface; sensing modalities; wearability; social acceptance and aesthetics; robustness and reliability; cost; safety; on-body computing.
Toney et al. (2002) investigated input options regarding cognitive load and external perception. Their investigations clearly favor garment-integrated hidden buttons, as accessing them does not lead to loss of eye contact in social situations and requires limited cognitive load on the wearer. The authors implemented capacitive touch sensors using embroidered metal threads inside the hems and cuffs of a suit jacket. The touch-sensitive buttons and sliders could be used to inconspicuously operate personal information management applications with common gestures while sitting or standing. Buttons of various designs were frequently used in many other garment-based systems too (Paradiso et al., 2005; Harms et al., 2008; Lee et al., 2010). Textile-integrated buttons could be constructed via multilayer fabrics, for example, in the MP3 player of Lee et al. (2010). As an example of more versatile wearable computers, the MIThril architecture supported various input devices, including audio via microphone and text entry via Twiddler and Palm folding keyboards. While Maurer et al. (2006) used push buttons on their smartwatch, Narayanaswami et al. (2002) and others deliberately designed input interfaces without buttons, motivating this decision with ease of use and a more elegant appearance. The IBM Linux Watch included a touch screen and a roller wheel. Due to display size limits, only four quadrants with a potential fifth zone at the display's center were used for touch input. Instead, the roller wheel was used to navigate lists, corresponding to a middle mouse wheel.
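The quadrant scheme can be made concrete with a small sketch. The display dimensions, the central-zone size, and the function name below are illustrative assumptions, not parameters of the actual IBM Linux Watch implementation:

```python
def touch_zone(x, y, width=96, height=120, center_frac=0.4):
    """Map a touch point to one of four quadrants or a central fifth zone,
    as on a small watch display.

    Illustrative sketch only; dimensions and the central-zone fraction are
    assumptions, not taken from Narayanaswami et al. (2002)."""
    cx, cy = width / 2, height / 2
    # central zone: a rectangle covering `center_frac` of each dimension
    if (abs(x - cx) < center_frac * width / 2 and
            abs(y - cy) < center_frac * height / 2):
        return "center"
    horiz = "left" if x < cx else "right"
    vert = "top" if y < cy else "bottom"
    return f"{vert}-{horiz}"
```

With so few distinguishable zones, each touch target can stay large enough for reliable finger input on a display of this size, which is the motivation given for the quadrant layout.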
While touch and button interfaces dominate among input options, a trend toward multimodal output is observable for information feedback. Toney et al. (2002) investigated output options regarding cognitive load and external perception too and implemented LEDs in suit cuffs to indicate the priority of available information. Furthermore, they considered tactile feedback via a pager motor sewn into the jacket shoulder, providing different vibration patterns, and a watch computer with a programmable LCD for short messages. Toney et al. (2002) noted that sound output requires context information so that it is enabled only when appropriate. In the scenario of Curone et al. (2010), garments included an alarm module for visual and acoustic warnings in emergencies. Moreover, user warnings are key in medical monitoring applications. LEDs and buzzers were used in various implementations to give immediate feedback (Paradiso et al., 2005; Harms et al., 2008). For physical activity monitoring, Knight et al. (2005) used LEDs indicating pulse measurements and an LCD display to show data.
In the eWatch, user notifications were delivered via a 128 × 64 pixel LCD display, an LED, a vibrating motor, and a tone-generating buzzer (Maurer et al., 2006). For the IBM Linux Watch, an OLED display was preferred over an LCD for readability, lower power consumption than a backlit LCD, and aesthetics. Yellow was chosen for the OLED due to its higher contrast than blue or green. Narayanaswami et al. (2002) found no significant difference whether graphics were presented in landscape or portrait format, but, for example, for a phone book the landscape mode was preferred, as fewer lines with more characters per line are easier to read. For particular applications, however, rather specific feedback options are conceivable. For example, Tamaki et al. (2009) used an earphone for audio feedback and a laser line/miniprojector to indicate the camera view in their ear-worn system.
22.4.2 Sensing Modalities
Sensors are a key asset of wearable computers, which not only process direct user input but also gather information from measurable phenomena. Various sensing modalities were integrated in garment-based and accessory-based wearable computers. Table 22.4 summarizes signals, modalities, and integration approaches used in the included projects.
Sensors measuring physiological parameters were frequently found in underclothes, as they require direct skin contact or skin proximity. Sensors to measure motion, activity, and environmental parameters could be placed on outer garments or accessories too, as long as a garment is tight-fitting (Harms et al., 2008). In garment-based systems, sensors were frequently attached (glued, sewn, fixed with Velcro straps) or integrated as yarn into the fabric itself, such as for textile electrodes. Since we review wearable computer projects, the covered sensor modalities and integration techniques are certainly not exhaustive. Nevertheless, the projects included here provide a useful summary of established sensing approaches.
Paradiso et al. (2005) knitted sensors into garments using stainless-steel yarn, including fabric ECG electrodes, and elongation sensors using piezoresistive yarns for measuring respiration as expansion and contraction of the thoracic and abdominal regions and for activity monitoring in sleeves. They used knitted electrodes at
TABLE 22.4
Overview of Sensing Modalities Found in the Wearable Computer Projects

Types of signal covered: respiration rate and respiration volume; body temperature; upper body movement; finger flexure; contact between fingertips; lower body movement; knee bending; foot movement; dynamic pressure of the heel and toes; flexion at the metatarsal-phalangeal joint; plantar flexion and dorsiflexion; overall body movement, fall detection, and gait analysis; height of the foot above the floor; absolute position; relative position to an object; ambient temperature (temperature probe); ambient humidity (humidity sensor); ambient light; ambient sound (microphone); CO/CO2 concentration.

References: a Paradiso et al. (2005); b Paradiso et al. (2008); c Anliker et al. (2004); d Rosso et al. (2010); e Di Rienzo et al. (2005); f Noury et al. (2004); g Asada et al. (2003); h Kim et al. (2008); i Wang et al. (2007); j Weber et al. (2007); k Malhi et al. (2012); l Bulling et al. (2009); m Cheng et al. (2010); n Stiefmeier et al. (2008); o Maurer et al. (2006); p Kim et al. (2005); q Kuroda et al. (2004); r Liu et al. (2008); s Weber et al. (2007); t Bamberg et al. (2008); u Paradiso (2002); v Ye et al. (2005); w Di Rienzo et al. (2005); x Rosso et al. (2010); y Bharatula et al. (2004); z Jarchi et al. (2014); aa Anliker et al. (2004); ab Stiefmeier et al. (2008); ac Curone et al. (2010); ad Noury et al. (2004).
temperature is rather low (Anliker et al., 2004). Bulling et al. (2009) measured the EOG with dry electrodes integrated into a goggle frame and compensated for motion artifacts using an accelerometer.
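One common way to use a motion reference for artifact removal is an LMS adaptive filter, which subtracts the portion of the signal that correlates with the accelerometer channel. The following sketch illustrates this general idea only; it is not the specific method of Bulling et al. (2009), and the function name and parameters are assumptions:

```python
import numpy as np

def lms_artifact_cancellation(eog, accel, mu=0.01, order=8):
    """Remove motion-correlated noise from an EOG signal using an LMS
    adaptive filter with an accelerometer channel as noise reference.

    Illustrative sketch only, not the algorithm of Bulling et al. (2009)."""
    w = np.zeros(order)                 # adaptive filter taps
    cleaned = np.zeros_like(eog, dtype=float)
    for n in range(len(eog)):
        # most recent `order` accelerometer samples, newest first,
        # zero-padded at the start of the recording
        x = accel[max(0, n - order + 1):n + 1][::-1]
        x = np.pad(x, (0, order - len(x)))
        noise_est = w @ x               # estimate of the motion artifact
        e = eog[n] - noise_est          # error = cleaned EOG sample
        w += 2 * mu * e * x             # LMS weight update
        cleaned[n] = e
    return cleaned
```

Because eye-movement activity is largely uncorrelated with body motion, the filter converges toward the motion-related component and leaves the EOG signal in the error term.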
Inertial sensors are widely used to monitor physical activity in garment-based and accessory-based systems. Examples include step and fall detection (Kim et al., 2008; Rosso et al., 2010), upper body posture and motion capture (Harms et al., 2008; Stiefmeier et al., 2008), finger motion (Kim et al., 2005), and physical activity monitoring (Anliker et al., 2004). Besides inertial sensors, shoe-based wearable systems typically include pressure and bend sensors directly integrated into insoles, such as piezoelectric strips (Paradiso, 2002; Bamberg et al., 2008) and bidirectional bend and capacitive sensors (Bamberg et al., 2008). Additional sensor modalities found in wearable computers include GPS and CO and CO2 concentration sensing (Curone et al., 2010), ultrawideband radar (Stiefmeier et al., 2008), and ambient temperature and humidity sensing (Rosso et al., 2010).
22.4.4 Wearability
Wearability and ergonomics have been key challenges since the first wearable computers, involving system size, shape, and weight (Gemperle et al., 1998; Knight and Deen-Williams, 2006). Wearing a garment-based or accessory-based wearable computer should alter neither the usual perception of a garment or accessory nor the wearer's usual posture and movement. Comfortable wearing, easy setup, and maintenance are concurrent requirements. Ideally, a user puts the system on in the morning, takes it off in the evening, and completely forgets about it in between except when actively using its functionality (DeVaul et al., 2001).
Knight et al. (2005) suggested that anthropometric data should be considered and compiled design choices for smart garments: (1) components should be flat but may consume a relatively large surface area (cf. recommendations of Gemperle et al., 1998) and (2) components should be located on the trunk at locations of minimal shape change when bending or moving. According to Knight et al. (2005), appropriate body regions include the upper chest, the upper back and shoulder region, and the hips. They reported that their initial shirt design was impractical during dressing and undressing, and thus changed to a vest design that closes on the chest. Bharatula et al. (2004) suggested that body areas often in contact with objects, for example, the undersides of the arms and hands, the back, the bottom, and the feet, should only contain seamlessly integrated components within the fabric's thickness. In a similar vein, Paradiso (2002) noted that at the shoe's outer side, components can be placed without constraining common movement. Textile electrodes, elongation sensors, and other embroidered or knitted transducers, wires, etc., are suitable components for comfortable wearable computers; however, textile components require properties similar to their fabric substrate, such as breathability and stretchability.
Functional yarns have been integrated into the fabric structure of garment-based systems, for example, to acquire ECG and respiration. However, garments cannot be entirely constructed from conductive fabric, as conductive regions may be too rigid and uncomfortable. Hence, metal threads are usually twisted around standard textile yarn. The amount of metal in a fabric is a compromise between the required electrical properties and the necessity to maintain mechanical behavior compatible with textile applications (Paradiso and Pacelli, 2011). Lukowicz et al. (2001) targeted a garment that is soft on the inside for comfortable wear and rigid on the outside for robustness, for example, to protect system components. They experimented with different material combinations. However, soft fabric garments may be difficult to put on and take off, as they neither maintain shape nor hold sensors in place (Kuroda et al., 2004).
Compared to garment-based wearable computers, miniaturization and integration of wearable computers into common accessories still appear to be open challenges. Current projects often made trade-offs toward a wearable design. Anliker et al. (2004) integrated various vital monitoring functions into their wrist-worn AMON system, thus easing integration into everyday life activities. However, this all-in-one design disfavors optimal sensor positioning. For example, acquiring ECG became a hard challenge for the AMON system, compared to chest-worn solutions. A trade-off between wearability and signal quality is frequently found in wearable computers. For example, in EOG goggles, signal amplification and A/D conversion are ideally performed at the electrodes, while weight and size considerations resulted in wiring the EOG electrodes to an upper arm processing unit. Due to the longer wires, the analog high-impedance EOG signals pick up an increased amount of noise (Bulling et al., 2009). The ring-based system of Asada et al. (2003) impressively shows miniaturization; however, it still remains larger than typical finger rings. Besides size, weight remains a critical design consideration. DeVaul et al. (2001) report a total weight of ~2 kg for the MIThril system. They addressed weight distribution using a zip-in vest design that balanced weight between the shoulders. While overall system size decreased in recent systems, weight is largely determined by powering needs, hence by the battery. Noury et al. (2004) placed larger and heavier system components, including battery, computing, and communication units, separately from the garment on a belt. Amft et al. (2004) used the belt to integrate the complete wearable computer, weighing ~350 g. Matsushita's (2001) headset system weighed 220 g, and the system of Bulling et al. (2009) weighed 188 g, with the goggles accounting for only 60 g. To minimize energy needs, dynamic context-aware power management appears to be an important topic for further research.
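As an illustration of the idea, a context-aware power manager might lower sensor sampling rates when the wearer is inactive. The policy, thresholds, and function name below are purely illustrative assumptions, not a design from any of the reviewed systems:

```python
def select_sampling_rate_hz(activity_level, base_rate_hz=50):
    """Pick a sensor sampling rate from a coarse activity context
    (0.0 = at rest, 1.0 = vigorous activity).

    Illustrative duty-cycling policy only; thresholds are assumptions."""
    if activity_level < 0.1:       # wearer at rest: sample rarely
        return base_rate_hz // 10
    if activity_level < 0.5:       # light activity: moderate rate
        return base_rate_hz // 2
    return base_rate_hz            # vigorous activity: full rate
```

Since battery mass dominates system weight, even a coarse policy like this can translate directly into a smaller battery for the same runtime.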
FIGURE 22.6 Example of the integration process using silicone depositing (a) and the final result (b) of the SMASH shirt. (Images courtesy of Holger Harms.)
FIGURE 22.7 Example of wear-out effects due to mechanical and chemical stress in the flex-print (a) and connector (b) of the QBIC belt computer. (Private images of the authors.)
Figure 22.7 shows examples of the line cracks leading to data and power connection
failures. The connectors were subsequently tinned to reduce the effect.
Foot-worn systems are particularly affected by mechanical stress due to foot
impact shocks and high accelerations during movement, requiring components to be
well attached or latched down. Paradiso (2002) covered the electronics board with a
protective Plexiglas shield. Sensors placed in insoles should be protected from abrasion, moisture, etc., which can be addressed by sealing and placing sensor insoles
beneath a regular insole in shoes (Paradiso, 2002).
22.4.7 Extensibility
Aiming at multipurpose wearable computers, DeVaul et al. (2001) suggested that systems should support the widest range of users and applications, which requires physical and functional reconfiguration options by design. In many projects, the system design was optimized for a particular purpose, hence reducing the need for extensibility. For multipurpose garment-based systems, extensibility primarily means adding further electronic components. Reconfiguration of the textile-integrated functions was not sufficiently considered, seemingly due to missing base-fabric technology.
The garment designs of Harms et al. (2008) and Curone et al. (2010) follow a modular architecture, where extensions such as adding sensors do not require modifying the central computing module. The vest design of Knight et al. (2005) included two originally unused pockets, intended to house future additions. Harms et al. (2008) deployed a hierarchical architecture consisting of three layers: terminals, gateways, and a central master, where gateways provide interfaces for sensors and peripherals. Hubs could extend the terminal count at a gateway to 127; thus, the system can be equipped with ~500 terminals in total.
Among the accessory-based wearable computers, Amft et al. (2004) addressed extensibility by providing access to the QBIC system bus inside the belt for additional peripheral devices. In addition, the buckle contained main and extension boards. The latter included peripheral and wireless interfaces that could be replaced depending on the application and the specific belt configuration.
22.4.8 Cost
Except for smartwatches and activity trackers, current garment-based and accessory-based wearable computers must still be seen as niche products. Market price, and hence production cost, is a key concern for adoption, in particular for mass-market garment and accessory products. Among the included projects, only a few considered cost aspects. A common approach was to build exclusively on off-the-shelf components and to minimize the total component count.
Toney et al. (2002) estimated that their suit could be mass-produced at $17–$20 for the integrated electronics. Wearable computers provide a rich design space to choose implementation options for cost-sensitive designs; however, they require dedicated component evaluations in the targeted environment or application when prior experience and performance data are missing. For example, Knight et al. (2005) considered several alternatives for heart-rate monitoring and selected an insert microphone and pressure bulb as the cheapest options. However, these components needed to be replaced with more expensive alternatives due to their sensitivity to motion artifacts.
Current garment-based wearable computers are frequently found in niches, where
higher market prices can be established. One main issue is the special production
process required for textile-integrated components. Accessory-based systems may
be less affected by production and technology-related risks, which enabled vendors
to successfully promote smartwatches.
22.4.9 Safety
As new functionality is added, wearable computers are disruptive to the classical uses of garments and accessories. While reported cases of accidents while wearing custom wearable computers exist, for example, in DeVaul et al. (2001), current safety considerations are still immature. Since wearable computers include electronic modules, wiring, batteries, etc., fundamental electrical safety considerations should be applied and possibly extended for the needs of wearable systems in the future.
Key factors affecting safety include the physical presence of the system and the cognitive load imposed on the wearer. Bulky and rigid designs affect physical presence and should be avoided. Similarly, wiring inside the outfit and contact insulation are essential for safe handling and to prevent failures. For example, Knight et al. (2005) sewed wire channels into their vest to pass leads through. Matsushita (2001) lowered the Bluetooth radio power of their headset system to reduce microwave irradiation into the user's head.
Cognitive load is best addressed by user interfaces that minimize disruptive interruptions and attention demands during operation. Suitable interruption moments could be determined by processing user context information. To further reduce safety concerns, wearable system prototypes and their user interfaces should be evaluated during long-term realistic trials with nonexperts wearing the system in their daily routine.
22.4.10 On-Body Computing
In the headset-based system of Matsushita (2001), step detection was performed directly on the system. The AMON wrist-worn device analyzed measurements online, including signal filtering, conversion of measured values into medical values, for example, for blood pressure and RR interval, and automated evaluations for emergency detection (Anliker et al., 2004). The QBIC system was used to run the CRNT streaming framework to recognize various activities (Bannach et al., 2008). Rosso et al. (2010) used decision-tree algorithms on a PDA attached to their sensor-embedded T-shirt to recognize worsening conditions and provide immediate wearer feedback, while computationally intensive algorithms were executed remotely. Curone et al. (2010) distributed signal processing and information extraction to the sensor level in their outer garment.
Distributed processing and information extraction in wearable computers reduce overall computational complexity and the amount of data to be communicated. Lukowicz et al. (2001) observed that many computationally intensive tasks are application-specific, for example, signal or image processing, and that general-purpose processors are not optimally suited for them. Their WearARM design consisted of a heterogeneous, distributed architecture with general-purpose and special-purpose subsystems. The latter included low-power DSPs to perform computations using a fraction of a general-purpose processor's time. Harms et al. (2008) distributed tasks onto a hierarchical network of different garment-integrated nodes. Terminals were equipped with an 8-bit microprocessor for sensor signal preprocessing and translation. Subsequently, gateways equipped with a 16-bit microcontroller concentrated data from several attached terminals and extracted features. Eventually, a central master unit processed the feature data for online classification using a nearest centroid classifier on a 16-bit microcontroller.
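Nearest centroid classification suits such constrained master nodes because training reduces to averaging and prediction to a handful of distance computations. The following NumPy sketch shows the technique itself; class and method names are illustrative, and this is not the implementation of Harms et al. (2008):

```python
import numpy as np

class NearestCentroidClassifier:
    """Minimal nearest centroid classifier, illustrating the kind of
    lightweight model that fits on a small microcontroller.

    Illustrative sketch, not the code of Harms et al. (2008)."""

    def fit(self, X, y):
        # one centroid (mean feature vector) per class
        self.classes_ = np.unique(y)
        self.centroids_ = np.array(
            [X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        # assign each feature vector to the class of the closest centroid
        d = np.linalg.norm(
            X[:, None, :] - self.centroids_[None, :, :], axis=2)
        return self.classes_[np.argmin(d, axis=1)]
```

On the target hardware, the same idea would typically be implemented with fixed-point arithmetic and squared distances to avoid the square root.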
Few accessory-based wearable computers used operating systems to expedite application development, abstract hardware, and reuse existing libraries and data processing frameworks. Both IBM's smartwatch and the QBIC belt system ran GNU/Linux. The EOG goggles ran FreeRTOS. For garment-based wearable computers, the design and implementation of a dedicated operating system, called GarmentOS, was proposed by Cheng et al. (2013).
ACKNOWLEDGMENT
The authors are thankful to Holger Harms for providing images. This work was
partially supported by the collaborative project SimpleSkin under contract with the
European Commission (#323849) in the FP7 FET Open framework.
REFERENCES
Amft, O., M. Lauffer, S. Ossevoort, F. Macaluso, P. Lukowicz, and G. Tröster (2004). Design of the QBIC wearable computing platform. In Proceedings of 15th IEEE International Conference on Application-Specific Systems, Architectures, and Processors (ASAP), Galveston, TX, pp. 398–410.
Amft, O., F. Wahl, S. Ishimaru, and K. Kunze (2015). Making regular eyeglasses smart. IEEE Pervasive Computing, in press.
Amma, C., M. Georgi, and T. Schultz (2013). Airwriting: A wearable handwriting recognition system. Personal and Ubiquitous Computing 18 (1), 191–203.
Anliker, U., J. A. Ward, P. Lukowicz, G. Tröster, F. Dolveck, M. Baer, F. Keita et al. (2004). AMON: A wearable multiparameter medical monitoring and alert system. IEEE Transactions on Information Technology in Biomedicine 8 (4), 415–427.
Asada, H. H., P. Shaltis, A. Reisner, S. Rhee, and R. C. Hutchinson (2003). Mobile monitoring with wearable photoplethysmographic biosensors. IEEE Engineering in Medicine and Biology Magazine 22 (3), 28–40.
Bamberg, S. J. M., A. Y. Benbasat, D. M. Scarborough, D. E. Krebs, and J. A. Paradiso (2008). Gait analysis using a shoe-integrated wireless sensor system. IEEE Transactions on Information Technology in Biomedicine 12 (4), 413–423.
Bannach, D., O. Amft, and P. Lukowicz (2008). Rapid prototyping of activity recognition applications. IEEE Pervasive Computing 7 (2), 22–31.
Bharatula, N. B., S. Ossevoort, M. Stäger, and G. Tröster (2004). Towards wearable autonomous microsystems. In Proceedings of Second International Conference on Pervasive Computing (PERVASIVE), Vienna, Austria, pp. 225–237.
Bulling, A., D. Roggen, and G. Tröster (2009). Wearable EOG goggles: Seamless sensing and context-awareness in everyday environments. Journal of Ambient Intelligence and Smart Environments 1 (2), 157–171.
Cheng, J., O. Amft, and P. Lukowicz (2010). Active capacitive sensing: Exploring a new wearable sensing modality for activity recognition. In Proceedings of Eighth International Conference on Pervasive Computing (PERVASIVE), Helsinki, Finland, pp. 319–336.
Cheng, J., P. Lukowicz, N. Henze, A. Schmidt, O. Amft, G. A. Salvatore, and G. Tröster (2013). Smart textiles: From niche to mainstream. IEEE Pervasive Computing 12 (3), 81–84.
Curone, D., E. L. Secco, A. Tognetti, G. Loriga, G. Dudnik, M. Risatti, R. Whyte, A. Bonfiglio, and G. Magenes (2010). Smart garments for emergency operators: The ProeTEX project. IEEE Transactions on Information Technology in Biomedicine 14 (3), 694–701.
DeVaul, R. W., S. J. Schwartz, and A. Pentland (2001). MIThril: Context-aware computing for daily life. Technical report, MIT Media Lab, Cambridge, MA.
Di Rienzo, M., F. Rizzo, G. Parati, G. Brambilla, M. Ferratini, and P. Castiglioni (2005). MagIC system: A new textile-based wearable device for biological signal monitoring. Applicability in daily life and clinical setting. In Proceedings of 27th Annual International IEEE EMBS Conference, Shanghai, China, pp. 7167–7169.
Dipietro, L., A. M. Sabatini, and P. Dario (2008). A survey of glove-based systems and their applications. IEEE Transactions on Systems, Man, and Cybernetics 38 (4), 461–482.
Gemperle, F., C. Kasabach, J. Stivoric, M. Bauer, and R. Martin (1998). Design for wearability. In Proceedings of Second International Symposium on Wearable Computers (ISWC), Pittsburgh, PA, pp. 116–122.
Harms, H., O. Amft, D. Roggen, and G. Tröster (2008). SMASH: A distributed sensing and processing garment for the classification of upper body postures. In Proceedings of Third International Conference on Body Area Networks (BodyNets), ICST, Brussels, Belgium.
Harms, H., O. Amft, and G. Tröster (2012). Does loose fitting matter? Predicting sensor performance in smart garments. In Proceedings of Seventh International Conference on Body Area Networks (BodyNets), ICST, Brussels, Belgium, pp. 1–4.
Jarchi, D., B. Lo, E. Ieong, D. Nathwani, and G.-Z. Yang (2014). Validation of the e-AR sensor for gait event detection using the Parotec foot insole with application to post-operative recovery monitoring. In Proceedings of 11th International Conference on Wearable and Implantable Body Sensor Networks (BSN), Zurich, Switzerland, pp. 127–131.
Kim, S. H., D. W. Ryoo, and C. Bae (2008). U-healthcare system using smart headband. In Proceedings of 30th Annual International IEEE EMBS Conference, Vancouver, British Columbia, Canada, pp. 1557–1560.
Kim, Y. S., B. S. Soh, and S.-G. Lee (2005). A new wearable input device: SCURRY. IEEE Transactions on Industrial Electronics 52 (6), 1490–1499.
Knight, J. and D. Deen-Williams (2006). Assessing the wearability of wearable computers. In Proceedings of 10th IEEE International Symposium on Wearable Computers (ISWC), Montreux, Switzerland, pp. 75–82.
Knight, J. F., A. Schwirtz, F. Psomadelis, C. Baber, H. W. Bristow, and T. N. Arvanitis (2005). The design of the SensVest. Personal and Ubiquitous Computing 9 (1), 6–19.
Kuroda, T., Y. Tabata, A. Goto, H. Ikuta, and M. Murakami (2004). Consumer price data-glove for sign language recognition. In Proceedings of Fifth International Conference on Disability, Virtual Reality and Associated Technologies (ICDVRAT), Oxford, U.K., pp. 253–258.
Lee, S., B. Kim, T. Roh, S. Hong, and H.-J. Yoo (2010). Arm-band type textile-MP3 player with multi-layer Planar Fashionable Circuit Board (P-FCB) techniques. In Proceedings of 14th IEEE International Symposium on Wearable Computers (ISWC), Seoul, South Korea, pp. 1–7.
Liu, J., T. E. Lockhart, M. Jones, and T. Martin (2008). Local dynamic stability assessment of motion impaired elderly using electronic textile pants. IEEE Transactions on Automation Science and Engineering 5 (4), 696–702.
Lizzy (1993). Lizzy: MIT's wearable computer design. http://www.media.mit.edu/wearables/lizzy/lizzy/index.html. Last accessed: July 22, 2014.
Lukowicz, P., U. Anliker, G. Tröster, S. J. Schwartz, and R. W. DeVaul (2001). The WearARM modular, low-power computing core. IEEE Micro 21 (3), 16–28.
Malhi, K., S. C. Mukhopadhyay, J. Schnepper, M. Haefke, and H. Ewald (2012). A Zigbee-based wearable physiological parameters monitoring system. IEEE Sensors Journal 12 (3), 423–430.
Matias, E., I. S. MacKenzie, and W. Buxton (1994). Half-QWERTY: Typing with one hand using your two-handed skills. In Companion Proceedings of the Conference on Human Factors in Computing Systems (CHI), Boston, MA, pp. 51–52.
Matsushita, S. (2001). A headset-based minimized wearable computer. IEEE Intelligent Systems 16 (3), 28–32.
Maurer, U., A. Rowe, A. Smailagic, and D. P. Siewiorek (2006). eWatch: A wearable sensor and notification platform. In Proceedings of International Workshop on Wearable and Implantable Body Sensor Networks (BSN), Cambridge, MA, pp. 142–145.
Narayanaswami, C., N. Kamijoh, M. Raghunath, T. Inoue, T. Cipolla, J. Sanford, E. Schlig et al. (2002). IBM's Linux watch: The challenge of miniaturization. IEEE Computer 35 (1), 33–41.
Noury, N., A. Dittmar, C. Corroy, R. Baghai, J. Weber, D. Blanc, F. Klefstat, A. Blinovska, S. Vaysse, and B. Comet (2004). A smart cloth for ambulatory telemonitoring of physiological parameters and activity: The VTAMN project. In Proceedings of Sixth International Workshop on Enterprise Networking and Computing in Healthcare Industry (Healthcom), Odawara-shi, Japan, pp. 155–160.
Paradiso, J. A. (2002). Footnotes: Personal reflections on the development of instrumented dance shoes and their musical applications. In Proceedings of International Conference on New Interfaces for Musical Expression (NIME), Dublin, Ireland, pp. 34–49.
Paradiso, R., A. Alonso, D. Cianflone, A. Milsis, T. Vavouras, and C. Malliopoulos (2008). Remote health monitoring with wearable non-invasive mobile system: The HealthWear project. In Proceedings of 30th Annual International IEEE EMBS Conference, Vancouver, British Columbia, Canada, pp. 1699–1702.
618
Paradiso, R., G. Loriga, and N. Taccini (2005). A wearable health care system based on knitted
integrated sensors. IEEE Transactions on Information Technology in Biomedicine 9 (3),
337344.
Paradiso, R. and M. Pacelli (2011). Textile electrodes and integrated smart textile for reliable
biomonitoring. In Proceedings of 33rd Annual International IEEE EMBS Conference,
Boston, MA, pp. 32743277.
Park, I.-K., J.-H. Kim, and K.-S. Hong (2008). An implementation of an FPGA-based
embedded gesture recognizer using a data glove. In Proceedings of Second International
Conference on Ubiquitous Information Management and Communication (ICUIMC),
Suwon, Korea, pp. 496500.
Rosso, R., G. Munaro, O. Salvetti, S. Colantonio, and F. Ciancitto (2010). CHRONIOUS:
An open, ubiquitous and adaptive chronic disease management platform for chronic
obstructive pulmonary disease (COPD), chronic kidney disease (CKD) and renal
insufficiency. In Proceedings of 32nd Annual International IEEE EMBS Conference,
Buenos Aires, Argentina, pp. 68506853.
Starner, T. (1993). The cyborgs are coming, or, the real personal computers. Technical Report
TR 318, MIT. Written for Wired (unpublished). Obsolete by: TR355 Feb. 1994; original
text Nov. 1993; images June 1995.
Stiefmeier, T., D. Roggen, G. Ogris, P. Lukowicz, and G. Trster (2008). Wearable activity
tracking in car manufacturing. IEEE Pervasive Computing 7 (2), 4250.
Tamaki, E., T. Miyaki, and J. Rekimoto (2009). Brainy hand: An ear-worn hand gesture interaction device. In CHI Extended Abstracts, Boston, MA, pp. 42554260.
Thorp, E. O. (1998). The invention of the first wearable computer. In Proceedings of Second
International Symposium on Wearable Computers (ISWC), Pittsburgh, PA, pp. 48.
Toney, A., B. Mulley, B. H. Thomas, and W. Piekarski (2002). Minimal social weight user
interactions for wearable computers in business suits. In Proceedings of Sixth IEEE
International Symposium on Wearable Computers (ISWC), Seattle, WA, pp. 5764.
Wang, L., B. Lo, and G.-Z. Yang (2007). Reflective photoplethysmograph earpiece sensor for
ubiquitous heart rate monitoring. In Proceedings of Fourth International Workshop on
Wearable and Implantable Body Sensor Networks (BSN), Aachen, Germany, pp.179183.
Wang, R. Y. and J. Popovic (2009). Real-time hand-tracking with a color glove. ACM
Transactions on Graphics (SIGGRAPH 2009), 28(3).
Weber, J.-L., Y. Rimet, E. Mallet, D. Ronayette, C. Rambaud, C. Terlaud, Y. Brusquet etal.
(2007). Evaluation of a new, wireless pulse oximetry monitoring system in infants:
The BBA bootee. In Proceedings of Fourth International Workshop on Wearable and
Implantable Body Sensor Networks (BSN), Aachen, Germany, pp. 143148.
Weiser, M. (1991). The computer for the 21st century. Scientific American International
Edition 265 (3), 6675.
Wilhelm, F. H., W. T. Roth, and M. A. Sackner (2003). The LifeShirt: An advanced system
for ambulatory measurement of respiratory and cardiac function. Behav Modif 27 (5),
671691. DOI: 10.1177/0145445503256321.
Ye, W., Y. Xu, and K. K. Lee (2005). Shoe-Mouse: An integrated intelligent shoe. InProceedings
of IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS),
Edmonton, Canada, pp.11631167.
23 E-Textiles in the Apparel Factory: Leveraging Cut-and-Sew Technology toward the Next Generation of Smart Garments
Lucy E. Dunne, Cory Simon, and Guido Gioberto
CONTENTS
23.1 Introduction................................................................................................... 620
23.2 Background: Textile Integration Strategies................................................... 621
23.2.1 Advanced Electronic-Textile Manufacturing.................................... 621
23.2.2 Surface Attachment and Conductor Integration................................ 621
23.2.3 PCB Attachment and Encapsulation................................................. 621
23.3 Background: Stitching Technologies............................................................. 623
23.3.1 Single-Needle Machines.................................................................... 623
23.3.2 Multineedle Machines....................................................................... 624
23.3.3 CNC Machines.................................................................................. 625
23.4 Routing Interconnects in Garments............................................................... 627
23.4.1 Pattern and Marker Layout................................................................ 627
23.4.2 Production Design and Order of Operations..................................... 628
23.4.3 Seam Crossing Methods.................................................................... 629
23.4.4 Trace Crossings.................................................................................. 631
23.5 Component Attachment................................................................................. 632
23.5.1 Through-Hole Components and Crimped Methods.......................... 632
23.5.2 Surface-Mount Components and Reflow Methods............................ 633
23.6 Textile Components....................................................................................... 634
23.6.1 Stitched Stretch and Bend Sensors.................................................... 634
23.6.2 Sensor Insulation and Durability....................................................... 636
23.7 Conclusion..................................................................................................... 637
References............................................................................................................... 638
23.1 INTRODUCTION
The last 15–20 years have seen many significant advances in the development of
on-body technologies for sensing, interface, and communication. The state of the art
in areas like human–computer interaction, signal processing, and context awareness
has progressed dramatically since the first crude prototypes. However, most of this
progress has been driven by the engineering community. Perhaps consequently, the development of manufacturing techniques has similarly focused on leveraging techniques and technologies well-established in the production of small electronic
devices. Aside from being a well-known and often convenient method of producing
technology, in many ways this is a useful focus: electronic components and circuits
benefit significantly from the structure of hard goods. Durability and reliability are
dramatically improved when the electronic part of a system is well-structured and
insulated, protected from wear and tear as well as moisture and other contaminants.
Unfortunately, many of the qualities of hard goods that benefit electronic technologies are at odds with qualities that promote human comfort. Further, a large number
of body-worn devices can require excessive upkeep, maintenance, and memory on the
part of the user. On the other hand, reliance on encapsulating electronics in a single
unit significantly limits the body areas that can be accessed by a wearable technology,
and forces things like sensors to be confined to a very limited number of physical locations. In an application like activity monitoring, the number and specificity of activities
that can be sensed using a single measurement point (like the torso) is far smaller than
what can be achieved using sensors scattered over the body surface (including limbs,
etc.). Integrating electronics into garments can allow body access without requiring
the tending of a large number of body-worn units, and things like power and networking can be simplified. Finally, many exciting applications of wearable technology lie
in the realm of augmenting the functions of clothing, rather than distributing the functions of communications and information technologies over the body.
All of these factors lead to interesting potential for garment-integrated technologies.
However, perhaps one of the largest barriers to successful garment integration of electronics is manufacture. While the electronics industry is highly automated, with most
assembly being done by machine, the apparel industry is still a very traditional industry that relies heavily on manual labor. Due to the extremely short product cycle (with
new products introduced at intervals of 3 months or shorter), it is an industry often
without the ability to invest in larger-scale changes to processes, skill sets, or timelines.
However, there remains significant potential to work within the existing structures, technologies, and methods used in apparel production. In many ways, adapting
the existing infrastructure to the production of smart garments may have distinct
benefits, especially in the near term. Here, we discuss some of the basic technologies
of apparel production, with a focus on the most common type of apparel factory, the
Cut-Make-Trim (CMT) factory. CMT operations are responsible for some combination of cutting textile goods, assembling a garment or other product, and applying
finishings such as trims, tags, or other embellishments. The exact capabilities vary
from factory to factory, and may include things like dyeing and printing as well as
front-end services like design, sourcing, and patternmaking, but core capabilities are
centered around cutting, sewing, pressing, and low-tech finishing processes.
Linz et al. have used a similar approach to attach PCBs by machine using a CNC
(Computer Numerical Control, or programmable) embroidery machine (Linz et al.
2005). In this process, the embroidery machine lays out a registration stitch to mark
the PCB location and then stitches electrical connections to the board, which pass
over and around the PCB through-holes.
However, the interface between a soft textile and the rigid edges of a PCB can create
problematic stress points on the conductive stitching. These points are the most common failure points for a textile-integrated circuit. To alleviate stress on the conductor
(and to simultaneously protect the PCB from moisture, wear, and contaminants), textile-integrated PCBs are often encapsulated in an impermeable coating, as shown in Figure
23.2. Depending on the size of the PCB/component, this could be as small as a glob-top-type coating or as large as a molded encapsulation (Kirstein 2013; Linz et al. 2005).
much more secure than a chain stitch, and unravels slowly by pulling the two pieces
of fabric apart. The lockstitch is perhaps the most common sewing operation, used
to make body seams as well as finishing seams, trim attachments, and many other
operations. Its biggest drawback is that it does not stretch (nor does a chain stitch)
and therefore cannot be used to form extensible seams (such as in stretchy materials).
As seen in Figure 23.4, the interlock between the bobbin and needle thread is
typically located in the middle of the fabric piece or seam. This reduces wear and
tear on the twist and creates a stitch that is tight and has little slack. However, the
location of this interlock can pull to one side of the fabric or the other, depending on
the tension in each thread.
[Figure diagrams: coverstitch machine (Needle 1, Needle 2, Looper) and overlock machine (inner needle, outer needle, upper looper, lower looper, fabric edge).]
23.3.3 CNC Machines
Progressive stitches are most commonly formed by lifting the machine's presser foot
(which presses the fabric to the bed of the machine) slightly between stitches and
moving the fabric using the machine's feed dogs (sawtooth strips located beneath the
presser foot). Because of this, most sewing machines form a stitch that progresses
linearly, perpendicular to the orientation of the machine and parallel to the orientation of the feed dogs. Placing stitches in patterns other than a straight line often relies
on the operator moving the fabric as stitches are formed. For e-textiles, it is often
important to be able to sew more complex patterns automatically, without relying on
operator skill.
To form stitches in other directions, either the needle or the fabric must move.
Many lockstitch machines (especially home sewing machines) have the ability to
move the needle position laterally within about 1/4 in. to 1/2 in. This can be used to form
simple zig-zag stitches, or to form more complex stitch patterns, in combination with
fabric movement via the feed dogs.
Importantly, changing the direction of the stitch can have a significant impact
on the tension of the stitch. Tension is controlled by a complex series of spring-mounted plates, and mechanical take-up in the loose thread. Because it is calibrated
to dispense a given amount of thread per stitch, when the stitch travels laterally the
relationship between linear distance and thread consumption changes. If the stitch
becomes unbalanced, the top or bottom thread may wrap to the opposite side of the
fabric, pulled by a more dominant tension in the opposite thread. The lower the ratio
between the width of the stitch and the length of the stitch, the more different the
tension balance will be versus a straight lockstitch.
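The geometric effect can be illustrated with a simple calculation (a sketch under our own assumptions; the flat two-dimensional model and the example numbers are not from the text): the thread consumed per stitch grows with the lateral throw, so a tension mechanism calibrated to dispense thread for straight stitching falls progressively out of balance as the stitch widens.

```python
import math

def thread_per_stitch(length_mm: float, width_mm: float) -> float:
    """Approximate thread consumed by one zig-zag stitch as the hypotenuse
    of its forward travel and lateral throw (ignores fabric thickness and
    the interlock geometry)."""
    return math.hypot(length_mm, width_mm)

def consumption_ratio(length_mm: float, width_mm: float) -> float:
    """Thread use relative to a straight stitch of the same forward travel."""
    return thread_per_stitch(length_mm, width_mm) / length_mm

# A narrow zig-zag barely changes thread consumption...
print(round(consumption_ratio(2.5, 0.5), 3))   # ~1.020
# ...while a wide covering stitch with little forward travel consumes
# several times more thread per stitch, unbalancing the tension system.
print(round(consumption_ratio(0.5, 4.0), 3))   # ~8.062
```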
CNC embroidery machines, which are used to sew embellishments like logos
and graphics, use short zig-zag lockstitches (with very low width-to-length ratios)
to cover large areas with thread. Each color is prethreaded on an individual needle
(to avoid the need to rethread the machine), and the active needle changes as each
color is stitched, each needle interacting with the same bobbin thread. The zig-zag
creates coverage of a specific color, while CNC-controlled movement of the fabric
places this covering stitch in the appropriate place on the fabric. While embroidery
machines do also sew straight stitches, they are optimized for covering stitches.
Hence, the bobbin thread is typically much heavier than the needle threads, and the
tension balance is tighter on the bobbin than the needle thread, which causes the
needle thread to wrap to the back of the fabric. However, in an embroidery pattern,
the face of the fabric is most important and the appearance of the back is not taken
into consideration. (Embroideries are almost never weight-bearing, so durability is
not a significant issue.) Therefore, an imbalance in the stitch tension ensures that
the needle thread covers fully (and often wraps to the back of the fabric), creating
uniform color fill.
By contrast, pattern sewing machines typically have only one needle and one
bobbin to create a straight lockstitch, which is used to create nonlinear patterns by
moving only the fabric, and rarely the needle. While the stitch balance may still
be slightly tighter on the bobbin thread to ensure full coverage, it is much closer to
balanced than what is commonly seen in embroidery stitches. Pattern sewers are used
to sew reinforcement patterns or more linear designs. As these may be load-bearing,
the structural integrity of the stitch is much more important than in embroidery.
Both embroidery machines and pattern sewers (and increasingly, other machine
types) often have the ability to also trim the thread at the end of an operation, sometimes combined with an automatic back-tack stitch that reverses the stitch direction
for a few stitches in order to lock the trailing threads against force-induced separation. For embroidery machines, this feature is commonly used when switching colors. However, since embroidery stitches are rarely back-stitched (a backstitch would
create a buildup of thread), the machine is often designed to leave a much longer
thread tail than would be present on other kinds of trimmed stitches. The long tail is
then caught up underneath subsequent stitches, preventing the stitch from unraveling.
FIGURE 23.7 (a) A garment pattern piece showing match stripes and (b) a marker.
or embellishments are marked by drilling through the fabric (in cases where the hole
will be covered by the embellishment), or by marking with chalk (in cases where a
hole would be visible). For garments cut from fabrics with patterns that must match
across seams, a match stripe is placed on each piece.
Garments are most commonly cut in batches, and all of the pieces to be cut from
a given textile are cut at once. These may or may not be all of the pieces in a garment (depending on how many different fabrics are used in the garment), nor must
they be all from the same garment. Usually, many sizes of a garment are cut at once,
and sometimes pieces of different garments that use the same textile may be mixed.
The more piece shapes to be cut at once, the better the yield of the fabric (waste is
minimized), as the pieces can usually be packed more tightly. Since fabrics are cut
many plies at a time, waste can add up quickly. Garments cut from textiles with large
repeating patterns (such as plaids) usually have a much lower fabric yield, as the
placement of match points is a significant constraint on the efficiency of cutting.
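The yield relationship can be made concrete with some illustrative arithmetic (the function and numbers below are hypothetical, not from the text): yield is the fraction of the marker rectangle covered by pieces, and waste multiplies across every ply in the spread.

```python
def marker_yield(piece_areas_cm2, marker_length_cm, fabric_width_cm, plies=1):
    """Fraction of the spread fabric that ends up in cut pieces.
    Waste per ply is everything inside the marker rectangle not covered
    by a piece; total waste scales with the number of plies."""
    used = sum(piece_areas_cm2)
    available = marker_length_cm * fabric_width_cm
    return used / available, (available - used) * plies

# Hypothetical batch: the same eight pieces packed into two markers.
pieces = [4500] * 8                      # cm^2 per piece
loose = marker_yield(pieces, 400, 150, plies=40)
tight = marker_yield(pieces, 350, 150, plies=40)
print(f"loose marker: {loose[0]:.0%} yield, {loose[1] / 10_000:.0f} m^2 wasted")
print(f"tight marker: {tight[0]:.0%} yield, {tight[1] / 10_000:.0f} m^2 wasted")
```

Packing the same pieces into a 50 cm shorter marker raises yield from 60% to about 69% and, over 40 plies, saves roughly 30 m² of fabric.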
The cutting process begins by laying a plan for cutting, called a marker (Figure
23.7b). The marker can be planned by hand or by computer, and can be printed/drawn
on paper or fed directly to a CNC cutting machine. (It should be noted that paper
markers cut by hand are more common.) Pieces must be laid into the marker according to their orientation requirements: for almost all knit and woven fabrics, pieces
have a clear grainline and must be placed so that they are appropriately oriented with
the grain. Match points for repeating patterns must also be taken into account in the
marker layout. Markers are usually drawn 1 in. shorter than the full width of the fabric,
so that the edge can be cut away and discarded. For conventional fabrics, this is for
several reasons: (1) the edge of many fabrics is stabilized and held under tension in the
loom while the fabric is woven, and has a very different hand than the rest of the fabric; (2) textiles are stored and shipped in rolls and the top and bottom may be exposed
to much more wear and tear than the interior of the fabric, and consequently may be
dirty or damaged; and (3) the textile production process may introduce irregularities
into the width of the fabric (most commonly from irregular tension in rolling the
fabric onto bolts). However, for e-textiles, this means that conductors or components
cannot be routed around the edge of the fabric, or they may be severed during cutting.
The fabric is spread on a cutting table to match the marker in length and number
of plies. In the case of a paper marker, the marker is placed on top, and the pieces are
cut using a reciprocating knife or rotary blade. In the case of a CNC cutting machine,
pieces are cut using a laser or reciprocating knife. Cut pieces are removed from the
cutting table and tied into bundles, which are then delivered to the production line.
(such as to insert a tubular sleeve into a tubular opening). However, there are structural requirements for order of operations as dictated by the geometry and design
of the garment. It may be impossible to perform certain operations before another
operation has taken place (e.g., the garment front and back must be attached before a
collar can be attached). The inverse is true as well: it may be impossible to perform
certain operations once others have been performed (e.g., once a sleeve is tubular, it
may be difficult or impossible to stitch an embellishment or trim down the length of
the sleeve due to the orientation of the machine bed).
For e-textiles, these requirements pose interesting challenges. If a designer intends
a stitched embellishment to progress from a collar onto the body of a garment and
down the sleeve, it is feasible using notches and placement marks to pre-embroider
each piece before the garment is assembled. However, if that embellishment conducts electricity, it must achieve electrical integrity as well as aesthetic integrity as
it crosses seams.
again, causing each point on the thread to pass through the needle (under tension)
up to 70 times before it is embedded in the fabric. This exerts significant wear and
tear on the thread, and can cause thicker or more brittle conductive threads to fray
and break. By contrast, the bobbin thread is fed more or less continuously to the
fabric, with little back-and-forth motion.
Stitching allows traces to be placed in almost any orientation and position on a
garment piece or fully assembled garment. Most stitched processes use uninsulated
conductors, and therefore seam crossings are subject to the methods described previously. As opposed to methods using knitted or woven conductors, stitched methods
can cross preformed seams with ease. However, as previously noted, the need for a
seam-crossing conductor must be taken into account during production planning.
23.4.4Trace Crossings
While in some products only one or two connections may be needed in the garment
itself, for garments using a central processor to control a larger array of peripherals,
it may be necessary for traces to cross in the garment. In a PCB, this is done by using
multiple board layers to allow traces to travel under and over each other. In fabric,
many layers quickly get bulky. Insulated conductors can be woven in orthogonal patterns, but this is significantly limiting for component placement unless interconnects
can be made between woven conductors. Intarsia processes can lay conductors into a
textile in more complex patterns, but may not be possible with insulated conductors.
Stitched processes are another method of allowing traces to cross without shorting, using a sewing machine. In practice, this process ends up similar to a couching
technique used in embroidery, in which an embellishment is stitched to the surface
of a textile using loops of finer thread.
Because lockstitch machines use two threads to form a stitch, one thread can be
used to couch a conductive thread to one surface of the textile. This relies on adjusting the tension balance of the two threads (shown in Figure 23.4), such that one
floats on the surface of the textile while the other is pulled through from one side
to the other. Provided that the textile is nonconductive, it can serve as the insulator
between conductors on each side. In this way, conductors can be stitched on either
side of the fabric.
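The routing constraint can be phrased simply: two uninsulated traces may cross only where they are couched to opposite faces of the nonconductive fabric. A small sketch of a layout check (the data structures and trace names are hypothetical, not from the text):

```python
# Each trace: name -> (face, list of (x, y) grid points it passes through).
# Assumption: uninsulated traces short wherever they occupy the same grid
# point on the same face; the fabric insulates between the two faces.

def find_shorts(traces):
    occupied = {}          # (face, point) -> name of trace already there
    shorts = []
    for name, (face, points) in traces.items():
        for pt in points:
            key = (face, pt)
            if key in occupied and occupied[key] != name:
                shorts.append((occupied[key], name, pt))
            occupied[key] = name
    return shorts

layout = {
    "power":  ("top",    [(0, 0), (1, 0), (2, 0)]),
    "signal": ("bottom", [(1, -1), (1, 0), (1, 1)]),  # crosses power safely, under the fabric
    "ground": ("top",    [(2, 1), (2, 0)]),           # touches power on the same face!
}
print(find_shorts(layout))  # -> [('power', 'ground', (2, 0))]
```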
As with PCBs, e-textile circuits often require interconnections between traces
on different layers. For insulated conductors (particularly those that pass directly
under and over each other with no textile or yarn layer between), interconnects can
be formed using methods previously discussed in this chapter. For uninsulated conductors, connections may need to be made using techniques similar to those used for
PCBs, by creating a perpendicular interconnection like a via or through-hole. In the
CMT factory, such via connections can be made using crimped processes, in which a
connector with metallic legs is inserted through the fabric, and the legs are clamped
around the fabric from the reverse side (see a stud example in Figure 23.8). In apparel
production there are many instances of crimped components, including things like
snap fasteners, studs, and zipper stops. As such, the attachment of crimped fasteners
is a well-established practice, and indeed specialized hand tools and simple machinery exist for this purpose.
FIGURE 23.9 Example of a multilayer crimped-DIP stitched circuit, showing stitched pads, top trace, fabric insulator, bottom trace, and stitched placement pad. (From Dunne, L.E. et al., Multi-layer E-textile circuits, in Proceedings of the ACM International Conference on Ubiquitous Computing (UbiComp), Pittsburgh, PA, 2012.)
connection to form a solid electrical connection and to ensure the security of the
mechanical connection. Second, the stitch pattern must be designed such that there
is sufficient surface area of conductive thread to which the lead can be crimped.
To ensure good coverage of the stitched conductor, a lateral covering stitch pattern provides more surface area than a running stitch. However, as discussed earlier,
lateral covering stitches often result in an imbalance in tension between the bobbin
and the needle threads. For a conductive stitch pattern this can be difficult to manage: either the conductive bobbin is contracted as the needle thread wraps to the
opposite side, reducing the amount of surface area covered, or the conductive bobbin
wraps to the needle side, potentially shorting with opposite-side stitches. Further,
if a single machine performs both running stitch trace layout and covering stitch
pad layout, it is unlikely that one tension setting could be successfully used for both
operations. Programmable tension settings on sewing machines are not a common
feature, making this a significant problem.
FIGURE 23.10 (a) Surface-mount IC package soldered to stitched conductors and (b) reflow-soldered connection.
some limitation on the minimum spacing of conductor layout, relative to the diameter
of the integrated conductor. Particularly for stitched methods, very fine conductors are
often too weak to be used in standard sewing machines. To date we have successfully
integrated multipin SOIC (small outline integrated circuit) packages and SMD packages down to the 0402 size using stitched conductors and reflow soldering techniques.
orthogonal to the direction of knitting and its location and configuration must be
preplanned at the textile development stage.
By contrast, stitched methods can be used to place a looped-conductor sensor anywhere on the garment surface, in any orientation. Many industrial sewing
machines can be used to form looped-conductor sensors: any machine that uses a
looper to form a looping or sinusoidal thread pattern is a candidate. To form the sensor, an uninsulated conductive yarn is used in place of one or more looping threads.
Depending on the geometry of the loop and the mechanics of its deformation, shorts
formed by self-intersections of the conductor cause a shortening or lengthening of
the electrical path as the stitch is deformed.
Figure 23.11 shows the electrical equivalent model of three stitch structures used
to create stretch sensors using cover stitch and overlock machines. In the case of the
bottom-cover thread, loops slide toward each other and the electrical path shortens
as the stitch is stretched. In the case of the overlock and top-cover threads, loops
separate as the stitch is stretched, resulting in an increase in resistance as the electrical path lengthens.
FIGURE 23.11 Three stitched stretch sensors with equivalent electrical circuits.
Figure 23.12 shows the relative responses for sensors of similar length and the
same stimulus across the three types described earlier. Because the loops of the top-cover and overlock sensors will all eventually be separated, their response saturates.
By contrast, the bottom-thread sensor loops will continue to deform until the fabric
tears. Therefore, there is no saturation region in the bottom-thread response.
FIGURE 23.12 Relative responses to elongation for three types of stitched stretch sensor (bottom-thread coverstitched, top-thread coverstitched, and overlock sensors; x-axis: time, s).
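A minimal piecewise model captures the two behaviors (an illustrative sketch under our own assumptions; the resistance values and gains are arbitrary, not measured data from the text): the separating-loop sensors rise and then saturate once every loop is pulled apart, while the shortening-path sensor falls without saturating.

```python
def separating_sensor_r(strain, r0=30.0, gain=120.0, sat_strain=0.35):
    """Top-cover/overlock-style sensor: loops separate with strain, so
    resistance rises, then saturates once all loops are fully apart."""
    return r0 + gain * min(strain, sat_strain)

def shortening_sensor_r(strain, r0=70.0, gain=110.0):
    """Bottom-cover-style sensor: loops slide together with strain,
    shortening the electrical path; no saturation before the fabric tears."""
    return max(r0 - gain * strain, 0.0)

for s in (0.0, 0.2, 0.4, 0.6):
    print(f"strain {s:.1f}: separating {separating_sensor_r(s):5.1f} ohm, "
          f"shortening {shortening_sensor_r(s):5.1f} ohm")
```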
The same stitch structures can be used to detect bend. Because the top- and
bottom-cover threads of a coverstitch are electrically isolated by the fabric between
them, a method such as that used by Tognetti et al. (2014) can be used to differentiate
bend from stretch by comparing the responses of two sensors coupled on opposite
sides of the same substrate. However, we have found that deformation of a single sensor on one side of the fabric also produces an analog resistance response to bending
(Gioberto et al. 2013). This is likely due to forces on the yarn bringing individual
fibers closer together and farther apart as the structure is bent.
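The differential comparison can be sketched as follows (a toy decomposition in the spirit of that comparison, not Tognetti et al.'s actual method; the symmetric, linear sensor assumption and the numbers are ours): pure stretch moves both resistances together, while bending moves them in opposite directions, so sum and difference separate the two.

```python
def decompose(r_top, r_bottom, r_rest=50.0):
    """Separate common-mode (stretch) from differential (bend) response of
    two coupled sensors on opposite faces of the same substrate. Stretch
    raises both resistances together; bending stretches the outer face and
    compresses the inner one, moving the two in opposite directions."""
    stretch_signal = (r_top + r_bottom) / 2 - r_rest
    bend_signal = (r_top - r_bottom) / 2
    return stretch_signal, bend_signal

print(decompose(58.0, 58.0))  # stretched flat -> (8.0, 0.0)
print(decompose(56.0, 44.0))  # bent, no net stretch -> (0.0, 6.0)
```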
FIGURE 23.13 Stitched bend sensor insulated with fusible polyurethane film.
23.7 CONCLUSION
Garment integration of wearable technologies has significant advantages in comfort,
aesthetics, and usability. Further, it expands the range of body areas that can be
accessed by worn technologies. However, textile integration of electronics requires
adaptation of production processes for successful manufacturing, particularly at
larger scales. Potential exists in both directions: in adapting the processes standard
to electronics fabrication and in adapting the processes standard to apparel production. The tools and techniques used to produce sewn products have some interesting
advantages for soft goods and can be effective in making the production of e-textiles
and wearable technology more feasible for sewn product manufacturers.
REFERENCES
Berglund, M. E., J. Coughlin, G. Gioberto, and L. E. Dunne. 2014. Washability of E-textile stretch
sensors and sensor insulation. In Proceedings of the 2014 ACM International Symposium
on Wearable Computers (ISWC '14), Seattle, WA, pp. 127–128. New York: ACM.
Bhattacharya, R., L. van Pieterson, and K. van Os. 2012. Improving conduction and mechanical reliability of woven metal interconnects. IEEE Transactions on Components,
Packaging and Manufacturing Technology 2 (1): 165–168.
Bonderover, E. and S. Wagner. 2004. A woven inverter circuit for e-textile applications. IEEE
Electron Device Letters 25 (5): 295–297.
Buechley, L. and M. Eisenberg. 2007. Fabric PCBs, electronic sequins, and socket buttons:
Techniques for E-textile craft. Personal and Ubiquitous Computing 13 (2): 133–150.
Dunne, L. E., K. Bibeau, L. Mulligan, A. Frith, and C. Simon. 2012. Multi-layer E-textile
circuits. In Proceedings of the ACM International Conference on Ubiquitous Computing
(UbiComp), Pittsburgh, PA.
Gioberto, G., J. Coughlin, K. Bibeau, and L. E. Dunne. 2013. Detecting bends and fabric folds
using stitched sensors. In Proceedings of the 17th International Symposium on Wearable
Computers, Zurich, Switzerland.
Kirstein, T. 2013. Multidisciplinary Know-How for Smart-Textiles Developers, 1st edn.
Oxford, U.K.: Woodhead Publishing.
Lee, J. B. and V. Subramanian. 2005. Weave patterned organic transistors on fiber for E-textiles.
IEEE Transactions on Electron Devices 52 (2): 269–275.
Lehn, D., C. Neely, K. Schoonover, T. Martin, and M. Jones. 2004. E-TAGs: E-textile
attached gadgets. In Proceedings of Communication Networks and Distributed Systems:
Modeling and Simulation, San Diego, CA.
Li, L., W. M. Au, Y. Li, K. M. Wan, S. H. Wan, and K. S. Wong. 2010. Design of intelligent
garment with transcutaneous electrical nerve stimulation function based on the intarsia
knitting technique. Textile Research Journal 80 (3): 279–286.
Linz, T., C. Kallmayer, R. Aschenbrenner, and H. Reichl. 2005. Embroidering electrical
interconnects with conductive yarn for the integration of flexible electronic modules
into fabric. In Proceedings of the Ninth IEEE International Symposium on Wearable
Computers, Osaka, Japan, pp. 86–89.
Locher, I., T. Kirstein, and G. Tröster. 2004. Routing methods adapted to E-textiles. In
Proceedings of the 37th International Symposium on Microelectronics (IMAPS 2004).
Slade, J., A. Houde, and P. Wilson. 2012. Electrically active textiles, articles made therefrom,
and associated methods. http://www.google.com/patents/US20120030935 (Accessed
May 3, 2015).
Tognetti, A., F. Lorussi, G. D. Mura, N. Carbonaro, M. Pacelli, R. Paradiso, and D. De Rossi.
2014. New generation of wearable goniometers for motion capture systems. Journal of
NeuroEngineering and Rehabilitation 11 (April).
Yamashita, T., S. Takamatsu, K. Miyake, and T. Itoh. 2012. Fabrication of conductive polymer
coated elastomer contact structures using a reel-to-reel continuous fiber process. IEICE
Electronics Express 9 (17): 1442–1447.
Zysset, C., T. W. Kinkeldei, N. Münzenrieder, K. Cherenack, and G. Tröster. 2012. Integration
method for electronics in woven textiles. IEEE Transactions on Components, Packaging
and Manufacturing Technology 2 (7): 1107–1117.
24 Garment Devices: Integrating Energy Storage into Textiles
Kristy Jost, Genevieve Dion, and Yury Gogotsi
CONTENTS
24.1 Introduction................................................................................................... 639
24.1.1 Design and Material Parameters for Wearable Electronics...............640
24.1.2 Brief Introduction to Energy Storage Devices................................... 642
24.1.2.1 Energy Storage Components............................................... 642
24.1.2.2 Energy Storage Devices...................................................... 642
24.1.3 Introduction to Textile Structures...................................................... 645
24.2 Work in the Field...........................................................................................646
24.2.1 Coated Devices.................................................................................. 647
24.2.2 Knitted Carbon Fiber Electrodes....................................................... 650
24.2.3 Fibers and Yarns for Energy Storage................................................. 652
24.2.4 Custom Textile Architectures for Supercapacitors............................ 655
24.3 Conclusions.................................................................................................... 657
Acknowledgments................................................................................................... 657
References............................................................................................................... 657
24.1 INTRODUCTION
Portable electronics have evolved rapidly over the last 10 years, and now wearable technologies are following the same trend. While multifunctional clothes are
appearing on the market with a multitude of electronic devices incorporated onto the
fabric, garment devices are articles of clothing with inherent electronic properties.
In a garment device, the garment itself is the actual device, a new kind of technology also referred to as e-textiles or smart garments (Quinn, 2010; Seymour, 2008, 2010). Cutting-edge
research on garment and textile devices (Figure 24.1), ranging from sensing and illuminating garments to computer-like ones, continues to appear in the literature, and this
chapter explores how these devices could be powered. New techniques for integrating energy storage (i.e., batteries and capacitors [Simon and Gogotsi, 2008]) into
textiles are described, and new methods for generating energy are briefly explored.
Figure 24.1 illustrates the concept of a garment device incorporating various electronic components by custom designing a knitted textile using conductive materials
(Dion, 2013; Kirsch et al., 2009; Sim et al., 2012).
FIGURE 24.1 Design concept for a smart power bodysuit. (a) Piezoelectric patch converts body movements to electrical energy; (b) textile antennas to transmit communications; (c) textile electrochemical energy storage to store energy from harvesting devices; (d) integrated conductive yarns act as leads to transmit energy or information throughout the garment; (e) this design is simulated with realism in the textile structure to show that different materials can be integrated as part of a fabric. (From Jost, K. et al., J. Mater. Chem. A, 2, 10776, 2014a.)
Commercially available devices include the Adidas Mi-Coach, the Hi-Call Phone
glove, and the Under Armour heart rate monitoring shirt. However, many of these
wearable technologies still use solid coin cells or pouch cell lithium batteries, which
can be cumbersome, bulky, and are typically stitched or glued into the garment after
assembly. It has been proposed (Dion, 2013) that garment devices would have batteries integrated into the clothing that are indiscernible from regular textiles. This
chapter describes textiles capable of storing energy, fabricated with traditional and
advanced textile manufacturing methods (Figure 24.1).
However, what kind of battery technology and fabric structure will be ideal for
garment devices? We must first consider the design parameters and limitations a
garment device will have.
Nanoparticles are also a concern for wearable devices since the long-term effects
from exposure to these new materials are unknown. However, materials with
controlled nanoscale structures are safe and can be used. Activated carbons (ACs)
or carbide-derived carbons (CDCs) (Chmiola et al., 2006; Lin et al., 2009) are particles micrometers (µm) in size that can be developed with controlled pore sizes, tunable to within one-tenth of a nanometer. These materials are widely used for water filtration or for poison control in pill or powder form, where the pores can be tuned to
selectively adsorb specific impurities, for example.
where nanotechnology does not pose safety concerns (Gogotsi, 2003). ACs are also
used in double layer capacitors (Section 24.1.2), and typically such energy storage
devices, including any nanoparticles used, are encased in a liquid or gel electrolyte.
Washability: The most common question asked about garment devices is: can
they be washed? Washing batteries and electronics the way we wash our clothes
is typically avoided. While some components can be waterproofed, many of the
materials and technologies used in smart garments today are those used in conventional portable electronics such as smartphones, and these would never be soaked
in water. Therefore, much like a good wool suit, these technologically enhanced
garments require special care when cleaning. In addition, a process like dry-cleaning
can better preserve garments compared to conventional wet-washing and machine
drying over the long term.
Reliability: If garment devices are to last years, the chosen battery technology must
be reliable for the predicted lifetime of the garments, requiring replacement only if
damaged. For techniques incorporating the battery into the textile material, a device
failure would mean replacing the entire garment.
Durability: Similarly to regular garments, garment devices incorporating battery
fabrics must be able to withstand normal wear and tear from everyday use. Therefore
many researchers include electrochemical testing of their devices not only when flat
but also when bent or stretched. These tests will be described in Section 24.1.2.
Cost: Some battery and supercapacitor systems are composed of rare metals;
they may also require complex and expensive manufacturing processes. Given that
these must be converted into textiles, abundant materials have a greater chance of
successful commercialization. In particular, many of the works described in this
chapter utilize carbon materials; carbon is one of the most abundant elements on the planet.
Different forms of carbon vary in cost; activated carbon and graphite are relatively
inexpensive materials frequently used in supercapacitor and battery electrodes.
Fabrication: As previously mentioned, choosing manufacturing techniques that
already exist in the fashion and textile industry to produce energy storing fabrics
will allow for a smoother transition from lab-scale testing to large-scale manufacturing. This also means that the type of energy storing fabric should be designed
with commonly available materials, as well as based on the simplest conventional
electrode configurations. For example, if a device is composed of more types of
material than a fabric-making process can incorporate at one time, then it is likely
not a feasible system. This chapter will explore both printing and knitting techniques
for fabricating energy storing textiles.
Given the design parameters described earlier, understanding the basic principles
of different storage technologies will inform which technologies are best suited for
wearable applications.
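One relation worth keeping in mind when sizing textile supercapacitors is that stored energy grows with the square of the voltage, E = ½CV². The short sketch below is my own illustration, not from the chapter; the 100 cm² patch area and 0.8 V aqueous voltage window are hypothetical values, while the ~0.25 F/cm² areal capacitance is the figure reported for the screen printed textiles in Section 24.2.1.

```python
# Illustrative sketch (not from the chapter): energy stored in a supercapacitor,
# E = 0.5 * C * V^2, expressed in watt-hours.
def supercap_energy_wh(capacitance_f, voltage_v):
    """Stored energy (Wh) for a capacitance in farads charged to voltage_v volts."""
    joules = 0.5 * capacitance_f * voltage_v ** 2
    return joules / 3600.0  # 1 Wh = 3600 J

# Hypothetical 100 cm^2 textile patch at ~0.25 F/cm^2, charged to 0.8 V:
patch_capacitance = 0.25 * 100  # 25 F
print(supercap_energy_wh(patch_capacitance, 0.8))  # 8 J, i.e. ~0.0022 Wh
```

Even a sizeable patch stores only a few joules under these assumptions, which is consistent with the chapter's point (Section 24.3) that the energy density of fabric devices still needs to increase.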
FIGURE 24.2 Basic schematics for (a) an all-carbon double layer capacitor (left), (b) a pseudocapacitor (MnO2 depicted, center), and (c) a lithium-ion battery (right). All devices have an active material (e.g., carbon, MnO2, LiCoO2), a current collector, a separating membrane, and an electrolyte (e.g., Na2SO4 or LiPF6 solutions). (From Jost, K. et al., J. Mater. Chem. A, 2, 10776, 2014a.)
FIGURE 24.4 Specific power against specific energy (Ragone plot) for various electrical
energy storage devices. If a supercapacitor is used in an electric vehicle, the specific power
shows how fast one can go, and the specific energy shows how far one can go on a single
charge. Times shown are the time constants of the devices, obtained by dividing the energy
density by the power. (From Simon, P. and Gogotsi, Y., Nat. Mater., 7, 845, 2008.)
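The caption's time-constant relation can be made concrete with a two-line calculation. This is my own sketch; the 5 Wh/kg and 5000 W/kg values are hypothetical, chosen to sit in the electrochemical-capacitor region of a typical Ragone plot.

```python
# tau = specific energy / specific power, with energy converted from Wh/kg to J/kg.
def time_constant_s(specific_energy_wh_per_kg, specific_power_w_per_kg):
    """Characteristic charge/discharge time in seconds."""
    return specific_energy_wh_per_kg * 3600.0 / specific_power_w_per_kg

print(time_constant_s(5, 5000))  # 3.6 s, one of the diagonal time lines in the plot
```

The same arithmetic applied to a hypothetical battery-like point (100 Wh/kg at 10 W/kg) gives 36,000 s, i.e., the 10 h diagonal, showing why batteries and capacitors occupy opposite corners of the plot.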
(Figure labels: staple fibers (short, <3 in.), e.g., cotton, wool; filament fibers (from 3 in. up to 3 ft), e.g., silk, polyester; staple spun, monofilament, and 2-ply yarns; knit structures jersey, rib, cable, and miss; weave structures plain, twill, and satin.)
FIGURE 24.5 Illustration depicting fabric structures and their proper names. (a) Staple
fibers, (b) filament fibers, (c) illustration of a staple spun yarn, (d) illustration of a filament
spun yarn, (e) monofilament yarn is a single fiber with sufficient strength to also act as a
yarn, (f) illustration of a 2-ply yarn, (g) realistic simulation of different knit structures, with
a single yarn in dark gray to depict the path of the yarn, and (h) realistic weave simulations.
Modeled on the Shima Seiki Apex-3 Design software. (From Jost, K. et al., J. Mater. Chem. A, 2, 10776, 2014a.)
Yarns: Fibers can be spun into a variety of yarn structures as seen in Figure 24.5c
through f.
Woven fabrics: These are sheets of material composed of yarns that are intertwined
over and under each other. Typically warp (vertical) yarns are prethreaded and held
tight while the weft yarns are woven back and forth over and under the warp yarns.
These sequences can be done in different orders to generate plain woven, twill, or
satin weaves (Figure 24.5h).
Weft knitted fabrics: Unlike woven fabrics, knitted fabrics are not composed of independent yarns; a full piece of fabric can be made entirely from a single strand of yarn intermeshed row by row, back upon itself in a snake chain configuration (Figure 24.5g).
Weft knitted fabrics have much more stretch in the weft direction (horizontal) and less in the warp direction (vertical), making them anisotropic not only physically but also electrically in the case of metal- or carbon-based yarns. Because the yarn is continuous along the width of the fabric, electrons can flow through the material itself. However, rows are electrically connected in the vertical direction by the intertwined loops. Knitting typically requires less material and set-up time to fabricate samples, and has been explored as a main fabrication technique for smart textiles (Jost et al., 2013; Li et al., 2010; Soin et al., 2014).
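This course/wale asymmetry can be pictured with a toy resistor model. The sketch below is entirely my own illustration, not a model from the chapter: along a course the current stays in the continuous yarn, while along a wale each row boundary adds a loop-to-loop contact resistance. All resistance values are made up.

```python
# Toy model of conduction in a weft-knit conductive fabric (illustrative only).
def path_resistance_ohm(n_loops, r_yarn_per_loop, r_contact_per_loop=0.0):
    """Total series resistance of a current path crossing n_loops loops."""
    return n_loops * (r_yarn_per_loop + r_contact_per_loop)

# Hypothetical values: 0.05 ohm of yarn per loop, 0.5 ohm per loop-to-loop contact.
course_r = path_resistance_ohm(100, 0.05)        # continuous yarn: ~5 ohm
wale_r = path_resistance_ohm(100, 0.05, 0.5)     # contacts dominate: ~55 ohm
print(course_r, wale_r)
```

Under these made-up numbers the wale path is an order of magnitude more resistive than the course path, which is the qualitative behavior the text describes for conductive knits.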
after the garment is made, while others use manufacturing methods that knit or
weave the garment and device simultaneously.
24.2.1 Coated Devices
In 2010, Yi Cui's research group at Stanford University reported on the fabrication of dip-coated (or dyed) CNT textiles that showed excellent performance as supercapacitor electrodes (Hu et al., 2010). The group developed a water-based ink mixed with surfactant and single-walled carbon nanotubes (SWCNTs) that allowed the CNTs to conformally wrap around cotton fibers in commercially available woven or knitted cotton fabrics. This conformal coating was highly conductive, and the group was able to demonstrate a stored capacitance of ~480 mF/cm2. This particular study also used lithium hexafluorophosphate as the electrolyte, which is corrosive and nonwearable. However, they later published a new study that focused on using benign neutral sodium and lithium sulfate electrolytes (Pasta et al., 2010), and loaded 0.4–2.2 mg/cm2 of SWCNT ink. This work sparked many researchers to explore the incorporation of nanomaterials into textiles for wearable supercapacitors.
In our initial work (Jost et al., 2011), rather than dyeing the fabric, we chose to screen print carbon materials into cotton and polyester. Screen printing is a process where ink is pushed through a screen that has a shape masked into it (like a stencil) and the resulting shape can be printed onto any surface, like printing logos on a T-shirt. Screen printing is advantageous because it loaded larger quantities of carbon (~4.9 mg/cm2) into the fabrics compared to the same textiles being dip-coated (~0.4–2.2 mg/cm2) (Jost et al., 2011), because the screen printing ink was more dense with carbon than the dip-coating solutions (0.2 vs. 0.01 g/mL, respectively). Cotton and polyester woven fabrics were chosen for screen printing because they absorbed the most carbon material when dip-coated compared to other cotton twill and nylon fabrics.
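The loading difference tracks the ink formulations: by the densities quoted above, the screen printing ink carried roughly 20 times more carbon per unit volume than the dip-coating solution. The one-line check below simplistically assumes that loading scales with ink carbon density, which is my own approximation.

```python
# Back-of-envelope check using the carbon densities quoted in the text.
screen_print_ink_g_per_ml = 0.2
dip_coat_solution_g_per_ml = 0.01
print(round(screen_print_ink_g_per_ml / dip_coat_solution_g_per_ml))  # 20
```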
Some of the key challenges to much of the early work on textile energy storage
were toxicity, poor flexibility, and leakage. We chose to use activated carbon (AC)
instead of CNTs, graphene, or other nanoparticles, because it is well known to be
nontoxic (thus safe to wear), and is also the most commonly used electrode material
in the supercapacitor industry because of its low cost and high specific surface area.
Therefore, we could also draw direct comparisons between AC fabric supercapacitors and conventional AC film supercapacitors (Section 24.1.2).
Similarly to Pasta et al. (2010), our work utilized aqueous 1 M sodium sulfate
(Na2SO4) and 2 M lithium sulfate (Li2SO4) electrolytes. These were still liquid and
posed challenges to the wearability of the devices; later work in the field explored
the use of solid and gel electrolytes. One advantage the Yi Cui group (Hu et al.,
2010) had while using CNTs was high conductivity, which meant they did not need
to use metal current collectors. ACs are too resistive to stand on their own as both
electrode and current collector, so we adhered the fabric electrodes to stainless
steel sheets. Because conventional hard supercapacitors also use stainless steel or nickel, the textile electrodes' performance could be directly compared to that of conventional supercapacitors.
FIGURE 24.6 (a) Design concept of a porous textile supercapacitor integrated into a smart garment, demonstrating porous carbon impregnation from the weave, to the yarn, to the fibers. Below: SEM images of weaves and their corresponding fibers. (b) Polyester microfiber twill weave before coating. (c) Cotton lawn plain weave before coating. (d) Polyester fiber screen printed with activated carbon (Kuraray, Japan). (e) Cotton fiber screen printed with activated carbon (Kuraray, Japan). (From Jost, K. et al., Energ. Environ. Sci., 4, 5060, 2011.)
levels of porosity between the large micron-size particles and its ~1 nm micropores. This hierarchical structure allowed the electrolyte to access the carbon material quickly, analogous to cars initially traveling on large highways, then smaller streets, and finally adsorbing in the smallest pores, like a car parking in a driveway. Figure 24.6b and c shows SEM micrographs of the native polyester microfiber (b) and cotton lawn (c) fabrics, where this hierarchical spacing can be observed.
It can be seen from Figure 24.6d and e that the activated carbon is well coated
around individual polyester and cotton fibers.
We found that screen printed electrodes performed better at faster scan rates compared to the standard AC films when tested under the same conditions (Figure 24.7). The textile devices stored ~0.25 F/cm2 and had ~4 mg of active material per cm2, which translated to a specific capacitance of ~88 F/g (Figure 24.7a and b). Cyclic voltammetry
FIGURE 24.7 Capacitance vs. scan rate. (a) Gravimetric capacitance vs. voltage obtained from cyclic voltammetry of cotton lawn tested in 1 M Na2SO4, at 10 and 100 mV/s. (b) Cyclic voltammogram of polyester microfiber tested in 1 M Na2SO4 shows more resistive behavior compared to cotton. (c) Normalized capacitance vs. scan rate. (d) Cyclic voltammogram of a YP17 film tested in 1 M Na2SO4 under the same conditions as the textile electrodes. (From Jost, K. et al., Energ. Environ. Sci., 4, 5060, 2011.)
was conducted for all samples in 1 M Na2SO4 at a range of scan rates from 10 to 100 mV/s. Figure 24.7c shows how the textile devices retain a higher capacitance at increasing scan rates compared to the AC film, which loses almost 50% of its capacitance when the rate is increased from 1 to 100 mV/s. It is clear from the CVs (cyclic voltammograms) in Figure 24.7d that the AC film electrodes do not retain high capacitance when the scan rate is increased (reducing the time allotted for charging). The textile devices have added porosity in their woven structures, while the AC film only has spacing between AC particles; thus, electrolyte cannot diffuse as quickly through the films as through the fabrics. We believe this is why the textiles perform better at faster scan rates. However, the AC film still stores much more energy per area than the fabrics because it has more AC per area.
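For readers wanting to reproduce this kind of analysis, gravimetric capacitance is commonly extracted from a CV by dividing the average current by the scan rate; for a symmetric two-electrode cell, the per-electrode value is then conventionally reported as 4·C_cell/m_total. The sketch below uses that common convention with made-up numbers; it is not the chapter's raw data or necessarily the exact normalization the authors used.

```python
# Common convention for symmetric two-electrode supercapacitor cells
# (illustrative values, not data from the chapter).
def specific_capacitance_f_per_g(avg_current_a, scan_rate_v_per_s, total_mass_g):
    c_cell = avg_current_a / scan_rate_v_per_s  # cell capacitance in farads
    return 4.0 * c_cell / total_mass_g          # per-electrode gravimetric value

# Hypothetical CV: 2.5 mA average current at 10 mV/s, 10 mg total active mass.
print(specific_capacitance_f_per_g(2.5e-3, 0.01, 0.010))  # ~100 F/g
```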
This technique is viable for applications that require printing the electronics onto
surfaces, and certainly has shown good performance. However, this system still
incorporates solid steel current collectors and a liquid electrolyte. Therefore, new
material systems were explored in a later paper (Jost et al., 2013) and will be outlined in Sections 24.2.2 and 24.2.3. Since this work was published, other works have explored similar methods for incorporating graphene materials into cotton textiles (Li et al., 2013; Liu et al., 2012a).
FIGURE 24.8 (a) Continuous length of knitted carbon fiber squares in green wool, (b) carbon fiber square coming out of the knitting machine, (c) close-up of carbon fiber electrode screen printed with activated carbon ink, (d) testing setup of layered fabric structure coated in a polymer electrolyte and an image of the assembled device. (Continued)
textile, but of the active carbon material and the available nontoxic/nonliquid electrolytes. Both can be optimized for garment device applications.
At present, polyvinyl alcohol (PVA)-based gel electrolytes (Gao and Lian, 2011) are more conductive than ion-conducting polymers like polyphenylene oxide (PPO) and polyethylene oxide (PEO). The PVA gel electrolyte has water
trapped in the polymer matrix, and ions travel through the water, much like in a typical aqueous electrolyte. For this reason, many different kinds of aqueous electrolytes
can be incorporated into gels as long as they are nonreactive with the PVA.
The advantage of knitting some of the conductive elements in a garment device
is that any program can be finalized and immediately sent to a factory to be knitted. Small components like these square patches, shown in Figure 24.8, can also be
modified and incorporated into a garment for manufacturing.
This system still requires two electrodes stacked one on top of the other, and would likely have applications in outerwear garments that also have multiple layers of fabric, as compared to a fine knit T-shirt. However, this paper is one of only a few to manufacture a custom textile for an electrochemical application. A majority of battery researchers do not have access to industrial knitting or weaving equipment; therefore, many began to work on developing yarn and fiber supercapacitors (Fu et al., 2012, 2013; Kwon et al., 2012; Le et al., 2013; Li et al., 2013; Meng et al., 2013).
Some of the first reports on fiber- or yarn-like capacitors and batteries began to appear in 2011 from Z. L. Wang's group at the Georgia Institute of Technology, Atlanta, GA, where nanowires were grown on Kevlar to act as pseudocapacitive fiber electrodes (Bae et al., 2011). The electrodes were twisted around each other (similar to Figure 24.5f) and used a gel electrolyte to keep them electrically separated from each other. The authors reported capacitances on par with micro-supercapacitors used to power on-chip circuit components. This group has also reported nanowire-based energy harvesting fibers that can be used in textiles (Wang and Wu, 2012).
Many more papers have since appeared in the literature demonstrating a variety of graphene (Carretero-González et al., 2012; Cheng et al., 2014; Lee et al., 2013), graphite (Fu et al., 2012), CNT (Wang et al., 2013), and pseudocapacitive (Bae et al., 2011; Fu et al., 2013) yarns with comparable performances to commercially available capacitors or batteries. Graphene yarns in particular are increasingly popular because they are highly conductive (no need for an additional metal current collector) while also having high surface area to promote high capacitance. Work from Prof. Wallace (Aboutalebi et al., 2014) and Prof. Baughman (Lee et al., 2013) has demonstrated very high conductivity as well as high volumetric capacitance for graphene yarns. Wallace and other groups have also explored the method of wet-spinning these graphene fibers, with which they produce large quantities of material (Meng et al., 2013). They have also conducted extensive flexibility testing with accompanying electrochemical data. It has generally been found for graphite, graphene, and CNT yarns/fibers, which are continuous strands of material, that there is little difference in the electrochemical performance when bent or deformed. Systems that use larger micron-sized particles show more variation, as the particles may be displaced with movement, potentially breaking conductive bonds. However, bending does not seem to have a significant effect on the charge storing mechanisms. Recently, new works have demonstrated stretchable fibers and yarns (Ren et al., 2014; Yang et al., 2013).
Because supercapacitors and batteries must incorporate an electrode, current collector, separator, and electrolyte into one system, researchers have demonstrated different geometries for fiber and yarn ECs. A single fiber can be coated in multiple layers of electrode, current collector, etc., forming a coaxial-style EC (Meng et al., 2013; Yang et al., 2013), as seen in Figure 24.9a through c. However, sometimes two electrode fibers are plied/twisted together and separated by a membrane or solid electrolyte (Figure 24.9d). Any of these fiber geometries can be modified for asymmetric capacitors or batteries, where one electrode is much larger than the other. However, much like insulated wire, these pose challenges to connecting components when embedded into fabrics, since the end would have to be stripped and soldered after the fabric is constructed. Therefore, developing connectors or other methods that do not completely insulate the components is a field that requires more research. Lastly, some strips of electrode material were layered into a full EC geometry and scrolled along their length into a yarn (Gorgutsa et al., 2012). This structure is actually similar to the rolled electrodes in commercially available supercapacitors.
One of the main challenges that researchers are yet to fully address is the conductance along lengths of yarns. Since the energy stored will be proportional to the
length, more length may be desirable. However, the longer the supercapacitor (or any
wire), the more resistive it becomes. Eventually, that resistance will become too high
quantities of yarn (up to 200 ft) at a time on an in-house setup. In addition to producing large quantities of yarn, the main goal of this work was to demonstrate supercapacitive yarns knitted into full textile devices. Therefore, different modifications of the welded yarns were processed on an industrial knitting machine to determine their viability for full fabric production (Jost et al., 2014b).
Cotton, linen, bamboo, and viscose yarns were all subjected to NFW with activated carbon; all had an increase in strength but a reduction in elasticity, as shown by tensile testing. However, only the cotton yarn was unable to be knitted. We determined that the cotton fibers were shorter than the linen, bamboo, and viscose fibers, and in the 12-gauge needle loops, the cotton fibers separated, resulting in a break in the yarn. When the same force is applied to longer fibers, the strength of the polymer plays a more significant role in preventing breakage.
All yarns were electrochemically tested in geometries similar to Figure 24.9d prior to knitting. Initially, cotton yarns NFW with AC were twisted with steel yarn to increase the conductivity, resulting in yarns with capacitance up to 8 mF/cm, much higher than previously reported works, typically around 0.5 mF/cm. However, when the plain cotton and steel yarns were twisted prior to NFW with AC, the mass loading increased from ~0.3 to 0.6 mg/cm, and resulted in yarns with capacitance as high as 37 mF/cm. This is currently the highest reported capacitance per length for any carbon-based yarn. Only batteries and redox-active yarns surpass this value (Kwon et al., 2012; Liu et al., 2013).
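The jump from 8 to 37 mF/cm is roughly what one expects if capacitance scales with the mass of active carbon per unit length. The sketch below makes that scaling explicit; the linear assumption and the ~62 F/g effective gravimetric value are mine, chosen only to land near the reported figure, and are not taken from the chapter.

```python
# Length-normalized capacitance from mass loading (illustrative assumption:
# linear scaling with active-carbon loading). Units work out neatly:
# (F/g) * (mg/cm) = mF/cm, since the milli- prefixes cancel.
def cap_per_length_mf_per_cm(loading_mg_per_cm, grav_capacitance_f_per_g):
    return grav_capacitance_f_per_g * loading_mg_per_cm

print(cap_per_length_mf_per_cm(0.6, 62))  # ~37 mF/cm
```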
Upon discovering the exceptional properties of these electrode yarns, we proceeded to knit them into stripes. We discovered very quickly that the cotton yarns
were too brittle to withstand industrial knitting. Multiple welding modifications
were explored to make the yarns softer, but they did not knit without breaking apart
completely. It was at this point that linen, bamboo, and viscose yarns were also
explored as electrodes, and were NFW with AC. The viscose did not hold as much material and only had a capacitance of ~1.5 mF/cm. The linen and bamboo were on par with the cotton yarns, having a capacitance of ~8–11 mF/cm. After twisting them with steel yarn, they were processed on the knitting machine. The linen,
bamboo, and viscose fibers were all significantly longer, and therefore were not
pulled apart during the knitting process compared to the shorter cotton fibers. More
work is ongoing to characterize the performance of the yarns knitted into fabrics.
It is clear that researchers are beginning to transform their novel yarn materials into fully woven or knitted fabric devices (Gaikwad et al., 2012; Soin et al., 2014; Zhou et al., 2014).
FIGURE 24.10 (a) Depicts coated strips of battery electrode fabric, demonstrating their
flexibility and stretchability, (b) depicts a hand woven structure where lengths of these strips
are woven alongside conductive yarns (red and blue) to form a fully functional battery fabric,
and (c) depicts a 3D knitted spacer fabric with piezoelectric properties.
blue yarns are conductive, and the battery strips are woven in between the red and
blue yarns to have a positive and negative connection.
Our work on knitted and screen printed supercapacitors (Jost et al., 2013)
also used knitting to build CF current collectors in specific geometries (Figure
24.8). We also demonstrated the ability to incorporate these into full garments.
Continuing with knitting, other researchers have also knitted piezoelectric spacer
fabrics (Soin et al., 2014), where the top layer of this three-layered fabric is a positive terminal, the bottom is negative, and the interlacing spacer yarn in between is the piezoelectric polyvinylidene fluoride (PVDF) monofilament yarn. The authors made use of the textile structure to create functional layers in a single piece of fabric. This kind of creativity is an excellent example of where the field is heading. Moreover, for energy storage, methods that can scale up quickly and easily are essential to the success of energy storing textiles. Fabrics that can provide structural solutions to the arrangement of specialized materials can be integrated into garments with greater ease, and can be carefully designed to have desired energy and power densities. In the future, we can expect to see materials and devices being manufactured more like traditional fabrics, and using this approach it is likely that electronic fabrics will be visually indistinguishable from everyday textiles.
24.3 CONCLUSIONS
Textile energy storage is an exciting field of research with much promise; however,
as wearable electronics have begun to appear on the market, textile energy storage
remains an underdeveloped component. Understanding the design parameters for
both fabrics and energy storage devices is crucial to push the field forward and
find creative solutions to integrate energy into textiles. Many challenges remain,
including finding nontoxic options for battery electrolytes, further increasing the
energy density of fabric batteries and supercapacitors, and finally integrating these
devices into garments using scalable and cost-effective techniques. However, it is
clear that the field is growing quickly with creative and competitive solutions appearing regularly in the literature. A seamless solution is bound to appear.
ACKNOWLEDGMENTS
The authors thank Dr. Paul C. Trulove and colleagues at the U.S. Naval Academy,
Department of Chemistry. K. Jost recognizes support from the Department of
Defense National Science and Engineering Graduate Fellowship (DoD-NDSEG).
REFERENCES
Aboutalebi, S. H., Jalili, R., Esrafilzadeh, D., Salari, M., Gholamvand, Z., Yamini, S. A. et al. (2014). High-performance multifunctional graphene yarns: Toward wearable all-carbon energy storage textiles. ACS Nano, 8(3), 2456–2466.
Bae, J., Song, M. K., Park, Y. J., Kim, J. M., Liu, M. L., and Wang, Z. L. (2011). Fiber supercapacitors made of nanowire-fiber hybrid structures for wearable/flexible energy storage. Angewandte Chemie International Edition, 50(7), 1683–1687.
Carretero-González, J., Castillo-Martínez, E., Dias-Lima, M., Acik, M., Rogers, D. M., Sovich, J. et al. (2012). Oriented graphene nanoribbon yarn and sheet from aligned multi-walled carbon nanotube sheets. Advanced Materials, 24, 5695–5701.
Cheng, H., Hu, C., Zhao, Y., and Qu, L. (2014). Graphene fiber: A new material platform for unique applications. NPG Asia Materials, 6, e113.
Chmiola, J., Yushin, G., Gogotsi, Y., Portet, C., Simon, P., and Taberna, P. L. (2006). Anomalous increase in carbon capacitance at pore sizes less than 1 nanometer. Science, 313(5794), 1760–1763.
Garment Devices
Li, X., Zang, X., Li, Z., Li, X., Li, P., Sun, P. et al. (2013). Large-area flexible core-shell
graphene/porous carbon woven fabric films for fiber supercapacitor electrodes.
25 Collaboration with Wearable Computers
Mark Billinghurst, Carolin Reichherzer,
and Allaeddin Nassani
CONTENTS
25.1 Introduction
25.2 Related Work
25.2.1 Collaborative Wearable Systems
25.2.2 Communication Theory
25.2.3 Environment Capture
25.2.4 Summary
25.3 Social Panoramas
25.3.1 Prototype System
25.3.1.1 Panorama Viewing
25.3.1.2 Remote Awareness
25.3.1.3 User Interaction
25.3.1.4 Networking
25.3.1.5 User Experience
25.4 Pilot Study
25.5 Conclusion
25.5.1 Future Work
References
25.1 INTRODUCTION
Since the first days of wearable computers in the 1970s, research in the field has
largely focused on how wearable systems can enhance a single user's interaction with
the world around them. In an early definition, Mann stated that a wearable computer
was a computer that is "Ephemeral, Eudaemonic, and Existential," or always on,
part of the user, and under the user's complete control (Mann 1997). In this case,
Mann was focusing on the potential of wearable computers for enhancing personal
imaging. Similar definitions of wearable computers (e.g., Rhodes 1997, Starner
1999) focused on the benefit that wearables could provide to the individual user in
mediating their interaction with the real world. For example, the Remembrance Agent
(Rhodes 1997) demonstrated how a wearable system could supplement the user's
own memory and data retrieval, while the Touring Machine (Feiner et al. 1997) was
a wearable computer that provided an Augmented Reality experience to show virtual
information in place.
However, wearable computers can also be used to support remote collaboration. For example, in the CamNet system (British Telecom 1993), a wearable
computer combined with a head-worn camera and display was able to transmit
live video, audio, and still images from an ambulance worker to a doctor waiting at the hospital. Similarly, the Netman wearable computer allowed a technician
in the field to stream video and sensor data (IR, network monitoring) back to a
remote expert, enabling them to monitor network status (Bauer et al. 1998). Work
has also been done showing how wearable computers can support remote 3D
manipulation tasks and provide an increased sense of remote awareness.
Despite these early projects, there is a need for more research on how
wearable computers can be used to enhance collaboration. In 2001, Starner listed
eight important topics for future work in wearable computing and highlighted
collaboration as one of those areas (Starner 2001). He identified three projects as
typical of what should be researched for collaborative applications: (1) live video
view-sharing and remote technical assistance (Kraut et al. 1996); (2) body-stabilized
information display for 3D collaboration (Billinghurst et al. 1998); and (3) wearable
agents acting on behalf of their owners (Kortuem et al. 1999b). While some research
has been carried out on remote expert assistance, there has been less work done on
information displays and wearable agents. Modern wearable computers have more
processing, input, and output capabilities than those of 10 years ago and so could
be used to develop a far wider range of collaborative experiences that move beyond
these initial application areas.
In this chapter, we review previous research on collaboration with wearable computers, discuss current work in the area, and identify areas for future research. Many
of the demonstrated collaborative wearable systems are fairly straightforward extensions of remote desktop applications, so we are particularly interested in wearable
applications that permit new types of collaboration and create an increased sense of
remote presence. Unlike earlier systems, which were mostly focused on collaboration
in professional settings, we are also interested in the use of wearable computers for
social collaboration. To demonstrate the types of systems that might be possible, in
the second half of the chapter we describe a prototype application we have developed
called Social Panoramas: a new type of wearable interface that enables the sharing of
social spaces. Finally, we conclude with suggestions for future research
and other types of wearable interfaces that could be explored.
FIGURE 25.1 CamNet remote collaboration hardware. (From British Telecom, CamNet
Promotional Video, 1993.)
FIGURE 25.2 Comparing an HMD camera (a) to the WACL hardware (b) with shoulder-worn robotic camera and laser pointer.
Another approach for providing greater situational awareness is to allow the remote
user independent control of the camera viewpoint. Mayol et al. (2000) developed
a remote collaboration system with a wearable camera on a pan/tilt mechanism
that allows the remote expert to control its orientation. The remote expert receives a
live stream from the camera and is able to set his or her own viewpoint independently
of the wearer. They also used computer vision to achieve an image-stabilized view,
regardless of how the wearable computer user was moving. In user studies, they found
that the best combination of steadiness and field of view was achieved
by wearing the camera on the shoulder instead of the head, hand, or chest.
The wearable active camera with laser pointer (WACL) (Kurata et al. 2004)
also used a shoulder-mounted system worn by the worker (see Figure 25.2).
However, in this case a laser pointer is combined with the camera, enabling the remote
expert to point at locations and objects. Thus, the remote user was able to look around
the worker's environment independently of the worker's viewpoint and highlight real
objects in the workspace. A user study compared WACL to a fixed head-mounted
camera system and found no difference in performance on an object
search and assembly task, but users reported that WACL was more comfortable, less
tiring, and more eye-friendly to wear.
WACL used a laser pointer to convey gesture information to the real world. Other
systems have explored a wide range of methods for conveying remote gesture
cues. For example, GestureCam (Kuzuoka et al. 1994) allows a remote person to
control a robot at the local site that can execute pointing instructions with a laser
pointer. By pointing on the live camera feed, the remote expert can move the
robot pointer in the worker's environment to indicate an area of interest.
In the drawing over video environment (DOVE) (Ou et al. 2003), the remote
helper's hand gestures are shown inside the video stream and on a display to the local
worker to facilitate remote gesturing. Similarly, Kirk and Fraser (2005) built a system
where video of the helper's hands was sent to the local worker and
projected onto his desk. Poelman et al. (2012) describe a system to support collaboration at a crime scene, where investigators at the scene are remotely supported by
experts. The colleagues at the scene wear an HMD with stereo cameras, which maps
the entire scene in 3D, giving the expert an independent viewpoint and providing a
shared augmented space. Additionally, the hand gestures of the local user are
tracked and streamed to the remote user. The lack of presence, however, made it
difficult to communicate effectively, and users often ended up interrupting each
other. HandsOnVideo (Alem et al. 2011) is a project that further explores the
richness of hand gestures. Here, the worker wears a display close to his or her eyes
and can see a representation of the helper's hands. The remote collaborator sees the
worker's viewpoint on a large touch-enabled display. The display is, however, too
big to be portable.
These systems show the importance of providing support for remote gesturing
in collaborative wearable systems. Many early systems provided only remote pointing, but as shown earlier, newer systems make it possible to go beyond this and
convey natural hand motion and rich gestural communication.
allowed a person to point to a location on the shared video with a virtual arrow, place
virtual outlines on an object to highlight it, and use directional arrows to show how
actions should be performed.
In a broader sense, Kortuem identifies primitives that provide simple collaborative
functions and can be combined to build complete collaborative interfaces (Kortuem
1998). These primitives include the following:
1. Remote awareness: Users must be aware of who else is participating in the remote collaboration.
2. Remote presence: Remote users must be represented in the interface in some way to be able to share communication cues.
3. Remote presentation: The remote user must have some way to present information in the view of the other participant.
4. Remote pointing: The ability to control a remote cursor to point at an object in the user's field of view.
5. Remote manipulation: The ability of a remote user to manipulate objects in the user's field of view.
6. Remote sensing: The ability of a remote user to have direct access to the sensors on the local user's wearable computer.
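Kortuem's idea of composing complete interfaces out of such primitives can be sketched in code. The following is our illustrative rendering in Python, not Kortuem's implementation; all class and method names here are hypothetical:

```python
from abc import ABC, abstractmethod

class CollaborationPrimitive(ABC):
    # A single collaborative function; complete interfaces are built
    # by combining several primitives into one shared view.
    @abstractmethod
    def render(self, view):
        ...

class RemotePointer(CollaborationPrimitive):
    # Remote pointing: a cursor the remote user moves within the
    # local user's field of view.
    def __init__(self):
        self.x, self.y = 0.0, 0.0

    def move(self, x, y):
        self.x, self.y = x, y

    def render(self, view):
        view.append(f"pointer@({self.x},{self.y})")

class CollaborativeInterface:
    # A complete collaborative interface is a combination of
    # primitives rendered into one shared view.
    def __init__(self, *primitives):
        self.primitives = list(primitives)

    def render(self):
        view = []
        for p in self.primitives:
            p.render(view)
        return view
```

Adding remote presentation or manipulation would, in this sketch, simply mean adding further primitive classes to the combination.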
These papers show that wearable collaborative systems should support as many
communication cues as possible to facilitate grounding and establishing shared
understanding.
25.2.3 Environment Capture
As seen in Section 25.2.1, many wearable systems stream video or images from the
wearer to a remote person, enabling them to capture a small piece of their environment. However, in most of these systems the remote user's view is limited to the
live feed from the head-worn camera, reducing awareness of the user's surroundings. Systems like WACL allow the remote user to control the camera and so can
increase their situational awareness.
There are other tools that can be used to provide a more immersive view of the
wearable computer user's environment. One of these is the use of image
panoramas. Work on panoramic imagery on desktop computers has been around
since the 1990s, and with continuing technological advancement, panorama applications have become popular in recent years. Applications such as Google Street
View (Google 2014) or Microsoft Streetside (Microsoft 2014) offer 360° imagery of
street scenes. These can be used to explore remote locations in a manner similar to
a virtual tour. However, such immersive panoramic scenes require special
hardware and well-calibrated cameras to capture the panoramic image, which comes
at a high cost and is thus not accessible to end users.
Recently, software for creating panoramas has begun to appear on consumer
devices such as mobile phones and digital cameras (Pece et al. 2013). Advances
in hardware for mobile devices and smartphone platforms have resulted in widespread use of portable devices equipped with high-quality cameras, and there has been
significant interest in the ability to quickly and easily create panorama imagery
anytime and anywhere (Au et al. 2012). After years of computer-vision research on
algorithms that can quickly and easily create panoramas, it is now a popular feature
on mobile phones to combine or stitch collections of images together (Xu and
Mulligan 2013). One such example is Ztitch (Au et al. 2012), an application for
Windows Phone that lets users create, modify, and upload panoramic imagery to an
online portal to share with other users. Other existing panorama applications include
Photo Sphere,* TourWrist, and Photosynth. However, interaction with these platforms
is limited to online viewing, and there is no support for real-time collaboration.
Panorama imagery provides an easy way to capture a remote user's environment,
but there has been little previous research on the use of panorama imagery on wearables for remote collaboration. One of the exceptions is the work of Cheng et al.
(1998), who developed a system for collaboration that stitched together pictures from
the wearable computer's camera to create image mosaics. These mosaics were shared
with the remote expert on a PC, allowing them to place virtual annotations in the wearable user's environment. However, their work focused on image tracking, and they
didn't generate true immersive panorama views or evaluate the usability of the system.
Similarly, the WACL system (Kurata et al. 2004) used knowledge about the camera orientation to create pseudo-panorama images of the wearer's environment by
stitching together overlapping images. The WACL remote expert interface provided
this view along with a live camera view to give the expert increased awareness of the
user's environment (see Figure 25.3). Using this interface, the expert
can click on the panorama image to project a laser spot into the real world. However,
as with Cheng's work, the system does not create a truly immersive panorama that
can be freely viewed by the expert. The focus of the WACL remote expert interface
is on helping the expert understand what the wearable user is doing and on communicating back with them, not on capturing the user's space and presenting it in an
immersive view.
More recently, new depth-sensing technology such as the Microsoft Kinect
has been used to capture 3D representations of space. For example, the
commercially available Occipital sensor can be combined with a tablet to create a
handheld scanning solution for environment capture. These systems have not yet
been integrated into wearable systems, but it is expected that the next generation of
wearable devices will move beyond panoramas to full 3D environment capture.
25.2.4 Summary
In this section, we have provided a brief overview of the evolution of wearable computer interfaces for remote collaboration. As can be seen from this earlier work, most
* http://www.google.co.nz/maps/about/contribute/photosphere/.
http://www.tourwrist.com/.
http://photosynth.net/.
http://www.microsoft.com/en-us/kinectforwindows/.
https://occipital.com/.
FIGURE 25.3 WACL remote expert interface, showing the objective image for stabilization, the live video, and the pseudo-panoramic view.
research on using wearable computers for remote collaboration has been focused
on sharing first-person live video. Many of these systems have support for remote
pointing or gesture-based communication. Some research has also supported use
of independent camera views through body-mounted cameras. There has also been
some research on how to apply communication theory to inform the development of
collaborative wearable interfaces.
From this earlier research it is clear that if we want to develop a wearable system
for remote collaboration, it should have the following features:
In the next section, we describe a wearable interface we have developed that has
these properties. However, there are some important differences between our wearable system and earlier research. Our research explores the use of real-time panorama sharing from a wearable computer for remote collaboration, and pointing and
drawing interaction methods in shared panoramas. It studies the effects of presence
and awareness inside a panoramic image presented on HMDs and tablet interfaces.
Most importantly, there is a focus on shared social experiences, compared to the earlier work that used collaborative wearable interfaces for industrial and professional
applications. In the next section, we describe this in more detail.
25.3 SOCIAL PANORAMAS
FIGURE 25.4 Conceptual image of using a wearable computer to create Social Panoramas.
Users can share their space and view remote annotations.
* https://www.google.com/glass/start/.
http://www.reconinstruments.com/.
http://new.livestream.com/.
25.3.1 Prototype System
In order to explore how panoramas and wearable computing can be used to support
real-time collaboration, we developed a prototype system that connects a person
using a Google Glass wearable computer with a second user on an Android tablet.
The overall goal of our research is to develop a system that will allow a user to capture a panorama of the space around them and then share it in real time with a remote
collaborator. However, Glass has limited processing power, which makes it difficult to
perform real-time panorama stitching, so in the initial prototype we assume the panorama has already been captured and focus on supporting remote collaboration, presence,
and interaction.
The current prototype was developed using Processing* with the Ketai
library for sensor support. Processing is a software library that makes it easy
to create prototype mobile applications. It builds the code into an Android APK
file that is then pushed to the target device, either Glass or the tablet. The code
was split into two separate code bases, one for Glass and one for the tablet, which
were mostly identical except for touch interaction and network connection. The Ketai
library provides access to all of the sensors on an Android device, such as the
camera, orientation sensors, and accelerometers. Using this, we are able to create an
immersive panorama view that responds to the user's viewpoint orientation.
25.3.1.1 Panorama Viewing
The system uses a previously captured panorama image mapped onto a virtual cylinder, simulating a live capture system. To display a panoramic image to the user,
a virtual cylindrical 3D model is rendered with 32 edges using the QUAD_STRIP
shape definition from Processing. Once the cylinder is created, it is textured with the
desired panorama image. The height of the cylinder is set to match the height of the
image, and the width and radius are calculated to maintain the image's aspect ratio.
The panorama image textures were taken of the local laboratory environment
in which the user evaluations were conducted (see Figure 25.5). This was designed to
mimic the experience of users capturing their own surroundings.
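The cylinder sizing described above can be sketched as follows. This is an illustrative Python sketch of the geometry, not the authors' Processing code; `cylinder_dimensions` is a hypothetical helper, and it assumes the panorama wraps exactly once around the cylinder, so the circumference equals the image width.

```python
import math

def cylinder_dimensions(img_width_px, img_height_px):
    # Hypothetical helper illustrating the sizing rule: the cylinder
    # height matches the image height, and the radius is chosen so one
    # full wrap of the texture (the circumference) equals the image
    # width, preserving the panorama's aspect ratio.
    height = img_height_px
    circumference = img_width_px
    radius = circumference / (2 * math.pi)
    return radius, height
```

For a 3600 x 900 panorama, for example, this gives a radius of about 573 in image units.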
The cylinder is translated to surround the camera view (i.e., the user's view).
Google Glass contains an orientation sensor; using the Ketai library to read its
values, the system rotates the panorama cylinder to match the user's head
orientation, so the user can look around the panorama simply by turning their head.
We also developed a viewing application for the
http://www.processing.org/.
https://code.google.com/p/ketai/.
Android tablet that uses the tablet's orientation to rotate the view. Currently, we
map only rotations about the vertical axis.
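A minimal sketch of this yaw mapping, assuming the sensor reports azimuth in degrees (as Android's orientation sensor does) and that the cylinder is rotated opposite to the head turn so the imagery stays world-stabilized; `panorama_yaw` is an illustrative name, not from the prototype's code:

```python
def panorama_yaw(sensor_azimuth_deg):
    # Rotate the panorama cylinder opposite to the device's azimuth
    # (rotation about the vertical axis) so the imagery appears fixed
    # in the world while the user turns, normalized to [0, 360).
    return (-sensor_azimuth_deg) % 360.0
```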
25.3.1.2 Remote Awareness
In a collaborative interface, effective collaboration is based on mutual understanding
and grounding, and is connected with the concepts of social presence and shared
awareness of the other user's actions. Social presence is the feeling of being with
another person in the same communication (Siitonen and Olbertz-Siitonen 2013).
In the field of human-computer interaction (HCI), studies have shown that social
presence is affected by interface type (Biocca et al. 2003) and interactivity
(Trendafilov et al. 2011). Thus, it is important for the interface to provide cues
that convey what the remote user is doing.
In the Social Panorama interface, we were particularly interested in showing where the remote user is looking. Even though they share the same
panorama view, the two connected people could be looking at different portions of
the image, so providing awareness of where the remote person is looking is very
important for effective communication. There are several methods for
doing this, and in developing the prototype interface we explored the following (see Figure 25.6):
1. Centered radar: An interface cue built using a radar metaphor. A top-down radar display is shown in the center of the screen, with different-colored wedges drawn on it showing the view angles of the two people connected to the shared panorama. If the wedges overlap, the two users are seeing the same parts of the panorama.
2. Context compass: A line shown at the top of the screen represents the viewpoint into the panorama from 0° to 360°. Different-colored rectangular boxes are drawn on top of this line representing each user's viewpoint. They move as the users rotate their heads, and when the boxes overlap the users share the same viewpoint.
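The overlap test underlying both cues can be pictured as a shortest-angular-distance check. This is a hedged illustration, not the prototype's code; the 54° field of view below is an assumed value for demonstration only:

```python
def views_overlap(yaw_a_deg, yaw_b_deg, fov_deg=54.0):
    # Two view wedges centered on each user's yaw overlap when the
    # shortest angular distance between them is less than the field
    # of view, taking the 0/360 wraparound into account.
    diff = abs(yaw_a_deg - yaw_b_deg) % 360.0
    diff = min(diff, 360.0 - diff)
    return diff < fov_deg
```

With this check, the radar wedges (or compass boxes) are drawn as overlapping exactly when the function returns true.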
In addition to these remote awareness cues, we also added a different-colored circular dot in the center of the screen for each user. When one user saw the
other's dot come into view, they knew the other person was beginning to share
the same viewpoint. We also explored complementing the center dot with a
rectangular border around the user's view, which likewise showed when the viewpoints began
to overlap.
FIGURE 25.6 Two different awareness cues: (a) radar display and (b) context compass.
25.3.1.3 User Interaction
From Section 25.2.4, we know that interfaces for remote collaboration should support
the ability to share graphical annotations and natural gestures. In the Social Panorama
interface, we provide support for shared pointing and drawing, enabling participants
to easily refer to surrounding objects and the environment. The aim was for drawing
to mimic traditional sketching with paper and pencil, providing communication that
feels natural. Drawing and pointing are two modes of interaction that have previously been shown to be good gesture surrogates (Fussell et al. 2004).
Using Google Glass, users are able to touch the touchpad on the side of the Glass
display to point or draw on the panorama. The interface can be in either a pointing or a drawing mode, and the user taps the touchpad with two fingers to swap
between modes. In pointing mode, the x, y position of the user's
finger on the touchpad is mapped to the corresponding position in the region of the
panorama in view. As they move their finger on the touchpad, a virtual pointer moves
on the panorama. In a similar way, when they are in drawing mode, touching the
touchpad draws lines on the panorama that remain until they are erased. Any pointing
or drawing points are sent wirelessly to the tablet so that the remote user can see the
Glass user's input. Likewise, the tablet user can touch and draw on the tablet
and have their annotations appear in a different color on the panorama. When the
user touches either the tablet or the Glass touchpad with three fingers, the drawings are
erased. Figure 25.7 shows typical annotations added to the panorama view.
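The pointing-mode mapping can be sketched as a linear transform from pad coordinates to panorama pixels. The chapter does not give the exact formula, so the function below is an assumed implementation: the pad's x axis is taken to span the region of the panorama currently in view (offset by the user's yaw), and y maps proportionally onto the image height.

```python
def touch_to_panorama(tx, ty, pad_w, pad_h,
                      view_yaw_deg, view_fov_deg, pano_w, pano_h):
    # Horizontal: fraction across the pad -> degrees inside the
    # visible field of view, offset by the left edge of the current
    # view, then converted to panorama pixels with 360-degree
    # wraparound.
    deg = (view_yaw_deg - view_fov_deg / 2) + (tx / pad_w) * view_fov_deg
    px = int((deg % 360.0) / 360.0 * pano_w)
    # Vertical: direct proportional mapping onto the image height.
    py = int((ty / pad_h) * pano_h)
    return px, py
```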
25.3.1.4 Networking
In order to share pointing, drawing, and remote awareness cues, the tablet and Glass
applications are connected over a Wi-Fi network, enabling data to be
freely shared between them. A software module connects the tablet and
the Glass via the TCP/IP networking library oscP5.* The Glass device listens on a
TCP port, while the tablet attempts to connect to the Glass IP
address and port number.
FIGURE 25.7 Drawing and pointing annotations appearing on the panorama image.
* https://code.google.com/p/oscp5/.
Once the connection is established, both devices send their local
orientation to the remote device every half second. Each orientation
message consists of x, y, and z values representing the orientation around the
x-, y-, and z-axes, respectively.
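The orientation message can be pictured as three floats. The prototype sends OSC messages via oscP5; the sketch below is only a stand-in illustrating the payload, using plainly packed floats rather than the actual OSC wire format:

```python
import struct

def pack_orientation(x, y, z):
    # Three little-endian 32-bit floats: rotation about the x-, y-,
    # and z-axes, sent every half second in the prototype.
    return struct.pack("<3f", x, y, z)

def unpack_orientation(payload):
    # Inverse of pack_orientation: recover the three rotation values.
    return struct.unpack("<3f", payload)
```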
The connection component also updates the location of drawing and pointing
points to and from the remote device. The format of each collaboration message
consists of
25.4 PILOT STUDY
Four pairs of subjects communicated using Glass or the Android tablet under
one of the following three conditions:
Audio only: Both users could view the panorama image but only talk about it.
Audio + radar: In addition to audio, a radar display was used to provide remote
awareness. The radar display is an exocentric awareness cue that shows a
triangular view for each user that moves in a circular motion according to
the orientation of the device they are using.
Audio + view rectangle: In addition to audio, an egocentric rectangle showed where each collaborator was looking. The rectangle showed each user's field of view and so
would overlap when both faced in the same direction.
Both the radar and view rectangle cues also had circles appearing at the center of the
screen showing the local user's center point of view. A second circle of a different color
would appear when the remote user started to face the same portion of the panorama. If both circles lined up, the two users were facing in the same direction.
The task for each pair of subjects was to discuss for 2 min the room that the Glass
user was in and answer a series of interior design questions, such as where they
would put lights to best light the room. It was a within-subjects study, so each pair
experienced all three conditions with five different interior design questions. After
each condition, each participant was asked the following questions about how
well they thought they had collaborated:
[Figure: mean participant ratings for questions Q1–Q5 under the Audio, Box, and Radar conditions.]
In observing how people used the interface, it was interesting to note that many
of them changed their communication behavior depending on the type of awareness cue provided. For example, in the audio-only condition, people
would often describe at length the portion of the room they were looking at
until they were sure the other user understood the direction they were facing.
Subjects also felt that the interfaces were generally intuitive to use, especially the
Glass application, which only required them to look in the direction they were
interested in.
25.5 CONCLUSION
In this chapter, we have provided a review of wearable computer interfaces for
remote collaboration and then described a prototype interface we have developed
for sharing social spaces. From earlier work in this field it is apparent that most of
the related work is focused on collaboration in a professional setting for applications
such as remote maintenance, rather than purely social interaction. However, this
work does provide useful design guidelines such as the importance of having the
ability to share video, audio, and graphical annotations, and to consider awareness
and communication cues.
Panoramic imagery can provide an immersive and holistic impression of an environment. With the ubiquity of smartphones equipped with high-quality cameras and
people's desire to share their experiences and feel connected, a panorama can
support this by giving a remotely located user the impression of being close to
another person, with the same freedom to look around independently as in a real
environment. However, the use of panoramic imagery in a mobile collaborative environment
has not been researched thoroughly, leaving many gaps in interface guidelines and
in understanding how the sensation of connection and shared experience can be
created in such an omnidirectional view setting.
This research aims to contribute to the field of wearable computing and the sharing of personal experiences by using panoramic imagery and exploring remote
collaborative interaction modalities and their impact on presence. To achieve this
goal, a prototype was developed that simulates an already-captured panorama
presented on a tablet and on an HMD. The implemented user interface supported
the awareness necessary for successful grounding, and the addition of pointing and
drawing provided tools to quickly reference objects or locations in the panorama
image.
This Social Panorama wearable application is one of the first wearable interfaces
for remote collaboration in which the focus is on creating a social space and sharing
the user's environment. A pilot study quickly showed the importance of combining
exocentric and egocentric cues, which resulted in interfaces that provided both an overall
view of the space and where the remote user is currently looking, as well as a more
detailed view once the viewpoints overlapped. In addition, established interaction methods for
remote collaboration were implemented for pointing and drawing.
REFERENCES
Alem, L., Tecchia, F., and Huang, W. (2011). Hands on video: Towards a gesture based mobile
AR system for remote collaboration. In Alem, L. and Huang, W. (Eds.). Recent Trends of
Mobile Collaborative Augmented Reality Systems (pp. 135148). Springer: New York.
Au, A. and Liang, J. (2012). Ztitch: A mobile phone application for immersive panorama creation, navigation, and social sharing. In IEEE 14th International Workshop on Multimedia
Signal Processing (MMSP), September 1719, 2012, pp. 1318. Banff, Canada.
678
Bauer, M., Heiber, T., Kortuem, G., and Segall, Z. (1998). A collaborative wearable system
with remote sensing. In 2012 2nd International Symposium on Wearable Computers
(ISWC 98), (pp. 1017). IEEE Computer Society, October 1920, 1998, Pittsburgh, PA.
Billinghurst, M., Bowskill, J., and Morphett, J. (1998a). WearCom: A wearable communication space. In Proceedings of CVE'98: Collaborative Virtual Environments 1998, June
17th19th, 1998, Manchester, UK, pp. 123130.
Billinghurst, M., Kato, H., Bee, S., and Bowskill, J. (1998). Asymmetries in collaborative
wearable interfaces. In 2012 16th International Symposium on Wearable Computers
(pp. 133133). IEEE Computer Society.
Billinghurst, M., Weghorst, S., and Iii, T. F. (1997). Wearable computers for three dimensional
CSCW. In 2012 3rd International Symposium on Wearable Computers (pp. 3939).
IEEE Computer Society, October 18th19th, San Francisco, CA.
Biocca, F., Harms, C., and Burgoon, J. K. (2003). Toward a more robust theory and m
easure
of social presence: Review and suggested criteria. Presence Teleoperators Virtual
Environment, 12(5), 456480.
Bottecchia, S., Cieutat, J. M., Merlo, C., and Jessel, J. P. (2009). A new AR interaction paradigm
for collaborative teleassistance system: The POA. International Journal on Interactive
Design and Manufacturing (IJIDeM), 3(1), 3540.
Bottecchia, S., Cieutat, J. M., and Jessel, J. P. (2010). TAC: Augmented reality system for
collaborative tele-assistance in the field of maintenance through internet. In Proceedings
of the First Augmented Human International Conference, 2010, p. 14. April 2–3,
2010, Megève, France.
British Telecom. (1993). CamNet Promotional Video.
Cheng, L.-T. and Robinson, J. (1998). Dealing with speed and robustness issues for video-based registration on a wearable computing platform. In Wearable Computers,
1998. Digest of Papers. Second International Symposium on, October 19–20, 1998,
Pittsburgh, PA, pp. 84–91.
Clark, H. H. and Wilkes-Gibbs, D. (1986). Referring as a collaborative process. Cognition,
22, 1–39.
Drugge, M., Nilsson, M., Parviainen, R., and Parnes, P. (2004). Experiences of using wearable
computers for ambient telepresence and remote interaction. In Proceedings of the 2004
ACM SIGMM Workshop on Effective Telepresence (pp. 2–11). ACM, October 10–16,
New York, NY.
Feiner, S., MacIntyre, B., Höllerer, T., and Webster, A. (1997). A touring machine: Prototyping
3D mobile augmented reality systems for exploring the urban environment. Personal
Technologies, 1(4), 208–217.
Fussell, S. R., Kraut, R. E., and Siegel, J. (2000). Coordination of communication: Effects
of shared visual context on collaborative work. In Proceedings of the 2000 ACM
Conference on Computer Supported Cooperative Work (pp. 21–30). ACM, December
2–6, Philadelphia, PA.
Fussell, S., Setlock, L., Yang, J., Ou, J., Mauer, E., and Kramer, A. (2004). Gestures over
video streams to support remote collaboration on physical tasks. Human–Computer
Interaction, 19(3), 273–309.
Google Street View. (2014). http://www.instantstreetview.com/. Accessed October 14, 2014.
Hestnes, B., Heiestad, S., Brooks, P., and Drageset, L. (2001). Real situations of wearable computers used for video conferencing and for terminal and network design.
In Wearable Computers, 2001. Proceedings of Fifth International Symposium on
(pp. 85–93). IEEE, October 7–9, Zurich, Switzerland.
Kirk, D. S. and Fraser, D. S. (2005). The effects of remote gesturing on distance instruction.
In Proceedings of the Conference on Computer Supported Collaborative Learning, May
30–June 4, Taipei, Taiwan, 2005 (pp. 301–310).
Kortuem, G. (1998). Some issues in the design of user-interfaces for collaborative wearable
computers. In IEEE Virtual Reality Annual International Symposium, March 14–18,
Atlanta, GA.
Kortuem, G., Bauer, M., and Segall, Z. (1999a). NETMAN: The design of a collaborative
wearable computer system. Mobile Networks and Applications, 4(1), 49–58.
Kortuem, G., Schneider, J., Suruda, J., Fickas, S., and Segall, Z. (1999b). When cyborgs meet:
Building communities of cooperating wearable agents. In Wearable Computers, 1999.
Digest of Papers. The Third International Symposium on, October 18–19, San Francisco,
CA, (pp. 124–132). IEEE.
Krauss, R. M. and Weinheimer, S. (1966). Concurrent feedback, confirmation and the encoding
of referents in verbal communication. Journal of Personality and Social Psychology, 4,
343–346.
Kraut, R. E., Miller, M. D., and Siegel, J. (1996). Collaboration in performance of physical tasks: Effects on outcomes and communication. In Proceedings of the 1996 ACM
Conference on Computer Supported Cooperative Work (CSCW '96), November 16–20,
Boston, MA, (pp. 57–66). ACM.
Kurata, T., Sakata, N., Kourogi, M., Kuzuoka, H., and Billinghurst, M. (2004). Remote
collaboration using a shoulder-worn active camera/laser. In Wearable Computers, 2004.
ISWC 2004. Eighth International Symposium on, October 31–November 3, Arlington,
VA, (vol. 1, pp. 62–69). IEEE.
Kuzuoka, H., Kosuge, T., and Tanaka, M. (1994). GestureCam: A video communication system
for sympathetic remote collaboration. In Proceedings of the 1994 ACM Conference on
Computer Supported Cooperative Work (pp. 35–43). ACM.
Mann, S. (1997). Wearable computing: A first step toward personal imaging. Computer, 30(2),
25–32.
Mann, S. (2000). Telepointer: Hands-free completely self-contained wearable visual augmented reality without headwear and without any infrastructural reliance. In Wearable
Computers, The Fourth International Symposium on (ISWC 2000), October 16–17,
Atlanta, GA, (pp. 177–178). IEEE.
Mayol, W. W., Tordoff, B., and Murray, D. W. (2000). Wearable visual robots. In The Fourth
International Symposium on Wearable Computers, Atlanta, GA, October 16–17, 2000
(pp. 95–102). IEEE Computer Society.
Microsoft StreetSide. (2014). http://www.microsoft.com/maps/streetside.aspx. Accessed
October 14, 2014.
Ou, J., Fussell, S. R., Chen, X., Setlock, L. D., and Yang, J. (2003). Gestural communication
over video stream: Supporting multimodal interaction for remote collaborative physical
tasks. In Proceedings of the Fifth International Conference on Multimodal Interfaces
(ICMI 2003), November 5–7, Vancouver, Canada, pp. 242–249.
Pece, F., Steptoe, W., Wanner, F., Julier, S., Weyrich, T., Kautz, J., and Steed, A. (2013).
Panoinserts: Mobile spatial teleconferencing. In Proceedings of the SIGCHI Conference
on Human Factors in Computing Systems (CHI '13). ACM, New York, NY, 1319–1328.
Poelman, R., Akman, O., Lukosch, S., and Jonker, P. (2012). As if being there: Mediated
reality for crime scene investigation. In Proceedings of the ACM 2012 Conference
on Computer Supported Cooperative Work (CSCW '12), February 11–15, Seattle, WA,
pp. 1267–1276.
Rhodes, B. J. (1997). The wearable remembrance agent: A system for augmented memory.
Personal Technologies, 1(4), 218–224.
Siitonen, M. and Olbertz-Siitonen, M. (2013). I am right here with you – Constructing
presence in distributed teams. In Proceedings of International Conference on Making
Sense of Converging Media (AcademicMindTrek '13), October 1–4, Tampere, Finland,
pp. 11–16.
Starner, T. (2001). The challenges of wearable computing: Part 1. IEEE Micro, 21(4), 44–52.
Starner, T. E. (1999). Wearable computing and contextual awareness. Doctoral dissertation,
Massachusetts Institute of Technology, Cambridge, MA.
Trendafilov, D., Vazquez-Alvarez, Y., Lemmelä, S., and Murray-Smith, R. (2011). Can we
work this out? An evaluation of remote collaborative interaction in a mobile shared
environment. In Proceedings of the 13th International Conference on Human Computer
Interaction with Mobile Devices and Services (MobileHCI '11) (pp. 499–502). ACM:
New York.
Xu, W. and Mulligan, J. (2013). Panoramic video stitching from commodity HDTV cameras.
Multimedia Systems, 19, 407–426.
Author Index
A
Aanjaneya, M., 153
Ababsa, F., 152
Abbott, J., 233–234
Abe, Y., 503
Abidi, M.A., 127
Abolmaesumi, P., 497
Aboutalebi, S.H., 653
Abumi, K., 503
Acik, M., 653
Ackerman, J., 503
Ackermann, H., 505
Adams, J., 40
Adcock, M., 231
Adeli, H., 359
Adhami, L., 502
Adhikary, S.D., 495
Agarwal, R., 502
Agusanto, K., 460–462, 497
Ahlers, K., 461–462
Ahmadi, S.-A., 508
Ahn, B., 231
Ahn, H., 506
Ahonen, J., 300
Ahrens, J., 300, 321
Aiteanu, D., 334
Aittala, M., 461–462, 468
Akahoshi, T., 535
Akiyama, S., 314
Akkar, S.D., 358
Akman, O., 666
Alahi, A., 212–213
Alam, S., 294, 315
Alba, M., 359
Al-Deen Ashab, H., 497
Alem, L., 666
Alexander, E., 530
Algazi, V.R., 315, 321
Alkire, M., 509
Allen, K., 68
Allen, P.K., 165, 556
Allen, R., 264
Allotta, B., 552
Allread, B.S., 376
Alonso, A., 587, 589
Alp, M.S., 522
Alpern, M., 80
Alphonse, L., 383
Altobelli, D., 530, 533
Azizian, M., 499, 503
Azuma, R.T., 3, 152, 227, 259–273, 333, 373, 412, 458
B
Baber, C., 587, 589, 592, 600, 603, 608–612
Bächer, M., 577
Bachhal, S., 233–234
Back, M., 288
Bade, C., 435
Bae, C., 586, 590, 606–607
Bae, J., 653
Baer, M., 288, 594, 597, 599, 606–608, 613
Baere, T.D., 492
Baggi, D., 315
Baghai, R., 587, 592, 606–607, 609
Bagozzi, R.P., 417
Bai, W., 653
Baillie, S., 234
Baillot, Y., 152, 227, 333, 373, 458
Bainbridge, D., 311, 501, 535
Bajura, M., 461–462, 503
Bala, K., 458
Ballagas, R., 271
Ballester, M.A.G., 533
Ballou, G.M., 283
Balogh, E., 504
Bamberg, S.J.M., 588–589, 594, 607
Bamji, C., 346
Banerjee, P., 231
Banfi, G., 509
Banger, M., 495
Banihachemi, J.-J., 509
Bannach, D., 613
Bansal, R., 363
Baraff, D., 558
Barah, A., 492
Barandiaran, I., 536
Baranski, B., 415
Baratoff, G., 436, 473
Barbagli, F., 250, 556
Barbi, J., 564
Barea, R., 447
Barfield, W., 8–11, 61, 278
Barger, J., 313
Barner, K.E., 250
Barnum, P.C., 438
Barrett, A., 495
Bartczak, B., 345–346
Barthes, R., 52
Bartoli, A., 508
Bartz, D., 463, 466, 468, 470, 473–478, 542
Barzel, R.,
Bascle, B., 531532
Bass, L., 80
Bastien, S., 539
Bauck, J.L., 296
Bauer, M., 535, 552, 608, 662, 664
Bauernschmitt, R., 535
Baumann, H., 22
Baumhauer, M., 487, 497, 499, 538
Baur, C., 509
Bay, H., 207–208
Bayart, B., 232, 234
Bayer, M., 71
Bayless, A., 504
Becker, B.C., 532
Becker, M., 415
Beder, C., 346
Bederson, B.B., 311
Been-Lirn, H.D., 407
Begault, D.R., 281–282, 315
Behringer, R., 152, 227, 333, 373, 458
Behzadan, A.H., 331–392
Beidaghi, M., 642
Bekaert, P., 458
Belkin, M., 81
Bell, B., 80
Benbasaty, A.Y., 588–589, 594, 607
Benford, S., 260, 271, 293, 317
Benhimane, S., 333
Bennett, E., 230
Bennett, J., 52
Bensmaïa, S.J., 250
Bensoussan, P.-J., 556
Berclaz, T.J., 153
Berg, S., 506
Bergamasco, M., 552–554, 556
Bergasa, L., 447
Berger, M.-O., 344–345, 473, 508, 528, 537
Berglund, M.E., 629
Bergmann, H., 502, 532
Berlin, G., 622, 633
Bernard, A., 556
Bernold, L.E., 370–371
Berrezag, A., 554
Bharatula, N.B., 594–597, 608
Bhardwaj, M., 18
Bhat, V., 499
Bhattacharya, R., 621
Bianchi, G., 231
Bianchi, M., 553, 555
Bibeau, K., 632, 636
Bicchi, A., 553, 555
Bichlmeier, C., 507–508, 529–530, 542
Bickel, B., 577
Bier, J., 497
Biggs, K., 552
Bilinski, P., 321
Billinghurst, M., 65, 73, 127, 156, 230, 385–386, 412, 415–416, 426, 477, 487, 661–677
Biocca, F., 672
Biocca, F.A., 80
Birkfellner, W., 502, 532
Birth, M., 492
Bischof, H., 186, 192
Bischof, M., 506
Bjorneseth, O., 61
Black, P.M., 530, 532–533
Blackwell, M., 504, 535, 539
Blanc, D., 587, 592, 606–607, 609
Blank, J., 51
Bläsing, B., 314
Blattner, M.M., 293
Blauert, J., 281
Bleser, G., 152, 443, 445
Blinovska, A., 587, 592, 606–607, 609
Blum, T., 493, 508
Blyth, M., 495
Board, T.N., 509
Boche, F., 377
Boctor, E.M., 492
Bodenheimer, T., 508
Bodenstedt, S., 508, 530, 534
Boff, K.R., 62
Bogaert, J., 500
Bohm, H.D.V., 62
Bolles, R.C., 126, 186, 192, 196, 213, 216
Bolter, J.D., 263, 268, 270
Bonanni, L., 46, 52
Bonderover, E., 630
Bonfiglio, A., 588–589, 593, 603, 607, 611, 613
Book, W.J., 554
Boron, A., 418
Borst, C.W., 231–235, 555
Bose, B., 234
Bosio, L., 552
Bosson, J.-L., 509
Botden, S.M., 487
Bottecchia, S., 666
Bottou, L., 208
Bouarfa, L., 508
Bougnoux, S., 461462
Bouguet, J.-Y., 474
Bound, K., 40
Bourgeois, S., 433–455
Boussetta, K., 303
Bouwman, H., 411
Bowers, J., 293, 317
Bowie, J., 383
Bowskill, J., 662
Boyce, J., 464
Boyer, D., 413
Bozic, K.J., 509
Brac-de-la-Perriere, V., 19
Bradley, G., 311
Bradski, G., 208, 212213
Brambilla, G., 587, 592, 606, 610
Brandner, M., 198
Braun, A.-K., 298
C
Cabrera, D., 281
Cabrera-Umpiérrez, M.F., 415
Caffrey, J.P., 359
Cai, H., 335
Cai, K., 477
Cai, X., 652–653
Cakmakci, O., 73
Caldas, C.H., 376
Caldwell, D.G., 554
Califf, R.M., 490
Calonder, M., 212–213
Calvet, L., 440
Campbell, B., 73
Campbell, F.W., 80
Campion, G., 250
Canny, J., 202, 383
Carbonaro, N., 636
Cardia, A., 509
Carleberg, P., 506
Carlile, S., 282
Caron, G., 139
Carpentier, T., 322
Carr, C.E., 324
Carrasco, E., 536
Carretero-González, J., 653
Carrigy, T., 314
Carrino, J.A., 504
Carrozza, M.C., 537
Carvalho, E.G.M.D., 492
Casciaro, S., 536
Castiglione, A., 289, 323
Castiglioni, P., 587, 592, 606, 610
Castillo-Martínez, E., 653
Castrillon-Oberndorfer, G., 508, 529, 533–534
Cater, K., 314
Catmull, E., 460
Caudell, T.P., 63, 487
Caversaccio, M., 533
Cha, J., 231
Chalupper, J., 302
Chan, H., 496, 507
Chan, S., 414
Chan, V.W., 492
Chandra, T., 127
Chang, J., 96, 652
Chang, W.M., 504
Chang, Y., 232
Chapman, D.N., 372–373
Charbel, F.T., 522
Charvillat, V., 440
Chasey, A., 370–372
Chaumette, F., 152
Chen, F., 197
Chen, I., 234
Chen, J., 477
Chen, S.J.-S., 507, 525
Chen, W.C., 209
Chen, X., 501, 653, 665
Cheng, A., 497
Cheng, D., 75, 86, 96
Cheng, H., 652–653
Cheng, J., 586, 590, 606, 613–614
Cheng, K.T., 195–223
Cheng, L.-T., 668
Cheok, A.D., 314, 407
Cheon, Y.J., 219
Cherenack, K., 621
Chi, D., 314
Chi, S., 376
Chib, V.S., 504
Chin, K.J., 492
Chinello, F., 552–555
Chiu, P., 197
Chli, M., 212–213
Chmiola, J., 641
Cho, D., 314
Cho, H.U., 653
Cho, Y., 152
Chock, G., 358
Chodorge, L., 556
Choi, H.-W., 435
Choi, J.-W., 300
Choi, K., 376
Choi, M., 411
Choi, S., 227–251
Chu, H.W., 655
Chu, M.W.A., 501
Chuangui, Z., 460–462
Chubb, E.C., 553
Chum, O., 179, 186, 192, 196, 216
Chun, J., 233
Chung, A.J., 508
Chung, W.K., 250
Chung, W.Y., 646, 654
Ciancitto, F., 586, 589, 591, 606–607, 613
Cianflone, D., 587, 589
Cicinelli, J.G., 290, 311
Cieutat, J.M., 666
Ciglar, M., 302
Ciocarlie, M., 556
Cipolla, R., 441, 444
Cipolla, T., 594, 597, 599, 602–603
Cirio, G., 552–577
Civolani, M., 554
Clark, H.H., 666
Clarkson, M., 531
Claus, D., 152
Clawson, J., 18, 27
Cleary, K., 488, 499, 503
Cline, H., 533
Cobb, J., 495
Cockburn, A., 17
Cohen, D., 33
Cohen, J.D., 380
Cohen, M., 278–304, 309–324
Colantonio, S., 586, 589, 591, 606–607, 613
Coleman, M., 495
Coleman, P.D., 292
Colgate, J.E., 553, 562, 564, 568
Coller, B.S., 490
Collet, A., 152
Collins, D.L., 507, 520–543
Collins, J., 474
Collomosse, J., 465
Colquhoun, H. Jr., 229–230
Colucci, D., 537–538
Comeau, C.P., 60
Comet, B., 587, 592, 606–607, 609
Comport, I., 152
Conditt, M.A., 495, 509
Conway, F., 314
Cooper, D.H., 296
Cooper, J., 464
Cooper, S., 333
Cooperstock, J.R., 324, 541
Coote, A., 42
Coquillart, S., 555
Correia, A., 411
Corroy, C., 587, 592, 606–607, 609
Corso, J., 231, 234
Costa, F., 509
Costabile, M.F., 415
Costello, S.B., 372–373
Coste-Manière, E., 502
Cote, W., 487
Cotin, S., 528, 537
Cottle, R., 567
Coughlin, J., 629, 636
Courtecuisse, H., 556
Crampton, C., 461–462
Crawford, J.R., 537–538
Crockett, M., 50
Crossley, F., 242, 246–248
Csoma, C., 504
Cui, L.F., 647
Cui, Y., 647
Culbertson, H., 250
Cummins, M., 446
Cundy, T.P., 508
Cunningham, D., 478
Curatu, C., 119
Curone, D., 588–589, 593, 603, 607, 611, 613
Currie, M., 501
Cutkosky, M.R., 232, 234, 241, 556
Cynader, M., 292
D
Daeuber, S., 506
Dai, D., 463, 478
Dai, F., 333, 359
Dalton, R.J. Jr., 315
Daly, L., 322
Daly, M.J., 496, 507
Dandekar, K.R., 639
Dandekar, O., 499
Dankelman, J., 508
Dario, P., 537, 593
Darzi, A.W., 508
Dator, J., 49
Datta, A., 438
Dodson, A.H., 333, 374, 377
Doermann, D., 197
Dogan, S., 505
Dohi, T., 505, 522
Doi, K., 303
Doil, F., 435
Dolveck, F., 594, 597, 599, 606–608, 613
Donath, J., 288
Dong, S., 331–392
Dong, X., 500, 538–539
Dong, Y., 234
Dori, Y., 500
Dou, M., 69, 73
Dow, S., 263, 266
Doyle, R., 52
Drageset, L., 664
Drake, S.H., 537–538
Dramis, A., 509
Drettakis, G., 461–462
Drif, A., 232, 234
Drouin, S., 528
Drugge, M., 664
Drummond, T.R., 126
Drummond, T.W., 152, 192, 198, 207, 221, 438, 441, 444
Duckworth, G., 313
Duda, R.O., 315, 321, 363
Dudnik, G., 588–589, 593, 603, 607, 611, 613
Duerk, J.L., 503, 531
Duh, H.B.L., 236
Dujovny, M., 522
Dumindawardana, U.C., 315
Dunn, B., 643
Dunne, L.E., 619–638
Dünser, A., 412, 415–416, 426
Dunston, P.S., 333–334, 385
Duraiswami, R., 321
Duriez, C., 556
Durkin, D.P., 655
Durlach, N.I., 278, 324
Dutre, P., 458
Dyatkin, B., 642
Dymarkowski, S., 500
E
Eade, E., 438
Eagleson, R., 535
Eble, T., 461–462
Ector, J., 500
Edwards, A.D.N., 312
Edwards, P.E., 488
Edwards, P.J., 502, 531–532, 541
Eggers, G., 506, 522, 529, 533–534
Eggers, J., 302
Ehlerding, A., 493
Eid, A., 491
Eisenberg, M., 621, 629
Eissele, M., 471–472
Elgort, D.R., 503, 531
Elhawary, H., 508
Eliason, F.A., 13
Ellis, J.B., 288
Ellis, R.E., 231, 233–234, 237
Ellsmere, J., 496
Elson, D., 508
El-Tawil, S., 358–359
Endo, T., 68
Engdegård, J., 314
Engelke, T., 322, 415
Entrena, M., 288
Eom, S., 231
Esrafilzadeh, D., 653
Ess, A., 207208
Ettinger, G.J., 496
Ettinger, G.L., 487
Euler, E., 538–539
Evans, A., 333
Evenhouse, R., 231
Everest, F.A., 296, 323
Ewald, H., 594, 597, 599, 606
Ewers, R., 532
F
Fahlén, L., 293, 317
Falch, C., 314
Faldowski, R., 250
Falk, V., 502
Fallavollita, P., 501
Faller, C., 314
Fan, F., 655
Fang, X., 653
Faranesh, A.Z., 500
Farrant, A., 436
Faure, F., 460, 473
Fauvarque, J.F., 642
Fayad, L.M., 504
Fayos, Z.A., 654
Fedkiw, R., 564, 570
Feifer, A., 487
Feiner, S., 63, 80, 153, 227, 263, 288, 312, 333,
373, 458, 662
Feiner, S.B., 63
Felfoldy, G.L., 10
Fellner, D., 376
Fels, S.S., 324
Feng, C., 335, 386
Feng, M., 359
Feng, Z., 236
Fenlon, M.R., 532
Feriche, M., 358
Fernando, O.N.N., 315, 319
Fernie, A., 70
Ferratini, M., 587, 592, 606, 610
Ferreira, M., 438
Ferretti, L., 552
Feuerstein, M., 499, 507–508, 535, 542
Feussner, H., 508
Ficco, M., 289, 323
Fichtinger, G., 504
Fickas, S., 662
Fields, B., 314
Fiene, J., 554
Figl, M., 502, 532
Fischer, E., 302
Fischer, G.S., 504
Fischer, J., 457–479, 542
Fischer, M., 383
Fischler, M.A., 126, 186, 192, 196, 213, 216
Fisher, R., 69
Fitzgibbon, A.W., 152
Flanagan, P., 31–53
Fleig, O., 527, 532–533
Flohr, D., 478
Florea, L., 231
Floyd, A.J., 495
Fogel, M.A., 500
Foglia, E., 509
Foley, K.T., 491
Foley, M.P., 654
Foner, L., 312
Fong, T., 509
Fontana, F., 554
Fontecchio, A.K., 639
Foo, J.L., 487
Forest, C., 536–537
Fornari, M., 509
Forrest, N.N., 234
Fortin, P.-A., 345
Fouad, H., 291
Fournier, A., 464
Foxlin, E., 198, 222
Frahm, J.M., 186, 192
Franklin, J., 42
Fraser, D.S., 665
Fraunhofer IZM, 622, 633
Freed, D.J., 287
Fregonese, L., 359
Freiman, M., 497
French, L.M., 199, 209, 212
Freudenthal, A., 509, 536
Frey, C., 42
Frey, M., 234
Freysinger, W., 498
Friedrich, W., 438
Frisoli, A., 553–554, 556
Frith, A., 632
Fritz, D., 538
Fritz, J.P., 250
G
Gabbard, J.L., 79
Gabriel, T.H., 45
Gaikwad, A.M., 655
Gaikwad, V., 415
Gain, B., 17
Gamper, H., 301
Gandy, M., 263
Gangi, A., 496
Ganster, H., 198
Gao, H., 652
Gao, S., 411, 426
Garcia-Hernandez, N., 554
Gardiner, M., 267
Garg, M., 18
Garner, W.R., 10
Garre, C., 552–577
Gaston, R.P., 502, 531
Gavaghan, K.A., 502, 506
Gay-Bellile, V., 433–455
Gaye, L., 314
Gazzola, V., 50
Gedenryd, H., 290
Gelfand, N., 209
Gemperle, F., 552, 608
Geng, J.F., 646, 655–656
George, A.K., 500
Georgel, P., 333
Georgi, M., 588, 593, 607
Gerard, I., 528
Gerhard, M., 415
Gerling, G.J., 234
Geronazzo, M., 321
Gersak, B., 536
Gershenfeld, N., 46
Gervautz, M., 460, 473
Gewirtz, J., 554
Gholamvand, Z., 653
Ghosh, A., 652
Giannachi, G., 260, 271
Gibson, D., 315
Gibson, J., 13
Gilbert, G., 313
Gilkey, R.H., 281
Gillespie, M.J., 500
Gilliland, S., 18, 22
Ginsburg, D., 459
Gioberto, G., 619–638
Gionis, A., 215
Giraldez, J.G., 533
Glatz, A.C., 500
Gleason, L., 533
Gleason, P.L., 530
Gleeson, B.T., 553, 555
Gleeson, M.J., 531–532
Glossop, N., 505
Gockel, T., 534
Goddard, M.S., 495
Godinez, C., 499
Gogotsi, Y., 639–657
Gokturk, B.S., 346
Golparvar-Fard, M., 333–334
Gomes, P., 438, 495
Gomez, R., 536
Gomila, C., 464
Gondan, M., 497
Gong, J., 376
Gong, R.H., 500
Gonzalez, R., 185, 202
Goodger, N.M., 532
Gool, L.V., 207–208
Gopal, P., 233–234
Gordon, I., 441
Gorgutsa, S., 653, 655
Görtler, J., 508, 530
Gosselin, F., 556
Goto, A., 588–589, 593, 608
Gottschalk, S., 380
Gounaris, M., 413
Gozen, A.S., 487, 538
Gracia, A., 553, 568
Grange, S., 509
Gransther, P.A., 411
Graser, A., 334
Grasset, R., 80, 156
Grätzel, C., 509
Greenberg, R.M., 293
Greenebaum, K.,
Greenhalgh, C., 293, 317
Greer, D., 461–462
Greiner, G., 536
Grimmer, K., 17
Grimmer-Somers, K., 17
Grimson, W.E., 487
Grimson, W.E.L., 532–533
Grimson, W.L., 496
Grioli, G., 553
Groch, A., 508
Grodski, J.J., 400
Groen, F., 466
Grollmann, B., 152
Grosch, T., 461–462
Gross, M., 552–577
Grosshauser, T., 234, 314
Gruetzner, P.A., 500, 538–539
Grundmann, U., 490
Grzeszczuk, R., 153
Gu, J.F., 653
Guan, T., 345
Guan, W., 152–153, 169
Gugino, L., 487
Guha, S.K., 234
Guillerminet, O., 303
Guiraudon, G.M., 501, 535
Guitton, P., 478
Güler, Ö., 500
Gumerov, N.A., 321
Gunawan, A., 464
Gunkel, A.R., 498
Gunn, C., 231
Günthner, W.A., 436
Guo, A., 22
Gurdjos, P., 440
Gurung, J., 505
Guruswamy, V.L., 250
Gustafsson, T., 506
Gutierrez, O., 312
Gutt, C.N., 499, 535–536
Guven, E.O., 487, 538
Guven, S., 487, 499, 538
Gyorbíró, N., 293, 315
H
Ha, I., 411
Ha, T., 232, 234, 236
Ha, Y., 73
Haahr, M., 314
Haas, C.T., 376–377
Haberlin, B., 260
Habigt, J., 321
Habigt, T., 321
Hachet, M., 478
Hachisu, T., 234235
Haddadi, A., 242
Hadzic, A., 495
Haefke, M., 594, 597, 599, 606
Haffner, P., 208
Hager, G.D., 233–234, 492, 502
Hakime, A., 492
Hall, N.C., 493
Hallen, B., 75
Haller, M., 407, 476477
Hamacher, V., 302
Hamilton, H, 22
Hammad, A., 385, 412
Han, C., 234, 655
Han, J.-H., 234, 345
Han, S.H., 233
Handa, J., 233–234
Handel, S., 314
Hanel, R., 502, 532
Hansen, C., 507, 541
Hanuschik, M., 502
Haouchine, N., 528, 537
Haralick, R., 202
Harashima, H., 486
Harbisson, N., 47, 312
Harders, M., 227–251
Haren, S., 267
Härmä, A., 288
Harms, C., 672
Harms, H., 586, 589, 591–593, 600, 602–603, 607, 610–611, 613, 615
Harms, W., 506
Harrigan, P., 260
Harris, M.A., 500
Harris, S., 495
Hart, G.C., 362
Hart, P.E., 363
Hartley, R., 127, 132, 192, 442
Hartmann, W.M., 283
Hasegawa, S., 552, 554
Hashimoto, S., 400
Hashizume, M., 487, 535, 537
Hashtrudi-Zaad, K., 242
Hassenpflug, P., 536
Hassfeld, S., 506, 534
Hata, N., 522
Hattori, A., 537
Hattori, T., 303
Haugstvedt, A.-C., 411–428
Haus, G., 315
Havemann, S., 376
Haverhals, L.M., 654–655
Hawkes, D.J., 502, 531–532, 541
Hayes, G., 263
Hayford, M.J., 65
Hayward, V., 250, 553–554, 577
Haywood, K., 413
Hebert, C., 290, 311
Hebert, M., 152
Hebert, P., 345
Hedau, V., 153
Heibel, H., 152
Heiber, T., 662
Heidbuchel, H., 500
Heiestad, S., 664
Heilig, M., 60
Hein, A., 497
Heining, S.M., 499, 501, 507, 538–539, 541–542
Held, R.M., 324
Helferty, J.P., 498
Hella, L., 421
Heller, E.J., 297
Hellmuth, O., 314
Hellwich, O., 536
Helmsen, J., 199, 209, 212
Henckel, J., 495
Hendrix, C., 61
Henze, N., 613–614
Heon, M., 642, 647–650
Herbst, I., 298
Herder, J., 282
Hermann, T., 234, 314
Hernandez, F., 552–577
Herre, J., 314
Herrmann, K., 493
Hertzmann, A., 465
Hesina, G., 460, 473
Hestnes, B., 664
Higgins, W.E., 498
Hiipakka, J., 288
Hildebrand, P., 492
Hill, D.L.G., 502, 531–532
Hiller, B., 334
Hilliges, O., 267, 474
Hilpert, J., 314
Hinkle, G.H., 493
Hinterstoisser, S., 333
Hinze, J., 376
Hiroka, S., 87
Hirokazu, K., 386
Hirose, M., 78, 312
Hirota, G., 503
Hirota, K., 312
Hix, D., 79
Hmam, H., 126
Ho, C., 234
Hoelzer, A., 314
Hoermann, S., 474
Hoever, R., 232
Hoffmann, J., 508, 534
Holey, P., 415
Holland, D., 8
Holland, S., 290
Hollands, R., 333
Höllerer, T.H., 79–80, 174–192, 263, 662
Hollins, M., 250
Hollis, R.L., 233–234
Holman, T., 299
Holmes, G., 311
Holmes, O.W., 44
Holmquist, L.E., 314
Holt, R.E., 292
Hong, J., 535
Hong, K., 233–234
Hong, K.-S., 593
Hong, L.X., 497
Hong, S., 596, 598, 602, 607
Honkamaa, P.,
Hoogen, J., 234
Hoppe, H., 506, 534
Hori, T., 498, 531
Horowitz, M.B., 504
Horschel, S.K., 555
Hoshi, K., 505
Hoshi, T., 233–234
Hou, S., 652–653
Houde, A., 630
Houtgast, T., 292
Howe, R.D., 556
Howells, J., 40
Howes, D., 46, 52
Hsiao, E., 152
Htoon, M.M., 293
Hu, C., 652–653
Hu, L.B., 647
Hu, M.H., 539
Hu, Y., 652–653
Hu, Z., 126
Hua, H., 75, 86, 117–119
Huang, J., 300
Huang, P., 641
Huang, W., 666
Hubbard, A., 314
Huber, K., 532
Huber, P.J., 181, 183
Hughes, C., 268
Hughes, D., 268
Hughes-Hallett, A., 508
Hugues, O., 232
Huhle, B., 473
Hull, R., 314
Hummel, J., 502, 532
Hunt, K., 242, 246–248
Hutchings, R., 314
Hutchins, M., 231
Hutchinson, R.C., 597, 600, 606, 609
Huttenlocher, D., 192
Hwang, K., 234
Hwang, P., 497
Hyakumachi, T., 503
I
Iannitti, D.A., 492
Ibrahim, M., 416
Ieiri, S., 499, 535
Ieong, E., 594–596
Igarashi, T., 400
Iida, K., 281
Ikeda, A., 234
Ikeda, M., 300, 376
Ikeda, S., 152
Ikei, Y., 312
Ikuta, H., 588–589, 593, 608
Iltis, R.A., 219
Imhof, H., 502, 532
Imielinska, C., 487
Imlab, C., 39
Inami, M., 231, 233–234, 400, 405–407
Indugula, A.P., 555
Indyk, P., 215
Inomata, T., 505
Inoue, T., 594, 597, 599, 602–603
Ioannidis, N., 265, 413
Iordachita, M., 233–234
Irish, J.C., 496, 507
Irschara, A., 186, 192
Irving, G., 564
Isard, M., 192, 215
Iseki, H., 498, 522, 531
Isenberg, T., 465
Ishida, A., 400
Ishii, H., 46, 52, 324
Ishii, M., 233
Ishimaru, S., 595, 598
Ismail, S., 22
Ito, M., 503
Itoh, T., 630
Itoh, Y., 79
Iwata, H., 234
Iwaya, Y., 281
Izadi, S., 267, 474
J
Jacobs, J., 555–556
Jacobs, S., 502
Jakimowicz, J.J., 487
Jakka, J., 288
Jakopec, M., 495
Jakubowicz, J., 180, 363
Jalili, R., 653
James, D.L., 232, 556, 567
Jan, M.F., 263
Jank, E., 493
Jannin, P., 508, 520–543
Jappinen, J.,
Jarchi, D., 594–596
Javidi, B., 117–118
Jelle, T., 416
Jenkin, M.R., 280
Jenkins, S., 23
Jeon, S., 227–251
Jeong, J., 345
Jeong, S., 647
Jesberger, J.A., 503, 531
Jessel, J.P., 666
Jewell, D., 531532
Ji, Y., 359
Jiang, B., 152, 198
Jimenez, J.M., 536
Jin, C., 288
Jinnah, R.H., 495
Jiroutek, M., 503
John, T.K., 495
Johnson, A., 231
Johnson, L.F., 413
Johnson, L.G., 502, 507, 541
Jolesz, F.A., 530, 532–533
Jones, D.L., 535
Jones, M., 311, 588–589, 593, 629–630
Jones, S., 311
Jones, V.M., 500
Jonker, P., 666
Jonker, P.P., 508
Joskowicz, L., 497
Jost, K., 639–657
Jot, J.-M., 280, 282
Jouffrais, C., 312
Joung, M., 234
Julier, S.J., 79, 152, 227, 333, 373, 458, 667
Jun, K., 314
Jundt, E., 434
Jung, C., 231
Jung, H.-R., 652, 655
Junghanns, S., 376
K
Kaaresoja, T., 234
Kaasinen, E., 411
Kaczmarek, K.A., 61
Kagotani, G., 400
Kahl, F., 127
Kahn, S., 440
Kahrs, L.A., 506, 531–532
Kajimoto, H., 231, 233–234, 552, 554
Kajinami, T., 78
Kakeji, Y., 499
Kalkofen, D., 80, 478, 507
Kallmayer, C., 622, 633
Kalra, A.K., 234
Kamat, R.V., 383
Kamat, V.R., 331–392
Kameas, A., 324
Kamijoh, N., 594, 597, 599, 602–603
Kammoun, S., 312
Kamuro, S., 552, 554
Kán, P., 469
Kanade, T., 233–234, 389, 438, 504, 535, 539
Kiaii, B., 501
Kichun, J., 451
Kijima, R., 76
Kikinis, R., 487, 496, 530, 532–533
Kim, B., 596, 598, 602, 607
Kim, C.-H., 376–377, 554
Kim, D., 474
Kim, H., 250, 506, 554, 652
Kim, J., 126, 231, 652
Kim, J.C., 234
Kim, J.H., 219
Kim, J.-H., 593
Kim, J.M., 653
Kim, K., 652, 655
Kim, S., 234, 300
Kim, S.H., 586, 590, 606–607, 653
Kim, S.-Y., 234, 554
Kim, W., 234
Kim, Y., 231
Kim, Y.-H., 300
Kim, Y.S., 593–594, 597, 599
King, A.J., 292
King, A.P., 502, 531–532
King, K., 377
King, L.E., 292
King, S., 314–315
Kini, A.P., 377
Kinkeldei, T.W., 621
Kirk, D.S., 665
Kirsch, N.J., 639
Kirstein, T., 621–622
Kishino, F., 4, 74, 259, 298, 486, 520
Kishishita, N., 71
Kitano, H., 400
Kiyokawa, K., 60–81
Klatzky, R., 504
Kleemann, M., 492
Klefstat, F., 587, 592, 606–607, 609
Klein, G., 185, 192, 442, 444, 459, 463, 467–468
Klein, J., 497
Klein, M., 497
Kleiner, M., 280
Klette, R., 126
Klinker, G., 79, 412, 436, 535
Klopschitz, M., 192
Knight, J.F., 587, 589, 592, 600, 603, 608–612
Knödel, S., 478
Knoerlein, B., 231
Knopp, M.V., 493
Kobayashi, E., 505
Kobbelt, L., 192
Koch, D.G., 65
Koch, R., 345–346, 434
Kockro, R.A., 497
Koda, K., 506
Kodama, K., 406
Koehring, A., 487
Kohli, P., 474
Kojima, M., 405–406
Kolb, A., 508
Komor, N., 18
Konishi, K., 487, 499, 535
Konolige, K., 208, 212–213
Koo, B., 383
Koo, J., 233–234
Koppens, J., 314
Korah, T., 153
Korb, W., 540
Korkalo, O.,
Kornagel, U., 302
Kortuem, G., 662, 664, 667
Kosa, G., 232
Kosaka, A., 533
Koschan, A., 126
Köser, K., 345
Kosuge, T., 665
Kosugi, C., 506
Koto, T., 400
Koulalis, D., 491
Kourogi, M., 665, 668
Kramer, A., 673
Kraus, M., 471–472
Krauss, R.M., 666
Kraut, R.E., 662–664, 666
Kreaden, U., 502
Krebs, D.E., 588–589, 594, 607
Krempien, R., 506
Kress, B., 19, 86–123
Kreuer, S., 490
Krishnan, S., 358
Krogstie, J., 411–428
Krueger, T., 497
Krug, B., 497, 534
Kruger-Silveria, M.K.-S., 438
Kruijff, E., 71
Kry, P.G., 556
Kübler, A.C., 497, 534
Kuchenbecker, K.J., 228, 232, 234, 250, 554
Kuijper, A., 440
Kukelova, Z., 127, 129–130, 137, 140, 145
Kumar, A., 233–234
Kumar, R., 192
Kumar, S., 233–234
Kuntze, A., 271
Kunze, K., 595, 598
Kunze, S., 532
Kuorelahti, J., 28
Kurata, T., 152, 665, 668
Kurihara, T., 556
Kurita, Y., 234
Krkloglu, M., 500
Kuroda, T., 588–589, 593, 608
Kurzweg, T.P., 639
Kurzweil, R., 49–50
Kutter, O., 493
Kuzuoka, H., 665, 668
Kwon, D.-S., 554
Kwon, S., 377
Kwon, Y.H., 652, 655
Kyprianidis, J., 465
Kyriakakis, C., 320
Kyung, K.-U., 231, 234, 554
L
La Mantia, F., 647
La Palombara, P.F., 537
Labbe, B., 454
Labrune, J.-B., 46, 52
Lackner, C., 556
Laitinen, M.-V., 300
Lakatos, D., 46, 52
Lan, Z.-D., 126
Landerl, F., 477
Lang, J., 250
Lang, J.E., 495
Lang, J.W., 650
Lang, P., 198
Lang, T., 478
Langenstein, M., 655
Langlotz, F., 490
Langlotz, T., 80
Lanman, D., 69, 73
Lanzilotti, R., 415
Laperrière, R., 561
Largeot, C., 641
Larnaout, D., 433–455
Laroche, F., 413
Larsen, E., 380
Lasorsa, Y., 322
Lasser, T., 493
Lau, K.N., 492
Lauffer, M., 594–596, 600, 607, 609–611
Lazebnik, S., 215, 441
Le, V.T., 652
Lechner, M., 322
LeCun, Y., 208
Lécuyer, A., 233
Lederman, R.J., 500
Lee, H., 234, 314
Lee, H.M., 359
Lee, I., 233–234, 359, 436
Lee, J., 263, 503, 554
Lee, J.A., 653
Lee, J.B., 621, 638
Lee, J.-Y., 231
Lee, K.K., 588–589, 594
Lee, P.Y., 539
Lee, S., 333–334, 554, 596, 598, 602, 607
Lee, S.-G., 593–594, 597, 599
Lee, S.H., 345
Llach, J., 464
Lloyd, J.E., 232
Lo, B., 594–596, 606
Lo, J., 535
Lobe, T., 487
Lobes, L.A. Jr., 504, 532
Locher, I., 621
Lockhart, T.E., 588–589, 593
Lokki, T., 281, 288, 300, 314
Longridge, T., 70
Lonner, J.H., 495, 509
Loomis, J.M., 290, 311
Lopez, E., 447
Lopez, J., 371
López-Nicolás, C., 411
Lorensen, W., 533
Lorho, G., 288
Loriga, G., 588–589, 593, 603, 607, 611, 613
Lorussi, F., 636
Lossius, T., 322
Lotens, W., 61
Lothe, P., 447
Louis, J., 345
Lourakis, M., 348
Lourakis, M.I.A., 566, 573
Lovejoy, J., 500
Lovo, E.E., 487
Lowe, D.G., 160, 176, 179, 186, 192, 207–208, 389, 441
Loy, G., 281
Lozano-Perez, T., 496
Lu, J., 477
Lu, K., 498
Lu, M., 333, 359
Lucas, B.D., 389
Luciano, C., 231
Ludwig, L.F., 315
Ludwig, M.D., 287
Luebke, D., 69, 73, 75
Lueth, T.C., 497
Lukatskaya, M.R., 642
Lukosch, S., 666
Lukowicz, P., 27, 586–587, 590, 592, 594–597, 599–600, 606–611, 613–614
Luo, X., 22
Lupton, E., 51
Lv, Z., 652–653
Lyons, K., 17, 27
Lyytinen, K., 417
M
Ma, W., 655
Maataoui, A., 505
Macaluso, F., 594–596, 600, 607, 609–611
Macia, I., 536
MacIntyre, B., 63, 152, 227, 263, 266, 268, 270, 322, 333, 373, 458, 477–478, 662
MacKenzie, I.S., 584
MacLean, K.E., 233
Macq, B., 533–535
Maeda, N., 404
Maeda, T., 554
Maehara, Y., 535
Maeno, T., 314
Maes, F., 500
Magas, M., 314
Magenes, G., 588–589, 593, 603, 607, 611, 613
Magerkurth, C., 314
Magnenat-Thalmann, N., 561
Maguire, J.S., 40
Maguire, Y., 16
Mahalik, N.P., 231
Mahvash, M., 241
Maier-Hein, L., 497, 508
Maimone, A., 69, 73
Majdak, P., 322
Majno, P.E., 502
Makino, Y., 314
Malham, D.G., 292
Malhi, K., 594, 597, 599, 606
Mallem, M., 152
Mallet, E., 589, 594, 606
Malliopoulos, C., 587, 589
Malvezzi, M., 552–555
Mandal, P., 646, 655–656
Mandryk, R.L., 314
Manduchi, R., 476
Mann, S., 4, 33, 39, 48, 80, 661, 663
Mannava, S., 495
Manocha, D., 380
Marayong, P., 233
Marcacci, M., 537
Marchand, E., 139, 152
Marchand, H., 152
Marcus, H.J., 508
Marcus, J., 413
Marescaux, J., 496, 536537
Margetts, M., 46
Margier, J., 509
Mariani, J., 293, 317
Mariette, N., 303
Marino, S., 564, 570
Marmulla, R., 506, 522, 534
Marner, M.R., 267, 434
Martelli, S., 537
Martens, W.L., 282, 284, 287, 292, 296, 300,
317, 321
Martin, A., 288
Martin, A.D., 499
Martin, E.W. Jr., 493
Martin, J.S., 60
Martin, R., 552, 608
Martin, T., 588–589, 593, 629–630
Martinez, J., 345
Martinez, J.C., 352, 384
Martinie, J.B., 492
Martins, R., 86
Martz, P., 339
Marui, A., 296
Masamune, K., 504
Mashita, T., 71, 79
Masri, S.F., 359
Massie, W., 333
Master, N., 43
Matas, J., 179, 186, 192, 215
Mateas, M., 266
Mather, T., 554
Mathes, A.M., 490
Mathieu, H., 504
Matias, E., 584
Matsubara, H., 400
Matsuno, F., 400
Matsushita, S., 595–596, 612
Matthews, J., 263
Matula, C., 502, 532
Matusik, W., 577
Mauer, E., 673
Maurer, C.R. Jr., 502, 531–532
Maurer, U., 594, 597, 599, 602–603, 609
Mavor, A.S., 278
Mavrogenis, A.F., 491
Mavrommati, I., 324
May, M., 290
Mayer, E.K., 508
Mayol, W.W., 665
Mazé, R., 314
McAllister, D.F., 291
McCaffrey, E., 47
McCollum, H., 60
McDonough, J.K., 646–650, 656
McGookin, D.K., 293
McGrath, J., 495
McGuire, J., 71
McGurk, M., 532
McKillop, I.H., 492
McKinlay, A., 43
McNeely, W.A., 562
McQuillan, P.M., 495
Megali, G., 537
Meier, P., 152, 435
Meinzer, H.-P., 497, 499, 536
Meis, J., 415
Melamed, T., 314
Melchior, F., 300
Melzer, J., 86
Menassa, R., 436
Mendez, E., 333–334, 376, 478, 507
Meng, Q., 653
Meng, Y., 652–653
Menk, C., 434
Merlo, C., 666
Merloz, P., 491
Merritt, S.A., 498
Mershon, D.H., 292
Metje, N., 372–373
Metzger, J.-C., 227–251
Metzger, P.J., 486
Meyer, A.A., 537538
Meyers, K., 263
Meyrueis, P., 97
Mühling, J., 534
Miao, M., 653
Mikolajczyk, K., 207
Milanese, S., 17
Milgram, P., 4, 229–230, 259, 298, 400, 486, 520
Milios, E., 280
Miller, E., 487
Miller, M.D., 662–663, 666
Mills, J.E., 383
Milsis, A., 587, 589
Mimidis, G., 491
Minamizawa, K., 234, 552, 554
Mine, Y., 303
Miranda, E., 358
Mirow, L., 492
Mischkowski, R.A., 497, 534
Misra, M., 522
Mitake, H., 552, 554
Mitchell, B., 233–234
Miyake, K., 630
Miyaki, T., 594–596, 603
Miyano, G., 487
Miyata, N., 556
Miyazaki, J., 131
Mizell, D.W., 63
Mockel, D., 504
Moffitt, K., 86
Mofidi, A., 495
Mohr, F.W., 502
Moital, M., 411
Mojzisik, C.M., 493
Mok, K., 528
Molina-Castillo, F.J., 411
Molyneaux, D., 474
Monden, M., 497, 535
Monitoring, D., 359
Montola, M., 260
Moore, D.R., 292
Moore, J., 501, 505, 535
Moore, J.T., 501
Mooser, J., 152, 156, 158, 162, 169
Mora, S., 418
Morandi, X., 508
Morel, J.-M., 180, 363
Morel, P., 502, 506
Moreno, E., 268
Moreno-noguer, F., 126
Mori, H., 79
Morphett, J., 662
Morse, D.R., 290
Mote, C.D., 564
Motwani, R., 215
Mountain, D., 314
Mountney, P., 508
Mouragnon, E., 442
Mourgues, F., 502
Mudur, S.P., 385
Mueller, S., 461–462
Mühling, J., 506, 522, 534
Mukhopadhyay, S.C., 594, 597, 599, 606
Muller, K., 503
Müller, M., 497, 499, 560
Müller, S., 436
Muller-Stich, B.P., 535–536
Mulley, B., 587, 589, 592, 600, 602–603, 609, 612
Mulligan, J., 668
Mulligan, L., 632
Mulloni, A., 152
Munaro, G., 586, 589, 591, 606–607, 613
Munshi, A., 459
Munter, M.W., 506
Munzenrieder, N., 621
Mura, G.D., 636
Murakami, M., 588–589, 593, 608
Murphy, D., 153
Murray, D.W., 185, 192, 442, 444, 459, 463,
467–468, 665
Murray-Smith, R., 672
Murrey, D.A. Jr., 493
Musicant, A.D., 322
Mussack, T., 499
Mutschler, W., 538–539
Mutter, D., 536–537
Mylonas, G.P., 508
Mynatt, E.D., 288
Myoungho, S., 451
N
Nafis, C., 533
Nagahara, H., 71
Nagata, K., 234
Naimark, L., 198, 222
Najafi, H., 535
Nakada, K., 487
Nakaizumi, F., 234
Nakajima, Y., 497, 535
Nakamoto, M., 487, 497, 499, 535
Nakamura, A., 405–406
Nakamura, N., 552, 554
Nakatani, Y., 135
Nakazawa, A., 79
Naliuka, K., 314
Nannipieri, O., 232
Narayanaswami, C., 594, 597, 599, 602603
Narita, Y., 359
Narumi, T., 78
Nassani, A., 661–677
Nathwani, D., 594–596
Naudet-Collette, S., 436, 438, 445, 450
Navab, N., 488, 493, 499, 501, 507–508, 541–542
Nedel, L.P., 533–535
Neely, C., 629–630
Neff, R.L., 493
Neider, J., 337, 346
Nelson, L., 324
Neumann, U., 152–169, 198, 461–462
Newcombe, R., 267, 474
Newman, P., 446
Ng, C., 41
Ng, I., 497
Nguyen, D., 71
Niazi, A.U., 492
Nicol, R., 322
Nicolau, S.A., 496, 536–537
Niedzviecki, H., 4
Nielsen, S.H., 292
Niemann, H., 487
Nii, H., 400, 405
Nijmeh, A.D., 532
Nikou, C., 504, 535, 539
Nilsen, T., 314
Nilsson, M., 664
Nishizaka, S., 78
Nistér, D., 179, 186, 192
Nister, F., 127
Nitschke, C., 79
Nitzsche, S., 502
Noda, I., 400
Noessel, C., 324
Noisternig, M., 322
Nojima, T., 231–234, 236
Noltes, J., 411
Nordahl, R., 554
Nour, S.G., 503, 531
Noury, N., 587, 592, 606–607, 609
Novak, E.J., 509
Numao, T., 135
O
Ocana, M., 447
Ochiai, Y., 233234
Oda, H., 376
Oda, O., 288
O'Donovan, A.E., 321
Oezbek, C., 263
Ogasawara, T., 234
Ogawa, T., 78, 469
Ogertschnig, M., 411
Ogris, G., 587, 590, 592, 607
Ogundipe, O., 374, 377
Oh, B.H., 652, 655
Oh, S., 314
Ohbuchi, R., 503
Ohlenburg, J., 298
Ojika, T., 76
Okada, K., 131
Okamoto, M., 68
Okamura, A.M., 232, 234, 241
Okuma, T., 373
Okumura, B., 468–469
Okur, A., 493
Okutomi, M., 135
Olbertz-Siitonen, M., 672
Oliveira-Santos, T., 506
Oloufa, A.A., 376
Olson, E., 390
O'Malley, D.M., 493
O'Malley, M.K., 234
Omura, K., 74
Onceanu, D., 542
Ontiveros, A., 358
Opdahl, A.L., 417
Oppenheimer, P., 487
Orlosky, J., 71, 78–79
Ortega, M., 555
Ortiz, R., 212–213
Ortolina, A., 509
Osborne, M., 42
Oskiper, T., 192
Ossevoort, S., 594–597, 600, 607–611
Otaduy, M.A., 552–577
Ott, R., 231, 555
Ou, J., 665, 673
Oulasvirta, A., 28
Ozuysal, M., 389
P
Pabst, S., 564, 571
Pacchierotti, C., 552–555
Pacelli, M., 608, 636
Padoy, N., 508
Pagani, A., 463, 478
Pai, D.K., 232, 556, 567
Pajdla, T., 127, 129–130, 137, 140, 145, 215
Palmieri, F., 289, 323
Paloc, C., 536
Pandya, A., 497, 531
Pang, G., 153
Pang, J., 567
Papadopoulo, T., 566, 573
Papadopoulos, D., 31–53
Papagelopoulos, P.J., 491
Papanastasiou, J., 491
Pape, D., 231
Papetti, S., 554
Paradiso, J.A., 233–234, 320–321, 588–589, 594,
606–608, 611
Paradiso, R., 587, 589, 603, 608, 636
Parameswaran, V., 153
Páramo, M., 415
Parati, G., 587, 592, 606, 610
Park, G., 234
Park, H.S., 359
Park, H.-S., 435
Park, I.-K., 593
Park, J.I., 345
Park, J.-W., 435
Park, Y., 468
Park, Y.J., 653
Parkes, R., 234
Parnes, P., 664
Parrini, G., 552
Parseihian, G., 312
Parviainen, R., 664
Pasta, M., 647
Pastarmov, Y., 463, 478
Patel, A., 370372
Patel, N., 27
Patel, R.V., 501
Patel, S., 15
Paterson, N., 314
Pattynama, P.M.T., 509
Paul, P., 527, 532533
Pavlik, J., 263
Pece, F., 667
Pedersen, E.R., 324
Peitgen, H.-O., 507, 541
Peli, E., 97
Peltola, M., 314
Pena-Mora, F., 333334
Peng, C., 650
Peng, H., 653
Peng, M., 652–653
Pennec, X., 496, 536–537
Pentenrieder, K., 435
Pentland, A., 586, 589, 592, 600, 607–612
Peres, R., 411
Perey, C., 322
Perez, A.G., 552–577
Perez, C.R., 646–650, 656
Pérez, L., 415
Perlin, K., 465
Pernici, B., 417
Perrin, N.A., 203
Perrott, D.R., 322
Peshkin, M.A., 553
Peterhans, M., 502, 506
Peterlik, I., 528, 537
Peters, C.A., 488, 499
Peters, N., 322
Peters, T.M., 488, 501, 505, 535
Petit, A., 139
Peuchot, B., 539
Pfister, H., 577
Philbin, J., 192, 215
Philip, M., 499
Picinbono, G., 571
Piekarski, W., 334, 587, 589, 592, 600, 602–603,
609, 612
Pihlajamäki, T., 300
Pintaric, T., 440
Pinz, A., 198
Pisano, E., 503
Plant, S., 45
Platonov, J., 152
Platt, J.C., 321
Plaweski, S., 491, 509
Pletinckx, D., 414
Plowman, E.E., 639
Poelman, R., 666
Poissant, L., 45–46, 52
Polat, G., 383
Pold, S., 46
Pollard, N.S., 556
Ponamgi, M., 380
Ponce, J., 215, 441
Popovic, J., 593
Porazzi, E., 509
Pornwannachai, W., 646, 655–656
Pörschmann, C., 322
Porter, S.R., 434
Porter, T.R., 371
Portet, C., 641
Pouliquen, M., 556
Poupyrev, I., 65, 230, 487
Povoski, S.P., 493
Powell, D., 234
Powell, K.D., 100
Prados, B., 415
Prandi, F., 359
Pratt, P.J., 508
Prattichizzo, D., 250, 552–555
Preciado, D., 503
Preim, B., 536
Presser, V., 642, 647–650
Pressigout, M., 152
Priego, P., 293
Prisco, G.M., 552
Profita, H., 18
Provancher, W.R., 553–555
Provot, X., 564, 570
Psomadelis, F., 587, 589, 592, 600,
603, 608–612
Puder, H., 302
Puebla, M.C., 487
Pugin, F., 502, 506
Pulkki, V., 281, 300
Pulli, K., 209
Puterbaugh, K.D., 562
Pylvänäinen, T., 153
Q
Qi, J., 288
Qiu, Z., 231
Qu, L., 653
Quackenbush, S., 314
Quan, L., 126
Quinn, B., 639
Quinn, S., 51
Quintana, J.C., 487
R
Rabaud, V., 208, 212–213
Rabenstein, R., 300
Rabinowitz, W.M., 292
Raczkowsky, J., 532
Rademacher, P., 537538
Raducanu, B., 312
Raghu, S., 22
Raghunath, M., 594, 597, 599, 602–603
Rai, L., 498
Raja, V., 234
Rajchl, M., 501
Rambaud, C., 589, 594, 606
Rampersaud, Y.R., 491
Randall, G., 180, 363
Randhawa, R., 492
Rao, R., 250
Rash, C.E., 60, 86
Raskar, R., 537538
Rass, U., 302
Rassweiler, J., 487, 538
Rassweiler, J.J., 497, 499
Rassweiler, M.-C., 497
Rastogi, A., 400
Rathinavel, K., 73
Ratib, O., 506
Rattner, D., 496
Raulot, V., 97
Redon, S., 555
Reed, C., 322
Regatschnig, R., 522
Regenbrecht, H., 436, 473–474
Reichert, W.M., 654
Reichherzer, C., 661–677
Reichl, H., 622, 633
Reif, R., 436
Reiley, C.E., 502
Reilly, B.K., 503
Reiners, D., 436
Reisner, A., 597, 600, 606, 609
Reiss, A.A., 583–615
Reitmayr, G., 152, 192, 198, 221, 438
Rekimoto, J., 233–234, 324, 594–596, 603
Rempel, D., 564
Ren, J., 653
Restelli, U., 509
Reyes, M., 506
Rhee, S., 597, 600, 606, 609
Rhodes, B.J., 661–662
Ribo, M., 198
Richmond, J.L., 232
Richter, J., 407
Rideng, O., 536
Rieder, C., 507, 541
Riedmaier, T., 321
Riener, R., 234
Rimet, Y., 589, 594, 606
Risatti, M., 588–589, 593, 603, 607, 611, 613
Riser, A., 71
Ritter, F., 507, 541
Rivers, A.R., 556
Riviere, C.N., 532
Rizzo, F., 587, 592, 606, 610
Rizzolatti, G., 50
Roberson, D.J., 8
Robert, L., 461462
Roberts, G., 333, 374, 377
Robinett, W., 65
Robinson, J., 668
Roblick, U.J., 492
Rocchesso, D., 281
Rodde, T., 293, 317
Rodriguez, F., 495
Rodriguez Palma, S., 532
Rogers, C.D.F., 372373
Rogers, D.M., 653
Roggen, D., 586–587, 589–596, 600, 602–603,
607, 609–611, 613
Roginska, A., 322
Roh, T., 596, 598, 602, 607
Röhl, S., 508, 530
Rohling, R., 497
Rojah, C., 358
Rolland, J.P., 65, 73, 77, 80, 119, 487
Romano, J.M., 228, 232, 234, 250
Romanzin, C., 464
Rome, J.J., 500
Ronayette, D., 589, 594, 606
Rose, E., 461462
Rosenberg, L.B., 233–234, 495
Rosenthal, M., 503
Rosin, P.L., 210211
Rosner, M., 81
Rössle, S.C., 51
Rossler, K., 522
Rosso, R., 586, 589, 591, 606–607, 613
Rosten, E., 207
Rothbucher, M., 321
Rothganger, F., 441
Roto, V., 28
Roumeliotis, S., 220221
Rousseau, J., 655
Rouzati, H., 322
Rovers, L., 233
Rowe, A., 594, 597, 599, 602–603, 609
Rowe, P.J., 495
Rozier, J., 288
Rubino, G., 531532
Rublee, E., 208, 212–213
Rueda, O., 536
Ruff, T.M., 376
Ruffaldi, E., 556
Rumsey, F., 281282
Rund, F., 321
Ryoo, D.W., 586, 590, 606–607
Ryu, J., 232, 234, 236, 554
Ryu, S.-W., 345
S
Sa, J., 234
Sabatini, A.M., 593
Sadalgi, S., 288
Saeedi, E., 19
Sager, I., 13
Saito, A., 533
Saito, H., 461462
Saito, Y., 76
Saji, A., 300
Sakas, G., 536
Sakata, N., 665, 668
Sakita, I., 497, 535
Sakuma, I., 505
Salari, M., 653
Salas, J., 312
Salb, T., 534
Salisbury, J.K., 556
Salisbury, K., 250, 556
Salsedo, F., 552
Salvatore, G.A., 613–614
Salvetti, O., 586, 589, 591, 606–607, 613
Samanta, V., 264
Samarasekera, S., 192
Samset, E., 536
San José Estépar, R., 487
Sanderson, P., 23
Sandin, D., 231
Sandler, M., 314
Sandor, C., 231
Sanford, J., 594, 597, 599, 602–603
Sansome, A., 436
Santato, C., 655
Santos, J.L., 487
Sanuki, W., 312
Sarakoglou, I., 554
Sarmiento, M., 500
Sarpeshkar, R., 324
Sartini, G., 552
Sasama, T., 497, 535
Sato, K., 314
Sato, M., 234–235, 552, 554
Sato, S., 503
Sato, Y., 487, 497, 535
Satoh, K., 198
Sattler, T., 192
Saturka, F., 321
Sauer, F., 487, 503, 531532, 538539
Saunders, T., 40
Savioja, L., 314
Savvidou, O.D., 491
Sawhney, N., 18
Sawka, A., 492
Saxenian, A.L., 40
Sayd, P., 442
Scaioni, M., 359
Scarborough, D.M., 588–589, 594, 607
Schacher, J.C., 322
Schaffalitzky, A., 127
Schaik, A.V., 288
Schall, G., 333334, 376
Scharl, A., 504
Scharver, C., 231
Scheggi, S., 554
Schenk, A., 536
Scheuering, M., 536
Schiemann, M., 505
Schiller, I., 345
Schilling, A., 461, 473
Schiphorst, T., 46
Schirmbeck, E.U., 535
Schleicher, D., 447
Schlig, E., 594, 597, 599, 602–603
Schluns, K., 126
Schmalstieg, D., 80, 152, 192, 333–334, 376, 478,
507, 536
Schmandt, C., 18
Schmid, C., 215, 441
Schmidt, A., 613–614
Schneberger, M., 538–539
Schneider, A., 536
Schneider, J., 662
Schneider, S.O., 490
Schnelzer, A., 493
Schnepper, J., 594, 597, 599, 606
Schnupp, J.W.H., 324
Schoonover, K., 629630
Schorr, O., 506
Schowengerdt, B.T., 74
Schranner, R., 62
Schroeder, D., 564
Schroeder, P., 333
Schultz, T., 588, 593, 607
Schutte, K., 466
Schwartz, S.J., 586, 589, 592, 600, 607–613
Schwirtz, A., 587, 589, 592, 600, 603, 608–612
Scilingo, E.P., 553, 555
Secco, E.L., 588–589, 593, 603, 607, 611, 613
Seeberger, R., 508, 534
Ségalini, J., 641
Segall, Z., 662, 664
Seibel, E.J., 74
Seifert, U., 497, 534
Seitel, A., 497
Seitz, S.M., 127, 192
Sekiguchi, D., 231–234, 236
Seligmann, D., 63
Seo, B., 293
Seo, C., 554
Seo, J., 233
Serafin, S., 554
Serina, E.R., 564
Serio, A., 555
Serra, L., 497
Servirest, M., 413
Setlock, L.D., 665, 673
Seyler, T.M., 495
Seymour, S., 639
Shaff, S., 281
Shah, T.H., 646, 655–656
Shaltis, P., 597, 600, 606, 609
Shamir, R., 497
Shedroff, N., 324
Sheikh, Y., 438
Shekhar, R., 499
Shelton, D.M., 504
Shepard, R.N., 10
Shi, J., 132, 390
Shi, X., 288
Shibasaki, T., 498, 531, 533
Shibata, F., 152
Shilling, R.D., 278
Shimamura, K., 486
Shimizu, E., 68
Shin, D.H., 333–334
Shin, M.K., 653
Shinjoh, A., 400
Shinn-Cunningham, B., 278, 324
Shiotani, S., 535
Shirley, P., 458
Shiroma, N., 400
Shiwa, S., 74
Shoham, M., 497
Shoshan, Y., 497
Shotton, H., 474
Shreiner, D., 337, 346, 459
Shuhaiber, J.H., 527, 533, 535
Shukla, G., 504
Siadat, M.-R., 497, 531
Siau, K., 411, 417
Siegel, J., 662–664, 666
Siegwart, R., 212–213
Sielhorst, T., 507–508, 541
Siewiorek, D.P., 22, 80, 594, 597, 599,
602–603, 609
Siitonen, M., 672
Siltanen, S.,
Silverstein, M.D., 509
Sim, C.Y.D., 639
Simard, P., 208
Simms, A., 42
Simon, C., 619638
Simon, D.A., 491
Simon, P., 639, 641643, 645
Simoneau, J., 22
Simpfendörfer, T., 487, 499, 538
Sin, F., 564
Sinclair, D., 528
Sindram, D., 492
Sing, N., 460462
Sirhan, D., 528
Sisodia, A., 71
Sivic, J., 186, 192, 215
Sjökvist, S., 506
Skolnik, D.A., 359
Skorobogatiy, M., 653, 655
Skrypnyk, I., 192
Skulimowski, P., 312
Slade, J., 630
Sloten, J.V., 536
Smailagic, A., 22, 594, 597, 599,
602–603, 609
Smalley, D., 282
Smedby, Ö., 506
Smith, B.P., 495
Smith, E., 268
Smith, K.C., 324
Smith, R., 413
Smith, R.T., 434
Smolander, K., 417
Snavely, K., 192
Snavely, N., 127, 192
Soh, B.S., 593–594, 597, 599
SoIanki, M., 234
Soin, N., 646, 655656
Sokoler, T., 324
Solazzi, M., 553–554
Soler, L., 496, 536–537
Son, H., 376
Song, C., 404
Song, M.K., 653
Sonmez, M., 500
Sonntag, D., 78–79
Sood, A., 234
Sorger, J., 503
Southern, C., 22
Sovich, J., 653
Spagnol, S., 321
Speidel, S., 508, 529, 533–534, 538
Spence, C., 234
Spengler, P., 508, 530, 534
Spieth, C., 314
Spink, R., 531532
Spinks, G.M., 653
Splechtna, R.C., 535–536
Spors, S., 300
Sprung, L., 487
Spurgin, J.T., 371
Squire, K., 263
Sreng, J., 233
Srinivasan, M.A., 552, 554
Stäger, M., 594–597, 608
Stamos, I., 165
Stanley, M.C., 562, 564, 568
Stanney, K.M., 278
Stapleton, C., 268
Starkey, K., 43
Starner, T.E., 13–28, 320–321, 584, 661–662
State, A., 69, 73, 76, 503, 537538
Steed, A., 667
Steinemann, D., 552–577
Steingart, D.A., 655
Stengel, M., 555
Stenger, D., 646, 650, 656
Stenros, J., 260
Steptoe, W., 667
Sterling, R., 370
Stern, A., 266
Stetten, G.D., 504
Stevens, B., 230
Stewart, A.J., 542
Stewart, C.A., 553
Stewart, R., 314
Stewenius, H., 186, 192
Stewenius, H.D., 127
Stiefmeier, T., 587, 590, 592, 607
Stivoric, J., 552, 608
Stock, C., 198
Stolka, P.J., 492
Stoll, J., 496
Stone, R., 567
Stopp, F., 493
Strig, C., 322
Stoyanov, D., 508
Straßer, W., 460, 463, 466, 468, 470, 473–478, 542
Strasser, W., 564, 571
Stratton, G.M., 80
Strecha, C., 212–213
Streicher, R., 296, 323
Streitz, N., 324
Strengert, M., 471–472
Stricker, D., 152, 436, 441, 443, 445, 463, 478
Stroila, M., 153
Strong, A.J., 531–532
Strumillo, P., 312
Sturm, P., 127
Su, J., 655
Su, L.-M., 502
Subramanian, V., 621, 638
Sudia, F.W., 49
Sudra, G., 529, 533–534, 538
Sueda, S., 556, 567
Suenaga, H., 505
Suetens, P., 500
Sugano, N., 462
Sugimoto, M., 399–409, 506
Sugimura, T., 234
Sukan, M., 288
Sulpizio, H.M., 654
Sumikawa, D.A., 293
Sumiya, E., 79
Sun, J., 215
Sun, P., 650, 652
Sung, M., 314
Suruda, J., 662
Suthau, T., 536
Sutherland, I., 60
Sutton, E., 499
Suwelack, S., 508, 530
Suya, Y., 198
Suzuki, H., 281
Suzuki, M., 506
Suzuki, N., 537
Suzuki, Y., 281, 322
Swan, J.E., 79
Swan, R.Z., 492
Swank, M.L., 509
Sweeney, J.A., 45–46
Szabó, Z., 506
Szekely, G., 231
Szeliski, R., 127, 192
T
Ta, D.N., 209
Tabata, Y., 588–589, 593, 608
Taberna, P.L., 641–642
Taccini, N., 602, 603, 606
Tachi, S., 231, 233–234, 313, 552, 554
Tachibana, K., 462
Tadokoro, S., 400
Tagle, P., 487
Takada, D., 78
Takagi, A., 76
Takahashi, H., 87
Takahashi, T., 400
Takakura, K., 522
Takamatsu, S., 630
Takao, H., 311
Takasaki, M., 233–234
Takemoto, K., 198
Takemura, H., 4, 71, 78–79, 373, 486
Taketomi, T., 126–147
Talha, M.M., 96
Talib, H., 533
Talmaki, S.A., 335
Tamaazousti, M., 433–455
Tamaki, E., 594–596, 603
Tamaki, T., 234
Tamaki, Y., 497, 535
Tamburo, R.J., 504
Tamminen, S., 28
Tamstorf, R., 552577
Tamura, H., 198
Tamura, S., 487, 497, 535
Tan, H.Z., 250, 554
Tanaka, M., 665
Tanaka, T., 359
Tang, H., 197, 487
Tang, R., 492
Tang, W., 655
Tanguy, A., 539
Taniguchi, N., 76
Tanijiri, Y., 68
Tanikawa, T., 78
Tanno, K., 300
Tanoue, K., 499, 535
Tao, J., 655
Tashev, I.J., 321
Tatzgern, M., 80
Taylor, R., 233234
Taylor, R.H., 233, 502–504
Tchouda, S.D., 509
Teber, D., 487, 497, 499, 538
Tecchia, F., 666
Teizer, J., 376
Tener, R.K., 383
Teran, J., 564
Terlaud, C., 589, 594, 606
Ternes, D., 233
Terriberry, T.B., 199, 209, 212
Terven, J.R., 312
Tezuka, T., 506
Thalmann, D., 231, 555, 561
Thiele, H., 502
Thomas, B.H., 17, 267, 407, 434, 436, 476,
587, 589, 592, 600, 602–603,
609, 612
Thomas, G.W., 234
Thomas, J.P., 62
Thomas, M., 70
Thomas, M.R.P., 321
Thomas, P., 52
Thomaszewski, B., 564, 571
Thompson, C., 80
Thompson, D.M., 315, 321
Thongrong, D.P.S., 231
Thorp, E.O., 584
Thorpe, S., 312
Thukral, S., 234
Thumfart, W.F., 498
Thurlow, W.R., 292
Tian, Y., 345
Tikander, M., 288
Timm, B.W., 355
Tobias, H., 63
Tobias, J., 51
Tognetti, A., 588–589, 593, 603, 607, 611,
613, 636
Tohyama, M., 281
Tokunaga, E., 535
Tomasi, C., 132, 391, 476
Tomei, M., 509
Tomikawa, M., 535
Tomita, M., 405
Tonet, O., 537
Tonetti, J., 491
Toney, A., 587, 589, 592, 600, 602–603,
609, 612
Torrealba, G., 487
Toso, C., 502
Tourapis, A., 464
Toyama, T., 7879
Traub, J., 493, 501, 508, 535, 541
Trawny, N., 220221
Traylor, R., 554
Trdoff, B., 665
Treagust, D.F., 383
Trendafilov, D., 672
Trevisan, D.G., 533–535
Triggs, B., 127
Troccaz, J., 491
Troester, G., 27
Trost, G., 621
Tröster, G., 586–587, 589–597,
599–600, 602–603, 606–611,
613–615, 621
Troy, J.J., 562
Truillet, P., 312
Trulove, M.A., 654
Trulove, P.C., 654
Tsagarakis, N.G., 554
Tsai, C.-Y., 411
Tsai, R.Y., 129
Tsai, Y.T., 497
Tseng, C.W., 639
Tsingos, N., 280
Tsotros, M., 413
Tubbesing, S.K., 358
Tuceryan, M., 461–462
Turchet, L., 554
Turk, G., 477
Tuytelaars, T., 207
U
Uchiyama, H., 139, 152
Uchiyama, S., 198, 231
Ueda, H., 68
Ullmer, B., 324
Umansky, F., 497
Umbarje, K., 492
Ungersbock, K., 522
Uppsäll, M., 506
Urban, M., 215
Urey, H., 100
Utsumi, A., 4, 486
V
Vacirca, N.A., 639
Vaghadia, H., 492
Vagvolgyi, B.P., 502
Vaissie, L., 77
Valdivieso, A., 536
Vale, J.A., 508
Valgoi, P., 359
Vallino, J.R., 231
van Dam, A., 324
van den Doel, K., 232
van der Gast, L., 411
Van der Heijden, H., 411, 416–417
van der Spoel, S., 411
van Essen, H., 233
van Kleef, N., 411
van Os, K., 621
Van Pieterson, I., 621
Vanderdonckt, J., 533–535
Vandergheynst, P., 212–213
Varga, E., 509
Varshavsky, A., 15
Vasile, C., 491
Vaughn, J., 270
Vavouras, T., 587, 589
Vaysse, A., 587, 592, 606607, 609
Vazquez-Alvarez, Y., 672
Vega, K., 4849
Velger, M., 86
Venkatesh, V., 421
Ventura, J., 174–192
Verkasalo, H., 411
Vescan, A.D., 496, 507
Vetter, M., 536
Vexo, F., 231, 555
Vidal, F., 358
Vieira, F.V., 438
Vilkamo, J., 300
Villegas, J., 290, 292, 294, 309–324
Vincenty, T., 340, 346
Vinge, V.
Vlahakis, V., 265, 413
Vogl, T.J., 505
Vogt, S., 487, 503, 531–532, 538–539
Volonté, F., 502, 506
Volz, R.A., 232235
Von Busch, O., 45
von Gioi, R.G., 180, 363
Vosburgh, K.G., 487–488, 496
Voss, D., 31–53
Vouaillat, H., 491
Vu, Q.A., 652
Vullers, R.J.M., 320
W
Wacker, F.K., 503, 531
Waern, A., 260
Wagmister, F., 45
Wagner, A., 502, 532
Wagner, D., 152, 192
Wagner, S., 630
Wahbeh, A.M., 359
Wahl, F., 595, 598
Wallace, G.G., 653
Wallace, J.W., 359
Wallraven, C., 478
Walz, S., 271
Wan, K.M., 629, 646, 654
Wan, S.H., 629, 646, 654
Wander, S., 3738
Wang, C., 345
Wang, C.-D., 293
Wang, D., 504
Wang, G., 541
Wang, H., 385
Wang, J., 505
Wang, K., 653
Wang, L., 493, 501, 594–595, 606
Wang, M.L., 539
Wang, Q., 170, 553
Wang, R.Y., 593
Wang, S., 477
Wang, S.-C., 411
Wang, T., 465
Wang, X., 385
Wang, X.L., 492
Wang, Y., 86, 96
Wang, Y.-C., 199
Wang, Z., 505
Wang, Z.L., 653, 655
Wanner, F., 667
Wanschitz, F., 502, 532
Want, R., 15, 288
Ward, J.A., 27, 594, 597, 599, 606–608, 613
Wardrip-Fruin, N., 260
Warren, N., 311
Warshaw, P.R., 417
Watanabe, K., 322
Watkinson, J., 283
Watts, H., 48
Watzinger, F., 502, 532
Weaver, K., 22
Weber, J.-L., 587, 589, 592, 594, 606–607, 609
Weber, S., 497, 502, 506
Webster, A., 63, 333, 662
Wedlake, C., 501, 505, 535
Wegenkittl, R., 535536
Weghorst, S., 487
Wei, Z., 653
Weinheimer, S., 666
Weiser, D., 504
Weiser, M., 324, 584
Wekerle, A.-L., 508, 530
Wells, W., 530
Wells, W. III, 496
Wells, W.M., 496
Wendler, T., 493
Wenzel, E.M., 282283
Wesarg, S., 505
Westerfeld, S.
Wetzel, P., 70
Wetzel, R., 298
Weyrich, T., 667
Whitaker, R., 461462
White, S., 153, 312
White, S.J., 496
Whitehead, K.K., 500
Whyte, R., 588–589, 593, 603, 607, 611, 613
Wicker, B., 50
Wieferich, J., 507, 541
Wiener, N., 50
Wientapper, F., 415, 441
Wierstorf, H., 322
Wiertlewski, M., 577
Wiles, A.D., 501, 535
Wilke, W., 436
Wilkes-Gibbs, D., 666
Williams, T., 70
Wilsdon, J., 40
Wilson, E., 499
Wilson, J., 42, 117
Wilson, M., 495
Wilson, P., 630
Wimmer-Greinecker, G., 505
Win, M.Z., 219
Winer, E., 487
Winfree, K.N., 554
Winne, C., 493
Wirtz, C.R., 532
Witchey, H., 413
Withagen, P., 466
Wither, J., 264
Witkin, A., 558
Witterick, I.J., 496, 507
Wloka, M.M., 344, 460, 473
Wolfsberger, S., 522
Wong, K.S., 629, 646, 654
Wong, S.W., 492
Woo, M., 337, 346
Woo, S.-W., 652, 655
Woo, W., 232, 234, 236, 407, 468
Wood, R.E., 185
Woods, E., 73
Woods, R., 202
Woodward, C.,
Wörn, H., 506, 531–532
Wren, J., 506
Wright, D.L., 65, 80, 487
Wright, P., 86
Wu, E., 477
Wu, H., 652–653
Wu, J.-H., 411
Wu, J.R., 539
Wu, K., 499
Wu, W.Z., 653
Wu, Y., 126
Wu, Z., 215
Wuest, H., 152, 415, 441, 443, 445, 463, 478
Wüst, H., 414
Wyland, J., 371
X
Xie, X., 22
Xu, N., 363
Xu, W., 668
Xu, Y., 588–589, 594
Xue, Q.J., 650
Xueting, L., 469
Y
Yachida, M., 71
Yagi, Y., 71
Yalcin, H., 346
Yamaguchi, S., 499
Yamamoto, G., 131
Yamamoto, H., 198, 231
Yamasaki, K., 68
Yamashita, T., 630
Yamazaki, H., 312
Yamazaki, M., 506
Yamazaki, S., 76
Yamini, S., 653
Yan, X.B., 650
Yanagibashi, Y., 503
Yang, C., 234
Yang, G.-H., 554
Z
Zach, C., 186, 192
Zahorik, P., 292
Zakarauskas, P., 292
Zamarayeva, A.M., 655
Zang, X., 650, 652
Zeagler, C., 18
Zehavi, E., 497
Zhang, C., 655
Zhang, J., 234
Zhang, X., 655
Zhang, Y., 234, 653
Zhang, Z., 129–130, 138, 652–653
Zhao, Y., 652–653
Zheng, G., 500, 538–539
Zhou, J., 436
Zhou, T., 655
Zhu, C., 497
Zhu, Y., 564
Ziegeler, S., 490
Ziegelwanger, H., 322
Zilles, C.B., 556
Zimmermann, R., 293
Zinreich, S.J., 504
Zinser, M.J., 497, 534
Zisserman, A., 127, 132, 186, 192, 215, 442
Zöller, J.E., 497, 534
Zöllner, M., 415, 463, 478
Zoran, A., 233234
Zordan, V.B., 556
Zotkin, D.N., 321
Zucco, J., 17
Zucco, J.E., 434
Zuerl, K., 538–539
Zugaza, I., 478
Zysset, C., 621
Subject Index
A
AAR system, see Audio augmented
reality (AAR) system
Accessory-based wearable computers
multifunctional wearable computer, 594
projects analysis, 595–597
projects and placements, 595, 598
arm region, 598–599
hand and finger region, 599–600
head and neck region, 595, 598
legs and feet region, 600
torso region, 595, 597, 599
wrist region, 599
Acoustic tracking systems, 522
Activated carbons (ACs), 641, 647–650, 654–655
AEC applications, see Architecture, engineering,
and construction (AEC) applications
Age-related macular degeneration (AMD),
67, 107
Anisotropic magnetoresistive (AMR)
magnetometer, 219
Anisotropic nonlinear elastic behavior
constraint formulation, 573
constraint Jacobians, 573–574
finger skin deformations, 576
hyperbolic projection function, 572–573
linear interpolation, 574–575
orthogonal projection, 574–575
real-world finger, 570
strain limits, 571–572
Architecture, engineering, and
construction (AEC) applications
buried utilities
digging implement, 372–373
excavation contractors, 370
field locators and markers, 373
GIS, 374–376
KML, 381
MDS, 371–372
One-Call excavation damage prevention
process, 371
operator-perspective, 373–374
proximity monitoring (see Proximity
monitoring)
spatial awareness, 372
underground utility, 371, 381–382
collaborative learning
ARVita, 390–392
computer-based visualization, 383–385
implementation, 346–348
incorrect and correct occlusion, 344–345
RTT techniques, 345
semiautomated method, 344–345
stereo camera, 345
TOF camera, 345–346
two-stage rendering method, 346–347
validation experiments, 349–350
Vincenty algorithm, 345–346
x-ray vision, 333
AR Façade, 266–267
ARMOR platform, see Augmented Reality
Mobile OpeRation (ARMOR)
platform
ARVISCOPE, see Augmented
reality visualization of
simulated construction
operations (ARVISCOPE)
Atomic force microscopy (AFM), 51
Audio augmented reality (AAR) system, 278
anyware and awareware, 316
audio windowing, 315
layered soundscapes, 319–320
multipresence, 317, 319
narrowcasting (see Narrowcasting)
applications
assistive technologies, 312–313
entertainment and spatial music, 314–315
location-aware games, 314
navigation and location-awareness
systems, 311–312
sonification, 314
synesthetic telepresence, 313
challenges
authoring standards, 322–323
capture and synthesis, 321
performance, 321–322
Audio data/definition model (ADM), 322
Audio windowing, 310, 315, 323
Audium, 281282
Augmented Reality Markup
Language (ARML), 322
Augmented Reality Mobile
OpeRation (ARMOR) platform
backpack, 355, 357–358
improvements, 356
insecure placement, tracking, 355
vs. UM-AR-GPS-ROVER platforms, 356–357
Augmented reality visualization of
simulated construction
operations (ARVISCOPE)
animation scalability, 352
animation trace file, 351
automatic generation, 351
manual generation, 351
smooth and continuous operation, 350–351
time-ordered sequence, 351–352
Augmented Reality Vitascope (ARVita),
384–387, 389–392
Augmented x-ray guidance system, 490–491
Automatic reference counting (ARC), 419
Automotive industry
large-scale deployment, 438–440
potential benefits
design and conception, 434–435
driving assistance, 436–438
factory planning, 435
maintenance support, 438
sales support, 436
user manual, 438
vehicle production, 435–436
tracking solutions
large occlusion, 442
model-based tracking solutions, 441
6DoF mechanical measurement
arms, 440
2D/3D markers, 440–441
VSLAM (see Visual
simultaneous localization and
mapping (VSLAM))
vehicle localization
accuracy, 447
geo-referenced landmark database,
446–447
3D visual features, 447
VSLAM constraints (see Visual
simultaneous localization and
mapping (VSLAM))
Autonomous sensory meridian
response (ASMR), 295
B
Bag-of-words (BoW) matching,
215–216
Barco Auro-3D, 300
Best-Bin-First (BBF) algorithm, 160–161
Binary robust independent elementary
features (BRIEF), 212–213
Binaural hearing aids, 301–302
Bujnak's method
center of projection, 138
focal length, 138
lens distortion, 137–138
quantitative evaluation
ARToolKit, 139
estimated rotation matrix, 139
Euclidean distance, 139
free camera motion, 140–142
real environment, 145–147
straight camera motion, 143–145
true rotation matrix, 139
video sequences, 139
spline fitting, 138
C
CAD model, see Computer-aided design (CAD)
model
Camera parameter estimation
vs. Bujnak's method (see Bujnak's method)
fiducial marker, 127–129
2D–3D corresponding pairs, PnP problem,
126–127
zoomable camera (see Zoomable camera
model)
CamNet system, 662–663
Carbide-derived carbons (CDC), 641
Carbon fiber (CF) electrode, 650–652
Carbon nanotubes (CNTs), 647, 653
Chromatic aberrations, 67, 77, 467
CMT, see Cut-Make-Trim (CMT)
CNC machines, see Computer Numerical
Control (CNC) machines
Collaborative learning
ARVita, 390–392
computer-based visualization, 383–385
eagle window, 386
error detection, 385
FLTK_OSGViewer node, 386–388
4D CAD modeling, 383–384
KEG, 390
natural marker, 388–390
OpenGL camera, 386
paper-based shared workspace, 384–385
Raytheon STEM Index, 383
tracker and marker mechanism, 386, 388–389
VITASCOPE scene node, 385–388
Collaborative wearable computers
CamNet system, 662–663
communication asymmetries, 664
communication theory, 666–667
DOVE, 665
features, 669
HMD, 663–665
Social Panoramas (see Social Panoramas,
prototype system)
TAC system, 664
user's environment, 667–669
WACL, 665
WearCam and WearComp systems, 663
Communication theory, 662, 666–667, 669
Computed tomography angiography (CTA), 525
Computer-aided design (CAD) model, 350, 352,
383, 386, 443–445, 448
Computer graphics (CG) model, 402, 405, 408,
459–462, 464, 470, 552
Computer Numerical Control (CNC) machines,
622, 625–626, 628, 633
Cone-beam computed tomography (CBCT), 496,
499–500
Couching technique, 631, 637
D
Dahl friction model, 241–242
Data acquisition system (DAS), 378
Data, visualization processing, and view (DVV)
taxonomy
block diagram, 523–524
craniofacial surgery, 533–534
dental surgery, 533–534
derived data, 525, 541
endoscopic surgery, 537–538
factors, 523–524
laparoscopic surgery, 537–538
maxillofacial surgery, 533–534
minimally invasive cardiovascular surgery,
535–536
neurosurgery, 531–533
orthopedic surgery, 538–539
patient-specific data, 523
prior knowledge data, 525
raw imaging data, 525
semantics, 523
view factor, 526–527, 541–542
visualization processing, 525–526, 541
visually processed data, 523
da Vinci robotic surgical system, 502–503
declipseSPECT system, 492–494
Defocus blur, 468–469
Dental surgery, 527, 529, 533–534
Depth buffering, 345–347
Depth of field, 468–469
Depth perception, 62, 76, 79, 501–502, 506–507
Differential global positioning system (DGPS),
182, 189–190, 356, 403, 452
Digital micro-mirror device (DMD), 73
Directional audio coding (DirAC), 300
DirectX's DirectSound, 284
Distributed spatial sound, 302–303
Dolby Atmos, 300
Drawing over video environment (DOVE), 665
Dual-in-line (DIP) packages, 632
Duplex theory, 284
DVV taxonomy, see Data, visualization
processing, and view (DVV)
taxonomy
E
Electric double layer capacitors (EDLCs),
642–644
Electrochemical capacitors (ECs), 642, 653–654
Electromagnetic tracking systems, 489, 522
Electronic textiles (e-textiles)
CMT, 620
durability and reliability, 620
integration strategies
conductor integration, 621
manufacturing methods, 621
PCB, 621–622
surface attachment, 621
routing in garments
marker layout, 627–628
order of operations, 629
pattern piece, 627–628
production design, 628–629
seam crossing methods, 629–631
trace crossings, 631–632
SMD components, 633–634
stitching technologies
CNC machines, 625–626
multineedle machines, 624–625
single-thread chain stitch, 623–624
textile components
durability, 636–637
sensor insulation, 636–637
stretch and bend sensors, 634–636
through-hole components, 632–633
Electronic travel aids (ETAs), 312
Endoscopic surgery, 537–538
Endoscopic video guidance system, 493–495
Energy density, 644–645, 652
ER4 MicroPro earphones, 288
European Broadcast Union (EBU), 322
Extimacy, see Human–computer
interaction (HCI)
Eye box
eye pupil diameter, 100–102
vs. eye relief, 100–101
FOV, 102
holographic combiner and extraction, 95, 100
optical combiner thickness, 100
Eye tracking, 74, 78–79
F
Feature-based tracking method
algorithm adaptation, 209–212
local feature extraction
BoW matching, 215–216
BRIEF, 212–213
database object, 214–215
efficiency, robustness, and distinctiveness,
206–207
interest point detection (see Interest point
detection)
LDB, 213–214
LSH, 215
object tracking, 216
SURF, 206, 212
RANSAC/PROSAC algorithms, 206
tracker initialization, 186–187, 189
Features from accelerated segmented test (FAST)
detector, 207–209
Field of view (FOV)
angular resolution, 70–72
depth perception, 62
eye box, 102
field of fixation, 62
light intensity, 62
optical requirements
angular resolution, 90–91
constraints, 88–89
dot per degree, 90
dot per inch, 90
occlusion and see-through displays,
88–89
pixel counts, 90–91
target functionality, 90, 92
visual acuity, 61–62
Filament fibers, 645–646
Film grain, 464
Finite element method (FEM), 556, 559, 564
Finite impulse response (FIR) filter, 287,
342–344, 353
Fitbit One, 26
Fitsense FS-1, 26–27
Force systems, 554
4G LTE, 301
FOV, see Field of view (FOV)
G
GABRIEL system architecture, 311–312
Garment-based wearable computers
definition, 585
physical activity monitoring, 589
project analysis, 585589
projects and placements, 590591
arms and hands region, 593
feet region, 593594
head and neck region, 590
legs region, 593
torso region, 590593
remote health monitoring, 589
user interfaces, 589
Garment devices
coated devices
cyclic voltammetry, 649–650
porous textile supercapacitor, 648
screen printing, 647
SEM, 648–649
SWCNT, 647
custom textile architectures, 655–657
cutting edge research, 639–640
definition, 639
energy storage devices
components, 642
ECs, 642
EDLCs, 642–644
energy density, 644–645
mass loading, 645
primary batteries, 644
pseudocapacitors, 643–644
secondary (rechargeable) batteries, 643–644
fibers and yarns, 645–646
electrode configurations, 653–654
graphene, 653
NFW, 654–655
H
Half-silvered mirror devices, 63–64, 67, 495, 504–505, 527, 539
HandsOnVideo, 666
Haptic augmented reality
breast tumor palpation system, 228
components
interaction, 234–236
models, 238–239
registration problems, 236–237
rendering frame, 237
friction modulation, 248–249
multi-finger interaction, 250
pen-shaped magic tool, 228
stiffer inclusion
configuration, 245–246
measurement-based approach, 246
rendering algorithm, 247–248
variables, 246–247
stiffness modulation
Geomagic, 240
PHANToM premium model 1.5, 240
single-contact interaction, 240–243
two-contact squeezing, 243–245
taxonomies
artificial recreation, 232
augmented perception, 233
composite visuo-haptic reality–virtuality continuum, 229–230
MicroTactus system, 233
within and between property, 233–234
visuo-haptic reality–virtuality continuum, 229–230
vMR-hMR, 231–232
vMR-hR, 229–230
vMR-hV, 230–231
vR-hMR, 231–232
vR-hR, 229–230
vR-hV, 230–231
vV-hMR, 231–232
vV-hR, 229–230
vV-hV, 230–231
Haptic rendering, 231, 238, 250, 556, 562–563, 576
Haptic ring, 554
HCI, see Human–computer interaction (HCI)
Head-mounted displays (HMDs), 86
applications, 62–63, 87–88
breast biopsy, 503
camera parameters (see Camera parameter estimation)
collaborative wearable systems, 663
connected glasses, 110
depth of field, 73–75
diffractive and holographic optics
angular and spectral bandwidths, 98–99
Bragg selectivity, 97–98
optical functionality, 97
eye box
eye pupil diameter, 100–102
vs. eye relief, 100–101
FOV, 102
holographic combiner and extraction, 95, 100
optical combiner thickness, 99–100
FOV, 70–72
hardware issues
catadioptric designs, 67
eye-relief, 65
free-form prism, 67
HMPD, 69
HOE, 68
non-pupil-forming architecture, 66–67
NVIDIA, 69
ocularity, 64–65
optical see-through display, 63–64
optical waveguide, 68
pupil forming architecture, 65–66
UNC, 69
video see-through display, 64
VRD, 68–69
history, 60–61
image distortions and aberrations, 77
input interfaces
body tapping sensing, 122
brain wave sensing, 122
eye gesture sensors, 119
hand gesture sensing, 121–122
head gesture sensor, 119
head-worn computer, 118
optical gaze tracking techniques, 119–121
trackpad, 119
voice control, 119
IPD, 99
latency, 75–76
microdisplay
HUD, 93, 103
illumination engines, 102–103
LCD transmission, 102–103
LCoS, 102–103
LED, 102–103
MEMS/fiber scanners, 102, 105–107
OLED panels, 103–104
multimodality, 78
neurosurgery, 532
occlusion, 72–73
optical architecture
contact lenses, 117–118
light field occlusion, 118–119
non-pupil-forming architecture, 93, 95–96
optical platforms, 93–97
pupil-forming architecture, 91, 93, 95–96
tools, 91
optical requirements
angular resolution, 90–91
constraints, 88–89
dot per degree, 90
dot per inch, 90
occlusion and see-through displays, 88–89
pixel counts, 90–91
target functionality, 90, 92
parallax, 76–77
perceptual issues
adaptation, 80–81
depth perception, 79–80
user acceptance, 80
pictorial consistency, 77
resolution, 70
sensing, 7879
smart eyewear
optical combiner and prescription lenses,
107, 109
potential consumers, 107–108
requirements, 107
Rx/plano lens, 107, 109–110
smart glasses (see Smart glasses)
vision, 61–62
Head-mounted projective displays (HMPD), 69, 71, 73
Head-related impulse responses (HRIRs), 283–284, 288, 312, 321–322
Head-related transfer functions (HRTFs), 283–286
Heads-up display (HUD), 88, 93, 97, 103
Head tracking system, 20, 284, 300–301, 315
Historical Tour Guide
ARC, 419
AR view, 418
CroMAR, 421
detailed information view, 418
events, 420
initialization, 421
Internet platform, 416
list view, 419
local history, 417
map, 419
photo overlay, 418
system architecture, 419–420
TAM, 416–417
technology acceptance research model, 417–418
timeline, 419
UIApplication object, 420
HMDs, see Head-mounted displays (HMDs)
Holographic optical element (HOE), 68, 97
HRIRs, see Head-related impulse responses (HRIRs)
Human–computer interaction (HCI)
ethics and speculation
actual and perceived value, 35
coloring, 37
garment anchor, 34–35
Google Glass, 33–34
iterative design, 36
prototypes, 36
quantified self applications, 33–34, 37–39
requirements and specifications, 35–36
speculative design, 36
system of interactions, 35
social politics
FuelBand, FitBit, Glass, 39–40
physical object, 43
quantified self, 40–42
regional innovation, 40
tracking and route control, factory floor, 42–43
synaptic sculpture
Blinklifier (see Humanistic intelligence)
bridging materiality and information, 46–48
digitization, 44–45
floating eye, 44
inversion, design process, 45–46
materials technologies, 45
spermatozoa fertilization, 43–44
star charts, 44
Humanistic intelligence
AFM, 51
anthropomorphize robots, 50
artificial intellects, 49
macro perspectives, 52
nanoperspectives, 52
natural gestures, 48–49
neurotransmitter serotonin, 50
Peptomics, 51
singularity, 49
textiles, 50–51
3D printing organs, 51
Human–robot interfaces
active tangible robots, 407–408
future predictive visual presentation, 404–405
projection-based configuration, 405–407
robots and users, information, 399–400
robot types, 399–400
video see-through-based AR
CG model, 402
egocentric view camera image, 401
four-wheel robot, 403–404
fundamental concepts, 401–402
system configuration, 401, 403
tele-operated rescue robots, 400–401
Time Follower's Vision, 401–402
virtual third-person view, 402–403
Hunt–Crossley model, 242, 246–248
Hybrid methods
applications, 198
design objectives, 199
good accuracy and high efficiency, 199–200
recognition and tracking, 221–222
Hypersonic sound system, 302
I
IID, see Interaural intensity difference (IID)
Image-based method, 185
Image-guided neurosurgery (IGNS), 531–533
Image-guided surgery (IGS)
acoustic tracking systems, 522
craniofacial surgery, 533–534
definition, 520
dental surgery, 527, 529, 533–534
drawback, 520
DVV taxonomy
derived data, 525, 541
factors, 523–524
patient-specific data, 523
prior knowledge data, 525
raw imaging data, 525
semantics, 523
view factor, 526–527, 541–542
visualization processing, 525–526, 541
visually processed data, 523
electromagnetic tracking systems, 522
endoscopic surgery, 537–538
laparoscopic surgery, 530, 537–538
maxillofacial surgery, 533–534
minimally invasive cardiovascular surgery, 527, 530, 535–536
neurosurgery, 527–528
analyzed data, 531
derived data, 531
prior-knowledge data, 531
superimposed anatomical data objects, 530
view factor, 531–533
visualization processing, 531
optical tracking systems, 522
orthopedic surgery, 527, 529, 538–539
registration, 521–522
tracked surgical probe, 521
validation and evaluation, 539–540
Implantable miniature telescope (IMT), 6–7
Incremental matching, 154, 161
Inertial measurement unit (IMU), 76, 216
Information furniture, 303
Intelligent tourism and cultural information through ubiquitous services (iTACITUS), 414–415
Intelligent traffic system (ITS), 311
Intel Scavenger Bot, 267
Interaural intensity difference (IID), 283–285, 310, 314
Interaural time (phase) difference (ITD), 283–285, 310, 314
Interest point detection
high-quality detector, 208–209
lightweight detector, 207–208
properties, 207
Interpupillary distances (IPDs), 61, 65, 77, 99
Intimacy, see Human–computer interaction (HCI)
ITD, see Interaural time (phase) difference (ITD)
K
Kanade–Lucas–Tomasi (KLT) feature tracker, 132, 389–390
Keyhole Markup Language (KML), 381
Knitted carbon fiber electrode, 650–652
L
Laboratory for Interactive Visualization in Engineering (LIVE), 335–337
Laparoscopic surgery, 495, 530, 537–538
Large expanse extra perspective (LEEP) system,
19, 61
LCD, see Liquid-crystal display (LCD)
LCoS, see Liquid crystal on silicon (LCoS)
LED, see Light-emitting diode (LED)
Levenberg–Marquardt method, 134, 137, 159
Leviathan, 269
Light Detection and Ranging (LiDAR) scanners
initial camera pose estimation, 165–166
keypoint features, 165
synthetic images, 165–166
3D colored point cloud model, 164–165
iterative estimation, 167–168
pose refinement, 166–167
video image, 168–169
Light-emitting diode (LED), 102–103, 105, 600, 603
Linear blend skinning (LBS), 560–561
Liquid-crystal display (LCD), 18–19, 47, 102–103, 107, 497, 534, 587, 603
Liquid crystal on silicon (LCoS), 73, 102–103, 107
Local difference binary (LDB), 212–214
Locality sensitive hashing (LSH), 215
Location-based entertainment (LBE), 289
Location-based MR and AR storytelling
reinforcing
Battle of Gettysburg, 261
Dow Day, 263
110 Stories, 262
Situated Documentaries, 263
Streetmuseum Londinium, 263
strengths and weaknesses, 265
Voices of Oakland, 263–264
The Westwood Experience, 263–264
Wizard of Oz approach, 263
remembering
Four Angry Men, 270
REXplorer, 271
Rider Spoke, 271
Three Angry Men, 270–271
Twelve Angry Men, 270
You Get Me, 271–272
reskinning
Alice's Adventures in New Media, 268–269
Aphasia House project, 268
Façade, 266–267
Half Real, 267
Intel Scavenger Bot, 267
Leviathan, 269
MR Sea Creatures, 268
Rainbow's End, 265–266
Wizarding World of Harry Potter, 268
Lockstitch machines, 623–624, 626, 631
Lou Gehrig's disease, 5
M
MAGI system, see Microscope-assisted guided
interventions (MAGI) system
MapMyFitness tool, 41
MAR, see Mobile augmented reality (MAR)
Maxillofacial surgery, 497, 533–534, 539
Medarpa system, 505
Medical binocular systems, 495, 501–503
Medical navigation guidance system
data fusion, 489
data visualization, 489
feedback method, 490
medical imaging, 488
registration, 489
segmentation and modeling, 488–489
tracking approach, 489
Mesopic vision, 62
Metropolitan area network (MAN) systems, 301
Microelectromechanical systems (MEMS), 102, 104–107, 290
accelerometer, 217–218
gyroscope, 217–219
IMU, 216
Kalman filtering, 219–221
magnetometer, 219
MicroOptical's displays, 24–25, 114
Microscope-assisted guided interventions (MAGI) system, 502, 532
MicroTactus system, 231, 233
Minimally invasive cardiovascular surgery, 530, 535–536
MIThril system, 592, 609
Mixed reality (MR), 3–4, 230, 287, 303, 313, 486–487, 677
Mobile augmented reality (MAR)
hybrid methods
applications, 198
design objectives, 199
good accuracy and high efficiency, 199–200
recognition and tracking, 221–222
sensor-based methods (see Sensor-based
methods)
vision-based methods (see Vision-based
methods)
Mobile technology
Archeoguide system, 413
artifacts and exhibition areas, museums, 413
CityViewAR system, 415
European project Tag Cloud, 415
head-mounted displays, 413
image recognition-based applications, 415
iTACITUS, 414–415
laptop system, 413
Layar platform, 413–414
location-based experience, 415
N
Narrowcasting
anyware models, 316
floor control, 316, 318
groupware environments, 317
predicate calculus notation, 315, 317
privacy, 316
Natural fiber welding (NFW), 654–655
Nearest neighbor (NN), 160, 186, 214–216
Nearphones, 297–298, 303, 319
Netherlands Architecture Institute's (NAI) UAR application, 414
Neurosurgery, 527–528
analyzed data, 531
derived data, 531
prior-knowledge data, 531
superimposed anatomical data objects, 530
view factor, 531–533
visualization processing, 531
NFW, see Natural fiber welding (NFW)
O
110 Stories, 262, 265
OpenCV, 222
OpenStreetMap, 181–182, 190
Operating room, AR
binocular display, 501–503
direct patient overlay display, 505–506
endoscopic video guidance system, 493–495
half-silvered mirror devices, 504–505
HMDs, 503
limitations
clinical workflow, 508
cost-effectiveness, 508–509
optimal information dissemination, 507–508
organ deformation, 508
perception-related issues, 507–508
soft tissues, 508
medical navigation guidance system (see
Medical navigation guidance system)
minimally invasive interventions, 486
mixed reality, 486
requirements, 490
screen-based display
augmented endoscopy system, 498
camera/projector system, 497–498
endoscopic tracking system, 499–500
image-guided navigation systems, 496
liver surgery, 499
liver tumor resection, 499
lymph node biopsy guidance system, 498–499
model-enhanced US-assisted guidance
system, 501
optical tracking system, 497
robotic device, 497
2D/3D registration, 500–501
virtual fluoroscopy, 500
tactile feedback, 495
ultrasound guidance system, 491–492
video and SPECT guidance system, 492–494
virtual reality, 486
visualization environments, 487–488
x-ray guidance system, 490491
Optical architecture
contact lenses, 117–118
light field occlusion, 118–119
non-pupil-forming architecture, 93, 95–96
optical platforms, 93–97
pupil-forming architecture, 91, 93, 95–96
tools, 91
Optical tracking systems, 440, 442–443, 492–493, 497, 499, 505, 522
The Optimist, 273
Organic light-emitting diode (OLED), 102–107, 603
Orientation code matching, 359
Orthopedic surgery, 500, 527, 529, 538–539
P
Panasonic Open-Ear Bone Conduction Wireless
Headphones, 297
Panorama imagery, 668, 670
Parametric speakers, 298
Parametric ultrasonics, 302
PCBs, see Printed circuit boards (PCBs)
Peripheral vision, 62, 71, 80
Perspective-n-point (PnP) problem, 126–127
Photonic mixer device (PMD), 473–474
Photopic vision, 62
Physical activity monitoring, 589, 603, 607
Picking outlining adding (POA) paradigm, 666
Pin-array systems, 554
Point cloud model, see Light Detection and
Ranging (LiDAR) scanners
Polarization beam splitting (PBS) films, 102–103
Polyvinyl alcohol (PVA) gel electrolyte, 652
Printed circuit boards (PCBs), 593, 621–622, 631
Progressive sample consensus (PROSAC) loop,
179, 186, 189
Proximity monitoring
accurate measurement, 377
autonomous entities, 376–377
buried assets, 377
construction and manufacturing, 376
DAS, 378
end-effector position, 377
excavator articulation tracking system, 377–378
geometric proximity querying method, 379–381
kinematic articulation, 378–379
monitored virtual environment, 379
Pseudocapacitors, 643–644
Q
QBIC system, 595, 597, 599, 607, 610–611, 613
Qualcomm, 102, 222
R
Rainbow's End, 265–266
Random Sample Consensus (RANSAC) method,
157, 161, 165, 192, 196
Real-time high dynamic range techniques, 77
Real-time locating systems (RTLS), 289, 323
Real-virtual compositing process
aliasing virtual objects, 461
pixel averaging, 470–471
real-virtual supersampling, 470–472
global illumination effects, 460
occlusion handling methods, 473–475
occlusion problem, 460
Real-virtual supersampling, 470–472
Recon MOD Live, 26–28
Reference camera
camera calibration, 135
camera pose estimation, 136–137
fiducial marker, 134–135
optical zoom lens movement, geometric model, 135–136
Reflection Technology's Private Eye, 13, 25
Relay optics, 60, 65–66
Remote health monitoring, 585, 589, 594
REXplorer, 271
RGBD camera, 404
Rider Spoke, 271
RTLS, see Real-time locating systems (RTLS)
S
Scalable and Modular Augmented Reality Template (SMART), 352–353
Scale-invariant feature transform (SIFT), 165, 167, 176, 179, 189, 192, 208, 212, 389
Scanning electron microscopy (SEM), 648–649
Scotopic vision, 62
Screen-based display
augmented endoscopy system, 498
camera/projector system, 497–498
endoscopic tracking system, 499–500
image-guided navigation systems, 496
liver surgery, 499
liver tumor resection, 499
lymph node biopsy guidance system, 498–499
model-enhanced US-assisted guidance
system, 501
optical tracking system, 497
robotic device, 497
virtual fluoroscopy, 500
Sensor-based methods
applications, 197–198
design objectives, 199
good accuracy and high efficiency, 199–200
object tracking
accelerometer, 217–218
gyroscope, 217–219
IMU, 216
Kalman filtering, 219–221
magnetometer, 219
Server/client system design
latency analysis, 187–188
offline reconstruction, 186–187
online operation, 186–187
point cloud, 186–187
pose estimate, 187
query response, 187
sensor integration, 188
SFM, see Structure from motion (SFM)
Shoulder-worn acoustic targeting system, 313
SIFT, see Scale-invariant feature transform (SIFT)
Single photon emission computed tomography (SPECT), 488, 492–493
Single-thread chain stitch machine, 623–624
Single-walled carbon nanotubes (SWCNTs), 647
Small blurry image (SBI) localization method, 185
Smart eyewear
optical combiner and prescription lenses,
107, 109
potential consumers, 107–108
requirements, 107
Rx/plano lens, 107, 109–110
Smart glasses
consumer headsets, 113, 116
eye box
eye pupil diameter, 100–102
vs. eye relief, 100–101
FOV, 102
holographic combiner and extraction, 95, 100
optical combiner thickness, 100
focus/convergence disparity, 112–113, 115–116
immersion display, 111, 114
IPD, 99
optical architecture
non-pupil-forming architecture, 93, 95–96
optical platforms, 93–97
pupil-forming architecture, 91, 93, 95–96
tools, 91
optical requirements
angular resolution, 90–91
constraints, 88–89
dot per degree, 90
dot per inch, 90
occlusion and see-through displays, 88–89
pixel counts, 90–91
target functionality, 90, 92
products, 87–88
see-through smart glasses, 111, 114–115
specialized headsets, 114, 117
Social Panoramas, prototype system, 670
cylindrical panorama image, 671–672
Glass user, 674
prototype system, 671
remote awareness, 672–673
TCP/IP networking, 673–674
user interaction, 673
Wi-Fi network, 673
Soft finger models, 556
Soft skin simulation/deformation
anisotropic behaviors
constraint formulation, 573
constraint Jacobians, 573–574
finger skin deformations, 576
hyperbolic projection function, 572–573
linear interpolation, 574–575
orthogonal projection, 574–575
real-world finger, 570
strain limits, 571–572
Cybergrasp haptic device, 562–563
deformable hands, 556–557
flesh
elastic force computation, 560
elasticity model, 559
skeleton, coupling, 560–561
tetrahedral discretization, 559
haptic rendering, 562–563
rigid articulated hands, 555
skeletal bone structure, 557–558
strain-limiting model, 564–565
constrained dynamics, 566–567
constraint Jacobians, 566
constraints, 564–565
contact friction, 567
error metrics, 567–568
hand animation, 569–570
haptic coupling, 568
human finger model, 568–569
Software-defined radio (SDR), 301, 323
Soldier wearable shooter detection system, 313
Sonic flashlight, 504
Sonification, 314
Sound bells, 298
Source and sink dimensions
auditory display form factors
ambisonics and HOA, 300
ASMR, 295
Barco Auro-3D, 300
bone conduction headphones, 297, 299
DirAC, 300
Dolby Atmos, 300
mobile terminals, 295, 297
multichannel loudspeaker arrays, 299–300
nearphones, 297
parametric speakers, 298
sound bells, 298
stereo earphones, headphones, and
headsets, 295
stereo loudspeakers, 299
VBAP, 300
WFS systems, 300
broadband wireless network connectivity, 301
digital 4C [foresee] convergence, 294
dynamic responsiveness, 300
head tracking, 300–301
mobile and wearable auditory interfaces, 295–297
personal audio interfaces, 295
ubicomp/ambient intimacy, 294
personal audio interfaces, 295
ubicomp/ambient intimacy, 294
Spatial dimensions, 283
distance effects, 291–292
GIS, 289
GPS, 289
localized audio sources, 290–291
MEMS, 290
position, 289
RTLS, 289
stereotelephony, 292–294
whenceware, 290
whitherware, 290
Spatially Oriented Format for Acoustics (SOFA), 322
Spatial sound
audio reinforcement, 287
auditory displays, 278
Audium, 281–282
binaural hearing aids, 301–302
directionalization and localization, 283–285
distributed spatial sound, 302–303
information furniture, 303
mixed reality and mixed virtuality systems, 287
mobile AAR system, 288
musical sound characteristics, 282
optical and video, 288
parametric ultrasonics, 302
sink chain, 278–280
spatial reverberation system, 285–287
stereo reproduction systems, 280–281
subjective spatial attributes, 284
wearware and everyware (see Source and
sink dimensions)
whereware, spatial dimensions, 284
distance effects, 291–292
GIS, 289
GPS, 289
localized audio sources, 290–291
MEMS, 290
position, 289
RTLS, 289
stereotelephony, 292–294
whenceware, 290
whitherware, 290
Spatial Sound Description Interchange Format (SpatDIF), 322
SPECT, see Single photon emission computed
tomography (SPECT)
Speech interfaces, 17
Speeded Up Robust Feature (SURF) detector
algorithm adaptation
GMoment method, 211–212
gradient moments, 210
implementation, 210, 212
performance degradation, 209–210
Phone-to-PC ratio, 210–211
runtime cost, 210–211
Hessian matrix, 208–209
Spherical aberrations, 77
Staple fibers, 645–646
State and resource based simulation of construction processes (STROBOSCOPE), 351–352
Stereo camera model, see Reference camera
Stereo loudspeakers, 299
Stereotactic frame, 521
Stereotelephony, 288, 292–294
Strain-limiting model, 564–565
constrained dynamics, 566–567
constraint Jacobians, 566
constraints, 564–565
contact friction, 567
error metrics, 567–568
hand animation, 569–570
haptic coupling, 568
human finger model, 568–569
Structure from motion (SFM); see also Visual
simultaneous localization and
mapping (VSLAM)
A/C motor sequence, 162, 164
building scene, 162–164
camera pose estimation, 161
error reduction, 163–164
fuse box sequence, 162, 164
incremental keypoint matching, 160–161
keypoint descriptors, 160
point cloud, 158–160
pose accuracy, 163
RMS reprojection error, 163
subtrack optimization, 156–158
unified framework, 153, 156
unmatched keypoints, 161
Stylized AR systems, 463
applications, 476–478
GPUs, 474–477
psychophysical evaluation, 478
real-time augmented environments, 476–477
Subtrack optimization, 156–159, 161–164
Surface-mount (SMD) components, 633–634
SURF detector, see Speeded Up Robust Feature (SURF) detector
Surgical navigation, 487–488
SWCNTs, see Single-walled carbon nanotubes (SWCNTs)
Synesthetic telepresence, 313
T
Three-dimensional (3D) visual model
semiautomatic geo-alignment
benefits, 179–180
external orientation, 179
ground plane determination, 181
map alignment, 181–182
vertical alignment, 180
server/client system design
latency analysis, 187–188
offline reconstruction, 186–187
online operation, 186–187
point cloud, 186–187
pose estimate, 187
query response, 187
sensor integration, 188
tracking system, 181
camera model, 182
image representation, 182
initialization and reinitialization, 185–186
live keyframe sampling, 184
metrics, 184
point correspondence search, 182–183
pose update, 183
Through-hole components, 632–633
Time-of-flight range sensor, 473–475
Tracking system, 181
camera model, 182
image representation, 182
initialization and reinitialization, 185–186
live keyframe sampling, 184
metrics, 184
point correspondence search, 182–183
pose update, 183
Triplett VisualEYEzer 3250 multimeter, 22–23
Twelve Angry Men, 270
Twiddler, 14, 17, 24–25, 592, 602
2D barcode markers, 202–205
V
Vector base amplitude panning (VBAP), 298, 300
Vibrotactile system, 554
Virtual interaction tools, 527, 536
Virtual reality (VR); see also Head-mounted
displays (HMDs)
Facebook Oculus Rift, 323
FOV function, 92
occlusion/immersive displays, 86
sickness issues, 115
Sony PlayStation 4 Morpheus, 323
systems, 86
Virtual retinal display (VRD), 68–69
Vision-based methods
applications, 196–197
design objectives, 199
good accuracy and high efficiency, 199–200
object recognition and tracking
components, 201–202
detection, 202
edge lines, intersections, 201
feature-based methods (see Feature-based tracking method)
limitations, 205
marker identification, 203–205
marker tracking, 205
pipeline, 200–201
processes, 200
recognizer, 200–201
software development libraries, 222–223
Visual consistency
artistic and illustrative stylizations, 463
camera realism, 462–463
complementary strategies, 463–464
computer-generated virtual objects, 458–460
emulating photographic imperfections
camera image noise, 466–468
defocus blur, 468–469
motion blur, 468–469
film grain, 464
fully automatic processing, 465
hardware capabilities, 465
real-time processing, 465
real-virtual compositing (see Real-virtual
compositing)
stylized AR systems, 463
applications, 476–478
GPUs, 474–477
psychophysical evaluation, 478
real-time augmented environments, 476–477
system resources, 465
video see-through AR, 458
visual discrepancies, 461–462
Visual illusion
implementation, 346–348
incorrect and correct occlusion, 344–345
occlusion handling process, 344–346
two-stage rendering method, 346
validation experiments, 349–350
Visual modeling, see Three-dimensional (3D)
visual model
Visual simultaneous localization and mapping (VSLAM)
accuracy and robustness, 446
CAD model, 443–445
error accumulation, 445
GIS
database creation issue, 454
in-plane accuracy, 449–451
low cost, 447–448
navigation application, 452–453
odometer/inertial sensor, 454
out-plane accuracy, 452
GPS constraints
buildings model, 437, 452–453
camera positions, 448
degrees of freedom, 447, 449
inequality constraint, 448–449
measurements, 448
navigation application, 452–453
keyframes, 442
multiview geometry, 442
sales assistance, 437, 446
scale factor, 442–443, 445
small motion assumption, 445–446
Visual tracking; see also Three-dimensional (3D)
visual model
LiDAR point clouds (see Light Detection and
Ranging (LiDAR) scanners)
object recognition and tracking
database matches, 155
incremental keypoint matching, 154–156
keypoints, 154–155
offline stage, 153–154
online stage, 154
projection matches, 155
unified framework, 152–153
robust SFM
A/C motor sequence, 162, 164
building scene, 162164
camera pose estimation, 161
error reduction, 163164
fuse box sequence, 162, 164
incremental keypoint matching, 160161
keypoint descriptors, 160
point cloud, 158160
pose accuracy, 163
RMS reprojection error, 163
subtrack optimization, 156158
unified framework, 153, 156
unmatched keypoints, 161
VSLAM, see Visual simultaneous localization
and mapping (VSLAM)
Vuforia cloud recognition service, 222–223
W
Walsh–Hadamard (WH) kernel projection, 160
Wearable active camera with laser pointer (WACL), 665, 668–669
Wearable computer integration
accessory-based wearable computers (see
Accessory-based wearable computers)
cost effectiveness, 600, 602, 612
data and power, 600–601, 607
extensibility, 600, 602, 611
garment-based wearable computers (see Garment-based wearable computers)
on-body computing, 600, 602, 612–613
robustness and reliability, 600–601, 610–611
safety considerations, 600, 602, 612
sensing modalities, 600–601, 603–607
social acceptance and aesthetics, 600–601, 609
user interface, 600–603
wearability, 600–601, 608–609
Wearable computing
academic/maker system, 24–25
audio displays, 18–19
computer-generated images, 9
consumer devices, 26–28
diabetes, monitoring, 6
efficiency, improvements, 28
image registration, 10
IMT technology, 6–7
individual electrodes, 5
industrial wearable systems, 22–24
Lou Gehrig's disease, 5
microchip, 4–5
microvibration device, 5–6
mixed-reality, 3–4
mobile input, 17–18
networking, 14–15
neural interface, 5
portable video viewers, 20–22
power and heat, 15–17
public policy, 7–9
virtual reality, 19–20
visual displays, 18–19
X
Xybernaut, 17, 22–24
Y
Yamaha Vocaloid, 287
You Get Me, 271–272
Z
z-buffering, 346–347
Zhang's camera calibration method, 130, 137–138
ZigBee, 301, 590, 593, 607
Zoomable camera model
offline process, 129
online process
intrinsic camera parameter change, 130–131
monocular camera (see Monocular camera)
reference camera, 134–137
PnP problem, 126–127