Matjaž Mihelj
Domen Novak
Samo Beguš
Virtual Reality
Technology
and Applications
Intelligent Systems, Control and Automation:
Science and Engineering
Volume 68
Editor
S. G. Tzafestas, National Technical University of Athens, Athens, Greece
Matjaž Mihelj
Domen Novak
Samo Beguš
Faculty of Electrical Engineering
University of Ljubljana
Ljubljana
Slovenia
Preface

We began working on virtual reality in the first years of the twenty-first century,
but that was not our first glimpse of it. As we grew up, we watched virtual reality
through the eyes of laymen as it stepped out of science fiction and into everyday
life. It has become a fascinating field that brings together engineers, programmers,
designers, artists, psychologists, and others. These people collaborate to create
something more than the sum of their parts, a virtual world made of zeros and ones
that nonetheless feels real.
The magic of virtual worlds captivated us all, but what we desired most was to
peer underneath the hood and see just how things worked. We thus became
involved in the scientific and technical aspects of virtual reality: haptic interfaces,
graphics design, psychological aspects, and others. We created this book for those
who are also fascinated by the inner workings of this intriguing technology.
The book covers the individual elements of virtual reality, delving into their
theory and implementation. It also describes how the elements are put together to
create the virtual worlds we experience. Most of the knowledge contained within
comes from our own experience in human–robot interaction, where virtual envi-
ronments are used to entertain, motivate, and teach. Distilling the knowledge into
text form has been an arduous process, and we leave it to readers to decide whether
we were successful.
The text was originally aimed at engineers, researchers and graduate students
with a solid foundation in mathematics. Our main motivation for writing it was
that many existing virtual reality books do not have a sufficient focus on the
technical, mathematical aspects that would be of interest to engineers. Nonethe-
less, the actual amount of mathematical content varies greatly from chapter to
chapter. Readers with backgrounds other than engineering should be able to read
and understand most chapters, though they may miss out on some of the mathe-
matical details. Due to its origins, however, the book is focused less on psycho-
logical aspects and more on technical aspects—the hardware and software that
makes virtual reality work.
Many people contributed either directly or indirectly to the creation of this
book. Though they are too many to list, we would like to thank colleagues at the
University of Ljubljana and ETH Zurich, who travelled the path of research with
us and helped us to discover virtual reality. We would also like to thank the
diligent men and women at Springer who turned the book into reality. Cynthia
Feenstra deserves special thanks for being in touch with us throughout the prep-
aration process and putting up with occasionally missed deadlines. And as always,
we would like to thank our families for supporting us day after day. Whoever you
are, we hope you will enjoy reading this book.
Chapter 1
Introduction to Virtual Reality
Abstract The introductory chapter begins with the basic definitions of virtual reality,
virtual presence and related concepts. It provides an overview of the history of virtual
reality, from its origins in the 1950s to the present day. It also covers some of the most
important applications, from flight simulators to biomedical uses. Finally, it briefly
describes the main feedback loops used in virtual reality and the human biological
systems used to interpret and act on information from the virtual world.
Virtual reality is a term that we've all heard many times. Movies such as The Matrix brought virtual reality out of science fiction and into the minds of the masses. Examples of virtual and augmented reality are also becoming more and more prevalent in real life, from military flight simulators to simple smartphone applications. But since everyone has their own impression of what virtual reality is, let's first give the definition that we'll use throughout the book.
1.1 Definition of Virtual Reality

Virtual reality is an interactive computer simulation that senses the user's state and actions and replaces or augments sensory feedback to one or more senses in such a way that the user feels immersed in the simulation (the virtual environment). We can thus identify four basic elements of virtual reality: the virtual environment, virtual presence, sensory feedback (as a response to the user's actions) and interactivity [1].
Virtual reality is the observation of the virtual environment through a system that
displays the objects and allows interaction, thus creating virtual presence.
1.1.1 Virtual Environment

A virtual environment is determined by its content (objects and characters). This content is displayed through various modalities (visual, aural and haptic) and perceived by the user through vision, hearing and touch.
Just like objects in the real world, objects in a virtual environment have properties such as shape, weight, color, texture, density and temperature. These
properties can be observed using different senses. The color of an object, for example,
is perceived only in the visual domain, while its texture can be perceived both in visual
as well as haptic domains.
The content of the virtual environment can be grouped into categories. Environ-
ment topology describes the surface shape, areas and features. Actions in a virtual
environment are usually limited to a small area within which the user can move.
Objects are three-dimensional forms which occupy space in the virtual environment.
They are entities that the user can observe and manipulate. Intermediaries are forms controlled via interfaces, or avatars of the users themselves. User interface elements represent parts of the interface that reside within the virtual environment. These include elements of virtual control such as virtual buttons, switches or sliders.
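The idea that one object property may be perceived through several sensory modalities while another maps to only one can be sketched as a small data structure. This is a hypothetical illustration, not from the book; all class and property names are invented:

```python
from dataclasses import dataclass

# Which sensory channels convey each property (hypothetical mapping).
PROPERTY_MODALITIES = {
    "shape": {"visual", "haptic"},
    "color": {"visual"},
    "texture": {"visual", "haptic"},
    "temperature": {"haptic"},
    "weight": {"haptic"},
}

@dataclass
class VirtualObject:
    """A virtual object whose properties feed different display modalities."""
    name: str
    shape: str = "sphere"
    color: str = "gray"
    texture: str = "smooth"
    temperature: float = 20.0   # degrees Celsius
    weight: float = 1.0         # kilograms

    def modalities(self, prop: str) -> set:
        """Return the senses through which the given property is perceived."""
        return PROPERTY_MODALITIES.get(prop, set())

cup = VirtualObject(name="cup", color="red", texture="rough")
print(cup.modalities("color"))    # perceived only visually
print(cup.modalities("texture"))  # perceived both visually and haptically
```

A renderer could query such a mapping to decide which display back ends need to be informed when a property changes.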
1.1.2 Virtual Presence

Virtual presence can be roughly divided into physical (sensory) and mental presence.
It represents the feeling of actually ‘being’ in an environment; this can either be a
completely psychological state or achieved via some physical medium. Physical
virtual presence is the basic characteristic of virtual reality and represents the user’s
body physically entering the medium. Synthetic stimuli are created artificially and
presented to the user, but it is not necessary to affect all senses or involve the entire
human body. Mental virtual presence represents a state of ‘trance’: engagement,
expectations, the feeling of being part of the virtual world. In addition to physical
and mental presence, some authors also define telepresence, the feeling of virtual
presence in a geographically distant location.
Virtual presence is very difficult to evoke with other media, as they do not offer
actual sensory and physical immersion into the environment. The notion of absence
has even been advanced as a concept analogous to presence, but evoked by other
media [3]. Supporters of the concept claim that experiencing, for instance, the story
of a novel requires a detachment from the environment in which the individual is
reading a book. To some degree, information from the environment must be ignored
so that the individual can be immersed in the contents of the novel—the reader
must thus become absent from the surrounding environment. In virtual reality, the
individual is present in an (admittedly virtual) environment, so he/she should also
perceive it as real and respond to it as real.
Physical virtual presence defines virtual reality and simultaneously separates it from
other media. It is achieved by presenting the virtual world to a user with a synthesis
of stimuli to one or more senses in response to the user’s position and actions.
In general, a virtual reality system renders the virtual world through sight, sound and
touch (haptics).
As the user moves, the visual, audio and haptic stimuli change with the virtual scene. When the user moves toward an object, it becomes larger and louder, and can even be touched once within reach. Turning the head reveals the world to the left and right of the user. Touch allows objects to be manipulated.
Synthetic stimuli often drown out stimuli from the real world, thus decreasing
mental presence in the real world. The degree to which real stimuli are replaced
by synthetic ones and the number of ‘tricked’ senses affect the level of physical
presence, which in turn affects the level of mental presence.
The level of desired mental virtual presence depends on the intended application of virtual reality. If the virtual experience is meant for entertainment, a high level of mental presence is needed. However, a high degree of mental immersion is often not necessary, possible or even desirable. The absence of mental virtual presence thus does not disqualify a medium from being virtual reality.
A user’s mental virtual presence can have varying degrees of intensity: users can
perceive a connection with the computer; users can ignore the real world and focus
on interacting with the virtual world while still knowing the difference between real
and virtual worlds; or users can even be so immersed in the virtual environment that
they forget that it is virtual.
A realistic display that includes sight, sound and touch can significantly affect the
level of mental virtual presence. A photorealistic image of the virtual environment
is unnecessary and sometimes undesired, as even small errors in such an image
distract the user from the experience. The same is true for other elements of realism
such as three-dimensional views or echoes—while they are sometimes crucial for
maintaining mental virtual presence, they may be distracting in other applications.
Virtual reality must ensure at least a minimum level of physical virtual presence.
The definition of mental virtual presence assumes that users are so busy with events in
the virtual environment that they stop doubting what they are experiencing. The level
of mental virtual presence is affected by factors such as the virtual scenario, the quality
of the display and graphical representation, and the number of senses stimulated by
the virtual reality system. Another important factor is the delay between a user’s
action and the virtual environment's response. If the delay is too long (what counts as 'too long' depends on the display type: visual, aural or haptic), it can destroy the effect of mental immersion.
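To make the delay issue concrete, the following sketch compares a measured action-to-response delay against per-modality budgets. The numeric thresholds are rough, commonly cited ballpark figures chosen for illustration, not values from this book:

```python
# Rough, illustrative latency budgets per display modality (milliseconds).
LATENCY_BUDGET_MS = {
    "visual": 50.0,  # lag above a few tens of ms becomes noticeable
    "aural": 30.0,
    "haptic": 1.0,   # haptic rendering loops commonly run near 1 kHz
}

def immersion_at_risk(modality: str, measured_delay_ms: float) -> bool:
    """True if the action-to-response delay likely breaks mental immersion."""
    return measured_delay_ms > LATENCY_BUDGET_MS[modality]

print(immersion_at_risk("visual", 16.7))  # one frame at 60 fps: acceptable → False
print(immersion_at_risk("haptic", 16.7))  # far too slow for a haptic loop → True
```

The point of the sketch is only that the same measured delay can be harmless for one display type and destructive for another.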
The perceived realism of individual objects and the entire virtual environment
can be increased using sensory transfer. If an object looks realistic, we expect it to
also act realistically. By emphasizing certain objects to the senses, it is possible to
significantly increase the realism of the entire environment.
1.1.3 Sensory Feedback

Sensory feedback is a crucial component of virtual reality. The virtual reality system provides direct sensory feedback to users according to their physical location.
Generally, most feedback is provided via visual information, though some environ-
ments only use haptic information. Of course, it is necessary to track the user’s
location in order to provide appropriate feedback. The system must thus have the
ability to automatically measure the position and orientation of objects in the real
environment.
1.1.4 Interactivity
If virtual reality is to be realistic, it must respond to the user’s actions; in other
words, it must be interactive. The ability of the user to affect computer-generated
environments represents one form of interaction. Another possibility is to change
the location and angle from which the user views the environment. A multi-user
environment represents an extension of interactivity and involves a large number of users simultaneously working in the same virtual environment or simulation. A multi-user environment must thus allow interaction between multiple users, but it is not necessarily part of virtual reality.
When working with others in the same environment, it is necessary to follow their
activities—pose, gestures, actions, gaze direction, speech. The word avatar (a Hindi
word for the embodiment of a deity) is commonly used to describe a virtual object
that represents a user or real object inside the virtual environment.
1.1.5 Perspective
The creator of virtual reality can use several options to change a user’s perception of
the virtual environment. One of them is the viewpoint from which the virtual world
is seen. A first-person view involves looking at the environment through an avatar’s
eyes, a second-person view involves looking from the immediate vicinity of rele-
vant activity, and a third-person view involves looking from an entirely independent
location.
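A minimal sketch of how the viewpoint choice translates into camera placement might look as follows. This is a hypothetical illustration; the function name, the eye height and the third-person offset are invented:

```python
import numpy as np

def camera_position(avatar_pos, avatar_forward, view="first"):
    """Place the virtual camera according to the chosen perspective.

    first : at the avatar's eyes, looking along its forward vector
    third : behind and above the avatar, looking toward it
    """
    avatar_pos = np.asarray(avatar_pos, dtype=float)
    forward = np.asarray(avatar_forward, dtype=float)
    forward = forward / np.linalg.norm(forward)  # normalize heading
    if view == "first":
        return avatar_pos + np.array([0.0, 0.0, 1.7])  # assumed eye height
    if view == "third":
        return avatar_pos - 3.0 * forward + np.array([0.0, 0.0, 2.0])
    raise ValueError(f"unknown view: {view}")

print(camera_position([0, 0, 0], [1, 0, 0], "third"))  # → [-3.  0.  2.]
```

A second-person view would place the camera at some fixed point in the immediate vicinity of the activity rather than relative to the avatar.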
1.2 History of Virtual Reality
Human imagination dreamed of virtual reality for decades before the first actual
implementations arrived. In 1931, Aldous Huxley’s book Brave New World already
introduced the concept of feelies—movies that involve touch in addition to sight and
sound. In 1935, Stanley Weinbaum went even further and presented a detailed idea of virtual reality in his short story Pygmalion's Spectacles:
“But listen—a movie that gives one sight and sound. Suppose now I add taste,
smell, even touch, if your interest is taken by the story. Suppose I make it so that
you are in the story, you speak to the shadows, and the shadows reply, and instead
of being on a screen, the story is all about you, and you are in it. Would that be to
make real a dream?”
This idea remained on paper for about 20 years until Morton Heilig, considered by
some to be the father of virtual reality, began designing a practical implementation.
In 1957, he developed and patented the Sensorama, a machine that offered a virtual
bicycle riding experience. The user sat in the machine with a three-dimensional city
display, heard the sounds of the city, felt the wind and the vibrations of the seat,
and could even smell certain scents. The Sensorama was the first step in the real
development of virtual reality, but was never commercially successful.
The next device worth mentioning was the first head-mounted display, the Philco
HMD. It did not provide a window into a virtual environment, but showed a video
from a real, distant location. It can thus be considered the first example of telep-
resence, an application of virtual reality that is still popular today. In 1968, Ivan
Sutherland then developed a head-mounted display connected to a virtual environ-
ment [4]. Sutherland’s paper The Ultimate Display, written in 1965, predicted the
rise of the fantastic worlds seen in today’s computer games: “There is no reason why
the objects displayed by a computer have to follow the ordinary rules of physical
reality with which we are familiar.” He also had a vision of the ultimate stage of
virtual reality development:
“The ultimate display would, of course, be a room within which the computer
can control the existence of matter. A chair displayed in such a room would be good
enough to sit in. Handcuffs displayed in such a room would be confining, and a
bullet displayed in such a room would be fatal. With appropriate programming, such
a display could literally be the Wonderland into which Alice walked.”
Sutherland’s display, called the Sword of Damocles, consisted of glasses with two
small screens (one for each eye) that together gave an illusion of three-dimensional
vision. It displayed a virtual environment consisting of rooms represented with sim-
ple wire models. By moving the head, the user could also change the view of the
environment, which required a complex motion tracking system attached to the ceil-
ing. Since the screens were partially transparent, the user could see both the real and
virtual worlds simultaneously. The Sword of Damocles can thus also be considered
the first example of augmented reality: synthetic stimuli superimposed on stimuli
from the real environment.
Sensorama and the Sword of Damocles allowed a user to experience virtual en-
vironments using different senses, but did not allow any interaction with these environments. The first environments that reacted to the user's actions were developed
around 1970 by Myron Krueger. Using various sensors (from video cameras to pressure sensors in the floor), the virtual reality system could recognize users' activities
and move objects in the virtual environment accordingly. Virtual objects thus acted
like real ones. Since multiple users could interact with the virtual environment simul-
taneously, this was also the first example of multi-user environments. Krueger’s most
famous creation was the Videoplace environment, which included artistic activities
such as drawing on virtual objects. Krueger also coined the term artificial reality,
which describes the recognition of the user’s activities and the generation of feedback
that reinforces the illusion of the activities taking place in a virtual environment.
Virtual environments that could react to the user’s actions required new motion
recognition methods adapted for virtual reality. The late 1960s and early 1970s thus
saw the development of the Grope I–III systems for virtual molecule display. They
allowed the user to move molecules on the screen using a special haptic interface as
well as feel the forces between the molecules using force feedback. Grope I–III were
an upgrade of Krueger's system: instead of recognizing motions using video cameras,
they let the user affect objects directly by touching them. The next step in touch
recognition was the Sayre Glove, which used built-in sensors to detect finger motions
and thus represented a simple and cheap human motion recognition method. Its
younger sister, the VPL Dataglove, appeared on a 1987 cover of Scientific American,
entering public consciousness as the first commercially available motion recognition
glove. VPL, the company that developed it, later also developed the first commercial
virtual reality system, Reality Built for Two. The company's founder, Jaron Lanier, also
significantly contributed to public awareness of virtual reality and popularization of
the term virtual reality itself.
Of course, it was not only the scientists that brought the concept of virtual reality
to the masses. Science fiction books, TV series and movies all did their part. In 1982, William Gibson first used the term cyberspace in his short story Burning Chrome, later popularizing it in the 1984 novel Neuromancer. Also in 1982, Hollywood first brought virtual reality to the silver screen with the movie Tron. In 1987, the TV series Star Trek: The Next Generation
presented the holodeck: a room that used holographic images and the creation of
new matter to conjure up virtual environments that users could actively participate
in. One episode of the series even centered on a man who became so obsessed with
virtual reality that he neglected the real world; a shadowy dream of modern massively
multiplayer online games? The 1990s then delivered an abundance of movies about
virtual reality. Perhaps the most famous was The Matrix, which imagined a virtual
world that encompassed the majority of humanity and was so realistic that most users
did not even know it was only virtual.
Technological development thus fell far behind human imagination in the 1990s,
but major progress was made nonetheless. Perhaps the most famous virtual reality
product of the nineties was the CAVE (an acronym that stands for Cave Automatic
Virtual Environment): a room whose walls consist of screens displaying a virtual
environment. Users can thus really see themselves inside the virtual environment.
Special glasses can also give an illusion of depth—objects look like they step out of
the screen and float in the air. Electromagnetic sensors built into the walls allow the user's movements to be tracked.
1.3 Applications of Virtual Reality

Although virtual reality has not yet achieved the visions of science fiction, it is already successfully used in many applications. Let's briefly examine some of them, as a detailed overview of all applications would take up far too much space.
Flight simulators may be the best-known practical application of virtual reality. They
allow pilots to practice flying in a safe, controlled environment where mistakes can-
not lead to injury or equipment damage. The simplest simulators run on a personal
computer. Their virtual environment contains the entire physical model of the plane
and a simpler model of the landscape over which the plane can fly. More complex
simulators use similar virtual environments, but include a realistic imitation of the
cockpit complete with visual displays, sounds and mechanisms to move the entire
cockpit (when simulating e.g. turbulence). Such virtual environments allow training
in varied situations, from routine flight to serious equipment failures that could en-
danger passengers’ lives. Since the desired situation can be chosen at will, training in
a flight simulator can also be more effective per unit of time than real training. Flight
simulators first appeared in the 1950s and now represent a completely established
technology regularly used by military and civilian pilots all over the world.
Driving simulators were developed for a similar purpose: they allow safe driving
lessons in different conditions (rain, ice, congestion) or tests of new cars. In a
virtual environment, it is possible to change any of the car’s features (both aesthetic
and functional) and then observe how real drivers react to the changes. Simulation
thus also allows designed cars to be tested before actually building a prototype.
A flight simulator is a typical example of virtual reality since it allows difficult tasks
to be practiced in a controlled environment where different actions and conditions
can be tried without any threat to people or machinery. Surgery similarly represents
a difficult situation where a single error can lead to the patient’s death. Following in
the footsteps of flight simulators, surgery simulators provide virtual environments
where a surgeon can use realistic haptic interfaces (which look and feel like actual
surgical instruments) to practice surgical procedures on different patients [5]. These
virtual patients are also not necessarily just generic, made-up people. Information
obtained with modern medical equipment (e.g. computed tomography) can be used
to create a three-dimensional model of a patient that is scheduled for actual surgery.
Before the real surgery, surgeons can ‘practice’ on a virtual patient with very similar
characteristics to the real one. Surgery simulators have become especially widespread
with the creation of surgical robots, which allow the entire surgery to be conducted
via haptic interface and screen. With surgical robots, experience from virtual surgery
is even more directly transferable into reality.
As mentioned in the driving simulator subsection, virtual reality can be used to de-
sign and test different machines and objects. Since virtual reality is often expensive,
it is most frequently used to design objects that are either very expensive (e.g. power
plants, rockets) or manufactured in large quantities (e.g. cars). Such virtual environ-
ments are extremely complex since they need to combine a good visual display with
a detailed physical model that includes all the factors that could affect the tested
object.
Designing objects in virtual reality does not have to be limited to testing concepts
that could later be transferred to the real world. The process can also go the other way:
objects that exist in the real world can be transferred to a virtual environment. One
example is the recreation of famous buildings in virtual environments [6]. The
user can walk through a virtual historical building, play with the items in it and learn
historical facts without ever visiting the building in person. The virtual environment
can even include virtual humans from the historical era of the building, allowing the
user to interact with them and learn additional information.
After injuries such as stroke or spinal cord injury, human limbs can be severely weakened due to damage to the nervous system. Intensive exercise can help the patient partially
or completely regain lost motor abilities, but the rehabilitation process is lengthy
and difficult. Patients need to obtain detailed feedback both during and after exercise
in order to improve their motions, and they must be highly motivated for exercise.
Virtual reality has been suggested as a possible solution to these problems. Interesting
and varied virtual environments can increase motivation since the exercises do not
become monotonous. At the same time, measuring all the variables of the virtual
environment allows a large amount of information about patient movements and
general performance to be obtained. Virtual reality can also be combined with special
rehabilitation robots that actively help the patient exercise. Several studies have
shown that patients can relearn motions in virtual reality and that the knowledge can
be successfully transferred to the real world. However, it has not yet been proven
whether rehabilitation in virtual reality is more effective than classic rehabilitation
methods [8].
1.3.7 Psychotherapy
Virtual reality can evoke virtual presence: a feeling of being present and involved
in the virtual environment. Users thus react to virtual objects and creatures as they
would to real ones, a feature often used by psychologists. In fact, the most popular
therapeutic application of virtual reality is the treatment of phobias and traumas
by exposing an individual to the object, creature or situation that they are afraid of.
Psychology discovered long ago that people will never overcome their fears if they
avoid stressful situations—they must face their fears in order to overcome them. Since
exposing an individual to an actual stressful situation can be expensive, dangerous,
impractical or even impossible (e.g. in the case of post-traumatic stress disorder
evoked by wars), virtual reality can act as an effective alternative. Exposure therapy
in virtual reality is completely controlled, inexpensive and can be performed at a
therapeutic institution. Virtual reality has thus been used to successfully cure fear
of heights, spiders, flying, open spaces and public speaking. Additionally, virtual
environments with many positive stimuli can be used to treat other psychological
disorders such as impotence or low self-esteem caused by excess weight [9].
1.4 Virtual Reality System

Virtual reality relies on the use of a feedback loop. Figure 1.1 shows the feedback loop, which allows interaction with the virtual reality system through the user's physical actions and detection of the user's psychophysiological state. In a fast feedback loop, the user directly interacts with the virtual reality system through motion. In a slow feedback loop related to affective computing, the psychophysiological state of the user can be assessed through measurement and analysis of physiological signals, and the virtual environment can be adapted to engage and motivate the user.
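The two loops can be sketched in code: a fast loop that tracks motion, updates the environment and renders every frame, and a slow loop that occasionally estimates the user's state and adapts the environment. All function names below are invented stubs standing in for real tracking, simulation, rendering and biosignal-processing code:

```python
# Invented stubs standing in for real tracking, simulation, rendering
# and physiological-signal processing.
def track_user_motion():          return (0.0, 0.0, 0.0)
def update_environment(pose):     return {"pose": pose, "difficulty": 1.0}
def render(world):                pass
def estimate_psych_state():       return {"arousal": 0.5}
def adapt_difficulty(world, s):   world["difficulty"] = 1.0 + s["arousal"]

def run_vr_loops(n_frames=60, frame_hz=60, affect_hz=1):
    """Fast loop every frame; slow affective loop once per affect period."""
    frames_per_affect = frame_hz // affect_hz
    frames = adaptations = 0
    for i in range(n_frames):
        pose = track_user_motion()         # fast loop: sense...
        world = update_environment(pose)   # ...simulate...
        render(world)                      # ...and display
        frames += 1
        if i % frames_per_affect == 0:     # slow loop: affective computing
            adapt_difficulty(world, estimate_psych_state())
            adaptations += 1
    return frames, adaptations

print(run_vr_loops())  # one simulated second at 60 fps, 1 Hz affective loop → (60, 1)
```

The essential design point is the rate difference: motion must be handled every frame, while psychophysiological estimation can safely run orders of magnitude slower.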
The virtual reality system enables exchange of information with the virtual en-
vironment. Information is exchanged through the interface to the virtual world. The
user interface is the gateway between the user and the virtual environment. Ideally,
the gateway would allow transparent communication and transfer of information
between the user and the virtual environment.
The user interface defines how the user communicates with the virtual world and
how the virtual world manifests in a perceptible way. Figure 1.2 shows the relation-
ships between the user interface, methods of creating the virtual world and aspects
of the user’s personality. All these elements affect the virtual reality experience as
well as the physical and mental presence.
Figure 1.3 shows the flow of information in a typical virtual reality system. The
virtual world is projected into a representation that is rendered and shown to the user
via displays. The process takes the user’s motions into account, thus enabling virtual
presence by appropriately adjusting the displayed information. The user can affect
the virtual world via the interface inputs. In augmented reality, the displayed virtual
environment is superimposed onto the perceived real environment.
Fig. 1.1 The feedback loop is a crucial element of a virtual reality system. The system must react to the user's actions, and can optionally even estimate the user's psychological state in order to better adapt to the situation
Rendering is the process of creating sensory images of the virtual world. They must
be refreshed rapidly enough to give the user an impression of continuous flow (real-
time rendering). The creation of sensory images consists of two steps. First, it is
necessary to decide how the virtual world should look, sound and feel. This is called
the representation of the virtual world. Secondly, the representation must be displayed
using appropriate hardware and software.
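The two rendering steps can be sketched as a separation of concerns: a device-independent representation of the virtual world, and per-modality back ends that turn it into displayable signals. This is a hypothetical illustration; the scene fields and function names are invented:

```python
# Step 1: representation — a device-independent description of the scene,
# stating how the world should look and sound.
scene = {
    "objects": [
        {"shape": "cube", "position": (0.0, 0.0, 1.0), "color": "red"},
    ],
    "ambient_sound": "wind",
}

# Step 2: display — per-modality back ends turn the representation
# into signals for concrete hardware (here, just command strings).
def render_visual(scene):
    return [f"draw {o['color']} {o['shape']} at {o['position']}"
            for o in scene["objects"]]

def render_audio(scene):
    return [f"play {scene['ambient_sound']} loop"]

for command in render_visual(scene) + render_audio(scene):
    print(command)
```

Keeping the representation separate from the display back ends is what lets the same virtual world drive different hardware for each sensory channel.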
Fig. 1.2 Virtual reality requires the integration of multiple factors—user interface, elements of the virtual world and the user's experiences. Interaction between these factors defines the experience of virtual reality. The figure groups the factors as follows:
- virtual environment: a) complexity, b) simulation/physics, c) distributed environments, d) presentation quality, e) point of view;
- user interface: a) input (motion tracking, physiological signals, props), b) output (visual, audio and haptic displays), c) interaction (manipulation, navigation);
- user: a) abilities, b) emotional state, c) motivation, d) personal experiences, e) engagement, f) shared experience.
Together, through immersion and presence, these factors produce the virtual reality experience.
If we wish to create virtual reality, we must decide how to represent thoughts, ideas
and information in visual, audio and haptic forms. This decision significantly affects
the effectiveness of virtual reality.
Communication via a given medium thus demands a shared presentation of ideas and an understanding of these presentations. Ideas, concepts or physical characteristics can be presented in different ways, though some presentations are more appropriate than others.
Rendering generates visual, audio and haptic signals to be displayed with appropriate
equipment. Hardware and software allow computer-generated representations of the
virtual world to be transformed into signals that are then displayed in a way noticeable
to the human senses. Since each sense has different rendering requirements, different
hardware and software are thus also used for each sensory channel. Though the
ultimate goal is to create a unified virtual world for all the senses, the implementation
details differ greatly and will thus be covered separately for visual, audio and haptic
renderings.
The experience of virtual reality is based on the user’s perception of the virtual
world, and physical perception of the virtual world is based on computer displays.
The term display will be used throughout the book to refer to any method of presenting
information to any human sense. The human body has at least five senses that provide
information about the external world to the brain. Three of these senses—sight,
hearing and touch—are the most frequently used to transmit synthetic stimuli in
virtual reality. The virtual reality system ‘tricks’ the senses by displaying computer-
generated stimuli that replace stimuli from the real world. In general, the more senses
are provided with synthetically generated stimuli, the better the experience of virtual
reality.
From the perspective of interaction with a virtual environment, a human being can
be divided into three main systems:
• perception allows information about the environment to be obtained;
• motor abilities (musculoskeletal system) allow movement through the environ-
ment, manipulation through touch, and positioning of sensory organs for better
perception;
• cognitive abilities (central nervous system) allow the analysis of information from
the environment and action planning according to current task goals.
Humans perceive the environment around them via multiple sensory channels that
allow electromagnetic (sight), chemical (taste, smell), mechanical (hearing, touch,
orientation) and heat stimuli to be detected. Many such stimuli can be artificially
reproduced in virtual reality.
Biological converters that receive signals from the environment or the interior of
the body and transmit them to the central nervous system are called receptors. In
general, each receptor senses only one type of energy or stimulus. Receptor structure
is thus also very varied and adapted to receiving specific stimuli. Nonetheless, despite
this great variety, most receptors can be divided into the three functional units shown
in Fig. 1.4. The input signal is a stimulus that always appears in the form of energy:
electromagnetic, mechanical, chemical or heat. The stimulus affects the filter part
of the receptor, which does not change the form of the energy but does amplify
or suppress certain parameters. For instance, the ear amplifies certain frequencies of
sound, skin acts as a mechanical filter, and the eyes use the lens to focus light onto the
(Fig. 1.4: functional units of a receptor: the stimulus passes through a filter and a converter, producing a receptor potential, which the encoder turns into a sequence of nerve impulses)
retina. The converter changes the modified stimulus to a receptor membrane potential
while the encoder finally converts the signal to a sequence of action potentials.
Most receptors decrease the output signal if the input does not change for a certain
amount of time. This is called adaptation. The receptor’s response usually has two
components: the first is proportional to the intensity of the stimulus while the second
is proportional to the speed with which the stimulus intensity changes. If S(t) is the
stimulus intensity and R(t) is the response, then

R(t) = k_1 S(t) + k_2 \dot{S}(t),

where k_1 and k_2 weight the proportional and differential components, respectively.
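The two components of the receptor response described above can be illustrated with a short numerical sketch (the step stimulus and the gains k1 and k2 are illustrative assumptions, not values from the text):

```python
# Sketch of receptor adaptation: the response combines a component
# proportional to the stimulus S(t) and one proportional to its rate
# of change. Step profile and gains k1, k2 are illustrative assumptions.

def receptor_response(stimulus, dt, k1=1.0, k2=0.5):
    """Return samples of R(t) = k1*S(t) + k2*dS/dt at interval dt."""
    response = []
    prev = stimulus[0]
    for s in stimulus:
        ds_dt = (s - prev) / dt
        response.append(k1 * s + k2 * ds_dt)
        prev = s
    return response

dt = 0.01
# Step stimulus: off for 1 s, then constant intensity 1.0.
stimulus = [0.0] * 100 + [1.0] * 100
R = receptor_response(stimulus, dt)

# At the step the differential term produces a transient overshoot,
# after which the response adapts back to the proportional level.
print(max(R), R[-1])
```

The transient peak followed by a constant steady-state value mirrors the adaptation behavior described in the text.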
Humans use their motor abilities to interact with the virtual world. This interaction
can be roughly divided into navigation within the virtual environment, manipulation
of objects in the environment, and interaction with other users of the virtual reality.
The motor subsystem also has an important role in connection to haptic interfaces.
The user directly interacts with the haptic interface and thus affects the stability of
haptic interaction. It is thus necessary to be familiar with the human motor system
in order to create a stable haptic interface.
The human cognitive system is used to make decisions about how to interact with
the virtual environment. In addition to the rational part of cognition, it is also crucial
to take emotions into account since they have an important effect on behavior in the
virtual world.
Chapter 2
Degrees of Freedom, Pose, Displacement
and Perspective
To begin with, we will introduce the degree of freedom in the case of an infinitesimal
mass particle. In this case, the number of degrees of freedom is defined as the number
of independent coordinates (not including time) which are necessary for the complete
description of the position of a mass particle.
A particle moving along a line (infinitesimally small ball on a wire) is a system
with one degree of freedom. A pendulum with a rigid segment swinging in a plane
is also a system with one degree of freedom (Fig. 2.1). In the first example, the
position of the particle can be described with the distance, while in the second case
it is described with the angle of rotation.
A mass particle moving on a plane has two degrees of freedom (Fig. 2.2). Its
position can be described with two Cartesian coordinates x and y. A double pendulum
with rigid segments swinging in a plane is also a system with two degrees of freedom.
The position of the mass particle is described by two angles. A mass particle in space
has three degrees of freedom. Its position is usually expressed by three Cartesian
coordinates x, y and z. An example of a simple mechanical system with three degrees
of freedom is a double pendulum where one segment is represented by an elastic
spring and the other by a rigid rod. In this case, the pendulum also swings in a plane.
Next, we will consider degrees of freedom of a rigid body. The simplest rigid
body consists of three mass particles (Fig. 2.3). We already know that a single mass
particle has three degrees of freedom, described by displacements along three mutually
perpendicular axes, called translations (T). We add another mass particle to the first one
in such a way that there is constant distance between them. The second particle is
Fig. 2.1 Two examples of systems with one degree of freedom: mass particle on a wire (left) and
rigid pendulum in a plane (right)
Fig. 2.2 Examples with two (left) and three degrees of freedom (right)
2.1 Degree of Freedom 19
restricted to move on the surface of a sphere surrounding the first particle. Its position
on the sphere can be described by two circles reminiscent of meridians and latitudes
on a globe. The displacement along a circular line is called rotation (R). The third
mass particle is added in such a way that the distances with respect to the first two
particles are kept constant. In this way the third particle may move along a circle, a
kind of equator, around the axis determined by the first two particles. A rigid body
therefore has 6 degrees of freedom: 3 translations and 3 rotations. The first three
degrees of freedom describe the position of the body while the other three degrees
of freedom determine its orientation. The term pose is used to include both position
and orientation.
In the following sections we will introduce a unified mathematical description of
translational and rotational displacements.
d = ai + bj + ck, (2.1)
When using homogeneous transformation matrices, an arbitrary vector has the following 4 × 1 form
q = \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix} = [x \; y \; z \; 1]^T. \quad (2.3)
(Figure: elementary rotations Rot(x, α), Rot(y, β) and Rot(z, γ) around the coordinate axes)
3 × 3 rotation matrix. The elements of the rotation matrix are cosines of the angles
between the axes given by the corresponding column and row
\mathrm{Rot}(x,\alpha) =
\begin{bmatrix}
\cos 0^\circ & \cos 90^\circ & \cos 90^\circ & 0 \\
\cos 90^\circ & \cos\alpha & \cos(90^\circ + \alpha) & 0 \\
\cos 90^\circ & \cos(90^\circ - \alpha) & \cos\alpha & 0 \\
0 & 0 & 0 & 1
\end{bmatrix}
=
\begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & \cos\alpha & -\sin\alpha & 0 \\
0 & \sin\alpha & \cos\alpha & 0 \\
0 & 0 & 0 & 1
\end{bmatrix}, \quad (2.6)

where the columns correspond to the x′, y′, z′ axes and the rows to the x, y, z axes.
The angle between the x′ and the x axes is 0°. We therefore have cos 0° in the
intersection of the x′ column and the x row. The angle between the x′ and the y axes
is 90°. We put cos 90° in the corresponding intersection. The angle between the y′
and the y axes is α. The corresponding matrix element is cos α.
Rotation matrices for rotations around the y axis can be written similarly
\mathrm{Rot}(y,\beta) =
\begin{bmatrix}
\cos\beta & 0 & \sin\beta & 0 \\
0 & 1 & 0 & 0 \\
-\sin\beta & 0 & \cos\beta & 0 \\
0 & 0 & 0 & 1
\end{bmatrix} \quad (2.7)
and z axis
\mathrm{Rot}(z,\gamma) =
\begin{bmatrix}
\cos\gamma & -\sin\gamma & 0 & 0 \\
\sin\gamma & \cos\gamma & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{bmatrix}. \quad (2.8)
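The elementary rotation matrices (2.6)-(2.8) translate directly into code; a minimal sketch using NumPy:

```python
import numpy as np

def rot_x(alpha):
    """Homogeneous 4x4 rotation around the x axis, Eq. (2.6)."""
    c, s = np.cos(alpha), np.sin(alpha)
    return np.array([[1, 0,  0, 0],
                     [0, c, -s, 0],
                     [0, s,  c, 0],
                     [0, 0,  0, 1]])

def rot_y(beta):
    """Homogeneous 4x4 rotation around the y axis, Eq. (2.7)."""
    c, s = np.cos(beta), np.sin(beta)
    return np.array([[ c, 0, s, 0],
                     [ 0, 1, 0, 0],
                     [-s, 0, c, 0],
                     [ 0, 0, 0, 1]])

def rot_z(gamma):
    """Homogeneous 4x4 rotation around the z axis, Eq. (2.8)."""
    c, s = np.cos(gamma), np.sin(gamma)
    return np.array([[c, -s, 0, 0],
                     [s,  c, 0, 0],
                     [0,  0, 1, 0],
                     [0,  0, 0, 1]])

# Rotating the homogeneous vector q = [1 0 0 1]^T by 90° around z
# maps the x unit vector onto the y axis.
q = np.array([1.0, 0.0, 0.0, 1.0])
print(np.round(rot_z(np.pi / 2) @ q, 6))
```

The vectors being rotated are written in the homogeneous 4 × 1 form of Eq. (2.3), so the same matrices can later be combined with translations.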
The graphical presentation of rotating the vector u around the z axis is shown in
Fig. 2.7.
2.4 Pose and Displacement 23
In the previous section, we learned how a point is translated or rotated around the
axes of the Cartesian frame. We are next interested in displacements of objects. We
can always attach a coordinate frame to a rigid object under consideration. In this
section we shall deal with the pose and the displacement of rectangular frames. We
shall see that a homogeneous transformation matrix

H = \begin{bmatrix} R & p \\ 0 & 1 \end{bmatrix} \quad (2.9)
describes either the pose of a frame with respect to a reference frame or it represents
the displacement of a frame into a new pose. In the first case the upper left 3 × 3
matrix R represents the orientation of the object, while the right-hand 3 × 1 column
p describes its position (e.g. the position of its center of mass). In the case of object
displacement, the matrix R corresponds to rotation and the column p corresponds to
translation of the object. We shall examine both cases through simple examples. Let
us first clear up the meaning of the homogeneous transformation matrix describing
the pose of an arbitrary frame with respect to the reference frame. Let us consider the
following product of homogeneous matrices, which gives a new homogeneous trans-
formation matrix H
H = \mathrm{Trans}(4,-3,7)\,\mathrm{Rot}(y,90^\circ)\,\mathrm{Rot}(z,90^\circ) =
\begin{bmatrix}
0 & 0 & 1 & 4 \\
1 & 0 & 0 & -3 \\
0 & 1 & 0 & 7 \\
0 & 0 & 0 & 1
\end{bmatrix}. \quad (2.10)
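The composition of displacements discussed here can be verified numerically; the helper functions `trans` and `rot` below are illustrative implementations of the translation and rotation matrices used in the text:

```python
import numpy as np

def trans(a, b, c):
    """Homogeneous translation by the vector [a, b, c]."""
    H = np.eye(4)
    H[:3, 3] = [a, b, c]
    return H

def rot(axis, deg):
    """Homogeneous rotation by deg degrees around the x, y or z axis."""
    c, s = np.cos(np.radians(deg)), np.sin(np.radians(deg))
    R = {'x': [[1, 0, 0], [0, c, -s], [0, s, c]],
         'y': [[c, 0, s], [0, 1, 0], [-s, 0, c]],
         'z': [[c, -s, 0], [s, c, 0], [0, 0, 1]]}[axis]
    H = np.eye(4)
    H[:3, :3] = R
    return H

# Successive displacements of the reference frame: translation to
# [4, -3, 7], then 90° around the new y axis, then 90° around the
# newest z axis, as described in the text for Fig. 2.9.
H = trans(4, -3, 7) @ rot('y', 90) @ rot('z', 90)
print(np.round(H).astype(int))
```

The resulting matrix has position [4, −3, 7] in its fourth column, with the rotation part expressing the reoriented axes.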
Fig. 2.8 The pose of an arbitrary frame [x′ y′ z′] with respect to the reference frame [x y z]
When defining the homogeneous matrix representing rotation, we learned that the
first three columns describe the rotation of the frame x′, y′, z′ with respect to the
reference frame x, y, z

\begin{bmatrix}
0 & 0 & 1 & 4 \\
1 & 0 & 0 & -3 \\
0 & 1 & 0 & 7 \\
0 & 0 & 0 & 1
\end{bmatrix}, \quad (2.11)

where the columns correspond to the x′, y′, z′ axes and the rows to the x, y, z axes.
The fourth column represents the position of the origin of the frame x′, y′, z′
with respect to the reference frame x, y, z. With this knowledge we can graphically
represent the frame x′, y′, z′ described by the homogeneous transformation matrix
(2.10), relative to the reference frame x, y, z (Fig. 2.8). The x′ axis points in the
direction of the y axis of the reference frame, the y′ axis is in the direction of the z axis,
and the z′ axis is in the x direction.
To convince ourselves of the correctness of the frame drawn in Fig. 2.8, we shall
check the displacements included in Eq. (2.10). The reference frame is first translated
into the point [4, −3, 7]^T. It is then rotated by 90° around the new y axis and finally
it is rotated by 90° around the newest z axis (Fig. 2.9). The three displacements of
the reference frame result in the same final pose as shown in Fig. 2.8.
In the continuation of this chapter we wish to elucidate the second meaning of the
homogeneous transformation matrix, i.e. a displacement of an object or coordinate
frame into a new pose (Fig. 2.10). First, we wish to rotate the coordinate frame x, y,
z by 90° in the counter-clockwise direction around the z axis. This can be achieved
by the following postmultiplication of the matrix H describing the initial pose of the
coordinate frame x, y, z

H_1 = H \cdot \mathrm{Rot}(z, 90^\circ). \quad (2.12)

The displacement results in a new pose of the object and a new frame x′, y′, z′ shown
in Fig. 2.10. We shall displace this new frame by −1 along the x′ axis, 3 units
Fig. 2.9 Displacement of the reference frame into a new pose (from right to left). The origins O1, O2 and O′ are in the same point
(Fig. 2.10: successive displacements of the object frame: Rot(z, 90°), a translation, and Rot(y′′, 90°))
After translation a new pose of the object is obtained together with a new frame x′′,
y′′, z′′. This frame will finally be rotated by 90° around the y′′ axis in the positive
direction

H_3 = H_2 \cdot \mathrm{Rot}(y'', 90^\circ). \quad (2.14)
Equations (2.12), (2.13) and (2.14) can be successively inserted one into another.
In Eq. (2.15) the matrix H represents the initial pose of the frame, H3 is the final
pose, while D represents the displacement
Finally we shall perform the postmultiplication describing the new relative pose of
the object
H_3 = H \cdot D =
\begin{bmatrix}
1 & 0 & 0 & 2 \\
0 & 0 & -1 & -1 \\
0 & 1 & 0 & 2 \\
0 & 0 & 0 & 1
\end{bmatrix}
\begin{bmatrix}
0 & -1 & 0 & -3 \\
0 & 0 & 1 & -1 \\
-1 & 0 & 0 & -3 \\
0 & 0 & 0 & 1
\end{bmatrix}
=
\begin{bmatrix}
0 & -1 & 0 & -1 \\
1 & 0 & 0 & 2 \\
0 & 0 & 1 & 1 \\
0 & 0 & 0 & 1
\end{bmatrix}, \quad (2.17)

where the columns of the result correspond to the x′′′, y′′′, z′′′ axes and the rows to the x0, y0, z0 axes.
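The postmultiplication of the initial pose H by the displacement D can be checked directly; the matrices below are copied from the worked example in the text:

```python
import numpy as np

# Initial pose H of the object frame and the displacement D, which is
# expressed relative to the object's own frame (hence postmultiplication).
H = np.array([[1, 0,  0,  2],
              [0, 0, -1, -1],
              [0, 1,  0,  2],
              [0, 0,  0,  1]])
D = np.array([[ 0, -1, 0, -3],
              [ 0,  0, 1, -1],
              [-1,  0, 0, -3],
              [ 0,  0, 0,  1]])

# New pose of the object after the displacement.
H3 = H @ D
print(H3)
```

Postmultiplying by D applies the displacement in the object's current frame; premultiplying would instead apply it in the reference frame.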
(Fig. 2.11: Euler angles ϕ, ϑ and ψ with the corresponding rotation axes)
Each elementary rotation matrix describes a rotation around a single axis as a
function of one of the three angles. A general rotation around three axes can be
obtained as a combination of three consecutive rotations, where two consecutive
rotations must not be executed around parallel axes. A representation of the
orientation of an object in space can be achieved with 12 different combinations of the
three elementary rotations around coordinate frame axes (for example, the combination
ZYZ indicates first a rotation around the z axis, then a rotation around the y axis of
the already displaced coordinate frame, and finally a rotation around the z axis of a
coordinate frame that was already displaced twice beforehand; relations are shown in Fig. 2.11).
Each such sequence of rotations represents a triad of Euler angles.
Rotation ZYZ is defined as a sequence of the following elementary rotations
(Fig. 2.11):

R(\phi) = \mathrm{Rot}(z,\varphi)\,\mathrm{Rot}(y,\vartheta)\,\mathrm{Rot}(z,\psi),

where φ denotes the triad of angles (ϕ, ϑ, ψ).
If the elements of matrix R(φ) are known, the Euler angles can be computed. Assuming
that at least one of r13 and r23 is nonzero, the angle ϕ can be computed as

\varphi = \arctan\frac{r_{23}}{r_{13}}.

The choice of the positive sign for the term \sqrt{r_{13}^2 + r_{23}^2} constrains the value of angle ϑ to
(0, π)

\vartheta = \arctan\frac{\sqrt{r_{13}^2 + r_{23}^2}}{r_{33}}.

Angle ψ can be computed from the equation

\psi = \arctan\frac{r_{32}}{-r_{31}}.
In poses where the arctangent function does not return a real value, Euler angles
cannot be computed. These are representational singularities that depend on the
selected sequence of elementary Euler rotations.
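The ZYZ decomposition above can be sketched in code; the quadrant-aware `arctan2` function replaces the plain arctangent, and the function names are illustrative:

```python
import numpy as np

def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

def rot_y(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])

def euler_zyz(R):
    """Extract ZYZ Euler angles (phi, theta, psi) from a rotation matrix.

    The positive square root constrains theta to (0, pi); the
    decomposition is singular when r13 = r23 = 0 (theta = 0 or pi).
    """
    phi = np.arctan2(R[1, 2], R[0, 2])
    theta = np.arctan2(np.hypot(R[0, 2], R[1, 2]), R[2, 2])
    psi = np.arctan2(R[2, 1], -R[2, 0])
    return phi, theta, psi

# Round trip: build a rotation from known angles and recover them.
angles = (0.3, 0.7, -0.4)
R = rot_z(angles[0]) @ rot_y(angles[1]) @ rot_z(angles[2])
print(np.round(euler_zyz(R), 6))
```

The round trip recovers the original triad because ϑ = 0.7 lies inside (0, π), away from the singular poses.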
2.4.2 Quaternions
q^{-1} = \frac{q^*}{|q|^2}, \quad (2.24)
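The basic quaternion operations, including the inverse, can be sketched as follows (the scalar-first component ordering is an assumption of this sketch):

```python
import numpy as np

# Quaternions as [w, x, y, z] (scalar-first ordering, an assumption here).

def qmul(p, q):
    """Quaternion product p ⊗ q (Hamilton convention)."""
    pw, px, py, pz = p
    qw, qx, qy, qz = q
    return np.array([
        pw*qw - px*qx - py*qy - pz*qz,
        pw*qx + px*qw + py*qz - pz*qy,
        pw*qy - px*qz + py*qw + pz*qx,
        pw*qz + px*qy - py*qx + pz*qw,
    ])

def qconj(q):
    """Conjugate q*: negate the vector part."""
    return np.array([q[0], -q[1], -q[2], -q[3]])

def qinv(q):
    """Inverse q^-1 = q* / |q|^2; for unit quaternions q^-1 = q*."""
    return qconj(q) / np.dot(q, q)

q = np.array([1.0, 2.0, 3.0, 4.0])
# q ⊗ q^-1 must give the identity quaternion [1, 0, 0, 0].
print(np.round(qmul(q, qinv(q)), 6))
```

For unit quaternions, which are the ones used to represent orientation, the division by |q|² disappears and the inverse reduces to the conjugate.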
The knowledge of describing the pose of an object by the use of homogeneous trans-
formation matrices will first be applied to a mechanical assembly. A mechanical
assembly in a virtual environment can represent a model of a device with mov-
able parts, an avatar or an arbitrary assembly of coupled objects. For this purpose a
mechanical assembly consisting of four blocks, as shown in Fig. 2.12, will be consid-
ered. A plate with dimensions (5×15×1) is placed over a block (5×4×10). Another
plate (8 × 4 × 1) is positioned perpendicularly to the first one, holding another small
block (1 × 1 × 5). Elements of the assembly are connected in series. This means
that a displacement of one element will result in displacement of all elements that
are above the displaced element in the chain and are directly or indirectly attached
to the displaced element.
A frame is attached to each of the four blocks as shown in Fig. 2.12. Our task is
to calculate the pose of the O3 frame with respect to the reference frame O0. In the
previous sections we learned that the pose of a displaced frame can be expressed with
respect to the reference frame by the use of the homogeneous transformation matrix H.
The pose of the frame O1 with respect to the frame O0 is denoted as ⁰H₁. In the same
way ¹H₂ represents the pose of the O2 frame with respect to O1 and ²H₃ represents the
pose of O3 with regard to the O2 frame. We also learned that successive displacements
are expressed by postmultiplications (successive multiplications from left to right) of
homogeneous transformation matrices. The assembly process can likewise be described
by postmultiplication of the corresponding matrices. The pose of the fourth block
can be written with respect to the first one by the following matrix
{}^0H_3 = {}^0H_1\, {}^1H_2\, {}^2H_3. \quad (2.26)
(Fig. 2.12: mechanical assembly of four blocks with attached coordinate frames O0, O1, O2 and O3 and the block dimensions)
The blocks are positioned perpendicularly one to another. In this way it is not nec-
essary to calculate the sines and cosines of the rotation angles. The matrices can be
determined directly from Fig. 2.12. The x axis of frame O1 points in negative direc-
tion of the y axis in the frame O0 . The y axis of frame O1 points in negative direction
of the z axis in the frame O0 . The z axis of the frame O1 has the same direction as x
axis of the frame O0 . The described geometrical properties of the assembly structure
are written into the first three columns of the homogenous matrix. The position of
the origin of the frame O1 with respect to the frame O0 is written into the fourth
column
{}^0H_1 =
\begin{bmatrix}
0 & 0 & 1 & 0 \\
-1 & 0 & 0 & 6 \\
0 & -1 & 0 & 11 \\
0 & 0 & 0 & 1
\end{bmatrix}, \quad (2.27)

where the columns correspond to the x, y, z axes of frame O1 and the rows to the x, y, z axes of frame O0.
The position and orientation of the fourth block with respect to the first one are given
by the ⁰H₃ matrix, which is obtained by successive multiplication of the matrices
(2.27), (2.28) and (2.29)

{}^0H_3 =
\begin{bmatrix}
0 & 1 & 0 & 7 \\
-1 & 0 & 0 & -8 \\
0 & 0 & 1 & 6 \\
0 & 0 & 0 & 1
\end{bmatrix}. \quad (2.30)
The fourth column of the matrix ⁰H₃, [7, −8, 6, 1]^T, represents the position of the
origin of the frame O3 with respect to the reference frame O0. The correctness of the
fourth column can be checked from Fig. 2.12. The rotational part of the matrix ⁰H₃
represents the orientation of the frame O3 with respect to the reference frame O0.
Now let us imagine that the first horizontal plate rotates with respect to the first
vertical block around axis 1 (the z axis of coordinate frame O0) by angle ϑ1. The
second plate also rotates around the vertical axis 2 (the y axis of coordinate frame O1)
by angle ϑ2. The last block is elongated by distance d3 along the third axis (the z axis
of coordinate frame O2). The new pose of the mechanism is shown in Fig. 2.13.
Since we introduced motion between the elements of the mechanism, the trans-
formation between two consecutive blocks now consists of the matrix Di that defines
translational or rotational movement and matrix i−1 Hi that defines pose of a block.
(Fig. 2.13: the displaced mechanism: rotation ϑ1 around axis 1, rotation ϑ2 around axis 2 and translation d3 along axis 3)
The second rotation is accomplished around the y1 axis. The following matrix product
defines the pose of coordinate frame O2 relative to frame O1 (pose of the third block
relative to the second block)
{}^1H_2 = D_2\, {}^1H_2 =
\begin{bmatrix}
\cos\vartheta_2 & 0 & \sin\vartheta_2 & 0 \\
0 & 1 & 0 & 0 \\
-\sin\vartheta_2 & 0 & \cos\vartheta_2 & 0 \\
0 & 0 & 0 & 1
\end{bmatrix}
\begin{bmatrix}
1 & 0 & 0 & 2 \\
0 & 0 & 1 & -1 \\
0 & -1 & 0 & 5 \\
0 & 0 & 0 & 1
\end{bmatrix}.
In the last joint, we are dealing with translation along the z2 axis (pose of the fourth
block relative to the third block)
{}^2H_3 = D_3\, {}^2H_3 =
\begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & d_3 \\
0 & 0 & 0 & 1
\end{bmatrix}
\begin{bmatrix}
1 & 0 & 0 & -1 \\
0 & -1 & 0 & 1 \\
0 & 0 & -1 & 6 \\
0 & 0 & 0 & 1
\end{bmatrix}.
Matrices ⁰H₁, ¹H₂ and ²H₃ determine the relative poses of the elements of the mechanical
assembly after the completed displacements. The pose of the last block (frame O3)
relative to the first block (frame O0) can be computed as a postmultiplication of the
matrices ⁱ⁻¹Hᵢ

{}^0H_3 = {}^0H_1\, {}^1H_2\, {}^2H_3. \quad (2.31)
(Fig. 2.14: imaging through a thin lens with focal length f; a point [x y z] at object distance a is imaged into the point [x′ y′ z′] at image distance b)

\frac{1}{a} + \frac{1}{b} = \frac{1}{f}. \quad (2.32)
Let us place the lens into the x, z plane of the Cartesian coordinate frame
(Fig. 2.15). The point with coordinates [x, y, z]^T is imaged into the point [x′, y′, z′]^T.
The lens equation in this particular situation is as follows
\frac{1}{y} - \frac{1}{y'} = \frac{1}{f}. \quad (2.33)
The rays passing through the center of the lens remain undeviated
\frac{z'}{y'} = \frac{z}{y}. \quad (2.34)
Another equation for undeviated rays is obtained by exchanging z and z′ with x and
x′ in Eq. (2.34). When rearranging the equations for deviated and undeviated rays,
we can obtain the relations between the coordinates of the original point x, y and z
and its image x′, y′, z′
x' = \frac{x}{1 - \frac{y}{f}}, \quad (2.35)

y' = \frac{y}{1 - \frac{y}{f}}, \quad (2.36)

z' = \frac{z}{1 - \frac{y}{f}}. \quad (2.37)
The same result is obtained by the use of the homogeneous matrix P, which
describes the perspective transformation

P = \begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & -\frac{1}{f} & 0 & 1
\end{bmatrix}. \quad (2.38)
The coordinates of the imaged point x′, y′, z′ are obtained by multiplying the coor-
dinates of the original point x, y, z by the matrix P

w \begin{bmatrix} x' \\ y' \\ z' \\ 1 \end{bmatrix} =
\begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & -\frac{1}{f} & 0 & 1
\end{bmatrix}
\begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix} =
\begin{bmatrix} x \\ y \\ z \\ 1 - \frac{y}{f} \end{bmatrix}, \quad (2.39)
where w is a scaling factor. The same relation between the imaged and original
coordinates is obtained as in Eqs. (2.35)-(2.37). When the element −1/f is at the
bottom of the first column, we are dealing with a perspective transformation along
the x axis. When it is at the bottom of the third column, we have projection along
the z axis.
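The equivalence between the matrix form (2.38)-(2.39) and the closed-form relations (2.35)-(2.37) can be checked numerically; the focal length and test point below are illustrative:

```python
import numpy as np

def perspective_matrix(f):
    """Homogeneous perspective matrix P, Eq. (2.38): the element
    -1/f at the bottom of the second column projects along y."""
    P = np.eye(4)
    P[3, 1] = -1.0 / f
    return P

def project(point, f):
    """Apply P and divide by the scaling factor w, Eq. (2.39)."""
    q = perspective_matrix(f) @ np.append(point, 1.0)
    return q[:3] / q[3]          # w = 1 - y/f

f = 2.0
p = np.array([1.0, 1.0, 1.0])
image = project(p, f)

# Closed-form relations (2.35)-(2.37): divide every coordinate by 1 - y/f.
w = 1.0 - p[1] / f
print(image, p / w)
```

Both routes give the same image coordinates, which is exactly what the division by the homogeneous scaling factor w is designed to achieve.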
Newtonian physics can be upgraded with physical laws that describe events in
an environment that is beyond our perceptions. These laws apply either to micro
(molecules, atoms) or macro environments (galaxies, universe) and are defined by
quantum and relativistic physics.
The basic concepts applicable to Newtonian physics are summarized below.
A static world and simplified physics are simplifications of Newtonian dynamics, while
the concepts of non-classical physics are beyond the scope of this chapter and will
not be discussed here.
Figure 3.1 shows a mass particle, which is considered a dimensionless object, and a
body constructed from at least three interconnected mass particles. For the purpose of
further consideration we will assume that the body is composed of exactly three mass
particles. The concept can be expanded for more complex bodies. The coordinate
system of the body is located in the body’s center of mass. Vectors ri determine the
position of each mass particle relative to the body’s coordinate frame.
Although both a mass particle as well as a body generally move in a three-
dimensional space, we will assume movement constrained to a plane for purposes
of explanation.
First we will consider the motion of a mass particle. Since we are not interested
in particle orientation, the particle motion can be described using the position vector
p(t) and its time derivatives. Velocity of a mass particle is defined as a time derivative
of the position vector p(t) as
v(t) = \dot{p}(t) = \frac{dp(t)}{dt}. \quad (3.1)
Knowing the motion velocity of a mass particle, its position can be computed as a
time integral

p(t) = p_0 + \int_0^t v(\xi)\,d\xi, \quad (3.2)
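In discrete form, Eq. (3.2) amounts to accumulating velocity samples over time; a minimal Euler-integration sketch (the velocity profile and time step are illustrative assumptions):

```python
# Discrete form of Eq. (3.2): the particle position is the time integral
# of its velocity, approximated here with Euler integration.

def integrate_position(p0, velocity, dt, steps):
    """Accumulate p(t) = p0 + sum(v * dt) over the given number of steps."""
    p = list(p0)
    for i in range(steps):
        v = velocity(i * dt)
        p = [pi + vi * dt for pi, vi in zip(p, v)]
    return p

# Constant velocity of 1 m/s along x: after 1 s the particle has moved
# (up to floating-point error) 1 m.
p = integrate_position([0.0, 0.0, 0.0], lambda t: (1.0, 0.0, 0.0),
                       dt=0.001, steps=1000)
print([round(c, 6) for c in p])
```

Physics engines for virtual environments perform exactly this kind of stepwise integration, usually with more accurate schemes than plain Euler.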
(Fig. 3.1: a mass particle and a rigid body composed of three mass particles; the vectors r_i determine the particle positions relative to the body's coordinate frame)
3.1 Equations of Motion 37
Fig. 3.2 Displacement of a rigid body from an initial pose (left) to a final pose (right)
The position of particle r1, expressed in the global coordinate frame after the displacement of the body, is

p_1(t) = p(t) + R(t)\,r_1, \quad (3.3)

or

\begin{bmatrix} p_1(t) \\ 1 \end{bmatrix} =
\begin{bmatrix} R(t) & p(t) \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} r_1 \\ 1 \end{bmatrix} = T(t) \begin{bmatrix} r_1 \\ 1 \end{bmatrix}, \quad (3.4)
The translational velocity is determined as a time derivative defined in Eq. (3.1) of the
position vector that determines the body’s center of mass. The change of orientation
leads to
Ṙ(t) = ω∗ (t)R(t), (3.5)
or, using quaternions,

\dot{q}(t) = \tfrac{1}{2}\,\tilde{\omega}(t) \otimes q(t). \quad (3.6)

Quaternion ω̃(t) is the augmented angular velocity vector ω(t) = [ω_x(t) ω_y(t) ω_z(t)]^T
(a quaternion with zero scalar part) and the operator ⊗ denotes quaternion multiplication.
A body's orientation in space can be computed with time integration of Eqs. (3.5)
or (3.6).
Body dynamic properties depend on its mass and inertia. Body mass is defined as the
sum of the masses of the individual particles constituting the body (Fig. 3.4)

M = \sum_{i=1}^{N} m_i, \quad (3.8)

where N is the number of mass particles (in our case N = 3). Since real bodies are
usually composed of homogeneously distributed matter rather than discrete particles,
the sum in the above equation should then be replaced with an integral across the body
volume.
The definition of a body’s center of mass enables us to separate translational and
rotational dynamics. The body center of mass in the local coordinate frame can be
computed as
N
m i ri
rc = i=1 . (3.9)
M
Since the local coordinate frame is positioned in the body’s center of mass, the
coordinates of the body center of mass expressed in the local coordinate frame equal
3.2 Mass, Center of Mass and Moment of Inertia 39
(Fig. 3.4: a rigid body composed of mass particles m_i with position vectors r_i)
zero. The body center of mass expressed in the global coordinate frame can be
computed based on Eq. (3.3).
Finally we define the body inertia tensor I0 with respect to the local coordinate
frame. A body inertia tensor provides information about the distribution of body
mass relative to the body center of mass
I_0 = \sum_{i=1}^{N}
\begin{bmatrix}
m_i (r_{iy}^2 + r_{iz}^2) & -m_i r_{ix} r_{iy} & -m_i r_{ix} r_{iz} \\
-m_i r_{ix} r_{iy} & m_i (r_{ix}^2 + r_{iz}^2) & -m_i r_{iy} r_{iz} \\
-m_i r_{ix} r_{iz} & -m_i r_{iy} r_{iz} & m_i (r_{ix}^2 + r_{iy}^2)
\end{bmatrix}, \quad (3.10)

where r_i = [r_{ix} \; r_{iy} \; r_{iz}]^T. If the body's shape does not change, the inertia tensor
I0 is constant. The inertia tensor with respect to the global coordinate frame can be
computed as
I(t) = R(t)\, I_0\, R^T(t). \quad (3.11)
In general, matrix I(t) is time-dependent since body orientation relative to the global
coordinate frame changes during the body’s motion.
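Equations (3.8)-(3.11) can be evaluated for a small set of particles; the masses and positions below are illustrative (note that, unlike in the text, these example particles are not centered on their center of mass, so r_c is generally nonzero here):

```python
import numpy as np

# Three mass particles (masses and local positions are illustrative).
m = np.array([1.0, 2.0, 3.0])
r = np.array([[ 1.0,  0.0, 0.0],
              [-0.5,  0.5, 0.0],
              [ 0.0, -0.3, 0.4]])

M = m.sum()                              # total mass, Eq. (3.8)
rc = (m[:, None] * r).sum(axis=0) / M    # center of mass, Eq. (3.9)

def inertia_tensor(m, r):
    """Body inertia tensor I0 in the local frame, Eq. (3.10)."""
    I = np.zeros((3, 3))
    for mi, (x, y, z) in zip(m, r):
        I += mi * np.array([[y*y + z*z, -x*y,      -x*z],
                            [-x*y,      x*x + z*z, -y*z],
                            [-x*z,      -y*z,      x*x + y*y]])
    return I

I0 = inertia_tensor(m, r)

# Eq. (3.11): expressing the tensor in the global frame is a similarity
# transform, so the result stays symmetric and keeps the same trace.
a = np.pi / 3
R = np.array([[np.cos(a), -np.sin(a), 0],
              [np.sin(a),  np.cos(a), 0],
              [0,          0,         1]])
I = R @ I0 @ R.T
print(np.allclose(I, I.T), np.isclose(np.trace(I), np.trace(I0)))
```

The invariance of symmetry and trace under rotation is a quick sanity check that an implementation of Eq. (3.11) is correct.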
If v(t) specifies the velocity of a rigid body's center of mass, a similar equation can
also be written for a rigid body

G(t) = M v(t). \quad (3.13)

In this regard, a rigid body behaves similarly to a mass particle with mass M.
40 3 Dynamic Model of a Virtual Environment
If there is no external force acting on the body, the linear momentum is conserved.
It is evident from Eq. (3.13) that translational velocity of the body’s center of mass
is also constant.
A somewhat less intuitive concept than linear momentum is the body’s angu-
lar momentum (angular momentum of a dimensionless body equals zero, since its
moment of inertia equals zero). Angular momentum is defined by the product

\Gamma(t) = I(t)\,\omega(t). \quad (3.14)

The change in linear momentum equals the impulse of the sum of all forces F acting on the body, dG(t) = F(t)\,dt, so

\frac{dG(t)}{dt} = \dot{G}(t) = M\dot{v}(t) = F(t), \quad (3.16)
meaning that the time derivative of linear momentum equals the product of body
mass and acceleration or the sum of all forces acting on the body.
The change in angular momentum equals the impulse of the sum of all torques τ
acting on the body, thus
dΓ (t) = τ (t)dt. (3.17)
The time derivative of angular momentum thus equals the sum of all torques acting
on the body
dΓ (t)
= Γ̇ (t) = τ (t). (3.18)
dt
If forces and torques acting on the body are known, time derivatives of linear
and angular momenta are also defined. The body’s linear momentum can thus be
computed as a time integral of all forces acting on the body
G(t) = G_0 + \int_0^t F(\xi)\,d\xi, \quad (3.19)
torques produced by the medium through which the body moves (viscous damping),
(4) forces resulting from interactions of a body with other bodies in a virtual envi-
ronment (collisions) and (5) virtual actuators (sources of forces and torques).
Interaction between a user and a virtual environment is often done through virtual
tools that the user manipulates through a haptic interface. Virtual actuators are sources
of constant or variable forces and torques acting on a body. Virtual actuators can be
models of electrical, hydraulic, pneumatic actuator systems or, for example, engines
with internal combustion. This group can also include biological actuators—muscles.
The magnitude of forces and torques of virtual actuators can change automatically
based on events within a virtual environment or through interactions with the user.
The force field within which the bodies move can be homogeneous (local gravity
field) or nonhomogeneous (magnetic dipole field). The force acting on the body
depends on the field parameters and body properties. For example, a gravity force
can be computed as (Fig. 3.7)
Fg = Mg, (3.21)
where g is the gravity acceleration vector. In this case, the force is independent of
the body motion. The homogeneous force field also does not cause any torque that
would result in a change of body angular momentum.
Analysis of interaction forces between the body and the medium through which
the body moves (Fig. 3.8) is relatively straightforward. In this case, friction forces
are of primary interest. In the case of a simple model based on viscous damping (a
body floating in a viscous fluid), the interaction force can be computed as
F B = −Bv, (3.22)
where B is the coefficient of viscous damping and v indicates body velocity. Since
friction forces oppose object motion, a negative sign is introduced.
3.4 Forces and Torques Acting on a Rigid Body 43
Fig. 3.9 Collision of two bodies and a collision between a body and a grounded wall; p—contact point, d—body deformation (penetration) and Fc—computed reaction force
The most complex analysis is that of forces and torques resulting from interactions
between bodies. A prerequisite is the implementation of an algorithm for collision
detection. It is then possible to compute reaction forces based on dynamic properties
(plasticity, elasticity).
Relations during a collision are shown in Fig. 3.9. As a result of a collision between
two bodies, a plastic or elastic deformation occurs (the maximal deformation is
indicated with d in the figure). Deformations depend on body stiffness properties.
Body collisions and deformations need to be computed in real time. The computed
deformations can be used to model interaction forces or for visualization purposes.
For the simplest case where a body is modeled only as a spring with stiffness
value k, the collision reaction force Fc can be computed as
Fc = kdn, (3.23)
where d determines the body deformation and vector n determines the reaction force
direction. For a simple case of a frictionless contact, the vector n can be determined as
a normal vector to the surface of the body at a point of contact. A more detailed model
for computation of reaction forces occurring during collisions will be presented in
Chap. 7 dealing with haptic interfaces.
Fig. 3.10 Collision between a sphere and a dimensionless particle (simplified view with a collision between a circle and a particle); the left image shows relations before the collision while the right image shows relations after the collision. Thick straight arrows indicate force directions
In the case of a frictionless collision, the reaction force direction is determined along
vector p12 , which is normal to the sphere surface.
Figure 3.11 shows a collision between a block and a dimensionless particle. As in
the case of collision with a sphere, the vector p12 = p2 −p1 should first be computed.
However, this is not sufficient since it is necessary to determine the particle position
relative to individual block faces (sides of the rectangle in Fig. 3.11). Namely, vector
p12 is computed relative to the global coordinate frame O0, while block faces are
generally not aligned with the axes of frame O0. Collision detection can be simplified
by transforming vector p12 into the local coordinate frame O1, resulting in p^1_{12}. Vector
p^1_{12} can be computed as

p^1_{12} = R_1^T\, p_{12} \quad\text{or}\quad
\begin{bmatrix} p^1_{12} \\ 1 \end{bmatrix} =
\begin{bmatrix} R_1 & p_1 \\ 0 & 1 \end{bmatrix}^{-1}
\begin{bmatrix} p_2 \\ 1 \end{bmatrix} = T^{-1} \begin{bmatrix} p_2 \\ 1 \end{bmatrix}, \quad (3.25)
where R1 is the rotation matrix that defines the orientation of the frame O1 relative
to the frame O0. The axes of coordinate frame O1 are aligned with the block's principal axes.
Therefore, it becomes straightforward to verify whether a particle lies within or
outside of the body's boundaries. The individual components of vector p^1_{12} have to be
compared against the block dimensions a, b and c. For the relations in Fig. 3.11 it is clear
that the particle lies within the rectangle's boundaries (we are considering only plane
relations here) if the following condition is satisfied
3.5 Collision Detection 45
Fig. 3.11 Collision between a block and a dimensionless particle (simplified view with a collision between a rectangle and a particle); left image shows relations before the collision while the right image shows relations after the collision. Thick straight arrows indicate force directions while thick circular arrow indicates torque acting on the block
$$\left| p_{12,x}^{1} \right| < \frac{a}{2} \;\wedge\; \left| p_{12,y}^{1} \right| < \frac{b}{2}, \tag{3.26}$$
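As a minimal sketch of Eqs. (3.25) and (3.26), extended to all three dimensions, the particle position can be transformed into the block's local frame and each component compared against the block's half-dimensions. The function name `particle_in_block` is ours, not from the text:

```python
import numpy as np

def particle_in_block(p1, R1, p2, dims):
    """Check whether a particle at p2 lies inside a block centered at p1.

    p1, p2 : block center and particle position in the global frame
    R1     : rotation matrix of the block frame relative to the global frame
    dims   : (a, b, c) block edge lengths along the local axes
    """
    # Eq. (3.25): express the relative vector in the block's local frame
    p12_local = R1.T @ (np.asarray(p2, float) - np.asarray(p1, float))
    # Eq. (3.26): compare each component against the half-dimensions
    return bool(np.all(np.abs(p12_local) < np.asarray(dims, float) / 2.0))

# block rotated 45 degrees about z, edge lengths 2 x 2 x 2
c, s = np.cos(np.pi / 4), np.sin(np.pi / 4)
R1 = np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])
print(particle_in_block([0, 0, 0], R1, [0.5, 0.5, 0], (2, 2, 2)))  # True
print(particle_in_block([0, 0, 0], R1, [1.2, 1.2, 0], (2, 2, 2)))  # False
```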
Fig. 3.12 Collision between two spheres (simplified view with a collision between two circles);
the left image shows relations before the collision while the right image shows relations after the
collision. Thick straight arrows indicate force directions
Fig. 3.13 Collision between two blocks (simplified view with a collision between two rectangles);
the left image shows relations before the collision while the right image shows relations after
the collision. Thick straight arrows indicate force directions, while thick circular arrows indicate
torques acting on the blocks
the two spheres collided and the total deformation of both spheres equals
$$d = \begin{cases} 0 & \text{for } \|\mathbf{p}_{12}\| > r_1 + r_2 \\ r_1 + r_2 - \|\mathbf{p}_{12}\| & \text{for } \|\mathbf{p}_{12}\| < r_1 + r_2. \end{cases} \tag{3.29}$$
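Eq. (3.29) translates directly into code; the sketch below (with a helper name of our choosing) returns the total deformation d of the two spheres:

```python
def sphere_penetration(p1, p2, r1, r2):
    """Total deformation d of two spheres, Eq. (3.29): zero when the
    spheres are separated, otherwise the overlap r1 + r2 - ||p12||."""
    dist = sum((b - a) ** 2 for a, b in zip(p1, p2)) ** 0.5  # ||p12||
    return max(0.0, r1 + r2 - dist)

print(sphere_penetration([0, 0, 0], [3.0, 0, 0], 1.0, 1.0))  # 0.0 (no contact)
print(sphere_penetration([0, 0, 0], [1.5, 0, 0], 1.0, 1.0))  # 0.5 (overlap)
```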
Fig. 3.14 Collision between two blocks (simplified view with a collision between two rectangles)—
separating axis and separating plane are indicated; the left image shows relations before the collision
while the right image shows relations after the collision
Fig. 3.15 Collision between a block and a sphere (simplified view with a collision between a
rectangle and a circle); left image shows relations before the collision while the right image shows
relations after the collision. Thick straight arrows indicate force directions while thick circular
arrow indicates torque acting on the block
Fig. 3.16 Collision between a block and a sphere (simplified view with a collision between a rec-
tangle and a circle)—separating axis and separating plane are indicated; left image shows relations
before the collision while the right image shows relations after the collision
Finally, we also have to consider the problem of collisions between more complex
bodies. In such cases, collision detection becomes computationally more demanding.
However, it can be simplified by the use of bounding volumes as shown in Fig. 3.17.
The method requires that the body involved in collision detection be embedded
into the smallest possible bounding volume. The bounding volume can take the shape
of a sphere, a block or a more complex geometry such as a capsule. If a sphere is used,
the bounding volume is called a BS—bounding sphere. The sphere is the simplest
geometry of a bounding volume and enables the easiest collision detection that does
not take into account the body orientation.
The use of a bounding box enables two different approaches. The AABB method
(axis-aligned bounding box) assumes that box axes are always aligned with the axes
of the global coordinate frame, regardless of the actual body orientation. Therefore,
it becomes necessary to adjust the size of the bounding box during rotation of the
body in space (middle view in Fig. 3.17). The OBB method (oriented bounding box)
Fig. 3.17 Simplification of collision detection between complex bodies with the use of bounding
volumes
assumes that the body is embedded into the smallest possible bounding box that
rotates together with the rotation of the body. In this case, the bounding volume
does not need to be adjusted during the rotation of the body. At the same time,
OBB usually provides the tightest representation of a body with a simplified
bounding volume.
During the computation of collisions between bodies, the original (complex)
geometry is replaced by a simplified geometry defined by a bounding volume. Colli-
sion detection between complex shapes can thus be translated into one of the methods
addressed previously in this chapter.
The use of bounding volumes for collision detection allows only an approximation
of true collisions between bodies. If a simple bounding volume does not give
satisfactory results, the body can be split into smaller parts and each of these parts can
be embedded into its own bounding volume. The use of multiple oriented bounding
boxes for a representation of a virtual object is shown in Fig. 3.18. Such representation
enables a more detailed approximation of the underlying object geometry.
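As an illustration of the AABB idea (our own sketch, not code from the text): two axis-aligned boxes overlap exactly when their extents overlap on every coordinate axis, which makes the test a few comparisons per axis.

```python
def aabb_overlap(min1, max1, min2, max2):
    """Axis-aligned bounding-box test: two boxes overlap exactly when
    their intervals overlap on every coordinate axis."""
    return all(lo1 <= hi2 and lo2 <= hi1
               for lo1, hi1, lo2, hi2 in zip(min1, max1, min2, max2))

print(aabb_overlap((0, 0, 0), (1, 1, 1), (0.5, 0.5, 0.5), (2, 2, 2)))  # True
print(aabb_overlap((0, 0, 0), (1, 1, 1), (1.5, 0, 0), (2, 1, 1)))      # False
```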
50 3 Dynamic Model of a Virtual Environment
$$\begin{aligned} \mathbf{G}_{k+1} &= \mathbf{G}_k + \mathbf{F}_k \Delta t \\ \boldsymbol{\Gamma}_{k+1} &= \boldsymbol{\Gamma}_k + \boldsymbol{\tau}_k \Delta t, \end{aligned} \tag{3.30}$$
Fig. 3.19 Algorithm for computation of body motion as a result of interactions with other bodies,
with the user and with the medium. The force represents a generalized quantity that also
includes torques
3.6 Computation of Body Motion 51
where Gk and Γ k represent body linear and angular momenta at discrete time interval
k, respectively. Initial linear and angular momenta are G0 and Γ 0 . The result of
integration is marked with label 2 in Fig. 3.19.
From the linear and angular momenta computed from (3.13) and (3.14) and by
taking into account body inertial properties (mass and moment of inertia), body
translational velocity vk+1 and rotational velocity ωk+1 can be computed for time
instant k + 1 (label 3)
$$\mathbf{v}_{k+1} = \frac{1}{M}\,\mathbf{G}_{k+1}, \qquad \boldsymbol{\omega}_{k+1} = \mathbf{I}_k^{-1}\,\boldsymbol{\Gamma}_{k+1}, \tag{3.31}$$
where Ik represents the body inertia in relation to the global coordinate frame at time
instant k.
The new object pose can be computed based on Eqs. (3.2) and (3.6). These equa-
tions are numerically integrated
$$\begin{aligned} \mathbf{p}_{k+1} &= \mathbf{p}_k + \mathbf{v}_{k+1} \Delta t \\ \mathbf{q}_{k+1} &= \mathbf{q}_k + \tfrac{1}{2} \Delta t\, \tilde{\boldsymbol{\omega}}_{k+1} \otimes \mathbf{q}_k, \end{aligned} \tag{3.32}$$
where pk and qk are the body position and orientation at time interval k, while initial
position and orientation are determined with p0 and q0 . The new body pose is now
computed (label 4) as a consequence of interactions with other bodies, with the user,
with the medium and due to the effects of virtual actuators. The loop for computation
of body pose continues at point 1 after all interaction forces are computed. The loop
presented in Fig. 3.19 needs to be implemented for all dynamic bodies in the virtual
environment.
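The integration loop of Eqs. (3.30)-(3.32) can be sketched as follows. This is a minimal Python version; the function names and the explicit quaternion renormalization (a standard guard against numerical drift that the text does not mention) are our additions:

```python
import numpy as np

def quat_mul(q, r):
    """Hamilton product of quaternions stored as (w, x, y, z)."""
    w1, x1, y1, z1 = q
    w2, x2, y2, z2 = r
    return np.array([w1*w2 - x1*x2 - y1*y2 - z1*z2,
                     w1*x2 + x1*w2 + y1*z2 - z1*y2,
                     w1*y2 - x1*z2 + y1*w2 + z1*x2,
                     w1*z2 + x1*y2 - y1*x2 + z1*w2])

def step(p, q, G, Gam, F, tau, M, I_inv, dt):
    """One pass of Eqs. (3.30)-(3.32): integrate momenta, derive
    velocities, then update position and orientation quaternion."""
    G = G + F * dt                            # Eq. (3.30), linear momentum
    Gam = Gam + tau * dt                      # Eq. (3.30), angular momentum
    v = G / M                                 # Eq. (3.31)
    w = I_inv @ Gam                           # Eq. (3.31)
    p = p + v * dt                            # Eq. (3.32)
    w_tilde = np.array([0.0, *w])             # pure quaternion (0, omega)
    q = q + 0.5 * dt * quat_mul(w_tilde, q)   # Eq. (3.32)
    q = q / np.linalg.norm(q)                 # renormalization (our addition)
    return p, q, G, Gam

# unit mass falling freely for 1 s at a 1 kHz update rate
p, q = np.zeros(3), np.array([1.0, 0.0, 0.0, 0.0])
G, Gam = np.zeros(3), np.zeros(3)
F, tau, dt = np.array([0.0, 0.0, -9.81]), np.zeros(3), 0.001
for _ in range(1000):
    p, q, G, Gam = step(p, q, G, Gam, F, tau, 1.0, np.eye(3), dt)
print(p[2])  # close to -4.91 m, as expected for 1 s of free fall
```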
Chapter 4
Tracking the User and Environment
Virtual reality allows different methods of communication between the user and
virtual world. If we want to create a feeling of presence in the synthetic environment,
we need equipment that can track the user’s position and actions [1]. Information
about the user’s actions allows the system to show the virtual environment from the
user’s perspective, a basic requirement for the induction of physical virtual presence.
At the same time, inputs provided by the user allow interaction with the virtual world.
The user’s interaction with the virtual world via a virtual reality system allows two-
way exchange of information via input/output devices.
The user’s movement and actions can be tracked using either active methods (trig-
gered by the user), where the user transmits information to the virtual reality system,
or using passive methods (triggered by the system without the user’s cooperation),
which sense the user’s movement and inform the computer about the user’s position
and point of gaze. Active methods include spoken instructions as well as controllers
such as joysticks, keyboards, wheels or gamepads. Passive tracking methods are
summarized in Fig. 4.1 [2].
In addition to tracking the user, it is also sometimes necessary to track the envi-
ronment so that information from the real world can be combined with the virtual
world. The real world is usually observed with sensors that are not directly connected
to the user. Inputs from the real world are frequently used to create parts of the virtual
world in real time.
(e) Fast: Should allow sampling frequencies on the order of 1 kHz, regardless of
the number of included devices.
(f) Insensitive to occlusion: The device should not need a direct view of the sensor.
(g) Robust: Should be robust with regard to external influences (temperature, mois-
ture, magnetic field, radiofrequency noise).
(h) Unlimited working area: Should allow a target to be tracked regardless of its
speed and distance.
(i) Wireless: Should work without any wires, only with battery power and wireless
connection to a computer.
(j) Cheap.
A pose sensor is a device that allows both the position and orientation to be tracked.
It is perhaps the most important measurement device in a virtual reality system, as
it allows the system to detect the position and orientation of the user in the virtual
world. The device also sets the limitations of virtual reality. Pose tracking methods
are usually based on electromagnetic, mechanical, optical, videometric, ultrasonic or
inertial principles. Each of these has its own advantages and disadvantages. Disad-
vantages are mainly due to limitations of the specific physical medium and limitations
of specific devices or signal processing techniques.
Conceptually the simplest principle of user motion tracking, the mechanical principle
assumes a direct physical connection between the user and the measurement device. A
typical approach uses an articulated mechanism with two or more segments connected
by joints whose angles can be measured. The mechanism thus has multiple degrees
of freedom and is connected to the user. The device follows the user’s motions. The
mechanism can also be equipped with weight compensation for higher effectiveness.
Let’s illustrate the mechanical measurement principle with a simple example
shown in Fig. 4.2. Let’s assume that the user is touching the end of the mechanism
shown in the figure. The user’s pose is thus defined through the pose of coordinate
system [x3 , y3 , z 3 ]. Our goal is to derive the geometric model of the mechanical device. This
model describes the pose of the coordinate system at the end of the device with regard
to the base coordinate system and can be obtained by consecutively multiplying
(postmultiplication) homogeneous transformation matrices. However, in this case
the model is relatively simple and can be calculated directly from the relations in
Fig. 4.2b, which gives a bird’s eye view of the mechanism.
Since the mechanism has only two degrees of freedom, the motion of the endpoint
is limited to the x y plane. The height is constant and determined by the length of
Fig. 4.2 Example of a mechanism with two degrees of freedom (a). Kinematic model of the
mechanism (b)
the first segment l1 . Along this segment, we define the vector p1 = [0 0 l1]^T . The
rotation axis for the first joint is the vertical axis z 0 , which points out of the page in
Fig. 4.2b. We define vector p2 in the direction of the second segment. This gives
$$\mathbf{p}_2 = l_2 \begin{bmatrix} \cos\vartheta_1 \\ \sin\vartheta_1 \\ 0 \end{bmatrix}. \tag{4.1}$$
Vector p3 goes along the third segment. Its components can be determined from
Fig. 4.2b
$$\mathbf{p}_3 = l_3 \begin{bmatrix} \cos(\vartheta_1 + \vartheta_2) \\ \sin(\vartheta_1 + \vartheta_2) \\ 0 \end{bmatrix}. \tag{4.2}$$
We also define vector p, which goes from the origin of the coordinate system
(x0 , y0 , z 0 ) to the end of the robot
p = p1 + p2 + p3 . (4.3)
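A direct transcription of Eqs. (4.1)-(4.3) gives the endpoint position from the two measured joint angles; the helper name is ours:

```python
import math

def endpoint(l1, l2, l3, th1, th2):
    """Endpoint position p = p1 + p2 + p3 of the two-DOF mechanism,
    assembled from Eqs. (4.1)-(4.3)."""
    x = l2 * math.cos(th1) + l3 * math.cos(th1 + th2)
    y = l2 * math.sin(th1) + l3 * math.sin(th1 + th2)
    z = l1  # height fixed by the length of the first segment
    return x, y, z

print(endpoint(1.0, 0.5, 0.5, 0.0, math.pi / 2))  # approximately (0.5, 0.5, 1.0)
```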
Fig. 4.3 Examples of two mechanical devices. a The robot in the figure is in contact with the user
at only one point at the end of the device (arrow). b Shows an exoskeletal robot where interaction
between the human and device occurs at many points (arrows)
Figure 4.3 shows examples of two more mechanical devices. The robot in Fig. 4.3a
is in contact with the user at one point at the end of the device (arrow). In this case, we
only need to track the endpoint of the user’s limb. Figure 4.3b shows an exoskeletal
robot where interaction between the robot and user occurs at many points (arrows).
The exoskeletal mechanism allows all segments of the limb to be tracked, not only
its endpoint.
Finally, a few strengths and weaknesses of the mechanical tracking principle
should be mentioned. It allows position and orientation to be tracked with a high
accuracy that mostly depends on the specifics of the optical encoders used to measure
joint angles. Sampling frequencies over 1 kHz are attainable, and delays are small.
Weaknesses include high complexity, high price and motion constraints introduced
by the measurement mechanism.
l = ctus , (4.5)
where tus is the travel time of the ultrasonic pulse from the emitter to the receiver.
Three noncollinear receivers are required to determine the position of a point in space.
Fig. 4.5 Determining the pose of an object based on the ultrasonic principle
Finally, a few strengths and weaknesses of the ultrasonic tracking principle should
be mentioned. The greatest weaknesses are caused by measurements of the ultra-
sound travel time from the emitter to the receiver. The speed of sound depends on
temperature, pressure and humidity as well as any barriers in the ultrasound’s path.
All of these effects can decrease the accuracy of distance measurements and thus
also the accuracy of the tracked object’s pose. An additional weakness of ultrasonic
measurements is the relatively low speed of sound in air, which limits the highest
attainable sampling frequency to a few dozen Hz and causes nonnegligible measure-
ment delays (up to a few dozen ms). The strengths of the principle include simple
and cheap measurement technology as well as relatively small sensors.
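To illustrate how three noncollinear receivers fix the emitter position, here is a trilateration sketch based on Eq. (4.5). The receiver placement in the z = 0 plane, the assumption that the emitter lies at z >= 0 (which resolves the mirror ambiguity), and the function name are all our own choices; the book does not give this algorithm:

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 degrees C

def trilaterate(receivers, travel_times):
    """Emitter position from ultrasound travel times to three noncollinear
    receivers lying in the z = 0 plane (distances from Eq. (4.5): l = c*t)."""
    s = np.asarray(receivers, float)           # shape (3, 3), all z = 0
    l = SPEED_OF_SOUND * np.asarray(travel_times, float)
    # subtracting the sphere equations pairwise eliminates the quadratic terms
    A = 2.0 * (s[1:] - s[0])[:, :2]
    b = (l[0]**2 - l[1:]**2
         + np.sum(s[1:]**2, axis=1) - np.sum(s[0]**2))
    xy = np.linalg.solve(A, b)
    # recover height from the first sphere equation, taking the z >= 0 branch
    z = np.sqrt(max(l[0]**2 - np.sum((xy - s[0, :2])**2), 0.0))
    return np.array([xy[0], xy[1], z])

receivers = [[0, 0, 0], [1, 0, 0], [0, 1, 0]]
true_pos = np.array([0.3, 0.4, 0.5])
times = [np.linalg.norm(true_pos - np.array(r, float)) / SPEED_OF_SOUND
         for r in receivers]
print(trilaterate(receivers, times))  # recovers approximately [0.3, 0.4, 0.5]
```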
Optical trackers use visual information to track the user’s motion. Measurements can
be performed using videocameras or special cameras with active or passive markers.
The task of computer vision is to recognize the geometry of the scene or the
movement of the user from a digital image. Let’s start the explanation with the
simplest example: a single point (Fig. 4.6) that can also represent a marker attached
to the person. We want to determine the relationship between the point’s coordinates
in a two-dimensional image and the point’s coordinates in the three-dimensional real
world. The basic equations of optics tell us that the position of the point in the image
plane depends on the position of the same point in three-dimensional space. Our task
is to find the geometric relationship between the coordinates of point P[xc , yc , z c ]
in space and the coordinates of the same point p[u, v] in the image.
Since the opening in the lens through which light reaches the image plane is
small compared to the size of the observed objects, the camera lens can be replaced
with a pinhole in the mathematical model. In a perspective transformation, all points
are mapped to the same plane via lines that intersect in a point called the center of
projection. When a real camera is replaced by a camera with a pinhole, the center of
projection is in the center of the lens.
A coordinate system must be attached to the camera. This allows the pose of the
camera to always be described through the pose of the selected coordinate system.
Axis z c of the camera’s coordinate system points along the optical axis while the
origin of the coordinate system is placed in the center of projection. We choose a
right-handed coordinate system where axis xc is parallel to the rows of the image
while axis yc is parallel to its columns.
In a camera, the image plane is located behind the center of projection. The
distance f c between the image and the center of projection is called the focal length.
In the coordinate system of the camera, the focal length has a negative value since
the image plane is in the negative part of axis z c . For our model, it is more convenient
to use the equivalent image plane on the positive side of axis z c (Fig. 4.7).
The equivalent and real image planes are symmetric with regard to plane [xc , yc ]
of the camera coordinate system. The geometric properties of objects in both planes
are completely equivalent and differ only in their signs.
From now on, we will refer to the equivalent image plane simply as the image
plane. The image plane can be considered a rigid body, so a coordinate system can
also be attached to it. The coordinate origin is placed in the intersection of the optical
axis and the image plane. Axes xs and ys should be parallel to axes xc and yc of the
camera coordinate system.
The camera thus has two coordinate systems: the camera coordinate system and
the image plane system. If point P is expressed in the camera coordinate system and
p represents the projection of point P onto the image plane, we are interested in the
relation between the coordinates of point P and the coordinates of point p.
Let’s say that point P lies on the [yc , z c ] plane of the camera coordinate system.
Its coordinates are then
4.1 Pose Sensor 61
$$\mathbf{P} = \begin{bmatrix} 0 \\ y_c \\ z_c \end{bmatrix}. \tag{4.7}$$
Projection p then falls onto the ys axis of the image coordinate system
$$\mathbf{p} = \begin{bmatrix} 0 \\ y_s \end{bmatrix}. \tag{4.8}$$
Let’s also take point Q, which lies in the [xc , z c ] plane of the camera coordinate
system. In a perspective projection of point Q, its image q falls on the xs axis of the
image coordinate system. Due to the similarity of triangles QQ1 Oc and qoOc , we
write
$$\frac{x_c}{x_s} = \frac{z_c}{f_c} \tag{4.9}$$
and
$$x_s = f_c \frac{x_c}{z_c}. \tag{4.10}$$
We have thus obtained the relation between the coordinates of point P = [xc , yc , z c ]T ,
expressed in the camera space, and the point p = [xs , ys ]T , expressed in the image
space. The above equations represent the mathematical description of the perspective
projection from 3D space to 2D space. They can be written in matrix form
$$\lambda \begin{bmatrix} x_s \\ y_s \\ 1 \end{bmatrix} = \begin{bmatrix} f_c & 0 & 0 & 0 \\ 0 & f_c & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} x_c \\ y_c \\ z_c \\ 1 \end{bmatrix}. \tag{4.11}$$
In Eq. (4.11), λ is the scaling factor, [xs , ys , 1]T are the projected coordinates of
the point in the image coordinate system, and [xc , yc , z c , 1]T are the coordinates of
the original point in the camera coordinate system. We also define the perspective
projection matrix
$$\mathbf{\Pi} = \begin{bmatrix} f_c & 0 & 0 & 0 \\ 0 & f_c & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}. \tag{4.12}$$
It is easy to see from Eq. (4.11) that coordinates [xs , ys , λ]T can be unambiguously
determined when [xc , yc , z c , 1]T is known. However, it is not possible
to determine the coordinates [xc , yc , z c ]T in the camera coordinate system if only
the coordinates [xs , ys ]T in the image coordinate system are known and the scaling
factor λ is unknown. The above matrix equation represents a direct projection while
the calculation of [xc , yc , z c ]T from [xs , ys ]T is called the inverse projection. If only
one camera is used and we have no foreknowledge about the size of the objects in
the scene, it is impossible to find an unambiguous solution to the inverse problem.
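The direct projection of Eq. (4.11) can be exercised numerically. The sketch below builds the matrix Π and projects a camera-frame point; the focal length and point coordinates are illustrative values of our choosing:

```python
import numpy as np

def project(point_c, fc):
    """Perspective projection of a camera-frame point onto the image plane,
    Eq. (4.11): lambda * [xs, ys, 1]^T = Pi * [xc, yc, zc, 1]^T."""
    Pi = np.array([[fc, 0.0, 0.0, 0.0],
                   [0.0, fc, 0.0, 0.0],
                   [0.0, 0.0, 1.0, 0.0]])
    xc, yc, zc = point_c
    h = Pi @ np.array([xc, yc, zc, 1.0])
    lam = h[2]            # the scaling factor lambda equals zc
    return h[:2] / lam    # image coordinates [xs, ys]

# point 2 m in front of a camera with fc = 50 mm
print(project([0.2, 0.1, 2.0], fc=0.05))  # xs = 0.005, ys = 0.0025
```

Dividing by λ is exactly what makes the inverse non-unique: any point on the ray through the projection center maps to the same [xs, ys].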
Next we examine the inverse projection. As previously stated, the direct
relationship is (4.11)
$$\lambda \begin{bmatrix} x_s \\ y_s \\ 1 \end{bmatrix} = \mathbf{\Pi} \begin{bmatrix} x_c \\ y_c \\ z_c \\ 1 \end{bmatrix}. \tag{4.13}$$
The system (4.14) contains 12 unknown variables and 9 equations. The solution can
be found only with three additional equations. These can be obtained from the size
of the triangle created by points A, B and C. The lengths of sides AB, BC and C A
can be labeled with distances L 12 , L 23 and L 31
We now have twelve equations for twelve unknown variables. A solution to the
problem thus exists. Unfortunately, the last three equations are nonlinear and must be
solved numerically with special software. This solving method is called an inverse
projection mapping based on a model.
Since the model of the observed object is usually not available or the object
changes with time (e.g. a walking human), other solutions to the inverse projection
mapping problem need to be found. One possible solution is the use of stereo vision:
sensing based on two cameras. The principle is similar to human visual perception
where the images seen by the left and right eyes differ slightly due to parallax and the
brain uses the differences between images to determine the distance to the observed
object.
The principle of using two parallel cameras to observe point Q is shown in Fig. 4.9.
Point Q is projected onto the image plane of the left and right cameras. The left
camera’s image plane contains projection ql with coordinates xs,l and ys,l while the
right camera’s image plane contains projection qr with coordinates xs,r and ys,r .
The axes of the reference coordinate system [x0 , y0 , z 0 ] have the same directions as
the left camera’s coordinate system.
Figure 4.10a shows the top view while Fig. 4.10b shows the side view of the
situation in Fig. 4.9. These views will help us calculate the coordinates of point Q.
From the geometry in Fig. 4.10a we can extract the following relations (distances
x Q , y Q and z Q are with regard to coordinate system [x0 , y0 , z 0 ])
$$\frac{z_Q}{f} = \frac{x_Q}{x_{s,l}}, \qquad \frac{z_Q}{f} = \frac{x_Q - d}{x_{s,r}} \tag{4.16}$$
$$\frac{x_{s,l}\, z_Q}{x_{s,r}\, f} - \frac{z_Q}{f} = \frac{d}{x_{s,r}}. \tag{4.18}$$
Fig. 4.10 Projections of point Q on the planes of the left and right cameras. a Shows a view of
both cameras from above while b shows a side view of the cameras
$$z_Q = \frac{f\,d}{x_{s,l} - x_{s,r}}. \tag{4.19}$$
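Eq. (4.19) in code form, with illustrative focal length and baseline values of our own choosing:

```python
def stereo_depth(xs_l, xs_r, f, d):
    """Depth from stereo disparity, Eq. (4.19): zQ = f*d / (xs_l - xs_r)."""
    disparity = xs_l - xs_r
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point "
                         "in front of both cameras")
    return f * d / disparity

# f = 8 mm focal length, d = 60 mm baseline, 0.4 mm disparity
print(stereo_depth(0.0012, 0.0008, 0.008, 0.06))  # roughly 1.2 m
```

Note how depth is inversely proportional to disparity: distant points produce small disparities, so depth resolution degrades with distance.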
Using two cameras thus allows the position (and orientation) of an object in space
to be determined without an accurate model of the object. This naturally greatly
$$\varphi = \arctan\frac{n\,d_x}{a}. \tag{4.22}$$
A better resolution of position measurements can be obtained with a line sensor with
a smaller pixel width dx . The position of multiple markers can be determined by
triggering one infrared emitter after another.
Of course, the spatial position of the marker cannot be determined with a single
line sensor. It requires at least three sensors with different orientations. An example
of a camera system with three line sensors is shown in Fig. 4.12. The position of
the IR marker in the [xc , yc , z c ] coordinate system can be calculated from the three
incidence angles of the IR rays onto the three line sensors. These three angles are
marked as α, β and γ . Incidence angle α is determined as
$$\tan\alpha = \frac{x_Q + d}{z_Q}, \tag{4.23}$$
$$\tan\gamma = \frac{x_Q - d}{z_Q}. \tag{4.25}$$
$$z_Q = \frac{2d}{\tan\alpha - \tan\gamma}. \tag{4.26}$$
$$x_Q = z_Q \tan\alpha - d \tag{4.27}$$
$$y_Q = z_Q \tan\beta. \tag{4.28}$$
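Eqs. (4.23)-(4.28) can be checked numerically. In the sketch below the three angles are synthesized from an assumed marker position and then inverted; the function name and the numeric values are ours:

```python
import math

def marker_position(alpha, beta, gamma, d):
    """IR marker position from the three incidence angles measured by the
    line sensors of Fig. 4.12, using Eqs. (4.26)-(4.28)."""
    zQ = 2 * d / (math.tan(alpha) - math.tan(gamma))  # Eq. (4.26)
    xQ = zQ * math.tan(alpha) - d                     # Eq. (4.27)
    yQ = zQ * math.tan(beta)                          # Eq. (4.28)
    return xQ, yQ, zQ

# marker assumed at (0.1, 0.2, 1.0) m, sensor offset d = 0.15 m
d = 0.15
alpha = math.atan((0.1 + d) / 1.0)   # Eq. (4.23)
beta = math.atan(0.2 / 1.0)
gamma = math.atan((0.1 - d) / 1.0)   # Eq. (4.25)
print(marker_position(alpha, beta, gamma, d))  # recovers (0.1, 0.2, 1.0)
```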
of light as shown in Fig. 4.14a. From the viewpoint of the projector, the pattern
consists of straight lines projected onto the object. The camera, which is displaced
by a certain triangulation angle, perceives the same stripes as curves running along
the surface of the object. Mathematical algorithms allow the three-dimensional shape
of the object to be determined from the shape of the stripes.
The principle of structured light can also be used to sense the depth or distance of
objects. An appropriate illumination pattern for such a purpose is shown in Fig. 4.14b.
In the area where the object is located, the distribution of dots is different than in the
background. The depth of the object also affects the pattern of dots. Image processing
allows information about objects’ distances to be obtained.
Additional processing of object distance data also allows tracking of pose or
motion if the observed person or object is located in the camera’s field of view [5].
The procedure is illustrated in Fig. 4.15. The first figure shows distance information
(closer objects are darker). Depth information is used to perform segmentation as
seen in the second figure. The segmented image allows the reconstruction of the
person’s skeleton, which gives information about segment positions relative to one
another (third image). To perform this segmentation and reconstruct the skeleton,
the algorithm learns from a training database of measurements that describe human
kinematic properties. A detailed description of such learning, however, is beyond the
scope of this work.
The videometric principle is actually the optical principle, except that the camera
is not fixed in space but is instead attached to the object whose pose we wish to
Fig. 4.15 Human skeleton reconstruction based on triangulation using structured light
determine. Markers placed around the room are necessary for the videometric prin-
ciple and used to determine the pose of the object to which the camera is attached.
The radiofrequency principle is rarely used for user motion tracking in virtual reality
and is thus mentioned here only briefly. Most methods are based on measuring the
travel time of a radiofrequency signal and thus require very accurate measurements of
time. The principle is not suitable for measuring short distances with high accuracy,
and the complexity of the measurements results in expensive equipment.
An example of the radiofrequency principle applied to user motion tracking is
GPS, which is not used in virtual reality. The measurement concept is shown in
Fig. 4.16. The measurement is based on measuring signal travel time from a satellite
to a user on the Earth’s surface. Position calculations are based on triangulation,
which is done as in the ultrasonic principle and thus not repeated here.
The principle is based on measuring the local vector of the magnetic field in the
sensor’s surroundings. The Earth’s magnetic field can be used as a basis for mea-
surement, but such measurements are usually not accurate enough. Thus, it is usually
necessary to create an additional magnetic field especially for the measurement.
B = μ0 H, (4.29)
[Figure: coupling between the source and sensor coils; coaxial coupling: c = 2, coplanar coupling: c = −1]
the sensor is in a pose determined with ϕ = 90◦ with regard to Fig. 4.18. Inserting
this value into Eq. (4.30) gives a factor of 1 inside the parentheses. However, since
the direction of the magnetic field inside the sensor is opposite to the direction
of the magnetic field of the source, coplanar coupling is defined with c = −1. The
coupling factor changes nonlinearly in all intermediate poses, allowing the geometric
relations between the source and sensor to be determined.
Naturally, using a single coil as the source and a single coil as the sensor does
not allow us to calculate the pose of the sensor in space. For that, we need three
orthogonal coils on the side of both the source and the receiver (Fig. 4.20). Using
three orthogonal coils, the source sequentially generates a changing magnetic field in
each coil. These fields generate currents in the sensor’s coils via coupling. Measuring
currents in the sensor allows us to determine the relative pose T of the source and
sensor. The sensor’s signal depends both on the distance between the source and
sensor and on their relative orientation. Thus, all six degrees of freedom of the
sensor’s pose can be determined.
All six degrees of freedom of the sensor relative to the source are shown in
Fig. 4.21. Position is defined by spherical coordinates (R, α, β) while orientation
is defined by coordinates (ψ, φ, θ ). The triaxial electromagnetic dipole represents
the reference coordinate system. The source generates a temporally multiplexed
sequence (coils of the source are triggered one after the other) of electromagnetic
fields that are sensed by the triaxial magnetic sensor. The algorithm used to calculate
the pose of the sensor is beyond the scope of this work, but can be found in [6].
Finally, a few strengths and weaknesses of tracking based on the electromagnetic
principle should be mentioned. Sensors based on this principle are compact, light
and relatively cheap. The working area of the system is limited by the decreasing
field strength as a function of distance from the source. The working area also must
not contain any ferromagnetic materials that could distort the magnetic field. The
system’s advantage is that, unlike optical tracking, it does not require a line of sight
between the source and sensor. The sampling frequency is limited by the temporally
multiplexed generation of magnetic fields. The magnetic field in a source’s coil must
dissipate completely before a magnetic field can be generated in the next coil. It
is also necessary to find a compromise between the working area, resolution and
accuracy of the system as well as the sampling frequency.
The inertial principle is based on inertial measurement systems that combine gyro-
scopes (angular velocity sensors), accelerometers and magnetometers (which usually
measure orientation relative to the Earth’s magnetic field). The inertial measurement
system works similarly to the inner ear that senses the head’s orientation. In principle,
inertial systems can measure all six degrees of freedom of an object’s pose, though
the nonideal sensor outputs represent significant technical challenges. The inertial
measurement principle was first used on ships, planes and satellites long before the
idea of virtual reality was realized. The basic concept of inertial motion tracking is
shown in Fig. 4.22 [3]. The gyroscope is used to calculate the object’s orientation
while the accelerometer can be used to calculate the object’s position using double
integration (taking gravitational acceleration into account).
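The scheme of Fig. 4.22 can be sketched as a strapdown update step. This is a simplified first-order version: the rotation-matrix integration, the specific-force convention (the accelerometer measures a − g expressed in the body frame) and the neglect of real MEMS bias and noise are all our modeling choices:

```python
import numpy as np

def inertial_update(R, v, p, omega, a_meas, dt,
                    g=np.array([0.0, 0.0, -9.81])):
    """One step of the scheme in Fig. 4.22: integrate the gyroscope to
    track orientation, rotate the measured acceleration into the global
    frame, subtract gravity, then integrate twice for velocity and position."""
    # orientation: first-order integration of the body angular velocity
    wx, wy, wz = omega
    Omega = np.array([[0, -wz, wy], [wz, 0, -wx], [-wy, wx, 0]])
    R = R @ (np.eye(3) + Omega * dt)
    # accelerometer measures specific force; rotate and remove gravity
    a_global = R @ a_meas + g
    v = v + a_global * dt
    p = p + v * dt
    return R, v, p

# stationary sensor: the accelerometer reads +9.81 m/s^2 along body z
R, v, p = np.eye(3), np.zeros(3), np.zeros(3)
a_meas = np.array([0.0, 0.0, 9.81])
for _ in range(1000):
    R, v, p = inertial_update(R, v, p, np.zeros(3), a_meas, 0.001)
print(p)  # stays at the origin: gravity is correctly cancelled
```

In practice any gyroscope bias corrupts the gravity subtraction and the double integration drifts quadratically, which is why the text calls the nonideal sensor outputs a significant technical challenge.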
The first inertial systems were not suitable for use in virtual reality, as the gyro-
scopes and accelerometers were large mechanical devices that could not be attached
to a human. The inertial measurement principle became interesting for virtual reality
only with the development of microelectromechanical devices. Microelectromechan-
ical (MEMS) inertial sensors, which include accelerometers and gyroscopes, are one
of the most important silicon-based sensors. The greatest demand for them comes
mainly from the automotive industry, where they are used to activate safety systems
(e.g. air bags), to control vehicle stability and to electronically control the suspension.
However, inertial sensors are also used in a variety of other applications where small
and cheap sensors are needed. They are used in biomedical applications for human
activity recognition, in cameras for image stabilization, in mobile devices and sporting
goods. Industry uses them for robotics and vibration control, while the military
uses them to guide missiles. Accelerometers with a high sensitivity are important for
autonomous control and navigation, for seismometers and for satellite stabilization
in space.
MEMS gyroscopes allow an object’s angular velocity to be measured. They are
frequently used together with accelerometers and are thus found in most of the same
applications.
MEMS magnetometers are not inertial measurement systems on their own, but
are frequently used together with accelerometers and gyroscopes and thus included
in this group. They are usually used to measure orientation with respect to the Earth’s
magnetic field.
4.1.8.1 Accelerometer
Fig. 4.23 The principle of an accelerometer: static conditions (left) and during translational accel-
eration (right)
m ẍ + d ẋ + kx = ma. (4.32)
The product ma in the above equation can be treated as a force acting on the mass.
Due to the force, the mass moves as described by the dynamics of the system and
its parameters m, d and k. Equation (4.32) can be written in the form of a transfer
function
H(s) = x(s)/a(s) = 1/(s² + (d/m)s + k/m) = 1/(s² + (ωr/Q)s + ωr²), (4.33)

where ωr = √(k/m) is the system’s resonance frequency and Q = √(km)/d is the
quality factor of the system. The resonance frequency can be increased by increasing
the spring constant k or decreasing the mass m while the quality factor can be
increased by decreasing damping d or increasing the mass m and spring constant k.
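As a quick numerical illustration of these relations (the parameter values below are arbitrary, not taken from any real MEMS device):

```python
import math

def resonance_and_q(m, d, k):
    """Resonance frequency and quality factor of the spring-mass-damper
    accelerometer model of (4.33). Units: m [kg], d [N*s/m], k [N/m]."""
    omega_r = math.sqrt(k / m)    # resonance frequency [rad/s]
    q = math.sqrt(k * m) / d      # quality factor [dimensionless]
    return omega_r, q
```

Doubling k raises ωr by √2 and Q by √2, while doubling d leaves ωr unchanged and halves Q, matching the trade-offs described in the text.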
In general, a single accelerometer housing can contain elements to measure acceleration along all three translational axes. An implementation of such a system that
uses the capacitive principle to measure displacement is shown in Fig. 4.24. The typ-
ical characteristic parameters of accelerometers include sensitivity, operating range,
frequency response and resolution.
Fig. 4.24 Triaxial accelerometer based on the capacitive measurement principle: a mass suspended on elastic springs inside the housing, with capacitive plates measuring its displacement
4.1.8.2 Gyroscope
Gyroscopes are devices that measure angular velocities. They are generally divided
into three major categories: (1) mechanical gyroscopes, (2) optical gyroscopes and
(3) gyroscopes based on a vibrating mass. Mechanical gyroscopes are based on the
principle of conservation of angular momentum (Fig. 4.25). The rotational form of
Newton’s second law implies that angular momentum is conserved as long as no external
torques act upon the system. The basic equation that describes a gyroscope is
τ = dL/dt = d(Iω)/dt = Iα, (4.34)
where τ is the torque acting on the gyroscope, L is the gyroscope’s angular momen-
tum, I is the gyroscope’s moment of inertia, and ω and α are the angular velocity
and angular acceleration.
Optical gyroscopes are based on a laser light source and interferometry. Their
main advantage is that they have no moving parts, so they are not vulnerable to
mechanical wear and signal drift.
78 4 Tracking the User and Environment
Fig. 4.26 Operating principle of a gyroscope: a Coriolis acceleration aCor = 2v × ω, b gyroscope in static conditions and c gyroscope during rotation
Mechanical and optical gyroscopes are not suitable for attachment to humans.
However, gyroscopes based on a vibrating mass are small and cheap enough to use
in motion tracking. The principle is based on a vibrating mass that is affected by
Coriolis acceleration due to rotation. This acceleration causes secondary vibrations
perpendicular to the direction of primary vibrations and the angular velocity vector.
By measuring the secondary vibrations, it is possible to determine the object’s angular
velocity. Figure 4.26a shows the basic principles of such a gyroscope. A particle
moving with velocity v is affected by acceleration aCor due to the rotation of the
system at angular velocity ω.
In a gyroscope based on a vibrating mass, the actuator causes vibrations that move
the mass with velocity vact . There is no additional movement of the mass as long
as angular velocity is equal to zero (Fig. 4.26b). However, the angular velocity ω
(Fig. 4.26c) causes a Coriolis force
that creates additional movement of the mass in the direction of the force. By mea-
suring the displacement x, it is possible to calculate the angular velocity of the sensor
housing.
In general, a single housing can contain components that measure angular veloci-
ties around all three axes of rotation. Typical characteristic parameters of gyroscopes
include sensitivity, operating range and resolution.
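A rough quasi-static sketch of the measurement principle: if the Coriolis force 2·m·vact·ω deflects the vibrating mass against a spring of constant k, the angular velocity can be recovered from the secondary-vibration displacement x. The linear spring model and all parameter names are illustrative assumptions, not a real sensor design.

```python
def angular_velocity_from_coriolis(x, k, m, v_act):
    """Recover angular velocity from the secondary-vibration displacement x.
    Quasi-static assumption: the Coriolis force 2*m*v_act*omega is balanced
    by the spring force k*x, so x = 2*m*v_act*omega / k."""
    return k * x / (2.0 * m * v_act)
```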
4.1.8.3 Magnetometer
Fig. 4.27 Operating principle of a magnetoresistive sensor: a slice of permalloy carrying electric current I, with its internal magnetization vector m rotated by an angle α by an external magnetic field H, so that R = R0 + ΔR0 cos²α
Magnetoresistive sensors exploit the property of certain ferromagnetic materials: their resistance changes due to an external magnetic field. The basic
operating principle of the sensor is shown in Fig. 4.27.
The figure shows a slice of the ferromagnetic material permalloy. Assuming that
there is no external magnetic field, the magnetization vector of permalloy m is parallel
to the direction of the current (in our case, left to right). If the sensor is placed into
a magnetic field with strength H such that the field is parallel to the plane of the
permalloy and perpendicular to the direction of the current, the internal magnetization
vector of the permalloy m turns by an angle α. Consequently, the resistance of the
material R changes as a function of the angle α

R = R0 + ΔR0 cos²α,

where R0 and ΔR0 are properties of the material that give optimal sensor characteristics. Measuring the resistance R allows measurement of the sensor’s angle relative
to the external magnetic field, which can be used to determine the object’s spatial
orientation.
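Assuming the cos² characteristic above, the angle can be inverted from a resistance reading. The numeric parameter values in the sketch below are hypothetical, and only the principal branch α ∈ [0, π/2] is recovered:

```python
import math

def magnetization_angle(R, R0, dR0):
    """Invert R = R0 + dR0*cos^2(alpha) for alpha in [0, pi/2]."""
    c = (R - R0) / dR0
    c = min(max(c, 0.0), 1.0)   # clamp against measurement noise
    return math.acos(math.sqrt(c))
```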
Fig. 4.28 Example of using an inertial measurement system to measure the angle of a pendulum.
a Stationary pendulum. b Swinging pendulum
ar = ω × (ω × r) (4.38)
Fig. 4.29 Inertial measurement unit consisting of three accelerometers, three gyroscopes and three
magnetometers
a = g + ar + at . (4.40)
The equation used to calculate angle in stationary conditions (4.37) is no longer valid,
so the accelerometer cannot be used to calculate the angle of a swinging pendulum.
However, the output of the gyroscope, which measures the angular velocity of the
pendulum, is now also available. Since the angle of the pendulum can be calculated
as the temporal integral of angular velocity, the following relation can be stated
ϕ = ϕ0 + ∫ ωz dt, (4.41)
Fig. 4.30 XYZ-Euler angles (ϕ, ϑ, ψ) describing the orientation of the coordinate system (xs, ys, zs) with regard to the coordinate system (x0, y0, z0)
The sensor’s orientation can be written in the form of XYZ-Euler angles as seen in Fig. 4.30. Vector φ gives the
orientation of the coordinate system (xs , ys , z s ) with regard to coordinate system
(x0 , y0 , z 0 ).
In static and quasistatic conditions, the accelerometer allows measurement of
rotations around the x0 and y0 axes of the reference coordinate system. Since the
gravity vector is parallel to axis z 0 , the accelerometer cannot sense rotation around
this axis. For this purpose, we can use a magnetometer, which also allows measure-
ment of rotation around the z 0 axis (think of how a compass works). Combining an
accelerometer and magnetometer thus gives an estimate of sensor spatial orienta-
tion, but such measurements are suitable only for quasistatic conditions. In dynamic
conditions, an additional acceleration acts on the accelerometer and prevents it from
being used as a tilt sensor.
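A common way to compute these quasi-static angles is sketched below. The axis and sign conventions are one possible choice (gravity along z0, aerospace-style roll/pitch), not necessarily the book's, and the tilt-compensation formula for the heading is one of several variants in use:

```python
import math

def roll_pitch_from_accel(ax, ay, az):
    """Quasi-static tilt: with only gravity acting, the accelerometer
    reading gives the rotations about x0 and y0 (roll, pitch)."""
    roll = math.atan2(ay, az)
    pitch = math.atan2(-ax, math.hypot(ay, az))
    return roll, pitch

def heading_from_mag(mx, my, mz, roll, pitch):
    """Tilt-compensated compass heading (rotation about z0) from the
    magnetometer reading, using the roll and pitch estimated above."""
    xh = mx * math.cos(pitch) + mz * math.sin(pitch)
    yh = (mx * math.sin(roll) * math.sin(pitch) + my * math.cos(roll)
          - mz * math.sin(roll) * math.cos(pitch))
    return math.atan2(-yh, xh)
```

As the text stresses, these formulas hold only in quasistatic conditions; any translational acceleration corrupts the roll and pitch estimates.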
Combining the best properties of an accelerometer, gyroscope and magnetometer
can give us an accurate and reliable measurement of spatial orientation. This is done
through sensor integration, which can be done e.g. using the Kalman filter [7]. Sensor
integration is beyond the scope of this work, so let’s look only at the basic concept
as illustrated in Fig. 4.31.
Fig. 4.31 Basic concept of sensor integration: the orientation computed directly from the accelerometer (a) and magnetometer (B) measurements is combined with the gyroscope’s angular velocity ω in a Kalman filter, which outputs the orientation estimate
φ = [s²/(s² + k1s + k2)] ω(s)/s + [(k1s + k2)/(s² + k1s + k2)] φma(s) (4.43)
or in a simpler form

φ = G(s) ω(s)/s + (1 − G(s)) φma(s). (4.44)
The latter equation clearly shows that the Kalman filter weighs two different sources
of information (integrated gyroscope signal and absolute measured orientation) that
complement each other. The function G(s), which filters the integrated gyroscope
signal, acts as a high-pass filter. In other words, at high frequencies or during rapid
motions the output is equal to the integrated gyroscope signal. The function 1−G(s),
on the other hand, filters the absolute measured orientation and acts as a low-pass
filter. During slow motions, the filter output thus gives more weight to the absolute
angle measurement. Similar findings were seen during the analysis of the pendulum
angle.
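A first-order complementary filter captures the same idea with a single gain and is often used as a lightweight stand-in for the Kalman filter. The discrete-time sketch below is illustrative only; the time constant tau is an assumed tuning parameter, not a value from the text:

```python
def complementary_filter(theta_prev, omega, theta_ma, dt, tau=0.5):
    """One step of a first-order complementary filter: high-pass the
    integrated gyroscope signal, low-pass the absolute angle theta_ma
    obtained from the accelerometer/magnetometer, as in (4.44)."""
    g = tau / (tau + dt)   # weight playing the role of G(s)
    return g * (theta_prev + omega * dt) + (1.0 - g) * theta_ma
```

During rapid motion the first term (integrated gyroscope signal) dominates; when the motion stops, the output slowly converges to the absolute angle measurement, removing the gyroscope drift.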
Taking the calculated orientation into account, we can now subtract the gravita-
tional acceleration from the measured acceleration. The remnant represents the trans-
lational acceleration. Integrating this acceleration allows us to estimate the velocity
and position of an object in space as shown in Fig. 4.22. Of course, we must take
into account that the remaining offset value (we can never completely remove the
effect of gravity) causes the velocity and position to drift. This method can thus only
be used over brief time periods.
4.2 Measuring Interaction Forces and Torques

During interaction with a virtual environment, it is often not enough to only know the
pose and movement of the user. We are also interested in the interaction forces. An
example of such a situation is the measurement of the ground reaction force during
balancing tasks. This section thus presents two methods of measuring forces and
torques between the user and virtual environment. We wish to know the point on
which the total force and torque act as well as their amplitude and direction.
Contact between a human and an object in the environment can be described with
the position of the point of contact p and the interaction force f. Six variables are
thus required: xp, yp, zp, fx, fy and fz. We must track the point of contact in the
environment’s coordinate system as well as measure all three components of the
interaction force. These six values can be determined in different ways.
Let’s first see how we can determine the position of the point of contact if we know
the interaction force f and its torques acting around the axes of the basic coordinate
system. The geometric conditions are shown in Fig. 4.32. The force f can be divided
Fig. 4.32 Geometric conditions for calculating the point of contact (xp, yp) from the force components fx, fy and the torques μx, μy
into three components: f x , f y and f z . These forces create the following torques with
regard to the axes of the basic coordinate system
μx = fz yp − fy zp
μy = fx zp − fz xp
μz = fy xp − fx yp. (4.45)
The three equations in (4.45) are linearly dependent, so the system is redundant: only two coordinates of the point of contact can be calculated from it. However, this is usually enough, since one coordinate of the point of contact is generally given.
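For example, with contact on a known plane z = zp, equations (4.45) can be solved directly for the remaining two coordinates. The sketch below is illustrative; function and variable names are not from the text:

```python
def contact_point(f, mu, zp=0.0):
    """Solve (4.45) for the contact point when one coordinate (here z_p)
    is known, e.g. for contact on the plane z = zp.
    f  = (fx, fy, fz), mu = (mux, muy, muz); requires fz != 0."""
    fx, fy, fz = f
    mux, muy, muz = mu
    yp = (mux + fy * zp) / fz   # from mux = fz*yp - fy*zp
    xp = (fx * zp - muy) / fz   # from muy = fx*zp - fz*xp
    return xp, yp
```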
The interaction force between the user and the environment is generally measured
either with force sensors or force plates.
A force and torque sensor is usually placed between the user and object in the area
where we wish to measure interaction with the user.
A force and torque transducer generally has a cross-shaped mechanical structure
as seen in Fig. 4.33. The ends of the cross are attached to an object in the environment
while the grip comes from an opening in the center. Eight pairs of semiconductive
measurement strips are affixed to the ends of the cross. These strips allow the mea-
surement of deformations from w1 to w8 . Each opposing pair of measurement strips is
connected to a measurement bridge. When force is applied to the sensor, it is possible
to record eight analog voltages proportional to the forces marked in the figure.
A calibration procedure allows us to determine the elements of a 6 × 8 calibration
matrix that converts the measured analog values to force and torque vectors
[ fx fy fz μx μy μz ]ᵀ = K [ w1 w2 w3 w4 w5 w6 w7 w8 ]ᵀ, (4.47)
where

K =
⎡  0    0   K13   0    0    0   K17   0  ⎤
⎢ K21   0    0    0   K25   0    0    0  ⎥
⎢  0   K32   0   K34   0   K36   0   K38 ⎥
⎢  0    0    0   K44   0    0    0   K48 ⎥
⎢  0   K52   0    0    0   K56   0    0  ⎥
⎣ K61   0   K63   0   K65   0   K67   0  ⎦ (4.48)
Fig. 4.33 Cross-shaped structure of a force and torque sensor: the measured forces fx, fy, fz and torques μx, μy, μz, with measurement strips w1–w8 affixed to the ends of the cross
is the calibration matrix, whose elements Ki j are constant and represent the gains of the individual measurement bridges.
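Applying the calibration of (4.47) is then a single matrix-vector product. The sketch below is illustrative; the numeric values used in practice come from the calibration procedure, not from this text:

```python
def strain_to_wrench(K, w):
    """Apply the 6x8 calibration matrix of (4.48) to the eight bridge
    voltages w1..w8, returning (fx, fy, fz, mux, muy, muz)."""
    assert len(K) == 6 and all(len(row) == 8 for row in K) and len(w) == 8
    return tuple(sum(kij * wj for kij, wj in zip(row, w)) for row in K)
```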
The interaction force between the user and environment can also be measured with a
force plate. Such a plate contains force transducers in all four corners to measure the
horizontal and vertical components of the measured reaction force (Fig. 4.34). The
coordinate system’s point of origin is placed in the center of the plate. Horizontal
forces in the plate corners are marked as αi while vertical forces are marked as βi .
Interaction forces and torques are obtained from the forces measured in the corners
of the force plate
fx = α3 − α1
fy = α4 − α2
fz = β2 + β4 − β1 − β3
μx = (−(β1 + β4) + (β2 + β3)) l/2 (4.49)
μy = (−(β3 + β4) + (β1 + β2)) l/2
μz = (α1 + α2 + α3 + α4) l/2.
The plane of the force plate has the vertical coordinate z p = 0. It is then possible to
calculate the coordinates of the point p onto which the reaction force f acts
Fig. 4.34 Force plate: horizontal corner forces αi and vertical corner forces βi measured in the corners of a plate with dimension l; the resulting force f and torques act at the point p = (xp, yp)
xp = −μy / fz
yp = −μx / fz. (4.50)
Let’s say that the goal is to determine the position of the vertical projection of the center of gravity (COG) of a person standing on the force plate (Fig. 4.35). The sensors give information about the quantities fx, fy, fz, μx, μy and μz, thus allowing the projection of the center of gravity to be calculated from Eq. (4.50).
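Equations (4.49) and (4.50) can be combined into one routine. The sketch below assumes the l/2 lever arm exactly as written in (4.49); all names are illustrative:

```python
def force_plate(alpha, beta, l):
    """Forces and torques from the corner measurements (4.49) and the
    point of application (4.50).
    alpha: horizontal corner forces (alpha1..alpha4)
    beta:  vertical corner forces (beta1..beta4)
    l:     plate dimension used in (4.49)"""
    a1, a2, a3, a4 = alpha
    b1, b2, b3, b4 = beta
    fx = a3 - a1
    fy = a4 - a2
    fz = b2 + b4 - b1 - b3
    mux = (-(b1 + b4) + (b2 + b3)) * l / 2
    muy = (-(b3 + b4) + (b1 + b2)) * l / 2
    muz = (a1 + a2 + a3 + a4) * l / 2
    xp = -muy / fz          # point of application, (4.50)
    yp = -mux / fz
    return (fx, fy, fz, mux, muy, muz), (xp, yp)
```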
Fig. 4.35 Determining the vertical projection of the center of gravity of a person standing on a force plate

4.3 Motion Tracking

4.3.1 Head
Hand and finger tracking allows a user to interact with the virtual world. In multi-user
environments, gesture recognition can be used to communicate between subjects.
Mechanical, optical, ultrasonic, electromagnetic or inertial sensors can be used for
arm tracking. Finger tracking, however, is limited by the maximum size of the sensor
and by the number of degrees of freedom that should be measured. Special gloves
are thus usually used to measure finger movement. These gloves are equipped with
sensors (goniometers, optic fibers) that are placed along the fingers and measure
finger joint angles (Fig. 4.37). A goniometer is a sensor whose resistance changes
as it is bent, thus allowing joint angles to be measured via resistance changes. Bent
optic fibers, on the other hand, change the amount of light that can pass through
them, thus allowing joint angles to be calculated from the amount of light.
4.3.3 Eyes
Studies have shown that the eyes are the most frequently moving external part of the
human body. The constant motion occurs due to the huge amount of visual data
coming from the environment. The eye does not study the entire available visual scene
in detail, but focuses on smaller bits of information that are examined rapidly one
after another. By integrating individual parts of the visible world, the human creates
an interconnected whole [8]. Small eye movements and thus changes in perceived
light occur even when the eye is focused on a small detail in the environment. Since
the nerve endings in the eye can detect only changes in light, this is the only way we
can see stationary objects.
Eye movement consists of rapid movements called saccades and stationary periods
called fixations. Saccades occur 3–4 times a second, with each one lasting 20–200 ms.
During rapid movements, the eye perceives barely any visual information. Between
individual saccades, the eye briefly stops and focuses on a small part of the visual
scene [8]. Eye movement characteristics such as the number of fixations, the duration
of an individual fixation, the number of saccades and the saccade amplitudes contain
important information about visual attention.
The concept of visual attention has been intensively studied for over a hundred
years. Its beginnings can be found in psychology, where scientists wished to deter-
mine why humans focus attention on only one object among many and why some
objects are viewed longer than others. Eye tracking methods gradually became well-
developed and spread from psychology to other fields. They can be found in branches
of medicine such as neurology and ophthalmology, in marketing and advertising, for
user interface evaluation and for eye tracking studies in everyday tasks such as read-
ing, driving or shopping. Eye positions can also serve as an input to a system, creating
the principle of gaze-based interaction.
Eye tracking systems measure and perceive four different eye-related phenomena:
(1) eye rotation within the eye sockets, (2) gaze direction (taking head and eye
movements into account), (3) blinking and (4) pupil size. Important properties of
eye trackers include sampling frequency, eye position measurement accuracy, and
robustness with regard to changing light conditions and environmental noise.
The first important issue is how quickly and how accurately eye position can be
determined. Eye trackers for medical and psychological research sample at 500 Hz
and can achieve accuracies of up to 1’ visual angle. For user interface research,
sampling frequencies of 60 Hz and accuracies of 1° are generally sufficient [9].
On the other hand, the eye tracker must be as unobtrusive as possible, allowing the
subject to move freely around the room if possible.
Depending on the measured parameter, eye tracking systems can be divided into
those that measure eye position relative to the head and those that measure the gaze
direction (also called point of regard). The latter are used primarily in applications
where it is necessary to recognize visual elements that attract a user’s attention [10].
In general, modern computer-aided eye trackers can be divided into three
categories with regard to the hardware and measurement principle: (1) electroocu-
lography, (2) search coils and (3) videooculography. We will focus only on videoocu-
lography, which is based on capturing a visual image of the face or the eye region.
Image processing algorithms allow the gaze direction to be determined based on the
eye’s optical properties.
Simple algorithms search for only one reference point, most commonly the border
between the pupil and the iris or the border between the iris and the sclera (called the
limbus). Appropriate image processing algorithms (edge detection) make it easier
to find the border between the colored iris and the white sclera. Finding the border
between the pupil and the iris is harder due to the lower contrast, especially in people
with dark eyes. For this purpose, we can either additionally illuminate the eye or
improve the contrast by equipping the camera with a filter that allows only near-
infrared or infrared light to pass. A camera with such a filter can capture an image
with a much more pronounced pupil. The contrast between the pupil and any color
of iris thus increases.
If an eye tracker measures two properties of the moving eyes, it is possible to
calculate the subject’s gaze direction using appropriate calibration. Two appropriate
eye properties are the center of the illuminated pupil and the corneal reflection
Fig. 4.38 Placement of the camera as well as the inner and outer rings of infrared light emitting
diodes
(reflection of light from the outer surface of the cornea). The reflection occurs due
to the additional light source pointed at the subject’s eyes. Most typically, infrared
light sources (e.g. light emitting diodes) are used to emphasize the pupil and cause
the reflection which is then used to calculate the eye’s current position.
The infrared illumination makes the pupil look darker or lighter in the camera’s
image. It looks lighter if the infrared diodes lie on the virtual axis connecting
the camera and the human eye. If the illumination is not on this axis, the pupil looks
darker. One method of locating the pupil makes use of this light and dark effect.
The images captured by the camera are divided into even and odd. Image capturing
is synchronized with infrared illumination as shown in Fig. 4.38. The eyes are illuminated with infrared diodes on the camera’s axis (inner ring) for even images and
with diodes not on the camera’s axis (outer ring) for odd images. The differences in
illumination make the pupil look lighter on even images and darker on odd ones.
The pupil is then detected by thresholding the difference between captured even and
odd images [11].
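The thresholding step can be sketched as follows. Pure-Python lists of gray values stand in for camera images here; a real implementation would use an image-processing library and would also clean up the resulting mask:

```python
def pupil_mask(even_img, odd_img, threshold):
    """Threshold the difference between an on-axis (bright-pupil) even image
    and an off-axis (dark-pupil) odd image of the same scene; pixels where
    the brightness difference exceeds the threshold belong to the pupil."""
    return [[1 if (e - o) > threshold else 0
             for e, o in zip(erow, orow)]
            for erow, orow in zip(even_img, odd_img)]
```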
Due to the eye’s structure, three reflections are visible in addition to the reflection
from the outer surface of the cornea (the most visible one): reflection from the inner
surface of the cornea, reflection from the outer surface of the lens and reflection from
the inner surface of the lens. These four reflections are collectively known as Purkinje
images. The position of the light reflection from the outer surface of the cornea
(also called the first Purkinje image) relative to the center of the pupil changes as
the eyes are moved, but remains relatively unchanged in small head movements. The
distance between the two optical characteristics allows the orientation of the eye to be
determined. Figure 4.39 shows a dark and a light pupil as well as the reflection from the
outer surface of the cornea. Figure 4.40 shows an example of the left eye of a subject
gazing at nine calibration points as well as the relative position of the first Purkinje
image with regard to the pupil. Most modern eye tracking systems track only two
points of the eye: the center of the pupil and the reflection from the outer surface
of the cornea. A special type of systems called dual-Purkinje image eye trackers
simultaneously measure reflections from the outer surface of the cornea and from
the inner surface of the lens. They can thus differentiate between translational and
rotational eye movement since the two reflections move together during translation,
but not during rotation. Such an eye tracking system is very accurate, but requires
the subject’s head to be fixed [10].
Fig. 4.39 Images of a dark (left) and light (right) pupil as well as the reflection from the outer surface of the cornea (white spot)
Fig. 4.40 Position of the first Purkinje image (white spot) relative to the pupil (black circle) for a left eye gazing at nine calibration points
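The calibration that maps the measured eye features to gaze coordinates is typically a low-order polynomial fit over the calibration points. As a minimal illustration only (not the method of any particular tracker), the sketch below fits an affine map from three calibration pairs; real systems usually fit a higher-order polynomial over all nine points:

```python
def fit_affine(eye_pts, screen_pts):
    """Fit screen = A*[ex, ey, 1] from three calibration pairs by solving
    a 3x3 linear system per screen axis. eye_pts are pupil-to-reflection
    vectors; screen_pts are the known calibration-point coordinates."""
    def solve3(M, b):
        # Gaussian elimination on a 3x3 system (no pivoting; assumes
        # the calibration points are in "general position")
        M = [row[:] + [bi] for row, bi in zip(M, b)]
        for i in range(3):
            p = M[i][i]
            M[i] = [v / p for v in M[i]]
            for j in range(3):
                if j != i:
                    M[j] = [vj - M[j][i] * vi for vj, vi in zip(M[j], M[i])]
        return [M[i][3] for i in range(3)]
    M = [[ex, ey, 1.0] for ex, ey in eye_pts]
    ax = solve3(M, [sx for sx, _ in screen_pts])
    ay = solve3(M, [sy for _, sy in screen_pts])
    def gaze(ex, ey):
        return (ax[0] * ex + ax[1] * ey + ax[2],
                ay[0] * ex + ay[1] * ey + ay[2])
    return gaze
```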
Eye tracking systems also differ with regard to the position of the camera. If
we wish to, for instance, measure the region of interest on a computer screen, the
camera can be fixed to the table or to the top of the screen. The difficulty comes from
finding cameras with adequate magnification and suitable viewing angle. Those with
small magnification and wide viewing angle allow the user to perform more head
movements, but also capture many objects in the background that can impede eye
tracking performance. Cameras with large magnification are much more accurate
and usually capture only a single eye, but require the subject to stay still to avoid
moving out of the camera’s field of view. The best option is a combined system
with two cameras and a mechanism that allows the second camera to be moved.
The wide-angle camera first approximately locates the eye, and the mechanism turns
the large-magnification camera in the appropriate direction. The second camera then
takes a high-resolution picture of the eye. The second group of eye trackers are head-
mounted systems, which offer a higher accuracy since the camera is always in the
same position relative to the eye. However, the camera only tracks eye movements.
Such head-mounted systems need to be unobtrusive, lightweight and suitable for a
large spectrum of users.
4.3.4 Trunk
Trunk motion tracking can give higher-quality information about body movement
direction than information obtained from the head orientation. If the head orientation
is used to determine movement direction, it becomes impossible for the subject to
look sideways while walking straight ahead.
Since the trunk has a relatively large area, most tracking methods can be used
for it. One exception is the mechanical principle, which is more suitable for the
extremities.
Leg and foot movement is less often tracked, but allows the subject’s speed and
direction of movement to be measured. Leg motion tracking is usually used when
the subject is moving in a large area and thus requires methods that don’t impede
movement. Examples of such methods are ultrasonic, optical or inertial. Other meth-
ods are less suitable for motion tracking in situations with large position changes.
Physical input devices are usually part of the interface between the user and virtual
world. They can be either simple handheld objects or complex platforms. A person
operating the physical device gets an impression of the object’s physical properties
such as weight and texture, thus receiving a type of haptic feedback.
Physical controls include individual buttons, switches, dials and slides that allow the
user to directly affect virtual reality.
4.4.2 Props
Props are physical objects used as interfaces with the virtual world. The prop can be
connected to a virtual object and/or have physical controls attached to it. The physical
properties of the prop (shape, weight, texture, hardness) often already indicate its use
in virtual reality. Such props allow intuitive and flexible interaction with the virtual
world. Since it is easy to determine the spatial relationship between two props or
between the prop and the user, the user can use this information to better understand
the virtual world. The goal of props is to create an interface that allows the user
natural manipulation of the virtual world; in other words, to approximate an ideal
interface that would barely be noticed by the user.
4.4.3 Platform
As the name suggests, a platform is a large and not very mobile physical structure
used as an interface with the virtual world. Just like props, platforms represent a part
of the virtual world through real objects that the user can interact with. The platform
thus becomes part of the virtual reality system. It can be designed to imitate a real-
world device that is simultaneously part of virtual reality, but it can also represent a
general space for the user to stand or sit in. One example of a platform is thus the
cockpit of an airplane.
Speech recognition allows natural communication with a computer system. The use
of speech makes the experience of virtual reality more convincing and realistic.
References
1. Rolland JP, Baillot Y, Goon AA (2001) A survey of tracking technology for virtual environments
2. Zhou H, Hu H (2008) Human motion tracking for rehabilitation—A survey. Biomed Signal
Process Control 3:1–18
3. Welch G, Foxlin E (2002) Motion tracking: no silver bullet, but a respectable arsenal. IEEE
Comput Graph Appl 22:24–38
4. Kuang WT, Morris AS (2000) Ultrasound speed compensation in an ultrasonic robot tracking
system. Robotica 18:633–637
5. Shotton J, Fitzgibbon A, Cook M, Sharp T, Finocchio M, Moore R, Kipman A, Blake A
(2011) Real-Time human pose recognition in parts from single depth images. http://research.
microsoft.com/apps/pubs/default.aspx?id=145347
6. Kuipers JB (2002) Quaternions and rotation sequences: a primer with applications to orbits, aerospace and virtual reality. Princeton University Press, Princeton
7. Welch GF (2009) History: the use of the Kalman filter for human motion tracking in virtual
reality. Presence: Teleoper Virtual Environ 18:72–91
8. Richardson DC (2004) Eye-tracking: characteristics and methods. Encyclopedia of biomaterials
and biomedical engineering. Stanford University, USA
9. Poole A, Ball LJ (2005) Eye tracking in human-computer interaction and usability research: current status and future prospects. In: Encyclopedia of human computer interaction. Idea Group, United Kingdom
10. Duchowski AT (2003) Eye tracking methodology: theory and practice. Springer, London
11. Morimoto C, Koons D, Amir A, Flickner M (2000) Pupil detection and tracking using multiple
light sources. Image Vis Comput 18:331–335
Chapter 5
Visual Modality in Virtual Reality
Abstract Sight is perhaps the most important of all human senses, and the visual
modality is thus also a key component of virtual reality. This chapter begins with
the biological basics of human visual perception, with a particular focus on depth
perception. It then explores the basic elements of computer graphics. It describes the
basic models used in virtual environments (polygons, implicit and parametric sur-
faces, constructive solid geometry, solid modeling) as well as the process of rendering
a model (via projections and transformations, clipping, determining visible objects,
illumination, and finally pixel conversion). The chapter concludes with a description
of various visual displays, from two-dimensional liquid crystal and plasma systems
to stereoscopic displays such as head-mounted, spatially multiplexed, temporally
multiplexed and volumetric displays.
Sight is perhaps the most important of all human senses, and a large part of the
brain is dedicated to interpreting information obtained from visible light. The visual
modality is thus also a key component of virtual reality; some virtual environments
with no visual component do exist, but they are few and far between. This chapter
begins with an overview of human visual perception with an emphasis on depth, then
continues with the design and displaying of visual elements in virtual reality.
5.1 Human Visual Perception

Figure 5.1 shows the human eye. Visible light from the environment enters the eye
via the transparent cornea. Light intensity is controlled by the pupil, which dilates
or contracts similarly to a camera shutter and thus limits the amount of light that can
enter the eye. Behind the pupil is the lens, which focuses light on the retina. The
lens is attached to the ciliary muscle, which controls the thickness of the lens by
contracting. This allows objects at different distances from the eye to be seen clearly.
Fig. 5.1 The human eye: eyelid, pupil, iris, cornea, lens, ciliary muscle, sclera, choroid, retina and optic nerve
The human eye senses colors using cones in the retina. Cones are divided into three
types, each of which senses different wavelengths of light. The first type senses yellowish light (564–580 nm), the second senses greenish light (534–545 nm) and the
third senses bluish light (420–440 nm). We thus, for example, see blue color if the
third type of cones is stimulated more than the second. Similarly, we see purple color
if the third type of cones is stimulated much more than the second. The range of wave-
lengths that human eyes can ‘see’ is between approximately 380 and 700 nm. Light
with shorter wavelengths is called ultraviolet while light with longer wavelengths is
called infrared.
Since the human eye has three types of cones, computers usually also use a model
with three primary colors. Mixing these three colors allows any color to be created.
The most frequently used model in computing is the RGB model, which uses red,
green and blue as the primary colors (Fig. 5.2a). These colors roughly correspond
to the three types of cones. Another popular model is the CMYK model (Cyan-
Magenta-Yellow-Black, Fig. 5.2b), which is mostly used for color printing.
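The naive conversion between the two models can be sketched as follows. Real printing pipelines use device color profiles; this is only the textbook formula, with all channels normalized to the range 0..1:

```python
def rgb_to_cmyk(r, g, b):
    """Naive RGB (0..1) to CMYK (0..1) conversion: black is the complement
    of the brightest channel, and the remaining ink amounts are scaled by
    the non-black fraction."""
    k = 1.0 - max(r, g, b)
    if k == 1.0:
        return 0.0, 0.0, 0.0, 1.0   # pure black
    c = (1.0 - r - k) / (1.0 - k)
    m = (1.0 - g - k) / (1.0 - k)
    y = (1.0 - b - k) / (1.0 - k)
    return c, m, y, k
```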
Among other things, our eyes also convey information about depth—the distance
between us and particular objects. Depth is especially important in virtual reality, as
normal two-dimensional visual displays cannot properly incorporate depth into the
image. This subsection thus covers the basics of human depth perception, as virtual
reality designers need to fool the human senses and create an illusion of depth in the
virtual environment.
Humans estimate distances to different objects using so-called depth cues. These
are divided into monoscopic and stereoscopic.
Monoscopic depth cues can be seen with only one eye and are also present in
two-dimensional images. They include:
1. Occlusions: Objects in the foreground occlude those in the background
(Fig. 5.3).
2. Shading: Shading lets us better estimate the shape of an object, while shadows
also indicate the relative positions of different objects (Fig. 5.3).
3. Size: The size of an object can be compared to the size of similar objects we have
memorized, thus giving an impression of absolute distance. Comparing the sizes
of two similar objects allows us to gauge their relative distance (Fig. 5.3).
4. Linear perspective: Parallel lines appear to converge toward a point as they recede
into the distance (Fig. 5.3). This is a useful cue with objects consisting of straight
lines (e.g. most buildings).
100 5 Visual Modality in Virtual Reality
Fig. 5.3 Psychological cues used to perceive depth (linear perspective, shadows, occlusion, texture
gradient, horizon, blurring)
5. Surface texture: Distant objects have a less sharp texture than close ones since
the eye cannot distinguish details at great distances (Fig. 5.3).
6. Accommodation (Fig. 5.4) is the process where the lens dilates or contracts, al-
lowing the eye to focus on the object it is viewing. The brain can estimate the
distance to the object from the required lens thickness.
7. Parallax, the movement of the viewer, allows the distance of the viewed objects
to be estimated since distant objects appear to move less in the field of view than
nearby ones.
8. Movement of the viewed object also allows relative distance to be estimated. As
the viewed objects move away from the viewer, they seem to get smaller. When
they move closer, they appear larger. Based on this cue, the brain also estimates
how much time an approaching object will need to collide with the viewer.
Stereoscopic depth cues combine information from both eyes. The most important
cues are:
1. Convergence (Fig. 5.4) is the process where the eyes turn toward an object
in order to focus on it. The angle of the eyes allows the brain to estimate
depth. Convergence always occurs simultaneously with the previously mentioned
accommodation.
2. Stereopsis (binocular disparity—Fig. 5.5) allows us to estimate depth from the
differences between what the left eye sees and what the right eye sees.
The designer of a virtual environment can mix any of these cues and thus create
an illusion of virtual depth. Since most of the cues are additive, adding more of them
creates a more realistic feeling of depth.
Fig. 5.4 The principle of convergence and accommodation. The eyes turn toward the object, and
the eye angle allows the brain to estimate depth (convergence). At the same time, the lens dilates or
contract in order to focus on the object (accommodation)
Fig. 5.5 The principle of stereopsis. Due to slightly different viewpoints, the left and right eye see
different images. The differences between the images allow depth to be estimated
While a virtual environment can in principle exist without being seen by anyone, such
an environment is not very useful. All elements of the virtual environment should
have a defined visual representation that can later be shown to users on a display. The
design of these visual representations is in the domain of computer graphics, which
uses various specialized hardware and software. This section covers the basics of the
field, though several concepts are greatly simplified for easier understanding.
Computer images are divided into two large categories: raster and vector images.
Vector images define individual elements of the image using simple geometric shapes:
lines, plane figures and (in three-dimensional images) bodies. By adding and subtract-
ing simple geometric shapes, it is possible to create complex objects called polygons.
A vector image stored on a computer takes the form of mathematical formulas that
describe the image. Unlike raster images, this allows the resolution of the image
to be increased easily without increasing the amount of required computer memory
(Fig. 5.6).
5.2 Computer Graphics 103
Fig. 5.6 Increasing the resolution of vector (top left) and raster images (top right)
Boundary representation methods are suitable for nontransparent objects where in-
formation about the interior is unnecessary. They are generally based on polygons,
implicit/parametric surfaces or constructive solid geometry.
Polygons
Polygons are the simplest modeling method and consist of plane figures with at least
three straight edges. Though any number of edges can in principle be used, polygons
with three or four edges are the most common in practice. Triangular polygons have
three major advantages: they are always planar, they are always convex, and any
complex polygon can be cut up into multiple triangular ones. Any curved surface
can thus be modeled with any desired accuracy by using a sufficiently large number
of triangular polygons. The main limitation is the amount of memory required to store
the polygons, which increases rapidly with accuracy. A polygonal object model has
the appearance of a wire mesh (Fig. 5.7).
A polygon is completely described with not only the positions of its vertices, but
also its color, textures and surface parameters. Object representations are frequently
simplified by grouping polygons into sets that represent specific objects (e.g. a chair
or table). Grouping allows an object to be easily moved as a whole without needing
to move e.g. an individual table leg or even individual polygons.
Implicit and parametric surface modeling methods describe curves with mathemati-
cal equations. They allow certain curved objects such as spheres to be described
with far less information than in the case of polygons. Parametric surface modeling
describes each curve as a function of one or more parameters while implicit surface
modeling describes each surface as an implicit function of multiple variables. Due to
their mathematical nature, these descriptions are perfectly accurate: the same amount
of information allows any desired resolution to be obtained.
For example, the parametric definition of a sphere is:

x = r cos u cos v, y = r cos u sin v, z = r sin u, (5.1)

where r is the radius of the sphere, u runs from −π/2 to π/2, and v runs from 0 to
2π. All possible (x, y, z) points lie on the surface of the sphere.
Similarly, the implicit definition of a sphere is:
x² + y² + z² = r² (5.2)
Fig. 5.8 Example of an implicitly modeled sphere surface (left) and a surface modeled with splines
(right)
All (x, y, z) points that satisfy this requirement lie on the surface of the sphere
with the radius r (Fig. 5.8 left).
A sphere can thus be perfectly described with a single equation. If we wanted to use
polygons instead, we would need to save thousands and thousands of polygons into
the computer, and the modeled sphere would still not be perfectly round. Higher-
order parametric and implicit equations allow even extremely complex curves to
be modeled. Among these higher-order equations, splines deserve special mention.
They are curves created from several polynomials ‘spliced’ together piece by piece.
A simple two-dimensional spline, for instance, is:
y = ⎧ f(x), x < 0
    ⎨ g(x), 0 ≤ x < 1 (5.3)
    ⎩ h(x), x ≥ 1

f(0) = g(0), g(1) = h(1).
A spline is always continuous—each piece fits together with the adjoining one.
Their first or even second derivatives are usually also continuous, ensuring a smooth
curve. The above example would require the following to ensure a continuous first
derivative:

f′(0) = g′(0), g′(1) = h′(1). (5.4)
Splines thus allow different curves to be approximated. The concept can be easily
expanded to three dimensions: pieces of a surface (described with simple polynomials)
are ‘stitched’ together at the edges, thus creating a more complex surface (Fig. 5.8
right).
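These continuity conditions can be illustrated numerically; in the sketch below the two polynomial pieces g and h are chosen for the example (they are not from the book) so that both the value and the first derivative match at the junction x = 1:

```python
# A minimal two-piece spline: g on [0, 1), h on [1, 2], chosen so that the
# value and the first derivative agree at the junction x = 1.
def g(x):
    return x * x            # g(1) = 1, g'(1) = 2

def h(x):
    return 2 * x - 1        # h(1) = 1, h'(1) = 2

def spline(x):
    return g(x) if x < 1 else h(x)

# Continuity of the value at the junction: g(1) = h(1).
assert abs(g(1) - h(1)) < 1e-12

# Continuity of the slope, checked with finite differences.
eps = 1e-6
slope_g = (g(1) - g(1 - eps)) / eps
slope_h = (h(1 + eps) - h(1)) / eps
assert abs(slope_g - slope_h) < 1e-4
```

Because both conditions hold, the curve passes through x = 1 without a kink, which is exactly what the continuity equations above demand.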
So Why Polygons?
Boundary representations are inappropriate for transparent objects where the interior
also needs to be modeled. This is especially true for spaces that contain a partially
transparent substance of varying density (e.g. mist, magnetic resonance images).
Solid (non-geometric) modeling comprises volumetric modeling and particle sys-
tems.
Volumetric modeling is suitable for partially transparent objects and is frequently
used to display medical, seismic or other research data. It is based on ray tracing: light
rays, which obey the laws of physics, change their properties upon being reflected
off virtual objects or passing through partially transparent materials.
Particle systems are commonly used to display complex flows in visual scenes.
A large number of particles are generated in the environment and then move ac-
cording to predefined physical laws (acceleration, gravity, reflection). Their initial
position and velocity are defined only roughly (e.g. with the mean value and stan-
dard deviation). The movement of such particles allows simulation of very complex
phenomena such as fire, smoke and large groups of people.
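A minimal particle system along these lines might look as follows; all constants, the Gaussian velocity spread and the time step are illustrative, not from the book:

```python
import random

random.seed(0)          # deterministic for this sketch
GRAVITY = -9.81         # acceleration acting on every particle
DT = 0.01               # simulation time step in seconds

def make_particle():
    return {
        "pos": [0.0, 0.0],
        # Initial velocity defined only roughly: a mean value plus
        # a standard deviation, as described in the text.
        "vel": [random.gauss(1.0, 0.2), random.gauss(5.0, 0.5)],
    }

def step(p):
    p["vel"][1] += GRAVITY * DT       # gravity changes vertical velocity
    p["pos"][0] += p["vel"][0] * DT
    p["pos"][1] += p["vel"][1] * DT

particles = [make_particle() for _ in range(100)]
for _ in range(50):                   # simulate 0.5 s of motion
    for p in particles:
        step(p)
```

Rendering each particle as a small sprite (a spark, a smoke puff, a person) then produces the complex aggregate motion, even though each individual particle follows trivially simple laws.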
A scene graph is a tool that allows accurate and flexible representation of the virtual
environment’s hierarchy. Its structure defines the relative position and orientation of
all objects in the virtual environment as well as other object features such as color
and texture. It is thus possible to change a part of the virtual world with a single
change to the scene graph. An example of a scene graph is shown in Fig. 5.10. In
this case, it is possible to move (open) a drawer and its contents by changing a single
coordinate system. Similarly, it is possible to move a book to the drawer with a single
operation.
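The hierarchy can be sketched in a few lines of illustrative code; the `Node` class and the object names below are assumptions for the example, not the book's implementation:

```python
# Minimal scene-graph sketch: each node stores a position relative to its
# parent, so moving one node moves its entire subtree.
class Node:
    def __init__(self, name, offset):
        self.name = name
        self.offset = list(offset)      # position relative to the parent
        self.children = []

    def add(self, child):
        self.children.append(child)
        return child

    def world_position(self, parent_pos=(0.0, 0.0, 0.0)):
        return tuple(p + o for p, o in zip(parent_pos, self.offset))

desk = Node("desk", (5.0, 0.0, 0.0))
drawer = desk.add(Node("drawer", (0.2, 0.5, 0.0)))
pen = drawer.add(Node("pen", (0.1, 0.1, 0.0)))

# 'Opening' the drawer: one change to its coordinate system also moves
# the pen stored inside it.
drawer.offset[0] += 0.4
drawer_pos = drawer.world_position(desk.world_position())
pen_pos = pen.world_position(drawer_pos)
print(pen_pos)   # the pen has moved together with the drawer
```

A single change to the drawer's transformation thus repositions everything below it in the graph, which is the key convenience the text describes.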
5.2.3 Rendering
Fig. 5.10 The scene graph allows related objects to be combined so that their parameters can be
defined more easily
Rendering first determines which part of the virtual environment lies within the camera's field of
viewing, as the rest does not need to be rendered. If we want to include shading and
different levels of light in the image, we need to simulate lighting in the environment.
Finally, the virtual environment visible from the camera must be transformed into a
two-dimensional projection and drawn on the screen.
At the start of rendering, the virtual environment must be transformed from the global
coordinate system to the camera’s coordinate system. The position and orientation
of the camera in the virtual environment must first be determined. Once they are
known, we can use equations given in Chap. 2 to easily determine the transformation
matrix from the global coordinate system to the camera’s coordinate system. This
results in a three-dimensional environment seen from the camera’s viewpoint.
Displaying the virtual environment on a screen requires a two-dimensional image
of the environment as seen from the perspective of the camera. Thus, a projection
from three to two dimensions is also required. Such projections are divided into
parallel and perspective projections (Fig. 5.11). Parallel projections are mostly used
in technical drawing and assume that the camera (center of projection) is located
an infinite distance from the object. Lines that are parallel in three dimensions thus
also remain parallel in a two-dimensional image. Perspective projections, on the
other hand, assume that the camera is near the objects. Lines that are parallel in
three dimensions thus appear to converge in the two-dimensional image.
Fig. 5.11 Parallel projection (left) and perspective projection (right)
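Perspective projection can be sketched with a minimal pinhole-camera model; the focal distance `f` and the sample points below are illustrative, not from the book:

```python
# Pinhole-camera perspective projection: a camera at the origin looking
# down the +z axis projects a 3D point onto an image plane at distance f.
def project(point, f=1.0):
    x, y, z = point
    if z <= 0:
        raise ValueError("point is behind the camera")
    return (f * x / z, f * y / z)

# Two parallel horizontal segments at different depths: the distant one
# projects shorter, which is why parallel lines appear to converge.
near_left, near_right = project((-1.0, 0.0, 2.0)), project((1.0, 0.0, 2.0))
far_left, far_right = project((-1.0, 0.0, 10.0)), project((1.0, 0.0, 10.0))

near_len = near_right[0] - near_left[0]
far_len = far_right[0] - far_left[0]
assert far_len < near_len
```

A parallel projection would instead simply drop the z coordinate, keeping both segments the same projected length regardless of depth.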
5.2.3.2 Clipping
Objects outside the camera’s field of view do not need to be rendered by the computer.
Clipping is thus the analytical process of determining which parts of virtual objects lie
outside the camera’s field of view. This can be done using several different algorithms.
Lines are generally trimmed with the Cohen-Sutherland or Cyrus-Beck algorithm
while polygons are clipped with the Sutherland-Hodgman, Weiler-Atherton or Vatti
algorithm. Their mathematical details are beyond the scope of the book, though it is
important to be aware of the general problem.
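As a flavor of how such algorithms work, the first step of the Cohen-Sutherland algorithm can be sketched: each endpoint receives a 4-bit 'outcode' describing where it lies relative to the clipping window, which lets many lines be trivially accepted or rejected. The window bounds below are illustrative:

```python
# Cohen-Sutherland outcodes: one bit per side of the clipping window.
LEFT, RIGHT, BOTTOM, TOP = 1, 2, 4, 8
XMIN, XMAX, YMIN, YMAX = 0.0, 10.0, 0.0, 10.0

def outcode(x, y):
    code = 0
    if x < XMIN:
        code |= LEFT
    elif x > XMAX:
        code |= RIGHT
    if y < YMIN:
        code |= BOTTOM
    elif y > YMAX:
        code |= TOP
    return code

def trivially_accepted(p1, p2):
    """Both endpoints inside the window: no clipping needed."""
    return outcode(*p1) == 0 and outcode(*p2) == 0

def trivially_rejected(p1, p2):
    """Both endpoints outside on the same side: the line is invisible."""
    return (outcode(*p1) & outcode(*p2)) != 0

assert trivially_accepted((1, 1), (9, 9))
assert trivially_rejected((-5, 1), (-2, 9))      # both left of the window
assert not trivially_rejected((-5, 5), (15, 5))  # crosses the window
```

Lines that are neither trivially accepted nor rejected are then subdivided at the window edges, which is the part of the algorithm omitted here.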
Once the scene has been clipped, it is also necessary to determine which objects
are obscured by others. The obscured objects then do not
need to be drawn on the screen. Several algorithms exist to solve this challenge.
The simplest is the so-called painter’s algorithm, which sorts polygons according to
their distance from the camera and then draws them in order from the most distant
to the nearest one (Fig. 5.12). It thus paints over distant objects with nearby ones.
This approach is not very efficient, as it nonetheless draws all obscured polygons.
This weakness can be solved with the reverse painter’s algorithm, which draws
polygons from the nearest to the most distant. In each step, it colors only those pixels
that have not yet been colored.
Both the standard and reverse painter’s algorithms have several weaknesses and
have thus been replaced in practice by the depth buffer, which is also called the
Z-buffer. This buffer solves the visibility problem for each pixel separately. It takes
the form of a two-dimensional matrix where each element represents a single pixel.
When the computer renders an object in a particular pixel, it saves the distance (depth)
of this object in the corresponding element of the buffer. Each pixel of the object
can have its own depth. If the computer later tries to render a new object in the same
pixel, it first compares the distance of the new object to the value in the depth buffer.
If the new object’s distance is smaller, the computer draws the new object in the pixel
and updates the value of the buffer.
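The depth-buffer logic can be sketched in a few lines; the buffer size and the color labels are illustrative:

```python
# Depth buffer (Z-buffer): per pixel, store the depth of the nearest object
# drawn so far; a new fragment is drawn only if it is closer.
W, H = 4, 4
depth = [[float("inf")] * W for _ in range(H)]
color = [[None] * W for _ in range(H)]

def draw_fragment(x, y, z, c):
    if z < depth[y][x]:      # the new object is nearer: overwrite the pixel
        depth[y][x] = z
        color[y][x] = c

draw_fragment(1, 1, 5.0, "red")     # distant object drawn first
draw_fragment(1, 1, 2.0, "blue")    # nearer object covers it
draw_fragment(1, 1, 9.0, "green")   # farther object is discarded
print(color[1][1])   # blue
```

Unlike the painter's algorithms, the order in which objects arrive does not matter: the comparison against the stored depth resolves visibility per pixel.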
An alternative method worth mentioning is ray casting, which uses an entirely
different principle than the above algorithms. It creates several rays (one for each
pixel) which travel from the camera in different directions. The first object hit by a
ray is then drawn by the computer in the ray’s corresponding pixel. This method is
relatively slow compared to the depth buffer method, but has certain other advantages.
It represents a simplified ray tracing algorithm which we will examine in the lighting
and shading subsection.
So far, we’ve defined the point from which we view the virtual environment and
the objects that we can see. However, in the real world we would not be able to
see these objects if they were not illuminated by the sun or another source of light.
Thus, different objects are more or less visible depending on the distance from
the light source and its intensity. The object’s properties such as color and albedo
also influence its lighting, and each object casts its shadow onto other objects. In
virtual reality, it is in principle possible to ignore realistic lighting and assume that
all objects are perfectly illuminated. Adding shadows may not really convey much
practical information to the viewer, but it can vastly improve the feeling of realism
(Fig. 5.13).
It can be very computationally demanding to take into account all possible sources
of light (with different intensities, colors, positions. . .) and all possible occlusions
between objects. Thus, two simplifications are often used: illumination without any
light sources and local illumination.
The simplest implementation of lighting in virtual reality does not use any light
sources, but simply changes the color or brightness of different objects depending
on their distance from the camera or other reference point. Objects farther from the
camera are thus darker, simulating poor visibility of distant objects. The method
is computationally extremely simple and is used in applications such as landscape
modeling or medical image displays.
Local Illumination
Local (also called direct) illumination models allow light to travel from a light source
to an object and be reflected from the object, but the reflected light cannot then hit a
second object. This allows, for example, the sides of an object that are facing toward
a light source to be illuminated more than sides facing away from the light source.
However, local illumination cannot model phenomena such as shadows.
There are three types of light in local illumination:
1. Ambient lighting comes from all directions and evenly illuminates the entire
object. Two neighboring sides of the same object are illuminated identically, so
no border is visible between them.
2. Diffuse lighting comes from a specific direction, but is reflected equally in all
directions when it hits an object and thus doesn’t depend on the position of the
camera. It appears with rough surfaces such as chalk (Fig. 5.14).
3. Specular lighting comes from a specific direction and is reflected in another spe-
cific direction when hitting an object (Fig. 5.14). The amount of visible specular
light thus also changes with camera position. It appears with smooth or shiny
surfaces such as mirrors or metal.
Each object has its own separate reflection coefficients for ambient, diffuse and
specular light reflection.
A local illumination model can use either ambient lighting, ambient lighting in
combination with diffuse lighting, or all three lighting types. The most commonly
used model is Phong illumination, but its mathematical details are beyond the scope
of this book.
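Although the mathematical details of Phong illumination are beyond the book's scope, a common textbook-style formulation of the three terms can be sketched as follows; the coefficients, vectors and shininess exponent are illustrative assumptions:

```python
import math

def normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def shade(normal, to_light, to_camera, ka=0.1, kd=0.6, ks=0.3, shininess=32):
    """Phong-style intensity at one surface point, single white light."""
    n, l, v = normalize(normal), normalize(to_light), normalize(to_camera)
    ambient = ka                              # the same from all directions
    diffuse = kd * max(dot(n, l), 0.0)        # independent of the camera
    # Reflection of the light direction about the surface normal:
    r = tuple(2 * dot(n, l) * nc - lc for nc, lc in zip(n, l))
    specular = ks * max(dot(r, v), 0.0) ** shininess  # camera-dependent
    return ambient + diffuse + specular

# Light directly above a horizontal surface, camera also directly above:
head_on = shade((0, 0, 1), (0, 0, 1), (0, 0, 1))
# Same geometry, camera moved off to the side: the specular term shrinks.
off_axis = shade((0, 0, 1), (0, 0, 1), (1, 0, 1))
assert head_on > off_axis
```

The example shows the key property from the text: ambient and diffuse terms stay the same when the camera moves, while the specular highlight appears only near the mirror-reflection direction.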
Global Illumination
Global illumination models take into account not only the light that reaches an object
directly from the light source, but also light reflected onto the object from other
objects. This gives the most realistic image of the environment, but also significantly
increases computational demands.
The best-known global illumination model is ray tracing, where light ‘rays’ are
sent from the camera in different directions. Their progress through the virtual
environment is then traced. When a ray falls on an object, it can be reflected and
thus travel on. If the ray arrives at an object directly from the camera, reflection can
be diffuse or specular. If the ray was previously reflected by another object, reflection
can only be specular. Each ray travels until it is reflected a certain number of times
or it covers a predefined distance without any reflections occurring. The color and
brightness of each point thus depends on all rays reflected from it.
The last rendering step is to convert the virtual environment into color pixels that can
be directly shown on the screen. This conversion is divided into multiple algorithms
that are responsible for drawing different components of the environment. They
include:
1. Line drawing: The algorithm uses the environment model to obtain the start and
end points of all straight lines, then determines the intermediate pixels that need
to be drawn. The problem is that a pixel is approximated as a small square.
Combining multiple such small squares only allows horizontal or vertical lines
to be drawn, not diagonal ones. The algorithm must thus draw a diagonal line as
‘steps’ that approximate a line such that the error is smallest (Fig. 5.15). Examples
Fig. 5.16 Polygon scan conversion, which “colors” the pixels inside a polygon. For every row of
pixels, a horizontal line is drawn. The line is divided into sections, with every intersection between
the line and polygon marking the beginning of a new section. Every second section is then colored
of line drawing algorithms are the digital differential analyzer and Bresenham’s
line algorithm.
2. Circle drawing: Drawing circles presents a similar challenge as drawing diagonal
lines: the smoothness of the circle needs to be approximated with a multitude of
small squares. When drawing, parametric or implicit circle equations can be used
to speed up the procedure. Since a circle is symmetric, the equations only need
to be evaluated for one eighth of the circle, and symmetry can be used to obtain
the other points. For instance, if the center of the circle is at (x0 , y0 ) and the point
(x0 + a, y0 + b) lies on the circle, it can immediately be determined that points
(x0 ± a, y0 ± b) and (x0 ± b, y0 ± a) also lie on the circle. The most popular
circle drawing algorithm is Bresenham’s circle algorithm.
3. Polygon drawing: It was previously mentioned that polygons are a basic building
block of computer graphics. There are thus several algorithms that specialize in
rasterizing different types of polygons (triangular, convex etc.). The best-known
algorithm is the so-called ‘polygon scan conversion’, which colors all pixels
inside a polygon. However, it first needs to find all pixels inside the polygon,
which can be very challenging for concave polygons. For every row of pixels,
a horizontal line is drawn. The line is divided into sections, with every intersection
between the line and polygon marking the beginning of a new section. Every
second section is then colored (Fig. 5.16). The algorithm is fast and effective, and
can be tweaked to speed it up even further.
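As a rough sketch of how Bresenham's line algorithm draws a diagonal line as 'steps' using only integer arithmetic, here is an illustrative implementation restricted to the first octant (0 ≤ slope ≤ 1), not the book's own code:

```python
# Bresenham's line algorithm, first octant: choose at each column whether
# to stay on the current pixel row or step up, using an integer error term.
def bresenham_first_octant(x0, y0, x1, y1):
    pixels = []
    dx, dy = x1 - x0, y1 - y0
    err = 2 * dy - dx
    y = y0
    for x in range(x0, x1 + 1):
        pixels.append((x, y))
        if err > 0:          # the true line has drifted above the row
            y += 1
            err -= 2 * dx
        err += 2 * dy
    return pixels

print(bresenham_first_octant(0, 0, 6, 3))
# → [(0, 0), (1, 0), (2, 1), (3, 1), (4, 2), (5, 2), (6, 3)]
```

The other seven octants are handled by swapping and mirroring coordinates; the stepped pixel pattern is the approximation that Fig. 5.15 illustrates.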
A special example of conversion into pixels is the previously mentioned ray trac-
ing. It is used for implicit and parametric surface modeling as well as constructive
solid geometry. The algorithm sends light ‘rays’ from the camera in different direc-
tions, then follows their paths. When the ray hits an object, it can be reflected and
thus travel on. Each ray travels until it reaches a maximum number of reflections or
travels a certain distance without any reflection. The color and brightness of each
pixel thus depends on all the rays reflected from it. The algorithm represents an
effective alternative pixel conversion method and can create a more realistic image,
but is relatively slow and thus far less popular.
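The polygon scan conversion described earlier can also be sketched in code; the implementation below is a simplified, illustrative version (half-open edge test, pixel centers sampled at row midpoints), not the book's:

```python
# Polygon scan conversion: for each pixel row, intersect a horizontal line
# with the polygon edges, sort the intersections, and color every second
# section (the spans between intersection pairs).
def scanline_fill(vertices, height):
    filled = set()
    n = len(vertices)
    for y in range(height):
        yc = y + 0.5                     # sample at the pixel-row center
        xs = []
        for i in range(n):
            (x1, y1), (x2, y2) = vertices[i], vertices[(i + 1) % n]
            if (y1 <= yc < y2) or (y2 <= yc < y1):  # edge crosses the row
                t = (yc - y1) / (y2 - y1)
                xs.append(x1 + t * (x2 - x1))
        xs.sort()
        for left, right in zip(xs[::2], xs[1::2]):  # every second section
            for x in range(int(left + 0.5), int(right + 0.5)):
                filled.add((x, y))
    return filled

# A 4x4 axis-aligned square with corners at (1, 1) and (5, 5):
square = [(1, 1), (5, 1), (5, 5), (1, 5)]
pixels = scanline_fill(square, 8)
assert (2, 2) in pixels and (0, 0) not in pixels
```

Because the even–odd rule colors every second span, the same loop handles concave polygons correctly, which is the difficulty the text points out.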
5.3 Visual Displays 115
Visual displays are defined as the hardware that presents the created visual image to
the user.
This section covers multiple displays that differ in a large number of properties.
These properties define the quality of the image and thus the quality of the virtual
reality experience or may have a major impact on the display’s practical usefulness.
For example, some stationary displays need a large amount of space while head-
mounted displays usually require little space.
Color—most displays use the trichromatic system where three primary colors
(red, green and blue) allow the entire color palette to be generated. The trichromatic
system makes sense since the human eye has three types of color-detecting cones. For aug-
mented reality, monochromatic displays are sometimes preferable for their higher
brightness and contrast.
Spatial resolution is usually given as the number of pixels per unit of length. The
size of the display and the number of pixels affect the image quality. The distance
between the eyes and the display must also be taken into account, as a smaller distance
requires a higher pixel density.
Contrast is the difference between bright and dark regions of the image. A higher
contrast makes it easier to separate different components of the image.
Focus distance is the virtual distance between the image and the user’s eyes. Cur-
rent technology places all objects in the scene at the same focus distance regardless
of their virtual distance from the observer. The disparity between different visual
depth cues causes problems with depth perception, which can lead to headaches and
sickness.
Transparency—the display can hide the real world from the user or allow it to be
viewed. Stationary screens and desktop displays cannot hide the real world and are
thus ‘transparent’. Head-mounted displays, on the other hand, are usually nontrans-
parent and prevent the wearer from seeing the real environment. Nontransparency
can affect safety (real objects cannot be seen) and the possibility of cooperation with
others (other people cannot be seen).
Occlusion—when using stationary displays, objects from the real world (such as
the user’s hand) can occlude virtual objects. This is most problematic when a virtual
object comes between the user’s eyes and a real object. In this case, the virtual object
should occlude the real one, but this does not occur. The problem is more easily
solved using head-mounted displays.
Field of view represents the width of the currently visible world. The human field
of view is approximately 200° with 120° of binocular overlap. The overlap area is
the most important, so displays with a 120° field of view cover an acceptable portion
of human view. The field of view of a head-mounted display, for instance, defines
how much of the world the user can see without having to turn his/her head.
Acceptable graphical delay is prominent in displays that change the view of the
virtual environment in response to the user’s movement. There is a certain delay
between the user’s movement and the corresponding change on the display, which
can make the user uncomfortable or even sick. The delay must thus be as small as
possible. The acceptable graphical delay in augmented reality is even smaller than
the acceptable delay in normal virtual reality, as a large delay means that the real and
virtual worlds are no longer synchronized.
Temporal resolution represents the frequency with which the displayed image is
refreshed. It significantly affects the user’s experience of the virtual world. A fre-
quency above 30 Hz is acceptable while frequencies under 15 Hz lead to discomfort.
User mobility can affect both virtual presence and the usefulness of the virtual
reality. Most displays limit the user’s mobility as a consequence of user movement
sensors, electrical connections or stationary displays.
User movement sensing—advanced displays can measure the user’s movement
and adapt the displayed image accordingly.
Compatibility with displays for other senses—head-mounted displays are entirely
compatible with headphones, but speakers are preferred with stationary screens.
Haptic displays present a special challenge, as their limited workspace also creates
limitations for other (e.g. visual) displays.
Liquid crystals affect the polarization of light passing through their unique structure
and can thus act as a polarizing filter. Their structure changes in the pres-
ence of an electric field, so the amount of light the crystals let through can be
modulated with the electric field. This principle is exploited by liquid
crystal displays.
A normal liquid crystal display consists of six layers. The layer farthest away from
the viewer is the light source, which can be passive (a mirror reflecting light from
the environment) or active (a lamp). The other five layers form a ‘sandwich’: the two
outer layers are polarizing filters (vertical and horizontal), the two middle layers are
glass plates with electrodes attached, and the innermost layer consists of liquid crystals. Light
travels from the source through the first filter and becomes polarized, then passes
through liquid crystals which may allow it to pass completely or only partially. Each
pixel of the screen has its own liquid crystal cell, so each pixel’s brightness can
be modulated individually. A color screen is made similarly except that each pixel
consists of three subpixels with different color filters.
A plasma display consists of millions of small fluorescent lamps. Each lamp contains
a mixture of noble gases and mercury. The lamp’s inner surface is also covered with
fluorescent substances. When a charge is applied to the lamp, the mercury turns
to gas while the noble gases are partially ionized and form the plasma that gives
the display its name. The electrons in the mixture collide with mercury particles,
increasing the mercury’s energy. The mercury then emits energy in the form of
ultraviolet light. When this light hits the fluorescent substance on the wall of the
lamp, the substance emits heat and visible light that can be seen by the user. An
image is formed by manipulating the brightness of the individual lamps. Similarly to
liquid crystal displays, a color plasma display is made by having each pixel's lamp
consist of three colored sublamps.
The plasma displays of today are heavier and use more energy than liquid crystal
displays, but have a faster response time and better contrast.
5.3.2.3 Projectors
In a projection display, the screen itself does not create an image. Instead, the image
is projected onto the screen by a separate projector that can be either in front of
or behind the screen. The screen is usually much larger than with typical desktop
computers and thus covers a larger portion of the user’s field of view. Most projectors
in virtual reality systems are high-quality and can be based on various technologies
(liquid crystals, micromirrors, laser diodes). Multiple projectors and screens can also
be used to surround the user and thus increase the field of view.
Fig. 5.17 The Wheatstone stereoscope (1838): To view images using the stereoscope, the viewer
places his/her eyes directly in front of the mirrors. The mirrors are placed at a 90° angle. The
observed images are placed on the attachments at each side of the viewer and are thus spatially
separated
5.3.3.1 Parallax
The difference between an object’s location in the image for the left eye and its
location in the image for the right eye is called parallax (Fig. 5.18). It depends not
only on the display, but also on the viewer’s position and the distance between the
viewer’s eyes. It is divided into horizontal and vertical parallax. Horizontal parallax
is necessary for the illusion of depth while vertical parallax is always an error in
programming or displaying the virtual environment. After all, since the left and right
eyes are at the same height, the two images should also be at the same height.
In the conditions shown in Fig. 5.19, horizontal parallax can be calculated with
the following equations:
Fig. 5.19 The geometry of horizontal parallax: the viewer is at distance d from the display and
distance D from the observed object; p is the parallax between the left and right images and IPD
is the interpupillary distance
IPD / D = p / (D − d), (5.5)

p = ((D − d) / D) · IPD,
where p is the parallax, IPD is the interpupillary distance, d is the distance between
the viewer and display (as well as the point that the eyes focus on for accommodation),
and D is the distance between the viewer and the point that the eyes focus on with
convergence.
We thus define four separate types of parallax (Fig. 5.20):
1. Zero parallax occurs when D = d. The object is seen on the screen. Convergence
and accommodation focus on the same point, so the object can be viewed without
problems.
2. Positive parallax occurs when D > d. The viewer thus has the impression that
the object is inside or behind the screen.
3. Negative parallax occurs when D < d. The viewer thus has the impression that
the object is ‘floating’ in front of the screen.
4. Divergent parallax occurs when the eyes focus on different points due to improper
display settings.
Convergence/Accommodation Disparity
Divergent parallax is very unpleasant for the viewer, so we try to avoid it in prac-
tice. However, positive and negative parallax can also be unpleasant. The eyes use
Fig. 5.20 Zero parallax (object appears on the screen), negative parallax (view axes intersect in
front of the screen) and positive parallax (view axes do not intersect in front of the screen)
Fig. 5.21 The convergence point (observed object) and the accommodation points of the left and
right eyes on the screen
Fig. 5.22 The effect of user viewpoint changes on the perceived object position due to parallax
Viewpoint Changes
Let’s take a look at how changing the viewpoint in the presence of parallax affects
the perception of the displayed object’s position. Figure 5.22 shows the conditions
for three people looking at the screen from three different positions. Both the virtual
depth of the object and its virtual position in the plane of the screen change with the
viewer’s position. If the viewer thus moves his/her head in any direction, the object
on the screen will also appear to move.
Of course, even large head movements don't allow the viewer to look behind an
object unless a head movement tracking method is used (for example, when viewing a
movie, the images for the left and right eyes are generated independently of the
viewer's position). Viewpoint changes are shown in Fig. 5.23. When a real object is
viewed (Fig. 5.23 top), changing head position also changes the perspective. This is
only possible in virtual reality if head movements are tracked and new views of the
virtual environment are generated accordingly (Fig. 5.23 middle). If head movements
are not tracked, changing the viewpoint only slightly changes the object’s position
on the screen due to parallax (Fig. 5.23 bottom).
Fig. 5.23 The effect of viewpoint changes on the view of a real (top) and virtual (middle and bottom) object
Head-mounted displays have the appearance of glasses whose lenses have been
replaced with two screens (one per eye) showing the virtual environment (Fig. 5.24).
The screens are usually small and light, so all head-mounted displays are mobile and
move together with the user. The illusion of depth is created by having each display
show the virtual environment from a different viewpoint. The display follows the
user’s head movements and changes the view of the virtual environment accordingly.
A frequent weakness of head-mounted displays is the delay between head move-
ment and viewpoint change, which can lead to simulator sickness. The viewing angle
of typical head-mounted displays is generally limited, while the required resolution is
very high due to the close proximity of the eyes to the screen.
Unlike head-mounted displays, which use two screens (one per eye), stationary
stereoscopic displays use only one screen. However, each eye obtains different
information from the screen.
This can be achieved in two ways:
1. The image on the screen can be spatially multiplexed and thus simultaneously
contain information for both eyes. Polarized glasses are used to remove part of
this information so that each eye only sees the image meant for that eye. The
advantage of such displays over head-mounted displays is that the same image
can be viewed by multiple people. Furthermore, polarized glasses are much lighter
and cheaper than head-mounted displays. Examples of spatially multiplexed dis-
plays are shown in Figs. 5.25 (for an LCD) and 5.26 (for a projection system).
In a projection system, two projectors are generally used, with each generating
the part of the image for one eye. Two adjacent bands always show images for
different eyes and are differently polarized. If the bands are sufficiently thin,
a viewer with polarized glasses sees a single image with depth.
2. The image shown on the screen can also be temporally multiplexed. In this
case, the screen rapidly switches between images for the left and right eyes.
Only a single image is shown in a particular moment. If the switching frequency
is very high, a viewer with active glasses perceives a single image with depth.
The active glasses are synchronized with the display and block or allow the view
of the screen depending on the currently displayed image. The best-known example
of such a display is 3D cinema, where characters and objects seem to 'jump' out of
the image. Examples of temporal multiplexing are shown in
Figs. 5.27 (LCD screen) and 5.28 (projection system).
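The temporal multiplexing scheme can be illustrated with a small scheduling sketch: frames alternate between the eyes at the display refresh rate, and the synchronized glasses open only the matching shutter, so each eye effectively sees half the refresh rate. The function name and tuple layout are illustrative, not taken from any real display API:

```python
def frame_schedule(refresh_hz, n_frames):
    """Yield (time_s, eye, (left_shutter, right_shutter)) for an actively
    multiplexed display; each eye effectively sees refresh_hz / 2 images per second."""
    period = 1.0 / refresh_hz
    shutters = {"L": ("open", "closed"), "R": ("closed", "open")}
    for i in range(n_frames):
        eye = "L" if i % 2 == 0 else "R"   # alternate left and right images
        yield (i * period, eye, shutters[eye])

for t, eye, (left, right) in frame_schedule(120.0, 4):
    print(f"t={t:.4f}s image={eye} left={left} right={right}")
```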
The displays from the previous subsection provide a three-dimensional image, but it
is limited to a single screen. The user thus cannot feel truly ‘immersed’ in a virtual
environment since he/she is surrounded by a real environment from all other di-
rections. However, better immersion can be achieved using multiple screens. The
best-known example of such a system is the CAVE (Cave Automatic Virtual
Environment)—a room in which the walls (and optionally the floor and ceiling) are
covered with screens that display the virtual environment (Fig. 5.29). Each individual
screen is a projection system just like those previously described, but different images
of the environment must be generated for the multiple screens.
Systems such as the CAVE are almost always equipped with surround sound
and motion trackers that adapt the images of the screens according to the user’s
position. They thus offer a strong feeling of presence in the virtual world. Their main
disadvantages are size and price.
Fig. 5.29 Example of a CAVE system with three projection screens. The object in the middle of
the CAVE system represents the virtual environment as perceived by the user
Autostereoscopic displays require no glasses: a parallax barrier or an array of lenticular
lenses placed in front of the screen redirects light from the screen so that each eye only
sees the light meant for that eye.
In both cases, the main weakness is that the viewer needs to stand in a very specific
location to correctly perceive depth. The weakness can be partially overcome using
head tracking and corresponding image adaptation, but this increases the cost of the
system and can only be done if there is only one viewer.
All of the displays described thus far create an image on a screen in two dimensions.
Volumetric displays go a step farther and create the image itself in three-dimensional
space. The viewer not only obtains an illusion of depth, but perceives real depth.
The image can thus also be viewed from anywhere in the room. All volumetric
displays are autostereoscopic, but deserve their own subsection due to their advan-
tages. They are currently mostly used for military and research purposes, but can be
expected to eventually achieve widespread use.
There are many types of volumetric displays. Perhaps the simplest method is a
cube-shaped grid of small lamps that are transparent when turned off. Each lamp
Fig. 5.30 Autostereoscopic display with a parallax barrier; the two polygons on the right image
show the ‘sweet spot’ from which a three-dimensional image can be seen
represents a single voxel whose brightness and color can be individually tuned, thus
building a three-dimensional image. This is also called the static method, as parts of
the device do not physically move.
A second widespread method uses a moving base on which the image is dis-
played. An example is a rapidly rotating flat or spherical screen (Fig. 5.31). If the
rotation is fast enough, a viewer perceives a three-dimensional image. The method is
computationally demanding, as the three-dimensional virtual environment must
very quickly be converted into consecutive two-dimensional images displayed on
the screen.
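The conversion for a rotating-screen display can be sketched as follows: for each angular position of the screen, the planar slice of the voxel grid that currently coincides with the screen plane must be resampled. The nearest-neighbor sampler below, with its own made-up grid layout, is only a conceptual illustration:

```python
import math

def slice_for_angle(voxels, size, angle, samples):
    """Sample a vertical plane through the center of a cubic voxel grid
    rotated by `angle` around the z axis (nearest-neighbor resampling)."""
    c = size / 2.0
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    plane = []
    for z in range(samples):            # height along the rotation axis
        row = []
        for r in range(samples):        # radial position on the screen
            # point on the rotated screen plane
            x = c + (r - samples / 2.0) * cos_a
            y = c + (r - samples / 2.0) * sin_a
            xi, yi, zi = int(x), int(y), int(z * size / samples)
            inside = 0 <= xi < size and 0 <= yi < size
            row.append(voxels[xi][yi][zi] if inside else 0)
        plane.append(row)
    return plane

# 8x8x8 grid with a single lit voxel at the center
n = 8
vox = [[[1 if (x, y, z) == (4, 4, 4) else 0 for z in range(n)]
        for y in range(n)] for x in range(n)]
sl = slice_for_angle(vox, n, 0.0, n)
print(any(1 in row for row in sl))   # the lit central voxel lies on this slice
```

In a real device this resampling must run once per angular step, per revolution, which is what makes the method computationally demanding.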
The most famous volumetric method is holography: saving and reconstructing
light rays reflected from objects. When the hologram is properly lit, it emits the
same light as the original and thus represents a three-dimensional image that can
be observed from any viewpoint. This method has been used to create static images for
decades, but was slow to appear in virtual reality. The reason for this delay is that dynamic
images are much harder to display with a hologram than static ones. The principle of
a hologram is as follows: light reflected from the object or scene we wish to record
falls onto the surface of the hologram (Fig. 5.32). The hologram is simultaneously
illuminated with a reference beam of light. The interference between both beams of
light is recorded in the hologram. When the hologram is later illuminated with the
same reference beam, diffraction of light on the recorded hologram results in the
[Fig. 5.31: a rapidly rotating screen inside a transparent dome, with projection optics
underneath, displays a virtual object. Fig. 5.32: hologram recording — light from a laser
source is split into an object beam and (via a mirror) a reference beam, and their
interference pattern is recorded. Fig. 5.33: hologram reconstruction — a reconstruction
beam diffracts on the recorded interference pattern, producing the reconstructed wave
front of the virtual object for the viewer]
same pattern of light that was reflected from the original image or scene (Fig. 5.33).
The hologram itself is not an image; it is simply a pattern that can be illuminated in
order to produce the recorded image.
Chapter 6
Acoustic Modality in Virtual Reality
Abstract Sound enhances the sense of realism in the virtual world and gives additional
information about the environment, for example, engine speed in flight simulators. By
means of sonification, information is presented in the form of abstract sound. Unlike
vision, hearing is not limited to the direction of view, but is present regardless of head
orientation. It is also not possible to temporarily disable hearing the way vision can be
temporarily disabled by closing the eyes. The temporal and spatial characteristics of
sound differ from those of visual perception: although what we see exists in space and
time, vision stresses the spatial component of the environment, whereas hearing stresses
the temporal component. In the following sections, the process of creating a virtual
acoustic environment is briefly presented, together with basic acoustic principles, human
auditory perception and recording techniques.
Sound enhances the sense of realism in the virtual world and gives additional information
about the environment, for example, engine speed in flight simulators. By means of
sonification, information is presented in the form of abstract sound. For example,
different temperatures in a building can be presented with different sound frequencies.
Sound not only attracts attention, but also helps determine the user’s location. Like
vision, hearing is a remote sense. Unlike vision, it is not limited to the direction of
view, but is present regardless of head orientation. It is also not possible to temporarily
disable your hearing the way it is possible to temporarily disable vision by closing
your eyes.
Temporal and spatial characteristics of sound differ from those of visual per-
ception. Although what we see exists in space and time, vision stresses the spatial
component of the environment. In contrast, hearing stresses the temporal compo-
nent of the environment. Since the very existence of sound is strictly tied to time,
the speed of sound reproduction is more critical than the speed of image display.
Sound can be divided into general ambient sound, sound that indicates events
and sound that enhances or replaces other sensory perceptions. Ambient sounds are
generally used to create mood in virtual reality. Sound can be presented as demateri-
alized background sound or sound from sound sources, i.e. entities in virtual space.
The determination of the location of a sound source can effectively direct user atten-
tion. For example, loud sounds can cause a person to turn in the direction from which
the sound is coming. Sound coming from a point in space can help create awareness
about the existence of a certain object. This makes it possible for objects to create
an illusion of still being present despite already being out of sight.
In the following sections, the process of creating a virtual acoustic environment is
briefly presented, together with basic acoustic principles, human auditory perception
and recording techniques.
Source modeling is a concept used to produce sound in the VAE with properties such
as directivity. The audio signals used in creation of the VAE include prerecorded
digital audio samples and synthesized sounds. The audio source signals used should
be ‘dry’, without any reverberant or directional properties. They are usually mono-
phonic and regarded as a point source. If a sound source is a stereophonic signal, it is
modeled as two point sources. The signal-to-noise ratio, sampling frequency and bit
depth should be sufficiently high that the auralization does not produce undesired effects.
6.1 Acoustic Modality 133
[figure: structure of a virtual acoustic environment — source modeling (natural audio,
synthetic audio, speech and sound synthesis, source directivity); room modeling from
the room geometry and properties of the medium (modeling of acoustic spaces,
propagation, absorption, artificial reverb; multichannel reproduction); and listener
modeling (modeling of spatial hearing, HRTFs, simple models; binaural
headphone/loudspeaker reproduction)]
Artificial reverberation can be obtained by mixing the sound with a delayed and filtered
sound that represents reflections from the walls of the room.
Sinusoidal sounds, described by mathematical sine functions, form the basis of
spectral sound synthesis. Sinusoidal sounds are basic building blocks of computer-
generated audio impressions, just as polygons are the basic building blocks of the
visual domain. Since sounds are time-dependent, the equation that determines sound
is also time-varying. The frequency modulation technique allows slightly richer
sounds than spectral synthesis. In addition to amplitude and frequency, the sound is
determined by additional parameters such as the frequency of the carrier signal, the
ratio between the modulation frequency and the carrier signal frequency, and the
modulation depth.
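A minimal sketch of the frequency modulation technique, using the classic formulation with carrier frequency, modulation frequency and modulation depth (the parameter values below are arbitrary):

```python
import math

def fm_sample(t, amp, fc, fm, depth):
    """One sample of a frequency-modulated tone:
    y(t) = amp * sin(2*pi*fc*t + depth * sin(2*pi*fm*t))."""
    return amp * math.sin(2 * math.pi * fc * t + depth * math.sin(2 * math.pi * fm * t))

fs = 8000                        # sampling frequency in Hz (arbitrary choice)
tone = [fm_sample(i / fs, 1.0, 440.0, 110.0, 2.0) for i in range(fs // 10)]
print(len(tone))                 # 800 samples = 100 ms of sound
```

Increasing the modulation depth spreads energy into more sidebands around the carrier, which is what produces the richer spectrum.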
Additive and subtractive techniques of sound creation allow signals of different
frequencies to be combined. The result therefore contains a rich combination of
frequencies. The method essentially represents the addition of several sinusoidal
signals of different frequencies with different phase shifts. The subtractive method
uses filters to attenuate certain frequency bands of an audio signal (e.g. white noise or
harmonics). Filtering techniques allow the creation of effects such as indicating the
location of the sound source or the acoustic characteristics of the room.
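The additive method can be sketched directly as a sum of sinusoids with chosen amplitudes, frequencies and phase shifts (the partials below are arbitrary examples):

```python
import math

def additive(t, partials):
    """Additive synthesis: sum of sinusoidal partials,
    each given as (amplitude, frequency_hz, phase_shift)."""
    return sum(a * math.sin(2 * math.pi * f * t + phi) for a, f, phi in partials)

# a crude harmonic tone: fundamental at 220 Hz plus two harmonics
partials = [(1.0, 220.0, 0.0), (0.5, 440.0, 0.1), (0.25, 660.0, 0.2)]
fs = 8000
tone = [additive(i / fs, partials) for i in range(800)]   # 100 ms of sound
```

A subtractive variant would instead start from a spectrally rich signal (e.g. white noise) and apply filters to attenuate unwanted bands.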
The properties of human spatial hearing are modeled with the interaural amplitude
difference, time delay and head-related impulse response (HRIR). More information
about HRIR and head-related transfer function (HRTF) can be found in Sect. 6.4.1.
Computer games and virtual reality tend to make the gaming experience as realistic
as possible. In addition to good graphics, a great emphasis is also placed on sound
effects. Computer game developers are installing the latest technology for 3D audio
reproduction in order to allow players to better immerse themselves into the game.
The development of Pure stereo and other algorithms for stereo sound reproduction
may provide a big step in the right direction for computer game and virtual reality
fans.
seems to have come. If someone hears a sound and sees the speaker's lips synchronized
with it, (s)he expects the sound to come from the direction of the speaker's mouth.
This ventriloquism effect is a strong phenomenon that helps localize sound.
In general, human localization of sound is a relatively underdeveloped ability.
Therefore, strong and unambiguous localization characteristics should be used in
virtual reality.
Basic acoustic theory deals with vibrations and propagation of sound waves through
different media and the effects caused by the propagating wave. Different areas of
expertise explore different aspects of acoustics, making it a highly multidisciplinary
science. Civil engineers are mainly interested in insulation and sound absorption in
buildings. Architects are interested in room acoustics, which is mainly related to
the study of reverberation and echo in different halls. Electroacoustical engineers inves-
tigate the accuracy of sound transmission, conversion of electrical and mechanical
energy into sound, and the design and construction of electroacoustic transducers.
Physiologists examine the function and mechanism of hearing, auditory phenomena,
and human reaction to sound or music. Psychoacoustics deals with human perception
and interpretation of sounds. Linguists study the subjective perception of complex
noises and cooperate with rehabilitation engineers in exploring the possibility of
synthetic speech generation. Recently, more and more research has been conducted
in the field of spatial sound, with filter algorithms that enable listening to 3D spatial
sound on stereo systems.
To understand different aspects of acoustics and human hearing, we must first get
acquainted with spatial sound from a physical and physiological perspective [7, 8].
Therefore, the next subsections present the basic characteristics of sound waves, the
structure and functioning of the human hearing organ, as well as some basic concepts
of room acoustics and its impact on the human sound perception.
Sound is a rapid pressure fluctuation with frequencies in the audible range of the
human ear (approximately 20 Hz–20 kHz). Sound below this frequency range is
called infrasound, while sound above this range is called ultrasound. Sound is a
mechanical wave that propagates through gases, liquids or solids that are at least
slightly compressible. The medium must therefore possess inertia and elasticity.
Inertia allows the fluctuation to transfer from one particle to the next, while
elasticity returns each particle to its steady state. Sound is always a longitudinal
wave motion in gases and liquids, but can also appear as a transverse wave in solids.
[Fig. 6.3: an element of the medium of length dx is deformed to dx + du]
Any disturbance in a fluid medium is converted into fluid movement in the
direction of wave propagation, producing small changes of pressure and density that
fluctuate around the equilibrium state. When these compressions and rarefactions
travel along the medium, they cause space- and time-dependent changes in pressure
and density.
During compression, density and pressure are increased. During rarefaction, they
are decreased. A relationship between pressure values and deformation can also be
established in a segment of a medium with equilibrium density ρ0 and volume V0.
Equation (6.1) applies

m = ρ0 V0 .  (6.1)

If the volume changes by dV, the density also changes by dρ (Fig. 6.3). Considering
Eq. (6.1) and neglecting the second-order term dρ dV, we can write

m = (ρ0 + dρ)(V0 + dV),  (6.2)

dρ/ρ0 = −dV/V0 .  (6.3)

The expression dV/V0 defines the local relative change in volume. The sound field
can be described by the velocity of the particle movements. We use Newton’s law of
F = m a, which can be written for fluids as
∂p/∂x = −ρ dv/dt .  (6.4)
This is actually the Euler equation for fluid physics in one-dimensional space. In
general, it can be written
∇p = −ρ dv/dt .  (6.5)
Fig. 6.4 Long cylindrical tube with cross-section A, density ρ and pressure p with piston on the
left side
Pressure p on the left side of a pipe, Fig. 6.4, with cross-section A is increased
with the help of a piston by dp. A compression therefore arises near the piston. The
compression moves to the right with the speed of c. The average speed of the piston
and the compression after the time dt is v. The momentum of air G has in the time dt
increased by dG = dm v = A c dt ρ v [9]. Impulse equals the momentum change:
the compressive impulse is F dt = A dp dt = dG = A c dt ρ v, from which dp = ρ c v.
The average speed reached by particles under pressure difference dp depends on the
substance’s compressibility. The initial volume of material was V = A c dt, but has
shrunk during time dt by d V = A v dt. Compressibility of the material κ is, by
definition,
dV/V = −κ dp .  (6.6)
It follows that κ dp = (A v dt)/(A c dt) = v/c; combining this with dp = ρ c v gives
κ ρ c² = 1. We have thus obtained the speed of sound

c = 1/√(ρ κ) .  (6.7)
In air, the speed of sound (the speed at which the sound wave propagates through the
medium) is given by the equation

c = √(ϑ p / ρ) ,  (6.8)

where ϑ is the ratio of the specific heat of air at constant pressure c_p to the specific
heat at constant volume c_v , p is the pressure in N/m² and ρ is the density in kg/m³.
At room temperature of 20 ◦ C and normal atmospheric pressure of 1,000 mbar and
60 % relative humidity, the speed of sound is 344 m/s. At different temperatures, the
speed of sound can be analytically calculated using the equation
c = c20 √(1 + (T [◦C] − 20) / 273) .  (6.9)
For a quick approximate calculation we can assume that the speed increases by
0.6 m/s per each degree Celsius.
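Equation (6.9) and the 0.6 m/s-per-degree rule of thumb can be compared directly, taking c20 = 344 m/s from the text:

```python
import math

C20 = 344.0  # speed of sound at 20 degrees Celsius, m/s (value from the text)

def speed_exact(t_celsius):
    """Eq. (6.9): c = c20 * sqrt(1 + (T - 20) / 273)."""
    return C20 * math.sqrt(1 + (t_celsius - 20) / 273)

def speed_approx(t_celsius):
    """Rule of thumb: +0.6 m/s per degree Celsius relative to 20 degrees."""
    return C20 + 0.6 * (t_celsius - 20)

for t in (0, 20, 35):
    print(t, round(speed_exact(t), 1), round(speed_approx(t), 1))
```

Over everyday temperatures the two values differ by less than 1 m/s.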
The wave equation describes the sound pressure change as a function of place and
time. For one dimension, it has the form
∂²p/∂x² − (1/c²) ∂²p/∂t² = 0 ,  (6.10)
where p is the acoustic pressure and c is the speed of sound in the air. If speed c is
constant, then the general solution of Eq. (6.10) is equal to

p(x, t) = f (ct − x) + g(ct + x),  (6.11)

where f and g are functions that can be differentiated twice. For a sine wave traveling
in one dimension, we take one of the functions f or g as a sine function and set the
other to zero. We get a solution in the form

p(x, t) = p0 sin(ωt − kx),  (6.12)

where ω is the angular frequency and k = ω/c is the wave number.
The sound source performs work by exciting the oscillation of particles in the matter.
The particle oscillation spreads through the substance in the form of sound energy.
Sound wave energy is composed of the kinetic and potential energy of the particles
oscillating under the influence of waves. For a sinusoidal wave, the wave energy per
volume unit of material (energy density) through which the wave spreads is
I = ρ c ω² y0² / 2 .  (6.14)
Sound wave energy flux density is proportional to the square of the amplitude
and frequency of the oscillating particles. Sound intensity I is preferably expressed
with the amplitude of the pressure difference dp rather than the offset amplitude y0.
Considering the relationship between pressure and amplitude, dp = ρ c ω y0, so that
(dp)² = ρ² c² ω² y0², I = (dp)² / (2 ρ c) and p_ef = dp/√2, we get

I = p_ef² / (ρ c) ,  (6.15)
where p_ef is the root mean square sound pressure in N/m², ρ is the density in kg/m³
and c is the speed of sound in m/s. At room temperature and normal atmospheric
pressure, the density is ρ = 1.21 kg/m³ and the speed of sound is 343 m/s; at the
reference pressure p_ef = 2 · 10⁻⁵ N/m², the sound intensity is about 10⁻¹² W/m².
The density of the sound energy is the energy per one volume unit in the observed
substance. The potential energy of sound waves comes from the substance displace-
ments and the kinetic energy comes from the particle movement. If there are no
losses, the sum of the two energies is constant. The current density of the sound
energy is

E_tr = ρ ẋ² + p0 ẋ / c .  (6.16)

The average density of the sound energy is

E_pop = (1/2) ρ ẋ² ,  (6.17)

where ρ is the instantaneous density in kg/m³, p0 is the static pressure in N/m²,
ẋ is the speed of the particle in m/s, and c is the speed of sound in m/s.
6.2 Fundamentals of Acoustics 143
The specific acoustic impedance of matter is defined as the ratio of sound pressure
and particle velocity
z = p/v ,  (6.18)
where p is sound pressure in N/m2 and v is the speed of the particle in m/s. In a
standing wave, the acoustic impedance varies from point to point. In general, it is a
complex number
z = r + ix,  (6.19)

where r is the resistance and x is the reactance.
A logarithmic scale is used for sound power, intensity and pressure measurements
because of the large measurement range. The sound power level (PWL) is defined as

PWL = 10 log (W / W0) ,  (6.20)

where W0 is the reference sound power. The sound intensity level (IL) is defined
analogously as

IL = 10 log (I / I0) ,  (6.21)

where the standard reference sound intensity is I0 = 10⁻¹² W/m². The sound pressure
level (SPL) is defined as

SPL = 20 log (p / p0) .  (6.22)

The intensity spectrum level (ISL) and pressure spectrum level (PSL) are defined as

ISL = 10 log (I / (I0 Δf )) ,  (6.23)
Spectrum intensity level is the noise level at some frequency, defined as the level
of noise intensity in a given frequency band width of 1 Hz and with a centre frequency
of f . PSL is defined similarly to SPL, within a frequency band of 1 Hz.
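The level definitions can be sketched directly, with the reference values given in the text (I0 = 10⁻¹² W/m², p0 = 2 · 10⁻⁵ N/m²; the reference power W0 = 10⁻¹² W is an assumption by analogy):

```python
import math

I0 = 1e-12   # reference sound intensity, W/m^2 (from the text)
P0 = 2e-5    # reference sound pressure, N/m^2 (from the text)
W0 = 1e-12   # reference sound power, W (assumed by analogy)

def pwl(w):
    """Eq. (6.20): sound power level in dB."""
    return 10 * math.log10(w / W0)

def spl(p):
    """Eq. (6.22): sound pressure level in dB."""
    return 20 * math.log10(p / P0)

def isl(i, bandwidth_hz):
    """Eq. (6.23): intensity spectrum level of a band of the given width."""
    return 10 * math.log10(i / (I0 * bandwidth_hz))

print(spl(2e-5))   # 0.0: the reference pressure itself
print(spl(0.2))    # approximately 80 dB
```

Note the factor of 20 for pressure: intensity is proportional to the square of pressure, so the two scales coincide at the reference values.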
We roughly distinguish three methods for measuring acoustic power in air: free field
measurements, diffuse field measurements and the comparative method.
Total power emitted by the source in an open, free field is obtained by integrating
the sound intensity all over the area that surrounds the origin. In practice, a sphere
is used as the measuring surface and is closed with a reflective bottom. The sound
source is in the centre of the hemisphere, and the radius should be at least twice the
largest dimension of the source. The problem is the choice of measuring locations,
especially if the source emits pure tones or if it is directional. In that case, a large
number of measuring points should be selected.
Sound excited in a closed reflective space causes the occurrence of diffuse sound
fields. In this case, the specific measurement surface can no longer be defined. The
power emitted by the source in the oscillating state equals the power absorbed by the
walls of the room.
Acoustic power is determined by comparing the measured sound source to a
reference sound source with a known power level or acoustic power level. The emitted
signal must be broadband and spectrally evenly distributed and should not have strong
directional characteristics. The measurement procedure can be carried out in a closed
or open area.
Acoustic power determination through the intensity measurement: The method is
based on the fact that there is a link between the intensity of the sound I and cross
power density K 12 (ω) between two sound pressure values, measured by two micro-
phones, placed at some distance apart.
In addition to this method, another method is also used, in which the gradient of the
sound pressure phase in the sound field is determined. Practical implementation is
possible using a single microphone. In this manner, problems due to phase differences
in the two-channel version can be avoided.
Records concerning echo and sound absorption can be found in manuscripts from
the Middle Ages. The founder of modern acoustics, Wallace Clement Sabine (1868–
1919), was a physicist who began acoustic development at Harvard University in
order to improve classroom acoustics. With the help of some organ pipes (used as
sound sources), a stopwatch and his skilled hearing, he conducted the first scientific
measurements in the history of acoustics.
Various factors have an impact on room acoustics or on the desired sound trans-
mission from the acoustic source to the listener: the room volume, the room shape
and the distribution of different absorbing materials. Important room acoustic para-
meters are the reverberation time RT60 , Eq. (6.25), and the succession of the first
acoustic reflections.
In an open space, we usually deal only with the direct sound field. The basic
characteristic of open space is that sound pressure drops by 6 dB when the distance
is doubled (law 1/r). In confined spaces, the impact of the diffuse sound field caused
by reflections and scattering from the walls is also important.
The reverberation time RT60 is defined as the time in which the original sound
energy falls to the millionth part of its value (a drop of 60 dB). Since 1900, different
equations for RT60 calculation have been proposed. The most prominent is still the
Sabine’s equation
RT60 = (4 ln(10⁶) / c) · V/(S a) = 0.161 V/(S a)  (6.25)

and

a = (Σ_{i=1}^{n} Si ai) / S ,  (6.26)
where V is the volume of the space in m³, c is the speed of sound in the air, S is the
total surface area in m², Si is the area of the individual surface in m², ai is the
absorption coefficient of that surface, and a is the mean absorption coefficient. The absorption coefficient
a i is, by definition, the proportion of absorbed acoustic power with respect to the
incident acoustic power.
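Sabine's equation lends itself to a direct sketch; the room dimensions and absorption coefficients below are made-up examples:

```python
def mean_absorption(surfaces):
    """Eq. (6.26): area-weighted mean absorption coefficient.
    `surfaces` is a list of (area_m2, absorption_coefficient) pairs."""
    total_area = sum(s for s, _ in surfaces)
    mean_a = sum(s * a for s, a in surfaces) / total_area
    return mean_a, total_area

def rt60(volume_m3, surfaces):
    """Sabine's equation (6.25): RT60 = 0.161 * V / (S * a)."""
    a, s = mean_absorption(surfaces)
    return 0.161 * volume_m3 / (s * a)

# hypothetical 5 x 4 x 3 m room
surfaces = [(2 * (5 * 3 + 4 * 3), 0.05),    # walls, lightly absorbing
            (5 * 4, 0.30),                   # carpeted floor
            (5 * 4, 0.10)]                   # ceiling
print(round(rt60(5 * 4 * 3, surfaces), 2))   # roughly 0.9 s
```

Adding absorbing material (a larger Si ai product) shortens the reverberation time proportionally.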
The human ear is divided into three functional parts: the external, middle and inner
ear. Figure 6.5 shows the cross section and allocation of the ear.
The outer ear consists of the pinna and the external auditory canal. The pinna
consists of skin and cartilage and serves to guide the waves into the external auditory
canal.
[Fig. 6.5 labels: pinna, cartilage, external ear canal, eardrum, malleus, incus, stapes,
oval and round windows, cochlea, three semicircular canals, cochlear, equilibrium and
facial nerves, internal ear canal]
The inner ear consists of two sensory organs: a body balance organ and a cochlea.
The cochlea is a tube wrapped two and a half times around a bone pillar. The cochlea
cavity is divided into two parallel channels: the upper (or atrial) channel and the lower
(or eardrum) channel. The atrial channel is connected to the atrium and through the
oval window to the middle ear. The eardrum channel is adjacent to the middle ear
through the round window. The cochlea is made up of three sections, two of which
are filled with perilymph. In the middle of them is the endolymph-filled cavity.
The channels are separated from each other by thin membranes (Reissner and basilar).
The organ of Corti lies on the basilar membrane and consists of sensory hair cells,
neurons and several types of supporting cells. It is named after its discoverer, Alfonso
Corti (1822–1876). Sensory hair cells are connected with the nerve fibers of the
auditory nerve.
Active mechanics: Operation of a living cochlea, as opposed to a dead one, depends
on the active mechanical process with a positive feedback loop that amplifies the
basilar membrane response. The amplification is carried out by the outer sensory
hair cells. Most of the information (90–95 %) comes from the cochlea to the brain
via the inner sensory hair cells, although damage to the outer hair cells also leads to
hearing loss.
6.3.2 Loudness
Loudness is the extent of auditory sensations caused by sound reaching the human ear.
Vibrational energy is a physical property while loudness is based on psychological
interpretation. Loudness is therefore a subjective quantity and as such cannot be
accurately measured. Usually a relative measure based on a logarithmic ratio of two
amplitudes is used. Humans can hear sounds only in a certain range of frequencies and
sound pressure levels. The human hearing range is from 20 Hz to 20 kHz, but the
upper limit decreases with aging.
Figure 6.6 shows equal-loudness contours: tones along each contour are perceived as
equally loud. Equal-loudness contours were first published in 1933 by Fletcher and Mun-
son [10] and later by Churcher and King [11], but these measurements show small
discrepancies. In 1956 they were corrected by Robinson and Dadson using a new
contour for the lower auditory threshold [12]. Free field equal-loudness contours
were standardized by ISO 226:1987, Fig. 6.6, and later revised in 2003. The ratio of
sensitivity to high and low tones depends on the intensity of the sound waves. Max-
imum human ear sensitivity is in the range of 2–5 kHz. The external auditory canal
has a resonance frequency at approx. 3 kHz. Subjective sound impression depends
on the frequency content or spectrum of the sound and on the sound amplitude.
Since the human ear perceives sound waves of the same power as differently loud
depending on their frequency content, it is necessary to introduce the loudness level
with the unit phon.
Phon is an acoustic measure used to indicate the general loudness level of noise.
A pure tone with a frequency of 1000 Hz at a sound intensity level IL = 1 dB has,
by definition, a loudness level of one phon. All other tones have a loudness level of n
phons if the ear judges them to be as loud as a pure tone of frequency 1000 Hz with
a sound intensity level of n dB. A tone with a frequency of 500 Hz and a loudness
level of 40 phons thus sounds as loud as any other tone of 40 phons with a freely
chosen frequency.
Sone is an acoustic measure used in determining sound loudness. It is used
to compare and classify the loudness of different sounds based on the way the ear
hears them. By definition, a pure tone with a frequency of 1000 Hz has a loudness
of one sone at a sound intensity level of 40 dB. A loudness of 1 millisone
represents the hearing threshold.
The loudness level (LL) is defined by Eq. (6.27) as

LL = 10 log(I / 10^−12), (6.27)

where I is the intensity of sound in W/m². The link between loudness S in sones
and loudness level P in phons can be written as

S = 2^((P−40)/10). (6.28)
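Equations (6.27) and (6.28) are easy to verify numerically. A minimal sketch in Python (function names are ours, chosen for illustration):

```python
import math

I0 = 1e-12  # reference sound intensity in W/m^2 (lower auditory threshold)

def loudness_level_db(I):
    """Sound intensity level in dB, Eq. (6.27): LL = 10 log(I / 1e-12)."""
    return 10 * math.log10(I / I0)

def phon_to_sone(P):
    """Loudness in sones from loudness level P in phons, Eq. (6.28)."""
    return 2 ** ((P - 40) / 10)

# For a 1 kHz pure tone the loudness level in phons equals the intensity level in dB.
print(round(loudness_level_db(1e-12)))  # 0 dB: hearing threshold
print(round(loudness_level_db(1.0)))    # 120 dB: near the pain threshold
print(phon_to_sone(40))                 # 1.0 sone, by definition
print(phon_to_sone(50))                 # 2.0 sones: +10 phons doubles loudness
```

Note how the exponential form of Eq. (6.28) encodes the rule of thumb that every additional 10 phons doubles the perceived loudness.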
Fig. 6.6 Equal-loudness contours: sound pressure level (SPL, dB) as a function of frequency (Hz) for different loudness levels in phons
The eardrum withstands pressure differences up to a maximum of about 1 mbar
without pain. The energy that must reach the ear at a certain frequency for a tone
to still be heard was measured after World War I. The lower auditory threshold
(minimum sound intensity) to excite sensation at 3 kHz was established at around
10^−12 W/m². On the other hand, the sound intensity that the ear can still handle
without pain is around 1 W/m². The human ear is therefore a very sensitive
instrument, detecting sound waves across 12 orders of magnitude of sound intensity.
However, the sensitivity of the human ear is not the same across this whole range.
Experiments show that sensitivity is proportional to the logarithm of sound
intensity, Fig. 6.7.
The sensitivity of the ear at the reference sound intensity L0 = 10^−12 W/m² is zero
because the sound is too weak to be perceptible to the ear. The sound level range
between the lower threshold and the threshold of pain is divided into 130 phons.
To define the spatial dimensions of sound perception, the listener’s head is placed
in a coordinate system as shown in Fig. 6.8. Angle ϕ represents the azimuth and δ
represents the angle in the vertical plane (elevation).
The ability of the human ear to identify and localize the direction of a sound source
with great precision is called auditory localization or binaural audition. Auditory
localization is conditioned by the difference of sound intensity in both ears and is
caused by diffractions, reflections and the phase difference in the sound that comes
at different times to both ears.
A person is able to determine the direction of a sound source, except in the case of
a plane wave in a free field coming from the direction where ϕ = 0◦ . When the sound
source is right in front of us, sound waves reach the left and right ear simultaneously.
The direction then cannot be accurately determined. As soon as the head is moved
out of the symmetry plane, the sound wave arrives at one ear earlier and the direction
can be obtained from the timing difference.
Our hearing apparatus uses a number of sound properties in order to determine
the origins of sound. These properties are the result of the propagation of sound
waves from the sound source to the listener’s eardrum. The head, being a natural
barrier in the sound field, causes reflections and refractions of sound waves. With
its geometry, it has an additional impact on the sound field at higher frequencies,
where its dimensions and shape become comparable to the sound wavelength.
The influence of the room on the sound wave is expressed through absorption,
reflection, refraction and interference caused by objects located on the path between
the sound source and the listener. The influences of the surroundings are important because they
give information about the distance from the sound source to the listener. Reflections,
diffractions, refractions and interference of sound waves due to the listener’s body
(especially shoulders, head and ears) have a key impact on determining the sound
source direction (azimuth and elevation). All this causes a difference between the
sound pressure coming to the left and sound pressure coming to the right ear. These
differences also depend on the direction from which the sound comes to the human
head (ϕ and δ).
Human ears are about 18 cm apart, allowing sound direction to be determined and
enabling stereo hearing. When the sound source is outside the listener's frontal plane,
time and intensity differences occur between the ears as the sound to one ear travels
around the head [13]. The model used to calculate the time difference is shown in
Fig. 6.9. The time difference between the ears, Δt, is
Δt = (rϕ + r sin(ϕ))/c = r(ϕ + sin(ϕ))/c, (6.29)
Fig. 6.8 The head placed in the sound perception coordinate system: ϕ = 0°, δ = 0° corresponds to straight ahead, ϕ = 180° to directly behind
6.4 The Spatial Characteristics of Hearing 151
where r is the head radius, ϕ is the deviation of the sound source from the frontal
plane and c is the speed of sound. The maximum time difference occurs at an angle
of ϕ = 90° or π/2 radians. Taking into account Eq. (6.29), it is equal to

Δt_max = r(π/2 + 1)/c ≈ 0.67 ms (6.30)

for r = 0.09 m and c = 344 m/s.
This small time difference, which varies with the angle ϕ, defines the sound wave
phase difference and thereby the frequency by which the sound source direction can
be determined. The phase difference φ can be calculated as
φ = 2π f · r(ϕ + sin(ϕ))/c. (6.31)
When the phase difference becomes greater than 180◦ or π radians, the sound
direction becomes indistinguishable with this method since at that angle there are
two possible sound source positions on the left and on the right side. Considering
this limit, we can calculate the maximum frequency at a certain angle ϕ:

f_max(ϕ) = c / (2r(ϕ + sin(ϕ))) = c / (2 · 0.09 m · (ϕ + sin(ϕ))). (6.32)
This means that at an angle of ϕ = 90◦ , the highest frequency for which we can
still determine the direction of the sound origin is 743 Hz. For smaller angles ϕ the
maximum frequency increases [13].
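Equations (6.29)–(6.32) can be checked with a few lines of Python (a sketch assuming a head radius r = 0.09 m and speed of sound c = 344 m/s, as in the text):

```python
import math

r = 0.09   # head radius (m)
c = 344.0  # speed of sound (m/s)

def itd(phi):
    """Interaural time difference, Eq. (6.29): dt = r*(phi + sin(phi))/c."""
    return r * (phi + math.sin(phi)) / c

def phase_difference(f, phi):
    """Phase difference between the ears in radians, Eq. (6.31)."""
    return 2 * math.pi * f * itd(phi)

def f_max(phi):
    """Highest frequency with an unambiguous phase cue, Eq. (6.32)."""
    return c / (2 * r * (phi + math.sin(phi)))

print(round(itd(math.pi / 2) * 1e3, 2))  # 0.67 ms: maximum ITD, Eq. (6.30)
print(round(f_max(math.pi / 2)))         # 743 Hz, matching the text
```

At f_max the phase difference reaches exactly π radians, which is the ambiguity limit described above.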
Fig. 6.9 Model for calculating the time difference between the ears: the path difference to the farther ear is rϕ + r sin(ϕ)
The second method of sound direction perception originates in the sound intensity
difference caused by the shading effect. When the sound source is outside the frontal
plane, the amplitude of the sound is reduced (shaded) at the ear that is further away
from the source. Experiments have shown that the amplitude ratio between the two
ears varies sinusoidally between 0 and 20 dB, depending on the angle of the sound
source and the frequency or wavelength. When the wavelength is longer than the
object (head), the sound wave refracts and diffracts around the object. The wavelength
therefore has virtually no impact on the wave spreading. At wavelengths smaller than
the object, there is almost no refraction and the object has a huge impact on the wave
propagation. The size of the object at which the wavelength becomes an important
factor in sound wave propagation is about two thirds of the sound wavelength (2/3 λ),
although the scattering of sound waves already starts at frequencies an octave below.
That means that the minimum frequency at which we are able to detect the direction of
the sound source is when the diameter of the head is about one third of the wavelength
(1/3 λ). Taking into account the width of the head d = 18 cm and the angle at which
the opposite ear is most shadowed (ϕ = π / 2), the minimum frequency equals
f_min(ϕ=π/2) = (1/3)(c/d) = (1/3)(344 m/s / 0.18 m) = 637 Hz. (6.34)
It follows that direction perception based on intensity differences is useful at
higher frequencies, while direction perception based on timing differences is useful
at low frequencies. Both methods fail when the sound source lies at the same angle ϕ
in front of or behind the listener, because Eq. (6.29) gives the same time difference
Δt in both cases.
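The crossover between the two cues, often called the duplex theory of localization, follows directly from Eqs. (6.32) and (6.34); a short sketch (using d = 0.18 m and c = 344 m/s from the text):

```python
c = 344.0  # speed of sound (m/s)
d = 0.18   # width of the head (m)

# Eq. (6.34): the intensity (shading) cue works above roughly f_min = c / (3 d).
f_min_intensity = c / (3 * d)

# Eq. (6.32) at phi = 90 deg: the timing (phase) cue works below roughly 743 Hz,
# so the two mechanisms overlap in the mid-frequency range.
f_max_timing = 743.0

print(round(f_min_intensity))  # 637 Hz
```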
The complex form of the pinna causes delays in the received audio signal, which
are a function of sound source direction in all three dimensions. The delays are so
small that this effect becomes significant only at frequencies above 5 kHz. Another
equally important way of determining the direction of the sound source is by turning
the head. In this way, the sound signals coming from the back move in a different
direction compared to the audio signals coming from the front [13].
All of the above methods of sound direction detection create an impression of
three-dimensional sound in the listener’s head—binaural listening.
The listener’s ability to determine the direction of the incoming sound is based on
the difference of the sound pressure that comes to the left and to the right ear. When the
sound comes to our ears from one direction, time and intensity differences of sound
pressure levels are created. Time differences in the sound pressure level between the
ears are called interaural time differences. The dependence of interaural delay on
the azimuth angle ϕ for an average sized head is shown in Fig. 6.10. This of course
applies to the sound wavelengths, which are large in relation to the head’s dimensions
(the distance between the ears). The direction of the sound source is determined on
the basis of the phase delay between the ears (depending on the wavelength). This
time difference is large enough for our brains to determine the direction from which
the sound is coming. The human ear can detect a time difference of 30 µs, which
corresponds to a change in the sound source azimuth of about ϕ = 3°.
For phase delays larger than 180◦ of phase angle or π radians, the direction
becomes indistinguishable as there are two possible sound source positions: on the
left or on the right side of the head. This occurs at a frequency of about 1 kHz.
Let us look at the situations at 40 Hz and at 1000 Hz when the distance between
the ears is L = 0.18 m and the sound arrives from the direction ϕ = 90°, δ = 0°:

f1 = 40 Hz: λ = c/f1 = (340 m/s)/(40 Hz) = 8.5 m,
φ = (L/λ) · 360° = (0.18 m / 8.5 m) · 360° = 7.6°;

f2 = 1 kHz: λ = c/f2 = (340 m/s)/(1000 Hz) = 0.34 m,
φ = (L/λ) · 360° = (0.18 m / 0.34 m) · 360° = 190.6°.
At low frequencies (below 100 Hz), the phase differences are so small that the
sound source direction cannot be determined. This means that the sound pressure
levels on both ears are almost identical. Our brain thus cannot determine the direction
of the sound (Fig. 6.11). This property is exploited in multi-channel loudspeaker
systems, where only one subwoofer is required. Similarly, when compressing stereo
audio data, information from the low-frequency spectrum can be combined into one
channel.
The head acts as a barrier for sound whose wavelength is comparable to or smaller
than the head's dimensions. Therefore, differences in the sound pressure level
occur between the ears.
Fig. 6.10 Interaural time delay (in ms) as a function of the azimuth angle
Fig. 6.11 The phase delay between the ears at a frequency of 40 Hz (a) and 1 kHz (b)
Fig. 6.12 Relative sound amplitude (dB) versus frequency (a) and sound amplitude versus time (b) at the right and left ear
Fig. 6.13 The localization accuracy in the middle plane (left) and horizontal plane (right) [15]
Localization accuracy in relation to elevation and distance has so far been
relatively poorly explored, while localization in azimuth has been studied far better [16].
The human auditory system has limited ability to determine the distance of a sound
source, see Fig. 6.14 [15]. To assess the distance from the sound source, human hear-
ing relies on the property that high-frequency sound is more strongly attenuated in the
air than low-frequency sound. Distant sounds thus have more pronounced bass than
treble, and we hear remote sources more weakly than surrounding sound
sources. If the sound source or the listener is moving, the Doppler effect arises. The
relationship between the direct and the reflected sound gives us information about
the distance to the sound source. The impression of distance disappears when tones
in the sound field last for a long time.
In a closed space, the ratio between the direct and the reflected sound assists
us in determination of the sound source direction. As a result of binaural listening,
human hearing has the ability to distinguish direct sound from reflected sound and
automatically gives more weight to the direct sound.
Fig. 6.14 The sense of the sound source distance for sound pulse in the horizontal plane and
azimuth 0◦ [15]
In a complex sound image, auditory masking occurs. It can occur in the frequency
domain (simultaneous, frequency or spectral masking) or in the time domain (tem-
poral masking or non-simultaneous masking).
Human hearing range is limited by the lower auditory threshold and the upper
pain limit. Human hearing cannot distinguish loudness differences smaller than 1 phon.
Detailed measurements have shown that audio sources in the sound field which are
more than 15 phons weaker than the loudest sound source can be ignored. However,
if two equally loud sound sources are present in the sound field, the overall loudness
level of the two can rise above the theoretical value of 3 dB, by up to 10 dB.
Due to the nonlinearity of human hearing, a second scale has also been introduced:
a loudness of n sones corresponds to an n-times greater impression of loudness.
At low loudness, sones increase more slowly than phons. When two distinct tones are
present in the sound field and the stronger does not mask the weaker one, the total
loudness can be approximated by the sum of their loudnesses in sones.
As soon as a tone is heard, the sensitivity for weaker tones at different frequencies
is lowered. Therefore the hearing threshold rises, see Fig. 6.15. For frequencies above
1200 Hz, the masked tones must be amplified by 40–50 dB to become audible again.
If the loudness of a tone mixture (music) is increased, lower-frequency tones are
heard better.
Time perception of sound: audio tones of the same power are not perceived as equally
loud if they are heard for different amounts of time. Measurements have shown that
a tone makes the loudest impression when listened to for between 0.5 and 1.5 s.
The overlap effect: When we speak, sound waves are conducted to our own ear
both via bones and via air. These two channels overlap, which is why we do not
recognize our own voice recorded on a tape, even though others find it a faithful
reproduction.
Fig. 6.15 Auditory masking with the primary tone at the frequency of 1200 Hz and amplitude of
80 dB
The amazing flexibility of the brain and its ability to compensate for missing senses
is reflected in the fact that humans, like some animals (bats, dolphins, …), can
develop echolocation. Echolocation is a way of navigating or observing the
environment by means of reflected sound: sound reflects off objects, and the brain
builds a picture of the environment from the reflections.
In two-channel stereo, the left and right channel signals X and Y are related to the
mid (mono) signal M and the side signal S by

M + S = X, (6.35)
M − S = Y, (6.36)
(X + Y)/2 = M, (6.37)
(X − Y)/2 = S. (6.38)
This stereo technique is compatible, meaning that monophonic reception is
acoustically as accurate as an optimal monophonic recording.
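The sum/difference relations above amount to a mid-side matrix; a minimal Python sketch (using the normalization M = (X + Y)/2, S = (X − Y)/2, with function names of our own choosing):

```python
def ms_encode(x, y):
    """Left/right sample pair (X, Y) to mid/side (M, S)."""
    return (x + y) / 2, (x - y) / 2

def ms_decode(m, s):
    """Mid/side back to left/right: X = M + S, Y = M - S."""
    return m + s, m - s

x, y = 0.8, 0.2              # one sample of a stereo signal
m, s = ms_encode(x, y)
xr, yr = ms_decode(m, s)
print(round(m, 3), round(s, 3))    # 0.5 0.3
print(round(xr, 3), round(yr, 3))  # 0.8 0.2: the matrix is invertible
# A mono receiver simply plays M, the average of both channels,
# which is why the technique is mono-compatible.
```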
6.5 Recording Techniques 159
A dummy human head (mannequin head) with microphones built in the ear canals
is used for binaural recording. It must be reproduced with headphones to create 3D
stereo sensation for the listener and give the impression of actually being in the
acoustic scene. Unlike the 'pure stereo' playback technique, the binaural technique
does not require any additional filtering. If played back over loudspeakers, the 3D
spatial information is corrupted due to crosstalk.
References
Abstract The chapter covers topics relevant for the design of haptic interfaces and
their use in virtual reality applications. It provides knowledge required for under-
standing complex force feedback approaches and introduces general issues that must
be considered when designing efficient and safe haptic interfaces. Human haptics,
mathematical models of the virtual environment, collision detection, force rendering
and control of haptic devices are the main theoretical topics covered in this chapter, which
concludes with a summary of different haptic display technologies.
The word haptic originates from the Greek verb hapto—to touch—and therefore
refers to the ability to touch and manipulate objects. The haptic experience is based
on tactile senses, which provide awareness of the stimuli on the surface of the body,
and kinesthetic senses, which provide information about body pose and movement.
Its bidirectional nature is the most prominent feature of haptic interaction, which
enables exchange of (mechanical) energy—and therefore information—between the
body and the outside world.
The word display usually emphasizes the unidirectional nature of transfer of
information. Nevertheless, in relation to haptic interaction, similar to visual and
audio displays, the phrase haptic display refers to a mechanical device for transfer
of kinesthetic or tactile stimuli to the user.
Virtual environments that engage only the user’s visual and auditory senses are
limited in their ability to interact with the user. It is desirable to also include a haptic
system that not only transmits sensations of contact and properties of objects, but
also allows their manipulation. The human arm and hand allow objects to be pushed,
grasped, squeezed or hit, they enable exploration of object properties such as surface
texture, shape and compliance, and they enable manipulation of tools such as a pen or
a hammer. The ability to touch, feel and manipulate objects in a virtual environment,
augmented with visual and auditory perception, enables a degree of immersion that
otherwise would not have been possible. The inability to touch and feel objects, either
Fig. 7.1 Haptic system: interaction between a human and the haptic interface represents a bidi-
rectional exchange of information—a human operator controls the movement of a slave system as
well as receives information about the forces and movements of the slave system through the haptic
interface
to the normal force to the surface of the object. The computed or measured force
or displacement is then transmitted to the user through the haptic interface. A local
feedback loop controls the movement of the haptic interface so that it corresponds
to the measured or computed value.
From the block scheme in Fig. 7.1, it is clear that the interaction between a human
and the haptic interface represents a bidirectional exchange of information—a human
operator controls the movement of a slave system as well as receives information
about the forces and movements of the slave system through the haptic interface.
The product of force and displacement represents mechanical work accomplished
during the haptic interaction. Bidirectional transfer of information is the most char-
acteristic feature of haptic interfaces compared to display of audio and visual images.
The following sections provide a general course on haptics in virtual reality. More
information can be found in [4].
Haptic perception represents active exploration and the process of recognizing objects
through touch. It relies on the forces experienced during touch. Haptic perception
involves a combination of somatosensory perception of patterns on the skin surface
and kinesthetic perception of limb movement, position and force. People can rapidly
and accurately identify three-dimensional objects by touch. They do so through the
use of exploratory procedures, such as moving the fingers over the outer surface of
the object or holding the entire object in the hand. The concept of haptic perception is
related to the concept of extended physiological proprioception, according to which
a tool held in the hand is perceived as an extension of one's own body.
164 7 Haptic Modality in Virtual Reality
Haptic devices receive motor commands from the user and display the image of
force distribution to the user. A haptic interface should provide a good match between
the human haptic system and the device used for sensing and displaying haptic
information. The primary input-output (measured and displayed) variables of the
haptic interface are movement and force (or vice versa) with their spatial and temporal
distributions. Haptic devices can therefore be treated as generators of mechanical
impedance, which represents the relation between the force and movement (and their
derivatives) in various positions and orientations. When displaying contact with a
finite impedance, either force or movement represents the excitation while the
remaining quantity represents the response (if force is the excitation, then movement
is the response, and vice versa), which depends on the implemented control algorithm. Consistency
between the free movement of hands and touch is best achieved by taking into account
the position and movement of hands as excitation and resultant vector of force and
its distribution within the area of contact as response.
Since a human user senses and controls the position and force displayed by
a haptic device, the performance specifications of the device directly depend on
human capabilities. In many simple tasks that involve active touch, either tactile or
kinesthetic information is of primary importance while the other is only complemen-
tary information. For example, when trying to determine the length of a rigid object
by holding it between thumb and index finger, kinesthetic information is essential
while tactile information is only supplementary. In this case, the crucial ability is
sensing and controlling the position of the finger. On the other hand, perception of
texture or slipperiness of the surface depends mainly on tactile information while
kinesthetic information only supplements tactile perception. In this case, perceived
information about temporal-spatial distribution of forces provides a basis for perceiv-
ing and inferring the conditions of contact and characteristics of the surface of the
object. In more complex haptic tasks, however, both kinesthetic and tactile feedback
are required for correct perception of the environment.
Due to hardware limitations, haptic interfaces provide stimuli that only approx-
imate interaction with a real environment. However, this does not mean that an
artificially synthesized haptic stimulus does not feel realistic. Consider the analogy
with a visual experience of watching a movie. Although visual stimuli in the real
world are continuous in time and space, visual displays project images with a fre-
quency of only about 30 frames per second. Nevertheless, the sequence of images is
perceived as a continuous scene since displays are able to exploit limitations of the
human visual apparatus.
A similar reasoning also applies to haptic interfaces where implementation of
appropriate situation-specific simplifications exploits limitations of the human haptic
system. An understanding of human biomechanical, sensory-motor and cognitive
capabilities is critical for proper design of device hardware and control algorithms
for haptic interfaces.
Fig. 7.2 The term kinesthetics mainly refers to the perception of movement and position of limbs
The term kinesthetics refers to the perception of movement and position of limbs
and in a broader sense includes also perception of force. This perception originates
primarily from mechanoreceptors in muscles, which provide the central nervous
system with information about static muscle length, muscle contraction velocity and
forces generated by muscles. Awareness of limb position in space, of limb movement
and of mechanical properties (such as mass and stiffness) of objects with which the
user interacts emerges from these signals. Sensory information about the change
in limb position also originates from other senses, particularly from receptors in
joints and skin. These senses are particularly important for kinesthetics of the arm.
Receptors in the skin significantly contribute to the interpretation of the position
and movement of the arm. The importance of cutaneous sensory information is not
surprising considering the high density of mechanoreceptors in the skin and their
specialization for tactile exploration. This feedback information is important for
kinesthetics of the arm because of the complex anatomical layout of muscles that
extend across a number of joints, which introduces uncertainty in the perception of
position derived from receptors in muscles and tendons (Fig. 7.2).
Mechanoreceptors include primary and secondary receptors (also called type Ia and
type II sensory fibers) located in muscle spindles. Muscle spindles are elongated
structures 0.5–10 mm in length, consisting of muscle fiber bundles. Spindles are
located parallel to the muscle fibers, which are generators of muscle force and are
attached at both ends either to the muscle or tendon fibers [5]. A muscle spindle
detects length and tension changes in muscle fibers. The main role of a muscle
spindle is to respond to stretching of the muscle and to stimulate muscle contraction
through a reflex arc to prevent further extension. Reflexes play an important role in
the control of movement and balance. They allow automatic and rapid adaptation of
muscles to changes in load and length.
Both primary and secondary spindle receptors respond to changes in muscle
length. However, the primary receptors are much more sensitive to velocity and
acceleration components of the movement and their response considerably increases
7.1 Human Perceptions and Motor System 167
with increased velocity of muscle stretching. The response of primary spindle recep-
tors is nonlinear and their output signal depends on the length of the muscle, muscle
contraction history, current velocity of muscle contraction and activity of the central
nervous system, which modifies the sensitivity of muscle spindles. Secondary spin-
dle receptors have a much less dynamic response and have a more constant output
at a constant muscle length compared to the primary receptors. Higher dynamic sen-
sitivity of primary spindle receptors indicates that these receptors mainly respond
to the velocity and direction of muscle stretching or movement of a limb while the
secondary spindle receptors measure static muscle length or position of the limb.
The second type of mechanoreceptor is the Golgi tendon organ. It measures 1 mm
in length, has a diameter of 0.1 mm and is located at the attachment of a tendon to the
bundle of muscle fibers. The receptor is therefore connected in series with the group
of muscle fibers and primarily responds to the force generated by these fibers. When
muscle is exposed to excessive load, the Golgi tendon organ becomes excited, which
leads to the inhibition of motor neurons and finally to reduction of muscle tension.
In this way, the Golgi tendon organ also serves as a safety mechanism that prevents
damage to the muscles and tendons due to excessive loads.
Other mechanoreceptors found in joints are Ruffini endings, which are responsible
for sensing angle and angular velocity of the joint movements, Pacinian corpuscles,
which are responsible for estimation of the joint acceleration, and free nerve endings,
which constitute the nociceptive system of the joint.
Although humans are presented with various sensations when touching objects, these
sensations are a combination of only a few basic types of sensations, which can
be represented with basic building blocks. Roughness, lateral skin stretch, relative
tangential movement and vibrations are the basic building blocks of sensations when
touching objects. Texture, shape, compliance and temperature are the basic object
properties that are perceived by touch. Perception is based on mechanoreceptors
in the skin. When designing a haptic device, human temporal and spatial sensory
capabilities have to be considered (Fig. 7.3).
Four different types of sensory organs for sensing touch can be found in the skin.
These are Meissner’s corpuscles, Pacinian corpuscles, Merkel’s discs and Ruffini
corpuscles (Fig. 7.4). Figure 7.5 shows the rate of adaptation of these receptors to
stimuli, the average size of the sensory area, spatial resolution, sensing frequency
range and frequency of maximum sensitivity. Delays in the response of these recep-
tors range from 50 to 500 ms.
Since the thresholds for different receptors overlap, the quality of sensing of
touch is determined by a combination of responses of different receptors. Receptors
complement each other, making it possible to achieve a wide sensing range for
detecting vibrations with frequencies ranging from 0.4 to about 1,000 Hz [6, 7]. In
general, the threshold for detecting tactile inputs decreases with increased duration
Fig. 7.3 The highest density of tactile receptors can be found in fingertips
of the stimuli. The spatial resolution at the fingertips is about 0.15 mm while the
minimum distance between two points that can be perceived as separate points is
approximately 1 mm. Humans can detect a 2-µm high needle on a smooth glass
surface. Skin temperature also affects tactile perception.
Fig. 7.5 Properties of the four types of skin mechanoreceptors:
– Meissner's corpuscles: small sensory area with sharp edges; fast adaptation; frequency range 10–200 Hz; maximal sensitivity at 40 Hz; sensations: flexion rate, local form, tremor, slip.
– Pacinian corpuscles: large sensory area with smooth edges; fast adaptation; frequency range 70–1000 Hz; maximal sensitivity at 200–250 Hz; sensations: vibrations, slip, acceleration.
– Merkel's discs: small sensory area with sharp edges; slow adaptation; frequency range 0.4–100 Hz; maximal sensitivity at 50 Hz; sensations: skin curvature, local shape, pressure.
– Ruffini corpuscles: large sensory area with smooth edges; slow adaptation; frequency range 0.4–100 Hz; maximal sensitivity at 50 Hz; sensations: skin stretch, local force.
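The overlapping frequency ranges shown in Fig. 7.5 can be captured in a small lookup table; a sketch (receptor names and values taken from the figure, the function is ours):

```python
# Frequency sensing ranges in Hz for the four skin mechanoreceptors (Fig. 7.5)
RECEPTOR_RANGES = {
    "Meissner's corpuscle": (10.0, 200.0),
    "Pacinian corpuscle": (70.0, 1000.0),
    "Merkel's disc": (0.4, 100.0),
    "Ruffini corpuscle": (0.4, 100.0),
}

def responding_receptors(f_hz):
    """Receptor types whose sensing range covers the vibration frequency f_hz."""
    return sorted(name for name, (lo, hi) in RECEPTOR_RANGES.items()
                  if lo <= f_hz <= hi)

print(responding_receptors(100))  # all four types overlap around 100 Hz
print(responding_receptors(500))  # only the Pacinian corpuscle senses 500 Hz
```

Because the ranges overlap, a single vibration frequency is usually encoded by several receptor types at once, which is how the combined 0.4–1000 Hz sensing range arises.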
Properties of the human tactile perception provide important guidelines for plan-
ning and evaluation of tactile displays. The size of perception area, duration and
frequency of the stimulus signal need to be considered.
The vestibular system, which contributes to human balance and sense of spatial ori-
entation, is the sensory system that provides the dominant input about the movement
and equilibrioception. Together with the cochlea, a part of the auditory system, it
constitutes the labyrinth of the inner ear (Fig. 7.6). As human movements consist
of rotations and translations, the vestibular system comprises two components: the
semicircular canal system, which indicates rotational movements; and the otoliths,
which indicate linear acceleration. The vestibular system sends signals primarily to
the neural structures that control eye movements and to the muscles that keep a body
upright.
Fig. 7.6 Vestibular system, located in inner ear, contributes to human balance and sense of spatial
orientation
During haptic interactions, the user directly interacts with a haptic display through
physical contact. As a consequence, this contact affects the stability of the haptic
interaction. It is therefore necessary to consider human motor properties to ensure
stable haptic interaction.
The human arm is a complex biomechanical system whose properties cannot be
uniquely described; it may behave as a system where position is controlled, or it
may behave as a system where in a partly constrained movement the contact force
is controlled.
A human arm can be modeled as a non-ideal source of force in interaction with a
haptic interface. The term non-ideal in this case refers to the fact that the arm does not
respond only to signals from the central nervous system, but also to the movements
imposed by its interaction with the haptic interface. Relations are shown in Fig. 7.7.
Force F*h is the component of the force resulting from muscle activity that is
controlled by the central nervous system. If the arm does not move, the contact force
Fh applied by the human arm on the haptic display equals F*h (the muscle force that
initiates the movement). However, Fh is also a function of the movement imposed by
the haptic display: if the arm moves (the haptic display imposes movement), the
force acting on the display differs from F*h. The instantaneous force Fh is thus a
function not only of F*h but also of the movement velocity vh of the contact point
between the arm and the tip of the haptic interface. Considering the analogy between
mechanical and electrical systems, force Fh can be written as
7.1 Human Perceptions and Motor System 171
Fh = Fh∗ − Zh vh , (7.1)
where Zh represents biomechanical impedance of the human arm and maps move-
ment of the arm into force. Zh is primarily determined by physical and neurological
properties of the human arm and has an important role in the stability and perfor-
mance of the haptic system.
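As a rough numeric sketch of Eq. (7.1), the arm impedance Zh can be reduced to a pure damping term; the damping value and forces below are illustrative assumptions, not measured biomechanical data:

```python
# Sketch of Eq. (7.1): the force applied on the haptic display is the
# muscle-generated force Fh* reduced by the arm's impedance response to
# the imposed motion. Zh is modeled here as pure damping b (illustrative).

def arm_contact_force(f_muscle, v_h, b=10.0):
    """Contact force Fh = Fh* - Zh * vh, with Zh reduced to damping b."""
    return f_muscle - b * v_h

# Static arm (vh = 0): the contact force equals the muscle force.
print(arm_contact_force(5.0, 0.0))   # 5.0
# The display imposes motion (vh > 0): the perceived force drops.
print(arm_contact_force(5.0, 0.2))   # 3.0
```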
Different characteristics of the real environment are perceived through the haptic
sense. The objective of using haptic displays is to represent the virtual environment
as realistically as possible. Abstract haptic representations are rarely used, except in
interactions with scaled environments (e.g. nanomanipulation), for sensory substitu-
tion and for the purpose of avoiding dangerous situations. In interactions with scaled
environments, the virtual reality application may use forces perceivable to humans,
for example, to present events unfolding at the molecular level.
Information that can be displayed through haptic displays includes object fea-
tures such as texture, temperature, shape, viscosity, friction, deformation, inertia and
weight. Restrictions imposed by haptic displays usually prevent the use of combina-
tions of different types of haptic displays.
In conjunction with visual and acoustic presentations, the haptic presentation is
the one that the human cognitive system most relies on in the event of conflicting
information.
Another important feature of the haptic presentation is its local nature. Thus, it is
necessary to haptically render only those objects that are in direct reach of the user.
This applies only to haptic interactions since visual and auditory sensations can be
perceived at a distance.
Before dealing with methods for collision detection in virtual environments, we shall
review basic concepts of geometric modeling of virtual objects, since the method for
collision detection significantly depends on the object model [8–10]. Most methods
for geometric modeling originate from computer graphics and were presented in
Chap. 5.
Object models are often represented using the object's exterior surfaces—the problem of model representation is simplified to a mathematical model describing the object's surface, which defines the outside boundaries of the object. These representations are often referred to as boundary representations. Other
representations are based on constructive solid geometry, where solid objects are used
as basic blocks for modeling, or volumetric representations, which model objects
with vector fields.
Haptic rendering is generally based on completely different requirements than
computer graphics. The sampling frequency of a haptic system is significantly higher
and haptic rendering is of a more local nature since we cannot physically interact with
the entire virtual environment at once. Haptic rendering thus constructs a specific
set of techniques making use of representational models developed primarily for
computer graphics.
The following section provides an overview of some modeling techniques for
virtual objects with an emphasis on attributes specific for haptic collision detection.
Two early approaches for representation of virtual objects were based on a force
vector field method and an intermediate plane method. The vector field corresponds
to the desired reaction forces. The interior of an object is divided into areas whose
main characteristic is the common direction of force vectors, whereas the force vector
length is proportional to the distance from the surface (Fig. 7.8).
An intermediate plane [11], on the other hand, simplifies representation of objects
modeled with boundary surfaces. The intermediate plane represents an approxima-
tion of the underlying object geometry with a simple planar surface. The plane para-
meters are refreshed as the virtual tool moves across the virtual object. However, the
7.2 Haptic Representation in Virtual Reality 173
refresh rate of the intermediate plane can be lower than the frequency of the haptic
system.
Other representational models originate from the field of computer graphics. Two
frequently used representations are implicit and parametric surfaces.
An implicit surface is defined by an implicit function mapping three-dimensional space to the space of real numbers, f : ℝ³ → ℝ; the surface consists of the points where f (x, y, z) = 0. Such a function uniquely defines what is inside ( f (x, y, z) < 0) and what is outside ( f (x, y, z) > 0) the model. Implicit surfaces are consequently generically closed surfaces.
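A minimal sketch of the inside/outside classification an implicit surface provides, using the unit sphere f (x, y, z) = x² + y² + z² − 1 as the example:

```python
import math

# Implicit surface of the unit sphere: f(x, y, z) = x^2 + y^2 + z^2 - 1.
# f < 0 -> inside, f = 0 -> on the surface, f > 0 -> outside.
def f(x, y, z):
    return x * x + y * y + z * z - 1.0

def classify(p, eps=1e-9):
    v = f(*p)
    if v < -eps:
        return "inside"
    if v > eps:
        return "outside"
    return "surface"

print(classify((0.0, 0.0, 0.0)))  # inside
print(classify((1.0, 0.0, 0.0)))  # surface
print(classify((2.0, 0.0, 0.0)))  # outside
```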
A parametric surface is defined by a mapping from a subset of the plane into three-dimensional space, f : ℝ² → ℝ³. Contrary to implicit surfaces, parametric surfaces are not generically closed. They thus do not represent the entire object model, but only a part of the object's boundary surface. Implicit and parametric surfaces were presented in more detail in Sect. 5.2.2.
However, the representational method most often used in computer graphics is based on polygonal models. Polygonal representations are simple, and polygons are versatile and appropriate for fast geometric computations. Polygonal models enable representation of objects with boundary surfaces. An example of a polygonal model is shown in Fig. 7.9, where the simplest polygons—triangles—are used. The object surface is approximated with triangles, each defined by three points (for example, tr1 = P0 P1 P2).
Haptic rendering based on polygonal models may cause force discontinuities at the edges of individual polygons, where the direction of the force vector (the surface normal) changes abruptly from the current to the next polygon. The human sensing system is accurate enough to perceive such discontinuities, so they must be compensated for. A method for removing these discontinuities, referred to as force shading, is based on interpolation of the normal vectors of two adjacent polygons.
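Force shading can be sketched as a linear interpolation of the two adjacent polygon normals followed by renormalization; the weight w below, which encodes the contact point's position relative to the shared edge, is an illustrative parameterization:

```python
import math

# Force shading sketch: near a shared edge, the rendered force direction is
# interpolated between the normals of the two adjacent polygons and then
# renormalized, removing the force discontinuity at the edge.
def shaded_normal(n1, n2, w):
    """Blend normals n1 and n2 with weight w in [0, 1], then normalize."""
    n = [(1.0 - w) * a + w * b for a, b in zip(n1, n2)]
    length = math.sqrt(sum(c * c for c in n))
    return [c / length for c in n]

n1 = [0.0, 0.0, 1.0]      # normal of the current polygon
n2 = [1.0, 0.0, 0.0]      # normal of the adjacent polygon
mid = shaded_normal(n1, n2, 0.5)
print(mid)  # approximately [0.7071, 0.0, 0.7071] -- a unit vector halfway
```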
The algorithm for haptic interaction with a virtual environment consists of a sequence of two tasks. When the user operates a virtual tool attached to a haptic interface, the new tool pose is computed and possible collisions with objects in the virtual environment are determined. In case of contact, reaction forces are computed based on the environment model and force feedback is provided to the user via the haptic display. Collision detection guarantees that objects do not pass through each other. A special case of contact is the grasping of virtual objects, which allows object manipulation (Fig. 7.10). If grasping is not adequately modeled, the virtual hand may pass through the virtual object, and the reaction forces that the user perceives will not be consistent with the visual information.
If virtual objects fly through each other, this creates a confusing visual effect. Penetration of one object into another thus needs to be prevented. When two objects attempt to penetrate each other, we are dealing with a collision.
Collision detection is an important step toward physical modeling of a virtual
environment. It includes automatic detection of interactions between objects and
computation of contact coordinates. At the moment of collision, the simulation gen-
erates a response to the contact. If the user is coupled to one of the virtual objects (for
example, via a virtual hand), the response to the collision results in forces, vibrations
or other haptic quantities being transmitted to a user via a haptic interface.
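The two-task sequence (collision detection, then collision response) can be sketched for a one-dimensional environment; the wall position and stiffness k are illustrative:

```python
# Sketch of one haptic servo cycle: read the tool position, detect collision
# with a wall at x = 0 (occupying x < 0), and compute the reaction force.

def haptic_step(x_tool, k=1000.0):
    """Return the reaction force for tool position x_tool [m]."""
    penetration = -x_tool if x_tool < 0.0 else 0.0  # collision detection
    return k * penetration                          # collision response

print(haptic_step(0.01))    # 0.0  (free space, no force)
print(haptic_step(-0.002))  # 2.0  (2 mm penetration -> 2 N)
```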
Computer haptics is a research area dealing with techniques and processes related to
generation and rendering of contact properties in a virtual environment and displaying
of this information to a human user via a haptic interface. Computer haptics deals
with models and properties of virtual objects as well as algorithms for displaying
haptic feedback in real time.
Haptic cue rendering often represents the most challenging problem in a virtual
reality system. The reason is primarily the direct physical interaction and, therefore, a
bidirectional communication between the user and the virtual environment through a
haptic display. The haptic interface is a device that enables human–machine interaction: it simultaneously generates and perceives mechanical stimuli.
Haptic rendering allows the user to perceive the mechanical impedance, shape,
texture and temperature of objects. When pressing on an object, the object deforms
due to its finite stiffness or moves if it is not grounded. The haptic rendering method
must take into account the fact that humans simultaneously perceive tactile as well
as kinesthetic cues. Due to the complexity of displaying tactile and kinesthetic cues,
virtual reality systems are usually limited to only one type of cue. Haptic rendering
can thus be divided into rendering through the skin (temperature and texture) and
rendering through muscles, tendons and joints (position, velocity, acceleration, force
and impedance).
Stimuli that mainly trigger skin receptors (e.g. temperature, pressure, electrical
stimuli and surface texture) are displayed through tactile displays. Kinesthetic infor-
mation that enables the user to investigate object properties such as shape, impedance
(stiffness, damping, inertia), weight and mobility, is usually displayed through robot-
based haptic displays. Haptic rendering can produce different kinds of stimuli, rang-
ing from heat to vibrations, movement and force. Each of these stimuli must be
rendered in a specific way and displayed through a specific display.
7.4 Haptic Rendering in Virtual Reality 177
Haptic rendering with low sampling frequency or high latency may influence the
perception of a virtual environment and may cause instability of the haptic display.
This is completely different from visual rendering, where slow processing causes
the user to perceive visual information not as a continuous stream but as a discrete
sequence of images. However, each image is still a faithful representation of a virtual
environment at a given time. For example, visual representation of a brick would still
display a brick while the haptic system would render it as a mass of clay due to the
low stiffness of the virtual object, which is a result of the low sampling frequency.
Visual rendering results in a visual image that is transmitted to the user from the
display via electromagnetic radiation. Haptic rendering, on the other hand, enables
implementation of different types of stimuli, from vibration to movement and force.
Each stimulus is rendered in a specific manner and presented through a specific
display.
Temperature rendering is based on heat transfer between the display and the skin.
The tactile display creates a sense of object temperature.
Texture rendering provides tactile information and can be achieved, for example,
using a field of needles that simulates the surface texture of an object. Needles are
active and adapt according to the current texture of the object being explored by the
user.
Kinesthetic rendering allows display of kinesthetic information and is usually
based on the use of robots. By moving the robot end-effector, the user is able to
haptically explore the surroundings and perceive the position of an object. The object is perceived through the inability to penetrate the space it occupies. The greater the stiffness of the virtual object, the stiffer the robot manipulator
becomes while in contact with a virtual object. Kinesthetic rendering thus enables
perception of the object’s mechanical impedance.
Haptic rendering of a complex scene is much more challenging compared to
visual rendering of the same scene. Therefore, haptic rendering is often limited to
simple virtual environments. The complexity arises from the need for a high sampling
frequency in order to provide consistent feeling of rendered objects. If the sampling
frequency is low, the time required for the system to respond and produce an adequate
stiffness (for example, during penetration into a virtual object) becomes noticeable.
Stiff objects consequently feel compliant.
The complexity of realistic haptic rendering depends on the type of simulated
physical contact implemented in the virtual reality. If only the shape of an object
is being displayed, then touching the virtual environment with a pencil-style probe is
sufficient. Substantially more information needs to be transmitted to the user if it is
necessary to grasp the object and raise it to feel its weight, elasticity and texture. The
form of the user contact with the virtual object thus needs to be taken into account
for haptic rendering (for example, contact can occur at a single point, the object can
be grasped with the entire hand or with a pinch grip between two fingers).
Single-point contact is the most common method of interaction with virtual
objects. The force display provides stimuli to a fingertip or a probe that the user
holds with his fingers. The probe is usually attached as a tool at the tip of the haptic
interface. In the case of the single point contact, rendering is usually limited to the
contact forces only and not contact torques.
Two-point contact (pinch grip) enables display of contact torques through the force display. With a combination of two displays with three degrees of freedom each, it is possible to simulate, in addition to contact forces, torques around the midpoint of the line connecting the two contact points.
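Assuming the two contact forces are expressed about the midpoint of the connecting line, the rendered torque can be sketched as the sum of the two moment contributions (all positions and forces below are illustrative):

```python
# Pinch-grip sketch: with two 3-DOF force displays, the net torque about the
# midpoint of the line connecting the two contact points is
# tau = r1 x F1 + r2 x F2, where r1, r2 are the contact positions
# expressed relative to that midpoint.

def cross(a, b):
    return [a[1]*b[2] - a[2]*b[1],
            a[2]*b[0] - a[0]*b[2],
            a[0]*b[1] - a[1]*b[0]]

def pinch_torque(p1, f1, p2, f2):
    mid = [(a + b) / 2.0 for a, b in zip(p1, p2)]
    r1 = [a - m for a, m in zip(p1, mid)]
    r2 = [a - m for a, m in zip(p2, mid)]
    return [x + y for x, y in zip(cross(r1, f1), cross(r2, f2))]

# Equal and opposite forces at the two fingers produce a pure couple.
tau = pinch_torque([0.0, 0.05, 0.0], [1.0, 0.0, 0.0],
                   [0.0, -0.05, 0.0], [-1.0, 0.0, 0.0])
print(tau)  # [0.0, 0.0, -0.1]
```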
Multipoint contact allows object manipulation with six degrees of freedom. The
user is able to modify both the position and the orientation of the manipulated object.
To ensure adequate haptic information, it is necessary to use a device that covers the
entire hand (a haptic glove).
As with visual and acoustic rendering, the amount of detail or information that can be displayed with haptic rendering is limited. In principle, the entire environment would need to be displayed in haptic form. However, due to the complexity of haptic
rendering algorithms and the specificity of haptic sensing, which is local in nature,
haptic interactions are often limited to contact between the probe and a small num-
ber of nearby objects. Due to the large amount of information necessary for proper
representation of object surfaces and dynamic properties of the environment, haptic
rendering requires a more detailed model of a virtual environment (object dimen-
sions, shape and mechanical impedance, texture, temperature) than is required for
visual or acoustic rendering. Additionally, haptic rendering is computationally more
demanding than visual rendering since it requires accurate computation of contacts
between objects or contacts between objects and tools or avatars. These contacts
form the basis for determining reaction forces.
Haptic interfaces provide haptic feedback about the computer-generated or remote
environment to a user who interacts with this environment. Since these interfaces do
not have their own intelligence, they only allow presentation of computer-generated
quantities. For this purpose it is necessary to understand physical models of a virtual
environment that enable generation of time-dependent variables (forces, accelera-
tions, vibrations, temperature, …) required for control of the haptic interface.
The task of haptic rendering is to enable the user to touch, sense and manipulate
virtual objects in a simulated environment via a haptic interface [13, 14]. The basic
idea of haptic rendering can be explained using Fig. 7.12, where a frictionless sphere
is positioned at the origin of a virtual environment. Now assume that the user interacts with the sphere at a single point, which is defined by the haptic interface end-effector position (HIP) [15]. In the real world this would be analogous to touching a sphere
with the tip of a thin stick. When moving through free space, the haptic interface
behaves passively and does not apply any force onto the user until the occurrence
of contact with a sphere. Since the sphere has finite stiffness, the HIP penetrates
into the sphere at the point of contact. When contact with the sphere is detected,
the corresponding reaction force is computed and transmitted via a haptic interface
to the user. The haptic interface becomes active and generates a reaction force that
prevents further penetration into the object. The magnitude of the reaction force
can be computed based on a simple assumption that the force is proportional to the
penetration depth. With the assumption of a frictionless sphere, the reaction force
direction is determined as a vector normal to the sphere surface at the point of contact.
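A sketch of this penalty-based rendering for the frictionless sphere; the stiffness, radius and positions below are illustrative assumptions:

```python
import math

# Penalty-based rendering of the frictionless sphere of Fig. 7.12: when the
# haptic interface point (HIP) penetrates the sphere, the reaction force is
# proportional to the penetration depth and directed along the surface
# normal at the contact point.

def sphere_force(hip, center=(0.0, 0.0, 0.0), radius=0.05, k=500.0):
    d = [h - c for h, c in zip(hip, center)]
    dist = math.sqrt(sum(c * c for c in d))
    if dist >= radius:            # free space: display stays passive
        return [0.0, 0.0, 0.0]
    depth = radius - dist         # penetration depth
    normal = [c / dist for c in d]
    return [k * depth * n for n in normal]

print(sphere_force((0.1, 0.0, 0.0)))   # [0.0, 0.0, 0.0] (no contact)
print(sphere_force((0.04, 0.0, 0.0)))  # approximately [5.0, 0.0, 0.0]
```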
In general, two models of haptic interaction with the environment can be distin-
guished. The first model is called a compliance model while the second is called a
stiffness model. The two terms refer to a simple elastic model F = K x of a wall
with stiffness K , penetration depth x and reaction force F. The two concepts for
modeling haptic interaction with the environment are shown in Figs. 7.13 and 7.14.
• In the case of the stiffness model in Fig. 7.13, the haptic interface measures displacement x and the simulation returns the corresponding force F as F = K x.
Haptic interfaces that are excellent force sources are suitable for implementation
of such a model.
Fig. 7.13 Stiffness model of haptic interaction: the measured displacement x is the input; the output force F is computed through inverse dynamics
• In the case of the compliance model in Fig. 7.14, the haptic interface measures
the force F between the user and the haptic display and the simulation returns
the displacement x as a result of relation x = K −1 F = C F, where compliance
C is defined as the inverse value of stiffness K . Stiff haptic interfaces, such as
industrial manipulators equipped with a force and torque sensor, are suitable for
implementation of such a model.
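The two causalities can be summarized side by side for the elastic wall F = K x (the stiffness value is illustrative):

```python
# Stiffness (impedance) causality maps measured displacement to force;
# compliance (admittance) causality maps measured force to displacement.

K = 2000.0  # illustrative wall stiffness [N/m]

def stiffness_model(x):       # input: displacement, output: force
    return K * x

def compliance_model(f):      # input: force, output: displacement
    return f / K              # x = K^-1 * F = C * F

x = 0.005                      # 5 mm penetration
f = stiffness_model(x)
print(f)                       # 10.0
print(compliance_model(f))     # 0.005 -> the two models are inverses
```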
In the case of more complex models, where viscous and inertial components are
present in addition to compliance, the terms stiffness model and compliance model
are substituted with the terms impedance model (an equivalent of the stiffness model)
and admittance model (an equivalent of the compliance model). For the purpose of
generality we will use the terms impedance and admittance in the following sections.
When an object is displaced due to contact, object dynamics need to be considered to
determine the relation between force and displacement. An inverse dynamic model
is required for computing the impedance and a forward dynamic model is required
for computing the admittance causality structure.
Most approaches to haptic rendering are based on the assumption of interac-
tions with stiff grounded objects, where stiffness characteristics dominate over other
dynamic properties. In the case of objects in a virtual or remote environment that
are displaced as a result of haptic interactions, the object deformation is masked
with the object displacement. In such cases it is reasonable to attribute the entire
haptic interface displacement to the object movement without considering the object
deformation. Namely, we can assume that the majority of real objects do not deform considerably under contact forces. However, such an assumption is not valid when the impedance due to object displacement is comparable to the impedance due to object deformation.
Fig. 7.14 Compliance model of haptic interaction: the measured force F is the input; the output displacement x is computed through forward dynamics
In this case, the viscous damping behaves as a directed damper that is active
during the penetration into the object and passive during the withdrawal from the
object. This enables a stable and damped contact with the object as well as a realistic
contact rendering. Contact relations are shown in Fig. 7.16 for a one-degree-of-freedom system. From the relations shown in Fig. 7.16 it is apparent that at the instant of contact, there is a step change in the force signal due to the contribution of viscous
damping, since the approach velocity differs from zero. During the penetration into
the object the influence of the viscous part is being reduced due to the decreasing
movement velocity. At the same time the contribution of the elastic element increases
due to the increased depth of penetration into the object. At the instant of the largest
object deformation, the penetration velocity equals zero and the reaction force is only
the result of the elastic element. Since the damper operates in a single direction (only
Fig. 7.16 Relations during a contact simulation with a spring-directed damper model
active during penetration and inactive during withdrawal), this results in a linearly
decreasing reaction force as a function of displacement x. The reaction force reaches
zero at the boundary of the undeformed object, resulting in a smooth transition
between the object and free space. Such a modeling approach guarantees pronounced
initial contact, rigidity of a stiff surface and a smooth withdrawal from the object
surface.
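The spring with a directed (one-sided) damper can be sketched as follows; the stiffness K and damping B values are illustrative:

```python
# Spring with a directed damper: the damping term B*xdot acts only while
# penetrating (xdot > 0, into the object), not during withdrawal, and the
# total force is never allowed to pull the user toward the object.

def contact_force(x, xdot, K=1000.0, B=50.0):
    """x: penetration depth (>= 0), xdot: penetration velocity (+ = inward)."""
    if x <= 0.0:
        return 0.0                         # free space
    damping = B * xdot if xdot > 0.0 else 0.0
    return max(K * x + damping, 0.0)       # force only pushes outward

print(contact_force(0.001, 0.1))   # 6.0  initial contact: elastic + damped
print(contact_force(0.002, 0.0))   # 2.0  deepest point: elastic term only
print(contact_force(0.001, -0.1))  # 1.0  withdrawal: damper inactive
```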
Forces and movements computed based on the dynamic model of the virtual environment can be used as input signals to the controller of the haptic interface. Selection of the control strategy (impedance or admittance control) depends on the
7.5 Control of Haptic Interfaces 183
available hardware and software architectures as well as on the planned use of the
haptic interface.
Interaction between the user and the environment presents a bilateral transfer of
energy, as the product of force and displacement defines the mechanical work. The
rate of change of energy (mechanical power) is defined by the instantaneous product
of the interaction force and the movement velocity. The exchange of mechanical
energy between the user and the haptic interface is the main difference compared
to other display modalities (visual, acoustic) that are based on one-way flow of
information with negligible energy levels.
If energy flow is not properly controlled, the effect of haptic feedback can be
degraded due to unstable behavior of the haptic device. Important issues related
to control of haptic interaction include its quality and especially stability of haptic
interaction while taking into account properties of the human operator, who is inserted
into the control loop [16, 17].
In the section related to modeling and collision detection we introduced the concepts of impedance and admittance models of a virtual environment. Similarly, two classes
of control schemes for control of haptic interfaces can be defined: (1) impedance
control, which provides force feedback and (2) admittance control, which provides
displacement feedback.
The impedance approach to displaying kinesthetic information is based on mea-
suring the user’s motion velocity or limb position and implementation of a force
vector at the point of measurement of position or velocity. We will assume that the
point of interaction is the user’s arm. Even though it is also possible to construct kines-
thetic displays for other parts of the body, the arm is the primary human mechanism
for precise manipulation tasks. The magnitude of the displayed force is determined
as a response of a simulated object to displacement measured on the user’s side of
the haptic interface.
Figure 7.17 shows a block scheme of an impedance-controlled haptic interface. Joint
position encoders measure angular displacements q∗ . These are then used in the
forward kinematic model to compute the pose of the haptic interface end-effector
x∗ . Desired reaction forces Fe are computed based on the physical model of the
environment and the haptic interface end-effector pose (interaction force between
the user and the interface can be used as an additional input). The desired force is
then transformed into the desired joint torques through the manipulator-transposed
Jacobian matrix, and haptic display actuators are used to generate the desired torques.
Actuator torques result in a haptic display end-effector force that is perceived by the
user. Thus, the haptic interface generates forces resulting from interactions in the
virtual environment.
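The chain of computations (forward kinematics, environment force, transposed Jacobian) can be sketched for a planar two-link mechanism; the link lengths, joint angles and force are illustrative, not parameters of any particular display:

```python
import math

# Impedance-control sketch for a planar 2-link arm: forward kinematics give
# the end-effector pose from the measured joint angles; the environment
# model returns a desired force; the transposed Jacobian maps that force to
# joint torques (tau = J^T * F).

L1, L2 = 0.3, 0.25  # illustrative link lengths [m]

def forward_kinematics(q1, q2):
    x = L1 * math.cos(q1) + L2 * math.cos(q1 + q2)
    y = L1 * math.sin(q1) + L2 * math.sin(q1 + q2)
    return x, y

def joint_torques(q1, q2, fx, fy):
    """tau = J^T * F for the planar 2-link arm."""
    j11 = -L1 * math.sin(q1) - L2 * math.sin(q1 + q2)
    j12 = -L2 * math.sin(q1 + q2)
    j21 = L1 * math.cos(q1) + L2 * math.cos(q1 + q2)
    j22 = L2 * math.cos(q1 + q2)
    return (j11 * fx + j21 * fy,   # torque at joint 1
            j12 * fx + j22 * fy)   # torque at joint 2

q1, q2 = 0.0, math.pi / 2
print(forward_kinematics(q1, q2))      # (0.3, 0.25)
print(joint_torques(q1, q2, 1.0, 0.0)) # torques for a 1 N force along x
```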
The main characteristics of an impedance display can be summarized as [19]:
• it has to enable unobstructed movement of the user arm when there is no contact
with the environment,
• it has to exactly reproduce forces that need to be applied on the user,
• it has to generate large forces in order to simulate or reproduce contacts with stiff objects and
• its bandwidth needs to be large enough to allow reproduction of transient responses with sufficient fidelity and accuracy.
Fig. 7.17 Block scheme of an impedance controlled haptic interface. Arrows indicate the dominant direction of flow of information. The hatched line indicates supplementary information. Adapted from [18]
Fig. 7.18 Block scheme of an admittance controlled haptic interface. Arrows indicate the dominant direction of flow of information. The hatched line indicates supplementary information
that is finally perceived by the human operator. Thus, a haptic interface displays
displacements resulting from interactions with the virtual environment.
The main characteristics of an admittance display can be summarized as [19]:
• the mechanism needs to be stiff enough to completely prevent movement of the
user’s arm when in contact with a stiff object,
• it has to exactly reproduce desired displacement,
• it has to be backdrivable to allow reproduction of free movement and
• bandwidth of the system needs to be large enough to allow reproduction of transient
responses with sufficient fidelity and accuracy.
The above characteristics are similar to the characteristics of position-controlled
robot manipulators, where high accuracy of positional tracking is required.
In some cases the interaction force can be used as an additional input to the
impedance controller. The displacement can be used as a supplementary input for
the admittance controller. The class of the control scheme is therefore usually defined
based on the output of the haptic interface (force, displacement). Impedance control
is usually implemented in systems where the simulated environment is highly com-
pliant while the admittance control approach is usually used in scenarios where the
environment is very stiff. Selection of the type of controller does not depend only
on the type of environment, but also on the dynamic properties of a haptic display.
In the case of a haptic display with low impedance, where a force sensor is rarely
Haptic displays are devices composed of mechanical parts, working in physical con-
tact with a human body for the purpose of exchanging information. When executing
tasks with a haptic interface, the user transmits motor commands by physically
manipulating the haptic display, which displays a haptic sensory image to the user
in the opposite direction via correct stimulation of tactile and kinesthetic sensory
systems. This means that haptic displays have two basic functions: (1) to measure
positions and interaction forces (and their time derivatives) of the user’s limb (and/or
other parts of the human body) and (2) display interaction forces and positions (and
their spatial and temporal distributions) to the user. The choice of the quantity (posi-
tion or force) that defines motor activity (excitation) and haptic feedback (response)
7.6 Haptic Displays 187
Fig. 7.19 A collage of different haptic robots for the upper extremities: Phantom (Sensable), Omega (Force Dimension), HapticMaster (Moog FCS), ARMin (ETH Zurich) and CyberGrasp (CyberGlove Systems)
depends on the hardware and software implementation of the haptic interface as well
as on the task for which the haptic interface is used [20–22].
A haptic display must satisfy at least a minimal set of kinematic, dynamic and
ergonomic requirements in order to guarantee adequate physical efficiency and per-
formance with respect to the interaction with a human operator.
A haptic display must be capable of exchanging energy with the user through mechanical quantities such as force and velocity. The fact that both quantities exist simultaneously on the user side as well as on the haptic display side means that the haptic display mechanism must maintain continuous contact with the user while the contact point between the user and the device moves in three-dimensional space.
The most important kinematic parameter of a haptic display is the number of
degrees of freedom. In general, the higher the number of degrees of freedom, the
greater the number of directions in which it is possible to simultaneously apply
or measure forces and velocities. The number of degrees of freedom, the type of
degrees of freedom (rotational or translational joints) and the length of the segments
determine the workspace of the haptic display. In principle, this should include at
least a subset of the workspace of human limbs, but its size primarily depends on the
tasks for which it is designed.
An important aspect of haptic display kinematics is the analysis of singularities [19]. The mechanism of the display becomes singular when one or more joints are located at the limits of their range of motion or when two or more joint axes become collinear. In a singular pose, the mechanism loses one or more of its degrees of freedom.
Fig. 7.20 Dynamic model of a haptic display with a single degree of freedom (adapted from [19])
The intrinsic haptic display dynamics distorts forces and velocities that should be
displayed to the user. A convincing presentation of contact with a stiff object, for
example, requires high frequency response bandwidth of a haptic system. Thus,
persuasiveness of force and velocity rendering is limited by the intrinsic dynamics
of the haptic display. The effect of the intrinsic dynamics can be analyzed in a case
study with a simplified haptic device consisting of a single degree of freedom as
shown in Fig. 7.20 [19]. A haptic display applies force on the user while the user
determines the movement velocity. An ideal display would allow undistorted transfer
of a desired force (F = Fa ; Fa is the actuator force and F is the force applied on
the user) and precise velocity measurement (ẋm = ẋ; ẋ is the actual velocity of the
system endpoint and ẋm is the measured velocity of the system endpoint). However,
by taking into account the haptic display dynamics, the actual force applied on the
user equals
F = Fa − Ff (x, ẋ) − m ẍ. (7.3)
Thus, the force perceived by the user is reduced by the effect of friction F f (x, ẋ)
and inertia m of the haptic display. In this simplified example the stiffness K does
not affect the transfer of forces.
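Eq. (7.3) evaluated numerically; the friction, mass and acceleration values below are illustrative:

```python
# Eq. (7.3): the force actually felt by the user is the actuator force
# minus the friction force and the inertial force m*xddot of the display.

def displayed_force(f_actuator, f_friction, m, xddot):
    return f_actuator - f_friction - m * xddot

# Steady sliding (low acceleration): most of the actuator force gets through.
print(displayed_force(5.0, 0.5, 0.1, 1.0))    # 4.4
# Collision transient (large deceleration): inertia dominates the loss.
print(displayed_force(5.0, 0.5, 0.1, 30.0))   # 1.5
```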
Equation (7.3) indicates that the mass of the haptic display affects the transmis-
sion of force to the user by resisting the change of velocity. This opposing force is
proportional to the acceleration of the display. Minimization of the haptic display
mass is necessary, since during collisions with virtual objects large accelerations
(decelerations) can be expected. In the case of multidimensional displays, the dynamics
becomes more complex. Except in specific cases where the mechanism dynamics are
uncoupled (e.g. a Cartesian mechanism), Coriolis and centripetal effects absorb
actuation forces at nonzero velocities in addition to inertia. A haptic display must
also be able to support its own weight in the gravitational field, as otherwise a
gravitational force that is not associated with the task is transferred to the user.
Gravity compensation can be achieved either actively through the display's actuators
or passively with counterweights, which, however, further increase the inertia of the
display.
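Active gravity compensation can be illustrated with a one-line model. The sketch below assumes a hypothetical single-link display with mass m and centre-of-mass distance l_c (both made-up values); the actuator adds a torque that cancels the weight of the link at every joint angle:

```python
import math

G = 9.81  # gravitational acceleration, m/s^2

def gravity_compensation_torque(q, m=0.8, l_c=0.15):
    """Actuator torque (N m) cancelling the weight of a single link.

    q   -- joint angle measured from the horizontal (rad)
    m   -- link mass (kg); hypothetical value
    l_c -- distance from the joint axis to the centre of mass (m)
    """
    return m * G * l_c * math.cos(q)
```

The compensation torque is largest with the link horizontal and vanishes with the link vertical, which matches the intuition that a vertical link loads the joint axially rather than in torque.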
Equation (7.3) also indicates that part of the force generated by the actuators
is absorbed by friction. Friction occurs when two surfaces that are in physical
contact move against each other. In general, friction can be separated into three
components: static friction (the force required to initiate motion between two
surfaces), Coulomb friction, which is independent of velocity, and viscous friction,
which is proportional to velocity.
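The friction components and Eq. (7.3) can be combined into a small numerical sketch. The parameter values below (static, Coulomb and viscous coefficients, moving mass) are illustrative assumptions, not parameters of any particular device:

```python
def friction_force(v, F_static=0.4, F_coulomb=0.3, b=0.05, v_eps=1e-3):
    """Three-component friction model (all coefficients are assumed values).

    Below the threshold v_eps the contact is treated as sticking, and up to
    F_static is available to oppose the onset of motion; above it, Coulomb
    plus viscous friction act against the direction of motion.
    """
    if abs(v) < v_eps:
        return F_static
    sign = 1.0 if v > 0 else -1.0
    return sign * (F_coulomb + b * abs(v))

def force_on_user(F_a, v, a, m=0.5):
    """Transmitted force per Eq. (7.3): F = F_a - F_f(x, v) - m*a."""
    return F_a - friction_force(v) - m * a
```

For an actuator force of 5 N at v = 0.2 m/s and a = 1 m/s², friction takes 0.31 N and inertia 0.5 N, so only about 4.19 N reaches the user.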
Kinesthetic haptic displays are suitable for relatively coarse interactions with vir-
tual objects, but tactile displays must be used for precise rendering. Tactile sensing
plays an important role during manipulation and discrimination of objects, where
force sensing alone is not sufficient. Tactile sensations are important for assessment of
local shape, texture and temperature of objects as well as for detecting slippage.
Tactile senses also provide information about compliance, elasticity and viscosity
of objects. Vibration sensing is important for the detection of object textures; it also
shortens reaction times and minimizes contact forces. Since no reaction force is
generated prior to object deformation, tactile information is also relevant for initial
contact detection.
This significantly increases abilities for detecting contacts, measuring contact forces
and tracking a constant contact force. Finally, tactile information is also necessary
for minimizing interaction forces in tasks that require precise manipulations.
In certain circumstances a tactile display of one type can be replaced with a display
of another type. A temperature display can, for example, be used for simulating object
material properties.
Tactile stimulation can be achieved using different approaches. Systems that are
most often used in virtual environments include mechanical needles actuated by
electromagnets, piezoelectric crystals or shape-memory alloys, vibrators based on
voice coils, pressure from pneumatic systems, and heat pumps.
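As a rough illustration of the voice-coil approach, the snippet below generates a sampled sinusoidal drive burst; the frequency, duration and sample rate are arbitrary assumed values, and a real driver would additionally need amplification and envelope shaping:

```python
import math

def vibration_burst(freq_hz=250.0, duration_s=0.05, sample_rate=8000, amplitude=1.0):
    """Sampled sine burst for driving a voice-coil tactile actuator.

    All parameters are assumed illustrative values; the function returns
    the raw samples that a DAC would convert to a drive voltage.
    """
    n = int(duration_s * sample_rate)
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate)
            for i in range(n)]
```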
The vestibular sense enables control of balance. The vestibular receptor is located
in the inner ear. It senses acceleration and orientation of the head in relation to the
gravity vector. The relation between vestibular sense and vision is very strong and
the discrepancy between the two inputs can lead to nausea.
The vestibular display is based on the user’s physical movement. A movement
platform can move the ground or the seat of the user. Such platforms are typical in
flight simulators. A vestibular display alone cannot generate a convincing experience,
but can be very effective in combination with visual and audio displays.
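Motion platforms commonly pass the commanded accelerations through a washout filter, so that onset cues are rendered while the platform drifts back toward its neutral pose within its limited travel. The first-order high-pass sketch below is a generic illustration with an assumed time constant, not a description of any particular simulator:

```python
def highpass_washout(accel, dt=0.01, tau=2.0):
    """First-order high-pass 'washout' of a commanded acceleration signal.

    Sustained accelerations are washed out so the platform returns toward
    neutral, while transient onset cues pass through. dt and tau are
    assumed values.
    """
    alpha = tau / (tau + dt)
    y, x_prev = 0.0, 0.0
    out = []
    for x in accel:
        y = alpha * (y + x - x_prev)   # discrete first-order high-pass step
        x_prev = x
        out.append(y)
    return out
```

Feeding the filter a sustained step acceleration produces a strong initial cue that decays toward zero, which is exactly the behavior a travel-limited platform needs.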
Haptic interactions that affect the design of haptic displays can be divided into three
categories: (1) free movement in space without physical contact with surrounding
objects, (2) contact that includes unbalanced reaction forces such as pressing on an
object with the tip of a finger and (3) contact that includes balanced internal forces
such as holding an object between the thumb and index finger [23, 24].
Alternatively, classification of haptic interactions can be based on whether the user
perceives and manipulates objects directly or using a tool. The complexity of haptic
displays strongly depends on the type of interactions to be simulated by the interface.
An ideal haptic display designed for realistic simulation would have to be capable of
simulating the handling of various tools. Such a display would measure limb position
and display reaction forces, and would have a universal shape (e.g. an exoskeleton)
that could serve different applications by adapting the device controller. However,
the complexity of human limbs and the exceptional sensitivity of skin receptors,
together with the inertia and friction of the device mechanism and constraints related
to sensing and actuation, prevent the implementation of such complex devices with
state-of-the-art technology.
Haptic displays can be divided into grounded (non-mobile) devices and mobile
devices. Haptic perception and manipulation of objects require application of force
vectors on the user at different points of contact with an object. Consequently, equal
and opposite reaction forces act on the haptic display. If these forces are internally
balanced, as when grasping an object between the thumb and index finger, then
mechanical grounding of the haptic display against the environment is not required. In the
case of internally unbalanced forces, as while touching an object with a single finger,
the haptic display must be grounded to balance the reaction forces. This means that a
haptic display placed on a table is considered a grounded device while an exoskeleton
attached to the forearm is a mobile device. If the exoskeleton is used for simulating
contact with a virtual object using a single finger, forces that would in the real world
be transferred across the entire human musculoskeletal system are now transferred
only to the forearm.
Figure 7.21 shows a classification of haptic displays based on their workspace,
power and accuracy.
The use of grounded haptic displays has several advantages while executing tasks
in a virtual environment. Such displays can render forces that originate from grounded
sources without distortions and ambiguities. They may be used for displaying geo-
metric properties of objects such as size, shape and texture as well as dynamic
properties such as mass, stiffness and friction.
The main advantage of mobile haptic displays is their mobility and, consequently,
a larger workspace. To illustrate the ambiguities of displaying reaction forces
with a mobile haptic display, two examples are analyzed in Fig. 7.22: grasping
a virtual ball and pressing a virtual button. In the case of a virtual ball grasped
between the thumb and index finger, forces acting on the fingertips are all that is
necessary for a realistic presentation of the size, shape and stiffness of the virtual object.
Fig. 7.21 Haptic displays classified based on their workspace, power and accuracy: (a) haptic
displays for the hand and wrist, (b) arm exoskeletons, (c) haptic displays based on industrial
manipulators, (d) mobile haptic displays
Fig. 7.22 Internally balanced forces Fu when holding a ball and unbalanced forces Fn when
pressing a button
Only internally balanced forces act between the fingers and the ball. On the other
hand, when pressing against a button, the user does not feel only the forces acting on
the finger. The reaction forces also prevent further hand movement in the direction
of the button. In this case the ungrounded haptic display can simulate the impression
of a contact between the finger and the button, but it cannot generate the reaction
force that would stop the arm movement [25].
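The distinction can be stated as a simple test on the contact forces: if they do not sum to zero, something must absorb the net reaction, so the display needs grounding. The sketch below checks only the net force and, for brevity, ignores net torque, which a complete check would also require; the force values in the test are made up:

```python
def needs_grounding(forces, tol=1e-6):
    """Return True if the contact forces are not internally balanced.

    forces -- list of (Fx, Fy, Fz) force vectors applied at the contact
              points; torque balance is ignored in this simplified check.
    """
    net = [sum(f[i] for f in forces) for i in range(3)]
    return any(abs(c) > tol for c in net)
```

For the ball grasp of Fig. 7.22 the two fingertip forces cancel and `needs_grounding` returns `False`; for the single-finger button press it returns `True`.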
Haptic displays may exist in the form of desktop devices, exoskeleton robots or
large systems that can move heavy loads. Given the diversity of haptic feedback
(tactile, proprioceptive, thermal) and the different parts of the body to which the
display can be coupled, displays have diverse properties. Properties of haptic displays
determine the quality of the virtual reality experience. Design of haptic displays
requires compromises which ultimately determine the realism of virtual objects.
Realism defines how realistically certain object properties (stiffness, texture) can be
displayed compared to direct contact with a real object. Low refresh rate of a haptic
interface, for example, significantly deteriorates the impression of simulated objects.
Objects generally feel softer and contact with objects results in annoying vibrations
which affect the feeling of immersion. A long delay between an event in a virtual
environment and responses of the haptic display furthermore degrades the feeling
of immersion. Since haptic interactions usually require hand-eye coordination, it is
necessary to reduce both visual as well as haptic latencies and synchronize both
displays.
Kinaesthetic cues represent a combination of sensory signals that make humans
aware of joint angles as well as muscle lengths and tendon tensions. They
enable the brain to perceive body posture and the surrounding environment. The human
body consists of a large number of joints and segments that all have receptors that pro-
vide kinaesthetic information. Therefore, it becomes impossible to cover all possible
points of contact with the body with a single haptic display.
Tactile cues originate from receptors in the skin that gather information resulting
from local contact. Mechanoreceptors provide accurate information about the shape
and surface texture of objects. Thermoreceptors perceive heat flow between the object
and the skin. Electroreceptors perceive electrical currents that flow through the skin
and pain receptors perceive pain due to skin deformation or damage.
Grounding of force displays provides support against the forces applied by the user.
A display grounded relative to the environment restricts the movement of the user to
a space of absolute positions and orientations; it constrains movement between the
user and the outside environment. If a display is attached only to the body of the
user, it is limited in its ability to render forces that originate from grounded sources
in the environment. Such a display can only render forces that are internally balanced
between the device and the user.
The user's mobility is restricted by haptic displays that are grounded
to the environment. On the other hand, mobile displays allow users to move freely
in a large space.
The number of haptic channels is usually very limited as a result of mechanical
complexity. However, combinations of haptic displays can, for example, be used to
enable bimanual manipulation.
The number of degrees of freedom and their characteristics determine the
workspace of a haptic interface. To reach any pose in a three-dimensional space, a
display with six degrees of freedom is required. Displays with fewer than four degrees
of freedom are usually limited to rendering position and force, not orientation and
torque.
Physical form is determined by the part of the haptic display with which the user
interacts. The form of the haptic display can be a control prop, which represents a
simple shape (stick, ball), a control prop in the form of an object (a pen, tweezers),
or an amorphous form which varies depending on the needs of the display (gloves).
Spatial and temporal resolution determine the quality of haptic interaction. The
ability of a human sensory system to distinguish between two different nearby tactile
stimuli varies for different parts of the body. This information defines the required
spatial resolution of a haptic display. Temporal resolution is defined by the refresh rate
of a haptic display control system. A low refresh rate usually causes virtual objects
to feel softer and collisions with virtual objects often result in annoying vibrations.
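The effect of a low refresh rate can be reproduced in a toy simulation. The sketch below (all parameters are assumed values) lets a mass hit a virtual wall whose stiffness force is recomputed only at the haptic update rate and held constant in between; the held force injects energy into the contact, so the mass bounces out faster than it came in, and more so at lower update rates, which the user feels as artificial liveliness or vibration:

```python
def wall_exit_speed(update_rate_hz, k=1000.0, m=0.1, v0=-0.5, dt=1e-5):
    """Speed at which a mass leaves a sampled virtual wall at x = 0.

    The wall force -k*x is recomputed only every 1/update_rate_hz seconds
    and held in between (zero-order hold), as in a discrete haptic loop.
    All parameter values are illustrative assumptions.
    """
    x, v, t = 0.0, v0, 0.0
    period = 1.0 / update_rate_hz
    t_next, f = 0.0, 0.0
    while not (x >= 0.0 and v > 0.0):
        if t >= t_next:                 # haptic controller update instant
            f = -k * x if x < 0.0 else 0.0
            t_next += period
        v += (f / m) * dt               # semi-implicit Euler for the mass
        x += v * dt
        t += dt
    return v
```

Comparing a 100 Hz loop against a 1 kHz loop shows a noticeably larger exit speed (i.e. more injected energy) at the lower rate.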
Safety is of utmost importance in dealing with haptic displays in the form of
robots. The high forces that may be generated by haptic devices can injure the user
in case of a system malfunction.
References
1. Minsky M, Ouh-Young M, Steele O, Brooks FP Jr, Behensky M (1990) Feeling and seeing:
issues in force display. Comput Graphics 24:235–243 (ACM Press)
2. Barfield W, Furness TA (1995) Virtual environments and advanced interface design. Oxford
University Press, New York
3. Duke D, Puerta A (1999) Design, specification and verification of interactive systems.
Springer, Wien
4. Mihelj M, Podobnik J (2012) Haptics for virtual reality and teleoperation. Springer
5. Jones LA (2000) Kinesthetic sensing. Human and machine haptics. MIT Press, Cambridge
6. Biggs SJ, Srinivasan MA (2002) Handbook of virtual environments, chap haptic interfaces.
Lawrence Erlbaum, New York
7. Lederman SJ, Klatzky R (2009) Haptic perception: a tutorial. Attention Percept Psychophysics
71:1439–1459
8. Baraff D (1994) Fast contact force computation for nonpenetrating rigid bodies. Computer
Graphics Proceedings, SIGGRAPH, Orlando, pp 23–34
9. Gottschalk S (1997) Collision detection techniques for 3D models. Cps 243 term paper, Uni-
versity of North Carolina
10. Lin M, Gottschalk S (1998) Collision detection between geometric models: a survey. In: Pro-
ceedings of IMA conference on mathematics on surfaces, pp 11–19
11. Adachi Y, Kumano T, Ogino K (1995) Intermediate representation for stiff virtual objects. In:
Proceedings of virtual reality annual international symposium, pp 203–210
12. König H, Strothotte T (2002) Fast collision detection for haptic displays using polygonal models.
In: Proceedings of the conference on simulation and visualization, Ghent, pp 289–300
13. Okamura AM, Smaby N, Cutkosky MR (2000) An overview of dexterous manipulation. In:
Proceedings of the IEEE international conference on robotics and automation, pp 255–262
14. Salisbury JK, Brock D, Massie T, Swarup N, Zilles C (1995) Haptic rendering: programming
touch interaction with virtual objects. Symposium on interactive 3D graphics, Monterey, USA,
pp 123–130
15. Basdogan C, Srinivasan MA (2001) Handbook of virtual environments: design, implemen-
tation, and applications, chap. haptic rendering in virtual environments, Lawrence Erlbaum
Associates, New Jersey, pp 117–134
16. Kazerooni H, Her MG (1994) The dynamics and control of a haptic interface device. IEEE
Trans Rob Autom 10:453–464
17. Hogan N (1989) Controlling impedance at the man/machine interface. In: Proceedings of the
IEEE international conference on robotics and automation, pp 1626–1631
18. Carignan CR, Cleary KR (2000) Closed-loop force control for haptic simulation of virtual
environments. Haptics-e 1(2):1–14
19. Hannaford B, Venema S (1995) Virtual environments and advanced interface design, chap.
Kinesthetic displays for remote and virtual environments, Oxford University Press Inc., New
York, pp 415–436
20. Youngblut C, Johnson RE, Nash SH, Wienclaw RA, Will CA (1996) Review of virtual envi-
ronment interface technology. Ida paper p-3786, Institute for Defense Analysis, Virginia, USA
21. Burdea G (1996) Force and touch feedback for virtual reality. Wiley, New York
22. Hollerbach JM (2000) Some current issues in haptics research. In: Proceedings of the IEEE
international conference on robotics and automation, pp 757–762
23. Bar-Cohen Y (1999) Topics on nondestructive evaluation series, vol 4: automation, miniature
robotics and sensors for non-destructive testing and evaluation. The American Society for
Nondestructive Testing, Inc
24. Hayward V, Astley OR (1996) Performance measures for haptic interfaces. Robotics Research,
pp 195–207
25. Richard C, Okamura A, Cutkosky MC (1997) Getting a feel for dynamics: using haptic interface
kits for teaching dynamics and control. In: Proceedings of the ASME IMECE 6th annual
symposium on haptic interfaces, Dallas, TX, USA, pp 15–25
Chapter 8
Augmented Reality
8.1 Definition
The goal of virtual reality is to replace sensations from the real world with artificial
sensations that originate from a virtual world. In an ideal virtual reality system, the
human is thus completely immersed into the virtual world and does not perceive the
real world at all. However, no-one says that both worlds can’t be presented to the user
at the same time: some information from the real world and some from the virtual
world. The virtual environment thus doesn’t envelop the user completely, allowing
him/her to maintain a feeling of presence in the real world.
In 1994, Milgram and Kishino [1] introduced the reality-virtuality continuum
to describe such mixed realities. The continuum defines different mixtures of real
and virtual worlds (Fig. 8.1). Between the purely real and virtual environments, we
can thus also find augmented reality (real world with additional virtual information)
and augmented virtuality (virtual world with additional real information). Today,
augmented reality is much more prevalent than augmented virtuality and already has
many important applications.
Augmented reality is defined as augmenting an image of the real world (seen by the
user) with a computer-generated image that enhances the real image with additional
information. Besides combining the real and virtual worlds, an augmented reality
system must also allow interaction in real time and track both real and virtual objects.
Fig. 8.1 The reality-virtuality continuum
8.2 Modeling the Real Environment
Modeling the real environment usually has two phases: first sensing the information
from the environment, then reconstructing the environment.
Information from the real environment can be obtained using different sensing
technologies: digital cameras, accelerometers, global positioning systems (GPS),
ultrasonic sensors, magnetometers, lasers, radio waves etc. Compared to sensors for
virtual reality, sensors for augmented reality require a higher accuracy and greater
range since they may also be used e.g. outdoors. Of course, sensing is much eas-
ier indoors since outdoor sensors need to be more mobile and resistant to damage.
Furthermore, buildings can easily be modeled in advance, and their lighting or tem-
perature can be controlled.
Sensors in augmented reality are divided into active and passive ones. With passive
sensors, no equipment needs to be mounted on the object we wish to detect; everything
is done by the sensor. Such systems are more user-friendly since objects don’t need
to be additionally equipped with cumbersome cables, but accurate passive sensing
requires expensive equipment and complex software. Active sensors involve a device
(such as a marker) placed on the object we wish to track. This makes tracking easier,
but the devices need to be placed on all objects.
Popular sensor systems in augmented reality include [3]:
• Cameras with passive tracking are normal video cameras that record images of
the environment, then use image analysis methods (edge search, comparison to
previously recorded images) to extract objects and determine their position in the
environment. A training phase is usually necessary for successful object recog-
nition. It involves showing different objects to the camera from different angles,
thus allowing it to recognize them later.
• Cameras with active tracking also record images of the environment, but they do
not try to recognize objects in the image. Instead, they only search for special
markers that were previously placed on objects. These markers either have a
special shape or emit light (visible or infrared), so they can be easily recognized
by computers. Cameras with active tracking are more accurate than those with
passive tracking, but require the markers to be placed in advance.
• Ultrasonic sensors detect objects using ultrasonic waves (usually 40 kHz) emitted
into the environment. There are two possible implementations. In the first one,
the ultrasound emitter is attached to the object while the receivers are arrayed
around the room. The object’s position can be calculated from the time it takes the
ultrasound wave to reach the different receivers. In the second implementation,
both the emitter on the object and a fixed emitter in the room emit ultrasound
waves of the same frequency. These waves are measured using multiple receivers
arrayed around the room. The receivers measure the sum of both waves, which is
different depending on the position of both emitters (phase delay).
• Inertial sensors are a combination of accelerometers and gyroscopes attached to the
object we wish to track. If the object’s initial position is known, it can theoretically
be tracked by integrating the measured acceleration. In practice, it is necessary to
minimize measurement errors, as they are otherwise also integrated and thus result
in inaccurate measurements. Inertial sensors are thus frequently combined with
magnetometers, which measure the Earth’s magnetic field and give a reference
absolute orientation. Magnetometers themselves can provide certain information
about object positions, but combining them with accelerometers and gyroscopes
allows more accurate tracking.
• Global positioning systems calculate their position based on radio signals trans-
mitted by a system of artificial satellites orbiting the Earth. Each satellite transmits
information about its own position, the positions of the other satellites, and the
time at which the signal is emitted. A receiver needs a connection with at least four
satellites to calculate its position. Receivers need to be attached to all objects we
wish to track, and the tracking quality depends on the sensitivity and accuracy of
the receiver as well as the quality of the connection with the satellites. This quality
is poor inside buildings or near very high buildings, among other places.
• Hybrid systems combine multiple types of sensors and thus compensate for the dis-
advantages of each individual type. Global positioning systems can, for example,
be combined with inertial systems that temporarily track the object's motion when
no connection to a satellite is available. Similarly, inertial sensors can be combined
with cameras, since cameras perform better with slow motions while inertial sensors
perform better with fast motions.

Fig. 8.2 Integration of the real and virtual environments. The different coordinate systems need to
be properly aligned
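The inertial-plus-reference fusion described above is often implemented as a complementary filter. The sketch below, a generic illustration rather than a method from the text, fuses a drift-prone gyroscope rate with a noisy but drift-free tilt angle derived from the accelerometer's gravity measurement; the weighting alpha and sample time are assumed values:

```python
def complementary_filter(gyro_rates, accel_angles, dt=0.01, alpha=0.98):
    """Fuse gyroscope rate and accelerometer tilt into one angle estimate.

    gyro_rates   -- angular rate samples (rad/s): smooth but drift-prone
    accel_angles -- tilt computed from gravity (rad): noisy but drift-free
    alpha        -- assumed weighting; closer to 1 trusts the gyro more
    """
    angle = accel_angles[0]
    estimates = []
    for rate, a_ang in zip(gyro_rates, accel_angles):
        # integrate the gyro, then pull the result toward the absolute reference
        angle = alpha * (angle + rate * dt) + (1 - alpha) * a_ang
        estimates.append(angle)
    return estimates
```

With a stationary sensor whose gyro has a constant bias, pure integration would drift without bound, while the filtered estimate stays near the accelerometer reference.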
Once the positions of the user, display and objects in the real environment are known,
it is possible to create a three-dimensional model of the real environment and inte-
grate it with a model of the virtual environment (Fig. 8.2). The integrated model
then allows e.g. collisions between real and virtual objects to be calculated. The
mathematical tools needed to reconstruct the real environment are identical to the
previously described methods for calculating interactions between objects in virtual
reality (Chap. 3), and will thus not be separately described here.
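Aligning the coordinate systems amounts to applying homogeneous transformations between the camera, world and virtual-object frames. The minimal sketch below applies a 4x4 rotation-plus-translation matrix to a point of a virtual object to express it in another frame; the particular transform used in the test is hypothetical:

```python
import math

def transform_point(T, p):
    """Apply a 4x4 homogeneous transform T to a 3-D point p."""
    x, y, z = p
    v = (x, y, z, 1.0)
    return tuple(sum(T[r][c] * v[c] for c in range(4)) for r in range(3))

def rot_z(theta, t=(0.0, 0.0, 0.0)):
    """Homogeneous matrix: rotation about z by theta plus translation t."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c,  -s,  0.0, t[0]],
            [s,   c,  0.0, t[1]],
            [0.0, 0.0, 1.0, t[2]],
            [0.0, 0.0, 0.0, 1.0]]
```

Chaining such matrices (camera-to-world, world-to-object) expresses all objects in one common frame, after which the collision and interaction methods of Chap. 3 apply unchanged.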
8.3 Displays
Similarly to virtual reality, the visual appearance is probably the most important
component of augmented reality. The basic visual display technologies are similar to
those in virtual reality, but there are additional challenges due to the need for mobility
and the integration of information from both real and virtual environments.
[Figures: see-through augmented reality displays. In the optical version, computer graphics based
on the measured head position are projected onto semitransparent glass through which the user sees
the real world; in the video version, a camera image of the real world is combined with virtual
objects and shown on a screen]
Handheld displays are built into small portable devices such as smartphones or tablet
computers. A camera is usually built into the other side and captures an image of
the real world. By displaying this image on the screen, it gives the user the impres-
sion of looking through the device. These displays usually also include accelerome-
ters, digital compasses or global positioning systems, making them very mobile and
suitable for outdoor use. However, due to their small screen they usually allow only
a two-dimensional virtual image and do not create a feeling of virtual presence.
Spatial displays create the virtual component of augmented reality on the surface of
objects in the environment. This is usually done with projectors or holograms that
can either be limited to a single object (e.g. a table or wall) or cover the entire room
with augmented reality. In both cases, a model of the room and the objects in it is
required for accurate projection. Spatial displays offer both two-dimensional and
three-dimensional images, and can also be used by several people simultaneously.
Sound displays in augmented reality are mostly limited to the kind of headphones
and speakers seen in normal virtual reality. However, some displays also incorporate
so-called haptic sound: sound felt through vibrations. This is generally used in head-
phones and mobile devices in order to increase realism and augment user interfaces.
In principle, augmented reality can stimulate all five senses, but most practical sys-
tems focus on sight and hearing. Haptic feedback appears mainly as part of user
interfaces, while smell and taste are rarely seen in either virtual or augmented reality.
However, some examples do exist. The most noteworthy are food simulators, which
are equipped with scented, tasty fluids. They offer the user a normal piece of food
sprayed with one of the fluids, giving the user the impression of eating a different
type of food in augmented reality than in the real world.
8.4 User Interfaces
Just like virtual reality, augmented reality must offer the user the possibility of
interacting with virtual objects. Typical user interfaces include [4, 5]:
• Tangible interfaces allow interaction with the virtual world via physical objects
and tools: pointers, gloves, pens, haptic robots etc.
• Collaborative interfaces use multiple displays and interfaces, allowing several
users to work together. These users can all be in the same place or at any distance
from each other.
• Hybrid interfaces combine multiple complementary interfaces and thus allow
many interaction options. Since they are very flexible, they are suitable for sponta-
neous environments where we do not know how the user will wish to communicate
with the augmented reality system.
• Multimodal interfaces combine tangible interfaces with natural forms of interac-
tion such as speech, arm movements and gaze.
8.5 Applications
8.5.1 Games
Augmented reality is an excellent opportunity for games that augment a real game
(e.g. a board game) with sounds and visual stimuli. These can make the game more
interesting or even help the player by, for example, giving a warning when the desired move
is invalid. A simple example is chess with virtual figures that can be moved with
a pointer. A similar principle is used by videogames that use various interfaces to
combine information from the real and virtual environments. Some games even allow
collaboration between many people using multiple displays and interfaces. Perhaps
the first example of such a game was the British television show Knightmare, in which
a group of children had to accomplish a certain goal in augmented reality. One of the
children traveled through a virtual world with real opponents (players) while the other
children observed on screens and gave instructions.
Augmented reality games can also be used for educational purposes. The US
army, for instance, allows soldiers to train with real weapons and virtual opponents
that react to the soldier’s movements. More peaceful games may ask the player to
solve various physical or mental challenges, thus teaching certain skills.
The user of an augmented reality system is not necessarily an active participant in
the game; he or she may be only a passive observer who obtains additional information
via augmented reality. The concept is often seen in sports broadcasts where the video
from the playing field (real information) is combined with displays of the current
score, statistical data about the players and so on. If the broadcast is shown on a
computer, the viewer may be able to select individual players and viewpoints, thus
obtaining the most desired information. The same concept can also easily be used
for education: at museums and other sights, augmented reality can offer additional
information about the user’s location and the objects seen there.
8.5.2 Medicine
Augmented reality has been extensively used to train doctors similarly to educational
games from the previous subsection. Furthermore, it is a valuable tool even for
experienced doctors since it can offer additional information in critical situations.
For instance, during surgery the computer can project an image of the patient’s
internal organs on the surface of the skin and thus help determine the exact location
of an incision. During diagnostic procedures, the computer can also project internal
organs onto the skin, thus letting the doctor better examine critical spots and estimate
the patient’s health.
8.5.3 Machine Maintenance and Design
Just like a surgeon can obtain information about the patient's internal organs during
surgery, an engineer or repairman can obtain information about a machine’s internal
parts while assembling or repairing it. Here, augmented reality projects a blueprint
or other information (e.g. temperature of individual parts) directly onto the device,
thus allowing easier analysis of individual parts and a better overview of the device
as a whole.
Augmented reality can also be used to design complex machines. Actual compo-
nents can be combined with virtual components that we wish to test. We can thus
also quickly determine whether a component is suitable, whether the model of the
machine accurately corresponds to the real machine, how the completed product
would look and so on.
8.5.4 Navigation
When we’re traveling, augmented reality can help us reach our goal by providing us
with additional information. If traveling on foot, we can photograph the road with
a handheld device. The augmented reality system then finds known landmarks on
the image and uses them to determine the best route to take. If traveling by car,
augmented reality can be projected directly onto the windshield and provide the
driver with information such as road and weather conditions.
8.5.5 Advertising
Augmented reality was first used for advertising in the automotive industry. Some
companies printed special flyers that were automatically recognized by webcams,
causing a three-dimensional model of the advertised car to be shown on the screen.
This approach then spread to various marketing niches, from computer games and
movies to shoes and furniture. The ubiquitous QR-code (Fig. 8.5) is a very simple
example of such augmented reality: a black-and-white illustration that turns into
more complex information when analyzed by a mobile phone or computer.
An example of more complex augmented reality is virtually trying on shoes. The
user wears a special pair of socks, then walks in front of a camera and sees his/her
own image on the screen wearing a desired pair of shoes. The model, color and
accessories of the shoes can be changed in an instant, allowing the user to easily find
the most attractive footwear.
References
1. Milgram P, Kishino F (1994) A taxonomy of mixed reality visual displays. IEICE Trans Inf
Syst E77–D(12):1321–1329
2. Azuma R (1997) A survey of augmented reality. Presence Teleoperators Virtual Environ 6:
355–385
3. Costanza E, Kunz A, Fjeld M (2009) Mixed reality: a survey. Lecture notes in computer science.
Springer
4. Carmigniani J, Furht B, Anisetti M, Ceravolo P, Damiani E, Ivkovic M (2011) Augmented reality
technologies, systems and applications. Multimedia Tools Appl 51:341–377
5. van Krevelen DWF, Poelman R (2010) A survey of augmented reality technologies, applications
and limitations. Int J Virtual Reality 9:1–20
Chapter 9
Interaction with a Virtual Environment
Interaction with a virtual environment is the most important feature of virtual reality.
Interaction with a computer-generated environment requires the computer to respond
to the user’s actions. The mode of interaction with a computer is determined by the
type of user interface. Proper design of the user interface is of utmost importance,
since it must guarantee the most natural interaction possible. The concept of an ideal
user interface uses interactions from the real environment as metaphors through
which the user communicates with the virtual environment.
Interaction with a virtual environment can be roughly divided into manipulation,
navigation and communication. Manipulation allows the user to modify the virtual
environment and to manipulate objects within it. Navigation allows the user to move
through the virtual environment. Communication can take place between different
users or between users and intermediaries in a virtual environment.
Fig. 9.1 Manipulation methods: a direct user control (gesture recognition), b physical control
(buttons, switches, haptic robots), c virtual control (computer-simulated control devices) and
d manipulation via intelligent virtual agents
Direct user control (Fig. 9.1a) allows users to interactively manipulate an object
in a virtual environment the same way as they would in the real environment. Gestures
or gaze direction enable selection and manipulation of objects in the virtual environment.
Physical control (Fig. 9.1b) enables manipulation of objects in a virtual environ-
ment with devices from the real environment (buttons, switches, haptic robots). Physical
control allows passive or active haptic feedback.
Virtual control allows manipulation of objects through computer-simulated
devices (simulations of real-world devices such as virtual buttons or a steering wheel;
Fig. 9.1c) or avatars (intelligent virtual agents; Fig. 9.1d). The user activates a virtual
device via an interface (a real device), or sends commands to an avatar that performs
the required action (by voice or through gestures). The advantage
of virtual control is that one real device (for example, a haptic robot) can activate
several virtual devices.
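The idea that one real device can drive several virtual controls can be sketched as a small input router. The Python sketch below is illustrative only; all class and method names (`VirtualControlRouter`, `physical_button_pressed` and so on) are our own inventions, not part of any particular VR toolkit.

```python
class VirtualButton:
    """A computer-simulated control device (a virtual button)."""
    def __init__(self, name):
        self.name = name
        self.pressed = False

    def activate(self):
        self.pressed = True
        return f"{self.name} pressed"


class VirtualControlRouter:
    """Routes input from one real device to several virtual devices."""
    def __init__(self):
        self.devices = {}
        self.focus = None

    def add(self, device):
        self.devices[device.name] = device

    def set_focus(self, name):
        # Select which virtual device the real device currently controls.
        self.focus = self.devices[name]

    def physical_button_pressed(self):
        # The single real button activates the focused virtual device.
        return self.focus.activate() if self.focus else None


router = VirtualControlRouter()
router.add(VirtualButton("start_press"))
router.add(VirtualButton("robot_gripper"))
router.set_focus("robot_gripper")
print(router.physical_button_pressed())  # the same real button now drives the gripper
```

Switching focus with `set_focus` is what lets one haptic robot or button panel stand in for many virtual controls.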
9.1 Manipulation Within Virtual Environment
Fig. 9.2 Multimodal feedback (left image); virtual fixture that constrains movement of a ball along
the tunnel (right image)
Fig. 9.3 What an external viewer would see (left image); what the avatar would see from a
first-person perspective (right image)
9.2 Navigation Within the Virtual Environment
In determining the current position and path through space, it is important to generate
a mental model of the virtual environment through which the user moves. Knowing
one’s location and neighborhood is defined as position awareness. Creation of a mental
model is based on different strategies that can be summarized as (1) divide and
conquer, where the virtual environment is divided into smaller subregions and the user
learns the features of each subregion and the paths between subregions; (2) global
network, which is based on landmarks that the user remembers, with navigation
performed relative to these known landmarks; and (3) gradual expansion, which is based
on gradual memorization of the map of the entire area (the user starts with a small
region that is gradually expanded outwards). Path planning may be assisted by maps,
instrumental navigation, virtual fixtures or other features. A bird’s-eye view of the
scene also significantly simplifies navigation.
9.2.2 Traveling
In a virtual environment where the area of interest extends beyond the user’s direct
virtual reach, traveling is one possibility for exploring the space. Some traveling
methods are shown in Fig. 9.4.
Fig. 9.4 Traveling methods: a locomotion, b path tracking, c towrope, d flying and e displacement
Physical locomotion is the simplest way to travel. It requires only tracking of the
user’s body movement and adequate rendering of the virtual environment. The ability
to move in real space also enables proprioceptive feedback, which helps to create a
sense of the relationships between objects in space. A device that tracks user movement
must have a sufficiently large working area. Path tracking (virtual tunnel) allows the
user to follow a predefined path in a virtual environment. The user is able to look
around, but cannot leave the path. The towrope method is less constraining for the
user than path tracking: the user is towed through space and may move around the
coupling entity within a limited area. Flying does not constrain the user’s movement to
a surface. It allows free movement in three-dimensional space and at the same time
enables a different perspective of the virtual environment. The fastest way of mov-
ing through a virtual environment is simple displacement, which enables movement
between two points without navigation (the new location is reached instantly).
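The towrope constraint amounts to clamping the user's desired position to a limited area around the moving coupling entity. A minimal 2-D sketch follows; the function name and the choice of a circular allowed area are our own illustrative assumptions, not from the text.

```python
import math

def towrope_position(coupling, desired, radius):
    """Towrope traveling: the user may roam freely within `radius`
    of the moving coupling point, but is dragged along with it."""
    dx, dy = desired[0] - coupling[0], desired[1] - coupling[1]
    d = math.hypot(dx, dy)
    if d <= radius:
        return desired
    # Too far from the towing point: clamp onto the boundary circle.
    s = radius / d
    return (coupling[0] + dx * s, coupling[1] + dy * s)

# inside the allowed area: position unchanged
print(towrope_position((0, 0), (1, 0), 2.0))   # (1, 0)
# outside the allowed area: pulled back to the boundary
print(towrope_position((0, 0), (4, 0), 2.0))   # (2.0, 0.0)
```

As the coupling entity moves along its path, re-evaluating this clamp each frame tows the user through the environment.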
The aim of interaction with other users is the exchange of information and experi-
ences. Technologically, shared experience can be divided into two categories: either all
users are virtually present in the virtual environment, or some users are outside
observers, an audience that watches the users who are present in the virtual
environment.
power requires the cooperation of several persons, there are many tasks that require
cooperation between experts such as architects, researchers or medical specialists.
The degree of participation in a virtual environment may extend from zero, where
users merely coexist in a virtual environment, to the use of special tools that allow
users to simultaneously work on the same problem.
Interactive cooperation requires environmental coherency, which defines the extent
to which the virtual environment is the same for all users. In a completely coherent
environment, any user can see everything that other users do. It is often not necessary
for all features of the virtual environment to be coherent. Coherency is of primary
importance for simultaneous cooperation, for example, when several users work on a
single object.
Fig. 10.1 A generalized concept for the use of virtual reality on different platforms and different
applications
Human models (avatars) can also be introduced into the environment, for example,
to liven up the presentation of a building’s architecture. The possibility of user
interaction with the virtual environment is also important.
The virtual environment can be displayed on a variety of platforms, ranging
from stereoscopic and holographic systems to web applications and interactive 3D
documents.
Each virtual environment has its own purpose, its added value. Virtual environ-
ments can be used efficiently in education (presenting systems through interactive
simulations rather than pictures in textbooks), in marketing and sales (presenting
product characteristics, configuring a product based on the user’s needs), and in
research and development (where the researcher can simulate and analyze events at
the level of a model); they can also serve as interactive instructions for the use of
various devices.
In the next sections we analyze some examples of virtual environments and
their use for various purposes.
The first presented scenario is a virtual game of table hockey. This virtual environment
is relatively simple and consists of a hockey table, puck and two objects that represent
the user’s handle and the opponent (Fig. 10.2).
Different coordinate frames can be identified in any virtual environment: the coor-
dinate frame of the virtual environment itself (a reference frame for all other
entities), a coordinate frame that defines the viewpoint of the user (how the user
perceives the virtual environment) and coordinate frames that define the positions
(and orientations) of objects (each object with its own coordinate frame) in the virtual
environment.
10.1 Interactive Computer Game
Fig. 10.2 The user plays a game of table hockey through the use of a haptic interface. The user is
coupled to the haptic device end-effector
Fig. 10.3 Scene graph for the virtual game of table hockey (node types: group, rendered object,
modifier and coordinate frame transformation)
Fig. 10.4 Physical interactions between objects in a virtual environment and collision forces. The
player and opponent cannot collide
Figure 10.3 shows the scene graph for a virtual game of table hockey. The scene
graph is relatively simple due to the small number of virtual objects. A hockey table
is placed in a virtual environment and playing objects (puck and two handles) are
positioned on the table. Each individual element can move relative to the table, so
blocks representing transformation of coordinate frames are placed between the table
block and playing object blocks.
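The hierarchy just described can be sketched as a tiny scene graph in which each node carries a coordinate-frame transformation relative to its parent; for brevity this sketch uses pure 2-D translations instead of full homogeneous transformations, and all names are illustrative.

```python
class Node:
    """Scene-graph node with a local coordinate-frame transformation."""
    def __init__(self, name, local=(0.0, 0.0)):
        self.name = name
        self.local = local          # translation of this frame w.r.t. the parent
        self.children = []

    def add(self, child):
        self.children.append(child)
        return child

    def world(self, origin=(0.0, 0.0)):
        """Yield (name, world position) for this node and its subtree by
        composing transformations from the root downwards."""
        pos = (origin[0] + self.local[0], origin[1] + self.local[1])
        yield self.name, pos
        for c in self.children:
            yield from c.world(pos)


# Table hockey: the table is placed in the environment, and the playing
# objects are attached to it through transformation nodes so that each
# can move relative to the table.
env = Node("environment")
table = env.add(Node("table", (1.0, 0.0)))
table.add(Node("puck", (0.2, 0.3)))
table.add(Node("player", (-0.4, 0.0)))

print(dict(env.world()))  # moving the table moves everything attached to it
```

Because the puck and handles hang below the table node, changing the table's transform automatically displaces all objects on it, which is exactly the point of placing transformation blocks between the table and the playing objects.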
Objects in a virtual environment are defined by their properties, which deter-
mine their geometry, dynamics, visual and acoustic appearance: dimensions, weight,
inertia, friction coefficient, stiffness, color and texture, sounds (during collisions).
Since there are three objects on the hockey table that can move independently of
each other, it is necessary to detect collisions between them. Figure 10.4 shows a
graph of possible collisions between objects. We assume that all collisions between
objects are possible; only the player and the opponent cannot collide, because each
is limited to its own half of the table. As a result of collisions, reaction forces can be
calculated. Forces always act on both colliding objects, but with opposite signs.
Forces acting on the table can be neglected, since we assume that the table is grounded
(an object with infinite mass), which prevents its movement.
In addition to collision forces, it is also necessary to consider forces resulting from
interactions with the medium in which the virtual object is moving (air resistance,
friction with the table) and the gravitational force field. All forces are then summed,
and the resultant force is used to compute the object’s movement based on its
dynamic properties.
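As a minimal sketch of this force-summation step, the following snippet sums the forces acting on the puck and integrates its motion with semi-implicit Euler; the mass, drag coefficient and time step are illustrative assumptions, not values from the text.

```python
def step(pos, vel, forces, mass, dt):
    """Sum all forces acting on an object and integrate its motion
    (semi-implicit Euler: update velocity first, then position)."""
    fx = sum(f[0] for f in forces)
    fy = sum(f[1] for f in forces)
    vx = vel[0] + fx / mass * dt
    vy = vel[1] + fy / mass * dt
    return (pos[0] + vx * dt, pos[1] + vy * dt), (vx, vy)


mass, dt = 0.1, 0.01                       # 100 g puck, 10 ms simulation step
pos, vel = (0.0, 0.0), (1.0, 0.0)          # puck sliding along the table
drag = (-0.05 * vel[0], -0.05 * vel[1])    # air resistance, illustrative coefficient
collision = (0.0, 2.0)                     # reaction force from a collision
pos, vel = step(pos, vel, [drag, collision], mass, dt)
```

Running this update once per rendering frame, with the collision forces recomputed from the collision graph of Fig. 10.4, is the basic loop of such a physical simulation.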
Finally, computer games come to life through the use of 3D technologies. Stereo-
scopic displays with spatial effect put the player in the center of the action.
10.2 Simulated Operation of Complex Systems
As technology progresses, systems become more complex and their use requires
many new skills. Virtual reality is a medium well suited for training operators of
such systems. It is thus possible to practice flying an aircraft, controlling a crane,
navigating a ship, performing surgical procedures and many other tasks in a virtual
environment. Such environments are also important in the field of robotics. Their
advantage is not only that they enable simulation of a robotic cell: if a virtual
environment contains software modules that behave like a real robot controller, it is
possible to write and validate robot programs that are only transferred to a real robot
at a later stage. This saves programming time, since robot teaching is done offline in
a simulation environment while the robot can still be used in the real environment in
the meantime. At the same time, a virtual environment enables verification of software
correctness before the program is finally transferred to the robot. Simulation-based
programming may thus help avoid potential system malfunctions and consequent
damage to the mechanism or robot cell.
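A simulated controller of this kind can be sketched as a small interpreter that executes the same command list a real controller would, rejecting malformed programs before they ever reach the robot. The command names (`MOVEJ`, `WAIT`) and the three-joint model below are purely illustrative, not the syntax of any real robot controller.

```python
class SimulatedController:
    """Offline stand-in for a robot controller: interprets a program
    and validates it without moving any real hardware."""
    def __init__(self):
        self.joints = [0.0, 0.0, 0.0]
        self.log = []

    def execute(self, program):
        for line, (cmd, *args) in enumerate(program, start=1):
            if cmd == "MOVEJ":
                # joint-space move: one target value per joint
                if len(args) != len(self.joints):
                    raise ValueError(f"line {line}: expected {len(self.joints)} joint values")
                self.joints = list(args)
            elif cmd == "WAIT":
                self.log.append(f"wait {args[0]} s")
            else:
                raise ValueError(f"line {line}: unknown command {cmd!r}")
        return self.joints


program = [("MOVEJ", 0.5, -0.2, 1.0), ("WAIT", 2.0), ("MOVEJ", 0.0, 0.0, 0.0)]
print(SimulatedController().execute(program))  # final joint configuration
```

A program that raises an error here would also fail on the real controller, which is precisely the verification step that simulation-based programming provides.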
Figure 10.5 shows a robotic cell consisting of a robot and an eccentric press. In
this case, the robot tends the press by inserting raw material and removing
semifinished products. User interface devices (robot teach panel and press controls)
are shaded in gray.
Fig. 10.5 Robot tending an eccentric press. Interfaces that are part of the virtual environment and
enable interaction with the user (control of the robot and press) are marked in gray
Fig. 10.6 Partial scene graph (most of the elements are omitted due to the model complexity) of a
robot tending an eccentric press
Buttons on the teach boxes simulate the
operation of real controls. In principle, it is also possible to control the device through
a physical form of the user interface (teach box) connected with the virtual
environment. In such cases, interaction with the system becomes even more realistic.
Through the robot’s teaching unit it is, for example, possible to program the robot
as it would be done on the real system. A robot controller in a virtual environment
can interpret software commands the same way as the real controller of the system.
An operator trained in a virtual environment can immediately take over control of
the robotic cell, since he knows all its properties and behaviors.
Since the robot and the eccentric press are virtual replicas of real devices, such a
simulation can also serve a technician during troubleshooting. In a virtual environ-
ment it is possible to show the internal structure of the system as well as disassembly
and assembly procedures for individual parts.
Figure 10.6 shows the scene graph for the robotic cell. This graph is much more
complex than the graph representing the game of table hockey. The graph also shows
that the robot segments are connected in series (a serial mechanism): a displacement
of the first joint changes the positions of all subsequent robot segments. The eccentric
press has even more components, so not all are shown in the scene graph.
During the design of complex virtual environments such as the presented robotic
cell, it is necessary to determine the mutual relationships among all components of
the system. For example, rotation of the eccentric press’s main motor affects the
movement of the gears and the entire tool section of the press. For proper functioning
of the simulation, it is necessary to know the relationships between displacements,
which are determined by the kinematic model of the device.
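The kinematic dependence described above, where a displacement of the first joint moves all later segments, can be sketched for a planar serial mechanism whose joint rotations accumulate along the chain. The link lengths and angles below are illustrative assumptions.

```python
import math

def forward_kinematics(joint_angles, link_lengths):
    """Planar serial mechanism: each joint rotates everything after it,
    so a displacement of the first joint moves all later segments."""
    x = y = 0.0
    theta = 0.0
    points = [(x, y)]
    for angle, length in zip(joint_angles, link_lengths):
        theta += angle                      # rotations accumulate along the chain
        x += length * math.cos(theta)
        y += length * math.sin(theta)
        points.append((x, y))              # position of the end of this segment
    return points


links = [1.0, 1.0]
print(forward_kinematics([0.0, 0.0], links))           # chain stretched along x
print(forward_kinematics([math.pi / 2, 0.0], links))   # first joint moves both segments
```

In a full 3-D model the accumulated angle is replaced by composed homogeneous transformation matrices, but the chain structure, and hence the propagation of displacements, is the same.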
10.3 Modeling and Simulation of an Avatar
Fig. 10.7 Modeling of a mechanism with a skeleton: a initial block representing the trunk,
b extrusion of legs from the trunk, c robot model and d skeleton that allows displacement of
robot segments (t trunk, h head, ar right arm, al left arm, lr right leg and ll left leg)
Fig. 10.8 Placement of coordinate frames (a) and OBB bounding volumes for collision detection
between the robot and other objects (b) (t trunk, h head, ar right arm, al left arm, lr right leg and
ll left leg)
Fig. 10.9 Two robots playing a ball game
are also each represented with a single segment. There are altogether six segments
that are connected by five joints. The central segment to which all other segments
are attached is the trunk. The skeleton forms the basis for animation of motion.
A coordinate frame is attached to each segment, and animation of avatar movement
can be achieved through transformations of coordinate frames (Fig. 10.8a). If the
surface of the robot continuously transforms (bends) across the robot joints, this
generates an appearance of a skin covering the avatar.
Models of avatars are relatively complex because they usually contain a large
number of degrees of freedom. Collision detection between avatars and surrounding
objects thus also becomes computationally intensive. For this purpose, the model of
an avatar can be simplified with the use of bounding volumes. In the simplest case, the
entire avatar can be enclosed in a single OBB (oriented bounding box) or AABB
(axis-aligned bounding box). Figure 10.8b shows a simplification of the robot
geometry with six bounding volumes covering the torso, the head and the individual
limbs. This more detailed representation of the geometry allows more accurate collision
detection, while the computation remains relatively simple.
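The AABB test reduces to comparing intervals on each axis: two axis-aligned boxes overlap exactly when their extents overlap on all three axes. A minimal sketch follows; the box coordinates are illustrative.

```python
def aabb_overlap(a, b):
    """Axis-aligned bounding boxes given as (min_corner, max_corner) in 3-D.
    Boxes overlap iff their intervals overlap on every axis."""
    (amin, amax), (bmin, bmax) = a, b
    return all(amin[i] <= bmax[i] and bmin[i] <= amax[i] for i in range(3))


head = ((-0.1, -0.1, 1.6), (0.1, 0.1, 1.8))    # illustrative avatar segment
ball = ((0.05, 0.0, 1.7), (0.25, 0.2, 1.9))
far_ball = ((2.0, 2.0, 0.0), (2.2, 2.2, 0.2))

print(aabb_overlap(head, ball))      # True
print(aabb_overlap(head, far_ball))  # False
```

Running this cheap test per bounding volume (six per avatar in Fig. 10.8b) filters out most object pairs before any expensive per-polygon collision check is needed.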
Figure 10.9 shows the concept of two robots playing a ball game. Since it is
possible to move the robot segments and detect collisions between the ball and the
robot segments, it is consequently possible to kick or throw the ball, thus allowing
implementation of different games (football, volleyball, tennis).
Figure 10.10 shows the scene graph that includes both robots and the ball. The
scene graph is relatively simple. The graph shows that the robot trunk is the basic
structure and coordinate frame that determines position and orientation of the robot
in space. Other segments are attached to the trunk.
10.4 Interactive Education Methods
Fig. 10.10 Scene graph for two robots and a ball (t trunk, h head, ar right arm, al left arm, lr right
leg and ll left leg)
Fig. 10.11 Spatial representation of various functional units in the human brain: a full brain model,
b internal functional units, c selected internal functional units and d internal functional units from
a different perspective
Fig. 10.12 Furniture configuration application: a initial assembly and positioning of an element,
b intermediate assembly, c final assembly and selection of colors (textures) and d display of
functionalities of various assembly elements
In medicine, three-dimensional displays are increasingly used for training and diag-
nosis.
Modern minimally invasive surgery is often based on a combination of endoscope,
laparoscope and three-dimensional display. The surgery is performed through a small
incision in the body. The display provides a spatial image obtained in real time
through an endoscope or other medical imaging techniques. The surgeon uses the
information presented on the display to control the laparoscope either manually