
Game Sound Technology

and Player Interaction:


Concepts and Developments

Mark Grimshaw
University of Bolton, UK

Information Science Reference


Hershey • New York
Director of Editorial Content: Kristin Klinger
Director of Book Publications: Julia Mosemann
Acquisitions Editor: Lindsay Johnston
Development Editor: Joel Gamon
Publishing Assistant: Milan Vracarich Jr.
Typesetter: Natalie Pronio
Production Editor: Jamie Snavely
Cover Design: Lisa Tosheff

Published in the United States of America by


Information Science Reference (an imprint of IGI Global)
701 E. Chocolate Avenue
Hershey PA 17033
Tel: 717-533-8845
Fax: 717-533-8661
E-mail: cust@igi-global.com
Web site: http://www.igi-global.com

Copyright © 2011 by IGI Global. All rights reserved. No part of this publication may be reproduced, stored or distributed in
any form or by any means, electronic or mechanical, including photocopying, without written permission from the publisher.
Product or company names used in this set are for identification purposes only. Inclusion of the names of the products or companies does not indicate a claim of ownership by IGI Global of the trademark or registered trademark.

Library of Congress Cataloging-in-Publication Data

Game sound technology and player interaction : concepts and development / Mark Grimshaw, editor. p. cm.
Summary: "This book researches both how game sound affects a player psychologically, emotionally, and physiologically,
and how this relationship itself impacts the design of computer game sound and the development of technology"-- Provided by
publisher. Includes bibliographical references and index. ISBN 978-1-61692-828-5 (hardcover) -- ISBN 978-1-61692-830-8
(ebook) 1. Computer games--Design. 2. Sound--Psychological aspects. 3. Sound--Physiological effect. 4. Human-computer
interaction. I. Grimshaw, Mark, 1963-
QA76.76.C672G366 2011
794.8'1536--dc22
2010035721

British Cataloguing in Publication Data


A Cataloguing in Publication record for this book is available from the British Library.

All work contributed to this book is new, previously unpublished material. The views expressed in this book are those of the
authors, but not necessarily of the publisher.
Editorial Advisory Board
Theo van Leeuwen, University of Technology, Australia
Gareth Schott, University of Waikato, New Zealand

List of Reviewers
Thomas Apperley, University of New England, Australia
Roger Jackson, University of Bolton, England
Martin Knakkergaard, University of Aalborg, Denmark
Don Knox, Glasgow Caledonian University, Scotland
Theo van Leeuwen, University of Technology, Sydney, Australia
David Moffat, Glasgow Caledonian University, Scotland
Patrick Quinn, Glasgow Caledonian University, Scotland
Gareth Schott, University of Waikato, New Zealand
Table of Contents

Foreword ............................................................................................................................................. xii

Preface ................................................................................................................................................ xiv

Acknowledgment ................................................................................................................................. xx

Section 1
Interactive Practice

Chapter 1
Sound in Electronic Gambling Machines: A Review of the Literature and its
Relevance to Game Sound ...................................................................................................................... 1
Karen Collins, University of Waterloo, Canada
Holly Tessler, University of East London, UK
Kevin Harrigan, University of Waterloo, Canada
Michael J. Dixon, University of Waterloo, Canada
Jonathan Fugelsang, University of Waterloo, Canada

Chapter 2
Sound for Fantasy and Freedom... ........................................................................................................ 22
Mats Liljedahl, Interactive Institute, Sonic Studio, Sweden

Chapter 3
Sound is Not a Simulation: Methodologies for Examining the
Experience of Soundscapes................................................................................................................... 44
Linda O’Keeffe, National University of Ireland, Maynooth, Ireland

Chapter 4
Diegetic Music: New Interactive Experiences... ................................................................................... 60
Axel Berndt, Otto-von-Guericke University, Germany
Section 2
Frameworks & Models

Chapter 5
Time for New Terminology? Diegetic and Non-Diegetic Sounds in Computer
Games Revisited... ................................................................................................................................ 78
Kristine Jørgensen, University of Bergen, Norway

Chapter 6
A Combined Model for the Structuring of Computer Game Audio...................................................... 98
Ulf Wilhelmsson, University of Skövde, Sweden
Jacob Wallén, Freelance Game Audio Designer, Sweden

Chapter 7
An Acoustic Communication Framework for Game Sound: Fidelity, Verisimilitude,
Ecology ............................................................................................................................................... 131
Milena Droumeva, Simon Fraser University, Canada

Chapter 8
Perceived Quality in Game Audio ...................................................................................................... 153
Ulrich Reiter, Norwegian University of Science and Technology, Norway

Section 3
Emotion & Affect

Chapter 9
Causing Fear, Suspense, and Anxiety Using Sound Design in Computer Games... ........................... 176
Paul Toprac, Southern Methodist University, USA
Ahmed Abdel-Meguid, Southern Methodist University, USA

Chapter 10
Listening to Fear: A Study of Sound in Horror Computer Games... .................................................. 192
Guillaume Roux-Girard, University of Montréal, Canada

Chapter 11
Uncanny Speech.................................................................................................................................. 213
Angela Tinwell, University of Bolton, UK
Mark Grimshaw, University of Bolton, UK
Andrew Williams, University of Bolton, UK

Chapter 12
Emotion, Content, and Context in Sound and Music.......................................................................... 235
Stuart Cunningham, Glyndŵr University, UK
Vic Grout, Glyndŵr University, UK
Richard Picking, Glyndŵr University, UK
Chapter 13
Player-Game Interaction Through Affective Sound... ........................................................................ 264
Lennart E. Nacke, University of Saskatchewan, Canada
Mark Grimshaw, University of Bolton, UK

Section 4
Technology

Chapter 14
Spatial Sound for Computer Games and Virtual Reality... ................................................................. 287
David Murphy, University College Cork, Ireland
Flaithrí Neff, Limerick Institute of Technology, Ireland

Chapter 15
Behaviour, Structure and Causality in Procedural Audio... ................................................................ 313
Andy Farnell, Computer Scientist, UK

Chapter 16
Physical Modelling for Sound Synthesis... ......................................................................................... 340
Eoin Mullan, Queen’s University Belfast, N. Ireland

Section 5
Current & Future Design

Chapter 17
Guidelines for Sound Design in Computer Games... .......................................................................... 362
Valter Alves, University of Coimbra, Portugal & Polytechnic Institute of Viseu, Portugal
Licínio Roque, University of Coimbra, Portugal

Chapter 18
New Wine in New Skins: Sketching the Future of Game Sound Design... ........................................ 384
Daniel Hug, Zurich University of the Arts, Switzerland

Appendix..... ....................................................................................................................................... 416

Compilation of References ............................................................................................................... 427

About the Contributors .................................................................................................................... 467

Index ................................................................................................................................................... 473


Detailed Table of Contents

Foreword ............................................................................................................................................. xii

Preface ................................................................................................................................................ xiv

Acknowledgment ................................................................................................................................. xx

Section 1
Interactive Practice

Chapter 1
Sound in Electronic Gambling Machines: A Review of the Literature and its
Relevance to Game Sound ...................................................................................................................... 1
Karen Collins, University of Waterloo, Canada
Holly Tessler, University of East London, UK
Kevin Harrigan, University of Waterloo, Canada
Michael J. Dixon, University of Waterloo, Canada
Jonathan Fugelsang, University of Waterloo, Canada

An analysis of the music and sound used in electronic gambling machines. The psychology at play is
discussed: how sound is used to create a sense of winning and how such specific sound design might be
useful to computer game sound design in general.

Chapter 2
Sound for Fantasy and Freedom... ........................................................................................................ 22
Mats Liljedahl, Interactive Institute, Sonic Studio, Sweden

The relationship between sound and image in computer games and how, in a reversal of the normal
situation, sound can be given priority over the visual. The rationale for such a reversal is demonstrated
through practical game design examples.

Chapter 3
Sound is Not a Simulation: Methodologies for Examining the
Experience of Soundscapes................................................................................................................... 44
Linda O’Keeffe, National University of Ireland, Maynooth, Ireland
What is the relationship between the player and the game’s soundscape? How elements of the soundscape
are perceived by the player is explained through the principles and theories of acoustic ecology.

Chapter 4
Diegetic Music: New Interactive Experiences... ................................................................................... 60
Axel Berndt, Otto-von-Guericke University, Germany

An analysis of diegetic music in games and, in particular, an assessment of issues of interaction and
algorithmic performance. A framework is proposed that aids the design of both individual and social musical performance paradigms in music games.

Section 2
Frameworks & Models

Chapter 5
Time for New Terminology? Diegetic and Non-Diegetic Sounds in Computer
Games Revisited... ................................................................................................................................ 78
Kristine Jørgensen, University of Bergen, Norway

The terms diegetic and non-diegetic are widely used in the analysis of games (and not solely for sound). A thorough analysis of the application of the terminology to computer game sound is provided, resulting in a new model that accounts for the interactive nature of the medium.

Chapter 6
A Combined Model for the Structuring of Computer Game Audio...................................................... 98
Ulf Wilhelmsson, University of Skövde, Sweden
Jacob Wallén, Freelance Game Audio Designer, Sweden

A framework for the analysis and design of computer game sound is provided that builds upon existing
frameworks in games and film. A practical example demonstrates the model’s utility.

Chapter 7
An Acoustic Communication Framework for Game Sound: Fidelity, Verisimilitude,
Ecology ............................................................................................................................................... 131
Milena Droumeva, Simon Fraser University, Canada

Soundscape and communication theories are used to assess the computer game’s soundscape and the ways in which the player perceives it. Different codes of realism are discussed and a model of the player
and soundscape combined as acoustic ecology is proposed.

Chapter 8
Perceived Quality in Game Audio ...................................................................................................... 153
Ulrich Reiter, Norwegian University of Science and Technology, Norway
Perceptual bi-modality and cross-modality between auditory and visual stimuli are discussed in addition
to issues of realism and verisimilitude. A design model is put forward that assesses audio quality in
computer games on the basis of player interactivity and attention.

Section 3
Emotion & Affect

Chapter 9
Causing Fear, Suspense, and Anxiety Using Sound Design in Computer Games... ........................... 176
Paul Toprac, Southern Methodist University, USA
Ahmed Abdel-Meguid, Southern Methodist University, USA

An overview of relevant emotion theories and their potential application to sound design for computer
games. In particular, discussion centres on the eliciting of fear and anxiety during gameplay, and the results of experiments in this area are presented.

Chapter 10
Listening to Fear: A Study of Sound in Horror Computer Games..................................................... 192
Guillaume Roux-Girard, University of Montréal, Canada

A thorough analysis of sound design and sound perception in the survival horror game genre that focuses upon sound’s ability to instil fear and dread in the player. An analytical model of sound design is
proposed that is founded upon the reception of sound, rather than production, and the use of the model
is illustrated through several practical examples.

Chapter 11
Uncanny Speech.................................................................................................................................. 213
Angela Tinwell, University of Bolton, UK
Mark Grimshaw, University of Bolton, UK
Andrew Williams, University of Bolton, UK

An exploration of the genesis of the Uncanny Valley theory and its implications for the design and perception of Non-Player Character speech in horror computer games. Empirical work by the authors on the perception of such speech is discussed, particularly with regard to the evocation of fear and anxiety.

Chapter 12
Emotion, Content, and Context in Sound and Music.......................................................................... 235
Stuart Cunningham, Glyndŵr University, UK
Vic Grout, Glyndŵr University, UK
Richard Picking, Glyndŵr University, UK

A summary of emotion research and its relevance to the design of sound and music for computer games
is provided before a discussion on the use and effect of musical playlists during gameplay. In particular,
such playlists can be generated automatically according to the real-world environment the player plays
in and according to the player’s changing psychology and physiology.

Chapter 13
Player-Game Interaction Through Affective Sound... ........................................................................ 264
Lennart E. Nacke, University of Saskatchewan, Canada
Mark Grimshaw, University of Bolton, UK

An assessment of the role and efficacy of psychological, physiological, and psychophysiological measurements of players exposed to sound and music during gameplay. Recent empirical results from a psychophysiological study on computer game sound are presented, followed by a discussion on the implications of biofeedback for game sound design and player immersion.

Section 4
Technology

Chapter 14
Spatial Sound for Computer Games and Virtual Reality... ................................................................. 287
David Murphy, University College Cork, Ireland
Flaithrí Neff, Limerick Institute of Technology, Ireland

An introduction to spatial sound, its application to computer games and the technological challenges
inherent in emulating real-world spatial acoustics in virtual worlds. A variety of current technologies
are assessed as to their strengths and weaknesses, and suggestions are made as to the requirements of future
technology.

Chapter 15
Behaviour, Structure and Causality in Procedural Audio... ................................................................ 313
Andy Farnell, Computer Scientist, UK

A critical assessment of the current use of audio samples for computer games from the point of view
of creativity and realism in game sound design. Procedural audio is proposed instead and the strengths and opportunities afforded by such a technology are discussed.

Chapter 16
Physical Modelling for Sound Synthesis... ......................................................................................... 340
Eoin Mullan, Queen’s University Belfast, N. Ireland

A review of one branch of procedural audio, namely physical modelling synthesis, and its potential for computer game sound design. The historical evolution of the process is traced, leading to a discussion of how such synthesis might be integrated into game engines and the implications for player interaction.
Section 5
Current & Future Design

Chapter 17
Guidelines for Sound Design in Computer Games... .......................................................................... 362
Valter Alves, University of Coimbra, Portugal & Polytechnic Institute of Viseu, Portugal
Licínio Roque, University of Coimbra, Portugal

A discussion of the relevance and importance of sound to the design of computer games with particular regard to the concepts of resonance and entrainment. Seven guidelines for game sound design are presented and exemplified through an illustrative example of a game design brief.

Chapter 18
New Wine in New Skins: Sketching the Future of Game Sound Design... ........................................ 384
Daniel Hug, Zurich University of the Arts, Switzerland

The aesthetic debt that computer game sound owes to film sound is described as a prelude to a variety
of examples from independent game developers going beyond such a paradigm in their sound design.
Suggestions are made as to how game sound design might evolve in the future to take greater account
of the interactive potential inherent in the structure of computer games.

Appendix..... ....................................................................................................................................... 416

Compilation of References ............................................................................................................... 427

About the Contributors .................................................................................................................... 467

Index ................................................................................................................................................... 473



Foreword

BANG! There, that got your attention. OK, so that’s a fairly bad joke to illustrate just what sound can
do for you… namely, GET YOUR ATTENTION! Actually, sound does so much more: it connects your
visual input to a frame of reference, the audio-visual contract. So, when we create experiences, either
in film, TV, live on stage, or in computer games, we use this cerebral connection between sound and
vision to intensify your overall experience. Because that’s our goal in any of these mediums–to create an experience!
Sound takes up 50% of this experience (maybe not 50% of the budget, but that’s another story).
There’s an old adage we audiophiles use when discussing budgets in the hope that a producer might
actually listen to us once in a while. If you get a room full of people to watch great graphics with poor
sound and then compare it to poor graphics with great sound, they will almost always perceive the latter as having the better quality graphics.
Generally producers don’t believe this story, but I have witnessed it in real life. A few years ago I
was working on an AAA title–action adventure: cars, guns, gangsters… you get the idea. One evening,
the sound designer reworked the “Whacking someone over the head with a pool cue” sound, improving
its overall effectiveness with small, subtle, deep thuds, some crunching bone (actually carrots), and a
deliciously realistic skin smacking sound (supermarket chicken being hit by a baseball bat). He added
his new sound to the game database and went home. The following morning the game team rebuilt the
whole game (including the new sound). Later that day many people congratulated the “Whacking someone
over the head with a pool cue” animator on his new improved animation: he was somewhat bemused to
say the least. He hadn’t worked on that animation for several weeks. I’m sure you can work out what happened: people saw the same animation with the new improved sound and believed they were seeing a better animation. This is how we use the audio-visual contract to our benefit.
OK, so that’s my practitioner’s story, but let’s take a look at game sound and what you need to
study if you are interested in this field… and what’s in this book. There are several axes or dimensions
to think about. Emotion is the obvious one: fear, anger, hatred and so on, these are all well represented
in game sound, from survival horrors to gangster simulations. But what about humour, joy, happiness?
Just play Mario Kart, Sonic the Hedgehog, Loco Roco and I guarantee you’ll soon realise that the sound
has a great deal to do with provoking laughter, smiles, and an enlightened mood.
So, I’ve now mentioned the breadth of experience our industry creates, but think too of another
axis, the history of game sound. From tiny little beeps and bleeps (Pong) to the colossal soundscapes
of today’s blockbuster games. A story which starts with a few programmers/musicians/sound engineers
trying to get “something” out of a paltry 8-bit chip after the graphics guys have already had their fill,
through to my point at the beginning of this introduction–persuading a producer to give you some kind of
serious sound budget. A tale of one guy who does everything (including the voice over) to a small army
of specialists from musicians, Foley artists, sound technicians, weapons specialists, vehicle specialists,
atmosphere creators, the list goes on. Our game sound pioneers took this journey and, along the way,
solved some tricky issues, like repetition–in music, in dialogue, in sound effects–memory management,
automated in-game mixing and so on.
I am going to sum this section up by saying there are now many different aspects to game sound:
music, diegetic sound, atmospheres, interactive music, development of emotional connection, realism,
abstraction, super-realism. What I really like about this book’s approach to game sound is the 5 core
sections which give it a unique and very practical way of tying together all the axes I mentioned earlier,
namely: Interactive Practice, Frameworks & Models, Emotion & Affect, Technology, and Current &
Future Design. In conclusion, then, I hope that you, as a reader, enjoy the discussions and findings presented here as much as I have.

Dave Ranyard

Dave Ranyard is the Game Director/Executive Producer of Sony’s hugely successful, 20+ million selling SingStar franchise.
He has been in the games industry since the mid nineties, starting out as an AI programmer at Psygnosis, and later moving to
Sony Computer Entertainment Europe's London Studio where he has held a number of roles over the past 10 years, ranging
from audio manager to running the internal creative services group. He has worked on titles including Wip3out, The Getaway
& The Getaway: Black Monday, the EyeToy: Play series and, more recently, SingStar. Prior to the games industry he lectured
in Artificial Intelligence at the University of Leeds where he also gained a PhD in the subject. In recent years, Dave has taken
a keen interest in GDC and is currently on the advisory board. Dave is a keen musician and he has written and produced many
records over the past 15 years.

Preface

A phrase often used when writing about the human ability to become immersed in fantasy is “the willing suspension of disbelief”, which Samuel Taylor Coleridge coined in the early 19th century as
an argument for the fantastical in prosody and poetry. What is a computer game? At base, it is nothing
more than a cheap plastic disc encased within a cheaper plastic tray. And the system it is destined for?
A box of electronics, lifeless in a corner. Put the two together, though, throw in the player's imagination
and interaction and he or she is delivered of experiences that, to use Diderot's phrase, are “the strongest
magic of art”. Disbelief is suspended willingly, sense and rationality recede, and the player becomes
engaged with, engrossed in, and, given the appropriate game, immersed in a virtual world of flickering
light and alluring sound where the fantastical becomes the norm and the mythic reality.
For the reader interested in that flickering light, there is a plethora of books and scholarly articles
on the subject. For the reader interested in the ins and outs of music and sound software, how to rig a
microphone to record sound, and how to transfer that sound to a game environment, there likewise is a
wealth of handy resources. For the reader truly interested in understanding or harnessing the power of
sound in that virtual world, in emulating reality or the creation of other realities, in engaging, engrossing
and immersing the player through sound and emotion, there is this book.
This is a book that deals with computer game sound in a variety of forms and from a variety of
viewpoints. Sound FX, rather than game music, is the topic, other than where the music is interactive or
otherwise intimately bound up with the playing of the game. Such sound FX may be used to emulate acoustic environments of the real world while others deliberately set out to create alternate realities; some are based upon the use of audio samples whilst others are starting to make use of procedural synthesis and audio processing; some sound works hand-in-hand with image and game action to immerse the game player in the gameworld while interactive music in other cases is the sole raison d’être of the game. From the simplest of puzzle games to the most detailed and convoluted of gameworlds, sound is the indicator par excellence of player engagement and interaction with the structures of the game and the rules of play.
Academic writing about game sound, its analytical and theoretical drivers, is a developing area and
this is reflected by the diversity of theoretical methodologies and the variety of terminology in use. Far
from being a weakness, this range points to the potential for the discipline and the wide appeal of its
study because it is, at heart, multidisciplinary. The range of subject matter across the chapters reflects
the complexity and potential of human interaction with sound in virtual worlds as much as it reflects
the passions, backgrounds, and training of the book's contributors. Their contributions to the study of
computer game sound bring in disciplines and theories from film studies, cultural studies, sound design,
acoustic ecology, acoustics, systems design, computer programming, and the cognitive sciences and psychology. The authors themselves have a diversity of experience: some are researchers and academics
whilst others are sound practitioners in the games industry. All are experts in their chosen field yet all
are students of game sound, forever exploring, forever questioning, forever seeking to drive the study
and practice forward.
The readership of this book is intended to be similarly diverse in terms of both discipline and motivation. There is something for everyone here: the student for whom a knowledge of computer game sound leads to that important qualification, a game sound designer wishing to keep abreast of the latest
thinking and developmental concepts, or an academic theoretician or researcher working to innovate
game sound theory or technology. Furthermore, the appeal of the book is wider than computer games,
reaching out to those working in virtual reality or with autism, for example. The reader will not find
screeds of instructions for software or hardware, programming recipes or tips on how to break into the
industry. Instead, contained within this book, will be found lucid essays on philosophical questions,
theoretical analyses on aspects of computer game sound, models for conceptualizing sound, ideas for
sound design, and provocative discussions about new sound technology and its future implications. All
chapters raise further questions as to the fascinating relationship between player and sound.
Reflecting the disciplines the authors come from, some key terms (found at the back of each chapter)
are provided with definitions that, prima facie, differ slightly to the definition provided for the same key
term in another chapter. As with the authors' preferences for American or British English, this has been
allowed to stand in order to illustrate both the diversity of approach to the topic throughout the book
and the educational and professional backgrounds of each author. The study of computer game sound
is yet young and the terminology and its application still in flux: the definition for each key term, where
minor differences exist, pertains to the chapter the key term belongs to.
The term “computer game” has been chosen, in preference to a number of other possibilities, as referring to all forms of digital game (arcade machine, gaming console, PC game, or videogame) and the reader may assume that, unless a chapter uses one of those specific forms, “computer game” references
the general case. Quite deliberately, the term has been chosen in preference to videogame in order to
fly the flag for sound: videogames are not just video but sound too and all chapters proselytize the importance of sound to the game experience even where they reference the relationship of sound to image.
“Sound” has generally been chosen in preference to “audio” because the focus of the book is on the
relationship between sound and player rather than techniques for creating and manipulating audio data.
However, “audio” is the usual term in some disciplines and, here, authors have been given free rein to
use whichever terminology they are comfortable with.
The book itself is organized into five sections. None is mutually exclusive in terms of its content.
Indeed, the astute reader will pick up divers common threads meandering their way through the chapters:
the debt game sound owes to film sound and the need to slough off that used skin, issues of presence and
player immersion, realism, the unique, interactive nature of computer game sound, and the potential for
the emotional manipulation of the player, for instance. All chapters, too, have an eye on the future and
its possibilities and authors have been encouraged to speculate on that future.
An oft-overlooked area in computer gaming (and certainly not the first thing that comes to mind
with the term “computer game”) is that of electronic gambling machines: one-armed bandits and their
modern equivalents. Karen Collins and her co-authors open the first section on Interactive Practice by
providing a fascinating glimpse into the sound of such machines and how music and sound FX provoke
and toy with the user's emotions in an effort to part them from their money. They draw parallels to sound
use in other, more typical computer games and suggest ways in which sound use in electronic gambling
machines might provide inspiration for the design and analysis of sound in computer games in general.
Mats Liljedahl's chapter is an attempt to redress the imbalance between visual and auditory modes in
computer games. It does this by providing an overview of the use of sound in virtual environments and
augmented realities, in particular, concentrating on the sound designer's required attention to emotion
and flow. Using the concept of GameFlow, Liljedahl describes and explains two games, in whose design
he was involved, in which the sound modality is purposefully given priority over the visual.
The chapter seeks to inspire and serves as an introduction to the art of computer game sound design:
Sound for Fantasy and Freedom.
Linda O Keeffe takes a holistic view of computer game sound by treating it as a dynamic soundscape
created anew at each playing. She draws upon soundscape and acoustic ecology theory to elucidate
her stance and compares and contrasts game soundscapes to real-world soundscapes. Throughout, O
Keeffe prompts questions as to the listener's perception of, and relationship to, soundscapes: what is
noise, what roles do context and the player's culture and society play? Ultimately, how can (and why
should) we design immersive soundscapes for the gameworld?
Next, Axel Berndt takes a close look at the occurrence of diegetic music in games, design principles
for music games and issues of interactivity and algorithmic performance. A critique is presented of
recent and current games as regards the performance of in-game music and advice and solutions are
offered to improve what is currently a rather static state of affairs, merely scratching at the surface of
possibility. Interactivity in music games is assessed through a critique of what is termed visualized music
and Berndt proposes a framework of design that incorporates musical performance paradigms both as
individual and as social, collaborative practice.
This leads to Frameworks & Models, which is opened by Kristine Jørgensen, whose chapter is
both an exhaustive survey of the use of diegetic terminology, with regard to game sound, and a proposal
for a new conceptual model for such sound. The main thrust of her argument is that the concepts of
diegetic sound and non-diegetic sound have been transposed from film theory to the study of computer
games with frequently scant regard for the very different premises of the two media. The interactive,
real-time nature of computer games and the immersive environments of many game genres requires a
radical reappraisal of sound usage and sound design for games: games are not films and the use-value
of game sound is greater than that of film sound.
This is followed by a chapter in which Ulf Wilhelmsson and Jacob Wallén propose a model for the
analysis and design of computer game sound that combines two previous models–the IEZA Framework
for game sound and Walter Murch's conceptual model for film sound–with affordance and cognition
theories. The IEZA Framework accounts for the structural basis of game sound, the function of sound,
while Murch's model describes sound as either embodied or encoded, a system that accounts for human
perception and cognitive load limits. Combining the two systems, the authors assert, provides a powerful
tool not only for analysis but also for the planning and design of computer game sound and this claim
is demonstrated by a practical example.
Milena Droumeva, in her chapter, filters the computer game soundscape through the precepts of
Schafer's and Truax's soundscape and acoustic communication theories. Different ways of listening to
game sound are proposed with assessments of the role of sound in the perception of realism: Does sound
provide fidelity to source or does it provide a sense of verisimilitude and what are the strengths of each
approach as regards computer game sound design? Droumeva ultimately advocates a view comprising
game soundscape and player together as an acoustic ecology and expands that ecology from the virtual
world of the game to include concurrent sounds from the real world.
Ulrich Reiter's chapter on Perceived Quality in Game Audio explores the bi-modality and cross-modality
of auditory and visual stimuli in gameworlds. It summarizes previous work in this area and proposes a
high-level salience model for the design of audio in games that accounts for both interactivity and at-
tention as the bases for the evaluation of audio quality. Issues of level of realism and verisimilitude are
discussed while the validity, and use, of Reiter's proposed model is substantiated through experimental
methods outlined towards the end of the chapter.
Paul Toprac and Ahmed Abdel-Meguid's chapter introduces the section dealing with Emotion &
Affect. The authors present to the reader four relevant emotion theories then summarize fundamental
research they conducted in order to test those properties of diegetic sound best suited to evoke sensations
of fear, anxiety, and suspense. Their results, an early example of an empirical and statistical basis for
sonic fear and anxiety, lead the authors to devise some rough heuristics for the design of such emotions
into computer game sound and to point to directions for future research in the area.
The following chapter, by Guillaume Roux-Girard, also deals with the perception of fear in computer
game sound, being an in-depth analysis of sound usage in the survival horror game genre that focuses on
sound's ability to instil fear and dread in the player. It proposes a model for sound, based upon film sound
practice and existing models for computer game sound, that is user-centric–one based on the reception
of sound rather than its production–and Roux-Girard provides several illustrative examples from recent
horror games to validate the model.
In Uncanny Speech, Angela Tinwell, Mark Grimshaw, and Andrew Williams continue the horror theme
with a look at Non-Player Character speech in horror games and its relationship to the 1970s' theory of
the Uncanny Valley. The authors trace the development of theories of the uncanny from its beginnings
in psychoanalysis over 100 years ago through to its practical application in robotics (as the Uncanny
Valley theory) and its strong correlation to fear and anxiety in computer games. Recent empirical work
by the authors is described and its implication for the design and production of Non-Player Character
speech in computer games is discussed.
Stuart Cunningham, Vic Grout, and Richard Picking's chapter looks at Emotion, Content, and Context
in Sound and Music. The chapter is an exploration of the interaction that is possible between player
and computer game sound, in particular, music playlists used in conjunction with games. The authors
provide an overview of emotion research in the context of computer games and consider the emotional
and affective value of sound and music to the player. The experimental work that is summarized in
the chapter includes the generation of musical playlists according to the environmental context of the
player: that is, the environment outside the game. Not only does this raise the intriguing situation of
sensory and perceptual overlap and interplay between real-world and virtual, but the authors also sug-
gest further possibilities such as the playlists themselves being responsive to the changing psychology
and physiology of the player during gameplay.
Concluding the section Emotion & Affect, Lennart Nacke and Mark Grimshaw's chapter is a study
of computer games as affective activity. In this form of activity, sound has a large role to play and the
chapter focuses on that role as it affects, indeed effects, flow and, particularly, immersion: in the latter
case, a preliminary mathematical equation is supplied for modelling immersion. The authors start with a
review of psychological and physiological experiments and, combining these approaches, psychophysi-
ological experiments on the effects of sound and image in virtual environments. Following a summary
of a recent psychophysiological study on computer game sound conducted by the authors, the chapter
concludes with a discussion on the advantages and disadvantages of such an empirical methodology
before speculating on the implications of biofeedback for computer game sound with reference to player
interaction and immersion.


The section on Technology opens with a chapter on Spatial Sound for Computer Games and Virtual
Reality by David Murphy and Flaithrí Neff. The authors guide the reader through the basics of human
spatial sound processing and the propagation of sound in space while pointing out the problems faced in
transferring and accurately replicating these phenomena within computer game systems. They conclude
with a survey of existing spatial sound technologies for use in virtual worlds, their strengths and weak-
nesses, and look to the future possibilities for computer game sound posed by the ongoing development
of the technology.
Andy Farnell's chapter comprises an in-depth critique of sample-based game audio followed by an
analysis of the potential of procedural audio: the real-time design of sound. For Farnell, audio samples
have proven to be too limiting, both for the purposes of creativity in game sound design and for the
promise of realism: audio samples are predicated upon selection whereas procedural audio is design. A
close discussion of procedural audio techniques, both as they have been used and how they might be used
to the benefit of computer games, leads the author to the conclusion that it is both pointless and wasteful
of computer resources to pursue precise sonic realism: procedural audio can instead be used to provide
just the necessary level of realism, a perceptual realism, that is required for the player to comprehend
source and source behaviour whilst saving scarce resources for more interesting and immersive tasks.
Eoin Mullan's chapter delves further into the promise of procedural audio by providing a detailed
exploration of the potential of physical modelling, a branch of procedural audio. He traces the technique's
development from the synthesis of musical instrument sounds to its current state where it stands poised
to deliver a new level of behavioural realism to computer games. For Mullan, this will be achieved
through the integration of the technology with game physics engines and through physical modelling's
ability to provide unprecedented levels of player-sound interaction.
Current & Future Design is the subject of the next two chapters comprising the final section. First,
Valter Alves and Licínio Roque present a lucid case for the importance of sound to the design and ex-
perience of computer games: attention should be paid to sound from the start of the design process, the
authors assert. Alves and Roque discuss concepts such as resonance and entrainment as means to engage
and immerse players in the gameworld. They present seven guidelines for game sound design and detail an
illustrative example of the application of these heuristics.
Lastly, in this section, Daniel Hug's chapter is a clarion call for a new aesthetic of computer game
sound. Through a discussion of two dominant paradigms in computer game sound discourse, pursuit
of reality and cinematic aesthetics, it details the debt that game sound owes to cinema sound but then
uses examples, in particular from many innovative game developers and from cinema's own subversive
stream, to shrug off that mantle and argue for a new future for game sound design. Rich in ideas and
provocative in its discourse, the chapter is full of practical suggestions for making computer game sound
not only a different experience to cinematic sound but an engaging and rewarding one too.
Closing the chapters is an appendix, a lightly edited transcript of an online discussion forum to
which the book’s contributors were invited to debate and answer the question: What
will the player experience of computer game sound be in the future? This is, of course, an open-ended
question and the unstructured, lively debate that ensues is indicative of the open-ended potential for the
future of computer game sound.
Whatever your need in picking up the book, I hope you will find it met. At the very least, perhaps
one of the contributions here will raise intriguing questions in your mind, an itch that will be scratched
by future investigation on your part. Perhaps the ideas contained within will inspire you to develop a
new game sound design paradigm or to innovate the technology and push the frontiers of human-sound
interaction? After all, the aim of this book is not just to contribute to the development of ever better
computer games or more cogent analyses but it is also to cast an illuminating light on at least one part
of humankind's relationship with sound as we step out of reality into virtuality.

Mark Grimshaw
University of Bolton, UK
Acknowledgment

An anthology such as this requires the hard work and input of many people, not just the contributing
authors: publishers and their staff, the Foreword author, the Editorial Advisory Board, and the reviewers.
Each chapter has been exhaustively blind reviewed by two leading academics in the field and I would
like to thank each and every one of them for their tireless work in support of this project: Thomas Ap-
perley (University of New England, Australia), Roger Jackson (University of Bolton, England), Martin
Knakkergaard (University of Aalborg, Denmark), Don Knox (Glasgow Caledonian University, Scotland),
Theo van Leeuwen (University of Technology Sydney, Australia and member of the Editorial Advisory
Board), David Moffat (Glasgow Caledonian University, Scotland), Patrick Quinn (Glasgow Caledonian
University, Scotland), and Gareth Schott (University of Waikato, New Zealand and member of the Edi-
torial Advisory Board). My thanks too to Dave Ranyard of Sony Computer Entertainment Europe Ltd.
for his flattering foreword and my appreciation for the guidance and patience shown towards me by Joel
Gamon of IGI Global. Finally, I must extend my apologies to my contributors and fellow authors for
the hectoring they were subjected to by the editor: The end surely justifies the means.

Mark Grimshaw
University of Bolton, UK.
Section 1
Interactive Practice

Chapter 1
Sound in Electronic
Gambling Machines:
A Review of the Literature and its
Relevance to Game Sound
Karen Collins
University of Waterloo, Canada

Holly Tessler
University of East London, UK

Kevin Harrigan
University of Waterloo, Canada

Michael J. Dixon
University of Waterloo, Canada

Jonathan Fugelsang
University of Waterloo, Canada

ABSTRACT
A much neglected area of research into game sound (and computer games in general) is the use of sound
in the games on electronic gambling machines (EGMs). EGMs have many similarities with commercial
computer games, particularly arcade games. Drawing on research in film, television, computer games,
advertising, and gambling, this chapter introduces EGM sound and provides an introduction into the
literature on gambling sound in general, including discussions of the casino environment, the slot machine
EGM, and the physiological responses to sound in EGMs. Throughout the article, we address how the
study of EGM sound may be relevant to the practice and theory of computer game audio.

DOI: 10.4018/978-1-61692-828-5.ch001

Copyright © 2011, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.
INTRODUCTION

A much neglected area of research into computer game sound is the use of sound in electronic gambling machines (EGMs; also known as slot machines, video slots and video fruit machines). To put the influence of EGMs into perspective, the computer game industry in the United States contributes approximately $8 billion in sales each year to the country’s GDP (Seeking Alpha, 2008). The slot industry, on the other hand, generates approximately $1 billion a day in wagers in the United States alone (Rivlin, 2004). Moreover, this amount is increasing as slot machines grow in popularity and are increasingly found outside of designated casinos. In 1980, an average of 45% of the gaming floor of a Nevada casino was devoted to slots, whereas today this number is at least 77%, with machines generating more than twice the combined revenue of all other types of games (Schull, 2005). Although they are also increasing in complexity (see below), slot machines are attractive to players because they require little or no training or previous experience, they are quick and easy to play and, perhaps most importantly, they elicit a number of sights and sounds that make them striking and exciting on the casino floor.

EGMs have many similarities with commercial computer games, particularly arcade games. In fact, many of the early video arcade game companies also had a long prior history manufacturing slot machines, including Bally and the Williams Manufacturing Company. As such, many of the creators and designers of slot machines today have also worked for computer game companies. In fact, much of the sound design and music of slots is still outsourced to game sound designers and composers, such as George “The Fat Man” Sanger (composer of 7th Guest, Wing Commander, and others).

Furthermore, until the 1990s slot machines had fairly standard mechanical or electro-mechanical reels and parts. Today, however, with the digitization of slot machines there are now considerably more structural components to slot machine gameplay. Many of these structural components have been adapted from computer games, such as cut scenes, bonus rounds and specialist plays. And while the arm of the “one-armed bandit” remains on many slot machines, more commonly players use simple rectangular or round blinking buttons very similar to those of many arcade games.

There are also, of course, some notable differences between computer games and electronic gambling machines. Historically, the vast majority of EGMs have been exclusively installed in casinos, where the usual age for entry is 21, thus effectively excluding young people from gameplay. However, this is changing as the companies attempt to capture a younger audience and the machines proliferate in non-gambling environments (Rivlin, 2004). Today, EGMs can be found in bars, restaurants, arcades, hotel lobbies, and entertainment and sporting venues. There are also, of course, virtual slot machines online, and these represent a significantly growing proportion of slot income. Research has further shown that casinos and gaming companies are seeking to target women, particularly those over 55, as their main demographic, although as the venues change, the target market is becoming younger.

Electronic gambling machines today are also much faster to play than their mechanical and electronic ancestors. Now, the average player initiates a new game every 6 seconds (Harrigan & Dixon, 2009a, p. 83), playing up to 600 games per hour, and there are even artificially intelligent machines that adapt to the speed of the player—when they start slowing down, the machine will slow down with them, but work to build them back up after a little break. Many games aim for “immersion” (what might be best described in terms of Csíkszentmihályi’s concept of “flow”, characterized by concentration on the task at hand, a sense of control, merging of awareness and action, temporal distortion and a loss of self-consciousness—see Csíkszentmihályi, 1990).
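The pacing figures above translate directly into money cycled through a machine. A minimal sketch of the arithmetic (the $1.00 bet size is an illustrative assumption, not a figure from the chapter):

```python
# Session pacing on an EGM, using the chapter's figures: a new game
# initiated every 6 seconds works out to 600 games per hour.
# The $1.00 bet size below is an illustrative assumption only.

def games_per_hour(seconds_per_game: float) -> int:
    """Number of games a player can initiate in one hour of continuous play."""
    return int(3600 // seconds_per_game)

def hourly_wagered(seconds_per_game: float, bet_dollars: float) -> float:
    """Total money cycled through the machine per hour of continuous play."""
    return games_per_hour(seconds_per_game) * bet_dollars

print(games_per_hour(6))        # 600, matching the figure cited above
print(hourly_wagered(6, 1.00))  # 600.0, i.e. $600 wagered per hour at $1 a game
```

Even at modest stakes, the speed of play means hundreds of dollars pass through a machine every hour, which is why small changes to pacing matter so much to operators.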
It is, however, often possible to jam the button with a piece of card, and let the machine play on its own for even faster results. Most machines also include a “Bet Max” function, a one-button mechanism that simultaneously allows players to wager the maximum allowable amount and to spin the reels—a function that encourages both faster wagering and continuous, rapid gameplay requiring a minimum of attention from distracted players.1 Thus, a “nickel slot” can mean wagers of up to about $4 per bet, although these are typically displayed in “credits” of 25-cent allotments so the illusion is that the player is betting less.

The biggest distinction between slot machines and computer games is, of course, the aspect of financial risk added to gameplay, which adds a potential new level of psychological, cognitive, and emotional involvement in the game (we say potential because these distinctions are as yet unexplored in the research). The win-loss component of electronic gambling games is more complicated than it at first appears, with “losses disguised as wins” and “near-misses” (see below). These are carefully doled out according to a reward schedule, based on scientific research about how long we will play before needing a win to keep motivated (see Brown, 1986). Reward schedules have also been built into computer games, particularly hunter-gatherer type games in which the player must spend considerable time roaming lands and collecting objects.2 Some psychologists suggest that the reward schedule combined with the rapidity of the gameplay is similar in character to the effect of amphetamines, stimulating the on-off cycle that repeatedly energizes and de-energizes the brain. This link is supported by functional magnetic resonance imaging studies revealing that brain scans of active gamblers and active cocaine users reveal similar patterns of neurocircuitry (Crockford, Goodyear, Edwards, Quickfall, & el-Guebaly, 2005). It has been suggested that there are many different motivations for gambling, with a distinct dichotomy between arousal/action seekers and those who seek escape/dissociation. In other words, slot machine games are designed to simultaneously satisfy different needs of different players.

In this chapter, we will introduce the literature of EGMs and related phenomena to the reader with a specific focus on the use of sound. A brief introduction to the structural components of gameplay is followed by an examination of existing studies on the sonic elements of casinos and gambling and an exploration of how this knowledge might apply to computer games.

STRUCTURAL COMPONENTS OF EGM GAMES

A slot machine essentially involves three or more reels (in today’s EGMs, these are often computer-generated digital simulations, rather than actual mechanical parts). Touch-screen machines typically do not have handles, but rather the reels are spun by the player pressing a button (the one-armed bandit style pull-lever handle still exists on most slot machines, but is not often used). When the reels stop spinning, three or more icons (often up to five) will line up on the payline for a win, but other combinations of icons can also lead to a win (diagonal lines, and so on), with the amount won relating inversely to the probability of the symbol coming up on the payline (Turner & Horbay, 2004). Payouts vary by country/state/province and by initial betting amount, ranging from about 80 to 95%—in other words, a fairly significant number of plays result in some form of a “win” (see below for information about these “wins”). The amount bet on a win can vary also—the player can, for instance, be playing a “nickel slot” but can end up betting several dollars on a single play by betting on a larger number of potential payout lines. Moreover, with EGMs, the number of payout lines also varies. For example, Lucky Larry’s Lobstermania, made by IGT, has five reels and 15 possible paylines. The maximum wager is 75 credits ($3.75), while the top prize is 50,000 credits ($2,500).
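The credit arithmetic in the Lobstermania example above is easy to verify and, combined with the earlier figures of up to 600 games per hour and payback rates of roughly 80 to 95%, gives a rough sense of expected losses over continuous play. A minimal sketch (the 90% payback rate and continuous maximum-wager play are illustrative assumptions, not figures from the chapter):

```python
# Wagering arithmetic for a "nickel slot" such as Lucky Larry's
# Lobstermania: credits are worth 5 cents each. The 90% payback rate
# below is an illustrative value within the 80-95% range cited above.

def credits_to_dollars(credits: int, cents_per_credit: int = 5) -> float:
    """Convert a credit amount to dollars."""
    return credits * cents_per_credit / 100

def expected_hourly_loss(games_per_hour: int, bet_dollars: float,
                         payback_rate: float) -> float:
    """Long-run expected loss per hour of continuous play, in dollars."""
    return round(games_per_hour * bet_dollars * (1 - payback_rate), 2)

print(credits_to_dollars(75))      # 3.75   -> the $3.75 maximum wager
print(credits_to_dollars(50_000))  # 2500.0 -> the $2,500 top prize
print(expected_hourly_loss(600, 3.75, 0.90))  # 225.0
```

At these assumed rates, a continuous maximum-wager player would expect to lose about $225 per hour: the point is simply that a small per-game house margin compounds quickly at 600 games per hour.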
There are also two different bonus rounds available depending on the version of the game: a Great Lobster Escape, and a Buoy Bonus round in which additional payouts are guaranteed but the amount of payout varies.3 In these bonus rounds, the player is asked to select from a variety of options, giving the player the illusion of control and the perception of skill. The use of a stopping device, for instance, in which the players can stop the spinning of the reels voluntarily, increases the perception that the stopping is not random but that there is some form of skill involved: By having that control, there is an increased probability of success, thus making the game more attractive to the player (Ladouceur & Sévigny, 2005).

Indeed, slot machines today can feature a library of game variations, in order to increase what the industry calls “time on device” (Schull, 2005, p. 67). Some features of EGMs (and particularly bonus rounds), such as nudge and stop buttons, give the illusion of control to the player—an important component but one that the gaming industry has referred to as being an “idiot skill” (Parke & Griffiths, 2006, p. 154). This perhaps calls to mind the “button-mashing” skill of the early arcade game beat’em-up genre.4 David Surman (2007) notes that Capcom’s 1987 arcade hit Street Fighter, for instance, was released with a touch-sensitive hydraulic button system in which the increase of the player’s pressure on the button related to the power of the player’s character’s kicking and punching, thus encouraging players to bang and smash on the buttons. He states: “This ‘innovation’ led to many machines being rendered defunct by over-zealous players smashing the control system. The cacophony of these large red buttons being bashed would come to signify the arcades which stocked a number of these first Street Fighter units” (pp. 208-209). When the player has an increased perception of control, they are more likely to engage with the game, play for longer, and spend more money.

Bonus or built-in “secret” functions (often a cancel button, slow-down or hints—these are typically not actually secret but often not immediately apparent) also increase the illusion of control. The bonus elements of gameplay are sometimes hinted at by the sound (as in The Simpsons EGM, in which Krusty the Clown says “Here’s a clue for ya, Jack”). A simple bonus or increased skill component leads to an increased psychological involvement on the part of the player and, it is suggested, has a “significant effect on habitual gambling” (Parke & Griffiths, 2006, p. 176). The use of these functions helps to keep players interested in that they hope that they will learn the “secrets” of the machine and thus be able to demonstrate their skill through winning as well as increase their winnings. Of course, similar bonus rounds and “Easter Eggs” are often built into computer games to reward the regular player who has taken the time to find them—thus upping the player’s credibility amongst other gamers. Usually superfluous to gameplay, Easter Eggs are nevertheless viewed as rewards for the time spent on the device (see Oguro, 2009). But even beyond the world of Easter Eggs, players develop skills beyond the initial simple skills required to technically play a game, notes Surman (2007):

While a player new to videogames explores the pleasures of the gameworld with the clumsy curiosity of a toddler, as one becomes a more sophisticated gamer other pleasure registers come into play, which are concerned with a literacy of sorts in which one is sensitive to the codes and conventions of the gameworld, and the panoramic experience of worldliness reduces to a hunt for the telltale graphical or acoustic ‘feedback loops’, confirming success in play. Still higher, as the core gameplay becomes exhausted, players end up centring on the reflexive undoing of the gameworld; pushing it to its limits, exploring and exploiting glitches, ticks, aberrations in the system. (p. 205)
This description fits closely with Csíkszentmihályi’s (1990) ideas of the requirements for flow (immersion), where a careful balance between difficulty and skill is required to continually engage a player in an activity. As the skill increases, so must the difficulty, or the player will become bored. If the skill required is too difficult for the novice, the player will likewise lose interest.

Equally important to the psychology of the player are the built-in gambling machine concepts of the “near miss” and the “loss disguised as a win”. A near miss is a failure that was close to a win—such as two matching icons arriving on the payline followed by a third reel whose icon sits just off the pay-line. Slot machine manufacturers use this concept to create a statistically unrealistically high number of near misses (Harrigan, 2009), which convinces the player that they are close to winning, and therefore leads to significantly longer playing times (Parke & Griffiths, 2006). As gambling researchers Jonathan Parke and Mark Griffiths (2006) described it:

At a behaviourist level, a near miss may have the same kind of conditioning effect on behaviour as a success. At a cognitive level, a near miss could produce some of the excitement of a win, that is, cognitive conditioning through secondary reinforcement. Therefore, the player is not constantly losing but constantly nearly winning. (p. 163)

A loss disguised as a win, on the other hand, is a play in which the player “wins” but receives a payout amount of money less than that of the amount wagered, hence actually losing on the wager despite being convinced (sonically) that they have, in fact, won. So, for example, a gambler might wager $2 on a play and win $1.50 back. S/he is actually losing 50 cents, but is given the reinforcement cues (see below) of a win.

An important contributing factor to all of these illusions that increases playing time and increases money lost is sound. A small number of previous studies of sound in slot machines have shown that sound influences a gambler’s impression or perception of the machine, including the quality of the machine (the fidelity of the sound is a primary reason for selecting one EGM over another), helping to create a sense of familiarity, branding or distinguishing the machine, and creating the illusion of winning, since players may only hear winning sounds (Griffiths & Parke, 2005). Furthermore, Dibben (2001) argues that, for listeners, the reception of music and sounds is not only embedded in the material and physical dimensions of hearing but is also, and critically, grounded in social and cultural knowledge and awareness, based on “listeners’ needs and occupations” (p. 183). This idea—that response to music and sounds can be influenced by culture and personal experience—has self-evident relevance for a study focusing on the role of sound in relation to individuals immersed in gambling environments and/or those at risk for addictive gambling behavior. We will first cover the environment in which the slot machines are commonly found and then focus on the machines themselves.

CASINO SOUND: ENVIRONMENTAL FACTORS

The sound of electronic gambling machines in the context of a casino can play a significant role in the perception of the games. Background music in the casinos or bars changes throughout the day, with pop music played in daytime, and relaxing music in the evenings (Dixon, Trigg, & Griffiths, 2007). The noise and music give the impression of an exciting and fun environment and, critically, that winning is more common than losing. In fact, Anderson and Brown (1984), in a comparison of response to gambling in a laboratory and a casino setting, found that in the casino, the player’s heart rate increases considerably. Moreover, increased exposure to the casino setting in problem gamblers leads to an increased arousal response. They note that “[t]he constant repetition of major changes in autonomic or other kinds of arousal associated in time and place with various forms of gambling activity is likely to have a powerful classical or
Sound in Electronic Gambling Machines

Pavlovian conditioning effect on gambling behavior” (p. 400).

There has been considerable research into environmental sounds and their impact on consumer behavior in regards to advertising and retail. Servicescapes—that is, the soundscape and landscape of the service environment—have been one recent area of focus in advertising and marketing research. A pleasant ambience, it is felt, is key to a pleasurable shopping experience. Congruency in ambience between the brand, sounds, scent, and other aspects of the store is vital to a positive consumer experience (see Mattila & Wirtz, 2001). Companies like the now-defunct Muzak have, of course, built businesses on this idea. Alvin Collis, VP of strategy and brand for Muzak, outlines the concept of the servicescape:

I walked into a store and understood: this is just like a movie. The company has built a set, and they’ve hired actors and given them costumes and taught them their lines, and every day they open their doors and say, ‘Let’s put on a show.’ It was retail theatre. And I realized then that Muzak’s business wasn’t really about selling music. It was about selling emotion—about finding the soundtrack that would make this store or that restaurant feel like something, rather than being just an intellectual proposition. (see Owen, 2006)

Certainly, statistics seem to back up Muzak’s ideas, with some studies suggesting that young people spend 36% more time in a shop when music is being played, that if Muzak is played in a supermarket, it will increase the percentage of customers making a purchase there by 17%, and so on (KSK Productions, n.d.). Generally speaking, consumers spend longer in environments where there is some form of background music, as long as the music is low in volume and uncomplex (Garlin & Owen, 2006, p. 761). Music tempo changes can alter the length of time a shopper spends as well as the amount of money spent. Not only this, but music can also influence the perceived amount of time spent. Young people under 25 perceived that they had spent longer in an “easy listening” store condition, while older shoppers perceived that they had spent longer in a Top 40 store condition. Familiar music led to the impression that they were shopping longer (Yalch & Spangenberg, 2000). Muzak’s website describes its music concept (what it terms “audio architecture”):

Its power lies in its subtlety. It bypasses the resistance of the mind and targets the receptiveness of the heart. When people are made to feel good in, say, a store, they feel good about that store. They like it. Remember it. Go back to it. Audio Architecture builds a bridge to loyalty. (Muzak Corporation, n.d.)

Music is, of course, not the only element of environmental sound that plays into the overall ambience. Sound effects, such as the sound zones in Discovery Channel’s stores, or the chirping birds and frogs in the produce aisle of Sobey’s, a Canadian supermarket close to one of the authors, can also create an overall atmosphere. Both sound effects and music can help to quickly identify a brand for consumers without prior experience of that brand. Music can cue the shopper as to the intended market, and a poor choice of music can clash with the values of the brand (Beverland, Lim, Morrison, & Terziovski, 2006).

Griffiths and Parke (2005) draw on a theoretical model by Condry and Scheibe (1989) regarding persuasion in advertising and adopt this model for slot machine sound. They suggest that there are stages in the persuasion process that involve a person committing to the machine. This begins with exposure (they must be exposed to the machine, and that might be in a bar) and leads to attention (in which sound plays a particularly important role to draw attention in a noisy atmosphere). From there, comprehension and yielding take place—a familiar musical theme helps draw the player in, believing the machine is socially acceptable because the sound is likable and familiar. Finally,
the retention and decision-to-gamble stages occur. In other words, sound is used to draw people in, make them feel comfortable, and convince them to play. The authors hypothesize that the background sounds and music might increase the confidence of the players, increase arousal, help to relax the player, help the player to disregard previous losses, and induce a romantic state leading them to believe that they may win.

One study into the effect of background music on virtual roulette found that the speed of betting was influenced by the tempo of the music, with faster music leading to faster betting. Another suggests that there are two main types of casino design: a playground design (spacious, with warm colors, vegetation, and moving water) and a low-ceiling, crowded, and compact design. This study found that music increased perceived at-risk gambling intentions in the playground casino design while decreasing those intentions in the other design. In the presence of just ambient sounds, however, this finding was reversed (Marmurek, Finlay, Kanetkar, & Londerville, 2007). What is certain is that the flashing lights, the room lighting, the carpeting and visual design of the space, the conflicting smells of food, perfume, and alcohol, and, in particular, the use of loud sounds serve at once to create feelings of excitement and luxury and to distract the player by increasing cognitive load (the effort involved in processing multi-modal information and the use of working memory) (see Hirsch, 1995; Kranes, 1995; Skea, 1995). Multiple conflicting stimuli and calls on attention leading to this increased cognitive load cause people to process information using guessing, stereotyping, and automatic responses to stimuli rather than reasoned and rational response and introspection.5 This depends, somewhat, on the type of music involved, as well as the personal perception of the individual involved (Carter, Wilson, Lawsom, & Bulik, 1995; McCraty, Barrios, Atkinson, & Tomasino, 1998; Wolfson & Case, 2000).

Some slot machines, however, employ noise cancellation technology to remove any “destructive interference” that may distract a player from the flow of gameplay, to increase immersion (Schull, 2005, p. 67). An Australian study found conflicting reviews of background ambience, with some players getting distracted and others reporting excitement: “You can go either way when you hear somebody else going, you can get all hyped up and think, gee their machine’s going I could also have it, or it could go the opposite, why isn’t my machine paying. It has a double affect” versus “The minute I hear the ‘ching, chong China man’, I quickly run around to see”… Two participants noted that the music made them “anxious” and “desperate” as they believed that everyone else around them was winning something when they were not (Livingstone, Woolley, Zazryn, Bakacs, & Shami, 2008, p. 103).

Computer games today are rarely consumed in an arcade environment whose music and sound can be manipulated, but the use of non-diegetic music in games, as well as the use of ambience, could be adjusted to take into consideration some of the results of these studies. For instance, altering the perception of time through the use of changing tempos, or generating feelings of excitement with carefully timed sound effects in the ambient world, may help to engage the player. There are also implications here relating to games that require further research. In particular, how does the fact that players can substitute their own music in Xbox360 games influence their perception of gameplay? How does the use of familiar music impact the player’s perception of unfamiliar games? These questions are outside the scope of this chapter, but they clearly have important consequences in regards to player engagement with and enjoyment of a game. Of course, more easily manipulated than the environmental space in which gameplay takes place is the use of sound in the games themselves.

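As a minimal illustration of the tempo manipulation suggested above, a game's music system could scale its base tempo against a normalized measure of gameplay intensity. The sketch below is an assumption-laden example rather than an API from any particular engine: the function name and the scaling constants are invented for illustration.

```python
def scaled_tempo_bpm(base_bpm: float, engagement: float,
                     max_increase: float = 0.25) -> float:
    """Scale a base music tempo by a normalized engagement level.

    `engagement` runs from 0.0 (calm) to 1.0 (most intense) and is
    clamped to that range; at full engagement the tempo rises by
    `max_increase` (25% by default).
    """
    engagement = max(0.0, min(1.0, engagement))
    return base_bpm * (1.0 + max_increase * engagement)


# Calm exploration keeps the base tempo; full intensity plays 25% faster.
calm = scaled_tempo_bpm(120.0, 0.0)     # 120.0 bpm
intense = scaled_tempo_bpm(120.0, 1.0)  # 150.0 bpm
```

In practice such a mapping would drive a tempo or playback-rate parameter in audio middleware rather than recompute a number per frame, but the mapping itself need be no more complicated than this.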
SOUND IN EGMS

The earliest slot machines, such as the Mills Liberty Bell of 1907, included a ringing bell with a winning combination, a concept that is still present in most slots today. Playwright Noël Coward noted that sound was a key part of the experience in Las Vegas: “The sound is fascinating . . . the noise of the fruit machines, the clink of silver dollars, quarters, nickels” (in Ferrari & Ives, 2005). As in the contemporary nickelodeons, sound’s most important early role was its hailing function, attracting attention to the machines (Lastra, 2000, p. 98). Sound in EGMs has advanced alongside the technological changes introduced into the machines in the last few decades. EGMs now use computer-generated graphics, popular music, and high-fidelity sampled sound rather than relying on mechanical ball-bearings, bells, or basic square-wave synthesizer chips.

Today, sound effects in EGMs are used for a variety of feedback and reward systems. Up until about the early 1990s, slot machines featured about 15 “sound events”, whereas they now average about 400, and these are often carefully researched to manipulate the player (Rivlin, 2004, p. 4). Sound designer George Sanger described how sound is created “by committee” and how the committee “always want it to be more exciting”, with little consideration for a dynamic range in the excitement portrayed (personal communication, October 15, 2009, Austin, TX). This includes sound effects of coins falling even though many slot machines neither accept nor pay out coins anymore. Notes Bill Hecht, an audio engineer for IGT, “We basically mixed several recordings of quarters falling on a metal tray and then fattened up the sound with the sound of falling dollars” (Rivlin, 2004, p. 3). Moreover, these false coin sounds can portray wins much larger than the actual win.

Unpredictable sounds in particular help to capture and maintain our attention (Glass & Singer, 1972). There has even been a recent patent to randomize winning sound effects in order to increase the perception that the sound is more real than it is in actuality and to reduce the recognition that it is merely careful programming at play. The patent describes:

In the conventional slot machine… the sound effects generated from the speaker are based on only one kind of sound effect pattern. For example, when a big bonus game occurs, a fanfare indicating the occurrence of the big bonus game is sounded, and so forth. Meanwhile, with a slot machine in which a special game has once occurred, the player typically keeps playing games while expecting special games to further occur. In this case, if the sound effects (winning sounds) identical to those at the first occurrence of the special game are generated upon the second or later occurrence, the pleasure of gaming may not fully be enjoyed. (Tsukahara, 2002, p. 1)

Slot machines use pseudo-random number generators carefully programmed to elicit the right reward schedule, however, and there is no real skill involved, only manipulations of perception. Recent research findings that music can increase success rate, for instance, are fallacious because it is simply not possible. Yamada (2009), for example, proposes that:

Results indicated that the no-music condition showed the best rate of success. Moreover, a “mixed” musical excerpt added “unpleasantness” to the game and, in turn, resulted in a negative effect on the success rate. Increasing the speed increased the “potency” of the game, but did not affect the success rate, systematically. In the second experiment, we used the two excerpts performed in various registers and with various timbres as musical stimuli. (p. 1)

It is unclear if Yamada custom-designed the games that were tested or if the test was for illusion of success and perceptions of gameplay rather than actual success, and neither is it clear
if the game involved was a custom-built game for the purposes of the study (we could find no references to the game in Google), and so Yamada’s findings remain dubious at best. However, what Yamada’s work does show is that it is highly likely that music plays an important role in increasing the illusion of success.

Of course, winning sounds are particularly important to the popularity and attraction of the machines, and losing sounds are rarely heard. When losing sounds are used in some machines, they are intentionally employed to antagonize the player, creating a short-term sense of frustration that, it has been suggested, prolongs the play period in what has been called “acoustic frustration”:

Antagonistic sounds invoke frustration and disappointment. For example, on The Simpsons fruit machine, Mr. Smithers smugly informs Homer Simpson that, “You’re fired”, or Chief Wiggam says, “You’re going away for a long time”. At present, we can only speculate about the consequences of such sound effects. In line with hypotheses supporting frustration theory and cognitive regret… this might make the fruit machine more inducing. (Parke & Griffiths, 2006, p. 171)

This idea of acoustic frustration could be adapted and utilized by computer games more effectively than is currently seen. For instance, commentary on gameplay (see below) is common in some types of games but absent in most computer games. Sound effects and music could serve a commentary function without using dialogue as well.

The types of sounds used are particularly important to their affective power. Pulsating sounds that increase in pitch or speed (vibrato and tremolo) have been shown to help to increase tension, and verbal reinforcements (both negative and positive) are used to goad the player on with a sensation known as perceived urgency (see Edworthy, Loxley, & Dennis, 1991; Haas & Edworthy, 1996). The deeper a player gets into a game, the louder and quicker the music usually becomes. High-pitched sounds—very common in slot machines—are also very useful in attracting our attention as they perceptually appear closer to us. Notes Millicent Cooley (1998): “Advertisers use this principle when they pump television commercials full of high frequency audio that makes characters sound as if they are intruding into viewers’ homes.” The types of sounds used in EGMs are also carefully chosen according to Western cultural likes and dislikes. As one study of pleasing sounds found, chimes are particularly highly rated: “Our highest rated sounds generally related to escapism (e.g., fantasy chimes, birds singing) and pleasure (children laughing)” (Effrat, Chan, Fogg, & Kong, 2004, p. 64).

Large wins in slot machines are characterized by a “rolling sound”, with the length of the win tied to the length of the music cue. Winning sounds are often carefully constructed to be heard over the gameplay of other players to draw attention to the machine and to raise the self-esteem of the player, who then becomes the center of attention on the slot machine/casino/bar floor (Griffiths & Parke, 2005, p. 7). Often, this music contains high-pitched, major-mode songs with lots of chimes and money sounds. Higher pitch also has a tendency to increase the perception of urgency, with that increase in perceived urgency corresponding to an increase in pitch, but it also helps to cut through the ambient noise of a busy casino (Haas & Edworthy, 1996).

There are several implications here for computer game sound. First is the reinforcement role of both encouraging and antagonistic sound. Sonic rewards are under-utilized in games, and the idea of a reward schedule, while it has been used in computer games, is likewise unusual. To tie the two together—to have a system of sonic rewards at anticipated specific timings in the game—can help to keep a player interested for longer. Losing sounds, as discussed above, are perhaps the equivalent of player health decreases or death in a computer game. It is quite common for computer
games marketed to children to sonically represent the player’s character’s death as a not particularly negative event. This may in fact even be silence upon the character or game’s end (the equivalent of not hearing a losing sound in a slot machine), a fun “raspberry”, a game show-like losing sound (as in Rocky and Bullwinkle on the Nintendo Entertainment System (NES)), or cheery “try again” music (as in the Jetsons or Flintstones game-over music for the NES). On the other hand, in more adult-oriented games, the player’s death can be a much more negative event with serious funeral dirges. It may be worthwhile for sound designers to explore the possibility of including more losing-type sounds in other places within the game, in order to increase the acoustic frustration the player feels, thus enhancing the impact of winning sounds and increasing emotional engagement.

Psychological studies have shown that frustrative non-rewards are considerably motivating. In simple terms, “failing to fulfill a goal produces frustration which (according to the theory) strengthens ongoing behavior”, leading to cognitive regret, encouraging persistent play in the desire to relieve the regret (Griffiths, 1990; see also Amsel, 1962). Note King, Delfabbro, and Griffiths (2009):

Video games have also become longer and more complex, making a punishment like permanent character death an unappealing feature, particularly for a less committed, casual playing audience. Common forms of punishment in games include having to restart a level, failing an objective, or losing resources of some kind, like items, XP or points. (p. 10)

It is possible, therefore, to improve the sound of losing tied to these lesser events, in order to tap into the acoustic frustration effect seen in slot machines. While we typically hear sounds tied to these events in current games, a stronger sense of loss (and thus, upon winning, reward) may improve player involvement.

Likewise, the concepts of near misses and losses disguised as wins are elements popular in slot machines but rarely—if ever—heard in computer game sound. One might imagine, for instance, a “mini-game” within a larger game in which the player is sonically teased with almost winning a bonus round or is given the impression that they have won more points than they actually have within that bonus round. This would probably, of course, only be useful for certain types of games aimed at certain types of players. One can imagine this effect in a Wii casual game designed for all ages, for instance, but less so for a big budget first-person shooter title on the Xbox360.

SLOTS, FAMILIARITY AND BRANDS

Important to feelings of player comfort and emotional connection to the machine is the role of branding EGMs by using well-known intellectual property. Popular songs are often used to attract a player to the machine and to cause players to feel more comfortable and familiar with that machine. Similarly, sound can play a role in branding by certain companies which create distinctive winning sounds in an effort to have their sounds heard over the din of the casino. Indeed, branded EGMs are becoming both more commonplace and more popular in casino environments. Whereas once producers of popular culture sought to remain apart from the perceived negative connotations associated with the gambling industry, today films like Top Gun and Star Wars, television game shows like Jeopardy!, Deal or No Deal, and The Price Is Right, and musical acts like Elvis Presley, the Village People, and Kenny Rogers all have branded EGMs (Dretzka, 2004). Familiarity with a television show, film, person, place, musical act, or sport can, for instance, entice players to the machine because it may “represent something that is special to the gambler… Players may find it more enjoyable because they can easily interact with the recognizable images and music they
experience” (Griffiths & Parke, 2005, p. 5). As Dretzka (2004) observed:

Seemingly overnight, casinos actually began sounding different. Instead of clanging bells, mechanical clicks and clacks, and jackpot alarms, the soundtrack was more of an electronic gurgle and hum, with bursts of ‘This is Jeopardy!’, ‘Wheel of Fortune!’ and snippets of rock songs. A generation of Americans raised in front of their television sets ate it up.

Moreover, familiarity and repetition of musical themes have been shown to have a positive influence on our liking of the music (see Bradley, 1971). Verbal reinforcement with known characters (as well as, to a lesser degree, unknown characters) also takes place, as seen above, with familiar characters telling people that they are “cool” or “a genius”. Parke and Griffiths (2006) note that verbal reinforcement that increases play is designed to raise self-esteem, give hints and guidance, and even provide friendship or company (p. 171). An unexplored area of research is the relationship between verbal reinforcement and the anthropomorphizing of slot machines. Describes Langer (1975), with regard to such anthropomorphism: “Gamblers imbue artifacts such as dice, roulette wheels, and slot machines with character, calling out bets as though these random (or uncontrollable) generators have a memory or can be influenced” (see also Gaboury & Ladouceur, 1989; Toneatto, Blitz-Miller, Calderwood, Dragonetti, & Tsanos, 1997). It is very likely that sound plays a considerable role in the anthropomorphizing of slot machines—particularly in those cases where the machines “talk” to the player, but also in the mere fact that they are sonically responsive to our input. In reference to the game show computer game You Don’t Know Jack, Millicent Cooley (1998) notes that the player:

Will be aggressively challenged to prove that you know jack (anything at all), and you know this, again, because of the dialog and swaggering, aggressive tone of the host. The machine is in charge and you, the player, are not; the game is quick-paced, there is a sense that you will be rushed along and should try to keep up and prove that you do, in fact, know jack. You feel this pressure because the voice of the host rushes you to sign in, taunting you impatiently at every step. (p. 8)

It is possible that a similar process is at work with slot machines—that is to say, the taunting will increase the speed with which the player plays, antagonizing the player to the point where the player loses focus on what truly matters (that is, the loss of their money).

In reference to sonic branding, Jackson (2003) suggests that the voice heard links to the perceived personality (including perceived behavior and perceived appearance) of the speaker and, therefore, of the brand (p. 135), and it is equally likely that a similar effect is seen in the perceived personality of the machine. It has been said that 38% of the effect we have on other people can be attributed to our voice, with only 7% to the actual words we’ve spoken (the rest being body language) (Westermann, 2008, p. 153). In a study into voice and brand, UK telecom provider Orange identified a series of attributes that define the sound of a voice: rhythm (where emphasis is placed on what is said); pitch (high versus low); melody (rhythm and pitch together); pace (speed); tone (overall musical quality); intonation (what is said relating to how it is said); energy; clarity; muscular tension; resonance; pause; breath; commitment; and volume (Westermann, 2008, p. 153). Each of these attributes works together to impact our perceptions of what is being said. Particularly notable is the impact that the voice (and what it says) can have on our perceptions of what we are seeing and/or experiencing. Several studies have shown how the voice influences our perception of video sports performances. In a study of sports commentary, Bryant, Brown, Comisky, and Zillmann (1982) discovered that our enjoyment of watching sports
is largely tied to the dramatic embellishments provided by the commentary of the sportscasters. However, it is not only our enjoyment but also the ways that we interpret what we are seeing that are influenced by commentary. In one study, it was found that commentary affected the perception of aggression of the players in an ice hockey match (Comisky, Bryant, & Zillmann, 1977). Not only this, but the more aggressive commentary was also perceived as more enjoyable. Other similar studies have reached similar conclusions for commentary in a tennis match (Bryant, Brown, Comisky, & Zillmann, 1982), a soccer game (Beentjes, Van Oordt, & Van Der Voort, 2002), and a basketball game (Sullivan, 1992). This influence of commentary on perception is likely to play an equally important role in slot machines as well as computer games, although this remains another area of game sound largely unexplored. Sports games in particular make use of commentary, although it is also very common to find commentary in games that imitate television game shows. It is possible, therefore, that the addition of a narrative of events in some games may impact the player’s perception of their gameplay, as well as their enjoyment of the game, although the technique is clearly under-utilized.

Another trait discussed above that is highly popular in slot machines but less common in computer games is the use of familiarity and branding tied to the machines. Not only do the games themselves have distinctive sounds, but each company has its own overarching style and aesthetic that can be quickly learned upon spending time on the casino floor. The coin sounds from an IGT slot machine, for instance, sound different from those generated by a Bally machine. While this acoustic branding is particularly relevant in an environment where machines are competing for attention, the relevance of creating a distinctive sound and branding franchise games or episodic games remains in other environments also. Some computer games have, of course, employed this technique—the Super Mario Bros series, for instance, has maintained a distinctive aesthetic through countless incarnations, platforms, and technological improvements. However, there are many games that still do not attempt to capitalize on this ability to entice experienced players to a new version of the game with the creation of a distinct, recognizable sound.6

RESPONSES TO EGM SOUND

The response of players to slot machine sounds is diverse, representing the different needs and desires of the players. For many, music and sound signify success, as one study has found: “I like it when it’s going long [the music], because you know you’re winning plenty of money. When they’re short, I don’t like them…” (Livingstone et al., 2008, p. 103). Other players—those who by their comments appear to be more regular gamblers—dislike the sounds, the study found: “sounds are too loud and attract attention. If someone lets the feature music go on and on they are not serious—the problem gamblers hate hearing it go on and on—and it draws attention to you” (Livingstone et al., 2008, p. 103). A few other participants also reported pressing “collect” straight after a win specifically to stop the music from playing. While some players found the sounds of others winning exciting, others felt that it gave them the impression that “everyone is winning but you” (Livingstone et al., 2008, p. 103). One study regarding sound’s presence (as on or off) showed that players strongly preferred sound to be on (Delfabbro, Fazlon, & Ingram, 2005). Response to sound, therefore, can vary from player to player, but some typical responses can be summarized.

Studies of the physiological response to sound (typically industrial noise, but also including music, speech, and other sounds) have found that sounds can contribute to increases in blood pressure and, most importantly, impair performance on a vigilance task (Smith & Morris,
1997). Wolfson and Case (2000) studied heart rate response to the manipulation of loudness of sound in a computer game, finding that louder sounds led to increased heart rate, and discussed the impact that physiological arousal has on our attention levels. They note:

People performing a task when minimally aroused are more likely to be slow, indifferent, and spread their attention across a wide range of stimuli. When highly aroused, people tend to be faster but less accurate, and they focus mainly on the most salient aspects of a task. Thus both high and low levels of arousal can have detrimental effects on performance. (Wolfson & Case, 2000, p. 185)

Physiological responses to stimuli can be tested using a variety of measures, including (but not limited to) electroencephalograms (EEGs), facial electromyography, heart rate, pupil dilation, and electrodermal response. Galvanic skin response (GSR), one component of electrodermal response, also known as skin conductance response or sweat response, is an affordable and efficient measurement of simple changes in arousal levels—one of the reasons why it is the main component of a polygraph device. Essentially, GSR measures the electrical conductivity of the skin, which changes in resistance due to psychological states. (See Nacke & Grimshaw, 2011, for the use of such measures when assessing psychophysiological responses to computer game sound.)

Studies using GSR on subjects while being exposed to music date back to at least the 1940s (for example, Dreher, 1947; Traxel & Wrede, 1959) but are highly contradictory due to the conditions in which the studies took place. Sound and music have a known influence on listeners’ arousal and anxiety levels, but this depends on many factors, including the degree of musical knowledge, the tempo of the music, familiarity with the music, preference for the music, and recent exposure to that music. Smith and Morris (1976) found that stimulating music increased worry and anxiety, although they tested their student subjects during an examination. Rohner and Miller (1980) found that music had no influence on anxiety levels. Pitzen and Rauscher (1998) and Hirokawa (2004), on the other hand, more recently found that stimulating music increased energy and relaxation (increasing GSR but not heart rate).

Although there are many studies about music in isolation and its physiological effect on listeners, there has been much less research on music’s impact on GSR that takes into consideration the interaction between sound and visual image (for example, Thayer & Levenson, 1983). Perceptual studies (non-physiologically based research) from the field of advertising suggest that image and sound, when used congruently (that is, for instance, when both have a similar message), tend to amplify each other (for instance, Bolivar, Cohen, & Fentress, 1994; Bullerjahn & Güldenring, 1994; Iwamiya, 1994). There have also been studies into the physiological effects of gambling, which have shown that pupils may dilate, heart rate may increase, and skin conductance levels increase (raising the GSR). Collectively, these are known as arousal levels, and it is the arousal-inducing properties of slot machines that are affected by winning and losing, with increased arousal levels for wins (such as Coventry & Constable, 1999; Coventry & Hudson, 2001; Sharpe, 2004). Additionally, a number of studies, for instance, research by Dickerson and Adcock (1987), have questioned whether there is a connection between physiological responses to gambling and wider psychological issues governing perceptions of such elements as gambling environment, luck, and mood. These studies suggest there is some evidence that both psychological and physiological responses to gambling behaviors are fuelled in part by a player’s illusion of control (for example, Alloy, Abraham, & Viscusi, 1981).

In more recent research into computer games and the computer gaming environment, Hébert, Béland, and Dionne-Fournelle (2005) have discovered that, “for the first time…auditory input
contributes significantly to the stress response our response. With losses disguised as wins, the
found during video game playing” (pp. 2371- numbers displayed on the machine tell us that we
2372). This research suggests that physiological are losing (in other words, we “won” 50 cents,
responses to music in computer games may be but our total credits and cash have been reduced
linked in part to genre, noting generally that the since the last play) but the sound tells us that we
more aggressive and rapid the music, the more are winning. In a sense, the sound overrules our
elevated physiological stress levels become. eyes and leads the emotional (and physiological)
A recent pilot study into the sounds and sights response to the event. This phenomenon illustrates
of losses disguised as wins was undertaken with the importance of sound to our overall perception
16 participants by the University of Waterloo’s of audio-visual media, and demonstrates one
Problem Gambling Research Group. Each partici- under-utilized way that sound is used in computer
pant played Lucky Larry’s Lobstermania for 45 games. Far from merely reinforcing image, sound
minutes while being tested for their arousal levels can have a much more complex relationship with
using GSR. Participants wore a GSR recording what is occurring on screen. We might use a “win-
device on their fingers while they played, with ning cue” sound for instance in a battle scene to
the output from the GSR being tied to two wires trick the player into thinking that the evil “big
which output when the player pressed the play boss” enemy is dead, only to have them return
button and whether or not the play resulted in a to life. Or, we might use sound into tricking the
win, loss disguised as a win (where payout is less player into thinking drinking that bottle of potion
than spin wager) or a regular loss (that is, losses was a beneficial event, only to later reveal that
without reinforcing sounds of a win). As might it was not.
be expected, the highest GSR rating—indicat-
ing the highest arousal level—was found with
wins, with the lowest rating with regular losses. cONcLUsION
What is particularly interesting, however, is that
losses disguised as wins were much closer physi- The intent of this article has been to explore a
ologically to wins, than to losses. In other words, comparatively understudied area of computer
hearing the sounds of winning, even though the game sound, chiefly that of the role of music and
player has lost money, is enough to trick the mind/ sound in electronic gambling machines (EGMs).
body into believing that the player is winning We explored the structural components of EGMs
(Dixon, Harrigan, Sandhu, Collins, & Fugelsang, and EGM games, tracing the development of
forthcoming). technical advances that have led to progressively
In the case of losses disguised as wins, these more enhanced audio interfaces over the past two
games play on the idea of synchresis. Film theorist decades. Central to this discussion is the inter-
Michel Chion (1994) defines synchresis as “the relationship between EGM technology, sound
forging of an immediate and necessary relation- and human behavioral psychology. Research has
ship between something one sees and something shown that standard EGM gameplay concepts like,
one hears,” combining the ideas of synchronism for instance, the “near miss” and “losses disguised
(simultaneous events) with synthesis (p.5). Essen- as wins”, coupled with enhanced sound prompts
tially, sound changes our perception of the image and triggers can encourage both more rapid and
that we see and, despite there being an opposing longer gameplay.
relationship between sound and image, we view A second correlated point in this study has been
images as connected to sound when they are our consideration of EGM sound within the wider
played concurrently, with the sound dominating soundscape of a casino/bar/gaming environment.

An interesting area of research as yet unexplored is determining whether gambling behavior is affected when EGM sounds commingle with, and compete against, external sources of music, sounds, and noise. Further, it would be interesting to explore whether a correlation exists between the concurrent use of image and sound in EGMs: specifically, to determine whether EGM sound and video, individually and together, amplify and/or reinforce the notion of a loss disguised as a win or whether, conversely, EGM sound and visuals instead work to distract and divert gamblers' attention away from the machine and, by extension, from the act of gambling. Early research does indicate that sound does, in fact, reinforce the idea of winning even when the player is losing. There have been no studies to explore the impact that a similar sonic process has in computer games, but this is an interesting area for future exploration.

A particularly important concept that can be taken from slot machines is the idea of customization. Slot machines, as shown, have two basic markets that they cater to: arousal/action seekers, and those who seek escape/dissociation. It may be suggested that computer games have a similar audience, although this simple way of dividing players is perhaps inadequate. What does remain, however, is the concept that players have different needs for gameplay. And while some players enjoy the sounds of slot machines and the casino environment, others clearly would prefer the ability to turn down—or turn off—sound altogether. Computer games, of course, have long recognized this and offered the ability to turn sound on and off and, later, to adjust the volumes of individual elements (ambience/sound effects/dialogue/music). More recently, the option for players to insert their own preferred music into a game has furthered the ability to customize game sound. Further, some games have "boredom switches" that drop the volume levels automatically after a player has become "stuck" at a particular stage in the game. However, it might also be possible to adjust sound based on the player's skill level and ability—with more frequent frustration sounds being used as the player advances, for example, and greater sonic encouragement at the start of a game. Different sounds may be used when the game is being played in one-player or in multi-player mode.

Recently, with the creation of physiologically aware gaming devices such as the Wii Vitality Sensor, it has become possible to adjust sound in real-time based on the player's physiological response. We believe that this area of computer gaming—what we might call "player aware" games—will become an important future area for research. In particular, it is possible both to craft sound to manipulate the player based on their physiological response and to respond to that response. It might be possible, in other words, for games to "read" our emotional and physiological state and adjust music to keep us interested, to guide us to another state, or to enhance an existing state. Sound clearly plays an important role in the perception of gaming, and will continue to grow in importance as computer games search for ever-increasing ways to keep players interested.

REFERENCES

Alloy, L., Abramson, L., & Viscusi, D. (1981). Induced mood and the illusion of control. Journal of Personality and Social Psychology, 41, 1129–1140. doi:10.1037/0022-3514.41.6.1129

Amsel, A. (1962). Frustrative nonreward in partial reinforcement and discrimination learning: Some recent history and a theoretical extension. Psychological Review, 69(4), 306–328. doi:10.1037/h0046200

Anderson, G., & Brown, R. I. T. (1984). Real and laboratory gambling, sensation-seeking and arousal. The British Journal of Psychology, 75(3), 401–410.


Beentjes, J. W. J., Van Oordt, M., & Van Der Voort, T. H. A. (2002). How television commentary affects children's judgments on soccer fouls. Communication Research, 29, 31–45. doi:10.1177/0093650202029001002

Beverland, M., Lim, E. A. C., Morrison, M., & Terziovski, M. (2006). In-store music and consumer–brand relationships: Relational transformation following experiences of (mis)fit. Journal of Business Research, 59, 982–989. doi:10.1016/j.jbusres.2006.07.001

Bolivar, V. J., Cohen, A. J., & Fentress, J. C. (1994). Semantic and formal congruency in music and motion pictures: Effects on the interpretation of visual action. Psychomusicology, 13, 28–59.

Bradley, I. L. (1971). Repetition as a factor in the development of musical preferences. Journal of Research in Music Education, 19(3), 295–298. doi:10.2307/3343764

Brown, R. I. F. (1986). Arousal and sensation-seeking components in the general explanation of gambling and gambling addictions. Substance Use & Misuse, 21(9), 1001–1016. doi:10.3109/10826088609077251

Bryant, J., Brown, D., Comisky, P. W., & Zillmann, D. (1982). Sports and spectators: Commentary and appreciation. The Journal of Communication, 32(1), 109–119. doi:10.1111/j.1460-2466.1982.tb00482.x

Bryant, J., Comisky, P., & Zillmann, D. (1977). Drama in sports commentary. The Journal of Communication, 27(3), 140–149. doi:10.1111/j.1460-2466.1977.tb02140.x

Bullerjahn, C., & Güldenring, M. (1994). An empirical investigation of effects of film music using qualitative content analysis. Psychomusicology, 13, 99–118.

Carter, F. A., Wilson, J. S., Lawson, R. H., & Bulik, C. M. (1995). Mood induction procedure: Importance of individualising music. Behaviour Change, 12, 159–161.

Chion, M. (1994). Audio-vision: Sound on screen. New York: Columbia University Press.

Comisky, P. W., Bryant, J., & Zillmann, D. (1977). Commentary as a substitute for action. The Journal of Communication, 27(3), 150–153. doi:10.1111/j.1460-2466.1977.tb02141.x

Condry, J., & Scheibe, C. (1989). Non program content of television: Mechanisms of persuasion. In Condry, J. (Ed.), The psychology of television (pp. 217–219). London: Erlbaum.

Cooley, M. (1998, November). Sound + image in computer-based design: Learning from sound in the arts. Paper presented at the International Community for Auditory Display Conference, Glasgow, UK.

Coventry, K. R., & Constable, B. (1999). Physiological arousal and sensation seeking in female fruit machine players. Addiction (Abingdon, England), 94, 425–430. doi:10.1046/j.1360-0443.1999.94342512.x

Coventry, K. R., & Hudson, J. (2001). Gender differences, physiological arousal and the role of winning in fruit machine gamblers. Addiction (Abingdon, England), 96, 871–879. doi:10.1046/j.1360-0443.2001.9668718.x

Crockford, D., Goodyear, B., Edwards, J., Quickfall, J., & el-Guebaly, N. (2005). Cue-induced brain activity in pathological gamblers. Biological Psychiatry, 58(10), 787–795. doi:10.1016/j.biopsych.2005.04.037

Csíkszentmihályi, M. (1990). Flow: The psychology of optimal experience. New York: HarperPerennial.


Delfabbro, P., Fazlon, K., & Ingram, T. (2005). The effects of parameter variations in electronic gambling simulations: Results of a laboratory-based pilot investigation. Gambling Research: Journal of the National Association for Gambling Studies, 17(1), 7–25.

Dibben, N. (2001). What do we hear, when we hear music? Music perception and musical material. Musicae Scientiae, 2, 161–194.

Dickerson, M., & Adcock, S. (1987). Mood, arousal and cognitions in persistent gambling: Preliminary investigation of a theoretical model. Journal of Gambling Behaviour, 3(1), 3–15. doi:10.1007/BF01087473

Dixon, L., Trigg, R., & Griffiths, M. (2007). An empirical investigation of music and gambling behaviour. International Gambling Studies, 7(3), 315–326. doi:10.1080/14459790701601471

Dixon, M., Harrigan, K. A., Sandhu, R., Collins, K., & Fugelsang, J. (in press). Slot machine play: Psychophysical responses to wins, losses, and losses disguised as wins. Addiction.

Dreher, R. E. (1947). The relationship between verbal reports and the galvanic skin response. Journal of Abnormal and Social Psychology, 44, 87–94.

Dretzka, G. (2004, December 12). Casinos, celebrities bet on our love for pop culture icons. Seattle Times. Retrieved July 15, 2009, from http://community.seattletimes.nwsource.com/archive/?date=20041212&slug=casinos12.

Edworthy, J., Loxley, S., & Dennis, I. (1991). Improving auditory warning design: Relationship between warning sound parameters and perceived urgency. Human Factors, 33, 205–231.

Effrat, J., Chan, L., Fogg, B. J., & Kong, L. (2004). What sounds do people love and hate? Interactions, 11(5), 64–66. doi:10.1145/1015530.1015562

Ferrari, M., & Ives, S. (2005). Slots: Las Vegas gamblers lose some $5 billion a year at the slot machines alone. Las Vegas: An unconventional history. New York: Bulfinch.

Gaboury, A., & Ladouceur, R. (1989). Erroneous perceptions and gambling. Journal of Social Behavior and Personality, 4(41), 111–120.

Garlin, F. V., & Owen, K. (2006). Setting the tone with the tune: A meta-analytic review of the effects of background music in retail settings. Journal of Business Research, 59, 755–764. doi:10.1016/j.jbusres.2006.01.013

Glass, D. C., & Singer, J. E. (1972). Urban stress. New York: Academic.

Griffiths, M., & Parke, J. (2005). The psychology of music in gambling environments: An observational research note. Journal of Gambling Issues, 13. Retrieved July 15, 2009, from http://www.camh.net/egambling/issue13/jgi_13_griffiths_2.html.

Griffiths, M. D. (1990). The cognitive psychology of gambling. Journal of Gambling Studies, 6(1), 31–42. doi:10.1007/BF01015747

Haas, E. C., & Edworthy, J. (1996). Designing urgency into auditory warnings using pitch, speed and loudness. Computing and Control Engineering Journal, 7, 193–198. doi:10.1049/cce:19960407

Harrigan, K. A. (2009). Slot machines: Pursuing responsible gaming practices for virtual reels and near misses. International Journal of Mental Health and Addiction, 7(1), 68–83. doi:10.1007/s11469-007-9139-8

Harrigan, K. A., & Dixon, M. (2009). PAR sheets, probabilities, and slot machine play: Implications for problem and non-problem gambling. Journal of Gambling Issues, 23, 81–110. doi:10.4309/jgi.2009.23.5


Hébert, S., Béland, R., & Dionne-Fournelle, O. (2005). Physiological stress response to video-game playing: The contribution of built-in music. Life Sciences, 76, 2371–2380. doi:10.1016/j.lfs.2004.11.011

Hirokawa, E. (2004). Effects of music, listening, and relaxation instructions on arousal changes and the working memory task in older adults. Journal of Music Therapy, 41(2), 107–127.

Hirsch, A. R. (1995). Effects of ambient odors on slot-machine usage in a Las Vegas casino. Psychology and Marketing, 12(7), 585–594. doi:10.1002/mar.4220120703

Hopson, J. (2001). Behavioral game design. Gamasutra. Retrieved October 23, 2009, from http://www.gamasutra.com/view/feature/3085/behavioral_game_design.php.

Iwamiya, S. (1994). Interaction between auditory and visual processing when listening to music in an audio visual context. Psychomusicology, 13, 133–154.

Jackson, D. (2003). Sonic branding: An introduction. New York: Palgrave/Macmillan. doi:10.1057/9780230503267

King, D., Delfabbro, P., & Griffiths, M. (2009). Video game structural characteristics: A new psychological taxonomy. International Journal of Mental Health and Addiction, 8(1), 90–106. doi:10.1007/s11469-009-9206-4

Kranes, D. (1995). Play grounds. Gambling: Philosophy and policy [Special issue]. Journal of Gambling Studies, 11(1), 91–102. doi:10.1007/BF02283207

Ladouceur, R., & Sévigny, S. (2005). Structural characteristics of video lotteries: Effects of a stopping device on illusion of control and gambling persistence. Journal of Gambling Studies, 21(2), 117–131. doi:10.1007/s10899-005-3028-5

Langer, E. J. (1975). The illusion of control. Journal of Personality and Social Psychology, 32, 311–328. doi:10.1037/0022-3514.32.2.311

Lastra, J. (2000). Sound technology and the American cinema: Perception, representation, modernity. New York: Columbia University Press.

Livingstone, C., Woolley, R., Zazryn, T., Bakacs, L., & Shami, R. (2008). The relevance and role of gaming machine games and game features on the play of problem gamblers. Adelaide: Independent Gambling Authority of South Australia.

Lucas, G. (Director). (1977). Star Wars [Motion picture]. Los Angeles, CA: 20th Century Fox.

Lucky Larry's Lobstermania [Computer game]. (2002). Reno, NV: IGT.

Marmurek, H. H. C., Finlay, K., Kanetkar, V., & Londerville, J. (2007). The influence of music on estimates of at-risk gambling intentions: An analysis by casino design. International Gambling Studies, 7(1), 113–122. doi:10.1080/14459790601158002

Mattila, A. S., & Wirtz, J. (2001). Congruency of scent and music as a driver of in-store evaluations and behavior. Journal of Retailing, 77, 273–289. doi:10.1016/S0022-4359(01)00042-2

McCraty, R., Barrios-Choplin, B., Atkinson, M., & Tomasino, D. (1998). The effects of different types of music on mood, tension and mental clarity. Alternative Therapies in Health and Medicine, 4, 75–84.

Muzak Corporation. (n.d.). Why Muzak. Retrieved October 5, 2009, from http://music.muzak.com/why_muzak.

Nacke, L., & Grimshaw, M. (2011). Player-game interaction through affective sound. In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey, PA: IGI Global.


Oguro, C. (2009). The greatest Easter eggs in gaming. Gamespot. Retrieved October 5, 2009, from http://www.gamespot.com/features/6131572/index.html.

Owen, D. (2006, April 10). The soundtrack of your life: Muzak in the realm of retail theatre. The New Yorker. Retrieved October 5, 2009, from http://www.newyorker.com/archive/2006/04/10/060410fa_fact.

Parke, J., & Griffiths, M. (2006). The psychology of the fruit machine: The role of structural characteristics (revisited). International Journal of Mental Health and Addiction, 4, 151–179. doi:10.1007/s11469-006-9014-z

Pitzen, L. J., & Rauscher, F. H. (1998, May). Choosing music, not style of music, reduces stress and improves task performance. Poster presented at the American Psychological Society, Washington, DC.

KSK Productions. (n.d.). Cinematic & Muzak. Retrieved October 20, 2009, from http://www.ksk-productions.nl/en/services/cinematic-a-muzak.

Rivlin, G. (2004, May 9). The tug of the newfangled slot machines. New York Times. Retrieved July 15, 2009, from http://www.nytimes.com/2004/05/09/magazine/09SLOTS.html.

Rohner, S. J., & Miller, R. (1980). Degrees of familiar and affective music and their effects on state anxiety. Journal of Music Therapy, 17, 2–15.

Schull, N. D. (2005). Digital gambling: The coincidence of desire and design. The Annals of the American Academy of Political and Social Science, 597, 65–81. doi:10.1177/0002716204270435

Scott, T. (Director). (1986). Top Gun [Motion picture]. Hollywood, CA: Paramount Pictures.

The Flintstones: The rescue of Dino & Hoppy [Computer game]. (1991). Vancouver, Canada: Taito Corporation.

Seeking Alpha. (2008, August 5). The video game industry: An $18 billion entertainment juggernaut. Retrieved from http://seekingalpha.com/article/89124-the-video-game-industry-an-18-billion-entertainment-juggernaut.

Sharpe, L. (2004). Patterns of autonomic arousal in imaginal situations of winning and losing in problem gambling. Journal of Gambling Studies, 20, 95–104. doi:10.1023/B:JOGS.0000016706.96540.43

Skea, W. H. (1995). "Postmodern" Las Vegas and its effects on gambling. Journal of Gambling Studies, 11(2), 231–235. doi:10.1007/BF02107117

Smith, C. A., & Morris, L. W. (1976). Effects of stimulative and sedative music on cognitive and emotional components of anxiety. Psychological Reports, 38, 1187–1193.

Sullivan, D. B. (1992). Commentary and viewer perception of player hostility: Adding punch to televised sports. Journal of Broadcasting & Electronic Media, 35, 487–504.

Super Mario Bros [Computer game]. (1995). Redmond, WA: Nintendo.

Surman, D. (2007). Pleasure, spectacle and reward in Capcom's Street Fighter series. In Krzywinska, T., & Atkins, B. (Eds.), Videogame, player, text (pp. 204–221). London: Wallflower.

7th guest [Computer game]. (1993). Trilobyte (Developer). London: Virgin Games.

Thayer, J. F., & Levenson, R. W. (1983). Effects of music on psychophysiological responses to a stressful film. Psychomusicology, 3(1), 44–52.

The adventures of Rocky and Bullwinkle [Computer game]. (1992). Radical Entertainment (Developer). Agoura Hills, CA: THQ.


The Jetsons: Cogswell's caper! [Computer game]. (1992). Vancouver, Canada: Taito Corporation.

Toneatto, T., Blitz-Miller, T., Calderwood, K., Dragonetti, R., & Tsanos, A. (1997). Cognitive distortions in heavy gambling. Journal of Gambling Studies, 13, 253–261. doi:10.1023/A:1024983300428

Too human [Computer game]. (2008). Silicon Knights (Developer). United States: Microsoft Game Studios.

Traxel, W., & Wrede, G. (1959). Changes in physiological skin responses as affected by musical selection. Journal of Experimental Psychology, 16, 57–61.

Tsukahara, N. (2002). Game machine with random sound effects. U.S. Patent No. 6,416,411 B1. Washington, DC: U.S. Patent and Trademark Office.

Turner, N., & Horbay, R. (2004). How do slot machines and other electronic gambling machines actually work? Journal of Gambling Issues, 11.

Westermann, C. F. (2008). Sound branding and corporate voice: Strategic brand management using sound. Usability of speech dialog systems: Listening to the target audience. Berlin: Springer-Verlag.

Wing commander [Computer game]. (1990). Austin, TX: Origin Systems.

Wolfson, S., & Case, G. (2000). The effects of sound and colour on responses to a computer game. Interacting with Computers, 13, 183–192. doi:10.1016/S0953-5438(00)00037-0

Yalch, R. F., & Spangenberg, E. R. (2000). The effects of music in a retail setting on real and perceived shopping times. Journal of Business Research, 49, 139–147. doi:10.1016/S0148-2963(99)00003-X

Yamada, M. (2009, September). Can music change the success rate in a slot-machine game? Paper presented at the Western Pacific Acoustics Conference, Beijing, China.

You don't know jack [Computer game]. (1995). Berkeley Systems/Jellyvision (Developer). Fresno, CA: Sierra On-Line.

KEY TERMS AND DEFINITIONS

Acoustic Frustration: The use of sound to antagonize a player, creating a short-term sense of frustration that, it has been suggested, prolongs the play period.

Electronic Gambling Machines: EGMs, also known as slot machines, video slots, or video fruit machines, are digital, electronic slot machines. They tend to be much faster than electric or mechanical slots, with an increased number of play options and bonuses.

Galvanic Skin Response (GSR): One component of electrodermal response, also known as skin conductance response or sweat response; an affordable and efficient measurement of simple changes in arousal levels—one of the reasons why it is the main component of a polygraph device. Essentially, GSR measures the electrical conductivity of the skin, which changes in resistance due to psychological states.

Losses Disguised as Wins: A play in which the player "wins" but receives a payout of less money than the amount wagered, hence actually losing on the wager despite being convinced (sonically) that they have, in fact, won.

Near Miss: A failure that was close to a win—such as two matching icons arriving on the payline followed by a third reel whose icon sits just off the payline. Slot machine manufacturers use this concept to create a statistically unrealistically high number of near misses (Harrigan, 2009), which convinces the player that they are close to winning, and therefore leads to significantly longer playing times (Parke & Griffiths, 2006).

Reward Schedule: A schedule of pay-offs or rewards tied to timings or game actions, resulting in a series of emotional peaks and valleys to keep a player interested in a game.

Rolling Sound: The music or sound effects that are played when a player wins a round on a slot machine. The length of the sound (its roll) is tied to the amount of the win, with longer sounds rolling for longer times.

ENDNOTES

1. It is a common practice for many avid slot machine gamers to play multiple, adjacent machines simultaneously. Further, activities like drinking, smoking, and interaction with other gamblers and passersby may also take gamers' attention away from the machines.
2. For instance, a reward schedule is built into Too Human. Personal conversation, Denis Dyack of Silicon Knights, St. Catharines, Ontario, 2008. See Hopson, 2001.
3. There are different versions of the game available, including a "progressive slot" with varying jackpots, a 25-line slot with a max bet of 1,250 credits and a payout of 500,000 credits.
4. Thanks to the anonymous reviewer of the chapter for this idea.
5. Commission on Behavioral and Social Sciences and Education Committee on the Social and Economic Impact of Pathological Gambling. (1999). Committee on Law and Justice. Commission on Behavioral and


Chapter 2
Sound for Fantasy and Freedom
Mats Liljedahl
Interactive Institute, Sonic Studio, Sweden

ABSTRACT
Sound is an integral part of our everyday lives. Sound tells us about physical events in the environ-
ment, and we use our voices to share ideas and emotions through sound. When navigating the world
on a day-to-day basis, most of us use a balanced mix of stimuli from our eyes, ears and other senses to
get along. We do this totally naturally and without effort. In the design of computer game experiences, however, most attention has traditionally been given to vision rather than to this balanced mix of sensory stimuli. The risk is that
this emphasis neglects types of interaction with the game needed to create an immersive experience. This
chapter summarizes the relationship between sound properties, GameFlow and immersive experience
and discusses two projects in which Interactive Institute, Sonic Studio has balanced perceptual stimuli
and game mechanics to inspire and create new game concepts that liberate users and their imagination.

DOI: 10.4018/978-1-61692-828-5.ch002

INTRODUCTION

At the Interactive Institute, Sonic Studio in Piteå, Sweden, we do research on sound and auditory perception in order to find new ways to use sound, new contexts where sound can be utilized, and new applications for sound in general. Of special interest to us is how sound resembles and differs from other sensory stimuli and how this can be put to play. In our work we use perspectives and methods from art, science, and technology, and we utilize digital technology as a vehicle for our ideas and experiments.

In a series of projects we have explored intuitive, emotional, imaginative, and liberating properties of sound. These projects have resulted in new insights and knowledge as well as in new and innovative applications for sound, audio, and technology. In this chapter I will describe our perspective on a number of sound properties and how

Copyright © 2011, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.

we have put these to work in various ways. The projects are based on and inspired by an ecological and everyday-listening approach to sound, like the ones proposed by R. Murray Schafer, William Gaver, and their followers.

As human beings, we are good at interpreting the soundscape constantly surrounding us. When we hear a sound we can make relatively accurate judgments about the objects involved in generating the sound, their weight, the materials they are made of, the type of event or series of events that caused the sound, the distance and direction to the sound source, and the environment surrounding the sound source and the listener, for example. Much of the existing research on sound and auditory perception is about how to convey clear and unambiguous information through sound. In computer games, however, the aim is also to create other effects, effects that have as much to do with emotions, the subconscious, intuition, and immersion as they do with clear and unambiguous messages.

This article describes a couple of projects in which we have worked with the balance between eye and ear, between ambiguity and un-ambiguity, between cognition and intuition, and between body and mind. The aim has been to create experiences built on a multitude of human abilities and affordances, mediated by new media technology.

In a traditional computer game setting, the TV screen or computer monitor is the center of attention. The screen depicts the virtual game world, and the player uses some kind of input device, such as a game pad, a mouse, a keyboard, or a Wiimote, to remotely control the virtual gameworld and the objects and creatures in it. The action takes place in the virtual world and the player is naturally detached from the game action by the gap between the player's physical world and the virtual world of the game. Much work has to be done and complex technology used in order to bridge that gap and to have the player experience a sense of presence in the virtual gameworld. The aim is to make the player feel as immersed as possible in the game experience and to make her suspend her natural disbelief. To achieve this, the computer game industry must build broader and broader bridges over the reality gap to make the virtual game reality more immersive. The traditional way to increase immersion and suspension of disbelief has primarily been to increase graphics capability and, today, we can enjoy near photo-realistic 3D-graphics in real time. But there might be alternative ways to tackle the problem. Potentially, computer games could be more engaging and immersive without having to build long and broad bridges over the reality gap. What about narrowing the gap instead of building broader bridges over it?

BACKGROUND

Sound and light work in different ways and reach us on complementary channels. Our corresponding input devices, the visual and auditory perceptions, show both similarities and differences, and we have an innate ability to experience the world around us by combining the visual, auditory, touch, and olfactory perceptions into one, multimodal whole. We are built for and used to handling the world through a balanced mix of perceptual input from many senses simultaneously. This can be exemplified in different ways. One is by crossmodal illusions, for example, the McGurk effect (Avanzini, 2008, p. 366), which shows how our auditory perception is influenced by what we see. Another example is the ventriloquist illusion, in which the perceived location of a sound shifts depending on what we see (O'Callaghan, 2009, section 4.3.1). If the signal on one sensory channel is weak, we more or less automatically fill in the gaps with information from other channels and, in this way, we are able to interpret the sum of sensory input and make something meaningful of that sum. Watching lip movements in order to hear what your friend is saying at a noisy party is just one everyday example of this phenomenon. A third example is Stoffregen and Bardy's concept of the "global array" (Avanzini, 2008, p. 350).


According to this concept, observers are not separately sensitive to structures in the optic and acoustic flows, but are rather sensitive to patterns that extend across these flows: the global array. Another way to describe this is that we do not "see and hear" but rather "see-hear": what we perceive is the sum of sensations reaching our different modalities.

What we really hear, what a sound is, where a sound is located and so forth are questions that philosophers have been arguing over for several hundreds of years. O'Callaghan (2009) gives a broad summary of the history and current state of the field. What most philosophers and sound researchers agree on is that sounds are the result of events in the physical world. Sound holds information about these events and the objects involved in them. This means that, to our perception, sounds are strongly linked to the physical world and we are "hard wired" to treat sounds as tokens of physical activity, matter in motion and matter in interaction.

In this context, the pioneering work of William Gaver (1993) on sound classification and listening modes is still often cited and relevant for game sound design. Gaver makes a distinction between musical listening and everyday listening. In musical listening, you listen to the acoustic properties of the sound, for example, its pitch, loudness, and timbre. In everyday listening, on the other hand, you listen to events rather than sounds. When you hear a car passing by or you hear a bottle breaking, you do not pay much attention to pitch or loudness but more to the event as such. In everyday listening, the interpretation and the mapping of sounds to the individual's previous experiences and memories are crucial. When a bottle crashes against the floor, loses its original shape and turns into a number of smaller and larger pieces, this is immediately obvious to the eye. But, in order to be able to pinpoint the event that caused the sound of the broken bottle, the ear has to learn and form a memory that connects the sound of broken glass to the event of a bottle crashing and losing its shape. Even if the individual has a previous experience and memory that connects the sound of broken glass to the event of a broken bottle, the ear, not knowing the exact cause of the sound, might hesitate. Was it a bottle that crashed or was it perhaps a large drinking glass that broke? The eye can give the correct answer, whereas the ear is left to interpret and to guess in various degrees.

Tuuri, Mustonen, and Pirhonen (2007) have continued along this path and propose a hierarchical scheme of listening modes. Two of these are preconscious, two are source-oriented, three are context-oriented and one is quality-oriented. In the two preconscious listening modes, the focus is on what reflexive, emotive and associative responses a sound evokes in the listener. In the two source-oriented modes, the focus is on how the listener perceives the source of a sound and what event caused it. In the three context-oriented modes, the focus is on whether the sound had a specific purpose, if it represents any symbolic or conventional meaning, and if the sound in that case was suitable and understandable in the context. In the last, quality-oriented listening mode, the focus is on the acoustic properties of the sound: its pitch, loudness, duration and so forth. Using these or other, complementary, identified listening modes is a powerful way to inform not only the sound design process of computer games, but sound design processes in general. The important thing to notice here is that research on listening modes in general shows that sound can indeed be used to evoke emotions and associations, to communicate properties of physical objects and events, and to convey meaning and purpose.

Already from the time before we are born, our auditory perception starts giving us information about the world around us (Lecanuet, 1996). From day one we start building our library of associations to individual sounds and to whole soundscapes. Gradually, we learn what they mean and we train our ability to interpret them. Furthermore, some researchers argue that we experience sounds "as of" a bigger whole. O'Callaghan (2009) argues


that the sound of hooves of a galloping horse is not identical with the galloping. Instead, it is part of the particular event of galloping: "Auditory perceptual awareness as of the whole [sic] occurs in virtue of experiencing the part" (part 2.3.2). This strong linkage between the sounds we hear and the physical world we inhabit can be brought into play in computer games through rich soundscapes in order to convey information about objects, environments, and events in the game world. Try this simple experiment. Pick an environment with a reasonable number of activities, people, birds, machines or whatever you can find that makes everyday sounds. Close your eyes and try not to interpret, make associations and create mental pictures from what you hear. It is very hard to put the auditory interpreter to rest and this is true for sounds from all types of sources, including the headphones playing sounds from your iPod. This interpretation, mapping, or disambiguation of individual sounds and whole soundscapes involves high-level mental processes related to our conscious and subconscious, cognitive and emotional layers. As such, these processes have the potential to invoke a myriad of physical and mental responses: fear, flight, well-being, happiness, anger, understanding and so on. In computer game design, this means huge potential both to convey cognitive meaning and to create moods and affect.

Auditory perception can be understood as becoming aware of the whole by virtue of the parts. Sounds can also be said to be more ambiguous and to leave wider space for interpretation than visual stimuli do, at least when it comes to interpreting where and what we have heard. In Human-Computer Interface (HCI) contexts, ambiguity has often been thought of as a disadvantage and a problem (Sengers & Gaver, 2006) and much, perhaps even a majority, of the research done in the field has tried to overcome this and find ways to create clear and unambiguous systems and interfaces. Research on sound interaction design is no exception to this, as described by, for example, Gaver (1997). This is true also when it comes to sound in computer games but, in this context, the need to interpret and disambiguate the computer game system is not the only aspect of the issue. On the contrary, some authors argue that ambiguity and the need to interpret a system can instead be used as an asset (Sengers & Gaver, 2006; Sengers, Boehner, Mateas, & Gay, 2007). Here, we argue that this is certainly the case. When the ideas of ambiguity and interpretation are combined with the concepts of flow and GameFlow described below, the sum can be used to inform the game design process in new ways.

Development of computer games has so far mostly been geared towards vision. When it comes to sound in games, much of the work consists of inspiring case studies rather than research. Sweetser and Wyeth list three aspects of usability in games that have previously been in focus for research (Sweetser & Wyeth, 2005). These are interface (controls and display), mechanics (interacting with the gameworld), and gameplay (problems and challenges). Lately, other dimensions of the design and use of computer games have also started to gain interest among game researchers, dimensions that incorporate new and more complex aspects and ideas of player enjoyment and computer game design. Several research groups have, for example, made connections between interactivity in general and, more specifically, player enjoyment in games on the one hand and the concept of flow developed by Mihaly Csíkszentmihályi on the other. In the 1970s and 1980s, Csíkszentmihályi conducted extensive research into what makes experiences enjoyable. He found that optimal experiences are the same all over the world and can be described in the same terms regardless of who is enjoying the experience. He called these optimal experiences flow. A flow experience is defined by Csíkszentmihályi (1990) as being "so gratifying that people are willing to do it for its own sake, with little concern for what they will get out of it, even when it is difficult, or dangerous" (p. 71).
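Flow is often pictured as the channel where challenge keeps pace with skill, tipping into anxiety when challenge runs ahead and into boredom when it falls behind. As a toy sketch only (the band width and the function name are invented here, not taken from Csíkszentmihályi or from Sweetser and Wyeth):

```python
def flow_state(challenge: float, skill: float, band: float = 0.25) -> str:
    """Classify a player's state with a simple flow-channel rule.

    'band' is an arbitrary tuning parameter: how far challenge may
    drift from skill before the experience leaves the flow channel.
    Illustrative sketch only, not a published model.
    """
    if challenge > skill * (1 + band):
        return "anxiety"
    if challenge < skill * (1 - band):
        return "boredom"
    return "flow"

print(flow_state(challenge=1.0, skill=1.0))  # flow
print(flow_state(challenge=2.0, skill=1.0))  # anxiety
print(flow_state(challenge=0.5, skill=1.0))  # boredom
```

A game that adapts its difficulty could run a check like this each time the player's measured skill changes, nudging challenge back toward the channel.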


Judging from the volume and type of work built on and derived from Csíkszentmihályi's flow principle, it can be argued that the concept is relevant in the context of computer games. Andrew Polaine (2005) has written about The Flow Principle in Interactivity. This work does not relate to computer games per se, but is closely related to the subject in that it connects flow with both "willing suspension of disbelief" (a term borrowed from narratives in theater and film) and the experience of play. The GameFlow model developed by Sweetser and Wyeth builds directly on the concept of flow and is a model for evaluating computer games from an enjoyment perspective. Another example is Kalle Jegers' (2009) "Pervasive GameFlow" model that takes Sweetser and Wyeth's GameFlow concept to the pervasive game arena. A final example is Cowley, Charles, Black, and Hickey's (2008) USE model (User, System, Experience) that looks at games, player interaction, and flow from an information system perspective.

Built on the flow concept, Sweetser and Wyeth's GameFlow model consists of eight elements for achieving enjoyment in games. The model can be used both when designing new games and when evaluating existing game concepts. In summary, according to Sweetser and Wyeth, games must keep the player concentrated through a high workload. At the same time, the game tasks must be sufficiently challenging and match the skill level of the player. The game tasks must have clear goals and the player must be given clear feedback on progression towards these goals. Enabling deep yet effortless involvement in the game can potentially create immersion in the game. According to Sweetser and Wyeth, experiences can be immersive if they let us concentrate on the task of the game without effort. "Effortlessly" can, in this context, be interpreted in several ways: one way to think about it is in terms of how true to real life a gaming experience is and how transparent the interaction with the game creating the experience is. How the GameFlow model can be used in sound design for games is covered in more detail below.

A number of research projects report on sound and audio's ability to create rich, strong and immersive experiences using mobile platforms that give physical freedom to the users. These projects support the general idea that sound and audio are well suited for use in the design of computer game experiences based on the GameFlow model. Reid, Geelhoed, Hull, Cater, and Clayton report on a public, location-based audio drama called Riot 1831. The evaluation of the project showed that a majority of the users had rich and immersive experiences created from the sounds of an audio-based narrative. Based on the results from this project, the authors argue that "immersion is a positive determinant for enjoyment (and vice versa)" (Reid, Geelhoed, Hull, Cater, & Clayton, 2005). It should be noted that the drama took place in a square in Bristol, UK, which gives this project similarities to pervasive and location-based games where the virtual gameworld and the physical world of the player are blended. Friberg and Gärdenfors (2004) report on a project in which three audio-based games (what the authors term TiM games) were developed. Based on audio communication with the users, the authors report that these games give the users spatial freedom, encourage physical activity and open up possibilities to create new types of interfaces for input to the game. Ekman et al. report on the development of a game for a mobile platform (Ekman et al., 2005). They point out that sound and audio can indeed be used to create immersion, but also that the use of sound does not automatically create immersion. Great care must be taken when designing the game sounds and the developers must also carefully select the best technology and equipment to play back the game audio to get the desired effects.

In two projects, the Interactive Institute's Sonic Studio has investigated how sound in games can be used to bring the user's own fantasy into play to create new gaming experiences (Liljedahl, Lindberg, & Berg, 2005; Liljedahl, Papworth, &


Lindberg, 2007). In both these projects, the balance point between visible and audible stimuli from the game has been moved away from the visual and towards the audible. In both cases the users of the computer games are given only a minimum of visual information and are, instead, given rich and varied soundscapes. The projects have shown that the users have had rich and immersive gaming experiences and are given other types and amounts of freedom compared to more traditional computer games. These projects will be described in more detail later in this chapter.

Humankind has, in recent centuries, invested considerable energy and creativity in creating complex technology. We have a long tradition of replacing human capability with machinery. In the early days it was mostly muscle power that was mimicked, replaced, and superseded by steam, combustion, and, later, electricity. It can be argued that research into artificial intelligence is striving to do the same with human cognitive and emotional capabilities. Following this long tradition, it seems that we often neglect human capabilities, affordances, gifts, and needs when designing computer games and other systems. Much of the focus has been on creating photorealistic 3D-environments in real time and less on how the players' internal, fantasy-driven, "sound interpreter and mapper" can be put into play to create complementary, mental images. In the following, I will describe how we at the Interactive Institute, Sonic Studio work with finding ways to increase user satisfaction and involvement in gaming situations by using existing technology in slightly new ways. Often, this has meant moving complexity from technology to the user, decreasing the demands on the technology used, and increasing the demands on the user to invest and spend energy physically and mentally in a game experience.

MIND THE GAP—SOUND FOR FEEDBACK AND IMMERSION

Pictures are not the real world; they are merely the shadows of it. René Magritte's provoking pipe is a painting about exactly this: the picture of a pipe and beneath it the text "Ceci n'est pas une pipe" (This is not a pipe). We are surrounded by still and moving images and we are used to treating pictures as pictures and not the real, physical world. Even the most violent computer games and Hollywood film productions are assumed to be physically and mentally non-hazardous to us just because we are supposed to be able to discriminate between reality and the fictive picture of it. Sound, on the other hand, seems to work slightly differently. When striving for engagement, immersion, and suspension of disbelief in computer games and films, sound plays a very prominent role and, according to Parker and Heerema (2007), "sound is a key aspect of a modern video game". Natural sounds in the physical world are the result of events in that world and we become aware of physical events to a large degree through sound. It can thus be argued that sound is a strong link to the physical world. In fact, Gilkey and Weisenberger argue that "…an inadequate, incomplete or nonexistent representation of the auditory background in a VE [Virtual Environment] may compromise the sense of presence experienced by users" (quoted in Larsson, Västfjäll, & Kleiner, 2002). It is this mechanism that is utilized when creating the sound tracks to films and games. Just seeing Donald Duck smash into a wall is not enough. It is not until the sound effect is added that the nature and the full consequence of the smash are made evident to the audience. When we hear the sound of the smash, all of us have our own, slightly unique, experiences of and relationship to the sound. The sound has the power to immediately trigger our interpretation machinery and evoke memories and fantasies. In a fraction of a second the sound makes us re-live our own experiences and we can feel what Donald feels:


pain, anger, and humiliation. In this way it can be argued that the sound is playing us. Like a guitarist plucking a string that generates sound, sound is plucking our interpretation, spawning memories, understanding, and emotions. The string cannot stop the guitarist from plucking it and we cannot stop sound from triggering our understanding, our memories, associations, and emotions.

For a computer game to be successful it is crucial that the players can immerse themselves in the gaming experience and that they are invited to a gameworld and game experience in which they are willing to suspend their natural disbelief. After all, World of Warcraft is not the real world. In their GameFlow concept, Sweetser and Wyeth (2005) set up a number of criteria that game designers and game researchers can use when designing and evaluating games with respect to immersion and suspension of disbelief. Some of these criteria are general, overarching principles that relate to many human activities, while other criteria relate more closely to gaming and the media used to convey the game's metaphor and narrative. The GameFlow model lists the following criteria for player enjoyment in games:

• Concentration. Games should require concentration and the player should be able to concentrate on the game
• Challenge. Games should be sufficiently challenging and match the player's skill level
• Player Skills. Games must support player skill development and mastery
• Control. Players should feel a sense of control over their actions in the game
• Clear Goals. Games should provide the player with clear goals at appropriate times
• Feedback. Players must receive appropriate feedback at appropriate times
• Immersion. Players should experience deep but effortless involvement in the game
• Social Interaction. Games should support and create opportunities for social interaction (Sweetser & Wyeth, 2005).

As can be seen from the list, these criteria are very general and could be applied to many aspects of life, from children's play to high school education, working life, and leisure. When it comes to sound design for computer games, some of these criteria are more relevant than others. When looking at Tuuri, Mustonen, and Pirhonen's (2007) hierarchical listening modes, a clear link to the GameFlow criteria Feedback and Immersion can be found. Sweetser and Wyeth divide the Feedback criterion into the following parts:

• Players should receive feedback on their progress towards their goals
• Players should receive immediate feedback on their actions
• Players should always know their status or score (Sweetser & Wyeth, 2005).

The Immersion criterion is similarly divided into the following parts:

• Players should be less aware of their surroundings
• Players should be less self-aware and less worried about everyday life or self
• Players should experience an altered sense of time
• Players should feel emotionally involved in the game
• Players should feel viscerally involved in the game (Sweetser & Wyeth, 2005).

Given our ability to listen on several cognitive abstraction levels, as indicated by Tuuri, Mustonen, and Pirhonen's hierarchical listening modes, it can be argued that sound is well suited to communicate feedback to the user and to substantially add to the game's ability to immerse the player in the gaming experience. In the following


we will look at how sound can be used and what sound properties could be brought into play in order to give immediate and continuous feedback to users, to help them become less aware of their surroundings and themselves, and to help them get involved in the game.

SOUND PROPERTIES AT YOUR DISPOSAL

There are a number of properties of sound as a physical, acoustic phenomenon that, in conjunction with the inherent workings of our auditory perception and our ability to use different listening modes, are at our disposal to use, explore, and exploit when designing computer game experiences. Most of these properties are well known in everyday contexts and most people will immediately be able to connect to the descriptions of them, have their own experiences of them and understand the implications of them. These properties can, of course, be described in physical and acoustic terms of frequency, amplitude, overtone spectrum, envelopes and so forth. Unfortunately, these terms say very little about our human experiences of sounds, sound sources, and soundscapes. It is therefore important to also describe sound properties in relation to how our hearing works. The following is a summary of what we have discussed above and an attempt to start making the discussion more concrete and applicable to sound design for computer games.

Omni-Directionality

Sound is omni-directional and reaches our ears from all directions (almost) simultaneously. Actively and consciously, as well as automatically and pre-consciously, we use this omni-directionality to navigate in our everyday lives. Even though we do not have to look out for saber-toothed tigers anymore, we are constantly warned about cars and buses from left and right, falling trees from behind, and other dangers. Our ears are under a constant bombardment of auditory input from all directions and we cannot simply turn away from a sound. To be able to handle all this information and to avoid fatigue and sensory overload, we handle most of the input subconsciously.

Luckily, we also have the ability to focus on specific parts in the soundscape. We can, for example, isolate a conversation with a friend in a noisy restaurant from a dozen nearby, unrelated conversations. This is often referred to as "the cocktail party problem" (Bregman, 1990, p. 529).

In GameFlow terms, the omni-directional qualities of sound relate to both feedback and immersion. Sound for feedback from a game does not force the user to look at a special location on a screen: in fact, it does not require a screen at all. Sound is a strong carrier of emotions, events, and objects, as discussed above. In our everyday lives, we are also used to being surrounded by sound. Mimicking this in a computer game scenario can make profound contributions to the immersive qualities of the game.

Uninterruptible

Along the same line is the fact that we do not have "earlids" and cannot just shut our ears to get rid of the sounds around us or choose to hear just one of the sounds of the total mass that reaches our ears. From an evolutionary point of view, it has been an advantage to get early warnings and hear all dangers, not only the dangers you choose to listen to, but all. It also means that our eyes and our ears are designed differently and that the streams of sensory input from those senses complement and interact with each other. Again, this means that a constant stream of input data must be handled. The way to cope with this is to do it subconsciously.

In our everyday lives we are submerged in the ever-present stream of sounds from the world surrounding us. By supplying a relevant and well-designed stream of sounds from a computer game, the users can get constant and natural feedback


on their actions, very much like in real life. This in turn adds in a natural way to the sought-after effortless immersion.

Sound Connects to the Physical World

Sound connects you to the physical world by telling about physical objects and events that involve physical objects. We can be described as hardwired to perceive and automatically interpret sounds as results of events occurring in the physical world. This is true even if the sound is mediated through a loudspeaker: our internal interpreter does not make much difference between sounds from a physical coffee cup being placed on a table and the recorded sound of the same event played back through a pair of headphones as long as the technical quality is sufficient. It is still a coffee cup being placed on a table. As with the real-world example you were asked to listen to above, try listening to a film with your eyes shut. It is virtually impossible to turn off the flow of images, feelings and associations flowing through you as you listen. You have to concentrate very hard on something else not to be affected by the sounds that reach your ears. The sound of a dentist's drill gives a direct bodily sensation and you can almost feel the drill in your own mouth. The picture of the drill alone, without the sound, does not have the same power over our imagination, emotions, and physiology.

Again, sound can be used to immerse the user in the gameworld in a way that strongly resembles the way we handle and work in everyday life.

Sound Can Be Ambiguous

We constantly hear sounds from all directions and, to some degree, we can decide the direction and the distance to the sound source. At will, we can consciously filter out discrete sounds of special interest to us from the whole soundscape around us, but we are also forced to process most of what we hear subconsciously.

Often, we do not know exactly what the source of a sound is or from what direction and distance it comes. We can hear a vehicle approaching from behind but have to guess what type of vehicle it is and how fast it is approaching. We can roughly tell if it is a truck or a car and make educated guesses about when it will pass us, but usually not more than that. Sound leaves a relatively large space within which we can (or are forced to) fill out the details ourselves and make assumptions and interpretations based on our individual memories, experiences and associations.

When telling stories, making films or designing computer games, this ambiguity can be of great value. By planting a well-designed sound at the right moment, you can trigger a person's imaginative and emotive mechanisms by forcing her to consciously or subconsciously interpret and disambiguate the sound. Leaving the user space open to her own interpretation, inviting her and giving her the freedom to use her own imagination can potentially help the user to be emotionally and viscerally involved in the game.

Sound Reaches Us on Subconscious Channels

Our ears are constantly capturing the soundscape around us. If all that data were to be processed by the cognitive and conscious layers in our brains, we would either suffer from mental overload or have another brain constitution. But thanks to the limited bandwidth of our consciousness, our subconscious, emotional and intuitive layers process most of the sounds we hear.

This does not mean that we are not affected by what our ears pick up and what our brains are processing. What it does mean is that the effect is not totally controllable by us and that we are, to a large degree, victims of the sonic world. Often this is useful, sometimes it is stressful and sometimes it is fun. We are more or less forced


to interpret and react to what we hear. A sound heard spawns meaning and interpretations based on our previous experiences. In games this can be extremely useful as a way to invite the players to invest and get deeply involved in the game. This relates strongly to the GameFlow criterion "immersion" described above.

SOUND TYPES AT YOUR DISPOSAL

There are a number of ways to categorize and classify sounds. In this context it makes sense to use the three categories traditionally used for sound in films and computer games (Sonnenschein, 2001; see also Hug, 2011; Jørgensen, 2011 for more involved taxonomies of computer game sound):

• Speech and dialog. Human language brought to sound, the sounding counterpart to the visual text. The most cognitive and unambiguous of the three types, often used to convey clear messages with the least possible risk of misunderstanding
• Sound effects and the subcategory ambient sounds. The result of events in the physical world. A falling stone hitting the ground; air fluttering in the feathers of a bird; a mechanical clock ticking; a heavy piece of frozen wood dragged over a horizontal, dry concrete floor; the ever-present, ever-changing sounds of the atmosphere
• Music. Sometimes referred to as "the language of emotion". An integral part of human cultures since the dawn of Homo sapiens.

Note that these categories are only for clarity and discussion. It is important to point out the fact that, in reality, the borders between them are fluid. The borders between music in particular and the other two categories have been blurred for centuries: for example, music and dialog in opera and musicals, music and ambience, and music and sound effects in games and films.

Speech and Dialog

When you want to convey a clear and unambiguous message, the human voice is a natural choice. The same is true if you want to tell a riddle or recite a poem or just want to be vague and ambiguous. Human language is so rich and there are a myriad of ways to use this in computer game contexts. Speech and dialog can be used to address several of the criteria for player enjoyment included in the GameFlow concept. They can be used to promote concentration on the game by providing a complementary source of stimuli, getting the player's attention without disrupting the player's visual focus, or spreading the total game workload over complementary channels, for instance. Sometimes it is necessary to give instructions to the player on what to do next, or what is expected from the game. If you do not want to exclude the player from an ongoing game sequence or if you have problems with limited screen size, using speech as a complement to text is one solution. Today, more and more computing and gaming platforms have built-in support for voice recognition, which means that the player can control the game by issuing voice commands. Since this is totally in line with what we do in our everyday lives, it also supports a very natural way to co-create the game world and to get a desired sense of impact upon it. Speech is a natural way to get feedback from a game on player progress and distance to game goals without having to force the player to shift visual focus to get the necessary feedback. Speech and human voices are totally natural parts of human society and of everyday lives. The human voice is therefore very well suited to making the players forget that they are participating in the game through a medium and it helps to make the game interface less visible and less obtrusive to the player. Voices can therefore be integrated into


the background soundscape of the game to give a sense of human presence.

Apart from the above-mentioned rather objective and technical uses of speech and dialog, all variations of subjective, expressive and dramatic qualities of the human voice are also available. A bad result uttered with an offensive voice will be something radically different from the same result uttered with a friendly and supportive voice. Here, the thin border between computer games, film, theater and other narrative media is clear.

Sound Effects Make It Real

Events in the physical world generate sounds. It is actually very hard to live and be active in this world without giving rise to sounds. Sounds heard in the physical world are the results of events involving physical objects: explosions in a combustion engine, oscillations of the vocal cords in your throat, putting down your cappuccino cup on the saucer. Sounds are the proofs that you are still firmly attached to the physical world of your senses. The absence of sound, on the other hand, could be the sign that what you are experiencing is not real, that it is a dream or virtual reality.

A green rectangle silently moving over a computer screen is probably perceived as just a green rectangle on the screen. But if you add the sound of a heavy stone dragged over asphalt to this simple animation, the green rectangle automagically turns into a heavy stone. Sound and computer game audio is a bridge on which the virtual visual worlds can travel out and become part of the real, physical world.

Ambient or background sounds are the sounding counterparts to the graphic background. Having no ambient sounds is like having a pitch-black visual background and can be perceived as an almost physical pressure on the ears. Adding just a virtually inaudible ambient sound to the virtual world of a computer game can create an immediate experience of presence and reality. The silent virtual world that was locked in can be perceived as freed and part of the physical world through the added sound.

Friberg and Gärdenfors use a number of categories for the sounds in the TiM games mentioned above. Most of their categories can be seen as subcategories to the traditional sound effect category. The categories listed by Friberg and Gärdenfors (2004) are:

• Avatar sounds refer to the effects of avatar activity, such as footstep sounds, shooting or bumping into objects
• Object sounds indicate the presence of objects. They can be brief, recurring sounds or long, continuous sounds, depending on the chosen object presentation
• Character sounds are sounds generated by non-player characters
• Ornamental sounds are sounds that are not necessary for conveying gameplay information, such as ambient music, although they enrich the atmosphere and add to the complexity of the game.

In GameFlow terms this means that sound effects and ambient, background sounds can add to several of the criteria for player enjoyment. Presenting a lot of stimuli to the player on various channels is crucial for the ability of the player to concentrate on the game. We are also used to constantly interpreting the soundscape surrounding us, and a well designed game soundscape will have great potential to grab the player's attention and help them focus on the game. Sound effects are today absolutely necessary for feedback to the players of computer games. Everything from game control commands issued by the player to virtual events caused by non-player characters can be signaled and embodied using sounds.

Sound effects and ambient sounds are very important for player immersion and to involve the player emotionally and viscerally in the game. Many of the sound stimuli that reach our ears are processed subconsciously and handling sound

on this level of perception is totally natural to us. This fact also supports the idea that sound is very well suited to adding to the total experience of immersion in the game world.

Music Makes You Feel

Sound in general and music in particular have a very strong ability to touch our feelings. Music works emotionally in two significant ways. Firstly, it tells us stories about feelings that we do not necessarily feel ourselves: the music works like sounding pictures of emotions (Gabrielsson & Lindström, 2001, p. 230). Secondly, music can have the power to induce feelings in us, that is, to actually make us feel (Juslin & Västfjäll, 2008, p. 562). Today, the borders between music, sound effects, ambient background sounds and voices are becoming more and more blurred: music is used as sound effects and sound effects can be used as music. It can therefore be hypothesized that the emotional qualities of music are also, to some extent, true for other types of sounds.

Research has shown that music alone, in the absence of supporting pictures or other sensory input, can in many cases and for a majority of people induce feelings of happiness and sadness. Most people can also accurately tell if a piece of music is composed and intended to express sadness or happiness. Other, more complex emotions like jealousy or homesickness are harder to distinguish: music, alone, has less power to induce such feelings and to actually make us feel them. However, if you add pictures and other media to the musical expression, the musical power increases greatly.

Auditory perception tends to dominate judgment in the temporal dimension (Avanzini, 2008, p. 390). Music is a special case of this, since it is sound that is highly structured in time. By synchronizing sound and visual movements, very strong effects can be created.

Some of the music we hear affects us very individually: it is not universal and does not communicate the same thing to two persons. But if the music is paired with something else, for example, a film or a game, something happens. People who are said to hate classical music, and would never put on a recording of classical music, can spend hours watching films with music tracks firmly grounded in the Western classical music tradition, sounding like something composed in the late 19th century by Richard Wagner or Gustav Mahler. When musical sounds meet other sensory inputs, for example, music in an animated film, the individual stimuli tend to blend together and become a new whole. The "film + music" object is perceived as being radically different from the film alone and the music alone. The music becomes more universal and has the ability to communicate relatively universal values, emotions, and moods.

Music is normally a very linear phenomenon: a song starts at A and ends at B, and the journey between the two is always the same and takes the same amount of time each time. This is especially true of recorded, mediated music. In a non-linear and interactive context, this linear music concept does not necessarily apply. Most often, music has a form that creates successions of tension and relief, which in turn create expectations of how the music will continue: the music can therefore not be altered as quickly and easily as other media. To function and be perceived as music, it has to follow at least some basic musical rules of form and continuity.

A number of techniques and systems have been developed to cope with the gap between linear music and non-linear environments. Many of these are proprietary systems developed by commercial game developers and are not available to the general public. What most of the systems seem to agree on is a division between a vertical and a horizontal dimension. The vertical dimension controls aspects of musical intensity and emotion and the horizontal dimension controls aspects of time and form. The vertical dimension is often implemented using a layered approach whereby a number of musical tracks play in parallel. Each

track plays music with a certain content representing a level of intensity or emotion and the game engine cross-fades between the tracks to create the correct blend of intensity and emotion. The horizontal dimension is often implemented using short phrases of 1, 2 or 4 bars linked together. When a transition from one musical segment to another is motivated by the state of the game, the current phrase is played to the end and the chain of linked phrases then takes another route than if the game state had not changed.

3D-Positioned Audio

Since sounds are the results of physical events in three-dimensional space, it is often vital to be able to give the impression of game sound as emanating from a certain point in a 3D space (see Murphy, 2011). 3D-positioned audio is a powerful technique to bridge the gap between the virtual game reality and the physical world of the player's senses. This is especially true for sound effects but is also very useful for speech and dialog. Music and ambient sounds are most often not 3D-positioned.

SOUND FOR FANTASY AND FREEDOM

We cannot hear away from a sound like we can look away from an object, and we have no "earlids" to shut as we can our eyelids. These simple facts make sound ideal to use if you are looking for new game concepts to contrast the traditional screen- and eye-based computer games. Western societies are often said to be vision-based or eye-centric. This suggests that we rely mostly on our eyes and use our other senses and abilities more or less just as support for what we see. In language this is reflected in that we "watch" things. We "watch" TV and films despite silent movies being history since the 1930s. We even "watch" music concerts (at least this is true in Swedish). Our knowledge and awareness about vision, graphic design and so forth is also remarkably higher, more general and more common than their sounding counterparts, as are the creative tools available. In the Association for Computing Machinery's Computing Classification System (2010), sound and audio were added late compared to, for example, computer graphics. Sound and audio are also mentioned on a lower level (level three) in the classification system, whereas computer graphics is a level two item.

Balance the Senses

Our eyes play a dominant role in our everyday lives and computer game development has traditionally put most emphasis on graphics and vision. At the same time, other modalities and media types such as sound and hearing can be described as underused. This suggests that new computer game concepts could potentially be found by changing the balance between modalities and media types. What happens, for example, if we reduce graphics and visual stimuli and instead build the gaming experience more on sound and audition? What would the effect be if you had a computer game with only an absolute minimum of graphics and instead a rich, varied and gameplay-driving soundscape? Potentially such a game would be immersive in other ways and give different types of game experiences compared to more traditional, graphics-based games. A couple of things immediately become obvious. First of all, the game designer must let qualities other than computer graphics build and drive gameplay. Secondly, the player is liberated from the need to keep her eyes on a 20-something-inch rectangle (in mobile applications only a few inches). Instead, all of a sudden, she becomes free to move over much larger areas or even volumes. Both of these open up possibilities to create radically new types of computer games for radically new computer game experiences. They also represent new challenges for both game designers and computer game players (see Hug, 2011 for an expansion of such ideas).
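The vertical (cross-faded layers) and horizontal (chained phrases) dimensions described above can be sketched in a few lines of code. This is an illustrative sketch only: the class names, the linear cross-fade law, and the transition table are invented for the example and are not taken from any of the proprietary systems mentioned.

```python
# Sketch of the two adaptive-music dimensions discussed above.
# All names are hypothetical; real middleware offers equivalents.

class VerticalMixer:
    """Cross-fades parallel stems to match a target intensity (0.0-1.0)."""
    def __init__(self, layer_count):
        self.layer_count = layer_count

    def gains(self, intensity):
        # Each layer fades in over its own slice of the intensity range,
        # so higher intensity progressively brings in more layers.
        gains = []
        for i in range(self.layer_count):
            lo = i / self.layer_count
            hi = (i + 1) / self.layer_count
            g = (intensity - lo) / (hi - lo)
            gains.append(max(0.0, min(1.0, g)))
        return gains

class HorizontalSequencer:
    """Chains short phrases; transitions happen only at phrase boundaries."""
    def __init__(self, transitions):
        # Maps (current phrase, game state) -> next phrase.
        self.transitions = transitions

    def next_phrase(self, current, game_state):
        # The current phrase always plays to its end; the chain of linked
        # phrases then takes a route that depends on the game state.
        return self.transitions.get((current, game_state), current)

mixer = VerticalMixer(layer_count=4)
print(mixer.gains(0.6))  # lower layers at full gain, upper layers fading in

seq = HorizontalSequencer({
    ("explore_a", "calm"): "explore_b",
    ("explore_a", "combat"): "battle_intro",
})
print(seq.next_phrase("explore_a", "combat"))
```

In a real engine the gain values would feed the mixer every audio frame, and the phrase transition would be queried once per bar; the point here is only the separation of the two dimensions.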

Our auditory perception is good at interpreting sounds as tokens of events. When we hear a sound we know something has happened, matter has interacted with matter. The sounds of broken glass, of cars colliding, of footsteps, our own breathing, and combustion engines all contain information about materials, weights, speeds, surface roughness and so on. In our everyday lives we are constantly immersed in a soundscape that we receive through two streams, one in the left ear and one in the right. From day one we start training our perception in order to be able to make priorities and pick out the relevant information from these two streams. Since sound reaches us from all directions, it can be hypothesized that most of the events we hear, we do not see. In the light of the above, it becomes natural to use sounds as means to convey feedback on both player actions and other events occurring in the virtual world of a computer game. Since sound tells us in a totally natural way about things we do not see, sound can be used to expand the game world far beyond what is displayed on a screen. Sound is very well suited for delivering the feedback and creating the immersion necessary for successful game concepts, as described by the GameFlow concept above.

The use of sound to convey information about events, creatures and things that are not visible adds yet another dimension to the game experience: imagination, a word originally meaning "picture to oneself". When we hear a sound without seeing the sound source we make an interpretation of what we have heard. The interpretation is based on previous experiences of, memories of, and associations to sounds with similar properties. The interpretation is often subconscious and made without effort. To invite the players to use their imagination, fantasy, and associations to fill out the gaps in this way and complement what they see on the screen is one way to make the players emotionally and viscerally immersed in the game.

In a series of research and development projects we have conducted investigations and experiments based on questions related to the ideas outlined above. These projects have shown that by shifting the balance between graphics and other media types and between eyes and other modalities, games with new qualities can indeed be created: games that attract new user groups and games that can be used in new contexts, in new ways and for new purposes.

In this context it is also relevant to make a distinction between gameplay or game mechanics and metaphor. Gameplay can be defined as the set of rules and the mechanics that drive the game, the game's fundamental natural laws. Metaphor, on the other hand, defines the world in which these abstract laws work. Gameplay can, for example, define that you are able to navigate in 4 directions called north, south, east and west, that you will be presented with challenges you can either win or lose, and that you win the game by winning a defined number of these challenges. Metaphor defines the world in which the navigation takes place and the nature of the challenges. When gameplay defines an abstract challenge, metaphor can, for example, show an enemy soldier that must be eliminated or it can present the player with a falling egg that must be caught before it hits the floor. A good game must have both a well-designed gameplay and a metaphor that supports that gameplay: both are equally important. Often, the sound designer works with the metaphor side of a game. The metaphor chosen dictates the possibilities available to the sound designer. A metaphor with a large number of natural sounds that the players are likely to be able to relate to is potentially more immersive than a metaphor with few and/or unknown sounds.

Two Case Studies

In two projects, alternative ways to balance visual and audible stimuli in computer games have been explored by the Interactive Institute, Sonic Studio. In the first project, called Beowulf, a game for devices with limited screen size, such as cell phones,

Figure 1. The Beowulf game window

was developed (Liljedahl, Papworth, & Lindberg, 2007). In this project, the hypothesis was set up that a game with most of the graphics removed, having, instead, a rich, varied and challenging soundscape, can create a new type of immersive game experience. The hypothesis also included the idea that a game built mostly on audio stimuli will be more ambiguous and open for interpretation than a game built on visuals and that the need for the users to interpret and disambiguate the soundscape will create a rich and immersive game experience with new qualities compared to traditional computer games. The game uses both a well-known gameplay and a traditional metaphor to keep as many parameters as possible constant. Although the gameplay is very simple, the game's sound-based metaphor makes it both a challenging and rewarding game to play.

The Beowulf game world is graphically represented by a revealing map, a map showing only the parts of the game world you have visited so far as a red track (see Figure 1). Your position in the game world is indicated by a blue triangle pointing in your current direction. The player uses headphones to listen to the gameworld, which is described in much greater detail aurally than visually. The player navigates this gameworld by listening to sound sources positioned in a 3D space. Navigating includes localizing sound sources by turning and moving to experience changes and differences just as in real life. Feedback on player actions and progress is given by footstep sounds, breathing sounds, the sound of a swinging sword, and other sounds natural in the context of the game's world metaphor. Immersion is created through the natural and effortless interaction with the sounding dimension of the gameworld.

In the second project, called DigiWall, the computer monitor was removed totally (Liljedahl, Lindberg, & Berg, 2005). Instead, a computer game interface in the form of a climbing wall was developed (see Figure 2). The 144 climbing grips are equipped with sensors reacting to the touch of hands, feet, knees, and other body parts. The grips are also equipped with red LEDs and can be lit, turning the wall's climbing area into an irregular and very low-resolution visual display. A number of games were then developed based on a balanced mix of sounds, physical activity, and the sparse visuals of the climbing grips. The absence of traditional computer game graphics and the shift in balance between modalities and media types gives another effect: the games become open for the players to adapt to their own level of physical ability, their familiarity with the games, how they choose to team up, to create variation and so on. In this sense, the new balance between modalities and media types means new freedom for the players.
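The kind of computation behind localizing a 3D-positioned source can be sketched as follows: from the listener's position and facing direction, a source position is turned into a stereo pan and a distance-attenuation gain. This is only an illustrative sketch, not Beowulf's actual implementation (which is not documented here); real engines typically use HRTFs or multi-speaker panning rather than a plain sine pan and inverse-distance gain.

```python
import math

def localize(listener_pos, listener_facing_deg, source_pos):
    """Return (pan, gain) for a sound source in a 2D plan of the game world.

    pan: -1.0 = hard left, +1.0 = hard right, 0.0 = straight ahead.
    gain: simple inverse-distance attenuation.
    Illustrative only -- real engines use HRTFs or speaker panning.
    """
    dx = source_pos[0] - listener_pos[0]
    dy = source_pos[1] - listener_pos[1]
    distance = math.hypot(dx, dy)
    # World angle of the source, then its angle relative to the facing
    # direction, wrapped into the range (-180, 180] degrees.
    source_deg = math.degrees(math.atan2(dy, dx))
    relative = (source_deg - listener_facing_deg + 180.0) % 360.0 - 180.0
    # Negative relative angle = clockwise of facing = to the right.
    pan = -math.sin(math.radians(relative))
    gain = 1.0 / (1.0 + distance)
    return pan, gain

# Listener at the origin facing "north" (+y); a source 3 units to the
# right is panned hard right and attenuated with distance.
pan, gain = localize((0.0, 0.0), 90.0, (3.0, 0.0))
print(round(pan, 2), round(gain, 2))
```

As the player turns or moves, the same calculation yields new pan and gain values, which is exactly the change-over-time that lets the player localize the source.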

Figure 2. DigiWall climbing wall computer game interface

Both projects explored questions related to how computer game players could be offered new and unique gaming experiences in terms of freedom and fantasy. In Beowulf the hypothesis was that a shift in balance between eye and ear would invite the players to co-create the game experience and to bring their imagination into play in new ways, compared to traditional, graphics-based games. The studies performed on the game concept showed that, for a majority of players, this was also the case. The DigiWall concept is based on the players' freedom to use their whole bodies and to play the games by moving over the whole 15 m² game interface. The absence of a traditional computer monitor also opens up the rules of play in such a way that the users are invited to co-create and adapt the basic gameplays offered to their own needs and desires.

In this context it is also important to mention the term "user investment". Both projects eventually showed that the need to interpret and disambiguate the soundscape of the games was in fact an asset. Both games more or less forced the players to use their own imagination and experiences to flesh out the sounding skeletons supplied by the games' metaphors. In the Beowulf case, the user investment was expressed as high rankings in game satisfaction as well as in vivid descriptions of the gameworld's environments, materials, temperature, atmosphere, inhabitants and so forth, none of which had any visible cues. In the DigiWall case, positive user investment ranked highly both in player satisfaction and in the subsequent publicity and commercial success of the project.

In these projects, audio is used in a number of ways to create a sense of presence and to link, as closely as possible, the virtual reality of the game to the physical reality of the player. Sound was also used to communicate instructions, cues, clues, feedback, and results from the game to the player. The aim was to create new balances between sound and graphics compared to traditional computer game applications and to explore if and how sound could be used to drive gameplay and to create fun, challenging, rewarding and immersive gaming experiences. The aim was also to use sound to blur the borders between the virtual reality of the game and the physical reality of the player. In both cases, game metaphors were chosen to match the gameplays and to present as many possibilities and as large design spaces as possible for the sound designers.

Here follows a brief description of how sound was implemented in the two projects.
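The gameplay/metaphor pairing discussed earlier can be caricatured in a few lines of code: one abstract rule set dressed in two different metaphors, each implying a different palette of feedback sounds. All names and file names here are invented purely for illustration.

```python
# Toy illustration of separating gameplay (abstract rules) from
# metaphor (the world those rules are dressed in). Names invented.

GAMEPLAY = {
    "moves": ["north", "south", "east", "west"],
    "challenges_to_win": 3,
}

# Two metaphors for the same abstract challenge, each suggesting
# its own natural feedback sounds to the sound designer.
METAPHORS = {
    "battle": {"challenge": "eliminate the enemy soldier",
               "success_sound": "brass_fanfare.wav"},
    "kitchen": {"challenge": "catch the falling egg",
                "success_sound": "egg_caught.wav"},
}

def describe(metaphor_name):
    m = METAPHORS[metaphor_name]
    return (f"Rules: move {'/'.join(GAMEPLAY['moves'])}; "
            f"win {GAMEPLAY['challenges_to_win']} challenges. "
            f"World: {m['challenge']} (on success play {m['success_sound']})")

print(describe("battle"))
print(describe("kitchen"))
```

The point of the separation is that swapping the metaphor changes the sound designer's whole design space while the underlying rules stay untouched.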

Ambience and Background to Bridge the Reality Gap

Physical environments are (almost) never silent. Air, water, objects, creatures and machines around us all more or less make sounds. The absence of sound is unnatural and scary; it is an auditory counterpart to pitch black. Sounds are the signs of presence, life and function. By adding just a very soft sound of moving air, an otherwise dead and detached game environment can come alive. If the sound is well designed, it is possible to create an experience where the game-generated sounds blend with the sounds from the gamer's physical environment, creating an inseparable whole. The gap between the realities closes.

Ambient sounds can be strong carriers of emotion and mood. They share this ability with music and the fact is that the border between the two is more and more often blurred by film and game sound designers (Dane Davis, cited in Sonnenschein, 2001, p. 44). Carefully "composing" an ambient or background sound can serve several purposes at the same time. It can create a sense of physical presence, it can set the basic mood and it can communicate emotion and arousal.

In the Beowulf game, the ambient sounds were the sound of air softly flowing through the gameworld's system of caves and tunnels. The sounds had a slight amount of reverb added to create a sense of volume in the caves and the reverb was removed for tunnels. The ambient sounds were also deliberately freed as much as possible from musical components such as pitch and rhythm: we wanted to give the players as much freedom as possible to use their own imagination, not influencing them in any direction defined by us more than necessary.

Most of the DigiWall games use music tracks as ambient and background sounds. In this case the purpose is the opposite. Music is used to set the basic mood of the games and to encourage physical activity in the players. The music is designed to communicate subconsciously with the players and, for example, "whisper" that speed is increasing or that time is running out and you must hurry.

Sound Effects and Music for Cues and Clues

Often game designers want to encourage the players to go in certain directions or to take certain actions. By carefully planting sound effects and/or music, the player can be guided, inspired or even intentionally misled. Beowulf uses a large number of natural sounds to warn the player of potential dangers such as predators, bottomless holes or boiling lava. The DigiWall games use music and sound effects with musical properties to guide attention in certain directions on the wall. One example is the game Catch The Grip, in which the direction from the last grip caught to the next to catch is represented by a series of notes. The length of the series tells the physical distance on the wall. The panning of the notes in the loudspeaker system signals the direction left/right. In the game Scrambled Eggs, sound effects with a falling pitch denote the movement of "eggs" falling from the top of the wall towards the floor.

Speech, Music and Sound Effects for Information and Feedback

Many sounds are emotional and meant to create and communicate mood and presence. Other sounds are meant to convey cognitive information about rules, scores, results and so forth. Speech is, of course, very versatile and useful in this case. It is very effective to have a voice read the initial instructions for a game, especially if it is a game with relatively simple gameplay and few rules. The same is true for scores and results. Who won, the left or right team? How many points did you score? To have a voice read these results creates a strong feeling of presence and makes the game come alive. One drawback with speech is, of course, language. For example, Swedish voiceovers in a game do not make very much sense in the UK. As with text, it is necessary to have localized versions and this quickly starts adding cost in terms of computer memory, coding, development time, and other resources. But then again, sometimes it is worth it. In the DigiWall games serving as an example here, speech is used as introduction to all games. A majority of the games also present scores and results using speech. The DigiWall game interface is equipped with two buttons, so the players can select one of two available languages.

A danger with speech is the risk of wearing out often-repeated phrases. It is therefore useful to give the players the option to skip, for example, instructions when they are no longer needed.

Music and sound effects can also serve as carriers of information, albeit not as clear and unambiguous as speech. This is not an innate disability though, but rather an effect of the way we use music and sound effects. Rhythm, for example, can be used to convey semantics just as well as any speech: what is required is simply to learn the system (Morse code, for example). One of the advantages with sound effects and music is that they are not limited by language, but are more universal. This can of course be used in many ways. In the Beowulf game, each new round starts with a short horn melody, as if it were announcing the approach of the king's ambassador. The players learn very quickly what this signal means and, since it is very short, the risk of becoming bored with it is minimal. Beowulf also uses pure music to signal success and failure. Success is signaled by a short triumphant brass fanfare and failure is signaled by a short funeral march.

By carefully selecting the metaphor aspect of a game's design, tremendous opportunities to create sound effects for feedback and information can be opened up. By placing the game in an environment (metaphor) that the players are likely to have some kind of relation to, the designer can choose sounds for feedback and information that are natural in that environment. Using natural sounds that the players can immediately relate to can greatly enhance the gameplay aspect of the same game as well as create the sought-after sense of presence and immersion. The DigiWall game Scrambled Eggs uses the sound of broken eggs to signal points lost and the sound of an egg rescued in the palm of your hand to signal points gained. In Beowulf, if the player enters a forbidden game tile, the sound of a scream receding down a hole together with the sound of falling rocks signals life lost. When this is followed by the funeral march, failure and the end of the game are obvious to anyone, without the need for speech or text.

FUTURE RESEARCH DIRECTIONS

It is often said that sound is still underused and that audio is a media type with potential yet to be unleashed. In order to free this unused potential, research and development efforts must be carried out on several parallel fronts. We need to develop more in-depth knowledge about auditory perception and how heard experiences affect users of computer games and other interactive systems. This also implies richer taxonomies and more developed languages for writing about, talking about and reflecting over this new knowledge and making it useful in wider contexts. Furthermore, a number of current ideas and traditions in the field must be challenged and a set of updated ideas must be developed. Ambiguity and wider interpretation spaces treated as design assets rather than problems in the design of interactive systems is one example. Another example is when simple efficiency metrics for player enjoyment are replaced with more complex systems for the design and evaluation of computer game experiences, such as the GameFlow concept. Finally, new technology that can carry and realize the new knowledge and ideas must be developed. This includes technologies for procedural audio (see Farnell, 2011; Mullan, 2011 for further descriptions of this technology) and systems for dynamic

simulation of room acoustics and acoustic occlusion and obstruction, just to name a few.

CONCLUSION

Sound is a complex stimulus and it is only in recent years that science has started to understand auditory perception in any depth. Much of the knowledge and practice in sound design for computer games and other interactive applications is based on experience and anecdotal evidence. But the awareness of sound's potential and scientifically based knowledge in sound design is slowly increasing. This is not only true in the computer game industry, but in industry and society in general. The implications of the fact that our ears and our eyes complement each other are slowly beginning to have an effect. Graphics alone gives one type of experience, sound alone gives another, and graphics plus sound gives new and unique experiences. By working with the balance of ears, eyes and other senses and human abilities, new opportunities emerge for the computer game designer. The Wii, Dance Dance Revolution and DigiWall are just a few examples of this.

Sounds in the physical reality of our bodies are the results of physical events in that same reality. Our hearing is designed and "hardwired" to constantly scan and analyze the soundscape surrounding us and react rationally to the sounds heard. Most of the time this is done subconsciously and our hearing can therefore be described as, to a large degree, intuitive, emotional, or pre-cognitive. The soundscape reaching our ears demands interpretation and disambiguation in other ways than the visual stimuli reaching our eyes. This need to interpret and disambiguate can be turned into a great asset in computer game design. A game with a well-designed, rich, and varied soundscape will play on the user's intuition and emotions: the game will be immersive and give fun and rewarding gaming experiences.

How we interpret a sound depends on, and draws from, our previous personal experiences. Well-known sounds will spawn a myriad of pictures in our inner, mental movie theaters. Unknown sounds can create both confusion and excitement. Working in parallel with the gameplay and the metaphor aspects of computer game design, and making sure that the two match and support each other, is a powerful way to find and design the sounds that build the total soundscape of the game. By working in parallel with and carefully balancing the graphics and the sounds of a computer game the users' bodies and fantasies can be set free, creating unique, immersive, and rewarding gaming experiences.

REFERENCES

Association for Computing Machinery. (2010). ACM computing classification system. New York: ACM. Retrieved February 4, 2010, from http://www.acm.org/about/class/.

Avanzini, F. (2008). Interactive sound. In Polotti, P., & Rocchesso, D. (Eds.), Sound to sense, sense to sound – A state of the art in sound and music computing (pp. 345–396). Berlin: Logos Verlag.

Bregman, A. S. (1990). Auditory scene analysis: The perceptual organization of sound. London: MIT Press.

Cowley, B., Charles, D., Black, M., & Hickey, R. (2008). Toward an understanding of flow in video games. ACM Computers in Entertainment, 6(2).

Csíkszentmihályi, M. (1990). Flow: The psychology of optimal experience. New York: Harper Collins.

Dance Dance Revolution [Computer game]. (2010). Tokyo: Konami.

DigiWall [Computer game]. (2010). Piteå, Sweden: Digiwall Technology. Retrieved February 10, 2010, from http://www.digiwall.se/.

Ekman, I., Ermi, L., Lahti, J., Nummela, J., Lankoski, P., & Mäyrä, F. (2005). Designing sound for a pervasive mobile game. In Proceedings of the ACM SIGCHI International Conference on Advances in Computer Entertainment Technology, 2005.

Farnell, A. (2011). Behaviour, structure and causality in procedural audio. In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey, PA: IGI Global.

Friberg, J., & Gärdenfors, D. (2004). Audio games: New perspectives on game audio. In Proceedings of the ACM SIGCHI International Conference on Advances in Computer Entertainment Technology, 2004, 148-154.

Gabrielsson, A., & Lindström, E. (2001). The influence of musical structure on emotional expression. In Juslin, P., & Sloboda, J. A. (Eds.), Music and emotion: Theory and research. Oxford, UK: Oxford University Press.

Gaver, W. (1993). What in the world do we hear? An ecological approach to auditory event perception. Ecological Psychology, 5(1), 1–29. doi:10.1207/s15326969eco0501_1

Gaver, W. (1997). Auditory interfaces. In Helander, M. G., Landauer, T. K., & Prabhu, P. (Eds.), Handbook of human-computer interaction (2nd ed.). Amsterdam: Elsevier Science. doi:10.1016/B978-044481862-1/50108-4

Gaver, W. W., Beaver, J., & Benford, S. (2003). Ambiguity as a resource for design. Proceedings of the ACM CHI Conference on Human Factors in Computing Systems, 2003, 233-240.

Hug, D. (2011). New wine in new skins: Sketching the future of game sound design. In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey, PA: IGI Global.

Jegers, K. (2009). Elaborating eight elements of fun: Supporting design of pervasive player enjoyment. ACM Computers in Entertainment, 7(2).

Jørgensen, K. (2011). Time for new terminology? Diegetic and non-diegetic sounds in computer games revisited. In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey, PA: IGI Global.

Juslin, P. N., & Västfjäll, D. (2008). Emotional responses to music: The need to consider underlying mechanisms. The Behavioral and Brain Sciences, 31, 559–621.

Larsson, P., Västfjäll, D., & Kleiner, M. (2002). Better presence and performance in virtual environments by improved binaural sound rendering. In AES 22nd International Conference on Virtual, Synthetic and Entertainment Audio.

Lecanuet, J. P. (1996). Prenatal auditory experience. In Deliège, I., & Sloboda, J. (Eds.), Musical beginnings: Origins and development of musical competence (pp. 3–36). Oxford, UK: Oxford University Press.

Liljedahl, M., Lindberg, S., & Berg, J. (2005). Digiwall: An interactive climbing wall. Proceedings of the ACM SIGCHI International Conference on Advances in Computer Entertainment Technology, 2005, 225-228.

Liljedahl, M., Papworth, N., & Lindberg, S. (2007). Beowulf: An audio mostly game. Proceedings of the International Conference on Advances in Computer Entertainment Technology, 2007, 200–203.

Mullan, E. (2011). Physical modelling for sound synthesis. In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey, PA: IGI Global.


Murphy, D., & Neff, F. (2011). Spatial sound for computer games and virtual reality. In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey, PA: IGI Global.

O’Callaghan, C. (2009, Summer). Auditory perception. In E. N. Zalta (Ed.), The Stanford encyclopedia of philosophy. Retrieved January 24, 2010, from http://plato.stanford.edu/archives/sum2009/entries/perception-auditory/

Parker, J. R., & Heerema, J. (2008). Audio interaction in computer mediated games. International Journal of Computer Games Technology, 2008, 1–8. doi:10.1155/2008/178923

Polaine, A. (2005). The flow principle in interactivity. In Proceedings of the Second Australasian Conference on Interactive Entertainment.

Reid, J., Geelhoed, E., Hull, R., Cater, K., & Clayton, B. (2005). Parallel worlds: Immersion in location-based experiences. In CHI ’05 Extended Abstracts on Human Factors in Computing Systems.

Sengers, P., Boehner, K., Mateas, M., & Gay, G. (2008). The disenchantment of affect. Personal and Ubiquitous Computing, 12(5), 347–358. doi:10.1007/s00779-007-0161-4

Sengers, P., & Gaver, B. (2006). Staying open to interpretation: Engaging multiple meanings in design and evaluation. Proceedings of the 6th Conference on Designing Interactive Systems, 2006, 99-108.

Sonnenschein, D. (2001). Sound design: The expressive power of music, voice and sound effects in cinema. Studio City, CA: Michael Wiese Productions.

Sweetser, P., & Wyeth, P. (2005). GameFlow: A model for evaluating player enjoyment in games. ACM Computers in Entertainment, 3(3).

Tuuri, K., Mustonen, M. S., & Pirhonen, A. (2007). Same sound – different meanings: A novel scheme for modes of listening. In Proceedings of Audio Mostly 2007 – 2nd Conference on Interaction with Sound, 13-18.

World of warcraft [Computer game]. (2010). Reno, NV: Blizzard Entertainment.

ADDITIONAL READING

Altman, R. (Ed.). (1992). Sound theory sound practice. New York: Routledge.

Boehner, K., DePaula, R., Dourish, P., & Sengers, P. (2005). Affect: From information to interaction. In Proceedings of the 4th Decennial Conference on Critical Computing: Between Sense and Sensibility.

Brown, E., & Cairns, P. (2004). A grounded investigation of game immersion. In CHI ’04 Extended Abstracts on Human Factors in Computing Systems.

Juslin, P., & Sloboda, J. A. (Eds.). (2001). Music and emotion: Theory and research. Oxford, UK: Oxford University Press.

Kaptelinin, V., & Nardi, B. A. (2009). Acting with technology: Activity theory and interaction design. Cambridge, MA: MIT Press.

Norman, D. A. (1988). The design of everyday things. New York: Basic Books.

Polotti, P., & Rocchesso, D. (Eds.). (2008). Sound to sense, sense to sound: A state of the art in sound and music computing. Berlin: Logos Verlag.

Schafer, R. M. (1977). The soundscape: Our sonic environment and the tuning of the world. Rochester, VT: Destiny Books.

Sider, L., Freeman, D., & Sider, J. (Eds.). (2003). Soundscape: The School of Sound lectures 1998–2001. London: Wallflower Press.


KEY TERMS AND DEFINITIONS

Auditory Perception: The process of attaining awareness or understanding of auditory information or stimulus.
Avatar: A controllable representation of a person or creature in a virtual reality environment.
Feedback: Output from a computer game to inform the user of various changes in game state.
Flow: The mental state of operation in which a person is fully immersed in what he or she is doing by a feeling of energized focus, full involvement, and success in the process of the activity.
Gameplay: The rules and mechanics defining the functionality of a computer game.
Game Metaphor: The embodiment of the virtual environment comprising the game world.
Immersion: Deep mental involvement.
Pervasive Game: A computer game tightly interwoven with our everyday lives through the objects, devices and people that surround us and the places we inhabit.
Suspension of Disbelief: A silent agreement between an audience and an entertainment producer in which the audience agrees to provisionally suspend their judgment in exchange for the promise of entertainment.


Chapter 3
Sound is Not a Simulation:
Methodologies for Examining the Experience of Soundscapes

Linda O’Keeffe
National University of Ireland, Maynooth, Ireland

ABSTRACT
In order to design a computer game soundscape that allows a game player to feel immersed in their
virtual world, we must understand how we navigate and understand the real world soundscape. In this
chapter I will explore how sound, particularly in urban spaces, is increasingly categorised as noise,
ignoring both the social significance of any soundscape and how we use sound to interpret and negotiate
space. I will explore innovative methodologies for identifying an individual’s perception of soundscapes.
Designing virtual soundscapes without prior investigation into their cultural and social meaning could
prove problematic.

INTRODUCTION

Simmel (as cited in Frisby, 2002) argues that the exploration and navigation of a space, particularly an urban space, impacts all of the human senses. Equally, he suggests that when exposed to multiple inputs of both internal and external stimuli, we make choices, such as movement and interaction, based on the sensory information of a given space (Simmel, 1979). In the design of gameworlds, we must examine this concept of sensory input as both a method of navigation and socialisation.

Within a real world, all the senses are exposed to information: sight, sound, smell, and touch. Within a gameworld, we are currently exposed to an overriding visual experience and minimal sound information. There is a deficit of sensory information occurring within this digital world and, as more people move towards gaming and virtual communities, this deficit must be examined. For digital virtual worlds to create a convincing immersive experience with the technology that is available, we must explore sound as well as sight in the construction of gameworlds from a sociological perspective.
DOI: 10.4018/978-1-61692-828-5.ch003

Copyright © 2011, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.

Thompson (1995) argues that when we enter virtual spaces or communities we leave orality behind: he sees no space for sound within virtual worlds or online communities. It has been prevalent in social and media theory to ignore the experience of sound in a space, whether that sound is produced by human activity or by other natural sources. It is my argument that sound plays a part in the social construction of space, whether real or virtual, either by its presence or absence. Equally, I will argue that sound which is produced by objects through reverberation and other acoustic qualities can affect how we navigate or place meaning in a space.

I will also explore the process of control which is dominating research into the soundscape, primarily due to an increasing awareness of the side effects or apparent dangers of loud sounds on people.

The need to monitor and control sound in the environment has become a predominant research focus within soundscape studies. Sounds within urban centers are increasingly seen as a by-product of industry and technology: this has led to the creation of noise policies within a number of countries. Sound is increasingly treated as a measure of sound pressure levels rather than as a social structure (Blesser & Salter, 2009, pp. 1-2). This is significant for sound designers who wish to gather data on the meaning of sound within society. If a sound designer considers sound only in relation to volume, noise, or other objective criteria, they might ignore the meaning of sound beyond its output level. In looking at the social and perceptual aspects of sound we are constructing what Feld (2004) would call an acoustemology of the sound world. He acknowledges that soundscape studies, in reacting to human interventions in the natural soundscape, ignore the cultural systems which develop as a result of being immersed and surrounded by sound.

The game space, or any virtual space which asks a person to become immersed in it, needs to be founded upon an understanding of the sociological impact of sound on the individual and society. A game designer must also take into account the more abstract representation of sound that is experienced in art, cinema, and other mediated spaces. There is already a history of the experience of sound through mediatisation (Bull, 2000; Cabrera Paz & Schwartz, 2009; Cohen, 2005; Drobnick, 2004): the difference between these theories and the theory of game sound design is the concept of immersion, interactivity, and simulated reality.

What describes a soundscape? Who defines the description, and what models are used to categorise levels of sound and their meaning? There are no set methods for the study of acoustic ecology or the soundscape from a sociological perspective. I propose an interdisciplinary method which will draw on social theory, media theory, and sound design. In order to explore the soundscape, we must incorporate different methods and theories to analyze the social impact of the soundscape, real or virtual, on the individual and the group.

THE EXPLORATION OF THE SOUNDSCAPE

Some of the earliest documented exploration of the modern soundscape arose from within the arts and modern music composition. Those who practised the art of listening explored the changes in our early soundscape: technology was seen to change the soundscape, but this was not seen as a negative event. Luigi Russolo’s 1913 manifesto, The Art of Noise, posited that sound had reached a limit of invention and that technological sounds allowed for an “enjoyment in the combination of the noises of trams, backfiring motors, carriages, and bawling crowds”. He argued that in listening to and using these sounds as types of music we would create an awareness of the rapidly changing soundscape. In an ever changing technological climate, we would increasingly be exposed to new types of sounds at a faster rate than at any time preceding mechanisation. The


soundscape would also play a much stronger part in the construction of music and sound art with the introduction of audio recording devices.

However, over time, this modern soundscape became less a usable musical landscape or instrument and more an environmental pollutant (Bijsterveld, 2008). Bijsterveld (2008) argued that technology became a symbol of the loudness and unhealthy character of the urban soundscape.

Schafer’s examination of the soundscape in the 1960s was guided by an awareness of the increased levels of sound within urban centers (Cohen, 2005; Schafer, 1977). He argued that the spread of industrialisation polluted not only such physical spaces as water and land but also the hearing space, leading to an alteration of the perceived space for animals and humans. Sound, or what was now being called noise, was increasingly seen as a negative side effect of industry. Schafer’s research focused on a reification of past soundscapes and the preservation of soundmarks (similar to historic landmarks). The World Soundscape Project, established in the late 1960s by several people including Murray Schafer, proposed a practice of recording the landscapes of different spaces around the world. They wanted to record and archive certain landscapes they felt were being transformed as a result of a noisier soundscape. These recordings would then highlight the effect that increased sound levels were having on certain spaces.

Although Schafer brought the sound world into the equation as a factor within industrial change, very little focus has been on the positive aspects of contemporary soundscapes or their social meaning. Human activities produce sound; we are also embedded in sound. Space becomes revealed to us through sound and, as spaces become more built up or newly transformed, our ability to see beyond our immediate space becomes limited. Blesser and Salter (2009) argue that sound allows us to envision our space; a space becomes revealed to us through its “aural architecture”. They examine the ability of humans to restructure their sound environment, to act back on loud sound spaces, and argue that, because constructed spaces remain static, it is through social behaviour that we have the ability to modify our sound arena.

The Designed Space

De Certeau (1988) argues that the city is a representation of political economy, historical narrative, and the social forces of capitalism and that, while architects and planners see the whole, the vista, the individual who lives and works in the city will never see it in totality. He suggests that we walk the city blindly, reconstructing our own narratives of space. De Certeau implicates sound, without referencing it, as a way to see an invisible whole. He argues against the rationalising of the city or functionalist utopianisms, allowing for the transformation of space by those that live within it. Adams et al. (2006) suggest that “a soundscape is simultaneously a physical environment and a way of perceiving that environment” (Adams et al., 2006, p. 2). They see the soundscape as a construct through which we navigate. Adams et al. and de Certeau understand that the construction of space and our ability to navigate through it is dependent on more information than the visible.

In recreating the soundscape in digital landscapes, the designer pays homage to the real world she tries to replicate: she codes, intentionally or not, the universalisms of design into the construction of her virtual space. The space is built to replicate the reflection of sound against object, as if this is the only way sound moves through space or, equally, the only way we perceive it. She is equally guided by the epistemology of sight, as “the epistemological status of hearing has come a poor second to that of vision” (Bull & Back, 2004, p. 1). Like any other visual medium, the design makes assumptions on how sound should be perceived in any constructed space. This functional approach only measures our potential physiological responses to sound. It does not explore the individual or community experience of sound or


the subjective and immersive experience of time and space through either real world listening or mediated listening.

Augoyard and Torgue (2006) theorize that sound may guide social behaviours: they argue that no sound event can be removed from “spatial and temporal conditions” and that sound is never experienced in isolation. They have adopted qualitative approaches to the exploration and analysis of sound in urban spaces. Augoyard and Torgue argue that the term “soundscape” is tied to a certain empirical model of measurement which may be too narrow in its meaning, belonging more to a textual rather than observational critique of “acoustical sources” and “inhabited spaces” (2006, p. 4). They suggest that the term sonic effect better describes the experience of sound within space. It breaks the analysis of sound into three distinct fields: “acoustical sources, inhabited space, and the linked pair of sound perception and sound action” (Augoyard & Torgue, 2006, p. 6). Each of these fields is required in order to examine the ubiquitous nature of the soundscape as a process which impacts on social and physiological as well as psychological behaviour.

What is most difficult to analyse, but fundamental to soundscape design, is the subjective experience of sound. When constructing a virtual landscape, the primary consideration (and for a number of games the only goal) is the reaction time of game player interaction: if I shoot, will I hear the sound of gunfire instantaneously?

MEDIATED LISTENING

The number of people turning to electronic devices (MP3 players, walkmans, iPods, mobile games, and laptops) as a means of shutting out real world sounds has increased exponentially in the last decade (Bull, 2000). The personal headphone has played a part in reconfiguring the landscape, allowing us a choice in how we perceive our world and how we are perceived as taking part in or stepping out of real time and space. Thompson (1995) explores the change in perception of “spatial and temporal characteristics of social life” (Thompson, 1995, p. 12) due to the development of communications technology. He recognises that the role of oral traditions has changed: face to face contact is eliminated in favour of virtual communications. Bull (2000) argues that mediated listening is now used as a means to escape the “urban overload” of our cities and suggests that the use of mobile technology for listening to the radio or to music collections affords a breather or a metaphysical removal from the real world. How we shift between these acoustic environments, and how our personality and behaviour may be manipulated, both by our apparent control of one type of space and our lack of control over another, may affect social patterns of relating to each other and the world we inhabit.

Sound Control

Research has shown that the reasons for putting on headphones are motivated by numerous factors (Bull, 2000). Erving Goffman’s (1959) theory of civil inattention addresses this concept. He examines the unwillingness of the individual to be seen in public spaces and explores the notion of contexts structuring “our perception of the social world” (as cited in Manning, 1992, p. 12). Goffman suggests that social spaces are framed and, within these frames, we act a certain way. How we act is perceived as being the acceptable or normal behaviour for those spaces, and he uses the example of the elevator: when travelling in such a confined space, the “normal” behaviour is to look anywhere but at another person’s face. Mediated spaces contain their own framed context. When we engage in a fully immersive experience, such as gaming or mediated listening, even if this happens in a public space, we are not seen to be ignoring the real world. We are seen to be engaged within another space, one which requires our full attention.

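The instantaneity question raised above (if I shoot, will I hear the gunfire at once?) is, in physical terms, a question of distance and timing: listeners use arrival time and loudness as cues to distance, speed, and time. As a hedged sketch only, assuming a speed of sound of roughly 343 m/s and a simple inverse-distance gain law (neither is prescribed by this chapter, and the function names are my own), a designer might schedule a sound event like this:

```python
# Sketch: scheduling a sound event so that its arrival time and loudness
# reflect the listener's distance from the source.
# Assumptions (mine, not the chapter's): sound travels at roughly
# 343 m/s in air, and level falls off by a simple inverse-distance law
# relative to a 1 m reference distance.

SPEED_OF_SOUND_M_S = 343.0

def propagation_delay(distance_m: float) -> float:
    """Seconds before the wavefront reaches the listener."""
    return distance_m / SPEED_OF_SOUND_M_S

def attenuated_gain(distance_m: float, ref_m: float = 1.0) -> float:
    """Gain in [0, 1] relative to the level heard at ref_m."""
    return min(1.0, ref_m / max(distance_m, ref_m))

def schedule_event(distance_m: float) -> tuple:
    """(delay_s, gain) for a sound emitted distance_m from the listener."""
    return propagation_delay(distance_m), attenuated_gain(distance_m)

# A shot fired 1 m away is effectively instantaneous at full level;
# the same shot 343 m away arrives a full second later, much quieter.
delay_near, gain_near = schedule_event(1.0)
delay_far, gain_far = schedule_event(343.0)
```

Whether a game should model the delay at all is itself a design choice: players read near-instant feedback as their own action, while audible delay and attenuation are precisely the cues this chapter describes listeners using to judge distance in a soundscape.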

Bull’s (2000) research also highlights how the perception of time becomes distorted when listening to personal headphones. For some, listening is required to manage the boredom of “slow time”. It is also used to negotiate a path through space, a path which is experienced through a virtual soundscape or soundtrack, and this alters the listener’s perception of time. Bull’s studies have revealed that time is almost always a reason for engaging in mediated listening. This concept of controlling space and time through mediated listening suggests that the senses required for listening extend beyond simply hearing. If the experience of listening alters the perception of time and space, then reality also becomes less fixed and more flexible. Lefebvre (2004) argues that time and everyday life exist on multiple levels and that the experience of time contains a value coding, depending on the task being done. He suggests that time is both fundamental and quantifiable and that quantifiable time is an imposed measure based on the invention of clocks and watches. When engaged in mediated listening (radio, sound art, audio books, and games, for example), time may be re-appropriated. We are experiencing what Schafer called a schizophonic shift in perception where, by means of mediated listening, we exist between two time zones, one created by our imagination and the other by the world around us. Devices such as stereo headphones, mobile phones, and portable games, which we use to pull us out of time, also act as filters: they give us the choice to decide what it is we hear and do not hear. Equally, we can choose to hear both spaces, real and mediated, so that we do not become so distracted in our mediated listening that we walk under a car. The increased use of mediated listening devices, particularly in public spaces, might be seen as an adaptation to the increase in sound levels within urban spaces. It could also be a result of the sheer diversity of sounds that exist within our world, most of which have no meaning or relevance in our day to day lives.

There are massive assumptions posited by researchers in the field of noise or increased sound levels. Schafer and the World Forum for Acoustic Ecology argue that increased sound levels are creating a rift between the natural world and humanity’s relationship to it. They support research concerned with the “preservation of natural and traditional soundscapes” (Epstein, 2009). This focus on the conservation of older or traditional soundscapes ignores the “everyday urban situations impregnated with blurred and hazy...sound environments” (Augoyard & Torgue, 2006, p. 6).

NOISE: THE SIDE EFFECT OF INDUSTRY

The term noise is often used to describe unwanted sound or sound that, in its make-up, carries certain characteristics that define it as negative. Schafer’s early work on the soundscape explored ways of quantifying noise levels. One of his early explorations used a system of tables which measured the number of complaints made against certain noise sources, a project carried out in several countries. Schafer’s research concurred with what most people would suspect: in most modern cities, traffic is seen as a pollutant, both for its carbon emissions and for its sound levels. Yet in Johannesburg, South Africa, we see a very different picture in relation to what is seen as noise and what is accepted as city sounds (Schafer, 1977, p. 187). The vast majority of complaints about sounds considered intrusive or annoying were made against the increased sounds of animals and birds within the city: unusually, the smallest number of complaints was directed towards traffic. It could be argued that one type of sound is seen as normal and part of the everyday urban while the more natural sounds no longer fit with the concept of an urban landscape.

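Schafer’s complaint tables are, at bottom, frequency counts over labelled noise sources. Purely as an illustration (the source categories and records below are invented, not Schafer’s data), such a tabulation might be sketched as:

```python
# Sketch: tallying noise complaints by reported source category, in the
# spirit of Schafer's cross-country complaint tables. The records below
# are invented for illustration; they are not Schafer's data.

from collections import Counter

complaints = [
    "traffic", "animals", "animals", "construction",
    "animals", "traffic", "aircraft",
]

by_source = Counter(complaints)   # complaint count per source category
ranked = by_source.most_common()  # categories ordered by complaint count
```

Run per city, the same ranking reproduces the comparison above: traffic tops the table in most modern cities, whereas in Johannesburg the sounds of animals and birds did.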

Sound as Side Effect

One of the areas on which noise pollution research has focused within the urban soundscape is the motor vehicle, which is seen as a major contributor to increased sound levels within cities and towns. Bijsterveld’s (2004) historical analysis of noise laws highlights the increasingly negative public opinion directed towards the motor vehicle since the turn of the century. The city was increasingly seen as a space which had once held silence, a silence that needed to be regained, either through the removal of motor vehicles or through severe noise laws. Yet, over the decades, a relationship has developed between motorists and the sounds of their vehicles, an idea which is being explored by Paul Jennings. Jennings’ (2009) research focuses on the positive aspects of sounds produced by cars, from the sound of the door shutting to the sounds of a petrol engine. He explores the various ways of simulating the sounds emitted by cars; studies have revealed that drivers have developed a relationship to the sounds produced by cars, associating them with power, control, drivability, and so on. Simultaneously, further research has shown that car sounds exterior to the vehicle are an important factor in orientation, particularly for the blind, the hard of sight, and cyclists (“Fake Engine Noises,” 2008). The sound of a vehicle has become an inherent part of the urban soundscape and it is used to measure distance, speed, and time. In virtual terms, this association with a vehicle’s individual soundscape has new meaning. If, for example, the hybrid car (electric and petrol, and very quiet) becomes more prevalent in society, will we change the perceived soundscape of the urban space? For decades, we have associated the sounds of cities with vehicles and they have become a significant part of the urban soundscape, an ambience that defines the metropolis. If this sound disappears, what effect might this have on our relationship to both the city and its transport?

Our Relationship to the Modern Soundscape

Industrialisation has had a major impact on civilisation, and the association of sound with production is seen as implicit. If we introduce noise abatement laws to tackle sound levels, we ignore the relationship that has evolved between humans and the sounds of mechanisation and industrialisation. In our concern for the soundscape and its possible effects on humans, we may change our soundscape to create a perceived better sound level or quality, but ultimately we might also change the relationship people now have to cities or industrial centres. It is necessary to fully understand the relationship that groups and individuals have to the urban soundscape, specifically the sounds that are reminders of its urbanity, economy, and population as well as its activities. MacLaran (2003) argues that the urban space is increasingly becoming partitioned and that the individual increasingly tries to locate a private space in which to claim ownership. With geographic boundaries increasingly part of the urban space, defined by economics and politics and as a reaction to overpopulation, the urban space is increasingly seen as a “mirror of the societies that engender them” (MacLaran, 2003, p. 67).

Yet Thompson (1995) suggests that a changing landscape is part and parcel of the urban metropolis: people have adapted, and will adapt, to further architectural or cultural shifts within urban areas, creating new cultures and social movements that stand alongside these changes to the landscape. What is not considered by these researchers is that a city is more than its visual or geographical cues.

Thompson argues that within the media, particularly the internet, new social structures will form within virtual spaces, and these will, to a certain extent, replace the physical world in developing community and place, which is increasingly seen as crowded. Yet within mediated environments and the real world there is no real consideration of the soundscape and its importance as a social


construct in the formation of identity and society. There is a substratum of symbolic content associated with the visual space; Schafer’s research has created a set of hermeneutics from which soundscape studies may draw. It is necessary to create a dialectic on the soundscape, one which poses questions of meaning, noise, control, structure, and interpretation. This becomes more significant as urban and governmental policy moves towards controlling sound. If we operate on the basis that sound is a set of objects which can be assessed by their levels rather than their meaning, we will construct passive digital soundscapes.

While the study of sound through the social and physical sciences has advanced towards exploring sound as a subject, we are gradually moving towards an acoustic epistemology which embraces the ephemerality of sound. Sound is both sensorial and primary, a subject which needs fundamental and theoretical frameworks that can be realised through methodological research. Unfortunately, in rushing towards categorising sound and its effects, certain policies have been created which simply categorise sound as noise, without understanding the many social contexts which may explain why, “despite successful implementation of noise maps and action plans…there is little evidence of preventing and reducing environmental noise” (“Working Group Noise Eurocities,” n.d.). These policies fail to understand that sound has many social contexts and that sound is not simply a signifier of some otherness, an association with a producer, a product or side effect of technology: car sounds, factory sounds, and so on.

What this underlines is that there is a need to explore the control issue which has arisen within soundscape research. If sound is seen as a negative effect of industry and modernism, one which seems beyond the individual’s control, then we have a concept to explore in virtual soundscapes.

The positive aspect of listening in a virtual soundscape is that the sound can be controlled, be it through volume or through interactive means of changing the sound environment. In the visual world of games, certain elements are static and the controller cannot change or affect the environment. This is based on a conceptual approximation of reality (a tree is a tree and must remain so in order to simulate reality). If we introduce ambient sound, it too must approximate this idea: a gamer can close their eyes to shut out the world, but no one can close their ears. But, as in the real world, we can create or find spaces of acoustic interest to us; in a virtual environment we can turn off an engine, and perhaps a gamer should be able to turn off all engines and close down (or destroy) factories and other sounds they perceive as unwanted in their soundscape. Equally, the soundscape should simulate reality: the ambient soundscape, whatever that is, must be all surrounding, and there must be limits to the control of this sound if the intention is to approximate the physicality of space.

I do not propose that we draw attention to the soundscape within games: the more real a soundscape seems, the less a gamer will notice it. Instead, we must consider that, to increase the perception of immersion, the soundscape must reflect or approximate a real world soundscape, rather than being a “bit part player to the visual star” (Grimshaw & Schott, 2007, p. 2).

Ambient sound denotes sound that surrounds all physical space; it has been defined by some as foreground, middle ground, and background sound (Adams, 2009; Schafer, 1977). This three part description of a soundscape lays out sound, within both the virtual and the real world, as an assemblage, one which is created as a result of reverberation, dynamics, levels, and acoustics. These characteristics imply that sound can be split apart to understand its workings and then reconstructed as a virtual soundscape, that is, if we ignore how sound is socially and psychologically perceived. While technology can break sound apart so that we can hear minute elements of the whole, we physically hear sound in its entirety because we cannot shut out sounds; we do not have what
Sound is Not a Simulation

Schafer called "earlids". We comprehend that sound may be reaching us from particular distances or places, and we make choices regarding what we consider important sounds to listen to, but we cannot choose not to hear sounds within our hearing range. Equally, we inhabit and work in spaces that produce sounds which we have to make meaning from and to which we contribute; our entire lives are spent surrounded by sounds. So how do we make meaning from these sounds, and how do we measure that meaning? If we wish to simulate the experience of being within a space, whether this space is a war zone, a different planet, or the North Pole, we must understand that sound is socially and culturally constructed (Drobnick, 2004). For sound design this is paramount: if we wish to create a simulacrum of the real, we must understand the extent to which sound plays a part in our navigation, both physical and social, of spaces.

IMMERSION AND SIMULATED REALITY

It is the concept of immersion which guides sonic design within the gaming industry, immersion being seen as the "holy grail of digital game design" (Grimshaw, Lindley, & Nacke, 2008). Graphic design in gaming has evolved through several stages of realism, towards the appearance or "illusion of life" (Hodgkinson, 2009, p. 1). One outcome of this simulation of the real world within digital games can be seen in the film industry. Films have been produced which are based on games: Tomb Raider (West, 2001) and Resident Evil (Anderson, 2002). Equally, we have movies which resemble gameworlds and the gameworld concept: Final Fantasy (Sakaguchi, 2001), Aeon Flux (Kusama, 2005) and, most recently, Avatar (Cameron, 2009). The focus of digital visual game design seems aimed towards an essential realism, but why this search for the most realistic? Early games were less concerned with the realism of the space or the characters and more with the idea of game and competition, for example Space Invaders (Taito, 1978), Pac Man (Namco, 1980) and Donkey Kong (Nintendo, 1981). Has the goal shifted towards the user having a more connected experience of, or relationship with, the virtual world or gameworld? If the space is a simulation of the real world, do we engage less with the concept of a game and more with the concept of being able to relate to the space? Bull and Back (2004) would argue that, of the human senses, "vision is the most 'distancing' one" (p. 4), revealing only what is real and what is. The goal has evolved to create a sense of co-presence within film and potentially games; 3D cinema examines the possibility of the image creating a sense of surround and presence (see, again, Avatar and the new 3D TV from Panasonic). The overall assumption seems to be that the only way to create a sense of reality within a digitally created world is through the imagery, a kind of simulated panoptic vision. What seems to be forgotten within this quest for immersion is that sound is actually three-dimensional and listening is not a simulated experience.

Immersion

Sound is inherently physical and we are always immersed in it; even if we focus our listening on one sonic experience, we are still hearing the entire sonic effect of any space. This, then, is the challenge and the goal for digital game sound designers: to create spaces that accept the whole universality of the ambient space, while remaining aware of the outside world that will invariably intrude on this design. Sound design must therefore create a sense of displacement or removal from the real, while accepting that the real will equally intrude on the virtual experience.

Similarly, digital game designers must address the issue of the senses being, in their entirety, necessary to comprehend a world. Surround sound must then play a part within the design of certain game spaces, for example, first-person shooter (FPS) games. FPS games generally involve a single


player navigating through a space; if they are to feel physically immersed, the sound must seem all-surrounding. The need for surround sound or immersive experiences must also take into account the physics of sound. Connor (2004) argues that sound is both intensely corporeal, in that it physically moves us, and paradoxically immaterial, in that it cannot be grasped. He argues that sound does not simply surround us, it enters us; if loud enough or high enough it can cause pain and damage. Sound is seen as tied to emotion more so than sight, which is seen as neutral. Within social theory, sight has overwhelmed the senses; the epistemological status of sight over sound has crossed over to many disciplines, including digital game design. In his 1886 work Sociological Aesthetics (as cited in Frisby, 2002), Simmel argues that vision gave a fuller expression to the fragmented city: the eye, if "adequately trained", perceives all of a space. This merging of all visual signals suggests that we do not see in parts but in total. Simmel saw sound as intrusive to the perfection of the visible world; it was the profusion of sounds that distracted one from the beauty of the modern urban space. Tonkiss (2004) argues that within modern sociology the goal was to flatten the city, to will sound to silence, to order it. Tonkiss suggests that vision is spectacle, whereas sound is atmosphere, and she argues that sound offers us a sense of depth and perspective.

SOUND METHODOLOGY AND ANALYSIS

In order to identify what is significant about a soundscape, one must adopt a multi-method approach. One method is soundwalking, created by Hildegard Westerkamp and Murray Schafer in the 1970s. Westerkamp's use of this method involved asking participants to move through an area that was known to them and recording places of significance. These recordings would later become part of radio art works or installations.

The soundwalk technique has been adopted by different researchers for numerous projects around the world since the seventies. Most recently, Adams adopted the soundwalking method for the Positive Soundscapes Project in 2006. The purpose of the research was to develop a holistic approach to studying the soundscape. The project invited people to engage in listening to their soundscape and then identify sounds of importance. Adams adopted Schafer's terminology of keynote sounds, soundmarks, and sound signals as analytical models with which to assess the data. This method in itself does not clarify contextual or social meaning, so we must explore other qualitative approaches, such as field research and interviews, and decide which qualitative paradigm will best suit the investigation.

Traditional sociological methods should play a part in the exploration of the meaning and construction of sound. In Adams' research, when "prompted to consider spatial layout" (2009, p. 7), the respondents tried to identify the sounds that they heard in the same way they would objects. This proved problematic, as the participants had no vocabulary to describe the soundscape or its meaning. Simply focusing on identifying sounds and their meaning may limit the explanation or interpretation of cultural or social meaning. Therefore, other methods that enable the researcher to comprehend the ubiquity of the sound environment must be incorporated into the exploration of the soundscape. Interviews, both structured and open-ended, allow for the retrieval of information beyond the specifics of description. Adopting a soundwalking method alongside personal narrative interviews or life history interviews can connect meaning to hearing.

Allowing a participant a longer time to consider their sound environment, such as having them notate or record over a period of time, may reveal anamnesis experiences, where a sound evokes a memory or sensation of a past experience. This is not as subjective as it may seem: the soundtrack in films, particularly the leitmotif, is


often used to refer to a previous part of the film, causing a kind of anamnesis in the listener (Augoyard & Torgue, 2006; Chion, Gorbman, & Murch, 1994). Sounds become tied to experiences and therefore have a meaning beyond a description of sound and effect. Giving our participant a longer time to record or document these kinds of experiences will allow for further insight into what certain sounds can trigger.

Riessman (1993) argues that in the act of telling there is an inevitable gap between the experience and the telling: the sound methods allow the participant to embody themselves in the narrated space, as they are situated in the environment to which they are referring. What these combined methods may reveal lies not in how we listen to sound but in what we hear when we actively think about listening. That in itself may highlight how much active listening happens in a person's life. If it turns out that quite a lot is heard in an individual's day-to-day experiences, we must consider sound more actively in the design of digital soundscapes; conversely, if we reveal that sound plays only a minor part in a person's relationship to their environment, we may have to re-think how sound, beyond music, should be part of a digital game space.

Sequeira, Specht, Hämäläinen, and Hugdahl's (2008) research on the hearing impaired noted that clarity is essential in picking up the minutiae within the complexity of sounds, as issues can occur when ambient sound levels are too high. The comprehension of language becomes more difficult when we try to distinguish dialogue which is surrounded by high levels of background sound. Equally, Sánchez and Lumbreras' (1999) research into the design of digital gameworlds for the blind highlighted the need for 3D audio interfaces as a method by which to navigate space. They argue that users, when deprived of the sense of sight, are able to recognise spatiality and "localise specific points in 3D space, concluding that navigating space through sound can be a precise task for blind people" (1999, p. 1).

For digital game sound this does not necessarily seem an important issue: the ambient soundscape rarely includes high levels of conversational sound, and game designers rarely design for the blind. Yet in cities and urban centers, vocal sounds and directional sounds are among the dominating sound and spatial characteristics of the environment. There is interplay between vocal sounds and architecture; they will resonate at different frequencies depending on the construction of the space. Thus, understanding how people distinguish sounds, such as vocals amongst a variety of other sounds, may be relevant if a designer wishes to include this soundtrack of reality in sound design for gaming. Equally, we can make choices about what direction to go in based on acoustic as well as visual information. This could be explored through a series of listening projects whereby a focus group must listen to different sets of sounds while trying to engage in other activities. If the level of information, and not the volume, is increased over time, one could ascertain how much information we can process simultaneously while trying to complete tasks.

Contextualizing Game Space

Understanding that there are a variety of ways to experience the gameworld is a necessary condition for deciding what soundscape should or could be placed within this virtual space. What is the operant behaviour of the gamer, what is the participation level, and how much control in the gameworld does the player have? Finally, how does one contextualise oneself within the world? Grimshaw and Schott (2007) noted that there was feedback "for operant behaviours (panting breaths is a good indicator of the player's energy level)" (2007, p. 475). In examining FPS games, we see that sound is predominantly responsive and reactive, rather than passively situated in the background, and this is a key component of this type of gaming. We may hear the dying groans of another wounded warrior in FPS games, but we


do not hear the voices of hundreds of men dying or in pain, a sound that would exist in a real war. Our experiences of explosions are controlled lest we be deafened, but where is the artillery constantly humming over the horizon, the perpetual whump, whump of helicopters marking or spotting territory? Jørgensen (2008) argues that symbolic sounds are key components in player-versus-player (PvP) games, more so than background sounds. For her, game context is key: what kind of game is it, and what type of space does the avatar inhabit? Jørgensen's research focuses on the situation-oriented approach, which interprets sounds in reference to events, rather than the object-oriented perspective. She argues that the gamer must understand the rules of the system in order both to manipulate it and to understand that it "can affect individual actions" (Jørgensen, 2008, p. 2). This concept reflects Blumer's (1986) symbolic interactionist approach, where humans "define each other's actions instead of merely reacting to each other's actions" (p. 79). The other person in this case is the gameworld.

There may be several schools of thought on sound within gaming. If the sound is too real, would it terrify the gamer, distract them, annoy them, or just confuse them? Both Schafer and Smith have looked at the history of the soundscape and analysed the possible cause and effect of certain soundscapes on the human condition (Schafer, 1977; Smith, 1999, 2004). However, a new research model is needed to identify how certain sounds trigger emotive or psychological responses, particularly in relation to the soundscape featured in a large number of games: war sounds.

For a conclusive multi-method approach, we must first decide what is actually needed in a digital game space. For example, if the game has no point of free space where the player can actively listen to their environment, is a detailed soundscape necessary? This question may be answered by the questionnaire approach; a series of semi-structured interviews may reveal how people hear a space that they only traverse. This type of interview allows the interviewer a certain level of control which directs the interviewee down particular paths. Equally, it allows the interviewee to expand on themes outside the limits of the question, which can reveal unexpected information (Bryman, 2008).

The Mapped Soundscape

If we were to map the soundscape of a city, where would we start? Would we first categorise it, ordering it from loudest to quietest, or might we break it up into specific human sounds: crowds, individuals, groups of five or more, age related or gender specific? Females have a different tonality to their voices compared to men, children have higher-pitched voices than adults, and teenagers are louder than everybody. Then we refer to acoustics: how different do people sound on a pedestrian street as compared to a car-filled street or even a park? We can then examine the architecture of the space, the height of the buildings and their position, and how this might change the reverberant space.

Then we could move on to city noises, for example trams running through a city. These would sound at a very low but continuous level, marking specific territories within a city at particular times. Then there is the multitude of cars, trucks, and vans, and the occasional house alarm, fire alarms, fire trucks, police cars, and ambulances sounding off regularly throughout the day, reminding us of sickness, danger, and intrusion. There is the continuous hum of traffic that never quite stops but shifts in decibel level throughout the day and sits alongside a cacophony of beeping horns. There is the opening and shutting of thousands of doors onto streets, which might include the hiss of sliding doors, the beeping signals at pedestrian traffic lights, or a robotic voice counting down until we can cross the street. These sounds are part of the ambient soundscape of most cities, but they are still just a small part of the overall sound.

Maybe we think we have not heard the sound of a million footsteps pounding a street; it is such a


huge part of the murmur of a city that we no longer distinguish it from the background noise, yet if it stopped… we would notice the silence. There are the street hawkers and the homeless, a perpetual cry of "What do you want?", "Can you give?", "Have you got any change?", "Will you buy?". Specific sound markers in Dublin are "flowers get your flowers, get your fruit, get your veg, paper, evenin' paper, any money for a hostel". These oral announcements could also be considered part of the ambient soundtrack of the city. They would in fact be the soundmarkers for particular urban spaces. This multitude of sound still leaves out the sounds related to the outside or inside acoustics created by structures and objects such as buildings, cars, trains, or metro stations.

If one moves to what urban dwellers consider the apparently quiet soundscape of the natural world, we find a multitude of sounds connected to the society of animals, from mating cries to hunting calls, as well as the sounds of eating and foraging, flying, climbing, and running. There is the ambient sound of wind through trees, grass or wood bending, rain storms, flowing rivers, rippling water, and small streams, all of this situated in one small area. Now relate this minimal soundscape to sounds within gaming. Such a comparison might lead us to ask how we can experience a real, or significantly close to real, soundscape in a virtual world if the sound design is limited to "character or interface sounds" (Grimshaw & Schott, 2007). This description might be considered too linear and too connected to time and human activities.

The ability to comprehend a space and the sounds within it is not based entirely on the ability to hear; it is also based on the cultural and social context of both the sounds we hear and our interpretation of them. Blesser and Salter (2009) would argue that we cannot interpret and construct sonic architecture without accepting the cultural relativism of the sensory experience.

Therefore, in my description of the urban and rural soundscapes I cannot claim to be objective; my choice of sounds relates to my experience of particular spaces, and my interpretation of these sounds lies in my education, upbringing, and the socially constructed meanings that are inevitably tied to certain sounds.

We again return to what Augoyard and Torgue (2006) would consider the inherent problem of describing or analysing a soundscape: the subjectivity issue. If each group or individual perceives sounds differently, how can we generalise when constructing a soundscape? This argument could cross over to many disciplines. Within the arts it is generally understood that a work of art is best understood by the artist who made it. Yet the artist accepts that their work will be interpreted differently by every person that sees it. So what makes a great work of art? Is it tied to cultural phenomena; can a particular work be representative of a particular time? Do people understand the meaning because it resonates with what is happening at a particular moment, globally, politically, and socially?

It is not enough to dismiss understanding how the individual experiences sound because it is subjective; we must explore how people understand sound in particular places at particular times and then look for similarities between other places and people. Then perhaps we can generalise in the construction of digital sound design based on data that reveals particular generalities.

CONCLUSION

The interpretation and meaning of sound alters in relation to personal, historical, and cultural experiences, as well as the context of our auditory experience. The physicality of sound can alter our perception of the space in which we hear it, expanding or contracting the landscape and shaping our psychological and sociological response to place. If we wish to construct a digital soundscape which simulates reality and creates the sense of immersion, a study of the sociological impact of the soundscape must be undertaken. However, the


consideration of what defines reality and experience must also be explored. As mentioned earlier in the text, the simulated soundscapes of war games are not based on the real soundscape of a war zone, but on a sound designer's definition of war sounds. What definition of reality are we measuring this soundscape of virtual worlds against, and how real do we want our virtual environments to be? Most of the environments we experience within games are spaces which we may never experience in reality. Our experience of certain soundscapes may be understood in relation to other media representations: television, the Internet, and cinema. The digital game soundscape then becomes a construct of definitions rather than a simulated reality.

If we are trying to simulate a sense of reality in gaming, we must consider how real we wish to go. Grimshaw (2007) argues that it is only through the audification of gaming that we actually simulate the idea of immersion. This implies that sound in itself provides a sense of reality, whether or not the sound is based on reality. So what is it about the physical aspects of sound that creates a sense of being elsewhere? It is not enough to suggest that because sound is physical it creates a sense of immersion. Sound must be understood beyond the physical; a language must be developed as a result of empirical research which explores the sociological phenomena of sound.

Thibaud (1998) suggests that we must create a "praxiology" of sound from the natural soundscape before we construct artificial soundscapes. He also argues that, beyond just meaning and interpretation, sound can and does affect our choices; we pick up "information displayed by the environment in order to control actions (such as locomotion or manipulation) […] thus, the environmental properties and the actor/perceiver activities cannot be disassociated: they shape each other" (Thibaud, 1998, p. 2).

Sound can be both active and passive, and this will affect our response to it. Driving a car, for example, might be considered a passive production of sound, in that we have no choice in the sound the engine makes, but beeping a horn is active sound making. Thus sound production has an implicit message, the interpretation of which might be subjective. Whether it is perceived as positive or negative can depend on the intention. It may also affect behaviour: do we choose to move out of the way of a vehicle, or allow it to stimulate anger or other emotive responses?

This active sound does not simply reference the acoustics of space or a description of noise; it carries a message, a description of a situation that has social and cultural context. If, as Thibaud (1998) suggests, sound is not a "mere epiphenomenon or secondary consequence of activity" (p. 4), then we must consider that all sound has meaning; it is how to deconstruct that meaning that will allow for a clearer understanding of the soundscape. With this understanding we can construct digital soundscapes which will challenge the perception that the image is what gives the illusion of the real.

REFERENCES

Adams, M. (2009). Hearing the city: Reflections on soundwalking. Qualitative Research, 10, 6–9.

Adams, M., Cox, T., Moore, G., Croxford, B., Refaee, M., & Sharples, S. (2006). Sustainable soundscapes: Noise policy and the urban experience. Urban Studies, 43(13), 2385. doi:10.1080/00420980600972504

Anderson, P. W. S. (Director). (2002). Resident evil [Motion picture]. Munich, Germany: Constantin Film.

Augoyard, J., & Torgue, H. (2006). Sonic experience: A guide to everyday sounds (illustrated ed.). Montreal, Canada: McGill-Queen's University Press.


Bijsterveld, K. (2004). The diabolical symphony of the mechanical age: Technology and symbolism of sound in European and North American noise abatement campaigns, 1900-40. In Back, L., & Bull, M. (Eds.), The auditory culture reader (1st ed., pp. 165–190). Oxford, UK: Berg.

Bijsterveld, K. (2008). Mechanical sound: Technology, culture, and public problems of noise in the twentieth century. Cambridge, MA: MIT Press.

Blesser, B., & Salter, L. (2009). Spaces speak, are you listening?: Experiencing aural architecture. Cambridge, MA: MIT Press.

Blumer, H. (1986). Symbolic interactionism. Berkeley: University of California Press.

Bryman, A. (2008). Social research methods (3rd ed.). Oxford, UK: Oxford University Press.

Bull, M. (2000). Sounding out the city: Personal stereos and the management of everyday life. Oxford, UK: Berg.

Bull, M., & Back, L. (2004). The auditory culture reader (1st ed.). Oxford, UK: Berg.

Cabrera Paz, J., & Schwartz, T. B. M. (2009). Techno-cultural convergence: Wanting to say everything, wanting to watch everything. Popular Communication: The International Journal of Media and Culture, 7(3), 130.

Cameron, J. (Director). (2009). Avatar [Motion picture]. Los Angeles, CA: 20th Century Fox, Lightstorm Entertainment, Dune Entertainment, Ingenious Film Partners.

Chion, M., Gorbman, C., & Murch, W. (1994). Audio-vision. New York: Columbia University Press.

Cohen, L. (2005). The history of noise [on the 100th anniversary of its birth]. IEEE Signal Processing Magazine, 22(6), 20–45. doi:10.1109/MSP.2005.1550188

Connor, S. (2004). Edison's teeth: Touching hearing. In V. Erlmann (Ed.), Hearing cultures: Essays on sound, listening, and modernity (English ed., pp. 153-172). Oxford, UK: Berg.

de Certeau, M. D. (1988). The practice of everyday life. Berkeley: University of California Press.

Donkey kong [Computer game]. (1981). Kyoto, Japan: Nintendo.

Drobnick, J. (2004). Aural cultures. Toronto: YYZ Books.

Epstein, M. (2009). Growing an interdisciplinary hybrid: The case of acoustic ecology. History of Intellectual Culture, 3(1). Retrieved December 29, 2009, from http://www.ucalgary.ca/hic/issues/vol3/9.

Fake engine noises added to hybrid and electric cars to improve safety. (2008). Retrieved January 10, 2010, from http://www.switched.com/2008/06/05/fake-engine-noises-added-to-hybrid-and-electric-cars-to-improve/.

Feld, S. (2004). A rainforest acoustemology. In Bull, M., & Back, L. (Eds.), The auditory culture reader (1st ed., pp. 223–240). Oxford, UK: Berg.

Frisby, D. (2002). Cityscapes of modernity: Critical explorations. Cambridge, UK: Polity.

Goffman, E. (1959). The presentation of self in everyday life (1st ed.). New York: Anchor.

Grimshaw, M. (2007). Sound and immersion in the first-person shooter. In Proceedings of the 11th International Conference on Computer Games: AI, Animation, Mobile, Educational and Serious Games. Published to CD-ROM.

Grimshaw, M., Lindley, C. A., & Nacke, L. (2008). Sound and immersion in the first-person shooter: Mixed measurement of the player's sonic experience. In Proceedings of the Audio Mostly Conference (pp. 21-26).


Grimshaw, M., & Schott, G. (2007). Situating gaming as a sonic experience: The acoustic ecology of first-person shooters. In Proceedings of Situated Play (pp. 24-28).

Hodgkinson, G. (2009). The seduction of realism. In Proceedings of ACM SIGGRAPH ASIA 2009 Educators Program (pp. 1-4). Yokohama, Japan: The Association for Computing Machinery.

Jennings, P. (2009). WMG: Professor Paul Jennings. Retrieved December 30, 2009, from http://www2.warwick.ac.uk/fac/sci/wmg/about/people/profiles/paj/.

Jørgensen, K. (2008). Audio and gameplay: An analysis of PvP battlegrounds in World of Warcraft. Game Studies. Retrieved January 10, 2010, from http://gamestudies.org/0802/articles/jorgensen.

Kusama, K. (Director). (2005). Aeon flux [Motion picture]. Hollywood, CA: Paramount.

Lefebvre, H. (2004). Rhythmanalysis: Space, time and everyday life. London: Continuum.

Lumbreras, M., & Sánchez, J. (1999). Interactive 3D sound hyperstories for blind children. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems: The CHI is the Limit (pp. 318-325). Pittsburgh, PA: ACM.

MacLaran, A. (2003). Making space: Property development and urban planning. London: Hodder Arnold.

Manning, P. (1992). Erving Goffman and modern sociology. Stanford, CA: Stanford University Press.

Pac man [Computer game]. (1980). Tokyo, Japan: Namco.

Riessman, C. K. (1993). Narrative analysis (1st ed.). Los Angeles: Sage.

Russolo, L. (1913). The art of noises. Retrieved December 30, 2009, from http://120years.net/machines/futurist/art_of_noise.html.

Sakaguchi, H. (Director). (2001). Final fantasy [Motion picture]. Los Angeles: Columbia.

Schafer, R. M. (1977). The tuning of the world. Toronto: McClelland and Stewart.

Sequeira, S. D. S., Specht, K., Hämäläinen, H., & Hugdahl, K. (2008). The effects of different intensity levels of background noise on dichotic listening to consonant-vowel syllables. Scandinavian Journal of Psychology, 49(4), 305–310. doi:10.1111/j.1467-9450.2008.00664.x

Simmel, G. (1979). The metropolis and mental life. Retrieved February 1, 2010, from http://www.blackwellpublishing.com/content/BPL_Images/Content_store/Sample_chapter/0631225137/Bridge.pdf.

Smith, B. R. (1999). The acoustic world of early modern England: Attending to the o-factor (1st ed.). Chicago: University of Chicago Press.

Smith, B. R. (2004). Tuning into London c.1600. In Bull, M., & Back, L. (Eds.), The auditory culture reader (1st ed., pp. 127–136). Oxford, UK: Berg.

Space invaders [Computer game]. (1978). Tokyo, Japan: Taito.

Thibaud, J. (1998). The acoustic embodiment of social practice: Towards a praxiology of sound environment. In Karlsson, H. (Ed.), Proceedings of Stockholm, Hey Listen! (pp. 17–22). Stockholm: The Royal Swedish Academy of Music.

Thompson, J. B. (1995). The media and modernity. Stanford, CA: Stanford University Press.

Tonkiss, F. (2004). Aural postcards: Sound, memory and the city. In Bull, M., & Back, L. (Eds.), The auditory culture reader (1st ed., pp. 303–310). Oxford, UK: Berg.

West, S. (Director). (2001). Lara Croft: Tomb raider [Motion picture]. Hollywood, CA: Paramount.


Working Group Noise Eurocities. (n.d.). Retrieved January 10, 2010, from http://workinggroupnoise.web-log.nl/.

KEY TERMS AND DEFINITIONS

Holistic: In order to understand the whole of a system, one must look at the parts within it that make it up. Within sociology, Durkheim developed a concept of holism which is in opposition to methodological individualism.

Immersion: To be completely surrounded by sound.

Mediatization: Sonia Livingstone's definition of mediatization is, for me, the most accurate because it refers "to the meta process by which everyday practices and social relations are increasingly shaped by mediating technologies and media organisations" (http://www.icahdq.org/conferences/presaddress.asp, par. 3).

Schizophonic: Murray Schafer describes the term schizophonic as the split between an original sound and its electroacoustic reproduction in a soundscape. I am using it as a metaphor for a split between two types of listening spaces: if one is listening to music while traversing a real space, the attention is split in comprehension between the real-world space and the virtual soundscape.

Social Construction of Space: Social constructivists examine the ways in which individuals and groups participate in the creation of their perceived social reality. In this context, I am focusing on how a society can change its perceived space through sound, either by how its members listen to or produce sound in a space.

Sonic Architecture: The study of the acoustic effect of objects, such as buildings, interior and exterior, on space. Equally, sonic architecture explores how people can construct sonic structures or challenge the sounds of places by creating their own sonic space.

Soundscape: Refers to both natural and man-made sounds that immerse an environment.

Soundwalking: A soundwalk is a journey where the objective is to discover an environment by listening to it.

Symbolic Interactionist: The study of micro-scale social interaction. It is seen as a process that informs and forms human conduct, the premise being that human beings act on and upon things based on the meanings these things have, things being defined as physical objects, such as chairs, trees, and phones, and as human beings, such as mothers, shop clerks, and so forth.


Chapter 4
Diegetic Music:
New Interactive Experiences
Axel Berndt
Otto-von-Guericke University, Germany

ABSTRACT
Music which is performed within the scene is called diegetic. In practical and theoretical literature on
music in audio-visual media, diegetic music is usually treated as a side issue, a sound effect-like occur-
rence, just a prop of the soundscape that sounds like music. A detailed consideration reveals a lot more.
The aim of this chapter is to uncover the abundance of diegetic occurrences of music, the variety of
functions they fulfill, and issues of their implementation. The role of diegetic music gains importance in
interactive media as the medium allows a nonlinearity and controllability as never before. As a diegetic
manifestation, music can be experienced in a way that was previously unthinkable except, perhaps, for
musicians.

INTRODUCTION

Dealing with music in audio-visual media leads the researcher traditionally to its non-diegetic occurrence first, that is, offstage music. Its interplay with the visuals and its special perceptual circumstances have been largely discovered and analyzed by practitioners, musicologists, and psychologists. Its role is mostly an accompanying, annotating one that emotionalises elements of the plot or scene, associates contextual information, and thus enhances understanding (Wingstedt, 2008). Comparatively little attention has been given to diegetic music. As its source is part of the scene's interior (for example, a performing musician, a music box, a car radio), it is audible from within the scene. Hence, it can exert an influence on the plot and acting and is frequently even an inherent part of the scenic action. In interactive media it can even become an object the user might be able to directly interact with.

This chapter addresses the practical and aesthetic issues of diegetic music.

DOI: 10.4018/978-1-61692-828-5.ch004

Copyright © 2011, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.
Diegetic Music

Figure 1. A systematic overview of all forms of diegetic music

It clarifies differences to non-diegetic music regarding inner-musical properties, its functional use, and its staging and implementation. Particular attention is paid to interactivity aspects that hold a variety of new opportunities and challenges in store, especially in the context of modern computer games technology. This directly results in concrete design guidelines. These show that adequate staging of diegetic music requires more than its playback. The problem area comprises the simulation of room acoustics and sound radiation, the generation of expressive performances of a given compositional material, even its creation and variation in real-time, amongst others.

The complexity and breadth of these issues might discourage developers. The effort seems too expensive for a commercial product and is barely invested. Game development companies usually have no resources available to conduct research in either of these fields. But in most cases, this is not even necessary. Previous and recent research in audio signal processing and computer music created many tools, algorithms, and systems. Even if not developed for the particular circumstances of diegetic music, they approach or even solve similar problems. It is a further aim of this chapter to uncover this fallow potential. This may inspire developers to make new user experiences possible, beyond the limitations of an excluded passive listener.

The key to this is interactivity. However, different types of games allow different modes of interaction. Different approaches to diegetic music follow, accordingly. To lay a solid conceptual basis, this chapter also introduces a more differentiating typology of diegetic music and its subspecies, which is outlined in Figure 1. The respective sections expand on the different types. Before that, a brief historical background and a clarification of the terminology used are provided.

Where Does It Come From?

Early examples of diegetic music can be found in classic theatre and opera works, for instance, the ball music in the finale of W.A. Mozart's Don Giovanni (KV 527, premiered in 1787) which is performed onstage, not from the orchestra pit. Placing musicians onstage next to the actors may hamper dialog comprehensibility. To prevent such conflicts, diegetic music was often used as a foreground element that replaces speech. It wasn't until radio plays and sound films offered more flexible mixing possibilities that diegetic music grew to be more relevant for background soundscape design (for example, bar music, street musicians). Such background features could now be set on a significantly lower sound level to facilitate focusing the audience's attention on the spoken text, comparable to the well-known Cocktail Party Effect (Arons, 1992).
A further form of occurrence evolved in the context of music-based computer games, having its origins in the aesthetics of music video clips: music that is visualized on screen. In this scenario, the virtual scene is literally built up through music. Musical features define two- or three-dimensional objects, their positioning, and set event qualities (for example, bass drum beats may induce big obstacles on a racing track or timbral changes cause transitions of the color scheme). The visualizations are usually of an aesthetically stylized type. Thus, the scenes are barely (photo-)realistic but rather surrealistic. Typical representatives of music-based computer games are Audiosurf: Ride Your Music (Fitterer, 2008), Vib-Ribbon (NanaOn-Sha, 1999), and Amplitude (Harmonix, 2003).

However, music does not have to be completely precomposed for the interactive context. Games like Rez (Sega, 2001) demonstrate that player interaction can serve as a trigger of musical events. Playing the game creates its music. One could argue that this is rather a very reactive non-diegetic score. However, the direct and very conspicuous correlation of interaction and musical event and the entire absence of any further sound effects drag the music out of the "off" onto the stage. The surrealistic visuals emphasize this effect as they decrease the aesthetic distance to musical structure. In this virtual world, music is the sound effect and is, of course, audible from within the scene, hence diegetic. The conceptual distance to virtual instruments is not far, as is shown by the game Electroplankton (Iwai, 2005) and the lively discussion on whether it can still be called a game (Herber, 2006).

In the context of Jørgensen's (2011) terminology discussion, a more precise clarification of the use of diegetic and non-diegetic in this chapter is necessary. The diegesis, mostly seen as a fictional story world, is here used in its more general sense as a virtual or fictional world detached from the conventional story component. It is rather the domain the user interacts with either directly (god-like) or through an avatar which itself is part of the diegesis. The diegesis does not necessarily have to simulate real world circumstances. The later discussion on music video games1 will show that it does not have to be visual either, even if visually presented. Again, the diegesis in interactive media is the ultimate interaction domain, not any interposed interface layer. Keyboard, mouse, gamepad, and graphical user interface elements like health indicator and action buttons are extra-diegetic. They serve only to convert user input into diegetic actions or to depict certain diegetic information.

The terms diegetic and non-diegetic in their narrow sense describe the source domain of a described entity: diegesis or extra-diegesis. Diegetic sound comes from a source within the diegesis. Many theorists add further meaning to the terms regarding, for instance, the addressee. A soldier in a strategy game may ask the player directly where to go. As the player may also adapt his playing behaviour to non-diegetic information (a musical cue warns of upcoming danger), these can be influential for the diegesis. Such domain-crossing effects are unthinkable in linear, that is non-interactive, media. The strict inside-outside separation of the traditional terminology is, of course, incapable of capturing these situations and it may never have been meant to do so. Galloway (2006) deals with this subject in an exemplary way. This chapter does not intend to participate in this discussion.

For the sake of clarity, the narrow sense of the terminology is applied in this chapter. This means that the terms only refer to the source domain, not the range of influence. Diegetic is what the mechanics of the diegesis (world simulation, in a sense) create or output. If the superior game mechanics produce further output (for example, interface sounds or the musical score) it is declared non- or extra-diegetic. This is also closer to the principles of the technical implementation of computer games and may make the following explanations more beneficial.
ONSTAGE PERFORMED MUSIC

The primal manifestation of diegetic music is music that is performed within the scene, either as a foreground or background artifact. As such, it usually appears in its autonomous form as a self-contained and very often pre-existent piece. The most distinctive difference between diegetic music and its non-diegetic counterpart is that the latter cannot be considered apart from its visual and narrative context.

Likewise, the perceptual attitude differs substantially. Foreground diegetic music is perceived very consciously, comparable to listening to a piece of music on the radio or a concert performance. Even background diegetic music that serves a similar purpose as non-diegetic mood music is comprehended differently. While mood music describes an inner condition (What does a location feel like?), background diegetic music contributes to the external description (What does the location sound like?) and can be mood-influential only on a general informal level (They are playing sad music here!).

Functions

The role of background diegetic music is often regarded as less intrinsic. It is just a prop, an element of the soundscape, which gives more authenticity to the scenario on stage. As such it serves well to stage discos, bars, cafés, street settings with musicians, casinos, and so forth (see Collins, Tessler, Harrigan, Dixon, & Fugelsang, 2011, for an extensive description of sound and music in gambling environments). However, it does not have to remain neutral, even as a background element. It represents the state of the environment. Imagine a situation where the street musicians suddenly stop playing. This is more than an abrupt change of the background atmosphere; it is a signal indicating that something happened that stopped them playing, that something has fundamentally changed.

Conversely, it can also be that dramatic events happen, maybe the protagonist is attacked, but the musical background does not react. Instead, it may continue playing jaunty melodies. Such an indifferent relation between foreground and background evokes some kind of incongruence. This emphasizes the dramaturgical meaning of the event or action. Moreover, it is sometimes understood as a philosophical statement indicating an indifferent attitude of the environment. Whatever happens there, it means nothing to the rest of the world: "life goes on" (Lissa, 1965, p. 166).

Even though the source of diegetic music is part of the scene it does not have to be visible. The sound of a gramophone suffices to indicate its presence. In this way diegetic music, just like diegetic sound effects, gathers in non-visible elements of the scene and blurs the picture frame, which is particularly interesting for fixed-camera shots. It associates a world outside the window and beyond that door which never opens. Its role as a carrier of such associations takes shape the more music comes to the fore, because the linkage to its visual or narrative correlative is very direct and conspicuous (The guy who always hums that melody!).

Furthermore, when diegetic music is performed by actors, and thereby linked to them, it can become a means of emotional expression revealing their innermost condition. The actor can whistle a bright melody, hum it absentmindedly while doing something else, or articulate it with sighing inflection. Trained musicians can even change the mode (major, minor), vary the melody, or improvize on it.

The more diegetic music becomes a central element of the plot, the more its staging gains in importance. Did the singer act well to the music? Does the fingering of the piano player align with the music? It can become a regulator for motion and acting. The most obvious example is probably a dancing couple. Very prominent is also the final assassination scene in Alfred Hitchcock's (1956) The Man Who Knew Too Much.
During a concert performance of Arthur Benjamin's cantata Storm Clouds, the assassin tries to cover his noise by shooting in synch with a loud climactic cymbal crash. Even screaming Doris Day is perfectly in time with the meter of the orchestra.

Design Principles

However, when a musical piece is entirely performed in the foreground, it creates a problem: it slows the narrative tempo down. This is because change processes take more time in autonomous music than on the visual layer, in films as well as in games. In contrast to non-diegetic music, where changes are provoked and justified by the visual and narrative context, diegetic music has to stand on its own. Its musical structure has to be self-contained; hence, change processes need to be more elaborate. Such compositional aspects of non-diegetic film music and its differences to autonomous music have already been discussed by Adorno and Eisler (1947).

For an adequate implementation of diegetic music, further issues have to be addressed. In contrast to non-diegetic music, it is subject to the acoustic conditions of the diegesis. A big church hall, a small bedroom, or an outdoor scene in the woods: each environment has its own acoustics and resonances. Ever heard disco music from outside the building? The walls usually filter medium and high frequencies; the bass is left. This changes completely when entering the dance floor. Diegetic music, as well as any other sound effect, cannot, and must not, sound like a perfectly recorded and mixed studio production. A solo flute in a large symphony orchestra is always audible on CD but gets drowned in a real life performance. According to the underlying sound design there might, nonetheless, be a distinction between foreground and background mixing that does not have to be purely realistic. Further discussion of this can be found, for instance, in Ekman (2009).

The sound positioning in the stereo or surround panorama also differs from that of studio recordings. Diegetic music should come from where it is performed. The human listener is able to localize real world sound sources with deviations down to two degrees (Fastl & Zwicker, 2007). Depending on the speaker setting, this can be significantly worse for virtual environments. But even stereo speakers provide rough directional information. Localization gets better again when the source is moving or the players are able to change their relative position and orientation to the source. In either case the source should not "lose" its sound or leave it behind when it moves. It would, as a consequence, lose presence and believability. Positioning the music at the performer's location in relation to the listener is as essential as it is for every further sound effect.

But up to now only a very primitive kind of localization has been discussed: setting the sound source at the right place. In interactive environments, the player might be able to come very close to the performer(s). If it is just a little clock radio, a single sound source may suffice. But imagine a group of musicians, a whole orchestra, the player being able to walk between them, listening to each instrument at close range. Not to forget that the performer, let us say a trumpet player, would sound very different from the front than from behind, at least in reality. Each instrument has its individual sound radiation angles. These are distinctively pronounced for each frequency band. The radiation of high-frequency partials differs from that of medium and low frequencies, a fact that, for instance, sound engineers have to consider for microphone placement (Meyer, 2009).

How far do developers and designers need to go? How much realism is necessary? The answer is given by the overall realism that the developers aim for. Non-realistic two-dimensional environments (cartoon style, for example) are comparably tolerant of auditory inconsistencies. Even visually (photo-)realistic environments do not expect realistic soundscapes at all. Hollywood cinematic aesthetics, for instance, focus on the affect, not on realism.
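The "disco heard through a wall" effect described above amounts to low-pass filtering: medium and high frequencies are damped while the bass passes. A minimal sketch of such occlusion filtering, assuming a plain one-pole filter and an arbitrarily chosen cutoff (a real audio engine would use steeper, material-dependent filters), might look like this:

```python
import math

def one_pole_lowpass(samples, cutoff_hz, sample_rate=44100.0):
    """Crude 'music behind a wall' effect: a one-pole low-pass filter.
    Uses the standard RC smoothing coefficient; content above cutoff_hz
    is damped while the bass passes almost untouched."""
    a = math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    out, y = [], 0.0
    for x in samples:
        y = (1.0 - a) * x + a * y  # smooth the output towards the input
        out.append(y)
    return out

def rms(samples):
    """Root mean square, a rough loudness measure for comparison."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))
```

Feeding an 80 Hz and a 4 kHz sine through this filter with a cutoff of, say, 250 Hz leaves the first nearly untouched while strongly attenuating the second, which is the "heard from outside" impression; opening the club door then corresponds to raising the cutoff.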
Ekman (2009) describes further situations where the human subjective auditive perception differs greatly from the actual physical situation. Possible causes can be the listener's attention, stress, auditory acuity, body sounds and resonances, hallucination, and so forth.

All this indicates that diegetic music has to be handled on the same layer as sound effects and definitely not on the "traditional" non-diegetic music layer. In the gaming scenario, it falls under the responsibility of the audio engine that renders the scene's soundscape. Audio Application Programming Interfaces (APIs) currently in use are, for instance, OpenAL (Loki & Creative, 2009), DirectSound as part of DirectX (Microsoft, 2009), FMOD Ex (Firelight, 2009), and AM3D (AM3D, 2009). An approach to sound rendering based on graphics hardware is described by Röber, Kaminski, and Masuch (2007) and Röber (2008). A further audio API that is especially designed for the needs of mobile devices is PAudioDSP by Stockmann (2007).

It is not enough, though, to play the music back with the right acoustics, panorama, and filtering effects. Along the lines of "more real than reality", it is often a good idea to reinforce the live impression by including a certain degree of defectiveness. The wow and flutter of a record player may cause pitch bending effects. There can be interference with the radio reception resulting in crackling and static noise. Not to mention the irksome things that happen to every musician, even to professionals, at live performances: fluctuation of intonation, asynchrony in ensemble play, and wrong notes, to name just a few. Those things hardly ever happen on CD. In the recording studio, musicians can repeat a piece again and again until one perfect version comes out or enough material is recorded to cut a perfect version during postproduction. But at live performances all this happens and cannot be corrected afterwards. Including these flaws in the performance of diegetic music makes for a more authentic live impression.

Non-Linearity and Interactivity

However, in the gaming context in particular, this authenticity gets lost when the player listens to the same piece more than once. A typical situation in a game: the player re-enters a scene several times and the diegetic music always starts with the same piece, as if the performers paused and waited until the player came back. This can be experienced, for example, in the adventure game Gabriel Knight: Sins of the Fathers (Sierra, 1993) when walking around in Jackson Square. Such a déjà vu effect robs the virtual world of credibility. The performers, even if not audible, must continue playing their music, and when the player returns he must have missed parts of it.

Another very common situation where the player rehears a piece of music occurs when getting stuck in a scene for a certain time. The performers, however, play one and the same piece over and over again. In some games they start again when they reach the end; in others, the music loops seamlessly. Both are problematic because it becomes evident that there is no more music. The end of the world is reached in some way and there is nothing beyond. A possible solution could be to extend the corpus of available pieces and go through it either successively or randomly in the music box manner. But the pieces can still recur multiple times. In these cases it is important that the performances are not exactly identical. A radio transmission does not always crackle at the same time within the piece, and musicians try to give a better performance with each attempt. They focus on the mistakes they made last time and make new ones instead. This means that the game has to generate ever new performances. Examples of systems that can generate expressive performances are:

• the rule-based KTH Director Musices by Friberg, Bresin, and Sundberg (2006)
• the machine learning-based YQX by Flossmann, Grachten, and Widmer (2009)
• the mathematical music theory-based approach by Mazzola, Göller, and Müller (2002).
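On a much smaller scale than the systems cited above, the idea of never-identical performances can be sketched by randomizing timing and dynamics slightly on each playback. The deviation ranges below are invented for illustration; the cited systems derive such deviations from rules or learned models rather than plain randomness:

```python
import random

def humanize(notes, timing_jitter=0.02, velocity_jitter=8, rng=random):
    """Return a slightly different rendition of 'notes' on every call.
    'notes' is a list of (onset_sec, midi_pitch, velocity) triples.
    Onsets are jittered by up to +/- timing_jitter seconds, velocities
    by up to +/- velocity_jitter (clamped to the MIDI range), so no two
    playbacks are exactly identical."""
    rendition = []
    for onset, pitch, velocity in notes:
        onset += rng.uniform(-timing_jitter, timing_jitter)
        velocity = max(1, min(127, velocity + rng.randint(-velocity_jitter, velocity_jitter)))
        rendition.append((onset, pitch, velocity))
    return rendition

melody = [(0.0, 60, 90), (0.5, 62, 85), (1.0, 64, 95), (1.5, 67, 100)]
take_one = humanize(melody)  # two calls yield two different "performances"
take_two = humanize(melody)
```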
Even the expressivity of the performance itself can be varied. This can derive from the scene context (the musician is happy, bored, or sad) or be affected by random deviations (just do it differently next time). Systems to adapt performative expression were developed by Livingstone (2008) and Berndt and Theisel (2008).

But modifying performative expression is not the only way to introduce diversity into music. A further idea is to exploit the potential of sequential order, that is, to rearrange the sequence of musical segments. The idea derives from the classic musical dice games which were originally invented by Kirnberger (1767) and became popular through Mozart (1787). The concept can be extended by so-called One Shot segments that can be interposed occasionally amongst the regular sequence of musical segments, as proposed within several research prototypes by Tobler (2004) and Berndt, Hartmann, Röber, and Masuch (2006). These make the musical progress appear less fixed. Musical polyphony offers further potential for variance: building block music2 allows various part settings, as not all of them have to play at once. One and the same composition can sound very different by changing the instrumentation (Adler, 2002; Sevsay, 2005) or even the melodic material and counterpoint (Aav, 2005; Berndt et al., 2006; Berndt, 2008). Thus, each iteration seems to be a rearrangement or a variation instead of an exact repetition.

Generative techniques can expand the musical variance even more. Imagine a virtual jazz band that improvises all the time. New music is constantly created without any repetition. This can be based on a given musical material, a melody for instance, that is varied. The GenJam system, a genetic approach (Miranda & Biles, 2007), is a well known representative. MeloNet and JazzNet are two systems that create melody ornamentations through trained neural networks (Hörnel, 2000; Hörnel & Menzel, 1999). Based on a graph representation of possible alternative chord progressions (a Hidden Markov Model derivative called Cadence Graph), Stenzel (2005) describes an approach to variations on the harmonic level.

Beyond varying musical material it is also possible to generate entirely new material. Hiller and Isaacson (1959) already attempted this through the application of random number generators and Markov chains. This is still common practice today, for example, for melody generation (Klinger & Rudolph, 2006). Next, harmonization and counterpoint can be created for that melody to achieve a full polyphonic setting (Ebcioglu, 1992; Schottstaedt, 1989; Verbiest, Cornelis, & Saeys, 2009). Further approaches to music composition are described by Löthe (2003), Taube (2004), and Pozzati (2009). Papadopoulos and Wiggins (1999) and Pachet and Roy (2001) give more detailed surveys of algorithmic music generation techniques.

The nonlinear aspects of diegetic music as discussed up to now omit one fact that comes along with interactive media. Music, as part of the diegesis, not only influences it but can also be influenced by it, especially by the player. Which player is not tempted to click on the performer and see what happens? In the simplest case a radio is just switched on and off or a song is selected on the music box. Interaction with virtual musicians, by contrast, is more complicated. Two modes can be distinguished: the destructive and the constructive mode.

Destructive interaction interferes with the musician's performance. The player may talk to him, jostle him, distract his attention from playing the right notes and from synchronisation with the ensemble. This may even force the musician to stop playing. Destructive interaction affects the musical quality. A simple way to introduce wrong notes is to change the pitch of some notes by a certain interval. Of course, not all of them have to be changed. The number of changes depends on the degree of disruption.
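This wrong-note mechanism can be sketched in a few lines. The mapping from disruption to error probability and interval size below is an invented illustration, not taken from any of the cited systems; it simply alters more notes, and by larger intervals, the more disrupted the musician is:

```python
import random

def disrupt(notes, disruption, rng=random):
    """Introduce wrong notes into a performance.
    'notes' is a list of MIDI pitches; 'disruption' lies in [0, 1].
    Each note is altered with probability 'disruption'. Light disruption
    only permits neighbour errors (1-2 semitones); heavy disruption also
    allows larger intervals (an assumed mapping for this sketch)."""
    max_error = 2 if disruption < 0.5 else 5
    out = []
    for pitch in notes:
        if rng.random() < disruption:
            pitch += rng.choice([-1, 1]) * rng.randint(1, max_error)
        out.append(pitch)
    return out
```

Rhythmic imprecision and ensemble asynchrony could be handled analogously by jittering onset times per note and adding a constant per-player delay.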
Likewise for the size of the pitch interval: for example, the diatonic neighbor (half or whole step) for small errors, and bigger intervals the more the musician is distracted.

In the same way, rhythmic precision and synchrony can be manipulated. Making musicians asynchronous simply means adding a plain delay that puts some of them ahead and others behind in the ensemble play. The rhythmic precision, by contrast, has to do with the timing of a musician. Does he play properly in time or is he "stumbling", in other words, unrhythmical? Such timing aspects were described, investigated, and implemented by Friberg et al. (2006) and Berndt and Hähnel (2009), amongst others. As ensemble play is also a form of communication between musicians, one inaccurate player affects the whole ensemble, beginning with the direct neighbor. They will, of course, try to come together again, which can be emulated by homeostatic (self-balancing) systems. Such self-regulating processes were, for instance, described by Eldridge (2002) and used for serial music composition.

Constructive interaction, by contrast, influences musical structure. Imagine a jazz band cheered by the audience, encouraged to try more adventurous improvisations. Imagine a street musician playing some depressive music; when given a coin he becomes cheerful, his music likewise. Such effects can rarely be found in virtual gaming worlds up to now. The adventure game Monkey Island 3: The Curse of Monkey Island (LucasArts, 1997) features one of the most famous and visionary exceptions. In one scene the player's pirate crew sings the song "A Pirate I Was Meant To Be." The player chooses the keywords with which the next verse has to rhyme. The task is to select the one that nobody finds a rhyme for, to bring them back to work. The sequential order of verses and interludes is adapted according to the multiple-choice decisions that the player makes. A systematic overview of this and further approaches to nonlinear music is given by Berndt (2009).

So much effort, such a large and complex arsenal for mostly subsidiary background music? Do we really require all this? The answer is "no". This section proposed a collection of tools of which one or another can be useful for rounding off the coherence of the staging and strengthening the believability of the music performance. Moreover, these tools establish the necessary foundations for music to be more than a background prop, to come to the fore as an interactive element of the scene. This opens up the unique opportunity for the player to experience music and its performance in a completely different way, namely close up.

VISUALIZED MUSIC

Beyond visualizing only the performance of music, that is, showing performing musicians or sound sources as discussed so far, there is a further possibility: the visualization of music itself. In fact, it is not music as a whole that is visualized but rather a selection of structural features of a musical composition (rhythmic patterns, melodic contour, and so on). Moreover, the visual scene need not be completely generated from musical information. Music video games, just like music video clips, often feature a collage-like combination of realistic and aesthetically stylized visuals. The latter is the focus of this section.

The Guitar Hero series (Harmonix, 2006-2009) works with such collage-like combinations. While a concert performance is shown in the background, the foreground illustrates the guitar riffs which the player has to perform. PaRappa the Rapper (NanaOn-Sha, 1996) also shows the performers on screen and an unobtrusive line of symbols on top that indicates the type of interaction (which keys to press) and the timing to keep up with the music. In Audiosurf, by contrast, the whole scene is built up through music: the routing of the obstacle course, the positioning of obstacles and items, the color scheme, background objects, and visual effects, even the driving speed.
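As a toy illustration of this kind of mapping, per-beat audio features might drive track parameters. The feature names and mapping rules below are invented for the sketch and are not Audiosurf's actual (undisclosed) analysis:

```python
def beat_to_track_segment(loudness, brightness):
    """Map per-beat audio features (both normalized to [0, 1]) to track
    parameters, in the spirit of games that build the scene from music.
    Assumed rules for this sketch:
      - louder beats -> steeper downhill slope (i.e., faster driving)
      - brighter timbre -> warmer segment color
      - very loud beats -> an obstacle on the track"""
    slope = -30.0 * loudness                  # slope in degrees, downhill
    color = (int(255 * brightness), 64, int(255 * (1.0 - brightness)))
    obstacle = loudness > 0.8
    return {"slope": slope, "color": color, "obstacle": obstacle}

# One segment per analyzed beat of the piece:
track = [beat_to_track_segment(l, b)
         for l, b in [(0.9, 0.2), (0.4, 0.7), (0.85, 0.9)]]
```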
Diegetic Music

not only sets visual circumstances but also event qualities. Some pieces induce more difficult tracks than others.

The Musical Diegesis

The visual instances of musical features are aesthetically looser in video clips. In the gaming scenario they have to convey enough information to put the game mechanics across to the player. Hence, they have to be aesthetically more consistent and presented in a well-structured way. Often a deviation of the pitch-time notation, known from conventional music scores (pitch is aligned vertically, time horizontally), forms the conceptual basis of the illustrations. Upcoming events scroll from right to left. The vertical alignment of an event indicates a qualitative value—not necessarily pitch. The orientation can, of course, vary. Shultz (2008) distinguishes three modes:

• Reading Mode: corresponds to score notation as previously described and implemented, for example, in Donkey Konga (Namco, 2003)
• Falling Mode: the time-axis is vertically oriented, the pitch/quality-axis horizontally, upcoming events “drop down” (Dance Dance Revolution by Konami (1998))
• Driving Mode: just like falling mode but with the time-axis in z-direction (depth), upcoming events approach from ahead (Guitar Hero).

The illustrations do not have to be musically accurate. They are often simplified for the sake of better playability. In Guitar Hero, for instance, no exact pitch is represented, only melodic contour. Even this is scaled down to the narrow ambit that the game controller supplies. It is, in fact, not necessary to translate note events into some kind of stylization. Structural entities other than pitch values can be indicated as well. In Amplitude, it is the polyphony of multiple tracks (rhythm, vocals, bass, for example) arranged as multiple lanes. Color coding is often used to represent sound timbre (Audiosurf). Other visualization techniques are based on the actual waveform of the recording or on its Fourier transformation (commonly used in media player plug-ins and also in games). For completeness, it should be mentioned that it is, of course, not enough to create only a static scene or a still shot. Since music is a temporal art, its visualisation has to develop over time, too.

In music video games, as well as in video clips, music constitutes the central value of the medium. It is not subject to functional dependencies on the visual layer. Conversely, the visual layer is contingent upon music, as was already described. Although the visual scene typically does not show or even include any sound sources in a traditional sense (like those described in the previous section), music has to be declared a diegetic entity, even more than the visuals. These are only a translation of an assortment of musical aspects into visual metaphors. They illustrate, comment, concretize, and channel associations which the music may evoke (Kungel, 2004). They simplify conventional visually marked interaction techniques. But the interaction takes place in the music domain. The visuals do not and cannot grasp the musical diegesis as a whole.3 In this scenario the diegesis is literally constituted by music. It is the domain of musical possibilities.

In this (its own) world, music is subject to no restrictions. The visual layer has to follow. The imaginary world that derives from this is equally subject to no logical or rational restrictions. The routings of the obstacle courses in Audiosurf run freely in a weightless space: even the background graphics and effects have nothing in common with real sky or space depictions. Practical restrictions, such as those discussed above for onstage performed music (like radio reception interference, wrong notes and so forth), likewise do not exist. Hence, the performative quality can be at the highest stage, that is, studio level.
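The three scrolling modes described above amount to different assignments of the musical time axis and the quality (pitch or lane) axis to screen axes. A minimal sketch of this mapping, in Python; the function name, coordinate convention, and scroll-speed constant are illustrative assumptions, not taken from any shipped game:

```python
# Sketch: placing note events on screen in the three scrolling modes
# distinguished by Shultz (2008). All names and constants are illustrative.

SCROLL_SPEED = 200.0  # screen units travelled per second of musical time

def event_position(mode, event_time, quality, now):
    """Position of a note event relative to its hit marker at the origin.

    event_time: when the event must be hit (seconds)
    quality:    lane index or scaled pitch value, e.g. 0.0-1.0
    now:        current playback time (seconds)
    """
    d = (event_time - now) * SCROLL_SPEED  # distance still to travel
    if mode == "reading":    # time runs right-to-left, quality is vertical
        return (d, quality, 0.0)
    elif mode == "falling":  # time runs top-to-bottom, quality is horizontal
        return (quality, d, 0.0)
    elif mode == "driving":  # time runs along the depth (z) axis
        return (quality, 0.0, d)
    raise ValueError(f"unknown mode: {mode}")

# An event due in one second sits 200 units from the hit line:
print(event_position("driving", 11.0, 0.5, 10.0))  # (0.5, 0.0, 200.0)
```

Note that `quality` need not be pitch at all: as in Guitar Hero, it may be a contour-derived lane index, which is exactly the simplification for playability discussed above.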


Interactivity in the Musical Domain

However, the possibilities to explore these worlds interactively are still severely limited. Often, statically predetermined pieces of music dictate the tempo and rhythm of some skill exercises without any response to whether the player does well or badly. This compares to conventional on-rails shooter games that show a pre-rendered video sequence which cannot be affected by the player, whose only task is to shoot each appearing target. A particular piece of music is, here, essentially nothing else but one particular tracking shot through a much bigger world.

Music does not have to be so fixed and the player should not be merely required to keep up with it. The player can be involved in its creation: “Music videogames would benefit from an increasing level of player involvement in the music” (Williams, 2006, p.7). The diegesis must not be what a prefabricated piece dictates but should rather be considered as a domain of musical possibilities. The piece that is actually played reflects the reactions of the diegesis to player interaction. An approach to this begins with playing only those note events (or, more generally, musical events) that the player actually hits, not those he was supposed to hit. In Rez, for instance, although it is visually an on-rails shooter, only a basic ostinato pattern (mainly percussion rhythms) is predefined and the bulk of musical activity is triggered by the player. Thus, each run produces a different musical output. Williams (2006) goes so far as to state that “it is a pleasure not just to watch, but also to listen to someone who knows how to play Rez really well, and in this respect Rez comes far closer to realising the potential of a music videogame” (p.7).

In Rez, the stream of targets spans the domain of musical possibilities. The player’s freedom may still be restricted to a certain extent but this offers a clue for the developers to keep some control over the musical dramaturgy. This marks the upper boundary of what is possible with precomposed and preproduced material. Further interactivity requires more musical flexibility. Therefore, two different paths can be taken:

• interaction by musically primitive events
• interaction with high-level structures and design principles.

Primitive events in music are single tones, drum beats, and even formally consistent groups of such primitives that do not constitute a musical figure in themselves (for instance, tone clusters and arpeggios). In some cases even motivic figures occur as primitive events: they are usually relatively short (or fast) and barely variable. The game mechanics provide the interface to trigger them and set event properties like pitch, loudness, timbre, and cluster density, for example. Ultimately, this leads to a close proximity to interactive virtual instrument concepts. It can be a virtual replica of a piano, violin, or any instrument that exists in reality. Because of the radically different interaction mode (mouse and keyboard) these usually fall behind their real-world prototypes regarding playability. To overcome this limitation, several controllers were developed that adapt the form and handling of real instruments, like the guitar controller of Guitar Hero, the Donkey Konga bongos, the turntable controller of DJ Hero (FreeStyleGames, 2009), and, not to forget, the big palette of MIDI instruments (keyboards, violins, flutes, drum pads and so forth). Roads (1996) gives an overview of such professional musical input devices.

But real instruments do not necessarily have to be adapted. The technical possibilities allow far more interaction metaphors, as is demonstrated by the gesture-based Theremin (1924), the sensor-equipped Drum Pants (Hansen & Jensenius, 2006), and the hand and head tracking-based Tone Wall/Harmonic Field (Stockmann, Berndt, & Röber, 2008). Even in the absence of such specialized controllers, keyboard, mouse, and gamepad allow expressive musical input too. The challenge, therefore, is to find appropriate metaphors like


aiming and shooting targets, painting gestural curves, or nudging objects of different types in a two- or three-dimensional scene.

Although the player triggers each event manually, he does not have to be the only one playing. An accompaniment can be running autonomously in the background, like that of a pianist who goes along with a singer or a rock band that sets the stage for a guitar solo. Often repetitive structures (ostinato, vamp, riff) are therefore applied. Such endlessly looping patterns can become tedious over a longer period. Variation techniques like those explained in the previous section can introduce more diversity. Alternatively, non-repetitive material can be applied. Precomposed music is of limited length, hence, it should be sufficiently long. Generated music, by contrast, is subject to no such restrictions. However, non-repetitive accompaniment comes with a further problem: it lacks musical predictability and thereby hampers a player’s smooth performance. This can be avoided. Repetitive schemes can change after a certain number of iterations (for example, play riff A four times, B eight times, and C four times). The changes can be prepared in such a way that the player is warned. A well-known example is the drum roll crescendo that erupts in a climactic crash. Furthermore, tonally close chord relations can relax strict harmonic repetition without losing the predictability of appropriate pitches.

The player can freely express himself against this background. But should he really be allowed to do anything? If yes, should he also be allowed to perform badly and interfere with the music? In order not to discourage a proportion of the customers, lower difficulty settings can be offered. The freedom of interaction can be restricted to only those possibilities that yield pleasant, satisfactory results. There can be a context-sensitive component in the event generation, just like a driving aid system that prevents some basic mistakes. Pitch values can automatically be aligned to the current diatonic scale in order to harmonize. A time delay can be used to fit each event perfectly to the underlying meter and rhythmic structure. Advanced difficulty settings can be like driving without such safety systems. This is most interesting for trained players who want to experiment with a bigger range of possibilities.

Interaction with high-level structures is less direct. The characteristic feature of this approach is the autonomy of the music. It plays back by itself and reacts to user behaviour. While the previously described musical instruments are rather perceived as tool-like objects, in this approach the impression of a musical diegesis, a virtual world filled with entities that dwell there and react and interact with the player, is much stronger. User interaction affects the arrangement of the musical material or the design principles which define the way the material is generated. In Amplitude (in standard gameplay mode) it is the arrangement. The songs are divided into multiple parallel tracks. A track represents a conceptual aspect of the song, like bass, vocals, synth, or percussion, and each track can be activated for a certain period by passing a skill test. Even this test derives from melodic and rhythmic properties of the material to be activated. The goal is to activate them all.

The music in Amplitude is precomposed and, thus, relatively invariant. Each run leads ultimately to the same destination music. Other approaches generate the musical material just in time while it is performed. User interaction affects the parameterization of the generation process, which results in different output. For this constellation of autonomous generation and interaction, Chapel (2003) coined the term Active Musical Instrument, an instrument for real-time performance and composition that actively interacts with the user: “The system actively proposes musical material in real-time, while the user’s actions [...] influence this ongoing musical output rather than have the task to initiate each sound” (p.50). Chapel states that an Active Instrument can be constructed around any generative algorithm.

The first such instrument was developed by Chadabe (1985). While music is created autonomously, the user controls expressive parameters like accentuation, tempo, and timbre. In Chapel’s case the music generation is based on fractal functions which can be edited by the user to create ever new melodic and polyphonic structures. Eldridge (2002) applies self-regulating homeostatic networks. Perturbation of the network causes musical activity—a possible way to interact with the system. The musical toy Electroplankton for Nintendo DS offers several game modes (called plankton types) that build up a musical domain with complex structures, for example, a melodic progression graph (plankton type Luminaria) and a melodic interpreter of graphical curves (plankton type Tracy). These can be freely created and modified by the user.

A highly interactive approach that incorporates precomposed material is the Morph Table presented by Brown, Wooller, & Kate (2007). Music consists of several tracks. Each track is represented by a physical cube that can be placed on the tabletop: this activates its playback. For each track, there are two different prototype riffs represented by the horizontal extremes of the tabletop (left and right border). Depending on the relative position of the cube in-between, the two riffs are recombined by the music morphing techniques which Wooller & Brown (2005) developed. The vertical positioning of the cube controls other effects. The tabletop interface further allows collaborative interaction with multiple users.

This anticipates a promising future perspective for music video games. Music making has always been a collaborative activity that incorporates a social component, encourages community awareness, interaction between musicians, and mutual inspiration. What shall be the role of music games in this context? Do they set the stage for the performers or function as performers themselves? In contrast to conventional media players, which are only capable of playing back prefabricated pieces, music video games will offer a lot more. They will be a platform for the user to experiment with and on which to realize his ideas. And they will be—they already are—an easy introduction to music for everyone, even non-musicians, who playfully learn musical principles to good and lasting effect.

INTERACTING WITH MUSIC: A CONCLUSION

Music as a diegetic occurrence in interactive media cannot be considered apart from interactivity. But music being the object of interaction is a challenging idea. It is worth taking up this challenge. The growing popularity of music video games over the last few years encourages us to explore the boundaries of interactivity further and to surmount them.

Music does not have to be static. It can vary in its expressivity regarding the way it is performed. Users can interact with virtual performers. These do not have to play fixed compositions. Let them ornament their melodies, vary or even improvise on them. Why not just generate new music in real-time while the game is played? Let the players exert an influence on this. Or enable them to playfully arrange or create their own music. Few of these possibilities are applied in practice up to now.

Music is a living art that should be more than simply reproduced; it should be experienced anew each time. It is a temporal art and its transience is an inherent component. This chapter has shown how to raise music in interactive media above the status of its mere reproduction. As a domain of interactivity, it invites the users to explore, create, and to have new musical experiences.

REFERENCES

AM3D (2009). AM3D [Computer software]. AM3D A/S (Developer). Aalborg, Denmark.


Aav, S. (2005). Adaptive music system for DirectSound. Unpublished master’s thesis. University of Linköping, Sweden.

Adler, S. (2002). The study of orchestration (3rd ed.). New York: Norton & Company.

Adorno, T. W., & Eisler, H. (1947). Composing for the films. New York: Oxford University Press.

Arons, B. (1992, July). A review of the cocktail party effect. Journal of the American Voice I/O Society, 12, 35-50.

Berndt, A. (2008). Liturgie für Bläser (2nd ed.). Halberstadt, Germany: Musikverlag Bruno Uetz.

Berndt, A. (2009). Musical nonlinearity in interactive narrative environments. In G. Scavone, V. Verfaille, & A. da Silva (Eds.), Proceedings of the International Computer Music Conference (ICMC) (pp. 355-358). Montreal, Canada: International Computer Music Association, McGill University.

Berndt, A., & Hähnel, T. (2009). Expressive musical timing. In Proceedings of Audio Mostly 2009: 4th Conference on Interaction with Sound (pp. 9-16). Glasgow, Scotland: Glasgow Caledonian University, Interactive Institute/Sonic Studio Piteå.

Berndt, A., Hartmann, K., Röber, N., & Masuch, M. (2006). Composition and arrangement techniques for music in interactive immersive environments. In Proceedings of Audio Mostly 2006: A Conference on Sound in Games (pp. 53-59). Piteå, Sweden: Interactive Institute/Sonic Studio Piteå.

Berndt, A., & Theisel, H. (2008). Adaptive musical expression from automatic real-time orchestration and performance. In Spierling, U., & Szilas, N. (Eds.), Interactive Digital Storytelling (ICIDS) 2008 (pp. 132-143). Erfurt, Germany: Springer. doi:10.1007/978-3-540-89454-4_20

Brown, A. R., Wooller, R. W., & Kate, T. (2007). The morphing table: A collaborative interface for musical interaction. In A. Riddel & A. Thorogood (Eds.), Proceedings of the Australasian Computer Music Conference (pp. 34-39). Canberra, Australia.

Chadabe, J. (1985). Interactive music composition and performance system. U.S. Patent No. 4,526,078. Washington, DC: U.S. Patent and Trademark Office.

Chapel, R. H. (2003). Real-time algorithmic music systems from fractals and chaotic functions: Towards an active musical instrument. Unpublished doctoral dissertation. University Pompeu Fabra, Barcelona, Spain.

Collins, K., Tessler, H., Harrigan, K., Dixon, M. J., & Fugelsang, J. (2011). Sound in electronic gambling machines: A review of the literature and its relevance to game audio. In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey, PA: IGI Global.

Ebcioglu, K. (1992). An expert system for harmonizing chorales in the style of J. S. Bach. In Balaban, M., Ebcioglu, K., & Laske, O. (Eds.), Understanding music with AI: Perspectives on music cognition (pp. 294-334). Cambridge, MA: MIT Press.

Ekman, I. (2009). Modelling the emotional listener: Making psychological processes audible. In Proceedings of Audio Mostly 2009: 4th Conference on Interaction with Sound (pp. 33-40). Glasgow, Scotland: Glasgow Caledonian University, Interactive Institute/Sonic Studio Piteå.

Eldridge, A. C. (2002). Adaptive systems music: Musical structures from algorithmic process. In C. Soddu (Ed.), Proceedings of the 6th Generative Art Conference. Milan, Italy: Politecnico di Milano University.

Fastl, H., & Zwicker, E. (2007). Psychoacoustics: Facts and models (3rd ed., Vol. 22). Berlin, Heidelberg: Springer.

Firelight (2009). FMOD Ex v4.28 [Computer software]. Victoria, Australia: Firelight Technologies.

Fitterer, D. (2008). Audiosurf: Ride Your Music [Computer game]. Washington, DC: Valve.

Flossmann, S., Grachten, M., & Widmer, G. (2009). Expressive performance rendering: Introducing performance context. In Proceedings of the 6th Sound and Music Computing Conference (SMC). Porto, Portugal: Universidade do Porto.

FreeStyleGames (2009). DJ Hero [Computer game]. FreeStyleGames (Developer), Activision.

Friberg, A., Bresin, R., & Sundberg, J. (2006). Overview of the KTH Rule System for musical performance. Advances in Cognitive Psychology, Special Issue on Music Performance, 2(2/3), 145-161.

Galloway, A. R. (2006). Gaming: Essays on algorithmic culture. Electronic Mediations (Vol. 18). Minneapolis: University of Minnesota Press.

Hansen, S. H., & Jensenius, A. R. (2006). The Drum Pants. In Proceedings of Audio Mostly 2006: A Conference on Sound in Games (pp. 60-63). Piteå, Sweden: Interactive Institute/Sonic Studio.

Harmonix (2003). Amplitude [Computer game]. Harmonix (Developer), Sony.

Harmonix (2006-2009). Guitar Hero series [Computer games]. Harmonix, Neversoft, Vicarious Visions, Budcat Creations, RedOctane (Developers), Activision.

Herber, N. (2006). The Composition-Instrument: Musical emergence and interaction. In Proceedings of Audio Mostly 2006: A Conference on Sound in Games (pp. 53-59). Piteå, Sweden: Interactive Institute/Sonic Studio Piteå.

Hiller, L. A., & Isaacson, L. M. (1959). Experimental music: Composing with an electronic computer. New York: McGraw Hill.

Hitchcock, A. (1956). The Man Who Knew Too Much [Motion picture]. Hollywood, CA: Paramount.

Hörnel, D. (2000). Lernen musikalischer Strukturen und Stile mit neuronalen Netzen. Karlsruhe, Germany: Shaker.

Hörnel, D., & Menzel, W. (1999). Learning musical structure and style with neural networks. Computer Music Journal, 22(4), 44-62. doi:10.2307/3680893

Iwai, T. (2005). Electroplankton [Computer game]. Indies Zero (Developer), Nintendo.

Jørgensen, K. (2011). Time for new terminology? Diegetic and non-diegetic sounds in computer games revisited. In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey, PA: IGI Global.

Kirnberger, J. P. (1767). Der allezeit fertige Polonaisen und Menuetten Komponist. Berlin, Germany: G.L. Winter.

Klinger, R., & Rudolph, G. (2006). Evolutionary composition of music with learned melody evaluation. In N. Mastorakis & A. Cecchi (Eds.), Proceedings of the 5th WSEAS International Conference on Computational Intelligence, Man-Machine Systems and Cybernetics (pp. 234-239). Venice, Italy: World Scientific and Engineering Academy and Society.

Konami (1998). Dance Dance Revolution [Computer game]. Konami, Disney, Keen, Nintendo.

Kungel, R. (2004). Filmmusik für Filmemacher—Die richtige Musik zum besseren Film. Reil, Germany: Mediabook-Verlag.

Lissa, Z. (1965). Ästhetik der Filmmusik. Leipzig, Germany: Henschel.

Livingstone, S. R. (2008). Changing musical emotion through score and performance with a compositional rule system. Unpublished doctoral dissertation. The University of Queensland, Brisbane, Australia.

Loki & Creative (2009). OpenAL v1.1 [Computer software]. Loki Software, Creative Technology.

Löthe, M. (2003). Ein wissensbasiertes Verfahren zur Komposition von frühklassischen Menuetten. Unpublished doctoral dissertation. University of Stuttgart, Germany.

LucasArts (1997). Monkey Island 3: The Curse of Monkey Island [Computer game]. LucasArts.

Manz, J., & Winter, J. (Eds.). (1976). Baukastensätze zu Weisen des Evangelischen Kirchengesangbuches. Berlin: Evangelische Verlagsanstalt.

Mazzola, G., Göller, S., & Müller, S. (2002). The topos of music: Geometric logic of concepts, theory, and performance. Zurich: Birkhäuser Verlag.

Meyer, J. (2009). Acoustics and the performance of music: Manual for acousticians, audio engineers, musicians, architects and musical instrument makers (5th ed.). New York: Springer.

Microsoft (2009). DirectX 11 [Computer software]. Microsoft Corporation.

Miranda, E. R., & Biles, J. A. (Eds.). (2007). Evolutionary computer music (1st ed.). USA: Springer. doi:10.1007/978-1-84628-600-1

Mozart, W. A. (1787). Musikalisches Würfelspiel: Anleitung so viel Walzer oder Schleifer mit zwei Würfeln zu componieren ohne musikalisch zu seyn noch von der Composition etwas zu verstehen. Köchel Catalog of Mozart’s Work KV1 Appendix 294d or KV6 516f.

Namco (2003). Donkey Konga [Computer game]. Namco (Developer), Nintendo.

NanaOn-Sha (1996). PaRappa the Rapper [Computer game]. NanaOn-Sha (Developer), Sony.

NanaOn-Sha (1999). Vib-Ribbon [Computer game]. NanaOn-Sha (Developer), Sony.

Pachet, F., & Roy, P. (2001). Musical harmonization with constraints: A survey. Constraints Journal.

Papadopoulos, G., & Wiggins, G. (1999). AI methods for algorithmic composition: A survey, a critical view and future prospects. In AISB Symposium on Musical Creativity. Edinburgh, Scotland.

Pozzati, G. (2009). Infinite suite: Computers and musical form. In G. Scavone, V. Verfaille, & A. da Silva (Eds.), Proceedings of the International Computer Music Conference (ICMC) (pp. 319-322). Montreal, Canada: International Computer Music Association, McGill University.

Roads, C. (1996). The computer music tutorial. Cambridge, MA: MIT Press.

Röber, N. (2008). Interacting with sound: Explorations beyond the frontiers of 3D virtual auditory environments. Munich, Germany: Dr. Hut.

Röber, N., Kaminski, U., & Masuch, M. (2007). Ray acoustics using computer graphics technology. In Proceedings of the 10th International Conference on Digital Audio Effects (DAFx-07) (pp. 117-124). Bordeaux, France: LaBRI University Bordeaux.

Schottstaedt, W. (1989). Automatic counterpoint. In Mathews, M., & Pierce, J. (Eds.), Current directions in computer music research. Cambridge, MA: MIT Press.

Sega (2001). Rez [Computer game]. Sega.

Sevsay, E. (2005). Handbuch der Instrumentationspraxis (1st ed.). Kassel, Germany: Bärenreiter.

Shultz, P. (2008). Music theory in music games. In Collins, K. (Ed.), From Pac-Man to pop music: Interactive audio in games and new media (pp. 177-188). Hampshire, UK: Ashgate.

Sierra (1993). Gabriel Knight: Sins of the Fathers [Computer game]. Sierra Entertainment.

Stenzel, M. (2005). Automatische Arrangiertechniken für affektive Sound-Engines von Computerspielen. Unpublished diploma thesis. Otto-von-Guericke University, Department of Simulation and Graphics, Magdeburg, Germany.

Stockmann, L. (2007). Designing an audio API for mobile platforms. Internship report. Magdeburg, Germany: Otto-von-Guericke University.

Stockmann, L., Berndt, A., & Röber, N. (2008). A musical instrument based on interactive sonification techniques. In Proceedings of Audio Mostly 2008: 3rd Conference on Interaction with Sound (pp. 72-79). Piteå, Sweden: Interactive Institute/Sonic Studio Piteå.

Taube, H. K. (2004). Notes from the metalevel: Introduction to algorithmic music composition. London, UK: Taylor & Francis.

Theremin, L. S. (1924). Method of and apparatus for the generation of sounds. U.S. Patent No. 73,529. Washington, DC: U.S. Patent and Trademark Office.

Tobler, H. (2004). CRML—Implementierung eines adaptiven Audiosystems. Unpublished master’s thesis. Fachhochschule Hagenberg, Hagenberg, Austria.

Verbiest, N., Cornelis, C., & Saeys, Y. (2009). Valued constraint satisfaction problems applied to functional harmony. In Proceedings of IFSA World Congress EUSFLAT Conference (pp. 925-930). Lisbon, Portugal: International Fuzzy Systems Association, European Society for Fuzzy Logic and Technology.

Williams, L. (2006). Music videogames: The inception, progression and future of the music videogame. In Proceedings of Audio Mostly 2006: A Conference on Sound in Games (pp. 5-8). Piteå, Sweden: Interactive Institute/Sonic Studio Piteå.

Wingstedt, J. (2008). Making music mean: On functions of, and knowledge about, narrative music in multimedia. Unpublished doctoral dissertation. Luleå University of Technology, Sweden.

Wooller, R. W., & Brown, A. R. (2005). Investigating morphing algorithms for generative music. In Proceedings of Third Iteration: Third International Conference on Generative Systems in the Electronic Arts. Melbourne, Australia.

KEY TERMS AND DEFINITIONS

Diegesis: Traditionally, it is a fictional story world. In computer games, or more generally in interactive media, it is the domain the user ultimately interacts with.

Diegetic Music: Music that is performed within the diegesis.

Extra-Diegetic: The terms extra-diegetic and non-diegetic refer to elements outside of the diegesis. Extra-diegetic is commonly used for elements of the next upper layer, the narrator’s world or the game engine, for instance. Non-diegetic, by contrast, refers to all upper layers up to the real world.

Music Video Games: Computer games with a strong focus on music-related interaction metaphors. For playability, musical aspects are often, if not usually, transformed into visual representatives.

Musical Diegesis: In music video games, the user interacts with musical data. These constitute the domain of musical possibilities, the musical diegesis.

Nonlinear Music: The musical progress incorporates interactive and/or non-deterministic influences.

ENDNOTES

1. Although this book prefers the generic term computer games, here, I use the term music video game both to emphasize the musical interaction and because it is the more commonly used term for this genre.
2. Building block music: translated from the German term “Baukastenmusik” (Manz & Winter, 1976).
3. Likewise, non-diegetic film music does not and cannot mediate the complete visual diegesis.
monly used term for this genre.

Section 2
Frameworks & Models

Chapter 5
Time for New Terminology?
Diegetic and Non-Diegetic Sounds
in Computer Games Revisited
Kristine Jørgensen
University of Bergen, Norway

ABSTRACT
This chapter is a critical discussion of the use of the concepts diegetic and non-diegetic in connection with
computer game sound. These terms are problematic because they neither take into account the functional
aspects of sound nor indicate how gameworlds differ from traditional fictional worlds. The aims of the
chapter are to re-evaluate earlier attempts at adapting this terminology to games and to present an al-
ternative model of conceptualizing the spatial properties of game sound with respect to the gameworld.

INTRODUCTION

Two concepts from narrative theory that often appear in discussions about game sound are diegetic and non-diegetic (Collins, 2007, 2008; Ekman, 2005; Grimshaw, 2008; Grimshaw & Schott, 2007; Jørgensen, 2007b, 2008; Stockburger, 2003; Whalen, 2004). The terms are used in film theory to separate elements that can be said to be part of the depicted fictional world from elements that the fictional characters cannot see or hear and which should be considered non-existent in the fictional world (Bordwell, 1986; Bordwell & Thompson, 1997). According to this approach, dialogue between two characters is seen as diegetic, while background score music is seen as non-diegetic. In connection with game sound, a likely adaptation of these concepts would describe the response “More work?” from an orc peon unit in the real-time strategy game Warcraft 3 (Blizzard, 2002) as an example of a diegetic sound since it is spoken by a character within the gameworld. Music that signals approaching enemies in the role-playing game Dragon Age: Origins (Bioware, 2009) would, according to this view, be an example of non-diegetic sound since the music is not being played from a source within the game universe. However, when analyzing the examples more closely, we see that using these terms in computer games is confusing and at best inaccurate. As a

DOI: 10.4018/978-1-61692-828-5.ch005

Copyright © 2011, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.

response to a player command, the “More work?” question has an ambiguous status in relation to the gameworld: If we ask ourselves who the peon is talking to, it appears to address the player, who is not represented as a character in the gameworld, but manages the troops and base from the outside of the gameworld. The warning music heard in the role-playing game is also ambiguous. Although there is nothing to suggest that the music is being played by an orchestra in the wilderness, there is no doubt that the music influences the players’ tactical decisions and therefore has direct consequences for the player-characters’ actions and the progression of the game. The confusion arises because game sound has a double status in which it provides usability information to the player at the same time as it has been stylized to fit the depicted fictional world. It works as support for gameplay, while also providing a sense of presence in the gameworld (Jørgensen, 2007a, 2009; Nacke & Grimshaw, 2011). From this point of view, diegetic and non-diegetic sounds tend to blend systematically in games, thereby creating additional levels of communication compared to the traditional diegetic versus non-diegetic divide.

Although sound may be categorized and discussed in several ways, the diegetic versus non-diegetic divide may be especially attractive for describing modern computer games since they are set in universes separate from ours that, on the surface, remind one of the fictional universes of film and literature. This makes the terminology seem like an illustrative approach for describing auditory properties with respect to the represented universe in games. The concepts enable us to separate what is perceived as internal to that universe from what is perceived as external to it. However, as this chapter will argue, the concepts of diegetic and non-diegetic were developed with traditional media in mind, and are therefore confusing and misleading when attempts are made to transfer them uncritically to computer games. First, the participatory role of the player is not accounted for in this theory, which means that the functional aspects of game sound disappear when applying diegetic and non-diegetic to game sound. Also, gameworlds cannot be appropriately described by these terms since they are designed for purposes different from those of traditional fictional worlds. Since gameworlds invite users to enter their domains as players, they are qualitatively different from other fictional worlds, and this makes the traditional diegetic versus non-diegetic divide problematic when applied to computer games. While the aim of the chapter is to evaluate the use of the two concepts in relation to game sound, the chapter will also be a revision of my earlier theory of transdiegetic sounds (Jørgensen, 2008b). I will discuss my own and other attempts at adapting the concepts to game sound, based on the original meaning and uses of diegesis, and present an alternative way of conceptualizing the phenomena in relation to game sound. The main argument of this chapter rests on two principles. One is that the participatory nature of games allows the players a dual position where they are located on the outside of the gameworld but with the power to reach into it. The other is that gameworlds differ from traditional fictional worlds in fundamental ways as they are worlds intended for play. This difference requires game sound to be evaluated on terms other than those used for analyzing film sound.

A short reader guide is appropriate. The chapter is organized according to principles of clarity where an overview of earlier theory creates the basis of the argument and, in order to get the most out of the chapter, it should be read from beginning to end rather than being dipped into. I will introduce the chapter with a discussion of the origin and application of diegetic and non-diegetic in traditional media before going on to present other attempts at categorizing game sound (Collins, 2007, 2008; Huiberts & van Tol, 2008; Stockburger, 2003; Whalen, 2004). Next, the chapter will review different attempts to adapt diegetic terminology to games (Galloway, 2006) and game sound (Ekman, 2005; Grimshaw,
2008; Jørgensen, 2007b). I will then discuss how gameworlds separate themselves from traditional fictional worlds and how this has consequences for the way we interact with them (Aarseth, 2005, 2008; Klevjer, 2007), and consequently for the application of diegetic and non-diegetic. The last section of the chapter will present an alternative model for analyzing game sound in terms of spatial integration. Throughout the chapter, I will also use data from research interviews with empirical players where this is appropriate. The data concern player interpretations of so-called transdiegetic features in computer games, and support the idea that gameworlds work on premises other than traditional fictional worlds.

Although this chapter focuses on the auditory aspect of games in particular, it should be noted that the discussion about the relevance of diegetic and non-diegetic features does not concern auditory features alone. However, sound is particularly interesting for several reasons. Since sound is neither tangible nor visible, and has a temporal quality, it has the ability to remain non-intrusive even when it breaks the borders of the gameworld. The ability to integrate seamlessly with the gameworld gives it the opportunity to challenge the relationship between diegetic and non-diegetic in a way that visual information cannot.

BACKGROUND

Diegetic vs. Non-Diegetic Sound

The term diegetic originally stems from The Republic, where Plato separates between two narrative modes that he calls diegesis and mimesis. Diegesis, or pure narrative, is when the poet “himself is a speaker and does not even attempt to suggest to us that anyone but himself is speaking”; while mimesis, or imitation, is when the poet “delivers a speech as if he were someone else” (Plato in Genette, 1983, p. 162). According to film scholar David Bordwell (1986), the term diegesis was revived in the 1950s to describe the “recounted story” of a film, and today it has become the accepted terminology for “the fictional world of the story” (p. 16). According to this terminology, diegetic sound is represented as “sound which has a source in the story world”, while non-diegetic sound is “represented as coming from a source outside the story world” (Bordwell & Thompson, 1997, p. 330). Game scholars who use diegetic and non-diegetic when describing game sound tend to take their point of departure from this newer, film-theory understanding of diegesis, and extend the meaning of the “fictional world of the story” to the universe of the game. As mentioned, this is confusing since it implies that the gameworld is a storyworld, and is misleading because game sound works for different purposes compared to film sound. These points will be in focus in the following discussion that critically evaluates the use of diegetic and non-diegetic in relation to computer game sound.

Of course, the debate about the relationship between diegetic and non-diegetic features is not unique to game studies. Film theory, too, sees the limited ability of this theory to describe sound precisely. While David Bordwell and Kristin Thompson (1997) define non-diegetic sound as “represented as coming from a source outside the story world” (p. 330), Edward Branigan separates non-diegetic features into extra-fictional and non-diegetic. He argues that when a piece of background film music accompanies the credits of a film, it should be interpreted as extra-fictional, but when it accompanies a series of shots from a nightclub, and is thus presented as typical of an evening at that location, it should be interpreted as non-diegetic (1992, p. 96). In this view, Branigan claims that non-diegetic sound is related to the diegesis, but does not correspond to the fictional characters’ experience of it (1992, p. 96), while extra-fictional sound exists outside the diegesis and is required to talk about the diegesis as fictional (1992, p. 88). Although not accounting for the participatory nature of games, Branigan’s view
of non-diegetic is more sympathetic towards how, for instance, score music works in games, since there is some kind of bond between the sound and what happens within the diegesis.

When discussing film music, Michel Chion also points out that the non-diegetic category is complicated. A central reason, in his view, is that so-called diegetic music, like non-diegetic music, may have a commentary function meant to help the interpretation of what is going on in the film. Chion’s own example is Siodmak’s Abschied, in which the protagonist’s emotional states are punctuated by the music of his pianist neighbor, thereby questioning the non-diegetic state of the music. Because of such ambiguous cases, Chion argues that the reference to diegetic and non-diegetic music is misleading, and uses pit music and screen music instead. While pit music “accompanies the image from a non-diegetic position, outside the time and space of action”, screen music refers to “music arising from a source located directly or indirectly in the space of time” (Chion, 1994, p. 80). From this approach, screen music could also be used to describe the computer game version of leitmotifs (Gorbman, 1997, pp. 3, 26-29), in which music with an apparent non-diegetic source warns the player about dangers.

The relationship between diegetic and non-diegetic is not a simple one in literary theory either. One example of this is provided by Gerard Genette, who points out that the diegetic and non-diegetic levels often blend together in the act of narration. He uses the term metalepsis to describe any transition from one diegetic level to another. While the classics used the term to refer to “any intrusion by the non-diegetic narrator or narratee into the diegetic universe” (Genette, 1983, pp. 234-235), Genette extends the term and calls all kinds of narrative transitions of elements between distinct levels of the literary diegesis “narrative metalepsis”. In literature, these transitions range from simple rhetorical figures, where the narrator addresses the reader, to extremes in which a man is killed by a character in the novel he is reading. However, being closely connected to the act of narration—how a story is told—metalepsis only serves as a comparative illustration for the trans-boundary movement that happens in computer games.

These methods of categorization show that the relationship between diegetic and non-diegetic sound is not without debate in film theory and literary theory but, while the concepts work as a point of departure and as a common ground for understanding the narrative levels of traditional fiction, they create confusion in connection with computer game sound because of the participatory nature of games and gameworlds (Collins, 2008, p. 180; Jørgensen, 2006, p. 48, 2007b, p. 106). In films and computer games equally, sound cues the media user’s understanding of the environment, direction, spatiality, temporality, objects, and events. However, film sound is limited to informing the audience as to how to interpret what is going on in an inaccessible world, while game sound provides information relevant for understanding how to interact with the game system and behave in the virtual environment that is the gameworld (Jørgensen, 2008). This means that game sound has a double status in which it provides usability information to the player at the same time as it has been stylized to fit the depicted universe. This may create confusion with respect to the role of the sound since it appears to have been placed in the game from the point of view of creating a sense of presence and physicality in the game universe while it actually works as a support for gameplay. A comparison serves as illustration. When the players of The Elder Scrolls III: Morrowind (Bethesda, 2002) hear the music change when navigating through a forest, they know that an enemy is approaching, and may act accordingly. However, since this music has no source in the gameworld, the player character should not be able to hear it, but since the player does hear it and may act upon it, the character also seems to act as if it knows enemies are approaching even though it does not yet see them coming. In this sense, sound
that appears to be non-diegetic affects diegetic events, thereby disrupting the traditional meaning of diegetic and non-diegetic sound (Jørgensen, 2007b). In Pulp Fiction (Tarantino, 1994), on the other hand, one of the characters is sitting in his car accompanied by what at first appears to be non-diegetic music. Suddenly he starts whistling along with the music. In this case, the audience is not led to believe that the character hears music that is not present; instead, they re-interpret the music not as non-diegetic, but as diegetic music played on the car radio.

On the surface, the situations from the game and the film may appear similar, but in terms of how it affects its context, there is a huge difference between the film music and the game music: In the case of the film music, we revise our interpretation when we realize that the fictional character actually can hear it (Branigan, 1992, p. 88). There is therefore never any ambiguity connected to the origin of the music, and we are never led to believe that the character hears music that is not present in his world. The game music, on the other hand, has a functional value related to the game system: it provides a warning to the players about a change in game state, namely that an enemy is aware of their presence and about to attack. In this sense, the role of game music is to enable the player to use its informative value to make progress in the game. In this respect, film music and game music have fundamentally different roles. While film music provides clues about moods, upcoming events, and how to interpret specific scenes, game music works as a user interface that provides usability information that helps players progress in the game. Also, while non-diegetic film music never allows the audience to change the protagonists’ behavior or to save them from certain death, game music can enable the player to guide their avatar away from danger or to make them draw their sword even before the enemy has appeared. This is, of course, a direct result of the difference between players and audiences, and it puts emphasis on the fact that the concepts of diegetic and non-diegetic have not been designed to take this difference into account, and are therefore not sufficient for analyzing sound in computer games.

Categorization of Game Sound

There have been different attempts to categorize game sound and, in this section, I will present some of the most fruitful endeavors. Although only a few scholars base their descriptions on whether or not sounds are diegetic and non-diegetic, many refer to the concepts and may in some cases use them as unambiguous ways to look at sound. This section will provide a short overview of such scholarly attempts before the next section goes on to discuss specific attempts to adapt diegetic and non-diegetic concepts to game sound.

Alex Stockburger (2003) was perhaps the first academic who came up with a method of categorization for game sound. He defines a number of “sound objects” according to their use in the game environment, and separates between score sound objects, zone sound objects, interface sound objects, speech sound objects, and a range of different effect sound objects connected variously to the avatar, to objects usable by the avatar, to other game characters, to other entities, and to events. Although Stockburger emphasizes the importance of understanding the functional role of sound, his categories do not cover this. Instead, his model describes sound according to what kind of object it is connected to in the game engine. He also uses diegetic and non-diegetic as matter-of-fact and straightforward concepts and does not discuss how they should be interpreted in terms of game sound. One who does argue that diegetic concepts can be usefully applied to game sound is Zack Whalen. He states that non-diegetic game music has two functions: to “expand the concept of a game’s fictional world or to draw the player forward through the sequence of gameplay” (2004). In other words, it can either support the sense of spatiality and presence in the game environment, or support the player’s progression through the
game. His approach is interesting as it takes into account the fact that game music provides information relevant for gameplay, but by being tied to the traditional meaning of non-diegetic it is as misleading as other adaptations of the concepts.

A scholar who does see the diegetic/non-diegetic division as complicated is Karen Collins (2007, 2008). She points out that the division between diegetic and non-diegetic sound is problematic since the player is engaging in the on-screen sound playback process directly (2008, p. 125). Her separation between interactive and adaptive sound is based on functionality. Whereas interactive sound refers to sound events occurring in response to player action, adaptive sound reacts to events in the environment (2007, 2008, p. 4). In this respect, sound is understood as a dynamic feature closely related to events, at the same time as it takes into account the agency of the player.

Huiberts & van Tol (2008) also point out that using diegetic and non-diegetic is complicated in connection with game sound, since interactivity allows non-diegetic sounds to affect diegetic events. They still decide to use the terms because they see them as established within game studies. By putting diegetic and non-diegetic in context with setting and activity, their IEZA framework takes into account the interactive aspects of game sound, but does not take into consideration that gameworlds are designed for different purposes compared to diegeses, and that they therefore influence sound in a different way.

There are also other models for describing sound in this anthology. Wilhelmsson & Wallén’s (2011) general framework for sound design and analysis combines theories of listening with both the IEZA framework and Murch’s description of five layers between “encoded” and “embodied” sound in film, ranging from speech to music via effect sounds. However, like many others, they take the fruitfulness of diegetic and non-diegetic for granted. In his discussion of diegetic music, Berndt (2011) claims that what he calls visualized music must be considered diegetic. This is the visualization of structural features of a musical composition, exemplified by the stylized visualization of patterns found in the user interface of music games such as Rock Band (Harmonix, 2007) and Electroplankton (Indies Zero, 2006). From the point of departure of this chapter, this view of diegetic is problematic, since it distances itself from the original use of diegesis and thereby creates confusion. Milena Droumeva, on the other hand, outlines a framework of game sound according to “realism” in terms of fidelity and verisimilitude, and connects these to acoustic ecology and Barry Truax’ idea of an acoustic community that includes physical world sounds that have an impact upon gameplay. Examples of this are the acoustic soundscape of group play, and online conferencing (“live chat”) (Droumeva, 2011). From this perspective, she argues that the use of diegetic and non-diegetic terminology is limited because it fails to acknowledge the importance of these kinds of sounds. Although a valid point when discussing the general soundscape of the gaming activity, this point has only limited value for the argument of this chapter, since the chapter is restricted to how game-internal sound works with respect to the gameworld, and only briefly mentions externally produced sounds.

Diegetic Theories of Game Sound

Some of the more critical attempts at adapting diegetic and non-diegetic to games have resulted in analyses that show that game sound has more significant layers of meaning than can be explained by using the terminology above. In this section, I will evaluate the most comprehensive of these adaptations and discuss their strengths and weaknesses. However, even though the following accounts are attentive to how the concepts of diegetic and non-diegetic, when used for describing games, differ from how they are used for films, emphasizing this difference may lead to a situation in which one keeps leaning too heavily on a terminology that is meant to describe film sound, without being able to free oneself to establish a new model designed to take the particular characteristics of game sound into account.

A game scholar who partly succeeds in using diegetic and non-diegetic in his description of games is Alexander Galloway (2006). Focusing on games as activities, he couples the terminology with his own terminology of whether it is the player (operator) or game system (machine) that performs the act. His model describes all actions as executed either inside the “world of gameplay” or outside of it, and whether it is the player or the game system that takes a specific action. In this way, he describes all actions from the player firing a gun to configuring the options menu, from the movements of non-playing characters to the spawning of power-ups. While the categories themselves are not crucial to this chapter, Galloway’s perspective is important. He emphasizes the fact that games are activities and that they must be described as such. He also states that when diegetic and non-diegetic are used in connection with games, the meaning of the terms changes (Galloway, 2006). However, even though he points this important fact out, Galloway’s use of these terms is somewhat confusing since he, like I do with the term transdiegetic, tries to change the concepts from describing the relative positioning of features in space to describing actions. The model is worth mentioning, however, since the action-oriented perspective supports sound by focusing on temporality: that is, like sound, action is time-based.

Galloway’s approach to diegesis as a “world of gameplay” is also closely related to Mark Grimshaw’s radical modification of what should count as diegetic sound in computer games. He extends the idea of diegetic sound compared to film theory, and states that in computer games, diegetic sound is “defined as the sound that emanates from the gameplay environment, objects and characters and that is defined by that environment, those objects and characters”, and that it must “derive from some entity of the game during play” (Grimshaw, 2008, p. 224). In this respect, sounds do not have to be placed within the game environment in a way that we recognize from the physical world. In other words, as long as the referent is diegetic, the signal does not need to be. There is no need to have a character in the gameworld that produces the sound for it to count as diegetic. For Grimshaw, sounds are diegetic as long as they relate to actions and events in the gameworld. He exemplifies this by pointing out that sounds signaling the entrance or exit of players in a multiplayer game should be considered diegetic since they concern entities in the game environment and affect their behavior. Based on this understanding, Grimshaw elaborates that diegetic game sounds are not limited to sounds that exist in the gameworld but that we also need to take into account all sounds that provide information relevant for understanding the gameworld. In effect, this would also include the traditional background music that signals an enemy about to attack in The Elder Scrolls III: Morrowind, and disembodied voiceovers in Warcraft 3. By introducing additional new concepts that specify whether a sound is heard by a specific player (ideodiegetic sounds), and whether such a sound results from the player’s haptic input or not (kinediegetic versus exodiegetic sounds) (Grimshaw & Schott, 2007; Grimshaw, 2008), Grimshaw creates a game-specific terminology that recognizes its theoretical relationship to the diegetic or non-diegetic divide. A concept that is particularly interesting is what he calls telediegetic sounds. Connected to multiplayer situations, these are sounds produced by one player and of consequence for a second player who does not hear that sound. While it may be seen as a paradox to call this information auditory when it is in fact the action of the first player that affects the second player, the concept has interesting implications. If we detach the concept from the idea that it must be heard by a first player, it may be extended to all situations in which players appear to react to a sound that they do not hear, such as is the case when players apparently react to the traditionally
speaking non-diegetic music of approaching enemies. However, even though Grimshaw’s theory emphasizes all sounds that have relevance for player actions in the gameworld, it is confusing that he still insists on using the concept diegetic also for sounds that appear to have no source in the game environment and that the avatar should not be able to hear. In any respect, Grimshaw’s extension of what counts as diegetic, and his focus on the player in relation to the concept, are strong arguments for exchanging the existing terminology for a new one.

In my Ph.D. research (Jørgensen, 2007a, 2009), I developed a model of categorization that took into consideration functionality with respect to usability and type of information, location with respect to the gameworld, and referentiality with respect to the relationship between the sound signal and the event it refers to (2007a, pp. 84-87). In Jørgensen (2008), the model was further developed to include what generates a specific sound. However, in describing the location of sound with respect to the gameworld, these models both included references to the diegetic/non-diegetic divide by the use of the neologism transdiegetic sounds (Jørgensen, 2007b). This approach described sound as transdiegetic by way of transcending the border between diegetic and non-diegetic: Diegetic sounds may address non-diegetic entities, while non-diegetic sounds may communicate to entities within the diegetic world. Such sounds have an important functional value in computer games by being an extension of the user interface and providing information such as feedback and warnings to the player. Utilizing the border between diegetic and non-diegetic, transdiegetic sounds merge game system information with the gameworld and create a frame of reference that has usability value at the same time as it upholds the sense of presence in the gameworld. Using this terminology, I argued that apparently non-diegetic music that provides information relevant for player action in the gameworld is external transdiegetic since the musical source is not found within the gameworld but is external to it. The same goes for the disembodied warning “Our base is under attack!” in Warcraft 3. It is external transdiegetic because it provides information relevant to player action, but is not produced by anyone within the gameworld. When the avatar in Diablo 2 (Blizzard, 1998) claims “I’m overburdened”, however, I called the sound internal transdiegetic because the avatar, as a character existing in the gameworld, communicates to the player situated in an external position. The strengths of transdiegetic as a concept are that it emphasizes the functional role of the sound in relation to player action in the gameworld, and it points out that the spatial origin of the sound is often relative. It is also able to describe all game sounds by using the same framework. However, it is confusing that it is based on the term diegesis, which creates connotations to the mechanisms of narratology and storytelling. Also, the internal and external variations are flawed as they appear to be two variations on the same theme, while in reality they are not. While internal transdiegetic sounds can easily be interpreted as abstractions of “diegetic” sounds since they are partly integrated into the game environment, external transdiegetic sounds are externally situated but with clear impact on the game environment.

Inger Ekman’s approach to game sound (2005) is closely related to that of transdiegetic sounds. Common to Ekman’s and my accounts is the idea that the space of the gameworld is not absolute, and that information is carried across its boundaries. Another common ground is the idea that game sounds are used to integrate the game system into the environment in which it is set. From a semiotic perspective, she observes that game sounds that traditionally would be labeled diegetic often have non-diegetic referents, and vice versa. In this respect, computer game sound is not limited to being diegetic or non-diegetic, but creates two additional layers that may be used to integrate non-diegetic elements connected to the game system into the diegetic world of the game. Masking sounds is her term for diegetic sound signals with non-diegetic
referents. Such sounds appear to be produced in the gameworld, while their referent is a mechanic of the game system. An example of a masking sound can be found in World of Warcraft when a monster attacks the avatar preemptively. In such cases, a sound specific to that monster will be heard that signals to the player that the avatar has entered the aggression zone of that monster. This sound is hard to interpret as natural to the world of the game since no animal would signal to its prey that it is about to attack. Being represented by a sound signal with a source in the gameworld, the sound has the ability to mask its origin as a system message by being integrated into the gameworld, and thus becomes situated on the border of what is traditionally seen as the diegesis. Ekman calls a sound symbolic, however, in cases where the signal is non-diegetic and the referent is diegetic. An example of this is adaptive game music that is not produced by a source in the gameworld, but refers to an event in the gameworld, such as is the case when the player suddenly hears the music change when an enemy is about to attack in Dragon Age: Origins.

Although Ekman’s model is fruitful in explaining how game sound relates to the traditional film theory understanding of diegetic and non-diegetic sound, it also demonstrates the problematic aspects of applying these concepts to games because game sound in many cases is only partially diegetic. Also, there are many examples of sounds that cannot be fully explained by Ekman’s model. When a voice that apparently belongs to the avatar proclaims that “I’m overburdened” in Diablo II, it is not certain whether signal and referent are diegetic or not. While the signal gives the impression of being diegetic due to the use of the first person personal pronoun and the fact that it is produced by a voice that seems to belong to the avatar, it may also be interpreted as a non-diegetic system sound masked as diegetic since it is unclear who the avatar is talking to (itself or the player?) and since it provides information about the inventory, which is the game system feature that allows the player to collect and store items in the game. This interpretation was suggested by two player respondents in my research on the topic of transdiegetic communication:

[…] Well, it is the character’s voice saying this. But still I don’t get the feeling that it is the character speaking. It’s like the game narrator’s voice provides the player with a hint that, okay, you should check your inventory. […] (John (30). Individual interview, Dec 10, 2008.)1

It’s a like some sort of error, or a… if you want to see her as an individual person, it’s really an error. Because then the question is, who is she talking to? […] (Isabel (25). Individual interview, Dec 1, 2008.)

While John sees the above sound signal as a system message masked as diegetic, Isabel thinks of it as an error since it is unclear who the avatar is talking to. In this case, the referent is also ambiguous in that it is not clear whether the sound refers to the fact that the avatar is trying to pick up something in the gameworld but fails, or to the fact that the inventory is overloaded. Warcraft 3 provides another example. When the player tries to place a new building on an illegal location, a disembodied voiceover says, “Can’t build there!” At first glance, the signal seems to be non-diegetic since there is no character in the gameworld that produces the sound. However, this is challenged by the fact that the voice and the accent are very similar to the voices of the other units of that race. The referent is even more ambiguous: while the sound refers to an operation that is illegal according to the game system, it also refers to the fact that this specific location in the gameworld has diegetic properties, such as trees or existing structures, that make it impossible to build here.

As has been demonstrated in the above discussion, the attempts to adapt the concepts diegetic and non-diegetic to game sound point to interesting
aspects that recognize the specificities of game sound compared to sound in other media. At the same time, however, these attempts also demonstrate that the use of concepts designed to explain traditional media is problematic and confusing. There is a need to invent a terminological apparatus that fully grasps the uniqueness of game sound without trivializing it or confusing it with related, but different, features in other media. However, what the adaptations above have in common is seeing game sound as qualitatively different from sound in other audio-visual contexts. Specifically, there is a tendency to pay attention to the interactive nature of game sound and to see it as a part of the user interface of the game in that it provides information to the player that supports feedback and control (Saunders & Novak, 2006). These adaptations also suggest that gameworlds operate in a different manner compared to storyworlds. This is particularly evident in Grimshaw's extended understanding of diegetic sound as all sounds that derive from a gameplay event. In the following I will discuss how the understanding of game sound as interface, and of the gameworld as a different construct from traditional diegeses, affects the idea of diegetic sound, and I suggest alternative ways of discussing the relationship between the gameworld and game sound.

SOUND AND THE GAMEWORLD

I have suggested above that diegetic and non-diegetic are problematic in connection with games and game sound because gameworlds are different constructs compared to traditional fictional worlds, or diegeses, and because of the way the players interact with them. In this section I will go into the characteristics of gameworlds, what makes them different from traditional fictional worlds, and what consequences this has for understanding their sound usage.

Rune Klevjer rejects using the term diegesis to describe gameworlds due to its link to storytelling, and argues that gameworlds are radically different from storyworlds because they are worlds designed for playing games. This means they are unified and self-contained wholes, structured as arenas for participation and contest, and are therefore subject to a coherent purpose (Klevjer, 2007, p. 58). Such worlds are created around a different logic than "fictional storyworlds" and, as long as all elements are explained as being parts of the game system, they do not need to be explained as a credible part of a hypothetical world. Espen Aarseth (2008) makes a clear distinction between gameworlds and fictional worlds by stating that the virtual world of World of Warcraft (Blizzard, 2004) is no fictional world but instead "a functional and playable gameworld, built for ease of navigation" (p. 118). This is also emphasized in Aarseth (2005), in which he describes the environmental design of Half-Life 2 (Valve, 2004). It is a carefully designed environment with a specific layout that guides the players through specific areas and limits the freedom of navigation in order to set up the challenges of the game, at the same time as it is given properties that remind one of the physical world in terms of world-representation.

I want to follow up on Klevjer's and Aarseth's approaches and further point out that gameworlds are universes designed for the purpose of playing games. This means that they are fitted for very specific uses, and their layouts are decided in terms of functionality according to the game system. Environmental features and dungeon layouts are not created randomly but, because of careful design, they are oriented towards a specific gameplay experience. This view will be the starting point for the following discussion, which will focus on the functional aspects of gameworlds and the sounds connected to them. As we will see, this view of the gameworld is important for understanding how sound is used, and explains why players do not see what I earlier called transdiegetic sounds as interfering.

As different constructs compared to traditional fictional worlds, gameworlds operate on other
premises. One characteristic of gameworlds is that they need to have a comprehensive system for player interaction. They need to be able to communicate necessary information about changes in game state and allow the player the necessary degree of control. Many of these interface features, including sounds, are often added to the game as abstractions of specific game mechanics partly integrated into the gameworld and, as such, it is problematic to see them as either diegetic or non-diegetic in traditional terms. Instead of looking at what would be a credible representation of a naturalistic world, we should look at how the gameworld and the game system work to support each other. If the game rules state that monsters growl when attacking, and that individuals respawn with their armour 10% damaged after being killed, this is the premise of the specific gameworld. This view is a familiar one for players. One of the respondents in my empirical research states it thus:

[…] In this world, you can define whatever you would like there to be, it doesn't seem that things are very credible in themselves.

Q: So why do we accept it?

Because it's a game. And that is something completely different from a film. (Isabel (25), individual interview, Dec 01, 2008)

Here Isabel emphasizes the idea that gameworlds do not need to be a credible alternative to other fictional worlds, and that game designers can decide what they want to include as existent in their world: Because they are integrated with the game system, gameworlds are necessarily different from fictional worlds, such as those of films. This interpretation supports Grimshaw's extended view of what counts as diegetic in computer games, but at the same time it amplifies the problematic aspects of using diegesis as explanatory terminology, since gameworlds functionally are very different from literary or cinematic diegeses.

Based on the above, the upholding of the game system by the gameworld also has consequences for the integration and design of sound in games. All game sounds have a function with respect to the gameworld, be it to provide information relevant for gameplay or to provide a specific atmosphere. Specific games and genres use sound in different ways, and the degree to which it is incorporated into the gameworld plays an important role for reasons of clarity and consistency and in order to create an immediately understandable relationship between the sound and the gameworld. When designing user interfaces for games, a designer needs to decide how to present information to the player. Central to this is deciding which menus should allow interaction, how and whether the user interface should be integrated into the gameworld, and how sounds and visual elements should work together. Game designers Kevin Saunders and Jeannie Novak (2006) describe two ways of relating the user interface to the gameworld and the gamespace. A dynamic interface supports the idea that all audio-visual aspects of a game should be seen as interface because they all provide the player with some kind of information, and dynamic interfaces are therefore completely incorporated into the gameworld. An example is the way an avatar's armour and weapons provide information in a massively multiplayer online game (MMO)2 like World of Warcraft: By looking at what gear the opponent has, a player receives vital information about the class, level and power of that avatar. A static interface, on the other hand, is an overlay interface that consists of external control elements such as a health bar, map, pop-up menus, inventory, action bars and so on. Since user interface and gameworld often tend to merge, making the boundary between gameworld and interface relative (Jørgensen, 2007b, 2008, 2009), the static/dynamic divide should not be seen as absolute, but as a continuum where the interface may be more or less integrated
into the gameworld. Used as an interface, sound often takes on a relative position where it is integrated into the gameworld while remaining part of the game system. Using sound signals that are based on real world sounds, but which have been stylized, user interface designers add sounds that provide the necessary usability information at the same time as ensuring the sounds seem natural to the environment of the game. Ekman's masking sounds are textbook examples of this. Another example is the response "More work?" by Warcraft 3's orc peons. As a verbal statement produced by a character in the gameworld, it has a direct link to that gameworld, but at the same time it is an interface sound produced in response to player action. However, the sound is not an actual sound of an event in the gameworld, since it would make little sense if the peon actually were talking to the player.

Gameworld vs. Gamespace

So far we have seen that game system information and game user interface features such as sound may be more or less integrated into the gameworld. However, they will also have a specific relationship to the gamespace of a specific game. Looking at this relationship may provide us with clear insights into how gameworlds work compared to diegeses. Gamespace should be understood as the conceptual space in which the game is played (Juul, 2005, p. 167), independent of any possible fictional universe used as a context for it. It is thus the arena on which gameplay takes place, and it includes all elements relevant for playing the game. According to the magic circle theory (Huizinga, 1955, p. 10; Salen & Zimmermann, 2004, pp. 94-95), all games are seen as a subset of the real world, delimited by a conceptual boundary that defines what should and should not be understood as part of the game. The magic circle is what separates the game from the rest of the world, and it thus defines the gamespace (Juul, 2005, pp. 164-167). One may go as far as claiming that all elements affecting gameplay should be counted within the gamespace, regardless of whether these are part of the original system or design. From this point of view, gamespace seems to be equivalent to Grimshaw's and Berndt's understanding of diegesis, since it includes external system features relevant for gameplay, such as voiceovers announcing new players entering the game. Gamespace is therefore also what Droumeva (2011) seems to have in mind when focusing on the importance of live chat and talk that happens during group play. The gamespace is thus separated from the gameworld by including all features that have direct relevance to progress in the gameworld, be it score music signaling approaching enemies or add-on software in World of Warcraft, while the gameworld is the contained universe or environment designed for play in which actions and events take place. In this sense, a static overlay interface of a computer game is part of the gamespace, even though it may not be part of the gameworld, while a dynamic integrated interface would be part of the gameworld.

For clarification, take the screenshot from Diablo II in Figure 1 as an illustration. The right half of the screen, consisting of the inventory, the bottom action bar including health and mana measurements, and the upper left icon of the avatar's minion, are all parts of the overlaid interface. These should not be interpreted as part of the gameworld, which is represented by the virtual environment on the left. The interface features are, however, directly relevant for player progress in the gameworld, and they are also attributes governed by the game system. They must therefore be seen as part of the gamespace; that is, the space of action relevant for the game progression included within the magic circle of the game. Now consider the left side of the screenshot, a screen segment of the gameworld. One interesting feature in this part of the image is the small illuminated icon above the avatar's head which represents a boost to the avatar's stamina. In terms of transdiegeticity, I would have explained this feature as internal transdiegetic because, in
a traditional sense, it is a feature that seems alien to the diegesis while at the same time it provides information about the gameworld. However, viewing gameworlds as different constructs compared to traditional fictional worlds, the icon is clearly part of the gameworld, since it is not part of the overlay interface, but a feature picked up as the avatar visited a stamina well and which follows the avatar everywhere he walks. Since gameworlds work on other premises than traditional diegeses, players would have no problem accepting that this is part of the gameworld even though the avatar is not aware of it.

Figure 1. Gamespace vs. gameworld. Diablo 2. ©2000 Blizzard Entertainment, Inc. All rights reserved

There is an important direct link between the gamespace and the gameworld which is particularly accentuated by the use of sound. When the player decides to discard an item in the screenshot above, he will use his mouse to drag and drop the item from the inventory on the right to the virtual environment on the left or, in other words, he will move it from the gamespace to the gameworld. The moment he selects the item in the inventory, there will be a short, nondescript click which does not seem to represent any actual sound in the gameworld. However, once he discards it in the gameworld, there is a responsive sound resembling that item being dropped to the ground. If it is a potion, there is a bubbling sound and, if it is a weapon, there is the sound of metal hitting the ground. By being adjusted to the atmosphere of the different spaces, the sound clearly emphasizes which frame it belongs to; there is no doubt, though, that it does move from one to the other.

However, how this movement from frame to frame is achieved may vary between games and genres. A first-person shooter like Crysis (Crytek, 2007), which integrates the interface as a HUD3 that is part of the avatar's suit, situates the relationship between gameworld and gamespace somewhat differently from third person perspective avatar-based games. One of the empirical player respondents elaborates:
I'm absolutely positive to the idea [that the avatar sees the HUD]. It's presented so that the suit he's wearing […] in a way provides all the information that you need, through the perspective. And, well, it's one solution, they probably try to make it an integrated part of this world. (Eric (26), individual interview, Nov 28, 2008)

Here, even the HUD and overlaid features must be interpreted as part of the gameworld, and thus the gameworld and the gamespace overlap each other more or less completely. The reason for this is that the game user interface designers have decided to make the interface part of the avatar's advanced military suit so that all audio-visual information is provided to the avatar in the same manner as it is provided to the player.

All features are part of the gamespace as long as they are not connected to external menus in which one changes the game settings or starts a new game, but they may or may not be connected to the gameworld as well4. If they are, they are typically positioned in the gameworld in the same way as what I earlier called internal transdiegetic features. While not appearing to be native to the gameworld, they are still positioned inside it graphically. They may be placed above the heads of non-playing characters in a way that allows the player to move around them: They will move with the environment, and not with the overlay interface that is tied to the edges of the screen. An example of a corresponding auditory feature is the "Hi, you're a tall one!" response from a non-playing character (NPC) in World of Warcraft. Features I earlier called external transdiegetic, however, are not part of the gameworld, only of the gamespace. They are not integrated into the gameworld but provide information relevant for gameplay. An auditory example of this is music signaling the presence of enemies in The Elder Scrolls III: Morrowind and Dragon Age: Origins.

In this section I have argued that sounds have a particular role in connecting the gamespace and the gameworld, making the boundary between the two more seamless by using interfaces that are integrated into the gameworld in different ways. Since sound is neither tangible nor visible and has a temporary quality, it does not disrupt the sense of a unified space in the same way as alien graphical features would. It therefore seems to be easier to accept the growl of an attacking animal than it is to accept a question mark floating around in thin air. This provides greater potential for designers to manipulate auditory information, compared to visual information, when creating user interfaces for games. The fact that gameworlds work on other premises compared to traditional fictional worlds is what makes the player accept stylistic and abstract sounds that integrate the game system into the gameworld, but this ability is also part of the reason why gameworlds are accepted as different constructs from traditional fictional worlds. This discussion also puts emphasis on the argument that talking about diegesis, and thus diegetic and non-diegetic sound, has crucial shortcomings that are avoided if we instead evaluate gamespaces on their own terms by emphasizing how gameworlds differ from other fictional worlds.

SPATIAL INTEGRATION OF GAME SOUND

If we want to find an alternative model that describes the relative integration of sounds in gameworlds, we need to get away from the biased meaning of diegesis and instead focus on the specificities of game sound. In evaluating the usefulness of the concepts diegetic and non-diegetic in relation to game sound, I have stressed that these do not grasp how sounds are integrated into the gameworld and that they do not emphasize how sounds work as an interface providing action-relevant information to the player. In this section, I will present a game-specific approach to describing game sound that avoids the use of the diegetic/non-diegetic dyad. Due to the scope of
this chapter, the model focuses on spatial integration and the difference between gameworlds and storyworlds, but it also reflects awareness of the functional aspects of game sound by looking at it as an interface, and of how these aspects transcend the border of the gameworld in a meaningful way.

This model puts emphasis on how well a sound is integrated into the gameworld. It builds on and supports existing theories on how we may understand gameworlds and game sound and how they work together. Grimshaw's radical interpretation of diegesis is retained in the emphasis on the distinction between gameworld and gamespace, and we also gain new insight into the functional and integrational aspects of so-called transdiegetic sounds. Also, Galloway's focus on games as activities is preserved, as there is a heavy focus on how sounds affect gameplay in addition to the fact that gameworlds are worlds intended for play. Last but not least, the model avoids the confusion connected to the usage of terminology associated with the diegesis. This approach will be described in detail below. In pointing out that game sounds should be seen as an interface, it places emphasis on the usability aspects of sound in the sense that it provides information to the player such as warnings and responses as well as information relevant to game control, identification, and orientation. See Table 1.

Table 1. Game sound and world integration

Metaphorical interface   Dragon Age: Origins: Enemy music
Overlay interface        C&C3: Mouseclick when selecting actions
Integrated interface     Diablo 2: Sound following boost
Emphasized interface     WoW: "Hi, you're a tall one!"
Iconic interface         Crysis: Avatar moans when injured

This interpretation of sound's integration into the gameworld is based on Saunders & Novak's separation of static and dynamic interfaces, but I believe it is more fruitful and more correct to see this separation not as a binary divide but as a continuum that integrates user interface elements into the gameworld to a lesser or greater degree. Moreover, since sound is part of a game's user interface, it is also possible to locate different sounds on the same continuum. In the table above, I have identified five points on this continuum where sound signals tend to be located in modern computer games. All categories have a certain degree of integration into the gameworld, with the exception of the first group, which is the only one that is not part of the gameworld. I call this group metaphorical interface sounds since they are not "naturally" produced by the game universe but have a more external relationship to the gameworld, even though they also have a metaphorical similarity (Keller and Stevens, 2004) to the atmosphere and the events in it. The enemy music found in Dragon Age: Origins and The Elder Scrolls III: Morrowind is a typical example of these kinds of sounds, which are usually system-generated and may provide orientating and identifying information as well as working proactively as a warning to the player.

The remaining four categories are all integrated into the gameworld in different ways and to different degrees. Overlay interface sounds have the same relationship to the game as Saunders & Novak's static user interface when it is added as an overlay. These sounds are directly connected to the overlay menus, maps and action bars, and are typically generated by the player in response to his commands. These are found in most game genres but are particularly common in interface-heavy genres like real-time strategy games. The example above is from Command & Conquer 3: Tiberium Wars (EA LA, 2007), where the player
typically hears the generic sound of a mouseclick every time he selects an action from any of the menus. Integrated interface sounds are typically related to user interface elements that have been placed into the gameworld, such as exclamation marks and the icons above the heads of characters. The sound played as the avatar gets a boost to stamina in Diablo 2 is a typical example of this: it is a system-generated sound that works as a notifier and that also identifies the boost in question. Emphasized interface sounds have a somewhat different relationship to the gameworld as they often appear to be generated by friendly NPCs in the gameworld. An example is the lines spoken by NPCs in World of Warcraft in response to player targeting, as when the goblin merchant says "Hi, you're a tall one!" This is a sound that appears to be diegetic in the traditional sense of the term since it is something a character in the gameworld actually says, but it is in fact a system-generated sound that has been stylized and fitted into the gameworld. Iconic interface sounds, however, are completely integrated into the gameworld and correspond to Saunders and Novak's dynamic user interface features. In terms of film theory, these sounds would be labeled diegetic as they seem to belong naturally to the universe they are in. They can have any kind of generator and may provide any kind of usability information. An example of an iconic interface sound is heard when the avatar moans because he is injured in Crysis.

While this model is limited to solely taking into account the spatial integration of game sound, it is fully compatible with my earlier models describing the usability value of a sound (Jørgensen, 2007b) and what generates a sound (Jørgensen, 2008). When combining these functions, we may study game sound along several dimensions that grasp usability on a more general level by identifying whether a sound provides responsive or urgent information and whether it is related to control functions, orientation or identification. Such a combination would be able to dive into the gameworld, describing what event generates a sound and identifying what that event means for the player's state. Last but not least, it would take into account how the sound is integrated into the gameworld. Combined, the models will form a comprehensive and detailed analytical tool that describes all gameplay related sounds in computer games, without creating the confusing association to traditional diegeses.

CONCLUSION

When sounds work functionally in the sense of providing gameplay-relevant information to the player, they must be seen as part of the user interface of a game. In this respect, we need to acknowledge their status as such and use an approach that allows us to describe them in terms of an interface. However, the traditional distinction between diegetic and non-diegetic is not based on participatory use and does not allow us to describe game sound in this way. This chapter presents a game-oriented alternative to diegetic and non-diegetic that takes into account the spatial integration of sounds from a gameplay perspective. The model is also compatible with earlier models characterizing game sound (Jørgensen, 2007a, 2008, 2009), and together they form a framework that allows us to describe the interface aspects of computer game sounds while also paying equal attention to their relationship to the gameworld as an environment that resembles those of fiction but is instead built on game rules.

While this chapter argues for substitution of the terms diegesis, diegetic and non-diegetic when discussing sound in games, it should be stressed that these terms may be fruitful in some respects. They may be used when a scholar wants to compare computer games and game sound with other media, and they may also be used the way this chapter does: to show why they are problematic. From these perspectives, Galloway's, Ekman's, Grimshaw's and Jørgensen's earlier work on the subject are important contributions that are especially fruitful for those seeking to
understand how game sound and gameworlds differ from other media. It is, however, important to emphasize the fact that spatiality in computer games operates on very different premises than in film, for instance, and that we are talking about a different relationship between sound and environment compared to the traditional separation between diegetic and non-diegetic. A crucial difference is that gameworlds are different constructs from traditional fictional worlds, and this must be taken into consideration when discussing the origin of sounds and other features.

It is important to note that the model presented here is not limited to the study of game sound but that it may be used to analyze all interface-related features of a computer game. However, sound is particularly interesting because of its seamless integration and its ability to remain non-intrusive even when it tends to break with the conventions of the gameworld. It should also be mentioned that the framework is supposed to work as a tool to help us better understand how game sound and other game features operate, and as such, it will always be subject to modification.

ACKNOWLEDGMENT

Thanks to Jesper Juul, Matthew Weise, Mark Grimshaw and the anonymous review committee for comments.

REFERENCES

Aarseth, E. (2005). Doors and perception: Fiction vs. simulation in games. In Proceedings of the 6th Digital Arts and Culture Conference 2005.

Aarseth, E. (2008). A hollow world: World of Warcraft as spatial practice. In Corneliussen, H., & Rettberg, J. W. (Eds.), Digital culture, play and identity: A World of Warcraft reader. Cambridge, MA: MIT Press.

Berndt, A. (2011). Diegetic music: New interactive experiences. In Grimshaw, M. (Ed.), Game Sound Technology and Player Interaction: Concepts and Developments. Hershey, PA: IGI Global.

Bordwell, D. (1986). Narration in the fiction film. London: Routledge.

Bordwell, D., & Thompson, K. (1997). Film art: An introduction to film theory. New York: McGraw-Hill.

Branigan, E. (1992). Narrative comprehension and film. London: Routledge.

Chion, M. (1994). Audio-vision: Sound on screen. New York: Columbia University Press.

Collins, K. (2007). An introduction to the participatory and non-linear aspects of video games audio. In Hawkins, S., & Richardson, J. (Eds.), Essays on sound and vision. Helsinki: Helsinki University Press.

Collins, K. (2008). Game sound: An introduction to the history, theory, and practice of video game music and sound design. Cambridge, MA: MIT Press.

Command & Conquer 3: Tiberium Wars. (2007). EA Games.

Crysis. (2007). EA Games, Crytek.

Diablo 2. (2000). Blizzard Entertainment.

Dragon Age: Origins. (2009). EA Games, Bioware.

Droumeva, M. (2011). An acoustic communication framework for game sound: Fidelity, verisimilitude, ecology. In Grimshaw, M. (Ed.), Game Sound Technology and Player Interaction: Concepts and Developments. Hershey, PA: IGI Global.

Ekman, I. (2005). Meaningful noise: Understanding sound effects in computer games. Paper from DAC 2005. Retrieved January 12, 2009, from http://www.uta.fi/~ie60766/work/DAC2005_Ekman.pdf
Electroplankton. (2006). Nintendo, Indies Zero. Jørgensen, K. (2008). Audio and gameplay: An
analysis of PvP battlegrounds in World of War-
Galloway, A. R. (2006). Gaming: Essays on
craft. In Gamestudies, 8(2). Retrieved January 12,
algorithmic culture. Electronic mediations (Vol.
2010, from http://gamestudies.org/0802/articles/
18). Minneapolis, London: University of Min-
jorgensen.
nesota Press.
Jørgensen, K. (2009). A comprehensive study of
Genette, G. (1983). Narrative discourse: An essay
sound in computer games. Lewiston, NY: Edwin
in method. Ithaca, NY: Cornell University Press.
Mellen Press.
Gorbman, C. (1987). Unheard melodies? Narra-
Juul, J. (2005). Half-real. Video games between
tive film music. Bloomington: Indiana University
real rules and fictional worlds. Cambridge, MA:
Press.
MIT Press.
Grimshaw, M. (2008). The acoustic ecology of the
Keller, P., & Stevens, C. (2004). Meaning from
first-person shooter. City, Country: VDM Verlag.
environmental sounds: Types of signal-referent
Grimshaw, M., & Schott, G. (2007). Situating relations and their effect on recognizing audi-
gaming as a sonic experience: The acoustic ecol- tory icons. Journal of Experimental Psychology.
ogy of first person shooters . In Proceedings of Applied, 10(1). doi:10.1037/1076-898X.10.1.3
DiGRA 2007. Situated Play.
Klevjer, R. (2007). What is the avatar? Fiction
Half-Life 2. (2004). Sierra Entertainment, Valve and embodiment in avatar-based singleplayer
Corporation. computer games. Unpublished doctoral disserta-
tion. University of Bergen, Norway.
Huiberts, S., & van Tol, R. (2008). IEZA: A frame-
work for game audio. In Gamasutra. Retrieved Nacke, L., & Grimshaw, M. (2011). Player-game
January 12, 2010, from http://www.gamasutra. interaction through affective sound . In Grimshaw,
com/view/feature/3509/ieza_a_framework_for_ M. (Ed.), Game Sound Technology and Player In-
game_audio.php. teraction: Concepts and Developments. Hershey,
PA: IGI Global.
Huizinga, J. (1955). Homo ludens: A study of the
play element in culture. Boston: Beacon Press. Rock band. (2007). EA Games.
Jørgensen, K. (2006). On the functional aspects Salen, K., & Zimmermann, E. (2004). Rules of
of computer game audio. In Proceedings of the play: Game design fundamentals. Cambridge,
Audio Mostly Conference (pp. 48-52). MA: MIT Press.
Jørgensen, K. (2007a). ‘What are these grunts and Saunders, K., & Novak, J. (2006). Game develop-
growls over there?’ Computer game audio and ment essentials: Game interface design. Stamford,
player action. Unpublished doctoral dissertation. CT: Cengage Learning.
Copenhagen University, Denmark.
Stockburger, A. (2003). The game environment
Jørgensen, K. (2007b). On transdiegetic sounds from an auditory perspective. In M. Copier & J.
in computer games. Northern lights Vol. 5: Raessens (Eds.), Proceedings of Level Up: Digital
Digital aesthetics and communication. Intellect Games Research Conference.
Publications.
Tarantino, Q. (1994). Pulp fiction. Miramax.

Time for New Terminology?

The Elder Scrolls III: Morrowind. (2002). Bethesda Softworks.

Warcraft 3: Reign of Chaos. (2002). Blizzard Entertainment.

Whalen, Z. (2004). Play along: An approach to video game music. Game Studies, 4(1). Retrieved January 12, 2010, from http://www.gamestudies.org/0401/whalen.

Wilhelmsson, U., & Wallén, J. (2011). A combined model for the structuring of game audio. In Grimshaw, M. (Ed.), Game Sound Technology and Player Interaction: Concepts and Developments. Hershey, PA: IGI Global.

World of Warcraft. (2004). Vivendi Games.

KEY TERMS AND DEFINITIONS

Diegesis: Originally referring to pure narrative, or situations in which the author is the communicating agent of a narrative, diegesis was revived in the 1950s to describe the "recounted story" of a film. It is today the accepted term in film theory to refer to the fictional world of the story.

Diegetic: That which is part of the depicted fictional world. Diegetic sounds are thus sounds that have a source in the fictional world.

Game System: The formal structure of the game, consisting of a set of features that affect each other to form a pattern. Includes the rules of a game and the mechanisms that decide how the rules interact.

Gamespace: The conceptual space or arena in which a game is played, independent of any possible fictional universe in which it may be set. Gamespace is defined by the magic circle, and includes potentially all elements relevant for playing, regardless of whether they are part of the original system or not.

Gameworld: A unified and self-contained universe that is functionally and environmentally designed for the purpose of playing a specific game. Gameworlds are oriented towards a specific gameplay experience and do not need to be explained as a credible part of a hypothetical world.

Metaphorical Interface Sounds: Sounds that provide usability information to the player while being placed external to the gameworld. An example is adaptive music which informs the player that an enemy is approaching.

Non-Diegetic: That which is external to the fictional world. Non-diegetic sounds are thus sounds represented as coming from a source outside the fictional world.

Overlay Interface Sounds: Sounds that are associated with the overlay interface placed as a filter on top of the gameworld. An example is the sound of mouse clicks whenever the player makes a selection from the action bar.

Transdiegetic: Transdiegetic features are auditory and visual elements of a computer game which transcend the traditional division between diegetic and non-diegetic by way of merging system information with the gameworld. Transdiegetic features thus create a frame of communication that has usability value at the same time as they are integrated into the represented universe of the game.

Integrated Interface Sounds: Sounds that are connected to user interface elements that have been placed inside the gameworld for usability purposes. An example is system-generated sounds that follow the player's collecting of coins, boosts or other prizes.

Emphasized Interface Sounds: Sounds that have been stylized and fitted into the gameworld while also remaining clear system-generated features. Examples are the auditory responses from units being selected in strategy games.

Iconic Interface Sounds: System-generated sounds that are completely integrated into the gameworld as if they were natural to that universe. An example is the sound of weapon use in a game.
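Read together, the interface-sound terms above amount to a small classification scheme. The following Python sketch is illustrative only and is not part of the chapter: the enum, the helper function, and the example mapping are hypothetical names introduced here. It records, for each category, whether the sound sits inside the gameworld according to the definitions given above.

```python
from enum import Enum

class InterfaceSound(Enum):
    """Interface-sound categories as defined in the key terms above."""
    METAPHORICAL = "Metaphorical"  # usability info placed external to the gameworld
    OVERLAY = "Overlay"            # tied to the overlay interface on top of the gameworld
    INTEGRATED = "Integrated"      # tied to UI elements placed inside the gameworld
    EMPHASIZED = "Emphasized"      # fitted into the gameworld, clearly system-generated
    ICONIC = "Iconic"              # fully integrated, as if natural to the gameworld

def inside_gameworld(category):
    """Per the definitions, only Metaphorical and Overlay sounds are
    placed outside of, or as a filter on top of, the gameworld."""
    return category not in (InterfaceSound.METAPHORICAL, InterfaceSound.OVERLAY)

# Hypothetical examples, paraphrasing the ones given in the definitions.
EXAMPLES = {
    "adaptive music warning of approaching enemies": InterfaceSound.METAPHORICAL,
    "mouse click on the action bar": InterfaceSound.OVERLAY,
    "jingle on collecting coins": InterfaceSound.INTEGRATED,
    "unit selection response in a strategy game": InterfaceSound.EMPHASIZED,
    "weapon sound": InterfaceSound.ICONIC,
}
```

For instance, `inside_gameworld(InterfaceSound.ICONIC)` is true while `inside_gameworld(InterfaceSound.OVERLAY)` is false, mirroring the diegetic placement described in the definitions.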


ENDNOTES

1. All quotes are originally in Norwegian, and have been translated by the author.
2. MMO is short for Massively Multiplayer Online games. These are games in which thousands of players play together on online servers.
3. Originally a military technology, HUD is short for heads-up display which is "an electronic display of instrument data projected at eye level so that a driver or pilot sees it without looking away from the road or course" (Random House Dictionary, 2009).
4. As the formal structure of the game, the game system seems to lie somewhere in between the gamespace and the gameworld. While talk between players during group play in the same physical space would be part of the gamespace, this kind of communication is not an actual part of the formal game system. However, so-called external transdiegetic features, such as music signalling incoming enemies, are clearly part of the game system even though they are not part of the gameworld.


Chapter 6
A Combined Model
for the Structuring of
Computer Game Audio
Ulf Wilhelmsson
University of Skövde, Sweden

Jacob Wallén
Freelance Game Audio Designer, Sweden

ABSTRACT
This chapter presents a model for the structuring of computer game audio building on the IEZA-framework
(Huiberts & van Tol, 2008), Murch’s (1998) conceptual model for the production of film sound, and
the affordance theory put forth by Gibson (1977/1986). This model makes it possible to plan the audio
layering of computer games in terms of the relationship between encoded and embodied sounds, cog-
nitive load, the functionality of the sounds in computer games, the relative loudness between sounds,
and the dominant frequency range of all the different sounds. The chapter uses the combined model to
provide exemplifying analyses of three computer games: F.E.A.R., Warcraft III, and Legend of Zelda.
Furthermore, the chapter shows how a sound designer can use the suggested model as a production
toolset to structure computer game audio from a game design document.

DOI: 10.4018/978-1-61692-828-5.ch006

INTRODUCTION

Computer game audio is an often neglected area when analyzing and producing computer games (Cancellaro, 2006; Childs, 2007; Marks, 2001). The same seems to be the case when analyzing or producing movies (Murch, 1998; Thom, 1999). There is a general lack of functional models, for the analysis as well as the production of computer game audio, even though some good examples of functional models, such as Sander Huiberts and Richard van Tol's (2008) IEZA-framework (Figure 1), are available. The IEZA-framework is also discussed in Droumeva (2011).

In this chapter, we use the IEZA-framework in combination with Walter Murch's (1998) conceptual model for film sound (Figure 2). Why combine these two different areas, that is, a model concerned with computer game audio and another with film sound? As Huiberts and van Tol (2008) have noted, film sound is a "field of knowledge that is closely related to game audio"

Copyright © 2011, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.

Figure 1. Huiberts and van Tol's IEZA-framework for the analysis and production of computer game audio into which we have added frames for the different categories. Adapted from Huiberts and van Tol (2008)

(p. 2). Although these two areas are related and do share some common ground, they are also quite different in many ways. It is striking that when we think about games we use the term audio, yet when we think about film we seem to primarily use the term sound. In our opinion, there is a difference between these 2; audio is a more technology-based term than sound. A sound is something you hear which in turn leads to listening, while audio is something that precedes sound but with stronger technological connotations as a term. Film sound is, as Murch notes, normally composites of sound in several layers, an assertion which precedes a more thorough discussion of this model (Figure 2). Murch concludes that we may be wise to limit those layers to a total of 5 different ones simultaneously played back on the sound track of a movie. A common method of separating the different parts of film sound is a typology consisting of 3 separate categories: speech, effects, and music (Bordwell & Thompson, 2001; Sobchack & Sobchack, 1980). This typology is originally based on the technology of early sound films and its 3 tracks, constituting a practice-oriented separation of sound into different categories. It also corresponds well to Murch's (1998) conceptual color model (Figure 2), which spans from language that clearly has relations to speech (encoded) via effects to music (embodied). With such a typology rather clearly differentiating the 3 basic entities of film sound from each other, we might jump to the conclusion that film sound is fairly easy to create and that computer game audio could be modeled, more or less, on the practice and theories of film sound. Since we only have 3 basic categories of sounds that can be used and combined to create a sonic environment, how hard can it really be? However, film sound is more complex than this initial typology suggests and we address this in this chapter. Furthermore, computer game audio works under quite different conditions than film sound does: film sound is fixed, stored and played linearly. This does not, however, mean that sound in movies needs to be synchronous with the visual, since it might be narrating at a different level that does not have its basis in the present image (Hug, 2011; Kubelka, 1998; Pudovkin, 1929/1985). Computer game audio, on the other hand, is dynamic and stored as a resource for the player to use in a non-linear fashion. An invariant set-up of sounds is stored in a database, but the use of objects that would


Figure 2. Murch's conceptual model. Adapted from Murch (1998)

produce these sounds while the game is being played is likely to be highly variable if the game is not to become extremely linear in its progression and very boring to play (see Farnell, 2011; Mullan, 2011 for technologies offering the chance to break from this paradigm).

The typology for film sound and its 3 categories can also be compared to how the human auditory system is biased. Humans are biased towards listening for voices (Chion, 1994, p. 6), and towards attempting to interpret voices as words of language, and spoken language is a primary resource for communication. Humans are generally most sensitive in the part of the sound spectrum occupied by the human voice, that is, approximately 150 to 6000 Hz, and especially sensitive within the range of 3000 to 4000 Hz. Spoken language occupies a quite broad part of the sound spectrum in which the threshold is low. In movies and games this part of the sound spectrum is also commonly inhabited by concurrent sounds, such as explosions, music, and so on, which have a natural broad spectrum.1 A voice does not need the same level as a low pitched boom in order to be perceived as having the same loudness. There are a number of reasons as to why some sounds fuse together into one and why, in some cases, they do not. These include frequency, relative amplitude, timbre, onset, amplitude envelope, and sound source location: sounds that fuse tend to have one or several of these factors within the same, small range. Additionally, one should not forget the active ability of humans to focus on particular sounds to the exclusion of others: what is commonly referred to as the cocktail party effect. In this chapter we cannot discuss the whole field of acoustics and psychoacoustics but will need to focus the attention towards a limited number of issues concerning the complexity of sound, such as dynamics, relative amplitude, dominant frequencies, and their relation to semantic value.2 For the time being, we can conclude that if there are many sounds with the same properties, clarity might then become a problem and, at worst, the mix will become blurred or distorted. Therefore, it can be useful to consider what types of sounds have already been used when designing a sonic environment, for which our model can be a powerful toolset.

Will broad dynamics thus create good sound design by itself? The answer is obviously no. If two or more sounds are played simultaneously, they may blend and be heard as one. In theory, the more the sounds differ in dominant frequency span and relative loudness, the easier it becomes to distinguish them as 2 different sounds with different semantic values. Reality, on the other hand, is not that simple. In games, 2 audio files can typically be played together in innumerable ways and with different timings. Consequently,


one of the key problems of computer game audio is the loss of control that a sound designer has over the playback of the sound in the gameplay of a complex game. Two or considerably more different (or identical) audio files might be played simultaneously due to gameplay events induced by the player or the game system. This could lead to a chaotic sonic environment, the "logjam" of sound that Murch (1998) describes in relation to film sound (see also Cancellaro, 2006; Childs, 2007; Marks, 2001; Prince, 1996; Wallén, 2008). This "logjam" does not support or enhance gameplay and may also become very tiresome to listen to during even short sessions of play. In order to avoid sounds losing their definition and thereby their semantic value, a game audio designer needs to plan and structure the game audio as much as possible, which constitutes a major problem.

The sound designer can design and deliver the sounds to a game but the player is the one person in control of the play button. The goal for a sound designer should be to retain as much control over the final sonic environment as possible, even though it is hard to define exactly when the sounds are going to be played. Since game sounds are usually loaded on call by certain events in the game, the sounds cannot be edited and mixed in a fashion similar to the mixing of film sound. In other words, the sonic environment has to be spread out beforehand. To avoid a big, undefined wall of sound, the sounds have to be somewhat compatible with each other. One could compare sound layers with a jig-saw puzzle; in order to complete it, each part must fit in with the surrounding parts. If a number of pieces are put onto each other, the parts at the bottom will be covered and not clearly visible. On the other hand, as Chion (1994) has noted, sounds may be superimposed on top of each other without the conceptualization that they stem from different environments (pp. 45-46). The problem is that sounds which are similar will blur into one another. By using the entire dynamic and frequency range, as well as the panorama and distribution of the cognitive load over the brain, the sonic environment is more likely to be clear and distinct. Perhaps every sound does not have to be as loud as possible (Thom, 1999)? If every sound is evaluated and then given values for a set of variables, such as dynamic range, dominant frequency, and cognitive load, the sonic environment can be easier to visualize. This is what our combined model does.

So far we have identified a number of key problems in the analysis and production of computer game audio:

• There is a general lack of functional models for the analysis of computer game audio
• There is also a general lack of functional models for the production of game audio
• The loss of control that a sound designer has over the playback of the audio in the gameplay of a complex game may lead to a chaotic blur of sounds which makes them lose their definition and hence their semantic value
• When two or more sounds play simultaneously, the clarity of the mix depends on the type of sounds, which leads to
• The nature of the relationship between encoded and embodied sounds.

Furthermore:

• Sound is often abstract to game designers, graphical artists and programmers, due to a lack of consistent and communicable terminology.

The overall purpose of this chapter is therefore to present a model (Figures 3 to 8) that solves these problems and makes it possible to plan the audio layering of computer games in terms of:

• The relationship between encoded and embodied sounds
• Cognitive load


Figure 3. The structure of the combined model of computer game audio (without the addition of specific sounds)

• The functionality of the sounds in computer games
• The relative loudness between sounds and
• The dominant frequency range of all the different sounds.

We also try to establish a consistent communicable terminology for the analysis and production of computer game audio.

The suggested model (Figure 3) combines Huiberts and van Tol's (2008) IEZA-framework (Figure 1) with Murch's (1998) conceptual model (Figure 2).

This combined model may be used to plan the sonic environment of computer games in a complete and balanced way, that is, balanced in relation to the sound spectrum available and complete in relation to the visual component of the game. The model constitutes a tool that provides a sonic richness and avoids "the logjam of sounds" (Murch, 1998). It may also be used to analyze computer game audio, and a number of analyses showing the benefits of this are provided. Through the use of this model, a sound designer will be able to clearly understand how different kinds of games emphasize different parts of the audio due to the genre and gameplay principles.

This chapter is structured as follows: We first present the two models for game audio that we have tried to combine, starting with the IEZA-framework for computer game audio and proceeding to Murch's conceptual model for film sound. This is followed by a presentation of the combined model, how it is structured and what kind of problems it can solve. In order to provide a more theoretical approach to the complexity of computer game audio, a discussion concerned with playing computer games and listening to the sounds, which is sustained by a case study, follows. We then provide 3 exemplifying analyses of existing computer games, F.E.A.R. (Monolith Productions, 2005), Warcraft III (Blizzard Enter-


tainment, 2002) and Legend of Zelda (Nintendo, 1987), to show how the combined model, as an analytical toolset, may be applied. Since this model is also suitable for the production of game audio, we then provide an example of how to actually use it as a production toolset. The final section is a summary of this chapter and our concluding thoughts.

THE IEZA-FRAMEWORK

Although we show that sound is closely related to immersion, most literature on game audio does not deal with fundamental questions, such as those related to what game audio really is, what it consists of and what makes it function in games.

It is striking that in this emerging field, theory on game audio is still rather scarce. While most literature focuses on the production and implementation of game audio, like recording techniques and programming of sound engines, surprisingly little has been written in the field of ludology about the structure and composition of game audio. (Huiberts & van Tol, 2008, p. 1)

Huiberts and van Tol were looking for a functional and coherent framework to use for the study of game audio and examined different categorization methods, from games and films respectively. However, they found that none provided any sensible information about the organization and functionality of the audio. This is a problem since the functionality of sound is essential to computer games. While this model, in its original form, does not specifically discuss the semantics of sounds in a detailed way, our combined model emphasizes this important issue.3

Huiberts and van Tol propose that a more coherent way to categorize the audio in a game should also include the function, role and properties of the different sounds. They therefore developed the IEZA-framework (Figure 1) for the categorization and planning of audio in computer games. The IEZA-framework consists of 4 categories:

1. Interface: Sounds related to the game's interface. Interface sounds are non-diegetic and belong to the game as a system.
2. Effect: Sounds directly or indirectly triggered by the player's actions. The sounds of effects are diegetic and the result of activity within the game environment.
3. Zone: Sounds related to the game environment. Zone sounds are diegetic and belong to the setting.
4. Affect: Sounds outside the game environment, mainly intended to set the mood. The sounds of affects are non-diegetic and often used to create anticipation.

The 4 categories are divided into 2 axes in a cross pattern: diegetic versus non-diegetic in the vertical axis and activity versus setting in the horizontal axis. The terms diegetic and non-diegetic are also very often used in film theory (Bordwell & Thompson, 1994; Bordwell & Thompson, 2001; Chion, 1994; Wilhelmsson, 2001) and diversify the environment inside the movie/game, that is, the diegesis, versus the system that carries this world inside the movie/game, that is, the non-diegetic (Cunningham, Grout, & Picking, 2011; Jørgensen, 2011). The IEZA-framework makes a clear distinction between the sounds that belong inside the game environment, the Zone and the Effect sounds, and the sounds that belong to the system as such, the Affect and Interface sounds. The horizontal axis places the sounds on a scale of setting versus activity. The Zone and Affect sounds belong to the setting of the game and the Effect and Interface sounds to the activities during gameplay. This is a good starting point for understanding how computer game audio may be categorized in accordance with its functionality within the sonic environment of a specific game. We agree with Huiberts and van Tol (2008) that this structure enables the IEZA-framework to go deeper than other similar frameworks. We have used the IEZA-


framework successfully in game audio courses at the University of Skövde with good and promising results. Nevertheless, we have also realized that the model does not cover all the important issues with regard to creating a sonic richness and at the same time avoiding the smearing of sound all over the sonic environment. The key problem is that the IEZA-framework does not, in itself, produce a visualization of the cognitive load, the relation between the semantic value of different sounds, the relation between encoded and embodied sounds, the dominant frequencies of a sound file or its loudness. Combined with Murch's (1998) conceptual model, the IEZA-framework can be part of a more elaborate tool for the production and the analysis of computer games. We have now covered the first node of our combined model and it is time to take a closer look at the second node: Murch's conceptual model of film sound.

MURCH'S CONCEPTUAL MODEL

One central point made by Murch in his work on the conceptual model (1998) is that just as audible sound may be placed on a scale ranging from approximately 20 Hz to 20,000 Hz, a sound may also be placed on a conceptual scale from Encoded to Embodied covering a spectrum from speech to music via sound effects in order to avoid a "logjam" of sounds. This dimension of film sound is the reason for our choice of Murch's conceptual model as the second node of our combined model of computer game audio. The IEZA-framework does not, in itself, categorize the different sounds on a scale from encoded to embodied, and no references to Murch's conceptual model of film sound are made in Huiberts and van Tol's article (2008).

Example from Murch (1998):

1. Violet – Dialogue
2. Cyan/Green – Linguistic/Rhythmic Effects (e.g. footsteps, door knocks etc.)
3. Yellow – Equally Balanced Effects
4. Orange – Musical Effects (e.g. atmospheric tonalities)
5. Red – Music.

Before addressing Murch's conceptual model we need to elaborate the statement that humans are biased towards listening for voices (Chion, 1994, p. 6). Chion states that: "Sound in film is, above all, voco- and verbocentric because human beings in their habitual behavior are as well" (p. 6). He suggests 3 different listening modes: causal, semantic and reduced listening. We first listen in order to identify the cause of a sound (causal listening) and, when identified, we listen to find the meaning of the sound (semantic listening). Reduced listening is a special case that is not discussed in this chapter.

Therefore, what is Chion's suggestion about how listening to a cinematic soundtrack works with regard to the 3 different types of sound, that is, speech, effects, and music?

If the scene has dialogue, our hearing analyzes the vocal flow into sentences, words—hence, linguistic units. Our perceptual breakdown of noises will proceed by distinguishing sound events, the more easily if there are isolated sounds. For a piece of music we identify the melodies, themes, and rhythmic patterns, to the extent that our musical training permits. In other words, we hear as usual, in units not specific to cinema that depend entirely on the type of sound and the chosen level of listening (semantic, causal, reduced).

The same thing obtains if we are obliged to separate out sounds in the superimposition and not in their succession. In order to do so we draw on a multitude of indices and levels of listening: differentiating masses and acoustic qualities, doing causal listening, and so on. (Chion, 1994, p. 45)
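To make the combination of the two frameworks concrete, each sound can be tagged with an IEZA category, a position on Murch's encoded-to-embodied (violet-to-red) scale, a dominant frequency, and a relative loudness. The following Python sketch is illustrative only and is not drawn from the chapter: the names, the 0.0-1.0 encoding scale, and the thresholds are assumptions made here. It also includes a rough check that flags pairs of sounds whose values fall within the same small range and therefore risk fusing into one.

```python
from dataclasses import dataclass
from enum import Enum

class IEZA(Enum):
    INTERFACE = "Interface"  # non-diegetic, activity
    EFFECT = "Effect"        # diegetic, activity
    ZONE = "Zone"            # diegetic, setting
    AFFECT = "Affect"        # non-diegetic, setting

@dataclass
class SoundDescriptor:
    name: str
    category: IEZA
    encoding: float      # 0.0 = encoded (speech/violet) ... 1.0 = embodied (music/red)
    dominant_hz: float   # dominant frequency of the sound file
    loudness_db: float   # relative loudness in the mix

def may_fuse(a, b, hz_ratio=1.25, db_window=6.0, encoding_window=0.2):
    """Flag two sounds as likely to blur into one when their dominant
    frequency, relative loudness, and conceptual encoding all lie
    within the same small range (thresholds are illustrative only)."""
    close_hz = max(a.dominant_hz, b.dominant_hz) <= hz_ratio * min(a.dominant_hz, b.dominant_hz)
    close_db = abs(a.loudness_db - b.loudness_db) <= db_window
    close_encoding = abs(a.encoding - b.encoding) <= encoding_window
    return close_hz and close_db and close_encoding

# Hypothetical example sounds.
rifle_a = SoundDescriptor("rifle A", IEZA.EFFECT, 0.35, 1200.0, -10.0)
rifle_b = SoundDescriptor("rifle B", IEZA.EFFECT, 0.30, 1100.0, -12.0)
ambience = SoundDescriptor("wind drone", IEZA.ZONE, 0.80, 120.0, -24.0)
```

Here `may_fuse(rifle_a, rifle_b)` is true while `may_fuse(rifle_a, ambience)` is false: two mid-range effect sounds sharing loudness and frequency range blur together, whereas a quiet low-frequency zone sound remains distinct from them.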


What, then, does Chion's description mean? Voco- and verbocentrism relates to 2 of the 3 listening modes suggested by Chion (1994), that is, causal listening and semantic listening. In Chion's work, the part on semantic listening is very brief and only discusses semantic value in relation to a code or spoken language. However, it is fruitful to also use this concept in relation to the system of sounds that a given movie or a given game puts forth, which may be understood as a semiotic system consisting of sound signs. Within such a system the sounds are part of the communication of the environment. This line of thought is also found in Murch's conceptual model.

According to Murch (1998), the clearest example of encoded sound is speech and the clearest example of embodied sound is music. Furthermore, since the human brain normally divides the processing of sound (and other stimuli) between the left and right side of the brain, we are able to discern 5 different layers of sound simultaneously, if they are evenly spread on the conceptual spectrum from encoded/violet to embodied/red. Murch (1998) provides a number of practice-based examples and problems from his work on the film Apocalypse Now! (Coppola, 1979):

[…] it appeared to be caused by having six layers of sound, and six layers is essentially the same as sixteen, or sixty: I had passed a threshold beyond which the sounds congeal into a new singularity - dense noise in which a fragment or two can perhaps be distinguished, but not the developmental lines of the layers themselves. With six layers, I had achieved Density, but at the expense of Clarity.

What I did as a result was to restrict the layers for that section of film to a maximum of five. By luck or by design, I could do this because my sounds were spread evenly across the conceptual spectrum.

Murch's problem in this case concerned the 6 concurrent layers described below:

1. Dialogue (violet)
2. Small arms fire (blue-green 'words' which say "Shot! Shot! Shot!")
3. Explosions (yellow "kettle drums" with content)
4. Footsteps and miscellaneous (blue to orange)
5. Helicopters (orange music-like drones)
6. Valkyries music (red).

Something had to be sacrificed whilst maintaining density and clarity and Murch therefore decided to omit the music and have a five-layer soundtrack consisting of:

1. Dialogue ("I'm not going! I'm not going!")
2. Other voices, shouts, etc.
3. Helicopters
4. AK-47s and M-16s
5. Mortar fire.

In Murch's (1998) example, the instances of "small arms fire" are effect sounds with a semantic value that Murch calls "blue-green 'words' which say Shot! Shot! Shot!". Firing a gun in a game would typically result in a direct response sound-wise. The player would also probably anticipate such feedback. In addition, firing a gun would conceptually evoke a sound and some kind of visual response as well. Sound reveals something about the environment. In this case, it signals the presence of guns and a potential danger; it is a sign that denotes clear and present danger. Humans tend to seek meaning and structure in and from the surrounding environment. In this case, we try to identify the source of a sound and what it might mean in the present context. We have used both causal and semantic listening and found a specific sound among others in the concurrent audio layers. The sound as such is dense and clear at the same time since it is carefully planned to occupy a specific frequency range and to develop within a specific part of the dynamic range.

Is Murch on the right track with his conceptual model and his conclusion that five concurrent


layers of sound is the maximum for obtaining density and clarity? Murch's conceptual model corresponds well to Miller's (1956) The Magical Number Seven, Plus or Minus Two: Some Limits on Our Capacity for Processing Information. As humans, we have limitations with regard to processing data. Miller (1956) discussed this in terms of bits and chunks:

If the human observer is a reasonable kind of communication system, then when we increase the amount of input information the transmitted information will increase at first and will eventually level off at some asymptotic value. This asymptotic value we take to be the channel capacity of the observer: it represents the greatest amount of information that he can give us about the stimulus on the basis of an absolute judgment. The channel capacity is the upper limit on the extent to which the observer can match his responses to the stimuli we give him.

He used Pollack's (1952, 1953) work on auditory displays to discuss and explain absolute judgment of unidimensional stimuli, which clearly showed the channel capacity for pitch to be "2.5 bits which corresponds to about six equally likely alternatives" (Miller, 1956). It is interesting to note that sound was the focus for this groundbreaking work on the human capacity for processing information.

The combination of moving images and sonic environment makes up the setting in which the actions of a movie or a game take place. Watching a movie without any sound added is often somewhat dull. Our experience is that an audience trying to become immersed in old silent movies without any preserved soundtrack grows bored and separated from the events on the screen. This is in accordance with Walter Ong's (1982/90) remarks on the bipolarity of sight and hearing, which we later elaborate in the discussion about playing computer games and listening to the sounds. Adding a musical soundtrack heightens the immersion radically, and even an old film such as The Phantom Chariot (Sjöström, 1921), with a musical soundtrack composed by Matti Bye in 1998, becomes an interesting movie. This clearly exemplifies that sound, even if it is only music, has the effect of including the audience in the environment and that the moving images do not, in themselves, have the same desired effect, which supports Ong's claim on the bipolarity of vision and hearing. Only music… well, of course, music should not be solely considered as embodied rather than encoded. Music is a plethora of systems. It can be narrative and it contains many culturally dependent codes. However, Murch's point is that music works rapidly and is usually aimed more at our emotive rather than our intellectual response. There are, of course, differences between individuals. "For a piece of music we identify the melodies, themes, and rhythmic patterns, to the extent that our musical training permits" (Chion, 1994, p. 45). In the case of Bye's musical score for The Phantom Chariot, much of the music mimics other kinds of sounds, such as the squeaks of the chariot's wheels, and therefore the music is also clearly semantic. Bye's music can very well be said to make use of a scale of sounds from encoded to embodied that comprises the soundtrack.

Thus far we can conclude that Murch's model fits well with the idea of an upper limit on simultaneous processing capacity, an idea that has been thoroughly investigated since at least 1956. Furthermore, the conceptual model he suggests makes it possible to consider sound on a scale spanning from encoded to embodied. This in turn implies that if such a scale, spanning from encoded to embodied sound, were to be used in combination with the IEZA-framework, it would be much easier to structure a sonic environment for a computer game. We also have a new set of parameters that add content to the sound categories suggested by the IEZA-framework. This content is the level of meaning a specific sound carries. Meaning, or semantic value, is not only carried by sonic signs such as the spoken words, utter-

106
A Combined Model for the Structuring of Computer Game Audio

ances and so on of language production, although language and its use is a prototypical example of the highly encoded sounds which Murch's model emphasizes.

THE COMBINED MODEL FOR GAME AUDIO

This section introduces the different parts of the combined model for the layering of computer game audio. The combined model makes it possible to categorize the different sounds for any part of a game in a number of ways. Such a categorization could span from relative dynamic range (dominant frequency areas, "encoded sound" versus "embodied sound" (Murch, 1998)) to whether a sound belongs to the diegesis of the game, is part of the interface, belongs to the activity of playing, or belongs to the setting in general. If, for example, many "encoded sounds" are used, such as spoken language in a game, it is necessary to be attentive to the total sonic environment in which these "encoded sounds" take place and to plan for an acoustic niche for the dialogue, with few interfering sounds played simultaneously within the same frequency span. If many "embodied sounds" are needed, such as music combined with ambient sounds designating the environment, it will be necessary to make them work together by shaping the sounds to fit and allow each other concurrent presence.

As Figures 1 to 3 above show, we have taken the basic differentiation of game audio into Interface, Effect, Zone and Affect sounds from the IEZA-framework. We have also used, from the original IEZA-framework, the horizontal axis that differentiates sounds on the setting versus activity scale and the vertical axis that describes sound as diegetic or non-diegetic. The IEZA-framework is intact within our model; Murch's conceptual model has, however, been visually adapted. The centre of the circle equates to the left-hand foot of Murch's arch (violet/encoded) and, moving away from the centre, Murch's spectrum is then traversed until the periphery of the circle, which equates to red/embodied. The more central a sound is placed, the higher its level of encoding; the more peripheral a sound is, the lower its level of encoding. This is a clear difference from both the IEZA-framework, which does not itself allow such a visual differentiation, and Murch's conceptual model, which does not, apart from Effects, place sounds in specific categories (such as Interface, Zone and Affect), nor place them on the vertical axis of diegetic versus non-diegetic or the horizontal axis of setting versus activity. Murch does write about the setting versus activity scale, but his conceptual model does not have a structure that visualizes this aspect clearly. Combining these two models, that is, the IEZA-framework and the conceptual model, will make it easier to understand beforehand, and in a more detailed manner, what is happening in the sound environment. The sound designer does not need to use actual sounds: They may be derived from a game script or story board prior to production.4 If the sound designer has lots of sounds in the centre of the model, she is most likely to produce a cognitive overload for the player because the sounds in the centre are encoded and need more intellectual processing to be meaningful and distinct. Since controlling dominant frequencies is one way to distinguish one sound stream from another, we have chosen to make this quality of sound visible within the model. We have also chosen to use 3 basic primitives, as Figures 4 and 5 illustrate:

• A circle = a sound in which the bass frequencies are dominant
• A square = a sound in which the midrange frequencies are dominant
• A triangle = a sound in which the treble/high frequencies are dominant.

These 3 basic primitives were chosen since they seemed natural, but this is not to say that


Figure 4. The different loudness primitives used in the combined model ranging from very low to very
loud. Midway should be about normal speech level, assuming that the game’s sound is somewhat dynamic.
The scale is, in other words, relative to the specific game’s loudest and quietest sounds

Figure 5. F.E.A.R. analysis example
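The categorization that Figures 4 and 5 visualize can also be kept as plain data while a sound environment is being planned. The sketch below is our own illustration of the bookkeeping the combined model implies, not code from this chapter: the field names, the 0.0 to 1.0 encoding scale, and the example values are assumptions made for the sake of the example.

```python
from dataclasses import dataclass
from enum import Enum

class Category(Enum):
    """The four sound categories of the IEZA-framework."""
    INTERFACE = "Interface"
    EFFECT = "Effect"
    ZONE = "Zone"
    AFFECT = "Affect"

class Band(Enum):
    """Dominant-frequency primitives: circle, square, triangle."""
    BASS = "circle"
    MIDRANGE = "square"
    TREBLE = "triangle"

@dataclass
class ModelSound:
    name: str
    category: Category   # IEZA placement
    diegetic: bool       # vertical axis: diegetic vs. non-diegetic
    activity: bool       # horizontal axis: activity vs. setting
    encoding: float      # 1.0 = fully encoded, 0.0 = fully embodied
    band: Band           # dominant frequency area
    loudness: int        # relative size, 1 (very low) to 5 (very loud)

    def radius(self, circle_radius: float = 1.0) -> float:
        """Radial placement in the model's circle: encoded sounds
        sit at the centre, embodied sounds at the periphery."""
        return (1.0 - self.encoding) * circle_radius

dialogue = ModelSound("Dialogue", Category.EFFECT, True, True, 0.75, Band.MIDRANGE, 3)
crickets = ModelSound("Crickets", Category.ZONE, True, False, 0.25, Band.TREBLE, 1)
print(dialogue.radius(), crickets.radius())  # 0.25 0.75
```

Placing encoded sounds at the centre and embodied sounds at the periphery then reduces to a single linear mapping, which makes diagrams of this kind straightforward to generate from a sound list or a script breakdown.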

they would be the natural shapes to represent these qualities: They are more likely the result of a process of cultural connotation than anything else. A bass sound does not have the same sharpness as a midrange sound. It is often hard to hear from what place a bass sound originates, which is why a circle was chosen. However, a midrange sound has a high degree of definition and is distinct, which is the reason a square was chosen. Furthermore, a triangle was chosen for the treble sound, which


is sharp and often pointy.5 But we also need yet another dimension and that is loudness. These primitives have therefore been made in 5 different sizes designating their relative loudness: A larger primitive represents a louder sound.

The model is able to show the following aspects:

• The amount and clustering of encoded sounds
• The amount and clustering of embodied sounds
• The amount and clustering of diegetic sounds
• The amount and clustering of non-diegetic sounds
• The amount and clustering of Interface sounds
• The amount and clustering of Effect sounds
• The amount and clustering of Zone sounds
• The amount and clustering of Affect sounds
• The relative loudness between sounds
• The dominant frequencies in each sound.

The parameters above help the sound designer avoid cognitive overload due to a logjam of sounds. We have now described how the combined model is structured. Before explaining how the sound designer can use it to analyze and/or structure the sonic environments in computer games, we need to elaborate on the discussion of the process of playing computer games and listening to the sonic component of the environment as part of game playing.

Playing Computer Games and Listening to the Sounds

What is game playing? How does a player act and why does she act the way she does? What role does sound have in the playing of a computer game? Can sound be used to manipulate the player into acting in specific ways?

Playing a computer game involves the manipulation of objects within the game environment in a dynamic, sequential flow of events. During play, the player will be processing a lot of data that will need to be made meaningful in order to proceed within the game environment and the game as a system. The player will need to identify the data and turn it into information by categorizing the graphical as well as the sound elements. Objects in a game may be connected to a corresponding sound that carries meaning, that is, there is a semantic level in the sounds and the graphics that is fundamental to gameplay as such. Scripted sounds, or series of sounds, on the other hand, are the result of a player's position within the environment rather than a specific gameplay action taken by the player. For example, a player reaches a certain point in the game environment and a sound starts. The player has not taken any conscious action and hence does not anticipate any specific feedback. The conceptual spectrum induces different kinds of anticipation. You might very well use scripted sounds to make the player take action, for example, by letting the player walk in a narrow corridor with glass walls on one side. When the player reaches a specific point of observation, you script a sound event to occur (Gibson, 1986), also adding a sudden motion seen through the glass wall. If the gameplay is based on survival/horror, you might induce the player to waste a number of bullets on the supposition that the sound and the motion imply danger is present, that is, you scare her into action through a scripted event.

In a film everything is placed in a comparable scripted order when the editing process has been concluded and the film is completed. However, in a computer game, the total environment must support the gameplay. It may contain cut scenes and scripted events as well as those based on player action. In the first case you have the same control as in traditional movies, while in the second you preplan an event to occur at a specific point in the game environment. For the last case, a


number of options are available soundwise that can be supplied to the player through a database. As an example, you can, and probably should, limit the number of weapons accessible to the player. Every single weapon should be discernible from any other weapon through its sound in order to enhance the semantic value. Our point here is that the player is supplied with a number of objects with which to play the game. Many of them produce sound effects within the diegesis of the game, such as shots. As this example shows, the IEZA-framework is useful in this part of the process, discerning what kind of sound belongs where in the game's structure. You have some, but not total, control over when, why and where the player will use these play objects.

In the above, our focus is on the sonic environment of computer games and the problem of balancing the sounds in relation to each other. However, very few games consist of sound alone (notable examples can be found on websites such as http://www.audiogames.net/). What happens, then, when a game consisting of sound and graphical elements is played? What does sound provide to this experience? In the following section we discuss a case study that relates to this issue in general and the use of sound as a means of directing the player in particular.

According to Ong (1982/90), vision and hearing have a basic bipolarity. The aim of the following paragraph is to discuss the relation between vision and hearing, as well as Ong's suggested bipolarity of these two and how this relates to immersion. Vision separates us from the environment, making the limits of our bodily containers protrude, whereas sound integrates us with the environment, blurring the border between the container of the self and the adjacent environment. Ong's theory, which is concerned with the differences between written and spoken language, might seem odd to use in relation to a model for the analysis and production of computer game audio. Nevertheless, we find his remark about this bipolarity highly relevant with regard to understanding the function that sound has in audiovisual constructions such as computer games. Think about it: in order to fully take in an environment through vision we need to move around and turn our eyes towards what we would like to see (cf. Ong, 1982/90; Gibson, 1986). Ong actually refers to the immersive effect that high-fidelity audio reproduction accentuates. Sight is limiting, but hearing is not in the same way. We can hear what is behind us and then turn around to see it. If sound integrates rather than separates us from the surrounding environment, it would seem reasonable that sound and immersion have a strong relationship. Integration through sound might lead to immersion. Hearing is, on the other hand, also a selective process. We may, to some extent, filter out uninteresting and disturbing sounds. A construed audiovisual environment is a prefiltered one into which sound and images have been put through the selective processes of their creators. We therefore discuss hearing, vision, the visual, and affordances (Gibson, 1977, 1986) in relation to the sonic environments of computer games.

To some extent, the bipolarity of hearing and vision is innate. Biologically, we develop hearing before we can see. A human fetus can normally perceive sound from week 15 after conception and the ears are usually fully developed by week 24. The fetus is surrounded by amniotic fluid and this underwater environment is an immersive one that completely encloses us. We are immersed and can feel touch from week ten. In fact, one of the primary definitions of immersion used in the context of computer games clearly connects the concept of immersion with being under water (Murray, 1997, pp. 98-99). Hearing the environment precedes seeing it, in terms of how these senses develop from conception, and feeling the environment precedes hearing it. In the womb, movement is restricted and, as newborns, we have no locomotion and must be transported by others. Sight is still limited and objects in the visual field need to be very close to be in sharp focus,


even if the eyes themselves are more or less fully developed from week twenty-five. However, we can hear and recognize sound such as the voices of our parents or melodies from a computer game prior to birth and respond to such auditory stimuli by kicking and moving around. It can be asserted that sound activates the fetus. When we grow up, hearing is still physically affective and may induce both conscious and reflexive physical responses. Rapid and loud sounds may be frightening while slow and soft ones may be relaxing. When listening to the long sequence of the breathing sound in the movie 2001: A Space Odyssey (Kubrick, 1968), it is our experience that it is almost impossible not to fall into the same rhythm and breathe in synchronization with the sound of the film. The bottom line is that hearing includes us in the environment, making us part of it rather than separating us from it. Listening is part of feeling immersed and immersion is a perceptual, body based experience.

Central to human perception and cognition is the configuration of the human body and its ability to move around within an environment. This is the basis for a number of theories on embodied and situated cognition, that is, how humans make meaning of the environment in which they are situated. We propose that the organization of sound within computer game environments would benefit from some basic insight into cognitive theory and the idea of basic level primacy (Lakoff, 1987; Lakoff & Johnson, 1999). A central claim in Lakoff and Johnson's work is that the objects we encounter may be understood from a perspective of superordinate, basic and subordinate levels, of which the basic level is the highest one that provides an object with an overall understandable form and a general user pattern. The concept of a chair is, for instance, at the basic level, whereas furniture is at a superordinate level and a specific red and white chair, made of steel and concrete, is at the subordinate level. The more detailed meaning we can assign to any given object the more subordinate it is. We do not have a common user pattern for furniture, nor could the whole category be described by one simple and understandable form. For example, a table and a cupboard are both pieces of furniture but they do not look the same or have the same schemata of use. Although there are good reasons to follow Ong's idea about the bipolarity of vision and hearing, we argue that sound might also be understood from the perspective suggested by Lakoff and Johnson. Furthermore, we constantly observe the environment through points of observation which include all our senses (Gibson, 1986). The human mind and the human body are not primarily separate units but make up complex systems of which the visual and auditory sensory systems are of great importance for our understanding of the world. The configuration of the human body has effects on human perception as well as human cognition. Gibson (1986) suggests a number of different kinds of vision which are based on the situations they are employed in:

• Snapshot vision: fixating a point and then exposing some other point momentarily
• Aperture vision: successive scanning of the visual stimuli
• Ambient vision: looking around by turning the head
• Ambulatory vision: looking around by moving towards objects.

In a real time strategy (RTS) game like Warcraft III (Blizzard Entertainment, 2002), the player has a larger visual field than the controlled characters she is commanding, enabling an overview of the diegetic environment using 3 of the 4 different types of vision. She can use snapshot vision to fixate a point, aperture vision to perform successive scanning, and ambulatory vision by moving towards objects. If the controlled avatar is turned around, the visual field, as such, does not rotate. This is the common practice in RTS games. The player may, to some extent, change the angle of the visual field which is also somewhat limited


by a highlighted ring. What lies beyond this ring is not visible. In order to see what is hidden, the player must enforce ambulatory vision, that is, move the controlled characters.

In an adventure game such as Legend of Zelda (Nintendo, 1987), the player's view is locked in a specific angle heightwise. The player needs to move the avatar towards the end of the visual field in order to reveal what lies beyond the framing of the diegetic environment, that is, she can use snapshot vision, aperture vision and ambulatory vision. In F.E.A.R. (Monolith Productions, 2005), which is a first-person shooter (FPS), the player, on the other hand, sees the world through the eyes of the avatar. An immobile observer, who cannot change the viewpoint within the game, can only use either snapshot vision or aperture vision within the visual field of the computer game environment. A game such as Myst (Cyan Worlds, 1993) works this way, as does Pac-Man (Namco, 1980), Space Invaders (Taito, 1978) and many other early games. Hence "The single, frozen field of view provides only impoverished information about the world […] The evidence suggests that visual awareness is in fact panoramic and does in fact persist during long acts of locomotion" (Gibson, 1986, p. 2). All the games mentioned above have sound to make the environment more connected to the act of playing and to achieve a more prominent and lifelike game environment. The sonic environment of Myst is quite elaborate for its time, being distributed on CD-ROM, which allows considerably more data than earlier games. In addition, more data capacity also meant comparably high audio resolution, that is, bit depth and sample rate. Pac-Man and Space Invaders used other kinds of technology in their original form, relying on the hardware rather than the software but, when ported to other platforms, such as consoles and PCs, they were kept quite close to the original limits soundwise.

If sound integrates us in the environment, as Ong (1982/90) proposes, and if sound and immersion are also related, we might also employ different kinds of listening to make the environment meaningful. As mentioned earlier, Chion (1994) suggests 3 different listening modes: causal, semantic and reduced listening. In addition to these, we suggest the following different kinds of listening when playing a computer game, which are analogous to Gibson's 4 kinds of vision:

• Snapshot listening: fixating a point and then shifting to some other point momentarily by filtering out all other sound sources
• Aperture listening: successive scanning of the audio stimuli
• Ambient listening: increasing the frequency range of the sound by turning the body towards its source for higher definition of the sound
• Ambulatory listening: listening by moving around and using sound as part of the navigation within the environment.

In order to support the idea of our 4 listening modes and their relation to Gibson's 4 modes of seeing, we briefly present a case study called In the Maze, which was a laboratory-based experiment conducted at the InGaMe Lab at the University of Skövde. This discussion also provides interesting connections to our combined model for game audio.

The case study was originally devised to investigate whether or not sound can be said to align with Gibson's (1977, 1986) ideas of affordances and, if so, whether sound stimuli would make certain locomotive patterns more probable than others.

The affordances of the environment are what it offers the animal, what it provides or furnishes, either for good or ill […] I mean by it something that refers to both the environment and the animal in a way no existing term does. It implies the complementarity of the animal and the environment. (Gibson, 1986, p. 127)


The game environment for our experiment was a hexagonal structure comprising a labyrinth of corridors that made it impossible to adopt a strategy based on always going left or always going right because that would only lead the test subject back to the starting point. We also tried to create a consistent environment, that is, a level consistent with reality and including sound that would match the visual environment. There were, however, differences in the sound played at specific parts in the corridors leading to intersections. The test subjects played the game wearing headphones. At certain points in the game's 4 levels, the game was scripted to play 2 different kinds of sounds in the right and left headphone speakers. The player did not at first know when such a scripted sound would occur but could turn back in the corridor and the sounds would be triggered again. That is to say, the first time a scripted sound played, it was not the result of any conscious game strategy formulated by the player. The difference between the sounds was that one kind of sound was meant to have the semantic value open and the other closed, at a basic level of categorization. In other words, we tried to propose a certain universal user pattern, to walk towards the open sound rather than the closed one. The basic intention of the test was that if a sound in the right speaker was designed to suggest "closed", the path to the right would lead to a dead end and vice versa. The only instructions the test subjects received were to play a game. By doing so, we introduced the idea that there ought to be some form of rules and ludus element rather than free playing activity, that is, paidia (Caillois, 1958/1961). The hypothesis was that with only rudimentary instructions the test subjects would need to identify the environment as a maze by exploring it using the 4 different types of vision that Gibson suggests, then devise a strategy for moving through the maze. The idea of our 4 modes of listening was not part of the hypothesis but a result deduced from these tests.

We collected several layers and types of data for this test:

• Video recordings of the gameplay session from the players' perspective
• Video recordings of the game player from 3 different angles: face on, from the left side and from above
• Sound recording of the player while playing, for the purpose of capturing spontaneous comments addressed to the game as a system6
• Video and audio recordings of semi-structured interviews after test sessions
• Video and audio recording of a replay of each player's session, in which they were given a chance to freely comment upon their own gameplay.

Several test subjects adopted an audio-based game strategy even if many were not actually aware of it. What we can deduce from the data collected is that many of the test subjects tried to follow sound to reach the end of each level in the labyrinth. We did put in a reversed level to examine whether there actually was audio that mattered with regard to choices. That is, one of the 4 levels had the sounds that signaled open leading to dead ends and vice versa. This level indicated a tendency for test subjects to really follow the sounds: they were confused when the pattern was changed. The data shows that a perceptual/cognitive set (Bugelski & Alampay, 1961; Wilhelmsson, 2001) may be constructed of audio and visual stimuli and that such a perceptual set may lead to the formation of a strategy to reach the end state of a game.

We used the game engine of Half-Life (Valve Corporation, 1998) for our case study test. Some of the test subjects identified the game as a Half-Life level very quickly, which, due to previous experience of Half-Life, in turn led to immediate speculation about what the game would provide. Half-Life is an FPS game based on the principle of Agôn (Caillois, 1958/1961). Test subjects with previous and deep experience of FPS games therefore presumed they would encounter enemies of some kind and quickly adopted a spe-


cific locomotive pattern. When the game began, they rapidly turned around to get an overview, that is, ambient vision; they also tried to watch their back at times and had a tendency to walk in a criss-cross pattern, indicating ambulatory vision. At times, criss-crossing led them to see more of one corridor, depending on which side they were walking or running when they reached an intersection. That is, if they were keeping to the right, they would see more of the corridor to the left and, in some cases, they preferred to move towards what they could see. Test subjects who had a great deal of experience with Half-Life or other similar first-person shooters probably also used snapshot vision to rapidly scan the environment. However, we had no means of measuring this and the only way we can observe the use of snapshot vision from our data is how the centre of the first-person perspective fluctuates. It is probably the case that subjects moving rapidly in the environment need to use snapshot vision due to their velocity but further tests would need to be conducted before jumping to conclusions. Furthermore, inexperienced players tended to always walk in the direction suggested by their starting orientation. The ME-FIRST orientation (Lakoff & Johnson, 1980; Wilhelmsson, 2001), as well as the experience of having a body and moving primarily forwards, overshadowed other possibilities of locomotion:

Since people typically function in an upright position, see and move frontward, spend most of their time performing actions, and view themselves as being basically good, we have a basis in our experience for viewing ourselves as more UP than DOWN, more FRONT than BACK, more ACTIVE than PASSIVE, more GOOD than BAD. (Lakoff & Johnson, 1980, p. 132)

The Game Ego manifestation (Wilhelmsson, 2001) presented the inexperienced player with a direction for walking or running that was not initially questioned. At the same time, the affordance of walk-ability is at play: You walk straight on because that is what you do in a corridor. "Moving objects generally receive a FRONT–BACK orientation so that the front is in the direction of motion (or in the canonical direction of motion, so that a car backing up retains its front)" (Lakoff & Johnson, 1980, p. 42). We can also conclude that the culture of play and prior familiarity with this kind of game environment had some influence on how the test subjects tried to move the Game Ego manifestation around within the game environment, and not only the visual (and sonic) affordances as such.

The results of the case study are, in essence, transferable to the suggested combined model and take into account the relation between vision and sound. In fact, the method used in the case study makes a good integral part of the combined model. Some of the test subjects were affected not only by the ambulatory listening but also by the ambulatory visual position within the game environment. It is important to bear this in mind, since most digital games consist of graphics and sound manifesting some kind of environment. Games are actions undertaken by players and these actions may be induced by sound and/or by sound and graphics in combination. The horizontal axis of the original IEZA-framework is the one that categorizes the game audio in terms of setting versus activity and here we have a clear connection between our test and the combined model. The case study provides the material for the analysis of the sound and image relation and the effect of locomotion, that is, it stresses the horizontal axis of the IEZA-framework that differentiates setting and action. The vertical axis of the IEZA-framework categorizes the game audio in terms of diegetic versus non-diegetic. The dynamics of the audiovisual environment and Ong's (1982/90) suggested bipolarity of vision and hearing can be understood at a deeper level using the combined model that takes into account both the cognitive load on the subject playing the game and the moving around within


Figure 6. Warcraft III analysis example. Numbers are collected from Table 2 as an imaginable snapshot
of sounds. The combined model visualizes the possible cognitive load in the audio layering. The more
central a sound is placed, the higher its level of encoding; the more peripheral a sound is, the lower its
level of encoding
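A snapshot like the one Figure 6 visualizes can also be tallied programmatically when checking a design for clustering at the encoded centre of the model. This is our own minimal sketch, not the chapter's method; the sample entries, the 0.7 centre threshold, and the limit of 3 central sounds are illustrative assumptions rather than values taken from Table 1.

```python
from collections import Counter

# Each entry: (name, dominant band, encoding level 0.0 to 1.0).
def tally_snapshot(snapshot, centre_threshold=0.7, max_central=3):
    """Count dominant-frequency primitives and warn when too many
    encoded sounds cluster at the centre of the combined model."""
    bands = Counter(band for _, band, _ in snapshot)
    central = [name for name, _, enc in snapshot if enc >= centre_threshold]
    overload = len(central) > max_central
    return bands, central, overload

snapshot = [
    ("Cutting wood", "midrange", 0.75),
    ("Our goldmine has collapsed", "midrange", 0.95),  # speech: highly encoded
    ("Background music", "bass", 0.25),
    ("Crickets", "treble", 0.1),
]
bands, central, overload = tally_snapshot(snapshot)
print(dict(bands))   # {'midrange': 2, 'bass': 1, 'treble': 1}
print(overload)      # False: only two sounds sit near the centre
```

Running such a tally over successive snapshots of a level gives the designer an early, automated warning of the cognitive-overload situations the model is meant to expose visually.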

the game environment, exploring its possibilities in a dynamic flow (Figures 3, 5, 6, 7, and 8).

USING THE COMBINED MODEL TO ANALYZE COMPUTER GAMES

In the following section, we provide 3 sample analyses using the combined model. The games used for these analyses are: F.E.A.R. (Monolith Productions, 2005); Warcraft III (Blizzard Entertainment, 2002); and Legend of Zelda (Nintendo, 1987). The aim of the analyses was to find each sound source possible within these games and then estimate the properties of the sounds. Not every single sound is included as it would have taken too long to find them all. Thus, for example, sound 25 in Table 2 "Hero uses magic" represents every sound created when a Hero in the game uses some kind of magic spell. In this way, the results are applicable to the combined model and are presented both as tables (Tables 1, 2, and 3) and as snapshots of the audio layering within the combined model (Figures 5, 6, and 7).

While employing the combined model, we found that the internal emphasis on the different parts of the IEZA-framework varies between different types of games, due to limitations of technology and genre conventions, which in turn depend on the gameplay and the relationship between player and game environment. For example, a first-person shooter or a shoot'em up game will emphasize the Effect sounds, and have fewer Zone sounds. A typical example of this is F.E.A.R. (Monolith Productions, 2005).

There are indeed a lot of yellow Effect sounds in the game F.E.A.R. (2005). Luckily, not all of these sounds are always played simultaneously. However, sometimes, many are played together, which results in a big wall of sound that is hard to make sense of. This can be used to emphasize chaos and, in the context of an action game, it might well be useful. Nevertheless, that would be its only use. With regard to the analysis, F.E.A.R.

A Combined Model for the Structuring of Computer Game Audio

Figure 7. Imaginable snapshot of sounds in Legend of Zelda

also has few interface sounds, probably due to the minimalistic graphical interface.

Warcraft III (2002) was chosen for analysis on the basis of the personal pre-understanding that its audio is well balanced and also complete in relation to the visual component of the game. In Figure 6, we used the sounds from Warcraft III to exemplify how to apply the combined model for the analysis of a specific game. Numbers are collected from Table 2 as a possible snapshot of sounds. The combined model visualizes the possible cognitive load in the audio layering. In addition, the model visualizes that there are 3 bass, 4 midrange, and 2 treble sounds. If there are too many sounds in the centre of the model, the cognitive load, in terms of encoded sounds, is higher. Not surprisingly, the midrange sounds are cyan (1 = Cutting wood) and violet (32 = “Our goldmine has collapsed”). Numbers 12, 17 and 18 are sound events connected to Effect and Activity and all are diegetic sounds. Sound 21 (Crickets) belongs to the Zone and is an ambient sound with a lot of treble. Sound 32 (“Our goldmine has collapsed”) originates from the non-diegetic Affect part of the sonic environment, as does sound 43, which is the background music.

Our analysis of Legend of Zelda (Nintendo, 1987) was mainly due to curiosity about how earlier games differ in the distribution of audio categories in relation to IEZA and Murch’s conceptual model. The Loudness and Frequency parameters were omitted in the analysis of Legend of Zelda. Due to the technical nature of the system on which Legend of Zelda is played, the number of possible simultaneous sounds is limited to only a few. The dynamic range is also very narrow, therefore all the shapes are sparse and equally sized. The system, as such, does not support spoken language, which is why there are no encoded sounds.
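The per-sound properties estimated in these analyses map naturally onto a small record type. The following Python sketch mirrors the columns of Tables 1 to 3 (the `SoundEvent` class and its field names are our own, invented for illustration), populated with two rows from the Warcraft III analysis:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SoundEvent:
    """One row of an analysis table (cf. Tables 1 to 3)."""
    name: str            # sound event, e.g. "Cutting wood"
    state: str           # "In-game" or "Menu"
    diegetic: bool
    ieza: str            # "Interface", "Effect", "Zone" or "Affect"
    color: str           # position in Murch's encoded/embodied spectrum
    origin: str          # "Character", "Object", "Ambience", "Narrator", "Orchestra"
    loudness: Optional[int] = None        # relative loudness, 1 (quiet) to 5 (loud)
    frequency_band: Optional[str] = None  # "Low", "Middle", "High" (omitted for Zelda)

# Two entries taken from the Warcraft III analysis (Table 2)
cutting_wood = SoundEvent("Cutting wood", "In-game", True,
                          "Effect", "Cyan", "Character", 2, "Middle")
goldmine = SoundEvent("Our goldmine has collapsed", "In-game", False,
                      "Affect", "Violet", "Narrator", 3, "Middle")
```

Collections of such records are what the snapshots in Figures 5 to 8 plot: the IEZA category and colour give a sound's position in the combined model, while the loudness value gives the size of its primitive.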


Table 1. F.E.A.R. analysis

F.E.A.R. (2005)
Sound Event State Diegetic? IEZA Color Origin Loudness Frequency Band
1 Weapon reload In-game Yes Effect Yellow Character 3 Middle
2 Clothes sound In-game Yes Effect Yellow Character 2 Middle
3 Player footsteps In-game Yes Effect Cyan Character 2 Middle
4 Enemy footsteps In-game Yes Effect Cyan Character 2 Middle
5 Landing after jump In-game Yes Effect Yellow Character 3 Middle
6 In-game music In-game No Effect Red “Orchestra” 2 Low
7 Enemy gunfire In-game Yes Effect Cyan Character 4 Middle
8 Fire gun In-game Yes Effect Cyan Character 4 Middle
9 Glass shatter In-game Yes Zone Yellow Object 2 High
10 Empty shell bounce In-game Yes Effect Cyan Object 2 High
11 Enter slow motion mode In-game No Affect Orange “Narrator” 2 Low
12 Exit slow motion mode In-game No Affect Orange “Narrator” 2 Low
13 Enemy radio chatter In-game Yes Effect Violet Character 3 Middle
14 Friendly radio chatter In-game Yes Effect Violet Character 3 Middle
15 Radio noise In-game Yes Effect Orange Object 2 Middle
16 Throw grenade In-game Yes Effect Yellow Character 1 Middle
17 Grenade bouncing In-game Yes Effect Cyan Object 2 High
18 Grenade explosion In-game Yes Effect Yellow Object 5 Low
19 Enemy talk In-game Yes Effect Violet Character 3 Middle
20 Change weapon In-game Yes Effect Yellow Object 3 Middle
21 Enemy dies In-game Yes Effect Violet Character 3 Middle
22 Breaking environment In-game Yes Zone Yellow Object 2 Middle
23 Pause game In-game No Interface Orange “Narrator” 1 High
24 Unpause game In-game No Interface Orange “Narrator” 1 High
25 Ghost talking In-game Yes Effect Violet Character 2 Middle
26 Picking up weapon In-game Yes Effect Yellow Object 2 Middle
27 Picking up grenade In-game Yes Effect Yellow Object 2 Middle
29 Throw weapon In-game Yes Effect Yellow Object 3 Middle
30 Pick up health booster In-game No Interface Orange “Narrator” 3 Middle
31 Pick up reflex booster In-game No Interface Orange “Narrator” 3 Middle
32 Pick up medkit In-game Yes Effect Yellow Object 2 Middle
33 Using medkit In-game Yes Effect Orange Object 2 Middle
28 Menu music Menu No Affect Red “Orchestra” 2 Low
34 Menu selection Menu No Interface Orange “Narrator” 1 High
35 Menu accept Menu No Interface Orange “Narrator” 1 High
36 Menu go back Menu No Interface Orange “Narrator” 1 High
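The dominance of Effect sounds in F.E.A.R. can be made explicit by tallying the IEZA column of Table 1. A minimal sketch (the counts are read off the 32 in-game rows of Table 1; the code itself is illustrative):

```python
from collections import Counter

# IEZA categories of the 32 in-game sound events in Table 1 (F.E.A.R.)
ieza_column = (
    ["Effect"] * 24        # weapons, footsteps, chatter, pickups, ...
    + ["Zone"] * 2         # glass shatter, breaking environment
    + ["Affect"] * 2       # enter/exit slow-motion mode
    + ["Interface"] * 4    # pause, unpause, health/reflex boosters
)

tally = Counter(ieza_column)
dominant, count = tally.most_common(1)[0]
share = count / len(ieza_column)  # fraction of in-game events that are Effect sounds
```

Three quarters of the in-game sound events are Effect sounds, consistent with the observation that F.E.A.R.'s audio sits mostly in the diegetic activity quarter of the model.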


Table 2. Warcraft III analysis

Warcraft III
Sound Event State Diegetic? IEZA Color Origin Loudness Frequency Band
1 Cutting wood In-game Yes Effect Cyan Character 2 Middle
2 “I can’t build there” In-game Yes Effect Violet Character 3 Middle
3 Insufficient resources In-game Yes Affect Violet “Narrator” 3 Middle
4 “Awaiting order” In-game Yes Effect Violet Character 3 Middle
5 “Job’s done” In-game Yes Effect Violet Character 3 Middle
7 “Accepting order” In-game Yes Effect Violet Character 3 Middle
8 New Unit Available In-game Yes Effect Violet Character 3 Middle
9 Unit attack order In-game Yes Effect Violet Character 3 Middle
10 Click on building In-game Yes Effect Yellow Object 2 Low/Middle
11 Building construction In-game Yes Effect Yellow Object 2 Low/Middle
12 Goldmine collapse In-game Yes Effect Yellow Object 4 Low/Middle
13 Building collapse In-game Yes Effect Yellow Object 4 Low/Middle
14 Building on fire In-game Yes Effect Yellow Object 2 Middle
15 Building attacked In-game Yes Effect Yellow Object 2 Middle
16 Click on “critter” In-game Yes Effect Yellow Character 1 Middle
17 Unit Attacked In-game Yes Effect Yellow Character 2 Low
18 Falling tree In-game Yes Effect Yellow Object 4 Low
19 Singing birds In-game Yes Zone Yellow Ambience 1 High
20 Ambient noise In-game Yes Zone Yellow Ambience 1 High
21 Crickets In-game Yes Zone Yellow Ambience 1 High
22 Frog In-game Yes Zone Yellow Ambience 1 Middle
23 Cicadas In-game Yes Zone Yellow Ambience 1 High
24 Owl In-game Yes Zone Yellow Ambience 1 Middle
25 Hero uses magic In-game Yes Effect Orange Character 2 All
26 Unit constructing In-game Yes Effect Cyan Character 2 Short
27 Unit dies In-game Yes Effect Cyan Character 3 Middle
28 “Victory” In-game No Affect Violet “Narrator” 4 Low/Middle
29 “Defeat” In-game No Affect Violet “Narrator” 4 Low/Middle
30 “Research complete” In-game No Affect Violet “Narrator” 3 Middle
31 “Upgrade complete” In-game No Affect Violet “Narrator” 3 Middle
32 “Our goldmine has collapsed” In-game No Affect Violet “Narrator” 3 Middle
33 “Our hero has fallen” In-game No Affect Violet “Narrator” 3 Middle
34 “Our forces are under attack” In-game No Affect Violet “Narrator” 3 Middle
35 Rooster In-game No Affect Yellow “Narrator” 2 Middle
36 Wolf Howl In-game No Affect Yellow “Narrator” 2 Low
37 Set rally point In-game No Affect Yellow “Narrator” 2 Low
38 Set building spot In-game No Affect Yellow “Narrator” 2 Low


39 Unavailable Sound In-game No Interface Yellow “Narrator” 3 Middle
40 Click upper GUI In-game No Interface Orange “Narrator” 1 Middle
41 Mini map signal In-game No Affect Orange “Narrator” 1 High
42 Click lower GUI In-game No Interface Orange “Narrator” 1 Middle
43 Background Music In-game No Affect Red “Orchestra” 2 Low/Middle
44 Meteor Falling Menu Yes Effect Yellow Object 2 High
45 Meteor Impact Menu Yes Effect Yellow Object 3 Low
46 Rain Menu Yes Zone Yellow Ambience 1 High
47 Thunder Menu Yes Zone Yellow Ambience 2 Low
48 Click Menu Menu No Interface Orange “Narrator” 2 Middle
49 Menu switch Menu No Interface Orange “Narrator” 2 High
50 Menu Music Menu No Affect Red “Orchestra” 2 Low/Middle
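A snapshot like the one plotted in Figure 6 can also be summarized numerically: count how the simultaneous sounds distribute across frequency bands and how many are encoded (violet) speech sounds, since these carry the highest cognitive load. A sketch using rows from Table 2 (this particular selection is illustrative, not necessarily the exact snapshot of Figure 6):

```python
from collections import Counter

# (sound number, event, frequency band, color) taken from Table 2
snapshot = [
    (1,  "Cutting wood",               "Middle",     "Cyan"),
    (12, "Goldmine collapse",          "Low/Middle", "Yellow"),
    (17, "Unit Attacked",              "Low",        "Yellow"),
    (18, "Falling tree",               "Low",        "Yellow"),
    (21, "Crickets",                   "High",       "Yellow"),
    (32, "Our goldmine has collapsed", "Middle",     "Violet"),
    (43, "Background Music",           "Low/Middle", "Red"),
]

# Distribution across frequency bands
bands = Counter(band for _, _, band, _ in snapshot)

# Encoded (violet) sounds demand the most decoding effort; few of them
# at once keeps the centre of the combined model uncrowded.
encoded = [name for _, name, _, color in snapshot if color == "Violet"]
```

In this snapshot only one sound competes for full linguistic decoding, so the cognitive load stays manageable even though seven sounds play at once.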

The 3 sample analyses clearly show that the emphasis on specific parts of the IEZA-framework shifts depending on what genre and what kind of platform the game belongs to. The snapshots made with the combined model show how sounds are clustered and provide a visualization of the sound layering. Figure 5 illustrates how the sounds of F.E.A.R. are mostly within one quarter of the model: the diegetic activity quarter. A highly paced game, such as this, would probably have most of its sound in this quarter. Figure 6 shows that Warcraft III has sounds in all 4 quarters, with an emphasis on the diegetic activity quarter, while Figure 7 is an example of an older kind of game in which the technical limitations affect how the sonic environment is structured.

USING THE COMBINED MODEL AS A PRODUCTION TOOLSET

We have declared that the potential loss of control over the sonic environment while producing a computer game is a problem. How can the combined model solve this?

The combined model above (Figure 3) allows the visualization of the sonic environment of a computer game in terms of cognitive load (Figures 5, 6, 7 and 8). While producing a sonic environment for a game, the different sounds to be used are first categorized in accordance with the IEZA-framework and then placed into Murch’s model, as more or less encoded or embodied, which in combination results in our proposed model (Figures 3 to 8). The combined model visualizes the sonic environment of a given game in a way that makes it possible to see how sounds might be clustered; the closer the sounds are to each other, the fewer that can be used if clarity of meaning, that is, a good level of semantic value, is wanted. The effect is that the sound designer can see beforehand whether the sonic environment will be biased towards encoded or embodied sound, providing the opportunity to rebalance accordingly. This also balances the frequency spectrum of the sonic environment and distributes the cognitive load in the brain. Even if all possible combinations of sounds cannot be plotted, a sound strategy for limiting these unwanted effects can be adopted by plotting prototypical game events


Table 3. Legend of Zelda analysis

Legend of Zelda
Sound State Diegetic? IEZA Color Origin
1 Enter cave/walking stairs In-game Yes Effect Cyan Character
2 Use sword In-game Yes Effect Yellow Character
3 Sword shoots In-game Yes Effect Yellow Character
4 Enemy takes damage In-game Yes Effect Yellow Character
5 Enemy dies In-game Yes Effect Yellow Character
6 Open locked door In-game Yes Effect Cyan Object
7 Door shuts In-game Yes Effect Cyan Object
8 Boomerang In-game Yes Effect Cyan Object
9 Boss sound In-game Yes Effect Yellow Character
10 Sword useless In-game Yes Effect Yellow Character
11 Place bomb In-game Yes Effect Yellow Object
12 Bomb explode In-game Yes Effect Yellow Object
13 Waves against shoreline In-game Yes Zone Yellow Ambience
14 Background music In-game No Affect Red “Orchestra”
15 Letters typing In-game No Effect Cyan “Narrator”
16 Pick up consumable In-game No Effect Orange “Narrator”
17 Pick up quest item In-game No Effect Orange “Narrator”
18 Consumable appear In-game No Effect Orange “Narrator”
19 Collect money In-game No Effect Orange “Narrator”
20 Dungeon music In-game No Affect Red “Orchestra”
21 Key appear In-game No Affect Orange “Narrator”
22 Collect key In-game No Effect Orange “Narrator”
23 Solve a puzzle In-game No Affect Orange “Narrator”
24 Take compass In-game No Effect Orange “Narrator”
25 Take Map In-game No Effect Orange “Narrator”
26 Low health In-game No Affect Orange “Narrator”
27 Player character “dies” In-game No Effect Orange “Narrator”
28 Switch item In-game No Interface Orange “Narrator”
29 Dungeon Complete Music In-game No Affect Red “Orchestra”
30 Menu music Menu No Affect Red “Orchestra”
31 Menu selection Menu No Interface Orange “Narrator”
32 Choose letter Menu No Interface Orange “Narrator”
33 Game over music Menu No Affect Red “Orchestra”

in accordance with the central gameplay aspects of a given game and a given game genre. One can also use the different sizes of the primitives put into the combined model (Figure 8) to show the dynamic range, that is, the relative loudness between sounds. It can at first be difficult to see how to utilize the model practically. The model’s present design may well be refined later, but that


Figure 8. Shoot the Ducks level 1

does not really matter. This is not just a model but also a kind of paradigm or, in other words, a way of thinking about these matters. The key is to be pro-active with regard to sound design and to plan the distribution of sounds before they are even created.

In most audio-editing software, the colors of Murch’s original conceptual model, and of our combined model (Figures 2 and 3), may well be used to designate the status of the sound files as more or less encoded. The music track (affect) could be made red, guns and explosions (effect) would be yellow, the ambient sounds, such as birds and so on (zone), would be orange, and the dialogue (encoded) would be blue, in accordance with Murch’s (1998) conceptual model. This feature of color-coding specific sound events is found in many commercial products and may very well be used in this manner while creating a sonic environment for a game or movie in order to avoid cognitive overload. In fact, the combined model might, in itself, be used as an interface for audio-editing software.

What then are the benefits of using our combined model in practice? Let us provide an example of creating the sound design for a simple game. We first present the game’s design document. The aim here is not to create a stunning new best-selling game, but rather to exemplify how the combined model may be used to plan the sound of the proposed shoot’em up game on the basis of a design document.

Shoot the Ducks Design Document

Game Objects

The game includes twenty objects: 4 ducks, armor for the ducks for each of the levels from 5 to 8, 2 guns, a pond, and a wall. The wall object


Table 4. The sounds from the Shoot the Ducks design document

A sound that is played as background music for instructions


A sound that is played to indicate game started
A sound that is played as background music for level #1
A sound that is played as background music for level #2
A sound that is played as background music for level #3
A sound that is played as background music for level #4
A sound that is played as background music for level #5
A sound that is played as background music for level #6
A sound that is played as background music for level #7
A sound that is played as background music for level #8
A duck sound that is played while the ducks are swimming
A duck chatter sound
A duck sound that is played when the duck is hit by a shot.
A duck sound that is played when a duck dies
A bounce sound that is used when the ducks hit a wall object.
A gun handling sound that indicates the change from one gun to another gun click
A sound that is played when gun #1 is fired
A sound that is played when gun #2 is fired
A sound that is played when a wall is hit by a shot
A sound that is played as background music end titles
A sound that is played to indicate that the player has reached high score
A sound that is looped that contains ambience sound
A sound that is played to indicate a score change
A sound that is played to indicate a change of level

has a grass-like image, while the playing area, the pond, is surrounded by the grass-like objects. From levels 4 to 8, the pond has small islands of wall objects behind which the ducks can seek shelter (they are programmed to do so). The ducks swim in the pond, which is made of water-like images. The only function of the wall object is to stop the ducks from moving out of the pond. The duck object, which has the image of a duck, moves with varying speed. Whenever it hits a wall object it bounces. The player can shoot through the walls on levels 4 to 8, but the effect of the shots decreases. Whenever the player hits a duck the score increases by 10 points. The duck’s speed increases slightly when it jumps to a random place. The gun object is placed at the bottom of the screen, where it is fixed and can only move to the left and right in a half-circular pattern by keyboard commands (see Controls).

Sounds

We use 24 sounds in this game, covering all the categories from the IEZA-framework as well as the span from encoded to embodied sounds (see Table 4). In addition, the sounds are spread across the action versus setting axis and the diegetic versus non-diegetic axis.
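A first sanity check on such a sound list is whether it really covers all four IEZA categories, as claimed. A sketch using the grouped sound events together with the categories Table 5 assigns them (the dictionary layout is our own):

```python
# Grouped sound events for Shoot the Ducks with their IEZA categories
# (categories as assigned in Table 5; the grouping into 13 planning
# entries follows the chapter's own clustering of similar sounds)
sounds = {
    "Game started": "Interface",
    "Music": "Affect",          # all background-music cues, grouped
    "Duck swimming": "Zone",
    "Duck chatter": "Effect",
    "Duck hit": "Effect",
    "Duck dies": "Effect",
    "Duck bounce off wall": "Zone",
    "Change weapon": "Effect",
    "Gun fired": "Effect",
    "Bullet through wall": "Effect",
    "Ambience": "Zone",
    "Score change": "Interface",
    "Level change": "Interface",
}

covered = set(sounds.values())
all_four = covered == {"Interface", "Effect", "Zone", "Affect"}
```

Such a check can be run against a design document before any audio asset is produced, which is exactly the pro-active planning the combined model is meant to support.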


Controls

Both mouse and keyboard control the game. The mouse must have left and right buttons and a scroll wheel. In order to aim the gun, A and D on the keyboard are pressed. The A key moves the gun barrel towards the left and D moves it towards the right. The gun is fired with a click of the left mouse button, and using the mouse wheel changes from one gun to the other.

Game Flow

The game starts with the instructions, background music #1 plays, and the game begins when the player presses the on-screen start button. This shows the room with the swimming ducks. The game ends when the player presses the <Esc> key.

Scores

At the start of the game the score is set at 0. The number of hits a duck can take before dying depends on the following factors:

• Distance from the gun
• Angle of the shot
• Area hit by the shot
• Whether the shot has passed a grass island
• The armor value of the duck.

Levels

The game has 8 levels. The difficulty of the game increases because the initial speed of the ducks increases after each level and they are given armor from levels 4 to 8. The pond also has small islands of grass behind which the ducks can seek shelter from levels 4 to 8. The game ends when the player has killed all the ducks on all the levels or when the player presses <Esc>.

Utilizing the Combined Model in a Game Design Document

The relationship between game design and sound design obviously depends on the complexity of the game. In this case we mainly focused on how our model can be implemented in the sound design process. We first categorized each sound and determined its position in our combined model (Table 5). Categorizing the sounds in relation to our combined model provides us with a sense of how the sonic environment will be balanced. For the sake of consistency, we chose to use a table that is similar to the analysis examples mentioned earlier (Tables 1, 2 and 3). Furthermore, in order to keep it simple, we clustered all the similar sounds into groups; for example, all the music sounds do not have to be specified as single sound events. They will not be played simultaneously under any circumstances, if the game runs as intended. Instead of 10 music sounds we only added one. If we group similar sound events, we do not have to think about sound variations at this stage. We also used this kind of sound grouping in the analyses mentioned earlier in this chapter. However, you might then ask why the Warcraft III analysis (Table 2 and Figure 6) has a lot of quotations in its table whereas the F.E.A.R. table (Table 1 and Figure 5) does not. This is simply because all the quotations from Warcraft III originate from essential sound events that are important for the gameplay. In the F.E.A.R. analysis, we chose to group the sounds of characters because they have sufficient similarities to constitute a single group of sounds. By carefully planning the game audio with two of Chion’s suggested listening modes in mind (causal listening and semantic listening), the sound designer can group the different sounds in relation to their cause and meaning. Furthermore, if the sound designer emphasizes the basic level of categorization, the game audio, as such, will suggest what the player is supposed to do or provide feedback about what the player has done.
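The grouping of similar sound events described above can be sketched as a simple keyed collection: variations within a group never play simultaneously, so each group needs only one position in the combined model. The group labels below are our own, and the list abridges Table 4:

```python
from collections import defaultdict

# Raw sound list from the design document (Table 4), abridged,
# with each sound tagged with a planning group (labels are illustrative)
design_doc_sounds = [
    ("Background music for instructions", "music"),
    ("Background music for level #1", "music"),
    ("Background music for level #2", "music"),
    # ... one music cue per level up to #8, plus end titles ...
    ("Sound when gun #1 is fired", "gun fired"),
    ("Sound when gun #2 is fired", "gun fired"),
    ("Duck sound while swimming", "duck swimming"),
]

groups = defaultdict(list)
for sound, group in design_doc_sounds:
    groups[group].append(sound)

# One planning entry per group: 10 music cues and 2 gun sounds collapse
# to one "music" and one "gun fired" position in the combined model.
planning_entries = sorted(groups)
```

Grouping this way postpones decisions about sound variations until production, while keeping the planning model small enough to sketch with pencil and paper.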


In the following review of a few sound events, we explain the process using our combined model as a production toolset. We begin by looking at quite a tricky sound event: the sound emitted when a duck is swimming. To simplify matters, we quickly decided to use a swimming sound, some kind of movement through water. It is first necessary to determine if a sound for this event would take place in the gameworld. In other words, should it be considered diegetic or non-diegetic? Indeed, it must be diegetic since the ducks live and move within the game’s environment, of which the water is very much a part. After deciding that the sound is diegetic, we looked at where it belongs in the IEZA-framework. With regard to the IEZA-framework, we have two options: a diegetic sound can either be a zone or an effect sound. This is where it becomes tricky if we do not pause for a second. The outcome depends on whether the sound is emitted due to player-induced activity or whether it is an integral part of the game’s setting. It should be remembered that in the IEZA-framework, a sound, in order to qualify as being the result of activity, has to be directly or indirectly triggered by the actions of the player. Since this sound event is not triggered by the player, either directly or indirectly, we categorized it as a zone sound. The presence of the swimming sound is to enhance the environment, as well as sustain the presence of water in the pond and the motion of the ducks. Our last step was to determine the position of the sound in Murch’s conceptual model. The sound has a kind of rhythmic effect which neatly suits the description of the color cyan. The sound of swimming can also have the semantic value of movement, which is the category of cyan sounds.

The music sound was the next to be categorized. The design document did not mention any in-game orchestra that plays music and we therefore categorized the sound as non-diegetic. This gave us 2 options: did the sound depend on activity or setting (in other words, was it an affect or interface sound)? We chose affect. Referring to Murch’s (1998) conceptual model, music sounds are red and embodied.

Next, we categorized an interesting sound event which allowed us to make an active decision to minimize the cognitive load. According to Murch, too much dialogue or, more specifically, too much spoken language, which is encoded, will make the sonic environment dense, and avoiding an excess of spoken language means keeping the sound design clear. In this trivial example of just a few sounds, a cognitive overload is obviously unlikely, but it is, nevertheless, good practice to think about. The sound for a level change could utilize some sort of announcer using a voice to inform the player of the level change. If we, for example, choose to use an encouraging musical effect instead, the result would be quite different. A short encouraging musical effect is non-diegetic, so the sound was either affect or interface. Since it belonged to the game system, it was therefore an interface sound and given the color orange, which is the category for musical effects and embodied sounds.

This game had no encoded sounds (violet) because we avoided including dialogue and spoken language (see Table 5).

As the snapshot of the combined model (Figure 8) illustrates, the sounds at this level of the game are spread evenly across the dynamic range and the frequency range of the sonic environment.

This example shows how the combined model may function as a production toolset. A sound designer can, of course, make a quick pencil drawing based on the game design document. The use of a computer to begin the planning of the game audio is not needed. Making the model easy to draw with pencil and paper is beneficial for the sound designer. She can be part of the production process as soon as there is a design document or, in fact, in the initial discussions before a design document exists. The purpose of the combined model is to be a rapid but, at the same time, very structured way of planning the sonic environment of a game. Working with the model is meant to be


Table 5. Shoot the Ducks sounds categorized

Shoot the Ducks


Sound Event Comment Diegetic? IEZA Color Loudness Frequency Band
Game started Beep No Interface Orange 3 Middle
Music Percussion No Affect Red 2 Low
Duck swimming Swimming sound Yes Zone Cyan 1 Middle
Duck chatter Yes Effect Cyan 2 Middle
Duck hit Duck “scream” Yes Effect Yellow 3 Middle
Duck dies Yes Effect Yellow 4 Middle
Duck bounce off wall Cartoony bounce Yes Zone Orange 2 Low
Change weapon Click Yes Effect Yellow 2 Low
Gun fired Yes Effect Yellow 5 Low
Bullet through wall Yes Effect Yellow 2 Middle
Ambience Tree whisper and birds Yes Zone Yellow 1 High
Score change Beep No Interface Orange 3 High
Level change Short musical effect No Interface Orange 3 Middle
As Table 5 above indicates, there is an emphasis on the sounds of effects in the game, which is due to the shooter genre as such. A shooter needs the sounds of effects as the main audio feedback for the player. After all, in a shooter you are supposed to shoot things. You would also expect sounds from weapons as well as those designating hits or missed shots, the handling of different weapons and maybe some big explosions. The internal order of the IEZA-framework would rather be EZIA in this example of Shoot the Ducks.

an easy process that leads to thinking about sound in a diversified way, providing density and clarity, avoiding a logjam of sound and unwanted sonic artifacts, as well as giving a clear-cut visualization that is communicable to other members of a game development team. We have deliberately tried to minimize the terminology within the combined model in order to make it comprehensible for team members other than the sound designer.

CONCLUSION

As this chapter has shown, audio in computer games is a complex matter, the understanding of which could be made easier using the suggested combined model for game audio. A summary of the problems addressed in this chapter and our solutions to these problems follows.

• The general lack of functional models for analyzing computer game audio

In order to solve the first problem, we have provided a functional model for the analysis of existing sonic environments in computer games and movies. The combined model covers the dynamic range of the sounds in relation to each other and the frequency range occupied by the different sounds. Encoded sounds, primarily speech, have a natural position in the human frequency response curve. Our work has been anchored in theories spanning from linguistics to semiotics and from film theory to theories of cognition. We have found that different genres of games have different emphases on which types of sound dominate the sonic environment. This might be caused by the choice of technology or by the genre as such.


A shooter needs more effect sounds than a role-playing game, for example.

We have briefly presented a case study, In The Maze, to show how sound might be part of a game-playing strategy, despite the results of the case study also supporting the idea that prior experience and anticipation concerning the game’s content affect the locomotive patterns in a game, as the visual field also does. Ong’s statement that sight separates us whereas sound integrates us with the environment has been discussed and put in relation to the models for sound suggested by Huiberts and van Tol as well as by Murch.

• The general lack of functional models for the production of game audio

In this chapter, we have attempted to put forth a model for the production of the sonic environments of computer games. We have shown how the sonic environment of computer games (and movies) may be planned to avoid cognitive overload as well as unwanted interference, by using a model that combines Huiberts and van Tol’s (2008) IEZA-framework for computer game audio and Murch’s (1998) conceptual model for film sound. The loss of control a sound designer has over the playback of the audio in the gameplay of a complex game may lead to a chaotic blur of sounds, causing them to lose their definition and thereby their semantic value.

• When 2 or more sounds are played simultaneously, the clarity of the mix depends on the type of sounds, which leads to
• The nature of the relationship between encoded and embodied sounds

The sound designer has some, though limited, control of the sonic environment in a game. To avoid a blurred sonic environment, it will be necessary to define the sound as much as possible. The combined model gives the sound designer an overview of the sonic environment that structures the sound to avoid cognitive overload, supports density and clarity, and diversifies the sounds across the 4 basic categories of Interface, Effect, Zone, and Affect sounds, as well as the setting versus activity axis. It also allows the sound designer to distinguish between diegetic and non-diegetic sounds as well as embodied versus encoded ones. The structure of the combined model provides an overview that enables the clustering of encoded and embodied sounds to be visualized in order to help the sound designer plan the production.

Furthermore, the combined model establishes a common ground of terminology that is communicable in a dialogue between the sound designer, the game designer, the game writer, the graphical artist and the programmer.

The combined model will need further refinement and might have the potential to function as an interface for sound design software. However, even a small step for a sound designer, such as this, might serve as a good starting point for how to plan and analyze the sonic environments of computer games.

REFERENCES

Bordwell, D., & Thompson, K. (1994). Film history: An introduction. New York: McGraw-Hill.

Bordwell, D., & Thompson, K. (2001). Film art: An introduction. New York: McGraw-Hill.

Bugelski, B. R., & Alampay, D. A. (1961). The role of frequency in developing perceptual sets. Canadian Journal of Psychology, 15(4), 201–211. doi:10.1037/h0083443

Cancellaro, J. (2006). Exploring sound design for interactive media. Clifton Park, NY: Thomson Delmar Learning.

Childs, G. W. (2007). Creating music and sound for games. Boston, MA: Thomson Course Technology.

Chion, M. (1994). Audio-vision: Sound on screen. New York: Columbia University Press.

Coppola, F. F. (Director). (1979). Apocalypse now! [Motion picture]. Hollywood, CA: Paramount Pictures.

Cunningham, S., Grout, V., & Picking, R. (2011). Emotion, content and context in sound and music. In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey, PA: IGI Global.

Droumeva, M. (2011). An acoustic communication framework for game sound – Fidelity, verisimilitude, ecology. In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey, PA: IGI Global.

Ekman, I. (2008). Comment on the IEZA: A framework for game audio. Gamasutra. Retrieved January 13, 2010, from http://www.gamasutra.com/view/feature/3509/ieza_a_framework_for_game_audio.php

Farnell, A. (2011). Behaviour, structure and causality in procedural audio. In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey, PA: IGI Global.

F.E.A.R. (2005). Vivendi Universal Games. Monolith Productions.

Gibson, J. (1977). The theory of affordances. In Shaw, R. E., & Bransford, J. (Eds.), Perceiving, acting and knowing (pp. ##-##). New Jersey: LEA.

Gibson, J. (1986). The ecological approach to visual perception. New Jersey: LEA.

Howard, D. M., & Angus, J. (1996). Acoustics and psychoacoustics. Oxford: Focal Press.

Hug, D. (2011). New wine in new skins: Sketching the future of game sound design. In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey, PA: IGI Global.

Huiberts, S., & van Tol, R. (2008). IEZA: A framework for game audio. Gamasutra. Retrieved October 13, 2008, from http://www.gamasutra.com/view/feature/3509/ieza_a_framework_for_game_audio.php

Jørgensen, K. (2011). Time for new terminology? Diegetic and non-diegetic sounds in computer games revisited. In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey, PA: IGI Global.

Kubelka, P. (1998). Talk on Unsere Afrika Reise. Presented at The School of Sound, London, England.

Kubrick, S. (Director). (1968). 2001: A space odyssey [Motion picture]. Location: Metro-Goldwyn-Mayer.

Lakoff, G. (1987). Women, fire and dangerous things. Chicago: University of Chicago Press.

Lakoff, G., & Johnson, M. (1980). Metaphors we live by. Chicago: University of Chicago Press.

Lakoff, G., & Johnson, M. (1999). Philosophy in the flesh. New York: Basic Books.

Legend of Zelda. (1987). Nintendo.

Loftus, G. R., & Loftus, E. F. (1983). Mind at play. New York: Basic Books.

Marks, A. (2001). The complete guide to game audio. Location: CMP.


Miller, G. A. (1956). The magical number seven, plus or minus two: Some limits on our capacity for processing information. Originally published in The Psychological Review (1956), 63, 81-97. (Reproduced, with the author's permission, by Stephen Malinowski). Retrieved March 10, 2009, from http://www.musanim.com/miller1956/

Mullan, E. (2011). Physical modelling for sound synthesis. In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey, PA: IGI Global.

Murch, W. (1998). Dense clarity – Clear density. Retrieved March 10, 2009, from http://www.ps1.org/cut/volume/murch.html

Murray, J. (1997). Hamlet on the holodeck: The future of narrative in cyberspace. Cambridge, MA: MIT Press.

Myst. (1993). Brøderbund.

Ong, W. (1982/1990). Orality and literacy: The technologizing of the word (L. Fyhr, G. D. Hansson & L. Perme, Swedish Trans.). Göteborg, Sweden: Anthropos.

Pac-Man. (1980). Namco.

Pollack, I. (1952). The information of elementary auditory displays. The Journal of the Acoustical Society of America, 24, 745–749. doi:10.1121/1.1906969

Pollack, I. (1953). The information of elementary auditory displays II. The Journal of the Acoustical Society of America, 25, 765–769. doi:10.1121/1.1907173

Prince, R. (1996). Tricks and techniques for sound effect design. CGDC. Retrieved October 10, 2008, from http://www.gamasutra.com/features/sound_and_music/081997/sound_effect.htm

Pudovkin, V. I. (1985). Asynchronism as a principle of sound film. In Weis, E., & Belton, J. (Eds.), Film sound: Theory and practice (pp. ##-##). New York: Columbia University Press. (Original work published 1929)

Sjöström, V. (1921). The phantom chariot. Svensk Filmindustri.

Sobchack, V., & Sobchack, T. (1980). An introduction to film. Boston, MA: Little Brown.

Space Invaders. (1978). Taito.

Thom, R. (1999). Designing a movie for sound. Retrieved July 7, 2009, from http://filmsound.org/articles/designing_for_sound.htm

Valve Corporation. (1998). Half-Life [Computer game]. Sierra Entertainment.

Wallén, J. (2008). Från smet till klarhet. Unpublished bachelor's thesis. University of Skövde, Country. Retrieved month day, year, from http://his.diva-portal.org/smash/record.jsf?searchId=1&pid=diva2:2429

Warcraft III. (2002). Blizzard Entertainment.

White, G. (2008). Comment on the IEZA: A framework for game audio. Retrieved January 13, 2010, from http://www.gamasutra.com/view/feature/3509/ieza_a_framework_for_game_audio.php

Wilhelmsson, U. (2001). Enacting the point of being. Computer games, interaction and film theory. Unpublished doctoral dissertation. University of Copenhagen, Country.

ADDITIONAL READING

Adams, E., & Rollings, A. (2007). Game design and development. Saddle River, NJ: Pearson-Prentice-Hall.


Alexander, B. (2005). Audio for games: Planning, process, and production. Berkeley: New Riders.

Alexander, L. (2008). Does survival horror really still exist? Retrieved March 12, 2009, from http://kotaku.com/5056008/does-survival-horror-really-still-exist

Branigan, E. (1992). Narrative comprehension and film. London, New York: Routledge.

Collins, K. (2007). An introduction to the participatory and non-linear aspects of video games audio. In Hawkins, S., & Richardson, J. (Eds.), Essays on sound and vision. Helsinki: Helsinki University Press.

Cousins, M. (1996). Designing sound for Apocalypse Now. In J. Boorman & W. Donohue (Eds.), Projections 6: Film-makers on film-making (pp. 149-162). Location: Publisher.

Grimshaw, M., & Schott, G. (2007). Situating gaming as a sonic experience: The acoustic ecology of first person shooters. In Proceedings of DiGRA 2007: Situated Play.

Huizinga, J. (1955). Homo ludens: A study of the play element in culture. Boston: Beacon Press.

Jørgensen, K. (2006). On the functional aspects of computer game audio. In Proceedings of the Audio Mostly Conference.

Jørgensen, K. (2007). 'What are these grunts and growls over there?' Computer game audio and player action. Unpublished doctoral dissertation. Copenhagen University, Country.

Jørgensen, K. (2008). Audio and gameplay: An analysis of PvP battlegrounds in World of Warcraft. GameStudies, 8(2).

Juul, J. (2005). Half-real. Video games between real rules and fictional worlds. Cambridge, MA: MIT Press.

Katz, J. (1997). Walter Murch in conversation with Joy Katz. PARNASSUS Poetry in Review: The Movie Issue, 22, 124-153.

Klevjer, R. (2007). What is the avatar? Fiction and embodiment in avatar-based singleplayer computer games. Unpublished doctoral dissertation. University of Bergen, Country.

Murch, W. (2000). Stretching sound to help the mind see. Retrieved January 25, 2010, from http://www.filmsound.org/murch/stretching.htm

Neale, S. (2000). Genre and Hollywood. New York: Routledge.

Perron, B. (2004). Sign of a threat: The effects of warning systems in survival horror games. In Proceedings of COSIGN 2004, 132–141.

Salen, K., & Zimmerman, E. (2004). Rules of play. Game design fundamentals. Cambridge, MA: MIT Press.

Stockburger, A. (2003). The game environment from an auditive perspective. In Proceedings of DiGRA 2003: Level Up.

Taylor, L. (2005). Toward a spatial practice in video games. Gamology. Retrieved June 23, 2007, from http://www.gameology.org/node/809

Whalen, Z. (2004). Play along: An approach to video game music. GameStudies, 4(1).

KEY TERMS AND DEFINITIONS

Affordance Theory: A theory put forth by James J. Gibson (1977 and 1986). An affordance is what an environment provides an animal. A path in a wood is "walk-able", a chair is "sit-able", etc.

Ambient Listening: Increasing the frequency range of the sound by turning the body towards its source for higher definition of the sound.


Ambulatory Listening: Listening by moving around and using sound as part of the navigation within the environment.

Aperture Listening: Successive scanning of the audio stimuli.

Combined Model for the Structuring of Computer Game Audio: The model suggested in this chapter that combines the IEZA-framework with Murch's conceptual model.

IEZA-Framework: A framework suggested by Sander Huiberts and Richard van Tol (2008). The IEZA-framework distinguishes between sounds that belong to the Interface (I), the Effects (E), the Zone (Z), and the Affects (A) in a computer game.

Murch's Conceptual Model: A model for the production of film sound put forth by sound designer Walter Murch (1998). The conceptual framework spans from encoded sound (language) to embodied sound (music). It also suggests that in order to obtain density and clarity of a sound mix the sound designer should limit the amount of sound layers to five separate layers.

Snapshot Listening: Fixating a point and then shifting to some other point momentarily by filtering out all other sound sources.

ENDNOTES

1. In action movies there has been, and still is, a tendency to equate loud with good (Thom, 1999). Violent explosions, big loud weapons and roars of wild engines fill the sonic environment in far too many action movies produced in the last two decades.
2. The interested reader is referred to a good textbook on acoustics and psychoacoustics such as Howard and Angus (1996).
3. The original article on the IEZA-framework has been criticized for not considering previous work (Ekman, 2008; White, 2008).
4. Which is of course the case also with the original IEZA-framework and Murch's conceptual model, but the level of detail is significantly higher in the combined model.
5. Of course further research into this would be necessary to put forth more solid cognitive ground for these primitives, but we do believe that they fill their purpose for our combined model.
6. As Loftus and Loftus (1983) have shown, players may occasionally try to talk to the system in acts of personification of the system.


Chapter 7
An Acoustic Communication Framework for Game Sound: Fidelity, Verisimilitude, Ecology
Milena Droumeva
Simon Fraser University, Canada

ABSTRACT
This chapter explores how notions of fidelity and verisimilitude manifest historically, both as global cultural conventions of media and technology and, more specifically, as design goals in the production of sound in games. By exploring these two perspectives on acoustic realism through the acoustic communication framework, with its focus on patterns of listening over time, acoustic communities, and ecology, I hope to offer a model for future theorizing and exploration of game sound and a lens for in-depth analysis of specific game titles. As a novel contribution, this chapter offers a set of listening modes that are derived from and describe attentional stances towards historically diverse game soundscapes, in the hopes that we may use these to not only identify but also evaluate the relationship between gaming and culture.

INTRODUCTION

Within game studies—a relatively young discipline itself—the field of game sound has already experienced growth; however, there are still scarce resources and analytical frameworks for understanding the role of sound for purposes of cultural critique, historical analysis, or cross-media examination. Frameworks such as the IEZA one (Huiberts & van Tol, 2008; Wilhelmsson & Wallén, 2011), which builds on several existing design guideline systems for game sound (Ekman, 2005; Grimshaw & Schott, 2007; Jørgensen, 2006; Stockburger, 2007), and particularly Grimshaw's (2008) conceptualization of an acoustic ecology in first-person shooter games are beginning to pave the way for more in-depth explorations into understanding, analyzing, and representing the role of sound in games.

In addition to the more established foundations of game sound in music synthesis, algorithmic sound generation, and real-time implementation of

DOI: 10.4018/978-1-61692-828-5.ch007

Copyright © 2011, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.

sound effects (Brandon, 2004; Collins, 2007; Friberg & Gärdenfors, 2004; Roeber, Deutschmann, & Masuch, 2006), there is a need for building more general theoretical and analytical frameworks to describe the various elements of game sound and their role within the game's designed soundscape and its informational ecology. Examples of rich theoretical works on game sound are still few (Collins, 2008; Grimshaw, 2008). I would like to propose a framework for studying game sound that engenders a multi-disciplinary perspective with a specific focus on listening as a dynamically developing, socio-cultural activity influenced by and influencing cultural production and experience. This framework, based on the acoustic communication model developed by Barry Truax (2001) and inspired by R. Murray Schafer (1977), combines media histories with the current technological and cultural reality and takes a critical analytical stance towards discussing the way media shapes our world.

Delivering a full history of any game sound predecessors and tracing critical, socio-cultural perspectives of every game genre in existence is not only an ambitious task, but is one that has been done in parts by both scholars and game writers (Collins, 2008; McDonald, 2008). Instead, I will focus on two particular aspects of game sound—fidelity and verisimilitude—and situate them within the interdisciplinary framework of analysis that the acoustic communication model offers. They are two sides of the same idea representing notions of realism or reality in game soundscapes. They reflect long-standing cultural ideals and production values whose histories transgress radio, cinema, and real-world environments. By juxtaposing the two ideas in this manner I hope to elucidate qualities and features of game sound both in a richer way and within a socio-historical discursive context. Fidelity reflects the development of sound in games from a technological perspective while verisimilitude reflects the cultural emergence of authenticity, immersion and suspension of disbelief in cinema, and characterizes the magic flow state in games. Finally, I'd like to connect both these ideas to acoustic ecology and particularly to the concept of acoustic community, which includes the real situation of a player's own acoustic soundscape in addition to the game's sonic environment, interlaced in a complex ecology.

THE ACOUSTIC COMMUNICATION MODEL: BACKGROUND AND RELEVANCE TO GAME SOUND

The concept of acoustic communication articulated by Truax (2001) is a framework that attempts to bring multi-disciplinary perspectives into the study of sound reception as well as sound production and that provides a structure for analyzing and understanding the role of sound in contemporary culture, in media, and in technology. Its roots lie in the tradition of acoustic ecology that was the basis of Schafer's work in the late 1960s and 1970s: work that is already referenced by several authors (Grimshaw, 2008; Hug, 2011). The following history helps contextualize and focus the particular perspective that acoustic communication has taken on.

A pioneer in the field of acoustic ecology, Schafer first defined the notion of a soundscape to mean a holistic system of sound events constituting an acoustic environment and functioning in an ecologically balanced, sustainable way (Schafer, 1977). Born out of the threat of urban noise pollution, Schafer focused on conceptualizing and advocating an ecological balance in the acoustic realm. He developed the terms hi-fi and lo-fi to describe different states of aural stasis in the environment. A hi-fi soundscape, exemplified in Schafer's view by the natural environment, is one where frequencies occupy their own spectral niches and are heard distinctly, thus creating a high signal-to-noise ratio. A lo-fi soundscape, on the other hand, often exemplified by modern urban city settings, is one where amplified sound,


traffic, and white noise mask other sound signals and obstruct clear aural communication, creating a low signal-to-noise ratio (Truax, 2001, p. 23).

Following Schafer's work, Truax developed a multi-disciplinary framework for understanding sound based on notions of acoustic ecology as well as communication theory. This framework models sound, listener and environment in a holistic interconnected system, where the soundscape mediates a two-way relationship between listener and environment (Truax, 2001, p. 12). It also places importance on the role of context in the process of listening, emphasizing the listener's ability to extract meaningful information from the content, qualities, and structure of the sound precisely by situating this process in their knowledge and familiarity with the context and environment (p. 12). Yet Truax also recognizes listening as a product of cultural and technological advances, subject to macro shifts and patterns over time. Such a multi-disciplinary understanding of sound allows us to bring socio-cultural considerations into the soundscape paradigm alongside auditory perception and cognition.

Traditional models of auditory perception conceptualize listening as a process of neural transmission of incoming vibrations to the brain (Cook, 1999) that, shaped by our physiology, allows us to experience sound qualities. In fact, as pointed out by Truax (2001) and others, listening is a complex activity involving multi-level and dynamically shifting attention, as well as higher cognitive functions (inevitably dependent on context) such as memory associations, template matching, and foregrounding and backgrounding of sound (p. 11). Again, this model points to the importance of understanding listening as a physiological as well as a cultural and social practice. From a design perspective, it is also imperative to understand that listening is a dynamic and fluid activity that in turn affects the perception and experience of sounds in the acoustic or electroacoustic environment and helps mediate the relationship between actor, activity, context and environment. Two major classifications of listening are everyday listening as put forward by Gaver (1994, p. 426)—an omni-directional, semi-distracted, adaptive-interactive listening that focuses on immediate information-processing of sound—and analytic listening (Truax, 2001, p. 163)—listening that has attention to detail and which is an expert activity focused on an aesthetic or analytical experience of sound that is rooted in context as its frame of reference for the extraction of information from sound characteristics. Based on the idea of different classifications of listening, Truax developed a number of categories exemplifying major listening modes and processes (pp. 21-27): see Table 1.

Clearly, this ontology of listening needs a significant degree of modification in order to fit the complexities of listening in gameplay contexts, and we will continue returning, adding to, and re-conceptualizing the idea of listening positions with regard to game soundscapes. This set of listening types is simply a beginning, allowing us a way to access the historical evolution of listening stances as media, technology, and design have changed. These types of listening, as part of the acoustic communication framework, directly represent macro shifts in the historical and cultural reality of acoustic, electroacoustic, and media listening, and, as an extension, game listening. In analyzing game sound then, this set of listening attentions is to be amended in a similar fashion to uncover and elucidate macro shifts directly procured by the socio-historical experience of sound in games.

The notions of fidelity, verisimilitude, and ecology are a particular choice too, yet the concept and drive towards realism is one that I see as not only one aspect of game design and game culture but a more symbolic movement intersecting many media genres and technologies. Rather than simply a design requirement, it is an ideology of contemporary mediated expressions. Examples span from immersive cinematic soundscapes for the big screen and surround sound aesthetics


Table 1. List of listening positions from the acoustic communication framework (Truax, 2001)

Listening-in-search: Active attentional and purposeful listening, a questing out towards a sound source or soundscape. Sometimes listening-in-search involves a determined seeking of a particular sound template in an aurally busy environment. The cocktail party effect, for example, is a special mode of listening-in-search, which involves a zooming in on a particular sound source—often semantic-based (speech) and familiar in an environment of competing sound information in the same spectrum (Truax, 2001, p. 22).

Listening-in-readiness: Involves background listening with an underlying expectation for a particular sound or set of sound signals (such as a baby's cry). It is a sub-attentional listening in expectation of a familiar sound or signal, a latent alertness.

Background Listening: A non-attentional listening, a receptive stance without a conscious attention or interpretation of sounds or soundscape heard.

Media Listening: An adaptation of media's flow of perceptual and attentional cues as delivered through sound. Media listening and distracted listening are two positions of listening that Truax (2001) argues are a direct result of the transition to electroacoustic sound and especially the way in which sound has evolved in its use in media. Since much of media is experienced as a background to life, often in the visual background, programming flow has developed sophisticated and strong aural cues in order to manage and direct listeners' attention to the next item on the media program.

Analytical Listening: A focused, critical expert listening to particular qualities of electroacoustic sounds and recordings.

taking the viewer into a powerful suspension of disbelief, to complete virtual reality, ambient intelligent environments, and computer-augmented physical spaces which have become the norm for contemporary museums and art galleries. There is also the ever-so-popular genre of reality TV, which has reared and acculturated a "society of the spectacle" generation of audiences.

FIDELITY

Literally, fidelity means faithfulness. In relation to sound, fidelity signifies the accuracy and quality of sound reproduction, that is, the degree to which an electroacoustic iteration faithfully represents the original acoustic source. From there, the notions of hi-fi (high fidelity) and lo-fi have emerged and are now commonly applied to refer to quality of audio equipment, specific recordings and (cinematic) listening experiences. As noted in the previous section, Schafer (1977) also utilized these two distinctions of fidelity, except he applied them to refer to a soundscape's ecological balance in terms of a signal-to-noise ratio. In this section, we'll focus on fidelity as a concept representing the move from abstract musical chiptunes (8-bit synthetic tunes) to realistic sampled sounds in the design of game soundscapes. Fidelity here will exemplify the technological changes in game sound's realism.

Role in Game Sound: Socio-Cultural History

In tracing some of the history of game sound, Stephen Deutch (2003) makes a convincing point about the trajectories that sound for games has taken historically. As he points out, the first game sound designers were essentially musicians and/or experimental composers (p. 31). In that, historically there was a split between those who followed Pierre Schaeffer's musique concrète tradition and those who were interested in electronic music. The second group ended up getting involved in game sound production and laying the foundations of contemporary game sound. The way in which this fact concerns fidelity is that while musique concrète works with sampled sound—that is, real acoustic sources—as material for sonic expressions, electronic musicians were fascinated with the purely abstract world of the synthesizer and


the completely un-real soundscapes it produced. From here we have the tradition of chiptunes: 8-bit synth tunes encoded directly on the microchip of the game console. Initially, of course, space and memory were some of the pragmatic issues driving the minimalistic and synth-based soundscapes in games. With technological improvements, such constraints are no longer relevant; however, the demographic of game sound practitioners still exerts a formative role not on what is possible but on what is realized in game sound today and how associations between sounds and their meanings in a game become forged. As Deutch puts it, even though game sound emulates film sound in its "filmic reality" of representation, it is often too literal—"sound effects as opposed to sound design" (p. 31)—see Figure 1.

Invoking what Schafer (1977) might call the listener as composer, many games today utilize adaptive-interactive audio, that is, each player constructs her own unique soundscape by moving and interacting with their avatar. Yet even then sound effects are "loopy": they often come from generic sound banks (see Figure 3) and are exactly the same each time they sound, sometimes getting cut off if the player's actions are faster than the sound file's duration. They get called up and filtered according to the spatial/contextual demands of the character's progression; however, it is only in high-end games, typically in first-person shooters (FPS), where the richness of a complex soundscape really comes through, with 3D audio rendering and spatialization (Grimshaw, 2008) to account for acoustic coloration and atmospheric variables. FPS games afford the player the unique position of literally listening with the character's ears since the game presupposes the player is that character. Any other POV (point of view) character stance by definition distances the player from the soundscape, making

Figure 1. Note the compressed, repetitive nature of the waveform, reflecting synthetic strings of sounds,
often separated by little sine tone clicks and artificial silences
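The "loopy" sample-bank behaviour described here—one fixed file per event type, identical on every playback, and cut off whenever the player re-triggers it faster than the file can play out—can be sketched in a few lines. This is purely an illustrative model; class names such as SampleBank and Voice are invented here and do not correspond to any engine's actual API.

```python
class Voice:
    """One playing instance of a pre-recorded sample."""
    def __init__(self, name, duration):
        self.name, self.duration, self.elapsed = name, duration, 0.0

    def done(self):
        return self.elapsed >= self.duration


class SampleBank:
    """Generic sound bank: one fixed file per event type, no variation."""
    def __init__(self, samples):
        self.samples = samples   # e.g. {"footstep": 0.4} (duration in seconds)
        self.active = {}         # event -> currently playing Voice

    def trigger(self, event):
        # Re-triggering cuts off the still-playing copy: the "loopy",
        # identical-every-time quality the chapter describes.
        cut_off = event in self.active and not self.active[event].done()
        self.active[event] = Voice(event, self.samples[event])
        return cut_off

    def advance(self, dt):
        for voice in self.active.values():
            voice.elapsed += dt


bank = SampleBank({"footstep": 0.4})
bank.trigger("footstep")
bank.advance(0.2)                  # player moves faster than the file plays
print(bank.trigger("footstep"))    # True: the first footstep was cut off
```

A procedural-audio system, by contrast, would synthesize a slightly different footstep on each call instead of restarting the same file.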


Figure 2. Historical and cross-genre, cross-platform example of game soundscapes. In the first two ex-
amples we see a progression from 8-bit sound to polyphonic synthesized sound, while Fallout 3 reflects
a 3D spatialized environment of varied and large dynamic range (highs and lows) to avoid masking
and maximize clarity; finally God of War features a broadband soundscape where many (high-quality
sounds) are mixed in, competing and somewhat masking each other

them more of an audience member as opposed to a true participant in that acoustic ecology. This current model of game sound design has slowly shifted to reflect the interactive, dynamic and personalized nature of game soundscapes, departing from the cinematic tradition and the early game 8-bit sound. It uses sound samples organized in banks that are called up in real-time to be filtered and mixed in as a player progresses through a game, reflecting the quality of a space, sound behaviour and ambience in real time as well. For example, if our avatar is in an ocean setting we will hear waves, wind and seagulls; similarly, if the avatar is moving down a tunnel looking to


Figure 3. Note the flow of gameplay, comprised of series of loops, varied slightly, however having a
uniform attack-sustain pattern thus still sounding “loopy”, and often triggered out of temporal sync,
resulting in unrealistic interruptions and overlap. Also, the stereo zoom-in reveals little if any spatial-
ization. Elements that aren’t identified on the diagram are the background music and cave ambiance,
as well as a few other uniform sound effects such as footsteps
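The spatialization that this caption finds largely absent—and that the chapter credits to high-end titles—ultimately comes down to deriving gain and left/right balance from a source's implied position. The sketch below is a minimal illustration under common simplifying assumptions (inverse-distance attenuation, equal-power stereo panning, sources behind the listener folded to the sides); the function name and parameters are my own, not any engine's API.

```python
import math

def spatialize(source_x, source_y, listener_x=0.0, listener_y=0.0,
               reference_distance=1.0):
    """Return (left_gain, right_gain) for a mono source.

    Gain falls off as 1/distance (inverse-distance law) and the source
    is placed in the stereo field with an equal-power pan.
    """
    dx, dy = source_x - listener_x, source_y - listener_y
    distance = max(math.hypot(dx, dy), reference_distance)
    gain = reference_distance / distance

    # Pan angle: -pi/2 (hard left) .. +pi/2 (hard right); the listener
    # faces +y, and rear sources are clamped to the sides (simplification).
    angle = math.atan2(dx, dy)
    angle = max(-math.pi / 2, min(math.pi / 2, angle))
    pan = (angle / math.pi) + 0.5          # 0 = left, 1 = right
    left = gain * math.cos(pan * math.pi / 2)
    right = gain * math.sin(pan * math.pi / 2)
    return left, right

# A sound straight ahead at the reference distance: equal in both ears.
l, r = spatialize(0.0, 1.0)
print(round(l, 3), round(r, 3))    # 0.707 0.707
```

Real engines add frequency-dependent filtering, reverberation, and occlusion on top of this, which is what produces the acoustic coloration the chapter mentions.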

avoid or preempt enemy attacks, out-of-frame sounds are heard as coming from their respective (implied) locations and from the appropriate distance.

There are three ways in which we can examine the shifts of game sound fidelity over time. As pointed out in other game sound histories (Collins, 2008; McDonald, 2008), 8-bit sound from early fantasy and arcade-style games has evolved to polyphonic MIDI orchestrations, higher quality rendering, and richer textures but with essentially the same melodies and game sound conventions. On the other hand, shifts in interactive-adaptive audio, as a relatively contemporary design standard, are less evident historically, but manifest themselves across different game genres and platforms. For instance, portable platforms feature only a limited sonic variety in representative/environmental sound effects, relying heavily on synthesized polyphonic mixes; more affordable


consoles such as the Gamecube, the Wii, and the PlayStation 2 tend to feature games with more authentic soundscapes and variety, and higher-end consoles such as the PlayStation 3 and Xbox 360 flaunt stellar graphics as well as multi-channel, 3D sound capabilities capable of delivering that precision of spatialization and timbre characteristic of FPS games. Similarly, fantasy and action role playing games (RPGs) such as Final Fantasy, Prince of Persia, Assassin's Creed 2, and God of War, to mention a few, use limited and uniform sound effects banks to build environments with minimal acoustic properties, even though the audio is less compressed in quality than in their predecessors. Higher-end military, FPS and strategy games such as Hitman and Metal Gear Solid often combine a rich variety of high-quality sound effects rendered with 3D sound spatialization techniques and sound behaviour physics engines to simulate the temporal and spatial trajectories of competing sonic information in the game space.

Finally, fidelity changes in game sound can also be discussed in terms of Schafer's classifications of hi-fi and lo-fi soundscapes (1977; Truax, 2001, p. 21) reflecting the ecological acoustic balance in a given environment. Quite simply, as game sound has become more complex, richer in textures, and in need of accommodating an ever-expanding variety of alert cues and signals, game soundscapes have become sites for much sonic masking. If we look at Figure 2 we see a transition from a one-track synthesized music model, which lacks authentic fidelity but has little masking, to more complex games where the soundtracks become a constant broadband spectrum of high-quality music, environmental sound effects, alerts and signals, and ambience coloration.

However, the newest trend in game sound design (Collins, 2008; Farnell, 2011; Hug, 2011; Phillips, 2009) might be to return to synthesis utilizing much more sophisticated tools: physical modelling and real-time sound synthesis to realistically convey not only every sound occurring in a game but its every unique variation, coloration, temporal and spatial character, in interaction with other sounds within the electroacoustic environment. Such an approach to game sound synthesis would make the game soundscape truly personalized through subtlety and non-repetition, and it would reverse the tendency to use substitute aural objects or sound images from the cinematic tradition, essentially returning game sound to a realistic modelling of acoustic phenomena. However, would such a turn eliminate the necessity for purposeful sound design? Would it make it all about programmatic representation? After all, sound's role in games is not simply descriptive, one of reflecting reality in a high-fidelity manner, but it is largely about function! Interface sounds, warning sounds, alerts, and musical earcons must continue to be part of this acoustic ecology, subject to issues of acoustic balance, masking and fidelity, as well as the informational ecology of interactive play.

The Listening Experience

So what types of listening do these aspects of fidelity foster in game players/listeners? Listening is essentially a particular way of paying attention. Truax (2001) describes this phenomenon in terms of listening positions that we have developed both with regard to everyday listening and when engaging with different forms of media (pp. 19-23). Film theorists such as Chion (1994) and Murch (1995), among others, have already spoken about different listening modes: the one proposed by Chion has also been discussed and augmented by Grimshaw and Schott (2007) in their discussion of FPS games. Tuuri, Mustonen, and Pirhonen (2007) provide a more recent compelling account of listening modes in gameplay, identifying a hierarchical attentional structure of listening. Table 2 attempts to summarize popular notions of listening to game sound and organize them according to existing typologies of game functions (Jørgensen, 2006), attentional positions (Stockburger, 2007)

138
An Acoustic Communication Framework for Game Sound

Table 2. An attempt at linking attentional and listening positions with game functions and examples of game sound

Attentional Position | Game Functions | Listening Position | Examples from Gameplay | Reference Frames
Foreground | Action-Oriented Functions | Analytical Listening (Truax, 2001); Listening-in-search (Truax, 2001); Semantic Listening (Chion, 1994); Causal Listening (Chion, 1994); Functional, Semantic and Critical Modes of Listening (Tuuri, Mustonen, & Pirhonen, 2007) | Alerts: notifications, warnings, confirmation and rejection; interface sounds | Trans-diegetic
Midground | Orienting Functions; Identifying Functions | Media Listening (Truax, 2001); Navigational Listening (Grimshaw & Schott, 2007); Causal & Empathetic Modes of Listening (Tuuri, Mustonen, & Pirhonen, 2007) | Contextual sound effects; auditory icons; earcons | Diegetic
Background | Atmospheric Functions; Control-related Functions | Background Listening (Truax, 2001); Reduced Listening (Chion, 1994); Reflexive & Connotative Modes of Listening (Tuuri, Mustonen, & Pirhonen, 2007) | Musical score; environmental soundscape | Extra-diegetic

and states of diegesis (Chion, 1994; Grimshaw, 2008; Huiberts & van Tol, 2008; Jørgensen, 2006). As a side note, Jørgensen's (2011) newest work in this book brings an important critique of the very usefulness of discussing game sound in terms of diegesis, given that sound in games needs to function on many different levels besides a descriptive/immersive one, and such levels may be non-diegetic according to film theory's definition of diegesis and yet function as diegetic cues within a game's soundtrack. As another limitation of diegesis, I will argue in the last section of this chapter that it fails to recognize sounds outside the gameworld which may very much be part of the experience of play: the acoustic soundscape of group play, the arcade environment, or online audio conferencing such as Teamspeak.
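As a brief technical aside before turning to verisimilitude: the distance rendering and 3D spatialization invoked in the fidelity discussion above can be approximated, at its very simplest, by distance-based attenuation combined with equal-power panning. The sketch below is purely illustrative (the function name, parameters and rolloff model are invented for this example; it is not the API or attenuation model of any actual game engine or audio middleware):

```python
import math

def spatialize(source_xy, listener_xy, ref_dist=1.0, rolloff=1.0):
    """Gain and stereo placement for a mono source in 2D game space.

    Inverse-distance attenuation plus equal-power panning: a toy
    stand-in for the 3D rendering a real audio engine performs.
    """
    dx = source_xy[0] - listener_xy[0]
    dy = source_xy[1] - listener_xy[1]
    dist = max(math.hypot(dx, dy), ref_dist)
    # Gain falls off with distance beyond the reference distance.
    gain = ref_dist / (ref_dist + rolloff * (dist - ref_dist))
    # Bearing of the source: 0 = straight ahead, +/-90 degrees = hard side.
    pan = max(-1.0, min(1.0, math.atan2(dx, dy) / (math.pi / 2)))
    theta = (pan + 1.0) * math.pi / 4  # sweep 0..pi/2 across the stereo field
    return gain * math.cos(theta), gain * math.sin(theta)

# A source straight ahead at the reference distance sits centred at full
# gain; as it moves away and to one side, it quietens and pans accordingly.
```

Real engines layer far more on top of this (occlusion, reverberation, HRTF filtering, Doppler shift), but even this toy model captures how a sound's gain and stereo position can track a source's trajectory through game space.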


VERISIMILITUDE

If fidelity refers to the faithfulness of sound quality in computer games, verisimilitude concerns itself with the experience and nature of truthfulness and authenticity in a game context, as conveyed through the game soundscape. In the section above we used the notion of fidelity to trace the move from synthetic tones representing real actions to realistic sound effects attached to character movements that are called up interactively to combine into a unique and (at least in principle) seamless flow. Verisimilitude addresses precisely the nature of this acoustic ecology and its claim to represent a realistic experience in both temporal and spatial terms. In its traditional literary/theatrical definition, verisimilitude reflects the extent to which a work of fiction exhibits realism or authenticity, or otherwise conforms to our sense of reality. In film, the notion of verisimilitude signifies the relative success of cinematography at creating an immersive, engaging fictional world of hyper-realistic proportions, both in terms of image and sound and in intensity of emotion and experience (Chion, 1994; Deutsch, 2003; Figgis, 2003; Murch, 1995). The core idea in this section is the notion that game sound has developed historically to conform to our sense of reality while, at the same time, it has constructed a sense of reality, particular to games, that we now expect.

Role in Game Sound: Socio-Cultural History

Cinematic immersion works by presenting a hyper-real universe, a larger-than-life movie world with action and emotion wrought to an exaggeratedly high intensity. It both summons attention and diverts attention. Its visual and auditory elements both attract and construct an experience and work to divert the audience's attention from realizing that what they see isn't real. In games, this is even more the case: by definition, games are interactive; their auditory and visual elements are driven by the player. So already, there is an implication that the auditeur is also a participant, hearing with the ears of the character. As Chion (2003) puts it (in relation to David Lynch's cinematographic style): "We listen to the characters listening to us listening to them" (p. 153). In FPS games, this relationship is even clearer as the soundscape design is very intentionally oriented towards an authentic experience of listening with the character's ears—the acoustic field shifting with the avatar's movement on screen, the reflections, sound coloration and directionality of sounds dynamically and responsively shifting along—a mode of listening that Grimshaw (2008) defines as first-person audition (p. 83).

Undoubtedly, one of the most important predecessors of game sound is sound in cinema. Expanding the context of significance to other media forms would include radio (the predecessor to film) as well as television and a particular genre of motion picture: cartoons (with their own predecessor, the paper comic). Unlike cinema, however, where sound's role is highly artistic and affective, or radio and television, where sound is part of a programming flow (Truax, 2001, p. 169), sound in games must aspire to aesthetic and affective as well as informational and epistemic functions. Since games are an interactive medium, these functions often overlap and are interdependent. Verisimilitude as a feature of a designed or supporting soundscape can be traced back to the early days of radio, particularly radio drama (Truax, 2001, p. 170). In the absence of a visual reference, in-house generated sound effects came to play a central role in creating a realistic environment to go along with the narrative, thus inadvertently giving birth to some of the most widespread conventions of cinema and game sound: notable examples being fist-fight sounds or walking-in-snow sounds, the former generated as an artificial exaggeration of what a punch would sound like, the latter easily simulated by grinding a fist into a bag of rice or peppercorns. Foley art, which emerged as the mainstream film


sound craft in the earlier days of modern cinema, and which is experiencing a resurgence today, builds directly onto these conventions, generating an ever-increasing repertoire of techniques through which to simulate "real" sounds (typically by using other acoustic materials).

In his discussion of film sound, Christian Metz (1985) uses the term aural objects to refer to film's tendency to solidify an arbitrary relationship between the viewer/listener's perception of real sounds and the reality of the actual sound sources. The resulting realism, as pointed out not only by him but by other film theorists such as Chion (1994), Deutsch (2003), Figgis (2003) and Murch (1995), to name a few, is that film sound bites become hyper-real: we associate them with certain events and interactions in place of their authentic acoustic counterparts. For example, if someone played back the actual sound of walking in snow and the sound of close-miked grinding into a bag of rice, most of us would perceive the latter as more real. Given such a set of conventions, and media's natural condition of being an inter-textual and self-perpetuating phenomenon, subsequent media forms and genres simply have to play on and incorporate said conventions. Or do they?

Aural Objects, Flow and Space

As mentioned already, the first RPGs utilized a small corpus of synthesized melodies to denote unique spaces, quintessential game moments and mood. Loosely based on music psychology conventions, these early game soundscapes used major tonality to signify an uplifting mood, minor tonality to signify danger or failure (as in Zelda or the Final Fantasy series), an upward note-trill to denote a jump and a downward note sequence to indicate death or end-game (as in all of the Super Mario-based and derivative series). The bigger picture in the early days consisted of having a continuously running soundtrack of synthesized music where many smaller elements that are meaningful in themselves mix together to create a flow of gameplay experience (McDonald, 2009) but also a game space.

As with narrative support music in cinema, synthesized tunes in early games, specifically in the fantasy genre (titles such as Final Fantasy, Zelda, Castlevania and others), act as a vector (to use Chion's (1994) term) to the temporal flow of the interactive experience and take on iconic or referential meaning (Deutsch, 2003). It is precisely this quality of game sound that illustrates perfectly the distinction between fidelity and verisimilitude: as technologies, storage capacities and processing speeds of game consoles have improved over time, some games have moved towards a more and more authentic depiction of acoustic reality, while others continue to preserve the nostalgic qualities of what Murch (1995) calls metaphoric sound, only in better sound quality (see Figure 2). Metaphoric sound—sound that does not represent the action seen on the screen realistically—is so ingrained in our cultural memory that it seems odd to even point it out. Popularized by early fantasy games and their predecessors—isomorphic cartoon sounds (Altman, 1992)—it contributes to a type of verisimilitude that is very different from the one richer and more realistic game genres strive for (adventure, military or FPS games). In other words, Super Mario, Zelda or Final Fantasy just wouldn't be recognizable to their audience or, in our terms, possess verisimilitude, if it were not for their inter-textual references to iconic sounds of the past. Examples are ample: the theme sounds of their game universes or even individual sound effects such as the 1-up sound, the brick-smashing sound or the jumping tune in Super Mario; the battle cries of Zelda's Link and its iconic chest-opening sounds; or the epic combat rhythms during attacks and boss battles in Final Fantasy, among many others. Given this, sound designers for classic fantasy titles take great care to preserve these iconic sounds in each platform and each iteration of their titles. As Phillips (2009) mentions in his exposé on film and game music,


fantasy game theme songs have long transgressed the computer game genre and, particularly in Japan, are frequently re-orchestrated and performed by choirs and symphonic orchestras. Composers of game music, while largely unknown in North America, have star status in most of Asia.

There is another issue too: fantasy games deal with imaginary actions that no one has experienced in the real world, such as stepping on enemies' heads, eating a giant mushroom, catching a star (references from Super Mario) and, sonically, these actions do not have 'real' counterparts in the acoustic reality we are familiar with. Creating the infamous sound of the lightsaber in Star Wars (McDonald, 2008) is a classic story in the history of metaphoric sound using both musical conventions and pop-psychology. Likewise, this quote from a sound designer of Torment illustrates game verisimilitude challenges perfectly:

During Torment, I was processing some sword hits, and they were coming up very interesting. While they didn't work for the spell I was working on, I gave them a description like 'reverberant metal tones, good spell source.' Later, I was looking for something with those qualities, but had forgotten I made those sounds. When I searched my database for 'metal tones', I found them, and they were exactly what I needed! (Farmer, 2009)

A less discussed but highly important part of game verisimilitude is the temporal flow of the soundscape, as it is intimately linked to the tradition of sound effects and aural objects. While the fantasy sound of the past presents a highly melodic, musically semantic flow, the interactive-adaptive tradition results in a "loopy"-sounding score of slightly varied bank sound effects (i.e., there may be only one footsteps sound that is nevertheless used for all characters) organized around modules of game quests and activities but lacking an overall structure or temporal design (see Figure 3 below).

Another aspect of verisimilitude in game sound has to do with creating space, specifically in realistic, rich cinematic RPG/action games. I will begin with Murch's (1995) notion of worldizing—giving a certain space acoustic qualities that make the player get involved—and combine that with Ekman's (2005) discussion of diegetic versus non-diegetic sound as acoustic elements that do or do not belong to a gameworld. Historically, it is important to note again how early games (Collins, 2008; McDonald, 2008) instantiated the use of a melody to represent space. For example, in Final Fantasy, towns have a certain melody representing the calm mood of a non-threatening environment while out-of-town wooded areas use a separate melody which is consistent everywhere in the game and represents mild danger; mission dungeons have their own musical melody and, within them, entering the space of a boss battle features fast-paced tension music that is consistently the same throughout the game for each boss battle. Thus, these games established a situation where mood, space, and call-to-action are rolled into one and are all represented via one single melody/track. With the emergence of more powerful game consoles, the notion of space becomes divorced from the conveyance of mood or a call for a particular action and becomes more representative and realistic, aiming to immerse the player into a gameworld.

This connects the idea of diegesis with the notion of verisimilitude through the experience of immersion, as "immersion is a mental construct resulting from perception rather than sensation" (Grimshaw & Schott, 2007, p. 476). While the cinematic concept of diegesis simply refers to whether or not the sound source is in or outside the frame, both Jørgensen (2006) and Ekman (2005) use this term to address whether a sound belongs to a gameworld or not. There is an important distinction to be made in using diegesis in this way as it puts the emphasis on immersion into the resounding space (Grimshaw & Schott, 2007) of a game and carries an implication that the gameworld already is an acoustic reality that sounds either do or do not belong to. On the other hand, regarding diegesis only as a reference to in- or out-of-frame sounds leaves the game soundscape intact, as it assumes then that all sounds are part of the gameworld. Such an idea fits perfectly with Schafer and Truax's notion of an acoustic community (1977; 2001): a sonic locale or context that is formed over time through a dynamic exchange between sounds, soundscape and listeners, becoming an ecology of its own that can be threatened, altered or generally disturbed by the introduction of new, foreign sounds or the removal of familiar signals that local inhabitants (players) depend upon. The question is whether it is an ecology where the listener is consumed by the soundscape in a spectator-based relationship (Westerkamp, 1990), or if the ecology includes the player in an (inter)active co-production. Again, we have to remind ourselves that immersion is a perception, not a sensation (Grimshaw, 2008, pp. 170-174). The answer is in the ear of the listener, so to speak: while even realistic games represent only a small portion of the game environment sonically (see Figure 4), they do successfully create and maintain a sense of immersion, verisimilitude, and belonging to a gameworld, not to mention conveying information through sonic signals.

LISTENING TO GAME SOUND

It follows that the historic shifts of verisimilitude in game sound have affected the experience of listening as well. With the socio-cultural baggage of radio and film sound, listeners are already conditioned to accept aural objects (Metz, 1985), internalize them, and think of them as more real than the real sounds they represent. Further, listeners of game sound have adopted what Colin Ware (2004) refers to in visual studies as naive physics of perception—in the aural sense. That is, players accept and often ignore the clearly artificial behaviour of looped sound bites, their sometimes low or unrealistic quality, and their lack of diversity and complexity (see Figures 3 and 4). What Ware was trying to get at is that designers often reduce work and design complexity by counting on the fact that players don't need that much realism—only enough in order to be hooked. The idea being: it is acceptable if a lot of things from the real world don't necessarily manifest themselves sonically in the gameworld. Given this, we can now expand the framework of listening positions from Table 1 to include a pattern of attention to sound that ignores the otherwise obvious "loopy"-ness of sound effects and, as such, the predictability of game soundscapes as a whole. A listening of denial, or naive listening, is perhaps a good term to use. It is not that players can't, when prompted, identify the artificial nature of many sonic elements in a game soundscape; it is that they conditionally and purposefully ignore it, while instead immersing themselves in the experience of gameplay. Ideals of game sound become less about fidelity of acoustic sources or of audio quality and more about the verisimilitude of non-engaging engagement with a holistic, interactive environment.

From the discussion so far, there are a few other modes of listening that I would like to put forth; however, before I introduce them, it is important to draw a link between the types of listening fostered by the flow of television and contemporary radio soundscapes, and those encouraged by the gameplay experience in general. The emergence of continuous media such as radio and TV created a brand new type of listening experience: one that Truax calls distracted or media listening (2001, p. 169). In order to accommodate viewers tuning in and out of the program and at the same time attract and keep their attention, TV sound flow uses a number of attention-management techniques such as dynamic shift changes and modular programming structure (Truax, 2001, p. 170). It essentially tells us how to listen. It trains us to increase or decrease our auditory attention by use of carefully crafted cues, until they become second nature. These gestalts of auditory perception, then, seamlessly integrate cinema and game sound, carrying the promise of total immersion, suspension of disbelief and verisimilitude. As a


result, we begin relying less on active, engaged, information-processing listening, and more on habitual background and media listening in all of our surroundings (Schafer, 1977; Truax, 2001). This is not to forget, however, that games are interactive, and the player is, in Schafer's terms, a co-composer of her own game soundscape at the same time that she listens to it. The listening positions that I'd like to add, in the interest of engaging with and critically understanding the experience of computer game sound, are presented in Table 3.

Figure 4. A sonic excerpt from Grand Theft Auto: San Andreas gameplay. While richer and more varied in dynamic range (including periods of relative silence), the game flow still consists of a series of sound effects strung together, with some distance/amplitude rendering.

Table 3. A set of listening modes emergent out of the current discussion on fidelity, verisimilitude and the ecology of game sound. These modes reflect and attempt to identify macro trends borne out of historical shifts in the qualities, techniques and functions of game sound over time.

Imaginative Listening: A listening that supplies the perceptual conditions for immersion, building up a mental image of an environment from the little that is provided acoustically by the game's soundtrack; for example, the way a game like Cooking Mama is reminiscent of Super Mario games and evokes a fun, fantastical, care-free world.

Nostalgic Listening: An analytical, culturally-critical type of listening that has emerged over time in experienced players who look for iconic game music themes through platforms and generations of a particular game (some notable examples here being the Final Fantasy, Super Mario, Zelda and Mega Man series).

Disjunctive Listening: A listening position that describes the ability that gamers develop to very quickly and fluidly interchange listening attentions—one moment they may be immersed in the heat and tension of a battle and in the next they may pause to change their settings, entering a user-interface type of soundscape (for instance, in the Fallout 3 example in Figure 2, the player shifts constantly between the battlefield ambience/listening position and armour selection/target selection screens).

Naive Listening: A non-analytical, electroacoustic listening that allows the player to feel immersed in the game reality with the minimum amount of auditory complexity. In the absence of truly realistic soundscapes, players effortlessly ignore loops, repetitions and lack of sonic fidelity in order to become more immersed in the game. The name is inspired by Ware's (2004) naive physics of perception idea.

Conditioned Listening: The type of listening that Truax (2001) calls media [flow] listening (p. 169), where players listen with an underlying expectation of how the flow of the game's soundscape will unfold, tacitly familiar with the sonic elements of games in general.

Inter-textual Listening: A result of cross-pollination of different media genres, this listening position addresses situations where game soundscapes contain radio, telephone, or TV sounds (most famously featured in Grand Theft Auto). Conversely, the popular events of video game concerts are settings where game sounds live on outside gameworlds and are performed, listened to, and used for other purposes outside of games.

ECOLOGY

Discussing game soundscapes as sites of local acoustic ecologies is not a novel idea (Grimshaw, 2008) and, as Grimshaw and Schott (2007) point out, "the more immersive a game is the more appropriate it is to discuss the game world in terms of an ecology and, therefore, the greater the immersive function of the game sounds" (p. 479). Grimshaw, unlike Schafer, analogizes game soundscapes to an actual bio-ecology where various species (in our case, sounds) interact, co-exist and are co-dependent on each other. He also focuses on the ecology of first person shooter (FPS) game soundscapes as this genre lends itself particularly well to a discussion of ecology in terms of sound. Spatialization and 3D sound rendering are honed to an art form in FPS games and the player literally has to listen through the character's ears in order to play and succeed in the game. Sounds of shots, enemies in the background or out-of-the-frame (extra-diegetic) sounds are extremely important, as are user interface sounds including warnings and alarms that often require immediate attention and split-second decisions (trans-diegetic sounds, per Jørgensen, 2006). Schafer, however, would still look at ecology from the perspective of balance within an acoustic community where each sound has a meaning in the sonic context and a place within the spectral niche of the soundscape. This acoustic balance may or may not be in stasis: at certain times an element may mask and overpower other sonic elements. For example, in action scenes music often takes on a dominant sonic role overshadowing smaller environmental or game alert sounds (in Figure 4 it is clearly visible in the full sequence layout (top section) that music tracks have a significantly higher/broader dynamic range than all other sound effects). For Schafer, and especially for Truax, whose work focuses more on electroacoustic sound, sound balance is not simply about loudness but also about value connotations. Music, for example, is not only a much stronger emotional, affective device than environmental sound within a given game environment, but it also carries a history of being used commercially to condition consumers into spending time and money in certain settings


(Truax, 2001; Westerkamp, 1991). As Hildegard Westerkamp (1991) points out, the phenomenon of background music is responsible for sound becoming "associated in our memories with environments and products" (1991). In essence, it becomes the ambience of the media environment; however, it does not result in endless diversity of spaces and sounds but, rather, in the emergence of archetypal surrogate environments (Westerkamp, 1991). In the context of games, ominous abstract tones analogous to the cinematic model of the mood track provide such a strong emotional sense, enforced and enriched by previous generations of media listening such as film, radio and TV, that the acoustic qualities of space, reverberation, distance, location and timbre, which are the more subtle yet vital cues of everyday listening, are often lost in the 'background'. Similarly, music in action and rhythm games often provides a promotion vehicle for indie bands whose sound is conceived as culturally related to the genre of the game itself, thus perpetuating—not challenging—the status quo of popular culture and mass media. Essentially, music's overshadowing of other sonic elements has both a cultural and a political-economic implication for games in addition to an acoustic one.

Ecology of Listening

While so far we have been discussing new listening patterns that emerge from the experience of game soundscapes and their socio-cultural and historical evolution, what about the listening that takes place inside game soundscapes? Does anybody listen within the game itself, or is it a silent vacuum space where sound happens but no one can hear it? In other words, how would a game's acoustic ecology change if characters in it (maybe even all of them!) could listen to one another and to the player's character, or even to sounds outside the gameworld? In Truax's (2001) terms this would complete the holistic relationship of true acoustic communication, uniting a constant interplay between listeners, sounds and soundscape, where game characters and the player-driven avatar are all participants in the ecology. However, such algorithmic subtlety is far from reality to date and, partly due to economic reasons but also partly due to notions of value, may never be a generally utilized phenomenon anyway. Even though sound in games has experienced tremendous growth and is now considered an important part of game design, development companies still invest in it considerably less than they do in visual graphics and animation. Sound designers in game development companies are typically pressured to stick with tried and true approaches to the composition, design and functionality of audio, and are dissuaded from implementing "risky" new ways of using sound as part of the game mechanics. There are, of course, a few examples where sound is used in more participatory or ecological ways. For a while now, the Nintendo DS has featured a microphone input, so games such as Electroplankton and, to a lesser degree, titles such as Yoshi's Island or Guitar Hero involve user-generated vocal elements in the gameplay: mostly in the form of shouting, blowing or speaking into the mike. More complex platforms support a genre called stealth games, where the avatar's own soundmaking in the game (primarily footsteps) is implied to be heard by the other non-player characters. Metal Gear Solid is the best known title, in addition to Hitman, Assassin's Creed 2, and even youth-themed games such as Harry Potter and the Chamber of Secrets, or Zelda: The Phantom Hourglass, where Link has to walk slowly in the Temple of Time in order not to alert the phantom knights. Even in a rudimentary implementation, such as linking the player/character's speed to levels of "noise" in a given space, this approach taps into an aspect of acoustic ecology that has been largely overlooked: the character's experience of listening within the gameworld.

Acoustic Community as a Feature of Game Sound

We have already discussed acoustic community in the context of game soundscapes as a conglomeration of different types of sound cues,


sound functions, foreground, midground and contrast, many RPG, sports or puzzle games that
background sounds; a community that forms over are played at home, even with company, result
time and evokes a coherent sense of place in the in a much quieter soundscape with sporadic
gameworld. In this section, I’d like to also bring up and minimal interaction. Using Teamspeak or
the idea of the acoustic soundscape that is located other voice chat programs for Massively multi-
outside the gameworld but exists synchronously player online role-playing games (MMORGs) or
to it: the sounds that surround the player in her multi-player military strategy games results in
physical environment, sounds that may or may not yet another acoustic community where players’
be related to the gameplay, but are nevertheless voices have to fit seamlessly within the spectral
part of the immediate acoustic community that niche of the game’s soundscape without masking
the player or players are in. Without focusing too or obliteration: every second counts and a lot of
much on the minutiae of less significant sonic the designed sonic information is crucial to the
details such as household sounds, context does gameplay (see Figure 5). Game expos, conven-
offer quite a distinct sense of acoustic community tions and professional game championships are
depending on whether a player is at home alone, another quintessential acoustic community of
with friends, at an arcade, at a LAN party, or on gaming, filled with PAs (amplified public an-
a headset with online co-players (see Figure 5). A nouncements), a constant arcade-like hum of
Rock Band house party, for example, is a particular game sounds: the shifting of chairs and mashing of
community where the soundmaking of multiple buttons, whether players are wearing headphones
players and audience members supplies much of or not, the murmur and exclamations of crowds.
what makes this game’s soundscape a great ex- In fact the arcade environment, as Phillips (2009)
perience. It is precisely the exclamations of joy, points out, is responsible for some of the early
frustration, encouragement—and not the designed choices in game sound as each game’s signature
game sound—that give this acoustic community soundtrack was designed to attract attention in a
both a sense of fidelity and verisimilitude. In loud and noisy acoustic environment of competing

Figure 5. On the left, a recording of an arcade ambience: a constant hum of competing, masking sounds, many of which are already distorted synthetic chiptunes (shown in the zoom-in section). On the right, a Teamspeak-based recording of a World of Warcraft mission: the progression (upper section) clearly reflects growing verbal excitement as the team finally defeats a difficult boss, culminating in celebratory exclamations.

An Acoustic Communication Framework for Game Sound

game stations: hence gaming's early and ready acceptance of sonic masking. As games moved into the home and became more technologically sophisticated, game sound changed to provide a fuller, more subtle soundscape, often to be delivered through headphones. With the emergence of MMORPGs, the popularity of game tournaments, expos, LAN parties and, most recently, Guitar Hero and Rock Band house party nights, gaming is once more returning to a social model of play where the sounds of the cultural context and setting are again significant and instrumental in forming that sense of acoustic community that unites designed game sound with the incidental (acoustic and electroacoustic) sound-making and sonic environment.

CONCLUSION AND FUTURE DIRECTIONS

This chapter explores the notions of fidelity and verisimilitude manifesting historically both as global cultural conventions of media and technology, as well as, more specifically, being design goals in the production of sound in games. By exploring these two perspectives of acoustic realism through the lens of the acoustic communication framework with its focus on patterns of listening over time, acoustic communities and ecology, I hope to offer a model for future theorizing and exploration of game sound and a lens for in-depth analysis of particular game titles. As well, it is my hope that placing some much needed emphasis on listening, ecology, and the holistic acoustic setting of the gaming experience will benefit not only sound designers and game theorists but will also continue the trajectory of deepening inquiries into game studies as a rich and unique form of interactive media deserving of its own theoretical attentions.

For example, before we go ahead and favour real-time audio synthesis and physical modeling for their realistic acoustic rendering (not an imminent event, I realize: science and programming still have a ways to go), we need to generate precisely the type of historical and socio-cultural analysis of game sound touched on in this chapter. We need to understand the importance of all the elements of a game soundscape, which, for better or for worse, have become important to audiences, or at the very least, we are now habituated to. There is a crucial epistemological relationship there—through inter-textual cross-pollination and transference of practices and artefacts, we have internalized many of these arbitrary meanings and a realistic physical modelling of a game soundscape might not mean much to us or even be conducive to gameplay. Designers, audio engineers and programmers need to know and think about these issues.

Further, I believe the focus on listening positions in this chapter is a key to understanding not only some of the cultural practices surrounding gameplay, but it can also tell us something about auditory perception that designers or scientists could potentially use. Listening to game sound is now every bit as everyday as everyday listening goes in our media and technology-saturated environment, so games offer new opportunities to science, given the fact that contextual listening has always evaded laboratory psychoacoustic studies. Clearly, my main concerns, however, are with the opportunities for critical and media studies to engage with and treat game sound and the phenomenon of listening to game sound as another rich cultural artifact—a text if you will—that can add to the layers of theory and critique surrounding media, art, and cultural expression. While fidelity and verisimilitude are only two relevant heuristics in the analysis of game sound, it is my hope that the field of media studies will identify others and conduct the same kind of rigorous examination of their historical and cultural roots in order to elucidate their role and importance not only in game sound but in our culture-at-large today.


Finally, my sense of the future directions in the field of game sound is that, as the game industry matures and as playing computer games starts to lose some of its negative notoriety, there is naturally more and more societal and media attention on games as well as on game elements such as sound. With that, the increased popularity of gaming results in industry growth, expanding game genres, and expanding notions of what a game is, how it is played and how it is experienced. Sound plays a crucial role in experience and interactivity and there has been increased design attention to it both from industry and from independent artists. With that comes a book like this one and my prediction is that there will be (hopefully) more to come from scholars, critics, media theorists, sociologists, scientists, and designers who would now be better equipped to continue this in-depth conversation about game sound and listening in a way that preserves the complex ecology of people's interactions with their (media/techno)-soundscapes while expanding the multi-disciplinary nature of this maturing field. There has been a resurgence of concern over noise and the urban soundscape coming back into public attention in the context of environmentalism and sustainability and, well, it only takes one look at the history of game sound, inter-related with similar media forms and genres, to glean its influence on the way in which we listen to, make sense of and experience our physical offline soundscape. More work in this area is not only needed, but is, I am confident, bound to come.

REFERENCES

Altman, R. (1992). Sound theory sound practice. London: Routledge.

Assassin's Creed 2. (2009). Ubisoft Montreal. Ubisoft.

Brandon, A. (2004). Audio for games: Planning, process, and production. Berkeley, CA: New Riders Games.

Castlevania. (1989). Konami Digital Entertainment.

Chion, M. (1994). Audio-vision: Sound on screen. New York: Columbia University Press.

Chion, M. (1999). The voice in cinema. New York: Columbia University Press.

Chion, M. (2003). The silence of the loudspeaker or why with Dolby sound it is the film that listens to us. In Sider, L., Freeman, D., & Sider, J. (Eds.), Soundscape: The School of Sound lectures 1998-2001 (pp. 150–154). London: Wallflower Press.

Collins, K. (2007). An introduction to the participatory and non-linear aspects of video game audio. In Hawkins, S., & Richardson, J. (Eds.), Essays on sound and vision (pp. ##-##). Helsinki: Helsinki University Press.

Collins, K. (2008). Game audio: An introduction to the history, theory, and practice of video game music and sound design. Cambridge, MA: MIT Press.

Cook, P. (Ed.). (1999). Music, cognition, and computerized sound: An introduction to psychoacoustics. Cambridge, MA: MIT Press.

Cooking Mama. (2007). OfficeCreate. Majesco Publishing.

Deutch, S. (2003). Music for interactive moving pictures. In Sider, L., Freeman, D., & Sider, J. (Eds.), Soundscape: The School of Sound lectures 1998-2001 (pp. 28–34). London: Wallflower Press.

Ekman, I. (2005). Understanding sound effects in computer games. In Proceedings of the 6th Annual Digital Arts and Cultures Conference, 2005, Copenhagen, Denmark: IT University Press.


Electroplankton. (2006). Nintendo America. Nintendo.

Fallout 3. (2008). Bethesda Softworks. Bethesda Game Studios.

Farmer, D. (2009). The making of Torment audio. Retrieved July 9, 2009, from http://www.filmsound.org/game-audio/audio.html.

Farnell, A. (2011). Behaviour, structure and causality in procedural audio. In Grimshaw, M. (Ed.), Game Sound Technology and Player Interaction: Concepts and Developments. Hershey, PA: IGI Global.

Figgis, M. (2003). Silence: The absence of sound. In Sider, L., Freeman, D., & Sider, J. (Eds.), Soundscape: The School of Sound lectures 1998-2001 (pp. 1–14). London: Wallflower Press.

Final Fantasy 2. (1988). Squaresoft. Square ENIX.

Friberg, J., & Gärdenfors, D. (2004). Audio games: New perspectives on game audio. In Proceedings of ACM SIGCHI International Conference on Advances in Computer Entertainment Technology (pp. 148-154).

Gaver, W. (1994). Using and creating auditory icons. In G. Kramer (Ed.), Auditory Display: Sonification, Audification, and Auditory Interfaces (Santa Fe Institute Studies in the Sciences of Complexity, Vol. 18, pp. 417-446). Reading, MA: Addison-Wesley.

God of War 2. (2007). SCE Studios Santa Monica. Sony Computer Entertainment.

Grand Theft Auto: San Andreas. (2004). Rockstar North. Rockstar Games.

Grimshaw, M. (2008). The acoustic ecology of the first-person shooter: The player, sound and immersion in the first-person shooter computer game. Saarbrücken, Germany: VDM.

Grimshaw, M., & Schott, G. (2007). Situating gaming as a sonic experience: The acoustic ecology of first-person shooters. In Proceedings of the Third Digital Games Research Association Conference (pp. 474-481).

Guitar Hero. (2006). Harmonix. RedOctane.

Harry Potter and the Chamber of Secrets. (2002). Eurocom. Electronic Arts.

Hitman. (2002). Io Interactive. Eidos Interactive.

Hug, D. (2011). New wine in new skins: Sketching the future of game sound design. In Grimshaw, M. (Ed.), Game Sound Technology and Player Interaction: Concepts and Developments. Hershey, PA: IGI Global.

Huiberts, S., & van Tol, R. (2008). IEZA: A framework for game audio. Gamasutra. Retrieved April 4, 2009, from http://www.gamasutra.com/view/feature/3509/ieza_a_framework_for_game_audio.php?page=3.

Jørgensen, K. (2006). On the functional aspects of computer game audio. In Proceedings of the First International AudioMostly Conference (pp. 48-52).

Jørgensen, K. (2011). Time for new terminology? Diegetic and non-diegetic sounds in computer games revisited. In Grimshaw, M. (Ed.), Game Sound Technology and Player Interaction: Concepts and Developments. Hershey, PA: IGI Global.

Marks, A. (2009). The complete guide to game audio: For composers, musicians, sound designers, game developers (2nd ed.). Location: Elsevier Press.

McDonald, G. (2008). A brief timeline of video game music. Retrieved July 8, 2009, from http://www.gamespot.com/gamespot/features/video/vg_music/.

Mega Man. (1993). Capcom. Capcom Entertainment.


Metal Gear Solid. (1998). Konami Japan. Konami Computer Entertainment.

Metz, C. (1985). Aural objects. In Weis, E., & Belton, J. (Eds.), Film sound (pp. ##-##). New York: Columbia University Press.

Murch, W. (1995). Sound design: The dancing shadow. In Boorman, J., Luddy, T., Thomson, D., & Donohue, W. (Eds.), Projections 4: Film-makers on film-making (pp. 237–251). London: Faber and Faber.

Phillips, N. (2009). From films to games, from analog to digital: Two revolutions in multi-media! Retrieved July 8, 2009, from http://www.filmsound.org/game-audio/film_game_parallels.htm.

Planescape: Torment. (2005). Black Isle Studios. Interplay.

Rock Band. (2008). Harmonix. MTV Games.

Roeber, N., Deutschmann, E. C., & Masuch, M. (2006). Authoring of 3D virtual auditory environments. In Proceedings of the First International AudioMostly Conference (pp. 15-21).

Schafer, R. M. (1977). The tuning of the world. Toronto: McClelland and Stewart.

Spyro the Dragon. (2008). Insomniac Games. Sony Computer Entertainment.

Stockburger, A. (2007). Listen to the iceberg: On the impact of sound in digital games. In von Borries, F., Walz, S. P., & Böttger, M. (Eds.), Space time play: Computer games, architecture and urbanism: The next level (pp. ##-##). Location: Birkhäuser Publishing.

Super Mario Bros. (1985). Nintendo. Nintendo.

Truax, B. (2001). Acoustic communication (2nd ed.). Location: Ablex Publishing.

Tuuri, K., Mustonen, M., & Pirhonen, A. (2007). Same sound—different meanings: A novel scheme for modes of listening. In Proceedings of the Second International AudioMostly Conference (pp. 13-18).

Ware, C. (2004). Information visualization: Perception for design (2nd ed.). Location: Morgan Kaufmann Publishing.

Westerkamp, H. (1990). Listening and soundmaking: A study of music-as-environment. In Lander, D., & Lexier, M. (Eds.), Sound by artists (pp. ##-##). Location: Art Metropole & Walter Phillips Gallery.

Wilhelmsson, U., & Wallén, J. (2011). A combined model for the structuring of computer game audio. In Grimshaw, M. (Ed.), Game Sound Technology and Player Interaction: Concepts and Developments. Hershey, PA: IGI Global.

World of Warcraft. (2004). Blizzard Entertainment. Blizzard.

Yoshi's Island. (2007). Nintendo Japan. Nintendo.

Zelda: Phantom Hourglass. (2007). Nintendo. Nintendo.

KEY TERMS AND DEFINITIONS

Acoustic Community: A term emerging from Schafer and Truax's work with the WSP (World Soundscape Project) in the 1970s referring to stable sonic locales that include a set of sounds which clearly belong there and characterize a community: for example, the sounds of coin machines, yelling, and synthesized music all belong to an arcade acoustic community.

Acoustic Ecology: A movement started by R.M. Schafer and continued through the World Forum for Acoustic Ecology (WFAE), and also a term denoting the sonic balance in a given soundscape through its signal-to-noise ratio.


Chiptunes: A term now popularized by the arts community, referring to 8-bit synthesized melodies or single tones that were originally directly encoded onto a game's electronic chip memory in early game development.

Diegesis: A term from film studies referring to what is in-the-frame of the screen as opposed to what isn't. In game sound studies, Chion is credited with popularizing it to refer to sounds that are in or out of frame from the player's perspective. It has also been used by others to refer to sounds that do or do not belong to the gameworld.

Fidelity: Literally means faithfulness; here, it refers to the audio quality of a sound reproduction relative to its original acoustic source.

Listening Positions: Developed by B. Truax as a term, it refers to types of listening attention that have become patterns over time with exposure to certain types of sound environments, habits, or media; for example, background listening is a passive form of listening attention that we all engage in at different times.

Loopy: An adapted term I am using here to denote the quality of game sound flow in many RPG games where short looped sounds from an effects bank are triggered each time an action is performed, thereby often sounding cut-off, too-similar, or simply uniform.

Soundscape: A term coined by R.M. Schafer to describe the totality of sounds surrounding us at any given time/place: analogous to a landscape.

Verisimilitude: Literally means similar to reality; it is a theatrical term referring to the ability of an artwork to appear real, to foster a sense of realism in the audience. Here, it refers to the ability of game soundscapes to sound real.

Chapter 8
Perceived Quality in
Game Audio
Ulrich Reiter
Norwegian University of Science and Technology, Norway

ABSTRACT
This chapter reviews game audio from a Quality of Experience point of view. It describes cross-modal
interaction of auditory and visual stimuli, re-introduces the concept of plausibility, and discusses issues
of interactivity and attention as the basis for the qualitative, high-level salience model being suggested
here. The model is substantiated by experimental results indicating that interaction or task located in
the audio domain clearly influences the perceived audio quality. Cross-modal influence, with interac-
tion or task located in a different (for example, visual) domain, is possible, but is significantly harder
to predict and evaluate.

INTRODUCTION

Perceived quality in game audio is not a question of audio quality alone. As audio is usually only one part of an overall game concept consisting of graphics, physics, artificial intelligence, user input, feedback and so forth, audio has been considered to play a relatively minor role in the overall experience that a game provides. Consequently, a lot of effort has been put into providing near photo-realistic representations of (virtual) game scenarios to the player, but only little into audio. Interestingly, this assessment has had to be revised over recent years. Learning from other artistic fields like cinema, in which storytelling is a central means of providing "user experience", game developers have come to know that audio can trigger emotions and provide additional information otherwise hard to convey. Today, although budgets are still limited compared to other aspects of game engineering, audio in games is given more attention by game developers than ever before.

But there is more to audio in games than just emotional support for a story. Most games are user-centered and non-linear, as opposed to the linear storytelling of traditional, non-interactive

DOI: 10.4018/978-1-61692-828-5.ch008

Copyright © 2011, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.

content presentation. Therefore, the audio has to be manipulated in real-time depending on the player's actions. Real-time processing of audio can become computationally very demanding and is a problem for complex game scenarios. This has introduced the concept of plausibility: the main goal in game audio is not to have an audio simulation as exact and close to reality as possible, but to render audio that is plausible in the game scenario, and that provides an overall quality impression that matches the other aspects of the game.

One fact well known from home cinema applications is that improved quality in video can also increase the subjectively perceived audio quality, and that the reverse effect also exists (Beerends & De Caluwe, 1999). It is therefore a most interesting question to see whether these effects can be exploited to increase the subjectively perceived overall quality of a game without actually increasing the computational load. Instead of just rendering more details (equivalent to a higher simulation depth), focusing on those details that are actually relevant in a certain context could provide a much higher Quality of Experience (QoE) (see Farnell, 2011 for a discussion of relevancy and redundancy in procedural audio design).

The central questions are, therefore: Which stimuli in a game scenario are of most importance? Can information that is difficult and cost-intensive to convey in one modality be presented in another modality with less effort but similar perceptual impact? What role does interactivity play in the perception of quality? What are the technical parameters that can influence the perceived quality of a game, and which other factors exist that potentially dominate the perceptual process?

This chapter aims at identifying and discussing general quality criteria in multimedia application systems with a focus on games. These criteria contain technical as well as human factors. In order to understand these factors, the first section touches upon the mechanisms of human perception: well-known facts about visual and auditory perception are summarized briefly.

The second section presents a discussion of cross-modal influences, that is, interaction between auditory and visual stimuli in the perceptual apparatus, and cross-modality in general. A survey detailing the most accepted theories of how audio-visual (bimodal) perception is achieved in the human brain is given. This is far more complex than just adding the results of auditory and visual processing and is therefore worth an extended discussion. This is followed by examples of effects in bimodal perception (based on research in the fields of psychology and cognitive sciences) that can be relevant in the context of game audio.

The third section discusses the concept of auditory and audio-visual plausibility. It briefly compares the requirements for exact (room) acoustic simulations versus real-time rendering and details the resulting constraints for computer games.

The next section gives an overview of issues related to interactivity, such as latency, user input, and perceptual feedback. Interactivity is closely related to the generation of presence, defined as the "perceptual illusion of non-mediation", or simply the feeling of "being there". The concept of presence is discussed as an indirect measure of perceived quality.

The fifth section elaborates on the concept of attention. The perception of multiple streams is discussed and an introduction to the general model of the Perceptual Cycle according to Neisser is given. From this, the concepts of selective attention and divided attention are discussed and capacity limits of the human perceptual system are explained.

Finally, in the sixth section, the resulting factors (technical as well as human) are arranged to form a qualitative model describing human audio-visual perception based on saliency of stimuli. Such a model can serve as a basis for determining the QoE in games in general and specifically for game audio. Experimental results documenting inner-


modal versus cross-modal effects on perceived audio quality are summarized.

Finally, a summary is given that reviews the most important concepts leading to the salience model presented in the preceding section. Further research potential is defined.

MECHANISMS OF HUMAN PERCEPTION

Vision (sight) and audition (hearing) are the most important human senses for playing games. In the real world, these senses provide us with information about the more remote surroundings, as opposed to taste (gustation), smell (olfaction), and touch (taction or pressure) which provide information about our immediate vicinity. Because vision and audition communicate spatial and temporal relations of objects, and because the necessary technology to stimulate the two is readily available on computer systems used in the home, most games stimulate only these two senses.

Visual Perception

Vision mainly serves to indicate the spatial correlation of objects, as the human visual system seldom responds to direct light stimulation. Rather, light is reflected by objects and thus transmits information about certain characteristics of the object. The direction of a visually perceived object corresponds directly to the position of its image on the retina, the place where the light receptors are located in the eye. At the same time, a visual stimulus occupies a position in perceptual space that is defined relative to a distance axis, as well as to the vertical and horizontal axes.

In the determination of an object's distance to the eye, there are a number of potential cues of depth. These include monocular mechanisms like interposition, size, and linear perspective as well as binocular cues like convergence and disparity. All of these are usually evaluated jointly, allowing us to solve even ambiguous situations with contradicting sensory information.

All these depth cues can be exploited even when the environment is at rest. As soon as motion (of objects or of the head) is present, motion parallax takes on an important role in depth perception. Motion parallax describes the fact that the image of an object far away from the viewer moves more slowly across the retina than the image of an object at a close distance. Motion parallax also provides cues in the monocular case.

Auditory Perception

Auditory stimuli are perceived to be localized in space. The sound is not heard within the ear, but it is phenomenally positioned at the source of the sound. In order to localize a sound, the auditory system relies on binaural and monaural acoustic cues. Directional hearing in the horizontal plane (azimuth) is dominated by two mechanisms which exploit binaural time differences and binaural intensity differences. For sinusoidal signals, interaural time differences (ITDs, the same stimulus arriving at different times at the left and the right ear) can be interpreted by the human hearing system as directional cues from around 80Hz up to a maximum frequency of around 1500Hz. This maximum frequency corresponds to a wavelength of roughly the distance between the two ears. For higher frequencies, more than one wavelength fits between the two ears, making the comparison of phase information between the left and right ear equivocal (Braasch, 2005). For signals with frequencies above 1500Hz, interaural level differences (ILDs) between the two ears are the primary cues (Blauert, 2001). Regardless of the source position, ILDs are small at low frequencies. This is because the dimensions of the head and pinnae (the outer ears visible on the sides of the head) are small in comparison to the wavelengths at frequencies below about 1500Hz. Therefore they do not represent any noteworthy obstacle to the propagation of sound.
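The 1500 Hz figure and the head-sized wavelength it implies can be checked with simple arithmetic. The following sketch is for illustration only; the speed of sound (343 m/s) and the ear-to-ear distance (about 0.22 m) are assumed round numbers, not values taken from the chapter:

```python
# Rough arithmetic behind the ITD frequency limit described above.
# Assumed values (not from the chapter): speed of sound C = 343 m/s,
# ear-to-ear distance D ~ 0.22 m.

C = 343.0   # speed of sound in air, m/s
D = 0.22    # approximate ear-to-ear distance, m

def wavelength(frequency_hz):
    """Wavelength in metres of a sinusoid at the given frequency."""
    return C / frequency_hz

def max_itd():
    """Largest possible interaural time difference in seconds:
    the time sound takes to travel the full ear-to-ear distance."""
    return D / C

if __name__ == "__main__":
    print(f"wavelength at 1500 Hz: {wavelength(1500.0):.3f} m")  # ~0.23 m, roughly head-sized
    print(f"maximum ITD: {max_itd() * 1e6:.0f} us")              # ~640 microseconds
    # Above ~1500 Hz more than one wavelength fits between the ears,
    # so interaural phase comparison becomes ambiguous:
    print(wavelength(3000.0) < D)  # True
```

The same two constants also show why ILDs take over at high frequencies: once the wavelength drops below the head dimensions, the head starts to act as an acoustic obstacle and casts a measurable level shadow.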


Directional hearing in the vertical plane (elevation) is dominated by monaural cues. These stem from direction-dependent spectral filtering caused by reflection and diffraction at the torso, head, and pinnae. Each direction of incidence (for instance, defined in terms of azimuth and elevation) is related to a unique spectral filtering for each individual. This spectral filtering can be described by head-related transfer functions (HRTFs). In addition to providing localization of sounds in the vertical plane, these spectral cues are also essential for resolving front-back confusions (Blauert, 2001). Pulkki (2001) reports that, for elevation perception, frequencies around 6kHz are especially important.

In everyday situations, localization of sound sources seldom relies on auditory cues alone. Knowledge of the potential source of a sound (for example, airplane noises from above, or crunching shoes from below) aids in the localization process. Visual cues heavily influence the localization of sound sources.

CROSS-MODAL INTERACTION BETWEEN AUDIO AND VIDEO

Human perception in real world situations is a multi-modal, recursive process. Stimuli from different modalities usually complement each other and make the perceptual process more unequivocal. Only those stimuli that can actually be perceived by the primary receptors of sound, light, pressure and so on contribute to an overall impression (which is the result of any perceptual process). The human perceptual process, because of its complexity, cannot easily be explained in a simple block diagram without neglecting important features. A number of descriptive models exist, but these only cover certain aspects of the process, depending on the level of abstraction at which the respective model is located.

Relatively little is known about the mechanisms of multi-modal processing in the human brain. The main questions with respect to audio-visual perception are: At what level of perceptual processing do cross-modal interactions occur? And what mechanism underlies them?

Joint Processing of Audio-Visual Stimuli

As early as 1909, Brodmann suggested a division of the cerebral cortex into 52 distinct regions, based on their histological characteristics (Brodmann, 1909). These areas, today called Brodmann areas, have later been associated with nervous functions. The most important areas in the audio-visual game context are the Primary Visual Cortex (V1), the Visual Association Cortex (V2 and V3), as well as the Primary Auditory Cortex and the anterior and posterior transverse temporal areas (H). This division suggests that the different modalities are related to separate regions of the brain, and that processing of stimuli is performed separately for each modality.

Taking a closer look at the brain reveals that the neurons of the neocortex are arranged in six horizontal layers, parallel to the surface. The functional units of cortical activity are organized in groups of neurons. These are connected by four types of fibers, of which the association fibers are especially interesting when looking at information exchange between cortical areas. Short association fibers (called loops) connect adjacent gyri, whereas long association fibers form bundles to connect more distant gyri in the same hemisphere. These association bundles give fibers to and receive fibers from the overlying gyri along their routes. They occupy most of the space underneath the cortex.

There are many such connections between different functional areas of the neocortex such that information can be exchanged between them and true multi-modal processing can be achieved. Goldstein (2002) gives an example of a red, rolling ball entering our field of view. Locally distinct neurons are then activated by either motion, shape, or color. Subsequently, dorsal and


ventral streams are also activated. Although the stimuli perceived by our senses. There is actu-
involved neurons are locally distinct, we perceive ally no need to consciously further evaluate the
one singular object, not separate rolling, red color, different percepts in terms of relevance, because
or round shape. they usually complement (and not contradict)
Until now, it is unclear how the processing of each other.
multiple characteristics of a single object is orga- In order to actually verify any naturally given
nized. A number of theories have been suggested order of significance of the perceived stimuli, it is
to explain this binding problem, and the explora- necessary to present the human perceptual system
tion of binding in the visual system has become a with contradictory sensory information and see
heavily discussed topic. According to Goldstein what the generally dominating modality is—if
(2002), the most prominent theory, suggested by there is any. There have been a number of scien-
Singer, Engel, Kreiter, Munk, Neuenschwander, tific efforts to explain in a perceptual relevance
and Roelfsema (1997), assumes that visual ob- model how the human perceptual system weighs
jects are represented by groups of neurons. These the different contradicting percepts.
so-called cell-assemblies are activated jointly, Two such models have been proposed to de-
producing an oscillatory response. This way, scribe how perceptual judgments are made when
neurons belonging to the same cell-assembly can signals from different modalities are conflicting.
synchronize. Whenever the reaction to stimuli One of these models suggests that the signal that
is synchronized, this means that the respective is typically most reliable dominates the competi-
cortical areas are processing data coming from tion completely in a winner-take-it-all fashion: the
one single object or context. judgment is based exclusively on the dominant
Yet, this binding by synchrony theory has signal. In the context of spatial localization based
left doubts with respect to the interpretation and on visual and auditory cues, this model is called
processing of the synchrony code. For example, visual capture because localization judgments
Klein, König, and Körding (2003) postulate that are made based on visual information only. The
“many properties of the mammalian visual system other model suggests that perceptual judgments
can be explained as leading to optimally sparse are based on a mixture of information originating
neural responses in response to pictures of natural from multiple modalities. This can be described as
scenes” (p. 659). According to Goldstein (2002), an optimal model of sensory integration which has
many others argue that binding can be explained by been derived based on the maximum-likelihood
(selective) attention. Attention is discussed below. estimation (MLE) theory. This model assumes
that the percepts in the different modalities are
Dominance of single Modalities statistically independent and that the estimate of a
property under examination by a human observer
Very often the dominance of visual stimuli over has a normal distribution. In engineering literature,
other modalities is accepted naturally as a given. the MLE model is also known as the Kalman Filter
In fact, looking at our everyday experiences we (Kalman & Bucy, 1961).
might be inclined to accept this posit without fur- Battaglia, Jacobs, and Aslin (2003) report that
ther discussion: because “seeing is believing”, we several investigators have examined whether hu-
often think that we tend to trust our eyes more than man adults actually combine information from
the other senses. Yet, this appraisement is often multiple sensory sources in a statistically optimal
due to the fact that in the real world we seldom manner (that is, according to the MLE model).
have to face contradictions in the multi-modal They explain:

157
Perceived Quality in Game Audio

According to this model, a sensory source is reliable if the distribution of inferences based on that source has a relatively small variance; otherwise the source is regarded as unreliable. More-reliable sources are assigned a larger weight in a linear-cue-combination rule, and less reliable sources are assigned a smaller weight. (Battaglia et al., 2003, p. 1391)

Looking at it this way, visual capture is just a special case of the MLE model: the highly reliable percept (the visual cue) is assigned a weight of one, whereas the less reliable percept (the auditory cue) is assigned a weight of zero.

Battaglia et al. (2003) describe an experiment designed to answer the question whether human observers localize events presented simultaneously in the auditory and visual domain in a way that is best predicted by the visual capture model or by the MLE model. Their report suggests that both models are partially correct and that a hybrid model may provide the best account of their subjects' performances. As greater amounts of noise were added to the visual signal, subjects used more and more information perceived via the auditory channel, as suggested by the MLE model. Yet, most notably, according to their analysis, test subjects seemed to be biased towards using visual information to a greater extent than originally predicted by the MLE model. This means that the model used in the experiments committed a systematic error by constantly underestimating the test subjects' use of visual information (and thus overestimating their use of auditory information).

Shams, Kamitani, and Shimojo (2000, 2002) describe experiments in which a visual illusion was induced by sound, resulting in the auditory cue outweighing the visual cue. They presented test subjects with flashes of light and beeps of sound: whenever a single flash of light was accompanied by multiple auditory beeps, the single flash was perceived as multiple flashes. They conclude that this alteration of the visual percept is caused by cross-modal perceptual interactions, rather than having cognitive, attentional, or other origins. This is especially interesting as there was no degradation in the quality of the visual percept offered, which otherwise inevitably provokes the human perceptual system to rely on other modalities.

To sum up, the combined results of these experiments suggest that there is no clear, generalized bias of humans toward any of the available modalities in terms of dominance. Apparently, there is no such thing as a general dominance of visual percepts over other stimuli. Instead, whenever such a bias toward any of the available modalities exists, it seems to be highly dependent on the context. Whereas Battaglia et al. (2003) tested subjects with contradicting localization cues and found a bias toward the visual percept, Shams et al. (2000) tested subjects with temporal variations of cues and found a bias toward the auditory percept. This indicates that the human perceptual system tends to prefer those senses (give a higher weight to those percepts) that promise a higher degree of reliability or resolution for the presented perceptual problem: whereas the horizontal resolution of the human auditory system is roughly 2 to 3 degrees for sinusoidal signals coming from a forward direction (Zwicker & Fastl, 1999), the resolution of the visual system is at least 100 times as high, about 1 minute of arc (Howard, 1982). On the other hand, the time resolution of the auditory system allows the temporal structure of sounds to be resolved down to around 2ms (Zwicker & Fastl, 1999), whereas the human visual system can be tricked into believing in a continuously moving object when presented with only 24 sampled pictures of the continuous movement per second.

AUDITORY AND AUDIO-VISUAL PLAUSIBILITY

In classic room acoustic simulation, the time necessary to render the room audible (in other words, to perform the room acoustic simulation
itself), is often considered second-rank. Instead, the (acoustic) similarity between the simulation and the real situation is considered most important. In games, this situation is reversed: the available computational power is critical, and rendering has to be performed in real-time. Therefore, the concept of plausibility is applied: as long as there is no obvious contradiction between the visual and the acoustic representation of a virtual scene, the human senses merge auditory and visual impressions. Hence, it is usually possible to replace a cost-intensive geometry-based room acoustic simulation with a generic reverberation algorithm, for example, with combinations of all-pass filters and delays according to Schroeder (1962, 1970), with nested all-pass filters according to Gardner (1992), or with feedback delay line structures according to Jot and Chaigne (1991). This way, the auditory part of the presentation provides a rough sketch of the room's characteristics, whereas the visual part complements the overall impression with an increased level of detail. As long as the information provided in the two modalities is not contradictory, there is a high chance that the player's perceptual apparatus merges the stimuli and blends them to form a single, multi-modal representation of the scene.

In general, it might be arguable whether a "perfect" reproduction of the properties of a real-life experience will ever be possible in a computer game at all (with the assumption that a simulation is good enough as long as there is no perceptual difference to reality detectable by the human senses in the given situation). A weaker form of this requirement applies to scenes which have no counterpart in reality: their appearance needs to be plausible in every aspect and also in a sense of perfect agreement between the cues offered by the system in the different perceptual domains. In the context of games, this requirement can be further reduced. Because the visual representation of the scene is limited to a region in the frontal area and is not supposed to fill the field of view entirely, it suffices to require that the part of the virtual scene that is displayed (audio-visually) is perceived as plausible. It is thus accepted that stimuli coming from the surrounding real world (which cannot be entirely excluded in a typical computer game playing environment) might interfere with those from the virtual scene.

Furthermore, the time and investment necessary to develop completely accurate auditory and visual models is as much of a limiting factor for how much detail will be rendered as is the computational power alone. It is therefore reasonable to focus only on the most important stimuli and leave out those that would go unnoticed in a real-world situation. In order to do so, it is necessary to predict what the most important stimuli or objects in the overall audio-visual percept are.

INTERACTIVITY ISSUES AND PRESENCE

The concept of interactivity has been defined by Lee, Jin, Park, and Kang (2005) and Lee, Jeong, Park, and Ryu (2007) based on three major viewpoints: technology-oriented, communication-setting-oriented, and individual-oriented views. Here, the technology-oriented view of interactivity is adopted, which "defines interactivity as a characteristic of new technologies that makes an individual's participation in a communication setting possible and efficient" (Lee et al., 2007).

Steuer (1992) holds that interactivity is a stimulus-driven variable which is determined by the technological structure of the medium. According to Steuer, interactivity is "the extent to which users can participate in modifying the form and content of a mediated environment in real time" (p. 14)—in other words, the degree to which users can influence a target environment. He identifies three factors that contribute to interactivity:

• speed (the rate at which input can be assimilated into the mediated environment)
• range (the number of possibilities for action at any given time)
• mapping (the ability of a system to map its controls to changes in the mediated environment in a natural and predictable manner).

These factors are related to technological constraints that come into play when an application is supposed to provide interactivity to the user, as is the case for computer games. These technological constraints are briefly discussed in the following subsections.

Latency

Latency is one of the main concerns in computer games. Latency in the context of interactivity can be defined as the time that elapses between a user input and the apparent reaction of the system to that input. It is closely related to Steuer's speed factor.

Latencies are introduced by individual components of the system. These components may include input devices, signal processing algorithms, device drivers, communication lines, and so on. Although these components may interact in more than one way on a game platform, a system's end-to-end latency should not vary over time, to make it predictable.

Meehan, Razzaque, Whitton, and Brooks (2003) report a study in which they tested the perceived sense of presence (see below) for two different end-to-end latencies in a Virtual Environment (VE). The low latency was 50ms, the high latency was 90ms. Test subjects were presented with a relaxing environment that was switched to a threatening one, and their response was observed. Meehan et al. report that subjects in the low-latency group had a higher self-reported sense of presence and a statistically significantly higher change in heart rate between presentations of the two situations.

MacKenzie and Ware (1993) conducted the first quantitative experiments with respect to the effects of visual latency. Participants completed a Fitts' law target acquisition task in which they had to move the mouse from a starting point to a target, with a latency of between 25ms and 225ms from moving the mouse to actually seeing the cursor move on the screen. The authors report that the threshold at which latency started to affect performance was approximately 75ms. This effect was also dependent on the difficulty of the task: the harder the task, the greater the adverse effect caused by increased latency.

Wenzel (1998, 1999, 2001) has published a number of reports about the impact of system latency on dynamic performance in virtual acoustic environments, with a focus on the localization of sound sources. The bottom line is that, depending on the source velocity of the audio signal itself, localization of sound sources might be impaired when total system latency (end-to-end latency) is higher than around 60ms for audio-only presentations (Wenzel, 1998). On the other hand, an active localization task, tested on an HRTF-based reproduction system, showed comparable error rates for both low and very high latencies, suggesting that subjects were largely able to ignore latency altogether (Wenzel, 2001).

Nordahl (2005) examined the impact of self-induced footstep sounds on the perception of presence and latency. Interestingly, for audio-visual feedback in a VE, the maximum sound delay that was possible without latency being perceived as such was around 50% higher than for the audio-only feedback case (mean values of 60.9ms against 41.7ms). Nordahl explains this as attention being focused mainly on the visual, rather than the auditory, feedback in the audio-visual case.

Looking at these experimental results, it is difficult to draw a general conclusion on the maximum allowed latency for computer games. Apparently, the perception of latency as such depends on the system setup itself (screen, loudspeakers/headphones, for example), on the task, and on the content that is displayed. At the same time, measuring total system latency correctly is not a trivial task. Therefore, a general recommendation
would be to keep latency as low as possible within any such system, that is, preferably below 50ms.

Input and Perceptual Feedback

Perceptual feedback is the response that a system provides to the player's input. In games, perceptual feedback is usually provided in the auditory and visual domains. Input provided by the player can, in the general case, consist of any kind of signal accepted by the system for controlling it: speech, gesture, haptic control, eye tracking, and so forth.

Input and perceptual feedback are related to Steuer's (1992) mapping factor, and his range factor is related to the kind of interaction that is offered by the game. This depends strongly on the goal of the application or game itself. In a first-person shooter, players might expect a different range of interaction than in a business simulation game. Hence, both input and perceptual feedback define the degree of interactivity a game player can experience.

Presence

Closely related to interactivity is presence. Larsson, Västfjäll, and Kleiner (2003) define presence in interactive audio-visual application systems or VEs "as the feeling of 'being there'" (p. 98), and as the element that generates involvement of the user. Lombard and Ditton (1997) define presence in a broader sense as the "perceptual illusion of nonmediation" (p. 24).

According to Steuer (1992), the level of interactivity (the degree to which users can influence the target environment) has been found to be one of the key factors for the degree of involvement of a user. Steuer has found vividness (the ability to technologically display sensory-rich environments) to be the second fundamental component of presence. Along the same lines, Sheridan (1994) assumes the quality and extent of sensory information that is fed back to the user, as well as exploration and manipulation capabilities, to be crucial for the subjective feeling of presence. Other factors have been found to be determinants for presence, but these depend on the theoretical concept applied by the researcher.

Ellis (1996) points out that presence may not necessarily be the ultimate goal of every interactive audio-visual application system. He holds that successful task accomplishment can be far more important than presence, especially in situations "where the medium itself is not the message" (p. 253). This is easily accepted for player-game interaction, but is also applicable to communication between players in a multi-player game environment, when players have to team up to achieve a certain goal.

ATTENTION

When confronted with an increased number of stimuli, the human perceptual apparatus will try to keep up with the processing required for the input on offer. Generally, this can be achieved using different strategies. According to Pashler (1999), all of them are usually referred to as attention.

Many human activities require that information from a multitude of sources is taken in. When we attempt to monitor one stream of information, we pay attention to the source. Usually, natural scenes are multi-modal, thus providing information in more than one modality. Also, natural scenes usually provide more than one informational stream. The question is then, how is attention distributed if a multitude of information is presented in more than one stream? What role does the multi-modality of the information play in computer games?

Perception of Multiple Streams

Eijkman and Vendrik (1965) conducted one of the earliest studies on the perception of bimodal stimuli. They asked test subjects to detect increments in the intensity of light and tones. The stimuli lasted one second and were presented
either separately or simultaneously. Subjects detected the increments in one modality without interference from simultaneously monitoring the other modality, and performance of detection was comparable to that of monitoring only one modality. Other studies, for example, Shiffrin and Grantham (1974) and Gescheider, Sager, and Ruffolo (1975), also support these results for presentations of short bimodal stimuli.

As the stimuli presented in the auditory and the visual modalities were not contextually related in the study of Eijkman and Vendrik (1965), they constituted what could be called separate perceptual streams. Yet, detection of increments in the duration of the same stimuli showed marked interference. This suggests that temporal judgments might be processed by the same processing system (the same cortical areas), a theory that is further supported by the findings of Shams et al. (2000, 2002) already discussed in the subsection on visual dominance.

Interestingly, other studies combining auditory and visual discrimination tasks showed modest but notable decrements in terms of performance. This was observed when test subjects were confronted with bimodal stimuli in comparison to unimodal ones. To give an example, Tulving and Lindsay (1967) presented test subjects with tones and patches of light. Subjects were asked to judge the intensity of either tone or light, and the results were compared to the bimodal judgment of the intensity of both stimuli.

All of these studies characteristically involve magnitude judgments rather than categorical judgments. Therefore, the performance of test subjects in the bimodal case might have been limited by the difficulty of maintaining a standard in memory against which to judge the inputs, rather than by the influence of a second modality itself.

The Perceptual Cycle

Neisser's model of the Perceptual Cycle describes perception as a setup of schemata, perceptual exploration, and stimulus environment (Farris, 2003). These elements influence each other in a continuously updated circular process (see Figure 1). Thus, Neisser's model describes at a very abstract level how the perception of the environment is influenced by background knowledge, which is in turn updated by the perceived stimuli.

In Neisser's model, schemata represent an individual's knowledge about the environment. Schemata are based on previous experiences and
Figure 1. The Perceptual Cycle after Neisser. (Adapted from Farris, 2003)
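The cycle shown in Figure 1 can be paraphrased as a simple update loop. The sketch below is illustrative only: the class, its method, and the game-sound schema names are this example's assumptions, not Neisser's or the chapter's.

```python
# Minimal sketch of Neisser's Perceptual Cycle as an update loop:
# schemata direct exploration, exploration samples the stimulus
# environment, and novel stimuli modify the schemata in turn.

class PerceptualCycle:
    def __init__(self, schemata):
        # Schemata: long-term-memory knowledge that steers attention.
        self.schemata = set(schemata)

    def explore(self, stimulus_environment):
        """One turn of the cycle: compare stimuli against known schemata."""
        recognized, novel = [], []
        for stimulus in stimulus_environment:
            if stimulus in self.schemata:
                recognized.append(stimulus)  # given a meaning immediately
            else:
                novel.append(stimulus)       # generates a higher processing load
                self.schemata.add(stimulus)  # schema modification redirects exploration
        return recognized, novel

cycle = PerceptualCycle({"footstep", "gunshot"})
known, new = cycle.explore(["footstep", "unfamiliar_drone"])
# "footstep" is recognized; "unfamiliar_drone" modifies the schemata
# and will steer attention on the next turn of the cycle.
```

The point of the sketch is only the circularity: the output of one turn (the modified schemata) becomes the input that directs the next.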
are located in the long-term memory. Neisser attributes to them the generation of certain expectations and emotions that steer our attention in the further exploration of our environment. The exploratory process consists, according to Neisser, of the transfer of sensory information (the stimulus) into the short-term memory. In the exploratory process, the entirety of stimuli (the stimulus environment) is compared to the schemata already known. Recognized stimuli are given a meaning, whereas unrecognized stimuli will modify the schemata, which will then in turn direct the exploratory process further (Goldstein, 2002; Farris, 2003).

Returning to the area of games, the differences in schemata between human individuals cause the same stimulus to provoke different reactions in different game players. Following Neisser's model, new experiences (those that cause a modification of existing schemata) are especially likely to generate a higher load in terms of processing requirements. Schemata therefore also control the attention that we pay toward stimuli. The exploratory process is directed in the same way for multi-modal stimuli as for unimodal stimuli.

Selective Attention

An unmanageable number of studies have tried to identify and describe the strategies that are actually used in the human perceptual process. Pashler (1999) gives an overview and identifies two main concepts of attention: attention as based on exclusion (gating) or based on capacity (resource) allocation. The first concept defines attention as the mechanism that reduces the processing of irrelevant stimuli. It can be regarded as a filtering device that keeps stimuli out of the perceptual machinery that performs the recognition. Attention is therefore identified with a purely exclusionary mechanism.

The second concept construes the limited processing resource (rather than the filtering device) as attention. It suggests that when attention is given to an object, it is perceptually analyzed. When attention is allocated to several objects, they are processed in parallel until the capacity limits are exceeded. In that case, processing becomes less efficient or eventually impossible.

Neither of the two concepts can be ruled out by the many investigations performed in the scientific community up to now. Instead, assuming either the gating or the resource interpretation, all empirical results can be explained in some way or other. As a result, it must be concluded that both capacity limits and perceptual gating characterize human perceptual processing. This combined concept is termed controlled parallel processing (CPP). It claims that parallel processing of different objects is achievable but optional. At the same time, selective processing of a single object is possible, largely preventing other stimuli from undergoing full perceptual analysis.

In fact, further conceptualizing attention might not even be possible unless we understood the neural circuitry and operations that underlie these processes in detail. Rather, in the context of bimodal perception it is interesting whether there are separate perceptual attention systems associated with different sensory modalities or whether a unified multi-modal attention system exists. Are visual and auditory attention the same thing? According to Pashler (1999), investigations have shown that humans are capable of selecting visual stimuli in one location in space and auditory stimuli in another.

Spence, Nicholls, and Driver (2001) have examined the effect of expecting a stimulus in a certain modality upon human performance. They measured the reaction time to a stimulus located in the auditory, visual, or tactile modality between different frequencies of occurrence (an equal number of targets in all modalities against a 75% majority of targets located in one modality). Spence et al. report that reaction times for targets in the unexpected modalities were slower than for the expected modality or no expectancy at all. They further state that shifting attention away from the
tactile modality took longer than shifting from the auditory or visual modality. These results show that performance not only depends on what actually happens, but also on what is anticipated by a game player. Yet, it must also be noted that in this study a faster response time for the most likely modality was always related to priming from an event in the same modality on the previous trial, and not to the expectancy as such.

Alais and Blake (1999) have found evidence that attention focused on a visual object markedly amplifies the neural activity produced by features of the attended object. They draw on single-cell and neuroimaging studies and confirm that visual attention modulates neural activity in several areas within the visual cortex. They state that "attentional modulation seems to involve a boost in the gain of responses of cells to their preferred stimuli, not a sharpening of their stimulus selectivity" (p. 1015).

These findings clearly indicate that the perceptual process is actually controlled by attention. They cannot fully answer the question whether there is one multi-modal attention or whether attentions are associated with modalities. However, there are indicators that favor the latter.

Divided Attention and Perceptual Capacity Limits

One of these indicators is that capacity limits appear to be more severe when multiple stimuli are presented in the same modality compared with multiple modalities (Pashler, 1999; Reiter, Weitzel, & Cao, 2007; Reiter & Weitzel, 2007; Reiter, 2009). This means that capacity limits may occur earlier and more frequently if the main task and the so-called distractors (stimuli that are not directly related to the task/the direct focus of attention) are located in the same modality.

In an overview article, Lavie (2001) examines the capacity limits in selective attention. Lavie concludes from the evidence of several studies that selective attention as discussed in the previous section can either result in selective perception (the concept of gating, or early selection) or in selective behavior (resource allocation, or late selection). Most importantly, she argues that the choice of mechanism actually applied depends on the perceptual load. At low perceptual load, irrelevant information continues to be processed—early selection fails and late selection becomes necessary. When the perceptual load is high, irrelevant information is not processed and resource allocation is no longer needed. She cites a number of experimental studies that support these conclusions: processing of distractors ceases when the perceptual capacity is exhausted.

Interestingly, Lavie claims that distractor processing depends on perceptual capacity limits, rather than on limited information contained in the relevant stimuli. This makes the MLE model second-rank in importance: in the MLE model, limited information contained in the relevant stimuli should entail the processing of additional cues among the distractors to check for the reliability of that limited information and the correctness of its interpretation. Following Lavie, this is either not possible when the perceptual load is high, or attention needs to be shifted to formerly irrelevant information.

PERCEPTUAL SALIENCE AND SALIENCE MODEL

Landragin, Bellalem, and Romary (2001) suggest that, in the absence of information about the history of an interactive process, a (visual) object can be considered salient when it attracts the user's visual attention more than the other objects. This definition of salience, originally valid for the visual domain, can easily be extended to what might be called multi-modal salience, meaning that:

• certain properties of an object attract the user's general attention more than the other properties of that object
• certain objects attract the user's attention more than other objects in that scene.

A salience model in the game context requires a user model of perception, as well as a task model. The user model describes the familiarity of the game player with the objects' properties, as attention on the properties of an object may vary with the background and experience of the player. Whereas an avatar of a human being or a human speech utterance can be considered more or less equally salient to all players (because its significance to humans is embedded genetically), an acoustically trained person might focus more on the reverberation in a virtual room than a visually oriented person. The task model describes the fact that salience depends on intentionality: depending on the task a player is given, his focus will shift accordingly.

Salience also depends on the physical characteristics of the objects themselves. In the auditory domain it is known that certain noises with increased measures of properties like sharpness or roughness attract attention more than others (Zwicker & Fastl, 1999). Adding to this, salience can be due to the spatial or temporal disposition of the objects in a scene.

One of the most interesting aspects of a salience model in the context of computer games is its dependency on the degree of interactivity that the game offers to the player. If the player is allowed to interact freely with the objects in a virtual scene, then it is quite easy to determine the player's focus. Obviously, the player's focus will be on the object he is currently manipulating, so there is a clear indication of where to create a higher agreement of modalities. Consequently, games with fewer interaction possibilities are less likely to provide a sense of being there to the player. Thus, interactivity is important for the perceived realism of games in two different ways: first, it allows the player to do something in the virtual world, and second, it allows the application to determine the player's momentary focus. This information can then be used to enhance the audio-visual appearance of the object in focus, for instance, by making the sound (effects) related to that object more realistic in terms of acoustic details, frequency range, localization, and so on.

Salience Model

Obviously, there are situations in which the game engine has no or only limited information about the player's current focus. In these cases, it appears to be useful to have a salience model classifying the objects contained in the game scene. No such generalized multi-modal salience model exists yet. For the rather limited scope of a gaming situation, a qualitative salience model is suggested here.

The salience model comprises the influence factors that control the level of saliency of the objects in a game scene. Figure 2 shows how such a salience model may be structured: it is reasonable to start from the basis of human perception, the stimuli. In games, stimuli are generated by the game system itself, so they depend on a number of factors—the influence factors of level 1. These comprise the audio and visual reproduction setups, as well as the input devices used for player feedback to (and control of) the system, like keyboard, joystick, mouse, or any other dedicated input device. Influence factors of level 1 are those related to the generation and control of stimuli.

The core elements of human perception are sensory perception on the one hand and cognitive processing on the other. Sensory perception can be affected by a number of influence factors of level 2. These involve the physiology of the user (acuity of vision and hearing, masking effects caused by the limited resolution of the human sensors, and so on), as well as other factors directly related to the physical perception of stimuli.

Cognitive processing produces a response by the player. This response can be obvious, like an immediate reaction to a stimulus, or it can be an internal response like re-distributing attention/shifting focus or just entering another turn of
Figure 2. A salience model for perceived quality in audio-visual games
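As a toy illustration of how a salience model of this kind might be operationalized when the player's focus is unknown, the sketch below ranks scene objects by a weighted mix of physical properties, spatial disposition, and task relevance. The attribute names, the weights, and the linear combination are assumptions for illustration only; the chapter proposes a qualitative, not a quantitative, model.

```python
# Hypothetical multi-modal salience scoring for objects in a game scene.
# All attributes are normalized to [0, 1]; weights are illustrative.

def salience(obj, task_keywords, weights=(0.4, 0.3, 0.3)):
    w_prop, w_spatial, w_task = weights
    prop = max(obj["sharpness"], obj["roughness"])        # physical characteristics
    spatial = 1.0 - min(obj["distance_to_player"], 1.0)   # spatial disposition
    task = 1.0 if obj["label"] in task_keywords else 0.0  # intentionality (task model)
    return w_prop * prop + w_spatial * spatial + w_task * task

scene = [
    {"label": "enemy_footsteps", "sharpness": 0.8, "roughness": 0.6, "distance_to_player": 0.2},
    {"label": "ambient_hum", "sharpness": 0.1, "roughness": 0.2, "distance_to_player": 0.9},
]
ranked = sorted(scene, key=lambda o: salience(o, {"enemy_footsteps"}), reverse=True)
# The task-relevant, acoustically sharp object near the player ranks first,
# marking it as the candidate for more detailed audio-visual rendering.
```

A real engine would draw these attributes from the influence factors of levels 1 to 3 rather than from hand-set constants; the sketch only shows where such a classification would sit in the pipeline.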

the Perceptual Cycle. Obviously, the response is individual attribute’s weight not only depends on
governed by another set of influence factors of the audio-visual game scene under assessment (the
level 3. These span the widest range of factors and stimuli), but also on the experimental methodol-
are also the most difficult to quantify: experience, ogy itself. An attribute that is explicitly asked for
expectations, and socio-cultural background of the might be assumed to be of higher importance by
player; difficulty of task in a specific game situa- a test player (we know from our experience that
tion; degree of interactivity; and so forth Influence only important things are asked for in any kind
factors of level 3 are related to the processing and of test). The player’s attention will be directed
interpretation of the perceived stimuli. toward the attribute under assessment, which
Cognitive processing will eventually lead to distorts unbiased perception of the audio-visual
a certain quality impression that is a function of scene as a whole. Therefore, the player’s reaction
all influence factors of types 1–3. This quality in terms of quality rating can be assumed to be
impression cannot be directly quantified. It needs influenced as well.
additional processing to be uttered in the form of
ratings on a quality (or quality impairment) scale, Experimental results
as semantic descriptors and so on. The overall
quality impression is, in turn, the result of evalu- A number of experiments have shown that player
ating single or combined quality attributes. For interaction with an audio-visual game might
example, Woszczyk, Bech, and Hansen (1995) have an effect on the perceived overall quality
have developed a number of attributes that are (Jumisko-Pyykkö, Reiter, and Weigel, 2007; Re-
believed to be relevant for an overall audio-visual iter et al., 2007; Reiter & Weitzel, 2007; Reiter
quality impression: they organize these attributes & Jumisko-Pyykkö, 2007; Reiter, 2009). In these
(quality, magnitude, involvement, balance) into 4 experiments, the general assumption was that by
dimensions of perception (motion, mood, space, offering an attractive interactive content, or by
action), resulting in a 4 by 4 matrix of quality assigning the user a challenging task, this user
criteria. Yet, a quantification of their impact is would become more involved and thus experience
hardly possible as of now. This is because the a subjectively higher overall quality. Along the


same lines, it was hypothesized that the subject’s ability to differentiate between different levels of quality would decrease with an increase in difficulty of task/degree of interaction. The results show that this is not generally the case. However, when both task and main varying quality attribute were located in the same modality, such an effect could be observed.

More specifically, in the first experiment (Jumisko-Pyykkö et al., 2007; Reiter & Jumisko-Pyykkö, 2007) subjects were presented with a scenario located in a virtual sports gym. In the center of the gym, a loudspeaker was positioned that played back music/speech signals with varying amounts of reverberation (time and strength). Subjects were asked to rate the quality of reverberation under three different degrees of interaction:

1. No interaction (watch task): subjects were automatically moved on a pre-defined motion path through the virtual scenario
2. Limited interaction (watch and press button task): subjects were moved on a pre-defined motion path through the virtual scenario, but were asked to press a button whenever a certain object appeared within their field of view
3. Full interaction (navigate and collect task): subjects were asked to move freely through the scenario by using the computer mouse and to collect as many objects as possible by approaching them.

Interestingly, the ability of subjects to rate the quality of reverberation correctly did not vary with the degree of interaction/difficulty of the task (Friedman χ² = 3.3, df = 2, p > 0.05, ns). Although subjects claimed to have experienced more difficulties in the interactive tasks, this did not show in the statistical analysis of the collected data. Three possible explanations were looked at. The first was that the quality differences were too obvious, that is, the steps between the different amounts of reverberation were too big. This is possible but was not regarded as probable, given the results of informal experiments with a similar variation in reverberation. The second was that the tasks (pressing a button, and navigating/collecting objects) were not demanding enough and that it was too easy for subjects to dedicate part of their attention towards the quality-rating task. This was contradicted by the claims of the subjects themselves: a large majority claimed to have been distracted by the navigation task. The third possible explanation was subsequently looked at in further experiments: the additional cognitive load (pressing a button, navigating while collecting objects) was located in the visual and haptic domains, whereas the quality differences to be rated were located in the auditory domain.

In a second round of experiments (compare Reiter et al., 2007; Reiter, 2009), both the additional cognitive load and the quality variations were located in the auditory domain. A virtual room (replica of the entrance hall of a large university building) was equipped with a virtual loudspeaker in the center, and subjects were asked to navigate freely through the room using a computer mouse. The loudspeaker played back a randomized sequence of numbers from 1 to 4 read out loud. The reverberation time of the room acoustic simulation could be adjusted between 1.0s and 3.0s in 0.5s steps, with 2.0s considered the “reference” reverberation time. In the experiment, the reverberation time was changed from reference to another value at a single random point in time during a transition time frame beginning 5 seconds after the start and ending 5 seconds before the end of each 30 second trial. A modified Degradation Category Rating scale according to Recommendation ITU-T P.911 (1998) was used, consisting of 5 levels (much shorter, shorter, equal, longer, much longer), to have subjects compare the test reverberation time with the reference reverberation time.

The additional cognitive load consisted of a so-called n-back working memory task, similar to what has been introduced by Kirchner (1958). Here, subjects were asked to semantically compare


Figure 3. Presented stimuli and correct answers (“Comparison”) for 1-back and 2-back continuous-matching tasks

the current stimulus (the current number) with the one presented n steps back, see Figure 3. In the experiment, n was varied between 0 (no additional load) and 2 (high additional load).

The hypothesis was, again, that with increasing difficulty of the task, subjects would commit more errors in correctly rating the reverberation time as a measure of perceived quality. Here, for the statistical analysis, the rating errors were restructured according to flaw size, such that each 0.5s deviation would result in one error point. The subsequent analysis was performed on error points. A complete description of the experiment can be found in Reiter (2009, pp. 203-212).

A comparison of the error rates for “navigation only” with “navigation with 2-back task” resulted in a highly significant difference (T = 20, p ≤ 0.01). Comparing these results to the first experiment described above, it becomes apparent that inner-modal influence of task is significantly greater than cross-modal influence. This might indicate that humans perform a pre-processing of stimuli that, depending on modality, takes place in separate areas of the brain. Thus, in situations where stimuli that belong to different modalities have to be processed at the same time, we are better able to parallelize and distribute the processing accordingly. This is also suggested by the common theories of capacity limits in human attention, see above.

Game Example

In a third experiment (Reiter & Weitzel, 2007), inspired by Zielinski, Rumsey, Bech, de Bruyn, and Kassier (2003) and Kassier, Zielinski, and Rumsey (2003), it has been shown that cross-modal influence of interaction is indeed possible when stimuli and interaction/task are carefully balanced. For this, a simplified Space Invaders-like arcade game has been created, in which two different types of objects (donuts and snowballs) moved through a virtual room. Motion of objects was straight towards the baseline, on which the player could move left and right. Players were instructed to collect as many donuts as possible and to avoid collisions with snowballs. Each collected donut resulted in an increase of the player’s score whereas a collision with a snowball decreased the score. The current game score was displayed


Figure 4. Grey-scale screenshot of the game scenario

on the screen near the chimney, which served as the source of the flying objects. Figure 4 shows a screenshot of the game scenario.

A typical background music track for an arcade game was chosen for the game. For the experiment, each subject carried out a passive and an active session. The active session involved playing the computer game and evaluating the sound quality of the game music. This session was designed to cause a division of attention between evaluating the audio quality and reaching a high score. In the passive session, subjects were asked to evaluate the audio quality while a game demo was presented. Here, the attention of the subjects was assumed to be directed to the audio quality exclusively.

In both sessions, active or passive, either the original (20kHz) game music, or a low-pass filtered version with cut-off frequencies at fc = 11kHz, 12kHz or 13kHz was played. This was complemented by an anchor with fc = 4kHz. Thus a total of 5 items (3 test items, 1 anchor item, and 1 reference item corresponding to the original full-range signal) were presented to the players in the experiment. After each round of the game, players were asked to rate the perceived tonal quality degradation using the standardized ITU-T P.911 5-level impairment scale (Recommendation ITU-T P.911, 1999).

A total of 32 subjects participated in the experiment. Seven players were female and 25 were male (age M = 25.7, SD = 5.36). Regarding their listening experience, 20 subjects were considered initiated assessors and 12 classified as naive assessors. The group of initiated assessors had already gained abilities and knowledge in rating audio quality in preceding unimodal and bimodal subjective assessments. All participants reported normal hearing and normal or corrected to normal visual acuity.

A Wilcoxon T test showed that the quality ratings of the active session varied significantly from the ratings of the passive session for cut-off frequencies up to 12kHz. A significant decrease in rating correctness was shown for the active session in comparison to the passive session for the anchor item (T = 37, p ≤ 0.01), the cut-off frequency fc = 11kHz (T = 452.50, p ≤ 0.01), and the cut-off frequency fc = 12kHz (T = 812, p ≤ 0.01). For the cut-off frequency of 13kHz and the


reference item, no significant differences could be found (T = 630.50 and T = 75, resp., p > 0.05, ns).

The data analysis showed that the ratings of the tonal quality degradations in the active session differed significantly from those in the passive session. The low-pass filtering in the active session was rated as being less perceptible compared to the passive session, for which active players turned into passive viewers. More generally speaking, the experiment shows that an influence of interaction performed in one modality (visual-haptic) upon the perception of quality in another modality (in this case, auditory) is possible. Thus, cross-modal influences are possible.

In order for a cross-modal influence to exist, the characteristics of stimuli and interaction/task must be carefully balanced. At this time, it is not possible to determine or quantify that balance a priori. However, some of the influence factors that contribute to this balance have been identified in the salience model in Figure 2 above. These influence factors need to be quantified and this is a task for the future.

SUMMARY AND CONCLUSION

This chapter has reviewed some of the most important issues of perceived quality of audio in computer games. The main conclusion is that audio quality in games, as perceived by a game player, is not independent of other factors (apart from sound quality itself). Because games usually provide information and feedback to the player in more than the auditory modality, it is necessary also to take into account other modalities when judging the impact and quality of audio. A rating of audio quality alone, without the gameplay context, is not meaningful.

The physical mechanisms of human auditory and visual perception are well understood. Cross-modal interaction between the two domains, that is, perceptual processing in the human brain, needs further research before it is possible to model such processes. Still, whether it is possible to come up with a generalized model of cross-modal perceptual processing at all is highly questionable. It is assumed by many that its complexity exceeds by far the possibilities for designing a suitable model. Yet, it seems feasible to aim at perceptual models that are valid for certain perceptual scenarios only. A specific game-playing scenario can be one of these, as factors like setup (computer screen, loudspeakers/headphones, input devices) and task are of rather small variance across users, given a certain use case. This has been demonstrated in the game example above. A salience model as described in this contribution could therefore serve as a starting point for the exploitation of salience effects.

Saliency is closely related to distribution of attention and perceptual capacity limits. The experimental results summarized in this chapter indicate that effects of capacity limits are more dominant inner-modally than cross-modally. At the same time, capacity limits seem to be more predictable inner-modally than cross-modally. Unless we have better models of the perceptual processing underlying the generation of a subjective quality impression, it will be difficult to predict the perceived quality of audio in a multi-modal context in general, or in a game context as discussed here. Nevertheless, both the experiments described, and the literature and effects reviewed here, suggest that there is potential for exploitation of such perceptual constraints.

Future research should therefore concentrate on methodologies for the subjective evaluation of audio-visual quality, or multi-modal quality in general. Only a few recommendations exist for performing audio-visual experiments, and the impact of interactivity, as naturally given in any gameplay, on the perceived quality is, until now, simply not considered at all. Once proper recommendations exist, it will be much easier to compare and validate experimental results, thus paving the way for a quantification of the salience model described in this chapter.
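As a compact illustration of the two measures used in the second experiment reviewed in this chapter, the sketch below reproduces the logic of the n-back continuous-matching comparison (as in Figure 3) and of the error-point scoring, in which each 0.5s deviation between the rated and the presented reverberation time counts as one error point. This is a hypothetical reconstruction for illustration only; the function names and helper structure are the sketch's own and do not come from the original studies.

```python
# Illustrative sketch (not the code used in the experiments described above).

def n_back_answers(stimuli, n):
    """Correct responses for an n-back continuous-matching task: compare each
    stimulus with the one presented n steps back; the first n stimuli have no
    comparison partner (cf. Figure 3)."""
    answers = []
    for i, s in enumerate(stimuli):
        if i < n:
            answers.append(None)  # nothing was presented n steps back yet
        else:
            answers.append("same" if s == stimuli[i - n] else "different")
    return answers

def error_points(rated_rt, presented_rt, step=0.5):
    """One error point per 0.5 s deviation of the rated reverberation time
    from the reverberation time actually presented."""
    return round(abs(rated_rt - presented_rt) / step)

# A randomized sequence of the numbers 1 to 4, under a 2-back load:
print(n_back_answers([1, 3, 3, 1, 3, 2], 2))
# -> [None, None, 'different', 'different', 'same', 'different']
# Rating 1.5 s when 2.5 s was presented collects 2 error points:
print(error_points(1.5, 2.5))  # -> 2
```

With n = 0 the comparison degenerates to "same" for every stimulus, which matches the "no additional load" condition in the experiment.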


REFERENCES

Alais, D., & Blake, R. (1999). Neural strength of visual attention gauged by motion adaptation. Nature Neuroscience, 2(11), 1015–1018. doi:10.1038/14814

Battaglia, P. W., Jacobs, R. A., & Aslin, R. N. (2003). Bayesian integration of visual and auditory signals for spatial localization. Journal of the Optical Society of America, 20(7), 1391–1397. doi:10.1364/JOSAA.20.001391

Beerends, J. G., & De Caluwe, F. E. (1999). The influence of video quality on perceived audio quality and vice versa. Journal of the Audio Engineering Society, 47(5), 355–362.

Blauert, J. (2001). Spatial hearing: The psychophysics of human sound localization (3rd ed.). Cambridge, MA: MIT Press.

Braasch, J. (2005). Modelling of binaural hearing. In Blauert, J. (Ed.), Communication acoustics (pp. 75–108). Berlin: Springer Verlag. doi:10.1007/3-540-27437-5_4

Brodmann, K. (1909). Vergleichende Lokalisationslehre der Grosshirnrinde in ihren Prinzipien dargestellt auf Grund des Zellenbaues. Leipzig, Germany: Johann Ambrosius Barth Verlag.

Eijkman, E., & Vendrik, J. H. (1965). Can a sensory system be specified by its internal noise? The Journal of the Acoustical Society of America, 37, 1102–1109. doi:10.1121/1.1909530

Ellis, S. R. (1996). Presence of mind: A reaction to Thomas Sheridan’s “Musing on telepresence.” Presence, 5, 247–259.

Farnell, A. (2011). Behaviour, structure and causality in procedural audio. In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey, PA: IGI Global.

Farris, J. S. (2003). The human interaction cycle: A proposed and tested framework of perception, cognition, and action on the web. Unpublished doctoral dissertation. Kansas State University, USA.

Gardner, W. G. (1992, November). A realtime multichannel room simulator. Paper presented at the 124th meeting of the Acoustical Society of America.

Gescheider, G. A., Sager, L. C., & Ruffolo, L. J. (1975). Simultaneous auditory and tactile information processing. Perception & Psychophysics, 18, 209–216.

Goldstein, E. B. (2002). Wahrnehmungspsychologie (2nd ed.). Berlin: Spektrum Akademischer Verlag.

Howard, I. P. (1982). Human visual orientation. New York: Wiley.

Jot, J. M., & Chaigne, A. (1991). Digital delay networks for designing artificial reverberators. Paper presented at the AES 90th Convention. Preprint 3030.

Jumisko-Pyykkö, S., Reiter, U., & Weigel, C. (2007). Produced quality is not perceived quality—A qualitative approach to overall audiovisual quality. In Proceedings of the 3DTV Conference.

Kalman, R. E., & Bucy, R. S. (1961). New results in linear filtering and prediction problems. Journal of Basic Engineering, 83, 95–108.

Kassier, R., Zielinski, S., & Rumsey, F. (2003). Computer games and multichannel audio quality part 2—Evaluation of time-variant audio degradation under divided and undivided attention. AES 115th Convention. Preprint 5856.

Kirchner, W. K. (1958). Age differences in short-term retention of rapidly changing information. Journal of Experimental Psychology, 55(4), 352–358. doi:10.1037/h0043688

Klein, D. J., König, P., & Körding, K. P. (2003). Sparse spectrotemporal coding of sounds. EURASIP Journal on Applied Signal Processing, 7, 659–667. doi:10.1155/S1110865703303051

Landragin, F., Bellalem, N., & Romary, L. (2001). Visual salience and perceptual grouping in multimodal interactivity. In Proceedings of International Workshop on Information Presentation and Natural Multimodal Dialogue IPNMD.

Larsson, P., Västfjäll, D., & Kleiner, M. (2003). On the quality of experience: A multi-modal approach to perceptual ego-motion and sensed presence in virtual environments. In Proceedings of First ISCA ITRW on Auditory Quality of Systems AQS-2003, 97-100.

Lavie, N. (2001). Capacity limits in selective attention: Behavioral evidence and implications for neural activity. In Braun, J., & Koch, C. (Eds.), Visual attention and cortical circuits (pp. 49–60). Cambridge, MA: MIT Press.

Lee, K. M., Jeong, E. J., Park, N., & Ryu, S. (2007). Effects of networked interactivity in educational games: Mediating effects of social presence. In Proceedings of PRESENCE 2007, 10th Annual International Workshop on Presence, 179-186.

Lee, K. M., Jin, S. A., Park, N., & Kang, S. (2005). Effects of narrative on feelings of presence in computer/video games. In Proceedings of the Annual Conference of the International Communication Association (ICA).

Lombard, M., & Ditton, Th. (1997). At the heart of it all: The concept of presence. Journal of Computer-Mediated Communication, 3.

MacKenzie, I. S., & Ware, C. (1993). Lag as a determinant of human performance in interactive systems. In Proceedings of the ACM Conference on Human Factors in Computing Systems – INTERCHI’93, 488-493.

Meehan, M., Razzaque, S., Whitton, M. C., & Brooks, F. P., Jr. (2003). Effect of latency on presence in stressful virtual environments. In Proceedings of IEEE Virtual Reality, 141-148.

Nordahl, R. (2005). Self-induced footsteps sounds in virtual reality: Latency, recognition, quality and presence. In Proceedings of PRESENCE 2005, 8th Annual International Workshop on Presence, 353-354.

Pashler, H. E. (1999). The psychology of attention. Cambridge, MA: MIT Press.

Pulkki, V. (2001). Spatial sound generation and perception by amplitude panning techniques. Unpublished doctoral dissertation. Helsinki University of Technology, Finland.

Recommendation ITU-T P.911. (1998/1999). Subjective audiovisual quality assessment methods for multimedia applications. Geneva: International Telecommunication Union.

Reiter, U. (2009). Bimodal audiovisual perception in interactive application systems of moderate complexity. Unpublished doctoral dissertation. TU Ilmenau, Germany.

Reiter, U., & Jumisko-Pyykkö, S. (2007). Watch, press and catch—Impact of divided attention on requirements of audiovisual quality. In Jacko, J. (Ed.), Human-Computer Interaction, Part III, HCI 2007 (pp. 943–952). Berlin: Springer Verlag.

Reiter, U., & Weitzel, M. (2007). Influence of interaction on perceived quality in audiovisual applications: Evaluation of cross-modal influence. In Proceedings of 13th International Conference on Auditory Displays (ICAD), 380-385.

Reiter, U., Weitzel, M., & Cao, S. (2007). Influence of interaction on perceived quality in audiovisual applications: Subjective assessment with n-back working memory task. In Proceedings of AES 30th International Conference.

Schroeder, M. R. (1962). Natural sounding artificial reverberation. Journal of the Audio Engineering Society, 10(3), 219–223.

Schroeder, M. R. (1970). Digital simulation of sound transmission in reverberant spaces (part 1). The Journal of the Acoustical Society of America, 47(2), 424–431. doi:10.1121/1.1911541

Shams, L., Kamitani, Y., & Shimojo, S. (2000). What you see is what you hear. Nature, 408, 788. doi:10.1038/35048669

Shams, L., Kamitani, Y., & Shimojo, S. (2002). Visual illusion induced by sound. Cognitive Brain Research, 14, 147–152. doi:10.1016/S0926-6410(02)00069-1

Sheridan, T. B. (1994). Further musings on the psychophysics of presence. Presence, 5, 241–246.

Shiffrin, R. M., & Grantham, D. W. (1974). Can attention be allocated to sensory modalities? Perception & Psychophysics, 15, 460–474.

Singer, W., Engel, A. K., Kreiter, A. K., Munk, M. H. J., Neuenschwander, S., & Roelfsema, P. R. (1997). Neuronal assemblies: Necessity, signature and detectability. Trends in Cognitive Sciences, 1(7), 252–261. doi:10.1016/S1364-6613(97)01079-6

Spence, C., Nicholls, M. E. R., & Driver, J. (2001). The cost of expecting events in the wrong sensory modality. Perception & Psychophysics, 63(2), 330–336.

Steuer, J. (1992). Defining virtual reality: Dimensions determining telepresence. The Journal of Communication, 42(4), 73–93. doi:10.1111/j.1460-2466.1992.tb00812.x

Tulving, E., & Lindsay, P. H. (1967). Identification of simultaneously presented simple visual and auditory stimuli. Acta Psychologica, 27, 101–109. doi:10.1016/0001-6918(67)90050-9

Wenzel, E. M. (1998). The impact of system latency on dynamic performance in virtual acoustic environments. In Proceedings of the 15th International Congress on Acoustics and 135th Meeting of the Acoustical Society of America, 2405-2406.

Wenzel, E. M. (1999). Effect of increasing system latency on localization of virtual sounds. In Proceedings of the AES 16th International Conference on Spatial Sound Reproduction, 42-50.

Wenzel, E. M. (2001). Effect of increasing system latency on localization of virtual sounds with short and long duration. In Proceedings of 7th International Conference on Auditory Displays (ICAD), 185-190.

Woszczyk, W., Bech, S., & Hansen, V. (1995). Interactions between audio-visual factors in a home theater system: Definition of subjective attributes. AES 99th Convention. Preprint 4133.

Zielinski, S., Rumsey, F., Bech, S., de Bruyn, B., & Kassier, R. (2003). Computer games and multichannel audio quality—The effect of division of attention between auditory and visual modalities. In Proceedings of the AES 24th International Conference on Multichannel Audio, 85-93.

Zwicker, E., & Fastl, H. (1999). Psychoacoustics—Facts and models (2nd ed.). Berlin: Springer Verlag.

KEY TERMS AND DEFINITIONS

Binaural: Literally means “having or relating to two ears”. Binaural hearing, along with frequency cues, lets humans determine the direction of incidence of sounds.

Brodmann Areas: 52 different regions of the cortex, defined on the basis of the organization of cells. Named after Korbinian Brodmann’s maps of cortical areas in humans, published 1909.


Cognitive Load: A term describing the load on working memory during instruction (problem solving, thinking, reasoning).

Dorsal Stream: Also known as the parietal stream, the “where” stream, or the “how” stream, proposed to be involved in the guidance of actions and recognizing where objects are in space.

Fitts’ Law: A model of the human movement in human-computer interaction and ergonomics which predicts that the time required to rapidly move to a target area is a function of the distance to and the size of the target.

Localization: The ability to detect the direction of incidence of a sound.

Monaural: Literally means “having or relating to one ear”.

Multi-Modal: More than one perceptual modality involved, usually the auditory and the visual domain, sometimes also including haptics.

Perceptual Cycle: A model describing human perception as a cyclic setup of schemata, perceptual exploration, and stimulus environment which influence each other in a continuously updated process, first introduced by US psychologist Ulric Neisser.

Presence: The feeling of being present in an artificial environment, for example a game scenario in a jungle.

Quality of Experience: The overall acceptability of an application or service, as perceived subjectively by the end-user. Quality of Experience includes the complete end-to-end system effects (client, terminal, network, services infrastructure and so on). Overall acceptability may be influenced by user expectations and context.

Salience: The state or quality of an item that stands out relative to neighboring items.

Schema: Previous knowledge, something we already understand or are familiar with.

Single-Cell Recording: A technique used in brain research to observe changes in voltage or current in a neuron, thus measuring a neuron’s activity.

Space Invaders: An arcade video game designed by Tomohiro Nishikado, released in 1978, with the aim of defeating waves of aliens with a cannon, earning as many points as possible.

Ventral Stream: Also known as the “what” stream, associated with object recognition and form representation.

Section 3
Emotion & Affect

Chapter 9
Causing Fear, Suspense,
and Anxiety Using Sound
Design in Computer Games
Paul Toprac
Southern Methodist University, USA

Ahmed Abdel-Meguid
Southern Methodist University, USA

ABSTRACT
This chapter provides a theoretical foundation for the study of how emotions are affected by game sound
as well as empirical evidence for determining how to promote fear, suspense, and anxiety in players using
sound effects. Four perspectives on emotions are described: Darwinian, James-Lange, cognitive, and
social constructivist. Three basic properties of diegetic sound effects were studied: volume, timing, and
source. Results strongly suggest that the best sound design for causing fear is high volume and timed
sound effects (synchronized game sound with visual moment) and somewhat suggest that sourced sound
effects also promote fear. For anxiety, results strongly suggest that the best sound design is medium
volume sound effects. Results also suggest that acousmatic and untimed sound effects evoke suspense
rather than anxiety. Low volume sound effects are not effective at evoking fear, suspense, and anxiety
due to potential masking by other sounds. Implications and future research directions are presented.

INTRODUCTION

Computer games are audio-visual entertainment media that provide an escapist experience (Grimshaw, 2007). That is, computer games utilize both audio and visual media to capture players’ attention and engage players’ motor and mental skills; thus immersing the players in the gameworld. This immersion provides an escape for players from everyday life. Immersion occurs when the game: (1) “monopolizes the senses” (Carr, 2006, p. 68), (2) engages the player psychologically, and (3) requires physical action (see Nacke & Grimshaw, 2011 for more on immersion). The authors of this chapter believe that all three components of immersion are highly linked and can be (and are) used to evoke emotions from players.

Visuals and sound are often used to elicit specific emotions among the consumers of computer games. Currently, however, the computer game

DOI: 10.4018/978-1-61692-828-5.ch009

Copyright © 2011, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.

industry is focused on the quality of the graphics within the game. The computer game industry has clear guidelines for visuals, but not particularly for sound. Yet, sound is at least as important as, if not more important than, visuals for creating immersion and evoking emotions (Anderson, 1996; Grimshaw, 2007), though often underrated by the players (Cunningham, Grout, & Picking, 2011). Sound can change the player’s perception of images to the point where the sound dominates even when the player is presented an opposing relationship between the sound and image (Collins, Tessler, Harrigan, Dixon, & Fugelsang, 2011). Unfortunately, as Collins (2007) states, “work into the sonic aspects of audio-visual media has neglected games [and] video games audio remains largely unexplored” (p. 263). Furthermore, as Serafin (2004) wrote, “[s]o far no quantitative results are available to help designers to build soundscapes which allow the user to feel fully immersed” (p. 4). And, finally, according to Nacke and Grimshaw (2011), “not much work has been put into sensing the emotional cues of game sound in games, let alone in understanding the impact of game sounds on players’ affective responses”.

The purpose of the current chapter is to create a theoretical foundation and empirical evidence for the study of how emotions and affect are impacted by game sound. Although Roux-Girard (2011) “firmly believes that adopting a position that emphasizes reception issues of gameplay can provide a more productive model than one that would be grounded directly in the production aspects (implementation and programming) of game audio”, we believe that researching the impact of the production aspects of game sounds is just as productive. Ultimately, we believe that both approaches are equally viable and should be used to understand the experience of game sounds. Whereas Roux-Girard attempts to understand the effect of game sounds from a top-down approach, our intent is to build from bottom-up a research foundation upon which further inquiry into the relationship between emotions and game sound can be conducted. Furthermore, our aim is to produce valid results that are able to both explain phenomena and be useful for game designers. Specifically, this chapter describes a study to determine the best sound design principles pertaining to game sound effects (defined here as all diegetic game sound except dialogue) to cause fear and anxiety in players, two common emotions that players feel while playing computer games. The empirical research examines how to manipulate three basic properties of game sound (volume, timing, and source) through a game level designed to evoke fear, suspense, and anxiety. Through this quantitative and qualitative examination, the general design principles of how to develop game sound effects to promote fear and anxiety are better understood.

BACKGROUND: LITERATURE AND FIELD REVIEW

In order to design games and perform research using game sound for promoting fear, suspense, and anxiety, both theories of emotion and the current state of the art design of sound effects in games are important to understand. Emotions and affect are elusive in nature, and difficult to define (Cornelius, 1996). For instance, some consider emotions and affect to be the same psychological construct, while other researchers consider affect to be the conscious experience of emotions. In either case, our research measures the conscious experience of emotion, whether that is considered affect or emotion or both. Furthermore, rather than define emotion and affect, which is attempted in Nacke and Grimshaw (2011), we will describe emotions from the perspectives of four theoretical traditions of research on emotion in psychology (Cornelius, 1996). These schools of thought on the sources and development of emotion are the Darwinian theory of emotions, James-Lange theory, cognitive theory, and social constructivist theory. Our intent is to provide an understanding of the emergence

Causing Fear, Suspense, and Anxiety Using Sound Design in Computer Games

of emotions while playing a game, which Perron (2005a) termed gameplay emotions. Gameplay emotions are different from everyday emotions. For instance, gameplay emotions can be paradoxical in nature, such as deliberately watching a scary movie to enjoy the sensation of fear (Cunningham et al., 2011). The following discussion on emotions is focused on fear, anxiety, and suspense.

Theories of Emotions and Games

The Darwinian Perspective of Emotions

In the Darwinian perspective of emotions, there are certain basic emotions that are inherited and shared across the human experience. Researchers of the Darwinian perspective, such as Plutchik (1984), have identified several primary emotions, such as rage, loathing, grief, amazement, terror, admiration, ecstasy, and vigilance. Each of these emotions has several levels of intensity. For instance, the less intense levels of the emotion of terror are fear and then apprehension. Game players can be observed showing many of these identified primary emotions. For example, players often feel fear or apprehension at the appearance of the enemy, particularly in survival horror computer games.

Plutchik’s theory posits that we can promote fear in everyone. Fear is a psychological experience that prepares individuals for the ‘freeze, fight or flight’ response (Gray, 1971). However, Plutchik’s theory does not easily account for anxiety. Fear is a reaction to a specific danger or threat while anxiety is unspecific, vague, and objectless (May, 1977). Thus, anxiety is not a lower level of intensity of fear, or even apprehension. Anxiety is diffuse, with a vague sense of apprehension (Kaplan & Sadock, 1998), rather than apprehension due to a specific stimulus (Gullone, King, & Ollendick, 2000). Anxiety is often thought to be a future-oriented mood--a vague, discomforting sense that things will go wrong--which can have an adaptive function of enhancing performance at optimal levels (Barlow, 1988). May (1977) resolves whether anxiety is innate or not by suggesting that all humans have the instinctive capacity to react to threats, whether the threat is concrete (for fear) or unspecific (for anxiety). However, what the individual considers threats may be learned, and these are triggered by the appraisal of particular events or stimuli.

The Cognitive Perspective of Emotions

The importance of appraisals of particular events or stimuli, and their associations with emotions, is illuminated by the cognitive theory of emotions (Cornelius, 1996). According to the cognitive perspective, emotions and behavior are constantly changing as an individual appraises and reappraises the changing environment (Folkman & Lazarus, 1990). Depending on what the player of a game consciously thinks about a situation, he or she can experience any of a range of emotions and behaviors. Appraisals and reappraisals are important parts of the emotional experience of survival horror games (Perron, 2004). Game players may feel fear and anxiety by appraising particular sounds as being “scary” or “creepy”, or they may appraise the same game sounds as “silly.” Game designers can promote the experience of fear and anxiety through priming cues, such as music, acousmatic sound effects, and visuals, which can encourage “thinking” about the scariness or creepiness of the game. After being primed, the player is more likely to appraise particular stimuli in the way that is desired by the game designer, such as with fear when a monster suddenly appears in survival horror computer games.

The James-Lange Perspective of Emotions

Appraisals are also an important part of the James-Lange perspective of emotions. However, in this perspective the appraisals are unconscious evaluations of the body’s response to stimuli (Cornelius, 1996). While playing games, the body
reacts and provides feedback to the brain, which unconsciously appraises the body’s reaction and influences further cognition. For instance, if the player jumps due to a stimulus, the mind may attribute that bodily action to being scared. Research by Wolfson and Case (2000) showed that louder sounds in a computer game increased heart rate and impacted physiological arousal and attention. According to the James-Lange perspective, the player may unconsciously attribute increased heart rate and arousal to a state of fear and/or anxiety.

The game designer can use this knowledge to his or her advantage. Loud and sudden noises, for example, can make players instinctively jump, which promotes the feeling of fear. However, it is less clear how to use this perspective for eliciting anxiety. If a player sweats profusely while playing a game, will he or she attribute that to fear, to anxiety, or to something else? Perhaps the answer depends on the stimuli, which leads some researchers back to the Darwinian and cognitive perspectives of emotions, where Darwinians believe that our emotional reactions to stimuli are innate and cognitivists believe that they are learned.

The Social Constructivist Perspective of Emotions

Are our reactions to particular stimuli innate or learned? If learned, how can game designers know what stimuli to use to promote fear and anxiety? Although there is some controversy regarding the answer to the former question, there do seem to be a few select stimuli that innately promote fear, such as sudden, unexpected movements, especially approach-motions, or sounds (Gebeke, 1993) and, for anxiety, the threatened security patterns between an individual and his or her valued significant persons (May, 1977). However, beyond that, it would seem that our reactions to particular events or stimuli are learned, and this leads to the question of how and what is learned. This question is best answered by the social constructivist perspective of emotions.

Fear and anxiety responses to particular stimuli are learned through conditioning from family and other valued persons, which, in turn, are part of the larger general culture (May, 1977). Social constructivists believe that emotions are used to maintain interpersonal relationships and identity in a person’s communities (Greeno, Collins, & Resnick, 1996). The community can be friends, relatives, or other game players, who are all influenced by the general culture. For instance, people often feel scared when they suddenly see a cockroach because that is what they have learned from their mothers, who feared and loathed cockroaches. If they did not feel fear, but rather liked cockroaches, their relationship with their mothers may have become strained, which, at a young age, would not be desirable. Thus, these youths appropriate the fear of cockroaches from their mothers, who, in turn, maintain this fear because it is part of the cultural milieu in which the mothers desired to participate. Finally, as Cunningham, Grout, and Picking (2011) point out, the social and cultural milieu includes the context in which the person is experiencing his or her emotion. Thus, emotions that emerge while playing games (that is, gameplay emotions) are different from everyday emotions because the context of playing games is not the same as the context of typical everyday experiences.

For game designers, this means that anything that the player has been taught to fear can be leveraged to promote fear, including such things as death and failure, which are risks (to the player’s avatar) in most computer games. Furthermore, game designers can use sound effects as cues for threats. For example, if a player gets close to an electrical hazard, the game designer can add a loud sparking noise to scare the player. However, game designers must keep in mind that particular graphics and sounds may elicit different or less intense emotions between individuals in different cultures. For instance, the sound of a slide-action shotgun pumping may promote more fear or
anxiety in America than in countries where there are fewer shotguns.

Though “response to sound, therefore, can vary from player to player” (Collins et al., 2011), the theories of emotion described in this section provide a framework for understanding the sources and range of emotional responses of players to game sound. In the Darwinian perspective, there are certain basic emotions that are inherited and shared across the human experience. The cognitive perspective on emotions contributes to understanding them by explicating the importance of appraisals of stimuli and their underlying association with emotions. Researchers of the James-Lange theory of emotions believe that humans first experience bodily changes as a result of the perception of the emotion-eliciting stimuli, and that this is the experience of the feeling. Furthermore, the social constructivist perspective posits the theory that emotions are learned and culturally determined. These emotion theories correspond to three generally accepted forms of human expression of emotions: expressive behavior (showing the emotion), subjective experience (appraising the feeling), and the physiological component (sympathetic arousal) (see Cunningham et al., 2011).

As previously mentioned, anxiety, at first glance, could be conceptualized as a less intense experience of fear, but this is not considered by most emotion researchers to be the case. Fear and anxiety are closely related but not the same (Gullone, King, & Ollendick, 2000). Fear is an emotional response to a particular event or object and anxiety is an emotional response to an unspecific event or object. Though fear and anxiety are considered two separate mental processes, representing different affective and cognitive states, the two are considered linked. Fear can feed off anxiety and vice versa. Game designers may be able to increase players’ fear if the players are already anxious rather than in a state of calm. Because the player is already in a state of nervousness and worry, he or she may perceive a threat to be more dangerous than warranted, resulting in an elevated fear response.

So, what is the emotion of suspense? Compared to fear and anxiety, there has been very little research on suspense. However, psychologists quoted in Paradox of Suspense (Carroll, 1996) provide this definition of suspense: “…a Fear emotion coupled with the cognitive state of uncertainty” (p. 78). That is, fear coupled with anxiety. The film scholar Zillman (1991) describes suspense as “the experience of uncertainty regarding the outcome of a potentially hostile confrontation” (p. 283), which is similar to the definition of anxiety but with more emphasis on specific stimuli that are associated with fear. Thus, we conclude that suspense is the intersection or overlap of fear and anxiety. Suspense can be viewed as fear of an imminent threat that is likely to occur, but has not appeared, and/or a state of high anxiety due to an impending dangerous situation. As Krzywinska (2002), a professor in film studies, states: “Many video games deploy sound as a key sign of impending danger, designed to agitate a tingling sense in anticipation of the need to act” (p. 213).

Fear, anxiety, and suspense are gameplay emotions that are intentionally promoted in the design of survival horror games. Game designers control all that the player sees and hears within the survival horror game experience, and they have used this control to develop sound design techniques to elevate the player’s fear, suspense, and anxiety. Some of these techniques are explained in the following section.

Sound Design in Games

Sound design is used in almost all computer games. To design the soundscape of a computer game, there are a large number of sound properties that can be manipulated (see Liljedahl, 2011; Wilhelmsson & Wallén, 2011). In this chapter, sound properties are reduced to three independent variables: volume, timing, and source.
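As an illustration only, these three independent variables might be represented in code as a simple descriptor attached to each sound effect. The class and field names below are our own illustrative assumptions and are not part of any particular game or engine:

```python
from dataclasses import dataclass
from enum import Enum

class Timing(Enum):
    SYNCHRONIZED = 1  # coincides with a corresponding, often visible, event
    LAGGED = 2        # sound and event lead or trail each other
    UNTIMED = 3       # played without regard to any specific event

@dataclass
class SoundEffectDescriptor:
    name: str
    volume_db: float      # level relative to the average soundscape, in dB
    timing: Timing
    visible_source: bool  # True = sourced, False = acousmatic

# Two hypothetical examples in the spirit of the games reviewed below:
screech = SoundEffectDescriptor("monster_screech", +6.0, Timing.SYNCHRONIZED, True)
wind_chimes = SoundEffectDescriptor("wind_chimes", 0.0, Timing.UNTIMED, False)
```

Such a descriptor makes explicit that each effect occupies one point in a three-dimensional design space, which is how the manipulations in the experiment reported later can be kept independent of one another.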

We believe that these are three of the most basic properties of sound that are considered when designing the soundscape of a computer game. Volume is the relative loudness at which a sound is heard from a loudspeaker. Timing is the relative synchronization of the sound with its source. Source is the origin of a sound.

Game sounds are here categorized into three separate categories, following a common typology: music, dialogue, and sound effects (Wilhelmsson & Wallén, 2011). Music is a type of mood-setting technique that typically coincides with the theme of a game. Music is not considered part of this study because it is typically non-diegetic in nature and music is “heavily controlled by tempo” (Cunningham et al., 2011) rather than by the sound properties studied here. Dialogue is diegetic but is not considered in this chapter because properties of speech, such as intonation, may be more important than the selected properties of volume, timing, and source in their impact on players. Sound effects are diegetic game sounds, such as ambient, weapon, and environmental sounds. Examples of sound effects are: ambient noises such as rustling leaves and the steady drip of rain; player avatar sounds that are not related to dialogue, such as pained grunts; and weapon noises, such as the crack of a rifle or the swing of a club.

To understand how the sound properties of sound effects are used by game designers to cause fear and anxiety, the state of the art in sound design for games is reviewed in this section. Specifically, computer games in the survival horror genre are reviewed using the sound properties of volume, timing, and source. Survival horror games provide good case studies because they are designed to keep the player in a state of fear, suspense, and anxiety throughout the game: “Crawling with monsters, survival horror games make wonderful use of surprise, attack, appearances, and any other disturbing action that happens without warning” (Perron, 2004, p. 2). The games chosen for this field review are: Alone in the Dark (2008), Dead Space (2008), Doom 3 (2004), Eternal Darkness (2002), and Silent Hill 2 (2001). The following provides an overview of the sound design of the five different survival horror games selected:

•	Alone in the Dark: This game uses high quality visuals coupled with interspersed moments of surprise to cause player fear and anxiety.
•	Dead Space: This game has an abundance of ambient sound effects and clutter to add to the realism and increase the player’s anxiety and suspense. Dead Space also uses a combination of well-timed and high-volume sound effects to elicit fear responses from the player, and has received praise for its sound design.
•	Doom 3: The soundscape of Doom 3 focuses on voice acting and ambient sound effects. The ambience succeeds in creating a mood of suspense, while the encounters with monsters focus on creating fear.
•	Eternal Darkness: This game takes a minimalistic approach to sound design and uses sounds very sparingly. This approach allows the player to hear what few sounds are in the game with little difficulty, and this increases the effect of each sound.
•	Silent Hill 2: This is an older game that uses a minimalist approach to sound design, like Eternal Darkness, but with more of an emphasis on game sounds without visible sources to create suspense.

Volume of Sound Effects

While game designers decide on what level of volume to play their sound effects relative to other sounds, players change the overall volume of the game sounds emitted from their loudspeakers at will. Thus, the game designer can only manipulate the magnitude of volume in relation to other sounds. A “loud” sound has a higher volume than the average sounds currently playing. A “soft” sound has a lower volume than the average.
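Because the designer controls volume only relative to the rest of the mix, the loud/soft distinction can be sketched as a comparison against the average level of the sounds currently playing. The routine below is a minimal sketch; the 3 dB margin is an assumption for illustration, not a value taken from the literature:

```python
def classify_volume(effect_db: float, playing_db: list[float], margin: float = 3.0) -> str:
    """Classify a sound effect as 'loud', 'medium', or 'soft' relative to
    the average level (in dB) of the sounds currently playing."""
    average = sum(playing_db) / len(playing_db)
    if effect_db > average + margin:
        return "loud"    # candidate for sudden, fear-evoking effects
    if effect_db < average - margin:
        return "soft"    # atmospheric, but risks going unnoticed
    return "medium"      # comparable to the soundscape

ambient_levels = [-18.0, -20.0, -16.0]         # hypothetical current mix levels, in dB
print(classify_volume(-6.0, ambient_levels))   # → loud
print(classify_volume(-18.0, ambient_levels))  # → medium
```

The point of such a routine is simply that “loud” is a moving target: the same effect level can be loud against a quiet ambience and inaudible against a dense one.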

Psychoacoustic research suggests that the lower in volume a sound effect is, the more likely it is that players will miss the sound (Healy, Proctor, & Weiner, 2003). For game designers, this means they should create important sound effects that are at least the same volume as, if not louder than, the ambient soundscape in order for them to be perceived. Loud sound effects are more likely than soft sounds to be effective at evoking sudden and shocking emotions in the player. Softer sounds, however, may serve as a good atmospheric tool that can enhance immersion and set a mood. Computer games typically play ambient sound effects at low to medium volume, while emotion-evoking sound effects play at medium to high volumes to maximize the likelihood that the player perceives them.

For instance, in Alone in the Dark, the ambient sound effects such as electrical sparks and raging fires are typically abrupt and louder than the background music but soft enough that they do not drown out other important sounds like dialogue and combat sound effects. The ambient sound effects of Dead Space consist of steam vents leaking, garbage rustling, and lights sparking, and are all low to medium volume. In these cases, it is not clear whether the ambient sounds solely promote greater immersion and/or promote anxiety.

In Dead Space, the player’s interaction with the monsters is the most important part of the game and, thus, the sound effects related to these interactions are the loudest. For instance, any time the player engages in combat, the monsters screech loudly and very discernibly until they die. The scream instantly tells the player that his or her avatar is in danger and, because the monsters can kill the player’s avatar, these sound effects can cause fear in the player.

All of the ambient sounds and music in Doom 3 are loud: they mask out almost all other game sound. The enemy sounds are quieter than the ambient noise, which causes the enemies to seem less menacing. The only things consistently louder than the music and the ambient noise are the player’s gun and the avatar’s pain screams. Doom 3 seems to focus more on visual quality than sound quality. Yet, one section in Doom 3 that stands out among the rest occurs when a screaming, flying skull circles around the player’s avatar, and its volume rises and falls based on its distance from that avatar. This part of the game causes fear due to the perceived danger from the sudden and loud sounds accompanied by the mysterious nature of the flying skulls, though over time the fear subsides as the player becomes more habituated to the situation. Furthermore, Doom 3 has an enemy ambush almost every time the player’s avatar picks up an item. The game has the same sound and the same enemy for many of these ambushes. This eventually becomes repetitive and boring, and players begin to anticipate the ambushes.

Eternal Darkness and Silent Hill 2 use the technique of “less is more,” where a few high volume sound effects scattered throughout evoke more fear and anxiety than many high volume recurring sound effects that may eventually cause habituation, as in Doom 3.

The use of volume with sound effects varies depending on whether the game designer is attempting to evoke fear or anxiety. Based on the above review of games, high volume, abrupt sound effects seem to be more effective at causing fear, while low to medium volume ambient sound effects may be more effective at creating suspense and anxiety by convincing players that they are in a dangerous circumstance.

Timing of Sound Effects

Game designers decide for each sound effect one of three alternatives for timing: (1) the sound effect is timed to coincide with a corresponding, often visible event or object, (2) the sound effect and the corresponding event or object lag each other, or (3) the sound effect is played without regard to corresponding specific event(s), that is, untimed. Thus, timing can be conceptualized as the degree of synchronization between the sound effect and visible object(s) (see Roux-Girard, 2011).
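The three timing alternatives can be sketched as a small scheduling routine. The delay ranges below are illustrative assumptions; note that the lagged mode allows the sound either to lead the event (a forewarning, as with Silent Hill 2’s radio static discussed next) or to trail it:

```python
import random

def schedule_sound(event_time: float, mode: str, level_length: float = 60.0) -> float:
    """Return a play time (in seconds) for a sound effect, given the
    time of the game event it relates to and one of three timing modes."""
    if mode == "synchronized":
        return event_time                              # coincides with the visible event
    if mode == "lagged":
        return event_time + random.uniform(-2.0, 2.0)  # leads (forewarns) or trails the event
    if mode == "untimed":
        return random.uniform(0.0, level_length)       # no corresponding event at all
    raise ValueError(f"unknown timing mode: {mode}")
```

In practice the untimed mode corresponds to ambient effects set to fire at random, as described below for Alone in the Dark and Silent Hill 2.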

For instance, when an enemy ambush in a game begins, an appropriate sound effect accompanies the ambush (the game sound is highly synchronized with the visible event), such as a door swinging open or glass shattering. The intended purpose of these synchronized sound effects can be to surprise or startle the player, which can promote fear. While many survival horror games use ambushes regularly, such as the majority of encounters in Dead Space and Doom 3, Silent Hill 2 seldom uses ambushes to scare the player. Rather, this game does the opposite by playing a radio static sound loop, emanating from the avatar’s pocket radio, to warn players of nearby enemies. The player quickly learns that the white noise is a forewarning of an imminent attack. This lagging technique, coupled with the extremely limited visibility in the game, causes players to search for the source of the static whenever they hear it. Players know that a dangerous situation is nearby, which often causes players to feel suspense. This forewarning is an emotional and cognitive cue for problem solving (Perron, 2004). Untimed environmental sound effects are present in Alone in the Dark and Silent Hill 2. In these games, sounds such as crackling fire, whistling wind, or shaking earth are seemingly set to play at random.

Source of Sound Effects

According to psychoacoustic theories, humans judge whether a sound comes from an appropriate source by the visible availability of a source and whether or not that source could sensibly create that sound (Healy, Proctor, & Weiner, 2003). Almost all games have clearly visible sources for their sound effects, such as the sound of an attacking enemy. Providing a visible source of sound helps the player determine what to do within the game, helps the player navigate through the game by listening (Grimshaw, 2008), and enhances the player’s avatar survival prospects (Roux-Girard, 2011). For instance, the player can listen for the location of monsters. Furthermore, ambient game sounds help to immerse the player by bridging the reality gap between the game and real physical environments (Liljedahl, 2011). Some examples of ambient sound effects are leaking ventilation shafts in Alone in the Dark, sparking electrical wires in Dead Space, and rustling leaves in Eternal Darkness, which are all visible to the player when played. These ambient sound effects also help set the mood of the game for players, which may encourage players to appraise objects and events in the game as scary.

In Silent Hill 2, however, most ambient sound effects have no visible source. Players of Silent Hill 2 are unable to find the source of these sounds (that is, acousmatic sounds), such as babies crying, discordant wind chimes clanging together, and tricycle bells ringing. Furthermore, monsters’ sounds are mixed at low volume within the non-diegetic music (Roux-Girard, 2011). These sound production techniques add a strong air of mystery and ambiguity between sound generators (Roux-Girard, 2011), which may cause anxiety.

Based on the literature and field review, our experimental hypothesis for designing sound effects for fear and anxiety is as follows: high volume sound effects, synchronized with the corresponding visual stimulus and visibly sourced, are more effective at creating fear, while low to medium volume, scary, eerie, or mysterious acousmatic sound effects are more effective at creating anxiety, with no difference between timed and untimed sound effects for anxiety. The authors believe that if the source of the sound effect can be seen by the player then the synchresis of timing with the visible threat becomes salient, promoting veridicality (Collins et al., 2011; Roux-Girard, 2011) and resulting in the player feeling fear. If the source cannot be seen, that is, for acousmatic sounds, then synchresis is not achieved and the player cannot determine the relationship between what the player sees and what the player hears: this should promote anxiety.
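In implementation terms, the sourced/acousmatic distinction often maps onto the choice between positional (3D) playback attached to a visible in-world emitter and non-positional (2D) playback with no locatable origin. The function below is a hypothetical sketch of that choice, not a real engine API:

```python
from typing import Optional, Tuple

def play_effect(sound_id: str,
                emitter_position: Optional[Tuple[float, float, float]] = None) -> dict:
    """Play a sound either attached to a visible emitter (sourced) or with
    no locatable position (acousmatic). Hypothetical sketch only."""
    if emitter_position is not None:
        # Sourced: positional playback lets the player localize the threat,
        # supporting the synchresis/veridicality route to fear.
        return {"mode": "3d", "position": emitter_position, "sound": sound_id}
    # Acousmatic: non-positional playback denies the player a locatable
    # source, which, per the hypothesis above, should promote anxiety.
    return {"mode": "2d", "position": None, "sound": sound_id}
```

The design choice is deliberate: withholding the position is itself the technique, not a limitation of the audio system.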

If the sound effect is not timed with the visible threat, then the sound effect will probably be ignored and will not promote fear or anxiety. Furthermore, timed sound effects--synchronized game sound and visual threat--will not promote anxiety because the threat becomes a clear stimulus. Thus, timing should not promote anxiety. Fear and anxiety were measured in the experiment discussed below, rather than suspense, because fear and anxiety are considered separate emotions, whereas suspense is the overlap of these emotions, which could confound the interpretation of the results. Our hypothesis was tested using the methodology described in the next section.

EXPERIMENTAL EVIDENCE

The hypothesis, as stated in the previous section, was tested using a survival horror game level in Gears of War (2007), which was created in Unreal Editor 3. During each test subject’s play-through, the participant heard one randomly selected alternative (wolf howl, gunfire, or wretch growl) for the volume test, one randomly selected alternative (thunder, boomer growl, or creaking door) for the timing test, and one randomly selected alternative (locust growl, glass shattering, or footsteps) for the source test. Both quantitative data, using 7-point self-report surveys, and qualitative data were gathered and analyzed. Although there can be issues with “after-the-fact narration” (Nacke & Grimshaw, 2011) by participants completing self-report surveys and interviews, the use of these indirect measures is a common approach to data gathering in research on emotions. Thirty-four participants in the U.S.A., ten females and twenty-four males, took part in the study. The average participant age was 26 years. The average playing time per week was about eleven hours, and fifteen participants (approximately 44%) liked playing survival horror games. (For a full exposition of the methodology and results, see Abdel-Meguid, 2009.)

Causing Fear Findings

Results showed a statistically significant (p < 0.05) and large (η² > 0.14) difference in fear due to the volume of sound effects between low volume sound and high volume sound, as well as between medium volume sound and high volume sound. No meaningful qualitative data was gathered for volume-related fear responses.

For timing, results showed a statistically significant and very large (see eta squared; Cohen, 1988) increase between timed and untimed sound effects. Qualitative data showed that timed sound effects enhanced the fear of many players when accompanied by a visual gameplay element, such as the presence of an in-game enemy, though the sound effect by itself may not have substantially elicited fear.

Findings for fear related to sourced sound effects appeared to be considerable but they were not statistically significant. Several participants verbally reported that the acousmatic sound effects, such as a breaking window or footsteps on the ceiling, evoked fear. In particular, participants reported that the acousmatic sound effects required them to be attentive to possible threats.

Causing Anxiety Findings

Results showed a statistically significant and large difference in anxiety due to the volume of sound effects between low volume sound and medium volume sound, as well as between low volume sound and high volume sound. No meaningful qualitative data was gathered for volume-related anxiety responses.

There was not a statistically significant difference between timed and untimed sound effects for anxiety. Qualitative data showed that untimed sound effects caught some players off guard, because they could not determine whether the sounds were meant to signal danger or if they were benign sounds.
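For readers unfamiliar with the effect size reported above, eta squared is the ratio of between-group variance to total variance; by Cohen’s (1988) conventions, roughly 0.01 is small, 0.06 is medium, and 0.14 is large. The sketch below computes it for a one-way design such as the volume conditions. The 7-point ratings used here are invented purely for illustration and are not the study’s data:

```python
def eta_squared(groups: list[list[float]]) -> float:
    """Effect size for a one-way design: SS_between / SS_total."""
    all_scores = [x for g in groups for x in g]
    grand_mean = sum(all_scores) / len(all_scores)
    ss_total = sum((x - grand_mean) ** 2 for x in all_scores)
    ss_between = sum(len(g) * ((sum(g) / len(g)) - grand_mean) ** 2 for g in groups)
    return ss_between / ss_total

# Invented 7-point fear ratings for low/medium/high volume conditions:
low = [2, 3, 2, 3, 2]
medium = [3, 3, 4, 3, 3]
high = [5, 6, 5, 6, 5]
print(round(eta_squared([low, medium, high]), 2))  # → 0.88
```

A value above 0.14, as reported for the volume manipulation, means the condition accounts for a large share of the variance in the fear ratings.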

There was not a statistically significant difference between sourced and acousmatic sound effects. Some participants reported that they felt anxiety when they could not find the source of a sound effect. Other participants stopped and looked around when an acousmatic sound played, as they did for untimed sound effects, and proceeded to search for the source of the sound effect. One player stated that not finding the source of an untimed sound effect made him worry that he had possibly missed something important.

Discussion

Causing Fear Discussion

Results strongly suggest that high volume sound effects are most effective at causing fear in players. The quantitative data showed a significant, large increase in reported fear responses when a sound effect is louder than the rest of the sounds in the game. This implies to game designers that the louder they make a sound effect, relative to other sounds, the more effective it is at promoting fear in a player.

In addition, results strongly suggest that sound effects timed to coincide with a visual gameplay element, such as an in-game enemy, are effective at eliciting fear. Quantitative data showed a significant, large increase in fear responses due to timed sound effects compared to untimed sound effects. However, the qualitative data showed that players may not have reacted with fear to the sound itself, but, rather, fear was primarily evoked by the accompanying visual gameplay element. That is, a well-timed sound effect amplifies attention to the gameplay element and enhances the initial fear response caused by the visual perception of that element. The synchronization of the sound and corresponding image enhanced the feeling of fear through the process of synchresis, which promoted veridicality. This implies to game designers that accompanying a visual gameplay element with a well-timed, appropriate sound effect is more effective at causing fear in players than introducing the gameplay element without a sound effect or with a mistimed sound effect.

Finally, there were mixed results as to whether sourced sounds elicit more fear than acousmatic sound effects. Quantitative data did not show a significant increase in fear responses to sourced sounds compared to acousmatic sound effects. However, qualitative data suggested that an acousmatic sound effect drew attention to a potentially imminent danger in the game, which may have put players in a state of suspense. If this is the case, some participants may not have reported that sourced sound effects evoked significantly more fear than acousmatic sounds because they considered their suspense responses to be closer to fear than anxiety.

Causing Anxiety Discussion

Results showed that medium and high volume sound effects are significantly and substantially more effective at eliciting anxiety in players than low volume sound effects. Perhaps low volume sound effects are not easily perceived because they become masked amidst other higher volume sounds. In contrast, high volume sound effects are easily perceived but not necessary, because there is no significant difference between medium and high volume sound effects for evoking anxiety. Furthermore, given that high volume sound seems to elicit fear reactions from players, the use of this sound technique for evoking anxiety should be avoided because of its potential confounding effect. This implies to game designers that anxiety-causing sound effects are best played at the same volume as the average soundscape in the game. Low volume sound effects, perhaps, should be used to immerse the player by generating the ambience and mood of the game (Roux-Girard, 2011) rather than to promote specific emotions.

The quantitative results did not show a significant change in anxiety between timed and untimed sound effects. Qualitative results indicated that
Causing Fear, Suspense, and Anxiety Using Sound Design in Computer Games

some players were concerned by the untimed sounds. However, this concern about finding the nature of the untimed sound effect seems to relate more to not knowing the source of the sound than to its timing. Likewise, the quantitative results did not show a significant difference between sourced and acousmatic sound effects in evoking anxiety. Nevertheless, the qualitative results indicate that when some players heard an acousmatic sound, they stopped and looked around in an attempt to find the source. Being unable to find the source caused some players to become concerned that something dangerous would occur later in the game, which could mean that the players were in a state of suspense. And, as noted previously, players may have considered suspense to be more of a fear emotion than anxiety, which would have resulted in less reporting of anxiety. The overall implication for game designers is that playing a threatening or eerie sound effect without a visual source may be better at causing suspense in players than accompanying a sound with a visual threat. Furthermore, untimed sound effects can also promote suspense if the player perceives them as independent from the visual source, at which point the player would appraise the sound effect as acousmatic.

CONCLUSION

The aim of the current chapter was to provide a theoretical foundation for the study of evoking emotions using sound design and to determine how to cause fear and anxiety through sound design in computer games. The literature and field review that focused on human emotion theory and survival horror games provided an understanding of basic sound design principles of volume, timing, and source in relation to the emotions of fear and anxiety. This study used qualitative and quantitative methods to determine the best use of volume, timing, and source of diegetic sound effects to cause fear and anxiety in players.

The results of this study strongly suggest that the best sound design for causing fear is high volume sound effects that are well-timed with the accompanying visual element. This may seem obvious, but this study has provided statistical validity for using this technique and these results can be used as a basis for further research. For anxiety, results strongly suggest that the best sound design is medium volume sound effects. Furthermore, qualitative data suggest that suspense was evoked by untimed and acousmatic sound effects. And, although results suggested that medium sound effects were able to promote anxiety, players may have been in a state of suspense at this time, as well. Low, acousmatic sound effects appear not to be effective at evoking fear and anxiety, and possibly any emotion, due to their tendency to become masked by other sounds. Perhaps low volume sound effects may be best used for enhancing immersion or mood.

An interesting interpretation of the current study's evidence is that anxiety, as a separate gameplay emotion, is difficult to evoke on its own. Rather, the combination of fear and anxiety, that is, suspense, is easier to promote, and probably more desirable. Players play survival horror games to experience fear and suspense (Perron, 2005b). Anxiety is too diffuse and vague to be compelling for players to experience in survival horror games. Players of these games would rather have a more direct and powerful emotional response to perceived events and gameplay.

This chapter provided quantitative and qualitative evidence that game designers can manipulate the sound properties of volume, timing, and source to evoke fear, suspense, and anxiety in players. The literature and field review, methodology, and results of this study can serve as a foundation for future research.
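As a concrete illustration of these design implications, a sound-trigger routine might scale a diegetic effect's gain against the ambient soundscape level: matching the ambient RMS for anxiety cues and clearly exceeding it for fear cues. The sketch below is purely illustrative; the function names and the 2.0 fear multiplier are assumptions for demonstration, not values reported in the study.

```python
import math

def rms(samples):
    """Root-mean-square amplitude of a block of ambient samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def cue_gain(ambient_samples, emotion):
    """Playback gain for a diegetic sound effect, relative to the
    ambient soundscape (hypothetical mapping of the chapter's
    findings: fear cues louder than the mix, anxiety cues at the
    average soundscape level)."""
    ambient = rms(ambient_samples)
    if emotion == "fear":
        # Fear: play the effect clearly louder than the rest of the mix.
        return ambient * 2.0  # multiplier is an illustrative assumption
    if emotion == "anxiety":
        # Anxiety: match the average soundscape so the cue is heard
        # without triggering a fear response.
        return ambient
    raise ValueError(f"unsupported emotion: {emotion}")
```

On this logic, a low-volume cue (a gain well below the ambient RMS) would risk being masked entirely, which matches the chapter's suggestion that such sounds are better reserved for ambience and mood.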


FUTURE RESEARCH DIRECTIONS

One of the limitations of this study is that most participants were male and tended to be heavy gamers. Future studies may want to focus on populations that better represent female and/or casual gamers. Another limitation is that all the sound effects were included in one level and there was possible interaction between the volume, timing, and source variables. For instance, the volume of a sound effect may influence the emotional impact of timing and/or source. In order to control for these possible confounding factors, a future study may focus on the effect of solely one sound property. Furthermore, as in most studies, a larger population should yield better external validity.

One possible research direction is to continue studying fear, suspense, and anxiety beyond the parameters of the study described in this chapter. For instance, are the sound design techniques the same for causing fear, suspense, and anxiety in game genres other than survival horror? What is the difference between male and female responses to sound design techniques that cause fear, suspense, and anxiety? What are the effects of the absence of sound on fear, suspense, and anxiety? What is the relationship between visual gameplay elements and sound effects in how they affect players' fear, suspense, and anxiety? How can other types of sounds, such as music and dialogue, increase fear, suspense, and anxiety? How do other sound properties affect fear, suspense, and anxiety? Finally, future research can study other emotions using the same or other sound properties. For example, how can game designers elicit the emotions of anger, joy, and sadness in players through sound design? This research would lead to inquiry into the questions raised previously, such as the effect of genre, visual gameplay elements, and the type of sound evoking the studied emotion. From this research, we would not only understand how to promote certain emotional experiences from playing computer games through the use of sound design, but we may also be able to add new insights and dimensions to emotional theories, as well.

REFERENCES

Alone in the Dark. (2008). Eden Games.

Amdel-Meguid, A. A. (2009). Causing fear and anxiety through sound design in video games. Unpublished master's thesis. Southern Methodist University, Dallas, Texas, USA.

Anderson, J. D. (1996). The reality of illusion: An ecological approach to cognitive film theory. Carbondale, IL: Southern Illinois University Press.

Barlow, D. H. (1988). Anxiety and its disorders: The nature and treatment of anxiety and panic. New York: Guilford Press.

Carr, D. (2006). Space, navigation and affect. In Carr, D., Buckingham, D., Burn, A., & Schott, G. (Eds.), Computer games: Text, narrative and play (pp. 59–71). Cambridge, UK: Polity.

Carroll, N. (1996). The paradox of suspense. In Vorderer & Friedrichsen (Eds.), Suspense: Conceptualization, theoretical analysis, and empirical explorations (pp. 71–90). Hillsdale, NJ: Lawrence Erlbaum Associates.

Cohen, J. W. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates.

Collins, K. (2007). An introduction to the participatory and non-linear aspects of video games audio. In Hawkins, S., & Richardson, J. (Eds.), Essays on sound and vision (pp. 263–298). Helsinki, Finland: Helsinki University Press.


Collins, K., Tessler, H., Harrigan, K., Dixon, M. J., & Fugelsang, J. (2011). Sound in electronic gambling machines: A review of the literature and its relevance to game audio. In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey, PA: IGI Global.

Cornelius, R. R. (1996). The science of emotion. Upper Saddle River, NJ: Prentice-Hall.

Creswell, J. (2005). Educational research: Planning, conducting, and evaluating quantitative and qualitative research (2nd ed.). Upper Saddle River, NJ: Pearson Education.

Cunningham, S., Grout, V., & Picking, R. (2011). Emotion, content and context in sound and music. In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey, PA: IGI Global.

Dead Space. (2008). Electronic Arts.

Doom 3. (2004). Activision.

Eternal Darkness. (2002). Nintendo.

Folkman, S., & Lazarus, R. S. (1990). Coping and emotion. In Leventhal, N. B., & Trabasso, T. (Eds.), Psychological and biological approaches to emotion (pp. 313–332). Hillsdale, NJ: Erlbaum.

Gears of War. (2007). Microsoft.

Gebeke, D. (1993). Children and fear. Retrieved December 10, 2009, from http://www.ag.ndsu.edu/pubs/yf/famsci/he458w.htm.

Gray, J. A. (1971). The psychology of fear and stress. New York: McGraw-Hill.

Greeno, J. G., Collins, A. M., & Resnick, L. B. (1996). Cognition and learning. In Berliner, D., & Calfee, R. (Eds.), Handbook of educational psychology (pp. 15–46). New York: Simon & Schuster Macmillan.

Grimshaw, M. (2007). Sound and immersion in the first-person shooter. In Proceedings of The 11th International Computer Games Conference: AI, Animation, Mobile, Educational & Serious Games (CGAMES 2007).

Grimshaw, M. (2008). The acoustic ecology of the first-person shooter: The player experience of sound in the first-person shooter computer game. Saarbrucken, Germany: VDM Verlag.

Gullone, E., King, N., & Ollendick, T. (2000). The development and psychometric evaluation of the Fear Experiences Questionnaire: An attempt to disentangle the fear and anxiety constructs. Clinical Psychology & Psychotherapy, 7(1), 61–75. doi:10.1002/(SICI)1099-0879(200002)7:1<61::AID-CPP227>3.0.CO;2-P

Healy, A. F., Proctor, R. W., & Weiner, I. B. (2004). Handbook of psychology: Vol. 4. Experimental psychology. Hoboken, NJ: Wiley.

Kaplan, H. I., & Sadock, B. J. (1998). Synopsis of psychiatry. Baltimore, MD: Williams & Wilkins.

Krzywinska, T. (2002). Hands-on horror. In King, G., & Krzywinska, T. (Eds.), ScreenPlay: Cinema/Videogames/Interfaces (pp. 206–223). London: Wallflower.

Liljedahl, M. (2011). Sound for fantasy and freedom. In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey, PA: IGI Global.

Lincoln, Y. S., & Guba, E. D. (1985). Naturalistic inquiry. Thousand Oaks, CA: Sage Publications, Inc.

May, R. (1977). The meaning of anxiety (revised ed.). New York: Norton.

Nacke, L., & Grimshaw, M. (2011). Player-game interaction through affective sound. In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey, PA: IGI Global.

Perron, B. (2004). Sign of a threat: The effects of warning systems in survival horror games. In Proceedings of the Fourth International COSIGN (Computational Semiotics for Games and New Media) 2004 Conference.

Perron, B. (2005a). A cognitive psychological approach to gameplay emotions. In Proceedings of the Second International DiGRA (Digital Games Research Association) 2005 Conference.

Perron, B. (2005b). Coming to play at frightening yourself: Welcome to the world of horror video games. In Proceedings of the Aesthetics of Play conference.

Plutchik, R. (1984). Emotions: A general psychoevolutionary theory. Hillsdale, NJ: Erlbaum.

Roux-Girard, G. (2011). Listening to fear: A study of sound in horror computer games. In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey, PA: IGI Global.

Serafin, S. (2004). Sound design to enhance presence in photorealistic virtual reality. In Proceedings of the 2004 International Conference on Auditory Display.

Silent Hill 2. (2001). Konami.

Wilhelmsson, U., & Wallén, J. (2011). A combined model for the structuring of game audio. In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey, PA: IGI Global.

Wolfson, S., & Case, G. (2000). The effects of sound and colour on responses to a computer game. Interacting with Computers, 13, 183–192. doi:10.1016/S0953-5438(00)00037-0

Zillman, D. (1991). The logic of suspense and mystery. In Bryant, J., & Zillman, D. (Eds.), Responding to the screen: Reception and reaction processes (pp. 281–303). Hillsdale, NJ: Lawrence Erlbaum Associates.

ADDITIONAL READINGS

Barbara, S. C. (2003). Hearing in three dimensions. The Journal of the Acoustical Society of America, 113(4), 2200–2200.

Bridgett, R. (2008). Post-production sound: A new production model for interactive media. Soundtrack, 1(1), 29–39. doi:10.1386/st.1.1.29_1

Calvert, S. L., & Scott, M. C. (1989). Sound effects for children's temporal integration of fast-paced television content. Journal of Broadcasting & Electronic Media, 33(3), 233–246.

Gärdenfors, D. (2003). Designing sound-based computer games. Computer Creativity, 14(2), 111–114.

Grimshaw, M. (2009). The audio uncanny valley: Sound, fear and the horror game. In Proceedings of the Audio Mostly 2009 Conference.

Houlihan, K. (2003). Sound design: The expressive power of music, voice, and sound effects in cinema. Journal of Media Practice, 4(1), 69–69. doi:10.1386/jmpr.4.1.69/0

Izard, C. E. (2009). Emotion theory and research: Highlights, unanswered questions, and emerging issues. Annual Review of Psychology, 60(1), 1–25. doi:10.1146/annurev.psych.60.110707.163539

Jennett, C., & Cox, A. L. (2008). Measuring and defining the experience of immersion in games. International Journal of Human-Computer Studies, 66(9), 641–661. doi:10.1016/j.ijhcs.2008.04.004

Jones, K. (2004). Fear of emotions. Simulation & Gaming, 35(4), 454–460. doi:10.1177/1046878104269893

Jørgensen, K. (2007). On transdiegetic sounds in computer games. Northern Lights: Film & Media Studies Yearbook, 5(1), 105–117. doi:10.1386/nl.5.1.105_1


Klimmt, C., Rizzo, A., Vorderer, P., Koch, J., & Fischer, T. (2009). Experimental evidence for suspense as determinant of video game enjoyment. Cyberpsychology & Behavior, 12(1), 29–31. doi:10.1089/cpb.2008.0060

Kofler, A. (1997). Fear and anxiety across continents: The European and the American way. Innovation: The European Journal of Social Sciences, 10(4), 381–404.

Kyosik, K., & Hyungtai, C. (2008). Enhancement of a 3D sound using psychoacoustics. International Journal of Biological & Medical Sciences, 1(3), 151–155.

Levitt, H. (1971). Transformed up-down methods in psychoacoustics. The Journal of the Acoustical Society of America, 40, 467–477. doi:10.1121/1.1912375

Liu, M., Toprac, P., & Yuen, T. (2008). What factors make a multimedia learning environment engaging: A case study. In Zheng, R. (Ed.), Cognitive Effects of Multimedia Learning. Hershey, PA: Idea Group Inc.

Portnoy, S. (1997). Unmasking sound: Music and representation in The Shout and Blue. Spectator: The University of Southern California Journal of Film & Television, 17(2), 50–59.

Raghuvanshi, N. (2007). Real-time sound synthesis and propagation for games. Communications of the ACM, 50(7), 66–73. doi:10.1145/1272516.1272541

Roberts, J. R. (2006). Influence of sound and vibration from sports impacts on players' perceptions of equipment quality. Journal of Materials: Design & Applications, 220(4), 215–227.

Robertson, H. (2004). Random noises. Videomaker, 19(4), 71–74.

Satoru, O., & Shigeru, A. (2003). Video game apparatus, background sound output setting method in video game, and computer-readable recording medium storing background sound output setting program. The Journal of the Acoustical Society of America, 114(3), 1208–1208.

Schafer, R. M. (1994). The soundscape: Our sonic environment and the tuning of the world. Rochester, VT: Destiny Books.

Sider, L. (2003). If you wish to see, listen: The role of sound design. Journal of Media Practice, 4(1), 5–15. doi:10.1386/jmpr.4.1.5/0

Tinwell, A., & Grimshaw, M. (2009, April). Survival horror games: An uncanny modality. Proceedings of the International Conference Thinking After Dark.

Yantas, A. E., & Azcan, O. (2006). The effects of the sound-image relationship within sound education for interactive media design. Computer Creativity, 17(2), 91–99.

Zwicker, E., & Fastl, H. (1990). Psychoacoustics: Facts and models. New York: Springer-Verlag.

KEY TERMS AND DEFINITIONS

Anxiety: A generalized mood condition that occurs without an identifiable triggering stimulus.

Cognitive Emotional Theory: Cognitive activity in the form of judgments, evaluations, or thoughts is necessary in order for an emotion to occur.

Darwinian Emotional Theory: Emotions evolved via natural selection and therefore have cross-culturally universal counterparts.

Fear: An emotional response to a perceived threat.

James-Lange Emotional Theory: Emotional experience is largely due to the experience of bodily changes.


Psychoacoustics: The study of subjective human perception of sounds.

Sound Design: The manipulation of audio elements to achieve a desired effect.

Sound Principles: Components that most influence how sound is perceived.

Source: The object emitting sound.

Suspense: A feeling of fear and anxiety about the outcome of certain actions.

Timing: The degree of synchronization between the sound effect and visible object(s).

Volume: The amplitude or loudness of a sound.
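The three manipulated properties defined above (volume, timing, and source) can be summarized as a single experimental condition. The following hypothetical Python sketch maps a condition to the dominant emotion the chapter's results suggest; the class, level names, and mapping rules are illustrative simplifications for demonstration, not the study's statistical model.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SoundCondition:
    """One combination of the study's three sound properties."""
    volume: str   # "low" | "medium" | "high"
    timing: str   # "timed" | "untimed"
    source: str   # "sourced" | "acousmatic"

    def predicted_emotion(self):
        """Dominant emotion suggested by the chapter's results
        (a rule-of-thumb simplification, not the reported statistics)."""
        # High volume cues synchronized with a visual element: fear.
        if self.volume == "high" and self.timing == "timed":
            return "fear"
        # Acousmatic or untimed cues: qualitative data suggest suspense.
        if self.source == "acousmatic" or self.timing == "untimed":
            return "suspense"
        # Medium volume cues at the soundscape's average level: anxiety.
        if self.volume == "medium":
            return "anxiety"
        # Low volume cues tend to be masked: no specific emotion.
        return "none"
```

A usage such as `SoundCondition("high", "timed", "sourced").predicted_emotion()` would return `"fear"`, mirroring the conclusion that loud, well-timed, sourced effects are the most reliable fear cue.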


Chapter 10

Listening to Fear: A Study of Sound in Horror Computer Games

Guillaume Roux-Girard
University of Montréal, Canada

ABSTRACT
This chapter aims to explain how sound in horror computer games works towards eliciting emotions
in the gamer: namely fear and dread. More than just analyzing how the gamer produces meaning with
horror game sound in relation to its overarching generic context, it will look at how the inner relations
of the sonic structure of the game and the different functions of computer game sound are manipulated
to create the horrific strategies of the games. This chapter will also provide theoretical background on
sound, gameplay, and the reception of computer games to support my argument.

DOI: 10.4018/978-1-61692-828-5.ch010

INTRODUCTION

Computer game sound is as crucial to the creation of the depicted gameworld's mood as it is in its undeniable support to gameplay. In horror computer games, this role is increased tenfold as sound becomes the engine of the gamer's immersion within the horrific universe. From the morphology of the sound event to its audio-visual and videoludic staging, sound cues provide most of the information necessary for the gamer's progression in the game and, simultaneously, supply a range of emotions from simple surprise to the most intense terror. In horror computer games, it is not recommended that the gamer divert their attention from the various sound events, as a careful listening will allow for—or at least favour—the survival of their player character. In his thesis on the sound ecology of the first-person shooter, Mark Grimshaw (2008) underlined that in common day life, where dangers are limited, the auditory system "can operate in standby mode (or, in cognitive terminology, [the] auditory system is operating at a low level of perceptual readiness) awaiting more urgent signals as categorized by experience" (p. 10). Just as Grimshaw did about the genre at the heart of his study, I suggest that "the hostile world of the [horror computer] game

Copyright © 2011, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.

requires a high level of perceptual readiness in regard to sound" (p. 10). The level of attention required vis-à-vis sound must be increased all the more so as computer game environments are often designed to limit the visual perception of the gamer. Whether it is by means of a constraining virtual camera system (Taylor, 2005), by using stylistic effect such as the thick fog shrouding the streets of Silent Hill (Konami, 1999), or by drastically reducing sources of light, game designers have, through time, found a variety of ways to force the gamer to utilise their ears in order to help their player character survive in the nightmarish worlds in which they play.

To fully comprehend how horror computer games manage to frighten the gamer, one must understand how sound is structured, as well as be aware of how the gamer makes meaning with the information the sounds carry. From this point, many questions arise. What are the implications of the generic context on the reception of the sounds in horror computer games? On what basis should we approach the sound structure of those games? How does this structure allow for the mise en scène of the dreadful elements or horrific strategies of the games? What are the basic functions of horror computer game sounds and, once again, how can the game work on these functions to create a sentiment of fear and dread in the gamer?

As it will be further explored in the next sections of this chapter, I make the hypothesis that sound in computer games should be approached directly in regard to its purposes towards gameplay. After all, gameplay is what mainly distinguishes computer games from their linear audio-visual counterparts: the main difference between computer games and films being situated in the participatory and interactive nature of the videoludic medium. Therefore, it is mainly through a study of gameplay that true understanding of the role of game sound can be achieved. In this perspective, I also suggest that sound should be addressed in a way that is both accessible to designers and the most common gamer. In order to do so, I firmly believe that adopting a position that emphasizes reception issues of gameplay can provide a more productive model than one that would be grounded directly in the production aspects (implementation and programming) of game sound.

Overall, this text aims at explaining how horror game sound works in a way to elicit specific emotions in the gamer. Adopting a gamer- and gameplay-centric perspective, it wishes to highlight how the inner relations of the sonic structure and the different functions of game sound are used to create strategies based on the micro events and on the overarching generic context that regulates these events. With examples borrowed from the Alone in the Dark (I-motion, 1992-1995, Infogrames, 2001 & Atari, 2008), Resident Evil (Capcom, 1996-2009) and Silent Hill (Konami, 1999-2008) series, and from the computer game Dead Space (Electronic Arts, 2008), this paper will also try to demonstrate how the notion of genre, instead of being merely a tool to classify games, rather impacts on the expectations of the gamer and therefore structures the way they organize and make meaning of sound in relation to the game context.1

APPROACHING HORROR COMPUTER GAME SOUND

Before we try to understand what purposes sounds serve in horror computer games and how they contribute in generating fear, it is essential to take a look at the numerous factors which condition the gamer's journey and influence their listening through their gaming sessions.

The Horizon of Expectations

In her book Game Sound: An Introduction to the History, Theory, and Practice of Video Games, Karen Collins (2008) noted that "game [sound] has been significantly affected by the nature of technology […] and by the nature of the industry"


(p. 123). Indeed, economic and technological constraints are greatly responsible for the game's aesthetic as the limits imposed by production time and hardware often force the designers to lessen the richness of the soundscape while encouraging others to find inventive ways to overcome these constraints.2 However, as Collins explained, the games themselves also affect game sound by the means of their genre, narrative structure, and participatory nature. Consequently, she pointed out that "[g]enre in games is particularly important in that it helps to set the audience's expectations by providing a framework for understanding the rules of gameplay" (Collins, 2008, p. 123). Consequently, the horizon of expectations gamers have of the games is probably the first thing that will influence the production of meaning towards a sound. As Hans Robert Jauss (1982) explains:

The analysis of the literary experience of the reader [or the videoludic experience of the gamer] avoids the threatening pitfalls of psychology if it describes the reception and the influence of a work within the objectifiable system of expectations that arise for each work in the historical moment of its appearance, from a pre-understanding of the genre, from the form and themes of already familiar works, and from the opposition between poetics and practical language. (p. 22)

This horizon of expectations will thus be forged by the gamer's previous experiences at playing computer games, particularly those in the horror genre, but also his familiarity with broader horror mythology and conventions such as the ones found in movies and novels. We can also maintain that the notion of genre will play a determining role in the way game sound is produced and then received by the gaming community. This relationship between production and reception is fundamental to understand the functions and evolution of sound in horror computer games. Indeed, these games are generically marked "which rely on generic identification by an audience" (Neale, 2000, p. 28) as well as generically modelled which "draws on and conforms to existing generic traditions, conventions and formulae" (Neale, 2000, p. 28).3 To be considered as a horror game, a videoludic work must then be designed with a purpose of scaring the gamer and must be received as such by the gaming community that will then treat this intention as a gaming constraint. Accordingly, sound must be exploited to support these design choices, and, to a certain degree, correspond to the expectations the games produce.

What is a Horror Computer Game?

Horror computer games generate fear through mechanisms specifically tied to their videoludic nature even though they often draw their strategies of mise en scène from its cinematographic counterpart's conventions and mythologies (Whalen, 2004; Perron, 2004). Derived from the "adventure" genre (Whalen, 2004), these computer games exploit horror conventions on the plot level, often by opposing a lone individual, trapped inside a gloomy location, to a flock of bloodthirsty, monstrous creatures which he must confront—or sometimes run from—in order to survive. On the gameplay level, "the gamer has to find clues, gather objects [...] and solves puzzles" (Perron, 2004, p. 133). As it was mentioned previously, sound will play a determining role, as these games normally limit vision through their formal and aesthetic treatments, in helping the gamer to gather the necessary information on their environment to stay alive.

Horror computer games are not only designed to generate fear based on their narrative setting or the iconography they employ, but are also conceptualised to produce what Bernard Perron called gameplay emotions. According to Perron (2004), these games engender three different kinds of emotions: (1) fictional emotions which "are rooted in the fictional world and the concerns addressed by that world", (2) artefact emotion emanating "from concerns related to the artefact, as well as


stimulus characteristics based on these concerns", but mostly (3) gameplay emotions "that arise from the gamer's action in the game-world and the consequent reactions of this world" (p. 132). While all horror computer games are provided with a more or less elaborate fictional setting, in the end, it remains a part of the experience of gameplay. For horror games to be effective, gameplay mechanics must have been designed with the intent of scaring the gamer, by limiting the quantity of ammunition, for instance.

How to Approach Horror Computer Games Sound?

In the introduction to Sound Theory, Sound Practice, Rick Altman (1992) claimed that rather than seeing cinema as a self-centered text, it should be perceived as an event. Traditionally, film studies modelled the production and reception as gravitating around the film-as-text. However, as Altman explained: "Viewed as a macro-event, cinema is still seen as centred on the individual film, but […] the textual center is no longer the focal point of a series of concentric rings" (pp. 2-3). Following this model, the film-as-text mostly serves as a "point of interchange" between the process of production and the process of reception which mutually influence one another. The film itself thus becomes a representation of this "dialogue" or this "event". Computer games can be envisioned in a similar fashion. However, while technical aspects of computer game production might enlighten certain points about how the sounds are implemented and structured within the game code, I believe that it is not with regard to this code that horror computer games should be approached. While some PC games offer the option to look at how the files are organised on the disc, most computer games—particularly console games—do not. I will therefore be addressing sounds through the gameplay process of the individuals playing the game and all design matters will be dealt with in regard to creating this gameplay experience.

For this matter, the notion of genre will mostly serve as an overarching catalyst through which the gamer structures their journey in the games.4 Of course, the chapter will still deal with design issues such as looking at the implementation of sound strategies; however, this will be done in order to investigate how designers built these stratagems out of their predictions of how the gamer potentially produces meaning through the sounds, in regards to generic constraints, during their gameplay activity.

But then again, what is gameplay? In Half-Real, Jesper Juul (2005) approached the concept of gameplay using Richard Rouse's definition as a basis: "A game's gameplay is the degree and nature of the interactivity that game includes, i.e., how the [gamer] is able to interact with the game-world and how that game-world reacts to the choices the [gamer] makes" (Rouse in Juul, p. 87). To further elaborate on the question of gameplay, and to prevent a misunderstanding of the term, Juul added that "gameplay is not a mirror of the rules of a game, but a consequence of the game rules and the dispositions of the game players" (p. 88). Using this quotation as a starting point, and as a way to oppose the fallacy constructed by Manovich's (2001) definition of an algorithm, Arsenault and Perron (2009) reminded me that "one of the misconceptions of gameplay which needs to be addressed springs out when one does not make a distinction between the process of playing games and the game system itself" (p. 110). Following their logic, gameplay must not be understood as "the" game system but as the "ludic experience" emerging from the relation that is established between the gamer and the game system. Therefore, it is important to understand that through the eyes of a gamer, the experience of gameplay is not portrayed by a series of codes managed by an algorithm, nor a direct representation of the implementation of sound within this code.5 Consequently, I chose to exploit a terminology which facilitates the understanding of the gamer's cognitive process during

Listening to Fear

gameplay. It will allow me to better illustrate how the gamer produces meaning from sounds as a means of completing their main objective: the survival of their player character.

Arsenault and Perron (2009) defined computer games as "a chain of reactions" in which "[t]he [gamer] does not act so much as he reacts to what the game presents to him, and similarly, the game reacts to his input" (pp. 119-120). In other words, the gamer responds to events that were programmed by a designer (whose job partly consisted of predicting the gamer's reactions to the proposed events), and then the game acts in response to the gamer's input with other pre-programmed events fitting the new parameters. According to their gameplay (and gamer-centric) model, the authors explained (a single loop of) gameplay through four steps in which "the game always gets the first turn to speak" (Arsenault & Perron, 2009):

• From the game's database, the game's algorithm draws the 3-D objects and textures, and plays animations, sound files, and finds everything else that it needs to represent the game state
• The game outputs these to the screen, speakers, or other peripherals. The gamer uses his perceptual skills (bottom-up) to see, hear and/or feel what is happening
• The gamer analyses the data at hand through his broader anterior knowledge (in top-down fashion) of narrative convention, generic competence, gaming repertoire, etc. to make a decision
• The gamer uses his implementation skills (such as hand-eye coordination) to react to the game event, and the game recognizes this input and factors it into the change of the game state. (pp. 120-121)

However, as the authors recalled, "the most obvious flaw of representing gameplay with a single circle is that the temporal progression—the evolution of the gamer's relationship with the game—is left aside" (Arsenault & Perron, 2009, p. 115). To correct this failing, Arsenault and Perron proposed a model—the Magic Cycle (Figure 1)—that is based on three interconnected spirals: the heuristic spiral of gameplay, the heuristic spiral of narrative, and the hermeneutic spiral. They also clarified that "[t]he relationship to each other is one of inclusion: the gameplay leads to the unfolding of the narrative, and together the gameplay and the narrative can make possible some sort of interpretation" (p. 118). Their model also took into account the gamer's experience in gaming and the horizon of expectations of the gamer that are shaped by their previous knowledge of the game or sometimes by an introductory cut scene. While looking at the model, these are respectively represented by the dotted lines entitled "launch window" and by the inverted spiral. From this point, the looping process described above will be "repeated countless numbers of time to make up the magic cycle" (p. 121) and to represent the mental image the gamer develops about the game (represented by the Game′ of the model). This perpetual process, alongside the implication of the generic context, will therefore allow for the mental organisation of sounds towards the gamer's activity inside the game.

STRUCTURING HORROR COMPUTER GAME SOUND

When they are engaged in a horror game, the exercise of gameplay requires gamers to organise sounds, at least to some degree, according to their gaming objectives which, in the case of the genre studied here, mainly revolve around allowing their player character to survive the horrors of the game. In order to do so, the gamer tries to answer two basic questions regarding game sound: 1) From where does that sound originate? and 2) What is the cause of that sound? I therefore propose to explore a basic sound structure that will effectively represent the


Figure 1. Arsenault & Perron’s Magic Cycle. (© 2009, Arsenault and Perron. Used with permission)

cognitive process (as previously explained with Arsenault and Perron's model) that is performed almost unconsciously by the gamer while playing a horror computer game.

Inside and Outside of the "Diégèse"

While glancing at the game sound literature (Collins, 2008; Grimshaw, 2008; Huiberts & van Tol, 2008; Jørgensen, 2006, 2011; Stockburger, 2003), we notice that one of the most common ways to envision the structure and composition of sound in games is relative to its status regarding the diégèse of the game (I am using the French word to avoid any misconception that this term holds the same meaning as Plato's and Aristotle's definition of diegesis6). Taking its origin in film studies, the diégèse must be understood as a "mental reconstruction of a world" (Odin, 2000, p. 18, freely translated) that can be "perceived as an inhabitable space" (Odin, 2000, p. 23, freely translated). This definition of diégèse clearly refers to the "historico-temporal" universe in which the story—or, in the case that interests us, the simulation—takes place. This definition thus allows more easily for the parallel that is often established between the diégèse and the gameworld.7 From a structural perspective, more than in the description of a world, it is particularly in the division that exists between elements considered as being part of the fictional world (diegetic) and elements which are not judged to be components of the fictional world (extra-diegetic8) that this notion has found a niche in works on sound in game studies.

Indeed, while listening to horror computer game sound, whether or not a sound is part of the depicted gameworld will have a considerable impact on the decisions the gamer will make regarding this sound. Based on the gameplay model that was introduced earlier, these sound cues will engender many questions in an attempt to recreate the mental image of the game state. Is the sound produced by an instance present in the "diégèse"? If it is, does that instance represent a threat to the player character or is it just a part of the ambience of the gameworld? Furthermore, as hinted by this set of queries, while the diegetic status of a sound holds much importance, recreating the mental image of the game state necessitates a more elaborate set of qualifiers.


Sound Generators

In computer games, much attention must be paid to sound sources as they contribute to the construction of the diegetic space. However, more important than what instance or event emits the sound is what generates the sound. Not only does the notion of generator furnish knowledge about what caused a specific cue, but it also provides information on its relationship to other sounds, its relationship to the game state, and the situation in which it is heard. These sound generators, as Kristine Jørgensen (2008) explained, are "not the same as the source of the sounds. While the source is the object that physically (or virtually) produces the sound: the generator is what causes the event that produces the sound" (Player Interpretation of Audio in Context section, para. 2). If we adapt Jørgensen's example to a horror computer game context, this basically means that the shrieking sound emitted by one of Dead Space's necromorphs (its source) while being dismembered by the player character's plasma cutter is, in fact, generated by the gamer. Therefore, this concept (in its definition) also reflects the interactive nature of computer games by putting forward the agency9 of the gamer within the simulated world, as well as the response of the game to the gamer's actions.

While studying World of Warcraft, Jørgensen (2008) identified five categories of sound generators: the gamer, allies, enemies, the gameworld, and the game system, each of which is organized according to the perspective of the gamer. Even though some horror games propose an interaction with friendly non-player characters, such as Luis in Resident Evil 4 (Capcom, 2004), or, as in Resident Evil 5 (Capcom, 2009) and Left 4 Dead (Valve Software, 2008), offer a multiplayer co-operative mode, most games of the genre privilege the solitude of the player character and allies are normally quite scarce. Therefore, this chapter will focus on the dynamic and non-dynamic sounds (Collins, 2008) produced by the gamer, the enemies, the gameworld, and the game system. Accordingly, I will briefly describe these generators following Jørgensen's definition and adapt them to my own corpus of study. The general informative functions of each type of generator will also be mentioned as they will provide a tighter relationship with the next section of this chapter on the functions of horror computer game sounds.

A sound generated by the gamer is "caused by [gamer] action" (Jørgensen, 2008, Player Generated Sound section, para. 1). As Jørgensen explained:

The most important informative role of [gamer] generated sounds is to provide usability information, or more specifically to provide response since they always seem to appear immediately after a player action. Player generated sounds also provide spatial information, and sometimes also temporal and [player character] state information. (Player Generated Sound section, para. 1)

In Resident Evil (Capcom, 1996), for instance, these sounds may include footsteps, gunshots, the opening of doors, angry monster growls after they are shot by the gamer, the opening of Chris Redfield's or Jill Valentine's inventory menus, and so on.

For their part, enemy-generated cues "are produced externally from the [gamer's] perspective, by being detached from the [gamer's] own actions and emerging from the gameworld" (Jørgensen, 2008, Sound Generated By Enemies and Allies section, para. 1). Such sounds will furnish spatio-temporal information and will also serve "presence" purposes as they engage with the existence of enemies in the vicinity. Of course, these sounds also give information about modifications in the game state and supply progression functions of the game: these might include the sounds of off-screen or on-screen monsters, or may indicate that the player character has been wounded after being hit by a zombie.

Gameworld-generated sounds are similar to what Huiberts and van Tol (2008) described as


zone sounds. These sound cues consist of sounds "linked to the environment in which the game is played" (Huiberts & van Tol, 2008, Zone section, para. 1). While these sounds are often implemented to generate the ambience of the game, they also serve spatial functions and might give certain information about the game state. In Dead Space, these sounds include the rumbling of the ship and some of the gruesome sounds emitted by the pre-programmed bursts of blood coming out of the organic matter that can be found on the walls and floor.

Game system-generated sounds are by far the most ambiguous. Jørgensen (2008) defined them as sounds "generated by the system to provide information that any [player character] cannot produce on its own, and carry information directly connected to game rules and as well as game and [gamer] state" (Conclusions and Summary section, para. 3). Horror computer games do not include many of those sounds. However, a few examples can be found. The "fuzzing" sound, accompanied by heart pounding, that is emitted when a player character is lethally wounded in Resident Evil 5 could correspond to this description: it is not directly produced by a gamer's action but is generated by the system to warn the gamer that his player character needs immediate health assistance. While it is not explicitly mentioned by Jørgensen, I would argue that the extra-diegetic musical score of the game is also system generated. While this music often plays an affective role in the game, it also serves presence and game state purposes. For instance, in Alone in the Dark: Inferno (Atari, 2008), the music ramps up, signalling that enemies are nearby or attacking the player character. It is mostly according to the relationship between this extra-diegetic music, the gamer, and the gameworld that this category of generators will be examined in this chapter. These generators will be used as a structural basis when studying the creation of horror game sound strategies.

THE FUNCTIONS OF HORROR COMPUTER GAME SOUND

To reach their objective, gamers must also gather information about the game state. To do so, they must ask themselves what the functions of a particular sonic cue are and, if the sound serves more than one purpose, which function is more important according to the context.

In computer games, sounds contribute to the gamer's immersion: they construct the mood of the game and provide information that will be used in gameplay. According to Jørgensen (2006), we can state that sound serves two main functions. On one hand, it "has the overarching role of supporting a user system" and, on the other, it is "supporting the sense of presence in a fictional world" (p. 48). This basically means that sound creates "a situation where the usability information of elements such as [sound] becomes integrated with the sense of presence in the virtual world" (Jørgensen, 2008, Integration of Game System and Virtual World section, para. 1).

The (Double) Causality of Sound

To fulfil the important functions exposed by Jørgensen, I believe that sounds first need to create a feeling of causality with: 1) the images (and, more largely, with the gameworld) and 2) the gamer's actions.

Just as in movies, images and sounds are tightly linked, producing the effect of added value, described by Michel Chion (2003) as a "sensory, informative, semantic, narrative, structural or expressive value that a sound heard during a scene leads us to project on the image, until creating the impression that we see in this image what in reality we 'audio-see'" (p. 436, freely translated). The added value of a sound on the images creates what Chion called audio-visiogenic effects, which can be classified within four categories: (1) effect of sense, atmosphere, content, (2) rendering and matter effect (materializing sound indices)


which creates sensations of energy, textures, speed, volume, temperature, for example, (3) scenography effect concerning the creation of an imaginary space, and (4) effect related to time and the construction of a temporal phrasing. These audio-visiogenic effects and materializing sound indices are essential to horror computer games such as Dead Space, as they give an organic texture to an anthropomorphic monster. The gooey sound that accompanies the impact of a plasma cutter blast as blood and guts explode on the screen helps the gamer believe that what they are seeing is real, while in fact what is shown on the screen is a simple translation of coloured polygons. The effectiveness of the added value rests upon three factors that have also been defined by Chion. It is principally by means of synchronisation points, "a more salient moment of a synchronised reunion between concomitant sonic moment and visual moment" (p. 433, freely translated) or, more broadly, an effect of synchresis, and an effect of rendering, which will give the sound a necessary degree of veridicality (Grimshaw, 2008) for it to seem "real, efficient and adapted" to "recreate the sensation [...] associated to the cause or to the circumstance evoked in the [game]" (Chion, 1990, p. 94, freely translated). For this to be effective, Grimshaw (2008) reminds us that a sound "must be as faithful as possible to its sound source [within the game], containing and retaining, from recording or synthesis through to playback, all the information required for the player to accurately perceive the cause and, therefore, the significance of the sound" (p. 73).

However, we must not forget that computer games are not only audio-visual, but also interactive. Therefore, sound must also establish a sentiment of causality between the gamer's actions, which mostly correspond to the handling of joysticks and the pressing of buttons on their controller, and the action performed by the player character on the diegetic level. In this matter, synchronisation points turn out to be less aesthetic and more pragmatic as they become the product of the gamer's will in act. This relationship between action and sounds is primordial in establishing horror game conventions and greatly contributes to the effect of presence as it gives sensory support to the gamer's agency.

Gameplay Functions

From a gameplay point of view, and following the loop of Arsenault and Perron's (2009) model, sound performs two main functions: (1) to give information on the game state and (2) to give feedback on the gamer's activity in response to the game state. Before we engage in a typology of the different gameplay functions of sounds, I wish to mention that I am fully aware that every sound, while serving gameplay purposes, simultaneously has immersive and affective functions. However, for reasons of brevity, I will not integrate those functional poles together right away. In this line of thought, I will not present an exhaustive list of gameplay functions, keeping only those useful for my analysis of horror computer game sound strategies.10 Based on the work of Collins (2008), Grimshaw (2008), Jørgensen (2008), and Whalen (2004), I wish to take a look at five gameplay functions that some horror game strategies are founded upon: spatial functions, temporal functions, preparatory functions, identification functions, and progression functions.

In computer games, it is essential to determine the approximate location of the sound generators. Spatial functions allow for the localization of generators in terms of direction and distance, contribute to the quantification and qualification of game space, and help the gamer to navigate through it. More precisely, these sounds can be described as choraplasts, which are sounds "whose function is to contribute to the perception of resonating space [volume and time, localization]" (Grimshaw, 2008, p. 113). By privileging a "navigational" mode of listening (Grimshaw, 2008, p. 32), the augmentation or diminution of a sound's intensity might, for instance, assist the gamer in localizing


the generators and help them decide whether or not they want to advance in their direction.

Sonic temporal functions are also very important to horror computer games. For example, in Resident Evil 5, the flamethrower and satellite laser-guide that the gamer needs to utilise in order to kill the dangerous Uroboros monsters regularly require, respectively, to be refilled with fuel or to regain energy. To signal that the weapons are recharging, in addition to a visual indicator, the game underlines this process with a distinctive sound. Similarly, when the replenishing is done, a tone will inform the gamer. The same assumptions can be applied to other weapons, as reload times and rates of fire are sometimes vital to the survival of the player character. Sounds that are "affording the perception of time passing" are named chronoplasts by Grimshaw (2008, p. 113).

The preparatory functions, a term I have borrowed from Collins (2008) and which corresponds to what Jørgensen (2006, 2008) called urgency functions, are sounds alerting the gamer that an event has occurred in the diegetic world or which forewarn them of the presence of an enemy within the immediate environment of the player character. For instance, in Dead Space, the alarm signalling that a section of the ship is being put into quarantine serves as an alert, while the off-screen moans of zombies in Resident Evil are considered a forewarning. It must also be acknowledged that adaptive and interactive (Collins, 2008) extra-diegetic music can also occupy these roles as it either punctuates an event or, as in Resident Evil 4, testifies to the presence of infected Ganados.

For their part, identifying functions, which were more accurately theorised by Jørgensen (2006), correspond to the ability of a sound "to identify objects and to imply an objects value" (Identifying Functions section, para. 1). For example, the heavy footsteps and the characteristic music loop accompanying the presence of Nemesis in Resident Evil 3 (Capcom, 1999), as well as the screeching of Pyramid Head's gigantic blade in Silent Hill 2 (Konami, 2001), lead to a quick identification, while at the same time providing these characters with an imposing and threatening demeanour. The use of identifying functions is not limited to distinguishing and qualifying enemies; it also "has a central role related to changes in game state and player state"11 (Jørgensen, 2008, The Role of Audio in a Gameplay Context section, para. 2). In Dead Space, when Isaac Clarke grunts in pain after taking a hit, it signals to the gamer that the player character's physiological integrity has been affected. Musical loops can also signify transitions in the game state. In the Resident Evil series, the leitmotif associated with the "save room" means that the player character is safe, while fast-paced music normally implies the presence of a threat or requires immediate attention from the gamer.

Progression functions is a term I propose based on my reflections upon the motivational purpose of music proposed by Zach Whalen (2004) in his text Play Along: An Approach to Videogame Music. As Whalen explained, in Silent Hill, "the music is always in a degree of "danger state" in order to impel the player through the game's spaces. The mood of the game is crucial to the horrific 'feel', but it also provides motivation by compelling continual progress through the game" (Silent Hill section, para. 1). I suggest that other sounds, such as the enemies' sound cues or alarm sounds, can achieve a similar purpose and encourage (or sometimes discourage) the gamer to progress through the game. While these functions are mostly integrated into enemy-generated sounds, some segments of dialogue can also be considered as serving progression functions. For instance, in Dead Space, radio communications with Kendra and Hammond help the gamer to figure out how to reinitialise the ventilation system of the hydroponic station of the U.S.S. Ishimura.

Of course, one single sound event can serve many of these functions simultaneously. Furthermore, as Jørgensen (2008) specified, "the functional roles of sounds [will be] judged with


different urgency in different situations even though the sound is exactly the same" (Player Interpretation of Audio in Context section, para. 1). While this quote was intended to portray the relationship existing between sound and context in multiplayer sessions of World of Warcraft (Blizzard Entertainment, 2004), it is, nevertheless, quite applicable to the single-player games which characterize most of the horror computer game genre. It is in regard to the macro and micro contexts of the games that the prioritisation of one function over another will be possible. With all this in mind, it is now time to take a look at how horror games partly build their sound strategies by playing with these functions.

HORROR COMPUTER GAMES' SOUND STRATEGIES

Horror computer games have been around for a long time. During the 1980s, many games, such as Atari's Haunted House (1981), Sweet Home (Capcom, 1989)12, and the videoludic adaptations of the movies Halloween (Wizard Video Games, 1983) and Friday the 13th (LJN, 1989), hit the shelves to satisfy gamers in quest of an adrenalin rush. However, as I explained in a chapter published in Horror Video Games: Essays on the Fusion of Fear and Play, the abstract graphics and synthesised sounds of those games could not provide a simulation of evisceration as convincing as certain computer games can provide today. Indeed, "at that time, the horror was more lurking in the paratextual material than the games themselves" (Roux-Girard, 2009, p. 147). As Mark J. P. Wolf (2003) explained:

The boxes and advertising were eager to help players imagine that there was more to the games than there actually was, and actively worked to counter and deny the degree of abstraction that was still present in the games. Inside the box, game instruction manuals also attempted to add exciting narrative contexts to the games, no matter how far-fetched they were. (p. 59)

As Remi Delekta and Win Sical (2003) suggest in an article in the only issue of the Horror Games Magazine: "[Horror computer games] cannot exist without a minimum of technical capacities: sounds, graphics, processing speed. Fear, to exist, needs to be staged, and mise en scène needs means" (p. 13, freely translated). It was in 1992 that Alone in the Dark, designed by Frédérick Raynal, shook the entire videoludic scene by incorporating polygonal characters, monsters, and objects in two-dimensional, pre-rendered backgrounds. While this simulated three-dimensionality opened a new "game space" allowing for novel possibilities in gameplay, it also created an innovative "playground" for imaginative sound designers.

Between Horror and Terror

Before we begin our analysis of horror games' sound strategies, I need to clarify that fear, terror, dread, horror, anxiety, and disgust, while they are broadly analogous emotions, are not synonymous. Moreover, not all horror computer games try to generate this entire emotional spectrum13. Accordingly, while some games rely on visceral manifestations of fear such as horror and disgust, others create fear at a psychological level, generating suspense, terror, and dread. To understand how games manage to scare gamers, we must first take a look at the difference between horror and terror. According to Perron14 (2004),

horror is compared to an almost physical loathing and its cause is always external, perceptible, comprehensible, measurable, and apparently material. Terror, as for it, is rather identified with the more imaginative and subtle anticipatory dread. It relies more on the unease of the unseen. (p. 133)

Of course, sound design plays a prominent role in setting these two poles up. On one hand,


sounds provoke spontaneous sensations using rendering effects of matter and, on the other, they contribute to the elevation of suspense by creating ambiguity between causes, uncertainty regarding the origin of the sounds, and by limiting the information carried by the sounds' affordances. To achieve this, horror computer games rely on a plurality of strategies.

In the preceding sections of this chapter, I introduced a number of theoretical tools to help us understand how gamers structure sounds within and without the gameworld and how they produce meaning with the different cues they listen to. I now propose to revisit those concepts in light of a horrific mise en scène to comprehend how horror games develop those strategies.

The Choice of the Sounds

While horror computer games (and mostly survival horror games) utilize a wide range of sound strategies, the staging of fear starts at a purely formal level. The choice of sounds and the way they are used are greatly responsible for the quality of the mood of the games. Some empirical research (quoted in Grimshaw, 2009) attempted to demonstrate that there is a certain degree of correlation between the physical signal of a sound and the emotions felt by listeners. For instance, Cho, Yi, and Cho's (2001) research on textile sounds shows that loud and high-pitched sounds are unpleasant to the ear, while Halpern, Blake, and Hillenbrand (1986) point to loud, low-mid frequencies as being disagreeable. Whereas these investigations seem contradictory, they nevertheless tend to reveal that the acoustic qualities of sounds can have, amongst other factors, a physiological as well as psychological impact on the gamer.

However, to arouse emotions, we need much more than mere frequencies. Borrowing from Pierre Schaeffer's theory (synthesised by Chion, 1983) on the morphological description of sounds and the quatre écoutes (écouter, ouïr, entendre, comprendre), it is mostly the work performed on the allure, grain, dynamic profile, and mass profile of a sound that determines its repercussion on the gamer. During their gameplay activity, the gamer hears (entendre) the morphological qualities of the sounds, which allow them to comprehend (comprendre) and experience them as frightening. Therefore, it is not only because the gamer listens (écouter) to what they can identify as a zombie that they are scared, but because they hear (entendre) a moan or a growl, which correspond to the sound motifs contained in their knowledge of horror symbols. Thus, it is not so much because the lamentation is generated by a zombie and comprises low frequencies that it is frightening but because, in its essence, it contains an energy reminiscent of a certain form of pain and agony. Ambiences can have a similar effect as they associate acoustic qualities with unpleasant situations and frightening locales. Reciprocally, the emotions produced by these choices of sound force the gamer to focus on every little detail of the sound design and are partly responsible for the gamer's high level of "perceptual readiness".

Of course, the selection of sounds must also aim to create uncertainty, as this feeling is essential to the creation of suspense. To do so, designers sometimes have to baffle the gamer's expectations to a certain point. In his book on the Silent Hill series, Perron (2006) observed an evolution, from one title to another, in the sound used to portray the monstrous nurses. As the author explains: "The nurses, which have a much low-pitched 'voice' in [Silent Hill], have a penetrating sped up respiration in [Silent Hill 3]" (Perron, 2006, p. 93, translated by the author). In my view, this purely aesthetic strategy has the effect of reducing the gamer's "launch window" into the game, preventing him from using his anterior knowledge to identify (identifying functions) his opponents. Consequently, Silent Hill 3's sound design created ambiguity regarding the cause of the sound and forced the gamer to reconstruct, from game to game, the relation between the sounds and their generators.


However, as we have seen earlier, horror games do not create fear only with their aesthetic dimension, but also with their narrative structure and gameplay. Therefore, some of their strategies are also constructed from these two dimensions.

Creation of a Startle Effect

Sound plays a preponderant role in the creation of a variety of surprise effects. Following an analysis of this phenomenon by Robert Baird, Perron (2004) explained, in his text Sign of a Threat: The Effects of Warning Systems in Survival Horror Games, that the essential formula for creating a startle effect can be summed up in three steps: “(1) a character presence, (2) an implied offscreen threat, and (3) a disturbing intrusion [often accentuated by a sound burst] into the character’s immediate space” (p. 133). As noted by the author, it is indeed at the moment of the intrusion of the off-screen threat inside the screen that sound takes on all its importance. At this level, it is a question of contrast in the sonic intensity and of synchronisation of the sound and its generator in the visual field of the gamer. Therefore, startle effects depend on the physical limitations of the ears. As the ears are slower to react than the eyes, the startle effect will temporarily cloud the gamer’s evaluation and identification operations. To favour such effects, horror games often rely on a refined sound aesthetic and create moments of approximate silence. We can also say that the sounds the gamer cannot hear—the noises an enemy should make while moving towards the gamer that are rendered inaudible—play a role as important as the ones he can hear. In Schaefferian terms, we could say the game plays on the limits of hearing (ouïr) as a way to fool the gamer’s listening (écouter). It is only in light of these considerations that the episodes of respite before an attack play a determining role in the staging of a startle effect. This is stressed by Whalen (2004):

As it is the case with horror films, the silence [...] puts the player on edge rather than reassuring him that there is no danger in the immediate environment, increasing the expectation that danger will soon appear. The appearance of the danger is, therefore, heightened in intensity by way of its sudden intrusion into silence. (Silent Hill section, para. 3)

It is according to this technique that designers punctuated, with a shattering window, the intrusion of a long-fanged monster in Alone in the Dark (I-Motion, 1992), staged a similar incursion of a zombie-dog in Resident Evil (Capcom, 2002), intensified the attack of a crawling monster in Silent Hill 2 (Konami, 2001), or amplified the brutal opening of an elevator door by a necromorph in Dead Space.

As Perron (2004) mentioned: “To trigger sudden events is undoubtedly one of the basic techniques used to scare someone. However, because the effect is considered easy to achieve, it is often labelled as a cheap approach and compared with a more valued one: suspense” (p. 133). Following this line of thought, if sound plays a decisive role when it comes to making a gamer jump out of his shoes, it also plays a role in the creation of suspense. It is in this perspective, towards dread and anticipation, that the next strategies will be explored.

The Impact of Forewarning

To create suspense, forewarning is one of the most effective strategies. Before further developing this concept, it is essential to mention that forewarning is not always exclusively based on sound. Forewarning, which consists of alerting the gamer to the presence of a menace in the surroundings of his player character, can also be based on visual cues, as is the case with Fatal Frame (Tecmo, 2002) when the indicator at the bottom of the screen turns orange, signalling the presence of a ghost. However, many forewarning


systems have been designed through sound. The most renowned case of such a technique—and, incidentally, the most studied, being discussed by Carr (2003), Kromand (2008), Perron (2004) and Whalen (2004)—is the pocket radio in the Silent Hill series. This radio, which emits static when a threat is nearby, plays its role as a warning system perfectly. Forewarning can also be created in a more classical way through the use of off-screen sounds (Perron, 2004). This is the case in Alone in the Dark: The New Nightmare (Infogrames, 2001) when, during the numerous seconds necessary for the gamer to go down the stairs leading to the interior court of the fort, it is possible to hear sounds associated with plant monsters coming from outside the frame of the fixed virtual camera shots.

If we could believe that such a warning, prefiguring the entrance of a gloomy monster inside the screen, could reduce the feeling of fear or uneasiness in the gamer, research cited in Perron’s (2004) work tends to prove the opposite. As the author himself specifies, “[…] simple forewarning is not a way to prevent intense emotional upset. It is worse than having no information about an upcoming event” (Perron, 2004, p. 135). Such a method creates terror by anticipation based on a fear of the unseen.

However, what Perron fails to highlight is that forewarning does not rely only on the sound function of the same name. To be really effective, the forewarning must be unreliable and/or the quantity of information about the localisation of the generator must be limited. This precision offers the opportunity to introduce another strategy of horror computer games which relies on the functions of game sound: luring the gamer with sound.

Luring the Gamer With Sound

In his master’s thesis, Serge Cardinal (1994) explained that “filmic writing favouring the emergence of a clear spatial structure will have the tendency to anchor the sound with its source, will privilege without ambiguity the identification and localisation of the source with sound, will submit sound’s diffusion to the sound properties’ logic” (p. 53, freely translated). To create fear and strong feelings of discomfort, horror games execute a reversal of this concept, making the generators of the sounds harder to identify and localize. In the example from Alone in the Dark: The New Nightmare described earlier, the designers have avoided creating an evolution in the morphological properties of the sounds of the plant monsters in relation to the player character travelling through the fort’s space. This technique is used to alter the information the sound is carrying regarding the distance separating the threat and the gamer’s player character. While listening carefully, the gamer notices no variation in the dynamic profile and mass profile of the sound generated by the creatures of darkness even though the player character performs a descent which, if it were scaled, would be equivalent to a little less than a hundred meters. In this case, the designers intentionally reduce the quantity of information carried by the sound in a way that limits the gamer’s interpretation of space and time, as it is impossible to evaluate the distance between the player character and the monsters. However, this tweaking of the spatial and temporal functions of the sound allows for an emphasis to be put on its forewarning purpose, which is bound to influence the progression function of the sound. Preventing the easy localization of the source/generator of the sound has the effect of reinforcing the suspense established by the forewarning while simultaneously forcing the gamer to take a more prudent approach while going down the stairs.

Many horror game strategies rely on creating a certain level of ambiguity regarding the origin of sounds within the gameworld. While this can be achieved, as suggested by Daniel Kromand (2008), by blurring the frontier between the diegetic and non-diegetic parts of the game, similar exercises can be performed between instances within the diégèse. This partly explains why I chose to


structure my analysis of horror computer game sounds around the concepts of sound generators and functions of game sound. Indeed, those notions are best suited to describing the relationship between the different instances of sound, in that there is more in horror computer games than meets the ear.

Ambiguity Between Sound Generators

A study of the relations that exist between the different categories of sound generators allows me to put forward some of the sonic strategies of horror computer games. One of the most basic strategies of those games is to design sound in a way that creates ambiguity between the different sound generators of the game. Indeed, if two or more generators manage to produce sounds of a similar nature, it will directly affect the cognitive process of the gamer, making it harder to localize the sources but also harder to classify the cues as more or less important regarding the game context. For the most part, these ambiguities will concern the spatio-temporal and preparatory functions of sound and will generate fear through anticipation.

The first technique consists of blurring the line between the sounds generated by the player and those generated by enemies. If these two generators manage to produce similar sound cues through a common source, it is possible to believe that, for example, the movements of the gamer’s player character through space might nourish the suspense. I must admit that this technique is not widespread in horror computer games but, seeing as the game Dead Space manages to create such a doubt, it is worthy of being mentioned as a similar modus operandi might be exploited in future horror games. Indeed, in Dead Space, the sounds emitted by the player character’s footsteps on the viscous organic matter which often covers the floors of the spaceship are very similar to the sounds produced through the interaction of the substance and the deformed limbs of the grotesque monsters roaming with intent to kill the player character. After hearing the monsters’ footsteps for the first time, the gamer’s perceptual readiness regarding these sounds will augment. However, since the sounds emitted by the gamer’s player character are so similar to and blend with those of the enemies, the movement of the player character on the gooey surface might signal a potential presence in the player character’s surrounding environment. The gamer will then be forced to adopt a more careful approach and look around more often than he would normally have done.

The flesh-covered sections of the spaceship also encouraged the designers to establish a similar relationship, much more common in horror computer games, between the sounds generated by the enemies and the gameworld. The game environments are often designed to generate ambiences that imitate the sounds generated by the threats of the games. As mentioned by Kromand (2008): “The [gamer]’s understanding of affordances can help to perform better [...] as certain sounds pass information regarding nearby opponents, but at the same time these exact affordances are mimicked by the ambiance” (Welcome to Rapture section, para. 4). Once again, this way of conceptualizing sound in the game favours the creation of doubt in the player regarding the real provenance of the sounds. To get back to our Dead Space example, the organic matter is sometimes surmounted by excrescences which randomly squirt blood when the player character passes by. The excretion sound is also reminiscent of the sound made by enemies and tends to mislead the gamer as to what generated the sound. Similarly, other ambiance sounds, such as the creaking of the ship’s hull, the rumbling of the machinery, and other metallic impacts, are used to simulate the prowling of a necromorph in an air vent or in one of the ship’s corridors. Of course, Dead Space is not the only game that makes use of such strategies. As Ekman and Lankoski (2009) noted, in “Silent Hill 2 and Fatal Frame, the whole gameworld breathes with life, suggesting that somehow the environment itself is alive, sentient, and capable of taking action against the


player” (p. 193). This way of introducing “event sounds with no evident cause, sound not plausibly attributed to an inanimate environment” is, for that matter, the trademark of the Silent Hill series. This way of conceptualizing sound even extends to the atonal, extra-diegetic music of the game. This aesthetic choice allows me to introduce one last case of ambiguity between sound generators. Some horror games aim at creating ambiguity between the game system, the gameworld, and the enemies, the emphasis being put, as suggested by Kromand (2008), on blurring the line between elements that are part of the diegesis and others that are not. By choosing to exploit atonal music, which is closer to musique concrète than to traditional orchestral or popular music, and which blends and often merges with the ambient and dynamic sound effects of the game, designers manage to lure the gamer into thinking that there are more threats than there actually are. This technique also often succeeds at diverting the gamer’s attention from the real threats in the game. The most flagrant example of such a scrambling between the sounds emitted by enemies and the game system comes from Silent Hill. During a gameplay sequence in the alternate town of Silent Hill (Konami, 1999), the non-diegetic music, which is mostly constituted of metallic, industrial sounds, also includes in its loop a sound that is very similar to the sounds generated by the flying monsters of the game. Since the flying demons’ sounds are mixed very low within the music, the gamer, who is concentrating on his activity, probably won’t notice that this cue is repeated on a fixed temporal line and will be bound to associate this sound with an oncoming monster.

A similar type of conception was also privileged in the sound design of Dead Space. As Don Veca, the lead sound designer of the game, underlined: “We […] approached the entire sound-scape as a single unit that would work together to create a dark and eerie vibe. [...] In this way, Dead Space has really blurred the line between music and sound design” (cited in Napolitano, 2008, First Question section, para. 2). Therefore, as mentioned by Kromand (2008), “the constant guessing as to whether the sounds have a causal connection put the [gamer] in unusual insecure spot that might well build a more intense experience” (Conclusion section, para. 2), which has the effect of augmenting the level of fear in the player.

As a unit, the techniques which aim at creating ambiguity between sound generators are based on the different circuits a sound can perform between the on-screen, the off-screen, and the extra-diegetic. Indeed, it is by regularly making sounds pass from the on-screen (which allows the player to identify the cause of the sound) to the off-screen (where the sound serves as a forewarning of a threat) to the extra-diegetic (where sound simulates the presence of a threat) that videoludic sound manages to condition the gamer to be wary of everything he hears.

Fear and Context

Of course, fear will not only be induced by the morphological nature of a sound, by its fixed relation with its cause, or by the construction of strategies. Fear, horror, and terror mostly depend on the context in which the sound is heard. At this level, many parameters will influence the perception the gamer will have of a sound: the spatial configuration, the general difficulty of the game, the number of enemies, the available resources, the available time, and so forth. The global situation related to the perception of a sound will have a determining impact on the attitude a gamer will adopt towards this sound. A videoludic design favouring such game mechanics will therefore be an accomplice to the sound strategies.

CONCLUSION

In an attempt to scare their gamers, horror computer games utilise different strategies of mise en scène. Testament to the dialogue between the


production and reception of the games, these strategies, to be efficient, must play with the gamer’s expectations—regarding the reading and listening constraints imposed by the genre and paratext—and exploit the cognitive schemes that help them to classify the information they receive during their gameplay sessions. In this line of thought, the games must create situations that will generate negative emotions such as fear, horror, and terror. As only the gamer has access to these emotions, I privileged an approach oriented towards the reception of sound in a gameplay situation rather than a mere analysis of technical data. It is consequently with a terminology that does not directly reference the game code or algorithms, but instead focuses on the gamer’s mental reproduction of the videoludic universe, that I attempted to explain the importance of sound in the development of horror computer game strategies.

The gamer’s first objective being to ensure the survival of their player character, their tasks mainly revolve around detecting all the intrusions that might become hazardous for their character. In these circumstances, gamers must structure the sounds they hear and extract from them all the information they need to properly respond to a given situation. This cognitive process has been broadly presented with the help of Arsenault and Perron’s model (Figure 1). More precisely, the gamer must determine the origin and the cause of the sounds. To do so, they must first determine if a sound is generated by an event present within the videoludic world or overhanging this world. The gamer must then refine this categorisation to establish more precisely what, between their actions, the enemies, the game environment, and the game system, is the generator of the sound. At the same time, they must pay attention to the affordances (the functions) of the sounds, which might communicate information about the space, the time, the enemies, and the events occurring in the game environment. The gamer must then evaluate which affordance must be prioritised according to the circumstances.

To feel safe, the gamer must be able to quickly find answers to their questions. To arouse fear, horror games block this process. While the morphological nature of a sound is sometimes enough to induce a strong feeling of discomfort, horror computer games mostly rely on sound strategies to reach their goal. From startle effects to the creation of ambiguity between the sound generators, the games trick the gamer’s listening by limiting the information the sounds carry. Plunged into a universe of “un-knowledge” (Kromand, 2008), the gamer can only be scared by their gameplay experience. To be really effective, the sound strategies must be part of a whole and integrated into a global staging of fear, which also depends on the relationships between the sound and images, the gameplay, and the game’s narrative. In the end, it is the pressure applied by the genre, and the deconstruction of the structure and the functions of sound by the different in-game situations, that will determine the true impact of the sound strategies on the gamer.

REFERENCES

Alone in the dark. [Computer game]. (1992). Infogrames (Developer). Villeurbanne: Infogrames.

Alone in the dark: Inferno. [Computer game]. (2008). Eden Games S.A.S. (Developer). New York: Atari.

Alone in the dark: The new nightmare. [Computer game]. (2001). DarkWorks (Developer). Villeurbanne: Infogrames.

Altman, R. (1992). General introduction: Cinema as event. In Altman, R. (Ed.), Sound theory, sound practice (pp. 1–14). New York: Routledge.

Arsenault, D., & Perron, B. (2009). In the frame of the magic cycle: The circle(s) of gameplay. In Perron, B., & Wolf, M. J. P. (Eds.), The video game theory reader 2 (pp. 109–132). New York: Routledge.


Arsenault, D., & Picard, M. (2008). Le jeu vidéo entre dépendance et plaisir immersif: les trois formes d’immersion vidéoludique. Proceedings of HomoLudens: Le jeu vidéo: un phénomène social massivement pratiqué (pp. 1-16). Retrieved from http://www.homoludens.uqam.ca/index.php?option=com_content&task=view&id=55&Itemid=63.

Boillat, A. (2009). La «diégèse» dans son acception filmologique. Origine, postérité et productivité d’un concept. Cinémas: Journal of Film Studies, 19(2-3), 217–245.

Bordwell, D. (1986). Narration in the fiction film. New York: Routledge.

Carr, D. (2003). Play dead: Genre and affect in Silent Hill and Planescape Torment. Game Studies, 3(1). Retrieved from http://www.gamestudies.org/0301/carr/.

Chion, M. (1983). Guide des objets sonores: Pierre Schaeffer et la recherche musicale. Paris: Buchet/Chastel.

Chion, M. (1990). L’Audio-vision. Paris: Nathan.

Chion, M. (2003). Un art sonore, le cinéma: histoire, esthétique, poétique. Paris: Cahiers du Cinéma.

Collins, K. (2008). Game sound: An introduction to the history, theory, and practice of video game music and sound design. Cambridge, MA: MIT Press.

Dead space. [Computer game]. (2008). EA Redwood Shores (Developer). Redwood City: Electronic Arts.

Dektela, R., & Sical, W. (2003). Survival horror: Un genre nouveau. Horror Games Magazine, 1(1), 13–16.

Ekman, I., & Lankoski, P. (2009). Hair-raising entertainment: Emotions, sound, and structure in Silent Hill 2 and Fatal Frame. In Perron, B. (Ed.), Horror video games: Essays on the fusion of fear and play (pp. 181–199). Jefferson, NC: McFarland.

Fatal frame. [Computer game]. (2002). Tecmo (Developer). Torrance: Tecmo.

Friday the 13th. [Computer game]. (1989). Pack-In-Video (Developer). New York: LJN.

Grimshaw, M. (2008). The acoustic ecology of the first person shooter: The player experience of sound in the first-person shooter computer game. Saarbrücken, Germany: VDM Verlag Dr. Müller.

Grimshaw, M. (2009). The audio uncanny valley: Sound, fear and the horror game. In Proceedings of Audio Mostly: 4th Conference on Interaction with Sound. Retrieved from http://digitalcommons.bolton.ac.uk/cgi/viewcontent.cgi?article=1008&context=gcct_conferencepr.

Halloween. [Computer game]. (1983). Video Software Specialist (Developer). Los Angeles: Wizard Video Games.

Haunted house. [Computer game]. (1981). Atari (Developer). Sunnyvale: Atari.

Huiberts, S., & van Tol, R. (2008). IEZA: A framework for game audio. Gamasutra. Retrieved from http://www.gamasutra.com/view/feature/3509/ieza_a_framework_for_game_audio.php.

Jauss, H. R. (1982). Toward an aesthetic of reception. Minneapolis, MN: University of Minnesota Press.

Jørgensen, K. (2006). On the functional aspects of computer game audio. In Proceedings of Audio Mostly – A Conference on Sound in Games (pp. 48-52). Retrieved from http://www.tii.se/sonic_prev/images/stories/amc06/amc_proceedings_low.pdf.

Jørgensen, K. (2008). Audio and gameplay: An analysis of PvP battlegrounds in World of Warcraft. Game Studies, 8(2). Retrieved from http://gamestudies.org/0802/articles/jorgensen.

Jørgensen, K. (2011). Time for new terminology? Diegetic and non-diegetic sounds in computer games revisited. In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey, PA: IGI Global.

Cardinal, S. (1994). Occurrences sonores et espace filmique. Unpublished master’s thesis, University of Montréal, Montréal.

Juul, J. (2005). Half-real: Video games between real rules and fictional worlds. Cambridge, MA: MIT Press.

Kromand, D. (2008). Sound and the diegesis in survival-horror games. In Proceedings of Audio Mostly 2008, the 3rd Conference on Interaction with Sound (pp. 16-19). Retrieved from http://www.audiomostly.com/images/stories/proceeding08/proceedings_am08_low.pdf.

Left 4 dead. [Computer game]. (2008). Turtle Rock Studios (Developer). Kirkland: Valve Software.

Manovich, L. (2001). The language of new media. Cambridge, MA: MIT Press.

Murray, J. (1997). Hamlet on the holodeck: The future of narrative in cyberspace. New York: The Free Press.

Napolitano, J. (2008). Dead Space sound design: In space no one can hear intern screams. They are dead. (Interview). Original Sound Version. Retrieved from http://www.originalsoundversion.com/?p=693.

Neale, S. (2000). Genre and Hollywood. New York: Routledge.

Odin, R. (2000). De la fiction. Bruxelles: De Boeck.

Perron, B. (2004). Sign of a threat: The effects of warning systems in survival horror games. In Proceedings of COSIGN 2004 (pp. 132–141). Retrieved from http://www.cosignconference.org/downloads/papers/perron_cosign_2004.pdf.

Perron, B. (2006). Silent Hill: Il motore del terrore. Milan: Costa & Nolan.

Resident evil. [Computer game]. (1996). Capcom (Developer). Sunnyvale: Capcom USA.

Resident evil. [Computer game]. (2002). Capcom (Developer). Sunnyvale: Capcom USA.

Resident evil 3: Nemesis. [Computer game]. (1999). Capcom (Developer). Sunnyvale: Capcom USA.

Resident evil 4. [Computer game]. (2004). Capcom Production Studio 4 (Developer). Sunnyvale: Capcom USA.

Resident evil 5. [Computer game]. (2009). Capcom Production Studio 4 (Developer). Sunnyvale: Capcom USA.

Roux-Girard, G. (2009). Plunged alone into darkness: Evolution in the staging of fear in the Alone in the Dark series. In Perron, B. (Ed.), Horror video games: Essays on the fusion of fear and play (pp. 145–167). Jefferson, NC: McFarland.

Silent hill. [Computer game]. (1999). KCEK (Developer). Redwood City: Konami of America.

Silent hill 2. [Computer game]. (2001). KCET (Developer). Redwood City: Konami of America.

Silent hill 3. [Computer game]. (2003). KCET (Developer). Redwood City: Konami of America.

Stockburger, A. (2003). The game environment from an auditive perspective. In Proceedings of Level Up, DiGRA 2003. Retrieved from http://www.stockburger.co.uk/research/pdf/AUDIOstockburger.pdf.

Sweet home. [Computer game]. (1989). Capcom (Developer). Osaka: Capcom.

Taylor, L. (2005). Toward a spatial practice in video games. Gamology. Retrieved from http://www.gamology.org/node/809.

Whalen, Z. (2004). Play along: An approach to videogame music. Game Studies, 4(1). Retrieved from http://www.gamestudies.org/0401/whalen/.

Wolf, M. J. P. (2003). Abstraction in the video game. In Perron, B., & Wolf, M. J. P. (Eds.), The video game theory reader (pp. 47–65). New York: Routledge.


World of Warcraft. [Computer game]. (2004). Vivendi (Developer). Irvine: Blizzard.

KEY TERMS AND DEFINITIONS

Allure: It is the amplitude or frequency modulation of a sound.

Comprendre: According to Schaeffer, comprendre means grasping a meaning, values, by treating the sound like a sign, referring to this meaning as a function of a language, a code.

Dynamic Profile: It is the temporal evolution of the sound’s energy.

Écouter: According to Schaeffer, écouter is listening to someone, to something; and, through the intermediary of sound, aiming to identify the source, the event, the cause, it treats the sound as a sign of this source, this event.

Entendre: According to Schaeffer, entendre, here, according to its etymology, means showing an intention to listen [écouter], choosing from what we hear [ouïr] what particularly interests us, thus “determining” what we hear.

Grain: It can be defined as the microstructure of sound matter, such as the rubbing of a bow.

Mass Profile: It is the evolution in the mass of a sound. For example, from pitched to complex.

Mise En Scène: It is the organisation of the different elements that define the staging of a scene, or, in the case that interests us, the simulation of a gameplay sequence.

Ouïr: According to Schaeffer, ouïr is to perceive by the ear, to be struck by sounds. It is the crudest, most elementary level of perception; so we “hear”, passively, lots of things which we are not trying to listen to nor understand.

Videoludic: It is an adjective linked to videogames. The use of this term opens a door for the utilisation of sonoludic as an adjective for audio-only games or computer games in which gameplay mechanics are mostly based on sound.

ENDNOTES

1. It must be mentioned that this chapter does not wish to theorize the perhaps ill-suited notion of videoludic genres—a fertile field of computer game research that should, in coming years, generate quite a debate—but wishes, rather, to use it as a tool to better understand how gamers structure their gameplay sessions in survival horror games.

2. For space reasons, I chose to limit my analysis of these specific factors. Just keep in mind that the industry and the technology play a great part in the final rendering of the games.

3. Note that the former definition is largely associated with reception issues while the latter refers to the production aspects of the games.

4. Generic issues of survival horror games will therefore be approached as a “constraint of listening” from which the gamer will organise and evaluate the role of sound in a given context.

5. Indeed, while playing a game, the gamer never has access to this code. As Arsenault and Perron (2009) explained, the gamer “only witnesses the [...] result of the computer’s response to his action. He does not, per se, discover the game’s algorithm which remains encoded, hidden and multifaceted”, which means that “the notion that a gamer’s experience and a computer program directly overlap is a mistake” (p. 110). While this statement upholds the approach of this chapter, it also calls for a use of terminology that can reflect a game audio structure with accuracy and that can be applied directly to a gameplay situation.

6. I find it necessary to make this distinction because the notion of diegesis, which is now often broadly defined as “the fictional world of the story” (Bordwell, 1986), might be questionable as it sometimes seems to borrow too much from narrative theory. Étienne Souriau


(n.d.), in his original definition of the term, conceptualised the “diégèse” as a “‘world’ constructed by representation” (Boillat, 2009, p. 223, freely translated) which, as it is possible to deduce, is not necessarily specific to a narrative theory. Following Souriau’s line of thought, “the diegetic level is characterized not only by ‘everything we take into consideration as being represented by the film’ but also by ‘the type of reality supposed by the signification of the film’” (cited in Boillat, 2009, p. 222, freely translated). According to Boillat (2009), Souriau refined this definition by assimilating the “diégèse” to “all that belongs, ‘in the intelligibility’ [...] to the story being told, to the world supposed or proposed by the fiction of the film” (Boillat, 2009, p. 222, freely translated), this “all” making reference to three very important constituents: time, space, and the character. As is also highlighted by Boillat, this second part of the definition is essential to the concept so as to prevent the “reducing [of] the ‘diégèse’, as it was often the case [...] to only the ‘recounted story’” (p. 222, freely translated). However, in his book De la fiction, French semio-pragmatist Roger Odin makes a clarification regarding the dichotomy between the story and the diégèse. As he explained, the “diégèse” “cannot be mixed up with the story” but “provides the descriptive elements the story needs to manifest itself” (cited in Boillat, 2009, p. 234, freely translated).

7. While trying to apply the concept of diégèse to videogames, one must acknowledge that it does not function following the requirements of fictional films and according to a pure “fictionalisation process” (Odin, 2000). The reconstruction of the diegetic stage works differently, based partly on a process of “systemic immersion” (Arsenault & Picard, 2008), allowing for more levels of communication between the gamer’s world and the gameworld. On these premises, whether certain sounds generated within the “diégèse” seem to address an instance outside it or not does not hold that much importance regarding the construction and integrity of the “diégèse”.

8. I personally prefer to use the adjective extra-diegetic instead of non-diegetic because I believe that, for example, survival horror games’ music is tightly linked to the events that are taking place in the diegetic world.

9. In Hamlet on the Holodeck, Janet Murray (1997) defines agency as “the satisfying power to take meaningful action and see the results of our decisions and choices” (p. 126).

10. For example, Jørgensen’s (2006, 2008) response functions, even though they play an important role in the actual gameplay of survival horror games, are not as important to the construction of the games’ strategies. For this reason, they will be left out of this chapter. For more information on sound functions, see Grimshaw, 2008; Jørgensen, 2006, 2008; and Collins, 2008.

11. I think “player character state” would be more appropriate as the gamers themselves remain in their living room.

12. Only available in Japan.

13. This allows for the differentiation between horror computer games, which are a broader category of the videoludic horror genre, and survival horror games, which can be referred to as games that maximize the elements of a horrific mise en scène.

14. Following William H. Rockett’s line of thought.
The reconstruction of the diegetic stage thoughts.
works differently based partly on a process


Chapter 11
Uncanny Speech
Angela Tinwell
University of Bolton, UK

Mark Grimshaw
University of Bolton, UK

Andrew Williams
University of Bolton, UK

ABSTRACT
With the increasing sophistication of realism for human-like characters within computer games, this chapter investigates player perception of audio-visual speech for virtual characters in relation to the Uncanny Valley. Building on the findings from both empirical studies and a literature survey, a conceptual framework for the uncanny and speech is put forward which includes qualities of speech sound, lip-sync, human-likeness of voice, and facial expression. A cross-modal mismatch between the fidelity of speech and image can increase uncanniness, and game developers should give as much attention to speech sound qualities as to aesthetic visual qualities in order to control how uncanny a character is perceived to be.

INTRODUCTION

As technological advancements allow for the representation of high fidelity, realistic, human-like characters within computer games, aspects of a character's appearance and behaviour are being associated with the Uncanny Valley phenomenon. (A definition of the Uncanny Valley is provided in the first section of this chapter.) It seems that one of the main factors contributing to a character being regarded as lifeless as opposed to lifelike is the character's speech. In 2006, Quantic Dream revealed a tech demo (The Casting) for the computer game Heavy Rain (2006), in which the main character, Mary Smith, evoked a somewhat negative response from the audience (Gouskos, 2006). Criticism was made of the uncanny nature of Mary Smith's speech in that it sounded strange and out of context with the given facial expression and emotion portrayed by this character. A closer inspection of the video showed that not only were there errors in the sound recording (disparities between the acoustics and the volume and materials of the room, with excessive plosives contradicting the distant camera and microphone), but a lack of correct pitch and intonation for speech and a lack

DOI: 10.4018/978-1-61692-828-5.ch011

Copyright © 2011, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.

of synchronization of speech with lip movement were factors that reduced the overall believability for this character (Tinwell & Grimshaw, 2010). A mismatch between the conveyed emotion of Mary Smith's voice and her gestures and posture exacerbated how unnatural and odd the character was perceived to be. MacDorman (quoted in Gouskos, 2006) observed that a perceived asynchrony of lip movement with speech was one of the factors that people found disturbing about Mary Smith:

In addition, there is sometimes a lack of synchronization with her speech and lip movements, which is very disturbing to people. People 'hear' with their eyes as well as their ears. By this, I mean that if you play an identical sound while looking at a person's lips, the lip movements can cause you to hear the sound differently.

Since Mary Smith was revealed in 2006, increasing technological sophistication for computer games has allowed for heightened realism of human-like characters. Cinematic animation is achieved not only for cut scenes and trailers containing full motion video (FMV) but also for animation during in-game play: one example is the phoneme extractor and facial expression tool Faceposer, designed by Valve for titles such as Left 4 Dead (2008) and Half Life 2 (2008). However, it would seem that speech, as a factor integral to the uncanny phenomenon, is often overlooked when compared to the aesthetic visual qualities of behaviour of a human-like character. So far there have been limited studies to ascertain which factors contribute to the uncanny for virtual characters. In response to the hearsay in mass media raised by characters such as Mary Smith, Tinwell and Grimshaw (2010) conducted a study to investigate how the cross-modality of image and sound might exaggerate the uncanny. The results from this study are referred to throughout this chapter as the Uncanny Modality (UM) study, unless otherwise attributed to another study. Prior to this, much of the work on the uncanny had been visually-based, excluding sound as a factor. As a way towards building a conceptual framework for the uncanny and virtual characters in immersive 3D environments, this chapter defines how characteristics of a character's speech may exaggerate the uncanny by considering aspects such as synchronization of audio and video streams, articulation, and qualities of speech.

The first section provides an exposition of the Uncanny Valley, describing how the theory came about, previous investigation into the theory, and potential limitations of the theory in relation to virtual characters.

Previous authors (such as Bailenson et al., 2005; Brenton, Gillies, Ballin, & Chatting, 2005; and Vinayagamoorthy, Steed, & Slater, 2005) have suggested that uncanniness is increased when the behavioural fidelity for a realistic, human-like character does not match up with that character's realistic, human-like appearance. The second section discusses how a cross-modal mismatch between a character's appearance and speech may exaggerate the uncanny: for instance, whether a character's speech may be perceived as belonging to that character or not, based on the character's appearance.

The third section discusses how particular qualities of speech, such as slowness of speech, intonation and pitch, and how monotone the voice sounds, may influence perceived uncanniness, and how such qualities might work to the advantage of those characters intended to elicit an eerie sensation.

The results from the UM study (Tinwell & Grimshaw, 2010) revealed a strong relationship between how strange a character is perceived to be and the lack of synchronization of speech and lip movement. (Characters rated as close to perfect synchronization for lip movement and speech were perceived as less strange than those with disparities in synchronization.) The fourth section reviews the findings from this study and also puts forward future experiments that may


help to define acceptable levels of asynchrony for computer games where uncanniness is not desired.

For figures onscreen, an over-exaggeration of pronunciation for particular words can make the figure appear uncanny to the viewer as the figure seems absurd or comical (Spadoni, 2000). The fifth section considers how the manner of articulation of speech may influence the uncanny by examining the visual representation (viseme) for each phoneme within the choreography tool Faceposer (Valve Corporation, 2008).

A summary is presented in the final section that defines the outcomes from this inquiry as to how speech influences the uncanny for realistic, human-like virtual characters as a way towards building a conceptual framework for the uncanny. It is intended that this framework is relevant not only to computer game characters but also to characters within the wider context of user interfaces: for example, virtual conversational agents within therapeutic applications used to interact with autistic children to aid the development of communication skills, and virtual conversational agents used to deliver learning material to students within e-learning applications.

THE UNCANNY VALLEY

The subject of the uncanny was first introduced in contemporary thought by Jentsch (1906) in an essay entitled On the Psychology of the Uncanny. Jentsch described the uncanny as a mental state where one cannot distinguish between what is real or unreal and which objects are alive or dead. In 1919, to establish what caused certain objects to be construed as frightening or uncanny, Sigmund Freud made reference to Jentsch's essay as a way to describe the feeling caused when one cannot detect if an object is animate or inanimate upon encountering objects such as "waxwork figures, ingeniously constructed dolls and automata" (p. 226). Freud characterized the uncanny as similar to the notion of a doppelganger; the body replica being at first an assurance against death, then the more sinister reminder of death's omen, "a ghastly harbinger of death" (p. 235).

Building on previous depictions of the uncanny, the roboticist Masahiro Mori (1970, as translated by MacDorman & Minato, 2005) observed that a robot continued to be perceived as more familiar and pleasing to a viewer as the robot's appearance became more human-like. However, a more negative response was evoked by the robot as the degree of human-likeness reached a stage at which the robot was close to being human, but not fully. Mori plotted a perpendicular slope climbing as the variables for perceived human-likeness and familiarity increased until a point was reached where the robot was regarded as more strange than familiar (see Figure 1). At this point (about 80-85% human-likeness), due to subtle deviations from the human-norm and the resounding negative associations with the robot, Mori drew a valley-shaped dip. A real human was placed, escaping the valley, on the other side. Mori gave examples of objects such as zombies, corpses, and lifelike prosthetic hands that lie within the valley. He also predicted that the Uncanny Valley would be amplified with movement as opposed to still images of a robot.

Mori recommended that it was best for robot designers to avoid designing complete androids and to instead develop humanoid robots with human-like traits, aiming for the first valley peak and not the second, which would risk a fall into the Uncanny Valley. As computer game designers working in particular genres continue the pursuit of realism as a way to improve player experience and immersion, designers have the second peak as a goal to achieve believably realistic, human-like characters (Ashcraft, 2008; Plantec, 2008). To reach this goal, and to assess if overcoming the Uncanny Valley is an achievable feat, further investigation and analysis of the factors that may exaggerate the uncanny is required.
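The shape Mori described can be pictured numerically. The toy model below is purely illustrative (the functional form and constants are our own invention, not Mori's data): affinity rises with human-likeness, a sharp dip is centred just short of full humanness, and the nadir of the dip can be located on a grid.

```python
import numpy as np

def familiarity(h):
    """Toy stand-in for Mori's curve: affinity rises with
    human-likeness h (0..1) but a sharp Gaussian dip, the
    'valley', is centred near h = 0.8. Constants are invented."""
    return h - 1.6 * np.exp(-((h - 0.8) ** 2) / (2 * 0.05 ** 2))

h = np.linspace(0.0, 1.0, 1001)
nadir = h[familiarity(h).argmin()]
print(f"valley nadir at ~{nadir:.2f} human-likeness")
```

On this toy curve the minimum falls at roughly 0.80 human-likeness, echoing Mori's 80-85% estimate; real rating data need not follow any such smooth single dip.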


Figure 1. A diagram to demonstrate Mori’s plot of perceived familiarity against human-likeness as the
Uncanny Valley (taken from a translation by MacDorman and Minato of Mori’s ‘The Uncanny Valley’)

Previous Investigation into the Uncanny Valley

Since Mori's original theory of the Uncanny Valley over thirty years ago, the increasing realism possible for virtual characters and androids has sparked a renewed interest in the phenomenon (Green, MacDorman, Ho, & Vasudevan, 2008; Pollick, in press; Steckenfinger & Ghazanfar, 2009). However, there have been few empirical studies conducted to support the claims of uncanny virtual characters and androids evident within new media (Bartneck, Kanda, Ishiguro, & Hagita, 2009; MacDorman & Ishiguro, 2006; Pollick, in press; Steckenfinger & Ghazanfar, 2009).

Still images of both virtual characters and robots have been used for experiments investigating the Uncanny Valley. Design guidelines have been authored to help realistic, human-like characters escape from the valley (for example, Green et al., 2008; MacDorman, Green, Ho, & Koch, 2009; Schneider, Wang, & Yang, 2007; Seyama & Nagayama, 2007). MacDorman et al. focused on how facial proportions, skin texture, and levels of detail affect the perceived eeriness, human likeness, and attractiveness of virtual characters. Schneider et al. investigated the relationship between human-like appearance and attraction, with the results indicating that the safest combination for a character designer seems to be a clearly non-human appearance with the ability to emote like a human.

Hanson (2006) conducted an experiment using still images of robots across a spectrum of human-likeness. An image of a human was morphed to an android on one half of the spectrum and then the android to a mechanical-looking, humanoid robot on the other half. The results depicted an uncanny region between the mechanical-looking, humanoid robot and the android. In a second experiment, Hanson found that it was possible to remove the uncanny region within the same plot, where it had previously existed, by changing the appearance of the android's features to a more "cartoonish" and friendly appearance.

However, the results from these experiments only provide a somewhat limited interpretation of perceived uncanniness based on inert (unresponsive) still images. Most characters used in animation and computer games are not stationary, with motion, timing, and facial animation being the main factors contributing to the Uncanny Valley


(Richards, 2008; Weschler, 2002). For realistic androids, behaviour that is natural and appropriate when engaging with humans, referred to as "contingent interaction" by Ho, MacDorman, and Pramono (2008, p. 170), is a key factor in assessing a human's response to an android (Bartneck et al., 2009; Kanda, Hirano, Eaton, & Ishiguro, 2004). Previous authors (such as Green et al., 2008; Hanson, 2006; MacDorman et al., 2009; Schneider et al., 2007) state that the conclusions drawn from their experiments where still images had been used may have been different had movement (and sound) been included as a factor.

The perception of the uncanny does not always have to provide a negative impact for the viewer (MacDorman, 2006). The principles of the uncanny theory can work to the advantage of engineers when designing robots with the purpose of being unnerving within an appropriate setting and context. Similarly, the uncanny may help in the success of the horror game genre for zombie-type characters. Building on these findings, Tinwell and Grimshaw (2010) conducted the UM study, using video clips with sound, to investigate how the uncanny might enhance the fear factor for horror games. The results showed that combined factors such as appearance and sound can work together to exaggerate the uncanny for virtual characters. Not only was it suggested that a lack of lip/vocalization synchronization reduced how familiar a character was perceived to be, but a perceived lack of human-likeness for a character's voice, facial expression, and doubt in judgement as to whether the voice actually belonged to the character or not, also reduced perceived familiarity.

Limitations of Mori's Theory

Recent studies demonstrate weaknesses within the Uncanny Valley theory and suggest it may be more complex than the simplistic valley shape that Mori plotted in his original diagram (see Figure 1). Various factors (including speech) can influence how uncanny an object is perceived to be (Bartneck et al., 2009; Ho et al., 2004; Minato, Shimada, Ishiguro, & Itakura, 2004; Tinwell & Grimshaw, 2009). Attempts to plot Mori's Uncanny Valley shape cannot confirm the two-dimensional construct that Mori envisaged. The results from experiments that have been conducted using cross-modal factors such as motion and sound imply that it is unlikely that the uncanny phenomenon can be reduced to the two factors, perceived familiarity and human-likeness, and that it is instead a multi-dimensional model (see Figure 2).
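Whether empirical ratings actually trace a single dip can be checked mechanically: order the stimuli by human-likeness and count the interior local minima in the mean familiarity ratings. A minimal sketch with invented ratings (not the UM study's data):

```python
def local_minima(ratings):
    """Indices of interior local minima in a sequence of mean
    familiarity ratings ordered by increasing human-likeness."""
    return [i for i in range(1, len(ratings) - 1)
            if ratings[i] < ratings[i - 1] and ratings[i] < ratings[i + 1]]

# Invented mean ratings on a 1-9 scale; two dips appear, so no
# single smooth valley can describe the data.
ratings = [6.1, 5.0, 3.2, 4.8, 5.5, 2.9, 4.4, 7.0]
print(local_minima(ratings))  # two valley nadirs: indices 2 and 5
```

More than one minimum is evidence against the simple two-dimensional valley; it does not by itself identify which cross-modal factor is responsible.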

Figure 2. The Uncanny Wall (Tinwell & Grimshaw, 2009)


When ratings for perceived familiarity were plotted against human-likeness, the results from Tinwell and Grimshaw's experiment, using 100 participants and 15 videos ranging from humanoid to human with character vocalization, depict more than one valley shape. The plot is more complex than Mori's smooth curve and the valley shapes less steep than Mori's perpendicular climb. The most significant valley occurs between the humanoid character Mario, on the left, and the stylized, human-like Lara Croft, on the right. The nadir for this valley shape is positioned at about 50-55% human-likeness, which is lower than Mori's original prediction of 80-85% human-likeness.

Results from studies using robots with motion and speech are also inconsistent with Mori's Uncanny Valley. MacDorman (2006) plotted ratings for perceived familiarity against human-likeness for an experiment using videos of robots from mechanical to human-like, including some stimuli with speech. The results showed no significant valley shape in keeping with the depth and gradient of Mori's plot, and robots rated with the same degree of human-likeness can have a different rating for familiarity. Bartneck et al. (2009) found that when a robotic copy of a human was compared to that human for the two conditions movement (with motion and speech) and still, despite a significant difference in perceived human-likeness between the human and the android, there was no significant difference between perceived likeability for the android and the human. These results imply that movement may not be the only factor to influence the uncanny. Further investigation is required to assess how speech may contribute to a more multi-dimensional model to measure the uncanny.

Uncertainty exists as to whether the meaning of Mori's original concept may have been "lost in translation" (Bartneck et al., 2009, p. 270). The word that Mori used in the title for the Uncanny Valley is bukimi, which, translated from Japanese, stands for "weird, ominous, or eerie". In English, "synonyms of uncanny include unfamiliar, eerie, strange, bizarre, abnormal, alien, creepy, spine-tingling, inducing goose bumps, freakish, ghastly and horrible" (MacDorman & Ishiguro, 2006, p. 312), while Freud used the word unheimlich to define the uncanny. Further confusing the issue, the root heimlich has two meanings, viz. familiar or agreeable, and that which is concealed and should be kept from sight. Freud discussed both meanings in his 1919 essay and they are not necessarily mutually exclusive, as we show below. However, despite a generic understanding of the word that Mori used, the appropriateness of the term shinwa-kan (translated as familiarity) that Mori used in his original paper as a variable to measure and describe uncanniness has been addressed by previous authors.

As an uncommon word within Japanese culture, there is no direct English equivalent for the word shinwa-kan. The word familiarity stands for the opposite of unfamiliarity (one of the synonyms for bukimi), yet the word familiarity may be open to misinterpretation. Whilst strange is a typical term for describing the unfamiliar, familiarity might be interpreted with a variety of meanings including how well-known an object appears: for example, a well-known character in popular culture or an android replica of a famous person. Bartneck et al. (2009) proposed that, with no direct translation, shinwa-kan could be treated as a "technical term" in its own right; however, this may cause problems when comparing the results from one experiment to another where the more generic translation "familiarity" is used as the dependent variable (p. 271). Other words such as likeability (Bartneck et al., 2009) or unstrange (the opposite of strange) may be closer to Mori's original intention; nevertheless, the validity of experiments conducted into the uncanny may be more robust if a standard word were to be used as a dependent variable to measure and describe perceived uncanniness: that word has yet to be agreed upon.

Conflicting views exist as to whether it is actually possible to overcome the Uncanny Valley. One theory put forward is that objects may appear


less uncanny over time as one grows used to a particular object. Brenton et al. (2005) give the example of the life-like sculpture The Jogger by Duane Hanson: the sculpture will appear "less uncanny the second time that it is viewed because you are expecting it and have pre-classified it as a dead object". The effect of habituation may also apply to those with regular exposure to realistic, human-like virtual characters. 3D modellers working with this type of character, or gamers with an advanced level of gaming experience, may be less able to detect flaws within a particular character because they have grown accustomed to the appearance and behaviour of that character by interacting with it on a regular basis (Brenton et al., 2005). Recent empirical evidence goes against this theory. The results from a study by Tinwell and Grimshaw (2009) showed that the level of experience both of playing computer games and of using 3D modelling software made little difference in detecting uncanniness. (Judgements of perceived familiarity and human-likeness for those with an advanced level of experience had no significant difference from those with lesser or no experience.)

Tinwell and Grimshaw suggest it may never be possible to overcome the Uncanny Valley as a viewer's discernment for detecting subtle nuances from the human norm keeps pace with developments in technology for creating realism. With a lack of empirical evidence to support the notion of an Uncanny Valley, the notion of an Uncanny Wall may be more appropriate (see Figure 2). Viewers who may at first have been "wowed" by the apparent realism of characters such as Quantic Dream's Mary Smith (2006) or characters in animation such as Beowulf (Zemeckis, 2007) or The Polar Express (Zemeckis, 2004), soon developed the skills to detect discrepancies in such characters' appearance and behaviour. Indeed, as soon as the next technological breakthrough in achieving realism is released, a viewer may be reminded of the flaws of a character that at first did not seem uncanny. In addition to the meaning of uncanny as used in the Uncanny Wall hypothesis being an exposition of the first Freudian sense of heimlich/unheimlich as described above, the undesired unmasking of the technological processes used in the production of a character, and the perception of those processes as flaws in the presentation of that character, allows us simultaneously and without contradiction to use the second meaning of heimlich: that which should remain out of sight. The concept of the Uncanny Wall (as opposed to the Uncanny Valley, which always holds out the hope for a successful traversal to the far side) evokes a variety of myths, legends, and modern stories (Frankenstein's monster, for example, or the Golem) in which beings created by man are condemned to forever remain pale shades of those created by gods.

Further studies would be required to provide evidence for the Uncanny Wall to substantiate the hypothesis that the Uncanny Valley is impossible to surmount for realistic, human-like virtual characters. As soon as the next character is released, announced as having overcome the Uncanny Valley, we intend to conduct another test using the same characters as in the previous experiment. If those characters previously rated as close to escaping the valley, such as Emily (Image Metrics, 2008), are placed beneath the new character as perceived strangeness increases, our prediction may be justified. In the meantime, a conceptual guide for uncanny motion and sound in virtual characters may be beneficial in aiding computer game developers to manipulate the degree of uncanniness.

CROSS-MODAL MISMATCH

For androids, if a human-like appearance causes us to evaluate an android's behaviour from a human standard, we are more likely to be aware of disparities from human norms (MacDorman & Ishiguro, 2006; Matsui, Minato, MacDorman, & Ishiguro, 2005; Minato et al., 2004). Ho et


al. (2008) observed that a robot is eeriest when a human-like appearance creates an expectation of a human form but non human-like elements fail to deliver to expectations. Also, a mismatch in the human-likeness of different features of a robot, for example, a nonhuman-like skin texture combined with human-like hair and teeth, elicited an uncanny sensation for the viewer.

With regard to virtual characters, it has been suggested that a high graphical fidelity for realistic human-like characters raises expectations for the character's behavioural fidelity (Bailenson et al., 2005; Brenton et al., 2005; Vinayagamoorthy et al., 2005). Any discrepancies from the human-norm in how a character spoke or moved would appear odd. For humanoid or anthropomorphic characters with a lower fidelity of human-likeness (for example, Mario or Sonic the Hedgehog), differences from the human-norm would be more acceptable to the viewer: expectations are lowered based on the more stylized and iconic appearance of that character. Despite seemingly strange behaviour with jerky movements or a less than human-like voice, the viewer will still develop a positive affinity with the character. Empirical evidence implies that humanoid and anthropomorphic type characters do escape the valley dip as Mori predicted, being placed before the first peak in the valley (Tinwell, 2009; Tinwell & Grimshaw, 2009).

Evidence shows that for virtual characters (and robots) a perceived mismatch in the human-likeness of a character's voice, based on that character's appearance, exaggerates the uncanny. As part of the Uncanny Modality survey (Tinwell & Grimshaw, 2010), 100 participants rated how human-like the character's voice sounded and how human-like the facial expression appeared using a scale from 1 (nonhuman-like) to 9 (very human-like). Strong relationships were identified between the uncanny and perceived human-likeness for a character's voice and facial expression. The less human-like the voice sounded, the more strange the character was regarded to be. Uncanniness also increased for a character the less human-like the facial expression appeared.

Laurel (1993) suggests that, to achieve harmony, there is an expectation for the sensory modalities of image and sound to have the same resolution. So that there is accord between visual appearance and behaviour for virtual characters, we put forward that the degree of fidelity of human-likeness for a character's voice should match that character's appearance, or otherwise risk discord for that character. To avoid the uncanny, attention should be given to the fidelity of human-likeness for a character's voice in accordance with that character's appearance. For high fidelity human-like characters, it is expected that the character should have a human-like voice of a resolution that matches their realistic, human-like appearance. However, for mechanical-looking robots, a less human-like and more mechanical-sounding voice is preferable. The humanoid robot Robovie was intentionally given a mechanical-sounding voice so that it appeared more natural to the viewer (Kanda et al., 2004). A voice that was too human-like may have been regarded as unnatural based on the robot's appearance, thus exaggerating the uncanny for the robot.

To test the Uncanny Valley theory with virtual characters, it has been suggested that it is not necessary to include characters from computer games as the level of realism achieved from gaming environments generated in real-time is less than that achieved for animation and film (Brenton et al., 2005). Some characters created for television and film have been proclaimed as overcoming the Uncanny Valley: in 2008, Plantec hailed the character Emily as finally having done so.

Walker, of Image Metrics, states that whilst computer games would benefit from these more realistically rendered faces, it is not yet possible to achieve the same high level of polygon counts for in-game play as achieved for television and film due to technical restrictions: "We can produce Emily-quality animation for games as well, but


it just can’t work in a real-time gaming environ- reduce the uncanny for computer game characters
ment” (as quoted in Ashcraft, 2008). due to the fact that they will always be playing
Accordingly, for virtual characters used within catch up to the level of realism achieved for film.
computer games that are approaching levels of Refinements made to character’s voices over a
realism as achieved for the film industry, it may spectrum of human-likeness ranging from human-
be advisable to reduce the level of human-likeness like to mechanical, may perhaps help to remove
for a character’s voice to a level that is in keeping the uncanny where it was previously evident.
with that character’s appearance. Actors’ voices Reiter notes that recently, more attention has
are typically used for realistic, human-like char- been given to the quality of sound in computer
acters’ speech in computer games. Yet, if the level games to keep up with the quality of realism
of fidelity for achieving human-like realism for achieved visually for in-game play and to provide
computer games is less than that achieved for film, a more cinematic experience. As a method of
a less than human-like voice should be used to communication both diegetic and non-diegetic
avoid the character being perceived as unnatural. game sound enhances a game’s plausibility in
Hug (2011) makes a similar point when discussing that sound can “trigger emotions and provide
the similarities between indie game and animation additional information otherwise hard to convey”
film aesthetics. Hug describes an affinity between (Reiter, 2011). Distinctions made as to the quality
sound used in animation film or cartoons matches of game sound are not simply due to the level of
and the aesthetic style for the animation: “[S]ounds clarity, resolution, or digital output achievable
that are more or less de-naturalized in a comical, for sound: “Perceived quality in game audio is
playful, or surreal way, which is characterized not a question of audio quality alone” (Reiter,
by a subservsive interpretation of sound-source 2011). For speech, textures, emotive qualities and
associations”. He further uses the example of an delivery style are attributes that contribute to the
explosion that occurs within the arcade game perceived quality and overall believability for a
Grey Matter (McMillen, Refenes, & Baranowsky, character. (Qualities of speech and the uncanny
2008) as an intriguing case of “cartoonish” sound are discussed further in the following section.)
design “when an abstract dot hits a flying cartoon Quality of speech is critical in portraying the
brain, the latter ‘explodes’ with sounds of broken emotive context of a character convincingly.
glass”. Although a more cartoonish style of sound However with regards to the uncanny, if the per-
is used for the explosion, the sound seems more ceived realism and quality for a voice goes beyond
in keeping with the stylized appearance of the that of the quality and realism for a character’s
object to which the sound belongs. The visceral appearance, such a cross-modal mismatch could
sounds of the impact are still evident despite the exaggerate the uncanny. Further experiments are
more simplistic nature of the sound. The acoustics required to test this theory. Building on the premise
appear more natural as the level of detail appears of Hanson’s (2006) experiment where the uncanny
to match the stylized aestheticism of the film’s was removed from a morphed sequence of images
environment. from robot to human by making a robot’s features
Of course we do not suggest that cartoon-like more “cartoonish” and friendly, similar changes
voices be used with characters that are approaching could be made to the acoustics of speech for
believable realism in computer games, however videos of realistic, human-like characters. Whilst
the level of human-likeness may be subtly modi- the videos of characters would remain constant,
fied so that the perceived style of the voice sound the speech sound would be changed across a
matches the aesthetic appearance of the character. spectrum of human-likeness from mechanical to
This absurd juxtaposition may be necessary to human-like. If our predictions are correct, char-

221
Uncanny Speech

acters will be perceived as more strange when the a time like those of radio announcers filled the
speech sounds too mechanical or too human-like theatre” (p. 64). As Tinwell and Grimshaw state,
in relation to the fidelity of human-likeness for a paraphrasing Spadoni, (2010) the unique textures
character’s appearance. A character may appear and delivery style for Dracula’s speech increased
more natural and be perceived as more familiar the uncanny for Dracula:
once the fidelity of human-likeness for speech
is adjusted to be regarded as matching that of a Dracula’s voice, the ethereal voice of the undead,
character’s appearance. is compared to the voice of reason and materiality
that is Van Helsing’s. In the former, the uncanny
is marked by uneven and slow pronunciation,
QUALItIEs OF sPEEcH staggered rhythm and a foreign (that is, not Eng-
lish) accent and all this produces a disconnect
Bizarre qualities and textures of speech served to between body and speech. Van Helsing’s speech,
gratify the pleasure humans sought in frighten- by contrast, is the embodiment of corporiality;
ing themselves with early horror film talkies, for authoritative, clearly enunciated and rational in
example the monster in Browning’s (1931) film its delivery and meaning.
Dracula. Some cinematic theorists argue that the
success of films such as Dracula was due to an For zombie characters in computer games,
uncanny modality that occurred during the tran- comparisons have been made with horror film
sition between silent to sound cinema (Spadoni, talkies as to the methods used to create and modify
2000, p. 2). Sounds that may have been perceived sound to induce an ambience of fear (Brenton
as unreal or strange due to technical restrictions of et al., 2005; Perron, 2004; Roux-Girard, 2011;
sound recording and production at the time were Toprac & Abdel-Meguid, 2011). Results from
used to the advantage of the character Dracula. the UM study by Tinwell and Grimshaw (2009)
For early sound film, to produce the most to define cross-modal influences of image and
intelligible dialogue for the viewer, the recording sound and the uncanny in virtual characters show
process required that words were pronounced that particular qualities of speech (similar to those
slowly, emphasizing every “syl-la-ble” (Spadoni, observed for early horror talkies) can exaggerate
2000, p. 15). However, whilst words could be how uncanny a virtual character is perceived
easily interpreted by the viewer, this impeded to be. Thirteen video clips of one human and
delivery style made the speech sound unnatural twelve virtual characters in different settings and
and unreal. Delivery of speech style also influ- engaged in different activities were presented to
enced how strange Dracula was perceived to be. 100 participants. The twelve virtual characters
In the role of Dracula, the acoustics of Bela consisted of six realistic, human-like characters:
Lugosi’s speech set the standard for what the (1) the Emily Project (2008) and (2) the Warrior
“voice of horror” should be (Spadoni, 2000, pp. (2008) both by Image Metrics; (3) Mary Smith
63-70). The weird textures of Bela Lugosi’s voice from The Casting (Quantic Dream, 2006); (4) Alex
were manipulated to create a greater conceptual Shepherd from Silent Hill Homecoming (Konami,
peculiarity for the viewer, thus setting the epony- 2008) and two avatars (5) Louis and (6) Francis
mous character apart from other horror films. from Left 4 Dead (Valve, 2008); four zombie
The distinctive vocal tone and pronunciation of characters, (7) a Smoker, (8) The Infected, (9) The
Dracula’s speech were characteristics that critics Tank and (10) The Witch from Left 4 Dead; (11) a
acclaimed as the most shocking and chilling; “slow stylised, human-like Chatbot character “Lillien”
painstaking voices pronouncing each syllable at (Daden Ltd, 2006); (12) a realistic, human-like


zombie (Zombie 1) from the computer game Alone in the Dark (Atari Interactive, Inc, 2009) and (13) a human.

Table 1 shows the median ratings for a character’s strangeness and for the speech qualities: whether the speech seemed (a) slow, (b) monotone, (c) of the wrong intonation, (d) if the speech did not appear to belong to a character, or (e) none of the above. Characters with the same median value for strangeness were grouped together and the median values for speech qualities were then calculated for those characters or groups. (Median values were used to indicate a central tendency for results, to help establish a clear overall picture of the vital relationships over multiple qualities of speech.) The results implied that slowness of speech, an incorrect intonation and pitch, and how monotone the voice sounded increased uncanniness.

A strong indirect relationship was identified between individual ratings for the variables “the speech intonation sounds incorrect” and “the voice belongs to the character”. This implies that if the intonation for a character’s voice is in keeping with what the viewer may have expected, this characteristic may contribute to the overall believability for that character. The two zombies, the Witch and the Tank, from the computer game Left 4 Dead (Valve, 2008) were regarded as the most uncanny with a median strangeness rating of just 2 (see Table 1). However, it seems the unintelligible hisses and snarls from the Tank were regarded as sounds that this character was likely to make based on the Tank’s appearance and how he behaved. Likewise, the inhuman cries and screeches from the Witch matched her seemingly pathetic and wretched appearance. Such sounds enhanced the believability of these characters as they were in keeping with their nonhuman-like appearance.

The findings from the UM study provide empirical evidence to support the claims made by MacDorman (as quoted in Gouskos, 2006) that Mary Smith’s speech was one of the main contributing factors as to why she was perceived as uncanny. Twenty percent of participants observed a lack of correct pitch and intonation for Mary Smith’s speech. This implies that the pitch and tone for her voice may not have matched the facial expression exhibited by this character. The emotive qualities of speech may have seemed either inappropriate or out of context with how this character appeared to look and behave. The facial expression may not have matched nor accurately conveyed the emotive qualities of her voice. Attributes such as these raised doubts as to whether the voice actually belonged to this character or not, thus increasing the sense of perceived eeriness for this character.

Table 1. Median ratings for speech qualities for those characters or groups with the same median strangeness value (Tinwell & Grimshaw, 2010). Note. Judgements for strangeness were made on 9-point scales (1 = very strange, 9 = very familiar).

Character or Group (Median Strangeness)                Slow   Monotone   Wrong intonation   Belongs   None
The Tank, The Witch (Mdn = 2)                          10     9.5        23.5               56.5      16.5
The Infected, The Smoker, Zombie 1, Chatbot (Mdn = 3)  24     21.5       40                 42        8.5
Mary Smith (Mdn = 4)                                   8      3          20                 20        8
The Warrior, Alex Shepherd (Mdn = 6)                   14     17         17                 62.5      7.5
Louis, Francis (Mdn = 7)                               2.5    3.5        6.5                79.5      4.5
Emily (Mdn = 8)                                        2      0          2                  87        6
Human (Mdn = 9)                                        1      15         4                  72        6
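The grouping procedure behind Table 1 can be sketched in a few lines of Python. The per-character percentages below are partly invented for illustration (only the Chatbot and Zombie 1 figures are quoted in the text), so this is a toy reconstruction of the method rather than the study’s actual analysis code.

```python
from collections import defaultdict
from statistics import median

# Illustrative data: per-character median strangeness (1 = very strange,
# 9 = very familiar) and the percentage of participants selecting each
# speech quality. The Chatbot and Zombie 1 percentages are those quoted
# in the text; the Tank and Witch percentages are invented for this sketch.
characters = {
    "The Tank":  {"strangeness": 2, "slow": 9,  "monotone": 8,  "wrong_intonation": 22},
    "The Witch": {"strangeness": 2, "slow": 11, "monotone": 11, "wrong_intonation": 25},
    "Chatbot":   {"strangeness": 3, "slow": 75, "monotone": 59, "wrong_intonation": 76},
    "Zombie 1":  {"strangeness": 3, "slow": 42, "monotone": 29, "wrong_intonation": 34},
}

# Group characters sharing the same median strangeness value, then take the
# median of each speech-quality rating within the group, as in Table 1.
groups = defaultdict(list)
for ratings in characters.values():
    groups[ratings["strangeness"]].append(ratings)

qualities = ("slow", "monotone", "wrong_intonation")
group_medians = {
    strangeness: {q: median(r[q] for r in members) for q in qualities}
    for strangeness, members in sorted(groups.items())
}

for strangeness, summary in group_medians.items():
    print(f"Mdn strangeness = {strangeness}: {summary}")
```

The median, rather than the mean, suits ordinal rating scales of this kind because it is insensitive to a few extreme judgements from individual participants.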


As well as being regarded as of the wrong pitch, speech that is delivered in a slow, monotone way increased the uncanny for both zombie characters and human-like characters not intended to contest a sense of the real. Within the UM study, the Chatbot character received a less than average rating for perceived familiarity and was placed with three other zombie characters with a median strangeness value of just three (see Table 1). The Chatbot’s voice was rated individually as being slow (75%), monotone (59%), and of an incorrect intonation (76%). The “speech” for Zombie 1, grouped with the Chatbot character with a median strangeness value of three, was also judged individually as being monotone (29%), slow (42%), and of an incorrect intonation (34%). Including such qualities of speech for the zombie may have been a conscious design decision by developers to increase the perceived eeriness for a character intended to elicit an uncanny sensation. (As mentioned above, such qualities enhanced the overall impact for the monster Dracula.) However, the crippled speech style for the Chatbot appeared unnatural and unreal. Such qualities for this character’s speech were factors that viewers found most annoying and irritating, exaggerating the uncanny for this character when perhaps this was not intended.

Our results imply that uncanniness is increased if speech is judged to be of the wrong pitch, too monotone, or slow in delivery style. Whilst such qualities can work to the advantage of antipathetic characters by increasing the fear factor, these qualities may work against empathetic characters in the role of hero or protagonist within a game. A designer may wish the player to have a positive affiliation with the protagonist character, yet the designer may unwittingly create an uncanny sensation for the player with speech qualities that sound strange to the viewer. Speech prerecorded in a manner that is too slow or monotone to aid clarity for post-production purposes may be judged as unnatural and should instead be recorded at an appropriate tempo. Pitch and tone of speech that do not match the facial expression or given circumstance for a character may be regarded as out of context and confusing for a viewer. To avoid the uncanny, attention should be given to ensuring that the pitch of voice accurately depicts the given emotion for a character and, once speech has been recorded at the correct pitch, that the facial expression conveys that emotion convincingly.

LIP-SYNCHRONIZATION VOCALIZATION

The process of matching lip movement to speech is an integral factor in maintaining believability for an onscreen character (Atkinson, 2009). For first-person shooters (FPS) and other similar types of action game, there are limited periods during gameplay when attention is focused solely on a headshot of a speaking character. Close-up shots of a player’s character, comrades or antagonists are predominantly used when exchanging information during gameplay or during cinematic cut scenes and trailers.

The music genre of computer games provides an outlet for musicians to promote and sell their work (Kendall, 2009; Ripken, 2009). As well as FPS games, music games can provide a challenge for developers with regards to facial animation and sound. The Beatles: Rock Band (EA Games, 2009) highlights the recent success of the merger of music and computer games that use realistic, human-like characters to represent music artists. It has been found, however, that uncanny traits can leave viewers dissatisfied with particular characters within the context of a computer game (Tinwell, 2009). With emphasis directed at a character’s mouth as the vocals are matched to the music tracks, it is important that an artist’s identity be transferred effectively within this new medium (Ripken, 2009). Factors such as asynchrony may result in a negative impact on the overall believability for such characters.

This section discusses the outcomes of a lack of synchrony for lip-vocalization narration in film


and television and the corresponding implications for characters in computer games.

Lip Syncing for Television and Film

The process of a viewer accepting that sound and image occur simultaneously from one given source is referred to as synchresis (Chion, 1994) or synchrony (Anderson, 1996).1 For early sound cinema, various methods of sound recording and post-production techniques were applied before a viewer no longer doubted that a voice actually belonged to a figure onscreen. A perceived lack of synchronization between image and sound has been equated with much of the uncanny sensation evoked by films within the horror genre in early sound cinema (Spadoni, 2000, pp. 58-60). Errors in synchrony evoked the uncanny for a scene in Browning’s Dracula (1931). As a figure’s lips remained still, human laughter resonated within the scene. With no given body or source, the laughter is regarded as an eerie, disembodied sound. Whilst technology allows for some improvement with cinema speakers, televisions and personal computers, most sound is still delivered through some mechanism that is physically disjunct from the onscreen image (for example, via headphones or separate speakers). Tinwell and Grimshaw (2010) note that future technologies may overcome issues with asynchrony within the broadcasting industry: “Presumably, there will be no need for such perceptual deceit once flat-panel speakers with accurate point-source technology provide simultaneously a visual display” (p. 7).

For human figures in television and film, viewers are more sensitive to an asynchrony of lip movement with speech than for visual information presented with music (Vatakis & Spence, 2005). Viewers are also more sensitive to asynchrony when sound precedes video and less so when sound lags behind video (Grant et al., 2004). Grant et al. found that for continuous streams of audio-visual speech presented onscreen, detectable asynchrony occurred at 50ms when sound preceded video, a smaller window of acceptable asynchrony than the 220ms for when sound lagged behind video. Standards set by the television broadcasting industry require that the audio stream should not precede the video stream by more than 45ms and that the audio stream should not lag behind the video stream by more than 125ms (ITU-R, 1998).

An asynchrony for speech with lip movement can lead to one misinterpreting what has been said: the McGurk Effect (1976). As a viewer, one can interpret what has been heard by what has been seen. Depending on which modality one’s attention may be drawn to for audio-visual speech (and depending on which syllable is used), the pronunciation of a visual syllable can take precedence over the auditory syllable. Conversely, a sound syllable can take precedence over the visual syllable. Alternatively, as one comprehends the visual articulatory process of speech both automatically and subconsciously, one can combine the sound and visual syllable information to create a new syllable. For example, a visual “ga” coinciding with the sound “ba” can be interpreted as a “da” sound. (This type of effect was observed by MacDorman (2006) for the character Mary Smith’s speech, who was criticized for being uncanny.)

A viewer’s overall enjoyment of a television programme can be disrupted if delays occur between transmission devices for video and audio signals. To prevent confusion or irritation for the viewer, sub-titles are often preferred to dubbing of speech for foreign works (Hassanpour, 2009).

Errors in the synchronization of lip movements with voice for figures onscreen (lip sync error) can result in different responses from the viewer depending upon the context within which the errors are portrayed. A study by Reeves and Voelker (1993) found that not only is lip sync error potentially stressful for the television viewer, but it can also lead to a dislike for a particular program and viewers evaluating the people displayed on the screen more negatively and as “less interesting, more unpleasant, less influential, more agitated, more confusing, and less successful” (p. 4). On the


contrary, lip sync error has also been deliberately used to provoke a humorous effect for the viewer, where the absurd is regarded as comical as opposed to annoying, for example, the intentionally bad dubbing for characters in “Chock-Socky” movies (Tinwell & Grimshaw, 2010).

Lip Syncing for Computer Games

With increasing technological sophistication in the creation of realism in computer games, text-based communication systems have been replaced with virtual characters using actors’ voices. To create full voice-overs for characters, automated lip-syncing tools extract phoneme sounds from prerecorded lines of speech. The visual representation (viseme) for a particular sound is retrieved from a database of predetermined mouth shapes. Muscles within the mouth area for a 3D character are modified to create a particular mouth shape for each phoneme. Interpolated motion is inserted between the next phoneme and associated mouth shape to enable contingency of lip movement for words within a given sentence. For example, a specific mouth shape can be selected for the sound “sh” to be used in conjunction with other sounds within a word or line of speech. Full voice-overs for characters were generated for titles developed by Valve such as Left 4 Dead (2008) and Half Life 2 (2008) using this technique. A phoneme extractor tool within Faceposer allowed for the detection and extraction of phoneme sounds from prerecorded speech to be synchronized with a character’s lips.

Whilst research has been undertaken to improve the motion quality of real-time data-driven approaches for realistic visual speech synthesis (Cao, Faloustsos, Kohler, & Pighin, 2004), prior to the UM study (Tinwell & Grimshaw, 2010) there had been no attempts to investigate what impact lip-synchronization may have on viewer perception and the uncanny in virtual characters. Videos of 13 virtual characters ranging from humanoid to human were rated by 100 participants as to how uncanny and how synchronized speech with lip movement was perceived to be. (A full description of the stimuli used in the experiment is provided in the third section.) The results revealed a strong relationship between how uncanny a character was perceived to be and a lack of synchronization between lip movement and speech: those characters with disparities in synchronization were perceived as less familiar and more strange than those characters rated as close to perfect lip-synchronization.

Synchronization problems with the recorded voice for early sound cinema heightened a viewer’s awareness that the figure was not real and was simply a manufactured artifact (Spadoni, 2000, p. 34). A viewer was reminded that figures onscreen were merely fabricated objects created within a production studio. The uncanny was increased as figures were perceived as “a reassembly of a figure” easily disassembled within a movie theatre (Spadoni, 2000, p. 19). The results from the UM study (Tinwell & Grimshaw, 2010) imply that the implications of asynchrony for speech and the uncanny for human figures within the classic horror cycle of Hollywood film also apply to virtual characters intended for computer games. The zombie characters the Witch and the Tank from the computer game Left 4 Dead (2008) received less than average scores for perceived lip-synchronization. The jerky, haphazard movement of the Witch’s lips appeared disparate from the high-pitched cries and shrieks spewed out by this character. As the Witch proceeded to attack, her presence seemed evermore overwhelming as sounds appeared to emanate from an incorporeal and uncontrollable being in a similar manner to Dracula’s laughter noted earlier. Similarly, participants seemed somewhat confused by the chaotic movement and irregular sounds generated by the Tank character, making the viewer feel panicked and uncomfortable.

The stimuli for this study were presented in different settings and as different actions. Some were presented as talking heads, for example the


Chatbot character, whilst others moved around the screen, for example the Tank and the Witch. A further study is required to determine the actual causality of lip-synchronization as a significant contributor towards the uncanny when not associated with other factors of facial animation and sound. Thus, we intend a further experiment to test the hypothesis: Uncanniness increases with increasing perceptions of lack of synchronization between the character’s lips and the character’s sound.

At present there are no standards set for acceptable levels of asynchrony for computer games as there are for television. It may well be that these acceptable levels are the same across the two media but it might equally be the case that the interactive nature of computer games and the use of different reproduction technologies and paradigms propose a different standard. For example, perhaps it is the case that current technological limitations in automated lip-syncing tools require a smaller window of acceptable asynchrony for computer games than previously established for television. We hope the future experiment noted above will also ascertain if viewers are more sensitive to an asynchrony of speech for virtual characters where the audio stream precedes video (as has been previously identified for the television broadcasting industry).

ARTICULATION OF SPEECH

Hundreds of individual muscles contribute to the generation of complex facial expressions and speech. As one of the most complex muscular regions of the human body, and with increased realism for characters, generating realistic animation for mouth movement and speech is a challenge for designers (Cao et al., 2004; Plantec, 2007). Even though the dynamics of each of these muscles are well understood, their combined effect is very difficult to simulate precisely. Whilst motion capture allows for the recording of high-fidelity facial animation and expression, this technique is mostly useful for FMV. Recorded motions are difficult to modify once transferred to a three-dimensional model and the digital representation of the mouth remains an area requiring further modification. Editing motion capture data often involves careful key-framing by a talented animator. A developer may edit individual frames of existing motion capture data for prerecorded trailers and cut scenes yet, for computer games, most visual material is generated in real-time during gameplay. For in-game play, automatic simulation of the muscles within and surrounding the mouth is necessary to match mouth movement with speech. Motion capture by itself cannot be used for automated facial animation.

To create automatic visual simulation of mouth movement with speech, computer game engines require a set of visemes as the visual representation for each phoneme sound. Faceposer (Valve, 2008) uses the phoneme classes phonemes, phonemes strong, and phonemes weak with a corresponding viseme to represent each syllable within the International Phonetic Alphabet (IPA). Prerecorded speech is imported into a phoneme extractor tool that extracts the most appropriate phoneme (and corresponding viseme) for recognized syllables. Editing tools allow for the creation of new phoneme classes, or to modify the mouth shape for an existing viseme.

The UM study (Tinwell & Grimshaw, 2010) identified a strong relationship between how uncanny a character was perceived to be and a perceived exaggeration of facial expression for the mouth. The results implied that those characters perceived to have an over-exaggeration of mouth movement were regarded as more strange. Thus, uncanniness increases with increasing exaggeration of articulation of the mouth during speech. Finer adjustments to mouth shapes using tools such as Faceposer may prevent a perceived over-exaggeration of articulation of speech, yet such adjustments are time consuming for the developer. If no original visual footage is available for speech,


judgements made to correct mouth shapes that appear too strong or too weak are likely to be based on the subjective opinion of an individual developer. Even then, the developer is still constrained by the number of mouth and facial muscles available to modify within the 3D model, which may not include an exhaustive depiction of every single muscle used in human speech.

To avoid the uncanny, working with the range of mouth shapes and facial expression that current technology allows for within tools such as Faceposer, the developer should at least avoid an articulation of speech that may appear over-exaggerated. The mouth shape for the phoneme used to pronounce the word “no” (“n” in Faceposer) may be applicable if the word is pronounced in a strong, authoritative way, but would appear overdone and out of context if the same word was used to provide reassurance in a calming and less domineering manner.

Indeed, if the developer wishes to create an uncanny sensation for a zombie character, adjusting mouth shapes so that articulation of speech appears over-exaggerated may enhance the fear factor for such characters by increasing perceived strangeness. In the same way that a snarling dog or ferocious beast may raise the corners of their mouths to show their teeth in an aggressive way, viewers may be made to feel uncomfortable by overstated mouth movements that suggest a possible threat.

SUMMARY AND CONCLUSION

In summary, attributes of speech that may exaggerate the uncanny for realistic, human-like characters in computer games are:

1. A level of human-likeness for a character’s speech that does not match the fidelity of human-likeness for a character’s appearance
2. An asynchrony of speech with lip movement
3. Speech that is of an incorrect pitch or tone
4. Speech delivery that is perceived as slow, monotone, or of the wrong tempo
5. An over-exaggeration of articulation of the mouth during speech.

Whilst such characteristics of speech may adorn the spine-tingling sensation associated with the uncanny for antipathetic characters in the horror genre of games, a developer may risk the uncanny if such characteristics exist for empathetic characters. The protagonist Mary Smith, as featured in the tech demo for the adventure game Heavy Rain (2006), may have been intended to evoke affinity and sympathy from the audience. Instead, Mary Smith was regarded as strange and abnormal: Uncanny speech for this character contributed to just such a negative response from the audience. The speech was not only judged as lacking synchronization with lip movement but an inaccurate pitch and lack of human-likeness raised doubt as to whether the voice actually belonged to the character or not. Attributes such as these reduced the overall believability for Mary Smith. However, for zombies such as the Tank and the Witch from the survival horror game Left 4 Dead (Valve, 2008), uncanny speech increased (in a desired manner) how strange and freakish these characters were perceived to be.

The outcomes from this investigation show that the majority of characteristics for uncanny speech in computer games may be induced by current technological limitations in the production, reproduction, and control of virtual characters. The restricted range of facial muscles available to manipulate in automated facial animation tools used to generate footage in real-time is a current constraint for achieving realism in computer games comparable to film. It seems there is a lack of an exhaustive range of mouth shapes to fully represent each phoneme sound and variation of interpolation between syllables in a range of different contexts. Such constraints may contribute to a perceived asynchrony of speech and mouth


shapes being used for syllables that do not accurately convey the prosody or context of speech.

Computer games may always be playing catch-up with the levels of anatomical fidelity achieved in film for facial animation, however developments in procedural game audio and animation may provide a solution for uncanny speech. As Hug states, the future of sound in computer games is moving towards procedural sound techniques that allow for the generation of bespoke sounds, to create a more realistic interpretation of life within the 3D environment. For in-game play, dynamic sound generation techniques, “such as physical modelling, modal synthesis, granulation and others, and meta forms like Interactive XMF” will create sounds in real-time responding to both user input and the timing, position, and condition of objects within gameplay (Hug, 2011).

Using procedural audio (speech synthesis in this case), a given line of speech may be generated over a differing range of tempos using a delivery style appropriate for the given circumstance. For example, the sentence “I don’t think so” may be said in a slow, controlled manner, if carefully contemplating the answer to a question. In contrast, a fast-paced tone may be used if intended as a satirical plosive when at risk of being struck by an antagonist.

Procedural animation techniques for the mouth area may also allow for a more accurate depiction of articulation of mouth movement during speech. Building on the existing body of research into real-time, data-driven, procedural generation techniques for motion and sound (for example, Cao et al., 2004; Farnell, 2011; Mullan, 2011), a tool might be developed that combines techniques for the procedural generation of emotive speech in response to player input (actions or psychophysiology) (Nacke & Grimshaw, 2011) or game state. Interactive conversational agents in computer games or within a wider context of user interfaces may appear less uncanny if the tempo, pitch, and delivery style for their speech varies in response to the input from the person interacting with the interface. Such a tool will aid in fine-tuning the qualities of speech that will, depending on the desired situation, reduce or enhance uncanny speech.

REFERENCES

Alone in the dark [Computer game]. (2009). Eden Games (Developer). New York: Atari Interactive, Inc.

Anderson, J. D. (1996). The reality of illusion: An ecological approach to cognitive film theory. Carbondale, IL: Southern Illinois University Press.

Ashcraft, B. (2008). How gaming is surpassing the Uncanny Valley. Kotaku. Retrieved April 7, 2009, from http://kotaku.com/5070250/how-gaming-is-surpassing-uncanny-valley.

Atkinson, D. (2009). Lip sync (lip synchronization animation). Retrieved July 29, 2009, from http://minyos.its.rmit.edu.au/aim/a_notes/anim_lipsync.html.

Bailenson, J. N., Swinth, K. R., Hoyt, C. L., Persky, S., Dimov, A., & Blascovich, J. (2005). The independent and interactive effects of embodied-agent appearance and behavior on self-report, cognitive, and behavioral markers of copresence in immersive virtual environments. Presence (Cambridge, Mass.), 14(4), 379–393. doi:10.1162/105474605774785235

Ballas, J. A. (1994). Delivery of information through sound. In Kramer, G. (Ed.), Auditory display: Sonification, audification, and auditory interfaces (pp. 79–94). Reading, MA: Addison-Wesley.

Bartneck, C., Kanda, T., Ishiguro, H., & Hagita, N. (2009). My robotic doppelganger—A critical look at the Uncanny Valley theory. In Proceedings of the 18th IEEE International Symposium on Robot and Human Interactive Communication, RO-MAN2009, 269-276.


Brenton, H., Gillies, M., Ballin, D., & Chatting, D. J. (2005, September 5). The Uncanny Valley: Does it exist? Paper presented at the HCI 2005, Animated Characters Interaction Workshop, Napier University, Edinburgh, UK.

Browning, T. (Producer/Director). (1931). Dracula [Motion picture]. England: Universal Pictures.

Busso, C., & Narayanan, S. S. (2006). Interplay between linguistic and affective goals in facial expression during emotional utterances. In Proceedings of 7th International Seminar on Speech Production, 549-556.

Calleja, G. (2007). Revising immersion: A conceptual model for the analysis of digital game involvement. In Proceedings of Situated Play, DiGRA 2007 Conference, 83-90.

Cao, Y., Faloustsos, P., Kohler, E., & Pighin, F. (2004). Real-time speech motion synthesis from recorded motions. In R. Boulic & D. K. Pai (Eds.), Eurographics/ACM SIGGRAPH Symposium on Computer Animation (2004), 345-353.

Chion, M. (1994). Audio-vision: Sound on screen (Gorbman, C., Trans.). New York: Columbia University Press.

Edworthy, J., Loxley, S., & Dennis, I. (1991). Improving auditory warning design: Relationship between warning sound parameters and perceived urgency. Human Factors, 33(2), 205–231.

Ekman, I., & Kajastila, R. (2009, February 11-13). Localisation cues affect emotional judgements: Results from a user study on scary sound. Paper presented at the AES 35th International Conference, London, UK.

Emily Project. (2008). Santa Monica, CA: Image Metrics, Ltd.

Faceposer [Facial animation tool as part of Source SDK]. (2008). Bellevue, WA: Valve Corporation.

Farnell, A. (2011). Behaviour, structure and causality in procedural audio. In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey, PA: IGI Global.

Ferber, D. (2003, September). The man who mistook his girlfriend for a robot. Popular Science. Retrieved April 7, 2009, from http://iiae.utdallas.edu/news/pop_science.html.

Freud, S. (1919). The Uncanny. In The standard edition of the complete psychological works of Sigmund Freud (Vol. 17, pp. 219–256). London: Hogarth Press.

Gaver, W. W. (1993). What in the world do we hear? An ecological approach to auditory perception. Ecological Psychology, 5(1), 1–29. doi:10.1207/s15326969eco0501_1

Gouskos, C. (2006). The depths of the Uncanny Valley. Gamespot. Retrieved April 7, 2009, from http://uk.gamespot.com/features/6153667/index.html.

Grant, W., Wassenhove, V., & Poeppel, D. (2004). Detection of auditory (cross-spectral) and auditory-visual (cross-modal) synchrony. Speech Communication, 44(1/4), 43–53. doi:10.1016/j.specom.2004.06.004

Green, R. D., MacDorman, K. F., Ho, C. C., & Vasudevan, S. K. (2008). Sensitivity to the proportions of faces that vary in human likeness. Computers in Human Behavior, 24(5), 2456–2474. doi:10.1016/j.chb.2008.02.019

Grey Matter [INDIE arcade game]. (2008). McMillen, E., Refenes, T., & Baranowsky, D. (Developers). San Francisco, CA: Kongregate.

Grimshaw, M. (2008a). The acoustic ecology of the first-person shooter: The player experience of sound in the first-person shooter computer game. Saarbrücken, Germany: VDM Verlag Dr. Mueller.

230
Uncanny Speech

Grimshaw, M. (2008b). Sound and immersion in the first-person shooter. International Journal of Intelligent Games & Simulation, 5(1).

Grimshaw, M., Nacke, L., & Lindley, C. A. (2008, October 22-23). Sound and immersion in the first-person shooter: Mixed measurement of the player's sonic experience. Paper presented at Audio Mostly 2008, Piteå, Sweden.

Half Life 2 [Computer game]. (2008). Valve Corporation (Developer). Redwood City, CA: EA Games.

Hanson, D. (2006). Exploring the aesthetic range for humanoid robots. In Proceedings of the ICCS/CogSci-2006 Long Symposium: Toward Social Mechanisms of Android Science, 16-20.

Hassanpour, A. (2009). Dubbing. The Museum of Broadcast Communications. Retrieved July 14, 2009, from http://www.museum.tv/archives/etv/D/htmlD/dubbing/dubbing.htm

Ho, C. C., MacDorman, K., & Pramono, Z. A. D. (2008). Human emotion and the uncanny valley: A GLM, MDS, and ISOMAP analysis of robot video ratings. In Proceedings of the Third ACM/IEEE International Conference on Human-Robot Interaction, 169-176.

Hoeger, L., & Huber, W. (2007). Ghastly multiplication: Fatal Frame II and the videogame Uncanny. In Proceedings of Situated Play, DiGRA 2007 Conference, Tokyo, Japan, 152-156.

Hug, D. (2011). New wine in new skins: Sketching the future of game sound design. In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey, PA: IGI Global.

ITU-R BT.1359-1. (1998). Relative timing of sound and vision for broadcasting. Question ITU-R, 35(11).

Jentsch, E. (1906). On the psychology of the Uncanny. Psychiat.-neurol. Wschr., 8(195), 219-221, 226-227.

Kanda, T., Hirano, T., Eaton, D., & Ishiguro, H. (2004). Interactive robots as social partners and peer tutors for children: A field trial. Human-Computer Interaction, 19(1), 61–84. doi:10.1207/s15327051hci1901&2_4

Kendall, N. (2009, September 12). Let us play: Games are the future for music. The Times: Playlist, p. 22.

Laurel, B. (1993). Computers as theatre. New York: Addison-Wesley.

Left 4 Dead [Computer game]. (2008). Valve Corporation (Developer). Redwood City, CA: EA Games.

Lillian—A natural language library interface and library 2.0 mash-up. (2006). Birmingham, UK: Daden Limited.

MacDorman, K. F. (2006). Subjective ratings of robot video clips for human likeness, familiarity, and eeriness: An exploration of the Uncanny Valley. In ICCS/CogSci-2006 Long Symposium: Toward Social Mechanisms of Android Science.

MacDorman, K. F., Green, R. D., Ho, C. C., & Koch, C. T. (2009). Too real for comfort? Uncanny responses to computer generated faces. Computers in Human Behavior, 25, 695–710. doi:10.1016/j.chb.2008.12.026

MacDorman, K. F., & Ishiguro, H. (2006). The uncanny advantage of using androids in cognitive and social science research. Interaction Studies: Social Behaviour and Communication in Biological and Artificial Systems, 7(3), 297–337. doi:10.1075/is.7.3.03mac

Matsui, D., Minato, T., MacDorman, K. F., & Ishiguro, H. (2005). Generating natural motion in an android by mapping human motion. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 1089-1096.
McGurk, H., & MacDonald, J. (1976). Hearing lips and seeing voices. Nature, 264(5568), 746–748. doi:10.1038/264746a0

McMahan, A. (2003). Immersion, engagement, and presence: A new method for analyzing 3-D video games. In Wolf, M. J. P., & Perron, B. (Eds.), The video game theory reader (pp. 67–87). New York: Routledge.

Minato, T., Shimada, M., Ishiguro, H., & Itakura, S. (2004). Development of an android robot for studying human-robot interaction. In R. Orchard, C. Yang & M. Ali (Eds.), Innovations in applied artificial intelligence, 424-434.

Mori, M. (1970/2005). The Uncanny Valley (K. F. MacDorman & T. Minato, Trans.). Energy, 7(4), 33–35.

Mullan, E. (2011). Physical modelling for sound synthesis. In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey, PA: IGI Global.

Nacke, L., & Grimshaw, M. (2011). Player-game interaction through affective sound. In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey, PA: IGI Global.

Perron, B. (2004, September 14-16). Sign of a threat: The effects of warning systems in survival horror games. Paper presented at COSIGN 2004, University of Split, Croatia.

Plantec, P. (2007). Crossing the Great Uncanny Valley. Animation World Network. Retrieved August 21, 2010, from http://www.awn.com/articles/production/crossing-great-uncanny-valley/page/1%2C1

Plantec, P. (2008). Image Metrics attempts to leap the Uncanny Valley. The Digital Eye. Retrieved April 6, 2009, from http://vfxworld.com/?atype=articles&id=3723&page=1

Pollick, F. E. (in press). In search of the Uncanny Valley. In Grammer, K., & Juett, A. (Eds.), Analog communication: Evolution, brain mechanisms, dynamics, simulation. Cambridge, MA: MIT Press.

Reeves, B., & Voelker, D. (1993). Effects of audio-video asynchrony on viewer's memory, evaluation of content and detection ability (Research report prepared for Pixel Instruments, CA). Palo Alto, CA: Stanford University, Department of Communication.

Reiter, U. (2011). Perceived quality in game audio. In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey, PA: IGI Global.

Richards, J. (2008, August 18). Lifelike animation heralds new era for computer games. The Times Online. Retrieved April 7, 2009, from http://technology.timesonline.co.uk/tol/news/tech_and_web/article4557935.ece

Ripken, J. (2009, October 19). Game synchronisation: A view from artist development. Paper presented at the Music and Creative Industries Conference 2009, Manchester, UK.

Roux-Girard, G. (2011). Listening to fear: A study of sound in horror computer games. In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey, PA: IGI Global.

Schafer, R. M. (1994). The soundscape: Our sonic environment and the tuning of the world. Rochester, VT: Destiny Books.

Schneider, E., Wang, Y., & Yang, S. (2007). Exploring the Uncanny Valley with Japanese video game characters. In Proceedings of Situated Play, DiGRA 2007 Conference, 546-549.
Seyama, J., & Nagayama, R. S. (2007). The uncanny valley: The effect of realism on the impression of artificial human faces. Presence (Cambridge, Mass.), 16(4), 337–351. doi:10.1162/pres.16.4.337

Silent Hill Homecoming [Computer game]. (2008). Double Helix & Konami (Developer/Co-Developer). Tokyo, Japan: Konami.

Spadoni, R. (2000). Uncanny bodies. Berkeley: University of California Press.

Steckenfinger, A., & Ghazanfar, A. (2009). Monkey behavior falls into the uncanny valley. Proceedings of the National Academy of Sciences of the United States of America, 106(43), 18362–18366. doi:10.1073/pnas.0910063106

The Beatles: Rock Band [Computer game]. (2009). Harmonix (Developer). Redwood City, CA: EA Games.

The Casting [Technology demonstration]. (2006). Quantic Dream (Developer). Foster City, CA: Sony Computer Entertainment, Inc.

Tinwell, A. (2009). The uncanny as usability obstacle. In A. A. Ozok & P. Zaphiris (Eds.), Online Communities and Social Computing workshop, HCI International 2009, 12, 622-631.

Tinwell, A., & Grimshaw, M. (2009). Bridging the uncanny: An impossible traverse? In Proceedings of Mindtrek 2009.

Tinwell, A., Grimshaw, M., & Williams, A. (2010). Uncanny behaviour in survival horror games. Journal of Gaming and Virtual Worlds, 2(1), 3–25. doi:10.1386/jgvw.2.1.3_1

Toprac, P., & Abdel-Meguid, A. (2011). Causing fear, suspense, and anxiety using sound design in computer games. In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey, PA: IGI Global.

Vatakis, A., & Spence, C. (2006). Audiovisual synchrony perception for speech and music using a temporal order judgment task. Neuroscience Letters, 393, 40–44. doi:10.1016/j.neulet.2005.09.032

Vinayagamoorthy, V., Steed, A., & Slater, M. (2005). Building characters: Lessons drawn from virtual environments. In Proceedings of Toward Social Mechanisms of Android Science, COGSCI 2005, 119-126.

Warren, D. H., Welch, R. B., & McCarthy, T. J. (1982). The role of visual-auditory “compellingness” in the ventriloquism effect: Implications for transitivity among the spatial senses. Perception & Psychophysics, 30(6), 557–564.

Warrior Demo. (2008). Santa Monica, CA: Image Metrics, Ltd.

Weschler, L. (2002). Why is this man smiling? Wired. Retrieved April 7, 2009, from http://www.wired.com/wired/archive/10.06/face.html

Zemeckis, R. (Producer/Director). (2004). The Polar Express [Motion picture]. California: Castle Rock Entertainment.

Zemeckis, R. (Producer/Director). (2007). Beowulf [Motion picture]. California: ImageMovers.

KEY TERMS AND DEFINITIONS

Audio-Visual: An artifact comprising both image and sound components.

Cross-Modal: Interaction between sensory and perceptual modes, in this case, of vision and hearing.

Realism: Representation of objects as they may appear in the real world.

Uncanny Valley: A theory that, as human-likeness increases, an object will be regarded as less familiar and more strange, evoking a negative effect for the viewer (Mori, 1970).

Virtual Character: A digital representation of a figure onscreen.

Viseme: A visual representation of a mouth shape for a particular speech utterance such as “k,” “ch” and “sh.” Those with hearing impediments can use visemes to lip read and understand spoken language when unable to hear sound.

ENDNOTE

1. In the field of psychoacoustics, synchrony and synchresis are closely related to the ventriloquism effect.

Chapter 12
Emotion, Content, and
Context in Sound and Music
Stuart Cunningham
Glyndŵr University, UK

Vic Grout
Glyndŵr University, UK

Richard Picking
Glyndŵr University, UK

ABSTRACT

Computer game sound is particularly dependent upon the use of both sound artefacts and music. Sound and music are media rich in information. Audio and music processing can be approached from a range of perspectives which may or may not consider the meaning and purpose of this information. Computer music and digital audio are being advanced through investigations into emotion, content analysis, and context, and this chapter attempts to highlight the value of considering the information content present in sound, the context of the user being exposed to the sound, and the emotional reactions and interactions that are possible between the user and game sound. We demonstrate that by analysing the information present within media and considering the applications and purpose of a particular type of information, developers can improve user experiences and reduce overheads while creating more suitable, efficient applications. Some illustrated examples of our research projects that employ these theories are provided. Although the examples of research and development applications are not always examples from computer game sound, they can be related back to computer games. We aim to stimulate the reader's imagination and thought in these areas, rather than attempt to drive the reader down one particular path.

INTRODUCTION

Music and sound stimulate one of the five human senses: hearing. Any form of stimulation is subject to psychological interpretation by the individual and a cause-and-effect relationship occurs. Whilst this relationship is unique to each individual up to a point, it is safe to assume that broad, often shared, experiences occur across multiple listeners. It can be argued that the emotional reaction and response of a listener to a sound or piece of
DOI: 10.4018/978-1-61692-828-5.ch012

Copyright © 2011, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.
music is the single most important event resulting from that experience.

Figure 1. Idealised Role of Emotion, Content and Context in a Computer Application

The goal of this chapter is to explore the relationship between sound stimuli and human emotion. In particular, this chapter examines the role sound plays in conveying emotional information, even from sources that may be visual in origin. Equally, the chapter seeks to demonstrate how human emotion is able to flip this paradigm and influence music and sound selection, based on emotional state and consideration of the context of the user.

The content being represented digitally provides the opportunity to gain a greater understanding of the information present in a data set. Information being stored often has a number of characteristic features and structural elements that can be identified automatically. For example, music generally contains an identifiable structure, which might consist of several movements, parts, or, more commonly, verses and choruses. However, such structure can almost be considered fractal, in that there are microscopic and macroscopic levels of organisation and also repetition, ranging from musical beats, bars, verses and choruses to the level of the song itself.

Contextual data provides additional information about factors that contribute to making the user interaction experience much more relevant and effective by acquiring knowledge of the external factors that influence decision making and the emotion of the user.

The conceptual diagram of Figure 1 shows an idealised situation in which a large database of audio media is presented to the user through a suitable application (such as a computer game). In this scenario, the user's emotion and context are analysed and compared against analysis of appropriate media content. This provides selection of the 'best fit' media that will further stimulate and engage the user in the most effective way.

The chapter explains the fundamentals of emotional stimulation using sounds and music, whilst retaining relevance to the audiologist. We demonstrate that by analysing the information present within media and considering its applications, significant advantages can be gained which improve user experiences, reduce overheads, and aid in the development of more suitable, efficient applications: whether they be computer games or other audio tools.
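The selection loop that Figure 1 describes can be pictured in code. The following is a minimal illustrative sketch only: the feature names (arousal, valence) and the nearest-match scoring are our own assumptions for demonstration, not an implementation from this chapter.

```python
from dataclasses import dataclass
from math import sqrt

@dataclass
class MediaItem:
    """An audio asset annotated by offline content analysis."""
    name: str
    arousal: float  # 0 = calm .. 1 = exciting
    valence: float  # 0 = negative .. 1 = positive

def best_fit(library, target_arousal, target_valence):
    """Return the item whose analysed content lies closest to the
    emotional state inferred from the user and their context."""
    def distance(item):
        return sqrt((item.arousal - target_arousal) ** 2 +
                    (item.valence - target_valence) ** 2)
    return min(library, key=distance)

library = [
    MediaItem("ambient_drone", arousal=0.2, valence=0.4),
    MediaItem("battle_theme", arousal=0.9, valence=0.6),
    MediaItem("victory_fanfare", arousal=0.7, valence=0.95),
]

# A tense combat context with a player sensed as highly aroused:
print(best_fit(library, target_arousal=0.85, target_valence=0.5).name)  # → battle_theme
```

In practice, the target state would come from contextual or biometric sensing and the library annotations from automated content analysis, as the chapter goes on to discuss.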
EMOTION

Emotion is a key factor to consider in computer applications given that almost all applications will have some form of Human Computer Interface (HCI). Humans are emotional beings and the interaction with the machine will have some emotional effect on them to a greater or lesser extent. The computer, therefore, has an ability to invoke an emotional response in the user. The user may bring their own emotions with them to an interactive experience which has been affected by external factors in the environment around them (Dix, Finlay, Abowd, & Beale, 2003). The quality and resultant experience that a user has with a machine is important, and this is also true when we consider the frequent interaction that we have with entertainment media and computer games.

Emotion in Multimedia

The use of sound in multimedia, and especially in computer games, is commonplace. This is unsurprising if one considers that, in order to successfully engage a human user in an immersive experience, the interaction must be achieved through one of the primary human senses. Speech and hearing are hugely important in our daily lives and allow us as humans to send and receive large amounts of information on an ad hoc basis.

Naturally, it is hearing and the use of sound that we are interested in examining in this chapter. Sound is used in complementing and augmenting other stimuli, especially visual. Consider, for example, the last time you watched a horror movie and were embarrassed by the unintended jump or flinch you experienced at a big bang or crescendo that accompanied the appearance of the bad guy in the movie! Proof, if it were needed, that the constructive use of music and sound can provoke one of the most primal of human emotional instincts: fear. Sound in multimedia environments is classified into two distinct categories (see Jørgensen, 2011 for a fuller analysis of these terms):

•	Diegetic. Sound or music that is directly related, or at least perceived to be related, to the environment in which the subject is intended to be immersed. For example, in a movie this could be the sound coming from a television that is in the room pictured on screen. Another example would be the voices of the characters on screen or the sound of a character firing a gun or driving a car. In a nutshell, the subject is able to reasonably identify the source of the sound given the surrounding virtual environment.
•	Non-diegetic. These sounds are generally presented to augment or complement the virtual environment but come from sources that the subject cannot identify in the current environment. To go back to the horror movie example again, consider the famous shower scene from Alfred Hitchcock's classic Psycho from 1960: the screeching, stabbing violin sounds as the character of Marion Crane is stabbed by Norman Bates (dressed as his mother). There is no reason for the watcher of Psycho to believe that there is a collection of violinists in the bathroom with Norman and Marion; rather, the music is there to enhance the environment that is presented.

Emotion in Computer Games

Game players exhibit larger emotional investment in games than in many other forms of digital entertainment, primarily due to the interactive nature of the medium. Jansz (2006) argues that game players often emotionally immerse themselves in games to experience emotional reactions that cannot reasonably be stimulated in the real world: a sandbox environment for emotional development and experience. This notion will probably be familiar to most readers, as many of us will have deliberately watched a scary movie to try and frighten ourselves and because we enjoy experiencing the sensations and physical responses of being frightened, provided we are within a controlled environment.
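In engine terms, the diegetic/non-diegetic distinction introduced above often reduces to a property of a sound source that drives decisions such as spatialisation. The following is a minimal sketch; the class and field names are invented for illustration and are not drawn from any particular engine.

```python
from enum import Enum

class Diegesis(Enum):
    DIEGETIC = "diegetic"          # has an identifiable in-world source
    NON_DIEGETIC = "non-diegetic"  # score, narration, interface audio

class SoundSource:
    def __init__(self, name, diegesis, position=None):
        # A diegetic sound normally carries a world position so it can
        # be spatialised; non-diegetic sound is typically mixed direct.
        self.name = name
        self.diegesis = diegesis
        self.position = position

    def spatialise(self):
        """True when this source should be placed in the 3-D mix."""
        return self.diegesis is Diegesis.DIEGETIC and self.position is not None

gunshot = SoundSource("gunshot", Diegesis.DIEGETIC, position=(4.0, 0.0, 1.5))
strings = SoundSource("psycho_strings", Diegesis.NON_DIEGETIC)

print(gunshot.spatialise(), strings.spatialise())  # True False
```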
Freeman (2004) provides a list of reasons that support the activation of emotion in computer games, citing “art and money” (p. 1) as the principal drivers, although his work focuses mainly on the latter, such as competitive advantages for games development companies, rather than direct benefit to consumers and game players. Nevertheless, as Freeman advocates, this awareness in the industry of the need to integrate emotion further into computer gaming is evidence of market demand and big business interest in this exciting field.

Emotion manifests itself in many ways and there is often an identifiable physical symptom in the user. Whilst the studies discussed later concentrate on identifying physical emotional reaction, these have not always been directly linked to the player's physical interaction with the game. However, research by Sykes and Brown (2003) describes an initial study that deals with investigating not just emotional response or reaction in users but emotional interaction with a game.

Sykes and Brown also support the theory that emotional reaction and interaction represent significant potential in being able to adapt and manipulate gaming environments in response to the emotional and affective states of the user. Their investigation dealt with determining if the amount of pressure applied to the buttons of a computer game controller pad correlated with an increased level of difficulty in the game environment. A benefit of using this approach as opposed to galvanic skin response or heart rate monitoring is that those mechanisms can be altered by the environmental changes around the user, whereas changes in pressure applied to the game controller are much more likely to have been caused by events occurring in the game. Their results indicated that players did indeed apply greater pressure to the game controller when a greater level of difficulty and concentration was required in the game. Although the study is preliminary and relatively small-scale, the authors' methods of analysis employ significance testing of the data collected.

Ravaja et al. (2005) conducted experiments that attempt to evidence the impact of computer gameplay upon human emotions by employing an array of biometric measurements. This is based upon the generally held theory that emotion is expressed by humans in three forms: “subjective experience (e.g., feeling joyous), expressive behavior (e.g., smiling), and the physiological component (e.g., sympathetic arousal)” (p. 2). Taking this further, the authors make the point that the psychological connection between a player and a computer game exceeds pure emotion and touches cognition where players make assertions and links to the game: believing they are a super-hero or ninja warrior, for example. This work also highlights the issue that, until recently, research into emotional enjoyment and influence has focused upon non-interactive, mass media communication channels, such as television, film and radio.

The wide range of measurements used by Ravaja et al. is concise and, as the authors indicate, few other studies have employed such a wide range of metrics when investigating emotional connection with computer gameplay. The authors use electrocardiogram (ECG)/inter-beat intervals (IBI), facial electromyography (EMG) and skin conductance level (SCL) as measurements during their experiments. The experiments showed that reliable results are achieved across a range of subjects in response to significant events in a game scenario (such as success, failure, poor performance and so on). This work provides very strong evidence that subjects exhibit strong, identifiable physical reactions that are typical during emotional arousal when playing computer games. It supports the argument, made in this chapter, that emotion, through physical disturbance, is a strong method for detecting emotional state and response when interacting with computer games. Broadly speaking, positive and negative game events correlated to positive and negative emotional reactions in players. However, one point of note from the study is that the intuitively expected emotional response was not always the one that was encountered in subjects.
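Sykes and Brown's finding, pad pressure rising with game difficulty, amounts to testing for correlation between two measured series, and biometric channels such as those used by Ravaja et al. can be treated the same way. A toy illustration with invented data and a hand-rolled Pearson coefficient (real studies would, of course, add significance testing):

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Invented data: difficulty level vs. mean pad pressure (arbitrary units)
difficulty = [1, 2, 3, 4, 5, 6]
pressure = [0.21, 0.25, 0.31, 0.38, 0.41, 0.52]

print(round(pearson_r(difficulty, pressure), 3))  # strong positive correlation
```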
One criticism of Ravaja et al.'s study is that, although a reasonable sample size was used (36 participants), the gender balance was almost 70% in favour of male participants. Whilst it can be argued that the gaming population is likely to be male in majority, the study could have reflected the situation more accurately. The paper does not attempt to account for this disparity or investigate whether a significant difference was present between the results of the male participants and female participants (see Nacke & Grimshaw, 2011 for indications of gender difference in response to game sound). Although beyond the scope of their paper, the work could have been much strengthened by performing some form of subjective response with subjects on their performance in the game scenarios, thus allowing a more valid conclusion by employing triangulation of quantitative and qualitative methods. This would complement the reliable results attained through their objective measurements.

There is no doubt that emotion plays a significant affective role in computer gaming and that it has the potential to be used both as a reactive and interactive device to stimulate users. The emotion elicited in gamers is a function of both the content of the game as well as the context in which the user is placed, further justifying the aims and underlying concept of this chapter: that these three traits are inextricably linked and that further understanding and utilising them must therefore lead to more intense, immersive, and interactive gaming. Conati (2002), for example, considers how probabilistic models can be employed to develop artificial intelligence systems that are able to predict emotional reactions to an array of content and contextual stimuli in education games, with the aim of keeping the player engaged with the game. But what of sound linked to emotion in games?

The Use of Emotional Sound in Games

Research by Ekman (2008) bridges the gap between traditional movies and modern computer games by explaining how sound is used to stimulate emotions in each of these media. Ekman enhances her discussions with summaries of some of the numerous theories in the portrayal of emotional involvement experienced through sound and music. Perhaps most importantly in her work, Ekman emphasises the difference between the role of sound in movies as opposed to computer games. Principally, this is that sound in movies is present to enhance the narrative and heighten the experience whereas, in computer games, sound must perform not only this function but also serve as a tool for interaction, often to the extent where the narrative element is sacrificed in favour of providing informational content. Ekman's work therefore suggests that incorporating diegetic and non-diegetic sounds into computer games significantly increases the level of complexity for the sound designer.

Kromand (2009) feels strongly that sound can be used to influence a game player's stress and awareness levels by incorporating suitable mixtures of diegetic and non-diegetic sound. He provides examples of several contemporary computer games that feature such affective sound. In particular, his work focuses on the popular BioShock, F.E.A.R. and Silent Hill 2 titles. Kromand's work is an interesting starting point and introduction to the use of sound in games, especially in inducing more unpleasant sensations. He provides extensive discussion and illustrative examples and considers the concept of trans-diegetic sounds (Jørgensen, 2011), those which transcend the traditional barrier between diegetic and non-diegetic. Kromand concludes by proposing that mixtures of diegetic and non-diegetic sound can lead to confusion and uncertainty about the environment and actions around the game player. He hypothesises that this confusion is purposefully implemented in the game environment and that the uncertainty of events taking place adds to the emotional investiture of the player in the game.
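Probabilistic prediction of a player's emotional state, of the kind Conati applies to education games, can be caricatured as Bayesian updating of a belief distribution as game events arrive. The states, the event, and the likelihood values below are invented purely for illustration and are not taken from Conati's model:

```python
def update_belief(belief, likelihoods):
    """One Bayesian update: posterior ∝ prior × P(event | state)."""
    posterior = {s: belief[s] * likelihoods[s] for s in belief}
    total = sum(posterior.values())
    return {s: p / total for s, p in posterior.items()}

# Prior belief over the player's emotional state
belief = {"engaged": 0.5, "frustrated": 0.25, "bored": 0.25}

# P(event | state) for an observed event of repeated level failure
fail_likelihood = {"engaged": 0.2, "frustrated": 0.7, "bored": 0.1}

belief = update_belief(belief, fail_likelihood)
print(max(belief, key=belief.get))  # → frustrated
```

A game could then adapt its sound or music whenever the most probable state crosses a threshold, closing the loop between sensed emotion and presented content.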
Though not as up-to-date as other works concerning computer games and human emotion, a corresponding work, which also looks at methods of eliciting emotional state in computer gamers, comes from Johnstone (1996). The age of this paper alone demonstrates the importance and significance of the emotional link between computer games and game players. His study concerns the discernment of emotional arousal by speech sounds made by users during their interaction with a computer game. Part of the rationale behind his approach is hypothesised to be that the feedback equipment of today (heart rate monitors and skin conductance devices) was not so readily or cheaply available in 1996.

An interesting concept that is partially addressed by Johnstone is that spontaneous emotional speech sounds differ acoustically from those that are planned and considered. If this theory holds true, then it means that genuine emotional responses can be distinguished from planned responses. In effect, this is somewhat analogous to the use of voice stress analysis in lie detection scenarios. Johnstone indicates that this ability is also useful in a truly interactive manner, since it not only means that user or game player responses can be analysed to determine emotional valences, but also that synthesised speech, such as the voices of characters in games, could be manipulated in similar acoustic ways to provide more realistic and affective game environments and conjunctions. For diegetic sounds in particular, this presents a world of opportunity.

The results of Johnstone's initial study are promising though there are some methodological aspects of the research that would have benefited from tighter control. For example, subjects' spontaneous speech sounds were recorded and analysed but they were also required to answer subjective questions to provide speech samples. By the very nature of such an enquiry, the subjects would have been required to consider their response, during which time the effects of spontaneity or the moment could well have been depleted. The results gained are not enough to fully support the idea of distinguishing spontaneous sounds from planned, although there is evidence to suggest that this might be a logical progression in future. Nevertheless, the data collected shows promise in being able to determine notions of urgency and felt difficulty in the game environment from events that are associated with achieving the objectives of the game. Primarily this can be measured by changes in spectral energy levels, low frequency energy distribution, and shorter speech duration.

In more recent work, Livingstone and Brown (2005) present theories and results that support the use of auditory stimuli to provide dynamic and interactive gaming environments. Whilst their paper explores the use of musical changes and emotional reactions in a general sense, part of their work is also devoted to investigating the application in gaming. Their underlying concept is that musical changes in the game can trigger emotional reactions in game players in a more dynamic manner than is currently the norm. Livingstone and Brown employ a rule-based analysis of symbolic musical content that relates to a fixed set of emotional responses. Their work demonstrates that by dynamically altering the musical characteristics of playing music, such as the tempo, mode, loudness, pitch, harmony and so forth, the user perceives different emotional intentions and contexts within the piece of music that is currently playing. Music, then, stimulating one of the five human senses, is capable of influencing emotional change within humans in a computer game environment.

The work of Parker and Heerema (2008) presents a useful overview of how sound is used in diegetic and non-diegetic forms within computer games. They argue that greater use should be made of sound in order to enhance the game environment and experience. A primary exemplar used by them is that sound should also serve as a tool for input and interaction with the game, rather than being present purely to be heard.
and interaction with the game, rather than being present purely to be heard. They reiterate that sound in games at present is reactive rather than interactive. However, in this chapter, we suggest that sound is simply a tool of the emotions and that it is player emotion that should be interactive, rather than reactive, in order to provide a new level of computer gaming experiences. We feel this can be strongly underpinned by the use of sound.

Parker and Heerema go on to describe audio gaming and provide a series of examples and discussions of scenarios where sound can be used as the primary interaction mechanism between the player and the computer game. These range from the player reacting to audio cues, to providing the game with input using speech or other sonic input, and to directly controlling sound and music in the game. Although concise and valid in representing the current state of play of sound in games, their work does not consider the affective nature of using sound in games. Emotion is triggered by sound and the two are intrinsically linked.

Recent work by Grimshaw, Lindley, and Nacke (2008) seeks to formalise the relationship between a subject's immersion in a game environment as a function of the auditory content. Grimshaw et al. employ a series of biometric techniques to provide insight into the human emotional and physiological response to the sonic actions and environment of a first-person shooter game. Their method employs a significant array of quantitative, physiological measurements that are correlated with subjective questioning. The deep complexity of human emotion and psychology is exposed in their work, as a strong relationship between the results of these two investigative methods cannot be found. This deficiency is the subject of significant discussion by the authors and, unsurprisingly, it is suggested as an area for significant future investigation.

It is important to place an emphasis on this point: although broad hypotheses and empirical evidence show that sound and music play a large part in stimulating emotional responses in human subjects, the quantification of these effects, especially objective measurement, is elusive. Subjective investigation has traditionally been the forte of psychological and sociological researchers. It is for this reason that sound designers and scientists working in the field must have an awareness of these issues, especially the sound designer working in computer game and multimedia development. In short, emotion is highly difficult to measure in an absolute way. Bridging this gap must be done carefully and backed up by considered research and investigation.

There is a wealth of literature relating to the emotional impact of games. Equally, there is an increasing amount of published work concerning audio games; the majority of literature, however, still concerns itself with traditional, visually-focused games. As the reader may have noticed in this section of this book, there are few studies that have concerned themselves with using sound as the primary interactive method whilst also monitoring and responding to the emotional reactions of the game player. It is just this sort of scenario that the studies and ideas presented in this chapter aim to inspire, support, and help stimulate.

Are Sound and Music Really Important in Games?

It is interesting to consider to what extent sound is perceived as being important by users of computer games. If we consider the move from the beeps and clicks of early computer games such as Space Invaders and Pong to modern alternatives such as the Guitar Hero series, we can see that the computer games industry has certainly placed an increased focus on the use of sound and music in games. To this extent, we conducted research, by means of a user survey, into determining user awareness of sound in computer games. The work is documented in greater detail in Cunningham, Grout, and Hebblewhite (2006), but a summary of the important findings and discussions is provided here.

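Grimshaw et al.'s method of correlating quantitative physiological measurements with subjective questioning can be illustrated with a minimal sketch. Everything below is invented for illustration: `gsr_peaks` and `self_report` are hypothetical names standing in for a measured galvanic skin response series and questionnaire arousal scores; no real study data are shown.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Invented data: skin-conductance response peaks per minute for eight
# players, and the same players' self-reported arousal on a 1-7 scale.
gsr_peaks = [3.1, 4.8, 2.2, 6.0, 5.4, 1.9, 4.1, 3.6]
self_report = [2, 5, 2, 6, 5, 1, 4, 3]

print(f"r = {pearson(gsr_peaks, self_report):.2f}")
```

In a real study the interesting case is precisely the one Grimshaw et al. report: the coefficient turning out low, exposing the gap between physiological and subjective measures of emotion.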
Table 1. Overall Game Genre Preference of Survey Participants in Rank Order

Game Genre                 Preference (%)
Role-Playing Game (RPG)    39
Shoot-em-up                24
Strategy/Puzzles           12
Adventure                   9
Sports                      9
Simulation                  3
Other                       3

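The rank ordering in Table 1 is straightforward to reproduce from raw responses. A minimal sketch follows; the counts in `responses` are invented to approximate the table and are not the actual survey data.

```python
from collections import Counter

# Hypothetical raw responses: one preferred genre per respondent.
responses = (["RPG"] * 13 + ["Shoot-em-up"] * 8 + ["Strategy/Puzzles"] * 4 +
             ["Adventure"] * 3 + ["Sports"] * 3 + ["Simulation"] * 2 +
             ["Other"] * 1)

counts = Counter(responses)
total = sum(counts.values())
# Print genres in descending order of preference, as percentages.
for genre, n in counts.most_common():
    print(f"{genre:<18} {100 * n / total:.0f}%")
```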
This survey was undertaken to establish the various factors that subjects considered important when it came to purchasing a new computer game. Our initial hypothesis was that users would rate factors such as the playability and visuals of a game much higher than the sound and music, demonstrating that the focus in the computer gameworld tends to be in the graphical and gameplay domains. The survey had a total of 34 respondents. A profile of the gamers participating in the survey, in terms of their game type preference, is shown in Table 1.

We believe a future study should investigate whether the favoured game genre affects the particular factors that users specifically look for in games. For example, role-playing games have traditionally been much more limited in terms of their graphic and aural flamboyance, with greater emphasis being placed upon story-line and depth, whilst action and adventure games are often much more visually stimulating.

Figure 2 and Figure 3 illustrate the results of questions where participants were asked to indicate the most important and, since it was assumed prior to the study that playability or gameplay would most likely be rated highly, the second most important feature that influenced game purchasing decisions.

Not surprisingly, we found that the most important factor is playability. The ratings for all the other possible factors are negligible, although it is somewhat surprising that no participants rated the sound or musical elements as important when deciding upon a game to buy. Intriguingly, the ability to play a game online with other users took favour over sound: an interesting insight into the mind of the 21st-century games player. Users who selected the "Other" category were prompted to provide a more detailed explanation. The responses received here all related to one of the following comments: "depth and creativity", "the whole package", and, from two participants, that the "story or scenario" was most important.

It is argued, on the basis that playability will always rate highest, that the results in Figure 3 are more insightful than those in Figure 2. After all, the whole notion of computer games is that they are to be played with! This time we see, as we might well have expected, that graphics and visual stimulation were the most popular factor. As expected, the sound present in a game was cited by a low percentage of those surveyed. The users who chose the "Other" category on this occasion also stated that the factor important to them related to the story of the game. Encouragingly, however, and still applicable in the context of sound in games, is the percentage of users who value the interface.

If we consider some of the most recent successful games, where the use of music and sound has been prominent, these titles almost all employ an interactive sound interface of some form. Prime examples include the Guitar Hero, Rock Band,

Dance Dance Revolution (DDR), and SingStar series of games, as well as Battle of the Bands, Ultimate Band, and Wii Music, to name just a few. For the budding entrepreneur game developer, it is probably worth noting that the majority of these titles revolve around the player being placed in a live music performance scenario or band. We briefly attempted to analyse these two assumptions through our survey, though the results are inconclusive, with an almost 50/50 split between positive and negative responses. However, it is worth bearing in mind that these responses are now somewhat dated. An overview of the responses is presented in Table 2.

Figure 2. Rating the Most Important Game Feature

Figure 3. Rating the Second Most Important Game Feature

It is reasonable to suggest that the soundtrack of a game brings an added attraction when it comes
Table 2. Survey of Participants' Interest in Game Soundtrack and Music

Does the soundtrack/music of a computer game make you more interested in playing or buying it?
    Yes           50%
    No            44%
    Don't Know     6%
to a gamer parting with their hard-earned cash. As mentioned earlier, game series like Grand Theft Auto, FIFA, and Dave Mirra feature music by well-known recording artists, in some cases including music that has been commissioned specifically for that game.

It can be seen in the results summarised in this section that, other than the added value suggested above, users do not place any particular emphasis on game sound. As was expected, the main aspects users were interested in were the playability and graphics of a game, although interaction with sound offers great potential. The development of new sound-motivated games will be a dynamic and challenging field in the years to come, though we must not forget the golden rule of a successful game: playability.

The under-use of sound in games is further supported by Parker and Heerema (2008), a source the reader is encouraged to investigate if they are still in doubt as to the true potential of sound in the gaming environment. To quote directly from their work: "The use of sound in an interactive media environment has not been advanced, as a technology, as far as graphics or artificial intelligence" (p. 1). Their work goes on to justify these assertions: they explain that poor quality sound in a game often results in the game being unsuccessful in the marketplace, whilst the success of a game containing an acceptable or higher quality of sound will be based upon other factors such as playability or graphics.

It is clear from the discussions and investigation covered in this section that human interaction and psychological and emotional links with games are more and more to the fore, as well as becoming increasingly important in the development of successful gaming experiences. It is fair to assume that users are not only affected by sound and music but that they also respond to feedback and interact with the game, essentially providing full-duplex communication between human and machine that is becoming increasingly information-rich. It is these interactions and this information that the rest of this chapter focuses on.

CONTENT

Digital audio data holds much more information than the raw binary data from which it is constituted. At its barest, sound and music are generally provided to augment and provide realism to the current scenario. However, as we demonstrated in the previous section of this chapter, computer games are truly multimedia experiences that combine a range of stimuli to interact with the user. In short, we see the area of content analysis as providing a semi-intelligent mechanism with which to tie together one or more media employed in a multimedia environment in order to provide even more effective and efficient interaction and experience.

Content of a particular medium can take many different forms, some of which will be shared across a range of media while others will be exclusive to a particular media type. The following is an attempt to briefly describe and exemplify these two categories:

• Shared content information. If we consider an entire multimedia artefact as being a hierarchical object, greater than the sum of its parts, then shared content information would be found attached to each of the media components present. For example, the publishing house, year of production, copyright information, and name of the game in which a multimedia object (a sound or otherwise) appears will always be the same. This information is generally that which is exclusively available in the form of meta-data and requires little data mining to extract.

• Exclusive content information. This is information about the content that can only be found in a given type of media. Although the same content information may appear in multiple instances of that media, it will generally be exclusive to that type of media. For example, if we consider the music present in a computer game, the exclusive content information would include the tempo, amplitude range, time signature, spectral representation, self-similarity measurement, and so on.

The relationship between sound and visual elements has been a mainstay of the media field since its inception. Consider the music video and the Hollywood movie: careful correlation occurs between the content presented to the user in these fields. Prime examples include the synchronisation between actions and transitions appearing in the visual field and the sound content. An illustration of this that the authors find particularly effective is the opening sequence of the 1977 movie Saturday Night Fever. This particular scene sees the watcher treated to shots of John Travolta's feet pounding the streets of New York in time to the Bee Gees' classic Stayin' Alive: a classic in its own right and an almost ridiculously simple example of sound content being combined in the production of the visual content to produce something that has a much greater impact than either of the two individual components.

As Zhang and Jay Kuo (2001) demonstrated, it is quite possible to extract and classify a range of different sound content types from multimedia data, especially the kind of mixes found in traditional entertainment like television shows and movies. Though their work is focused on the traditional media of multimedia communication, the computer game environment is simply a natural extension of this, the major difference being the integration of an element of interactivity.

It is these principles that we hope content analysis allows us to build upon and utilise in the field of electronic media processing and development. In particular, we hope that game sound content can be analysed to provide an enhanced gaming experience. As a good starting point for consideration, we began to explore the relationship between visual information and music in electronic media, to provide an augmented experience when viewing the visual data. In another of our works (Davies, Cunningham, & Grout, 2007), we attempted to generate musical sequences based upon analysis of digital images: in that particular case, of photographs and traditional works of art. The underlying thoughts and questions that motivated that research revolved around suggestions such as: what would the Mona Lisa sound like? We felt this would also provide additional information for people who were, for example, visually impaired, and it could be used to provide added description and emotional information relating to a particular still image. It followed logically that the only way in which this could be achieved would be to analyse the content of the image, as it is this that contains the information and components required to relay the same information in an alternative format.

A tool that we have found very effective in analysing musical content is the Audio Similarity Matrix (ASM), based upon ideas initially proposed and demonstrated by Foote (1999). This allows a visual indication of the self-similarity, and therefore structure, of a musical piece. We suggest further reading into Foote's work as a

good starting point to stimulate the imagination as to how content analysis can provide highly useful information for a variety of scenarios. A graphical example of an ASM is presented in Figure 4, where dark colours represent high similarity and bright colours show low similarity.

Figure 4. ASM of ABBA's "Head Over Heels" (28 second sample)

Whilst we do not limit the application of content analysis to computer games, we suggest a few examples of appropriate situations where it may be used. Simple examples relate to the link between visuals and sound. In a game where the scene is bright and full of strong, primary colours, it would be pertinent to include sound that reflects this notion: bright and strong in timbre. On the other hand, a dark, oppressive scene would require slower, darker music with a thinner and sharper timbre, inducing a different set of emotions. In today's dynamic computer games, where the user has an apparently boundless freedom to explore a virtual environment, the dynamic updating of sound content to match the visuals requires some form of content analysis. Even a simple parameter that defined the "colour" of the scene or the presence of tagged objects nearby would suffice on a basic level. Another suggestion would be to manipulate the gameplay by the choice of music and sound prescribed by the user. For example, the same game scenarios and tasks may be undertaken at different speeds, levels of difficulty, and in different environments, based upon the choice of music the user makes. Consider the scenario where a user decides to play dance music whilst interacting with a game, thereby instigating a bright, quantised environment with predictable, rhythmic, structured gameplay content. If, on the other hand, they choose highly random, noisy, alternative noise-core, they would be presented with chaotic, overwhelming game scenarios: in both cases, a reflection of the structure and content of the music that can be achieved only through detailed signal and structural content analysis of the audio data.

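The audio similarity matrix described in this section can be sketched compactly. Following Foote's (1999) general recipe, the signal is split into frames, a feature vector is computed for each frame, and cell (i, j) of the matrix holds the cosine similarity between frames i and j. The coarse DFT-magnitude feature below is a stand-in for the features a real implementation would use (MFCCs, chroma, and so on), and the toy signal is invented purely to show the block structure an ASM exposes.

```python
import cmath
import math

def frame_features(signal, frame_len=64):
    """Split the signal into frames; return a coarse magnitude spectrum per frame."""
    feats = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        # Plain DFT magnitudes for the first few bins: a crude spectral feature.
        spec = []
        for k in range(8):
            s = sum(x * cmath.exp(-2j * math.pi * k * n / frame_len)
                    for n, x in enumerate(frame))
            spec.append(abs(s))
        feats.append(spec)
    return feats

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def similarity_matrix(signal, frame_len=64):
    """Self-similarity matrix: cell (i, j) compares frames i and j."""
    feats = frame_features(signal, frame_len)
    return [[cosine(fi, fj) for fj in feats] for fi in feats]

# Toy signal: two repeats of an 'A' section around a contrasting 'B' section.
a = [math.sin(2 * math.pi * 5 * t / 64) for t in range(128)]
b = [math.sin(2 * math.pi * 3 * t / 64) for t in range(128)]
asm = similarity_matrix(a + b + a)

# Frames from the two 'A' sections should look alike; 'A' vs 'B' should not.
print(round(asm[0][4], 2), round(asm[0][2], 2))
```

Plotted as an image, the high-similarity cells linking the two 'A' sections form the off-diagonal blocks that make repetition, and hence musical structure, visible in Figure 4.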
CONTEXT

Context awareness also provides opportunities for a heightened user experience with digital media systems, particularly those that hold large data sets, the content of which may only be relevant to a user in certain usage scenarios. We believe the incorporation of contextual information into digital devices provides a more tailored experience for users. Contextual information can be considered an added extra in digital media systems, allowing more refined information about the user to be brought into software systems. Recommendation systems are a prime example of where contextual data can be included.

Schmidt and Winterhalter's (2004) work in e-learning is a good example of how context awareness can be incorporated into digital, computer-based communication media. In their field, the context of the user is particularly important as it allows greater control and focusing of learning and teaching materials in order to engage at a deeper level with the user. Their work emphasises that the key stages of context awareness are first acquiring contextual information and then building a suitable user-context model so as to estimate the current context of the user. Schmidt and Winterhalter also reinforce the notion that good contextual modelling comes from acquiring information from a range of sources. Most importantly, in discussing the importance of user context, Schmidt and Winterhalter hit upon the key question that context awareness is able to begin to address: "How do we know what the user currently does, or what he intends to do?" (p. 42).

Schmidt and Winterhalter choose to employ more passive mechanisms for contextual data acquisition, such as those which passively track user progress through tasks and record commonly accessed information. This is perfectly suitable for e-learning applications but, in the field of computer games and interactive entertainment, we feel that something rather more proactive is required.

A crucial work that backs up these notions of more interactive and reliable context awareness, especially when it comes to the surrounding environment, can be found in Clarkson, Mase, and Pentland (2000). Although this work may now be slightly dated, the principles and techniques employed are effective and provide good examples of the type of contextual information that can be acquired using simple sensor input. Their work investigates how context, such as whether the user is on a train or at work and whether or not they are in conversation, can be estimated from sensor input, primarily a camera and a microphone. Such work provides a strong basis from which to lead into more specific analysis of context that is relevant to the current activity or software application. This is further elaborated upon in the context of mobile device usage by Tamminen, Oulasvirta, Toiskallio, and Kankainen (2004), who consider determining contextual information in mobile computing scenarios.

Computers, gaming consoles, and mobile devices have all become much more powerful in recent years and interface with a range of local and remote information sources. These information sources range from traditional tactile input devices to accelerometers, touch screens, cameras, microphones, and so on. The Nintendo Wii and the Apple iPhone and iPod are prime examples of such low-cost, sensor-rich, powerful computational devices. The technology available in these devices, as well as in those devices that can be further added into the chain, means a wide range of contextual information can potentially be extracted from a game player, be they mobile or static.

We consider that the foremost sources of contextual information are the user themselves and the surrounding environment in which the user is currently immersed. This is further ratified by Reynolds, Barry, Burke, and Coyle (2007), who also consider the importance and usage of contextual input parameters from these two domains in their own research.

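Contextual estimation from simple sensor input, of the kind Clarkson, Mase, and Pentland describe, can be caricatured in a few lines. The sketch below is illustrative only: the thresholds, the category names, and the sample values are all invented, and real sensor APIs would supply the inputs.

```python
import math

def rms(samples):
    """Root-mean-square level of a block of microphone samples (-1.0 to 1.0)."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def estimate_context(mic_samples, light_level, hour):
    """Crude environment classifier; thresholds are invented placeholders."""
    noisy = rms(mic_samples) > 0.1
    dark = light_level < 0.2          # normalised 0..1 light-sensor reading
    late = hour >= 22 or hour < 6
    if noisy and not dark:
        return "busy public space"
    if not noisy and dark and late:
        return "quiet room at night"
    return "neutral"

# Hypothetical quiet, dark, late-night reading.
quiet = [0.01 * math.sin(t / 3.0) for t in range(256)]
print(estimate_context(quiet, light_level=0.05, hour=23))
```

In a game, such a label could feed directly into the kind of tailoring discussed below: lowering the mix and slowing the soundtrack for a player in a quiet room late at night, for instance.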
Information from the user is arguably the most useful data that can be acquired in determining the context of the user. This allows the researcher to begin to investigate factors such as their level of activity, stress levels, and emotional state. We propose mechanisms such as skin conductivity, motion, and heart rate data that might be acquired directly from the user and would prove particularly useful in monitoring their contextual state.

Factors in the environment around the user are likely to have an effect on their performance in a game, their general attitudes, and their emotional state. A number of metrics can provide suitable input to a software system to estimate environmental context. Environmental information includes the amount of ambient noise, light levels, time of day and year, temperature, and so forth. We feel that the devices and information in this scenario are relevant to many contextual extraction applications, not only those of digital entertainment and games.

Hopefully, the reader can begin to gain an insight into the usefulness of contextual information from the examples and discussions in this section. The next section of this chapter seeks to exemplify how context (as well as emotion and content) can be employed in digital multimedia applications, especially those that relate to sound. We feel that, in computer games, the virtual gameplay environment can be tailored to reflect the real environment of the player. In all, this will provide a deeper, more immersive experience and will help the player to develop greater emotional and personal investment in the game. It will also be interesting to see if such a game can contribute to altering the emotional state of the user and impact upon their own personal context. For example, can games be designed that would relax a user, reduce their stress levels and heart rate, and even make them alter their surrounding environment to reflect their new, calmer state? Only through more contextual awareness and pervasive interactions with games will we know the answer to this question and others.

DETERMINING USER PERCEPTIONS OF MUSIC

In this section of the chapter, we aim to gain more of an insight into how emotion, content, and context are attached to music by human listeners. By investigating the various perceptions and semantic terms users relate to different musical genres, it is possible to gain a deeper understanding of the ways in which humans relate their emotions, musical content, and the context of different types of music.

Wide ranges of semantics are frequently employed to portray musical characteristics, ranging from technically-related terms to experiential narratives (Károlyi, 1999). It is proposed that the characteristics of a piece of music are difficult to quantify in a single term or statement. Whilst high-level abstractions may be possible that categorise the music or provide an overview of the timbre, this is a highly subjective and individual (and potentially emotionally influenced) expression of a listener's experience of the music. Such an investigation also allows groupings to be applied to terms, understanding to be formed, and a mapping of the relationships between these groups to be created.

Repertory Grid Technique

In order to extract common descriptive features and semantics that are most meaningful and globally understood, it is better to employ a technique where the listener subject may employ their own descriptions of the elements under investigation. George Kelly's (1955) work on personal construct theory (PCT) and personal construct psychology (PCP) provides a suitable mechanism, known as repertory grid analysis, by which such descriptions can be elicited from subjects, correlated, and employed in the measurement of subject experiences. Kelly's work in this area is grounded in principles of constructivism, where subjects identify and deal with the world

around them based upon their own experiences and interpretation of events and objects.

Repertory grid analysis consists of defining a particular subject or domain to be investigated within a particular context. Descriptions of instances or examples of the domain are known as elements, and bi-polar descriptions of the elements, known as constructs, are rated on a scale (usually 5 or 7 point). For example, the domain being investigated might be movies: the elements could be a number of popular movies, and the bi-polar constructs used to describe and rate the movie elements could be violent or non-violent, an adult's film or a children's film, and so on. Constructs are defined by the subject with the help of the interviewer, who enables the subject to produce more constructs by defining the relationships and differences between the nominated elements. This can be enhanced through interview techniques involving triads, where three random elements are chosen and the subject asked to choose the least similar of the three and define the construct that separates it from the other two elements (Bannister & Mair, 1968). Subjects then provide a rating on the point scale for each element against each of the constructs they have defined in order to complete their grid. Alternatively, and particularly of use when subjects struggle to separate two elements from a third, the interviewer can present a subject with two elements and ask them to explain the factor that differentiates the two. This will often provide one pole of a construct, and the subject is then asked what the opposite pole of that particular construct would be.

Once a desired number of subjects have completed a repertory grid each, the grids are then concatenated and can be immediately visualised as one large grid but, more crucially, the opportunity is available to determine the importance of elements and constructs within the larger grid.

The bi-polar nature of defining constructs allows the context and relationship of a construct to be articulated and better understood by the researcher. This further removes ambiguity when a subject provides a rating, since the interrelation between the opposing ends of the scale has been specified by the subjects themselves (Kelly, 1955). Further detail of PCP and the repertory grid technique goes well beyond this work and can be found in Kelly's seminal text.

Using a Repertory Grid to Understand Perceptions of Music

A set of elements was defined to include a representative spectrum of musical genres upon which the subjects would be asked to define their own bi-polar personal constructs, in regard to their experiences and perceptions of the characteristics of those genres when listening to music. Whilst there are many sub-genres and pseudo-related musical styles, this provides an appropriate, broad spread for the purposes of this particular investigation without making the interview process for the participant overly laborious in terms of time and effort. The elements defined were those shown in Table 3.

Table 3. Musical Genre Elements Used in Repertory Grid Experiment

• Pop      • Rock       • Dance
• Jazz     • Classical  • Soul
• Blues    • Rap        • Country

Subjects for the investigation were drawn from a random sample of the population. Subjects were interviewed on an individual basis and told that the purpose of the exercise was to get them to express their perceptions about the characterising features of different types of music. To carry out
the rating of elements against their defined constructs, subjects were asked to perform a card-sorting exercise for each pair of constructs. Triads were used to elicit the choice of constructs by randomly selecting three elements; once subjects began to struggle with the use of triads, they were asked to differentiate between two randomly selected elements. A total of 10 subjects participated in the elicitation process of the repertory grid interview. The age of subjects interviewed ranged from 16 to 59, with the average age being 34, and there was a 50/50 male/female gender split. The results of the ten repertory grid interviews are presented in Figure 5 and Figure 6.

Though the number of subjects involved in the repertory grid interviews appears at first glance to be a low population sample, the granularity from these interviews comes from the sum number of constructs elicited across all participants. Furthermore, the data retrieved using constructs provides both qualitative and quantitative information regarding the domain of enquiry.

In addition to the visual analysis of a repertory grid, a PrinCom map, which makes use of Principal Component Analysis (PCA), can be derived that relates elements and constructs in a graphical fashion, where the visual distance between elements and constructs is significant. The PrinCom mapping integrates both elements and constructs on a visual grid and shows the relationship between the two. A PrinGrid for the repertory grid derived in this investigation is shown in Figure 7.

It is the constructs elicited that are of particular interest within the scope of this work. The constructs used by subjects provide insight into how they perceive music. As can be seen from the grids in Figure 5 and Figure 6, the range of constructs elicited provides an insight into not only how subjects typically perceive the sound content of each musical genre but also terms relating to the context in which they place each

looking at Figure 7, we can produce the notion that blues music is "emotionally evocative", has "specific geography & history", is placed in the context of being "African American", and is "mellow". Naturally, there is some subjectivity present here and these statements are open to interpretation, but it is expected that, to most readers, these constructs represent the group norm.

A perceived limitation of the repertory grid technique encountered during this particular study is that of subjects' familiarity with the elements under investigation during the repertory grid interview. During interviews, there were clearly some elements that subjects were not as familiar with as others. It was observed that subjects would often group together the elements they were less familiar with when rating elements against their chosen constructs. Whilst it is appreciated that this phenomenon is likely to be particularly present in this study, due to music awareness firmly depending upon personal preference or taste, it is doubtless likely to occur in other scenarios.

Using a repertory grid sought to elicit human-friendly descriptions of musical characteristics. Although not strictly timbral definitions, these constructs succeed in describing the characteristics of musical genres. To put this into the context of artistic definitions with the notion of a visual metaphor: whilst timbre is a human description of the colour of a sound or piece of music, these constructs can be thought of as describing the patterns, the mix of shapes and colours, that provide deeper information about the content and the bigger picture.

We find repertory grid investigation to be a highly useful tool in determining group norms and perceptions of important factors in any field that is being explored. In the context of this chapter, it can hopefully be seen that using such techniques would allow information about how a group of users would perceive a game and game sound in terms of the content that constitutes the game, along with their emotional perceptions of the game and
tional impact of each genre. For example, by also the context in which they view it.
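The PrinCom map described above is, at heart, a biplot over the first two principal components of the centred ratings grid. A minimal sketch of that computation follows; the ratings matrix here is invented for illustration (the actual grids are those of Figures 5 and 6), and the construct labels in the comments are placeholders:

```python
import numpy as np

# Invented repertory-grid ratings: rows = constructs, columns = elements
# (musical genres), values on a 1-5 scale. Real data is in Figures 5 and 6.
ratings = np.array([
    [1, 4, 5, 2, 3],   # e.g. "mellow" ... "aggressive"
    [5, 2, 1, 4, 3],   # e.g. "acoustic" ... "electronic"
    [2, 5, 4, 1, 3],   # e.g. "emotionally evocative" ... "detached"
], dtype=float)

# Centre each construct's ratings, then factor with SVD (the core of PCA).
centred = ratings - ratings.mean(axis=1, keepdims=True)
U, s, Vt = np.linalg.svd(centred, full_matrices=False)

# Coordinates for a PrinCom-style biplot: elements (genres) come from the
# right singular vectors, constructs from the scaled left singular vectors.
element_coords = Vt[:2].T          # one 2-D point per genre
construct_coords = (U * s)[:, :2]  # one 2-D point per construct

explained = s**2 / np.sum(s**2)    # proportion of variance per component
print("variance explained by PC1, PC2:", explained[:2])
```

Plotting `element_coords` and `construct_coords` on the same axes gives the PrinGrid-style view in which visual proximity indicates association between genres and constructs.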

Emotion, Content, and Context in Sound and Music

Figure 5. Repertory Grid Ratings of Musical Genres (Part a).

EXAMPLE APPLICATIONS IN CURRENT RESEARCH

Presented in this section are summaries from some research work that has been influenced by, or involves, the use of emotion, content and context in various guises. A number of the works presented here have been studies involving a small set of music. For convenience, this small database of music is shown in Table 4 so that the reader may


Figure 6. Repertory Grid Ratings of Musical Genres (Part b).


Figure 7. Musical Genre Principal Components Analysis.

cross-reference the ID number to the song, where appropriate. We feel that this small selection of songs represents a reasonable cross-section of contemporary popular musical genres.

Responsive Automated Music Playlists

Some of our most recent and cognate work combining the use of emotion, content and context in musical applications has been in the area of intelligent playlist generation tools and this work is explored in greater detail in a separate work (Cunningham, Caulder, & Grout, 2008). However, to see the effectiveness of combining all three of these areas, the reader is provided here with a summary of that work to date.

Our main motivation in this area of research and development was to address some of the shortcomings of traditional automatic recommendation and playlist generation tools. Historically, these tools evolved in a similar way to that of Automated Collaborative Filters (ACFs). That is to say, simple measurements of user preference and the preference of a typical population were used to build a ranked table of music in a library. These tools analysed information such as the most frequently played tracks, a user rating of each track, favourite artists and musical genres, and other meta-data attached to a song (Cunningham, Bergen, & Grout, 2006). However, this is not to totally trivialise the area of automatic playlist generation, since a number of systems exist that employ much more advanced learning and


Table 4. Mini Music Database Used in Testing

ID Artist Song
0 Daft Punk One More Time (Radio Edit)
1 Fun Lovin’ Criminals Love Unlimited
2 Hot Chip Over and Over
3 Metallica Harvester of Sorrow
4 Pink Floyd Comfortably Numb
5 Sugababes Push The Button
6 The Prodigy Breathe
7 ZZ Top Gimme All Your Lovin’

analysis techniques and technologies (Aucouturier & Pachet, 2002; Platt, Burges, Swenson, Weare, & Zheng, 2002).

In recent years, as computational power and resources have increased, the tools that underpin musical and sound content analysis have migrated deeper into the field of playlist generation, allowing greater scope and accuracy for classification of musical features (Logan, 2002; Gasser, Pampalk, & Tomitsch, 2007). However, although these advances have been significant, such methods have always focused on musical and sonic information extraction and few systems have considered the wider scope of the user and his or her environment. It is reasonable to expect that factors relating to the emotion, state of mind and current activities of a listener will greatly influence their current and, most importantly, next selection of music.

Emotion, Content and Context in Automatic Playlist Generation

In this review of our recent work in the area of intelligent automatic playlist generation, we provide details of the development principles, investigation and analysis into the viability of playlist generation that considers the wider circumstance of the listener. To achieve more accurate and useful playlist generation, we propose to not only build upon the established principles of using information about the musical content but also to examine the context the listener is in and how these external factors might affect their choice of suitable music. Additionally, the emotional state of the user is of interest since this also is likely to influence their choice of music. To visually summarise the complete system we are describing here, a figure showing an idealised scenario is provided in Figure 8. To realise a system that will potentially have to deal with and correlate a wide range of input parameters, we employ approaches that utilise fuzzy logic and self-learning systems.

Principally, determining the user's emotional state is of key importance to a truly successful and useful playlist generation system. This is informed by and correlated with the state of the surrounding environmental factors for the user, as well as their current levels of movement, physical activity, heart rate, stress levels, and so on.

Within our work, we felt that it was initially most important to investigate the current locomotive state of the user. For example, a user who is moving a lot and accelerating rapidly may be participating in an energetic activity such as running, dancing, cycling, or exercising in some way. It is feasible to suggest that most people listening to music in such scenarios would be likely to want to listen to music that reflects the nature of their physical motion, such as music with a dominant, driving rhythm and a high tempo: greater than 120 or 130 beats per minute, for instance.
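As a toy illustration of that last point, a playlist pre-filter might map an inferred activity to a preferred tempo band and keep only matching songs. The state-to-BPM ranges and the song tempi below are invented for illustration, not values taken from the study:

```python
# Invented mapping from locomotive state to a preferred tempo band (BPM),
# and approximate song tempi; neither is taken from the study.
TEMPO_RANGES = {
    "stationary": (0, 100),
    "walking":    (90, 120),
    "jogging":    (110, 140),
    "running":    (120, 200),
}

def tempo_candidates(library, state):
    """Keep only songs whose tempo suits the current locomotive state."""
    lo, hi = TEMPO_RANGES[state]
    return [title for title, bpm in library if lo <= bpm <= hi]

library = [("One More Time", 123), ("Comfortably Numb", 63),
           ("Breathe", 130), ("Harvester of Sorrow", 90)]
print(tempo_candidates(library, "running"))  # → ['One More Time', 'Breathe']
```

A real system would combine such a content feature with the contextual and emotional inputs discussed below, rather than use tempo alone.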


Figure 8. Emotion, Content and Context Aware Playlist System

Given currently available technology, it is also relatively easy to find equipment that allows the measurement of a user's movement. This was realised in our case by employing the wireless hand controller from the Nintendo Wii: the Wiimote. The Wiimote, when compared to other alternatives, is a cheap device that allows measurement of movement in three dimensions. The Wiimote is almost universally accessible since it employs the Bluetooth communication protocol to send data to, and receive data from, a paired host. As Maurizio and Samuele (2007) demonstrate, valuable motion information can be retrieved through the accelerometers contained in the Wii controller.

Implementation and Initial Results

Our initial work in this field sought to demonstrate the ability to attain, analyse and correlate content-related data about music and contextual information acquired from the user to arrive at an estimate of the user's emotional state (E-state). To achieve this, we developed a small-scale system that would work from a number of simulated factors (controlled by the researcher) and also live data extracted from sensors, principally the Wii controller. This system was designed to work with a small music database consisting of eight songs, shown in Table 4, and rank these songs in order of most suitable, based upon the estimated E-state.

To begin working with the motion data from the Wiimote controller, we attempted to work with four simple locomotive states: standing, walking, jogging, and running. These simple locomotive states were believed to be detectable from not only the Wiimote but a range of motion measurement devices, such as the accelerometers built into the Apple iPhone/iPod, as well as higher-level systems such as a Qualisys motion capture system (which we also had access to and allows us to verify the


Table 5. Defined Set of Emotional States (E-states)

E-State Depressed Unhappy Neutral Happy Zoned


Numeric Range 0-3 3-4 4-6 5-8 7-9
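One way to fuzzify the overlapping ranges of Table 5 is with simple triangular membership functions, so that a crisp value such as 6.8 belongs partly to more than one E-state. The breakpoints below are illustrative assumptions; the chapter does not specify the exact membership functions used:

```python
def tri(x, a, b, c):
    """Triangular membership: 0 at a, peaks at b, back to 0 at c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Fuzzified versions of the crisp, overlapping ranges in Table 5. The
# breakpoints are illustrative assumptions, not the study's definitions.
E_STATES = {
    "depressed": (-1.0, 0.0, 3.5),
    "unhappy":   (2.5, 3.5, 4.5),
    "neutral":   (3.5, 5.0, 6.5),
    "happy":     (5.0, 6.5, 8.0),
    "zoned":     (6.5, 9.0, 10.0),
}

def memberships(e):
    """Degree of membership of a crisp 0-9 E-state value in each fuzzy state."""
    return {name: tri(e, *abc) for name, abc in E_STATES.items()}

print(memberships(6.8))  # mostly "happy", partly "zoned"
```

The overlap between adjacent functions is what lets the boundaries of each state "merge into one another", as described in the text.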

results obtained from the Wiimote). Similarly, for each of the other input parameters we work with, such as weather conditions, amount of light and so on, a range of states was also defined. As previously mentioned, due to our current limitations of time and resources, we focused on only the locomotive state of the user. It is hoped that in future, further ratification will be achieved by using other user measurements such as heart rate and galvanic skin response.

To avoid additional complication, and allow greater control over the testing procedure, this set of parameter ranges and values was loosely defined according to the empirical and historic knowledge of the individual. However, this is too fixed and logical to fit the way things tend to work in the real world: therefore, to make these values more representative of real scenarios, they are fuzzified when defined in the fuzzy logic system. In simple terms, this means the boundaries and degree of accuracy of each point on a scale are related within the range of all available values.

The implemented fuzzy logic system provides a single output value that represents the predicted E-state of the listener. For simplification, we began by defining five emotional states and assigned a numeric range to each state to allow the representation of varying degrees of this state and the overlap where states merge into one another. This is appropriate, since it is very difficult to place an absolute, quantitative metric onto the complex emotions felt by humans. A table representing these allocations is shown in Table 5.

To verify the effectiveness of using the Wii controller as a device to measure movement, and particularly locomotive state, we performed a number of experiments benchmarked against a Qualisys motion detection system. By determining the rate of acceleration from the accelerometers in the Wii controller and asking a subject to provide three locomotive states (walking, jogging, and running), we can see that the Wii controller allows rapid identification of each of these states, as the graph in Figure 9 shows.

We defined a number of different scenarios that combined a range of contextual parameters which a user might typically find himself or herself in. Each of these scenarios is shown in Table 6.

These parameters included those that the user has control over, in this case the locomotive state, and external, environmental factors beyond the control of the user. We carried out a quantitative investigation, using 10 subjects, where each subject was asked to map one of the emotional states from Table 5 against one of the scenarios from Table 6. From this investigation, the average E-state response rating for each scenario is shown in Table 7. Although the use of an average response is not ideal, it provides a sufficient insight into the common perception of each scenario and, when we performed analysis of each response, there was strong correlation across all of the subjects.

Each of the songs in the mini-database was allocated a range of values from the E-state table by the research team. These values reflected the researchers' perceptions of the content and emotional indicators present in the music. Naturally, in future research, we will explore the perceived emotional state attached to each song by employing a more detailed sample of a suitable population. However, for now, these allocations were decided to be a controlled factor in this particular investigation. These allocations were then mapped


Figure 9. Wii Acceleration Curves for 3 States of Locomotion
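The separation visible in Figure 9 suggests that even a crude classifier can work: take the magnitude of the three-axis acceleration vector, average its deviation from 1 g over a short window, and threshold it. A sketch with invented thresholds (real values would be calibrated against the Qualisys benchmark):

```python
import math

# Illustrative thresholds (in g) on the mean deviation of |acceleration|
# from 1 g; these are NOT the study's calibrated values.
THRESHOLDS = [(0.05, "standing"), (0.4, "walking"), (1.0, "jogging")]

def classify_locomotion(samples):
    """Classify a window of (x, y, z) accelerometer samples given in g."""
    deviation = sum(abs(math.sqrt(x*x + y*y + z*z) - 1.0)
                    for x, y, z in samples) / len(samples)
    for limit, state in THRESHOLDS:
        if deviation < limit:
            return state
    return "running"

still    = [(0.0, 0.0, 1.0)] * 50                    # resting: ~1 g
stroll   = [(0.1, 0.0, 1.2), (0.0, 0.1, 0.85)] * 25  # gentle swings
vigorous = [(1.5, 1.2, 2.0), (0.2, 0.1, 0.3)] * 25   # large swings
print(classify_locomotion(still), classify_locomotion(stroll),
      classify_locomotion(vigorous))  # → standing walking running
```

The same windowed-magnitude feature would apply unchanged to iPhone/iPod accelerometer data, which is one reason the four locomotive states were expected to be detectable across devices.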

against the emotional states extracted from the user-scenario study and each song's E-state was ranked against each scenario's E-state to provide a grade for each song, using a simple Euclidean distance measurement of the form

G(p, q) = √((p − q)²)     (1)

Table 8 shows the resultant ranking of songs (by their ID) for each of these scenarios.

At this stage, it is recognised that the system is currently more limited than the idealised scenario presented earlier in Figure 8. A number of factors have been simulated and a number of assumptions have been made. However, using a Fuzzy Rule Based System (FRBS) we have been able to implement a limited version of the system outlined there that dynamically outputs an E-state based on live sensor data from a Wii controller and simulated environmental parameters. An outline of the Takagi-Sugeno-Kang (TSK) type

Table 6. Range of Scenarios in Subject E-state Evaluation

ID Scenario Description
1 Walking, temperature is hot, lighting is dark/grey and weather is light rain.
2 Stationary, temperature is cold, lighting is dark and weather is raining.
3 Stationary, temperature is warm, lighting is brightening and weather is dry.
4 Running, temperature is hot, lighting is daylight/getting brighter and dry.
5 Walking, temperature is getting hot, lighting is dark and weather is drizzling.
6 Stationary, temperature is hot, lighting is grey and weather is dry.
7 Walking/Jogging, temperature is mild, daylight and it’s dry.


Table 7. Average User E-state for each Scenario

Scenario ID    Emotion (0 = Unhappy; 9 = Very Happy)
1 4.3
2 0.0
3 6.8
4 7.7
5 3.0
6 3.8
7 6.5

FRBS we employed, along with the fuzzified input parameters, is shown in Figure 10.

Whilst a number of parameters such as light and temperature measurement have been simulated at this stage, the implementation of live sensor data from such devices is a trivial one and is only limited by the current lack of the hardware resources to incorporate these devices into the live system.

In its current state, the system demonstrates the ability to read contextual data from the user and correlate this with information from the environment to make an informed judgement of the user's emotional state. With future development, and also feedback from the user whilst using the system, the facility will be available to teach the playlist generator about the user's preferences and train the accuracy with which the system is able to estimate the emotional state of the user.

FUTURE RESEARCH IDEAS AND CONCLUSIONS

We have seen that awareness of the presented issues is beneficial in not only providing richer interactive experiences and more appropriate information, but that knowledge of the purpose of information can be used to optimise computational challenges. Furthermore, information, such as that presented visually, can be analysed in terms of content and context with the goal of being able to stimulate emotions in a user who might otherwise be disadvantaged from such an experience, due to visual impairment.

It is hoped that we have demonstrated the applicability of determining and analysing features related to emotion, content and context in relation to improving systems where user-interaction is of particular significance.

Table 8. Ranked Playlist Ordering for Set of Given Scenarios

Scenario E-state Playlist order


1 4.3 1; 4; 0; 7; 3; 6; 2; 5
2 0 6; 3; 4; 1; 0; 7; 2; 5
3 6.8 0; 7; 2; 5; 1; 4; 3; 6
4 7.7 2; 5; 0; 7; 1; 4; 3; 6
5 3 4; 3; 6; 1; 0; 7; 2; 5
6 3.8 4; 1; 3; 6; 0; 7; 2; 5
7 6.5 0; 7; 1; 2; 5; 4; 3; 6
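The orderings in Table 8 follow directly from equation (1): sort song IDs by the distance between each song's allocated E-state and the scenario's E-state. The per-song values below are not the researchers' published allocations (the chapter does not list them); they are invented values chosen so that this sketch reproduces the Table 8 orderings:

```python
# Hypothetical per-song E-state values on the 0-9 scale of Table 5. The
# chapter does not publish the researchers' actual allocations; these were
# chosen so that the resulting orderings match Table 8.
song_e_states = {0: 7.0, 1: 5.1, 2: 8.2, 3: 1.5, 4: 3.0,
                 5: 8.2, 6: 1.0, 7: 7.0}

def rank_playlist(scenario_e):
    """Rank song IDs by G(p, q) = sqrt((p - q)^2), i.e. |p - q| in 1-D."""
    return sorted(song_e_states,
                  key=lambda sid: abs(song_e_states[sid] - scenario_e))

print(rank_playlist(4.3))  # → [1, 4, 0, 7, 3, 6, 2, 5], Table 8 scenario 1
```

Because the measure is one-dimensional here, ties (for instance between songs allocated the same E-state) are resolved by the stable sort, which keeps the lower song ID first, matching the pairings 0;7 and 2;5 seen throughout Table 8.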


Figure 10. Simplified Overview of TSK-type FRBS Used in Playlist Generation
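A zero-order TSK-type FRBS of the kind outlined in Figure 10 combines fuzzy antecedents with crisp rule consequents and defuzzifies by a firing-strength-weighted average. The memberships, rules, and consequent values below are invented placeholders, since Figure 10 gives only an outline of the real system:

```python
# Zero-order TSK-style rules: fuzzy antecedents, crisp consequents, output
# is the firing-strength-weighted average. Memberships, rules, and
# consequent E-states are invented placeholders, not the study's values.

def up(x, a, b):     # ramp from 0 to 1 between a and b
    return min(1.0, max(0.0, (x - a) / (b - a)))

def down(x, a, b):   # ramp from 1 to 0 between a and b
    return 1.0 - up(x, a, b)

def estimate_e_state(activity, light):
    """activity and light normalised to [0, 1]; returns an E-state on 0-9."""
    rules = [
        # (firing strength, consequent E-state)
        (min(up(activity, 0.3, 0.7), up(light, 0.4, 0.8)), 8.0),      # active & bright
        (min(down(activity, 0.3, 0.7), down(light, 0.4, 0.8)), 1.5),  # still & dark
        (down(activity, 0.3, 0.7), 4.5),                              # still, any light
    ]
    total = sum(w for w, _ in rules)
    return sum(w * z for w, z in rules) / total if total else 4.5

print(estimate_e_state(0.9, 0.9), estimate_e_state(0.0, 0.0))  # → 8.0 3.0
```

Feeding the live accelerometer-derived activity level and simulated environmental readings into such a function is, in miniature, what the implemented system does before ranking songs against the resulting E-state.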

For the budding researcher interested in these areas, we suggest the following broad, non-exhaustive list of some of the key research themes and areas that would greatly benefit from further investigation:

• Explore the commonly perceived emotions of users playing computer games and determine the most reliable methods with which to record and model these emotions
• Develop a common software toolbox to allow audio content analysis to be easily bolted into a range of software products
• Further examine the value of environmental context for users playing computer games compared with user-centred contextual data
• Assess gameplay parameters that are best influenced and reflected in the emotion, content and context of the user
• Develop fuzzy logic systems that can accurately read a range of content and contextual data and output a robust, truly reflective emotional state for the majority of a sample user population.

Above all, we hope that in reading this chapter we have stimulated intellectual thought and got the creative juices flowing. Our aim here is not to provide a cast-in-stone set of data and instructions for the budding developer and researcher to follow and obey: far from it! By all means question, criticise and make up your own mind. Anyone working in the field of computer game and multimedia development that involves sound will not only be aware of the technical, computing, and engineering issues of their field (the logical ones) but they will doubtless have opinions and their own tastes and creativity. If there is a lesson to be learned from this chapter, it is that we hope you will consider the bigger picture, the wider implications, the external factors and the notion of a user-centred design process. We feel that the three areas of emotion, content and context


epitomise these views and will be the crucial issues in future technological and entertainment areas. Think big. In our opinion, the ‘blue sky’ and ‘off the wall’ ideas are some of the most fun and interesting things you can do when it comes to being creative with technology. Have fun with your work and work with fun stuff!

REFERENCES

Aucouturier, J. J., & Pachet, F. (2002). Scaling up music playlist generation. In Proceedings of the IEEE International Conference on Multimedia Expo.

Bannister, D., & Mair, J. M. M. (1968). The evaluation of personal constructs. London: Academic Press.

Battle of the bands. (2008). Planet Moon Studios.

BioShock. (2007). Irrational Games.

Bordwell, D., & Thompson, K. (2004). Film art: An introduction (7th ed.). New York: McGraw-Hill.

Clarkson, B., Mase, K., & Pentland, A. (2000). Recognizing user context via wearable sensors. In Proceedings of the Fourth International Symposium of Wearable Computers.

Conati, C. (2002). Probabilistic assessment of user's emotions in educational games. Applied Artificial Intelligence, 16(7/8), 555–575. doi:10.1080/08839510290030390

Cunningham, S., Bergen, H., & Grout, V. (2006). A note on content-based collaborative filtering of music. In Proceedings of IADIS - International Conference on WWW/Internet.

Cunningham, S., Caulder, S., & Grout, V. (2008). Saturday night or fever? Context aware music playlists. In Proceedings of the Audio Mostly Conference on Interaction with Sound.

Cunningham, S., Grout, V., & Hebblewhite, R. (2006). Computer game audio: The unappreciated scholar of the Half-Life generation. In Proceedings of the Audio Mostly Conference on Sound in Games.

Dance dance revolution. (1998). Konami.

Dave mirra freestyle BMX. (2000). Z-Axis.

Davies, G., Cunningham, S., & Grout, V. (2007). Visual stimulus for aural pleasure. In Proceedings of the Audio Mostly Conference on Interaction with Sound.

Davis, H., & Silverman, R. (1978). Hearing and deafness (4th ed.). Location: Thomson Learning.

Dix, A., Finlay, J., Abowd, G. D., & Beale, R. (2003). Human computer interaction (3rd ed.). Essex, England: Prentice Hall.

Ekman, I. (2008). Psychologically motivated techniques for emotional sound in computer games. In Proceedings of the Audio Mostly Conference on Interaction with Sound.

F.E.A.R. First encounter assault recon. (2005). Monolith Productions.

FIFA. (1993-). EA Sports.

Foote, J. (1999). Visualizing music and audio using self-similarity. In Proceedings of the Seventh ACM International Conference on Multimedia (Part 1), 77-80.

Freeman, D. (2004). Creating emotion in games: The craft and art of emotioneering™. Computers in Entertainment, 2(3), 15. doi:10.1145/1027154.1027179

Gasser, M., Pampalk, E., & Tomitsch, M. (2007). A content-based user-feedback driven playlist generator and its evaluation in a real-world scenario. In Proceedings of the Audio Mostly Conference on Interaction with Sound.

Grand theft auto. (1993-). Rockstar Games.


Grimshaw, M., Lindley, C. A., & Nacke, L. (2008). Sound and immersion in the first-person shooter: Mixed measurement of the player's sonic experience. In Proceedings of the Audio Mostly Conference on Interaction with Sound.

Guitar hero. (2005-). [Computer software]. Harmonix Music Systems (2005-2007)/Neversoft (2007-).

Hitchcock, A. (Director). (1960). Psycho. Hollywood, CA: Paramount.

Jansz, J. (2006). The emotional appeal of violent video games. Communication Theory, 15(3), 219–241. doi:10.1111/j.1468-2885.2005.tb00334.x

Johnstone, T. (1996). Emotional speech elicited using computer games. In Proceedings of the Fourth International Conference on Spoken Language (ICSLP96).

Jørgensen, K. (2011). Time for new terminology? Diegetic and non-diegetic sounds in computer games revisited. In Grimshaw, M. (Ed.), Game Sound Technology and Player Interaction: Concepts and Developments. Hershey, PA: IGI Global.

Károlyi, O. (1999). Introducing music. Location: Penguin.

Kelly, G. A. (1955). The psychology of personal constructs. New York: Norton.

Kromand, D. (2008). Sound and the diegesis in survival-horror games. In Proceedings of the Audio Mostly Conference on Interaction with Sound.

Livingstone, S. R., & Brown, A. R. (2005). Dynamic response: Real-time adaptation for music emotion. In Proceedings of the Second Australasian Conference on Interactive Entertainment.

Logan, B. (2002). Content-based playlist generation: Exploratory experiments. In ISMIR 2002, 3rd International Conference on Music Information Retrieval (ISMIR).

Maurizio, V., & Samuele, S. (2007). Low-cost accelerometers for physics experiments. European Journal of Physics, 28, 781–787. doi:10.1088/0143-0807/28/5/001

Moore, B. C. J. (Ed.). (1995). Hearing: Handbook of perception and cognition (2nd ed.). New York: Academic Press.

Moore, B. C. J. (2003). An introduction to the psychology of hearing (5th ed.). New York: Academic Press.

Nacke, L., & Grimshaw, M. (2011). Player-game interaction through affective sound. In Grimshaw, M. (Ed.), Game Sound Technology and Player Interaction: Concepts and Developments. Hershey, PA: IGI Global.

Parker, J. R., & Heerema, J. (2008). Audio interaction in computer mediated games. International Journal of Computer Games Technology.

Platt, J. C., Burges, C. J. C., Swenson, S., Weare, C., & Zheng, A. (2002). Learning a Gaussian process prior for automatically generating music playlists. Advances in Neural Information Processing Systems, 14, 1425–1432.

Pong. (1972). Atari Inc.

Ravaja, N., Saari, T., Laarni, J., Kallinen, K., Salminen, M., Holopainen, J., & Järvinen, A. (2005). The psychophysiology of video gaming: Phasic emotional responses to game events. In Proceedings of DiGRA 2005 Conference: Changing Views - Worlds in Play.

Reynolds, G., Barry, D., Burke, T., & Coyle, E. (2007). Towards a personal automatic music playlist generation algorithm: The need for contextual information. In Proceedings of the Audio Mostly Conference on Interaction with Sound.

Rock band. (2005-2007). Harmonix Music Systems.


Schmidt, A., & Winterhalter, C. (2004). User context aware delivery of e-learning material: Approach and architecture. Journal of Universal Computer Science, 10(1), 38–46.

Silent hill 2. (2001). Konami.

SingStar. (2004). London Studio.

Space invaders. (1978). Taito Corporation.

Stigwood, R., & Badham, J. (Producers). (1977). Saturday night fever [Motion picture]. Hollywood, CA: Paramount.

Sykes, J., & Brown, S. (2003). Affective gaming: Measuring emotion through the gamepad. In Proceedings of Conference on Human Factors in Computing Systems (CHI '03).

Tamminen, S., Oulasvirta, A., Toiskallio, K., & Kankainen, A. (2004). Understanding mobile contexts. Personal and Ubiquitous Computing, 8(2), 135–143. doi:10.1007/s00779-004-0263-1

Ultimate band. (2008). Fall Line Studios.

Wii Music. (2008). Nintendo.

Yost, W. A. (2007). Fundamentals of hearing: An introduction (5th ed.). New York: Academic Press.

Zhang, T., & Jay Kuo, C. C. (2001). Audio content analysis for online audiovisual data segmentation and classification. IEEE Transactions on Speech and Audio Processing, 9(4), 441–457. doi:10.1109/89.917689

ADDITIONAL READING

Bishop, C. M. (2006). Pattern recognition and machine learning. New York: Springer.

Brewster, S. A. (2008). Nonspeech auditory output. In Sears, A., & Jacko, J. (Eds.), The human computer interaction handbook (2nd ed., pp. 247–264). Philadelphia: Lawrence Erlbaum Associates.

Cunningham, S., & Grout, V. (2009). Audio compression exploiting repetition (ACER): Challenges and solutions. In Proceedings of the Third International Conference of Internet Technologies and Applications (ITA 09).

Ekman, I., & Lankoski, P. (2009). Hair-raising entertainment: Emotions, sound, and structure in Silent Hill 2 and Fatal Frame. In Perron, B. (Ed.), Gaming after dark: Welcome to the world of horror video games (pp. 181–199). Jefferson, NC: McFarland & Company, Inc.

Freeman, D. (2003). Creating emotion in games. Indianapolis, IN: New Riders.

Grimshaw, M. (2008). The acoustic ecology of the first-person shooter: The player, sound and immersion in the first-person shooter computer game. Saarbrücken, Germany: VDM Verlag Dr. Mueller.

Grimshaw, M. (2009). The audio Uncanny Valley: Sound, fear and the horror game. In Proceedings of the Audio Mostly Conference on Interaction with Sound.

Loy, G. (2006). Musimathics: The mathematical foundations of music (Vol. 1). Cambridge, MA: MIT Press.

Loy, G. (2007). Musimathics: The mathematical foundations of music (Vol. 2). Cambridge, MA: MIT Press.

Papworth, N., Liljedahl, M., & Lindberg, S. (2007). Beowulf: A game experience built on sound effects. In Proceedings of the 13th International Conference on Auditory Display (ICAD).

Röber, N., & Masuch, M. (2005). Leaving the screen: New perspectives in audio-only gaming. In Proceedings of the 11th International Conference on Auditory Display (ICAD).


KEY TERMS AND DEFINITIONS

Content: The definable qualities and characteristics of any given piece of information.

Context: The scenario and environment in which a user or application is placed.

Emotional Interaction: A digital system capable of inducing emotional reactions in a user and being able to dynamically respond to human emotional states.

Emotional Reaction: A human affective response or feeling in response to one or more stimuli.

Emotional State: The dominant, overriding emotional sensation of a human at a given moment.

Playlist Generation: The production of a sequence of songs, often to be listened to on a portable music player.


Chapter 13
Player-Game Interaction
Through Affective Sound
Lennart E. Nacke
University of Saskatchewan, Canada

Mark Grimshaw
University of Bolton, UK

ABSTRACT
This chapter treats computer game playing as an affective activity, largely guided by the audio-visual
aesthetics of game content (of which, here, we concentrate on the role of sound) and the pleasure of
gameplay. To understand the aesthetic impact of game sound on player experience, definitions of emotions
are briefly discussed and framed in the game context. This leads to an introduction of empirical methods
for assessing physiological and psychological effects of play, such as the affective impact of sonic player-
game interaction. The psychological methodology presented is largely based on subjective interpretation
of experience, while psychophysiological methodology is based on measurable bodily changes, such
as context-dependent, physiological experience. As a means to illustrate both the potential and the dif-
ficulties inherent in such methodology we discuss the results of some experiments that investigate game
sound and music effects and, finally, we close with a discussion of possible research directions based on
a speculative assessment of the future of player-game interaction through affective sound.

INTRODUCTION

Digital games have grown to be among the favourite leisure activities of many people around the world. Today, digital gaming battles for a share of your individual leisure time with other traditional activities like reading books, watching movies, listening to music, surfing the internet, or playing sports. Games also impose new research challenges on many scientific disciplines – old and new – as they have been hailed as drivers of cloud computing and innovation in computer science (von Ahn & Dabbish, 2008), promoters of mental health (Miller & Robertson, 2009; Pulman, 2007), tools for training cognitive and motor abilities (Dorval & Pepin, 1986; Pillay, 2002), and as providers of highly immersive and emotional environments for their players (Ravaja, Turpeinen,

DOI: 10.4018/978-1-61692-828-5.ch013

Copyright © 2011, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.

Saari, Puttonen, & Keltikangas-Järvinen, 2008; Ryan, Rigby, & Przybylski, 2006). Gaming is a joyful and affective activity that provides emotional experiences and these experiences may guide how we process information.

Regarding emotions, Norman's (2004) definition is that emotion works through neurochemical transmitters which influence areas of our brain and successively guide our behaviour and modify how we perceive information and make decisions. While Norman makes a fine distinction between affect and cognition, he also suggests that both are information-processing systems with different functionality. Cognition refers to making sense of the information that we are presented with, whereas affect refers to the immediate "gut reaction" or feeling that is triggered by an object, a situation, or even a thought. Humans strive to maximize their knowledge by accumulating novel, but also interpretative, information. Experiencing novel information and being able to interpret it may be a cause of neurophysiological pleasure (Biedermann & Vessel, 2006). Cognitive processing of novel information activates endorphins in the brain, which moderate the sensation of pleasure. Thus, presenting novel cues in a game environment will affect and mediate player experience and in-game learning. This is an excellent example of how cognition and affect mutually influence each other, which is in line with modern emotion theories (Damasio, 1994; LeDoux, 1998; Norman, 2004). Norman (2004) proceeds to define emotion as consciously experienced affect, which allows us to identify who (or what) caused our affective response and why. The problem of not making a clear distinction between emotion and affect is further addressed by Bentley, Johnston, & von Baggo (2005), who recall Plutchik's (2001) view on emotion as an accumulated feeling which is influenced by context, experience, personality, affective state, and cognitive interpretation. They also explain that user experience for desktop software or office-based systems is more dependent on performance factors while, for digital games, user experience depends much more on affective factors. Affect is defined as a discrete, conscious, subjective feeling that contributes to, and influences, an individual's emotion (Bentley, et al., 2005; Damasio, 1994; Russell, 1980). We will revisit this notion later in the text.

In addition, Moffat (1980) introduced an interesting notion about the relationships between personality and emotion, which are distinguished along two dimensions: duration (brief and permanent) and focus (focused and global). For example, an emotion might develop from brief affection into a long-term sentiment, or a mood that occurs steadily might become a personality trait. The two dimensions can be plausibly identified at a cognitive level, making a strong case for the relation between emotion, cognition, and personality both at the surface and at a deep, structural level.

Psychophysiological research shows that affective psychophysiological responses elicit more activity (on facial muscles such as corrugator supercilii, indicating negative appraisal) and higher arousal when people have to process unpleasant sound cues (e.g., bomb sounds), which shows that sound cues can be used in games to influence players' emotional reactions (Bradley & Lang, 2000). Sound and music are generally known to enhance the immersion in a gaming situation (Grimshaw, 2008a). Music has also been attributed with facilitating absorption in an activity (Rhodes, David, & Combs, 1988), and it is generally known to trigger the mesolimbic reward system in the human brain (Menon & Levitin, 2005), allowing for music to function as a reward mechanism in game design and possibly allowing for reinforcement learning (Quilitch & Risley, 1973). The recent explosion of interactive music games is a testament to the pleasure-enhancing function of music in games. Examples of interactive music games are Audiosurf (2008), the Guitar Hero series (2005-2009), SingStar (2004), or WiiMusic (2008). They make heavy use of reinforcement learning, as both positive


and negative reinforcement are combined when learning to play a song on Guitar Hero (2005), for example (for a comprehensive list of interactive music games see the list at the end of this chapter). Hitting the button and strumming with the right timing leads to positive reinforcement in the way that the guitar track of the particular song is played back and suggests player finesse, while a cranking sound acts as negative reinforcement when the button and strumming are off. Such reward mechanisms that foster reinforcement learning are a very common design element in games (see Collins, Tessler, Harrigan, Dixon, & Fugelsang, 2011). Applying them to diegetic composition of music is new and warrants further study, as sound and music effects in games are currently not studied with the same scientific rigour that is present, for example, in the study of violent digital game content and aggression (Bushman & Anderson, 2002; Carnagey, Anderson, & Bushman, 2007; Ferguson, 2007; Przybylski, Ryan, & Rigby, 2009).

In addition to the reinforcement learning techniques in game design, another design feature is what Bateman (2009, p. 66) calls toyplay, facilitating the motivation of playing for its own sake. Toyplay denotes an unstructured activity of play guided by the affordances of the gameworld and is largely of an exploratory nature (Bateman, 2009; Bateman & Boon, 2006), being similar to games of emergence (Juul, 2005, p. 67) and unstructured and uncontrolled play termed "paidia" (Caillois, 2001, p. 13). Many music games work completely without a narrative framing and derive the joy of playing simply out of their player-game interaction. For example, Audiosurf (2008) eliminates most design elements not necessary for the interaction of the player with the game, which is essentially the production of music by "surfing" the right tones. The colourful representation of tones and notes is a visual aesthetic that drives the player to produce music. It is a simple concept brought to stellar quality in games such as Rez (2001) or SimTunes (1996), which truly appeal to the toyplay aspect of gaming. Therefore, toyplay elements and reinforcement learning techniques are two design methods most pronounced in music interaction games that drive affective engagement with sound and music.

With recent efforts in the field of human-computer interaction (Dix, Finlay, & Abowd, 2004), the sensing and evaluation of the cognitive and emotional state of a user during interaction with a technological system has become more important. The automatic recognition of a user's affective state is still a major challenge in the emerging field of affective computing (Picard, 1997). Since affective processes in players have a major impact on their playing experiences, recent studies have emerged that apply principles of affective computing to gaming (Gilleade, Dix, & Allanson, 2005; Hudlicka, 2008). The field of affective gaming is concerned with processing of sensory information from players (Gilleade & Dix, 2004), adapting game content (Dekker & Champion, 2007) – for example, adapting the artificial behaviour of non-player character game agents to player emotional states – and using emotional input as a game mechanic (Kuikkaniemi & Kosunen, 2007). However, not much work has been put into sensing the emotional cues of game sounds in games (Grimshaw, Lindley, & Nacke, 2008), let alone into understanding the impact of game sound on players' affective responses.

We start by discussing general theories of emotion and affect and their relevance to games and psychophysiological research (for a more general introduction to emotion, see Cunningham, Grout, & Picking's (2011) chapter on Emotion, Content & Context). For instance, we suggest it is emotion that drives attention and this has an important effect upon both engagement with the game and immersion (in those games that strive to provide immersive environments). Immersion is an important and current topic in games literature – rather than attempt to define it (that is attempted elsewhere in this book), we limit ourselves to a brief overview of immersion theories and their relationship to theories of emotion, flow,


and presence before discussing empirical studies and theoretical stratagems for measuring player immersion as aided by game sound. Once we can understand under what sonic conditions immersion arises, we can then design more precisely for immersion.

THEORIES OF EMOTION

Psychophysiological research, affective neuroscience, as well as affective and emotive computing, support the assumption that a user's (or, in our case, a player's) affective state can be measured by sensing brain and body responses to experienced stimuli (Nacke, 2009). Emotions in this sense can be seen as psychophysiological processes, which are evoked by sensation, perception, or interpretation of an event and/or object, which is referred to in psychology as a stimulus. A stimulus usually entails physiological changes, cognitive processing, subjective feeling, or general changes in behaviour. This is of general interest, since playing games includes all sorts of virtual events taking place in virtual environments containing virtual objects.

Emotions blur the boundaries between physiological and mental states, being associated with feelings, behaviours and thoughts. No definitive taxonomy has been worked out for emotions, but several ways of classifying emotions have been used in the past. One of the first and most prominent theories of emotion is the James-Lange theory, which states that emotion follows from experiencing physiological alterations: the change of an outside stimulus (either event or object) causes the physiological change which then generates the emotional experience (James, 1884; Lange, 1912).

The Cannon-Bard theory offered an alternative explanation of the processing sequence of emotions, stating that, after an emotion occurs, it evokes a certain behaviour based on the processing of the emotion (Cannon, 1927). Thus, the perception of a certain emotion is likely to influence the psychophysiological reaction. This theory already tries to account for a combination of cognitive and physiological factors when experiencing emotions, in which case an emotion is not purely physiological (i.e., separate from mental processing). Another important emotional concept is the two-factor theory of emotions, which is based on empirical observations (Schachter & Singer, 1962) and considers emotions to arise from the interaction of two factors: cognitive labeling and physiological arousal (Schachter, 1964). In this theory, cognition is used as a framework within which individual feelings can be processed and labeled, giving the state of physiological arousal positive or negative values according to the situation and past experiences. These theories have spawned modern emotion research in neurology and psychophysiology (Damasio, 1994; Lang, 1995; LeDoux, 1998; Panksepp, 2004), which is gathering evidence for a strong connection between affective and cognitive processing as underlying factors of emotion, in line with the definition of Norman (2004) which we initially provided.

From Emotions to Experience

Modern emotion research typically uses one of two taxonomies which try to account for emotions as either consisting of a combination of a few fundamental emotions or as comprising different dimensions usually demarked by extreme characteristics on the ends of the dimensional scales:

1. Emotions comprise a set of basic emotions. In the vein of Darwin (1899), who observed fundamental characteristic expressive movements, gestures, and sounds, researchers like Ekman (1992) and Plutchik (2001) argue for a set of basic discrete emotions, such as fear, anger, joy, sadness, acceptance, disgust, expectation, and surprise. Each basic emotion can be correlated to an individual


physiological and behavioural reaction, for example a facial expression, as Ekman (1992; Ekman & Friesen, 1978) found after studying hundreds of pictures of human faces with emotional expressions.

2. Emotions can be classified by means of a dimensional model. Dimensional models have a long history in psychology (Schlosberg, 1952; Wundt, 1896) and are especially popular in psychophysiological research. Wundt (1896) was one of the first to classify "simple feelings" into a three-dimensional model, which consisted of the three fundamental axes of pleasure-displeasure (Lust-Unlust), arousal-composure (Erregung-Beruhigung), and tension-resolution (Spannung-Lösung). A more modern approach, and currently the most popular dimensional model, was suggested by Russell (1980). His circumplex model (see Figure 1) assumes the possible classification of emotional responses in a circular order on a plane spanned by two axes, emotional affect and arousal. The mapping of emotions to the two dimensions of valence and arousal has been used in numerous studies (Lang, 1995; Posner, Russell, & Peterson, 2005; Watson & Tellegen, 1985; Watson, Wiese, Vaidya, & Tellegen, 1999) including studies of digital

Figure 1. The two-dimensional circumplex emotional model based on Russell (1980)


games (Mandryk & Atkins, 2007; Nacke & Lindley, 2008; Ravaja, et al., 2008).

The current popularity of dimensional models of emotion in psychophysiology can be explained by the fact that Wundt (1896) was one of the first researchers to correlate physiological signals, such as respiration, blood-pressure, and pupil dilation, with his "simple feelings" dimensions. Bradley and Lang (2007) note that discrete and dimensional models of emotion need not be mutually exclusive but, rather, these views of emotion could be seen as complementary to each other. For example, basic emotions can be classified within affective dimensions. Finding physiological and behavioural emotion patterns as responses to specific situations and stimuli is one of the major challenges that psychophysiological emotion research faces currently. However, new evidence from neurophysiological functional Magnetic Resonance Imaging (fMRI) studies supports the affective circumplex model of emotion (Posner et al., 2009), showing neural networks in the brain that can be connected to the affective dimensions of valence and arousal: in this case, affective pictures were used as stimuli. The measurement of emotions induced by sound stimuli in a game context is, however, more complex. To identify how a certain sound, or a game element in general, is perceived, a subjective investigation is necessary, usually done after the experimental session. Gathering subjective responses in addition to psychophysiological measurements of player affect allows cross-correlation and validation of certain emotional stimuli that may be present in a gaming situation. This 'after-the-fact' narration is not, however, without its self-evident problems. A further major challenge remains the distinction between auditory and visual stimuli within games, as many games evoke highly immersive, audio-visual experiences, which can also be influenced by setting, past experiences, and social context.

Thus, we suggest that for measurement of emotional responses to game sound, three broad strategies are available for a full, scientific comprehension of player experiences. This means that there are at least three ways of understanding the emotional player experience in games (each illustrated by a particular stratagem) but the third, being a combination of the previous two, is likely to be the most accurate:

1. As objective, context-dependent experience – Physiological measures (using sensor technology) of how a player's body reacts to game stimuli can inform our understanding of these emotions
2. As subjective, interpreted experience – Psychological measures of how players understand and interpret their own emotions can inform our understanding of these emotions
3. As subjective-objective, interpreted and contextual experience – Inferences drawn from physiological reactions and psychological measures allow a more holistic understanding of experience.

One of our primary research goals is to understand gaming experience, which has been connected to positive emotions (Clark, Lawrence, Astley-Jones, & Gray, 2009; Fernandez, 2008; Frohlich & Murphy, 1999; Hazlett, 2006; Mandryk & Atkins, 2007), but also to more complex experiential constructs like, for example, immersion (Calleja, 2007; Ermi & Mäyrä, 2005; Jennett, et al., 2008), flow (Cowley, Charles, Black, & Hickey, 2008; Csíkszentmihályi, 1990; Gackenbach, 2008; Sweetser & Wyeth, 2005) or presence (Lombard & Ditton, 1997; Slater, 2002; Zahorik & Jenison, 1998). Thus, we will provide an overview of the current understanding of immersion, flow and presence in games and then provide suggestions as to how this could be measured using objective and subjective approaches.
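As an illustration of how the circumplex dimensions might be operationalised in software, the following Python sketch maps a valence score and an arousal score — each assumed here to be normalised to the range [-1, 1], for instance derived from combined questionnaire ratings and physiological measures — onto a region of Russell's (1980) circumplex. The function name, the normalisation, and the four-region simplification are our own illustrative assumptions: Russell's model places emotion terms continuously around the circle, not in four discrete quadrants.

```python
def circumplex_region(valence: float, arousal: float) -> str:
    """Classify a (valence, arousal) pair into a quadrant of the
    circumplex model (after Russell, 1980).

    Both inputs are assumed to be normalised to [-1, 1]; the
    quadrant labels and example emotion terms are illustrative,
    not a published coding scheme.
    """
    if not (-1.0 <= valence <= 1.0 and -1.0 <= arousal <= 1.0):
        raise ValueError("valence and arousal must lie in [-1, 1]")
    if valence >= 0.0:
        # Pleasant half of the plane, split by arousal.
        return ("aroused-pleasant (e.g., excited, delighted)"
                if arousal >= 0.0
                else "calm-pleasant (e.g., relaxed, content)")
    # Unpleasant half of the plane, split by arousal.
    return ("aroused-unpleasant (e.g., afraid, frustrated)"
            if arousal >= 0.0
            else "calm-unpleasant (e.g., bored, depressed)")
```

Strategy 3 above would feed such a function with valence inferred from, say, facial muscle activity and arousal from physiological sensors, cross-validated against the players' subjective ratings.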


IMMERSION, FLOW, AND PRESENCE

In the fields of game science, media psychology, communication and computer science, many studies are concerned with uncovering experiences evoked by playing digital games. There is a lot of work directed towards investigating the potentials, definition, and limitations of immersion in digital games (Douglas & Hargadon, 2000; Ermi & Mäyrä, 2005; Jennett et al., 2008; Murray, 1995). A major challenge of studying immersion is defining what exactly is meant by the term "immersion" and how it relates to similar game experience phenomena such as flow (Csíkszentmihályi, 1990), cognitive absorption (Agarwal & Karahanna, 2000) and presence (Lombard & Ditton, 1997; Slater, 2002).

From Immersion to Flow and Presence

In a very comprehensive effort, Jennett et al. (2008; Slater, 2002) give an extensive conceptual overview of immersion. According to their definition, immersion is a gradual, time-based, progressive experience that includes the suppression of all surroundings (spatial, audio-visual, and temporal perception), together with attention and involvement mediating the feeling of being in a virtual world. This suggests immersion to be an experience related to cognitive processing and attention: the more immersive an experience is, the more attentionally demanding it is (see Reiter, 2011 for a discussion of attention and audio stimuli). One could hypothesize that emotional state drives attention (Öhman, Flykt, & Esteves, 2001) and, therefore, the more affective an experience is, the more likely it is to grab individual attention and consequently to immerse the player. Thus, immersion would be elicited as the result of an action chain that starts with affect. This prompts an emotional response that influences attention and, as a consequence, leads to immersion. It remains to be shown whether, and how, affective responses of players influence immersion and what measures of player affect are most suitable to evaluate immersion.

Immersion is seen in some literature (Sweetser & Wyeth, 2005) – based on qualitative analysis – as an enabler of a fleeting experience of peak performance labeled flow (Csíkszentmihályi, 1990; Nakamura & Csíkszentmihályi, 2002). Flow is a little-understood, but often-used, experiential concept for describing one kind of game experience. Some examples from game studies and human-computer-interaction literature try to use flow for analyzing successful game design features of games (Cowley et al., 2008; Sweetser & Wyeth, 2005). However, flow was originally conceived by Csíkszentmihályi (1975) on the basis of studies of intrinsically motivated behaviour of artists, chess players, musicians, and sports players. This group was found to be rewarded by executing actions per se, experiencing high enjoyment and fulfilment in the activity itself rather than, for example, being motivated by future achievement. Csíkszentmihályi describes flow as a peak experience, the "holistic sensation that people feel when they act with total involvement" (p. 36). Thus, complete mental absorption in an activity is fundamental to this concept, which ultimately makes flow an experience mainly found in situations with high cognitive loading accompanied by a feeling of pleasure. According to a more recent description from Nakamura and Csíkszentmihályi (2002), it should be noted that, for entering flow, two conditions should be met: (1) a matching of challenges or action opportunities to an individual's skill and (2) clear and close goals with immediate feedback about progress. Flow itself can be described through the following manifested qualities (which are admittedly too fuzzy for a clear evaluation using subjective or objective methods): (1) concentration focuses on the present moment, (2) action and consciousness merge, (3) self-awareness is lost, (4) one is in full control over one's actions, (5) temporal perception


in itself (Nakamura & Csíkszentmihályi, 2002). Flow even shares some properties with immersion, such as a distorted temporal perception and lost or blurred awareness of self and surroundings.

Jennett et al. (2008) argue that immersion can be seen as a precursor for flow experiences, thus allowing immersion and flow to overlap in certain game genres, while noting that immersion can also be experienced without flow: immersion, in their definition, is the "prosaic experience of engaging with a videogame" (p. 643) rather than an attitude towards playing or a state of mind. One important question in the discussion about flow and immersion is whether flow is a state or a process. Defining flow as a static rather than a procedural experience would be in contrast to process-based definitions of immersion such as the challenge-based immersion of Ermi and Mäyrä (2005). This kind of immersion oscillates around the success and failure of certain types of game interactions. Another important differentiation between flow and immersion is that immersion could be described as a "growing" feeling, an experience that unfolds over time and is dependent on the perceptual readiness of players as well as the audio-visual sensory output capabilities of the gaming system. Past theoretical and taxonomical approaches have tried to define immersion as consisting of several phases or components. For example, Brown and Cairns (2004) describe three gradual phases of immersion: engagement, engrossment, and total immersion, where the definition of total immersion as an experience of total disconnection with the outside world overlaps with definitions of telepresence, where users feel mentally transported into a virtual world (Lombard & Ditton, 1997). The concept of presence is also discussed by Jennett et al. (2008) in relation to immersion, but defined as a state of mind rather than a gradually progressive experience like immersion. If we assume for a moment that immersion is an "umbrella" experience, immersion could incorporate notions of presence and flow at certain stages of its progress. It remains, however, unclear through what phases immersion unfolds and what types of stimuli are likely to foster immersive experiences. In what situations is immersion likely to unfold and what situational elements make it progress? When does it reach its peak and how much immersion is too much? More research is needed to investigate such questions, as well as a possible link between high engagement and addiction, as studied by Seah and Cairns (2008), or the differences between high engagement and addiction as suggested in a study by Charlton and Danforth (2004).

The SCI Immersion Model

Ermi and Mäyrä (2005) subdivide immersive game experiences into sensory (as mentioned above), challenge-based and imaginative immersion (the SCI-model), based on qualitative surveys. The elements of this immersion model account for different facilitators of immersion, such as the experience of elements (in a gaming context) through which immersion is likely to take place. The three immersive game experiences Ermi and Mäyrä give implicitly provide different immersion models of static state and progressive experience. Sensory immersion can be enhanced by amplifying a game's audio-visual components, for example, using a larger screen, a surround-sound speaker system, or greater audio volume. If immersion is actually facilitated in this way, immersion would be an affective experience, as evidence points to the fact that enhanced audio-visual presentation results in an enhanced affective gaming experience (Ivory & Kalyanaraman, 2007). By jamming the perceptive systems of players (as a result of mental workload associated with auditory and visual processing of game stimuli), sensory immersion is probably also a facilitator of guiding players' attention (see Reiter, 2011). This strengthens the hypothetical link between attentional processing and immersive feeling found in related literature (Douglas & Hargadon, 2000) but, while the link remains, the cognitive direction is the reverse of


those discussed earlier. Imaginative immersion describes absorption in the narrative of a game or identification with a character, which is understood to be synonymous with feelings of empathy and atmosphere. However, atmosphere might be an agglomeration of imaginative immersion and sensory immersion (since certain sounds and graphics might facilitate a compelling atmospheric player experience): the use of this term raises the need for a clearer definition of the concept of atmosphere and this is not provided by Ermi and Mäyrä (2005). If 'imaginative' refers mainly to cognitive processes of association, creativity, and memory recall, it is likely to be facilitated by player affect. However, individual differences are huge when it comes to pleasant imagination (this is probably a matter of personal preference), which would make it very difficult to accurately assess this kind of immersion using empirical methodology. The last SCI dimension, viz. challenge-based immersion, conforms closely with one feature of Csíkszentmihályi's (1990) description of flow. This is the only type of immersion in this model that suggests it might be a progressive experience because challenge level is never simply static but is something that oscillates around the success and failure of certain types of interaction over time. If we assume now that immersion is linked to either successful or failed interactions in a game that are likely to strengthen or weaken the subjective feeling of immersion, we can try to establish the following relationship between game interactions and immersion. Given a number of successful interactions σ, a number of failed interactions φ, and incremental playing time τ, then two descriptions of the magnitude of immersion ι could be considered:

(1) For σ, φ > 0: if σ > φ, then ι = (σ/φ) × τ.

(2) For σ, φ > 0: if σ ≤ φ, then ι = σ/(φ × τ).

These equations would suggest that the longer people play with a higher success than failure rate, the more immersed they would feel. If the failure rate is higher than the success rate, the feeling of immersion for players will decrease over time. Many sonic interactions in games are implicitly challenge-based because they require interpretation (or are understood from previous experience), but an example of explicitly challenge-based sonic interaction in games is given by Grimshaw (2008a) in his description of the navigational mode of listening (p. 32). It remains to be tested whether such an equation could account for immersion itself or whether this would only measure one aspect of the immersive experience. Ideally, such a ratio would be extended and combined with psychophysiological variables that measure a player's affective response over time.

Implications for Player-Game Interaction and Affective Sound

In the context of sound and immersion in computer games, other work investigates the role of sound in facilitating player immersion in the gameworld. A strong link between "visual, kinaesthetic, and auditory modalities" is hypothetically assumed to be key to immersion (Laurel, 1991, p. 161). The degree of realism provided by sound cues is also a primary facilitator for immersion, with realistic audio samples being drivers of immersion (Jørgensen, 2006), similar to employing spatial sound (Murphy & Pitt, 2001), although some authors, as noted by Grimshaw (2008b), argue for an effect of immersion through perceptual realism of sound (as opposed to a mimetic realism) where verisimilitude, based on codes of realism, proves as effective, if not more efficacious, than emulation and authenticity of sound (see Farnell, 2011). Self-produced, autopoietic sounds of players, and the immersive impact that sounds have on the relationship between players and the virtual environment a game is played in, have been framed in discussions on acoustic ecolo-


gies in first-person shooter (FPS) games, which provide a range of conceptual tools for analyzing immersive functions of game sound (Grimshaw, 2008a; Grimshaw & Schott, 2008). In an argument for physical immersion of players through spatial qualities of game sound (Grimshaw, 2007), we find the concept of sensory immersion reoccurring (Ermi & Mäyrä, 2005). The perception of game sound in this context is not only loading players' mental and attentional capacities but is also having an effect on the player's unconscious emotional state. The phenomenon of physical sonic immersion is not new, but has been observed before for movie theatre audiences, and the concept has been transferred to sound design in FPS simulations and games (Shilling, Zyda, & Wardynski, 2002). In some cases, the sensory intensity levels of game sound may be such that affect really is a gut feeling, as alluded to earlier in this chapter. Possible immersion through computer game sound may be strong enough to enable a similar affective experience by playing with audio only, as investigations in this direction suggest (Röber & Masuch, 2005).

PSYCHOPHYSIOLOGICAL MEASUREMENT OF EMOTIONS

As we have discussed before, a rather modern approach is the two-dimensional model of emotional affect and arousal suggested by Russell (1980, see Figure 1). Ekman's (1992) insight that basic emotions are reflected in facial expressions was fundamental for subsequent studies investigating physiological responses of facial muscles using a method called electromyography (EMG), which measures subtle reactions of muscles in the human body (Cacioppo, Berntson, Larsen, Poehlmann, & Ito, 2004). For example, corrugator muscle activity (in charge of lowering the eyebrow) was found to increase when a person is in a bad mood (Larsen, Norris, & Cacioppo, 2003). In contrast to this, zygomaticus muscle activity (on the cheek) increases during positive moods. High orbicularis oculi muscle activity (responsible for closing the eyelid) is associated with mildly positive emotions (Cacioppo, Tassinary, & Berntson, 2007). An advantage of physiological assessment is that it can assess covert activity of facial muscles with great sensitivity to subtle reactions (Ravaja, 2004). Measuring emotions in the circumplex model of emotional valence and arousal is now possible during interactive events, such as playing games, by covertly recording the physiological activity of brow, cheek and eyelid muscles (Mandryk, 2008; Nacke & Lindley, 2008; Ravaja, et al., 2008).

For the correct assessment of arousal, additional measurement of a person's electrodermal activity (EDA) is necessary (Lykken & Venables, 1971), which is either measured from palmar sites (thenar/hypothenar eminences of the hand) or plantar sites (e.g. above the abductor hallucis muscle and midway between the proximal phalanx of the big toe) (Boucsein, 1992). The conductance of the skin is directly related to the production of sweat in the eccrine sweat glands, which is entirely controlled by a human's sympathetic nervous system. Increased sweat gland activity is related to increased electrical skin conductance. Using EMG measurements of facial muscles that reliably measure basic emotions and EDA measurements that indicate a person's arousal, we can correlate emotional states of users to specific game events or even complete game sessions (Nacke, Lindley, & Stellmach, 2008; Ravaja, et al., 2008). Below, we refer to several experiments analyzing cumulative measurements of EMG and EDA to assess the overall affective experience of players in diverse game sound scenarios.

Pointers from Psychophysiological Experiments

A set of preliminary experiments (Grimshaw et al., 2008; Nacke, 2009; Nacke, Grimshaw, & Lindley, 2010) investigated the impact of the sonic user experience and psychophysiological effects of

Player-Game Interaction Through Affective Sound

game sound (i.e., diegetic sound FX) and music in an FPS game. They measured EMG and EDA responses together with subjective questionnaire responses for 36 undergraduate students with a 2 × 2 repeated-measures factorial design using sound (on and off) and music (on and off) as predictor variables with a counter-balanced order of sound and music presentation in an FPS game level.

Among many results, two are particularly interesting: (1) higher co-active EMG brow and eyelid activity when music was present than when it was absent (regardless of other sounds) and (2) a strong effect of sound on gameplay experience dimensions (IJsselsteijn, Poels, & de Kort, 2008). In the case of the latter result, higher subjective ratings of immersion, flow, positive affect, and challenge, together with lower negative affect and tension ratings, were discovered when sound was present than when it was absent (regardless of music). The psychophysiological results of this study put the usefulness of (tonic) psychophysiological measures to the test, since the literature points to expressions of antipathy when the facial muscles under investigation are activated at the same time (Bradley, Codispoti, Cuthbert, & Lang, 2001). The caveat here is that the most common stimuli that have been used in psychophysiological research are pictures (Lang, Greenwald, Bradley, & Hamm, 1993). Using music, especially in a highly immersive environment such as a first-person perspective digital game, may lead to a number of emotions being elicited simultaneously which might lie outside of the dimensional space used in Russell's (1980) model. On this view, a person's emotional experience is a cognitive interpretation of this automatic physiological response (Russell, 2003). But the bipolarity of the valence-arousal dimensions has been criticized before as the model is too rigid to allow for simultaneous (i.e., positive and negative) emotion measurements (Tellegen, Watson, & Clark, 1999). Using sound and music in a digital game is, however, a very ambiguous and complex use of stimuli and prior research has suggested that the emotional responses to such complex stimuli can be simultaneously positive and negative (Larsen, McGraw, & Cacioppo, 2001; Larsen, McGraw, Mellers, & Cacioppo, 2004). Tellegen et al. (1999) proposed a structural hierarchical model of emotion which might be more suited in this context by providing for both independent positive emotional activation (PA) and negative emotional activation (NA) organized in a three-level hierarchy. The top level is formed by a general bipolar Happiness-Unhappiness dimension, followed by the PA and NA dimensions, allowing discrete emotions to form its base. With this model, we could argue that the findings of Nacke et al. (2010) show an independent positive and negative emotional activation during the music conditions. This would, however, also indicate that the physiological activity is not a direct result of the sound and music conditions, but arguably of a combination of stimuli present during these conditions.

In addition, greater electrodermal activity was found for female players when both sound and music were off, while the responses for male players were almost identical (see Figure 2). The authors assumed music to have a calming effect on female players, resulting in less arousal during gameplay. For females, music was also connected with pleasant emotions, as higher eyelid EMG activity indicated. Overall, the psychophysiological results from that study pointed toward a positive emotional effect of the presence of both sound and music (see also Nacke, 2009). Interesting in this context is that music does not seem to be experienced significantly differently on a subjective level, whereas sound was clearly indicated as having an influence on game experience. Higher subjective ratings of immersion, flow, positive affect, and challenge, together with lower negative affect and tension ratings when sound was present, paint a positive picture of sound for a good game experience (particularly so when music is absent).

The results discussed above are ones that run the gamut from expected (sound contributes


Figure 2. Results of electrodermal activity (EDA averages in log [µS]) from the Nacke et al. (2010)
study, split up between gender, sound, and music conditions in the experiment (see also Nacke, 2009)
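Tonic skin conductance is strongly positively skewed, which is why values such as those plotted in Figure 2 are averaged on a logarithmic scale. A minimal sketch of computing mean log[µS] per experimental condition is given below; the grouping keys and the conductance readings are invented for illustration and are not data from the study.

```python
# Sketch: mean log10 skin conductance (in microsiemens) per experimental
# condition, in the spirit of Figure 2. All sample values are invented.
import math
from collections import defaultdict

def log_eda_means(samples):
    """samples: list of ((gender, sound, music), eda_in_uS) tuples."""
    groups = defaultdict(list)
    for condition, value in samples:
        groups[condition].append(math.log10(value))
    return {cond: sum(vals) / len(vals) for cond, vals in groups.items()}

data = [
    (("female", "sound off", "music off"), 10.0),   # invented readings in uS
    (("female", "sound off", "music off"), 100.0),
    (("male", "sound on", "music on"), 10.0),
]
print(log_eda_means(data))
# {('female', 'sound off', 'music off'): 1.5, ('male', 'sound on', 'music on'): 1.0}
```

Averaging the log-transformed values (rather than log-transforming the average) keeps a few unusually sweaty readings from dominating a condition's mean.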

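The 2 × 2 repeated-measures comparison of subjective ratings reported above can also be sketched in code. This toy illustration computes cell means and main effects for a sound × music within-subjects design; the helper functions and the ratings are invented and are not the chapter's actual data or analysis code.

```python
# Sketch: cell means and main effects for a 2 x 2 within-subjects design
# (sound on/off x music on/off). All ratings below are invented.
from itertools import product
from statistics import mean

def cell_means(data):
    """data: {participant: {(sound, music): rating}} -> per-condition means."""
    return {cond: mean(p[cond] for p in data.values())
            for cond in product(("on", "off"), repeat=2)}

def main_effect(cells, factor):
    """Marginal mean for factor 'on' minus marginal mean for 'off'."""
    i = 0 if factor == "sound" else 1
    on = mean(v for k, v in cells.items() if k[i] == "on")
    off = mean(v for k, v in cells.items() if k[i] == "off")
    return on - off

ratings = {  # immersion ratings per participant and (sound, music) condition
    "p1": {("on", "on"): 4, ("on", "off"): 5, ("off", "on"): 3, ("off", "off"): 2},
    "p2": {("on", "on"): 5, ("on", "off"): 5, ("off", "on"): 2, ("off", "off"): 3},
}
cells = cell_means(ratings)
print(main_effect(cells, "sound"))  # 2.25: higher immersion with sound on
```

A full analysis would of course use a repeated-measures ANOVA rather than raw marginal differences, but the cell-mean structure is the same.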
positively to the experience of playing games) to the interesting and meriting further investigation (for example, gender differences in sound affect in the context of FPS games). Being the results of preliminary experiments, they typically provoke more questions than they answer and such results should, for the time being, be viewed in the light of several limiting factors. For example, the experiments provided audio-visual stimuli (not solely audio) and the sub-genre of game used – the FPS game – proposes a hunter-and-the-hunted scenario which, perhaps, might account for the gender affect differences noted. Another limitation that needs to be considered in psychophysiological research is the effect of familiarity with a particular game genre and a psychological mindset. Thus, a personality test and demographic questions regarding playing habits and behaviour will help circumvent possible priming effects of familiarity or non-familiarity with games in the experimental analysis. In our experiments, personality assessments and demographic questionnaires were handed out prior to each study to factor out priming elements later in the statistical analysis. Finally, it is difficult to correlate objective measurements taken during gameplay with subjective, post-experiential responses and it may well be that such psychophysiological measurements are not the optimal method for assessing the role of sound in digital games.

CONCLUSION

In this chapter, we have given an overview of the emotional components of gameplay experience


with a special focus on the influence of sound and music. We have discussed the results of experiments that have made use of both subjective and objective assessments of game sound and music. After these pilot studies, and our discussion of emotional theories and experiential constructs, we have to conclude that the detailed exploration of game sound and music at this stage of our knowledge is still difficult to conduct because there are few comparable research results available and there is not yet a perfect measurement methodology. The multi-method combination of subjective and objective quantitative measures is a good starting point from which to create and refine more specific methodologies for examining the impact of sound and music in games.

Important Questions and Future Challenges

The important questions regarding game design that aims to facilitate flow, fun, or immersive experiences are: should tasks be provided by the game (i.e., created by the designer), should they be encouraged by the game environment, or should finding the task be part of the gameplay? The latter is rather unlikely, since finding only one task at a time sequentially might frustrate players and choosing a pleasant task according to individual mood, emotional, or cognitive disposition will probably provide more fun. Thus, instead of saying players need to face tasks that can be completed, it might be better design advice to provide several game tasks at the same time and design for an environment that encourages playful interaction. An environment that facilitates flow, fun, or immersion would provide opportunities for the player to alternate between playing for its own sake (i.e., setting up their own tasks) and finding closure by completing a given task.

Some of the future challenges here will include finding good experimental designs that clearly distinguish audio stimuli, while still being embedded in a gaming context, in order that the measurements and results obtained remain valid and thus more readily informative for the design suggestions above. We also see a lot of potential in the cross-correlation of subjective and objective measures in terms of attentional activation, such as the exploration of brain wave (that is, EEG) data to find out more about the cognitive underpinnings of gameplay experience, by this means potentially separating experiential constructs from an affective emotional attribution and aligning them to an attentive cognitive attribution. Experiments might be designed to answer the question: does attention guide immersion or vice versa? Others might investigate sound and affect in game genres other than FPS.

Potential of These New Technologies for Sound Design

Why go to all this experimental trouble? After all, most digital games seem to function well enough with current sound design paradigms. The answer lies in two technologies, both having great potential for the future of sound design. The first is procedural audio and, as other chapters here deal with the subject in great depth (Farnell, 2011; Mullan, 2011), we limit ourselves to highlighting the importance of the ability to stipulate affective-emotional parameters for the real-time synthesis of sound. It is generally accepted that a sudden, loud sound in a particular context (perhaps there is a preceding silence and darkness wrapped up in a horror genre context) is especially arousing. However, what is less understood is the role, for example, of timbre on affect and emotion and, in the context of digital games and virtual environments, immersion. Would it be effective to design an affective real-time sound synthesis sub-engine as part of the game engine where the controllable parameters are not amplitude and frequency but high-level factors such as fear, happiness, arousal, or relaxation? Perhaps these parameters could be governed by the player in the game set-up menu who might opt, for instance, for a more or less


emotionally intense experience through the use of a simple fader. This brings us to the second technology.

Although rudimentary and imprecise, consumer biofeedback equipment for digital devices (including computers and gaming consoles) is beginning to appear.1 Pass the output of these devices (which are variations of the EMG and ECG/EKG technologies used in the experiments previously described) to the controllable parameters of the game sound engine proposed above and procedural audio becomes a highly responsive, affective, and emotive technology.2 Furthermore, a feedback loop is established in which both play and sound emotionally respond to each other. In effect, the game itself takes on an emotional character that reacts to the player's affect state and emotions and that elicits affect responses and emotions in turn – perhaps the game's character might be empathetic or antagonistic to the player. This is the future of game sound design and the reason for pursuing the line of enquiry described in this chapter.

REFERENCES

Agarwal, R., & Karahanna, E. (2000). Time flies when you're having fun: Cognitive absorption and beliefs about information technology usage. Management Information Systems Quarterly, 24(4), 665–694. doi:10.2307/3250951

Audiosurf. [Video game]. (2008). Dylan Fitterer (Developer), Bellevue, WA: Valve Corporation (Steam).

Bateman, C. (2009). Beyond game design: Nine steps towards creating better videogames. Boston: Charles River Media.

Bateman, C., & Boon, R. (2006). 21st century game design. Boston: Charles River Media.

Bentley, T., Johnston, L., & von Baggo, K. (2005). Evaluation using cued-recall debrief to elicit information about a user's affective experiences. In T. Bentley, L. Johnston, & K. von Baggo (Eds.), Proceedings of the 17th Australian Conference on Computer-Human Interaction (pp. 1-10). New York: ACM.

Biedermann, I., & Vessel, E. A. (2006). Perceptual pleasure and the brain. American Scientist, 94(May-June), 247–253.

Boucsein, W. (1992). Electrodermal activity. New York: Plenum Press.

Bradley, M. M., Codispoti, M., Cuthbert, B. N., & Lang, P. J. (2001). Emotion and motivation I: Defensive and appetitive reactions in picture processing. Emotion, 1(3), 276–298. doi:10.1037/1528-3542.1.3.276

Bradley, M. M., & Lang, P. J. (2000). Affective reactions to acoustic stimuli. Psychophysiology, 37, 204–215. doi:10.1017/S0048577200990012

Bradley, M. M., & Lang, P. J. (2007). Emotion and motivation. In Cacioppo, J. T., Tassinary, L. G., & Berntson, G. G. (Eds.), Handbook of psychophysiology (3rd ed., pp. 581–607). New York: Cambridge University Press. doi:10.1017/CBO9780511546396.025

Brown, E., & Cairns, P. (2004). A grounded investigation of game immersion. In Dykstra-Erickson, E., & Tscheligi, M. (Eds.), CHI '04 extended abstracts (pp. 1297–1300). New York: ACM.

Bushman, B. J., & Anderson, C. A. (2002). Violent video games and hostile expectations: A test of the General Aggression Model. Personality and Social Psychology Bulletin, 28(12), 1679–1686. doi:10.1177/014616702237649


Cacioppo, J. T., Berntson, G. G., Larsen, J. T., Poehlmann, K. M., & Ito, T. A. (2004). The psychophysiology of emotion. In Lewis, M., & Haviland-Jones, J. M. (Eds.), Handbook of emotions (2nd ed., pp. 173–191). New York: Guilford Press.

Cacioppo, J. T., Tassinary, L. G., & Berntson, G. G. (2007). Handbook of psychophysiology (3rd ed.). Cambridge, UK: Cambridge University Press. doi:10.1017/CBO9780511546396

Caillois, R. (2001). Man, play and games. Chicago: University of Illinois Press.

Calleja, G. (2007). Digital games as designed experience: Reframing the concept of immersion. Unpublished doctoral dissertation, Victoria University of Wellington, New Zealand.

Cannon, W. B. (1927). The James-Lange theory of emotions: A critical examination and an alternative theory. The American Journal of Psychology, 39(1/4), 106–124. doi:10.2307/1415404

Carnagey, N. L., Anderson, C. A., & Bushman, B. J. (2007). The effect of video game violence on physiological desensitization to real-life violence. Journal of Experimental Social Psychology, 43(3), 489–496. doi:10.1016/j.jesp.2006.05.003

Charlton, J. P., & Danforth, I. D. W. (2004). Differentiating computer-related addictions and high engagement. In Morgan, K., Brebbia, C. A., Sanchez, J., & Voiskounsky, A. (Eds.), Human perspectives in the internet society: Culture, psychology and gender. Southampton: WIT Press.

Clark, L., Lawrence, A. J., Astley-Jones, F., & Gray, N. (2009). Gambling near-misses enhance motivation to gamble and recruit win-related brain circuitry. Neuron, 61(3), 481–490. doi:10.1016/j.neuron.2008.12.031

Collins, K., Tessler, H., Harrigan, K., Dixon, M. J., & Fugelsang, J. (2011). Sound in electronic gambling machines: A review of the literature and its relevance to game audio. In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey, PA: IGI Global.

Cowley, B., Charles, D., Black, M., & Hickey, R. (2008). Toward an understanding of flow in video games. Computers in Entertainment, 6(2), 1–27. doi:10.1145/1371216.1371223

Csíkszentmihályi, M. (1975). Beyond boredom and anxiety. San Francisco: Jossey-Bass Publishers.

Csíkszentmihályi, M. (1990). Flow: The psychology of optimal experience. New York: Harper Perennial.

Cunningham, S., Grout, V., & Picking, R. (2011). Emotion, content and context in sound and music. In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey, PA: IGI Global.

Damasio, A. R. (1994). Descartes' error. New York: G. P. Putnam.

Darwin, C. (1899). The expression of the emotions in man and animals. New York: D. Appleton and Company.

Dekker, A., & Champion, E. (2007). Please biofeed the zombies: Enhancing the gameplay and display of a horror game using biofeedback. In Proceedings of DiGRA: Situated Play Conference. Retrieved January 1, 2010, from http://www.digra.org/dl/db/07312.18055.pdf

Dix, A., Finlay, J., & Abowd, G. D. (2004). Human-computer interaction. Harlow, UK: Pearson Education.

DJ hero. [Video game]. (2009). FreeStyleGames (Developer), Santa Monica, CA: Activision.
278
Player-Game Interaction Through Affective Sound

Donkey Konga. [Video game]. (2004). Namco (Developer), Kyoto: Nintendo.

Dorval, M., & Pepin, M. (1986). Effect of playing a video game on a measure of spatial visualization. Perceptual and Motor Skills, 62, 159–162.

Douglas, Y., & Hargadon, A. (2000). The pleasure principle: Immersion, engagement, flow. In Proceedings of the Eleventh ACM Conference on Hypertext and Hypermedia (pp. 153-160). New York: ACM.

Ekman, P. (1992). An argument for basic emotions. Cognition and Emotion, 6(3/4), 169–200. doi:10.1080/02699939208411068

Ekman, P., & Friesen, W. V. (1978). Facial action coding system: A technique for the measurement of facial movement. Palo Alto, CA: Consulting Psychologists Press.

Electroplankton. [Video game]. (2006). Indies Zero (Developer), Kyoto: Nintendo.

Elite beat agents. [Video game]. (2006). iNiS (Developer), Kyoto: Nintendo.

Ermi, L., & Mäyrä, F. (2005). Fundamental components of the gameplay experience: Analysing immersion. In Proceedings of DiGRA 2005 Conference: Changing Views: Worlds in Play. Retrieved January 1, 2010, from http://www.digra.org/dl/db/06276.41516.pdf

Farnell, A. (2011). Behaviour, structure and causality in procedural audio. In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey, PA: IGI Global.

Ferguson, C. J. (2007). Evidence for publication bias in video game violence effects literature: A meta-analytic review. Aggression and Violent Behavior, 12(4), 470–482. doi:10.1016/j.avb.2007.01.001

Fernandez, A. (2008). Fun experience with digital games: A model proposition. In Leino, O., Wirman, H., & Fernandez, A. (Eds.), Extending experiences: Structure, analysis and design of computer game player experience (pp. 181–190). Rovaniemi, Finland: Lapland University Press.

Frequency. [Video game]. (2001). Sony Computer Entertainment (PlayStation 2).

Frohlich, D., & Murphy, R. (1999, December 20). Getting physical: What is fun computing in tangible form? Paper presented at the Computers and Fun 2 Workshop, York, UK.

Gackenbach, J. (2008). The relationship between perceptions of video game flow and structure. Loading..., 1(3).

Gilleade, K. M., & Dix, A. (2004). Using frustration in the design of adaptive videogames. In Proceedings of ACE 2004 (pp. 228–232). New York: ACM.

Gilleade, K. M., Dix, A., & Allanson, J. (2005). Affective videogames and modes of affective gaming: Assist me, challenge me, emote me. In Proceedings of DiGRA 2005 Conference: Changing Views: Worlds in Play. Retrieved January 1, 2010, from http://www.digra.org/dl/db/06278.55257.pdf

Gitaroo man. [Video game]. (2001). Koei/iNiS (Developer) (PlayStation 2).

Grimshaw, M. (2007). The resonating spaces of first-person shooter games. In Proceedings of the 5th International Conference on Game Design and Technology. Retrieved January 1, 2010, from http://digitalcommons.bolton.ac.uk/gcct_conferencepr/4/
279
Player-Game Interaction Through Affective Sound

Grimshaw, M. (2008a). The acoustic ecology of the first-person shooter: The player, sound and immersion in the first-person shooter computer game. Saarbrücken: VDM Verlag Dr. Mueller.

Grimshaw, M. (2008b). Sound and immersion in the first-person shooter. International Journal of Intelligent Games & Simulation, 5(1), 2–8.

Grimshaw, M., Lindley, C. A., & Nacke, L. (2008). Sound and immersion in the first-person shooter: Mixed measurement of the player's sonic experience. In Proceedings of Audio Mostly 2008 - A Conference on Interaction with Sound. Retrieved January 1, 2010, from http://digitalcommons.bolton.ac.uk/gcct_conferencepr/7/

Grimshaw, M., & Schott, G. (2008). A conceptual framework for the analysis of first-person shooter audio and its potential use for game engines. International Journal of Computer Games Technology, 2008.

Guitar hero. [Video game]. (2005). RedOctane (Developer), New York: MTV Games.

Guitar hero II. [Video game]. (2006). RedOctane (Developer), Santa Monica, CA: Activision.

Guitar hero III. [Video game]. (2007). RedOctane (Developer), Santa Monica, CA: Activision.

Guitar hero 5. [Video game]. (2009). RedOctane (Developer), Santa Monica, CA: Activision.

Guitar hero: On tour. [Video game]. (2008). RedOctane (Developer), Santa Monica, CA: Activision (Nintendo DS).

Guitar hero world tour. [Video game]. (2008). RedOctane (Developer), Santa Monica, CA: Activision.

Hazlett, R. L. (2006). Measuring emotional valence during interactive experiences: Boys at video game play. In Proceedings of CHI '06 (pp. 1023–1026). New York: ACM.

Hudlicka, E. (2008). Affective computing for game design. In Proceedings of the 4th International North American Conference on Intelligent Games and Simulation (GAMEON-NA). Montreal, Canada.

IJsselsteijn, W., Poels, K., & de Kort, Y. A. W. (2008). The Game Experience Questionnaire: Development of a self-report measure to assess player experiences of digital games. FUGA Deliverable D3.3. Eindhoven, The Netherlands: TU Eindhoven.

Ivory, J. D., & Kalyanaraman, S. (2007). The effects of technological advancement and violent content in video games on players' feelings of presence, involvement, physiological arousal, and aggression. The Journal of Communication, 57(3), 532–555. doi:10.1111/j.1460-2466.2007.00356.x

James, W. (1884). What is an emotion? Mind, 9(34), 188–205. doi:10.1093/mind/os-IX.34.188

Jennett, C., Cox, A. L., Cairns, P., Dhoparee, S., Epps, A., & Tijs, T. (2008). Measuring and defining the experience of immersion in games. International Journal of Human-Computer Studies, 66, 641–661. doi:10.1016/j.ijhcs.2008.04.004

Jørgensen, K. (2006). On the functional aspects of computer game audio. In Audio Mostly: A Conference on Sound in Games.

Juul, J. (2005). Half-real: Video games between real rules and fictional worlds. Cambridge, MA: MIT Press.

Kuikkaniemi, K., & Kosunen, I. (2007). Progressive system architecture for building emotionally adaptive games. In BRAINPLAY '07: Playing with Your Brain Workshop at ACE (Advances in Computer Entertainment) 2007.

Lang, P. J. (1995). The emotion probe: Studies of motivation and attention. The American Psychologist, 50, 372–385. doi:10.1037/0003-066X.50.5.372


Lang, P. J., Greenwald, M. K., Bradley, M. M., & Hamm, A. O. (1993). Looking at pictures: Affective, facial, visceral, and behavioral reactions. Psychophysiology, 30, 261–273. doi:10.1111/j.1469-8986.1993.tb03352.x

Lange, C. G. (1912). The mechanism of the emotions. In Rand, B. (Ed.), The classical psychologists (pp. 672–684). Boston: Houghton Mifflin.

Larsen, J. T., McGraw, A. P., & Cacioppo, J. T. (2001). Can people feel happy and sad at the same time? Journal of Personality and Social Psychology, 81(4), 684–696. doi:10.1037/0022-3514.81.4.684

Larsen, J. T., McGraw, A. P., Mellers, B. A., & Cacioppo, J. T. (2004). The agony of victory and thrill of defeat: Mixed emotional reactions to disappointing wins and relieving losses. Psychological Science, 15(5), 325–330. doi:10.1111/j.0956-7976.2004.00677.x

Larsen, J. T., Norris, C. J., & Cacioppo, J. T. (2003). Effects of positive and negative affect on electromyographic activity over zygomaticus major and corrugator supercilii. Psychophysiology, 40, 776–785. doi:10.1111/1469-8986.00078

Laurel, B. (1991). Computers as theatre. Boston, MA: Addison-Wesley.

LeDoux, J. (1998). The emotional brain. London: Orion Publishing Group.

Lego rock band. [Video game]. (2009). Harmonix (Developer), New York: MTV Games.

Lombard, M., & Ditton, T. (1997). At the heart of it all: The concept of presence. Journal of Computer-Mediated Communication, 3(2).

Lykken, D. T., & Venables, P. H. (1971). Direct measurement of skin conductance: A proposal for standardization. Psychophysiology, 8(5), 656–672. doi:10.1111/j.1469-8986.1971.tb00501.x

Mandryk, R. L. (2008). Physiological measures for game evaluation. In Isbister, K., & Schaffer, N. (Eds.), Game usability: Advice from the experts for advancing the player experience (pp. 207–235). Burlington, MA: Elsevier.

Mandryk, R. L., & Atkins, M. S. (2007). A fuzzy physiological approach for continuously modeling emotion during interaction with play environments. International Journal of Human-Computer Studies, 65(4), 329–347. doi:10.1016/j.ijhcs.2006.11.011

Menon, V., & Levitin, D. J. (2005). The rewards of music listening: Response and physiological connectivity of the mesolimbic system. NeuroImage, 28(1), 175–184. doi:10.1016/j.neuroimage.2005.05.053

Miller, D. J., & Robertson, D. P. (2009). Using a games console in the primary classroom: Effects of 'Brain Training' programme on computation and self-esteem. British Journal of Educational Technology, 41(2), 242–255. doi:10.1111/j.1467-8535.2008.00918.x

Moffat, D. (1980). Personality parameters and programs. In Trappl, R., & Petta, P. (Eds.), Creating personalities for synthetic actors (pp. 120–165). Berlin: Springer.

Mullan, E. (2011). Physical modelling for sound synthesis. In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey, PA: IGI Global.

Murphy, D., & Pitt, I. (2001). Spatial sound enhancing virtual storytelling. In Proceedings of the International Conference ICVS, Virtual Storytelling Using Virtual Reality Technologies for Storytelling (pp. 20-29). Berlin: Springer.

Murray, J. H. (1995). Hamlet on the holodeck: The future of narrative in cyberspace. New York: Free Press.


Nacke, L., Lindley, C., & Stellmach, S. (2008). Log who's playing: Psychophysiological game analysis made easy through event logging. In P. Markopoulos, B. Ruyter, W. IJsselsteijn, & D. Rowland (Eds.), Proceedings of Fun and Games, Second International Conference (pp. 150-157). Berlin: Springer.

Nacke, L., & Lindley, C. A. (2008). Flow and immersion in first-person shooters: Measuring the player's gameplay experience. In Proceedings of the 2008 Conference on Future Play: Research, Play, Share (pp. 81-88). New York: ACM.

Nacke, L. E. (2009). Affective ludology: Scientific measurement of user experience in interactive entertainment. Unpublished doctoral dissertation, Blekinge Institute of Technology, Karlskrona, Sweden. Retrieved January 1, 2010, from http://affectiveludology.acagamic.com

Nacke, L. E., Grimshaw, M. N., & Lindley, C. A. (2010). More than a feeling: Measurement of sonic user experience and psychophysiology in a first-person shooter. Interacting with Computers, 22(5), 336–343. doi:10.1016/j.intcom.2010.04.005

Nakamura, J., & Csíkszentmihályi, M. (2002). The concept of flow. In Snyder, C. R., & Lopez, S. J. (Eds.), Handbook of positive psychology (pp. 89–105). New York: Oxford University Press.

Norman, D. A. (2004). Emotional design. New York: Basic Books.

Öhman, A., Flykt, A., & Esteves, F. (2001). Emotion drives attention: Detecting the snake in the grass. Journal of Experimental Psychology: General, 130(3), 466–478. doi:10.1037/0096-3445.130.3.466

Panksepp, J. (2004). Affective neuroscience: The foundations of human and animal emotions. Oxford: Oxford University Press.

PaRappa the rapper. [Video game]. (1996). Sony Computer Entertainment.

Phase. [Video game]. (2007). Harmonix Music Systems.

Picard, R. W. (1997). Affective computing. Cambridge, MA: MIT Press.

Pillay, H. K. (2002). An investigation of cognitive processes engaged in by recreational computer game players: Implications for skills of the future. Journal of Research on Technology in Education, 34(3), 336–350.

Plutchik, R. (2001). The nature of emotions. American Scientist, 89(4), 344–350.

Posner, J., Russell, J. A., Gerber, A., Gorman, D., Colibazzi, T., & Yu, S. (2009). The neurophysiological bases of emotion: An fMRI study of the affective circumplex using emotion-denoting words. Human Brain Mapping, 30(3), 883–895. doi:10.1002/hbm.20553

Posner, J., Russell, J. A., & Peterson, B. S. (2005). The circumplex model of affect: An integrative approach to affective neuroscience, cognitive development, and psychopathology. Development and Psychopathology, 17, 715–734. doi:10.1017/S0954579405050340

Przybylski, A. K., Ryan, R. M., & Rigby, S. C. (2009). The motivating role of violence in video games. Personality and Social Psychology Bulletin, 35(2), 243–259. doi:10.1177/0146167208327216

Pulman, A. (2007). Investigating the potential of Nintendo DS Lite handheld gaming consoles and Dr. Kawashima's Brain Training software as a study support tool in numeracy and mental arithmetic. JISC TechDis HEAT Scheme Round 1 Project Reports. Retrieved June 6, 2009, from http://www.techdis.ac.uk/index.php?p=2_1_7_9

Quilitch, H. R., & Risley, T. R. (1973). The effects of play materials on social play. Journal of Applied Behavior Analysis, 6(4), 573–578. doi:10.1901/jaba.1973.6-573


Ravaja, N. (2004). Contributions of psychophysiology to media research: Review and recommendations. Media Psychology, 6(2), 193–235. doi:10.1207/s1532785xmep0602_4

Ravaja, N., Turpeinen, M., Saari, T., Puttonen, S., & Keltikangas-Järvinen, L. (2008). The psychophysiology of James Bond: Phasic emotional responses to violent video game events. Emotion, 8(1), 114–120. doi:10.1037/1528-3542.8.1.114

Reiter, U. (2011). Perceived quality in game audio. In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey, PA: IGI Global.

Rez. [Video game]. (2001). Sega (Developer, Dreamcast), Sony Computer Entertainment Europe (Developer, PlayStation 2).

Rhodes, L. A., David, D. C., & Combs, A. L. (1988). Absorption and enjoyment of music. Perceptual and Motor Skills, 66, 737–738.

Röber, N., & Masuch, M. (2005). Leaving the screen: New perspectives in audio-only gaming. In Proceedings of the 11th International Conference on Auditory Display (ICAD).

Rock band. [Video game]. (2007). New York: MTV Games.

Russell, J. A. (1980). A circumplex model of affect. Journal of Personality and Social Psychology, 39(6), 1161–1178. doi:10.1037/h0077714

Russell, J. A. (2003). Core affect and the psychological construction of emotion. Psychological Review, 110(1), 145–172. doi:10.1037/0033-295X.110.1.145

Ryan, R., Rigby, C., & Przybylski, A. (2006). The motivational pull of video games: A self-determination theory approach. Motivation and Emotion, 30(4), 344–360. doi:10.1007/s11031-006-9051-8

Schachter, S. (1964). The interaction of cognitive and physiological determinants of emotional state. In Berkowitz, L. (Ed.), Advances in experimental social psychology (Vol. 1, pp. 49–80). New York: Academic Press. doi:10.1016/S0065-2601(08)60048-9

Schachter, S., & Singer, J. (1962). Cognitive, social, and physiological determinants of emotional state. Psychological Review, 69, 379–399. doi:10.1037/h0046234

Schlosberg, H. (1952). The description of facial expressions in terms of two dimensions. Journal of Experimental Psychology, 44(4), 229–237. doi:10.1037/h0055778

Seah, M., & Cairns, P. (2008). From immersion to addiction in videogames. In Proceedings of BCS HCI 2008 (pp. 55–63). New York: ACM.

Shilling, R., Zyda, M., & Wardynski, E. C. (2002). Introducing emotion into military simulation and videogame design: America's Army: Operations and VIRTE. In Conference GameOn 2002. Retrieved January 1, 2010, from http://gamepipe.usc.edu/~zyda/pubs/ShillingGameon2002.pdf

SimTunes. [Video game]. (1996). Maxis (Developer).

SingStar. [Video game]. (2004). Sony Computer Entertainment Europe (PlayStation 2 & 3).

Slater, M. (2002). Presence and the sixth sense. Presence: Teleoperators and Virtual Environments, 11(4), 435–439. doi:10.1162/105474602760204327

Sweetser, P., & Wyeth, P. (2005). GameFlow: A model for evaluating player enjoyment in games. Computers in Entertainment, 3(3), 3. doi:10.1145/1077246.1077253

Tellegen, A., Watson, D., & Clark, A. L. (1999). On the dimensional and hierarchical structure of affect. Psychological Science, 10(4), 297–303. doi:10.1111/1467-9280.00157

283
Player-Game Interaction Through Affective Sound

ADDITIONAL READING

Brewster, S. A., & Crease, M. G. (1999). Correcting menu usability problems with sound. Behaviour & Information Technology, 18(3), 165–177. doi:10.1080/014492999119066

DeRosa, P. (2007). Tracking player feedback to improve game design. Gamasutra. Retrieved May 21, 2009, from http://www.gamasutra.com/view/feature/1546/tracking_player_feedback_to_.php

Edworthy, J. (1998). Does sound help us to work better with machines? A commentary on Rauterberg's paper 'About the importance of auditory alarms during the operation of a plant simulator'. Interacting with Computers, 10(4), 401–409.

Isbister, K., & Schaffer, N. (2008). Game usability: Advice from the experts for advancing the player experience. Burlington, MA: Morgan Kaufmann Publishers.

James, W. (1994). The physical basis of emotion. Psychological Review, 101(2), 205–210. doi:10.1037/0033-295X.101.2.205

Jenkins, S., Brown, R., & Rutterford, N. (2009). Comparing thermographic, EEG, and subjective measures of affective experience during simulated product interactions. International Journal of Design, 3(2), 53–65.

Kahneman, D. (1973). Attention and effort. Englewood Cliffs, NJ: Prentice-Hall.

Koster, R. (2005). A theory of fun for game design. Phoenix, AZ: Paraglyph Press.

Lang, P. J. (1994). The varieties of emotional experience: A meditation on James-Lange theory. Psychological Review, 101, 211–221. doi:10.1037/0033-295X.101.2.211

Lazzaro, N. (2003). Why we play: Affect and the fun of games. In Jacko, J. A., & Sears, A. (Eds.), The human-computer interaction handbook: Fundamentals, evolving technologies, and emerging applications (pp. 679–700). New York: Lawrence Erlbaum.

Mathiak, K., & Weber, R. (2006). Toward brain correlates of natural behavior: fMRI during violent video games. Human Brain Mapping, 27(12), 948–956. doi:10.1002/hbm.20234

Nacke, L. E., Drachen, A., Kuikkaniemi, K., Niesenhaus, J., Korhonen, H. J., Hoogen, W. M. d., et al. (2009). Playability and player experience research. In Proceedings of DiGRA 2009: Breaking New Ground: Innovation in Games, Play, Practice and Theory. Retrieved February 2, 2010, from http://www.digra.org/dl/db/09287.44170.pdf


Nacke, L. E., & Lindley, C. A. (2009). Affective ludology, flow and immersion in a first-person shooter: Measurement of player experience. Loading..., 3(5). Retrieved February 2, 2010, from http://journals.sfu.ca/loading/index.php/loading/article/view/72

Röber, N. (2009). Interaction with sound: Explorations beyond the frontiers of 3D virtual auditory environments. Unpublished doctoral dissertation, Otto-von-Guericke University, Magdeburg.

Wise, R. A. (2004). Dopamine, learning and motivation. Nature Reviews Neuroscience, 5(6), 483–494. doi:10.1038/nrn1406

Wolfson, S., & Case, G. (2000). The effects of sound and colour on responses to a computer game. Interacting with Computers, 13(2), 183–192. doi:10.1016/S0953-5438(00)00037-0

KEY TERMS AND DEFINITIONS

Affective Gaming: The research area exploring game designs and mechanics that evoke player emotions and affects.

Affective Sound: One auditory stimulus or multiple auditory stimuli (here in a gaming context) that evoke affect and emotion.

Audio Entertainment: An activity that involves the manipulation or reception of one or more sonic entities and that permits users to amuse themselves.

Empirical Methods (Quantitative): The collection of quantitative data on which a theory can be based or which facilitates reaching a scientific conclusion.

Human-Centered Design: Also known as user-centered design (UCD), a design philosophy that values the needs, wants, and limitations of users during each iterative step of the design process.

Human-Computer Interaction (HCI): The research area studying how people interact with computational machines.

Interaction Design: The creation and study of hardware devices and/or software that users can interact with.

Psychophysiology: A branch of psychology concerned with the way psychological activities produce physiological responses.

User Experience (UX): The field of study concerned with the experience people have as a result of their interactions with products, technology, and/or services.

User Studies: Experimental studies involving human participants to evaluate the impact of software or hardware on users.

ENDNOTES

1. For example, the NIA: http://www.ocztechnology.com/products/ocz_peripherals/nia-neural_impulse_actuator
2. In terms of the creation of static sound objects, the future might see the sound designer being able to think sounds rather than having to design sounds.
lection of quantitative data on which a theory can

Section 4
Technology

Chapter 14
Spatial Sound for Computer
Games and Virtual Reality
David Murphy
University College Cork, Ireland

Flaithrí Neff
Limerick Institute of Technology, Ireland

ABSTRACT
In this chapter, we discuss spatial sound within the context of Virtual Reality and other synthetic environ-
ments such as computer games. We review current audio technologies, sound constraints within immersive
multi-modal spaces, and future trends. The review process takes into consideration the wide-varying
levels of audio sophistication in the gaming and VR industries, ranging from standard stereo output to
Head Related Transfer Function implementation. The level of sophistication is determined mostly by
hardware/system constraints (such as mobile devices or network limitations); however, audio practitioners are developing novel and diverse methods to overcome many of these challenges. No matter what
approach is employed, the primary objectives are very similar—the enhancement of the virtual scene
and the enrichment of the user experience. We discuss how successful various audio technologies are
in achieving these objectives, how they fall short, and how they are aligned to overcome these shortfalls
in future implementations.

DOI: 10.4018/978-1-61692-828-5.ch014

Copyright © 2011, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.

INTRODUCTION

In the past, sound has often been a secondary consideration in visually intensive environments, such as Virtual Reality (VR) systems and computer games. However, hearing and several other perceptual modalities are now considered equally relevant to the user-experience within artificial and simulated domains. Linear soundscape composition, especially within computer games, has been facilitated by advancements in computer hardware and storage capacities. The sonic contribution of linear music to the virtual scene is extremely important, especially during gameplay, as it adds atmosphere, drama, emotion, and sometimes fantasy to the overall scene. However, interactive sounds and environmental acoustics are also important in enhancing the user-experience by immersing the user in the gameplay or VR scene. These types of sounds are, at present, still used mostly as effects rather than as authentic references to the virtual landscape. Accurate spatialization and real-time interactive sonic elements are essential if the user-experience is to be brought to the next level in future developments.

Many of the recently developed audio tools are based on well-established theory, but remain limited in their implementation of true spatial sound by hardware constraints. Among the theory that has been successfully implemented to varying degrees are techniques such as Interaural Time Difference (ITD), Interaural Intensity Difference (IID), the Doppler Shift, and Distance Attenuation. However, many more spatial attributes remain difficult to render in real time, such as high-fidelity simulation of ear geometry and head/shoulder shadow.

As mentioned earlier, some of the basic principles and techniques are now readily available to developers, but the underlying theory in this field indicates that, for true spatial sound to be delivered to the listener, individualization1 of the listening experience is key to its success. Despite the advances in hardware in recent times, all of the current spatialization techniques used within gaming and VR environments remain focused on a generalized listening experience and, as yet, no commercially viable method has been successfully implemented that achieves true individualized spatial sound. The generation of individualized Head-Related Transfer Functions (HRTFs) for commercial dissemination is one of the remaining milestones to be affected by hardware limitations. Many in the industry argue that generic solutions are sufficient for achieving an accurate sense of immersion in virtual environments for most users. This argument may well hold true, except for the fact that it cannot truly be tested until we can compare it to individualized spatial sound on a commercial scale.

In addition to the limitations of implementing individualized spatial listening, there still persists the problem of rendering accurate room and outdoor acoustics. This is, again, down to the constraints of available hardware resources. Rendering what may be considered a simple scene in the visual domain could easily entail several very complex models of various acoustically dependent elements. For example, an accurate rendition of the listener closing a room door would need to model the room itself, the door's material and structure, and the change in acoustic space during the act of closing the door (from a coupled space to a singular space). In addition, other very important factors, such as the material on the floor, walls, and ceiling, and reflective and absorbing objects within the space, also need to be modelled. All this, of course, in real time!

In spite of the current limitations to implementing commercial solutions for individualized spatialization, the industry is employing very interesting and creative workarounds. It has not only tackled the distribution of sound in virtual space intuitively, but it has also efficiently tackled problems relating to large audio data file sizes and bandwidth. Therefore, in this chapter, not only do we review sound spatialization techniques but, in tandem, we also discuss audio compression technology and how this theme goes hand-in-hand with spatialized sound for VR and computer games.
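Two of the techniques listed earlier in this introduction, the Doppler Shift and Distance Attenuation, reduce to a few lines of arithmetic, which is partly why they were implemented first. The following sketch is purely illustrative (the function names and the 343 m/s speed of sound are our assumptions, not any particular engine's API):

```python
SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 degrees C (assumed)

def doppler_shift(source_freq, radial_velocity):
    """Frequency heard by a stationary listener for a source moving at
    radial_velocity m/s toward (+) or away from (-) the listener."""
    return source_freq * SPEED_OF_SOUND / (SPEED_OF_SOUND - radial_velocity)

def distance_attenuation(distance, reference_distance=1.0):
    """Inverse-distance gain relative to the level at reference_distance,
    clamped so sources inside the reference distance are not boosted."""
    return reference_distance / max(distance, reference_distance)

# A 440 Hz engine approaching at 30 m/s is heard sharper (~482 Hz),
# and a source 10 m away plays at a tenth of its reference gain.
f_heard = doppler_shift(440.0, 30.0)
gain_10m = distance_attenuation(10.0)
```

A full engine would combine such per-frame gains with panning and filtering; the point here is only that these cues are computationally cheap compared with the HRTF processing discussed later in the chapter.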
PERCEPTUAL PROCESSING OF SOUND

The cognitive mechanisms involved in the aural perception of space are highly evolved and complex, and can be categorized into two distinct groups: direct analyses of physical/sensory information, and higher cognitive influences (see Figure 1). Both groups play a crucial role in our everyday hearing processes. Even in cases of perceived silence, background noise stimulates auditory spatial awareness by communicating spatial information about the surrounding environment to the listener, based on both acute sensory

Figure 1. A simplified outline of the human auditory processing system. Higher cognitive processes
influence what we hear and how we hear the external world

detection and environmental experience (Ashmed & Wall, 1999).

The physical interactions between moving sound-waves and static/moving objects in space are well understood (HRTF, ITD, ILD, aural occlusion, the Doppler Shift, and so on), but the higher cognitive mechanisms involved in audition have yet to be fully explored and explained. Many of these procedures are abstract concepts, such as the influence of cultural and personal experience on our sonic perception of space. This aspect of spatial sound design is vast, and scientific methodology has yet to be fully developed for many of the issues. An understanding of the metamorphosis from raw auditory sensation to a spatial awareness that has meaning is perhaps the final chapter in creating true spatial sonic interfaces for VR and computer games. However, from a commercial standpoint, these issues are for future consideration, given that convincing solutions for the "simpler" elements of spatial sound (that is, the physics of sound-wave propagation in the external environment) remain to be solved within reasonable cost and performance (Ashmed & Wall, 1999). Music is perhaps the closest, and one of the oldest, methods of interacting with higher cognitive processes. Figure 2 illustrates a forecast for the evolution of spatial sound implementation in VR and game technology.

In essence, raw auditory sensation of sound involves the physical transportation of both the attributes of the sonic event/source and the properties of an acoustic space (whether virtual or real-world) to the listener. Not only does the human auditory system detect and analyze the acoustic attributes of the sound source itself and localize it within a complex sonic soundscape, but it also determines the acoustic make-up of the space based on the difference between the direct sound and the physical effects the space has imposed on the sound as it travels through. An unfinished list of these alterations includes absorption, reflection, refraction, dispersion, and occlusion. But raw auditory sensation doesn't end

Figure 2. Evolution of spatial sound implementation in VR and game technology

there. Before it enters the middle ear, the sound is filtered by the listener's unique ear and head shape/size. These processes of filtering, along with internal higher cognitive processes, combine to infuse the processed sound with a spatial signature unique to the listener.

Human Auditory System–Sound Localization Mechanisms

The basic (physiologically based) understanding of sound localization in humans comprises two distinct categories: the first deals with the horizontal plane (left, right, front, back) and the other with the vertical plane (above and below the head position). With regard to sound localization on the horizontal plane, a number of factors come into play. One of the most obvious mechanisms is ITD, where a sound will reach one ear before it reaches the other (the speed of sound remains constant; however, apparent differences in arrival time result from phase differences between the two signals). This mechanism is most useful for frequencies between 20 Hz and 2 kHz. In ITD, sound coming directly from the right will reach the right ear 0.6 msec before reaching the contralateral left (Begault, 1993). The accuracy of the human auditory system in locating sounds based on ITD is very impressive, with studies (Begault, 1993) showing discrimination of an angle as small as 1º or 2º (which translates to a time difference of about 11 µsec!), depending on the position. Refer to Figure 3.
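The arrival-time figures above can be reproduced with a standard closed-form approximation. The sketch below uses Woodworth's spherical-head model with an assumed average head radius of 8.75 cm; it illustrates the geometry rather than reproducing any model from this chapter:

```python
import math

def woodworth_itd(azimuth_deg, head_radius=0.0875, c=343.0):
    """Approximate ITD in seconds for a source at azimuth_deg
    (0 = straight ahead, 90 = directly to one side), following
    Woodworth's spherical-head formula: (r / c) * (theta + sin(theta))."""
    theta = math.radians(azimuth_deg)
    return (head_radius / c) * (theta + math.sin(theta))

itd_side = woodworth_itd(90.0)  # ~0.66 ms: close to the 0.6 msec cited
itd_1deg = woodworth_itd(1.0)   # ~9 microseconds: the order of the 11 usec figure
```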
However, what happens when sounds are continuous and we do not have the onset information present in sudden, brief sounds? A slight variant of ITD is used instead by the human auditory system. In such cases, the phase discrepancy between the right and left ears is analyzed and the location of a sound source is determined. In other words, the peak of a wave-cycle reaches the right ear before it reaches the left. From this method ensues another problem, however. For high frequencies (in humans this would be around

Figure 3. a) Sound-wave arriving from the right. b) Sound-wave information reaches the right ear
11µsec before reaching the left. This corresponds to a sound-source detection precision of as little as 2º

2 kHz to 20 kHz), the techniques involving ITD and its variant are not applicable. This is due to the nature of high frequencies, where the periods of each cycle are very short, meaning that many cycles have occurred within the distance between both ears. Therefore, when continuous high-frequency sounds are presented to the listener, phase discrepancy becomes unreliable. A very different analysis procedure is required to determine the location of a continuous high-frequency sound source. To this end, IID is employed.

IID is a technique employed by the auditory system that describes the difference in intensity levels between sound signals arriving at both ears. In effect, this procedure takes into account the interaction between the external sound-wave and the listener's head and shoulders. As a solid body, the head and shoulders will reflect and absorb energy from a sound-wave as it travels past. In essence, this means that a sound traveling from the right will reach the right ear with a particular intensity level, but reaches the left ear with a lower intensity because of the head and shoulder interference.

The combination of ITD and IID (known as the Duplex Theory of sound localization) means that the human hearing system is very efficient at localizing sound on the horizontal plane. A very different approach is believed to take place where localization on the vertical plane is concerned.

The comparison between the inputs of a signal (time, phase, or intensity) reaching both ears is not effective when localizing sound on the vertical plane. It is easy to comprehend why: a signal coming from above or below the listener is likely to reach both ears at approximately the same time, and the head and body shadow the input of both ears almost equally. An alternative, albeit technically more complicated, analysis is used. The process entails the filtering of the signal before it enters the auditory canal due to the geometric features of the pinnae (refer to Figure 4 for a detailed view of a pinna). The folds of the pinnae reflect certain frequencies of an incoming signal and, as a sound source moves vertically, the combined direct sound and reflected sound changes dynamically. See Figure 4a showing the structure of a pinna and 4b illustrating the combination of direct and reflected signal paths before entering the auditory canal.
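The hand-over from phase-based to intensity-based cues around 2 kHz can be sanity-checked with a wavelength calculation: phase comparison stops being reliable once a full cycle fits within the path difference between the ears. The 0.21 m around-the-head path below is an assumed average, used only for illustration:

```python
INTERAURAL_PATH = 0.21  # m, rough around-the-head distance (assumption)
SPEED_OF_SOUND = 343.0  # m/s

def wavelength_m(freq_hz):
    """Wavelength in metres of a sound wave of freq_hz in air."""
    return SPEED_OF_SOUND / freq_hz

# Frequencies below ~1.6 kHz have wavelengths longer than the interaural
# path, so interaural phase is unambiguous; above that region, IID takes over.
crossover_hz = SPEED_OF_SOUND / INTERAURAL_PATH
lam_500 = wavelength_m(500.0)    # ~0.69 m: phase cue works
lam_4000 = wavelength_m(4000.0)  # ~0.09 m: several cycles fit between the ears
```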

Figure 4. a) The human pinna structure, b) a sound impinging and interacting with the pinna

SOUND PROPAGATION IN REAL SPACE

Synthetic environments in VR and computer games are very varied and sometimes very complex. Visual simulation of real-world scenes has come a long way in terms of player immersion and photo-realism. The more complex and detailed the visual representations become, the more elaborate and intricate the sonic attributes need to be in order to match user expectations. This can pose many problems for sound designers in terms of realistic acoustic simulation and sound-source emission. Again, as with the theoretical understanding of psychoacoustics, environmental/room acoustics are well understood, but the problem of implementing the theory in VR and computer games lies mainly with resource allocation and hardware constraints.

The propagation of sound can vary dramatically from scene to scene or from level to level during an instance of gameplay. The player can be within a small room enclosure in one scene and change suddenly to a wide, open space in another. Some scenes take place in unusual environments, such as under water or in outer space, where the acoustics are very different from the typical air-based medium. Not only do the shape, size, and context of the space influence acoustics, but so too do static and moving objects within that space, as well as the materials of large surfaces such as tiled walls.

Indoor Acoustics

In the most basic terms, sound in an empty room is both absorbed by and reflected off surfaces. The energy that is reradiated is dispersed around the room and the listener hears both direct and reflected sound as a result. Between each pair of opposing walls, a standing frequency and its associated multiples resonate. The standing waves that are produced express the room's resonant characteristic: there are multiple resonant frequencies in any one room. The acoustic result for the listener is a reinforcement of those resonant frequencies (by way of emphasized energy) when they are present in the soundscape (refer to Figure 5).
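The axial (wall-to-wall) members of that resonant family follow directly from the wall spacing via the standard formula f_n = n·c/2L, which the sketch below applies (343 m/s is an assumed speed of sound; real rooms add tangential and oblique modes on top of these):

```python
def axial_modes(wall_spacing_m, count=4, c=343.0):
    """First `count` axial standing-wave frequencies (Hz) between a pair
    of parallel walls: f_n = n * c / (2 * L)."""
    return [n * c / (2.0 * wall_spacing_m) for n in range(1, count + 1)]

# Walls 5 m apart reinforce ~34 Hz and its multiples along that axis;
# repeating this for each room dimension yields the room's modal signature.
modes_5m = axial_modes(5.0)  # [34.3, 68.6, 102.9, 137.2]
```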
Figure 5. Standing waves between two parallel walls

Another basic feature of indoor acoustics is reflection, which is often applied broadly in VR and computer game environments using general delay and reverberation units. In Figure 6a, the
source is emitted next to a single-walled surface early sounds increase the perceived presence of
in free space. Sound is reflected off the surface the direct sound and have a slight intensity drop
and these reflections act as if they emanate from compared to the original direct sound. Reflections
an exact copy of the original source but, instead, arriving after early sounds are usually more nu-
it is on the opposite side of the wall and the same merous, have shorter time gaps between each
distance from the wall as the original. Reflected instance and contribute to the creation of rever-
sound is generally categorized according to the beration (see Figure 6b).
time it takes to reach the listener after the direct Figure 6b illustrates a simple typical indoor
sound. Reflected sound reaching the listener ap- reflected sound. However, imagine a listener
proximately 50-80 ms after the direct sound is positioned within a simple rectangular room. The
often referred to as early sound and can be indis- sound source is now reflected off six surfaces (not

293
Spatial Sound for Computer Games and Virtual Reality

Figure 6. a) A sound is reflected from a surface in free space. The reflected sound-waves act like a copy
of the original, but at the opposite side and the same distance from the surface. Based on image from
Everest (2001), b) reverberation, comprised of direct sound, early reflections, and reverberation

to mention the listener). The concoction of direct other side, as well as filtering based on the type
sound and reflected sound, as well as taking the of material the barrier consists of, impact on the
listener’s position in the room into account, final sound heard by the player/listener.
quickly becomes more complex to simulate.
Couple this with the fact that the process of surface Outdoor Acoustics
reflection is a function of the sound’s frequency
(wavelength), then a 100% accurate simulation In an open environment, away from reflective
would need to consider any frequency within the buildings or nearby surfaces, the outdoor scene
human hearing range (circa 16 Hz to 20000 Hz, could be considered a free field in the most sim-
or wavelengths from 21.5 m to 0.017 m). To deal plistic terms. In such circumstances simple rules
with this large frequency range, sound designers can apply, such as the inverse square law, when
and audio developers conform to modes whereby dealing with sound pressure and intensity levels
frequencies are arbitrarily grouped in relation to between sound source and listener. However, in-
geometric acoustic data. In addition to room creasingly rich visual representations of outdoor
modes, reflections from static or moving objects environments within VR and computer games
also need to be considered, as well as the absorp- leads players to expect increasingly accurate
tion coefficients of the various materials. All of acoustic simulation. Bearing this in mind, future
these factors make spatial sound simulation of implementations may need to consider factors such
indoor environments resource-intensive and a as atmospheric absorption, refraction, turbulence,
complex component in overall computer game diffraction, humidity, temperature, ground mate-
and VR design. rial, and the listener’s proximity to the ground
At points where there is semi-occlusion, such (Fletcher, 2004).
as glass-free windows or a ¾ wall partition (see Sound attenuation outdoors is frequency de-
Figure 7), there is at least some transmission of pendent, with high frequencies loosing much of
sound through the barrier, some diffraction around their energy due to the elements described above.
the barrier, and some reflection around the bar- An example in nature is lightening and thunder.
rier. These indirect paths to the listener on the Lightening occurring close to the listener results

294
Spatial Sound for Computer Games and Virtual Reality

Figure 7. Sound is transmitted to the listener on the other side of an occluding barrier via through and
around the barrier, as well as reflected from surfaces such as a ceiling

in thunder that is quite rich in both high and low Furthermore, ground level gameplay may also
frequencies. However, if the listener were much incorporate wind elements that also have an effect
further from the source, only the low frequencies on sound-wave direction. Sound may be carried
would survive the distance. toward the listener if they are positioned downwind
Most gameplay occurs on or close to the vir- or they may have difficulty hearing the sound in
tual ground where sound is reflected back to the cases where they are upwind (see Figure 8c).
listener at varying rates depending on the surface Slopes and valleys also have an effect, as do
type. Some other factors that can influence outdoor ground surfaces themselves. For example, grass
sound propagation at ground level are things such surfaces do not tend to affect frequencies below
as low-lying mist or fog. These conditions alter 100 Hz, but can seriously attenuate higher fre-
the sonic properties of the outdoor environment, quencies up to 40dB/km at 1 kHz (Fletcher, 2004).
with the effect of increasing the apparent loudness Similarly, trees and foliage scatter mid to high-end
of distant sounds. This phenomenon is caused frequencies also, whilst their effect on low fre-
by a temperature irregularity, where cold air is quencies is minimal.
closer to the ground than warm. This forces some In relation to super-hero characters capable
sound-waves to bend back towards the ground of flying well above ground level, similar effects
at the point of temperature inversion (from cold in terms of sound propagation would also occur.
to warm), instead of propagating upward and These effects are predominantly due to tempera-
diminishing. See Figure 8a and 8b. However, ture change (getting colder with increased height)
an opposing factor in such an environment may and as the character flies further upward, colder
also come into play where some sound may be temperatures slow the speed of sound (the dif-
attenuated or muffled due to humidity levels in ference being as much as 10% between ground
the low-lying fog. level and 10000 meters) (Fletcher, 2004). Other

295
Spatial Sound for Computer Games and Virtual Reality

Figure 8. a) Typical dispersion of sound from a ground-based source showing the effects of warmer air
close to the ground a) and colder air close to the ground b), also, wind can affect sound energy and
dispersion c). Based on an image from Everest (1997)

factors affecting the sonic environment of our VIrtUAL sPAtIAL sOUND


flying super-hero may also need to be considered, IMPLEMENtAtION—INtrODUctION
such as turbulence, which itself can generate low
frequency sound in addition to scattering high The level of fidelity in the implementation of
frequencies. sound localization varies considerably from
system to system. Most of the time, constraints
such as network bandwidth, processing speeds,
storage restrictions, and memory considerations

296
Spatial Sound for Computer Games and Virtual Reality

limit the flexibility required by sound design- heard momentarily in a different location from
ers. Such restrictions have led scientists to find their visual origin.
imaginative alternatives of rendering spatial audio In addition to front-back confusion, vertical
in synthetic environments–using strategies such sound localization remains a significant challenge
as compact file sizes, low bit rates, client-based in VR and computer games. Until a commercially
synthetic sound rendering and so forth, without viable method for obtaining individualized HRTF
impacting on perceived sound quality. In this measurements and accurate real-time processing
section, we will review examples of some of the of head-tracking is achieved, these issues will
popular approaches to rendering spatial sound in continue to task developers who will have to
games and VR. rely on simpler approaches. The current method
In general, left-right source positioning is rela- for obtaining an individual’s HRTF is by mea-
tively easy to achieve on both headphone-based suring the right and left Head-Related Impulse
and speaker-based implementations. However, Response (HRIR), which can then be convolved
front-back sound positioning is often less success- with any mono signal source. Essentially, the
ful, especially when reproduced on headphones. HRTF is the Fourier Transform of the HRIR.
The primary reason for this is due to the inherent The HRTF measurements are usually undertaken
requirement of the listener to perform head move- in an anechoic chamber with specialized equip-
ment when determining the location of sound in ment. Most HRTF implementations in computer
front or back scenarios. A surround-speaker setup games and VR systems are derived from generic
in this respect does not have the same level of dif- HRTF databases developed from a specialized
ficulty due to the arrangement of discrete channels human head manikin or derived from an average
placed in the physical world. In more sophisticated set of HRTFs taken from a particular population.
headphone scenarios, some VR systems incor- The pinnae and head dimensions of the manikin
porate head-tracking to allow for the integration head devices are procured from statistical data of
of head orientation and movement in the virtual average human biometrics. Of course, the disad-
scene. Typically, for headphone output, however, vantage to this approach is that the player’s ear
most implementations for computer games and/ and head shape may be very different from that
or mobile devices use generic filtering processes. of the manikin, which results not only in the lack
These are especially vulnerable to difficulties of a true, individualized spatial experience, but in
when attempting to externalize the sound in front- the distortion of spatial listening cues. However,
back scenarios. The use of the term “externalize” recent research into novel ways of acquiring in-
in this context relates to the impression the listener dividualized characteristics of the human hearing
has of the sound being some distance out from system are being explored, paving the way forward
(either in front of, behind, above, or below) their to exciting developments in spatial audio for VR
own listening position. Front-back spatial sound and computer games (Satoshi & Suzuki, 2008)
through headphones usually results in sound (Otani & Ise, 2003).
sources being heard inside the head rather than
being virtually projected out–the impression of Ambisonics
depth would be realized were the sound sources
virtually projected out. Ambisonics is a spatial audio system that was
The result arising from this situation is front- developed in the 1970s by Michael Gerzon and
back confusion, something that can negatively often touted as being superior in terms of spatial
impact on a listener’s experience during gameplay reproduction when compared to commercial do-
or VR navigation, where characters or objects are mestic multichannel formats. It is a system that

Spatial Sound for Computer Games and Virtual Reality

includes audio capturing techniques, the representation or encoding of the signal as a soundfield (referred to as "B-format"), and the decoding of the signal during reproduction. Particular microphone patterns and positions, or a specially built microphone with four specifically arranged capsules (Soundfield Microphone), are required to capture the signal in a compatible way. The result is 4 signals conventionally referred to as X, Z, Y, W in first order Ambisonics:

• X = Front minus Back
• Z = Up minus Down
• Y = Left minus Right
• W = A non-directional reference signal: Front + Up + Left + Back + Down + Right.

Although considered one of the most advanced and realistic spatial reproduction systems available, the Ambisonics format has suffered from commercial setbacks. These can be attributed to bad timing in entering the marketplace, misleading associations with ill-fated quadraphonic techniques, and the lack of uptake by key music industry players during its development. However, recently the computer game industry and virtual reality researchers have stimulated renewed interest in Ambisonics.

Real-Time Processing

The real-time simulation of physical environments in computer games and VR systems requires a significant degree of real-time processing. Because of the complexity of this task, a certain amount of latency can be expected, but limits must be placed on the extent of the latency so as not to thwart the user experience. With current hardware, latency issues remain a concern, especially when attempting to reproduce a real-world scenario that requires rapid and consistent refresh rates of visual, aural, haptic, and motional events. Even within a sound-only game, the continuous motion of a sound source around the listener requires a significant amount of computation if its trajectory is to be smooth and uninterrupted. A balance needs to be struck and compromises are necessary. It is broadly accepted that update rates of 60 Hz and a total delay time of up to 50 msec are acceptable for acoustic virtual sound (Vorländer, 2008).
In many instances, a perceptual evaluation referred to as the Just-Noticeable-Difference (JND) is a useful instrument. This psychoacoustic evaluation procedure can be used for pitch differentiation or temporal differentiation, to name just two. In terms of spatial sound, it is useful to know the accuracy of the human hearing system in differentiating between the same sound source at different degrees in space. With this information, calculations whose results would not be perceptually detected by the listener can be disregarded. Some useful JND values in this regard allow for the reduction of redundant data, and resources can be used for other processes such as sound propagation. The performance of human listeners in point-to-point localization on the azimuth is most accurate in the frontal direction at 1º. On the left/right axis, the performance diminishes significantly to 10º. The rear direction is 5º and the JND on the vertical plane is the least accurate at 20º (Vorländer, 2008). Therefore, a computer game or VR system can effectively render spatial sound with diminished precision at certain locations around the listener.
Real-time binaural synthesis (the generation of spatial sound for headphone reproduction) is also a very resource-demanding process, and interesting techniques are used to address the problem. One of the methods used is to preprocess binaural sound for reproduction when the listener reaches particular coordinates and orientation in the virtual room. A best-matched filter is applied as the listener's position changes. The key to this approach is to make the changeover from filter to filter as inaudible as possible. Fast convolution is required no matter which real-time binaural synthesis approach is taken. With many channels being processed in parallel, and with the continuous updating of listener position, convolution must be rapid and dynamic with no perceived artifacts during the many transitions. Multi-processing systems have aided in achieving more realistic rendering, as have techniques such as optimized fading between impulse response updates.
Head-tracking technology also introduces an amount of latency into a real-time processing system. Usually, the technologies employed are optical, inertial, mechanical, ultrasonic, and electromagnetic. However, new developments in terms of eye tracking and image recognition are being explored to reduce the amount of hardware encumbrance placed on the user. These techniques are also finding interesting applications in the computer game industry by taking advantage of the integrated webcam facilities built in to most modern consumer computers.

Developing Spatial Sound Environments

So far in this chapter we have explored some of the key concepts in spatial sound and how they might apply to computer games and VR environments. The next section examines a number of key implementations of spatial sound. Where possible the emphasis is upon standardized implementations, such as MPEG-4, Java 3D, and OpenSL ES, which are very stable, unlikely to change in the near term, and have also informed the development of other implementations. There are also other implementations that are introduced by virtue of their prevalence within the industry.

Java 3D Sound API

Although the Java 3D API specification was originally intended for 3D graphics, it has proved to be a suitable vehicle for the rendering of three-dimensional sound. It makes sense from a developer's point of view to keep all of the three-dimensional functionality within the same API set.
The implementation of spatial sound in the Java 3D specification employs a hierarchy of nodes that comprises:

• Sound node
• PointSound node
• ConeSound node
• BackgroundSound node

There are also two Java classes for defining the aural attributes of an environment. These are the Soundscape Node and the AuralAttributes Object. Each node is defined in a SceneGraph. The SceneGraph is a collection of nodes that constitute the three-dimensional environment. The application reads the nodes and their associated parameters from the SceneGraph and constructs the three-dimensional world with that information.
The BackgroundSound node is not a spatial sound rendering node. Its purpose is to facilitate the use of ambient background sounds within the Java application. The audio input to this node is normally a mono or stereo audio file.

Spatial Sound in Java 3D

The Sound Node itself does not address the spatial rendering of the sound source: this is accomplished in one of two ways. Firstly, by explicitly constructing the spatial attributes of the sound using either the PointSound Node or the ConeSound Node, or secondly, by configuring the acoustical characteristic of an environment using the Soundscape Node.
The first technique, constructing the spatial attributes, is dependent upon the type of sound source that is being used. If the sound source is a uniformly radiating sound (positional sound) then the PointSound node should be used, otherwise the developer should use the ConeSound node (directional sound).
Distance attenuation, as implemented in the Java 3D specification, employs distance attenuation arrays, which modify the amplitude of positional and directional sound sources, and also applies angular attenuation modifications to the amplitude of directional sound sources. When a sound object is created it has to be assigned an initialGain value: if this field is empty then the value defaults to 1.0 (where 1.0 is the maximum gain and 0.0 is the equivalent of a gain value of –60dB). In relation to the generic Sound Node, no distance attenuation is applied. This would seem to be a shortcoming in the specification, as distance attenuation is one of the strongest cues in establishing depth perception for sound and should be accessible from the generic object. If the developer did not want the sound to have distance attenuation then he could simply leave the distance field blank.
The SoundScape node (refer to Figure 9) configures the acoustical properties of the listener's environment. An unlimited number of SoundScape nodes can be contained within a scene. The defined SoundScape node region determines which sets of acoustical properties are to be used. As a result of being able to specify several SoundScape node regions, one can generate a number of aural environments within the scene. For instance, within the one scene there could be three rooms, each with a different acoustical signature. Alternatively, with more detailed scene description, one could set up a number of acoustical regions within a single room using a number of SoundScape nodes.
The acoustical properties, that is, reverberation and atmospheric attributes of the SoundScape node, are specified in the AuralAttributes Object. The AuralAttributes Object is a component object of the SoundScape Node. It specifies the following aural properties: reverberation, Doppler effect, distance frequency filtering, and atmospheric rolloff. Table 1 contains a list of the parameters and their default values for when an AuralAttributes Object is first constructed.
The AuralAttributes node describes reverberation with three components: delay time, reflection coefficient, and feedback loop. Delay time is used to calculate the amount of time taken for the sound to reach the listener having undergone one reflection. This component is either set explicitly or implied by the bounding regions of the volume. Note that the bounding region is not necessarily the same as the region specified for the Soundscape Node. Delay time is measured in milliseconds.
The reflection coefficient is used to determine the attenuation factor for the sound. The reflection coefficient(s) represent the reflective or absorption properties of the environment. A value of 1.0 represents an un-attenuated sound and a value of 0.0 represents a sound that has been fully absorbed. These coefficients are applied as a uniform attenuation across the spectrum. This is not a very refined scheme, as most reflective/absorptive materials alter the spectrum of the sound in a non-uniform manner. Using the specification's present implementation of reflection/absorption, there would be very little timbral difference between a sound that has been reflected by plaster and one that has been reflected by a metallic object.
The final component, feedback loop, specifies the number of times a sound is reflected, or the order of reflection. If the feedback loop has a value of 0.0 no reverberation is performed, if it is set to one then the listener hears an echo and, if it is set to –1.0, the reverberation will continue until the amplitude of the signal dies to –60dB. This is known as effective zero in the Java 3D specification. Effective zero relies upon a –6dB drop in gain for every doubling of distance (inverse square law). Other positive values specify the number of iterations of the loop.
The rolloff parameter is used to alter the speed of sound in order to mimic the effects of atmospheric change. The default value is 0.344 meters per millisecond and this refers to the speed of sound at room temperature. This value is then altered by the scale value specified by the developer. A value greater than 1.0 will increase the speed of sound and conversely a value less than 1.0 will decrease the speed of sound.
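The "effective zero" behavior, a -6 dB drop in gain for every doubling of distance down to a -60 dB floor, can be sketched in plain Java. This is an illustration of the rule only, not the Java 3D API; the reference-distance parameter is an assumption of the sketch.

```java
// Sketch of the -6 dB-per-doubling gain rule that Java 3D's "effective zero"
// description relies on. Plain Java for illustration, not the Java 3D API.
// refDistance (the distance at which gain is 1.0) is an assumed parameter.
public class EffectiveZero {
    // Linear gain under a -6 dB drop per doubling of distance (gain = ref/d).
    static double gainAt(double distance, double refDistance) {
        if (distance <= refDistance) return 1.0;
        return refDistance / distance;
    }
    // The same gain expressed in decibels.
    static double gainDb(double distance, double refDistance) {
        return 20.0 * Math.log10(gainAt(distance, refDistance));
    }
    // Distance at which the gain reaches -60 dB: the "effective zero" point.
    static double effectiveZeroDistance(double refDistance) {
        return refDistance * Math.pow(10.0, 60.0 / 20.0); // 1000 x refDistance
    }
    public static void main(String[] args) {
        double ref = 1.0;
        System.out.printf("gain at 2 m: %.1f dB%n", gainDb(2.0, ref));
        System.out.printf("effective zero at %.0f m%n", effectiveZeroDistance(ref));
    }
}
```

Doubling the distance again (to 4 m) would give roughly -12 dB, and the signal only reaches the -60 dB floor at 1000 times the reference distance.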


Figure 9. Sound node hierarchy in the Java 3D specification

The Doppler effect is achieved by taking the value of the speed of sound, multiplying it by the velocity scale factor (which is the change of speed relative to the listener's position) and then proportionally applying the frequency scale values. If the velocity scale factor is 0.0, no Doppler effect is processed.
For further information on the Java 3D API the reader is referred to Murphy (1999a), Murphy (1999b), Murphy and Rumsey (2001), and Murphy and Pitt (2001).

XNA/XACT Implementation of Surround Sound

XNA/XACT is a collection of game development technologies produced by Microsoft for the various Windows platforms (including the Xbox console). In relation to the current mass market, 5.1-channel sound is the standard deployment for multichannel sound output. This consists of a left, center, right, left surround, and right surround channel (5) plus Low-Frequency Enhancement (.1). 7.1 sound is also popular and 10.2 (see Figure 10) is the emerging next generation of surround speaker setup. Work is also being done on 22.2-surround sound. Games generally follow either a 2.0 (stereo) or 5.1-channel standard.
5.1-surround sound is effective to a point. However, it is essentially a coarse representation of the spatial sound field. Stepping between the speakers, as sound moves dynamically from frontal speakers to surrounds, can be quite audible. A natural sound (one that emanates directly from an acoustic sound source and not via speakers) retains its timbral characteristics despite colouration from

Table 1. AuralAttribute Object Properties

Parameter Default Value


attributeGain 1.0
rolloff 1.0
reflectionCoeff 0.0
reverbDelay 0.0
reverbBounds null
reverbOrder 0
distanceFilter null
frequencyScaleFactor 1.0
velocityScaleFactor 1


room acoustics and HRTFs. As it dynamically moves from front to surround positions around the listener's head, the sound is filtered in a number of ways. However, the listener can distinctly recognize that the sound is from the same source as it retains its fundamental timbral characteristic. With exactly matching speakers, therefore, one would expect the same to hold true with 5.1-surround sound, since the speakers' discrete positions in space are simply like a natural sound source traveling around the listener's head. Unfortunately, this is not the case, as 5.1-surround is a system that represents too few steps in the sound field. Therefore, sound designers need to compensate for timbral instabilities by equalizing the signal as it reaches the surround speakers. This is a difficult task, as sound designers cannot predict the type of room in which the listener will have their gaming system.

MPEG-4 and Spatial Sound

MPEG (Motion Picture Experts Group) is a working group of an ISO/IEC subcommittee that generates multimedia standards. In particular, MPEG defines the syntax of low-bitrate video and audio bit streams, and the operation of codecs. MPEG has been working for a number of years on the design of a complete multimedia toolkit which can generate platform-independent, dynamic, interactive media representations. This has become the MPEG-4 standard.
In this standard, the various media are encoded separately, allowing for better compression, the inclusion of behavioral characteristics, and user-level interaction. Instead of creating a new Scene Description Language (SDL), the MPEG organization decided to incorporate Virtual Reality Modeling Language (VRML). VRML's scene description capabilities are not very sophisticated, so MPEG extended the functionality of the existing VRML nodes and incorporated new nodes with advanced features. Support for advanced sound within the scene graph was one of the areas developed further by MPEG. The Sound Node of MPEG-4 is quite similar to that of the VRML/Java 3D Sound Node. However, MPEG-4 contains a sound spatialization paradigm called Environmental Spatialisation of Audio (ESA). ESA can be divided into a Physical Model and a Perceptual Model.
Physical Model (see Table 2): This enables the rendering of source directivity, detailed room acoustics and acoustic properties for geometrical objects (walls, furniture, and so on). Auralization, another term for realisation of the physical model, has been defined as: "creating a virtual auditory environment that models an existent or non-existent space" (Väänänen, 1998).
Three Nodes have been devised to facilitate the physical modelling approach. These are AcousticScene, AcousticMaterial and DirectiveSound.
Briefly, DirectiveSound is a replacement for the simpler Sound Node. It defines a directional sound source whose attenuation can be described in terms of distance and air absorption. The direction of the source is not limited to a directional vector or a particular geometrical shape.
The velocity of the sound can be controlled via the speedOfSound field: this can be used, for example, to create an instance of the Doppler effect. Attenuation over the distance field can now drop to -60dB and can be frequency-dependent if the useAirabs field is set to TRUE. The spatialize field behaves the same as its counterpart in the Sound Node but with the addition that any reflections associated with this source are also spatially rendered. The roomEffect field controls the enabling of ESA and, if set to TRUE, the source is spatialized according to the environment's acoustic parameters.
AcousticScene is a node for generating the acoustic properties of an environment. It simply establishes the volume and size of the environment and assigns it a reverberation time. The auralization of the environment involves the processing of information from the AcousticScene and the


Figure 10. 10.2 enhanced surround sound. The ITU-R BS 775-1 standard for 5.1-surround is L, C, R,
LS, RS and one Sub. 10.2-surround expands this by adding an extra Sub, left and right elevated speak-
ers, back surround, wide left and wide right speakers. L = Left. C = Center. R = Right. LW = Left Wide.
RW = Right Wide. LH = Left Height. RH = Right Height. L Sub = Subwoofer, Left Side. R Sub =
Subwoofer, Right Side. LS = Left Surround. RS = Right Surround. BS = Back Surround

acoustic properties of surfaces as declared in AcousticMaterial.
Perceptual Model (see Table 3): Version 1 of the MPEG-4 standard only rendered spatial sound based upon physical attributes, that is, geometric properties. However, virtual and synthetic worlds are not constrained by physical laws and properties: it became necessary to introduce a perceptual equivalence of the physical model. To this end, two new nodes were added in version 2 of MPEG-4: PerceptualScene and PerceptualSound. Rault, Emerit, Warusfel, and Jot (1998) highlighted the merits of the perceptual approach in a document to the MPEG group:

A first advantage we see in this concept is that both the design and the control of MPEG4 Scenes is more intuitive compared to the physical approach, and manipulating these parameters does not require any particular skills in Acoustics. A second advantage is that one can easily attribute individual acoustical properties for each sound present in a given virtual scene.

The principal elements of the perceptual model are drawn from research undertaken by IRCAM's Spatialisateur project, and additional features are derived from Creative Labs' Environmental Audio Extensions (EAX) and Microsoft's DirectSound


Table 2. MPEG-4, V2. Advanced Audio Nodes–Physical Nodes

Node Field
AcousticScene params
3DVolumeCenter
3DVolumeSize
reverbtime
AcousticMaterial reffunc
transfunc
ambientIntensity
diffuseColor
emissiveColor
shininess
specularColor
transparency
DirectiveSound direction
intensity
directivity
speedOfSound
distance
location
source
useAirabs
spatialize
roomEffect

API (Burgess, 1992). Using the perceptual model, each sound source's spatial attributes can be manipulated individually, or an acoustic preset can be designed for the environment.
Fields such as Presence, Brilliance, and Heavyness are used to configure the room/object's acoustic characteristics. In all, there are 9 fields used to describe, in non-technical terms, the spatial characteristics of a room or a sound object. These fields have been derived from psycho-acoustic experiments carried out at IRCAM (the Spatialisateur project). Of the 9 subjective fields, 6 describe perceptual attributes of the environment and 3 are perceived characteristics of the source. Table 4 lists the parameters for both Environment and Source.
It can also be seen from Table 4 that the last 3 fields of the Environment section and all of the Source fields are dependent upon the position, orientation, and directivity of the source. The validity of this approach could be questioned in terms of its subjectivity, for example, the choice of words such as Warmth and Brilliance. However, the use of subjective terms as acoustic parameters, in this context, is to facilitate the non-specialist in composing a soundscape with convincing acoustic properties. This effectively opens up the complex world of acoustics to the non-specialist. For further information on MPEG spatial sound the reader is referred to Murphy (1999a), Murphy (1999b), Murphy and Rumsey (2001), and Murphy and Pitt (2001).
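To make the perceptual approach concrete, the sketch below shows one hypothetical way a renderer might translate such non-technical fields into low-level reverberation settings. The parameter ranges and formulas here are invented for illustration and are not taken from the MPEG-4 standard.

```java
// Hypothetical illustration only: one way a renderer might map perceptual
// fields such as LateReverberance and Heavyness (cf. Table 4) onto low-level
// reverberation settings. The normalized [0,1] input range and all output
// ranges below are assumptions of this sketch, not MPEG-4 values.
public class PerceptualToPhysical {
    // Clamp a perceptual value into the assumed normalized range [0, 1].
    static double clamp01(double v) {
        return Math.max(0.0, Math.min(1.0, v));
    }
    // LateReverberance in [0,1] -> reverberation time in seconds
    // (assumed range: 0.2 s for a dry room up to 10.0 s for a cathedral).
    static double lateReverberanceToRt60(double v) {
        return 0.2 + clamp01(v) * (10.0 - 0.2);
    }
    // Heavyness in [0,1] -> ratio of low-frequency to mid-frequency
    // reverberation time (assumed range 0.5 to 1.5).
    static double heavynessToLowFreqRatio(double v) {
        return 0.5 + clamp01(v);
    }
    public static void main(String[] args) {
        System.out.printf("RT60: %.2f s%n", lateReverberanceToRt60(0.5));
        System.out.printf("LF/MF ratio: %.2f%n", heavynessToLowFreqRatio(1.0));
    }
}
```

The point of such a layer is exactly what the chapter describes: the sound designer works with intuitive terms while the implementation decides how they become acoustic quantities.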


Table 3. MPEG-4 version 2. Advanced Audio Nodes–Perceptual Nodes

Node Field
PerceptualScene AddChildren
RemoveChildren
BboxCenter
UseAirabs
UseAttenuation
RefDistance
Latereverberance
Heavyness
Liveness
RoomPresence
RunningReverberance
RoomEnvelopment
Presence
Warmth
Brilliance
Fmin
Fmax
PerceptualSound direction
intensity
directivity
omniDirectivity
speedOfSound
distance
location
relParams
directFilter
inputFilter
useAirabs
useAttenuation
spatialize
roomEffect
source

Some Considerations

Head Tracking is an important tool in a dynamic virtual environment. Apart from the obvious advantages it brings to the visual presentation, it is also important in the spatial rendering of sound. According to Burgess: "The lack of these [head-related] cues can make spatial sound difficult to use. We tend to move our heads to get a better sense of a sound's direction. This 'closed-loop' cue can be added to a spatial sound system through the use of a head-tracking device" (Burgess, 1992).
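The "closed-loop" cue can be sketched very simply: the tracker's yaw reading is subtracted from the world-space source azimuth so that the binaural renderer always receives a head-relative direction. The angle convention used here (degrees, positive clockwise) is an assumption of the sketch.

```java
// Minimal sketch of feeding head-tracker yaw into spatial rendering:
// the head-relative azimuth, not the world-space azimuth, is what the
// HRTF stage should receive. Degrees, positive clockwise (an assumption).
public class HeadRelativeAzimuth {
    // Wrap any angle into the range (-180, 180].
    static double wrapDegrees(double a) {
        double w = a % 360.0;
        if (w <= -180.0) w += 360.0;
        if (w > 180.0) w -= 360.0;
        return w;
    }
    // Head-relative azimuth used to select/interpolate the HRTF pair.
    static double relativeAzimuth(double sourceAzimuthDeg, double headYawDeg) {
        return wrapDegrees(sourceAzimuthDeg - headYawDeg);
    }
    public static void main(String[] args) {
        // Source dead ahead; the listener turns 90 degrees to the right,
        // so the source should now be rendered at the listener's left.
        System.out.println(relativeAzimuth(0.0, 90.0));   // -90.0
        System.out.println(relativeAzimuth(170.0, -20.0)); // -170.0
    }
}
```

Because the rendered direction updates as the head moves, front and back positions produce different interaural changes over time, which is precisely what resolves front-back reversals.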


Table 4. Perceptual Fields for MPEG-4 Spatial Audio

Environment Fields Source Fields


LateReverberance Presence
Heavyness Warmth
Liveness Brilliance
RoomPresence
RunningReverberance
RoomEnvelopment

Recent research has shown that the use of Head Tracking reduces front-back reversals by a ratio of 2:1 (Blauert, 1997) and there is evidence that it assists in the externalisation of sources that would otherwise be located "inside-the-head". Another area where Head Tracking is helpful is in the simulation and control of the Doppler Effect and in resolving source-listener movement ambiguities. Blauert (1997) terms this "persistence":

In connection with spatial hearing, the term 'persistence' refers to the fact that the position of the auditory event can only change with limited rapidity. Under appropriate conditions the position of the auditory event exhibits a time lag with respect to a change in position of the sound source. Persistence must always be taken into consideration when using sound sources that change position rapidly. (p. 47)

Virtual Spatial Audio on Mobile Systems

Although not widely known, there have been a number of 3D audio solutions available for mobile devices for a number of years. Despite this, manufacturers have been quite slow in implementing Operating System (OS) and hardware support for these audio APIs, or only offer limited support on a select number of devices. As a consequence, third-party developers cannot rely on 3D audio effects for their mainstream applications, as support is too unreliable in a programming environment requiring compatibility with a wide variety of device models. Therefore, there seems to be an unusual scenario whereby 3D audio on mobile phones continues to be of extreme interest to interface researchers and API developers, but the practical implementation of the technology is stagnant in terms of mainstream consumerism. This is set to change, however, with the emergence of faster wireless networks, more powerful mobile operating systems, and the establishment of digital media broadcast standards for handheld devices (such as DVB-H).

JSR-234

JSR-234, or the Advanced Multimedia Supplements (AMMS), is an API initiated by Nokia and developed under the Java Community Process. It allows for more control over multimedia elements, including the creation of 3D audio environments. It is an optional supplement to the Mobile Media API (MMAPI, JSR-135) designed for J2ME/CLDC mobile devices. Refer to Sun (2010), Li (2005) and Goyal (2006) for more information on the CLDC and MIDP specifications.
MMAPI is itself an optional low-footprint API, implemented in MIDP 2.0, allowing developers to create Java applications to play back and capture audio and video in a variety of multimedia file formats, perform camera operations, stream radio over a network, generate musical tones and so forth. A large number of mobile phone devices


support MMAPI, but this is reduced when it comes attributes are capable if using MMAPI alone and
to fully implementing AMMS. MMAPI’s Manager class cannot be expanded to
It is important to point out that these multimedia include this service.
APIs are not part of the actual mobile phone OS: In order to fulfil the requirements of spatial
they do, however, enable developers to create audio rendering, AMMS was created to extend
third-party applications in the form of a MIDlet MMAPI. Of interest to us in the AMMS package
(.jar and .jad files). Therefore, whilst developers are GlobalManager, Spectator, and Module. The
cannot create 3D audio menu items for the phone’s GlobalManager class is similar in action to the
OS, it provides an excellent platform for testing MMAPI Manager class, but is also very different
the psychophysical considerations of such an in what interfaces it creates. Therefore, it does
interface in preparation for future spatial audio not extend or replace the Manager class and the
implements in the core OS. MMAPI Manager is still required to create Play-
In JSR-234, a source-medium-receiver model ers (see Figure 13). The GlobalManager handles
(Paavola & Page, 2005) is employed (see Figure the creation of Module interfaces and allows ac-
11). This approach goes beyond the capabilities cess to the Spectator class.
of MMAPI. The Modules implemented via the GlobalMa-
Using MMAPI alone, sourced audio data can nager are the EffectsModule and the Sound-
be started, stopped, paused, and primitively con- Source3D module. In contrast to MMAPI, AMMS
trolled. A summary of the MMAPI process is as allows several Players to be assigned to one audio
follows: the abstract class, DataSource, locates effect (that is, mixing several Player instances).
the audio content; a Manager class creates the This allows common effects to be applied to all
appropriate Player interface; and the Player in Players and this helps to optimize effects on lim-
turn incorporates control methods for rendering ited resources. The types of control effects pos-
and primitively controlling the audio content (see sible using the EffectsModule interface are
Figure 12). Control methods include VolumeCon- equalization, panning, virtualization, reverb, and
trol, ToneControl, PitchControl, StopTimeCon- chorus/flanger (Paavola & Page, 2005).
trol, RecordControl and RateControl. No spatial

Figure 11. An overview of JSR-234
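The resource optimization just described, several Player instances hooked into one shared effect, can be illustrated with a self-contained sketch. The class below is a stand-in for the idea, not the javax.microedition.amms API; its names and structure are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch of the AMMS idea of assigning several Players to one
// audio effect: sources are mixed first, then the single shared effect
// (here just a gain) is applied once. Illustrative stand-in, not the
// javax.microedition.amms EffectsModule API.
public class SharedEffectModule {
    private final List<double[]> sources = new ArrayList<>();
    private double gain = 1.0; // one shared control serving all attached sources

    void attach(double[] monoBuffer) { sources.add(monoBuffer); }
    void setGain(double g) { gain = g; }

    // Mix all attached sources, then apply the shared effect once, post-mix.
    double[] render(int frames) {
        double[] out = new double[frames];
        for (double[] s : sources)
            for (int i = 0; i < Math.min(frames, s.length); i++) out[i] += s[i];
        for (int i = 0; i < frames; i++) out[i] *= gain;
        return out;
    }
    public static void main(String[] args) {
        SharedEffectModule fx = new SharedEffectModule();
        fx.attach(new double[] {1.0, 0.0});
        fx.attach(new double[] {0.5, 0.5});
        fx.setGain(0.5);
        double[] out = fx.render(2);
        System.out.println(out[0] + " " + out[1]); // 0.75 0.25
    }
}
```

Applying the effect once after mixing, rather than once per Player, is exactly the saving that matters on resource-limited handsets.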


Figure 12. Basic MMAPI process. The Manager class bridges between the DataSource class and Player
interface. Primitive controls are implemented via the Player’s methods
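As a runnable illustration of the pattern in Figure 12, the sketch below mimics the Manager-creates-Player flow with stand-in classes. These are not the javax.microedition.media types, and the locator string is hypothetical.

```java
// Simplified, self-contained illustration of the MMAPI pattern in Figure 12:
// a Manager-style factory mediates between a content locator and a Player
// exposing primitive controls. Stand-in types, not javax.microedition.media.
public class MiniMmapi {
    interface Player {
        void start();
        void stop();
        boolean isStarted();
    }
    static class SimplePlayer implements Player {
        private final String locator; // where the audio content lives
        private boolean started;
        SimplePlayer(String locator) { this.locator = locator; }
        public void start() { started = true; }
        public void stop()  { started = false; }
        public boolean isStarted() { return started; }
    }
    // Mirrors the role of Manager.createPlayer(locator): choose and build
    // an appropriate Player for the content.
    static Player createPlayer(String locator) {
        return new SimplePlayer(locator);
    }
    public static void main(String[] args) {
        Player p = createPlayer("file://clip.wav"); // hypothetical locator
        p.start();
        System.out.println(p.isStarted()); // true
    }
}
```

The real MMAPI Player additionally hands out control objects (VolumeControl, PitchControl, and so on); the point here is only the factory shape of the Manager/Player relationship.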

The SoundSource3D module performs control effects specifically relating to positioning sources in the virtual space. These include sound source directivity (sound cones), distance attenuation, Doppler shift, location control, specifying the dimensions of an extended sound source (not a point source), obstruction of the sound traveling from source to listener, and reverberation effects (JSR-234 Group, 2005; Paavola & Page, 2005).
The Spectator class represents the listener in the virtual space (JSR-234 Group, 2005). As with other sound sources in the virtual space, the listener/spectator must also possess spatial cues. The controls associated with the Spectator class are location, Doppler, and orientation.

OpenSL ES

OpenSL ES is an open standard API for interactive spatial audio for embedded systems developed by the Khronos Group (2009). Target devices for OpenSL ES are basic mobile phones, smartphones, PDAs, and mobile digital music players. It is a C-language audio API with some overlap with OpenMAX AL 1.0, a multimedia/recording API for embedded systems from the same group. Like the relationship between MMAPI and AMMS, OpenMAX AL has basic audio capabilities, with OpenSL ES providing advanced 3D audio and sonic effects, and both share many common methods. However, unlike MMAPI and AMMS, both OpenMAX AL and OpenSL ES are entirely independent and each can perform as a standalone API on target devices.
Three Profiles are present in the OpenSL ES implementation: Phone, Music, and Game (see Table 5). Different audio capabilities exist for different profiles but manufacturers are free to implement two or all three profiles on their devices. All features of a given profile have to be implemented by the manufacturer in order to ensure compatibility. Therefore, a manufacturer implementing the Phone profile, but wanting to


Figure 13. MMAPI Manager creates Players. AMMS GlobalManager creates Modules that Players
hook into

incorporate elements of the Game profile, must fully implement the Game profile also.

CONCLUSIONS AND FUTURE DEVELOPMENTS

Spatial presentation of sound is a very important feature of VR and is becoming more important in computer games. Without spatial sound, virtual environments would lack the complex qualities required for a convincing immersive experience. The development of synthetic sound spatialization techniques for immersive environments has lagged behind comparable visual technology. However, there now exist a number of options for developers who wish to incorporate spatial sound into computer games and VR.
Java 3D provides a collection of tools that enable developers to integrate spatial sound in a virtual environment. While the API set allows for the construction of reasonable spatial sound experiences, there are a number of shortcomings which hinder the advancement of Java 3D for spatial sound environments: most notable of these are the lack of support for HRTFs and issues around real-time processing in Java.
MPEG-4 version 2 is where the main breakthroughs in the integration of spatial sound in an international standard have been achieved. To cater for both ESA and absolute sound rendering, a dual approach has been developed. This dual approach of both physical and perceptual descriptions of spatial sound seems to encapsulate all of the necessary attributes for a cogent spatial experience.
XNA opens up spatial sound for game development. While not an open platform, unlike OpenAL, the development environment is accessible and fits neatly into the XNA architecture of computer game development.
The future holds some remarkable potential for spatial sound in computer games: the designer's imagination being the only limiting factor. Imagine being immersed completely in sound, where gameplay relies heavily on the sense of hearing. Walking down a dark corridor in a first-person shooter, hearing your footsteps below you, en-


Table 5. The table shows audio-only information for the profiles associated with OpenSL ES. MIDI specifications have been excluded from the table. Adapted from Khronos Group (2009)

API Feature Phone Music Game


PLAYBACK/PROCESSING CONTROLS
Play multiple sounds at same time YES YES YES
Playback mono & stereo YES YES YES
Basic playback controls YES YES YES
End-end looping YES YES YES
Partial looping NO NO YES
Set playback position YES YES YES
Position-related notifications YES YES YES
Sound prioritization YES YES YES
Audio to several concurrent outputs YES NO NO
Volume Control YES YES YES
Audio balance & pan control NO YES YES
Metadata retrieval NO YES YES
Modify playback rate & pitch NO NO YES
Play sounds from secondary store YES YES YES
Buffer/queues NO NO YES
CAPABILITY QUERIES
Query capabilities of implementation YES YES YES
Enumerate audio I/O devices YES YES YES
Query audio I/O device capabilities YES YES YES
EFFECTS
Stereo widening NO YES YES
Virtualization NO YES YES
Reverberation NO YES YES
Equalization NO YES YES
Effect control NO YES YES
3D AUDIO
Positional 3D audio NO NO YES
Sound cones NO NO YES
Multiple distance models NO NO YES
Source & listener velocity NO NO YES
Source & listener orientation NO NO YES
3D sound grouping NO NO YES
Simultaneous render of multiple 3D controls NO NO YES

vironmental sounds coming from air ducts and doorways. Suddenly, you hear a noise behind and to your left; you turn to be confronted by a ghastly beast who wants you for lunch. You fire your weapon, the piercing impact of the firing mechanism on your ears, the sound reverberat-
ing and interacting with the room, shell casings tinkling on the floor, and the creature falls to the ground with a resonating thud. The application of spatial sound will advance and drive the narrative and drama in a computer game, and ultimately lead to an immersive user experience.

REFERENCES

Ashmead, D. H., & Wall, R. S. (1999). Auditory perception of walls via spectral variations in the ambient sound field. Journal of Rehabilitation Research and Development, 36(4).

Begault, D. R., & Wenzel, E. M. (1993). Headphone localization of speech. Human Factors, 35, 361–376.

Blauert, J. (1997). Spatial hearing: The psychophysics of human sound localization (rev. ed.). Cambridge, MA: MIT Press.

Burgess, D. (1992). Techniques for low cost spatial audio. In Proceedings of the 5th Annual ACM Symposium on User Interface Software and Technology.

Duda, R. O., Algazi, V. R., & Thompson, D. M. (2002). The use of head-and-torso models for improved spatial sound synthesis. In Proceedings of the 113th Audio Engineering Society Convention.

Everest, F. A. (1997). Sound studio construction on a budget. McGraw-Hill.

Everest, F. A. (2001). Master handbook of acoustics. McGraw-Hill.

Rossing, T. D., & Fletcher, N. H. (2004). Principles of vibration and sound (2nd ed.). New York: Springer.

Goyal, V. (2006). Pro Java ME MMAPI: Mobile media API for Java Micro Edition. Apress.

JSR-234 Group. (2005). Advanced multimedia supplements API for Java 2 Micro Edition. Nokia Corporation.

Khronos Group. (2009). OpenSL ES specification. The Khronos Group.

Li, S., & Knudsen, J. (2005). Beginning J2ME platform: From novice to professional (3rd ed.). Apress.

Bear, M. F., Connors, B. W., & Paradiso, M. A. (2007). Neuroscience: Exploring the brain (3rd ed.). Lippincott Williams & Wilkins.

Murphy, D. (1999). A review of spatial sound in the Java 3D API specification. Institute of Sound Recording, University of Surrey.

Murphy, D. (1999). Spatial sound description in virtual environments. In Proceedings of the Cambridge Music Processing Colloquium.

Murphy, D., & Pitt, I. (2001). Spatial sound enhancing virtual story telling. Springer Lecture Notes in Computer Science, 2197.

Murphy, D., & Rumsey, F. (2001). A scalable spatial sound rendering system. In Proceedings of the 110th AES Convention.

Otani, M., & Ise, S. (2003). A fast calculation method of the head-related transfer functions for multiple source points based on the boundary element method. Acoustical Science and Technology, 24(5), 259–266. doi:10.1250/ast.24.259

Paavola, M. K. E., & Page, J. (2005). 3D audio for mobile devices via Java. In Proceedings of the AES 118th Convention.

Rault, J. B., Emerit, M., Warusfel, O., & Jot, J. M. (1998). Audio rendering of virtual room acoustics and perceptual description of the auditory scene. ISO/IEC JTC1/SC29/WG11.

Yairi, S., Iwaya, Y., & Suzuki, Y. (2008). Individualization of head-related transfer functions based on subjective evaluation. In Proceedings of the 14th International Conference on Auditory Displays.


Sun Microsystems. (2010). Java ME API. Retrieved February 4, 2010, from http://java.sun.com/javame/reference/apis.jsp

Väänänen, R. (1998). Verification model of advanced BIFS (systems VM 4.0 subpart 2). ISO/IEC JTC1/SC29/WG11.

Vorländer, M. (2008). Auralization: Fundamentals of acoustics, modelling, simulation, algorithms and acoustic virtual reality (1st ed.). Berlin: Springer.

KEY TERMS AND DEFINITIONS

API (Application Programming Interface): A mechanism in software engineering that allows two separate pieces of software to integrate or interface, for instance a library of functionality being integrated into an application.

Doppler Effect: The apparent shift in pitch/frequency of a sound due to relative motion between the sound source and the listener.

Fast Convolution: Convolution is an operation combining two signals. A fast convolution is an optimised circular convolution in which the signals' spectra are multiplied, typically in conjunction with a Fast Fourier Transform.

Free-Field: An open space/environment which does not interact with the sound source (as opposed to 'room interaction' in a closed space).

Head Tracking: The tracking of position and orientation of head movement by an external sensor in VR or computer games.

HRTF: Head-Related Transfer Function.

Occlusion: An obstacle that blocks the effective transmission of sound by absorbing energy or reflecting sound waves.

ENDNOTE

1. Individualization refers to a listening experience that is tailored to and unique to the individual listener.
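The Doppler Effect entry above can be made concrete with a one-line calculation. This is the standard textbook relation for a stationary medium and motion along the line joining source and listener; the function name and the speed-of-sound value (dry air at roughly 20 °C) are illustrative assumptions, not part of the chapter:

```python
SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 degrees C (assumed value)

def doppler_shift(freq_hz, source_speed=0.0, listener_speed=0.0):
    """Perceived frequency for motion along the source-listener line.

    Positive source_speed means the source moves toward the listener;
    positive listener_speed means the listener moves toward the source.
    """
    return freq_hz * (SPEED_OF_SOUND + listener_speed) / (SPEED_OF_SOUND - source_speed)

# An approaching source is heard sharp, a receding one flat:
approaching = doppler_shift(440.0, source_speed=34.3)   # above 440 Hz
receding = doppler_shift(440.0, source_speed=-34.3)     # below 440 Hz
```

In a game engine the two speeds would be taken per frame from the velocity vectors of the emitter and the listener, projected onto the line between them.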

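The Fast Convolution entry nearby can be illustrated with a short sketch: a pure-Python radix-2 FFT, zero-padding so that the circular convolution equals the linear one, and multiplication of the two spectra (the frequency domain is where the multiplication takes place). This is a teaching toy under those assumptions, not a production routine:

```python
import cmath

def fft(x, inverse=False):
    """Recursive radix-2 Cooley-Tukey transform; len(x) must be a power of two.
    The inverse transform is left unscaled (divide by len(x) afterwards)."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out

def fast_convolve(a, b):
    """Linear convolution of two real sequences via the FFT."""
    # Pad to the next power of two >= len(a) + len(b) - 1 so the circular
    # convolution does not wrap around, then multiply the spectra.
    n = 1
    while n < len(a) + len(b) - 1:
        n *= 2
    fa = fft(list(a) + [0.0] * (n - len(a)))
    fb = fft(list(b) + [0.0] * (n - len(b)))
    product = [p * q for p, q in zip(fa, fb)]
    out = fft(product, inverse=True)
    return [(v / n).real for v in out[:len(a) + len(b) - 1]]
```

For example, fast_convolve([1, 2, 3], [4, 5]) gives the same result as direct convolution, [4, 13, 22, 15], up to floating-point rounding.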

Chapter 15
Behaviour, Structure and Causality in Procedural Audio1
Andy Farnell
Computer Scientist, UK

ABSTRACT
This chapter expands some key concepts and problems in the emerging field of procedural audio. In ad-
dition to historical, philosophical, commercial, and technological themes, it examines why procedural
audio differs from earlier “computer music” and “computer sound”. In particular, the extension of sound
synthesis to the general case of ordinary, everyday objects in a virtual world, and the requirements for
interactivity and real-time computation are examined.

DOI: 10.4018/978-1-61692-828-5.ch015

Copyright © 2011, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.

INTRODUCTION

Procedural audio is sound as a process. Instead of thinking about nouns, we think of verbs, or the actions that cause sounds. Procedural sound is also a structural and reasoned, rather than purely sensible, approach to sound, in which behaviour supplants identity and we ask not what sound is, but what it does.

This chapter follows the publication of Designing Sound (Farnell, 2008) in which I lay foundations for procedural audio, particularly its use in real-time virtual worlds. Here I would like to talk about the ideas of behaviour, causality and structure which are implied by the identification of sound as a branch of dynamics requiring energetic change.

The task at hand is producing sound for film, computer games, or other interactive entertainment applications. This task requires creativity, knowledge and understanding. Creativity in traditional sound design is directed at capturing, curating and matching audio data to depicted circumstance, whereas creation of procedural sound is from first principles: so it is truly design as opposed to selection. Insights into process are therefore at the root of the work.

The end product is sound as code, or sound objects (Polotti, Papetti, Rocchesso, & Delle, 2001). One must create new digital signal processing (DSP) algorithms, or a set of parameters for existing sound objects, rather than actual sound files. The product is "potential sound", rather than particular audio data as it would appear at the digital-to-analog converter (DAC). The complete package of a sound object is DSP, control, and encapsulation code compatible with a set of parameters to be supplied in the future. Objects may be instantiated and animated at some later time, in any contrived circumstance. We say this media has deferred form, exhibiting desired behaviour according to supplied runtime parameters.

Procedural Audio as a Design Philosophy

It's worth adding that the above goal is not the exclusive end. Merely adopting a procedural way of thinking about sound can inform and enhance traditional design approaches. For this reason I teach a sound design syllabus based upon this premise. Students begin the first semester by considering the physical processes inherent in all sound sources. They then progress to devising models and appropriate synthetic methods in an implementation language. This deeper understanding enables choices and creativity through structural metaphor and simile rather than only empirical (surface) features.

For example, we study the design of bells with reference to design traditions on shape and material properties. Combining metallurgy and geometry (which leads to modal interpretation) with the study of basilar physiology and the sensory psychoacoustics of Barkhausen/Zwicker (1961) and Plomp and Mimpen (1968), we arrive at a firm understanding of why some bells are sonorous, dissonant, hollow, or foreboding. In another exercise we decompose the sound of fire to arrive at the components corresponding to physical processes of combustion. This results in models of fire with surface parameters like crackles, hiss, and roar, which can be abstracted further into scalable fire models for burning trees or liquid fires. Hybridisation of models at an abstract level is possible, perhaps to combine fluid models like bubbling mud with fire to create lava flows.

If procedural audio implies the sound is based on process, then behavioural audio implies that the internal model and supplied parameters reflect the behaviour of the target object in some way. What is behaviour? Behaviour relates environmental stimulus to what is observed in time. It is a strange concept because it must both change and, to some degree, remain fixed in time. A purely signal interpretation might define behaviour as repeatable changes in one or more output features in light of one or more supplied, dependent variables, with little restraint on the whole cavalcade of qualifiers (discrete, continuous, linear, non-linear, causal, or non-causal) that might apply. A useful behaviour might be supposed to be time invariant on a medium to long scale (in the order of seconds). In other words, behaviour that changes too rapidly ceases to be observed as behaviour.

When talking about behaviour we might incorporate statefulness into sound objects: for example, whether a container is full or empty will cause it to respond differently to a collision. Within a larger context behaviour implies a narrative. A drink can or tumbleweed in the street is destined to roll around as it is blown in the wind. Of course it can do many things, but rolling around is its "script", its purpose in life. One way of understanding the narrative is a context that places particular emphasis on certain features. In real life the function of glass windows is to keep out the wind while allowing light in. In a first-person shooter their primary concern (understood tragic destiny) is to be shot at and broken. As we will examine later, this change of narrative focus (and thus reality) is both a dynamic force in interactive sound, and the source of an error in some programming approaches that assume the need for a literal, uncoloured interpretation of reality.

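The statefulness idea above, a container that answers the same collision differently depending on whether it is full or empty, can be sketched as a toy sound object. The class, its names, and the crude mappings (a fuller container assumed to ring lower and decay faster) are illustrative assumptions, not code from Designing Sound:

```python
import math

class ContainerSoundObject:
    """A toy stateful sound object: the response to a 'collide' stimulus
    depends on internal state (fill level), so behaviour is a mapping
    from stimulus plus state to an output signal."""

    def __init__(self, fill_level=0.0, sample_rate=8000):
        self.fill_level = fill_level    # 0.0 = empty, 1.0 = full
        self.sample_rate = sample_rate

    def collide(self, force, duration=0.25):
        # Crude stand-ins for mass loading and damping by the contents:
        # more liquid lowers the ring pitch and shortens the decay.
        pitch = 800.0 * (1.0 - 0.5 * self.fill_level)    # Hz
        damping = 6.0 + 18.0 * self.fill_level           # 1/s
        n = int(duration * self.sample_rate)
        return [force
                * math.exp(-damping * t / self.sample_rate)
                * math.sin(2.0 * math.pi * pitch * t / self.sample_rate)
                for t in range(n)]

# The same stimulus produces different behaviour in different states:
empty_hit = ContainerSoundObject(fill_level=0.0).collide(force=1.0)
full_hit = ContainerSoundObject(fill_level=1.0).collide(force=1.0)
```

The synthesis is deliberately trivial; the point is that the stimulus-to-response mapping is conditioned on state, which is what separates a behavioural sound object from a triggered sample.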

Must Procedural Audio be Real-Time?

Loose definitions of procedural audio might omit the requirement for real-time computation because behaviour and process can be described and computed in an offline system. In some ways the boundary between what is offline and real-time is only a matter of degree, the amount of CPU power available. In other ways there is a stark discontinuity in the temporal design of some models that means they cannot cross the divide. An example is found in the Csound implementation layer, based originally on an offline musical score model in which event durations are given a priori. Although advances in machine technology now allow real-time deployment, sound objects must be structurally redesigned to incorporate "on the fly" MIDI events. The underlying Csound model requires different operations for each approach, which cannot be mixed. One must therefore approach a Csound design with this in mind at the start.

For film, pre-recorded radio and other static media, parameters are applied up-front, then the sound is recorded. Computer sound has been used in this way for decades, and indeed this inherits from the MUSIC-V lineage, which includes Csound. For computer sound, this rendering predates similar CGI concepts by a long time. A slightly different use is also possible. Where the synthesis parameters accompany a visual timeline, such as in animation, and where sufficient offline computing power is available, we have a powerful editing system in which changes to the visual elements are automatically reflected in the audio track, which is rendered along with the images. This changes the temporal status of what is traditionally post-production, because sounds don't have to be created after the picture but are designed as part of it, or even beforehand. Here is a subtle change in the use of the technology rather than the deeper nature of the technology itself.

To distinguish these (past, present, and future) understandings of production I use the terms prior, concurrent and deferred computation. Those coming from a computer animation background, such as in the work of Zheng and James at Cornell, and Moss and Yee at North Carolina, seem to embrace the concurrent model in which realistic sounds are a side effect of realistic physics-based animation. However, a fuller definition of procedural and behavioural audio would add the requirement of real-time computation and imply something further about delivery as well as generation. Procedural audio differs from longer established computer music in several ways. It tackles the general case of sound synthesis, not only those sounds which are musically pleasing. It is real-time and interactive. And it anticipates deferred form (and unexpected interactions) as a philosophy. A concurrent approach does not handle non-diegetic sources, high energy events or, at present, more complex systems than rigid body collisions and simple fluid dynamics. We put the full power of procedural audio to use in the context of computer games when we extend it to the general case of all natural sounds in an unpredictable environment. It is for live installation (arts and theatre), virtual reality, or games that sound design with procedural sound objects comes into its own, because we can make parametric decisions in real-time, which means we can create sound for situations that have not been planned in advance.

This way of thinking is a departure from contemporary sound design à la Hollywood. In some ways it returns us to the era before Foley, of the live theatre sound effects artist employing props: a reactive, adaptive and intelligent sound maker. So, as for nomenclature, we might better employ the established expression "synthesised sound" for prior media like music and film, while reserving "procedural audio" for deferred applications with a real-time element in which sound is created in the moment, as required. In prior media, the use of the word "sound" implies it has already been realised (concretised) as a signal, while "procedural audio" is still code (potential sound). You


can’t just play procedural audio; you have to run other examinations of design, this continues in a
it, in a context that makes sense. “dialectic” testing the new synthesis against the
internal model and external reference, and iterating
Procedural Audio is More over successive improvements. There are several
than Physical Modelling flavours of analysis that can be employed, such
as perceptual analysis and signal (correlation,
Since it is often conflated precisely with the words spectrum, transient) analysis. The one we will
synthesis or physical modelling, let’s be clear consider in this chapter is physical component
that procedural audio isn’t necessarily either of analysis. Without an interpretive intermediate
these subjects. Physical modelling (insomuch as stage, analysis-synthesis is merely transcoding,
it refers to finite elements, tensor matrices, mass in the sense of a Shannon-Weaver signal. In other
spring damper systems, waveguides and other words we could resynthesise a sound transpar-
implementations) and synthesis (as it pertains ently from an analysis and gain nothing (except
to known parametric methods such as additive, data reduction useful for transmission). A proper
granular or non-linear), are pieces of a larger analysis yields behavioural features and concepts
picture. This picture also comprises psychology that can be meaningfully manipulated to give new
(perceptual psychoacoustics and auditory cogni- sound transformations. We get proper handles on
tion), philosophy and epistemology, and object things. An analysis that makes sense will allow us
domain knowledge. to creatively intervene in a reasoned way because
It is in their combination that we attempt to we are able to abstract something useful from
reduce a sound to its behavioural realisation. The the example sounds. This is dealt with in several
culmination of these disciplines is to program places in Representations of Musical Signals (De
physically informed synthetic or composite source Poli et al 1991).
sound objects with behaviourally informed re- An analysis works best when carried out over
sponses to input conditions. This involves some a long time, or with numerous examples of the
measure of simplification and necessary under- target rather than with a single snapshot. Like
standing and is not an attempt to construct a one- tomography, it is an attempt to reveal what is
to-one model of a system. So, by programming, hidden within by integrating views. Imagine a
I do not mean only models, synthesis methods, wrapped present that you shake, weigh and tap,
and implementations but, rather, the total set of trying to guess what is inside. A stimulus provokes
principles required to get good behaviour, effective a response. Consistent responses, mapping domain
over a range of use cases. In this respect, procedural to range, constitute a behaviour and, as we think
audio intersects with some of the interesting stuff of the object as a “black box”, by observing the
of orthodox of sound design, the tricks, shortcuts, common responses to lots of environmental stimuli
perceptual devices, simplifications, and decep- we thus reveal something about the object. In the
tions of psychology and storytelling. case of an idiophonic object, say a snare drum, the
response may be immediate and consistent, while
the increasing complexity of machines and living
bEHAVIOUr AND ANALYsIs organisms may require deeper analysis and yield
less predictable behaviour. At the limits of behav-
Starting with a concept, analysis is the beginning of iourism, we can either open the box to reverse
the real work. It seeds mental prototypical designs engineer the system or use signal (superficial)
which in turn lead to synthesis. As described in analysis with a phenomenal synthetic method.
Designing Sound, Computer Sound Design and There may also be accidental characteristics,


untypical of the class and peculiar to the target, like an intermittent rattle due to a broken part.

All other environmental variables should remain constant. But this ideal of independent parameters is rare in the real world, so it's often necessary to play with several data sets. The environment we work in is the physical world. This is another way of saying that physics is an objective representation of reality with which we can describe sound signals (and their effect) produced by objects. We describe the perceived signal in terms of the things that produce the signal, object and stimuli. Clearly a knowledge of sound physics is useful, particularly mechanics, solid vibrational physics, fluid dynamics, and gas phase acoustics given in standard textbooks (Elmore & Heald, 1969; Subrahmanyan & Lal, 1974).

MODELS, METHODS AND IMPLEMENTATIONS

Having obtained an analysis, an understanding of the target, we wish to produce a design which simulates its salient features and behaviours. We break the design of sonic objects into three conceptual strata: a model, which is an abstract representation of desired behaviours; one or more methods, which allow us to realise audio signals from a behavioural description; and an implementation, which provides the vehicle for delivery. These decouplings, which I observed during many years of synthetic sound design, allow modular software engineering practice in which each layer may be replaced whilst leaving the others intact. It becomes possible to construct a sounding object, and then completely replace its sound synthesis method, perhaps swapping a subtractive for FM method, or re-implementing the same model and methods on a different DSP platform.

Models

We could reduce running water to a model of overlapping exponentially frequency-modulated sinusoids corresponding to Helmholtz oscillators caused by entrained air cavities. We can reduce fire to a componentised model of crackles, hisses, and low frequency noise corresponding to physical features like fuel fragmentation, gaseous expansion, and turbulence during combustion. At the surface, it is only necessary to express the model in clear words, or in simple formulae with correctly ordered relationships between features.

Notice that the first case can be taken superficially, only as a surface (signal) analysis, without asking why the water comprises patterns of exponentially rising sine bursts. We could just make note of their average pitch, duration, overlap, and density, arriving at a broad model involving spectral centroid and flux. Or, we can delve deeper into the causes of features. The latter approach brings physical behaviour into the picture and can offer meaningful performance parameters, while the former can only ever yield a brittle, phenomenological model. The term performance is borrowed from computer music where we would probably be dealing with a musical instrument model. For the general case of synthesis, a model is "performed" by actions from its environment or energy changes within itself. For a babbling brook we might say its performance parameters are speed of flow, and depth of the water.

What we desire from a procedural model is that it presents a parametric (performance) interface, with the smallest set of useful control signals corresponding to forces or values in the environment relevant to the narrative. We say the model captures the sound source well when this correspondence is extensive and efficient. These might be fixed, like the height of a waterfall, continuously variable values, such as speed of fluid flow or temperature, or a discontinuous or ordinal feature set such as a texture tag (wood, metal, stone) taken from the object's properties


or the surrounding world (such as footsteps over changing ground materials). Making this mapping well behaved can be a difficult programming task that requires an artistic, human contribution.

A model must also explain/manifest the relationship between its own parts or subsystems. For example, a helicopter is a complex machine with many parametrically coupled sources: an engine, gear box, exhaust system, and two propellers. Their individual performance parameters are linked by a higher set of control equations that accord with proper flight manoeuvres. The ideal set of parametric controls, from the viewpoint of game audio design, would be the parameters used to actually fly the vehicle (plus an observation point vector). Consequently, procedural audio relies on much tighter coupling with physics computation, at least if efficiency gains are to be made by avoiding duplicate calculations.

Methods

Methods are the way we concretise models. They connect the abstract model to the solid implementation as DSP. Methods are drawn from an extensive set of known techniques that provide particular spectral or time domain signal behaviour from an assumed input parameter set, such as additive Fourier, subtractive, non-linear waveshaping, FEM (lumped mass), waveguide, MSD (elastic), FM, AM, granular, wavelet, wavetable, fractal, Walsh and others. These are essentially mathematical formulations, functions of time, systems of linear and non-linear equations from which we obtain audio signals. They fall loosely into two classes: parametric (signal) and source (physical). In the former, the relationship between model and implementation must be described by the method. In the latter, the mapping of model to output signal is the method. The latter promises unparalleled realism at the expense of computation cost and lack of abstract control, while the former trades detailed realism for cost savings and versatile control. Pragmatic implementations often harness both classes.

Synthetic methods are not limited to the production of signals at the audio rate; they apply also to control systems above the audio DSP level, or any other model feature that can be computed. Examples are the rolling drinks can in Designing Sound, which has an inertial control model distinct from the sound production, or the fragmentation models of Zheng and James, which concentrate on the control level of particle debris and leave the actual sound generation to precomputed proxies. In some racing games it is the sonic behaviour of the engine and vehicle as an overall system that could be described as procedural, while the synthesis itself is at best a naïve granular, and at worst carefully hand-blended sample loops played back under the control of the vehicle performance model.

As a systematic example let's return again to the helicopter, whose engine is modelled with a parabolic pulse source and waveguide network, gearbox by a closed form additive expression, and the blades by a subtractive (noise-based) method. But their inter-relation for given flight manoeuvres may be calculated by differential equations at the control rate. These controls are of course independent of the DSP used to compute the audio signatures, which could be replaced by cheaper methods for reduced level of audio detail (LOAD) as the vehicle recedes into the distance. Yet overall, the model still retains its behaviour as the detail of the sound (and cost of production) diminishes. Dynamic adjustment or replacement of model components or methods to obtain changing level of detail is one very powerful aspect of properly stratified procedural audio.

Implementations

Actual implementation of audio signals requires knowledge of practical computer engineering and is not a subject to ponder deeply here. We shall just glance over it, because it changes from case


to case, and for a high level treatment we can take exchangeable implementations as a given. It's worth noting that there is an axiomatic set of a few operators, including such "atoms" as the unit delay (Z-1) and arithmetic operators (plus, minus, multiply, divide), from which all of DSP can be obtained. In native implementations, many constructions can be built upon this foundation, such as the sin() and cos() functions approximated by polynomials (truncated Taylor/Maclaurin), familiar band limited waveforms built on known identities, continued fractions or closed forms, and standard filter blocks built on established topologies (biquad and so forth). Recently attention has been drawn back to computational physical methods (specifically finite difference methods for one and two dimensional wave equations) which have come from Bilbao (2009) and which, combined with the earlier waveguide work of Smith (1992) and trends toward matrix processors, offer hope for less expensive versions of direct computation for plates, surfaces, tubes and volumetric extents.

In practice, as you develop procedural audio systems, current technologies and runtime constraints will determine implementation for you. Designing Sound gives some coverage, both as proof of methodology and an example set for implementation using the Pure Data language. Many of these have been translated to SuperCollider by Dan Stowell at Queen Mary University, London. We must also remain mindful of low level programmatic and architectural issues, such as why to avoid memory lookups, data bus throughput, stack frame overheads, the danger of denormals, and even simple things like watching for divide by zero, though these are addressed at the C code, compiler or assembly level and depend on the underlying hardware too.

The choice of a language and its features is another subject. Visual dataflow is certainly an advantage for productivity, readability and reuse. It has a natural congruence with the thought process during design. It has a wonderful granularity combining simplistic concretes alongside abstract concepts. Yet in the form I use it, as Pure Data, it is far from perfect. Nonetheless it probably represents the best hope currently available for designing procedural audio due to the ease with which large programs can be constructed. A nice overview of language design for the particular case of musical synthesis is by Günter Geiger (2005). The general case of synthesis (everyday sound effects created using procedural sonic objects) is not addressed by Geiger's study. While Pure Data seems ideal for design, deployment may be much better served by those languages having a "built from the ground up" client-server model like Chuck or SuperCollider.

Beyond a comparison of algorithmic complexity in relative technologies, such as Avanzini (2001), a research opportunity exists here for cost metrics of implementations on modern vector processors, such as CUDA on Nvidia GPU (see Angus and Caunce, 2010). Briefly, the qualities to be considered are trans-structural concurrency (parallelisation), for which functional LISP-type evaluations seem suited (though suffering from overheads associated with functional languages), plus those that handle instance concurrency (source polyphony), like SuperCollider or Chuck. Other issues are core language size (for example, MPEG4 SAOL, Csound or Lua Vessel), scalability, whether objects should be pre-compiled or whether an interpreter or server/client model is best, and licensing for commercial development (for example, Pure Data or Max/MSP).

COST AND VALUE

Of those design qualities on orthogonal axes, cheap, good or fast, the ground in procedural audio is similarly divided between a mixture of expedient design, aesthetic realism, efficient code, and object flexibility. Flexibility in terms of software engineering benefits is worth investing in because reuse, polymorphism, and free asset generation are amazing advantages available
Behaviour, Structure and Causality in Procedural Audio
for computer game development. Paramount throughout decades of development has been computational efficiency. Progress here provides a one-off, lump investment that pays back dividends forever. Any breakthrough or leap of insight into cheaply obtaining a sonic behaviour can reduce the cost of all subsequent models, or make possible those hitherto infeasible. Regardless, there is an inexorable movement towards procedural audio anyway, even if it is not much recognised by the industry at present. Even if it were not for the myriad advantages of procedural audio, there is a steady trend in the direction of run-time rather than prior processing, simply because the march of technology makes it so (Whitmore, 2009). The software engineering principle of continuous integration and revision can take advantage of deferred form to avoid expensive mistakes being fixed into code by premature decisions, good for game development which already stretches the limits of Agile development.

The real-time requirement has always made insightful analysis and thoughtful construction part of a constant technical and creative quest. Something worth noting about spatial and temporal structure here is the need for reusable componentised objects and connecting flow, not only because this is the natural computational model but also because it is a natural, human-interfacing model that satisfies the need for expedient design and robust, incrementally improvable code. Dataflow programming already goes some way towards this; on the surface at least, languages such as Max/MSP and Pure Data are ideal during model design.

Finally, the main goal is aesthetic, and some ability to discern satisfactory results (good ears) is needed. This is possibly one of the hardest qualities to value, and one of the most expensive activities is the refinement, testing, and evaluation of aesthetic quality, since it is often subjective (without metrics or value boundaries) to the point of being arbitrary. Like most art forms, conformity to genre norms is an overwhelming consideration under marketing pressures, and there are precious few who dare to take a chance on artistically progressive projects; indeed, the games industry is remarkably conservative in this regard. However, Reiter and Weitzel (2007) attempt some metrics, and Reiter (2011) discusses elsewhere in this book the inter-modal effects within an interactive multi-media system.

FROM STATE TO PROCESS

The frequently examined question of whether game sound designers should also be programmers is perhaps misleading. They are programmers to the extent that the tools (often externally supplied closed-source "middleware solutions") fail to allow them to express and design sonic concepts with procedural reasoning. Rather, the questions are:

• To what extent can audio programmers get away with not understanding sound design? In other words, are fixed tools developed in isolation (brick wall model) of the creative design phase counterproductive, and if so how can continuous integration and closer working be facilitated by defining new roles in game audio, or consolidating old ones?
• To what extent have we got stuck in a paradigm that is restricting progress in sound? Can middleware products offering an impoverished event-asset model grow to include new approaches, or must they be replaced?
• What new tools and skills will be needed for next generation game sound, and to what extent can these be standardised in industry and taught in higher education?

Some of these issues have been touched upon at AES35 and AES128 (in panel W10, for instance), and by Nicholas Fournell and myself at other venues such as the Brighton Develop Conferences.
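Before moving from state to process, the low-level foundation mentioned earlier, sin() approximated by a truncated Taylor/Maclaurin polynomial, can be made concrete in a few lines. This is an illustrative sketch only, in Python for brevity rather than the C or assembly discussed above; the term count and the range-reduction strategy are my own choices, not anything prescribed by the chapter:

```python
import math

def taylor_sin(x, terms=7):
    """sin(x) from its truncated Maclaurin series, after reducing x
    to [-pi, pi) where a handful of terms suffice; the kind of
    polynomial approximation an audio oscillator can be built on."""
    x = (x + math.pi) % (2.0 * math.pi) - math.pi  # range reduction
    total, term = 0.0, x
    for n in range(terms):
        total += term
        # next odd-power term: multiply by -x^2 / ((2n+2)(2n+3))
        term *= -x * x / ((2 * n + 2) * (2 * n + 3))
    return total

# worst-case error over a few hundred points, against the library sin
err = max(abs(taylor_sin(x / 10.0) - math.sin(x / 10.0))
          for x in range(-300, 300))
```

With seven terms the worst-case error after range reduction stays in the region of 1e-5, far below audibility for an oscillator; a deployed version would trade the term count against the cost budget of the target hardware.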
LANGUAGE AND PROCESS

In a command like:

play(scrape.wav)

there is a definite article, a specific, singular sound which exists as a data file or area of memory containing a digitised recording. It is an allusion to an atomic event. In this unqualified case, the event time is implicitly now. We could choose to "bind" the sound to an event, deferring it until some condition is met, by saying something like:

if (moves(gate)) play(scrape.wav)

In the case of a single, one-shot sound, the game logic is unaware of the sound length. The relationship between the audio engine and game logic is stateless, so any timing relationship between the sound and visual elements must be predefined or contrived at runtime. As a further refinement, a looped sound can be playing or stopped. In such a stateful system, an indeterminate future endpoint is explicitly given by game logic: meanwhile, a looped sample plays repeatedly. Like MIDI, this leaves the possibility for a stuck sound without a safety timeout.

For decades, this has been the dominant model of game audio. Everything of significance can be reduced to a single occurrence, an event, or to a simple set of states. A multi-state example might be an elevator that can be "starting", "moving", "stopping", or "stopped". We say this is an event-based sound system, and that each event is bound to a sound asset, or to a simple post-processed control of that asset. State transitions are themselves events. Essentially, the entire game audio system can be reduced to a matrix of event-resource pairings.

Since the turn of this century, more sophisticated approaches have appeared. Multi-state, multi-sample sources borrow from music technology the principles needed to create sampling plus synthesis (S+S), wavetable, and velocity-mapped acoustic instruments. This migration of audio technology, from music synthesisers to game audio during the late 1990s and early 21st century, can be seen as the renaissance of synthesis. The "dark age" of sampling, following an abandonment of game audio synthesis chips like the SID, AY8910, and YM2151 (Collins, 2008, pp. 31-60), is over now that native synthesis capabilities are more than adequate for realistic sound.

As mentioned earlier, it is perfectly possible to approach the procedural concept with an implementation using only pre-digitised waves. The line between sampling and synthesis has never been a clear one, and at present it is this area, using hybrid sampling-synthesis, that holds the most promise for transitional generation game audio technology during the changeover from data driven to fully procedural systems. Practical transitional systems employed at Electronic Arts and Disney are variations on granular or wavetable S+S, with Steve Rockett's Blackrock team attempting some ambitious work on vehicle engines. Work done by Kenneth Young and others on the Media Molecule title "Little Big Planet" shows many of the structural and behavioural hallmarks of procedural audio as applied to combinational sound that can be configured for user generated content (UGC). While such endeavours stop short of fully procedural audio, they are a valuable step in the right direction because they establish conceptual foundations necessary for proper structural approaches and they are properly tested in publicly distributed titles. Along with dynamic reconfigurability, which has a bearing on the effectiveness of user generated content, most important of these approaches are the transition from state/event to continuous parameterisation and the recognition of behavioural audio objects with multi-dimensional parameters.

To take the above example of the screeching gate, we might now express the condition as:
if (change(angle(gate)))
    playgrains(scrape(angle(gate)))

The "event" that the audio engine main loop is waiting for is a change in the angle of the gate because a player has moved it. From here, we can dare to abandon entirely the notion of state and pretend that all objects in the world, subject to their position in a masking/priority table, are reactive and continuously parameterised. Useful mask topologies are functions for spatial and temporal relevance (player focus), geometric distance from known listener actors (machine listening objects and human players), occlusion based on raycasting, and unimpeded work (power delta as a measure of how loud the sound could be in free space).

The "sound" is now no longer simply a file whose duration should be matched to the rotation time of a fixed animation: It is a command to a granular synthesiser that creates screeching sounds by picking grains from a file and replaying them. It is a sound made as a process or function. The domain of the function comprises two variables, time and angle, while the range is a time-variant audio signal. Further, the function could have hidden internal mappings, for example, the rate of change of angle (angular acceleration) could be used to select different grains and timings that stimulate a stick-slip friction model to obtain an impulse signature such as outlined by Rocchesso, Avanzini, Rath, Bresin, and Serafin (2004) or in Farnell (2008).

As sound code rather than audio file, advantages are better cohesion and decoupling since no subsystem need remain aware of the state of any other (an advantage for network replication too). A disadvantage is that more data must pass between the underlying world model (geometry and physics engine) and the audio system. In use, the first obvious advantage is that the player can exert continuous control over the object movement and hear corresponding sounds. A disadvantage is that instead of a linear, sample replay routine we now need a computationally more expensive grain/segment-based player (for the hybrid case) or a control and DSP synthesis layer for a fully procedural implementation.

A further developmental disadvantage in moving beyond the limitation of events to thinking about actions and energy flows is that previously neat boundaries become blurred. A discrete time interpretation of a game makes simple, easy sense to level designers or script writers. In practice, behavioural sound objects coupled to the underlying physics engine (for example, see Mullan, 2009) and game logic may need to be presented such that the boundaries of sounds may remain, in surface appearance at least, based on indivisible events.

BEHAVIOURAL AUDIO

Another advantage of procedural sound is that it can easily obviate the old problem of repetitive sound sources and the need to fake variations. If a procedural sound object is repeatedly triggered with precisely the same parameters, will it not make an identical sound? A comparison I have been fond of is between sound as a film, which can reveal behaviour, and sound as a photograph which cannot. It accords with a dynamic interpretation of sound where causality is significant. One phenomenon here is iterated complexity, familiar to some as the "butterfly effect" (viz. Lorenz, 1993). Natural variation can be introduced when a function of several variables is sensitive to initial conditions. Although of course all computations are deterministic, with small deliberate variances even systems of low complexity yield large variation of output in a short time.

A weakness in the film analogy is that a film also has a script. Watching the same film twice does not alter the story. Films are also snapshots (singular experiences, regardless of warped time, in the sense of Tarkovsky) in (and of) time, despite an extra dimension to play with. The analogy of a theatrical play with real actors is an improvement,
exposing the concept of deferred form which, I believe, is a vital facet of procedural audio. In this analogy, the performance is "played to the audience", perhaps a little different on each night. It is adaptive and plastic. So, my favourite procedural sound metaphor is a football game. It encapsulates the essence of football, the players, the rules, the ball, and the spectators and, at each juncture, it is entirely causal: Every move sets up every subsequent move, yet its unfolding form and precise outcome are unpredictable even though the essential experience of it remains football. Natural sounds are like this. Given identical contexts, no two snapping twigs would produce the same sound pattern. Game sound is itself, like a game.

Behavioural Breadth

The football game is bounded by immutable constraints, such as the players not leaving the field and flying about. It is a fair model for physical vibrations and acoustic propagation, where unfathomable complexity means that no two time domain waveforms will ever be the same, yet underlying circumstances mean that two indistinguishable strikes can sound from the same bell hit in different places. This non-contradiction requires a model with constraints that allow a resulting range wide enough for the domain of stimulations (behavioural parameters), and narrow enough for the sound to be perceptually defined. We could call this the reality window.

In the idiophonic case, the breadth of behavioural parameters may extend to include different excitation points or methods. For example, we may wish to construct a tin can model that can be impacted on its base or sides, be scraped, rolled, or crushed, and yet still remain recognisable as the same object under different conditions of behavioural stimulation (see Vicario, 2001).

Depth of Model and the Question of Realism

Clothes you put on don't change who you are. A soldier is more than a uniform and a doctor is more than a stethoscope. Thus it is with sampled sounds, which suffer the weakness of exposing only surface features that will not withstand harder examination. One of the things people mean when describing sounds as "two dimensional" is an inability to work in more than one or two tightly constrained contexts. The idea of multi-sampling only makes sense insofar as all entity interactions can be enumerated. With combinatorial growth this clearly becomes a nonsense. Thus, sampled sounds are straw men, exposed by cursory examination. They capture the look without the feel, the appearance without the behaviour. To use a computer science term, they are brittle. The breakdown comes quickly under the functionalist gaze (if ears can cast such a thing) of a listener who cannot refrain from causal analysis (causal listening in Chion's sense of seeking an identifying mechanism behind the surface signal). The senses are jarred, and the familiarity of being as change (as Heidegger might have it) deeply upset, by hearing exactly the same thing twice (it is completely unnatural), especially in rapid temporal proximity where it may still be in echoic memory. This is the experience of the sound sample. A whole generation has grown used to it. And yet, with a sound photograph (sample), the best, most expensive microphones, studios, and playback technology only help to expose the fakery as soon as the same sound plays for a second time. Doesn't this make bogus the whole quest for "realism" in game audio?

Such inflexibility has long been the bane of game audio developers struggling to avoid repetition and uniformity (comically depicted in an episode of "The Simpsons" where Sideshow Bob repeatedly steps on garden rakes, trapped in a "simulacra hall of mirrors", repeating the same experience and briefly reducing the story to the level of a cheap computer game). Many systems
have been, and are still being, developed to spice up sampled sources and impart random features that seem to imitate a behavioural source. Unfortunately for this approach, the underlying features indicative of identifiable behaviour are far from random. Bregman (1992, pp. 10-36), and again Vicario (2001), point out that our sense of realism, in terms of confidence in a concrete identification with well-formed behaviour, increases with the number of instances whose variance reveals an underlying model (the more examples of a face we see, the better we get at identifying its true essence). So, contrary to the goals of "increasing realism through variation", resynthesising or treating samples to obtain random dispersions of attack time, phase, and pitch may just lead to a muddy confusion at the cognitive level, like seeing several distorted and falsely coloured photographs of the same face in the style of Warhol. No underlying behaviour that can reveal the thing in itself is conveyed.

So, to answer the question "how real is it?", I postulate two different kinds of realism. There is a superficial kind of realism that works from one angle and relies on smoke and mirrors, and there is realism in depth. I also call the former type of surface reality aesthetic or sensible realism, and the latter type behavioural, characteristic, or essential realism. Sensible sources just provide the right sensations, while essential sources provide the correct perceptions, in the examined case. This is something obtained for free by a source (as opposed to signal) method. A twist to behavioural approaches, as we shall see, is that you can fake the parametric depth to a degree you can comfortably afford, in constant memory and CPU space.

According to Warren (1992), Bregman (1992), and those of the cognitive schools, behavioural parameters are as important in our assessment of reality as surface features. We are accustomed to having our senses trick us with a small number of sample points, thus perception is a multi-layered, iterative, convergent affair: a cognitive integration. Whether multi-modal or through repeated examples through the same channel, perception in depth will only stand up if there is behaviour in depth. This can happen on a short timescale, on the acoustic and neurological timescales in which the sounds of a snapping twig (potential predator) and a raindrop can move from sensation to perception, and then to identification in the conscious awareness of the fore-brain. Or it can happen over a medium timescale, given several instances of a source held in echoic memory, when we are capable of discerning fine nuances of quality, formant, and scaling (qualitative feature recognition, see McAdams & Bigand, 1992). To discern a Steinway piano from a honky-tonk usually takes more than one note. On still longer timescales that involve stronger memory formation, a collection of asynchronous sample loops, say of windy weather, may fool us for many minutes or hours before we become aware that the behaviour lacks depth.

Such discernment may even require the gamer to engage in many sessions of play before the difference between repetitive, data-driven sound and "living" procedural sound is apparent. Here lies a vital development metric: if you never expect the player to play the game more than once, don't bother with realistic depth at all. To describe the sound as more "alive" might seem strange, but in defence I must say finally on this subject of realism that there is an ineffable quality about behaviourally designed sounds, regardless of whether the behaviour makes sense. Although I cannot yet formalise this in my own words, it is illustrated by some anecdotes and quotes and is certainly worth further study with more formal experiments. In one observation, students playing with an explosion generator and a billiard ball collision simulator simply did not want to stop. They knew instinctively the difference between this and a system that was just playing back a wide choice of samples. (Perhaps this can be accounted for by novelty against a generational experience of samples). Without any visual accompaniment, the
sounds alone, like a musical instrument, seemed engaging enough to just keep pressing the buttons (in a way that nobody would ever do with a sample replay). One described it as "satisfying", another as "moreish" (addictive). Another, more academic, explained that the continual subtle difference was what forged its identity and begged further exploration (I suppose in the spirit of Deleuze), as opposed to the anaesthetic, throwaway emptiness of repetitious samples. One emotional testimony stays in my mind from an online discussion in 2005 about synthetic game sound. It remains inspiring to my work even today. Cosinusoidal (screen name) writes:

"Real synthesis in computer games is something I imagine every night to help me drift off into dreams/sleep. I remember being 7 years old and playing C64 computer games and the fact that the computer was some how alive making these sounds brought tears to our eyes. Hopefully games developers will realise that real synthesis is profoundly more immersive than samples could ever be, today reading your views on the same thing has gave me hope [sic] that computer games might one day return to synthesis methods."

Plausibility and Edge Cases

A sonic object captures some feature of a real source when one or more parameters can be ascribed corresponding to a real physical variable. For example, the order of a low pass filter combined with the delay time of a ground image crudely captures the distance of an aircraft where height and all other variables remain fixed. It sounds real because it directly matches something that happens in reality (multiple path destructive interference causing a comb filter sweep).

Beck's (2000) "acoustic viability" captures the behaviour of a synthetic musical instrument once the parameter space accords with our sensory expectations. These expectations can be set up a priori, through visual priming, by experience, or on a short timescale by the "rules of auditory scene analysis" (McAdams & Bigand, 1992), which amount to an innate understanding of physics (and by Chion's (1994) rules of audio-vision synchronisation). I like to extend this idea to encompass "behavioural plausibility" in general. As an example, an innate physical behaviour is that all systems are in energetic decay unless a new source of energy is supplied. A bouncing ball must decay (decreasing period between collisions), because if it makes sound there must be loss, ergo less kinetic energy, and thus shorter subsequent bounces. We know this at a deep level, learned through exposure to a lifetime of physics, or perhaps the result of inherited structure (in the sense of Chomsky's innate grammar disposition). A ball with increasing energy (apparent through increasing collision intervals, perhaps because a force is silently being applied to it by a basketball player) seems to be playing backwards, even though the energetic decay of individual collisions is correct. Thus the larger scale behavioural feature dominates the smaller scale one. This interpretation of energetic growth, decay, or maintenance is given by Pierre Schaeffer (discussed by Miranda, 2002, p. 127).

Where is the limit of this behavioural realism? At what point does real become real enough? The answer, in our context, is where it serves its purpose for arts and entertainment. Once the depth of object behaviour is good enough, we need go no further. Using super-computers and many hours of computation to produce sounds misses a crucial point. There is a middle ground between improving upon sampled sounds and a scientific simulation on which lives depend. Failing to distinguish these goals, and chasing brute force implementations with computationally precise but sensibly inaccurate or perceptually irrelevant results, is a mistake in my opinion (unless you are actually performing simulations for civil aircraft noise abatement purposes). Knowing which are the relevant parameters is what the art of practical
procedural audio is all about (for further discussion on the limits of realism and definitions of perceptual realism, see Grimshaw, 2008; Grimshaw & Schott, 2008; Hug, 2011).

SCHOOLS OF DESIGN

I would like to talk a little about the "how?" of procedural audio. Rather than dwelling on the dry, mathematical subject of synthetic methods, it seems more fun to cluster the use of methods into schools of thought. This sideways glance may provoke some thoughts about application of synthetic methods in procedural systems because each exposes a different view of abstraction and control.

Concrete

I mention this first because the liberation of sound design requires at least a few more punches to the jaw of orthodoxy, if only to wake it up. Let the concrete school represent all that we are trying to attack and reject, the use of recorded samples and the attendant culture of infantilising commodification ("one problem, one product"). I like the term concrete, as it derives from musique concrète, conveying the spirit in which this technique is usually executed. The real principle of musique concrète is the juxtaposition and recontextualisation of real sounds. Since this is obviated by game audio practice that demands a gunshot represents a gunshot and a falling rock represents a falling rock, it is, in the words of Rand (1971): "Blades of grass glued onto a piece of paper to represent grass". Thus the value of concrete technique, in technical and traditional sound design, is the substitution of one sound for another. Breaking bones represented by a snapped carrot, tearing flesh by a cabbage. We can learn a lot from sonic metaphor, about which features of signal A make it a useful perceptual substitute for signal B. In other words, a study of sound metaphor exposes a powerful analytical technique. Perhaps this paucity of imagination is where game audio, driven by an adolescent sense of inadequacy in the "realism" department (which I hope we have established is largely irrelevant anyway), differs from film. The way that game audio will break away from its older brother's shadow may be to break into a fresh abstract and modernistic movement, not rejecting the concrete in a reactionary way, but finding a proper voice for the living, reactive qualities that the video game can offer as a form.

Essential

Essentialists are preoccupied with exactly modelling, one to one, those physical features of reality with sonic effect. For hard proponents such as Hiller and Ruiz (1971) or Zheng and James (2009), brute force modelling from elementary equations of material and fluid dynamics is the paradigm. Moderates such as Smith (1992), Cook (2002), Bilbao (2009), and Karplus and Strong (1983) are happier with approximations offering computational efficiency or analogous behaviour. Common analogous methods are waveguides, which are finite or recursive filters designed as acoustic models of spaces; finite element modelling; mass spring damper arrays used prima facie to model energetic propagation on a per point basis; or tensor arrays aimed at solutions over a continuous manifold. They are, of course, inherently causal. Beyond sound modelling, one can identify radical essentialists who seek not just depth but full concurrency and coherence with the audiovisual, a high ideal in which the sonic and visual characteristics of a modelled object/process are a product of the same underlying behaviour (and computation). Here we would, for example, use the same fluid dynamic equations to render a view of rippling water waves and the sounds made. This breed of essentialists want to recreate the real world within their computers and love helium cooled supercomputers, extravagant research budgets, and William Gibson novels.
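As a flavour of the moderate essentialist toolkit, the Karplus and Strong (1983) plucked-string algorithm cited above can be sketched in a few lines: a delay line filled with noise, fed back through a gentle low-pass average. This is an illustrative sketch in Python (the sample rate, averaging constant, and function name are my own choices for demonstration, not code from any of the cited sources):

```python
import random

def karplus_strong(freq_hz, duration_s, sample_rate=44100):
    """Plucked-string tone: a noise-filled delay line is recirculated
    through a two-point average, which damps high frequencies so the
    initial burst decays into a pitched, string-like tone."""
    period = int(sample_rate / freq_hz)  # delay-line length in samples
    line = [random.uniform(-1.0, 1.0) for _ in range(period)]  # noise burst
    out = []
    for n in range(int(duration_s * sample_rate)):
        sample = line[n % period]
        nxt = line[(n + 1) % period]
        line[n % period] = 0.5 * (sample + nxt)  # averaging loses energy each pass
        out.append(sample)
    return out

tone = karplus_strong(440.0, 0.5)  # half a second of a 440 Hz pluck
```

The appeal to the essentialist moderate is exactly the one the text describes: a few arithmetic operations per sample stand in, analogously, for the travelling waves on a string, and the energetic decay comes out of the structure for free rather than being imposed by an envelope.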
However, in game audio, this becomes a seduc- difficult tests, the plausibility of the results may
tive goal. Take again the case of the bouncing ball. eventually break down. Behaviouralists know up
Sonically we do not wish to model the nuance of front that their models will only work for a pre-
kinetic, potential and elastic energy exchange, for defined range of uses. For the bouncing ball, we
entertainment purposes (good enough synthesis) might choose to ignore all but a pair of material
we are satisfied with a fair approximation at a tim- parameters and the initial height from which the
ing curve and an adequate model of the modes of a ball falls. Given that no other changes occur (the
sphere excited at an exterior point. While the ball surface stays fixed, and gravity doesn’t change),
is unseen (acousmatic) no real advantage accrues. then everything else is irrelevant to the subsequent
The payoff comes when we have accompanying sound pattern. Constraints must be drawn up a
graphics provided by a physics engine. Some priori and be satisfied throughout the sounding,
of the parameters needed for our synthesis are which somewhat breaks the conditions of sponta-
available for free, with the added bonus of perfect synchronisation. So, up to a point, a hard essentialist's goal is a noble one for computer games. At what point does it become pathological? The question is whether over-computation is happening; are we calculating irrelevant information? It's hard to draw a definite line but, using information bandwidth studies from HCI as a yardstick, once the complexity of the model greatly exceeds the information that can be acquired by a person then we're probably going down the wrong road.

Behavioural

Behaviouralists are concerned with rendering convincing facsimiles of sonic phenomena by understanding the underlying physical behaviour, though not necessarily modelling it directly. The behaviouralist position admits psychoacoustics and perceptual science into its framework as a necessary component. The aims are to produce perceptual realism (effect, recognition, semantics, and emotion). The mechanism of the underlying process is relevant, though we are concerned with data reduction, ignoring all except those parameters and control systems that are most important. An analogy may be drawn to Searle's Chinese Room or to the Turing Test. As a black box, while we obtain consistent and plausible results, and while the effects are subjectively correct, the internal functions are unimportant to our judgment. As we apply more and increasingly […]

[…]neity when we make a simplification that "bakes in" some future behaviour for a while.

Phenomenal

The school of phenomenology (in the sense of Husserl and Merleau-Ponty) might say: "If it works, use it", meaning that it does not even matter whether the underlying process matches some counterpart in reality, or is understood, so long as its facade yields passably aesthetic results. The senses are all. Some of Miranda's (2002) experiments with cellular automata (improvements by Serquera, 2010) could be seen as phenomenal synthesis since, although certain generative systems can be discovered yielding astonishing sonic effects, we don't have any mapping between their parameters and sensible, acoustically viable parameters. The weakness of phenomenal synthesis is that it is brittle: it works for a small range of sounds, or islands, and it easily breaks when pushed too far. Its advantage is that if we only pay attention to the surface features of a sound we can do unusual synthesis cheaply, as in Serquera's multi-voter method, a way of herding the CA cats into a pattern that appears to be a natural process.

Yet careful, reasoned use of phenomenal technique crops up frequently in practical procedural design, and is vital in many places. A chirp impulse replacing an impossibly loud transient, the use of noise bursts in critical band masking, and grain dilation to increase perceived loudness are all psychoacoustic sound design tricks that work at the sensation or perceptual level. They are stock tricks known to sound designers rather than features of a well-modelled reality.
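The first of these tricks can be made concrete in a few lines. The sketch below (function name and parameter values are my own, not from any production system) stands in for an impossibly loud transient with a short, bounded-amplitude exponential sine sweep; the ear still reads the result as a single wideband click:

```python
import math

def chirp_impulse(n=256, f0=40.0, f1=8000.0, sr=44100.0):
    """Approximate a unit impulse with a finite-amplitude swept sine.

    A true impulse needs unbounded amplitude; sweeping a sine
    exponentially from f0 to f1 spreads comparable wideband energy
    over n samples at sensible levels. Illustrative sketch only.
    """
    out, phase = [], 0.0
    for i in range(n):
        t = i / (n - 1)                  # position 0..1 along the sweep
        f = f0 * (f1 / f0) ** t          # exponential frequency glide
        phase += 2.0 * math.pi * f / sr  # accumulate instantaneous phase
        env = 1.0 - t                    # linear fade to avoid an end click
        out.append(env * math.sin(phase))
    return out

sig = chirp_impulse()
```

Dropped in place of a raw impulse, such a burst excites a downstream resonator almost as broadly while staying within the headroom of a fixed-point mixer.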

Behaviour, Structure and Causality in Procedural Audio

Metaphysical (Purely Abstract)

The metaphysicists would have it that there is a deeper order lying behind even the hard essentialist position: that ultimately mathematics alone can provide what we need, often from simple and elegant roots. The seductive idea that a formula can fall out of the air to replace thousands of cycles of computation provokes intense debate. This ranges from the comparatively concrete (order awaiting discovery within geometry and topology, with the effect of furthering our understanding of signals), through the "bigger things" of which Gauss (1882) speaks, or Feynman's "chequer board", to the frankly mystical. If our bouncing ball can obtain a shortcut to spherical modes by Riemann, and a handy computable identity leads to a fast way of making struck solid sphere sounds, that's wonderful progress. Maybe things turn out to be simple in the end. My bet is that we'll need to work hard for results for a lot longer yet.

But elegant formulas don't always give elegant methods. The arbiter is computability: not mathematics but, to the extent our von Neumann machines are able, algorithmic complexity. Waveguide and finite difference methods may be expensive, but they are reliable and provable. When stability becomes an issue in such systems, at least they are able to assert good behaviour for proper conditions. A series that converges too slowly, grows too fast for a set of variables, or has too many expensive terms might never match a crude bunch of heuristics stored in a lookup table in terms of reliability. Ideas of innateness, intrinsic order, and Platonic truths give promise of natural models with little computational work. They include self-similar fractal geometry, L-systems, cellular and genetic automata, recursive noise, and chaotic systems. All have found use in computer graphics (in procedural geometry and textures) and will certainly yield new audio synthesis methods in time. In some cases they seem to accord with natural processes in an uncanny way. In practice, exact relations to clearly stated and understood natural processes are often open mathematical questions. Many operate on the coarse-grained, emergent level, which is not to say purely aleatoric/stochastic. Contrast this with the precision of the wave equation methods. For this reason there is some overlap in use with the phenomenal approach (it works but we don't care why), though their ideologies are different.

Pragmatic

A pragmatic stance admits all of the above thinking, using every tool in the box to achieve a goal. It brings together all of the above, but without insistence on purity of method or on some overarching ideology. This requires the synthesist to understand at least a little of each and their relationships. What influences adherence to one or another way of thinking are the constraints imposed by a production context. For animation and movies one may, in the spirit of Takala & Hahn (1992), prefer to adopt full-blown physical modelling with the luxury of powerful computers, plentiful budgets, and an offline deadline. On the other hand, an embedded musical greetings card may harness unusual, shallow processes like shift register composition and direct binary instruction (Miranda, 2002) to create useful sound with negligible CPU cost and complexity. Computer games, for the next decade, will probably exploit procedural audio techniques with a behavioural slant towards signal/parametric methods, with essential/source methods appearing as cost allows. This will be driven by the need for a compromise between deep process and efficient code that may run in real-time. Open frameworks that allow artists to play with cheap, mixed methods, and allow programmers to re-write methods, are likely to be more successful than any grand unified scheme. Diversity and openness are the watchwords for the future of game audio.
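Of the natural models listed earlier (fractals, L-systems, automata, recursive noise, chaotic systems), the chaotic ones are the cheapest to demonstrate. A one-line recurrence, the logistic map, yields a complex, noise-like but fully deterministic signal for almost no computation (the constants here are illustrative values, not taken from any cited work):

```python
def logistic_osc(r=3.9, x0=0.5, n=1024):
    """Chaotic signal source from the logistic map x' = r * x * (1 - x).

    For r near 4 the orbit never settles, giving broadband material
    from a single multiply-add per sample. Centre and scale the
    (0, 1) state onto the (-1, 1) audio range. Toy sketch only.
    """
    x, out = x0, []
    for _ in range(n):
        x = r * x * (1.0 - x)
        out.append(2.0 * x - 1.0)
    return out

noise = logistic_osc()
```

As with the cellular automata discussed above, the catch is phenomenal: nothing maps r or x0 onto acoustically sensible parameters, so the useful settings are isolated islands.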


STRUCTURE, SPACE AND CAUSALITY IN PROCEDURAL SOUND

We now move on to a deeper discussion of sound object design. In chapter 3 of Designing Sound, the basic concepts of physical sound were explored, at least as they pertain to rigid body vibrations in objects whose size and shape remain fixed. Within the framework of current game physics almost all such sources are taken to be the result of a collision, and thus the energetic source is kinetic. Shape (Kunkler-Peck & Turvey, 2000; van den Doel & Pai, 1998) and material constitution as a starting point for the synthesis of idiophonic sounds is established. That a particular sound corresponds to a set of material properties, shape, size, and excitation pattern and position is confirmed by many whose work in modal and waveguide methods produces excellent results.

A deeper investigation of the role of structure would make a fascinating thesis on its own. The question of whether such a correspondence is strict (injective/one-to-one and surjective/onto) is left aside. Benson (2007, pp. 119-120) references the work of Gordon, Webb, and Wolpert (1992) regarding the Dirichlet spectrum of homomorphic plates. In short, it's possible for different shapes, similarly excited at different points, to sound identical. This leads to a useful simplification where, for some object, we can ignore the physical arrangement of different sub-parts and consider only their connectivity, like a net-list that represents only their logical relationships, not their spatial relationships. This is one essential feature of modal synthesis, an overview and bibliography for which is given in Adrien (1991), with a more recent discussion of modal methods in Bilbao (2009).

We have an innate understanding of size and scale as revealed by sound. Modal parameters that change with size for constant shapes (scaled eigenvectors) are shown by some experiments (Kunkler-Peck & Turvey, 2000) to be universally interpreted. Changing parametric scale (such as speeding up or slowing down playback rate, or shifting formants in a fixed ratio) to indicate a change of size is a common sound design technique. For modal synthesis this technique is understood as a means to change the apparent size of simple rigid bodies.

For an excellent annotated bibliography of psychoacoustic aspects see Giordano (2001). Moving on to more complex sonic objects: large scale structures (in which propagation cannot be taken as uniform), heterogeneous, polymorphic, composite, and variable size objects present a challenge. To summarise the work above, and that presented in Designing Sound, it is sufficient to say that we can understand and synthesise sounds from objects that change in shape or size (like poured liquids), or have non-linear discontinuities (such as the twanged ruler against a table).

As the size and complexity of an object increases we can no longer treat it as a collection of modal resonators connected without concern for propagation. We must consider the journey of sound waves through a series of sub-objects, from some excitation point or temporal origin, towards the listener's ear through a radiation surface, intervening medium, and acoustic context. At this point we introduce causality and the flow of energy into our model. Space, viz. size, now becomes relevant to the modal frequencies and to the time domain structure. An example is a ticking clock model in which power is transferred from a sprung store of elastic potential energy along a series of interconnected cogs and wheels towards a final radiator, which is the face and hands of the device. Each sub-object can be modelled modally, using a variety of methods (additive, subtractive, or non-linear), but the overall behaviour that makes the sound object a clock, as opposed to a collection of cogs dropped onto a table, is the synchronous and causal relationship between the parts.
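A minimal modal voice makes the points about connectivity and scaled eigenvectors concrete: a sub-object is just a list of (frequency, amplitude, decay) triples, and apparent size is changed by scaling every modal frequency by one factor. The mode values below are invented for illustration, not measured from any real object:

```python
import math

def modal_strike(modes, size=1.0, sr=44100, dur=0.5):
    """Sum of exponentially damped sinusoids: bare-bones modal synthesis.

    Each mode is (frequency_hz, amplitude, decay_per_second). Dividing
    all modal frequencies by `size` is the scaled-eigenvector trick:
    size=2.0 halves every partial so the same object sounds larger.
    Illustrative sketch only.
    """
    n = int(sr * dur)
    out = [0.0] * n
    for f, a, d in modes:
        w = 2.0 * math.pi * (f / size) / sr   # per-sample angular step
        for i in range(n):
            out[i] += a * math.exp(-d * i / sr) * math.sin(w * i)
    peak = max(abs(s) for s in out) or 1.0
    return [s / peak for s in out]            # normalise to +/-1

bar = [(420.0, 1.0, 9.0), (1130.0, 0.6, 12.0), (2210.0, 0.3, 18.0)]
small = modal_strike(bar, size=1.0)
large = modal_strike(bar, size=2.0)
```

Note that the mode list carries no geometry at all, only the net-list-like "logical relationships" argued for above; position and shape have already been collapsed into the triples.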


This Newtonian correctness can be extended to acoustic models like vehicle engines, best suited to waveguide methods, or abstracted to a control level, such as for the bouncing ball that loses energy. Application at the control level should be of particular note to designers of audio systems linked to game physics engines, as this represents the perfect interface of simple classical physical models (elasticity, mass, damping, friction, rolling, and other common behaviours) to audio DSP. Ideally, such parameters should be exposed at a sub-millisecond refresh rate or at audio block frequency.

At a design level there are many advantages to componentised construction. Sound designers have long dreamed of modular systems in which they can create objects by combination. The software systems Cordis Anima, Modalys, and Mosaic were the musical forerunners of newer systems that allow plug and play modelling. An advantage, explored in my work constructing game objects like guns, rocket launchers, and vehicles, is that one can obtain unexpected behaviours for free. For example, once a weapon body is constructed, reload sounds are available at no extra computational cost. Likewise, constructing a car chassis and bodywork to get the correct engine filter implies the availability of door sounds with little further work.

Newtonian simplifications need more attention once we enter a real game scenario. Causality is often represented (in duplicate) at a higher level of abstraction in games systems. Above the physics engine, the collision or action system often maintains a causal trace, an actor-instigator chain, in order to make logical gameplay decisions. The "one hand clapping" problem (in which we have to ask which of two objects, both of which are mutually excitor and resonator, such as two colliding billiard balls, is the source of sound) is a false dilemma imposed by the faulty logic of non-relativistic representation. Unless one object is statically coupled to a significant radiator, for the instantaneous sound excitation we should consider symmetrically only the respective structures and the velocity (total kinetic energy) with which they are brought together.

Good Behaviour for Structural and Causal Systems

Perhaps a way to appreciate the importance of proper structure and causality is to consider a case where it is not observed, and the consequences. Often we play with wild settings of a synthesiser and discover a wonderful result, a subversion of a tuba that suddenly sounds exactly like a motorbike. I sometimes call these "islands", because they are disconnected in behavioural timbre space. This can be seen from two viewpoints, physical and psychoacoustic.

Let's concentrate on the psychoacoustic interpretation first, where we have activated a higher level recognition schema, albeit a false one. This may be due to a spectral match, an associative (metaphorical or similar) match, or a partial behavioural (mechanistic) match. Whatever the basis of the match, once identified and without further information, the tuba is a motorbike until we know more. In the analysis of Vicario (2001) these mismatches are interpreted in a Kantian phenomenological sense. He describes a typical causal identification error: the sound of rain on a window that turns out to be branches rattling against the glass.

Partial behavioural matches are intriguing as they form one of the pillars of traditional sound design: shaking an umbrella for bird sounds, crunching vegetables to make the sounds of breaking bones. An object that displays some subset of the behaviour of another can often be coerced to produce signals easily mistaken for the target, especially when supplemented with confirming visual stimuli. The trick here, for the sound designer, is to identify those behavioural parameters which might exaggerate or counterpoint a desired


artistic direction; as Randy Thom likes to remind us, the narrative is paramount. Knowing this, such insight can equally be applied to the performance of a synthetic model. In Vicario's (2001) example it is not until further sensible data is obtained that an identification error is revealed. The apparent sound of rain on the window is experienced as rain until the curtain is drawn back to reveal a cloudless sky and leaves brushing against the window in a breeze.

Let's turn to the physical domain now, and we can see the source of the confusion. A motorbike and a tuba share much in common, with a long tubular exhaust system driven by acoustic pulses. Any source, whatever its mechanism, that happens to coincide with the spectrum of the motorbike will, when taken entirely in isolation as a static spectrum, be a motorbike. The moment of truth comes when we attempt to move one of the generative parameters. The error (the assumption that a correct causal model, in terms of structure and scale, exists) is revealed: the structure is wrong for all other points in the behavioural parameter space. As soon as the pitch speeds up, the motorbike transforms into a tuba again and the deception is exposed.

We can almost always find isolated matches to static examples. That is to say, given an arbitrary synthesiser with a small number of arbitrary parameters and a timbre space that includes the target sound, there are successful methods of converging on the parameter set necessary to mimic the target (Yee-King & Roth, 2008). One of the fascinating things about the system of Yee-King and Roth, which uses genetic algorithms to approach the best approximation for a given time domain example, is that it can find unlikely candidates within the timbre space. It can find islands that are entirely brittle and bear no resemblance to the target sound. One question I have put to Roth (with whom I currently share a laboratory) is whether, given a structurally and causally well-formed trumpet model and a target snapshot of a trumpet note, the system would converge on a parameter set congruent with the performance space of the trumpet. I strongly suspect the answer is no (given just one example). Roth agrees, but suggests that convergent parameter estimation may work for higher dimensional performance spaces too, and that given two or more examples of trumpet notes, and thus the ability to form lines then planes within the performance parameter space, it would do so. Indeed, this is what we would expect of such a multi-dimensional adaptive system. We call them neural networks. It's what the brain of a musician does while learning to play an instrument.

Parameters and Performance

Let's distinguish time invariant or fixed parameters from behavioural parameters. The fixed parameters for a particular piano note are generally taken as fixed within the duration/lifetime of that sound (although they may themselves have time variance, such as envelope settings). In music we sometimes call the behavioural parameters the performance setup, for instance pedal pressure or keyboard scaling. These describe how higher order parameter changes affect each note or each instance of a sound. The fixed parameters would be the oscillator levels, filter settings, and suchlike, while the behavioural parameters are those that change during performance. A well-formed parameter space provides a behaviour captured by the fewest salient variables while allowing the greatest sensible range. For a piano, it's how hard you hit the key. That's all; there is no need to alter the weight per unit length of the string or the size of the sound board, and a pianist doesn't need to know about them. The model interfaces to the performance use case, giving no more and no less control than required. Imagine if a piano offered an array of levers for string tension and hammer hardness that had to be set up before pressing each note!
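The fixed/behavioural split can be expressed directly in code. In the sketch below (class and parameter names are hypothetical, chosen only to echo the piano example), model-defining parameters are bound at construction time and the per-note interface accepts only performance data, so velocity reaches the player while string tension does not:

```python
class SoundObject:
    """Separate fixed (per-model) from behavioural (per-performance) parameters.

    Fixed parameters are set once when the object is built, like string
    density and soundboard size in the piano example; behavioural
    parameters arrive with each excitation, like key velocity.
    Names and values here are illustrative only.
    """

    def __init__(self, **fixed):
        self.fixed = dict(fixed)           # e.g. string_density, board_size

    def excite(self, **behavioural):
        # Behaviour modulates the model but may never redefine it.
        clash = set(self.fixed) & set(behavioural)
        if clash:
            raise ValueError("behavioural control cannot rewrite the model: %r" % clash)
        return {**self.fixed, **behavioural}

piano = SoundObject(string_density=0.006, board_size=1.4)
note = piano.excite(velocity=0.8)          # all a pianist needs per note
```

Refusing the clash, rather than silently merging, keeps the performance interface honest: any attempt to hand the performer a string-tension lever is an error by construction.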


A model for raindrops offering ten filter values at its interface would be less useful than one offering only two relevant controls, for size and velocity, provided that the size and velocity controls work over a proper range of values. Part of the work of an object designer is to wrap up or "collapse" (often many) parameters into a small set of useful ones. Realistically, for the rain example, we would like to pick up only one or two variables from the game engine: how much it is raining (maybe flux in drops per square metre per second) and the material texture tag of an object which will sound (and perhaps this would be discovered automatically from nearby objects within a hearing radius). Parameter range should be conformal and continuously differentiable in the sensible space; that is to say, it should contain no poles or zeros where some combination of parameters causes an anomalous signal output. In other words, we want a model with the fewest meaningful parameters, which for all settings will smoothly produce the correct sonic behaviour. Such a model would at least satisfy the criteria of Beck's (2000) "acoustic viability", which I shall paraphrase as a "simple set of parameters that work with a consistent underlying physical process".

Concerning hybridisation, the subject that was fashionably called morphing in the late 1990s: does this mean that, given two parameter points along a presumably continuous behavioural line, a listener would correctly place a new example? The work of Wessel (1973) and Grey (1975) on classical instrument timbre space seems to say that if sounds are hybridised by simple mixing of spectra or interpolation of envelope curves then the perceptual interpolation is relatively smooth and "navigable". That is to say, given a trumpet and a flute, somewhere in the middle, even though there are numerous candidate parameter combinations, there will be an area of "flumpets" and "trutes" (and also two clear flute and trumpet boundaries, with hysteresis such that the matching space is trisected). Yet the more complex the synthesis becomes, the less likely perceptual/behavioural interpolation can be achieved. Even if isolated points of sensible accordance can be found, unless the model is causally and structurally defined all other points fail, even those close to the working ones. Often the parametric space between superficially similar sounds just gives noise. The FM method can have parametric spaces where this happens.

DEVELOPMENT ADVANTAGES OF PROCEDURAL AUDIO

I will here present only a short overview of the development advantages of procedural audio since I have dealt with these in some detail elsewhere. As mentioned above, the side effects of obtaining behaviour for free, as a result of object design, are wonderful. It does, however, have some potentially annoying, but not insurmountable, artistic problems. Coupling between existing abstractions might be problematic. Changing one part of a model may change others in unexpected ways, so object classes might have to be overloaded for special cases and one-off events.

The potential solution to combinatorial asset growth is a benefit that alone may be enough for developers to embrace procedural audio. Where space is the issue, something we have not talked about yet is source compactness. As code, sound can be stored and transmitted with orders of magnitude better space efficiency (say 10kB instead of 10MB). Its value in replicated network games and mobile applications is high. This is topical at the present time in the UK, where the gold-rush to unbounded 3G bandwidth growth has hit a wall, leaving many business strategies without a paddle. Suddenly data reduction and efficient representation are on the agenda again, at least for casual mobile gaming.
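To make the compactness argument concrete, the toy generator below is a few hundred bytes of source rather than megabytes of samples, and it exposes only the two behavioural controls argued for in the raindrop example earlier; every internal setting is derived from them. The mappings are invented for illustration and are not acoustically calibrated:

```python
import random

def rain_block(rate=20.0, drop_size=0.5, sr=22050, dur=1.0, seed=1):
    """Toy rain generator with two behavioural controls: rate and drop size.

    Drops arrive as a Poisson-like process; each is a noisy excitation
    with a one-pole decay tail. Decay and level are derived from
    drop_size internally. Hypothetical sketch, not a production model.
    """
    rng = random.Random(seed)                 # seeded for repeatability
    n = int(sr * dur)
    out = [0.0] * n
    p_drop = rate / sr                        # chance of a new drop per sample
    for i in range(n):
        if rng.random() < p_drop:
            decay = 0.9990 - 0.0008 * drop_size   # bigger drops ring longer
            amp = 0.2 + 0.8 * drop_size
            x = amp * (2.0 * rng.random() - 1.0)  # noisy excitation
            j, env = i, 1.0
            while j < n and env > 1e-3:           # cheap decaying tail
                out[j] += x * env
                env *= decay
                j += 1
    peak = max(abs(s) for s in out) or 1.0
    return [s / peak for s in out]
```

The same code plays forever without looping artefacts and ships over a network in one packet, which is the point being made about replicated and mobile contexts.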


Overall cost also improves compared to sample playback (whose cost must grow linearly) above about 300 concurrent sources. In computer graphics there is a long-established concept of visual level of detail: objects at a distance, fast moving objects, or those of low relevance are drawn in a cursory fashion using techniques of texture MIP-mapping and partial rendering. Especially relevant to computer game production is to introduce the corresponding idea of LOAD (level of audio detail), a concept I am hoping to develop further in consultancy for the Thief title with audio director Paul Weir. This is made possible in procedural audio because, while data playback incurs a fixed cost per source, a computational source may have variable cost. Also, the application of a perceptual audio model to a game in which AI enemies respond primarily to sound seems an obvious direction in virtual (simulated) machine listening research.

Artistic Advantages of Procedural and Behavioural Thinking

Though I am fond of advocating procedural game audio from a technical position, it seems inescapable that there are numerous artistic advantages too. These have been difficult to formulate, and sometimes it has been extremely hard to find a sympathetic audience for them, despite my also being a sound artist and recognising the ways we have become stuck as sound artists, unable to move forward artistically until the limitations of samples have been surpassed; as I put it earlier, shaking off the concrete and dreary realism. The goal of sonic structuralism and formalism here is only to open a doorway to effective computational models. Once that door is open I think the results will be spectacular, not just technically but artistically. An accusation often levelled by sceptics of computational sound is that it's too complex for artists to deal with. Oddly, it's never actual artists or practitioners who are saying this. My thought is that the same was said of Pixar by pen and ink artists, just as the same was said about word-processors. And it is not just a question of tools. It is insulting to artists to assume the requisite concepts for advanced audio are beyond them. It is from here that the most creative of the next generation of sound designers will come, to enjoy manipulating and extending the exposed structure and functions available at the new frontier of sonic arts. In many ways the existing event-asset paradigm is as arbitrary as any other. Much creative energy is currently spent learning proprietary interfaces and getting around the limitations of what are simplistic, inelegant tools.

Something I classed previously as a technical development advantage, dynamic level of detail, is also something to be seen as an aesthetic opportunity. Many remedial tools, which exist because the philosophy of game audio is still in essence brute force, might be dispensed with. In a scene of sufficient complexity the procedural approach allowing LOAD is to be preferred, not only because it becomes more efficient but also because it offers a better quality of sound. This "sparse" quality accrues from the ability to select psychoacoustically relevant structures, thus dovetailing with existing sounds, whereas with sample playback there are ultimately limits to unbridled superposition. Game audio is frequently criticised as too dense. Designers are constantly badgered to produce "big" sounds which are then clumsily overused. The loudness wars have run out of earth to scorch and there's nowhere left to go. The limits of superposition, grey goo mixes (see also Hug, 2011), can be remedied only by taking something away, which is only possible if you have something to take away. Because procedural mixes are constructed from atomic contributions, it is possible to use psychoacoustic prioritisation (similar to the Bark band masking used in MP3 compression) to keep mixes sparse and clean while reducing CPU cost at the same time (for example, see Moeck, Bonneel, Tsingos, Drettakis, Viaud-Delmon, & Alloza, 2007). This allows space for significant events to be "punched through" under artistic control.

Remedial techniques employed in engines like the Wwise and FMOD middleware can selectively duck and filter sources to achieve focus. Beyond about 300 sources, the real-time dynamic processing required to tame and focus a mix becomes such a burden that it is probably better to deconstruct sources and to select behaviourally significant structures rather than allowing them to compete in a mix, hoping that the listener can properly attend.
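A crude sketch of such prioritisation follows. It is a stand-in for proper Bark-band masking analysis: each candidate source gets a priority from its level and an artist-assigned relevance weight (both values below are invented), and everything outside the budget is simply never synthesised, which is where the CPU saving comes from:

```python
def sparse_mix(sources, budget=3):
    """Keep only the perceptually dominant sources for this block.

    `sources` maps name -> (level, relevance); priority is their product.
    Sources outside the budget are not attenuated but not computed at
    all. Illustrative scheme only, not a real masking model.
    """
    ranked = sorted(sources.items(),
                    key=lambda kv: kv[1][0] * kv[1][1],
                    reverse=True)
    return [name for name, _ in ranked[:budget]]

scene = {
    "gunshot":     (1.0, 1.0),   # loud and narratively important
    "rain":        (0.4, 0.3),
    "footstep":    (0.2, 0.9),   # quiet, but gameplay-critical
    "vent_hum":    (0.3, 0.1),
    "distant_car": (0.1, 0.2),
}
keep = sparse_mix(scene)         # -> ['gunshot', 'footstep', 'rain']
```

Note how the quiet footstep outranks the louder vent hum: the relevance weight is exactly the hook through which "significant events" are punched through under artistic control.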


This logic is natural to composers and music engineers, who understand how, at a certain point in mixing a record, simplifying or removing parts produces a more potent result. Less can be more.

Another conceptual change in procedural audio is that objectified sounds no longer exist; they are only potential sounds. A sound object with essence (behaviour), form (model, methods, and implementation), and potentiality (latent signals brought out at run time) exists in its future. This is a movement away from sound as product towards sound as service: does this sound object have the potential to work for these situations (plural emphasised)? A creative mind limited to exercising choice and selection, reaching for the last popular library, typing a keyword into a search utility, can become mired in self-reference, gratuitous reuse, and aspiration to the last big thing. Getting stuck this way is bad for any artist. You become only as great as your library, and the ability of your memory (assisted by search tools) to navigate it. A sound artist is set free by considering sound as a process again instead of as an asset. Focusing on process reveals possibility and allows one to think in sound again, not merely to think of sound.

I'm also sceptical of current search and database technologies for sound, and curious about alternatives to language at the creative stage of sound design. Tags or symbolic tokens applied to sounds are of limited help. I have heard many researchers describe the hard problems in multi-dimensional search, for which sound is a difficult case even with meticulously crafted meta-data and timbre tag matching, such as those used by the Echo Nest. Sounds are not what they say they are, since beyond simple onomatopoeia the name of the source is not the sound. Linguistics and phenomenology of sound provide wonderful thought experiments. These are not airy philosophies; they are vital definitions that shape tools, influence software interfaces, and determine how we get to think about sound.

Challenging the underlying mindset that considers sound as a fixed asset may be helpful. As Rocchesso et al. (2004) put it, is the sound of a symphony radiating from a loudspeaker cone the sound of a loudspeaker cone? What is the sound and what is the source? How are a rock dropped into a pool and a stone thrown into a lake not the same sound? Language fails to help us where mass, diameter, velocity, fluid viscosity, and depth can: the palette of elementary physics provides a more real set of colours to play with, provided some basic understanding is assumed. In time, we may be able to provide simple interfaces that artists with no mathematical or physics knowledge can use purely by experimentation.

SUMMARY

Moving from a data model to a computational model of sound is more than a change of technology; it alters how we think about sound design. It is a move back towards an era of coherent audio-visual modelling that was cut short by a temporarily cheaper, but weaker, technology. The breaking away of sound, as though to a separate faculty, has come full circle and we now need to consider reintegrating sound within the larger model. This challenges the episteme of sound design and asks us to re-examine concepts of realism, satisfaction, and immersion. For artists the move is as profound as going from a two dimensional to a three dimensional creative medium. Identity is replaced by an appreciation of behaviour, structure, process, causality, and relativity to the environment. This enriches the ways we can talk to viewers or communicate experience to players. For developers, the challenges are equally tough in taking game and film sound design to another level. Some advantages do compete with existing sampling technology and, in some cases, procedural technology still has far to go to equal the impact of recordings. But "impact", "richness", and "realism" are ill-defined and overused words in sound design. We need to re-evaluate these words in structural and behavioural terms if they are to have any modern meaning in the


critique of 21st century digital audio. This means deeply questioning and perhaps rejecting the "Hollywood" values that have colonised game audio but may not actually be appropriate to a multi-modal, interactive context. There are new advantages, like LOAD, essential realism, space efficiency, sparse superposition, automatic asset generation, and sound object polymorphism, that are simply not relevant to data-driven methods of the film era and will never be achieved using samples. Only real-time procedural audio can address these concepts and, now that the necessary processing power is available, a new frontier has opened in game audio which is here to stay.

REFERENCES

Adrien, J. M. (1991). The missing link: Modal synthesis. In De Poli, G., Piccialli, A., & Roads, C. (Eds.), Representations of musical signals (pp. 269–298). Cambridge, MA: MIT Press.

Angus, J. A. S., & Caunce, A. (2010). A GPGPU approach to improved acoustic finite difference time domain calculations. Audio Engineering Society 128th Convention (paper 7963), London, UK.

Avanzini, F. (2001). Computational issues in physically-based sound models. Unpublished doctoral dissertation, University of Padova, Italy.

Beck, D. (2000). Designing acoustically viable instruments in Csound. In Boulanger, R. (Ed.), The Csound book: Perspectives in software synthesis, sound design and signal processing (p. 155). Cambridge, MA: MIT Press.

Benson, D. J. (2007). Music: A mathematical offering. Cambridge: Cambridge University Press.

Bilbao, S. (2009). Numerical sound synthesis. John Wiley & Sons.

Bregman, A. S. (1992). Auditory scene analysis: Listening in complex environments. In McAdams, S. E., & Bigand, E. (Eds.), Thinking in sound (pp. 10–36). New York: Clarendon Press/Oxford University Press.

Chion, M. (1994). Audio-vision. New York: Columbia University Press.

Collins, K. (2008). Game sound: An introduction to the history, theory, and practice of video game music and sound design. Cambridge, MA: MIT Press.

Cook, P. R. (2002). Real sound synthesis for interactive applications. A K Peters.

De Poli, G., Piccialli, A., & Roads, C. (1991). Representations of musical signals. Cambridge, MA: MIT Press.

Elmore, W. C., & Heald, M. A. (1969). Physics of waves. McGraw-Hill.

Farnell, A. J. (2008). Designing sound. London: Applied Scientific Press.

Gauss, C. F. (1882). General solution of the problem: To map a part of a given surface on another given surface so that the image and the original are similar in their smallest parts. Copenhagen: Journal of Royal Society of Science.

Geiger, G. (2005). Abstraction in computer music software systems. Unpublished doctoral dissertation, Universitat Pompeu Fabra, Barcelona.

Giordano, B. (2001). Preliminary observations on materials recovering from real impact sounds: Phenomenology of sound events. In Polotti, P., Papetti, S., Rocchesso, D., & Delle, S. (Eds.), The sounding object (Sob project) (p. 24). Verona: University of Verona.

Gordon, C., Webb, D. L., & Wolpert, S. (1992). Isospectral plane domains and surfaces via Riemannian orbifolds. Inventiones Mathematicae, 110, 1–22. doi:10.1007/BF01231320
Behaviour, Structure and Causality in Procedural Audio
Grey, J. M. (1975). Exploration of musical timbre. Stanford University Dept. Music Technology Report, STAN-M-2.

Grimshaw, M. (2008). Sound and immersion in the first-person shooter. International Journal of Intelligent Games & Simulation, 5(1), 2–8.

Grimshaw, M., & Schott, G. (2008). A conceptual framework for the analysis of first-person shooter audio and its potential use for game engines. International Journal of Computer Games Technology, 2008.

Hiller, L., & Ruiz, P. (1971). Synthesizing musical sounds by solving the wave equation for vibrating objects. Journal of the Audio Engineering Society.

Hug, D. (2011). New wine in new skins: Sketching the future of game sound design. In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey, PA: IGI Global.

Karplus, K., & Strong, A. (1983). Digital synthesis of plucked strings and drum timbres. Computer Music Journal, 7(4), 43–55. doi:10.2307/3680062

Kunkler-Peck, A. J., & Turvey, M. A. (2000). Hearing shape. Journal of Experimental Psychology: Human Perception and Performance, 26(1), 279–294. doi:10.1037/0096-1523.26.1.279

Little BigPlanet. (2008). Sony Computer Entertainment.

Lorenz, E. (1993). The essence of chaos. Seattle, WA: University of Washington Press. doi:10.4324/9780203214589

McAdams, S. E., & Bigand, E. (Eds.). (1992). Thinking in sound: The cognitive psychology of human audition. Oxford: Clarendon Press.

Miranda, E. R. (2002). Towards the cutting edge: AI, supercomputing and evolutionary systems. In Computer sound design (pp. 157–192). Elsevier.

Moeck, T., Bonneel, N., Tsingos, N., Drettakis, G., Viaud-Delmon, I., & Alloza, D. (2007). Progressive perceptual audio rendering of complex scenes. In Proceedings of the 2007 Symposium on Interactive 3D Graphics and Games (ACM SIGGRAPH), 189–196.

Moss, W., & Yeh, H. (2010). Automatic sound synthesis from fluid simulation. ACM Transactions on Graphics (SIGGRAPH 2010).

Mullan, E. (2009). Driving sound synthesis from a physics engine. In IEEE Games Innovation Conference (ICE-GIC 09).

Plomp, R., & Mimpen, A. M. (1968). The ear as a frequency analyzer. The Journal of the Acoustical Society of America, 36, 1628–1636. doi:10.1121/1.1919256

Polotti, P., Papetti, S., Rocchesso, D., & Delle, S. (Eds.). (2001). The sounding object (Sob project). Verona: University of Verona.

Rand, A. (1971). Art and cognition. In The romantic manifesto (p. 78). Signet.

Reiter, U. (2011). Perceived quality in game audio. In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey, PA: IGI Global.

Reiter, U., & Weitzel, M. (2007). Influence of interaction on perceived quality in audiovisual applications: Evaluation of cross-modal influence. In Proceedings of the 13th International Conference on Auditory Displays (ICAD).

Rocchesso, D., Avanzini, F., Rath, M., Bresin, R., & Serafin, S. (2004). Contact sounds for continuous feedback. In Proceedings of the International Workshop on Interactive Sonification.

Serquera, J., & Miranda, E. R. (2010). CA sound synthesis with an extended version of the multitype voter model. AES 128th Convention (8029), London, UK.
Smith, J. O. III. (1992). Physical modeling using digital waveguides. Computer Music Journal, 16(4), 74–91. doi:10.2307/3680470

Subrahmanyan, N., & Lal, B. (1974). A textbook of sound. Delhi: University of Delhi.

Takala, T., & Hahn, J. (1992). Sound rendering. Proceedings of SIGGRAPH ’92, 26(2), 211–220.

van den Doel, K., & Pai, D. K. (1998). The sounds of physical shapes. Presence (Cambridge, Mass.), 7(4), 382–395. doi:10.1162/105474698565794

Vicario, G. B. (2001). Prolegomena to the perceptual study of sounds. In Polotti, P., Papetti, S., Rocchesso, D., & Delle, S. (Eds.), The sounding object (Sob project) (p. 13). Verona: University of Verona.

Warren, R. M. (1992). Perception of acoustic sequences. In McAdams, S. E., & Bigand, E. (Eds.), Thinking in sound: The cognitive psychology of human audition. Oxford: Clarendon Press.

Wessel, D. L. (1973). Psychoacoustics and music: A report from Michigan State University. PAGE: Bulletin of the Computer Arts Society, 30.

Whitmore, G. (2009). The runtime studio in your console: The inevitable directionality of game audio. Develop, 94, 21.

Yee-King, M., & Roth, M. (2008). Synthbot: An unsupervised software synthesiser programmer. In Proceedings of the International Computer Music Conference.

Zheng, C., & James, D. L. (2009). Harmonic fluids. ACM Transactions on Graphics (SIGGRAPH 2009), 28(3).

Zheng, C., & James, D. L. (2010). Rigid-body fracture sound with precomputed soundbanks. ACM Transactions on Graphics (SIGGRAPH 2010), 29(3).

Zwicker, E. (1961). Subdivision of the audible frequency range into critical bands. The Journal of the Acoustical Society of America, 33, 248. doi:10.1121/1.1908630

KEY TERMS AND DEFINITIONS

Auditory Scene Analysis: Determining what is happening in the environment using only the sense of hearing.

Continuous Parameterisation: A system that has no discrete control states but is driven by a set of real numbers.

Deferred Form: A property of something that only becomes fully defined at the moment of use. For example, stock market investments have a deferred value because you don’t know what they will be worth in 20 years.

DSP: Digital signal processing. In our case, computer code designed to create or modify audio signals.

Dynamics: A branch of mathematics or physics concerning the changes occurring within a system.

Dataflow: A programming paradigm based on flow, usually with a visual (patching) front-end in which boxes represent transformations/functions and lines are wires that connect the boxes together. Examples include Max/MSP, Pure Data, CPS, and Reaktor.

Excitation: The means by which energy becomes vibration in a sounding object, often passing from an outside object to the object that makes the sound (although self-excitation by internal stress is possible). Examples are collision, friction, field and proximity coupling, plucking, turbulence, and induction.

Finite Difference (Also, First Difference): A DSP scheme where mathematical differentiation is represented discretely.

Finite Element: A part in a computational model in which a continuum is broken down into small, notionally atomic parts, each with simple
relationships to identical neighbouring parts. Behaviour of the whole then emerges from the simple behaviour of parts.

Good Behaviour: In a computer science sense, a well-behaved algorithm has predictability in time and space resources and its growth is acceptable for all likely input cases. It will not unexpectedly cause clicks, dropouts, or lockups. This is vital in a real-time system where resources may be pushed to the limit.

Grey Goo: The breakdown of all boundaries and dissolving of form as everything becomes uniformly bland. Sometimes used to express fear of a totalitarian transformation or the movement towards a lifeless normalised form without diversity or dynamics.

Idiophonic: Simple, often homogeneous objects having fixed size and shape throughout the sounding process. Examples are dropped objects, bottles, planks, and other simple geometric shapes in collision.

Implementation: A specific body of code or hardware for making some synthesis method work. Also, in traditional game development, the act of manually binding sound samples to events.

LOAD (Level of Audio Detail): Changes made to the running synthesis system according to the amount of detail needed in the signal. Can be determined by distance, focus, object relevance, or by other perceptual (psychoacoustic) factors.

Machine Listening: The branch of AI applied, in the widest sense, to recognition, classification, identification, or tracking of audio signal data. Sometimes AI auditory scene analysis, and speech recognition.

Mask Topology: A set of filters/rules for selecting or inhibiting points in a multi-dimensional data set.

Masking: A psychoacoustic phenomenon accounting for the dominance of sounds containing one feature over sounds with the same or similar features. Some perceptual rules define, according to time, amplitude and spectrum, the extent to which one sound may mask another. Taking advantage of this allows data reduction techniques, and a knowledge of it is at the heart of skilled synthesis.

Method: A general way of creating an audio signal.

Model: That which captures the overall behaviour of a sound object.

Occlusion: Changes to the intensity and tone of a sound by the intervention of an object between the source and listener.

Parameter: A value given to a sound object (a function of time). Three senses of the word are commonly encountered: formal parameters are blank slots left in the definition of an abstraction, to be filled later; actual, instantiation, or fixed parameters, which don’t change within the life of a sound, are used to set up an object when it is created; real-time or performance parameters are those properties of a simulated world that change (often rapidly) and are fed to a sound object instance while it runs.

Parametric (Signal) Method: A synthesis method that maps input parameters to an audio signal using only functions that define the spectrum, such as FM, AM, or waveshaping.

Physical Modelling: (A misnomer; more consistently written as physical methods). The class of methods that attempt to more or less precisely and uniformly model the material properties of an object. Mathematical ways of viewing mass, stiffness, force, and velocity of some finite elements or continuum.

Physically Informed Model: A simplified, perhaps heterogeneous, model in which physical properties and relationships are encoded as heuristics, simple rules, and coarse data. A looser and less precise model than a strict physical model. For example, a model of something as complicated as a helicopter cannot hope to be a physical model; it would contain too many elements. Instead, a physically informed model states the relationship between components like pistons, engines, exhausts, rotor blades, and so forth. The smaller components may themselves be (large scale)
physically informed parts or (small scale) physical models. The parts of a physically informed model don’t necessarily have to be physical models but could be parametric. It is the overall system model that embodies the physical relationships.

Physics Engine: Part of a computer game responsible for modelling the large scale behaviour of solids, fluids, and gases according to heuristic Newtonian physics; for example, the bouncing, rolling, and deformation of meshes according to mass, gravity, kinetic energy, and buoyancy. This code is usefully seen as distinct from modules responsible for the appearance and sound of objects.

Reactive Audio: Live synthesised sound that requires user input. Most sounding objects are reactive.

Replication: In computer games, replication is the problem of keeping networked clients in a multi-player game in synchrony so that each player on their gaming system feels they are experiencing the same instant in time and causality as other players (even though the clients are separated by significant and unpredictable packet latency). This is a hard problem (and in some senses, because of the speed of light, unresolvable).

S+S (Sampling Plus Synthesis): A synthesis method using a combination of stored wavetables, mixing, and post-processing (mainly consisting of time-variant filters and basic amplitude modulation). Popular in 1990s music synthesisers.

Sound Object (Also Sounding Object): Object oriented computer code implementing a virtual sound source with methods which activate parameters to DSP code. Musically, the analogy is a musical instrument.

Statefulness: Having, and being in, one of two or more discrete and exclusive states, such as existing or not existing.

Tensor: A way of representing a complex vector of forces (relationships between vectors).

Waveguide: A physical model where time delays and filters are used to simulate the motion and edge reflection of a wave in a bounded medium.

ENDNOTE

1. A note on terminology relevant to implementation: In computer science, we have the notion of a procedural language. A procedural sound will exist as happily on a computer running a functional language, a quantum computer if such a thing existed, or a Turing machine made of tin cans and string. The word “procedural” does not apply to the implementation, rather to the treatment of sound as process.
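The Waveguide and DSP entries above can be made concrete with a short sketch. The following Python fragment (an illustrative example only, not code from this chapter) implements a minimal Karplus-Strong-style plucked string: a delay line stands in for wave propagation along the string, and a two-point average with a small gain reduction acts as the lossy reflection filter.

```python
import random
from collections import deque

def pluck_string(sample_rate=44100, frequency=220.0, duration=1.0):
    """Minimal Karplus-Strong string: the delay-line length sets the
    pitch; a two-point average plus a 0.996 gain applies frequency-
    dependent loss at each 'reflection'."""
    period = int(sample_rate / frequency)            # delay-line length
    # Excitation: fill the line with noise (the pluck).
    line = deque(random.uniform(-1.0, 1.0) for _ in range(period))
    out = []
    for _ in range(int(sample_rate * duration)):
        oldest = line.popleft()
        # Average with the next sample and damp slightly, then feed back.
        line.append(0.5 * (oldest + line[0]) * 0.996)
        out.append(oldest)
    return out

samples = pluck_string()
```

The delay line costs no arithmetic to "propagate" the wave; only the reflection point is computed, which is the efficiency argument made for waveguides in the chapter that follows.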
Chapter 16
Physical Modelling for
Sound Synthesis
Eoin Mullan
Queen’s University Belfast, N. Ireland
ABSTRACT
While the first computer games synthesised all their sound effects, a desire for realism led to the wide-
spread use of sample playback when technology matured enough to allow it. However, current research
points to many advantages of procedural audio which is generated at run time from information on
sound producing events using various synthesis techniques. A specific type of synthesis known as physical
modelling has emerged, primarily from research into musical instruments, and this has provided audio
synthesis with an intuitive link to a system’s virtual physical parameters. Various physical modelling
techniques have been developed, each offering particular advantages, and some of these have been used
to synthesise audio in interactive virtual environments. Refinements of these techniques have improved
their efficiency by exploiting human audio perception. They have been implemented in large virtual
environments and linked to third party physics engines, unveiling the potential for more realistic audio,
reduced production costs, faster prototyping, and new gaming possibilities.
DOI: 10.4018/978-1-61692-828-5.ch016

Copyright © 2011, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.

INTRODUCTION

Current research is realising the potential of procedural audio for generating sound effects in computer games. Procedural audio is generated from information on specific sound producing events and the result is a unique soundtrack each time a virtual environment is explored. This sound generation process often involves the physical modelling of sound producing objects from an acoustic perspective. This chapter looks back at the relevant history of physical modelling and forward to how it is set to be a part of the future of computer game audio. It is laid out in four sections. The remainder of this section gives a brief history of sound effects in computer games before discussing the shortcomings of sample playback and the potential of procedural audio and physical
modelling. The second section takes a look at the evolution of physical modelling, including many lessons which are to be learned from the physical modelling of musical instruments. Some specific techniques are discussed, with extra emphasis placed on two techniques, modal synthesis and digital waveguide synthesis, which are particularly useful in real-time applications. The third section presents, in chronological order, the projects that have made advances in the area of physical modelling for sound synthesis in computer games and virtual environments, and the last section looks at directions which future research may take as well as an industry perspective on the technique.

The earliest computer games to include sound effects synthesised them using whatever hardware was available at the time. The sounds produced were very much influenced by the limitations of the hardware. Although by today’s standards they could not be considered realistic, they were marketed as such and a “drive towards realism […] is a trend we shall see throughout the history of game sound” (Collins, 2008, p. 9). “By 1980, arcade manufacturers included dedicated sound chips known as programmable sound generators, or PSGs into their circuit boards” (Collins, 2008, p. 12). Early consoles also had sound chips that developers used to synthesise sound and again the hardware limitations influenced their work. However, as computers became more powerful, developers began to utilise recorded samples in their quest for realism. Andy Farnell, author of Designing Sound (2008), explains, “[e]arly consoles and personal computers had synthesiser chips that produced sound effects and music in real-time, but once sample technology matured it quickly took over because of its perceived realism thus synthesised sound was relegated to the scrapheap” (p. 298). Karen Collins gives a comprehensive account of the earliest chips that performed synthesis up to the first systems that were capable of playing CD-quality samples.

When sound effects are required in a virtual environment today, sample playback is the most common method of producing them (McCuskey, 2003) and it is widely used for good reason. Raghuvanshi, Lauterbach, Chandak, Manocha, and Lin (2007) state that the method of sample playback is “simple and fast”, meaning it is computationally inexpensive and straightforward to implement (p. 68). The method takes advantage of known sound design techniques that have been refined through a long history of use in the movie industry. In the introduction to Real Sound Synthesis for Interactive Applications, Perry Cook (2002) concedes that, with much effort having gone into improving the sample-based approach, “the state of the art has indeed advanced to the point of absolute realism, at least for single sounds triggered only once” (p. xi). However, there are many drawbacks with sample playback and, in an interactive environment, it cannot provide “absolute realism”.

The sounds heard in reality are produced as a result of many factors, and information on these factors is contained within the sounds. For example, a piece of wood struck near its centre will sound different than when struck close to its edge. If struck harder it will not only sound louder but it will have a different quality (Cook, 2002, p. xiii). When continuous contact is maintained, for example during rolling or scraping, another variation of sounds is heard. The object used to excite the piece of wood also has an influence. While the sonic differences can be subtle, we intuitively perceive the conditions that cause them and therefore these factors are important in creating realistic audio. One recording of a piece of wood being struck may be enough to provide realistic audio in a pre-determined scenario but it will be inadequate in a fully interactive environment.

A partial solution to this problem is to use multiple recordings. We could, for example, record a block of wood being struck on various points with varying forces using different objects. When its virtual counterpart is excited, an algorithm can then determine the most suitable sample to playback or interpolate between the most appropriate samples. However, this approach can quickly
become expensive in terms of both the memory and processing power required. Cook (2002) goes on to say: “To have truly realistic continuously interactive sound synthesis, however, essentially infinite memory would be required to store all possible samples” (pp. xi-xii) and “the equivalent of a truly exhaustive physical model would have to be computed in order to determine which subset of the samples and parameters would be required to generate the correct output sound” (p. xii). This means that, while using samples can provide excellent audio in pre-determined situations, in a fully interactive environment it is not capable of producing realistic sound for every eventuality that may occur.

Even if one were to ignore, as game developers have been doing, the limitation of sample playback described, there are further drawbacks with the technique. Firstly, it costs time and hence money to record a library of samples, something which is described by Doel, Kry, and Pai (2001) as “a slow, labor intensive process” (p. 537). As some sounds will not be specific to just one game, for example, footsteps or doors opening and closing, it may be possible to reuse samples from a previously compiled library, but this leads to another disadvantage of the practice, which is that repeatedly using the same samples as often as is necessary can become repetitive to the listener/gamer. A further disadvantage is that sound effects can only be included for objects that are conceived during a game’s development. There is no scope for allowing the user to create, modify, and then sound his or her own custom objects. This is becoming more restrictive with the increasing popularity of games such as Media Molecule’s LittleBigPlanet (Sony Computer Entertainment Europe, 2008) that feature User Generated Content (UGC) and developments like LucasArts’ Digital Molecular Matter technology, which can realistically simulate objects being broken, hence creating a different combination of shattered pieces each time.

The solution is to defer some of the audio processing until runtime when information on the specific sound producing event is known. This information is used to drive more realistic audio production. Director of Audio at Microsoft Game Studios, Guy Whitmore (2009), explains this in a statement written on his then upcoming Audio Keynote speech to the 2009 Develop Conference in Brighton:

fully-mixed wave files, common in the majority of games today, are like 2D art; you can only see one side. They’re big and clunky and typically pre-processed. Yet there’s a desire and need to hear sound and music from all perspectives, from many directions, in various surroundings, and in different contexts. In order to effectively react and adapt to a non-fixed game timeline, audio content must be authored and kept in smaller component parts until runtime, when they are assembled and mixed. So take your entire music studio and sound design suite, including mixers, synths, samplers, DSP, mics, speakers; virtualise everything and make it adequately efficient, then drop it all into your game authoring environment. Stir in a non-linear DAW [Digital Audio Workstation], the right game ties, and some mixing intelligence, and there you have it; a complete runtime studio, ready with the flexibility to turn on a dime to meet the needs of the game.

This technique improves on simply playing back samples, but researchers wish to take the idea further. “It’s about sound as a process rather than sound as data” (Farnell, 2008, p. 1). Farnell goes on to explain the concept of procedural audio:

The sample based data model requires that most of the work is done in advance, prior to execution on the platform. Many decisions are made in advance and cast in stone. Procedural audio on the other hand is highly dynamic and flexible, it defers many decisions until runtime. (p. 301)
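The deferral described above can be sketched in a few lines. In this illustrative Python fragment (the function, parameter names, and material table are invented for the example, not taken from any real engine), nothing about the sound is fixed until the collision event arrives at runtime: pitch, brightness, loudness, and length are all computed from the event’s own parameters.

```python
import math

# Invented material table: (base frequency in Hz, decay time in s).
MATERIALS = {"wood": (180.0, 0.30), "metal": (420.0, 1.20)}

def impact_sound(material, force, size, sample_rate=44100):
    """Synthesise a simple decaying tone from runtime event data:
    larger objects ring lower; harder impacts are louder and
    brighter (a stronger overtone)."""
    base_hz, decay = MATERIALS[material]
    freq = base_hz / size                 # bigger object -> lower pitch
    amp = min(1.0, force)                 # harder hit -> louder
    out = []
    for i in range(int(sample_rate * decay)):
        t = i / sample_rate
        env = math.exp(-5.0 * t / decay)  # exponential decay envelope
        s = math.sin(2 * math.pi * freq * t)
        s += 0.5 * force * math.sin(2 * math.pi * 2.7 * freq * t)
        out.append(amp * env * s)
    return out

hit = impact_sound("wood", force=0.8, size=1.5)
```

A sample-based pipeline would have to anticipate every combination of material, force, and size in advance; here those decisions are deferred until the moment the physics event is reported.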
Procedural audio is generated in real-time from all the relevant information available. As it is generated in response to sound producing events it also fits the term “dynamic audio”, which Collins (2008) defines as audio that “reacts both to changes in the gameplay environment, and/or actions taken by the player” (p. 4). Researchers have developed techniques to generate many types of sounds, from pouring and bubbling water to fire and thunder and from guns and explosions to animal and vehicle sounds. Farnell details many of these techniques in Designing Sound (2008). While it’s true that synthesising sounds from scratch is more computationally expensive than sample playback, “the rewards are astonishing” (Farnell, 2008, p. 1). The ever increasing capabilities of personal computers and games consoles, coupled with the desire for realism, mean that procedural audio is set to become part of the future of computer game audio.

Farnell (2011) describes the thinking behind procedural audio, with the ultimate goal of creating perceived realism, elsewhere in this book. Most often this will involve a degree of sound synthesis. The current chapter is concerned with a specific type of sound synthesis known as physical modelling, which is mentioned by Martin D. Wilde (2004, p. 158) as a future direction for computer game audio. To physically model an object means to simulate how it behaves physically, and from an audio perspective this means simulating how an object vibrates in response to excitation and causes sound waves to be radiated from it. Physical modelling can be used to realise procedural audio and would be the preferred method of those whom Farnell (2011) might call “moderate essentialists”. A range of physical modelling techniques have been developed and this is the subject of the next section.

PHYSICAL MODELLING FOR SOUND SYNTHESIS

Sound synthesis techniques have been developed for musical and sound design purposes since the late 1950s, when Matthews first performed wavetable synthesis by generating sound from data stored in tables (Bilbao, 2009). The following decade saw the development of FM synthesis and additive synthesis and, since then, composers and sound designers have learned to use these techniques and others to achieve their aims. However, these techniques, often described as abstract, have no basis in the real world and so the link between the input parameters and the sound produced is not naturally intuitive. This means control systems are often complex for the user (Adrien, 1991) and, while commonly used in early computer games, there was no obvious link between their input parameters and in-game variables. Physical modelling synthesis techniques are, however, based on real physical systems where there will usually be an intuitive link between the configuration of the system and the sound produced. Hence there should be an intuitive link between the parameters of a physical model and the sounds it produces, offering good control to sound designers (and musical composers).

To discuss physical modelling from an acoustic point of view is to discuss the physical modelling of musical instruments, as this is a long established research area in which much progress has been made. Much of what has been learned can be applied to the physical modelling of any sounding object and is therefore relevant where sound effects are to be created in a virtual environment. This section looks at some of the methods available before giving a more detailed look at the methods most suitable for operating in real-time and those used in the projects discussed.
Throughout history, musical instruments have
evolved to create sound in a sometimes complex
way while giving relatively simple control to the
musician. Instrument makers have, through centuries of tradition, learned which design subtleties are important in particular instruments and what is necessary to make an instrument sound “good”. In more recent times, the exact way in which instruments produce and radiate sound has been an area of research for physicists and acousticians. Mathematical equations have been devised to describe the physical behaviour that causes sound producing vibrations for many instruments and this has increased our understanding of them. The interested reader is directed to The Physics of Musical Instruments (Fletcher & Rossing, 1991), which gives the derivation of such equations for many popular instruments. Methods have been developed to solve these equations numerically, enabling computers or signal processing hardware to simulate the instruments, thereby revealing more about why they sound as they do. This has not only increased our understanding of instruments and sound-producing objects but it has also given musicians new tools with which to create sounds and music. For a more thorough explanation of the methods described below, and others, one may wish to read Discrete Time Modelling of Musical Instruments (Välimäki, Pakarinen, Erkut, & Karjalainen, 2006).

Physical modelling techniques were first developed in the 1960s based on the causal systems that produce sound in reality (Kelly & Lochbaum, 1962; Ruiz, 1969). Back then, the limited processing power and availability of computers was prohibitive but, as computers became more powerful and accessible, the algorithms being developed could afford to be more computationally demanding and this allowed researchers to develop many different physical modelling techniques.

The finite element method (FEM) has been employed in many areas of scientific research, from car crash simulations to complex weather systems. It involves dividing a distributed physical system, which is complex, into discrete elements, which are simple. The advantage of FEM over other physical modelling techniques is the ease with which it can be applied to complex geometries that do not behave uniformly throughout. As well as this, more accuracy can be gained in important parts of a structure, for example the part of a car which will be impacted upon during a crash simulation, by using a finer grid in that area. However, difficulties arise when one attempts to extract an audio signal from the surface vibrations of an FEM simulation due to the irregularity of the grid and, in general, it is a computationally expensive technique. For this reason, FEM is quite uncommon in acoustic applications and, while sometimes used in a pre-processing stage, it is less suitable for direct sound synthesis in real-time.

Methods such as finite difference models and mass-spring networks also discretise an object in space and time. The current position of a point on an object is calculated from its own position at previous temporal intervals and that of neighbouring points. The mass-spring networks method treats an object as a mesh-like network of point masses connected by springs and dampers. The positions of points in the network are calculated using Newton’s second law of motion and Hooke’s law. The finite difference method is based on partial differential equations (PDEs), also derived using Newton’s second law of motion and Hooke’s law, that describe the vibration of an object. To enable these equations to be solved numerically, they are discretised in space and time using finite approximations. With the finite difference method, the same equations are used to describe all parts of the object being modelled and so the method is suited to objects that behave uniformly throughout and that can be fitted to a regular grid. Both these methods can be used in real-time and offer some advantages; for example, it is easy to extract an audio signal from the surface vibrations of an object modelled using these techniques. The techniques allow for real-time interaction, at one or many points on the object, which could be external excitation or coupling with another object. They can also model distributed nonlinear effects, that is, effects due to larger or smaller vibrations beyond the amplitude
of the output. Higher spatial and temporal resolutions can be used for a more accurate simulation but this costs more in terms of processing cycles and these techniques can become computationally expensive for large objects. For this reason, they are not often the preferred method for environments involving many objects or in cases where efficiency is important and sound quality is not the top priority.

In the mid-1980s, digital waveguide synthesis emerged as a computationally cheap method for modelling one-dimensional (1D) wave propagation in systems with no stiffness. As many stringed and woodwind instruments approximately fit this description, the technique proved popular and has been heralded as "the most successful physical modelling technique to date" (Bilbao, 2009, p. 18). It is explained as follows.

The vibrational behaviour of a 1D system with no stiffness can be modelled as two travelling waves propagating through the structure, bouncing back at the ends and not interacting with each other. These waves originate from an excitation signal and initially leave the excitation point in opposite directions. Digital waveguide synthesis involves employing two memory buffers as a delay line for each direction. The size of the buffers corresponds to the time taken for a sound wave to travel the length of the structure being modelled and is therefore calculated using the speed of sound in the medium. To model attenuation, the waves are passed through a digital filter as they bounce at each end. The filters are designed to apply the correct attenuation for the structure being modelled and can apply frequency-dependent attenuation, giving the output sound the signature of different materials. As locations in the memory buffers correspond to positions on a real object it is easy to simulate the structure being excited at different points by adding an excitation signal to the corresponding memory location of each buffer. Audio output can easily be extracted from the model by summing the contents of the memory location of each buffer corresponding to the desired pickup position.

Compared to the techniques discussed so far, which may require hundreds or thousands of arithmetic operations per temporal interval, digital waveguide synthesis is a very inexpensive technique. The movement of a signal through a buffer requires no arithmetic operations and a second order digital filter will often suffice for applying attenuation at the ends. It should be stressed that this technique does not gain computational efficiency over other methods through some crude approximation or by sacrificing veracity. Instead, it exploits the harmonic nature of waves in a 1D system without stiffness and employs a very cheap method of delaying a signal in time.

An obvious drawback with digital waveguides is their limitation to 1D systems. When they have been extended to more dimensions, creating a "waveguide mesh", the performance gains have been lost and the number of arithmetic calculations involved invariably approaches that of a finite difference scheme (Bilbao, 2006). Another limitation is their inability to model the effect that the vibration amplitude has on how a system resonates, that is, they cannot easily be used to model distributed nonlinearities. This effect is apparent when one listens to how the sound of a gong changes as it resonates.

To understand another drawback of digital waveguide synthesis one must understand the effect that bending stiffness has on how an object vibrates. The composite frequencies of a 1D sounding object with no bending stiffness will be harmonically related, that is, the higher frequencies will be integer multiples of the fundamental frequency. Bending stiffness causes waves to propagate more quickly through a medium and, significantly, this effect is more pronounced in higher frequencies. Therefore, the spectrum of a 1D sounding object with bending stiffness will not be harmonic. Because many stringed and woodwind instruments are not greatly affected by stiffness, this disadvantage has not been enough


to dissuade its widespread use in musical applications. However, the primary sound sources in a typical virtual environment are rarely plucked strings and woodwind instruments and so digital waveguide synthesis has not been so commonly adopted in these situations. The effect of stiffness can be realised by using more sophisticated, and hence more computationally expensive, digital filters. Some research has been carried out in an attempt to do this in an inexpensive way (Essl, Serafin, Cook, & Smith, 2004) but, to accurately realise the full effect of stiffness, the total computation required would equal that of the next physical modelling technique to be discussed.

The method which has become the most popular for real-time sound synthesis in interactive virtual environments is modal synthesis. It is a physically based method that has many advantages for real-time interactive sound synthesis. As it is used by many of the projects discussed in this chapter, it is beneficial to understand some theory behind it. There follows a brief description of the technique and, for a more detailed examination, one may wish to read Jean-Marie Adrien's seminal work on the subject, The Missing Link: Modal Synthesis (1991).

To understand modal synthesis, one should consider the sound of a vibrating object as a sum of decaying sinusoids. The frequencies of these sinusoids correspond to an object's modes of vibration and, in modal synthesis, they are recreated by a bank of oscillators working in parallel. Each mode of vibration also has an associated decay rate and a shape function or shape matrix and, together with its frequency, they represent an object's modal data, that is, all the information needed to carry out modal synthesis for that object. A shape function, or shape matrix, provides the gain of a given mode for a specific contact location meaning that modal synthesis can create contact location-specific sounds, an obvious advantage in virtual interactive environments where realism is desired and perceptual cues are important. A common way of implementing oscillators for modal synthesis is to create a digital filter tuned to resonate at the desired frequency and with the desired decay rate. To simulate object excitation, an impulse signal is input into each filter independently. The location-dependent gain can either be incorporated into the design of each filter, or, if the contact location is to be varied during run time, applied to the impulse signal before input to each of the filters.

It is with good reason that modal synthesis has become the most popular physical modelling technique to be applied in interactive virtual environments. Its many advantages will be presented here in detail as they are relevant to the projects discussed later in this chapter. Firstly, modal synthesis generates sound as a sum of sinusoids which correlates naturally with the way in which humans perceive sound. The computational expense of the technique is directly proportional to the number of modes being used and, as in theory this is unlimited, one cannot simply assert that it is a cheap technique. However, many studies have indicated that realistic object contact sounds can be created with little computational expense and that its memory usage is much lower than that of the digital waveguide technique. Furthermore, studies have implemented modal synthesis in such a way that the processing power afforded it can be varied at run time with a graceful, if at all noticeable, degradation in sound quality. The ability to set a sonic level of detail is particularly appealing in a large environment where some sounding objects may be far away from the observer or occluded, or in a busy game environment where the player's attention is to be focused on a particular aspect. It is also attractive to have an audio engine that can utilise more processing power when it is available and make do with less when it is not.

Although, like digital waveguide synthesis, modal synthesis cannot easily model distributed nonlinear effects, it can model the effect of bending stiffness at no extra cost and hence synthesise the sound of inharmonic objects. This is important in a typical virtual environment where the majority


of sounding objects will have a level of stiffness that is significant from an acoustic point of view. Like digital waveguide synthesis, it is possible with modal synthesis to give the effect of sounding different materials by changing how quickly the composite frequencies are damped with time. This is straightforward to implement with modal synthesis (damping can be simply applied to each mode of vibration) as opposed to digital waveguide synthesis which requires filters to be designed with this in mind.

Another advantage of modal synthesis is its versatility with regard to how an object's modal data is obtained. Projects have determined information on an object's modes of vibration from calculations, from recordings of real sounding objects, and by using finite element analysis with each of these methods having its own advantages and disadvantages.

A final physical modelling technique to be aware of is called the functional transformation method (FTM) (Trautmann & Rabenstein, 2003). Originally developed in the 1990s, this more formalised form of modal synthesis derives the modal data directly from the underlying PDEs that describe the object's vibrations, using Laplace and Sturm-Liouville transformations. The FTM can be applied in 1D, 2D, and 3D linear systems with regular shapes and provides a more structural approach to interconnecting vibrating structures. Multi-rate techniques, which involve simulating lower frequencies with a lower sample rate without affecting the sound quality (a technique which can also be applied to modal synthesis), enable FTM to be used in real-time on a typical desktop PC. However, to date, this technique has not been utilised in an interactive virtual environment.

THE PAPER TRAIL OF PHYSICAL MODELLING IN VIRTUAL ENVIRONMENTS

The idea of sound rendering for computer animation was introduced by Tapio Takala and James Hahn in their paper Sound Rendering (1992) and they pioneered many of the ideas used in more recent projects. They presented a methodology for combining procedural sounds into a synchronised soundtrack based on an animation. Each object in a scene is associated with a characteristic sound and a sound script is created for an animation detailing "how a prototype sound signal will be instantiated, and how it is transformed by the acoustic environment" (Takala & Hahn, 1992, p. 218). Although they acknowledge the potential for physical modelling, saying "sound can be synthesized from physical principles" (Takala & Hahn, 1992, p. 214), and they describe how an object's complex vibration can be computed as a sum of its vibration modes (modal synthesis), the main focus of the paper is on the modulation of sounds due to propagation in a three-dimensional environment.

The authors touch upon the idea of driving sound synthesis from a physics engine when they propose "key events of a script can be […] automatically computed by a behavioural or physically-based motion control" (Takala & Hahn, 1992, p. 215). This idea has been built upon by some of the projects presented later in this chapter which successfully extract and use not only event triggers, but also information from a physics engine. Takala and Hahn also give their insight into how sound is produced when two surfaces slide over each other:

the surface features cause both of the objects to vibrate in the same way as a phonograph needle vibrates when dragged in the groove of a record disk. The waveform of the sound generated is similar to the shape of surface imperfections of the objects. The so called 1/f noise could be used


to model this phenomenon for rough surfaces. (p. 214)

While its implementation is not presented in their paper, the idea of using 1/f noise for a scraping sound has been utilized to good effect in more recent projects.

Around the same time as the above work appeared, William W. Gaver published some research on auditory event perception (Gaver, 1993b) and synthesizing sounds for auditory icons (Gaver, 1993a). In the former work, Gaver notes that "sound provides information about an interaction of materials at a location in an environment" (Gaver, 1993b, p. 5) and he explores "everyday listening" as the process of listening to events rather than sounds. From this, a method for describing sounds by attributes of the sound source is proposed. In the latter work, Gaver states that auditory icons can add functionality to a computer environment and, while the digital samples cannot be easily manipulated, it is possible to synthesize them. He then describes algorithms to synthesize many everyday sounds as specified by parameters that describe the sound producing event, unlike many synthesis methods at the time which synthesized sound in musical terms. This was an important step towards synthesizing object contact sounds based on their physical attributes, a necessity of physically motivated sound synthesis in virtual environments.

A few years later, in 1997, Perry Cook published research carried out on physical modelling. Physically Informed Sonic Modeling (PhISM): Synthesis of Percussive Sounds appeared in the Computer Music Journal (Cook, 1997) and documented Cook's work in creating a framework which could synthesize the sounds of a wide range of percussion instruments. Here, two synthesis algorithms are presented motivated by two distinct types of percussive instruments. The first, Physically Informed Spectral Additive Modeling (PhISAM), deals with resonant percussive instruments like a marimba or cowbell. As the name may suggest, this method uses modal synthesis and the intention is that the modal data has a physical meaning. The framework suggests methods for determining the modal frequencies from the recording of a real sounding instrument, for example "peak-picking from spectra calculated by Fourier analysis" (Cook, 1997, p. 39). Although automation can help in setting the modal data, the author suggests "it relies heavily on human analysis and decisions" (p. 39). To create an appropriate excitation signal, the suggestion is made to record an actual object striking a non-resonant surface. So, if one wished, for example, to simulate a virtual object being struck by a mallet, one should use the recording of a real mallet striking a non-resonant body. If the body being struck is completely rigid and non-resonant then the sound made will be due only to the mallet and should therefore capture the excitation force, which can be used as an excitation signal in the virtual world.

The second algorithm is called Physically Informed Stochastic Event Modeling (PhISEM). This involves "the overlapping and adding of small grains of sound" (Cook, 1997, p. 40), a familiar process known as granular synthesis. The method is motivated by instruments like the maracas or tambourine, the characteristic sounds of which are made up of many short, discrete sound events. While these instruments may be uncommon in the average computer game environment, Cook notes that PhISEM algorithms can also be used to synthesize many everyday sounds like footsteps on gravel or dripping water. The process involves analyzing a sound-producing system and possibly the waveform of the sound it produces. From this analysis, rules are created to map the variables of the system, such as the shake velocity of a maracas, to the parameters of a granular synthesis algorithm.

One of many contributions in the area by Kees van den Doel and Dinesh K. Pai was published in 1998. The Sounds of Physical Shapes (Doel & Pai, 1998) documented the creation of their Sonic Explorer application which adds object contact sounds to real-time simulation environments.
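The modal synthesis operation that Sonic Explorer and several of the later systems rely on, summing a bank of exponentially decaying sinusoids defined by an object's modal data, can be sketched in a few lines of code. The Python below is purely illustrative: the function name and the modal data values are invented for the example and are not taken from Sonic Explorer or any other system discussed in this chapter.

```python
import math

def modal_impulse_response(modes, sample_rate=44100, duration=0.5):
    """Render an impulse response as a sum of decaying sinusoids.

    Each mode is a (frequency_hz, decay_per_second, gain) triple.
    In a full system the gain would come from a shape function
    evaluated at the contact location, as described in the text.
    """
    num_samples = int(sample_rate * duration)
    output = [0.0] * num_samples
    for frequency, decay, gain in modes:
        phase_step = 2.0 * math.pi * frequency / sample_rate
        for n in range(num_samples):
            t = n / sample_rate
            # Decaying sinusoid for this mode, added into the mix.
            output[n] += gain * math.exp(-decay * t) * math.sin(phase_step * n)
    return output

# Invented, loosely bell-like modal data; not measured from any real object.
modes = [(440.0, 6.0, 1.0), (1130.0, 9.0, 0.5), (2210.0, 14.0, 0.25)]
signal = modal_impulse_response(modes)
```

In practice, the per-sample sin and exp evaluations would be replaced by the recursive two-pole resonant filters mentioned above, which generate the same decaying sinusoids at a fraction of the cost.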


They focused on making realistic object contact sounds by considering the shape and material of the sounding object, the location of contact on the object, and the force of the impact. Sound is generated by modal synthesis and an object's modal data is determined by calculation. This is possible for regularly shaped objects and, in Sonic Explorer, has been carried out for strings, bars, plates, and membranes (in physical modelling terminology, a membrane is a two-dimensional system with no stiffness). The process of calculating modal frequencies for these objects is explained in chapters two and three of The Physics of Musical Instruments (Fletcher & Rossing, 1991) and in Sonic Explorer this is carried out off-line as a pre-processing stage. Doel and Pai explain that the contribution of each mode to the overall vibration depends on the impact location and they derive some general formulas to calculate the mode amplitudes for a given impact location. This means that when an object is struck on different locations there will be subtle differences between the sounds produced that match what the listener intuitively expects to hear. The authors do not take into account the directionality of the emitted sound nor do they accurately model a physical environment as this is beyond the scope of their project but, with a few justified simplifications, they present a suitable method for transforming the vibrations of an object into the sound that is heard. Next, they show how to account for different materials by explaining that an object's internal friction parameter, which is determined by its material, affects how its sounding frequencies become damped with time. By changing the damping values they can therefore create the effect of sounding different materials. A frequency-independent damping value is applied to all frequencies equally and a frequency-dependent damping has a greater effect on higher frequencies.

The beginning of the new millennium saw a significant contribution from a team at the Polytechnic University of Milan, led by Augusto Sarti. In their paper Object-Based Sound Synthesis for Virtual Environments (Pedersini, Sarti, & Tubaro, 2000), they present a way to model multiple interacting sounding objects. They model sounding objects by combining digital waveguides and wave digital filters (Fettweis, 1986), which are closely related to digital waveguides. Nonlinear elements are incorporated in object contact conditions and these are exploited to make a dynamic interconnection topology. This allows for connections between sounding objects to be made and broken as the acoustic effects are accurately modelled. They also give an overview of available sound rendering algorithms, with a strong influence from 3D graphics rendering. More recently, this approach has been automated using a binary connection tree (BCT) (De Sanctis, Sarti, Scarparo, & Tubaro, 2005) allowing real-time interactions between sounding objects in an interactive setting.

In 2001, Doel and Pai, along with others at the University of British Columbia, Vancouver, developed a system for bringing real world objects into the virtual domain. Scanning Physical Interaction Behaviour of 3D Objects (Pai et al., 2001) extends the idea of simply scanning an object to create a graphical virtual equivalent by capturing its deformation behaviour (how it reacts to external force), its surface texture, and its sound producing properties. This is achieved by scanning the interaction behaviour in a variety of ways using robotic measurement facilities. While the techniques for finding an object's deformation behaviour and surface texture are fascinating they fall outside the scope of this chapter and so this project will be examined from an audio perspective.

Similar to The Sounds of Physical Shapes, this project generates sound by modal synthesis but now the modal data is automatically determined for an object based on robotically obtained measurements. To this end, a robotic arm applies an approximation of an impulse force (a short tap) to a location on the object being scanned and the resulting sound is recorded. A technique is described for extracting the object's modal data (frequency of vibration modes, the amplitude


of these modes, and their damping) from this recording. To enable impact location-dependent sound synthesis, that is, to realize the subtle differences in sound due to an object being struck on different locations, this process is repeated at many points on the object's surface. In order to model friction, a measure of surface roughness is determined robotically and this is then used for generating continuous contact sounds.

In the same year, Doel and Pai along with Paul G. Kry, who specializes in computer graphics and physics based animations, published FoleyAutomatic: Physically-based Sound Effects for Interactive Simulation and Animation (Doel, Kry, & Pai, 2001). In this project the process of sound synthesis is linked to a dynamic simulation (physics engine) that allows user interaction. Sound is generated using modal synthesis, as described in The Sounds of Physical Shapes (Doel & Pai, 1998), and modal data is determined as in Scanning Physical Interaction Behavior of 3D Objects (Pai et al., 2001). In addition to being linked to a dynamic simulation, this project is the first to synthesize continuous contact sounds, that is, rolling and scraping, in a real-time simulation. The technique used in this paper could be implemented with any multi-body dynamic simulation method as the requisite information, such as object collision forces, normal forces during continuous contact, and the position of objects relative to each other, if not directly available, can be easily computed from the object velocities and positions which are available. However, the authors note that methods which provide smooth surface models are preferable to polyhedral approximations and methods which model rolling and sliding are also desirable.

In order to create realistic audio, object interactions are simulated at an audio rate which is much higher than the video rate. The modelling of object interactions must be carried out quickly and the authors note a stochastic model that involves some random element is often appropriate. When objects make contact, a force signal is calculated at the audio rate and this is used to drive modal synthesis. In the case of a single impact, it was found that the shape of the force profile was not perceptually important but that the duration of the force gave a good feel for the hardness of the objects in contact. For collisions involving hard surfaces, a burst of impulses at the dominant modal frequencies of the colliding objects was found to convincingly produce the sound due to micro-collisions. To create a scraping impulse, the phonograph model as described earlier in the work of Takala and Hahn (1992) is recalled. To create such a surface profile, noise is filtered to give a 1/f shape and a fractal dimension variable is considered to represent the roughness in the profile produced. In the case of rolling, it is theorized that the impulse is similar to scraping but more "drawn out in time" (Doel et al., 2001, p. 541) and this is realized by applying a low-pass filter to the scraping impulse. Suspecting a stronger coupling between objects during rolling than scraping, it was found that enhancing the spectrum at the resonant frequencies of the objects involved gave better rolling sounds at a higher computational cost. All of the ideas summarized here have been implemented in the application FoleyAutomatic which can "automatically generate high-quality realistic contact sounds" from physical information in order to "increase the feeling of realism and immersion in interactive simulations" (Doel et al., 2001, p. 543). See Figure 1.

Another paper published in 2001, by James F. O'Brien, Perry Cook and Georg Essl, focused on calculating the sound heard due to a vibrating object at a relative location. Synthesizing Sounds from Physically Based Motion (O'Brien, Cook, & Essl, 2001) describes how to calculate the air pressure at two points in an environment (for stereo sound) due to the surface vibrations of an object. Considering Huygens' principle, the delay and attenuation due to sound propagation are calculated. However, the technique requires a deformable body simulator that calculates the surface vibrations of objects at audible frequencies and is therefore not compatible with most real-time

interactive simulations. The following year, however, O'Brien, along with Chen Shen and Christine M. Gatchalian published a paper which addressed this.

Figure 1. Virtual rock in a wok, from the FoleyAutomatic project. (© 2010 Kees van den Doel. Used with permission.)

Synthesizing Sounds from Rigid-Body Simulations (O'Brien, Shen, & Gatchalian, 2002) notes that when a body's sound producing deformations are small enough, which they generally are, they can be decoupled from its rigid-body behaviour. This means that it is possible to use a rigid body simulator to calculate how objects move on a macroscopic scale while audio is generated by a separate process. O'Brien et al. have implemented this twice, with two different third-party physics engines. Modal synthesis is the method of choice here and the unique contribution of this work is the way in which modal data is obtained. Unlike FoleyAutomatic, they do not require any experimental data and, unlike Sonic Explorer, arbitrarily shaped objects can be included because an object's modal data is determined using a finite element scheme. A tetrahedral method previously used by O'Brien et al. (O'Brien et al., 2001) is now modified to produce modal data for an object of a given size, shape, and material. While this pre-computation phase may take a few hours it allows sound synthesis to happen in real-time during the simulation. The authors also note that some parameters can be changed after the computation without affecting the modal frequencies while other changes affect all frequencies by the same ratio and so can be quickly computed.

The same year saw a contribution from Doel and Pai, this time along with researchers at the Institute for Hearing Accessibility Research in the University of British Columbia. Measurements of Perceptual Quality of Contact Sound Models (Doel, Pai, Adam, Kortchmar, & Pichora-Fuller, 2002) explored ways to improve how the modal synthesis technique can be employed. The authors analysed the recording of a metal vase being struck and found 179 modes. However, after "laborious trial and error" (Doel et al., 2002, p. 2) they found that only 10 to 15 of those modes were perceptually important. By conducting listening tests, they then set about finding an algorithm for sorting an object's modes by perceptual importance so that only the most important modes need actually be synthesized, thus saving processing power. For example, a simple approach was to weight the importance of a mode by its gain while more sophisticated techniques considered the effect of masking. After describing their techniques, experimental procedure, and results, they concluded that while results varied substantially among participants the "efficiency of the synthesis can be improved by several orders of magnitude by careful selection of the modal model" (Doel et al., 2002, p. 4).

The year 2002 also saw the first published contribution to the field from Dylan Menzies. Realising the promise of physical modelling for sound synthesis in virtual environments and the potential of these techniques to work closely with a physics engine, Menzies set about creating a modular framework to enable the sounding and interaction of many objects in a virtual world. Scene Management for Modelled Audio Objects


in Interactive Worlds (Menzies, 2002) describes the beginnings of what would later become the Phya project. This project is detailed later when discussing Menzies' 2007 paper Physical Audio for Virtual Environments, Phya in Review (Menzies, 2007).

Meanwhile, Doel and Pai continued their research on efficient ways to implement modal synthesis by applying it to complex scenes involving a large number of sounding objects. In Interactive Simulation of Complex Audio-Visual Scenes (Doel, Knott, & Pai, 2004) they describe a process which they call mode pruning in which the effect of auditory masking is used to predict which modes are perceptually unimportant so that they can be excluded from synthesis, hence saving processing power. The technique is not just applied to individual sounding objects, as in previous work: all sounding objects are now considered together. This requires the process to be carried out at run time as objects are being sounded and reaps further gains in efficiency. In addition, this project eases the load on the CPU by offloading as much computation as possible to the Graphical Processing Unit (GPU).

The next major contribution from Doel and Pai was published as a chapter entitled Modal Synthesis for Vibrating Objects (Doel & Pai, 2006) in the book Audio Anecdotes III. As a book chapter, this work is self-contained and as such includes a comprehensive review of the theory of modal synthesis before explaining how they have implemented it using a bank of band-pass filters. In order for the project to facilitate environments that may potentially grow to be very large, with sounding objects continuously being created and destroyed, a modular solution was designed. From a programming point of view, this means that code is written in classes and, at run-time, instances of these classes are created and destroyed as required. For example, a class named SonicObject contains functionality for rendering to a system's audio hardware, ModalSonicObject implements modal synthesis and an AudioForce class provides functionality for extracting an audio signal from an object. Modular programming also allows a derived class to inherit functionality from a base class and so common code can be written in a base class and used in many specialized derived classes. This important advantage ensures the functionality of the system can easily be extended in the future.

In this work the authors note that: "Modal synthesis can also be used to model other types of physical systems which can be modeled by excitations acting on resonances, such as car engines, rumbling sounds, or virtual musical instruments" (Doel & Pai, 2006, p. 100). They go on to explain their implementation of a system to synthesize the sound of a four-stroke engine and a four-cylinder engine. To do this, they determine modal data that represents "a lumped model of everything that is vibrating" (p. 114) and drive it with a force signal the generation of which is inspired by the workings of an engine. They state that reasonable results were obtained with their "extremely simple models" (p. 115) and express optimism about the range of engine sounds that could be achieved by spending more time developing the technique.

In the same year, Nikunj Raghuvanshi and Ming C. Lin published research on their use of physical modelling in large environments in a paper entitled Interactive Sound Synthesis for Large Scale Environments (Raghuvanshi & Lin, 2006). Their method of determining an object's modal data is similar to the one used by O'Brien et al. (2002) and can similarly be used for arbitrarily shaped objects. They consider three-dimensional objects as being composed of a thin shell and a hollow inside, and they represent each object as a mesh of particles connected by damped springs. Off-line computation of a matrix of particle masses and one of elastic forces renders the modal data for an object from which synthesis can be performed. A large portion of the paper details ways in which computational efficiency is gained and three techniques are described. Similar to Doel and Pai in Interactive Simulation of Complex Audio-Visual Scenes, the total computational expense is reduced

352
Physical Modelling for Sound Synthesis

by decreasing the amount of modes used for synthesis and obviously the aim is to do this with the minimum degradation of sound quality.

The first technique, which they call mode compression, exploits the fact that humans find it difficult to discriminate between nearby frequencies (Sek & Moore, 1995). Therefore, a number of nearby frequencies are lumped together to reduce computational expense. The second technique, mode truncation, considers that an oscillator requires processing cycles regardless of how much its output contributes to an overall sound. To improve efficiency, a threshold is introduced below which an oscillator’s output is deemed to be unimportant and therefore is no longer calculated. The third technique, quality scaling, is concerned with synthesizing sound for multiple objects simultaneously. Each sounding object in a scene is given a processing time-quota within which to perform modal synthesis, with more important objects given more time than those of lesser importance. At each audio callback, modal synthesis is carried out starting with the most important object and finishing with the least important, allowing each object its full time quota if required. This means the more important objects will be rendered with a higher level of sonic detail than those of less importance. Results indicate that each of these techniques yields an efficiency gain and, when used together, sound can be synthesized for a large scene with hundreds of interacting objects “with little loss in perceived sound quality” (Raghuvanshi & Lin, 2006, p. 108).

Returning to the research of Menzies, 2007 saw the publication of Physical Audio for Virtual Environments, Phya in Review (Menzies, 2007). This paper described a project now known as Phya: a library that facilitates physical modelling for sound synthesis in tandem with a physics engine. In this work, the author underlines the need for creative thinking in such a project in order to produce a “powerful synergistic perceptual effect” (Menzies, 2007, p. 1) by combining realistic audio with visuals. This may entail relaxing the physical constraints on the sound production process and giving sound designers some freedom to enhance the characteristic sounds of a scene. Ways in which a level of control may be given to a sound designer are highlighted throughout the work. There is also an emphasis on using robust, efficient techniques and on creating a system that can be easily scaled up to handle large environments.

By combining a knowledge of techniques described in previous projects with new innovations, Phya facilitates sound synthesis for many different types of contact (Figure 2). These include: simple impacts; multiple impacts, which happen at a rate not captured by most physics engines; scraping and rolling, including the effect of contact jumps where the objects momentarily break contact; grazing sounds, which occur when an object simultaneously bounces and skids off a surface; stick and slip, due to friction between

Figure 2. Screenshot of Phya synthesising the sounds of multiple objects. (© 2010 Dylan Menzies. Used with permission.)
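The mode truncation and quality scaling techniques described above can be sketched compactly. The following Python fragment is illustrative only: the function names, threshold value, and mode-count budget are invented here, and Raghuvanshi and Lin's actual implementation budgets processing time per audio callback rather than a count of modes.

```python
import numpy as np

def modal_block(freqs, decays, amps, t0, sr=44100, n=512, threshold=1e-4):
    """Render one audio block as a sum of decaying sinusoids.

    Mode truncation: a mode whose exponential envelope has already
    fallen below `threshold` at the start of the block contributes
    almost nothing, so it is skipped entirely.
    """
    t = t0 + np.arange(n) / sr
    out = np.zeros(n)
    for f, d, a in zip(freqs, decays, amps):
        if a * np.exp(-d * t0) < threshold:   # mode truncation
            continue
        out += a * np.exp(-d * t) * np.sin(2 * np.pi * f * t)
    return out

def render_scene(objects, mode_budget, t0):
    """Quality scaling, sketched as a mode-count budget: spend the
    budget on objects in order of importance, so the most important
    objects are rendered with the most sonic detail."""
    mix = np.zeros(512)
    for obj in sorted(objects, key=lambda o: -o["importance"]):
        k = min(mode_budget, len(obj["freqs"]))  # modes granted to this object
        if k == 0:
            break
        mix += modal_block(obj["freqs"][:k], obj["decays"][:k],
                           obj["amps"][:k], t0)
        mode_budget -= k
    return mix
```

Because objects are visited in order of importance, a budget that runs out mid-scene silences only the least important sources.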


objects; and the buzzing from vibrating objects in light contact with each other. In addition, the project can produce the effect of surface damping (an object becomes less resonant due to being in contact with another object) and also the effect of a change in sound due to a deformable object being forced out of shape. To a first approximation, the effect of distributed nonlinearities described earlier can be simulated, producing interesting results like pitch glide. Further, as yet unreleased, functionality has demonstrated the effect of diffuse resonance, which is when an object’s modes “become very numerous and merge into a diffuse continuum” (Menzies, 2007, p. 3). More recent demonstrations of Phya have introduced loose particle surfaces, for example, gravel, achieved through a PhISEM-like approach, as well as surfaces covered in leaves, plastic packaging, and shallow water. Bear in mind the author does not claim these effects are created through accurate modelling of the physical phenomena that cause them; instead, techniques have been developed by combining an understanding of these phenomena with a creative approach to sound synthesis while reserving some aspects to be controlled by a sound designer. In a later publication, Menzies refers to this as “the development of semi-physical perceptual models that provide some freedom for the sound designer to more easily mould a virtual world” (Menzies, 2008, p. 1).

The information required to create these sounds is unlikely to be available directly from a physics engine and therefore a collision update layer of Phya must deduce it from what is available. Often, the physics engine will not provide enough detail, for example when multiple impacts are involved, and so the collision update layer must generate extra information from what is known in a deterministic or stochastic way. The collision update process is reported as one of the more tricky problems the project has overcome, with the issue of monitoring continuous contact said to be particularly awkward as most physics engines do not use persistent contacts but instead simply report if two objects are touching in a given frame.

Before concluding, Menzies, who himself has experience in both the academic and industrial sides of computer game development, gives his own “tentative explanations” as to why physical modelling hasn’t been embraced outside of research projects despite its potential value. Firstly, as is widely known among sound designers and audio programmers in the games industry, audio is usually considered less important than graphics, meaning that fewer resources are allocated to it both in terms of development and hardware. Therefore “audio programming is often carried out by a non-specialist programmer” and “there is often a natural resistance to acknowledge that out of house technologies could be valuable if they can not readily be reproduced in house” (Menzies, 2007, p. 5). Considering the difficulties associated with correctly harnessing the information available from a physics engine, it is perhaps understandable that developers may not wish to risk spending resources on a concept that has yet to be tested in a commercial sense. In addition, Menzies contends that “published research often focuses on a level of audio modeling detail that goes well beyond that required in a simulation” (Menzies, 2007, p. 5), implying that current research is not entirely relevant to an industry where a level of creative control is desirable, algorithmic efficiency is vital, and absolute authenticity is not.

In 2007, Raghuvanshi and Lin, now collaborating with Christian Lauterbach, Anish Chandak, and Dinesh Manocha, all from the University of North Carolina, published an article on their continued research. Real-Time Sound Synthesis and Propagation for Games (Raghuvanshi et al., 2007) reviews their previously described work (Raghuvanshi & Lin, 2006) before presenting sound propagation functionality that has since been added. Their approach is an adaptation of beam tracing (Funkhouser et al., 1998) and it is, as they claim, well suited to interactive applications. Their results indicate that, in complex, dynamic scenes


that contain moving sound sources, the technique is effective enough to model sound propagation with several orders of reflection.

In 2008, Menzies, in his paper Virtual Intimacy: Phya as an Instrument (Menzies, 2008), considered how Phya might be used to create music. He describes how Phya’s physical behaviour is naturally appealing to humans and how the facility to create physically impractical or even impossible conditions in a virtual world gives users more freedom than in the real world, creating new musical composition possibilities. In 2009, Menzies revealed the beginnings of a complementary application for Phya, in the paper Phya and VFoley, Physically Motivated Audio for Virtual Environments (Menzies, 2009). The development of VFoley is intended to allow users to hear how an object sounds as modifications are being made to it, so that a user may quickly test object interactions (Figure 3). An application developed by myself (Mullan, 2009) to synthesise sounds for regular shapes not only allows the user to modify an object’s geometric and material parameters, but also parameters that directly describe how an object sounds, referred to as its “audibly perceptible” parameters. The completion of projects such as these brings physical modelling for sound synthesis to non-programmers, and hence more widespread usage.

FUTURE DIRECTIONS AND CONCLUSION

So, what lies in the future of this research area? What are the current barriers to its mainstream adoption? Is physical modelling for sound synthesis set to become the norm for computer game sound effects? Let us first look at possible future research directions.

Menzies’ most recent work promises to provide users with an authoring environment for Phya, hence making it accessible to a wider audience. Although it is not yet clear what form this will take, the facility for users to modify an object’s size, shape, and material properties while it is sounded in real-time would most certainly be desirable. The most problematic stipulation here is the provision of a means to change an object’s shape while providing sound in real-time. To date, any work that allows for the sounding of arbitrarily shaped objects has required an off-line pre-processing stage to determine the object’s modal data and, so far, this cannot happen quickly enough to be used in an application that could be described as interactive. Cynthia Bruyns has carried out related research and has shown how to quickly estimate an arbitrarily shaped object’s modal data based on those of a similar shape (Bruyns, 2006). However, no application exists whereby a level designer may create a unique object and instantly (or within a reasonably short time) hear how it sounds. This would be desirable from a game development point of view and would create new gameplay possibilities.
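As an illustration of why this pre-processing stage is needed, the sketch below performs a toy modal analysis in Python. It uses a deliberately tiny 1-D mass-spring chain with invented parameter values, not the thin-shell 3-D meshes discussed above; real objects yield matrices with many thousands of rows, and it is the cost of this eigendecomposition that forces the step off-line.

```python
import numpy as np

def modal_data_1d(n_masses, mass=1e-3, stiffness=1e5):
    """Off-line modal analysis of a toy 1-D mass-spring chain.

    Builds the stiffness matrix K for n point masses joined by springs
    (fixed at both ends), solves the eigenproblem for M^-1 K, and
    returns the modal angular frequencies in ascending order.
    """
    K = np.zeros((n_masses, n_masses))
    for i in range(n_masses):
        K[i, i] = 2 * stiffness
        if i > 0:
            K[i, i - 1] = K[i - 1, i] = -stiffness
    M = mass * np.eye(n_masses)
    # eigenvalues of M^-1 K are the squared angular mode frequencies
    w2 = np.linalg.eigvalsh(np.linalg.inv(M) @ K)
    return np.sqrt(np.clip(w2, 0, None))
```

The returned frequencies, together with decay rates and gains derived from material damping, are the kind of modal data that a run-time synthesis stage would then turn into sound.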

Figure 3. Screenshot taken from the development of VFoley. (© 2010 Dylan Menzies. Used with permission.)
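To suggest what editing “audibly perceptible” parameters might feel like, the hedged sketch below renders a strike directly from (frequency, decay rate, amplitude) triples. The material presets are invented for illustration and are not taken from VFoley or the application cited above.

```python
import numpy as np

def render_strike(partials, dur=0.5, sr=44100):
    """Render an impact from 'audibly perceptible' parameters:
    (frequency in Hz, decay rate in 1/s, relative amplitude) triples
    that a user could edit directly and re-audition immediately."""
    t = np.arange(int(dur * sr)) / sr
    out = sum(a * np.exp(-d * t) * np.sin(2 * np.pi * f * t)
              for f, d, a in partials)
    peak = np.max(np.abs(out))
    return out / peak if peak > 0 else out

# a bright, metallic-sounding object: inharmonic, slowly decaying partials
metal = [(523.0, 3.0, 1.0), (1187.0, 4.5, 0.6), (2913.0, 6.0, 0.4)]
# a dull, wooden-sounding object: lower partials that die away quickly
wood = [(196.0, 28.0, 1.0), (410.0, 40.0, 0.5)]
```

Raising the decay rates is enough to turn the metallic ring into a wooden thud, without touching any geometric or material model.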


Some readers may have noticed that, in all the studies described in this chapter, the focus was rarely on formal listening tests. Instead, the developers were usually the ones to evaluate the sounds produced. Although anyone working in this research area should have good listening skills, there is undoubtedly much to be learned from carrying out extensive listening tests on a range of people, including subjects with and without experience in sound design, both gamers and non-gamers. The results could reveal failings or overkill in the current techniques and should inform future research projects. This should, in turn, lead to improvements in efficiency, which is desirable in any aspect of computer game programming, and certainly in audio programming.

Finally, there are still barriers to overcome in the adoption of physical modelling for sound synthesis in common game engines. While Menzies highlights the difficulties encountered in linking sound synthesis to a physics engine (Menzies, 2007), he also stresses that these difficulties can be overcome. So far, many projects have used a physics engine by harnessing the information already available from it, but no physics engine has been adapted for sound synthesis from the inside. The potential benefits of this are worth exploring as, in a commercial sense, an in-built sound synthesis module could give a physics engine a competitive edge.

Looking forward to a time when the methods discussed here can be fully integrated into computer games, it is important to view physical modelling not as a standalone practice, but as a branch of procedural audio. Within the context of computer game sound effects there is a continuum from pure physical modelling to purely abstract synthesis methods. Audio programmers will require not only the knowledge but also the creativity to know which part of this continuum they should use in different situations in order to create the most compelling sounds with the resources available. Furthermore, game designers will need to be aware of the new possibilities made available by developments in physical modelling and procedural audio.

At a minimum, games are set to become more realistic due to improved audio. Each time a game is played, a unique soundtrack, tailored to that game experience, will be created. The subtleties therein will match that which the gamer’s intuition expects based on what he or she sees and this will lead to a more immersive experience.

Beyond this, new gaming possibilities are opening up, ranging from small enhancements of current common situations to completely new prospects. For example, a common puzzle in The Legend of Zelda: A Link to the Past (Nintendo, 1992) was to identify the weak point of a wall so it could be destroyed. A weakness was usually visible and also produced a different sound when struck with a sword. However, the sound produced by striking either a strong or weak point was always one of two samples. With the incorporation of physical modelling into this situation, not only could the sound of striking a wall contain information on the material and thickness of the wall, but also on the sword used to strike it. This would still hold true even if the sword had been created uniquely by the gamer during gameplay, and this links to another range of possibilities facilitated by physical modelling. Games will be able to produce sound for objects that were not conceived during development. As mentioned in the introduction to this chapter, this is particularly desirable in games such as LittleBigPlanet (Sony Computer Entertainment Europe, 2008) which encourage users to create their own unique content. Indeed, such games might require the user to fashion objects that sound a particular way in order to solve a puzzle. As also mentioned earlier, with the increasing sophistication of physics in games, we have now reached the point of objects being realistically shattered, creating a potentially unique combination of fragments each time. Again, any shattered pieces will not have been conceived during a game’s development and so the only way to create sound for them is by synthesis at


run time. Finally, with creative thinking, there is potential for new game scenarios that are designed with the possibilities of physical modelling in mind. With the emergence of environments like LittleBigPlanet, Crayon Physics Deluxe (KlooniGames, 2009) and Phun – 2D Physics Sandbox (Ernerfeldt, 2008), gamers now have the facility to create their own worlds which evolve naturally due to the effects of physics (like a domino effect). Currently, object contact sounds are not attached to these environments but, with the introduction of physical modelling for sound synthesis, this need not be the case. If object contact sounds were synthesised in these environments, there is the potential to create an intentional sequence of sounds, that is, music. This virtual musical performance would be driven by physics and accompanied by graphics. Such an environment would give musical composers a new composition tool and gamers a creative environment.

These are just some suggestions as to how physical modelling for sound synthesis might enhance computer games in the future. As research in the area continues and these techniques become available, the onus will switch to game developers to embrace them, implement them (or buy them), utilise their advantages, and dream up new possibilities.

REFERENCES

Adrien, J. M. (1991). The missing link: Modal synthesis. In De Poli, G., Piccialli, A., & Roads, C. (Eds.), Representations of musical signals (pp. 269–297). Cambridge, MA: MIT Press.

Bilbao, S. (2006). Fast modal synthesis by digital waveguide extraction. IEEE Signal Processing Letters, 13(1), 1–4. doi:10.1109/LSP.2005.860553

Bilbao, S. (2009). Numerical sound synthesis: Finite difference schemes and simulation in musical acoustics. Chichester, England: John Wiley and Sons.

Bruyns, C. (2006). Modal synthesis for arbitrarily shaped objects. Computer Music Journal, 30(3), 22–37. doi:10.1162/comj.2006.30.3.22

Collins, K. (2008). Game sound: An introduction to the history, theory and practice of video game music and sound design. Cambridge, MA: MIT Press.

Cook, P. R. (1997). Physically informed sonic modeling (PhISM): Synthesis of percussive sounds. Computer Music Journal, 21(3), 38–49. doi:10.2307/3681012

Cook, P. R. (2002). Real sound synthesis for interactive application. Natick, MA: A K Peters, Ltd.

Crayon Physics Deluxe. (2009). Petri Purho (Developer). San Mateo: Hudson Soft.

De Sanctis, G., Sarti, A., Scarparo, G., & Tubaro, S. (2005). Automatic modelling and authoring of nonlinear interactions between acoustic objects. In K. Galkowski, A. Kummert, E. Rogers & J. Velten (Eds.), The Fourth International Workshop on Multidimensional Systems – NDS 2005 (pp. 116-122).

Doel, K. d., Knott, D., & Pai, D. K. (2004). Interactive simulation of complex audio-visual scenes. Presence (Cambridge, Mass.), 13(1), 99–111. doi:10.1162/105474604774048252

Doel, K. d., Kry, P. G., & Pai, D. K. (2001). FoleyAutomatic: Physically-based sound effects for interactive simulation and animation. In P. Lynn (Ed.), Proceedings of SIGGRAPH ’01: The 28th annual conference on Computer graphics and interactive techniques (pp. 537-544). New York: ACM.

Doel, K. d., & Pai, D. K. (1998). The sounds of physical shapes. Presence (Cambridge, Mass.), 7(4), 382–395. doi:10.1162/105474698565794


Doel, K. d., & Pai, D. K. (2006). Modal synthesis for vibrating objects. In K. Greenebaum, & R. Barzel (Eds.), Audio anecdotes III: Tools, tips, and techniques for digital audio (pp. 99-120). Wellesley, MA: A K Peters, Ltd.

Doel, K. d., Pai, D. K., Adam, T., Kortchmar, L., & Pichora-Fuller, K. (2002). Measurements of perceptual quality of contact sound models. In Nakatsu & H. Kawahara (Eds.), Proceedings of the 8th International Conference on Auditory Display (pp. 345-349). Kyoto, Japan: ATR.

Ernerfeldt, E. (2008). Phun: 2D physics sandbox. Available from http://www.phunland.com/wiki/Home.

Essl, G., Serafin, S., Cook, P., & Smith, J. O. (2004). Theory of banded waveguides. Computer Music Journal, 28(1), 37–50. doi:10.1162/014892604322970634

Farnell, A. (2008). Designing sound. London: Applied Scientific Press.

Farnell, A. (2011). Behaviour, structure and causality in procedural audio. In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey, PA: IGI Global.

Fettweis, A. (1986). Wave digital filters: Theory and practice. Proceedings of the IEEE, 74(2), 270–327. doi:10.1109/PROC.1986.13458

Fletcher, N. H., & Rossing, T. D. (1991). The physics of musical instruments. New York: Springer.

Funkhouser, T., Carlbom, I., Elko, G., Pingali, G., Sondhi, M., & West, J. (1998). A beam-tracing approach to acoustic modelling for interactive virtual environments. In S. Cunningham, W. Bransford & M. F. Cohen (Eds.), Proceedings of SIGGRAPH ’98: The 25th annual conference on Computer graphics and interactive techniques (pp. 21-28). New York: ACM.

Gaver, W. W. (1993a). Synthesizing auditory icons. In S. Ashlund, K. Mullet, A. Henderson, E. Hollnagel & T. White (Eds.), Proceedings of the INTERCHI ’93 conference on Human factors in computing systems (pp. 228-235). New York: ACM.

Gaver, W. W. (1993b). What in the world do we hear? An ecological approach to auditory event perception. Ecological Psychology, 5(1), 1–29. doi:10.1207/s15326969eco0501_1

Kelly, J., & Lochbaum, C. (1962). Speech synthesis. In Proceedings of the Fourth International Congress on Acoustics, 4, 1-4. Retrieved from http://hear.ai.uiuc.edu/public/Kelly62.pdf

LittleBigPlanet. (2008). Media Molecule (Developer). Surrey, UK: Sony Computer Entertainment Europe.

McCuskey, M. (2003). Beginning game audio programming. Boston, MA: Premier Press.

Menzies, D. (2002). Scene management for modelled audio objects in interactive worlds. In Nakatsu & H. Kawahara (Eds.), Proceedings of the 8th International Conference on Auditory Display. Kyoto, Japan: ATR.

Menzies, D. (2007). Physical audio for virtual environments, Phya in review. In W. L. Martens (Ed.), Proceedings of the 13th International Conference on Auditory Display (pp. 197-202). Montreal, Canada: McGill University.

Menzies, D. (2008). Virtual intimacy: Phya as an instrument. In Proceedings of the 8th International Conference on New Interfaces for Musical Expression NIME08. Retrieved from http://www.zenprobe.com/dylan/pubs/menzies08_virtualIntimacy.pdf


Menzies, D. (2009). Phya and VFoley, physically motivated audio for virtual environments. In 35th AES Conference on Audio for Games. Retrieved from http://www.aes.org/e-lib/browse.cfm?elib=15171

Mullan, E. (2009). Driving sound synthesis from a physics engine. In Charlotte Kobert (Ed.), Proceedings of the IEEE Games Innovation Conference 2009 (pp. 256-264). New York: IEEE.

O’Brien, J. F., Cook, P. R., & Essl, G. (2001). Synthesizing sounds from physically based motion. In P. Lynn (Ed.), Proceedings of SIGGRAPH ’01: The 28th annual conference on Computer graphics and interactive techniques (pp. 529-536). New York: ACM.

O’Brien, J. F., Shen, C., & Gatchalian, C. M. (2002). Synthesizing sounds from rigid-body simulations. In T. Appolloni (Ed.), Proceedings of the 2002 ACM SIGGRAPH/Eurographics symposium on Computer animation (pp. 175-181). New York: ACM.

Pai, D. K., Doel, K. d., James, D. L., Lang, J., Lloyd, J. E., Richmond, J. L., & Yau, S. H. (2001). Scanning physical interaction behaviour of 3D objects. In P. Lynn (Ed.), Proceedings of SIGGRAPH ’01: The 28th annual conference on Computer graphics and interactive techniques (pp. 87-96). New York: ACM.

Pedersini, F., Sarti, A., & Tubaro, S. (2000). Object-based sound synthesis for virtual environments using musical acoustics. IEEE Signal Processing Magazine, 17(6), 37–51. doi:10.1109/79.888863

Raghuvanshi, N., Lauterbach, C., Chandak, A., Manocha, D., & Lin, M. C. (2007). Real-time sound synthesis and propagation for games. Communications of the ACM, 50(7), 67–73. doi:10.1145/1272516.1272541

Raghuvanshi, N., & Lin, M. C. (2006). Interactive sound synthesis for large scale environments. In Proceedings of the 2006 symposium on Interactive 3D graphics and games (pp. 101-108). New York: ACM.

Ruiz, P. (1969). A technique for simulating the vibrations of strings with a digital computer. Unpublished master’s thesis. University of Illinois, Urbana, IL.

Sek, A., & Moore, B. C. (1995). Frequency discrimination as a function of frequency, measured in several ways. The Journal of the Acoustical Society of America, 97(4), 2479–2486. doi:10.1121/1.411968

Takala, T., & Hahn, J. (1992). Sound rendering. In Proceedings of SIGGRAPH ’92: The 19th annual conference on Computer graphics and interactive techniques, 26(2), 211-220. New York: ACM.

The legend of Zelda: A link to the past. (1992). Nintendo EAD (Developer). Kyoto, Japan: Nintendo.

Trautmann, L., & Rabenstein, R. (2003). Digital sound synthesis by physical modelling using the functional transformation method. New York: Kluwer Academic/Plenum Publishers.

Välimäki, V., Pakarinen, J., Erkut, C., & Karjalainen, M. (2006). Discrete-time modelling of musical instruments. Reports on Progress in Physics, 69, 1–78. doi:10.1088/0034-4885/69/1/R01

Whitmore, G. (2009, May). The runtime studio in your console: The inevitable directionality of game audio. Develop, 94, 21.

Wilde, M. D. (2004). Audio programming for interactive games. Oxford: Focal Press.

KEY TERMS AND DEFINITIONS

Sound Synthesis: The process of generating audio.


Physical Modelling: Simulating the physical behaviour of a real object or system. From an audio point of view, this means simulating how an object vibrates in the audible frequency range.

Digital Waveguide Synthesis: A physical modelling synthesis technique that models the vibrations of a system as travelling waves. It is particularly efficient for 1D systems with a harmonic frequency spectrum.

Modal Synthesis: A physical modelling synthesis technique which generates sound as a sum of decaying sinusoids.

Discretise: The process of making a continuous model or function discrete. Usually this is performed so a problem can be solved numerically or implemented digitally using a computer.

Inharmonic: Describes a spectrum in which the higher frequencies (partials) are not integer multiples of a fundamental frequency.

Procedural Audio: Audio that is generated at run time from information on sound producing events. Sound generation algorithms are derived from analysis of real sound producing systems. This analysis may be physics based, phenomenological, or a mixture of these.
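Several of these terms can be tied together in one small sketch: discretising a single decaying mode yields a two-pole resonant (band-pass) filter, the building block of the filter banks mentioned earlier in the chapter. The construction below is a standard textbook mapping, not code from any of the systems surveyed.

```python
import math

def resonator_coeffs(freq, decay, sr=44100):
    """Discretise one decaying mode as a two-pole resonator.

    The continuous mode a*exp(-d*t)*sin(2*pi*f*t) maps to the digital
    filter y[n] = a1*y[n-1] + a2*y[n-2] + x[n], with pole radius
    r = exp(-d/sr) and pole angle theta = 2*pi*f/sr.
    """
    r = math.exp(-decay / sr)
    theta = 2 * math.pi * freq / sr
    return 2 * r * math.cos(theta), -r * r

def ring(freq, decay, n, sr=44100):
    """Excite the resonator with a unit impulse and record n samples."""
    a1, a2 = resonator_coeffs(freq, decay, sr)
    y1 = y2 = 0.0
    out = []
    for i in range(n):
        x = 1.0 if i == 0 else 0.0
        y = a1 * y1 + a2 * y2 + x
        out.append(y)
        y1, y2 = y, y1
    return out
```

Running one such resonator per mode and summing their outputs is modal synthesis in its filter-bank form.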

Section 5
Current & Future Design

Chapter 17
Guidelines for Sound Design
in Computer Games
Valter Alves
University of Coimbra, Portugal & Polytechnic Institute of Viseu, Portugal

Licínio Roque
University of Coimbra, Portugal

ABSTRACT
The inconsequential exploitation of sound in most computer games, both in extent and nature, con-
trasts with its prominence in our daily lives and with the kind of associations that have been explored
in domains such as music and cinema. Sound design remains the craft of a talented minority and the
unavailability of a public body of knowledge on the subject has greatly contributed to this state of af-
fairs. This leads to a mix of alienation and best-judgment improvisation in the broader development
community. A sensitivity to the potential of sound for the enrichment of the experience—with emphasis
on game specifics—is, therefore, necessary. This study presents a contribution to the practice of sound
design for computer games. An approach to intentional sound design, informed by multi-disciplinary
interpretations of concepts including emotion, context, acoustic ecology, soundscape, resonance, and
entrainment, is distilled into a set of design guidelines that holistically address the different sound layers.

INTRODUCTION

Computer game sound design is in its infancy. It is still a practice almost reserved to a limited number of experts in the game industry who have typically made their own way through the field in the absence of a structured body of knowledge. The consequences are self-evident. To start with, there is no abundance of purposeful sound usage in computer games. More relevant to the study presented here, there is little theoretical support for someone, who is not one of those experts, to perform intentional sound design.

This situation is not an exception in the broader context of human-machine interfaces and interaction systems. Game development, though, is one of the fields where sound is deserving of greater attention as noted by a number of recent authors (Collins, 2008a; Ekman, 2005, 2008; Grimshaw, 2007; Peck, 2001, 2007). Additionally, in the

DOI: 10.4018/978-1-61692-828-5.ch017

Copyright © 2011, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.

wider field of Human Computer Interaction (HCI), research on sound is recognized as quite neglected (Brewster, 1994; Frauenberger, 2007; Hermann & Hunt, 2005; Kramer et al., 1997). One conspicuous sign of the lack of a relevant body of knowledge is the unavailability of clear guidelines or best practices. Yet, this kind of support does exist and is widely known with respect to the visual modality (Kramer et al., 1997).

What is more, researchers in HCI often resort to computer games as instruments to conduct studies on several aspects (Barr, 2008; A. Jørgensen, 2004) including those related to sound. Sound design in computer games is particularly interesting because it supplies evidence of the pertinence of multiple aspects of sound in interaction. To start with, computer game sound matters to usability, in the sense of “easing the use of the system by providing specific information to the player about states of the system” (K. Jørgensen, 2006, p. 48). It can also work as support to gameplay (K. Jørgensen, 2008). Additionally, sound is a valuable component of overall game aesthetics and affective perception. Furthermore, it may be used to create and enhance emotional impact (Ekman, 2008) and contribute to immersion (Collins, 2008a; Grimshaw, 2007, 2008). Nevertheless, it is important to be aware that interaction in HCI and computer games are not the same: applications typically bracketed under the HCI label are meant to be used, while games are meant to be played (Barr, 2008; Sotamaa, 2009).

The relevance of computer games in HCI research is also justified by a growing appreciation for the concept of User Experience (Hassenzahl & Roto, 2007; Hassenzahl & Tractinsky, 2006) which emerged as an attempt to promote a holistic interaction perspective beyond the more traditional efforts, such as usability. Aspects such as efficiency or performance are no longer the sole design concerns: subjective appreciation matters and also influences the former concerns. Yet again, the research has been much directed to the visual modality, leaving others, like sound, less explored (Alves & Roque, 2009a).

The field that is acknowledged to be most contributive to game sound–and to many other aspects of game development, for that matter–is the movie industry. In fact, practices on game sound are strongly influenced by those from cinema. Still, although this is understandable and legitimate to some extent, it is crucial to understand that fundamental disparities exist between the two media that both impose and propose distinct approaches. It is exactly in this difference that we find most prospective development. Ultimately, what is needed is knowledge on how to compose sound attending to game scenario specifics including nonlinearity, dynamicity, and the need for variability (Collins, 2008a, 2008b).

The lack of guidance in sound design has proven to be damaging. On the one hand, developers are discouraged from integrating sound in their projects, leading to unbalanced interfaces when compared to our experiences in daily life or even with other media. On the other hand, and possibly more harmfully, when developers venture into sound integration they have to resort to their best judgment, not necessarily achieving interesting results (Frauenberger, 2007). In turn, all these circumstances have contributed to users/players becoming accustomed to the factual unimportance of sound, even developing some negative associations to sound, from which the urge to the mute button is an emblematic example. Muting is interesting as a transient state, not as the defensive default.

Considering such a scenario and refocusing on research and development, two modes of attack seem to be imperative. One is sensitization. This means getting more people aware of the low-level appreciation that the audio component currently has and countering this by proposing innovative ways to explore sound potential. The other is to deliver support to enable the implementation of such ideas. This stretches from providing guidance on the potential concepts that may allow tackling
the intentionality of the design to pragmatic aspects of implementation.

In this chapter we contribute to both these approaches. We will start by addressing some fundamental questions. Then, we will present a contribution intended to aid sound design in the form of a set of design guidelines. These guidelines are an expression of findings that we have synthesized from an interdisciplinary literature review and from an extensive analysis of media products, particularly computer games. We brought together research and concepts that include: acoustic ecology; recent studies on emotion, including the latest findings in neuroscience; physical phenomena having repercussions on the psychology and physiology of perception, cognition, and emotion; and context engineering. We will present some background to these concepts and to their prominent relationships to each other. We will also present a report on an exemplary design exercise (Alves & Roque, 2009b), following the method presented here, carried out by a team of game developers with no prior experience in sound design, for the purposes of demonstrating a possible practical interpretation of our suggestions.

INTENTIONAL SOUND DESIGN

It is essential that the exploration of the usage of sound in some interactive experience does not end up confused with the mere placing of sounds on top of things. Designers should not be searching for excuses to use sound: they should be designing ways in which sound may contribute to the purpose of the application. To put it another way, in this context, sound is a means, not an end. It is not about fitting; it is about profiting.

Failing to understand this enlarges the user’s perception that sound is expendable. And the truth is the user does not need our help to hear “things”. The user is not living in a vacuum, being already surrounded by sounds. So, it is probably the case that, unless the sound coming from the application brings some value–which can be fun, certainly–it is just disturbing the surrounding sounds. And that is when the mute button becomes handy.

A sound designer must consider the project as a whole and ponder how sound will best serve the overall purposes in harmony with all other aspects. For that to be possible, it is crucial that sound designers become involved in the general design process as soon as it begins. Unless that happens, the range of possibilities will be severely curtailed by whatever other decisions have been taken. This is an issue that is documented regarding sound designers in the movie industry (for example, Parker, 2003; Peck, 2001).

Emotions

We have already noted the scarce consideration that sound has so far received in most designed interactive processes. No less relevant is the fact that most of the efforts at leveraging sound usage have been focused on utilitarian issues. These include complex data display, event monitoring and reinforcement of critical messages, applications for visually impaired people, and interfaces for eyes-free devices. Of course, these are all most noble quests, but they do not explore a very powerful facet of sound, which is its association with emotions.

Research on emotion was not always popular, although theories can be traced back at least to Plato and Aristotle. As an area of research, it had a low profile for most of the 20th century and only recently has it seen a resurgence in interest (Damásio, 2003; Ledoux, 1998; Nettle, 2006), thanks largely to advances in neuroscience laboratory tools. The fact that it is now possible to have an internal perspective of emotion, rather than dealing with external observations alone, contributes decisively to a new consideration of emotions. To start with, it helps to set apart what is science and what is no more than wishful thinking, allowing for the credibility of the approaches that rightfully find support on the emotional plane. Also, it
reveals new opportunities to act according to the physiological observations of emoting processes. But, and possibly more relevantly, recent scientific findings on brain phenomena and on how cognition and emotion are intertwined (Damásio, 2000, 2003, 2005; Lane, Nadel, Allen, & Kaszniak, 2002; Ledoux, 1998) build support for unprecedented studies that aim to leverage cognitive attributes through the exploration of emotional aspects of the interaction (Norman, 2004).

The new thinking on emotion contributed to a new perspective on the interaction process itself, consistent with a move in the research focus from a functionalist view of usability to a broader notion of User Experience (Hassenzahl & Tractinsky, 2006; Mahlke, 2007; Mahlke & Thüring, 2007; Norman, 2002). User Experience privileges quality of interaction over instrumental aspects and introduces “the general notion of technology as a positive aspect of our daily lives” (Mahlke, 2007). In computer games, the experience and the explicit designing of emotions are core concepts (Freeman, 2003; Marks & Novak, 2009; Schell, 2008) and–apart from game categories such as “serious games”–they constitute the ultimate argument for consumption.

That said, it seems fair to argue for a more thoughtful exploration of sound, namely concerning its potential association with emotions (Ekman, 2008; Follett, 2007; Grimshaw, 2007; Peck, 2001), both with a focus on purely hedonic purposes and through an exploration of how the achievement of specific emotional states may indirectly contribute to pragmatic goals such as various aspects of performance: efficiency, effectiveness, perception, memory, and so forth. Interestingly, in other disciplines, sound has proven to be notably associated with emotion; relationships between sound and emotion have traditionally been explored in areas such as music (Juslin & Sloboda, 2001) and cinema (Peck, 2001; Sider, 2003), with a solid body of knowledge.

One aspect that appears fundamental to the research of sound and emotion in interaction, and which also remains overlooked, is the need for a holistic perspective on sound: exploring the benefits of considering the auditory component not as a set of independent stimuli but as a coherent composition integrated with the context of the experience.

Acoustic Ecology

Acoustic ecology (Kallmann, Woog, & Westerkamp, 2007; World Soundscape Project, n.d.; Wrightson, 2000), an area founded mostly by music composers, offers considerable insight into an emotionally meaningful conception of contextualized sound. It is supported by the central concept of the soundscape (Schafer, 1973, 1994) and the musical composition thereby developed (Truax, 1995, 2001). Together, they represent a meaningful body of knowledge with particular emphasis on context, emotion, and interaction between the listener and the environment. The term soundscape means the “sound heard in a real or virtual environment” (Wrightson, 2000, p. 10) considered as a whole. A soundscape is an ecologically balanced entity where sound mediates the relationships between individuals and the environment. So, acoustic ecology implies a consideration of how the environment is understood by those living within it: regarding sound, the focus is on how it functions, not simply how it propagates.

Acoustic ecology also supports the idea that an acoustic environment can be understood as a musical composition. This emphasis on the concepts of harmony and orchestration is not mere lyricism. Studies of natural environments show balance in level, spectra, and rhythm. For instance, it was observed that “animal and insect vocalizations tended to occupy small bands of frequencies leaving ‘spectral niches’ (bands of little or no energy) into which the vocalizations (fundamental and formants) of other animals, birds or insects can fit” (Wrightson, 2000, p. 11). Another implication is that the listener shares responsibility in composition (Wrightson, 2000).
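The “spectral niche” observation suggests a check that sound designers could automate when assembling a soundscape. The following sketch is our own illustration, not a method proposed in the chapter; the band layout, energy profiles, and function names are hypothetical. It flags a candidate element whose dominant energy would crowd a frequency band already claimed by an established element.

```python
# Illustrative sketch: assigning sound elements to free "spectral niches".
# Band layout and per-band energy values are hypothetical placeholders.

def dominant_band(profile):
    """Return the frequency band holding most of a sound's energy."""
    return max(profile, key=profile.get)

def fits_in_niche(candidate_profile, occupied_bands):
    """A candidate 'fits' if its dominant band is not already claimed."""
    return dominant_band(candidate_profile) not in occupied_bands

# Energy per band (arbitrary units) for established soundscape elements.
soundscape = {
    "stream":   {"low": 9.0, "mid": 2.0, "high": 1.0},
    "birdsong": {"low": 0.5, "mid": 1.0, "high": 8.0},
}
occupied = {dominant_band(p) for p in soundscape.values()}

# A mid-band element (e.g., a creaking door) slots into the free niche;
# a second low-band element would compete with the stream for the same band.
door = {"low": 1.0, "mid": 7.0, "high": 2.0}
rumble = {"low": 8.0, "mid": 1.0, "high": 0.5}
print(fits_in_niche(door, occupied))    # True
print(fits_in_niche(rumble, occupied))  # False
```

A production version would derive the band profiles from spectral analysis of the actual sound assets (for example, FFT band energies) rather than hand-entered values.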
The idea of the listener as a composer is very insightful. First, it gives relevance to the sound the listener himself produces (composes and/or interprets), intentionally or not. Second, and perhaps more impressively, it emphasizes that the user completes the composition by filling in the “meaning” that is absent or that is not evident (Truax, 1995). This process of construction is assumed to be personal since the overall context, where an acoustic environment fits, is different for each person.

We consider that these insights from acoustic ecology can be adapted to inform sound design in computer games, and from this conceit it becomes relevant to conceive of a translation of the knowledge generated around the concept of soundscape: this will be driven by research on the implications of such an idea for overall perception and the emotional dimension of interaction.

Resonance and Entrainment

One goal inherent to game design is to allow for engaging experiences. Thus, it is important to reflect on reasons that may lead to a player not becoming engaged with a designed setting. Perhaps we have to recognize that, ultimately, such lack of engagement may be explained by the player’s own will and not by flaws in the design process. This is no excuse, however, for ignoring important sound design considerations.

In some circumstances, the deviation from the predicted behavior derives from the fact that there is no match between the player’s state and some desired state at any particular moment. From physics, we borrow two related concepts that allow us to describe and formalize a model for design action with the purpose of addressing this circumstance. These concepts are resonance and entrainment (Augoyard & Torgue, 2005; Sonnenschein, 2001), both physical phenomena having repercussions on the psychology and physiology of perception, cognition, and emotion (Leeds, 2001; Sonnenschein, 2001).

Resonance is the matching between vibratory rates and is found in all periodic, sinusoidal movements. It requires a concordance between the exciting frequency and that of the object put into vibration. A resonant system exists when an object is able to make another resonate. Natural resonance occurs when an object vibrates as a consequence of being excited with its own natural frequency. If the object has the ability to vibrate at a variety of frequencies, resonance can be forced. Accordingly, we can describe the unengaging circumstances mentioned above as failures to achieve resonance: for diverse reasons, there is no match between the game sound features and the player. This is not just a figurative interpretation of the concept: our interest in resonance is indeed related to how the body, as a system, responds to sound stimuli, and this is no different from what has been exploited by music through the ages. One explanation for the failure in the desired matching, derived from the concept’s definition, is that the entities–the player and the setting–are in such different states that no resonance can even be forced.

The second concept–entrainment–gives us a hint on how to work on that problem. Entrainment has to do with the synchronization between resonant systems. It “has been found so ubiquitous that we hardly notice it” (Sonnenschein, 2001, p. 97). Entrainment has long been used by music to induce specific states of consciousness (Leeds, 2001; Sonnenschein, 2001). In terms of psychoacoustics, the pertinence is to change the rate of brainwaves, heartbeat, or breath according to verified associations between those rates and cognitive and emotional states.

For entrainment to happen, three conditions must be met (Leeds, 2001). Firstly, a system will only entrain another if the latter is able to achieve the same vibratory rate. Secondly, the former needs power enough to prevail over the latter. Finally, the former needs to keep the same vibratory parameters until the latter is able to entrain. Regardless of whether we opt to take this literally or as an insightful metaphor, we must realize that if we want a player to resonate to a system’s desired state we may need to first get the system resonating with the player and then progressively bring the system–and the player along–into the desired state.

Resonance–including, for our purposes, the related concepts of entrainment, sympathetic vibration, resonant frequencies, and resonant systems–has been said to be “the single most important concept to understand if you are to grasp the constructive or destructive role of sound in your life” (Leeds, 2001, p. 35). We believe resonance is fundamental to the exploration of sound in computer games, notably to support a model that serves as an aid to understanding and, hopefully, overcoming the issue of empathy between a game and its players.

GUIDELINES FOR SOUND DESIGN IN COMPUTER GAMES

Based on the concepts and findings described here, we have distilled a set of guidelines for sound design in computer games. We encourage readers to understand this set as a work-in-progress. Our purpose is to contribute to the research community by building knowledge that can give us and other researchers the confidence to consider it plausible and worth refining, not least for its use value to computer game sound designers. Therefore, these guidelines make no claim (yet) to truth-value: instead, their value is strictly instrumental to the research and structuring of a body of knowledge in sound design. Also, the guidelines do not prescribe procedures but, instead, establish a mindset that can inform those procedures. In that sense, they state what to care about rather than stipulating how to do it in a particular instance. But, most of all, they are meant to generate understanding, not to be obeyed.

The guidelines attend to the identification of several affective aspects of sound design, including: considering the relevance of the acoustic properties of elements selected for interaction, namely as to their emotional effect; conveying meaning and coherent consequence to diegetic sound, inside the gameworld; allowing the player to perform through the exploration of the sonic outcome of meaningful actions; exploring the activation of events and interaction elements through the interpretation of the corresponding acoustic expressions; integrating users’ context in the sonic composition; supporting and exploiting resonance and entrainment; and dealing with perception issues during a user’s experience. Each guideline is presented with a description, relevant context, and examples.

For the conception of these guidelines, we did not focus on speech-based interaction. Also, although we do not exclude the use of music, we are mainly interested in exploring interaction through non-musical sounds. In terms of sound layers (Peck, 2001, 2007), this does not mean we will not be considering dialog and music, because that would ruin our commitment to the holistic approach underpinning our research: depending on the purpose with which specific sound stimuli are added to the composition, they can play a role in any layer. It simply means we are not attempting to contribute guidelines that specifically go into such matters as dialog generation and interpretation or musical composition in the strict sense.

Guideline 1: Select Elements with High Sonic Potential

It is strategic that the inherent, potential sonic expressiveness is valued when selecting the interaction protagonists in the early stages of design. This mindset applies to the full extent of the game’s components, including objects, characters, script, and features such as the gameplay. Actually, this guideline is the mother of all the others presented here: in each of them, for the designer to be able to implement the respective idea, a dedicated selection of these components is mandatory. We will avoid stating it as a prerequisite because it
is not supposed to happen before those ideas are set. Both the selection of the elements and the setting of the ideas that will explore them will profit from a tight process of decision-making along the progression, which, in turn, ought to be carried out from the very early phases of the overall design process.

Also important to notice is that it is not about selecting sounds. It is about selecting game elements, taking into account how they will supply the sonic properties that are required to accomplish some design aspect. This distinction is absolutely fundamental. Unless that is kept in mind, energies will be spent on enlarging the mistake of not using sound but covering with sound. Actually, using sound to wrap the elements in a game is not an error per se.

Metaphorically speaking, we do prefer our gifts when they come in nice wrapping paper. Still, that nice paper can be discounted and disconnected from the gift itself: even if we opt to keep the paper, the gift and the paper will still be independent entities, not contributing to each other’s accomplishments but existing separately.

The attentive selection of interaction elements, prizing rich sonic expression, expands the space of possibilities at design time. This will allow for fulfilling the intentionality of the soundscape whilst maintaining contextual consistency. Also, it should be easier to provide a good auditory perception of the environment if objects in it are identifiable or provide context through their sonic properties. Choosing and combining acoustic protagonists may be thought of as the construction of a dialect, specific to the project, which will support its communication model. This calls for a creative effort of collecting and combining possibilities. Still, it is useful to be attentive to some opportunities. One is that elements may have different states of sonic expression: roughly, the sound emitted while in customary or natural conditions and the sound emitted when the element is “activated”. In some cases, more states, or even variation in a continuum, may be identified. For example: a squeaky rubber duck has no sonic expression when left alone but possesses a very well known sonic identity when squeezed; conversely, a cicada has a customary expression that ceases when disturbed; a waterfall seems to have the same characteristic sound both on its own and when someone bathes in it; and a flock of pigeons also emits sound in both situations, but these are very distinct (mating and feeding versus alarm and flapping wings). In another vein, if we need a game character to drive fast through rush-hour traffic, we might consider including a car horn and carefully choose its sound (according to Guideline 3 below). So, there are countless possibilities to explore, depending on what is intended to be communicated.

Although some acoustic elements may be added–or patched–along the project without overall disturbance, others imply strategic decisions and consequently need to be analyzed in the early stages of design. In the latter case, above, resorting to the siren of some emergency vehicle service would imply the necessity to fit such a decision into the design options: even considering it would be plausible in the scenario, it might be inappropriate if too many other design decisions had been taken.

Finally, a related challenge is to unite elements which are coherent among themselves within the whole project. For instance, unless premeditated, dinosaur roars and bottle pops would not be compatible, although each one could possibly be associated with ideas that we might need to combine (let’s say, angst and repose). The issue is compatibility, not verisimilitude: we are happy to hear the bad guys’ spaceship exploding in the void, although we know that would be impossible (The Curious Team, 1999).
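The idea that an element offers several states of sonic expression can be captured directly in a game’s data model, letting the team query sonic potential while selecting elements. The sketch below is our own minimal illustration, with hypothetical class, state, and sound names; it is not an implementation proposed in the chapter.

```python
# Illustrative sketch: game elements with state-dependent sonic expression.
# Element, state, and sound names are hypothetical examples echoing the text.

class SonicElement:
    def __init__(self, name, sounds_by_state):
        self.name = name
        self.sounds_by_state = sounds_by_state  # state -> sound id (or None)

    def sound_for(self, state):
        """Sound the element emits in a given state, if any."""
        return self.sounds_by_state.get(state)

    def sonic_potential(self):
        """Number of distinct sonic expressions the element can supply."""
        return len({s for s in self.sounds_by_state.values() if s})

# The rubber duck is silent until squeezed; the cicada is the converse;
# the pigeon flock emits distinct sounds in both situations.
duck = SonicElement("rubber duck", {"idle": None, "activated": "squeak"})
cicada = SonicElement("cicada", {"idle": "chirring", "activated": None})
pigeons = SonicElement("pigeon flock",
                       {"idle": "cooing", "activated": "alarm and wing flaps"})

print(pigeons.sonic_potential())  # 2
```

A richer model could replace the discrete state map with a continuum, as the guideline suggests for elements whose expression varies gradually.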
Guideline 2: Select Elements Whose Changes in Sonic Expression May Support or Translate Emotions

When designing a game’s emotional script, the designer should evaluate how sound will contribute to it. There is no doubt that emotions are core to computer games. Additionally, it is well documented that sounds can be used to support emotional contexts. Actually, that is a common practice–and sometimes the ultimate goal–in some mature fields such as music (Gouk, 2004) and the cinema (Lynch, 2003). It is important to notice that we are not claiming sound should be the way to support emotions in computer games. Sound is one way to contribute to that, but one that should not be forgotten, considering its potential and particular strengths for these purposes.

One approach that can be further explored, when selecting each acoustic element according to its association with emotion, is to evaluate it with an emphasis on its ability to support different emotions, that is to say, to express emotional changes through its own sonic alteration. This is not mandatory, since emotional changes may be achieved by resorting to different elements–possibly one to support each different emotion–but it may be advantageous to explore the use of elements capable of supporting several emotional states and signaling the corresponding change. That, for instance, may relieve the user of interpreting new sonic elements for their emotive associations, and may provide gains in effectiveness. Moreover, the swapping of distinct sonic elements in the soundscape is more prone to erroneous interpretations, such as motion of their respective sources, although visual information may be enough for disambiguation. Finally, and more relevantly, this approach is more likely to offer continuity and emotional gradations.

As in Guideline 1, this is a matter of creative gathering and the selection of possibilities. A few illustrative examples of elements and their possible associated emotional states would be: birds (relaxation, attentiveness, fleeing); weather elements (calm, scaring); baby sounds (joy, tranquility, agitation, affliction); nice breakable materials (aesthetic contemplation, trespassing, destruction).

Guideline 3: Allow Sound to Matter in the Gameworld

The nature of the interaction, as perceived by the user, should be extended in order to genuinely integrate sound as an instrument for action in the environment. This is perhaps the most neglected use of sound in computer games. Sound, if used, is predominantly relegated to complementing the visual rendering. It serves as output, which is good but just half the idea. In fact, acting through sound makes perfect sense in a system with a bidirectional interface. There is no reason for sound-driven actions not to deserve the same kind of appreciation as running, jumping, grabbing, or shooting. Allowing the player to perform through sound, either as a consequence of some contextualized and meaningful action or by explicitly deploying some sonic event, has the potential to greatly extend the value of the experience. Moreover, it significantly enlarges the space of possibilities in terms of gameplay design. Reasons for the under-exploration of this kind of approach may be that it is something that could hardly be borrowed from music or cinema–the chief contributors to sound design practices in computer games (Deutsch, 2003)–and that it is also commonly neglected in computer application interfaces.

It should be noted that we are thinking beyond speech-activated commands. Speech recognition is not a goal in our study. Also, the kind of input suggested in this guideline is particularly meaningful if it does not consist of a mere mapping of commands that would otherwise be entered by pressing a key or button. Although the latter may be useful, it doesn’t truly represent a change in the interaction itself but only in its activation. In
fact, to observe this guideline, the actual activation, at the level of the interface, can still resort to a typical key press instead of true sound input.

In our non-digital lives we often resort to sound to make things happen: we open our way through a crowd by saying “excuse me, excuse me” rather than pushing or shoving; we yell at the annoying neighbor’s dog to counter its attack (sometimes it gets worse, but we still do it); we cough to make someone notice us; we use the car horn to stop another driver from hitting us; we walk more or less loudly according to our intention to make ourselves noticed, even if unconsciously; and so on. Sound plays a huge part as input in the communicational model, not only as dialog in a strict sense but also in more indirect ways. So, we have the means to get inspired about what could be different in computer games. In fact, when put this way, it seems that it is not about how to let sound in, but rather how to stop forcing it out of the game: how to escape the bias of visual predominance and derived solutions, and how to allow for more balanced approaches.

One aspect that we believe ought to deserve careful attention is the construction of a sense of coherence. In truth, when we claim the need to consider sound consequences, we are already addressing the issue of coherence between the value of what is seen and what is–or should be–heard. But let us confine, for now, our reasoning to what is heard: the inclusion of aspects in the game that are sound-driven may turn out to be improper if they reveal an incomprehensibly unequal treatment regarding other aspects that are evident candidates for the same behavior. This is not about realism: the coherence is relative to the gameworld, not necessarily to the real world. Instead, it is related to the holistic perspective that is dominant in the notion of the soundscape.

Of course, incoherence can become accepted based on the willing suspension of disbelief. The player can indeed adapt to the game’s reality where, for instance, a very noisy event does not trigger any kind of reaction from enemies but the slightest imprudence regarding noise in the scope of some other specific event can unleash the devil. Even so, and excluding the merit of well-designed alternate realities, such adaptation demands at least a first effort from the player. That effort has little to do with playing: it is exterior to the gaming experience itself. The player–the game user–gets confronted with the implausible and has to solve it consciously before eventually coming to accept it. In turn, that compromises flow and game immersion. If indeed the required suspension of disbelief comes at a cost with no intended value, just as the player is able to overlook the limitations of a compromised game design, efforts ought to be made to minimize the effect.

Some examples of the ideas expressed in this guideline can, in fact, be found in a few existing computer games. In the Thief game series–for example, Thief: Deadly Shadows (Ion Storm Inc, 2004)–and Metal Gear Solid 4 (Kojima Productions, 2008), both stealth games, some items can be thrown in order to make noise and consequently divert enemies’ attention to them. In the latter, it is even possible to knock on nearby objects with a similar purpose. In both games and others, such as The Elder Scrolls IV: Oblivion (Bethesda Game Studios, 2006), the sound of the character’s footsteps can broadcast his position.

Other hypothetical examples would be: yelling to frighten beasts or as part of the strategy to defeat them, whistling to call our dog or horse, clapping hands to scare birds, and so on.

Guideline 4: Allow Meaningful Sonic Control for Intended Actions

This works as an inversion of the cause-effect relationship in events with a natural or associated sonic expression. As in Guideline 3, this guideline relates sound and acting; however, this time, instead of performing some event X and expecting that other events Y are triggered or shaped by its sonic expression, we are suggesting a way to trigger an event Z by performing its own sonic expression. The idea is to allow the player/character to produce the sound that translates the actions that are intended to occur. An interesting collateral effect is that, in this process, the player/character substitutes for, or participates in, the corresponding sound and consequently integrates into the overall composition. In contrast to the former guideline, in order to cope with this one it seems relevant to allow for actual sound input.

Conceptually, this differs from strict voice commands in the sense that the input does not reflect an order for something to happen but rather the actual sonic expression of something as if it were already happening. This is indeed a relevant distinction, with implications in both format and semantics. One difference is the nature of the emitted message: text versus expression. Another is the timing and duration of the message. In the case of voice commands, the order precedes the action and its duration does not depend on that of the action; in the case of the approach we are suggesting, the stimulus and the action are theoretically simultaneous: the action starts as soon as the stimulus is identified (although, in practice, this will imply some latency) and lasts for as long as the stimulus is maintained. Consequently, there are also differences in the kind of control that is possible for actions that are flexible regarding duration. Also, it is conceivable that we interpret variances in the acoustic parameters along the stimulus (intensity, pitch, and so forth) and dynamically shape the action according to preset conventions. Furthermore, there are significant aesthetic differences: for instance, the proposed approach shows great potential regarding the exploration of the input sound as a component of the game’s artistic value. Finally, there are differences in terms of the emotional impact underlying each approach: for example, if we are actually giving orders, as in some war games such as Tom Clancy’s EndWar (Ubisoft Shanghai, 2008), voice commands may feel more appropriate, while, in some other scenarios, making non-verbal sounds may provide a better experience. Again, we emphasize that we are not arguing the value of one approach over the other: our aim is to contribute to the enrichment of the space of possibilities.

One final point that should not be overlooked is the potential ludic value inherent in making sounds: that is, in performing at the interface. Thus, not only the ludic meaning of the triggered actions but also the activation itself becomes part of the game. This is a rare opportunity. Typically, the activation level is not conceived of for the purposes of providing fun. There is not much joy in the act of pressing keys on the keyboard, moving the mouse, pushing buttons on controllers, and so on (although, to be fair, there is fun inherent in the use of some interface devices such as steering wheels and pedals, musical instrument imitations, and some modern game console controllers). Of course, the design of the sounds that are supposed to be input–a matter that fits into Guideline 1–has a determining importance for the kind of achievements that may become possible at this level of the game.

Other hypothetical examples would be: driving a cart along a path while avoiding running over crossing animals by producing the sounds of the engine and possibly the emergency brake; gaining focus over a wooden box to move it on a rock floor by imitating the sound it would make while controlling directions with mouse or keys; making a ball jump different heights according to the modulation of some established sound; shooting a gun by vocalizing the shots; shooting different guns using a feature of automatic weapon selection based on their distinct shot sounds; and so forth.

Guideline 5: Allow Integration of Player’s Context into the Soundscape Composition

Context plays an important role in interaction processes. Also, sound is both part of that context and a way to express it. It is worthwhile to explore the possibilities, in terms of soundscape composition and particularly in respect to affective sound, allowed by the consideration of the player’s context.

Actually, all the guidelines presented here have been strongly influenced by a constant attention to context. In all aspects–interaction protagonists, emotional support, consequent sound, action through sound–there is always an emphasis on the need to consider a global perspective, both concerning the integration of the different modalities and regarding the different combining approaches in the particular case of sound. The bottom line is that no approach is good unless it fits in the whole. If it does not, either the approach or the whole needs to be adjusted.

This guideline goes a little further in terms of the consideration for context. The argument is that the context is not limited to the game itself. A game is played by someone who actually has–and is–context too. So there is no point in trying [...]

[...] context that turns out to be indeed influential to that process, bearing in mind the problem that contextual aspects are inherently non-evident. Another class of challenges is the actual reading of the contextual parameters, which, in many cases, demands the usage of probes or sensors. In turn, this is potentially problematic not only in terms of the availability of those devices but because some of them can be considered intrusive or uncomfortable to use.

An example of a contextual parameter, which we suggest the sound designer consider, is the player’s ambient sound (as in Cunningham, Caulder, & Grout, 2008 and Cunningham, Grout, & Picking, 2011). This might be useful to dynamically equalize each of the categories of game sounds according to the expected ability of the player to perceive them. Or, in a more complex endeavor, it might become interesting
to figure out how to turn a game into a perfectly to integrate the players’ ambient sound, or some
designed context piece if we leave out the only of its acoustic parameters, into the game’s sound.
element of the context who would possibly ap- Still, we should not restrict ourselves to sound-
preciate it: the player. to-sound explorations: all possible combinations
Some concepts that have recently became are relevant to game design, at the very least
well-known in game design, such as immersion those that have sound in either of the extremes
(Grimshaw, 2008) and flow (Csíkszentmihályi, fit the present guideline. For instance, we are
2008), emphasize, in different ways, the pertinence particularly sensible to acoustic explorations that
of getting the player and the game into the same can be develop from the readings of the players’
plane of existence. These approaches focus mostly physiological indicators, namely heartbeat, breath,
on the migration of the player into the game. We and brainwaves. In truth, there are some classical
suggest tackling the same issue in a complementary examples of similar exploration in other domains,
way, which is somehow the reverse method: To as evidenced by the relationship between music
extend the game in order to embrace the player, rhythm and the heartbeat. We believe that, since
that is, to build the game around the player. these indicators provide hints on the player’s
Dealing with context poses complex chal- emotional state, it will be interesting to consider
lenges. Conceptually, all aspects of the player’s their potential to dynamically set compositional
context matter to whatever is done in the scope aspects of sound in game scenarios thus aiming
of that context. In practice, this has two related at a better resonance and possibly as the basis
implications. One is that, since it is not technically for entrainment. This is suggested in Guideline
viable to seize all context parameters, it becomes 7 below (see also, Nacke & Grimshaw (2011) on
necessary to identify and capture the most mean- the monitoring of psychophysiological states of
ingful parameters of that context, considering players and implications for game sound design).
the process we are designing. The other is that An aspect that also deserves some commen-
we cannot afford to neglect some aspect of the tary is the possible contradiction between leading
the player into the context of a fantasy world and bonding with the context of the real world. Indeed, once a resonance state between player and game has been established, the player might appreciate being transported to another context. Actually, the sense of escapism is part of the argument for playing computer games. Even so, this is not contradictory with the effort suggested in this guideline. To start with, because it is a prerequisite to first be able to empathize with the player (something we will explore in Guideline 7, which concerns entrainment). Next, the kind of context that is integrated in the experience, and the way that context is translated into the experience, do not necessarily evidence the bonds in such a manner that they anchor the player to a former state or to the consciousness of a real-world existence. Ultimately, the designer may decide that the more immersive the current state, the less binding there is with the player's outer context. But even then, the ability to evaluate the immersion level will probably require reading certain parameters from the player's current context. Most of all, it seems to be a matter of dynamically adjusting the components of the context that are the most critical to resonance management.

Guideline 6: Consider Shared Context in Multi-Player Environments

This is an extension of the previous guideline through the consideration of multi-player environments. Each player's context may include the perception of aspects of the other players' context. The argument is that, in a multi-player environment, context is both local and global (Roque, 2005, and discussed in terms of a virtual acoustic ecology by Grimshaw, 2008). It may be advantageous if each player perceives not only other players' actions but also relevant elements of the context that shaped those actions.

The implementation of this guideline calls for the combination of elements deriving from different players, which, in turn, are captured or integrated according to the techniques mentioned in Guideline 5. Regarding the combination of the stimuli, it is important to be attentive to the insights from acoustic ecology and consider that the design of a shared-context soundscape should support the fitting of individual interventions rather than superposing their disconnected sounds (Wrightson, 2000).

This approach may be considered with different purposes: for example, simply aesthetic, taking advantage of aspects of the global complexity; as a mechanism to deliver a sense of presence and of activity of the respective community; as part of the gameplay, making available some aspects and hiding others according to what best serves the game mechanics.

Guideline 7: Integrate Acoustic Elements that May Support Entrainment

Entrainment can be used to support the maintenance or the change of emotional states. Sound is one prominent way to implement entrainment, which can be achieved by progressively moving from one state of resonance into another. In terms of game experience, keeping the player emotionally involved over time, as complexity grows and emotions unfold, is crucial. As the term entrainment suggests, the idea will be to create the conditions for the player to engage with and to be transported on a journey. Still, the path can be too turbulent for the designer to assume the player will have enough of a pleasurable experience to warrant reaching the end.

The consequences of such an observation are relevant. The most important is that any tool a game designer has to monitor and direct the course of action in order to avoid losing players will be valuable. In this sense, entrainment, and its support through sound, is instrumental. Also, regarding each particular instant of the experience, the managing of the proximity between a player's emotional state and the expected (or even required)
emotional state may be addressed through the idea of resonance. Finally, and although resonance must be granted during the whole experience, the initial moment–that is, the first resonant achievement–is particularly challenging. It is clear that it will be harder to go from a state of no resonance to a state of resonance than it will be (later) to move between resonance states. The latter situation, being well designed, should allow a more continuous transition.

To address the achievement of initial resonance, at least two approaches can be explored. One is to speculate about the initial mindset and emotional state of the player and gently move from there. That is no different from what is done in other forms of communication: It is a good idea to perform some sort of introduction before getting into the core of the message. Still, the contents of the introduction have to be tuned according to the context of the listeners, which frequently has to be estimated. Although this approach is technically simple, it may be ineffective due to the lack of indicators about both the starting context and the evolution of the process. So, a second class of approaches, where there is some way to read indicators that permit a better judgment about those aspects, will allow more efficiency. For this purpose, any known technique to dynamically infer a player's emotional state will be useful. In the scope of the present study, we find particular relevance in those techniques that take into account the player's physiological rhythms, namely heartbeat, breath rate, and brain waves, because of their potential exploitation in terms of sound (see Guideline 5). The problem with the actual reading of such indicators is the device apparatus, which is likely to be found intrusive and, as such, contraindicated in terms of the experience.

The relationships between emotions and heartbeat, breath rate, and brain waves have long been explored (for example, Atwater, 1997; Leeds, 2001). Musical examples are Shamanic drumming, which induces theta brain waves with a consequent approximation to deep sleep and trance states, and Balinese gamelan, which has a beat phenomenon that generates frequencies of about 4 to 8 Hz, also targeting the theta brain waves. Another example, more commonly acknowledged, including in computer games–for instance, inFamous (Sucker Punch Productions, 2009) and Uncharted: Drake's Fortune (Naughty Dog, 2007)–is the use of strong beats that gradually increase in rhythm and intensity in order to emulate the heart rate that would match the designed emotional state.

Depending on the intended purpose, these practices may be used to inform game design. Once again, the acoustic elements used to design the conditions for entrainment should fit in with the design of the soundscape according to the principles covered in Guideline 1.

APPLYING THE GUIDELINES

When we argue for the relevance of the integration of sound in the design of interaction processes, based on the observation of the discrepancy between current game sound use and the value that sound assumes in everyday life, it may seem we are implicitly calling for a balance in the gameworld, similar to that in the real world, regarding the prominence of sound in interaction. This is not the case. We are addressing the design of a virtual world where, in principle, there is no reason for us to be anchored to the constraints of the real world. So, the designer should pursue not fidelity to reality but, rather, creativity.

Again, this should not be confused with a discussion around the search for realism, although that is also an interesting matter to approach in the context of this text (see Farnell, 2011). It may be clear by now that we prize an exploration of sound that goes beyond the concerns for realism. We acknowledge that a rich experience does not require a realistic approach to sound. Of course, the ability to achieve realistic features–at some sound layers–is interesting in the sense that it enlarges the boundaries of the space of possibilities, but it
seems fairly evident that it is not a requisite. What is more, paradoxically, approaching realism can be troublesome in terms of perception and emotional response, as in the case of the "uncanny valley" phenomenon (Grimshaw, 2009; Tinwell, Grimshaw, & Williams, 2011), which comprises a feeling of strong discomfort with greater humanlikeness. That is, plausibility and precise realism become issues, and failure to achieve them contributes severely to the degradation of the experience.

Considering that we have been presenting foundations for possible insights that might inform game sound design, we feel the need to not let pass unnoticed the importance of the designer having a background in gaming and having extensively analyzed the widest possible universe of computer games. Particularly, once one is sensitive to sound design, one develops attentiveness to sound facets when playing computer games, even unintentionally. Experimenting with computer games in a genuine setting–that is, playing games–and possibly becoming, or taking advantage of being, a hardcore player is also a rich source of information and insights (Aarseth, 2003). Even gaming experiences that are perceived as poor can sometimes become most valuable if one can rationalize what seems wrong and what would be an alternative.

A different reason why it is relevant to actually play computer games, with a behavioral pattern similar to that of the players who are the typical consumers of the kind of games we are addressing, is that, as we argued, a player's perceptions are strongly influenced by context. In turn, the context of a certain player is also shaped by the number and diversity of games played before, the amount of time usually dedicated to playing, the number of playing hours in a given moment, and so forth. Adding this to the inherent difficulty in grasping other people's contexts, it seems appropriate to say that the more the researcher or designer is able to feel like a player, the closer the judgments reached will be to those of players (even when considering that no two players are equal, nor even that one player remains the same through the passage of time).

Finally, and somehow in the same vein, it is fundamental to recognize that we will never be designing the players' behaviors or feelings. Instead, through sound design, we are working with the conditions that will influence those players into what is intended to be a desired emotional experience. But, again, since those players will always be subject not only to the designed conditions but also to other conditions that constitute their own current context–including manifesting their own will and deciding, for example, not to engage–it is not reasonable to be assertive and didactic about effectiveness. In fact, because games are mostly forms of participatory media, the players also are, to some extent, designers of their own experiences.

A DESIGN EXERCISE

We present an example of the application of the guidelines by a group of developers with no prior experience in game sound design. The exercise involved a team of five Master's students on a course in game design and development (Alves & Roque, 2009b). The team was commissioned with the design of a game specifically intended to demonstrate the importance of sound in gameplay. This prompted them to think about a game that could not otherwise be played except with and through sound.

Our argument for attaching this example to this chapter is twofold. On the one hand, it serves as an instantiation that may be useful to illustrate a possible interpretation of some of the suggestions this study provides. On the other hand, it goes some way to verifying the plausibility of the guidelines we have presented. Of course, at this point, the simple observation of this experiment does not provide support for a generalization of the results, but the results are an encouraging indicator nonetheless.
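The sound-matching interactions at the heart of this exercise (the team, as described under Gameplay below, settled on comparing utterances by duration, loudness, and pitch) reduce, at their simplest, to a feature comparison within tolerances. The following minimal sketch is purely illustrative: the feature values, tolerance thresholds, and names are hypothetical assumptions, not taken from the students' implementation.

```python
# Illustrative sketch: judge whether a player's vocal attempt "matches" a
# target stimulus on duration, loudness, and pitch, within per-feature
# tolerances. All values and thresholds below are hypothetical; a real game
# would extract the features from microphone input.

from dataclasses import dataclass

@dataclass
class SoundFeatures:
    duration_s: float   # length of the utterance in seconds
    loudness_db: float  # mean level, e.g. in dBFS
    pitch_hz: float     # estimated fundamental frequency

def matches(target: SoundFeatures, attempt: SoundFeatures,
            max_duration_diff: float = 0.3,  # seconds (hypothetical)
            max_loudness_diff: float = 6.0,  # dB (hypothetical)
            max_pitch_ratio: float = 1.2) -> bool:
    """True if the attempt is close enough to the target on all three
    variables; sounds need not be identical, only to match within tolerance."""
    if abs(target.duration_s - attempt.duration_s) > max_duration_diff:
        return False
    if abs(target.loudness_db - attempt.loudness_db) > max_loudness_diff:
        return False
    # Pitch is compared as a ratio, since pitch perception is roughly logarithmic.
    ratio = (max(target.pitch_hz, attempt.pitch_hz)
             / min(target.pitch_hz, attempt.pitch_hz))
    return ratio <= max_pitch_ratio

cub_call = SoundFeatures(duration_s=1.0, loudness_db=-20.0, pitch_hz=440.0)
player_try = SoundFeatures(duration_s=1.2, loudness_db=-23.0, pitch_hz=400.0)
print(matches(cub_call, player_try))  # close on all three variables -> True
```

In practice, the features would come from audio analysis (for example, an RMS level for loudness and an autocorrelation-based pitch tracker for the fundamental), and the tolerances could be tuned per creature or relaxed as the player progresses.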
Game Plot and Setting

The game is a single-player adventure, suitable for audiences over the age of 6. It is about a castaway and his rescue from an island inhabited by fictitious creatures. The plot comprises gaining the sympathy of the native creatures in order to get their help in attracting the attention of passing ships. Two input methods were designed: vocalized sound input through a microphone and, alternatively, the use of keystrokes to trigger the corresponding programmed sounds.

The game takes place in an island scenario where the playing character interacts with a set of creatures, one at a time, by interpreting their sound manifestations in the context of the game diegesis. As an example, the player has to "gain the trust" of a creature by imitating its pitch, its rhythm, and so on, with two end results: unlocking some progress in the game and training the ability to recognize and reproduce specific sound characteristics, in the context of other sound sources, in order to achieve a specific composition. The coordination abilities thus gained by the player will then be put to the test in a final setting.

Story

A castaway arrives on an island inhabited by strange creatures. He notices that ships pass at a distance and that they might rescue him, but, when he tries to signal his presence by yelling to them, he fails to get noticed. On the island, there are several accessible zones and each zone is inhabited by one species. A species population consists of a bunch of cubs and a parent: The cubs are curious; the parent is neutral though vigilant. The cubs' behavior triggers communication-learning episodes where the castaway iteratively tries to replicate their utterances. After a certain number of such successful episodes, the parent becomes receptive to communication and the castaway, combining expressions learned from the cubs, starts a communication process to conquer its sympathy. When he has succeeded, the parent volunteers to accompany the castaway to the beach and to help him to yell for the attention of the ships passing by. While they yell, someone on one of the ships appears to have noticed something but assumes it was an illusion because the stimuli coming from the beach were too weak. In each of the other zones on the island the plot repeats: each time a parent becomes a friend, the entire group gathers at the beach for another attempt to catch the attention of the passing ships. With each attempt, the perception grows that the aim is about to be achieved until, after enlisting the aid of a certain number of creatures, the goal is finally reached and the castaway rescued.

Non-interactive (cinematographic) scenes include the arrival at the island and a lonely call for passing ships, moving onto the beach accompanied by friendly creatures and yelling to the passing ships, and a ship's crew member wondering about the yelling sounds (this is a distinct scene each time a new creature joins the group).

Gameplay

The castaway moves through the island's zones (there being no predefined sequence) and in each zone there are two types of interaction: with cub creatures and with their parent. All interactions happen between the castaway and only one creature at a time, with each interaction comprising an iterative process of alternate interventions in a dialog. The interaction can be aborted before success is achieved, by the player's decision or because a certain number of iterations has been reached. There is no enforced order to the interactions, but it is mandatory to successfully interact with several cubs before being able to complete with success the interaction with the parent. The dialog with a cub is initiated and conducted by it, while the dialog with the parent is initiated and conducted by the castaway. The success condition in the relationship with a cub depends on sufficiently matching its utterances, and the success
condition in the relationship with a parent depends, firstly, on its receptivity to communicate–which, in turn, depends on the number of cubs with whom a successful conversation has been carried out–and, secondly, on the level of satisfaction to which the castaway can lead the creature in a process where, in response to each of the castaway's sound sequences, the creature manifests the corresponding sympathy reaction. The level of sympathy may drop during the interaction with the parent. Every zone on the island shares the same game mechanics: what differs are the sound stimuli.

The relationship with the cubs can be understood as a learning process of the sound stimuli that will eventually allow a successful relationship between the castaway and the cubs' parent. On the other hand, the relationship with the parent is an exploratory exercise of composition through the combination of these stimuli, with some room for creativity.

Regarding the similarity evaluation criteria for the sound stimuli used in interactions, in a first approach, the following acoustic variables were considered: duration, loudness, and pitch. In practice, this means sounds do not have to be strictly identical: they only have to match according to those variables.

Critical Reflection on the Exercise

The observation of the design experience surrounding this exercise reinforced the idea that the observance of this set of guidelines implies that they must be considered from the early stages of the overall game design process. The guidelines involve fundamental aspects of the interaction which could hardly be tuned and achieved if too many design features had already been decided. That is an important consideration. We may need to put it as a prerequisite, or accept the limitation of this effort if used upon an already well-developed design. Although, in this exercise, there were the optimal conditions to escape this struggle (the exercise was designed from scratch), keeping a faithfulness to the principle still demanded tenacity, despite the passionate attentiveness to the guidelines.

Ironically, despite all that freedom, it was not particularly easy to come up with a satisfying idea that permitted one to experiment with the set of guidelines. Actually, that was a time-consuming task and a valuable lesson that deserves some commentary. It was evident, for those involved in the exercise, that the team was particularly unaccustomed to the opportunity of thinking in auditory terms. For instance, the insights often suffer from too much visual bias: In a moment when auditory possibilities were being experimented with, the team agreed it was desirable to go beyond a simple mapping to visual elements and worked instead to make the gameplay itself as strongly influenced by the audio component as it is by the visual modality.

In the early stages of this exercise the team was uneasy about how long the observance of the proposed guidelines would have to be explicitly carried out. Yet, and although the circumstances of the research did not allow designers to forget about them, once the design was defined, particularly the game flow and interaction, their requirements became embedded into the whole design and, as intended, subsequent steps related to sound became merely a matter of implementation.

One difficulty, more operational than conceptual, had to do with which sound files to use. This was not exactly a surprise, since we knew beforehand that "sound designers are often limited by having poor, outdated equipment, not enough off-the-shelf sound libraries, but most importantly, not enough time to go out and get new, original sounds for the game project" and that "sound is art [and] to make a game sound artful […] sound designers [must] have the time and money to practice their art" (Peck, 2001, p. 1). There are several reasons for us to mention our experiences regarding this practical aspect. First, to note that the paucity of existing sounds and the lack of time to record new ones were critical factors in this particular exercise.
Second, and more important, to remind one how significant such a bottleneck may be for this kind of endeavor in general. Finally, to acknowledge that, despite the predictability of such difficulties, a priori conditioning the space of possibilities as a function of the already available sound materials would be extraordinarily limiting.

Lastly, we realize the designed gameplay includes a tacit approach to the problem of the players' adaptation to the game model, in terms of both interface and game mechanics. This addresses an early concern: The introduction of uncommon ingredients in interaction, unless carefully accomplished, can pose difficulties for players. In the case of this exercise, the interaction with the island creatures occurs as an iterative procedure which is, in fact, a learning process. Most pleasing is that such learning makes sense within and throughout the game: It is not an introductory level with a tutorial goal. In that sense, it is the character, not the player, who learns.

CONCLUSION

We exposed the discrepancy that exists between the current exploitation of sound in computer games and the value that sound assumes in interaction processes in our daily lives. We reinforced this point by mentioning that in other domains, such as music and cinema, sound has proven to be effective in many aspects that are also critical to the experience of computer games. We also contextualized current game sound design with sound design in the wider scenario of interaction systems, namely those addressed by HCI.

We made a point of the fact that noticing the relevance of sound in other fields is insightful and can provide relevant synergy. However, computer games have their own specifics that oblige proper adaptation and, most of all, they provide opportunities that are particular to the field.

Considering our assessment of the current status, we argue the need for a collective sensitivity to the importance of the integration of sound design in game development practices and advocate the requirement of conceptual guidelines for those who will undertake sound design.

We reiterate that sound design should serve the project's intentionality and constitute a whole along with all other aspects of game design. Attempts to do sound design directed by the need to provide "something to be heard" are limited, do not honor sound's potential, and may even cause problems with other aspects of the game. Implicit in this thought is that this conceptual sound design ought to be performed right from the early stages of the project and be applied to all semantic layers of game sound.

We contributed to the recognition of the value of sound design by presenting an approach that is based on a multi-disciplinary interpretation of several concepts. These include: emotions, regarding which we have empathy for the neurological approach because it provides a less context-dependent way to deal with personal behavior; context, which allows us to understand the individual as a complex being blended with others, with the environment, with one's own prior experiences, and so on; acoustic ecology, which provides a contextual conceptualization of sound with emphasis on the affective dimension; soundscape and soundscape composition, both concepts derived from acoustic ecology; and resonance and entrainment, two physical concepts with repercussions for perception, cognition, and emotion that inspire interpretations of emotion management through a game experience.

From a holistic consideration of principles and insights subsidiary to these concepts, a set of guidelines for sound design in computer games has been drawn up. The guidelines address several affective aspects of sound design, including: valuing the acoustic properties of all interaction protagonists and their influence on perception and emotions; conveying meaning to the presence of sound in terms of consequence inside the designed world; acting through sound by performing meaningful actions which have valuable sonic expression;
using sound associated to events as an input to control them; ensuring coherence in the use of sound; integrating the player's context in the sonic composition, including in multi-player games; exploring resonance as an instrument to achieve a binding between the player and the designed intent; and the use of entrainment as a model to create a dynamism of resonance states according to the emotional script.

We also presented a report on a brief design case in which those guidelines were exercised by a team of game developers with no prior experience in sound design. We registered some uneasiness on the part of designers to work with the acoustic field as well as they do with the visual field: Fighting the visual bias that leads to sound merely being an extension of visual representations becomes a primary task. Difficulties also arise with quality audio sampling and with communicating sonic design ideas or intentions when compared to drawing visual renderings on paper.

In further research we intend to augment and refine the set of design guidelines and to build a significant understanding of their application. Particularly, we will be considering how to enhance the approach to dynamic composition of soundscapes in computer games, with special relevance to designing the experience with non-musical layers of sound.

ACKNOWLEDGMENT

We thank the Master's students involved in the design exercise presented here: João Pinheiro, Lara Silva, Nuno Lourenço, Pedro Almeida, and Sandra Mendes.

This research is partially supported by FCT, Fundação para a Ciência e a Tecnologia, grant SFRH/PROTEC/49757/2009.

REFERENCES

Aarseth, E. (2003, August). Playing research: Methodological approaches to game analysis. Paper presented at the Digital Arts and Cultures Conference, DAC2003, Melbourne, Australia.

Alves, V., & Roque, L. (2009a). A proposal of soundscape design guidelines for user experience enrichment. In Proceedings of the 4th Conference on Interaction with Sound, Audio Mostly 2009 (pp. 27-32). Glasgow, UK.

Alves, V., & Roque, L. (2009b). Notes on adopting auditory guidelines in a game design case. In Veloso, A., Roque, L., & Mealha, O. (Eds.), Proceedings of Videojogos2009 - Conferência de Ciências e Artes dos Videojogos. Aveiro, Portugal.

Atwater, F. (1997). Inducing altered states of consciousness with binaural beat technology. In Proceedings of the Eighth International Symposium on New Science (pp. 11-15). Fort Collins, CO: International Association for New Science.

Augoyard, J. F., & Torgue, H. (Eds.). (2005). Sonic experience: A guide to everyday sounds. Montreal, Canada: McGill-Queens University Press.

Barr, P. (2008). Video game values: Play as human-computer interaction. Unpublished doctoral dissertation. Victoria University of Wellington, New Zealand.

Bethesda Game Studios (Developer). (2006). The Elder Scrolls IV: Oblivion [Computer game]. 2K Games & Bethesda Softworks.

Brewster, S. A. (1994). Providing a structured method for integrating non-speech audio into human-computer interfaces. Unpublished doctoral dissertation. University of York, Heslington, UK.

Collins, K. (2008a). Game sound: An introduction to the history, theory, and practice of video game music and sound design. Cambridge, MA: MIT Press.
Collins, K. (2008b). Nothing odd about audio. Retrieved September 31, 2009, from http://www.slideshare.net/collinsk/sk-466356

Csíkszentmihályi, M. (2008). Flow: The psychology of optimal experience. London: Harper Perennial.

Cunningham, S., Caulder, S., & Grout, V. (2008). Saturday night or fever? Context aware music playlists. In Proceedings of the 3rd Conference on Interaction with Sound, Audio Mostly 2008 (pp. 64-71). Piteå, Sweden.

Cunningham, S., Grout, V., & Picking, R. (2011). Emotion, content and context in sound and music. In Grimshaw, M. (Ed.), Game Sound Technology and Player Interaction: Concepts and Developments. Hershey, PA: IGI Global.

Damásio, A. (2000). The feeling of what happens: Body and emotion in the making of consciousness. London: Vintage Books.

Damásio, A. (2003). Emotion, feeling, and social behavior: The brain perspective. The Walter Chapin Simpson Center for the Humanities. Retrieved September 31, 2009, from http://depts.washington.edu/uwch/katz/20022003/antonio_damasio.html

Damásio, A. (2005). Descartes’ error: Emotion, reason, and the human brain. London: Vintage Books.

Deutsch, S. (2003). Music for interactive moving pictures. In Sider, L. (Ed.), Soundscape: The School of Sound lectures 1998-2001 (pp. 28–34). London: Wallflower Press.

Ekman, I. (2005). Meaningful noise: Understanding sound effects in computer games. In Proceedings of Digital Arts and Cultures 2005. Copenhagen, Denmark.

Ekman, I. (2008). Psychologically motivated techniques for emotional sound in computer games. In Proceedings of the 3rd Conference on Interaction with Sound, Audio Mostly 2008 (pp. 20-26). Piteå, Sweden.

Farnell, A. (2011). Behaviour, structure and causality in procedural audio. In Grimshaw, M. (Ed.), Game Sound Technology and Player Interaction: Concepts and Developments. Hershey, PA: IGI Global.

Follett, J. (2007). Audio and the user experience. UXmatters. Retrieved September 31, 2009, from http://www.uxmatters.com/MT/archives/000200.php

Frauenberger, C. (2007). Ears))): A methodological framework for auditory display design. In CHI ‘07 extended abstracts on Human factors in computing systems (pp. 1641–1644). San Jose, CA: ACM Press.

Freeman, D. (2003). Creating emotions in games. Berkley, CA: New Riders Games.

Gouk, P. (2004). Raising spirits and restoring souls: Early modern medical explanations for music’s effects. In Erlmann, V. (Ed.), Hearing cultures: Essays on sound, listening and modernity (pp. 87–105). Oxford: Berg.

Grimshaw, M. (2007). Situating gaming as a sonic experience: The acoustic ecology of first person shooters. In Proceedings of Situated Play (pp. 474–481). Tokyo, Japan: DIGRA.

Grimshaw, M. (2008). The acoustic ecology of the first-person shooter. Saarbrücken, Germany: VDM Verlag Dr. Muller.

Grimshaw, M. (2009). The audio uncanny valley: Sound, fear and the horror game. In Proceedings of the 4th Conference on Interaction with Sound, Audio Mostly 2009 (pp. 21-26). Glasgow, UK.


Hassenzahl, M., & Roto, V. (2007). Being and doing: A perspective on User Experience and its measurement. Interfaces, 72, 10–12.

Hassenzahl, M., & Tractinsky, N. (2006). User Experience—a research agenda [Editorial]. Behaviour & Information Technology, 25(2), 91–97. doi:10.1080/01449290500330331

Hermann, T., & Hunt, A. (2005). Guest Editors’ Introduction: An Introduction to Interactive Sonification. IEEE MultiMedia, 12(2), 20–24. doi:10.1109/MMUL.2005.26

Ion Storm Inc (Developer). (2004). Thief: Deadly Shadows [Computer game]. Eidos Interactive.

Jørgensen, A. (2004). Marrying HCI/Usability and computer games: A preliminary look. In Proceedings of the third Nordic conference on Human-computer interaction, NordiCHI ‘04 (pp. 393-396). Tampere, Finland.

Jørgensen, K. (2006). On the functional aspects of computer game audio. In Proceedings of the 3rd Conference on Interaction with Sound, Audio Mostly 2006 (pp. 48-52). Piteå, Sweden.

Jørgensen, K. (2008). Audio and Gameplay: An Analysis of PvP Battlegrounds in World of Warcraft. Gamestudies, 8(2).

Juslin, P. N., & Sloboda, J. A. (Eds.). (2001). Music and emotion: Theory and research. Oxford: OUP.

Kallmann, H., Woog, A. P., & Westerkamp, H. (2007). The World Soundscape Project. The Canadian Encyclopedia. Retrieved September 31, 2009, from http://thecanadianencyclopedia.com/PrinterFriendly.cfm?Params=U1ARTU0003743

Kojima Productions (Developer). (2008). Metal Gear Solid 4: Guns of the Patriots [Computer game]. Konami.

Kramer, G., Walker, B., Bonebright, T., Cook, P., Flowers, J., Miner, N., et al. (1997). Sonification report: Status of the field and research agenda. Retrieved September 31, 2009, from http://www.icad.org/websiteV2.0/References/nsf.html

Lane, R. D., Nadel, L., Allen, J. J. B., & Kaszniak, A. W. (2002). The study of emotion from the perspective of cognitive neuroscience. In Lane, R. D., & Nadel, L. (Eds.), Cognitive neuroscience of emotion (Series in affective science) (pp. 3–11). Oxford: OUP.

Ledoux, J. (1998). The emotional brain: The mysterious underpinnings of emotional life. London, UK: Phoenix.

Leeds, J. (2001). The power of sound. Rochester, VT: Inner Traditions.

Lynch, D. (2003). Action and reaction. In Sider, L. (Ed.), Soundscape: The School of Sound lectures 1998-2001 (pp. 49–53). London: Wallflower Press.

Mahlke, S. (2007). Marc Hassenzahl on user experience. HOT Topics, 6(2). Retrieved September 31, 2009, from http://hot.carleton.ca/hot-topics/articles/hassenzahl-on-user-experience/

Mahlke, S., & Thüring, M. (2007). Studying antecedents of emotional experiences in interactive contexts. In Proceedings of the SIGCHI conference on Human factors in computing systems (pp. 915-918). San Jose, CA: ACM Press.

Marks, A., & Novak, J. (2009). Game development essentials: Game audio development. Florence, KY: Delmar Cengage Learning.

Nacke, L., & Grimshaw, M. (2011). Player-game interaction through affective sound. In Grimshaw, M. (Ed.), Game Sound Technology and Player Interaction: Concepts and Developments. Hershey, PA: IGI Global.


Naughty Dog (Developer). (2007). Uncharted: Drake’s Fortune [Computer game]. Sony Computer Entertainment.

Nettle, D. (2006). Happiness: The science behind your smile. Oxford: OUP.

Norman, D. (2002). Emotion & design: Attractive things work better. Interactions, 9(4), 36-42.

Norman, D. (2004). Emotional design: Why we love (or hate) everyday things. New York: Basic Books.

Parker, P. (2003). Filling the gaps. In Sider, L. (Ed.), Soundscape: The School of Sound lectures 1998-2001 (pp. 184–194). London: Wallflower Press.

Peck, N. (2001). Beyond the library: Applying film postproduction techniques to game sound design. In Proceedings of Game Developers Conference. San Jose, CA.

Peck, N. (2007, September). Unpublished presentation. CoFesta/TGS, Tokyo, Japan.

Roque, L. (2005). A sociotechnical conjecture about the context and development of multiplayer online game experiences. In Proceedings of DiGRA 2005 Conference: Changing Views – Worlds in Play. Vancouver, Canada.

Schafer, R. M. (1973). The music of the environment. Cultures, 1973(1).

Schafer, R. M. (1994). The soundscape: Our sonic environment and the tuning of the world. Rochester, VT: Destiny Books.

Schell, J. (2008). The art of game design: A book of lenses. London: Morgan Kaufmann.

Sider, L. (Ed.). (2003). Soundscape: The School of Sound lectures 1998-2001. London: Wallflower Press.

Sonnenschein, D. (2001). Sound design: The expressive power of music, voice and sound effects in cinema. Seattle, WA: Michael Wiese Productions.

Sotamaa, O. (2009). The player’s game: Towards understanding player production among computer game cultures. Unpublished doctoral dissertation. University of Tampere, Finland.

Sucker Punch Productions (Developer). (2009). inFamous [Computer game]. Sony Computer Entertainment.

The Curious Team. (1999). Curious about space: Can you hear sounds in space? Ask an Astronomer. Retrieved September 31, 2009, from http://curious.astro.cornell.edu/question.php?number=8

Tinwell, A., Grimshaw, M., & Williams, A. (2011). Uncanny speech. In Grimshaw, M. (Ed.), Game Sound Technology and Player Interaction: Concepts and Developments. Hershey, PA: IGI Global.

Truax, B. (1995, September). Sound in context: Acoustic communication and soundscape research, Simon Fraser University. Paper presented at the International Computer Music Conference.

Truax, B. (2001). Acoustic communication (2nd ed.). Westport: Greenwood Press.

Ubisoft Shanghai (Developer). (2008). Tom Clancy’s EndWar [Computer game]. Ubisoft.

World soundscape project. (n.d.). Retrieved September 31, 2009, from http://www.sfu.ca/~truax/wsp.html

Wrightson, K. (2000). An introduction to acoustic ecology. Soundscape: The Journal of Acoustic Ecology, 1(1), 10-13.

KEY TERMS AND DEFINITIONS
Context: Context encompasses intrinsic and extrinsic aspects that surround and influence interaction phenomena. Disregarding context can make all the difference, namely, deviation from a predicted outcome. Context has long challenged engineering and design disciplines.

Emotion: There are many possible levels to approach and therefore define emotions. In this text we adopt the cognitive neuroscience perspective, which explains emotions as body reactions that include releasing chemicals in brain and blood. Acknowledging this biological basis emphasizes how seriously the matter ought to be taken: It is definitely not something oneself can decide whether to attend to or not, once exposed to “competent” stimuli. This perspective also supports the notion that changes occurring in the body are accompanied by automatic associations, for instance, joy makes our cognition tend to speed up while sadness slows it down.

Entrainment: Entrainment refers to the synchronization of resonant systems. Breath, heartbeat, and brainwaves are examples of resonant systems for which entrainment may be explored as studied in psychoacoustics. There are two types of entrainment: internal-to-internal and external-to-internal. Internal-to-internal refers to entrainment among one person’s pulse systems, namely heart, breath, and brain. For instance, when heartbeat increases so does breath rate. External-to-internal has to do with the changing of internal rhythms through external stimulation, in our case, through sound. The latter is what allows for entrainment through design; the former augments the opportunities regarding the system at which that entrainment is targeted.

Resonance: Resonance is the phenomenon in which an object is put into sympathetic vibration by finding a concordance between its frequency and an exciting frequency. There are two types of resonance: natural (also called free), when an object vibrates as a consequence of being excited with its own natural frequency; and forced, if the object has the ability to vibrate to a variety of external frequencies. The functioning of the tympanic membrane is an example of the principle of forced resonance and, here, the limits of what can be forced establish the audible range. The human body is subject to resonance at many levels, depending on the frequencies to which it is exposed.

Sound Layers and Semantics: One way to address the complexity of the components of sound design is by classifying sound stimuli in layers according to their semantics. Classifications, as borrowed from the body of knowledge and practice in film, might include: dialog, which is the discourse; music, for setting the emotional tone; foley, which is the sound of actions; ambience, comprising the sounds of the environment; and sound effects, which are the sounds of abstract or imaginary objects.

Soundscape: Soundscape is a concept that derives from the field of acoustic ecology and refers to the sound of an environment heard as a whole. A soundscape is an ecologically balanced entity where sound mediates relationships between individuals and the environment. This holistic consideration puts emphasis on context, emotion, and interaction between the listener and the environment.

Soundscape Composition: Acoustic ecology supports the notion that a soundscape can be understood as a composition: like a musical composition. What is more, soundscapes can be composed. This inherent sense of harmony and orchestration is not mere lyricism: for instance, studies on animal vocalizations, in natural environments, evidence balance in level, spectra, and rhythm.

Willing Suspension of Disbelief: The term comes from the early 19th Century British poet Samuel Taylor Coleridge who argued that an infusion of reality into the fantastical was required for readers to accept implausible narratives. It has since been widely adapted for the study of computer games and immersive environments.
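The external-to-internal entrainment described under Entrainment above is classically attempted with binaural beats (cf. Atwater, 1997, in the references): each ear receives a slightly detuned carrier tone, and the difference between the two carrier frequencies sets the target rhythm. The following is a purely illustrative sketch; the function name, carrier frequency, and parameter choices are assumptions of this example, not taken from any system cited here.

```python
import math

def binaural_beat(base_freq=200.0, beat_freq=10.0, sample_rate=44100, seconds=0.1):
    """Generate (left, right) sample lists whose carriers differ by
    beat_freq Hz; listeners perceive a beat at that difference frequency."""
    n = int(sample_rate * seconds)
    left = [math.sin(2 * math.pi * base_freq * t / sample_rate)
            for t in range(n)]
    right = [math.sin(2 * math.pi * (base_freq + beat_freq) * t / sample_rate)
             for t in range(n)]
    return left, right

# A 10 Hz difference targets a rhythm in the alpha-band range, the kind
# of external stimulation through sound the definition above refers to.
left, right = binaural_beat(base_freq=200.0, beat_freq=10.0)
```

Whether such stimulation actually entrains heart, breath, or brainwave rhythms is an empirical question for psychoacoustics; the sketch only shows how the stimulus itself is constructed.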


Chapter 18
New Wine in New Skins:
Sketching the Future of
Game Sound Design
Daniel Hug
Zurich University of the Arts, Switzerland

ABSTRACT

With the disappearance of technological constraints and their often predetermining impact upon design, computer game sound has the opportunity to develop into many innovative and unique aesthetic directions. This article reflects upon related discourse and design practice, which seems strongly influenced by mainstream Hollywood film and by a striving for naturalism and the simulation of “reality.” It is proposed that this constitutes an unnecessary limitation to the development and maturation of game sound. Interestingly, a closer understanding of aesthetic innovations of film sound, in particular in relation to what can be termed “liberation of the soundtrack,” can indicate thus far unexploited potential for game sound. Combined with recent innovations in creative practice and technology, they serve as inspiration to propose new directions for game sound design, taking into account the inherent qualities of the interactive medium and the technological and aesthetic possibilities associated with it.

INTRODUCTION: THE CREATIVE DEAD-END IN GAME SOUND DESIGN

Growing up and maturing is usually associated with acquiring independence from one’s parents and leaving the family home, to follow one’s own, autonomous destiny. This archetypal human narrative could well be applied to game sound, which, having matured significantly over the last two decades, in many ways still seems to live with its parents, Mrs. Film Sound and Mr. Realism. This manifests itself in both game design practice and technological developments, as well as in the discourse that permeates it all. At present, like a child, game sound has a limited horizon and is oriented very much to its “parents”.

DOI: 10.4018/978-1-61692-828-5.ch018

Copyright © 2011, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.

Just a Chip Off the Old Block?

You might suggest this is not a problem and, perhaps, even quite a normal situation. In fact, once again using film as an example, movie makers have drawn their aesthetical points of reference upon theatre, photography, pure document, and so on. But the greatest advances within the medium occurred when it developed its very own, emancipated aesthetics. In their “Realtime Art Manifesto”, Auriea Harvey and Michaël Samyn suggest that, to develop a unique language for the real time 3D medium and to avoid imitation of any old medium, artists should: “Imitate life and not photography, or drawings, or comic strips or even old-school games” (Harvey & Samyn, 2006). This is probably overstating (after all, it is a manifesto!) but still raises an important point. I suggest that the computer game represents a young medium that still needs to find its own, autonomous identity, in particular concerning sound aesthetics.

I propose that an understanding of the driving force behind the maturing and emancipation of film sound can contribute to aesthetic innovation in game sound design. This sounds contradictory at first, but there are good reasons for this strategy. In principle, I argue that film sound undertook a similar path towards maturity. Of course, it would be misleading and naïve historicism to think that the aesthetic developments of film can be directly applied to games. An examination of the history of film sound predominately serves as point of reference, showing what basic strategies of innovation could be used. Underlying principles are carved out and translated to the realm of game sound by relating them to specific qualities of the medium. To this end, historical, theoretical, technical, cultural, and formal aspects of film sound aesthetics are investigated. Ultimately, the idea is to encourage a fresh approach to game sound design that, although inspired by film sound in some ways, actually detaches it from this heritage.

In this article I would like to contribute to a discussion which is slowly emerging and could be labelled: How do we ensure that the wine (sound design,1 aesthetics) we put into the new skins (the medium: computer games) is not mouldy, but fresh and fruity?

To cut a long story short: I do not have a definite answer and there probably isn’t one. But a travel through the creative history of film and game sound, and through the consideration of some of the intrinsic qualities of computer-based interactive games, suggests several creative approaches that could very well be a useful contribution. The aim of the article is thus not to provide ready-made solutions, but rather to enrich an existing discourse by cross-fertilizing with other fields. Therefore, those of you who believe that the journey sometimes is the reward, please read on.

Overview

The first section will describe the state of the art in game sound design, and shall elaborate upon the reasons for the creative limitations as they stand. I will outline the recent historical developments in creative practice and technology while also taking the underlying discourse into account. I will focus on significant contributions to innovations, inspired by both the technological advances and the innovative approaches taken by mainstream commercial and independent developers.

Following this, the second section will provide an overview of relevant developments within film sound design, from the arrival of sound film to the present day. Most importantly, it will describe the new design strategies resulting from the “liberation of the soundtrack”.

The third and final section proposes directions for game sound design which build upon the underlying concepts that motivated the innovative aesthetics in game and film sound, and partially proposes entirely new approaches based on the essential qualities of interactive computer games and the unique experience they provide. These propositions will also be discussed in the light of technological approaches that could be


used to implement them and will be illustrated through examples.

GAME SOUND DESIGN TODAY

The Dominating Paradigms: Simulation of Reality and Hollywood Film Aesthetics

A look at the topics discussed at the 2009 AES conference on Game Sound in London (Audio Engineering Society, 2009), as well as at the websites of the Interactive Audio Special Interest Group (IASig2), the Game Audio Network Guild (G.A.N.G.3), Gamasutra4 and others, shows that the hot topics in contemporary game sound are quite evident: dynamic mixing and digital signal processing (DSP), dynamic procedural sound generation techniques, and meta formats like Interactive XMF.5

In many cases, the related discourse concerns the development of a credible “recreation of reality” (Young, 2006). The technical apparatus of computer games provides everything needed for creating sounds that provide a truly coherent simulation of “reality”. With the help of sound design middleware, sounds can be “attached” to sources, placed in Cartesian space and linked to movements and scripted events. A powerful combination of software engine and sound hardware calculates and produces the correct psychoacoustic transformations for creating the illusion of location, movement and spatiality. The creative focus on the simulation of reality is manifest in the interest in providing a naturalistic presentation of complex sound environments. Simon Carlile writes in a comment to an article on the future of Game Audio:

There can be hundreds of simultaneous sound objects when we cross the road but fortunately we hear out the approaching truck pretty reliably. But to allow that capability in games, the sound objects need to be rendered in a way that the brain expects so that the information they represent can be effectively processed. Virtual Reality research demonstrates that plausibility and consistency are very important in generating the sense of presence and supporting in-world performance. There is a need then to attend to the “objective” characteristics of the sound object (particularly environment ambience). (Bridgett, 2009a)

The automatic generation of sound in real time through physical modelling seems to be a logical next step in game sound, just as physics simulations have become established through middleware like Nvidia’s PhysX.6 In this scenario, “creative” interventions are mostly limited, for example, to modifying volume falloff curves in order to fine-tune dynamic spatial mixing, but even this is driven by functional necessities, which are mostly the understandability of dialogue, creating clear distinction between “foreground” and “background”, or preventing an overload of the mix. It is symptomatic that, so far, the only winner of the award for “Most Innovative Use of Audio” (given by the Game Audio Network Guild7) that is not a music game8 follows these strategies and uses the technologies mentioned: Tom Clancy’s Ghost Recon: Advanced Warfighter 2 (Ubisoft, 2007) was awarded for its “audio controlling graphics & physics engine”. This example reinforces the observation that the ideal of a simulation system is driving innovation in game sound design.

It is arguable that the only exception from this limiting orientation towards the reproduction of “realism” seems to be found in the genres of horror and survival games as well as in games with suspense-driven settings (see, for example, Kromand, 2008). But within such contexts, “realism” is simply replaced by the aim of fulfilling established aesthetic expectations and conventions using stereotypes and clichés from filmic genres (horror, psycho-thriller and so forth). On closer examination, this reveals a fundamental dialectic inherent within contemporary computer games:


There is no “real” set where the action takes place and that could serve as point of orientation for establishing verisimilitude (Grimshaw, 2007). It has been noted that games constitute a “cinematic realism” rather than an “objective” one (Collins, 2008, p. 134), re-creating a sense of immersion and believability within a fantasy world by means already established in film.

This leads to the other dominating ideal of aesthetics in current computer game sound discourse, which is achieving a more “filmic” soundtrack. Despite essential (theoretical) differences (I will discuss these in the last section in more detail), the discourse about the state of the art and the possible future of mainstream game sound design often resorts to “Hollywood” and film sound design as a point of reference. In many cases, the production and technology of games and film are also very similar (for example, Collins, 2008; Grimshaw, 2007). On Gamasutra, Rob Bridgett (2006) argues that game sound has progressed “both towards and away from it’s [sic] antecedent of film sound” (p. 5). While the movement away from film sound is mainly concerned with a need to produce event-based, interactive sound, the movement towards film sound has strong aesthetic implications.

However, not everybody is satisfied with this situation. A cursory glance at articles within relevant online and offline game magazines reveals that, due to the arrival of the so-called “Next Generation” consoles (Microsoft XBox 360, Sony Playstation 3, and, to a certain extent because of technical limitations, the Nintendo Wii), a small but significant number of publications are dealing with aesthetic challenges and the vast unexplored potential lurking in this still relatively new medium. Inspired by Randy Thom’s (1999) article “Designing a Movie for Sound”, Bridgett (2007a) calls for designing games for sound: A game’s design should be created with sound in mind, from the very beginning, to allow the soundtrack to fulfil its potential. Some authors express their frustration with an aesthetic “dead end”. Peter Drescher (2006a) states on his blog:

Here we have these outrageously powerful desktop machines, easily producing many more channels of audio than were available to $100,000 mixing consoles just a few years ago -- and THIS is the best we can come up with!? tired, recycled 50’s gladiator movie soundtracks? the Matrix again and again? heavy metal guitar cliche after cliche ... ach! my ears!! gimme the volume control ...

In searching for innovative game sound concepts, I shall further develop upon the following two areas: technology that could drive innovation, and innovation that eventually could drive technology or just change the way we see (or hear!) things.

Technological Opportunities

Interactive Mixing and Digital Signal Processing

The sophistication of game sound has been greatly facilitated by several technological advances in the last 10-15 years, both in hardware (DSP chips on soundcards, affordable multichannel output or 3D virtualization technologies for loudspeakers and headphones, distribution on DVDs, and massive increase of memory) and in software technologies (standardized, dynamic audio APIs like OpenAL, powerful sound design middleware like FMOD or Wwise). Two important techniques that have emerged are adaptive (or interactive) mixing and DSP. In some of the more recent postings in the audio thread at Gamasutra.com, Rob Bridgett reflects on his experiences mixing titles like Scarface: The World Is Yours (Vivendi, 2006), and LittleBigPlanet (Sony, 2008). He puts forward the idea of interactive mixing as a key strategy for advancing game sound and proposes a combination of both film standards (such as grouping, auxiliary channels, automation including mixer snapshots and standardization) with game-specific mixing features. Such features include fall-off management, passive (for instance, auto-ducking)


and active, script-driven mixing techniques and a more game-specific use of snapshots. Bridgett also stresses that game sound requires specific artistic techniques. Again, these are mostly inspired by film sound (Bridgett, 2009a). He also discusses the application of these technologies and methods in titles like Scarface: The World Is Yours, Heavenly Sword (Sony, 2007), Fable II (Microsoft, 2008), LittleBigPlanet and Prototype (Activision, 2009).

Scott Morgan provides an example of how advanced mixing engines can be used to create more dynamic interactive soundscapes, based on the smart selection and mixing of 18 parallel channels of audio and the application of runtime reverb and filtering to change the overall feeling of the atmosphere when needed (Morgan, 2009).

The potential of interactive mixing also raises the question as to whether this technology could be used to manage more advanced narrative functions, which are often inspired by film conventions, to control sound in games. A report put together by the group dealing with interactive mixing at the Project Bar-B-Q Interactive Music Conference raises the following question: How can we introduce a high-level mixing aesthetic to games which would allow for the control of sounds dynamically to create narrative mixes that compare with the best musical and cinematic examples (Grigg et al., 2006)? Drescher, one of the group members, provokingly proposes a fictional tool called “THE Homunculonic AEStheticator”, an “interactive audio mixing engine with real-time Haptic Applicators, capable of producing multiple adaptive soundtracks encoded with True Human Emotions™, using T.H.E. algorithm.” Describing the fictional product, Drescher points out a few issues that should in fact be addressed by the game sound community: dramatic tension, intelligibility, focus, scale, romance, comedy, dynamic volume control, and automatic mastering for various output media. He also notes that, in principle, all the required technologies would be available to address these issues (Drescher, 2006b).

Interactive, Procedural Audio and Physical Modelling

Another “hot topic” in the game sound community concerns procedural audio (see Farnell (2011) and Mullan (2011) for an extended discussion). This technology partially overlaps with interactive mixing and DSP, but is less concerned with traditional studio metaphors like channels, faders, and effects. Procedural audio describes a broad array of systems that are able to produce a range of sound outputs for a range of inputs. This includes synthetic, generative, algorithmic, and AI-driven systems, among others. According to Andy Farnell, “it is better to describe procedural audio by what it is not. It is not pre-sequenced, pre-recorded sound and music” (Farnell, 2007, p. 12). There are not many examples of games that follow this approach thoroughly, but some titles, such as LittleBigPlanet or some games by Will Wright, in particular Sim City (3000 and later versions, Maxis 1999 - 2007), the Sims series (Electronic Arts, 2000 -) and Spore (Electronic Arts, 2008), closely adopt this principle. The design of such games was strongly driven by generative design principles and the contribution of programmers from the Demo scene, a computer art subculture that specializes in producing generative, non-interactive audio-visual presentations. For example, while there is a significant amount of audio material in Spore (over two gigabytes of compressed audio, according to Kent Jolly in Jackson, 2009), the music (which was produced in collaboration with Brian Eno) is mostly composed of short phrases and samples, which are synchronised in different ways by the audio engine. The criteria for the composition and the mixing are partially event driven and mostly generative. The result is reminiscent of minimal music by Philip Glass or Steve Reich (albeit nowhere near the quality of said composers9), providing an adequate backdrop for an open-ended game, the pace of which fundamentally depends upon how it is played. Robi Kauker states that procedural audio is the


only way to deal with the tremendous amount of content required for non-linear, open-world games like The Sims or Spore. Asked whether the decision to follow a data-driven approach to the audio of these titles was driven by aesthetic or practical considerations, Kauker answers:

It’s practical because there’s really no other way to make it interesting. We could play a big loop, and the ambience would go [hums] all the time. That’s lovely and it works for some types of games, but for our games that are user-developed, we have to vary the world constantly. With Spore and The Sims, you don’t know what the world looks like beforehand. You don’t know what’s going to be in the world beforehand. The only thing you know is that there is a world! That makes it different by the very nature. (Fleming, 2009)

Referring to general sound effect design, Kent Jolly states:

The footstep system is also complicated because you never knew what kind of foot [the player] will put on or how many feet a creature will have. The front two can be humanoid and the back can be hooves, and the hooves can be huge and the front feet tiny. (Jackson, 2009)

All of the character sounds are thus combinations of various sound components depending upon character configuration. The designers created a dynamic system of samples that could be adapted in real time by filtering and other processing techniques to create the endless combinatorics that the gameplay affords, for example to make feet sound bigger or smaller, without having to change the basic sample.

At this point the attentive reader may note that I have mentioned the word “sample” several times, and might wonder whether what I am describing really is “procedural audio”. In fact, this depends upon the point of view, which can vary slightly from author to author (a consequence of said discourse-driven approach). According to Farnell, the systems described here represent a (highly evolved) “data model” rather than a truly procedural one. A truly procedural system, according to Farnell, could do a resynthesis of recorded sound, which would provide full real-time control over all its parameters (Farnell, 2007; 2011).

Physical modelling of sound is a technology that can be understood as a specific type of procedural audio. A significant number of the systems mentioned come close to an accurate imitation of physical acoustics, but true physical modelling systems are still at an early stage of development. Audiokinetic’s SoundSeed10 is an example of a software-based technology that creates sound variations parametrically from a single source. It uses a physical modelling-inspired method to process the sound according to various models, such as impact or air. This very limited example demonstrates there is a motivation within the market to push the envelope in this direction.

Aesthetics of “Independent” Games

Now let us have a look at what is often understood as the hotbed of creativity, the independent (or “indie”) game developer community. Fuelled by accessible and widespread digital distribution systems and by engines like Unity11 that simplify development and distribution, this movement has gained significant momentum since the mid-nineties, with indie games having an increasing impact upon the mass-market industry.

If we look at “typical” indie game titles,12 we see a perspective on technology that is quite different from the one described above. High-end technology is usually not an option for small low-budget productions. It seems that this limitation supports an aesthetic oriented more towards animated movies and abstract representations, making references to the arcade age and the first console generations. This can be seen not only as an involuntary tendency caused by said limitations but also as an ideology: Independent
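The kind of real-time sample adaptation described here, one base recording endlessly varied by resampling and filtering, can be sketched roughly as follows. All function and parameter names are my own invention, not The Sims’ or SoundSeed’s actual API:

```python
# Sketch of parametric sample variation (illustrative; invented names).
# One recorded footstep is varied per foot: bigger feet are rendered
# lower/longer (naive resampling) and duller (one-pole low-pass).
def resample(samples, rate):
    """Linear-interpolation resampler; rate > 1 stretches and lowers pitch."""
    out = []
    for i in range(int(len(samples) * rate)):
        pos = i / rate
        j = int(pos)
        frac = pos - j
        a = samples[min(j, len(samples) - 1)]
        b = samples[min(j + 1, len(samples) - 1)]
        out.append(a * (1.0 - frac) + b * frac)
    return out

def lowpass(samples, damping):
    """One-pole low-pass; damping in [0, 1), higher = darker sound."""
    out, y = [], 0.0
    for x in samples:
        y = (1.0 - damping) * x + damping * y
        out.append(y)
    return out

def footstep_variant(base, foot_size):
    """foot_size in [0, 1]: 0 = tiny foot, 1 = huge hoof."""
    return lowpass(resample(base, 0.75 + 0.75 * foot_size),
                   0.2 + 0.6 * foot_size)
```

A creature with four differently sized feet would thus trigger four variants derived from the same base sample, rather than four stored recordings.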


and alternative game artists and developers often emphasize more or less explicitly that the aesthetic power and potential in games does not lie in the simulation of reality alone. In the Realtime Art Manifesto, Harvey and Samyn (2006) express it this way: “Make it feel real, not necessarily look real”.

Musical Approaches

Some titles employ musical elements without being actual music games. In Primrose, a geometrical puzzler by Jason Rohrer (2009), different colours produce different tones that are tuned to a minor chord, and dissolving rows results in an interval of a fifth. When a chain reaction of dissolving rows occurs, the fifth rises by one tone. This and similar approaches resemble design strategies used in some vintage arcade games, such as Cakewalk (Commavid, 1983), Mr. Do! (CBS Electronics, 1983), Oink! (Activision, 1983), or Dig Dug (Atari, 1983), where tonality and musical motives are strongly linked to gameplay and user input. Aquaria (Ambrosia Software, 2007) adopts a more innovative and distinctive approach to linking musicality with interaction. The player has to make her avatar sing to activate certain spells. This is achieved through drawing a sequence of connecting lines between symbols arranged in a circle. Touching a symbol with the mouse produces a tone and each of the tones belongs to a harmonic scale tuned to the game’s soundtrack. An interesting fusion of interface, avatar, interaction, and soundtrack is achieved which transgresses diegetic limits with ease.

Animation Film Aesthetics

Of course, a prototypical, “pure” aesthetics of animation film or cartoons does not exist, but still one can speak of a certain affinity of many indie games to animation film. This manifests itself in sounds that are more or less de-naturalized in a comical, playful, or surreal way, characterized by a subversive interpretation of sound-source associations. Interesting examples are titles by Amanita Design, for example Samorost 1 (2003) and Samorost 2 (2005). Blueberry Garden (Svedäng, 2009) is another example where the physical, material representation is questioned: Jumping reminds one of the sound of a wooden stick being very quickly dragged over a rough surface, flying through air distantly reminds one of the synthesized sound of a rope swirling in the air, and oversized fruit falls on the ground with a dull “thud”. Also Grey Matter13 (McMillen, Refenes, Baranowsky, 2008) is an interesting case of “cartoonish” sound design: When an abstract dot hits a flying cartoon-brain, the latter “explodes” with sounds of breaking glass. The relationship of sound and animation is motivated strongly by how the explosion is designed: The cartoon-like objects explode into spiky particles, scattering like glass. In these examples, the sound narrative is transformed into a trans-natural entity, whereby the sound design shifts from the “real” to the metaphoric, the iconic and symbolic, without losing the roots of “dirty matter”. Yet, an impact is still an impact and some of its “visceral” characteristics are always maintained. These sonic aesthetics are common in animation movies (Curtis, 1992; Beauchamp, 2005) and have only become possible through the “liberation” of sound from its source, which I will describe in more detail later.

Abstract is Beautiful

Another aesthetic category in experimental games relies upon the total detachment of sound from any actual physical or even metaphorical source. Some of the examples here follow traditional design techniques (emitters, zones, event-based triggering of samples and so forth) but use them in interesting ways to create unique sonic aesthetics. In Brainpipe (Digital Eel, 2008) spatial navigation generates an abstract soundtrack that blurs the borders between music, voice, and sound effects and challenges diegetic borders by pitching down all sounds when the player decelerates. The real time strategy game Darwinia (Ambrosia Software, 2007) features insectoid, geometric life-forms, which produce synthetic sounds reminiscent of actual insect sounds. These are combined with all kinds of energetic sounds, hums and wobbles, which are attached to static or moving game entities, generating an entirely emitter-based soundscape. Dyson (Kremers and May, 2009) is an interesting hybrid between abstract and concrete sounds. Planets, represented by simple circles, have to be conquered by planting seeds. The sounds of seeds rooting in a planet are a combination of a soft rustle with a faint melodic tone emanating from the planet. When seeds start to battle, the soundtrack becomes reminiscent of a swordfight but with a rough, lo-fidelity texture. The different planets also emit different sounds: The sounds emanating from the conquered planets are ambiguous and are reminiscent of the sounds made by machinery in a laboratory, combined with faint beeps that oscillate between machine signals and crickets. Planting new seedlings emits a glass-like, percussive sound which has no connection to the visual representation (nor does it function as a metaphor) but, rather, defines an experiential quality of the interaction with the game.

Interactive Ambience

Interactive ambiences, as Bridgett points out, are still an undervalued design opportunity (Bridgett, 2007b). A related aesthetic concept has been labelled “antimusic” by Ed Lima, describing his approach of using very little musical scoring in Doom 3 (Activision, 2004) and of using carefully crafted interactive ambiences instead (Lima, 2005). This example shows the potential of using just simple, two-dimensional (foreground-background), static ambience design paradigms. Interactive ambiences, or ambiences that are crafted to support dramatic effects, already play a role in some mass-market titles. An early example is Thief: The Dark Project (Eidos, 1998). Some other notable examples are Half-Life (Valve, 1998 -), Splinter Cell (Ubisoft, 2002 -) and Prey (2K Games/3D Realms, 2005)–not to mention games from the survival horror genre, such as the Silent Hill series (Konami, 1999-2009).

Harvey and Samyn from “Tale of Tales”14 are an independent design team who use sound extensively to develop an interactive ambience. The Path (Tale of Tales, 2009) is an interesting example, as it implements a procedural, open gameworld complemented by a distinctive sonic aesthetic that largely builds on interactive ambience. There are no sounds for the direct interactions with objects in the gameworld as such but, instead, the ambience responds to the user’s movements and actions. For example, when the avatar runs, the camera slowly moves away, the screen blurs slightly, and sharp and more aggressive string tones “intrude” upon the melancholic string and piano soundtrack. Moreover, a pumping, dull sound similar to a heartbeat is played, masking the surrounding sounds. In this way, sound fosters a dual role in acting as the traditional soundtrack and as a component of narration, or rather a comment on the player’s action, ultimately making him feel responsible for the change of ambience that has occurred. It enunciates the action of sustained running as unpleasant, inappropriate, and potentially dangerous.

Summary: Innovation with the Handbrake On

The “parents” indeed have left a deep mark in the mind of their bastard progeny: Film sound often seems to be a lodestar for the game sound community. Like children looking for safety, many productions are oriented towards filmic, stereotypical, “best practices”–after all, if it worked in film, why should it not work for games? Certainly, technological advances have left their mark. Some faint voices call for an innovative exploitation of interactive, procedural technology. But still, the “parental” paradigms of using the technology either to replicate film aesthetics or to “simulate
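A crude sketch of how such movement-driven ambience mixing might be structured (layer names and fade curves are invented here for illustration; Tale of Tales’ actual implementation is not public in this form): layer gains follow a continuous running-time parameter rather than one-shot event triggers.

```python
# Illustrative ambience-layer mixer (invented, not The Path's actual code).
def ambience_gains(run_time, fade=3.0):
    """Map seconds of sustained running to per-layer gains in [0, 1]."""
    t = min(max(run_time, 0.0) / fade, 1.0)  # normalised fade position
    return {
        "calm_strings": 1.0 - t,        # melancholic base layer recedes
        "aggressive_strings": t,        # harsher strings 'intrude'
        "heartbeat": t * t,             # heartbeat swells in more steeply
    }
```

Each frame, the game would apply the current gains to the corresponding looping stems; stopping simply runs the parameter back down, so the mix state is continuous rather than triggered, which is what lets the ambience read as a comment on the player’s action.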


reality” prevail. As expected, the industry “dissidents” of the family, the independent developers, provide interesting aesthetic approaches to sound design and usually make an effort to find their own style. However, true innovation in sound is relatively rare here too: Most indie games do not move beyond the aesthetic level that animation film has already reached, and the occasional use of procedural technologies has, as yet, not been applied to sound at all.

This is not necessarily a big problem. After all, there are many great games with wonderful sound out there. However, from the point of view of a longer-term advance of computer games as a medium, and game sound in particular, the current situation represents a dead-end, preventing the development of a unique aesthetic identity. It is time for game sound to come of age!

FILM SOUND: FROM WALLFLOWER TO EMANCIPATION

As stated in the introduction, my proposition is that film sound can teach us a few interesting lessons about how to find aesthetic independence. I will elaborate upon what those lessons could be within this section. I will firstly focus upon the developments that had the biggest impact on the aesthetic history of film sound and which are potentially of interest to computer game sound.

Pioneering Approaches

Ever since the introduction of “talkies,”15 film makers began to reflect upon the use of sound in film. The seeds for an experimental aesthetics in sound design were already planted during this pioneering era of film sound when the medium was not yet entirely defined and conventionalized. In their famous Statement, Eisenstein, Pudovkin and Alexandrov (1928) demanded that sound needed to be used contrapuntally in relation to the visual montage in order to avoid the destruction of the montage. Pudovkin (1929) further elaborated this point, arguing that image and sound are united by the resulting interplay of meanings: The redundant use of sound is to be avoided and image and sound have to be developed along separate rhythmic paths by using counterpoint as an essential compositional device. Around the same time, in a different cultural context, French director René Clair expressed his concern at the stereotyped patterns that early sound films exhibited even during this early experimental stage. He saw more potential in the interpretation instead of the imitation of noises, and argued for an alternate use of the visual subject and the sounds produced by it (Clair, 1929).

Over the following decades, sound became firmly established in film. As a result, the aesthetic approach to sound was extended, and also revised, in some aspects. While sound was supposed to serve the image (according to earlier writers), directors such as Bresson emphasized the reciprocity of sound and image. Sound should replace image, not complement it, and it can also dominate the image (Bresson, 1985). This concept has been developed further, in particular by Michel Chion (1994). Metz considers that attention towards sound should move beyond a purely phenomenological understanding and towards sound as a socially constructed entity. We experience sound through a body of knowledge and thus its design (and study) takes place within larger cultural and ideological structures (Metz, 1980).

New Hollywood and Its Relatives

Let us now take a look at the biggest aesthetic revolution in sound cinema in terms of historic and economic dimensions. Mainstream Hollywood in the 1950s was growing rapidly and productions became increasingly monumental, relying upon rigorous division of labour and tight production and marketing plans. This led to the constant repetition of conventional, formulaic, “designing for the masses” approaches to film. This, in turn,


suffocated a lot of creativity, including sonic creativity. Studio-specific sound libraries were used over and over again, which was due partially to convenience, and partially to assist larger studios to create their signature sounds (Flückiger, 2001; Whittington, 2007). In some ways, the situation of “classic” Hollywood is comparable with the mainstream game industry today.

In the late 1960s and early 1970s the tide began to turn. Directors such as George Lucas, Steven Spielberg, and Francis Ford Coppola began to treat sound in an entirely different way. The inspiration for these non-conformist “movie brats” came from avant-garde movements, such as the French Nouvelle Vague: In the late fifties and early sixties, significant changes within the French movie industry provided the creative minds of François Truffaut, Alain Resnais, Chris Marker, and Jean-Luc Godard, amongst others, with the freedom to break from convention and to explore new directions. The Nouvelle Vague was characterized by a critical approach to society but also to cinema itself, emphasizing the role of the author. This was an important inspiration for a new generation of Hollywood film makers, the New Hollywood. I will come back to this later. Firstly, let us consider the factors that contributed to the liberation of the soundtrack.

The Sound of Music

Sonically, avant-garde movements had an important impact, in particular Futurism and Musique Concrète (see, for example, Walter Murch in LoBrutto, 1994, p. 84). This led to highly innovative sound design practices. Sound designers approached sound “in itself”, interweaving the dominant causalistic and naturalistic sound ideology with the “objet sonore”, which is only attainable through “reduced listening”, where the real or supposed source of a sound and the meaning it may convey is ignored. The aesthetic achievements of Musique Concrète inspired the use of what could be termed “musical” design strategies, and musicality emerged as a principle to create and arrange every noise in a sound track (Flückiger, 2001). Commenting on his sound work for THX 1138 (Lucas, 1971) Murch states that: “It is possible to just listen to the sound track of THX exclusive of the dialogue. The sound effects in the background have their own musical organization” (cited in Whittington, 2007, p. 57).

A further consequence of this movement was the combination of synthesized sound with recorded sound, which opened up new narrative and aesthetic spaces. For example, the screams of the birds in Hitchcock’s eponymous movie (1963) were created by Bernard Herrmann, Remi Gassman, and Oskar Sala on an early electronic instrument called the Trautonium.16 Another striking and highly evolved example can be found in Apocalypse Now (Coppola, 1979) where naturalistic recordings of helicopter sounds are combined, juxtaposed, and fused with wobbling sounds from a synthesizer.

Sonic Gene Technology

As can be seen from these examples, technology often played an important role in the development of aesthetic innovations, even though it was sometimes used in an unorthodox way. A further example relates to the impact of multichannel sound in driving aesthetic innovation: Reporting about the technological and aesthetic challenges posed by the quadraphonic system, Murch states that: “Parenthetically, that’s actually where the concept of sound design came from. I felt that since nobody had ever done this before, I had to design it and figure out how to use this new tool that we’d come up with.” (LoBrutto, 1994, p. 91). Added to this was an unprecedented increase in flexibility of the recording process due to portable technology, in particular the Nagra, invented in 1959 by Stefan Kudelski. This high-fidelity portable recording technology made it possible to work in the field more frequently, and also to record sounds that previously were hard to record,


further liberating the sound recording process. For the first time the messy, wild, aleatoric, banal, everyday, even “abnormal” sounds (broken machines, leftovers and trash found in basements and attics) were embraced, becoming part of the material the “movie brats” worked with. This led to a reconsideration of the constructed nature of any soundtrack and the meticulous de- and re-construction of the complex fictional sonic event. Sounds were now combined in layers, in complex sonic alloys, combining several qualities into a single sonic “transobject”. A prominent example is the sonic re-engineering of the “used future” in THX 1138 and in Star Wars (Lucas, 1977) with its laser swords, haggard, stuttering spaceship jet engines and worn out androids (Burtt in LoBrutto, 1994).

As a consequence of new ideologies and technologies, the limitation imposed when sound is treated as an index of a single, recognizable source was overcome. All levels of association of sound and its source, be it on-screen, off-screen, or even in a reference to a cultural framework beyond the film itself, became possible. Reduced listening and abstract qualities in sounds, such as structural instability, change in energy and power, organic or synthetic notions and so on, became important, particularly within the science fiction genre (Whittington, 2007). A common design strategy relates to the ability to “play” with the familiarity of a sound, using de-familiarization as a narrative device. In addition, unidentifiable or ambiguous sounds can create interpretive spaces and activate the viewer. They can also frustrate expectations and even create fear by proposing an unknown source or phenomenon, the alien and incomprehensible. Postmodern, anthropomorphic, ambiguous, hybrid machines could be created: The robot R2D2 from Star Wars “speaks” with synthetic beeps which remind the viewer of baby talk (Burtt in LoBrutto, 1994). Despite this ambiguous, uncanny sonic identity, the form is accepted, and even loved, by the listeners.

Is It a Bird? Is It a Plane? No, It’s Ambiguity, Man!

The Russian director Andrei Tarkovsky in particular explored ambiguity as a catalyst for engaging experience and depth. His use of ambiguity in films like Solaris (1972), Stalker (1979) or Sacrifice (1986) creates a sonic environment in which the audience struggles to make sense of a sound heard, creating meaning through establishing coherence between the heterogeneous elements of the audio-visual narrative. This diegetic playfulness leaves the spectator struggling with her beliefs. Truppin (1992) describes specific design strategies used by Tarkovsky: The revelation or negation of an (unexpected) source of a sound, the subversion of the coherence between sonic and visual space, or the use of sounds on parallel levels in order to enunciate qualities of both the material and the psychological or spiritual. On the other hand, Truppin notes that the use of clearly identifiable, specific and naturalistic sounds in a surreal setting might unhinge established conceptions of the real or provide signs of safety in an otherwise confusing narrative world.

Sonic Perspective Breaks Loose

The liberation of sound from source and the increased flexibility in the recording and production process also led to the deconstruction of the perspective relationship between sound and image. So far, practices to create sonic perspective were motivated either by the need to understand dialogue or the maintenance of a more or less naturalistic sonic perspective (Wurtzler, 1992). Microphone placement possibilities provided additional restrictions. The critical authors of the Nouvelle Vague fundamentally questioned these conventions. In Godard’s films, for instance, sounds would appear relatively loud, negating the division between foreground and background, and would even refuse to disappear when “more important” information appeared. What is more,


protagonists would seem strangely unresponsive to them. Godard would even avoid covering the edits of the sounds (Williams, 1985). This leads to a sonic aesthetic that is diametrically opposed to the transparency of the mix aimed for in most Hollywood films, old and new. Thanks to these unorthodox approaches, sonic aesthetics like these are no longer taboo and are used in several innovative sonic designs, such as subjectivization.

I Feel Good: Subjectivization

The possibility of enunciating subjective experience is an important aesthetic possibility emerging from the liberation of sound from its source and the techniques of montage described above. Sounds were now used in various ways to mark, or even simulate, subjective experience. Flückiger (2001) identifies several sound design strategies which are very common and unquestioned nowadays: dissociation of sound and image, disappearing sounds, non-naturalistic reverberation, montage of unidentifiable sounds over slow-motion images, enlargement relative to the image, body sounds like breathing and heartbeats, and overemphasized, anti-naturalistic selection. For instance, in The Terminator (Cameron, 1984), and even more so in Terminator 2: Judgement Day (Cameron, 1991), the sounds of the Terminator’s leather clothes and his interactions with his sunglasses are moved towards the foreground. This creates an uneasy intimacy with the deadly man-machine. Another powerful (and aesthetically very different) example of subjectivization can be found in Pi (Aronofsky, 1998): The protagonist suffers from violent headaches. These are both marked as the protagonist’s subjective experience of pain, through unidentifiable sounds, heartbeat-like music or metaphorical sounds of grinding stones, and simulated through high-pitched screeches. The simulation effect is enhanced through the action-driven ducking of the painful sounds when the protagonist switches the lights off, temporarily relieving the pain for both the character and the audience. Dream sequences or representations of hallucination represent extreme cases of subjectivization. For a direct and radical confrontation, I recommend David Lynch’s Eraserhead (1977) or a look at the dreams of special agent Cooper in Twin Peaks (Lynch, 1990-1991), in particular the red room at the end of the second episode of season one. Other striking examples are the explicit audio-visual placement of the viewer into one of the protagonists in first-person view, for example in Predator (McTiernan, 1987).

Take Me Higher: High-Level Semantics

Last but not least, the liberation of sound from a strictly indexical function facilitated the emergence of complex higher-level semantics, where primary semantics (related to the questions: What creates the sound? What is it made of? How does it move? Where is it?) became constituents of higher-level meanings (Flückiger, 2001). More than before, sounds could now have symbolic and metaphoric functions standing for cultural, religious, or psychological entities (think of bells, keys, animal sounds and so on). Sounds could be established as “keysounds” within the narrative context of a specific film, for example, the sound of the scanner on the bridge in the original Star Trek series (produced by Roddenberry, 1966-1969) or the sounds of helicopters in Apocalypse Now. From here, new stereotypes and meta-signs could be established where artificial, non-referential sounds achieve a new indexicality through systematic re-use within certain genres or filmic styles. This connects to Altman’s (1992) proposition of understanding cinema as “event”. In this view, cinema is no longer an autonomous aesthetic entity, but a complex socio-cultural artefact which emerges in the interaction of complex production and reception processes (see also Metz, 1980). This encourages us to think about the many complex influences that make all aspects of cinema, not solely sound and image, meaningful. Just consider all the audio commentaries, “Making Of” documentaries and Internet fan sites, educating filmgoers within the production process, which in turn influences their experience and understanding of films (Whittington, 2007). Only through such processes can the start-up sound of an Apple computer become a meaningful sonic event in the Sci-Fi animation movie Wall-E (Stanton, 2008). Through the process of cultural reception, digestion, and reproduction, many of these experimental designs have now entered our collective memories and have become signs that are easy to understand or even clichés. Additionally, there is a culture of constant cross-referencing, citation, and remixing. This began with a strong emphasis on genre in New Hollywood, which soon turned into de- and reconstruction and recombination of genre into pastiche like Star Wars (elements of Western, Swashbucklers, Sci-Fi, Cartoon17) or hybrids like Alien (Scott, 1979) and Predator (Sci-Fi blended with Horror). This post-modern aesthetic became a significant driving force for film sound (Whittington, 2007) and is commonplace now, with films like The Matrix (Wachowski brothers, 1999) combining characteristic sonic signatures of Sci-Fi, Horror, Film Noir, and Martial Arts.

Brats at Work

Economic, structural and technological changes alone would not be enough to drive a significant aesthetic revolution. Fundamental to innovation is the challenging of convention, developing an attitude that rules are there to be broken. It is probably no coincidence that this spirit flourished mostly in communities that were driven by non-conformist ideals, such as the Russian film schools of the 1950s, the French Nouvelle Vague and the “movie brats” of New Hollywood. This mindset was linked to production systems resembling the “Cinema Copain” ideal of the Nouvelle Vague. The biographies of many innovative directors share this similarity: After acquiring financial independence, for instance through placing some box office hits, they continued their work in relatively small, independent teams, considering themselves colleagues that could be trusted. Freedom in the creative process was the result, and only in this way was it possible that (for example) Ben Burtt could spend one whole year designing the sound effects of Star Wars through trial and error.

Summary: Seeds of Change

As our little voyage through the history of film sound has shown, the long process of “emancipation” of sound, from considerations of audio-visual montage through to its liberation from a naïve indexical straitjacket, has provoked a fundamental shift in aesthetic paradigms and resulted in a series of aesthetic innovations. Asynchrony, counterpoint and complex reciprocal image-sound relations enriched the vocabulary of the time-based medium, leading to a reciprocity of the influence between image and sound and a malleable dominance of the one over the other. The emancipation of sound made it possible to develop a rich semantic vocabulary, relying on symbols, key-sounds and so forth, establishing genre-specific stereotypes. The musicalization of the sound track has broken the barrier between musical and non-musical, abstract and concrete, material and synthetic, referential and non-referential sound, naturalizing the fictional and fictionalizing the natural. The embracing of ambiguity led to a rich practice of sonic enunciation of subjectivity and showed that the struggle for understanding can be an enjoyable experience. Finally, an understanding of sound-related discourse, emerging from the socio-cultural process of production and consumption, has opened up and encouraged post-modern playfulness with style, form, and meaning.

A few driving forces emerge as preconditions for such a development. Firstly, there is the confrontation with the artistic avant-garde of Futurism and Musique Concrète and the rebel attitude of “Rock’n’Roll”, which leads to a critical approach and a playful “joy of subversion” in the


creative process. Additionally, it is important to break from the chains of the big studios and to work in relatively small teams with a less strict division of work. Last, but not least, technology plays an important role as a catalyst for aesthetic innovation, in particular through creative “abuse” or exploration of less common applications.

REDRAWING PATTERNS OF CHANGE: PERSPECTIVES FOR GAME SOUND DESIGN

In this section I will revisit some of the innovations in film sound described above18 and propose “conceptual derivatives” for computer game sound. As previously mentioned, it is clear that there is neither a repetition of the aesthetic history nor can patterns or guidelines derived from film sound phenomena be applied directly to game sound.19 There are certainly commonalities, but there are also essential differences between game and film sound. I will not elaborate here upon these differences; the reader may refer to the discussions in Deutsch (2001), Jørgensen (2007), Grimshaw (2007), Collins (2008), and, in particular, Grimshaw (2008), who offers an in-depth discussion of several core elements of film sound theory and their applicability to game sound.20 Nevertheless, as I have made clear in the introduction to this chapter, historical developments in film sound aesthetics can serve as models of transformation and as points of reference for my proposition of new creative and aesthetic directions for game sound, thus providing an inspirational source and motivation for exploring new directions. The creative leaps in film sound design are interesting also because, despite originating in experimental artistic approaches, they were not only relevant to a small underground scene, but have finally made their way to the mass market and the mainstream.

Defining the Essence of Computer Games

First of all, a common conceptual ground for the understanding of the essential qualities of computer games, on which any further discussion can be based, shall be established. Note that this is by no means meant to be a general definition of computer games. Also, it is not a comprehensive review of the essential literature on the topic. Rather, it forms a working definition, limited to the scope of this chapter and serving to elucidate its thesis.

Computer games, as understood here, are computational systems that essentially are procedural, functional and interactive. These autopoietic (Grimshaw, 2007) systems gestate worlds emerging only through the player’s agency and interaction. In this sense, computer games are both narrative spaces and tools for action. According to Neitzel, games have a transitional, hybrid character, oscillating between the closed, symbolic spaces of representative media that function through observation and immersive systems of agency in a virtual world. Depending on genre and type of game, the player is situated, or constantly shifts, between experience and action. Computer games thus resolve the subject as being the centre and origin of diegesis, as it is presented in film (Neitzel, 2000). Using Neitzel’s term, this “playful schizophrenia” is pushed even further through multiplayer games and online worlds, where the computer only provides the setting and rules–a kind of procedural experience system, but one where the interaction mainly happens between humans represented by avatars. Narrative may well emerge, but is not a constituent of the game system any more. A similar impact results from the diffusion of games into everyday life through pervasive gaming applications. Here, life is a game, literally.
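To make this working definition concrete, the procedural, interactive character described above can be reduced to a schematic loop in which the world state exists only as the product of rules applied to the player’s accumulated actions. The sketch below is purely illustrative: the state fields, action names, and rules are invented for the example and are not taken from any actual game or engine.

```python
def apply_rules(state, action):
    """The 'setting and rules': a pure function from (state, action) to a new state."""
    new_state = dict(state)
    if action == "move":
        new_state["position"] = state["position"] + 1
        new_state["events"] = state["events"] + ["moved"]
    elif action == "wait":
        new_state["events"] = state["events"] + ["waited"]
    return new_state

def play(actions):
    """The gameworld 'gestates' only through the player's agency: without
    actions, nothing beyond the initial conditions ever comes into being."""
    state = {"position": 0, "events": []}
    for action in actions:
        state = apply_rules(state, action)
    return state

# A retrospective 'narrative' emerges from the trace of interactions:
final = play(["move", "move", "wait", "move"])
print(final["position"])   # 3
print(final["events"])     # the story, readable only after the fact
```

The point of the sketch is the asymmetry it makes visible: the rules are fixed in advance, but the world and its narrative trace exist only as a function of what the player actually did.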
In the immersive, performative experience of playing, the boundary of engagement through the “interface”, which involves physical controllers


as well as virtual representations of artefacts, or even the avatar’s limbs, is dissolved. This has interesting implications for how sound is associated with player agency as it can relate to the physical interaction as well as the action in the gameworld or even to actions of the game system, as will be elaborated below.

Thus, the essential quality of a computer game is constituted by the action afforded by the game apparatus and performed by the player, where the interactive system facilitates the emergence of certain experiences that may have a narrative quality, at least in retrospect. I will follow this understanding of games exclusively here, which also means that I will not discuss aspects related to narrative such as diegesis, unless they offer a possibility for aesthetic experimentation.

In the following section, I will demonstrate how the inspiration taken from film sound aesthetics and the qualities of computer games described here can be turned into directions for innovative sound design motivated by the inherent qualities of computer games as defined above.

Field of Action 1: Media Aesthetics and Semantics

Sound beyond Simulation and Naturalism

Considering the prominence of the discourse about simulation and realism identified in the review of the state of the art in computer games, this issue shall be addressed first.

I have previously criticized a general sonic naturalism and reductionism (Hug, 2008b), mainly by building upon Chion’s observation that there is no sound of a thing as a one-to-one relationship and, if an unambiguous indexicality is needed, we usually rely on idealized instances of sonic occurrences. Additionally, Chion points out that many sounds suggest abstract qualities of material and process, which he labels “indices sonores materialisants”, rather than being specific indexes of a process occurring with an object (Chion, 1998, p. 102).

The concept of realism is particularly questionable in the entirely constructed virtual worlds of computer games, where any relation to a “given” reality (as in film) is entirely voluntary. Films already require constant leaps of faith in terms of identifying a sound’s source. It is through psychoacoustics and our imagination and willingness to accept and normalize inconsistencies that the sound from a speaker somewhere behind a screen can become the sound of a thing on screen (see Chion’s description of “magnetization” and “synchresis”, Chion, 1994). But, while a film will always produce a rupture between the world it has recorded and its representation, 3D computer games do not produce this rupture, as the mere possibility of spatial sound and physical modelling naturalizes and justifies every sound produced. Computer games constitute an apparatus where the process that generates its entities and the manifestations of these processes form a closed system, where sounds are calculated according to an ideal physical model and are “naturally” emitted from objects in three-dimensional space. Of course, such a simulative system of recreating reality has its appeal but, if this approach dominates creation processes, the fundamental quality of the technology is missed and its potential to create new, surprising aesthetics is overlooked. It is not a natural fact that any generative system relying on physical modelling is predestined for recreating reality. Let us recall some of the discussions about Virtual Reality when it still was a relatively young medium:

Whereas hyperreality still implies some connection, regardless of how faint, to the ethos of verisimilitude, sound has no such loyalty; after all, where there is no ‘thing’ to represent, there can be no ‘misrepresentation.’ Similarly, VR, as a space of computer-generated simulation, renders irrelevant questions of verisimilitude, realism, and authenticity. Unlike the camera, the simula-


tion makes no claims to reproduce reality, and in that sense it cannot be wrong, it can only be bad. (Dyson, 1996, p. 84)

Designing for Autopoietic Content

Designing for autopoietic content means that the content of a game is designed as a system of potentialities, framed and specified by superordinate qualities of a desired game experience, rather than consisting of a finite sum of fixed assets. This is potentiality, embodied within formulations of the methods and procedures of its generation and the models of their mutual interaction. This is not a new design strategy; it is used, for example, in procedural art, and also for experimental music and live electronics. While artistic design strategies can and should be used for creating unique sonic experiences in games, they are of limited relevance as the related “application domains” are free from a-priori functional demands and do not necessarily rely on interactivity and play. Closer to the sought characteristics, even if originating from an entirely different field, are certain sonification methods, in particular Model Based Sonification (Hermann & Ritter, 1999). Here, a sound generating system is set up in such a way that the totality of all components that generate a sound are driven by the dataset that is fed into the system. The system is then treated as a form of “virtual emitter” that can be excited through a user’s interaction, for example, by virtually touching it with an interface. This could very well be an inspiration for an experimental approach to designing the sounds for a game: Following the analogy, all entities of the game could be specific parametrical setups of a general sonic model, which could be derived from an overall aesthetic concept, and the sounds would be generated in real time through the agency of players, non-player characters (NPCs), and the world-system of the game.

In terms of designing sonic objects it would also be worthwhile to investigate ways to use generative technologies, such as physical modelling, in a creative, unorthodox way. The play with qualia, with sonic “traces” of materiality or physical processes, of the human or the animal, the organic or the inorganic, the material and the structural, could be achieved in real time following similar semantic approaches as in film, as outlined above. The system could also be devised as a kind of “real time sound designer”, assembling sonic components into complex sonic amalgams as micro-narratives (Back, 1996) on the fly. This would result in a subversion of the physical modelling paradigm into “fictional physical modelling”, linking it to dynamic interactive processes rather than a “trigger” paradigm. Why not take Farnell’s (2011) proposal of a “behavioural sonic object” further by subjecting the control of the sound generation algorithm to any imaginable narrative or performative expression? In the game Love (Steenberg, upcoming) the available processing power is used for a very distinct and aesthetically innovative graphical post-process in real time. Sound could be approached in the same way: for example, to use the processing power to create new sounds “on the fly”, instead of simulating reality. This would be a sound design which works with potentiality, as demanded, rather than crafting the actual sample itself.

Designing for Autopoietic Second Order Semantics

As in film, it is paramount to understand how the meaning of sound emerges from socio-cultural processes of production and reception. Sonic meta-signs like symbols and key sounds, the practice of citation and remix, and the play with codes are some examples of those complex sonic signs. In a computational, dynamic, modular system, such complex systems of meaning creation can be assembled on the fly, in real time, driven by interaction. The current paradigm of using more


or less elaborate static samples with event-based triggers can result in interesting and even outstanding gaming experiences, as we have seen in the examples given in the first part of this chapter, but this approach will never be able to exhaust the potential of the interactive real time medium that games are. Harvey and Samyn (2006) state in the Realtime Art Manifesto: “The situation is the story. Choose your characters and environment carefully so that the situation immediately triggers narrative associations in the mind of the user.” This also means that sound should be designed in a way that supports a situational emergence of narrative, which of course requires us to rethink the whole sound design process and the unorthodox use of the procedural technology we have at our disposal. An approach could be based on a script language that allows us to denote conditions for certain narrative and compositional reconfigurations of a procedural audio engine. The sound designer’s job in this case would be to define the changes in the parameters and the mappings. The engine registers the patterns in the player’s actions: This acting could be framed by simple, established psychological categories, for example basic emotional states or levels of intentionality. Does a player walk straight to a target or does he explore the surrounding world? Is he low on health, weak, hectic or calm? Where does he look and for how long? Does he first aim, then shoot? Does he collect health potions and use them only when necessary? Is he hitting the target with a last, desperate blow? Does the player often look at the map? Does he constantly rearrange the inventory? Does he switch weapons aimlessly or in a very targeted manner? Does he miss a lot of the hidden pickup objects or secret doors? All this implies certain experiential qualities that can be taken as control elements of the interactive experience. This way, the experiential quality that in film is narrated audio-visually becomes the actual experiential quality of the player in the gameworld.

In some ways, this strategy is comparable to the approach that is used in adaptive music. The essential difference is that it is basically a system to allow for the creation of complex modification patterns in all aspects of the sound design, from the sonic object to its arrangement in an interactive time-space, depending upon the player’s behaviour in the gameworld. This requires further research into how people play games and the strategies they develop (the field of affective computing is one that researches these questions; see, for example, Picard, 1997).

Agency-Driven Sonic Montage

From the standpoint of the aesthetic history of film, montage is probably one of the most important aspects of audio-visual design. As discussed above, temporal concepts such as asynchronicity or counterpoint cannot be transferred directly into games, but an agency-driven understanding could be followed. Like the active, script-driven mixing proposed by Bridgett (2009a), which I mentioned earlier, one could envisage an “active montage”: By motivating the player to do certain things, for instance, to visit the inventory repeatedly or to switch perspective from close-up to total, an agency-driven sonic montage may be achieved. Game mechanics and level design are the fundamental components of design here.

Let us also consider cinematic off-screen sound for a moment, an important element in the audio-visual montage. At first glance it seems that, in a three-dimensional game, which gives the user a fair amount of control over the camera (whether independently of the point of view of her avatar or not), an “off-screen” mode does not exist as a possibility. But by examining this possibility more closely, you will notice that specifically staged “off-screen” sound events actually do exist in games. Common examples are invisible doors, machinery and so on, that are activated by switches and the like. Another example is the spawning of players, NPCs, or objects. Spawning and remote control switches (and their derivatives) are constituent elements of many games. The


sonic design of these off-screen events is usually very stereotypical, for example through some kind of synthesized energy sound or the sound of a mechanism being activated. From the study of film sound, we quickly realize that this is not necessarily the end of it. To begin with, the engine could be aware of the direction an avatar looks at, and control “off-screen” sounds depending upon this direction. From there, a myriad of design possibilities open up that wait to be explored.

Sonic Effects: A Helpful Paradigm

A useful paradigm that can support the design of dynamic interactive sonic environments is that of “sonic effects”, proposed by Jean-François Augoyard and Henry Torgue (2005). “Sonic effects” emerge from the interaction of sonic events with their spatial and social environment and they always also have a perceptual and psychological dimension. This approach, originating in urban studies, provides a useful link between Pierre Schaeffer’s “objet sonore” and Murray Schafer’s “soundscape”. Sonic effects relate to sounds as an instrumentarium to give shape to human relations and the everyday management of urban space, thus stressing the performative aspect of sound. At the same time, the approach roots sounds in specific situations and places. In addition to the acoustic analysis of a sound, or its relation to other sounds and space, it also considers its psychological dimensions and the socio-cultural discourse surrounding it. It seems to me that understanding how sound can be a constituent of experience should be fundamental in thinking about procedural semantics. In their book, Augoyard and Torgue (2005) describe a range of sonic effects and this knowledge could be implemented in game engines as well.

Predictability Killed the Game Star

Sound design for games must also embrace limits of control and pre-determination. Harvey and Samyn (2006) state that: “Interactivity is the one unique element of the realtime medium and it wants to be free”. Thus, it is futile to attempt to control all aspects of gameplay!

Multisensory congruency, consistency, obviousness: they are all useful and often necessary. However, there is no absolute rule stating that artificially created virtual experiences have to follow these ideals. As for sound, trivial everyday observations already show that the multitude of sensory phenomena occurring at any moment does not necessarily have to overlap: I can be looking at a picture of my last holiday while I hear the cars on the street (Ihde, 1976). These components are not semantically related by some kind of embracing narrative, but they can potentially emerge into a “narrative” in my memory, which will always be unified into a coherent whole: The same is true for computer games if they are understood and designed as autopoietic systems that have very little or no predetermining story line. Soundscape studies have revealed how soundscapes mediate relationships between listener and environments (Truax, 2001) and these soundscapes are not simply “atmos” or “backgrounds”: They are constituted by sonic manifestations of individual agency, human or animal, culture and nature. By situating games as acoustic ecologies, Grimshaw (2007) developed such an understanding of games and it seems worthwhile to elaborate on it, both theoretically and in experimental practice. It is the fundamental listening experience of the acoustic ecology as emergent, non-conventional and potentially surprising that may encourage us to follow the tracks of Eisenstein, Clair, Tarkovsky, Lynch, and so many others who introduced the poetic, unforeseeable and even indescribable into their works. The game The Path, which was mentioned earlier, is an example of an almost Tarkovskyan audio-visual aesthetic. And the spatially extended, delocalized sounds of Thief: The Dark Project are akin to the changing room tones we can hear in Eraserhead in the role they play in our experience of the gameworld.
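The idea of a soundscape constituted by sonic manifestations of agency, rather than by an authored background “atmo”, can be made concrete with a small sketch. This is only an illustration: the class names, the linear attenuation rule, and the hearing radius are my own assumptions for the example, not a description of any existing engine.

```python
import math

class Agent:
    """A world entity whose current activity, not a background loop, produces sound."""
    def __init__(self, name, position, activity_sound):
        self.name = name
        self.position = position              # (x, y) in world units
        self.activity_sound = activity_sound  # sound of what the agent is doing
        self.active = True

def audible_mix(agents, listener_pos, hearing_radius=30.0):
    """Return the emergent soundscape at this moment: each active agent
    contributes its activity sound, weighted by proximity to the listener."""
    mix = []
    for a in agents:
        if not a.active:
            continue
        dist = math.hypot(a.position[0] - listener_pos[0],
                          a.position[1] - listener_pos[1])
        if dist < hearing_radius:
            gain = 1.0 - dist / hearing_radius  # simple linear attenuation
            mix.append((a.activity_sound, round(gain, 2)))
    return mix

# The soundscape is constituted by who is doing what, and where:
world = [
    Agent("guard", (4, 3), "footsteps_on_stone"),
    Agent("crow", (12, -9), "wing_flaps"),
    Agent("machine", (80, 80), "distant_hum"),  # currently out of earshot
]
print(audible_mix(world, listener_pos=(0, 0)))
```

Because the mix is recomputed from the agents’ states, what is audible changes whenever the ecology changes, which is precisely what distinguishes such a soundscape from a static “background” asset.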


Learning from the experience of sound designers within movies, we can conclude that we should not avoid the ambiguous and unidentifiable at all costs. Interaction is essentially a process of ambiguity, requiring ongoing negotiation of meaning and goals not just between humans but also between humans and truly interactive computer systems. Instead of associating an image, action and sound with pre-determined, one-to-one relationships, we can create “situations” in which something can take place rather than conveying “one single message”. Games do not have to be “understandable” all the time: They may confront us with situations that seem accidental, but reveal their poetic quality as part of an overall gaming experience.

Ambiguity, and also the potential to surprise, represents certain qualities of life that need to be taken into account when designing experience systems like computer games. This demands a consideration of similar techniques used in film, such as the deconstruction of causality or defamiliarization, whereby, in addition to the crafting of static sounds, these can also be achieved procedurally by manipulation in real time. It is important to keep in mind that any sound, even ambiguous ones, can become familiar again, constituting new categories of signifiers and establishing links to the experience from whence they emerged.

Field of Action 2: Sonic Agency and Mediation of Self

Hear Me Interact

From an anthropological point of view, making sound is one of the first performative acts of a human being, and the crying of the newborn can be considered the first “public statement”. Sound plays an important role in the constitution of our self-image and for expressing this self-image to others. From breathing to laughing, crying, and screaming, we have an immense palette of sonic archetypes, and we use these instruments both intentionally and unintentionally (Seitter, 2007). Playing games with sound also means hearing oneself being active. The player does not just listen to sonic events or a soundscape, he also does sound. Sounds thus manifest his presence and agency in the gameworld. Chion has introduced the term ergo-audition to describe this “hearing oneself doing something”. First of all, this is relevant as regulatory feedback. But the concept also includes (inter)subjective and socio-cultural dimensions of meaning making. Usually, we hear other people’s activities rather than our own. However, when we break a sonic taboo, for example when sneezing during a classical music concert, we become more, even painfully, aware of our own sounds. We are also aware of our sound-making capacity in the exact opposite case: when we enjoy hearing ourselves, a phenomenon that Chion labelled “plaisir de l’ergo-audition” (Chion, 1998). The sound of the beer can that I kick around, the sound of the champagne cork popping–they are positive manifestations of my agency and I want them to be heard by others as well. This effect is increased by interesting and surprising relationships between my action and the sounds and also because they encourage exploration.

This manifestation of positive agency can be a powerful means of creating compelling experiences in games. I have mentioned the simple but effective feature of singing in the game Aquaria. Although not very explorative, this interaction is very satisfying, not only because of the pleasing sonic quality but also because it activates the “joy of self-hearing”. Of particular interest is an effect I call the “differential of power”: If I press a small button and this results in a massive, powerful sound, I experience a feeling of power and influence. A weak and fragile sound, however, is a sign of weakness and powerlessness. Specifically designing this relationship between player and gameworld contributes to the creation of engaging experiences. After the discussion of procedural approaches above, it is easy to imagine now that the control of this relationship could be


done in real time, depending upon the player’s performance and the dramaturgy unfolding in the gameplay, by adding or subtracting elements of complex sound constructions or by just modifying the volume, envelope, and/or frequency spectrum of the interaction sound.

Sound Mediating the Relationship to the Avatar

Another important point related to ergo-audition is the relationship to the player’s avatar, which includes a transformed projection of self, even if no visual representation of an avatar exists in the game. In this sense, games incorporate aspects of theatrical performance and role-playing. For instance, in online role-playing games such as World of Warcraft (Blizzard, released in Europe in 2005), players are explicitly required to create an alter ego, which also includes the selection of a virtual race, body, clothes, profession, and skills. Upon entering the gameworld, users act out their role using all elements available. Creating and maintaining virtual identities is thus an important part of many games. So far, however, there are very few titles that allow the player to customize sonic identities without having to access low-level functionality of the software. It is somewhat implicit in Spore, as the creation of a life-form also influences how it sounds. This is an area where a fictional physical modelling system would produce exciting new design possibilities. Each piece of clothing or artefact an avatar could put on or use would have a dynamically generated sound, which could be modified additionally by attributive settings to achieve a rich sound palette. A team of players in World of Warcraft would also follow sonic criteria when choosing their equipment and clothing and they could further customize their sonic appearance to their liking. The sounds of their approaching avatar could become a signal of peril for their foes or a signal of hope for their friends, just like the sounds of the cavalry and fanfare to the Indians or settlers in classical Western movies. And what about a tool allowing players to select from a wide range of sounds, or even import their own sounds, to teach the game how they want it to sound? Something like a sonic pipette, following the pipettes of photo-editing software? In particular, using procedural sound design for autonomous, procedural avatars, like the “Drama Princess”,21 is an interesting prospect.

Sound Mediating the Relationship to the Virtual World

While film sound often connects elements of visual montage and provides continuity, game sound can link the player action on the interface with the action of her/his virtual presence in the gameworld. A simple click is transformed into a complex dynamic movement or process like opening boxes, operating artefacts and so on. On the one hand, missing visual information can be replaced by sound; on the other hand, through its “immaterial corporeality” (Connor, 2004), sound contributes to a virtual embodiment of a user’s agency. Sound-image synchresis is thus extended to action: What I do (or what I suppose another agent did) has caused that sound.

The physicalization effect of sound is also relevant for action and proprioception as it gives us hints about the meaning and properties of a virtual object and provides a kind of sonic affordance, giving the player hints about what to do next, thus stimulating action. Sound here supplies a form of virtual embodiment, often compensating for the lack of haptic feedback, for example when picking locks in Thief 3: Deadly Shadows (Eidos, 2004): Here, the two-dimensional movement of the mouse is transformed audio-visually into a physical, three-dimensional interaction with the lock. Another interesting example is represented by the sounds emanating from the Wii Remote when pulling the virtual string of the bow or using the virtual fishing rod in The Legend of Zelda – Twilight Princess (Nintendo,


2006). This proprioceptive feedback could also be proceduralized and, for example, be influenced by the avatar’s exhaustion.

Let us remember the power of ambiguity and unidentifiable sonic objects in this context: Why is it that, even in relatively experimental games, standard gameplay elements such as pick-up sounds, gates, teleporters, system information and so on always sound similar? Considered superficially, it can certainly help the understanding of the game’s function and interface but, on the other hand, exploration, surprise, and adventuring into the unfamiliar is a fundamental quality, if not the “raison d’être”, of virtual worlds. Again, consider the most famous examples of innovative cinema and you will notice that this quality is what many of them share in their sound design. Additionally, with games being procedural systems, ambiguity can be dynamically controlled in relation to the whole image-sound-action relationship. Why not, for instance, surprise an overly self-conscious player by introducing disturbances into the sounds of artefacts and the environment?

Subjectivization

In movies, extensive post-processing, as described above, is used for enunciation markers and the simulation or marking of subjective experiences. Games are essentially subjective experiences, so this would suggest that related sound design strategies could be applied. Chances are, however, that the designs used in movies cannot be applied in a straightforward manner, as in a game we do not have a purely observational position. It is also important to note that we tend to perceive unexpected events, such as the sudden marking of subjective experience, as the system’s agency, which can result in a breakdown of flow and immersion. Therefore, significant experimentation with this design element is still necessary. A possible initial approach could be to relate enunciations of subjective experience to the experiential story of the player during gameplay (experience of failure, success and so forth) in which the sounds of the events encountered play an important role. A very simple and primitive example implemented in many games is the change of state of the player avatar, such as the “bullet time” in Max Payne (Rockstar Games, 2001). Most examples of the sonic enunciation of subjectivity can be found in First-Person Shooters (FPS) and related genres that feature an internal, first-person point of view. So far, most of the first-person perspectives are staged in a very “neutral” way, except for so-called special game states, in which the changed states are usually strongly enunciated by visual and auditory means. Examples are the various rage and invincibility modes, for example, in Scarface: The World Is Yours, Haze (Free Radical Design, 2008), or Prey. However, state changes are so far inherently binary concepts and it is necessary to explore more open and dynamic approaches to “player states” and their sonic enunciation.

Sonic Perspective

A further element of mediating the relationship to the virtual world is sonic perspective. Vintage arcade games and the 3D FPS are probably the most restrictive genres in terms of perspective limitation. They also represent two extremes of sonic perspective: the “one-dimensionality” and the “emitter paradigm”. In terms of perspective, game sound follows the same paradigms as early film sound: Intelligibility of dialogue is extended by the need for intelligibility of any sound that is directly relevant for gameplay. These sounds are brought forward into a two-dimensional space. Film’s naturalistic spatial perspective, representing distance from the camera, is embodied in games in the emitter paradigm. Why do we not shake up these perspective rules and explore sonic perspective and extension beyond the restrictions of intelligibility and naturalism? Innovative film sound has shown us that it is possible. Of course, this means letting go of the general approach that


treats virtual 3D spaces as constructs analogous to “real” space.

Sonic Manifestation of the System’s Agency

This is perhaps the most abstract and general opportunity for intervention to contribute to aesthetic innovation in game sound design, but it nevertheless touches an essential point: In a truly interactive setting, the human player is not the only possible agent. The system itself, and all its manifestations, for example in the form of NPCs, are also agents and thus can be candidates for designing sonic manifestations of agency. I have mentioned the abstraction of the computer as agent in the condition of the breakdown of an interaction flow. However, without breaking the magic circle of immersion, sound can manifest a transformation or a presence in the system. Combining this with the idea of games that do not try to hide the apparatus but instead make the game system the “opponent” (or companion?), this is a very promising outlook.

Field of Action 3: The Schizophonic Game Interface

Last but not least, a neglected aspect is the sound design of the artefacts the gamer uses. I have already mentioned the Wii Remote as an interface to control the bow in The Legend of Zelda: Twilight Princess (Nintendo, 2006). With ordinary mice, joysticks and gamepads, the actual actions and sounds of the interface on the one hand, and the actions and sounds of the virtual world they connect to on the other hand, already form a mutual relationship, but this relationship is not subject to design and the two dimensions are juxtaposed rather than integrated. The Wii Remote is a foreshadowing of what would be possible if game controllers were allowed more input modes and always had loudspeakers embedded. The sounds of interaction partially emanate directly from the controller in the player’s hand, which tightens the bonds between the player’s body, sound, interface, virtual object, and action.

As we can see, contrary to movies, the game apparatus has one component that does not necessarily have to be “hidden” or masked–the interface. As such, this interface can be a candidate for sound design as well. Through sound, the interface might also gain significance and become more transparent as part of the gaming experience. For instance, sound helps to transform the Wii Remote into what it stands for in the gameworld, be it a gun, a sword, a tennis racket, or a bow. Game interfaces such as these constitute placeholder objects22 for many kinds of meanings and functions and are inherently ambiguous. They can–within the constraints of their appearance and operation modality–be redefined through sound. The object in its materiality and form integrates into its virtual function. In terms of sound design, the physical properties of the placeholder artefact are connected with the more complex functionalities it offers. The type of operation of the simulated object and its functionality defines the sound: Sounds that relate to the direct manipulation of objects are normally used and are, furthermore, combined with additional semiotic potential.

The term “schizophonic” can be used to describe such artefacts. Schizophonia is the term coined by Schafer to denote the separation of sounds from their (natural) sources by means of electroacoustics. For him, this concept carries only negative connotations (Schafer, 1977). However, in my understanding, every interactive physical artefact which is equipped with some kind of artificial, electroacoustic sound is schizophonic. Schizophonia is thus an essential and exciting aspect to consider when designing innovative game interfaces with sound.23

This also opens up a perspective from the interface of a stationary computer or console game to more experimental gaming artefacts, mobile gaming and so on. Here, we enter an exciting new field of theory and practice where sound has enjoyed little attention thus far: The sonic enhance-

405
New Wine in New Skins

ment of interactive artefacts of everyday life, an playing with form, the subversion of convention,
area of research and design which is investigated the unexpected and ambiguous can be essential
by the Sonic Interaction Design community.24 driving forces for innovation and engaging expe-
Understanding the possible associations of action, rience. In the long run, believing that satisfying
sound, and object (interface and/or virtual) is of expectations is the only way to go, is a dangerous
great importance in this context. The relationships error. Humans tend to get bored and eventually
are manifold, and very seldom do we encounter prefer to be challenged (in digestible doses) and
simple analogies or isomorphisms between sound to encounter new things. Thus, the first lesson is
and action. A simple differentiation could be to leave conventions behind sometimes and to
trigger versus constant manipulation (and their play! Do not just produce for the market, do not
combination). This can be further diversified into make it too easy for your consumers and, more
more or less isomorphous and more or less direct importantly, do not produce for consumers but
connections (for more elaborate discussion, see for players.
Chion, 1998 and also Jensenius, 2007). These
possibilities are of specific interest to game sound Art and craft Meets Procedurality
design as they contribute directly to the experi-
ence of agency, ergo-audition, and the pleasure of Game sound design has to marry the high demands
self-hearing in a game. I would like to point out for craft and artistic inspiration known from film
here that the most simple and direct mappings are sound design with the procedurality of the medium.
not necessarily the most interesting and pleasing Sound design for games requires not only the de-
ones. Additionally, let us always consider the use sign of complex sounds but the design of systems
of dynamic, procedural techniques in the design: and processes that can generate, or modify them,
A gestural-sonic link may be modulated, depend- and embed them in an interactive, ever changing
ing upon the artefacts or the player’s condition or experience. Interactive systems may also learn
state, or can convey the shifting of agency from the from the player’s agency, from the tactics and
player to the virtual artefact or the game system strategies he develops while interacting with the
and vice-versa. gameworld. Ultimately, this leads to a vision of
games not as simple event and state machines but
rather as evolving and adapting systems, always in
cONcLUsION dialogue with the player, implicitly or explicitly.
The game system becomes another actor.25 Of
Go Play! course, this radical approach is not the only one
to take and it will not always make sense to do so.
In this essay, I have expressed concern that game Taking this approach depends on the experience
sound design is being restricted at an early stage and of the “hermeneutic affordances”26 that the
of its history to certain implicit paradigms mani- designers want to provide.
fested in technological developments, creative Real time mixing, digital signal processing,
practice, and discourse. Certainly, the emergence and adaptive and procedural techniques are the
of a major game industry that serves a mass technologies that enable this development and
market has brought many advances and benefits some game sound designers have started to move
to the medium. However, just as Hollywood has towards this direction. This does not replace more
experienced, this threatens to ossify the medium traditional approaches to game (sound) design but
before it can blossom and develop. The examples helps to increase the breadth of possible gaming
from the history of film sound have shown that experiences. In fact, looking at all the technology

406
New Wine in New Skins

available, in particular the most recent generation part of the game. Imagine the room you play in
of consoles and middleware, we have reached a is part of the world inside the game.
state comparable to essential technical turning
points in film sound history. reduce to the Max
However, in order to explore new directions
for game sound, we might also need new game All this seems to offer an incredible amount of
concepts. For example, take the notion of “genre”: possibilities. The question is, how to turn this
In games, genre defines the diegetics, the way into innovative aesthetical approaches to game
music is used and how “realism” or credibility is sound. Leonard Paul and Rob Bridgett (2006)
established (Collins, 2008). Why not play with propose several strategies to push innovation
those genre references instead of subjecting the in game sound by essentially limiting the free-
criteria for sound design to the genre conven- dom for “Next Generation” (which, by now, is
tions? We can do that either by pushing genre to of course the “present generation” of consoles)
its extreme and remixing and combining genre sound design. The authors are convinced that “as
elements, as in New Hollywood, or we can de- we move away from having our boundaries and
sign games from the “inside out”, starting with aesthetics defined for us by the hardware (…) we
considerations about interactivity, human agency, need to enforce our own ‘boundaries’ or refined
the play with real time and with procedural story aesthetics.” (Original emphasis) They propose
spaces. Experimental approaches can also be to establish a strict aesthetic valuing originality
formal or structural (comparable to films by Da- and distinctive style over high production quality.
vid Lynch or Andrei Tarkovsky), they can touch Establishing limits within which to work forms
the mainstream and play with it, such as 2001: A the initial step in this process. A strategy is to
Space Odyssey (Kubrick, 1968) or THX 1138, or start the design from one single voice or sound,
they can establish new mainstream aesthetics that limiting tracks and DSP as well as microphones,
are rooted in the spirit of works like Star Wars and instead using resources to explore modulation,
and The Matrix. dynamics, and more detailed sculpting of sounds.
They also propose recording performances or
Games as Pervasive Event Foley sessions in one go, basically approaching
“life-feeling” and forcing concentration upon
We can think of a game sound design which does the “essential”. In relation to interactive mixing
not regard the game as an isolated hermetic system techniques, they warn against relying too much
but as something integrated into everyday culture. upon 3D voices and too much “realism”.
For this to happen, the complex cultural practices
of creation, re-mix, cross-reference, and appro- Push the Envelope
priation need to be integrated into the conceptual
process of game (sound) design. While many of these suggestions make sense,
Think outside the box, literally. Think about the question may be raised whether (artificial)
pervasive gaming, games that interweave with our limitation can really be the only key to experi-
everyday practices, that are played in semi-public mentation, creativity and inspiration. Of course,
spaces and transgress the traditional borders of the as the Demo Scene demonstrates impressively, it
medium, also sonically: The complex intertwining can. However, it is absurd to follow this path as
of the gaming event with everyday culture and its the only way of dealing with a lack of innovation
soundscape becomes all the more evident. Imagine and aesthetical elaboration. Consider the Cinema
the machine that generates the game becoming Vérité or the Dogma 95 movements in film: They

407
New Wine in New Skins

existed only as small movements within a bigger appear. Scripting languages need to be created to
system, infusing it with alternative ideas to debate, ensure that the creation of complex game systems
but could not (and should not!) become the ulti- is accessible to artists and designers as well. A look
mate guideline for “good” artworks. The same is at the impact that visual patching has brought to
true for game development. As movie history has artistic approaches to programming in general or
shown, technological innovation has always been the effect of accessible open source electronics
a fruitful ground for creative re-appropriation. For platforms, such as the Arduino, on artistic ap-
example, multichannel sound has led to an artistic propriation of “physical computing” shows the
exploration of the technology at hand by innova- potential of a broad range of accessible tools.
tive movie makers such as Francis Ford Coppola To make this all possible, education plays an
and Walter Murch. Why not take this example as important role. At the Zurich University of the Arts
a motivation for a similar attitude towards new Game Design programme, we have been trying to
technologies available today in game sound? The foster a critical, sometimes subversive and always
point is not to let oneself be restricted creatively playful approach to game design. In relation to
by dominating conventions such as “simulating sound design, we developed a curriculum that
reality”. constantly oscillates between experiment, research
and design (Hug, 2007). We have limited access
Analysis for Inspiration to the latest technologies including those available
to create advanced game sound design. But this
Analytical concepts such as those presented by does not matter: If the students understand what
Grimshaw (2007) and Jørgensen (2007; 2011) are is at stake in the design of interactive games, they
extremely helpful for game sound designers in will be able to realize the same innovative ideas
terms of inspiration for new concepts and design with more advanced technology later. It is also
approaches. What, for example, if other players important to provide a space for experimentation,
can also hear a system sound, which presumably literally, to understand our classrooms as experi-
was addressing only a single player (under certain mental laboratories. These labs should offer tools
conditions)? What if a causal link of a sound is for experimentation with standard computers, but
not established through a direct association with also with embedded systems, sensors, tiny loud-
a player’s activity, but with certain interaction speakers, loudspeaker arrays and so on.
patterns of whole player communities? The point Our students have taken to this approach by
I would like to make here is that, in the end, one producing some fantastic games with some of them
of the biggest advantages of analytic theory lies performing successfully in festival competitions.27
in its ability to infuse new ideas by playing with Of course; there is still a lot of room for improve-
them, by deconstructing their “rules” and under- ment, in particular in terms of sound design. All
lying assumptions in a creative way. This is true too often, the preconditioning to a conservative
for the notion of diegetics as well as for any other approach to sound, and the general inability of
analytical concept. our culture to deal with sound in a creative way
outside of what is considered “music”, has stifled
creative Environments creativity. However, there are always encouraging
for creative Minds projects emerging, which embody many of the
ideas presented here. So, there are good reasons
We also need new production processes and tools. to hope for a ”next generation” that truly deserves
Drescher’s humorous “Homunculonic AEStheti- the term.
cator” might not be as farfetched as it would first
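To make the idea of a modulated gestural-sonic link slightly more concrete, here is a minimal, purely illustrative sketch in Python. It does not come from this chapter, nor from any actual game or middleware; all names, parameters, and numbers are invented. The same controller gesture is mapped to different synthesis parameters depending on the player's assumed state, so that the link between action and sound tightens or loosens over time:

```python
from dataclasses import dataclass


@dataclass
class PlayerState:
    # Hypothetical state variables; a real game might derive these
    # from gameplay telemetry or, speculatively, from biosensors.
    tension: float = 0.0   # 0.0 (calm) .. 1.0 (stressed)
    mastery: float = 0.0   # 0.0 (novice) .. 1.0 (expert)


def map_gesture_to_sound(gesture_speed: float, state: PlayerState) -> dict:
    """Map a controller gesture to synthesis parameters.

    The gestural-sonic link is not fixed: the mapping becomes less
    direct (less isomorphic) as mastery grows, hinting at agency
    shifting from the player to the system, while tension brightens
    the resulting sound.
    """
    # 1.0 = a fully direct, isomorphic mapping; smaller = looser link.
    directness = 1.0 - 0.5 * state.mastery
    amplitude = min(1.0, gesture_speed * directness)
    # Brighter (higher filter cutoff) when the player is tense.
    cutoff_hz = 200.0 + 2000.0 * amplitude * (0.5 + 0.5 * state.tension)
    return {"amplitude": amplitude, "cutoff_hz": cutoff_hz}
```

In a real implementation, the returned values would drive, for example, a synthesiser's gain and filter cutoff inside the audio engine; the point of the sketch is only that the mapping itself becomes a designable, dynamic object rather than a hard-wired trigger.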


These are all very general and pretentious statements, easy to say and hard to do. My take on this is that we as game (sound) designers should consider such approaches, watch and listen (again) to experimental films and artworks, and to our everyday soundscapes, and that we should never stop searching for a sonic aesthetic that emerges from the procedural quality of interactive computer games.

To sum it all up, in order to advance in this direction, all we have to do is… play!

ACKNOWLEDGMENT

I wish to thank Graeme Coleman for his great support in editing this article and his valuable feedback. I would also like to thank the team at the Game Design department of the Zurich University of the Arts for providing an environment that enables playful experiments with games, and our students for the inspiration they provide with their ideas, questions, and products.

REFERENCES

Altman, R. (1992). Cinema as event. In Altman, R. (Ed.), Sound theory sound practice. New York: Routledge.

Aquaria. (2007). Ambrosia Software.

Aronofsky, D. (1998). Pi. Harvest Filmworks.

Audio Engineering Society. (2009). AES 35th international conference: Audio for games. Journal of the Audio Engineering Society, 57(4), 254–261.

Augoyard, J. F., & Torgue, H. (2005). Sonic experience: A guide to everyday sounds. Montreal: McGill University Press.

Back, M. (1996). Micro-narratives in sound design: Context, character, and caricature in waveform manipulation. In Proceedings of the 3rd International Conference on Auditory Display.

Beauchamp, R. (2005). Designing sound for animation. Burlington, MA: Elsevier.

Blueberry garden. (2009). Erik Svedäng.

Brainpipe. (2008). Digital Eel.

Bresson, R. (1985). Notes on sound. In Weis, E., & Belton, J. (Eds.), Film sound: Theory and practice. New York: Columbia University Press.

Bridgett, R. (2006). Audible words, pt. 2: Updating the state of critical writing in game sound. Gamasutra. Retrieved February 6, 2009, from http://www.gamasutra.com/features/20060831/audio_03.shtml

Bridgett, R. (2007a). Designing a next-gen game for sound. Gamasutra. Retrieved February 13, 2009, from http://www.gamasutra.com/view/feature/2321/designing_a_nextgen_game_for_sound.php

Bridgett, R. (2007b). Interactive ambience. Game Developer Magazine. Retrieved May 8, 2009, from http://www3.telus.net/public/kbridget/aural_fixation_april07.jpg

Bridgett, R. (2009a). The future of game audio: Is interactive mixing the key? Gamasutra. Retrieved May 3, 2009, from http://www.gamasutra.com/view/feature/4025/

Cakewalk. (1983). Commavid.

Cameron, J. (1984). The terminator. Pacific Western.

Cameron, J. (1991). Terminator 2: Judgement day. Pacific Western.

Chion, M. (1994). Audio-vision: Sound on screen. New York: Columbia University Press.

Chion, M. (1998). Le son. Paris: Nathan.


Clair, R. (1985). The art of sound: Excerpts from a series of letters. In Weis, E., & Belton, J. (Eds.), Film sound: Theory and practice. New York: Columbia University Press. (Original work published 1929)

Collins, K. (2008). Game sound: An introduction to the history, theory, and practice of video game music and sound design. Cambridge, MA: MIT Press.

Connor, S. (1994). Edison’s teeth: Touching hearing. In Erlmann, V. (Ed.), Hearing cultures: Essays on sound, listening and modernity. Oxford: Berg.

Coppola, F. F. (1979). Apocalypse now. Zoetrope Studios.

Curtis, S. (1992). The sound of the early Warner Bros. cartoons. In Altman, R. (Ed.), Sound theory sound practice. New York: Routledge.

Darwinia. (2007). Ambrosia Software.

Deutsch, S. (2001). Harnessing the power of music and sound design in interactive media. In Earnshaw, R., & Vince, J. (Eds.), Digital content creation. New York: Springer.

Dig dug. (1983). Atari.

Doom 3. (2004). Activision.

Drescher, P. (2006a). GAC(k!). Retrieved June 8, 2009, from http://blogs.oreilly.com/digitalmedia/2006/09/gack-1.html

Drescher, P. (2006b). THE Homunculonic AEStheticator. Retrieved June 8, 2009, from http://blogs.oreilly.com/digitalmedia/2006/11/the-homunculonic-aestheticator-1.html

Dyson, F. (1996). When is the ear pierced? The clashes of sound, technology and cyberculture. In Moser, M. A., & MacLeod, D. (Eds.), Immersed in technology: Art and virtual environments. Cambridge, MA: MIT Press.

Dyson. (2009). Kremers & May.

Eisenstein, S. M., Pudovkin, V. I., & Alexandrov, G. V. (1985). Statement on the sound film. In Weis, E., & Belton, J. (Eds.), Film sound: Theory and practice. New York: Columbia University Press. (Original work published 1928)

Fable II. (2008). Microsoft.

Farnell, A. (2007). An introduction to procedural audio and its application in computer games. Retrieved May 20, 2009, from http://www.obiwannabe.co.uk/html/papers/proc-audio

Farnell, A. (2011). Behaviour, structure and causality in procedural audio. In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey, PA: IGI Global.

Fleming, J. (2009). Planet of sound: Talking art, noise, and games with EA’s Robi Kauker. Retrieved June 3, 2009, from http://www.gamasutra.com/view/feature/3978/planet_of_sound_talking_art_.php

Flückiger, B. (2001). Sound design, die virtuelle Klangwelt des Films. Marburg, Germany: Schüren.

Grey matter. (2008). McMillen, Refenes, & Baranowsky.

Grigg, C., Whitmore, G., Azzarello, P., Pilon, J., Noel, F., Snyder, S., et al. (2006, May). Group report: Providing a high level of mixing aesthetics in interactive audio and games. Paper developed at the Annual Interactive Music Conference Project Bar-B-Q.

Grimshaw, M. (2007). The acoustic ecology of the first person shooter. Unpublished doctoral dissertation, University of Waikato, New Zealand.

Grimshaw, M. (2008). Per un’analisi comparata del suono nei videogiochi e nel cinema. In Bittanti, M. (Ed.), Schermi interattivi: Saggi critici su videogiochi e cinema (pp. 95–121). (Bittanti, M., Trans.). Roma: Meltemi.


Half-Life series. (1998-). Valve.

Harvey, A., & Samyn, M. (2006). Realtime art manifesto. Retrieved June 6, 2009, from http://tale-of-tales.com/tales/RAM.html

Haze. (2008). Free Radical Design.

Heavenly sword. (2007). Sony.

Hermann, T., & Ritter, H. (1999). Listen to your data: Model-based sonification for data analysis. In Advances in intelligent computing and multimedia systems (pp. 189–194). Baden-Baden.

Hitchcock, A. (1963). The birds. Universal Pictures.

Hug, D. (2007). Game sound education at ZHdK: Between research laboratory and experimental education. In Proceedings of Audio Mostly 2007 - 2nd Conference on Interaction with Sound.

Hug, D. (2008a). Towards a hermeneutics and typology of sound for interactive commodities. In Proceedings of the CHI 2008 Workshop on Sonic Interaction Design.

Hug, D. (2008b). Genie in a bottle: Object-sound reconfigurations for interactive commodities. In Proceedings of Audiomostly 2008, 3rd Conference on Interaction With Sound.

Ihde, D. (1976). Listening and voice: A phenomenology of sound. Athens, OH: Ohio University Press.

Jackson, B. (2009). SFP: The magical world of “Spore”. In Mix Online. Retrieved May 20, 2009, from http://mixonline.com/post/features/sfp-magical-world-spore

Jensenius, A. R. (2007). ACTION–SOUND: Developing methods and tools to study music-related body movement. Unpublished doctoral dissertation, University of Oslo, Department of Musicology.

Jørgensen, K. (2007). ‘What are those grunts and growls over there?’ Computer game audio and player action. Unpublished doctoral dissertation, Copenhagen University, Denmark.

Jørgensen, K. (2011). Time for new terminology? Diegetic and non-diegetic sounds in computer games revisited. In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey, PA: IGI Global.

Kromand, D. (2008). Sound and the diegesis in survival-horror games. In Proceedings of Audiomostly 2008, 3rd Conference on Interaction With Sound.

Kubrick, S. (1968). 2001: A space odyssey. Metro-Goldwyn-Mayer.

Lima, E. (2005). The devil’s in the details: A look at Doom3’s antimusic. In Music4games. Retrieved June 12, 2009, from http://www.music4games.net/Features_Display.aspx?id=70

LittleBigPlanet. (2008). Sony.

LoBrutto, V. (1994). Sound-on-film: Interviews with creators of film sound. Westport, CT: Praeger.

Love. (forthcoming). Eskil Steenberg.

Lucas, G. (1971). THX 1138. Warner Bros. Pictures.

Lucas, G. (1977). Star wars. LucasFilm.

Lynch, D. (1977). Eraserhead. American Film Institute.

Lynch, D. (1990-1991). Twin Peaks. Lynch/Frost Productions.

Marks, A. (2001). The complete guide to game audio. Lawrence, KS: CMP Books.

Maturana, H. R., & Varela, F. G. (1980). Autopoiesis: The organization of the living. In Maturana, H. R., & Varela, F. G. (Eds.), Autopoiesis and cognition. Dordrecht, Netherlands: Reidel.


Max Payne. (2001). Rockstar Games.

McTiernan, J. (1987). Predator. Amercent Films.

Metz, C. (1980/1985). Aural objects. In Weis, E., & Belton, J. (Eds.), Film sound: Theory and practice. New York: Columbia University Press.

Morgan, S. (2009). Dynamic game audio ambience: Bringing Prototype’s New York City to life. Gamasutra. Retrieved May 8, 2009, from http://www.gamasutra.com/view/feature/4043/

Mr. Do! (1983). CBS Electronics.

Mullan, E. (2011). Physical modelling for sound synthesis. In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey, PA: IGI Global.

Neitzel, B. (2000). Gespielte Geschichten: Struktur- und prozessanalytische Untersuchungen der Narrativität von Videospielen. Unpublished doctoral dissertation, University of Weimar, Germany.

Oink! (1983). Activision.

Paul, L., & Bridgett, R. (2006). Establishing an aesthetic in next generation sound design. Gamasutra. Retrieved May 25, 2009, from http://www.gamasutra.com/view/feature/2733/

Picard, R. W. (1997). Affective computing. Cambridge, MA: MIT Press.

Prey. (2005). 2K Games/3D Realms.

Primrose. (2009). Jason Rohrer.

Prototype. (2009). Activision.

Pudovkin, V. (1985). Asynchronism as a principle of sound film. In Weis, E., & Belton, J. (Eds.), Film sound: Theory and practice. New York: Columbia University Press. (Original work published 1929)

Roddenberry, G. (1966-1969). Star trek. Paramount Television.

Samorost 1. (2003). Amanita Design.

Samorost 2. (2005). Amanita Design.

Scarface: The world is yours. (2006). Vivendi.

Schaeffer, P. (1966). Traité des objets musicaux. Paris: Seuil.

Schafer, R. M. (1977). The soundscape: Our sonic environment and the tuning of the world. New York: Destiny Books.

Scott, R. (1979). Alien. Twentieth Century Fox.

Seitter, W. (2007). Das Spektrum der menschlichen Schallproduktionen. In H. Schulze & C. Wulf (Eds.), Paragrana, Internationale Zeitschrift für Historische Anthropologie, 16(2), 191-205. Berlin: Akademie Verlag.

Silent hill series. (1999-). Konami.

Sim city. (1999-2007). Maxis.

Sonnenschein, D. (2001). Sound design: The expressive power of music, voice, and sound effects in cinema. Studio City, CA: Michael Wiese Productions.

Splinter cell series. (2002-). Ubisoft.

Spore. (2008). Electronic Arts.

Stanton, A. (2008). Wall-E. Pixar Animation Studios.

Tarkovsky, A. (1972). Solaris. Mosfilm.

Tarkovsky, A. (1979). Stalker. Mosfilm.

Tarkovsky, A. (1986). Sacrifice. Argos Films.

The legend of Zelda: Twilight princess. (2006). Nintendo.

The path. (2009). Tale of Tales.

The Sims series. (2000-). Electronic Arts.

Thief 3: Deadly shadows. (2004). Eidos.

Thief: The dark project. (1998). Eidos.


Thom, R. (1999). Designing a movie for sound. Film Sound. Retrieved February 12, 2009, from http://filmsound.org/articles/designing_for_sound.htm

Tom Clancy’s ghost recon: Advanced warfighter 2. (2007). Ubisoft.

Truax, B. (2001). Acoustic communication. Westport, CT: Greenwood Press.

Truppin, A. (1992). And then there was sound: The films of Andrei Tarkovsky. In Altman, R. (Ed.), Sound theory sound practice. New York: Routledge.

Wachowski, L., & Wachowski, A. (1999). The matrix. Warner Bros. Pictures.

Whittington, W. (2007). Sound design & science fiction. Austin: University of Texas Press.

Williams, A. (1985). Godard’s use of sound. In Weis, E., & Belton, J. (Eds.), Film sound: Theory and practice. New York: Columbia University Press.

World of warcraft. (2005). Blizzard.

Wurtzler, S. (1992). “She sang live, but the microphone was turned off”: The live, the recorded and the subject of representation. In Altman, R. (Ed.), Sound theory sound practice. New York: Routledge.

Young, K. (2006). Recreating reality. Game Sound. Retrieved February 13, 2009, from http://www.gamesound.org/articles/RecreatingReality.html

KEY TERMS AND DEFINITIONS

Arduino: An open-source electronics prototyping platform based on flexible, easy-to-use hardware and software. See http://www.arduino.cc.

Autopoiesis: From Greek “auto” (self) and “poiesis” (creation). The term was originally introduced by Humberto Maturana and Francisco Varela and describes a system organized as “a network of processes of production (transformation and destruction) of components which: (i) through their interactions and transformations continuously regenerate and realize the network of processes (relations) that produced them; and (ii) constitute it (the machine) as a concrete unity in space in which they (the components) exist by specifying the topological domain of its realization as such a network” (Maturana & Varela, 1980, p. 79). The concept has been applied in several fields such as sociology and cybernetics, particularly where it references self-organizing systems.

Demo Scene: A computer art subculture that specializes in producing non-interactive audiovisual presentations that are entirely generative and run in real-time on a computer. See, for instance, http://www.demoscene.info.

Futurist Music: Most prominently defined by Russolo in his 1913 manifesto “Art of Noises”, futurist music embraced noise, everyday urban sounds, and, in particular, the sounds of industry and war as material for musical expression.

Soundscape: According to the original definition by R. Murray Schafer, a soundscape is “any portion of the sonic environment regarded as a field for study” (see glossary in Schafer, 1977). It is characterized by a communicative and systematic relationship between sounds, listener, and environment.

Musique Concrète: A form of electroacoustic music without visible source (“acousmatic”) which mainly uses material derived from the manipulation of recordings of found “sonic objects”.

Reduced Listening: In the theory of Pierre Schaeffer, reduced listening is listening to a sound for its own sake, as a sound object, by removing its real or supposed source and the meaning it may convey. He describes his theory of the “objet sonore”, including the various listening modes, in Schaeffer (1966). The only substantial English text covering Pierre Schaeffer seems to be Chion’s Audio-Vision (1994).


Sonification: Sonification is the use of non-speech sound to convey information or perceptualize data. For more information about this research field, visit http://www.icad.org.

ENDNOTES

1. At this point I want to clarify my use of the multi-faceted term “sound design”. I will use the term here to describe the activity of creating and composing what is commonly called “sound effects” (SFX) and does not involve music or speech, except for the cases where the borders between these traditional classifications are blurred. The nature of computer games also includes aspects of implementation and programming.
2. http://www.iasig.org
3. http://www.audiogang.org
4. http://www.gamasutra.com
5. Interactive XMF is mainly used for music, and will be integrated in Sony’s “Awesome” scripting engine for the Playstation 3.
6. http://www.nvidia.com/page/pg_55418.html
7. http://www.audiogang.com
8. I do not discuss musical computer games as they form a specific, highly stereotypical genre, and are therefore not relevant for the aim of this article.
9. Actually, it is a major challenge to create procedural systems that do not sound random and characterless.
10. http://www.audiokinetic.com/4105/soundseed-introduction.asp
11. http://unity3d.com
12. The various nominees and winners of the Independent Games Festival competition, which is held at GDC in San Francisco, provide a useful basis for studying interesting approaches to sound design in independent games. Of particular interest is, of course, the “Excellence In Audio” award, although often this broad scope results in musical games winning the award, which are not necessarily the most innovative titles in terms of sound design.
13. http://www.newgrounds.com/portal/view/467236
14. http://tale-of-tales.com
15. Colloquial term for the first sound movies.
16. Another example of the early adoption of new technologies by innovative filmmakers.
17. Note that the use of sonic codes originally reserved for cartoons (described as iconic and non-indexical by Curtis, 1992) in such movies gives the notion of any objective “realism” as a point of reference the final blow.
18. A minor methodical issue has to be mentioned at this point. Firstly, most of the movies discussed by the authors cited are at least 15 years old. Furthermore, there is no substantial account available on sound design within television series, where many aesthetic innovations have been developed in the last few years. Future work shall incorporate a discussion of these relatively new formats.
19. Many of the aesthetic innovations, like mixing many different sounds from all possible sources, time-stretching, and pitch-shifting, are “standard” methods for sound design in film and computer games (see, for example, Sonnenschein, 2001; Marks, 2001). They have entered our collective memory built on mass media consumption and, in some cases, have become stereotypes themselves, often serving the limited, functional aesthetics of most of today’s games.
20. Note the recent publication dates: it is clear that the discussion of this topic is far from concluded and that the related theory is still very much in a state of flux.
21. Drama Princess is a “reusable autonomous character for realtime 3D focused on dramatic impact rather than simulation of natural behavior”, developed by Harvey and Samyn from Tale of Tales. http://www.tale-of-tales.com/DramaPrincess
22. For a more in-depth discussion of possible categories of sounding interactive artefacts, see Hug (2008a).
23. For an elaboration, please refer to Hug (2008b).
24. http://www.cost-sid.org
25. From this perspective, the diegetic question seems obsolete. If the game is part of the same system as the player, the narrative world and the existential world of the player merge into one.
26. I use this term to describe the interaction of “given” affordance, in the sense of ecological psychology, with the ongoing interpretive process in our interactive experience of artificial sounds.
27. A recent example is the game Feist by Florian Faller and Adrian Stutz (http://www.playfeist.net). More information about the programme is available at http://gamedesign.zhdk.ch and on the work platform at http://www.gametheory.ch (available only in German, so far).
merge into one.

415

Appendix

Abstract

What will the player experience of computer game sound be in the future? This was the question posed in an online discussion forum to which the book's contributors were invited to respond. What follows is a free-wheeling debate about the future of game sound. Little editing has been done, other than of the most obvious grammar, syntax and spelling errors, in order to maintain the fresh, often off-the-cuff responses. Three related themes become apparent in this discussion: affect, emotion and biofeedback; realism versus alternative realities; and the need for a game-sound design aesthetics. The first opens up interesting possibilities for enhanced player interaction (including player-player interaction across networked games) and immersion. Although authors and games companies often talk about the player being immersed in the gameworld, it is clear that current technology only hints at the potential. Similarly, games companies often praise the realism of their game sounds: even the iconic sound of Atari's Pong of the early 1970s had its synthetic tones described as "realistic". But which realism is being alluded to? What precisely does this Holy Grail of realism represent and how should it be attained? Is it the authenticity of sound that contributes to game realism or its verisimilitude in the context? If the latter, does realism derive from expectation, culture and genre, and what debt does it owe to other forms of media? If realism refers to an emulation of reality, do we mean social realism, thematic realism, consequential or physical realism, and who wants to play reality anyway? These questions directly relate to the need for a game sound design language: something that is still nascent. Game sound involves a very different paradigm to the derivation and perception of sound as found in reality or any other form of recreational medium. Like real-world environments, game sound derives from the actions of and upon its entities, but it is triggered from a different source rather than issuing directly from those entities. Unlike cinema, games require the willing and active participation of the player to effect the game and its sound. Whatever the future holds, it is clear that we have only begun to discover the possibilities inherent in computer game sound.

Copyright © 2011, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.
Grimshaw: There are a number of ways to approach this. What will the picture be in 1 year, 10 years, 20 years, how will the technology change, what interfaces/outputs might we have, do we need realism (what type of realism), what will change in the player's perception, how will sound design change, are there ethical questions involved in biofeedback and so on?

I'll start this off by saying that games of the future will conduct a form of dialogue between player and sound where the sound itself becomes an active, participating character in the gameworld working in tandem with the player to increase his/her experience and immersion. This will be achieved through biofeedback whereby the game constantly monitors the player's immediate affect and latent emotion (through EEG, GSR, ECG, EMG etc. -- devices such as the Nia and Emotiv headsets are tending in this direction) and responds by synthesising new sounds and processing NPC speech. In the former case, parameters of sound such as frequency, timbre, intensity and ADSR envelope will be modified; in the latter, pitch, stress, rhythm and so on will be modified (I'm imagining real-time synthesis of NPC text files with an emotive envelope/pattern applied according to player state and game context).

So, the game engine senses the player is not frightened enough? Up the "fear" controller to alter the synthesis of sounds and add a worried tremor to NPC speech. Perhaps this should be taken the other way. The player is about to have a heart attack, so an emotion governor kicks in to calm them down by synthesising soothing sounds (after all, presumably game companies don't want to be sued). Players can be kept on just the right level of emotional rollercoaster.

Of course, before all this is possible, there needs to be substantial research into what it is about sound that induces fear (or happiness, sadness etc.). This will need to take in context, past experience of the player and their culture/society (various literature reports that different nationalities view very different engine sounds as exciting and sporty -- Ferrari for Italians, Porsche for Germans). The game setup menu might have faders for nationality, gender and age.

This type of technology will void the requirement for game sound designers because it will be the players (their psychophysiological state) who design the sound on the fly. Some role might remain for the creation of specific sounds but sound designers will find their role greatly decreased in the games industry. Of course, it also opens up new avenues, creative ones, outside the industry -- the technology described above leads to the possibility of being able to design/create sound by "thinking" about it -- presumably, the most creative soundscapes created in future will be thought by the most creative minds.

O Keeffe: Perhaps in 20 years, sound will be considered less important, characterised as an interference with game play. In the real world, sound within urban spaces is constantly being categorised as noise. If we continue on the path of highlighting postmodern soundscapes as noisy environments we justify silent virtual spaces, places to escape the sound of the real.

Grimshaw: Rather like "silent" rooms in busy corporate and academic campuses.

Liljedahl: There is also the concept of "perceptualization", which is interesting in view of the very well established "visualization" and the, at least in some communities, well established "auralization". In the future, sound, graphics and other media types will be more integrated. Today, what we see and what we hear do not necessarily match. One example is room acoustics. When,
for example, the content of a room changes during the course of a game, the acoustics does not. New, better methods to simulate reverberation and acoustic occlusion culling will add to realism. Along the same line is the idea that the sounds of the game could adapt to the acoustics of the physical room where the player is located. This can be used in pervasive games to blur the borders between the virtual reality of the game and the physical reality of the player.

Liljedahl: I like the idea of silent rooms. It opens up for new types of games where realism is not self-evident or the only aspect of the game media.

Liljedahl: Perhaps we have done what can be done given the sound technology available today. Decades of film, TV and computer game production have exploited many of the possibilities doable with today's relatively static audio tools. New levels of interactivity will put new demands on the ability to create dynamic soundscapes. New DSP technology will open up totally new possibilities to create far more dynamic soundscapes than today.

Grimshaw: I was chatting with a colleague last night who works in (visual) SFX and he was describing some visceral scene in one of the Saw movies (a leg being chopped off or something). According to him: "Sound makes the image look better" (my italics).

I wonder if it's ever the case that "image can make the sound sound better"? Many of the chapters in the book make the point that sound tends to be subservient to image, particularly so in the production process, and this is illustrated by my colleague's comment.

Regarding the comment here that "what we see and what we hear do not necessarily match": do they need to? In the example of the leg being chopped off, I'm assuming it's not a real leg attached to a live human being. Yet, the horror works because the context, latex leg, blood, screams and other appropriate sawing and chopping sounds make us "see" what we do not see. Again, this is a case of sound making the image look better -- the scene would lose its power without the sound (just as Tati's films would be nothing without his absurd sound use).

What would a future be like where, instead of putting sound to image (to make it look better), we put image to sound (to make it sound better)? (I wish to ignore musical forms such as ballet, musicals etc. here.)

Cunningham: Bio-feedback will surely play a big part. It sounds so cumbersome and intrusive now but the technology will come along to let us do it in more discreet and passive ways. In the meantime the scope is there to research the human physiological responses to sounds of fear, joy, sadness, and so on. Improving computer models of emotion and AI engines will mean that the game can, in turn, adapt to the changing state of the player.

Silent rooms are interesting. I would classify Second Life as being something of a pseudo-game, but I frequently find myself turning the ambient noises and music down or off - as I find it easier to get a handle on what my avatar is doing and the interactions it is having with other characters and objects. Perhaps this comes back to the concept of "realism" in games. I think we can already produce game sound that is 80% or 90% realistic, especially using surround sound. I suppose what we need to consider is: is realism needed and what are the effects of it on game players?

If we do want realism, then the technology has to get better. I don't want 5 or 7 speakers dotted
round the walls of my living room (and neither does my missus!). Wavefield synthesis would be great too but it's not practical for the average man in the street. Do we need to go back to HRTF or are thin, wireless speakers the way to go?

Wilhelmsson: What would a future be like where, instead of putting sound to image (to make it look better), we put image to sound (to make it sound better)? (I wish to ignore musical forms such as ballet, musicals etc. here.) (Grimshaw)

One way to obtain this state would be to start the process of game design with the sound or at the very least let the sound designer become a part of the process from the very beginning. I would say that some ideas from moviemaking could come in handy. Pudovkin's ideas on asynchronism, for instance. Use the sound to communicate to the player what the images do not, and stress that invisible part as game play foundation. Maybe abandon the image as progenitor of the sound more or less totally. However, that is not likely to happen. A more likely development for the player experience of computer game sound would be less compressed audio files, which in turn might lead to a sound environment that has a more "natural" dynamic range. If the processing power for sound would keep up with or overshadow the processing power of graphics cards, we might have less of a need for high compression on the audio files. Would it not be nice to have multiple processors and 128 GB to just handle all the audio files in a game? What more? The player will probably benefit from more refined audio technology with directional sound. Why not continue the reconfiguration of our daily living environments yet another step and make full use of directed sound technology? Let the player experience the sound without disturbing others and not force her to use headphones. This sound strategy could very well have the positive side effect that the player might be forced to move around to hear all that there is to hear. In a world with new control devices, surprisingly little has been done on the side of audio technology, sadly enough.

Cunningham: I guess to answer the question directly, the player experience of computer game sound in the future will be one that is totally transparent, pervasive, and natural.

Alves: One way to obtain this state would be to start the process of game design with the sound or at the very least let the sound designer become a part of the process from the very beginning. (Wilhelmsson)

Yes, I do believe that's the way to go. Though there will always be the need for people with the technical ability to deal with sound, the real deal is to ensure that sound is explored in its depth. Not only as sonorization but also as part of meaningful components of the game where sound is relevant for the course of action and for the inhabitants of the gameworld.

To accomplish that, a sound designer (meaning a designer who is aware and attentive to sound potential) ought to be involved right from the start of the design process.

Yet, I do not believe that we should be looking for ways to ensure that a game will have sound. My understanding is that sound, as any other modality, should be subservient to the global communicative purposes of the game, let's say to the accomplishment of an emotional script. In that sense, ultimately, if the best use of sound in a particular game is to keep silence, so be it - that will be good sound design too.

Anyway, my point is that for the designer to be able to use sound whenever and however it is most appropriate, it is fundamental to be able
to design the game with such design decisions already present, in counterpoint to having the game already defined and trying to find the best way to fit or to wrap it with meaningful sound.

Alves: is realism needed and what are the effects of it on game players? … If we do want realism, then the technology has to get better. (Cunningham)

Realism is certainly interesting. Still, I tend to feel that the actual issue is the adequacy of sound to the "reality" of the gameworld (which can be very disparate from that of the real world). That is, the realism relative to the gameworld.

One important aspect of this inner realism, I believe, is coherence. Coherence with visuals, with physics, and among sounds. I mean, the whole setting should be believable (or at least it should not ruin the player's will to believe in the gameworld).

Alves: In addition to the contribution of sound to the enrichment of the gameworld, I'm also hoping that sound will contribute to making the act of playing a more enjoyable experience, per se. I mean that the use of sound, particularly as input, can change the way a player interacts with the game interface, with possible improvements in the overall experience.

I'm expecting that sound will become increasingly bidirectional. And that means that someday we will be sending all sorts of sound into the game. In turn, that seems to call for a much more active interpretation of the role of playing -- no more sitting still and mute, just clicking or pressing keys. That is, the actual behavior of the person who is playing, while playing, may change.

I also believe that roaring or making some kind of bizarre noise towards the game will tend to promote collective experiences. This is, of course, just a feeling, but I guess if I would have a game where I was supposed to do a lot of yelling, I would certainly guarantee a good laugh if I could use the help of some friends or family (I mean, in loco, not online).

It seems to me that a game interface that allows a more expressive activation is also more conducive to sharable experiences.

In turn, if there is some truth in this foresight, we are also talking about new opportunities in terms of game design. We may be willing to address more attentively the design of games that are meant to provide collective experiences.

Cunningham: A quick thought in regard to the last post by Alves, discussing the use of sound as a game input and the role playing of the game player. I've often thought it might be interesting to conduct a study of the type of speech and non-speech sounds made by games players during play. I've often found myself muttering, cursing, elating, and so on in response to the stresses, failures, and victories in the game scenario. If the game could respond to such utterances this could lead to an even more dynamic and interactive experience.

Droumeva: perhaps in 20 years, sound will be considered less important, characterised as an interference with game play. In the real world, sound within urban spaces is constantly being categorised as noise. If we continue on the path of highlighting postmodern soundscapes as noisy environments we justify silent virtual spaces, places to escape the sound of the real. (O Keeffe)

This made me think of a captivating user video I once saw on YouTube from a "walkthrough" of Grand Theft Auto: Liberty City. The player was moving his avatar and narrating his actions, and
basically he wanted to talk about the depth of exploration one can get to in GTA (a game that, as we all know, has had a reputation for being gratuitously violent), how rich the environment is, etc. He walked his character all the way away from the city - saying that "it was too noisy and busy" - and into "Central Park" in order to "enjoy some quiet nature sounds and peace of mind, away from it all" - it just struck me as a most curious simulacrum - finding the precious solace of "realism", of reprieve from a noisy environment, in a game - in the quiet natural soundscape of a game!

To me, that signifies one possible future for game sound - it will be more and more the "real" environment of young people as opposed to the real soundscapes of the noisy, urban, overcrowded off-line world. So design has to be conscious of that, absolutely; how - I'm not sure... mimic closely and thoroughly our surrounding acoustic soundscape, or foster completely imaginary worlds?

Grimshaw: I think foster completely imaginary worlds. It's the "otherness" of other environments that captivates players and I for one would see no reason to immerse myself in a world exactly the same as this one.

Hug: Following up on the initial question (and some related points made during the discussion)... I agree with Grimshaw that there is a strong possibility biofeedback will be used in some form to control game engine states, including real-time sound synthesis. However, I am not convinced that this will necessarily lead to improved player experience. The problem is that if players are aware of these mechanisms (and they surely will be, because advertisements, making-ofs and magazines will make a rave about it) this will already alter the way they approach the game. Usually, once we are aware of a certain level of control, we will try to subvert a given system. In the traditional narrative, the fearful sound works just because we have been "guided" toward it by the linear storyline (and the sensory experiences that accompany it, think "calm before the storm"). In the perspective of an ad-hoc modification of synthesis parameters it might well be that we constantly reflect the fact that our behavior has an impact on the events, which might make the actual events much less interesting.

A second issue is to deal with the nuances in perception, behavior, and the possible contrast between measured states and felt states. Biophysical excitement might have different causes, so altering the assumed cause might trigger the wrong feedback loops... But that's a subject of a lot of research anyway.

But I also see a huge chance for creative practice in such technologies. However, this seems to require the invention of new game genres. I think the traditional "narrative" approach to game design (more or less linear, storylines, quests) has a few merits (subtly changing the sounds based on the player's states, as described), but it would not be the most suitable approach to leverage the full potential.

The play with emotions in itself could become part of the game, and the player would have to use his self-control over emotional states for actively controlling aspects of the game. Imagine you are a virtual spy and have to trick a lie detector or an investigative detective... OK, that's more traditional narrative again, but what I want to say is: it might be worthwhile not to try to hide the system, the apparatus, from the player, but make it available to them as a tool for action.

How does this connect to sound? I strongly believe that there is a big potential in the possibility of linking a virtual sound world with the actions of the player. This may be partially controlled by biophysical monitoring systems, but I think at least
equally important are (physical) game controllers. And I mean not just Wii Motes, but the idea that everything can become a game controller in a "mixed reality". The interesting differential then (and it's this differential that is the most exciting to design) is between the player's actions and the sounds as manifestations of this action in the game world. So biophysical monitoring would not mainly be used to adjust the sensory output of the medium to alter the player's emotional state, but it would be used to give the player an additional channel of expression.

Imagine a game where players learn about the sonic behavior of virtual artifacts (and the way they have to handle them using their physical placeholders or, with Project Natal, body movements), where totally new and surprising action-sound relationships could be designed. And the mentioned input channel for speech and nonverbal expressions could play an important part in it. Remember the audio-gun from Dune?

Hug: Addressing Droumeva's point. There is always a fascination in the simulation of "reality", and actually I think part of the fascination comes from the knowledge that you are not "out there" but sitting at home in your full immersion suit listening to binaural soundscapes. I think this will always have its place and justification.

But on the other hand, I think Grimshaw is right. It's the "otherworld" that we seek to flee into. This otherworld certainly is composed of familiar elements but deconstructs them and surprises us with the unexpected.

In general I think that creative and more sustainable potential lies in the definition of new aesthetics rather than simulation of the familiar and "real". I think game sound could take an example from how film sound was pushed into a media language of its own, establishing design strategies that have become kind of "naturalized" and are inherently part of the aesthetics of the medium. Game sound thus should explore new directions and for that we need people (artists?) who abolish preconceptions and just try out crazy stuff.

I think no one can say exactly what a game sound aesthetics will or should be like, but we can say that we have to explore unorthodox paths and, eventually, a new "language" will emerge. I also think that the directions in which such an exploration could go can be derived from some genuine qualities of the medium. Think about the idea of "montage" in film, which was one of the strongest catalysts for audiovisual innovation. In games, it is maybe not so much about audio-visual montage, but about action-feedback montage: de-constructing familiar action-feedback loops and creating new ones.

Another field which is prone to artistic exploration is the "diegesis" of game sound, as it seems very unclear where diegesis starts and ends in a medium where a narrative is not passively consumed but actively co-created as player experience. Film sound has developed a great variety of ways to establish or support diegesis, as well as how to integrate non-diegetic sound to serve a narrative. In game sound this is still terra incognita to a large extent, in particular if we look at genres with "low narrativity".

Grimshaw: Certainly the player may subvert the system I propose and that might be part of the fun (and would all players be aware of the possibility and, even if they were, would the apparatus recede into the background with familiarity and the need to play the game in order to reach the desired outcome?). However, why not have the game subvert the player? The sound engine need not slavishly mimic the fear of the player; for example, it could do the reverse and stubbornly refuse to help that emotion along until the player is lulled into a false sense of security and then....!

I like the idea of having to use emotion to navigate the game (and this feeds into more than just sound). You're right, Daniel [Hug], such a game would probably require a new genre (can I bag the name first as "emotive gaming"!) where the game itself emotionally engages with the player, becomes a character itself.

Hug: However, why not have the game subvert the player? The sound engine need not slavishly mimic the fear of the player; for example, it could do the reverse and stubbornly refuse to help that emotion along until the player is lulled into a false sense of security and then....! (Grimshaw)

Well, I doubt this degree of control over the player's emotions can ever be achieved, simply because of the ambiguity of interpretation. It's maybe a bit like with psychoactive drugs: for some a dream, for others a horror trip (or even for the same person under different circumstances). This is why I think the power of biofeedback should maybe be seen as another channel of agency for the player rather than hiding it.

Hug: My vision of a tool for game sound design: a hybrid foley box which seamlessly integrates physical modeling and all kinds of synthesis methods as well as real-world recordings and re-synthesis. The most important feature will be a sonic pipette: just grab a sonic residue somewhere, drop it into a placeholder object, combine it with "sonic drops" from other sources, including virtual ones drawn from physical models, create an envelope in real time by singing into it and then define a set of mapping criteria to attach it to an entity of your game world (objects, NPCs, avatars) and play around with the object and its sounds in realtime in the game world. If anyone would like to help develop this, let me know!

This actually also points to a question related to the future of game audio in general, if procedural methods really are the future: at which point does the actual sound design take place? Will we be merely adjusting parameters of simulated physical entities? How do we achieve the magic and the "bigger than life" effect and surprising, new sounds, if everything is controlled by a "realistic" simulation engine? Is there a way to combine the strengths of procedural audio with old-fashioned compositional sound design? Well, my idea of a hybrid foley box would maybe be a way to join these worlds...

Grimshaw: Well I doubt this degree of control over the player's emotions can ever be achieved, simply because of the ambiguity of interpretation. It's maybe a bit like with psychoactive drugs: for some a dream, for others a horror trip (or even for the same person under different circumstances). This is why I think the power of biofeedback should maybe be seen as another channel of agency for the player rather than hiding it. (Hug)

I can dream.... Certainly a lot of work needs to be done: fundamental research into mapping emotion/affect to sound parameters, not to mention precise and accurate measurement of such emotion/affect. It may well be that it's a long time before we move beyond the blunt tool of mere positive/negative valence and are able to precisely identify fear as opposed to anxiety, for example.

However, (once that research is well under way) personal emotion profiles for individuals could be stored, taken from "set-up" measurements, and this would allow more precise targeting of individuals.
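The kind of emotion-to-sound-parameter mapping Grimshaw sketches could, purely illustratively, look like a function from a normalised arousal reading to synthesis settings. Every parameter name and scaling factor below is hypothetical; the point is only the shape of the idea, not a real measurement pipeline:

```python
def fear_params(arousal: float) -> dict:
    """Map a normalised arousal reading (0.0 = calm, 1.0 = panic) to
    hypothetical synthesis parameters for a 'fear controller'.
    Out-of-range sensor values are clamped before mapping."""
    a = max(0.0, min(1.0, arousal))
    return {
        "pitch_jitter_hz": 2.0 + 10.0 * a,   # more tremor as arousal rises
        "brightness": 0.3 + 0.6 * a,         # duller -> harsher timbre
        "attack_s": 0.5 * (1.0 - a),         # shorter attacks when tense
        "npc_tremor_depth": 0.8 * a,         # worried tremor on NPC speech
    }
```

An "emotion governor" of the sort mentioned earlier would simply invert the input (feed in 1.0 - a) once arousal crosses a safety threshold.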

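Hug's question about combining procedural audio with old-fashioned compositional sound design might, at its very simplest, amount to reshaping recorded material with a procedurally computed envelope. The function below is a toy sketch (sample frames as plain floats, an arbitrary linear-attack/exponential-decay envelope), not a proposal for how the hybrid foley box would actually work:

```python
import math

def procedural_envelope(sample, attack_frac=0.2):
    """Reshape a recorded sample (a list of float frames) with a
    synthetic envelope computed per frame: a linear fade-in over the
    attack portion, then an exponential decay to the end."""
    n = len(sample)
    attack = max(1, int(n * attack_frac))
    out = []
    for i, frame in enumerate(sample):
        if i < attack:
            gain = i / attack  # linear fade-in
        else:
            gain = math.exp(-3.0 * (i - attack) / max(1, n - attack))  # decay
        out.append(frame * gain)
    return out
```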
Grimshaw: This actually also points to a question related to the future of game audio in general, if procedural methods really are the future: at which point does the actual sound design take place? Will we be merely adjusting parameters of simulated physical entities? How do we achieve the magic and the "bigger than life" effect and surprising, new sounds, if everything is controlled by a "realistic" simulation engine? Is there a way to combine the strengths of procedural audio with old-fashioned compositional sound design? Well, my idea of a hybrid foley box would maybe be a way to join these worlds... (Hug)

Now there's an interesting question. Anyone?

Droumeva: In general I think that creative and more sustainable potential lies in the definition of new aesthetics rather than simulation of the familiar and "real". I think game sound could take an example from how film sound was pushed into a media language of its own, establishing design strategies that have become kind of "naturalized" and are inherently part of the aesthetics of the medium. Game sound thus should explore new directions and for that we need people (artists?) who abolish preconceptions and just try out crazy stuff. (Hug)

I completely agree, actually; I got wrapped up in making a point about the experiences of "reality" which, I believe, will still be an important social experience in gaming, albeit - agreed with Grimshaw and Hug - that gaming is more about a different reality than re-immersing into a nostalgic version of past realities (whatever it is that those "alternate realities" end up being). And I do retract my previous implication that "reality" should be somehow integrated into game sound design, or be a design principle - I meant it strictly as an important cultural byproduct and social experience.

I love the idea of biofeedback and I do see that coming in a more mainstream way into gaming in the 10-year prediction range - and hopefully by then game sound will operate with a new and improved "media language" - crazy artists would have made their mark on the interactional mappings between game sound and game input, as well as the structure and mechanics of gaming, period - so biofeedback might literally control the soundscape or at least the avatar's own soundmaking in the game (assuming a narrative structure once again, I know) as a mechanic. I think along with biofeedback, I see remote networking tangible controllers - things that can send sensations like touch, temperature, pressure, perhaps sound and breath, vibration, rhythm, to a remote player. I also like the idea of players being made to understand more how game sound is synthesized and be able to take part in that process more actively, though this point makes me wonder if the future of gaming is all about "opening up" the programmatic side of games and making players-as-producers - I don't know if that might result in really bland, generic game structures that are the "blank slate" upon which players build up game worlds and game feedback. That said, I can definitely see, within a year even, game sound being customizable - i.e. players being allowed to upload their own sound effects to each game, and thus construct their own soundscapes.

But regarding the general question of the future of game sound - obviously related tightly to the future of game genres and game mechanics - I also see a rise in "lifestyle gaming" and "human computation". Lifestyle gaming I'd call things like Wii Fit, Brain Age, the multitude of "games" that are essentially utility applications thinly veiled as games. Game sound is bound to be affected by this shift. Thinking specifically of biofeedback, I can definitely see it being used in "lifestyle games" for anti-anxiety, meditation, stress control, etc. and
that would entail different (perhaps more secondary, limited, or perhaps more information-based/driven) uses of game sound than entertainment games - driven by a quest for playfulness, fun and challenge.

Human computation - the use of gaming structures for humans to do actual work - may be fringe now, but I see it rising with trends like education technology (I myself am in that field, somewhat…) and I can’t help but think game sound - its potentials for fun and playfulness - might suffer, should pragmatics over-ride aesthetics and playfulness. Just throwing this out there...

Grimshaw: I don’t think game sound will suffer. Pragmatics might be needed should game structures broaden their reach into non-gaming areas because such areas are not intended to be games and therefore do not need (necessarily) game aesthetics and playfulness.

Grimshaw: With regard to Droumeva, you brought up the idea of network tangible controllers. What about extending this to incorporate biofeedback in a multi-player system? We’ve already discussed using a player’s psychophysiology to affect/effect sound in the game and, due to the nature of networked games, assuming the new sound then has a feedback effect upon the player, this will probably have an effect upon gameplay and other players. So far, the biofeedback sound is only heard by the one player -- could the parameters used to drive the sound synthesis/processing also be sent via the network to other players’ audio engines? In a horror game, can players then sense the fear of others?

Hug: There is something in this discussion which strongly reminds me of mid-to-late nineties discussions about full immersion cyberspace, telepresence, body-suits, etc. This vision might now actually become technically feasible.

But just to give this a twist into a slightly different direction: What if in the future (which actually has already begun in some ways) there is no “closed system” of gaming anymore? No dedicated software and hardware interfaces? When gaming is pervasive, wherever you are, where shopping for food becomes a quest for the one milk bottle which contains the key to level 92? When your CO2 footprint directly links to your avatar’s stats and to bonus programmes offered by a green power syndicate?

What of sound, then, being neither a reflection of a (constructed) reality nor the expression of a separate, self-referential aesthetic system (“game sound aesthetics”, think 8-bit...), but an element in a hybrid, electroacoustic soundscape? I’m doing some extensive research on sound design for interactive artifacts for everyday use and there I constantly run into this question. In this scenario there is no distinguished system of aesthetic codes anymore as we know it from film and today’s games, there is no entering or leaving a specific application, environment, cinema, game, etc., there is just a constant “multilayeredness” of presence and agency - or maybe a constant switching between presences. And this poses fundamental questions about what might be suitable (sound) design strategies. How do we combine, merge, juxtapose, subvert the “naturally occurring” physical sounds, and the sounds of a pervasive gaming system? And how do we integrate these sonic events into the socio-cultural fabric of everyday life?

This sounds maybe far out, but then again, it is happening already. Do we need to investigate not only the “acoustic ecology” of the game, as pointed out by Grimshaw in his work, but an acoustic ecology of our game-lives?

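Grimshaw’s question above, whether the parameters driving one player’s biofeedback sound could also be streamed to other players’ audio engines, can be made concrete with a small sketch. Everything below is illustrative and assumed rather than taken from any existing game or engine: the biometric-to-parameter mappings, the value ranges, and the JSON datagram format are all hypothetical.

```python
# Illustrative sketch only: maps one player's biometrics to sound-synthesis
# parameters and packs them for broadcast to other players' audio engines.
# The mappings, ranges, and payload format are assumptions, not a real API.
import json

def fear_params(heart_rate_bpm, skin_conductance_us,
                rest_hr=60.0, max_hr=180.0):
    """Normalise raw biometrics to a 0..1 'arousal' value and derive
    parameters a peer's audio engine could apply to a shared fear layer."""
    arousal = min(max((heart_rate_bpm - rest_hr) / (max_hr - rest_hr), 0.0), 1.0)
    return {
        "arousal": round(arousal, 3),
        # higher arousal: a brighter, faster-pulsing drone on the peer's side
        "filter_cutoff_hz": 200.0 + 4800.0 * arousal,
        "tremolo_rate_hz": 0.5 + 7.5 * arousal,
        "reverb_wet": round(0.2 + 0.6 * min(skin_conductance_us / 20.0, 1.0), 3),
    }

def encode_for_network(player_id, params):
    """Serialise to a compact JSON payload (e.g. one UDP datagram)."""
    return json.dumps({"player": player_id, "params": params}).encode("utf-8")

def decode_on_peer(datagram):
    """A peer's audio engine unpacks the payload before applying the values."""
    return json.loads(datagram.decode("utf-8"))
```

A receiving engine might route such values to a low drone mixed under the local soundscape, so that in a horror game a teammate’s rising heart rate becomes literally audible: sensing the fear of others, as Grimshaw puts it.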

Or, might it be possible that in the end, humans will always prefer to experience the transition between systems, to know when they are inside or outside a game, work, “real life”?... Where the power switch (and the “mute” button) is?

Alves: hybrid, electroacoustic soundscape (...) might it be possible that in the end, humans will always prefer to experience the transition between systems, to know when they are inside or outside a game, work, “real life”?... Where the power switch (and the “mute” button) is? (Hug)

In such a scenario: no switch, please... We are entitled to a playful life!

I guess if we are able to enhance any aspect of our lives then it should become legitimately persistent (just as shelter, clothes, food, and education). We would deal with ‘new sound’ the same way we currently deal with other sounds we are able to control: should a sound become inconvenient in any particular circumstance, we would behave in a way that it would not happen.

ENDNOTES

1. Collins, K. (2008). Game sound: An introduction to the history, theory, and practice of video game music and sound design. Cambridge, MA: MIT Press. p. 9.


Compilation of References

(1990). Wing commander [Computer game]. Austin, TX: Origin Systems.

(1995). Super Mario Bros [Computer game]. Redmond, WA: Nintendo.

(2002). Lucky Larry’s Lobstermania [Computer game]. Reno, NV: IGT.

(2008). Emily Project. Santa Monica, CA: Image Metrics, Ltd.

(2008). Faceposer [Facial Animation Tool as Part of Source SDK]. Bellevue, WA: Valve Corporation.

(2008). Warrior Demo. Santa Monica, CA: Image Metrics, Ltd.

(2010). World of warcraft [Computer game]. Reno, NV: Blizzard Entertainment.

Aarseth, E. (2008). A hollow world: World of Warcraft as spatial practice. In Corneliussen, H., & Rettberg, J. W. (Eds.), Digital culture, play and identity: A World of Warcraft reader. Cambridge, MA: MIT Press.

Aarseth, E. (2003, August). Playing research: Methodological approaches to game analysis. Paper presented at the Digital Arts and Cultures Conference, DAC2003, Melbourne, Australia.

Aarseth, E. (2005). Doors and perception: Fiction vs. simulation in games. In Proceedings of 6th Digital Arts and Culture Conference 2005.

Aav, S. (2005). Adaptive music system for DirectSound. Unpublished master’s thesis. University of Linköping, Sweden.

Adams, M. (2009). Hearing the city: Reflections on soundwalking. Qualitative Research, 10, 6–9.

Adams, M., Cox, T., Moore, G., Croxford, B., Refaee, M., & Sharples, S. (2006). Sustainable soundscapes: Noise policy and the urban experience. Urban Studies (Edinburgh, Scotland), 43(13), 2385. doi:10.1080/00420980600972504

Adler, S. (2002). The study of orchestration (3rd ed.). New York: Norton & Company.

Adorno, T. W., & Eisler, H. (1947). Composing for the films. New York: Oxford University Press.

Adrien, J. M. (1991). The missing link: Modal synthesis. In De Poli, G., Piccialli, A., & Roads, C. (Eds.), Representations of music signals (pp. 269–298). Cambridge, MA: MIT Press.

Agarwal, R., & Karahanna, E. (2000). Time flies when you’re having fun: Cognitive absorption and beliefs about information technology usage. Management Information Systems Quarterly, 24(4), 665–694. doi:10.2307/3250951

Alais, D., & Blake, R. (1999). Neural strength of visual attention gauged by motion adaptation. Nature Neuroscience, 2(11), 1015–1018. doi:10.1038/14814

Alloy, L., Abramson, L., & Viscusi, D. (1981). Induced mood and the illusion of control. Journal of Personality and Social Psychology, 41, 1129–1140. doi:10.1037/0022-3514.41.6.1129

Alone in the Dark. (2008). Eden Games.

Alone in the dark [Computer game]. (1992). Infogrames (Developer). Villeurbanne: Infogrames.

Copyright © 2011, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.

Alone in the dark: Inferno [Computer game]. (2008). Eden Games S.A.S. (Developer). New York: Atari.

Alone in the dark: The new nightmare [Computer game]. (2001). DarkWorks (Developer). Villeurbanne: Infogrames.

Altman, R. (1992). Sound theory sound practice. London: Routledge.

Altman, R. (1992). General introduction: Cinema as event. In Altman, R. (Ed.), Sound theory, sound practice (pp. 1–14). New York: Routledge.

Altman, R. (1992). Cinema as event. In Altman, R. (Ed.), Sound theory sound practice. New York: Routledge.

Alves, V., & Roque, L. (2009b). Notes on adopting auditory guidelines in a game design case. In Veloso, A., Roque, L., & Mealha, O. (Eds.), Proceedings of Videojogos2009 - Conferência de Ciências e Artes dos Videojogos. Aveiro, Portugal.

Alves, V., & Roque, L. (2009a). A proposal of soundscape design guidelines for user experience enrichment. In Proceedings of the 4th Conference on Interaction with Sound, Audio Mostly 2009 (pp. 27-32). Glasgow, UK.

AM3D (2009). AM3D [Computer software]. AM3D A/S (Developer). Aalborg, Denmark.

Amdel-Meguid, A. A. (2009). Causing fear and anxiety through sound design in video games. Unpublished master’s thesis. Southern Methodist University, Dallas, Texas, USA.

Amsel, A. (1962). Frustrative nonreward in partial reinforcement and discrimination learning: Some recent history and a theoretical extension. Psychological Review, 69(4), 306–328. doi:10.1037/h0046200

Anderson, G., & Brown, R. I. T. (1984). Real and laboratory gambling, sensation-seeking and arousal. The British Journal of Psychology, 75(3), 401–410.

Anderson, J. D. (1996). The reality of illusion: An ecological approach to cognitive film theory. Carbondale, IL: Southern Illinois University Press.

Anderson, P. W. S. (2002). Resident evil [Motion picture]. Munich, Germany: Constantin Film.

Angus, J. A. S., & Caunce, A. (2010). A GPGPU approach to improved acoustic finite difference time domain calculations. AES 128 (7963), London, UK.

Aquaria. (2007). Ambrosia Software.

Aronofsky, D. (1998). Pi. Harvest Filmworks.

Arons, B. (1992, July). A review of the cocktail party effect. Journal of the American Voice I/O Society, 12, 35-50.

Arsenault, D., & Perron, B. (2009). In the frame of the magic cycle: The circle(s) of gameplay. In Perron, B., & Wolf, M. J. P. (Eds.), The video game theory reader 2 (pp. 109–132). New York: Routledge.

Arsenault, D., & Picard, M. (2008). Le jeu vidéo entre dépendance et plaisir immersif: les trois formes d’immersion vidéoludique. Proceedings of HomoLudens: Le jeu vidéo: un phénomène social massivement pratiqué (pp. 1-16). Retrieved from http://www.homoludens.uqam.ca/index.php?option=com_content&task=view&id=55&Itemid=63.

Ashcraft, B. (2008). How gaming is surpassing the Uncanny Valley. Kotaku. Retrieved April 7, 2009, from http://kotaku.com/5070250/how-gaming-is-surpassing-uncanny-valley.

Ashmed, D. H., & Wall, R. S. (1999). Auditory perception of walls via spectral variations in the ambient sound field. Journal of Rehabilitation Research and Development, 36(4).

Assassin’s Creed 2. (2009). Ubisoft Montreal. Ubisoft.

Association for Computing Machinery. (2010). ACM computing classification system. New York: ACM. Retrieved February 4, 2010, from http://www.acm.org/about/class/.

Atkinson, D. (2009). Lip sync (lip synchronization animation). Retrieved July 29, 2009, from http://minyos.its.rmit.edu.au/aim/a_notes/anim_lipsync.html.


Atwater, F. (1997). Inducing altered states of consciousness with binaural beat technology. In Proceedings of the Eighth International Symposium on New Science (pp. 11-15). Fort Collins, CO: International Association for New Science.

Aucouturier, J. J., & Pachet, F. (2002). Scaling up music playlist generation. In Proceedings of the IEEE International Conference on Multimedia Expo.

Audio Engineering Society. (2009). AES 35th international conference: Audio for games. Journal of the Audio Engineering Society. Audio Engineering Society, 57(4), 254–261.

Audiosurf [Video game]. (2008). Dylan Fitterer (Developer). Bellevue, WA: Valve Corporation (Steam).

Augoyard, J. F., & Torgue, H. (Eds.). (2005). Sonic experience: A guide to everyday sounds. Montreal, Canada: McGill-Queens University Press.

Augoyard, J., & Torgue, H. (2006). Sonic experience: A guide to everyday sounds (illustrated ed.). Montreal, Canada: McGill-Queen’s University Press.

Avanzini, F. (2008). Interactive sound. In Polotti, P., & Rocchesso, D. (Eds.), Sound to sense, sense to sound – A state of the art in sound and music computing (pp. 345–396). Berlin: Logos Verlag.

Avanzini, F. (2001). Computational issues in physically-based sound models. Unpublished doctoral dissertation. University of Padova, Italy.

Back, M. (1996). Micro-narratives in sound design: Context, character, and caricature in waveform manipulation. In Proceedings of the 3rd International Conference on Auditory Display.

Bailenson, J. N., Swinth, K. R., Hoyt, C. L., Persky, S., Dimov, A., & Blascovich, J. (2005). The independent and interactive effects of embodied-agent appearance and behavior on self-report, cognitive, and behavioral markers of copresence in immersive virtual environments. Presence (Cambridge, Mass.), 14(4), 379–393. doi:10.1162/105474605774785235

Ballas, J. A. (1994). Delivery of information through sound. In Kramer, G. (Ed.), Auditory display: Sonification, audification, and auditory interfaces (pp. 79–94). Reading, MA: Addison-Wesley.

Bannister, D., & Mair, J. M. M. (1968). The evaluation of personal constructs. London: Academic Press.

Barlow, D. H. (1988). Anxiety and its disorders: The nature and treatment of anxiety and panic. New York: Guilford Press.

Barr, P. (2008). Video game values: Play as human-computer interaction. Unpublished doctoral dissertation. Victoria University of Wellington, New Zealand.

Bartneck, C., Kanda, T., Ishiguro, H., & Hagita, N. (2009). My robotic doppelganger—A critical look at the Uncanny Valley theory. In Proceedings of the 18th IEEE International Symposium on Robot and Human Interactive Communication, RO-MAN2009, 269-276.

Bateman, C. (2009). Beyond game design: Nine steps towards creating better videogames. Boston: Charles River Media.

Bateman, C., & Boon, R. (2006). 21st century game design. Boston: Charles River Media.

Battaglia, P. W., Jacobs, R. A., & Aslin, R. N. (2003). Bayesian integration of visual and auditory signals for spatial localization. Journal of the Optical Society of America, 20(7), 1391–1397. doi:10.1364/JOSAA.20.001391

Battle of the bands. (2008). Planet Moon Studios.

Beauchamp, R. (2005). Designing sound for animation. Burlington, MA: Elsevier.

Beck, D. (2000). In Boulanger, R. (Ed.), Designing acoustically viable instruments in Csound. The Csound book: Perspectives in software synthesis, sound design and signal processing (p. 155). Cambridge, MA: MIT Press.

Beentjes, J. W. J., Van Oordt, M., & Van Der Voort, T. H. A. (2002). How television commentary affects children’s judgments on soccer fouls. Communication Research, 29, 31–45. doi:10.1177/0093650202029001002


Beerends, J. G., & De Caluwe, F. E. (1999). The influence of video quality on perceived audio quality and vice versa. Journal of the Audio Engineering Society. Audio Engineering Society, 47(5), 355–362.

Begault, D. R., & Wenzel, E. M. (1993). Headphone localization of speech. Human Factors, 35, 361–376.

Benson, D. J. (2007). Music: A mathematical offering. Cambridge: Cambridge University Press.

Bentley, T., Johnston, L., & von Baggo, K. (2005). Evaluation using cued-recall debrief to elicit information about a user’s affective experiences. In T. Bentley, L. Johnston, & K. von Baggo (Eds.), Proceedings of the 17th Australian conference on Computer-Human Interaction (pp. 1-10). New York: ACM.

Berndt, A. (2008). Liturgie für Bläser (2nd ed.). Halberstadt, Germany: Musikverlag Bruno Uetz.

Berndt, A. (2011). Diegetic music: New interactive experiences. In Grimshaw, M. (Ed.), Game Sound Technology and Player Interaction: Concepts and Developments. Hershey, PA: IGI Global.

Berndt, A., & Theisel, H. (2008). Adaptive musical expression from automatic real-time orchestration and performance. In Spierling, U., & Szilas, N. (Eds.), Interactive Digital Storytelling (ICIDS) 2008 (pp. 132–143). Erfurt, Germany: Springer. doi:10.1007/978-3-540-89454-4_20

Berndt, A. (2009). Musical nonlinearity in interactive narrative environments. In G. Scavone, V. Verfaille & A. da Silva (Eds.), Proceedings of the Int. Computer Music Conf. (ICMC) (pp. 355-358). Montreal, Canada: International Computer Music Association, McGill University.

Berndt, A., & Hähnel, T. (2009). Expressive musical timing. In Proceedings of Audio Mostly 2009: 4th Conference on Interaction with Sound (pp. 9-16). Glasgow, Scotland: Glasgow Caledonian University, Interactive Institute/Sonic Studio Piteå.

Berndt, A., Hartmann, K., Röber, N., & Masuch, M. (2006). Composition and arrangement techniques for music in interactive immersive environments. In Proceedings of Audio Mostly 2006: A Conference on Sound in Games (pp. 53-59). Piteå, Sweden: Interactive Institute/Sonic Studio Piteå.

Bethesda Game Studios (Developer). (2006). The Elder Scrolls IV: Oblivion [Computer game]. 2K Games & Bethesda Softworks.

Beverland, M., Lim, E. A. C., Morrison, M., & Terziovski, M. (2006). In-store music and consumer–brand relationships: Relational transformation following experiences of (mis)fit. Journal of Business Research, 59, 982–989. doi:10.1016/j.jbusres.2006.07.001

Biedermann, I., & Vessel, E. A. (2006). Perceptual pleasure and the brain. American Scientist, 94(May-June), 247–253.

Bijsterveld, K. (2008). Mechanical sound: Technology, culture, and public problems of noise in the twentieth century. Cambridge, MA: MIT Press.

Bijsterveld, K. (2004). The diabolical symphony of the mechanical age: Technology and symbolism of sound in European and North American noise abatement campaigns, 1900-40. In Back, L., & Bull, M. (Eds.), The auditory culture reader (1st ed., pp. 165–190). Oxford, UK: Berg.

Bilbao, S. (2006). Fast modal synthesis by digital waveguide extraction. IEEE Signal Processing Letters, 13(1), 1–4. doi:10.1109/LSP.2005.860553

Bilbao, S. (2009). Numerical sound synthesis: Finite difference schemes and simulation in musical acoustics. Chichester, England: John Wiley and Sons.

BioShock. (2007). Irrational Games.

Blauert, J. (2001). Spatial hearing: The psychophysics of human sound localization (3rd ed.). Cambridge, MA: MIT Press.

Blesser, B., & Salter, L. (2009). Spaces speak, are you listening?: Experiencing aural architecture. Cambridge, MA: MIT Press.


Blueberry garden. (2009). Erik Svedäng.

Blumer, H. (1986). Symbolic interactionism. Berkeley: University of California Press.

Boillat, A. (2009). La «diégèse» dans son acception filmologique. Origine, postérité et productivité d’un concept. Cinémas Journal of Film Studies, 19(2-3), 217–245.

Bolivar, V. J., Cohen, A. J., & Fentress, J. C. (1994). Semantic and formal congruency in music and motion pictures: Effects on the interpretation of visual action. Psychomusicology, 13, 28–59.

Bordwell, D. (1986). Narration in the fiction film. London: Routledge.

Bordwell, D., & Thompson, K. (1994). Film history: An introduction. New York: McGraw-Hill.

Bordwell, D., & Thompson, K. (2004). Film art: An introduction (7th ed.). New York: McGraw-Hill.

Boucsein, W. (1992). Electrodermal activity. New York: Plenum Press.

Braasch, J. (2005). Modelling of binaural hearing. In Blauert, J. (Ed.), Communication acoustics (pp. 75–108). Berlin: Springer Verlag. doi:10.1007/3-540-27437-5_4

Bradley, I. L. (1971). Repetition as a factor in the development of musical preferences. Journal of Research in Music Education, 19(3), 295–298. doi:10.2307/3343764

Bradley, M. M., Codispoti, M., Cuthbert, B. N., & Lang, P. J. (2001). Emotion and motivation I: Defensive and appetitive reactions in picture processing. Emotion (Washington, D.C.), 1(3), 276–298. doi:10.1037/1528-3542.1.3.276

Bradley, M. M., & Lang, P. J. (2000). Affective reactions to acoustic stimuli. Psychophysiology, 37, 204–215. doi:10.1017/S0048577200990012

Bradley, M. M., & Lang, P. J. (2007). Emotion and motivation. In Cacioppo, J. T., Tassinary, L. G., & Berntson, G. G. (Eds.), Handbook of psychophysiology (3rd ed., pp. 581–607). New York: Cambridge University Press. doi:10.1017/CBO9780511546396.025

Brainpipe. (2008). Digital Eel.

Brandon, A. (2004). Audio for games: Planning, process, and production. Berkeley, CA: New Riders Games.

Branigan, E. (1992). Narrative comprehension and film. London: Routledge.

Bregman, A. S. (1990). Auditory scene analysis: The perceptual organization of sound. London: MIT Press.

Bregman, A. S. (1992). Auditory scene analysis: Listening in complex environments. In McAdams, S. E., & Bigand, E. (Eds.), Thinking in sound (pp. 10–36). New York: Clarendon Press/Oxford University Press.

Brenton, H., Gillies, M., Ballin, D., & Chatting, D. J. (2005, September 5). The Uncanny Valley: Does it exist? Paper presented at the HCI 2005, Animated Characters Interaction Workshop, Napier University, Edinburgh, UK.

Bresson, R. (1985). Notes on sound. In Weis, E., & Belton, J. (Eds.), Film sound: Theory and practice. New York: Columbia University Press.

Brewster, S. A. (1994). Providing a structured method for integrating non-speech audio into human-computer interfaces. Unpublished doctoral dissertation. University of York, Heslington, UK.

Bridgett, R. (2006). Audible words, pt. 2: Updating the state of critical writing in game sound. Gamasutra. Retrieved February 6, 2009, from http://www.gamasutra.com/features/20060831/audio_03.shtml

Bridgett, R. (2007a). Designing a next-gen game for sound. Gamasutra. Retrieved February 13, 2009, from http://www.gamasutra.com/view/feature/2321/designing_a_nextgen_game_for_sound.php

Bridgett, R. (2007b). Interactive ambience. Game Developer Magazine. Retrieved May 8, 2009, from http://www3.telus.net/public/kbridget/aural_fixation_april07.jpg

Bridgett, R. (2009a). The future of game audio: Is interactive mixing the key? Gamasutra. Retrieved May 3, 2009, from http://www.gamasutra.com/view/feature/4025/


Brodmann, K. (1909). Vergleichende Lokalisationslehre der Grosshirnrinde in ihren Prinzipien dargestellt auf Grund des Zellenbaues. Leipzig, Germany: Johann Ambrosius Barth Verlag.

Brown, R. I. F. (1986). Arousal and sensation-seeking components in the general explanation of gambling and gambling addictions. Substance Use & Misuse, 21(9), 1001–1016. doi:10.3109/10826088609077251

Brown, E., & Cairns, P. (2004). A grounded investigation of game immersion. In Dykstra-Erickson, E., & Tscheligi, M. (Eds.), CHI ’04 extended abstracts (pp. 1297–1300). New York: ACM.

Brown, A. R., Wooller, R. W., & Kate, T. (2007). The morphing table: A collaborative interface for musical interaction. In A. Riddel & A. Thorogood (Eds.), Proceedings of the Australasian Computer Music Conference (pp. 34-39). Canberra, Australia.

Browning, T. (Producer/Director). (1931). Dracula [Motion picture]. England: Universal Pictures.

Bruyns, C. (2006). Modal synthesis for arbitrarily shaped objects. Computer Music Journal, 30(3), 22–37. doi:10.1162/comj.2006.30.3.22

Bryant, J., Brown, D., Comisky, P. W., & Zillmann, D. (1982). Sports and spectators: Commentary and appreciation. The Journal of Communication, 32(1), 109–119. doi:10.1111/j.1460-2466.1982.tb00482.x

Bryant, J., Comisky, P., & Zillmann, D. (1982). Drama in sports commentary. The Journal of Communication, 27(3), 140–149. doi:10.1111/j.1460-2466.1977.tb02140.x

Bryman, A. (2008). Social research methods (3rd ed.). Oxford, UK: Oxford University Press.

Bugelski, B. R., & Alampay, D. A. (1961). The role of frequency in developing perceptual sets. Canadian Journal of Psychology, 15(4), 201–211. doi:10.1037/h0083443

Bull, M. (2000). Sounding out the city: Personal stereos and the management of everyday life. Oxford, UK: Berg.

Bull, M., & Back, L. (2004). The auditory culture reader (1st ed.). Oxford, UK: Berg.

Bullerjahn, C., & Güldenring, M. (1994). An empirical investigation of effects of film music using qualitative content analysis. Psychomusicology, 13, 99–118.

Burgess, D. (1992). Techniques for low cost spatial audio. In Proceedings of the 5th annual ACM symposium on User interface software and technology.

Bushman, B. J., & Anderson, C. A. (2002). Violent video games and hostile expectations: A test of the General Aggression Model. Personality and Social Psychology Bulletin, 28(12), 1679–1686. doi:10.1177/014616702237649

Busso, C., & Narayanan, S. S. (2006). Interplay between linguistic and affective goals in facial expression during emotional utterances. In Proceedings of 7th International Seminar on Speech Production, 549-556.

Cabrera Paz, J., & Schwartz, T. B. M. (2009). Techno-cultural convergence: Wanting to say everything, wanting to watch everything. Popular Communication: The International Journal of Media and Culture, 7(3), 130.

Cacioppo, J. T., Tassinary, L. G., & Berntson, G. G. (2007). Handbook of psychophysiology (3rd ed.). Cambridge, UK: Cambridge University Press. doi:10.1017/CBO9780511546396

Cacioppo, J. T., Berntson, G. G., Larsen, J. T., Poehlmann, K. M., & Ito, T. A. (2004). The psychophysiology of emotion. In Lewis, M., & Haviland-Jones, J. M. (Eds.), Handbook of emotions (2nd ed., pp. 173–191). New York: Guilford Press.

Caillois, R. (2001). Man, play and games. Chicago: University of Illinois Press.

Cakewalk. (1983). Commavid.

Calleja, G. (2007). Digital games as designed experience: Reframing the concept of immersion. Unpublished doctoral dissertation. Victoria University of Wellington, New Zealand.

Calleja, G. (2007). Revising immersion: A conceptual model for the analysis of digital game involvement. In Proceedings of Situated Play, DiGRA 2007 Conference, 83-90.


Cameron, J. (1984). The terminator. Pacific Western.

Cameron, J. (1991). Terminator 2: Judgement day. Pacific Western.

Cameron, J. (Director). (2009). Avatar [Motion picture]. Los Angeles, CA: 20th Century Fox. Lightstorm Entertainment, Dune Entertainment, Ingenious Film Partners [Studio].

Cancellaro, J. (2006). Exploring sound design for interactive media. Clifton Park, NY: Thomson Delmar Learning.

Cannon, W. B. (1927). The James-Lange theory of emotions: A critical examination and an alternative theory. The American Journal of Psychology, 39(1/4), 106–124. doi:10.2307/1415404

Cao, Y., Faloutsos, P., Kohler, E., & Pighin, F. (2004). Real-time speech motion synthesis from recorded motions. In R. Boulic & D. K. Pai (Eds.), Eurographics/ACM SIGGRAPH Symposium on Computer Animation (2004), 345-353.

Carnagey, N. L., Anderson, C. A., & Bushman, B. J. (2007). The effect of video game violence on physiological desensitization to real-life violence. Journal of Experimental Social Psychology, 43(3), 489–496. doi:10.1016/j.jesp.2006.05.003

Carr, D. (2006). Space, navigation and affect. In Carr, D., Buckingham, D., Burn, A., & Schott, G. (Eds.), Computer games: Text, narrative and play (pp. 59–71). Cambridge, UK: Polity.

Carr, D. (2003). Play dead: Genre and affect in Silent Hill and Planescape Torment. Game Studies, 3(1). Retrieved from http://www.gamestudies.org/0301/carr/

Carroll, N. (1996). The paradox of suspense. In Vorderer & Friedrichsen (Eds.), Suspense: conceptualization, theoretical analysis, and empirical explorations (pp. 71-90). Hillsdale, NJ: Lawrence Erlbaum Associates.

Carter, F. A., Wilson, J. S., Lawson, R. H., & Bulik, C. M. (1995). Mood induction procedure: importance of individualising music. Behaviour Change, 12, 159–161.

Castlevania. (1989). Konami Digital Entertainment.

Chadabe, J. (1985). Interactive music composition and performance system. U.S. Patent No. 4,526,078. Washington, DC: U.S. Patent and Trademark Office.

Chapel, R. H. (2003). Real-time algorithmic music systems from fractals and chaotic functions: Towards an active musical instrument. Unpublished doctoral dissertation. University Pompeu Fabra, Barcelona, Spain.

Charlton, J. P., & Danforth, I. D. W. (2004). Differentiating computer-related addictions and high engagement. In Morgan, K., Brebbia, C. A., Sanchez, J., & Voiskounsky, A. (Eds.), Human perspectives in the internet society: culture, psychology and gender. Southampton: WIT Press.

Childs, G. W. (2007). Creating music and sound for games. Boston, MA: Thomson Course Technology.

Chion, M. (1999). The voice in cinema. New York: Columbia University Press.

Chion, M. (1983). Guide des objets sonores: Pierre Schaeffer et la recherche musicale. Paris: Buchet/Chastel.

Chion, M. (1990). L’Audio-vision. Paris: Nathan.

Chion, M. (2003). Un art sonore, le cinéma: histoire, esthétique, poétique. Paris: Cahiers du Cinéma.

Chion, M. (1998). Le son. Paris: Nathan.

Chion, M. (1994). Audio-vision: Sound on screen (Gorbman, C., Trans.). New York: Columbia University Press.

Chion, M. (2003). The silence of the loudspeaker or why with Dolby sound it is the film that listens to us. In Sider, L., Freeman, D., & Sider, J. (Eds.), Soundscape: The School of Sound lectures 1998-2001 (pp. 150–154). London: Wallflower Press.

Clair, R. (1985). The art of sound. Excerpts from a series of letters. In Weis, E., & Belton, J. (Eds.), Film sound: Theory and practice. New York: Columbia University Press. (Original work published 1929)

Clark, L., Lawrence, A. J., Astley-Jones, F., & Gray, N. (2009). Gambling near-misses enhance motivation to gamble and recruit win-related brain circuitry. Neuron, 61(3), 481–490. doi:10.1016/j.neuron.2008.12.031


Clarkson, B., Mase, K., & Pentland, A. (2000). Recognizing user context via wearable sensors. In Proceedings of the Fourth International Symposium of Wearable Computers.

Cohen, L. (2005). The history of noise [on the 100th anniversary of its birth]. IEEE Signal Processing Magazine, 22(6), 20–45. doi:10.1109/MSP.2005.1550188

Cohen, J. W. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates.

Collins, K. (2008). Game sound: An introduction to the history, theory, and practice of video game music and sound design. Cambridge, MA: MIT Press.

Collins, K., Tessler, H., Harrigan, K., Dixon, M. J., & Fugelsang, J. (2011). Sound in electronic gambling machines: A review of the literature and its relevance to game audio. In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey, PA: IGI Global.

Collins, K. (2007). An introduction to the participatory and non-linear aspects of video games audio. In Hawkins, S., & Richardson, J. (Eds.), Essays on sound and vision. Helsinki: Helsinki University Press.

Collins, K. (2008b). Nothing odd about audio. Retrieved September 31, 2009, from http://www.slideshare.net/collinsk/sk-466356

Comisky, P. W., Bryant, J., & Zillmann, D. (1977). Commentary as a substitute for action. The Journal of Communication, 27(3), 150–153. doi:10.1111/j.1460-2466.1977.tb02141.x

Command & conquer 3: Tiberium wars. (2007). EA Games.

Conati, C. (2002). Probabilistic assessment of user’s emotions in educational games. Applied Artificial Intelligence, 16(7/8), 555–575. doi:10.1080/08839510290030390

Condry, J., & Scheibe, C. (1989). Non program content of television: Mechanisms of persuasion. In Condry, J. (Ed.), The Psychology of Television (pp. 217–219). London: Erlbaum.

Connor, S. (2004). Edison’s teeth: Touching hearing. In V. Erlmann (Ed.), Hearing cultures: Essays on sound, listening, and modernity (English ed., pp. 153-172). Oxford, UK: Berg.

Cook, P. (Ed.). (1999). Music, cognition, and computerized sound: An introduction to psychoacoustics. Cambridge, MA: MIT Press.

Cook, P. R. (1997). Physically informed sonic modeling (PhISM): Synthesis of percussive sounds. Computer Music Journal, 21(3), 38–49. doi:10.2307/3681012

Cook, P. R. (2002). Real sound synthesis for interactive application. Natick, MA: A K Peters, Ltd.

Cooking Mama. (2007). OfficeCreate. Majesco Publishing.

Cooley, M. (1998, November). Sound + image in computer-based design: Learning from sound in the arts. Paper presented at International Community for Auditory Display Conference, Glasgow, UK.

Coppola, F. F. (Director). (1979). Apocalypse now! [Motion picture]. Hollywood, CA: Paramount Pictures.

Cornelius, R. R. (1996). The science of emotion. Upper Saddle River, NJ: Prentice-Hall.

Coventry, K. R., & Constable, B. (1999). Physiological arousal and sensation seeking in female fruit machine players. Addiction (Abingdon, England), 94, 425–430. doi:10.1046/j.1360-0443.1999.94342512.x

Coventry, K. R., & Hudson, J. (2001). Gender differences, physiological arousal and the role of winning in fruit machine gamblers. Addiction (Abingdon, England), 96, 871–879. doi:10.1046/j.1360-0443.2001.9668718.x

Cowley, B., Charles, D., Black, M., & Hickey, R. (2008). Toward an understanding of flow in video games. ACM Computers in Entertainment, 6(2).

Crayon Physics Deluxe. (2009). Petri Purho (Developer). San Mateo: Hudson Soft.

Compilation of References

Creswell, J. (2005). Educational research: Planning, conducting, and evaluating quantitative and qualitative research (2nd ed.). Upper Saddle River, NJ: Pearson Education.

Crockford, D., Goodyear, B., Edwards, J., Quickfall, J., & el-Guebaly, N. (2005). Cue-induced brain activity in pathological gamblers. Biological Psychiatry, 58(10), 787–795. doi:10.1016/j.biopsych.2005.04.037

Crysis. (2007). EA Games, Crytek.

Csíkszentmihályi, M. (1975). Beyond boredom and anxiety. San Francisco: Jossey-Bass Publishers.

Cunningham, S., Grout, V., & Picking, R. (2011). Emotion, content and context in sound and music. In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey, PA: IGI Global.

Cunningham, S., Bergen, H., & Grout, V. (2006). A note on content-based collaborative filtering of music. In Proceedings of IADIS - International Conference on WWW/Internet.

Cunningham, S., Caulder, S., & Grout, V. (2008). Saturday night or fever? Context aware music playlists. In Proceedings of the 3rd Conference on Interaction with Sound, Audio Mostly 2008 (pp. 64-71). Piteå, Sweden.

Cunningham, S., Grout, V., & Hebblewhite, R. (2006). Computer game audio: The unappreciated scholar of the Half-Life generation. In Proceedings of the Audio Mostly Conference on Sound in Games.

Curtis, S. (1992). The sound of the early Warner Bros. cartoons. In Altman, R. (Ed.), Sound theory sound practice. New York: Routledge.

Damasio, A. R. (1994). Descartes' error. New York: G. P. Putnam.

Damásio, A. (2000). The feeling of what happens: Body and emotion in the making of consciousness. London: Vintage Books.

Damásio, A. (2005). Descartes' error: Emotion, reason, and the human brain. London: Vintage Books.

Damásio, A. (2003). Emotion, feeling, and social behavior: The brain perspective. The Walter Chapin Simpson Center for the Humanities. Retrieved September 31, 2009, from http://depts.washington.edu/uwch/katz/20022003/antonio_damasio.html

Dance dance revolution. (1998). Konami.

Darwin, C. (1899). The expression of the emotions in man and animals. New York: D. Appleton and Company.

Darwinia. (2007). Ambrosia Software.

Dave mirra freestyle BMX. (2000). Z-Axis.

Davies, G., Cunningham, S., & Grout, V. (2007). Visual stimulus for aural pleasure. In Proceedings of the Audio Mostly Conference on Interaction with Sound.

Davis, H., & Silverman, R. (1978). Hearing and deafness (4th ed.). Location: Thomson Learning.

de Certeau, M. D. (1988). The practice of everyday life. Berkeley: University of California Press.

De Poli, G., Piccialli, A., & Roads, C. (1991). Representations of musical signals. Cambridge, MA: MIT Press.

De Sanctis, G., Sarti, A., Scarparo, G., & Tubaro, S. (2005). Automatic modelling and authoring of nonlinear interactions between acoustic objects. In K. Galkowski, A. Kummert, E. Rogers & J. Velten (Eds.), The Fourth International Workshop on Multidimensional Systems – NDS 2005 (pp. 116-122).

Dead space. [Computer game]. (2008). EA Redwood Shores (Developer). Redwood City: Electronic Arts.

Dekker, A., & Champion, E. (2007). Please biofeed the zombies: Enhancing the gameplay and display of a horror game using biofeedback. In Proceedings of DiGRA: Situated Play Conference. Retrieved January 1, 2010, from http://www.digra.org/dl/db/07312.18055.pdf.

Dektela, R., & Sical, W. (2003). Survival horror: Un genre nouveau [Survival horror: A new genre]. Horror Games Magazine, 1(1), 13–16.


Delfabbro, P., Fazlon, K., & Ingram, T. (2005). The effects of parameter variations in electronic gambling simulations: Results of a laboratory-based pilot investigation. Gambling Research: Journal of the National Association for Gambling Studies, 17(1), 7–25.

Deutsch, S. (2003). Music for interactive moving pictures. In Sider, L., Freeman, D., & Sider, J. (Eds.), Soundscape: The School of Sound lectures 1998-2001 (pp. 28–34). London: Wallflower Press.

Deutsch, S. (2001). Harnessing the power of music and sound design in interactive media. In Earnshaw, R., & Vince, J. (Eds.), Digital content creation. New York: Springer.

Diablo 2. (2000). Blizzard Entertainment.

Dibben, N. (2001). What do we hear, when we hear music? Music perception and musical material. Musicae Scientiae, 2, 161–194.

Dickerson, M., & Adcock, S. (1987). Mood, arousal and cognitions in persistent gambling: Preliminary investigation of a theoretical model. Journal of Gambling Behaviour, 3(1), 3–15. doi:10.1007/BF01087473

Dig dug. (1983). Atari.

DigiWall [Computer game]. (2010). Piteå, Sweden: Digiwall Technology. Retrieved February 10, 2010, from http://www.digiwall.se/.

Dix, A., Finlay, J., & Abowd, G. D. (2004). Human-computer interaction. Harlow, UK: Pearson Education.

DJ hero. [Video game], (2009). FreeStyleGames (Developer), Santa Monica, CA: Activision.

Dixon, L., Trigg, R., & Griffiths, M. (2007). An empirical investigation of music and gambling behaviour. International Gambling Studies, 7(3), 315–326. doi:10.1080/14459790701601471

Dixon, M., Harrigan, K. A., Sandhu, R., Collins, K., & Fugelsang, J. (2011, in press). Slot machine play: Psychophysical responses to wins, losses, and losses disguised as wins. Addiction.

Doel, K. d., Knott, D., & Pai, D. K. (2004). Interactive simulation of complex audio-visual scenes. Presence (Cambridge, Mass.), 13(1), 99–111. doi:10.1162/105474604774048252

Doel, K. d., & Pai, D. K. (1998). The sounds of physical shapes. Presence (Cambridge, Mass.), 7(4), 382–395. doi:10.1162/105474698565794

Doel, K. d., & Pai, D. K. (2006). Modal synthesis for vibrating objects. In K. Greenebaum & R. Barzel (Eds.), Audio anecdotes III: Tools, tips, and techniques for digital audio (pp. 99-120). Wellesley, MA: A K Peters, Ltd.

Doel, K. d., Kry, P. G., & Pai, D. K. (2001). FoleyAutomatic: Physically-based sound effects for interactive simulation and animation. In P. Lynn (Ed.), Proceedings of SIGGRAPH '01: The 28th annual conference on Computer graphics and interactive techniques (pp. 537-544). New York: ACM.

Doel, K. d., Pai, D. K., Adam, T., Kortchmar, L., & Pichora-Fuller, K. (2002). Measurements of perceptual quality of contact sound models. In Nakatsu & H. Kawahara (Eds.), Proceedings of the 8th International Conference on Auditory Display (pp. 345-349). Kyoto, Japan: ATR.

Donkey kong [Computer game]. (1981). Kyoto: Nintendo.

Donkey Konga. [Video game], (2004). Namco (Developer), Kyoto: Nintendo.

Doom 3. (2004). Activision.

Dorval, M., & Pepin, M. (1986). Effect of playing a video game on a measure of spatial visualization. Perceptual and Motor Skills, 62, 159–162.

Douglas, Y., & Hargadon, A. (2000). The pleasure principle: Immersion, engagement, flow. In Proceedings of the eleventh ACM on Hypertext and Hypermedia (pp. 153-160). New York: ACM.

Dragon age: Origins. (2009). EA Games, Bioware.

Dreher, R. E. (1947). The relationship between verbal reports and the galvanic skin response. Journal of Abnormal and Social Psychology, 44, 87–94.


Drescher, P. (2006a). GAC(k!). Retrieved June 8, 2009, from http://blogs.oreilly.com/digitalmedia/2006/09/gack-1.html

Drescher, P. (2006b). THE Homunculonic AEStheticator. Retrieved June 8, 2009, from http://blogs.oreilly.com/digitalmedia/2006/11/the-homunculonic-aestheticator-1.html

Dretzka, G. (2004, December 12). Casinos, celebrities bet on our love for pop culture icons. Seattle Times. Retrieved July 15, 2009, from http://community.seattletimes.nwsource.com/archive/?date=20041212&slug=casinos12.

Drobnick, J. (2004). Aural cultures. Toronto: YYZ Books.

Droumeva, M. (2011). An acoustic communication framework for game sound: Fidelity, verisimilitude, ecology. In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey, PA: IGI Global.

Duda, R. O., Algazi, V. R., & Thompson, D. M. (2002). The use of head-and-torso models for improved spatial sound synthesis. In Proceedings of the 113th Audio Engineering Society Convention.

Dyson. (2009). Kremers & May.

Dyson, F. (1996). When is the ear pierced? The clashes of sound, technology and cyberculture. In Moser, M. A., & MacLeod, D. (Eds.), Immersed in technology: Art and virtual environments. Cambridge, MA: MIT Press.

Ebcioglu, K. (1992). An expert system for harmonizing chorales in the style of J. S. Bach. In Balaban, M., Ebcioglu, K., & Laske, O. (Eds.), Understanding music with AI: Perspectives on music cognition (pp. 294–334). Cambridge, MA: MIT Press.

Edworthy, J., Loxley, S., & Dennis, I. (1991). Improving auditory warning design: Relationship between warning sound parameters and perceived urgency. Human Factors, 33(2), 205–231.

Effrat, J., Chan, L., Fogg, B. J., & Kong, L. (2004). What sounds do people love and hate? Interaction, 11(5), 64–66. doi:10.1145/1015530.1015562

Eijkman, E., & Vendrik, J. H. (1965). Can a sensory system be specified by its internal noise? The Journal of the Acoustical Society of America, 37, 1102–1109. doi:10.1121/1.1909530

Eisenstein, S. M., Pudovkin, V. I., & Alexandrov, G. V. (1985). Statement on the sound film. In Weis, E., & Belton, J. (Eds.), Film sound: Theory and practice. New York: Columbia University Press. (Original work published 1928)

Ekman, P. (1992). An argument for basic emotions. Cognition and Emotion, 6(3/4), 169–200. doi:10.1080/02699939208411068

Ekman, P., & Friesen, W. V. (1978). Facial action coding system: A technique for the measurement of facial movement. Palo Alto, CA: Consulting Psychologists Press.

Ekman, I., & Lankoski, P. (2009). Hair-raising entertainment: Emotions, sound, and structure in Silent Hill 2 and Fatal Frame. In Perron, B. (Ed.), Horror video games: Essays on the fusion of fear and play (pp. 181–199). Jefferson, NC: McFarland.

Ekman, I. (2005). Meaningful noise: Understanding sound effects in computer games. In Proceedings of Digital Arts and Cultures 2005. Copenhagen, Denmark.

Ekman, I. (2008). Comment on the IEZA: A framework for game audio. Gamasutra. Retrieved January 13, 2010, from http://www.gamasutra.com/view/feature/3509/ieza_a_framework_for_game_audio.php

Ekman, I. (2008). Psychologically motivated techniques for emotional sound in computer games. In Proceedings of the 3rd Conference on Interaction with Sound, Audio Mostly 2008 (pp. 20-26). Piteå, Sweden.

Ekman, I. (2009). Modelling the emotional listener: Making psychological processes audible. In Proceedings of Audio Mostly 2009: 4th Conference on Interaction with Sound (pp. 33-40). Glasgow, Scotland: Glasgow Caledonian University, Interactive Institute/Sonic Studio Piteå.


Ekman, I., & Kajastila, R. (2009, February 11-13). Localisation cues affect emotional judgements: Results from a user study on scary sound. Paper presented at the AES 35th International Conference, London, UK.

Ekman, I., Ermi, L., Lahti, J., Nummela, J., Lankoski, P., & Mäyrä, F. (2005). Designing sound for a pervasive mobile game. In Proceedings of the ACM SIGCHI International Conference on Advances in Computer Entertainment Technology, 2005.

Eldridge, A. C. (2002). Adaptive systems music: Musical structures from algorithmic process. In C. Soddu (Ed.), Proceedings of the 6th Generative Art Conference. Milan, Italy: Politecnico di Milano University.

Electroplankton. [Video game], (2006). Indies Zero (Developer), Kyoto: Nintendo.

Elite beat agents. [Video game], (2006). iNiS (Developer), Kyoto: Nintendo.

Ellis, S. R. (1996). Presence of mind... A reaction to Thomas Sheridan's "Musing on telepresence". Presence (Cambridge, Mass.), 5, 247–259.

Elmore, W. C., & Heald, M. A. (1969). Physics of waves. Location: McGraw Hill.

Epstein, M. (2009). Growing an interdisciplinary hybrid: The case of acoustic ecology. History of Intellectual Culture, 3(1). Retrieved December 29, 2009, from http://www.ucalgary.ca/hic/issues/vol3/9.

Ermi, L., & Mäyrä, F. (2005). Fundamental components of the gameplay experience: Analysing immersion. In Proceedings of DiGRA 2005 Conference: Changing Views: Worlds in Play. Retrieved January 1, 2010, from http://www.digra.org/dl/db/06276.41516.pdf.

Ernerfeldt, E. (2008). Phun: 2D physics sandbox. Available from http://www.phunland.com/wiki/Home.

Essl, G., Serafin, S., Cook, P., & Smith, J. O. (2004). Theory of banded waveguides. Computer Music Journal, 28(1), 37–50. doi:10.1162/014892604322970634

Eternal Darkness. (2002). Nintendo.

Everest, F. A. (1997). Sound studio construction on a budget. City, ST: McGraw-Hill.

Everest, F. A. (2001). Master handbook of acoustics. City, ST: McGraw-Hill.

F.E.A.R. (2005). Vivendi Universal Games. Monolith Productions.

Fable II. (2008). Microsoft.

Fake engine noises added to hybrid and electric cars to improve safety. (2008). Retrieved January 10, 2010, from http://www.switched.com/2008/06/05/fake-engine-noises-added-to-hybrid-and-electric-cars-to-improve/.

Fallout 3. (2008). Bethesda Softworks. Bethesda Game Studios.

Farmer, D. (2009). The making of Torment audio. Retrieved July 9, 2009, from http://www.filmsound.org/game-audio/audio.html.

Farnell, A. J. (2008). Designing sound. London: Applied Scientific Press.

Farnell, A. (2011). Behaviour, structure and causality in procedural audio. In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey, PA: IGI Global.

Farnell, A. (2007). An introduction to procedural audio and its application in computer games. Retrieved May 20, 2009, from http://www.obiwannabe.co.uk/html/papers/proc-audio

Farris, J. S. (2003). The human interaction cycle: A proposed and tested framework of perception, cognition, and action on the web. Unpublished doctoral dissertation. Kansas State University, USA.

Fastl, H., & Zwicker, E. (2007). Psychoacoustics: Facts and models (3rd ed., Vol. 22). Berlin, Heidelberg: Springer.

Fatal frame. [Computer game]. (2002). Tecmo (Developer). Torrance: Tecmo.

Feld, S. (2004). A rainforest acoustemology. In Bull, M., & Back, L. (Eds.), The auditory culture reader (1st ed., pp. 223–240). Oxford, UK: Berg Publishers.


Ferber, D. (2003, September). The man who mistook his girlfriend for a robot. Popular Science. Retrieved April 7, 2009, from http://iiae.utdallas.edu/news/pop_science.html.

Ferguson, C. J. (2007). Evidence for publication bias in video game violence effects literature: A meta-analytic review. Aggression and Violent Behavior, 12(4), 470–482. doi:10.1016/j.avb.2007.01.001

Fernandez, A. (2008). Fun experience with digital games: A model proposition. In Leino, O., Wirman, H., & Fernandez, A. (Eds.), Extending experiences: Structure, analysis and design of computer game player experience (pp. 181–190). Rovaniemi, Finland: Lapland University Press.

Ferrari, M., & Ives, S. (2005). Slots: Las Vegas gamblers lose some $5 billion a year at the slot machines alone. Las Vegas: An unconventional history. New York: Bulfinch.

Fettweis, A. (1986). Wave digital filters: Theory and practice. Proceedings of the IEEE, 74(2), 270–327. doi:10.1109/PROC.1986.13458

FIFA. (1993-). EA Sports.

Figgis, M. (2003). Silence: The absence of sound. In Sider, L., Freeman, D., & Sider, J. (Eds.), Soundscape: The School of Sound lectures 1998-2001 (pp. 1–14). London: Wallflower Press.

Final Fantasy 2. (1988). Squaresoft. Square ENIX.

Firelight (2009). FMOD Ex v4.28 [Computer software]. Victoria, Australia: Firelight Technologies.

Fitterer, D. (2008). Audiosurf: Ride Your Music [Computer game]. Washington, DC: Valve.

Fleming, J. (2009). Planet of sound: Talking art, noise, and games with EA's Robi Kauker. Retrieved June 3, 2009, from http://www.gamasutra.com/view/feature/3978/planet_of_sound_talking_art_.php

Fletcher, N. H., & Rossing, T. D. (1991). The physics of musical instruments. New York: Springer.

Fletcher, N. H., & Rossing, T. D. (2004). Principles of vibration and sound (2nd ed.). New York: Springer.

Flossmann, S., Grachten, M., & Widmer, G. (2009). Expressive performance rendering: Introducing performance context. In Proceedings of the 6th Sound and Music Computing Conference (SMC). Porto, Portugal: Universidade do Porto.

Flückiger, B. (2001). Sound design, die virtuelle Klangwelt des Films [Sound design: The virtual sound world of film]. Marburg, Country: Schüren.

Folkman, S., & Lazarus, R. S. (1990). Coping and emotion. In Leventhal, N. B., & Trabasso, T. (Eds.), Psychological and biological approaches to emotion (pp. 313–332). Hillsdale, NJ: Erlbaum.

Follett, J. (2007). Audio and the user experience. UXmatters. Retrieved September 31, 2009, from http://www.uxmatters.com/MT/archives/000200.php

Foote, J. (1999). Visualizing music and audio using self-similarity. In Proceedings of the seventh ACM international conference on Multimedia (Part 1) (pp. 77-80).

Frauenberger, C. (2007). Ears))): A methodological framework for auditory display design. In CHI '07 extended abstracts on Human factors in computing systems (pp. 1641–1644). San Jose, CA: ACM Press.

Freeman, D. (2004). Creating emotion in games: The craft and art of emotioneering™. Computers in Entertainment, 2(3), 15. doi:10.1145/1027154.1027179

Freeman, D. (2003). Creating emotions in games. Berkeley, CA: New Riders Games.

FreeStyleGames (2009). DJ Hero [Computer game]. FreeStyleGames (Developer), Activision.

Frequency. (2001). Sony Computer Entertainment (PlayStation 2).

Freud, S. (1919). The Uncanny. In The standard edition of the complete psychological works of Sigmund Freud (Vol. 17, pp. 219–256). London: Hogarth Press.

Friberg, A., Bresin, R., & Sundberg, J. (2006). Overview of the KTH Rule System for musical performance. Advances in Cognitive Psychology: Special Issue on Music Performance, 2(2/3), 145–161.


Friberg, J., & Gärdenfors, D. (2004). Audio games: New perspectives on game audio. In Proceedings of the ACM SIGCHI International Conference on Advances in Computer Entertainment Technology 2004 (pp. 148-154).

Friday the 13th. [Computer game]. (1989). Pack-In-Video (Developer). New York: LJN.

Frisby, D. (2002). Cityscapes of modernity: Critical explorations. Cambridge, UK: Polity.

Frohlich, D., & Murphy, R. (1999, December 20). Getting physical: What is fun computing in tangible form? Paper presented at the Computers and Fun 2 Workshop, York, UK.

Funkhouser, T., Carlbom, I., Elko, G., Pingali, G., Sondhi, M., & West, J. (1998). A beam-tracing approach to acoustic modelling for interactive virtual environments. In S. Cunningham, W. Bransford & M. F. Cohen (Eds.), Proceedings of SIGGRAPH '98: The 25th annual conference on Computer graphics and interactive techniques (pp. 21-28). New York: ACM.

Gaboury, A., & Ladouceur, R. (1989). Erroneous perceptions and gambling. Journal of Social Behavior and Personality, 4(41), 111–120.

Gabrielsson, A., & Lindström, E. (2001). The influence of musical structure on emotional expression. In Juslin, P., & Sloboda, J. A. (Eds.), Music and emotion: Theory and research. Oxford, UK: Oxford University Press.

Gackenbach, J. (2008). The relationship between perceptions of video game flow and structure. Loading..., 1(3).

Galloway, A. R. (2006). Gaming: Essays on algorithmic culture. Electronic Mediations (Vol. 18). Minneapolis: University of Minnesota Press.

Gardner, W. G. (1992, November). A realtime multichannel room simulator. Paper presented at the 124th meeting of the Acoustical Society of America.

Garlin, F. V., & Owen, K. (2006). Setting the tone with the tune: A meta-analytic review of the effects of background music in retail settings. Journal of Business Research, 59, 755–764. doi:10.1016/j.jbusres.2006.01.013

Gasser, M., Pampalk, E., & Tomitsch, M. (2007). A content-based user-feedback driven playlist generator and its evaluation in a real-world scenario. In Proceedings of the Audio Mostly Conference on Interaction with Sound.

Gauss, C. F. (1882). General solution of the problem: To map a part of a given surface on another given surface so that the image and the original are similar in their smallest parts. Copenhagen: Journal of Royal Society of Science.

Gaver, W. (1993). What in the world do we hear? An ecological approach to auditory event perception. Ecological Psychology, 5(1), 1–29. doi:10.1207/s15326969eco0501_1

Gaver, W. (1997). Auditory interfaces. In Helander, M. G., Landauer, T. K., & Prabhu, P. (Eds.), Handbook of human-computer interaction (2nd ed.). Amsterdam: Elsevier Science. doi:10.1016/B978-044481862-1/50108-4

Gaver, W. (1994). Using and creating auditory icons. In G. Kramer (Ed.), Auditory display: Sonification, audification, and auditory interfaces (Santa Fe Institute Studies in the Sciences of Complexity, Vol. 18, pp. 417-446). Reading, MA: Addison-Wesley.

Gaver, W. W. (1993a). Synthesizing auditory icons. In S. Ashlund, K. Mullet, A. Henderson, E. Hollnagel & T. White (Eds.), Proceedings of the INTERCHI '93 conference on Human factors in computing systems (pp. 228-235). New York: ACM.

Gaver, W. W., Beaver, J., & Benford, S. (2003). Ambiguity as a resource for design. In Proceedings of the ACM CHI Conference on Human Factors in Computing Systems 2003 (pp. 233-240).

Gears of War. (2007). Microsoft.

Gebeke, D. (1993). Children and fear. Retrieved December 10, 2009, from http://www.ag.ndsu.edu/pubs/yf/famsci/he458w.htm.

Geiger, G. (2005). Abstraction in computer music software systems. Unpublished doctoral dissertation. Universitat Pompeu Fabra, Barcelona.


Genette, G. (1983). Narrative discourse: An essay in method. Ithaca, NY: Cornell University Press.

Gescheider, G. A., Sager, L. C., & Ruffolo, L. J. (1975). Simultaneous auditory and tactile information processing. Perception & Psychophysics, 18, 209–216.

Gibson, J. (1986). The ecological approach to visual perception. New Jersey: LEA.

Gibson, J. (1977). The theory of affordances. In Shaw, R. E., & Bransford, J. (Eds.), Perceiving, acting and knowing (pp. ##-##). New Jersey: LEA.

Gilleade, K. M., & Dix, A. (2004). Using frustration in the design of adaptive videogames. In Proceedings of ACE 2004 (pp. 228–232). New York: ACM.

Gilleade, K. M., Dix, A., & Allanson, J. (2005). Affective videogames and modes of affective gaming: Assist me, challenge me, emote me. In Proceedings of DiGRA 2005 Conference: Changing Views: Worlds in Play. Retrieved January 1, 2010, from http://www.digra.org/dl/db/06278.55257.pdf.

Giordano, B. (2001). Preliminary observations on materials recovering from real impact sounds: Phenomenology of sound events. In Polotti, P., Papetti, S., Rocchesso, D., & Delle, S. (Eds.), The sounding object (Sob project) (p. 24). Verona: University of Verona.

Gitaroo man. [Video game], (2001). Koei/iNiS (Developer) (PlayStation 2).

Glass, D. C., & Singer, J. E. (1972). Urban stress. New York: Academic.

God of War 2. (2007). SCE Studios Santa Monica. Sony Computer Entertainment.

Goffman, E. (1959). The presentation of self in everyday life (1st ed.). New York: Anchor.

Goldstein, E. B. (2002). Wahrnehmungspsychologie [Psychology of perception] (2nd ed.). Berlin: Spektrum Akadem. Verlag.

Gorbman, C. (1987). Unheard melodies? Narrative film music. Bloomington: Indiana University Press.

Gordon, C., Webb, D. L., & Wolpert, S. (1992). Isospectral plane domains and surfaces via Riemannian orbifolds. Inventiones Mathematicae, 110, 1–22. doi:10.1007/BF01231320

Gouk, P. (2004). Raising spirits and restoring souls: Early modern medical explanations for music's effects. In Erlmann, V. (Ed.), Hearing cultures: Essays on sound, listening and modernity (pp. 87–105). Oxford: Berg.

Gouskos, C. (2006). The depths of the Uncanny Valley. Gamespot. Retrieved April 7, 2009, from http://uk.gamespot.com/features/6153667/index.html.

Goyal, V. (2006). Pro Java ME MMAPI: Mobile media API for Java Micro Edition. City, CA: Apress Press Inc.

JSR-234 Group. (2005). Advanced multimedia supplements API for Java™ 2 Micro Edition. Nokia Corporation.

Grand theft auto: San Andreas. (2004). Rockstar North. Rockstar Games.

Grant, W., Wassenhove, V., & Poeppel, D. (2004). Detection of auditory (cross-spectral) and auditory-visual (cross-modal) synchrony. Speech Communication, 44(1/4), 43–53. doi:10.1016/j.specom.2004.06.004

Gray, J. A. (1971). The psychology of fear and stress. New York: McGraw-Hill.

Green, R. D., MacDorman, K. F., Ho, C. C., & Vasudevan, S. K. (2008). Sensitivity to the proportions of faces that vary in human likeness. Computers in Human Behavior, 24(5), 2456–2474. doi:10.1016/j.chb.2008.02.019

Greeno, J. G., Collins, A. M., & Resnick, L. B. (1996). Cognition and learning. In Berliner, D., & Calfee, R. (Eds.), Handbook of educational psychology (pp. 15–46). New York: Simon & Schuster Macmillan.

Grey Matter [INDIE arcade game]. (2008). McMillen, E., Refenes, T., & Baranowsky, D. (Developers). San Francisco, CA: Kongregate.

Grey, J. M. (1975). Exploration of musical timbre. Stanford University Department of Music Technical Report STAN-M-2.


Griffiths, M. D. (1990). The cognitive psychology of gambling. Journal of Gambling Studies, 6(1), 31–42. doi:10.1007/BF01015747

Griffiths, M., & Parke, J. (2005). The psychology of music in gambling environments: An observational research note. Journal of Gambling Issues, 13. Retrieved July 15, 2009, from http://www.camh.net/egambling/issue13/jgi_13_griffiths_2.html.

Grigg, C., Whitmore, G., Azzarello, P., Pilon, J., Noel, F., Snyder, S., et al. (2006, May). Group report: Providing a high level of mixing aesthetics in interactive audio and games. Paper developed at the Annual Interactive Music Conference Project Bar-B-Q.

Grimshaw, M., & Schott, G. (2007). Situating gaming as a sonic experience: The acoustic ecology of first person shooters. In Proceedings of DiGRA 2007: Situated Play.

Grimshaw, M. (2008). The acoustic ecology of the first-person shooter: The player experience of sound in the first-person shooter computer game. Saarbrucken, Germany: VDM Verlag.

Grimshaw, M. (2008). Sound and immersion in the first-person shooter. International Journal of Intelligent Games & Simulation, 5(1), 2–8.

Grimshaw, M. (2008). Per un'analisi comparata del suono nei videogiochi e nel cinema [For a comparative analysis of sound in video games and cinema]. In Bittanti, M. (Ed.), Schermi interattivi: Saggi critici su videogiochi e cinema (pp. 95–121). (Bittanti, M., Trans.). Roma: Meltemi.

Grimshaw, M. (2007). Sound and immersion in the first-person shooter. In Proceedings of The 11th International Computer Games Conference: AI, Animation, Mobile, Educational & Serious Games (CGAMES 2007).

Grimshaw, M. (2007). The acoustic ecology of the first person shooter. Unpublished doctoral dissertation. University of Waikato, New Zealand.

Grimshaw, M. (2007). The resonating spaces of first-person shooter games. In Proceedings of The 5th International Conference on Game Design and Technology. Retrieved January 1, 2010, from http://digitalcommons.bolton.ac.uk/gcct_conferencepr/4/.

Grimshaw, M. (2009). The audio uncanny valley: Sound, fear and the horror game. In Proceedings of the 4th Conference on Interaction with Sound, Audio Mostly 2009 (pp. 21-26). Glasgow, UK.

Grimshaw, M., & Schott, G. (2008). A conceptual framework for the analysis of first-person shooter audio and its potential use for game engines. International Journal of Computer Games Technology, 2008.

Guitar hero 5. [Video game], (2009). RedOctane (Developer), Santa Monica, CA: Activision.

Guitar hero II. [Video game], (2006). RedOctane (Developer), Santa Monica, CA: Activision.

Guitar hero III. [Video game], (2007). RedOctane (Developer), Santa Monica, CA: Activision.

Guitar hero world tour. [Video game], (2008). RedOctane (Developer), Santa Monica, CA: Activision.

Guitar hero. (2005-). [Computer software]. Harmonix Music Systems (2005-2007)/Neversoft (2007-).

Guitar hero. [Video game], (2005). RedOctane (Developer), New York: MTV Games.

Gullone, E., King, N., & Ollendick, T. (2000). The development and psychometric evaluation of the Fear Experiences Questionnaire: An attempt to disentangle the fear and anxiety constructs. Clinical Psychology & Psychotherapy, 7(1), 61–75. doi:10.1002/(SICI)1099-0879(200002)7:1<61::AID-CPP227>3.0.CO;2-P

Haas, E. C., & Edworthy, J. (1996). Designing urgency into auditory warnings using pitch, speed and loudness. Computing and Control Engineering Journal, 7, 193–198. doi:10.1049/cce:19960407

Half Life 2. [Computer game]. (2008). Valve Corporation (Developer). Redwood City, CA: EA Games.

Half-Life series. (1998-). Valve.

Halloween. [Computer game]. (1983). Video Software Specialist (Developer). Los Angeles: Wizard Video Games.


Hansen, S. H., & Jensenius, A. R. (2006). The Drum Pants. In Proceedings of Audio Mostly 2006: A Conference on Sound in Games (pp. 60-63). Piteå, Sweden: Interactive Institute/Sonic Studio.

Hanson, D. (2006). Exploring the aesthetic range for humanoid robots. In Proceedings of the ICCS/CogSci-2006 Long Symposium: Toward Social Mechanisms of Android Science (pp. 16-20).

Harmonix (2003). Amplitude [Computer game]. Harmonix (Developer), Sony.

Harmonix (2006-2009). Guitar Hero series [Computer games]. Harmonix, Neversoft, Vicarious Visions, Budcat Creations, RedOctane (Developers), Activision.

Harrigan, K. A. (2009). Slot machines: Pursuing responsible gaming practices for virtual reels and near misses. International Journal of Mental Health and Addiction, 7(1), 68–83. doi:10.1007/s11469-007-9139-8

Harrigan, K. A., & Dixon, M. (2009). PAR sheets, probabilities, and slot machine play: Implications for problem and non-problem gambling. Journal of Gambling Issues, 23, 81–110. doi:10.4309/jgi.2009.23.5

Harry Potter and the Chamber of Secrets. (2002). Eurocom. Electronic Arts.

Harvey, A., & Samyn, M. (2006). Realtime art manifesto. Retrieved June 6, 2009, from http://tale-of-tales.com/tales/RAM.html

Hassanpour, A. (2009). Dubbing. The Museum of Broadcast Communications. Retrieved July 14, 2009, from http://www.museum.tv/archives/etv/D/htmlD/dubbing/dubbing.htm.

Hassenzahl, M., & Roto, V. (2007). Being and doing: A perspective on User Experience and its measurement. Interfaces, 72, 10–12.

Hassenzahl, M., & Tractinsky, N. (2006). User Experience—a research agenda [Editorial]. Behaviour & Information Technology, 25(2), 91–97. doi:10.1080/01449290500330331

Haunted house. [Computer game]. (1981). Atari (Developer). Sunnyvale: Atari.

Haze. (2008). Free Radical Design.

Hazlett, R. L. (2006). Measuring emotional valence during interactive experiences: Boys at video game play. In Proceedings of CHI '06 (pp. 1023–1026). New York: ACM.

Healy, A. F., Proctor, R. W., & Weiner, I. B. (2004). Handbook of psychology: Vol. 4. Experimental psychology. Hoboken, NJ: Wiley.

Heavenly sword. (2007). Sony.

Hébert, S., Béland, R., & Dionne-Fournelle, O. (2005). Physiological stress response to video-game playing: The contribution of built-in music. Life Sciences, 76, 2371–2380. doi:10.1016/j.lfs.2004.11.011

Herber, N. (2006). The Composition-Instrument: Musical emergence and interaction. In Proceedings of Audio Mostly 2006: A Conference on Sound in Games (pp. 53-59). Piteå, Sweden: Interactive Institute/Sonic Studio Piteå.

Hermann, T., & Hunt, A. (2005). Guest editors' introduction: An introduction to interactive sonification. IEEE MultiMedia, 12(2), 20–24. doi:10.1109/MMUL.2005.26

Hermann, T., & Ritter, H. (1999). Listen to your data: Model-based sonification for data analysis. In Advances in intelligent computing and multimedia systems (pp. 189–194). Baden-Baden.

Hiller, L. A., & Isaacson, L. M. (1959). Experimental music: Composing with an electronic computer. New York: McGraw Hill.

Hiller, L., & Ruiz, P. (1971). Synthesizing musical sounds by solving the wave equation for vibrating objects. Journal of the Audio Engineering Society.

Hirokawa, E. (2004). Effects of music, listening, and relaxation instructions on arousal changes and the working memory task in older adults. Journal of Music Therapy, 41(2), 107–127.

Compilation of References
Hirsch, A. R. (1995). Effects of ambient odors on slot-machine usage in a Las Vegas casino. Psychology and Marketing, 12(7), 585–594. doi:10.1002/mar.4220120703
Hitchcock, A. (1963). The birds. Universal Pictures.
Hitchcock, A. (1956). The Man Who Knew Too Much [Motion picture]. Hollywood, CA: Paramount.
Hitchcock, A. (Director). (1960). Psycho [Motion picture]. Hollywood, CA: Paramount.
Hitman. (2002). Io Interactive. Eidos Interactive.
Ho, C. C., MacDorman, K., & Pramono, Z. A. D. (2008). Human emotion and the uncanny valley: A GLM, MDS, and ISOMAP analysis of robot video ratings. In Proceedings of the Third ACM/IEEE International Conference on Human-Robot Interaction, 169-176.
Hodgkinson, G. (2009). The seduction of realism. In Proceedings of ACM SIGGRAPH ASIA 2009 Educators Program (pp. 1-4). Yokohama, Japan: The Association for Computing Machinery.
Hoeger, L., & Huber, W. (2007). Ghastly multiplication: Fatal Frame II and the videogame Uncanny. In Proceedings of Situated Play, DiGRA 2007 Conference, Tokyo, Japan, 152-156.
Hopson, J. (2001). Behavioral game design. Gamasutra. Retrieved October 23, 2009, from http://www.gamasutra.com/view/feature/3085/behavioral_game_design.php
Hörnel, D. (2000). Lernen musikalischer Strukturen und Stile mit neuronalen Netzen. Karlsruhe, Germany: Shaker.
Hörnel, D., & Menzel, W. (1999). Learning musical structure and style with neural networks. Computer Music Journal, 22(4), 44–62. doi:10.2307/3680893
Howard, D. M., & Angus, J. (1996). Acoustics and psychoacoustics. Oxford: Focal Press.
Howard, I. P. (1982). Human visual orientation. New York: Wiley.
Hudlicka, E. (2008). Affective computing for game design. In Proceedings of the 4th International North American Conference on Intelligent Games and Simulation (GAMEON-NA). Montreal, Canada.
Hug, D. (2011). New wine in new skins: Sketching the future of game sound design. In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey, PA: IGI Global.
Hug, D. (2007). Game sound education at ZHdK: Between research laboratory and experimental education. In Proceedings of Audio Mostly 2007 - 2nd Conference on Interaction with Sound.
Hug, D. (2008a). Towards a hermeneutics and typology of sound for interactive commodities. In Proceedings of the CHI 2008 Workshop on Sonic Interaction Design.
Hug, D. (2008b). Genie in a bottle: Object-sound reconfigurations for interactive commodities. In Proceedings of Audiomostly 2008, 3rd Conference on Interaction With Sound.
Huizinga, J. (1955). Homo ludens: A study of the play element in culture. Boston: Beacon Press.
Ihde, D. (1976). Listening and voice: A phenomenology of sound. Athens, OH: Ohio University Press.
IJsselsteijn, W., Poels, K., & de Kort, Y. A. W. (2008). The Game Experience Questionnaire: Development of a self-report measure to assess player experiences of digital games. FUGA Deliverable D3.3. Eindhoven, The Netherlands: TU Eindhoven.
Ion Storm Inc (Developer). (2004). Thief: Deadly Shadows [Computer game]. Eidos Interactive.
ITU-R BT.1359-1. (1998). Relative timing of sound and vision for broadcasting. Question ITU-R, 35(11).
Ivory, J. D., & Kalyanaraman, S. (2007). The effects of technological advancement and violent content in video games on players’ feelings of presence, involvement, physiological arousal, and aggression. The Journal of Communication, 57(3), 532–555. doi:10.1111/j.1460-2466.2007.00356.x

Iwai, T. (2005). Electroplankton [Computer game]. Indies Zero (Developer), Nintendo.
Iwamiya, S. (1994). Interaction between auditory and visual processing when listening to music in an audio visual context. Psychomusicology, 13, 133–154.
Jackson, D. (2003). Sonic branding: An introduction. New York: Palgrave/Macmillan. doi:10.1057/9780230503267
Jackson, B. (2009). SFP: The magical world of “Spore”. In Mix Online. Retrieved May 20, 2009, from http://mixonline.com/post/features/sfp-magical-world-spore
James, W. (1884). What is an emotion? Mind, 9(34), 188–205. doi:10.1093/mind/os-IX.34.188
Jansz, J. (2006). The emotional appeal of violent video games. Communication Theory, 15(3), 219–241. doi:10.1111/j.1468-2885.2005.tb00334.x
Jauss, H. R. (1982). Toward an aesthetic of reception. Minneapolis, MN: University of Minnesota Press.
Jegers, K. (2009). Elaborating eight elements of fun: Supporting design of pervasive player enjoyment. ACM Computers in Entertainment, 7(2).
Jennett, C., Cox, A. L., Cairns, P., Dhoparee, S., Epps, A., & Tijs, T. (2008). Measuring and defining the experience of immersion in games. International Journal of Human-Computer Studies, 66, 641–661. doi:10.1016/j.ijhcs.2008.04.004
Jennings, P. (2009). WMG: Professor Paul Jennings. Retrieved December 30, 2009, from http://www2.warwick.ac.uk/fac/sci/wmg/about/people/profiles/paj/
Jensenius, A. R. (2007). ACTION–SOUND: Developing methods and tools to study music-related body movement. Unpublished doctoral dissertation. University of Oslo, Department of Musicology.
Jentsch, E. (1906). On the psychology of the Uncanny. Psychiat.-neurol. Wschr., 8(195), 219-21, 226-7.
Johnstone, T. (1996). Emotional speech elicited using computer games. In Proceedings of Fourth International Conference on Spoken Language (ICSLP96).
Jørgensen, K. (2009). A comprehensive study of sound in computer games. Lewiston, NY: Edwin Mellen Press.
Jørgensen, K. (2011). Time for new terminology? Diegetic and non-diegetic sounds in computer games revisited. In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey, PA: IGI Global.
Jørgensen, A. (2004). Marrying HCI/Usability and computer games: A preliminary look. In Proceedings of the third Nordic conference on Human-computer interaction, NordiCHI ‘04 (pp. 393-396). Tampere, Finland.
Jørgensen, K. (2006). On the functional aspects of computer game audio. In Audio Mostly: A Conference on Sound in Games.
Jørgensen, K. (2007). ‘What are those grunts and growls over there?’ Computer game audio and player action. Unpublished doctoral dissertation, Copenhagen University, Denmark.
Jørgensen, K. (2007b). On transdiegetic sounds in computer games. Northern lights Vol. 5: Digital aesthetics and communication. Intellect Publications.
Jørgensen, K. (2008). Audio and gameplay: An analysis of PvP battlegrounds in World of Warcraft. Gamestudies, 8(2).
Jot, J. M., & Chaigne, A. (1991). Digital delay networks for designing artificial reverberators. Paper presented at the AES 90th Convention. Preprint 3030.
Jumisko-Pyykkö, S., Reiter, U., & Weigel, C. (2007). Produced quality is not perceived quality—A qualitative approach to overall audiovisual quality. In Proceedings of the 3DTV Conference.
Juslin, P. N., & Västfjäll, D. (2008). Emotional responses to music: The need to consider underlying mechanisms. The Behavioral and Brain Sciences, 31, 559–621.
Juslin, P. N., & Sloboda, J. A. (Eds.). (2001). Music and emotion: Theory and research. Oxford: OUP.
Juul, J. (2005). Half-real: Video games between real rules and fictional worlds. Cambridge, MA: MIT Press.

Kallmann, H., Woog, A. P., & Westerkamp, H. (2007). The World Soundscape Project. The Canadian Encyclopedia. Retrieved September 31, 2009, from http://thecanadianencyclopedia.com/PrinterFriendly.cfm?Params=U1ARTU0003743
Kalman, R. E., & Bucy, R. S. (1961). New results in linear filtering and prediction problems. Journal of Basic Engineering, 83, 95–108.
Kanda, T., Hirano, T., Eaton, D., & Ishiguro, H. (2004). Interactive robots as social partners and peer tutors for children: A field trial. Human-Computer Interaction, 19(1), 61–84. doi:10.1207/s15327051hci1901&2_4
Kaplan, H. I., & Sadock, B. J. (1998). Synopsis of psychiatry. Baltimore, MD: Williams & Wilkins.
Károlyi, O. (1999). Introducing music. Penguin.
Karplus, K., & Strong, A. (1983). Digital synthesis of plucked strings and drum timbres. Computer Music Journal, 7(4), 43–55. doi:10.2307/3680062
Kassier, R., Zielinski, S., & Rumsey, F. (2003). Computer games and multichannel audio quality part 2—Evaluation of time-variant audio degradation under divided and undivided attention. AES 115th Convention. Preprint 5856.
Keller, P., & Stevens, C. (2004). Meaning from environmental sounds: Types of signal-referent relations and their effect on recognizing auditory icons. Journal of Experimental Psychology: Applied, 10(1). doi:10.1037/1076-898X.10.1.3
Kelly, G. A. (1955). The psychology of personal constructs. New York: Norton.
Kelly, J., & Lochbaum, C. (1962). Speech synthesis. In Proceedings of the Fourth International Congress on Acoustics, 4, 1-4. Retrieved from http://hear.ai.uiuc.edu/public/Kelly62.pdf
Kendall, N. (2009, September 12). Let us play: Games are the future for music. The Times: Playlist, p. 22.
Khronos Group. (2009). OpenSL ES specification. The Khronos Group.
King, D., Delfabbro, P., & Griffiths, M. (2009). Video game structural characteristics: A new psychological taxonomy. International Journal of Mental Health and Addiction, 8(1), 90–106. doi:10.1007/s11469-009-9206-4
Kirchner, W. K. (1958). Age differences in short-term retention of rapidly changing information. Journal of Experimental Psychology, 55(4), 352–358. doi:10.1037/h0043688
Kirnberger, J. P. (1767). Der allezeit fertige Polonaisen und Menuetten Komponist. Berlin, Germany: G.L. Winter.
Klein, D. J., König, P., & Körding, K. P. (2003). Sparse spectrotemporal coding of sounds. EURASIP Journal on Applied Signal Processing, 7, 659–667. doi:10.1155/S1110865703303051
Klevjer, R. (2007). What is the avatar? Fiction and embodiment in avatar-based singleplayer computer games. Unpublished doctoral dissertation. University of Bergen, Norway.
Klinger, R., & Rudolph, G. (2006). Evolutionary composition of music with learned melody evaluation. In N. Mastorakis & A. Cecchi (Eds.), Proceedings of the 5th WSEAS International Conference on Computational Intelligence, Man-Machine Systems and Cybernetics (pp. 234-239). Venice, Italy: World Scientific and Engineering Academy and Society.
Kojima Productions (Developer). (2008). Metal Gear Solid 4: Guns of the Patriots [Computer game]. Konami.
Konami (1998). Dance Dance Revolution [Computer game]. Konami, Disney, Keen, Nintendo.
Kramer, G., Walker, B., Bonebright, T., Cook, P., Flowers, J., Miner, N., et al. (1997). Sonification report: Status of the field and research agenda. Retrieved September 31, 2009, from http://www.icad.org/websiteV2.0/References/nsf.html
Kranes, D. (1995). Play grounds. Gambling: Philosophy and policy [Special Issue]. Journal of Gambling Studies, 11(1), 91–102. doi:10.1007/BF02283207

Kromand, D. (2008). Sound and the diegesis in survival-horror games. In Proceedings of Audiomostly 2008, 3rd Conference on Interaction With Sound.
Krzywinska, T. (2002). Hands-on horror. In King, G., & Krzywinska, T. (Eds.), ScreenPlay: Cinema/Videogames/Interfaces (pp. 206–223). London: Wallflower.
Kubelka, P. (1998). Talk on Unsere Afrika Reise. Presented at The School of Sound, London, England.
Kubrick, S. (1968). 2001: A space odyssey. Metro-Goldwyn-Mayer.
Kuikkaniemi, K., & Kosunen, I. (2007). Progressive system architecture for building emotionally adaptive games. In BRAINPLAY ’07: Playing with Your Brain Workshop at ACE (Advances in Computer Entertainment) 2007.
Kungel, R. (2004). Filmmusik für Filmemacher—Die richtige Musik zum besseren Film. Reil, Germany: Mediabook-Verlag.
Kunkler-Peck, A. J., & Turvey, M. A. (2000). Hearing shape. Journal of Experimental Psychology: Human Perception and Performance, 26(1), 279–294. doi:10.1037/0096-1523.26.1.279
Kusama, K. (Director). (2005). Aeon Flux [Motion picture]. Hollywood, CA: Paramount.
Ladouceur, R., & Sévigny, S. (2005). Structural characteristics of video lotteries: Effects of a stopping device on illusion of control and gambling persistence. Journal of Gambling Studies, 21(2), 117–131. doi:10.1007/s10899-005-3028-5
Lakoff, G. (1987). Women, fire and dangerous things. Chicago: University of Chicago Press.
Lakoff, G., & Johnson, M. (1980). Metaphors we live by. Chicago: University of Chicago Press.
Lakoff, G., & Johnson, M. (1999). Philosophy in the flesh. New York: Basic Books.
Landragin, F., Bellalem, N., & Romary, L. (2001). Visual salience and perceptual grouping in multimodal interactivity. In Proceedings of International Workshop on Information Presentation and Natural Multimodal Dialogue IPNMD.
Lane, R. D., Nadel, L., Allen, J. J. B., & Kaszniak, A. W. (2002). The study of emotion from the perspective of cognitive neuroscience. In Lane, R. D., & Nadel, L. (Eds.), Cognitive neuroscience of emotion (Series in affective science) (pp. 3–11). Oxford: OUP.
Lang, P. J. (1995). The emotion probe: Studies of motivation and attention. The American Psychologist, 50, 372–385. doi:10.1037/0003-066X.50.5.372
Lang, P. J., Greenwald, M. K., Bradley, M. M., & Hamm, A. O. (1993). Looking at pictures: Affective, facial, visceral, and behavioral reactions. Psychophysiology, 30, 261–273. doi:10.1111/j.1469-8986.1993.tb03352.x
Lange, C. G. (1912). The mechanism of the emotions. In Rand, B. (Ed.), The classical psychologists (pp. 672–684). Boston: Houghton Mifflin.
Langer, E. J. (1975). The illusion of control. Journal of Personality and Social Psychology, 32, 311–328. doi:10.1037/0022-3514.32.2.311
Larsen, J. T., McGraw, A. P., & Cacioppo, J. T. (2001). Can people feel happy and sad at the same time? Journal of Personality and Social Psychology, 81(4), 684–696. doi:10.1037/0022-3514.81.4.684
Larsen, J. T., McGraw, A. P., Mellers, B. A., & Cacioppo, J. T. (2004). The agony of victory and thrill of defeat: Mixed emotional reactions to disappointing wins and relieving losses. Psychological Science, 15(5), 325–330. doi:10.1111/j.0956-7976.2004.00677.x
Larsen, J. T., Norris, C. J., & Cacioppo, J. T. (2003). Effects of positive and negative affect on electromyographic activity over zygomaticus major and corrugator supercilii. Psychophysiology, 40, 776–785. doi:10.1111/1469-8986.00078

Larsson, P., Västfjäll, D., & Kleiner, M. (2002). Better presence and performance in virtual environments by improved binaural sound rendering. In AES 22nd International Conference on Virtual, Synthetic and Entertainment Audio.
Larsson, P., Västfjäll, D., & Kleiner, M. (2003). On the quality of experience: A multi-modal approach to perceptual ego-motion and sensed presence in virtual environments. In Proceedings of First ISCA ITRW on Auditory Quality of Systems AQS-2003, 97-100.
Lastra, J. (2000). Sound technology and the American cinema: Perception, representation, modernity. New York: Columbia University Press.
Laurel, B. (1991). Computers as theatre. Boston, MA: Addison-Wesley.
Lavie, N. (2001). Capacity limits in selective attention: Behavioral evidence and implications for neural activity. In Braun, J., & Koch, C. (Eds.), Visual attention and cortical circuits (pp. 49–60). Cambridge, MA: MIT Press.
Lecanuet, J. P. (1996). Prenatal auditory experience. In Deliège, I., & Sloboda, J. (Eds.), Musical beginnings: Origins and development of musical competence (pp. 3–36). Oxford, UK: Oxford University Press.
Ledoux, J. (1998). The emotional brain: The mysterious underpinnings of emotional life. London, UK: Phoenix.
Lee, K. M., Jeong, E. J., Park, N., & Ryu, S. (2007). Effects of networked interactivity in educational games: Mediating effects of social presence. In Proceedings of PRESENCE 2007, 10th Annual International Workshop on Presence, 179-186.
Lee, K. M., Jin, S. A., Park, N., & Kang, S. (2005). Effects of narrative on feelings of presence in computer/video games. In Proceedings of the Annual Conference of the International Communication Association (ICA).
Leeds, J. (2001). The power of sound. Rochester, VT: Inner Traditions.
Lefebvre, H. (2004). Rhythmanalysis: Space, time and everyday life. Continuum.
Left 4 dead [Computer game]. (2008). Valve Corporation (Developer). Redwood City, CA: EA Games.
Lillian—A natural language library interface and library 2.0 mash-up. (2006). Birmingham, UK: Daden Limited.
Legend of Zelda. (1987). Nintendo.
Lego rock band [Video game]. (2009). Harmonix (Developer), New York: MTV Games.
Li, S., & Knudsen, J. (2005). Beginning J2ME™ platform: From novice to professional (3rd ed.). Apress.
Liljedahl, M., Papworth, N., & Lindberg, S. (2007). Beowulf: An audio mostly game. Proceedings of the International Conference on Advances in Computer Entertainment Technology, 2007, 200–203.
Liljedahl, M. (2011). Sound for fantasy and freedom. In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey, PA: IGI Global.
Liljedahl, M., Lindberg, S., & Berg, J. (2005). Digiwall: An interactive climbing wall. Proceedings of the ACM SIGCHI International Conference on Advances in Computer Entertainment Technology, 2005, 225-228.
Lima, E. (2005). The devil’s in the details: A look at Doom3’s antimusic. In Music4games. Retrieved June 12, 2009, from http://www.music4games.net/Features_Display.aspx?id=70
Lincoln, Y. S., & Guba, E. D. (1985). Naturalistic inquiry. Thousand Oaks, CA: Sage Publications, Inc.
Lissa, Z. (1965). Ästhetik der Filmmusik. Leipzig, Germany: Henschel.
Little BigPlanet. (2008). Sony Computer Entertainment.
Livingstone, C., Woolley, R., Zazryn, T., Bakacs, L., & Shami, R. (2008). The relevance and role of gaming machine games and game features on the play of problem gamblers. Adelaide: Independent Gambling Authority of South Australia.

Livingstone, S. R. (2008). Changing musical emotion through score and performance with a compositional rule system. Unpublished doctoral dissertation. The University of Queensland, Brisbane, Australia.
Livingstone, S. R., & Brown, A. R. (2005). Dynamic response: Real-time adaptation for music emotion. In Proceedings of the Second Australasian Conference on Interactive Entertainment.
LoBrutto, V. (1994). Sound-on-film: Interviews with creators of film sound. Westport, CT: Praeger.
Loftus, G. R., & Loftus, E. F. (1983). Mind at play. New York: Basic Books.
Logan, B. (2002). Content-based playlist generation: Exploratory experiments. In ISMIR2002, 3rd International Conference on Musical Information (ISMIR).
Loki & Creative. (2009). OpenAL (1.1) [Computer software]. Loki Software & Creative Technology.
Lombard, M., & Ditton, T. (1997). At the heart of it all: The concept of presence. Journal of Computer-Mediated Communication, 3(2).
Lorenz, E. (1993). The essence of chaos. Seattle, WA: University of Washington Press. doi:10.4324/9780203214589
Löthe, M. (2003). Ein wissensbasiertes Verfahren zur Komposition von frühklassischen Menuetten. Unpublished doctoral dissertation. University of Stuttgart, Germany.
Love. (forthcoming). Eskil Steenberg.
Lucas, G. (1971). THX 1138. Warner Bros. Pictures.
Lucas, G. (Director). (1977). Star Wars [Motion picture]. Los Angeles, CA: 20th Century Fox.
LucasArts. (1997). Monkey Island 3: The Curse of Monkey Island. LucasArts.
Lumbreras, M., & Sánchez, J. (1999). Interactive 3D sound hyperstories for blind children. In Proceedings of the SIGCHI conference on Human factors in computing systems: the CHI is the limit (pp. 318-325). Pittsburgh, PA: ACM.
Lykken, D. T., & Venables, P. H. (1971). Direct measurement of skin conductance: A proposal for standardization. Psychophysiology, 8(5), 656–672. doi:10.1111/j.1469-8986.1971.tb00501.x
Lynch, D. (1977). Eraserhead. American Film Institute.
Lynch, D. (2003). Action and reaction. In Sider, L. (Ed.), Soundscape: The School of Sound lectures 1998-2001 (pp. 49–53). London: Wallflower Press.
Lynch, D. (1990-1991). Twin Peaks. Lynch/Frost Productions.
MacDorman, K. F., Green, R. D., Ho, C. C., & Koch, C. T. (2009). Too real for comfort? Uncanny responses to computer generated faces. Computers in Human Behavior, 25, 695–710. doi:10.1016/j.chb.2008.12.026
MacDorman, K. F., & Ishiguro, H. (2006). The uncanny advantage of using androids in cognitive and social science research. Interaction Studies: Social Behaviour and Communication in Biological and Artificial Systems, 7(3), 297–337. doi:10.1075/is.7.3.03mac
MacDorman, K. F. (2006). Subjective ratings of robot video clips for human likeness, familiarity, and eeriness: An exploration of the Uncanny Valley. ICCS/CogSci-2006 Long Symposium: Toward Social Mechanisms of Android Science.
MacKenzie, I. S., & Ware, C. (1993). Lag as a determinant of human performance in interactive systems. In Proceedings of the ACM Conference on Human Factors in Computing Systems – INTERCHI’93, 488-493.
MacLaran, A. (2003). Making space: Property development and urban planning. London: Hodder Arnold.
Mahlke, S. (2007). Marc Hassenzahl on user experience. HOT Topics, 6(2). Retrieved September 31, 2009, from http://hot.carleton.ca/hot-topics/articles/hassenzahl-on-user-experience/

Mahlke, S., & Thüring, M. (2007). Studying antecedents of emotional experiences in interactive contexts. In Proceedings of the SIGCHI conference on Human factors in computing systems (pp. 915-918). San Jose, CA: ACM Press.
Mandryk, R. L., & Atkins, M. S. (2007). A fuzzy physiological approach for continuously modeling emotion during interaction with play environments. International Journal of Human-Computer Studies, 65(4), 329–347. doi:10.1016/j.ijhcs.2006.11.011
Mandryk, R. L. (2008). Physiological measures for game evaluation. In Isbister, K., & Schaffer, N. (Eds.), Game usability: Advice from the experts for advancing the player experience (pp. 207–235). Burlington, MA: Elsevier.
Manning, P. (1992). Erving Goffman and modern sociology. Stanford, CA: Stanford University Press.
Manovich, L. (2001). The language of new media. Cambridge, MA: MIT Press.
Manz, J., & Winter, J. (Eds.). (1976). Baukastensätze zu Weisen des Evangelischen Kirchengesangbuches. Berlin: Evangelische Verlagsanstalt.
Bear, M. F., Connors, B. W., & Paradiso, M. A. (2007). Neuroscience: Exploring the brain (3rd ed.). Lippincott Williams & Wilkins.
Marks, A., & Novak, J. (2009). Game development essentials: Game audio development. Florence, KY: Delmar Cengage Learning.
Marks, A. (2001). The complete guide to game audio. Lawrence, KS: CMP Books.
Marks, A. (2009). The complete guide to game audio: For composers, musicians, sound designers, game developers (2nd ed.). Elsevier.
Marmurek, H. H. C., Finlay, K., Kanetkar, V., & Londerville, J. (2007). The influence of music on estimates of at-risk gambling intentions: An analysis by casino design. International Gambling Studies, 7(1), 113–122. doi:10.1080/14459790601158002
Matsui, D., Minato, T., MacDorman, K. F., & Ishiguro, H. (2005). Generating natural motion in an android by mapping human motion. In Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems, 1089-1096.
Mattila, A. S., & Wirtz, J. (2001). Congruency of scent and music as a driver of in-store evaluations and behavior. Journal of Retailing, 77, 273–289. doi:10.1016/S0022-4359(01)00042-2
Maturana, H. R., & Varela, F. G. (1980). Autopoiesis: The organization of the living. In Maturana, H. R., & Varela, F. G. (Eds.), Autopoiesis and cognition. Dordrecht, Netherlands: Reidel.
Maurizio, V., & Samuele, S. (2007). Low-cost accelerometers for physics experiments. European Journal of Physics, 28, 781–787. doi:10.1088/0143-0807/28/5/001
Max Payne. (2001). Rockstar Games.
May, R. (1977). The meaning of anxiety (revised ed.). New York: Norton.
Mazzola, G., Göller, S., & Müller, S. (2002). The topos of music: Geometric logic of concepts, theory, and performance. Zurich: Birkhäuser Verlag.
McAdams, S. E., & Bigand, E. (Eds.). (1992). Thinking in sound: The cognitive psychology of human audition. Oxford: Clarendon Press.
McCraty, R., Barrios-Choplin, B., Atkinson, M., & Tomasino, D. (1998). The effects of different types of music on mood, tension and mental clarity. Alternative Therapies in Health and Medicine, 4, 75–84.
McCuskey, M. (2003). Beginning game audio programming. Boston, MA: Premier Press.
McDonald, G. (2008). A brief timeline of video game music. Retrieved July 8, 2009, from http://www.gamespot.com/gamespot/features/video/vg_music/
McGurk, H., & MacDonald, J. (1976). Hearing lips and seeing voices. Nature, 264(5568), 746–748. doi:10.1038/264746a0

McMahan, A. (2003). Immersion, engagement, and presence: A new method for analyzing 3-D video games. In Wolf, M. J. P., & Perron, B. (Eds.), The video game theory reader (pp. 67–87). New York: Routledge.
McTiernan, J. (1987). Predator. Amercent Films.
Meehan, M., Razzaque, S., Whitton, M. C., & Brooks, F. P., Jr. (2003). Effect of latency on presence in stressful virtual environments. In Proceedings of IEEE Virtual Reality, 141-148.
Mega Man. (1993). Capcom. Capcom Entertainment.
Menon, V., & Levitin, D. J. (2005). The rewards of music listening: Response and physiological connectivity of the mesolimbic system. NeuroImage, 28(1), 175–184. doi:10.1016/j.neuroimage.2005.05.053
Menzies, D. (2002). Scene management for modelled audio objects in interactive worlds. In R. Nakatsu & H. Kawahara (Eds.), Proceedings of the 8th International Conference on Auditory Display. Kyoto, Japan: ATR.
Menzies, D. (2007). Physical audio for virtual environments, Phya in review. In W. L. Martens (Ed.), Proceedings of the 13th International Conference on Auditory Display (pp. 197-202). Montreal, Canada: McGill University.
Menzies, D. (2008). Virtual intimacy: Phya as an instrument. In Proceedings of the 8th International Conference on New Interfaces for Musical Expression NIME08. Retrieved from http://www.zenprobe.com/dylan/pubs/menzies08_virtualIntimacy.pdf
Menzies, D. (2009). Phya and VFoley, physically motivated audio for virtual environments. In 35th AES Conference on Audio for Games. Retrieved from http://www.aes.org/e-lib/browse.cfm?elib=15171
Metal Gear Solid. (1998). Konami Japan. Konami Computer Entertainment.
Metz, C. (1980/1985). Aural objects. In Weis, E., & Belton, J. (Eds.), Film sound: Theory and practice. New York: Columbia University Press.
Meyer, J. (2009). Acoustics and the performance of music: Manual for acousticians, audio engineers, musicians, architects and musical instrument makers (5th ed.). New York: Springer.
Microsoft. (2009). DirectX 11 [Computer software]. Microsoft Corporation.
Miller, D. J., & Robertson, D. P. (2009). Using a games console in the primary classroom: Effects of ‘Brain Training’ programme on computation and self-esteem. British Journal of Educational Technology, 41(2), 242–255. doi:10.1111/j.1467-8535.2008.00918.x
Miller, G. A. (1956). The magical number seven, plus or minus two: Some limits on our capacity for processing information. Originally published in The Psychological Review (1956), 63, 81-97. (Reproduced, with the author’s permission, by Stephen Malinowski). Retrieved March 10, 2009, from http://www.musanim.com/miller1956/
Minato, T., Shimada, M., Ishiguro, H., & Itakura, S. (2004). Development of an android robot for studying human-robot interaction. In R. Orchard, C. Yang & M. Ali (Eds.), Innovations in applied artificial intelligence, 424-434.
Miranda, E. R., & Biles, J. A. (Eds.). (2007). Evolutionary computer music (1st ed.). USA: Springer. doi:10.1007/978-1-84628-600-1
Miranda, E. R. (2002). Towards the cutting edge: AI, supercomputing and evolutionary systems. In Computer sound design (pp. 157-192). Elsevier.
Moeck, T., Bonneel, N., Tsingos, N., Drettakis, G., Viaud-Delmon, I., & Alloza, D. (2007). Progressive perceptual audio rendering of complex scenes. In Proceedings of the 2007 Symposium on Interactive 3D Graphics and Games (ACM SIGGRAPH), 189-196.
Moffat, D. (1980). Personality parameters and programs. In Trappl, R., & Petta, P. (Eds.), Creating personalities for synthetic actors (pp. 120–165). Berlin: Springer.
Moore, B. C. J. (Ed.). (1995). Hearing: Handbook of perception and cognition (2nd ed.). New York: Academic Press.

Moore, B. C. J. (2003). An introduction to the psychology of hearing (5th ed.). New York: Academic Press.
Morgan, S. (2009). Dynamic game audio ambience: Bringing Prototype’s New York City to life. Gamasutra. Retrieved May 8, 2009, from http://www.gamasutra.com/view/feature/4043/
Mori, M. (1970/2005). The Uncanny Valley (K. F. MacDorman & T. Minato, Trans.). Energy, 7(4), 33–35.
Moss, W., & Yeh, H. (2010). Automatic sound synthesis from fluid simulation. ACM Transactions on Graphics (SIGGRAPH 2010).
Mozart, W. A. (1787). Musikalisches Würfelspiel: Anleitung so viel Walzer oder Schleifer mit zwei Würfeln zu componieren ohne musikalisch zu seyn noch von der Composition etwas zu verstehen. Köchel Catalog of Mozart’s Work KV1 Appendix 294d or KV6 516f.
Mr. Do! (1983). CBS Electronics.
Mullan, E. (2011). Physical modelling for sound synthesis. In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey, PA: IGI Global.
Mullan, E. (2009). Driving sound synthesis from a physics engine. In Charlotte Kobert (Ed.), Proceedings of the IEEE Games Innovation Conference 2009 (pp. 256-264). New York: IEEE.
Murch, W. (1995). Sound design: The dancing shadow. In Boorman, J., Luddy, T., Thomson, D., & Donohue, W. (Eds.), Projections 4: Film-makers on film-making (pp. 237–251). London: Faber and Faber.
Murch, W. (1998). Dense clarity – Clear density. Retrieved March 10, 2009, from http://www.ps1.org/cut/volume/murch.html
Murphy, D. (1999). A review of spatial sound in the Java 3D API specification. Institute of Sound Recording, University of Surrey.
Murphy, D., & Neff, F. (2011). Spatial sound for computer games and virtual reality. In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey, PA: IGI Global.
Murphy, D. (1999). Spatial sound description in virtual environments. In Proceedings of the Cambridge Music Processing Colloquium.
Murphy, D., & Pitt, I. (2001). Spatial sound enhancing virtual storytelling. In Proceedings of the International Conference ICVS, Virtual Storytelling Using Virtual Reality Technologies for Storytelling (pp. 20-29). Berlin: Springer.
Murphy, D., & Rumsey, F. (2001). A scalable spatial sound rendering system. In Proceedings of the 110th AES Convention.
Murray, J. (1997). Hamlet on the holodeck: The future of narrative in cyberspace. Cambridge, MA: MIT Press.
Muzak Corporation. (n.d.). Why Muzak. Retrieved October 5, 2009, from http://music.muzak.com/why_muzak
Myst. (1993). Brøderbund.
Nacke, L. E., Grimshaw, M. N., & Lindley, C. A. (2010). More than a feeling: Measurement of sonic user experience and psychophysiology in a first-person shooter. Interacting with Computers, 22(5), 336–343. doi:10.1016/j.intcom.2010.04.005
Nacke, L., & Grimshaw, M. (2011). Player-game interaction through affective sound. In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey, PA: IGI Global.
Nacke, L. E. (2009). Affective ludology: Scientific measurement of user experience in interactive entertainment. Unpublished doctoral dissertation. Blekinge Institute of Technology, Karlskrona, Sweden. Retrieved January 1, 2010, from http://affectiveludology.acagamic.com

Compilation of References

Nacke, L., & Lindley, C. A. (2008). Flow and immersion in first-person shooters: Measuring the player's gameplay experience. In Proceedings of the 2008 Conference on Future Play: Research, Play, Share (pp. 81-88). New York: ACM.

Nacke, L., Lindley, C., & Stellmach, S. (2008). Log who's playing: Psychophysiological game analysis made easy through event logging. In P. Markopoulos, B. Ruyter, W. IJsselsteijn, & D. Rowland (Eds.), Proceedings of Fun and Games, Second International Conference (pp. 150-157). Berlin: Springer.

Nakamura, J., & Csíkszentmihályi, M. (2002). The concept of flow. In Snyder, C. R., & Lopez, S. J. (Eds.), Handbook of positive psychology (pp. 89–105). New York: Oxford University Press.

Namco (2003). Donkey Konga [Computer game]. Namco (Developer), Nintendo.

NanaOn-Sha (1996). PaRappa the Rapper [Computer game]. NanaOn-Sha (Developer), Sony.

NanaOn-Sha (1999). Vib-Ribbon [Computer game]. NanaOn-Sha (Developer), Sony.

Napolitano, J. (2008). Dead Space sound design: In space no one can hear intern screams. They are dead (Interview). Original Sound Version. Retrieved from http://www.originalsoundversion.com/?p=693.

Naughty Dog (Developer). (2007). Uncharted: Drake's Fortune [Computer game]. Sony Computer Entertainment.

Neale, S. (2000). Genre and Hollywood. New York: Routledge.

Neitzel, B. (2000). Gespielte Geschichten. Struktur- und prozessanalytische Untersuchungen der Narrativität von Videospielen. Unpublished doctoral dissertation. University of Weimar, Germany.

Nettle, D. (2006). Happiness: The science behind your smile. Oxford: OUP.

Nordahl, R. (2005). Self-induced footsteps sounds in virtual reality: Latency, recognition, quality and presence. In Proceedings of PRESENCE 2005, 8th Annual International Workshop on Presence (pp. 353-354).

Norman, D. (2004). Emotional design: Why we love (or hate) everyday things. New York: Basic Books.

Norman, D. (2002). Emotion & design: Attractive things work better. Interactions, 9(4), 36-42.

O'Brien, J. F., Cook, P. R., & Essl, G. (2001). Synthesizing sounds from physically based motion. In P. Lynn (Ed.), Proceedings of SIGGRAPH '01: The 28th annual conference on Computer graphics and interactive techniques (pp. 529-536). New York: ACM.

O'Callaghan, C. (2009, Summer). Auditory perception. In E. N. Zalta (Ed.), The Stanford encyclopedia of philosophy. Retrieved January 24, 2010, from http://plato.stanford.edu/archives/sum2009/entries/perception-auditory/.

Odin, R. (2000). De la fiction. Bruxelles: De Boeck.

Oguro, C. (2009). The greatest Easter eggs in gaming. Gamespot. Retrieved October 5, 2009, from http://www.gamespot.com/features/6131572/index.html.

Öhman, A., Flykt, A., & Esteves, F. (2001). Emotion drives attention: Detecting the snake in the grass. Journal of Experimental Psychology: General, 130(3), 466–478. doi:10.1037/0096-3445.130.3.466

Oink! (1983). Activision.

Ong, W. (1982/1990). Orality and literacy: The technologizing of the word (L. Fyhr, G. D. Hansson & L. Perme, Swedish Trans.). Göteborg, Sweden: Anthropos.

Otani, M., & Ise, S. (2003). A fast calculation method of the head-related transfer functions for multiple source points based on the boundary element method. Acoustical Science and Technology, 24(5), 259–266. doi:10.1250/ast.24.259

Owen, D. (2006, April 10). The soundtrack of your life: Muzak in the realm of retail theatre. The New Yorker. Retrieved October 5, 2009, from http://www.newyorker.com/archive/2006/04/10/060410fa_fact.

Paavola, M. K. E., & Page, J. (2005). 3D audio for mobile devices via Java. In Proceedings of the AES 118th Convention.

Pachet, F., & Roy, P. (2001). Musical harmonization with constraints: A survey. Constraints Journal.

Pac-Man. (1980). Namco.

Pai, D. K., Doel, K. d., James, D. L., Lang, J., Lloyd, J. E., Richmond, J. L., & Yau, S. H. (2001). Scanning physical interaction behaviour of 3D objects. In P. Lynn (Ed.), Proceedings of SIGGRAPH '01: The 28th annual conference on Computer graphics and interactive techniques (pp. 87-96). New York: ACM.

Panksepp, J. (2004). Affective neuroscience: The foundations of human and animal emotions. Oxford: Oxford University Press.

Papadopoulos, G., & Wiggins, G. (1999). AI methods for algorithmic composition: A survey, a critical view and future prospects. In AISB Symposium on Musical Creativity. Edinburgh, Scotland.

PaRappa the rapper. [Video game]. (1996). Sony Computer Entertainment.

Parke, J., & Griffiths, M. (2006). The psychology of the fruit machine: The role of structural characteristics (Revisited). International Journal of Mental Health and Addiction, 4, 151–179. doi:10.1007/s11469-006-9014-z

Parker, J. R., & Heerema, J. (2008). Audio interaction in computer mediated games. International Journal of Computer Games Technology, 2008, 1–8. doi:10.1155/2008/178923

Parker, P. (2003). Filling the gaps. In Sider, L. (Ed.), Soundscape: The School of Sound lectures 1998-2001 (pp. 184–194). London: Wallflower Press.

Pashler, H. E. (1999). The psychology of attention. Cambridge, MA: MIT Press.

Paul, L., & Bridgett, R. (2006). Establishing an aesthetic in next generation sound design. Gamasutra. Retrieved May 25, 2009, from http://www.gamasutra.com/view/feature/2733/

Peck, N. (2001). Beyond the library: Applying film postproduction techniques to game sound design. In Proceedings of Game Developers Conference. San Jose, CA.

Peck, N. (2007, September). Unpublished presentation. CoFesta/TGS, Tokyo, Japan.

Pedersini, F., Sarti, A., & Tubaro, S. (2000). Object-based sound synthesis for virtual environments using musical acoustics. IEEE Signal Processing Magazine, 17(6), 37–51. doi:10.1109/79.888863

Perron, B. (2006). Silent Hill: Il motore del terrore. Milan: Costa & Nolan.

Perron, B. (2004). Sign of a threat: The effects of warning systems in survival horror games. In Proceedings of the Fourth International COSIGN (Computational Semiotics for Games and New Media) 2004 Conference.

Perron, B. (2005a). A cognitive psychological approach to gameplay emotions. In Proceedings of the Second International DiGRA (Digital Games Research Association) 2005 Conference.

Perron, B. (2005b). Coming to play at frightening yourself: Welcome to the world of horror video games. In Proceedings of the Aesthetics of Play conference.

Phase. [Video game]. (2007). Harmonix Music Systems.

Phillips, N. (2009). From films to games, from analog to digital: Two revolutions in multi-media! Retrieved July 8, 2009, from http://www.filmsound.org/game-audio/film_game_parallels.htm.

Picard, R. W. (1997). Affective computing. Cambridge, MA: MIT Press.

Pillay, H. K. (2002). An investigation of cognitive processes engaged in by recreational computer game players: Implications for skills of the future. Journal of Research on Technology in Education, 34(3), 336–350.

Pitzen, L. J., & Rauscher, F. H. (1998, May). Choosing music, not style of music, reduces stress and improves task performance. Poster presented at the American Psychological Society, Washington, DC.

Planescape: Torment. (2005). Black Isle Studios. Interplay.

Plantec, P. (2007). Crossing the great Uncanny Valley. In Animation World Network. Retrieved August 21, 2010, from http://www.awn.com/articles/production/crossing-great-uncanny-valley/page/1%2C1.

Plantec, P. (2008). Image Metrics attempts to leap the Uncanny Valley. In The Digital Eye. Retrieved April 6, 2009, from http://vfxworld.com/?atype=articles&id=3723&page=1.

Platt, J. C., Burges, C. J. C., Swenson, S., Weare, C., & Zheng, A. (2002). Learning a Gaussian process prior for automatically generating music playlists. Advances in Neural Information Processing Systems, 14, 1425–1432.

Plomp, R., & Mimpen, A. M. (1968). The ear as a frequency analyzer. The Journal of the Acoustical Society of America, 36, 1628–1636. doi:10.1121/1.1919256

Plutchik, R. (1984). Emotions: A general psychoevolutionary theory. Hillsdale, NJ: Erlbaum.

Plutchik, R. (2001). The nature of emotions. American Scientist, 89(4), 344–350.

Polaine, A. (2005). The flow principle in interactivity. In Proceedings of the Second Australasian Conference on Interactive Entertainment.

Pollack, I. (1952). The information of elementary auditory displays. The Journal of the Acoustical Society of America, 24, 745–749. doi:10.1121/1.1906969

Pollack, I. (1953). The information of elementary auditory displays II. The Journal of the Acoustical Society of America, 25, 765–769. doi:10.1121/1.1907173

Pollick, F. E. (in press). In search of the Uncanny Valley. In Grammer, K., & Juett, A. (Eds.), Analog communication: Evolution, brain mechanisms, dynamics, simulation. Cambridge, MA: MIT Press.

Polotti, P., Papetti, S., Rocchesso, D., & Delle, S. (Eds.). (2001). The sounding object (Sob project). Verona: University of Verona.

Pong. (1972). Atari Inc.

Posner, J., Russell, J. A., Gerber, A., Gorman, D., Colibazzi, T., & Yu, S. (2009). The neurophysiological bases of emotion: An fMRI study of the affective circumplex using emotion-denoting words. Human Brain Mapping, 30(3), 883–895. doi:10.1002/hbm.20553

Posner, J., Russell, J. A., & Peterson, B. S. (2005). The circumplex model of affect: An integrative approach to affective neuroscience, cognitive development, and psychopathology. Development and Psychopathology, 17, 715–734. doi:10.1017/S0954579405050340

Pozzati, G. (2009). Infinite suite: Computers and musical form. In G. Scavone, V. Verfaille & A. da Silva (Eds.), Proceedings of the International Computer Music Conference (ICMC) (pp. 319-322). Montreal, Canada: International Computer Music Association, McGill University.

Prey. (2005). 2K Games/3D Realms.

Primrose. (2009). Jason Rohrer.

Prince, R. (1996). Tricks and techniques for sound effect design. CGDC. Retrieved October 10, 2008, from http://www.gamasutra.com/features/sound_and_music/081997/sound_effect.htm

Productions, K. S. K. (n.d.). Cinematic & Muzak. Retrieved October 20, 2009, from http://www.kskproductions.nl/en/services/cinematic-a-muzak.

Prototype. (2009). Activision.

Przybylski, A. K., Ryan, R. M., & Rigby, S. C. (2009). The motivating role of violence in video games. Personality and Social Psychology Bulletin, 35(2), 243–259. doi:10.1177/0146167208327216

Pudovkin, V. (1985). Asynchronism as a principle of sound film. In Weis, E., & Belton, J. (Eds.), Film sound: Theory and practice. New York: Columbia University Press. (Original work published 1929)

Pulkki, V. (2001). Spatial sound generation and perception by amplitude panning techniques. Unpublished doctoral dissertation. Helsinki University of Technology, Finland.


Pulman, A. (2007). Investigating the potential of Nintendo DS Lite handheld gaming consoles and Dr. Kawashima's Brain Training software as a study support tool in numeracy and mental arithmetic. JISC TechDis HEAT Scheme Round 1 Project Reports. Retrieved June 6, 2009, from http://www.techdis.ac.uk/index.php?p=2_1_7_9.

Quilitch, H. R., & Risley, T. R. (1973). The effects of play materials on social play. Journal of Applied Behavior Analysis, 6(4), 573–578. doi:10.1901/jaba.1973.6-573

Raghuvanshi, N., Lauterbach, C., Chandak, A., Manocha, D., & Lin, M. C. (2007). Real-time sound synthesis and propagation for games. Communications of the ACM, 50(7), 67–73. doi:10.1145/1272516.1272541

Raghuvanshi, N., & Lin, M. C. (2006). Interactive sound synthesis for large scale environments. In Proceedings of the 2006 symposium on Interactive 3D graphics and games (pp. 101-108). New York: ACM.

Rand, A. (1971). Art and cognition. The Romantic Manifesto (p. 78). Signet.

Rault, J. B., Emerit, M., Warusfel, O., & Jot, J. M. (1998). Audio rendering of virtual room acoustics and perceptual description of the auditory scene. TCI/SC29/WG11.

Ravaja, N. (2004). Contributions of psychophysiology to media research: Review and recommendations. Media Psychology, 6(2), 193–235. doi:10.1207/s1532785xmep0602_4

Ravaja, N., Turpeinen, M., Saari, T., Puttonen, S., & Keltikangas-Järvinen, L. (2008). The psychophysiology of James Bond: Phasic emotional responses to violent video game events. Emotion (Washington, D.C.), 8(1), 114–120. doi:10.1037/1528-3542.8.1.114

Ravaja, N., Saari, T., Laarni, J., Kallinen, K., Salminen, M., Holopainen, J., & Järvinen, A. (2005). The psychophysiology of video gaming: Phasic emotional responses to game events. In Proceedings of DiGRA 2005 Conference: Changing Views - Worlds in Play.

Recommendation ITU-T P.911. (1998/1999). Subjective audiovisual quality assessment methods for multimedia applications. Geneva: International Telecommunication Union.

Reeves, B., & Voelker, D. (1993). Effects of audio-video asynchrony on viewer's memory, evaluation of content and detection ability (Research report prepared for Pixel Instruments, CA). Palo Alto, CA: Stanford University, Department of Communication.

Reid, J., Geelhoed, E., Hull, R., Cater, K., & Clayton, B. (2005). Parallel worlds: Immersion in location-based experiences. In CHI '05 Extended Abstracts on Human Factors in Computing Systems.

Reiter, U. (2011). Perceived quality in game audio. In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey, PA: IGI Global.

Reiter, U., & Jumisko-Pyykkö, S. (2007). Watch, press and catch—Impact of divided attention on requirements of audiovisual quality. In Jacko, J. (Ed.), Human-Computer Interaction, Part III, HCI 2007 (pp. 943–952). Berlin: Springer Verlag.

Reiter, U. (2009). Bimodal audiovisual perception in interactive application systems of moderate complexity. Unpublished doctoral dissertation. TU Ilmenau, Germany.

Reiter, U., & Weitzel, M. (2007). Influence of interaction on perceived quality in audiovisual applications: Evaluation of cross-modal influence. In Proceedings of 13th International Conference on Auditory Displays (ICAD).

Resident evil 3: Nemesis. [Computer game]. (1999). Capcom (Developer). Sunnyvale: Capcom USA.

Resident evil 4. [Computer game]. (2004). Capcom Production Studio 4 (Developer). Sunnyvale: Capcom USA.

Resident evil 5. [Computer game]. (2009). Capcom Production Studio 4 (Developer). Sunnyvale: Capcom USA.

Cardinal, S. (1994). Occurrences sonores et espace filmique. Unpublished master's thesis. University of Montréal, Montréal.


Resident evil. [Computer game]. (1996). Capcom (Developer). Sunnyvale: Capcom USA.

Resident evil. [Computer game]. (2002). Capcom (Developer). Sunnyvale: Capcom USA.

Reynolds, G., Barry, D., Burke, T., & Coyle, E. (2007). Towards a personal automatic music playlist generation algorithm: The need for contextual information. In Proceedings of the Audio Mostly Conference on Interaction with Sound.

Rez. [Video game]. (2001). Sega (Developer, Dreamcast), Sony Computer Entertainment Europe (Developer, PlayStation 2).

Rhodes, L. A., David, D. C., & Combs, A. L. (1988). Absorption and enjoyment of music. Perceptual and Motor Skills, 66, 737–738.

Richards, J. (2008, August 18). Lifelike animation heralds new era for computer games. The Times Online. Retrieved April 7, 2009, from http://technology.timesonline.co.uk/tol/news/tech_and_web/article4557935.ece.

Riessman, D. C. K. (1993). Narrative analysis (1st ed.). Los Angeles: Sage.

Ripken, J. (2009, October 19). Game synchronisation: A view from artist development. Paper presented at the Music and Creative Industries Conference 2009, Manchester, UK.

Rivlin, G. (2004, May 9). The tug of the newfangled slot machines. New York Times. Retrieved July 15, 2009, from http://www.nytimes.com/2004/05/09/magazine/09SLOTS.html.

Roads, C. (1996). The computer music tutorial. Cambridge, MA: MIT Press.

Röber, N. (2008). Interacting with sound: Explorations beyond the frontiers of 3D virtual auditory environments. Munich, Germany: Dr. Hut.

Röber, N., & Masuch, M. (2005). Leaving the screen: New perspectives in audio-only gaming. In Proceedings of 11th International Conference on Auditory Display (ICAD).

Röber, N., Kaminski, U., & Masuch, M. (2007). Ray acoustics using computer graphics technology. In Proceedings of the 10th International Conference on Digital Audio Effects (DAFx-07) (pp. 117-124). Bordeaux, France: LaBRI University Bordeaux.

Rocchesso, D., Avanzini, A., Rath, M., Bresin, R., & Serafin, S. (2004). Contact sounds for continuous feedback. In Proceedings of International Workshop on Interactive Sonification.

Rock Band. (2008). Harmonix. MTV Games.

Rock band. (2005-2007). Harmonix Music Systems.

Roddenberry, G. (1966-1969). Star trek. Paramount Television.

Roeber, N., Deutschmann, E. C., & Masuch, M. (2006). Authoring of 3D virtual auditory environments. In Proceedings of the First International AudioMostly Conference (pp. 15-21).

Rohner, S. J., & Miller, R. (1980). Degrees of familiar and affective music and their effects on state anxiety. Journal of Music Therapy, 17, 2–15.

Roque, L. (2005). A sociotechnical conjecture about the context and development of multiplayer online game experiences. In Proceedings of DiGRA 2005 Conference: Changing Views – Worlds in Play. Vancouver, Canada.

Roux-Girard, G. (2011). Listening to fear: A study of sound in horror computer games. In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey, PA: IGI Global.

Roux-Girard, G. (2009). Plunged alone into darkness: Evolution in the staging of fear in the Alone in the Dark series. In Perron, B. (Ed.), Horror video games: Essays on the fusion of fear and play (pp. 145–167). Jefferson, NC: McFarland.

Ruiz, P. (1969). A technique for simulating the vibrations of strings with a digital computer. Unpublished master's thesis. University of Illinois, Urbana, IL.


Russell, J. A. (1980). A circumplex model of affect. Journal of Personality and Social Psychology, 39(6), 1161–1178. doi:10.1037/h0077714

Russell, J. A. (2003). Core affect and the psychological construction of emotion. Psychological Review, 110(1), 145–172. doi:10.1037/0033-295X.110.1.145

Russolo, L. (1913). Russolo: The art of noises. Retrieved December 30, 2009, from http://120years.net/machines/futurist/art_of_noise.html.

Ryan, R., Rigby, C., & Przybylski, A. (2006). The motivational pull of video games: A self-determination theory approach. Motivation and Emotion, 30(4), 344–360. doi:10.1007/s11031-006-9051-8

Sakaguchi, H. (Director). (2001). Final fantasy [Motion picture]. Los Angeles: Columbia.

Salen, K., & Zimmermann, E. (2004). Rules of play: Game design fundamentals. Cambridge, MA: MIT Press.

Samorost 1. (2003). Amanita Design.

Samorost 2. (2005). Amanita Design.

Yairi, S., Iwaya, Y., & Suzuki, Y. (2008). Individualization of head-related transfer functions based on subjective evaluation. In Proceedings of the 14th International Conference on Auditory Displays.

Saunders, K., & Novak, J. (2006). Game development essentials: Game interface design. Stamford, CT: Cengage Learning.

Scarface: The world is yours. (2006). Vivendi.

Schachter, S., & Singer, J. (1962). Cognitive, social, and physiological determinants of emotional state. Psychological Review, 69, 379–399. doi:10.1037/h0046234

Schachter, S. (1964). The interaction of cognitive and physiological determinants of emotional state. In Berkowitz, L. (Ed.), Advances in experimental social psychology (Vol. 1, pp. 49–80). New York: Academic Press. doi:10.1016/S0065-2601(08)60048-9

Schaeffer, P. (1966). Traité des objets musicaux. Paris: Seuil.

Schafer, R. M. (1977). The tuning of the world. Toronto: McClelland and Stewart.

Schafer, R. M. (1977). The soundscape: Our sonic environment and the tuning of the world. New York: Destiny Books.

Schafer, R. M. (1973). The music of the environment. Cultures, 1973(1).

Schell, J. (2008). The art of game design: A book of lenses. London: Morgan Kaufmann.

Schlosberg, H. (1952). The description of facial expressions in terms of two dimensions. Journal of Experimental Psychology, 44(4), 229–237. doi:10.1037/h0055778

Schmidt, A., & Winterhalter, C. (2004). User context aware delivery of e-learning material: Approach and architecture. Journal of Universal Computer Science, 10(1), 38–46.

Schneider, E., Wang, Y., & Yang, S. (2007). Exploring the Uncanny Valley with Japanese video game characters. In Proceedings of Situated Play, DiGRA 2007 Conference (pp. 546-549).

Schottstaedt, W. (1989). Automatic counterpoint. In Mathews, M., & Pierce, J. (Eds.), Current directions in computer music research. Cambridge, MA: MIT Press.

Schroeder, M. R. (1962). Natural sounding artificial reverberation. Journal of the Audio Engineering Society, 10(3), 219–223.

Schroeder, M. R. (1970). Digital simulation of sound transmission in reverberant spaces (part 1). The Journal of the Acoustical Society of America, 47(2), 424–431. doi:10.1121/1.1911541

Schull, N. D. (2005). Digital gambling: The coincidence of desire and design. The Annals of the American Academy of Political and Social Science, 597, 65–81. doi:10.1177/0002716204270435

Scott, R. (1979). Alien. Twentieth Century Fox.

Scott, T. (Director). (1986). Top Gun [Motion picture]. Hollywood, CA: Paramount Pictures.


Seah, M., & Cairns, P. (2008). From immersion to addiction in videogames. In Proceedings of BCS HCI 2008 (pp. 55–63). New York: ACM.

Seeking Alpha. (2008, August 5). The video game industry: An $18 billion entertainment juggernaut. Retrieved from http://seekingalpha.com/article/89124-the-video-game-industry-an-18-billion-entertainment-juggernaut.

Sega (2001). Rez [Computer game]. Sega.

Seitter, W. (2007). Das Spektrum der menschlichen Schallproduktionen. In H. Schulze & C. Wulf (Eds.), Paragrana, Internationale Zeitschrift für Historische Anthropologie, 16(2), 191-205. Berlin: Akademie Verlag.

Sek, A., & Moore, B. C. (1995). Frequency discrimination as a function of frequency, measured in several ways. The Journal of the Acoustical Society of America, 97(4), 2479–2486. doi:10.1121/1.411968

Sengers, P., Boehner, K., Mateas, M., & Gay, G. (2008). The disenchantment of affect. Personal and Ubiquitous Computing, 12(5), 347–358. doi:10.1007/s00779-007-0161-4

Sengers, P., & Gaver, B. (2006). Staying open to interpretation: Engaging multiple meanings in design and evaluation. In Proceedings of the 6th Conference on Designing Interactive Systems (pp. 99-108).

Sequeira, S. D. S., Specht, K., Hämäläinen, H., & Hugdahl, K. (2008). The effects of different intensity levels of background noise on dichotic listening to consonant-vowel syllables. Scandinavian Journal of Psychology, 49(4), 305–310. doi:10.1111/j.1467-9450.2008.00664.x

Serafin, S. (2004). Sound design to enhance presence in photorealistic virtual reality. In Proceedings of the 2004 International Conference on Auditory Display.

Serquera, J., & Miranda, E. R. (2010). CA sound synthesis with an extended version of the multi-type voter model. AES 128th Convention (8029). London, UK.

Sevsay, E. (2005). Handbuch der Instrumentationspraxis (1st ed.). Kassel, Germany: Bärenreiter.

Seyama, J., & Nagayama, R. S. (2007). The uncanny valley: The effect of realism on the impression of artificial human faces. Presence (Cambridge, Mass.), 16(4), 337–351. doi:10.1162/pres.16.4.337

Shams, L., Kamitani, Y., & Shimojo, S. (2000). What you see is what you hear. Nature, 408, 788. doi:10.1038/35048669

Shams, L., Kamitani, Y., & Shimojo, S. (2002). Visual illusion induced by sound. Brain Research. Cognitive Brain Research, 14, 147–152. doi:10.1016/S0926-6410(02)00069-1

Sharpe, L. (2004). Patterns of autonomic arousal in imaginal situations of winning and losing in problem gambling. Journal of Gambling Studies, 20, 95–104. doi:10.1023/B:JOGS.0000016706.96540.43

Sheridan, T. B. (1994). Further musings on the psychophysics of presence. Presence (Cambridge, Mass.), 5, 241–246.

Shiffrin, R. M., & Grantham, D. W. (1974). Can attention be allocated to sensory modalities? Perception & Psychophysics, 15, 460–474.

Shilling, R., Zyda, M., & Wardynski, E. C. (2002). Introducing emotion into military simulation and videogame design: America's Army: Operations and VIRTE. In Conference GameOn 2002. Retrieved January 1, 2010, from http://gamepipe.usc.edu/~zyda/pubs/Shilling-Gameon2002.pdf.

Shultz, P. (2008). Music theory in music games. In Collins, K. (Ed.), From Pac-Man to pop music: Interactive audio in games and new media (pp. 177–188). Hampshire, UK: Ashgate.

Sider, L. (Ed.). (2003). Soundscape: The School of Sound lectures 1998-2001. London: Wallflower Press.

Sierra (1993). Gabriel Knight: Sins of the Fathers [Computer game]. Sierra Entertainment.

Silent hill 2. [Computer game]. (2001). KCET (Developer). Redwood City: Konami of America.


Silent hill 3. [Computer game]. (2003). KCET (Developer). Redwood City: Konami of America.

Silent hill homecoming [Computer game]. (2008). Double Helix & Konami (Developer/Co-Developer). Tokyo, Japan: Konami.

Silent hill series. (1999-). Konami.

Silent hill. [Computer game]. (1999). KCEK (Developer). Redwood City: Konami of America.

Sim city. (1999-2007). Maxis.

Simmel, G. (1979). The metropolis and mental life. Retrieved February 1, 2010, from http://www.blackwellpublishing.com/content/BPL_Images/Content_store/Sample_chapter/0631225137/Bridge.pdf.

SimTunes. [Video game]. (1996). Maxis (Developer).

Singer, W., Engel, A. K., Kreiter, A. K., Munk, M. H. J., Neuenschwander, S., & Roelfsema, P. R. (1997). Neuronal assemblies: Necessity, signature and detectability. Trends in Cognitive Sciences, 1(7), 252–261. doi:10.1016/S1364-6613(97)01079-6

SingStar. [Video game]. (2004). Sony Computer Entertainment Europe (PlayStation 2 & 3).

Sjöström, V. (1921). The phantom chariot. Svensk Filmindustri.

Skea, W. H. (1995). "Postmodern" Las Vegas and its effects on gambling. Journal of Gambling Studies, 11(2), 231–235. doi:10.1007/BF02107117

Slater, M. (2002). Presence and the sixth sense. Presence (Cambridge, Mass.), 11(4), 435–439. doi:10.1162/105474602760204327

Smith, C. A., & Morris, L. W. (1976). Effects of stimulative and sedative music on cognitive and emotional components of anxiety. Psychological Reports, 38, 1187–1193.

Smith, B. R. (1999). The acoustic world of early modern England: Attending to the o-factor (1st ed.). Chicago: University of Chicago Press.

Smith, J. O. III. (1992). Physical modeling using digital waveguides. Computer Music Journal, 16(4), 74–91. doi:10.2307/3680470

Smith, B. R. (2004). Tuning into London c.1600. In Bull, M., & Back, L. (Eds.), The auditory culture reader (1st ed., pp. 127–136). Oxford, UK: Berg.

Sobchack, V., & Sobchack, T. (1980). An introduction to film. Boston, MA: Little Brown.

Sonnenschein, D. (2001). Sound design: The expressive power of music, voice and sound effects in cinema. Studio City, CA: Michael Wiese Productions.

Sotamaa, O. (2009). The player's game: Towards understanding player production among computer game cultures. Unpublished doctoral dissertation. University of Tampere, Finland.

Space invaders [Computer game]. (1978). Tokyo, Japan: Taito.

Spadoni, R. (2000). Uncanny bodies. Berkeley: University of California Press.

Spence, C., Nicholls, M. E. R., & Driver, J. (2001). The cost of expecting events in the wrong sensory modality. Perception & Psychophysics, 63(2), 330–336.

Splinter cell series. (2002-). Ubisoft.

Spore. (2008). Electronic Arts.

Spyro the Dragon. (1998). Insomniac Games. Sony Computer Entertainment.

Stanton, A. (2008). Wall-E. Pixar Animation Studios.

Steckenfinger, A., & Ghazanfar, A. (2009). Monkey behavior falls into the uncanny valley. Proceedings of the National Academy of Sciences of the United States of America, 106(43), 18362–18366. doi:10.1073/pnas.0910063106

Stenzel, M. (2005). Automatische Arrangiertechniken für affektive Sound-Engines von Computerspielen. Unpublished diploma thesis. Otto-von-Guericke University, Department of Simulation and Graphics, Magdeburg, Germany.


Steuer, J. (1992). Defining virtual reality: Dimensions determining telepresence. The Journal of Communication, 42(4), 73–93. doi:10.1111/j.1460-2466.1992.tb00812.x

Stigwood, R., & Badham, J. (Producers). (1977). Saturday night fever [Motion picture]. Hollywood, CA: Paramount.

Stockburger, A. (2007). Listen to the iceberg: On the impact of sound in digital games. In von Borries, F., Walz, S. P., & Böttger, M. (Eds.), Space time play: Computer games, architecture and urbanism: The next level (pp. ##-##). Location: Birkhäuser Publishing.

Stockburger, A. (2003). The game environment from an auditory perspective. In M. Copier & J. Raessens (Eds.), Proceedings of Level Up: Digital Games Research Conference.

Stockmann, L. (2007). Designing an audio API for mobile platforms. Internship report. Magdeburg, Germany: Otto-von-Guericke University.

Stockmann, L., Berndt, A., & Röber, N. (2008). A musical instrument based on interactive sonification techniques. In Proceedings of Audio Mostly 2008: 3rd Conference on Interaction with Sound (pp. 72-79). Piteå, Sweden: Interactive Institute/Sonic Studio Piteå.

Subrahmanyan, N., & Lal, B. (1974). A textbook of sound. Delhi: University of Delhi.

Sucker Punch Productions (Developer). (2009). inFamous [Computer game]. Sony Computer Entertainment.

Sullivan, D. B. (1992). Commentary and viewer perception of player hostility: Adding punch to televised sports. Journal of Broadcasting & Electronic Media, 35, 487–504.

Sun Microsystems. (2010). Java ME API. Retrieved February 4, 2010, from http://java.sun.com/javame/reference/apis.jsp.

Super Mario Bros. [Computer game, NES]. (1985). Nintendo.

Surman, D. (2007). Pleasure, spectacle and reward in Capcom's Street Fighter series. In Krzywinska, T., & Atkins, B. (Eds.), Videogame, player, text (pp. 204–221). London: Wallflower.

Sweet Home. [Computer game]. (1989). Capcom (Developer). Osaka: Capcom.

Sweetser, P., & Wyeth, P. (2005). GameFlow: A model for evaluating player enjoyment in games. Computers in Entertainment (CIE), 3(3), 3. doi:10.1145/1077246.1077253

Sykes, J., & Brown, S. (2003). Affective gaming: Measuring emotion through the gamepad. In Proceedings of Conference on Human Factors in Computing Systems (CHI '03).

Takala, T., & Hahn, J. (1992). Sound rendering. In Proceedings of SIGGRAPH '92: The 19th annual conference on Computer graphics and interactive techniques, 26(2), 211-220. New York: ACM.

Tamminen, S., Oulasvirta, A., Toiskallio, K., & Kankainen, A. (2004). Understanding mobile contexts. Personal and Ubiquitous Computing, 8(2), 135–143. doi:10.1007/s00779-004-0263-1

Tarantino, Q. (1994). Pulp fiction. Miramax.

Tarkovsky, A. (1972). Solaris. Mosfilm.

Tarkovsky, A. (1979). Stalker. Mosfilm.

Tarkovsky, A. (1986). Sacrifice. Argos Films.

Taube, H. K. (2004). Notes from the metalevel: Introduction to algorithmic music composition. London, UK: Taylor & Francis.

Taylor, L. (2005). Toward a spatial practice in video games. Gamology. Retrieved from http://www.gamology.org/node/809.

Tellegen, A., Watson, D., & Clark, A. L. (1999). On the dimensional and hierarchical structure of affect. Psychological Science, 10(4), 297–303. doi:10.1111/1467-9280.00157

The 7th guest [Computer game]. (1993). Trilobyte (Developer). London: Virgin Games.

Thayer, J. F., & Levenson, R. W. (1983). Effects of music on psychophysiological responses to a stressful film. Psychomusicology, 3(1), 44–52.

461
Compilation of References

The adventures of Rocky and Bullwinkle [Computer game]. (1992). Radical Entertainment (Developer). Agoura Hills, CA: THQ.

The Beatles: Rock band [Computer game]. (2009). Harmonix (Developer). Redwood City, CA: EA Games.

The casting [Technology demonstration]. (2006). Quantic Dream (Developer). Foster City, CA: Sony Computer Entertainment, Inc.

The Curious Team. (1999). Curious about space: Can you hear sounds in space? Ask an Astronomer. Retrieved September 31, 2009, from http://curious.astro.cornell.edu/question.php?number=8

The Elder Scrolls III: Morrowind. (2002). Bethesda Softworks.

The Flintstones: The rescue of Dino & Hoppy [Computer game]. (1991). Vancouver, Canada: Taito Corporation.

The Jetsons: Cogswell's caper! [Computer game]. (1992). Vancouver, Canada: Taito Corporation.

The legend of Zelda: A link to the past. (1992). Nintendo EAD (Developer). Kyoto, Japan: Nintendo.

The legend of Zelda: Twilight princess. (2006). Nintendo.

The path. (2009). Tale of Tales.

The Sims series. (2000-). Electronic Arts.

Theremin, L. S. (1924). Method of and apparatus for the generation of sounds. U.S. Patent No. 73,529. Washington, DC: U.S. Patent and Trademark Office.

Thibaud, J. (1998). The acoustic embodiment of social practice: Towards a praxiology of sound environment. In Karlsson, H. (Ed.), Proceedings of Stockholm, Hey Listen! (pp. 17–22). Stockholm: The Royal Swedish Academy of Music.

Thief 3: Deadly shadows. (2004). Eidos.

Thief: The dark project. (1998). Eidos.

Thom, R. (1999). Designing a movie for sound. Retrieved July 7, 2009, from http://filmsound.org/articles/designing_for_sound.htm

Thompson, J. B. (1995). The media and modernity. Stanford, CA: Stanford University Press.

Tinwell, A., Grimshaw, M., & Williams, A. (2010). Uncanny behaviour in survival horror games. Journal of Gaming and Virtual Worlds, 2(1), 3–25. doi:10.1386/jgvw.2.1.3_1

Tinwell, A., Grimshaw, M., & Williams, A. (2011). Uncanny speech. In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey, PA: IGI Global.

Tinwell, A. (2009). The uncanny as usability obstacle. In A. A. Ozok & P. Zaphiris (Eds.), Online Communities and Social Computing workshop, HCI International 2009, 12, 622-631.

Tinwell, A., & Grimshaw, M. (2009). Bridging the uncanny: An impossible traverse? In Proceedings of MindTrek 2009.

Tobler, H. (2004). CRML—Implementierung eines adaptiven Audiosystems [CRML—Implementation of an adaptive audio system]. Unpublished master's thesis. Fachhochschule Hagenberg, Hagenberg, Austria.

Tom Clancy's ghost recon: Advanced warfighter 2. (2007). Ubisoft.

Toneatto, T., Blitz-Miller, T., Calderwood, K., Dragonetti, R., & Tsanos, A. (1997). Cognitive distortions in heavy gambling. Journal of Gambling Studies, 13, 253–261. doi:10.1023/A:1024983300428

Tonkiss, F. (2004). Aural postcards: Sound, memory and the city. In Bull, M., & Back, L. (Eds.), The auditory culture reader (1st ed., pp. 303–310). Oxford, UK: Berg.

Too human [Computer game]. (2008). Silicon Knights (Developer). United States: Microsoft Game Studios.

Toprac, P., & Abdel-Meguid, A. (2011). Causing fear, suspense, and anxiety using sound design in computer games. In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey, PA: IGI Global.
Trautmann, L., & Rabenstein, R. (2003). Digital sound synthesis by physical modelling using the functional transformation method. New York: Kluwer Academic/Plenum Publishers.

Traxel, W., & Wrede, G. (1959). Changes in physiological skin responses as affected by musical selection. Journal of Experimental Psychology, 16, 57–61.

Traxxpad [Video game]. (2007). Eidos Interactive (PlayStation Portable).

Truax, B. (2001). Acoustic communication. Westport, CT: Greenwood Press.

Truax, B. (1995, September). Sound in context: Acoustic communication and soundscape research at Simon Fraser University. Paper presented at the International Computer Music Conference.

Truppin, A. (1992). And then there was sound: The films of Andrei Tarkovsky. In Altman, R. (Ed.), Sound theory sound practice. New York: Routledge.

Tsukahara, N. (2002). Game machine with random sound effects. U.S. Patent No. 6,416,411 B1. Washington, DC: U.S. Patent and Trademark Office.

Tulving, E., & Lindsay, P. H. (1967). Identification of simultaneously presented simple visual and auditory stimuli. Acta Psychologica, 27, 101–109. doi:10.1016/0001-6918(67)90050-9

Turner, N., & Horbay, R. (2004). How do slot machines and other electronic gambling machines actually work? Journal of Gambling Issues, 11.

Tuuri, K., Mustonen, M., & Pirhonen, A. (2007). Same sound—different meanings: A novel scheme for modes of listening. In Proceedings of the Second International AudioMostly Conference, 13-18.

Ubisoft Shanghai (Developer). (2008). Tom Clancy's EndWar [Computer game]. Ubisoft.

Ultimate band. (2008). Fall Line Studios.

Väänänen, R. (1998). Verification model of advanced BIFS (systems VM 4.0 subpart 2). ISO/IEC JTC1/SC29/WG11.

Välimäki, V., Pakarinen, J., Erkut, C., & Karjalainen, M. (2006). Discrete-time modelling of musical instruments. Reports on Progress in Physics, 69, 1–78. doi:10.1088/0034-4885/69/1/R01

Valve Corporation. (1998). Half-Life [Computer game]. Sierra Entertainment.

van den Doel, K., & Pai, D. K. (1998). The sounds of physical shapes. Presence (Cambridge, Mass.), 7(4), 382–395. doi:10.1162/105474698565794

Vatakis, A., & Spence, C. (2006). Audiovisual synchrony perception for speech and music using a temporal order judgment task. Neuroscience Letters, 393, 40–44. doi:10.1016/j.neulet.2005.09.032

Verbiest, N., Cornelis, C., & Saeys, Y. (2009). Valued constraint satisfaction problems applied to functional harmony. In Proceedings of IFSA World Congress/EUSFLAT Conference (pp. 925-930). Lisbon, Portugal: International Fuzzy Systems Association, European Society for Fuzzy Logic and Technology.

Vicario, G. B. (2001). Prolegomena to the perceptual study of sounds. In Polotti, P., Papetti, S., Rocchesso, D., & Delle, S. (Eds.), The sounding object (Sob project) (p. 13). Verona: University of Verona.

Vinayagamoorthy, V., Steed, A., & Slater, M. (2005). Building characters: Lessons drawn from virtual environments. In Proceedings of Toward Social Mechanisms of Android Science, CogSci 2005, 119-126.

von Ahn, L., & Dabbish, L. (2008). Designing games with a purpose. Communications of the ACM, 51(8), 58–67. doi:10.1145/1378704.1378719

Vorländer, M. (2008). Auralization—Fundamentals of acoustics, modelling, simulation, algorithms and acoustic virtual reality (1st ed.). Berlin: Springer.

Wachowski, L., & Wachowski, A. (1999). The matrix. Warner Bros. Pictures.
Wallén, J. (2008). Från smet till klarhet [From batter to better]. Unpublished bachelor's thesis. University of Skövde, Sweden. Retrieved from http://his.diva-portal.org/smash/record.jsf?searchId=1&pid=diva2:2429

Warcraft 3: Reign of chaos. (2002). Blizzard Entertainment.

Ware, C. (2004). Information visualization: Perception for design (2nd ed.). San Francisco, CA: Morgan Kaufmann.

Warren, D. H., Welch, R. B., & McCarthy, T. J. (1982). The role of visual-auditory "compellingness" in the ventriloquism effect: Implications for transitivity among the spatial senses. Perception & Psychophysics, 30(6), 557–564.

Warren, R. M. (1992). Perception of acoustic sequences. In McAdams, S., & Bigand, E. (Eds.), Thinking in sound: The cognitive psychology of human audition. Oxford: Clarendon Press.

Watson, D., & Tellegen, A. (1985). Toward a consensual structure of mood. Psychological Bulletin, 98(2), 219–235. doi:10.1037/0033-2909.98.2.219

Watson, D., Wiese, D., Vaidya, J., & Tellegen, A. (1999). The two general activation systems of affect: Structural findings, evolutionary considerations, and psychobiological evidence. Journal of Personality and Social Psychology, 76(5), 820–838. doi:10.1037/0022-3514.76.5.820

Wenzel, E. M. (1998). The impact of system latency on dynamic performance in virtual acoustic environments. In Proceedings of the 15th International Congress on Acoustics and 135th Meeting of the Acoustical Society of America, 2405-2406.

Wenzel, E. M. (2001). Effect of increasing system latency on localization of virtual sounds with short and long duration. In Proceedings of the 7th International Conference on Auditory Display (ICAD), 185-190.

Weschler, L. (2002). Why is this man smiling? Wired. Retrieved April 7, 2009, from http://www.wired.com/wired/archive/10.06/face.html

Wessel, D. L. (1973). Psychoacoustics and music: A report from Michigan State University. PAGE: Bulletin of the Computer Arts Society, 30.

West, S. (Director). (2001). Lara Croft: Tomb raider [Motion picture]. Hollywood, CA: Paramount.

Westerkamp, H. (1990). Listening and soundmaking: A study of music-as-environment. In Lander, D., & Lexier, M. (Eds.), Sound by artists. Toronto, Canada: Art Metropole & Walter Phillips Gallery.

Westermann, C. F. (2008). Sound branding and corporate voice: Strategic brand management using sound. In Usability of speech dialog systems: Listening to the target audience. Berlin: Springer-Verlag.

Whalen, Z. (2004). Play along: An approach to videogame music. Game Studies, 4(1). Retrieved from http://www.gamestudies.org/0401/whalen/

White, G. (2008). Comment on the IEZA: A framework for game audio. Retrieved January 13, 2010, from http://www.gamasutra.com/view/feature/3509/ieza_a_framework_for_game_audio.php

Whitmore, G. (2009). The runtime studio in your console: The inevitable directionality of game audio. Develop, 94, 21.

Whittington, W. (2007). Sound design & science fiction. Austin: University of Texas Press.

Wii Music [Video game]. (2008). Kyoto: Nintendo.

Wilde, M. D. (2004). Audio programming for interactive games. Oxford: Focal Press.

Wilhelmsson, U., & Wallén, J. (2011). A combined model for the structuring of game audio. In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey, PA: IGI Global.

Wilhelmsson, U. (2001). Enacting the point of being: Computer games, interaction and film theory. Unpublished doctoral dissertation. University of Copenhagen, Denmark.
Williams, A. (1985). Godard's use of sound. In Weis, E., & Belton, J. (Eds.), Film sound: Theory and practice. New York: Columbia University Press.

Williams, L. (2006). Music videogames: The inception, progression and future of the music videogame. In Proceedings of Audio Mostly 2006: A Conference on Sound in Games (pp. 5-8). Piteå, Sweden: Interactive Institute, Sonic Studio Piteå.

Wingstedt, J. (2008). Making music mean: On functions of, and knowledge about, narrative music in multimedia. Unpublished doctoral dissertation. Luleå University of Technology, Sweden.

Wolf, M. J. P. (2003). Abstraction in the video game. In Perron, B., & Wolf, M. J. P. (Eds.), The video game theory reader (pp. 47–65). New York: Routledge.

Wolfson, S., & Case, G. (2000). The effects of sound and colour on responses to a computer game. Interacting with Computers, 13, 183–192. doi:10.1016/S0953-5438(00)00037-0

Wooller, R. W., & Brown, A. R. (2005). Investigating morphing algorithms for generative music. In Proceedings of Third Iteration: Third International Conference on Generative Systems in the Electronic Arts. Melbourne, Australia.

Working Group Noise Eurocities. (n.d.). Retrieved January 10, 2010, from http://workinggroupnoise.web-log.nl/

World of Warcraft. (2004). Blizzard Entertainment.

World of Warcraft. (2005). Blizzard.

World soundscape project. (n.d.). Retrieved September 31, 2009, from http://www.sfu.ca/~truax/wsp.html

Woszczyk, W., Bech, S., & Hansen, V. (1995). Interactions between audio-visual factors in a home theater system: Definition of subjective attributes. AES 99th Convention. Preprint 4133.

Wrightson, K. (2000). An introduction to acoustic ecology. Soundscape: The Journal of Acoustic Ecology, 1(1), 10-13.

Wundt, W. (1896). Grundriss der Psychologie [Outline of psychology]. Leipzig, Germany: Alfred Kröner Verlag.

Wurtzler, S. (1992). "She sang live, but the microphone was turned off": The live, the recorded and the subject of representation. In Altman, R. (Ed.), Sound theory sound practice. New York: Routledge.

Yalch, R. F., & Spangenberg, E. R. (2000). The effects of music in a retail setting on real and perceived shopping times. Journal of Business Research, 49, 139–147. doi:10.1016/S0148-2963(99)00003-X

Yamada, M. (2009, September). Can music change the success rate in a slot-machine game? Paper presented at the Western Pacific Acoustics Conference, Beijing, China.

Yee-King, M., & Roth, M. (2008). Synthbot: An unsupervised software synthesiser programmer. International Computer Music Conference.

Yoshi's Island. (2007). Nintendo Japan. Nintendo.

Yost, W. A. (2007). Fundamentals of hearing: An introduction (5th ed.). New York: Academic Press.

You don't know jack [Computer game]. (1995). Berkeley Systems/Jellyvision (Developer). Fresno, CA: Sierra On-Line.

Young, K. (2006). Recreating reality. Game Sound. Retrieved February 13, 2009, from http://www.gamesound.org/articles/RecreatingReality.html

Zahorik, P., & Jenison, R. L. (1998). Presence as being-in-the-world. Presence (Cambridge, Mass.), 7(1), 78–89. doi:10.1162/105474698565541

Zelda: Phantom Hourglass. (2007). Nintendo.

Zemeckis, R. (Producer/Director). (2004). The polar express [Motion picture]. California: Castle Rock Entertainment.

Zemeckis, R. (Producer/Director). (2007). Beowulf [Motion picture]. California: ImageMovers.
Zhang, T., & Jay Kuo, C. C. (2001). Audio content analysis for online audiovisual data segmentation and classification. IEEE Transactions on Speech and Audio Processing, 9(4), 441–457. doi:10.1109/89.917689

Zheng, C., & James, D. L. (2010). Rigid-body fracture sound with precomputed soundbanks. ACM Transactions on Graphics (SIGGRAPH 2010), 29(3).

Zheng, C., & James, D. L. (2009). Harmonic fluids. ACM Transactions on Graphics (SIGGRAPH 2009), 28(3).

Zielinski, S., Rumsey, F., Bech, S., de Bruyn, B., & Kassier, R. (2003). Computer games and multichannel audio quality—The effect of division of attention between auditory and visual modalities. In Proceedings of the AES 24th International Conference on Multichannel Audio, 85-93.

Zillman, D. (1991). The logic of suspense and mystery. In Bryant, J., & Zillman, D. (Eds.), Responding to the screen: Reception and reaction processes (pp. 281–303). Hillsdale, NJ: Lawrence Erlbaum Associates.

Zwicker, E., & Fastl, H. (1999). Psychoacoustics—Facts and models (2nd ed.). Berlin: Springer-Verlag.

Zwicker, E. (1961). Subdivision of the audible frequency range into critical bands. The Journal of the Acoustical Society of America, 33, 248. doi:10.1121/1.1908630

About the Contributors

Mark Grimshaw is a Reader in Creative Technologies in the School of Business & Creative Technolo-
gies at the University of Bolton, United Kingdom, where he runs the Emotioneering Research Group.
He possesses an honours degree in music, an MSc in music technology, and a PhD in computer game
sound from South Africa, England, and New Zealand and is widely published in the area of computer
games, particularly on the topics of immersion and sound. Mark’s previous book was entitled The
Acoustic Ecology of the First-Person Shooter and he is also the lead developer for WIKINDX, an Open
Source, Virtual Research Environment in wide use around the world.

***

Ahmed Alaa Abdel-Meguid was born in Cairo, Egypt to Alaa Abdel-Meguid and Azza Tawfik.
Soon afterwards, his family moved to the Midwestern United States where they soon made a home for
themselves. His first video game was Joust for the Atari 5200 at age six. He started in game design as
a game-master for tabletop roleplaying games such as Dungeons and Dragons during his high-school
years. After earning his Bachelor's degree in Organizational Leadership at Illinois State University, he immediately went on to The Guildhall at Southern Methodist University to earn his Master's of Interactive Technology with a specialization in Level Design. At the time of writing, he is working
on The Old Republic at BioWare Austin as a World Builder. In his spare time, he plays the guitar and
violin, swings fire, and paints little space marines and orcs.

Valter Alves is a lecturer of Computer Science at the Informatics Department of the Polytechnic
Institute of Viseu, Portugal. He has taught diverse courses to Informatics Engineering and Technology
and Design of Multimedia students. He holds a degree in Informatics Engineering and an MSc in Information Systems and Technologies, both from the Faculty of Sciences and Technology of the University
of Coimbra, where he is now a PhD candidate under the supervision of Professor Licínio Roque. Valter
is also a researcher at the Center for Informatics and Systems of the University of Coimbra. His research
interests include human–computer interaction, user experience, computer game design, sound design,
context, emotions, and research targeting handicapped people. Currently his research is focused on the
enrichment of user experience through soundscape design.

Axel Berndt studied computer science and music at the Otto-von-Guericke University in Magde-
burg, Germany. He is currently working there as a computer music researcher. His research interests
comprise expressive performance analysis and modelling, musical structure analysis, automatic composition, arrangement, and adaptation for interactive media scoring. Beyond that, Axel Berndt is active as a musician and composer.

Copyright © 2011, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.

Karen Collins is Canada Research Chair in Interactive Audio at the Canadian Centre of Arts and
Technology, University of Waterloo, Canada. She is the author of a book on game audio, Game Sound:
An Introduction to the History, Theory and Practice of Video Game Music and Sound Design published
by The MIT Press, and editor of From Pac-Man to Pop Music: Interactive Audio in Games and New
Media published by Ashgate.

Stuart Cunningham was awarded the BSc degree in Computer Networks in 2001 and, in 2003, was
awarded the MSc Multimedia Communications degree with Distinction, both from the University of
Paisley (UK). In 2009 he was awarded the degree of PhD in Data Reduced Audio Coding by the Uni-
versity of Wales (UK). He is a Fellow of the British Computer Society (BCS), Chartered IT Professional
(CITP), Member of the Institution of Engineering & Technology (IET) and Member of the Institute of
Electrical and Electronics Engineers (IEEE). Dr Cunningham was a member of the MPEG Music No-
tation Standards (MPEG-SMR) working group. His research interests are in the areas of digital audio,
computer music, human perception of sound, and audio compression techniques. In his spare time, Stuart
is an avid mountain biker and performs in a Pink Floyd tribute band named Pink Lloyd.

Michael J. Dixon is Professor of Psychology at the University of Waterloo. He is one of the fore-
most authorities on synaesthesia (an anomalous type of perception). His current research into problem
gambling is aimed at identifying those elements of the gambling experience that lead to measurable
changes in behaviour–changes which may, potentially, lead to problem gambling.

Milena Droumeva has a Bachelor's degree in Communication (focusing on acoustic communication and acoustic ecology) and media studies. She then completed a Master's in Interactive Arts and
Technologies focusing on interactive soundscape design for responsive environments and ambient in-
telligent games. She has since worked on a variety of game sound projects and has a particular interest
in adapting sonification techniques and environmental sound for games. Currently, she is pursuing a
Doctorate in Education exploring the cultural and epistemological implications of secondary orality and
the soundscape. She is interested in drawing connections between listening experiences in designed
soundscapes, and our practices and conceptions around knowledge. She did not grow up as a gamer
but came to gaming and a subsequent keen interest in game sound through procrastination from other
graduate work.

Andy Farnell is a computer scientist from the UK specialising in audio DSP and synthesis. Author
of Designing Sound, his original research and design work champions the emerging field of Procedural
Audio. Between consultancy for pioneering game and audio technology companies he teaches widely,
as resident lecturer and visiting professor at several European institutions. Andy is a long-time advo-
cate of free open source software, educational opportunities and universal access to enabling tools and
knowledge.

Jonathan Fugelsang is Assistant Professor in Cognitive Psychology at the University of Waterloo. His research interests span several topics in cognitive psychology and cognitive neuroscience, though his primary focus is in higher level cognition. He has recently joined the problem gambling research
team at the University of Waterloo.

Guillaume Roux-Girard is a Master’s Degree student in film studies at the University of Montreal.
His current research focuses on the different roles of sound in horror video games. His recent publica-
tions include an appendix chapter on film studies and video games in the Video Game Theory Reader
2 (Routledge, 2009) and a chapter on the Alone in the Dark series (1992-2008) in the anthology Horror
Video Games: Essays on the Fusion of Fear and Play (McFarland, 2009).

Vic Grout was awarded a BSc in Mathematics and Computing from the University of Exeter in
1984 and a PhD in Communication Engineering from Plymouth Polytechnic in 1988. He has worked in
senior positions in both academia and industry for twenty years and has published and presented over
200 research papers and three books. He is currently Professor of Network Algorithms at Glyndŵr Uni-
versity, Wales, where he leads the Centre for Applied Internet Research. Professor Grout is a Chartered
Engineer, Chartered Electrical Engineer, Chartered Scientist, Chartered Mathematician and Chartered
IT Professional, a Fellow of the Institute of Mathematics and its Applications, British Computer Society
and Institution of Engineering and Technology and a Senior Member of the Institute of Electrical and
Electronics Engineers. He chairs the biennial international conference series on Internet Technologies
and Applications (ITA 05, ITA 07 and ITA 09).

Kevin Harrigan teaches game design and is the lead researcher and contact person for the Problem
Gambling Research Team at the University of Waterloo. His primary research interest is in gambling
addictions with a focus on why so many slot machine gamblers become addicted.

Daniel Hug has a background in music, sound design, interaction design and project management in
applied research. From 1999 he has investigated sound and interaction design-related questions through
installations, design works and theoretical publications. Since 2005, he teaches sound studies and sound
design at the Interaction Design and the Game Design departments of the Zurich University of the Arts,
Switzerland. Daniel is currently pursuing a PhD on sound design for interactive commodities at the
University of the Arts and Industrial Design of Linz, Austria, is management committee member in
the European COST-initiative Sonic Interaction Design, and greatly enjoys the fact that his profession
“requires” him to play computer games regularly.

Kristine Jørgensen is a postdoctoral research fellow at the Department of Information Science and
Media Studies, University of Bergen, Norway. She holds a Ph.D. in Media Studies from the Univer-
sity of Copenhagen with a thesis on the functional role of computer game sound. Her current research
project is funded by a grant from the Norwegian Research Council, and focuses on the communicative
aspects of computer games, fiction in games, and the relationship between the user interface and the
gameworld. Jørgensen is also a board member of Joingame, the Norwegian network for games research
and development.

Since the mid-1980s, Mats Liljedahl has been working with sound, music and digital
and interactive media in various forms and contexts. Since 2000 he has been at the Interactive Institute,
Sweden, involved in research and development projects related to sound and sound design, all built on

and carried by interactive media. Mats Liljedahl has a special interest in how people perceive sound and
how sound affects us cognitively, emotionally and intuitively. This interest has led to projects focusing
on how sound can be used in new ways and in new contexts. Examples of projects include audio based
games as research tools and as potential new gaming products, sound design for information and new
tools and methods for working with sound design.

Eoin Mullan obtained his undergraduate Degree in Electronic and Software Engineering from
the University of Ulster in 2005. This included an industrial placement year spent writing software
for British Telecom. In 2006 he completed a Masters in Sonic Arts at the Sonic Arts Research Centre
(SARC) in Queen’s University Belfast, which combined his background in programming with elements
of music, sound design, composition, musical interface design, acoustics, and physical modelling. Eoin
returned to SARC to undertake his PhD in the area of physical modelling for real time sound synthesis
in computer games and virtual environments. He is currently researching efficient ways to synthesise
contact sounds for objects that may be modified in real-time and for arbitrarily shaped objects.

David Murphy is a lecturer and researcher at the Department of Computer Science, University Col-
lege Cork, Ireland where he is also a director of the Interactive Medical Computing Lab. In a previous
life, David was a professional musician, and a Multimedia Engineer at Apple Computer, where he was
responsible for Audio and MIDI in Apple products. In 1999 David left Apple to set up the Multimedia
section of the Computer Science Department, UCC. His research interests include spatial sound, serious
games, and virtual reality.

Lennart Nacke received one of Europe’s first Ph.D. degrees in Digital Game Development from
Blekinge Institute of Technology, Sweden. He is currently working on affective and entertainment
computing as a postdoctoral fellow in the Human-Computer Interaction Lab of the University of Sas-
katchewan, Canada. He chaired and co-organized several expert panels on psychophysiological player
measurement and interaction, game usability and UX at academic conferences (e.g., DiGRA, Future
Play, CHI) and industry venues (e.g., GDC Canada). As much as an avid gamer, he is a passionate sci-
entist, whose research interests are psychophysiological player testing and interaction for example with
EEG (i.e., brainwaves) and EMG (i.e., facial muscle contractions) or eye tracking as well as gameplay
experience in player-game interaction, technology-driven innovation (e.g., playability metrics, affective
computing) and innovative interaction design with digital entertainment technologies.

Flaithrí Neff is a lecturer and researcher at the Department of Electrical & Electronic Engineer-
ing, Limerick Institute of Technology, Ireland. He is also a research member of the IDEAS Research
Group at the Department of Computer Science, University College Cork, Ireland, where he is currently
completing his PhD studies. In 2002 he attained a first class honours MSc degree at the University of
Limerick, Ireland specializing in Audio Technology. His research interests are in virtual sonic interface
design and intelligent hearing systems. He is particularly focused on applying his research to issues
encountered by visually-disabled users of technology.

Linda O Keeffe (www.lindaokeeffe.com) is a sound artist currently pursuing a PhD in the department of sociology, Maynooth; her working title is How I See What I Hear. She has exhibited
internationally and in Ireland where she lives. O Keeffe is also in the process of composing a body of
work with musician composer Tony Doyle for performance and CD.

Richard Picking is a Reader in Computing at Glyndŵr University in Wales and Deputy Director
of the Centre for Applied Internet Research (CAIR). He has a BSc (Hons) degree in Computing and
Operational Research from Leeds Polytechnic (UK, 1986), an MSc in Control Engineering and Infor-
mation Technology (University of Sheffield, UK, 1987) and a PhD in Interactive Multimedia Interface
Design from Loughborough University (UK) in 1996. His research interests cover various aspects of
user-interface design and usability. Rich is a passionate saxophonist and keen songwriter.

Ulrich Reiter is a researcher and lecturer working in the fields of audiovisual quality perception,
subjective assessment methodologies, and interactivity issues in audiovisual applications at the Norwegian
University of Science and Technology (NTNU) in Trondheim, Norway. He holds a Master’s degree in
electrical engineering from RWTH Aachen, and a PhD in media technology from TU Ilmenau, both in
Germany. Ulrich was the development coordinator for the cross-platform, object-based, multi-processing,
and real-time audio rendering engine TANGA used in the IAVAS I3D MPEG-4 player. His work has
been published in numerous AES-, IEEE- and other journals, conference proceedings and papers. He
was the recipient of the ‘IEEE International Symposium on Consumer Electronics (ISCE) Best Paper
Award’ in 2005 and 2007. Ulrich’s current research focus is on cross-modal effects in audiovisual media.

Licínio Roque obtained a PhD in Informatics Engineering from the University of Coimbra while
developing Context Engineering, a socio-technical approach to Information Systems Development.
He has been practicing research and development in diverse fields: management information systems,
individual and organizational learning, technologies for online communities, and computer games.
Over the last 10 years he taught postgraduate courses on Software Engineering, Human-Computer
Interaction, Ludic Learning Contexts, Game Studies and Development, using studio and project-based
methodologies. He also teaches a course on game design as strategy for exploring cultural heritage as
part of the EuroMACHS European Master Program. Currently, he does research on design methodology
and technologies for multiplayer online games. He is Adjunct Teaching Professor at Carnegie Mellon
University, on the MSE Program.

Holly Tessler is Senior Lecturer and Program Leader in the Music Industry Management program
at the University of East London, UK. She recently completed her PhD on music and branding at the
Institute of Popular Music at the University of Liverpool.

As a Senior Lecturer in the School of Business & Creative Technologies at the
University of Bolton, Angela Tinwell is researching the subject area of the Uncanny for a PhD. Recent
works, including Uncanny as Usability Obstacle, authored for the HCI International Conference 2009,
and Survival Horror Games – An Uncanny Modality, for the Thinking After Dark Conference, 2009,
investigate the implications of the Uncanny Valley phenomenon for realistic, human-like virtual char-
acters within 3D immersive environments. Angela Tinwell teaches modules on the Computer Games
Design and Computer Games Art Courses at the University of Bolton which involve the design and
creation of 3D characters for Computer Games.

Paul Komninos Toprac is a lecturer at The Guildhall at Southern Methodist University, where he
focuses on teaching and the research, design, and implementation of game technology-based applica-
tions. He has more than the twenty years of experience in the software industry, in roles ranging from

471
About the Contributors

CEO to product manager to consultant. During his studies at the University of Texas at Austin, Paul
was the producer and designer of a science-based computer game called Alien Rescue: The Game,
which was used in his dissertation entitled The Effects of a Problem Based Learning Computer Game
on Continuing Motivation to Learn Science. He holds a Bachelor’s of Science in Engineering, a Master’s
of Business Administration, and a Ph.D. in Curriculum and Instruction from The University of Texas
at Austin. In his spare time, Paul hopes to convince universities and schools that students can have fun
and learn at the same time.

Jacob Wallén holds a Bachelor of Arts degree in computer game design from the University of Skövde,
Sweden. He has been making music and working with sound for the greater part of his life, and his
education at the University of Skövde made it possible for him to combine his interest in sound with
computer games. His Bachelor’s thesis, Från smet till klarhet (‘from batter to better’), is about creating
a complete and balanced sound design for computer games. He has been in charge of sound and music
for a couple of smaller game projects and recently finished working on the game Testament
(www.testamentgame.com), a game funded by the Church of Sweden.

Ulf Wilhelmsson holds a Ph.D. from the University of Copenhagen, Denmark. His Ph.D. dissertation,
Enacting the Point of Being, focuses on computer games and film theory. Wilhelmsson was one of
the initiators of the computer game studies programs that have been offered since 2002 by the School
of Humanities & Informatics at the University of Skövde, Sweden, and he is currently working as senior
lecturer and coordinator for these programs. He is a member of the InGaMe Lab research group
(www.his.se/iki/ingame) at the University of Skövde. His research interests lie primarily within computer
game studies and integrate film theory, cognitive theory, and theories concerned with the audiovisual
construction of space and narratives.

Andrew Williams is a Principal Lecturer in the School of Business & Creative Technologies at the
University of Bolton. He has published on engagement and motivation in game development processes
and on the use of competitive strategy games as a way of motivating students. He is currently leading a
project relating to the use of gesture-driven interfaces for games. He leads a team of seven in delivering
three games-related undergraduate programmes and teaches on the Advanced Games Technology, Games
Design Team Project and Games Evaluation modules. He has sat on a number of review panels for the
provision of games undergraduate degrees and he is currently external examiner for the University of
Hull’s MSc in Games Programming.


Index

Symbols

3D audio interfaces 53
3D-environments 27
3D-graphics 23
3D-positioned 34
3D space 34, 36, 53
4 dimensions of perception 166

A

abstract soundtrack 390
acoustemology 45, 57
acoustic communication 131, 132, 133, 146, 148
acoustic communities 83, 131, 146, 148, 151
acoustic ecology 131, 132, 133, 136, 138, 140, 146, 150, 151, 362, 364, 365, 366, 373, 378, 380, 382, 383
acoustic environment 365, 366
acoustic frustration 9, 10, 20
acoustic realism 131, 148
acoustics 100, 130
acoustic viability 325, 332
Advanced Multimedia Supplements (AMMS) 306, 307, 308, 309
aesthetic independence 392
Affect 103, 107, 109, 116, 117, 118, 119, 120, 124, 125
Affective Gaming 285
Affective Sound 264, 272, 285
Affordance Theory 129
Allure 211
ambient game sounds 183
Ambient Listening 112, 129
ambient sounds 31, 32, 34, 38, 50, 53, 55
Ambisonics 297, 298
ambulatory listening 112, 114, 130
ambulatory visual position 114
Amplitude 62, 68, 70, 73
analytic listening 133
androids 215, 216, 217, 219, 231
anterior and posterior transverse temporal areas (H) 156
anxiety 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 190, 191
Aperture Listening 112, 130
API (Application Programming Interface) 65, 75, 299, 301, 304, 306, 308, 309, 310, 311, 312
Arduino 408, 413
asynchrony 214, 215, 224, 225, 226, 227, 228, 232
attention 153, 154, 157, 160, 161, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173
audio-editing software 121
Audio Entertainment 285
Audio Similarity Matrix (ASM) 245, 246
Audiosurf 62, 67, 68, 73
audio synthesis 340
audio-visiogenic effects 199, 200
Audio-visual 233
audio-visual (bimodal) perception 154, 161, 162, 163, 169
audio-visual media 60
audition (hearing) 155
auditory perception 22, 23, 24, 29, 35, 39, 40, 43
Auditory Scene Analysis 337
aural architecture 46, 57
AuralAttributes Object 299, 300

Copyright © 2011, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.
aural objects 138, 141, 142, 143
Automated Collaborative Filters (ACFs) 253
automatic playlist generation 253, 254
Autopoiesis 411, 413
avatar 32, 43, 111, 112, 129, 135, 136, 140, 146, 179, 181, 182, 183, 390, 391, 398, 400, 401, 403, 404
Avatar sounds 32

B

background diegetic music 63
background music 6, 7, 17, 67, 137, 146, 169
background noise 55, 58
BackgroundSound node 299
behavioural audio 314, 315, 321, 322
behaviourally informed 316
behavioural parameters 323, 324, 330, 331
behavioural realism 325
behaviourist 5
Beowulf 35, 36, 37, 38, 39, 41
Bet Max 3
binary connection tree (BCT) 349
Binaural 173
bi-polar 249
bipolarity 106, 110, 111, 114
bits and chunks 106
black box 316, 327
Brodmann areas 156, 173
bukimi 218
butterfly effect 322
button-mashing 4

C

cartoonish 216, 221
Character sounds 32
Chion 100, 101, 103, 104, 105, 106, 112, 123, 127
chiptunes 134, 135, 147, 152
civil inattention 47
Cognitive Emotional Theory 177, 178, 190
cognitive load 98, 101, 104, 114, 115, 116, 119, 174
Cognitive processing 165, 166
Combined Model for the Structuring Computer Game Audio 130
Comprendre 211
computer game audio 98, 99, 100, 101, 102, 103, 104, 107, 110, 125, 126, 129
computer game playing 264
computer game sound 78, 80, 81, 85
ConeSound node 299
Constructive interaction 67
constructivism 248
Content 235, 236, 239, 240, 241, 244, 245, 246, 247, 248, 250, 251, 253, 254, 255, 256, 258, 259, 260, 261, 262, 263
context 235, 236, 239, 242, 247, 248, 249, 250, 251, 253, 254, 255, 258, 259, 260, 262, 263, 362, 364, 365, 366, 367, 368, 371, 372, 373, 374, 375, 376, 378, 379, 380, 382, 383
context-oriented 24
Continuous Parameterisation 337
controlled parallel processing (CPP) 163
Cross-modal 233
cross-modal interaction 153
Csound 315, 319, 335
cultural conventions of media and technology 131, 148

D

Darwinian Emotional Theory 177, 190
Dataflow 319, 320, 337
DAW [Digital Audio Workstation] 342
Deferred Form 337
Demo Scene 407, 413
Designing Sound 313, 316, 318, 319, 329
Destructive interaction 66
diégèse 83, 87, 88, 89, 90, 93, 197, 205, 209, 212
diegesis 62, 64, 66, 68, 69, 70, 75, 76, 79, 80, 81, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 96, 152
diegetic 60, 61, 62, 63, 64, 65, 66, 68, 71, 73, 75, 76, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 91, 93, 94, 96, 103, 107, 109, 111, 112, 114, 116, 119, 122, 124, 126, 127, 176, 177, 181, 183, 186, 197, 198, 199, 200, 201, 205, 207, 209, 212, 237, 239, 240, 261
diegetic music 60, 61, 63, 64, 65, 66, 75
diegetic sound 78, 80, 81, 82, 83, 84, 85, 86,
87, 91
digital analog convertor (DAC) 314
digital games 265, 268, 270, 275, 276, 279, 280
digital game sound 51, 53
digital game soundscape 56
digital signal processing (DSP) 313, 314, 317, 318, 319, 322, 330, 337, 339
digital visual game design 51
digital waveguide synthesis 341, 345, 346, 347, 360
DigiWall 36, 37, 38, 39, 40
Discretise 360
distractors 164
DJ Hero 69, 73
Donkey Konga 68, 69, 74
Doppler Effect 306, 312
Dorsal Stream 174
Dracula 222, 224, 225, 226, 230
dread 192, 193, 202, 204
Driving Mode 68
Drum Pants 69, 73
dynamic audio 343
dynamic interface 88
Dynamic Profile 211
Dynamics 337

E

earlids 29, 34, 51
early sound 293
Easter Eggs 4
ECG/EKG 277
ecology 131, 132, 133, 136, 138, 140, 143, 145, 146, 148, 149, 150
Écouter 211
Effect 103, 107, 109, 115, 116, 117, 118, 119, 120, 124, 125
EGM sound 1, 14, 15
electrocardiogram (ECG) 238
electrodermal activity (EDA) 273, 274, 275
electrodermal response 13, 20
electroencephalograms (EEGs) 13
electromyography (EMG) 238, 273, 274, 277
electronic gambling machines (EGMs) 1, 2, 3, 4, 5, 8, 9, 10, 12, 14, 15, 20
embodied sounds 98, 99, 101, 104, 105, 106, 107, 109, 111, 119, 122, 124, 125, 126, 130
emitter paradigm 404
Emotion 380, 382, 383
Emotional Interaction 263
Emotional Reaction 263
emotional state (E-state) 255, 256, 257, 258, 263
Emphasized Interface Sounds 93, 96
Empirical Methods (Quantitative) 285
encoded 98, 99, 101, 104, 105, 106, 107, 109, 116, 119, 121, 122, 124, 126, 130
Entendre 211
entrainment 362, 366, 367, 372, 373, 374, 378, 379, 383
Environmental Audio Extensions (EAX) 303
ergo-audition 402, 403, 406
essential realism 324, 335
everyday listening 24, 133, 138, 146, 148
Excitation 337
exodiegetic 84
external transdiegetic 85, 91, 97
extra-diegetic 75, 199, 201, 207, 212
eye-centric 34

F

Faceposer 214, 215, 226, 227, 228, 230
facial electromyography 13
Falling Mode 68
fear 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 202, 203, 204, 205, 206, 207, 208, 209, 210
F.E.A.R. 98, 102, 108, 112, 115, 117, 118, 119, 123, 127
Feedback 28, 36, 38, 43
fidelity 131, 132, 133, 134, 137, 138, 140, 141, 143, 145, 147, 148, 152
film music 80, 81, 82, 95
film sound aesthetics 385, 397, 398
film sound design 385, 387, 397, 406
Finite Difference 337
Finite Element 337
finite element method (FEM) 344
First Difference 337
first-person shooter (FPS) 51, 53, 112, 113, 135, 138, 140, 141, 145, 224, 273, 274, 275, 276, 404
Fitts’ Law 160, 174
flow 25, 26, 30, 40, 42, 43
flow experience 25
FoleyAutomatic 350, 351, 357
forewarning 201, 204, 205, 207
Free-Field 312
full motion video (FMV) 214, 227
functional Magnetic Resonance Imaging (fMRI) 269, 282, 284
functional transformation method (FTM) 347
Futurist Music 413
fuzzified 256, 258
fuzzing 199
fuzzy logic 254, 256, 259
Fuzzy Rule Based System (FRBS) 257, 258, 259

G

Galvanic skin response (GSR) 13, 14, 20
game audio 153, 154, 318, 320, 321, 323, 326, 327, 329, 333, 335, 336, 337
game audio designer 98, 101
game design 98, 123, 124, 384, 408
GameFlow 22, 25, 26, 28, 29, 31, 32, 35, 39, 42
game-generated sounds 38
Game Metaphor 43
game music 82, 83, 86, 94, 96
gameplay 35, 43, 101, 102, 103, 109, 113, 115, 120, 123, 126, 129, 177, 178, 179, 180, 184, 185, 186, 187, 189, 192, 193, 194, 195, 196, 197, 199, 200, 202, 203, 204, 207, 208, 209, 211, 212, 343, 355, 356
gameplay emotions 178, 179, 180, 189
game sound 1, 2, 9, 10, 12, 13, 14, 15, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 91, 92, 93, 94, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 147, 148, 149, 150, 152, 192, 193, 194, 196, 197, 199, 200, 205, 206, 264, 266, 267, 269, 273, 274, 276, 277, 362, 363, 366, 367, 372, 374, 375, 377, 378, 382, 384, 385, 386, 387, 388, 391, 392, 397, 403, 404, 405, 406, 407, 408, 409
game sound design 384, 385, 386, 387, 405, 406, 407, 408
game (sound) designers 409
gamespace 45, 53, 54, 88, 89, 90, 91, 92, 96, 97
Game System 96
gameworld 23, 25, 31, 33, 35, 36, 43, 44, 51, 53, 78, 79, 80, 81, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 96, 97, 391, 398, 400, 401, 402, 403, 405, 406
gaming community 194
gaming environment 244
Generated music 70
Good Behaviour 330, 338
Grain 211
Graphical Processing Unit (GPU) 352
Grey Goo 338
Guitar Hero 67, 68, 69, 73

H

Half-Life 113, 114, 128
hardwired 24, 30, 40
Head-Related Impulse Response (HRIR) 297
head-related transfer functions (HRTFs) 156, 287, 288, 289, 297, 302, 309, 312
Head Tracking 305, 306, 312
hermeneutic affordances 406
hi-fi 132, 134, 138
higher level semantics 395
Holistic 59
horror computer games 192, 193, 194, 195, 197, 198, 199, 200, 201, 202, 203, 205, 206, 207, 208, 212
Huiberts 98, 99, 102, 103, 104, 125, 126, 127, 129, 130
Human-Centered Design 285
Human-Computer Interaction (HCI) 266, 283, 284, 285, 327, 363, 378, 381
Human Computer Interface (HCI) 25, 237
human emotion 236, 240, 241
human-likeness 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 224, 228, 233
human-likeness of voice 213
I

Iconic Interface Sounds 93, 96
ideodiegetic 84
Idiophonic 338
idiot skill 4
IEZA 131, 150
IEZA-framework 98, 99, 102, 103, 104, 106, 107, 110, 114, 115, 119, 122, 124, 125, 126, 130
imitation 80
immersion 2, 4, 7, 28, 36, 42, 43, 103, 106, 110, 111, 112, 132, 140, 142, 143, 145, 150, 176, 177, 182, 186, 188, 189, 265, 266, 267, 269, 270, 271, 272, 273, 274, 276, 277, 278, 279, 280, 282, 283, 285
immersive 3D environments 214
immersive user experience 311
Implementation 338
indie game 389
indoor acoustics 292
Inharmonic 360
input 153, 154, 159, 160, 161, 165, 170
Integrated Interface Sounds 93, 96
Interaction Design 285
interactive ambiences 391
Interactive Institute, Sonic Studio 22, 27, 35
interactivity 153, 154, 159, 160, 161, 165, 166, 170, 172
Interaural Intensity Difference (IID) 288, 291
interaural level differences (ILDs) 155
Interaural Time Difference (ITD) 288, 289, 290, 291
inter-beat intervals (IBI) 238
Interface 103, 107, 109, 117, 119, 120, 125, 129, 130
internal transdiegetic 85, 89, 91
International Phonetic Alphabet (IPA) 227

J

James-Lange Emotional Theory 177, 180, 190
Just-Noticeable-Difference (JND) 298

K

Kalman Filter 157
keynote sounds 52
kinediegetic 84

L

latency 154, 160, 161, 172, 173
Legend of Zelda 98, 103, 112, 115, 116, 120, 121
leitmotif 81, 201
liberation of the soundtrack 384, 385, 393
lifeless 213
lifelike 213, 215
lip-synchronization 213, 225, 226, 227
lip-vocalization 224
listening modes 24, 28, 29, 131, 133, 138, 145
listening positions 133, 134, 138, 139, 143, 144, 148, 152
LOAD (Level of Audio Detail) 318, 332, 333, 335, 338
Localization 174
locomotion 110, 112, 114
lo-fi 132, 134, 138
logjam 101, 102, 104, 109, 125
loopy 135, 137, 142, 143, 152
losses disguised as wins 3, 5, 10, 14, 15, 17, 20

M

Machine Listening 338
mapping 160, 161
Masking 338
Mask Topology 338
Massively multi-player online role-playing games (MMORGs) 147, 148
Mass Profile 211
maximum-likelihood estimation (MLE) 157, 158, 164
McGurk Effect 225
meaning of sounds 196
mediated listening 47, 48
Mediatization 59
metalepsis 81
Metaphorical Interface Sounds 96
Method 338
mimesis 80
mise en scène 193, 194, 202, 203, 207, 211, 212
Mobile Media API (MMAPI) 306, 307, 308, 309, 311
modal synthesis 341, 346, 347, 348, 349, 350, 351, 352, 353, 357, 358, 360
mode compression 353
Model 323, 338
mode truncation 353
Monaural 174
mood track 146
morphing 332
morphology 192
movie brats 393, 394, 396
MPEG (Motion Picture Experts Group) 299, 302, 303, 304, 305, 306, 309
multi-modality 161, 174
multi-modal salience 164, 165
multi-player environments 373
multiplayer games 397
Murch’s conceptual model 100, 102, 104, 105, 106, 107, 116, 124, 130
musical diegesis 68, 70, 75
music (embodied) 99
Music Video Games 75
musique concrète 134, 207, 326, 393, 396, 413
Muzak 6, 18, 19

N

naive listening 143
naive physics of perception 143, 145
near miss 3, 5, 14, 20
neurochemical transmitters 265
neurophysiological pleasure 265
Next Generation 387, 407
nickel slot 3
noise pollution 49
non-diegetic 60, 61, 62, 63, 64, 65, 73, 75, 76, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 91, 93, 94, 96, 103, 107, 109, 114, 116, 122, 124, 126, 127, 181, 183, 237, 239, 240, 261
non-diegetic music 61, 64, 65
non-diegetic sound 78, 80, 81, 82, 83, 86, 91
non-gambling environments 2
nonhuman-like 220, 223
nonlinear music 67, 75
non-player characters (NPCs) 91, 399, 400, 405
Nouvelle Vague 393, 394, 396

O

Object sounds 32
objet sonore 393, 401, 413
Occlusion 289, 312, 338
one-dimensional (1D) 345, 347, 360
OpenSL ES 308, 310, 311
Operating System (OS) 306, 307
Ornamental sounds 32
Ouïr 211
outdoor acoustics 288, 294
Overlay Interface Sounds 96

P

Parameter 332, 338
Parametric (Signal) Method 338
partial differential equations (PDEs) 344, 347
PAs (amplified public announcements) 147
Perceptual Cycle 154, 162, 166, 174
perceptual feedback 154, 161
perceptual relevance model 157
personal construct psychology (PCP) 248, 249
personal construct theory (PCT) 248
Pervasive Game 43
phenomenology 327, 334
photorealistic 27
Phya 352, 353, 354, 355, 358, 359
physically informed 316, 338, 339
Physically Informed Stochastic Event Modeling (PhISEM) 348, 354
physical modelling 316, 338, 340, 341, 343, 344, 345, 346, 347, 348, 349, 351, 352, 353, 354, 355, 356, 357, 359, 360
physical world 23, 24, 25, 26, 27, 30, 31, 32, 34
Physics Engine 339
physiological response 12, 15, 46
pit music 81
Playlist Generation 254, 259, 263
points of observation 111
PointSound node 299
polymorphism 319, 335
Precomposed music 70
Presence 154, 160, 161, 171, 172, 173, 174
Primary Auditory Cortex 156
Primary Visual Cortex (V1) 156
Principal Component Analysis (PCA) 250
procedural audio 229, 230, 313, 314, 315, 316, 318, 319, 320, 321, 323, 326, 328, 332, 333, 334, 335, 340, 342, 343, 356, 358, 360
programmable sound generators (PSGs) 341
Progression functions 201
psychoacoustics 100, 127, 130, 190, 191
psychoacoustic sound 328
Psychophysiological research 265, 267
psychophysiology 267, 269, 277, 278, 281, 282, 283, 285
Pure Data 319, 320, 337
pure narrative 80, 96

Q

Quality of Experience (QoE) 153, 154, 174
quality-oriented 24
quality scaling 353

R

range 160, 161, 165, 166, 169
Reactive Audio 339
Reading Mode 68
Realism 233
reality window 323
real-time 313, 315, 320, 328, 333, 335, 338
real time strategy (RTS) 111
real world 44, 46, 47, 49, 50, 51, 59
real world listening 47
real world sound sources 64
reduced listening 393, 394, 413
reinforcement cues 5
Replication 339
resonance 362, 366, 367, 372, 373, 374, 378, 379, 383
reverberation 292, 293, 294, 300, 302, 308
reward schedule 3, 8, 9, 21
Rez 62, 69, 74
role of sound 264, 272, 275
Rolling Sound 21

S

Salience 165, 170, 174
salience model 153, 155, 165, 166, 170
sampling plus synthesis (S+S) 321, 339
Scene Description Language (SDL) 302
SceneGraph 299
Schema 174
schizophonic 48, 59
Scrambled Eggs 38, 39
screen music 81
see and hear 24
see-hear 24
self-learning 254
semantics 371, 383
sensible realism 324
Servicescapes 6
Single-Cell Recording 174
skin conductance level (SCL) 238
slot machines 2, 3, 4, 5, 7, 8, 9, 10, 11, 12, 13, 15, 17, 19, 20
smell (olfaction) 155
Snapshot Listening 112, 130
Social construction of space 59
social constructivist theory 177
Sonic Architecture 59
sonic branding 11
sonic effect 47, 51, 401
sonic elements 143, 145, 146
sonic environment 99, 100, 101, 102, 103, 104, 106, 107, 110, 112, 116, 119, 121, 123, 124, 125, 126, 130
sonic events 401, 402
Sonic Explorer 348, 349, 351
sonic expression 368, 370, 371, 378
sonic identity 394
sonic objects 399, 404, 413
sonic re-engineering 394
Sonification 399, 414
sound design 100, 121, 123, 124, 126, 127, 176, 180, 189, 191, 362, 363, 364, 366, 367, 369, 372, 375, 378, 379, 382, 383, 384, 385, 386, 387, 390, 392, 393, 395, 397, 398, 399, 400, 403, 404, 405, 406, 407, 408, 409, 410, 412, 414
sound designers 402, 406, 408
sound effects 31, 32, 176, 177, 178, 179, 181, 182, 183, 184, 185, 186, 187, 189, 340, 341, 342, 343, 355, 356, 357
sound images 138
Sound layers 383
sound localization 290, 291, 296, 297, 311
soundmarks 46, 52, 55
Sound Node 299, 300, 301, 302
sound objects 313, 314, 315, 316, 322, 339
sound positioning 64
Sound Principles 191
sound rendering 65
soundscape 6, 14, 23, 29, 30, 32, 34, 35, 36, 37, 40, 42, 44, 45, 46, 47, 48, 49, 50, 52, 53, 54, 55, 56, 59, 64, 149, 150, 151, 152, 362, 365, 366, 368, 369, 370, 371, 373, 374, 378, 379, 380, 381, 382, 383, 401, 413
soundscape composition 371, 378, 383
Soundscape Node 299, 300
soundscapes 177, 180, 181, 182, 185, 190
sound script 347
sound signals 52
sound synthesis 341, 342, 343, 344, 346, 347, 348, 350, 351, 353, 354, 355, 356, 357, 359
Soundwalking 59
Source 181, 183, 191
Space Invaders 168, 174
spatial audio 297, 307, 308, 311
spatial audio system 297
spatial sound 287, 288, 289, 290, 294, 297, 298, 299, 301, 303, 304, 305, 309, 311
speech (encoded) 99
speed 159, 160
Statefulness 339
static ambience 391
static interface 88
surprise effects 204
surround sound 301, 302, 303
suspense 176, 177, 178, 180, 181, 182, 183, 184, 185, 186, 187, 189, 190, 191
Suspension of Disbelief 43
Symbolic Interactionist 59
synchronisation points 200
synchronism (simultaneous events) 14
synchronized game sound 176, 184
synchrony 224, 225, 230, 233, 234
synthesis 14

T

Takagi-Sugeno-Kang (TSK) 257, 259
taste (gustation) 155
telediegetic 84
Tensor 339
three dimensional 334, 347, 349, 350, 359, 385, 387, 391, 398, 404, 405, 407, 412, 414
time on device 4
Timing 181, 182, 191
Tone Wall/Harmonic Field 69
touch (taction or pressure) 155
trans-diegetic 79, 80, 84, 85, 86, 87, 89, 91, 92, 95, 96, 97, 239
transdiegetic sounds 79, 85, 87, 92, 95
two dimensional 319, 323, 334, 342, 347, 357, 358
typology 99, 100

U

Uncanny Modality (UM) 214, 217, 222, 223, 224, 226, 227
uncanny speech in computer games 228
Uncanny Valley 213, 214, 215, 216, 217, 218, 219, 220, 229, 230, 231, 232, 233
unstrange 218
urban overload 47
urban soundscape 46, 49
user-centered design (UCD) 285
User Experience (UX) 153, 285
user generated content (UGC) 321
user investment 37
User Studies 285

V

van Tol 98, 99, 102, 103, 104, 125, 126, 127, 129, 130
Ventral Stream 174
verisimilitude 131, 132, 133, 140, 141, 142, 143, 145, 147, 148, 152
VFoley 355, 359
videoludic 192, 193, 194, 202, 207, 208, 211, 212
virtual characters 213, 214, 215, 216, 217, 219, 220, 221, 222, 226, 227, 228, 234
Virtual Environment (VE) 27, 56, 81, 89, 90, 160, 237, 246, 340, 341, 343, 346, 347, 348, 351, 358, 359, 365
virtual gameplay environment 248
virtual gameworld 23, 26
virtual physical parameters 340
Virtual Reality Modeling Language (VRML) 302
Virtual Reality (VR) 287, 288, 289, 290, 292, 294, 297, 298, 299, 302, 309, 312
virtual scene 287, 297, 303
virtual soundscape 44, 48, 50, 59
virtual space 45, 46, 53
virtual worlds 23, 32, 35, 44, 55, 62, 65, 70, 313, 397, 404, 405
vision-based 34
Vision (sight) 155
Visual Association Cortex (V2 and V3) 156
visual capture 157, 158
visual representation (viseme) 215, 226, 234
Volume 181, 191

W

Walter Murch 98, 129, 130
Warcraft III 98, 102, 111, 115, 116, 118, 119, 123, 128
Waveguide 328, 339
willing suspension of disbelief 370, 383
winning cue 14
World Forum for Acoustic Ecology 48
World Soundscape Project 46

X

XNA/XACT 301

Z

Zone 103, 107, 109, 115, 116, 117, 118, 119, 120, 125, 129, 130
