Multimodal Interaction and Perception
• Explanation/definition of terms
• Examples of multimodal interaction

• Multimodal interaction: interacting with a system using more than one (dominant) modality
• Multimodal interfaces
• What defines a modality in this sense?
  • The various sensory systems/channels (visual, auditory, haptic as the most dominant ones)
  • In less formal terms, a modality is a path of communication between the human and the computer (this would then also include brain-computer interfaces)
• This area is strongly embedded in HCI research (Human-Computer Interaction)

• Interaction implies receiving information from the system (output modalities) and providing input to the system (input modalities)
• Input and output modalities do not have to be the same
• Classical computer interface: output via display (visual), input via keyboard/mouse (haptic)
• Speech is an example of using the same modality for input (requires automatic speech recognition) and output (requires speech synthesis)
• Be careful: the term "interaction" is also used with a different meaning
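The input/output distinction can be made concrete with a toy data model. This is purely illustrative (the class, field names, and the `multimodal` test are my own, not part of the lecture material):

```python
from dataclasses import dataclass

@dataclass
class Interface:
    name: str
    output: set  # modalities the system uses to present information
    input: set   # modalities the user employs to provide input

    @property
    def multimodal(self):
        # Multimodal interaction: more than one modality involved overall
        return len(self.output | self.input) > 1

# Classical computer interface: visual output, haptic input
classic_gui = Interface("display + keyboard/mouse", {"visual"}, {"haptic"})
# Speech dialogue system: same single modality for input and output
speech_ui = Interface("spoken dialogue system", {"auditory"}, {"auditory"})

print(classic_gui.multimodal)  # True: two modalities in total
print(speech_ui.multimodal)    # False: one modality both ways
```

Note that by this counting, the "classical" desktop interface is already multimodal, while a pure speech interface is unimodal despite handling both input and output.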
/Human Technology Interaction 5-2-2009
Examples
• If it is unclear where your focus of attention is, a sound is a much better alarming stimulus than a light flash (e.g. in control rooms)
• If your hands are busy (for a doctor during a medical intervention; for yourself in the kitchen), speech as an input modality has great advantages
• If you attend a winter school and expect an important phone call, vibration is a good modality for signaling

Examples II
• Regarding sensory bandwidth effects:
• Positive effect: speech perception can be improved by matched visual and acoustic representation (at least in difficult acoustic situations)
• Negative effect: having a phone conversation while driving (which requires visual attention) leads to increased reaction times (also under hands-free conditions)
• A special case of specific contexts of use are interfaces designed for people with reduced perceptual and motor abilities
  • often referred to as accessibility
• Sensory substitution
  • Speech and braille output for blind users
• See with your Ears!
  • Image-to-sound transformation system, in particular for the blind; by Peter Meijer
  • http://www.seeingwithsound.com/

SATIN: Sound And Tangible Interfaces for Novel product design (http://www.satin-project.eu/)
• The main objective of the project is to develop a new generation of multimodal and multisensory interfaces, supporting free-hand interaction with virtual shapes
• Based on a fusion of force feedback, sound and vision for representing global and local properties of shape and material
• Our contribution: use data sonification (e.g., of curvature values) to represent object shape properties which cannot easily be perceived visually or haptically
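Image-to-sound substitution systems of this kind typically scan an image column by column, mapping horizontal position to time, vertical position to pitch, and brightness to loudness. The sketch below illustrates that general mapping; it is my own minimal version, not Meijer's actual implementation:

```python
import numpy as np

def sonify_image(image, duration=1.0, sr=8000, f_lo=200.0, f_hi=2000.0):
    """Column-scan sonification: time = horizontal position,
    pitch = vertical position (top rows = high tones),
    loudness = pixel brightness. `image` is 2D, values in [0, 1]."""
    n_rows, n_cols = image.shape
    samples_per_col = int(duration * sr / n_cols)
    t = np.arange(samples_per_col) / sr
    # One sine frequency per image row, high frequencies on top
    freqs = np.linspace(f_hi, f_lo, n_rows)
    chunks = []
    for col in range(n_cols):
        # Sum one sinusoid per row, weighted by that pixel's brightness
        tones = np.sin(2 * np.pi * freqs[:, None] * t)   # (n_rows, samples)
        chunks.append((image[:, col:col + 1] * tones).sum(axis=0))
    signal = np.concatenate(chunks)
    peak = np.abs(signal).max()
    return signal / peak if peak > 0 else signal

# A diagonal line: the pitch should sweep downward over the scan
img = np.eye(8)
audio = sonify_image(img)
```

Writing `audio` to a WAV file (e.g. with the standard `wave` module) would let a listener hear the descending sweep that encodes the diagonal.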
• Spatial disparity
• Temporal and rhythm disparity

• Research question: What do we perceive when audio and video are presented from different positions (directions)?
From Stein and Meredith: The merging of the senses (1993)

Spatial disparity II
Demo (a more magical version than a double flash, a quintuple rabbit)
• Web source: http://www.cns.atr.jp/~kmtn/audiovisualRabbit/index.html
Asynchrony and quality degradation (Rihs, 1995)

ITU recommendation (1998): Relative timing of Sound and Vision for Broadcasting
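The ITU recommendation referenced here (ITU-R BT.1359) defines tolerance windows for audio-video skew in broadcasting. The commonly cited values are roughly 45 ms for sound leading and 125 ms for sound lagging as detectability thresholds, and about 90/185 ms as acceptability limits; treat the exact numbers below as assumptions to verify against the recommendation itself:

```python
# Thresholds in ms; positive = audio leads video (assumed values,
# commonly cited from ITU-R BT.1359 -- check against the original text).
DETECTABLE = (-125, 45)   # outside this window, skew becomes detectable
ACCEPTABLE = (-185, 90)   # outside this window, skew becomes unacceptable

def rate_av_skew(skew_ms):
    """Classify an audio-video skew (ms, positive = audio early)."""
    if DETECTABLE[0] <= skew_ms <= DETECTABLE[1]:
        return "undetectable"
    if ACCEPTABLE[0] <= skew_ms <= ACCEPTABLE[1]:
        return "detectable but acceptable"
    return "unacceptable"

print(rate_av_skew(-100))  # audio lags 100 ms -> undetectable
print(rate_av_skew(60))    # audio leads 60 ms -> detectable but acceptable
print(rate_av_skew(-200))  # audio lags 200 ms -> unacceptable
```

The asymmetry of both windows (more tolerance for lagging than leading audio) matches the perceptual asymmetry discussed later in these slides.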
Take-home message

Asynchrony judgments for distant stimuli
Results for distant stimuli (I)
• Rudloff (1997) compared quality judgments for two video scenes:
  • Object at short (5 m) distance
  • Object at distant (30 m) position (in the recording, the sound had about 100 ms delay)
• Quality judgments as a function of extra delay differed:
  • For the short distance, maximum quality at an extra delay of 80 ms
  • For the large distance, maximum quality at 0 ms extra delay
• Conclusion: No (complete) compensation for the physical delay

Results for distant stimuli (II)
• Stone et al. (2001): AV synchrony judgments for simple stimuli
  • Short distance (0.5 m)
  • Larger distance (3.5 m)
• The difference in sound travel time between the two conditions is 11 ms
• The PSE values obtained in the two conditions differed for 3 of the 5 subjects
• When 11 ms were subtracted (the difference between T_h and T_s), no significant difference remained
• Conclusion by the authors: Observers do not discount the effects of stimulus distance when making judgments of simultaneity (what counts is the temporal relation at the head of the observer)
Conclusion by Sugita and Suzuki

The latest data on this issue
Discussion
• Clear asymmetry in the sensitivity for audio and for video delays
• Typical explanation: the asymmetry reflects the difference in arrival times of A and V for distant sources (a distance of 10 m corresponds to 30 ms of sound propagation time)
• But: the asymmetry already exists in babies of 2 months of age
• The PSE corresponds approximately to the difference in transduction time between the peripheral sensors and the first integrating neurons in the Superior Colliculus
• There might be a methodological issue in using the TOJ paradigm
• Need for more data studying the influence of object distance

Braille display
• A refreshable Braille display or Braille terminal is an electro-mechanical device for displaying Braille characters, usually by means of raising dots through holes in a flat surface.
• Such displays are commercially available, but not cheap (several thousand euros)
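The arrival-time explanation is easy to quantify. A minimal sketch, assuming a speed of sound of 343 m/s (the function name is mine):

```python
SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 degrees C

def sound_delay_ms(distance_m):
    """Extra time (ms) sound needs to travel `distance_m`, relative to
    light, whose travel time is negligible at these distances."""
    return distance_m / SPEED_OF_SOUND * 1000.0

print(round(sound_delay_ms(10.0)))  # ~29 ms: the "10 m = 30 ms" rule of thumb
print(round(sound_delay_ms(30.0)))  # ~87 ms: Rudloff's distant (30 m) object
```

The 30 m value is consistent with the roughly 100 ms recording delay mentioned for Rudloff's distant scene once normal recording tolerances are taken into account.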
Results for disk falling down

[Figure: proportion of "video first", "synchronous", and "audio first" responses as a function of audio delay, from -400 to +400 ms]
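Response curves like these are typically summarized by a point of subjective equivalence (PSE): the audio delay at which the observer is equally likely to report either order. A minimal sketch using made-up response proportions (not the plotted data) that linearly interpolates the 50% crossing of the "audio first" curve:

```python
import numpy as np

# Hypothetical proportions of "audio first" judgments per audio delay (ms);
# negative delay = audio presented before video.
delays = np.array([-400, -300, -200, -100, 0, 100, 200, 300, 400])
p_audio_first = np.array([0.95, 0.9, 0.8, 0.6, 0.45, 0.3, 0.15, 0.08, 0.05])

def pse_by_interpolation(x, p, target=0.5):
    """Find where a (roughly monotone decreasing) response curve crosses
    `target` by linear interpolation between the bracketing points."""
    # np.interp needs an increasing xp array, so flip the decreasing curve
    return float(np.interp(target, p[::-1], x[::-1]))

pse = pse_by_interpolation(delays, p_audio_first)  # about -33 ms here
```

A full analysis would fit a psychometric function (e.g. a cumulative Gaussian) to all three response categories instead of interpolating, but the interpolated crossing already shows how a nonzero PSE quantifies the audio/video asymmetry discussed above.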