Representing Sound Digitally

UNIT
2 Sound/Audio
Formatted on September 5, 2000
Objectives Contents
To understand how computers process 1 The Nature of Sound 2
sound
2 Computer Representation of Sound 7
To understand how computers
synthesize sound 3 Computer Music — MIDI 15
To understand the differences between 4 Summary — MIDI versus digital audio 26

two major kinds of audio, namely
digitised sound and MIDI music 5 Exercises 27
COMP3600/SCI2600 Multimedia Systems

2. Sound/Audio
Department of Computer Science
(200009) Slide: 1
1 The Nature of Sound
Sound is a physical phenomenon produced by the vibration of matter and transmitted as waves.
However, the perception of sound by human beings is a very complex process. It involves three
systems:
the source which emits sound;

the medium through which the sound propagates;
the detector which receives and interprets the sound.
Amplitude
Sounds we heard everyday are very complex. Every Time
sound is comprised of waves of many different

frequencies and shapes. But the simplest sound we can
Period
hear is a sine wave.
Sound waves can be characterised by the following attributes:
Period Frequency Amplitude Bandwidth

Pitch Loudness Dynamic
2. Sound/Audio
(200009) Slide: 2
1.1 Pitch and Frequency
Period is the interval at which a periodic signal

repeats regularly.
Pitch is a perception of sound by human beings It
measures how ‘high’ is the sound as it is perceived
by a listener. Infra-sound 0 – 20 Hz
Frequency measures a physical property of a Human hearing range 20 – 20 kHz
wave. It is the reciprocal value of period f = P1 . Ultrasound 20 kHz – 1 GHz
The unit is Herts (Hz) or kiloHertz (kHz). Hypersound 1 GHz – 10 THz
Musical instruments are tuned to produce a set of Note Ratio Frequencies

fixed pitches. C 1:1 264
D 9:8 297
E 5:4 330
F 4:3 352
G 3:2 396
A 5:3 440
B 15:8 495
C 2:1 528

2. Sound/Audio
(200009) Slide: 3
1.2 Loudness and Amplitude
The other important perceptual quality is This is known as the threshold of feeling. If the
loudness or volume. intensity is 10 12watt=m2, we may just be able
to hear it. This is know as the threshold of
Amplitude is the measure of sound levels. For a hearing.
digital sound, amplitude is the sample value.
The relative intensity of two different sounds is
The reason that sounds have different loudness measured using the unit Bel or more commonly
is that they carry different amount of power. deciBel (dB). It is defined by
The unit of power is watt. The intensity of
I2
sound is the amount of power transmitted relative intensity in dB = 10 log
through an area of 1m2 oriented perpendicular I1
to the propagation direction of the sound. Very often, we will compare a sound with the
threshold of hearing.
If the intensity of a sound is 1watt=m 2, we may
start feel the sound. The ear may be damaged.

2. Sound/Audio
(200009) Slide: 4
160 dB Jet engine
130 dB Large orchestra at fortissimo
100 dB Car on highway
70 dB Voice conversation
50 dB Quiet residential areas
30 dB Very soft whisper
Typical sound levels generated by various sources 20 dB Sound studio
Intensity Sound Level Loudness

(watt=m2) dB
1 120 Threshold of feeling

3
10 90 fff
4
10 80 ff
5
10 70 f
6
10 60 mf
7
10 50 p
8
10 40 pp
9
10 30 ppp
12
Typical sound levels in music 10 0 Threshold of hearing

2. Sound/Audio
(200009) Slide: 5
1.3 Dynamic and Bandwidth
Dynamic range means the change in sound levels.

For example, a large orchestra can reach 130dB at its climax and drop to as low as 30dB at its
softest, giving a range of 100dB.
Bandwidth is the range of frequencies a device can produce or a human can hear.
FM radio 50Hz – 15kHz
AM radio 80Hz – 5kHz
CD player 20Hz – 20kHz
Sound Blaster 16 sound card 30Hz – 20kHz
Inexpensive microphone 80Hz – 12kHz
Telephone 300Hz – 3kHz
Children’s ears 20Hz – 20kHz
Older ears 50Hz – 10kHz
Male voice 120Hz – 7kHz
Female voice 200Hz – 9kHz

2. Sound/Audio
(200009) Slide: 6
2 Computer Representation of Sound
Sound waves are continuous while computers are good at handling discrete numbers.
In order to store a sound wave in a computer, samples of the wave are taken.
Each sample is represented by a number, the ‘code’.
This process is known as digitisation.
This method of digitising sound is know as pulse code modulation (PCM).
Refer to Unit 1 for more information on digitisation.
According to Nyquist sampling theorem, in order to capture all audible frequency components
of a sound, i.e., up to 20kH z , we need to set the sampling to at least twice of this.
This is why one of the most popular sampling rate for high quality sound is 4410H z .
Another aspect we need to consider is the resolution, i.e., the number of bits used to represent
a sample.
Often, 16 bits are used for each sample in high quality sound. This gives the SNR of 96dB .

2. Sound/Audio
(200009) Slide: 7
2.1 Quality versus File Size
The size of a digital recording depends on

the sampling rate, resolution and number of S file size bytes
channels. R sampling rate samples per second
S = R ( b=8)
C D
b
C
resolution
channels
bits
1 - mono, 2 - stereo
Higher sampling rate, higher resolution gives D recording duration seconds
higher quality but bigger file size.
For example, if we record 10 seconds of stereo music at
44.1kHz, 16 bits, the size will be:
S = 44100 (16 8) 2 10
=
= 1; 764; 000bytes
= 1722:7Kbytes
1Kbytes = 1024bytes
= 1:68Mbytes Note:
1Mbytes = 1024Kbytes
High quality sound files are very big, however, the file size can be reduced by compression.

2. Sound/Audio
(200009) Slide: 8
File size for some common sampling rates and resolutions
Sampling Stereo Size for
Rate Resolution /Mono for 1 Min. Comments
44.1KHz 16-bit Stereo 10.5MB CD-quality recording
44.1KHz 16-bit Mono 5.25MB A good trade-off for high-quality recordings of
mono sources such as voice-overs
44.1KHz 8-bit Stereo 5.25MB Achieves highest playback quality on low-end
devices such as most of the sound cards
44.1KHz 8-bit Mono 2.6MB An appropriate trade-off for recording a mono
source
22.05KHz 16-bit Stereo 5.25MB Darker sounding than CD-quality recording
because of the lower sampling rate
22.05KHz 16-bit Mono 2.5MB Not a bad choice for speech, but better to trade
some fidelity for a lot of disk space by dropping
down to 8-bit
22.05KHz 8-bit Stereo 2.6MB A very popular choice for reasonable stereo
recording where full bandwidth playback is not
possible
22.05KHz 8-bit Mono 1.3MB A thinner sound than the choice just above, but
very usable
11KHz 8-bit Stereo 1.3MB At this low a sampling rate, there are few
advantages to using stereo
11KHz 8-bit Mono 650K In practice, probably as low as you can go and still
get usable results
5.5KHz 8-bit Stereo 650K Stereo not effective
5.5KHz 8-bit Mono 325K About as good as a bad telephone connection

2. Sound/Audio
(200009) Slide: 9
2.2 Audio File Formats
The most commonly used digital sound format in Windows systems is .wav files.
Sound is stored in .wav as digital samples known as Pulse Code Modulation(PCM).

Each .wav file has a header containing information of the file.
type of format, e.g., PCM or other modulations
size of the data
number of channels
samples per second
bytes per sample
There is usually no compression in .wav files.
Other format may use different compression technique to reduce file size.
.vox use Adaptive Delta Pulse Code Modulation (ADPCM).

.mp3 MPEG-1 layer 3 audio.
RealAudio file is a proprietary format.
2. Sound/Audio
(200009) Slide: 10
Some common audio files formats
Extension MIME Type Platform Use

aif Audio/x-aiff Mac, SGI Audio
aifc Audio/x-aiff Mac, SGI Audio (compressed)
AIFF Audio/x-aiff Mac, SGI Audio
aiff Audio/x-aiff Mac, SGI Audio
au Audio/basic Sun, NeXT ULAW audio data
mov Video/QuickTime Mac, Win QuickTime video
mpe Video/mpeg All MPEG video
mpeg Video/mpeg All MPEG video
mpg Video/mpeg All MPEG video
mp3 Audio/x-mpeg All MPEG audio
qt Video/QuickTime Mac, Win QuickTime video
ra,ram Audio/x-pn-realaudio All RealAudio Sound
snd Audio/basic Sun, NeXT ULAW Audio Data
vox Audio/ All VoxWare Voice
wav Audio/x-wav Win WAV Audio

2. Sound/Audio
(200009) Slide: 11
2.3 Audio Hardware
Recording and Digitising sound: Different sound card have different
An analog-to-digital converter(ADC) capability of processing digital sounds.
converts the analog sound signal into When buying a sound card, you should
digital samples. look at:
A digital signal processor(DSP) maximum sampling rate
processes the sample, e.g. filtering,
stereo or mono
modulation, compression, and so on.
Play back sound: duplex or simplex
A digital signal processor processes the

sample, e.g. decompression,
demodulation, and so on ADC
An digital-to-analog converter(DAC)
Digital
converts the digital samples into sound DSP
Sound
signal
All these hardware devices are integrated DAC
into a few chips on a sound card

2. Sound/Audio
(200009) Slide: 12
2.4 Audio Software
Windows device driver — controls the hardware

device.
Many popular sound cards are Plus and Play.
Windows has drivers for them and can recognise
them automatically. For cards that Windows does
not have drivers, you need to get the driver from the
manufacturer and install it with the card.
If you do not hear sound, you should check the settings,
such as interrupt, DMA channels, and so on.
Device manager — the user interface to the hardware

for configuring the devices.
You can choose which audio device you want to use
You can set the audio volume

2. Sound/Audio
(200009) Slide: 13
Mixer — its functions are:
to combine sound from different sources
to adjust the play back volume of sound sources
to adjust the recording volume of sound sources
Recording — Windows has a simple Sound Recorder.
Editing — The Windows Sound Recorder has a limiting editing function, such as changing
volume and speed, deleting part of the sound.
There are many freeware and shareware programs for sound recording, editing and processing.

2. Sound/Audio
(200009) Slide: 14
3 Computer Music | MIDI
Sound waves, whether occurred natural or man-made, are often very complex, i.e., they consist of
many frequencies. Digital sound is relatively straight forward to record complex sound. However,
it is quite difficult to generate (or synthesize) complex sound.
There is a better way to generate high quality music. This is known as MIDI — Musical
Instrument Digital Interface.
It is a communication standard developed in the early 1980s for electronic instruments and
computers. It specifies the hardware connection between equipments as well as the format in
which the data are transfered between the equipments.
Common MIDI devices include electronic music synthesisers, modules, and MIDI devices in
common sound cards.
General MIDI is a standard specified by MIDI Manufacturers Association. To be GM compatible,

a sound generating device must meet the General MIDI system level 1 performance requirement.
minimum of 24 fully voices
16 channels, percussion on channel 10
minimum 16 simultaneous and different timbre instruments
This sign indicated that
minimum 128 preset instruments the device is a general
Support certain controllers MIDI device.

2. Sound/Audio
(200009) Slide: 15
3.1 MIDI Hardware
An electronic musical instrument or a computer which has MIDI interface should has one or more
MIDI ports. The MIDI ports on musical instruments are usually labelled with:
IN — for receiving MIDI data;

OUT — for outputting MIDI data that are generated by the instrument;
THRU — for passing MIDI data to the next instrument.
MIDI devices can be daisy-chained together.
MIDI device
IN OUT
OUT IN THRU IN THRU IN THRU

MIDI device MIDI device MIDI device

2. Sound/Audio
(200009) Slide: 16
3.2 MIDI Data
Unlike digital sound, MIDI data does not encode individual samples. MIDI data
encode musical events and commands to control instruments.
MIDI data are grouped into MIDI messages. Each MIDI message represents a Key C4 ON
musical event, e.g., pressing a key, setting a switch or adjusting foot pedals. A Key G3 OFF
sequence of MIDI messages is grouped into a track. Volume 105
An instrument or a computer satisfies both the hardware interface and the data
format is known as a MIDI device.

2. Sound/Audio
(200009) Slide: 17
3.3 MIDI Channels and Modes
MIDI devices communicate with each other through channels. The

MIDI standard specifies each MIDI connection has 16 channels. Each
instrument can be mapped to a single channel (omni Off), or it can use
all 16 channels (Omni On).
Some instruments are capable of playing more than one note at the
same time, e.g., organs and piano. This is known as polyphony. Other
instruments, such as flute, is monophony since they can only play one
note at a time.
Each MIDI device must be set to one of the modes for receiving MIDI
data: Omni On/Poly
Omni On/Mono
Omni Off/Poly
Omni Off/Mono

2. Sound/Audio
(200009) Slide: 18
3.4 Instrument Patch
Each MIDI device is usually capable of producing sound resembling several real instruments
and/or noise effects (e.g., telephone, aircraft). Each instrument or noise effect is known as a patch,
or preset.
The general MIDI standard specifies 128 patches(ranges from 0 to 127).
ID Sound ID Sound
0 Acoustic grand piano 35 Acoustic bass drum
7 Clarinet 45 Low tom
10 Music box 76 High wood block
19 Church organ 80 Mute triangle
40 Violin
48 String ensemble I
124 Telephone ring
125 Helicopter
126 Applause

2. Sound/Audio
(200009) Slide: 19
3.5 MIDI les
When using computers to play MIDI music, the MIDI data are often stored in MIDI files. Each
MIDI files contains a number of chunks. There are two types of chunks:
Header chunk — contains information about the entire file: the type of MIDI file, number of
tracks and the timing.
Track chunk — the actual data of MIDI track.
Chunk 1 Chunk 2 Chunk 3 Chunk 4 Chunk 5 Chunk 6
There three types of MIDI file:

0 single multichannel track
Header chunk
1 one or more simultaneous track of a
sequence
Track chunk
2 one or more sequentially independent
single-track patterns MIDI event

2. Sound/Audio
(200009) Slide: 20
Tracks, channels and patches
Multiple tracks can be played at the same time.

Each track can be assigned to a different channel.
Each channel can accept more than one track.
Each channel is assigned a patch, therefore generates sound of a particular instrument
Track 1 Chn 2
Track 2 Chn 3
Track 3 Chn 4
Track 4 Chn10
Device 1 Device 2 Device 3
Omni On
Chn 2<--> Piano Chn 10
Chn 2<--> Vialin
Chn3 <--> Guitar
Chn 4 <--> String ensemble

2. Sound/Audio
(200009) Slide: 21
3.6 How MIDI Sounds Are Synthesized
A simplistic view is that:
the MIDI device stores the characteristics of sounds produced by different sound sources;
the MIDI messages tell the device which kind of sound, at which pitch is to be generated, how
long the sound is played and other attributes the note should have.
There are two ways of synthesizing sounds:
FM Synthesis (Frequency Modulation) — Using one sine wave to modulate another sine wave,
thus generating a new wave which is rich in timbre. It consists of the two original waves, their
sum and difference and harmonics.
The drawbacks of FM synthesis are: the generated sound is not real; there is no exact formula
for generating a particular sound.
Wave-table synthesis — It stores representative digital sound samples. It manipulates these
sample, e.g., by changing the pitch, to create the complete range of notes.

2. Sound/Audio
(200009) Slide: 22
MIDI Sound Attributes
The shape of the amplitude envelop has great influence on the resulting character of sound. There
are two different types of envelop:
Diminishing sound — gradually die out;

Continuing sound — sustain until turned off.
On
Key
Off
Amplitude
Amplitude
Time Time
Deminishing sound Continuing sound

2. Sound/Audio
(200009) Slide: 23
The Amplitude Envelop
Delay — the time between when a key is

played and when the attack phase begins
Attack— the time from no sound to
maximum amplitude
Hold — the time envelop will stay at the peak
level beform starting the decay phase
Amplitude
Decay — the time it takes the envelop to go
from the peak level to the sustain level
Sustain — the level at which the envelop
remains as long as a key is held down
Release — the time is takes for the sound to Attack Decay Sustain Time
Release
fade to nothing Delay Hold

2. Sound/Audio
(200009) Slide: 24
3.7 MIDI software
MIDI player for playing MIDI music. This includes:

Windows media player can play MIDI files
Player come with sound card — Creative Midi player
Freeware and shareware players and plug-ins— Midigate, Yamaha Midplug, etc.
MIDI sequencer for recording, editing and playing MIDI
Cakewalk Express, Home Studio, Professional
Cubasis
Encore
Voyetra MIDI Orchestrator Plus
Configuration — Like audio devices, MIDI devices require a driver. Select and configure MIDI
devices from the control panel.

2. Sound/Audio
(200009) Slide: 25
4 Summary | MIDI versus digital audio
Digital Audio MIDI
Digital representation of physical sound Abstract representation of musical sounds

waves and sound effects
File size is large if without compression MIDI files are much more compact
Quality is in proportion to file size File size is independent to the quality
More software available
Much better sound if the sound source is of
Play back quality less dependent on the high quality
sound sources
Can record and play back any sound Need some music theory
including speech Cannot generate speech

2. Sound/Audio
(200009) Slide: 26
5 Exercises
1. Your hard disk has 256Mbytes of free space. You keyboard is a general MIDI device which is set to
are going to record a speech with a sampling rate of ‘Omni On’ mode, Synthesiser 1 is a drum machine
11KHz, 8-bit resolution and a single channel. What which accepts signals on the drum channel only, and
is the length of the recording that can be stored in Synthesiser 2 is a guitar accepting signals from all
the hard disk? (Answer in seconds) channels but can only synthesise guitar sound. A
MIDI sequencer is playing a MIDI file on the
2. A multimedia presentation has 30 minutes of
computer. The table below lists tracks with their
CD-quality digital audio in .wav files. What is the
patch and channel settings. What kind of sound will
storage required for these files?
be heard from each device?
3. In a teleconferencing system, a network connection Track Patch Channel Track Patch Channel
having bandwidth of 100Kbytes/sec is allocated to 1 2 3 2 8 4
duplex audio link. What is the maximum sampling 3 28 5 4 49 6
rate and resolution in which uncompressed audio 5 57 7 6 73 2
data can be transmitted in real-time?
4. You are developing a network voice communication
program. It uses the Internet to connect two remote
users and allows them to talk to each other in real
IN OUT THRU IN THRU IN THRU
time. What is the most appropriate sampling rate for
recording their voice? Computer
Music Keyboard Synthesiser 1 Synthesiser 2
5. In the configuration shown below, the music

2. Sound/Audio
(200009) Slide: 27
5.1 Solution to Exercises
1. The size of one second of recording is 11,050 bytes. 4. Because the data we want to handle are voice, a
the length of recording that can be stored in sampling rate of 11KHz and mono will be enough.
256MBytes is This creates a data rate of 11K/sec. If the users are
(256 1024 1024) =11; 025 = 24; 347(seconds)
connected to the Internet via a LAN, this data rate
should be handled by the system adequately.
2. CD-quality digital audio means 44,100Hz Sampling However, if they are connected via modems, the
rate, 16-bit resolution and stereo. data rate will be too high. Compression technique
44100 2 2 (30 60) has to be used.
= 317; 510; 000 bytes 5. Since the music keyboard can accept signal from all
= 302:8 MBytes channels, the general MIDI patch map, we will hear
the following sounds:
3. Since the link is duplex, we will allocate half the
bandwidth to each direction, i.e., 50Kbytes/sec. Track Patch Channel Sound
Using the formula below 1 2 3 Bright Acoustic Piano
S = R ( b=8)
C D;
2 8 4 Clavinet Chromatic
3 28 5 Electric Guitar
and assuming we record in mono, we have 4 49 6 String Ensemble 1
50; 000 = R (b=8). Since it is a teleconferencing 5 57 7 Trumpet
application, the sound will mostly be speech. We 6 73 2 Piccolo
can use lower sampling rate and higher resolution.
Therefore, we can use 22.05KHz sampling rate and Synthesiser 2 will produce no sound because no
16-bit resolution. The most important limit is that signal is sent to Channel 10. Synthesiser 3 will
the data can not be larger than the bandwidth, produce sound for signals from all channels but they
otherwise, we will not be able to transmit them. all sound like guitar.
2. Sound/Audio
(200009) Slide: 28

Representing Sound Digitally

Caricato da

Informazioni sul documento

Descrizione originale:

Titolo originale

Copyright

Formati disponibili

Condividi questo documento

Condividi o incorpora il documento

Opzioni di condivisione

Hai trovato utile questo documento?

Questo contenuto è inappropriato?

Copyright:

Formati disponibili

Representing Sound Digitally

Caricato da

Copyright:

Formati disponibili

UNIT

Formatted on September 5, 2000

 To understand how computers process 1 The Nature of Sound 2

 To understand the differences between 4 Summary — MIDI versus digital audio 26

COMP3600/SCI2600 Multimedia Systems

 the source which emits sound;

Sounds we heard everyday are very complex. Every Time

sound is comprised of waves of many different

hear is a sine wave.

Sound waves can be characterised by the following attributes:

Period Frequency Amplitude Bandwidth

Period is the interval at which a periodic signal

Musical instruments are tuned to produce a set of Note Ratio Frequencies

COMP3600/SCI2600 Multimedia Systems

COMP3600/SCI2600 Multimedia Systems

Intensity Sound Level Loudness

1 120 Threshold of feeling

COMP3600/SCI2600 Multimedia Systems

 Dynamic range means the change in sound levels.

COMP3600/SCI2600 Multimedia Systems

COMP3600/SCI2600 Multimedia Systems

The size of a digital recording depends on

COMP3600/SCI2600 Multimedia Systems

COMP3600/SCI2600 Multimedia Systems

 Sound is stored in .wav as digital samples known as Pulse Code Modulation(PCM).

 .vox use Adaptive Delta Pulse Code Modulation (ADPCM).

Extension MIME Type Platform Use

COMP3600/SCI2600 Multimedia Systems

 A digital signal processor processes the

into a few chips on a sound card

COMP3600/SCI2600 Multimedia Systems

 Windows device driver — controls the hardware

 Device manager — the user interface to the hardware

COMP3600/SCI2600 Multimedia Systems

COMP3600/SCI2600 Multimedia Systems

General MIDI is a standard specified by MIDI Manufacturers Association. To be GM compatible,

COMP3600/SCI2600 Multimedia Systems

IN — for receiving MIDI data;

MIDI devices can be daisy-chained together.

OUT IN THRU IN THRU IN THRU

COMP3600/SCI2600 Multimedia Systems

COMP3600/SCI2600 Multimedia Systems

MIDI devices communicate with each other through channels. The

COMP3600/SCI2600 Multimedia Systems

The general MIDI standard specifies 128 patches(ranges from 0 to 127).

COMP3600/SCI2600 Multimedia Systems

There three types of MIDI file:

COMP3600/SCI2600 Multimedia Systems

 Multiple tracks can be played at the same time.

Device 1 Device 2 Device 3

COMP3600/SCI2600 Multimedia Systems

A simplistic view is that:

There are two ways of synthesizing sounds:

COMP3600/SCI2600 Multimedia Systems

 Diminishing sound — gradually die out;

COMP3600/SCI2600 Multimedia Systems

 Delay — the time between when a key is

COMP3600/SCI2600 Multimedia Systems

MIDI player for playing MIDI music. This includes:

COMP3600/SCI2600 Multimedia Systems

 Digital representation of physical sound  Abstract representation of musical sounds

COMP3600/SCI2600 Multimedia Systems

To understand how computers process 1 The Nature of Sound 2

To understand the differences between 4 Summary — MIDI versus digital audio 26

the source which emits sound;

Dynamic range means the change in sound levels.

Sound is stored in .wav as digital samples known as Pulse Code Modulation(PCM).

.vox use Adaptive Delta Pulse Code Modulation (ADPCM).

A digital signal processor processes the

Windows device driver — controls the hardware

Device manager — the user interface to the hardware

Multiple tracks can be played at the same time.

Diminishing sound — gradually die out;

Delay — the time between when a key is

Digital representation of physical sound Abstract representation of musical sounds