1. ng B Vit 11DT3
2. Ng t 11DT3
3. Bi nh Phc 11DT2
OUTLINE
INTRODUCTION
FUNDAMENTALS
MPEG-x AUDIO STANDARDS
PERFORMANCE MEASURES
EVALUATION
CONCLUSION
INTRODUCTION
MPEG-x Standards: evolving set of standards for video and audio
compression developed by the Moving Picture Experts Group.
MPEG-x Audio:
General Audio (GA) coding
Taking PCM audio streams and effectively encoding them for
transmission and storage
Synthetic audio
Text-to-Speech, how to generate and play virtual instruments
[Figure: block diagram including optional ancillary data]
FUNDAMENTALS
Psycho-acoustic model:
Hearing characteristics
Threshold of hearing
Frequency masking
Critical bands
Bark units
Temporal masking
Time to Frequency Transformation:
Filter banks
Bit allocation:
Bitstream Formatting
MPEG Audio Algorithm
PSYCHO-ACOUSTIC
Describes how humans perceive sound
In the compression context, its main use is to tell us which parts
of the signal can be removed without being noticed
HEARING CHARACTERISTICS
FLETCHER-MUNSON EQUAL-LOUDNESS CURVES
THRESHOLD OF HEARING
The threshold is measured in dB.
The origin is at 2 kHz, since Threshold(f) = 0 at f = 2 kHz
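The threshold formula itself did not survive on the slide. As an illustration only, the sketch below assumes a widely used closed-form approximation of the threshold in quiet; the function name and the coefficients are assumptions, not taken from the slides. The approximation comes out close to 0 dB near 2 kHz, which matches the choice of origin stated above.

```python
import math

def threshold_in_quiet_db(f_hz: float) -> float:
    """Approximate threshold of hearing in dB for a frequency in Hz.

    Assumed approximation (not from the slides):
    3.64*k^-0.8 - 6.5*exp(-0.6*(k - 3.3)^2) + 1e-3*k^4, with k = f in kHz.
    """
    k = f_hz / 1000.0  # frequency in kHz
    return (3.64 * k ** -0.8
            - 6.5 * math.exp(-0.6 * (k - 3.3) ** 2)
            + 1e-3 * k ** 4)

# Print the curve at a few frequencies: high at both ends, lowest around 3-4 kHz.
for f in (100, 500, 1000, 2000, 4000, 10000, 16000):
    print(f"{f:>6} Hz: {threshold_in_quiet_db(f):7.2f} dB")
```

Sounds whose level falls below this curve are inaudible even without any masker, so an encoder does not need to spend bits on them.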
FREQUENCY MASKING
One sound can make another sound hard to hear when the two are close
enough in frequency.
A lower frequency can effectively mask a higher frequency.
A higher frequency does not mask a lower frequency well.
The greater the power in the masking frequency, the broader the
range of frequencies it can mask
If two sounds are widely separated in frequency, little masking occurs.
CRITICAL BANDS
Because of frequency masking, we can divide human hearing range
into critical bands.
Human auditory system cannot resolve sounds better than within
about one critical band when other sounds are present
Critical bandwidth corresponds to the smallest frequency difference
between two partials such that each can still be heard separately
Critical bandwidth:
For f < 500 Hz, nearly constant at about 100 Hz or less
For f ≥ 500 Hz, increases roughly linearly with frequency
BARK UNITS
The range of frequencies affected by masking is broader for higher
frequencies
It is useful to define a new frequency unit
In terms of this new unit, each of the masking curves has about the
same width
The new unit defined is called the Bark, named after Heinrich
Barkhausen (1881-1956)
One Bark unit corresponds to the width of one critical band, for any
masking frequency
A frequency f can be converted to its corresponding critical-band
number b, and more than one formula exists for this conversion.
In the second formula, f is in kHz and b is in Barks
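The conversion formulas themselves are missing from the extracted slide. The sketch below assumes two commonly quoted forms: a piecewise rule with f in Hz, and an arctangent rule with f in kHz. Both function names and all coefficients are assumptions for illustration, not recovered from the slides.

```python
import math

def bark_piecewise(f_hz: float) -> float:
    """Assumed piecewise rule: linear below 500 Hz, logarithmic above."""
    if f_hz < 500.0:
        return f_hz / 100.0
    return 9.0 + 4.0 * math.log2(f_hz / 1000.0)

def bark_arctan(f_khz: float) -> float:
    """Assumed arctangent rule, with f in kHz and the result in Barks."""
    return 13.0 * math.atan(0.76 * f_khz) + 3.5 * math.atan(f_khz ** 2 / 56.25)

# Compare the two rules at a few frequencies.
for f_hz in (100, 500, 1000, 4000, 16000):
    print(f_hz, round(bark_piecewise(f_hz), 2), round(bark_arctan(f_hz / 1000.0), 2))
```

The two rules agree only roughly, which is expected: both are curve fits to measured critical bandwidths rather than exact laws.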
TEMPORAL MASKING
Any loud tone causes the hearing receptors in the inner ear to
become saturated, and they require time to recover
A signal can mask other signals that occur just before or just after
it sounds
Time to Frequency Transformation
Filter banks:
A parallel bank of bandpass filters covering the entire spectrum
Used to break the input signal into frequency components (subbands)
BIT-ALLOCATION
Determine the number of code bits to quantize the subband to minimize the audibility of quantization noise
Bits are allocated where they are most needed to lower the
quantization noise below an audible level.
Then the number of bits allocated is used to quantize the
information from the filter bank
Bitstream Formatting
[Figure: block diagram including optional ancillary data]
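The bit-allocation idea described earlier (give bits to the subbands where quantization noise is most audible) can be sketched as a greedy loop. This is a minimal illustration, not the normative MPEG procedure: the signal-to-mask ratios (SMR), the bit budget, the per-subband cap, and the rule of thumb of about 6.02 dB of noise reduction per added bit are all assumptions introduced here.

```python
def allocate_bits(smr_db, total_bits, max_bits=15, db_per_bit=6.02):
    """Greedily hand out bits, one at a time, to the subband whose
    quantization noise is currently most audible.

    smr_db     : assumed signal-to-mask ratio per subband, in dB
    total_bits : assumed bit budget for this block
    """
    bits = [0] * len(smr_db)
    for _ in range(total_bits):
        candidates = [i for i in range(len(bits)) if bits[i] < max_bits]
        if not candidates:
            break
        # Mask-to-noise ratio: the lowest value marks the most audible noise.
        worst = min(candidates, key=lambda i: bits[i] * db_per_bit - smr_db[i])
        bits[worst] += 1
    return bits

# Hypothetical SMR values for four subbands; the last one is fully masked.
smr = [20.0, 12.0, 3.0, -5.0]
print(allocate_bits(smr, total_bits=8))
```

Subbands whose signal already lies below the masking threshold (negative SMR) receive bits last, or none at all when the budget is tight.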
MPEG-1
Three downward-compatible layers of audio compression
Each offers more complexity in the psychoacoustic model applied and
correspondingly better compression for a given level of audio quality
Layer 1 quality can be quite good, provided a comparatively high
bitrate is available
Layer 2 has more complexity and was proposed for use in digital
audio broadcasting
Layer 3 is most complex and was originally aimed at audio
transmission over ISDN lines
Each of the layers uses a different frequency transform
MPEG-1 Layers
In the Layer 1 encoder, the sets of 32 PCM values are first assembled
into a set of 12 groups of 32s
A Layer 2 or Layer 3 frame accumulates more than 12 samples for each
subband: a frame includes 1,152 samples
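The frame sizes above follow from simple arithmetic, sketched here. The factorisation of 1,152 as 32 subbands times 3 groups of 12 samples, and the 44.1 kHz sampling rate used for the durations, are assumptions added for illustration.

```python
SUBBANDS = 32            # subbands produced by the filter bank
SAMPLES_PER_GROUP = 12   # samples per subband in one Layer 1 frame
GROUPS_LAYER_2_3 = 3     # assumed: Layer 2/3 frames hold 3 such groups

def frame_samples(layer: int) -> int:
    """Number of PCM samples covered by one frame of the given layer."""
    groups = 1 if layer == 1 else GROUPS_LAYER_2_3
    return SUBBANDS * SAMPLES_PER_GROUP * groups

def frame_duration_ms(layer: int, sample_rate_hz: int = 44100) -> float:
    """Duration of one frame in milliseconds at the given sampling rate."""
    return 1000.0 * frame_samples(layer) / sample_rate_hz

for layer in (1, 2, 3):
    print(layer, frame_samples(layer), round(frame_duration_ms(layer), 2))
```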
MPEG-1 Layer 3
Main differences:
Employs a filter bank similar to that used in Layer 2, except that it
uses a set of filters with non-equal frequencies
Takes stereo redundancy into account (Mid/Side coding)
Uses the Modified Discrete Cosine Transform (MDCT)
Sophisticated bit allocation and quantization strategies that rely on
non-uniform quantization
Uses Huffman coding, which is lossless
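Mid/Side coding, listed above, can be illustrated with a short sketch. It assumes the orthonormal 1/sqrt(2) scaling; other conventions (for example dividing by 2) exist, and the function names are hypothetical.

```python
import math

SCALE = 1.0 / math.sqrt(2.0)  # assumed normalisation

def ms_encode(left, right):
    """Convert left/right channels to mid (sum) and side (difference)."""
    mid = [(l + r) * SCALE for l, r in zip(left, right)]
    side = [(l - r) * SCALE for l, r in zip(left, right)]
    return mid, side

def ms_decode(mid, side):
    """Invert ms_encode and recover the left/right channels."""
    left = [(m + s) * SCALE for m, s in zip(mid, side)]
    right = [(m - s) * SCALE for m, s in zip(mid, side)]
    return left, right

# Nearly identical channels: the side signal is small and cheap to code.
left = [0.50, 0.40, -0.30, 0.10]
right = [0.48, 0.41, -0.29, 0.10]
mid, side = ms_encode(left, right)
print([round(s, 4) for s in side])
```

When the two channels are highly correlated, most of the energy ends up in the mid signal, so fewer bits are needed for the side signal.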
WHAT IS MPEG-7 ?
"Multimedia Content Description Interface"
Providing meta-data for multimedia.
MPEG-7: makes content accessible, retrievable,
filterable, manageable (via device / computer).
Multiple degrees of interpretation of the information's
meaning
Support as broad a range of applications as possible.
A compatible (with existing tech) and extensible
standard.
MPEG-7 OBJECTIVES
Standardize content-based description for various
types of audiovisual information
Independent of the media support (encoding and
storage)
Descriptions at different levels of granularity
MPEG-7 AUDIO
LOW-LEVEL FEATURES
MPEG-7 Audio Framework:
Two low-level descriptor types (for samples and
segments):
Scalar : (e.g. power or fundamental frequency)
Vector : (e.g. spectra)
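The difference between a scalar and a vector descriptor can be shown on a single audio frame. The sketch below uses generic signal measurements (mean power and a DFT magnitude spectrum); it is not the normative MPEG-7 descriptor definition, and the function names and the test frame are assumptions.

```python
import cmath
import math

def frame_power(frame):
    """Scalar feature: mean squared amplitude of the frame."""
    return sum(x * x for x in frame) / len(frame)

def magnitude_spectrum(frame):
    """Vector feature: DFT magnitudes for bins 0 .. N/2 (direct O(N^2) DFT)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]

# Hypothetical frame: 32 samples of a sinusoid with 4 cycles per frame.
frame = [math.sin(2 * math.pi * 4 * t / 32) for t in range(32)]
spectrum = magnitude_spectrum(frame)
print(round(frame_power(frame), 3), spectrum.index(max(spectrum)))
```

One number describes the frame in the scalar case, while the vector case yields one value per frequency bin.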
LOW-LEVEL FEATURE (TYPES)
MPEG-21 (ISO/IEC 21000)
What?
Why?
Why now?
MPEG-21 OBJECTIVES
Vision
Purpose
Goals
FUNDAMENTAL CONCEPTS
Digital Item: a structured digital object with a standard
representation, identification and metadata
The fundamental unit of distribution and transaction
in the MPEG-21 framework
Digital Item = resource + metadata + structure
Resource: individual asset, e.g., MPEG-2 video
Metadata: descriptive information, e.g., MPEG-7
Structure: relationships among parts of the item
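The "resource + metadata + structure" view can be sketched as a small data structure. This is only an illustration of the concept; it does not follow the MPEG-21 declaration schema, and every class, field, and file name below is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Resource:
    """An individual asset, such as an audio or video file."""
    uri: str
    media_type: str

@dataclass
class DigitalItem:
    """Resources plus metadata plus structure (nested parts)."""
    identifier: str
    resources: List[Resource] = field(default_factory=list)
    metadata: Dict[str, str] = field(default_factory=dict)
    parts: List["DigitalItem"] = field(default_factory=list)

    def all_resources(self) -> List[Resource]:
        """Collect the resources of this item and of all nested parts."""
        found = list(self.resources)
        for part in self.parts:
            found.extend(part.all_resources())
        return found

track1 = DigitalItem("track-1", [Resource("track1.mp3", "audio/mpeg")], {"title": "Track 1"})
track2 = DigitalItem("track-2", [Resource("track2.mp3", "audio/mpeg")], {"title": "Track 2"})
album = DigitalItem("album-1", metadata={"title": "Demo album"}, parts=[track1, track2])
print(len(album.all_resources()))
```

The nesting of parts plays the role of "structure": the album carries its own metadata and refers to its tracks, which in turn carry the resources.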
DIGITAL ITEM
[Figure: Digital Item diagram with labels Resources, Metadata, Structure, New Metadata & Resource Forms, MPEG-1, MPEG-2, MPEG-4, MPEG-7, MPEG-21]
PERFORMANCE MEASURES
Two criteria:
1. Compression ratio
2. Perceived audio quality (listening)
Tool: FFmpeg software
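The first criterion is a simple ratio, sketched below. The CD-style PCM parameters (44.1 kHz, 16 bits, stereo) and the 128 kbit/s target are assumed example values, not measurements from this presentation.

```python
def pcm_bitrate_bps(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Bitrate of uncompressed PCM audio in bits per second."""
    return sample_rate_hz * bits_per_sample * channels

def compression_ratio(original: float, compressed: float) -> float:
    """Original size (or bitrate) divided by compressed size (or bitrate)."""
    if compressed <= 0:
        raise ValueError("compressed size must be positive")
    return original / compressed

# Assumed example: CD-quality PCM against a 128 kbit/s encoded stream.
pcm = pcm_bitrate_bps(44100, 16, 2)
print(pcm, round(compression_ratio(pcm, 128_000), 3))
```

The same function applies to file sizes in bytes, for example the size of a WAV file divided by the size of the encoded file.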
EVALUATION
DEMONSTRATION
CONCLUSION
THANK YOU FOR LISTENING!