© Anite 2014
WHITEPAPER
1 THEORETICAL BACKGROUND
1.1 Overview
The speech transmission path across mobile and fixed networks consists of many different elements.
Along the path there can be multiple speech codecs, analog-to-digital and digital-to-analog
conversions, echo cancellers, noise suppressors, adaptive level controllers, voice activity detectors,
comfort noise generators, signal enhancers, and so on. In modern packet-switched networks,
variable delays and packet losses introduce further problems. Moreover, especially in mobile
networks, additional quality degradation may, and usually will, occur due to bit errors on the air
interface and silent gaps caused by, for example, handovers.
Systems this complex can inflict a large variety of degradations on speech signals.
These degradations include loudness loss, talker and listener echo, temporal gaps in the speech signal,
filtering, amplitude clipping, variable delays, distortions, channel errors, and artifacts from
noise reduction algorithms and from the operation of echo cancellers.
Impairment                     Grade
Imperceptible                  5
Perceptible but not annoying   4
Slightly annoying              3
Annoying                       2
Very annoying                  1
Based on these test conditions, a population of typically 20 to 50 test subjects is presented with
an identical series of speech fragments. Every test subject is asked to score each sample on the
impairment scale. After statistical processing of the individual results, a mean opinion score (MOS)
can be calculated. With thorough setups, such test results can be reproduced quite well, even at
different locations. However, the effort required in terms of subjects and time is considerable.
Furthermore, such test methods cannot be applied in practical, day-to-day field measurements.
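The statistical processing mentioned above can be illustrated with a minimal sketch. It assumes the simplest possible treatment: the MOS is the arithmetic mean of the individual grades, and the spread is reported as an approximate 95 % confidence interval using the normal approximation. The function name and the example grades are hypothetical; real listening tests typically apply additional screening of subjects.

```python
import math
import statistics


def mean_opinion_score(scores):
    """Return (MOS, half-width of an approximate 95 % confidence interval)."""
    if len(scores) < 2:
        raise ValueError("need at least two individual scores")
    if any(not 1 <= s <= 5 for s in scores):
        raise ValueError("scores must lie on the 1..5 grade scale")
    mos = statistics.mean(scores)
    # Sample standard deviation; 1.96 is the normal-approximation factor.
    ci95 = 1.96 * statistics.stdev(scores) / math.sqrt(len(scores))
    return mos, ci95


# Hypothetical grades given by ten test subjects to one speech sample.
grades = [4, 3, 4, 5, 3, 4, 4, 3, 5, 4]
mos, ci95 = mean_opinion_score(grades)
print(f"MOS = {mos:.2f} +/- {ci95:.2f}")
```

With only a handful of subjects the interval is wide, which is one reason panels of 20 to 50 listeners are used.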
A first approach to overcoming these problems was the development of PSQM+ (which, however, is not
included in the standard). It handled larger distortions, such as those caused by burst errors,
well, but still had significant problems compensating for varying delay.
The ITU standard P.862 (PESQ) [6] resolves this problem. PESQ combines
the psychoacoustic and cognitive model of PSQM+ with a time alignment algorithm that
handles varying delays. The main drawback of PESQ is that it is not designed for
streaming applications, which is why it cannot fully replace PSQM+. With PSQM and PESQ
there are now two standards that together cover the problem of measuring speech quality. Figure 2
gives an overview of the structure of the PESQ algorithm and shows the new blocks that have
been added to the PSQM algorithm.
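The core idea behind time alignment can be sketched as a cross-correlation search for the lag at which the degraded signal best matches the reference. This is only an illustration of the principle, not the PESQ alignment procedure itself, which is considerably more elaborate and can follow delay changes within a call. The function name and the test signal are hypothetical.

```python
import random


def estimate_delay(reference, degraded, max_lag):
    """Return the lag (in samples) that maximizes the cross-correlation."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, r in enumerate(reference):
            j = i + lag
            if 0 <= j < len(degraded):
                score += r * degraded[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


rng = random.Random(0)  # fixed seed so the example is repeatable
reference = [rng.uniform(-1.0, 1.0) for _ in range(200)]
degraded = [0.0] * 7 + reference  # hypothetical 7-sample transmission delay
print(estimate_delay(reference, degraded, max_lag=20))
```

A single global lag like this is not enough for packet-switched networks, where the delay can change during the call; that is the case the time alignment block in PESQ is meant to cover.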
The ITU has standardized a universal PESQ-to-MOS mapping. It was created from a shared pool
of subjective test results covering wireless, VoIP, fixed and codec-only conditions, in languages including
Japanese, British English, American English, French, German, Italian, Swedish, Dutch and Finnish.
The mapping is continuous and takes raw PESQ scores from -0.5 to 4.5 onto MOS values from 1 to 4.55.
It takes the form of a logistic function with four parameters:
MOS-LQO = 0.999 + (4.999 - 0.999) / (1 + exp(-1.4945 * PESQ + 4.6607))
For more information on this mapping, please see ITU-T recommendation P.862.1.
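As a minimal sketch, the narrowband mapping can be written as a single function. The four constants are those of the P.862.1 logistic function; the function name is illustrative and is not part of any Nemo or Opticom API.

```python
import math


def pesq_to_mos_lqo(raw_score):
    """Map a raw P.862 score (-0.5..4.5) to MOS-LQO with the P.862.1 logistic function."""
    return 0.999 + (4.999 - 0.999) / (1.0 + math.exp(-1.4945 * raw_score + 4.6607))


for raw in (-0.5, 2.0, 3.0, 4.5):
    print(f"raw PESQ {raw:+.1f} -> MOS-LQO {pesq_to_mos_lqo(raw):.2f}")
```

The end points of the raw scale land at roughly 1.02 and 4.55, which matches the MOS range quoted for the mapping.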
For the corresponding wideband mapping, please see ITU-T recommendation P.862.2 [11].
1.2.5 ITU-T P.863 POLQA
POLQA is the next-generation voice quality testing technology for fixed, mobile and IP-based
networks. POLQA has been selected to form the new ITU-T voice quality testing standard, P.863,
and will be used with HD Voice, 3G and 4G/LTE. POLQA, which stands for Perceptual Objective
Listening Quality Analysis, will offer a new level of benchmarking capability to determine the
voice quality of mobile network services.
Degradation: Bandwidth limitations
P.863: P.863 in super-wideband mode takes bandwidth limitations into account by detecting the
absence of any speech energy above 3.8 kHz to indicate a narrowband degraded file, and the
absence of any speech energy above 7 kHz to indicate a wideband degraded file. With a
narrowband signal the maximum achievable score is 3.8. A change in bandwidth between speakers
(different gender or talker) in a single test may lead to a lower than expected score.
PESQ: PESQ applies a linear frequency equalization stage before presenting the signals to the
psycho-acoustic model. This effectively prevents frequency response influences from being
detected in the model. This is useful for small degrees of frequency shaping, but PESQ
underestimates severe linear frequency response distortions.

Degradation: Short interrupts (e.g. packet loss)
P.863: The predicted quality score tends to a MOS value of 1.0 as frame loss increases up to
30%. Even small loss rates cause a drop in measured speech quality. Results show that the
super-wideband mode is slightly more sensitive to short interrupts than the narrowband mode.
PESQ: PESQ behaves in a similar way to P.863, with scores tending to a MOS of 1.0 as the loss
rate increases to 30%.

Degradation: Long interrupts (e.g. VAD clipping, inter-system handovers)
P.863: Long interrupts describe muting of speech for 200 ms or more at the front, in the middle
or at the end of a speech sentence. Loss in the middle of speech leads to the largest drop in
quality, followed by front-end loss, with losses at the end of a sentence having the least
impact on the score.
PESQ: It has been claimed that PESQ reacts unexpectedly to lost speech. For small interrupts
P.863 and PESQ produce consistent scores, but with longer interrupts PESQ predictions are
significantly more optimistic than expected.
Table 2. Short summary of what can be expected from P.863 for certain types of
degradation compared to the PESQ processing model.
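The bandwidth detection described for P.863 can be illustrated with a rough sketch that checks how much signal energy lies above 3.8 kHz and 7 kHz. Only the two frequency limits come from the table; the relative energy threshold, the naive DFT and all names below are assumptions for illustration and do not reproduce the detector inside P.863.

```python
import cmath
import math

NARROWBAND_LIMIT_HZ = 3800.0  # limit quoted for a narrowband degraded file
WIDEBAND_LIMIT_HZ = 7000.0    # limit quoted for a wideband degraded file
ENERGY_FLOOR = 1e-4           # assumed relative threshold, not taken from P.863


def power_spectrum(samples):
    """Naive DFT power for bins 1..n/2 (adequate for short frames)."""
    n = len(samples)
    spectrum = []
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(samples))
        spectrum.append(abs(coeff) ** 2)
    return spectrum


def classify_bandwidth(samples, sample_rate):
    """Classify a frame as narrowband, wideband or super-wideband."""
    n = len(samples)
    spectrum = power_spectrum(samples)
    total = sum(spectrum)
    if total == 0.0:
        raise ValueError("signal contains no energy")

    def fraction_above(limit_hz):
        # Entry k of the spectrum (counting from 1) sits at k * sample_rate / n Hz.
        above = sum(p for k, p in enumerate(spectrum, start=1)
                    if k * sample_rate / n > limit_hz)
        return above / total

    if fraction_above(NARROWBAND_LIMIT_HZ) < ENERGY_FLOOR:
        return "narrowband"
    if fraction_above(WIDEBAND_LIMIT_HZ) < ENERGY_FLOOR:
        return "wideband"
    return "super-wideband"


def tone_mix(freqs_hz, sample_rate=48000, n=480):
    """Hypothetical test signal: a sum of sine tones, 10 ms at 48 kHz."""
    return [sum(math.sin(2 * math.pi * f * i / sample_rate) for f in freqs_hz)
            for i in range(n)]


print(classify_bandwidth(tone_mix([1000, 5000]), 48000))
```

Real speech always carries some noise above the limits, so a practical detector would need a carefully chosen threshold and longer analysis windows than this sketch uses.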
In narrowband mode with simple codecs such as G.711, P.863 returns scores very similar to PESQ
P.862.1. Tests with more sophisticated codecs and transmission techniques may yield
different scores, as P.863 addresses the objective assessment limitations of PESQ.
It is more difficult to compare P.863 super-wideband mode with PESQ P.862.2 because most
wideband experiments were performed with 16 kHz sample rate material. P.863 super-wideband mode
requires a 48 kHz sample rate reference file. P.863 results obtained with a 16 kHz reference file, or
with a 16 kHz reference file up-sampled to 48 kHz, will be wrong because such a file contains no
speech energy above 8 kHz.
Table 3. Test sample average scores with standard deviations as measured in near static
conditions. Note that calculated offsets are only informative and are not statistically
strong enough for score calibration in all networks.
Table 4 presents the maximum PESQ and POLQA scores for the AMR 12.2 codec, along with the sampling
rate, length, SNR and speech activity of each sample.
Sample filename | Sampling rate | Length | SNR | Speech activity | PESQ-NB | POLQA-NB | POLQA-SWB
2.1 Overview
Nemo Voice Quality (VQ) is an option for several Nemo products, namely Nemo Outdoor, Nemo
Invex, Nemo Handy, Nemo Walker Air, Nemo Autonomous, and Nemo Server. Nemo VQ uses PESQ
and POLQA OEM libraries licensed from Opticom GmbH. The PESQ/POLQA scores are
calculated by comparing the original and the degraded sample files; this is the so-called intrusive
testing method.
In mobile-to-mobile testing, each score combines the uplink quality of the sending terminal with the
downlink quality of the receiving terminal. In contrast, when testing against a fixed end, such as
Nemo Server with PSTN or ISDN, the server-side results reflect purely the mobile uplink quality and
the mobile-side results reflect the mobile downlink quality. Because PSTN and ISDN can be considered
static, quality can be isolated by direction only in mobile-to-fixed testing.
With Nemo Media Router (NMR), PESQ/POLQA voice quality measurements can be done with Android-based
smartphones without any additional hardware, such as sound cards or Nemo Invex handset isolation
modules. This is highly beneficial because most new smartphones introduce interfering extra signals
into the audio path when the phone is, for example, charged or powered via a USB data cable. With
NMR, interfering noise issues can be completely avoided in voice quality measurements.
The NMR interface was initially designed for voice quality measurements, but the same interface can
later be used for other purposes as well, such as data testing.
In voice quality measurements, the smartphone records the received audio sample files and transfers
them via the NMR interface back to the host computer in real time. The host computer then
calculates the PESQ/POLQA results, as with the sound card interface, and the values are written to
Nemo Outdoor log files. This requires no changes in PESQ and POLQA licensing, unlike the sound
card based solution.
NMR supports multiple phones connected simultaneously to one host. The target is to support six or
more phones per host.
The main advantages of NMR are:
- eliminates extra sound card hardware and audio cabling
- decreases the complexity of the measurement system
- reduces weight and power consumption
- improves user experience and audio quality by eliminating phone-generated noise
In mobile-to-mobile testing, the test terminals used in voice quality measurements can be connected
to separate laptops or attached to Nemo Multi or Nemo Invex, or they can be Nemo Handy-A units
measuring independently. One or both ends can also be Nemo Server with PSTN or ISDN test lines.
All combinations work together.
The audio can be transferred via a sound card as pictured above, via handset isolation modules, or
via Nemo Media Router installed on the connected smartphone.
Test Procedure
1. Both measurement ends are configured to use the same reference sample.
2. The reference sample is first resampled and rescaled to the nearest format supported by
the audio interface in use.
3. Nemo Outdoor/Invex/Handy instructs the test mobile to make a test call to the other mobile.
4. After a voice connection is established, the mobile initiating the call (MO from now on)
starts sending the synchronization signal.
5. The call receiver (MT from now on) has already accepted the incoming call and starts receiving
audio as soon as voice is connected.
6. MO starts sending the reference sample after the synchronization signal has been sent
(the total synchronization length is 480 ms).
7. MT detects the synchronization signal and adjusts its timing so that it can receive the test
sample from the correct point in time.
8. MT receives the first test sample and calculates the PESQ/POLQA score.
9. MT starts sending the synchronization signal to MO.
10. MO receives the synchronization signal, adjusts its timing and level, and then receives its
first test sample.
11. After that, the test cycle repeats from the beginning, alternating sending and
receiving until the call is finished.
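The alternating cycle in the steps above can be sketched as a simple timeline. This is an illustrative model only: it assumes each direction consists of the 480 ms synchronization signal followed immediately by the reference sample, with no guard time or scoring delay in between. The function name and the 8000 ms sample length are hypothetical and are not taken from the Nemo implementation.

```python
SYNC_MS = 480  # total synchronization length given in the test procedure


def build_schedule(sample_ms, directions):
    """Return (start_ms, end_ms, sender, receiver, phase) tuples for the test cycle."""
    schedule = []
    now = 0
    for n in range(directions):
        # MO sends first; the roles swap after every transmitted sample.
        sender, receiver = ("MO", "MT") if n % 2 == 0 else ("MT", "MO")
        schedule.append((now, now + SYNC_MS, sender, receiver, "sync"))
        now += SYNC_MS
        schedule.append((now, now + sample_ms, sender, receiver, "sample"))
        now += sample_ms
    return schedule


# Hypothetical 8-second reference sample, one transmission in each direction.
for entry in build_schedule(sample_ms=8000, directions=2):
    print(entry)
```

In this model the receiving end scores each sample as soon as it has been received, so one score per direction is produced in every full cycle.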
3 SUMMARY
In short, Nemo Voice Quality provides certified measurement results and a consistent experience for
a wide variety of test devices, within and between all Nemo product platforms. Additionally, with
the advances in intrusive speech quality measurement brought by the POLQA algorithm, a
system- and device-neutral assessment of a wide variety of voice call applications is now possible,
covering not only cellular technologies but also data- and stream-based technologies. Finally,
all of this, combined with NMR technology that eliminates the analogue path altogether,
makes Nemo Voice Quality the industry-leading speech quality measurement system.
4 REFERENCES
[2] ITU-T Recommendation P.800, Methods for subjective determination of transmission quality,
1996
[3] ITU-T Recommendation P.810, Modulated Noise Reference Unit (MNRU), 1996
[5] ITU-T Recommendation P.861, Objective quality measurement of telephone-band (300-3400
Hz) speech codecs, 1996
[6] ITU-T Recommendation P.862, PESQ: an objective method for end-to-end speech quality
assessment of narrowband telephone networks and speech codecs, February 2001
[7] KARJALAINEN M., A New Auditory Model for the Evaluation of Sound Quality of Audio Systems,
Proc. of the ICASSP 1985, pp. 608-611
[8] Revised draft of Application Guide for P.863, ITU-T SG12, TD 851rev1 (GEN/12), June 2012
[9] ITU-T Recommendation P.862.1, Mapping function for transforming P.862 raw result scores to
MOS-LQO
[10] PESQ limitations for EVRC-based speech codec, Qualcomm, August 2007