Mean Opinion Score

Mean opinion score (MOS) is a measure used in the domain of Quality of Experience and telecommunications engineering,
representing overall quality of a stimulus or system. It is the arithmetic mean over all individual “values on a predefined scale that a
subject assigns to his opinion of the performance of a system quality”.[1] Such ratings are usually gathered in a subjective quality
evaluation test, but they can also be algorithmically estimated.
MOS is a commonly used measure for video, audio, and audiovisual quality evaluation, but not restricted to those modalities. ITU-T
has defined several ways of referring to a MOS in Recommendation P.800.1, depending on whether the score was obtained from
audiovisual, conversational, listening, talking, or video quality tests.
Contents
Rating scales and mathematical definition
Properties of the MOS
MOS for speech and audio quality estimation
MOS estimation using quality models
See also
References
Rating scales and mathematical definition

The MOS is expressed as a single rational number, typically in the range 1–5, where 1 is the lowest perceived quality and 5 is the highest
perceived quality. Other MOS ranges are also possible, depending on the rating scale that has been used in the underlying test. The
Absolute Category Rating scale is very commonly used; it maps ratings between Bad and Excellent to numbers between 1 and 5,
as shown in the table below.
Rating  Label
5       Excellent
4       Good
3       Fair
2       Poor
1       Bad
Other standardized quality rating scales exist in ITU-T recommendations (such as P.800 or P.910). For example, one could use a
continuous scale ranging from 1 to 100. Which scale is used depends on the purpose of the test. In certain contexts there are no
statistically significant differences between ratings for the same stimuli when they are obtained using different scales.[2]
The MOS is calculated as the arithmetic mean over single ratings performed by human subjects for a given stimulus in a subjective
quality evaluation test. Thus:
$$\mathrm{MOS} = \frac{\sum_{n=1}^{N} R_n}{N}$$
where $R_n$ are the individual ratings for a given stimulus by $N$ subjects.
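As a minimal sketch of this calculation, the following Python snippet maps ACR labels to their numeric scores and averages them. The ratings are invented for illustration, and the function names are hypothetical. The 95% confidence interval uses a normal approximation, which is an assumption of this sketch and not something prescribed by the text above.

```python
from math import sqrt
from statistics import mean, stdev

# ACR labels mapped to numeric scores, as in the rating table.
ACR_SCALE = {"Bad": 1, "Poor": 2, "Fair": 3, "Good": 4, "Excellent": 5}


def mean_opinion_score(ratings):
    """Arithmetic mean over the individual ratings of one stimulus."""
    if not ratings:
        raise ValueError("at least one rating is required")
    return mean(ratings)


def ci95_half_width(ratings):
    """Half-width of a 95% confidence interval.

    Assumption: normal approximation (z = 1.96) with the sample
    standard deviation; small panels may call for a t-distribution.
    """
    if len(ratings) < 2:
        raise ValueError("at least two ratings are required")
    return 1.96 * stdev(ratings) / sqrt(len(ratings))


# Invented example: eight subjects rate the same stimulus.
labels = ["Good", "Fair", "Excellent", "Good", "Poor", "Good", "Fair", "Good"]
ratings = [ACR_SCALE[label] for label in labels]

mos = mean_opinion_score(ratings)
half_width = ci95_half_width(ratings)
print(f"MOS = {mos:.3f} +/- {half_width:.3f} (N = {len(ratings)})")
```

Reporting the number of subjects and an interval alongside the mean makes it easier to judge how much weight a single MOS value can bear.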
Properties of the MOS

The MOS is subject to certain mathematical properties and biases. In general, there is an ongoing debate on the usefulness of the
MOS to quantify Quality of Experience in a single scalar value.[3]
When the MOS is acquired using a categorical rating scale, it is based on an ordinal scale, similar to Likert scales. In this case,
the ranking of the scale items is known, but the interval between them is not. Therefore, it is mathematically incorrect to calculate a mean over
individual ratings in order to obtain the central tendency; the median should be used instead.[4] However, in practice and in the
definition of MOS, it is considered acceptable to calculate the arithmetic mean.
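The difference between the two measures of central tendency can be illustrated with a short Python sketch. The ratings below are invented: a panel that is split between very low and very high scores yields a mean that no subject actually chose, while the median stays on a scale item.

```python
from collections import Counter
from statistics import mean, median

# Invented, polarized ratings on a 1-5 categorical scale.
ratings = [1, 1, 4, 5, 5, 5, 5]

mean_rating = mean(ratings)      # treats the scale as interval data
median_rating = median(ratings)  # uses only the ordering of the items
distribution = Counter(ratings)  # how often each scale item was chosen

print(f"mean   = {mean_rating:.2f}")
print(f"median = {median_rating}")
print(f"counts = {dict(sorted(distribution.items()))}")
```

Here the mean falls between Fair and Good even though most subjects chose Excellent, which is one reason the full rating distribution is worth inspecting alongside any single summary value.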
It has been shown that for categorical rating scales (such as ACR), the individual items are not perceived as equidistant by subjects. For
example, there may be a larger “gap” between Good and Fair than there is between Good and Excellent. The perceived distance may
also depend on the language into which the scale is translated.[5] However, there exist studies that could not prove a significant
impact of scale translation on the obtained results.[6]
Several other biases are present in the way MOS ratings are typically acquired.[7] In addition to the above-mentioned issues with
scales that are perceived non-linearly, there is a so-called “range-equalization bias”: subjects, over the course of a subjective
experiment, tend to give scores that span the entire rating scale. This makes it impossible to compare two different subjective tests if
the range of presented quality differs. In other words, the MOS is never an absolute measure of quality, but only relative to the test in
which it has been acquired.
For the above reasons, and due to several other contextual factors influencing the perceived quality in a subjective test, a MOS
value should only be reported if the context in which the values have been collected is known and reported as well. MOS values
gathered from different contexts and test designs therefore should not be directly compared. ITU-T Recommendation P.800.2
prescribes how MOS values should be reported. Specifically, P.800.2 says:
it is not meaningful to directly compare MOS values produced from separate experiments, unless those experiments
were explicitly designed to be compared, and even then the data should be statistically analysed to ensure that such a
comparison is valid.
MOS for speech and audio quality estimation

MOS historically originates from subjective measurements in which listeners would sit in a "quiet room" and score the quality of a telephone
call as they perceived it. This kind of test methodology had been in use in the telephony industry for decades and was
standardized in ITU-T Recommendation P.800. It specifies that “the talker should be seated in a quiet room with volume between 30
and 120 m³ and a reverberation time less than 500 ms (preferably in the range 200–300 ms). The room noise level must be below 30
dBA with no dominant peaks in the spectrum.” Requirements for other modalities were similarly specified in ITU recommendations
later.
MOS estimation using quality models

Obtaining MOS ratings may be time-consuming and expensive, as it requires the recruitment of human assessors. For various use
cases, such as codec development or service quality monitoring, where quality should be estimated repeatedly and
automatically, MOS values can also be predicted by objective quality models, which typically have been developed and trained
using human MOS ratings.
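The idea of training a model on human ratings can be sketched with a toy example in Python. Everything here is an assumption made for illustration: the single input feature (a packet loss percentage), the training values, and the linear form of the model. Standardized objective models are considerably more elaborate.

```python
def fit_linear(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_mos(feature, slope, intercept):
    """Predict a MOS and clip it to the 1-5 range of the ACR scale."""
    return min(5.0, max(1.0, slope * feature + intercept))


# Invented training data: a hypothetical objective feature and the
# MOS that human subjects assigned to the same stimuli.
packet_loss_percent = [0.0, 1.0, 2.0, 5.0, 10.0]
subjective_mos = [4.4, 4.1, 3.8, 3.0, 1.9]

slope, intercept = fit_linear(packet_loss_percent, subjective_mos)

# Estimate quality for conditions that were not rated by subjects.
for loss in (0.0, 3.0, 50.0):
    print(f"loss = {loss:4.1f}%  predicted MOS = {predict_mos(loss, slope, intercept):.2f}")
```

Clipping keeps predictions on the scale the training ratings came from. As with subjective MOS values, such predictions remain tied to the test conditions under which the training ratings were collected.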
See also
Absolute Category Rating
Likert scale
MUSHRA (ITU-R Recommendation BS.1534)
Objective video quality
Subjective video quality
References
1. ITU-T Rec. P.10 (2006) Vocabulary for performance and quality of service.
2. Huynh-Thu, Q.; Garcia, M. N.; Speranza, F.; Corriveau, P.; Raake, A. (2011-03-01). "Study of Rating Scales for
Subjective Quality Assessment of High-Definition Video" (http://ieeexplore.ieee.org/document/5635365/). IEEE
Transactions on Broadcasting. 57 (1): 1–14. doi:10.1109/TBC.2010.2086750 (https://doi.org/10.1109%2FTBC.2010.2086750).
ISSN 0018-9316 (https://www.worldcat.org/issn/0018-9316).
3. Hoßfeld, Tobias; Heegaard, Poul E.; Varela, Martín; Möller, Sebastian (2016-12-01). "QoE beyond the MOS: an
in-depth look at QoE via better metrics and their relation to MOS" (https://link.springer.com/article/10.1007/s41233-016-0002-1).
Quality and User Experience. 1 (1): 2. doi:10.1007/s41233-016-0002-1 (https://doi.org/10.1007%2Fs41233-016-0002-1).
ISSN 2366-0139 (https://www.worldcat.org/issn/2366-0139).
4. Jamieson, Susan. "Likert scales: how to (ab)use them." Medical Education 38.12 (2004): 1217-1218.
5. Streijl, Robert C., Stefan Winkler, and David S. Hands. "Mean opinion score (MOS) revisited: methods and
applications, limitations and alternatives." Multimedia Systems 22.2 (2016): 213-227.
6. Pinson, M. H.; Janowski, L.; Pepion, R.; Huynh-Thu, Q.; Schmidmer, C.; Corriveau, P.; Younkin, A.; Callet, P. Le;
Barkowsky, M. (October 2012). "The Influence of Subjects and Environment on Audiovisual Subjective Tests: An
International Study" (http://ieeexplore.ieee.org/document/6286980/). IEEE Journal of Selected Topics in Signal
Processing. 6 (6): 640–651. doi:10.1109/jstsp.2012.2215306 (https://doi.org/10.1109%2Fjstsp.2012.2215306).
ISSN 1932-4553 (https://www.worldcat.org/issn/1932-4553).
7. Zielinski, Slawomir, Francis Rumsey, and Søren Bech. "On some biases encountered in modern audio quality
listening tests-a review." Journal of the Audio Engineering Society 56.6 (2008): 427-451.
Retrieved from "https://en.wikipedia.org/w/index.php?title=Mean_opinion_score&oldid=806684395"
This page was last edited on 23 October 2017, at 14:59.
Text is available under the Creative Commons Attribution-ShareAlike License; additional terms may apply. By using this
site, you agree to the Terms of Use and Privacy Policy. Wikipedia® is a registered trademark of the Wikimedia
Foundation, Inc., a non-profit organization.
