
Mel-frequency cepstrum

From Wikipedia, the free encyclopedia

In sound processing, the mel-frequency cepstrum (MFC) is a representation of the short-term power spectrum
of a sound, based on a linear cosine transform of a log power spectrum on a nonlinear mel scale of frequency.
Mel-frequency cepstral coefficients (MFCCs) are coefficients that collectively make up an MFC. They are
derived from a type of cepstral representation of the audio clip (a nonlinear "spectrum-of-a-spectrum"). The
difference between the cepstrum and the mel-frequency cepstrum is that in the MFC, the frequency bands are
equally spaced on the mel scale, which approximates the human auditory system's response more closely than the
linearly spaced frequency bands used in the normal cepstrum. This frequency warping can allow for better
representation of sound, for example, in audio compression.
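The mel scale referred to above is obtained by warping frequency nonlinearly. One widely used conversion from a frequency f in hertz to a mel value m is the following (the constants 2595 and 700 are a popular convention rather than part of the definition above):

m = 2595 \log_{10}\left(1 + \frac{f}{700}\right)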
MFCCs are commonly derived as follows:[1][2]
1. Take the Fourier transform of (a windowed excerpt of) a signal.
2. Map the powers of the spectrum obtained above onto the mel scale, using triangular overlapping windows.
3. Take the logs of the powers at each of the mel frequencies.
4. Take the discrete cosine transform of the list of mel log powers, as if it were a signal.
5. The MFCCs are the amplitudes of the resulting spectrum.
There can be variations on this process, for example: differences in the shape or spacing of the windows used to
map the scale,[3] or addition of dynamics features such as "delta" and "delta-delta" (first- and second-order frame-to-frame difference) coefficients.[4]
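To make the steps above concrete, the following Python/NumPy sketch computes MFCCs for a single frame. It is illustrative rather than authoritative: the Hamming window, the 26 triangular filters, the 2595/700 mel-scale constants, the 13 output coefficients and the helper names (hz_to_mel, mel_filterbank, mfcc) are common but arbitrary choices made for this example, and practical front-ends add details (pre-emphasis, liftering, frame overlap, energy terms) that are omitted here.

import numpy as np

def hz_to_mel(f):
    # One widely used hertz-to-mel convention; the constants are a popular choice.
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    # Step 2 helper: triangular, overlapping filters equally spaced on the mel scale.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bin_points = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, centre, right = bin_points[i - 1], bin_points[i], bin_points[i + 1]
        for k in range(left, centre):
            fbank[i - 1, k] = (k - left) / max(centre - left, 1)
        for k in range(centre, right):
            fbank[i - 1, k] = (right - k) / max(right - centre, 1)
    return fbank

def mfcc(frame, sample_rate, n_filters=26, n_coeffs=13):
    n_fft = len(frame)
    # Step 1: Fourier transform of a windowed excerpt of the signal.
    windowed = frame * np.hamming(n_fft)
    power_spectrum = np.abs(np.fft.rfft(windowed)) ** 2
    # Step 2: map the spectral powers onto the mel scale with the triangular filters.
    mel_energies = mel_filterbank(n_filters, n_fft, sample_rate) @ power_spectrum
    # Step 3: take the log of the power in each mel band (small offset avoids log(0)).
    log_mel = np.log(mel_energies + 1e-10)
    # Step 4: discrete cosine transform (DCT-II) of the mel log powers.
    k = np.arange(n_coeffs)[:, None]
    n = np.arange(n_filters)[None, :]
    dct_basis = np.cos(np.pi / n_filters * (n + 0.5) * k)
    # Step 5: the MFCCs are the amplitudes of the resulting spectrum.
    return dct_basis @ log_mel

# Hypothetical usage: one 25 ms frame (400 samples) of 16 kHz audio.
rng = np.random.default_rng(0)
frame = rng.standard_normal(400)   # stand-in for real speech samples
print(mfcc(frame, sample_rate=16000))

In a full system this computation is applied to every frame of a sliding window over the signal, and the delta and delta-delta coefficients mentioned above can be appended to each frame's coefficients.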
The European Telecommunications Standards Institute in the early 2000s defined a standardised MFCC algorithm
to be used in mobile phones.[5]

Contents
1 Applications
2 Noise sensitivity
3 History
4 See also
5 References
6 External links

Applications

MFCCs are commonly used as features in speech recognition systems, such as the systems which can automatically
recognize numbers spoken into a telephone. They are also common in speaker recognition, which is the task of
recognizing people from their voices.[6]
MFCCs are also increasingly finding uses in music information retrieval applications such as genre classification,
audio similarity measures, etc.[7]

Noise sensitivity
MFCC values are not very robust in the presence of additive noise, and so it is common to normalise their values in
speech recognition systems to lessen the influence of noise. Some researchers propose modifications to the basic
MFCC algorithm to improve robustness, such as by raising the log-mel-amplitudes to a suitable power (around 2
or 3) before taking the DCT, which reduces the influence of low-energy components.[8]
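As a sketch of that kind of modification (reusing the mel_filterbank helper from the example above, and not claiming to reproduce the cited paper's exact recipe), the change amounts to one extra step between the log and the DCT:

import numpy as np

def robust_mfcc(frame, sample_rate, n_filters=26, n_coeffs=13, exponent=2):
    # Same pipeline as the plain MFCC sketch, with one extra step before the DCT.
    windowed = frame * np.hamming(len(frame))
    power_spectrum = np.abs(np.fft.rfft(windowed)) ** 2
    log_mel = np.log(mel_filterbank(n_filters, len(frame), sample_rate) @ power_spectrum + 1e-10)
    # The modification: raise the log-mel amplitudes to a power (around 2 or 3),
    # which reduces the relative influence of low-energy components.
    boosted = log_mel ** exponent
    k = np.arange(n_coeffs)[:, None]
    n = np.arange(n_filters)[None, :]
    dct_basis = np.cos(np.pi / n_filters * (n + 0.5) * k)
    return dct_basis @ boosted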

History
Paul Mermelstein[9][10] is typically credited with the development of the MFC. Mermelstein credits Bridle and
Brown[11] for the idea:
Bridle and Brown used a set of 19 weighted spectrum-shape coefficients given by the cosine
transform of the outputs of a set of nonuniformly spaced bandpass filters. The filter spacing is chosen
to be logarithmic above 1 kHz and the filter bandwidths are increased there as well. We will,
therefore, call these the mel-based cepstral parameters.[9]
Sometimes both early originators are cited.[12]
Many authors, including Davis and Mermelstein,[10] have commented that the spectral basis functions of the cosine
transform in the MFC are very similar to the principal components of the log spectra, which were applied to speech
representation and recognition much earlier by Pols and his colleagues.[13][14]

See also
Gammatone filter
Psychoacoustics

References
1. ^ Min Xu et al. (2004). "HMM-based audio keyword generation" (http://cemnet.ntu.edu.sg/home/asltchia/publication/AudioAnalysisUnderstanding/Conference/HMMBased%20Audio%20Keyword%20Generation.pdf). In Kiyoharu Aizawa, Yuichi Nakamura, Shin'ichi Satoh. Advances in Multimedia Information Processing – PCM 2004: 5th Pacific Rim Conference on Multimedia. Springer. ISBN 3-540-23985-5.
2. ^ Sahidullah, Md.; Saha, Goutam (May 2012). "Design, analysis and experimental evaluation of block based transformation in MFCC computation for speaker recognition" (http://www.sciencedirect.com/science/article/pii/S0167639311001622). Speech Communication 54 (4): 543–565. doi:10.1016/j.specom.2011.11.004 (http://dx.doi.org/10.1016%2Fj.specom.2011.11.004).
3. ^ Fang Zheng, Guoliang Zhang and Zhanjiang Song (2001), "Comparison of Different Implementations of MFCC (http://link.springer.com/article/10.1007%2FBF02943243?LI=true#page-1)," J. Computer Science & Technology, 16(6): 582–589.
4. ^ S. Furui (1986), "Speaker-independent isolated word recognition based on emphasized spectral dynamics"
5. ^ European Telecommunications Standards Institute (2003), Speech Processing, Transmission and Quality
Aspects (STQ); Distributed speech recognition; Front-end feature extraction algorithm; Compression algorithms
(http://webapp.etsi.org/workprogram/Report_WorkItem.asp?wki_id=18820). Technical standard ES 201 108,
v1.1.3.
6. ^ T. Ganchev, N. Fakotakis, and G. Kokkinakis (2005), "Comparative evaluation of various MFCC implementations on the speaker verification task (http://www.wcl.ece.upatras.gr/ganchev/Papers/ganchev17.pdf)," in 10th International Conference on Speech and Computer (SPECOM 2005), Vol. 1, pp. 191–194.
7. ^ Meinard Müller (2007). Information Retrieval for Music and Motion (http://books.google.com/books?id=kSzeZWR2yDsC&pg=PA65&dq=mfcc+music+applications#PPA65,M1). Springer. p. 65. ISBN 978-3-540-74047-6.
8. ^ V. Tyagi and C. Wellekens (2005), "On desensitizing the Mel-Cepstrum to spurious spectral components for Robust Speech Recognition (http://dx.doi.org/10.1109/ICASSP.2005.1415167)," in Acoustics, Speech, and Signal Processing, 2005. Proceedings. (ICASSP '05). IEEE International Conference on, vol. 1, pp. 529–532.
9. ^ a b P. Mermelstein (1976), "Distance measures for speech recognition, psychological and instrumental," in Pattern Recognition and Artificial Intelligence (http://books.google.com/books?id=wW9QAAAAMAAJ&q=%22Distance+measures+for+speech+recognition,+psychological+and+instrumental%22&dq=%22Distance+measures+for+speech+recognition,+psychological+and+instrumental%22&lr=&as_brr=0&as_pt=ALLTYPES&ei=zdRmSZjKLoH4lQTfqaXhBg&pgis=1), C. H. Chen, Ed., pp. 374–388. Academic, New York.
10. ^ a b S.B. Davis, and P. Mermelstein (1980), "Comparison of Parametric Representations for Monosyllabic Word Recognition in Continuously Spoken Sentences (http://books.google.com/books?id=yjzCra5eW3AC&pg=PA65&dq=cosine+mel+pols&lr=&as_brr=3&ei=ytJmSZGLNI6ukAThwuGxCA#PPA65,M1)," in IEEE Transactions on Acoustics, Speech, and Signal Processing, 28(4), pp. 357–366.
11. ^ J. S. Bridle and M. D. Brown (1974), "An Experimental Automatic Word-Recognition System", JSRU Report
No. 1003, Joint Speech Research Unit, Ruislip, England.
12. ^ Nelson Morgan, Hervé Bourlard, and Hynek Hermansky (2004). "Automatic Speech Recognition: An Auditory Perspective" (http://books.google.com/books?id=xWU2o08AxwwC&pg=PA315&dq=melfrequency+Mermelstein+Bridle). In Steven Greenberg and William A. Ainsworth. Speech Processing in the Auditory System. Springer. p. 315. ISBN 978-0-387-00590-4.
13. ^ L. C. W. Pols (1966), "Spectral Analysis and Identification of Dutch Vowels in Monosyllabic Words," Doctoral dissertation, Free University, Amsterdam, The Netherlands.
14. ^ R. Plomp, L. C. W. Pols, and J. P. van de Geer (1967). "Dimensional analysis of vowel spectra (http://dare.uva.nl/document/36194)." J. Acoustical Society of America, 41(3): 707–712.

External links
A tutorial on MFCCs for Automatic Speech Recognition (http://www.practicalcryptography.com/miscellaneous/machine-learning/guide-mel-frequency-cepstral-coefficients-mfccs/)
