Julián Urbano, Dmitry Bogdanov, Perfecto Herrera, Emilia Gómez and Xavier Serra
Music Technology Group, Universitat Pompeu Fabra, Barcelona, Spain
{julian.urbano,dmitry.bogdanov,perfecto.herrera,emilia.gomez,xavier.serra}@upf.edu
[Figure: distributions of the robustness indicator (y-axis 0.00–0.30) across encodings; x-axis labels CBR.64, CBR.96, CBR.128, CBR.160, CBR.192, CBR.256, CBR.320, VBR.6, VBR.4, VBR.2, VBR.0, repeated for two panels]
…though that this residual does not account for any random…
                              22050 Hz                                     44100 Hz
                      δ       ε       r       ρ       θ        δ       ε       r       ρ       θ
Lib1
 σ̂²FSize×(BRate:Codec)  …
 σ̂²Genre             0.99    4.53    3.92    0.08    3.80     1.12    0.52    0.90    0.32    0.89
 σ̂²Track            19.76    5.84    6.46   11.59    5.73    10.12    3.91    2.65    5.23    2.59
 σ̂²residual         42.05   32.75   53.92   78.72   54.03    34.19   35.26   55.87   59.52   56.92
 Grand mean         0.0591  1.6958  0.9999  0.9977  0.9999   0.0682  1.8820  0.9998  0.9939  0.9998
 Total variance     0.0032  3.4641  1.8e-7  3.2e-5  1.5e-7   0.0081  11.44   1.6e-6  0.0005  1.4e-6
 Standard deviation 0.0567  1.8612  0.0004  0.0056  0.0004   0.0897  3.3835  0.0013  0.0214  0.0012
Lib2
 σ̂²FSize             1.17    0.32    0.16    0.24    0.18     0.25    0       0       0       0
 σ̂²Codec             0       0       0       0       0        0       0       0       0       0
 σ̂²BRate:Codec       4.91    6.01    2.32    0.74    3.14    23.46   24.23   14.27   13.31   15.02
 σ̂²FSize×Codec       0       0       0       0       0        0       0       0       0       0
 σ̂²FSize×(BRate:Codec) 0.96  0.43    0.03    0.04    0.09     7.17    8.09   10.35    6.34   10.86
 σ̂²Genre             4.21   14.68    2.84    0.61    4.41     0.37    5.37    0.50    0       0.48
 σ̂²Track            52.34   61.05   32.07   66.10   41.26    27.33   14.10    6.55   13.32    5.53
 σ̂²residual         36.41   17.51   62.57   32.27   50.92    41.42   48.21   68.32   67.03   68.11
 Grand mean         0.0622  0.0278  0.9999  0.9955  0.9999   0.0656  0.0342  0.9998  0.9947  0.9999
 Total variance     0.0040  0.0015  8.9e-8  0.0002  3.5e-8   0.0055  0.0034  6.4e-7  0.0002  4.8e-7
 Standard deviation 0.0631  0.0391  0.0003  0.0131  0.0002   0.0740  0.0587  0.0008  0.0150  0.0007
Table 1. Variance components in the distributions of robustness of MFCCs for Lib1 (top) and Lib2 (bottom). Each component represents the percentage of total variance due to each effect (e.g., σ̂²FSize = 3.03 indicates that 3.03% of the variability in the robustness indicator is due to differences across frame sizes; σ̂²x = 0 when the effect is so extremely small that the estimate is slightly below zero). All interactions with the Genre and Track main effects are confounded with the residual effect. The last rows show the grand mean, total variance and standard deviation of the distributions.
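The zero-clipping mentioned in the caption (a component estimate slightly below zero is reported as 0) is a standard feature of method-of-moments variance-component estimation. The paper's design crosses many effects (FSize, Codec, BRate, Genre, Track), but the mechanics can be sketched with the simplest one-way random-effects case; `variance_components_oneway` is a name chosen here for illustration, not code from the study:

```python
def variance_components_oneway(groups):
    """Method-of-moments variance components for a balanced one-way
    random-effects model. `groups` is a list of equal-size lists of
    observations, one list per level (e.g. one per genre). Returns the
    percentage of total variance due to the grouping effect and to the
    residual, clipping a slightly-negative estimate to 0 as in Table 1."""
    k = len(groups)           # number of levels
    n = len(groups[0])        # observations per level (balanced design)
    grand = sum(sum(g) for g in groups) / (k * n)
    # Mean squares between and within groups
    ms_between = n * sum((sum(g) / n - grand) ** 2 for g in groups) / (k - 1)
    ms_within = sum((x - sum(g) / n) ** 2 for g in groups for x in g) / (k * (n - 1))
    # Component estimates; the between-group estimate can come out negative
    var_between = max(0.0, (ms_between - ms_within) / n)
    total = var_between + ms_within
    return 100 * var_between / total, 100 * ms_within / total

# Example: two "genres" with three tracks each, most variance between groups
pct_group, pct_residual = variance_components_oneway([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
# ≈ 80.6% of the variance is attributed to the grouping effect
```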
ity remains virtually the same beyond 96 Kbps. These plots confirm that the MFCC implementation in Lib1 is nearly twice as robust and stable when the encoding is homogeneous in the corpus, while the implementation in Lib2 is less robust but more stable with heterogeneous encodings. The FSize effect is negligible, indicating that the choice of frame size does not affect the robustness of MFCCs in general. However, in several cases we can observe large σ̂²FSize×(BRate:Codec) scores, meaning that for some codec-bitrate combinations it does matter. An in-depth analysis shows that these differences only occur at 64 Kbps, though (small frame sizes are more robust); differences are very small otherwise. Finally, the small σ̂²Genre scores indicate that robustness is similar across music genres.

A similar analysis was conducted to assess the robustness and stability of chroma features. Even though the correlation indicators are generally high as well, Table 2 shows that chroma vectors do not preserve the shape as well as MFCCs do. When looking at individual coefficients, the relative errors are similarly δ ≈ 6% in Lib1, but they are greatly reduced in Lib2, especially at 44100 Hz. In fact, the chroma implementation in Lib2 is more robust and stable according to all indicators.⁴ For Lib1, virtually all the variability in the distributions is due to the Track and residual effects, meaning that chroma is similarly robust across encodings, analysis parameters and genre. For Lib2, we can similarly observe that errors in the correlation indicators depend almost entirely on the Track effect, but δ and ε depend mostly on the codec-bitrate combination. This indicates that, although chroma vectors preserve their shape, the individual components vary significantly across encodings; we observed that increasing the bitrate leads to larger coefficients overall. This suggests that normalizing the chroma coefficients could dramatically improve the distributions of δ and ε. We tried the parameter normalization=2 to have Lib2 normalize chroma vectors to unit maximum. As expected, the effects of codec and bitrate are removed after normalization, and most of the variability is due to the Track effect. The correlation indicators are practically unaltered after normalization.

⁴ Even though these distributions include all frame sizes in Lib1 but only 16384 in Lib2, the FSize effect is negligible in Lib1, meaning that these indicators are still comparable across implementations.

5. ROBUSTNESS IN GENRE CLASSIFICATION

The previous section provided indicators of robustness that can be easily understood. However, they can be hard to interpret because in the end we are interested in the robustness of the various algorithms that make use of these features; whether δ = 5% is large or not depends on how MFCCs and chroma are used in practice. To investigate this question we consider a music genre classification task. For each sampling rate, codec, bitrate and tool we trained one SVM model with radial basis kernel using MFCCs and another using chroma. For MFCCs we used a standard frame size of 2048, and for chroma we set 4096 in Lib1 and the fixed 16384 in Lib2. We did random sub-sampling validation with 100 random trials for each model, using 320 tracks for training and the remaining 80 for testing.

We first investigate whether a particular choice of encoding is likely to classify better when fixed across training and test sets. Table 3 shows the results for a selection of encodings at 44100 Hz. Within the same tool and descriptor, differences across encodings are quite small, approximately 0.02. In particular, for MFCCs and Lib1 an ANOVA analysis suggests that differences are signifi-
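The validation protocol described above (100 random trials, 320 tracks for training and 80 for testing per trial) can be sketched as follows. A nearest-centroid classifier stands in for the paper's RBF-kernel SVM so the example stays self-contained; `subsampling_accuracy` and the synthetic features are illustrative, not from the study:

```python
import random

def subsampling_accuracy(features, labels, n_trials=100, n_train=320, seed=0):
    """Random sub-sampling validation: repeatedly draw a random
    train/test split and average the test accuracy over all trials.
    The classifier is a nearest-centroid stand-in; the paper trains an
    SVM with a radial basis kernel, but the validation loop is the point."""
    rng = random.Random(seed)
    n = len(features)
    accuracies = []
    for _ in range(n_trials):
        idx = list(range(n))
        rng.shuffle(idx)
        train, test = idx[:n_train], idx[n_train:]
        # One centroid per class, computed on the training split only
        by_class = {}
        for i in train:
            by_class.setdefault(labels[i], []).append(features[i])
        centroids = {c: [sum(col) / len(v) for col in zip(*v)]
                     for c, v in by_class.items()}
        correct = 0
        for i in test:
            pred = min(centroids, key=lambda c: sum(
                (a - b) ** 2 for a, b in zip(features[i], centroids[c])))
            correct += pred == labels[i]
        accuracies.append(correct / len(test))
    return sum(accuracies) / len(accuracies)
```

With 400 tracks, the default split reproduces the paper's 320/80 protocol; per-trial accuracies could also be kept to study their spread across trials, as the ANOVA in the text does.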
                              22050 Hz                                     44100 Hz
                      δ       ε       r       ρ       θ        δ       ε       r       ρ       θ
Lib1
 σ̂²FSize             1.68    2.77    0.20    0.15    0.38     2.37    2.42    0.24    0.34    0.50
 σ̂²Genre             2.81    2.75    1.29    1.47    0.81     3.12    2.61    1.17    1.25    0.85
 σ̂²Track            20.69   19.27   17.75   18.52   16.63    22.28   20.78   18.81   19.92   18.64
 σ̂²residual         74.82   75.21   80.75   79.86   82.17    72.22   74.19   79.79   78.49   80.01
 Grand mean         0.0610  0.0545  0.9554  0.9366  0.9920   0.0588  0.0521  0.9549  0.9375  0.9922
 Total variance     0.0046  0.0085  0.0276  0.0293  0.0014   0.0048  0.0082  0.0286  0.0298  0.0013
 Standard deviation 0.0682  0.0924  0.1663  0.1713  0.0373   0.0695  0.0904  0.1691  0.1725  0.0355
Lib2
 σ̂²Codec            63.62   34.55    0       0       0       32.32   21.59    0       0       0
 σ̂²BRate:Codec       0.71    0.23    0       0       0       61.80   39.51    0.01    0.03    0.04
 σ̂²Genre             0.25   15.87    2.90    4.05    7.95     0.62    9.98    3.43    1.33    3.66
 σ̂²Track            19.29   32.77   96.71   92.75   91.80     3.27   13.79   94.24   93.04   77.00
 σ̂²residual         16.14   16.58    0.38    3.20    0.25     1.98   15.13    2.32    5.60   19.30
 Grand mean         0.0346  0.0031  0.9915  0.9766  0.9998   2.6e-2  2.2e-3  0.9989  0.9928  1
 Total variance     0.0004  5e-6    0.0002  0.0007  6.1e-8   4.6e-4  4.8e-6  3.7e-6  0.0001  1.8e-9
 Standard deviation 0.0195  0.0022  0.0135  0.0270  0.0002   0.0213  0.0022  0.0019  0.0122  4.2e-5
Table 2. Variance components in the distributions of robustness of Chroma for Lib1 (top) and Lib2 (bottom), similar to Table 1. The Codec main effect and all its interactions are not shown for Lib1 because all their variance components are estimated as 0. The FSize main effect and all its interactions are omitted for Lib2 because the frame size is fixed to 16384.
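The unit-maximum normalization discussed in the text (Lib2's normalization=2 setting) amounts to dividing each chroma vector by its largest bin, so any global gain change, such as the overall coefficient growth observed at higher bitrates, cancels out. A minimal sketch, assuming plain Python lists for the 12-bin vectors:

```python
def normalize_unit_max(chroma_frames):
    """Scale each chroma vector so its maximum bin equals 1, mirroring
    the unit-maximum normalization described in the text. Because the
    scaling is per-vector, a codec- or bitrate-dependent gain applied to
    a whole vector is removed, while the vector's shape (and hence the
    correlation indicators r, ρ, θ) is unchanged."""
    out = []
    for v in chroma_frames:
        m = max(v)
        out.append([x / m for x in v] if m > 0 else list(v))
    return out
```

For example, a chroma vector and a copy of it with twice the gain normalize to the same result, which is why the codec and bitrate effects on δ and ε vanish after normalization.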
[Table 3 residue: column headers 64, 96, 128, 160, 192, 256, 320, WAV; row groups Lib1 and Lib2]
…trate compression, mostly due to distortions at high fre-