ISMIR 2008 – Session 1b – Melody
sequences based on two principles: first, so as to maximise n-gram frequencies to the left and right of the boundary; and second, so as to maximise the entropy of the conditional distribution across the boundary. The algorithm was able to successfully identify word boundaries in text from four languages and episode boundaries in the activities of a mobile robot.

IDyOM itself is based on n-gram models commonly used in statistical language modelling [18]. An n-gram is a sequence of n symbols and an n-gram model is simply a collection of such sequences, each of which is associated with a frequency count. During the training of the statistical model, these counts are acquired through an analysis of some corpus of sequences (the training set) in the target domain. When the trained model is exposed to a sequence drawn from the target domain, it uses the frequency counts associated with n-grams to estimate a probability distribution governing the identity of the next symbol in the sequence given the n − 1 preceding symbols. The quantity n − 1 is known as the order of the model and represents the number of symbols making up the context within which a prediction is made.

However, n-gram models suffer from several problems, both in general and specifically when applied to music. The first difficulties arise from the use of a fixed order. Low-order models fail to provide an adequate account of the structural influence of the context. However, increasing the order can prevent the model from capturing much of the statistical regularity present in the training set (an extreme case occurring when the model encounters an n-gram that does not appear in the training set and returns an estimated probability of zero). In order to address these problems, the IDyOM model maintains frequency counts during training for n-grams of all possible values of n in any given context. During prediction, distributions are estimated using a weighted sum of all models below a variable order bound. This bound is determined in each predictive context using simple heuristics designed to minimise uncertainty. The combination is designed such that higher-order predictions (which are more specific to the context) receive greater weighting than lower-order predictions (which are more general).

Another problem with n-gram models is that a trained model will fail to make use of local statistical structure of the music it is currently analysing. To address this problem, IDyOM includes two kinds of model: first, the long-term model that was trained over the entire training set in the previous step; and second, a short-term model that is trained incrementally for each individual melody being predicted. The distributions returned by these models are combined using an entropy-weighted multiplicative combination scheme [26] in which greater weights are assigned to models whose predictions are associated with lower entropy (or uncertainty) at that point in the melody.

A final issue regards the fact that music is an inherently multi-dimensional phenomenon. Musical events have many attributes, including pitch, onset time, duration, timbre and so on. In addition, sequences of these attributes may have multiple relevant dimensions. For example, pitch interval, pitch class, scale degree, contour and many other derived features are important in the perception and analysis of pitch structure. In order to accommodate these properties, the modelling process begins by choosing a set of basic features that we are interested in predicting. As these basic features are treated as independent attributes, their probabilities are computed separately and in turn, and the probability of a note is simply the product of the probabilities of its attributes. Each basic feature (e.g., pitch) may then be predicted by any number of models for different derived features (e.g., pitch interval, scale degree) whose distributions are combined using the same entropy-weighted scheme.

The use of long- and short-term models, incorporating models of derived features, the entropy-based weighting method and the use of a multiplicative as opposed to an additive combination scheme all improve the performance of IDyOM in predicting the pitches of unseen melodies [24, 26]. Full details of the model and its evaluation can be found elsewhere [9, 23, 24, 26].

The conditional probabilities output by IDyOM in a given melodic context may be interpreted as contextual expectations about the nature of the forthcoming note. Pearce and Wiggins [25] compare the melodic pitch expectations of the model with those of listeners in the context of single intervals [10], at particular points in British folk songs [34] and throughout two chorale melodies [19]. The results demonstrate that the statistical system predicts the expectations of listeners at least as well as the two-factor model of Schellenberg [35] and significantly better in the case of more complex melodic contexts.

In this work, we use the model to predict the pitch, IOI and OOI associated with melodic events, multiplying the probabilities of these attributes together to yield the overall probability of the event. For simplicity, we do not use any derived features. We then focus on the unexpectedness of events (information content, h), using this as a boundary strength profile from which we compute boundary locations (as described below). The role of entropy (H) will be considered in future work. The IDyOM model differs from the GPRs, the LBDM and Grouper in that it is based on statistical learning rather than symbolic rules, and it differs from DOP in that it uses unsupervised rather than supervised learning.

2.4 Comparative Evaluation of Melody Segmentation Algorithms

Most of the models described above were evaluated to some extent by their authors and, in some cases, compared quantitatively to other models. In addition, however, there exist a small number of studies that empirically compare the performance of different models of melodic grouping. These studies differ in the algorithms compared, the type of ground truth data used, and the evaluation metrics applied. Melucci and Orio [20], for example, collected the boundary indications of 17 music scholars on melodic excerpts from 20 works by Bach, Mozart, Beethoven and Chopin. Having combined the boundary indications into a ground truth, they
evaluated the performance of the LBDM against three baseline models that created groups containing fixed (8 and 15) or random (between 10 and 20) numbers of notes. Melucci and Orio report false positives, false negatives, and a measure of disagreement, which show that the LBDM outperforms the other models.

Bruderer [5] presents a more comprehensive study of the grouping structure of melodic excerpts from six Western pop songs. The ground truth segmentation was obtained from 21 adults with different degrees of musical training; the boundary indications were summed within consecutive time windows to yield a quasi-continuous boundary strength profile for each melody. Bruderer examines the performance of three algorithms: Grouper, the LBDM and the summed GPRs quantified in [14] (GPR 2a, 2b, 3a and 3d). The output of each algorithm is convolved with a Gaussian window to produce a boundary strength profile that is then correlated with the ground truth. Bruderer reports that the LBDM achieved the best and the GPRs the worst performance.

Another study [38] compared the predictions of the LBDM and Grouper to segmentations at the phrase and sub-phrase level provided by 19 musical experts for 10 melodies in a range of styles. The performance of each model on each melody was estimated by averaging the F1 scores over the 19 experts. Each model was examined with parameters optimised for each individual melody. The results indicated that Grouper tended to outperform the LBDM. Large IOIs were an important factor in the success of both models. In another experiment, the predictions of each model were compared with the transcribed boundaries in several datasets from the EFSC. The model parameters were optimised over each dataset and the results indicated that Grouper (with mean F1 between 0.6 and 0.7) outperformed the LBDM (mean F1 between 0.49 and 0.56).

All these comparative studies used ground truth segmentations derived from manual annotations by human judges. However, only a limited number of melodies can be tested in this way (ranging from 6 in the case of [5] to 20 in [20]). Apart from Thom et al. [38], Experiment D, there has been no thorough comparative evaluation over a large corpus of melodies annotated with phrase boundaries.

3 METHOD

3.1 The Ground Truth Data

We concentrate here on the results obtained for a subset of the EFSC, database Erk, containing 1705 Germanic folk melodies encoded in symbolic form with annotated phrase boundaries, which were inserted during the encoding process by folk song experts. The dataset contains 78,995 sounding events at an average of about 46 events per melody, and overall about 12% of notes fall before boundaries. There is only one hierarchical level of phrasing and the phrase structure exhaustively subsumes all the events in a melody.

3.2 Making Model Outputs Comparable

The outputs of the algorithms tested vary considerably. While Grouper marks each note with a binary indicator (1 = boundary, 0 = no boundary), the other models output a positive real number for each note which can be interpreted as a boundary strength. In contrast to Bruderer [5], we chose to make all segmentation algorithms comparable by picking binary boundary indications from the boundary strength profiles.

To do so, we devised a method called Simple Picker that uses three principles. First, the note following a boundary should have a boundary strength greater than or equal to that of the note following it: S_n ≥ S_{n+1}. Second, the note following a boundary should have a greater boundary strength than the note preceding it: S_n > S_{n−1}. Whilst these two principles simply ensure that a point is a local peak in the profile, the third specifies how high the point must be, relative to earlier points in the profile, to be considered a peak. Thus the note following a boundary should have a boundary strength greater than a threshold based on the linearly weighted mean and standard deviation of all notes preceding it:

$$S_n > k \sqrt{\frac{\sum_{i=1}^{n-1}\left(w_i S_i - \bar{S}_{w,1\ldots n-1}\right)^2}{\sum_{i=1}^{n-1} w_i}} + \frac{\sum_{i=1}^{n-1} w_i S_i}{\sum_{i=1}^{n-1} w_i}$$

The third principle makes use of the parameter k, which determines how many standard deviations higher than the mean of the preceding values a peak must be in order to be picked. In practice, the optimal value of k varies between algorithms depending on the nature of the boundary strength profiles they produce.

In addition, we modified the output of all models to predict an implicit phrase boundary on the last note of a melody.

3.3 The Models

The models included in the comparison are as follows:

Grouper: as implemented by [36];¹
LBDM: as specified by [6] with k = 0.5;
IDyOM: with k = 2;
GPR2a: as quantified by [14] with k = 0.5;
GPR2b: as quantified by [14] with k = 0.5;
GPR3a: as quantified by [14] with k = 0.5;
GPR3d: as quantified by [14] with k = 2.5;
Always: every note falls on a boundary;
Never: no note falls on a boundary.

Grouper outputs binary boundary predictions, but the output of every other model was processed by Simple Picker using a value of k chosen from the set {0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4} so as to maximise F1 (and secondarily Recall).

4 RESULTS

The results of the model comparison are shown in Table 1. The four models achieving mean F1 values of over 0.5

¹ Adapted for use with Melconv 2 by Klaus Frieler.
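To make the notion of information content concrete, the following sketch caricatures the statistical approach of Section 2.3 with a first-order (bigram) model over pitch numbers: h = −log₂ P(note | context) is high where a continuation is rare in the training corpus, and the resulting profile can serve as a boundary strength profile. This is a toy illustration under our own naming, not the IDyOM implementation (which blends model orders, long- and short-term models and derived features); the add-alpha smoothing here merely stands in for IDyOM's handling of unseen n-grams.

```python
import math
from collections import defaultdict

def train_bigram(melodies):
    """Count bigram and unigram context frequencies over pitch sequences."""
    bigrams, unigrams = defaultdict(int), defaultdict(int)
    for mel in melodies:
        for a, b in zip(mel, mel[1:]):
            bigrams[(a, b)] += 1
            unigrams[a] += 1
    return bigrams, unigrams

def information_content(melody, bigrams, unigrams, alpha=1.0, vocab=128):
    """h(note) = -log2 P(note | previous note), with add-alpha smoothing
    so that unseen bigrams never receive probability zero."""
    h = [0.0]  # the first note has no context; treat it as unsurprising
    for a, b in zip(melody, melody[1:]):
        p = (bigrams[(a, b)] + alpha) / (unigrams[a] + alpha * vocab)
        h.append(-math.log2(p))
    return h
```

Transitions that are frequent in the corpus yield low h, while rare continuations produce the spikes that a peak-picking step can turn into boundary predictions.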
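The entropy-weighted multiplicative combination of long- and short-term predictions described in Section 2.3 can be illustrated as follows. The weighting function below is a simplified stand-in of our own devising (weights grow as a model's entropy shrinks); the exact scheme of [26] differs in detail.

```python
import math

def entropy(dist):
    """Shannon entropy (bits) of a distribution given as {symbol: prob}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def combine(dists, b=1.0):
    """Entropy-weighted multiplicative combination (illustrative variant):
    each model's distribution is raised to a weight that increases as its
    entropy (uncertainty) decreases; the weighted distributions are then
    multiplied together and renormalised."""
    symbols = set().union(*dists)
    weights = []
    for d in dists:
        h = entropy(d)
        h_max = math.log2(len(d)) or 1.0  # maximum possible entropy
        weights.append((h_max / h) ** b if h > 0 else 1.0)
    combined = {}
    for s in symbols:
        # small floor for symbols a model never predicts
        combined[s] = math.prod(d.get(s, 1e-9) ** w
                                for d, w in zip(dists, weights))
    z = sum(combined.values())
    return {s: p / z for s, p in combined.items()}
```

A confident model (low entropy) thus dominates the combination, while a model that is merely guessing (near-uniform, high entropy) contributes little.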
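The three principles of Simple Picker (Section 3.2) translate directly into code. The sketch below is our own reading of the description; in particular, the linearly increasing weights w_i = i are an assumption, since the text does not define w_i explicitly.

```python
import math

def simple_picker(strengths, k=1.0):
    """Pick binary boundaries from a boundary strength profile S.

    A note n is marked as a boundary if:
      1. S[n] >= S[n+1]                      (peak relative to the right)
      2. S[n] >  S[n-1]                      (peak relative to the left)
      3. S[n] >  k * weighted_sd + weighted_mean over S[1..n-1],
         using linearly increasing weights w_i = i (an assumption).
    """
    n_notes = len(strengths)
    boundaries = [0] * n_notes
    for n in range(1, n_notes - 1):
        prev, here, nxt = strengths[n - 1], strengths[n], strengths[n + 1]
        if not (here >= nxt and here > prev):
            continue  # not a local peak
        ws = [(i + 1) * strengths[i] for i in range(n)]  # w_i * S_i
        wsum = sum(i + 1 for i in range(n))              # sum of w_i
        mean = sum(ws) / wsum
        var = sum((x - mean) ** 2 for x in ws) / wsum
        if here > k * math.sqrt(var) + mean:
            boundaries[n] = 1
    boundaries[-1] = 1  # implicit phrase boundary on the last note
    return boundaries
```

For example, `simple_picker([0.1, 0.2, 5.0, 0.1, 0.2, 0.1], k=1.0)` marks the spike at the third note and the implicit final boundary, and nothing else.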
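The per-model choice of k described in Section 3.3 amounts to a small grid search over the candidate set, scored against the annotated boundaries. A sketch (function names are ours; `picker` stands for any function mapping a profile and k to binary boundaries, such as a Simple Picker implementation):

```python
def precision_recall_f1(predicted, truth):
    """Precision, Recall and F1 for two binary boundary vectors."""
    tp = sum(1 for p, t in zip(predicted, truth) if p and t)
    fp = sum(1 for p, t in zip(predicted, truth) if p and not t)
    fn = sum(1 for p, t in zip(predicted, truth) if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

def best_k(profiles, truths, picker,
           ks=(0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4)):
    """Choose k maximising mean F1, breaking ties by mean Recall."""
    def score(k):
        stats = [precision_recall_f1(picker(p, k), t)
                 for p, t in zip(profiles, truths)]
        mean_f1 = sum(s[2] for s in stats) / len(stats)
        mean_recall = sum(s[1] for s in stats) / len(stats)
        return (mean_f1, mean_recall)
    return max(ks, key=score)
```

With eight candidate values and one picker run per melody per candidate, the search is cheap even over a corpus of the size used here.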
segmentation: A computational approach. Music Perception, 23(3):249–269, 2006.

[8] P. R. Cohen, N. Adams, and B. Heeringa. Voting experts: An unsupervised algorithm for segmenting sequences. Intelligent Data Analysis, 11(6):607–625, 2007.

[9] D. Conklin and I. H. Witten. Multiple viewpoint systems for music prediction. Journal of New Music Research, 24(1):51–73, 1995.

[10] L. L. Cuddy and C. A. Lunny. Expectancies generated by melodic intervals: Perceptual judgements of continuity. Perception and Psychophysics, 57(4):451–462, 1995.

[11] I. Deliège. Grouping conditions in listening to music: An approach to Lerdahl and Jackendoff's grouping preference rules. Music Perception, 4(4):325–360, 1987.

[12] J. L. Elman. Finding structure in time. Cognitive Science, 14:179–211, 1990.

[13] M. Ferrand, P. Nelson, and G. Wiggins. Memory and melodic density: a model for melody segmentation. In N. Giomi, F. Bernardini, and N. Giosmin, editors, Proceedings of the XIV Colloquium on Musical Informatics, pages 95–98, Firenze, Italy, 2003.

[14] B. W. Frankland and A. J. Cohen. Parsing of melody: Quantification and testing of the local grouping rules of Lerdahl and Jackendoff's A Generative Theory of Tonal Music. Music Perception, 21(4):499–543, 2004.

[15] J. Hale. Uncertainty about the rest of the sentence. Cognitive Science, 30(4):643–672, 2006.

[16] F. Lerdahl and R. Jackendoff. A Generative Theory of Tonal Music. MIT Press, Cambridge, MA, 1983.

[17] R. Levy. Expectation-based syntactic comprehension. Cognition, 106(3):1126–1177, 2008.

[18] C. D. Manning and H. Schütze. Foundations of Statistical Natural Language Processing. MIT Press, Cambridge, MA, 1999.

[19] L. C. Manzara, I. H. Witten, and M. James. On the entropy of music: An experiment with Bach chorale melodies. Leonardo Music Journal, 2(1):81–88, 1992.

[20] M. Melucci and N. Orio. A comparison of manual and automatic melody segmentation. In Proceedings of the International Conference on Music Information Retrieval, pages 7–14, 2002.

[21] L. B. Meyer. Meaning in music and information theory. Journal of Aesthetics and Art Criticism, 15(4):412–424, 1957.

[22] E. Narmour. The Analysis and Cognition of Basic Melodic Structures: The Implication-realisation Model. University of Chicago Press, Chicago, 1990.

[23] M. T. Pearce. The Construction and Evaluation of Statistical Models of Melodic Structure in Music Perception and Composition. PhD thesis, Department of Computing, City University, London, UK, 2005.

[24] M. T. Pearce and G. A. Wiggins. Improved methods for statistical modelling of monophonic music. Journal of New Music Research, 33(4):367–385, 2004.

[25] M. T. Pearce and G. A. Wiggins. Expectation in melody: The influence of context and learning. Music Perception, 23(5):377–405, 2006.

[26] M. T. Pearce, D. Conklin, and G. A. Wiggins. Methods for combining statistical models of music. In U. K. Wiil, editor, Computer Music Modelling and Retrieval, pages 295–312. Springer Verlag, Heidelberg, Germany, 2005.

[27] I. Peretz. Clustering in music: An appraisal of task factors. International Journal of Psychology, 24(2):157–178, 1989.

[28] RISM-Zentralredaktion. Répertoire international des sources musicales (RISM). URL http://rism.stub.uni-frankfurt.de/index_e.htm.

[29] J. R. Saffran. Absolute pitch in infancy and adulthood: The role of tonal structure. Developmental Science, 6(1):37–49, 2003.

[30] J. R. Saffran and G. J. Griepentrog. Absolute pitch in infant auditory learning: Evidence for developmental reorganization. Developmental Psychology, 37(1):74–85, 2001.

[31] J. R. Saffran, R. N. Aslin, and E. L. Newport. Statistical learning by 8-month-old infants. Science, 274:1926–1928, 1996.

[32] J. R. Saffran, E. K. Johnson, R. N. Aslin, and E. L. Newport. Statistical learning of tone sequences by human infants and adults. Cognition, 70(1):27–52, 1999.

[33] H. Schaffrath. The Essen folksong collection. In D. Huron, editor, Database containing 6,255 folksong transcriptions in the Kern format and a 34-page research guide [computer database]. CCARH, Menlo Park, CA, 1995.

[34] E. G. Schellenberg. Expectancy in melody: Tests of the implication-realisation model. Cognition, 58(1):75–125, 1996.

[35] E. G. Schellenberg. Simplifying the implication-realisation model of melodic expectancy. Music Perception, 14(3):295–318, 1997.

[36] D. Temperley. The Cognition of Basic Musical Structures. MIT Press, Cambridge, MA, 2001.

[37] J. Tenney and L. Polansky. Temporal Gestalt perception in music. Journal of Music Theory, 24(2):205–241, 1980.

[38] B. Thom, C. Spevak, and K. Höthker. Melodic segmentation: Evaluating the performance of algorithms and musical experts. In Proceedings of the 2002 International Computer Music Conference, San Francisco, 2002. ICMA.