Intro to machine learning for multimedia analysis
Perception & Multimedia Computing
Lecture 19
Rebecca Fiebrink
Lecturer, Department of Computing
Goldsmiths, University of London
Last time
• Perceptually-motivated features for IR of audio and music
• Evaluation of IR systems
Today
• Shazam
• Machine learning intro
• Supervised learning introduction
• Wekinator demo
Shazam
Using Shazam
1. Open app
2. Record a few seconds of audio
3. Find out immediately which song you’re listening to (and have the opportunity to buy it)
Fingerprinting
Goal: find a function f(song) such that
f(song1) = f(song2) IF AND ONLY IF song1 = song2*
(* if song1 and song2 are the same recording, but possibly with different compression, background noise, etc.)
Challenge:
Hash Function
1. Compute the spectrogram
Hash Function
2. Identify peaks in spectrogram
3. Find pairs of nearby peaks and compute a hash for each pair: (freq1, freq2, time2-time1)
4. Store each hash in a hash table whose buckets hold (song ID, time offset) entries, e.g.:
3235293: (32572, 39280), (209371, 94830), (32572, 3927)
3235294: (2324, 2323)
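A minimal sketch of this fingerprinting process in Python (NumPy/SciPy assumed; the loudness threshold, neighbourhood size, and fan-out below are illustrative choices, not Shazam's actual parameters):

import numpy as np
from scipy.signal import spectrogram
from scipy.ndimage import maximum_filter

def find_peaks(audio, sr, neighbourhood=20):
    # 1. Compute the spectrogram; 2. keep bins that are local maxima and not too quiet
    f, t, S = spectrogram(audio, fs=sr)
    local_max = maximum_filter(S, size=neighbourhood) == S
    loud = S > 10 * np.median(S)                      # illustrative loudness threshold
    freq_bins, time_bins = np.where(local_max & loud)
    return sorted(zip(time_bins, freq_bins))          # peaks as (time, freq), in time order

def make_hashes(peaks, fan_out=5):
    # 3. Pair each peak with a few later peaks; each hash is (freq1, freq2, time2 - time1)
    hashes = []
    for i, (t1, f1) in enumerate(peaks):
        for t2, f2 in peaks[i + 1:i + 1 + fan_out]:
            hashes.append(((f1, f2, t2 - t1), t1))    # keep t1: where the hash occurs in the audio
    return hashes

# 4. Hash table: hash value -> list of (song ID, time offset) entries, as in the buckets above
database = {}
def index_song(song_id, audio, sr):
    for h, t1 in make_hashes(find_peaks(audio, sr)):
        database.setdefault(h, []).append((song_id, t1))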
Receiving a query
Repeat hash process for query:
1. Compute spectral peaks
2. Find pairs of nearby peaks
3. Compute list of hashes: (freq1, freq2, time2-time1)
4. For each hash value:
a. Find hash bucket.
b. For each song in bucket, check if query matches song
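This lookup step can be sketched the same way, reusing find_peaks, make_hashes, and database from the fingerprinting sketch above (the function name is mine, not from the paper):

def lookup(query_audio, sr):
    # song_id -> list of (time in song, time in query) for every hash shared with the query
    matches = {}
    for h, t_query in make_hashes(find_peaks(query_audio, sr)):
        for song_id, t_song in database.get(h, []):   # step 4a: find hash bucket (may be empty)
            matches.setdefault(song_id, []).append((t_song, t_query))
    return matches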
Checking for a match
Does query match song?
Relative hash timings
If query matches song, we expect to see something like
this:
[Figure: matching hash times in the song plotted against hash times in the query; in this example the query begins 40 seconds into the song.]
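A sketch of that check, building on lookup() above: for each candidate song, count how many matching hashes agree on a single offset t_song - t_query (about 40 seconds in the example above). A true match produces one large spike of consistent offsets, while chance hash collisions scatter theirs around.

from collections import Counter

def best_match(query_audio, sr, min_votes=5):           # min_votes is an illustrative threshold
    best_song, best_votes = None, 0
    for song_id, pairs in lookup(query_audio, sr).items():
        offsets = Counter(t_song - t_query for t_song, t_query in pairs)
        votes = offsets.most_common(1)[0][1]             # size of the largest consistent-offset bin
        if votes > best_votes:
            best_song, best_votes = song_id, votes
    return best_song if best_votes >= min_votes else None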
Results
Robust to noise: only 1-2% of hash values must survive in order to enable identification
Results
Very fast: on a regular PC, a search through 20,000 tracks takes 5-500 ms
For more info
Read the academic paper with details:
http://www.ee.columbia.edu/~dpwe/papers/Wang03-shazam.pdf
Intro to machine learning
Feature values give a data item (e.g., song) a point in feature space
[Figures: scatter plots of songs in a 2-D feature space (one axis: Feature 2, average centroid), including a cluster of David Bowie songs and an example with K=3.]
[Diagram: training data (inputs paired with outputs) is fed to a learning algorithm, which produces a model.]
Supervised learning algorithms build models from data.
Each example is represented as a feature vector, e.g.:
[.01, .59, .03, 32]   [.05, 1.2, 3.2, 31]   [-.1, .34, .20, 8.2]   [.01, .64, .02, 20]
[Diagram: Training: input feature vectors paired with output labels (“C Major”, “F minor”, “G7”) form the training data, and the algorithm builds a model from them. Running: the model maps a new input to an output, e.g. “C Major”.]
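As a concrete sketch of that pipeline in Python with scikit-learn (the library choice is mine, not from the slides, and the pairing of feature vectors with chord labels is only illustrative):

from sklearn.neighbors import KNeighborsClassifier

# Training data: one feature vector per example, one label per example
X_train = [[.01, .59, .03, 32],
           [.05, 1.2, 3.2, 31],
           [-.1, .34, .20, 8.2]]
y_train = ["C Major", "F minor", "G7"]

# Training: training data + algorithm -> model
model = KNeighborsClassifier(n_neighbors=1)
model.fit(X_train, y_train)

# Running: the model maps a new input feature vector to an output label
print(model.predict([[.01, .64, .02, 20]]))   # prints the predicted chord label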
Classification: Assign 1 of N discrete labels to each point in feature space
[Figure: points in a 2-D feature space (feature1 vs. feature2); this model is a separating line or hyperplane (a decision boundary).]
Regression
[Figure: output plotted against an input feature; this model is a real-valued function of the input features.]
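A minimal regression sketch in the same style (illustrative data; LinearRegression is just one way to fit a real-valued function of the inputs):

from sklearn.linear_model import LinearRegression

X = [[0.1], [0.4], [0.5], [0.9]]     # one input feature per example
y = [0.2, 0.7, 0.9, 1.6]             # one real-valued output per example

model = LinearRegression().fit(X, y)
print(model.predict([[0.6]]))        # real-valued prediction for an unseen input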
Unsupervised learning
Dataset includes examples, but no labels
Example: Infer clusters from data:
[Figure: clusters of points in a 2-D feature space (feature1 vs. feature2).]
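A minimal clustering sketch (no labels are given to the algorithm; using k-means with k = 2 here is my assumption, not something from the slide):

from sklearn.cluster import KMeans

# Unlabelled examples in a 2-D feature space (illustrative numbers)
X = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.25],
     [0.8, 0.9], [0.9, 0.85], [0.75, 0.95]]

kmeans = KMeans(n_clusters=2, n_init=10).fit(X)
print(kmeans.labels_)   # inferred cluster index for each example, e.g. [0 0 0 1 1 1]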
How supervised learning algorithms work (the basics)
The learning problem
Which classifier is best?
[Figure: classifiers ranging from “underfit” to “overfit”; image from Andrew Ng]
Competing goals:
• Accurately model the training data
• Accurately classify unseen data points
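In practice, that second goal is usually estimated by holding data out; a small sketch with scikit-learn (the iris dataset is just a stand-in for labelled audio features):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)

# Hold some data out so accuracy on unseen points can be estimated
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = DecisionTreeClassifier().fit(X_train, y_train)
print("training accuracy:", accuracy_score(y_train, model.predict(X_train)))  # can look great even when overfit
print("test accuracy:", accuracy_score(y_test, model.predict(X_test)))        # accuracy on unseen points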
A simple classifier: nearest neighbor
[Figure: a query point “?” in a 2-D feature space (feature1 vs. feature2), classified by the label of its nearest training point.]
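A from-scratch sketch of the idea (plain Python; the training points and labels are made up):

import math

def nearest_neighbour(query, examples):
    # examples: list of (feature_vector, label); return the label of the closest training point
    def distance(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    _, label = min(examples, key=lambda ex: distance(ex[0], query))
    return label

training = [([0.1, 0.2], "red"), ([0.9, 0.8], "blue"), ([0.2, 0.3], "red")]
print(nearest_neighbour([0.15, 0.25], training))   # -> red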
Another simple classifier: Decision tree
Images: http://ai.cs.umbc.edu/~oates/classes/2009/ML/homework1.html, http://nghiaho.com/?p=1300
AdaBoost: Iteratively train a “weak” learner
Image from http://www.cc.gatech.edu/~kihwan23/imageCV/Final2005/FinalProject_KH.htm
Support vector machine
Supervised learning and music
k-Nearest Neighbor
+ Can tune k to adjust smoothness of decision boundaries
- Sensitive to noisy, redundant, irrelevant features; prone to overfitting; behaves poorly in high dimensions
Decision tree
+ Can prune to reduce overfitting; produces a human-understandable model
- Can still overfit
AdaBoost
+ Theoretical benefits; less prone to overfitting
+ Can tune by changing the base learner and the number of training rounds
Support Vector Machine
+ Theoretical benefits similar to AdaBoost
- Many parameters to tune; training can take a long time
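All four of these sit behind the same fit/predict interface in a library like scikit-learn, so comparing them on a given feature set is cheap to try; a hedged sketch with default-ish settings (real use would tune k, pruning depth, number of rounds, kernel parameters, etc.):

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import AdaBoostClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)   # stand-in dataset; audio/music features would go here

classifiers = {
    "k-NN (k=3)": KNeighborsClassifier(n_neighbors=3),
    "Decision tree": DecisionTreeClassifier(max_depth=3),   # limiting depth is a crude form of pruning
    "AdaBoost": AdaBoostClassifier(n_estimators=50),
    "SVM (RBF kernel)": SVC(kernel="rbf"),
}

for name, clf in classifiers.items():
    scores = cross_val_score(clf, X, y, cv=5)   # 5-fold cross-validated accuracy
    print(f"{name}: {scores.mean():.2f}")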
Other considerations…
For more info
Wekinator: http://wekinator.cs.princeton.edu/
Data Mining: Practical Machine Learning Tools and Techniques by Witten, Frank, and Hall
More detailed and technical textbooks
Next week
Visualization