Audio Fingerprinting
Xavier Anguera, Antonio Garzón and
Tomasz Adamek
Telefonica Research
Outline
watermarking
fingerprinting
Discriminability
Robustness
MASK
Fingerprint
Compactness
Efficiency
18 or 34 MEL-spectrum bands
$$\mathrm{Thr}[n] = \alpha^{\Delta t} \cdot E[n-1] \cdot \exp\left(-\frac{(\Delta t)^2}{2\sigma^2}\right)$$
where $\Delta t$ is the distance (in frames) between the previous
peak and the considered peak, $E[n-1]$ is the energy of the
previously selected peak, and $\alpha$, $\sigma$ are two parameters used to
set the threshold falling rate and its width, respectively. In the
proposed implementation we set them to $\alpha = 0.98$ and $\sigma = 40$,
following a similar implementation used in the Shazam fingerprint
by Dan Ellis.
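The threshold described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes the falling-rate parameter enters as a per-frame decay factor (alpha raised to the peak distance) multiplied by a Gaussian term of width sigma, and the `select_peaks` helper with its "keep a frame if its energy exceeds the threshold" rule is hypothetical.

```python
import math

ALPHA = 0.98   # threshold falling rate (value quoted in the slides)
SIGMA = 40.0   # threshold width, in frames (value quoted in the slides)


def peak_threshold(prev_peak_energy, delta_t, alpha=ALPHA, sigma=SIGMA):
    """Threshold for a candidate peak located delta_t frames after the previous peak.

    Assumption: alpha acts as a per-frame decay (alpha ** delta_t).
    """
    decay = alpha ** delta_t
    gaussian = math.exp(-(delta_t ** 2) / (2.0 * sigma ** 2))
    return decay * prev_peak_energy * gaussian


def select_peaks(energies):
    """Hypothetical greedy selection over a 1-D sequence of frame energies.

    The first frame is always kept; a later frame is kept only if its energy
    exceeds the threshold derived from the previously selected peak.
    """
    peaks = []
    for n, energy in enumerate(energies):
        if not peaks:
            peaks.append(n)
            continue
        prev = peaks[-1]
        if energy > peak_threshold(energies[prev], n - prev):
            peaks.append(n)
    return peaks


selected = select_peaks([1.0, 0.5, 0.99, 0.2, 0.1])
```

With these parameter values the threshold falls slowly right after a peak, so only candidates nearly as energetic as the previous peak are accepted at short distances.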
[Figure: example utterance "beautiful" (TIMIT, female speaker); horizontal axis: time]
$b[n] = 1$ if $S_i > S_j$, otherwise $b[n] = 0$
Spectral Regions
Include one or several time-frequency values
Overlaps are allowed
The number of comparisons
defines the size of the fingerprint
Designed manually (for now)
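As a rough sketch of how region comparisons turn into fingerprint bits: each comparison between two regions i and j contributes one bit, set to 1 when the energy of region i exceeds that of region j and to 0 otherwise. The patch layout, the region and comparison lists, and the use of the mean as the region energy are all assumptions made for illustration; the slides only state that regions hold one or several time-frequency values, may overlap, and are designed manually.

```python
def region_energy(patch, region):
    """Mean energy over the (band, frame) cells of one region (assumed aggregation)."""
    return sum(patch[band][frame] for band, frame in region) / len(region)


def encode_fingerprint(patch, regions, comparisons):
    """Return one bit per comparison: 1 if S_i > S_j, else 0."""
    energies = [region_energy(patch, region) for region in regions]
    return [1 if energies[i] > energies[j] else 0 for i, j in comparisons]


# Toy patch: 5 MEL bands (rows) x 3 frames (columns), values are made up.
patch = [
    [1.0, 2.0, 1.0],
    [4.0, 8.0, 4.0],
    [2.0, 2.0, 2.0],
    [0.5, 0.5, 0.5],
    [3.0, 1.0, 3.0],
]

# Hypothetical regions, each a list of (band, frame) cells; overlaps are allowed.
regions = [
    [(0, 0), (0, 1)],
    [(1, 1)],
    [(2, 0), (2, 1), (2, 2)],
    [(1, 1), (2, 1)],          # overlaps regions 1 and 2
    [(4, 0), (4, 2)],
]

# Hypothetical comparison list; its length is the fingerprint size in bits.
comparisons = [(1, 0), (0, 2), (2, 3), (3, 4), (4, 2)]

bits = encode_fingerprint(patch, regions, comparisons)
code = "".join(str(b) for b in bits)
```

Because the fingerprint length equals the number of comparisons, adding or removing entries in `comparisons` is the only change needed to resize the code.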
5 MEL bands
Fingerprint encoding
10011
[Diagram labels: MASK FP; exact matching; database (content ID, time offset); range -Nq to Nq]
QUERY MASK
0->01100
1->10011
13->11101
14->01101
15->00010
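The lookup step can be illustrated with a small inverted index: each binary code maps to the (content ID, time offset) pairs where it occurs in the reference database, and query codes are matched exactly. The vote over (content ID, reference time minus query time) pairs is an assumption borrowed from common landmark-based fingerprinting practice, not something the slides spell out; the reference entries and the names `build_index` and `vote` are hypothetical. The query codes are the ones listed above.

```python
from collections import Counter, defaultdict


def build_index(references):
    """Map each code to the list of (content_id, time) pairs where it occurs."""
    index = defaultdict(list)
    for content_id, fingerprints in references.items():
        for time, code in fingerprints:
            index[code].append((content_id, time))
    return index


def vote(index, query):
    """Exact-match every query code and count votes per (content_id, time shift)."""
    votes = Counter()
    for q_time, code in query:
        for content_id, ref_time in index.get(code, []):
            votes[(content_id, ref_time - q_time)] += 1
    return votes


# Query fingerprints as (time, code), taken from the slide.
query = [(0, "01100"), (1, "10011"), (13, "11101"), (14, "01101"), (15, "00010")]

# Made-up reference database for illustration only.
references = {
    "video_a": [(100, "01100"), (101, "10011"), (113, "11101"),
                (114, "01101"), (115, "00010")],
    "video_b": [(7, "10011"), (40, "11111")],
}

index = build_index(references)
votes = vote(index, query)
best, count = votes.most_common(1)[0]
```

A consistent time shift across many exact matches is what separates a true copy from isolated, coincidental code collisions.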
Matching segment start
Final score
Matching segment end
Experimental section
Database
NIST-TRECVID 2010-2011 data for video-copy detection
400h reference videos
About 1400 audio queries per year (201 unique videos × 7 transformations = 1407)
7 audio transformations
Original
MP3 compression
MP3 compression + multiband companding
Bandwidth limit (500 Hz to 3 kHz) + single-band companding
Mixed with speech
Mixed with speech + multiband companding
Mixed with speech + bandwidth limit + single-band companding
Scores histogram
Conclusions
A novel binary fingerprint is proposed to
address some shortcomings of well-regarded
prior art.
We show that the fingerprint can be extracted and used
for video-copy detection (VCD) with excellent results