April, 2018
Short-Time Fourier Transform
Yi-Hsuan Yang Ph.D.
http://www.citi.sinica.edu.tw/pages/yang/
yang@citi.sinica.edu.tw
• MATLAB code
─ [a,sr] = audioread('…') % sr = sampling rate (audioread replaces the removed wavread)
─ length(a) % length of the signal in number of samples
─ length(a)/sr % length of the signal in seconds
2
Sampling Rate (Cont'd)
• MATLAB code
─ a2 = downsample(a,2); % keep every 2nd sample (no anti-aliasing filter)
─ sr2 = sr/2;
─ length(a2) % length of the signal in number of samples
─ length(a2)/sr2 % length of the signal in seconds
─ audiowrite('test.wav',a2,sr2) % audiowrite replaces the removed wavwrite
3
Sinusoids
• MATLAB code
─ sr = 200;
─ t = 0:1/sr:1;
─ f0 = 10; % frequency
─ a = 1; % amplitude
─ y = a*sin(2*pi*f0*t + pi/2);
─ stem(t,y)
• Why?
sin(2*pi*f0*t + pi/2) = 1 when t = 0, 1/f0, 2/f0, 3/f0, 4/f0, …, so the period is 1/f0
frequency = inverse of the period
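The claim above can be checked numerically. The following is a minimal sketch that reuses the slide's values (sr = 200, f0 = 10); the variable names `peak_idx`, `peak_times`, and `period` are illustrative and not part of the original slides.

```matlab
% Minimal sketch: confirm that the sinusoid peaks every 1/f0 seconds.
sr = 200;                     % sampling rate (Hz), as on the slide
t  = 0:1/sr:1;                % time axis (s)
f0 = 10;                      % frequency (Hz)
y  = sin(2*pi*f0*t + pi/2);   % same signal as on the slide (a = 1)

peak_idx   = find(y > 1 - 1e-9);      % samples where y is numerically 1
peak_times = t(peak_idx);             % times of the peaks
period     = mean(diff(peak_times));  % spacing between consecutive peaks

fprintf('period = %.3f s, 1/period = %.1f Hz\n', period, 1/period);
```

The spacing between peaks is 1/f0 seconds, so its inverse recovers f0.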
4
Nyquist–Shannon Sampling Theorem
• A signal must be sampled at a rate of at least twice its
bandwidth to reconstruct the waveform accurately;
otherwise, the high-frequency content will alias to a
frequency inside the spectrum of interest
• Sampling freq > 2* the highest freq in the signal
http://zone.ni.com/reference/en-XX/help/370524T-01/siggenhelp/fund_nyquist_and_shannon_theorems/
5
Nyquist–Shannon Sampling Theorem
f0 = 10
y = sin(2*pi*f0*t)
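Aliasing can be illustrated with a short experiment. This is a sketch under assumed values: the sampling rate of 100 Hz and the 90 Hz tone are chosen here for illustration only; f0 = 10 is the slide's value. Because 90 Hz exceeds sr/2 = 50 Hz, its samples coincide (up to sign) with those of a 10 Hz tone.

```matlab
% Minimal sketch of aliasing (sr and f_hi are illustrative assumptions).
sr   = 100;                 % sampling rate (Hz)
N    = 100;                 % number of samples (1 second)
t    = (0:N-1)/sr;          % sampling instants (s)
f0   = 10;                  % below sr/2: sampled correctly
f_hi = 90;                  % above sr/2: will alias to sr - f_hi = 10 Hz

y_ok = sin(2*pi*f0*t);
y_hi = sin(2*pi*f_hi*t);

% The 90 Hz samples equal the negated 10 Hz samples.
max_diff = max(abs(y_hi + y_ok));

% Locate the spectral peak of the undersampled tone.
Y = abs(fft(y_hi));
[~, k] = max(Y(1:N/2+1));
f_peak = (k-1)*sr/N;        % apparent frequency in Hz

fprintf('max |y_hi + y_ok| = %.2e, apparent frequency = %g Hz\n', max_diff, f_peak);
```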
6
Nyquist–Shannon Sampling Theorem
• Telephone audio: sampled at 8 kHz
Over the phone, we cannot hear frequencies higher than 4 kHz
https://www.quora.com/How-do-HRT-sex-reassignment-and-other-such-procedures-affect-vocal-production-particularly-the-singing-voice
7
Nyquist–Shannon Sampling Theorem
http://altered-states.net/barry/update236/
8
Fourier Transform
• To get the spectrum of a signal
• MATLAB code
─ https://www.mathworks.com/help/matlab/ref/fft.html
─ doc fft
─ Y = abs(fft(y));
9
Fourier Transform
• MATLAB code
─ x1 = 0.7*sin(2*pi*50*t);
─ x2 = sin(2*pi*120*t);
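The two components can be recovered from the spectrum of their sum. The sketch below assumes a sampling rate of 1000 Hz and a 1-second signal, neither of which appears on the slide, and computes a single-sided amplitude spectrum with `fft`.

```matlab
% Minimal sketch: single-sided amplitude spectrum of x1 + x2.
sr = 1000;                    % sampling rate (Hz), assumed
N  = 1000;                    % number of samples (1 second), assumed
t  = (0:N-1)/sr;

x1 = 0.7*sin(2*pi*50*t);      % 50 Hz, amplitude 0.7 (from the slide)
x2 = sin(2*pi*120*t);         % 120 Hz, amplitude 1 (from the slide)
x  = x1 + x2;

X  = abs(fft(x))/N;           % two-sided magnitude, normalized
X1 = X(1:N/2+1);              % keep non-negative frequencies
X1(2:end-1) = 2*X1(2:end-1);  % fold in the negative-frequency half
f  = (0:N/2)*sr/N;            % frequency axis (Hz)

[~, order] = sort(X1, 'descend');
top2   = sort(f(order(1:2))); % frequencies of the two largest peaks
amp50  = X1(f == 50);
amp120 = X1(f == 120);

fprintf('peaks at %g Hz and %g Hz, amplitudes %.2f and %.2f\n', ...
        top2(1), top2(2), amp50, amp120);
```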
10
Fourier Transform
• Problem: cannot “localize” signal of interest
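One way to see the localization problem: two signals that contain the same tones in a different order have the same magnitude spectrum. The sketch below uses assumed values (sr = 1000, tones at 50 and 120 Hz, half a second each) chosen for illustration.

```matlab
% Minimal sketch: the magnitude spectrum does not reveal WHEN a tone occurs.
sr = 1000;                    % sampling rate (Hz), assumed
t  = (0:sr/2-1)/sr;           % half a second

a = sin(2*pi*50*t);           % 50 Hz segment
b = sin(2*pi*120*t);          % 120 Hz segment

y_ab = [a b];                 % 50 Hz first, then 120 Hz
y_ba = [b a];                 % 120 Hz first, then 50 Hz

% y_ba is a circular shift of y_ab, which changes only the phase of the DFT.
mag_diff = max(abs(abs(fft(y_ab)) - abs(fft(y_ba))));

fprintf('largest difference between magnitude spectra: %.2e\n', mag_diff);
```

Both signals yield the same plot of abs(fft(...)), even though their waveforms differ in time.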
11
Short Time Fourier Transform (STFT)
• “Windowed” version of the Fourier Transform
• Output: a time-frequency representation
• MATLAB code
─ https://www.mathworks.com/help/signal/ref/spectrogram.html
─ doc spectrogram
─ spectrogram(y,window,noverlap,nfft)
─ spectrogram(y,100,50,100,sr,'yaxis')
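The idea behind the windowed transform can be sketched by hand in base MATLAB. This is an illustrative sketch, not the toolbox's implementation: the test signal (a 100 Hz sine at sr = 1000) is an assumption, and the Hamming window is written out explicitly so that no toolbox function is needed.

```matlab
% Minimal hand-rolled STFT sketch (window = 100, overlap = 50, nfft = 100).
sr = 1000;                              % sampling rate (Hz), assumed
t  = (0:sr-1)/sr;                       % 1 second
y  = sin(2*pi*100*t);                   % test signal: 100 Hz sine, assumed

win_size = 100;                         % window length (samples)
hop_size = 50;                          % window shift = win_size - noverlap
nfft     = 100;                         % FFT length

% Hamming window written out explicitly.
w = 0.54 - 0.46*cos(2*pi*(0:win_size-1)/(win_size-1));

n_frames = floor((length(y) - win_size)/hop_size) + 1;
S = zeros(nfft/2+1, n_frames);          % rows: frequency bins, columns: frames

for m = 1:n_frames
    start  = (m-1)*hop_size + 1;                 % first sample of this frame
    frame  = y(start:start+win_size-1) .* w;     % windowed segment
    F      = fft(frame, nfft);
    S(:,m) = abs(F(1:nfft/2+1)).';               % keep non-negative frequencies
end

freqs = (0:nfft/2)*sr/nfft;             % frequency of each row (Hz)
[~, k] = max(S(:,1));
fprintf('%d bins x %d frames, peak of frame 1 at %g Hz\n', ...
        size(S,1), size(S,2), freqs(k));
```

Each column of `S` is the spectrum of one windowed segment, which gives the time-frequency representation described above.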
13
Short Time Fourier Transform (STFT)
• window size = 100
16
Short Time Fourier Transform (STFT)
• Hop size
hop_size = win_size
hop_size = 0.5*win_size
hop_size = 0.1*win_size
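The three hop settings translate into overlap and time step as follows. This sketch assumes win_size = 100 and sr = 1000 for illustration; `noverlap` is the number of samples shared by consecutive windows, which is the quantity the `spectrogram` call takes as its third argument.

```matlab
% Minimal sketch: hop size expressed as overlap and as a time step.
sr       = 1000;                       % sampling rate (Hz), assumed
win_size = 100;                        % window length (samples), assumed
ratios   = [1 0.5 0.1];                % hop_size / win_size, from the slide

hop_size = round(ratios*win_size);     % hop in samples
noverlap = win_size - hop_size;        % samples shared by adjacent windows
hop_ms   = 1000*hop_size/sr;           % hop in milliseconds

for i = 1:numel(ratios)
    fprintf('hop = %3d samples, overlap = %2d samples, step = %3d ms\n', ...
            hop_size(i), noverlap(i), hop_ms(i));
end
```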
18
Quiz Time
1. When the sampling rate (sr) is 1k Hz, what would be the time
interval ∆t (in seconds) between two neighboring samples?
19
Quiz Time
3. When sr = 1k Hz and we use an STFT window size of 100 samples
with no overlap between consecutive windows, how many times do
we need to move the window to cover a signal with 300 samples?
4. And, if there is 50% overlap between windows, how many times
do we need to move the window?
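The counting in questions 3 and 4 can be sketched as below. This assumes that only full windows are used (no zero-padding at the end). It reports both the number of window positions and the number of shifts between them, since "how many times we move the window" can be read either way.

```matlab
% Minimal sketch: number of window positions needed to cover a signal.
n_samples = 300;                          % signal length (samples)
win_size  = 100;                          % window length (samples)
hops      = [win_size, win_size/2];       % no overlap, 50% overlap

% Full windows only (assumption): last window must end within the signal.
n_windows = floor((n_samples - win_size)./hops) + 1;
n_shifts  = n_windows - 1;                % moves after the first position

fprintf('no overlap : %d positions (%d shifts)\n', n_windows(1), n_shifts(1));
fprintf('50%% overlap: %d positions (%d shifts)\n', n_windows(2), n_shifts(2));
```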
20
Quiz Time
5. Given the following spectrograms, try to draw the corresponding
waveforms
21
Quiz Time
5. Given the following spectrograms, try to draw the corresponding
waveforms (SOLUTION)
22
Quiz Time
6. Given the following spectrograms, try to draw the corresponding
spectra computed by the Fourier Transform
23
Quiz Time
6. Given the following spectrograms, try to draw the corresponding
spectra computed by the Fourier Transform (SOLUTION)
24
Quiz Time
• MATLAB code for (c)
─ sr = 1e3; f = 100;
─ t1 = 1/sr:1/sr:1; t2 = 1/sr:1/sr:0.5;
─ y1 = [sin(2*pi*f*t2) zeros(1,1.5*sr)]; % 100 Hz during the first 0.5 s
─ y2 = [zeros(1,sr) sin(2*pi*f/2*t1)]; % 50 Hz during the last 1 s
─ y = y1 + y2;
─ t = (1:length(y))/sr; % time axis for the 2-second signal
─ figure(1), spectrogram(y,256,250,256,sr,'yaxis')
─ figure(2), plot(t,y)
─ figure(3), NFFT = 2^nextpow2(length(y));
─ Y = fft(y,NFFT)/length(y);
─ ff = sr/2*linspace(0,1,NFFT/2+1);
─ plot(ff,2*abs(Y(1:NFFT/2+1)))
25
Understanding STFT
• Different window size (win_size = 50, 100, 150)
30
Understanding STFT
• temporal_resolution: hop_size
longer window → worse temporal resolution
temp_resolution = 25, 50, 75 (ms) for win_size = 50, 100, 150 (sr = 1000, hop_size = win_size/2)
31
Trade-off Between Temp/Freq Resolution
• sr = 1000; hop_size = win_size/2;
win_size (samples)   freq_resolution (Hz)   temp_resolution (ms)
50                   20                     25
100                  10                     50
150                  6.6667                 75
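The table values follow from two formulas: freq_resolution = sr / win_size and temp_resolution = hop_size / sr. A minimal sketch that reproduces them (the variable names `freq_res` and `temp_res_ms` are illustrative):

```matlab
% Minimal sketch: reproduce the trade-off table.
sr       = 1000;                    % sampling rate (Hz)
win_size = [50 100 150];            % window sizes (samples)
hop_size = win_size/2;              % hop_size = win_size/2, as on the slide

freq_res    = sr./win_size;         % frequency resolution (Hz)
temp_res_ms = 1000*hop_size/sr;     % temporal resolution (ms)

for i = 1:numel(win_size)
    fprintf('%4d samples: %8.4f Hz, %3d ms\n', ...
            win_size(i), freq_res(i), temp_res_ms(i));
end
```

A longer window narrows the frequency bins but lengthens the time step, which is the trade-off the table shows.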
34
https://newt.phys.unsw.edu.au/jw/notes.html
Quiz Time
6. Given a music signal with sr =
44,100 Hz, when we use a
window size of 1,024 samples,
what would be the frequency
resolution?
7. According to the figure on the
right, we know that the
fundamental frequency (f0) of
A1 is 55 Hz, that of A♯1 is
58.27 Hz, etc. Following the
previous question, which
notes does the first frequency
bin in the STFT cover?
35
https://newt.phys.unsw.edu.au/jw/notes.html
Quiz Time
6. Given a music signal with sr = 44,100 Hz, when we use a
window size of 1,024 samples, what would be the frequency
resolution? Sol: 43.1 Hz
7. According to the figure on the right, we know that the
fundamental frequency (f0) of A1 is 55 Hz, that of A♯1 is
58.27 Hz, etc. Following the previous question, which
notes does the first frequency bin in the STFT cover?
Frequency bins: [0, 43.1), [43.1, 86.2), [86.2, 129.3), [129.3, 172.4), [172.4, 215.5)
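The bin boundaries and the notes falling into the first bin can be computed as below. This is a sketch under stated assumptions: note frequencies follow twelve-tone equal temperament with A4 = 440 Hz (which gives A1 = 55 Hz, as in the question), notes are indexed by MIDI number, and only the piano range A0 to C8 (MIDI 21 to 108) is considered.

```matlab
% Minimal sketch: which notes land in the first STFT frequency bin?
sr       = 44100;                      % sampling rate (Hz)
win_size = 1024;                       % window size (samples)
df       = sr/win_size;                % frequency resolution, about 43.07 Hz

midi   = 21:108;                       % piano range A0..C8 (assumption)
f_note = 440*2.^((midi - 69)/12);      % equal temperament, A4 = 440 Hz (assumption)

bin_idx = floor(f_note/df);            % 0-based bin index of each note
first_bin_notes = midi(bin_idx == 0);  % notes with f0 in [0, df)

fprintf('df = %.2f Hz; first bin holds %d notes (MIDI %d to %d, %.2f to %.2f Hz)\n', ...
        df, numel(first_bin_notes), first_bin_notes(1), first_bin_notes(end), ...
        f_note(1), f_note(numel(first_bin_notes)));
```

Under these assumptions, several of the lowest notes share one bin, so a 1,024-sample window cannot separate them. Changing `win_size` shows how the bins narrow with longer windows.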
36
https://newt.phys.unsw.edu.au/jw/notes.html
Quiz Time
8. Given a music signal with sr = 44,100 Hz, what if we use a
window size of 4,096 samples?
Frequency bins: [0, 10.8), [10.8, 21.5), [21.5, 32.3), [32.3, 43.1), [43.1, 53.8), …
38
Feature Extraction
• Spectrogram → mel-spectrogram → MFCC (timbre)
• Spectrogram → CQT → chroma feature (harmony)
“Feature learning and deep architectures: new directions for music informatics,” J Intell Inf Syst (2013)
https://link.springer.com/content/pdf/10.1007%2Fs10844-013-0248-5.pdf
39
Feature Learning by Convolutional Layers
40