
Compressive sampling of speech signals using Bessel basis

Senthil Murugan S∗, Manoj C, Sathiesh Kumaar L, Magesh S
Electronics and Communication Engineering
Amrita Vishwa Vidyapeetham
Coimbatore, India
∗ amrita.senthil@gmail.com

Rakesh Peter†
Computational Engg. and Networking
Amrita Vishwa Vidyapeetham
Coimbatore, India
† rakesh.peter@gmail.com

Abstract—Compressive sampling is an evolving approach for sampling signals at rates much lower than the Nyquist rate. Compressive sampling of a signal requires a suitable basis function under which the signal becomes sparse after transformation. This paper proposes compressive sampling of speech signals using the Bessel function as the basis function for transforming the signal from the time domain. Using the Bessel basis, it is shown that a speech signal can be sampled at a rate below the Nyquist rate. The Signal-to-Distortion Ratio (SDR) of speech signals reconstructed using the Bessel basis and the sinusoidal basis is compared for compressive sampling rates of 8000 Hz, 6400 Hz and 3200 Hz. The results show that the proposed Bessel basis performs far better than the sinusoidal basis in terms of signal reconstruction.

Index Terms—Sampling, Compressive sensing (CS), Bessel function, Sampling of speech signal

I. INTRODUCTION

Compressing speech signals with signal processing techniques after sampling the analog signal at twice its maximum frequency (the Nyquist-Shannon sampling theorem) is generally accepted and well documented in the literature. Although the amount of data is reduced after compression, the amount of data that must be sampled in these methods is generally large. To avoid this waste of sensing resources, work on compressive sampling has been progressing in recent years. Compressive sampling [1] takes only a small number of samples directly from the sampling process for reconstruction of the speech signal.

For compressive sampling, the signal has to be expanded sparsely with suitable basis functions. Generally, sinusoids are used as basis functions for expanding speech signals in speech recognition, speaker identification and verification, etc. However, sinusoidal expansions, i.e. Fourier transforms, do not give sparse coefficients on expansion. Though there are many works on compressive sampling of audio signals [14], [17], [18], papers on speech signals [13], [14] are rare because of this problem. In [13], compressive sampling is done only for sparsely excited speech, using the Matching Pursuit method.

In this paper, we propose the zeroth-order Bessel function, which is decaying in nature, as the basis function for compressive sampling of speech signals, as it captures the characteristics of speech better than sinusoids [9] and produces sparse coefficients. Our proposed basis function works not only for sparsely excited speech but also for voiced speech. We use the 'Basis pursuit' framework [3] for CS recovery.

The paper is organised as follows. In Section II, we give the mathematical model for compressive sampling. Section III describes the characteristics of the Bessel function and the Fourier Bessel transform for speech signals. Section IV proposes the compressive sampling of speech signals using the Bessel function as basis. The experiment details and results are dealt with in Section V. Finally, the conclusion is given in Section VI, followed by an overview of future work in Section VII.

II. MATHEMATICAL MODEL FOR COMPRESSIVE SAMPLING

A. Compressive sampling

Compressive sampling relies on the fact that many natural signals are sparse under a suitable basis function for sampling and reconstruction. Consider a discrete time signal $x$ of $N$ samples as a vector in the vector space $\Re^N$. Let $\Psi = [\psi_1 | \psi_2 | \psi_3 \ldots \psi_N]$ be an $N \times N$ basis matrix where $\psi_1, \psi_2, \ldots, \psi_N$ are basis vectors of the space $\Re^N$. Then

$$x = \Psi s \tag{1}$$

where $s = [s_1\ s_2\ \ldots\ s_N]$ is a vector of length $N$ containing the projection coefficients $s_i = \langle x, \psi_i \rangle$.

The vector $x$ is said to be $K$-sparse in the $\Psi$ domain if $K$ non-zero coefficients of $s$ are enough to reconstruct $x$ with minimal distortion. Conventional signal acquisition is therefore redundant, and compressive sampling removes this redundancy by taking only $M$ samples as follows.

Let $y$ be the measurement vector of length $M$ given by $y = \Phi x$, where $\Phi$ is an $M \times N$ measurement matrix and $K < M \ll N$. By equation (1), $y$ is written as

$$y = \Phi \Psi s = \Theta s, \quad \text{where } \Theta = \Phi \Psi \tag{2}$$

$\Phi$ has to be selected carefully so that $\Phi$ and $\Psi$ are incoherent, which is the essential condition (Restricted Isometry Property) for reconstructing $x$ from $y$ [4]. In [1], [2], [3], it is shown that a Gaussian matrix of size $M \times N$ satisfies the isometry property for almost any $\Psi$ with high probability if $M \ge cK\log(N/K)$ for a small constant $c$.
B. Reconstruction Algorithm

By equation (1), reconstructing the signal $x$ in the space $\Re^N$ is equivalent to reconstructing the sparse vector $s$ in the $\Psi$ domain. Unfortunately, finding the exact $s$ in $\Re^N$ that satisfies $\Theta s' = y$ is not trivial, because $\Theta s' = y$ holds for all $s' = s + r$, where $r$ is any vector in the null space $\mathcal{N}(\Theta)$ of $\Theta$. However, an approximate solution can be obtained using $l_1$ norm minimization, given by

$$\hat{s} = \arg\min \|s'\|_1 \quad \text{with constraint } \Theta s' = y \tag{3}$$

This convex optimization problem can be solved using a linear programming algorithm called Basis pursuit [3]. Many other reconstruction algorithms are proposed in [5] and [6]. As we concentrate mostly on finding a basis function for CS of speech signals rather than on optimization techniques, we adopted the basis pursuit algorithm for reconstruction of the speech signal.

[Fig. 1. Bessel function of First kind. Plot titled "Bessel Basis": amplitude (about -0.5 to 1) versus x (0 to 100).]
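The non-uniqueness argument and the role of the $l_1$ criterion in (3) can be seen on a toy problem. The sketch below is not basis pursuit itself: it assumes a small 2 x 3 matrix `THETA` whose null space is one-dimensional, and it replaces the linear program with a brute-force scan over that one-parameter solution family. All values are illustrative assumptions.

```python
# Toy illustration of equation (3); THETA, S_TRUE and the scan grid are assumed values.
THETA = [[1.0, 1.0, 0.0],
         [0.0, 1.0, 1.0]]          # M = 2 measurements, N = 3 unknowns
S_TRUE = [0.0, 1.0, 0.0]           # 1-sparse coefficient vector
R_NULL = [1.0, -1.0, 1.0]          # spans the null space: THETA applied to R_NULL is 0


def matvec(matrix, vector):
    """Matrix-vector product for lists of lists."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def l1_norm(vector):
    return sum(abs(v) for v in vector)


def candidate(t):
    """A solution s' = s + t * r; every t reproduces the same measurements y."""
    return [s + t * r for s, r in zip(S_TRUE, R_NULL)]


def min_l1_solution(t_values):
    """Brute-force scan over the solution family (stand-in for a linear program)."""
    return min((candidate(t) for t in t_values), key=l1_norm)


y = matvec(THETA, S_TRUE)
t_grid = [k / 100.0 for k in range(-200, 201)]
s_hat = min_l1_solution(t_grid)    # the sparse vector S_TRUE has the smallest l1 norm
```

Every `candidate(t)` satisfies the constraint, yet only `t = 0` minimizes the $l_1$ norm, which here equals $2|t| + |1 - t|$. For realistic sizes the scan is replaced by a linear programming solver.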
III. FOURIER BESSEL SERIES

A. Bessel function

Bessel functions are the solutions of the differential equation

$$x^2 y'' + x y' + (x^2 - n^2) y = 0, \quad n \ge 0 \tag{4}$$

which is called Bessel's differential equation. The solution of this equation is given by

$$y = C_1 J_n(x) + C_2 Y_n(x) \tag{5}$$

where $J_n(x)$ is called the Bessel function of the first kind of order $n$ and $Y_n(x)$ is called the Bessel function of the second kind of order $n$.

The Bessel function of the first kind is given by

$$J_n(x) = \sum_{r=0}^{\infty} \frac{(-1)^r (x/2)^{n+2r}}{r!\,\Gamma(n+r+1)} \tag{6}$$

Figure 1 clearly shows that the Bessel function of the first kind resembles a damped sinusoid, i.e. its amplitude decays as $x$ increases. It is observed that all physical systems, including the speech production system, have a naturally decaying impulse response. When a signal is projected onto a basis that resembles the signal itself, the projected coefficients will be sparse. Hence, the Bessel function, which has the characteristics of a speech signal and satisfies the orthogonality property [7], can be used as a basis function for expanding speech signals.

[Fig. 2. Fourier Bessel transform and Fourier transform of speech signal. Panels: "Speech signal in time domain", "Fourier Bessel transform of speech signal", "Fourier transform of speech signal"; horizontal axes run from 0 to 350.]

B. Fourier Bessel expansion for speech signal

Consider a signal $s(t)$ defined over an arbitrary time interval $(0, a)$. This signal can be expressed in terms of Bessel functions as

$$s(t) = \sum_{m=1}^{Q} C_m J_0(\lambda_m t / a), \quad 0 < t < a \tag{7}$$

where $J_0(\lambda_m t/a)$ are the zero-order Bessel functions, $\lambda_m$ for $m = 1, 2, 3, \ldots$ are the positive roots of $J_0(\lambda) = 0$ in ascending order, and $Q$ is the order of the Bessel expansion.

The coefficients $C_m$, called Fourier Bessel (FB) coefficients, are computed as

$$C_m = \frac{2}{a^2 J_1^2(\lambda_m)} \int_0^a t\, s(t)\, J_0(\lambda_m t / a)\, dt \tag{8}$$

where $J_1(\lambda_m)$ are the first-order Bessel functions.

The Fourier Bessel expansion of speech signals has already been explored in many areas of speech processing like speech
coding [9], speech recognition, speaker verification and identification systems [10], [11], [12]. The paper [9] discusses compression of speech signals using the FB expansion. However, this work is the first to apply the same technique to compressive sampling.

IV. COMPRESSIVE SAMPLING OF SPEECH USING BESSEL BASIS

A. Design of Bessel basis matrix

As already mentioned, compressive sampling of any signal in the vector space $\Re^N$ starts with the formulation of the $N \times N$ basis matrix $\Psi$, whose $N$ column vectors are the basis vectors for $\Re^N$. The Bessel basis vector $\psi_i$ is given by

$$\psi_i = \begin{bmatrix} J_0(\lambda_i / N) \\ J_0(2\lambda_i / N) \\ \vdots \\ J_0(N\lambda_i / N) \end{bmatrix} \tag{9}$$

where $\lambda_i$ is the $i$th positive root of $J_0(\lambda) = 0$. The Bessel basis matrix $\Psi = [\psi_1 | \psi_2 | \ldots \psi_N]$ is then

$$\Psi = \begin{bmatrix} J_0(\lambda_1 / N) & J_0(\lambda_2 / N) & \cdots & J_0(\lambda_N / N) \\ J_0(2\lambda_1 / N) & J_0(2\lambda_2 / N) & \cdots & J_0(2\lambda_N / N) \\ \vdots & \vdots & \ddots & \vdots \\ J_0(N\lambda_1 / N) & J_0(N\lambda_2 / N) & \cdots & J_0(N\lambda_N / N) \end{bmatrix} \tag{10}$$

Now, if the discretized speech signal is projected onto the inverse Bessel matrix $\Psi^{-1}$, the vector $s$ containing the sparse Bessel coefficients is obtained.

B. Design of measurement matrix

The measurement matrix should be formulated with care, so that the dimensionality reduction from $\Re^N$ to $\Re^M$ does not affect signal reconstruction. In our work, instead of designing a separate measurement matrix, we design the matrix $\Theta$ of size $M \times N$ directly, by randomly selecting $M$ rows of the Bessel matrix $\Psi$ and multiplying them by Gaussian-distributed random weights. The random measurement vector $y$, i.e. the compressive samples, is then obtained as

$$y = \Theta s \tag{11}$$

C. Optimization technique

The papers [3], [5] give various methods for signal recovery by convex programming from highly incomplete information. We adopt basis pursuit (also called $l_1$ minimization with equality constraints) for our work. The framework of basis pursuit for speech recovery is stated as

$$\min \|s\|_1 \quad \text{subject to} \quad \Theta s = y, \qquad \text{where } \|s\|_1 := \sum_i |s_i|$$

There are two ways in which the convex problem can be solved with less complexity: (1) by recasting it into a Linear Programming Problem (LPP), or (2) by recasting it into a Second Order Cone Problem (SOCP). The condition for reducing the convex problem to an LPP is that $s$, $\Theta$ and $y$ have real-valued entries [20]. One remarkable property of the Fourier Bessel expansion is that the coefficients in the Bessel domain are all real-valued [11]. Also, while designing the measurement matrix, we make sure it contains only real values. So, the convex optimization problem can be solved using linear programs. This is another advantage of the Bessel basis, as the sinusoidal basis does not have these features.

V. EXPERIMENTAL RESULTS

A. Experiments

Voiced speech is used for testing the proposed method of compressive sampling of speech. We sampled the signal at 16000 Hz for simulation in MATLAB. Since the speech signal is non-stationary in nature, it is split into frames $x$ of length 20 ms, which corresponds to 320 samples.

1) Compressive sampling of speech for M = 160: The matrix $\Theta$ of size 160 x 320 is designed as described previously and applied to $x$ to get the compressively sampled vector $y$ of size 160, resulting in a compressive sampling rate of 8000 Hz. To reconstruct the speech vector $x$, the sparse vector $\hat{s}$ is estimated by $l_1$ minimization with equality constraints (also called basis pursuit). We used the l1-MAGIC MATLAB package [15] for reconstruction of $s$ via convex programming. The Bessel basis matrix $\Psi$ of size 320 x 320 is calculated following the steps given in the previous section and applied to the estimated sparse vector $\hat{s}$ of size $N$ to get the reconstructed speech vector $\hat{x}$. The SDR of the speech signal reconstructed using the Bessel basis is found to be 13 dB on average. The SDR of the speech reconstructed using the sinusoidal basis is found to be 7 dB on average. This clearly shows the efficiency of our proposed system compared with the existing model. The original speech signal and its reconstructions using compressive sampling for both the sinusoidal and the Bessel basis are plotted.

2) Compressive sampling of speech for M = 64: Compressive sampling for $M = 64$ corresponds to a sampling rate of 3200 Hz. Here, only the matrix $\Theta$ has to be designed, with a size of 64 x 320. The SDR values for the Bessel and sinusoidal bases are almost the same (2.5 dB on average) for this sampling rate. Nevertheless, the quality of the speech reconstructed using the Bessel basis is far superior to that of the sinusoidal basis, which can be seen from the graphs of the reconstructed speech signals. We also experimented with other compressive sampling rates of 6400 Hz, 9600 Hz and 12800 Hz, and the SDR values are compared for both basis functions.

B. Results

The plots containing the original speech signal (for 20 ms), the speech signal reconstructed using the Bessel basis and the speech signal reconstructed using the sinusoidal basis are given in this section for different CS rates. It can be seen from the plots that only a few initial values of the reconstructed speech differ much from the original speech signal.
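The building blocks used in Sections IV and V can be sketched in a few lines: the Bessel basis matrix of (9) and (10), the compressive sampling rate implied by $M$ and $N$, and an SDR measure. This is a minimal sketch and not the authors' MATLAB code. The helper names are hypothetical, $J_0$ is evaluated from the series (6), which is numerically reliable only for moderate arguments (a library routine such as `scipy.special.j0` would be the usual choice for $N = 320$), and the SDR formula $10\log_{10}(\|x\|^2 / \|x - \hat{x}\|^2)$ is an assumed common definition, since the paper does not state one.

```python
import math


def bessel_j0(x, terms=80):
    """J0(x) from the power series (6) with n = 0; adequate only for moderate x."""
    total, term = 0.0, 1.0            # term_r = (-1)^r (x/2)^(2r) / (r!)^2
    for r in range(terms):
        total += term
        term *= -((x / 2.0) ** 2) / ((r + 1) ** 2)
    return total


def j0_zeros(count, step=0.1):
    """First `count` positive roots of J0: sign-change scan, then bisection."""
    zeros, lo = [], step
    while len(zeros) < count:
        hi = lo + step
        if bessel_j0(lo) * bessel_j0(hi) < 0.0:
            a, b = lo, hi
            for _ in range(50):
                mid = 0.5 * (a + b)
                if bessel_j0(a) * bessel_j0(mid) <= 0.0:
                    b = mid
                else:
                    a = mid
            zeros.append(0.5 * (a + b))
        lo = hi
    return zeros


def bessel_basis(N):
    """Psi[k-1][i-1] = J0(k * lambda_i / N), following equations (9) and (10)."""
    lam = j0_zeros(N)
    return [[bessel_j0(k * lam_i / N) for lam_i in lam] for k in range(1, N + 1)]


def cs_rate_hz(M, N, fs_hz):
    """Compressive sampling rate when M of N samples per frame are kept."""
    return fs_hz * M / N


def sdr_db(x, x_hat):
    """Assumed SDR definition: 10 log10(||x||^2 / ||x - x_hat||^2)."""
    signal = sum(v * v for v in x)
    distortion = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return 10.0 * math.log10(signal / distortion)


N = 8                                  # small N keeps the series evaluation accurate
Psi = bessel_basis(N)
# Row k = N evaluates J0(lambda_i), so the last row is zero up to rounding.
rates = [cs_rate_hz(M, 320, 16000) for M in (160, 128, 64)]
```

The rates come out as 8000 Hz, 6400 Hz and 3200 Hz, matching the values quoted in the experiments. Note that with the row index running up to $N$, the last row of $\Psi$ consists of $J_0(\lambda_i) = 0$, so an implementation that inverts $\Psi$ would need a slightly different sample grid; the paper does not say how this was handled.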
[Fig. 3. Reconstructing speech signal using 160 samples. Panels: "Original speech signal", "Recovered signal taking only 160 samples - Bessel basis", "Recovered signal taking only 160 samples - Sinusoidal basis"; amplitude about -0.1 to 0.1 over samples 0 to 350.]

[Fig. 4. Reconstructing speech signal using 128 samples. Panels: "Original speech signal", "Recovered signal taking only 128 samples - Bessel basis", "Recovered signal taking only 128 samples - Sinusoidal basis"; amplitude about -0.1 to 0.1 over samples 0 to 350.]

[Fig. 5. Reconstructing speech signal using 64 samples. Panels: "Original speech signal", "Recovered signal taking only 64 samples - Bessel basis", "Recovered signal taking only 64 samples - Sinusoidal basis"; amplitude about -0.1 to 0.1 over samples 0 to 350.]

[Fig. 6. SDR values of reconstructed speech signal for various M/N ratio.]

The SDR values of the reconstructed speech for various M/N ratios are also provided, so that the compressive sampling rate can be selected according to the desired quality.

VI. SUMMARY AND CONCLUSION

In this paper, compressive sampling of speech using the Bessel function as basis is proposed. It is shown that the speech signal recovered using the Bessel basis outperforms the one recovered using the sinusoidal basis (Fourier transform). It is important to note that our proposed algorithm is the first of its kind to perform well on speech audio. The computational complexity involved in solving the convex optimization problem for the sparse vector is also reduced in our proposed method, owing to the real-valued Fourier Bessel coefficients. The quality of the reconstructed speech can be enhanced by improving the optimization techniques.

VII. FUTURE WORK

Our proposed method is being applied to tracking people in Wireless Sensor Networks. Using microphone sensors and a person's speech signal as identity, person tracking systems can be developed. A similar approach in [18] requires an audio signal source to be attached to the person being tracked. In our future work, we plan to use a USRP based Software Radio Basestation [21] to receive the compressed samples of the speech signal from wireless microphone sensor nodes and compute the sparse Bessel coefficients by convex optimization in the base station PC. The estimated sparse Bessel coefficients will be given to a speaker identification system [10], [12] without transforming back to the time domain, as opposed to the case in [16]. Compressive sampling of speech signals has many applications in security systems using Wireless Sensor Networks, and our proposed system will help further research in such areas.
REFERENCES

[1] Emmanuel J. Candès, “Compressive sampling,” Proceedings of the International Congress of Mathematicians, vol. 3, pp. 1433-1452, Madrid, Spain, 2006.
[2] Massimo Fornasier, Holger Rauhut, “Compressive Sensing,” Handbook of Mathematical Methods in Imaging (O. Scherzer Ed.), Springer, 2011.
[3] Emmanuel Candès, Justin Romberg, and Terence Tao, “Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information,” IEEE Trans. on Information Theory, vol. 52, no. 2, pp. 489-509, February 2006.
[4] Richard Baraniuk, “Compressive sensing,” IEEE Signal Processing Magazine, vol. 24, no. 4, pp. 118-121, July 2007.
[5] J. Haupt, R. Nowak, “Signal reconstruction from noisy random projections,” IEEE Trans. Inform. Theory, vol. 52, no. 9, pp. 4036-4048, Sept. 2006.
[6] J. Tropp, A.C. Gilbert, “Signal recovery from partial information via orthogonal matching pursuit,” Apr. 2005 [Online]. Available: http://www.personal.umich.edu/ jtropp/papers/TG06-Signal Recovery.pdf
[7] J. Schroeder, “Signal Processing via Fourier-Bessel series expansion,” Digital Signal Processing 3, 112-124, 1993.
[8] K. Gopalan, T. R. Anderson, “A Comparison of speaker identification results using features based on Cepstrum and Fourier-Bessel expansion,” IEEE Transactions on Speech and Audio Processing, vol. 7, no. 3, May 1999.
[9] K. Gopalan, “Speech coding using Fourier-Bessel expansion of speech signals,” IECON’01: The 27th Annual Conference of the IEEE Industrial Electronics Society.
[10] K. Gopalan, “Speaker identification using Bessel function expansion of speech signals,” Final Rep. Summer Faculty Res. Program, Armstrong Lab., AFOSR, Aug. 1993.
[11] K. Gopalan and T. R. Anderson, “Speech processing using Bessel functions,” Proc. Symp. Intelligent Systems in Communications and Power, Mayaguez, PR, Feb. 1994, pp. 255-259.
[12] K. Gopalan and T. R. Anderson, “Speaker Identification using Bessel function representation and a back-propagation neural network,” Proc. of the IEEE International Symposium on Industrial Electronics, 1995.
[13] T.V. Sreenivas, W. Bastiaan Kleijn, “Compressive sensing for sparsely excited speech signal,” Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, 2009, pp. 4125-4128.
[14] M. G. Christensen, J. Østergaard, and S. H. Jensen, “On compressed sensing and its applications to speech and audio signals,” Asilomar Conf., 2009.
[15] E. Candès and J. Romberg, “l1-magic: Recovery of sparse signals via convex programming,” code package available at www.l1-magic.org
[16] Anthony Griffin, Elen Karamichali, “Speaker Identification using sparsely excited speech signals and compressed sensing,” 18th European Signal Processing Conference, August 23-27, 2010.
[17] Laura Balzano, Robert Nowak, “Compressed Sensing Audio Demonstration,” in website http://sunbeam.ece.wisc.edu/csaudio/
[18] Anthony Griffin, Panagiotis Tsakalides, “Compressed sensing of Audio signals using multiple sensors,” 16th European Signal Processing Conference, August 25-29, 2008.
[19] Chong Luo, Feng Wu, “Compressive Data Gathering for Large-Scale Wireless Sensor Networks,” in Mobicom 2009.
[20] S.S. Chen, D.L. Donoho, “Atomic decomposition by basis pursuit,” SIAM J. Sci. Comput., 20:33-61, 1999.
[21] Ettus Research LLC, 1043 North Shoreline Blvd., Suite 100, Mountain View, CA 94043, www.ettus.com
