40th Southeastern Symposium on System Theory University of New Orleans New Orleans, LA, USA, March 16-18, 2008
MA2.5
Data Mapping onto Speech-like Signal to Transmission over the GSM Voice Channel
Mahsa Rashidi
MSc student at Electrical Engineering Department, Amirkabir University of Technology
m_rashidi@aut.ac.ir
Abolghasem Sayadiyan
Associate professor at Electrical Engineering Department, Amirkabir University of Technology
eea35@aut.ac.ir
Pejman Mowlaee
PhD student at Electrical Engineering Department, Amirkabir University of Technology
P_Mowlaee@ieee.org
Abstract- One of the most important objectives in mobile communication systems is secure voice and data communication (including text, picture, video and voice), especially at high bit rates. In this paper, a new procedure is proposed in which the intended data or voice is encrypted and modulated onto speech-like waveforms. The modulated waveforms are transmitted over the global system for mobile communications (GSM) voice channel and then demodulated and decrypted at the receiver. We propose an appropriate model for the GSM Full Rate (FR) speech codec by mapping data onto the fundamental parameters related to formants in a speech-like waveform, including phases, frequencies and pitch frequencies. The proposed model has been evaluated for a GSM-to-GSM connection. In our simulations we observed that the proposed approach results in a bit error rate (BER) of 0.02% when the Signal-to-Noise Ratio (SNR) is 15 dB in a 1.5 kbps channel. As a result, the proposed method can be considered a favorable choice for robustness to additive noise.
Keywords - formants, GSM, LSF, speech-like waveform.
1. INTRODUCTION
Hardware and protocol deficiencies are two drawbacks of the 2nd generation communications systems that make them capable of data transmission only at low bit rates (e.g. 1120 bits per page) for the Short Message Service (SMS) in the SS7 signaling channel. Moreover, as the data channel is available to a limited number of subscribers, data transmissions are possible only up to a maximum rate of 9.6 kbps. In contrast to the data channel, using a voice channel can result in negligible time delays, as reported in [1]. In addition, one of the most important problems in data transmission over the GSM voice channel is ensuring that the transmitted data is highly secure. To cope with this problem, the bit stream produced by a low bit rate speech coder implemented for voice channel adaptation usually enters a data encryption block [2]. The data is then modulated onto speech-like waveforms prior to entering the GSM network. The resulting waveform enters the first GSM handset, passes through the communications channel and then reaches the second GSM handset. The waveform received from the second handset is demodulated, decrypted and finally decoded [3]. The overall system block diagram is shown in Fig. 1. Data communications over the speech channel require modems with data transfer capability at low bit rates. As a result, based on the method proposed in this paper, an appropriate modem is presented. In some recent works by Katugampala reported in [3], codebooks containing the values of the speech-like waveform parameters are defined at the modulator side, including pitch frequency, Line Spectral Frequency (LSF) coefficients and frame energy. Next, these parameters are used for waveform synthesis. Finally, the encrypted data are mapped onto these waveforms. At the demodulator side, these parameters are derived from the received speech-like waveforms and compared with the codebook, and the best match is chosen [3]. However, that approach was adopted for the GSM Enhanced Full Rate (EFR) speech codec at 12.2 kbps, whereas the approach proposed in this paper is intended for the 13 kbps GSM FR speech codec as specified in ETSI GSM 06.10 [4]. This paper is organized as follows. In Section 2, the complete procedure of speech-like signal production and data mapping is presented. Section 3 is dedicated to synchronization. In Section 4 simulation results are reported. Section 5 concludes.
978-1-4244-1807-7/08/$25.00 2008 IEEE.
2. SPEECH-LIKE SIGNAL PRODUCTION PROCEDURE
We need to map the data bit stream onto speech-like waveforms of 20 ms length (equal to the frame length usually available in the GSM speech coder). Therefore, in this paper we produce speech-like waveforms based on an auto-regressive (AR) model. The waveforms should be produced with four formants so that they can be adapted to the GSM coder. As formants are sensitive to changes, and to simplify the extraction of signal characteristics in the demodulator, we prefer to parallelize the resulting transfer function according to the work in [5]. Finally, by applying an excitation signal to the resulting transfer function, an appropriate speech-like signal is produced. The transfer function of the ith formant is a second-order difference equation as follows:
$$H_i(z) = \frac{A_i}{1 - B_i z^{-1} - C_i z^{-2}} \qquad (1)$$

where

$$A_i = 1 + B_i + C_i, \qquad B_i = 2\exp(-\pi \Delta f_i / f_s)\cos(2\pi f_i / f_s), \qquad C_i = -\exp(-2\pi \Delta f_i / f_s)$$

where $f_s$ is the sampling frequency, and $f_i$ and $\Delta f_i$ are the formants' frequencies and bandwidths, respectively. In order to have a meaningful comparison between the received envelope spectrum and the one in the transmitter, the paralleling of $H_i(z)$ is done under the following conditions.
Firstly, we normalize the transfer function related to each formant at its central frequency:

$$k_i \left| H_i\left(e^{j\omega}\right) \right|_{\omega = 2\pi f_i / f_s} = 1, \qquad i = 1, 2, 3, 4 \qquad (2)$$

Secondly, another normalization should be applied to the resulting parallel formant transfer function, which can be written as below:

$$\alpha_n \left|H_1(e^{j\omega_n})\right| + \beta_n \left|H_2(e^{j\omega_n})\right| + \rho_n \left|H_3(e^{j\omega_n})\right| + \lambda_n \left|H_4(e^{j\omega_n})\right| = 1, \qquad n = 1, \dots, 4 \qquad (3)$$

where $\alpha_n$, $\beta_n$, $\rho_n$ and $\lambda_n$ are the normalized equation coefficients. Finally, the speech-like waveform is obtained from the overall spectral envelope by employing the harmonic model synthesis method based on [6]. As a result, the complete process for waveform production can be expressed as follows:

[Equation (4): harmonic-model synthesis of the waveform from $H_{Total}$ as a sum of cosine terms with amplitudes a, frequencies f and phases φ, involving $N_{ana}$ and $N_{fr}$; the exact expression is not recoverable from the source.]

where $H_{Total}(z)$ is the paralleled transfer function, $N_{ana}$ is the analysis window length, and $N_{fr}$ is the frame shift length. Note that the vectors a, f and φ in (4) are obtained from the M prominent peaks found through the peak picking procedure reported in [6]. Fig. 2 shows a prototype of a speech-like waveform of 20 ms length produced by the harmonic modeling approach discussed above.
Figure 1: Overview of the complete system (blocks include: bit stream, speech-like waveform, speech codec, channel, decoding, data demodulator).
Figure 2: Synthesized speech-like signal (x-axis: time in samples at 8 kHz).
2.2. Data mapping on speech-like signal
One of the key points is to correctly select the formant frequencies within the telephone voice band (300-3400 Hz). During our experiments we concluded that, among the formant parameters, only the frequencies and phases can be detected after the signal has passed through the speech-coded voice channel. We therefore explain in detail how to select the parameters and how to allocate data bits to the frequency and phase parameters. The 1st and 2nd formant frequencies are selected in the range 300 to 1000 Hz and are encoded by 3 bits. The third formant lies in the range 1400 to 2500 Hz and is coded by 3 bits, and the fourth formant lies in the range 2900 to 3400 Hz and is coded by 2 bits. Since the harmonic model is used in the proposed method, the formant frequencies discussed above should be selected as multiples of the pitch frequency, which results in a negligible error in extracting the information. The selection criteria are as follows:
1. The received signal amplitude should be more than 70 percent of the transmitted amplitude.
2. The frequency displacement of the received formants should not exceed a default frequency step for each formant. Otherwise, it causes incorrect extraction of the information mapped onto the resulting frequencies. Selecting the frequency steps as multiples of the given pitch frequency fulfills this condition. However, a larger frequency step is selected for the 4th formant due to its high sensitivity to displacement.
3. Adjacent formants must not lie close to each other. As a result, there are unusable band regions at the boundary between formants. This is due to the fact that the minimum distance between two adjacent formants is twice the bandwidth when their bandwidths are the same.
Due to the lack of fidelity of the GSM coder/decoder to formant bandwidths, we use a constant and identical bandwidth of Δf = 160 Hz throughout the synthesis process. As phase fidelity only holds for frequencies under 1 kHz, some information can be carried in the phases of the first and second formants. The difference between the phase extracted from the received signal envelope and the mapped phase at that particular frequency is coded with 3 bits. Another important parameter is the pitch frequency, whose selection depends on the choice of the synthesis window employed in the harmonic analysis procedure discussed earlier in Section 2.1. We observed that the pitch frequencies fp = 123 Hz and fp = 125 Hz result in acceptable performance. Therefore, we mapped 1 data bit onto the pitch frequency. Finally, each speech-like waveform is modulated by 12 data bits in a 20 ms frame. In addition, we demonstrate in the simulation results that the proposed technique achieves a bit rate of 600 bps.
2.3. Intra-frame Interpolation
In order to achieve phase continuity, which is an important characteristic of speech signals, and to avoid jumps at frame boundaries, it is necessary to overlap the speech-like waveforms produced with the above approach. It should also be ensured that the data bit streams on the speech-like signal remain undamaged. To this end, it is important to select the pitch period, i.e. 1/fp, which has a direct relation to the data mapped on each frame. The GSM codec performs a linear interpolation between the Log Area Ratio (LAR) coefficients of two adjacent frames (each frame consisting of 160 samples). To avoid spurious transients, it interpolates the LAR coefficients of the first 40 samples of the last frame with the LAR coefficients of the first 40 samples of the current frame [4]. This motivates the idea that adjacent frames should have the minimum overlap. The reason is that when a PCM waveform signal enters a GSM tandem connection, a high overlap between frames does not cause large changes in the reflection coefficients of each frame, which in turn causes incorrect detection of the transmitted data. Note that the overlapping samples in each frame should not be selected, in order to prevent inter-modulation effects. As a result, (5) presents the linear interpolation for the proposed modulator:

$$Y_1 = \frac{1 + a + 160 - n}{2a + 1}, \qquad Y_2 = \frac{n + a - 160}{2a + 1}, \qquad n = (160 - a), \dots, 160$$

$$Y_3 = \frac{m + a}{2a + 1}, \qquad Y_4 = \frac{1 + a - m}{2a + 1}, \qquad m = 1, \dots, (1 + a) \qquad (5)$$

where a equals the number of overlapping samples in each frame. Note that $Y_1$ and $Y_2$ are multiplied by samples (160 - a) to 160 of the last frame, $s_{i-1}$, and $Y_3$ and $Y_4$ are multiplied by samples 1 to (1 + a) of the current frame, $s_i$, as presented in (6), respectively.

$$L_1 = Y_1 \times s_{i-1}(k_1), \qquad L_2 = Y_2 \times s_{i-1}(k_1), \qquad L_3 = Y_3 \times s_i(k_2), \qquad L_4 = Y_4 \times s_i(k_2) \qquad (6)$$

$$Interpolation_1 = L_1 + L_2, \qquad Interpolation_2 = L_3 + L_4, \qquad Interpolation = Interpolation_1 + Interpolation_2$$

where $k_1$ and $k_2$ are the numbers of the interpolated samples, $Interpolation_1$ and $Interpolation_2$ are the overlapped samples of each frame, and $Interpolation$ is the whole set of interpolated samples of two adjacent frames. Note that this region should not be selected in the demodulator. Finally, an appropriate PCM waveform signal has been prepared to enter the speech-coded voice channel.
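As a small illustration of the weights in (5), the following Python sketch evaluates $Y_1$ to $Y_4$ for a given overlap a and a 160-sample frame. The function name `overlap_weights` and the example value a = 7 are chosen here for illustration only; the sketch covers the weights alone and does not attempt to reproduce the full modulator.

```python
FRAME_LEN = 160  # samples per 20 ms frame at 8 kHz


def overlap_weights(a, frame_len=FRAME_LEN):
    """Return the four weight ramps (Y1, Y2, Y3, Y4) of (5) as lists."""
    denom = 2 * a + 1
    ns = range(frame_len - a, frame_len + 1)  # n = (160 - a), ..., 160
    ms = range(1, a + 2)                      # m = 1, ..., (1 + a)
    y1 = [(1 + a + frame_len - n) / denom for n in ns]
    y2 = [(n + a - frame_len) / denom for n in ns]
    y3 = [(m + a) / denom for m in ms]
    y4 = [(1 + a - m) / denom for m in ms]
    return y1, y2, y3, y4


# Example with an assumed overlap of a = 7 samples.
y1, y2, y3, y4 = overlap_weights(7)
```

Each ramp has a + 1 entries, and the pairs are complementary: $Y_1 + Y_2 = 1$ and $Y_3 + Y_4 = 1$ at every index, with $Y_1$ starting at 1 and $Y_4$ ending at 0.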
3. Synchronization
One of the important aspects of the simulation is the synchronization of the system elements. To simulate the misalignment of the speech codec frames in the two base stations, we inserted a random number of samples before the signal is passed to the second codec. Then, to simulate the synchronization of the modulator and demodulator, a predefined synchronization sequence is sent from the modulator to the demodulator at the start of any communication. This sequence of samples is known to both. Since in the simulation it is known that there will be a synchronization sequence in the input signal, the synchronization module cross-correlates a fixed, predefined number of input samples at the beginning of the transmission with the predefined synchronization sequence. The sample sequence that best matches is used for synchronization in the demodulator.
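The cross-correlation search described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the simulation code of the paper: the ±1 synchronization sequence, its length of 64 samples, the 37-sample delay and the function name `find_sync_offset` are all hypothetical.

```python
import random


def find_sync_offset(received, sync_seq, search_len):
    """Return the lag at which sync_seq correlates best with received."""
    best_lag, best_score = 0, float("-inf")
    max_lag = min(search_len, len(received) - len(sync_seq))
    for lag in range(max_lag + 1):
        window = received[lag:lag + len(sync_seq)]
        score = sum(r * s for r, s in zip(window, sync_seq))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


rng = random.Random(0)
# Hypothetical synchronization sequence known to both sides.
sync_seq = [rng.choice((-1.0, 1.0)) for _ in range(64)]
# Hypothetical random misalignment inserted before the second codec.
delay = 37
payload = [rng.uniform(-0.1, 0.1) for _ in range(100)]
received = [0.0] * delay + sync_seq + payload

offset = find_sync_offset(received, sync_seq, search_len=160)
```

Here `offset` recovers the inserted delay, after which the demodulator can align its frame boundaries. A real channel would distort the sequence, so a normalized correlation may be a more robust choice than the raw sum used in this sketch.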
Figure 3: Synthesized and Received signals (x-axis: time in samples at 8 kHz; curves: synthesized signal, received signal).
4. Simulation Results
We tested our system on a GSM-to-GSM connection. In our simulation we generate a speech-like signal of 2.5 s length with the proposed method, consisting of about 120 different waveforms (each of 20 ms length). The best interpolation occurs at a = 7. To synchronize the modulator and demodulator, a predefined synchronization sequence is sent before the start of any transmission; synchronization occurs at the 22nd sample of this sequence. Next, the generated signal is passed from the modulator through the coder, channel, decoder and demodulator. Fig. 3 compares the 3rd frame of the synthesized and received signals. In order to extract the important speech-like parameters, we need the envelope spectrum of the received signal. Hence, Fig. 4 illustrates the envelope spectra of the signals. Note that Figs. 3 and 4 have been generated with a pitch frequency of 125 Hz, and the signals are not selected from the interpolated samples, as explained in Section 2.3. The four peaks with maximum amplitudes show the displacement of the central frequencies of the 3rd frame's formants between the synthesized and received signals, as depicted in Fig. 4.
Figure 4: Synchronized and Received envelopes (x-axis: frequency in Hz; curves: received spectral envelope, synchronized spectral envelope).
We achieved a throughput of 2 kbps with a bit error rate (BER) of 0.3%. Using a punctured 1/2-rate convolutional code with a constraint length of 7 on the 2 kbps stream, we achieved a 1.15 kbps channel with 0.02% BER for SNR = 15 dB. As a result, the proposed method can be considered a favorable choice due to its robustness to additive noise, as depicted in Fig. 5.
Figure 5: BER over fading channel with additive noise for BT = 0.3 (x-axis: SNR per bit, Eb/N0 (dB); curve: empirical).
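The envelope-based extraction used in this section can be illustrated with a short Python sketch. It builds a parallel envelope from four second-order resonators, using the pole coefficients $B_i$ and $C_i$ of Section 2 with Δf = 160 Hz, scales each resonator to unit gain at its own central frequency, and then picks the four largest local maxima. The formant values, the 5 Hz frequency grid and the function names are assumptions made for illustration; the gain term $A_i$ is omitted because the per-formant normalization makes it immaterial here, and the peak picking is a plain local-maximum search rather than the procedure of [6].

```python
import cmath
import math

FS = 8000.0  # sampling frequency in Hz
BW = 160.0   # common formant bandwidth in Hz


def resonator_gain(f_center, freq, bw=BW, fs=FS):
    """Magnitude of 1 / (1 - B z^-1 - C z^-2) at frequency freq."""
    b = 2.0 * math.exp(-math.pi * bw / fs) * math.cos(2.0 * math.pi * f_center / fs)
    c = -math.exp(-2.0 * math.pi * bw / fs)
    z_inv = cmath.exp(-2j * math.pi * freq / fs)
    return abs(1.0 / (1.0 - b * z_inv - c * z_inv * z_inv))


def envelope(formants, freqs):
    """Sum of resonator magnitudes, each scaled to 1 at its central frequency."""
    scales = [1.0 / resonator_gain(f, f) for f in formants]
    return [sum(k * resonator_gain(f, x) for k, f in zip(scales, formants))
            for x in freqs]


def pick_peaks(values, freqs, count=4):
    """Return the frequencies of the `count` largest local maxima, ascending."""
    peaks = [(values[i], freqs[i]) for i in range(1, len(values) - 1)
             if values[i - 1] < values[i] >= values[i + 1]]
    peaks.sort(reverse=True)
    return sorted(f for _, f in peaks[:count])


# Hypothetical formants, all multiples of a 125 Hz pitch frequency.
formants = [500.0, 1000.0, 2000.0, 3000.0]
freqs = [5.0 * k for k in range(801)]  # 0 to 4000 Hz in 5 Hz steps
peaks = pick_peaks(envelope(formants, freqs), freqs)
```

The detected peaks fall within a few hertz of the chosen formants; on a received signal, the offset between each peak and its transmitted value would be the frequency displacement that criterion 2 of Section 2.2 bounds.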
5. Conclusion
A robust method is proposed for secure data transmission over a GSM voice channel. The method is based on mapping the data onto the fundamental parameters related to formants in a speech-like waveform, including phases, frequencies and pitch, resulting in the transfer of 12 data bits per speech-like waveform using a frame size of 20 ms.
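The 12-bit frame payload summarized above can be sketched as a simple bit-field split. The field widths follow Section 2.2 (3 bits for the first and second formant frequencies, 3 bits for the third formant, 2 bits for the fourth, 3 bits for the phase and 1 bit for the pitch frequency); the ordering of the fields, the joint index for the first two formants and all names below are assumptions made for illustration, since the paper does not specify them.

```python
BITS_PER_FRAME = 12
FRAME_MS = 20
PITCH_HZ = (123, 125)  # the two pitch frequencies reported in Section 2.2


def bits_to_int(chunk):
    """Interpret a list of 0/1 values as an unsigned integer, MSB first."""
    value = 0
    for bit in chunk:
        value = (value << 1) | bit
    return value


def split_frame_bits(bits):
    """Split one 12-bit payload into parameter indices (assumed field order)."""
    if len(bits) != BITS_PER_FRAME or any(b not in (0, 1) for b in bits):
        raise ValueError("expected 12 binary digits")
    return {
        "f1_f2_index": bits_to_int(bits[0:3]),   # 1st and 2nd formants, 3 bits
        "f3_index": bits_to_int(bits[3:6]),      # 3rd formant, 3 bits
        "f4_index": bits_to_int(bits[6:8]),      # 4th formant, 2 bits
        "phase_index": bits_to_int(bits[8:11]),  # phase difference, 3 bits
        "pitch_hz": PITCH_HZ[bits[11]],          # pitch selection, 1 bit
    }


fields = split_frame_bits([1, 0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1])
bit_rate_bps = BITS_PER_FRAME * 1000 // FRAME_MS
```

With 12 bits carried in every 20 ms frame, `bit_rate_bps` evaluates to the 600 bps stated in Section 2.2.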
References
[1] M. Street, "Interoperability and international operation: An introduction to end to end mobile security", IEE Secure GSM and Beyond: End to End Security for Mobile Communications, London, Feb. 2003.
[2] M. Stefanovic, Y. D. Cho, S. Villette, and A. M. Kondoz, "A 2.4/1.2 kb/s speech coder with noise pre-processor", Proceedings EUSIPCO 2000, Tampere, Finland, pp. 4-8, Sept. 2000.
[3] N. Katugampala, S. Villette, and A. Kondoz, "Secure voice over GSM and other low bit rate systems", IEE Secure GSM and Beyond: End to End Security for Mobile Communications, London, Feb. 2003.
[4] J. Degener and C. Bormann, "GSM 06.10 lossy speech compression", ftp://ftp.cs.tu-berlin.de/pub/local/kbs/tubmik/gsm/gsm-1.0.10.tar.gz.
[5] D. H. Klatt, "Software for a cascade/parallel formant synthesizer", J. Acoust. Soc. Am., vol. 67, no. 3, pp. 971-996, Mar. 1980.
[6] R. J. McAulay and T. F. Quatieri, "Speech analysis/synthesis based on a sinusoidal representation", IEEE Trans. ASSP, vol. 34, pp. 744-754, Aug. 1986.