
40th Southeastern Symposium on System Theory University of New Orleans New Orleans, LA, USA, March 16-18, 2008

MA2.5

Data Mapping onto Speech-like Signal to Transmission over the GSM Voice Channel
Mahsa Rashidi
MSc student at Electrical Engineering Department, Amirkabir University of Technology
m_rashidi@aut.ac.ir

Abolghasem Sayadiyan
Associate professor at Electrical Engineering Department, Amirkabir University of Technology
eea35@aut.ac.ir

Pejman Mowlaee
PhD student at Electrical Engineering Department, Amirkabir University of Technology
P.Mowlaee@ieee.org

Abstract- One of the most important objectives in mobile communication systems is secure voice and data communication (including text, picture, video and voice), especially at high bit rates. In this paper, a new procedure is proposed in which the intended data or voice is encrypted and modulated onto speech-like waveforms. The modulated waveforms are transmitted over the global system for mobile communications (GSM) voice channel and then demodulated and decrypted at the receiver. We propose an appropriate model for the GSM Full Rate (FR) speech codec by mapping data onto the fundamental parameters related to formants in a speech-like waveform, including phases, frequencies and pitch frequencies. The proposed model has been evaluated for a GSM-to-GSM connection. In our simulations, the proposed approach results in a bit error rate (BER) of 0.02% when the signal-to-noise ratio (SNR) is 15 dB in a 1.5 kbps channel. As a result, the proposed method can be considered a favorable choice for robustness to additive noise.

Keywords - formants, GSM, LSF, speech-like waveform.

1. INTRODUCTION
Hardware and protocol deficiencies are two drawbacks of second-generation (2G) communication systems that make them capable of data transmission only at low bit rates (e.g. 1120 bits per page for the Short Message Service (SMS) in the SS7 signaling channel). Moreover, the data channel is available only to a limited number of subscribers, and data transmission over it is limited to a maximum rate of 9.6 kbps. In contrast to the data channel, using a voice channel can result in negligible time delays, as reported in [1]. In addition, one of the most important problems in data transmission over the GSM voice channel is ensuring that the transmitted data is highly secure. To cope with this problem, the bit stream produced by a low bit rate speech coder implemented for voice channel adaptation usually enters a data encryption block [2]. The data is then modulated onto speech-like waveforms before entering the GSM network. The resulting waveform enters the first GSM handset, passes through the communications channel and then reaches the second GSM handset. The waveform received from the second handset is demodulated, decrypted and finally decoded [3]. The overall system block diagram is shown in Fig. 1.

Due to the low rate of the speech channel, data communications require modems having data transfer capability at low bit rates. Based on the method proposed in this paper, an appropriate modem is presented. In some recent works by Katugampala reported in [3], codebooks containing the values of speech-like waveform parameters, including pitch frequency, Line Spectral Frequencies (LSF) coefficients and frame energy, are defined on the modulator side. These parameters are then used for waveform synthesis, and the encrypted data are mapped onto these waveforms. On the demodulator side, the parameters are derived from the received speech-like waveforms and compared to the codebook, and the best match is chosen [3]. That approach has been adopted for the GSM Enhanced Full Rate (EFR) speech codec at 12.2 kbps, whereas the approach proposed in this paper is designed for the 13 kbps GSM FR speech codec as reported in ETSI GSM 06.10 [4].

This paper is organized as follows. Section 2 presents the complete procedure for speech-like signal production and data mapping. Section 3 is dedicated to synchronization. Section 4 reports simulation results. Section 5 concludes.

2. SPEECH-LIKE SIGNAL PRODUCTION PROCEDURE

We require the data bit stream to be mapped onto speech-like waveforms of 20 ms length (equal to the frame length usually available in the GSM speech coder). Therefore, in this paper we produce speech-like waveforms based on an auto-regressive (AR) model. The waveforms should be produced with four formants so that they can be adapted to the GSM coder.

978-1-4244-1807-7/08/$25.00 ©2008 IEEE.

As formants are sensitive to changes, and to simplify the extraction of signal characteristics in the demodulator, we prefer to parallelize the resulting transfer function according to the work in [5]. Finally, by applying an excitation signal to the resulting transfer function, an appropriate speech-like signal is produced. The transfer function of the ith formant corresponds to a second-order difference equation as follows:

$$H_i(z) = \frac{A_i}{1 - B_i z^{-1} - C_i z^{-2}} \tag{1}$$

where

$$A_i = 1 - B_i - C_i, \qquad B_i = 2\exp(-\pi \Delta f_i / f_s)\cos(2\pi f_i / f_s), \qquad C_i = -\exp(-2\pi \Delta f_i / f_s),$$

$f_s$ is the sampling frequency, and $f_i$ and $\Delta f_i$ are the formants' frequencies and bandwidths, respectively. In order to have a meaningful comparison between the received envelope spectrum and the one in the transmitter, the parallelization of $H_i(z)$ is done under the following conditions.

Firstly, we normalize the transfer function related to each formant at its central frequency:

$$k_i \, H_i\!\left(e^{j 2\pi f_i / f_s}\right) = 1, \qquad i = 1, 2, 3, 4 \tag{2}$$

Secondly, another normalization should be employed in the parallel resultant formant transfer function, which can be written as below:

$$\alpha_n H_1(e^{j\omega_n}) + \beta_n H_2(e^{j\omega_n}) + \rho_n H_3(e^{j\omega_n}) + \lambda_n H_4(e^{j\omega_n}) = 1, \qquad n = 1, \dots, 4 \tag{3}$$

where $\alpha_n$, $\beta_n$, $\rho_n$ and $\lambda_n$ are the normalized equation coefficients. Finally, the speech-like waveform results from the overall spectral envelope by employing the harmonic model synthesis method based on [6]. The complete process for waveform production can be expressed as follows:

[Equation (4), the harmonic-model synthesis expression, is not recoverable from the source.]

where $H_{Total}(z)$ is the parallel transfer function above, $N_{ana}$ is the analysis window length, and $N_{fr}$ is the frame shift length. Note that the vectors a, f and T in (4) are obtained from the M prominent peaks found through the peak-picking procedure reported in [6]. Fig. 2 shows a prototype speech-like waveform of 20 ms length produced by the harmonic modeling approach discussed above.

Figure 1: Overview of the complete system (blocks: bit stream, speech-like waveform, speech codec, channel, decoder, data demodulator).

Figure 2: Synthesized speech-like signal (horizontal axis: time in samples at 8 kHz).

2.2. Data mapping on speech-like signal
One of the key points is to correctly select the formant frequencies within the telephone voice band (300-3400 Hz). During our experiments we concluded that, among the formant parameters mentioned, only the frequencies and phases can be detected after the signal has passed through the speech-coded voice channel. We therefore explain in detail how to select the parameters and how to allocate data bits to the frequency and phase parameters. The first and second formant frequencies are selected in the range 300 to 1000 Hz and are encoded by 3 bits. The third formant lies in the range 1400 to 2500 Hz and is coded by 3 bits, and the fourth formant lies between 2900 and 3400 Hz and is coded by 2 bits. Since the harmonic model is used in the proposed method, the formant frequencies above should be selected as multiples of the pitch frequency, which results in a negligible error in extracting information. The selection criteria are as follows:

1. The received signal amplitude should be more than 70 percent of the transmitted amplitude.
2. The frequency displacement of the received formants should not exceed a default frequency step for each formant; otherwise it causes incorrect extraction of the mapped information from the resulting frequencies. Selecting the frequency steps as a multiple of the given pitch frequency fulfills this condition. However, a larger frequency step is selected for the 4th formant due to its high sensitivity to displacement.
3. Adjacent formants should not lie close to each other. As a result, there are unusable band regions at the boundaries between formants. This is because the minimum distance between two adjacent formants is twice the bandwidth considered, while their bandwidths are the same.

Due to the lack of fidelity of the GSM coder/decoder to formant bandwidths, we only consider a constant, identical bandwidth of Δf = 160 Hz in the whole synthesis process. As phase fidelity only holds for frequencies under 1 kHz, some information can be preserved in the phases related to the first and second formants. The difference between the phase extracted from the received signal envelope and the mapped phase at that particular frequency is coded with 3 bits. Another important parameter is the pitch frequency, whose selection is related to the choice of the synthesis window employed in the harmonic analysis procedure discussed earlier in Section 2.1. We observed that the pitch frequencies fp = 123 Hz and fp = 125 Hz result in acceptable performance, so 1 bit is mapped onto the pitch frequency. Finally, each 20 ms speech-like waveform can be modulated by 12 data bits. In addition, we demonstrate in the simulation results that using the proposed technique we achieved a bit rate of 600 bps.

2.3. Intra-frame Interpolation

In order to achieve phase continuity, which is an important characteristic of speech signals, and to smooth the jumps occurring at frame boundaries, it is necessary to overlap the speech-like waveforms produced with the above approach. At the same time, the data bit stream mapped on the speech-like signal must remain undamaged. To this end, it is important to select the pitch period, i.e. 1/fp, which has a direct relation to the data mapped on each frame. The GSM codec performs a linear interpolation between the Log Area Ratio (LAR) coefficients of two adjacent frames (each frame consisting of 160 samples). To avoid spurious transients, it interpolates the LAR coefficients of the last frame with those of the current frame over the first 40 samples of the current frame [4]. This motivates the idea that adjacent frames should have the minimum overlap: when a PCM waveform enters a GSM tandem connection, a high overlap between frames does not cause large changes in the reflection coefficients of each frame, which in turn causes incorrect detection of the transmitted data. Note that the overlapping samples in each frame should not be chosen, in order to prevent inter-modulation effects. As a result, (5) presents the linear interpolation for the proposed modulator:

$$
\begin{aligned}
Y_1 &= \frac{1 + a + 160 - n}{2a + 1}, & Y_2 &= \frac{n + a - 160}{2a + 1}, & n &= (160 - a), \dots, 160 \\
Y_3 &= \frac{m + a}{2a + 1}, & Y_4 &= \frac{1 + a - m}{2a + 1}, & m &= 1, \dots, (1 + a)
\end{aligned}
\tag{5}
$$

where a equals the number of overlapping samples in each frame. Note that $Y_1$ and $Y_2$ are multiplied by samples (160 - a) to 160 of the last frame, $s_{i-1}$, and $Y_3$ and $Y_4$ are multiplied by samples 1 to (1 + a) of the current frame, $s_i$, as presented in (6):

$$
\begin{aligned}
L_1 &= Y_1 \times s_{i-1}(n), & L_2 &= Y_2 \times s_{i-1}(k_1), \\
L_3 &= Y_3 \times s_i(m), & L_4 &= Y_4 \times s_i(k_2)
\end{aligned}
\tag{6}
$$

$$
\begin{aligned}
\text{Interpolation}_1 &= L_1 + L_2 \\
\text{Interpolation}_2 &= L_3 + L_4 \\
\text{Interpolation} &= \text{Interpolation}_1 + \text{Interpolation}_2
\end{aligned}
$$

where $k_1$ and $k_2$ are the numbers of samples interpolated, Interpolation1 and Interpolation2 are the overlapped samples of each frame, and Interpolation is the whole set of interpolated samples of two adjacent frames. Note that this region should not be chosen in the demodulator. Finally, the appropriate PCM waveform is ready to enter the speech-coded voice channel.
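As a quick check on the weights in (5), the following Python sketch computes the four weight sequences for a given overlap a and confirms that each pair is complementary (Y1 + Y2 = 1 and Y3 + Y4 = 1), as one would expect from a linear cross-fade. The function name `crossfade_weights` and the 160-sample default are illustrative choices rather than part of the paper, and a = 7 is simply the value reported as best in Section 4. The sketch stops at the weights and does not attempt (6), since the sample indices k1 and k2 are not fully specified.

```python
def crossfade_weights(a, frame_len=160):
    """Weights Y1..Y4 of (5) for an overlap of `a` samples (hypothetical helper)."""
    denom = 2 * a + 1
    tail = range(frame_len - a, frame_len + 1)   # n = 160-a, ..., 160 (last frame)
    head = range(1, a + 2)                       # m = 1, ..., 1+a (current frame)
    y1 = [(1 + a + frame_len - n) / denom for n in tail]
    y2 = [(n + a - frame_len) / denom for n in tail]
    y3 = [(m + a) / denom for m in head]
    y4 = [(1 + a - m) / denom for m in head]
    return y1, y2, y3, y4


y1, y2, y3, y4 = crossfade_weights(a=7)

# Sum each pair of weights at every overlapped sample position.
pair_sums = [u + v for u, v in zip(y1, y2)] + [u + v for u, v in zip(y3, y4)]
print(all(abs(s - 1.0) < 1e-12 for s in pair_sums))
print(y1[0], y1[-1], y2[0], y2[-1])
```

For a = 7, Y1 falls from 1 to 8/15 while Y2 rises from 0 to 7/15 over the last eight samples of the previous frame; Y3 and Y4 mirror this at the start of the current frame.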

3. Synchronization
One of the important aspects of the simulation is the synchronization of system elements. To simulate the synchronization of the speech codec frames in the two base stations, we model this effect by inserting a random number of samples before the signal is passed to the second codec. Then, to simulate the synchronization of the modulator and demodulator, a predefined synchronization sequence is sent from the modulator to the demodulator at the start of every communication. This sequence of samples is known to both sides. Since it is known in the simulation that the input signal contains a synchronization sequence, the synchronization module cross-correlates a fixed, predefined number of input samples at the beginning of the transmission with the predefined synchronization sequence. The sample position that gives the best match is used for synchronization in the demodulator.
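The cross-correlation search described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the paper does not specify the synchronization sequence, so a random ±1 sequence of 64 samples is used here; the function name `find_sync_offset`, the zero padding and the plain (unnormalized) correlation score are likewise hypothetical choices. The 22-sample delay is just an example value.

```python
import random


def find_sync_offset(received, sync_seq, search_len):
    """Return the lag in [0, search_len) at which sync_seq best matches received."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(search_len):
        window = received[lag:lag + len(sync_seq)]
        if len(window) < len(sync_seq):
            break  # not enough samples left for a full comparison
        # Plain cross-correlation score at this lag.
        score = sum(r * s for r, s in zip(window, sync_seq))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


rng = random.Random(0)
sync_seq = [rng.choice((-1.0, 1.0)) for _ in range(64)]  # assumed known to both sides

delay = 22  # unknown to the receiver; stands in for the random codec offset
received = [0.0] * delay + sync_seq + [0.0] * 50

offset = find_sync_offset(received, sync_seq, search_len=100)
print(offset)
```

With a noisy or speech-coded signal, a normalized correlation would likely be more robust than the raw dot product used here.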

Figure 3: Synthesized and received signals (horizontal axis: time in samples at 8 kHz; curves: synthesized signal, received signal).

4. Simulation Results
We tested our system on a GSM-to-GSM connection. In our simulation we generate a speech-like signal by the proposed method with a length of 2.5 s, consisting of about 120 different waveforms (each 20 ms long); the best interpolation occurs at a = 7. To synchronize the modulator and demodulator, a predefined synchronization sequence is sent before the start of any transmission; synchronization occurs at the 22nd sample of this sequence. Next, the generated signal is passed from the modulator through the coder, channel and decoder to the demodulator. Fig. 3 shows the 3rd frame of the synthesized and received signals. In order to extract the important speech-like parameters, we need the envelope spectrum of the received signal; Fig. 4 illustrates the envelope spectra of the signals. Note that Figs. 3 and 4 have been generated with a pitch frequency of 125 Hz, and the signals are not taken from the interpolated samples, as explained in Section 2.3. The four peaks with maximum amplitudes show the displacement of the central frequencies of the 3rd frame's formants between the synthesized and received signals, as depicted in Fig. 4.

Figure 4: Synchronized and Received envelopes (horizontal axis: frequency in Hz).

We achieved a throughput of 2 kbps with a bit error rate (BER) of 0.3%. Using a punctured 1/2-rate convolutional code with a constraint length of 7 on the 2 kbps stream, we achieved a 1.15 kbps channel with 0.02% BER for SNR = 15 dB. As a result, the proposed method can be considered a favorable choice due to its robustness to additive noise, as depicted in Fig. 5.

Fig. 5: BER over fading channel with additive noise for BT = 0.3 (horizontal axis: SNR per bit, Eb/No (dB); curve: empirical).
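The envelope peaks compared in this section originate from the formant resonators of Section 2. As a rough illustration of where such a peak sits, the Python sketch below evaluates the magnitude response of a single two-pole resonator and locates its maximum on a 5 Hz grid. It assumes the coefficient convention of Klatt [5] (A = 1 - B - C, giving unity gain at 0 Hz), the 8 kHz sampling rate shown on the figure axes and the 160 Hz bandwidth of Section 2.2; the function names and the 1000 Hz example formant are hypothetical. The gain `k` mirrors the normalization at the central frequency described in Section 2.

```python
import cmath
import math

FS = 8000.0        # sampling frequency in Hz (8 kHz, as on the figure axes)
BANDWIDTH = 160.0  # formant bandwidth in Hz (Section 2.2)


def resonator_coeffs(f_c, bw=BANDWIDTH, fs=FS):
    """Two-pole resonator coefficients in the convention of Klatt [5]."""
    c = -math.exp(-2.0 * math.pi * bw / fs)
    b = 2.0 * math.exp(-math.pi * bw / fs) * math.cos(2.0 * math.pi * f_c / fs)
    a = 1.0 - b - c  # yields unity gain at 0 Hz
    return a, b, c


def magnitude(f, f_c):
    """Magnitude response at frequency f (Hz) of the resonator centred at f_c."""
    a, b, c = resonator_coeffs(f_c)
    z_inv = cmath.exp(-2j * math.pi * f / FS)  # z^-1 on the unit circle
    return abs(a / (1.0 - b * z_inv - c * z_inv * z_inv))


def peak_frequency(f_c, step=5.0):
    """Grid search for the frequency of maximum magnitude between 0 and fs/2."""
    grid = [k * step for k in range(int(FS / 2 / step) + 1)]
    return max(grid, key=lambda f: magnitude(f, f_c))


f_c = 1000.0                    # example formant frequency (8 x 125 Hz)
k = 1.0 / magnitude(f_c, f_c)   # gain that normalizes the response at f_c
print(peak_frequency(f_c), k * magnitude(f_c, f_c))
```

The located peak falls within a grid step or two of the nominal formant frequency, which is the kind of small displacement the demodulator has to tolerate when comparing transmitted and received envelopes.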


5. Conclusion
A robust method is proposed for secure data transmission over a GSM voice channel. The method is based on mapping the data onto the fundamental parameters related to formants in a speech-like waveform, including phases, frequencies and pitch, which allows 12 data bits to be transferred on each speech-like waveform with a frame size of 20 ms.
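The 12-bit frame budget can be made concrete with a small Python sketch. It follows one reading of Section 2.2 that adds up to the stated 12 bits: 3 bits for the first and second formants jointly, 3 bits for the third formant, 2 bits for the fourth, 3 bits for phase and 1 bit for pitch. The field names, their order within the frame and the helper functions `pack_frame` and `unpack_frame` are assumptions made for illustration; the paper does not define a bit layout.

```python
# Field widths in bits; the names and the order are assumptions.
FIELDS = (
    ("f1_f2", 3),   # 1st/2nd formant frequencies, 300-1000 Hz
    ("f3", 3),      # 3rd formant frequency, 1400-2500 Hz
    ("f4", 2),      # 4th formant frequency, 2900-3400 Hz
    ("phase", 3),   # phase of the formants below 1 kHz
    ("pitch", 1),   # selects one of the two pitch frequencies (123 or 125 Hz)
)
FRAME_MS = 20
BITS_PER_FRAME = sum(width for _, width in FIELDS)
BIT_RATE_BPS = BITS_PER_FRAME * 1000 // FRAME_MS


def pack_frame(values):
    """Pack a dict of field values into one integer, first field in the top bits."""
    word = 0
    for name, width in FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word = (word << width) | value
    return word


def unpack_frame(word):
    """Inverse of pack_frame: split the integer back into its fields."""
    values = {}
    for name, width in reversed(FIELDS):
        values[name] = word & ((1 << width) - 1)
        word >>= width
    return values


frame = {"f1_f2": 5, "f3": 2, "f4": 3, "phase": 6, "pitch": 1}
word = pack_frame(frame)
print(BITS_PER_FRAME, BIT_RATE_BPS, unpack_frame(word) == frame)
```

At 12 bits per 20 ms frame this layout corresponds to the 600 bps figure quoted in Section 2.2.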

Reference
[1] M. Street, "Interoperability and international operation: An introduction to end to end mobile security," IEE Secure GSM and Beyond: End to End Security for Mobile Communications, London, Feb. 2003.
[2] M. Stefanovic, Y. D. Cho, S. Villette, and A. M. Kondoz, "A 2.4/1.2 kb/s speech coder with noise pre-processor," Proceedings of EUSIPCO 2000, Tampere, Finland, pp. 4-8, Sept. 2000.
[3] N. Katugampala, S. Villette, and A. Kondoz, "Secure voice over GSM and other low bit rate systems," IEE Secure GSM and Beyond: End to End Security for Mobile Communications, London, Feb. 2003.
[4] J. Degener and C. Bormann, "GSM 06.10 lossy speech compression," ftp://ftp.cs.tu-berlin.de/pub/local/kbs/tubmik/gsm/gsm-1.0.10.tar.gz.
[5] D. H. Klatt, "Software for a cascade/parallel formant synthesizer," J. Acoust. Soc. Am., vol. 67, no. 3, pp. 971-996, Mar. 1980.
[6] R. J. McAulay and T. F. Quatieri, "Speech analysis/synthesis based on a sinusoidal representation," IEEE Trans. ASSP, vol. 34, pp. 744-754, Aug. 1986.
