Angel Iniesta & Carlos Lopez Thesis

Digital Filtering vs DSP for Acoustic Echo and Noise Cancelling
Project realized by: Carlos López Sánchez, Ángel Iniesta Navarro
Telekommunikationstechnik & -systeme (TKS), Department of Systems Theory & Signal Processing
Project supervised by: Armin Veichtlbauer, Peter Haber
Salzburg University of Applied Sciences and Technologies
June 2004
Index
1. Introduction
1.1. Motivation
1.2. Surrounding of the topic
1.2.1. The Acoustic Echo Problem
1.2.2. Introduction to the Echo Cancellers
1.2.3. Acoustic Echo Cancellers
1.2.3.1. AEC components
1.2.3.2. AEC metrics
1.2.4. AEC Conclusions
2. Theoretical Background
2.1. Area: fundamentals of signals and systems processing
2.1.1. Signals and Systems
2.1.2. Linearity
2.1.3. Examples of Linear and Nonlinear Systems
2.1.4. Convolution
2.1.5. Correlation
2.1.6. The Discrete Fourier Transform (DFT)
2.1.7. Frequency Response of Systems
2.1.8. Inverse DFT (IDFT)
2.1.9. The Fast Fourier Transform (FFT)
2.1.9.1. How the FFT works
2.1.9.2. Efficiency of the FFT
2.1.9.3. Some Terminology used in FFT
2.1.9.4. Implementation of FFT
2.2. Digital Filters and DSPs
2.2.1. Digital Filters
2.2.1.1. Introduction
2.2.1.2. How Information is Represented in Signals
2.2.1.3. The Four Common Frequency Responses
2.2.1.4. Filter Classification
2.2.1.5. Windowed-Sinc Filters
2.2.1.6. Recursive Filters
2.2.1.7. Non-recursive filters: FIR filters
2.2.1.7.1. FIR impulse response
2.2.1.7.2. The convolution representation of a FIR filter
2.2.1.7.3. Finite in FIR filters
2.2.1.8. Chebyshev Filters
2.2.1.8.1. The Chebyshev and Butterworth Responses
2.2.2. Digital Signal Processors (DSPs)
2.2.2.1. Introduction
2.2.2.2. DSPs vs Microprocessors
2.2.2.3. Circular buffer operation
2.2.2.4. Architecture of the Digital Signal Processor
2.2.2.5. Data formats: Fixed and Floating Point
2.2.2.6. Programming languages: C and Assembly
2.2.2.7. How Fast are DSPs
2.3. Analog vs Digital Filters
2.3.1. Introduction
2.3.2. Advantages of using digital filters in front of analog circuits
3. Analysis of the possible solutions

3.1. Solution ways
3.1.1. Digital Adaptive Filters for Echo Cancelling
3.1.1.1. The Echo Canceller
3.1.1.2. The FIR Filter Echo Canceller
3.1.1.3. The Wiener Filter
3.1.1.4. Least Mean Squares Algorithm
3.1.1.5. Normalised Least Mean Squares Algorithm
3.1.1.6. Non-Linearities And Echo Cancellation
3.1.1.7. ERLE
3.1.1.8. ERL
3.1.2. Available DSPs on the market
3.1.2.1. List of all DSPs companies on the market
3.1.2.2. Texas Instruments
3.1.2.3. Analog Devices
3.1.2.4. Motorola
3.1.2.5. Programming languages: C versus Assembly
3.2. Simulation Toolkits
3.2.1. National Instruments LabVIEW
3.2.1.1. DSP Test Integration Toolkit
3.2.1.2. Signal Processing Toolset
3.2.2. Texas Instruments Code Composer Studio IDE
3.2.3. Analog Devices VisualDSP++ for SHARC Processors
3.2.4. The Mathworks MATLAB & Simulink
3.3. Conclusion: Selected method to solve the application
3.3.1. Selected Floating Point DSPs models
3.3.2. Selected Fixed Point DSPs models
3.3.3. TMS320C6713 DSP Starter Kit (DSK)
4. Description of the chosen solution

4.1. Acoustic Echo Cancelling
4.1.1. Modeling
4.1.1.1. Describing the AEC model blocks
4.1.1.1.1. C6713 DSK DIP Switch
4.1.1.1.2. C6713 DSK ADC
4.1.1.1.3. C6713 DSK DAC
4.1.1.1.4. C62x General Real FIR
4.1.2. Testing in simulation environment
4.1.2.1. Parameters setup
4.1.3. Analyzing the simulation results
4.2. Acoustic Noise Cancelling
4.2.1. Introduction to Acoustic Noise Cancellers
4.2.2. ANC Modelling
4.2.2.1. Describing the ANC blocks and its parameters setup
4.2.2.1.1. Acoustic environment block
4.2.2.1.2. C6713 DSK DIP Switch
4.2.2.1.3. C6713 DSK ADC / DAC
4.2.2.1.4. Fast Adapt / Slow Adapt blocks
4.2.2.1.5. Normalized LMS filter block
4.2.2.1.5.1. General description
4.2.2.1.5.2. Supported Data Types
4.2.3. Testing and analyzing in the simulation environment
4.2.3.1. Simulation before adaptation with white noise source
4.2.3.2. Simulation after adaptation with white noise source
4.2.3.3. Simulation before adaptation with pink noise source
4.2.3.4. Simulation after adaptation with pink noise source
4.2.3.5. Describing the wave files specifications
4.2.3.6. Different type of Noise signals
4.2.4. ANC conclusions
Appendix:
A. LMS algorithms in Matlab
B. Bibliography
1. Introduction
1.1. Motivation
Acoustic echo and noise cancellers are of considerable interest today because they are required in many applications, such as speakerphones and audio/video conferencing. Moreover, it is a new field for us, which we have not studied in depth before. These are the reasons that motivated us to choose this thesis topic.
1.2. Surrounding of the topic

The problem to be solved is the presence of acoustic echo and noise signals in a communication between two or more users. The users are interrupted by their own echo and by possible acoustic noise, which forces them to stop speaking until the echo has faded away, and the process repeats over and over again. This problem degrades the quality of the communication considerably. First, let us look at the acoustic echo problem, that is, how the echo appears in the communication system; then we will introduce the technique used to remove or cancel this undesirable signal by means of an Acoustic Echo Canceller (AEC) system.
1.2.1. The Acoustic Echo Problem
Acoustic echo is inevitable whenever a loudspeaker is placed near a microphone in a full-duplex communication application. This is the case in speakerphones, audio and video conferencing, desktop communication, and many other communication scenarios. Hands-free mobile communication and car kits in particular are becoming increasingly important due to safety regulations introduced in more and more countries. In all these and similar scenarios, the voice from the loudspeaker (Far End Speech, FES) is inevitably picked up by the microphone, together with the Near End Speech (NES), and transmitted back to the remote speaker, as shown in the next figure. The remote speaker therefore hears her own voice distorted and delayed by the communication channel, which is known as echo. The longer the channel delay, the more annoying the echo becomes, until it makes natural conversation impossible and decreases the perceived quality of the communication service. It is therefore absolutely necessary to avoid transmitting back the echo picked up by the microphone.
Modern full-duplex communication systems make use of an acoustic echo canceller (AEC) to prevent the echo from being transmitted back to the channel. The AEC is employed in each terminal and has completely different requirements from the Network Echo Canceller employed by the telephone network provider to eliminate the electric echo. The AEC basically estimates the echo and subtracts the estimated echo from the microphone signal, as shown in the next figure. The resulting residual signal (RES) is transmitted to the far end speaker through the communication channel.
In conclusion, acoustic echo is generated whenever a loudspeaker and a microphone are closely placed in an enclosure, as we have shown in the previous figure.
1.2.2. Introduction to the Echo Cancellers
Echo cancellers have become essential components in today's telecommunications applications. They can be divided into two main categories, namely Network Echo Cancellers (NEC) and Acoustic Echo Cancellers (AEC). Both types rely on an adaptive filter to estimate the echo path and subsequently use this estimate to reduce the echo in the transmitted signals. The requirements, performance, underlying structure, and adaptive algorithms used to implement a NEC are usually different from those used to implement an AEC.
Figure: block diagram of the acoustic echo canceller
The problem addressed by the acoustic echo canceller in all the applications described above is illustrated in the last figure, which shows one side of a communication channel (of an audio conference, for instance). The received speech or audio signal (Rout) is played through a loudspeaker so that all users in the local conference room can hear what the remote users say. The speech of the local users (Sin) is collected by one or more microphones and transmitted through the communication channel to the remote users. The problem in this audio setup is that the voice signal Rout played through the loudspeaker, and its reflections off the room boundaries (represented in the figure by the gray lines), will also be collected by the local microphones and transmitted back together with the voice of the local users to the remote users, who will hear their own voice delayed by the network and acoustic delay of the communication chain.
1.2.3. Acoustic Echo Cancellers
The acoustic echo problem mentioned above can be considerably reduced by an AEC, as shown in the last figure. The main function of the acoustic echo canceller is to estimate the acoustic transfer function from the local loudspeakers to the local microphones, including the reflection paths. Filtering the incoming voice signal through the estimated acoustic transfer function produces an estimate of the echo, y(n). Subtracting this estimated echo from the microphone signal d(n) yields the echo-free signal e(n) = d(n) - y(n), which is transmitted through the communication channel instead of the microphone signal d(n). In the best case, when the estimate is accurate, this completely eliminates the acoustic echo. The G.167 recommendation requires practical echo cancellers to achieve an echo reduction in the range of 40 to 45 dB.
Acoustic echo cancellers usually employ adjustable (or adaptive) finite impulse response (FIR) filters to estimate the acoustic echo path. The FIR coefficients are adjusted using an adaptive algorithm to minimize the error signal (Sout). The input (reference) signal for the adaptive filter is the far end speech (Rin), while its desired signal is the microphone signal (Sin). After convergence, the adaptive filter will have an impulse response close to the acoustic impulse response between the loudspeaker and the microphone placed in the local space. The filter response includes the responses of:

- the Digital to Analog (D/A) converter before the loudspeaker
- the power amplifier driving the loudspeaker
- the loudspeaker response
- the acoustic path, including the reflections off the room boundaries
- the microphone response
- the pre-amplifier after the microphone
- the Analog to Digital (A/D) converter after the microphone
Since this acoustic impulse response can last several hundred milliseconds, acoustic echo cancellers usually need long adaptive FIR filters. The G.167 recommendation assumes an average reverberation time of 400 ms for teleconferencing applications and 500 ms for hands-free telephones. At a sampling frequency of 8 kHz or 16 kHz, a 400 ms response already spans 3200 or 6400 samples, so AECs employing adaptive filters with several thousand coefficients are not uncommon in practice. Adapting and filtering through such a large filter in real time is a real challenge, and advanced, more efficient signal processing algorithms that reduce the computational complexity and speed up the convergence of the adaptive filter must be used.
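The filter length implied by these figures, and the estimate-and-subtract structure described above, can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the implementation used later in this thesis: the echo path `echo_path` is a hypothetical, noise-free, time-invariant response of just four coefficients, the far-end signal is white, and the coefficients are adapted with a normalized LMS update of step size `mu`. The function names `taps_needed` and `nlms_echo_canceller` are chosen here for illustration.

```python
import random

def taps_needed(reverberation_s, sample_rate_hz):
    """Number of FIR coefficients needed to span the given reverberation time."""
    return int(round(reverberation_s * sample_rate_hz))

def nlms_echo_canceller(far_end, mic, num_taps, mu=0.5, eps=1e-8):
    """Adapt an FIR estimate of the echo path with normalized LMS.

    far_end -- reference signal x(n) played through the loudspeaker
    mic     -- microphone signal d(n) containing the echo
    Returns (residual e(n), final coefficient estimate).
    """
    w = [0.0] * num_taps            # adaptive filter coefficients
    x_buf = [0.0] * num_taps        # most recent far-end samples, newest first
    residual = []
    for x_n, d_n in zip(far_end, mic):
        x_buf = [x_n] + x_buf[:-1]
        y_n = sum(wi * xi for wi, xi in zip(w, x_buf))   # estimated echo y(n)
        e_n = d_n - y_n                                  # e(n) = d(n) - y(n)
        norm = eps + sum(xi * xi for xi in x_buf)        # input power in the buffer
        w = [wi + mu * e_n * xi / norm for wi, xi in zip(w, x_buf)]
        residual.append(e_n)
    return residual, w

# Illustrative (assumed) echo path and a white far-end signal.
random.seed(1)
echo_path = [0.6, -0.3, 0.15, 0.05]
far_end = [random.uniform(-1.0, 1.0) for _ in range(2000)]
mic = [sum(h * far_end[n - k] for k, h in enumerate(echo_path) if n - k >= 0)
       for n in range(len(far_end))]

residual, w_est = nlms_echo_canceller(far_end, mic, num_taps=len(echo_path))
print(taps_needed(0.4, 8000))                     # 3200 coefficients
print([round(c, 3) for c in w_est])
```

With a four-tap toy path the estimate converges within a few hundred samples; a 3200-tap filter adapted the same way would need far more computation per sample, which is the difficulty pointed out above.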
In simple applications where the acoustic impulse response is short, time domain algorithms such as the Normalized Least Mean Squares (NLMS) algorithm or one of its variants might be used. For more complex environments, frequency domain filters such as the Block Frequency Domain Adaptive Filter (BFDAF) or its partitioned version provide better alternatives.
1.2.3.1. AEC components
Although the adaptive filter is the most important component in an acoustic echo canceller, an AEC must include several other components to function properly. For instance, the adaptive filter will diverge and its coefficients will be corrupted if they are adapted while there is no far end speech signal (the remote speaker is not talking), or when both conversation parties talk at the same time (double talk situation). The different components comprising an acoustic echo canceller and their relation to each other are shown in the next figure.
Figure: detailed Block diagram of the acoustic echo canceller.
Short description of each block of the AEC system:
- Far End Speech Detector (FESD): the FESD analyzes the incoming voice signal Rin and determines the moments in time when the remote speaker is active.
- Near End Speech Detector (NESD): the NESD analyzes the local microphone signal Sin and determines the moments in time when the local speaker is active.
- Double Talk Detector (DTD): the DTD analyzes several signals to determine the moments in time when the two conversation parties are talking at the same time.
- Control Unit (CU): the outputs of the DTD, FESD, and NESD are analyzed in the control unit, which generates a control command for the adaptive filter defining whether the filter should calculate the estimated echo, update its coefficients, do both, or do neither.
- Non-Linear Processor (NLP): when neither speaker is active, or the residual signal level is low enough, the CU generates a command to activate the NLP located in the residual path, which outputs a suitable signal to further reduce the echo.
- Noise Reduction Unit (NR): this block is necessary when the communication device operates in a noisy acoustic environment or when cheap components that generate internal noise are used. The NR unit cleans the microphone signal of such noise, thereby improving speech quality and decreasing the recognition error rate in voice command and voice recognition systems.
- Automatic Gain Control (AGC): the AGC unit keeps the sound level of its output at a predefined value regardless of the sound level of its input. To improve the signal to noise ratio, the Sout signal is first passed through an AGC before being sent to the communication channel. Another AGC unit might also be used at the Rout end to automatically adjust the loudspeaker sound. This unit is very important to avoid a loss of communication quality when the speakers are far from the microphones or when they are moving while talking.
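The role of the control unit can be illustrated with a small decision function. The rule below is an assumption derived from the divergence conditions mentioned earlier (no adaptation without far end speech or during double talk); the function name `control_command` and its Boolean detector inputs are hypothetical, and a real CU would use more elaborate logic, for instance the residual level when deciding about the NLP.

```python
def control_command(far_active, near_active):
    """Return (compute_echo, update_filter, enable_nlp) for one signal frame.

    far_active  -- FESD decision: the remote speaker is talking
    near_active -- NESD decision: the local speaker is talking
    """
    double_talk = far_active and near_active          # simplified DTD decision
    compute_echo = far_active                         # an echo exists only with far end speech
    update_filter = far_active and not double_talk    # adapt only during far end single talk
    enable_nlp = not far_active and not near_active   # nobody is talking
    return compute_echo, update_filter, enable_nlp

for far, near in [(False, False), (True, False), (False, True), (True, True)]:
    print(far, near, control_command(far, near))
```

Freezing the coefficients during double talk keeps the near end speech, which is uncorrelated with the reference signal, from corrupting the echo path estimate.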
1.2.3.2. AEC metrics
It is often necessary to compare the performance of several AEC implementations, or to test whether a specific implementation complies with a certain standard such as G.167. This section summarizes the most important metrics that specify the performance of an AEC.
- AEC Processing Delay: this is the extra delay which might result from processing the different signals inside the AEC. The G.167 recommendation allows a maximum delay of 16 ms in each direction of speech transmission for end-to-end digital communication systems. The maximum is reduced to 2 ms for hands-free telephones connected to the PSTN and to 10 ms for mobile radio systems.
- Weighted Terminal Coupling Loss, Single Talk: this is the amount of echo reduction when the far end user is talking while the near end user is not talking. This quantity is measured by first resetting all AEC coefficients and applying a signal at the Rin input; the signal measured at Sout is called Sout1. The AEC adaptation is then enabled for a sufficiently long time. No other signals are applied to the microphone except for the loudspeaker sound due to the Rin signal. The second measurement of Sout is called Sout2. The difference between Sout2 and Sout1 in dB is the single talk coupling loss. G.167 requires a coupling loss of 40 to 45 dB for different digital and analog systems.
- Weighted Terminal Coupling Loss, Double Talk: similar to the single talk case, this quantity is the amount of echo reduction when both users of the communication system are talking at the same time. Here the difference between Sout2 and Sout1 in dB is the double talk coupling loss. G.167 requires a coupling loss of 25 to 30 dB for the different applications.
- Initial Convergence Time: this quantity is the amount of coupling loss achieved one second after the moment of enabling the AEC. It is measured by first resetting all the AEC coefficients and then enabling adaptation. The difference between Sout before enabling the AEC adaptation and its value after 1 s of adaptation is the coupling loss. G.167 requires this coupling loss to be at least 20 dB for all applications.
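The coupling loss figures above are level differences in dB between two measurements of Sout. A minimal sketch of that calculation follows, assuming both measurements are available as lists of samples and that the level is taken as the mean signal power. The helper names are hypothetical, and the frequency weighting implied by the name of the metric is not applied here.

```python
import math

def mean_power(samples):
    """Average power of a block of samples."""
    return sum(s * s for s in samples) / len(samples)

def coupling_loss_db(sout_before, sout_after):
    """Echo reduction in dB between Sout1 (before adaptation) and Sout2 (after)."""
    return 10.0 * math.log10(mean_power(sout_before) / mean_power(sout_after))

# Example: the residual amplitude drops by a factor of 100 after adaptation.
sout1 = [math.sin(0.1 * n) for n in range(1000)]
sout2 = [0.01 * s for s in sout1]
print(round(coupling_loss_db(sout1, sout2), 1))   # 40.0
```

An amplitude reduction by a factor of 100 corresponds to a power reduction by a factor of 10 000, that is 40 dB, the lower end of the single talk range quoted above.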
1.2.4. AEC Conclusions
The most important component in an AEC is the adaptive filter. In many applications, an adaptive filter with several thousand coefficients must be used to meet the required echo reduction. When a simple time domain algorithm (such as the NLMS algorithm) is used to implement this long adaptive filter, the AEC provides unsatisfactory performance and requires a huge amount of computation power. Implementing the adaptive filter in the frequency domain, on the other hand, reduces the computational complexity and improves the AEC performance. This, however, is achieved at the cost of increased memory requirements and additional processing delay.
The next chapter studies in depth the theoretical background of this field, in order to better understand how an acoustic echo canceller works.
2. Theoretical Background
2.1. Area: fundamentals of signals and systems processing
2.1.1. Signals and Systems
A signal is a description of how one parameter varies with another parameter, for instance, voltage changing over time in an electronic circuit, or brightness varying with distance in an image. A system is any process that produces an output signal in response to an input signal. This is illustrated by the block diagram in the figure. Continuous systems input and output continuous signals, such as in analog electronics. Discrete systems input and output discrete signals, such as computer programs that manipulate the values stored in arrays.
Several rules are used for naming signals. These aren't always followed in DSP, but they are very common, and the mathematics is difficult enough without a clear notation. First, continuous signals use parentheses, such as x(t) and y(t), while discrete signals use brackets, as in x[n] and y[n]. Second, signals use lower case letters. Third, the name given to a signal is usually descriptive of the parameters it represents. For example, a voltage depending on time might be called v(t), and a stock market price measured each day could be p[d].
Signals and systems are frequently discussed without knowing the exact parameters being represented. This is the same as using x and y in algebra, without assigning a physical meaning to the variables. This brings in a fourth rule for naming signals: if a more descriptive name is not available, the input signal to a discrete system is usually called x[n] and the output signal y[n]. For continuous systems, the corresponding signals are x(t) and y(t).
There are many reasons for wanting to understand a system. For example, you may want to design a system to remove noise in an electrocardiogram, sharpen an out-of-focus image, or remove echoes in an audio recording. In other cases, the system might have a distortion or interfering effect that you need to characterize or measure.
For instance, when you speak into a telephone, you expect the other person to hear something that resembles your voice. Unfortunately, the input signal to a transmission line is seldom identical to the output signal. If you understand how the transmission line (the system) is changing the signal, maybe you can compensate for its effect. In still other cases, the system may represent some physical process that you want to study or analyze. Radar and sonar are good examples of this. These methods operate by comparing the transmitted and reflected signals to find the characteristics of a remote object. In terms of system theory, the problem is to find the system that changes the transmitted signal into the received signal.
At first glance, it may seem an overwhelming task to understand all of the possible systems in the world. Fortunately, most useful systems fall into a category called linear systems. This fact is extremely important. Without the linear system concept, we would be forced to examine the individual characteristics of many unrelated systems. With this approach, we can focus on the traits of the linear system category as a whole. Our first task is to identify what properties make a system linear, and how they fit into the everyday notion of electronics, software, and other signal processing systems.
2.1.2. Linearity
A system is called linear if it has two mathematical properties: homogeneity and additivity. If you can show that a system has both properties, then you have proven that the system is linear. Likewise, if you can show that a system doesn't have one or both properties, you have proven that it isn't linear. A third property, shift invariance, is not a strict requirement for linearity, but it is a mandatory property for most DSP techniques. When you see the term linear system used in DSP, you should assume it includes shift invariance unless you have reason to believe otherwise. These three properties form the mathematics of how linear system theory is defined and used.
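Homogeneity and additivity can be checked numerically on test signals. The sketch below is only an illustration with hypothetical helper names: passing the check on two particular signals does not prove linearity, whereas failing it does prove that a system is nonlinear.

```python
def is_close(a, b, tol=1e-9):
    """True if two equally long signals agree sample by sample."""
    return all(abs(p - q) <= tol for p, q in zip(a, b))

def looks_linear(system, x1, x2, k=3.0):
    """Check homogeneity and additivity of `system` on two test signals."""
    homogeneous = is_close(system([k * v for v in x1]),
                           [k * v for v in system(x1)])
    additive = is_close(system([a + b for a, b in zip(x1, x2)]),
                        [a + b for a, b in zip(system(x1), system(x2))])
    return homogeneous and additive

def attenuator(x):          # multiplication by a constant
    return [0.5 * v for v in x]

def first_difference(x):    # y[n] = x[n] - x[n-1]
    return [v - (x[i - 1] if i > 0 else 0.0) for i, v in enumerate(x)]

def squarer(x):             # y[n] = x[n]^2, a nonlinear system
    return [v * v for v in x]

x1 = [1.0, -2.0, 0.5, 3.0]
x2 = [0.25, 4.0, -1.0, 2.0]
for system in (attenuator, first_difference, squarer):
    print(system.__name__, looks_linear(system, x1, x2))
```

The attenuator and the first difference pass both checks, while the squarer already fails homogeneity: scaling its input by 3 scales its output by 9.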
2.1.3. Examples of Linear and Nonlinear Systems
Examples of Linear Systems:
- Wave propagation, such as sound and electromagnetic waves.
- Electrical circuits composed of resistors, capacitors, and inductors.
- Electronic circuits, such as amplifiers and filters.
- Mechanical motion from the interaction of masses, springs, and dashpots (dampeners).
- Systems described by differential equations, such as resistor-capacitor-inductor networks.
- Multiplication by a constant, that is, amplification or attenuation of the signal.
- Signal changes, such as echoes, resonances, and image blurring.
- The unity system, where the output is always equal to the input.
- The null system, where the output is always equal to zero, regardless of the input.
- Differentiation and integration, and the analogous operations of first difference and running sum for discrete signals.
- Small perturbations in an otherwise nonlinear system, for instance, a small signal being amplified by a properly biased transistor.
- Convolution, a mathematical operation where each value in the output is expressed as the sum of values in the input multiplied by a set of weighting coefficients.
- Recursion, a technique similar to convolution, except that previously calculated values in the output are used in addition to values from the input.
Examples of Nonlinear Systems:
- Systems that do not have static linearity, for instance, the voltage and power in a resistor: P = V^2/R, or the radiant energy emission of a hot object depending on its temperature: R = kT^4.
- Systems that do not have sinusoidal fidelity, such as electronic circuits for peak detection, squaring, sine wave to square wave conversion, frequency doubling, etc.
- Common electronic distortion, such as clipping, crossover distortion and slewing.
- Multiplication of one signal by another signal, such as in amplitude modulation and automatic gain controls.
- Hysteresis phenomena, such as magnetic flux density versus magnetic intensity in iron, or mechanical stress versus strain in vulcanized rubber.
- Saturation, such as electronic amplifiers and transformers driven too hard.
- Systems with a threshold, for example, digital logic gates, or seismic vibrations that are strong enough to pulverize the intervening rock.
2.1.4. Convolution
Convolution is a formal mathematical operation, just as multiplication, addition, and integration. Addition takes two numbers and produces a third number, while convolution takes two signals and produces a third signal. Convolution is used in the mathematics of many fields, such as probability and statistics. In linear systems, it is used to describe the relationship between three signals of interest: the input signal, the impulse response, and the output signal. An input signal, x[n], enters a linear system with an impulse response, h[n], resulting in an output signal, y[n]. In equation form: x[n] * h[n] = y[n]. Expressed in words, the input signal convolved with the impulse response is equal to the output signal. Just as addition is represented by the plus, +, and multiplication by the cross, ×, convolution is represented by the star, *. It is unfortunate that most programming languages also use the star to indicate multiplication. A star in a computer program means multiplication, while a star in an equation means convolution.
The equation that represents the convolution is:
$$y[n] = x[n] * h[n] = \sum_{k=-\infty}^{\infty} x[k]\, h[n-k]$$
The system output y[n] is the convolution of the input signal x[n] with the system impulse response h[n].
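For finite-length signals the convolution sum can be evaluated directly. The following is a minimal Python sketch; the function name `convolve` and the sample values are chosen here for illustration only.

```python
def convolve(x, h):
    """Return y[n] = sum over k of x[k] * h[n - k] for finite-length signals."""
    y = [0.0] * (len(x) + len(h) - 1)
    for k, x_k in enumerate(x):
        for j, h_j in enumerate(h):
            y[k + j] += x_k * h_j      # x[k] contributes to output sample n = k + j
    return y

x = [1.0, 2.0, 3.0]          # input signal
h = [1.0, 0.5]               # impulse response (direct path plus a half-amplitude echo)
print(convolve(x, h))        # [1.0, 2.5, 4.0, 1.5]
```

Each input sample launches a scaled copy of the impulse response, and the output is the sum of all these overlapping copies. This is also why the output is longer than the input by the length of h minus one.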
2.1.5. Correlation
Correlation is a mathematical operation that is very similar to convolution. Just as with convolution, correlation uses two signals to produce a third signal. This third signal is called the cross-correlation of the two input signals. If a signal is correlated with itself, the resulting signal is instead called the autocorrelation. Correlation is the optimal technique for detecting a known waveform in random noise. That is, the peak is higher above the noise using correlation than can be produced by any other linear system. Using correlation to detect a known waveform is frequently called matched filtering.
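Matched filtering can be illustrated with a short sketch. It assumes a short known waveform (a 5-chip Barker code, chosen here only for illustration) buried at a known position in bounded uniform noise; the names `cross_correlate`, `template`, and `true_offset` are hypothetical.

```python
import random

def cross_correlate(signal, template):
    """Slide `template` along `signal` and return the sum of products at each lag."""
    lags = len(signal) - len(template) + 1
    return [sum(signal[lag + i] * t for i, t in enumerate(template))
            for lag in range(lags)]

random.seed(0)
template = [1.0, 1.0, 1.0, -1.0, 1.0]            # known waveform (5-chip Barker code)
true_offset = 7
received = [random.uniform(-0.2, 0.2) for _ in range(20)]   # background noise
for i, t in enumerate(template):
    received[true_offset + i] += t               # bury the waveform in the noise

scores = cross_correlate(received, template)
detected_offset = max(range(len(scores)), key=scores.__getitem__)
print(detected_offset)                           # 7
```

The correlation peaks where the template lines up with its copy in the received signal, because all five products add with the same sign there, while at every other lag the products partly cancel.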
2.1.6. The Discrete Fourier Transform (DFT)
Fourier analysis is a family of mathematical techniques, all based on decomposing signals into sinusoids. The Discrete Fourier Transform is the family member used with digitized signals. Before we get started on the DFT, let's look for a moment at the Fourier transform (FT) and explain why we are not talking about it instead. The Fourier transform of a continuous-time signal x(t) may be defined as:
$$X(\omega) = \int_{-\infty}^{\infty} x(t)\, e^{-j\omega t}\, dt$$
Thus, right off the bat, we need calculus. The DFT, on the other hand, replaces the infinite integral with a finite sum:
$$X[k] = \sum_{n=0}^{N-1} x[n]\, e^{-j 2\pi k n / N}, \qquad k = 0, 1, \ldots, N-1$$
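The finite sum can be evaluated directly in a few lines. The sketch below assumes the common convention in which X[k] is the sum over n of x[n] e^{-j 2 pi k n / N}, without any scaling factor; the function name `dft` is chosen here for illustration.

```python
import cmath
import math

def dft(x):
    """Naive DFT: X[k] = sum_{n=0}^{N-1} x[n] * exp(-j*2*pi*k*n/N)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

# One cycle of a cosine over 8 samples: energy appears only in bins 1 and N-1.
N = 8
x = [math.cos(2 * math.pi * n / N) for n in range(N)]
print([round(abs(X_k), 6) for X_k in dft(x)])
```

Bins 1 and 7 each have magnitude N/2 = 4 and all other bins vanish up to rounding error, which shows the cosine being split into one positive and one negative frequency component.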
A signal can be either continuous or discrete, and it can be either periodic or aperiodic. The combination of these two features generates four categories:
- Aperiodic-Continuous: this includes, for example, decaying exponentials and the Gaussian curve. These signals extend to both positive and negative infinity without repeating in a periodic pattern. The Fourier transform for this type of signal is simply called the Fourier Transform.
- Periodic-Continuous: here the examples include sine waves, square waves, and any waveform that repeats itself in a regular pattern from negative to positive infinity. This version of the Fourier transform is called the Fourier Series.
- Aperiodic-Discrete: these signals are only defined at discrete points between positive and negative infinity, and do not repeat themselves in a periodic fashion. This type of Fourier transform is called the Discrete Time Fourier Transform.
- Periodic-Discrete: these are discrete signals that repeat themselves in a periodic fashion from negative to positive infinity. This class of Fourier transform is sometimes called the Discrete Fourier Series, but is most often called the Discrete Fourier Transform.
The four categories are shown in the following figure:
2.1.7. Frequency Response of Systems
Systems are analyzed in the time domain by using convolution. A similar analysis can be done in the frequency domain. Using the Fourier transform, every input signal can be represented as a group of cosine waves, each with a specified amplitude and phase shift. Likewise, the DFT can be used to represent every output signal in a similar form. This means that any linear system can be completely described by how it changes the amplitude and phase of cosine waves passing through it. This information is called the system's frequency response. Since both the impulse response and the frequency response contain complete information about the system, there must be a one-to-one correspondence between the two. Given one, you can calculate the other. The relationship between the impulse response and the frequency response is one of the foundations of signal processing: a system's frequency response is the Fourier transform of its impulse response.
Keeping with standard DSP notation, impulse responses use lower case variables, while the corresponding frequency responses are upper case. Since h[ ] is the common symbol for the impulse response, H[ ] is used for the frequency response. Systems are described in the time domain by convolution, that is: x[n] * h[n] = y[n]. In the frequency domain, the input spectrum is multiplied by the frequency response, resulting in the output spectrum. As an equation: X[f] × H[f] = Y[f]. That is, convolution in the time domain corresponds to multiplication in the frequency domain, and convolution in the frequency domain corresponds to multiplication in the time domain. The next figure illustrates the relationship between the impulse response and the frequency response:
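The correspondence between time-domain convolution and frequency-domain multiplication can be verified numerically. The sketch below uses a naive DFT and a direct convolution, both written here only for illustration, and assumes the signals are zero-padded to the length of the convolution output so that the DFT product matches linear rather than circular convolution.

```python
import cmath

def dft(x):
    """Naive DFT, X[k] = sum_n x[n] * exp(-j*2*pi*k*n/N)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

def convolve(x, h):
    """Direct linear convolution of two finite-length signals."""
    y = [0.0] * (len(x) + len(h) - 1)
    for k, x_k in enumerate(x):
        for j, h_j in enumerate(h):
            y[k + j] += x_k * h_j
    return y

x = [1.0, 2.0, 3.0, 4.0]         # input signal
h = [0.5, 0.25, 0.125]           # impulse response
y = convolve(x, h)               # time-domain output

# Zero-pad both signals to the output length before transforming.
L = len(y)
X = dft(x + [0.0] * (L - len(x)))
H = dft(h + [0.0] * (L - len(h)))   # frequency response of the system
Y = dft(y)

max_error = max(abs(Y[k] - X[k] * H[k]) for k in range(L))
print(max_error < 1e-9)          # True
```

Here `H` is the DFT of the impulse response, that is, the sampled frequency response of the system, and multiplying it bin by bin with the input spectrum reproduces the spectrum of the convolved output.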
2.1.8. Inverse DFT
The inverse DFT (the IDFT) is given by:
$$x[n] = \frac{1}{N} \sum_{k=0}^{N-1} X[k]\, e^{j 2\pi k n / N}, \qquad n = 0, 1, \ldots, N-1$$
In summary, the DFT is proportional to the set of coefficients of projection onto the sinusoidal basis set, and the IDFT is the reconstruction of the original signal as a superposition of its sinusoidal projections. This basic architecture extends to all linear orthogonal transforms, including wavelets, Fourier transforms, Fourier series, the discrete-time Fourier transform (DTFT), and certain short-time Fourier transforms (STFT).
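The reconstruction property can be checked with a round trip through both transforms. This is a minimal sketch, assuming the usual convention in which the forward DFT is unscaled and the inverse carries the factor 1/N; the function names are chosen here for illustration.

```python
import cmath

def dft(x):
    """Forward DFT: X[k] = sum_{n=0}^{N-1} x[n] * exp(-j*2*pi*k*n/N)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft(X):
    """Inverse DFT: x[n] = (1/N) * sum_{k=0}^{N-1} X[k] * exp(+j*2*pi*k*n/N)."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]

x = [0.0, 1.0, 0.5, -0.25, 2.0]
x_rec = idft(dft(x))
print([round(v.real, 6) for v in x_rec])
```

The reconstructed samples agree with the original signal up to rounding error, and their imaginary parts are negligible because the input was real.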
2.1.9. The Fast Fourier Transform (FFT)
The Fast Fourier Transform is another method for calculating the DFT. While it produces the same result as the other approaches, it is far more efficient, often reducing the computation time by a factor of hundreds. This is the same improvement as flying in a jet aircraft versus walking! Although the FFT requires only a few dozen lines of code, it is one of the most complicated algorithms in DSP.
2.1.9.1. How the FFT works
By making use of periodicities in the sines that are multiplied to do the transforms, the FFT greatly reduces the amount of calculation required. Here is a brief overview. Functionally, the FFT decomposes the set of data to be transformed into a series of smaller data sets to be transformed. Then, it decomposes those smaller sets into even smaller sets. At each stage of processing, the results of the previous stage are combined in a special way. Finally, it calculates the DFT of each small data set. For example, an FFT of size 32 is broken into 2 FFTs of size 16, which are broken into 4 FFTs of size 8, which are broken into 8 FFTs of size 4, which are broken into 16 FFTs of size 2. Calculating a DFT of size 2 is trivial.
Here is a slightly more rigorous explanation. It turns out that it is possible to take the DFT of the first N/2 points and combine them in a special way with the DFT of the second N/2 points to produce a single N-point DFT. Each of these N/2-point DFTs can be calculated using smaller DFTs in the same way. One (radix-2) FFT begins, therefore, by calculating N/2 2-point DFTs. These are combined to form N/4 4-point DFTs. The next stage produces N/8 8-point DFTs, and so on, until a single N-point DFT is produced.
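The divide-and-combine idea can be written recursively in a few lines. The sketch below is one possible radix-2 variant, assuming the input length is a power of two; it splits the data into even-indexed and odd-indexed samples (a decimation-in-time decomposition) rather than into first and second halves, and it is checked against a direct DFT. Both function names are chosen here for illustration.

```python
import cmath

def fft(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    N = len(x)
    if N == 1:
        return list(x)
    even = fft(x[0::2])              # N/2-point DFT of the even-indexed samples
    odd = fft(x[1::2])               # N/2-point DFT of the odd-indexed samples
    out = [0j] * N
    for k in range(N // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / N) * odd[k]
        out[k] = even[k] + twiddle            # butterfly: upper half
        out[k + N // 2] = even[k] - twiddle   # butterfly: lower half
    return out

def dft(x):
    """Direct O(N^2) DFT, used here only as a reference."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

x = [float(n % 5) for n in range(32)]
max_error = max(abs(a - b) for a, b in zip(fft(x), dft(x)))
print(max_error < 1e-9)              # True
```

For N = 32 the recursion passes through sizes 16, 8, 4, and 2 before reaching single samples, the same sequence of sizes as in the example above.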
2.1.9.2. Efficiency of the FFT
The DFT takes N^2 operations for N points. Since at any stage the computation required to combine smaller DFTs into larger DFTs is proportional to N, and there are log2(N) stages (for radix 2), the total computation is proportional to N * log2(N). Therefore, the ratio between a DFT computation and an FFT computation for the same N is proportional to N / log2(N). In cases where N is small this ratio is not very significant, but when N becomes large, this ratio gets very large. (Every time you double N, the numerator doubles, but the denominator only increases by 1.)
2.1.9.3. Some Terminology used in FFT
The radix is the size of an FFT decomposition. For single-radix FFTs, the transform size must be a power of the radix. FFTs can be decomposed using DFTs of even and odd points, which is called a Decimation-In-Time (DIT) FFT, or they can be decomposed using a first-half/second-half approach, which is called a Decimation-In-Frequency (DIF) FFT. Generally, the user does not need to worry which type is being used.
2.1.9.4. Implementation of FFT
Except as a learning exercise, you will generally never have to implement the FFT yourself. Many good FFT implementations are available in C, Fortran and other languages, and microprocessor manufacturers generally provide free optimized FFT implementations in their processors' assembly code. Therefore, it is not so important to understand how the FFT really works as it is to understand how to use it.
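To put numbers on the N / log2(N) ratio from the efficiency discussion, the following sketch tabulates it for a few transform sizes. The helper name `speedup` is hypothetical, and the figures are proportionality estimates only; real implementations differ by constant factors.

```python
import math


def speedup(n):
    """Approximate DFT/FFT work ratio, N^2 / (N * log2(N)) = N / log2(N)."""
    return n / math.log2(n)


if __name__ == "__main__":
    for n in (8, 64, 1024, 65536):
        print(n, round(speedup(n), 1))
```

Doubling N doubles the numerator but adds only 1 to the denominator, so the ratio grows almost linearly with N.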
2.2. Digital Filters and DSPs

2.2.1. Digital Filters
2.2.1.1. Introduction Digital filters are used for two general purposes: (1) separation of signals that have been combined, and (2) restoration of signals that have been distorted in some way. Analog (electronic) filters can be used for these same tasks; however, digital filters can achieve far superior results. Digital filters are a very important part of Digital Signal Processing (DSP). In fact, their extraordinary performance is one of the key reasons that DSP has become so popular. As mentioned in the introduction, filters have two uses: signal separation and signal restoration. Signal separation is needed when a signal has been contaminated with interference, noise, or other signals. Signal restoration is used when a signal has been distorted in some way. For example, an audio recording made with poor equipment may be filtered to better represent the sound as it actually occurred. These problems can be attacked with either analog or digital filters. Which is better? Analog filters are cheap, fast, and have a large dynamic range in both amplitude and frequency. Digital filters, in comparison, are vastly superior in the level of performance that can be achieved. It is common in DSP to say that a filter's input and output signals are in the time domain. This is because signals are usually created by sampling at regular intervals of time. But this is not the only way sampling can take place. The second most common way of sampling is at equal intervals in space. Many other domains are possible; however, time and space are by far the most common. When you see the term time domain in DSP, remember that it may actually refer to samples taken over time, or it may be a general reference to any domain that the samples are taken in. The most straightforward way to implement a digital filter is by convolving the input signal with the digital filter's impulse response. All possible linear filters can be made in this manner. 
When the impulse response is used in this way, filter designers give it a special name: the filter kernel. There is also another way to make digital filters, called recursion. When a filter is implemented by convolution, each sample in the output is calculated by weighting the samples in the input, and adding them together. Recursive filters are an extension of this, using previously calculated values from the output, besides points from the input. Instead of using a filter kernel, recursive filters are defined by a set of recursion coefficients.
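As a minimal sketch of filtering by convolution, the function below computes each output sample as a weighted sum of input samples, with the filter kernel supplying the weights. The name `fir_filter` is illustrative, and samples before the start of the input are assumed to be zero.

```python
def fir_filter(x, kernel):
    """Filter x by convolution: y[n] = sum over k of kernel[k] * x[n - k]."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, hk in enumerate(kernel):
            if n - k >= 0:          # samples before the start are treated as zero
                acc += hk * x[n - k]
        y.append(acc)
    return y


# Example: a 4-point moving average smooths the edge of a step input.
step = [1.0] * 6
smoothed = fir_filter(step, [0.25, 0.25, 0.25, 0.25])
```

Feeding in an impulse returns the kernel itself, which is why the kernel and the impulse response are the same thing for this kind of filter.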
The important point is that all linear filters have an impulse response, even if you don't use it to implement the filter. To find the impulse response of a recursive filter, simply feed in an impulse, and see what comes out. The impulse responses of recursive filters are composed of sinusoids that exponentially decay in amplitude. In principle, this makes their impulse responses infinitely long. However, the amplitude eventually drops below the round-off noise of the system, and the remaining samples can be ignored. Because of this characteristic, recursive filters are also called Infinite Impulse Response or IIR filters. In comparison, filters carried out by convolution are called Finite Impulse Response or FIR filters.
A digital filter uses a digital processor to perform numerical calculations on sampled values of the signal. The processor may be a general-purpose computer such as a PC, or a specialised DSP (Digital Signal Processor) chip. The analog input signal must first be sampled and digitised using an ADC (analog to digital converter). The resulting binary numbers, representing successive sampled values of the input signal, are transferred to the processor, which carries out numerical calculations on them. These calculations typically involve multiplying the input values by constants and adding the products together. If necessary, the results of these calculations, which now represent sampled values of the filtered signal, are output through a DAC (digital to analog converter) to convert the signal back to analog form. Note that in a digital filter, the signal is represented by a sequence of numbers, rather than a voltage or current. The following diagram shows the basic setup of such a system:
2.2.1.2. How Information is Represented in Signals The most important part of any DSP task is understanding how information is contained in the signals you are working with. There are only two ways that are common for information to be represented in naturally occurring signals. We will call these: information represented in the time domain, and information represented in the frequency domain.
Information represented in the time domain describes when something occurs and what the amplitude of the occurrence is. In contrast, information represented in the frequency domain is more indirect. Many systems exhibit periodic motion; by measuring the frequency, phase, and amplitude of this periodic motion, information can often be obtained about the system producing the motion. This brings us to the importance of the step and frequency responses. The step response describes how information represented in the time domain is being modified by the system. In contrast, the frequency response shows how information represented in the frequency domain is being changed. This distinction is absolutely critical in filter design because it is not possible to optimize a filter for both applications. Good performance in the time domain results in poor performance in the frequency domain, and vice versa.
2.2.1.3. The Four Common Frequency Responses
Frequency domain filters are generally used to pass certain frequencies (the passband), while blocking others (the stopband). These four responses are the most common: low-pass, high-pass, band-pass, and band-reject:
The purpose of these filters is to allow some frequencies to pass unaltered, while completely blocking other frequencies. The passband refers to those frequencies that are passed, while the stopband contains those frequencies that are blocked. The transition band is between. A fast roll-off means that the transition band is very narrow. The division between the passband and transition band is called the cutoff frequency. In analog filter design, the cutoff frequency is usually defined to be where the amplitude is reduced to 0.707 (i.e., -3dB). Digital filters are less standardized, and it is common to see 99%, 90%, 70.7%, and 50% amplitude levels defined to be the cutoff frequency.
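The link between the 0.707 amplitude level and -3dB can be checked with the usual decibel formula for amplitude ratios, 20 * log10(ratio). The helper name `amplitude_to_db` is chosen here only for illustration.

```python
import math


def amplitude_to_db(ratio):
    """Convert an amplitude ratio (output / input) to decibels."""
    return 20.0 * math.log10(ratio)


if __name__ == "__main__":
    # The cutoff levels mentioned above, expressed in dB.
    for level in (0.99, 0.90, 0.707, 0.50):
        print(level, round(amplitude_to_db(level), 2))
```

An amplitude of 0.707 is 1/sqrt(2), which corresponds to half the power, hence the common name "half-power point" for the analog cutoff frequency.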
2.2.1.4. Filter Classification
Digital filters are classified by their use and by their implementation. Time domain filters are used when the information is encoded in the shape of the signal's waveform. Time domain filtering is used for such actions as: smoothing, DC removal, waveform shaping, etc. In contrast, frequency domain filters are used when the information is contained in the amplitude, frequency, and phase of the component sinusoids. The goal of these filters is to separate one band of frequencies from another. Custom filters are used when a special action is required by the filter, something more elaborate than the four basic responses (high-pass, low-pass, band-pass and band-reject).
Digital filters can be implemented in two ways, by convolution (also called finite impulse response or FIR) and by recursion (also called infinite impulse response or IIR). Filters carried out by convolution can have far better performance than filters using recursion, but execute much more slowly.
2.2.1.5. Windowed-Sinc Filters Windowed-sinc filters are used to separate one band of frequencies from another. They are very stable, produce few surprises, and can be pushed to incredible performance levels. These exceptional frequency domain characteristics are obtained at the expense of poor performance in the time domain, including excessive ripple and overshoot in the step response. When carried out by standard convolution, windowed-sinc filters are easy to program, but slow to execute. The next figure illustrates the idea behind the windowed-sinc filter. In (a), the frequency response of the ideal low-pass filter is shown. All frequencies below the cutoff frequency, fC, are passed with unity amplitude, while all higher frequencies are blocked.
The passband is perfectly flat, the attenuation in the stopband is infinite, and the transition between the two is infinitesimally small. Taking the Inverse Fourier Transform of this ideal frequency response produces the ideal filter kernel (impulse response) shown in (b). As previously discussed, this curve is of the general form sin(x)/x, called the sinc function, given by:
$$h[n] = \frac{\sin(2\pi f_C n)}{\pi n}$$
Two windows are particularly important for these filters: the Hamming window and the Blackman window. On the next figure we can see the characteristics of the Blackman and Hamming windows. The shapes of these two windows are shown in (a). As shown in (b), the Hamming window results in about 20% faster roll-off than the Blackman window. However, the Blackman window has better stopband attenuation (Blackman: 0.02%, Hamming: 0.2%), and a lower passband ripple (Blackman: 0.02%, Hamming: 0.2%):
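A minimal sketch of how a windowed-sinc low-pass kernel can be built is given below. It assumes the standard textbook definitions of the Blackman, Hamming and Hann (Hanning) windows, which are not taken from this text; the function name `windowed_sinc` and its parameters are hypothetical. The cutoff `fc` is expressed as a fraction of the sampling rate, and the kernel has `m + 1` points.

```python
import math


def windowed_sinc(fc, m, window="blackman"):
    """Low-pass windowed-sinc kernel with m + 1 points (0 < fc < 0.5, m even)."""
    kernel = []
    for i in range(m + 1):
        n = i - m / 2                      # shift so the sinc is centred
        if n == 0:
            ideal = 2.0 * fc               # limit of sin(2*pi*fc*n)/(pi*n) at n = 0
        else:
            ideal = math.sin(2.0 * math.pi * fc * n) / (math.pi * n)
        if window == "blackman":
            w = (0.42 - 0.5 * math.cos(2 * math.pi * i / m)
                 + 0.08 * math.cos(4 * math.pi * i / m))
        elif window == "hamming":
            w = 0.54 - 0.46 * math.cos(2 * math.pi * i / m)
        elif window == "hann":
            w = 0.5 - 0.5 * math.cos(2 * math.pi * i / m)
        else:
            raise ValueError("unknown window: " + window)
        kernel.append(ideal * w)
    total = sum(kernel)
    return [v / total for v in kernel]     # normalise for unity gain at DC
```

Truncating the infinitely long sinc to `m + 1` points and tapering it with a window is what trades the ideal response for a realizable one; a longer kernel gives a narrower transition band.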
2.2.1.6. Recursive Filters
Recursive filters are an efficient way of achieving a long impulse response, without having to perform a long convolution. They execute very rapidly, but have less performance and flexibility than other digital filters. Recursive filters are also called Infinite Impulse Response (IIR) filters, since their impulse responses are composed of decaying exponentials. This distinguishes them from digital filters carried out by convolution, called Finite Impulse Response (FIR) filters.
The recursion equation that defines this type of filter is:
$$y[n] = a_0 x[n] + a_1 x[n-1] + a_2 x[n-2] + a_3 x[n-3] + \cdots + b_1 y[n-1] + b_2 y[n-2] + b_3 y[n-3] + \cdots$$
Without the "b" terms, this is nothing more than simple convolution, with the coefficients a0, a1, a2, ... forming the convolution kernel. In the full equation, each point in the output signal is found by multiplying the values from the input signal by the "a" coefficients, multiplying the previously calculated values from the output signal by the "b" coefficients, and adding the products together. Notice that there isn't a value for b0, because this corresponds to the sample being calculated. The previous equation is called the recursion equation, and filters that use it are called recursive filters. The "a" and "b" values that define the filter are called the recursion coefficients. In actual practice, no more than about a dozen recursion coefficients can be used or the filter becomes unstable (i.e., the output continually increases or oscillates).
Recursive filters are useful because they bypass a longer convolution. For instance, consider what happens when a delta function is passed through a recursive filter. The output is the filter's impulse response, and will typically be a sinusoidal oscillation that exponentially decays. Since this impulse response is infinitely long, recursive filters are often called infinite impulse response (IIR) filters. In effect, recursive filters convolve the input signal with a very long filter kernel, although only a few coefficients are involved.
The relationship between the recursion coefficients and the filter's response is given by a mathematical technique called the z-transform. The z-transform can be used for such tasks as: converting between the recursion coefficients and the frequency response, combining cascaded and parallel stages into a single filter, designing recursive systems that mimic analog filters, etc. Unfortunately, the z-transform is very mathematical, and more complicated than most DSP users are willing to deal with.
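The recursion equation translates almost directly into code. The sketch below is illustrative only: the function name `recursive_filter` is hypothetical, `a` holds a0, a1, ... and `b` holds b1, b2, ... (there is no b0), and samples before the start of the signal are assumed to be zero. The example coefficients describe a single-pole low-pass filter with a0 = 0.5 and b1 = 0.5, chosen so that the gain at DC is one.

```python
def recursive_filter(x, a, b):
    """Apply y[n] = sum(a[k] * x[n-k]) + sum(b[k] * y[n-k]) with b = [b1, b2, ...]."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, ak in enumerate(a):             # input terms: a0*x[n], a1*x[n-1], ...
            if n - k >= 0:
                acc += ak * x[n - k]
        for k, bk in enumerate(b, start=1):    # feedback terms: b1*y[n-1], b2*y[n-2], ...
            if n - k >= 0:
                acc += bk * y[n - k]
        y.append(acc)
    return y


# Impulse response of a single-pole low-pass filter: a decaying exponential.
impulse_response = recursive_filter([1.0, 0.0, 0.0, 0.0], a=[0.5], b=[0.5])
# Step response: rises smoothly towards 1, like an RC circuit.
step_response = recursive_filter([1.0] * 20, a=[0.5], b=[0.5])
```

Only two coefficients are stored, yet the impulse response never becomes exactly zero, which illustrates how recursion replaces a very long convolution kernel.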
The following figure shows a single-pole low-pass filter. Digital recursive filters can mimic analog filters composed of resistors and capacitors. As we can observe in the figure, the single-pole low-pass recursive filter smooths the edge of a step input, just as an electronic RC filter does.
2.2.1.7. Non-recursive filters: FIR filters
This figure gives the system diagram of a general finite-impulse-response (FIR) filter. Such a filter is also called a transversal filter, or a tapped delay line. The implementation is one example of a direct-form implementation of a digital filter.
2.2.1.7.1. FIR impulse response
The impulse response h[n] is obtained at the output when the input signal is the impulse signal δ[n] = [1,0,0,0,...]. More formally, the impulse signal is defined by:
$$\delta[n] = \begin{cases} 1, & n = 0 \\ 0, & n \neq 0 \end{cases}$$
If the k-th tap is denoted b_k, then it is obvious from the next figure that the impulse response signal is given by:
$$h[n] = \begin{cases} 0, & n < 0 \\ b_n, & 0 \le n \le M \\ 0, & n > M \end{cases}$$
In other words, the impulse response simply consists of the tap coefficients, prepended and appended by zeros.
The transfer function of a FIR filter is given by the z-transform of its impulse response. This is true for any linear, time-invariant (LTI) filter. For FIR filters in particular, we have, from the previous expression:
$$H(z) = \sum_{n=-\infty}^{\infty} h[n] z^{-n} = \sum_{n=0}^{M} b_n z^{-n}$$
Thus, the transfer function of every length N = M+1 FIR filter is an M-th order polynomial in z^{-1}.
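2.2.1.7.2. The convolution representation of a FIR filter
Note that the output of the k-th delay element in the system diagram of the FIR filter is x[n-k], k = 0,1,2,...,M, where x[n] is the input signal amplitude at time n. The output signal y[n] is therefore:
$$y[n] = b_0 x[n] + b_1 x[n-1] + \cdots + b_M x[n-M] = \sum_{k=0}^{M} b_k x[n-k] = \sum_{k=0}^{M} h[k] x[n-k]$$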
A FIR filter thus operates by convolving the input signal x[n] with the filter's impulse response h[n].
2.2.1.7.3. Finite in FIR filters
From the previous equation, we can see that the impulse response becomes zero after time M = N-1. Therefore, a tapped delay line can only implement finite-duration impulse responses in the sense that the non-zero portion of the impulse response must be finite. This is what is meant by the term finite impulse response (FIR). FIR digital filtering is the most common technique used in digital signal processing.
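The tapped delay line can be sketched as a small class that processes one sample at a time. The class name `TappedDelayLine` and its method `step` are hypothetical names used only for this illustration. Feeding in an impulse shows that the output is the list of tap coefficients followed by zeros, which is exactly the finite impulse response discussed above.

```python
from collections import deque


class TappedDelayLine:
    """Direct-form FIR filter: a chain of delays whose outputs are weighted by taps."""

    def __init__(self, taps):
        self.taps = list(taps)
        # delay[k] holds x[n - k]; all delay elements start at zero.
        self.delay = deque([0.0] * len(self.taps), maxlen=len(self.taps))

    def step(self, sample):
        # Pushing on the left shifts every stored sample one delay further on;
        # the oldest sample falls off the right end because of maxlen.
        self.delay.appendleft(sample)
        return sum(b * xk for b, xk in zip(self.taps, self.delay))


fir = TappedDelayLine([3.0, 2.0, 1.0])
impulse_out = [fir.step(s) for s in [1.0, 0.0, 0.0, 0.0, 0.0]]
```

With M+1 = 3 taps the output is non-zero for only three samples, so the impulse response is zero after time M = 2.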
2.2.1.8. Chebyshev Filters
Chebyshev filters are used to separate one band of frequencies from another. Although they cannot match the performance of the windowed-sinc filter, they are more than adequate for many applications. The primary attribute of Chebyshev filters is their speed, typically more than an order of magnitude faster than the windowed-sinc. This is because they are carried out by recursion rather than convolution. The design of these filters is based on a mathematical technique called the z-transform.
2.2.1.8.1. The Chebyshev and Butterworth Responses
The Chebyshev response is a mathematical strategy for achieving a faster roll-off by allowing ripple in the frequency response. Analog and digital filters that use this approach are called Chebyshev filters. These filters are named from their use of the Chebyshev polynomials, developed by the Russian mathematician Pafnuti Chebyshev (1821-1894).
The previous figure shows the frequency response of low-pass Chebyshev filters with passband ripples of 0%, 0.5% and 20%. As the ripple increases (bad), the roll-off becomes sharper (good). The Chebyshev response is an optimal tradeoff between these two parameters. When the ripple is set to 0%, the filter is called a maximally flat or Butterworth filter (after S. Butterworth, a British engineer who described this response in 1930). A ripple of 0.5% is often a good choice for digital filters. This matches the typical precision and accuracy of the analog electronics that the signal has passed through.
These Chebyshev filters are called type 1 filters, meaning that the ripple is only allowed in the passband. In comparison, type 2 Chebyshev filters have ripple only in the stopband. Type 2 filters are seldom used, and we won't discuss them. There is, however, an important design called the elliptic filter, which has ripple in both the passband and the stopband. Elliptic filters provide the fastest roll-off for a given number of poles, but are much harder to design. We won't discuss the elliptic filter here, but be aware that it is frequently the first choice of professional filter designers, both in analog electronics and DSP. If you need this level of performance, buy a software package for designing digital filters.
2.2.2. Digital Signal Processors (DSPs)

2.2.2.1. Introduction
First of all, we have to ask ourselves what DSP is. Digital Signal Processing (DSP) is used in a wide variety of applications, and it is hard to find a good definition that is general. We can start with dictionary definitions of the words:
- Digital: operating by the use of discrete signals to represent data in the form of numbers.
- Signal: a variable parameter by which information is conveyed through an electronic circuit.
- Processing: to perform operations on data according to programmed instructions.
This leads us to a simple definition of Digital Signal Processing: changing or analysing information which is measured as discrete sequences of numbers. Note two unique features of Digital Signal Processing as opposed to plain old ordinary digital processing:
- Signals come from the real world. This intimate connection with the real world leads to many unique needs, such as the need to react in real time and the need to measure signals and convert them to digital numbers.
- Signals are discrete, which means the information in between discrete samples is lost.
The advantages of DSP are common to many digital systems and include:
Versatility:
- digital systems can be reprogrammed for other applications (at least where programmable DSP chips are used).
- digital systems can be ported to different hardware (for example a different DSP chip or board level product).
Repeatability:
- digital systems can be easily duplicated.
- digital systems do not depend on strict component tolerances.
- digital system responses do not drift with temperature.
Simplicity:
- some things can be done more easily digitally than with analogue systems.
DSP is used in a very wide variety of applications, such as telephony, radar, digital TV, audio, sonar, multimedia, fax, process control, etc., but most share some common features:
- they use a lot of maths (multiplying and adding signals).
- they deal with signals that come from the real world.
- they require a response in a certain time.
Where general purpose DSP processors are concerned, most applications deal with signal frequencies that are in the audio range. Digital Signal Processing is carried out by mathematical operations. In comparison, word processing and similar programs merely rearrange stored data. This means that computers designed for business and other general applications are not optimized for algorithms such as digital filtering and Fourier analysis. Digital Signal Processors are microprocessors specifically designed to handle Digital Signal Processing tasks. These devices have seen tremendous growth in the last decade, finding use in everything from cellular telephones to advanced scientific instruments. In fact, hardware engineers use "DSP" to mean Digital Signal Processor, just as algorithm developers use "DSP" to mean Digital Signal Processing.
2.2.2.2. DSPs vs Microprocessors The last forty years have shown that computers are extremely capable in two broad areas, data manipulation, such as word processing and database management, and mathematical calculation, used in science, engineering, and Digital Signal Processing. All microprocessors can perform both tasks; however, it is difficult (expensive) to make a device that is optimized for both. There are technical tradeoffs in the hardware design, such as the size of the instruction set and how interrupts are handled. Even more important, there are marketing issues involved: development and manufacturing cost, competitive position, product lifetime, and so on. As a broad generalization, these factors have made traditional microprocessors, such as the Pentium, primarily directed at data manipulation. Similarly, DSPs are designed to perform the mathematical calculations needed in Digital Signal Processing.
In the previous figure we observe that digital computers are useful for two general tasks: data manipulation and mathematical calculation. Data manipulation is based on moving data and testing inequalities, while mathematical calculation uses multiplication and addition. Data manipulation involves storing and sorting information. For instance, consider a word processing program. The basic task is to store the information (typed in by the operator), organize the information (cut and paste, spell checking, page layout, etc.), and then retrieve the information (such as saving the document on a floppy disk or printing it with a laser printer). These tasks are accomplished by moving data from one location to another, and testing for inequalities (A=B, A<B, etc.).
2.2.2.3. Circular buffer operation Digital Signal Processors are designed to quickly carry out FIR filters and similar techniques. To understand the hardware, we must first understand the algorithms. We need to distinguish between off-line processing and real-time processing. In offline processing, the entire input signal resides in the computer at the same time. In real-time processing, the output signal is produced at the same time that the input signal is being acquired. For example, this is needed in telephone communication, hearing aids, and radar. These applications must have the information immediately available, although it can be delayed by a short amount. For instance, a 10 millisecond delay in a telephone call cannot be detected by the speaker or listener. Likewise, it makes no difference if a radar signal is delayed by a few seconds before being displayed to the operator. Real-time applications input a sample, perform the algorithm, and output a sample, over-and-over. Alternatively, they may input a group of samples, perform the algorithm, and output a group of samples. This is the world of Digital Signal Processors.
The previous figure illustrates an eight sample circular buffer. We have placed this circular buffer in eight consecutive memory locations, 20041 to 20048. Figure (a) shows how the eight samples from the input might be stored at one particular instant in time, while (b) shows the changes after the next sample is acquired. The idea of circular buffering is that the end of this linear array is connected to its beginning; memory location 20041 is viewed as being next to 20048, just as 20044 is next to 20045. You keep track of the array by a pointer (a variable whose value is an address) that indicates where the most recent sample resides. For instance, in (a) the pointer contains the address 20044, while in (b) it contains 20045. When a new sample is acquired, it replaces the oldest sample in the array, and the pointer is moved one address ahead. Circular buffers are efficient because only one value needs to be changed when a new sample is acquired. The whole point of this discussion is that DSPs should be optimized at managing circular buffers to achieve the highest possible execution speed. Circular buffering is also useful in off-line processing. Consider a program where both the input and the output signals are completely contained in memory. Circular buffering isn't needed for a convolution calculation, because every sample can be immediately accessed. However, many algorithms are implemented in stages, with an intermediate signal being created between each stage. For instance, a recursive filter carried out as a series of biquads operates in this way. The brute force method is to store the entire length of each intermediate signal in memory. Circular buffering provides another option: store only those intermediate samples needed for the calculation at hand. This reduces the required amount of memory, at the expense of a more complicated algorithm. 
The important idea is that circular buffers are useful for off-line processing, but critical for real-time applications. In the following table we will see the steps needed to implement an FIR filter using circular buffers for both the input signal and the coefficients. The efficient handling of these individual tasks is what separates a DSP from a traditional microprocessor.
1. Obtain a sample with the ADC; generate an interrupt
2. Detect and manage the interrupt
3. Move the sample into the input signal's circular buffer
4. Update the pointer for the input signal's circular buffer
5. Zero the accumulator
6. Control the loop through each of the coefficients
7. Fetch the coefficient from the coefficient's circular buffer
8. Update the pointer for the coefficient's circular buffer
9. Fetch the sample from the input signal's circular buffer
10. Update the pointer for the input signal's circular buffer
11. Multiply the coefficient by the sample
12. Add the product to the accumulator
13. Move the output sample (accumulator) to a holding buffer
14. Move the output sample from the holding buffer to the DAC
The goal is to make these steps execute quickly. Since steps 6-12 will be repeated many times (once for each coefficient in the filter), special attention must be given to these operations. Traditional microprocessors must generally carry out these 14 steps in serial (one after another), while DSPs are designed to perform them in parallel. In some cases, all of the operations within the loop (steps 6-12) can be completed in a single clock cycle.
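The core of the steps listed above (steps 3 to 13) can be sketched in software as follows. This is only an illustration of the bookkeeping that a DSP performs in hardware: the class name `CircularFIR` and its method `process` are hypothetical, the ADC, interrupt and DAC steps are left out, and the coefficients are read from an ordinary list rather than from a second circular buffer.

```python
class CircularFIR:
    """FIR filter whose input history is kept in a circular buffer."""

    def __init__(self, coeffs):
        self.coeffs = list(coeffs)
        self.buf = [0.0] * len(self.coeffs)   # one slot per coefficient
        self.newest = len(self.buf) - 1       # pointer to the most recent sample

    def process(self, sample):
        size = len(self.buf)
        # Steps 3-4: advance the pointer and overwrite the oldest sample.
        self.newest = (self.newest + 1) % size
        self.buf[self.newest] = sample
        acc = 0.0                             # step 5: zero the accumulator
        idx = self.newest
        for c in self.coeffs:                 # steps 6-12: multiply-accumulate loop
            acc += c * self.buf[idx]
            idx = (idx - 1) % size            # step back in time, wrapping around
        return acc                            # step 13: the output sample


flt = CircularFIR([1.0, 2.0, 3.0])
outputs = [flt.process(s) for s in [1.0, 1.0, 1.0, 1.0]]
```

Only one memory location changes per new sample; the modulo operations in this sketch stand in for the pointer wrap-around that DSP hardware is designed to handle at no extra cost.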
2.2.2.4. Architecture of the Digital Signal Processor One of the biggest bottlenecks in executing DSP algorithms is transferring information to and from memory. This includes data, such as samples from the input signal and the filter coefficients, as well as program instructions, the binary codes that go into the program sequencer. For example, suppose we need to multiply two numbers that reside somewhere in memory. To do this, we must fetch three binary values from memory, the numbers to be multiplied, plus the program instruction describing what to do.
The previous figure shows how this seemingly simple task is done in a traditional microprocessor. This is often called a Von Neumann architecture. Von Neumann architecture contains a single memory and a single bus for transferring data into and out of the central processing unit (CPU). Multiplying two numbers requires at least three clock cycles, one to transfer each of the three numbers over the bus from the memory to the CPU.
We don't count the time to transfer the result back to memory, because we assume that it remains in the CPU for additional manipulation (such as the sum of products in an FIR filter).
The Harvard architecture is shown in (b). This is named for the work done at Harvard University in the 1940s under the leadership of Howard Aiken (1900-1973). Aiken insisted on separate memories for data and program instructions, with separate buses for each. Since the buses operate independently, program instructions and data can be fetched at the same time, improving the speed over the single bus design. Most present day DSPs use this dual bus architecture.
Figure (c) illustrates the next level of sophistication, the Super Harvard Architecture. These are called SHARC DSPs, a contraction of the longer term, Super Harvard ARChitecture. The idea is to build upon the Harvard architecture by adding features to improve the throughput. While the SHARC DSPs are optimized in dozens of ways, two areas are important enough to be included: an instruction cache, and an I/O controller. Now let's look inside the CPU. At the top of the diagram are two blocks labeled Data Address Generator (DAG), one for each of the two memories. These control the addresses sent to the program and data memories, specifying where the information is to be read from or written to.
In simpler microprocessors this task is handled as an inherent part of the program sequencer, and is quite transparent to the programmer. However, DSPs are designed to operate with circular buffers, and benefit from the extra hardware to manage them efficiently. The DAGs in the SHARC DSPs are also designed to efficiently carry out the Fast Fourier Transform. In this mode, the DAGs are configured to generate bit-reversed addresses into the circular buffers, a necessary part of the FFT algorithm. In addition, an abundance of circular buffers greatly simplifies DSP code generation, both for the human programmer and for high-level language compilers, such as C.
The math processing is broken into three sections: a multiplier, an arithmetic logic unit (ALU), and a barrel shifter. The multiplier takes the values from two registers, multiplies them, and places the result into another register. The ALU performs addition, subtraction, absolute value, logical operations (AND, OR, XOR, NOT), conversion between fixed and floating point formats, and similar functions. Elementary binary operations are carried out by the barrel shifter, such as shifting, rotating, extracting and depositing segments, and so on. A powerful feature of the SHARC family is that the multiplier and the ALU can be accessed in parallel. In a single clock cycle, data from registers 0-7 can be passed to the multiplier, data from registers 8-15 can be passed to the ALU, and the two results returned to any of the 16 registers.
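The bit-reversed addressing mentioned above for the FFT can be illustrated in software. The sketch below shows only the address pattern, not how a DAG produces it in hardware; the function names `bit_reverse` and `bit_reversed_order` are hypothetical.

```python
def bit_reverse(index, bits):
    """Reverse the lowest `bits` bits of index (e.g. 0b001 -> 0b100 for bits=3)."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)   # move the lowest bit of index into result
        index >>= 1
    return result


def bit_reversed_order(n):
    """Return the bit-reversed address sequence for an n-point FFT (n a power of 2)."""
    bits = n.bit_length() - 1
    if n < 1 or (1 << bits) != n:
        raise ValueError("n must be a power of 2")
    return [bit_reverse(i, bits) for i in range(n)]


order = bit_reversed_order(8)
```

For an 8-point transform the addresses 0 to 7 are visited in the order 0, 4, 2, 6, 1, 5, 3, 7, which is the order in which a radix-2 FFT reads or writes its samples.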
The previous figure shows the typical DSP architecture. This is the simplified diagram of a SHARC DSP, where we can observe the blocks that compose the CPU, which we have described before.
2.2.2.5. Data formats: Fixed and Floating Point
Digital Signal Processors can be divided into two categories, fixed point and floating point. These refer to the format used to store and manipulate numbers within the devices. Fixed point DSPs usually represent each number with a minimum of 16 bits, although a different length can be used. In comparison, floating point DSPs typically use a minimum of 32 bits to store each value. Let's study these two data formats. It is worth noting that fixed point format is not quite the same as integer:
The integer format is straightforward: representing whole numbers from 0 up to the largest whole number that can be represented with the available number of bits. Fixed point format is used to represent numbers that lie between 0 and 1: with a 'binary point' assumed to lie just after the most significant bit. The most significant bit in both cases carries the sign of the number.

The size of the fraction represented by the smallest bit is the precision of the fixed point format. The size of the largest number that can be represented in the available word length is the dynamic range of the fixed point format.
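The precision and the largest representable value of a fixed point format can be checked with a few lines of code. This is a sketch under the assumption of a 16 bit word with one sign bit and 15 fraction bits (often called Q15); the helper names to_q15 and from_q15 are hypothetical.

```python
Q15_FRACTION_BITS = 15            # assumed format: 1 sign bit + 15 fraction bits
Q15_SCALE = 1 << Q15_FRACTION_BITS


def to_q15(value: float) -> int:
    """Quantise a real number to a 16 bit fixed point integer, saturating at the limits."""
    raw = int(round(value * Q15_SCALE))
    return max(-Q15_SCALE, min(Q15_SCALE - 1, raw))


def from_q15(raw: int) -> float:
    """Convert the stored integer back to the real number it represents."""
    return raw / Q15_SCALE


precision = 1 / Q15_SCALE             # size of the fraction represented by the smallest bit
largest = from_q15(Q15_SCALE - 1)     # largest positive value that fits in the word
x = to_q15(0.3)
error = abs(from_q15(x) - 0.3)        # quantisation error, at most half the precision

print(precision, largest, x, error)
```

The quantisation error never exceeds half of the smallest bit, which is why the word length directly fixes the precision of the format.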
To make the best use of the full available word length in the fixed point format, the programmer has to make some decisions:
If a fixed point number becomes too large for the available word length, the programmer has to scale the number down, by shifting it to the right: in the process lower bits may drop off the end and be lost.
If a fixed point number is small, the number of bits actually used to represent it is small. The programmer may decide to scale the number up, in order to use more of the available word length.
In both cases the programmer has to keep track of how far the binary point has been shifted, in order to restore all numbers to the same scale at some later stage. Floating point format has the remarkable property of automatically scaling all numbers by moving, and keeping track of, the binary point so that all numbers use the full word length available but never overflow:
Floating point numbers have two parts: the mantissa, which is similar to the fixed point part of the number, and an exponent which is used to keep track of how the binary point is shifted. Every number is scaled by the floating point hardware:

If a number becomes too large for the available word length, the hardware automatically scales it down, by shifting it to the right. If a number is small, the hardware automatically scales it up, in order to use the full available word length of the mantissa.
In both cases the exponent is used to count how many times the number has been shifted. In floating point numbers the binary point comes after the second most significant bit in the mantissa.
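The split of a floating point number into mantissa and exponent can be inspected with the standard library. This is only a sketch: math.frexp normalises the mantissa to the range 0.5 <= |m| < 1, which places the binary point differently from the DSP convention described above, but the principle of automatic scaling is the same. The function name decompose is our own.

```python
import math


def decompose(value: float):
    """Split value into (mantissa, exponent) so that value == mantissa * 2**exponent."""
    mantissa, exponent = math.frexp(value)  # 0.5 <= |mantissa| < 1 for nonzero input
    return mantissa, exponent


for v in (0.001, 1.0, 1000.0):
    m, e = decompose(v)
    print(f"{v:>8} = {m:.6f} * 2**{e}")
```

Small and large values both end up with a mantissa that uses the full available word length; only the exponent changes.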
The block floating point format provides some of the benefits of floating point, but by scaling blocks of numbers rather than each individual number:
Block floating point numbers are actually represented by the full word length of a fixed point format.
If any one of a block of numbers becomes too large for the available word length, the programmer scales down all the numbers in the block, by shifting them to the right. If the largest of a block of numbers is small, the programmer scales up all numbers in the block, in order to use the full available word length of the mantissa.
In both cases the exponent is used to count how many times the numbers in the block have been shifted. Some specialised processors, such as those from Zilog, have special features to support the use of block floating point format: more usually, it is up to the programmer to test each block of numbers and carry out the necessary scaling. The floating point format has one further advantage over fixed point: it is faster. Because of quantisation error, a basic direct form 1 IIR filter second order section requires an extra multiplier, to scale numbers and avoid overflow. But the floating point hardware automatically scales every number to avoid overflow, so this extra multiplier is not required:
The precision with which numbers can be represented is determined by the word length in the fixed point format, and by the number of bits in the mantissa in the floating point format. In a 32 bit DSP processor the mantissa is usually 24 bits, so the precision of a floating point DSP is the same as that of a 24 bit fixed point processor. But floating point has one further advantage over fixed point: because the hardware automatically scales each number to use the full word length of the mantissa, the full precision is maintained even for small numbers.
The next figure illustrates the primary trade-offs between fixed and floating point DSPs:
Fixed point DSPs are generally cheaper, while floating point devices have better precision, higher dynamic range, and a shorter development cycle. When fixed point is chosen, the cost of the product will be reduced, but the development cost will probably be higher due to the more difficult algorithms. In the reverse manner, floating point will generally result in a quicker and cheaper development cycle, but a more expensive final product.
2.2.2.6. Programming languages: C and Assembly
DSPs are programmed in the same languages as other scientific and engineering applications, usually assembly or C. Programs written in assembly can execute faster, while programs written in C are easier to develop and maintain. In traditional applications, such as programs run on personal computers and mainframes, C is almost always the first choice. If assembly is used at all, it is restricted to short subroutines that must run with the utmost speed. For every traditional programmer that works in assembly, there are approximately ten that use C. However, DSP programs are different from traditional software tasks in two important respects. First, the programs are usually much shorter, say, one hundred lines versus ten thousand lines. Second, the execution speed is often a critical part of the application. After all, that's why someone uses a DSP in the first place: for its blinding speed. These two factors motivate many software engineers to switch from C to assembly for programming Digital Signal Processors.
2.2.2.7. How Fast are DSPs
The primary reason for using a DSP instead of a traditional microprocessor is speed: the ability to move samples into the device, carry out the needed mathematical operations, and output the processed data. This brings up the question: how fast are DSPs? The usual way of answering this question is benchmarks, methods for expressing the speed of a microprocessor as a number. For instance, fixed point systems are often quoted in MIPS (million integer operations per second). Likewise, floating point devices can be specified in MFLOPS (million floating point operations per second). The throughput of a particular DSP algorithm can be found by dividing the clock rate by the required number of clock cycles per sample. The next illustration shows the range of throughput for four common algorithms, executed on a SHARC DSP at a clock speed of 40 MHz.
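The throughput rule stated above (clock rate divided by clock cycles per sample) is simple enough to evaluate directly. The cycle counts in this sketch are illustrative values chosen by us, not measurements of any particular algorithm.

```python
def throughput(clock_hz: float, cycles_per_sample: float) -> float:
    """Samples per second that an algorithm can sustain at the given clock rate."""
    return clock_hz / cycles_per_sample


CLOCK_HZ = 40e6  # 40 MHz clock, as in the text

# Illustrative cycle counts per sample (assumptions, not benchmark data).
for cycles in (4, 40, 400):
    print(f"{cycles:>3} cycles/sample -> {throughput(CLOCK_HZ, cycles):,.0f} samples/s")
```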
In the previous figure we have introduced a new technique, Fast Fourier Transform (FFT) convolution. FFT convolution is a fast way to carry out FIR filters. In a typical case, a 512 sample segment is taken from the input, padded with an additional 512 zeros, and converted into its frequency spectrum by using a 1024 point FFT. After multiplying this spectrum by the desired frequency response, a 1024 point Inverse FFT is used to move back into the time domain. The resulting 1024 points are combined with the adjacent processed segments using the overlap-add method. This produces 512 points of the output signal.
FFT convolution can also be applied in two dimensions (2D), such as for image processing. For instance, suppose we want to process an 800×600 pixel image in the frequency domain. First, pad the image with zeros to make it 1024×1024. The two-dimensional frequency spectrum is then calculated by taking the FFT of each of the rows, followed by taking the FFT of each of the resulting columns. After multiplying this 1024×1024 spectrum by the desired frequency response, the two-dimensional Inverse FFT is taken. This is carried out by taking the Inverse FFT of each of the rows, and then each of the resulting columns. Adding the number of clock cycles and dividing by the number of samples, we find that this entire procedure takes roughly 150 clock cycles per pixel. For a 40 MHz SHARC DSP, this corresponds to a data throughput of about 260k samples/second.
Comparing the different techniques shown in the previous illustration, we can make an important observation. Nearly all DSP techniques require between 4 and 400 instructions (clock cycles in the SHARC family) to execute. For a SHARC DSP operating at 40 MHz, we can immediately conclude that its data throughput will be between 100k and 10M samples per second, depending on how complex an algorithm is used.
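The one-dimensional overlap-add procedure described above can be sketched in plain Python. This is a minimal illustration under our own assumptions: a small recursive radix-2 FFT written for clarity rather than speed, 6 sample segments with an 8 point FFT instead of 512 and 1024, and an arbitrary 3 tap kernel. The function names are hypothetical.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT; len(x) must be a power of two. The inverse is unscaled."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def overlap_add(signal, kernel, segment_len, fft_len):
    """Filter `signal` with the FIR `kernel` by FFT convolution and overlap-add."""
    assert segment_len + len(kernel) - 1 <= fft_len  # avoids circular wrap-around
    spectrum = fft(list(kernel) + [0.0] * (fft_len - len(kernel)))
    out = [0.0] * (len(signal) + fft_len)
    for start in range(0, len(signal), segment_len):
        seg = list(signal[start:start + segment_len])
        seg_spec = fft(seg + [0.0] * (fft_len - len(seg)))      # zero-padded segment
        y = fft([a * b for a, b in zip(seg_spec, spectrum)], inverse=True)
        for i, v in enumerate(y):
            out[start + i] += v.real / fft_len                   # add overlapping tails
    return out[:len(signal) + len(kernel) - 1]


def direct_convolution(signal, kernel):
    """Reference result computed directly in the time domain."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


signal = [float((7 * i) % 5 - 2) for i in range(20)]
kernel = [0.25, 0.5, 0.25]
fast = overlap_add(signal, kernel, segment_len=6, fft_len=8)
slow = direct_convolution(signal, kernel)
max_diff = max(abs(a - b) for a, b in zip(fast, slow))
print(max_diff)
```

Both routes give the same output up to rounding error; the FFT route only pays off for long kernels, where its cost per sample grows much more slowly than that of direct convolution.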
2.3. Analog vs. Digital Filters

2.3.1. Introduction
Most digital signals originate in analog electronics. If the signal needs to be filtered, is it better to use an analog filter before digitization, or a digital filter after? We will answer this question by letting two of the best contenders deliver their blows. The goal will be to provide a low-pass filter at 1 kHz. Fighting for the analog side is a six pole Chebyshev filter with 0.5 dB (6%) ripple. This can be constructed with 3 operational amplifiers, 12 resistors, and 6 capacitors. In the digital corner, the windowed-sinc is warming up and ready to fight. The analog signal is digitized at a 10 kHz sampling rate, making the cutoff frequency 0.1 on the digital frequency scale. The length of the windowed-sinc will be chosen to be 129 points, providing the same 90% to 10% roll-off as the analog filter. Fair is fair. The next figure shows the frequency and step responses for these two filters.
Let's compare the two filters blow-by-blow. As shown in (a) and (b), the analog filter has a 6% ripple in the passband, while the digital filter is perfectly flat (within 0.02%). The analog designer might argue that the ripple can be selected in the design; however, this misses the point. The flatness achievable with analog filters is limited by the accuracy of their resistors and capacitors. Even if a Butterworth response is designed (i.e., 0% ripple), filters of this complexity will have a residual ripple of, perhaps, 1%. On the other hand, the flatness of digital filters is primarily limited by round-off error, making them hundreds of times flatter than their analog counterparts. Score one point for the digital filter.
Next, look at the frequency response on a log scale, as shown in (c) and (d). Again, the digital filter is clearly the victor in both roll-off and stopband attenuation. Even if the analog performance is improved by adding additional stages, it still can't compare to the digital filter.
For instance, imagine that you need to improve these two parameters by a factor of 100. This can be done with simple modifications to the windowed-sinc, but is virtually impossible for the analog circuit. Score two more for the digital filter. The step response of the two filters is shown in (e) and (f). The digital filter's step response is symmetrical between the lower and upper portions of the step, i.e., it has a linear phase. The analog filter's step response is not symmetrical, i.e., it has a nonlinear phase. One more point for the digital filter. Lastly, the analog filter overshoots about 20% on one side of the step. The digital filter overshoots about 10%, but on both sides of the step. Since both are bad, no points are awarded. In spite of this beating, there are still many applications where analog filters should, or must, be used. This is not related to the actual performance of the filter (i.e., what goes in and what comes out), but to the general advantages that analog circuits have over digital techniques. The first advantage is speed: digital is slow; analog is fast.
For example, a personal computer can only filter data at about 10,000 samples per second, using FFT convolution. Even simple operational amplifiers can operate at 100 kHz to 1 MHz, 10 to 100 times as fast as the digital system!
The second inherent advantage of analog over digital is dynamic range. This comes in two flavors. Amplitude dynamic range is the ratio between the largest signal that can be passed through a system, and the inherent noise of the system. For instance, a 12 bit ADC has a saturation level of 4095, and an rms quantization noise of 0.29 digital numbers, for a dynamic range of about 14000. In comparison, a standard op amp has a saturation voltage of about 20 volts and an internal noise of about 2 microvolts, for a dynamic range of about ten million. Just as before, a simple op amp devastates the digital system.
The other flavor is frequency dynamic range. For example, it is easy to design an op amp circuit to simultaneously handle frequencies between 0.01 Hz and 100 kHz (seven decades). When this is tried with a digital system, the computer becomes swamped with data. For instance, sampling at 200 kHz, it takes 20 million points to capture one complete cycle at 0.01 Hz. You may have noticed that the frequency response of digital filters is almost always plotted on a linear frequency scale, while analog filters are usually displayed with a logarithmic frequency. This is because digital filters need a linear scale to show their exceptional filter performance, while analog filters need the logarithmic scale to show their huge dynamic range.
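The dynamic range figures quoted above can be reproduced with a short calculation. This sketch assumes the usual result that the rms quantization noise of an ideal converter is 1/sqrt(12) of one digital number (about 0.29); the variable names are our own.

```python
import math

# Amplitude dynamic range of a 12 bit ADC.
adc_bits = 12
adc_full_scale = 2 ** adc_bits - 1          # saturation level: 4095
quant_noise_rms = 1 / math.sqrt(12)         # about 0.29 digital numbers
adc_dynamic_range = adc_full_scale / quant_noise_rms

# Amplitude dynamic range of a standard op amp.
opamp_dynamic_range = 20.0 / 2e-6           # 20 V saturation over 2 microvolts of noise

# Points needed to capture one cycle of the lowest frequency.
sample_rate_hz = 200e3
lowest_freq_hz = 0.01
points_per_cycle = sample_rate_hz / lowest_freq_hz

print(round(adc_dynamic_range), round(opamp_dynamic_range), round(points_per_cycle))
```

The results are roughly 14,000 for the ADC, ten million for the op amp, and twenty million points per cycle, in agreement with the values in the text.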
2.3.2. Advantages of using digital filters in front of analog circuits
The following list gives some of the main advantages of digital over analog filters.
1. A digital filter is programmable, i.e. its operation is determined by a program stored in the processor's memory. This means the digital filter can easily be changed without affecting the circuitry (hardware). An analog filter can only be changed by redesigning the filter circuit.
2. Digital filters are easily designed, tested and implemented on a general-purpose computer or workstation.
3. The characteristics of analog filter circuits (particularly those containing active components) are subject to drift and are dependent on temperature. Digital filters do not suffer from these problems, and so are extremely stable with respect both to time and temperature.
4. Unlike their analog counterparts, digital filters can handle low frequency signals accurately. As the speed of DSP technology continues to increase, digital filters are being applied to high frequency signals in the RF (radio frequency) domain, which in the past was the exclusive preserve of analog technology.
5. Digital filters are very much more versatile in their ability to process signals in a variety of ways; this includes the ability of some types of digital filter to adapt to changes in the characteristics of the signal.
6. Fast DSP processors can handle complex combinations of filters in parallel or cascade (series), making the hardware requirements relatively simple and compact in comparison with the equivalent analog circuitry.
Comparative table between Digital and Analog Filtering
DIGITAL FILTERS | ANALOG FILTERS
High Accuracy | Less Accuracy (Component Tolerances)
Linear Phase (FIR Filters) | Non-Linear Phase
No Drift Due to Component Variations | Drift Due to Component Variations
Flexible, Adaptive Filtering Possible | Adaptive Filters Difficult
Easy to Simulate and Design | Difficult to Simulate and Design
Computation Must be Completed in Sampling Period (Limits Real Time Operation) | Analog Filters Required at High Frequencies and for Anti-Aliasing Filters
Requires High Performance ADC, DAC & DSP | No ADC, DAC, or DSP Required
In the previous table we can observe the pros and cons of using analog or digital filters. Evaluating all these comparisons, we conclude that digital filters are more appropriate for our application.
3. Analysis of the possible solutions

3.1. Solution ways
This chapter studies in depth the two possible solutions: conventional digital adaptive filtering, and digital signal processors configured or programmed to work as an adaptive filter in the main application. We begin with the field of digital adaptive filtering (circuits).
3.1.1. Digital Adaptive Filters for Echo Cancelling

In this subchapter we begin with an examination of the filter structure with emphasis on FIR (Finite Impulse Response) filters. This is followed by a review of the Wiener filter leading to the development of the LMS (Least Mean Squares) algorithm. The NLMS (normalised LMS) is detailed with a comparison of its main characteristics to the LMS. Thus the standard elements of an echo canceller are described. We start by introducing the echo canceller itself.
3.1.1.1. The Echo Canceller
Reduced to its major components, the echo canceller is quite a basic mechanism. Essentially it is an adaptive filter used for the purpose of direct system modelling, as shown in the next figure. An adaptive filter is a device which extracts the required information from a signal by adjusting its parameters in response to the environment to give the optimal solution. It is made up of two elements, a filter and an adaptive algorithm.
The previous figure shows both the general structure for the system modelling application and the basic echo canceller structure. Comparing these, it can be seen that they are of the same configuration.
Referring to the direct system modelling configuration in this figure, the adaptive filter aims to model the unknown system by adjusting its weights to replicate the transfer function of the system. The same input signal is passed through both the unknown system and the filter, with the output from the unknown system being the desired signal and the output from the filter being the synthesized signal. The principle of this application is to minimise the difference, known as the error, between the desired and the synthesized signals until eventually the two are identical. The error signal is fed into an adaptive algorithm which is based on some function for minimising this error, and the filter coefficients are recalculated. In theory, this process should lead to finding a filter which models the unknown system exactly. This class of adaptive filters is used for system identification and for echo and interference cancellation.
Referring now to the echo canceller configuration in the figure, the unknown system in this case is the hybrid transformer, which causes a reflection of the transmitted signal back to the source terminal as echo. The transmit signal is also input to the adaptive filter (echo canceller) to enable a synthesized echo, $\hat{z}(n)$, to be created. The error signal, $e(n) = z(n) - \hat{z}(n)$, is the difference between the actual echo and the replicated echo and is fed back into the adaptive algorithm, which then updates the filter coefficients. This error signal is also passed along the line to the (source) receiver and will be heard as echo if the canceller is not working effectively.
Once the error has reached zero, the synthesized echo is a match to the actual echo and no echo appears at the receiver of the source terminal. This has described the simple case of single-talking (ST) and the consequential echo in the network. More generally, the desired signal z(n) is made up of a double-talk (DT) signal plus the echo of the single-talk signal x(n). Since the DT signal is uncorrelated with the ST signal, it is also uncorrelated with the echo. The echo canceller therefore sees the DT signal as noise, and the filter still adapts towards the echo based on the error.
3.1.1.2. The FIR Filter Echo Canceller
An echo canceller is a closed loop linear adaptive filter used for direct system modelling. There are many different combinations of filters and algorithms, depending on the particular application requirements: from FIR to IIR (Infinite Impulse Response) filters, and from LMS to RLS (Recursive Least Squares) algorithms. For echo cancellation, there is a classical standard adaptive filter configuration. The filter part is made up of the most commonly used structure: a FIR filter, which is also known as a tapped delay line, non-recursive or feed-forward transversal filter, as shown in the next figure and as seen in previous chapters.
FIR filter structure
The FIR filter consists of a series of delays, multipliers and adders; it has one input, x(n), and one output, y(n). The output is expressed as a linear combination of the delayed input samples:
$y(n) = \sum_{i=0}^{N-1} w_i(n)\, x(n-i)$
where $w_i(n)$ are the filter coefficients and N is the filter length. y(n) is therefore the convolution (inner product) of the two vectors w(n) and x(n). The significant advantage that FIR filters have over other structures is that their transfer functions contain only zeros:
$H(z) = \sum_{i=0}^{N-1} w_i\, z^{-i}$
This makes FIR filters inherently and unconditionally stable. IIR filters, on the other hand, generally have both poles and zeros in their transfer functions, and their output can oscillate or grow without bound if the poles move outside the unit circle. However, subject to constraints, there are conditionally stable IIR filter forms such as lattice filters.
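The tapped delay line structure described above can be sketched as follows. The coefficient values are hypothetical and chosen only to make the impulse response easy to read; the function name fir_filter is our own.

```python
def fir_filter(samples, weights):
    """Return y(n) = sum_i weights[i] * x(n - i), taking x(n) = 0 for n < 0."""
    delay_line = [0.0] * len(weights)       # holds x(n), x(n-1), ..., x(n-N+1)
    output = []
    for x in samples:
        delay_line = [x] + delay_line[:-1]  # shift the new sample into the delay line
        output.append(sum(w * d for w, d in zip(weights, delay_line)))
    return output


weights = [0.5, 0.3, 0.2]                   # hypothetical filter coefficients
impulse = [1.0, 0.0, 0.0, 0.0, 0.0]
response = fir_filter(impulse, weights)
print(response)  # [0.5, 0.3, 0.2, 0.0, 0.0]
```

The impulse response simply reproduces the coefficients and then returns to zero after N samples. Since no output value is fed back, nothing can build up over time, which is the practical meaning of unconditional stability.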
There are additional reasons why FIR filters are more often used than IIR filters, such as:
- They can be designed to be linear phase: so long as the FIR coefficients are symmetrical, the filter does not distort the phase of the input signal.
- They are simple to implement and to adapt, as they have a limited memory. Only present and past input samples are used to derive the current output value. No use is made of any previous output samples in influencing the latest output of the filter.
- They have desirable numeric properties. For finite-precision arithmetic they are less susceptible to round-off, overflow or coefficient quantization errors, as they have no output feedback. The majority of digital signal processors are implemented in finite-precision arithmetic.
However, there are also some disadvantages to FIR filters. The desired responses that the filters aim to synthesize often contain both poles and zeros. Therefore, the FIR filters are only approximating the exact model. This can lead to very large FIR filters with large memories, high cost and computational complexity, since a strong pole may require a few hundred zeros in return for satisfactory performance. An IIR filter, however, could achieve the desired response using much less memory and fewer calculations.
3.1.1.3. The Wiener Filter
For linear filtering problems with stationary inputs, the adaptive algorithms aim to converge to the same solution as the Wiener filter. This includes applications such as the system identification model, i.e. echo cancellation, as well as areas such as signal estimation, prediction and smoothing. The next figure shows the filtering problem of estimating the excitation or interference process which is stimulating or corrupting the input signal, having available the original input signal and a reference signal, which is generally known as the desired signal.
On due consideration of this filtering problem, N. Wiener and E. Hopf, in 1931, produced a formula for obtaining the optimum solution for the estimated filter taps for a linear continuous-time filter. This formula required the solution of an integral equation, which has become known as the Wiener-Hopf equation and was based on minimising the mean square error of the system. It was then developed by Levinson in 1947 to produce the Wiener-Hopf equation for discrete-time signals. This is expressed as:
$R\, w_0 = P$
the solution of which is:
$w_0 = R^{-1} P$
where $w_0$ is the optimum tap-weight vector, R is the (N×N) auto-correlation matrix of the tap inputs, and P is the (N×1) cross-correlation vector between the tap inputs and the desired response.
Wiener filters minimise the mean square error of the system. This cost function is one of the simplest to mathematically solve and in the majority of cases it has a single global minimum, particularly in the case of a FIR filter. For the ideal situation therefore the optimum solution of a Wiener filter exists and can be reached exactly.
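The Wiener-Hopf solution can be illustrated numerically for a very small case. The following sketch assumes a hypothetical two tap unknown system and a noise-free desired signal, estimates R and P by time averaging, and solves the 2×2 system with the explicit matrix inverse; all names and values are our own illustrative choices.

```python
import random

random.seed(0)
true_system = [0.8, -0.4]   # hypothetical 2 tap unknown system
x = [random.uniform(-1.0, 1.0) for _ in range(2000)]

# Tap-input vectors u(n) = [x(n), x(n-1)] and the desired response d(n).
taps = [[x[n], x[n - 1]] for n in range(1, len(x))]
desired = [sum(h * u for h, u in zip(true_system, vec)) for vec in taps]

count = len(taps)
# Auto-correlation matrix R and cross-correlation vector P, estimated by averaging.
R = [[sum(u[i] * u[j] for u in taps) / count for j in range(2)] for i in range(2)]
P = [sum(u[i] * d for u, d in zip(taps, desired)) / count for i in range(2)]

# Solve R w0 = P using the explicit inverse of a 2x2 matrix.
det = R[0][0] * R[1][1] - R[0][1] * R[1][0]
w0 = [(R[1][1] * P[0] - R[0][1] * P[1]) / det,
      (-R[1][0] * P[0] + R[0][0] * P[1]) / det]
print(w0)
```

In this idealised case the optimum tap weights coincide with the unknown system. For filters with hundreds of taps the matrix inversion becomes expensive, which is one motivation for the iterative algorithms discussed next.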
3.1.1.4. Least Mean Squares Algorithm
In practice Wiener filters are commonly implemented as FIR filters using an algorithm of the LMS family. LMS algorithms are capable of arriving at close approximations of the same (optimum) solution as the Wiener-Hopf equation, but without directly solving the equation. The method of least squares is credited to Gauss, though Legendre independently developed the same method. With work commencing on adaptive filters in the late 1950s, the least squares algorithm was developed to produce the least mean squares. This achievement was a result of the work of Widrow and Hoff on an adaptive linear element known as Adaline, which was a pattern recognition scheme. From the LMS came the normalised LMS and later the leaky LMS. Then came the development of the signed LMS, the quantized LMS and so on.
The LMS algorithm follows a stochastic gradient approach to finding the optimum Wiener solution, $w_0$, by minimization of the mean-square error, where the error is the difference between the output from the filter and the desired response. It basically follows the steepest descent method of finding an optimum solution of the cost function, which in this case is the mean-squared error. The steepest descent algorithm is expressed as:
$\hat{w}(k+1) = \hat{w}(k) - \mu(k)\, \nabla F\{\hat{w}(k)\}$
whereby the filter coefficients, $\hat{w}$, are updated proportionally to the gradient value $\nabla F\{\hat{w}(k)\}$, $\mu(k)$ is a small step size, and the minus sign ensures that the parameter estimates descend the error surface.
However, whereas an actual or determined value for the gradient is calculated at each stage in the steepest descent, with a stochastic gradient method instantaneous values are used to calculate an estimate of the gradient. This gives a random or stochastic gradient. In order to derive the LMS algorithm, the steepest descent equation is rewritten in terms of the instantaneous squared error, which serves as the estimate of the mean-squared error cost function:
$w(n+1) = w(n) - \mu\, \nabla_w \left( e^2(n) \right)$
Taking the error as the desired response minus the filter output, and solving the gradient to find its value, we conclude:
$\nabla_w \left( e^2(n) \right) = \frac{\partial e^2(n)}{\partial w(n)} = -2\, e(n)\, x(n)$
Substituting this value in the last equation, the LMS algorithm is given by:
$w(n+1) = w(n) + 2 \mu\, e(n)\, x(n)$
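The LMS update can be tried out on a small system identification problem, in the same configuration as the echo canceller. This is a minimal sketch, assuming a hypothetical three tap echo path, a noise-free desired signal and an arbitrarily chosen step size; the function name lms_identify is our own.

```python
import random


def lms_identify(x, d, num_taps, mu):
    """Adapt an FIR filter with the update w(n+1) = w(n) + 2*mu*e(n)*x(n)."""
    w = [0.0] * num_taps
    taps = [0.0] * num_taps                  # x(n), x(n-1), ..., x(n-N+1)
    errors = []
    for sample, target in zip(x, d):
        taps = [sample] + taps[:-1]
        y = sum(wi * ti for wi, ti in zip(w, taps))   # synthesized signal
        e = target - y                                # desired minus filter output
        w = [wi + 2.0 * mu * e * ti for wi, ti in zip(w, taps)]
        errors.append(e)
    return w, errors


random.seed(1)
unknown = [0.6, -0.3, 0.1]                   # hypothetical echo path
x = [random.uniform(-1.0, 1.0) for _ in range(5000)]
d = [sum(h * x[n - i] for i, h in enumerate(unknown) if n - i >= 0)
     for n in range(len(x))]

w, errors = lms_identify(x, d, num_taps=3, mu=0.05)
print(w)
```

The adapted weights approach the coefficients of the unknown system without any correlation matrix being formed or inverted. Only multiplications and additions on the current tap vector are needed at each iteration.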
The LMS has a number of favourable properties that make it a popular algorithm to use:
- One of the greatest advantages of the LMS is its simplicity: it requires neither direct calculation of the correlation functions nor matrix inversion, both of which can be quite expensive computationally.
- It is a straightforward algorithm to carry out, having on each iteration only (N+1) multiplications and N additions, where N is the number of tap-weights in the filter.
- Another favoured property of the LMS is its robustness: small uncertainties, non-linearities or small (energy) disturbances cause only small estimation errors. These do not cause the LMS to veer wildly from its adaptation path, and it continues tracking towards the optimum solution.
Generally, a sufficient condition for the stability of the LMS is for the step size to lie within the range $0 < \mu < 1/\lambda_{max}$, where $\lambda_{max}$ is the largest eigenvalue of the (input) autocorrelation matrix. This gives convergence of the LMS in the mean and is in fact the necessary and sufficient condition for the stability of the steepest descent algorithm. Much analysis has been carried out on the stability condition of the LMS, with some studies providing more restrictive limits on the step size.
Along with the input signal power, the step size governs the stability, convergence time and the fluctuations (misadjustment) of the adaptation process. A large step size will cause the algorithm to converge quickly, but it may also cause it to oscillate noticeably about the adaptation path. There is also the possibility that, if too large a step size is chosen, the coefficients will diverge due to an exponentially increasing error. For a small step size the LMS will converge very slowly, but the stability of the algorithm can then be guaranteed.
The rate of convergence of the LMS has the undesirable property of being dependent on the input signal statistics, i.e. on the range of the eigenvalues of the autocorrelation matrix (R).
If the input signal is a highly correlated signal such as speech, then there is a wide spread of eigenvalues and convergence is very slow, i.e. the ratio $\lambda_{min}/\lambda_{max}$ is small. In this case it is advisable to select a small step size. If, on the other hand, the input signal is reasonably uncorrelated, such as white noise, then the eigenvalues are approximately all equal and constant and the convergence rate is fast, i.e. the ratio $\lambda_{min}/\lambda_{max}$ is approximately equal to 1. The smallest eigenvalue is responsible for the convergence speed, since the algorithm converges slowly for the weak eigenvalues.
When the eigenvalues are approximately equal, the LMS converges at the same speed for all of them. Therefore the LMS will converge slowly for low power signals.
The misadjustment of an algorithm is a dimensionless quantitative measure of how close the adapted result is to the optimum Wiener solution. Since the Wiener-Hopf equation solves for minimal mean-square error, the misadjustment is given by the ratio of the steady-state excess mean-square error to the minimum mean-square error, i.e.:
$M = \frac{\text{excess MSE}}{\text{MSE}_{min}}$
For the LMS, the previous equation is approximated to:
$M \approx \mu\, \mathrm{tr}[R] \approx \frac{N}{4\, \tau_{mse}}$
where tr[R] is the trace of the autocorrelation matrix, N is the number of filter taps, and $\tau_{mse}$ is the time constant of the weight adaptation process, i.e. the time constant of the learning curve.
From these equations it can be seen that the misadjustment is directly proportional to the step size, whereas the time constant is inversely proportional to it. Therefore, to have a small misadjustment the step size needs to be small, but for the LMS to converge quickly a large step size is required. The choice of the step size therefore needs to be made carefully. These equations also highlight that the misadjustment increases linearly with the filter length for a fixed time constant.
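The influence of the input statistics on the step size can be made concrete for a two tap filter. This sketch assumes a unit power input whose autocorrelation matrix is [[r0, r1], [r1, r0]], for which the eigenvalues are r0 - |r1| and r0 + |r1|, and it uses the commonly quoted approximation that the misadjustment is about the step size times tr[R]. The correlation values 0.0 and 0.9 are illustrative assumptions, not measured speech data.

```python
def eigenvalues_2x2(r0, r1):
    """Eigenvalues of the symmetric Toeplitz matrix [[r0, r1], [r1, r0]]."""
    return r0 - abs(r1), r0 + abs(r1)


def summarize(r0, r1, mu):
    """Eigenvalue ratio, step size bound and approximate misadjustment."""
    lam_min, lam_max = eigenvalues_2x2(r0, r1)
    return {
        "spread": lam_min / lam_max,     # close to 1 for white noise
        "mu_bound": 1.0 / lam_max,       # stability bound 0 < mu < 1/lambda_max
        "misadjustment": mu * 2 * r0,    # mu * tr[R], with tr[R] = 2 * r0
    }


white = summarize(1.0, 0.0, mu=0.01)
speech_like = summarize(1.0, 0.9, mu=0.01)
print(white)
print(speech_like)
```

For the correlated input the eigenvalue ratio drops to about 0.05 and the step size bound is roughly halved, while the misadjustment for a given step size stays the same because the input power is unchanged.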
3.1.1.5. Normalised Least Mean Squares Algorithm
Though other algorithms belonging to other families, such as the RLS, may be more complex, less stable and more difficult and expensive to implement, they often converge much faster than the LMS. This fast convergence is a very desirable property. Improving the convergence speed while keeping the simplicity of the LMS has been the major consideration of much algorithmic research, with one such algorithm being the NLMS (normalised least mean square). Whereas for the LMS the rate of convergence is dependent on the step size, which is in turn dependent on the input signal power, the NLMS largely removes this dependency. The equation for the NLMS is as follows:
$$\mathbf{w}(n+1) = \mathbf{w}(n) + \mu\,\frac{e(n)\,\mathbf{x}(n)}{\mathbf{x}^T(n)\,\mathbf{x}(n)}$$
On each iteration, the tap-weight change (gradient) is normalised with respect to the squared Euclidean norm of the input signal. This minimises the effect of the signal power on the coefficient change. The NLMS aims to find the step size that minimises the instantaneous output error, and by doing so it follows the principle of minimal disturbance, which states that, in the light of new input data, the parameters of an adaptive system should only be disturbed in a minimal fashion. Thus for the NLMS, when new input data is received, the tap-weight vector is updated with only an optimal minimal change, and consequently successive values of the tap weights do not oscillate widely. This results in an optimised adaptation path and hence an optimised convergence rate.

For highly correlated and nonstationary signals such as speech, the NLMS converges significantly faster than the LMS, because the normalisation minimises the effect of the input signal power. For uncorrelated and stationary signals such as white noise, the NLMS still gives an improved performance, with the convergence rate being constant and maximum. Nevertheless, the convergence rate of the NLMS is still dependent on the eigenvalue spread of the input signal, and consequently it converges slowly compared to other algorithmic families.

However, the NLMS is the most popular algorithm used in echo cancellation technology, with the following version being most often used in practical implementations:

$$\mathbf{w}(n+1) = \mathbf{w}(n) + \mu\,\frac{e(n)\,\mathbf{x}(n)}{\mathbf{x}^T(n)\,\mathbf{x}(n) + \varepsilon}$$

where $\varepsilon$ is a very small constant used to prevent a division by zero or by a value close to zero. This situation occurs when the estimated input signal power is very small, which in the previous equation would result in an unbounded tap-weight change. By including $\varepsilon$, the maximum step size is limited and a more stable NLMS algorithm is obtained.
In comparison to the LMS, the NLMS is convergent in the mean if the step size lies in the range $0 < \mu < 2$. This shows that the stability of the NLMS does not rely on the statistics of the input signal. The NLMS is slightly more computationally complex than the LMS because of the normalisation division, but it is still one of the simpler algorithms to implement. It has all the benefits of the LMS, with the addition of faster convergence and better performance. It is popular in industry as it gives a good balance between cost and performance and has guaranteed stability. Therefore it is the algorithm on which this thesis is based to solve the echo and noise problem.
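The regularised NLMS update described in this section can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation developed in this thesis: the function names `nlms` and `demo`, the defaults `mu=0.5` and `eps=1e-6`, and the 4-tap example echo path `h` are all hypothetical choices made for the example.

```python
import numpy as np

def nlms(x, d, num_taps, mu=0.5, eps=1e-6):
    """Adapt an FIR filter with the regularised NLMS update.

    x: reference (far-end) samples, d: desired signal containing the echo.
    Returns the error signal e(n) and the final tap-weight vector w.
    """
    x = np.asarray(x, dtype=float)
    d = np.asarray(d, dtype=float)
    w = np.zeros(num_taps)            # tap weights w(n)
    buf = np.zeros(num_taps)          # input vector x(n), x(n-1), ..., x(n-N+1)
    e = np.zeros(len(x))
    for n in range(len(x)):
        buf = np.roll(buf, 1)         # shift the delay line by one sample
        buf[0] = x[n]
        y = w @ buf                   # replicated echo (filter output)
        e[n] = d[n] - y               # residual after cancellation
        # Step normalised by the input energy; eps guards against division by ~0.
        w = w + mu * e[n] * buf / (buf @ buf + eps)
    return e, w

def demo():
    """Identify a short, made-up echo path from white noise (assumed test case)."""
    rng = np.random.default_rng(0)
    h = np.array([0.5, -0.3, 0.2, 0.1])        # hypothetical echo path
    x = rng.standard_normal(4000)
    d = np.convolve(x, h)[:len(x)]             # noise-free echo of x through h
    e, w = nlms(x, d, num_taps=4, mu=0.5)
    return h, w, e
```

In this noise-free case the adapted weights returned by `demo()` approach the assumed path `h`. With speech input, background noise or double-talk, the behaviour depends on `mu` and `eps`, which would need tuning.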
3.1.1.6. Non-Linearities And Echo Cancellation

Though FIR filters are not capable of exactly matching the system's desired response, in many cases it is possible to achieve a satisfactory output from a FIR filter based on linear assumptions, even if the system exhibits some minor non-linearities. This is so with echo cancellation, and generally speaking it has been found that echo paths are sufficiently linear to be modelled by adaptive linear transversal filters.

For purely digital transmission systems, non-linear echo cancellation may provide a better performance. However, it is not in widespread use since non-linear echo cancellers are much more complex than standard linear ones, with the complexity increasing exponentially with the size of the data symbol alphabet. Consequently there has to be a very strong need for the extra performance benefits to outweigh the use of a linear echo canceller.

A major problem is that modelling non-linearities is generally not a simple matter and tends to be memory-intensive; Volterra-based cancellers need to use at least a second-order Volterra series to model non-linearities effectively. The number of filter coefficients increases rapidly as the order increases, and the order required depends on the memory and size of the non-linearity. Table look-up filters also depend on memory: for each input sample there are L possible output values, each stored at a memory address. Therefore the larger the filter, the more unmanageable the look-up is and the slower the convergence. Another possibility is to use recursive filters, but these may become unstable. Precautions therefore need to be taken to prevent this instability, but these may result in a decrease in performance of the IIR filters.
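The growth in the number of Volterra coefficients can be made concrete with a short count. The sketch below assumes a truncated Volterra filter with symmetric kernels, for which the kernel of order k with memory N has C(N + k - 1, k) distinct coefficients. The function name `volterra_coefficients` and the memory length of 128 samples are illustrative assumptions.

```python
from math import comb

def volterra_coefficients(memory, order):
    """Count the distinct coefficients of a truncated Volterra filter.

    Assumes symmetric kernels: the order-k kernel over `memory` samples
    contributes comb(memory + k - 1, k) coefficients (k = 1 is the linear FIR part).
    """
    return sum(comb(memory + k - 1, k) for k in range(1, order + 1))

# Linear, second-order and third-order filters with the same memory length.
counts = [volterra_coefficients(128, p) for p in (1, 2, 3)]
# counts == [128, 8384, 366144]
```

With a memory of only 128 samples, moving from a linear filter to a second-order Volterra filter raises the coefficient count from 128 to 8384, which illustrates why such cancellers are memory-intensive.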
In general, weighing up the pros and cons, linear FIR adaptive filters still hold the upper hand in echo cancellation technology, being simpler, more cost-effective, giving a reasonable level of cancellation and being, on the whole, stable. Therefore, for the purpose of this thesis, the echo canceller model could be based on linear FIR adaptive filters.

3.1.1.7. ERLE

The performance of an echo canceller is usually given in terms of the echo return loss enhancement (ERLE). This is a comparison of the echoes before and after cancellation. It is calculated as:
where $z(n)$ is the desired signal (actual echo only) and $\hat{z}(n)$ is the replicated echo.
For the calculation of the ERLE it is assumed that the system is not in a double-talk situation. The ERLE is therefore the amount of attenuation of the echo signal introduced by the echo canceller. It does not include any further reduction in the residual echo by any extra nonlinear processing after the basic echo cancellation.

The ERLE provides a figure of merit for determining how effective the echo cancellation process is; it assumes that there is always a certain amount of loss incurred by the echo and then shows the improvement after echo cancellation. It reflects both the convergence rate and the steady-state residual echo. The plot of ERLE versus time shows the rate of change in the enhancement, i.e. the rate of convergence of the algorithm to the steady-state error value. Over time the ERLE changes; initially it may be quite small, but as the algorithm converges towards the optimum tap-weight values it increases.

Theoretically the steady-state ERLE could be very large, and an ideal echo canceller with a perfectly linear echo path would output an infinite ERLE in a very short period of time. In practice, however, there are limiting factors: the echo path always contains some non-linearities introduced by various components in the transmission path; the devices that generate the echo produce a certain amount of echo loss that little can be done about; and the use of finite-precision devices limits the accuracy of the computations. Therefore the ERLE will not reach its theoretical steady-state maximum value. Nevertheless, a well-performing echo canceller will output a very large steady-state ERLE in a very short convergence time.
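As an illustration of how the ERLE can be evaluated from recorded signals, the sketch below uses a commonly quoted form, 10 log10 of the ratio of the echo power to the residual echo power, with expectations replaced by sample means. This form, the function names `erle_db` and `erle_curve_db`, and the block length of 256 samples are assumptions made for the example; the signals are assumed to be free of double-talk and not silent.

```python
import numpy as np

def erle_db(z, z_hat):
    """ERLE in dB: echo power over residual-echo power (sample-mean estimate)."""
    z = np.asarray(z, dtype=float)
    z_hat = np.asarray(z_hat, dtype=float)
    residual = z - z_hat
    # Floor the residual power so a perfect cancellation does not divide by zero.
    residual_power = max(np.mean(residual ** 2), np.finfo(float).tiny)
    return 10.0 * np.log10(np.mean(z ** 2) / residual_power)

def erle_curve_db(z, z_hat, block=256):
    """ERLE evaluated over successive blocks, giving an ERLE-versus-time curve."""
    z = np.asarray(z, dtype=float)
    z_hat = np.asarray(z_hat, dtype=float)
    n_blocks = len(z) // block
    curve = np.empty(n_blocks)
    for k in range(n_blocks):
        s = slice(k * block, (k + 1) * block)
        curve[k] = erle_db(z[s], z_hat[s])
    return curve
```

For example, a replica that matches 90 % of the echo amplitude leaves a residual of one tenth of the amplitude, i.e. one hundredth of the power, which corresponds to an ERLE of about 20 dB.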
3.1.1.8. ERL

The ERLE can also be given in terms of the echo return loss (ERL): the ERLE is the apparent increase in ERL resulting from the echo cancellation process. The ERL is the attenuation of a signal by the network elements and the echo generating device; it is a measure of the loss estimate of the hybrid and transmission line. (It is not just the amount of attenuation caused by the hybrid alone but it is a good estimate of the hybrid loss). The ERL is the ratio in dB of the input signal to the echo signal, as experienced in the actual telephone circuit without any form of echo protection. It is given by the following formula:
For the calculation of the ERL it is assumed that the system is not in a double-talk situation.
The ERL is a function of frequency. It is more accurately defined as the difference in dB between the level of a composite-frequency signal sent into a circuit and the level of the echo signal that is reflected back to the source, where the composite signal normally contains all frequencies between 500 Hz and 2500 Hz at equal amplitudes. A low ERL corresponds to a low loss across the hybrid. This results in a high reflected signal, i.e. much echo. As the hybrid loss increases and the input signal level also increases, the ERL improves and the reflected signal is low, i.e. little echo.
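The description of the ERL as the ratio in dB of the input signal to the echo signal, and of the ERLE as the apparent increase in ERL, can be checked numerically. The sketch below assumes power ratios estimated by sample means; the helper name `power_ratio_db`, the echo gain of 0.5 and the replica gain of 0.45 are hypothetical example values.

```python
import numpy as np

def power_ratio_db(a, b):
    """Ratio of the mean power of a to the mean power of b, in dB."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return 10.0 * np.log10(np.mean(a ** 2) / np.mean(b ** 2))

rng = np.random.default_rng(1)
x = rng.standard_normal(8000)        # input (far-end) signal
z = 0.5 * x                          # echo: assumed echo path with gain 0.5
z_hat = 0.45 * x                     # imperfect replica from the canceller

erl = power_ratio_db(x, z)                    # loss without echo protection
erle = power_ratio_db(z, z - z_hat)           # enhancement due to the canceller
total_loss = power_ratio_db(x, z - z_hat)     # loss seen after cancellation
```

Here `erl` is about 6 dB (an amplitude ratio of 2), and `total_loss` equals `erl + erle`, which is the sense in which the ERLE is an increase in ERL.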
3.1.2. Available DSPs on the market

3.1.2.1. List of all DSP companies on the market

Alacron
Alex Computer Systems
Analog Devices
Angeles Design Systems Corporation
Ariel Corporation
Atmel Corporation
Berkeley Design Technology, Inc. - DSP Technology Specialists
Bittware Research Systems
Bridgenorth Signal Processing
Clarkspur Design Inc. - designer of configurable core integrated circuits for DSP
Clearline Communications - providing comprehensive and progressive voice quality products
Coreco Inc.
Data Translation
DNA Enterprises, Inc.
DSP Communications, Inc.
DSP Development Corporation - DADiSP
DSP Group, Inc.
DSP Software Engineering
DSP Solutions
DSP Tools, Inc.
Elanix Incorporated
Eonic Systems, Inc.
GO DSP Corporation
Hyperception
Hyperstone Electronics
Improv Systems, Inc. - designs configurable integrated circuit architectures, compilers, system applications and support technology
Innovative Integration
Loughborough Sound Images
Massana - DSP algorithm development and IC design
Motorola
Multiprocessor Toolsmiths Inc.
National Instruments
Numerix, Ltd.
Nyvalla DSP
Parallel Performance Group, Inc.
Pentek
Signalogic, Inc.
Silicon Systems, Inc.
Sonitech International, Inc.
Spectron Microsystems
Spectrum Signal Processing
Tasking
Texas Instruments
Traquair Data Systems, Inc.
Transtech Parallel Systems
Visual Solutions, Inc.
VLSI Solution - DSP cores and IC design
We will introduce the three most important DSP companies on the market:

Texas Instruments
Analog Devices
Motorola
3.1.2.2. Texas Instruments

TMS320 DSP Family Overview

Since the launch of Texas Instruments' first single-chip Digital Signal Processor (DSP) in 1982, TI has provided designers an accelerated time-to-market with next-generation, breakthrough systems as well as complementary technology and support. DSPs are unique microprocessors that are programmable and operate in real time much faster than general-purpose microprocessors. The ability to crunch vast quantities of numbers while racing a clock is the value digital signal processors bring to the electronics marketplace. The TMS320 DSP family offers the most extensive selection of DSPs available anywhere, with a balance of general-purpose and application-specific processors to suit your needs. There are three distinct Instruction Set Architectures that are completely code-compatible within platforms:
Highest Performance: TMS320C6000 DSP Platform

Raising the bar in performance and cost efficiency, the C6000 DSP platform offers a broad portfolio of the industry's fastest DSPs running at clock speeds up to 1 GHz. The platform consists of the TMS320C64x and TMS320C62x fixed-point generations as well as the TMS320C67x floating-point generation. Optimal for designers working on targeted broadband infrastructure, performance audio and imaging applications, the C6000 DSP platform's performance ranges from 1200 to 8000 MIPS for fixed-point and 600 to 1350 MFLOPS for floating-point.

Best Power Efficiency: TMS320C5000 DSP Platform

The TMS320C5000 DSP Platform is optimized for the consumer digital market - the heart of the mobile Internet - and its convergence with other consumer electronics. With a roadmap to power consumption as low as 0.33 mA/MHz, the TMS320C55x and TMS320C54x DSPs are optimized for personal and portable products like digital music players, GPS receivers, portable medical equipment, 3G cell phones, and digital cameras, as well as MIPS-intensive voice and data applications and extremely cost-effective single- and multi-channel applications.

The OMAP5910 processor integrates a C55x DSP core with a TI-enhanced ARM925 on a single chip for the optimal combination of high performance with low power consumption. This unique architecture offers an attractive solution to both DSP and ARM developers, providing the low-power real-time signal processing capabilities of a DSP coupled with the command and control functionality of an ARM. Sampling today, the OMAP5910 is optimal for designers working with devices that require embedded applications processing in a connected environment.
Control Optimized: TMS320C2000 DSP Platform

The TMS320C2000 DSP Platform provides the digital control industry with the highest level of on-chip integration and powerful computational abilities that produce unparalleled improvements in energy efficiency. The TMS320C28x DSP generation is the highest-performance solution for digital control. The TMS320C24x DSP generation is the foundation for this diverse platform. This generation delivers power and control advantages that allow designers to implement advanced, cost-efficient control systems.
Tools and Software

For rapid DSP product development, the TMS320 DSP family is supported by the eXpressDSP Real-Time Software Technology, which includes the Code Composer Studio integrated development environment, the DSP/BIOS real-time software kernel, the TMS320 DSP Algorithm Standard and choices for reusable, modular software from the largest Third-Party Network in the industry.

Complementary Data Converter and Power Management Products

TI also offers a range of complementary data converter and power management products to get your designs to market faster. And because most of TI's new analog products are designed to work directly with TI DSPs, TI can focus on providing total solution sets. Changing the world, providing more choices for every market requirement and a roadmap leading from today's needs to tomorrow's demands.
TI families which can work as echo canceller

Of the three most important TI families (C2000, C5000 and C6000), only the C5000 and C6000 offer the possibility of implementing an echo canceller on one of their DSPs.
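To put the performance figures quoted for these families in context, the arithmetic load of an NLMS echo canceller can be estimated roughly. The sketch below assumes about 2N multiply-accumulates per sample for N taps (N for the FIR output and N for the tap-weight update), ignoring the normalisation division and any control overhead. The function names, the 8 kHz sampling rate and the 128 ms echo tail are hypothetical example values, not figures from this thesis.

```python
def taps_for_tail(tail_ms, sample_rate_hz):
    """Number of FIR taps needed to span an echo tail of tail_ms milliseconds."""
    return int(round(tail_ms * sample_rate_hz / 1000.0))

def nlms_mmacs(num_taps, sample_rate_hz):
    """Rough NLMS load in millions of multiply-accumulates per second.

    Assumes ~2 * num_taps MACs per sample (filtering plus tap update);
    the division and other overhead are deliberately left out.
    """
    macs_per_sample = 2 * num_taps
    return macs_per_sample * sample_rate_hz / 1e6

taps = taps_for_tail(128, 8000)      # 1024 taps for the assumed 128 ms tail
load = nlms_mmacs(taps, 8000)        # about 16.4 MMACS for one channel
```

Under these assumptions a single narrowband channel needs on the order of 16 MMACS, so the choice between families would be driven mainly by the number of channels, the tail length and the sampling rate.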
Processors from C5000 & C6000 families

The TMS320C6000 platform consists of three fully code-compatible device generations:

TMS320C64x: The C64x fixed-point DSPs offer the industry's highest level of performance to address the demands of the digital age. At clock rates of up to 1 GHz, C64x DSPs can process information at rates up to 8000 MIPS with costs as low as $19.95. In addition to a high clock rate, C64x DSPs can do more work each cycle with built-in extensions. These extensions include new instructions to accelerate performance in key application areas such as digital communications infrastructure and video and image processing.

TMS320C62x: These first-generation fixed-point DSPs represent breakthrough technology that enables new equipment and energizes existing implementations for multi-channel, multi-function applications, such as wireless base stations, remote access servers (RAS), digital subscriber loop (xDSL) systems, personalized home security systems, advanced imaging/biometrics, industrial scanners, precision instrumentation and multi-channel telephony systems.

TMS320C67x: For designers of high-precision applications, C67x floating-point DSPs offer the speed, precision, power savings and dynamic range to meet a wide variety of design needs. These dynamic DSPs are the ideal solution for demanding applications like audio, medical imaging, instrumentation and automotive.

The TMS320C5000 platform consists of two fully code-compatible device generations:

TMS320C54x: The C54x generation consists of over 17 code-compatible devices. With a broad range of performance and peripheral options, low-power operation and innovative architecture and instruction set, the C54x generation gives designers effective ways of achieving high-performance, low-power operation and low system cost.

TMS320C55x: The TMS320C55x generation contains the industry's most power-efficient DSPs and redefines the potential of applications ranging from portable Internet appliances to high-speed wireless communications. The rapidly growing generation delivers ultra-low-power performance through advanced power management techniques that automatically power down inactive peripherals, memory and core functional units, increasing battery life for portable applications.
Comparative table C6000 DSPs

| Parametric | C62x Fixed-Point DSPs | C64x Fixed-Point DSPs | C67x Floating-Point DSPs |
|---|---|---|---|
| MHz | 150 - 300 | 300 - 1000 | 100 - 225 |
| MIPS/MFLOPS | 1200 - 2400 MIPS | 2400 - 8000 MIPS | 600 - 1350 MFLOPS |
| 16-bit MMACS | 300 - 600 | 1200 - 4000 | 200 - 550 |
| 8-bit MMACS | 300 - 600 | 2400 - 8000 | 200 - 550 |
| Special Instructions/Capabilities | Multi-channel Voice, Data and Imaging | Accelerated Video, Data, Imaging, Audio | IEEE single- and double-precision floating-point |
| Active Power | 0.9 - 2.1 Watts | 0.25 - 1.06 Watts | 0.5 - 1.4 Watts |
| Pricing (10kU) | $8.55 - $102.00 US | $19.95 - $199.00 US | $13.50 - $111.00 US |
| Peripherals/Co-processors | McBSP, 32-bit/33MHz PCI, 16/32-bit HPI, 4/16-channel DMA, 16/32-bit EMIF, 32-bit Expansion Bus (X-Bus), Timers | McBSP, 32-bit/33MHz PCI, 16/32-bit HPI, 64-channel DMA, 16-64-bit EMIF, UTOPIA, Timers, Viterbi Co-Processor, Turbo Co-Processor, Video Ports | McASP, McBSP, 16/32-bit HPI, 4/16-channel DMA, 16/32-bit EMIF, Timers |
Comparative table C5000 DSPs

| Parametric | C55x Fixed-Point DSPs | C54x Fixed-Point DSPs |
|---|---|---|
| MHz | 144 - 200 | 50 - 160 |
| MIPS | 288 - 400 | 50 - 532 |
| 16-bit MACS | 288 - 400 | Single-MAC |
| 8-bit MACS | N/A | Single-MAC |
| Special Instructions | Variable Length Instructions (8 - 48 bit) | N/A |
| Active Power | 65 - 160 mW | 40 - 90 mW |
| Pricing (10kU) | $5.00 - $25.00 US | $9.95 - $25.00 US |
| Peripherals/Co-processors | McBSP, 16-bit HPI, 6-channel DMA, 16/32-bit EMIF, USB 2.0 Full Speed, ADC, IIC, MMC/SD, uART, Video Hardware Extensions (DCT, Motion Estimation, Pixel Interpolation) | McBSP, 16-bit HPI, 6-channel DMA, uAR |
3.1.2.3. Analog Devices

The most important Analog Devices families are: ADSP-21xx, SHARC 32-Bit DSPs, TigerSHARC Processors and Blackfin Processors (ADSP-BF5xx). Let us introduce them:

ADSP-21xx

The code-compatible members of the ADSP-21xx Processor family vary in memory integration, operating voltage, operating speed and temperature range. The ADSP-218x products offer numerous pin-to-pin compatible parts, with on-chip memory varying from 8K words to 104K words. All products are code-compatible, offering a very simple upgrade path to integrate new software product features. The most recent ADSP-218xN series products operate at 1.8 V, consuming half the power of the prior versions, making them ideal for low-power portable applications. The ADSP-219x products double the performance of the ADSP-218xN series, operating at 160 MHz. The most recent, the ADSP-2195 and the ADSP-2196, are pin-compatible with the ADSP-2191, offering many system upgrade options.

SHARC 32-Bit DSPs

SHARC Processors offer exceptionally high floating-point DSP performance while integrating application-specific peripherals and interfaces designed to minimize overall system costs. The completely code-compatible family portfolio extends from entry-level products priced under $10 to high-end products providing 300 MHz/1800 MFLOPS of signal processing performance. The broad range of price and performance points available in the SHARC processor family makes its members particularly well suited to applications ranging from consumer, automotive, and professional audio to industrial and medical imaging.

All SHARC Processors are based on a 32-bit super Harvard architecture that combines a high-performance signal processing core with sophisticated memory and I/O processing subsystems. This balanced architecture enables unparalleled performance while ensuring sufficient memory and I/O bandwidth is available for the most algorithmically challenging applications. In addition to these hardware-centric efficiencies, all SHARC processors offer a very flexible algorithm development environment by supporting a variety of fixed- and floating-point data types.

Auto Room Tuner (ART) is a new software algorithm for audio calibration and equalization that makes it very easy, fast and convenient for consumers to match surround-sound receiver settings for their speakers to the acoustics of the space in which they are installing their home theater or music system. The ART technology is integrated into the latest SHARC Melody Platform based on the recently introduced third-generation SHARC Processors.
TigerSHARC Processor

The TigerSHARC Processor family offers the industry's highest performance per watt and per square inch of board space for the most demanding signal- and image-processing applications. Its patented link-port technology allows glueless interprocessor communication within arrays of two or more TigerSHARC processors, delivering unbounded performance in terms of MMACS and MFLOPS.

Based on a 128-bit static superscalar architecture, TigerSHARC Processors offer native support of fixed- and floating-point data types and a balanced combination of computational performance, I/O bandwidth and memory integration. Together this yields sustained DSP system-level performance that is two to four times greater than conventional DSPs or microprocessors with vector processing units.

By providing native support for 1-bit data formats used for chip-rate processing, TigerSHARC pioneers a new class of software-defined radios and serves applications that were previously the exclusive domain of expensive ASICs (application-specific integrated circuits) and FPGAs (field-programmable gate arrays). And by moving to a software-centric design model, TigerSHARC Processors allow IP reuse, which greatly enhances R&D productivity throughout each successive product generation.

The newly announced TigerSHARC Processors provide a broad range of price and performance points to meet the needs of many different applications. The ADSP-TS201S is offered at 500 and 600 MHz with 24 Mbits of on-chip memory, while the ADSP-TS202S and ADSP-TS203S are offered at 500 MHz with 12 Mbits and 4 Mbits of on-chip memory, respectively.

Blackfin Processor

Blackfin Processors embody a new breed of embedded processor designed specifically to meet the computational demands and power constraints of today's embedded audio, video, and communications applications. Blackfin delivers breakthrough signal-processing performance and power efficiency with a RISC programming model. Blackfin Processors present high-performance, homogeneous software targets, which allow flexible resource allocation between hard real-time DSP tasks and non-real-time control tasks. System control tasks can often run in the shadow of DSP and video tasks.
Comparative table SHARC 32-Bit DSPs

| Parametric | ADSP-21365 | ADSP-21364 | ADSP-21266 | ADSP-21267 | ADSP-21262S |
|---|---|---|---|---|---|
| MHz | 300 | 300 | 200 | 150 | 200 |
| Package | MiniBGA | MiniBGA | LQFP, MiniBGA | LQFP, MiniBGA | LQFP, MiniBGA |
| MMACS | 600 | 600 | 400 | 300 | 400 |
| MFLOPS | 1800 | 1800 | 1200 | 900 | 1200 |
| On-Chip SRAM (Mbits) | 3 | 3 | 2 | 1 | 2 |
| On-Chip ROM (Mbits) | 4 | 4 | 4 | 3 | 4 |
| FIR Filter (per tap) | 1.67 ns | 1.67 ns | 2.5 ns | | 2.5 ns |
| Serial Ports | 6 | 6 | 6 | 4 | 6 |
| Core Voltage (V) | 1.2 | 1.2 | 1.2 | 1.2 | 1.2 |
| Pricing | | | | | - |
Comparative table Blackfin DSPs

| Parametric | ADSP-BF561 | ADSP-BF535 | ADSP-BF533 | ADSP-BF532 | ADSP-BF531 |
|---|---|---|---|---|---|
| MHz | 500 - 750 | 200 - 350 | 500 - 750 | 400 | 400 |
| Package | 256 Mini-BGA, 256-PBGA | 260 PBGA | 160 Mini-BGA, 169-PBGA | 160 Mini-BGA, 169-PBGA, 176 LQFP | 160 Mini-BGA, 169-PBGA, 176 LQFP |
| MMACS (max) | 3000 | 700 | 1200 | 800 | 800 |
| RAM Memory (kBytes) | 328 | 308 | 148 | 84 | 52 |
| External Memory Bus | 32 bit | 32 bit | 16 bit | 16 bit | 16 bit |
| Core Voltage (V) | 1.2 | 1.6 | 1.2 | 1.2 | 1.2 |
| Pricing | $19.95 - $39.95 | $22.00 - $31.25 | $12.95 - $31.95 | $7.50 - $9.95 | $4.95 - $8.45 |
| Parallel Periph Interface | Yes | No | Yes | Yes | Yes |
| PCI - USB Device | No | Yes | No | No | No |
| Key Peripherals | 2 PPIs, UART, 12 Timers, 2 SPORTs | PCI, USB Device, 2 SPORTs, 2 SPI | PPI, UART, SPI, 2 SPORTs, 3 Timers | PPI, UART, SPI, 2 SPORTs, 3 Timers | PPI, UART, SPI, 2 SPORTs, 3 Timers |
3.1.2.4. Motorola
The most important Motorola families are the DSP56300 and the 56800/E Family. Let us introduce them:

DSP56300

The broad DSP56300 family is based on the DSP56300 core, a design integrating advanced features that dramatically boost performance, simplify system design, and drive system costs down. These devices are code-compatible with the DSP56000 Family of Processors.

56800/E Family

Developers Starter Kit to support the 56F8300 Family

Motorola's advanced 56F8300 Developers Starter Kit brings demonstration, evaluation and development capabilities to the 56F8300 series of hybrid controllers. This comprehensive kit includes a demonstration board that utilizes the 60 MIPS 56F8323 hybrid controller with on-chip oscillator to illustrate the enhanced capabilities of the 56F8300 Series. To demonstrate some typical 56F8300 applications in which processing power must be combined with remote sensing capabilities, the MC56F8300DSK integrates Motorola's MC33794 E-field sensor. It also features a built-in JTAG-to-parallel port command converter, with parallel cable included, providing fast and simple out-of-the-box debugging.

For rapid application development, the 56F8300 Developers Starter Kit includes CodeWarrior Development Studio for 56800/E with Processor Expert technology. This multi-tiered tool allows developing, compiling, linking and debugging applications with a complimentary permanent license (limited to 16 KB). Processor Expert technology provides fully debugged peripheral drivers, libraries and interfaces that allow the programmer to create unique C application code independent of component architecture.
Comparative table Motorola DSPs

| Parametric | 56800/E DSPs | DSP56300 DSPs |
|---|---|---|
| Device Speed (Max) (MHz) | 60 - 120 | 80 - 150 |
| Core Performance DSP (MMACS) | 30 - 120 | 60 - 180 |
| Internal Data Bus Width (bit) | 16 - 32 | 24 |
| External Program Memory (kByte) | | 768 - 4800 |
| External Data Memory (kByte) | | 1536 - 9600 |
| Internal Data Flash (kByte) | 2.048 - 8.192 | |
| Internal Program Flash (Byte) | 8.192 - 262.144 | |
| Number of Timers | 4 - 16 | 3 |
| Core Voltage (Spec) (V) | 1.8 - 3.3 | 1.25 - 3.3 |
3.1.2.5. Programming languages: C versus Assembly

DSPs are programmed in the same languages as other scientific and engineering applications, usually assembly or C. Programs written in assembly can execute faster, while programs written in C are easier to develop and maintain. In traditional applications, such as programs run on personal computers and mainframes, C is almost always the first choice. If assembly is used at all, it is restricted to short subroutines that must run with the utmost speed. This is shown graphically in the next figure (a); for every traditional programmer who works in assembly, there are approximately ten who use C.

However, DSP programs are different from traditional software tasks in two important respects. First, the programs are usually much shorter, say, one hundred lines versus ten thousand lines. Second, the execution speed is often a critical part of the application. After all, that is why someone uses a DSP in the first place: for its speed. These two factors motivate many software engineers to switch from C to assembly for programming Digital Signal Processors. This is illustrated in (b); nearly as many DSP programmers use assembly as use C.

Figure (c) takes this further by looking at the revenue produced by DSP products. For every dollar made with a DSP programmed in C, two dollars are made with a DSP programmed in assembly. The reason for this is simple: money is made by outperforming the competition. From a pure performance standpoint, such as execution speed and manufacturing cost, assembly almost always has the advantage over C. For instance, C code usually requires more memory than assembly, resulting in more expensive hardware.

However, the DSP market is continually changing. As the market grows, manufacturers will respond by designing DSPs that are optimized for programming in C. For instance, C is much more efficient when there is a large, general-purpose register set and a unified memory space. These future improvements will minimize the difference in execution time between C and assembly, and allow C to be used in more applications.
As shown in (a), only about 10% of traditional programmers (such as those that work on personal computers and mainframes) use assembly. However, as illustrated in (b), assembly is much more common in Digital Signal Processors. This is because DSP programs must operate as fast as possible, and are usually quite short. Figure (c) shows that assembly is even more common in products that generate a high revenue.
The efficiency of C versus assembly depends greatly on the particular DSP being used. Floating-point architectures can generally be programmed more efficiently than fixed-point devices when using high-level languages such as C. Of course, the proper software tools are important for this, such as a debugger with profiling features that help you understand how long different code segments take to execute. There is also a way to get the best of both worlds: write the program in C, but use assembly for the critical sections that must execute quickly. This is one reason that C is so popular in science and engineering. It operates as a high-level language, but also allows you to manipulate the hardware directly if you so desire. Even if you intend to program only in C, you will probably need some knowledge of the architecture of the DSP and the assembly instruction set.
As shown in the previous figure, programs in C are more flexible and quicker to develop. In comparison, programs in assembly often have better performance; they run faster and use less memory, resulting in lower cost.
Which language is best for our application? It depends on what is more important to us. If we need flexibility and fast development, we would choose C. On the other hand, we would use assembly if we need the best possible performance. Here are some things we should consider before choosing C or Assembly:
How complicated is the program? If it is large and intricate, you will probably want to use C. If it is small and simple, assembly may be a good choice.

Are you pushing the maximum speed of the DSP? If so, assembly will give you the last drop of performance from the device. For less demanding applications, assembly has little advantage, and you should consider using C.

How many programmers will be working together? If the project is large enough for more than one programmer, lean toward C and use in-line assembly only for time-critical segments.

Which is more important, product cost or development cost? If it is product cost, choose assembly; if it is development cost, choose C.

What is your background? If you are experienced in assembly (on other microprocessors), choose assembly for your DSP. If your previous work is in C, choose C for your DSP.

What does the DSP's manufacturer suggest you use?
This last item is very important. Suppose you ask a DSP manufacturer which language to use, and they tell you: "Either C or assembly can be used, but we recommend C." You had better take their advice! What they are really saying is: "Our DSP is so difficult to program in assembly that you will need 6 months of training to use it". On the other hand, some DSPs are easy to program in assembly.
3.2. Simulation Toolkits

Let us introduce the software that could be used to simulate the future filter or DSP design. The most important toolkits we have found on the market are: LabVIEW from National Instruments, Code Composer Studio from Texas Instruments (TI), VisualDSP++ from Analog Devices, and MATLAB & Simulink from The MathWorks.
3.2.1. National Instruments - LabVIEW

3.2.1.1. DSP Test Integration Toolkit

- Automate routine Code Composer Studio functions.
- Seamlessly integrate LabVIEW with the TI Code Composer Studio development tool.
- Integrate a wide variety of I/O for DSP testing.
- Directly share information from the DSP through direct memory or RTDX technology.
- Visualize and control data with the LabVIEW Debugging Workbench for RTDX Communication.
This toolkit simplifies debugging and validation of code executing on TI TMS320 DSPs with an executable that design engineers use to establish communication and control of data internal to DSPs in seconds. The executable runs parallel to CCStudio and uses the real-time data exchange (RTDX) communication protocol. The LabVIEW DSP Integration Toolkit interacts directly with Code Composer. The Toolkit will enable the student to migrate subsystems from the desktop simulation of the modem to the C6700 DSP board. The student can use the same PC on which the DSP board resides to validate the design. Alternately, the desktop simulation can be run on a separate PC to serve as either the transmitter or the receiver that is connected to the modem the student is designing.
3.2.1.2. Signal Processing Toolset

- Wavelet and filter-bank design for short-duration signal characterization, noise reduction, and detrending
- Joint time-frequency analysis (JTFA) with the award-winning Gabor spectrograph
- Digital filter design for graphical design and characterization of FIR and IIR digital filters
- Super-resolution spectral analysis for high-frequency resolution model-based spectral estimation with a small data set
Overview
The National Instruments Signal Processing Toolset is a suite of high-level VIs, libraries, software tools, example programs, and utilities for time-frequency analysis and digital filter design. With this toolset, you can experiment and develop with modern analysis techniques that include wavelets, superresolution (model-based) spectral analysis, and joint time-frequency analysis (JTFA). With the Digital Filter Design component of this toolset, you can interactively design and characterize finite impulse response (FIR) and infinite impulse response (IIR) filters. You can use these functions for LabVIEW-based offline analysis, inline analysis, and for real-time deterministic applications using the LabVIEW Real-Time Module. In addition, you can use these libraries in NI LabWindows/CVI.

Time-Frequency Analysis
The JTFA, wavelet and filter bank design, and superresolution spectral analysis components of the Signal Processing Toolset are all tools that you can use for time-frequency analysis. Time-frequency analysis commonly involves characterizing how the spectral content of signals evolves over time. This technique can reveal information that is not immediately obvious with standard frequency analysis tools such as a fast-Fourier-transform-based spectrogram.
Digital Filter Design Component

- Library of functions and VIs
- Software for interactively designing FIR and IIR digital filters
- Design classic FIR and IIR lowpass, highpass, bandpass, and bandstop filters by interacting with the magnitude response graph
- Design arbitrary FIR filters by interactively modifying the magnitude response plot
- Design arbitrary IIR filters by interactively modifying the pole-zero plot
- Analysis and display of magnitude, phase, pole-zero, H(z), impulse, step
- Filter data from NI hardware or simulated signals
- Save/load design specifications to file
- Save filter coefficients to file for use in LabVIEW or LabWindows/CVI

Program Examples

- Demonstrates how to load coefficients and apply your filter designs
- Interactively move poles and zeros around the complex plane
- Three methods of filter design
- Set filter parameters (cutoff frequencies, ripple, etc.)
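To give an idea of what such a filter design tool computes behind its graphical interface, the following is a minimal sketch of a classic lowpass FIR design using the windowed-sinc method described in the theoretical chapter. It is not the toolset's own algorithm; the function names, the Hamming window and the example values (1 kHz cutoff, 8 kHz sample rate, 51 taps) are assumptions chosen for illustration.

```python
import math

def windowed_sinc_lowpass(cutoff, num_taps):
    """Design a lowpass FIR filter with the windowed-sinc method.

    cutoff   -- cutoff frequency as a fraction of the sample rate (0 < cutoff < 0.5)
    num_taps -- filter length; an odd value gives a symmetric, linear-phase filter
    """
    mid = (num_taps - 1) / 2.0
    taps = []
    for n in range(num_taps):
        k = n - mid
        # Ideal lowpass impulse response sin(2*pi*fc*k)/(pi*k), with limit 2*fc at k = 0.
        if k == 0:
            ideal = 2.0 * cutoff
        else:
            ideal = math.sin(2.0 * math.pi * cutoff * k) / (math.pi * k)
        # The Hamming window tapers the truncated sinc to reduce ripple.
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    gain = sum(taps)
    return [t / gain for t in taps]  # normalise for unity gain at DC

def magnitude(taps, freq):
    """Magnitude response |H(f)| at a frequency given as a fraction of the sample rate."""
    re = sum(t * math.cos(2.0 * math.pi * freq * n) for n, t in enumerate(taps))
    im = -sum(t * math.sin(2.0 * math.pi * freq * n) for n, t in enumerate(taps))
    return math.hypot(re, im)

# Illustrative design: 1 kHz cutoff at an 8 kHz sample rate, 51 taps.
taps = windowed_sinc_lowpass(cutoff=1000.0 / 8000.0, num_taps=51)
```

Evaluating `magnitude(taps, f)` over a grid of frequencies gives the kind of magnitude response plot that the interactive tools display, and the resulting coefficient list corresponds to what would be saved to a coefficient file.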
3.2.2. Texas Instruments - Code Composer Studio IDE

Code Composer Studio (CCStudio) Development Tools contain all the PC host tools needed to get TI DSP customers to market faster with their real-time embedded applications. CCStudio is a key element of TI's eXpressDSP Software and Development Tools. With version 2.2, which is available now, you can eliminate downtime, guesswork and bottlenecks, and increase productivity.

New to CCStudio v2.2:

- Fast simulators: deeper visibility for quick and precise problem resolution
- Analysis Tool Kit: boosts performance and simplifies tedious guesswork with new utilities (Cache Analysis, Code Coverage and Multi-event Profiler)
- Enhanced Pipeline Analysis tool: detailed pipeline visibility
- XDS560 device drivers: TI's high-speed emulator
- RTDX channel status viewer: better management of real-time applications
- Many new usability features: improved usability
- Performance enhancements: increased reliability and stability

What's Included:
Code Composer Studio includes the following components to shorten the time throughout the development cycle:

- Real-time analysis with RTDX and DSP/BIOS RTA components
- Industry-leading C/C++/assembly code generation tools
- Advanced emulation drivers for high-speed interface to the target system
- Advanced high-speed simulators to give deeper visibility into code execution
- Code editor with advanced code editing features to reduce editing errors
- Simple-to-use and powerful debugger to visually control and observe execution
- Powerful project manager with source control support for single users and large teams
- DSP/BIOS support to make configuring and using DSP/BIOS simple and intuitive
- Update Advisor to keep your system up to date with the latest product releases from TI

CCStudio's comprehensive, integrated development environment brings complete build and debug tooling, world-class compilers and industry-unique analysis and visualization capabilities via JTAG Real-Time Data Exchange (RTDX) to quickly find and fix problems. CCStudio is thoroughly tested for ease of use, reliability and robustness. No matter what the experience level of the developer, size of the project or size of the team, CCStudio is the means to simplify work and get real-time products out the door faster. The real-time foundation provided by TI's DSP/BIOS kernel, RTDX and TI's new simulation capabilities offers key features such as real-time analysis and advanced data and system visualization, which results in the highest level of host/target integration in the DSP industry.
In addition, CCStudio is engineered around an open plug-in architecture that enables developers to integrate additional third-party plug-in tools to customize their development environment for specific project requirements.

CCStudio Key Benefits:

- Quick start with familiar tools and interfaces
- Easily manage large multi-user, multi-site and multi-processor projects
- Utilize fast code creation, optimization and debugging tools
- Access multiple projects from a single window
- Maximum reuse and portability for faster code development
- Leverage dynamic developer's assistant for high-level language programming
- Multimedia help files, tutorials and documentation for quick start
- Run complex DSP simulation in minutes instead of hours

CCStudio Key Features:

- Integrated development environment with tightly integrated editor, debugger, visual project management system, profiler and Probe Points
- Advanced watch windows, local and global variable windows and profiler capabilities
- C/C++ compiler, assembler, and linker (code generation tools)
- Update Advisor live update capability for the latest tools, drivers and software releases
- Real-Time Data Exchange (RTDX) between host and target
- Real-time analysis and data visualization tools
- C++, UML, MATLAB and VAB support in compiler
- File I/O Probe Points and graphical algorithm scope probes
- Interactive profiling and advanced graphical signal analysis
- GEL function support for automation activities such as regression testing
- Automated testing and customization via scripting
- Multi-processor debugging

Minimum System Requirements

- 233 MHz or higher Pentium-compatible CPU
- 600 MB of free hard disk space
- 64 MB of RAM
- SVGA (800 x 600) display
- Internet Explorer (4.0 or later) or Netscape Navigator (4.7 or later)
- Local CD-ROM drive

Supported Operating Systems for Code Composer Studio 2.2

- Microsoft Windows 98 (SP1 and SE)
- Microsoft Windows NT (SP6)
- Microsoft Windows 2000 (SP1 and SP2)
- Microsoft Windows XP Pro and XP Home
3.2.3. Analog Devices - VisualDSP++ for SHARC Processors

General Description
VisualDSP++ is an easy-to-use integrated software development and debugging environment (IDDE). It lets you efficiently manage projects from start to finish from within a single interface.

Release 3.5 Key Features

- Fully integrated user interface including project management, debugging, profiling, plotting
- Support for a variety of debug targets (emulation, simulation, compiled simulation, and third-party offerings)
- C/C++ compiler, assembler (with C data type support), expert linker, loader
- VisualDSP++ Kernel (VDK) with multiprocessor messaging capability
- Automation API and Automation Aware Scripting Engine
- Background Telemetry Channel (BTC) support with data streaming capability
- Profile-Guided Optimization (PGO)
- Comprehensive multiprocessor build and debug support

Minimum Requirements

- Pentium processor, 166 MHz or faster
- 128 MB of RAM
- CD-ROM drive
- Internet Explorer 5.01 or later
- Windows 98, Windows NT, Windows 2000 or Windows XP
3.2.4. The Mathworks - MATLAB & Simulink

The MathWorks products accelerate the development of signal processing and communication systems for the electronics, communications, aerospace, defense, medical, and other industries. Engineers rely on MathWorks products to develop algorithms, model and simulate complex systems, generate real-time code, and verify hardware and software implementation.

Model-Based Design
Increasing demands for higher performance, lower cost, and faster time to market leave no room for design failure. MATLAB and Simulink let you create and validate a system design before implementation. This greatly reduces the risk of finding errors late in the process, saving you time and money. The MATLAB and Simulink family of products provides model-based design. With model-based design you quickly and accurately model complex systems and produce validated, executable specifications in the shortest time. You can also generate and verify implementations on DSP and FPGA hardware.

Algorithm Development with MATLAB
Before and during the development of every system, new components and algorithms need to be designed and analyzed. This requires mathematical modeling and analysis, and can involve tasks such as filter design, filter analysis, signal analysis and acquisition of measured data.
MATLAB is the world's leading software for algorithm design. It is a high-level, interpreted programming language for algorithm development and data analysis. It provides a vast library of mathematical modeling and computation functions, allowing the user to work interactively or to build these commands into complex programs. 2-D, 3-D, and specialized plotting routines help you understand your data visually, and graphical user interfaces (GUIs) can be designed quickly and used as front-ends to your application.

With MATLAB and application toolboxes such as the Signal Processing Toolbox and the Filter Design Toolbox, you can develop algorithms and perform all common signal and system analysis operations on individual components and data. Over 40 toolboxes let you perform common tasks in applications like communications algorithm design, wavelet analysis, image processing, control design, optimization, and statistics. MATLAB is the standard tool for signal processing research and education, with more than 50 textbooks based on the language, so many engineers can be productive without additional training.

You can also acquire measured data to use for testing algorithms and models. This is simplified by the Data Acquisition Toolbox and the Instrument Control Toolbox, which provide tools for I/O from a variety of PC-compatible data acquisition hardware and enable communication with instruments, such as oscilloscopes and function generators, directly from MATLAB. Algorithms that are designed and optimized in MATLAB can be simulated with other components in a Simulink model. MATLAB can be used throughout the development process to systematically test and analyze simulation results from Simulink and verify hardware and software implementation.
Modeling and Simulation with Simulink
Simulink is a model-based design environment that allows you to model a system, simulate its behavior, and refine your design before implementation. You can create hierarchical block diagram designs for both high-level modeling, to capture an overall product concept, and lower-level modeling, to specify implementation details. You can build complete end-to-end simulations, integrating different components including analog/mixed signal, DSP, digital communications, and control logic in a single model. With Simulink simulations, you can ensure that your design performs to your specifications, explore design trade-offs and tune parameters to optimize performance. Simulink can model single and multi-channel data, and linear and non-linear components. It can simulate digital, analog, and mixed-signal components, and it employs efficient techniques for simulating frame-based and multi-rate signal processing systems. Function libraries such as the DSP Blockset and the Communications Blockset provide all the common blocks found in digital signal processing and communications applications, allowing you to build complete system models. The Fixed-Point Blockset adds the capability to perform bit-true simulations, compare these to a floating-point reference, and solve scaling problems before implementation. Simulink's companion product Stateflow models control logic and other event-driven systems.
Simulink blocksets come with extensive reference examples to get you started with modeling wireless systems such as 802.11b, W-CDMA and Bluetooth, as well as audio and video signal processing techniques such as edge detection, motion detection, image compression and focus assessment.
Component Design and Integration
Most DSP-based communications and multimedia products have these main components: analog/mixed-signal, DSP or digital communications, and control logic. When design teams work independently and use different tools, they cannot easily simulate component interactions or test the whole system. With Simulink, you can simulate all of these components together in one environment. This capability allows system architects to simulate component interactions, test the whole system and deliver clear specifications to component designers. As a result, Simulink effectively streamlines communication and design hand-offs across teams.

Signal Processing
Simulink and the DSP Blockset provide a library of predefined blocks and efficient simulation techniques that enable you to easily model digital signals and a wide array of real-time systems. The DSP Blockset contains blocks that perform filtering, FFTs, correlation, windowing, vector and matrix math, complex math, linear algebra, and statistics. Blocks can execute at any rate, enabling you to model complex multirate systems. Fast frame-based simulation is used so that large frames of data are processed at every time step, helping your simulation to run much faster. You can even input and output streaming audio in real time to your sound card with Windows 95/98 or NT, and can construct a wide range of real-time systems with the provided blocks.

Fixed-Point Modeling
The Fixed-Point Blockset extends Simulink by enabling variable precision, from 1 to 128 bits, for fundamental math and logical operations, such as arithmetic. By combining the Fixed-Point Blockset with the DSP Blockset, you can simplify digital filter design with the DSP Filter Realization Wizard, which automatically generates Simulink models of fixed-point filters in a variety of realizations.

Integrating C Code
The flexibility of the MathWorks model-based design environment provides interoperability between MATLAB, Simulink, and C. This enables you to import C code into a Simulink model by creating your own block. With this capability, you use Simulink as the framework for your C code, which will help you jumpstart your project with access to the prebuilt blocks and application examples. You can also generate C code to create standalone executables for ultimate simulation speed.
Code Generation and Design Verification
Up to 70% of development time is spent on testing and verification. The MathWorks and complementary third-party products provide code-generation and design-verification interfaces to eliminate slow, error-prone methods.

Code Generation
Software: Real-Time Workshop can automatically generate efficient ANSI C code from a Simulink model for downloading to a DSP or embedded processor. The Embedded Target for TI C6000 DSP Platform provides rapid prototyping and generates efficient code for C6000 processors directly from Simulink models. It supports I/O for DSKs and an optimized fixed-point library, and enables you to customize code to tune the algorithm performance and deploy on your own development board.
Hardware: If you are implementing your system on an FPGA, products from key partners, such as the System Generator for DSP from Xilinx and DSP Builder from Altera, provide a seamless path to FPGAs from a Simulink model.

Verification
Once you have created a behavioral model in Simulink, you can use this validated design as an executable specification or as a reference for the generation of test signals, to streamline the verification of your hardware or embedded software designs. The MATLAB Link for Code Composer Studio Development Tools integrates MATLAB with Code Composer Studio, the Texas Instruments (TI) software development environment, and TI DSP hardware for real-time analysis, testing, and verification of TI DSP programs. It simplifies debugging of real-time algorithms on the TI C2000, C5000, C6000 and OMAP processor families. You can verify hardware and software prototypes by using MATLAB to acquire data and live signals using the Data Acquisition Toolbox and the Instrument Control Toolbox.

This concludes the overview of the simulation toolkits that could be used to program the DSP employed in the main application.
3.3. Conclusion: Selected method to solve the application

In this subchapter we decide on the best way to solve the topic of this project: we choose a suitable device and explain the reasons for this decision with regard to criteria such as device features, price and applications. As we have said before, we choose the DSP approach, because the design of a common adaptive filter would be more complex and could contain design errors, which would lead to testing problems (e.g. with electronic components). For the reasons explained before, we think that approach could also be more expensive. Let us now evaluate and compare the best DSPs seen in the previous subchapter and select the most suitable device for the application. Firstly, we rule out the Motorola option, because the features of their DSPs are less attractive than those of other companies and they do not publish prices, which makes a comparison difficult; this does not mean that they would not be useful for our application. The device models will be chosen from the two most important DSP companies: Texas Instruments and Analog Devices. The following comparative tables show two possible processors from each company that could work as an AEC (Acoustic Echo Canceller), comparing their main characteristics, prices, etc.
3.3.1. Selected Floating Point DSPs models
| DSP device | TMS320C67x (C6000) | ADSP 21365 (SHARC) |
|---|---|---|
| Company | Texas Instruments | Analog Devices |
| Fixed/Floating Point | Floating Point | Floating Point |
| Speed (MHz) | 100 - 225 | 300 |
| MFLOPS | 600 - 1350 | 1800 |
| MMACS (max) | 200 - 550 | 600 |
| Price | $13.50 - $111.00 US | Available in Spring 2004 |
| Peripherals/Co-processors | McASP, McBSP, 16/32-bit HPI, 4/16-channel DMA, 16/32-bit EMIF, Timers | IEEE 32-bit floating-point, 40-bit floating-point and 32-bit fixed-point data types, 6 SPORTs, 2 SPI, 16 PWM, 3 full-featured timers, 25 zero-overhead DMA channels |
| Applications | Audio, medical imaging, instrumentation and automotive | Automotive Audio, Consumer Home Theater, Digital Audio Amplifiers and Professional Audio |
3.3.2. Selected Fixed Point DSPs models

| DSP device | TMS320C64x (C6000) | ADSP BF561 (Blackfin) |
|---|---|---|
| Company | Texas Instruments | Analog Devices |
| Fixed/Floating Point | Fixed Point | Fixed Point |
| Speed (MHz) | 300 - 1000 | 500 - 750 |
| MIPS | 2400 - 8000 | - |
| MMACS (max) | 1200 - 4000 | 3000 |
| Price | $19.95 - $199.00 US | $19.95 - $39.95 |
| Peripherals/Co-processors | McBSP, 32-bit/33 MHz PCI, 16/32-bit HPI, 64-channel DMA, 16-64-bit EMIF, UTOPIA, Timers, Viterbi Co-Processor, Turbo Co-Processor, Video Ports | 2 PPIs, UART, 12 Timers, 2 SPORTs |
| Applications | Multi-channel telephony systems, digital communications infrastructure and video and image processing | Embedded audio, video, and communications |
The two tables above show two floating-point devices and two fixed-point devices. First we have to decide between fixed and floating point, and we choose floating point for the following reasons:

- Product cost: with fixed-point devices the product cost is lower, but the development cost will probably be higher because the algorithms are more difficult to implement.
- Precision and development cycle: floating-point devices have better precision and a higher dynamic range, and will generally result in a quicker and cheaper development cycle, although the device itself is more expensive than a fixed-point one.

These pros and cons of floating-point versus fixed-point data formats are illustrated in the following figure:
As we have shown in the first table, we have selected two possible floating point devices for our application from the two main DSP companies (TI and Analog Devices). These models are TMS320C67x from the C6000 TI family and ADSP 21365 from the SHARC Analog Devices family.
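To make the precision and dynamic range argument above more concrete, here is a small sketch comparing a 16-bit fixed-point format (Q15, one sign bit and 15 fractional bits) with IEEE single-precision floating point. The helper names and the sample values are assumptions chosen for illustration; only standard Python is used.

```python
import math
import struct

Q15_SCALE = 1 << 15          # 32768 quantisation steps per unit
Q15_MAX = Q15_SCALE - 1      # largest code, represents 0.99996...
Q15_MIN = -Q15_SCALE         # smallest code, represents -1.0

def to_q15(x):
    """Quantise a real value to a 16-bit Q15 integer, saturating at the range limits."""
    code = int(round(x * Q15_SCALE))
    return max(Q15_MIN, min(Q15_MAX, code))

def from_q15(code):
    """Convert a Q15 integer code back to a real value."""
    return code / Q15_SCALE

def to_float32(x):
    """Round a Python float to IEEE single precision and back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

# Dynamic range of Q15: full scale relative to one quantisation step, in dB.
q15_range_db = 20.0 * math.log10(Q15_SCALE)

# A small coefficient vanishes in Q15 but survives in single-precision floating point.
tiny = 1.0e-5
tiny_q15 = from_q15(to_q15(tiny))
tiny_f32 = to_float32(tiny)

# A value above full scale is clipped in Q15 rather than represented.
clipped = from_q15(to_q15(1.5))
```

In this sketch the Q15 range works out to roughly 90 dB, the value 1.0e-5 rounds to zero in Q15, and 1.5 saturates just below 1.0. With fixed point the programmer has to manage scaling and overflow explicitly, which is the source of the extra development effort mentioned above.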
Both devices are very similar in features and applications, so either of them would be an excellent choice for an AEC (Acoustic Echo Canceller). The only drawback of the ADSP 21365 is that the device is not yet on the market, and its price will be high at the beginning. Our final choice is therefore the TMS320C67x from TI. The exact model that is going to be used in our project at FH Salzburg is the TMS320C6713 DSP Starter Kit (DSK), and its price is $395.01.
3.3.3. TMS320C6713 DSP Starter Kit (DSK)

Description
The TMS320C6713 DSP Starter Kit (DSK), developed jointly with Spectrum Digital, is a low-cost development platform designed to speed the development of high-precision applications based on TI's TMS320C6000 floating-point DSP generation. The kit uses USB communications for true plug-and-play functionality. Both experienced and novice designers can get started immediately with innovative product designs with the DSK's full-featured Code Composer Studio v2.2 IDE and eXpressDSP Software, which includes DSP/BIOS and Reference Frameworks. All users will benefit from the eXpressDSP for Dummies textbook, featured for the first time in this DSK.

The C6713 DSK tools include the latest fast simulators from TI and access to the Analysis Toolkit via Update Advisor, which features the Cache Analysis tool and Multi-Event Profiler. Using Cache Analysis, developers improve the performance of their application by optimizing cache usage. By providing a graphical view of the on-chip cache activity over time, the tool lets the user quickly determine whether the code is using the on-chip cache to get peak performance.

The C6713 DSK allows you to download and step through code quickly and uses Real-Time Data Exchange (RTDX) for improved host and target communications. The DSK includes the Fast Run Time Support libraries and utilities such as Flashburn to program flash, Update Advisor to download tools, utilities and software, and a power-on self test and diagnostic utility to ensure the DSK is operating correctly. The full contents of the kit include:

- C6713 DSP Development Board with 512K Flash and 8 MB SDRAM
- C6713 DSK Code Composer Studio v2.2 IDE including the Fast Simulators and access to the Analysis Toolkit on Update Advisor
- Quick Start Guide
- Technical Reference
- Customer Support Guide
- USB Cable
- Universal Power Supply
- AC Power Cord(s)
- MATLAB from The MathWorks, 30-day free evaluation
- A free eXpressDSP for Dummies textbook with the purchase of the TMDSDSK6713 (limited time offer)
Features
The DSK features the TMS320C6713 DSP, a 225 MHz device delivering up to 1800 million instructions per second (MIPS) and 1350 MFLOPS. This DSP generation is designed for applications that require high precision. The C6713 is based on the TMS320C6000 DSP platform, designed to meet the needs of high-performing, high-precision applications such as pro-audio, medical and diagnostic. Other hardware features of the TMS320C6713 DSK board include:

- Embedded JTAG support via USB
- High-quality 24-bit stereo codec
- Four 3.5 mm audio jacks for microphone, line in, speaker and line out
- 512K words of Flash and 8 MB SDRAM
- Expansion port connector for plug-in modules
- On-board standard IEEE JTAG interface
- +5 V universal power supply
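As a side calculation, the peak figures quoted for the C6713 (225 MHz, 1800 MIPS, 1350 MFLOPS) can be related to the processing budget available per audio sample. The sketch below assumes the 8 kHz sample rate used later for the test signal; the 1024-tap filter length and the estimate of 4 floating-point operations per tap for a time-domain LMS filter are hypothetical values for illustration, not measurements.

```python
CLOCK_HZ = 225e6          # C6713 clock rate quoted for the DSK
PEAK_MIPS = 1800          # quoted peak instruction rate
PEAK_MFLOPS = 1350        # quoted peak floating-point rate
SAMPLE_RATE_HZ = 8000     # assumption: sample rate of the test signal

# Ratios implied by the quoted peak figures.
instructions_per_cycle = PEAK_MIPS * 1e6 / CLOCK_HZ
flops_per_cycle = PEAK_MFLOPS * 1e6 / CLOCK_HZ

# Budget available between two consecutive input samples.
cycles_per_sample = CLOCK_HZ / SAMPLE_RATE_HZ
peak_flops_per_sample = PEAK_MFLOPS * 1e6 / SAMPLE_RATE_HZ

def time_domain_lms_flops(num_taps):
    """Rough per-sample cost of a time-domain LMS filter (assumption): one
    multiply-add per tap for filtering and one for the coefficient update,
    i.e. about 4 floating-point operations per tap."""
    return 4 * num_taps

# Hypothetical 1024-tap echo canceller as a fraction of the peak budget.
load = time_domain_lms_flops(1024) / peak_flops_per_sample
```

Under these assumptions the quoted figures correspond to 8 instructions and 6 floating-point operations per clock cycle, about 28 000 cycles per sample, and a load of a few percent of the theoretical peak. Real code rarely reaches peak rates, so this is an upper bound on the available headroom rather than a prediction.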
Software
Designers can readily target the TMS320C6713 DSP through TI's robust and comprehensive Code Composer Studio DSK development platform. The tools, which run on Windows 98, Windows 2000 and Windows XP, allow developers to seamlessly manage projects of any complexity. Code Composer Studio features for the TMS320C6713 DSK include:

- A complete Integrated Development Environment (IDE), an efficient optimizing C/C++ compiler, assembler, linker, debugger, an advanced editor with Code Maestro technology for faster code creation, data visualization, a profiler and a flexible project manager
- DSP/BIOS real-time kernel
- Target error recovery software
- DSK diagnostic tool
- "Plug-in" ability for third-party software for additional functionality
Picture: specification of the TMS320C6713 DSK board components.
This concludes the discussion of DSPs. In the next chapter we model the algorithm for the Acoustic Echo Canceller, and simulate and analyse our system.
4. Description of the chosen solution

4.1. Acoustic Echo Cancelling
4.1.1. AEC Modeling
In this chapter we describe the DSP algorithm that will be implemented, in accordance with the signals that are going to be filtered, and the AEC model, where we show all the blocks that make up the system. The algorithm is going to be used in the simulation with the selected toolkit and in a real environment. This will cover at least two use cases: a simple case to demonstrate the viability of the chosen solution, and a more sophisticated case for a realistic application. The following figure represents the AEC model implemented with Simulink.
Figure 1. Acoustic Echo Canceller model implemented in Frequency Domain.
We start by showing the AEC model implemented with Simulink (MATLAB 6.5.1), where we can see every block that makes up the Acoustic Echo Canceller system, represented in the frequency domain.
Adaptive Echo Cancellation
This Simulink model demonstrates a frequency-domain adaptive filtering application on our Texas Instruments DSP Starter Kit using the Embedded Target for Texas Instruments TMS320C6000(tm) DSP Platform. The model features the C62x/C64x DSP Library blocks and fixed-point data handling.

Adaptive Filter Algorithm
As we have explained in other chapters, echo cancellation is a signal processing technique for removing reverberant noise from speech and audio signals. Here, we demonstrate the Block Least Mean Squares echo cancellation algorithm implemented in the frequency domain, leveraging the computational efficiency of performing FFTs on TI's TMS320C6xxx DSP. The algorithm is implemented using fixed-point arithmetic. In this model, the input audio read from Line In is considered to be the unwanted noise to be attenuated. Let us look inside the adaptive filter block, called FBLMS_Fixed in the model.
Figure 2. Adaptive filter block.
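The following is a minimal floating-point sketch of a frequency-domain Block LMS adaptive filter using the overlap-save method, intended only to illustrate the principle behind a block such as FBLMS_Fixed. It is not the Simulink block's implementation: that block works in fixed point, whereas this sketch uses floating point, no step-size normalisation, and a small hand-written FFT so that it is self-contained. The function names, the 4-tap echo path, the block length of 16 and the step size are all assumptions made for the demonstration.

```python
import cmath
import random

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out

def ifft(X):
    """Inverse FFT computed with the conjugate trick."""
    n = len(X)
    return [v.conjugate() / n for v in fft([u.conjugate() for u in X])]

def fblms(x, d, block, mu):
    """Frequency-domain block LMS (overlap-save) with `block` taps.

    x -- reference signal (the signal that causes the echo)
    d -- desired signal (the signal that contains the echo)
    Returns (error signal, estimated time-domain taps).
    """
    W = [0j] * (2 * block)               # frequency-domain weights
    prev = [0.0] * block                 # previous input block
    err = []
    for start in range(0, len(x) - block + 1, block):
        cur = x[start:start + block]
        X = fft(prev + cur)              # 2M-point FFT of [old block, new block]
        y = ifft([a * b for a, b in zip(X, W)])
        # Overlap-save: only the last M outputs are valid linear convolution.
        e = [d[start + i] - y[block + i].real for i in range(block)]
        E = fft([0.0] * block + e)
        # Input/error correlation; keep the first M lags (gradient constraint).
        g = ifft([a.conjugate() * b for a, b in zip(X, E)])
        G = fft([v.real for v in g[:block]] + [0.0] * block)
        W = [w + mu * gk for w, gk in zip(W, G)]
        prev = cur
        err.extend(e)
    taps = [v.real for v in ifft(W)[:block]]
    return err, taps

# Demonstration with a made-up 4-tap echo path and white-noise excitation.
random.seed(0)
echo_path = [0.5, -0.3, 0.2, 0.1]
M = 16
x = [random.gauss(0.0, 1.0) for _ in range(M * 300)]
d = [sum(h * x[n - k] for k, h in enumerate(echo_path) if n - k >= 0)
     for n in range(len(x))]
err, taps = fblms(x, d, block=M, mu=0.2 / M)
```

In this noise-free demonstration the estimated taps approach the assumed echo path and the error signal decays towards zero. A practical canceller would normally normalise the step size per frequency bin and, on a fixed-point target, would also have to manage scaling inside the FFTs.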
4.1.1.1. Describing the AEC model blocks

4.1.1.1.1. C6713 DSK DIP Switch
This block behaves differently in simulation than in code generation and targeting.

In simulation, the options Switch 0, Switch 1, Switch 2, and Switch 3 generate output to simulate the settings of the user-defined dual inline pin (DIP) switches on our C6713 DSK. Each option turns the associated DIP switch on when it is selected. The switches are independent of one another. By defining the switches to represent actions on our target, the DIP switches let us modify the operation of our process by reconfiguring the switch settings. The Data type option specifies whether the DIP switch options output an integer or a logical string of bits to represent the status of the switches. Selecting the Integer data type results in the switch settings generating integers in the range from 0 to 15 (uint8), corresponding to converting the string of individual switch settings to a decimal value. In the Boolean data type, the output string presents the separate switch setting for each switch, with the Switch 0 status represented by the least significant bit (LSB) and the status of Switch 3 represented by the most significant bit (MSB).

In code generation and targeting, the code generated by the block reads the physical switch settings of the user switches on the board and reports them as described above. Our process uses the result in the same way whether in simulation or in code generation. In code generation and when running our application, the block code ignores the settings for Switch 0, Switch 1, Switch 2 and Switch 3 in favor of reading the hardware switch settings. When the block reads the DIP switches, it reports the results as either a Boolean string or an integer value.
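The two output formats described above can be sketched as follows. This is an illustration of the encoding only, not the block's generated code; the function name and its arguments are hypothetical.

```python
def dip_switch_output(switches, data_type="integer"):
    """Encode four DIP switch states as described for the DIP Switch block.

    switches  -- four booleans; index 0 is Switch 0 (LSB), index 3 is Switch 3 (MSB)
    data_type -- "integer" for a value in 0..15, "boolean" for the individual bits
    """
    if len(switches) != 4:
        raise ValueError("expected exactly four switch states")
    bits = [bool(s) for s in switches]
    if data_type == "boolean":
        return bits                       # one entry per switch, Switch 0 first
    if data_type == "integer":
        # Weight each switch by its bit position: Switch 0 -> 1, ..., Switch 3 -> 8.
        return sum(int(b) << position for position, b in enumerate(bits))
    raise ValueError("data_type must be 'integer' or 'boolean'")

# Only Switch 0 on gives 1; only Switch 3 on gives 8; all switches on gives 15.
only_switch_0 = dip_switch_output([True, False, False, False])
only_switch_3 = dip_switch_output([False, False, False, True])
all_on = dip_switch_output([True, True, True, True])
```

An application can then test a single switch with a bit mask, for example `value & 1` for Switch 0.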
4.1.1.1.2. C6713 DSK ADC Use the C6713 DSK ADC (analog-to-digital converter) block to capture and digitize analog signals from external sources, such as signal generators, frequency generators or audio devices. Placing an C6713 DSK ADC block in our Simulink block diagram lets we use the audio coder-decoder module (codec) on the C6713 DSK to convert an analog input signal to a digital signal for the digital signal processor. You can select one of three input sources from the ADC source list:

Line In: the codec accepts input from the line in connector (LINE IN) on the board's mounting bracket.
Mic: the codec accepts input from the microphone connector (MIC IN) on the board's mounting bracket.
Loopback: routes the analog signal from the codec output back to the codec input. This can be useful in some feedback applications.
4.1.1.1.3. C6713 DSK DAC
Adding the C6713 DSK DAC (digital-to-analog converter) block to our Simulink model provides the means to output an analog signal to the analog output jack on the C6713 DSK. The codec converts the digital signal it receives to analog form (digital-to-analog (D/A) conversion) and sends the result to the output jack. One of the configuration options in the block affects the codec. The remaining options relate to the model we are using in Simulink and to the signal processor on the board.
4.1.1.1.4. C62x General Real FIR
The General Real FIR block filters a real input signal X using a real FIR filter. This filter is implemented using a direct form structure. The filter coefficients are specified by a real vector H, which must contain at least five elements. The coefficients must be in reversed order. All inputs, coefficients, and outputs are Q.15 signals. The General Real FIR block supports discrete sample times and both little-endian and big-endian code generation.
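To make the Q.15 format and the reversed coefficient order more concrete, the following Python sketch shows a direct-form FIR on 16-bit integers. It is a simplified illustration under our own assumptions (rounding when quantizing, a plain right shift by 15 bits after accumulation, no saturation of the result); it is not the code of the block itself, and the function names are hypothetical.

```python
def to_q15(value):
    """Quantize a real value in [-1, 1) to a signed 16-bit Q.15 integer."""
    return max(-32768, min(32767, int(round(value * 32768))))


def fir_q15(x_q15, h_reversed_q15):
    """Direct-form FIR on Q.15 integers.

    h_reversed_q15 holds the taps in reversed order (last tap first), so
    each output is a plain dot product with a window of the input.
    Only the outputs with a full window of input samples are returned.
    """
    n_taps = len(h_reversed_q15)
    out = []
    for i in range(len(x_q15) - n_taps + 1):
        acc = 0  # wide accumulator, Q.30 after the products
        for j in range(n_taps):
            acc += x_q15[i + j] * h_reversed_q15[j]
        out.append(acc >> 15)  # back to Q.15
    return out


# Five taps (the minimum length), stored in reversed order.
taps = [0.5, 0.25, 0.0, 0.0, 0.0]
h_rev = [to_q15(c) for c in reversed(taps)]
# A single pulse of amplitude 0.5 preceded by four zero samples.
x = [0, 0, 0, 0, to_q15(0.5), 0, 0, 0, 0]
y = fir_q15(x, h_rev)
```

Here y starts with 8192 and 4096, that is 0.25 and 0.125 in Q.15, which is the impulse response scaled by the pulse amplitude.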
4.1.2. Testing in simulation environment
In this subchapter we introduce the testing area in the simulation environment and the problems which we have to solve. As explained before, the simulation environment is Matlab 6.5.1, with Simulink used for modeling. To test our Simulink AEC model we have used an audio fragment as the input signal, which is stored in the code and read circularly.
We have to move the model to a writable working directory and choose: Tools >> Real-Time Workshop >> Build Model (or Ctrl-B). After generating code, Real-Time Workshop connects to Code Composer Studio and creates a new project. After compiling and linking the code, Real-Time Workshop downloads the COFF file to the DSK and begins execution. At this point, if we have connected speakers to the audio output jack of the DSK, we should hear the AEC output signal.
We can control the adaptive filter during execution with the first user DIP switch on the board. On the C6713 DSK it is marked "0" on SW1. Set the switch to ON (down) to set the filter coefficients to zero.
Once the generated code is running on the target, a profile report may be viewed by invoking the following method from the MATLAB command line:
>> profile(CCS_Obj, 'report')
4.1.2.1. Parameters setup
Before testing our AEC Simulink model, let us observe how the system is configured by viewing the parameter setup of the main blocks.
Input signal parameters (Audiofrag)
Sample time: 1/8000 s = 125 µs
Samples per frame: 64
DIP Switch parameters
Sample time: 64/8000 s = 8 ms
Switch 0 (LSB): 0
Switch 1: 0
Switch 2: 0
Switch 3 (MSB): 0
General Real FIR parameters
Vector H coefficients: fir1(63,0.25)
DAC parameters
Word length: 16-bit
Scaling: Integer value
Setting Switch 0 to ON (1) sets the coefficients of the adaptive filter to zero, which means that the filter does not remove the echo from the input signal. This is useful for observing how fast the adaptation works when we change SW0 back to the OFF position while the other switches also remain OFF.
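The numbers in this parameter setup can be reproduced with a short Python sketch. The Matlab call fir1(63,0.25) designs a low-pass filter of order 63 (64 taps) with a cutoff at 0.25 of the Nyquist frequency, which is 1 kHz at a sampling rate of 8 kHz. The function below is our own approximation of that design (Hamming-windowed sinc, normalized to unity gain at 0 Hz), written for illustration only; it is not the Matlab implementation, and the names are hypothetical.

```python
import math

SAMPLE_RATE_HZ = 8000
SAMPLES_PER_FRAME = 64


def lowpass_taps(order, cutoff):
    """Hamming-windowed sinc low-pass; cutoff is relative to Nyquist (0..1)."""
    taps = []
    for n in range(order + 1):
        t = n - order / 2.0                      # distance from the centre
        if t == 0:
            ideal = cutoff
        else:
            ideal = math.sin(math.pi * cutoff * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / order)
        taps.append(ideal * window)
    dc_gain = sum(taps)
    return [c / dc_gain for c in taps]           # unity gain at 0 Hz


taps = lowpass_taps(63, 0.25)
sample_period = 1.0 / SAMPLE_RATE_HZ                # 125 microseconds
frame_period = SAMPLES_PER_FRAME / SAMPLE_RATE_HZ   # 8 milliseconds
```

Because the resulting impulse response is symmetric, storing the coefficients in reversed order, as the General Real FIR block requires, leaves this particular vector unchanged.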
4.1.3. Analyzing the simulation results
Here we describe and analyze the results of the tests conducted in the simulation environment. The following graphics show each signal as obtained after simulation (stopped at 2942 frames).
Input signal (with echo): audio fragment from workspace
Original signal: input signal passed through the General Real FIR
Output signal: input signal without echo (removed by the FBLMS Fixed block)
Tap filter coefficients: real value of the coefficients during LMS adaptation
AEC conclusion results
In the previous graphics we observe the input signal (the original signal with echo) and how this echo is removed after the signal passes through the LMS block and is filtered. The echo signal has a smaller amplitude than the original signal. The filter coefficients are also shown, demonstrating how the filter acts on the signal; the result is an output signal without echo and with a smaller amplitude than the input signal.
4.2. Acoustic Noise Cancelling
4.2.1. Introduction to Noise Cancelling
Noise cancellation is an important topic in many different fields, ranging from satellite photo imaging, speech recognition, and telecommunications to noise control in industry and automobiles. With continued advances in telecommunications and digital processing, the problem of noise elimination will become ever more important. From a theoretical viewpoint, the problem is one of recovering an information signal which has had a noise signal superimposed upon it. If it is possible to obtain a separate noise signal, then it is possible to subtract the noise from the information signal. Such a technique can be found in certain kinds of noise cancelling headphones: microphones strategically placed on the headset listen for noise, then generate a cancelling sound pattern which is piped into the headphones.
Standard fixed FIR and IIR filters are incapable of removing noise from a signal if the noise is subject to changes in frequency, phase, amplitude, or some combination of all three. To account for these changes, the filter must adapt to the new noise conditions. Although many adaptive filter types have been designed, the simplest, the stochastic gradient or least mean squares (LMS) filter, which we have explained in previous chapters, offers surprising performance combined with computational simplicity and theoretical clarity. This is a transversal filter with a single tap weight (and tap weight computation) at each tap in the filter. Such filters require that the noise be different from the signal in some way; clearly this must be so, otherwise the input would be all signal or all noise. The LMS adaptive filter requires that the noise be statistically uncorrelated with the signal. However, the noise need not be purely random.
The LMS filter attempts to minimize the total power in the output signal, and because the filter weight estimates are based upon the gradient of the input signal (with noise), the input must be relatively smooth and its derivatives must exist. Also, very sharp changes in noise content are not handled well by the LMS filter. All adaptive filtering systems require an input signal (with noise) and a reference signal which the filter uses to adjust its tap weights. The filters and signals may be combined in one of four ways:
Type I - Identification Systems: The filter is used to identify the impulse response of an unknown plant. The reference signal is the difference in outputs between the plant impulse response and the filter response.
Type II - Inverse Modelling: Somewhat similar to a Type I system, but the output is the inverse impulse response.
Type III - Predictive Systems: The input signal is delayed by one or more samples and then fed into the filter. The reference signal is the summation of the input signal and the output of the filter. LMS filters are not usually suitable for this application.
Type IV - Noise Cancellation: The primary signal and a separate reference signal are combined to remove the noise from the primary signal.
In this thesis we develop and simulate Type IV, which deals with Acoustic Noise Cancellation. The acoustic noise environment is illustrated in the next figure: a noisy signal, x(k), which is the sum of a speech signal, s(k), and an ambient noise signal to be removed, n(k), enters through the microphone. This signal is passed through a noise reduction system and the result is an enhanced signal, y(k), which is the cleaned speech signal, that is, the speech with the noise subtracted.
In the following point we model this noise reduction system, which is commonly called, as we have said, an Acoustic Noise Canceller (ANC).
4.2.2. ANC Modelling
Here we show the ANC Simulink model implemented in the time domain, together with its main blocks: the NLMS (Normalized Least Mean Square) filter and the Acoustic environment block. They are illustrated in the following figures.
Figure 1. Acoustic Noise Canceller Simulink model
Figure 2. Acoustic environment block
Figure 3. Normalized LMS filter block
Adaptive Noise Cancellation
This Simulink model demonstrates an adaptive filtering application on our Texas Instruments DSP Starter Kit using Embedded Target for Texas Instruments TMS320C6000 DSP Platform.
Adaptive Filter Algorithm
The Least Mean Square adaptive filter uses the reference signal (on the In port) and the error signal (on the Err port) to automatically match the filter response in the block labeled Acoustic environment. As it converges to the correct filter, the filtered noise should be completely subtracted from the Input signal + Noise signal, and the Error signal should contain only the original signal.
4.2.2.1. Describing the ANC blocks and its parameters setup
Here we explain the configuration of the Acoustic Noise Canceller by describing the function of its main blocks and their parameter setup, that is, how each one works inside the ANC system.
4.2.2.1.1. Acoustic environment block
In this block, whose implementation is represented in figure 2, a noise signal coming from a noise source is passed to one of the outputs of the block, labeled Exterior Mic. The same noise signal is the first input (labeled In) of the digital filter, DF FIR, where it is filtered Bandpass or Lowpass depending on whether a 0 or a 1 is selected in the Switch block, which drives the second input (labeled Num) of the DF FIR. That is, if we apply a 0 to the input of the Acoustic environment block labeled filter, the noise signal is filtered Bandpass, and if we apply a 1 it is filtered Lowpass. The filtered noise signal is the output of the DF FIR (labeled Out). This signal is added to the speech input signal, which comes from the Mic In port of the C6713 DSK ADC block. The sum is the second output of the Acoustic environment block, labeled Pilot's Mic. The two outputs of this block are passed to the NLMS adaptive filter through its Input and Desired ports.
4.2.2.1.2. C6713 DSK DIP Switch
The operation of the DIP Switch was explained in point 4.1.1.1.1 of the AEC blocks description, so here we only specify how this block is configured for the acoustic noise cancellation process:
Switch 0: selecting 0 (OFF) applies Bandpass filtering in the DF FIR; selecting 1 (ON) applies Lowpass filtering.
Switch 1: selecting 1 (ON) starts the adaptation process. Selecting 0 (OFF) stops the adaptation; the LMS filter coefficients then remain at zero, so the noise in the Input signal is not removed.
Switch 2: setting it to the ON position resets the LMS filter coefficients, so they return to zero. In the OFF position it does nothing.
Switch 3: No operation. It is connected to a terminator, which is used to terminate output signals. This prevents warnings about unconnected output ports.
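The switch assignment listed above can be summarized as a small lookup, sketched here in Python. The function and the key names are hypothetical and only restate the configuration described in this point.

```python
def anc_controls(sw0, sw1, sw2, sw3=False):
    """Translate the four DIP switch states into the ANC control settings."""
    return {
        "noise_filter": "lowpass" if sw0 else "bandpass",  # Switch 0
        "adapt": bool(sw1),                                # Switch 1
        "reset_coefficients": bool(sw2),                   # Switch 2
        # Switch 3 has no function: it is wired to a terminator.
    }
```

For example, anc_controls(False, True, False) describes Bandpass filtering with the adaptation running and no reset.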
4.2.2.1.3. C6713 DSK ADC and C6713 DSK DAC
The main functions of the C6713 DSK ADC and C6713 DSK DAC blocks were explained in points 4.1.1.1.2 and 4.1.1.1.3 of the AEC blocks description.
4.2.2.1.4. Fast Adapt / Slow Adapt blocks
The Fast Adapt and Slow Adapt blocks are designed to change the speed of the filter adaptation. This speed is represented by a constant value parameter. If the Interpret vector parameters as 1-D option is on, the constant value is treated as a 1-D array; otherwise, the output is a matrix with the same dimensions as the constant value. The Fast Adapt block has a constant value of 0.04, so the filter adapts quickly. The Slow Adapt block has a constant value of 0.002, so the filter adapts slowly. The outputs of these blocks are switched manually, as we can observe in figure 1. The selected output is connected to the Step-size port of the NLMS filter to control the speed of the adaptation.
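To give a feeling for the difference between the two step sizes, the following Python sketch uses a strongly idealized case: a single-tap NLMS filter without any disturbing signal, where the coefficient error shrinks by the factor (1 - step size) at every sample. The real 40-tap filter with speech present converges more slowly, so the numbers are only a rough comparison. The function name is our own.

```python
import math

FAST_STEP = 0.04    # constant value of the Fast Adapt block
SLOW_STEP = 0.002   # constant value of the Slow Adapt block


def samples_to_converge(step_size, remaining=0.01):
    """Samples needed until the coefficient error drops to `remaining`
    times its initial value in the idealized single-tap, noise-free case."""
    if not 0.0 < step_size < 1.0:
        raise ValueError("step_size must lie strictly between 0 and 1")
    return math.ceil(math.log(remaining) / math.log(1.0 - step_size))


n_fast = samples_to_converge(FAST_STEP)
n_slow = samples_to_converge(SLOW_STEP)
```

Under these assumptions the fast setting needs about 113 samples to reduce the error to 1 %, and the slow setting about 2301 samples, roughly twenty times longer.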
4.2.2.1.5. Normalized LMS filter block
4.2.2.1.5.1. General description
The LMS Filter block can implement an adaptive FIR filter using five different algorithms, of which we have selected the Normalized LMS as the adaptive algorithm. The block estimates the filter weights, or coefficients, needed to minimize the error, e(n), between the output signal, y(n), and the desired signal, d(n).
The signal we want to filter is connected to the Input port. This input signal can be a sample-based scalar or a single-channel frame-based signal. The desired signal is connected to the Desired port. It must have the same data type, frame status, complexity, and dimensions as the input signal. The Output port outputs the filtered input signal, which is the estimate of the desired signal, and has the same frame status as the input signal. The Error port outputs the result of subtracting the output signal from the desired signal. The Adapt port starts the adaptation process when it receives a 1. The Reset port sets the adaptive filter coefficients to zero. The Wts port outputs the coefficient values of the filter.
The equivalent model of this LMS filter is represented in the previous figure 3. Note also that the LMS filter length is fixed to 40 in all future simulations.
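The behaviour of the ports described above can be sketched in Python as follows. This is our own minimal illustration of a normalized LMS filter, not the code of the Simulink block: the class and function names, the acoustic path [0.6, -0.3, 0.1], the sine wave standing in for speech and the uniform white noise are all assumptions chosen for the example. The small constant added to the input energy mirrors the 1.0E-010 term visible in the generated code of the appendix.

```python
import math
import random


class NLMSFilter:
    """Sample-by-sample normalized LMS filter (illustrative sketch)."""

    def __init__(self, length=40, eps=1e-10):
        self.length = length
        self.eps = eps                    # guards against division by zero
        self.weights = [0.0] * length     # what the Wts port would report
        self.buffer = [0.0] * length      # newest input sample first

    def reset(self):
        """Reset port: set all coefficients back to zero."""
        self.weights = [0.0] * self.length

    def step(self, x, d, step_size, adapt=True):
        """Process one sample; returns (output, error)."""
        self.buffer = [x] + self.buffer[:-1]
        y = sum(w * u for w, u in zip(self.weights, self.buffer))
        e = d - y
        if adapt:
            energy = sum(u * u for u in self.buffer) + self.eps
            gain = step_size * e / energy
            self.weights = [w + gain * u
                            for w, u in zip(self.weights, self.buffer)]
        return y, e


def demo(n_samples=8000, step_size=0.04, seed=0):
    """Cancel filtered noise from a tone; returns (residual, unadapted, filt).

    residual:  mean squared difference between error output and speech
    unadapted: mean squared filtered noise, i.e. the error without a filter
    Both are averaged over the last 1000 samples.
    """
    rng = random.Random(seed)
    path = [0.6, -0.3, 0.1]               # hypothetical acoustic path
    filt = NLMSFilter()
    history = [0.0] * len(path)
    residual = unadapted = 0.0
    tail = 1000
    for k in range(n_samples):
        noise = rng.uniform(-1.0, 1.0)    # reference noise (Input port)
        history = [noise] + history[:-1]
        coloured = sum(p * v for p, v in zip(path, history))
        speech = 0.5 * math.sin(2.0 * math.pi * 0.01 * k)
        _, e = filt.step(noise, speech + coloured, step_size)
        if k >= n_samples - tail:
            residual += (e - speech) ** 2 / tail
            unadapted += coloured ** 2 / tail
    return residual, unadapted, filt
```

Calling demo() runs the canceller with the fast step size. After adaptation, the first three coefficients should approach the assumed path and the error output should be close to the speech signal, which is the behaviour described for the Error port.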
4.2.2.1.5.2. Supported Data Types

Port        Supported Data Types
Input       Double-precision floating point; Single-precision floating point; Fixed point
Desired     Double-precision floating point; Single-precision floating point; Fixed point
Step-size   Double-precision floating point; Single-precision floating point; Fixed point
Adapt       Double-precision floating point; Single-precision floating point; Boolean; 8-, 16-, and 32-bit signed integers; 8-, 16-, and 32-bit unsigned integers
Reset       Double-precision floating point; Single-precision floating point; Boolean; 8-, 16-, and 32-bit signed integers; 8-, 16-, and 32-bit unsigned integers
Output      Double-precision floating point; Single-precision floating point; Fixed point
Error       Double-precision floating point; Single-precision floating point; Fixed point
Wts         Double-precision floating point; Single-precision floating point; Fixed point
4.2.3. Testing and analyzing in the simulation environment
In this subchapter we show the results of the ANC simulation, using wave files with the same sampling frequency as the input signal and the noise signal (noise source), in order to observe the efficiency of the NLMS adaptive filter algorithm. We have tested and simulated the ANC system with two kinds of noise signals, white noise and pink noise, which are the two most common. The following graphics show the time-domain representation of the main signals and the response of our system. All the simulations have been done with Bandpass digital filtering (selected with Switch 0, see point 4.2.2.1.2) and with a slow adaptation process, to see how the adaptation evolves. Later on we specify the characteristics of the audio input signals. All graphics are commented before and after the adaptation process.
4.2.3.1. Simulation before adaptation (stopped at 1078 frames) with white noise source
Input signal: from a speech wave file
Noise signal: from a white noise wave file
Desired signal: Input signal added with Noise signal (filtered Bandpass by DF FIR)
Output signal: called error signal in the ANC model. It is the desired signal passed through the NLMS filter; since the filter coefficients are set to zero, we obtain the same signal as in the previous graphic, the desired signal.
Filter coefficients: 3D representation. They are fixed at zero.
4.2.3.2. Simulation after adaptation (stopped at 4545 frames) with white noise source
Input signal: from a speech wave file
Noise signal: from a white noise wave file
Desired signal: Input signal added with Noise signal (after being filtered Bandpass by DF FIR)
Output signal: called error signal in the ANC model. It is the desired signal passed through the NLMS filter; since the adaptation has been executed, this output signal is the desired signal with the noise subtracted, which is approximately the input signal.
Filter coefficients: 3D representation. The coefficients have now taken non-zero values, which shows that the adaptation process has completed.
4.2.3.3. Simulation before adaptation (stopped at 1062 frames) with pink noise source
Input signal: from a speech wave file
Noise signal: from a pink noise wave file
Desired signal: Input signal added with Noise signal (filtered Bandpass by DF FIR)
Output signal: called error signal in the ANC model. It is the desired signal passed through the NLMS filter; since the filter coefficients are set to zero, we obtain the same signal as in the previous graphic, the desired signal.
Filter coefficients: 3D representation. They are fixed at zero.
4.2.3.4. Simulation after adaptation (stopped at 2360 frames) with pink noise source
Input signal: from a speech wave file
Noise signal: from a pink noise wave file
Desired signal: Input signal added with Noise signal (after being filtered Bandpass by DF FIR)
Output signal: called error signal in the ANC model. It is the desired signal passed through the NLMS filter; since the adaptation has been executed, this output signal is the desired signal with the noise subtracted, which is approximately the input signal.
Filter coefficients: 3D representation. The coefficients have now taken non-zero values, which shows that the adaptation process has completed.
4.2.3.5. Describing the wave files specifications
The next figure (Acoustic environment block) illustrates the changes we have made to the model shown in point 4.2.2. As explained, we have introduced two input blocks (Input signal and Noise signal) that read from two wave files. Let us observe the model used for simulating.
Input wave file properties
File name: chal.wav
Channels: 1 Ch (mono)
Word length: 16-bit
Sampling frequency: 44.1 kHz
Noise wave files properties
File names: White_Noise.wav, Pink_Noise.wav
Channels: 1 Ch (mono)
Word length: 16-bit
Sampling frequency: 44.1 kHz
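With these wave file properties, the frame counts at which the simulations were stopped can be converted to elapsed signal time. The Python sketch below assumes 32 samples per frame; this value is not stated in the parameters above but is taken from the generated code in the appendix, whose sample time of 7.2562358276643992E-004 s equals 32/44100 s. The function names are our own.

```python
SAMPLE_RATE_HZ = 44100
SAMPLES_PER_FRAME = 32   # assumption taken from the generated code in the appendix


def frame_period():
    """Duration of one frame in seconds."""
    return SAMPLES_PER_FRAME / SAMPLE_RATE_HZ


def elapsed_seconds(frames):
    """Signal time covered by the given number of frames."""
    return frames * SAMPLES_PER_FRAME / SAMPLE_RATE_HZ
```

Under this assumption, the white noise simulation stopped at 1078 frames covers about 0.78 s of signal, and the one stopped at 4545 frames covers about 3.3 s.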
White Noise frequency response (magnitude)
Pink Noise frequency response (magnitude)
4.2.3.6. Different types of Noise signals
Here we describe five common types of noise signals which can interfere in an acoustic environment. All these types of noise can be removed by our Acoustic Noise Canceller system.
Type of Noise   Description
White           White noise has a frequency distribution of 1 (i.e. all components of equal intensity). As seen on the spectrograph, this corresponds to a flat response.
Pink            This is 1/f noise, with a frequency distribution of 1/f, or a 3 dB per octave roll-off. It is generally considered the kind of noise most prevalent in nature.
Brown           This variety derives its name from Brownian motion, and is the kind of noise associated with such random walks. It has a frequency distribution of 1/f^2, which is to say a roll-off of 6 dB per octave.
Blue            This is, in a sense, the inverse of pink noise, with a frequency increase of 3 dB per octave. Intensity is proportional to f.
Violet          This is the inverse of brown noise, with a frequency pre-emphasis of 6 dB per octave, or an intensity characteristic of f^2.
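The slopes quoted in the table follow directly from the power laws: when the intensity is proportional to f raised to some exponent, doubling the frequency (one octave) changes the level by 10·log10(2^exponent) dB. A short Python sketch of this arithmetic, with names of our own choosing:

```python
import math

# Exponent of f in the intensity law of each noise type from the table.
NOISE_EXPONENTS = {"white": 0, "pink": -1, "brown": -2, "blue": 1, "violet": 2}


def db_per_octave(exponent):
    """Level change in dB when the frequency doubles."""
    return 10.0 * math.log10(2.0 ** exponent)


slopes = {name: db_per_octave(p) for name, p in NOISE_EXPONENTS.items()}
```

This gives approximately -3.01 dB per octave for pink noise and -6.02 dB per octave for brown noise, which the table rounds to 3 dB and 6 dB.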
4.2.4. ANC conclusions
Firstly, as we have observed in the results presented and commented above, the ANC design which we have simulated works correctly. Secondly, the system has been tested in a real environment on the C6713 DSK board with a microphone input (Mic In), two speakers (Line Out) and a broadband noise generated by the system itself. We have tested how the LMS filter adapts with two different speeds of adaptation, slow and fast. This adaptation is controlled manually from the board through the DIP Switch module. All of this has been checked rigorously in the laboratory, and we can conclude that the Acoustic Noise Canceller eliminates the noise successfully and works as intended.
Personal remark
We are really pleased with the rigorous work that we have carried out, and we have greatly enjoyed the development of this thesis. All this has been possible thanks to our good mutual understanding in many aspects of working together: discussing things, contributing ideas, etc. We think that we have done good work together, and we have learnt many concepts about this subject (DSP) which we had never studied previously. We would also like to thank our project tutors, who have always been there when we needed technical support during the realization of the thesis. That is all we wanted to comment personally.
Appendix
A. LMS algorithms in Matlab language
LMS adaptive algorithm in Matlab (C source code) of the ANC

/* DSP Blockset Filter Implementation (sdspfilter2) - '<S1>/Digital Filter' */
/* Filter algorithm: FIR Direct-Form (single precision floating-point) */
/* Complexities: input - real, num coeffs - real */
/* Implementing filter algorithm */
MWDSP_FIR_DF_RR(&ANC_noise_wav_B->From_Wave_File[0],
    &ANC_noise_wav_B->Digital_Filter[0],
    &ANC_noise_wav_DWork->Digital_Filter_FILT_STATES[0],
    &ANC_noise_wav_DWork->Digital_Filter_DF_INDX,
    40, 32, 1, &ANC_noise_wav_B->Switch_b[0], 1);

/* Level2 S-Function Block: <S1>/ADC (c6416dsk_adc) */
/* Call into Simulink for MEX-version of S-function */
ssCallAccelRunBlock(S, 11, 6, SS_CALL_MDL_OUTPUTS);

/* Sum: '<S1>/Sum2' */
{
  int_T i1;
  const real32_T *u0 = &ANC_noise_wav_B->Digital_Filter[0];
  const real32_T *u1 = &ANC_noise_wav_B->ADC[0];
  real32_T *y0 = &rtb_Sum2[0];
  for (i1=0; i1 < 32; i1++) {
    y0[i1] = u0[i1] + u1[i1];
  }
}

/* S-Function Block (sfun_manswitch): <S3>/S-Function */
ANC_noise_wav_B->S_Function_a = ANC_noise_wav_P->S_Function_a_P1;
}

/* SubSystem: '<S2>/Frame-Based LMS Filter' */
if (ssIsSampleHit(S, 0, tid)) {
  /* Sample time: [7.2562358276643992E-004, 0.0] */
  /* Output and update for iterator system: '<S2>/Frame-Based LMS Filter' */
  {
    /* simstruct variables */
    ANC_noise_wav_BlockIO *ANC_noise_wav_B = (ANC_noise_wav_BlockIO *) _ssGetBlockIO(S);
    ANC_noise_wav_D_Work *ANC_noise_wav_DWork = (ANC_noise_wav_D_Work *) ssGetRootDWork(S);
    ANC_noise_wav_Parameters *ANC_noise_wav_P = (ANC_noise_wav_Parameters *) ssGetDefaultParam(S);
    int32_T iterS11;

    for (iterS11 = 1; iterS11 <= (int32_T)ANC_noise_wav_B->Width; iterS11++) {
      /* ForIterator: '<S11>/For Iterator' */
      rtb_For_Iterator = iterS11;

      /* Selector: '<S11>/Selector' */
      ANC_noise_wav_B->Selector = ANC_noise_wav_B->From_Wave_File[rtb_For_Iterator-1];

      /* DSP Blockset Buffer/Unbuffer (sdsprebuff2) - '<S11>/Buffer' */
      {
        /* Copy input samples to buffer */
        {
          MWDSP_Buf_CopyScalar_OL_1ch((const byte_T *)&ANC_noise_wav_B->Selector,
              (byte_T **)&ANC_noise_wav_DWork->Buffer_IN_BUF_PTR,
              (byte_T *)&ANC_noise_wav_DWork->Buffer_CircBuff[0],
              *&ANC_noise_wav_DWork->Buffer_ShiftPerElem,
              79 * sizeof(real32_T));
        }
        /* Copy output samples from buffer */
        {
          MWDSP_Buf_OutputFrame_1ch((byte_T *)&ANC_noise_wav_B->Buffer[0],
              (byte_T **)&ANC_noise_wav_DWork->Buffer_OUT_BUF_PTR,
              (byte_T *)&ANC_noise_wav_DWork->Buffer_CircBuff[0],
              *&ANC_noise_wav_DWork->Buffer_ShiftPerElem,
              79 * sizeof(real32_T), 40, 39 * sizeof(real32_T));
        }
      }

      /* Output: DSP Blockset Delay (sdspdelay) - '<S11>/Filter Taps' */
      /* check for reset */
      if( ANC_noise_wav_B->Switch_a[2]) {
        byte_T *buff = (byte_T *)&ANC_noise_wav_DWork->Filter_Taps_IC_BUFF[0];
        byte_T *ics = (byte_T *)&ANC_noise_wav_P->Filter_Taps_IC;
        {
          int_T numElems = 40;
          while (numElems--) {
            memcpy( buff, ics, sizeof(real32_T) );
            buff += sizeof(real32_T);
          }
        }
      }
      {
        byte_T *buff = (byte_T *) &ANC_noise_wav_DWork->Filter_Taps_IC_BUFF[0];
        byte_T *y = (byte_T *) &rtb_Filter_Taps[0];
        /* 40 channel input, scalar delay (Delay = 1, Samples per channel = 1) */
        const int_T bytesInBuffer = sizeof(real32_T);
        memcpy(y, buff, bytesInBuffer*40);
      }

      /* Sum: '<S11>/sum2' incorporates:
       *   Product: '<S11>/Product'
       */
      rtb_sum2 = (ANC_noise_wav_B->Buffer[0] * rtb_Filter_Taps[0]);
      {
        int_T i1;
        for (i1=0; i1 < 39; i1++) {
          rtb_sum2 += (ANC_noise_wav_B->Buffer[i1+1] * rtb_Filter_Taps[i1+1]);
        }
      }

      /* Sum: '<S11>/sum1' incorporates:
       *   Selector: '<S11>/Selector1'
       */
      rtb_sum1 = rtb_Sum2[rtb_For_Iterator-1] - rtb_sum2;

      /* Assignment: '<S11>/Assignment1' */
      if (iterS11 == 1) {
        (void)memcpy(&ANC_noise_wav_B->Assignment1[0],&ANC_noise_wav_B->From_Wave_File[0],32*sizeof(real32_T));
      }
      ANC_noise_wav_B->Assignment1[rtb_For_Iterator-1] = rtb_sum1;

      /* SubSystem: '<S11>/Update' */
      /* Output and update for enable system: '<S11>/Update' */
      {
        /* simstruct variables */
        ANC_noise_wav_BlockIO *ANC_noise_wav_B = (ANC_noise_wav_BlockIO *) _ssGetBlockIO(S);
        ANC_noise_wav_Parameters *ANC_noise_wav_P = (ANC_noise_wav_Parameters *) ssGetDefaultParam(S);

        if (ANC_noise_wav_B->Switch_a[1]) {
          /* Math: '<S15>/conj'
           *
           * Regarding '<S15>/conj':
           *   Op: conj
           */
          {
            int_T i1;
            const real32_T *u0 = &ANC_noise_wav_B->Buffer[0];
            real32_T *y0 = &rtb_conj[0];
            for (i1=0; i1 < 40; i1++) {
              y0[i1] = u0[i1];
            }
          }

          /* DSP Blockset Normalization (sdsp2norm2) - '<S15>/Normalization' */
          {
            real32_T *y = &rtb_Normalization[0];
            const real32_T *u0 = &rtb_conj[0];
            int_T inRows = 40;
            real32_T E = 0.0;
            int_T i;
            /* Determine energy (sum of squares): */
            for(i=inRows; i-- > 0; ) {
              E += *u0 * *u0;
              u0++;
            }
            /* Normalize input vector by squared 2-norm: */
            E = 1.0F / (E + 1.0E-010F);
            u0 -= inRows; /* Back up to beginning of input column. */
            for(i=inRows; i-- > 0; ) {
              *y++ = *u0++ * E;
            }
          }

          /* Switch: '<S3>/Switch' */
          if (ANC_noise_wav_B->S_Function_a) {
            {
              /* simstruct variables */
              ANC_noise_wav_BlockIO *ANC_noise_wav_B = (ANC_noise_wav_BlockIO *) _ssGetBlockIO(S);
              ANC_noise_wav_Parameters *ANC_noise_wav_P = (ANC_noise_wav_Parameters *) ssGetDefaultParam(S);
              /* Constant: '<Root>/Slow Adapt' */
              ANC_noise_wav_B->Slow_Adapt = ANC_noise_wav_P->Slow_Adapt_Value;
            }
            rtb_Switch_c = ANC_noise_wav_B->Slow_Adapt;
          } else {
            {
              /* simstruct variables */
              ANC_noise_wav_BlockIO *ANC_noise_wav_B = (ANC_noise_wav_BlockIO *) _ssGetBlockIO(S);
              ANC_noise_wav_Parameters *ANC_noise_wav_P = (ANC_noise_wav_Parameters *) ssGetDefaultParam(S);
              /* Constant: '<Root>/Fast Adapt' */
              ANC_noise_wav_B->Fast_Adapt = ANC_noise_wav_P->Fast_Adapt_Value;
            }
            rtb_Switch_c = ANC_noise_wav_B->Fast_Adapt;
          }

          /* Sum: '<S15>/sum' incorporates:
           *   Gain: '<S15>/Gain2'
           *   Product: '<S15>/Product'
           *
           * Regarding '<S15>/Gain2':
           *   Gain value: ANC_noise_wav_P->Gain2_Gain
           */
          {
            int_T i1;
            real32_T *y0 = &ANC_noise_wav_B->sum[0];
            for (i1=0; i1 < 40; i1++) {
              y0[i1] = (rtb_Filter_Taps[i1] * ANC_noise_wav_P->Gain2_Gain)
                  + (rtb_sum1 * rtb_Normalization[i1] * rtb_Switch_c);
            }
          }
        }
      }

      /* Update: DSP Blockset Delay (sdspdelay) - '<S11>/Filter Taps' */
      {
        byte_T *buff = (byte_T *) &ANC_noise_wav_DWork->Filter_Taps_IC_BUFF[0];
        const byte_T *u = (const byte_T *) &ANC_noise_wav_B->sum[0];
        /* 40 channel input, scalar delay (Delay = 1, Samples per channel = 1) */
        const int_T bytesInBuffer = sizeof(real32_T);
        memcpy(buff, u, bytesInBuffer*40);
      }
    }
  }
}
115
_______________________________________________________________________ FBLMS adaptive algorithm in Matlab (C source code) of the AEC /* SubSystem: '<Root>/Original' */ /* Outputs for atomic system: '<Root>/Original' */ { /* simstruct variables */ c6713dskfblms_BlockIO *c6713dskfblms_B = (c6713dskfblms_BlockIO *) _ssGetBlockIO(S); /* S-Function (sfix_fix2fix): '<S2>/Gateway Out1' * * Regarding '<S2>/Gateway Out1': * Fixed-Point Conversion Block: '<S2>/Gateway Out1' * Input0 Data Type: Fixed Point S16 2^-15 * Output0 Data Type: Floating Point real_T * Round Mode: Floor * Saturation Mode: Wrap * Output's Real World Value should equal * input's Real World Value, if possible. */ { int_T i1; const int16_T *u0 = &c6713dskfblms_B->General_Real_FIR[0]; real_T *y0 = &c6713dskfblms_B->Gateway_Out1_a[0]; for (i1=0; i1 < 64; i1++) { y0[i1] = ldexp((double)u0[i1],-15); } } /* SubSystem: '<S2>/Original' */ c6713dskfblms_Original(S); } /* SubSystem: '<Root>/Plot' */ /* Outputs for atomic system: '<Root>/Plot' */ { /* simstruct variables */ c6713dskfblms_BlockIO *c6713dskfblms_B = (c6713dskfblms_BlockIO *) _ssGetBlockIO(S); /* S-Function (sfix_fix2fix): '<S3>/Gateway Out1' * * Regarding '<S3>/Gateway Out1': * Fixed-Point Conversion Block: '<S3>/Gateway Out1'
116
_______________________________________________________________________ * Input0 Complex Data Type: Fixed Point S32 2^-24 * Output0 Complex Data Type: Floating Point real_T * Round Mode: Floor * Saturation Mode: Wrap * Output's Real World Value should equal * input's Real World Value, if possible. */ { int_T i1; const cint32_T *u0 = &c6713dskfblms_B->Filter_Taps2[0]; creal_T *y0 = &rtb_temp31[0]; for (i1=0; i1 < 128; i1++) { y0[i1].re = ldexp((double)u0[i1].re,-24); y0[i1].im = ldexp((double)u0[i1].im,-24); } } /* DSP Blockset FFT (sdspfft2) - '<S3>/IFFT1' */ /* Complex input, complex output,1 channels, 128 rows, linear output order */ MWDSP_R2BR_Z(&rtb_temp31[0], 1, 128, 128); /* In-place bit-reverse reordering */ /* Radix-2 DIT IFFT using TableSpeed twiddle computation */ MWDSP_R2DIT_TBLS_Z(&rtb_temp31[0], 1, 128, 128, &rtcP_IFFT1_TwiddleTable[0], 1, 1); MWDSP_ScaleData_DZ(&rtb_temp31[0], 128, 1.0/128); /* ComplexToRealImag: '<S3>/Complex to Real-Imag1' */ { int_T i1; const creal_T *u0 = &rtb_temp31[0]; real_T *y0 = &c6713dskfblms_B->Complex_to_Real_Imag1_d[0]; for (i1=0; i1 < 128; i1++) { y0[i1] = u0[i1].re; } } /* DSP Blockset Submatrix S-Function (sdspsubmtrx) - '<S3>/Last Half1' ** Real input, data type: real_T */ { byte_T *u = (byte_T *) &c6713dskfblms_B->Complex_to_Real_Imag1_d[0]; byte_T *y = (byte_T *) &c6713dskfblms_B->Last_Half1[0]; const int_T bytesPerElement = sizeof(real_T);
    memcpy(y, u, (65 * bytesPerElement));
  }

  /* SubSystem: '<S3>/Tap Coefficients Update ' */
  c6713dskfblms_Tap_Coeffic(S);
}

/* SubSystem: '<Root>/Reject' */
/* Outputs for atomic system: '<Root>/Reject' */
{
  /* simstruct variables */
  c6713dskfblms_BlockIO *c6713dskfblms_B = (c6713dskfblms_BlockIO *) _ssGetBlockIO(S);

  /* S-Function (sfix_fix2fix): '<S4>/Gateway Out1'
   *
   * Regarding '<S4>/Gateway Out1':
   * Fixed-Point Conversion Block: '<S4>/Gateway Out1'
   * Input0 Data Type: Fixed Point S16 2^-15
   * Output0 Data Type: Floating Point real_T
   * Round Mode: Floor
   * Saturation Mode: Wrap
   * Output's Real World Value should equal
   * input's Real World Value, if possible.
   */
  {
    int_T i1;
    const int16_T *u0 = &c6713dskfblms_B->Complex_to_Real_Imag_c[0];
    real_T *y0 = &c6713dskfblms_B->Gateway_Out1_c[0];

    for (i1=0; i1 < 64; i1++) {
      y0[i1] = ldexp((double)u0[i1],-15);
    }
  }

  /* SubSystem: '<S4>/Reject' */
  c6713dskfblms_Reject(S);
}

/* ComplexToRealImag: '<S16>/Complex to Real-Imag1' */
{
  int_T i1;
  const cint32_T *u0 = &rtb_temp32[0];
  int32_T *y0 = &c6713dskfblms_B->temp16[0];
  int32_T *y1 = &c6713dskfblms_B->temp17[0];
  for (i1=0; i1 < 128; i1++) {
    y0[i1] = u0[i1].re;
    y1[i1] = u0[i1].im;
  }
}

/* DSP Blockset Pad (sdsppad) - '<S1>/Zero Pad' */
/* Input dimensions: [64 x 1], output dimensions: [128 x 1] */
MWDSP_PadPreAlongCols(
  (const byte_T *)&rtb_sum1[0],
  (byte_T *)&c6713dskfblms_B->temp29[0],
  (byte_T *)(&c6713dskfblms_P->Zero_Pad_PadValue),
  1, 128 * sizeof(int16_T), 64, 2 * sizeof(int16_T)
);

/* Level2 S-Function Block: <S1>/FFT (stic6x_fft16x16r) */
/* Call into Simulink for MEX-version of S-function */
ssCallAccelRunBlock(S, 6, 40, SS_CALL_MDL_OUTPUTS);

/* S-Function (sfix_fix2fix): '<S14>/Conversion1'
 *
 * Regarding '<S14>/Conversion1':
 * Fixed-Point Conversion Block: '<S14>/Conversion1'
 * Input0 Complex Data Type: Fixed Point S16 2^-12
 * Output0 Complex Data Type: Fixed Point S32 2^-24
 * Round Mode: Floor
 * Saturation Mode: Wrap
 * Output's Real World Value should equal
 * input's Real World Value, if possible.
 */
{
  int_T i1;
  const cint16_T *u0 = &c6713dskfblms_B->temp21[0];
  cint32_T *y0 = &rtb_temp35[0];

  for (i1=0; i1 < 128; i1++) {
    y0[i1].re = LSL_S32(12,((int32_T)u0[i1].re));
    y0[i1].im = LSL_S32(12,((int32_T)u0[i1].im));
  }
}

/* Gain: '<S14>/Mu5' */
/* Gain Block: '<S14>/Mu5'
 *
 * y[i] = k * u[i]   i = 0 to 127
 *
 * Input0 Complex Data Type: Fixed Point S32 2^-24
 * Output0 Complex Data Type: Fixed Point S32 2^-24
 * Round Mode: Floor
 * Saturation Mode: Saturate
 *
 * Parameter: Gain
 * Data Type: Fixed Point S32 2^-24
 *
 */
{
  {
    int_T i1;
    const cint32_T *u0 = &rtb_temp35[0];
    cint32_T *y0 = &rtb_temp32[0];

    for (i1=0; i1 < 128; i1++) {
      MUL_S32_S32_S32_SR24_SAT(y0[i1].re,u0[i1].re,c6713dskfblms_P->Mu5_Gain);
      MUL_S32_S32_S32_SR24_SAT(y0[i1].im,u0[i1].im,c6713dskfblms_P->Mu5_Gain);
    }
  }
}

/* ComplexToRealImag: '<S16>/Complex to Real-Imag2' */
{
  int_T i1;
  const cint32_T *u0 = &rtb_temp32[0];
  int32_T *y0 = &c6713dskfblms_B->temp13[0];
  int32_T *y1 = &c6713dskfblms_B->temp12[0];

  for (i1=0; i1 < 128; i1++) {
    y0[i1] = u0[i1].re;
    y1[i1] = u0[i1].im;
  }
}

/* Level2 S-Function Block: <S16>/Vector Multiply1 (stic6x_mul32) */
/* Call into Simulink for MEX-version of S-function */
ssCallAccelRunBlock(S, 6, 44, SS_CALL_MDL_OUTPUTS);

/* Level2 S-Function Block: <S16>/Vector Negate (stic6x_neg32) */
/* Call into Simulink for MEX-version of S-function */
ssCallAccelRunBlock(S, 6, 45, SS_CALL_MDL_OUTPUTS);

/* Level2 S-Function Block: <S16>/Vector Multiply (stic6x_mul32) */
/* Call into Simulink for MEX-version of S-function */
ssCallAccelRunBlock(S, 6, 46, SS_CALL_MDL_OUTPUTS);

/* Sum: '<S16>/sum2'
 *
 * Regarding '<S16>/sum2':
 * Sum Block: '<S16>/sum2'
 *
 * y[i] = u0[i] - u1[i]   i = 0 to 127
 *
 * Input0 Data Type: Fixed Point S32 2^-23
 * Input1 Data Type: Fixed Point S32 2^-23
 * Output0 Data Type: Fixed Point S32 2^-23
 * Round Mode: Floor
 * Saturation Mode: Saturate
 */
{
  int_T i1;
  const int32_T *u0 = &c6713dskfblms_B->temp8[0];
  const int32_T *u1 = &c6713dskfblms_B->temp17[0];
  int32_T *y0 = &rtb_temp37[0];

  for (i1=0; i1 < 128; i1++) {
    y0[i1] = u0[i1];
    ACCUM_NEG_S32_S32_SAT(y0[i1],u1[i1]);
  }
}

/* Level2 S-Function Block: <S16>/Vector Multiply3 (stic6x_mul32) */
/* Call into Simulink for MEX-version of S-function */
ssCallAccelRunBlock(S, 6, 48, SS_CALL_MDL_OUTPUTS);

/* Level2 S-Function Block: <S16>/Vector Multiply2 (stic6x_mul32) */
/* Call into Simulink for MEX-version of S-function */
ssCallAccelRunBlock(S, 6, 49, SS_CALL_MDL_OUTPUTS);

/* Sum: '<S16>/sum3'
 *
 * Regarding '<S16>/sum3':
 * Sum Block: '<S16>/sum3'
 *
 * y[i] = u0[i] + u1[i]   i = 0 to 127
 *
 * Input0 Data Type: Fixed Point S32 2^-23
 * Input1 Data Type: Fixed Point S32 2^-23
 * Output0 Data Type: Fixed Point S32 2^-23
 * Round Mode: Floor
 * Saturation Mode: Saturate
 */
{
  int_T i1;
  const int32_T *u0 = &c6713dskfblms_B->temp17[0];
  const int32_T *u1 = &c6713dskfblms_B->temp16[0];
  int32_T *y0 = &rtb_temp36[0];
  for (i1=0; i1 < 128; i1++) {
    y0[i1] = u0[i1];
    ACCUM_POS_S32_S32_SAT(y0[i1],u1[i1]);
  }
}

/* RealImagToComplex: '<S16>/Real-Imag to Complex' */
{
  int_T i1;
  const int32_T *u0 = &rtb_temp37[0];
  const int32_T *u1 = &rtb_temp36[0];
  cint32_T *y0 = &rtb_temp35[0];

  for (i1=0; i1 < 128; i1++) {
    y0[i1].re = u0[i1];
    y0[i1].im = u1[i1];
  }
}

/* S-Function (sfix_fix2fix): '<S17>/Conversion'
 *
 * Regarding '<S17>/Conversion':
 * Fixed-Point Conversion Block: '<S17>/Conversion'
 * Input0 Complex Data Type: Fixed Point S32 2^-23
 * Output0 Complex Data Type: Fixed Point S16 2^-9
 * Round Mode: Nearest
 * Saturation Mode: Saturate
 * Output's Real World Value should equal
 * input's Real World Value, if possible.
 */
{
  int_T i1;
  const cint32_T *u0 = &rtb_temp35[0];
  cint16_T *y0 = &rtb_temp43[0];

  for (i1=0; i1 < 128; i1++) {
    FIX2FIX_S16_S32_SR14_SAT_NEAR(y0[i1].re,u0[i1].re);
    FIX2FIX_S16_S32_SR14_SAT_NEAR(y0[i1].im,u0[i1].im);
  }
}

/* Gain: '<S19>/Gain3' */
/* Gain Block: '<S19>/Gain3'
 *
 * y[i] = k * u[i]   i = 0 to 127
 *
 * Input0 Complex Data Type: Fixed Point S16 2^-15
 * Output0 Complex Data Type: Fixed Point S16 2^-15
 * Round Mode: Floor
 * Saturation Mode: Saturate
 *
 * Parameter: Gain
 * Data Type: Fixed Point S16 2^-15
 *
 */
{
  {
    int_T i1;
    const cint16_T *u0 = &rtb_temp43[0];
    cint16_T *y0 = &rtb_temp42[0];

    for (i1=0; i1 < 128; i1++) {
      MUL_S16_S16_S16_SR15_SAT(y0[i1].re,u0[i1].re,c6713dskfblms_P->Gain3_c_Gain);
      MUL_S16_S16_S16_SR15_SAT(y0[i1].im,u0[i1].im,c6713dskfblms_P->Gain3_c_Gain);
    }
  }
}

/* ComplexToRealImag: '<S21>/Complex to Real-Imag' */
{
  int_T i1;
  const cint16_T *u0 = &rtb_temp42[0];
  int16_T *y0 = &rtb_temp44[0];
  int16_T *y1 = &rtb_temp45[0];

  for (i1=0; i1 < 128; i1++) {
    y0[i1] = u0[i1].re;
    y1[i1] = u0[i1].im;
  }
}

/* S-Function (sfix_abs): '<S21>/Unary Minus' */
/* Fixed-Point Unary Minus: '<S21>/Unary Minus'
 * Input0 Data Type: Fixed Point S16 2^-15
 * Output0 Data Type: Fixed Point S16 2^-15
 * Round Mode: Floor
 * Saturation Mode: Wrap
 */
{
  int_T i1;
  const int16_T *u0 = &rtb_temp45[0];
  int16_T *y0 = &rtb_temp45[0];

  for (i1=0; i1 < 128; i1++) {
    y0[i1] = -u0[i1];
  }
}

/* RealImagToComplex: '<S21>/Real-Imag to Complex' */
{
  int_T i1;
  const int16_T *u0 = &rtb_temp44[0];
  const int16_T *u1 = &rtb_temp45[0];
  cint16_T *y0 = &c6713dskfblms_B->temp29[0];

  for (i1=0; i1 < 128; i1++) {
    y0[i1].re = u0[i1];
    y0[i1].im = u1[i1];
  }
}

/* Level2 S-Function Block: <S19>/FFT (stic6x_fft16x16r) */
/* Call into Simulink for MEX-version of S-function */
ssCallAccelRunBlock(S, 6, 57, SS_CALL_MDL_OUTPUTS);

/* ComplexToRealImag: '<S20>/Complex to Real-Imag' */
{
  int_T i1;
  const cint16_T *u0 = &c6713dskfblms_B->temp21[0];
  int16_T *y0 = &rtb_temp44[0];
  int16_T *y1 = &rtb_temp45[0];

  for (i1=0; i1 < 128; i1++) {
    y0[i1] = u0[i1].re;
    y1[i1] = u0[i1].im;
  }
}

/* S-Function (sfix_abs): '<S20>/Unary Minus' */
/* Fixed-Point Unary Minus: '<S20>/Unary Minus'
 * Input0 Data Type: Fixed Point S16 2^-12
 * Output0 Data Type: Fixed Point S16 2^-12
 * Round Mode: Floor
 * Saturation Mode: Wrap
 */
{
  int_T i1;
  const int16_T *u0 = &rtb_temp45[0];
  int16_T *y0 = &rtb_temp45[0];

  for (i1=0; i1 < 128; i1++) {
    y0[i1] = -u0[i1];
  }
}

/* RealImagToComplex: '<S20>/Real-Imag to Complex' */
{
  int_T i1;
  const int16_T *u0 = &rtb_temp44[0];
  const int16_T *u1 = &rtb_temp45[0];
  cint16_T *y0 = &rtb_temp43[0];

  for (i1=0; i1 < 128; i1++) {
    y0[i1].re = u0[i1];
    y0[i1].im = u1[i1];
  }
}

/* Gain: '<S19>/Gain1' */
/* Gain Block: '<S19>/Gain1'
 *
 * y[i] = k * u[i]   i = 0 to 127
 *
 * Input0 Complex Data Type: Fixed Point S16 2^-12
 * Output0 Complex Data Type: Fixed Point S16 2^-12
 * Round Mode: Floor
 * Saturation Mode: Saturate
 *
 * Parameter: Gain
 * Data Type: Fixed Point S16 2^-12
 *
 */
{
  {
    int_T i1;
    const cint16_T *u0 = &rtb_temp43[0];
    cint16_T *y0 = &rtb_temp42[0];

    for (i1=0; i1 < 128; i1++) {
      MUL_S16_S16_S16_SR12_SAT(y0[i1].re,u0[i1].re,c6713dskfblms_P->Gain1_c_Gain);
      MUL_S16_S16_S16_SR12_SAT(y0[i1].im,u0[i1].im,c6713dskfblms_P->Gain1_c_Gain);
    }
  }
}

/* DSP Blockset Overwrite (sdspoverwrite) - '<S10>/Overwrite End1' - Output */
{
  cint16_T *y = (cint16_T *) &c6713dskfblms_B->Overwrite_End1[0];
  cint16_T *pValue = 0;
  int_T colIdx;

  memcpy( &c6713dskfblms_B->Overwrite_End1[0], &rtb_temp42[0], (256 * sizeof(int16_T)) );
  pValue = (cint16_T *)&c6713dskfblms_P->Overwrite_End1_OverWritingVal;
  y += 0 * 128;

  /* Loop from starting column index through ending column index */
  for (colIdx = 0; colIdx <= 0; colIdx++) {
    {
      /* MWDSP_CopyScalarICs */
      int_T i = 64;
      int_T tmpIncre = 0;
      while (i-- > 0) {
        memcpy( y + tmpIncre + 64, pValue, 4 );
        tmpIncre ++;
      }
    }
    /* Bump output pointer for next time */
    y += 128;
  }
}

/* Gain: '<S18>/Gain3' */
/* Gain Block: '<S18>/Gain3'
 *
 * y[i] = k * u[i]   i = 0 to 127
 *
 * Input0 Complex Data Type: Fixed Point S16 2^-12
 * Output0 Complex Data Type: Fixed Point S16 2^-12
 * Round Mode: Floor
 * Saturation Mode: Saturate
 *
 * Parameter: Gain
 * Data Type: Fixed Point S16 2^-12
 *
 */
{
  {
    int_T i1;
    const cint16_T *u0 = &c6713dskfblms_B->Overwrite_End1[0];
    cint16_T *y0 = &c6713dskfblms_B->temp29[0];

    for (i1=0; i1 < 128; i1++) {
      MUL_S16_S16_S16_SR12_SAT(y0[i1].re,u0[i1].re,c6713dskfblms_P->Gain3_d_Gain);
      MUL_S16_S16_S16_SR12_SAT(y0[i1].im,u0[i1].im,c6713dskfblms_P->Gain3_d_Gain);
    }
  }
}

/* Level2 S-Function Block: <S18>/FFT (stic6x_fft16x16r) */
/* Call into Simulink for MEX-version of S-function */
ssCallAccelRunBlock(S, 6, 64, SS_CALL_MDL_OUTPUTS);

/* Gain: '<S18>/Gain1' */
/* Gain Block: '<S18>/Gain1'
 *
 * y[i] = k * u[i]   i = 0 to 127
 *
 * Input0 Complex Data Type: Fixed Point S16 2^-9
 * Output0 Complex Data Type: Fixed Point S32 2^-23
 * Round Mode: Floor
 * Saturation Mode: Saturate
 *
 * Parameter: Gain
 * Data Type: Fixed Point S16 2^-14
 *
 */
{
  {
    int_T i1;
    const cint16_T *u0 = &c6713dskfblms_B->temp21[0];
    cint32_T *y0 = &rtb_temp35[0];

    for (i1=0; i1 < 128; i1++) {
      MUL_S32_S16_S16(y0[i1].re,u0[i1].re,c6713dskfblms_P->Gain1_d_Gain);
      MUL_S32_S16_S16(y0[i1].im,u0[i1].im,c6713dskfblms_P->Gain1_d_Gain);
    }
  }
}

/* S-Function (sfix_fix2fix): '<S10>/Conversion'
 *
 * Regarding '<S10>/Conversion':
 * Fixed-Point Conversion Block: '<S10>/Conversion'
 * Input0 Complex Data Type: Fixed Point S32 2^-23
 * Output0 Complex Data Type: Fixed Point S16 2^-14
 * Round Mode: Floor
 * Saturation Mode: Wrap
 * Output's Real World Value should equal
 * input's Real World Value, if possible.
 */
{
  int_T i1;
  const cint32_T *u0 = &rtb_temp35[0];
  cint16_T *y0 = &rtb_temp43[0];

  for (i1=0; i1 < 128; i1++) {
    y0[i1].re = ((int16_T)ASR(9,u0[i1].re));
    y0[i1].im = ((int16_T)ASR(9,u0[i1].im));
  }
}

/* Gain: '<S7>/Mu1' */
/* Gain Block: '<S7>/Mu1'
 *
 * y[i] = k * u[i]   i = 0 to 127
 *
 * Input0 Complex Data Type: Fixed Point S16 2^-14
 * Output0 Complex Data Type: Fixed Point S16 2^-14
 * Round Mode: Floor
 * Saturation Mode: Wrap
 *
 * Parameter: Gain
 * Data Type: Fixed Point S16 2^-15
 *
 */
{
  {
    int_T i1;
    const cint16_T *u0 = &rtb_temp43[0];
    cint16_T *y0 = &rtb_temp42[0];

    for (i1=0; i1 < 128; i1++) {
      MUL_S16_S16_S16_SR15(y0[i1].re,u0[i1].re,c6713dskfblms_P->Mu1_a_Gain);
      MUL_S16_S16_S16_SR15(y0[i1].im,u0[i1].im,c6713dskfblms_P->Mu1_a_Gain);
    }
  }
}

/* ComplexToRealImag: '<S7>/Complex to Real-Imag1' */
{
  int_T i1;
  const cint16_T *u0 = &rtb_temp42[0];
  int16_T *y0 = &c6713dskfblms_B->Complex_to_Real_Imag1_c_o1[0];
  int16_T *y1 = &c6713dskfblms_B->Complex_to_Real_Imag1_c_o2[0];

  for (i1=0; i1 < 128; i1++) {
    y0[i1] = u0[i1].re;
    y1[i1] = u0[i1].im;
  }
}

/* Level2 S-Function Block: <S7>/Vector Sum of Squares2 (stic6x_vecsumsq) */
/* Call into Simulink for MEX-version of S-function */
ssCallAccelRunBlock(S, 6, 69, SS_CALL_MDL_OUTPUTS);

/* Level2 S-Function Block: <S7>/Vector Sum of Squares3 (stic6x_vecsumsq) */
/* Call into Simulink for MEX-version of S-function */
ssCallAccelRunBlock(S, 6, 70, SS_CALL_MDL_OUTPUTS);

/* Sum: '<S7>/Sum2'
 *
 * Regarding '<S7>/Sum2':
 * Sum Block: '<S7>/Sum2'
 *
 * y = u0 + u1
 *
 * Input0 Data Type: Fixed Point S32 2^-28
 * Input1 Data Type: Fixed Point S32 2^-28
 * Output0 Data Type: Fixed Point S32 2^-28
 * Round Mode: Floor
 * Saturation Mode: Saturate
 */
rtb_Sum2 = c6713dskfblms_B->Vector_Sum_of_Squares2;
ACCUM_POS_S32_S32_SAT(rtb_Sum2,c6713dskfblms_B->Vector_Sum_of_Squares3);

/* S-Function (sfix_fix2fix): '<S7>/Conversion1'
 *
 * Regarding '<S7>/Conversion1':
 * Fixed-Point Conversion Block: '<S7>/Conversion1'
 * Input0 Data Type: Fixed Point S32 2^-28
 * Output0 Data Type: Fixed Point U16 2^-13
 * Round Mode: Floor
 * Saturation Mode: Saturate
 * Output's Real World Value should equal
 * input's Real World Value, if possible.
 */
FIX2FIX_U16_S32_SR15_SAT(rtb_Conversion1_b,rtb_Sum2);

/* Lookup Block: '<S7>/Sqrt(x)^-1'
 * Input0 Data Type: Fixed Point U16 2^-13
 * Output0 Data Type: Fixed Point U16 2^-12
 * Round Mode: Floor
 * Saturation Mode: Saturate
 * Lookup Method: Nearest
 *
 * XData parameter uses the same data type and scaling as Input0
 * YData parameter uses the same data type and scaling as Output0
 */
{
  unsigned int iLeft;

  BINARYSEARCH_U16_U16_Near_iL(&(iLeft),rtb_Conversion1_b,&c6713dskfblms_P->Sqrt_x_1_XData[0],63);
  rtb_Sqrt_x_1 = c6713dskfblms_P->Sqrt_x_1_YData[iLeft];
}

/* Product: '<S7>/Product3' */
/* Product Block: '<S7>/Product3'
 *
 * y[i] = u0[i] * u1   i = 0 to 127
 *
 * Input0 Complex Data Type: Fixed Point S16 2^-14
 * Input1 Real Data Type: Fixed Point U16 2^-12
 * Output0 Complex Data Type: Fixed Point S32 2^-26
 * Round Mode: Floor
 * Saturation Mode: Saturate
 */
{
  int_T i1;
  const cint16_T *u0 = &rtb_temp43[0];
  cint32_T *y0 = &rtb_temp35[0];

  for (i1=0; i1 < 128; i1++) {
    MUL_S32_S16_U16(y0[i1].re,u0[i1].re,rtb_Sqrt_x_1);
    MUL_S32_S16_U16(y0[i1].im,u0[i1].im,rtb_Sqrt_x_1);
  }
}

/* Gain: '<S1>/Mu1' */
/* Gain Block: '<S1>/Mu1'
 *
 * y[i] = k * u[i]   i = 0 to 127
 *
 * Input0 Complex Data Type: Fixed Point S32 2^-26
 * Output0 Complex Data Type: Fixed Point S32 2^-31
 * Round Mode: Floor
 * Saturation Mode: Wrap
 *
 * Parameter: Gain
 * Data Type: Fixed Point S32 2^-35
 *
 */
{
  {
    int_T i1;
    const cint32_T *u0 = &rtb_temp35[0];
    cint32_T *y0 = &rtb_temp32[0];

    for (i1=0; i1 < 128; i1++) {
      MUL_S32_S32_S32_SR30(y0[i1].re,u0[i1].re,c6713dskfblms_P->Mu1_b_Gain);
      MUL_S32_S32_S32_SR30(y0[i1].im,u0[i1].im,c6713dskfblms_P->Mu1_b_Gain);
    }
  }
}

/* Sum: '<S1>/sum2'
 *
 * Regarding '<S1>/sum2':
 * Sum Block: '<S1>/sum2'
 *
 * y[i] = u0[i] + u1[i]   i = 0 to 127
 *
 * Input0 Complex Data Type: Fixed Point S32 2^-24
 * Input1 Complex Data Type: Fixed Point S32 2^-31
 * Output0 Complex Data Type: Fixed Point S32 2^-24
 * Round Mode: Floor
 * Saturation Mode: Saturate
 */
{
  int_T i1;
  const cint32_T *u0 = &rtb_Mu2[0];
  const cint32_T *u1 = &rtb_temp32[0];
  cint32_T *y0 = &rtb_temp35[0];

  for (i1=0; i1 < 128; i1++) {
    y0[i1].re = u0[i1].re;
    ACCUM_POS_S32_S32_SAT(y0[i1].re,ASR(7,u1[i1].re));
    y0[i1].im = u0[i1].im;
    ACCUM_POS_S32_S32_SAT(y0[i1].im,ASR(7,u1[i1].im));
  }
}

/* DSP Blockset Overwrite (sdspoverwrite) - '<S13>/Set Taps to Zeros' - Output */
{
  cint32_T *y = (cint32_T *) &c6713dskfblms_B->Set_Taps_to_Zero[0];
  cint32_T *pValue = 0;
  int_T colIdx;

  memcpy( &c6713dskfblms_B->Set_Taps_to_Zero[0], &rtb_temp35[0], (256 * sizeof(int32_T)) );
  pValue = (cint32_T *)&c6713dskfblms_P->Set_Taps_to_Zero_OverWritingVal;
  y += 0 * 128;

  /* Loop from starting column index through ending column index */
  for (colIdx = 0; colIdx <= 0; colIdx++) {
    {
      /* MWDSP_CopyScalarICs */
      int_T i = 128;
      int_T tmpIncre = 0;
      while (i-- > 0) {
        memcpy( y + tmpIncre + 0, pValue, 8 );
        tmpIncre ++;
      }
    }
    /* Bump output pointer for next time */
    y += 128;
  }
}
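The listing relies on fixed-point helper macros (for example MUL_S16_S16_S16_SR15_SAT and ACCUM_POS_S32_S32_SAT) whose definitions are not reproduced in this appendix. The sketch below only illustrates what operations of this kind typically do, judging from the block comments (scaling 2^-15, floor rounding, saturation). The function names q15_to_double, q15_mul_sat and s32_add_sat are hypothetical and are not the generated macros.

```c
#include <math.h>
#include <stdint.h>

/* Hypothetical helper: read a signed 16-bit word with scaling 2^-15 as a
 * real value, in the same way the "Gateway Out" blocks use ldexp(). */
static double q15_to_double(int16_t x)
{
    return ldexp((double)x, -15);
}

/* Hypothetical helper: multiply two 2^-15 scaled words, round toward
 * minus infinity (floor) and saturate the result to 16 bits. */
static int16_t q15_mul_sat(int16_t a, int16_t b)
{
    int32_t p = (int32_t)a * (int32_t)b;  /* exact product, scaling 2^-30 */
    int32_t q = p / 32768;                /* C99 division truncates toward zero */
    if (p % 32768 < 0) {
        q -= 1;                           /* turn truncation into floor */
    }
    if (q > INT16_MAX) return INT16_MAX;  /* saturate instead of wrapping */
    if (q < INT16_MIN) return INT16_MIN;
    return (int16_t)q;
}

/* Hypothetical helper: 32-bit addition that saturates on overflow. */
static int32_t s32_add_sat(int32_t a, int32_t b)
{
    int64_t s = (int64_t)a + (int64_t)b;  /* widen so the sum cannot overflow */
    if (s > INT32_MAX) return INT32_MAX;
    if (s < INT32_MIN) return INT32_MIN;
    return (int32_t)s;
}
```

For instance, q15_mul_sat(16384, 16384) multiplies 0.5 by 0.5 and yields 8192, which is 0.25 at scaling 2^-15, while the product of -1.0 with itself saturates to 32767 rather than wrapping to a negative value.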
B. Bibliography

Visited Websites

DSP Companies
Texas Instruments - http://www.ti.com/
Analog Devices - http://www.analog.com/
Motorola - http://e-www.motorola.com/
All DSP Companies - http://www.eas.asu.edu/~dsp/links/companies.html

Toolkits
MATLAB / Simulink - http://www.mathworks.com
TI Code Composer Studio - http://www.ti.com
National Instruments LabVIEW - http://www.ni.com/

Other websites
DSP Algorithms - http://www.dspalgorithms.com/
DSP Engineering - http://www.dspengineering.com/
DSP Guru - http://www.dspguru.com/
EG3 Portal - http://eg3.com/navi/dsp.htm
Introduction to Digital Filters - http://ccrma.stanford.edu/~jos/filters/

Consulted Books

Mixed-Signal and DSP Design Techniques
Published and edited by Analog Devices, Inc.

The Scientist and Engineer's Guide to Digital Signal Processing
Author: Steven W. Smith
Published and edited by California Technical Publishing

TMS320C6713 Floating-Point DSP Guide
Published and edited by Texas Instruments

Using MATLAB version 6
Published and edited by Mathworks

Using Simulink version 5
Published and edited by Mathworks

3. Analysis of the possible solutions

4. Description of the chosen solution

1.2. Surrounding of the topic

Figure: block diagram of the acoustic echo canceller

Figure: detailed Block diagram of the acoustic echo canceller.

Short description of each block that configures the AEC system:

The equation that represents the convolution is:
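The equation itself did not survive in this copy of the text. As a reminder only, the standard discrete convolution sum is given here in an assumed notation that may differ from the one used in the thesis: x[n] is the input signal, h[n] an impulse response of length M, and y[n] the output signal.

```latex
y[n] = x[n] * h[n] = \sum_{k=0}^{M-1} h[k]\, x[n-k]
```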

2.2. Digital Filters and DSPs

$$ y[n] = a_0 x[n] + a_1 x[n-1] + a_2 x[n-2] + a_3 x[n-3] + \cdots + b_1 y[n-1] + b_2 y[n-2] + b_3 y[n-3] + \cdots $$
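The recursion equation above maps almost directly to code. The following is a minimal sketch in double precision rather than the fixed-point arithmetic of the DSP listing. The function name recursive_filter and its argument layout are assumptions made for illustration: a[0..na-1] holds the a coefficients, b[1..nb] holds the b coefficients (b[0] is unused), and samples before n = 0 are taken as zero.

```c
#include <stddef.h>

/* Recursive (IIR) filter sketch:
 *   y[n] = a[0] x[n] + a[1] x[n-1] + ... + b[1] y[n-1] + b[2] y[n-2] + ...
 * Hypothetical layout: a has na entries, b has nb + 1 entries with b[0] unused.
 * Input and output samples before index 0 are treated as zero. */
static void recursive_filter(const double *a, size_t na,
                             const double *b, size_t nb,
                             const double *x, double *y, size_t len)
{
    for (size_t n = 0; n < len; n++) {
        double acc = 0.0;

        /* Feed-forward part: weighted current and past inputs. */
        for (size_t k = 0; k < na && k <= n; k++) {
            acc += a[k] * x[n - k];
        }

        /* Feedback part: weighted past outputs. */
        for (size_t k = 1; k <= nb && k <= n; k++) {
            acc += b[k] * y[n - k];
        }

        y[n] = acc;
    }
}
```

With a single coefficient on each side, a0 = 0.5 and b1 = 0.5, a unit impulse at the input produces the decaying sequence 0.5, 0.25, 0.125, ..., which is the behaviour of a single-pole low-pass filter.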

2.2.1.7. Non-recursive filters: FIR filters
