Outline/objectives
Identify the most important DSP processor architecture features and how they relate to DSP applications
Understand the types of code appropriate for DSP implementation
What is a DSP?
A specialized microprocessor for real-time DSP applications:
Digital filtering (FIR and IIR)
FFT
Convolution, matrix multiplication, etc.
[Figure: analog input -> ADC -> digital input -> DSP -> digital output -> DAC -> analog output]
Harvard Architecture
Physically separate memories and paths for instruction and data
[Figure: CPU connected by separate paths to data memory and program memory]
$\sum_{i=0}^{n-1} a_i x_i$
Can compute a sum of n products in n cycles
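The sum of products above is the multiply-accumulate (MAC) pattern. A minimal sketch in C, assuming 16-bit operands and a 32-bit accumulator (the name `mac_sum` and the operand widths are illustrative choices, not taken from the slides): each loop iteration performs one multiply and one add, which a single-cycle hardware MAC unit could complete in one cycle, giving n cycles for n products.

```c
#include <stdint.h>

/* Sum of n products a[i] * x[i].
   Each iteration is one multiply-accumulate (MAC) step.
   Assumption: 16-bit inputs, 32-bit accumulator; a very long input
   could still overflow the accumulator. */
int32_t mac_sum(const int16_t *a, const int16_t *x, int n)
{
    int32_t acc = 0;
    for (int i = 0; i < n; i++) {
        acc += (int32_t)a[i] * (int32_t)x[i];  /* one MAC per iteration */
    }
    return acc;
}
```

For example, calling `mac_sum` on a = {1, 2, 3, 4} and x = {5, 6, 7, 8} with n = 4 accumulates 5 + 12 + 21 + 32.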
[Figure: independent operations F=a+b, c=e/g, d=x&y and w=z*h, each assigned to a separate processing unit (PU)]
Pipelining
DSPs commonly feature deep pipelines
TMS320C6x processors have 3 pipeline stages, each with a number of phases (cycles):
Fetch: Program Address Generate (PG), Program Address Send (PS), Program Ready Wait (PW), Program Receive (PR)
Decode: Dispatch (DP), Decode (DC)
Execute: 6 to 10 phases
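The phase counts above can be turned into a rough cycle estimate. The sketch below is an idealized model, not a description of the actual TMS320C6x timing: it assumes one instruction (or execute packet) enters the pipeline per cycle, that there are no stalls, and that every instruction uses the same number of execute phases. The function name `pipeline_cycles` is hypothetical.

```c
/* Idealized cycle count for n instructions issued back-to-back.
   Assumptions: no stalls, one issue per cycle, and the same number of
   execute phases for every instruction.
   The first result appears after the full pipeline depth; each later
   instruction completes one cycle after the previous one. */
unsigned pipeline_cycles(unsigned fetch_phases, unsigned decode_phases,
                         unsigned execute_phases, unsigned n)
{
    if (n == 0) {
        return 0;
    }
    unsigned depth = fetch_phases + decode_phases + execute_phases;
    return depth + (n - 1);
}
```

With 4 fetch phases, 2 decode phases and 6 execute phases, the depth is 12, so under this model a single instruction takes 12 cycles and three consecutive instructions take 14.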
ACOE343 - Embedded Real-Time Processor Systems Frederick University 11
Saturation Arithmetic
Results are confined to a fixed range for operations such as addition and multiplication
Instead of wrapping around, overflow and underflow produce the maximum and minimum allowed value, respectively
Associativity and distributivity no longer apply
One-byte signed saturation arithmetic examples:
64 + 69 = 127
-127 - 5 = -128
(64 + 70) - 25 = 102
64 + (70 - 25) = 109
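One-byte signed saturation can be sketched in C by computing in a wider type and clamping the result to [-128, 127]. The helper names (`sat8`, `sat_add8`, `sat_sub8`, `sat_mul8`) are illustrative; a DSP would normally do this in hardware with a single instruction.

```c
#include <stdint.h>

/* Clamp a wide intermediate result to the signed 8-bit range. */
int8_t sat8(int32_t v)
{
    if (v > INT8_MAX) return INT8_MAX;  /* overflow  -> 127  */
    if (v < INT8_MIN) return INT8_MIN;  /* underflow -> -128 */
    return (int8_t)v;
}

/* Saturating add, subtract and multiply on signed bytes.
   The arithmetic is done in 32 bits so the true result is never lost
   before clamping. */
int8_t sat_add8(int8_t a, int8_t b) { return sat8((int32_t)a + (int32_t)b); }
int8_t sat_sub8(int8_t a, int8_t b) { return sat8((int32_t)a - (int32_t)b); }
int8_t sat_mul8(int8_t a, int8_t b) { return sat8((int32_t)a * (int32_t)b); }
```

The loss of associativity shows up directly: `sat_sub8(sat_add8(64, 70), 25)` clamps the intermediate sum to 127 and yields 102, while `sat_add8(64, sat_sub8(70, 25))` never saturates and yields 109.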
Examples
Perform the following operations using one-byte saturation arithmetic
0x77 + 0x99 =
0x4 * 0x42 =
0x3 * 0x51 =
Cache memory
Separate instruction and data L1 caches (Harvard architecture)
Cache coherence protocols required, since most systems use DMA
Microcontroller
Mostly von Neumann architecture
Single execution unit
Flexible bit-level operations
No hardware MACs
Control applications
Examples
Estimate how long the following code fragment will take to execute on:
A general-purpose processor with a 1 GHz operating frequency, five-stage pipelining, 5 cycles required for multiplication and 1 cycle for addition
A DSP running at 500 MHz, with zero-overhead looping, 6 independent ALUs and 2 independent single-cycle MAC units
for (i=0; i<8; i++) {
    a[i] = 2*i + 3;
    b[i] = 3*i + 5;
}
Review Questions
Which of the following code fragments is appropriate for SIMD implementation?
a[0]=b[0]+c[0]; a[2]=b[2]+c[2]; a[4]=b[4]+c[4]; a[6]=b[6]+c[6];
a[0]=b[0]&c[0]; a[0]=b[0]%c[0]; a[0]=b[0]+c[0]; a[0]=b[0]/c[0];
Can the following instructions be merged into one VLIW instruction? If not, how many VLIW instructions are needed?
a=b+c; d=c/e; f=d&a; g=b%c;
Review Questions
Which of the following is not a typical DSP feature?
Dedicated multiplier/MAC
Von Neumann memory architecture
Pipelining
Saturation arithmetic
Examples
How many VLIW instructions does the following program fragment require if there are two independent data paths (a, b), with 3 ALUs and 1 MAC available in each, and 8 instructions per word? How many cycles will it take to execute if these are the first instructions in the program and all instructions require 1 cycle, assuming the TMS320C6x pipeline from the Pipelining slide (4 fetch phases, 2 decode phases) with 6 execute phases?
ADD a1,a2,a3    ;a3 = a1+a2
SUB b1,b3,b4    ;b4 = b1-b3
MUL a2,a3,a5    ;a5 = a2*a3
MUL b3,b4,b2    ;b2 = b3*b4
AND a7,a0,a1    ;a1 = a7 AND a0
MUL a3,a4,a5    ;a5 = a3*a4
OR a6,a3,a2     ;a2 = a6 OR a3