
ARIZONA STATE UNIVERSITY

IRA FULTON SCHOOL OF ENGINEERING


ELECTRICAL ENGINEERING DEPARTMENT
EEE404/591 – Real Time DSP – Lab 2
Introduction to the DSP56800E Assembly

Objective

The objective of this lab is to introduce students to the DSP56800E assembly instruction
set. Upon completion, the student will be familiar with:
• Assembler statement syntax and directives.
• Different addressing modes.
• Arithmetic operations.
• Branch and looping instructions

This lab includes prerequisite tasks that the student must complete on the
Metrowerks CodeWarrior simulator before arriving at the lab. These tasks are marked as
“Prereq” in the manual. Please refer to the list of references provided at the end of this lab
for a more detailed overview.

I. Assembly Syntax

Each line of an assembly language program is split into up to four fields where spaces or
tabs delimit the fields as shown below:
Label Mnemonic Operand1, …, Operandk Comment

Note: At least one space should be placed between the fields.


Example 1 shows a typical instruction and how it is partitioned into different fields.
A detailed explanation of the assembly syntax is given next.

Example 1:
Label   Mnemonic   Operand1, …, Operandk   Comment
Start   MPY        X0, Y0, A               ; Multiply the two registers X0 & Y0
                                           ; and store result in accumulator A

a) Labels

A label (which is optional) is translated into the physical address of the memory location
where the instruction is stored. Labels are used extensively in programs to reduce reliance
upon programmers remembering where data or code is located. For example, a label can
be used to refer to a memory location, the address of a sub-routine or a code portion.

Note: Adding a colon to the end of a label makes the label local to the subroutine
to which it belongs. Note also that the label field is case sensitive.

b) Mnemonic & Operands

A mnemonic is a symbolic representation of the actual binary or hex value of the opcode.
Mnemonics are used because they:
• Are more meaningful than hex or binary values.
• Reduce the chances of making an error.
• Are easier to remember than bit values.

The mnemonic and operands of each statement are translated by the assembler program
into executable machine code (Figure 1). Taken together, a mnemonic and its operands
are called an instruction.
The contents and number of operand fields depend on the mnemonic used. An
operand may contain, for example, a symbol pointing to a register or memory location, or
an expression such as a built-in function (e.g., LOG(100)). Note that operands are
separated by commas.

Start   MPY   X0, Y0, A

Program Memory (instruction template):   0x00FF   011010 FFFJJJJJ
Program Memory (encoded instruction):    0x00FF   011010 00010111

Encoding:
   A        FFF   = 000
   X0,Y0    JJJJJ = 10111

Figure 1. Instruction Coding by Assembler; FFF represents 3 bits that encode the
destination register (See Table A-7, p. A-315 in [1]) and JJJJJ are 5 bits encoding the 2
source registers (See Table A-8, p. A-317 in [1])

c) Comments

Comments begin with a semicolon (;). All characters after the semicolon are ignored.
Commenting each assembly language instruction used is good programming
practice.

d) Directives

There is another type of statement that is not intended to be executed on the DSP
chip and is therefore not translated into executable machine code: the assembly
language directive (Example 2). Directives are used for:
• Reserving memory for data variables, arrays and structures (DS => Define
Space).
• Determining the entry address of the program (ORG).
• Initializing variable values (DC => Define Constant).

Note: The assembler does not generate any machine code instructions for assembly-
language directives or comments.

Example 2:

Directive        Description
PI  EQU  3.14    Instructs the assembler to replace the symbol PI by 3.14
                 whenever it is encountered.
X   DS   4       Reserves 4 memory locations for array X.
ORG P:           Determines the entry address of the program to be P. Also used
                 to indicate that the following instructions are to be saved in
                 program memory.
X   DC   1, 6    Assigns an array with elements 1 & 6 to X.
ORG X:           Indicates that the variables declared below this directive will
                 be allocated space in data memory. Can occur anywhere in the
                 code. However, remember to include an “ORG P:” after all the
                 variables have been declared, to switch back to program
                 memory, so that subsequent instructions are correctly saved in
                 program memory.

Note: In the case of directives, the label field holds the name of a data variable or
constant.

II. Data Types & Addressing Modes

The DSP56800E architecture supports byte (8-bit), word (16-bit) and long-word (32-bit)
integer data types. It also supports word, long-word and accumulator fractional data types
(the fractional data type is used to avoid overflow and is explained later in this lab).
Addressing modes specify where the operands for a specific instruction are located (in an
immediate value, in a register or in memory).

You will learn about different data types and addressing modes through examples.
Follow the steps below for creating a project:
1- Start Code Warrior R8.2.
2- Create a new project named “Intro_ASM” (follow the steps described in lab 2 for
creating an assembly project).
3- Open main.asm.

Table 1 shows the content of “main.asm” with a brief description of each line. For more
details on directives and the instruction set, refer to the references listed at the end of the
lab, or double-click on a directive or mnemonic to highlight it and press F1 to open
the help window describing the operation of that instruction. Before starting
to code, variables and arrays need to be declared. Variables are usually stored in data
memory (set by the linker command file). You can switch between program and data
memory by using the “ORG” directive. Program memory is designated by “P:” while
data memory is designated by “X:” (e.g., ORG X:). Note that instructions cannot be
declared in data memory.

Table 1. “Main.asm” Description

Code                                       Description
section rtlib                              The ‘section’ directive indicates the beginning of
                                           the section rtlib. Note that the linker file DSP56858
                                           specifies exactly the mapping of each section to the
                                           specific locations in memory.
org p:                                     The ‘org’ directive specifies that any instruction
                                           thereafter will be loaded into program memory.
global Fmain                               The ‘global’ directive indicates that the function
                                           ‘Fmain’ could be called from another file.
SUBROUTINE "Fmain",Fmain,FmainEND-Fmain    The ‘SUBROUTINE’ directive causes the
                                           assembler to generate debugging information for the
                                           subroutine. In this case, the subroutine is ‘Fmain’.
; assembly level entry point
Fmain:                                     Label defining the start of the function ‘Fmain’.
nop ; do nothing                           Do-nothing instruction.
rts                                        Return to the calling function, in our case
                                           Finit_dsp5685x_.
FmainEND:                                  Label defining the end of the function ‘Fmain’.
endsec                                     The ‘endsec’ directive indicates the end of the
                                           section.
end                                        The ‘end’ directive indicates the end of the program
                                           code.

Two types of variables exist:

• Initialized variables defined using ‘DC’ (Define Constant) or ‘DCB’ (Define
Constant Byte). The ‘DC’ or ‘DCB’ directives may have one or more arguments
separated by commas. Multiple arguments are stored in successive address
locations. Other directives may be used as well [2]. Examples of directives
follow:

X DC 6     ; linker allocates a memory word (16-bit) location and
           ; initializes it to 6, X contains a pointer or address to the
           ; created variable.

X DCB 6    ; linker allocates a memory byte (8-bit) location and
           ; initializes it to 6, X contains a pointer or address to the
           ; created variable.

• Uninitialized variables defined using ‘DS’ (Define Space) or ‘DSB’ (Define
Space Bytes).

X DS 5     ; linker allocates five memory word (16-bit) locations. X
           ; contains a pointer or address to the created array.

X DSB 5    ; linker allocates five memory byte (8-bit) locations. X
           ; contains a pointer or address to the created array.

Now you are ready to start writing your first code. All examples should be executed
within Fmain routine. Note that a list of the DSP56800E assembly instruction set is
provided in the Appendix.

a) Loading Accumulator

The DSP56800E architecture contains four 36-bit accumulators (A, B, C and D). Each
accumulator is divided into three parts:
• 4-bit extension register used to accommodate intermediate overflow: A2, B2,
C2, or D2.
• 16-bit most significant portion (MSP): A1, B1, C1, or D1.
• 16-bit least significant portion (LSP): A0, B0, C0, or D0.

Let us start with a simple operation: assigning a value to the accumulator. Type the
following:

Example 3: (Prereq: Needs to be completed on the DSP56800E simulator)

MOVE.W #10, A       ;load accumulator A with 16-bit
                    ;decimal 10
MOVE.W #%1010, B    ;load accumulator B with 16-bit
                    ;binary 0000000000001010
MOVE.L #$A, C       ;load accumulator C with 32-bit
                    ;hexadecimal $0000000A

Note: In Example 3, if the specified constant is shorter than 16 bits, it is assumed to be
positive and zeros are added on the left.

The first operand, referred to as the source, uses the symbol # to indicate that the value is
a constant. The second operand is referred to as the destination.
Also note that each time you modify the code you need to build the project (F7) and
debug it (F5). If you are already in debug mode, go to Debug > Kill in the CW
menu in order to be able to modify the code. As you know, the “Finit_dsp5685x_”
function will call the Fmain function and then the Fexit_halt function (check
dsp5685x_asm_int.asm). The Fexit_halt function will flush or reset the accumulators and
internal registers. To view the content of the registers before it is destroyed:
• Use breakpoints (e.g., insert a breakpoint at the “return from subroutine (rts)” instruction
in the Fmain function). [Recommended]
• Disable the call to subroutine Fexit_halt by going to “dsp5685x_asm_int.asm” and
commenting out the jump-to-subroutine instruction “jsr Fexit_halt” (insert a
semicolon). [Not Recommended]

From now on, each time you are asked to verify results it is assumed that the project is
built and debugged.

In the CW menu, select View > Registers and check the values of A, B and C.
Knowing that:
• ‘.W’ suffix – indicates word memory access.
• ‘.L’ suffix – indicates long memory access.

In what portion of the accumulators (i.e.: A0, A1 or A2) is the number loaded
for each instruction? Which instruction should be used to load accumulator A
with decimal value 100000? Why?

When using ‘MOVE.W’ instruction, the 16 bit word (i.e.: $000A) is moved to the
accumulator with sign extension; if the number loaded to the accumulator is positive, A2
is filled with zeros. If the number is negative, A2 is filled with ones (sign bit is 1). Run
Example 4 and check the results. ‘MOVEU’ is used when dealing with unsigned
numbers.
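
The sign extension described above can be modelled off-target. The following Python sketch is only an illustration, not DSP code: the function name is hypothetical, and it assumes that a word move into a full accumulator also clears the least significant portion (A0).

```python
def load_word_into_accumulator(word16):
    """Model MOVE.W #imm, A and return the tuple (A2, A1, A0)."""
    a1 = word16 & 0xFFFF                 # the 16-bit word lands in the MSP (A1)
    a2 = 0xF if (a1 & 0x8000) else 0x0   # 4-bit extension replicates the sign bit
    a0 = 0x0000                          # assumption: the LSP (A0) is cleared
    return a2, a1, a0


def fmt(acc):
    a2, a1, a0 = acc
    return f"A2=${a2:X} A1=${a1:04X} A0=${a0:04X}"


print(fmt(load_word_into_accumulator(10)))    # A2=$0 A1=$000A A0=$0000
print(fmt(load_word_into_accumulator(-10)))   # A2=$F A1=$FFF6 A0=$0000
```

Compare the printed values with what the Registers window shows after running Example 4.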

What is the difference between the output of MOVE.W A1, R1 and MOVEU.W A1,
R1?

Example 4: (Prereq: Needs to be completed on the DSP56800E simulator)

MOVE.W #-10, A     ;load accumulator A with decimal –10
                   ;or $FFF6 in two’s complement notation
                   ;with sign extension
MOVEU.W A1, R1     ;load register R1, located in Address
                   ;generation unit, with hexadecimal
                   ;$FFF6 with zero extension

b) Addressing Modes

In this section, you will become familiar with the different addressing modes available.
Addressing modes specify where operands are located: either in an immediate value
(immediate addressing), in a register or in memory.
Immediate addressing is used when the value to be loaded in the accumulator or any
other internal register is provided in the operand (the value following immediately after
the symbol # is the constant to be loaded).
If the operand is located in memory, the address of the memory location can be found
within the instruction (absolute addressing) or in the address generation unit AGU
register (indirect addressing).

Try the following:


Example 5: (Prereq: Needs to be completed on the DSP56800E simulator)

MOVE.W #$FF0A, X:$0000   ;load data memory $0000 with $FF0A
MOVE.W #$FF0B, X:$0001   ;load data memory $0001 with $FF0B
MOVE.L #$0000, R0        ;store $0000 into R0
MOVE.W X:(R0), X0        ;read data memory location pointed
                         ;by R0 into internal register X0

R0 through R5 are 24-bit registers located in the address generation unit (AGU) and are
usually used as pointers to memory locations.
The different addressing modes used in Example 5 are summarized as follows:
• The first two instructions use two different addressing modes; the first operand
uses an immediate value, while the second operand uses absolute addressing.
• The third instruction uses register direct mode, where an internal register is
directly referenced as the second operand (R0).
• The fourth instruction uses indirect addressing for the first operand.

What will be the content of register X0?

To read memory location $0001 and store the result in the Y0 register, you can, for
example, add the following instructions:

Example 6: (Prereq: Needs to be completed on the DSP56800E simulator)

ADDA #1, R0          ;Add 1 to R0 and store the result
                     ;back in R0
MOVE.W X:(R0), Y0    ;read data memory $0001 and store
                     ;result in Y0

To reduce the number of instruction cycles you can take advantage of the architecture of
the DSP56800E family, which allows an indirect memory access and a register increment
in one clock cycle, as shown in the following example:

Example 7: (Prereq: Needs to be completed on the DSP56800E simulator)

MOVE.W #$FF0A, X:$0000   ;load data memory $0000 with $FF0A
MOVE.W #$FF0B, X:$0001   ;load data memory $0001 with $FF0B
MOVE.L #$0000, R0        ;store $0000 into R0
MOVE.W X:(R0)+, X0       ;read data memory $0000 to internal
                         ;register X0 and increment R0 by one
MOVE.W X:(R0), Y0        ;read data memory $0001 and store
                         ;result in Y0
MOVE.W X:(R0+2), Y0      ;read data memory $0001 + 2 =
                         ;$0003 and store result in Y0
MOVE.W #10, N            ;Store 10 into register N
MOVE.W X:(R0+N), A       ;read data memory $0001 + N =
                         ;$000B and store result in A
MOVE.W X:(R0)+N, B       ;read data memory $0001 and store
                         ;result in B, increment R0 by N
MOVE.W X:(R0-1), C       ;read data memory $000B - 1 =
                         ;$000A and store result in C

Example 7 provides several ways for accessing memory and modifying the pointer using
only one instruction. In the instructions above, whenever a + appears outside the
parentheses, the register inside the parentheses is incremented (decremented if -) after
performing the operation. For example:

MOVE.W X:(R0)+N, B

Here, the value in R0 is the memory address and the value at that address is moved into
B1. After the move has happened, R0 is incremented by N. This instruction is equivalent
to the following two instructions in sequence

MOVE.W X:(R0),B
ADDA N,R0

Whenever a + appears inside the parentheses, the offset that follows the + is
added to the value in the register (or subtracted if -) to generate the address to be accessed
before performing the operation. This generated address is used in the operation.
However, the value in the register is not changed; it retains the value that it contained
before the offset was applied.

It is recommended to use indirect addressing whenever successive memory locations are
read or written.
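
The difference between an offset inside the parentheses and an update outside them can be traced with a small model. The Python sketch below is illustrative only: the helper names and the memory contents are hypothetical, and it reproduces the last three accesses of Example 7 starting from R0 = $0001 and N = 10.

```python
def read_indexed(mem, r0, offset):
    """X:(R0+offset): access address R0+offset; R0 itself is unchanged."""
    return mem[r0 + offset], r0


def read_post_update(mem, r0, step):
    """X:(R0)+step: access address R0, then advance R0 by step."""
    return mem[r0], r0 + step


# Hypothetical data memory: each location holds $0100 plus its own address.
mem = {addr: 0x0100 + addr for addr in range(16)}
r0, n = 0x0001, 10

a, r0 = read_indexed(mem, r0, n)        # reads $000B, R0 stays $0001
b, r0 = read_post_update(mem, r0, n)    # reads $0001, R0 becomes $000B
c, r0 = read_indexed(mem, r0, -1)       # reads $000A, R0 stays $000B

print(hex(a), hex(b), hex(c), hex(r0))  # 0x10b 0x101 0x10a 0xb
```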

c) Word and Byte Pointers

Byte pointers are used to access byte values in memory, while word pointers are used to
access byte, word or long-word data types in memory.
An address in an AGU (Address Generation Unit) register is considered a byte pointer
when used in conjunction with instructions expecting byte pointers (e.g., MOVE.BP).

Instructions using:
• the ‘.BP’ suffix use a byte pointer to access a byte in memory.
• the ‘.B’, ‘.W’ or ‘.L’ suffix use a word pointer to access a byte, word or long-word
memory location.

Figure 2. Accessing a Byte Using Word Pointer [1]

Figure 2 illustrates the execution of the MOVE.B A1, X:(R0+3) instruction. This uses a
word pointer stored in R0 in conjunction with a two-bit offset (3 in this example). The
upper bit of the two-bit offset determines whether the word pointed to by the register is
used or the word after it (i.e., Register Value+1). The
lower bit of the two-bit offset determines whether the upper or lower byte in the chosen
word is used. Table 2 summarizes this.

Table 2. Meaning of the 2-bit offset value

Upper bit         Lower bit         Byte indexed
(selects word)    (selects byte)
0                 0                 Lower byte of word pointed to by register
0                 1                 Upper byte of word pointed to by register
1                 0                 Lower byte of word next to the one pointed to by
                                    register (i.e., lower byte of (Register Value+1))
1                 1                 Upper byte of word next to the one pointed to by
                                    register (i.e., upper byte of (Register Value+1))

In the example above, R0 contains $1000. The immediate offset is arithmetically right
shifted by 1 bit to extract the upper bit: 3>>1 = 1. This extracted bit is added to the
address contained in R0 to give the correct word address: 1 + $1000 = $1001. The least
significant bit (LSB) of the immediate offset selects which byte at the word address is
accessed. In this example, the LSB of the immediate offset (3) is set, so the upper byte of
the memory word is accessed. The lowest 8 bits of the A1 register, $CD, are then written
to this location. The lower byte of the memory location $1001 is not modified.
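
The address calculation above can be checked with a short model. This Python sketch is an illustration only: the function name, the initial memory contents and the upper byte of A1 are hypothetical, and only the $CD low byte, R0 = $1000 and the offset 3 come from the text.

```python
def store_byte_word_pointer(mem, r0, offset, value):
    """Model MOVE.B value, X:(R0+offset) with R0 holding a word pointer."""
    word_addr = r0 + (offset >> 1)       # upper offset bit selects the word
    upper = offset & 1                   # lower offset bit selects the byte
    old = mem.get(word_addr, 0)
    byte = value & 0xFF                  # only the lowest 8 bits are written
    if upper:
        mem[word_addr] = (old & 0x00FF) | (byte << 8)
    else:
        mem[word_addr] = (old & 0xFF00) | byte
    return word_addr, upper


mem = {0x1000: 0x1111, 0x1001: 0x2222}   # hypothetical initial contents
a1 = 0xABCD                              # hypothetical A1; low byte is $CD

print(store_byte_word_pointer(mem, 0x1000, 3, a1))   # (4097, 1)
print(hex(mem[0x1001]))                              # 0xcd22
```

The lower byte ($22) of the word at $1001 is preserved, and the word at $1000 is not touched.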

If a byte can be accessed using a word pointer, what is the purpose of having an extra
instruction with byte pointer MOVE.BP?

The need is illustrated by Example 8.

Example 8:

ORG X:
X1 DCB 2,3,4,5
ORG P:
MOVE.L #X1, R1 ;move word pointer to R1
MOVE.W #0, A ;clear A
MOVE.W #0, B ;clear B
DO #4,Loop ; start looping
MOVE.B X:(R1+0),B ;read byte and store in B
ADD B, A ; Add A + B -> A
ADDA #1, R1 ; increment R1
Loop:

Debug the code above and check the value of B after each iteration. What is
wrong in that code? What is the final result of A? What should it be? How are the
bytes stored in memory?

The problem encountered in Example 8 is solved by using byte pointers. Word pointers
are used whenever mixed data is accessed such as data structures containing byte, word
and long word data. Byte pointers are used to exclusively access bytes.

Note that a word contains two bytes. Hence there are twice as many bytes as there are
words and consequently twice as many addresses. Figure 3 shows the relationship
between word and byte addresses. From the figure we can see that the lower byte of the
word is saved at (Word address *2) and the upper byte is saved at (Word address * 2)+1

Word addresses                   Byte addresses

Address    MSB    LSB            Address    Byte
$0001      CC     DD             $0003      CC
$0000      AA     BB             $0002      DD
                                 $0001      AA
                                 $0000      BB

Figure 3. Word and Byte addresses

Figure 4 shows a byte access using a byte pointer. The example executes the MOVE.BP
A1, X:(R0) instruction.

The address contained in R0, $2001, is right shifted to give the correct word address,
$1000. The LSB of the R0 register selects which byte at the word address is accessed. In
this example, the LSB is set and hence the upper byte is to be accessed at location $1000.
The lowest 8 bits of the A1 register, $CD, are then written to this location. The lower
byte of memory location $1000 is not modified.

Figure 4. Accessing Byte with a Byte Pointer [1]

Rewrite Example 8 using MOVE.BP; you may use the built-in assembler functions
described in Table 3. These functions are useful for converting a word address into a
byte address (byte pointer).

Table 3. Useful built-in Assembler Functions


Assembler Function Computation Performed

@hb(value) (value<<1) + 1
@lb(value) (value<<1) + 0

The ‘value’ mentioned here is a word address, and 2 bytes are present at that location.
‘@hb(value)’ and ‘@lb(value)’ are used to extract the addresses of the higher and lower
bytes at this location, respectively.
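
The relationship between word addresses and byte addresses can be verified numerically. The Python helpers below are hypothetical stand-ins that mirror the computations listed in Table 3 and the byte-pointer decoding described for Figure 4; they are not the assembler functions themselves.

```python
def hb(word_addr):
    """Byte address of the upper byte, as computed by @hb in Table 3."""
    return (word_addr << 1) + 1


def lb(word_addr):
    """Byte address of the lower byte, as computed by @lb in Table 3."""
    return (word_addr << 1) + 0


def decode_byte_pointer(byte_addr):
    """Split a byte pointer into (word address, byte select)."""
    return byte_addr >> 1, byte_addr & 1   # select: 1 = upper, 0 = lower


print(hex(hb(0x1000)), hex(lb(0x1000)))    # 0x2001 0x2000
print(decode_byte_pointer(0x2001))         # (4096, 1)
```

Decoding the result of hb or lb always returns the original word address, which is why these functions are convenient for initializing a byte pointer from a label.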

III. Arithmetic Operations

We will start this section with a simple example: multiplying two integers.

Example 9: (Prereq: Needs to be completed on the DSP56800E simulator)
MOVE.W #5000, A ;store 5000 in Accumulator A
MOVE.W #-3, Y0 ;store -3 in Y0
IMPY.W A1,Y0,B ;16 x 16 multiply and store 16-bit
;result in Accumulator B

Note that the multiplication of two 16-bit signed integer operands using the IMPY.W
instruction gives a 16-bit signed integer placed in the upper 16 bits of the accumulator
(i.e., B1). The corresponding extension register is filled with the sign bit and the lower
16-bit portion remains unchanged (Figure 5).

N = 16 bits

Figure 5. Multiplication Using IMPY.W [1]

In N-bit 2’s complement representation, a signed N-bit number lies between
-2^(N-1) and 2^(N-1)-1. A multiplication of two N-bit signed numbers gives a signed result
that fits in 2N-1 bits, except when both operands equal -2^(N-1). To illustrate the idea,
assume N = 4, which gives a range between -8 and 7. The min/max number obtained from
multiplication is -56/+64; apart from the single case (-8) x (-8) = +64, 7 bits (range -64
to 63) are enough to represent the result. When such a product is held in a 2N-bit field,
the MSB, called the “sign extension bit”, merely replicates the sign bit (the 7th bit) and
can be ignored. Another way to interpret the result is to count bits: the two magnitudes
contribute (N-1) + (N-1) bits and the sign contributes 1 bit, giving 2N - 1 bits.

In the CW menu, select View > Registers and check A2, A1, A0, Y0, B2, B1 and
B0. Where is the multiplication result located? By default, results are shown in
hexadecimal; right click on the hexadecimal number and select “View as Signed
Decimal” to change to decimal.

Now, instead of loading the decimal –3 in the register Y0, load the decimal number 10
(MOVE.W #10, Y0). Run the program and check the result of the multiplication, keeping
in mind that multiplying two positive numbers should yield a positive number. When you
check the result in B, you will notice that it is negative. The range of a 16-bit
signed integer is –32,768 to 32,767. The product exceeds this
range (5000 x 10 = 50,000) and hence overflow has occurred. When overflow occurs, a
result above the maximum allowed positive number wraps around and continues counting
up from the minimum allowed number. In the example, 50,000 is greater than 32767 by
17233. The value you should see is -32768 + (17233 - 1) = -15536. As another example,
if the resultant number is 32769, it is greater than 32767 by two. It wraps around and is
represented as the second number counting up from the minimum value, i.e., -32767. If
you check the 16-bit binary representations of 32769 (unsigned) and -32767 (two’s
complement), you will see that they are equal.
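
The wrap-around arithmetic can be reproduced in a few lines. The Python sketch below is only a model of keeping the low 16 bits of a result and reinterpreting them as a signed number; the function name is hypothetical.

```python
def wrap_int16(value):
    """Keep the low 16 bits and reinterpret them as a two's complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


print(wrap_int16(5000 * 10))   # -15536
print(wrap_int16(32769))       # -32767
print(wrap_int16(5000 * -3))   # -15000 (fits in 16 bits, so no wrap-around)
```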

One solution to avoid overflow is to use the long word 32-bit multiplication instruction
instead of the word 16-bit multiplication instruction as depicted in Example 10.

Example 10: (Prereq: Needs to be completed on the DSP56800E simulator)

MOVE.W #5000, A     ;store 5000 in Accumulator A
MOVE.W #25, Y0      ;store 25 in Y0
IMPY.L A1,Y0,B      ;16 x 16 multiply and store 31 bit
                    ;result in Accumulator B

Multiplication of two N-bit signed integers will result in a (2N-1)-bit result. In our case,
the resultant 31-bit value will be stored in the accumulator, where the LSP (Least Significant
Portion) is stored in FF0 (e.g., A0) and the MSP (Most Significant Portion) is stored in
FF1 (e.g., A1).

Check the accumulator B for the result. Is overflow occurring? Usually, the result
needs to be stored back in memory and, in this case, the result will occupy two
memory locations. This is not recommended for large algorithms. Can you think of a
mechanism to prevent overflow using only 16-bit results?

In a 16-bit processor, the dynamic range is small and overflows occur easily. For
example, 300 x 300 = 90,000, which overflows a 16-bit integer.

To remedy this problem, scaling can be applied where any 16-bit stored value is scaled to
be a fraction between –1 and 1. To scale the integer to a fraction, divide the integer by a
scaling factor (2^15 for Q15). Table 4 shows how integers are mapped to fractions.

Table 4. Fractional and Integer Arithmetic

Number               Biggest                   Smallest
Integer number       32767                     -32768
Fractional number    1 - 2^-15 (≈ 0.99997)     -1

To get the integer number equivalent of a fractional number, multiply by 2^15. For
example, 0.5 will be stored in memory as 16384 (= 0.5 * 2^15).
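
The mapping between fractions and their stored integers can be sketched as follows. This Python model is an illustration under stated assumptions: the helper names are hypothetical, and clamping out-of-range inputs to the limits in Table 4 is a design choice of this sketch, not something the lab prescribes.

```python
Q15_SCALE = 1 << 15   # 2^15 = 32768


def to_q15(x):
    """Convert a fraction in [-1, 1) to its 16-bit Q15 integer equivalent."""
    n = int(round(x * Q15_SCALE))
    return max(-32768, min(32767, n))   # assumption: clamp to the 16-bit range


def from_q15(n):
    """Convert a Q15 integer back to the fraction it represents."""
    return n / Q15_SCALE


print(to_q15(0.5), to_q15(0.125), to_q15(-1.0))   # 16384 4096 -32768
print(from_q15(32767))                            # 0.999969482421875
```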
Once the integers have been converted to a fractional representation, you can perform
fractional multiplication and then scale the result back up to get the result without
overflow.

Multiplying two fractional numbers cannot overflow, because the magnitude of the
result cannot exceed 1 (the one exception is (-1) x (-1) = +1, which lies just outside the
representable range); addition, however, may still result in overflow. Another advantage of
using fractional numbers is that, while multiplication of two 16-bit fractional numbers
also requires 32 bits for the result, the 32 bits can be rounded into 16 bits, as in integer
multiplication, while introducing an error of only approximately 2^-16. This rounding is
performed by discarding the lower 16 bits. This is analogous to decimal rounding where a
number like 1.5632 can be rounded to a lower precision of two decimal places by
discarding lower digits to give 1.56.

The only question that remains to be answered is: How do you perform fractional
multiplication on a fixed point processor?

a) Fractional Arithmetic

The fractional number representation briefly discussed above is called “Q15 format”.
Q15 is a very popular representation of fractional numbers on a
fixed point processor. The letter ‘Q’ represents “Quantity of fractional bits” and the
number following the Q indicates the number of bits that are used for the fraction. Figure 6
shows the fraction point for integer and fractional numbers.

Integer number:       | Sign | 15 bits | .     zero fractional bits
Fractional number:    | Sign | . 15 bits |     one sign bit and 15 fractional bits

Figure 6. Integer and Fractional Representation

To get the correct result from multiplying two fractional numbers an extra one-bit shift to
the left is needed. The left-shift requirement can alternatively be explained by way of
decimal place alignment. Remember that when we multiply decimal numbers, we first
multiply them ignoring the decimal points and then put the decimal point back in the last
step. The decimal point is placed so that the total number of digits to the right of the
decimal point in the multiplier and multiplicand is equal to the number of digits to the
right of the decimal point in their product.
The same applies here; the “decimal point” is to the right of the leftmost (sign) bit and, in
Q15 format, there are 15 bits (digits) to the right of this point. When signed multiplication is
performed on two Q15 format numbers, the sign bit is the most significant bit (Bit 31), the
result is in the lower 30 bits (Bit 29 to Bit 0) and Bit 30 is a redundant sign bit. So, we shift
the number to the left by one bit. This shifts the most significant bit (Bit 31) out and shifts a
zero into the least significant bit (Bit 0). Now the most significant 16 bits contain the result
and we can discard the lower 16 bits to round
down the result into a 16-bit register. If we do not perform the shift, the redundant sign bit
is incorrectly interpreted as a magnitude bit. If the most significant 16 bits do not contain
the result, but the lower 16 bits do, this is considered an underflow because the result is
too small to be represented by just 16 bits. This is analogous to the loss of data that occurs
in decimal rounding when a number like 1.000025 is rounded to a lower precision of two
decimal places by discarding lower digits to give 1.00.
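
The multiply, shift and discard steps can be followed numerically with a small model. The Python sketch below is an illustration only: the function name is hypothetical, it works on the Q15 integer equivalents, and it does not model saturation of the (-1) x (-1) case.

```python
def q15_multiply(a, b):
    """Multiply two Q15 integers and return the 16-bit Q15 result."""
    product = a * b            # signed 16 x 16 integer product
    shifted = product << 1     # one-bit left shift removes the redundant sign bit
    return shifted >> 16       # keep the most significant 16 bits


r = q15_multiply(16384, 16384)         # 0.5 * 0.5
print(r, r / 32768)                    # 8192 0.25

r = q15_multiply(8192, -16384)         # 0.25 * -0.5
print(r, r / 32768)                    # -4096 -0.125
```

Without the left shift, the same inputs would give 4096 (0.125) for 0.5 x 0.5, which is the misinterpretation of the redundant sign bit described above.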

Example 11 shows the assembly code for multiplying two fractional numbers: 0.5 * 0.5.
Note that the integer equivalent of the fractional number is used in the instructions.

Example 11:

MOVE.W #16384, A     ;store 16384(0.5 in Q15) in
                     ;Accumulator A
MOVE.W #16384, Y0    ;store 16384(0.5 in Q15) in Y0
IMPY.L A1,Y0,B       ;16 x 16 multiply and store 31 bit
                     ;result in Accumulator B
LSL.W B              ;1 bit shift left of the
                     ;accumulator
MOVE.W B, X0         ;MOVE B1 to X0

Check the values in B1, B0 and X0. What is the result of the multiplication?

We can reduce the number of cycles by using a specially designed multiplication
instruction that will perform the shift automatically as shown in Example 12.

Example 12:

MOVE.W #16384, A     ;store 16384(0.5 in Q15) in
                     ;Accumulator A
MOVE.W #16384, Y0    ;store 16384(0.5 in Q15) in Y0
MPY A1,Y0,B          ;16 x 16 fractional multiply
MOVE.W B, X0         ;MOVE B1 to X0

Check again for the result in B1, B0 and X0. Is it the same as the one previously
obtained?
Perform the following fractional multiplication using Example 12: (5 / 2^15) x (5 / 2^15).
What is the result obtained in B and X0? Comment.

b) Multiply and Accumulate

There are several DSP algorithms such as Finite Impulse Response filters (FIRs) that
require a large number of sum-of-products terms. This is achieved by carrying out a
series of multiplications and adding the products together.

Example 13 multiplies two fractional arrays: arr1 = {0.125, 0.125, 0.5} and
arr2 = {0.25, 0.125, 0.125} element by element and adds the resultant products:

Example 13:

Org X:
ARR1 DC 4096,4096,16384 ;Declare Arr1
ARR2 DC 8192,4096,4096 ;Declare Arr2
Org P: ;switch to program memory
MOVEU.W #ARR1,R0 ;get pointer to ARR1
MOVEU.W #ARR2,R3 ;get pointer to ARR2
CLR A ; clear Accumulator A
MOVE.W X:(R0)+,Y0 ;read first element of
;ARR1 into Y0
MOVE.W X:(R3)+,X0 ;read first element
;of ARR2 into X0
REP #3 ;repeat next instruction
;3 times
MAC Y0,X0,A X:(R0)+,Y0 X:(R3)+,X0
; Multiply Y0*X0 + A -> A, in
;parallel store content of memory
;pointed by R0 and R3 into Y0 and
;X0 respectively and post
;increment the two pointers

Note that the last instruction used is a special instruction taking multiple operands and
performing several tasks in parallel. For more on parallel instructions, refer to pages 4-48
& 4-49 in [1]. Get the total number of instructions executed by:
- Switching to “Simulator Mode” in the CW window
- Selecting Edit > ldm Settings
- Scrolling down the left menu till you reach the “debugger” item
- Selecting “Remote Debugging”, changing the connection type from
“56800E Local Hardware Connection” to “56800E Simulator”, and
clicking OK.
- In the CW window, select Debug > Kill and then rebuild the project using
Project > Make and then Project > Debug.
- Before running the project, in the CW window go to DSP56858E >
Display Cycle /Instruction Count (now highlighted) and click on Reset.
- To run the project, select Project > Run. Then, go back to Display Cycle
/Instruction Count and write down the displayed number of machine
cycles and instructions.
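
As a cross-check on the value that Example 13 should leave in accumulator A, the sum of products can be computed off-target. The Python sketch below is a model only: the function name is hypothetical, it ignores the 36-bit accumulator width, and it assumes each MAC adds the left-shifted 16 x 16 product as in the fractional multiply described earlier.

```python
def q15_mac(acc, x, y):
    """Model one fractional MAC step: acc + ((x * y) << 1)."""
    return acc + ((x * y) << 1)


arr1 = [4096, 4096, 16384]   # {0.125, 0.125, 0.5} in Q15
arr2 = [8192, 4096, 4096]    # {0.25, 0.125, 0.125} in Q15

acc = 0                      # CLR A
for x, y in zip(arr1, arr2):
    acc = q15_mac(acc, x, y)

a1 = acc >> 16               # most significant portion of the result
print(hex(acc), a1, a1 / 32768)   # 0xe000000 3584 0.109375
```

The printed fraction matches 0.125 x 0.25 + 0.125 x 0.125 + 0.5 x 0.125 = 0.109375.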

Measure the number of cycles and the instruction count for the MAC operation. (Start
measuring at the instruction after the CLR A, up to and including the MAC
instruction.) Compare with the number of cycles and instruction count you
obtained for doing a MAC operation in assembly in PART B of lab 2. (Note: Here
you are performing the MAC three times, hence you must multiply the number of
cycles and instruction count you obtained from lab 2 by three.)

IV. Branch and Looping Instructions

a) Branching Instructions

There is often a need for a decision making instruction in your program code. In a high
level language, decision making can be written in terms of an IF – THEN – ELSE. In this
section, two types of branch instructions will be discussed:
• Unconditional Branching (e.g., BRA)
• Conditional Branching (e.g., Bcc, where cc is a given condition; see Table 10 in the
Appendix)

Unconditional Branching

When encountered, a ‘BRA’ instruction will branch to the address specified within the
instruction. ‘BRA’ is equivalent to a ‘goto’ statement in a high-level language.

Example 14:

      MOVE.W #$000A, A    ;load 10 into A
      BRA SKIP            ;Branch unconditionally
                          ;to label SKIP.
      MOVE.W #$00FF, A    ;This instruction is
                          ;skipped
SKIP  MOVE.W #$00AA, B    ;Initialize
                          ;accumulator B

What is the content of accumulator A after code execution?

Conditional Branches

Consider the following simple ‘if’ condition in C:

if (x == 3)
{x++;}

Example 15 implements the above ‘if’ condition in assembly.

First you need to initialize a variable ‘TEST_VAR’ to 3 using the DC directive.
Remember that variable initialization should occur in data memory and not in program
memory. Please refer to example 13 to see how to switch between data and program
memory sections using the ‘Org’ directive and to see how to initialize variables.

Example 15:

        MOVE.W #3, X0           ; Move 3 to X0
        MOVEU.W #TEST_VAR, R0   ; Get memory address into R0
        SUB.W X:(R0), X0        ; X0 = X0 - TEST_VAR (i.e., 3 - TEST_VAR)
        BEQ Increment           ; If the result is zero, go to "Increment"
        BRA Cont_code           ; Empty else, skip to label "Cont_code"
Increment:
        INC.W X:(R0)            ; Increment variable TEST_VAR
Cont_code:
        nop

Set a breakpoint on "BEQ Increment". The program will run until it reaches the "BEQ"
instruction. In the CW menu select View > Register Details; a window will appear
showing the binary representation of the specified register with a brief description. In the
description field type "SR". The content of the Status Register will be shown. Check
the "Z" bit, which is bit 2 in the status register.
If the result of the last operation that was executed is zero, the ‘Z’ bit in the SR register
will be equal to one. If the result is not zero, the ‘Z’ bit will be cleared to zero.
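
The relationship between the subtraction, the Z bit, and the BEQ branch can be sketched in C. This is only a simplified model, not the processor's behaviour in full: the function names (sr_after_sub, z_bit, increment_if_three) are hypothetical, and the status-register image holds nothing but the Z bit at position 2.

```c
#include <assert.h>
#include <stdint.h>

#define SR_Z_BIT 2u   /* Z is bit 2 of the status register, as stated above */

/* Simplified model: SR image after computing reg - mem.
   Only the Z bit is modelled; all other SR bits are left at zero. */
uint16_t sr_after_sub(int16_t reg, int16_t mem)
{
    int16_t result = (int16_t)(reg - mem);
    return (uint16_t)((result == 0) ? (1u << SR_Z_BIT) : 0u);
}

/* Extract the Z bit from an SR image. */
int z_bit(uint16_t sr)
{
    return (int)((sr >> SR_Z_BIT) & 1u);
}

/* Mirrors the control flow of Example 15 (assumed mapping in comments). */
int16_t increment_if_three(int16_t test_var)
{
    uint16_t sr = sr_after_sub(3, test_var);  /* SUB.W X:(R0), X0 */
    if (z_bit(sr)) {                          /* BEQ Increment    */
        test_var++;                           /* INC.W X:(R0)     */
    }
    return test_var;                          /* Cont_code        */
}

/* Self-check for the two initial values used in this lab. */
void check_example15_model(void)
{
    assert(increment_if_three(3) == 4);  /* Z = 1, branch taken     */
    assert(increment_if_three(5) == 5);  /* Z = 0, branch not taken */
}
```

Calling check_example15_model() exercises both cases: with TEST_VAR equal to 3 the modelled Z bit is one and the variable is incremented, and with 5 it is left unchanged.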

Is the 'BEQ' condition satisfied? Continue executing the program and check the
content of the memory location TEST_VAR.
Rather than initializing TEST_VAR to "3", change it to "5", execute the
code again, and check the results.

GRAD STUDENTS ONLY: Can you think of another way to perform the same
task in assembly using other instructions? (Hint: Try BNE.) Write the following C
program in assembly:

if (x == 3)
{x++;}
else
{x--;}

b) Looping Instructions

Typically, a loop is implemented using a variable used as a counter. When the counter
reaches a certain value, execution of the loop comes to an end.

Example 16:
int i = 10; // Counter variable i
int j = 0; // Memory location to be incremented
do // Start of loop
{
i--; // Decrement counter
j++; // Some operation to be performed, for example, increment j
} while (i > 0); // Exit the loop when the counter reaches zero

This loop can be implemented in assembly as shown in Example 17. Initialize TEST_VAR
to zero using the DC directive (i.e., TEST_VAR DC 0).

Example 17:

        MOVEU.W #TEST_VAR, R0   ; Get memory address
        MOVE.W #10, A1          ; Load 10 into A1 (upper word of accumulator A)
Loop:
        INC.W X:(R0)            ; Increment variable TEST_VAR
        DEC.W A                 ; Decrement accumulator by one
        BNE Loop                ; Loop as long as A is different from zero

Run the code and check the memory content of TEST_VAR.

Example 17 implemented looping using software instructions. Another way to implement the
loop is to use hardware registers and specialized looping instructions. The looping
instructions load specific hardware registers with the loop count in order to correctly
perform looping. The "REP" instruction executes a single-word instruction a number of
times, as shown in Example 18:

Example 18:

        MOVEU.W #TEST_VAR, R0   ; Get memory address
Loop:
        REP #10                 ; Repeat the following instruction 10 times
        INC.W X:(R0)            ; Increment variable TEST_VAR

GRAD STUDENTS ONLY: Which method is more efficient? Compare the instruction cycles for
Examples 17 and 18.

The drawback of the "REP" instruction is that it repeats only the single instruction
that follows it. To execute a block of instructions repeatedly, use the "DO" loop
instruction as shown in Example 19:

Example 19:

        MOVEU.W #TEST_VAR, R0   ; Get memory address
        DO #10,Loop             ; Repeat the block up to label Loop 10 times
        INC.W X:(R0)            ; Increment variable TEST_VAR
        MOVE.W X:(R0), X0       ; Move TEST_VAR to register X0
        ASL.W X0                ; Multiply X0 by 2 by shifting to the left
        MOVE.W X0, X:(R0)       ; Store the result back to TEST_VAR
Loop:
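
To check the value left in TEST_VAR by Example 19, the loop body can be mirrored in C. This is a rough sketch under stated assumptions: TEST_VAR is taken to start at zero (as in Example 17), the function name do_loop_model is hypothetical, and the model uses plain 16-bit unsigned arithmetic, ignoring any saturation or condition-code effects of the real instructions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical C model of Example 19; test_var stands in for X:(R0). */
uint16_t do_loop_model(uint16_t test_var, unsigned count)
{
    for (unsigned n = 0; n < count; ++n) {      /* DO #count,Loop      */
        test_var = (uint16_t)(test_var + 1u);   /* INC.W X:(R0)        */
        uint16_t x0 = test_var;                 /* MOVE.W X:(R0), X0   */
        x0 = (uint16_t)(x0 << 1);               /* ASL.W X0            */
        test_var = x0;                          /* MOVE.W X0, X:(R0)   */
    }
    return test_var;
}

/* Self-check: each pass computes test_var = 2 * (test_var + 1). */
void check_do_loop_model(void)
{
    assert(do_loop_model(0, 1) == 2);
    assert(do_loop_model(0, 2) == 6);
    assert(do_loop_model(0, 10) == 2046);
}
```

Under these assumptions, ten passes starting from zero give 2046 ($07FE), which can be compared against the memory content of TEST_VAR after running the assembly version.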

References
[1] DSP56800E 16-bit DSP Core Reference Manual Rev2.16, 2005
[2] CodeWarrior™ Development Studio for Freescale ® DSP56800x Embedded
Systems Assembler Manual.

Appendix A– Instruction Set Summary
Table 1. Multiplication Instructions
Instruction Parallel Move? Description
IMAC.L — Signed integer multiply-accumulate with full precision
IMACUS — Unsigned/signed integer multiply-accumulate with full precision
IMACUU — Unsigned/unsigned integer multiply-accumulate with full precision
IMPY.L — Signed integer multiply with full precision
IMPY.W — Signed integer multiply with integer result
IMPYSU — Signed/unsigned integer multiply with full precision
IMPYUU — Unsigned/unsigned integer multiply with full precision
MAC Yes Signed fractional multiply-accumulate
MACR Yes Signed fractional multiply-accumulate and round
MACSU — Signed/unsigned fractional multiply-accumulate
MPY Yes Signed fractional multiply
MPYR Yes Signed fractional multiply and round
MPYSU — Signed/unsigned fractional multiply

Table 2. Arithmetic Instructions


Instruction Parallel Move? Description
ABS Yes Absolute value
ADC — Add long with carry
ADD Yes Add two registers
ADD.B — Add byte value from memory to register
ADD.BP — Add byte value from memory to register
ADD.L — Add long value from memory (or immediate) to register
ADD.W — Add word value from memory (or immediate) to register
CLR Yes Clear a 36-bit register value
CLR.B — Clear a byte value in memory
CLR.BP — Clear a byte value in memory
CLR.L — Clear a long value in memory
CLR.W — Clear a word value in memory or in a register
CMP Yes Compare a word value from memory (or immediate) with an
accumulator; also compare two registers, where the second is always an
accumulator; comparison done on 36 bits
CMP.B — Compare the byte portions of two registers or an immediate with the byte
portion of a register; comparison done on 8 bits
CMP.BP — Compare a byte value from memory with a register; comparison done on
8 bits
CMP.L — Compare a long value from memory (or an immediate value) with a
register; also compare the long portions of two registers; comparison
done on 32 bits
CMP.W — Compare a word value from memory (or immediate) with a register; also
compare the word portions of two registers; comparison done on 16 bits
DEC.BP — Decrement byte in memory
DEC.L — Decrement an accumulator or a long in memory
DEC.W Yes Decrement upper word of accumulator, word register, or a word in
memory
DIV — Divide iteration
INC.BP — Increment byte in memory
INC.L — Increment an accumulator or a long in memory
INC.W Yes Increment upper word of accumulator, word register, or a word in
memory
NEG Yes Negate an accumulator
NEG.BP — Negate byte in memory
NEG.L — Negate a long word in memory
NEG.W — Negate a word in memory
NORM — Normalize
RND Yes Round
SAT Yes Saturate a value in an accumulator and store in destination
SBC — Subtract long with carry
SUB Yes Subtract two registers
SUB.B — Subtract byte value in memory from register and store in register
SUB.BP — Subtract byte value in memory from register and store in register.
SUB.L — Subtract long value in memory from register and store in register.
SUB.W — Subtract word value in memory (or immediate) from register and store in
register.
SUBL Yes Shift accumulator left and subtract word value
SXT.B — Sign extend a byte value in a register and store in destination
SXT.L — Sign extend a value in an accumulator and store in destination
SWAP — Swap R0, R1, N, and M01 registers with corresponding shadows
Tcc — Conditionally transfer one or two registers to other registers
TFR Yes Transfer data ALU register to an accumulator
TST Yes Test a 36-bit accumulator
TST.B — Test byte in memory or in a register
TST.BP — Test byte in memory
TST.L — Test an accumulator or a long in memory
TST.W — Test a word in memory or in a register
ZXT.B — Zero extend a byte value in a register and store in destination

Table 3. Shifting Instructions


Instruction Parallel Move? Description
ASL Yes Arithmetic shift left (shift register 1 bit)
ASL16 — Arithmetic left shift a register or accumulator by 16 bits
ASL.W — Arithmetic shift left a 16-bit register (shift register 1 bit)
ASLL.L — Arithmetic multi-bit shift left a long value
ASLL.W — Arithmetic multi-bit shift left a word value
ASR Yes Arithmetic shift right (shift register 1 bit)
ASR16 — Arithmetic right shift a register or accumulator by 16 bits
ASRAC — Arithmetic multi-bit shift right with accumulate
ASRR.L — Arithmetic multi-bit shift right a long value
ASRR.W — Arithmetic multi-bit shift right a word value
LSL.W — Logical shift left a word-sized register
LSR.W — Logical shift right (shift word-sized register 1 bit)
LSR16 — Logical right shift a register or accumulator by 16 bits
LSRAC — Logical multi-bit shift right with accumulate
LSRR.L — Logical multi-bit shift right a long value
LSRR.W — Logical multi-bit shift right a word value
ROL.L — Rotate left on long register
ROL.W — Rotate left on word register
ROR.L — Rotate right on long register
ROR.W — Rotate right on word register

Table 4. Logical Instructions


Instruction Parallel Move? Description
AND.L — Logical AND on long registers
AND.W — Logical AND on word registers
ANDC — Logical AND immediate data on word in memory
CLB — Count leading zeros or ones
EOR.L Yes Logical exclusive OR on long registers
EOR.W — Logical exclusive OR on word registers
EORC — Logical exclusive OR immediate data on word in memory
NOT.W — Logical complement on word registers
NOTC — Logical complement on word in memory
OR.L — Logical OR on long registers
OR.W — Logical OR on word registers
ORC — Logical OR immediate data on word in memory

Table 5. AGU Arithmetic (No Parallel Moves)


Instruction Description
ADDA Add register or immediate to AGU register
ADDA.L Add to AGU register with 1 bit left shift of source operand
ALIGNSP Save old value of stack pointer onto stack, aligning SP for long memory accesses before
performing save
ASLA Arithmetic 1 bit left shift an AGU register
ASRA Arithmetic 1 bit right shift an AGU register
CMPA Compare two AGU registers; comparison done on 24 bits
CMPA.W Compare two AGU registers; comparison done on 16 bits
DECA Decrement an AGU register by one
DECA.L Decrement an AGU register by two
DECTSTA Decrement and test an AGU register
LSRA Logical 1 bit right shift an AGU register
NEGA Negate an AGU register
SUBA Subtract register or immediate from AGU register
SXTA.B Sign extend a byte value in an AGU register
SXTA.W Sign extend a word value in an AGU register
TFRA Transfer one AGU register to another

TSTA.B Test the byte portion of an AGU register
TSTA.L Test the long portion of an AGU register
TSTA.W Test the word portion of an AGU register
TSTDECA.W Test and decrement the word portion of an AGU register
ZXTA.B Zero extend a byte value in an AGU register
ZXTA.W Zero extend a word value in an AGU register

Table 6. Bit Manipulation Instructions (No Parallel Moves)


Instruction Description
BFCHG Bitfield test and change
BFCLR Bitfield test and clear
BFSET Bitfield test and set
BFTSTH Bitfield test for on condition
BFTSTL Bitfield test for off condition

Table 7. Looping Instructions (No Parallel Moves)


Instruction Description
DO Load LC register with unsigned 16-bit loop count and start hardware loop
DOSLC Start hardware loop with signed 16-bit loop count already in LC register
ENDDO Terminate current hardware DO loop
REP Repeat immediately following instruction

Table 8. Move Instructions (No Parallel Moves)


Instruction Description
MOVE.B Move (signed) byte using word pointers and byte addresses
MOVE.BP Move (signed) byte using byte pointers and byte addresses
MOVEU.B Move unsigned byte using word pointers and byte addresses
MOVEU.BP Move unsigned byte using byte pointers and byte addresses
MOVE.L Move long using word pointers
MOVE.W Move (signed) word using word pointers and word addresses
(data or program memory)
MOVEU.W Move unsigned word using word pointers and word addresses
(data or program memory)

Table 9. Program Control Instructions (No Parallel Moves)
Instruction Description
Bcc Branch conditionally
BRA Branch
BRAD Delayed branch
BRCLR Branch if selected bits are clear
BRSET Branch if selected bits are set
BSR Branch to subroutine
FRTID Delayed return from fast interrupt
ILLEGAL Generate an illegal instruction exception
Jcc Jump conditionally
JMP Jump
JMPD Delayed jump
JSR Jump to subroutine
RTI Return from interrupt
RTID Delayed return from interrupt
RTS Return from subroutine
RTSD Delayed return from subroutine
SWI Software interrupt at highest priority level
SWI #<0–2> Software interrupt at specified priority level
SWILP Software interrupt at lowest priority level
DEBUGEV Generate debug event
DEBUGHLT Enter debug mode
NOP No operation
STOP Stop processing (lowest power standby)
WAIT Wait for interrupt (low power standby)

Table 10. Branch Conditionally


Condition (cc) Description and condition-code test
CC (HS*) — carry clear (higher or same): $C = 0$
CS (LO*) — carry set (lower): $C = 1$
EQ — equal: $Z = 1$
GE — greater than or equal: $N \oplus V = 0$
GT — greater than: $Z + (N \oplus V) = 0$
HI* — higher: $\bar{C} \cdot \bar{Z} = 1$
LE — less than or equal: $Z + (N \oplus V) = 1$
LS* — lower or same: $C + Z = 1$
LT — less than: $N \oplus V = 1$
NE — not equal: $Z = 0$
NN — not normalized: $Z + (\bar{U} \cdot \bar{E}) = 0$
NR — normalized: $Z + (\bar{U} \cdot \bar{E}) = 1$
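
The difference between the signed tests (GE, GT, LE, LT) and the unsigned tests (HI, LS) in Table 10 can be illustrated with a small C model. This is a sketch under assumptions, not the processor's definition: it computes flags for a 16-bit subtraction d - s (the real comparisons may be done on up to 36 bits), it assumes C is set when the subtraction borrows, and the type and function names (Flags, flags_after_sub16, cond_*) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct { bool c, v, z, n; } Flags;

/* Flags for d - s on 16-bit words (simplified model). */
Flags flags_after_sub16(uint16_t d, uint16_t s)
{
    uint16_t r = (uint16_t)(d - s);
    Flags f;
    f.c = d < s;                                  /* borrow (assumed meaning of C) */
    f.v = (((d ^ s) & (d ^ r)) & 0x8000u) != 0;   /* signed overflow               */
    f.z = (r == 0);                               /* result is zero                */
    f.n = (r & 0x8000u) != 0;                     /* result is negative            */
    return f;
}

/* Condition tests written directly from Table 10. */
bool cond_ge(Flags f) { return (f.n ^ f.v) == 0; }
bool cond_lt(Flags f) { return (f.n ^ f.v) == 1; }
bool cond_gt(Flags f) { return !(f.z || (f.n ^ f.v)); }
bool cond_le(Flags f) { return f.z || (f.n ^ f.v); }
bool cond_hi(Flags f) { return !f.c && !f.z; }
bool cond_ls(Flags f) { return f.c || f.z; }

/* Self-check: $FFFF is -1 when signed but 65535 when unsigned. */
void check_condition_model(void)
{
    Flags f = flags_after_sub16(0xFFFFu, 1u);
    assert(cond_lt(f));   /* signed:   -1 < 1     */
    assert(cond_hi(f));   /* unsigned: 65535 > 1  */
}
```

The same bit pattern therefore satisfies LT and HI at once, which is why the choice between the signed and unsigned condition mnemonics depends on how the program interprets its data.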

EEE404/591 – Real Time DSP
Verification Sheet

Lab 2: Introduction to the DSP56800E Assembly

Student’s Name: Time In: Time Out:

Session:
Date:

Items to be Checked
☐ Example 3 and Example 5 Questions (Prereq: Simulator)
☐ Example 9 and Example 10 Questions (Prereq: Simulator)
☐ Example 8 Questions
☐ Example 11 Questions
☐ Example 12 Questions
☐ Example 14 Questions
☐ Example 15 Questions
☐ Example 18 Questions
☐ Grad only Questions

Comments (For TA Use)
