Sei sulla pagina 1di 129

solutions MANUAL FOR

ARM Assembly
Language
Fundamentals and Techniques
Second Edition

by

William Hohl and


Christopher Hinds
solutionS MANUAL FOR
ARM Assembly
Language
Fundamentals and Techniques
Second Edition

by
William Hohl and
Christopher Hinds

Boca Raton London New York

CRC Press is an imprint of the


Taylor & Francis Group, an informa business
CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742
© 2015 by Taylor & Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, an Informa business

No claim to original U.S. Government works

Printed on acid-free paper


Version Date: 20140722

International Standard Book Number-13: 978-1-4822-2992-9 (Ancillary)

This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been made to publish reliable data and
information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and
publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission
to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any
future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic,
mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or
retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact
the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides
licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment
has been arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation
without intent to infringe.
Visit the Taylor & Francis Web site at
http://www.taylorandfrancis.com
and the CRC Press Web site at
http://www.crcpress.com
ARM Assembly Language

Fundamentals and Techniques

2nd Edition

Solutions Manual

1
Copyright © 2015 William Hohl and Christopher Hinds. All rights reserved.

Proprietary Notice
ARM and the ARM Powered logo are trademarks of ARM Ltd. Neither the whole nor any
part of the information contained in, or the product described in, this document may be
adapted or reproduced in any material form except with the prior written permission of the
copyright holders. The product described in this document is subject to continuous
development and improvement. All particulars of the product and its use contained in this
document are given by the authors in good faith. However, all warranties implied or
expressed, including but not limited to implied warranties or merchantability, or fitness for
purpose, are excluded. This document is intended only to assist the reader in the use of the
product. Neither ARM Ltd nor the authors shall be liable for any loss or damage arising
from the use of any information in this document, or any error or omission in such
information, or any incorrect use of the product.

Credits
Thanks are due to Joe Bungo for many of these answers.

Release Information
The authors reserve the right to modify or change the solutions as better ones become
available or errors are found. Corrections or comments can be sent to us at WEBSITE
ADDRESS?

Issue Date By Change


Draft 01 Mar 09 JB/BH Created
A 17 Mar 14 BH/CH 2nd edition

2
Authors’ Note:

You’ll notice that many of the programs are already written to be executed in the Keil tools,
while some will have to be slightly altered, so mind the type of tool suite that you have. All
answers are provided as illustration only and are not intended to be considered optimal
solutions. Hopefully, students are able to follow the flow of the code and ultimately make
optimizations or improvements on their own.

William Hohl and Christopher Hinds

3
Table of Contents
Chapter 1 ............................................................................................................................. 5
Chapter 2 ........................................................................................................................... 11
Chapter 3 ........................................................................................................................... 14
Chapter 4 ........................................................................................................................... 18
Chapter 5 ........................................................................................................................... 22
Chapter 6 ........................................................................................................................... 27
Chapter 7 ........................................................................................................................... 31
Chapter 8 ........................................................................................................................... 46
Chapter 9 ........................................................................................................................... 53
Chapter 10 ......................................................................................................................... 59
Chapter 11 ......................................................................................................................... 65
Chapter 12 ......................................................................................................................... 74
Chapter 13 ......................................................................................................................... 89
Chapter 14 ....................................................................................................................... 103
Chapter 15 ....................................................................................................................... 108
Chapter 16 ....................................................................................................................... 109
Chapter 17 ....................................................................................................................... 119
Chapter 18 ....................................................................................................................... 122

4
Chapter 1
1. Give two examples of system-on-chip designs available from semiconductor manufacturers.

Describe their features and interfaces. They do not necessarily have to contain an ARM

processor.

The Texas Instruments OMAP 35x processors contain an ARM Cortex-A8 processor, along with

a camera image signal processor, external memory interfaces (SDRAM and general purpose

controllers), a display subsystem, serial communications, and removable media connections. The

LPC1758 microcontroller from NXP has a Cortex-M3 processor, along with an Ethernet MAC,

DMA controller, 4 UARTs, 2 CAN channels, a 10-bit Digital-to-Analog Converter, a motor

control PWM, timers, and a host of other peripherals.

2. Find the two’s complement representation for the following numbers, assuming they are

represented as a 16-bit number. Write the value in both binary and hexadecimal.

a. -93 b. 1,034

c. 492 d. -1094

a. 1111111110100011, 0xFFA3

b. 0000010000001010, 0x040A

c. 0000000111101100, 0x01EC

d. 1111101110111010, 0xFBBA

3. Convert the following binary values into hexadecimal:

a. 10001010101111 b. 10101110000110

c. 1011101010111110 d. 1111101011001110

a. 0x22AF

b. 0x2B86

5
c. 0xBABE

d. 0xFACE

4. Write the 8-bit representation of -14 in one’s complement, two’s complement and sign-

magnitude representations.

One’s: 11110001

Two’s: 11110010

Signed Magnitude: 10001110

5. Convert the following hexadecimal values to base 10:

a. 0xFE98 b. 0xFEED

c. 0xB00 d. 0xDEAF

a. 65176

b. 65261

c. 2816

d. 57007

6. Convert the following base 10 numbers to base 4:

a. 812 b. 101

c. 96 d. 3640

a. 30230

b. 1211

c. 1200

d. 320320

6
7. Using the smallest data size possible, either a byte, a halfword (16 bits), or a word (32 bits),

convert the following values into two’s complement representations:

a. -18,304 b. -20

c. 114 d. -128

a. 1011100010000000

b. 11101100

c. 01110010

d. 10000000

8. Indicate whether each value could be represented by a byte, a halfword or a word-length two’s

complement representation:

a. -32,765 b. 254

c. -1,000,000 d. -128

a. halfword

b. halfword

c. word

d. byte

9. Using the information from the ARM v7-M Architectural Reference Manual, write out the 16-

bit binary value for the instruction SMULBB r5, r4, r3.

11111011000101001111010100000011

10. Describe all the ways of interpreting the hexadecimal number 0xE1A02081 (hint: it might not

be data).

It can be data in a number of formats:

Unsigned: 3,785,367,681 in decimal

7
Signed one’s complement: -509,599,614 in decimal

Signed two’s complement: -505,599,615 in decimal

It could be an address, or the instruction

MOV r2, r1, LSL #1 (you will see this as LSL r2, r1, #1 in UAL format)

It could also be the floating-point number -3.69227 x 1020.

11. If the hexadecimal value 0xFFE3 is a two’s complement, halfword value, what would it be in

base 10? What if it were a word-length value (i.e., 32 bits long)?

Halfword: -29

Word: 65507

12. How do you think you could quickly compute values in octal (base 8) given a value in binary?

Start from the right, splitting the binary number up into 3-bit partitions, then convert each 3-bit

partition into its decimal equivalent.

13. Convert the following decimal numbers into hexadecimal:

a. 256 b. 1000

c. 4095 d. 42

a. 0x100

b. 0x3E8

c. 0xFFF

d. 0x2A

14. Write the 32-bit representation of -247 in sign-magnitude, one’s complement, and two’s

complement notations. Write the answer using 8 hex digits.

Sign-magnitude: 0x800000F7

8
One’s complement: 0xFFFFFF08

Two’s complement: 0xFFFFFF10

15. Write the binary pattern for the letter ‘Q’ using the ASCII representation.

01010001 or 0x51

16. Multiply the following binary values. Notice that binary multiplication works exactly like
decimal multiplication, except you are either adding 0 to the final product or a scaled
multiplicand. For example:

100 (multiplicand)
x 110 (multiplier)
-----
0
1000 (scaled multiplicand – by 2)
10000 (scaled multiplicand – by 4)
---------
11000

a. 1100 b. 1010 c. 1000 d. 11100


x 1111 x 1011 x 1001 x 111
------ --------- --------- -----------

a. 10110100

b. 1101110

c. 1001000

d. 11000100

17. How many bits would the following C data types use by the ARM7TDMI?

a. int

b. long

c. char

d. short

e. long long

9
a. 32

b. 32

c. 8

d. 16

e. 64

18. Write the decimal number 1.75 in the IEEE single-precision floating-point format. Use one of

the tools given in the References to check your answer.

0x3FE00000

10
Chapter 2
1. How many modes does the ARM7TDMI processor have? How many states does it have? How

many modes does the Cortex-M4 have?

ARM7TDMI: 7 modes, 2 states

Cortex-M4: 2 modes, Handler mode and Thread mode

2. What do you think would happen if the instruction SMULTT (an instruction that runs fine on a

Cortex-M4) were issued to an ARM7TDMI? Which mode do you think it would be in after this

instruction entered the execute stage of its pipeline?

It wouldn’t recognize the instruction and would go into Undefined mode.

3. What is the standard use of register r14? Register r13? Register r15?

r14: Link Register

r13: Stack Pointer

r15: Program Counter

4. On an ARM7TDMI, in any given mode, how many registers does a programmer see at one

time?

17 for User mode, 18 for remaining modes

5. Which bits of the ARM7TDMI status registers contain the flags? Which register on the Cortex-

M4 holds the status flags?

ARM7TDMI: Bits [31:28]

Cortex-M4: APSR bits [31:28]

11
6. If an ARM7TDMI processor encounters an undefined instruction, from what address will it

begin fetching instructions after it changes to Undefined mode? What about a reset?

On the ARM7TDMI: Undefined: 0x04 and Reset: 0x0

7. What is the purpose of FIQ mode?

Handling of highest priority interrupt(s)

8. Which mode on an ARM7TDMI can assist in supporting operating systems, especially for

supporting virtual memory systems?

Abort mode

9. How do you enable interrupts on the ARM7TDMI?

Clear bits 6 and 7 of the CPSR

10. How many stages does the ARM7TDMI pipeline have? Name them.

3: Fetch, Decode, and Execute

11. Suppose that the program counter, register r15, contained the hex value 0x8000. From what

address would an ARM7TDMI fetch the next instruction (assume you are in ARM state)?

This depends on the state of the machine. It would be 0x8004 if the machine is in ARM state.

Note that it could be 0x8002 if the machine is in Thumb state.

12. What is the function of the Saved Program Status Register?

To save the state of the machine in the CPSR when an exception is taken so it can be restored

upon return of handling the exception.

12
13. On an ARM7TDMI, is it permitted to put the instruction

SUB r0, r2, r3

at address 0x4? How about at address 0x0? Can you put that same bit pattern at address 0x4 in a

system using a Cortex-M4?

On an ARM7TDMI, technically, yes and yes. The instruction would be executed. You could put

the bit pattern for the SUB instruction at address 0x4 in a Cortex-M4 system, but remember that

the bit pattern is seen as an address, so the machine will attempt to fetch an instruction from that

address, and mostly likely die a horrible death.

14. Describe the exception vector table for any other microprocessor. How does it differ from the

ARM7TDMI processor? How does it differ from the Cortex-M4?

Motorola 68000’s vectors correspond to different exceptions like ARM’s, but they contain

addresses instead of instructions. It is actually more like the Cortex-M4’s exception vectors.

15. Give an example of an instruction that would typically be placed at address 0x0 on an

ARM7TDMI. What value is typically placed at address 0x0 on a Cortex-M4?

LDR pc, [pc, #offset]

where “#offset” is an offset to the Reset handler.

For the Cortex-M4, the address of the top of the stack is placed at address 0x0.

16. Explain the current program state of an ARM7TDMI if the CPSR had the value 0xF00000D3.

N, Z, C, V flags are set; IRQs and FIQs are enabled; ARM state; Supervisor mode

13
Chapter 3
1. Change Program 1, replacing the last MOV instruction with

ADD r2, r1, r1, LSL #2

and rerun the simulation. What value is in register r2 when the code reaches the infinite loop (the

B instruction)? What is the ADD instruction actually doing?

r2 = 170 (0xAA). The ADD instruction is multiplying r1 by 5.

2. Using a Disassembly window, write out the seven machine codes (32-bit instructions) for

Program 2.

0xE3A0600A
0xE3A07001
0xE3560000
0xC0070796
0xC2466001
0xCAFFFFFB
0xEAFFFFFE

3. How many bytes does the code for Program 2 occupy? What about Program 3?

Program 2: For the ARM7TDMI code, 28 bytes. For the Cortex-M4 code, 22 bytes.

Program 3: 32 bytes (don’t forget the two constants in the literal pool)

4. Change the value in register r6 at the start of Program 2 to 12. What value is in register r7

when the code terminates? Verify that this hex number is correct.

0x1C8CFC00 = 479001600 = 12!

5. Run Program 3. After the first XOR instruction, what is the value in register r0? After the

second XOR instruction, what is the value in register r1?

r0 = 0xE16298F1

r1 = 0xF631024C

14
6. Using the instructions in Program 2 as a guide, write a program that computes 6x2-9x+2 and

leaves the result in register r2. You can assume x is in register r3. For the syntax of the

instructions, such as addition and subtraction, see the ARM Architectural Reference Manual and

the ARM v7-M Architectural Reference Manual.

AREA Prog6, CODE, READONLY


ENTRY

MOV r4, #6
MUL r2, r3, r3 ; r2 = x^2
MUL r2, r4, r2 ; r2 = 6x^2
ADD r3, r3, r3, LSL #3 ; r3 = 9x
SUB r2, r2, r3 ; r2 = 6x^2 - 9x
ADD r2, r2, #2 ; r2 = 6x^2 -9x +2
stop B stop
END

You can also implement this by rewriting the expression as (6x - 9)x + 2.

7. Show two different ways to clear all the bits in register r12 to zero. You may not use any other

registers other than r12.

AND r12, r12, #0

EOR r12, r12, r12

SUB r12, r12, r12

8. Using Program 3 as a guide, write a program that adds the 32-bit two’s complement

representations of -149 and -4,321. Place the result in register r7. Show your code and the

resulting value in register r7.

AREA Prog8, CODE, READONLY


ENTRY

LDR r1, =0xFFFFFF6B ; r1 = -149


LDR r2, =0xFFFFEF1F ; r2 = -4321
ADD r7, r1, r2 ; r7 = -149 -4321 = -4470
stop B stop
END

15
r7 = 0xFFFEE8A

9. Using Program 2 as a guide, execute the following instructions. Place small values in the

registers beforehand. What do the instructions actually do?

a. MOVS r6, r6, LSL #5

b. ADD r9, r8, r8, LSL #2

c. RSB r10, r9, r9, LSL #3

d. (b) followed by (c)

a. r6 = r6 * 32

b. r9 = r8 * 5

c. r10 = r9 * 7

d. r10 = r8 * 35

10. Suppose a branch instruction is located at address 0x0000FF00 in memory. What ARM

instruction (32-bit binary pattern) do you think would be needed so that this B instruction could

branch to itself?

0xEA003FBB

11. Translate the following machine code into ARM mnemonics. What does the machine code

do? What is the final value in register r2? You will want to compare these bit patterns with

instructions found in the ARM Architectural Reference Manual.

Address Machine code

00000000 E3A00019
00000004 E3A01011
00000008 E0811000
0000000C E1A02001

MOV r0, #25


MOV r1, #17

16
ADD r1, r1, r0
MOV r2, r1

12. Using the VLDR pseudo-instruction shown in Program 5, change Program 4 so that it adds

the value of pi (3.1415926) to 2.0. Verify that the answer is correct using one of the floating-point

conversion tools given in the References.

LDR r0, =0xE000ED88 ; read modify write


LDR r1, [r0]
ORR r1, r1, #(0xF<<20) ; enable CP10, CP11
STR r1, [r0]
VLDR.F s0, =2.0
VLDR.F s1, =3.1415926 ; pi
VADD.F s2, s1, s0

13. The floating-point instruction VMUL.F works very much like a VADD.F instruction. Using
Programs 4 and 5 as a guide, multiply the floating-point representation for Avagadro’s constant
and 4.0 together. Verify that the result is correct using a floating-point conversion tool.

LDR r0, =0xE000ED88 ; read modify write


LDR r1, [r0]
ORR r1, r1, #(0xF<<20) ; enable CP10, CP11
STR r1, [r0]
VLDR.F s0, =4.0
VLDR.F s1, =6.0221415e23 ; Avagadro’s constant
VMUL.F s2, s1, s0

17
Chapter 4
1. What is wrong with the following program?

AREA ARMex2, CODE, READONLY


ENTRY

start
MOV r0, #6
ADD r1, r2, #2

END

The MOV and ADD instructions need to be indented as they cannot be in the label space.

2. What is another way of writing the following line of code?

MOV PC, LR

MOV r15, r14

3. Use a Keil directive to assign register r6 to the name bouncer.

bouncer RN 6

4. Use a Code Composer Studio directive to assign register r2 to the name FIR_index.

.asg R2, FIR_index

5. Fill in the missing Keil directive below:

SRAM_BASE ________ 0x2000

MOV r12, #SRAM_BASE


STR r6, [r12]
EQU

6. What is the purpose of a macro?

To allow a programmer to build definitions of functions or operations once, and then call this

operation by name throughout the code, saving coding time.

18
7. Create a mask (bit pattern) in memory using the DCD directive (Keil) and the SHL and OR

operators for the following cases. Repeat the exercise using the .word (CCS) and the << and |

operators. Remember that bit 31 is the most significant bit of a word and bit 0 is the least

significant bit.

a. The upper two bytes of the word are 0xFFEE and the least significant bit is set.

b. Bits 17 and 16 are set, and the least significant byte of the word is 0x8F.

c. Bits 15 and 13 are set (hint: do this with two SHL directives).

d. Bits 31 and 23 are set.

Keil:

a. MaskA DCD (0xFFEE:SHL:16):OR:1

b. MaskB DCD 0x8F:OR:(3:SHL:16)

c. MaskC DCD (1:SHL:13):OR:(1:SHL:15)

d. MaskD DCD (1:SHL:31):OR:(1:SHL:23)

CCS:

a. MaskA .word (0xFFEE<<16) | 1

.b MaskB .word 0x8F | (3<<16)

.c MaskC .word (1<<13) | (1<<15)

.d MaskD .word (1<<31) | (1<<23)

8. Give the Keil directive that assigns the address 0x800C to the symbol INTEREST.

INTEREST EQU 0x800C

9. What constant would be created if the following operators are used with a DCD directive? For

example,

MASK DCD 0x5F:ROL:3

19
a. 0x5F:SHR:2 b. 0x5F:AND:0xFC

c. 0x5F:EOR:0xFF d. 0x5F:SHL:12

a. 0x17

b. 0x5C

c. 0xA0

d. 0x5F000

10. What constant would be created if the following operators are used with a .word directive?

For example,

MASK .word 0x9B<<3

a. 0x9B>>2 b. 0x9B & 0xFC

c. 0x9B ^ 0xFF d. 0x9B<<12

a. 0x26

b. 0x98

c. 0x64

d. 0x9B000

11. What instruction puts the ASCII representation of the character ‘R’ in register r11?

MOV r11, #0x52

12. Give the Keil directive to reserve a block of zeroed memory, holding 40 words and labeled

coeffs.

coeffs SPACE 160

13. Give the CCS directive to reserve a block of zeroed memory, holding 40 words and labeled

coeffs.

20
coeffs .space 160

14. Explain the difference between Keil’s EQU, DCD, and RN directives. Which, if any, would

be used for the following cases?

a. Assigning the Abort mode’s bit pattern (0x17) to a new label called Mode_ABT.

b. Storing sequential byte-sized numbers in memory to be used for copying to another location in

memory.

c. Storing the contents of register r12 to memory address 0x40000004.

d. Associating a particular microcontroller’s predefined memory-mapped register address with a

name from the chip’s documentation, for example, VIC0_VA7R.

EQU gives a symbolic name to constants and values.

DCD allocates one or more word-aligned words of memory and defines runtime contents of

memory.

RN defines a register name for a particular register.

a. EQU

b. neither, should be DCB

c. neither

d. EQU

21
Chapter 5
1. Describe the contents of register r13 after the following instructions complete, assuming that

memory contains the values shown below. Register r0 contains 0x24, and the memory system is

little-endian.

Address Contents
0x24 06
0x25 FC
0x26 03
0x27 FF

a. LDRSB r13, [r0] c. LDR r13, [r0]

b. LDRSH r13, [r0] d. LDRB r13, [r0]

a. 0x06

b. 0xFFFFFC06

c. 0xFF03FC06

d. 0x06

2. Indicate whether the following instructions use pre- or post-indexed addressing modes:

a. STR r6, [r4, #4] c. LDRB r4, [r3, r2]!

b. LDR r3, [r12], #6 d. LDRSH r12, [r6]

a. pre-indexed

b. post-indexed

c. pre-indexed

d. pre-indexed

3.Calculate the effective address of the following instructions if register r3 = 0x4000 and register

r4 = 0x20:

a. STRH r9, [r3, r4] c. LDR r7, [r3], r4

22
b. LDRB r8, [r3, r4, LSL #3] d. STRB r6, [r3], r4, ASR #2

a. 0x4020

b. 0x4100

c. 0x4000

d. 0x4000

4. What’s wrong with the following instruction on an ARM7TDMI?

LDRSB r1,[r6],r3,LSL#4

For halfword, signed halfword, and signed byte accesses, an offset register cannot be shifted.

This was a design tradeoff because the need for this is so infrequent.

5. Write a program for either the ARM7TDMI or the Cortex-M4 that sums word-length values in

memory, storing the result in register r3. Include the following table of values to sum in your

code:

TABLE DCD 0xFEBBAAAA, 0x12340000, 0x88881111


DCD 0x00000013, 0x80808080, 0xFFFF0000

; For the ARM7TDMI


AREA Exercise5, CODE
ENTRY

LDR r0, =TABLE ; r0 is pointer to TABLE


MOV r1, #6 ; r1 is length of table
MOV r2, #0 ; r2 is sum

loop LDR r3, [r0], #4 ; load TABLE value and


; post-increment pointer
ADD r2, r2, r3 ; r3 = sum
SUBS r1, r1, #1 ; decrement counter
BNE loop

stop B stop

TABLE DCD 0xFEBBAAAA, 0x12340000, 0x88881111


DCD 0x00000013, 0x80808080, 0xFFFF0000

23
END

NB: If students aren’t using B (branch) instructions, then you could code this using only ADD

and LDR/STR instructions.

6. Assume an array contains 30 words of data. A compiler associates variables x and y with

registers r0 and r1, respectively. Assume the starting address of the array is contained in register

r2. Translate the C statement below into assembly instructions using pre-indexed addressing:

x = array[7] + y;

LDR r0, [r2, #28] ; r0 = array[7]


ADD r0, r0, r1 ; r0 = array[7] + y

7. Using the same initial conditions as Exercise 6, translate the following C statement into

assembly instructions using pre-indexed addressing:

array[10] = array[8] + y;

LDR r0, [r2, #32] ; r0 = array[8]


ADD r0, r0, r1 ; r0 = array[8] + y
STR r0, [r2, #40] ; array[10] = array[8] + y

8. Consider a C procedure that initializes an array of bytes to all zeros, given as

init_Indices (int a[], int s) {


int i;
for (i = 0; i < s; i ++)
a[i] = 0; }

Write the assembly language for this initialization routine. Assume s > 0 and is held in register r2.

Register r1 contains the starting address of the array, and the variable i is held in register r3.

While loops are not covered until Chapter 8, you can build a simple for loop using the following

construction:

MOV r3, #0 ; clear i

24
loop instruction
instruction
ADD r3, r3, #1 ; increment i
CMP r3, r2 ; compare i to s
BNE loop ; branch to loop if not equal

AREA init_Indices, CODE, READONLY


ENTRY

MOV r3, #0 ; i = 0
MOV r4, #0 ; r4 = clearing agent

Loop STRB r4, [r1], #1 ; a[i] = 0


ADD r3, r3, #1 ; increment i
CMP r3, r2 ; i < s ?
BNE Loop

stop B stop

END

9. Suppose that registers belonging to a particular peripheral on a microcontroller have a starting

address of 0xE000C000. Individual registers within the peripheral are addressed as offsets from

the starting address. If a register called LSR0 is 0x14 bytes away from the starting address, write

the assembly and Keil directives that will load a byte of data into register r6, where the data is

located in the LSR0 register. Use pre-indexed addressing.

LSR_BASE EQU 0xE000C000


LSR0 EQU 0x14

LDR r0, =LSR_BASE


LDRB r6, [r0, #LSR0]

10. Assume register r3 contains 0x8000. What would the register contain after executing the

following instructions?

a. STR r6, [r3, #12] c. LDRH r5, [r3], #8

b. STRB r7, [r3], #4 d. LDR r12, [r3, #12]!

a. 0x8000

b. 0x8004

c. 0x8008

25
d. 0x800C*

* Assuming this instruction doesn’t immediately follow (c) in a program.

11. Assuming you have a little-endian memory system connected to the ARM7TDMI, what

would register r4 contain after executing the following instructions? Register r6 holds the value

0xBEEFFACE and register r3 holds 0x8000.

STR r6, [r3]


LDRB r4, [r3]

What if you had a big-endian memory system?

Little: 0xCE
Big: 0xBE

26
Chapter 6

1. What constant would be loaded into register r7 by the following instructions?

a. MOV r7, #0x8C, 4 c. MVN r7, #2

b. MOV r7, #0x42, 30 d. MVN r7, #0x8C, 4

a. 0xC0000008

b. 0x00000108

c. 0xFFFFFFFD

d. 0x3FFFFFF7

2. Using the byte rotation scheme described for the ARM7TDMI, calculate the instruction and

rotation needed to load the following constants into register r2:

a. 0xA400 c. 0x17400

b. 0x7D8 d. 0x1980

a. MOV r2, #0xA4, 24

b. can’t be done – use LDR r2, =0x7D8

c. MOV r2, #0x5D, 22

d. MOV r2, #0x66, 26

3. Tell whether or not the following constants can be loaded into an ARM7TDMI register without

creating a literal pool and using only a single instruction:

a. 0x12340000 c. 0xFFFFFFFF

b. 0x77777777 d. 0xFFFFFFFE

a. no

b. no

c. yes – MVN r2, #0

27
d. yes – MVN r2, #1

4. Tell whether or not the following constants can be loaded into a Cortex-M4 register without

creating a literal pool and using only a single instruction:

a. 0xEE00EE00 c. 0x33333373

b. 0x09A00000 d. 0xFFFFFFFE

a. yes

b. yes

c. no – use LDR r3, =0x33333373

d. yes – use MVN r2, #1

5. What is the best way to put a numeric constant into a register?

Use the pseudo-load instruction, LDR Rd, =constant

6. Where is the best place to put literal pool data?

Very close to the instructions accessing the data.

7. Suppose you had the following code:

AREA SAMPLE, CODE,READONLY


ENTRY
start
MOV r12, #SRAM_BASE
ADD r0, r1, r2
MOV r0, #0x18
BL routine1
.
.
.
routine1
STM sp!, {r0-r3,lr}
.
.
.
END

28
Describe two ways to load the label routine1 into register r3, noting any restrictions that
apply.

1. LDR r3, =routine1


2. MOV r3, #routine1 ; routine1 address must be within range
; of immediates useable for data
; processing instructions

8. Describe the difference between ADR and ADRL.

ADR loads a relative address of a label into a register if the address is not more than 255 bytes
away. ADRL does the same, but it has a +-64KB range.

9. Give the instruction(s) to perform the following operations for both the ARM7TDMI and the

Cortex-M4:

a. Add 0xEC00 to register r6, placing the sum in register r4.

b. Subtract 0xFF000000 from register r12, placing the result in register r7.

c. Add the value 0x123456AB to register r7, placing the sum in register r12.

d. Place a two’s complement representation of -1 into register r3.

a. ADD r4, r6, #0xEC, 24 (ARM7TDMI)

ADD.W r4,r6, #0xEC00 (Cortex-M4 – note that we need the 32-bit ADD instruction)

b. SUB r7, r12, #0xFF, 8 (ARM7TDMI)

SUB.W r7, r12, #0xFF000000 (Cortex-M4 – note that we need the 32-bit SUB

instruction)

c. LDR r0, =0x123456AB


ADD r12, r7, r0

d. MVN r3, #0

10. Suppose you had the following code and you are using the Keil tools:

AREA EXERCISE, CODE


ENTRY
BL func1 ; call first subroutine
BL func2 ; call second subroutine

29
stop B stop ; terminate the program

func1 MOV r2, #0


LDR r1, =0xBABEFACE
LDR r2, =0xFFFFFFFC
MOV pc, lr ; return from subroutine

LTORG ; literal pool 1 has 0xBABEFACE


func2 LDR r3, =0xBABEFACE
LDR r4, =0x66666666
MOV pc, lr ; return from subroutine
BigTable
SPACE 3700 ; clears 3700 bytes of memory,
; starting here
END

On an ARM7TDMI, will loading 0x66666666 into register r4 cause an error? Why or why not?
What about on a Cortex-M4?

No. The LDR can reach the literal pool after the 3700 bytes of memory space since it will be
within 4K. The Cortex-M4 will not generate a literal since this type of constant can be loaded
with a single instruction.

11. The ARM branch instruction – B – provides a ±32MB branching range. If the program

counter is currently 0x8000 and you need to jump to address 0xFF000000, how do you think you

might do this?

Directly load the PC with the instruction LDR pc, #0xFF000000

12. Assuming that the floating-point hardware is already enabled (CP10 and CP11), write the

instructions to load the floating-point register s3 with a quiet NaN, or 0x7FC00000. (Hint: use

Program 5 from Chapter 3 as a guide.) You can write it using either the Keil tools or the CCS

tools.

; using the Keil tools here


VLDR.F s3, =0x7FC00000

Or if you’re using the CCS tools:

MOVW r0, #0x0


MOVT r0, #0x7FC0
VMOV.F s3, r0

30
Chapter 7
1. What’s wrong with the following ARM instructions? You may want to consult the ARM

Architectural Reference Manual to see the complete instruction descriptions and limitations.

a. ADD r3,r7,#1023 b. SUB r11,r12,r3,LSL #32

c. RSCLES r0, r15, r0, LSL r4 d. EORS r0, r15, r3, ROR r6

a. #1023 is out of range for immediate for data processing instructions

b. cannot shift register operand 2 more than #31

c. cannot use pc as Rn or Rm in register specified shift, as it produces undefined effects

d. cannot use pc as Rn or Rm in register specified shift, as it produces undefined effects

2. Without using the MUL instruction, give instructions which multiply register r4 by:

a. 135 c. 18

b. 255 d. 16,384

a. RSB r0, r4, r4, LSL #3


ADD r1, r0, r4, LSL #7

b. RSB r0, r4, r4, LSL #8

c. ADD r0, r4, r4, LSL #3


ADD r1, r0, r0

d. ADD r0, r4, LSL #14

3. Write a compare routine to compare two 64-bit values, using only two instructions. (Hint: the

second instruction is conditionally executed, based on the first comparison).

;Value1 = r1:r0
;Value2 = r3:r2

compare64
CMP r1, r3
CMPEQ r0, r2

31
4. Write shift routines that allow you to arithmetically shift 64-bit values which are stored in two

registers. The routines should shift an operand left or right by one bit.

; 64-bit value in r1:r0

leftShift1
MOV r1, r1, LSL #1 ; the MSB can drop off
MOVS r0, r0, LSL #1 ; MUST use MOVS to set flags
ADC r1, r1, #0 ; use the carry from shifter

rightShift1
MOVS r1, r1, ASR #1 ; the LSB goes to C bit
MOV r0, r0, RRX ; C bit goes to MSB of r0

5. Write the following decimal values in Q15 notation:

a. 0.3487 b. -0.1234

c. -0.1111 d. 0.7574

a. 0x2CA2

b. 0xFFFFF035

c. 0xFFFFF1C8

d. 0x60F2

6. Write the following signed, two’s complement Q8 values in decimal:

a. 0xFE32 b. 0x9834

c. 0xE800 d. 0xF000

a. -1.8047

b. -103.7969

c. -24

d. -16

32
7. Write the assembly code necessary to detect an error in a 12-bit Hamming code, where your

code tests the four checksum bits c3, c2, c1 and c0. Place your corrupted data in memory.

Assume that only a single error occurs in the data and store your corrected value in register r6.

AREA Hamming, CODE


ENTRY

; generate Hamming Code

MOV r2, #0 ; clear out transmitting reg


ADR r1, data
LDRB r0, [r1] ; load data
;
; calculate c0 using bits 76543210
; * ** **
; even parity, so result of XORs is the value of c0
;
MOV r4, r0 ; make a copy
EOR r4, r4, r0, ROR #1 ; 1 XOR 0
EOR r4, r4, r0, ROR #3 ; 3 XOR 1 XOR 0
EOR r4, r4, r0, ROR #4 ; 4 XOR 3 XOR 1 XOR 0
EOR r4, r4, r0, ROR #6 ; 6 XOR 4 XOR 3 XOR 1 XOR 0
AND r2, r4, #1 ; create c0 -> R2
;
; calculate c1 using bits 76543210
; ** ** *
MOV r4, r0
EOR r4, r4, r0, ROR #2 ; 2 XOR 0
EOR r4, r4, r0, ROR #3 ; 3 XOR 2 XOR 0
EOR r4, r4, r0, ROR #5 ; 5 XOR 3 XOR 2 XOR 0
EOR r4, r4, r0, ROR #6 ; 6 XOR 5 XOR 3 XOR 2 XOR 0
AND r4, r4, #1 ; isolate bit
ORR r2, r2, r4, LSL #1 ; 7 6 5 4 3 2 c1 c0
;
; calculate c2 using bits 76543210
; * ***
ROR r4, r0, #1 ; get bit 1
EOR r4, r4, r0, ROR #2 ; 2 XOR 1
EOR r4, r4, r0, ROR #3 ; 3 XOR 2 XOR 1
EOR r4, r4, r0, ROR #7 ; 7 XOR 3 XOR 2 XOR 1
AND r4, r4, #1 ; isolate bit
ORR r2, r2, r4, ROR #29 ; 7 6 5 4 c2 2 c1 c0
;
; calculate c3 using bits 76543210
; ****
ROR r4, r0, #4 ; get bit 4
EOR r4, r4, r0, ROR #5 ; 5 XOR 4
EOR r4, r4, r0, ROR #6 ; 6 XOR 5 XOR 4
EOR r4, r4, r0, ROR #7 ; 7 XOR 6 XOR 5 XOR 4

33
AND r4, r4, #1
;
; build the final 12-bit result
;
ORR r2, r2, r4, ROR #25 ; rotate left 7 bits
AND r4, r0, #1 ; get bit 0 from original
ORR r2, r2, r4, LSL #2 ; add bit 0 into final
BIC r4, r0, #0xF1 ; get bits 3,2,1
ORR r2, r2, r4, LSL #3 ; add bits 3,2,1 to final
BIC r4, r0, #0x0F ; get upper nibble
ORR r2, r2, r4, LSL #4 ; r2 now contains 12 bits
; with checksums

; for this example, r2 holds the Hamming code 0xA6B

; now, corrupt it with a few hypothetical transmission errors

BIC r2, r2, #0xA ; now r2 holds corrupted


; hamming code 0xA61
;BIC r2, r2, #0x9 ; now r2 holds corrupted
; hamming code 0xA62
ADR r1, hammingCodeHolder
STR r2, [r1] ; store it in memory

; construct corrupt data from corrupt Hamming code

LDR r2, [r1] ; load corrupted hamming code


MOV r0, #0 ; clear out register to create
; data from corrupted Hamming code
MOV r7, #0 ; clear out register to create
; corrupted checksum bit indicator
MOV r8, #0 ; clear out register to create
; corrupted checksum bit counter

ANDS r5, r2, #0x4 ; d0 in corrupted Hamming code = 1?


ADDNE r0, r0, #0x1 ; if so, d0 = 1 in corrupted data
ANDS r5, r2, #0x10 ; d1 in corrupted Hamming code = 1?
ADDNE r0, r0, #0x2 ; if so, d1 = 1 in corrupted data
ANDS r5, r2, #0x20 ; d2 in corrupted Hamming code = 1?
ADDNE r0, r0, #0x4 ; if so, d2 = 1 in corrupted data
ANDS r5, r2, #0x40 ; d3 in corrupted Hamming code = 1?
ADDNE r0, r0, #0x8 ; if so, d3 = 1 in corrupted data
ANDS r5, r2, #0x100 ; d4 in corrupted Hamming code = 1?
ADDNE r0, r0, #0x10 ; if so, d4 = 1 in corrupted data
ANDS r5, r2, #0x200 ; d5 in corrupted Hamming code = 1?
ADDNE r0, r0, #0x20 ; if so, d5 = 1 in corrupted data
ANDS r5, r2, #0x400 ; d6 in corrupted Hamming code = 1?
ADDNE r0, r0, #0x40 ; if so, d6 = 1 in corrupted data
ANDS r5, r2, #0x800 ; d7 in corrupted Hamming code = 1?
ADDNE r0, r0, #0x80 ; if so, d7 = 1 in corrupted data

34
; now, r0 = corrupted data (extracted from corrupted Hamming
; code)

; check if c0 corrupted
AND r3, r2, #0x1 ; r3 = c0 from corrupted Hamming code
MOV r4, r0 ; make a copy
EOR r4, r4, r0, ROR #1 ; 1 XOR 0
EOR r4, r4, r0, ROR #3 ; 3 XOR 1 XOR 0
EOR r4, r4, r0, ROR #4 ; 4 XOR 3 XOR 1 XOR 0
EOR r4, r4, r0, ROR #6 ; 6 XOR 4 XOR 3 XOR 1 XOR 0
AND r4, r4, #1 ; isolate bit
EORS r4, r4, r3 ; 6 XOR 4 XOR 3 XOR 1 XOR 0 XOR c0
ADDNE r7, r7, #0x1 ; if not equal to 0, c0 is corrupted
ADDNE r8, r8, #1 ; increment corrupted checksum counter

; check if c1 corrupted
AND r3, r2, #0x2
LSR r3, r3, #1 ; r3 = c1 from corrupted Hamming code
MOV r4, r0 ; make a copy
EOR r4, r4, r0, ROR #2 ; 2 XOR 0
EOR r4, r4, r0, ROR #3 ; 3 XOR 2 XOR 0
EOR r4, r4, r0, ROR #5 ; 5 XOR 3 XOR 2 XOR 0
EOR r4, r4, r0, ROR #6 ; 6 XOR 5 XOR 3 XOR 2 XOR 0
AND r4, r4, #1 ; isolate bit
EORS r4, r4, r3 ; 6 XOR 5 XOR 3 XOR 2 XOR 0 XOR c1
ADDNE r7, r7, #0x2 ; if not equal to 0, c1 is corrupted
ADDNE r8, r8, #1 ; increment corrupted checksum counter

; check if c2 corrupted
AND r3, r2, #0x8
LSR r3, r3, #3 ; r3 = c2 from corrupted Hamming code
ROR r4, r0, #1 ; get bit 1
EOR r4, r4, r0, ROR #2 ; 2 XOR 1
EOR r4, r4, r0, ROR #3 ; 3 XOR 2 XOR 1
EOR r4, r4, r0, ROR #7 ; 7 XOR 3 XOR 2 XOR 1
AND r4, r4, #1 ; isolate bit
EORS r4, r4, r3 ; 7 XOR 3 XOR 2 XOR 1 XOR c2
ADDNE r7, r7, #0x4 ; if not equal to 0, c2 is corrupted
ADDNE r8, r8, #1 ; increment corrupted checksum counter

; check if c3 corrupted
AND r3, r2, #0x80
LSR r3, r3, #7 ; r3 = c3 from corrupted Hamming code
ROR r4, r0, #4 ; get bit 4
EOR r4, r4, r0, ROR #5 ; 5 XOR 4
EOR r4, r4, r0, ROR #6 ; 6 XOR 5 XOR 4
EOR r4, r4, r0, ROR #7 ; 7 XOR 6 XOR 5 XOR 4
AND r4, r4, #1 ; isolate bit
EORS r4, r4, r3 ; 7 XOR 6 XOR 5 XOR 4 XOR c3
ADDNE r7, r7, #0x8; if not equal to zero, c3 is corrupted
ADDNE r8, r8, #1 ; increment corrupted checksum counter

35
; now, r7 bits indicate corrupted checksum bits from corrupted
; Hamming code

CMP r8, #2 ; was there two checksum bits corrupted?


MOVNE r6, r0 ; if not, data is fine
BNE done

CMP r7, #0x3 ; if so, was it c1 and c0?


EOREQ r6, r0, #0x1 ; if so, bit 0 should be changed
CMP r7, #0x5 ; if so, was it c2 and c0?
EOREQ r6, r0, #0x2 ; if so, bit 1 should be changed
CMP r7, #0x6 ; if so, was it c2 and c1?
EOREQ r6, r0, #0x4 ; if so, bit 2 should be changed
CMP r7, #0x9 ; if so, was it c3 and c0?
EOREQ r6, r0, #0x10 ; if so, bit 4 should be changed
CMP r7, #0xA ; if so, was it c3 and c1?
EOREQ r6, r0, #0x20 ; if so, bit 5 should be changed
CMP r7, #0xC ; if so, was it c3 and c2?
EOREQ r6, r0, #0x80 ; if so, bit 7 should be changed

done B done

data DCD 0xAC


hammingCodeHolder
DCD 0x0

END

8. Write a program to calculate π x 48.9 in Q10 notation.

AREA FRACTIONAL, CODE


ENTRY

LDR r0, =0xC90 ; load pi in Q10


LDR r1, =0xC399 ; load 48.9 in Q10
MUL r2, r0, r1 ; signed multiply, result in Q20

END

9. Show the representation of sin(82°) in Q15 notation.

0x7EC1

10. Show the representation of sin(193°) in Q15 notation.

0xE335

36
11. Temperature conversion between Celsius and Fahrenheit can be computed using the

relationship

5
C ( F  32)
9

where C and F are in degrees. Write a program that converts a Celsius value in register r0 to

degrees Fahrenheit. Convert the fraction into a Q15 representation and use multiplication instead

of division in your routine. Load your test value from a memory location called CELS and store

the result in memory labeled FAHR. Remember that you will need to specify the starting address

of RAM for the microcontroller that you use in simulation. For example, the LPC2132

microcontroller has SRAM starting at address 0x40000000.

GLOBAL Reset_Handler
AREA Reset, CODE, READONLY
ENTRY

; certain assumptions are made here:


; the temperature is between 0 and 100 degrees C
; the result truncates the fractional portion, so it's
; a whole number in degrees F, without rounding up, etc.
; You can make it slightly more accurate if you wish.

FAHR EQU 0x40000000 ; for LPC2102, start of RAM


CELS EQU 0xE ; Celsius value in Q0

Reset_Handler

LDR r0, =CELS ; load Celsius value in Q15


LDR r4, =FAHR ; load address of FAHR
LDR r1, =0xE666 ; load 1.8 in Q15
MUL r3, r0, r1 ; signed multiply, result in Q15
LDR r2, =0x100000 ; load 32 in Q15
ADD r3, r3, r2 ; add 32
LSR r3, r3, #15 ; shift right 15 to convert to Q0

STR r3, [r4] ; store result in FAHR

Done
B Done

END

37
12. Write a program for either the ARM7TDMI or the Cortex-M4 that counts the number of ones

in a 32-bit value. Store the result in register r3.

AREA CountOnes, CODE


ENTRY

LDR r0, =VALUE


LDR r1, =32 ; 32-bit loop counter
LDR r3, =0x0 ; count

Loop CMP r0, #0 ; 1 in MSB?


ADDLT r3, r3, #1 ; if so, increment count
LSL r0, r0, #1 ; shift out MSB
SUBS r1, r1, #1
BNE Loop

Done B Done

VALUE EQU 0xFFFFFFFF

END

13. Using the ARM Architectural Reference Manual (or the Keil or CCS tools), give the bit

pattern for the following instructions:

a. RSB r0, r3, r2, LSL #2

b. SMLAL r3, r8, r2, r4

c. ADD r0, r0, r1, LSL #7

a. instruction encoding = 0xE0630102

b. instruction encoding = 0xE0E83492

c. instruction encoding = 0xE0800381

14. A common task that microcontrollers perform is ASCII-to-binary conversion. If you press a

number on a keypad, for example, the processor receives the ASCII representation of that

number, not the binary representation. A small routine is necessary to convert the data into binary

for use in other arithmetic operations. Looking at the ASCII table in Appendix C, you will notice

that the digits 0 through 9 are represented with the ASCII codes 0x30 to 0x39. The digits A

38
through F are coded as 0x41 through 0x46. Since there is a break in the ranges, it’s necessary to

do the conversion using two checks.

The algorithm to do the conversion is:

Mask away the parity bit (bit 7 of the ASCII representation), since we don’t care about it.

Subtract a bias away from the ASCII value.

Test to see if the digit is between 0 and 9.

If so, we’re done. Otherwise subtract 7 to find the value.

Write this routine in assembly. You may assume that the ASCII representation is a valid character

between 0 and F.

AREA AsciiToBinary, CODE


ENTRY

LDR r0, =0x39 ; load ASCII value


BIC r0, r0, #0x80 ; mask bit 7
SUB r0, r0, #"0" ; subtract bias
CMP r0, #0xA ; was ASCII value < 0xA?
SUBGE r0, r0, #7 ; if not, subtract 7

Done B Done

END

15. Write four different instructions which clear register r7 to zero.

Choose from:

AND r7, r7, #0

EOR r7, r7, r7

BIC r7, r7, #0xFFFFFFFF

MOV r7, #0x0

SUB r7, r7, r7

39
16. Suppose register r0 contains the value 0xBBFF0000. Give the Thumb-2 instruction and

register value for r1 that would insert the value 0x7777 into the lower half of register r0, so that

the final value is 0xBBFF7777.

BFI r0, r1, #0, #16 ; r1 = 0x7777

17. Write ARM instructions which set bits 0, 4, and 12 in register r6 and leave the remaining bits

unchanged.

ORR r6, r6, #0x11


ORR r6, r6, #0x1, 20

18. Write a program that converts a binary value between 0 and 15 into its ASCII representation.

See Exercise 14 for background information.

AREA BinaryToAscii, CODE


ENTRY

LDR r0, =0xF ; load binary value


CMP r0, #0xA ; is number less than 0xA?
BLT ADD0
ADD r0, r0, #"A"-"0"-0xA ; add offset ‘A’ to ‘F’

ADD0 ADD r0, r0, #"0" ; convert to ASCII

Done B Done

END

19. Assume that a signed long multiplication instruction is not available on the ARM7TDMI.

Write a program that performs a 32x32 multiplication, producing a 64-bit result, using only

UMULL and logical operations (such as MVN for inversions, EOR, ORR, etc.) Run the program

to verify its operation.

AREA Lab275, CODE, READONLY


ENTRY

LDR r0, =0x80000000


LDR r1, =0x80000000
MOV r5, #0 ; number of negative operands counter

40
CMP r0, #0 ; r0 negative?
ADDLT r5, r5, #1 ; if so, increment neg operands counter
RSBLT r0, r0, #0 ; and make positive
CMP r1, #0 ; r1 negative?
ADDLT r5, r5, #1 ; if so, increment neg operands counter
RSBLT r1, r1, #0 ; and make positive
UMULL r2, r3, r0, r1 ; perform multiply
CMP r5, #1 ; was only one operand negative?
BEQ ResultNeg ; if so, result is negative

Done B Done

ResultNeg
RSBS r2, r2, #0 ; subtract LO from #0, set carry flag
RSC r3, r3, #0 ; subtract HI from #0 with carry
B Done

END

20. Write a program to add 128-bit numbers together, placing the result in registers r0, r1, r2 and

r3. The first operand should be placed in registers r4, r5, r6 and r7, and the second operand should

be in registers r8, r9, r10 and r11.

AREA Add128, CODE, READONLY


ENTRY

LDR r4, =0xFF556000 ; r4 = op1[31:0]


LDR r5, =0x00001000 ; r5 = op1[63:32]
LDR r6, =0x77666677 ; r6 = op1[95:64]
LDR r7, =0x00887771 ; r7 = op1[127:96]

LDR r8, =0x00056000 ; r8 = op2[31:0]


LDR r9, =0x05501000 ; r9 = op2[63:32]
LDR r10, =0x89668677 ; r10 = op2[95:64]
LDR r11, =0x00800071 ; r11 = op2[127:96]

ADDS r0, r4, r8 ; r0 = op1[31:0] + op2[31:0]

ADCS r1, r5, r9 ; r1 = op1[63:32] + op2[63:32]


ADCS r2, r6, r10 ; r2 = op1[95:64] + op2[95:64]
ADC r3, r7, r11 ; r3 = op1[127:96] + op2[127:96]

Done B Done

END

41
21. Write a program that takes character data ‘a’ through ‘z’ and returns the character in

uppercase.

AREA Upper, CODE, READONLY


ENTRY

LDR r0, =char


AND r0, r0, #0xDF ; clear bit 5

Done B Done

char EQU 0x7A ; ‘z’

END

22. Give three different methods to test the equivalence of two values held in registers r0 and r1.

1) CMP r0, r1 (then test if z flag is set)

2) TEQ r0, r1 (then test if z flag is set)

3) EORS r2, r0, r1 (then test if z flag is set)

23. Write assembly code for the ARM7TDMI to perform the following signed division:

r1= r0/16

MOV r1, r0, ASR #4 ; r1 = signed (r0/16)

24. Multiply 0xFFFFFFFF (-1 in a two’s complement representation) and 0x80000000 (the

largest negative number in a 32-bit two’s complement representation) on the ARM7TDMI. Use

the MUL instruction. What value do you get? Does this number make sense? Why or why not?

Result is 0x80000000 = -2147483648 in decimal. This looks odd, but from the machine’s

standpoint, it’s the result of multiplying two 32-bit numbers together and then truncating the

result. The 64-bit result would have been 7FFFFFFF80000000 (remember the machine hasn’t

been told the numbers are signed or unsigned – it just blindly multiplies them). When the result

was effectively truncated to 32 bits, the lower 32 bits happen to be 0x80000000.

42
25. A Gray code is an ordering of 2n binary numbers such that only one bit changes from one

entry to the next. One example of a 2-bit Gray code is b10 11 01 002. The spaces in this example

are for readability. Write ARM assembly to turn a 2-bit Gray code held in r1 into a 3-bit Gray

code in r2. Note that the 2-bit Gray code occupies only bits [7:0] of r1, and the 3-bit Gray code

occupies only bits [23:0] of r2. You can ignore the leading zeros. One way to build an n-bit Gray

code from an (n – 1)-bit Gray code is to prefix every (n – 1)-bit element of the code with 0 and

have those be the high codes (top half of the entire code). Then create the low half of the n-bit

Gray code elements by taking each (n – 1)-bit Gray code element in reverse order and prefixing it

with a one. For example, the 2-bit Gray code above becomes

b010 011 001 000 100 101 111 110.

; Gray Codes - Definition: An ordering of 2^n binary numbers


; such that only one bit changes from one entry to the next.
; This code takes a Gray Code for 2 bits (r1), turns it into a
; 3-;bit Gray Code. The 3-bit Gray Code shall be held in r2
; after program termination. One should note that a 2-bit Gray
; Code will only occupy bits [7:0] of an ARM register and a 3-
; bit Gray Code will only occupy bits [23:0] of an ARM register.

AREA GrayCodes, CODE, READONLY


ENTRY

; 1001 0010 0100 used as operand to prefix 2-bit elements with


; '1'

LDR r12, =0x924

; mask out individual 2-bit codes

AND r3, r1, #0xC0 ; r3[7:6] = 2-bit Gray code[7:6]


AND r4, r1, #0x30 ; r4[5:4] = 2-bit Gray code[4:5]
AND r5, r1, #0xC ; r5[3:2] = 2-bit Gray code[3:2]
AND r6, r1, #0x3 ; r6[1:0] = 2-bit Gray code[1:0]

; do same, these will be used to reverse order and place in lower


; half of final 3-bit Gray code

AND r7, r1, #0x3 ; r7[1:0] = 2-bit Gray code[1:0]

43
AND r8, r1, #0xC ; r8[3:2] = 2-bit Gray code[3:2]

AND r9, r1, #0x30 ; r9[5:4] = 2-bit Gray code[5:4]


AND r10, r1, #0xC0 ;r10[7:6] = 2-bit Gray code[7:6]

LSL r3, r3, #3 ; shift r3[7:6] to r3[10:9]


LSL r4, r4, #2 ; shift r4[5:4] to r4[7:6]
LSL r5, r5, #1 ; shift r5[3:2] to r5[4:3]

; put individual 2-bit Gray code elements in reverse order in


; lower half of final 3-bit Gray code

LSL r7, r7, #9 ; shift r7[1:0] to r7[10:9]


LSL r8, r8, #4 ; shift r8[3:2] to r8[7:6]
LSR r9, r9, #1 ; shift r9[5:4] to r9[4:3]
LSR r10, r10, #6 ; shift r10[7:6] to r10[1:0]

; begin to construct 3-bit Gray Code

ADD r3, r3, r4


ADD r3, r3, r5
ADD r3, r3, r6

; r3 now equals 0 [10:9] 0 [7:6] 0 [4:3] 0 [1:0], prefixing each


; 2-bit Gray code element with '0'

ADD r7, r7, r8


ADD r7, r7, r9
ADD r7, r7, r10

; r7 now equals 0 [10:9] 0 [7:6] 0 [4:3] 0 [1:0], prefixing each


; 2-bit Gray code element with '0'

ADD r7, r7, r12 ; r7 = 1 [10:9] 1 [7:6] 1 [4:3] 1 [1:0]

LSL r3, r3, #12 ; r3 = 0 [22:21] 0 [19:18]


; 0 [16:15] 0 [13:12]
ADD r2, r3, r7 ; now put piece the two together

Done B Done

END

26. Write a program that calculates the area of a circle. Register r0 will contain the radius of the

circle in Q3 notation. Represent π in Q10 notation, and store the result in register r3 in Q3

notation.

AREA AreaCircle, CODE, READONLY


ENTRY

44
LDR r0, =0x18 ; r0 = radius in Q3 (radius = 3 here)
LDR r1, =0xC90 ; r1 = pi in Q10

MUL r3, r0, r0 ; r3 = r^2 in Q6


MUL r3, r1, r3 ; r3 = pi*r^2 in Q16
LSR r3, r3, #13 ; convert r3 from Q16 to Q3
Done B Done

END

45
Chapter 8
1. Code the following IF-THEN statement using Thumb-2 instructions:

if (r2 != r7)
r2 = r2 – r7;
else
r2 = r2 + r4;

CMP r2, r7
ITE NE
SUBNE r2, r2, r7
ADDEQ r2, r2, r4

2. Write a routine for the ARM7TDMI that reverses the bits in a register, so that a register

containing d31d30d29…d1d0 now contains d0d1…d29d30d31. Compare this to the instruction RBIT on

the Cortex-M4.

AREA ReverseReg, CODE, READONLY


ENTRY

LDR r0, =0x11111111 ; r0 = register to be reversed


LDR r1, =32 ; 32-bit register bit counter
MOV r2, #0 ; r2 = result (will be r0 reversed)

Loop AND r3, r0, #1 ; r3 = current LSB of register to be


; reversed
ADD r2, r3, r2, LSL #1 ; shift result left 1, new
; result LSB is r3
LSR r0, r0, #1 ; shift right to look at new LSB
; of register to be reversed
SUBS r1, r1, #1 ; loop 32 times
BNE Loop

Done B Done

END

3. Code the GCD algorithm given in Section 8.4.1 using Thumb-2 instructions.

gcd CMP r0, r1


IT GT
SUBGT r0, r0, r1
IT LT
SUBLT r1, r1, r0
BNE gcd

46
But note also that this code only takes up 12 bytes of code space, less than the 16 bytes of ARM

code!

4. Find the maximum value in a list of 32-bit values located in memory. Assume the values are in

two’s complement representations. Your program should have 50 values in the list.

AREA MaxVal, CODE, READONLY


ENTRY

LDR r0, =49 ; initialize loop counter, r0 (length


; of elements - 1)
ADR r2, Values ; r2 = pointer to values

Loop LDR r3, [r2], #4 ; r3 = 1st element for comparison,


; increment pointer
LDR r4, [r2] ; r4 = 2nd element for comparison
CMP r3, r4 ; 1st > 2nd ?
STRGT r3, [r2] ; if so, store 1st in location of 2nd
SUBS r0, r0, #1 ; decrement counter
BNE Loop

LDR r1, [r2] ; result is at location in r2, load


; into r1

Done B Done

Values
DCD 1, 2, 4, 5, 6, 7, -8, 11, 23, 100
DCD 4, 5, 6, 2, 3, 4, 5, 13, 45, 200
DCD 5, -4, 888, 2, 1, 2, 7, 88, 45, 444
DCD 3, 4, -5, 1, 0, 2, 0, 23, -23, 111
DCD 3, 4, 5, 6, 7, -8, 9, 33, 22, 666

END

5. Write a parity checker routine that examines a byte in memory for correct parity. For even

parity, the number of ones in a byte should be an even number. For odd parity, the number of

ones should be an odd number. Create two small blocks of data, one assumed to have even parity

and the other assumed to have odd parity. Introduce errors in both sets of data, writing the value

0xDEADDEAD into register r0 when an error occurs.

AREA ParityChecker, CODE, READONLY

47
ENTRY

ADR r0, Even ; load address of Even


LDRB r1, [r0] ; r1 = Even
ADR r0, Odd ; load address of Odd
LDRB r2, [r0] ; r2 = Odd

LDR r3, =0 ; bit counter for r1


LDR r4, =0 ; bit counter for r2

LDR r5, =8 ; loop counter

Loop ANDS r6, r1, #1 ; current LSB a 1?


ADDNE r3, r3, #1 ; if so, increment bit
; counter for r1
LSR r1, r1, #1 ; shift left 1 bit
ANDS r6, r2, #1 ; current MSB a 1?
ADDNE r4, r4, #1 ; if so, increment bit
; counter for r2
LSR r2, r2, #1 ; shift left 1 bit
SUBS r5, r5, #1 ; decrement counter
BNE Loop

ANDS r3, r3, #1 ; is LSB of Even counter 1?


LDRNE r0, =0xDEADDEAD ; if so, parity error
ANDS r4, r4, #1 ; is LSB of Odd counter 1?
LDREQ r0, =0xDEADDEAD ; if not, parity error

Done B Done

Even DCB 0xFF


Odd DCB 0x80

END

6. Compare the code sizes (in bytes) for the GCD routines in Section 8.4.1, where one is written

using conditional execution and one is written using branches. (PS – both use both!)

Larger version: 28 bytes

Smaller version: 16 bytes

7. Digital signal processors make frequent use of Finite Impulse Response filters. The output of

the filter, y (n), can be described as a weighted sum of past and present input samples, or

48
N 1
y (n)   h(m) x(n  m)
m 0

where the coefficients h (m) are calculated knowing something about the type of filter you want.

A linear phase FIR filter has the property that its coefficients are symmetrical. Suppose that N is

7, and the values for h (m) are given as

h(0) = h(6) = -0.032

h(1) = h(5) = 0.038

h(2) = h(4) = 0.048

h(3) = -0.048

Use the sample data x (n) below.

SAMPLE DCW 0x0034,0x0024,0x0012,0x0010


DCW 0x0120,0x0142,0x0030,0x0294

Write an assembly language program to compute just one output value, y(8), placing the result in

register r1. You can assume that x(8) starts at the lowest address in memory and that x(7), x(6),

etc. follow as memory addresses increase. The coefficients should be converted to Q15 notation,

and the input and output values are in Q0 notation.

GLOBAL Reset_Handler
AREA Reset, CODE, READONLY
ENTRY

; most DSP processors have dedicated hardware for implementing


; efficient circular queues specifically for filters,
; but just to demonstrate a simple loop, the following code
; could be used.
; y(8) = h(0)x(8) + h(1)x(7) + ... + h(6)x(2)
; Note that the output is still in Q15 notation

Reset_Handler

LDR r1, =7 ; 32-bit register bit counter

49
ADR r2, COEFF ; start of coefficients in memory
ADR r0, SAMPLE ; start of sample data in memory
MOV r7, #0 ; the accumulator

Loop
LDRSH r4, [r2], #2 ; bring in one coefficient
LDRSH r5, [r0], #2 ; bring in one sample
MUL r6, r5, r4 ; h(n)*x(n-m)
ADD r7, r7, r6 ; result still in Q15 format
SUBS r1, r1, #1 ; loop 7 times
BNE Loop

Done B Done

COEFF DCW 0xFBE8 ; -0.032


DCW 0x04DD ; 0.038
DCW 0x0624 ; 0.048
DCW 0xF9DC ; -0.048
DCW 0x0624
DCW 0x04FF
DCW 0xFBE8
SAMPLE DCW 0x0034, 0x0024, 0x0012, 0x0010
; x(8), x(7), x(6), x(5)
DCW 0x0120, 0x0142, 0x0030, 0x0294
; x(4), x(3), x(2), x(1)
; Note: x(1) isn't needed, really...
END

8. Write a routine to reverse the word order in a block of memory. The block contains 32 words

of data.

AREA ReverseMem, CODE, READONLY


num EQU 32 ; number of words for block copy

ENTRY

start
LDR r0, =Values ; r0 = ptr to source for block copy
LDR r1, =Temp ; r1 = ptr to destination for block copy
MOV r2, #num ; r2 = number of words for block copy

copy LDR r3, [r0], #4 ; a word from Values


STR r3, [r1], #4 ; store a word to Temp
SUBS r2, r2, #1 ; decrement counter
BNE copy ; ... copy more

LDR r0, =Values ; reset pointer to start of Values


MOV r2, #num ; reset counter
SUB r1, r1, #4 ; adjust Temp to point to last word

50
copyRev
LDR r3, [r1], #-4 ; word from Temp starting at the end
STR r3, [r0], #4 ; str a word to Values, from beginning
SUBS r2, r2, #1
BNE copyRev

Done B Done

AREA Block, DATA, READWRITE

Values
DCD 1, 2, 4, 5, 6, 7, -8, 11, 23, 5
DCD 4, 5, 6, 2, 3, 4, 5, 13, 45, 7
DCD 5, -4, 8, 2, 1, 2, 7, 6, 45, 44
DCD 3, 4

Temp DCD 0,0,0,0,0,0,0,0,0,0


DCD 0,0,0,0,0,0,0,0,0,0
DCD 0,0

END

9. Translate the following conditions into a single ARM instruction:

a. Add registers r3 and r6 only if N is clear. Store the result in register r7.

ADDNE r7, r3, r6

b. Multiply registers r7 and r12, putting the results in register r3 only if C is set and Z is clear.

MULHI r3, r7, r12

c. Compare registers r6 and r8 only if Z is clear.

CMPNE r8, r8

10. The following is a simple C function that returns 0 if (x+y)<0 and returns 1 otherwise:

int foo(int x, int y){


if (x+y < 0)
return 0;
else
return 1;
}

51
Suppose that a compiler translated it into the following assembly:

foo ADDS r0, r0, r1


BPL PosOrZ
done MOV r0, #0
MOV pc, lr
PosOrZ MOV r0, r1
B done

This is inefficient. Rewrite the assembly code for the ARM7TDMI using only four instructions

(hint: use conditional execution).

foo ADDS r0, r0, r1 ; r1 = x + y, setting CCs


MOVPL r0, #1 ; return 1 if n bit = 0
MOVMI r0, #0 ; return 0 if n bit = 1
MOV pc, lr

11. Write Example 7.10 (finding the absolute value of a number) for the Cortex-M4.

MOVS r1, r0
IT LT
RSBLT r1, r1, #0

12. What instructions are actually assembled if you type the following lines of code for the

Cortex-M4 into the Keil assembler, and why?

CMP r3, #0
ADDEQ r2, r2, r1

CMP r3, #0
IT EQ
ADDS r2, r2, r1

The assembler will automatically add the IT instruction for you to create conditional

execution. Note that you get an ADDS instruction because the lower registers are used here (r1

and r2), which force the instruction to set condition codes in Thumb code. If you were using high

registers, then a plain ADD would have been used instead.

52
Chapter 9
1. Represent the following values in half-precision, single-precision, and double-
precision.

a. 1.5
0x3e00, 0x3fc00000, 0x3ff80000_00000000
b. 3.0
0x4200, 0x40400000, 0x40080000_00000000
c. -4.5
0xc480, 0xc0900000, 0xc0120000_00000000
d. -0.46875
0xb780, 0xbef00000, 0xbfde0000_00000000
e. 129
0x5808, 0x43010000, 0x40602000_00000000
f. -32768
0xf800, 0xc7000000, 0xc0e00000_00000000

2. Write a program in a high-level language to take as input a value in the form (-)x.y and
convert the value to single-precision and double-precision values.

Note to instructor: Output of single-precision in hexadecimal is straightforward with a


simple cast. Output of double-precision is a bit more complicated. Both may need a short
lesson on pointers in C. The output of double-precision may be omitted without any loss
of usefulness in the program.

/*************************************
* SimpleConv
* Convert input in decimal format
* (-)x(.x) to single-precision
* and double-precision floating-point
* CNH 4/2014
*************************************/

#include <stdio.h>

int main()
{

char xinline[81];
float F_in;
double D_in;
double *pD_in = &D_in;
int *pD_inh = (int *)pD_in+1;

while(1) {
printf("Enter input in decimal (-)x(.x): ");
if(scanf("%le", &D_in) == 1) {
F_in = D_in;
printf("Output: Single-precision: %08x, Double-precision:
%08x_%08x\n", *(int *)&F_in, *pD_inh, *(int *)&D_in);

53
}
else { return 0;
}
}
}

3. Using the program from Exercise 2 (or a converter on the internet), convert the
following values to single-precision and double-precision.

a. 65489
Output: Single-precision: 477fd100, Double-precision: 40effa20_00000000
b. 2147483648
Output: Single-precision: 4f000000, Double-precision: 41e00000_00000000
c. 229
Output: Single-precision: 4e000000, Double-precision: 41c00000_00000000
d. -0.38845
Output: Single-precision: bec6e2eb, Double-precision: bfd8dc5d_63886595
e. 0.0004529
Output: Single-precision: 39ed7336, Double-precision: 3f3dae66_b0384190
f. 11406
Output: Single-precision: 46323800, Double-precision: 40c64700_00000000
g. -57330.67
Output: Single-precision: c75ff2ac, Double-precision: c0ebfe55_70a3d70a

4. Expand the program in Exercise 2 to output half-precision values. Test your output on
the values from Exercise 3. Which would fit in half-precision?

See note in Exercise 2.


/*************************************
* SimpleConvHalf
* Convert input in decimal format
* (-)x(.x) to half-presicion,
* single-precision and double-precision
* floating-point
* CNH 4/2014
*************************************/

#include <stdio.h>
short ConvertToHalf(float);

int main()
{
char xinline[81];
char *token;
float F_in;
double D_in;
double *pD_in = &D_in;
int *pD_inh = (int *)pD_in+1;
short Half;

while(1) {
printf("Enter input in decimal (-)x(.x): ");

54
if(scanf("%le", &D_in) == 1) {
F_in = D_in;
Half = ConvertToHalf(F_in);
printf("Output: Half-Precision: %04x, Single-precision: %08x,
Double-precision: %08x_%08x\n", (Half & 0xffff), *(int *)&F_in,
*pD_inh, *(int *)&D_in);
}
else { return 0;
}
}
}

short ConvertToHalf(float F_input) {


char C_Output[5];

short W_Exponent, W_Half;


int I_Sign, I_Fraction, I_Half_Fraction, I_LSB, I_Guard, I_Sticky,
I_Round;

union {
int I_Single ;
float F_Single ;
} Single ;

Single.F_Single = F_input;
// Extract bits of the SP values and shift them to the right
W_Exponent = ((Single.I_Single & 0x7F800000) >> 23) - 127;
I_Fraction = (Single.I_Single & 0x007FFFFF) | 0x00800000;
I_Half_Fraction = (I_Fraction & 0x7FE000) >> 13;
I_Sign = (Single.I_Single & 0x80000000) >> 31;
I_LSB = I_Fraction & 0x2000;
I_Guard = I_Fraction & 0x1000;
I_Sticky = I_Fraction & 0xFFF;
I_Round = ((I_LSB > 0) & (I_Guard > 0))
| ((I_Guard > 0) & (I_Sticky > 0));
if (I_Round > 0) { I_Half_Fraction++; }
if (I_Half_Fraction == 0x800) { // overflow!
I_Half_Fraction = I_Half_Fraction >> 1;
W_Exponent++;
}
if (W_Exponent > 15 ) {
if (I_Sign) { W_Half = 0xfc00; }
else {W_Half = 0x7c00; }
} else if (W_Exponent < -16) {
W_Half = (I_Sign << 15 | 0);
} else {
W_Half = (I_Sign << 15)
| ((W_Exponent + 15) << 10)
| (I_Half_Fraction & 0x7FF)
;
}
return W_Half;
}

55
5. Write a program in a high-level language to take as input a single-precision value in
the form 0xffffffff (where f is a hexadecimal value) and convert the input to
decimal.

/*************************************
* ConvHexToSP
* Convert input in hexadecimal format
* to decimal.
* CNH 4/2014
*************************************/

#include <stdio.h>

int main()
{

union {
int I_Single;
float F_Single;
} IntSP;

printf("Enter input in hexadecimal 0xXXXXXXXX: ");


scanf("0x%8x", &IntSP.I_Single);
printf("Output: Single-Precison: %f\n", IntSP.F_Single);
}

6. Using the program from Exercise 5 (or a converter on the internet), convert the
following single-precision values to decimal. Identify the class of value for each input.
If the input is NaN, give the payload as the value, and NaN type in the class field.

Single-precision value Value Class


a. 0x3fc00000 1.5 Normal
b. 0x807345ff -1.058618e-38 Subnormal
c. 0x7f350000 2.405903e+38 Normal
d. 0xffffffff 0x7fffff NaN
e. 0x20000000 1.084202e-19 Normal
f. 0x7f800000 Inf Inf
g. 0xff800ffe 0x000ffe NaN
h. 0x42c80000 1.00e+02 Normal
i. 0x4d800000 2.684355e+08 Normal
j. 0x80000000 -0.0 Zero

7. What value would you write to the FPSCR to set the following conditions?

a. FZ unset, DN unset, roundTowardZero rounding mode


0x00C00000
b. FZ set, DN unset, roundTowardPositive rounding mode
0x01400000
c. FZ set, DN set, roundTiesToEven rounding mode
0x03000000

56
8. Complete the following table for each of the FPSCR values shown below.

N Z C V DN FZ RMode IDC IXC UFC OFC DZC IOC


0x41c00010 0 1 0 0 0 1 RZ 0 1 0 0 0 0
0x10000001 0 0 0 1 0 0 RN 0 0 0 0 0 1
0xc2800014 1 1 0 0 1 0 RM 0 1 0 1 0 0

9. Give the instructions to load the following values to FPU register S3.

a. 5.75 x 103

VLDR.F32 s3,=5.75e3

b. 147.225

VLDR.F32 s3,=147.225

c. -9475.376

VLDR.F32 s3,=-9475.376

d. -100.6 x 10-8

VLDR.F32 s3,=-1.006e-10

10. Give the instructions to perform the following load and store operations.

a. Load the 32-bit single-precision value at the address in R4 into S12.

VMOV.F32 s12, [r4]

b. Load the 32-bit single-precision value in R6 to S15. Repeat for a store of the value in
S15 to R6.

VMOV.F32 s15, r6
VMOV.F32 r6, s15

c. Store the 32-bit value in S4 to memory at the address in R8 with an offset of 16 bytes.

VMOV.F32 [r8, #16]

d. Store the 32-bit constant 0xFFFFFFFF to S28.

VLDR.F32 s28, =0f_FFFFFFFF

57
11. Give the instructions to perform a conversion of four fixed-point data in unsigned 8.8
format stored in S8 to S11 to single-precision format.

VCVT.U16.F32 s8, s8, #8


VCVT.U16.F32 s9, s9, #8
VCVT.U16.F32 s10, s10, #8
VCVT.U16.F32 s11, s11, #8

12. What instruction would you use to convert a half-precision value in the lower half of
register S5 to a single-precision value, and store the result in S2?

VCVTB.F32.F16 s2, s5

13. Give the instructions to load 8 single-precision values at address 0x40000100 to FPU
registers S8-S15.

FPDATA_BASE EQU 0x40000100

LDR r0,=#FPDATA_BASE
VLDR.32 s8, [r0]
VLDR.32 s9, [r0, #4]
VLDR.32 s10, [r0, #8]
VLDR.32 s11, [r0, #12]
VLDR.32 s12, [r0, #16]
VLDR.32 s13, [r0, #20]
VLDR.32 s14, [r0, #24]
VLDR.32 s15, [r0, #28]

Or, since the operands are contiguous in memory and in the register, file, the VLDR 64-
bit instruction could be used:

FPDATA_BASE EQU 0x40000100

LDR r0, =#FPDATA_BASE


VLDR.64 d4, [r0]
VLDR.64 d5, [r0, #8]
VLCR.64 d6, [r0, #16]
VLDR.64 d7, [r0, #24]

14. How many subnormal values are there in a single-precision representation? Is this
the same number as values for any non-zero exponent?

The fraction is 23 bits, and all bit patterns are valid for subnormal values. So there
23
are 2 unique representative values. The the same number of values exists for each non-
zero exponent in the normal range.

58
Chapter 10
1. For each rounding mode, show the resulting value given the sign, exponent, fraction,
guard, and sticky bit in the cases below.

Rounding mode Sign Exponent Fraction G S


bit bit
0 011111111 11111111111111111111111 1 0
roundTiesToEven 0 100000000 00000000000000000000000
roundTowardPositive 0 100000000 00000000000000000000000
roundTowardNegative 0 011111111 11111111111111111111111
roundTowardZero 0 011111111 11111111111111111111111
1 000000000 11111111111111111111111 0 1
roundTiesToEven 1 000000000 11111111111111111111111
roundTowardPositive 1 000000000 11111111111111111111111
roundTowardNegative 1 000000001 00000000000000000000000
roundTowardZero 1 000000000 11111111111111111111111
0 11111110 11111111111111111111111 1 1
roundTiesToEven 0 11111111 00000000000000000000000
roundTowardPositive 0 11111111 00000000000000000000000
roundTowardNegative 0 11111110 11111111111111111111111
roundTowardZero 0 11111110 11111111111111111111111
1 11111110 01111111111111111111110 1 0
roundTiesToEven 1 11111110 01111111111111111111110
roundTowardPositive 1 11111110 01111111111111111111110
roundTowardNegative 1 11111110 01111111111111111111111
roundTowardZero 1 11111110 01111111111111111111110

2. Rework Example 10.1 with the following values for OpB. For each, generate the
product rounded for each of the four rounding modes:

For the negative operands (exercises a-d) the smaller operand is negated with a 2’s
complement bit added in the lowest place. When summed, this results in a MSB in the
sum of zero, requiring a left shift of one place. After this shift, the remaining bits are used
in the rounding computation. Recall that the result is positive, so RM and RZ will never
increment, while RN and RP will use the rounding bits.

a. 0xBF000000
RN/RP – 0x4B800000
RM/RZ – 0x4B7FFFFF
b. 0xBF800000
All rounding modes – 0x4B7FFFFF (note – all rounding bits are zero)
c. 0xBF900000
RN/RP – 0x4B7FFFFF
RM/RZ – 0x4B7FFFFE

59
d. 0xBFC00000
RN/RM/RZ – 0x4B7FFFFE
RP – 0x4B7FFFFF
e. 0x3F000000
RN/RM/RZ – 0x4B800000
RP – 0x4B800001
f. 0x3FC00000
RM/RZ – 0x4B800000
RN/RP – 0x4B800001
g. 0x40200000
RN/RM/RZ – 0x4B800001
RP – 0x4B800000
h. 0x40400000
RM/RZ – 0x4B800001
RN/RP – 0x4B800002

3. Complete the following table for the given operands and operations, showing the result
and the exception status bits resulting from each operation. Assume the multiply-
accumulate operations are fused.

Operation Operand A Operand B Operand C Result Exception


Bit(s) set
A+B 0xFF800000 0x7F800000 NaN - IOC
0x7FC00000
A*B 0x80000000 0x7F800000 - Inf None
A–B 0xFF800000 0xFF800001 - NaN IOC
0xFFC00001
A/B 0x7FC00011 0x00000000 - NaN IOC
0x7FC00011
A/B 0xFF800000 0xFF800000 - NaN None
0xFFC00000
A*B 0x10500000 0x02000000 - 0x00000000 UFC, IXC
A*B 0x01800000 0x3E7FFFFF - 0x00800000 UFC, IXC
A*B 0x3E7F0000 0x02000000 - 0x00FF0000 None
(A * B) + 0x80000000 0x00800000 0x7FB60004 NaN IOC
C 0x7FF60004
(A * B) + 0x3F800000 0x7F800000 0xFF800000 NaN IOC
C 0x7FC00000
(A * B) + 0x6943FFFF 0x71000000 0xFF800000 NaN IOC
C 0x7FC00000

4. Write a program in a high-level language to input two single-precision values and add,
subtract, multiply and divide the values. In this Exercise use the default rounding mode.
Test your program with various floating-point values.

60
/******************************
* QuickFloat - simple C program
* to compute add, sub, mul, and
* div for two input SP values
* CNH 4/2014
******************************/

#include <stdio.h>

int main() {
// For this exercise, we assume inputs in hexadecimal
union {
int I_Single;
float F_Single;
} inA, inB, res;

printf("Enter two single-precision values in hexadecimal format


(0xXXXXXXXX): ");
scanf("0x%8x 0x%8x", &inA.I_Single, &inB.I_Single);

// Perform addition
res.F_Single = inA.F_Single + inB.F_Single;
printf("\nSum : 0x%08x\n", res.I_Single);

// Perform subtraction, A-B


res.F_Single = inA.F_Single - inB.F_Single;
printf("Diff : 0x%08x\n", res.I_Single);

// Perform multiplication
res.F_Single = inA.F_Single * inB.F_Single;
printf("Prod : 0x%08x\n", res.I_Single);

// Perform division, A/B


res.F_Single = inA.F_Single / inB.F_Single;
printf("Quot : 0x%08x\n\n", res.I_Single);

return(0);
}

5. Using routines available in your high-level language, perform each of the computations
in each of the four rounding modes. In C, you can use the floating-point environment by
including <fenv.h> and changing the rounding mode by the function fsetround(RMODE),
where RMODE is one of

FE_DOWNWARD
FE_TONEAREST
FE_TOWARDZERO
FE_UPWARD

Test your program with input values, modifying the lower bits in the operands to see how
the result differs for the four rounding modes. For example, use 0x3f800001 and
0xbfc00000. What differences did you notice in the results for the four rounding
modes?

61
/*********************************
* QuickFloatRM - simple C program
* to compute add, sub, mul, and
* div for two input SP values
* using routines to set the
* desired rounding mode.
* CNH 4/2014
******************************/

#include <stdio.h>
#include <fenv.h>

int main() {
// For this exercise, we assume inputs in hexadecimal
union {
int I_Single;
float F_Single;
} inA, inB, resRN, resRP, resRM, resRZ;

printf("Enter two single-precision values in hexadecimal format


(0xXXXXXXXX): ");
scanf("0x%8x 0x%8x", &inA.I_Single, &inB.I_Single);
printf("Op RN RP RM RZ\n");
printf("------------------------------------------------------\n");

// Perform each function using each rounding mode.


// Perform addition
fesetround(FE_TONEAREST);
resRN.F_Single = inA.F_Single + inB.F_Single;
fesetround(FE_UPWARD);
resRP.F_Single = inA.F_Single + inB.F_Single;
fesetround(FE_DOWNWARD);
resRM.F_Single = inA.F_Single + inB.F_Single;
fesetround(FE_TOWARDZERO);
resRZ.F_Single = inA.F_Single + inB.F_Single;

printf("Sum : 0x%08x 0x%08x 0x%08x 0x%08x\n", resRN.I_Single,


resRP.I_Single, resRM.I_Single, resRZ.I_Single);

// Perform subtraction A-B


fesetround(FE_TONEAREST);
resRN.F_Single = inA.F_Single - inB.F_Single;
fesetround(FE_UPWARD);
resRP.F_Single = inA.F_Single - inB.F_Single;
fesetround(FE_DOWNWARD);
resRM.F_Single = inA.F_Single - inB.F_Single;
fesetround(FE_TOWARDZERO);
resRZ.F_Single = inA.F_Single - inB.F_Single;

printf("Diff : 0x%08x 0x%08x 0x%08x 0x%08x\n", resRN.I_Single,


resRP.I_Single, resRM.I_Single, resRZ.I_Single);

// Perform multiplication
fesetround(FE_TONEAREST);
resRN.F_Single = inA.F_Single * inB.F_Single;

62
fesetround(FE_UPWARD);
resRP.F_Single = inA.F_Single * inB.F_Single;
fesetround(FE_DOWNWARD);
resRM.F_Single = inA.F_Single * inB.F_Single;
fesetround(FE_TOWARDZERO);
resRZ.F_Single = inA.F_Single * inB.F_Single;

printf("Prod : 0x%08x 0x%08x 0x%08x 0x%08x\n", resRN.I_Single,


resRP.I_Single, resRM.I_Single, resRZ.I_Single);

// Perform division A/B


fesetround(FE_TONEAREST);
resRN.F_Single = inA.F_Single / inB.F_Single;
fesetround(FE_UPWARD);
resRP.F_Single = inA.F_Single / inB.F_Single;
fesetround(FE_DOWNWARD);
resRM.F_Single = inA.F_Single / inB.F_Single;
fesetround(FE_TOWARDZERO);
resRZ.F_Single = inA.F_Single / inB.F_Single;

printf("Quot : 0x%08x 0x%08x 0x%08x 0x%08x\n\n", resRN.I_Single,


resRP.I_Single, resRM.I_Single, resRZ.I_Single);

return(0);
}

Differences are limited to the last bit for subtraction and multiplication, with addition and
division returning the same result for all rounding modes.

6. Give 3 values that will hold to the distributive property and 3 which will not.

Three values for which the distributive property will hold could be A = B = C =
0x3f800000 (1.0).

Two conditions (there may be more) will demonstrate a failure of the distributive
property for floating-point.

1. (B + C) < X, and A * X is normal, but A * B or A * C overflows. For example,

A = 0x43000000 (128)
B = 0x7E800001 (very large, positive)
C = 0xFE800000 (very large, negative)

In this example, both A * B or A * C will overflow, returning an Infinity. A * (B + C)


will remain in the normal range. Only one of A * B or A * C need overflow to
demonstrate this property.

2. (B + C) overflows, but A is < 1.0. For example,

A = 0x3F000000 (1/2)
B = 0x7F7FFFFF (just under the max for single-precision normal)

63
C = 0x7F7FFFFF

7. Is it is possible to have cancellation in an effective subtraction operation and have a


guard and sticky bit? If so, show an example. If not, explain why.

It is not. To have cancellation, the exponents must be equal or differ by only one. If they
are equal, there will not be either a guard or sticky bit. If the differ by one, only a guard
bit will exist.

8. Redo Example 10.5 using 0x3F800003 as the multiplier.

We replaced the multiplier (0x3F800001) with 0x3F800003. This will add a term in the
multiplier array due to bit [1] of the multiplier fraction. Otherwise, the exercise is
identical and the determination of rounding is made using the new guard and sticky bits,
and the sign.

Operand Significand
OpB Significand 1 101 00000000000000000000
(Multiplicand)
OpA Significand 1 000 00000000000000000011
(Multiplier)

Internally aligned partial product terms


OpA[0] term 1 101 00000000000000000000
OpA[1] term 11 010 00000000000000000000
OpA[23] term 1.101 0000000000000000 0000
Infinitely-precise internal significand
Infinitely precise 1.101 0000000000000000 0100 111 00000000000000000000
product
Pre-rounded product 1.101 0000000000000000 000L Gss ssssssssssssssssssss
Incremented product 1.101 0000000000000000 0101 111 00000000000000000000

Result significand
Result product - RNE 1.101 0000000000000000 0101
Result product – RP 1.101 0000000000000000 0100
Result product – RM 1.101 0000000000000000 0101
Result product – RZ 1.101 0000000000000000 0100

9. Show that 36! generates an overflow condition.

36! = 3.719 x 1041. This value is greater than the range for normal values (3.4 x 1038).

10. Demonstrate a case in which a multiply operation results in a normal value for the RZ
and RM rounding modes, but overflows for the RNE and RP rounding modes.

0x7F7FFFFF and 0x3F800001.

64
Chapter 11
1. Complete the table for a VCMP instruction and the following operands.

Operand A Operand B N Z C V
0xFF800000 0x3F800000 1 0 0 0
0x00000000 0x80000000 0 1 1 0
0x7FC00005 0x00000000 0 0 1 1
0x7F80000F 0x7F80000F 0 0 1 1
0x40000000 0xBF000000 0 0 1 0

2. Give the instructions to implement the following algorithm. Assume y is in S0, and
return x in S0.

x = 8y2 -7y + 12

; Compute the polynomial


; x = 8y^2 - 7Y + 12

; Set up y value in s0
VLDR.F32 s0,=-3.0

; Set up the contents of s1-s3 using VLDR


; pseudo-instruction
VLDR.F32 s1,=8.0 ; power 2 coeff
VLDR.F32 s2,=-7.0 ; power 1 coeff
VLDR.F32 s3,=12.0 ; constant

VMUL.F32 s1, s0, s1 ; compute 8 * y


VMUL.F32 s1, s0, s1 ; compute (8 * y) * y
VMUL.F32 s2, s0, s2 ; compute -7y
VADD.F32 s0, s1, s2 ; compute 8y^2 + (-7)y
VADD.F32 s0, s0, s3 ; add in +12, return in s0

Exit B Exit

3. Expand the code above to create a subroutine that will resolve an order 2 polynomial
with the constant for the square term in S1, the order 1 term in S2, and the constant factor
in S3, and the y value in S0. Return the result in S0.

; Set up y value in s0
VLDR.F32 s0,=-3.0

; Set up the contents of s4-s7 using VLDR


; pseudo-instruction
VLDR.F32 s1,=8.0 ; power 2 coeff

65
VLDR.F32 s2,=-7.0 ; power 1 coeff
VLDR.F32 s3,=12.0 ; constant

BL func

Exit B Exit

func
; y is input in S0
; function - (S1)y^2 + (S2)y + (S3)
VMUL.F32 s1, s1, s0 ; compute (S1)y
VMUL.F32 s1, s1, s0 ; compute ((S1)y)y
VMUL.F32 s2, s0, s2 ; compute (S2)y
VADD.F32 s0, s1, s2 ; compute (S1)y^2 + (S2)y
VADD.F32 s0, s0, s3 ; add in constant term
BX lr ; return

4. Give the instructions to perform the following loop over an array X of 20 data values
in memory at address 0x40000100 and an array Y of data values in memory at address
0x40000200. The new y value should overwrite the original value.

y = Ax + y

AREA RESET, CODE, READONLY


THUMB

EXPORT __Vectors
__Vectors
DCD 0x400 ; Top of Stack
DCD Reset_Handler ; Reset Handler

EXPORT Reset_Handler
ENTRY

Reset_Handler

; Enable the FPU


; Code taken from ARM website
; CPACR is located at address 0xE000ED88
LDR.W R0, =0xE000ED88
; Read CPACR
LDR R1, [R0]
; Set bits 20-23 to enable CP10 and CP11
ORR R1, R1, #(0xF << 20)
; Write back the modified value to the CPACR
STR R1, [R0]
DSB

66
; Reset pipeline not the FPU is enabled
ISB

; Apply the equation to two arrays, x and y


; each 20 entries. The function is called
; "func," and provides two values in S0 and
; S1, and return the result in S0. S2
; contains the constant, A.

IterCnt EQU 20
RAMStartX EQU 0x20000000
RAMStartY EQU 0x20000100

; Registers used
; r0 = address of x_data, incremented after
; r1 = address of y_data, incremented after
; r2 = iteration count, set to 20
; r3 = counter, initially set to 0
; Assume at least one iteration

LDR r0, =x_data ; address for x data


LDR r1, =y_data ; address for y data
MOV r2, #IterCnt ; number of iterations
MOV r3, #0 ; initial iteration
VLDR.F32 s2,=-20.0 ; initialize constant, A

; Copy the data from flash to RAM


LDR r4, =RAMStartX
LDR r5, =RAMStartY
CpyLoop
LDR r6, [r0], #4
LDR r7, [r1], #4
STR r6, [r4], #4
STR r7, [r5], #4
ADD r3, #1
CMP r3, r2
BLT CpyLoop

; Reset the pointers to the RAM space


LDR r0, =RAMStartX
LDR r1, =RAMStartY
MOV r3, #0 ; reinitialize iter cnt

; Perform the function


Loop
VLDR.32 s0, [r0], #4 ; load x data, post-inc

67
VLDR.32 s1, [r1] ; load y data, no post-inc
BL func ; call function
ADD r3, r3, #1 ; increment counter
VSTR.32 s0, [r1], #4 ; store new y, post-inc
CMP r3, r2 ; test for done
BLT Loop ; if less than IterCnt, loop
Exit B Exit ; else, exit. We're done.

; Function is y = Ax + y.
; S0 - x
; S1 - y
; S2 - A
; Return result in S0

func
VMUL.F32 S0, S0, S2
VADD.F32 S0, S0, S1
BX LR

x_data
DCD 0x3f800000
DCD 0x40000000
DCD 0x40800000
DCD 0x41000000
DCD 0x41800000
DCD 0x42000000
DCD 0x42800000
DCD 0x43000000
DCD 0x43800000
DCD 0x44000000
DCD 0x44800000
DCD 0x45000000
DCD 0x45800000
DCD 0x46000000
DCD 0x46800000
DCD 0x47000000
DCD 0x47800000
DCD 0x48000000
DCD 0x48800000
DCD 0x49000000
DCD 0x49800000

y_data
DCD 0xbf800000
DCD 0xc0000000
DCD 0xc0800000

68
DCD 0xc1000000
DCD 0xc1800000
DCD 0xc2000000
DCD 0xc2800000
DCD 0xc3000000
DCD 0xc3800000
DCD 0xc4000000
DCD 0xc4800000
DCD 0xc5000000
DCD 0xc5800000
DCD 0xc6000000
DCD 0xc6800000
DCD 0xc7000000
DCD 0xc7800000
DCD 0xc8000000
DCD 0xc8800000
DCD 0x49000000

END

5. Modify the program in Example 11.4 to order the four values in S8-S11 in order from
smallest to largest.

AREA RESET, CODE, READONLY


THUMB

EXPORT __Vectors
__Vectors
DCD 0x400 ; Top of Stack
DCD Reset_Handler ; Reset Handler

EXPORT Reset_Handler
ENTRY

Reset_Handler

; Enable the FPU


; Code taken from ARM website
; CPACR is located at address 0xE000ED88
LDR.W R0, =0xE000ED88
; Read CPACR
LDR R1, [R0]
; Set bits 20-23 to enable CP10 and CP11
ORR R1, R1, #(0xF << 20)
; Write back the modified value to the CPACR
STR R1, [R0]
DSB

; Reset pipeline not the FPU is enabled


ISB

69
; It will take a maximum of 3 passes for a bubble sort
MOV r0, #3

; Set up the contents of registers s8-s1 using VLDR


; pseudo-instruction
VLDR.F32 s8,=2.0
VLDR.F32 s9,=1.0
VLDR.F32 s10,=-1.0
VLDR.F32 s11,=-2.0

; Data in registers S8-S11 are to be ordered from smallest (in


; S8) to largest. Use s0 as the temp scratch register
Loop
VCMP.F32 s8, s9
VMRS APSR_nzcv, FPSCR
VMOVGT.F32 s0, s9
VMOVGT.F32 s9, s8
VMOVGT.F32 s8, s0

VCMP.F32 s9, s10


VMRS APSR_nzcv, FPSCR
VMOVGT.F32 s0, s10
VMOVGT.F32 s10, s9
VMOVGT.F32 s9, s0re

VCMP.F32 s10, s11


VMRS APSR_nzcv, FPSCR
VMOVGT.F32 s0, s11
VMOVGT.F32 s11, s10
VMOVGT.F32 s10, s0

SUB r0, #1
BNE Loop

END

6. In the third case in Example 11.9 (Section 11.7.2.3) the error of the VMLA was 2-22.
Show why this is the case.

The inputs are (1.5 + 1 ulp)2 – (2.5 + 4 ulps). When the inputs are multiplied, we have
2.25 + 3 ulps + 1 ulp2. The VMLA will round this to 2.25 + 4 ulps, or (1.125 + 2 ulp) *
21 (0x40100002). When 0xC0100002 is added to the product, the result is zero for the
VMLA, but what remains after the subtraction in the VFMA case is –(2 ulps + 2 ulp2)
(with an ulp equal to 2-23). The result in single-precision is 0xB3FFFFFE.

7. Write a division routine that checks for a divisor of zero, and if it is returns a correctly
signed infinity without setting the DZC bit. If the divisor is not zero, the division is
performed.

Note: Testing for a negative zero is a bit tricky. The Cortex-M4 will compare two zeros
as equal regardless of the signs of the zeros. The option used in this solution is to move

70
the zero to an ARM register and test it, and if negative flip the sign on a +1 operand and
multiply an infinity by this value to return the signed infinity.

AREA RESET, CODE, READONLY


THUMB

EXPORT __Vectors
__Vectors
DCD 0x400 ; Top of Stack
DCD Reset_Handler ; Reset Handler

EXPORT Reset_Handler
ENTRY

Reset_Handler

; Enable the FPU


; Code taken from ARM website
; CPACR is located at address 0xE000ED88
LDR.W R0, =0xE000ED88
; Read CPACR
LDR R1, [R0]
; Set bits 20-23 to enable CP10 and CP11
ORR R1, R1, #(0xF << 20)
; Write back the modified value to the CPACR
STR R1, [R0]
DSB

; Reset pipeline not the FPU is enabled


ISB

; Test out the division routine


VLDR.F32 s0,=+1.0 ; just some normals
VLDR.F32 s1,=+2.0
BL DivNoDZ

VLDR.F32 s0,=0.0 ; try with +0/+0


VLDR.F32 s1,=0.0
BL DivNoDZ

VLDR.F32 s0,=0f_7fe00000 ; NaN/-0


VLDR.F32 s1,=-0.0
BL DivNoDZ

VLDR.F32 s0,=+1.0 ; normal/-0


VLDR.F32 s1,=-0.0
BL DivNoDZ

Exit B Exit

; Division routine. Check is made for a zero divisor, and


; if so, a correctly signed infinity is returned without
; setting DZ, even if the dividend is NaN or zero.
; Input:
; s0 - dividend
; s1 - divisor

71
; Return
; s0 - quotient
; If the divisor is zero, r0 is used to extract the sign

DivNoDZ
VCMP.F32 s1, #0.0 ; if divisor is zero, rtn INF
VMRS APSR_nzcv, FPSCR
BEQ RtnInf
VDIV.F32 s0, s0, s1
BX lr
RtnInf
VMOV.F32 r0, s1 ; use the tst inst
TEQ r0, #0 ; in the integer side
MOV r0, #0x3f800000 ; if +0, return +1
ORRMI r0, #0x80000000 ; if -0, flip sign, rtn -1
VMOV.F32 s1, r0
VLDR.F32 s0,=0f_7f800000 ; load +inf
VMUL.F32 s0, s1
BX lr

END

8. Add to the program of Exercise 7 a check on a divisor that is 2.0. If it is, perform a
multiplication of 0.5 rather than do the division.

Modifying the DivNoDZ routine above, we have

; Division routine. Check is made for a zero divisor, and


; if so, a correctly signed infinity is returned without
; setting DZ, even if the dividend is NaN or zero.
; Added check on divisor of 2.0, to be replaced with a
; multiplication of 0.5
; Input:
; s0 - dividend
; s1 - divisor
; s2 - scratch
; Return
; s0 - quotient
; If the divisor is zero, r0 is used to extract the sign

DivNoDZ
VCMP.F32 s1, #0.0 ; if divisor is zero, rtn INF
VMRS APSR_nzcv, FPSCR
BEQ RtnInf
VLDR.F32 s2,=2.0 ; load 2.0 for check
VCMP.F32 s1, s2 ; if divisor is 2.0, mul by 0.5
VMRS APSR_nzcv, FPSCR
BEQ MulByHalf
VDIV.F32 s0, s0, s1
BX lr
RtnInf
VMOV.F32 r0, s1 ; use the tst inst
TEQ r0, #0 ; in the integer side
MOV r0, #0x3f800000 ; if +0, return +1
ORRMI r0, #0x80000000 ; if -0, flip sign, rtn -1

72
VMOV.F32 s1, r0
VLDR.F32 s0,=0f_7f800000 ; load +inf
VMUL.F32 s0, s1
BX lr

MulByHalf
VLDR.F32 s1,=0.5 ; replace divisor by 0.5
VMUL.F32 s0, s1
BX lr

END

9. Write a subroutine that would perform a reciprocal operation on an input in S0,


returning the result in S0.

; Reciprocal routine. A value in s0 is operated on as


; 1.0/s0, with the result in s0.
; Input:
; s0 - input
; Return
; s0 - reciprocal

Recip
VLDR.F32 s1,=1.0 ; set up 1.0 constant
VDIV.F32 s0, s1, s0 ; perform the division
BX lr ; and retur

73
Chapter 12
1. Using the sine table as a guide, construct a cosine table which produces the value for cos(x),

where 0 ≤ x < 360. Test your code for values of 84 degrees and 105 degrees.

AREA COSINETABLE, CODE


ENTRY

; Registers used:
; r0 = return value in Q31 notation
; r1 = sin argument (in degrees, from 0 to 360)
; r2 = temp
; r4 = starting address of sine table
; r7 = copy of argument

main
MOV r7, r1 ; make a copy of the argument
LDR r2,=270 ; constant won’t fit into rotation scheme
ADR r4, cosine_data ; load address of cosine table
CMP r1, #90 ; determine quadrant
BLE retvalue ; first quadrant?

CMP r1, #180


RSBLE r1,r1,#180 ; second quadrant?
BLE retvalue

CMP r1, r2
SUBLE r1, r1, #180 ; third quadrant?
BLE retvalue
RSB r1, r1, #360 ; otherwise, fourth

retvalue
LDR r0, [r4, r1, LSL #2] ; get cosine value from table
CMP r7, #90
BLT done
CMP r7, r2 ; do we return a negative value?
BGT done
RSBLT r0, r0, #0 ; if between 90 and 270, negate the value

done B done

ALIGN

cosine_data
DCD 0x7fffffff,0x7ffb0280,0x7fec0a00,0x7fd31780
DCD 0x7fb02e00,0x7f834f00,0x7f4c7e80,0x7f0bc080
DCD 0x7ec11a80,0x7e6c9280,0x7e0e2e00,0x7da5f580
DCD 0x7d33f100,0x7cb82880,0x7c32a680,0x7ba37500
DCD 0x7b0a9f80,0x7a683180,0x79bc3880,0x7906c080
DCD 0x7847d900,0x777f9000,0x76adf600,0x75d31a80

74
DCD 0x74ef0e80,0x7401e500,0x730baf00,0x720c8080
DCD 0x71046d00,0x6ff38a00,0x6ed9eb80,0x6db7a880
DCD 0x6c8cd700,0x6b598e80,0x6a1de700,0x68d9f980
DCD 0x678dde80,0x6639b000,0x64dd8980,0x63798500
DCD 0x620dbe80,0x609a5300,0x5f1f5e80,0x5d9cff80
DCD 0x5c135380,0x5a827980,0x58ea9100,0x574bb900
DCD 0x55a61280,0x53f9be00,0x5246dd80,0x508d9200
DCD 0x4ecdff00,0x4d084680,0x4b3c8c00,0x496af400
DCD 0x4793a200,0x45b6bb80,0x43d46500,0x41ecc480
DCD 0x40000000,0x3e0e3dc0,0x3c17a500,0x3a1c5c40
DCD 0x381c8bc0,0x36185b00,0x340ff240,0x32037a40
DCD 0x2ff31bc0,0x2ddf0040,0x2bc75100,0x29ac37c0
DCD 0x278dde80,0x256c6f80,0x234815c0,0x2120fb80
DCD 0x1ef74c00,0x1ccb3240,0x1a9cd9a0,0x186c6de0
DCD 0x163a1a80,0x14060b60,0x11d06ca0,0x0f996a20
DCD 0x0d613050,0x0b27eb60,0x08edc7b0,0x06b2f1d0
DCD 0x04779630,0x023be164,0x00000000

END

Using r1 = 0x54, which is 84 degrees, the result is r0 = 0x0D613050 in Q31 = .10528463

Using r1 = 0x69, which is 105 degrees, the result is r0 = 0xDEDF0480 in Q31 = -.25881903

* Note that this cosine table was generated on a simulated ARM7 without floating point

hardware, using:

double M_PI = 2 * acos(0.0);

to represent pi, since M_PI is not part of the ARM mathlib library. Some cosine tables may

slightly differ depending on the processor/environment/tools/precision used.

2. It was mentioned in Section 12.4 that a binary search only works if the entries in a list are

sorted first. A bubble sort is a simple way to sort entries. The basic idea is to compare two

adjacent entries in a list – call them entry[j] and entry[j+1]. If entry[j] is larger, then swap the

entries. If this is repeated until the last two entries are compared, the largest element in the list

will now be last. The smallest entry will ultimately get swapped, or “bubbled”, to the top. This

algorithm could be described in C as

75
last = num;
while (last>0){
pairs = last – 1;
for (j=0; j <=pairs; j++) {
if(entry[j] > entry[j+1]) {
temp = entry[j];
entry[j] = entry[j+1];
entry[j+1] = temp;
last = j;
}
}
}

where num is the number of entries in the list. Write an assembly language program to implement

a bubble sort algorithm, and test it using a list of 20 elements. Each element should be a word in

length.

AREA BubbleSort, CODE, READONLY


num EQU 20 ; number of elements
ENTRY

MOV r0, #num ; number of elements


LSL r0, r0, #2 ; number of bytes of elements
ADR r1, elements ; pointer to start of elements
MOV r2, r1 ; make copy of start of elements
SUB r2, r2, #4 ; length is # of elements - 1

Sort ADD r3, r1, r0


SUB r3, r3, #4 ; addr of last element (new ptr)
MOV r4, #0 ; r4 = change flag
ADD r2, r2, #4 ; incr by 1 word each iteration

NextWord
LDR r5, [r3], #-4 ; load 1st word
LDR r6, [r3] ; load 2nd word
CMP r5, r6 ; 1st > 2nd ?
BGT NoSwap ; if so, don't swap

STR r5, [r3], #4 ; otherwise, swap


STR r6, [r3]
ADD r4, r4, #1 ; flag swap
SUB r3, r3, #4 ; decrement pointer

NoSwap
CMP r3, r2 ; checked all words?
BNE NextWord ; if not, check more
CMP r4, #0 ; was there a swap?
BNE Sort ; if so, check again

76
Done B Done

elements
DCD 3, 4, 2, 54, 6, 33, -5, 2, 1, 0
DCD 77, 1, 2, 3, 0, -5, -35, 3, 4, 1

END

3. Using the bubble sort algorithm written in Exercise 2, write an assembly program that sorts

entries in a list and then uses a binary search to find a particular key. Remember that your sorting

routine must sort both the key and the data associated with each entry. Create a list with 30

entries or so, and data for each key should be at least 12 bytes of information.

NUM EQU 14 ; insert # of entries here


ESIZE EQU 4 ; log 2 of the entry size (16 bytes)
; NB: Assumes entry size is a power of 2

AREA BINARYSearch, CODE


ENTRY

; bubble sort keys and data

MOV r0, #NUM ; number of elements


LSL r0, r0, #4 ; number of bytes of elements
ADR r1, table_start ; pointer to start of elements
MOV r2, r1 ; make copy of start of elements
SUB r2, r2, #16 ; length is # of elements - 1

Sort ADD r3, r1, r0


SUB r3, r3, #16 ; r3 = addr of last element (new ptr)
MOV r4, #0 ; r4 = change flag
ADD r2, r2, #16 ; incr by 16 bytes each iteration

NextElement
LDMIA r3, {r5-r8} ; load first 16-byte key/element
SUB r3, r3, #16 ; adjust pointer
LDMIA r3, {r9-r12} ; load second 16-byte key/element
CMP r5, r9 ; 1st > 2nd ?
BGT NoSwap ; if so, don't swap

STMIA r3, {r5-r8} ; otherwise swap


ADD r3, r3, #16
STMIA r3, {r9-r12}
ADD r4, r4, #1 ; flag swap
SUB r3, r3, #16 ; decrement pointer

77
NoSwap
CMP r3, r2 ; checked all elements?
BNE NextElement ; if not, check more
CMP r4, #0 ; was there a swap?
BNE Sort ; if so, check again

; Binary Search
; Registers used:
; R0 - first
; R1 - last
; R2 - middle
; R3 - index
; R4 - size of the entries (log 2)
; R5 - the key (what you're searching for)
; R6 - address of the list
; R7 - temp

LDR r5, =0x250 ; let’s look for PINEAPPLE


ADR r6, table_start ; load address of the table
MOV r0, #0 ; first = 0
MOV r1, #NUM-1 ; last = num of entries in list - 1
loop CMP r0, r1 ; compare first and last
MOVGT r2, #0 ; first>last, no key found, middle = 0
BGT done
ADD r2, r0, r1 ; first + last
ASR r2, r2, #1 ; first + last /2
LDR r7, [r6, r2, LSL #ESIZE] ; load the entry
CMP r5, r7 ; compare key to value loaded
ADDGT r0, r2, #1 ; first = middle + 1
SUBLT r1, r2, #1 ; last = middle - 1
BNE loop ; go again
done MOV r3, r2 ; move middle to 'index'

stop B stop

table_start
DCD 0x024
DCB "EXTRA SAUCE "
DCD 0x005
DCB "ANCHOVIES "
DCD 0x012
DCB "GREEN PEPPER"
DCD 0x010
DCB "OLIVES "
DCD 0x300
DCB "PINE NUTS "
DCD 0x018
DCB "BLACK OLIVES"
DCD 0x004
DCB "PEPPERONI "
DCD 0x022
DCB "CHEESE "

78
DCD 0x038
DCB "MUSHROOMS "
DCD 0x026
DCB "CHICKEN "
DCD 0x030
DCB "CANADIAN BAC"
DCD 0x100
DCB "TOMATOES "
DCD 0x200
DCB "PINEAPPLE "
DCD 0x035
DCB "GREEN OLIVES"

END

4. Create a queue of 32-bit data values in memory. Write a function to remove the first item in the

queue.

AREA QueueRemoveFirst, CODE, READONLY


ENTRY

LDR r0, Queue ; load head of queue


STR r1, Pointer ; store it to Pointer
CMP r0, #0 ; end of queue?
BEQ Done ; if so, done

LDR r1, [r0] ; otherwise, get next pointer


STR r1, Queue ; and make it head of queue

Done B Done

AREA Data, DATA


Queue DCD Item1 ; pointer to start of queue
Pointer
DCD 0 ; pointer space

DataArea % 40 ; queue space for new entries

Item1 DCD Item2 ; pointer


DCD 1, 2 ; data
Item2 DCD Item3 ; pointer
DCD 3, 4 ; data
Item3 DCD 0 ; NULL pointer
DCD 5, 6 ; data

END

79
5. Using the sine table as a guide, construct a tangent table which produces the value for tan(x),

where 0 ≤ x ≤ 45. Test your code for values of 12 degrees and 43 degrees. You may return a

value of 0x7FFFFFFF for the case where the angle is equal to 45 degrees, since 1 cannot be

represented in Q31 notation.

GLOBAL Reset_Handler
AREA Reset, CODE
ENTRY

; Registers used:
; r0 = return value in Q31 notation
; r1 = tan argument (in degrees, from 0 to 45)
; r4 = starting address of tan table

Reset_Handler

ADR r4, tan_data ; load address of tan table

LDR r1, =12 ; put test value in


LDR r0, [r4, r1, LSL #2] ; get tan value from table

LDR r1, =43 ; put test value in


LDR r0, [r4, r1, LSL #2] ; get tan value from table

done B done

ALIGN
tan_data
DCD 0x00000000,0x023bf7b4,0x047848a8,0x06b54c50
DCD 0x08f35ca0,0x0b32d420,0x0d740e40,0x0fb76790
DCD 0x11fd3e00,0x1445f100,0x1691e1e0,0x18e17440
DCD 0x1b350da0,0x1d8d16c0,0x1fe9fae0,0x224c28c0
DCD 0x24b412c0,0x27222ec0,0x2996f800,0x2c12ed40
DCD 0x2e969380,0x31227500,0x33b721c0,0x365530c0
DCD 0x38fd4100,0x3baff840,0x3e6e0580,0x41382180
DCD 0x440f0e00,0x46f39980,0x49e69d00,0x4ce90000
DCD 0x4ffbb800,0x531fc980,0x56564b80,0x59a06680
DCD 0x5cff5880,0x60747580,0x64012b00,0x67a70100
DCD 0x6b679e00,0x6f44c980,0x73407080,0x775ca780
DCD 0x7b9bb080,0x7fffffff

END

80
6. Create a jump table that contains the addresses of three functions: sin(x), cos(x) and tan(x). The

arguments to these functions should be between 0 and 45 degrees. Call each of these functions

using the jump table for the angle 27 degrees.

;
; Note that you would probably not implement such a jump table in
; practice, as there are better ways to compute trig values.
; This exercise is mostly illustrative of how to build the jump
; table itself.
;

GLOBAL Reset_Handler
AREA Reset, CODE, READONLY
num EQU 3
ENTRY

Reset_Handler
start
MOV r0, #1 ; first parameter determines
; function and holds return argument:
; 0 = sin(x)
; 1 = cos(x)
; 2 = tan(x)
MOV r1, #27 ; 27 degrees (NB: argument
; assumed to be 0<=x<=45)
BL trigfun ; call the function
stop
B stop

trigfun
CMP r0, #num ; if code is >=num then return
MOVHS pc, lr
ADR r3, JumpTable ; load address of jump table
LDR pc, [r3, r0, LSL #2]; jump to appropriate routine

JumpTable
DCD DoSin
DCD DoCos
DCD DoTan
DoSin
MOV r7, r1 ; make a copy
LDR r2,=270 ; constant won't fit into rotation scheme
ADR r4, sin_data ; load address of sin table
CMP r1, #90 ; determine quadrant
BLE retvalue ; first quadrant?
CMP r1, #180
RSBLE r1,r1,#180 ; second quadrant?
BLE retvalue
CMP r1, r2

81
SUBLE r1, r1, #180 ; third quadrant?
BLE retvalue
RSB r1, r1, #360 ; otherwise, fourth

retvalue

LDR r0, [r4, r1, LSL #2] ; get sin value from table
CMP r7, #180 ; do we return a neg value?
RSBGT r0, r0, #0 ; negate the value
MOV pc, lr ; return

ALIGN
sin_data
DCD 0x00000000,0x023be164,0x04779630,0x06b2f1d8
DCD 0x08edc7b0,0x0b27eb50,0x0d613050,0x0f996a30
DCD 0x11d06ca0,0x14060b80,0x163a1a80,0x186c6de0
DCD 0x1a9cd9c0,0x1ccb3220,0x1ef74c00,0x2120fb80
DCD 0x234815c0,0x256c6f80,0x278dde80,0x29ac3780
DCD 0x2bc750c0,0x2ddf0040,0x2ff31bc0,0x32037a40
DCD 0x340ff240,0x36185b00,0x381c8bc0,0x3a1c5c80
DCD 0x3c17a500,0x3e0e3dc0,0x40000000,0x41ecc480
DCD 0x43d46500,0x45b6bb80,0x4793a200,0x496af400
DCD 0x4b3c8c00,0x4d084600,0x4ecdff00,0x508d9200
DCD 0x5246dd00,0x53f9be00,0x55a61280,0x574bb900
DCD 0x58ea9100,0x5a827980,0x5c135380,0x5d9cff80
DCD 0x5f1f5f00,0x609a5280,0x620dbe80,0x63798500
DCD 0x64dd8900,0x6639b080,0x678dde80,0x68d9f980
DCD 0x6a1de700,0x6b598f00,0x6c8cd700,0x6db7a880
DCD 0x6ed9ec00,0x6ff38a00,0x71046d00,0x720c8080
DCD 0x730baf00,0x7401e500,0x74ef0f00,0x75d31a80
DCD 0x76adf600,0x777f9000,0x7847d900,0x7906c080
DCD 0x79bc3880,0x7a683200,0x7b0a9f80,0x7ba37500
DCD 0x7c32a680,0x7cb82880,0x7d33f100,0x7da5f580
DCD 0x7e0e2e00,0x7e6c9280,0x7ec11a80,0x7f0bc080
DCD 0x7f4c7e80,0x7f834f00,0x7fb02e00,0x7fd31780
DCD 0x7fec0a00,0x7ffb0280,0x7fffffff

DoCos
MOV r7, r1 ; make a copy
LDR r2,=270 ; constant won't fit into rotation scheme
ADR r4, cosine_data ; load addr of cosine table
CMP r1, #90 ; determine quadrant
BLE retvalue1 ; first quadrant?

CMP r1, #180


RSBLE r1,r1,#180 ; second quadrant?
BLE retvalue1

CMP r1, r2
SUBLE r1, r1, #180 ; third quadrant?
BLE retvalue1
RSB r1, r1, #360 ; otherwise, fourth

82
retvalue1
LDR r0, [r4, r1, LSL #2] ; get cosine value from table
CMP r7, #90
BLT done1
CMP r7, r2 ; do we return a negative value?
BGT done1
RSBLT r0, r0, #0 ; if between 90 and 270, negate
done1 MOV pc,lr

ALIGN

cosine_data
DCD 0x7fffffff,0x7ffb0280,0x7fec0a00,0x7fd31780
DCD 0x7fb02e00,0x7f834f00,0x7f4c7e80,0x7f0bc080
DCD 0x7ec11a80,0x7e6c9280,0x7e0e2e00,0x7da5f580
DCD 0x7d33f100,0x7cb82880,0x7c32a680,0x7ba37500
DCD 0x7b0a9f80,0x7a683180,0x79bc3880,0x7906c080
DCD 0x7847d900,0x777f9000,0x76adf600,0x75d31a80
DCD 0x74ef0e80,0x7401e500,0x730baf00,0x720c8080
DCD 0x71046d00,0x6ff38a00,0x6ed9eb80,0x6db7a880
DCD 0x6c8cd700,0x6b598e80,0x6a1de700,0x68d9f980
DCD 0x678dde80,0x6639b000,0x64dd8980,0x63798500
DCD 0x620dbe80,0x609a5300,0x5f1f5e80,0x5d9cff80
DCD 0x5c135380,0x5a827980,0x58ea9100,0x574bb900
DCD 0x55a61280,0x53f9be00,0x5246dd80,0x508d9200
DCD 0x4ecdff00,0x4d084680,0x4b3c8c00,0x496af400
DCD 0x4793a200,0x45b6bb80,0x43d46500,0x41ecc480
DCD 0x40000000,0x3e0e3dc0,0x3c17a500,0x3a1c5c40
DCD 0x381c8bc0,0x36185b00,0x340ff240,0x32037a40
DCD 0x2ff31bc0,0x2ddf0040,0x2bc75100,0x29ac37c0
DCD 0x278dde80,0x256c6f80,0x234815c0,0x2120fb80
DCD 0x1ef74c00,0x1ccb3240,0x1a9cd9a0,0x186c6de0
DCD 0x163a1a80,0x14060b60,0x11d06ca0,0x0f996a20
DCD 0x0d613050,0x0b27eb60,0x08edc7b0,0x06b2f1d0
DCD 0x04779630,0x023be164,0x00000000

DoTan
ADR r4, tan_data ; load address of tan table

LDR r0, [r4, r1, LSL #2]; get tan value from table
MOV pc,lr

ALIGN
tan_data
DCD 0x00000000,0x023bf7b4,0x047848a8,0x06b54c50
DCD 0x08f35ca0,0x0b32d420,0x0d740e40,0x0fb76790
DCD 0x11fd3e00,0x1445f100,0x1691e1e0,0x18e17440
DCD 0x1b350da0,0x1d8d16c0,0x1fe9fae0,0x224c28c0
DCD 0x24b412c0,0x27222ec0,0x2996f800,0x2c12ed40

83
DCD 0x2e969380,0x31227500,0x33b721c0,0x365530c0
DCD 0x38fd4100,0x3baff840,0x3e6e0580,0x41382180
DCD 0x440f0e00,0x46f39980,0x49e69d00,0x4ce90000
DCD 0x4ffbb800,0x531fc980,0x56564b80,0x59a06680
DCD 0x5cff5880,0x60747580,0x64012b00,0x67a70100
DCD 0x6b679e00,0x6f44c980,0x73407080,0x775ca780
DCD 0x7b9bb080,0x7fffffff

END

7. Implement a sine table which holds values ranging from 0 to 180 degrees. The implementation

contains fewer instructions than the routine in Section 12.2, but to generate the value, it uses more

memory to hold the sine values themselves. Compare the total code size for both cases.

GLOBAL Reset_Handler
AREA RESET, CODE
ENTRY

; Registers used:
; r0 = return value in Q31 notation
; r1 = sin argument (in degrees, from 0 to 360)
; r2 = temp
; r4 = starting address of sine table
; r7 = flag to return a negative value

Reset_Handler
main
MOV r1, #186 ; test 186 degrees
ADR r4, sin_data ; load address of sin table
MOV r7,r1 ; make a copy
CMP r1, #180 ; determine quadrant
BLE retvalue ; third or fourth?
SUB r1,r1,#180 ; sin(x-180)=-sin(x)

retvalue

LDR r0, [r4, r1, LSL #2] ; get sin value from table
CMP r7, #180 ; do we return a neg value?
RSBLT r0, r0, #0 ; negate the value

done B done

ALIGN
sin_data
DCD 0x00000000,0x023be164,0x04779630,0x06b2f1d0
DCD 0x08edc7b0,0x0b27eb60,0x0d613050,0x0f996a20
DCD 0x11d06ca0,0x14060b60,0x163a1a80,0x186c6de0

84
DCD 0x1a9cd9a0,0x1ccb3240,0x1ef74c00,0x2120fb80
DCD 0x234815c0,0x256c6f80,0x278dde80,0x29ac37c0
DCD 0x2bc75100,0x2ddf0040,0x2ff31bc0,0x32037a40
DCD 0x340ff240,0x36185b00,0x381c8bc0,0x3a1c5c40
DCD 0x3c17a500,0x3e0e3dc0,0x40000000,0x41ecc480
DCD 0x43d46500,0x45b6bb80,0x4793a200,0x496af400
DCD 0x4b3c8c00,0x4d084680,0x4ecdff00,0x508d9200
DCD 0x5246dd80,0x53f9be00,0x55a61280,0x574bb900
DCD 0x58ea9100,0x5a827980,0x5c135380,0x5d9cff80
DCD 0x5f1f5e80,0x609a5300,0x620dbe80,0x63798500
DCD 0x64dd8980,0x6639b000,0x678dde80,0x68d9f980
DCD 0x6a1de700,0x6b598e80,0x6c8cd700,0x6db7a880
DCD 0x6ed9eb80,0x6ff38a00,0x71046d00,0x720c8080
DCD 0x730baf00,0x7401e500,0x74ef0e80,0x75d31a80
DCD 0x76adf600,0x777f9000,0x7847d900,0x7906c080
DCD 0x79bc3880,0x7a683180,0x7b0a9f80,0x7ba37500
DCD 0x7c32a680,0x7cb82880,0x7d33f100,0x7da5f580
DCD 0x7e0e2e00,0x7e6c9280,0x7ec11a80,0x7f0bc080
DCD 0x7f4c7e80,0x7f834f00,0x7fb02e00,0x7fd31780
DCD 0x7fec0a00,0x7ffb0280,0x7fffffff,0x7ffb0280
DCD 0x7fec0a00,0x7fd31780,0x7fb02e00,0x7f834f00
DCD 0x7f4c7e80,0x7f0bc080,0x7ec11a80,0x7e6c9280
DCD 0x7e0e2e00,0x7da5f580,0x7d33f100,0x7cb82880
DCD 0x7c32a680,0x7ba37500,0x7b0a9f80,0x7a683180
DCD 0x79bc3880,0x7906c080,0x7847d900,0x777f9000
DCD 0x76adf600,0x75d31a80,0x74ef0e80,0x7401e500
DCD 0x730baf00,0x720c8080,0x71046d00,0x6ff38a00
DCD 0x6ed9eb80,0x6db7a880,0x6c8cd700,0x6b598e80
DCD 0x6a1de700,0x68d9f980,0x678dde80,0x6639b000
DCD 0x64dd8980,0x63798500,0x620dbe80,0x609a5300
DCD 0x5f1f5e80,0x5d9cff80,0x5c135380,0x5a827980
DCD 0x58ea9100,0x574bb900,0x55a61280,0x53f9be00
DCD 0x5246dd80,0x508d9200,0x4ecdff00,0x4d084680
DCD 0x4b3c8c00,0x496af400,0x4793a200,0x45b6bb80
DCD 0x43d46500,0x41ecc480,0x40000000,0x3e0e3dc0
DCD 0x3c17a500,0x3a1c5c40,0x381c8bc0,0x36185b00
DCD 0x340ff240,0x32037a40,0x2ff31bc0,0x2ddf0040
DCD 0x2bc75100,0x29ac37c0,0x278dde80,0x256c6f80
DCD 0x234815c0,0x2120fb80,0x1ef74c00,0x1ccb3240
DCD 0x1a9cd9a0,0x186c6de0,0x163a1a80,0x14060b60
DCD 0x11d06ca0,0x0f996a20,0x0d613050,0x0b27eb60
DCD 0x08edc7b0,0x06b2f1d0,0x04779630,0x023be164
DCD 0x00000000

END
428 bytes versus 768 bytes

8. Implement a cosine table which holds values ranging from 0 to 180 degrees. The

implementation contains fewer instructions than the routine in Section 12.2, but to generate the

85
value, it uses more memory to hold the cosine values themselves. Compare the total code size for

both cases.

GLOBAL Reset_Handler
AREA reset, CODE
ENTRY

; Registers used:
; r0 = return value in Q31 notation
; r1 = cos argument (in degrees, from 0 to 360)
; r2 = temp
; r4 = starting address of cosine table

Reset_Handler
main
MOV r1, #186 ; test 186 degrees
LDR r2, =360

ADR r4, cosine_data ; load address of cosine table


CMP r1, #180 ; determine quadrant
SUBGT r1, r2, r1 ; subtract from 360 if > 180

; get cosine value from table


LDR r0, [r4, r1, LSL #2]

done B done

ALIGN
cosine_data
DCD 0x7fffffff,0x7ffb0280,0x7fec0a00,0x7fd31780
DCD 0x7fb02e00,0x7f834f00,0x7f4c7e80,0x7f0bc080
DCD 0x7ec11a80,0x7e6c9280,0x7e0e2e00,0x7da5f580
DCD 0x7d33f100,0x7cb82880,0x7c32a680,0x7ba37500
DCD 0x7b0a9f80,0x7a683180,0x79bc3880,0x7906c080
DCD 0x7847d900,0x777f9000,0x76adf600,0x75d31a80
DCD 0x74ef0e80,0x7401e500,0x730baf00,0x720c8080
DCD 0x71046d00,0x6ff38a00,0x6ed9eb80,0x6db7a880
DCD 0x6c8cd700,0x6b598e80,0x6a1de700,0x68d9f980
DCD 0x678dde80,0x6639b000,0x64dd8980,0x63798500
DCD 0x620dbe80,0x609a5300,0x5f1f5e80,0x5d9cff80
DCD 0x5c135380,0x5a827980,0x58ea9100,0x574bb900
DCD 0x55a61280,0x53f9be00,0x5246dd80,0x508d9200
DCD 0x4ecdff00,0x4d084680,0x4b3c8c00,0x496af400
DCD 0x4793a200,0x45b6bb80,0x43d46500,0x41ecc480
DCD 0x40000000,0x3e0e3dc0,0x3c17a500,0x3a1c5c40
DCD 0x381c8bc0,0x36185b00,0x340ff240,0x32037a40
DCD 0x2ff31bc0,0x2ddf0040,0x2bc75100,0x29ac37c0
DCD 0x278dde80,0x256c6f80,0x234815c0,0x2120fb80
DCD 0x1ef74c00,0x1ccb3240,0x1a9cd9a0,0x186c6de0
DCD 0x163a1a80,0x14060b60,0x11d06ca0,0x0f996a20

86
DCD 0x0d613050,0x0b27eb60,0x08edc7b0,0x06b2f1d0
DCD 0x04779630,0x023be164,0x00000000,0xfdc41e9c
DCD 0xfb8869d0,0xf94d0e30,0xf7123850,0xf4d814a0
DCD 0xf29ecfb0,0xf06695e0,0xee2f9360,0xebf9f4a0
DCD 0xe9c5e580,0xe7939220,0xe5632660,0xe334cdc0
DCD 0xe108b400,0xdedf0480,0xdcb7ea40,0xda939080
DCD 0xd8722180,0xd653c840,0xd438af00,0xd220ffc0
DCD 0xd00ce440,0xcdfc85c0,0xcbf00dc0,0xc9e7a500
DCD 0xc7e37440,0xc5e3a3c0,0xc3e85b00,0xc1f1c240
DCD 0xc0000000,0xbe133b80,0xbc2b9b00,0xba494480
DCD 0xb86c5e00,0xb6950c00,0xb4c37400,0xb2f7b980
DCD 0xb1320100,0xaf726e00,0xadb92280,0xac064200
DCD 0xaa59ed80,0xa8b44700,0xa7156f00,0xa57d8680
DCD 0xa3ecac80,0xa2630080,0xa0e0a180,0x9f65ad00
DCD 0x9df24180,0x9c867b00,0x9b227680,0x99c65000
DCD 0x98722180,0x97260680,0x95e21900,0x94a67180
DCD 0x93732900,0x92485780,0x91261480,0x900c7600
DCD 0x8efb9300,0x8df37f80,0x8cf45100,0x8bfe1b00
DCD 0x8b10f180,0x8a2ce580,0x89520a00,0x88807000
DCD 0x87b82700,0x86f93f80,0x8643c780,0x8597ce80
DCD 0x84f56080,0x845c8b00,0x83cd5980,0x8347d780
DCD 0x82cc0f00,0x825a0a80,0x81f1d200,0x81936d80
DCD 0x813ee580,0x80f43f80,0x80b38180,0x807cb100
DCD 0x804fd200,0x802ce880,0x8013f600,0x8004fd80
DCD 0x80000000

END

444 bytes versus 752 bytes


9. Rewrite the binary search routine for the Cortex-M4

NUM EQU 14 ; insert # of entries here


ESIZE EQU 4 ; log 2 of the entry size (16 bytes)
; NB: This assumes entry size
; is a power of 2

; Registers used:
; R0 - first
; R1 - last
; R2 - middle
; R3 - index
; R4 - size of the entries (log 2)
; R5 - the key (what you're searching for)
; R6 - address of the list
; R7 - temp
; R8 - temp

LDR r5, =0x200 ; let’s look for PINEAPPLE

ADR r6, table_start ; load address of the table


MOV r0, #0 ; first = 0
MOV r1, #NUM-1 ; last = num of entries in list - 1

loop CMP r0, r1 ; compare first and last

87
ITT GT
MOVGT r2, #0 ; first>last, no key found, middle = 0
BGT done

ADD r2, r0, r1 ; first + last


ASR r2, r2, #1 ; first + last /2

; The shift count on a load instruction with register offset


; must be between 0 and 3 - our ESIZE value is too big, so we
; can do this with one extra instruction and a different load

LSL r8, r2, #ESIZE


LDR r7, [r6, r8] ; load the entry
CMP r5, r7 ; compare key to value loaded
IT GT
ADDGT r0, r2, #1 ; first = middle + 1
IT LT
SUBLT r1, r2, #1 ; last = middle - 1
BNE loop ; go again

done MOV r3, r2 ; move middle to 'index'


stop B stop

ALIGN
table_start
DCD 0x004
DCB "PEPPERONI "
DCD 0x005
DCB "ANCHOVIES "
DCD 0x010
DCB "OLIVES "
DCD 0x012
DCB "GREEN PEPPER"
DCD 0x018
DCB "BLACK OLIVES"
DCD 0x022
DCB "CHEESE "
DCD 0x024
DCB "EXTRA SAUCE "
DCD 0x026
DCB "CHICKEN "
DCD 0x030
DCB "CANADIAN BAC"
DCD 0x035
DCB "GREEN OLIVES"
DCD 0x038
DCB "MUSHROOMS "
DCD 0x100
DCB "TOMATOES "
DCD 0x200
DCB "PINEAPPLE "
DCD 0x300
DCB "PINE NUTS "

88
Chapter 13
1. What’s wrong with the following ARM7TDMI instructions?

a. STMIA r5!, {r5, r4, r9}

Register r5 is included in the register list, as well as being used as a pointer. According to the

ARM ARM, the results are unpredictable when writeback is used in such a situation.

b. LDMDA r2, {}

At least one register must be transferred, otherwise result is unpredictable

c. STMDB r15!, {r0-r3, r4, lr}

Using the program counter as the base pointer gives unpredictable results

2. On the ARM7TDMI, if register r6 holds the address 0x8000 and you executed the instruction

STMIA r6, {r7, r4, r0, lr}

what address now holds the value in register r0? Register r4? Register r7? The Link Register?

Address 0x8000 holds value in r0.

Address 0x8004 holds value in r4

Address 0x8008 holds value in r7

Address 0x800C holds value in lr

3. Assume that memory and ARM7TDMI registers r0 through r3 appear as follows:

Address Register
0x8010 0x00000001 0x13 r0
0x800C 0xFEEDDEAF 0xFFFFFFFF r1

89
0x8008 0x00008888 0xEEEEEEEE r2
0x8004 0x12340000 0x8000 r3
0x8000 0xBABE0000

Describe the memory and register contents after executing the instruction

LDMIA r3!, {r0, r1, r2}

Memory contents stay the same.

r0 = 0xBABE0000, r1 = 0x12340000, r2 = 0x00008888, r3 = 0x0000800C

4. Suppose that a stack appears as shown in the first diagram below. Give the instruction or

instructions that would push or pop data so that memory appears in the order shown. In other

words, what instruction would be necessary to go from the original state to that shown in (a), and

then (b), and then (c)?

Address
0x8010 0x00000001 0x00000001 0x00000001 0x00000001
0x800C 0xFEEDDEAF 0xFEEDDEAF 0xFEEDDEAF 0xFEEDDEAF
0x8008 0xBABE2222 0xBABE2222
0x8004 0x12340000
0x8000

original (a) (b) (c)

LDR sp, =0x800C


LDR r1, =0xBABE2222
LDR r0, =0x12340000

STR sp!, [r1] ; gets you to (a)


STR sp!, [r0] ; gets you to (b)
LDMDB sp!, {r0, r1} ; gets you to (c)

5. Convert the cosine table from Problem 1 in Chapter 12 into a subroutine, using a full

descending stack.

; Registers used:
; r0 = return value in Q31 notation
; r1 = sin argument (in degrees, from 0 to 360)
; r2 = temp
; r4 = starting address of sine table

90
; r7 = copy of argument

CosineTable
STMFD sp!,{r4,r7,lr} ; save off used registers
MOV r7, r1 ; make a copy
LDR r2,=270 ; constant won’t fit into
; rotation scheme
ADR r4, cosine_data ; load addr of cosine table
CMP r1, #90 ; determine quadrant
BLE retvalue ; first quadrant?

CMP r1, #180


RSBLE r1,r1,#180 ; second quadrant?
BLE retvalue

CMP r1, r2
SUBLE r1, r1, #180 ; third quadrant?
BLE retvalue
RSB r1, r1, #360 ; otherwise, fourth

retvalue
; get cosine value from table
LDR r0, [r4, r1, LSL #2]

CMP r7, #90


BLT done
CMP r7, r2 ; return a negative value?
BGT done
RSBLT r0, r0, #0 ; if between 90 and 270,
; negate the value

LDMFD sp!,{r4,r7,pc} ; return

cosine_data
DCD 0x7fffffff,0x7ffb0280,0x7fec0a00,0x7fd31780
DCD 0x7fb02e00,0x7f834f00,0x7f4c7e80,0x7f0bc080
DCD 0x7ec11a80,0x7e6c9280,0x7e0e2e00,0x7da5f580
DCD 0x7d33f100,0x7cb82880,0x7c32a680,0x7ba37500
DCD 0x7b0a9f80,0x7a683180,0x79bc3880,0x7906c080
DCD 0x7847d900,0x777f9000,0x76adf600,0x75d31a80
DCD 0x74ef0e80,0x7401e500,0x730baf00,0x720c8080
DCD 0x71046d00,0x6ff38a00,0x6ed9eb80,0x6db7a880
DCD 0x6c8cd700,0x6b598e80,0x6a1de700,0x68d9f980
DCD 0x678dde80,0x6639b000,0x64dd8980,0x63798500
DCD 0x620dbe80,0x609a5300,0x5f1f5e80,0x5d9cff80
DCD 0x5c135380,0x5a827980,0x58ea9100,0x574bb900
DCD 0x55a61280,0x53f9be00,0x5246dd80,0x508d9200
DCD 0x4ecdff00,0x4d084680,0x4b3c8c00,0x496af400
DCD 0x4793a200,0x45b6bb80,0x43d46500,0x41ecc480
DCD 0x40000000,0x3e0e3dc0,0x3c17a500,0x3a1c5c40
DCD 0x381c8bc0,0x36185b00,0x340ff240,0x32037a40
DCD 0x2ff31bc0,0x2ddf0040,0x2bc75100,0x29ac37c0

91
DCD 0x278dde80,0x256c6f80,0x234815c0,0x2120fb80
DCD 0x1ef74c00,0x1ccb3240,0x1a9cd9a0,0x186c6de0
DCD 0x163a1a80,0x14060b60,0x11d06ca0,0x0f996a20
DCD 0x0d613050,0x0b27eb60,0x08edc7b0,0x06b2f1d0
DCD 0x04779630,0x023be164,0x00000000

6. Rewrite Example 13.4 using full descending stacks.

SRAM_BASE EQU 0x40000000

AREA Passbyreg, CODE, READONLY


ENTRY

LDR sp, =SRAM_BASE + 100 ; stack pointer initialized

; try out positive case


; (this should saturate)
MOV r0, #0x40000000
MOV r1, #2
BL saturate

; try out a negative case


; (should not saturate
MOV r0, #0xFFFFFFFE
MOV r1, #8
BL saturate

stop B stop

saturate
; Subroutine saturate32
; Parameters are read from the stack, and
; registers r4 through r7 are saved on
; the stack. The result
; is placed at the bottom of the stack.
; Registers used:
; r4 - result
; r5 - operand to be shifted
; r6 - scratch register
; shift amount (m)

; r4 = saturate32 (r5 << m)


STMFD sp!,{r6,lr} ; save off used registers
MOV r6, #0x7FFFFFFF
MOV r2, r0, LSL r1
TEQ r0, r2, ASR r1 ; if (r0!=(r2>>m))
EORNE r2, r6, r0, ASR #31 ; r2 = 0x7FFFFFFF^sign(r5)
LDMFD sp!,{r6,pc} ; return

92
END

7. Rewrite Example 13.5 using full descending stacks.

SRAM_BASE EQU 0x40000000

AREA PassbyMem, CODE, READONLY


ENTRY

LDR sp, =SRAM_BASE+100 ; stack pointer initialized


LDR r3, =SRAM_BASE+100 ; writable mem for parameters
; try out positive case
; (this should saturate)
MOV r1, #0x40000000
MOV r2, #2
STMFD r3,{r1,r2} ; save off parameters
BL saturate

; try out a negative case


; (should not saturate
MOV r1, #0xFFFFFFFE
MOV r2, #8
STMFD r3, {r1,r2}
BL saturate

stop B stop

saturate
; Subroutine saturate32
; Parameters are read from memory, and the
; starting address is in register r3. The result
; is placed at the start of parameter memory
; Registers used:
; r4 - result
; r5 - operand to be shifted
; r6 - scratch register
; shift amount (m)

; r4 = saturate32 (r5 << m)


STMFD sp!,{r4-r7,lr} ; save off used registers
LDMFD r3, {r5,r7} ; get parameters off the stack
MOV r6, #0x7FFFFFFF
MOV r4, r5, LSL r7
TEQ r5, r4, ASR r7 ; if (r5!=(r4>>m))
EORNE r4, r6, r5, ASR #31 ; r4 = 0x7FFFFFFF^sign(r5)
STR r4, [r3] ; move result to memory
LDMFD sp!,{r4-r7,pc} ; return

93
END

8. Rewrite Example 13.6 using full descending stacks.

SRAM_BASE EQU 0x40000000

AREA Passbystack, CODE, READONLY


ENTRY

LDR sp, =SRAM_BASE+100 ; stack pointer initialized

; try out positive case


; (this should saturate)
MOV r1, #0x40000000
MOV r2, #2
; push parameters on the stack
STMFD sp!,{r1,r2}
BL saturate
; pop results off the stack
; now r1 = result of shift
LDMFD sp!, {r1,r2}

; try out a negative case


; (should not saturate
MOV r1, #0xFFFFFFFE
MOV r2, #8
STMFD sp!, {r1,r2}
BL saturate
LDMFD sp!, {r1,r2}

stop B stop

saturate
; Subroutine saturate32
; Parameters are read from memory, and the
; starting address is in register r3. The result
; is placed at the start of parameter memory
; Registers used:
; r4 - result
; r5 - operand to be shifted
; r6 - scratch register
; shift amount (m)

; r4 = saturate32 (r5 << m)


STMFD sp!,{r3-r7,lr} ; save off used registers
LDR r5, [sp, #0x18] ; get first parameter off stack

94
LDR r7, [sp, #0x1C] ; get second parameter off stack
MOV r6, #0x7FFFFFFF
MOV r4, r5, LSL r7
TEQ r5, r4, ASR r7 ; if (r5!=(r4>>m))
EORNE r4, r6, r5, ASR #31 ; r4 = 0x7FFFFFFF^sign(r5)
STR r4, [sp,#0x18] ; move result to bottom of stack
LDMFD sp!,{r3-r7,pc} ; return

END

9. Convert the factorial program written in Chapter 3 into a subroutine, using full descending

stacks. Pass arguments to the subroutine using both pass-by-register and pass-by-stack

techniques.

GLOBAL Reset_Handler
AREA RESET, CODE, READONLY
ENTRY

Reset_Handler

SRAM_BASE EQU 0x40000000

LDR sp, =SRAM_BASE + 100

; Since we really want these to behave as subroutines, i.e.,


; passing back a value in a register, we’ll send the argument
; in register r0 and also put the return value in r0.

MOV r0, #10 ; load 10 into r0, pass by register


BL FactReg

MOV r0, #10


STMFD sp!, {r0} ; save off parameter
BL FactStack

stop B stop

FactReg
STMFD sp!,{r4,lr}
MOV r4, #1 ; if n=0, at least n! = 1
loop CMP r0, #0
MULGT r4, r0, r4 ; perform multiply, result in r7
SUBGT r0, r0, #1 ; decrement n
BGT loop
MOV r0, r4 ; pass the result back
LDMFD sp!,{r4,pc}

95
FactStack

STMFD sp!,{r4,lr}
LDR r0, [sp, #8] ; get parameter off stack
MOV r4, #1 ; if n=0, at least n! = 1
loop1 CMP r0, #0
MULGT r4, r0, r4 ; perform multiply, result in r7
SUBGT r0, r0, #1 ; decrement n
BGT loop1
MOV r0, r4 ; pass the result back

LDMFD sp!,{r4,pc}

END

10. Write the ARM7TDMI division routine from Chapter 7 as a subroutine that uses empty

ascending stacks. Pass the subroutine arguments using registers, and test the code by dividing

4000 by 32.

AREA DivUsingEAstack, CODE, READONLY

Rcnt RN 4 ; assign R4 to Rcnt


Ra RN 1 ; assign R1 to Ra
Rb RN 2 ; assign R2 to Rb
Rc RN 3 ; assign R3 to Rc

ENTRY

LDR Ra, =4000


LDR Rb, =32
BL Division

Done B Done

Division
STMEA sp!,{r4,lr}
MOV Rcnt, #1 ; bit to control the division

Div1 CMP Rb, #0x80000000 ; move Rb until greater than Ra


CMPCC Rb, Ra
MOVCC Rb, Rb, LSL #1
MOVCC Rcnt, Rcnt, LSL #1
BCC Div1

96
MOV Rc, #0
Div2 CMP Ra, Rb ; test for possible subtraction
SUBCS Ra, Ra, Rb ; subtract if OK
ADDCS Rc, Rc, Rcnt ; put relevant bit into result
MOVS Rcnt, Rcnt, LSR #1 ; shift control bit
MOVNE Rb, Rb, LSR #1 ; halve unless finished
BNE Div2 ; divide result into Rc, remainder in Ra
LDMEA sp!,{r4,pc}

END

Rc = r3 = 0x7D = 125 = 4000/32


Ra = 0

11. Match the following terms with their definitions:

a. recursive 1. subroutine can be interrupted and called by the interrupting routine

b. relocatable 2. a subroutine which calls itself

c. position independent 3. the subroutine can be placed anywhere in memory

d. reentrant 4. all program addresses are calculated relative to the program counter

a.  2.

b.  3.

c.  4.

d.  1.

12. Write the ARM7TDMI division routine from Chapter 7 as a subroutine that uses full

descending stacks. Pass the subroutine arguments using the stack, and test the code by dividing

142 by 7.

AREA DivUsingFDstack, CODE, READONLY

STACK_BASE EQU 0x40000000

Rcnt RN 4 ; assign R4 to Rcnt


Ra RN 1 ; assign R1 to Ra
Rb RN 2 ; assign R2 to Rb
Rc RN 3 ; assign R3 to Rc

ENTRY

LDR sp, =STACK_BASE


LDR Ra, =142

97
LDR Rb, =7
STMFD sp!,{Ra,Rb}
BL Division

Done B Done

Division
STMFD sp!,{r4,lr}
LDR Ra, [sp, #0x8] ; load Ra from stack
LDR Rb, [sp, #0xC] ; load Rb from stack
MOV Rcnt, #1 ; bit to control the division

Div1 CMP Rb, #0x80000000 ; move Rb until greater than Ra


CMPCC Rb, Ra
MOVCC Rb, Rb, LSL #1
MOVCC Rcnt, Rcnt, LSL #1
BCC Div1
MOV Rc, #0
Div2 CMP Ra, Rb ; test for possible subtraction
SUBCS Ra, Ra, Rb ; subtract if OK
ADDCS Rc, Rc, Rcnt ; put relevant bit into result
MOVS Rcnt, Rcnt, LSR #1 ; shift control bit
MOVNE Rb, Rb, LSR #1 ; halve unless finished
BNE Div2 ; divide result into Rc, remainder in Ra
LDMFD sp!,{r4,pc}

END

Rc = r3 = 0x14 = 20 = 142/7
Ra = 0x2 = 2

13. Write ARM assembly to implement a PUSH operation without using LDM or STM

instructions. The routine should handle three data types, where register r0 contains 1 for byte

values, 2 for halfword values, and 4 for word values. Register r1 should contain the data to be

stored on the stack. The stack pointer should be updated at the end of the operation.

AREA Push, CODE, READONLY


ENTRY

CMP r0, #4
BNE Half
STR r1, [r13]
B Inc_SP

Half CMP r0, #2

98
BNE Byte
STRH r1, [r13]
B Inc_SP

Byte STRB r1, [r13]

Inc_SP ADD r13, r13, #4

Done B Done

END

14. Write ARM assembly to check whether an N x N matrix is a magic square. A magic square is

an N x N matrix in which the sums of all rows, columns, and the two diagonals add up to N(N2 +

1)/2. All matrix entries are unique numbers from 1 to N2. Register r1 will hold N. The matrix

starts at location 0x4000 and ends at location (0x4000 + N2). Suppose you wanted to test a

famous example of a magic square:

16 3 2 13
5 10 11 8
9 6 7 12
4 15 14 1

The numbers 16, 3, 2, and 13 would be stored at addresses 0x4000 to 0x4003, respectively. The

number 5, 10, 11, 8 would be stored at addresses 0x4004 to 0x4007, etc. Assume all numbers are

bytes. If the matrix is a magic square, r9 will be set upon completion; otherwise it will be cleared.

You can find other magic square examples, such as Ben Franklin’s own 8 x 8 magic square, on

the Internet to test your program.

; Write ARM Assembly to check whether an N x N matrix is a


; magic square. A magic square is an N x N matrix in which the
; sums of all rows, columns, and diagonals add up to
; N(N^2 + 1)/2.
; All matrix entries are unique numbers from 1 to N. The
; matrix starts at location 0x4000. R1 will hold N. If the
; matrix is a magic square, r9 upon completion will be set,
; otherwise cleared.

AREA MagicSquare, CODE, READONLY


ENTRY

99
MOV r9, #0
MOV r8, #1
MOV r7, #0

MUL r6, r1, r1 ; r6 = N^2


ADD r6, r6, #1 ; r6 = N^2 + 1
MUL r6, r6, r1 ; r6 = N(N^2 + 1)
MOV r6, r6, LSR #1 ; r6 = N(N^2 + 1)/2

BL Initialize

Rows0 MOVEQ r3, #0


Rows LDRB r4, [r2], #1 ; r4 = element for row addition
ADD r3, r3, r4 ; acc row sums to r3
SUBS r5, r5, #1 ; decrement counter
BNE Rows

Check_Rows
SUBS r7, r7, #1 ; row counter
BEQ Column_Init ; tested all rows successfully
CMP r3, r6 ; test for magic
MOVEQ r5, r1 ; reset counter
BEQ Rows0 ; test next row
B Done

Column_Init
BL Initialize

Columns0
MOVEQ r3, #0
Columns
LDRB r4, [r2], r1
ADD r3, r3, r4
SUBS r5, r5, #1
BNE Columns

Check_Columns
SUBS r7, r7, #1
BEQ Diag_Init
CMP r3, r6
MOVEQ r5, r1
MOVEQ r2, #0x4000
ADDEQ r2, r2, r8
ADDEQ r8, r8, #1
BEQ Columns0
B Done

Diag_Init
BL Initialize

Diags0

100
MOVEQ r3, #0
ADD r10, r1, #1
Diags
LDRB r4, [r2], r10
ADD r3, r3, r4
SUBS r5, r5, #1
BNE Diags

Diags1
CMP r3, r6
BNE Done
MOV r3, #0
MOV r2, #0x4000
MOV r7, r1
SUB r7, r7, #1
ADD r2, r2, r7
SUB r10, r1, #1
MOV r5, r1
Diags2
LDRB r4, [r2], r10
ADD r3, r3, r4
SUBS r5, r5, #1
BNE Diags2

CMP r3, r6
MOVEQ r9, #0xFFFFFFFF

Done B Done

END

15. Another common operation in signal processing and control applications is to compute a dot

product, given as

N 1
a   cm xm
m 0

where the dot product a is a sum of products. The coefficients cm and the input samples xm are

stored as arrays in memory. Assume sample data and coefficients are 16-bit, unsigned integers.

Write the assembly code to compute a dot product for 20 samples. This will allow you to use the

LDM instruction to load registers with coefficients and data efficiently. You probably want to

bring in four or five values at a time, looping as needed to exhaust all values. Leave the dot

product in a register, and give the register the name DPROD using the RN directive in your code.

101
If you use the newer v7-M SIMD instructions, note that you can perform two multiplies on 16-bit

values at the same time.

AREA DotProduct, CODE, READONLY


ENTRY

DPROD RN 0

LDR DPROD, =0
LDR r1, =samples
LDR r2, =coeffs
LDR r14, =4 ; loop counter for 20 samples

loop LDMIA r1!, {r3-r7} ; load samples


LDMIA r2!, {r8-r12} ; load coefficients
MLA DPROD, r3, r8, DPROD ; multiply and accumulate
; corresponding
MLA DPROD, r4, r9, DPROD ; samples and coeffs
MLA DPROD, r5, r10, DPROD
MLA DPROD, r6, r11, DPROD
MLA DPROD, r7, r12, DPROD
SUBS r14, r14, #1 ; loop 4 times
BNE loop

Done B Done

samples
DCD 3, 4, 2, 54, 6, 33, 5, 2, 1, 0, 77
DCD 1, 2, 3, 0, 5, 35, 3, 4, 1
coeffs
DCD 1, 1, 3, 3, 2, 1, 4, 2, 1, 0, 0, 0
DCD 0, 2, 1, 3, 2, 1, 0, 1

END

15. Suppose your stack was defined to be between addresses 0x40000000 and 0x40000200, with

program variables located at address 0x40000204 and higher in memory, and your stack pointer

contains the address 0x400001FC. What do you think would happen if you attempt to store 8

words of data on an ascending stack?

The machine would let you perform this operation, but as a result, you now have data corruption

in the program variables.

102
Chapter 14
1. Name three ways in which FIQ interrupts are handled more quickly than IRQ interrupts.

1) FIQs have a higher priority than IRQs when they occur simultaneously

2) The FIQ vector is at the top of the vector table allowing you to place the handler there

eliminating the need to change the PC, as you must with the IRQ vector

3) FIQ mode contains extra FIQ-specific registers which allows you to not have to push some

data on the stack on an FIQ call.

2. Describe the types of operations normally performed by a reset handler.

Set up exception vectors

Initialize the memory system (e.g. MMU/PU)

Initialize all required processor mode stacks and registers

Initialize variables required by C

Initialize any critical I/O devices

Enable interrupts

Change processor mode and/or state

Call the main application

3. Why can you only return from exceptions in ARM state on the ARM7TDMI?

There are special instructions to access the PSRs (to restore program state) which are only

available in ARM state.

4. How many external interrupt lines does the ARM7TDMI have? If you have 8 interrupting

devices, how would you handle this?

103
Two. If more than two sources of interrupt, they will need to be connected to an interrupt

controller which prioritizes and signals the core via the two interrupt lines.

5. Write an SVC handler that accepts the number 0x1234. When the handler sees this value, it

should reverse the bits in register r9 (see Exercise 2 in Chapter 8). The SWI exception handler

should examine the actual SWI instruction in memory to determine its number. Be sure to set up a

stack pointer in SVC mode before handling the exception.

T_bit EQU 0x20


RAMSTART EQU 0x40000000 ; start of onboard RAM for LPC2104
Mode_SVC EQU 0x13 ; Mode bits for Supervisor mode
I_Bit EQU 0x80 ; when I bit is set, IRQ is disabled
F_Bit EQU 0x40 ; when F bit is set, FIQ is disable

MSR CPSR_c, #Mode_SVC:OR:I_Bit:OR:F_Bit


; enter Supervisor mode

LDR sp, =RAMSTART ; setup Supervisor stack pointer

; this is not a standalone program


; ………..
; SVC Handler

SVC_Handler
STMFD sp!, {r0-r3,r12,lr} ; stack registers
MOV r1, sp ; and set pointer to parameters
MRS r0, SPSR ; get SPSR
STMFD sp!, {r0} ; and store it on the stack
TST r0, #T_bit
LDRNEH r0, [lr,#-2] ; extract SWI number
BICNE r0, r0, #0xFF00 ; 8 bits for Thumb
LDREQ r0, [lr,#-4]
BICEQ r0, r0, #0xFF000000 ; 24 bits for ARM

LDR r2, =0x1234


CMP r0, r2 ; is the SVC number 0x1234?
BLEQ ReverseR9

; insert checks for other SVC numbers here

LDMFD sp!, {r1}


MSR SPSR_csxf, r1
LDMFD sp!, {r0-r3,r12,pc}^ ; restore registers
; and return

104
ReverseR9
LDR r1, =32 ; 32-bit register bit counter
MOV r2, #0 ; r2 = result (will be r0 reversed)

Loop AND r3, r9, #1 ; r3=current LSB of reg to be reversed


ADD r2, r3, r2, LSL #1 ; shift result left
; 1, result LSB is r3
LSR r9, r9, #1 ; look at new LSB of reg
; to be reversed
SUBS r1, r1, #1 ; loop 32 times
BNE Loop
MOV pc, lr ; return

6. Explain why you can’t have an SVC and an undefined instruction exception occur at the same

time.

An SVC cannot be an undefined instruction, so the two are mutually exclusive.

7. Build an Undefined exception handler that tests for and handles a new instruction called

FRACM. This instruction takes two Q15 numbers, multiplies them together, shifts the result left

one bit to align the binary point, then stores the upper 16 bits to register r7 as a Q15 number. Be

sure to test the routine with two values.

AREA UndefQ15Mul, CODE, READONLY


ENTRY

LDR r0, =0xFFFFBB79 ; load -0.5354 into r3 (Q15)


LDR r4, =0x1FEB ; load 0.24938 into r4 (Q15)

FRACM DCD 0x77F07FF4 ; r7 (HI Q31) = r0(Q15)*Rm(Q15)

Done B Done

UndefHandler
STMFD sp!, {r0-r12, lr} ; save workspace and lr to stack
MRS r0, SPSR ; copy SPSR to r0
STR r0, [sp, #-4]! ; Save SPSR to stack

LDR r0, [lr, #-4] ; r0 = undefined instruction


BIC r2, r0, #0xF00FFFFF ; clear out all bits but
; opcode bits
TEQ r2, #0x07F000000 ; r2 = opcode for MulQ15
BLEQ MulQ15 ; if valide opcode, handle it

105
; insert tests for other undefined instructions here

LDR r1, [sp], #4 ; restore SPSR to r1


MSR SPSR_cxsf, r1 ; restore SPSR
LDMFD sp!, {r0-r12, pc}^ ; return to place in program
; after undef instruction

; MulQ15 is r0(Q15) * Rm(Q15) (where Rm can only be between r0


; and r7 inclusive), and puts the HI
; result in r7. It does not decode immediates, CC, S-bit, etc.
; Format is same as example in text.

MulQ15
BIC r3, r0, #0xFFFFFFF0 ; mask out all bits except Rm
ADD r3, r3, #1 ; bump past SPSR on the stack
LDR r0, [sp, #4] ; grab r0 from the stack
LDR r3, [sp, r3,LSL #2] ; use the Rm field as an offset
MUL r7, r0, r3 ; calculate r0(Q15) * Rm(Q15)
MOV r7, r7, LSL #1 ; shift Q30 number to get Q31
MOV r7, r7, LSR #15 ; move upper 16 bits to [15:0]
STR r7, [sp, #4] ; store result back on stack
MOV pc, lr

END

8. As a sneak peek into Chapter 16, we’ll find out that interrupting devices are programmable.

Using ST Microelectronic’s documentation found on their website, give the address range for the

following peripherals on the STR910FM32 microcontroller. They can cause an interrupt to be

sent to the ARM processor.

a. Real Time Clock

b. Wake-up/Interrupt Unit

a. 0x4C001000 to 0x4C001017

b. 0x48001000 to 0x48001013

9. Explain the steps the ARM7TDMI processor takes when handling an exception.

When an exception occurs, the ARM7TDMI:


Copies CPSR into SPSR_<mode>
Sets appropriate CPSR bits
If core currently in Thumb state then
ARM state is entered.

106
Mode field bits
Interrupt disable bits (if appropriate)
Stores the return address in LR_<mode>
Sets PC to vector address
To return, exception handler needs to:
Restore CPSR from SPSR_<mode>
Restore PC from LR_<mode>

10. What mode does the processor have to be in to move the contents of the SPSR to the CPSR?

What instruction is used to do this?

Any privileged mode.

You’ll use the MSR instruction.

11. When handling interrupts, why must the link register be adjusted before returning from the

exception?

The interrupt handler is called after the instruction has finished executing. Thus, the PC has

already been updated at the point LR is calculated.

12. How many SPSRs are there on the ARM7TDMI?

13. What is a memory management unit (MMU) used for? What is the difference between an

MMU and a memory protection unit (MPU)? You may want to consult the ARM documentation

for specific information on both.

MMU – handles accesses to memory requested by CPU and converts virtual addresses to physical

addresses using pages.

MPU – sets and enforces access permissions to various portions of memory, but in general, is

simpler than an MMU.

107
Chapter 15
1. How many operation modes does the Cortex-M4 have?

Two modes: Handler mode and Thread mode

2. What happens if you do not enable usage faults in Example 15.1?

The processor will take a hard fault.

3. Which register must be used to switch privilege levels?

The CONTROL register

4. What are the differences between the ARM7TDMI and Cortex-M4 vector tables?

a) ARM7TDMI vector table contains actual instructions at each address, where the Cortex-M4

vector table has handler addresses in the table.

b) The vector tables are ordered differently, e.g., the exception vector at address 0x0 is the reset

exception for the ARM7TDMI and the pointer to the top of the stack on the Cortex-M4.

c) There are more exception types on the Cortex-M4

5. Give the offsets (from the base address 0xE000E000) and register size for the following:

a) Usage Fault Status Register – 0xD28, 16 bits

b) Memory Management Fault Status Register – 0xD28, 8 bits

You may wish to consult the TM4C1233H6PM data sheet (Texas Instruments 2013)

6. Configure Example 15.5 so that the timer counts up rather than down. Don’t forget to

configure the appropriate match value!

There are two changes that must be made. The first is to set the counter to count up, which is

bit[4] of the CPTMTAMR register, so instead of ORing with the value 0x21, OR with the value

0x31. The second change is to remove the code that clears the timer match value, since the

default match value is 0xFFFF, so the write to offset 0x30 should be removed.

108
Chapter 16
1. Write an SVC handler so that when an SVC instruction is executed, the handler prints out the

contents of register r8. The program should incorporate assembly routines similar to the ones

already built. For example, you will need to convert the binary value in register r8 to ASCII first,

then use a UART to display the information in the Keil tools. Have the routine display “Register

r8 =” followed by the register’s value.

AREA SVCr8ASCII, CODE, READONLY

PINSEL0 EQU 0xE002C000 ; controls the function of the pins


U0START EQU 0xE000C000 ; start of UART0 registers
LCR0 EQU 0xC ; line control register for UART0
LSR0 EQU 0x14 ; line status register for UART0

RAMSTART EQU 0x40000000 ; start of onboard RAM for 2104


Mode_SVC EQU 0x13 ; Supervisor mode bits
I_Bit EQU 0x80 ; when I bit is set, IRQ is disabled
F_Bit EQU 0x40 ; when F bit is set, FIQ is disabled

T_bit EQU 0x20

ENTRY

MSR CPSR_c, #Mode_SVC:OR:I_Bit:OR:F_Bit


; switch to Supervisor mode

LDR sp, =RAMSTART ; and set Supervisor stack ptr

; Subroutine UARTConfig
; This subroutine configures the I/O pins first. It
; then sets up the UART control register. The parameters
; are set to 8 bits, no parity and 1 stop bit.
; Registers used:
; r5 – scratch register
; r6 – scratch register
; inputs: none
; outputs: none

UARTConfig
LDR r5,=PINSEL0 ; base address of register
LDR r6,[r5] ; get contents
BIC r6,r6,#0xF ; clear out lower nibble
ORR r6,r6,#0x5 ; sets P0.0 to Tx0 and P0.1 to Rx0
STR r6, [r5] ; r/modify/w back to register
LDR r5,=U0START
MOV r6, #0x83 ; 8 bits, no parity, 1 stop bit

109
STRB r6, [r5, #LCR0] ; write control byte to LCR
MOV r6, #0x61 ; 9600 baud @15MHz VPB clock
STRB r6, [r5] ; store control byte
MOV r6, #3 ; set DLAB = 0
STRB r6, [r5, #LCR0] ; Tx and Rx buffers set up

SVC_Handler
STMFA sp!, {r0-r3,r12,lr} ; stack registers
MOV r1, sp ; and set pointer to parameters
MRS r0, SPSR ; get SPSR
STMFA sp!, {r0} ; and store it on the stack
TST r0, #T_bit
LDRHNE r0, [lr,#-2] ; extract SVC number
BICNE r0, r0, #0xFF00 ; 8 bits for Thumb
LDREQ r0, [lr,#-4]
BICEQ r0, r0, #0xFF000000 ; 24 bits for ARM

; insert checks for SVC numbers here

BL printR8
LDMFA sp!, {r1}
MSR SPSR_csxf, r1
LDMFA sp!, {r0-r3,r12,pc}^ ; restore regs and return

printR8
MOV r9, lr ; preserve LR
CMP r8, #0xA ; is number less than 0xA?
BLT ADD0
ADD r8, r8, #"A"-"0"-0xA ; add offset 'A' to 'F'

ADD0 ADD r8, r8, #"0" ; convert to ASCII

LDR r1,=CharData ; starting address of characters

Loop LDRB r0, [r1],#1 ; load character, increment addr


CMP r0,#0 ; null terminated?
BLNE Transmit ; send character to UART
BNE Loop ; continue if not a ‘0’

LDR r5, =U0START


STRB r8, [r5] ; print r8's ASCII value

; Done B Done ; uncomment to test UART output

MOV pc, r9 ; return

Transmit
; Subroutine Transmit
; This routine puts one byte into the UART
; for transmitting.
; Register used:
; r5 – scratch

110
; r6 - scratch
; inputs: r0- byte to transmit
; outputs: none

STMFA sp!,{r5,r6,lr}
LDR r5,=U0START
wait LDRB r6,[r5,#LSR0] ; get status of buffer
CMP r6,#0x20 ; buffer empty?
BEQ wait ; spin until buffer’s empty
STRB r0,[r5]
LDMFA sp!,{r5,r6,pc}

CharData
DCB "Register r8 = ",0

END

2. Choose a device with general-purpose I/O pins, such as the LPC2103, and write an assembler

routine which sequentially walks a 1 back and forth across the I/O pins. In other words, at any

given time, only a single pin is set to 1 – all others are 0. Set up the Keil tools to display the pins

(you might want to compile and run the Blinky example that comes with the tools to see what the

interface looks like).

AREA IOwalk, CODE, READONLY

IODIR EQU 0xE0028008


IOSET EQU 0xE0028004
IOCLR EQU 0xE002800C

ENTRY

LDR r0, =0x00FF0000


LDR r1, =IODIR
STR r0, [r1] ; configure IO pins as output

LDR r1, =0x00010000 ; load least sig IO pin


LDR r2, =IOSET
LDR r3, =IOCLR

LoopForever
LDR r4, =8 ; loop counter for 8 pins

Loop1 STR r1, [r2] ; turn on LED


BL Delay ; delay

111
STR r0, [r3] ; turn off LEDs

LSL r1, r1, #1 ; shift to next pin


SUBS r4, r4, #1
BNE Loop1

LDR r4, =8

Loop2 LSR r1, r1, #1 ; shift to next pin


STR r1, [r2] ; turn on LED
BL Delay ; delay
STR r0, [r3] ; turn off LEDs

SUBS r4, r4, #1


BNE Loop2

B LoopForever

Delay
LDR r5, =0x00FF0000
DelayLoop
SUBS r5, r5, #1
BNE DelayLoop
MOV pc, lr

END

3. Rewrite the two examples in this chapter using full descending stacks.

See solutions in Chapter 13 for examples of how to do this.

4. What is the address range for on-chip SRAM for the LPC2106 microcontroller?

0x40000000 to 0x4000FFFF

5. Write an assembly program which takes a character entered from a keyboard and echoes it to a

display. To do this, you will need to use two UARTs, one for entering data and one for displaying

it. The routine which accepts characters should not generate an interrupt when data is available,

but merely wait for the character to appear in its buffer. In other words, the UART routine should

spin in a loop until a key is pressed, after which it branches back to the main routine. Note that

112
the UART window must have the focus in order to accept data from a keyboard (this is a

Windows requirement), so be sure to click on the UART window first when testing your code.

AREA UartRxTx, CODE, READONLY

PINSEL0 EQU 0xE002C000 ; controls the function of the pins

U0START EQU 0xE000C000 ; start of UART0 registers


LCR0 EQU 0xC ; line control register for UART0
LSR0 EQU 0x14 ; line status register for UART0

U1START EQU 0xE0010000 ; start of UART1 registers


LCR1 EQU 0xC ; line control register for UART1
LSR1 EQU 0x14 ; line status register for UART1

ENTRY

LDR r5,=PINSEL0 ; base address of register


LDR r6,[r5] ; get contents
BIC r6,r6,#0xF ; clear out lower nibble
LDR r4, =0x50005
ORR r6,r6,r4 ; sets P0.0 to Tx0; P0.1 to Rx0
; and P0.8 to Tx1; P0.9 to Rx1
STR r6, [r5] ; r/modify/w back to register

UART0Config
LDR r5,=U0START
MOV r6, #0x83 ; 8 bits, no parity, 1 stop bit
STRB r6, [r5, #LCR0] ; write control byte to LCR
MOV r6, #0x61 ; 9600 baud @15MHz VPB clock
STRB r6, [r5] ; store control byte
MOV r6, #3 ; set DLAB = 0
STRB r6, [r5, #LCR0] ; Tx and Rx buffers set up

UART1Config
LDR r5,=U1START
MOV r6, #0x83 ; 8 bits, no parity, 1 stop bit
STRB r6, [r5, #LCR1] ; write control byte to LCR
MOV r6, #0x61 ; 9600 baud @15MHz VPB clock
STRB r6, [r5] ; store control byte
MOV r6, #3 ; set DLAB = 0
STRB r6, [r5, #LCR1] ; Tx and Rx buffers set up

Recieve LDR r5,=U1START


waitRx LDRB r6,[r5,#LSR1] ; get status of buffer
TST r6,#0x1 ; buffer empty?
BEQ waitRx ; spin until buffer’s empty
LDRB r0,[r5]

113
Transmit LDR r5,=U0START
waitTx LDRB r6,[r5,#LSR0] ; get status of buffer
CMP r6,#0x20 ; buffer empty?
BEQ waitTx ; spin until buffer’s empty
STRB r0,[r5]

B Recieve ; loop forever

END

6. Using the D/A converter example as a guide, write an assembly routine which will generate a

waveform defined by

f(x) = asin(0.5x) + bsin(x) + c

where a, b and c are constants that allow a full period to be displayed. The output waveform

should appear on AOUT so that you can view it on the logic analyzer in the Keil tools.

; Sine wave generator using the LPC2132 microcontroller


; This program will generate a sine wave using
; the equation f(x) = 256sin(0.5x)+256sin(x)+512 and
; the D/A converter on the controller. The output can be
; viewed using the Logic Analyzer in the MDK tools

PINSEL1 EQU 0xE002C004


DACREG EQU 0xE006C000
SRAMBASE EQU 0x40000000

GLOBAL Reset_Handler
AREA Reset, CODE
ENTRY

Reset_Handler
main
LDR sp, =SRAMBASE ; initialize stack pointer
LDR r6, =PINSEL1 ; PINSEL1 configures pins
LDR r8, =DACREG ; DAC Register[15:6] is VALUE
LDR r7, [r6] ; read/modify/write
ORR r7, r7, #1:SHL:19 ; set bit 19
BIC r7, r7, #1:SHL:18 ; clear bit 18
STR r7, [r6] ; change P0.25 to Aout

outloop MOV r6, #720 ; start number

; Note that we have to count to 720 now, since the period of


; sin(0.5x) is twice that of sin(x)

114
inloop
RSB r1, r6, #720 ; arg = 720 - loop count
BL sine ; get sin(r1)
MOV r9, r0 ; hold sin(x)

; this isn't the cleanest way to do this, but generate


; another argument and calculate sin(0.5x)
RSB r1, r6, #720
MOV r1, r1, ASR#1 ; argument/2
BL sine ; get sin(r1)

; Now that we have r0 = sin(r1), we need to send


; this to the DAC converter
; First, we take the Q31 value and make it Q15
; and multiply it by 256. We do this for both
; the sin(x) and sin(0.5x). Then we offset the result
; by 512 to show the full sine wave. Aout is
; VALUE/1024 * Vref, so our sine wave should swing
; between 0 and 3.3V on the output.

; this section is for sin(0.5x)


ASR r0, r0, #16 ; convert Q31 to Q15
LSL r0, r0, #8 ; *256 now in Q15 notation
ASR r0, r0, #15 ; keep the integer part only
; this section is for sin(x)
ASR r9, r9, #16 ; convert Q31 to Q15
LSL r9, r9, #8 ; *256 now in Q15 notation
ASR r9, r9, #15 ; keep the integer part only

ADD r0, r9, r0 ; 256*sin(0.5x)+256*sin(x)


ADD r0, r0, #512 ; above+512 to show wave
LSL r0, r0, #6 ; bits 5:0 of DAC are
;
undefined
STRH r0, [r8] ; write to DACREG

SUBS r6, r6, #1 ; count down to 0


BNE inloop
B outloop ; do this forever

; Sine function
; Returns Q31 value for integer arguments from 0 to 360
; Registers used:
; r0 = return value in Q31 notation
; r1 = sin argument (in degrees)
; r4 = starting address in sine table
; r5 = temp
; r7 = copy of argument
; Note that the function has been slightly modified to account
; for values of x such that 360<=x<=720 so we can see a full
; period of sin(0.5x)

115
sine
STMIA sp!, {r4,r5,r7,lr} ; stack used registers
LDR r5,=270 ; const won't fit into rotation scheme
ADR r4, sin_data ; load address of sin table
CMP r1, #360 ; if x>360, then subtract 360 from it
SUBGE r1,r1,#360
MOV r7, r1 ; make a copy
CMP r1, #90 ; determine quadrant
BLE retvalue ; first quadrant?
CMP r1, #180
RSBLE r1,r1,#180 ; second quadrant?
BLE retvalue
CMP r1, r5
SUBLE r1, r1, #180 ; third quadrant?
BLE retvalue
RSB r1, r1, #360 ; otherwise, fourth

retvalue

LDR r0, [r4, r1, LSL #2] ; get sin value from table
CMP r7, #180 ; do we return a neg value?
RSBGT r0, r0, #0 ; negate the value
LDMDB sp!,{r4,r5,r7,pc} ; restore registers

ALIGN
sin_data
DCD 0x00000000,0x023be164,0x04779630,0x06b2f1d8
DCD 0x08edc7b0,0x0b27eb50,0x0d613050,0x0f996a30
DCD 0x11d06ca0,0x14060b80,0x163a1a80,0x186c6de0
DCD 0x1a9cd9c0,0x1ccb3220,0x1ef74c00,0x2120fb80
DCD 0x234815c0,0x256c6f80,0x278dde80,0x29ac3780
DCD 0x2bc750c0,0x2ddf0040,0x2ff31bc0,0x32037a40
DCD 0x340ff240,0x36185b00,0x381c8bc0,0x3a1c5c80
DCD 0x3c17a500,0x3e0e3dc0,0x40000000,0x41ecc480
DCD 0x43d46500,0x45b6bb80,0x4793a200,0x496af400
DCD 0x4b3c8c00,0x4d084600,0x4ecdff00,0x508d9200
DCD 0x5246dd00,0x53f9be00,0x55a61280,0x574bb900
DCD 0x58ea9100,0x5a827980,0x5c135380,0x5d9cff80
DCD 0x5f1f5f00,0x609a5280,0x620dbe80,0x63798500
DCD 0x64dd8900,0x6639b080,0x678dde80,0x68d9f980
DCD 0x6a1de700,0x6b598f00,0x6c8cd700,0x6db7a880
DCD 0x6ed9ec00,0x6ff38a00,0x71046d00,0x720c8080
DCD 0x730baf00,0x7401e500,0x74ef0f00,0x75d31a80
DCD 0x76adf600,0x777f9000,0x7847d900,0x7906c080
DCD 0x79bc3880,0x7a683200,0x7b0a9f80,0x7ba37500
DCD 0x7c32a680,0x7cb82880,0x7d33f100,0x7da5f580
DCD 0x7e0e2e00,0x7e6c9280,0x7ec11a80,0x7f0bc080
DCD 0x7f4c7e80,0x7f834f00,0x7fb02e00,0x7fd31780
DCD 0x7fec0a00,0x7ffb0280,0x7fffffff

END

116
You should see an image like the one below:

7. What is the address range for the following devices on the LPC2132 microcontroller?

a. General purpose input/output ports

b. Universal asynchronous receiver/transmitter 0

c. Analog-to-digital converter

d. Digital-to-analog converter

a. 0xE0028000 to 0xE002BFFF

b. 0xE000C000 to 0xE000FFFF

c. 0xE0034000 to 0xE0037FFF

d. 0xE006C000 to 0xE006FFF

8. What is the address range for the following devices on the STR910FM32 microcontroller?

a. General purpose input/output ports

b. Real time clock

c. Universal asynchronous receiver/transmitter

d. Analog-to-digital converter

117
a.

GPIO Port 0: 0x48006000 to 0x48006423


GPIO Port 1: 0x48007000 to 0x48007423
GPIO Port 2: 0x48008000 to 0x48008423
GPIO Port 3: 0x48009000 to 0x48009423
GPIO Port 4: 0x4800A000 to 0x4800A423
GPIO Port 5: 0x4800B000 to 0x4800B423
GPIO Port 6: 0x4800C000 to 0x4800C423
GPIO Port 7: 0x4800D000 to 0x4800D423
GPIO Port 8: 0x4800E000 to 0x4800E423
GPIO Port 9: 0x4800F000 to 0x4800F423

b. 0x4C001000 to 0x4C001017

c.

UART0: 0x4C004000 to 0x4C00404B


UART1: 0x4C005000 to 0x4C00504B
UART2: 0x4C006000 to 0x4C00604B

d. 0x4C00A000 to 0x4C00A037

9. The UART example given in this chapter uses registers r5 and r6 to configure the UART and,

therefore, must write them to the stack before corrupting them. Rewrite the UART example so

that no registers are stacked and the routine is still AAPCS compliant.

Just remove stack instructions and replace all instances or r5 with r2 and r6 with r3.

10. Modify the routine in Section 16.4.5 so that the LEDs flash alternately between red and blue

for about 1 sec each.

You only need to have the value written to Port F be either 2 or 8; therefore, instead of shifting by

1 bit, shift by two bits. Change the line

LSLLT r6, r6, #1

to

LSLLT r6, r6, #2

and the value will alternate between 2 and 8.

118
Chapter 17
1. On the ARM7TDMI, which bit in the CPSR indicates whether you are in ARM state or Thumb

state?

Bit 5

2. Give the mnemonic(s) for a Thumb instruction(s) that is equivalent to the ARM

instruction

SUB r0, r3, r2, LSL #2

LSL r2, #2
SUB r3, r2
MOV r0, r3

3. Why might you want to switch to Thumb state in an exception handler?

The bulk of the handler can be written in THUMB for the same reason you would use THUMB
for anything else: code density and better performance from narrow memory.

4. Can you talk to a floating-point coprocessor in Thumb state?

Not on the ARM7TDMI. On the Cortex-M4, since you are only executing Thumb-2 instructions,

you can execute floating-point code once the floating-point unit is enabled.

5. Using Figure 17.2 as a guide, convert Program 3 from Chapter 3 into 16-bit Thumb assembly.

AREA Prog3, CODE, READONLY


ENTRY
ARM

ADR r0, intoThumb + 1


BX r0

CODE16
intoThumb
LDR r0, =0xF631024C ; load some data
LDR r1, =0x17539ABD ; load some data

EOR r0, r1 ; r0 = r0 XOR r1

119
MOV r2, r0 ; r2 = temp r0
EOR r2, r1 ; r2 = r2 XOR r1
MOV r1, r2 ; r1 = r2 XOR r1
EOR r0, r1 ; r0 = r0 XOR r1

stop B stop

END

6. Describe why veneers might be needed in a program.

A veneer is needed when a branch involves a destination beyond the branching range of the

current state.

7. Convert Example 13.4 into Thumb code. Do not convert the entire subroutine – just the four

lines of code to perform saturation arithmetic.

MOV r6, #0
NEG r6, r6
LSR r6, r6, #1 ; MOV r6, #0x7FFFFFFF

LSL r0, r1
MOV r2, r0 ; MOV r2, r0, LSL r1

ASR r2, r1
CMP r0, r2
BEQ Return ; TEQ r0, r2, ASR r1

ASR r0, #31


EOR r6, r0
MOV r2, r6 ; EORNE r2, r6, r0, ASR #31

Return ; return instructions here

8. In which state does the ARM7TDMI processor come out of reset?

ARM

9. How do you switch to Thumb state if your processor supports both ARM and Thumb

instructions?

BX and BLX instructions

10. How do you switch to ARM state on the Cortex-M4?

120
This is a trick question, since you cannot switch to ARM state!

121
Chapter 18
1. Example 18.1 gives the program necessary to set the Q flag. Run the code using the Keil tools,

with the target being the STR910FM32 from ST Microelectronics. Which registers does the

compiler use, and what is the value in those registers just before the QDADD instruction is

executed?

r0 = 0x60000000, r1 = 0x0007000, r2 = 0x00007ABC, r3 = 0x60000000, sp = 0x04000458, lr

(r14) = 0x35B24000 , pc = 0x0000424C

2. Example 18.2 demonstrates the embedded assembler. Compile the code and run it. What is the

value in the program counter just before the BX instruction executes in the function

my_strcopy? In order to compile this example, you will need to target the LPC2101 from NXP

and include files from the “Inline” example found in the Keil “Examples” directory. Include the

source files serial.c and retarget.c in your own project. Also be sure to include the startup file

when asked. When you run the code, you can use the UART #2 window to see the output from

the printf statements.

0x000041D8

3. Write a short C program which declares a variable called TMPTR. Using example 18.2 as a

guide, print out the variable in degrees Celsius, with some initial temperature defined in the main

program in degrees Fahrenheit. Write the temperature conversion program as an inline assembly

function. You’ll want to use fractional arithmetic to avoid division.

#include <stdio.h>

extern void init_serial (void);

int main(void)
{
int TMPTR = 96;

122
int a, b, c;
float result;
init_serial();

__asm
{
MOV a, #0x471C // load 5/9 in Q15
SUB b, TMPTR, #32 // b = TMPTR - 32
MUL c, a, b // c = 5/9 * (TMPTR - 32)
}

result = c/32768.0; // convert Q15 to decimal


printf("%i",TMPTR);
printf(" degrees F equals ");
printf("%f",result);
printf(" degrees C.");

return 0;
}

4. Using the saturation algorithm discussed in Chapter 13, which performs a logical shift left by m

bits and saturates when necessary, write a C routine which calls it as an embedded assembly

function. The function should have two parameters: the value to be shifted and the shift count. It

should return the shifted value. The small C routine should create a variable with the initial value.

__asm void sat(int val, int shiftVal)


{
// Subroutine sat
// Performs r0 = saturate32 (r0 << r1)
// Registers used:
// r0 = operand to be shifted and result returned
// r1 = shift amount m
// r6 = scratch register
MOV r6, #0x7FFFFFFF
MOV r2, r0, LSL r1
TEQ r0, r2, ASR r1 // if (r0!=(r2>>m)
EORNE r0, r6, r0, ASR #31 // r2 = 0x7FFFFFFF^sign(r0)
}

void callsat()
{
// try out a positive case
// (this should saturate)
int val = 0x40000000;
int shiftVal = 2;

123
sat(val,shiftVal);
}

5. Modify Example 18.1 so that the function Clear_Q_Flag returns 1 when the function clears

a set Q flag; otherwise, if the bit was clear, it returns 0.

__inline int Clear_Q_flag (void)


{
int temp, temp2;
__asm
{
MRS temp, CPSR
BIC temp2, temp, #Q_Flag
ANDS temp, temp, #Q_Flag
MOVNE temp, #1
MSR CPSR_f, temp2
}
return temp;
}

6. Run Example 18.4 by creating two separate source files in the Keil tools. Once you have saved

these files, you can add them to a new project. The Keil tools will compile the C source file and

assemble the assembly language file automatically. When you run the code, you can see the

output on UART #2. Refer to Exercise 2 above for more details.

No written answer required.

7. Run Example 18.5 by creating three separate source files in the Keil tools. Recall that the FPU

must be initialized, and this should be one of the three files. Notice the value of OutVal in the

variables window and confirm the converted values match the expected inputs from the sensor

(see the comments in the array declaration).

No written answer required.

8. Expand Example 18.5 by converting the OutVal floating-point value back to S16 8x8 format.

Add this routine to the file containing the CvtShorts8x8ToFloat routine and call it

CvtFloatToShort8x8. Verify that the result of the conversion back to S16 8x8 format matches the

124
original value. Experiment with some other formats, such as 9.7 or 7.9, and see what values are

produced.

No written answer required.

125

Potrebbero piacerti anche