Solutions Manual For: Arm Assembly Language

solutions MANUAL FOR
ARM Assembly
Language
Fundamentals and Techniques
Second Edition
by
William Hohl and

Christopher Hinds
solutionS MANUAL FOR
ARM Assembly
Language
Second Edition
by
William Hohl and
Christopher Hinds
Boca Raton London New York
CRC Press is an imprint of the

Taylor & Francis Group, an informa business
CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742
© 2015 by Taylor & Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, an Informa business
No claim to original U.S. Government works
Printed on acid-free paper

Version Date: 20140722
International Standard Book Number-13: 978-1-4822-2992-9 (Ancillary)
This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been made to publish reliable data and
information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and
publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission
to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any
future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic,
mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or
retrieval system, without written permission from the publishers.
For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact
the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides
licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment
has been arranged.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation
without intent to infringe.
Visit the Taylor & Francis Web site at
http://www.taylorandfrancis.com
and the CRC Press Web site at
http://www.crcpress.com
ARM Assembly Language
2nd Edition
Solutions Manual
1
Copyright © 2015 William Hohl and Christopher Hinds. All rights reserved.
Proprietary Notice
ARM and the ARM Powered logo are trademarks of ARM Ltd. Neither the whole nor any
part of the information contained in, or the product described in, this document may be
adapted or reproduced in any material form except with the prior written permission of the
copyright holders. The product described in this document is subject to continuous
development and improvement. All particulars of the product and its use contained in this
document are given by the authors in good faith. However, all warranties implied or
expressed, including but not limited to implied warranties or merchantability, or fitness for
purpose, are excluded. This document is intended only to assist the reader in the use of the
product. Neither ARM Ltd nor the authors shall be liable for any loss or damage arising
from the use of any information in this document, or any error or omission in such
information, or any incorrect use of the product.
Credits
Thanks are due to Joe Bungo for many of these answers.
Release Information
The authors reserve the right to modify or change the solutions as better ones become
available or errors are found. Corrections or comments can be sent to us at WEBSITE
ADDRESS?
Issue Date By Change

Draft 01 Mar 09 JB/BH Created
A 17 Mar 14 BH/CH 2nd edition
2
Authors’ Note:
You’ll notice that many of the programs are already written to be executed in the Keil tools,
while some will have to be slightly altered, so mind the type of tool suite that you have. All
answers are provided as illustration only and are not intended to be considered optimal
solutions. Hopefully, students are able to follow the flow of the code and ultimately make
optimizations or improvements on their own.
William Hohl and Christopher Hinds
3
Table of Contents
Chapter 1 ............................................................................................................................. 5
Chapter 2 ........................................................................................................................... 11
Chapter 3 ........................................................................................................................... 14
Chapter 4 ........................................................................................................................... 18
Chapter 5 ........................................................................................................................... 22
Chapter 6 ........................................................................................................................... 27
Chapter 7 ........................................................................................................................... 31
Chapter 8 ........................................................................................................................... 46
Chapter 9 ........................................................................................................................... 53
Chapter 10 ......................................................................................................................... 59
Chapter 11 ......................................................................................................................... 65
Chapter 12 ......................................................................................................................... 74
Chapter 13 ......................................................................................................................... 89
Chapter 14 ....................................................................................................................... 103
Chapter 15 ....................................................................................................................... 108
Chapter 16 ....................................................................................................................... 109
Chapter 17 ....................................................................................................................... 119
Chapter 18 ....................................................................................................................... 122
4
Chapter 1
1. Give two examples of system-on-chip designs available from semiconductor manufacturers.
Describe their features and interfaces. They do not necessarily have to contain an ARM
processor.
The Texas Instruments OMAP 35x processors contain an ARM Cortex-A8 processor, along with
a camera image signal processor, external memory interfaces (SDRAM and general purpose
controllers), a display subsystem, serial communications, and removable media connections. The
LPC1758 microcontroller from NXP has a Cortex-M3 processor, along with an Ethernet MAC,
DMA controller, 4 UARTs, 2 CAN channels, a 10-bit Digital-to-Analog Converter, a motor
control PWM, timers, and a host of other peripherals.
2. Find the two’s complement representation for the following numbers, assuming they are
represented as a 16-bit number. Write the value in both binary and hexadecimal.
a. -93 b. 1,034
c. 492 d. -1094
a. 1111111110100011, 0xFFA3
b. 0000010000001010, 0x040A
c. 0000000111101100, 0x01EC
d. 1111101110111010, 0xFBBA
3. Convert the following binary values into hexadecimal:
a. 10001010101111 b. 10101110000110
c. 1011101010111110 d. 1111101011001110
a. 0x22AF
b. 0x2B86
5
c. 0xBABE
d. 0xFACE
4. Write the 8-bit representation of -14 in one’s complement, two’s complement and sign-
magnitude representations.
One’s: 11110001
Two’s: 11110010
Signed Magnitude: 10001110
5. Convert the following hexadecimal values to base 10:
a. 0xFE98 b. 0xFEED
c. 0xB00 d. 0xDEAF
a. 65176
b. 65261
c. 2816
d. 57007
6. Convert the following base 10 numbers to base 4:
a. 812 b. 101
c. 96 d. 3640
a. 30230
b. 1211
c. 1200
d. 320320
6
7. Using the smallest data size possible, either a byte, a halfword (16 bits), or a word (32 bits),
convert the following values into two’s complement representations:
a. -18,304 b. -20
c. 114 d. -128
a. 1011100010000000
b. 11101100
c. 01110010
d. 10000000
8. Indicate whether each value could be represented by a byte, a halfword or a word-length two’s
complement representation:
a. -32,765 b. 254
c. -1,000,000 d. -128
a. halfword
b. halfword
c. word
d. byte
9. Using the information from the ARM v7-M Architectural Reference Manual, write out the 16-
bit binary value for the instruction SMULBB r5, r4, r3.
11111011000101001111010100000011
10. Describe all the ways of interpreting the hexadecimal number 0xE1A02081 (hint: it might not
be data).
It can be data in a number of formats:
Unsigned: 3,785,367,681 in decimal
7
Signed one’s complement: -509,599,614 in decimal
Signed two’s complement: -505,599,615 in decimal
It could be an address, or the instruction
MOV r2, r1, LSL #1 (you will see this as LSL r2, r1, #1 in UAL format)
It could also be the floating-point number -3.69227 x 1020.
11. If the hexadecimal value 0xFFE3 is a two’s complement, halfword value, what would it be in
base 10? What if it were a word-length value (i.e., 32 bits long)?
Halfword: -29
Word: 65507
12. How do you think you could quickly compute values in octal (base 8) given a value in binary?
Start from the right, splitting the binary number up into 3-bit partitions, then convert each 3-bit
partition into its decimal equivalent.
13. Convert the following decimal numbers into hexadecimal:
a. 256 b. 1000
c. 4095 d. 42
a. 0x100
b. 0x3E8
c. 0xFFF
d. 0x2A
14. Write the 32-bit representation of -247 in sign-magnitude, one’s complement, and two’s
complement notations. Write the answer using 8 hex digits.
Sign-magnitude: 0x800000F7
8
One’s complement: 0xFFFFFF08
Two’s complement: 0xFFFFFF10
15. Write the binary pattern for the letter ‘Q’ using the ASCII representation.
01010001 or 0x51
16. Multiply the following binary values. Notice that binary multiplication works exactly like
decimal multiplication, except you are either adding 0 to the final product or a scaled
multiplicand. For example:
100 (multiplicand)
x 110 (multiplier)
-----
0
1000 (scaled multiplicand – by 2)
10000 (scaled multiplicand – by 4)
---------
11000
a. 1100 b. 1010 c. 1000 d. 11100

x 1111 x 1011 x 1001 x 111
------ --------- --------- -----------
a. 10110100
b. 1101110
c. 1001000
d. 11000100
17. How many bits would the following C data types use by the ARM7TDMI?
a. int
b. long
c. char
d. short
e. long long
9
a. 32
b. 32
c. 8
d. 16
e. 64
18. Write the decimal number 1.75 in the IEEE single-precision floating-point format. Use one of
the tools given in the References to check your answer.
0x3FE00000
10
Chapter 2
1. How many modes does the ARM7TDMI processor have? How many states does it have? How
many modes does the Cortex-M4 have?
ARM7TDMI: 7 modes, 2 states
Cortex-M4: 2 modes, Handler mode and Thread mode
2. What do you think would happen if the instruction SMULTT (an instruction that runs fine on a
Cortex-M4) were issued to an ARM7TDMI? Which mode do you think it would be in after this
instruction entered the execute stage of its pipeline?
It wouldn’t recognize the instruction and would go into Undefined mode.
3. What is the standard use of register r14? Register r13? Register r15?
r14: Link Register
r13: Stack Pointer
r15: Program Counter
4. On an ARM7TDMI, in any given mode, how many registers does a programmer see at one
time?
17 for User mode, 18 for remaining modes
5. Which bits of the ARM7TDMI status registers contain the flags? Which register on the Cortex-
M4 holds the status flags?
ARM7TDMI: Bits [31:28]
Cortex-M4: APSR bits [31:28]
11
6. If an ARM7TDMI processor encounters an undefined instruction, from what address will it
begin fetching instructions after it changes to Undefined mode? What about a reset?
On the ARM7TDMI: Undefined: 0x04 and Reset: 0x0
7. What is the purpose of FIQ mode?
Handling of highest priority interrupt(s)
8. Which mode on an ARM7TDMI can assist in supporting operating systems, especially for
supporting virtual memory systems?
Abort mode
9. How do you enable interrupts on the ARM7TDMI?
Clear bits 6 and 7 of the CPSR
10. How many stages does the ARM7TDMI pipeline have? Name them.
3: Fetch, Decode, and Execute
11. Suppose that the program counter, register r15, contained the hex value 0x8000. From what
address would an ARM7TDMI fetch the next instruction (assume you are in ARM state)?
This depends on the state of the machine. It would be 0x8004 if the machine is in ARM state.
Note that it could be 0x8002 if the machine is in Thumb state.
12. What is the function of the Saved Program Status Register?
To save the state of the machine in the CPSR when an exception is taken so it can be restored
upon return of handling the exception.
12
13. On an ARM7TDMI, is it permitted to put the instruction
SUB r0, r2, r3
at address 0x4? How about at address 0x0? Can you put that same bit pattern at address 0x4 in a
system using a Cortex-M4?
On an ARM7TDMI, technically, yes and yes. The instruction would be executed. You could put
the bit pattern for the SUB instruction at address 0x4 in a Cortex-M4 system, but remember that
the bit pattern is seen as an address, so the machine will attempt to fetch an instruction from that
address, and mostly likely die a horrible death.
14. Describe the exception vector table for any other microprocessor. How does it differ from the
ARM7TDMI processor? How does it differ from the Cortex-M4?
Motorola 68000’s vectors correspond to different exceptions like ARM’s, but they contain
addresses instead of instructions. It is actually more like the Cortex-M4’s exception vectors.
15. Give an example of an instruction that would typically be placed at address 0x0 on an
ARM7TDMI. What value is typically placed at address 0x0 on a Cortex-M4?
LDR pc, [pc, #offset]
where “#offset” is an offset to the Reset handler.
For the Cortex-M4, the address of the top of the stack is placed at address 0x0.
16. Explain the current program state of an ARM7TDMI if the CPSR had the value 0xF00000D3.
N, Z, C, V flags are set; IRQs and FIQs are enabled; ARM state; Supervisor mode
13
Chapter 3
1. Change Program 1, replacing the last MOV instruction with
ADD r2, r1, r1, LSL #2
and rerun the simulation. What value is in register r2 when the code reaches the infinite loop (the
B instruction)? What is the ADD instruction actually doing?
r2 = 170 (0xAA). The ADD instruction is multiplying r1 by 5.
2. Using a Disassembly window, write out the seven machine codes (32-bit instructions) for
Program 2.
0xE3A0600A
0xE3A07001
0xE3560000
0xC0070796
0xC2466001
0xCAFFFFFB
0xEAFFFFFE
3. How many bytes does the code for Program 2 occupy? What about Program 3?
Program 2: For the ARM7TDMI code, 28 bytes. For the Cortex-M4 code, 22 bytes.
Program 3: 32 bytes (don’t forget the two constants in the literal pool)
4. Change the value in register r6 at the start of Program 2 to 12. What value is in register r7
when the code terminates? Verify that this hex number is correct.
0x1C8CFC00 = 479001600 = 12!
5. Run Program 3. After the first XOR instruction, what is the value in register r0? After the
second XOR instruction, what is the value in register r1?
r0 = 0xE16298F1
r1 = 0xF631024C
14
6. Using the instructions in Program 2 as a guide, write a program that computes 6x2-9x+2 and
leaves the result in register r2. You can assume x is in register r3. For the syntax of the
instructions, such as addition and subtraction, see the ARM Architectural Reference Manual and
the ARM v7-M Architectural Reference Manual.
AREA Prog6, CODE, READONLY

ENTRY
MOV r4, #6
MUL r2, r3, r3 ; r2 = x^2
MUL r2, r4, r2 ; r2 = 6x^2
ADD r3, r3, r3, LSL #3 ; r3 = 9x
SUB r2, r2, r3 ; r2 = 6x^2 - 9x
ADD r2, r2, #2 ; r2 = 6x^2 -9x +2
stop B stop
END
You can also implement this by rewriting the expression as (6x - 9)x + 2.
7. Show two different ways to clear all the bits in register r12 to zero. You may not use any other
registers other than r12.
AND r12, r12, #0
EOR r12, r12, r12
SUB r12, r12, r12
8. Using Program 3 as a guide, write a program that adds the 32-bit two’s complement
representations of -149 and -4,321. Place the result in register r7. Show your code and the
resulting value in register r7.

ENTRY
LDR r1, =0xFFFFFF6B ; r1 = -149

LDR r2, =0xFFFFEF1F ; r2 = -4321
ADD r7, r1, r2 ; r7 = -149 -4321 = -4470
stop B stop
END
15
r7 = 0xFFFEE8A
9. Using Program 2 as a guide, execute the following instructions. Place small values in the
registers beforehand. What do the instructions actually do?
a. MOVS r6, r6, LSL #5
b. ADD r9, r8, r8, LSL #2
c. RSB r10, r9, r9, LSL #3
d. (b) followed by (c)
a. r6 = r6 * 32
b. r9 = r8 * 5
c. r10 = r9 * 7
d. r10 = r8 * 35
10. Suppose a branch instruction is located at address 0x0000FF00 in memory. What ARM
instruction (32-bit binary pattern) do you think would be needed so that this B instruction could
branch to itself?
0xEA003FBB
11. Translate the following machine code into ARM mnemonics. What does the machine code
do? What is the final value in register r2? You will want to compare these bit patterns with
instructions found in the ARM Architectural Reference Manual.
Address Machine code
00000000 E3A00019
00000004 E3A01011
00000008 E0811000
0000000C E1A02001
MOV r0, #25

MOV r1, #17
16
ADD r1, r1, r0
MOV r2, r1
12. Using the VLDR pseudo-instruction shown in Program 5, change Program 4 so that it adds
the value of pi (3.1415926) to 2.0. Verify that the answer is correct using one of the floating-point
conversion tools given in the References.
LDR r0, =0xE000ED88 ; read modify write

LDR r1, [r0]
ORR r1, r1, #(0xF<<20) ; enable CP10, CP11
STR r1, [r0]
VLDR.F s0, =2.0
VLDR.F s1, =3.1415926 ; pi
VADD.F s2, s1, s0
13. The floating-point instruction VMUL.F works very much like a VADD.F instruction. Using
Programs 4 and 5 as a guide, multiply the floating-point representation for Avagadro’s constant
and 4.0 together. Verify that the result is correct using a floating-point conversion tool.
LDR r0, =0xE000ED88 ; read modify write

LDR r1, [r0]
ORR r1, r1, #(0xF<<20) ; enable CP10, CP11
STR r1, [r0]
VLDR.F s0, =4.0
VLDR.F s1, =6.0221415e23 ; Avagadro’s constant
VMUL.F s2, s1, s0
17
Chapter 4
1. What is wrong with the following program?
AREA ARMex2, CODE, READONLY

ENTRY
start
MOV r0, #6
ADD r1, r2, #2
END
The MOV and ADD instructions need to be indented as they cannot be in the label space.
2. What is another way of writing the following line of code?
MOV PC, LR
MOV r15, r14
3. Use a Keil directive to assign register r6 to the name bouncer.
bouncer RN 6
4. Use a Code Composer Studio directive to assign register r2 to the name FIR_index.
.asg R2, FIR_index
5. Fill in the missing Keil directive below:
SRAM_BASE ________ 0x2000
MOV r12, #SRAM_BASE

STR r6, [r12]
EQU
6. What is the purpose of a macro?
To allow a programmer to build definitions of functions or operations once, and then call this
operation by name throughout the code, saving coding time.
18
7. Create a mask (bit pattern) in memory using the DCD directive (Keil) and the SHL and OR
operators for the following cases. Repeat the exercise using the .word (CCS) and the << and |
operators. Remember that bit 31 is the most significant bit of a word and bit 0 is the least
significant bit.
a. The upper two bytes of the word are 0xFFEE and the least significant bit is set.
b. Bits 17 and 16 are set, and the least significant byte of the word is 0x8F.
c. Bits 15 and 13 are set (hint: do this with two SHL directives).
d. Bits 31 and 23 are set.
Keil:
a. MaskA DCD (0xFFEE:SHL:16):OR:1
b. MaskB DCD 0x8F:OR:(3:SHL:16)
c. MaskC DCD (1:SHL:13):OR:(1:SHL:15)
d. MaskD DCD (1:SHL:31):OR:(1:SHL:23)
CCS:
a. MaskA .word (0xFFEE<<16) | 1
.b MaskB .word 0x8F | (3<<16)
.c MaskC .word (1<<13) | (1<<15)
.d MaskD .word (1<<31) | (1<<23)
8. Give the Keil directive that assigns the address 0x800C to the symbol INTEREST.
INTEREST EQU 0x800C
9. What constant would be created if the following operators are used with a DCD directive? For
example,
MASK DCD 0x5F:ROL:3
19
a. 0x5F:SHR:2 b. 0x5F:AND:0xFC
c. 0x5F:EOR:0xFF d. 0x5F:SHL:12
a. 0x17
b. 0x5C
c. 0xA0
d. 0x5F000
10. What constant would be created if the following operators are used with a .word directive?
For example,
MASK .word 0x9B<<3
a. 0x9B>>2 b. 0x9B & 0xFC
c. 0x9B ^ 0xFF d. 0x9B<<12
a. 0x26
b. 0x98
c. 0x64
d. 0x9B000
11. What instruction puts the ASCII representation of the character ‘R’ in register r11?
MOV r11, #0x52
12. Give the Keil directive to reserve a block of zeroed memory, holding 40 words and labeled
coeffs.
coeffs SPACE 160
13. Give the CCS directive to reserve a block of zeroed memory, holding 40 words and labeled
coeffs.
20
coeffs .space 160
14. Explain the difference between Keil’s EQU, DCD, and RN directives. Which, if any, would
be used for the following cases?
a. Assigning the Abort mode’s bit pattern (0x17) to a new label called Mode_ABT.
b. Storing sequential byte-sized numbers in memory to be used for copying to another location in
memory.
c. Storing the contents of register r12 to memory address 0x40000004.
d. Associating a particular microcontroller’s predefined memory-mapped register address with a
name from the chip’s documentation, for example, VIC0_VA7R.
EQU gives a symbolic name to constants and values.
DCD allocates one or more word-aligned words of memory and defines runtime contents of
memory.
RN defines a register name for a particular register.
a. EQU
b. neither, should be DCB
c. neither
d. EQU
21
Chapter 5
1. Describe the contents of register r13 after the following instructions complete, assuming that
memory contains the values shown below. Register r0 contains 0x24, and the memory system is
little-endian.
Address Contents
0x24 06
0x25 FC
0x26 03
0x27 FF
a. LDRSB r13, [r0] c. LDR r13, [r0]
b. LDRSH r13, [r0] d. LDRB r13, [r0]
a. 0x06
b. 0xFFFFFC06
c. 0xFF03FC06
d. 0x06
2. Indicate whether the following instructions use pre- or post-indexed addressing modes:
a. STR r6, [r4, #4] c. LDRB r4, [r3, r2]!
b. LDR r3, [r12], #6 d. LDRSH r12, [r6]
a. pre-indexed
b. post-indexed
c. pre-indexed
d. pre-indexed
3.Calculate the effective address of the following instructions if register r3 = 0x4000 and register
r4 = 0x20:
a. STRH r9, [r3, r4] c. LDR r7, [r3], r4
22
b. LDRB r8, [r3, r4, LSL #3] d. STRB r6, [r3], r4, ASR #2
a. 0x4020
b. 0x4100
c. 0x4000
d. 0x4000
4. What’s wrong with the following instruction on an ARM7TDMI?
LDRSB r1,[r6],r3,LSL#4
For halfword, signed halfword, and signed byte accesses, an offset register cannot be shifted.
This was a design tradeoff because the need for this is so infrequent.
5. Write a program for either the ARM7TDMI or the Cortex-M4 that sums word-length values in
memory, storing the result in register r3. Include the following table of values to sum in your
code:
TABLE DCD 0xFEBBAAAA, 0x12340000, 0x88881111

DCD 0x00000013, 0x80808080, 0xFFFF0000
; For the ARM7TDMI

AREA Exercise5, CODE
ENTRY
LDR r0, =TABLE ; r0 is pointer to TABLE

MOV r1, #6 ; r1 is length of table
MOV r2, #0 ; r2 is sum
loop LDR r3, [r0], #4 ; load TABLE value and

; post-increment pointer
ADD r2, r2, r3 ; r3 = sum
SUBS r1, r1, #1 ; decrement counter
BNE loop
stop B stop
TABLE DCD 0xFEBBAAAA, 0x12340000, 0x88881111

DCD 0x00000013, 0x80808080, 0xFFFF0000
23
END
NB: If students aren’t using B (branch) instructions, then you could code this using only ADD
and LDR/STR instructions.
6. Assume an array contains 30 words of data. A compiler associates variables x and y with
registers r0 and r1, respectively. Assume the starting address of the array is contained in register
r2. Translate the C statement below into assembly instructions using pre-indexed addressing:
x = array[7] + y;
LDR r0, [r2, #28] ; r0 = array[7]

ADD r0, r0, r1 ; r0 = array[7] + y
7. Using the same initial conditions as Exercise 6, translate the following C statement into
assembly instructions using pre-indexed addressing:
array[10] = array[8] + y;
LDR r0, [r2, #32] ; r0 = array[8]

ADD r0, r0, r1 ; r0 = array[8] + y
STR r0, [r2, #40] ; array[10] = array[8] + y
8. Consider a C procedure that initializes an array of bytes to all zeros, given as
init_Indices (int a[], int s) {

int i;
for (i = 0; i < s; i ++)
a[i] = 0; }
Write the assembly language for this initialization routine. Assume s > 0 and is held in register r2.
Register r1 contains the starting address of the array, and the variable i is held in register r3.
While loops are not covered until Chapter 8, you can build a simple for loop using the following
construction:
MOV r3, #0 ; clear i
24
loop instruction
instruction
ADD r3, r3, #1 ; increment i
CMP r3, r2 ; compare i to s
BNE loop ; branch to loop if not equal
AREA init_Indices, CODE, READONLY

ENTRY
MOV r3, #0 ; i = 0
MOV r4, #0 ; r4 = clearing agent
Loop STRB r4, [r1], #1 ; a[i] = 0

ADD r3, r3, #1 ; increment i
CMP r3, r2 ; i < s ?
BNE Loop
stop B stop
END
9. Suppose that registers belonging to a particular peripheral on a microcontroller have a starting
address of 0xE000C000. Individual registers within the peripheral are addressed as offsets from
the starting address. If a register called LSR0 is 0x14 bytes away from the starting address, write
the assembly and Keil directives that will load a byte of data into register r6, where the data is
located in the LSR0 register. Use pre-indexed addressing.
LSR_BASE EQU 0xE000C000

LSR0 EQU 0x14
LDR r0, =LSR_BASE

LDRB r6, [r0, #LSR0]
10. Assume register r3 contains 0x8000. What would the register contain after executing the
following instructions?
a. STR r6, [r3, #12] c. LDRH r5, [r3], #8
b. STRB r7, [r3], #4 d. LDR r12, [r3, #12]!
a. 0x8000
b. 0x8004
c. 0x8008
25
d. 0x800C*
* Assuming this instruction doesn’t immediately follow (c) in a program.
11. Assuming you have a little-endian memory system connected to the ARM7TDMI, what
would register r4 contain after executing the following instructions? Register r6 holds the value
0xBEEFFACE and register r3 holds 0x8000.
STR r6, [r3]

LDRB r4, [r3]
What if you had a big-endian memory system?
Little: 0xCE
Big: 0xBE
26
Chapter 6
1. What constant would be loaded into register r7 by the following instructions?
a. MOV r7, #0x8C, 4 c. MVN r7, #2
b. MOV r7, #0x42, 30 d. MVN r7, #0x8C, 4
a. 0xC0000008
b. 0x00000108
c. 0xFFFFFFFD
d. 0x3FFFFFF7
2. Using the byte rotation scheme described for the ARM7TDMI, calculate the instruction and
rotation needed to load the following constants into register r2:
a. 0xA400 c. 0x17400
b. 0x7D8 d. 0x1980
a. MOV r2, #0xA4, 24
b. can’t be done – use LDR r2, =0x7D8
c. MOV r2, #0x5D, 22
d. MOV r2, #0x66, 26
3. Tell whether or not the following constants can be loaded into an ARM7TDMI register without
creating a literal pool and using only a single instruction:
a. 0x12340000 c. 0xFFFFFFFF
b. 0x77777777 d. 0xFFFFFFFE
a. no
b. no
c. yes – MVN r2, #0
27
d. yes – MVN r2, #1
4. Tell whether or not the following constants can be loaded into a Cortex-M4 register without
creating a literal pool and using only a single instruction:
a. 0xEE00EE00 c. 0x33333373
b. 0x09A00000 d. 0xFFFFFFFE
a. yes
b. yes
c. no – use LDR r3, =0x33333373
d. yes – use MVN r2, #1
5. What is the best way to put a numeric constant into a register?
Use the pseudo-load instruction, LDR Rd, =constant
6. Where is the best place to put literal pool data?
Very close to the instructions accessing the data.
7. Suppose you had the following code:
AREA SAMPLE, CODE,READONLY

ENTRY
start
MOV r12, #SRAM_BASE
ADD r0, r1, r2
MOV r0, #0x18
BL routine1
.
.
.
routine1
STM sp!, {r0-r3,lr}
.
.
.
END
28
Describe two ways to load the label routine1 into register r3, noting any restrictions that
apply.
1. LDR r3, =routine1

2. MOV r3, #routine1 ; routine1 address must be within range
; of immediates useable for data
; processing instructions
8. Describe the difference between ADR and ADRL.
ADR loads a relative address of a label into a register if the address is not more than 255 bytes
away. ADRL does the same, but it has a +-64KB range.
9. Give the instruction(s) to perform the following operations for both the ARM7TDMI and the
Cortex-M4:
a. Add 0xEC00 to register r6, placing the sum in register r4.
b. Subtract 0xFF000000 from register r12, placing the result in register r7.
c. Add the value 0x123456AB to register r7, placing the sum in register r12.
d. Place a two’s complement representation of -1 into register r3.
a. ADD r4, r6, #0xEC, 24 (ARM7TDMI)
ADD.W r4,r6, #0xEC00 (Cortex-M4 – note that we need the 32-bit ADD instruction)
b. SUB r7, r12, #0xFF, 8 (ARM7TDMI)
SUB.W r7, r12, #0xFF000000 (Cortex-M4 – note that we need the 32-bit SUB
instruction)
c. LDR r0, =0x123456AB

ADD r12, r7, r0
d. MVN r3, #0
10. Suppose you had the following code and you are using the Keil tools:
AREA EXERCISE, CODE

ENTRY
BL func1 ; call first subroutine
BL func2 ; call second subroutine
29
stop B stop ; terminate the program
func1 MOV r2, #0

LDR r1, =0xBABEFACE
LDR r2, =0xFFFFFFFC
MOV pc, lr ; return from subroutine
LTORG ; literal pool 1 has 0xBABEFACE

func2 LDR r3, =0xBABEFACE
LDR r4, =0x66666666
MOV pc, lr ; return from subroutine
BigTable
SPACE 3700 ; clears 3700 bytes of memory,
; starting here
END
On an ARM7TDMI, will loading 0x66666666 into register r4 cause an error? Why or why not?
What about on a Cortex-M4?
No. The LDR can reach the literal pool after the 3700 bytes of memory space since it will be
within 4K. The Cortex-M4 will not generate a literal since this type of constant can be loaded
with a single instruction.
11. The ARM branch instruction – B – provides a ±32MB branching range. If the program
counter is currently 0x8000 and you need to jump to address 0xFF000000, how do you think you
might do this?
Directly load the PC with the instruction LDR pc, #0xFF000000
12. Assuming that the floating-point hardware is already enabled (CP10 and CP11), write the
instructions to load the floating-point register s3 with a quiet NaN, or 0x7FC00000. (Hint: use
Program 5 from Chapter 3 as a guide.) You can write it using either the Keil tools or the CCS
tools.
; using the Keil tools here

VLDR.F s3, =0x7FC00000
Or if you’re using the CCS tools:
MOVW r0, #0x0

MOVT r0, #0x7FC0
VMOV.F s3, r0
30
Chapter 7
1. What’s wrong with the following ARM instructions? You may want to consult the ARM
Architectural Reference Manual to see the complete instruction descriptions and limitations.
a. ADD r3,r7,#1023 b. SUB r11,r12,r3,LSL #32
c. RSCLES r0, r15, r0, LSL r4 d. EORS r0, r15, r3, ROR r6
a. #1023 is out of range for immediate for data processing instructions
b. cannot shift register operand 2 more than #31
c. cannot use pc as Rn or Rm in register specified shift, as it produces undefined effects
d. cannot use pc as Rn or Rm in register specified shift, as it produces undefined effects
2. Without using the MUL instruction, give instructions which multiply register r4 by:
a. 135 c. 18
b. 255 d. 16,384
a. RSB r0, r4, r4, LSL #3

ADD r1, r0, r4, LSL #7
b. RSB r0, r4, r4, LSL #8
c. ADD r0, r4, r4, LSL #3

ADD r1, r0, r0
d. ADD r0, r4, LSL #14
3. Write a compare routine to compare two 64-bit values, using only two instructions. (Hint: the
second instruction is conditionally executed, based on the first comparison).
;Value1 = r1:r0
;Value2 = r3:r2
compare64
CMP r1, r3
CMPEQ r0, r2
31
4. Write shift routines that allow you to arithmetically shift 64-bit values which are stored in two
registers. The routines should shift an operand left or right by one bit.
; 64-bit value in r1:r0
leftShift1
MOV r1, r1, LSL #1 ; the MSB can drop off
MOVS r0, r0, LSL #1 ; MUST use MOVS to set flags
ADC r1, r1, #0 ; use the carry from shifter
rightShift1
MOVS r1, r1, ASR #1 ; the LSB goes to C bit
MOV r0, r0, RRX ; C bit goes to MSB of r0
5. Write the following decimal values in Q15 notation:
a. 0.3487 b. -0.1234
c. -0.1111 d. 0.7574
a. 0x2CA2
b. 0xFFFFF035
c. 0xFFFFF1C8
d. 0x60F2
6. Write the following signed, two’s complement Q8 values in decimal:
a. 0xFE32 b. 0x9834
c. 0xE800 d. 0xF000
a. -1.8047
b. -103.7969
c. -24
d. -16
32
7. Write the assembly code necessary to detect an error in a 12-bit Hamming code, where your
code tests the four checksum bits c3, c2, c1 and c0. Place your corrupted data in memory.
Assume that only a single error occurs in the data and store your corrected value in register r6.
AREA Hamming, CODE

ENTRY
; generate Hamming Code
MOV r2, #0 ; clear out transmitting reg

ADR r1, data
LDRB r0, [r1] ; load data
;
; calculate c0 using bits 76543210
; * ** **
; even parity, so result of XORs is the value of c0
;
MOV r4, r0 ; make a copy
EOR r4, r4, r0, ROR #1 ; 1 XOR 0
EOR r4, r4, r0, ROR #3 ; 3 XOR 1 XOR 0
EOR r4, r4, r0, ROR #4 ; 4 XOR 3 XOR 1 XOR 0
EOR r4, r4, r0, ROR #6 ; 6 XOR 4 XOR 3 XOR 1 XOR 0
AND r2, r4, #1 ; create c0 -> R2
;
; ** ** *
MOV r4, r0
EOR r4, r4, r0, ROR #2 ; 2 XOR 0
AND r4, r4, #1 ; isolate bit
ORR r2, r2, r4, LSL #1 ; 7 6 5 4 3 2 c1 c0
;
; * ***
ROR r4, r0, #1 ; get bit 1
EOR r4, r4, r0, ROR #2 ; 2 XOR 1
ORR r2, r2, r4, ROR #29 ; 7 6 5 4 c2 2 c1 c0
;
; ****
EOR r4, r4, r0, ROR #5 ; 5 XOR 4
33
AND r4, r4, #1
;
; build the final 12-bit result
;
ORR r2, r2, r4, ROR #25 ; rotate left 7 bits
AND r4, r0, #1 ; get bit 0 from original
ORR r2, r2, r4, LSL #2 ; add bit 0 into final
BIC r4, r0, #0xF1 ; get bits 3,2,1
ORR r2, r2, r4, LSL #3 ; add bits 3,2,1 to final
BIC r4, r0, #0x0F ; get upper nibble
ORR r2, r2, r4, LSL #4 ; r2 now contains 12 bits
; with checksums
; for this example, r2 holds the Hamming code 0xA6B
; now, corrupt it with a few hypothetical transmission errors
BIC r2, r2, #0xA ; now r2 holds corrupted

; hamming code 0xA61
;BIC r2, r2, #0x9 ; now r2 holds corrupted
; hamming code 0xA62
ADR r1, hammingCodeHolder
STR r2, [r1] ; store it in memory
; construct corrupt data from corrupt Hamming code
LDR r2, [r1] ; load corrupted hamming code

MOV r0, #0 ; clear out register to create
; data from corrupted Hamming code
; corrupted checksum bit indicator
; corrupted checksum bit counter
ANDS r5, r2, #0x4 ; d0 in corrupted Hamming code = 1?

ADDNE r0, r0, #0x1 ; if so, d0 = 1 in corrupted data
34
; now, r0 = corrupted data (extracted from corrupted Hamming
; code)
; check if c0 corrupted
AND r3, r2, #0x1 ; r3 = c0 from corrupted Hamming code
EOR r4, r4, r0, ROR #1 ; 1 XOR 0
EORS r4, r4, r3 ; 6 XOR 4 XOR 3 XOR 1 XOR 0 XOR c0
ADDNE r7, r7, #0x1 ; if not equal to 0, c0 is corrupted
ADDNE r8, r8, #1 ; increment corrupted checksum counter
AND r3, r2, #0x2
LSR r3, r3, #1 ; r3 = c1 from corrupted Hamming code
EOR r4, r4, r0, ROR #2 ; 2 XOR 0
EORS r4, r4, r3 ; 6 XOR 5 XOR 3 XOR 2 XOR 0 XOR c1
AND r3, r2, #0x8
EOR r4, r4, r0, ROR #2 ; 2 XOR 1
EORS r4, r4, r3 ; 7 XOR 3 XOR 2 XOR 1 XOR c2
AND r3, r2, #0x80
EOR r4, r4, r0, ROR #5 ; 5 XOR 4
EORS r4, r4, r3 ; 7 XOR 6 XOR 5 XOR 4 XOR c3
ADDNE r7, r7, #0x8; if not equal to zero, c3 is corrupted
35
; now, r7 bits indicate corrupted checksum bits from corrupted
; Hamming code
CMP r8, #2 ; was there two checksum bits corrupted?

MOVNE r6, r0 ; if not, data is fine
BNE done
CMP r7, #0x3 ; if so, was it c1 and c0?

EOREQ r6, r0, #0x1 ; if so, bit 0 should be changed
CMP r7, #0xA ; if so, was it c3 and c1?
CMP r7, #0xC ; if so, was it c3 and c2?
done B done
data DCD 0xAC

hammingCodeHolder
DCD 0x0
END
8. Write a program to calculate π x 48.9 in Q10 notation.
AREA FRACTIONAL, CODE

ENTRY
LDR r0, =0xC90 ; load pi in Q10

LDR r1, =0xC399 ; load 48.9 in Q10
MUL r2, r0, r1 ; signed multiply, result in Q20
END
9. Show the representation of sin(82°) in Q15 notation.
0x7EC1
10. Show the representation of sin(193°) in Q15 notation.
0xE335
36
11. Temperature conversion between Celsius and Fahrenheit can be computed using the
relationship
5
C ( F  32)
9
where C and F are in degrees. Write a program that converts a Celsius value in register r0 to
degrees Fahrenheit. Convert the fraction into a Q15 representation and use multiplication instead
of division in your routine. Load your test value from a memory location called CELS and store
the result in memory labeled FAHR. Remember that you will need to specify the starting address
of RAM for the microcontroller that you use in simulation. For example, the LPC2132
microcontroller has SRAM starting at address 0x40000000.
GLOBAL Reset_Handler
AREA Reset, CODE, READONLY
ENTRY
; certain assumptions are made here:

; the temperature is between 0 and 100 degrees C
; the result truncates the fractional portion, so it's
; a whole number in degrees F, without rounding up, etc.
; You can make it slightly more accurate if you wish.
FAHR EQU 0x40000000 ; for LPC2102, start of RAM

CELS EQU 0xE ; Celsius value in Q0
Reset_Handler
LDR r0, =CELS ; load Celsius value in Q15

LDR r4, =FAHR ; load address of FAHR
LDR r1, =0xE666 ; load 1.8 in Q15
MUL r3, r0, r1 ; signed multiply, result in Q15
LDR r2, =0x100000 ; load 32 in Q15
ADD r3, r3, r2 ; add 32
LSR r3, r3, #15 ; shift right 15 to convert to Q0
STR r3, [r4] ; store result in FAHR
Done
B Done
END
37
12. Write a program for either the ARM7TDMI or the Cortex-M4 that counts the number of ones
in a 32-bit value. Store the result in register r3.
AREA CountOnes, CODE

ENTRY
LDR r0, =VALUE

LDR r1, =32 ; 32-bit loop counter
LDR r3, =0x0 ; count
Loop CMP r0, #0 ; 1 in MSB?

ADDLT r3, r3, #1 ; if so, increment count
LSL r0, r0, #1 ; shift out MSB
SUBS r1, r1, #1
BNE Loop
Done B Done
VALUE EQU 0xFFFFFFFF
END
13. Using the ARM Architectural Reference Manual (or the Keil or CCS tools), give the bit
pattern for the following instructions:
a. RSB r0, r3, r2, LSL #2
b. SMLAL r3, r8, r2, r4
c. ADD r0, r0, r1, LSL #7
a. instruction encoding = 0xE0630102
b. instruction encoding = 0xE0E83492
c. instruction encoding = 0xE0800381
14. A common task that microcontrollers perform is ASCII-to-binary conversion. If you press a
number on a keypad, for example, the processor receives the ASCII representation of that
number, not the binary representation. A small routine is necessary to convert the data into binary
for use in other arithmetic operations. Looking at the ASCII table in Appendix C, you will notice
that the digits 0 through 9 are represented with the ASCII codes 0x30 to 0x39. The digits A
38
through F are coded as 0x41 through 0x46. Since there is a break in the ranges, it’s necessary to
do the conversion using two checks.
The algorithm to do the conversion is:
Mask away the parity bit (bit 7 of the ASCII representation), since we don’t care about it.
Subtract a bias away from the ASCII value.
Test to see if the digit is between 0 and 9.
If so, we’re done. Otherwise subtract 7 to find the value.
Write this routine in assembly. You may assume that the ASCII representation is a valid character
between 0 and F.
AREA AsciiToBinary, CODE

ENTRY
LDR r0, =0x39 ; load ASCII value

BIC r0, r0, #0x80 ; mask bit 7
SUB r0, r0, #"0" ; subtract bias
CMP r0, #0xA ; was ASCII value < 0xA?
SUBGE r0, r0, #7 ; if not, subtract 7
Done B Done
END
15. Write four different instructions which clear register r7 to zero.
Choose from:
AND r7, r7, #0
EOR r7, r7, r7
BIC r7, r7, #0xFFFFFFFF
MOV r7, #0x0
SUB r7, r7, r7
39
16. Suppose register r0 contains the value 0xBBFF0000. Give the Thumb-2 instruction and
register value for r1 that would insert the value 0x7777 into the lower half of register r0, so that
the final value is 0xBBFF7777.
BFI r0, r1, #0, #16 ; r1 = 0x7777
17. Write ARM instructions which set bits 0, 4, and 12 in register r6 and leave the remaining bits
unchanged.
ORR r6, r6, #0x11

ORR r6, r6, #0x1, 20
18. Write a program that converts a binary value between 0 and 15 into its ASCII representation.
See Exercise 14 for background information.
AREA BinaryToAscii, CODE

ENTRY
LDR r0, =0xF ; load binary value

CMP r0, #0xA ; is number less than 0xA?
BLT ADD0
ADD r0, r0, #"A"-"0"-0xA ; add offset ‘A’ to ‘F’
ADD0 ADD r0, r0, #"0" ; convert to ASCII
Done B Done
END
19. Assume that a signed long multiplication instruction is not available on the ARM7TDMI.
Write a program that performs a 32x32 multiplication, producing a 64-bit result, using only
UMULL and logical operations (such as MVN for inversions, EOR, ORR, etc.) Run the program
to verify its operation.
AREA Lab275, CODE, READONLY

ENTRY
LDR r0, =0x80000000

LDR r1, =0x80000000
MOV r5, #0 ; number of negative operands counter
40
CMP r0, #0 ; r0 negative?
ADDLT r5, r5, #1 ; if so, increment neg operands counter
RSBLT r0, r0, #0 ; and make positive
CMP r1, #0 ; r1 negative?
ADDLT r5, r5, #1 ; if so, increment neg operands counter
RSBLT r1, r1, #0 ; and make positive
UMULL r2, r3, r0, r1 ; perform multiply
CMP r5, #1 ; was only one operand negative?
BEQ ResultNeg ; if so, result is negative
Done B Done
ResultNeg
RSBS r2, r2, #0 ; subtract LO from #0, set carry flag
RSC r3, r3, #0 ; subtract HI from #0 with carry
B Done
END
20. Write a program to add 128-bit numbers together, placing the result in registers r0, r1, r2 and
r3. The first operand should be placed in registers r4, r5, r6 and r7, and the second operand should
be in registers r8, r9, r10 and r11.
AREA Add128, CODE, READONLY

ENTRY
LDR r4, =0xFF556000 ; r4 = op1[31:0]

LDR r5, =0x00001000 ; r5 = op1[63:32]
LDR r6, =0x77666677 ; r6 = op1[95:64]
LDR r7, =0x00887771 ; r7 = op1[127:96]
LDR r8, =0x00056000 ; r8 = op2[31:0]

LDR r9, =0x05501000 ; r9 = op2[63:32]
LDR r10, =0x89668677 ; r10 = op2[95:64]
LDR r11, =0x00800071 ; r11 = op2[127:96]
ADDS r0, r4, r8 ; r0 = op1[31:0] + op2[31:0]
ADCS r1, r5, r9 ; r1 = op1[63:32] + op2[63:32]

ADCS r2, r6, r10 ; r2 = op1[95:64] + op2[95:64]
ADC r3, r7, r11 ; r3 = op1[127:96] + op2[127:96]
Done B Done
END
41
21. Write a program that takes character data ‘a’ through ‘z’ and returns the character in
uppercase.
AREA Upper, CODE, READONLY

ENTRY
LDR r0, =char

AND r0, r0, #0xDF ; clear bit 5
Done B Done
char EQU 0x7A ; ‘z’
END
22. Give three different methods to test the equivalence of two values held in registers r0 and r1.
1) CMP r0, r1 (then test if z flag is set)
2) TEQ r0, r1 (then test if z flag is set)
3) EORS r2, r0, r1 (then test if z flag is set)
23. Write assembly code for the ARM7TDMI to perform the following signed division:
r1= r0/16
MOV r1, r0, ASR #4 ; r1 = signed (r0/16)
24. Multiply 0xFFFFFFFF (-1 in a two’s complement representation) and 0x80000000 (the
largest negative number in a 32-bit two’s complement representation) on the ARM7TDMI. Use
the MUL instruction. What value do you get? Does this number make sense? Why or why not?
Result is 0x80000000 = -2147483648 in decimal. This looks odd, but from the machine’s
standpoint, it’s the result of multiplying two 32-bit numbers together and then truncating the
result. The 64-bit result would have been 7FFFFFFF80000000 (remember the machine hasn’t
been told the numbers are signed or unsigned – it just blindly multiplies them). When the result
was effectively truncated to 32 bits, the lower 32 bits happen to be 0x80000000.
42
25. A Gray code is an ordering of 2n binary numbers such that only one bit changes from one
entry to the next. One example of a 2-bit Gray code is b10 11 01 002. The spaces in this example
are for readability. Write ARM assembly to turn a 2-bit Gray code held in r1 into a 3-bit Gray
code in r2. Note that the 2-bit Gray code occupies only bits [7:0] of r1, and the 3-bit Gray code
occupies only bits [23:0] of r2. You can ignore the leading zeros. One way to build an n-bit Gray
code from an (n – 1)-bit Gray code is to prefix every (n – 1)-bit element of the code with 0 and
have those be the high codes (top half of the entire code). Then create the low half of the n-bit
Gray code elements by taking each (n – 1)-bit Gray code element in reverse order and prefixing it
with a one. For example, the 2-bit Gray code above becomes
b010 011 001 000 100 101 111 110.
; Gray Codes - Definition: An ordering of 2^n binary numbers

; such that only one bit changes from one entry to the next.
; This code takes a Gray Code for 2 bits (r1), turns it into a
; 3-;bit Gray Code. The 3-bit Gray Code shall be held in r2
; after program termination. One should note that a 2-bit Gray
; Code will only occupy bits [7:0] of an ARM register and a 3-
; bit Gray Code will only occupy bits [23:0] of an ARM register.
AREA GrayCodes, CODE, READONLY

ENTRY
; 1001 0010 0100 used as operand to prefix 2-bit elements with

; '1'
LDR r12, =0x924
; mask out individual 2-bit codes
AND r3, r1, #0xC0 ; r3[7:6] = 2-bit Gray code[7:6]

AND r4, r1, #0x30 ; r4[5:4] = 2-bit Gray code[4:5]
AND r5, r1, #0xC ; r5[3:2] = 2-bit Gray code[3:2]
; do same, these will be used to reverse order and place in lower

; half of final 3-bit Gray code
43
AND r8, r1, #0xC ; r8[3:2] = 2-bit Gray code[3:2]

AND r10, r1, #0xC0 ;r10[7:6] = 2-bit Gray code[7:6]
LSL r3, r3, #3 ; shift r3[7:6] to r3[10:9]

LSL r4, r4, #2 ; shift r4[5:4] to r4[7:6]
LSL r5, r5, #1 ; shift r5[3:2] to r5[4:3]
; put individual 2-bit Gray code elements in reverse order in

; lower half of final 3-bit Gray code
LSL r7, r7, #9 ; shift r7[1:0] to r7[10:9]

LSL r8, r8, #4 ; shift r8[3:2] to r8[7:6]
LSR r9, r9, #1 ; shift r9[5:4] to r9[4:3]
LSR r10, r10, #6 ; shift r10[7:6] to r10[1:0]
; begin to construct 3-bit Gray Code
ADD r3, r3, r4

ADD r3, r3, r5
ADD r3, r3, r6
; r3 now equals 0 [10:9] 0 [7:6] 0 [4:3] 0 [1:0], prefixing each

; 2-bit Gray code element with '0'
ADD r7, r7, r8

ADD r7, r7, r9
ADD r7, r7, r10
; r7 now equals 0 [10:9] 0 [7:6] 0 [4:3] 0 [1:0], prefixing each

; 2-bit Gray code element with '0'
ADD r7, r7, r12 ; r7 = 1 [10:9] 1 [7:6] 1 [4:3] 1 [1:0]
LSL r3, r3, #12 ; r3 = 0 [22:21] 0 [19:18]

; 0 [16:15] 0 [13:12]
ADD r2, r3, r7 ; now put piece the two together
Done B Done
END
26. Write a program that calculates the area of a circle. Register r0 will contain the radius of the
circle in Q3 notation. Represent π in Q10 notation, and store the result in register r3 in Q3
notation.
AREA AreaCircle, CODE, READONLY

ENTRY
44
LDR r0, =0x18 ; r0 = radius in Q3 (radius = 3 here)
LDR r1, =0xC90 ; r1 = pi in Q10
MUL r3, r0, r0 ; r3 = r^2 in Q6

MUL r3, r1, r3 ; r3 = pi*r^2 in Q16
LSR r3, r3, #13 ; convert r3 from Q16 to Q3
Done B Done
END
45
Chapter 8
1. Code the following IF-THEN statement using Thumb-2 instructions:
if (r2 != r7)
r2 = r2 – r7;
else
r2 = r2 + r4;
CMP r2, r7
ITE NE
SUBNE r2, r2, r7
ADDEQ r2, r2, r4
2. Write a routine for the ARM7TDMI that reverses the bits in a register, so that a register
containing d31d30d29…d1d0 now contains d0d1…d29d30d31. Compare this to the instruction RBIT on
the Cortex-M4.
AREA ReverseReg, CODE, READONLY

ENTRY
LDR r0, =0x11111111 ; r0 = register to be reversed

LDR r1, =32 ; 32-bit register bit counter
MOV r2, #0 ; r2 = result (will be r0 reversed)
Loop AND r3, r0, #1 ; r3 = current LSB of register to be

; reversed
ADD r2, r3, r2, LSL #1 ; shift result left 1, new
; result LSB is r3
LSR r0, r0, #1 ; shift right to look at new LSB
; of register to be reversed
SUBS r1, r1, #1 ; loop 32 times
BNE Loop
Done B Done
END
3. Code the GCD algorithm given in Section 8.4.1 using Thumb-2 instructions.
gcd CMP r0, r1

IT GT
SUBGT r0, r0, r1
IT LT
SUBLT r1, r1, r0
BNE gcd
46
But note also that this code only takes up 12 bytes of code space, less than the 16 bytes of ARM
code!
4. Find the maximum value in a list of 32-bit values located in memory. Assume the values are in
two’s complement representations. Your program should have 50 values in the list.
AREA MaxVal, CODE, READONLY

ENTRY
LDR r0, =49 ; initialize loop counter, r0 (length

; of elements - 1)
ADR r2, Values ; r2 = pointer to values
Loop LDR r3, [r2], #4 ; r3 = 1st element for comparison,

; increment pointer
LDR r4, [r2] ; r4 = 2nd element for comparison
CMP r3, r4 ; 1st > 2nd ?
STRGT r3, [r2] ; if so, store 1st in location of 2nd
BNE Loop
LDR r1, [r2] ; result is at location in r2, load

; into r1
Done B Done
Values
DCD 1, 2, 4, 5, 6, 7, -8, 11, 23, 100
DCD 4, 5, 6, 2, 3, 4, 5, 13, 45, 200
DCD 5, -4, 888, 2, 1, 2, 7, 88, 45, 444
DCD 3, 4, -5, 1, 0, 2, 0, 23, -23, 111
DCD 3, 4, 5, 6, 7, -8, 9, 33, 22, 666
END
5. Write a parity checker routine that examines a byte in memory for correct parity. For even
parity, the number of ones in a byte should be an even number. For odd parity, the number of
ones should be an odd number. Create two small blocks of data, one assumed to have even parity
and the other assumed to have odd parity. Introduce errors in both sets of data, writing the value
0xDEADDEAD into register r0 when an error occurs.
AREA ParityChecker, CODE, READONLY
47
ENTRY
ADR r0, Even ; load address of Even

LDRB r1, [r0] ; r1 = Even
ADR r0, Odd ; load address of Odd
LDRB r2, [r0] ; r2 = Odd
LDR r3, =0 ; bit counter for r1

LDR r4, =0 ; bit counter for r2
LDR r5, =8 ; loop counter
Loop ANDS r6, r1, #1 ; current LSB a 1?

ADDNE r3, r3, #1 ; if so, increment bit
; counter for r1
LSR r1, r1, #1 ; shift left 1 bit
ANDS r6, r2, #1 ; current MSB a 1?
ADDNE r4, r4, #1 ; if so, increment bit
; counter for r2
LSR r2, r2, #1 ; shift left 1 bit
BNE Loop
ANDS r3, r3, #1 ; is LSB of Even counter 1?

LDRNE r0, =0xDEADDEAD ; if so, parity error
ANDS r4, r4, #1 ; is LSB of Odd counter 1?
LDREQ r0, =0xDEADDEAD ; if not, parity error
Done B Done
Even DCB 0xFF

Odd DCB 0x80
END
6. Compare the code sizes (in bytes) for the GCD routines in Section 8.4.1, where one is written
using conditional execution and one is written using branches. (PS – both use both!)
Larger version: 28 bytes
Smaller version: 16 bytes
7. Digital signal processors make frequent use of Finite Impulse Response filters. The output of
the filter, y (n), can be described as a weighted sum of past and present input samples, or
48
N 1
y (n)   h(m) x(n  m)
m 0
where the coefficients h (m) are calculated knowing something about the type of filter you want.
A linear phase FIR filter has the property that its coefficients are symmetrical. Suppose that N is
7, and the values for h (m) are given as
h(0) = h(6) = -0.032
h(1) = h(5) = 0.038
h(2) = h(4) = 0.048
h(3) = -0.048
Use the sample data x (n) below.
SAMPLE DCW 0x0034,0x0024,0x0012,0x0010

DCW 0x0120,0x0142,0x0030,0x0294
Write an assembly language program to compute just one output value, y(8), placing the result in
register r1. You can assume that x(8) starts at the lowest address in memory and that x(7), x(6),
etc. follow as memory addresses increase. The coefficients should be converted to Q15 notation,
and the input and output values are in Q0 notation.
ENTRY
; most DSP processors have dedicated hardware for implementing

; efficient circular queues specifically for filters,
; but just to demonstrate a simple loop, the following code
; could be used.
; y(8) = h(0)x(8) + h(1)x(7) + ... + h(6)x(2)
; Note that the output is still in Q15 notation
Reset_Handler
49
ADR r2, COEFF ; start of coefficients in memory
ADR r0, SAMPLE ; start of sample data in memory
MOV r7, #0 ; the accumulator
Loop
LDRSH r4, [r2], #2 ; bring in one coefficient
LDRSH r5, [r0], #2 ; bring in one sample
MUL r6, r5, r4 ; h(n)*x(n-m)
ADD r7, r7, r6 ; result still in Q15 format
BNE Loop
Done B Done
COEFF DCW 0xFBE8 ; -0.032

DCW 0x04DD ; 0.038
DCW 0x0624 ; 0.048
DCW 0xF9DC ; -0.048
DCW 0x0624
DCW 0x04FF
DCW 0xFBE8
SAMPLE DCW 0x0034, 0x0024, 0x0012, 0x0010
; x(8), x(7), x(6), x(5)
DCW 0x0120, 0x0142, 0x0030, 0x0294
; x(4), x(3), x(2), x(1)
; Note: x(1) isn't needed, really...
END
8. Write a routine to reverse the word order in a block of memory. The block contains 32 words
of data.
AREA ReverseMem, CODE, READONLY

num EQU 32 ; number of words for block copy
ENTRY
start
LDR r0, =Values ; r0 = ptr to source for block copy
LDR r1, =Temp ; r1 = ptr to destination for block copy
MOV r2, #num ; r2 = number of words for block copy
copy LDR r3, [r0], #4 ; a word from Values

STR r3, [r1], #4 ; store a word to Temp
BNE copy ; ... copy more
LDR r0, =Values ; reset pointer to start of Values

MOV r2, #num ; reset counter
SUB r1, r1, #4 ; adjust Temp to point to last word
50
copyRev
LDR r3, [r1], #-4 ; word from Temp starting at the end
STR r3, [r0], #4 ; str a word to Values, from beginning
SUBS r2, r2, #1
BNE copyRev
Done B Done
AREA Block, DATA, READWRITE
Values
DCD 1, 2, 4, 5, 6, 7, -8, 11, 23, 5
DCD 4, 5, 6, 2, 3, 4, 5, 13, 45, 7
DCD 5, -4, 8, 2, 1, 2, 7, 6, 45, 44
DCD 3, 4
Temp DCD 0,0,0,0,0,0,0,0,0,0

DCD 0,0,0,0,0,0,0,0,0,0
DCD 0,0
END
9. Translate the following conditions into a single ARM instruction:
a. Add registers r3 and r6 only if N is clear. Store the result in register r7.
ADDNE r7, r3, r6
b. Multiply registers r7 and r12, putting the results in register r3 only if C is set and Z is clear.
MULHI r3, r7, r12
c. Compare registers r6 and r8 only if Z is clear.
CMPNE r8, r8
10. The following is a simple C function that returns 0 if (x+y)<0 and returns 1 otherwise:
int foo(int x, int y){

if (x+y < 0)
return 0;
else
return 1;
}
51
Suppose that a compiler translated it into the following assembly:
foo ADDS r0, r0, r1

BPL PosOrZ
done MOV r0, #0
MOV pc, lr
PosOrZ MOV r0, r1
B done
This is inefficient. Rewrite the assembly code for the ARM7TDMI using only four instructions
(hint: use conditional execution).
foo ADDS r0, r0, r1 ; r1 = x + y, setting CCs

MOVPL r0, #1 ; return 1 if n bit = 0
MOVMI r0, #0 ; return 0 if n bit = 1
MOV pc, lr
11. Write Example 7.10 (finding the absolute value of a number) for the Cortex-M4.
MOVS r1, r0
IT LT
RSBLT r1, r1, #0
12. What instructions are actually assembled if you type the following lines of code for the
Cortex-M4 into the Keil assembler, and why?
CMP r3, #0
ADDEQ r2, r2, r1
CMP r3, #0
IT EQ
ADDS r2, r2, r1
The assembler will automatically add the IT instruction for you to create conditional
execution. Note that you get an ADDS instruction because the lower registers are used here (r1
and r2), which force the instruction to set condition codes in Thumb code. If you were using high
registers, then a plain ADD would have been used instead.
52
Chapter 9
1. Represent the following values in half-precision, single-precision, and double-
precision.
a. 1.5
0x3e00, 0x3fc00000, 0x3ff80000_00000000
b. 3.0
0x4200, 0x40400000, 0x40080000_00000000
c. -4.5
0xc480, 0xc0900000, 0xc0120000_00000000
d. -0.46875
0xb780, 0xbef00000, 0xbfde0000_00000000
e. 129
0x5808, 0x43010000, 0x40602000_00000000
f. -32768
0xf800, 0xc7000000, 0xc0e00000_00000000
2. Write a program in a high-level language to take as input a value in the form (-)x.y and
convert the value to single-precision and double-precision values.
Note to instructor: Output of single-precision in hexadecimal is straightforward with a

simple cast. Output of double-precision is a bit more complicated. Both may need a short
lesson on pointers in C. The output of double-precision may be omitted without any loss
of usefulness in the program.
/*************************************
* SimpleConv
* Convert input in decimal format
* (-)x(.x) to single-precision
* and double-precision floating-point
* CNH 4/2014
*************************************/
#include <stdio.h>
int main()
{
char xinline[81];
float F_in;
double D_in;
double *pD_in = &D_in;
int *pD_inh = (int *)pD_in+1;
while(1) {
printf("Enter input in decimal (-)x(.x): ");
if(scanf("%le", &D_in) == 1) {
F_in = D_in;
printf("Output: Single-precision: %08x, Double-precision:
%08x_%08x\n", *(int *)&F_in, *pD_inh, *(int *)&D_in);
53
}
else { return 0;
}
}
}
3. Using the program from Exercise 2 (or a converter on the internet), convert the
following values to single-precision and double-precision.
a. 65489
Output: Single-precision: 477fd100, Double-precision: 40effa20_00000000
b. 2147483648
Output: Single-precision: 4f000000, Double-precision: 41e00000_00000000
c. 229
Output: Single-precision: 4e000000, Double-precision: 41c00000_00000000
d. -0.38845
Output: Single-precision: bec6e2eb, Double-precision: bfd8dc5d_63886595
e. 0.0004529
Output: Single-precision: 39ed7336, Double-precision: 3f3dae66_b0384190
f. 11406
Output: Single-precision: 46323800, Double-precision: 40c64700_00000000
g. -57330.67
Output: Single-precision: c75ff2ac, Double-precision: c0ebfe55_70a3d70a
4. Expand the program in Exercise 2 to output half-precision values. Test your output on
the values from Exercise 3. Which would fit in half-precision?
See note in Exercise 2.

/*************************************
* SimpleConvHalf
* Convert input in decimal format
* (-)x(.x) to half-presicion,
* single-precision and double-precision
* floating-point
* CNH 4/2014
*************************************/
#include <stdio.h>
short ConvertToHalf(float);
int main()
{
char xinline[81];
char *token;
float F_in;
double D_in;
double *pD_in = &D_in;
int *pD_inh = (int *)pD_in+1;
short Half;
while(1) {
printf("Enter input in decimal (-)x(.x): ");
54
if(scanf("%le", &D_in) == 1) {
F_in = D_in;
Half = ConvertToHalf(F_in);
printf("Output: Half-Precision: %04x, Single-precision: %08x,
Double-precision: %08x_%08x\n", (Half & 0xffff), *(int *)&F_in,
*pD_inh, *(int *)&D_in);
}
else { return 0;
}
}
}
short ConvertToHalf(float F_input) {

char C_Output[5];
short W_Exponent, W_Half;

int I_Sign, I_Fraction, I_Half_Fraction, I_LSB, I_Guard, I_Sticky,
I_Round;
union {
int I_Single ;
float F_Single ;
} Single ;
Single.F_Single = F_input;
// Extract bits of the SP values and shift them to the right
W_Exponent = ((Single.I_Single & 0x7F800000) >> 23) - 127;
I_Fraction = (Single.I_Single & 0x007FFFFF) | 0x00800000;
I_Half_Fraction = (I_Fraction & 0x7FE000) >> 13;
I_Sign = (Single.I_Single & 0x80000000) >> 31;
I_LSB = I_Fraction & 0x2000;
I_Guard = I_Fraction & 0x1000;
I_Sticky = I_Fraction & 0xFFF;
I_Round = ((I_LSB > 0) & (I_Guard > 0))
| ((I_Guard > 0) & (I_Sticky > 0));
if (I_Round > 0) { I_Half_Fraction++; }
if (I_Half_Fraction == 0x800) { // overflow!
I_Half_Fraction = I_Half_Fraction >> 1;
W_Exponent++;
}
if (W_Exponent > 15 ) {
if (I_Sign) { W_Half = 0xfc00; }
else {W_Half = 0x7c00; }
} else if (W_Exponent < -16) {
W_Half = (I_Sign << 15 | 0);
} else {
W_Half = (I_Sign << 15)
| ((W_Exponent + 15) << 10)
| (I_Half_Fraction & 0x7FF)
;
}
return W_Half;
}
55
5. Write a program in a high-level language to take as input a single-precision value in
the form 0xffffffff (where f is a hexadecimal value) and convert the input to
decimal.
/*************************************
* ConvHexToSP
* Convert input in hexadecimal format
* to decimal.
* CNH 4/2014
*************************************/
#include <stdio.h>
int main()
{
union {
int I_Single;
float F_Single;
} IntSP;
printf("Enter input in hexadecimal 0xXXXXXXXX: ");

scanf("0x%8x", &IntSP.I_Single);
printf("Output: Single-Precison: %f\n", IntSP.F_Single);
}
6. Using the program from Exercise 5 (or a converter on the internet), convert the
following single-precision values to decimal. Identify the class of value for each input.
If the input is NaN, give the payload as the value, and NaN type in the class field.
Single-precision value Value Class

a. 0x3fc00000 1.5 Normal
b. 0x807345ff -1.058618e-38 Subnormal
c. 0x7f350000 2.405903e+38 Normal
d. 0xffffffff 0x7fffff NaN
e. 0x20000000 1.084202e-19 Normal
f. 0x7f800000 Inf Inf
g. 0xff800ffe 0x000ffe NaN
h. 0x42c80000 1.00e+02 Normal
i. 0x4d800000 2.684355e+08 Normal
j. 0x80000000 -0.0 Zero
7. What value would you write to the FPSCR to set the following conditions?
a. FZ unset, DN unset, roundTowardZero rounding mode

0x00C00000
b. FZ set, DN unset, roundTowardPositive rounding mode
0x01400000
c. FZ set, DN set, roundTiesToEven rounding mode
0x03000000
56
8. Complete the following table for each of the FPSCR values shown below.
N Z C V DN FZ RMode IDC IXC UFC OFC DZC IOC

0x41c00010 0 1 0 0 0 1 RZ 0 1 0 0 0 0
0x10000001 0 0 0 1 0 0 RN 0 0 0 0 0 1
0xc2800014 1 1 0 0 1 0 RM 0 1 0 1 0 0
9. Give the instructions to load the following values to FPU register S3.
a. 5.75 x 103
VLDR.F32 s3,=5.75e3
b. 147.225
VLDR.F32 s3,=147.225
c. -9475.376
VLDR.F32 s3,=-9475.376
d. -100.6 x 10-8
VLDR.F32 s3,=-1.006e-10
10. Give the instructions to perform the following load and store operations.
a. Load the 32-bit single-precision value at the address in R4 into S12.
VMOV.F32 s12, [r4]
b. Load the 32-bit single-precision value in R6 to S15. Repeat for a store of the value in
S15 to R6.
VMOV.F32 s15, r6
VMOV.F32 r6, s15
c. Store the 32-bit value in S4 to memory at the address in R8 with an offset of 16 bytes.
VMOV.F32 [r8, #16]
d. Store the 32-bit constant 0xFFFFFFFF to S28.
VLDR.F32 s28, =0f_FFFFFFFF
57
11. Give the instructions to perform a conversion of four fixed-point data in unsigned 8.8
format stored in S8 to S11 to single-precision format.
VCVT.U16.F32 s8, s8, #8

VCVT.U16.F32 s9, s9, #8
VCVT.U16.F32 s10, s10, #8
VCVT.U16.F32 s11, s11, #8
12. What instruction would you use to convert a half-precision value in the lower half of
register S5 to a single-precision value, and store the result in S2?
VCVTB.F32.F16 s2, s5
13. Give the instructions to load 8 single-precision values at address 0x40000100 to FPU
registers S8-S15.
FPDATA_BASE EQU 0x40000100
LDR r0,=#FPDATA_BASE
VLDR.32 s8, [r0]
VLDR.32 s9, [r0, #4]
VLDR.32 s10, [r0, #8]
VLDR.32 s11, [r0, #12]
VLDR.32 s12, [r0, #16]
VLDR.32 s13, [r0, #20]
VLDR.32 s14, [r0, #24]
VLDR.32 s15, [r0, #28]
Or, since the operands are contiguous in memory and in the register, file, the VLDR 64-
bit instruction could be used:
FPDATA_BASE EQU 0x40000100
LDR r0, =#FPDATA_BASE

VLDR.64 d4, [r0]
VLDR.64 d5, [r0, #8]
VLCR.64 d6, [r0, #16]
VLDR.64 d7, [r0, #24]
14. How many subnormal values are there in a single-precision representation? Is this
the same number as values for any non-zero exponent?
The fraction is 23 bits, and all bit patterns are valid for subnormal values. So there
23
are 2 unique representative values. The the same number of values exists for each non-
zero exponent in the normal range.
58
Chapter 10
1. For each rounding mode, show the resulting value given the sign, exponent, fraction,
guard, and sticky bit in the cases below.
Rounding mode Sign Exponent Fraction G S

bit bit
0 011111111 11111111111111111111111 1 0
roundTiesToEven 0 100000000 00000000000000000000000
roundTowardPositive 0 100000000 00000000000000000000000
roundTowardNegative 0 011111111 11111111111111111111111
roundTowardZero 0 011111111 11111111111111111111111
1 000000000 11111111111111111111111 0 1
0 11111110 11111111111111111111111 1 1
1 11111110 01111111111111111111110 1 0
2. Rework Example 10.1 with the following values for OpB. For each, generate the
product rounded for each of the four rounding modes:
For the negative operands (exercises a-d) the smaller operand is negated with a 2’s
complement bit added in the lowest place. When summed, this results in a MSB in the
sum of zero, requiring a left shift of one place. After this shift, the remaining bits are used
in the rounding computation. Recall that the result is positive, so RM and RZ will never
increment, while RN and RP will use the rounding bits.
a. 0xBF000000
RN/RP – 0x4B800000
RM/RZ – 0x4B7FFFFF
b. 0xBF800000
All rounding modes – 0x4B7FFFFF (note – all rounding bits are zero)
c. 0xBF900000
RN/RP – 0x4B7FFFFF
RM/RZ – 0x4B7FFFFE
59
d. 0xBFC00000
RN/RM/RZ – 0x4B7FFFFE
RP – 0x4B7FFFFF
e. 0x3F000000
RN/RM/RZ – 0x4B800000
RP – 0x4B800001
f. 0x3FC00000
RM/RZ – 0x4B800000
RN/RP – 0x4B800001
g. 0x40200000
RN/RM/RZ – 0x4B800001
RP – 0x4B800000
h. 0x40400000
RM/RZ – 0x4B800001
RN/RP – 0x4B800002
3. Complete the following table for the given operands and operations, showing the result
and the exception status bits resulting from each operation. Assume the multiply-
accumulate operations are fused.
Operation Operand A Operand B Operand C Result Exception

Bit(s) set
A+B 0xFF800000 0x7F800000 NaN - IOC
0x7FC00000
A*B 0x80000000 0x7F800000 - Inf None
A–B 0xFF800000 0xFF800001 - NaN IOC
0xFFC00001
A/B 0x7FC00011 0x00000000 - NaN IOC
0x7FC00011
A/B 0xFF800000 0xFF800000 - NaN None
0xFFC00000
A*B 0x10500000 0x02000000 - 0x00000000 UFC, IXC
A*B 0x01800000 0x3E7FFFFF - 0x00800000 UFC, IXC
A*B 0x3E7F0000 0x02000000 - 0x00FF0000 None
(A * B) + 0x80000000 0x00800000 0x7FB60004 NaN IOC
C 0x7FF60004
(A * B) + 0x3F800000 0x7F800000 0xFF800000 NaN IOC
C 0x7FC00000
(A * B) + 0x6943FFFF 0x71000000 0xFF800000 NaN IOC
C 0x7FC00000
4. Write a program in a high-level language to input two single-precision values and add,
subtract, multiply and divide the values. In this Exercise use the default rounding mode.
Test your program with various floating-point values.
60
/******************************
* QuickFloat - simple C program
* to compute add, sub, mul, and
* div for two input SP values
* CNH 4/2014
******************************/
#include <stdio.h>
int main() {
// For this exercise, we assume inputs in hexadecimal
union {
int I_Single;
float F_Single;
} inA, inB, res;
printf("Enter two single-precision values in hexadecimal format

(0xXXXXXXXX): ");
scanf("0x%8x 0x%8x", &inA.I_Single, &inB.I_Single);
// Perform addition
res.F_Single = inA.F_Single + inB.F_Single;
printf("\nSum : 0x%08x\n", res.I_Single);
// Perform subtraction, A-B

res.F_Single = inA.F_Single - inB.F_Single;
printf("Diff : 0x%08x\n", res.I_Single);
// Perform multiplication
res.F_Single = inA.F_Single * inB.F_Single;
printf("Prod : 0x%08x\n", res.I_Single);
// Perform division, A/B

res.F_Single = inA.F_Single / inB.F_Single;
printf("Quot : 0x%08x\n\n", res.I_Single);
return(0);
}
5. Using routines available in your high-level language, perform each of the computations
in each of the four rounding modes. In C, you can use the floating-point environment by
including <fenv.h> and changing the rounding mode by the function fsetround(RMODE),
where RMODE is one of
FE_DOWNWARD
FE_TONEAREST
FE_TOWARDZERO
FE_UPWARD
Test your program with input values, modifying the lower bits in the operands to see how
the result differs for the four rounding modes. For example, use 0x3f800001 and
0xbfc00000. What differences did you notice in the results for the four rounding
modes?
61
/*********************************
* QuickFloatRM - simple C program
* to compute add, sub, mul, and
* div for two input SP values
* using routines to set the
* desired rounding mode.
* CNH 4/2014
******************************/
#include <stdio.h>
#include <fenv.h>
int main() {
// For this exercise, we assume inputs in hexadecimal
union {
int I_Single;
float F_Single;
} inA, inB, resRN, resRP, resRM, resRZ;
printf("Enter two single-precision values in hexadecimal format

(0xXXXXXXXX): ");
scanf("0x%8x 0x%8x", &inA.I_Single, &inB.I_Single);
printf("Op RN RP RM RZ\n");
printf("------------------------------------------------------\n");
// Perform each function using each rounding mode.

// Perform addition
fesetround(FE_TONEAREST);
resRN.F_Single = inA.F_Single + inB.F_Single;
fesetround(FE_UPWARD);
resRP.F_Single = inA.F_Single + inB.F_Single;
fesetround(FE_DOWNWARD);
resRM.F_Single = inA.F_Single + inB.F_Single;
fesetround(FE_TOWARDZERO);
resRZ.F_Single = inA.F_Single + inB.F_Single;
printf("Sum : 0x%08x 0x%08x 0x%08x 0x%08x\n", resRN.I_Single,

resRP.I_Single, resRM.I_Single, resRZ.I_Single);
// Perform subtraction A-B

resRN.F_Single = inA.F_Single - inB.F_Single;
resRP.F_Single = inA.F_Single - inB.F_Single;
resRM.F_Single = inA.F_Single - inB.F_Single;
resRZ.F_Single = inA.F_Single - inB.F_Single;
printf("Diff : 0x%08x 0x%08x 0x%08x 0x%08x\n", resRN.I_Single,

// Perform multiplication
resRN.F_Single = inA.F_Single * inB.F_Single;
62
resRP.F_Single = inA.F_Single * inB.F_Single;
resRM.F_Single = inA.F_Single * inB.F_Single;
resRZ.F_Single = inA.F_Single * inB.F_Single;
printf("Prod : 0x%08x 0x%08x 0x%08x 0x%08x\n", resRN.I_Single,

// Perform division A/B

resRN.F_Single = inA.F_Single / inB.F_Single;
resRP.F_Single = inA.F_Single / inB.F_Single;
resRM.F_Single = inA.F_Single / inB.F_Single;
resRZ.F_Single = inA.F_Single / inB.F_Single;
printf("Quot : 0x%08x 0x%08x 0x%08x 0x%08x\n\n", resRN.I_Single,

return(0);
}
Differences are limited to the last bit for subtraction and multiplication, with addition and
division returning the same result for all rounding modes.
6. Give 3 values that will hold to the distributive property and 3 which will not.
Three values for which the distributive property will hold could be A = B = C =
0x3f800000 (1.0).
Two conditions (there may be more) will demonstrate a failure of the distributive
property for floating-point.
1. (B + C) < X, and A * X is normal, but A * B or A * C overflows. For example,
A = 0x43000000 (128)
B = 0x7E800001 (very large, positive)
C = 0xFE800000 (very large, negative)
In this example, both A * B or A * C will overflow, returning an Infinity. A * (B + C)

will remain in the normal range. Only one of A * B or A * C need overflow to
demonstrate this property.
2. (B + C) overflows, but A is < 1.0. For example,
A = 0x3F000000 (1/2)
B = 0x7F7FFFFF (just under the max for single-precision normal)
63
C = 0x7F7FFFFF
7. Is it is possible to have cancellation in an effective subtraction operation and have a

guard and sticky bit? If so, show an example. If not, explain why.
It is not. To have cancellation, the exponents must be equal or differ by only one. If they
are equal, there will not be either a guard or sticky bit. If the differ by one, only a guard
bit will exist.
8. Redo Example 10.5 using 0x3F800003 as the multiplier.
We replaced the multiplier (0x3F800001) with 0x3F800003. This will add a term in the
multiplier array due to bit [1] of the multiplier fraction. Otherwise, the exercise is
identical and the determination of rounding is made using the new guard and sticky bits,
and the sign.
Operand Significand
OpB Significand 1 101 00000000000000000000
(Multiplicand)
OpA Significand 1 000 00000000000000000011
(Multiplier)
Internally aligned partial product terms

OpA[0] term 1 101 00000000000000000000
OpA[1] term 11 010 00000000000000000000
OpA[23] term 1.101 0000000000000000 0000
Infinitely-precise internal significand
Infinitely precise 1.101 0000000000000000 0100 111 00000000000000000000
product
Pre-rounded product 1.101 0000000000000000 000L Gss ssssssssssssssssssss
Incremented product 1.101 0000000000000000 0101 111 00000000000000000000
Result significand
Result product - RNE 1.101 0000000000000000 0101
Result product – RP 1.101 0000000000000000 0100
Result product – RM 1.101 0000000000000000 0101
Result product – RZ 1.101 0000000000000000 0100
9. Show that 36! generates an overflow condition.
36! = 3.719 x 1041. This value is greater than the range for normal values (3.4 x 1038).
10. Demonstrate a case in which a multiply operation results in a normal value for the RZ
and RM rounding modes, but overflows for the RNE and RP rounding modes.
0x7F7FFFFF and 0x3F800001.
64
Chapter 11
1. Complete the table for a VCMP instruction and the following operands.
Operand A Operand B N Z C V
0xFF800000 0x3F800000 1 0 0 0
0x00000000 0x80000000 0 1 1 0
0x7FC00005 0x00000000 0 0 1 1
0x7F80000F 0x7F80000F 0 0 1 1
0x40000000 0xBF000000 0 0 1 0
2. Give the instructions to implement the following algorithm. Assume y is in S0, and
return x in S0.
x = 8y2 -7y + 12
; Compute the polynomial

; x = 8y^2 - 7Y + 12
; Set up y value in s0
VLDR.F32 s0,=-3.0
; Set up the contents of s1-s3 using VLDR

; pseudo-instruction
VLDR.F32 s1,=8.0 ; power 2 coeff
VLDR.F32 s2,=-7.0 ; power 1 coeff
VLDR.F32 s3,=12.0 ; constant
VMUL.F32 s1, s0, s1 ; compute 8 * y

VMUL.F32 s1, s0, s1 ; compute (8 * y) * y
VMUL.F32 s2, s0, s2 ; compute -7y
VADD.F32 s0, s1, s2 ; compute 8y^2 + (-7)y
VADD.F32 s0, s0, s3 ; add in +12, return in s0
Exit B Exit
3. Expand the code above to create a subroutine that will resolve an order 2 polynomial
with the constant for the square term in S1, the order 1 term in S2, and the constant factor
in S3, and the y value in S0. Return the result in S0.
; Set up y value in s0
VLDR.F32 s0,=-3.0
; Set up the contents of s4-s7 using VLDR

VLDR.F32 s1,=8.0 ; power 2 coeff
65
VLDR.F32 s2,=-7.0 ; power 1 coeff
VLDR.F32 s3,=12.0 ; constant
BL func
Exit B Exit
func
; y is input in S0
; function - (S1)y^2 + (S2)y + (S3)
VMUL.F32 s1, s1, s0 ; compute (S1)y
VMUL.F32 s1, s1, s0 ; compute ((S1)y)y
VMUL.F32 s2, s0, s2 ; compute (S2)y
VADD.F32 s0, s1, s2 ; compute (S1)y^2 + (S2)y
VADD.F32 s0, s0, s3 ; add in constant term
BX lr ; return
4. Give the instructions to perform the following loop over an array X of 20 data values
in memory at address 0x40000100 and an array Y of data values in memory at address
0x40000200. The new y value should overwrite the original value.
y = Ax + y
AREA RESET, CODE, READONLY

THUMB
EXPORT __Vectors
__Vectors
DCD 0x400 ; Top of Stack
DCD Reset_Handler ; Reset Handler
EXPORT Reset_Handler
ENTRY
Reset_Handler
; Enable the FPU

; Code taken from ARM website
; CPACR is located at address 0xE000ED88
LDR.W R0, =0xE000ED88
; Read CPACR
LDR R1, [R0]
; Set bits 20-23 to enable CP10 and CP11
ORR R1, R1, #(0xF << 20)
; Write back the modified value to the CPACR
STR R1, [R0]
DSB
66
; Reset pipeline not the FPU is enabled
ISB
; Apply the equation to two arrays, x and y

; each 20 entries. The function is called
; "func," and provides two values in S0 and
; S1, and return the result in S0. S2
; contains the constant, A.
IterCnt EQU 20
RAMStartX EQU 0x20000000
RAMStartY EQU 0x20000100
; Registers used
; r0 = address of x_data, incremented after
; r1 = address of y_data, incremented after
; r2 = iteration count, set to 20
; r3 = counter, initially set to 0
; Assume at least one iteration
LDR r0, =x_data ; address for x data

LDR r1, =y_data ; address for y data
MOV r2, #IterCnt ; number of iterations
MOV r3, #0 ; initial iteration
VLDR.F32 s2,=-20.0 ; initialize constant, A
; Copy the data from flash to RAM

LDR r4, =RAMStartX
LDR r5, =RAMStartY
CpyLoop
LDR r6, [r0], #4
LDR r7, [r1], #4
STR r6, [r4], #4
STR r7, [r5], #4
ADD r3, #1
CMP r3, r2
BLT CpyLoop
; Reset the pointers to the RAM space

LDR r0, =RAMStartX
LDR r1, =RAMStartY
MOV r3, #0 ; reinitialize iter cnt
; Perform the function

Loop
VLDR.32 s0, [r0], #4 ; load x data, post-inc
67
VLDR.32 s1, [r1] ; load y data, no post-inc
BL func ; call function
ADD r3, r3, #1 ; increment counter
VSTR.32 s0, [r1], #4 ; store new y, post-inc
CMP r3, r2 ; test for done
BLT Loop ; if less than IterCnt, loop
Exit B Exit ; else, exit. We're done.
; Function is y = Ax + y.
; S0 - x
; S1 - y
; S2 - A
; Return result in S0
func
VMUL.F32 S0, S0, S2
VADD.F32 S0, S0, S1
BX LR
x_data
DCD 0x3f800000
DCD 0x40000000
DCD 0x40800000
DCD 0x41000000
DCD 0x41800000
DCD 0x42000000
DCD 0x42800000
DCD 0x43000000
DCD 0x43800000
DCD 0x44000000
DCD 0x44800000
DCD 0x45000000
DCD 0x45800000
DCD 0x46000000
DCD 0x46800000
DCD 0x47000000
DCD 0x47800000
DCD 0x48000000
DCD 0x48800000
DCD 0x49000000
DCD 0x49800000
y_data
DCD 0xbf800000
DCD 0xc0000000
DCD 0xc0800000
68
DCD 0xc1000000
DCD 0xc1800000
DCD 0xc2000000
DCD 0xc2800000
DCD 0xc3000000
DCD 0xc3800000
DCD 0xc4000000
DCD 0xc4800000
DCD 0xc5000000
DCD 0xc5800000
DCD 0xc6000000
DCD 0xc6800000
DCD 0xc7000000
DCD 0xc7800000
DCD 0xc8000000
DCD 0xc8800000
DCD 0x49000000
END
5. Modify the program in Example 11.4 to order the four values in S8-S11 in order from
smallest to largest.

THUMB
EXPORT __Vectors
__Vectors
ENTRY
Reset_Handler
; Enable the FPU

; Read CPACR
LDR R1, [R0]
ORR R1, R1, #(0xF << 20)
STR R1, [R0]
DSB

ISB
69
; It will take a maximum of 3 passes for a bubble sort
MOV r0, #3
; Set up the contents of registers s8-s1 using VLDR

VLDR.F32 s8,=2.0
VLDR.F32 s9,=1.0
VLDR.F32 s10,=-1.0
VLDR.F32 s11,=-2.0
; Data in registers S8-S11 are to be ordered from smallest (in

; S8) to largest. Use s0 as the temp scratch register
Loop
VCMP.F32 s8, s9
VMRS APSR_nzcv, FPSCR
VMOVGT.F32 s0, s9
VMOVGT.F32 s9, s8
VMOVGT.F32 s8, s0
VCMP.F32 s9, s10

VMOVGT.F32 s0, s10
VMOVGT.F32 s10, s9
VMOVGT.F32 s9, s0re
VCMP.F32 s10, s11

VMOVGT.F32 s0, s11
VMOVGT.F32 s11, s10
VMOVGT.F32 s10, s0
SUB r0, #1
BNE Loop
END
6. In the third case in Example 11.9 (Section 11.7.2.3) the error of the VMLA was 2-22.
Show why this is the case.
The inputs are (1.5 + 1 ulp)2 – (2.5 + 4 ulps). When the inputs are multiplied, we have
2.25 + 3 ulps + 1 ulp2. The VMLA will round this to 2.25 + 4 ulps, or (1.125 + 2 ulp) *
21 (0x40100002). When 0xC0100002 is added to the product, the result is zero for the
VMLA, but what remains after the subtraction in the VFMA case is –(2 ulps + 2 ulp2)
(with an ulp equal to 2-23). The result in single-precision is 0xB3FFFFFE.
7. Write a division routine that checks for a divisor of zero, and if it is returns a correctly
signed infinity without setting the DZC bit. If the divisor is not zero, the division is
performed.
Note: Testing for a negative zero is a bit tricky. The Cortex-M4 will compare two zeros
as equal regardless of the signs of the zeros. The option used in this solution is to move
70
the zero to an ARM register and test it, and if negative flip the sign on a +1 operand and
multiply an infinity by this value to return the signed infinity.

THUMB
EXPORT __Vectors
__Vectors
ENTRY
Reset_Handler
; Enable the FPU

; Read CPACR
LDR R1, [R0]
ORR R1, R1, #(0xF << 20)
STR R1, [R0]
DSB

ISB
; Test out the division routine

VLDR.F32 s0,=+1.0 ; just some normals
VLDR.F32 s1,=+2.0
BL DivNoDZ
VLDR.F32 s0,=0.0 ; try with +0/+0

VLDR.F32 s1,=0.0
BL DivNoDZ
VLDR.F32 s0,=0f_7fe00000 ; NaN/-0

VLDR.F32 s1,=-0.0
BL DivNoDZ
VLDR.F32 s0,=+1.0 ; normal/-0

VLDR.F32 s1,=-0.0
BL DivNoDZ
Exit B Exit
; Division routine. Check is made for a zero divisor, and

; if so, a correctly signed infinity is returned without
; setting DZ, even if the dividend is NaN or zero.
; Input:
; s0 - dividend
; s1 - divisor
71
; Return
; s0 - quotient
; If the divisor is zero, r0 is used to extract the sign
DivNoDZ
VCMP.F32 s1, #0.0 ; if divisor is zero, rtn INF
BEQ RtnInf
VDIV.F32 s0, s0, s1
BX lr
RtnInf
VMOV.F32 r0, s1 ; use the tst inst
TEQ r0, #0 ; in the integer side
MOV r0, #0x3f800000 ; if +0, return +1
ORRMI r0, #0x80000000 ; if -0, flip sign, rtn -1
VMOV.F32 s1, r0
VLDR.F32 s0,=0f_7f800000 ; load +inf
VMUL.F32 s0, s1
BX lr
END
8. Add to the program of Exercise 7 a check on a divisor that is 2.0. If it is, perform a
multiplication of 0.5 rather than do the division.
Modifying the DivNoDZ routine above, we have
; Division routine. Check is made for a zero divisor, and

; if so, a correctly signed infinity is returned without
; setting DZ, even if the dividend is NaN or zero.
; Added check on divisor of 2.0, to be replaced with a
; multiplication of 0.5
; Input:
; s0 - dividend
; s1 - divisor
; s2 - scratch
; Return
; s0 - quotient
; If the divisor is zero, r0 is used to extract the sign
DivNoDZ
VCMP.F32 s1, #0.0 ; if divisor is zero, rtn INF
BEQ RtnInf
VLDR.F32 s2,=2.0 ; load 2.0 for check
VCMP.F32 s1, s2 ; if divisor is 2.0, mul by 0.5
BEQ MulByHalf
VDIV.F32 s0, s0, s1
BX lr
RtnInf
VMOV.F32 r0, s1 ; use the tst inst
TEQ r0, #0 ; in the integer side
MOV r0, #0x3f800000 ; if +0, return +1
ORRMI r0, #0x80000000 ; if -0, flip sign, rtn -1
72
VMOV.F32 s1, r0
VLDR.F32 s0,=0f_7f800000 ; load +inf
VMUL.F32 s0, s1
BX lr
MulByHalf
VLDR.F32 s1,=0.5 ; replace divisor by 0.5
VMUL.F32 s0, s1
BX lr
END
9. Write a subroutine that would perform a reciprocal operation on an input in S0,

returning the result in S0.
; Reciprocal routine. A value in s0 is operated on as

; 1.0/s0, with the result in s0.
; Input:
; s0 - input
; Return
; s0 - reciprocal
Recip
VLDR.F32 s1,=1.0 ; set up 1.0 constant
VDIV.F32 s0, s1, s0 ; perform the division
BX lr ; and retur
73
Chapter 12
1. Using the sine table as a guide, construct a cosine table which produces the value for cos(x),
where 0 ≤ x < 360. Test your code for values of 84 degrees and 105 degrees.
AREA COSINETABLE, CODE

ENTRY
; Registers used:
; r0 = return value in Q31 notation
; r1 = sin argument (in degrees, from 0 to 360)
; r2 = temp
; r4 = starting address of sine table
; r7 = copy of argument
main
MOV r7, r1 ; make a copy of the argument
LDR r2,=270 ; constant won’t fit into rotation scheme
ADR r4, cosine_data ; load address of cosine table
CMP r1, #90 ; determine quadrant
BLE retvalue ; first quadrant?
CMP r1, #180

RSBLE r1,r1,#180 ; second quadrant?
BLE retvalue
CMP r1, r2
SUBLE r1, r1, #180 ; third quadrant?
BLE retvalue
RSB r1, r1, #360 ; otherwise, fourth
retvalue
LDR r0, [r4, r1, LSL #2] ; get cosine value from table
CMP r7, #90
BLT done
CMP r7, r2 ; do we return a negative value?
BGT done
RSBLT r0, r0, #0 ; if between 90 and 270, negate the value
done B done
ALIGN
cosine_data
DCD 0x7fffffff,0x7ffb0280,0x7fec0a00,0x7fd31780
DCD 0x7fb02e00,0x7f834f00,0x7f4c7e80,0x7f0bc080
DCD 0x7ec11a80,0x7e6c9280,0x7e0e2e00,0x7da5f580
DCD 0x7d33f100,0x7cb82880,0x7c32a680,0x7ba37500
DCD 0x7b0a9f80,0x7a683180,0x79bc3880,0x7906c080
DCD 0x7847d900,0x777f9000,0x76adf600,0x75d31a80
74
DCD 0x74ef0e80,0x7401e500,0x730baf00,0x720c8080
DCD 0x71046d00,0x6ff38a00,0x6ed9eb80,0x6db7a880
DCD 0x6c8cd700,0x6b598e80,0x6a1de700,0x68d9f980
DCD 0x678dde80,0x6639b000,0x64dd8980,0x63798500
DCD 0x620dbe80,0x609a5300,0x5f1f5e80,0x5d9cff80
DCD 0x5c135380,0x5a827980,0x58ea9100,0x574bb900
DCD 0x55a61280,0x53f9be00,0x5246dd80,0x508d9200
DCD 0x4ecdff00,0x4d084680,0x4b3c8c00,0x496af400
DCD 0x4793a200,0x45b6bb80,0x43d46500,0x41ecc480
DCD 0x40000000,0x3e0e3dc0,0x3c17a500,0x3a1c5c40
DCD 0x381c8bc0,0x36185b00,0x340ff240,0x32037a40
DCD 0x2ff31bc0,0x2ddf0040,0x2bc75100,0x29ac37c0
DCD 0x278dde80,0x256c6f80,0x234815c0,0x2120fb80
DCD 0x1ef74c00,0x1ccb3240,0x1a9cd9a0,0x186c6de0
DCD 0x163a1a80,0x14060b60,0x11d06ca0,0x0f996a20
DCD 0x0d613050,0x0b27eb60,0x08edc7b0,0x06b2f1d0
DCD 0x04779630,0x023be164,0x00000000
END
Using r1 = 0x54, which is 84 degrees, the result is r0 = 0x0D613050 in Q31 = .10528463
Using r1 = 0x69, which is 105 degrees, the result is r0 = 0xDEDF0480 in Q31 = -.25881903
* Note that this cosine table was generated on a simulated ARM7 without floating point
hardware, using:
double M_PI = 2 * acos(0.0);
to represent pi, since M_PI is not part of the ARM mathlib library. Some cosine tables may
slightly differ depending on the processor/environment/tools/precision used.
2. It was mentioned in Section 12.4 that a binary search only works if the entries in a list are
sorted first. A bubble sort is a simple way to sort entries. The basic idea is to compare two
adjacent entries in a list – call them entry[j] and entry[j+1]. If entry[j] is larger, then swap the
entries. If this is repeated until the last two entries are compared, the largest element in the list
will now be last. The smallest entry will ultimately get swapped, or “bubbled”, to the top. This
algorithm could be described in C as
75
last = num;
while (last>0){
pairs = last – 1;
for (j=0; j <=pairs; j++) {
if(entry[j] > entry[j+1]) {
temp = entry[j];
entry[j] = entry[j+1];
entry[j+1] = temp;
last = j;
}
}
}
where num is the number of entries in the list. Write an assembly language program to implement
a bubble sort algorithm, and test it using a list of 20 elements. Each element should be a word in
length.
AREA BubbleSort, CODE, READONLY

num EQU 20 ; number of elements
ENTRY
MOV r0, #num ; number of elements

LSL r0, r0, #2 ; number of bytes of elements
ADR r1, elements ; pointer to start of elements
MOV r2, r1 ; make copy of start of elements
SUB r2, r2, #4 ; length is # of elements - 1
Sort ADD r3, r1, r0

SUB r3, r3, #4 ; addr of last element (new ptr)
MOV r4, #0 ; r4 = change flag
ADD r2, r2, #4 ; incr by 1 word each iteration
NextWord
LDR r5, [r3], #-4 ; load 1st word
LDR r6, [r3] ; load 2nd word
CMP r5, r6 ; 1st > 2nd ?
BGT NoSwap ; if so, don't swap
STR r5, [r3], #4 ; otherwise, swap

STR r6, [r3]
ADD r4, r4, #1 ; flag swap
SUB r3, r3, #4 ; decrement pointer
NoSwap
CMP r3, r2 ; checked all words?
BNE NextWord ; if not, check more
CMP r4, #0 ; was there a swap?
BNE Sort ; if so, check again
76
Done B Done
elements
DCD 3, 4, 2, 54, 6, 33, -5, 2, 1, 0
DCD 77, 1, 2, 3, 0, -5, -35, 3, 4, 1
END
3. Using the bubble sort algorithm written in Exercise 2, write an assembly program that sorts
entries in a list and then uses a binary search to find a particular key. Remember that your sorting
routine must sort both the key and the data associated with each entry. Create a list with 30
entries or so, and data for each key should be at least 12 bytes of information.
NUM EQU 14 ; insert # of entries here

ESIZE EQU 4 ; log 2 of the entry size (16 bytes)
; NB: Assumes entry size is a power of 2
AREA BINARYSearch, CODE

ENTRY
; bubble sort keys and data
MOV r0, #NUM ; number of elements

LSL r0, r0, #4 ; number of bytes of elements
ADR r1, table_start ; pointer to start of elements
MOV r2, r1 ; make copy of start of elements
SUB r2, r2, #16 ; length is # of elements - 1
Sort ADD r3, r1, r0

SUB r3, r3, #16 ; r3 = addr of last element (new ptr)
MOV r4, #0 ; r4 = change flag
ADD r2, r2, #16 ; incr by 16 bytes each iteration
NextElement
LDMIA r3, {r5-r8} ; load first 16-byte key/element
SUB r3, r3, #16 ; adjust pointer
LDMIA r3, {r9-r12} ; load second 16-byte key/element
CMP r5, r9 ; 1st > 2nd ?
BGT NoSwap ; if so, don't swap
STMIA r3, {r5-r8} ; otherwise swap

ADD r3, r3, #16
STMIA r3, {r9-r12}
ADD r4, r4, #1 ; flag swap
SUB r3, r3, #16 ; decrement pointer
77
NoSwap
CMP r3, r2 ; checked all elements?
BNE NextElement ; if not, check more
CMP r4, #0 ; was there a swap?
BNE Sort ; if so, check again
; Binary Search
; Registers used:
; R0 - first
; R1 - last
; R2 - middle
; R3 - index
; R4 - size of the entries (log 2)
; R5 - the key (what you're searching for)
; R6 - address of the list
; R7 - temp
LDR r5, =0x250 ; let’s look for PINEAPPLE

ADR r6, table_start ; load address of the table
MOV r0, #0 ; first = 0
MOV r1, #NUM-1 ; last = num of entries in list - 1
loop CMP r0, r1 ; compare first and last
MOVGT r2, #0 ; first>last, no key found, middle = 0
BGT done
ADD r2, r0, r1 ; first + last
ASR r2, r2, #1 ; first + last /2
LDR r7, [r6, r2, LSL #ESIZE] ; load the entry
CMP r5, r7 ; compare key to value loaded
ADDGT r0, r2, #1 ; first = middle + 1
SUBLT r1, r2, #1 ; last = middle - 1
BNE loop ; go again
done MOV r3, r2 ; move middle to 'index'
stop B stop
table_start
DCD 0x024
DCB "EXTRA SAUCE "
DCD 0x005
DCB "ANCHOVIES "
DCD 0x012
DCB "GREEN PEPPER"
DCD 0x010
DCB "OLIVES "
DCD 0x300
DCB "PINE NUTS "
DCD 0x018
DCB "BLACK OLIVES"
DCD 0x004
DCB "PEPPERONI "
DCD 0x022
DCB "CHEESE "
78
DCD 0x038
DCB "MUSHROOMS "
DCD 0x026
DCB "CHICKEN "
DCD 0x030
DCB "CANADIAN BAC"
DCD 0x100
DCB "TOMATOES "
DCD 0x200
DCB "PINEAPPLE "
DCD 0x035
DCB "GREEN OLIVES"
END
4. Create a queue of 32-bit data values in memory. Write a function to remove the first item in the
queue.
AREA QueueRemoveFirst, CODE, READONLY

ENTRY
LDR r0, Queue ; load head of queue

STR r1, Pointer ; store it to Pointer
CMP r0, #0 ; end of queue?
BEQ Done ; if so, done
LDR r1, [r0] ; otherwise, get next pointer

STR r1, Queue ; and make it head of queue
Done B Done
AREA Data, DATA

Queue DCD Item1 ; pointer to start of queue
Pointer
DCD 0 ; pointer space
DataArea % 40 ; queue space for new entries
Item1 DCD Item2 ; pointer

DCD 1, 2 ; data
Item2 DCD Item3 ; pointer
DCD 3, 4 ; data
Item3 DCD 0 ; NULL pointer
DCD 5, 6 ; data
END
79
5. Using the sine table as a guide, construct a tangent table which produces the value for tan(x),
where 0 ≤ x ≤ 45. Test your code for values of 12 degrees and 43 degrees. You may return a
value of 0x7FFFFFFF for the case where the angle is equal to 45 degrees, since 1 cannot be
represented in Q31 notation.
AREA Reset, CODE
ENTRY
; Registers used:
; r1 = tan argument (in degrees, from 0 to 45)
; r4 = starting address of tan table
Reset_Handler
ADR r4, tan_data ; load address of tan table
LDR r1, =12 ; put test value in

LDR r0, [r4, r1, LSL #2] ; get tan value from table
LDR r1, =43 ; put test value in

LDR r0, [r4, r1, LSL #2] ; get tan value from table
done B done
ALIGN
tan_data
DCD 0x00000000,0x023bf7b4,0x047848a8,0x06b54c50
DCD 0x08f35ca0,0x0b32d420,0x0d740e40,0x0fb76790
DCD 0x11fd3e00,0x1445f100,0x1691e1e0,0x18e17440
DCD 0x1b350da0,0x1d8d16c0,0x1fe9fae0,0x224c28c0
DCD 0x24b412c0,0x27222ec0,0x2996f800,0x2c12ed40
DCD 0x2e969380,0x31227500,0x33b721c0,0x365530c0
DCD 0x38fd4100,0x3baff840,0x3e6e0580,0x41382180
DCD 0x440f0e00,0x46f39980,0x49e69d00,0x4ce90000
DCD 0x4ffbb800,0x531fc980,0x56564b80,0x59a06680
DCD 0x5cff5880,0x60747580,0x64012b00,0x67a70100
DCD 0x6b679e00,0x6f44c980,0x73407080,0x775ca780
DCD 0x7b9bb080,0x7fffffff
END
80
6. Create a jump table that contains the addresses of three functions: sin(x), cos(x) and tan(x). The
arguments to these functions should be between 0 and 45 degrees. Call each of these functions
using the jump table for the angle 27 degrees.
;
; Note that you would probably not implement such a jump table in
; practice, as there are better ways to compute trig values.
; This exercise is mostly illustrative of how to build the jump
; table itself.
;
num EQU 3
ENTRY
Reset_Handler
start
MOV r0, #1 ; first parameter determines
; function and holds return argument:
; 0 = sin(x)
; 1 = cos(x)
; 2 = tan(x)
MOV r1, #27 ; 27 degrees (NB: argument
; assumed to be 0<=x<=45)
BL trigfun ; call the function
stop
B stop
trigfun
CMP r0, #num ; if code is >=num then return
MOVHS pc, lr
ADR r3, JumpTable ; load address of jump table
LDR pc, [r3, r0, LSL #2]; jump to appropriate routine
JumpTable
DCD DoSin
DCD DoCos
DCD DoTan
DoSin
LDR r2,=270 ; constant won't fit into rotation scheme
ADR r4, sin_data ; load address of sin table
CMP r1, #180
BLE retvalue
CMP r1, r2
81
BLE retvalue
retvalue
LDR r0, [r4, r1, LSL #2] ; get sin value from table
CMP r7, #180 ; do we return a neg value?
RSBGT r0, r0, #0 ; negate the value
MOV pc, lr ; return
ALIGN
sin_data
DCD 0x00000000,0x023be164,0x04779630,0x06b2f1d8
DCD 0x08edc7b0,0x0b27eb50,0x0d613050,0x0f996a30
DCD 0x11d06ca0,0x14060b80,0x163a1a80,0x186c6de0
DCD 0x1a9cd9c0,0x1ccb3220,0x1ef74c00,0x2120fb80
DCD 0x234815c0,0x256c6f80,0x278dde80,0x29ac3780
DCD 0x2bc750c0,0x2ddf0040,0x2ff31bc0,0x32037a40
DCD 0x340ff240,0x36185b00,0x381c8bc0,0x3a1c5c80
DCD 0x3c17a500,0x3e0e3dc0,0x40000000,0x41ecc480
DCD 0x43d46500,0x45b6bb80,0x4793a200,0x496af400
DCD 0x4b3c8c00,0x4d084600,0x4ecdff00,0x508d9200
DCD 0x5246dd00,0x53f9be00,0x55a61280,0x574bb900
DCD 0x58ea9100,0x5a827980,0x5c135380,0x5d9cff80
DCD 0x5f1f5f00,0x609a5280,0x620dbe80,0x63798500
DCD 0x64dd8900,0x6639b080,0x678dde80,0x68d9f980
DCD 0x6a1de700,0x6b598f00,0x6c8cd700,0x6db7a880
DCD 0x6ed9ec00,0x6ff38a00,0x71046d00,0x720c8080
DCD 0x730baf00,0x7401e500,0x74ef0f00,0x75d31a80
DCD 0x76adf600,0x777f9000,0x7847d900,0x7906c080
DCD 0x79bc3880,0x7a683200,0x7b0a9f80,0x7ba37500
DCD 0x7c32a680,0x7cb82880,0x7d33f100,0x7da5f580
DCD 0x7e0e2e00,0x7e6c9280,0x7ec11a80,0x7f0bc080
DCD 0x7f4c7e80,0x7f834f00,0x7fb02e00,0x7fd31780
DCD 0x7fec0a00,0x7ffb0280,0x7fffffff
DoCos
LDR r2,=270 ; constant won't fit into rotation scheme
ADR r4, cosine_data ; load addr of cosine table
BLE retvalue1 ; first quadrant?
CMP r1, #180

BLE retvalue1
CMP r1, r2
BLE retvalue1
82
retvalue1
LDR r0, [r4, r1, LSL #2] ; get cosine value from table
CMP r7, #90
BLT done1
CMP r7, r2 ; do we return a negative value?
BGT done1
RSBLT r0, r0, #0 ; if between 90 and 270, negate
done1 MOV pc,lr
ALIGN
cosine_data
DCD 0x7b0a9f80,0x7a683180,0x79bc3880,0x7906c080
DCD 0x7847d900,0x777f9000,0x76adf600,0x75d31a80
DCD 0x678dde80,0x6639b000,0x64dd8980,0x63798500
DCD 0x5c135380,0x5a827980,0x58ea9100,0x574bb900
DCD 0x55a61280,0x53f9be00,0x5246dd80,0x508d9200
DCD 0x381c8bc0,0x36185b00,0x340ff240,0x32037a40
DCD 0x04779630,0x023be164,0x00000000
DoTan
ADR r4, tan_data ; load address of tan table
LDR r0, [r4, r1, LSL #2]; get tan value from table
MOV pc,lr
ALIGN
tan_data
DCD 0x00000000,0x023bf7b4,0x047848a8,0x06b54c50
DCD 0x08f35ca0,0x0b32d420,0x0d740e40,0x0fb76790
DCD 0x11fd3e00,0x1445f100,0x1691e1e0,0x18e17440
DCD 0x1b350da0,0x1d8d16c0,0x1fe9fae0,0x224c28c0
DCD 0x24b412c0,0x27222ec0,0x2996f800,0x2c12ed40
83
DCD 0x2e969380,0x31227500,0x33b721c0,0x365530c0
DCD 0x38fd4100,0x3baff840,0x3e6e0580,0x41382180
DCD 0x440f0e00,0x46f39980,0x49e69d00,0x4ce90000
DCD 0x4ffbb800,0x531fc980,0x56564b80,0x59a06680
DCD 0x5cff5880,0x60747580,0x64012b00,0x67a70100
DCD 0x6b679e00,0x6f44c980,0x73407080,0x775ca780
DCD 0x7b9bb080,0x7fffffff
END
7. Implement a sine table which holds values ranging from 0 to 180 degrees. The implementation
contains fewer instructions than the routine in Section 12.2, but to generate the value, it uses more
memory to hold the sine values themselves. Compare the total code size for both cases.
AREA RESET, CODE
ENTRY
; Registers used:
; r2 = temp
; r7 = flag to return a negative value
Reset_Handler
main
MOV r1, #186 ; test 186 degrees
MOV r7,r1 ; make a copy
BLE retvalue ; third or fourth?
SUB r1,r1,#180 ; sin(x-180)=-sin(x)
retvalue
RSBLT r0, r0, #0 ; negate the value
done B done
ALIGN
sin_data
DCD 0x00000000,0x023be164,0x04779630,0x06b2f1d0
84
DCD 0x1a9cd9a0,0x1ccb3240,0x1ef74c00,0x2120fb80
DCD 0x234815c0,0x256c6f80,0x278dde80,0x29ac37c0
DCD 0x2bc75100,0x2ddf0040,0x2ff31bc0,0x32037a40
DCD 0x43d46500,0x45b6bb80,0x4793a200,0x496af400
DCD 0x5f1f5e80,0x609a5300,0x620dbe80,0x63798500
DCD 0x6a1de700,0x6b598e80,0x6c8cd700,0x6db7a880
DCD 0x6ed9eb80,0x6ff38a00,0x71046d00,0x720c8080
DCD 0x730baf00,0x7401e500,0x74ef0e80,0x75d31a80
DCD 0x76adf600,0x777f9000,0x7847d900,0x7906c080
DCD 0x7fec0a00,0x7ffb0280,0x7fffffff,0x7ffb0280
DCD 0x7fec0a00,0x7fd31780,0x7fb02e00,0x7f834f00
DCD 0x7f4c7e80,0x7f0bc080,0x7ec11a80,0x7e6c9280
DCD 0x7e0e2e00,0x7da5f580,0x7d33f100,0x7cb82880
DCD 0x7c32a680,0x7ba37500,0x7b0a9f80,0x7a683180
DCD 0x79bc3880,0x7906c080,0x7847d900,0x777f9000
DCD 0x76adf600,0x75d31a80,0x74ef0e80,0x7401e500
DCD 0x730baf00,0x720c8080,0x71046d00,0x6ff38a00
DCD 0x6ed9eb80,0x6db7a880,0x6c8cd700,0x6b598e80
DCD 0x6a1de700,0x68d9f980,0x678dde80,0x6639b000
DCD 0x64dd8980,0x63798500,0x620dbe80,0x609a5300
DCD 0x5f1f5e80,0x5d9cff80,0x5c135380,0x5a827980
DCD 0x58ea9100,0x574bb900,0x55a61280,0x53f9be00
DCD 0x5246dd80,0x508d9200,0x4ecdff00,0x4d084680
DCD 0x4b3c8c00,0x496af400,0x4793a200,0x45b6bb80
DCD 0x43d46500,0x41ecc480,0x40000000,0x3e0e3dc0
DCD 0x3c17a500,0x3a1c5c40,0x381c8bc0,0x36185b00
DCD 0x340ff240,0x32037a40,0x2ff31bc0,0x2ddf0040
DCD 0x2bc75100,0x29ac37c0,0x278dde80,0x256c6f80
DCD 0x234815c0,0x2120fb80,0x1ef74c00,0x1ccb3240
DCD 0x1a9cd9a0,0x186c6de0,0x163a1a80,0x14060b60
DCD 0x11d06ca0,0x0f996a20,0x0d613050,0x0b27eb60
DCD 0x08edc7b0,0x06b2f1d0,0x04779630,0x023be164
DCD 0x00000000
END
428 bytes versus 768 bytes
8. Implement a cosine table which holds values ranging from 0 to 180 degrees. The
implementation contains fewer instructions than the routine in Section 12.2, but to generate the
85
value, it uses more memory to hold the cosine values themselves. Compare the total code size for
both cases.
AREA reset, CODE
ENTRY
; Registers used:
; r1 = cos argument (in degrees, from 0 to 360)
; r2 = temp
; r4 = starting address of cosine table
Reset_Handler
main
MOV r1, #186 ; test 186 degrees
LDR r2, =360
ADR r4, cosine_data ; load address of cosine table

SUBGT r1, r2, r1 ; subtract from 360 if > 180
; get cosine value from table

LDR r0, [r4, r1, LSL #2]
done B done
ALIGN
cosine_data
DCD 0x7b0a9f80,0x7a683180,0x79bc3880,0x7906c080
DCD 0x7847d900,0x777f9000,0x76adf600,0x75d31a80
DCD 0x678dde80,0x6639b000,0x64dd8980,0x63798500
DCD 0x5c135380,0x5a827980,0x58ea9100,0x574bb900
DCD 0x55a61280,0x53f9be00,0x5246dd80,0x508d9200
DCD 0x381c8bc0,0x36185b00,0x340ff240,0x32037a40
86
DCD 0x04779630,0x023be164,0x00000000,0xfdc41e9c
DCD 0xfb8869d0,0xf94d0e30,0xf7123850,0xf4d814a0
DCD 0xf29ecfb0,0xf06695e0,0xee2f9360,0xebf9f4a0
DCD 0xe9c5e580,0xe7939220,0xe5632660,0xe334cdc0
DCD 0xe108b400,0xdedf0480,0xdcb7ea40,0xda939080
DCD 0xd8722180,0xd653c840,0xd438af00,0xd220ffc0
DCD 0xd00ce440,0xcdfc85c0,0xcbf00dc0,0xc9e7a500
DCD 0xc7e37440,0xc5e3a3c0,0xc3e85b00,0xc1f1c240
DCD 0xc0000000,0xbe133b80,0xbc2b9b00,0xba494480
DCD 0xb86c5e00,0xb6950c00,0xb4c37400,0xb2f7b980
DCD 0xb1320100,0xaf726e00,0xadb92280,0xac064200
DCD 0xaa59ed80,0xa8b44700,0xa7156f00,0xa57d8680
DCD 0xa3ecac80,0xa2630080,0xa0e0a180,0x9f65ad00
DCD 0x9df24180,0x9c867b00,0x9b227680,0x99c65000
DCD 0x98722180,0x97260680,0x95e21900,0x94a67180
DCD 0x93732900,0x92485780,0x91261480,0x900c7600
DCD 0x8efb9300,0x8df37f80,0x8cf45100,0x8bfe1b00
DCD 0x8b10f180,0x8a2ce580,0x89520a00,0x88807000
DCD 0x87b82700,0x86f93f80,0x8643c780,0x8597ce80
DCD 0x84f56080,0x845c8b00,0x83cd5980,0x8347d780
DCD 0x82cc0f00,0x825a0a80,0x81f1d200,0x81936d80
DCD 0x813ee580,0x80f43f80,0x80b38180,0x807cb100
DCD 0x804fd200,0x802ce880,0x8013f600,0x8004fd80
DCD 0x80000000
END
444 bytes versus 752 bytes

9. Rewrite the binary search routine for the Cortex-M4
NUM EQU 14 ; insert # of entries here

ESIZE EQU 4 ; log 2 of the entry size (16 bytes)
; NB: This assumes entry size
; is a power of 2
; Registers used:
; R0 - first
; R1 - last
; R2 - middle
; R3 - index
; R4 - size of the entries (log 2)
; R5 - the key (what you're searching for)
; R6 - address of the list
; R7 - temp
; R8 - temp
LDR r5, =0x200 ; let’s look for PINEAPPLE
ADR r6, table_start ; load address of the table

MOV r0, #0 ; first = 0
MOV r1, #NUM-1 ; last = num of entries in list - 1
loop CMP r0, r1 ; compare first and last
87
ITT GT
MOVGT r2, #0 ; first>last, no key found, middle = 0
BGT done
ADD r2, r0, r1 ; first + last

ASR r2, r2, #1 ; first + last /2
; The shift count on a load instruction with register offset

; must be between 0 and 3 - our ESIZE value is too big, so we
; can do this with one extra instruction and a different load
LSL r8, r2, #ESIZE

LDR r7, [r6, r8] ; load the entry
CMP r5, r7 ; compare key to value loaded
IT GT
ADDGT r0, r2, #1 ; first = middle + 1
IT LT
SUBLT r1, r2, #1 ; last = middle - 1
BNE loop ; go again
done MOV r3, r2 ; move middle to 'index'

stop B stop
ALIGN
table_start
DCD 0x004
DCB "PEPPERONI "
DCD 0x005
DCB "ANCHOVIES "
DCD 0x010
DCB "OLIVES "
DCD 0x012
DCB "GREEN PEPPER"
DCD 0x018
DCB "BLACK OLIVES"
DCD 0x022
DCB "CHEESE "
DCD 0x024
DCB "EXTRA SAUCE "
DCD 0x026
DCB "CHICKEN "
DCD 0x030
DCB "CANADIAN BAC"
DCD 0x035
DCB "GREEN OLIVES"
DCD 0x038
DCB "MUSHROOMS "
DCD 0x100
DCB "TOMATOES "
DCD 0x200
DCB "PINEAPPLE "
DCD 0x300
DCB "PINE NUTS "
88
Chapter 13
1. What’s wrong with the following ARM7TDMI instructions?
a. STMIA r5!, {r5, r4, r9}
Register r5 is included in the register list, as well as being used as a pointer. According to the
ARM ARM, the results are unpredictable when writeback is used in such a situation.
b. LDMDA r2, {}
At least one register must be transferred, otherwise result is unpredictable
c. STMDB r15!, {r0-r3, r4, lr}
Using the program counter as the base pointer gives unpredictable results
2. On the ARM7TDMI, if register r6 holds the address 0x8000 and you executed the instruction
STMIA r6, {r7, r4, r0, lr}
what address now holds the value in register r0? Register r4? Register r7? The Link Register?
Address 0x8000 holds value in r0.
Address 0x8004 holds value in r4
Address 0x8008 holds value in r7
Address 0x800C holds value in lr
3. Assume that memory and ARM7TDMI registers r0 through r3 appear as follows:
Address Register
0x8010 0x00000001 0x13 r0
0x800C 0xFEEDDEAF 0xFFFFFFFF r1
89
0x8008 0x00008888 0xEEEEEEEE r2
0x8004 0x12340000 0x8000 r3
0x8000 0xBABE0000
Describe the memory and register contents after executing the instruction
LDMIA r3!, {r0, r1, r2}
Memory contents stay the same.
r0 = 0xBABE0000, r1 = 0x12340000, r2 = 0x00008888, r3 = 0x0000800C
4. Suppose that a stack appears as shown in the first diagram below. Give the instruction or
instructions that would push or pop data so that memory appears in the order shown. In other
words, what instruction would be necessary to go from the original state to that shown in (a), and
then (b), and then (c)?
Address
0x8010 0x00000001 0x00000001 0x00000001 0x00000001
0x800C 0xFEEDDEAF 0xFEEDDEAF 0xFEEDDEAF 0xFEEDDEAF
0x8008 0xBABE2222 0xBABE2222
0x8004 0x12340000
0x8000
original (a) (b) (c)
LDR sp, =0x800C

LDR r1, =0xBABE2222
LDR r0, =0x12340000
STR sp!, [r1] ; gets you to (a)

STR sp!, [r0] ; gets you to (b)
LDMDB sp!, {r0, r1} ; gets you to (c)
5. Convert the cosine table from Problem 1 in Chapter 12 into a subroutine, using a full
descending stack.
; Registers used:
; r2 = temp
90
CosineTable
STMFD sp!,{r4,r7,lr} ; save off used registers
LDR r2,=270 ; constant won’t fit into
; rotation scheme
ADR r4, cosine_data ; load addr of cosine table
CMP r1, #180

BLE retvalue
CMP r1, r2
BLE retvalue
retvalue
; get cosine value from table
LDR r0, [r4, r1, LSL #2]
CMP r7, #90

BLT done
CMP r7, r2 ; return a negative value?
BGT done
RSBLT r0, r0, #0 ; if between 90 and 270,
; negate the value
LDMFD sp!,{r4,r7,pc} ; return
cosine_data
DCD 0x7b0a9f80,0x7a683180,0x79bc3880,0x7906c080
DCD 0x7847d900,0x777f9000,0x76adf600,0x75d31a80
DCD 0x678dde80,0x6639b000,0x64dd8980,0x63798500
DCD 0x5c135380,0x5a827980,0x58ea9100,0x574bb900
DCD 0x55a61280,0x53f9be00,0x5246dd80,0x508d9200
DCD 0x381c8bc0,0x36185b00,0x340ff240,0x32037a40
91
DCD 0x04779630,0x023be164,0x00000000
6. Rewrite Example 13.4 using full descending stacks.
SRAM_BASE EQU 0x40000000
AREA Passbyreg, CODE, READONLY

ENTRY
LDR sp, =SRAM_BASE + 100 ; stack pointer initialized
; try out positive case

; (this should saturate)
MOV r0, #0x40000000
MOV r1, #2
BL saturate
; try out a negative case

; (should not saturate
MOV r0, #0xFFFFFFFE
MOV r1, #8
BL saturate
stop B stop
saturate
; Subroutine saturate32
; Parameters are read from the stack, and
; registers r4 through r7 are saved on
; the stack. The result
; is placed at the bottom of the stack.
; Registers used:
; r4 - result
; r5 - operand to be shifted
; r6 - scratch register
; shift amount (m)
; r4 = saturate32 (r5 << m)

STMFD sp!,{r6,lr} ; save off used registers
MOV r6, #0x7FFFFFFF
MOV r2, r0, LSL r1
TEQ r0, r2, ASR r1 ; if (r0!=(r2>>m))
EORNE r2, r6, r0, ASR #31 ; r2 = 0x7FFFFFFF^sign(r5)
LDMFD sp!,{r6,pc} ; return
92
END
AREA PassbyMem, CODE, READONLY

ENTRY
LDR sp, =SRAM_BASE+100 ; stack pointer initialized

LDR r3, =SRAM_BASE+100 ; writable mem for parameters
MOV r1, #0x40000000
MOV r2, #2
STMFD r3,{r1,r2} ; save off parameters
BL saturate

MOV r1, #0xFFFFFFFE
MOV r2, #8
STMFD r3, {r1,r2}
BL saturate
stop B stop
saturate
; Parameters are read from memory, and the
; starting address is in register r3. The result
; is placed at the start of parameter memory
; Registers used:
; r4 - result
; shift amount (m)

STMFD sp!,{r4-r7,lr} ; save off used registers
LDMFD r3, {r5,r7} ; get parameters off the stack
MOV r6, #0x7FFFFFFF
MOV r4, r5, LSL r7
TEQ r5, r4, ASR r7 ; if (r5!=(r4>>m))
STR r4, [r3] ; move result to memory
LDMFD sp!,{r4-r7,pc} ; return
93
END
AREA Passbystack, CODE, READONLY

ENTRY
LDR sp, =SRAM_BASE+100 ; stack pointer initialized

MOV r1, #0x40000000
MOV r2, #2
; push parameters on the stack
STMFD sp!,{r1,r2}
BL saturate
; pop results off the stack
; now r1 = result of shift
LDMFD sp!, {r1,r2}

MOV r1, #0xFFFFFFFE
MOV r2, #8
STMFD sp!, {r1,r2}
BL saturate
LDMFD sp!, {r1,r2}
stop B stop
saturate
; Parameters are read from memory, and the
; starting address is in register r3. The result
; is placed at the start of parameter memory
; Registers used:
; r4 - result
; shift amount (m)

STMFD sp!,{r3-r7,lr} ; save off used registers
LDR r5, [sp, #0x18] ; get first parameter off stack
94
LDR r7, [sp, #0x1C] ; get second parameter off stack
MOV r6, #0x7FFFFFFF
MOV r4, r5, LSL r7
TEQ r5, r4, ASR r7 ; if (r5!=(r4>>m))
STR r4, [sp,#0x18] ; move result to bottom of stack
LDMFD sp!,{r3-r7,pc} ; return
END
9. Convert the factorial program written in Chapter 3 into a subroutine, using full descending
stacks. Pass arguments to the subroutine using both pass-by-register and pass-by-stack
techniques.
ENTRY
Reset_Handler
LDR sp, =SRAM_BASE + 100
; Since we really want these to behave as subroutines, i.e.,

; passing back a value in a register, we’ll send the argument
; in register r0 and also put the return value in r0.
MOV r0, #10 ; load 10 into r0, pass by register

BL FactReg
MOV r0, #10

STMFD sp!, {r0} ; save off parameter
BL FactStack
stop B stop
FactReg
STMFD sp!,{r4,lr}
MOV r4, #1 ; if n=0, at least n! = 1
loop CMP r0, #0
MULGT r4, r0, r4 ; perform multiply, result in r7
SUBGT r0, r0, #1 ; decrement n
BGT loop
MOV r0, r4 ; pass the result back
LDMFD sp!,{r4,pc}
95
FactStack
STMFD sp!,{r4,lr}
LDR r0, [sp, #8] ; get parameter off stack
MOV r4, #1 ; if n=0, at least n! = 1
loop1 CMP r0, #0
MULGT r4, r0, r4 ; perform multiply, result in r7
SUBGT r0, r0, #1 ; decrement n
BGT loop1
MOV r0, r4 ; pass the result back
LDMFD sp!,{r4,pc}
END
10. Write the ARM7TDMI division routine from Chapter 7 as a subroutine that uses empty
ascending stacks. Pass the subroutine arguments using registers, and test the code by dividing
4000 by 32.
AREA DivUsingEAstack, CODE, READONLY
Rcnt RN 4 ; assign R4 to Rcnt

Ra RN 1 ; assign R1 to Ra
Rb RN 2 ; assign R2 to Rb
Rc RN 3 ; assign R3 to Rc
ENTRY
LDR Ra, =4000

LDR Rb, =32
BL Division
Done B Done
Division
STMEA sp!,{r4,lr}
MOV Rcnt, #1 ; bit to control the division
Div1 CMP Rb, #0x80000000 ; move Rb until greater than Ra

CMPCC Rb, Ra
MOVCC Rb, Rb, LSL #1
MOVCC Rcnt, Rcnt, LSL #1
BCC Div1
96
MOV Rc, #0
Div2 CMP Ra, Rb ; test for possible subtraction
SUBCS Ra, Ra, Rb ; subtract if OK
ADDCS Rc, Rc, Rcnt ; put relevant bit into result
MOVS Rcnt, Rcnt, LSR #1 ; shift control bit
MOVNE Rb, Rb, LSR #1 ; halve unless finished
BNE Div2 ; divide result into Rc, remainder in Ra
LDMEA sp!,{r4,pc}
END
Rc = r3 = 0x7D = 125 = 4000/32

Ra = 0
11. Match the following terms with their definitions:
a. recursive 1. subroutine can be interrupted and called by the interrupting routine
b. relocatable 2. a subroutine which calls itself
c. position independent 3. the subroutine can be placed anywhere in memory
d. reentrant 4. all program addresses are calculated relative to the program counter
a.  2.
b.  3.
c.  4.
d.  1.
12. Write the ARM7TDMI division routine from Chapter 7 as a subroutine that uses full
descending stacks. Pass the subroutine arguments using the stack, and test the code by dividing
142 by 7.
AREA DivUsingFDstack, CODE, READONLY
STACK_BASE EQU 0x40000000
Rcnt RN 4 ; assign R4 to Rcnt

Ra RN 1 ; assign R1 to Ra
Rb RN 2 ; assign R2 to Rb
Rc RN 3 ; assign R3 to Rc
ENTRY
LDR sp, =STACK_BASE

LDR Ra, =142
97
LDR Rb, =7
STMFD sp!,{Ra,Rb}
BL Division
Done B Done
Division
STMFD sp!,{r4,lr}
LDR Ra, [sp, #0x8] ; load Ra from stack
LDR Rb, [sp, #0xC] ; load Rb from stack
MOV Rcnt, #1 ; bit to control the division
Div1 CMP Rb, #0x80000000 ; move Rb until greater than Ra

CMPCC Rb, Ra
MOVCC Rb, Rb, LSL #1
MOVCC Rcnt, Rcnt, LSL #1
BCC Div1
MOV Rc, #0
Div2 CMP Ra, Rb ; test for possible subtraction
SUBCS Ra, Ra, Rb ; subtract if OK
ADDCS Rc, Rc, Rcnt ; put relevant bit into result
MOVS Rcnt, Rcnt, LSR #1 ; shift control bit
MOVNE Rb, Rb, LSR #1 ; halve unless finished
BNE Div2 ; divide result into Rc, remainder in Ra
LDMFD sp!,{r4,pc}
END
Rc = r3 = 0x14 = 20 = 142/7
Ra = 0x2 = 2
13. Write ARM assembly to implement a PUSH operation without using LDM or STM
instructions. The routine should handle three data types, where register r0 contains 1 for byte
values, 2 for halfword values, and 4 for word values. Register r1 should contain the data to be
stored on the stack. The stack pointer should be updated at the end of the operation.
AREA Push, CODE, READONLY

ENTRY
CMP r0, #4
BNE Half
STR r1, [r13]
B Inc_SP
Half CMP r0, #2
98
BNE Byte
STRH r1, [r13]
B Inc_SP
Byte STRB r1, [r13]
Inc_SP ADD r13, r13, #4
Done B Done
END
14. Write ARM assembly to check whether an N x N matrix is a magic square. A magic square is
an N x N matrix in which the sums of all rows, columns, and the two diagonals add up to N(N2 +
1)/2. All matrix entries are unique numbers from 1 to N2. Register r1 will hold N. The matrix
starts at location 0x4000 and ends at location (0x4000 + N2). Suppose you wanted to test a
famous example of a magic square:
16 3 2 13
5 10 11 8
9 6 7 12
4 15 14 1
The numbers 16, 3, 2, and 13 would be stored at addresses 0x4000 to 0x4003, respectively. The
number 5, 10, 11, 8 would be stored at addresses 0x4004 to 0x4007, etc. Assume all numbers are
bytes. If the matrix is a magic square, r9 will be set upon completion; otherwise it will be cleared.
You can find other magic square examples, such as Ben Franklin’s own 8 x 8 magic square, on
the Internet to test your program.
; Write ARM Assembly to check whether an N x N matrix is a

; magic square. A magic square is an N x N matrix in which the
; sums of all rows, columns, and diagonals add up to
; N(N^2 + 1)/2.
; All matrix entries are unique numbers from 1 to N. The
; matrix starts at location 0x4000. R1 will hold N. If the
; matrix is a magic square, r9 upon completion will be set,
; otherwise cleared.
AREA MagicSquare, CODE, READONLY

ENTRY
99
MOV r9, #0
MOV r8, #1
MOV r7, #0
MUL r6, r1, r1 ; r6 = N^2

ADD r6, r6, #1 ; r6 = N^2 + 1
MUL r6, r6, r1 ; r6 = N(N^2 + 1)
MOV r6, r6, LSR #1 ; r6 = N(N^2 + 1)/2
BL Initialize
Rows0 MOVEQ r3, #0

Rows LDRB r4, [r2], #1 ; r4 = element for row addition
ADD r3, r3, r4 ; acc row sums to r3
BNE Rows
Check_Rows
SUBS r7, r7, #1 ; row counter
BEQ Column_Init ; tested all rows successfully
CMP r3, r6 ; test for magic
MOVEQ r5, r1 ; reset counter
BEQ Rows0 ; test next row
B Done
Column_Init
BL Initialize
Columns0
MOVEQ r3, #0
Columns
LDRB r4, [r2], r1
ADD r3, r3, r4
SUBS r5, r5, #1
BNE Columns
Check_Columns
SUBS r7, r7, #1
BEQ Diag_Init
CMP r3, r6
MOVEQ r5, r1
MOVEQ r2, #0x4000
ADDEQ r2, r2, r8
ADDEQ r8, r8, #1
BEQ Columns0
B Done
Diag_Init
BL Initialize
Diags0
100
MOVEQ r3, #0
ADD r10, r1, #1
Diags
LDRB r4, [r2], r10
ADD r3, r3, r4
SUBS r5, r5, #1
BNE Diags
Diags1
CMP r3, r6
BNE Done
MOV r3, #0
MOV r2, #0x4000
MOV r7, r1
SUB r7, r7, #1
ADD r2, r2, r7
SUB r10, r1, #1
MOV r5, r1
Diags2
LDRB r4, [r2], r10
ADD r3, r3, r4
SUBS r5, r5, #1
BNE Diags2
CMP r3, r6
MOVEQ r9, #0xFFFFFFFF
Done B Done
END
15. Another common operation in signal processing and control applications is to compute a dot
product, given as
N 1
a   cm xm
m 0
where the dot product a is a sum of products. The coefficients cm and the input samples xm are
stored as arrays in memory. Assume sample data and coefficients are 16-bit, unsigned integers.
Write the assembly code to compute a dot product for 20 samples. This will allow you to use the
LDM instruction to load registers with coefficients and data efficiently. You probably want to
bring in four or five values at a time, looping as needed to exhaust all values. Leave the dot
product in a register, and give the register the name DPROD using the RN directive in your code.
101
If you use the newer v7-M SIMD instructions, note that you can perform two multiplies on 16-bit
values at the same time.
AREA DotProduct, CODE, READONLY

ENTRY
DPROD RN 0
LDR DPROD, =0
LDR r1, =samples
LDR r2, =coeffs
LDR r14, =4 ; loop counter for 20 samples
loop LDMIA r1!, {r3-r7} ; load samples

LDMIA r2!, {r8-r12} ; load coefficients
MLA DPROD, r3, r8, DPROD ; multiply and accumulate
; corresponding
MLA DPROD, r4, r9, DPROD ; samples and coeffs
MLA DPROD, r5, r10, DPROD
BNE loop
Done B Done
samples
DCD 3, 4, 2, 54, 6, 33, 5, 2, 1, 0, 77
DCD 1, 2, 3, 0, 5, 35, 3, 4, 1
coeffs
DCD 1, 1, 3, 3, 2, 1, 4, 2, 1, 0, 0, 0
DCD 0, 2, 1, 3, 2, 1, 0, 1
END
15. Suppose your stack was defined to be between addresses 0x40000000 and 0x40000200, with
program variables located at address 0x40000204 and higher in memory, and your stack pointer
contains the address 0x400001FC. What do you think would happen if you attempt to store 8
words of data on an ascending stack?
The machine would let you perform this operation, but as a result, you now have data corruption
in the program variables.
102
Chapter 14
1. Name three ways in which FIQ interrupts are handled more quickly than IRQ interrupts.
1) FIQs have a higher priority than IRQs when they occur simultaneously
2) The FIQ vector is at the top of the vector table allowing you to place the handler there
eliminating the need to change the PC, as you must with the IRQ vector
3) FIQ mode contains extra FIQ-specific registers which allows you to not have to push some
data on the stack on an FIQ call.
2. Describe the types of operations normally performed by a reset handler.
Set up exception vectors
Initialize the memory system (e.g. MMU/PU)
Initialize all required processor mode stacks and registers
Initialize variables required by C
Initialize any critical I/O devices
Enable interrupts
Change processor mode and/or state
Call the main application
3. Why can you only return from exceptions in ARM state on the ARM7TDMI?
There are special instructions to access the PSRs (to restore program state) which are only
available in ARM state.
4. How many external interrupt lines does the ARM7TDMI have? If you have 8 interrupting
devices, how would you handle this?
103
Two. If more than two sources of interrupt, they will need to be connected to an interrupt
controller which prioritizes and signals the core via the two interrupt lines.
5. Write an SVC handler that accepts the number 0x1234. When the handler sees this value, it
should reverse the bits in register r9 (see Exercise 2 in Chapter 8). The SWI exception handler
should examine the actual SWI instruction in memory to determine its number. Be sure to set up a
stack pointer in SVC mode before handling the exception.
T_bit EQU 0x20

RAMSTART EQU 0x40000000 ; start of onboard RAM for LPC2104
Mode_SVC EQU 0x13 ; Mode bits for Supervisor mode
I_Bit EQU 0x80 ; when I bit is set, IRQ is disabled
F_Bit EQU 0x40 ; when F bit is set, FIQ is disable
MSR CPSR_c, #Mode_SVC:OR:I_Bit:OR:F_Bit

; enter Supervisor mode
LDR sp, =RAMSTART ; setup Supervisor stack pointer
; this is not a standalone program

; ………..
; SVC Handler
SVC_Handler
STMFD sp!, {r0-r3,r12,lr} ; stack registers
MOV r1, sp ; and set pointer to parameters
MRS r0, SPSR ; get SPSR
STMFD sp!, {r0} ; and store it on the stack
TST r0, #T_bit
LDRNEH r0, [lr,#-2] ; extract SWI number
BICNE r0, r0, #0xFF00 ; 8 bits for Thumb
LDREQ r0, [lr,#-4]
BICEQ r0, r0, #0xFF000000 ; 24 bits for ARM
LDR r2, =0x1234

CMP r0, r2 ; is the SVC number 0x1234?
BLEQ ReverseR9
; insert checks for other SVC numbers here
LDMFD sp!, {r1}

MSR SPSR_csxf, r1
LDMFD sp!, {r0-r3,r12,pc}^ ; restore registers
; and return
104
ReverseR9
MOV r2, #0 ; r2 = result (will be r0 reversed)
Loop AND r3, r9, #1 ; r3=current LSB of reg to be reversed

ADD r2, r3, r2, LSL #1 ; shift result left
; 1, result LSB is r3
LSR r9, r9, #1 ; look at new LSB of reg
; to be reversed
BNE Loop
MOV pc, lr ; return
6. Explain why you can’t have an SVC and an undefined instruction exception occur at the same
time.
An SVC cannot be an undefined instruction, so the two are mutually exclusive.
7. Build an Undefined exception handler that tests for and handles a new instruction called
FRACM. This instruction takes two Q15 numbers, multiplies them together, shifts the result left
one bit to align the binary point, then stores the upper 16 bits to register r7 as a Q15 number. Be
sure to test the routine with two values.
AREA UndefQ15Mul, CODE, READONLY

ENTRY
LDR r0, =0xFFFFBB79 ; load -0.5354 into r3 (Q15)

LDR r4, =0x1FEB ; load 0.24938 into r4 (Q15)
FRACM DCD 0x77F07FF4 ; r7 (HI Q31) = r0(Q15)*Rm(Q15)
Done B Done
UndefHandler
STMFD sp!, {r0-r12, lr} ; save workspace and lr to stack
MRS r0, SPSR ; copy SPSR to r0
STR r0, [sp, #-4]! ; Save SPSR to stack
LDR r0, [lr, #-4] ; r0 = undefined instruction

BIC r2, r0, #0xF00FFFFF ; clear out all bits but
; opcode bits
TEQ r2, #0x07F000000 ; r2 = opcode for MulQ15
BLEQ MulQ15 ; if valide opcode, handle it
105
; insert tests for other undefined instructions here
LDR r1, [sp], #4 ; restore SPSR to r1

MSR SPSR_cxsf, r1 ; restore SPSR
LDMFD sp!, {r0-r12, pc}^ ; return to place in program
; after undef instruction
; MulQ15 is r0(Q15) * Rm(Q15) (where Rm can only be between r0

; and r7 inclusive), and puts the HI
; result in r7. It does not decode immediates, CC, S-bit, etc.
; Format is same as example in text.
MulQ15
BIC r3, r0, #0xFFFFFFF0 ; mask out all bits except Rm
ADD r3, r3, #1 ; bump past SPSR on the stack
LDR r0, [sp, #4] ; grab r0 from the stack
LDR r3, [sp, r3,LSL #2] ; use the Rm field as an offset
MUL r7, r0, r3 ; calculate r0(Q15) * Rm(Q15)
MOV r7, r7, LSL #1 ; shift Q30 number to get Q31
MOV r7, r7, LSR #15 ; move upper 16 bits to [15:0]
STR r7, [sp, #4] ; store result back on stack
MOV pc, lr
END
8. As a sneak peek into Chapter 16, we’ll find out that interrupting devices are programmable.
Using ST Microelectronic’s documentation found on their website, give the address range for the
following peripherals on the STR910FM32 microcontroller. They can cause an interrupt to be
sent to the ARM processor.
a. Real Time Clock
b. Wake-up/Interrupt Unit
a. 0x4C001000 to 0x4C001017
b. 0x48001000 to 0x48001013
9. Explain the steps the ARM7TDMI processor takes when handling an exception.
When an exception occurs, the ARM7TDMI:

Copies CPSR into SPSR_<mode>
Sets appropriate CPSR bits
If core currently in Thumb state then
ARM state is entered.
106
Mode field bits
Interrupt disable bits (if appropriate)
Stores the return address in LR_<mode>
Sets PC to vector address
To return, exception handler needs to:
Restore CPSR from SPSR_<mode>
Restore PC from LR_<mode>
10. What mode does the processor have to be in to move the contents of the SPSR to the CPSR?
What instruction is used to do this?
Any privileged mode.
You’ll use the MSR instruction.
11. When handling interrupts, why must the link register be adjusted before returning from the
exception?
The interrupt handler is called after the instruction has finished executing. Thus, the PC has
already been updated at the point LR is calculated.
12. How many SPSRs are there on the ARM7TDMI?
13. What is a memory management unit (MMU) used for? What is the difference between an
MMU and a memory protection unit (MPU)? You may want to consult the ARM documentation
for specific information on both.
MMU – handles accesses to memory requested by CPU and converts virtual addresses to physical
addresses using pages.
MPU – sets and enforces access permissions to various portions of memory, but in general, is
simpler than an MMU.
107
Chapter 15
1. How many operation modes does the Cortex-M4 have?
Two modes: Handler mode and Thread mode
2. What happens if you do not enable usage faults in Example 15.1?
The processor will take a hard fault.
3. Which register must be used to switch privilege levels?
The CONTROL register
4. What are the differences between the ARM7TDMI and Cortex-M4 vector tables?
a) ARM7TDMI vector table contains actual instructions at each address, where the Cortex-M4
vector table has handler addresses in the table.
b) The vector tables are ordered differently, e.g., the exception vector at address 0x0 is the reset
exception for the ARM7TDMI and the pointer to the top of the stack on the Cortex-M4.
c) There are more exception types on the Cortex-M4
5. Give the offsets (from the base address 0xE000E000) and register size for the following:
a) Usage Fault Status Register – 0xD28, 16 bits
b) Memory Management Fault Status Register – 0xD28, 8 bits
You may wish to consult the TM4C1233H6PM data sheet (Texas Instruments 2013)
6. Configure Example 15.5 so that the timer counts up rather than down. Don’t forget to
configure the appropriate match value!
There are two changes that must be made. The first is to set the counter to count up, which is
bit[4] of the CPTMTAMR register, so instead of ORing with the value 0x21, OR with the value
0x31. The second change is to remove the code that clears the timer match value, since the
default match value is 0xFFFF, so the write to offset 0x30 should be removed.
108
Chapter 16
1. Write an SVC handler so that when an SVC instruction is executed, the handler prints out the
contents of register r8. The program should incorporate assembly routines similar to the ones
already built. For example, you will need to convert the binary value in register r8 to ASCII first,
then use a UART to display the information in the Keil tools. Have the routine display “Register
r8 =” followed by the register’s value.
AREA SVCr8ASCII, CODE, READONLY
PINSEL0 EQU 0xE002C000 ; controls the function of the pins

U0START EQU 0xE000C000 ; start of UART0 registers
LCR0 EQU 0xC ; line control register for UART0
LSR0 EQU 0x14 ; line status register for UART0
RAMSTART EQU 0x40000000 ; start of onboard RAM for 2104

Mode_SVC EQU 0x13 ; Supervisor mode bits
I_Bit EQU 0x80 ; when I bit is set, IRQ is disabled
F_Bit EQU 0x40 ; when F bit is set, FIQ is disabled
T_bit EQU 0x20
ENTRY
MSR CPSR_c, #Mode_SVC:OR:I_Bit:OR:F_Bit

; switch to Supervisor mode
LDR sp, =RAMSTART ; and set Supervisor stack ptr
; Subroutine UARTConfig
; This subroutine configures the I/O pins first. It
; then sets up the UART control register. The parameters
; are set to 8 bits, no parity and 1 stop bit.
; Registers used:
; r5 – scratch register
; r6 – scratch register
; inputs: none
; outputs: none
UARTConfig
LDR r5,=PINSEL0 ; base address of register
LDR r6,[r5] ; get contents
BIC r6,r6,#0xF ; clear out lower nibble
ORR r6,r6,#0x5 ; sets P0.0 to Tx0 and P0.1 to Rx0
STR r6, [r5] ; r/modify/w back to register
LDR r5,=U0START
MOV r6, #0x83 ; 8 bits, no parity, 1 stop bit
109
STRB r6, [r5, #LCR0] ; write control byte to LCR
MOV r6, #0x61 ; 9600 baud @15MHz VPB clock
STRB r6, [r5] ; store control byte
MOV r6, #3 ; set DLAB = 0
STRB r6, [r5, #LCR0] ; Tx and Rx buffers set up
SVC_Handler
STMFA sp!, {r0-r3,r12,lr} ; stack registers
MOV r1, sp ; and set pointer to parameters
MRS r0, SPSR ; get SPSR
STMFA sp!, {r0} ; and store it on the stack
TST r0, #T_bit
LDRHNE r0, [lr,#-2] ; extract SVC number
BICNE r0, r0, #0xFF00 ; 8 bits for Thumb
LDREQ r0, [lr,#-4]
BICEQ r0, r0, #0xFF000000 ; 24 bits for ARM
; insert checks for SVC numbers here
BL printR8
LDMFA sp!, {r1}
MSR SPSR_csxf, r1
LDMFA sp!, {r0-r3,r12,pc}^ ; restore regs and return
printR8
MOV r9, lr ; preserve LR
CMP r8, #0xA ; is number less than 0xA?
BLT ADD0
ADD r8, r8, #"A"-"0"-0xA ; add offset 'A' to 'F'
ADD0 ADD r8, r8, #"0" ; convert to ASCII
LDR r1,=CharData ; starting address of characters
Loop LDRB r0, [r1],#1 ; load character, increment addr

CMP r0,#0 ; null terminated?
BLNE Transmit ; send character to UART
BNE Loop ; continue if not a ‘0’
LDR r5, =U0START

STRB r8, [r5] ; print r8's ASCII value
; Done B Done ; uncomment to test UART output
MOV pc, r9 ; return
Transmit
; Subroutine Transmit
; This routine puts one byte into the UART
; for transmitting.
; Register used:
; r5 – scratch
110
; r6 - scratch
; inputs: r0- byte to transmit
; outputs: none
STMFA sp!,{r5,r6,lr}
LDR r5,=U0START
wait LDRB r6,[r5,#LSR0] ; get status of buffer
CMP r6,#0x20 ; buffer empty?
BEQ wait ; spin until buffer’s empty
STRB r0,[r5]
LDMFA sp!,{r5,r6,pc}
CharData
DCB "Register r8 = ",0
END
2. Choose a device with general-purpose I/O pins, such as the LPC2103, and write an assembler
routine which sequentially walks a 1 back and forth across the I/O pins. In other words, at any
given time, only a single pin is set to 1 – all others are 0. Set up the Keil tools to display the pins
(you might want to compile and run the Blinky example that comes with the tools to see what the
interface looks like).
AREA IOwalk, CODE, READONLY
IODIR EQU 0xE0028008

IOSET EQU 0xE0028004
IOCLR EQU 0xE002800C
ENTRY
LDR r0, =0x00FF0000

LDR r1, =IODIR
STR r0, [r1] ; configure IO pins as output
LDR r1, =0x00010000 ; load least sig IO pin

LDR r2, =IOSET
LDR r3, =IOCLR
LoopForever
LDR r4, =8 ; loop counter for 8 pins
Loop1 STR r1, [r2] ; turn on LED

BL Delay ; delay
111
STR r0, [r3] ; turn off LEDs
LSL r1, r1, #1 ; shift to next pin

SUBS r4, r4, #1
BNE Loop1
LDR r4, =8
Loop2 LSR r1, r1, #1 ; shift to next pin

STR r1, [r2] ; turn on LED
BL Delay ; delay
STR r0, [r3] ; turn off LEDs
SUBS r4, r4, #1

BNE Loop2
B LoopForever
Delay
LDR r5, =0x00FF0000
DelayLoop
SUBS r5, r5, #1
BNE DelayLoop
MOV pc, lr
END
3. Rewrite the two examples in this chapter using full descending stacks.
See solutions in Chapter 13 for examples of how to do this.
4. What is the address range for on-chip SRAM for the LPC2106 microcontroller?
0x40000000 to 0x4000FFFF
5. Write an assembly program which takes a character entered from a keyboard and echoes it to a
display. To do this, you will need to use two UARTs, one for entering data and one for displaying
it. The routine which accepts characters should not generate an interrupt when data is available,
but merely wait for the character to appear in its buffer. In other words, the UART routine should
spin in a loop until a key is pressed, after which it branches back to the main routine. Note that
112
the UART window must have the focus in order to accept data from a keyboard (this is a
Windows requirement), so be sure to click on the UART window first when testing your code.
AREA UartRxTx, CODE, READONLY
PINSEL0 EQU 0xE002C000 ; controls the function of the pins
U0START EQU 0xE000C000 ; start of UART0 registers

U1START EQU 0xE0010000 ; start of UART1 registers

ENTRY
LDR r5,=PINSEL0 ; base address of register

LDR r6,[r5] ; get contents
BIC r6,r6,#0xF ; clear out lower nibble
LDR r4, =0x50005
ORR r6,r6,r4 ; sets P0.0 to Tx0; P0.1 to Rx0
; and P0.8 to Tx1; P0.9 to Rx1
STR r6, [r5] ; r/modify/w back to register
UART0Config
LDR r5,=U0START
UART1Config
LDR r5,=U1START
Recieve LDR r5,=U1START

waitRx LDRB r6,[r5,#LSR1] ; get status of buffer
TST r6,#0x1 ; buffer empty?
BEQ waitRx ; spin until buffer’s empty
LDRB r0,[r5]
113
Transmit LDR r5,=U0START
waitTx LDRB r6,[r5,#LSR0] ; get status of buffer
CMP r6,#0x20 ; buffer empty?
BEQ waitTx ; spin until buffer’s empty
STRB r0,[r5]
B Recieve ; loop forever
END
6. Using the D/A converter example as a guide, write an assembly routine which will generate a
waveform defined by
f(x) = asin(0.5x) + bsin(x) + c
where a, b and c are constants that allow a full period to be displayed. The output waveform
should appear on AOUT so that you can view it on the logic analyzer in the Keil tools.
; Sine wave generator using the LPC2132 microcontroller

; This program will generate a sine wave using
; the equation f(x) = 256sin(0.5x)+256sin(x)+512 and
; the D/A converter on the controller. The output can be
; viewed using the Logic Analyzer in the MDK tools
PINSEL1 EQU 0xE002C004

DACREG EQU 0xE006C000
SRAMBASE EQU 0x40000000
AREA Reset, CODE
ENTRY
Reset_Handler
main
LDR sp, =SRAMBASE ; initialize stack pointer
LDR r6, =PINSEL1 ; PINSEL1 configures pins
LDR r8, =DACREG ; DAC Register[15:6] is VALUE
LDR r7, [r6] ; read/modify/write
ORR r7, r7, #1:SHL:19 ; set bit 19
BIC r7, r7, #1:SHL:18 ; clear bit 18
STR r7, [r6] ; change P0.25 to Aout
outloop MOV r6, #720 ; start number
; Note that we have to count to 720 now, since the period of

; sin(0.5x) is twice that of sin(x)
114
inloop
RSB r1, r6, #720 ; arg = 720 - loop count
BL sine ; get sin(r1)
MOV r9, r0 ; hold sin(x)
; this isn't the cleanest way to do this, but generate

; another argument and calculate sin(0.5x)
RSB r1, r6, #720
MOV r1, r1, ASR#1 ; argument/2
BL sine ; get sin(r1)
; Now that we have r0 = sin(r1), we need to send

; this to the DAC converter
; First, we take the Q31 value and make it Q15
; and multiply it by 256. We do this for both
; the sin(x) and sin(0.5x). Then we offset the result
; by 512 to show the full sine wave. Aout is
; VALUE/1024 * Vref, so our sine wave should swing
; between 0 and 3.3V on the output.
; this section is for sin(0.5x)

ASR r0, r0, #16 ; convert Q31 to Q15
LSL r0, r0, #8 ; *256 now in Q15 notation
ASR r0, r0, #15 ; keep the integer part only
; this section is for sin(x)
ASR r9, r9, #16 ; convert Q31 to Q15
LSL r9, r9, #8 ; *256 now in Q15 notation
ASR r9, r9, #15 ; keep the integer part only
ADD r0, r9, r0 ; 256*sin(0.5x)+256*sin(x)

ADD r0, r0, #512 ; above+512 to show wave
LSL r0, r0, #6 ; bits 5:0 of DAC are
;
undefined
STRH r0, [r8] ; write to DACREG
SUBS r6, r6, #1 ; count down to 0

BNE inloop
B outloop ; do this forever
; Sine function
; Returns Q31 value for integer arguments from 0 to 360
; Registers used:
; r1 = sin argument (in degrees)
; r4 = starting address in sine table
; r5 = temp
; Note that the function has been slightly modified to account
; for values of x such that 360<=x<=720 so we can see a full
; period of sin(0.5x)
115
sine
STMIA sp!, {r4,r5,r7,lr} ; stack used registers
LDR r5,=270 ; const won't fit into rotation scheme
CMP r1, #360 ; if x>360, then subtract 360 from it
SUBGE r1,r1,#360
CMP r1, #180
BLE retvalue
CMP r1, r5
BLE retvalue
retvalue
RSBGT r0, r0, #0 ; negate the value
LDMDB sp!,{r4,r5,r7,pc} ; restore registers
ALIGN
sin_data
DCD 0x00000000,0x023be164,0x04779630,0x06b2f1d8
DCD 0x1a9cd9c0,0x1ccb3220,0x1ef74c00,0x2120fb80
DCD 0x234815c0,0x256c6f80,0x278dde80,0x29ac3780
DCD 0x2bc750c0,0x2ddf0040,0x2ff31bc0,0x32037a40
DCD 0x43d46500,0x45b6bb80,0x4793a200,0x496af400
DCD 0x5f1f5f00,0x609a5280,0x620dbe80,0x63798500
DCD 0x6a1de700,0x6b598f00,0x6c8cd700,0x6db7a880
DCD 0x6ed9ec00,0x6ff38a00,0x71046d00,0x720c8080
DCD 0x730baf00,0x7401e500,0x74ef0f00,0x75d31a80
DCD 0x76adf600,0x777f9000,0x7847d900,0x7906c080
DCD 0x7fec0a00,0x7ffb0280,0x7fffffff
END
116
You should see an image like the one below:
7. What is the address range for the following devices on the LPC2132 microcontroller?
a. General purpose input/output ports
b. Universal asynchronous receiver/transmitter 0
c. Analog-to-digital converter
d. Digital-to-analog converter
a. 0xE0028000 to 0xE002BFFF
b. 0xE000C000 to 0xE000FFFF
c. 0xE0034000 to 0xE0037FFF
d. 0xE006C000 to 0xE006FFF
8. What is the address range for the following devices on the STR910FM32 microcontroller?
a. General purpose input/output ports
b. Real time clock
c. Universal asynchronous receiver/transmitter
d. Analog-to-digital converter
117
a.
GPIO Port 0: 0x48006000 to 0x48006423

GPIO Port 1: 0x48007000 to 0x48007423
GPIO Port 2: 0x48008000 to 0x48008423
GPIO Port 3: 0x48009000 to 0x48009423
GPIO Port 4: 0x4800A000 to 0x4800A423
GPIO Port 5: 0x4800B000 to 0x4800B423
GPIO Port 6: 0x4800C000 to 0x4800C423
GPIO Port 7: 0x4800D000 to 0x4800D423
GPIO Port 8: 0x4800E000 to 0x4800E423
GPIO Port 9: 0x4800F000 to 0x4800F423
b. 0x4C001000 to 0x4C001017
c.
UART0: 0x4C004000 to 0x4C00404B

UART1: 0x4C005000 to 0x4C00504B
UART2: 0x4C006000 to 0x4C00604B
d. 0x4C00A000 to 0x4C00A037
9. The UART example given in this chapter uses registers r5 and r6 to configure the UART and,
therefore, must write them to the stack before corrupting them. Rewrite the UART example so
that no registers are stacked and the routine is still AAPCS compliant.
Just remove stack instructions and replace all instances or r5 with r2 and r6 with r3.
10. Modify the routine in Section 16.4.5 so that the LEDs flash alternately between red and blue
for about 1 sec each.
You only need to have the value written to Port F be either 2 or 8; therefore, instead of shifting by
1 bit, shift by two bits. Change the line
LSLLT r6, r6, #1
to
LSLLT r6, r6, #2
and the value will alternate between 2 and 8.
118
Chapter 17
1. On the ARM7TDMI, which bit in the CPSR indicates whether you are in ARM state or Thumb
state?
Bit 5
2. Give the mnemonic(s) for a Thumb instruction(s) that is equivalent to the ARM
instruction
SUB r0, r3, r2, LSL #2
LSL r2, #2
SUB r3, r2
MOV r0, r3
3. Why might you want to switch to Thumb state in an exception handler?
The bulk of the handler can be written in THUMB for the same reason you would use THUMB
for anything else: code density and better performance from narrow memory.
4. Can you talk to a floating-point coprocessor in Thumb state?
Not on the ARM7TDMI. On the Cortex-M4, since you are only executing Thumb-2 instructions,
you can execute floating-point code once the floating-point unit is enabled.
5. Using Figure 17.2 as a guide, convert Program 3 from Chapter 3 into 16-bit Thumb assembly.

ENTRY
ARM
ADR r0, intoThumb + 1

BX r0
CODE16
intoThumb
LDR r0, =0xF631024C ; load some data
LDR r1, =0x17539ABD ; load some data
EOR r0, r1 ; r0 = r0 XOR r1
119
MOV r2, r0 ; r2 = temp r0
MOV r1, r2 ; r1 = r2 XOR r1
stop B stop
END
6. Describe why veneers might be needed in a program.
A veneer is needed when a branch involves a destination beyond the branching range of the
current state.
7. Convert Example 13.4 into Thumb code. Do not convert the entire subroutine – just the four
lines of code to perform saturation arithmetic.
MOV r6, #0
NEG r6, r6
LSR r6, r6, #1 ; MOV r6, #0x7FFFFFFF
LSL r0, r1
MOV r2, r0 ; MOV r2, r0, LSL r1
ASR r2, r1
CMP r0, r2
BEQ Return ; TEQ r0, r2, ASR r1
ASR r0, #31

EOR r6, r0
MOV r2, r6 ; EORNE r2, r6, r0, ASR #31
Return ; return instructions here
8. In which state does the ARM7TDMI processor come out of reset?
ARM
9. How do you switch to Thumb state if your processor supports both ARM and Thumb
instructions?
BX and BLX instructions
10. How do you switch to ARM state on the Cortex-M4?
120
This is a trick question, since you cannot switch to ARM state!
121
Chapter 18
1. Example 18.1 gives the program necessary to set the Q flag. Run the code using the Keil tools,
with the target being the STR910FM32 from ST Microelectronics. Which registers does the
compiler use, and what is the value in those registers just before the QDADD instruction is
executed?
r0 = 0x60000000, r1 = 0x0007000, r2 = 0x00007ABC, r3 = 0x60000000, sp = 0x04000458, lr
(r14) = 0x35B24000 , pc = 0x0000424C
2. Example 18.2 demonstrates the embedded assembler. Compile the code and run it. What is the
value in the program counter just before the BX instruction executes in the function
my_strcopy? In order to compile this example, you will need to target the LPC2101 from NXP
and include files from the “Inline” example found in the Keil “Examples” directory. Include the
source files serial.c and retarget.c in your own project. Also be sure to include the startup file
when asked. When you run the code, you can use the UART #2 window to see the output from
the printf statements.
0x000041D8
3. Write a short C program which declares a variable called TMPTR. Using example 18.2 as a
guide, print out the variable in degrees Celsius, with some initial temperature defined in the main
program in degrees Fahrenheit. Write the temperature conversion program as an inline assembly
function. You’ll want to use fractional arithmetic to avoid division.
#include <stdio.h>
extern void init_serial (void);
int main(void)
{
int TMPTR = 96;
122
int a, b, c;
float result;
init_serial();
__asm
{
MOV a, #0x471C // load 5/9 in Q15
SUB b, TMPTR, #32 // b = TMPTR - 32
MUL c, a, b // c = 5/9 * (TMPTR - 32)
}
result = c/32768.0; // convert Q15 to decimal

printf("%i",TMPTR);
printf(" degrees F equals ");
printf("%f",result);
printf(" degrees C.");
return 0;
}
4. Using the saturation algorithm discussed in Chapter 13, which performs a logical shift left by m
bits and saturates when necessary, write a C routine which calls it as an embedded assembly
function. The function should have two parameters: the value to be shifted and the shift count. It
should return the shifted value. The small C routine should create a variable with the initial value.
__asm void sat(int val, int shiftVal)

{
// Subroutine sat
// Performs r0 = saturate32 (r0 << r1)
// Registers used:
// r0 = operand to be shifted and result returned
// r1 = shift amount m
// r6 = scratch register
MOV r6, #0x7FFFFFFF
MOV r2, r0, LSL r1
TEQ r0, r2, ASR r1 // if (r0!=(r2>>m)
EORNE r0, r6, r0, ASR #31 // r2 = 0x7FFFFFFF^sign(r0)
}
void callsat()
{
// try out a positive case
// (this should saturate)
int val = 0x40000000;
int shiftVal = 2;
123
sat(val,shiftVal);
}
5. Modify Example 18.1 so that the function Clear_Q_Flag returns 1 when the function clears
a set Q flag; otherwise, if the bit was clear, it returns 0.
__inline int Clear_Q_flag (void)

{
int temp, temp2;
__asm
{
MRS temp, CPSR
BIC temp2, temp, #Q_Flag
ANDS temp, temp, #Q_Flag
MOVNE temp, #1
MSR CPSR_f, temp2
}
return temp;
}
6. Run Example 18.4 by creating two separate source files in the Keil tools. Once you have saved
these files, you can add them to a new project. The Keil tools will compile the C source file and
assemble the assembly language file automatically. When you run the code, you can see the
output on UART #2. Refer to Exercise 2 above for more details.
No written answer required.
7. Run Example 18.5 by creating three separate source files in the Keil tools. Recall that the FPU
must be initialized, and this should be one of the three files. Notice the value of OutVal in the
variables window and confirm the converted values match the expected inputs from the sensor
(see the comments in the array declaration).
8. Expand Example 18.5 by converting the OutVal floating-point value back to S16 8x8 format.
Add this routine to the file containing the CvtShorts8x8ToFloat routine and call it
CvtFloatToShort8x8. Verify that the result of the conversion back to S16 8x8 format matches the
124
original value. Experiment with some other formats, such as 9.7 or 7.9, and see what values are
produced.
125

Solutions Manual For: Arm Assembly Language

Caricato da

Informazioni sul documento

Titolo originale

Copyright

Formati disponibili

Condividi questo documento

Condividi o incorpora il documento

Opzioni di condivisione

Hai trovato utile questo documento?

Questo contenuto è inappropriato?

Copyright:

Formati disponibili

Solutions Manual For: Arm Assembly Language

Caricato da

Copyright:

Formati disponibili

solutions MANUAL FOR

William Hohl and

Boca Raton London New York

CRC Press is an imprint of the

No claim to original U.S. Government works

Printed on acid-free paper

International Standard Book Number-13: 978-1-4822-2992-9 (Ancillary)

Fundamentals and Techniques

Issue Date By Change

William Hohl and Christopher Hinds

DMA controller, 4 UARTs, 2 CAN channels, a 10-bit Digital-to-Analog Converter, a motor

control PWM, timers, and a host of other peripherals.

3. Convert the following binary values into hexadecimal:

Signed Magnitude: 10001110

5. Convert the following hexadecimal values to base 10:

6. Convert the following base 10 numbers to base 4:

convert the following values into two’s complement representations:

It can be data in a number of formats:

Unsigned: 3,785,367,681 in decimal

Signed two’s complement: -505,599,615 in decimal

It could be an address, or the instruction

It could also be the floating-point number -3.69227 x 1020.

base 10? What if it were a word-length value (i.e., 32 bits long)?

partition into its decimal equivalent.

13. Convert the following decimal numbers into hexadecimal:

complement notations. Write the answer using 8 hex digits.

Two’s complement: 0xFFFFFF10

a. 1100 b. 1010 c. 1000 d. 11100

the tools given in the References to check your answer.

many modes does the Cortex-M4 have?

ARM7TDMI: 7 modes, 2 states

Cortex-M4: 2 modes, Handler mode and Thread mode

instruction entered the execute stage of its pipeline?

It wouldn’t recognize the instruction and would go into Undefined mode.

r14: Link Register

r13: Stack Pointer

r15: Program Counter

17 for User mode, 18 for remaining modes

M4 holds the status flags?

ARM7TDMI: Bits [31:28]

Cortex-M4: APSR bits [31:28]

On the ARM7TDMI: Undefined: 0x04 and Reset: 0x0

7. What is the purpose of FIQ mode?

Handling of highest priority interrupt(s)

supporting virtual memory systems?

9. How do you enable interrupts on the ARM7TDMI?

Clear bits 6 and 7 of the CPSR

3: Fetch, Decode, and Execute

Note that it could be 0x8002 if the machine is in Thumb state.

12. What is the function of the Saved Program Status Register?

upon return of handling the exception.

SUB r0, r2, r3

system using a Cortex-M4?

address, and mostly likely die a horrible death.

ARM7TDMI processor? How does it differ from the Cortex-M4?

ARM7TDMI. What value is typically placed at address 0x0 on a Cortex-M4?

LDR pc, [pc, #offset]

where “#offset” is an offset to the Reset handler.

ADD r2, r1, r1, LSL #2

B instruction)? What is the ADD instruction actually doing?

r2 = 170 (0xAA). The ADD instruction is multiplying r1 by 5.

0x1C8CFC00 = 479001600 = 12!

second XOR instruction, what is the value in register r1?