History
ARM was developed at Acorn Computers Limited of Cambridge, England, between 1983 and 1985
The RISC concept was introduced in 1980 at Stanford and Berkeley
ARM Architecture
Based upon RISC Architecture with enhancements to meet requirements of embedded applications
A large uniform register file
Load-store architecture, where data processing operations operate on register contents only
Uniform and fixed-length instructions
32-bit processor
Instructions are 32 bits long
Good speed/power consumption ratio
High code density
Architecture Versions
Version 5T
Superset of 4T adding new instructions
Version 5TE
Adds signal processing instruction set extensions
Examples:
ARM6: v3
ARM7: v3
ARM7TDMI: v4T
StrongARM: v4
ARM9E-S: v5TE
Instructions typically use two source registers and a single result or destination register
A barrel shifter on the data path can preprocess data before it enters the ALU
Increment/decrement logic can update register contents for sequential access independent of the ALU
Registers
General purpose registers hold either data or an address
All registers are 32 bits wide
In user mode 16 data registers (r0 to r15) and the CPSR are visible; the SPSR exists only in the privileged exception modes
Three registers, r13, r14 and r15, perform special functions
r13: stack pointer
r14: link register (where the return address is put whenever a subroutine is called)
r15: program counter
Registers (2)
Depending upon context, registers r13 and r14 can also be used as general purpose registers (GPRs)
Any instruction which uses r0 can equally be used with any other GPR (r1 to r13)
In addition, there are two status registers
CPSR: current program status register
SPSR: saved program status register
Register: r15
When the processor is executing in ARM state
All instructions are 32 bits wide
All instructions are word aligned
The PC value is stored in bits [31:2], with bits [1:0] undefined
Status Registers
CPSR: monitors and controls internal operations
Processor Modes
Processor modes determine
which registers are active, and
the access rights to the CPSR itself
Privileged Modes
Abort: entered when there is a failed attempt to access memory
Fast interrupt request (FIQ) and interrupt request (IRQ): correspond to the two interrupt levels available on ARM
Supervisor: the state after reset, and generally the mode in which the OS kernel executes
Banked Registers
The register file contains 37 registers in all
20 registers are hidden from a program at different times
These registers are called banked registers
Banked registers are available only when the processor is in a particular mode
Processor modes (other than system mode) have a set of associated banked registers that are a subset of the 16 registers
Each banked register maps one-to-one onto a user mode register
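The register counts above can be checked with a small tally. This is a sketch in Python rather than ARM code; the per-mode breakdown is the usual one for the classic ARM register file (FIQ banks r8 to r14, the other exception modes bank r13 and r14, and each exception mode has its own SPSR), and the names BANKED and USER_VISIBLE are made up for illustration.

```python
# Banked registers per exception mode (classic ARM register file).
BANKED = {
    "fiq":   ["r8", "r9", "r10", "r11", "r12", "r13", "r14", "spsr"],
    "irq":   ["r13", "r14", "spsr"],
    "svc":   ["r13", "r14", "spsr"],
    "abort": ["r13", "r14", "spsr"],
    "undef": ["r13", "r14", "spsr"],
}

# Registers visible in user mode: r0 to r15 plus the CPSR.
USER_VISIBLE = 16 + 1

hidden = sum(len(regs) for regs in BANKED.values())
total = USER_VISIBLE + hidden
print(hidden, total)
```

System mode does not appear in the table because it shares the user mode registers.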
Register banking
[Figure: user mode registers are replaced by banked registers in FIQ, IRQ, supervisor, undefined and abort modes; each of these modes also has an SPSR]
Register Organization
Mode Changing
The mode changes either by writing directly to the CPSR or by hardware when the processor responds to an exception or interrupt
To return to user mode a special return instruction is used that instructs the core to restore the original CPSR and banked registers
[Figure: byte-wide memory organization. Addresses decrease from top to bottom and from left to right; words sit at word-aligned byte addresses (word8, word16), half-words at even byte addresses (half-word4, half-word12, half-word14), and bytes at any byte address (byte0 to byte3, byte6)]
Instructions
Instructions process data held in registers and access memory with load and store instructions
Classes of instructions:
Data processing
Branch instructions
Load-store instructions
Software interrupt instruction
Program status register instructions
Data Processing
Manipulate data within registers
Move instructions
Arithmetic instructions
Multiply instructions
Move instruction
MOV Rd, N
Rd: destination register
N: can be an immediate value or a source register
Example: MOV r7, r5
MVN Rd, N
Moves the bitwise NOT of the 32-bit source value into Rd
The barrel shifter facilitates fast multiplication and division by powers of two and increases code density
Example: MOV r7, r5, LSL #2
Multiplies the contents of r5 by 4 and puts the result in r7
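As a rough illustration of the shifted move, the following Python sketch models a logical shift left on a 32-bit register. The helper name lsl and the starting value of r5 are assumptions chosen for the example, not part of the original slides.

```python
MASK32 = 0xFFFFFFFF

def lsl(value, shift):
    """Logical shift left, truncated to 32 bits like an ARM register."""
    return (value << shift) & MASK32

r5 = 5
r7 = lsl(r5, 2)  # models: MOV r7, r5, LSL #2
print(r7)
```

Bits shifted past bit 31 are lost, so the multiply-by-4 reading only holds while the result still fits in 32 bits.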
Immediate value
Arithmetic Instructions
Implement 32-bit addition and subtraction
3-operand form
Examples:
SUB r0, r1, r2
Subtracts the value stored in r2 from that of r1 and stores the result in r0
Multiply Instructions
Multiply contents of a pair of registers
Long multiply generates 64 bit result
Examples:
MUL r0, r1,r2
The contents of r1 and r2 are multiplied and the low 32 bits of the product are put in r0
UMULL r0,r1,r2,r3
Unsigned multiply of r2 and r3, with the 64-bit result stored in r0 (low word) and r1 (high word)
Number of cycles taken for execution of multiply instruction depends upon processor implementation
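The difference between the 32-bit and the long multiply can be sketched in Python as follows. The function names mul and umull and the operand values are illustrative assumptions; the split into low and high words follows the UMULL RdLo, RdHi, Rm, Rs operand order.

```python
MASK32 = 0xFFFFFFFF

def mul(rm, rs):
    """Model of MUL: only the low 32 bits of the product are kept."""
    return (rm * rs) & MASK32

def umull(rm, rs):
    """Model of UMULL: returns (RdLo, RdHi) of the unsigned 64-bit product."""
    product = (rm & MASK32) * (rs & MASK32)
    return product & MASK32, product >> 32

r2, r3 = 0xF0000002, 0x00000002
r0, r1 = umull(r2, r3)  # models: UMULL r0, r1, r2, r3
print(hex(r0), hex(r1))
```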
Logical Instructions
Bitwise logical operations on the two source registers
AND, ORR (OR), EOR (exclusive OR), BIC (bit clear)
Example: BIC r0, r1, r2
r2 contains a binary pattern in which every binary 1 clears the corresponding bit of r1; the result is placed in r0
Useful for manipulating status flags and interrupt masks
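A minimal Python model of the bit clear operation is shown below; the function name bic and the sample bit patterns are assumptions for illustration.

```python
MASK32 = 0xFFFFFFFF

def bic(rn, rm):
    """Model of BIC Rd, Rn, Rm: clear in Rn every bit that is set in Rm."""
    return rn & ~rm & MASK32

r1 = 0b1111
r2 = 0b0101
r0 = bic(r1, r2)  # models: BIC r0, r1, r2
print(bin(r0))
```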
Compare Instructions
Enable comparison of 32-bit values
Update the CPSR flags but do not affect other registers
Examples
CMP r0,r9
Flags set as a result of r0 - r9
TEQ r0,r9
Flags set as a result of r0 EOR r9
TST r0,r9
Flags set as a result of r0 AND r9
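How CMP derives the condition flags can be sketched in Python as below. The function name cmp_flags is hypothetical, and the operands are assumed to be unsigned 32-bit register values; the flag definitions are the standard ones for an ARM subtraction (C set means no borrow).

```python
MASK32 = 0xFFFFFFFF

def cmp_flags(a, b):
    """Return (N, Z, C, V) as set by CMP a, b, i.e. by computing a - b."""
    result = (a - b) & MASK32
    n = result >> 31                    # negative: bit 31 of the result
    z = int(result == 0)                # zero: result is all zeros
    c = int(a >= b)                     # carry: set when no borrow occurs
    v = ((a ^ b) & (a ^ result)) >> 31  # signed overflow of the subtraction
    return n, z, c, v

print(cmp_flags(4, 4))
```

TST and TEQ set N and Z in the same way from the AND and EOR results, without performing a subtraction.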
Summary
We have examined the basics of the ARM architecture
Understood processor modes
We have looked at the core data path
Discussed basic data processing operations
Load-Store Instructions
Transfers data between memory and processor registers
Single register transfer
Data types supported are signed and unsigned words (32 bits), half-words, bytes
Multiple-register transfer
Transfer multiple registers between memory and the processor in a single instruction
Swap
Swaps content of a memory location with the contents of a register
Scaled
Address is calculated using the base address register and a barrel shift operation
Example
Pre-indexing with write back: LDR r0,[r1,#4]!
Before instruction execution
r0 = 0x00000000
r1 = 0x00009000
Mem32[0x00009000] = 0x01010101
Mem32[0x00009004] = 0x02020202
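The effect of the pre-indexed load with write back on this state can be traced with a small Python model. The dictionaries regs and mem32 and the function name are illustrative assumptions; the initial values are the ones listed above.

```python
regs = {"r0": 0x00000000, "r1": 0x00009000}
mem32 = {0x00009000: 0x01010101, 0x00009004: 0x02020202}

def ldr_preindex_writeback(regs, mem32, rd, rn, offset):
    """Model of LDR rd, [rn, #offset]!"""
    address = (regs[rn] + offset) & 0xFFFFFFFF  # offset applied before the access
    regs[rd] = mem32[address]                   # load from the updated address
    regs[rn] = address                          # '!' writes the address back to rn

ldr_preindex_writeback(regs, mem32, "r0", "r1", 4)
print(hex(regs["r0"]), hex(regs["r1"]))
```

After execution r0 holds 0x02020202 and r1 has been updated to 0x00009004.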
Addressing Modes
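LDM{IA|IB|DA|DB} and STM{IA|IB|DA|DB}; example: LDMIA Rn!, {r1-r3}
N is the number of registers transferred
Addressing mode   Description        Start address   End address    Rn!
IA                Increment after    Rn              Rn + 4*N - 4   Rn + 4*N
IB                Increment before   Rn + 4          Rn + 4*N       Rn + 4*N
DA                Decrement after    Rn - 4*N + 4    Rn             Rn - 4*N
DB                Decrement before   Rn - 4*N        Rn - 4         Rn - 4*N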
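The address arithmetic for the four block-transfer modes can be written out as a short Python sketch. The function name block_transfer and the sample base address are assumptions; the formulas are the standard ones for the IA, IB, DA and DB modes with N registers.

```python
def block_transfer(mode, rn, n):
    """Return (start address, end address, Rn after write back) for n registers."""
    size = 4 * n
    if mode == "IA":  # increment after
        return rn, rn + size - 4, rn + size
    if mode == "IB":  # increment before
        return rn + 4, rn + size, rn + size
    if mode == "DA":  # decrement after
        return rn - size + 4, rn, rn - size
    if mode == "DB":  # decrement before
        return rn - size, rn - 4, rn - size
    raise ValueError("unknown addressing mode: " + mode)

# LDMIA Rn!, {r1-r3} with Rn = 0x80010 transfers three words.
print([hex(x) for x in block_transfer("IA", 0x80010, 3)])
```

In every mode the lowest-numbered register is transferred to or from the lowest address in the range.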
Stack Processing
A stack is implemented as a linear data structure which grows up (ascending) or down (descending)
The stack pointer holds the address of the current top of the stack
Empty Descending
LDMED: translates to LDMIB (POP)
STMED: translates to STMDA (PUSH)
SP points to the first unused location
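An empty descending stack can be modelled in Python as below. The class and method names are hypothetical; the sketch assumes the standard ARM mapping in which an empty descending push behaves as a decrement-after store (STMDA) and the matching pop as an increment-before load (LDMIB).

```python
class EmptyDescendingStack:
    """Word-addressed model: sp always points at the first unused location."""

    def __init__(self, sp):
        self.sp = sp
        self.mem = {}

    def push(self, values):
        # Decrement after: the block ends at sp, lowest register at lowest address.
        start = self.sp - 4 * len(values) + 4
        for i, value in enumerate(values):
            self.mem[start + 4 * i] = value
        self.sp -= 4 * len(values)

    def pop(self, count):
        # Increment before: the block starts one word above sp.
        start = self.sp + 4
        values = [self.mem[start + 4 * i] for i in range(count)]
        self.sp += 4 * count
        return values

stack = EmptyDescendingStack(0x80018)
stack.push([1, 2, 3])
print(hex(stack.sp))
```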
SWAP Instruction
Special case of a load-store instruction
Swap instructions:
SWP: swap a word between memory and a register
SWPB: swap a byte between memory and a register
Branch Instruction
Branch instruction : B label
Example: B forward
The address of the label is stored in the instruction as a signed pc-relative offset
r9 points to source of data, r10 points to start of destination data, r11 points to end of the source
Conditional Execution
An unusual feature of the ARM instruction set is that conditional execution applies not only to branches but to all ARM instructions
Example: ADDEQ r0,r1,r2
The instruction is executed only when the zero flag is set to 1
Advantages
Reduces the number of branches
Reduces the number of pipeline flushes
Improves performance of the code
Increases code density
Whenever the conditional sequence is three instructions or fewer, it is smaller and faster to exploit conditional execution than to use a branch
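For nested subroutines, push r14 and any work registers that need to be saved onto a stack in memory
Example
    BL    sub1                     ; call sub1, return address placed in r14
    STMFD r13!,{r0-r2,r14}         ; on entry to sub1: save work registers and link register
    BL    sub2                     ; nested call overwrites r14
    ..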
SWI
SWI is typically executed in user mode
The instruction forces the processor mode to supervisor (SVC); this allows an OS routine to be executed in privileged mode
Each SWI has an associated SWI number which is used to represent a particular function call or feature
Parameters are passed through registers; the return value is also passed using registers
Example
PRE:  cpsr = nzcvqift_USER
      pc = 0x00008000
      lr = 0x003fffff (lr = r14)
      r0 = 0x12
0x00008000  SWI 0x123456
POST: cpsr = nzcvqIft_SVC
      spsr = nzcvqift_USER
      pc = 0x00000008
      lr = 0x00008004 (lr = r14_SVC)
      r0 = 0x12
Example
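Enabling IRQ interrupts
PRE: cpsr = nzcvqIFt_SVC
    MRS r1,CPSR           ; copy the CPSR into r1
    BIC r1,r1,#0x80       ; clear bit 7, the IRQ disable (I) bit
    MSR cpsr, r1          ; write the result back to the CPSR
POST: cpsr = nzcvqiFt_SVC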
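The read-modify-write sequence on the CPSR can be followed step by step in Python. The constant names are assumptions for illustration; the bit positions (I at bit 7, F at bit 6) and the supervisor mode encoding 0x13 are the standard CPSR layout.

```python
I_BIT = 0x80     # CPSR bit 7: IRQ disabled when set
F_BIT = 0x40     # CPSR bit 6: FIQ disabled when set
MODE_SVC = 0x13  # mode field encoding for supervisor mode

cpsr = I_BIT | F_BIT | MODE_SVC  # PRE: nzcvqIFt_SVC

r1 = cpsr                # MRS r1, CPSR
r1 = r1 & ~0x80          # BIC r1, r1, #0x80
cpsr = r1 & 0xFFFFFFFF   # MSR cpsr, r1

print(hex(cpsr))
```

Only the I bit changes; FIQ stays masked and the processor remains in supervisor mode.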
Coprocessor Instructions
Used to extend the instruction set
Used by cores with a coprocessor
Coprocessor-specific operations
Thumb
Thumb encodes a subset of the 32-bit instruction set into a 16-bit subspace
Thumb has higher performance than ARM on a processor with a 16-bit data bus
Thumb has higher code density
For memory-constrained embedded systems
Code density
ARM divide
        MOV   r3,#0
loop    SUBS  r0,r0,r1
        ADDGE r3,r3,#1
        BGE   loop
        ADD   r2,r0,r1
5 x 4 = 20 bytes
Thumb divide
        MOV   r3,#0
loop    ADD   r3,#1
        SUB   r0,r1
        BGE   loop
        SUB   r3,#1
        ADD   r2,r0,r1
6 x 2 = 12 bytes
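The ARM version of the divide loop can be traced with the following Python sketch. The function name arm_divide is hypothetical, and the model assumes a positive divisor and values small enough that the signed 32-bit subtraction never overflows, so the GE condition reduces to "result is not negative".

```python
def arm_divide(r0, r1):
    """Divide r0 by r1 by repeated subtraction; returns (quotient r3, remainder r2)."""
    r3 = 0                 # MOV   r3, #0
    while True:
        r0 -= r1           # SUBS  r0, r0, r1
        if r0 >= 0:        # GE holds while the result is not negative
            r3 += 1        # ADDGE r3, r3, #1
            continue       # BGE   loop
        break
    r2 = r0 + r1           # ADD   r2, r0, r1 restores the remainder
    return r3, r2

print(arm_divide(17, 5))
```

The Thumb version reaches the same result with one extra instruction, because only branches can be conditional there: it increments r3 unconditionally and subtracts 1 after the loop.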
Thumb instructions
Only low registers r0 to r7 fully accessible
Higher registers accessible with MOV, ADD, CMP instructions
Only branch instructions can be conditionally executed
Barrel shift operations are separate instructions
ARM-Thumb Interworking
To call a Thumb routine from an ARM routine the core has to change state
Changing T bit in CPSR
ARMv5E Extensions
Extensions to facilitate signal processing operations
Supports
Signed multiply accumulate instructions
Saturation arithmetic
Greater flexibility and efficiency when manipulating 16-bit values for applications such as 16-bit digital audio processing
Saturation Arithmetic
Normal ARM arithmetic instructions wrap around when there is an overflow of an integer value
Using ARMv5E instructions you can saturate the result
Once the highest number is exceeded the result remains at the maximum value
Likewise, on underflow the result remains at the minimum value
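The contrast between wrapping and saturating addition can be sketched in Python for signed 32-bit values. The function names wrap_add and qadd are illustrative (qadd is named after the ARMv5E QADD instruction it imitates), and both take ordinary Python integers in the signed 32-bit range.

```python
INT32_MAX = 0x7FFFFFFF
INT32_MIN = -0x80000000

def wrap_add(a, b):
    """Ordinary 32-bit add: the result wraps around on overflow."""
    s = (a + b) & 0xFFFFFFFF
    return s - 0x100000000 if s & 0x80000000 else s

def qadd(a, b):
    """Saturating add: the result is clamped to the signed 32-bit range."""
    return max(INT32_MIN, min(INT32_MAX, a + b))

print(wrap_add(INT32_MAX, 1), qadd(INT32_MAX, 1))
```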
Summary
We have studied the instruction set of ARM processors
We have examined the Thumb mode of operation
We shall look into interrupt processing and other features of the ARM architecture next
ARM Exceptions: Review
Exception                        Mode
Fast interrupt request           FIQ
Interrupt request                IRQ
SWI and Reset                    SVC
Pre-fetch Abort and Data Abort   Abort
Undefined instruction            Undefined
Vector Table
The vector table is a table of addresses to which the ARM core branches when an exception is raised
There is a fixed offset for each type of exception
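The fixed offsets can be written down as a lookup table. This Python sketch uses the standard ARM vector offsets; the names VECTOR_OFFSETS and vector_address are assumptions, and the 0xFFFF0000 base applies only to cores that support high vectors.

```python
# Standard ARM exception vector offsets from the vector table base.
VECTOR_OFFSETS = {
    "reset": 0x00,
    "undefined instruction": 0x04,
    "software interrupt": 0x08,
    "prefetch abort": 0x0C,
    "data abort": 0x10,
    "reserved": 0x14,
    "irq": 0x18,
    "fiq": 0x1C,
}

def vector_address(exception, base=0x00000000):
    """Address the core branches to when the given exception is taken."""
    return base + VECTOR_OFFSETS[exception]

print(hex(vector_address("irq")))
```

Each entry holds a single instruction, typically a branch to the handler. FIQ is the last entry, so its handler can start directly at 0x1C without a branch.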
Exception Priorities
Exception               Priority   I bit   F bit
Reset                   1          1       1
Data abort              2          1       -
FIQ                     3          1       1
IRQ                     4          1       -
Pre-fetch abort         5          1       -
SWI                     6          1       -
Undefined instruction   6          1       -
Exception Handlers
Reset handler
Initializes the system, setting up stack pointers, memory and external interrupt sources before enabling IRQ or FIQ
Code should be designed to avoid triggering further exceptions
Data Abort
Occurs when the memory controller indicates that an invalid memory address has been accessed
An FIQ exception can be raised within the data abort handler
IRQ
Occurs when an external device asserts the IRQ input signal
The IRQ handler will be entered if neither an FIQ exception nor a Data abort exception occurs
On entry, IRQ exceptions are disabled and remain disabled for the duration of the handler unless the handler re-enables them
Undefined instruction
Occurs when an instruction is not in the ARM or Thumb instruction set
SWI and undefined instruction have the same level of priority because they cannot occur together
Interrupt Assignment
An interrupt controller connects multiple external interrupts to either FIQ or IRQ
IRQs are normally assigned to general purpose interrupts
Example: periodic timer interrupt to force a context switch
FIQ is reserved for an interrupt source which requires fast response time
Interrupt Latency
Hardware and software latency
Software methods to reduce latency
Nested handler, which allows further interrupts to occur while servicing an existing interrupt by re-enabling interrupts inside the service routine
Program the interrupt controller to ignore interrupts of the same or lower priority
Higher priority interrupts will have lower average latency
Stack Organization
A stack has to be set up for each processor mode
To be done every time the processor is reset
Change to each mode by writing the corresponding bit pattern to the CPSR, then initialise sp
Design decisions
Location and mode (a descending stack is common)
Size
A nested interrupt handler requires a larger stack
DMA Support
Large bandwidth data transfer
Diagrams from:
ARM System-on-Chip Architecture, Steve Furber, Addison-Wesley, 2000
ARM Architecture Reference Manual, David Seal, Addison-Wesley, 2001
3-stage Pipeline
[Figure: three successive instructions each pass through fetch, decode and execute, overlapped in time]
Pipeline Operation
An instruction does not always complete every cycle
Example: LDMIA r0, {r2,r3} (multiple load) occupies the execute stage for more than one cycle
ARM7TDMI organization
[Figure: ARM7TDMI organization. The processor core is surrounded by scan chains 0, 1 and 2, with the EmbeddedICE logic (extern0, extern1), a bus splitter (Din[31:0], Dout[31:0], D[31:0]) and the JTAG interface (TCK, TMS, TRST, TDI, TDO); A[31:0] and the control signals opc, r/w, mreq, trans and mas[1:0] leave the core]
[Figure: ARM7TDMI core interface signals, including the memory interface, bus control, debug and JTAG control groups]
ARM9TDMI
Harvard Architecture
Increases available memory bandwidth
Instruction memory interface
Data memory interface
[Figure: ARM9TDMI pipeline organization. Fetch: next pc, +4, I-cache. Decode: register read, with pc + 8 presented as r15. Execute: shift, ALU, multiplier, LDM/STM and post-index address logic. Memory: load/store address to the D-cache, byte rotate and sign extension. Write-back: register write to the register file]
Pipeline stages compared
ARM7TDMI:  Fetch (instruction fetch) | Decode (Thumb decompress, ARM decode) | Execute (reg read, shift/ALU, reg write)
ARM9TDMI:  Fetch (instruction fetch) | Decode (decode, reg read) | Execute (shift/ALU) | Memory (data memory access) | Write (reg write)
Enhancements in 9E
Count leading zeros instruction
CLZ for faster normalisation and division
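What CLZ computes, and why it helps normalisation, can be illustrated in Python. The function names clz and normalise are assumptions; the model follows the ARM definition in which CLZ of zero returns 32.

```python
def clz(x):
    """Count leading zeros of a 32-bit value (returns 32 for zero)."""
    x &= 0xFFFFFFFF
    return 32 - x.bit_length()

def normalise(x):
    """Shift x left until bit 31 is set; returns (normalised value, shift used)."""
    shift = clz(x)
    return (x << shift) & 0xFFFFFFFF, shift

print(clz(1), normalise(1))
```

Without CLZ the shift amount has to be found with a loop or a chain of compares, which is why the instruction speeds up normalisation and division routines.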
ARM9E Datapath
[Figure: ARM9E datapath. Instruction decode and datapath control logic drives the register bank (r0 to r14, PC, PSR), the multiplier (MUL), the barrel shifter, the CLZ logic and the accumulate and saturation units (ACC, SAT); byte rotate and sign extension is applied to RDATA[], byte/half replicate to WDATA[], and incrementers (IINC, DINC) generate InsAddr and DA[]]
ARM920T organization
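[Figure: ARM920T organization. An ARM9TDMI core with EmbeddedICE and JTAG is connected to separate instruction and data caches and instruction and data MMUs (virtual IA, virtual DA), with CP15, a write buffer and an AMBA interface on the physical address side]
[Figure: AMBA-based system. An ARM core with its bus interface shares the bus with an arbiter, decoder, on-chip RAM, reset and remap/pause logic, interrupt controller, peripherals, I/O and an 8-bit ROM]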
ARM v5TEJ
J: supports implementation of the Java virtual machine
Offers hardware and software acceleration for optimized byte code execution
ARM v6 Architecture
SIMD (single instruction multiple data) instructions for exploiting data parallelism
High code density and low power
Achieved by slicing up the existing 32-bit datapath into four 8-bit or two 16-bit slices
Example: QADD8<cond> Rd, Rn, Rm
Signed saturating 8-bit SIMD add
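The lane-wise behaviour of a signed saturating 8-bit SIMD add can be modelled in Python as follows. The helper names are hypothetical and the operand values are chosen only to exercise both saturation limits; each 32-bit register is treated as four independent signed bytes.

```python
def to_signed8(byte):
    """Interpret an 8-bit value as a signed number."""
    return byte - 256 if byte & 0x80 else byte

def sat8(value):
    """Clamp to the signed 8-bit range."""
    return max(-128, min(127, value))

def qadd8(rn, rm):
    """Model of QADD8: four independent signed saturating byte additions."""
    result = 0
    for lane in range(4):
        shift = 8 * lane
        a = to_signed8((rn >> shift) & 0xFF)
        b = to_signed8((rm >> shift) & 0xFF)
        result |= (sat8(a + b) & 0xFF) << shift
    return result

print(hex(qadd8(0x7F01FF80, 0x010201FF)))
```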
Other features
Sum of absolute differences instructions
Example: USAD8<cond> Rd,Rm,Rs
Sum of the absolute differences between corresponding unsigned 8-bit values
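A sum of absolute differences over the four byte lanes of two registers can be sketched in Python like this. The function name usad8 follows the ARMv6 USAD8 mnemonic, the bytes are treated as unsigned, and the operand values are arbitrary examples.

```python
def usad8(rm, rs):
    """Sum of absolute differences of the four unsigned bytes in rm and rs."""
    total = 0
    for shift in (0, 8, 16, 24):
        a = (rm >> shift) & 0xFF
        b = (rs >> shift) & 0xFF
        total += abs(a - b)
    return total

print(usad8(0x10203040, 0x40302010))
```

This operation is the inner step of block matching in video motion estimation, which is why a single-instruction form is useful.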
ARM is mostly used as a processor core in SoCs and ASICs
A number of ASSPs (application-specific standard products) are available, for example for communication applications
Example: Philips VWS22100 : ARM7 based GSM base band chip
Summary
We have discussed the architecture of ARM processors
We have discussed exception processing
Looked at pipeline architecture
Understood key aspects of the ARM CPU core