ARMfinal 1

ARM Processor
History
ARM was developed at Acorn Computers Limited of Cambridge, England, between 1983 and 1985
RISC concept introduced in 1980 at Stanford and Berkeley
ARM Limited founded in 1990
ARM Cores
Licensed to partners to develop and fabricate new micro-controllers
Soft-core
ARM Architecture
Based upon RISC Architecture with enhancements to meet requirements of embedded applications
A large uniform register file
Load-store architecture, where data processing operations operate on register contents only
Uniform and fixed length instructions
32-bit processor
Instructions are 32 bits long
Good speed/power consumption ratio
High code density
Enhancement to Basic RISC Features

Control over ALU and shifter in every data processing operation to maximize their usage
Auto-increment and auto-decrement addressing modes to optimize program loops
Load and Store Multiple instructions to maximize data throughput
Conditional execution of instructions to maximize execution throughput
Arm Architecture Versions

Version 1 (1983-85): 26-bit addressing, no multiply or coprocessor
Version 2: adds 32-bit result multiply and coprocessor support
Version 3: 32-bit addressing
Version 4: adds signed and unsigned half-word and signed byte load and store instructions
Version 4T: 16-bit Thumb compressed form of the instruction set introduced
Architecture Versions
Version 5T
Superset of 4T adding new instructions
Version 5TE
Adds signal processing instruction set extensions
Examples:
ARM6: v3
ARM7: v3
ARM7TDMI: v4T
StrongARM: v4
ARM9E-S: v5TE
Overview: Core Data Path

Data items are placed in register file
No data processing instructions directly manipulate data in memory
Instructions typically use two source registers and a single result (destination) register
A barrel shifter on the data path can preprocess data before it enters the ALU
Increment/decrement logic can update register content for sequential access independently of the ALU
Basic ARM organization

[Figure: basic ARM organization. Address register with incrementer driving A[31:0]; register bank including the PC; barrel shifter on the B bus feeding the ALU; multiply register; instruction decode and control; data-in and data-out registers on D[31:0]; A bus, B bus and ALU bus linking the blocks.]
Registers
General purpose registers hold either data or an address
All registers are 32 bits wide
Up to 18 registers are active at a time: 16 data registers and 2 status registers (user mode has no SPSR, so it sees the 16 data registers and the CPSR)
Data registers: r0 to r15
Three registers, r13, r14 and r15, perform special functions
r13: stack pointer
r14: link register (where the return address is put whenever a subroutine is called)
r15: program counter
Registers (2)
Depending upon context, registers r13 and r14 can also be used as GPRs
Any instruction which uses r0 can equally be used with any other GPR (r1-r13)
In addition, there are two status registers
CPSR: current program status register
SPSR: saved program status register
Register: r15
When the processor is executing in ARM state
All instructions are 32 bits wide
All instructions are word aligned
PC value is stored in bits [31:2] with bits [1:0] undefined
Status Registers
CPSR: monitors and controls internal operations
Processor Modes
Processor modes determine
which registers are active, and
access rights to the CPSR register itself
Each processor mode is either
Privileged: full read-write access to the CPSR
Non-privileged: only read access to the control field of the CPSR but read-write access to the condition flags
Processor Modes (2)

ARM has seven modes
Privileged: abort, fast interrupt request, interrupt request, supervisor, system and undefined
Non-privileged: user
User mode is used for programs and applications
Privileged Modes
Abort: entered after a failed attempt to access memory
Fast Interrupt Request (FIQ) and Interrupt Request (IRQ): correspond to the two interrupt levels available on ARM
Supervisor mode: state after reset and generally the mode in which the OS kernel executes
Privileged Modes (2)
System mode: special version of user mode that allows full read-write access to the CPSR
Undefined: entered when the processor encounters an undefined instruction
Banked Registers
Register file contains 37 registers in all
20 registers are hidden from a program at different times
These registers are called banked registers
Banked registers are available only when the processor is in a particular mode
Processor modes other than user and system mode have a set of associated banked registers that are a subset of the 16 registers
Each banked register maps one-to-one onto a user mode register
Register banking
[Figure: register banking. On a mode change, user registers are replaced by the banked registers of the new mode (FIQ, IRQ, supervisor, undefined, abort), and the CPSR is copied into that mode's SPSR.]
Register Organization
Mode Changing
The mode changes when software writes directly to the CPSR, or by hardware when the processor responds to an exception or interrupt
To return to user mode, a special return instruction is used that instructs the core to restore the original CPSR and banked registers
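The mode and privilege information described above lives in fields of the CPSR. The following is a minimal Python sketch of decoding those fields. The bit positions and mode encodings are taken from the ARM architecture rather than from these slides, and the names MODES and decode_cpsr are hypothetical.

```python
# Mode field encodings (CPSR bits [4:0]) for the seven processor modes.
MODES = {
    0b10000: "user",
    0b10001: "fiq",
    0b10010: "irq",
    0b10011: "supervisor",
    0b10111: "abort",
    0b11011: "undefined",
    0b11111: "system",
}

def decode_cpsr(cpsr):
    """Split a 32-bit CPSR value into flags, interrupt masks, state and mode."""
    mode = MODES.get(cpsr & 0x1F, "reserved")
    return {
        "N": (cpsr >> 31) & 1,
        "Z": (cpsr >> 30) & 1,
        "C": (cpsr >> 29) & 1,
        "V": (cpsr >> 28) & 1,
        "I": (cpsr >> 7) & 1,   # 1 = IRQ disabled
        "F": (cpsr >> 6) & 1,   # 1 = FIQ disabled
        "T": (cpsr >> 5) & 1,   # 1 = Thumb state
        "mode": mode,
        "privileged": mode not in ("user", "reserved"),
    }

# 0xD3: I and F set, ARM state, supervisor mode (the state after reset).
print(decode_cpsr(0x000000D3))
```

Only the mode field distinguishes user mode from the six privileged modes, which is why a write to CPSR bits [4:0] is enough to change mode.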
ARM memory organization

Little Endian
[Figure: byte-wide memory map, bit 31 on the left and bit 0 on the right, showing byte0 to byte3, half-word4, byte6, word8, half-word12, half-word14 and word16 at their byte addresses.]
Memory (byte-wide): addresses decrease from top to bottom and left to right.
Words are 32-bit aligned; 8-bit and 16-bit items are held within the same aligned layout
ARM Instruction Set
Instructions
Instructions process data held in registers and access memory with load and store instructions
Classes of instructions:
Data processing
Branch instructions
Load-store instructions
Software interrupt instruction
Program status register instructions
Features of ARM instruction set
ARM data types

Word is 32 bits long.
Word can be divided into four 8-bit bytes.
ARM addresses can be 32 bits long.
Address refers to byte.
Address 4 starts at byte 4.
Can be configured at power-up as either little- or big-endian mode.
Data Processing
Manipulate data within registers
Move instructions
Arithmetic instructions
Multiply instructions
Logical instructions
Comparison instructions
Suffix S on data processing instructions updates flags in CPSR
Data Processing Instructions
Operands are 32 bits wide; they come from registers or are specified as literals in the instruction itself
Second operand sent to ALU via barrel shifter
32-bit result placed in register; long multiply instructions produce a 64-bit result
Move instruction
MOV Rd, N
Rd: destination register
N: can be an immediate value or a source register
Example: MOV r7, r5
MVN Rd, N
Moves the bitwise NOT of the 32-bit source value into Rd
Using Barrel Shifter

Enables shifting the 32-bit operand in one of the source registers left or right by a specific number of positions within the cycle time of the instruction
Basic barrel shifter operations
Shift left, shift right, rotate right
Facilitates fast multiply and division, and increases code density
Example: MOV r7, r5, LSL #2
Multiplies content of r5 by 4 and puts result in r7
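The barrel shifter operations can be modelled in a few lines. This is a Python sketch on 32-bit values, not a description of the hardware. The function names are hypothetical, shift amounts are assumed to be in the range 0 to 31, and carry-out is ignored.

```python
MASK32 = 0xFFFFFFFF

def lsl(value, n):
    """Logical shift left: multiplies by 2**n, discarding bits above bit 31."""
    return (value << n) & MASK32

def lsr(value, n):
    """Logical shift right: unsigned divide by 2**n."""
    return (value & MASK32) >> n

def asr(value, n):
    """Arithmetic shift right: signed divide by 2**n, sign bit replicated."""
    value &= MASK32
    if value & 0x80000000:
        value -= 1 << 32          # reinterpret as a negative Python int
    return (value >> n) & MASK32

def ror(value, n):
    """Rotate right: bits shifted out at bit 0 re-enter at bit 31."""
    n %= 32
    value &= MASK32
    return ((value >> n) | (value << (32 - n))) & MASK32

r5 = 5
r7 = lsl(r5, 2)                       # MOV r7, r5, LSL #2
r1 = 7
r0 = (r1 + lsl(r1, 1)) & MASK32       # ADD r0, r1, r1, LSL #1
print(r7, r0)
```

Because the shift happens on the way into the ALU, multiplying by a constant such as 3 or 4 costs one instruction rather than a multiply.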
Using Barrel Shifter (2)
The shift amount can be specified by a register or by an immediate value
Arithmetic Instructions
Implements 32 bit addition and subtraction
3-operand form
Examples:
SUB r0, r1, r2
Subtract value stored in r2 from that of r1 and store in r0
SUBS r1, r1, #1
Subtract 1 from r1, store result in r1 and update the condition flags (N, Z, C, V)
With Barrel Shifter
Use of barrel shifter with arithmetic and logical instructions increases the set of possible available operations
Example
ADD r0, r1, r1, LSL #1
Register r1 is shifted to the left by 1, then added to r1, and the result (3 times r1) is stored in r0.
Multiply Instructions
Multiply contents of a pair of registers
Long multiply generates 64 bit result
Examples:
MUL r0, r1, r2
Contents of r1 and r2 multiplied and put in r0
UMULL r0, r1, r2, r3
Unsigned multiply of r2 and r3 with the 64-bit result stored in r0 (low word) and r1 (high word)
Number of cycles taken for execution of multiply instruction depends upon processor implementation
Multiply and Accumulate

Result of multiplication can be accumulated with content of another register
MLA Rd, Rm, Rs, Rn
Rd =(Rm*Rs) + Rn
UMLAL Rdlo, Rdhi, Rm, Rs

[Rdhi,Rdlo] = [Rdhi,Rdlo] + (Rm*Rs)
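The multiply and multiply-accumulate forms differ mainly in result width. A small Python sketch of the arithmetic, assuming unsigned 32-bit register values, is given below. The function names mirror the mnemonics but are otherwise hypothetical, and flag updates are not modelled.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def mul(rm, rs):
    """MUL Rd, Rm, Rs: low 32 bits of the product."""
    return (rm * rs) & MASK32

def mla(rm, rs, rn):
    """MLA Rd, Rm, Rs, Rn: Rd = (Rm * Rs) + Rn, truncated to 32 bits."""
    return (rm * rs + rn) & MASK32

def umull(rm, rs):
    """UMULL RdLo, RdHi, Rm, Rs: returns (RdLo, RdHi) of the 64-bit product."""
    product = (rm & MASK32) * (rs & MASK32)
    return product & MASK32, product >> 32

def umlal(rdlo, rdhi, rm, rs):
    """UMLAL RdLo, RdHi, Rm, Rs: [RdHi,RdLo] = [RdHi,RdLo] + (Rm * Rs)."""
    acc = ((rdhi << 32) | rdlo) + (rm & MASK32) * (rs & MASK32)
    acc &= MASK64
    return acc & MASK32, acc >> 32

print(umull(0xFFFFFFFF, 2))
print(umlal(0xFFFFFFFF, 0, 1, 1))
```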
Logical Instructions
Bit wise logical operations on the two source registers
AND, OR (ORR), exclusive OR (EOR), bit clear (BIC)
Example: BIC r0, r1, r2
r2 contains a binary pattern where every binary 1 in r2 clears the corresponding bit location in register r1; the result is placed in r0
Useful in manipulating status flags and interrupt masks
Compare Instructions
Enables comparison of 32 bit values
Update CPSR flags but do not affect other registers
Examples
CMP r0, r9
Flags set as a result of r0 - r9
TEQ r0, r9
Flags set as a result of r0 exclusive-OR r9
TST r0, r9
Flags set as a result of r0 AND r9
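How the comparison instructions set the flags can be sketched in Python as follows. The flag definitions, in particular C meaning "no borrow" after a subtraction, come from the ARM architecture rather than these slides, and sub_flags and logic_flags are hypothetical helper names.

```python
MASK32 = 0xFFFFFFFF

def sub_flags(a, b):
    """Return (result, flags) for a - b, as CMP or SUBS would compute them."""
    a &= MASK32
    b &= MASK32
    result = (a - b) & MASK32
    flags = {
        "N": result >> 31,                          # result negative
        "Z": int(result == 0),                      # result zero
        "C": int(a >= b),                           # set when there is NO borrow
        "V": ((a ^ b) & (a ^ result)) >> 31,        # signed overflow
    }
    return result, flags

def logic_flags(result):
    """N and Z as set by TST (AND) or TEQ (exclusive OR)."""
    result &= MASK32
    return {"N": result >> 31, "Z": int(result == 0)}

print(sub_flags(5, 5))            # CMP of equal values sets Z
print(logic_flags(0x0F & 0xF0))   # TST with no common bits sets Z
```

A following conditional instruction such as BNE or ADDEQ then tests these flags.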
Summary
We have examined the basics of ARM architecture
Understood processor modes
We have looked at the core data path
Discussed basic data processing operations
More ARM instructions
Load-Store Instructions
Transfers data between memory and processor registers
Single register transfer
Data types supported are signed and unsigned words (32 bits), half-words, bytes
Multiple-register transfer
Transfer multiple registers between memory and the processor in a single instruction
Swap
Swaps content of a memory location with the contents of a register
Single Transfer Instructions

Load and store data on an alignment boundary matching the size of the data being transferred
LDR, LDRH, LDRB:
load (word, half-word, byte)
STR, STRH, STRB:
store (word, half-word, byte)
Supports different addressing modes:
Register indirect: LDR r0,[r1]
Immediate offset: LDR r0,[r1,#4]
12-bit offset added to the base register
Register offset: LDR r0,[r1,-r2]
Address calculated using base register and another register
More Addressing Modes
Scaled: LDR r0,[r1,r2,LSL #2]
Address is calculated using the base address register and a barrel shift operation
Pre & Post Indexing

Pre-index with write back: LDR r0,[r1,#4]!
Updates the address base register with new address
Post index: LDR r0,[r1],#4

Updates the address register after address is used
Example
Pre-indexing with write back: LDR r0,[r1,#4]!
Before instruction execution
r0 = 0x00000000
r1 = 0x00009000
Mem32[0x00009000] = 0x01010101
Mem32[0x00009004] = 0x02020202
After instruction execution
r0 = 0x02020202
r1 = 0x00009004
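The difference between plain offset, pre-indexing with write back and post-indexing can be checked with a small Python model. It reuses the register and memory values from the example above. The ldr and run functions and the dictionary-based memory are hypothetical simplifications.

```python
def ldr(mem, regs, rd, rn, offset=0, mode="offset"):
    """Model a word load in one of three addressing modes.

    mode="offset": LDR rd, [rn, #offset]    base register unchanged
    mode="pre":    LDR rd, [rn, #offset]!   address used, then written back
    mode="post":   LDR rd, [rn], #offset    base used, then updated
    """
    base = regs[rn]
    address = base if mode == "post" else base + offset
    regs[rd] = mem[address]
    if mode in ("pre", "post"):
        regs[rn] = base + offset

def run(mode):
    """Execute one load from the slide's starting state and return the registers."""
    mem = {0x00009000: 0x01010101, 0x00009004: 0x02020202}
    regs = {"r0": 0x00000000, "r1": 0x00009000}
    ldr(mem, regs, "r0", "r1", offset=4, mode=mode)
    return regs

for mode in ("offset", "pre", "post"):
    regs = run(mode)
    print(mode, hex(regs["r0"]), hex(regs["r1"]))
```

Pre-indexing and post-indexing leave the same value in r1; they differ only in which address the load itself uses.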
Multiple Register Transfer

Load-store multiple instructions transfer multiple register contents between memory and the processor in a single instruction
More efficient for moving blocks of memory and for saving and restoring context and stack
These instructions can increase interrupt latency
ARM does not usually interrupt an instruction while it is executing, so a pending interrupt waits until the transfer completes
Multiple Byte Load-Store

Any subset of the current bank of registers can be transferred to memory or fetched from memory
LDM, STM
The base register Rn determines source or destination address
Addressing Modes
LDMIA|IB|DA|DB, STMIA|IB|DA|DB
Example: LDMIA Rn!, {r1-r3}
Mode  Description       Start address  End address   Rn!
IA    Increment after   Rn             Rn + 4*N - 4  Rn + 4*N
IB    Increment before  Rn + 4         Rn + 4*N      Rn + 4*N
DA    Decrement after   Rn - 4*N + 4   Rn            Rn - 4*N
DB    Decrement before  Rn - 4*N       Rn - 4        Rn - 4*N
N is the number of registers in the list
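The start address, end address and written-back base for the four block transfer modes follow directly from the base register Rn and the register count N. A Python sketch is given below; the function name block_transfer_addresses is hypothetical.

```python
def block_transfer_addresses(rn, n, mode):
    """Return (start, end, rn_after_writeback) for an N-register LDM or STM."""
    span = 4 * n                      # N words of 4 bytes each
    if mode == "IA":                  # increment after
        return rn, rn + span - 4, rn + span
    if mode == "IB":                  # increment before
        return rn + 4, rn + span, rn + span
    if mode == "DA":                  # decrement after
        return rn - span + 4, rn, rn - span
    if mode == "DB":                  # decrement before
        return rn - span, rn - 4, rn - span
    raise ValueError("mode must be IA, IB, DA or DB")

# LDMIA Rn!, {r1-r3} with Rn = 0x1000 reads 0x1000..0x1008 and leaves Rn = 0x100C.
print([hex(x) for x in block_transfer_addresses(0x1000, 3, "IA")])
```

In every mode the lowest-numbered register is transferred to or from the lowest address in the range.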
Stack Processing
A stack is implemented as a linear data structure which grows up (ascending) or down (descending)
Stack pointer holds the address of the current top of the stack
Modes of Stack Operation
ARM multiple register transfer instructions support
Full ascending: grows up, SP points to the highest address containing a valid item
Empty ascending: grows up, SP points to the first empty location above the stack
Full descending: grows down, SP points to the lowest address containing a valid item
Empty descending: grows down, SP points to the first empty location below the stack
Some Stack Instructions
Full Ascending
LDMFA: translates to LDMDA (POP)
STMFA: translates to STMIB (PUSH)
SP points to last item in stack
Empty Descending
LDMED: translates to LDMIB (POP)
STMED: translates to STMDA (PUSH)
SP points to first unused location
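Each stack mnemonic is an alias for one of the four block transfer modes, with an empty descending push mapping to STMDA. The Python sketch below lists the aliases for all four stack kinds and checks that a push followed by a pop returns the same values and restores SP. STACK_ALIASES and the helper functions are hypothetical names, and memory is modelled as a dictionary of word addresses.

```python
# Stack kind -> block transfer mode used for push (STM) and pop (LDM).
STACK_ALIASES = {
    "FA": {"push": "IB", "pop": "DA"},
    "FD": {"push": "DB", "pop": "IA"},
    "EA": {"push": "IA", "pop": "DB"},
    "ED": {"push": "DA", "pop": "IB"},
}

def addresses(sp, n, mode):
    """Word addresses touched by an n-register transfer, plus the new SP."""
    span = 4 * n
    start = {"IA": sp, "IB": sp + 4, "DA": sp - span + 4, "DB": sp - span}[mode]
    new_sp = sp + span if mode[0] == "I" else sp - span
    return [start + 4 * i for i in range(n)], new_sp

def push(mem, sp, values, kind):
    addrs, new_sp = addresses(sp, len(values), STACK_ALIASES[kind]["push"])
    mem.update(zip(addrs, values))    # lowest-numbered register at lowest address
    return new_sp

def pop(mem, sp, n, kind):
    addrs, new_sp = addresses(sp, n, STACK_ALIASES[kind]["pop"])
    return [mem[a] for a in addrs], new_sp

def round_trip(kind, sp=0x8000, values=(1, 2, 3)):
    """Push then pop; returns (popped values, SP after push, SP after pop)."""
    mem = {}
    sp_after_push = push(mem, sp, list(values), kind)
    popped, sp_after_pop = pop(mem, sp_after_push, len(values), kind)
    return popped, sp_after_push, sp_after_pop

for kind in STACK_ALIASES:
    print(kind, round_trip(kind))
```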
SWAP Instruction
Special case of load-store instruction
Swap instructions:
SWP: swap a word between memory and register
SWPB: swap a byte between memory and register
Useful for implementing synchronization primitives like semaphores
Control Flow Instructions

Branch Instructions
Conditional Branches
Conditional Execution
Branch and Link instructions
Subroutine Return Instructions
Branch Instruction
Branch instruction: B label
Example: B forward
Address label is stored in the instruction as a signed pc-relative offset
Conditional Branch: B<cond> label
Example: BNE loop
Branch has a condition associated with it and is executed only if the condition codes have the correct value
Example: Block memory copy

Loop  LDMIA r9!, {r0-r7}
      STMIA r10!, {r0-r7}
      CMP   r9, r11
      BNE   Loop
r9 points to source of data, r10 points to start of destination data, r11 points to end of the source
Conditional Execution
An unusual feature of the ARM instruction set is that conditional execution applies not only to branches but to all ARM instructions
Example: ADDEQ r0,r1,r2
Instruction will only be executed when the zero flag is set to 1
Advantages
Reduces the number of branches
Reduces the number of pipeline flushes
Improves performance of the code
Increases code density
Rule of thumb: whenever the conditional sequence is 3 instructions or fewer, it is smaller and faster to exploit conditional execution than to use a branch
Branch & Link Instruction

Performs a branch and saves the address following the branch in the link register, r14
Example: BL subroutine
For nested subroutines, push r14 and any work registers that must be preserved onto a stack in memory
Example
      BL    sub1
      ..
sub1  STMFD r13!,{r0-r2,r14}
      BL    sub2
      ..
Subroutine return instructions

No specific instructions
Example (1):
sub   ..
      MOV   PC,r14
Example (2): when return address has been pushed to stack
sub2  ..
      LDMFD r13!,{r0-r12,PC}
Software Interrupt Instruction (SWI)

A software interrupt instruction causes a software interrupt exception, which provides a mechanism for applications to call OS routines
Instruction: SWI{<cond>} SWI_number
When the processor executes an SWI instruction, it sets the program counter pc to offset 0x8 in the vector table
The instruction also forces the processor mode to SVC, which allows an operating system routine to execute
SWI
SWI is typically executed in user mode
The instruction forces the processor mode to supervisor (SVC); this allows an OS routine to be executed in privileged mode
Each SWI has an associated SWI number which is used to represent a particular function call or feature
Parameters are passed through registers; the return value is also passed using registers
Example
PRE   cpsr = nzcvqift_USER
      pc = 0x00008000
      lr = 0x003fffff (lr = r14)
      r0 = 0x12
      0x00008000  SWI 0x123456
POST  cpsr = nzcvqIft_SVC
      spsr = nzcvqift_USER
      pc = 0x00000008
      lr = 0x00008004 (lr = r14_SVC)
      r0 = 0x12
Program Status Register Instructions

Two instructions to control a PSR directly
MRS transfers contents of either cpsr or spsr into a register
MSR transfers contents of a register to cpsr or spsr
Example
Enabling IRQ interrupt
PRE   cpsr = nzcvqIFt_SVC
      MRS r1, cpsr
      BIC r1, r1, #0x80
      MSR cpsr, r1
POST  cpsr = nzcvqiFt_SVC
These instructions are executed in SVC mode
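The read-modify-write sequence above can be mirrored in Python to see what happens to the CPSR value. This is only a sketch: I_BIT reflects the 0x80 mask used in the example, and the function name enable_irq is hypothetical.

```python
I_BIT = 0x80   # CPSR bit 7: IRQ disabled when set

def enable_irq(cpsr):
    """Mirror MRS / BIC / MSR: clear the I bit and leave every other bit alone."""
    r1 = cpsr                        # MRS r1, cpsr
    r1 = r1 & ~I_BIT & 0xFFFFFFFF    # BIC r1, r1, #0x80
    return r1                        # MSR cpsr, r1

# 0xD3 is SVC mode with I and F set; afterwards only F remains set.
print(hex(enable_irq(0xD3)))
```

Reading the register first matters: writing a constant instead would also overwrite the mode bits and the F bit.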
Coprocessor Instructions
Used to extend the instruction set
Used by cores with a coprocessor
Coprocessor specific operations
Syntax: coprocessor data processing
CDP{<cond>} cp, opcode1, Cd, Cn, Cm{, opcode2}
cp represents the coprocessor number, between p0 and p15
The opcode fields describe the coprocessor operation
Cd, Cn, Cm are coprocessor registers
There are also coprocessor register transfer and memory transfer instructions
Thumb
Thumb encodes a subset of the 32-bit instruction set into a 16-bit subspace
Thumb has higher performance than ARM on a processor with a 16-bit data bus
Thumb has higher code density
For memory constrained embedded systems
Code density
ARM divide
      MOV   r3,#0
Loop  SUBS  r0,r0,r1
      ADDGE r3,r3,#1
      BGE   Loop
      ADD   r2,r0,r1
5 x 4 = 20 bytes
Thumb divide
      MOV   r3,#0
Loop  ADD   r3,#1
      SUB   r0,r1
      BGE   Loop
      SUB   r3,#1
      ADD   r2,r0,r1
6 x 2 = 12 bytes
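Both divide loops compute a quotient in r3 and a remainder in r2 by repeated subtraction. The Python sketch below follows each listing step by step to show that they agree. It assumes a non-negative dividend in r0 and a positive divisor in r1, and uses unbounded Python integers, so 32-bit overflow is not modelled.

```python
def arm_divide(r0, r1):
    """ARM version: the increment is conditionally executed (ADDGE)."""
    r3 = 0                      # MOV r3, #0
    while True:
        r0 = r0 - r1            # SUBS r0, r0, r1
        if r0 >= 0:
            r3 += 1             # ADDGE r3, r3, #1
            continue            # BGE Loop
        break
    r2 = r0 + r1                # ADD r2, r0, r1 (undo the last subtraction)
    return r3, r2

def thumb_divide(r0, r1):
    """Thumb version: increments unconditionally, then corrects by one."""
    r3 = 0                      # MOV r3, #0
    while True:
        r3 += 1                 # ADD r3, #1
        r0 = r0 - r1            # SUB r0, r1
        if r0 < 0:              # BGE Loop falls through when negative
            break
    r3 -= 1                     # SUB r3, #1
    r2 = r0 + r1                # ADD r2, r0, r1
    return r3, r2

print(arm_divide(17, 5), thumb_divide(17, 5))
```

The Thumb listing needs one more instruction because only branches are conditional, yet it is still smaller because each instruction is 2 bytes.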
Thumb instructions
Only low registers r0 to r7 fully accessible
Higher registers accessible with MOV, ADD, CMP instructions
Only branch instructions can be conditionally executed
Barrel shift operations are separate instructions
ARM-Thumb Interworking
To call a Thumb routine from an ARM routine the core has to change state
The state is held in the T bit of the CPSR
BX and BLX instructions can be used for the switch
Example: BX r0 ; BLX r0
Enters Thumb state if bit 0 of the address in Rn is set to binary 1; otherwise it enters ARM state
Thumb (T) Architecture

Thumb instruction decoder is placed in pipeline
The change to Thumb state happens by changing the state of multiplexers in the decode path
Multiplexer A1 selects the 16-bit data
ARMv5E Extensions
Extensions to facilitate signal processing operations
Supports
Signed multiply accumulate instructions
Saturation arithmetic
Greater flexibility and efficiency when manipulating 16-bit values for applications such as 16-bit digital audio processing
Saturation Arithmetic
Normal ARM arithmetic instructions wrap around when there is an overflow of an integer value
Using ARMv5E instructions you can saturate the result
Once the highest number is exceeded the result remains at the maximum value
Likewise, on underflow the result remains at the minimum value
Example Instructions: QADD, QSUB
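The contrast between wrap-around and saturating arithmetic is easy to see on signed 32-bit values. The Python sketch below models only the result value; the helper names are hypothetical and no status flags are modelled.

```python
INT32_MAX = 2**31 - 1
INT32_MIN = -2**31

def wrap32(value):
    """Wrap an integer into the signed 32-bit range, as a normal ADD would."""
    return (value - INT32_MIN) % 2**32 + INT32_MIN

def saturate32(value):
    """Clamp an integer to the signed 32-bit range."""
    return max(INT32_MIN, min(INT32_MAX, value))

def qadd(a, b):
    """Saturating add in the style of QADD."""
    return saturate32(a + b)

def qsub(a, b):
    """Saturating subtract in the style of QSUB."""
    return saturate32(a - b)

print(wrap32(INT32_MAX + 1))   # wraps to the most negative value
print(qadd(INT32_MAX, 1))      # stays at the maximum
```

For audio samples, clamping at the limit distorts far less than wrapping from the largest positive value to the most negative one.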
Summary
We have studied the instruction set of ARM processors
We have examined the Thumb mode of operation
We shall look into interrupt processing and other features of ARM architecture next
ARM: Interrupt Processing
ARM Exceptions:Review
Exception                        Mode
Fast Interrupt Request           FIQ
Interrupt Request                IRQ
SWI and Reset                    SVC
Pre-fetch Abort and Data Abort   abort
Undefined instruction            undefined
When Exception Occurs

An exception causes a mode change and the core
saves the cpsr to the spsr of the exception mode
saves the pc to the lr of the exception mode
sets the cpsr to the exception mode
sets pc to the address of the exception handler
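The exception entry steps can be traced for the SWI case with a small Python model. It assumes ARM state, so the return address is the SWI address plus 4, and it assumes the vector at 0x8 and the setting of the I bit on entry, which come from the ARM architecture rather than the list above. The dictionary keys and function names are hypothetical.

```python
MODE_MASK = 0x1F
MODE_SVC = 0b10011
I_BIT = 0x80
T_BIT = 0x20
SWI_VECTOR = 0x00000008

def take_swi(cpu, swi_address):
    """Apply the exception entry steps for an SWI executed in ARM state."""
    cpu["spsr_svc"] = cpu["cpsr"]                   # save cpsr to spsr of the mode
    cpu["lr_svc"] = swi_address + 4                 # save return address to lr
    cleared = cpu["cpsr"] & ~(MODE_MASK | T_BIT)    # drop old mode, force ARM state
    cpu["cpsr"] = (cleared | MODE_SVC | I_BIT) & 0xFFFFFFFF
    cpu["pc"] = SWI_VECTOR                          # branch to the vector entry

def return_from_swi(cpu):
    """Return step: pc from lr and cpsr from spsr together."""
    cpu["pc"] = cpu["lr_svc"]
    cpu["cpsr"] = cpu["spsr_svc"]

cpu = {"cpsr": 0x00000010, "pc": 0x00008000}        # user mode, interrupts enabled
take_swi(cpu, 0x00008000)
print({k: hex(v) for k, v in cpu.items()})
```

Because lr and spsr are banked per mode, the user-mode r14 and the interrupted cpsr both survive the exception untouched.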
Vector Table
The vector table is a table of addresses that the ARM core branches to when an exception is raised
Fixed offset for each type of exception
These addresses contain instructions of one of the following forms:
B <address>: branch relative to PC
LDR pc, [pc, #offset]: loads handler address from memory into PC
MOV pc, #immediate: loads immediate value into PC
Exception Priorities
Exception              Priority  I bit  F bit
Reset                  1         1      1
Data abort             2         1      -
FIQ                    3         1      1
IRQ                    4         1      -
Pre-fetch abort        5         1      -
SWI                    6         1      -
Undefined instruction  6         1      -
Exception Handlers
Reset handler
Initializes the system: sets up stack pointers, memory and external interrupt sources before enabling IRQ or FIQ
Code should be designed to avoid triggering further exceptions
Data Abort
Occurs when the memory controller indicates that an invalid memory address has been accessed
An FIQ exception can be raised within the data abort handler
Exception Handlers (contd.)

FIQ
Occurs when an external peripheral asserts the FIQ input signal
Core disables both FIQ and IRQ interrupts
IRQ
Occurs when an external device asserts the IRQ input signal
IRQ handler will be entered only if neither an FIQ exception nor a Data Abort exception occurs
On entry IRQ is disabled, and it should remain disabled for the duration of the handler unless the handler re-enables it
Exception Handlers (contd.)

Prefetch Abort
Occurs when an attempt to fetch an instruction results in a memory fault
An FIQ exception can still be serviced
Undefined instruction
Occurs when an instruction is not in the ARM or Thumb instruction set
SWI and undefined instruction have the same level of priority because they cannot occur together
Returning from Exception Handler

Exception handler must not corrupt lr
After servicing is complete, return to normal execution occurs
by moving the correct value of link register r14 into pc
by restoring cpsr from spsr
Interrupt Assignment
An interrupt controller connects multiple external interrupts to either FIQ or IRQ
IRQs are normally assigned to general purpose interrupts
Example: periodic timer interrupt to force a context switch
FIQ is reserved for an interrupt source which requires fast response time
Interrupt Latency
Hardware and software latency
Software methods to reduce latency
Nested handler, which allows further interrupts to occur while an existing interrupt is being serviced, by re-enabling interrupts inside the service routine
Program the interrupt controller to ignore interrupts of the same or lower priority
Higher priority interrupts will have lower average latency
Stack Organization
A stack has to be set up for each processor mode
To be done every time the processor is reset
Change to each mode by writing its bit pattern to the CPSR, then initialise sp
Design decisions
Location and mode (descending stack is common)
Size
Nested interrupt handler requires a larger stack
ARM I/O System

Handles all I/O devices using memory mapped I/O
Interrupt support:
Fast interrupt
Normal interrupt
DMA Support
Large bandwidth data transfer
ARM CPU Core

Processor Core + Cache + MMU
Diagrams from:
ARM System-on-Chip Architecture, Steve Furber, Addison Wesley, 2000
ARM Architecture Reference Manual, David Seal, Addison Wesley, 2001
ARM 7 Processor Core

Low-end ARM core for applications like mobile phones
TDMI
T: Thumb
D: on-chip debug support, enabling the processor to halt in response to a debug request
M: enhanced multiplier, yields a full 64-bit result
I: EmbeddedICE hardware
Von Neumann architecture
3 stage pipeline, CPI ~ 1.9
ARM single-cycle instruction pipeline operation

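[Figure: three consecutive single-cycle instructions, each passing through fetch, decode and execute, overlapped in time so that one instruction completes per cycle.]
3 stage Pipeline
Because the pc runs ahead of the executing instruction, the lr value must be adjusted properly before returning from an exception handler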
Pipeline Operation
Not every instruction completes in a single cycle
Example: LDMIA r0, {r2,r3} (multiple load):
2 registers to load, so the instruction stays in the execute stage for two cycles
Execution of the prefetched instruction is delayed
Branches, subroutine calls and exceptions affect pipeline efficiency
Interrupt Pipeline Example

FIQ
FIQ minimum latency : 7 cycles
ARM7TDMI organization
[Figure: ARM7TDMI organization. The processor core sits between the EmbeddedICE logic and a bus splitter (Din[31:0], Dout[31:0], D[31:0]), wrapped by scan chains 0, 1 and 2 under a JTAG TAP controller (TCK, TMS, TRST, TDI, TDO); external signals include extern0, extern1, opc, r/w, mreq, trans, mas[1:0] and A[31:0].]
The ARM7TDMI core interface signals
[Figure: ARM7TDMI core interface signals, grouped as clock control, configuration, interrupts, initialization, bus control, debug, coprocessor interface, power, memory interface, MMU interface, state, TAP information, boundary scan extension and JTAG controls.]
Interface signals
Interface signals (contd.)
Interface signals (contd.)
ARM Memory Interface
ARM9TDMI
Harvard Architecture
Increases available memory bandwidth
Instruction memory interface
Data memory interface
Simultaneous access to instruction and data memory
5 stage pipeline
Changes implemented to
improve CPI to ~1.5
improve maximum clock frequency
5 stage Pipe-line Organization

Fetch
Decode
Execute
Buffer/Data: access data memory or buffer
Write back: to register file
[Figure: ARM9TDMI 5-stage pipeline datapath, showing I-cache fetch with next-pc logic, instruction decode and register read, shift and ALU with multiplier and forwarding paths, D-cache access with load/store address and byte replication, and register write-back with rotate/sign extension.]
ARM7TDMI and ARM9TDMI pipeline comparisons

ARM7TDMI:
Fetch: instruction fetch
Decode: Thumb decompress, ARM decode, reg read
Execute: shift/ALU, reg write
ARM9TDMI:
Fetch: instruction fetch
Decode: decode, reg read
Execute: shift/ALU
Memory: data memory access
Write: reg write
DSP enhancements in ARM9E

New instruction additions give architecture v5TE
New 32x16 and 16x16 multiply and multiply-accumulate instructions
SMLAxy, SMLAWy, SMLALxy, SMULxy, SMULWy
Allows independent access to 16-bit halves of registers
Gives efficient use of 32-bit bandwidth for packed 16-bit operands
Zero overhead fractional saturating arithmetic
QADD, QSUB, QDADD, QDSUB
Enhancements in 9E
Count leading zeros instruction
CLZ for faster normalisation and division
Single cycle 32x16 multiplier array

speeds up all ARM9E multiply instructions
New multiply-accumulate instructions

SMLAxy Rd,Rm,Rs,Rn
x and y select the upper (T) or lower (B) 16 bits of the 32-bit registers Rm and Rs
16x32 or 16x16 multiply gives 48-bit or 32-bit product
Rn: 32-bit register or 64-bit register-pair as accumulation source
Rd: 32-bit register or 64-bit register-pair as accumulation destination
Other instructions include:
SMUL: 16x16 => 32
SMLAL: 16x16 + 64 => 64
SMLAW: 32x16 + 32 => 32
SMULW: 32x16 => 32
MLA: 32x32 + 32 => 32
MLAL: 32x32 + 64 => 64
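The half-word selection in SMULxy and SMLAxy can be modelled in Python as follows. This is a sketch of the arithmetic only: the function names are hypothetical, registers are treated as unsigned 32-bit values, and no overflow flag is modelled for the accumulate.

```python
MASK32 = 0xFFFFFFFF

def s16(value):
    """Interpret the low 16 bits of value as a signed number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def half(reg, sel):
    """Select the top ("T") or bottom ("B") signed 16-bit half of a register."""
    return s16(reg >> 16) if sel == "T" else s16(reg)

def smulxy(rm, rs, x, y):
    """SMULxy: signed 16x16 multiply giving a 32-bit product."""
    return (half(rm, x) * half(rs, y)) & MASK32

def smlaxy(rm, rs, rn, x, y):
    """SMLAxy: signed 16x16 multiply, then 32-bit accumulate with Rn."""
    return (half(rm, x) * half(rs, y) + rn) & MASK32

# SMLATB: top half of Rm (2) times bottom half of Rs (5), plus 10.
print(smlaxy(0x00020003, 0x00040005, 10, "T", "B"))
```

Selecting halves inside the instruction means two packed 16-bit samples can share one register without separate unpacking shifts.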
ARM9E Datapath
[Figure: ARM9E datapath. Instruction decode and datapath control logic around a register bank (r0 to r14, PC, PSR), with multiplier (MUL), barrel shifter, CLZ, saturation (SAT) and accumulate logic, address incrementers (IINC, DINC), byte rotate/sign extension on RDATA[] and byte/half replicate on WDATA[].]
ARM920T organization
[Figure: ARM920T organization. An ARM9TDMI core with EmbeddedICE and JTAG, separate instruction and data caches each with its own MMU (virtual and physical instruction and data addresses), CP15, an external coprocessor interface, a write buffer with physical address tag and copy-back path, and an AMBA interface carrying address and data.]
AMBA
[Figure: typical AMBA system. The AHB or ASB system bus connects the ARM core, TIC, arbiter, decoder, on-chip RAM and the external bus interface (external ROM and RAM); a bridge links it to the APB peripheral bus with timer, interrupt controller, reset and remap/pause blocks.]
AMBA (Advanced Microcontroller Bus Architecture): ARM's on-chip bus specification
Simple ARM based System

On-chip there will be an ARM core together with a number of system-dependent peripherals. Also required will be some form of interrupt controller which receives interrupts from the peripherals and raises the IRQ or FIQ input to the ARM as appropriate.
This interrupt controller may also provide hardware assistance for prioritizing interrupts.
Simple ARM based System

As far as memory is concerned there is likely to be some (cheap) narrow off-chip ROM (or flash) used to boot the system from. There is also likely to be some 16-bit wide RAM used to store most of the runtime data and perhaps some code copied out of the flash. Then on-chip there may well be some 32-bit memory used to store the interrupt handlers and perhaps stacks.
Example ARM-based System

[Figure: example ARM-based system. An ARM core connected to 8-bit ROM, 16-bit RAM, 32-bit RAM, I/O peripherals and an interrupt controller driving nIRQ and nFIQ.]
ARM v5TEJ
J: supports implementation of the Java virtual machine
Offers hardware and software acceleration for optimized bytecode execution
ARM v6 Architecture
SIMD (single instruction multiple data) instructions for exploiting data parallelism
High code density and low power
By slicing up the existing 32-bit datapath into four 8-bit or two 16-bit slices
Example: QADD8<cond> Rd, Rn, Rm
Signed saturating 8-bit SIMD add
Other features
Sum of absolute difference instructions
Example: USAD8<cond> Rd,Rm,Rs
Sum of absolute differences between corresponding 8-bit values
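The two SIMD examples operate on four 8-bit lanes packed into one 32-bit register. The Python sketch below models their results. The function names follow the ARMv6 mnemonics QADD8 and USAD8, and the lane helpers are hypothetical.

```python
def lanes(word):
    """Split a 32-bit word into four unsigned bytes, lowest lane first."""
    return [(word >> (8 * i)) & 0xFF for i in range(4)]

def to_signed8(byte):
    """Interpret an unsigned byte as a signed 8-bit value."""
    return byte - 0x100 if byte & 0x80 else byte

def qadd8(rn, rm):
    """Signed saturating add performed independently in each 8-bit lane."""
    result = 0
    for i, (a, b) in enumerate(zip(lanes(rn), lanes(rm))):
        total = max(-128, min(127, to_signed8(a) + to_signed8(b)))
        result |= (total & 0xFF) << (8 * i)
    return result

def usad8(rm, rs):
    """Sum of absolute differences between corresponding unsigned bytes."""
    return sum(abs(a - b) for a, b in zip(lanes(rm), lanes(rs)))

print(hex(qadd8(0x7F01FF80, 0x01010180)))
print(usad8(0x01020304, 0x04030201))
```

Sums of absolute differences of this kind are a common inner loop in video motion estimation, which is why a single instruction for four pixels is worthwhile.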
Dual 16x16 multiply
Cryptographic multiplication
A new 64 + 32x32 multiply accumulate operation
Multiprocessing synchronization primitive
Use of ARM Core

ARM based products brought to market by manufacturers such as Atmel, Cirrus Logic, Intel and Samsung
Most products are based upon the ARM7TDMI and ARM920T cores
ARM is mostly used as a processor core in SoCs and ASICs
There are a number of ASSPs (application specific standard products) available, for example for communication applications
Example: Philips VWS22100: ARM7 based GSM baseband chip
Intel's ARM Derivative
XScale
ARM v5TE instruction set
Intel developed micro-architecture
Coprocessor instructions for extension
Summary
We have discussed the architecture of ARM processors
We have discussed exception processing
Looked at pipeline architecture
Understood key aspects of the ARM CPU core
