
ARM Processor

History
ARM was developed at Acorn Computers Limited of Cambridge, England, between 1983 and 1985
The RISC concept was introduced in 1980 at Stanford and Berkeley

ARM Limited was founded in 1990
ARM cores
Licensed to partners to develop and fabricate new microcontrollers
Soft-core

ARM Architecture
Based upon RISC Architecture with enhancements to meet requirements of embedded applications
A large uniform register file
Load-store architecture, where data processing operations operate on register contents only
Uniform, fixed-length instructions
32-bit processor
Instructions are 32 bits long
Good speed/power consumption ratio
High code density

Enhancement to Basic RISC Features


Control over the ALU and shifter in every data processing operation to maximize their usage
Auto-increment and auto-decrement addressing modes to optimize program loops
Load and Store Multiple instructions to maximize data throughput
Conditional execution of instructions to maximize execution throughput

ARM Architecture Versions
Version 1 (1983-85): 26-bit addressing, no multiply or coprocessor
Version 2: includes 32-bit result multiply and coprocessor support
Version 3: 32-bit addressing
Version 4: adds signed and unsigned half-word and signed byte load and store instructions
Version 4T: 16-bit Thumb compressed form of the instruction set introduced

Architecture Versions
Version 5T
Superset of 4T, adding new instructions

Version 5TE
Adds signal processing instruction set extensions

Examples:
ARM6: v3
ARM7: v3
ARM7TDMI: v4T
StrongARM: v4
ARM9E-S: v5TE

Overview: Core Data Path


Data items are placed in register file
No data processing instructions directly manipulate data in memory

Instructions typically use two source registers and a single result (destination) register
A barrel shifter on the data path can preprocess data before it enters the ALU
Increment/decrement logic can update register contents for sequential access independently of the ALU

Basic ARM organization


[Figure: basic ARM organization. Blocks: address register driving A[31:0], address incrementer, register bank (including the PC), multiply register, barrel shifter, ALU, instruction decode and control, data out register and data in register on D[31:0]; connected by the A bus, B bus, and ALU bus]

Registers
General-purpose registers hold either data or addresses
All registers are 32 bits wide
Up to 18 registers are visible at any one time: 16 data registers and 2 status registers (user mode has no SPSR, so it sees 17)
Data registers: r0 to r15
Three registers, r13, r14, and r15, perform special functions
r13: stack pointer
r14: link register (where the return address is put whenever a subroutine is called)
r15: program counter

Registers (2)
Depending upon context, registers r13 and r14 can also be used as general-purpose registers (GPRs)
Any instruction that can use r0 can equally be used with any other GPR (r1-r13)
In addition, there are two status registers
CPSR: current program status register
SPSR: saved program status register

Register: r15
When the processor is executing in ARM state
All instructions are 32 bits wide
All instructions are word aligned
The PC value is stored in bits [31:2], with bits [1:0] undefined

Status Registers
CPSR: monitors and controls internal operations

Processor Modes
Processor modes determine
which registers are active, and
the access rights to the CPSR itself

Each processor mode is either
Privileged: full read-write access to the CPSR
Non-privileged: read-only access to the control field of the CPSR, but read-write access to the condition flags

Processor Modes (2)


ARM has seven modes
Privileged: abort, fast interrupt request, interrupt request, supervisor, system, and undefined
Non-privileged: user

User mode is used for programs and applications

Privileged Modes
Abort: entered when an attempt to access memory fails
Fast interrupt request (FIQ) and interrupt request (IRQ): correspond to the two interrupt levels available on ARM
Supervisor: the mode after reset, and generally the mode in which the OS kernel executes

Privileged Modes (2)


System: special version of user mode that allows full read-write access to the CPSR
Undefined: entered when the processor encounters an undefined instruction

Banked Registers
The register file contains 37 registers in all
20 registers are hidden from a program at different times
These registers are called banked registers

Banked registers are available only when the processor is in a particular mode
Privileged modes (other than system mode) have a set of associated banked registers that are a subset of the 16 registers
Each banked register maps one-to-one onto a user mode register

Register banking
[Figure: register banking. On a change from user mode to FIQ, IRQ, supervisor, undefined, or abort mode, some user registers are replaced by banked registers, and the CPSR is copied into the SPSR]

Register Organization

Mode Changing
The mode changes either when software writes directly to the CPSR, or when hardware responds to an exception or interrupt
To return to user mode, a special return instruction is used that instructs the core to restore the original CPSR and banked registers

ARM memory organization


Little-endian organization
[Figure: byte-wide memory drawn as rows of 32-bit words, bit 31 on the left and bit 0 on the right, with byte addresses 0 to 23. Addresses decrease from top to bottom and from left to right. Items shown: word16, half-word14, half-word12, word8, byte6, half-word4, byte3, byte2, byte1, byte0]
32-bit words are aligned; the same applies to 8-bit and 16-bit items

ARM Instruction Set

Instructions
Instructions process data held in registers and access memory with load and store instructions
Classes of instructions:
Data processing
Branch instructions
Load-store instructions
Software interrupt instruction
Program status register instructions

Features of ARM instruction set

ARM data types


A word is 32 bits long.
A word can be divided into four 8-bit bytes.
ARM addresses are 32 bits long.
An address refers to a byte.
Address 4 starts at byte 4.

The processor can be configured at power-up in either little-endian or big-endian mode.

Data Processing
Manipulate data within registers
Move instructions
Arithmetic instructions
Multiply instructions
Logical instructions
Comparison instructions

The suffix S on a data processing instruction updates the flags in the CPSR

Data Processing Instructions


Operands are 32 bits wide; they come from registers or are specified as literals in the instruction itself
The second operand is sent to the ALU via the barrel shifter
The 32-bit result is placed in a register; a long multiply instruction produces a 64-bit result

Move instruction
MOV Rd, N
Rd: destination register
N: can be an immediate value or a source register
Example: MOV r7, r5

MVN Rd, N
Moves into Rd the bitwise NOT of the 32-bit value from the source

Using Barrel Shifter


Enables shifting the 32-bit operand in one of the source registers left or right by a specified number of positions within the cycle time of the instruction
Basic barrel shifter operations
Shift left, shift right, rotate right

Facilitates fast multiplication and division, and increases code density
Example: MOV r7, r5, LSL #2
Multiplies the content of r5 by 4 and puts the result in r7
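The shift operations listed above can be sketched as a small Python model. This is an illustration only, not the hardware implementation; the function names lsl, lsr, asr, and ror are hypothetical helpers, and 32-bit registers are modelled by masking Python integers.

```python
MASK32 = 0xFFFFFFFF  # registers are modelled as unsigned 32-bit values


def lsl(value, amount):
    """Logical shift left; bits shifted past bit 31 are lost."""
    return (value << amount) & MASK32


def lsr(value, amount):
    """Logical shift right; zeros enter from the left."""
    return (value & MASK32) >> amount


def asr(value, amount):
    """Arithmetic shift right; the sign bit (bit 31) is replicated."""
    value &= MASK32
    if value & 0x80000000:
        value -= 1 << 32  # reinterpret as a negative Python integer
    return (value >> amount) & MASK32


def ror(value, amount):
    """Rotate right; bits leaving bit 0 re-enter at bit 31."""
    amount %= 32
    value &= MASK32
    return ((value >> amount) | (value << (32 - amount))) & MASK32


r5 = 5
r7 = lsl(r5, 2)  # models MOV r7, r5, LSL #2, i.e. r7 = r5 * 4
print(r7)
```

A left shift by n multiplies by 2^n and a right shift divides by 2^n, which is why a shifted move can stand in for a simple multiply or divide.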

Using Barrel Shifter


[Figure: barrel shifter operand sources; labels: Register, Immediate value]

Arithmetic Instructions
Implement 32-bit addition and subtraction
3-operand form
Examples:
SUB r0, r1, r2
Subtracts the value stored in r2 from that in r1 and stores the result in r0

SUBS r1, r1, #1
Subtracts 1 from r1, stores the result in r1, and updates the condition flags (N, Z, C, V) in the CPSR
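How a flag-setting subtract derives the condition flags can be sketched in Python. This is a model under the usual ARM convention that the carry flag after a subtraction means "no borrow"; the function name subs and the dictionary of flags are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def subs(rn, op2):
    """Model SUBS Rd, Rn, op2: return (result, flags) for 32-bit operands."""
    rn &= MASK32
    op2 &= MASK32
    result = (rn - op2) & MASK32
    flags = {
        "N": result >> 31,            # copy of bit 31 of the result
        "Z": int(result == 0),        # result is zero
        "C": int(rn >= op2),          # set when no borrow occurred
        # signed overflow: operands differ in sign and the result's sign
        # differs from the first operand's sign
        "V": ((rn ^ op2) & (rn ^ result)) >> 31,
    }
    return result, flags


# A loop counter reaching zero sets Z, which a following BNE can test.
r1, flags = subs(1, 1)
print(r1, flags)
```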

With Barrel Shifter


Using the barrel shifter with arithmetic and logical instructions increases the set of available operations
Example
ADD r0, r1, r1, LSL #1
Register r1 is shifted left by 1, then added to r1, and the result (3 times r1) is stored in r0.

Multiply Instructions
Multiply the contents of a pair of registers
Long multiply generates a 64-bit result

Examples:
MUL r0, r1, r2
Contents of r1 and r2 are multiplied and the result is put in r0

UMULL r0, r1, r2, r3
Unsigned multiply of r2 and r3, with the 64-bit result stored in r0 (low word) and r1 (high word)

The number of cycles taken to execute a multiply instruction depends upon the processor implementation

Multiply and Accumulate


The result of a multiplication can be accumulated with the content of another register
MLA Rd, Rm, Rs, Rn
Rd = (Rm*Rs) + Rn

UMLAL RdLo, RdHi, Rm, Rs
[RdHi,RdLo] = [RdHi,RdLo] + (Rm*Rs)
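The accumulate forms above can be checked with a short Python model. It is a sketch of the arithmetic only; the function names mla, umull, and umlal are hypothetical, and long results are returned as a (low word, high word) pair.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def mla(rm, rs, rn):
    """Rd = (Rm * Rs) + Rn, truncated to 32 bits."""
    return (rm * rs + rn) & MASK32


def umull(rm, rs):
    """Unsigned 32x32 multiply; returns (RdLo, RdHi) of the 64-bit product."""
    product = (rm & MASK32) * (rs & MASK32)
    return product & MASK32, product >> 32


def umlal(rdlo, rdhi, rm, rs):
    """[RdHi,RdLo] = [RdHi,RdLo] + (Rm * Rs), truncated to 64 bits."""
    acc = ((rdhi & MASK32) << 32) | (rdlo & MASK32)
    acc = (acc + (rm & MASK32) * (rs & MASK32)) & MASK64
    return acc & MASK32, acc >> 32


print(umlal(0xFFFFFFFF, 0, 1, 1))  # the carry out of the low word reaches the high word
```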

Logical Instructions
Bitwise logical operations on the two source registers
AND, OR, exclusive OR, bit clear
Example: BIC r0, r1, r2
r2 contains a binary pattern in which every binary 1 clears the corresponding bit of r1; the result is placed in r0
Useful for manipulating status flags and interrupt masks

Compare Instructions
Enable comparison of 32-bit values
Update the CPSR flags but do not affect other registers
Examples
CMP r0, r9
Flags set as a result of r0 - r9

TEQ r0, r9
Flags set as a result of r0 exclusive-OR r9

TST r0, r9
Flags set as a result of r0 AND r9

Summary
We have examined the basics of the ARM architecture
Understood processor modes
Looked at the core data path
Discussed basic data processing operations

More ARM instructions

Load-Store Instructions
Transfers data between memory and processor registers
Single register transfer
Data types supported are signed and unsigned words (32 bits), half-words, bytes

Multiple-register transfer
Transfer multiple registers between memory and the processor in a single instruction

Swap
Swaps content of a memory location with the contents of a register

Single Transfer Instructions


Load and store data on an aligned boundary
LDR, LDRH, LDRB: load (word, half-word, byte)
STR, STRH, STRB: store (word, half-word, byte)

Supports different addressing modes:
Register indirect: LDR r0, [r1]
Immediate: LDR r0, [r1, #4]
12-bit offset added to the base register
Register operation: LDR r0, [r1, -r2]
Address calculated using the base register and another register

More Addressing Modes
Scaled
Address is calculated using the base address register and a barrel shift operation

Pre & Post Indexing


Pre-index with write-back: LDR r0, [r1, #4]!
Updates the base register with the new address

Post-index: LDR r0, [r1], #4
Updates the base register after the address is used

Example
Pre-indexing with write-back: LDR r0, [r1, #4]!
Before instruction execution
r0 = 0x00000000
r1 = 0x00009000
Mem32[0x00009000] = 0x01010101
Mem32[0x00009004] = 0x02020202

After instruction execution
r0 = 0x02020202
r1 = 0x00009004
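The difference between plain offset, pre-indexed with write-back, and post-indexed loads can be sketched in Python using the same register and memory values as the example. The helper names and the dictionary-based register file and memory are hypothetical modelling choices.

```python
def ldr_offset(regs, mem, rd, rn, offset):
    """LDR rd, [rn, #offset]: use base + offset, leave the base unchanged."""
    regs[rd] = mem[regs[rn] + offset]


def ldr_preindex_writeback(regs, mem, rd, rn, offset):
    """LDR rd, [rn, #offset]!: use base + offset, then write it back to the base."""
    address = regs[rn] + offset
    regs[rd] = mem[address]
    regs[rn] = address


def ldr_postindex(regs, mem, rd, rn, offset):
    """LDR rd, [rn], #offset: use the base, then add the offset to the base."""
    address = regs[rn]
    regs[rd] = mem[address]
    regs[rn] = address + offset


def fresh_state():
    regs = {"r0": 0x00000000, "r1": 0x00009000}
    mem = {0x00009000: 0x01010101, 0x00009004: 0x02020202}
    return regs, mem


regs, mem = fresh_state()
ldr_preindex_writeback(regs, mem, "r0", "r1", 4)
print(hex(regs["r0"]), hex(regs["r1"]))
```

Running the other two helpers on a fresh state shows that the plain offset form loads the same word but leaves r1 at 0x00009000, while the post-indexed form loads the word at 0x00009000 and then advances r1.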

Multiple Register Transfer


Load-store multiple instructions transfer the contents of multiple registers between memory and the processor in a single instruction
More efficient for moving blocks of memory and for saving and restoring context and stacks
These instructions can increase interrupt latency
The ARM core does not usually interrupt an instruction while it is executing

Multiple Byte Load-Store


Any subset of the current bank of registers can be transferred to memory or fetched from memory
LDM, STM

The base register Rn determines the source or destination address

Addressing Modes
LDM{IA|IB|DA|DB} and STM{IA|IB|DA|DB}, for example: LDMIA Rn!, {r1-r3}
N is the number of registers in the list

Mode  Description        Start address   End address     Rn!
IA    Increment after    Rn              Rn + 4*N - 4    Rn + 4*N
IB    Increment before   Rn + 4          Rn + 4*N        Rn + 4*N
DA    Decrement after    Rn - 4*N + 4    Rn              Rn - 4*N
DB    Decrement before   Rn - 4*N        Rn - 4          Rn - 4*N
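The start address, end address, and write-back value for each of the four addressing modes can be computed with a short Python function. This is a sketch; the function name ldm_addresses and the base address used in the example are hypothetical.

```python
def ldm_addresses(mode, rn, n):
    """Return (start, end, writeback) for a transfer of n registers.

    mode is one of "IA", "IB", "DA", "DB"; rn is the base address.
    Each register occupies one 4-byte word.
    """
    span = 4 * n
    table = {
        "IA": (rn, rn + span - 4, rn + span),      # increment after
        "IB": (rn + 4, rn + span, rn + span),      # increment before
        "DA": (rn - span + 4, rn, rn - span),      # decrement after
        "DB": (rn - span, rn - 4, rn - span),      # decrement before
    }
    return table[mode]


# Three registers, as in LDMIA Rn!, {r1-r3}, with an assumed base of 0x80010
for mode in ("IA", "IB", "DA", "DB"):
    print(mode, [hex(a) for a in ldm_addresses(mode, 0x80010, 3)])
```

Whatever the mode, the lowest-numbered register always goes to or comes from the lowest address in the range.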

Stack Processing
A stack is implemented as a linear data structure that grows up (ascending) or down (descending)
The stack pointer holds the address of the current top of the stack

Modes of Stack Operation


ARM multiple register transfer instructions support
Full ascending: grows up; SP points to the highest address containing a valid item
Empty ascending: grows up; SP points to the first empty location above the stack
Full descending: grows down; SP points to the lowest address containing a valid item
Empty descending: grows down; SP points to the first empty location below the stack

Some Stack Instructions


Full Ascending
LDMFA: translates to LDMDA (POP)
STMFA: translates to STMIB (PUSH)
SP points to the last item in the stack

Empty Descending
LDMED: translates to LDMIB (POP)
STMED: translates to STMDA (PUSH)
SP points to the first unused location
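The full/empty and ascending/descending conventions can be sketched in Python for a single-word push and pop. This is a simplified model (real LDM/STM instructions move several registers at once), and the function names push and pop are hypothetical.

```python
def push(mem, sp, value, full, ascending):
    """Push one word and return the new stack pointer."""
    step = 4 if ascending else -4
    if full:
        sp += step        # full stack: move SP first, then store
        mem[sp] = value
    else:
        mem[sp] = value   # empty stack: store first, then move SP
        sp += step
    return sp


def pop(mem, sp, full, ascending):
    """Pop one word and return (value, new stack pointer)."""
    step = 4 if ascending else -4
    if full:
        value = mem[sp]   # full stack: load first, then move SP back
        sp -= step
    else:
        sp -= step        # empty stack: move SP back first, then load
        value = mem[sp]
    return value, sp


# Full ascending: SP ends up pointing at the last item pushed
mem = {}
sp = push(mem, 0x100, 11, full=True, ascending=True)
sp = push(mem, sp, 22, full=True, ascending=True)
print(hex(sp), pop(mem, sp, full=True, ascending=True))
```

Moving SP before the store corresponds to a "before" addressing mode and moving it after corresponds to an "after" mode, which is where the translations listed above come from.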

SWAP Instruction
Special case of a load-store instruction
Swap instructions:
SWP: swap a word between memory and a register
SWPB: swap a byte between memory and a register

Useful for implementing synchronization primitives such as semaphores

Control Flow Instructions


Branch instructions
Conditional branches
Conditional execution
Branch and link instructions
Subroutine return instructions

Branch Instruction
Branch instruction: B label
Example: B forward
The address of the label is stored in the instruction as a signed PC-relative offset

Conditional branch: B<cond> label
Example: BNE loop
The branch has a condition associated with it and is executed only if the condition codes have the correct value

Example: Block memory copy


Loop
    LDMIA r9!, {r0-r7}
    STMIA r10!, {r0-r7}
    CMP   r9, r11
    BNE   Loop

r9 points to the source data, r10 points to the start of the destination, and r11 points to the end of the source

Conditional Execution
An unusual feature of the ARM instruction set is that conditional execution applies not only to branches but to all ARM instructions
Example: ADDEQ r0, r1, r2
The instruction is executed only when the zero flag is set to 1

Advantages
Reduces the number of branches
Reduces the number of pipeline flushes
Improves performance of the code
Increases code density
Whenever the conditional sequence is 3 instructions or fewer, it is better (smaller and faster) to exploit conditional execution than to use a branch
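How a condition suffix is evaluated against the N, Z, C, and V flags can be sketched in Python. The table follows the standard ARM condition codes; the function names condition_passed and addeq are hypothetical.

```python
def condition_passed(cond, n, z, c, v):
    """Return True if an instruction with this condition suffix would execute."""
    checks = {
        "EQ": z == 1,              "NE": z == 0,
        "CS": c == 1,              "CC": c == 0,
        "MI": n == 1,              "PL": n == 0,
        "VS": v == 1,              "VC": v == 0,
        "HI": c == 1 and z == 0,   "LS": c == 0 or z == 1,
        "GE": n == v,              "LT": n != v,
        "GT": z == 0 and n == v,   "LE": z == 1 or n != v,
        "AL": True,
    }
    return checks[cond]


def addeq(regs, rd, rn, rm, flags):
    """Model ADDEQ rd, rn, rm: the add happens only when Z is set."""
    if condition_passed("EQ", **flags):
        regs[rd] = (regs[rn] + regs[rm]) & 0xFFFFFFFF


regs = {"r0": 0, "r1": 2, "r2": 3}
addeq(regs, "r0", "r1", "r2", {"n": 0, "z": 1, "c": 0, "v": 0})
print(regs["r0"])
```

When the condition fails, the instruction behaves as a no-op, so no branch is needed around it.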

Branch & Link Instruction


Performs a branch and saves the address following the branch in the link register, r14
Example: BL subroutine

For nested subroutines, push r14 and any work registers that must be preserved onto a stack in memory
Example
    BL sub1
    ..
sub1
    STMFD r13!, {r0-r2, r14}
    BL sub2
    ..

Subroutine return instructions


No specific instructions
Example (1):
sub
    ..
    MOV pc, r14

Example (2): when the return address has been pushed onto the stack
sub2
    LDMFD r13!, {r0-r12, pc}

Software Interrupt Instruction (SWI)


A software interrupt instruction causes a software interrupt exception, which provides a mechanism for applications to call OS routines
Instruction: SWI{<cond>} SWI_number
When the processor executes an SWI instruction, it sets the program counter (pc) to offset 0x8 in the vector table
The instruction also forces the processor mode to SVC, which allows an operating system routine to execute

SWI
SWI is typically executed in user mode
The instruction forces the processor mode to supervisor (SVC); this allows an OS routine to be executed in privileged mode
Each SWI has an associated SWI number, which is used to represent a particular function call or feature
Parameters are passed in registers; the return value is also passed in registers

Example
PRE:
cpsr = nzcvqift_USER
pc = 0x00008000
lr = 0x003fffff (lr = r14)
r0 = 0x12

0x00008000 SWI 0x123456

POST:
cpsr = nzcvqIft_SVC
spsr = nzcvqift_USER
pc = 0x00000008
lr = 0x00008004 (lr = r14_SVC)
r0 = 0x12

Program Status Register Instructions


Two instructions control the PSRs directly
MRS transfers the contents of either the cpsr or the spsr into a register
MSR transfers the contents of a register to the cpsr or the spsr

Example
Enabling IRQ interrupts
PRE cpsr = nzcvqIFt_SVC
    MRS r1, cpsr
    BIC r1, r1, #0x80
    MSR cpsr, r1
POST cpsr = nzcvqiFt_SVC
The instructions execute in SVC mode

Coprocessor Instructions
Used to extend the instruction set
Used by cores with a coprocessor
Coprocessor-specific operations

Syntax: coprocessor data processing
CDP{<cond>} cp, opcode1, Cd, Cn, Cm {, opcode2}
cp represents the coprocessor number, between p0 and p15
The opcode fields describe the coprocessor operation
Cd, Cn, Cm are coprocessor registers

There are also coprocessor register transfer and memory transfer instructions

Thumb
Thumb encodes a subset of the 32-bit instruction set into a 16-bit subspace
Thumb has higher performance than ARM on a processor with a 16-bit data bus
Thumb has higher code density
Suited to memory-constrained embedded systems

Code density
ARM divide
    MOV   r3, #0
loop
    SUBS  r0, r0, r1
    ADDGE r3, r3, #1
    BGE   loop
    ADD   r2, r0, r1
5 x 4 = 20 bytes

Thumb divide
    MOV   r3, #0
loop
    ADD   r3, #1
    SUB   r0, r1
    BGE   loop
    SUB   r3, #1
    ADD   r2, r0, r1
6 x 2 = 12 bytes
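The divide routine compared above is a repeated-subtraction loop. A Python sketch of the ARM version makes the register roles explicit: r0 holds the value, r1 the divisor, r3 collects the quotient, and r2 receives the remainder. The function name arm_divide is hypothetical, and the model assumes a non-negative value and a positive divisor.

```python
def arm_divide(r0, r1):
    """Mirror the ARM divide loop; returns (r2 remainder, r3 quotient)."""
    if r1 <= 0:
        raise ValueError("the loop only terminates for a positive divisor")
    r3 = 0                  # MOV   r3, #0
    while True:
        r0 = r0 - r1        # SUBS  r0, r0, r1
        if r0 >= 0:         # GE condition holds
            r3 += 1         # ADDGE r3, r3, #1
        else:
            break           # BGE loop falls through
    r2 = r0 + r1            # ADD   r2, r0, r1 (undo the last subtraction)
    return r2, r3


print(arm_divide(17, 5))
```

The Thumb version computes the same result; because only its branch can be conditional, it increments r3 unconditionally inside the loop and corrects it with one extra subtraction afterwards.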

Thumb instructions
Only the low registers r0 to r7 are fully accessible
The higher registers are accessible with MOV, ADD, and CMP instructions

Only branch instructions can be conditionally executed
Barrel shift operations are separate instructions

ARM-Thumb Interworking
To call a Thumb routine from an ARM routine, the core has to change state
This changes the T bit in the CPSR

The BX and BLX instructions can be used for the switch
Example: BX r0 ; BLX r0
Enters Thumb state if bit 0 of the address in the register is set to binary 1; otherwise it enters ARM state

Thumb (T) Architecture


The Thumb instruction decoder is placed in the pipeline
The change to Thumb state happens by changing the state of multiplexer A1
A1 selects the 16-bit data

ARMv5E Extensions
Extensions to facilitate signal processing operations
Supports
Signed multiply-accumulate instructions
Saturation arithmetic
Greater flexibility and efficiency when manipulating 16-bit values, for applications such as 16-bit digital audio processing

Saturation Arithmetic
Normal ARM arithmetic instructions wrap around when an integer value overflows
Using ARMv5E instructions, you can saturate the result
Once the highest number is exceeded, the result remains at the maximum value
Similarly, the result remains at the minimum value on underflow

Example instructions: QADD, QSUB
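The contrast between wrap-around and saturating arithmetic can be sketched in Python for signed 32-bit values. The function names are hypothetical, and the model leaves out the sticky Q flag that the real instructions set when saturation occurs.

```python
INT32_MAX = 2**31 - 1
INT32_MIN = -(2**31)


def saturate32(value):
    """Clamp a Python integer into the signed 32-bit range."""
    return max(INT32_MIN, min(INT32_MAX, value))


def qadd(a, b):
    """Saturating add, in the manner of QADD."""
    return saturate32(a + b)


def qsub(a, b):
    """Saturating subtract, in the manner of QSUB."""
    return saturate32(a - b)


def wrapping_add(a, b):
    """Ordinary 32-bit add: the result wraps around on overflow."""
    result = (a + b) & 0xFFFFFFFF
    return result - (1 << 32) if result & 0x80000000 else result


print(wrapping_add(INT32_MAX, 1), qadd(INT32_MAX, 1))
```

For audio samples, saturation clips a loud signal at full scale instead of flipping it to the opposite sign.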

Summary
We have studied the instruction set of ARM processors
We have examined the Thumb mode of operation
We shall look into interrupt processing and other features of the ARM architecture next

ARM: Interrupt Processing

ARM Exceptions:Review
Exception                         Mode
Fast interrupt request            FIQ
Interrupt request                 IRQ
SWI and reset                     SVC
Prefetch abort and data abort     abort
Undefined instruction             undefined

When Exception Occurs


An exception causes a mode change and
Saves the cpsr to the spsr of the exception mode
Saves the pc to the lr of the exception mode
Sets the cpsr to the exception mode
Sets the pc to the address of the exception handler
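The four steps of exception entry can be sketched as a Python model. The vector offsets are the standard ARM ones; the dictionary layout of the cpu state, the mode labels, and the function name take_exception are hypothetical, and the return address is passed in because the exact value the hardware saves depends on the exception type.

```python
VECTOR_OFFSET = {
    "reset": 0x00, "undefined": 0x04, "swi": 0x08, "prefetch_abort": 0x0C,
    "data_abort": 0x10, "irq": 0x18, "fiq": 0x1C,
}
EXCEPTION_MODE = {
    "reset": "SVC", "undefined": "UND", "swi": "SVC", "prefetch_abort": "ABT",
    "data_abort": "ABT", "irq": "IRQ", "fiq": "FIQ",
}


def take_exception(cpu, kind, return_address):
    """Apply the exception entry steps to a simple dictionary-based CPU state."""
    mode = EXCEPTION_MODE[kind]
    cpu["spsr"][mode] = dict(cpu["cpsr"])  # 1. save cpsr in the mode's spsr
    cpu["lr"][mode] = return_address       # 2. save the return address in the mode's lr
    cpu["cpsr"]["mode"] = mode             # 3. switch the cpsr to the exception mode
    cpu["cpsr"]["I"] = 1                   #    IRQ is masked on every exception entry
    if kind in ("reset", "fiq"):
        cpu["cpsr"]["F"] = 1               #    FIQ is masked only by reset and FIQ
    cpu["pc"] = VECTOR_OFFSET[kind]        # 4. jump to the vector table entry


cpu = {"pc": 0x8000, "cpsr": {"mode": "USR", "I": 0, "F": 0}, "spsr": {}, "lr": {}}
take_exception(cpu, "swi", 0x8004)
print(hex(cpu["pc"]), cpu["cpsr"], cpu["spsr"], cpu["lr"])
```

Because the saved cpsr and lr live in registers banked for the exception mode, the interrupted program's own r14 is left untouched.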

Vector Table
The vector table is a table of addresses to which the ARM core branches when an exception occurs
There is a fixed offset for each type of exception

These addresses contain instructions of one of the following forms:
B <address>: branches relative to the pc
LDR pc, [pc, #offset]: loads the handler address from memory into the pc
MOV pc, #immediate: loads an immediate value into the pc

Exception Priorities
Exception               Priority   I bit   F bit
Reset                   1          1       1
Data abort              2          1       -
FIQ                     3          1       1
IRQ                     4          1       -
Prefetch abort          5          1       -
SWI                     6          1       -
Undefined instruction   6          1       -

Exception Handlers
Reset handler
Initializes the system, setting up stack pointers, memory, and external interrupt sources before enabling IRQ or FIQ
Code should be designed to avoid triggering further exceptions

Data Abort
Occurs when the memory controller indicates that an invalid memory address has been accessed
An FIQ exception can be raised within the data abort handler

Exception Handlers (contd.)


FIQ
Occurs when an external peripheral asserts the FIQ input signal
The core disables both FIQ and IRQ interrupts

IRQ
Occurs when an external device asserts the IRQ input signal
The IRQ handler is entered only if neither an FIQ exception nor a data abort exception occurs
On entry, IRQ is disabled and remains disabled for the duration of the handler unless the handler re-enables it

Exception Handlers (contd.)


Prefetch Abort
Occurs when an attempt to fetch an instruction results in a memory fault
An FIQ exception can still be serviced

Undefined instruction
Occurs when an instruction is not in the ARM or Thumb instruction set

SWI and undefined instruction have the same level of priority because they cannot occur together

Returning from Exception Handler


The exception handler must not corrupt lr
After servicing is complete, return to normal execution occurs
By moving the correct value of the link register r14 into the pc
By restoring the cpsr from the spsr

Interrupt Assignment
An interrupt controller connects multiple external interrupts to either FIQ or IRQ
IRQs are normally assigned to general-purpose interrupts
Example: a periodic timer interrupt to force a context switch

FIQ is reserved for an interrupt source that requires a fast response time

Interrupt Latency
Hardware and software latency
Software methods to reduce latency
Nested handlers, which allow further interrupts to occur while an existing interrupt is being serviced, by re-enabling interrupts inside the service routine
Programming the interrupt controller to ignore interrupts of the same or lower priority
Higher-priority interrupts then have lower average latency

Stack Organization
A stack has to be set up for each processor mode
To be done every time the processor is reset
Change to each mode in turn by writing its bit pattern to the CPSR, then initialise sp

Design decisions
Location and mode (a descending stack is common)
Size
A nested interrupt handler requires a larger stack

ARM I/O System


Handles all I/O devices using memory-mapped I/O
Interrupt support:
Fast interrupt
Normal interrupt

DMA Support
Large-bandwidth data transfer

ARM CPU Core


Processor Core + Cache + MMU

Diagrams from:
ARM System-on-Chip Architecture, Steve Furber, Addison Wesley, 2000
ARM Architecture Reference Manual, David Seal, Addison Wesley, 2001

ARM 7 Processor Core


Low-end ARM core for applications such as mobile phones
TDMI
T: Thumb
D: on-chip debug support, enabling the processor to halt in response to a debug request
M: enhanced multiplier, yielding a full 64-bit result
I: EmbeddedICE hardware

Von Neumann architecture
3-stage pipeline, CPI ~ 1.9

ARM single-cycle instruction pipeline operation


[Figure: three consecutive instructions (1, 2, 3) overlapped in time; each passes through fetch, decode, and execute, starting one cycle after the previous instruction]

3 stage Pipeline

Before returning from an exception handler, the lr value must be adjusted appropriately

Pipeline Operation
An instruction does not always complete in every cycle
Example: LDMIA r0, {r2, r3} (multiple load):
2 registers to load, so the instruction is in the execute stage for two cycles
Execution of the prefetched instructions is delayed

Branches, subroutine calls, and exceptions affect pipeline efficiency

Interrupt Pipeline Example


FIQ

FIQ minimum latency : 7 cycles

ARM7TDMI organization
[Figure: ARM7TDMI organization. Blocks: processor core, EmbeddedICE, bus splitter, JTAG TAP controller, and scan chains 0, 1, and 2. Signals: A[31:0], D[31:0], Din[31:0], Dout[31:0], opc, r/w, mreq, trans, mas[1:0], extern0, extern1, TCK, TMS, TRST, TDI, TDO, and other signals]

The ARM7TDMI core interface signals

[Figure: the ARM7TDMI core interface signals.
Signal groups: clock control, configuration, interrupts, initialization, bus control, debug, coprocessor interface, power, memory interface, MMU interface, state, TAP information, boundary scan extension, JTAG controls.
Signals: mclk, wait, eclk, bigend, irq, fiq, isync, reset, enin, enout, enouti, abe, ale, ape, dbe, tbe, busen, highz, busdis, ecapclk, dbgrq, breakpt, dbgack, exec, extern1, extern0, dbgen, rangeout0, rangeout1, dbgrqi, commrx, commtx, opc, cpi, cpa, cpb, Vdd, Vss, A[31:0], Din[31:0], Dout[31:0], D[31:0], bl[3:0], r/w, mas[1:0], mreq, seq, lock, trans, mode[4:0], abort, Tbit, tapsm[3:0], ir[3:0], tdoen, tck1, tck2, screg[3:0], drivebs, ecapclkbs, icapclkbs, highz, pclkbs, rstclkbs, sdinbs, sdoutbs, shclkbs, shclk2bs, TRST, TCK, TMS, TDI, TDO]

Interface signals

Interface signals (contd.)

Interface signals (contd.)

ARM Memory Interface

ARM9TDMI
Harvard architecture
Increases available memory bandwidth
Instruction memory interface
Data memory interface
Simultaneous access to instruction and data memory

5-stage pipeline
Changes implemented to
Improve CPI to ~1.5
Improve maximum clock frequency

5 stage Pipe-line Organization


Fetch
Decode
Execute
Buffer/Data: access data memory or buffer
Write back: to register file

[Figure: 5-stage pipeline organization. Fetch: I-cache, next pc, +4. Decode: instruction decode, immediate fields, register read, with r15 read as pc + 8. Execute: shift, reg shift, ALU, mul, forwarding paths, pre-index and post-index address paths, +4 for LDM/STM, mux for B, BL, MOV pc, SUBS pc. Buffer/data: load/store address, D-cache, byte repl., rot/sgn ex, LDR pc. Write-back: register write]

ARM7TDMI and ARM9TDMI pipeline comparisons


ARM7TDMI:
Fetch: instruction fetch
Decode: Thumb decompress, ARM decode, reg read
Execute: shift/ALU, reg write

ARM9TDMI:
Fetch: instruction fetch
Decode: decode, reg read
Execute: shift/ALU
Memory: data memory access
Write: reg write

DSP enhancements in ARM9E


New instruction additions give architecture v5TE
New 32x16 and 16x16 multiply and multiply-accumulate instructions
SMLAxy, SMLAWy, SMLALxy, SMULxy, SMULWy
Allow independent access to the 16-bit halves of registers
Give efficient use of 32-bit bandwidth for packed 16-bit operands

Zero-overhead fractional saturating arithmetic
QADD, QSUB, QDADD, QDSUB

Enhancements in 9E
Count leading zeros instruction
CLZ for faster normalisation and division

Single cycle 32x16 multiplier array


speeds up all ARM9E multiply instructions

New multiply-accumulate instructions


SMLAxy Rd, Rm, Rs, Rn
x and y select the upper (T) or lower (B) 16 bits of the 32-bit registers Rm and Rs respectively
16x32 or 16x16 multiply gives a 48-bit or 32-bit product
Rn: 32-bit register or 64-bit register pair as accumulation source
Rd: 32-bit register or 64-bit register pair as accumulation destination

Other instructions include:
SMUL: 16x16 => 32
SMLAL: 16x16 + 64 => 64
SMLAW: 32x16 + 32 => 32
SMULW: 32x16 => 32
MLA: 32x32 + 32 => 32
MLAL: 32x32 + 64 => 64
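The half-word selection in the 16x16 multiply-accumulate can be sketched in Python. The helper names are hypothetical; x and y are passed as "T" or "B", results are returned as unsigned 32-bit register values, and the Q flag that the real instruction sets on accumulate overflow is not modelled.

```python
MASK32 = 0xFFFFFFFF


def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def half(reg, which):
    """Select the top ("T") or bottom ("B") signed 16-bit half of a register."""
    raw = (reg >> 16) if which == "T" else reg
    return to_signed(raw, 16)


def smulxy(rm, rs, x, y):
    """16x16 => 32 signed multiply of the selected halves."""
    return (half(rm, x) * half(rs, y)) & MASK32


def smlaxy(rm, rs, rn, x, y):
    """16x16 + 32 => 32 signed multiply-accumulate of the selected halves."""
    return (half(rm, x) * half(rs, y) + to_signed(rn, 32)) & MASK32


# Top half of Rm is -1 and top half of Rs is 4, so the result is -4 + 100
print(smlaxy(0xFFFF0003, 0x00040005, 100, "T", "T"))
```

Packing two 16-bit samples per register and choosing halves in the instruction avoids separate shift and sign-extend steps.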

ARM9E Datapath
[Figure: ARM9E datapath. Blocks: instruction decode and datapath control logic, REGBANK (r0 to r14, PC, PSR), MUL, BARREL SHIFTER, CLZ, SAT(x2), ACC, SAT, byte rotate / sign extension, byte/half replicate, DINC, IINC. Buses and ports: RDATA[], WDATA[], DA[], InsAddr, AData[..], BData[..], Imm, RESULT[..]]

ARM920T organization
[Figure: ARM920T organization. CPU core built around the ARM9TDMI, with EmbeddedICE and JTAG, an external coprocessor interface, and CP15. Instruction side: instruction cache and instruction MMU (virtual IA, physical IA). Data side: data cache and data MMU (virtual DA, physical DA), write buffer, physical address tag, copy-back DA. AMBA interface with AMBA address and AMBA data]

AMBA
[Figure: example AMBA system. Blocks: ARM, TIC, bus interface, external bus interface, external ROM, external RAM, on-chip RAM, arbiter, decoder, bridge, timer, interrupt controller, reset, remap/pause. Buses: AHB or ASB system bus, APB peripheral bus]
AMBA (Advanced Microcontroller Bus Architecture): ARM's on-chip bus specification

Simple ARM based System


On-chip there will be an ARM core together with a number of system-dependent peripherals. Also required is some form of interrupt controller, which receives interrupts from the peripherals and raises the IRQ or FIQ input to the ARM as appropriate.
This interrupt controller may also provide hardware assistance for prioritizing interrupts.

Simple ARM based System


As far as memory is concerned there is likely to be some (cheap) narrow off-chip ROM (or flash) used to boot the system from. There is also likely to be some 16-bit wide RAM used to store most of the runtime data and perhaps some code copied out of the flash. Then on-chip there may well be some 32-bit memory used to store the interrupt handlers and perhaps stacks.

Example ARM-based System


[Figure: example ARM-based system. Blocks: ARM core, 8-bit ROM, 16-bit RAM, 32-bit RAM, I/O, peripherals, and an interrupt controller with nIRQ and nFIQ outputs]


ARM v5TEJ
J: supports implementation of the Java virtual machine
Offers hardware and software acceleration for optimized bytecode execution

ARM v6 Architecture
SIMD (single instruction, multiple data) instructions for exploiting data parallelism
High code density and low power
By slicing up the existing 32-bit datapath into four 8-bit or two 16-bit slices
Example: QADD8<cond> Rd, Rn, Rm
Signed saturating 8-bit SIMD add

Other features
Sum of absolute differences instructions
Example: USAD8<cond> Rd, Rm, Rs
Sum of absolute differences between corresponding 8-bit values
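Slicing a 32-bit register into four 8-bit lanes can be sketched in Python. The function names are hypothetical: qadd8 models a signed saturating add per lane, and sum_abs_diff8 models a sum of absolute differences over unsigned lanes.

```python
def lanes8(value):
    """Split a 32-bit value into four unsigned bytes, lowest lane first."""
    return [(value >> shift) & 0xFF for shift in (0, 8, 16, 24)]


def to_signed8(byte):
    """Interpret an unsigned byte as a signed 8-bit value."""
    return byte - 256 if byte & 0x80 else byte


def qadd8(rn, rm):
    """Signed saturating add applied independently to each 8-bit lane."""
    result = 0
    for i, (a, b) in enumerate(zip(lanes8(rn), lanes8(rm))):
        total = max(-128, min(127, to_signed8(a) + to_signed8(b)))
        result |= (total & 0xFF) << (8 * i)
    return result


def sum_abs_diff8(rm, rs):
    """Sum of absolute differences between corresponding unsigned bytes."""
    return sum(abs(a - b) for a, b in zip(lanes8(rm), lanes8(rs)))


print(hex(qadd8(0x7F010203, 0x01010101)), sum_abs_diff8(0x01020304, 0x04030201))
```

In the first call the top lane saturates at 0x7F while the other three lanes add normally, which is what makes one instruction do the work of four byte operations.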

Dual 16x16 multiply
Cryptographic multiplication
A new 64 + 32x32 multiply-accumulate operation

Multiprocessing synchronization primitives

Use of ARM Core


ARM-based products are brought to market by manufacturers such as Atmel, Cirrus Logic, Intel, and Samsung
Most products are based upon the 7TDMI and 920T cores

ARM is mostly used as a processor core in SoCs and ASICs
A number of ASSPs (application-specific standard products) are available, for example for communication applications
Example: Philips VWS22100: ARM7-based GSM baseband chip

Intel's ARM Derivative
XScale
ARM v5TE instruction set
Intel-developed microarchitecture
Coprocessor instructions for extension

Summary
We have discussed the architecture of ARM processors
We have discussed exception processing
Looked at pipeline architecture
Understood key aspects of the ARM CPU core
