
DLX Architecture

The architecture of DLX was chosen based on observations about the most frequently used
primitives in programs. DLX provides a good architectural model for study, not only because of
the recent popularity of this type of machine, but also because it is easy to understand.
Like most recent load/store machines, DLX emphasizes:
- A simple load/store instruction set
- Design for pipelining efficiency
- An easily decoded instruction set
- Efficiency as a compiler target

Registers for DLX


- Thirty-two 32-bit general-purpose registers (GPRs), named R0, R1, ..., R31. The value of R0 is
always 0.
- Thirty-two floating-point registers (FPRs), which can be used as
  - 32 single-precision (32-bit) registers, or
  - even-odd pairs holding double-precision values. Thus, the 64-bit FPRs are named
    F0, F2, ..., F30.
- A few special registers, which can be transferred to and from the integer registers.

Data types for DLX


For integer data:
- 8-bit bytes
- 16-bit halfwords
- 32-bit words
For floating point:
- 32-bit single precision
- 64-bit double precision
The DLX operations work on 32-bit integers and 32- or 64-bit floating point. Bytes and
halfwords are loaded into registers with either zeros or the sign bit replicated to fill the 32 bits of the
registers.
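The two ways of filling a 32-bit register from a byte or halfword can be sketched in Python. This is a minimal illustration, not part of the DLX definition; the function names `zero_extend` and `sign_extend` are hypothetical helpers introduced here.

```python
def zero_extend(value: int, bits: int) -> int:
    """Fill the upper bits of a 32-bit register with zeros."""
    return value & ((1 << bits) - 1)


def sign_extend(value: int, bits: int) -> int:
    """Replicate the sign bit of a `bits`-wide value across a 32-bit register."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    # Flipping and then subtracting the sign bit yields the signed value;
    # masking to 32 bits gives the register contents as an unsigned pattern.
    return ((value ^ sign_bit) - sign_bit) & 0xFFFFFFFF


byte = 0x80                      # a byte whose sign bit is set
unsigned_load = zero_extend(byte, 8)
signed_load = sign_extend(byte, 8)
print(f"{unsigned_load:#010x} {signed_load:#010x}")
```

The same byte therefore produces two different register values depending on which kind of load is used.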

Memory
- Byte addressable
- Big Endian mode
- 32-bit addresses
- Two addressing modes (immediate and displacement). Register deferred addressing is obtained by
placing 0 in the displacement field, and absolute addressing with a 16-bit field by using R0 as the
base register.
- Memory references are loads and stores between memory and GPRs or FPRs
- Accesses involving GPRs can be to a byte, to a halfword, or to a word
- All memory accesses must be aligned
- There are instructions for moving between an FPR and a GPR
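As a rough illustration of these memory rules, the following Python sketch loads an aligned 32-bit word in big-endian order from a displacement address. The function `load_word` and the sample memory contents are hypothetical; only the rules themselves (byte addressing, big-endian order, alignment, base plus displacement) come from the list above.

```python
def load_word(memory: bytes, base: int, displacement: int) -> int:
    """Load a 32-bit big-endian word from address base + displacement."""
    address = (base + displacement) & 0xFFFFFFFF   # 32-bit address space
    if address % 4 != 0:
        raise ValueError(f"misaligned word access at {address:#x}")
    return int.from_bytes(memory[address:address + 4], byteorder="big")


memory = bytes([0x12, 0x34, 0x56, 0x78, 0x9A, 0xBC, 0xDE, 0xF0])

# Displacement of 0: behaves like register deferred addressing.
deferred = load_word(memory, base=4, displacement=0)
# Base of 0 (the value R0 always holds): behaves like absolute addressing.
absolute = load_word(memory, base=0, displacement=0)
print(f"{deferred:#010x} {absolute:#010x}")
```

In big-endian order the byte at the lowest address becomes the most significant byte of the word.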

Instructions
- All instructions are 32 bits long (fixed length)
- Instructions must be aligned
The complete list of DLX instructions is given under DLX Instruction Set.

Operations
There are four classes of instructions:
Load/Store
Any of the GPRs or FPRs may be loaded and stored except that loading R0 has no effect.
ALU Operations
All ALU instructions are register-register instructions.
The operations are:
- add
- subtract
- AND
- OR
- XOR
- shifts
Compare instructions compare two registers (=, !=, <, >, <=, >=).
If the condition is true, these instructions place a 1 in the destination register; otherwise they
place a 0.
Branches/Jumps
All branches are conditional. The branch condition is specified by the instruction, which may test
the source register for zero or nonzero.
Floating-Point Operations
- add
- subtract
- multiply
- divide
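The way compare instructions and conditional branches work together, as described in the list above, can be sketched in Python. This is an illustrative model only: the function names, the `"EQZ"`/`"NEZ"` labels, and the register dictionary are hypothetical, and the two-letter condition codes are used here simply as dictionary keys.

```python
import operator

# Relations available to compare instructions, keyed by a condition code.
CONDITIONS = {
    "EQ": operator.eq, "NE": operator.ne,
    "LT": operator.lt, "GT": operator.gt,
    "LE": operator.le, "GE": operator.ge,
}


def set_conditional(cond: str, a: int, b: int) -> int:
    """Return 1 if the relation holds between the two register values, else 0."""
    return 1 if CONDITIONS[cond](a, b) else 0


def branch_taken(kind: str, reg_value: int) -> bool:
    """A branch tests one register for zero ("EQZ") or nonzero ("NEZ")."""
    return reg_value == 0 if kind == "EQZ" else reg_value != 0


regs = {"R1": 3, "R2": 7, "R3": 0}
regs["R3"] = set_conditional("LT", regs["R1"], regs["R2"])   # R3 <- (R1 < R2)
taken = branch_taken("NEZ", regs["R3"])                      # branch if R3 != 0
print(regs["R3"], taken)
```

A general comparison is thus split in two: a compare writes 0 or 1 into a register, and the branch only has to test that register against zero.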

DLX Instruction Set


Complete list of the instructions in DLX.

Instruction type/opcode      Instruction meaning
---------------------------  ------------------------------------------------------------
Data transfers               Move data between registers and memory, or between the
                             integer and FP or special registers; the only memory
                             addressing mode is 16-bit displacement + contents of a GPR
LB, LBU, SB                  Load byte, load byte unsigned, store byte
LH, LHU, SH                  Load halfword, load halfword unsigned, store halfword
LW, SW                       Load word, store word (to/from integer registers)
LF, LD, SF, SD               Load SP float, load DP float, store SP float, store DP float
                             (SP - single precision, DP - double precision)
MOVI2S, MOVS2I               Move from/to GPR to/from a special register
MOVF, MOVD                   Copy one floating-point register or a DP pair to another
                             register or pair
MOVFP2I, MOVI2FP             Move 32 bits from/to FP register to/from integer registers

Arithmetic / Logical         Operations on integer or logical data in GPRs; signed
                             arithmetic traps on overflow
ADD, ADDI, ADDU, ADDUI       Add, add immediate (all immediates are 16 bits); signed and
                             unsigned
SUB, SUBI, SUBU, SUBUI       Subtract, subtract immediate; signed and unsigned
MULT, MULTU, DIV, DIVU       Multiply and divide, signed and unsigned; operands must be
                             floating-point registers; all operations take and yield
                             32-bit values
AND, ANDI                    And, and immediate
OR, ORI, XOR, XORI           Or, or immediate, exclusive or, exclusive or immediate
LHI                          Load high immediate - loads upper half of register with
                             immediate
SLL, SRL, SRA,               Shifts: both immediate (S__I) and variable form (S__); shifts
SLLI, SRLI, SRAI             are shift left logical, right logical, right arithmetic
S__, S__I                    Set conditional: "__" may be LT, GT, LE, GE, EQ, NE

Control                      Conditional branches and jumps; PC-relative or through
                             register
BEQZ, BNEZ                   Branch GPR equal/not equal to zero; 16-bit offset from PC
BFPT, BFPF                   Test comparison bit in the FP status register and branch;
                             16-bit offset from PC
J, JR                        Jumps: 26-bit offset from PC (J) or target in register (JR)
JAL, JALR                    Jump and link: save PC+4 to R31, target is PC-relative (JAL)
                             or a register (JALR)
TRAP                         Transfer to operating system at a vectored address
RFE                          Return to user code from an exception; restore user mode

Floating point               Floating-point operations on DP and SP formats
ADDD, ADDF                   Add DP, SP numbers
SUBD, SUBF                   Subtract DP, SP numbers
MULTD, MULTF                 Multiply DP, SP floating point
DIVD, DIVF                   Divide DP, SP floating point
CVTF2D, CVTF2I, CVTD2F,      Convert instructions: CVTx2y converts from type x to type y,
CVTD2I, CVTI2F, CVTI2D       where x and y are one of I (Integer), D (Double precision),
                             or F (Single precision). Both operands are in the FP
                             registers.
__D, __F                     DP and SP compares: "__" may be LT, GT, LE, GE, EQ, NE; set
                             comparison bit in FP status register.

Addressing Modes
Addressing modes are the ways in which architectures specify the address of an object they want to
access. In GPR machines, an addressing mode can specify a constant, a register or a location in
memory.

The most common names for addressing modes (names may differ among architectures):

Addressing mode    Example instruction   Meaning                     When used
-----------------  --------------------  --------------------------  ---------------------------------------
Register           Add R4, R3            R4 <- R4 + R3               When a value is in a register
Immediate          Add R4, #3            R4 <- R4 + 3                For constants
Displacement       Add R4, 100(R1)       R4 <- R4 + M[100+R1]        Accessing local variables
Register deferred  Add R4, (R1)          R4 <- R4 + M[R1]            Accessing using a pointer or a
                                                                     computed address
Indexed            Add R3, (R1 + R2)     R3 <- R3 + M[R1+R2]         Useful in array addressing:
                                                                     R1 - base of array
                                                                     R2 - index amount
Direct             Add R1, (1001)        R1 <- R1 + M[1001]          Useful in accessing static data
Memory deferred    Add R1, @(R3)         R1 <- R1 + M[M[R3]]         If R3 is the address of a pointer p,
                                                                     then mode yields *p
Autoincrement      Add R1, (R2)+         R1 <- R1 + M[R2]            Useful for stepping through arrays in
                                         R2 <- R2 + d                a loop.
                                                                     R2 - start of array
                                                                     d - size of an element
Autodecrement      Add R1, -(R2)         R2 <- R2 - d                Same as autoincrement. Both can also
                                         R1 <- R1 + M[R2]            be used to implement a stack as push
                                                                     and pop
Scaled             Add R1, 100(R2)[R3]   R1 <- R1 + M[100+R2+R3*d]   Used to index arrays. May be applied
                                                                     to any base addressing mode in some
                                                                     machines.
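The memory-referencing modes listed above differ only in how the operand address is formed. The following Python sketch computes that address for several of them. It is an illustration under stated assumptions: the function `operand_address`, its keyword parameters, and the sample register and memory contents are all hypothetical.

```python
def operand_address(mode: str, regs: dict, mem: dict, *, base: str = "",
                    index: str = "", disp: int = 0, d: int = 1) -> int:
    """Return the memory address an operand refers to under the given mode."""
    if mode == "displacement":        # 100(R1)      -> 100 + R1
        return disp + regs[base]
    if mode == "register_deferred":   # (R1)         -> R1
        return regs[base]
    if mode == "indexed":             # (R1 + R2)    -> R1 + R2
        return regs[base] + regs[index]
    if mode == "direct":              # (1001)       -> 1001
        return disp
    if mode == "memory_deferred":     # @(R3)        -> M[R3]
        return mem[regs[base]]
    if mode == "scaled":              # 100(R2)[R3]  -> 100 + R2 + R3*d
        return disp + regs[base] + regs[index] * d
    raise ValueError(f"unknown addressing mode: {mode}")


regs = {"R1": 200, "R2": 8, "R3": 3}
mem = {3: 1001}          # M[3] holds a pointer to address 1001

print(operand_address("displacement", regs, mem, base="R1", disp=100))
print(operand_address("scaled", regs, mem, base="R2", index="R3", disp=100, d=4))
```

Register and immediate modes are left out because they do not produce a memory address, and the autoincrement and autodecrement modes are omitted because they also update a register.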

An Implementation of DLX
This unpipelined implementation is not the most economical or the highest-performance
implementation without pipelining. Instead, it is designed to lead naturally to a pipelined
implementation.
Implementing the instruction set requires the introduction of several temporary registers that are
not part of the architecture.
Every DLX instruction can be implemented in at most five clock cycles. The five clock cycles
are:
- Instruction fetch cycle (IF)
- Instruction decode/register fetch cycle (ID)
- Execution/effective address cycle (EX)
- Memory access/branch completion cycle (MEM)
- Write-back cycle (WB)

Instruction fetch cycle (IF):


IR  <- Mem[PC]
NPC <- PC + 4
Operation:
- Send out the PC and fetch the instruction from memory into the instruction register (IR).
- Increment the PC by 4 to address the next sequential instruction.
- The IR is used to hold the instruction that will be needed on subsequent clock cycles.
- The NPC is used to hold the next sequential PC (program counter).

Instruction decode/register fetch (ID):


A <- Regs[IR6..10]
B <- Regs[IR11..15]
Imm <- ((IR16)^16 ## IR16..31)
Operation:
- Decode the instruction and access the register file to read the registers.
- The outputs of the general-purpose registers are read into two temporary registers (A and B) for
use in later clock cycles.
- The lower 16 bits of the IR are also sign-extended and stored into the temporary register Imm,
for use in the next cycle. In the notation above, (IR16)^16 is bit 16 of the IR replicated 16 times
and ## denotes concatenation.
- Decoding is done in parallel with reading registers, which is possible because these fields are at
a fixed location in the DLX instruction format. This technique is known as fixed-field decoding.
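The ID step can be sketched in Python as plain bit-field extraction. The sketch assumes that bit 0 is the most significant bit of the 32-bit IR, which is consistent with IR16..31 being described as the lower 16 bits. The helper names and the sample instruction word are hypothetical, and `Imm` is kept as a signed Python integer rather than a 32-bit pattern.

```python
def bits(word: int, first: int, last: int) -> int:
    """Extract bits first..last of a 32-bit word, with bit 0 the most significant."""
    width = last - first + 1
    return (word >> (31 - last)) & ((1 << width) - 1)


def decode(ir: int, regs: list) -> tuple:
    """Read A, B and the sign-extended immediate from fixed positions in the IR."""
    a = regs[bits(ir, 6, 10)]
    b = regs[bits(ir, 11, 15)]
    imm16 = bits(ir, 16, 31)
    imm = imm16 - 0x10000 if imm16 & 0x8000 else imm16   # replicate the sign bit
    return a, b, imm


# Hypothetical instruction word: bits 6..10 = 2, bits 11..15 = 5, immediate = -4.
ir = (2 << 21) | (5 << 16) | 0xFFFC
regs = list(range(0, 320, 10))        # R0 = 0, R1 = 10, R2 = 20, ...
a, b, imm = decode(ir, regs)
print(a, b, imm)
```

Because the register fields sit at the same positions in every instruction, both reads can start before the opcode has been examined.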

Execution/Effective address cycle (EX):


The ALU operates on the operands prepared in the prior cycle, performing one of four functions
depending on the DLX instruction type.
Memory reference:
ALUOutput <- A + Imm
Operation: The ALU adds the operands to form the effective address and places the result into
the register ALUOutput.
Register-Register ALU instruction:
ALUOutput <- A op B
Operation: The ALU performs the operation specified by the opcode on the value in register A
and on the value in register B. The result is placed in the register ALUOutput.
Register-Immediate ALU instruction:
ALUOutput <- A op Imm
Operation: The ALU performs the operation specified by the opcode on the value in register A
and on the value in register Imm. The result is placed in the register ALUOutput.
Branch:
ALUOutput <- NPC + Imm
Cond <- (A op 0)
Operation:
- The ALU adds the NPC to the sign-extended immediate value in Imm to compute the address of
the branch target.
- Register A, which has been read in the prior cycle, is checked to determine whether the branch
is taken.
- The comparison operation op is the relational operator determined by the branch opcode (e.g. op
is "==" for the instruction BEQZ).

Memory access/branch completion cycle (MEM):


The only DLX instructions active in this cycle are loads, stores, and branches.
Memory reference:
LMD <- Mem[ALUOutput]
or
Mem[ALUOutput] <- B
Operation:
- Access memory if needed.
- If the instruction is a load, data returns from memory and is placed in the LMD (load memory
data) register.
- If the instruction is a store, data from the B register is written into memory.
- In either case the address used is the one computed during the prior cycle and stored in the
register ALUOutput.
Branch:
if (Cond) PC <- ALUOutput
else PC <- NPC
Operation:
- If the instruction branches, the PC is replaced with the branch destination address in the register
ALUOutput.
- Otherwise, the PC is replaced with the incremented PC in the register NPC.

Write-back cycle (WB):


Register-Register ALU instruction:
Regs[IR16..20] <- ALUOutput
Register-Immediate ALU instruction:
Regs[IR11..15] <- ALUOutput
Load instruction:
Regs[IR11..15] <- LMD
Operation:
- Write the result into the register file, whether it comes from memory (LMD) or from the ALU
(ALUOutput).
- The register destination field is in one of two positions depending on the opcode.
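The five cycles described above can be strung together into a small unpipelined interpreter. The Python sketch below is a simplified model, not the actual datapath: instructions are stored as already-decoded records (the `Instr` class and its field names are hypothetical), memory is a dictionary, and only a few instruction kinds are covered.

```python
import operator
from dataclasses import dataclass

ALU = {"+": operator.add, "-": operator.sub, "&": operator.and_,
       "|": operator.or_, "^": operator.xor}


@dataclass
class Instr:
    kind: str        # "load", "store", "alu_rr", "alu_ri", "beqz" or "bnez"
    rs1: int = 0     # first source register
    rs2: int = 0     # second source, or destination of loads and immediates
    rd: int = 0      # destination of register-register ALU instructions
    imm: int = 0
    op: str = "+"


def step(pc: int, regs: list, imem: dict, dmem: dict) -> int:
    """Run one instruction through IF, ID, EX, MEM and WB; return the new PC."""
    ir, npc = imem[pc], pc + 4                        # IF
    a, b, imm = regs[ir.rs1], regs[ir.rs2], ir.imm    # ID
    cond, lmd = False, None
    if ir.kind in ("load", "store"):                  # EX
        alu_output = a + imm
    elif ir.kind == "alu_rr":
        alu_output = ALU[ir.op](a, b)
    elif ir.kind == "alu_ri":
        alu_output = ALU[ir.op](a, imm)
    elif ir.kind in ("beqz", "bnez"):
        alu_output = npc + imm
        cond = (a == 0) if ir.kind == "beqz" else (a != 0)
    else:
        raise ValueError(f"unsupported instruction kind: {ir.kind}")
    if ir.kind == "load":                             # MEM
        lmd = dmem[alu_output]
    elif ir.kind == "store":
        dmem[alu_output] = b
    new_pc = alu_output if cond else npc
    dest, value = None, None                          # WB
    if ir.kind == "alu_rr":
        dest, value = ir.rd, alu_output
    elif ir.kind == "alu_ri":
        dest, value = ir.rs2, alu_output
    elif ir.kind == "load":
        dest, value = ir.rs2, lmd
    if dest:  # skips instructions without a destination and writes to R0
        regs[dest] = value & 0xFFFFFFFF
    return new_pc


imem = {
    0: Instr("alu_ri", rs1=0, rs2=1, imm=5),     # R1 <- R0 + 5
    4: Instr("alu_ri", rs1=0, rs2=2, imm=7),     # R2 <- R0 + 7
    8: Instr("alu_rr", rs1=1, rs2=2, rd=3),      # R3 <- R1 + R2
    12: Instr("store", rs1=0, rs2=3, imm=100),   # M[100] <- R3
    16: Instr("load", rs1=0, rs2=4, imm=100),    # R4 <- M[100]
}
regs, dmem, pc = [0] * 32, {}, 0
while pc in imem:
    pc = step(pc, regs, imem, dmem)
print(regs[3], dmem[100], regs[4])
```

The temporaries `a`, `b`, `imm`, `alu_output` and `lmd` play the role of the registers A, B, Imm, ALUOutput and LMD, which is what makes the same structure easy to split into pipeline stages later.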

The Basic Pipeline for DLX


We can pipeline the DLX datapath with almost no changes by starting a new instruction on each
clock cycle. Each of the clock cycles of the DLX datapath now becomes a pipe stage: a cycle in
the pipeline.
While each instruction takes five clock cycles to complete, during each clock cycle the hardware
will initiate a new instruction and will execute some part of the five different instructions. The
typical way to show what is going on is:
                      Clock cycle
Instruction number    1    2    3    4    5    6    7    8    9
instr i               IF   ID   EX   MEM  WB
instr i+1                  IF   ID   EX   MEM  WB
instr i+2                       IF   ID   EX   MEM  WB
instr i+3                            IF   ID   EX   MEM  WB
instr i+4                                 IF   ID   EX   MEM  WB
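The staggered pattern in this diagram follows a simple rule: instruction i+k occupies stage s (counting from 0) during clock cycle k+s+1. A short Python sketch, with the hypothetical helper `pipeline_schedule`, generates the same table for any number of instructions, assuming no stalls.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_schedule(num_instructions: int) -> list:
    """Row k lists the stage instruction i+k occupies in each clock cycle ('' if none)."""
    total_cycles = num_instructions + len(STAGES) - 1
    rows = []
    for k in range(num_instructions):
        row = [""] * total_cycles
        for s, stage in enumerate(STAGES):
            row[k + s] = stage          # instruction k enters stage s in cycle k+s+1
        rows.append(row)
    return rows


schedule = pipeline_schedule(5)
for k, row in enumerate(schedule):
    print(f"instr i+{k}: " + " ".join(f"{stage:<3}" for stage in row))
```

With five instructions the pipeline needs 5 + 4 = 9 cycles in total, and in cycle 5 every stage is busy with a different instruction.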

Pipeline Hazards

There are situations, called hazards, that prevent the next instruction in the instruction stream
from executing during its designated clock cycle. Hazards reduce the performance from
the ideal speedup gained by pipelining.
There are three classes of hazards:
- Structural hazards. They arise from resource conflicts when the hardware cannot
support all possible combinations of instructions in simultaneous overlapped execution.
- Data hazards. They arise when an instruction depends on the result of a previous
instruction in a way that is exposed by the overlapping of instructions in the pipeline.
- Control hazards. They arise from the pipelining of branches and other instructions that
change the PC.
Hazards in pipelines can make it necessary to stall the pipeline. The processor can stall on
different events:
- A cache miss. A cache miss stalls all the instructions in the pipeline, both before and after the
instruction causing the miss.
- A hazard in the pipeline. Eliminating a hazard often requires that some instructions in the
pipeline be allowed to proceed while others are delayed. When an instruction is stalled, all the
instructions issued later than the stalled instruction are also stalled. Instructions issued earlier
than the stalled instruction must continue, since otherwise the hazard will never clear.

Data Hazards
A major effect of pipelining is to change the relative timing of instructions by overlapping their
execution. This introduces data and control hazards. Data hazards occur when the pipeline
changes the order of read/write accesses to operands so that the order differs from the order seen
by sequentially executing instructions on the unpipelined machine.
Consider the pipelined execution of these instructions:

                    Clock cycle
Instruction         1      2      3      4      5      6      7      8      9
ADD R1, R2, R3      IF     ID     EX     MEM    WB
SUB R4, R5, R1             IF     IDsub  EX     MEM    WB
AND R6, R1, R7                    IF     IDand  EX     MEM    WB
OR  R8, R1, R9                           IF     IDor   EX     MEM    WB
XOR R10, R1, R11                                IF     IDxor  EX     MEM    WB
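The timing in this example can be checked mechanically. The Python sketch below flags every later instruction that reads a register in its ID cycle before the producing instruction has written it in WB. It assumes no forwarding hardware and ignores intervening writes to the same register. The `split_cycle_regfile` flag (a register file that writes early and reads late within one cycle) is an assumption introduced here, not something stated in the text.

```python
ID_STAGE, WB_STAGE = 2, 5   # position of ID and WB within the five stages


def raw_hazards(program: list, split_cycle_regfile: bool = False) -> list:
    """Return (producer, consumer, register) triples for reads that come too early."""
    hazards = []
    for i, (producer, dest, _, _) in enumerate(program):
        wb_cycle = i + WB_STAGE          # instruction i is fetched in cycle i+1
        for j in range(i + 1, len(program)):
            consumer, _, src1, src2 = program[j]
            if dest not in (src1, src2):
                continue
            id_cycle = j + ID_STAGE
            same_cycle_conflict = id_cycle == wb_cycle and not split_cycle_regfile
            if id_cycle < wb_cycle or same_cycle_conflict:
                hazards.append((producer, consumer, dest))
    return hazards


program = [
    ("ADD", "R1", "R2", "R3"),
    ("SUB", "R4", "R5", "R1"),
    ("AND", "R6", "R1", "R7"),
    ("OR", "R8", "R1", "R9"),
    ("XOR", "R10", "R1", "R11"),
]
for producer, consumer, reg in raw_hazards(program):
    print(f"{consumer} reads {reg} before {producer} has written it")
```

ADD writes R1 in cycle 5, while SUB, AND and OR read it in cycles 3, 4 and 5, and XOR reads it in cycle 6. Under the stricter default, SUB, AND and OR are flagged. With the split-cycle assumption only SUB and AND are.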

Control Hazards
Control hazards can cause a greater performance loss for the DLX pipeline than data hazards. When
a branch is executed, it may or may not change the PC (program counter) to something other
than its current value plus 4. If a branch changes the PC to its target address, it is a taken branch;
if it falls through, it is not taken.
If instruction i is a taken branch, then the PC is normally not changed until the end of the MEM
stage, after the completion of the address calculation and comparison (see diagram).
The simplest method of dealing with branches is to stall the pipeline as soon as the branch is
detected until we reach the MEM stage, which determines the new PC. The pipeline behavior
looks like:

                    Clock cycle
Instruction         1         2         3         4         5         6         7         8         9         10
Branch              IF        ID        EX        MEM       WB
Branch successor              IF(stall) stall     stall     IF        ID        EX        MEM       WB
Branch successor+1                                                    IF        ID        EX        MEM       WB
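In this diagram the branch successor is fetched again in cycle 5 instead of proceeding from cycle 2, so each branch costs three stall cycles. The Python sketch below turns that into a cycle count and an effective CPI. The function names and the sample branch frequency are hypothetical, and the model assumes every branch stalls in this way and that there are no other stalls.

```python
PIPELINE_DEPTH = 5
BRANCH_STALL_CYCLES = 3     # read off the diagram: successor IF moves from cycle 2 to 5


def total_cycles(num_instructions: int, num_branches: int) -> int:
    """Cycles to finish a sequence when each branch stalls the pipeline."""
    ideal = PIPELINE_DEPTH + (num_instructions - 1)
    return ideal + num_branches * BRANCH_STALL_CYCLES


def effective_cpi(branch_frequency: float) -> float:
    """Average cycles per instruction, taking the ideal pipelined CPI as 1."""
    return 1 + branch_frequency * BRANCH_STALL_CYCLES


# The three instructions in the diagram: one branch followed by two successors.
print(total_cycles(num_instructions=3, num_branches=1))
# Hypothetical workload in which one instruction in four is a branch.
print(effective_cpi(0.25))
```

The first call reproduces the ten cycles shown in the diagram. The second shows why this simple scheme is costly: with a quarter of the instructions being branches, the average instruction takes 1.75 cycles instead of 1.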
