Single-Cycle Processor Datapath and Control Logic

Single-Cycle Processors: Datapath & Control
Arvind
Computer Science & Artificial Intelligence Lab
M.I.T.
Based on the material prepared by Arvind and Krste Asanovic
Instruction Set Architecture (ISA) versus Implementation
6.823 L5- 2
Arvind
ISA is the hardware/software interface
- Defines the set of programmer-visible state
- Defines the instruction format (bit encoding) and instruction semantics
- Examples: MIPS, x86, IBM 360, JVM
Many possible implementations of one ISA
- 360 implementations: model 30 (c. 1964), z900 (c. 2001)
- x86 implementations: 8086 (c. 1978), 80186, 286, 386, 486, Pentium, Pentium Pro, Pentium-4 (c. 2000), AMD Athlon, Transmeta Crusoe, SoftPC
- MIPS implementations: R2000, R4000, R10000, ...
- JVM: HotSpot, PicoJava, ARM Jazelle, ...
September 26, 2005
Processor Performance
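$$\frac{\text{Time}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Time}}{\text{Cycle}}$$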
- Instructions per program depends on source code, compiler technology, and ISA
- Cycles per instruction (CPI) depends upon the ISA and the microarchitecture
- Time per cycle depends upon the microarchitecture and the base technology
Microarchitecture          CPI   cycle time
Microcoded                 >1    short
Single-cycle unpipelined   1     long        (this lecture)
Pipelined                  1     short       (this lecture)
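The three factors multiply, so each row of the table trades CPI against cycle time. The Python sketch below evaluates the equation for the three microarchitectures. Every number in it is hypothetical: the cycle times of 27 and 10 units borrow the delay examples used for pipelining later in this lecture, and the microcoded CPI of 5 and the instruction count are made up.

```python
def execution_time(instructions, cpi, cycle_time):
    """Time/Program = Instructions/Program * Cycles/Instruction * Time/Cycle."""
    return instructions * cpi * cycle_time


# Hypothetical design points: (CPI, cycle time in arbitrary time units).
DESIGNS = {
    "Microcoded": (5, 10),                # CPI > 1, short cycle (CPI value assumed)
    "Single-cycle unpipelined": (1, 27),  # CPI = 1, long cycle
    "Pipelined (ideal)": (1, 10),         # CPI = 1, short cycle
}

N_INSTRUCTIONS = 1_000_000  # hypothetical dynamic instruction count

if __name__ == "__main__":
    for name, (cpi, tc) in DESIGNS.items():
        total = execution_time(N_INSTRUCTIONS, cpi, tc)
        print(f"{name:<26} {total:>12,} time units")
```

With these made-up numbers the ideal pipeline wins because it pairs the single-cycle design's CPI with a short cycle; real values depend on the workload and the technology.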
Microarchitecture: Implementation of an ISA
[Figure: a Controller drives the Data path through control points and receives status lines back from it]
Structure: How components are connected (static)
Behavior: How data moves between components (dynamic)
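Hardware Elements
Combinational circuits
- Mux, Demux, Decoder, ALU, ...
[Figure: a Mux selects one of the inputs A0 ... An-1 with an lg(n)-bit Sel; a Demux routes input A to one of the outputs O0 ... On-1 with an lg(n)-bit Sel; a Decoder turns an lg(n)-bit input into one of the outputs O0 ... On-1; an ALU, controlled by OpSelect, produces a Result and a Comp? output]
- ALU operations: Add, Sub, ...; And, Or, Xor, Not, ...; GT, LT, EQ, Zero, ...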
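Combinational circuits hold no state, so each can be modeled as a pure function of its current inputs. A minimal Python sketch; the function names, the operation names accepted by `alu`, and the 32-bit width are illustrative assumptions, not part of the slide.

```python
MASK32 = 0xFFFF_FFFF  # assumed 32-bit datapath width


def mux(inputs, sel):
    """n-input multiplexer: the lg(n)-bit select chooses one input."""
    return inputs[sel]


def demux(a, sel, n):
    """1-to-n demultiplexer: input A appears on output O_sel, the others are 0."""
    return [a if i == sel else 0 for i in range(n)]


def decoder(a, n):
    """lg(n)-bit input -> one-hot outputs O0 .. On-1."""
    return [1 if i == a else 0 for i in range(n)]


def alu(op_select, a, b):
    """A few ALU operations on unsigned 32-bit values (operation names assumed)."""
    ops = {
        "add": lambda: (a + b) & MASK32,
        "sub": lambda: (a - b) & MASK32,
        "and": lambda: a & b,
        "or": lambda: a | b,
        "xor": lambda: a ^ b,
        "eq": lambda: int(a == b),
        "lt": lambda: int(a < b),  # unsigned comparison
    }
    return ops[op_select]()
```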
Synchronous state elements
- Flipflop, Register, Register file, SRAM, DRAM
[Figure: a flip-flop (ff) with inputs D, En and Clk and output Q; an n-bit register built from n flip-flops that share En and Clk, with inputs D0 ... Dn-1 and outputs Q0 ... Qn-1]
Edge-triggered: Data is sampled at the rising edge
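The edge-triggered behavior can be sketched as a small Python class that remembers the previous clock level and samples D only on a 0 to 1 transition while En is set. The class and method names are hypothetical.

```python
class Register:
    """n-bit edge-triggered register with a write enable.

    Q changes only at a rising clock edge, and only when En is 1; between
    edges the output holds its value no matter how D changes.
    """

    def __init__(self, width=32, value=0):
        self.mask = (1 << width) - 1
        self.q = value & self.mask
        self._clk = 0  # last clock level seen

    def clock(self, clk, d, en):
        """Apply a new clock level; sample D on a 0 -> 1 transition if En is set."""
        if self._clk == 0 and clk == 1 and en:
            self.q = d & self.mask
        self._clk = clk
        return self.q


r = Register(width=8)
r.clock(1, 0x12, en=1)  # rising edge, enabled: Q becomes 0x12
r.clock(0, 0x34, en=1)  # falling edge: no change
r.clock(1, 0x56, en=0)  # rising edge, disabled: no change
```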
Register Files
[Figure: a 2R+1W register file with inputs ReadSel1 (rs1), ReadSel2 (rs2), WriteSel (ws), WriteData (wd), WE (we) and Clock, and outputs ReadData1 (rd1) and ReadData2 (rd2). Internally it holds 32 registers (register 0 ... register 31) of 32 bits each; rs1 and rs2 select the registers driven onto rd1 and rd2, and ws, with we and clk, selects the register that captures wd.]
No timing issues in reading a selected register
Register files with a large number of ports are difficult to design
- In Intel's Itanium, the GPR file has 128 registers with 8 read ports and 4 write ports!
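A 2R+1W register file can be sketched in Python as combinational reads plus one write port that acts only at the clock edge. The class and method names are hypothetical, and this sketch does not hardwire register 0 to zero, which MIPS additionally requires.

```python
class RegisterFile:
    """2R+1W register file; 32 registers x 32 bits, as in the figure."""

    def __init__(self, nregs=32, width=32):
        self.regs = [0] * nregs
        self.mask = (1 << width) - 1

    def read(self, rs1, rs2):
        """Combinational reads: (rd1, rd2) follow the read selects immediately."""
        return self.regs[rs1], self.regs[rs2]

    def rising_edge(self, we, ws, wd):
        """The single write port updates register ws at the clock edge if we is set."""
        if we:
            self.regs[ws] = wd & self.mask


rf = RegisterFile()
old = rf.read(5, 0)                # (0, 0): before the edge the old value is visible
rf.rising_edge(we=1, ws=5, wd=42)
new = rf.read(5, 0)                # (42, 0)
```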
A Simple Memory Model
[Figure: a MAGIC RAM with inputs WriteEnable, Clock, Address and WriteData, and output ReadData]
Reads and writes are always completed in one cycle
- a Read can be done any time (i.e., combinational)
- a Write is performed at the rising clock edge if it is enabled; the write address and data must be stable at the clock edge
Later in the course we will present a more realistic model of memory
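The magic memory follows the same pattern as the register file: reads are combinational and writes happen only at an enabled clock edge. A Python sketch, assuming one word per address and that unwritten locations read as 0; the class name is hypothetical.

```python
class MagicRAM:
    """Idealized single-cycle memory (one word per address, assumed)."""

    def __init__(self):
        self.mem = {}

    def read(self, address):
        """Combinational read: usable at any time during the cycle."""
        return self.mem.get(address, 0)

    def rising_edge(self, write_enable, address, write_data):
        """Write only at the clock edge, and only if enabled.

        Address and data must be stable at the edge; in this model that
        simply means the caller passes their final values.
        """
        if write_enable:
            self.mem[address] = write_data


ram = MagicRAM()
ram.rising_edge(True, 0x100, 7)
ram.rising_edge(False, 0x100, 99)  # WriteEnable low: memory unchanged
value = ram.read(0x100)            # 7
```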
Implementing MIPS:
Single-cycle per instruction datapath & control logic
The MIPS ISA
Processor State
- 32 32-bit GPRs, R0 always contains a 0
- 32 single precision FPRs, may also be viewed as 16 double precision FPRs
- FP status register, used for FP compares & exceptions
- PC, the program counter
- some other special registers
Data types
- 8-bit byte, 16-bit half word
- 32-bit word for integers
- 32-bit word for single precision floating point
- 64-bit word for double precision floating point
Load/Store style instruction set
- data addressing modes: immediate & indexed
- branch addressing modes: PC relative & register indirect
- Byte addressable memory: big endian mode
All instructions are 32 bits
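In big endian mode the most significant byte of a word lives at the lowest byte address. A Python sketch using a dictionary as a hypothetical byte-addressed memory; the helper names are illustrative and alignment checks are omitted.

```python
def store_word(mem, addr, word):
    """Store a 32-bit word at byte address addr, most significant byte first."""
    for offset, byte in enumerate((word & 0xFFFF_FFFF).to_bytes(4, "big")):
        mem[addr + offset] = byte


def load_word(mem, addr):
    """Reassemble a word from four consecutive bytes (big endian)."""
    return int.from_bytes(bytes(mem.get(addr + i, 0) for i in range(4)), "big")


mem = {}  # byte-addressable memory: address -> byte
store_word(mem, 0x1000, 0x12345678)  # mem[0x1000] holds 0x12, mem[0x1003] holds 0x78
```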

Instruction Execution
Execution of an instruction involves
1. instruction fetch
2. decode and register fetch
3. ALU operation
4. memory operation (optional)
5. write back
and the computation of the address of the next instruction
Datapath: Reg-Reg ALU Instructions
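[Figure: the PC addresses the Inst. Memory, and an adder computes PC + 0x4 as the next PC. Instruction fields inst<25:21> and inst<20:16> select the GPR read ports rs1 and rs2, inst<15:11> selects the write port ws, and the ALU, configured by ALU Control from the OpCode and inst<5:0>, combines rd1 and rd2; the result returns to the GPRs on wd when RegWrite is asserted.]
Format: 0 (opcode) <31:26> | rs <25:21> | rt <20:16> | rd <15:11> | 0 <10:6> | func <5:0>
rd ← (rs) func (rt)
RegWrite Timing?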
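The bit ranges in the format map directly onto shifts and masks. A Python sketch of the field extraction performed by the wires inst<25:21>, inst<20:16>, inst<15:11>, and inst<5:0>; the helper names are hypothetical, and the func value 0x20 (the func code of MIPS ADD) is only an example.

```python
def bits(word, hi, lo):
    """Extract inst<hi:lo> (both ends inclusive) from a 32-bit word."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode_reg_reg(inst):
    """Split a reg-reg ALU instruction into the fields wired in the datapath."""
    return {
        "opcode": bits(inst, 31, 26),  # 0 for this format
        "rs": bits(inst, 25, 21),      # GPR read select rs1
        "rt": bits(inst, 20, 16),      # GPR read select rs2
        "rd": bits(inst, 15, 11),      # GPR write select ws
        "zero": bits(inst, 10, 6),     # unused, 0
        "func": bits(inst, 5, 0),      # to ALU Control
    }


def encode_reg_reg(rs, rt, rd, func):
    """Inverse of decode_reg_reg, handy for building test instructions."""
    return (rs << 21) | (rt << 16) | (rd << 11) | func


# rd <- (rs) func (rt) with rs = 1, rt = 2, rd = 3 and func = 0x20
fields = decode_reg_reg(encode_reg_reg(1, 2, 3, 0x20))
```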
Datapath: Reg-Imm ALU Instructions
[Figure: as for reg-reg instructions, except that inst<20:16> now selects the write port ws and inst<15:0> passes through Imm Ext (controlled by ExtSel) to the ALU's second input; ALU Control is driven by inst<31:26>.]
Format: opcode <31:26> | rs <25:21> | rt <20:16> | immediate <15:0>
rt ← (rs) op immediate
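For the immediate format, rt becomes the write select and the 16-bit immediate must be extended before it reaches the ALU. A Python sketch under stated assumptions: the helper names are hypothetical, sign extension is used as the ExtSel choice, and the example instruction uses opcode 0x08 (MIPS ADDI) purely as an illustration.

```python
def bits(word, hi, lo):
    """Extract inst<hi:lo> (both ends inclusive) from an instruction word."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode_reg_imm(inst):
    """Fields of a register-immediate ALU instruction."""
    return {
        "opcode": bits(inst, 31, 26),  # to ALU Control
        "rs": bits(inst, 25, 21),      # GPR read port rs1
        "rt": bits(inst, 20, 16),      # now the write select ws
        "imm": bits(inst, 15, 0),      # to Imm Ext
    }


def sext16(imm):
    """Sign-extend a 16-bit field to a 32-bit pattern."""
    return imm | 0xFFFF_0000 if imm & 0x8000 else imm


# rt <- (rs) + immediate with rs = 1, rt = 2 and immediate = -1 (0xFFFF)
inst = (0x08 << 26) | (1 << 21) | (2 << 16) | 0xFFFF
f = decode_reg_imm(inst)
gpr = [0] * 32
gpr[1] = 5
gpr[f["rt"]] = (gpr[f["rs"]] + sext16(f["imm"])) & 0xFFFF_FFFF  # 5 + (-1)
```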
Conflicts in Merging Datapath
[Figure: the reg-reg and reg-imm datapaths overlaid. The GPR write port ws must come from rd (inst<15:11>) for reg-reg instructions but from rt (inst<20:16>) for reg-imm instructions, and the ALU's second input must come either from rd2 or from Imm Ext.]
Introduce muxes
opcode | rs | rt | immediate            rt ← (rs) op immediate
0 | rs | rt | rd | 0 | func            rd ← (rs) func (rt)
Datapath for ALU Instructions
[Figure: the merged datapath. A RegDst mux (rt / rd) chooses the write port ws from <20:16> or <15:11>; a BSrc mux (Reg / Imm) chooses the ALU's second input from rd2 or from Imm Ext (controlled by ExtSel); ALU Control takes <31:26> and <5:0> and produces OpSel.]
opcode | rs | rt | immediate            rt ← (rs) op immediate
0 | rs | rt | rd | 0 | func            rd ← (rs) func (rt)
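The two muxes resolve the conflicts by letting control pick the write register and the ALU's second operand per instruction. A Python sketch with hypothetical function names; the ALU operation is passed in as a callable standing in for ALU Control, and the extended immediate is assumed to be computed already.

```python
MASK32 = 0xFFFF_FFFF


def reg_dst_mux(reg_dst, rt, rd):
    """RegDst: write select is rt (reg-imm) or rd (reg-reg)."""
    return {"rt": rt, "rd": rd}[reg_dst]


def bsrc_mux(bsrc, rd2, imm_ext):
    """BSrc: ALU second operand is register rd2 or the extended immediate."""
    return {"Reg": rd2, "Imm": imm_ext}[bsrc]


def alu_instruction(gpr, fields, reg_dst, bsrc, alu_op, imm_ext):
    """Execute one ALU instruction on the merged datapath; return the register written."""
    rd1, rd2 = gpr[fields["rs"]], gpr[fields["rt"]]
    b = bsrc_mux(bsrc, rd2, imm_ext)
    ws = reg_dst_mux(reg_dst, fields["rt"], fields["rd"])
    gpr[ws] = alu_op(rd1, b) & MASK32
    return ws


gpr = [0] * 32
gpr[1], gpr[2] = 10, 3
fields = {"rs": 1, "rt": 2, "rd": 4}
alu_instruction(gpr, fields, "rd", "Reg", lambda a, b: a - b, 0)  # rd <- (rs) - (rt)
alu_instruction(gpr, fields, "rt", "Imm", lambda a, b: a + b, 5)  # rt <- (rs) + 5
```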

Datapath for Memory Instructions
Should program and data memory be separate?
Harvard style: separate (Aiken and Mark 1 influence)
- read-only program memory
- read/write data memory
- note: at some level the two memories have to be the same
Princeton style: the same (von Neumann's influence)
- a Load or Store instruction requires accessing the memory more than once during its execution
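Load/Store Instructions: Harvard Datapath
[Figure: the ALU datapath extended with a separate Data Memory. The ALU adds the base register value (rd1) to the displacement from Imm Ext to form the data address; rd2 supplies wdata for a store, which is written when MemWrite is asserted; a WBSrc mux (ALU / Mem) chooses whether the ALU result or the memory's rdata is written back to the GPRs. Other controls: RegWrite, RegDst, ExtSel, OpSel, BSrc.]
Format: opcode <31:26> | rs <25:21> | rt <20:16> | displacement <15:0>
addressing mode: (rs) + displacement
- rs is the base register
- rt is the destination of a Load or the source for a Store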
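The addressing mode and the two data-memory instructions can be sketched in Python as follows. Assumptions: the displacement is sign-extended (sExt16), the data memory is a dictionary with one word per address, and alignment is not checked; the function names are hypothetical.

```python
MASK32 = 0xFFFF_FFFF


def sext16(x):
    """Sign-extend a 16-bit field to a 32-bit pattern."""
    return x | 0xFFFF_0000 if x & 0x8000 else x


def effective_address(base, displacement16):
    """(rs) + displacement, computed by the ALU with OpSel = '+'."""
    return (base + sext16(displacement16)) & MASK32


def load_word(gpr, dmem, rs, rt, disp16):
    """LW: rt <- Mem[(rs) + displacement]  (WBSrc = Mem, RegDst = rt)."""
    gpr[rt] = dmem.get(effective_address(gpr[rs], disp16), 0)


def store_word(gpr, dmem, rs, rt, disp16):
    """SW: Mem[(rs) + displacement] <- (rt)  (MemWrite asserted, no register write)."""
    dmem[effective_address(gpr[rs], disp16)] = gpr[rt]


gpr = [0] * 32
gpr[1], gpr[2] = 0x1000, 99
dmem = {}
store_word(gpr, dmem, 1, 2, 0xFFFC)  # displacement -4: address 0x0FFC
load_word(gpr, dmem, 1, 3, 0xFFFC)   # reads the same word back into R3
```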
MIPS Control Instructions

Conditional (on GPR) PC-relative branch
  opcode (6) | rs (5) | offset (16)          BEQZ, BNEZ
Unconditional register-indirect jumps
  opcode (6) | rs (5) | 16 bits              JR, JALR
Unconditional absolute jumps
  opcode (6) | target (26)                   J, JAL
- PC-relative branches add offset × 4 to PC+4 to calculate the target address (offset is in words): ±128 KB range
- Absolute jumps append target × 4 to PC<31:28> to calculate the target address: 256 MB range
- jump-&-link stores PC+4 into the link register (R31)
All Control Transfers are delayed by 1 instruction
- we will worry about the branch delay slot later
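The target computations and the quoted ranges follow from the field widths. A Python sketch with hypothetical function names; it ignores the branch delay slot, as the slide does for now. Following the slide, the jump keeps PC<31:28>; in the MIPS ISA proper the upper four bits come from PC+4, which differs only when the jump sits in the last word of a 256 MB region.

```python
MASK32 = 0xFFFF_FFFF


def sext16(x):
    """Sign-extend a 16-bit field to a (possibly negative) Python int."""
    return x - 0x1_0000 if x & 0x8000 else x


def branch_target(pc, offset16):
    """PC-relative: (PC + 4) + offset * 4, with the offset counted in words."""
    return (pc + 4 + sext16(offset16) * 4) & MASK32


def jump_target(pc, target26):
    """Absolute: PC<31:28> concatenated with target * 4 (26 + 2 = 28 low bits)."""
    return (pc & 0xF000_0000) | ((target26 & 0x3FF_FFFF) << 2)


def link_value(pc):
    """Jump-&-link writes PC + 4 into R31."""
    return (pc + 4) & MASK32


BRANCH_REACH = 2**15 * 4  # signed 16-bit word offset: +/- 2^17 bytes = 128 KB
JUMP_REGION = 2**26 * 4   # 26-bit word target: 2^28 bytes = 256 MB
```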
Conditional Branches (BEQZ, BNEZ)

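[Figure: the Harvard datapath extended for conditional branches. A second adder combines pc+4 with the extended offset from Imm Ext to form the branch target br, a zero? test on the register read through rs1 is sent to the control, and a PCSrc mux chooses the next PC from pc+4 and br. The other controls shown are RegWrite, RegDst, ExtSel, OpSel, BSrc, MemWrite and WBSrc.]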
Register-Indirect Jumps (JR)

[Figure: the branch datapath with a third PCSrc input, rind, which takes the jump target from the register read through rs1 (rd1).]
Register-Indirect Jump-&-Link (JALR)

[Figure: the JR datapath with the constant 31 added as a choice for the register write select, so that the return address pc+4 can be written into R31.]
Absolute Jumps (J, JAL)

[Figure: the JALR datapath with a fourth PCSrc input, jabs, which forms the absolute jump target from the instruction's target field.]
Harvard-Style Datapath for MIPS

[Figure: the complete single-cycle Harvard datapath for MIPS: PC and Inst. Memory; the pc+4 and branch-target adders; the GPRs with RegDst (including 31) and RegWrite; Imm Ext (ExtSel); the ALU with BSrc and OpSel from ALU Control; the zero? test; the Data Memory (MemWrite); the WBSrc write-back mux; and the PCSrc mux choosing among pc+4, br, rind and jabs.]
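The complete datapath can be summarized as one function that computes every combinational value from the current state and then commits all updates together, mirroring the rising clock edge. A Python sketch under stated assumptions: fetch and decode are already done (fields are passed in), control signals are passed in rather than derived, ExtSel is fixed to sExt16, the `alu` callable stands in for ALU Control and OpSel, the jump keeps PC<31:28> as on the control-instruction slide, and data memory is a one-word-per-address dictionary. All names are hypothetical.

```python
MASK32 = 0xFFFF_FFFF


def sext16(imm):
    """Sign-extend a 16-bit field to a 32-bit pattern."""
    imm &= 0xFFFF
    return imm | 0xFFFF_0000 if imm & 0x8000 else imm


def datapath_step(state, fields, ctl):
    """Execute one instruction in one clock cycle.

    state:  {"pc": int, "gpr": list of 32 ints, "dmem": dict address -> word}
    fields: decoded fields rs, rt, rd, imm16, target26
    ctl:    BSrc, RegDst, WBSrc, RegWrite, MemWrite, PCSrc, and "alu"
    """
    pc, gpr = state["pc"], state["gpr"]

    # Combinational part of the cycle: nothing here modifies the state.
    rd1, rd2 = gpr[fields["rs"]], gpr[fields["rt"]]
    imm = sext16(fields["imm16"])                                  # Imm Ext
    b = rd2 if ctl["BSrc"] == "Reg" else imm                       # BSrc mux
    alu_out = ctl["alu"](rd1, b) & MASK32
    rdata = state["dmem"].get(alu_out, 0)                          # combinational read
    pc_plus4 = (pc + 4) & MASK32
    wb = {"ALU": alu_out, "Mem": rdata, "PC": pc_plus4}.get(ctl["WBSrc"])        # WBSrc
    ws = {"rt": fields["rt"], "rd": fields["rd"], "R31": 31}.get(ctl["RegDst"])  # RegDst
    next_pc = {                                                    # PCSrc mux
        "pc+4": pc_plus4,
        "br": (pc_plus4 + (imm << 2)) & MASK32,
        "rind": rd1,
        "jabs": (pc & 0xF000_0000) | ((fields["target26"] & 0x3FF_FFFF) << 2),
    }[ctl["PCSrc"]]

    # Rising clock edge: PC, register file and data memory update together.
    if ctl["MemWrite"]:
        state["dmem"][alu_out] = rd2
    if ctl["RegWrite"] and ws != 0:  # R0 always contains 0
        gpr[ws] = wb
    state["pc"] = next_pc


ADD = {"BSrc": "Reg", "RegDst": "rd", "WBSrc": "ALU", "RegWrite": True,
       "MemWrite": False, "PCSrc": "pc+4", "alu": lambda a, b: a + b}
state = {"pc": 0, "gpr": [0] * 32, "dmem": {}}
state["gpr"][1], state["gpr"][2] = 5, 7
datapath_step(state, {"rs": 1, "rt": 2, "rd": 3, "imm16": 0, "target26": 0}, ADD)
```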
Five-minute break to stretch your legs
Single-Cycle Hardwired Control: Harvard architecture
We will assume the clock period is sufficiently long for all of the following steps to be completed:
1. instruction fetch
2. decode and register fetch
3. ALU operation
4. data fetch if required
5. register write-back setup time
$t_C > t_{IFetch} + t_{RFetch} + t_{ALU} + t_{DMem} + t_{RWB}$
At the rising edge of the following clock, the PC, the register file and the memory are updated
Hardwired Control is pure Combinational Logic
[Figure: a block of combinational logic takes the op code and the zero? status as inputs and produces the control signals ExtSel, BSrc, OpSel, MemWrite, WBSrc, RegDst, RegWrite and PCSrc]
ALU Control & Immediate Extension
[Figure: Inst<5:0> (Func) and Inst<31:26> (Opcode) feed a Decode Map, which produces OpSel (Func, Op, +, 0?) to select the ALUop, and ExtSel (sExt16, uExt16, High16) to select the immediate extension]
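The two selectors produced by the decode map can be sketched in Python. Assumptions: High16 places the immediate in the upper half of the word (as a load-upper-immediate would), and the operations decoded from the func and opcode fields are supplied as hypothetical callables; the function names are illustrative.

```python
MASK32 = 0xFFFF_FFFF


def imm_ext(ext_sel, imm16):
    """Imm Ext: expand the 16-bit immediate according to ExtSel."""
    imm16 &= 0xFFFF
    if ext_sel == "sExt16":  # sign extension
        return imm16 | 0xFFFF_0000 if imm16 & 0x8000 else imm16
    if ext_sel == "uExt16":  # zero extension
        return imm16
    if ext_sel == "High16":  # assumed: immediate moved to the upper 16 bits
        return imm16 << 16
    raise ValueError(f"unknown ExtSel: {ext_sel!r}")


def alu_function(op_sel, func_op=None, opcode_op=None):
    """OpSel: choose where the ALU operation comes from."""
    if op_sel == "Func":  # reg-reg: operation given by the func field
        return func_op
    if op_sel == "Op":    # reg-imm: operation given by the opcode
        return opcode_op
    if op_sel == "+":     # address arithmetic for loads and stores
        return lambda a, b: (a + b) & MASK32
    if op_sel == "0?":    # zero test for branches
        return lambda a, b: int(a == 0)
    raise ValueError(f"unknown OpSel: {op_sel!r}")
```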
Hardwired Control Table
Opcode    ExtSel   BSrc  OpSel  MemW  RegW  WBSrc  RegDst  PCSrc
ALU       *        Reg   Func   no    yes   ALU    rd      pc+4
ALUi      sExt16   Imm   Op     no    yes   ALU    rt      pc+4
ALUiu     uExt16   Imm   Op     no    yes   ALU    rt      pc+4
LW        sExt16   Imm   +      no    yes   Mem    rt      pc+4
SW        sExt16   Imm   +      yes   no    *      *       pc+4
BEQZz=0   sExt16   *     0?     no    no    *      *       pc+4
BEQZz=1   sExt16   *     0?     no    no    *      *       br
J         *        *     *      no    no    *      *       jabs
JAL       *        *     *      no    yes   PC     R31     jabs
JR        *        *     *      no    no    *      *       rind
JALR      *        *     *      no    yes   PC     R31     rind
(z is the output of the zero? test, 1 when the tested register is 0; * = don't care)
BSrc = Reg / Imm
RegDst = rt / rd / R31
WBSrc = ALU / Mem / PC
PCSrc = pc+4 / br / rind / jabs
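Because hardwired control is a pure function of the opcode and the zero? flag, the table can be transcribed directly into a lookup. A Python sketch; opcodes are represented by their mnemonic strings rather than bit encodings (an assumption), "*" marks a don't-care, and z is taken to be 1 when the register tested by zero? is 0, so BEQZ selects br when z = 1.

```python
SIGNALS = ("ExtSel", "BSrc", "OpSel", "MemW", "RegW", "WBSrc", "RegDst", "PCSrc")

CONTROL_TABLE = {
    #          ExtSel    BSrc   OpSel   MemW   RegW   WBSrc  RegDst PCSrc
    "ALU":   ("*",      "Reg", "Func", False, True,  "ALU", "rd",  "pc+4"),
    "ALUi":  ("sExt16", "Imm", "Op",   False, True,  "ALU", "rt",  "pc+4"),
    "ALUiu": ("uExt16", "Imm", "Op",   False, True,  "ALU", "rt",  "pc+4"),
    "LW":    ("sExt16", "Imm", "+",    False, True,  "Mem", "rt",  "pc+4"),
    "SW":    ("sExt16", "Imm", "+",    True,  False, "*",   "*",   "pc+4"),
    "BEQZ":  ("sExt16", "*",   "0?",   False, False, "*",   "*",   None),  # set from z
    "J":     ("*",      "*",   "*",    False, False, "*",   "*",   "jabs"),
    "JAL":   ("*",      "*",   "*",    False, True,  "PC",  "R31", "jabs"),
    "JR":    ("*",      "*",   "*",    False, False, "*",   "*",   "rind"),
    "JALR":  ("*",      "*",   "*",    False, True,  "PC",  "R31", "rind"),
}


def control(opcode, z=0):
    """Combinational decode: (opcode, zero?) -> dictionary of control signals."""
    signals = dict(zip(SIGNALS, CONTROL_TABLE[opcode]))
    if opcode == "BEQZ":
        signals["PCSrc"] = "br" if z else "pc+4"  # taken when the register is zero
    return signals
```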
Pipelined MIPS
To pipeline MIPS:
- First build MIPS without pipelining with CPI=1
- Next, add pipeline registers to reduce cycle time while maintaining CPI=1
Pipelined Datapath
[Figure: the single-cycle datapath (PC, Inst. Memory, GPRs, Imm Ext, ALU, Data Memory) divided by pipeline registers, including the instruction register IR, into fetch, decode & Reg-fetch, execute, memory, and write-back phases]
Clock period can be reduced by dividing the execution of an instruction into multiple cycles:
$t_C > \max\{t_{IM}, t_{RF}, t_{ALU}, t_{DM}, t_{RW}\}$ ($= t_{DM}$ probably)
However, CPI will increase unless instructions are pipelined
An Ideal Pipeline
[Figure: a pipeline of four stages, stage 1 through stage 4]
- All objects go through the same stages
- No sharing of resources between any two stages
- Propagation delay through all pipeline stages is equal
- The scheduling of an object entering the pipeline is not affected by the objects in other stages
These conditions generally hold for industrial assembly lines. But can an instruction pipeline satisfy the last condition?
How to divide the datapath into stages
Suppose memory is significantly slower than other stages. In particular, suppose
$t_{IM} = 10$ units
$t_{DM} = 10$ units
$t_{ALU} = 5$ units
$t_{RF} = 1$ unit
$t_{RW} = 1$ unit
Since the slowest stage determines the clock, it may be possible to combine some stages without any loss of performance
Alternative Pipelining
[Figure: the pipelined datapath (PC, Inst. Memory, IR, GPRs, Imm Ext, ALU, Data Memory) with fetch, decode & Reg-fetch, execute, memory, and write-back phases]
$t_C > \max\{t_{IM}, t_{RF}, t_{ALU}, t_{DM}, t_{RW}\} = t_{DM}$
Write-back stage takes much less time than other stages. Suppose we combined it with the memory phase (and, likewise, combined the decode & Reg-fetch phase with the execute phase):
$t_C > \max\{t_{IM}, t_{RF} + t_{ALU}, t_{DM} + t_{RW}\} = t_{DM} + t_{RW}$
This increases the critical path by 10% (11 units instead of 10 with the delays above).
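To check the 10% figure, treat a pipeline as a list of stages, each a group of phases whose delays add up, with the slowest stage setting the clock. A Python sketch using the example delays from the previous slide; the names `DELAYS` and `clock_period` are hypothetical.

```python
# Example phase delays from the slide, in arbitrary units.
DELAYS = {"IM": 10, "RF": 1, "ALU": 5, "DM": 10, "RW": 1}


def clock_period(stages, delays=DELAYS):
    """Each stage takes the sum of its phases; the slowest stage sets tC."""
    return max(sum(delays[phase] for phase in stage) for stage in stages)


five_stage = [["IM"], ["RF"], ["ALU"], ["DM"], ["RW"]]
merged = [["IM"], ["RF", "ALU"], ["DM", "RW"]]

t_five = clock_period(five_stage)        # tDM = 10
t_merged = clock_period(merged)          # tDM + tRW = 11
increase = (t_merged - t_five) / t_five  # fractional growth of the critical path
```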
Maximum Speedup by Pipelining

Assumptions                                              Unpipelined tC   Pipelined tC   Speedup
1. $t_{IM} = t_{DM} = 10$, $t_{ALU} = 5$, $t_{RF} = t_{RW} = 1$;
   4-stage pipeline                                      27               10             2.7
2. $t_{IM} = t_{DM} = t_{ALU} = t_{RF} = t_{RW} = 5$;
   4-stage pipeline                                      25               10             2.5
3. $t_{IM} = t_{DM} = t_{ALU} = t_{RF} = t_{RW} = 5$;
   5-stage pipeline                                      25               5              5.0
It is possible to achieve higher speedup with more stages in the pipeline.
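The speedup column is the unpipelined cycle time (the sum of all phase delays) divided by the pipelined one (the slowest stage). A Python sketch that reproduces the three rows; the 4-stage grouping, with decode & register fetch merged into execute, is an assumption chosen because it yields the table's pipelined tC values, and the names are hypothetical.

```python
PHASES = ["IM", "RF", "ALU", "DM", "RW"]

# Assumed 4-stage grouping; merging RW into DM instead would give tC = 11 in case 1.
FOUR = [["IM"], ["RF", "ALU"], ["DM"], ["RW"]]
FIVE = [[p] for p in PHASES]


def speedup(delays, stages):
    """Return (unpipelined tC, pipelined tC, speedup) for one grouping of phases."""
    unpipelined = sum(delays[p] for p in PHASES)
    pipelined = max(sum(delays[p] for p in stage) for stage in stages)
    return unpipelined, pipelined, unpipelined / pipelined


case1 = speedup({"IM": 10, "RF": 1, "ALU": 5, "DM": 10, "RW": 1}, FOUR)
case2 = speedup(dict.fromkeys(PHASES, 5), FOUR)
case3 = speedup(dict.fromkeys(PHASES, 5), FIVE)
```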
Thank you!
