CODch 5 Slides

Computer Organization and
Architecture (AT70.01)
Comp. Sc. and Inf. Mgmt.
Asian Institute of Technology
Instructor: Dr. Sumanta Guha
Slide Sources: Patterson &
Hennessy COD book website
(copyright Morgan Kaufmann)
adapted and supplemented
COD Ch. 5
The Processor: Datapath and
Control
Implementing MIPS
We're ready to look at an implementation of the MIPS instruction set
Simplified to contain only
arithmetic-logic instructions: add, sub, and, or, slt
memory-reference instructions: lw, sw
control-flow instructions: beq, j
R-Format:  op (6 bits)  rs (5 bits)  rt (5 bits)  rd (5 bits)  shamt (5 bits)  funct (6 bits)
I-Format:  op (6 bits)  rs (5 bits)  rt (5 bits)  offset (16 bits)
J-Format:  op (6 bits)  address (26 bits)
Implementing MIPS: the
Fetch/Execute Cycle
High-level abstract view of fetch/execute implementation
use the program counter (PC) to supply the instruction address
fetch the instruction from memory and increment PC
use fields of the instruction to select registers to read
execute depending on the instruction
repeat
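The cycle above can be sketched as a small software loop. This is only an illustration under stated assumptions: the tuple encoding `(op, rd, rs, rt)`, the function name `run`, and the restriction to `add` and `sub` are hypothetical and not part of the slides.

```python
def run(program, regs, max_steps=100):
    """Toy fetch/execute loop.

    program: dict mapping byte address -> (op, rd, rs, rt)  (hypothetical encoding)
    regs:    list of register values, modified in place
    Returns the final PC value.
    """
    pc = 0
    for _ in range(max_steps):
        if pc not in program:
            break                      # ran off the end of the toy program
        op, rd, rs, rt = program[pc]   # fetch the instruction addressed by the PC
        pc += 4                        # increment the PC to the next word
        a, b = regs[rs], regs[rt]      # instruction fields select the registers to read
        if op == "add":                # execute depending on the instruction
            regs[rd] = a + b
        elif op == "sub":
            regs[rd] = a - b
        else:
            raise ValueError(f"unsupported op: {op}")
    return pc
```

For example, a two-instruction program placed at addresses 0 and 4 leaves the PC at 8 when the loop stops.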
[Figure: abstract view of the datapath: PC, instruction memory, registers, ALU, data memory]
Overview: Processor
Implementation Styles
Single Cycle
perform each instruction in 1 clock cycle
clock cycle must be long enough for slowest instruction; therefore,
disadvantage: only as fast as slowest instruction
Multi-Cycle
break fetch/execute cycle into multiple steps
perform 1 step in each clock cycle
advantage: each instruction uses only as many cycles as it needs
Pipelined
execute each instruction in multiple steps
perform 1 step / instruction in each clock cycle
process multiple instructions in parallel, as on an assembly line
Functional Elements
Two types of functional elements in the hardware:
elements that operate on data (called combinational elements)
elements that contain data (called state or sequential elements)
Combinational Elements
Work as input-to-output functions, e.g., ALU
Combinational logic reads input data from one register and
writes output data to another, or same, register
read/write happens in a single cycle: a combinational element
cannot store data from one cycle to a future one
Combinational logic hardware units
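[Figure: state element 1 feeds combinational logic, which feeds state element 2, within one clock cycle; alternatively a single state element feeds combinational logic that writes back to the same element]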
State Elements
State elements contain data in internal storage, e.g., registers
and memory
All state elements together define the state of the machine
What does this mean? Think of shutting down and starting up again
Flipflops and latches are 1-bit state elements, equivalently,
they are 1-bit memories
The output(s) of a flipflop or latch always depends on the bit
value stored, i.e., its state, and can be called 1/0 or high/low
or true/false
The input to a flipflop or latch can change its state depending
on whether it is clocked or not
Set-Reset (SR-) latch (unclocked)
A set-reset latch made from two cross-coupled nand gates is a basic memory unit.
Think of Sbar as the inverse of set (which sets Q to 1), and Rbar as the inverse of reset.
[Figure: SR latch made from cross-coupled nand gates n1 and n2, with inputs Sbar (set) and Rbar (reset) and outputs Q and Qbar; equivalently with nor gates and inputs R and S]
When both Sbar and Rbar are 1, then either one of the following two states is stable:
a) Q = 1 & Qbar = 0
b) Q = 0 & Qbar = 1
and the latch will continue in the current stable state.
If Sbar changes to 0 (while Rbar remains at 1), then the latch is forced to the exactly one possible stable state (a). If Rbar changes to 0 (while Sbar remains at 1), the latch is forced to the exactly one possible stable state (b).
So, the latch remembers which of Sbar or Rbar was last 0 during the time they are both 1.
When both Sbar and Rbar are 0 the exactly one stable state is Q = Qbar = 1. However, if after that both Sbar and Rbar return to 1, the latch must then jump non-deterministically to one of stable states (a) or (b), which is undesirable behavior.
See sr_latch.v in Verilog Examples
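The set, hold and reset behavior of the nand-based latch can be checked with a tiny gate-level sketch. It assumes the usual cross-coupling Q = nand(Sbar, Qbar) and Qbar = nand(Rbar, Q); the function names are hypothetical. Note that the model evaluates the gates in a fixed order, so it resolves the Sbar = Rbar = 0 then 1 race deterministically, which real hardware does not.

```python
def nand(a, b):
    """Two-input nand on 0/1 values."""
    return 0 if (a and b) else 1

def sr_latch(sbar, rbar, q, qbar):
    """Settle the cross-coupled nand pair from state (q, qbar) under inputs (sbar, rbar)."""
    for _ in range(4):                      # a few passes are enough to settle
        new_q = nand(sbar, qbar)            # gate n1
        new_qbar = nand(rbar, new_q)        # gate n2
        if (new_q, new_qbar) == (q, qbar):
            break                           # outputs stopped changing: stable
        q, qbar = new_q, new_qbar
    return q, qbar
```

Starting from state (b), pulling Sbar to 0 forces state (a); returning both inputs to 1 then holds it.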
Synchronous Logic:
Clocked Latches and Flipflops
Clocks are used in synchronous logic to determine when a state
element is to be updated
in level-triggered clocking methodology either the state changes
only when the clock is high or only when it is low (technology-dependent)
[Figure: clock waveform showing the clock period, a rising edge and a falling edge]
in edge-triggered clocking methodology either the rising edge or
falling edge is active (depending on technology), i.e., states
change only on rising edges or only on falling edges
Latches are level-triggered

Flipflops are edge-triggered
Clocked SR-latch
State can change only when clock is high
Potential problem : both inputs Sbar = 0 & Rbar = 0
will cause non-deterministic behavior
[Figure: clocked SR-latch circuit with inputs Sbar, Rbar and clk (clkbar), internal signals X and Y, and outputs Q and Qbar]
See clockedSr_latch.v in Verilog Examples
Clocked D-latch
State can change only when clock is high
Only single data input (compare SR-latch)
No problem with non-deterministic behavior
[Figure: clocked D-latch circuit with inputs D (Dbar) and clk (clkbar), internal signals X and Y, and outputs Q and Qbar]
See clockedD_latch.v in Verilog Examples
Timing diagram of D-latch

Clocked D-flipflop
Negative edge-triggered
Made from three SR-latches
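[Figure: negative edge-triggered D-flipflop circuit with inputs d, clk (clkbar) and clear, internal signals s, r, sbar, rbar and cbar, and outputs q and qbar]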
See edge_dffGates.v in Verilog Examples
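The difference between a level-triggered latch and an edge-triggered flipflop can be seen in a behavioral sketch. This is a simplified discrete-time model, not the gate-level circuits of the Verilog examples; the class names and the `step` interface are assumptions made for illustration.

```python
class DLatch:
    """Level-triggered: the output follows D while clk is high."""
    def __init__(self):
        self.q = 0

    def step(self, clk, d):
        if clk == 1:
            self.q = d
        return self.q

class DFlipFlop:
    """Negative edge-triggered: D is sampled only on a 1 -> 0 clock transition."""
    def __init__(self):
        self.q = 0
        self.prev_clk = 0

    def step(self, clk, d):
        if self.prev_clk == 1 and clk == 0:   # falling edge detected
            self.q = d
        self.prev_clk = clk
        return self.q

clk_wave = [0, 1, 1, 0, 0, 1, 0]
d_wave = [1, 1, 0, 0, 1, 1, 1]
latch, ff = DLatch(), DFlipFlop()
latch_out = [latch.step(c, d) for c, d in zip(clk_wave, d_wave)]
ff_out = [ff.step(c, d) for c, d in zip(clk_wave, d_wave)]
```

With these waveforms the latch output changes twice while the clock is high, whereas the flipflop output changes only at the final falling edge.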

State Elements on the
Datapath: Register File
Registers are implemented with arrays of D-flipflops
[Figure: register file block with inputs Clock, read register number 1 (5 bits), read register number 2 (5 bits), write register (5 bits), write data (32 bits) and the Write control signal, and outputs read data 1 and read data 2 (32 bits each)]
Register file with two read ports and one write port
State Elements on the
Datapath: Register File
Port implementation:
[Figure: read ports: registers 0 to n-1 feed two multiplexors selected by read register numbers 1 and 2, producing read data 1 and read data 2; write port: an n-to-1 decoder on the register number, combined with Write, drives the C input of each register, and register data drives each D input]
Read ports are implemented with a pair of multiplexors: multiplexors with 5-bit select inputs for 32 registers
Write port is implemented using a decoder: a 5-to-32 decoder for 32 registers. Clock is relevant to
write as register state may change only at clock edge
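A behavioral sketch of such a register file is given below, assuming 32 registers of 32 bits. The class and method names are hypothetical; the multiplexors and decoder are modeled by indexing and a one-hot enable list rather than by gates.

```python
class RegisterFile:
    """Two read ports, one write port (behavioral model)."""
    def __init__(self, n=32):
        self.regs = [0] * n

    def read(self, rn1, rn2):
        # Each read port acts as an n-to-1 multiplexor selected by a register number.
        return self.regs[rn1], self.regs[rn2]

    def clock_edge(self, write, wn, wd):
        # The decoder turns the write register number into one enable line per
        # register; state changes only when this method (the clock edge) is called.
        enable = [bool(write) and (i == wn) for i in range(len(self.regs))]
        for i, en in enumerate(enable):
            if en:
                self.regs[i] = wd & 0xFFFFFFFF   # keep values to 32 bits
```

Reads are combinational in this model, while writes take effect only at `clock_edge`, mirroring the remark that the clock is relevant only to writes.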
Verilog
All components that we have discussed and shall discuss
can be modeled using Verilog
Refer to our Verilog slides and examples
Single-cycle Implementation
of MIPS
Our first implementation of MIPS will use a single long clock
cycle for every instruction
Every instruction begins on one up (or, down) clock edge
and ends on the next up (or, down) clock edge
This approach is not practical as it is much slower than a
multicycle implementation where different instruction
classes can take different numbers of cycles
in a single-cycle implementation every instruction must take
the same amount of time as the slowest instruction
in a multicycle implementation this problem is avoided by
allowing quicker instructions to use fewer cycles
Even though the single-cycle approach is not practical it is
simple and useful to understand first
Note : we shall implement jump at the very end

Datapath: Instruction
Store/Fetch & PC Increment
[Figure: a. Instruction memory (input: instruction address; output: instruction), b. Program counter (PC), c. Adder (output: Sum)]
Three elements used to store and fetch instructions and increment the PC
Datapath
Animating the Datapath
Instruction <- MEM[PC]
PC <- PC + 4
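[Figure: the PC drives the ADDR input of the instruction memory, whose RD output is the instruction; an adder (ADD) computes the next PC]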
Datapath: R-Type Instruction
[Figure: a. Registers (register file with 5-bit register numbers, read data 1 and 2, write data, RegWrite), b. ALU (3-bit ALU operation control, Zero and ALU result outputs)]
Two elements used to implement R-type instructions
add rd, rs, rt
Instruction: op rs rt rd shamt funct
R[rd] <- R[rs] + R[rt];
[Figure: register file (RN1, RN2, WN, WD, RD1, RD2, RegWrite) feeding the ALU (3-bit Operation input, Zero output), with the ALU result written back to WD]
Datapath:
Load/Store Instruction
[Figure: a. Data memory unit (Address, Write data, Read data, MemWrite, MemRead), b. Sign-extension unit (16 bits in, 32 bits out)]
Two additional elements used to implement load/stores
lw rt, offset(rs)
R[rt] <- MEM[R[rs] + sign_extend(offset)];
sw rt, offset(rs)
MEM[R[rs] + sign_extend(offset)] <- R[rt]
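The two register-transfer statements can be tried out in a short sketch. It assumes a word memory modeled as a Python dict keyed by byte address; the helper names `sign_extend16`, `lw` and `sw` are chosen here for illustration only.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def lw(regs, mem, rt, rs, offset):
    """R[rt] <- MEM[R[rs] + sign_extend(offset)]"""
    address = (regs[rs] + sign_extend16(offset)) & 0xFFFFFFFF
    regs[rt] = mem[address]

def sw(regs, mem, rt, rs, offset):
    """MEM[R[rs] + sign_extend(offset)] <- R[rt]"""
    address = (regs[rs] + sign_extend16(offset)) & 0xFFFFFFFF
    mem[address] = regs[rt]
```

A 16-bit offset of 0xFFFC sign-extends to -4, so with a base register holding 100 the load reads address 96.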
Datapath: Branch Instruction
[Figure: branch datapath: the register file and ALU compare the two registers (Zero output goes to the branch control logic); a separate adder sums PC + 4 from the instruction datapath and the sign-extended 16-bit offset shifted left 2 to give the branch target]
No shift hardware required: simply connect wires from input to output, each shifted left 2 bits
beq rs, rt, offset
if (R[rs] == R[rt]) then
PC <- PC+4 + (sign_extend(offset) << 2)
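The branch target arithmetic can be sketched as follows; the offset is sign-extended first and then shifted left by 2, since it counts words relative to PC + 4. The function names are hypothetical.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def beq_next_pc(pc, rs_val, rt_val, offset):
    """Return the next PC for beq given the two register values and the 16-bit offset."""
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF
    if rs_val == rt_val:
        # Branch taken: word offset converted to a byte offset by shifting left 2.
        return (pc_plus_4 + (sign_extend16(offset) << 2)) & 0xFFFFFFFF
    return pc_plus_4
```

For instance, a taken branch at address 0x100 with offset 3 goes to 0x104 + 12 = 0x110, and an offset of 0xFFFF (that is, -1) branches back to 0x100.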
MIPS Datapath I: Single-Cycle
Combining the datapaths for R-type instructions and load/stores using two multiplexors
Second ALU input is either a register (R-type) or the sign-extended lower half of the instruction (load/store)
Data written to the register file is either from the ALU (R-type) or memory (load)
Animating the Datapath:
R-type Instruction
add rd,rs,rt
[Figure: combined datapath (register file, ALUSrc multiplexor, ALU, data memory, MemtoReg multiplexor, sign extend) with the path used by an R-type instruction highlighted]
Load Instruction
lw rt,offset(rs)
[Figure: combined datapath with the path used by a load instruction highlighted]
Store Instruction
sw rt,offset(rs)
[Figure: combined datapath with the path used by a store instruction highlighted]
MIPS Datapath II: Single-Cycle
Separate adder as ALU operations and PC
increment occur in the same clock cycle
[Figure: datapath for R-type instructions and load/stores with the PC, the instruction memory and a separate adder for the PC increment added]
Separate instruction memory
as instruction and data read
occur in the same clock cycle
Adding instruction fetch
MIPS Datapath III: Single-Cycle
[Figure: single-cycle datapath with a new multiplexor, controlled by PCSrc, selecting between PC + 4 and the branch target computed by an extra adder from PC + 4 and the sign-extended offset shifted left 2]
Extra adder needed as both adders operate in each cycle
Instruction address is either
PC+4 or branch target address
Adding branch capability and another multiplexor

Important note: in a single-cycle implementation data cannot be stored
during an instruction; it only moves through combinational logic
Question: is the MemRead signal really needed?! Think of RegWrite!
Datapath Executing add
add rd, rs, rt
[Figure: single-cycle datapath with the path used by add highlighted]
Datapath Executing lw
lw rt,offset(rs)
[Figure: single-cycle datapath with the path used by lw highlighted]
Datapath Executing sw
sw rt,offset(rs)
[Figure: single-cycle datapath with the path used by sw highlighted]
Datapath Executing beq
beq r1,r2,offset
[Figure: single-cycle datapath with the path used by beq highlighted]
Control
Control unit takes input from
the instruction opcode bits
Control unit generates

ALU control input
write enable (possibly, read enable also) signals for each storage
element
selector controls for each multiplexor
ALU Control
Plan to control ALU: main control sends a 2-bit ALUOp control field
to the ALU control. Based on ALUOp and funct field of instruction the
ALU control generates the 3-bit ALU control field
Recall from Ch. 4:
ALU control field   Function
000                 and
001                 or
010                 add
110                 sub
111                 slt
[Figure: the main control sends the 2-bit ALUOp to the ALU control block, which also takes the 6-bit instruction funct field and sends the 3-bit ALU control input to the ALU]
ALU must perform
add for load/stores (ALUOp 00)
sub for branches (ALUOp 01)
one of and, or, add, sub, slt for R-type instructions, depending on the
instruction's 6-bit funct field (ALUOp 10)
Setting ALU Control Bits
Instruction  ALUOp  Instruction       Funct    Desired            ALU control
opcode              operation         field    ALU action         input
LW           00     load word         xxxxxx   add                010
SW           00     store word        xxxxxx   add                010
Branch eq    01     branch eq         xxxxxx   subtract           110
R-type       10     add               100000   add                010
R-type       10     subtract          100010   subtract           110
R-type       10     AND               100100   and                000
R-type       10     OR                100101   or                 001
R-type       10     set on less than  101010   set on less than   111
ALUOp            Funct field          Operation
ALUOp1  ALUOp0   F5 F4 F3 F2 F1 F0
0       0        X  X  X  X  X  X     010
0*      1        X  X  X  X  X  X     110
1       X        X  X  0  0  0  0     010
1       X        X  X  0  0  1  0     110
1       X        X  X  0  1  0  0     000
1       X        X  X  0  1  0  1     001
1       X        X  X  1  0  1  0     111
*Typo in text Fig. 5.15: if it is X then there is potential conflict between line 2 and lines 3-7!
Truth table for ALU control bits
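The truth table translates directly into a small lookup, sketched below. The function name `alu_control` is hypothetical; ALUOp is passed as a 2-bit integer (ALUOp1 is the high bit) and only the low four funct bits F3-F0 are examined, as in the table.

```python
# Funct bits F3..F0 -> 3-bit ALU control input, for ALUOp1 = 1 (R-type)
FUNCT_TO_OPERATION = {
    0b0000: 0b010,  # add
    0b0010: 0b110,  # sub
    0b0100: 0b000,  # and
    0b0101: 0b001,  # or
    0b1010: 0b111,  # slt
}

def alu_control(alu_op, funct):
    """Return the 3-bit ALU control input from the 2-bit ALUOp and 6-bit funct field."""
    if alu_op & 0b10:                 # ALUOp1 = 1: decode the funct field
        return FUNCT_TO_OPERATION[funct & 0xF]
    if alu_op & 0b01:                 # ALUOp = 01: branch, subtract
        return 0b110
    return 0b010                      # ALUOp = 00: load/store, add
```

For example, `alu_control(0b10, 0b101010)` gives `0b111` (slt), while any funct value with ALUOp 00 gives `0b010` (add).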
Designing the Main Control
R-type:                opcode   rs      rt      rd      shamt   funct
                       31-26    25-21   20-16   15-11   10-6    5-0
Load/store or branch:  opcode   rs      rt      address
                       31-26    25-21   20-16   15-0
Observations about MIPS instruction format

opcode is always in bits 31-26
two registers to be read are always rs (bits 25-21) and rt (bits 20-16)
base register for load/stores is always rs (bits 25-21)
16-bit offset for branch equal and load/store is always bits 15-0
destination register for loads is in bits 20-16 (rt) while for R-type
instructions it is in bits 15-11 (rd) (will require multiplexor to select)
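Because the fields sit at fixed bit positions, they can all be extracted in parallel with shifts and masks. The sketch below assumes the instruction is given as a 32-bit integer; the function name `decode` and the dictionary keys are chosen for illustration.

```python
def decode(instr):
    """Split a 32-bit MIPS instruction word into its fixed-position fields."""
    return {
        "opcode": (instr >> 26) & 0x3F,   # bits 31-26
        "rs": (instr >> 21) & 0x1F,       # bits 25-21
        "rt": (instr >> 16) & 0x1F,       # bits 20-16
        "rd": (instr >> 11) & 0x1F,       # bits 15-11 (R-type destination)
        "shamt": (instr >> 6) & 0x1F,     # bits 10-6
        "funct": instr & 0x3F,            # bits 5-0
        "offset": instr & 0xFFFF,         # bits 15-0 (load/store, branch)
    }

# add with rs = 10, rt = 11, rd = 9 and funct = 100000
fields = decode((10 << 21) | (11 << 16) | (9 << 11) | 0b100000)
```

All fields are extracted for every instruction; which ones are meaningful depends on the opcode, which is why a multiplexor is needed to choose between rt and rd as the destination.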
Datapath with Control I
[Figure: single-cycle datapath with control lines PCSrc, RegWrite, RegDst, ALUSrc, MemRead, MemWrite, MemtoReg and ALUOp; a new multiplexor, controlled by RegDst, selects Instruction [20-16] or Instruction [15-11] as the write register; Instruction [25-21] and Instruction [20-16] are the read registers; the ALU control block takes Instruction [5-0] and ALUOp]
Adding control to the MIPS Datapath III (and a new multiplexor to select field to
specify destination register): what are the functions of the 9 control signals?
Control Signals
Signal name: effect when deasserted / effect when asserted
RegDst
  deasserted: The register destination number for the Write register comes from the rt field (bits 20-16)
  asserted: The register destination number for the Write register comes from the rd field (bits 15-11)
RegWrite
  deasserted: None
  asserted: The register on the Write register input is written with the value on the Write data input
ALUSrc
  deasserted: The second ALU operand comes from the second register file output (Read data 2)
  asserted: The second ALU operand is the sign-extended, lower 16 bits of the instruction
PCSrc
  deasserted: The PC is replaced by the output of the adder that computes the value of PC + 4
  asserted: The PC is replaced by the output of the adder that computes the branch target
MemRead
  deasserted: None
  asserted: Data memory contents designated by the address input are put on the Read data output
MemWrite
  deasserted: None
  asserted: Data memory contents designated by the address input are replaced by the value of the Write data input
MemtoReg
  deasserted: The value fed to the register Write data input comes from the ALU
  asserted: The value fed to the register Write data input comes from the data memory
Effects of the seven control signals

Datapath with Control II
[Figure: single-cycle datapath with the Control unit: input Instruction [31-26]; outputs RegDst, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, RegWrite; the ALU control block takes Instruction [5-0] and ALUOp]
MIPS datapath with the control unit: input to control is the 6-bit instruction
opcode field, output is seven 1-bit signals and the 2-bit ALUOp signal
Datapath with Control II (cont.)
PCSrc cannot be set directly from the opcode: zero test outcome is required
[Figure: datapath with the Control unit; PCSrc is generated from the Branch control signal together with the ALU Zero output]
Determining control signals for the MIPS datapath based on instruction opcode
Instruction  RegDst  ALUSrc  MemtoReg  RegWrite  MemRead  MemWrite  Branch  ALUOp1  ALUOp0
R-format     1       0       0         1         0        0         0       1       0
lw           0       1       1         1         1        0         0       0       0
sw           X       1       X         0         0        1         0       0       0
beq          X       0       X         0         0        0         1       0       1
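The table can be held as a simple lookup keyed by opcode, sketched below. The opcode values are those of the main control truth table later in these slides (R-format 000000, lw 100011, sw 101011, beq 000100); don't-care entries are stored as `None`. The names `CONTROL`, `main_control` and `pc_src` are hypothetical, and combining Branch with Zero by a logical AND is the usual way to derive PCSrc, assumed here.

```python
CONTROL = {
    0b000000: dict(RegDst=1, ALUSrc=0, MemtoReg=0, RegWrite=1,
                   MemRead=0, MemWrite=0, Branch=0, ALUOp=0b10),    # R-format
    0b100011: dict(RegDst=0, ALUSrc=1, MemtoReg=1, RegWrite=1,
                   MemRead=1, MemWrite=0, Branch=0, ALUOp=0b00),    # lw
    0b101011: dict(RegDst=None, ALUSrc=1, MemtoReg=None, RegWrite=0,
                   MemRead=0, MemWrite=1, Branch=0, ALUOp=0b00),    # sw
    0b000100: dict(RegDst=None, ALUSrc=0, MemtoReg=None, RegWrite=0,
                   MemRead=0, MemWrite=0, Branch=1, ALUOp=0b01),    # beq
}

def main_control(opcode):
    """Return the control signal settings for a 6-bit opcode."""
    return CONTROL[opcode]

def pc_src(branch, zero):
    """PCSrc needs the zero test outcome as well as the Branch signal."""
    return branch & zero
```

Only beq sets Branch, so `pc_src` can be 1 only for a beq whose operands compare equal.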
Control Signals:
R-Type Instruction
[Figure: single-cycle datapath with control signal values for an R-type instruction shown in blue: RegDst = 1, ALUSrc = 0, MemtoReg = 0, RegWrite = 1, MemRead = 0, MemWrite = 0, PCSrc = 0; the ALU Operation value depends on funct]
Control Signals:
lw Instruction
[Figure: single-cycle datapath with control signal values for lw shown in blue: RegDst = 0, ALUSrc = 1, MemtoReg = 1, RegWrite = 1, MemRead = 1, MemWrite = 0, PCSrc = 0, ALU Operation = 010]
Control Signals:
sw Instruction
[Figure: single-cycle datapath with control signal values for sw shown in blue: RegDst = X, ALUSrc = 1, MemtoReg = X, RegWrite = 0, MemRead = 0, MemWrite = 1, PCSrc = 0, ALU Operation = 010]
Control Signals:
beq Instruction
[Figure: single-cycle datapath with control signal values for beq shown in blue: RegDst = X, ALUSrc = 0, MemtoReg = X, RegWrite = 0, MemRead = 0, MemWrite = 0, ALU Operation = 110, PCSrc = 1 if Zero = 1]
Datapath with Control III
Jump: opcode (bits 31-26), address (bits 25-0)
[Figure: datapath extended for jumps: Instruction [25-0] is shifted left 2 (26 bits to 28 bits) and combined with PC+4 [31-28] to compose the jump target address [31-0]; a new multiplexor with the additional control bit Jump selects between this address and the branch/PC+4 multiplexor output]
MIPS datapath extended to jumps: control unit generates new Jump control bit
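Composing the jump target is pure bit manipulation, as the following sketch shows. The function name `jump_target` is hypothetical; both arguments are 32-bit integers.

```python
def jump_target(pc_plus_4, instr):
    """Concatenate PC+4 [31-28] with Instruction [25-0] shifted left 2."""
    upper = pc_plus_4 & 0xF0000000            # top 4 bits of PC + 4
    lower = (instr & 0x03FFFFFF) << 2         # 26-bit address field becomes 28 bits
    return upper | lower
```

For example, a jump instruction word 0x08000010 fetched from address 0x40000000 (so PC + 4 is 0x40000004) jumps to 0x40000040.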
Datapath Executing j
R-type Instruction: Step 1
add $t1, $t2, $t3 (active = bold)
[Figure: single-cycle datapath with control; the elements active in this step are shown in bold]
Fetch instruction and increment PC count

[Figure: single-cycle datapath with control; the elements active in this step are shown in bold]
Read two source registers from the register file

[Figure: single-cycle datapath with control; the elements active in this step are shown in bold]
ALU operates on the two register operands

[Figure: single-cycle datapath with control; the elements active in this step are shown in bold]
Write result to register

Single-cycle Implementation
Notes
The steps are not really distinct as each instruction
completes in exactly one clock cycle; they simply indicate
the sequence of data flowing through the datapath
The operation of the datapath during a cycle is purely
combinational; nothing is stored during a clock cycle
Therefore, the machine is stable in a particular state at the
start of a cycle and reaches a new stable state only at the
end of the cycle
Very important for understanding single-cycle computing:
see our simple Verilog single-cycle computer in the folder
SimpleSingleCycleComputer in Verilog/Examples
Load Instruction Steps
lw $t1, offset($t2)
1. Fetch instruction and increment PC
2. Read base register from the register file: the base
register ($t2) is given by bits 25-21 of the instruction
3. ALU computes sum of value read from the register file
and the sign-extended lower 16 bits (offset) of the
instruction
4. The sum from the ALU is used as the address for the
data memory
5. The data from the memory unit is written into the
register file: the destination register ($t1) is given by
bits 20-16 of the instruction
Load Instruction
lw $t1, offset($t2)
[Figure: single-cycle datapath with control, with the elements active during the load instruction highlighted]
Branch Instruction Steps
beq $t1, $t2, offset
1. Fetch instruction and increment PC
2. Read two registers ($t1 and $t2) from the register file
3. ALU performs a subtract on the data values from the
register file; the value of PC+4 is added to the sign-
extended lower 16 bits (offset) of the instruction
shifted left by two to give the branch target address
4. The Zero result from the ALU is used to decide which
adder result (from step 1 or 3) to store in the PC
Branch Instruction
beq $t1, $t2, offset
[Figure: single-cycle datapath with control, with the elements active during the branch instruction highlighted]
Implementation: ALU Control Block
ALUOp            Funct field          Operation
ALUOp1  ALUOp0   F5 F4 F3 F2 F1 F0
0       0        X  X  X  X  X  X     010
0*      1        X  X  X  X  X  X     110
1       X        X  X  0  0  0  0     010
1       X        X  X  0  0  1  0     110
1       X        X  X  0  1  0  0     000
1       X        X  X  0  1  0  1     001
1       X        X  X  1  0  1  0     111
*Typo in text Fig. 5.15: if it is X then there is potential conflict between line 2 and lines 3-7!
Truth table for ALU control bits
[Figure: ALU control block: gate-level logic producing Operation2, Operation1 and Operation0 from ALUOp1, ALUOp0 and the funct bits F3, F2, F1, F0 of F (5-0)]
ALU control logic

Implementation: Main Control Block
Signal name   R-format  lw  sw  beq
Inputs
Op5           0         1   1   0
Op4           0         0   0   0
Op3           0         0   1   0
Op2           0         0   0   1
Op1           0         1   1   0
Op0           0         1   1   0
Outputs
RegDst        1         0   x   x
ALUSrc        0         1   1   0
MemtoReg      0         1   x   x
RegWrite      1         1   0   0
MemRead       0         1   0   0
MemWrite      0         0   1   0
Branch        0         0   0   1
ALUOp1        1         0   0   0
ALUOp0        0         0   0   1
Truth table for main control signals
[Figure: main control PLA with inputs Op5 to Op0, one product term each for R-format, lw, sw and beq, and outputs RegDst, ALUSrc, MemtoReg, RegWrite, MemRead, MemWrite, Branch, ALUOp1, ALUOp0]
Main control PLA (programmable logic array): principle underlying
PLAs is that any logical expression can be written as a sum-of-products
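The sum-of-products idea can be made concrete with a bit-level sketch: an AND plane forms one product term per instruction class from the opcode bits, and an OR plane combines those terms into each output. This is an illustration only; the function name is hypothetical and the don't-care outputs are arbitrarily resolved to 0.

```python
def main_control_pla(op):
    """Evaluate the main control signals for a 6-bit opcode as a sum of products."""
    b = [(op >> i) & 1 for i in range(6)]    # b[i] is opcode bit Op_i
    n = [1 - x for x in b]                   # n[i] is the complement of Op_i

    # AND plane: one product term per instruction class
    r_format = n[5] & n[4] & n[3] & n[2] & n[1] & n[0]   # 000000
    lw = b[5] & n[4] & n[3] & n[2] & b[1] & b[0]         # 100011
    sw = b[5] & n[4] & b[3] & n[2] & b[1] & b[0]         # 101011
    beq = n[5] & n[4] & n[3] & b[2] & n[1] & n[0]        # 000100

    # OR plane: each output is the OR of the product terms in which it is 1
    return {
        "RegDst": r_format,
        "ALUSrc": lw | sw,
        "MemtoReg": lw,
        "RegWrite": r_format | lw,
        "MemRead": lw,
        "MemWrite": sw,
        "Branch": beq,
        "ALUOp1": r_format,
        "ALUOp0": beq,
    }
```

Evaluating it for the lw opcode 100011 sets ALUSrc, MemtoReg, RegWrite and MemRead and leaves every other output at 0.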
Single-Cycle Design Problems
Assuming a fixed-period clock, every instruction uses one
clock cycle, which implies:
CPI = 1
cycle time determined by length of the longest instruction path
(load)
but several instructions could run in a shorter clock cycle: waste of time
consider if we have more complicated instructions like floating point!
resources used more than once in the same cycle need to be
duplicated
waste of hardware and chip area
Example: Fixed-period clock vs.
variable-period clock in a
single-cycle implementation
Consider a machine with an additional floating point unit. Assume
functional unit delays as follows
memory: 2 ns., ALU and adders: 2 ns., FPU add: 8 ns., FPU multiply: 16 ns.,
register file access (read or write): 1 ns.
multiplexors, control unit, PC accesses, sign extension, wires: no delay
Assume instruction mix as follows
all loads take same time and comprise 31%
all stores take same time and comprise 21%
R-format instructions comprise 27%
branches comprise 5%
jumps comprise 2%
FP adds and subtracts take the same time and totally comprise 7%
FP multiplies and divides take the same time and totally comprise 7%
Compare the performance of (a) a single-cycle implementation using a fixed-
period clock with (b) one using a variable-period clock where each instruction
executes in one clock cycle that is only as long as it needs to be (not really
practical but pretend it's possible!)
Solution
Instruction  Instr.  Register  ALU    Data   Register  FPU      FPU      Total
class        mem.    read      oper.  mem.   write     add/sub  mul/div  time (ns)
Load word    2       1         2      2      1                           8
Store word   2       1         2      2                                  7
R-format     2       1         2      0      1                           6
Branch       2       1         2                                         5
Jump         2                                                           2
FP mul/div   2       1                       1                  16       20
FP add/sub   2       1                       1         8                 12
Clock period for fixed-period clock = longest instruction time = 20 ns.
Average clock period for variable-period clock = 8 × 31% +
7 × 21% + 6 × 27% + 5 × 5% + 2 × 2% + 20 × 7% + 12 × 7%
= 8.1 ns.
Therefore, performance(var-period) / performance(fixed-period) = 20/8.1 ≈ 2.5
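The weighted average is easy to recompute from the table entries and the instruction mix. The sketch below does exactly that; the dictionary keys and variable names are chosen for illustration.

```python
# Total time per instruction class in ns (from the solution table)
times_ns = {"load": 8, "store": 7, "r_format": 6, "branch": 5,
            "jump": 2, "fp_mul_div": 20, "fp_add_sub": 12}

# Instruction mix as fractions (from the problem statement)
mix = {"load": 0.31, "store": 0.21, "r_format": 0.27, "branch": 0.05,
       "jump": 0.02, "fp_mul_div": 0.07, "fp_add_sub": 0.07}

fixed_period = max(times_ns.values())                          # slowest instruction
variable_period = sum(times_ns[k] * mix[k] for k in times_ns)  # weighted average
speedup = fixed_period / variable_period

print(f"fixed: {fixed_period} ns, variable: {variable_period:.2f} ns, "
      f"speedup: {speedup:.2f}")
```

With this mix the weighted sum evaluates to about 8.1 ns, giving a performance ratio of roughly 20/8.1, or about 2.5.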
Fixing the problem with single-
cycle designs
One solution: a variable-period clock with different cycle
times for each instruction class
unfeasible, as implementing a variable-speed clock is technically
difficult
Another solution:
use a smaller cycle time
have different instructions take different numbers of cycles
by breaking instructions into steps and fitting each step into one
cycle
feasible: multicycle approach!
Multicycle Approach
Break up the instructions into steps
each step takes one clock cycle
balance the amount of work to be done in each step/cycle so that
they are about equal
restrict each cycle to use at most once each major functional unit
so that such units do not have to be replicated
functional units can be shared between different cycles within one
instruction
Between steps/cycles
At the end of one cycle store data to be used in later cycles of the
same instruction
need to introduce additional internal (programmer-invisible) registers
for this purpose
Data to be used in later instructions are stored in programmer-
visible state elements: the register file, PC, memory
Multicycle Approach
Note particularities of multicycle vs. single-cycle diagrams
single memory for data and instructions
single ALU, no extra adders
extra registers to hold data between clock cycles
[Figure: Single-cycle datapath]
[Figure: Multicycle datapath (high-level view): PC, memory, instruction register, memory data register, registers, A, B, ALU, ALUOut]

Multicycle Datapath
[Figure: multicycle datapath: PC, a single memory (MemData), instruction register, memory data register, register file, A and B registers, ALU with multiplexed inputs (including the constant 4 and the sign-extended, shifted offset) and ALUOut]
Basic multicycle MIPS datapath handles R-type instructions and load/stores:
new internal registers in red ovals, new multiplexors in blue ovals
Breaking instructions into steps
Our goal is to break up the instructions into steps so that
each step takes one clock cycle
the amount of work to be done in each step/cycle is about equal
each cycle uses at most once each major functional unit so that
such units do not have to be replicated
functional units can be shared between different cycles within one
instruction
Data at end of one cycle to be used in next must be stored !!
Breaking instructions into steps
We break instructions into the following potential execution steps;
not all instructions require all the steps, and each step takes one
clock cycle
1. Instruction fetch and PC increment (IF)
2. Instruction decode and register fetch (ID)
3. Execution, memory address computation, or branch completion (EX)
4. Memory access or R-type instruction completion (MEM)
5. Memory read completion (WB)
Each MIPS instruction takes from 3 to 5 cycles (steps)

Step 1: Instruction Fetch &
PC Increment (IF)
Use PC to get instruction and put it in the instruction register.
Increment the PC by 4 and put the result back in the PC.
Can be described succinctly using RTL (Register-Transfer Language):

IR = Memory[PC];
PC = PC + 4;
Step 2: Instruction Decode and
Register Fetch (ID)
Read registers rs and rt in case we need them.
Compute the branch address in case the instruction is a branch.
RTL:
A = Reg[IR[25-21]];
B = Reg[IR[20-16]];
ALUOut = PC + (sign-extend(IR[15-0]) << 2);
Step 3: Execution, Address
Computation or Branch Completion
(EX)
ALU performs one of four functions depending on instruction
type
memory reference:
ALUOut = A + sign-extend(IR[15-0]);
R-type:
ALUOut = A op B;
branch (instruction completes):
if (A==B) PC = ALUOut;
jump (instruction completes):
PC = PC[31-28] || (IR[25-0] << 2);
Step 4: Memory access or R-
type Instruction Completion
(MEM)
Again depending on instruction type:
Loads and stores access memory
load
MDR = Memory[ALUOut];
store (instruction completes)
Memory[ALUOut] = B;
R-type (instruction completes)

Reg[IR[15-11]] = ALUOut;
Step 5: Memory Read
Completion (WB)
Again depending on instruction type:
Load writes back (instruction completes)
Reg[IR[20-16]]= MDR;
Important: There is no reason from a datapath (or control) point

of view that Step 5 cannot be eliminated by performing
Reg[IR[20-16]]= Memory[ALUOut];
for loads in Step 4. This would eliminate the MDR as well.
The reason this is not done is that, to keep steps balanced in

length, the design restriction is to allow each step to contain
at most one ALU operation, or one register access, or one
memory access.
Summary of Instruction
Execution
Step 1: IF, Instruction fetch (all instructions)
  IR = Memory[PC]
  PC = PC + 4
Step 2: ID, Instruction decode/register fetch (all instructions)
  A = Reg[IR[25-21]]
  B = Reg[IR[20-16]]
  ALUOut = PC + (sign-extend(IR[15-0]) << 2)
Step 3: EX, Execution, address computation, branch/jump completion
  R-type instructions: ALUOut = A op B
  Memory-reference instructions: ALUOut = A + sign-extend(IR[15-0])
  Branches: if (A == B) then PC = ALUOut
  Jumps: PC = PC[31-28] || (IR[25-0] << 2)
Step 4: MEM, Memory access or R-type completion
  R-type instructions: Reg[IR[15-11]] = ALUOut
  Load: MDR = Memory[ALUOut]
  Store: Memory[ALUOut] = B
Step 5: WB, Memory read completion
  Load: Reg[IR[20-16]] = MDR
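The step summary can be followed in a small interpreter that executes one instruction and reports how many cycles it used. This is a sketch under stated assumptions: the state layout (a dict with `pc`, `reg` and a single `mem` dict keyed by byte address), the function names, the j opcode 000010 and the restriction of R-type to add and sub are choices made here, not part of the slides.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def execute_multicycle(state):
    """Run one instruction through the multicycle steps; return the cycles used."""
    reg, mem = state["reg"], state["mem"]
    # Step 1 (IF): fetch and increment PC
    ir = mem[state["pc"]]
    state["pc"] = (state["pc"] + 4) & 0xFFFFFFFF
    # Step 2 (ID): read rs and rt, precompute the branch target
    a = reg[(ir >> 21) & 0x1F]
    b = reg[(ir >> 16) & 0x1F]
    alu_out = (state["pc"] + (sign_extend16(ir) << 2)) & 0xFFFFFFFF
    opcode = (ir >> 26) & 0x3F
    # Step 3 (EX)
    if opcode == 0b000100:                       # beq completes here
        if a == b:
            state["pc"] = alu_out
        return 3
    if opcode == 0b000010:                       # j completes here
        state["pc"] = (state["pc"] & 0xF0000000) | ((ir & 0x03FFFFFF) << 2)
        return 3
    if opcode == 0b000000:                       # R-type (add and sub only)
        funct = ir & 0x3F
        if funct not in (0b100000, 0b100010):
            raise ValueError("only add and sub are modeled")
        alu_out = (a + b if funct == 0b100000 else a - b) & 0xFFFFFFFF
        reg[(ir >> 11) & 0x1F] = alu_out         # Step 4: R-type completion
        return 4
    alu_out = (a + sign_extend16(ir)) & 0xFFFFFFFF   # memory address
    if opcode == 0b101011:                       # sw: Step 4 memory write
        mem[alu_out] = b
        return 4
    if opcode == 0b100011:                       # lw
        mdr = mem[alu_out]                       # Step 4: memory read
        reg[(ir >> 16) & 0x1F] = mdr             # Step 5: write back
        return 5
    raise ValueError(f"unsupported opcode: {opcode:06b}")
```

Running it on a load reports 5 cycles, while branches and jumps finish in 3, matching the range of 3 to 5 steps per instruction.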
Multicycle Execution Step (1):
Instruction Fetch
IR = Memory[PC];
PC = PC + 4;
PC + 4
4
Instruction Decode & Register Fetch
A = Reg[IR[25-21]]; (A = Reg[rs])
B = Reg[IR[20-16]]; (B = Reg[rt])
ALUOut = PC + (sign-extend(IR[15-0]) << 2);
Branch
Reg[rs]
Target
Address
PC + 4
Reg[rt]
Memory Reference Instructions
Reg[rs] Mem.
Address
PC + 4
Reg[rt]
ALU Instruction (R-Type)
ALUOut = A op B
Reg[rs]
R-Type
Result
PC + 4
Reg[rt]
Branch Instructions
if (A == B) PC = ALUOut;
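[Figure: datapath register contents after this step: A = Reg[rs], B = Reg[rt], ALUOut = branch target address, PC = branch target address (if the branch is taken)]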
Jump Instruction
PC = PC[31-28] concat (IR[25-0] << 2)
[Figure: datapath register contents after this step: A = Reg[rs], B = Reg[rt], ALUOut = branch target address, PC = jump address]
Memory Access - Read (lw)
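MDR = Memory[ALUOut];
[Figure: datapath register contents after this step: PC = PC + 4, A = Reg[rs], B = Reg[rt], ALUOut = memory address, MDR = memory data]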
Memory Access - Write (sw)
Memory[ALUOut] = B;
[Figure: datapath register contents after this step: PC = PC + 4, A = Reg[rs], B = Reg[rt]]
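Reg[IR[15-11]] = ALUOut; (R-type completion)
[Figure: datapath register contents after this step: PC = PC + 4, A = Reg[rs], B = Reg[rt], ALUOut = R-type result]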
Memory Read Completion (lw)
Reg[IR[20-16]] = MDR;
[Figure: datapath register contents after this step: PC = PC + 4, A = Reg[rs], B = Reg[rt], ALUOut = memory address, MDR = memory data]
Multicycle Datapath with Control I
[Figure: multicycle datapath with control lines and the ALU control block added; not all control lines are shown. Control lines shown: IorD, MemRead, MemWrite, IRWrite, RegDst, RegWrite, ALUSrcA, MemtoReg, ALUSrcB, ALUOp]
Multicycle Datapath with Control II
[Figure: complete multicycle MIPS datapath (with branch and jump capability), showing the main control block and all control lines. Figure annotations: "New gates", "New multiplexor", "For the jump address". Control outputs: PCWriteCond, PCWrite, IorD, MemRead, MemWrite, MemtoReg, IRWrite, PCSource, ALUOp, ALUSrcB, ALUSrcA, RegWrite, RegDst. The jump address [31-0] is formed from Instruction [25-0] shifted left 2 (26 to 28 bits) together with PC [31-28]]
Multicycle Control Step (1):
Fetch
IR = Memory[PC];
PC = PC + 4;
[Figure: multicycle datapath annotated with the control-signal values for the instruction fetch step]
Instruction Decode & Register Fetch
A = Reg[IR[25-21]]; (A = Reg[rs])
B = Reg[IR[20-16]]; (B = Reg[rt])
ALUOut = PC + (sign-extend(IR[15-0]) << 2);
[Figure: multicycle datapath annotated with the control-signal values for the instruction decode and register fetch step]
Memory Reference Instructions
ALUOut = A + sign-extend(IR[15-0]);
[Figure: multicycle datapath annotated with the control-signal values for the memory address computation step]
ALUOut = A op B;
[Figure: multicycle datapath annotated with the control-signal values for the R-type execution step]
Branch Instructions
if (A == B) PC = ALUOut;
[Figure: multicycle datapath annotated with the control-signal values for the branch completion step; the PC write signal (PCWr*) is 1 only if Zero = 1]
Jump Instruction
PC = PC[31-28] concat (IR[25-0] << 2);
[Figure: multicycle datapath annotated with the control-signal values for the jump completion step]
Memory Access - Read (lw)
MDR = Memory[ALUOut];
[Figure: multicycle datapath annotated with the control-signal values for the memory read step]
Multicycle Execution Steps (4)
Memory Access - Write (sw)
Memory[ALUOut] = B;
[Figure: multicycle datapath annotated with the control-signal values for the memory write step]
Reg[IR[15-11]] = ALUOut; (Reg[rd] = ALUOut)
[Figure: multicycle datapath annotated with the control-signal values for the R-type completion step]
Multicycle Execution Steps (5)
Memory Read Completion (lw)
Reg[IR[20-16]] = MDR;
[Figure: multicycle datapath annotated with the control-signal values for the memory read completion step]
Simple Questions
How many cycles will it take to execute this code?
lw $t2, 0($t3)
lw $t3, 4($t3)
beq $t2, $t3, Label #assume not equal
add $t5, $t2, $t3
sw $t5, 8($t3)
Label: ...
What is going on during the 8th cycle of execution?
Clock time-line
In what cycle does the actual addition of $t2 and $t3 take place?
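One way to check answers to these questions is to lay the steps out on a clock time-line. The sketch below assumes the per-class step counts of the multicycle design (loads 5, stores 4, R-type 4, branches 3, jumps 3) and that instructions run strictly one after another; the names CYCLES and schedule are hypothetical.

```python
# Illustrative sketch: CYCLES and schedule are hypothetical names.
# Step counts per instruction in the multicycle design.
CYCLES = {"lw": 5, "sw": 4, "add": 4, "beq": 3, "j": 3}
STEP_NAMES = ["IF", "ID", "EX", "MEM", "WB"]

# The code sequence from the question; the beq is assumed not taken.
PROGRAM = ["lw", "lw", "beq", "add", "sw"]

def schedule(program):
    """Return a list of (cycle, instruction_index, step_name) tuples."""
    timeline = []
    cycle = 0
    for index, name in enumerate(program):
        for step in STEP_NAMES[:CYCLES[name]]:
            cycle += 1
            timeline.append((cycle, index, step))
    return timeline

timeline = schedule(PROGRAM)
total_cycles = len(timeline)                  # total for the whole sequence
eighth_cycle = timeline[7]                    # what happens in cycle 8
add_ex_cycle = [c for c, i, s in timeline if PROGRAM[i] == "add" and s == "EX"]
```

Under these assumptions the sequence takes 21 cycles, cycle 8 is the EX step (address computation) of the second lw, and the addition is performed in cycle 16.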
Implementing Control
Value of control signals is dependent upon:
what instruction is being executed
which step is being performed
Use the information we have accumulated to specify a finite state machine
specify the finite state machine graphically, or
use microprogramming
Implementation is then derived from the specification
Review: Finite State Machines
Finite state machines (FSMs):
a set of states and
next state function, determined by current state and the input
output function, determined by current state and possibly input
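[Figure: FSM block diagram. The next-state function takes the current state and the inputs and produces the next state, which is loaded into the current-state register on the clock. The output function produces the outputs.]
We'll use a Moore machine: output is based only on the current state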

Example: Moore Machine
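The Moore machine below, given as input a binary string terminated by "#", will output "even" if the string has an even number of 0s and "odd" if the string has an odd number of 0s
[Figure: state diagram]
Start: enter the Even state
Even state (no output): on 1 stay in Even; on 0 go to Odd; on # go to Output even state
Odd state (no output): on 1 stay in Odd; on 0 go to Even; on # go to Output odd state
Output even state: output "even"
Output odd state: output "odd"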
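The parity machine above can be sketched as a transition table plus an output table that depends only on the current state, which is what makes it a Moore machine. The state names and the function run_parity_machine are hypothetical, chosen for illustration.

```python
# Illustrative sketch: state names and run_parity_machine are hypothetical.

# Next-state function: (current state, input symbol) -> next state
NEXT_STATE = {
    ("even", "0"): "odd",
    ("even", "1"): "even",
    ("even", "#"): "output_even",
    ("odd", "0"): "even",
    ("odd", "1"): "odd",
    ("odd", "#"): "output_odd",
}

# Output function: depends on the current state only (Moore machine)
OUTPUT = {
    "even": None,
    "odd": None,
    "output_even": "even",
    "output_odd": "odd",
}

def run_parity_machine(symbols):
    """Feed symbols to the machine; return its output, or None if no '#' is seen."""
    state = "even"                       # the start arc enters the Even state
    for symbol in symbols:
        state = NEXT_STATE[(state, symbol)]
        if OUTPUT[state] is not None:    # reached an output state: stop
            return OUTPUT[state]
    return None
```

For example, run_parity_machine("1001#") yields "even" because the string contains two 0s.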

FSM Control: High-level View
Start
Instruction fetch/decode and register fetch (Figure 5.37), followed by one of:
Memory access instructions (Figure 5.38)
R-type instructions (Figure 5.39)
Branch instruction (Figure 5.40)
Jump instruction (Figure 5.41)
High-level view of FSM control

State 0: Instruction fetch (entered from Start)
MemRead
ALUSrcA = 0
IorD = 0
IRWrite
ALUSrcB = 01
ALUOp = 00
PCWrite
PCSource = 00
State 1: Instruction decode/register fetch
ALUSrcA = 0
ALUSrcB = 11
ALUOp = 00
From state 1:
(Op = 'LW') or (Op = 'SW'): to Memory reference FSM (Figure 5.38)
(Op = R-type): to R-type FSM (Figure 5.39)
(Op = 'BEQ'): to Branch FSM (Figure 5.40)
(Op = 'JMP'): to Jump FSM (Figure 5.41)
Asserted signals are shown inside the state circles
The instruction fetch and decode steps of every instruction are identical

FSM Control: Memory Reference
From state 1
(Op = 'LW') or (Op = 'SW')
State 2: Memory address computation
ALUSrcA = 1
ALUSrcB = 10
ALUOp = 00
(Op = 'LW'): to state 3; (Op = 'SW'): to state 5
State 3: Memory access (read)
MemRead
IorD = 1
To state 4
State 5: Memory access (write)
MemWrite
IorD = 1
To state 0 (Figure 5.37)
State 4: Write-back step
RegWrite
MemtoReg = 1
RegDst = 0
To state 0 (Figure 5.37)
FSM control for memory-reference instructions has 4 states

FSM Control: R-type Instruction
From state 1
(Op = R-type)
Execution
6
ALUSrcA = 1
ALUSrcB = 00
ALUOp = 10
R-type completion
7
RegDst = 1
RegWrite
MemtoReg = 0
To state 0
(Figure 5.37)
FSM control to implement R-type instructions has 2 states
FSM Control: Branch Instruction
From state 1
(Op = 'BEQ')
Branch completion
8
ALUSrcA = 1
ALUSrcB = 00
ALUOp = 01
PCWriteCond
PCSource = 01
To state 0
(Figure 5.37)
FSM control to implement branches has 1 state

FSM Control: Jump Instruction
From state 1
(Op = 'J')
Jump completion
9
PCWrite
PCSource = 10
To state 0
(Figure 5.37)
FSM control to implement jumps has 1 state

FSM Control: Complete View
State 0 (IF) Instruction fetch, entered from Start: MemRead, ALUSrcA = 0, IorD = 0, IRWrite, ALUSrcB = 01, ALUOp = 00, PCWrite, PCSource = 00; next: state 1
State 1 (ID) Instruction decode/register fetch: ALUSrcA = 0, ALUSrcB = 11, ALUOp = 00; next: state 2 if (Op = 'LW') or (Op = 'SW'), state 6 if (Op = R-type), state 8 if (Op = 'BEQ'), state 9 if (Op = 'J')
State 2 (EX) Memory address computation: ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00; next: state 3 if (Op = 'LW'), state 5 if (Op = 'SW')
State 3 (MEM) Memory access: MemRead, IorD = 1; next: state 4
State 4 (WB) Write-back step: RegDst = 0, RegWrite, MemtoReg = 1; next: state 0
State 5 (MEM) Memory access: MemWrite, IorD = 1; next: state 0
State 6 (EX) Execution: ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10; next: state 7
State 7 (MEM) R-type completion: RegDst = 1, RegWrite, MemtoReg = 0; next: state 0
State 8 (EX) Branch completion: ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01, PCWriteCond, PCSource = 01; next: state 0
State 9 (EX) Jump completion: PCWrite, PCSource = 10; next: state 0
Labels on arcs are conditions that determine the next state
The complete FSM control for the multicycle MIPS datapath: refer to Multicycle Datapath with Control II
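The ten-state controller can be written down as two tables: a Moore-style output table (control signals per state) and a next-state function of the current state and the opcode class. The following is a rough sketch; the dictionary layout, the opcode-class strings and the function names are assumptions made for illustration, while the signal values are those listed for states 0 to 9.

```python
# Illustrative sketch: CONTROL, DISPATCH, next_state and trace are hypothetical names.

# Output function: control signals asserted or set in each state
CONTROL = {
    0: {"MemRead": 1, "ALUSrcA": 0, "IorD": 0, "IRWrite": 1,
        "ALUSrcB": 0b01, "ALUOp": 0b00, "PCWrite": 1, "PCSource": 0b00},
    1: {"ALUSrcA": 0, "ALUSrcB": 0b11, "ALUOp": 0b00},
    2: {"ALUSrcA": 1, "ALUSrcB": 0b10, "ALUOp": 0b00},
    3: {"MemRead": 1, "IorD": 1},
    4: {"RegDst": 0, "RegWrite": 1, "MemtoReg": 1},
    5: {"MemWrite": 1, "IorD": 1},
    6: {"ALUSrcA": 1, "ALUSrcB": 0b00, "ALUOp": 0b10},
    7: {"RegDst": 1, "RegWrite": 1, "MemtoReg": 0},
    8: {"ALUSrcA": 1, "ALUSrcB": 0b00, "ALUOp": 0b01,
        "PCWriteCond": 1, "PCSource": 0b01},
    9: {"PCWrite": 1, "PCSource": 0b10},
}

# Arc labels out of state 1 (decode): opcode class -> next state
DISPATCH = {"lw": 2, "sw": 2, "R-type": 6, "beq": 8, "j": 9}

def next_state(state, op):
    """Next-state function of the current state and the opcode class."""
    if state == 0:
        return 1
    if state == 1:
        return DISPATCH[op]
    if state == 2:
        return 3 if op == "lw" else 5
    if state == 3:
        return 4
    if state == 6:
        return 7
    return 0          # states 4, 5, 7, 8 and 9 return to instruction fetch

def trace(op):
    """List the states visited while executing one instruction of class op."""
    states = [0]
    while True:
        following = next_state(states[-1], op)
        if following == 0:
            return states
        states.append(following)
```

The length of trace(op) is the number of clock cycles the instruction takes, for instance five states for lw and three for beq.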
Example: CPI in a multicycle
CPU
Assume
the control design of the previous slide
An instruction mix of 22% loads, 11% stores, 49% R-type
operations, 16% branches, and 2% jumps
What is the CPI assuming each step requires 1 clock cycle?
Solution:
Number of clock cycles from previous slide for each instruction class:
loads 5, stores 4, R-type instructions 4, branches 3, jumps 3
CPI = CPU clock cycles / instruction count
= Σ (instruction count_class i × CPI_class i) / instruction count
= Σ (instruction count_class i / instruction count) × CPI_class i
= 0.22 × 5 + 0.11 × 4 + 0.49 × 4 + 0.16 × 3 + 0.02 × 3
= 4.04
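The weighted sum can be checked with a few lines of code. This is a small sketch using the instruction mix and cycle counts given above; the names MIX and cpi are illustrative, and percentages are kept as integers to avoid floating-point noise in the intermediate sum.

```python
# Illustrative sketch: MIX and cpi are hypothetical names.
# instruction class -> (percentage of the mix, clock cycles per instruction)
MIX = {
    "loads": (22, 5),
    "stores": (11, 4),
    "R-type": (49, 4),
    "branches": (16, 3),
    "jumps": (2, 3),
}

def cpi(mix):
    """Average cycles per instruction as a frequency-weighted sum."""
    total_percent = sum(percent for percent, _ in mix.values())
    weighted = sum(percent * cycles for percent, cycles in mix.values())
    return weighted / total_percent

average_cpi = cpi(MIX)   # 404 / 100
```

Compare this with a CPI of 5 if every instruction took as many cycles as a load.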
FSM Control: Implementation
[Figure: control logic block with the state register fed back from its outputs]
Outputs: PCWrite, PCWriteCond, IorD, MemRead, MemWrite, IRWrite, MemtoReg, PCSource, ALUOp, ALUSrcB, ALUSrcA, RegWrite, RegDst, and the next-state bits NS3, NS2, NS1, NS0
Inputs: Op5, Op4, Op3, Op2, Op1, Op0 (instruction register opcode field) and S3, S2, S1, S0 (state register)
Four state bits are required for 10 states
High-level view of FSM implementation: inputs to the combinational logic block are the current state number and instruction opcode bits; outputs are the next state number and control signals to be asserted for the current state
FSM Control: PLA Implementation
[Figure: PLA implementing the control logic]
Inputs: Op5, Op4, Op3, Op2, Op1, Op0, S3, S2, S1, S0
Outputs: PCWrite, PCWriteCond, IorD, MemRead, MemWrite, IRWrite, MemtoReg, PCSource1, PCSource0, ALUOp1, ALUOp0, ALUSrcB1, ALUSrcB0, ALUSrcA, RegWrite, RegDst, NS3, NS2, NS1, NS0
The upper half is the AND plane, which computes all the products. The products are carried to the lower OR plane by the vertical lines. The sum term for each output is given by the corresponding horizontal line
E.g., IorD = S3'.S2'.S1.S0 + S3'.S2.S1'.S0, where ' denotes complement (IorD is 1 in states 3 = 0011 and 5 = 0101)
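As a sanity check on a PLA output equation, the sum of products can be evaluated over all 16 state encodings. In the FSM, IorD is set to 1 in states 3 (memory read) and 5 (memory write), so the sketch below uses one product term per state; the function names are hypothetical.

```python
# Illustrative sketch: iord and asserted_states are hypothetical names.

def iord(s3, s2, s1, s0):
    """IorD as a sum of two product terms over the state bits."""
    term_state3 = (1 - s3) & (1 - s2) & s1 & s0    # matches state 0011
    term_state5 = (1 - s3) & s2 & (1 - s1) & s0    # matches state 0101
    return term_state3 | term_state5

def asserted_states():
    """Return every 4-bit state number for which IorD evaluates to 1."""
    return [
        s for s in range(16)
        if iord((s >> 3) & 1, (s >> 2) & 1, (s >> 1) & 1, s & 1)
    ]
```

Each product term corresponds to one vertical line of the AND plane, and the final OR corresponds to the horizontal IorD line of the OR plane.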
FSM Control: ROM
Implementation
ROM (Read Only Memory)
values of memory locations are fixed ahead of time
A ROM can be used to implement a truth table
if the address is m bits, we can address 2^m entries in the ROM
outputs are the bits of the entry the address points to
Example ROM with m = 3 address bits and n = 4 output bits:
address | output
000 | 0011
001 | 1100
010 | 1100
011 | 1000
100 | 0000
101 | 0001
110 | 0110
111 | 0111
The size of an m-input n-output ROM is 2^m x n bits; such a ROM can be thought of as an array of size 2^m with each entry in the array being n bits
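The array view of a ROM translates directly into code. The sketch below stores the eight 4-bit entries of the example (m = 3, n = 4) in a list indexed by the address; rom_read and rom_size_bits are hypothetical helper names.

```python
# Illustrative sketch: rom_read and rom_size_bits are hypothetical names.
M_BITS = 3   # address width
N_BITS = 4   # output width

# Entry i holds the output bits for address i (the truth table of the example)
ROM = [0b0011, 0b1100, 0b1100, 0b1000, 0b0000, 0b0001, 0b0110, 0b0111]

def rom_read(address):
    """Return the n-bit entry at the given address as a bit string."""
    if not 0 <= address < 2 ** M_BITS:
        raise ValueError("address out of range")
    return format(ROM[address], f"0{N_BITS}b")

def rom_size_bits(m, n):
    """Total storage of an m-input, n-output ROM: 2^m entries of n bits."""
    return (2 ** m) * n
```

Reading address 110 returns the entry 0110, and the whole example ROM holds 8 x 4 = 32 bits.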
FSM Control: ROM vs. PLA
First improve the ROM: break the table into two parts
4 state bits give the 16 output signals: 2^4 x 16 bits of ROM
all 10 input bits give the 4 next state bits: 2^10 x 4 bits of ROM
Total 4.3K bits of ROM
PLA is much smaller
can share product terms
only need entries that produce an active output
can take into account don't cares
PLA size = (#inputs x #product-terms) + (#outputs x #product-terms)
FSM control PLA = (10x17)+(20x17) = 510 PLA cells
PLA cells usually about the size of a ROM cell (slightly bigger)
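The size comparison is plain arithmetic and can be reproduced as follows. This is a sketch using the figures quoted above (4 state bits, 10 input bits, 16 control outputs, 4 next-state outputs, 17 product terms); the function names are illustrative.

```python
# Illustrative sketch: rom_bits and pla_cells are hypothetical names.

def rom_bits(address_bits, output_bits):
    """Size of a ROM with the given address and output widths."""
    return (2 ** address_bits) * output_bits

def pla_cells(inputs, outputs, product_terms):
    """PLA size: AND-plane cells plus OR-plane cells."""
    return inputs * product_terms + outputs * product_terms

# Split ROM: control outputs depend on the 4 state bits only,
# next-state bits depend on all 10 inputs (6 opcode bits + 4 state bits).
output_rom = rom_bits(4, 16)          # 256 bits
next_state_rom = rom_bits(10, 4)      # 4096 bits
total_rom = output_rom + next_state_rom

# PLA with 10 inputs, 20 outputs and 17 product terms
control_pla = pla_cells(10, 20, 17)   # 170 + 340
```

The split ROM needs 4352 bits (about 4.3K), while the PLA needs 510 cells, roughly an eighth of that even before allowing for a PLA cell being slightly larger.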
Microprogramming
Microprogramming is a method of specifying FSM control that resembles a programming language: it is textual rather than graphical
this is appropriate when the FSM becomes very large, e.g., if the instruction set is large and/or the number of cycles per instruction is large
in such situations graphical representation becomes difficult as there may be thousands of states and even more arcs joining them
a microprogram is a specification; implementation is by ROM or PLA
A microprogram is a sequence of microinstructions
each microinstruction has eight fields (label + 7 functional)
Label: used to control microcode sequencing
ALU control: specify operation to be done by ALU
SRC1: specify source for first ALU operand
SRC2: specify source for second ALU operand
Register control: specify read/write for register file
Memory: specify read/write for memory
PCWrite control: specify the writing of the PC
Sequencing: specify choice of next microinstruction
Microprogramming
The Sequencing field value determines the execution order of the microprogram
value Seq: control passes to the sequentially next microinstruction
value Fetch: branch to the first microinstruction to begin the next MIPS instruction, i.e., the first microinstruction in the microprogram
value Dispatch i: branch to a microinstruction based on control input and a dispatch table entry (called dispatching)
Dispatching is implemented by creating a table, called a dispatch table, whose entries are microinstruction labels and which is indexed by the control input. There may be multiple dispatch tables; the value Dispatch i in the sequencing field indicates that the i-th dispatch table is to be used
Control Microprogram
The microprogram corresponding to the FSM control shown
graphically earlier:
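Label | ALU control | SRC1 | SRC2 | Register control | Memory | PCWrite control | Sequencing
Fetch | Add | PC | 4 | - | Read PC | ALU | Seq
- | Add | PC | Extshft | Read | - | - | Dispatch 1
Mem1 | Add | A | Extend | - | - | - | Dispatch 2
LW2 | - | - | - | - | Read ALU | - | Seq
- | - | - | - | Write MDR | - | - | Fetch
SW2 | - | - | - | - | Write ALU | - | Fetch
Rformat1 | Func code | A | B | - | - | - | Seq
- | - | - | - | Write ALU | - | - | Fetch
BEQ1 | Subt | A | B | - | - | ALUOut-cond | Fetch
JUMP1 | - | - | - | - | - | Jump address | Fetch
(- marks an empty field)
Microprogram containing 10 microinstructions
Dispatch ROM 1 (Dispatch Table 1)
Op | Opcode name | Value
000000 | R-format | Rformat1
000010 | jmp | JUMP1
000100 | beq | BEQ1
100011 | lw | Mem1
101011 | sw | Mem1
Dispatch ROM 2 (Dispatch Table 2)
Op | Opcode name | Value
100011 | lw | LW2
101011 | sw | SW2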
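To see how the Sequencing field and the dispatch tables drive execution, here is a minimal sketch of a microprogram sequencer. Only the label and sequencing columns of the ten microinstructions are modelled; the data layout and the function name microinstruction_path are assumptions for illustration.

```python
# Illustrative sketch: the tuple layout and microinstruction_path are hypothetical.

# (label, sequencing) for the ten microinstructions, in program order
MICROPROGRAM = [
    ("Fetch", "Seq"),
    (None, "Dispatch 1"),
    ("Mem1", "Dispatch 2"),
    ("LW2", "Seq"),
    (None, "Fetch"),
    ("SW2", "Fetch"),
    ("Rformat1", "Seq"),
    (None, "Fetch"),
    ("BEQ1", "Fetch"),
    ("JUMP1", "Fetch"),
]

# Dispatch tables: opcode -> microinstruction label
DISPATCH_TABLES = {
    1: {0b000000: "Rformat1", 0b000010: "JUMP1", 0b000100: "BEQ1",
        0b100011: "Mem1", 0b101011: "Mem1"},
    2: {0b100011: "LW2", 0b101011: "SW2"},
}

# Label -> index of the microinstruction carrying that label
LABEL_INDEX = {label: i for i, (label, _) in enumerate(MICROPROGRAM) if label}

def microinstruction_path(opcode):
    """Return the indices of the microinstructions run for one MIPS instruction."""
    index = 0                                  # start at the Fetch microinstruction
    path = []
    while True:
        path.append(index)
        sequencing = MICROPROGRAM[index][1]
        if sequencing == "Seq":                # fall through to the next one
            index += 1
        elif sequencing == "Fetch":            # instruction done, back to Fetch
            return path
        else:                                  # "Dispatch i": look up table i
            table = int(sequencing.split()[1])
            index = LABEL_INDEX[DISPATCH_TABLES[table][opcode]]
```

For lw (opcode 100011) the path covers five microinstructions, matching the five clock cycles of a load; for beq (000100) it covers three.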
Microcode: Trade-offs
Specification advantages
easy to design and write
typically manufacturer designs architecture and microcode in parallel
Implementation advantages
easy to change since values are in memory (e.g., off-chip ROM)
can emulate other architectures
can make use of internal registers
Implementation disadvantages
control is implemented nowadays on same chip as processor so the
advantage of an off-chip ROM does not exist
ROM is no longer faster than on-board cache
there is little need to change the microcode as general-purpose
computers are used far more nowadays than computers designed for
specific applications
Summary
Techniques described in this chapter to design datapaths and
control are at the core of all modern computer architecture
Multicycle datapaths offer two great advantages over single-cycle
functional units can be reused within a single instruction if they are accessed in different cycles, reducing the need to replicate expensive logic
instructions with shorter execution paths can complete quicker by consuming fewer cycles
Modern computers, in fact, take the multicycle paradigm to a
higher level to achieve greater instruction throughput:
pipelining (next topic) where multiple instructions execute
simultaneously by having cycles of different instructions overlap in
the datapath
the MIPS architecture was designed to be pipelined
