
Single Cycle Processor Design

COE 308 Computer Architecture
Prof. Muhamed Mudawar
Computer Engineering Department
King Fahd University of Petroleum and Minerals

Presentation Outline
- Designing a Processor: Step-by-Step
- Datapath Components and Clocking
- Assembling an Adequate Datapath
- Controlling the Execution of Instructions
- The Main Controller and ALU Controller
- Drawbacks of the single-cycle processor design


The Performance Perspective


Recall, performance is determined by:
- Instruction count
- Clock cycles per instruction (CPI)
- Clock cycle time

$\text{Execution time} = \text{I-Count} \times \text{CPI} \times \text{Cycle time}$

Processor design will affect
- Clock cycles per instruction
- Clock cycle time

Single cycle datapath and control design:
- Advantage: one clock cycle per instruction
- Disadvantage: long cycle time
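The three factors multiply, so a design can trade one for another. A minimal Python sketch of the relation; the function name and all numbers are hypothetical and chosen only for illustration:

```python
def execution_time_ps(instruction_count, cpi, cycle_time_ps):
    """Execution time = I-Count x CPI x clock cycle time (in picoseconds)."""
    return instruction_count * cpi * cycle_time_ps


# Hypothetical program of 1,000,000 instructions
one_cycle = execution_time_ps(1_000_000, cpi=1.0, cycle_time_ps=800.0)    # CPI = 1, long cycle
short_cycle = execution_time_ps(1_000_000, cpi=4.0, cycle_time_ps=250.0)  # more cycles, shorter cycle

print(f"CPI 1.0 at 800 ps: {one_cycle / 1e6:.1f} us")
print(f"CPI 4.0 at 250 ps: {short_cycle / 1e6:.1f} us")
```

Lowering CPI does not help if the cycle time grows by more, and vice versa; the single-cycle design fixes CPI at 1 and pays for it in cycle time.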

Designing a Processor: Step-by-Step


1. Analyze instruction set => datapath requirements
   - The meaning of each instruction is given by the register transfers
   - Datapath must include storage elements for ISA registers
   - Datapath must support each register transfer
2. Select datapath components and clocking methodology
3. Assemble datapath meeting the requirements
4. Analyze implementation of each instruction
   - Determine the setting of control signals for register transfer
5. Assemble the control logic

Review of MIPS Instruction Formats


- All instructions are 32 bits wide
- Three instruction formats: R-type, I-type, and J-type

R-type:  Op6 | Rs5 | Rt5 | Rd5 | sa5 | funct6
I-type:  Op6 | Rs5 | Rt5 | immediate16
J-type:  Op6 | immediate26

- Op6: 6-bit opcode of the instruction
- Rs5, Rt5, Rd5: 5-bit source and destination register numbers
- sa5: 5-bit shift amount used by shift instructions
- funct6: 6-bit function field for R-type instructions
- immediate16: 16-bit immediate value or address offset
- immediate26: 26-bit target address of the jump instruction
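As a minimal sketch, the fields can be pulled out of an instruction word with shifts and masks. The helper name `decode_fields` is hypothetical, and the bit positions assume the standard MIPS layout (Op6 in bits 31..26, the other fields packed to its right in the order shown above):

```python
def decode_fields(word):
    """Split a 32-bit MIPS instruction word into its format fields.

    Every field is extracted; which ones are meaningful depends on the format
    (R-type: rs/rt/rd/sa/funct, I-type: rs/rt/imm16, J-type: imm26).
    """
    word &= 0xFFFFFFFF
    return {
        "op":    (word >> 26) & 0x3F,   # bits 31..26
        "rs":    (word >> 21) & 0x1F,   # bits 25..21
        "rt":    (word >> 16) & 0x1F,   # bits 20..16
        "rd":    (word >> 11) & 0x1F,   # bits 15..11
        "sa":    (word >> 6) & 0x1F,    # bits 10..6
        "funct": word & 0x3F,           # bits 5..0
        "imm16": word & 0xFFFF,         # bits 15..0
        "imm26": word & 0x3FFFFFF,      # bits 25..0
    }


# add $8, $9, $10 (rd = 8, rs = 9, rt = 10) encodes as 0x012A4020
f = decode_fields(0x012A4020)
print(f["op"], f["rs"], f["rt"], f["rd"], f["sa"], hex(f["funct"]))
```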

MIPS Subset of Instructions


- Only a subset of the MIPS instructions is considered
  - ALU instructions (R-type): add, sub, and, or, xor, slt
  - Immediate instructions (I-type): addi, slti, andi, ori, xori
  - Load and Store (I-type): lw, sw
  - Branch (I-type): beq, bne
  - Jump (J-type): j
- This subset does not include all the integer instructions
- But it is sufficient to illustrate the design of datapath and control
- Concepts used to implement the MIPS subset are used to construct a broad spectrum of computers

Details of the MIPS Subset


Instruction         Meaning            Format
add  rd, rs, rt     addition           op6 = 0 | rs5 | rt5 | rd5 | 0 | 0x20
sub  rd, rs, rt     subtraction        op6 = 0 | rs5 | rt5 | rd5 | 0 | 0x22
and  rd, rs, rt     bitwise and        op6 = 0 | rs5 | rt5 | rd5 | 0 | 0x24
or   rd, rs, rt     bitwise or         op6 = 0 | rs5 | rt5 | rd5 | 0 | 0x25
xor  rd, rs, rt     exclusive or       op6 = 0 | rs5 | rt5 | rd5 | 0 | 0x26
slt  rd, rs, rt     set on less than   op6 = 0 | rs5 | rt5 | rd5 | 0 | 0x2a
addi rt, rs, im16   add immediate      0x08 | rs5 | rt5 | im16
slti rt, rs, im16   slt immediate      0x0a | rs5 | rt5 | im16
andi rt, rs, im16   and immediate      0x0c | rs5 | rt5 | im16
ori  rt, rs, im16   or immediate       0x0d | rs5 | rt5 | im16
xori rt, rs, im16   xor immediate      0x0e | rs5 | rt5 | im16
lw   rt, im16(rs)   load word          0x23 | rs5 | rt5 | im16
sw   rt, im16(rs)   store word         0x2b | rs5 | rt5 | im16
beq  rs, rt, im16   branch if equal    0x04 | rs5 | rt5 | im16
bne  rs, rt, im16   branch not equal   0x05 | rs5 | rt5 | im16
j    im26           jump               0x02 | im26
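The opcode and function-code columns of the table are enough to identify each instruction. A small Python sketch follows; `mnemonic` and `encode_i` are hypothetical helper names, and only the subset listed above is recognized:

```python
# op6 values for the non-R-type instructions of the subset
OPCODES = {
    0x08: "addi", 0x0A: "slti", 0x0C: "andi", 0x0D: "ori", 0x0E: "xori",
    0x23: "lw", 0x2B: "sw", 0x04: "beq", 0x05: "bne", 0x02: "j",
}
# funct6 values for the R-type instructions (op6 = 0)
FUNCTS = {0x20: "add", 0x22: "sub", 0x24: "and", 0x25: "or", 0x26: "xor", 0x2A: "slt"}


def mnemonic(word):
    """Return the mnemonic of a subset instruction, or 'unknown'."""
    op = (word >> 26) & 0x3F
    if op == 0:                               # R-type: look at funct6
        return FUNCTS.get(word & 0x3F, "unknown")
    return OPCODES.get(op, "unknown")


def encode_i(op, rs, rt, imm16):
    """Build an I-type word: op6 | rs5 | rt5 | im16."""
    return (op << 26) | (rs << 21) | (rt << 16) | (imm16 & 0xFFFF)


print(mnemonic(0x012A4020))                   # R-type add $8, $9, $10
print(mnemonic(encode_i(0x23, 29, 8, 4)))     # lw $8, 4($29)
```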

Register Transfer Level (RTL)


- RTL is a description of data flow between registers
- RTL gives a meaning to the instructions
- All instructions are fetched from memory at address PC

Instruction   RTL Description
ADD           Reg(Rd) ← Reg(Rs) + Reg(Rt); PC ← PC + 4
SUB           Reg(Rd) ← Reg(Rs) - Reg(Rt); PC ← PC + 4
ORI           Reg(Rt) ← Reg(Rs) | zero_ext(Im16); PC ← PC + 4
LW            Reg(Rt) ← MEM[Reg(Rs) + sign_ext(Im16)]; PC ← PC + 4
SW            MEM[Reg(Rs) + sign_ext(Im16)] ← Reg(Rt); PC ← PC + 4
BEQ           if (Reg(Rs) == Reg(Rt)) PC ← PC + 4 + 4 × sign_extend(Im16)
              else PC ← PC + 4
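The RTL rows can be executed directly as a behavioral model. The Python sketch below is hypothetical in its details: it assumes a decoded-tuple format such as `("ADD", rd, rs, rt)` or `("LW", rt, rs, im16)`, a dictionary for byte-addressed memory, and 32-bit wrap-around arithmetic. It models what each instruction does, not how the datapath does it.

```python
MASK32 = 0xFFFFFFFF


def sign_ext16(imm16):
    """Sign-extend a 16-bit immediate to a 32-bit pattern."""
    imm16 &= 0xFFFF
    return (imm16 | 0xFFFF0000) if imm16 & 0x8000 else imm16


def step(state, instr):
    """Apply the RTL of one instruction to state = {'pc', 'reg', 'mem'}."""
    reg, mem, pc = state["reg"], state["mem"], state["pc"]
    name, *f = instr
    next_pc = (pc + 4) & MASK32                  # PC <- PC + 4 by default
    if name == "ADD":                            # Reg(Rd) <- Reg(Rs) + Reg(Rt)
        rd, rs, rt = f
        reg[rd] = (reg[rs] + reg[rt]) & MASK32
    elif name == "SUB":                          # Reg(Rd) <- Reg(Rs) - Reg(Rt)
        rd, rs, rt = f
        reg[rd] = (reg[rs] - reg[rt]) & MASK32
    elif name == "ORI":                          # Reg(Rt) <- Reg(Rs) | zero_ext(Im16)
        rt, rs, im16 = f
        reg[rt] = reg[rs] | (im16 & 0xFFFF)
    elif name == "LW":                           # Reg(Rt) <- MEM[Reg(Rs) + sign_ext(Im16)]
        rt, rs, im16 = f
        reg[rt] = mem[(reg[rs] + sign_ext16(im16)) & MASK32]
    elif name == "SW":                           # MEM[Reg(Rs) + sign_ext(Im16)] <- Reg(Rt)
        rt, rs, im16 = f
        mem[(reg[rs] + sign_ext16(im16)) & MASK32] = reg[rt]
    elif name == "BEQ":                          # taken branch overrides PC + 4
        rs, rt, im16 = f
        if reg[rs] == reg[rt]:
            next_pc = (pc + 4 + 4 * sign_ext16(im16)) & MASK32
    else:
        raise ValueError(f"unsupported instruction: {name}")
    reg[0] = 0                                   # R0 is always zero
    state["pc"] = next_pc


state = {"pc": 0, "reg": [0] * 32, "mem": {0x100: 7}}
step(state, ("ORI", 1, 0, 0x100))   # R1 = 0x100
step(state, ("LW", 2, 1, 0))        # R2 = MEM[0x100]
step(state, ("ADD", 3, 2, 2))       # R3 = R2 + R2
step(state, ("SW", 3, 1, 4))        # MEM[0x104] = R3
step(state, ("BEQ", 3, 3, 2))       # taken: PC = PC + 4 + 8
print(state["pc"], state["reg"][3], state["mem"][0x104])
```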

Instructions are Executed in Steps


R-type
- Fetch instruction:   Instruction ← MEM[PC]
- Fetch operands:      data1 ← Reg(Rs), data2 ← Reg(Rt)
- Execute operation:   ALU_result ← func(data1, data2)
- Write ALU result:    Reg(Rd) ← ALU_result
- Next PC address:     PC ← PC + 4

I-type
- Fetch instruction:   Instruction ← MEM[PC]
- Fetch operands:      data1 ← Reg(Rs), data2 ← Extend(imm16)
- Execute operation:   ALU_result ← op(data1, data2)
- Write ALU result:    Reg(Rt) ← ALU_result
- Next PC address:     PC ← PC + 4

BEQ
- Fetch instruction:   Instruction ← MEM[PC]
- Fetch operands:      data1 ← Reg(Rs), data2 ← Reg(Rt)
- Equality:            zero ← subtract(data1, data2)
- Branch:              if (zero) PC ← PC + 4 + 4 × sign_ext(imm16)
                       else PC ← PC + 4

Instruction Execution (cont'd)

LW
- Fetch instruction:    Instruction ← MEM[PC]
- Fetch base register:  base ← Reg(Rs)
- Calculate address:    address ← base + sign_extend(imm16)
- Read memory:          data ← MEM[address]
- Write register Rt:    Reg(Rt) ← data
- Next PC address:      PC ← PC + 4

SW
- Fetch instruction:    Instruction ← MEM[PC]
- Fetch registers:      base ← Reg(Rs), data ← Reg(Rt)
- Calculate address:    address ← base + sign_extend(imm16)
- Write memory:         MEM[address] ← data
- Next PC address:      PC ← PC + 4

Jump
- Fetch instruction:    Instruction ← MEM[PC]
- Target PC address:    target ← PC[31:28] , Imm26 , 00   (the commas denote concatenation)
- Jump:                 PC ← target


Requirements of the Instruction Set


- Memory
  - Instruction memory where instructions are stored
  - Data memory where data is stored
- Registers
  - 32 32-bit general purpose registers, R0 is always zero
  - Read source register Rs
  - Read source register Rt
  - Write destination register Rt or Rd
- Program counter PC register and adder to increment PC
- Sign and zero extender for immediate constant
- ALU for executing instructions


Components of the Datapath


- Combinational elements
  - ALU, adder
  - Immediate extender
  - Multiplexers
- Storage elements
  - Instruction memory
  - Data memory
  - PC register
  - Register file
- Clocking methodology
  - Timing of reads and writes

[Figure: datapath components: 16-to-32-bit extender (ExtOp); 2-input multiplexer (select); 32-bit ALU (ALU control; outputs zero, ALU result, overflow); PC register; instruction memory (Address, Instruction); data memory (Address, Data_in, Data_out, MemRead, MemWrite); register file (RA, RB, RW, BusA, BusB, BusW, RegWrite)]

Register Element
- Register
  - Similar to the D-type flip-flop
  - n-bit input and output
  - Write Enable:
    - Enable / disable writing of register
    - Negated (0): Data_Out will not change
    - Asserted (1): Data_Out will become Data_In after clock edge
- Edge-triggered clocking
  - Register output is modified at clock edge

[Figure: n-bit register with inputs Data_In, Write Enable, and Clock, and output Data_Out]

MIPS Register File

- Register File consists of 32 32-bit registers
  - BusA and BusB: 32-bit output buses for reading 2 registers
  - BusW: 32-bit input bus for writing a register when RegWrite is 1
  - Two registers read and one written in a cycle
- Registers are selected by:
  - RA selects register to be read on BusA
  - RB selects register to be read on BusB
  - RW selects the register to be written
- Clock input
  - The clock input is used ONLY during write operation
  - During read, register file behaves as a combinational logic block
  - RA or RB valid => BusA or BusB valid after access time

[Figure: register file with 5-bit inputs RA, RB, RW; 32-bit input BusW; 32-bit outputs BusA and BusB; Clock and RegWrite inputs]
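A behavioral Python sketch of the register-file interface follows. The class and method names are hypothetical; reads are plain function calls (combinational, no clock), and a write happens only when `clock_edge` is called with RegWrite = 1. R0 is kept at zero, matching the requirement that R0 is always zero.

```python
class RegisterFile:
    """32 x 32-bit register file: two combinational reads, one clocked write."""

    def __init__(self):
        self._regs = [0] * 32

    def read(self, ra, rb):
        """BusA = Reg(RA), BusB = Reg(RB); valid after the access time, no clock."""
        return self._regs[ra & 0x1F], self._regs[rb & 0x1F]

    def clock_edge(self, rw, bus_w, reg_write):
        """At the clock edge, write BusW into Reg(RW) only if RegWrite is asserted."""
        rw &= 0x1F
        if reg_write and rw != 0:          # writes to R0 are ignored
            self._regs[rw] = bus_w & 0xFFFFFFFF


rf = RegisterFile()
rf.clock_edge(rw=5, bus_w=42, reg_write=True)
rf.clock_edge(rw=6, bus_w=99, reg_write=False)   # RegWrite = 0: R6 unchanged
rf.clock_edge(rw=0, bus_w=123, reg_write=True)   # R0 stays zero
print(rf.read(5, 6), rf.read(0, 5))
```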

Details of the Register File


[Figure: register file details: 5-bit RA, RB, and RW each drive a decoder; the RW decoder, together with RegWrite and Clock, selects which of R1 to R31 is written from BusW; the RA and RB decoders enable tri-state buffers that place the selected register on BusA and BusB; R0 is not used, and "0" is driven in its place]

Tri-State Buffers
- Allow multiple sources to drive a single bus
- Two inputs:
  - Data signal (data_in)
  - Output enable
- One output (data_out):
  - If (Enable) Data_out = Data_in
    else Data_out = High Impedance state (output is disconnected)
- Tri-state buffers can be used to build multiplexors

[Figure: tri-state buffer with Data_in, Enable, and Data_out; 2-input multiplexor built from two tri-state buffers with inputs Data_0 and Data_1, a Select signal, and a shared Output]
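The tri-state behavior can be sketched in Python by representing the high-impedance state as `None`. That representation is an assumption of the sketch, as is treating a bus with zero or several active drivers as an error (real hardware would float or suffer contention). The function names are hypothetical.

```python
from typing import Optional

HIGH_Z = None  # models the high-impedance (disconnected) output


def tristate(data_in: int, enable: bool) -> Optional[int]:
    """Tri-state buffer: pass data_in when enabled, otherwise high impedance."""
    return data_in if enable else HIGH_Z


def resolve_bus(*drivers: Optional[int]) -> int:
    """Value on a shared bus; exactly one driver must be active."""
    active = [d for d in drivers if d is not HIGH_Z]
    if len(active) != 1:
        raise ValueError(f"bus needs exactly one active driver, got {len(active)}")
    return active[0]


def mux2(data_0: int, data_1: int, select: int) -> int:
    """2-to-1 multiplexor: two tri-state buffers driving one output."""
    return resolve_bus(tristate(data_0, select == 0), tristate(data_1, select == 1))


print(mux2(0xA, 0xB, 0), mux2(0xA, 0xB, 1))
```

The select line enables exactly one buffer at a time, which is what keeps the shared output well defined.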

Building a Multifunction ALU


[Figure: multifunction ALU with a shifter, an adder, and a logic unit feeding a result multiplexer; 32-bit inputs A and B; outputs ALU Result (32 bits), zero, and overflow]
- Shift operation (2 bits): None = 00, SLL = 01, SRL = 10, SRA = 11
  - Shift amount: 5 least significant bits
- Arithmetic operation (1 bit): ADD = 0, SUB = 1
- SLT: ALU does a SUB and checks the sign and overflow
- Logical operation (2 bits): AND = 00, OR = 01, NOR = 10, XOR = 11
- ALU selection (2 bits): Shift = 00, SLT = 01, Arith = 10, Logic = 11
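A behavioral Python sketch of this ALU, using the select encodings listed above. Two details are assumptions, not taken from the figure: B is the operand that gets shifted (by the 5 least significant bits of A), and overflow uses the usual two's-complement rule for the adder. SLT relies on the fact that A < B (signed) exactly when the sign of A - B differs from the overflow bit.

```python
MASK32 = 0xFFFFFFFF
SHIFT, SLT, ARITH, LOGIC = 0b00, 0b01, 0b10, 0b11     # ALU selection


def alu(a, b, alu_sel, arith_op=0, logic_op=0, shift_op=0):
    """Return (result, zero, overflow) for 32-bit bit patterns a and b.

    arith_op: ADD = 0, SUB = 1; logic_op: AND = 0b00, OR = 0b01, NOR = 0b10,
    XOR = 0b11; shift_op: None = 0b00, SLL = 0b01, SRL = 0b10, SRA = 0b11.
    """
    a &= MASK32
    b &= MASK32
    sub = 1 if alu_sel == SLT else arith_op        # SLT always subtracts
    b_in = (~b & MASK32) if sub else b             # A - B = A + ~B + 1
    s = (a + b_in + sub) & MASK32
    sa, sb, ss = a >> 31, b_in >> 31, s >> 31
    overflow = int(sa == sb and ss != sa)          # two's-complement overflow
    if alu_sel == SHIFT:
        shamt = a & 0x1F                           # assumption: 5 LSBs of A
        if shift_op == 0b01:
            result = (b << shamt) & MASK32         # SLL
        elif shift_op == 0b10:
            result = b >> shamt                    # SRL
        elif shift_op == 0b11:
            signed_b = (b - (1 << 32)) if b >> 31 else b
            result = (signed_b >> shamt) & MASK32  # SRA
        else:
            result = b                             # no shift
    elif alu_sel == SLT:
        result = ss ^ overflow                     # 1 if A < B (signed)
    elif alu_sel == ARITH:
        result = s
    else:
        result = [a & b, a | b, ~(a | b) & MASK32, a ^ b][logic_op]
    return result, int(result == 0), overflow


print(alu(5, 3, ARITH, arith_op=1))                # 5 - 3
print(alu(0x7FFFFFFF, 0xFFFFFFFF, SLT))            # MAX_INT < -1 ?
```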

Instruction and Data Memories


- Instruction memory needs to provide only read access
  - Because the datapath does not write instructions
  - Behaves as combinational logic for read
  - Address selects Instruction after access time
- Data Memory is used for load and store
  - MemRead: enables output on Data_out
    - Address selects the word to put on Data_out
  - MemWrite: enables writing of Data_in
    - Address selects the memory word to be written
    - The Clock synchronizes the write operation
- Separate instruction and data memories
  - Later, we will replace them with caches

[Figure: instruction memory with 32-bit Address input and 32-bit Instruction output; data memory with 32-bit Address, Data_in, and Data_out, plus Clock, MemRead, and MemWrite]

Clocking Methodology
- Clocks are needed in sequential logic to decide when a state element (register) should be updated
- To ensure correctness, a clocking methodology defines when data can be written and read
- We assume edge-triggered clocking
  - All state changes occur on the same clock edge
  - Data must be valid and stable before arrival of clock edge
- Edge-triggered clocking allows a register to be read and written during the same clock cycle

[Figure: Register 1 → combinational logic → Register 2, both clocked; clock waveform marking the rising edge and the falling edge]

Determining the Clock Cycle


- With edge-triggered clocking, the clock cycle must be long enough to accommodate the path from one register through the combinational logic to another register
  - Tclk-q: clock-to-output delay through register
  - Tmax_comb: longest delay through combinational logic
  - Ts: setup time that input to a register must be stable before arrival of clock edge
  - Th: hold time that input to a register must hold after arrival of clock edge
  - Hold time (Th) is normally satisfied since Tclk-q > Th

$T_{\text{cycle}} \geq T_{\text{clk-q}} + T_{\text{max\_comb}} + T_{s}$

[Figure: Register 1 → combinational logic → Register 2; timing diagram showing Tclk-q, Tmax_comb, Ts, and Th around the writing clock edge]

Clock Skew
- Clock skew arises because the clock signal uses different paths with slightly different delays to reach state elements
- Clock skew is the difference in absolute time between when two storage elements see a clock edge
- With a clock skew, the clock cycle time is increased

$T_{\text{cycle}} \geq T_{\text{clk-q}} + T_{\text{max\_combinational}} + T_{\text{setup}} + T_{\text{skew}}$

- Clock skew is reduced by balancing the clock delays
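Both cycle-time bounds are plain sums. In the Python sketch below, the function names and delay values are hypothetical, chosen only to show how skew lengthens the minimum period; a 1000 ps period corresponds to 1 GHz.

```python
def min_cycle_ps(t_clk_q, t_max_comb, t_setup, t_skew=0.0):
    """Minimum clock period: Tclk-q + Tmax_comb + Tsetup (+ Tskew)."""
    return t_clk_q + t_max_comb + t_setup + t_skew


def max_freq_ghz(cycle_ps):
    """Clock frequency in GHz for a period given in picoseconds."""
    return 1000.0 / cycle_ps


# Hypothetical delays in ps
no_skew = min_cycle_ps(t_clk_q=50, t_max_comb=800, t_setup=30)
with_skew = min_cycle_ps(t_clk_q=50, t_max_comb=800, t_setup=30, t_skew=20)
print(f"{no_skew} ps ({max_freq_ghz(no_skew):.3f} GHz) vs "
      f"{with_skew} ps ({max_freq_ghz(with_skew):.3f} GHz)")
```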


Instruction Fetching Datapath


- We can now assemble the datapath from its components
- For instruction fetching, we need
  - Program Counter (PC) register
  - Instruction Memory
  - Adder for incrementing PC
- The least significant 2 bits of the PC are 00 since PC is a multiple of 4
- Improved datapath increments upper 30 bits of PC by 1
- Datapath does not handle branch or jump instructions

[Figure: fetch datapath: PC addresses the instruction memory and a 32-bit adder computes next PC = PC + 4; in the improved datapath, only the upper 30 bits of PC are incremented by 1, and the 2 least significant bits are fixed at 00]
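The equivalence behind the improved datapath can be checked in Python. This sketch (hypothetical function names) compares a 32-bit PC + 4 with a 30-bit increment of PC[31:2] followed by appending 00, for a few word-aligned addresses including the wrap-around case.

```python
MASK32 = 0xFFFFFFFF


def next_pc_adder(pc):
    """Original fetch datapath: a 32-bit adder computes PC + 4."""
    return (pc + 4) & MASK32


def next_pc_improved(pc):
    """Improved datapath: increment PC[31:2] (30 bits) by 1, then append 00."""
    upper30 = pc >> 2                       # the 2 LSBs are always 00
    upper30 = (upper30 + 1) & 0x3FFFFFFF    # 30-bit incrementer
    return upper30 << 2


for pc in (0x00400000, 0x0040FFFC, 0xFFFFFFFC):
    assert next_pc_adder(pc) == next_pc_improved(pc)
print("PC + 4 matches the 30-bit increment for the sampled addresses")
```

The incrementer is narrower and simpler than a full 32-bit adder with a constant 4, yet produces the same result.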

Datapath for R-type Instructions


[Figure: R-type datapath: instruction fields Op6, Rs5, Rt5, Rd5, sa5, funct6; Rs, Rt, and Rd drive register-file inputs RA, RB, and RW; BusA and BusB feed the ALU (ALUCtrl); the ALU result drives BusW (RegWrite); the upper 30 bits of the PC are incremented by 1]
- RA & RB come from the instruction's Rs & Rt fields
  - RW comes from the Rd field
- ALU inputs come from BusA & BusB
  - ALU result is connected to BusW
- Control signals
  - ALUCtrl is derived from the funct field because Op = 0 for R-type
  - RegWrite is used to enable the writing of the ALU result

Datapath for I-type ALU Instructions


[Figure: I-type ALU datapath: instruction fields Op6, Rs5, Rt5, immediate16; Rs drives RA and Rt drives RW; Imm16 passes through the Extender (ExtOp) to the second ALU input; the ALU result drives BusW (RegWrite)]
- RW now comes from Rt, instead of Rd
- Second ALU input comes from the extended immediate
  - RB and BusB are not used
- Control signals
  - ALUCtrl is derived from the Op field
  - RegWrite is used to enable the writing of the ALU result
  - ExtOp is used to control the extension of the 16-bit immediate

Combining R-type & I-type Datapaths


[Figure: combined datapath: a RegDst mux selects Rt (0) or Rd (1) for RW; an ALUSrc mux selects BusB (0) or the extended Imm16 (1) as the second ALU input; control signals RegWrite, ExtOp, ALUCtrl]
- A mux selects RW as either Rt or Rd
- Another mux selects the 2nd ALU input as either source register Rt data on BusB or the extended immediate
- Control signals
  - ALUCtrl is derived from either the Op or the funct field
  - RegWrite enables the writing of the ALU result
  - ExtOp controls the extension of the 16-bit immediate
  - RegDst selects the register destination as either Rt or Rd
  - ALUSrc selects the 2nd ALU source as BusB or extended immediate

Controlling ALU Instructions


[Figure: R-type ALU instruction on the combined datapath with RegDst = 1, ALUSrc = 0, RegWrite = 1]
- For R-type ALU instructions, RegDst is 1 to select Rd on RW and ALUSrc is 0 to select BusB as second ALU input. The active part of the datapath is shown in green

[Figure: I-type ALU instruction on the combined datapath with RegDst = 0, ALUSrc = 1, RegWrite = 1]
- For I-type ALU instructions, RegDst is 0 to select Rt on RW and ALUSrc is 1 to select the extended immediate as second ALU input. The active part of the datapath is shown in green

Details of the Extender


- Two types of extensions
  - Zero-extension for unsigned constants
  - Sign-extension for signed constants
- Control signal ExtOp indicates type of extension
- Extender implementation: wiring and one AND gate
  - ExtOp = 0 ⇒ Upper16 = 0
  - ExtOp = 1 ⇒ Upper16 = sign bit

[Figure: extender: the lower 16 bits of the output are wired directly to Imm16; the sign bit of Imm16 AND ExtOp drives all upper 16 bits]
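In Python, the extender reduces to one AND plus wiring. A minimal sketch, with the hypothetical function name `extend16`:

```python
def extend16(imm16, ext_op):
    """16-to-32-bit extender controlled by ExtOp.

    ExtOp = 0: zero-extension (upper 16 bits = 0).
    ExtOp = 1: sign-extension (upper 16 bits = copies of the sign bit).
    """
    imm16 &= 0xFFFF
    sign_bit = (imm16 >> 15) & 1
    upper_bit = sign_bit & ext_op           # the single AND gate
    upper16 = 0xFFFF if upper_bit else 0x0000
    return (upper16 << 16) | imm16          # lower 16 bits are wired through


print(hex(extend16(0xFFFC, 0)), hex(extend16(0xFFFC, 1)), hex(extend16(0x0004, 1)))
```

The same 16-bit pattern 0xFFFC becomes 65532 with zero-extension and -4 with sign-extension, which is why andi/ori/xori and addi/lw/sw need different ExtOp values.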

Adding Data Memory to Datapath


- A data memory is added for load and store instructions
  - ALU calculates data memory address
  - A 3rd mux selects data on BusW as either ALU result or memory Data_out
  - BusB is connected to Data_in of Data Memory for store instructions
- Additional control signals
  - MemRead for load instructions
  - MemWrite for store instructions
  - MemtoReg selects data on BusW as ALU result or Memory Data_out

[Figure: datapath with data memory: the ALU result drives the data memory Address, BusB drives Data_in, and a MemtoReg mux selects the ALU result or Data_out for BusW; control signals RegDst, RegWrite, ExtOp, ALUSrc, ALUCtrl, MemRead, MemWrite, MemtoReg]

Controlling the Execution of Load


[Figure: load datapath with RegDst = 0, RegWrite = 1, ExtOp = sign, ALUSrc = 1, ALUCtrl = ADD, MemRead = 1, MemWrite = 0, MemtoReg = 1]
- ExtOp = sign to sign-extend Immediate16 to 32 bits
- ALUSrc = 1 selects extended immediate as second ALU input
- ALUCtrl = ADD to calculate data memory address as Reg(Rs) + sign-extend(Imm16)
- MemRead = 1 to read data memory
- MemtoReg = 1 places the data read from memory on BusW
- RegDst = 0 selects Rt as destination register
- RegWrite = 1 to write the memory data on BusW to register Rt

Controlling the Execution of Store


[Figure: store datapath with RegDst = x, RegWrite = 0, ExtOp = sign, ALUSrc = 1, ALUCtrl = ADD, MemRead = 0, MemWrite = 1, MemtoReg = x]
- ExtOp = sign to sign-extend Immediate16 to 32 bits
- ALUSrc = 1 to select the extended immediate as second ALU input
- ALUCtrl = ADD to calculate data memory address as Reg(Rs) + sign-extend(Imm16)
- MemWrite = 1 to write data memory
- MemtoReg = x because we don't care what data is placed on BusW
- RegDst = x because there is no destination register
- RegWrite = 0 because no register is written by the store instruction

Adding Jump and Branch to Datapath


[Figure: datapath extended with a Next PC block that takes the 30-bit incremented PC, Imm16, and Imm26 and produces the 30-bit jump or branch target address; a PCSrc mux selects the next PC; Next PC also receives J, Beq, Bne, and the ALU zero flag]
- Additional control signals
  - J, Beq, Bne for jump and branch instructions
  - Zero condition of the ALU is examined
  - PCSrc = 1 for Jump & taken Branch
- Next PC computes jump or branch target instruction address
- For Branch, ALU does a subtraction

Details of Next PC
[Figure: Next PC block: a 30-bit adder adds the incremented PC to the sign-extended Imm16 (SE); a mux selects either this branch target or the jump target (upper 4 bits of the incremented PC concatenated with Imm26) as the branch or jump target address; PCSrc is computed from J, Beq, Bne, and Zero]
- Sign-extension: most-significant bit is replicated
  - Imm16 is sign-extended to 30 bits
- Jump target address: upper 4 bits of PC are concatenated with Imm26
- $\text{PCSrc} = \text{J} + (\text{Beq} \cdot \text{Zero}) + (\text{Bne} \cdot \overline{\text{Zero}})$
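A behavioral Python sketch of the Next PC block and the PCSrc mux follows. It works on 30-bit word addresses (PC[31:2]) as in the figure and takes the upper 4 bits of the jump target from the incremented PC; the function and parameter names are hypothetical, and the control inputs are 0/1 flags.

```python
MASK30 = 0x3FFFFFFF


def sign_ext_30(imm16):
    """Sign-extend Imm16 to 30 bits by replicating its most-significant bit."""
    imm16 &= 0xFFFF
    return (imm16 | 0x3FFF0000) if imm16 & 0x8000 else imm16


def next_pc(pc, imm16, imm26, j, beq, bne, zero):
    """Return the next 32-bit PC selected by the PCSrc mux."""
    inc_pc = ((pc >> 2) + 1) & MASK30                          # incremented PC[31:2]
    branch_target = (inc_pc + sign_ext_30(imm16)) & MASK30     # 30-bit adder
    jump_target = (inc_pc & 0x3C000000) | (imm26 & 0x3FFFFFF)  # upper 4 bits, Imm26
    target = jump_target if j else branch_target               # J picks the target
    pc_src = j | (beq & zero) | (bne & (1 - zero))             # J + Beq.Zero + Bne.not(Zero)
    return (target if pc_src else inc_pc) << 2                 # append 00


print(hex(next_pc(0x00400000, 3, 0, j=0, beq=1, bne=0, zero=1)))      # taken beq
print(hex(next_pc(0x90000000, 0, 0x10, j=1, beq=0, bne=0, zero=0)))   # jump
```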

Controlling the Execution of Jump


[Figure: jump datapath with J = 1, PCSrc = 1, RegWrite = 0, MemRead = 0, MemWrite = 0; RegDst, ExtOp, ALUSrc, ALUCtrl, and MemtoReg = x]
- J = 1 selects Imm26 as jump target address
- Upper 4 bits are from the incremented PC
- PCSrc = 1 to select jump target address
- MemRead, MemWrite, and RegWrite are 0
- We don't care about RegDst, ExtOp, ALUSrc, ALUCtrl, and MemtoReg

Controlling the Execution of Branch


[Figure: branch datapath with Beq = 1 or Bne = 1, ALUSrc = 0, ALUCtrl = SUB, PCSrc = 1 (taken branch), RegWrite = 0, MemRead = 0, MemWrite = 0; RegDst, ExtOp, and MemtoReg = x]
- Either Beq or Bne = 1
- Next PC outputs branch target address
- ALUSrc = 0 (2nd ALU input is BusB)
- ALUCtrl = SUB produces zero flag
- Next PC logic determines PCSrc according to zero flag
- MemRead = MemWrite = RegWrite = 0
- RegDst = ExtOp = MemtoReg = x


Main Control and ALU Control


[Figure: Main Control takes Op6 from the instruction and drives RegDst, RegWrite, ExtOp, ALUSrc, MemRead, MemWrite, MemtoReg, Beq, Bne, J, and ALUOp; ALU Control takes funct6 and ALUOp and drives ALUCtrl]
- Main Control
  - Input: 6-bit opcode field from instruction
  - Output: 10 control signals for datapath, ALUOp for ALU Control
- ALU Control
  - Input: 6-bit function field from instruction, ALUOp from main control
  - Output: ALUCtrl signal for ALU

Single-Cycle Datapath + Control


[Figure: complete single-cycle datapath with Main Control (input Op; outputs RegDst, RegWrite, ExtOp, ALUSrc, MemRead, MemWrite, MemtoReg, J, Beq, Bne, ALUOp) and ALU Control (inputs func and ALUOp; output ALUCtrl); the Next PC block and the PCSrc mux select the next PC]

Main Control Signals


Signal: effect when 0; effect when 1
- RegDst: destination register = Rt; destination register = Rd
- RegWrite: none; destination register is written with the data value on BusW
- ExtOp: 16-bit immediate is zero-extended; 16-bit immediate is sign-extended
- ALUSrc: second ALU operand comes from the second register file output (BusB); second ALU operand comes from the extended 16-bit immediate
- MemRead: none; data memory is read, Data_out ← Memory[address]
- MemWrite: none; data memory is written, Memory[address] ← Data_in
- MemtoReg: BusW = ALU result; BusW = Data_out from Memory
- Beq, Bne: PC ← PC + 4; PC ← Branch target address, if branch is taken
- J: PC ← PC + 4; PC ← Jump target address
- ALUOp: this multi-bit signal specifies the ALU operation as a function of the opcode

Main Control Signal Values


Op       RegDst  RegWrite  ExtOp   ALUSrc   ALUOp   Beq  Bne  J  MemRead  MemWrite  MemtoReg
R-type   1 = Rd  1         x       0=BusB   R-type  0    0    0  0        0         0
addi     0 = Rt  1         1=sign  1=Imm    ADD     0    0    0  0        0         0
slti     0 = Rt  1         1=sign  1=Imm    SLT     0    0    0  0        0         0
andi     0 = Rt  1         0=zero  1=Imm    AND     0    0    0  0        0         0
ori      0 = Rt  1         0=zero  1=Imm    OR      0    0    0  0        0         0
xori     0 = Rt  1         0=zero  1=Imm    XOR     0    0    0  0        0         0
lw       0 = Rt  1         1=sign  1=Imm    ADD     0    0    0  1        0         1
sw       x       0         1=sign  1=Imm    ADD     0    0    0  0        1         x
beq      x       0         x       0=BusB   SUB     1    0    0  0        0         x
bne      x       0         x       0=BusB   SUB     0    1    0  0        0         x
j        x       0         x       x        x       0    0    1  0        0         x

- X is a don't care (can be 0 or 1), used to minimize logic

Logic Equations for Control Signals


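Logic Equations
$\text{RegDst} \Leftarrow \text{R-type}$
$\text{RegWrite} \Leftarrow \overline{\text{sw} + \text{beq} + \text{bne} + \text{j}}$
$\text{ExtOp} \Leftarrow \overline{\text{andi} + \text{ori} + \text{xori}}$
$\text{ALUSrc} \Leftarrow \overline{\text{R-type} + \text{beq} + \text{bne}}$
$\text{MemRead} \Leftarrow \text{lw}$
$\text{MemWrite} \Leftarrow \text{sw}$
$\text{MemtoReg} \Leftarrow \text{lw}$

[Figure: a decoder on Op6 produces one signal per instruction (R-type, addi, slti, andi, ori, xori, lw, sw, beq, bne, j); the logic equations combine these signals to generate MemtoReg, MemRead, MemWrite, RegWrite, ALUSrc, RegDst, ALUOp, and ExtOp; the beq, bne, and j decoder outputs drive Beq, Bne, and J]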
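The equations can be evaluated in software to check them against the control-signal table. In this Python sketch (hypothetical names), a dictionary plays the role of the Op6 decoder; the slide gives no equation for ALUOp, so it is simply looked up from the table.

```python
# Decoder: op6 -> instruction name (one decoder output per instruction)
OPCODE_NAMES = {0x00: "R-type", 0x08: "addi", 0x0A: "slti", 0x0C: "andi",
                0x0D: "ori", 0x0E: "xori", 0x23: "lw", 0x2B: "sw",
                0x04: "beq", 0x05: "bne", 0x02: "j"}

# ALUOp per instruction, copied from the control-signal table (None = x)
ALU_OP = {"R-type": "R-type", "addi": "ADD", "slti": "SLT", "andi": "AND",
          "ori": "OR", "xori": "XOR", "lw": "ADD", "sw": "ADD",
          "beq": "SUB", "bne": "SUB", "j": None}


def main_control(op6):
    """Evaluate the logic equations for one opcode (x entries resolve to 0 or 1)."""
    name = OPCODE_NAMES[op6]
    d = {n: int(n == name) for n in OPCODE_NAMES.values()}   # one-hot decoder outputs
    return {
        "RegDst":   d["R-type"],
        "RegWrite": 1 - (d["sw"] | d["beq"] | d["bne"] | d["j"]),
        "ExtOp":    1 - (d["andi"] | d["ori"] | d["xori"]),
        "ALUSrc":   1 - (d["R-type"] | d["beq"] | d["bne"]),
        "MemRead":  d["lw"],
        "MemWrite": d["sw"],
        "MemtoReg": d["lw"],
        "Beq": d["beq"], "Bne": d["bne"], "J": d["j"],
        "ALUOp":    ALU_OP[name],
    }


lw = main_control(0x23)
print(lw["RegDst"], lw["RegWrite"], lw["ALUSrc"], lw["MemRead"], lw["MemtoReg"])
```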

ALU Control Truth Table


Op6      ALUOp    funct6   ALUCtrl   4-bit Encoding
R-type   R-type   add      ADD       0000
R-type   R-type   sub      SUB       0010
R-type   R-type   and      AND       0100
R-type   R-type   or       OR        0101
R-type   R-type   xor      XOR       0110
R-type   R-type   slt      SLT       1010
addi     ADD      x        ADD       0000
slti     SLT      x        SLT       1010
andi     AND      x        AND       0100
ori      OR       x        OR        0101
xori     XOR      x        XOR       0110
lw       ADD      x        ADD       0000
sw       ADD      x        ADD       0000
beq      SUB      x        SUB       0010
bne      SUB      x        SUB       0010
j        x        x        x         x

- The 4-bit encoding for ALUCtrl is chosen here to be equal to the last 4 bits of the function field
- Other binary encodings are also possible. The idea is to choose a binary encoding that will minimize the logic for ALU Control
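Because the encoding equals the last 4 bits of funct, the R-type case of ALU Control is just a bit slice. A minimal Python sketch, with hypothetical names and ALUOp represented symbolically as a string:

```python
# 4-bit ALUCtrl encodings from the truth table
ALU_CTRL = {"ADD": 0b0000, "SUB": 0b0010, "AND": 0b0100,
            "OR": 0b0101, "XOR": 0b0110, "SLT": 0b1010}
R_TYPE_FUNCTS = {0x20, 0x22, 0x24, 0x25, 0x26, 0x2A}   # add, sub, and, or, xor, slt


def alu_control(alu_op, funct6):
    """Decode funct for R-type; otherwise ALUOp alone determines ALUCtrl."""
    if alu_op == "R-type":
        if funct6 not in R_TYPE_FUNCTS:
            raise ValueError(f"funct 0x{funct6:02x} is not in the subset")
        return funct6 & 0xF          # encoding = last 4 bits of funct
    return ALU_CTRL[alu_op]          # funct is a don't care here


print(format(alu_control("R-type", 0x2A), "04b"), format(alu_control("SUB", 0x00), "04b"))
```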


Drawbacks of Single Cycle Processor


- Long cycle time
  - All instructions take as much time as the slowest

[Figure: timing bars per instruction class. ALU: Instruction Fetch, Reg Read, ALU, Reg Write. Load: Instruction Fetch, Reg Read, ALU, Memory Read, Reg Write (longest delay). Store: Instruction Fetch, Reg Read, ALU, Memory Write. Branch: Instruction Fetch, Reg Read, ALU. Jump: Instruction Fetch, Decode]

- Alternative solution: multicycle implementation
  - Break down instruction execution into multiple cycles

Multicycle Implementation
- Break instruction execution into five steps
  1. Instruction fetch
  2. Instruction decode and register read
  3. Execution, memory address calculation, or branch completion
  4. Memory access or ALU instruction completion
  5. Load instruction completion
- One step = One clock cycle (clock cycle is reduced)
  - First 2 steps are the same for all instructions

Instruction     # cycles
ALU & Store     4
Load            5
Branch          3
Jump            2

Performance Example
- Assume the following operation times for components:
  - Instruction and data memories: 200 ps
  - ALU and adders: 180 ps
  - Decode and Register file access (read or write): 150 ps
  - Ignore the delays in PC, mux, extender, and wires
- Which of the following would be faster and by how much?
  - Single-cycle implementation for all instructions
  - Multicycle implementation optimized for every class of instructions
- Assume the following instruction mix:
  - 40% ALU, 20% loads, 10% stores, 20% branches, and 10% jumps

Solution
Instruction Class  Instruction Memory  Register Read  ALU Operation  Data Memory  Register Write  Total
ALU                200                 150            180                         150             680 ps
Load               200                 150            180            200          150             880 ps
Store              200                 150            180            200                          730 ps
Branch             200                 150            180                                         530 ps
Jump               200                 150*                                                       350 ps

* For Jump, the 150 ps slot is used to decode and update PC

- For fixed single-cycle implementation:
  - Clock cycle = 880 ps determined by longest delay (load instruction)
- For multi-cycle implementation:
  - Clock cycle = max (200, 150, 180) = 200 ps (maximum delay at any step)
  - $\text{Average CPI} = 0.4 \times 4 + 0.2 \times 5 + 0.1 \times 4 + 0.2 \times 3 + 0.1 \times 2 = 3.8$
- $\text{Speedup} = \dfrac{880\ \text{ps}}{3.8 \times 200\ \text{ps}} = \dfrac{880}{760} \approx 1.16$
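The arithmetic of this example can be reproduced in Python. The sketch uses only the delays, step counts, and instruction mix given in the example; the variable names are hypothetical.

```python
# Component delays from the example (ps)
IMEM = DMEM = 200
ALU = 180
REG = 150   # decode or register-file read/write

# Steps used by each instruction class
STEPS = {
    "ALU":    [IMEM, REG, ALU, REG],
    "Load":   [IMEM, REG, ALU, DMEM, REG],
    "Store":  [IMEM, REG, ALU, DMEM],
    "Branch": [IMEM, REG, ALU],
    "Jump":   [IMEM, REG],               # fetch, then decode and update PC
}
MIX = {"ALU": 0.40, "Load": 0.20, "Store": 0.10, "Branch": 0.20, "Jump": 0.10}

totals = {cls: sum(steps) for cls, steps in STEPS.items()}
single_cycle_clock = max(totals.values())                 # slowest instruction
multi_cycle_clock = max(max(s) for s in STEPS.values())   # slowest single step
avg_cpi = sum(MIX[c] * len(STEPS[c]) for c in MIX)        # one step per cycle
speedup = single_cycle_clock / (avg_cpi * multi_cycle_clock)

print(totals)
print(f"single-cycle clock = {single_cycle_clock} ps")
print(f"multicycle clock = {multi_cycle_clock} ps, CPI = {avg_cpi:.1f}")
print(f"speedup = {speedup:.2f}")
```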

Worst Case Timing (Load Instruction)


[Figure: timing diagram of the load instruction over one clock cycle; each signal changes from its old value to its new value after the delay listed]
- Clk-to-q: Old PC → New PC
- Instruction memory access time: Old Instruction → New Instruction = (Op, Rs, Rt, Rd, Funct, Imm16, Imm26)
- Delay through control logic: Old → New control signal values (ExtOp, ALUSrc, ALUOp, …)
- Register file access time: Old BusA value → New BusA value = Register(Rs)
- Delay through extender and ALU mux: Old → New second ALU input = sign-extend(Imm16)
- ALU delay: Old ALU result → New ALU result = Address
- Data memory access time: Old data memory output value → New value
- Mux delay + setup time + clock skew: the register write occurs at the end of the clock cycle

Worst Case Timing Cont'd


- Long cycle time: must be long enough for Load operation
    PC's Clk-to-Q
  + Instruction Memory's Access Time
  + Maximum of (Register File's Access Time, Delay through control logic + extender + ALU mux)
  + ALU to Perform a 32-bit Add
  + Data Memory Access Time
  + Delay through MemtoReg Mux
  + Setup Time for Register File Write
  + Clock Skew
- Cycle time is longer than needed for other instructions
  - Therefore, single cycle processor design is not used in practice
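The load-instruction cycle time above is a sum with one max term, which this Python sketch evaluates. The dictionary keys and all delay values are hypothetical, chosen only to exercise the formula.

```python
def load_cycle_time(t):
    """Minimum single-cycle clock period set by the load instruction."""
    return (t["pc_clk_to_q"]
            + t["imem_access"]
            + max(t["regfile_access"], t["control"] + t["extender"] + t["alu_mux"])
            + t["alu_add"]
            + t["dmem_access"]
            + t["memtoreg_mux"]
            + t["regfile_setup"]
            + t["clock_skew"])


# Hypothetical delays in ps
delays = {"pc_clk_to_q": 30, "imem_access": 200, "regfile_access": 150,
          "control": 50, "extender": 20, "alu_mux": 25, "alu_add": 180,
          "dmem_access": 200, "memtoreg_mux": 25, "regfile_setup": 20,
          "clock_skew": 10}
print(load_cycle_time(delays))
```

With these numbers the register-file read (150 ps) is slower than control + extender + mux (95 ps), so it is the register file that sits on the critical path.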

Summary
- 5 steps to design a processor
  1. Analyze instruction set => datapath requirements
  2. Select datapath components & establish clocking methodology
  3. Assemble datapath meeting the requirements
  4. Analyze implementation of each instruction to determine control signals
  5. Assemble the control logic
- MIPS makes Control easier
  - Instructions are of same size
  - Source registers always in same place
  - Immediates are of same size and same location
  - Operations are always on registers/immediates
- Single cycle datapath => CPI = 1, but Long Clock Cycle
