8 3501 07 2

Designing MIPS Processor (Single-Cycle)
Dr. Arjan Durresi Louisiana State University Baton Rouge, LA 70810 Durresi@Csc.LSU.Edu These slides are available at: http://www.csc.lsu.edu/~durresi/CSC3501_07/
CSC3501 S07
Louisiana State University
8- Single Cycle Datapath - 1
Overview

Datapath Control Unit Problems with single cycle Datapath
CSC3501 S07
Performance

Instruction Count Clock cycle time Clock cycle per Instruction
Implementation of Processor
CSC3501 S07
The Processor: Datapath & Control

We're ready to look at an implementation of the MIPS Simplified to contain only: memory-reference instructions: lw, sw arithmetic-logical instructions: add, sub, and, or, slt control flow instructions: beq, j Generic Implementation:

use the program counter (PC) to supply instruction address get the instruction from memory read registers use the instruction to decide exactly what to do
All instructions use the ALU after reading the registers Why? memory-reference? arithmetic? control flow?
CSC3501 S07
More Implementation Details
Abstract / Simplified View:
4 Add Add
Data Register # Registers Register # Register # Data
PC
Address Instruction Instruction memory
ALU
Address Data memory
Two types of functional units: elements that operate on data values (combinational) elements that contain state (sequential)
CSC3501 S07
Building the Datapath
CSC3501 S07
An edge triggered methodology Typical execution: read contents of some state elements at the beginning of the clock cycle, send values through some combinational logic, write results to one or more state elements at the end of the clock cycle.
State element 1 Combinational logic State element 2
Our Implementation
Clock cycle
An edge triggered methodology allows a state element to be read and written in the same clock cycle without creating a race that could to indeterminate data.
CSC3501 S07
Single Cycle Design

We shall first design a simpler processor that executes each instruction in only one clock cycle time. This is not efficient from performance point of view, since: a clock cycle time (i.e. clock rate) must be chosen such that the longest instruction can be executed in one clock cycle makes shorter instructions execute in one unnecessary long cycle. Additionally, no resource in the design may be used more than once per instruction, thus some resources will be duplicated. Because of that, the singe cycle design will require: two memories (instruction and data), two additional adders.
CSC3501 S07
Elements for Datapath Design
Include the functional units we need for each instruction

Instruction address Instruction Instruction memory a. Instruction memory b. Program counter c. Adder PC Add Sum
MemWrite Address Read data 16 Data memory Sign extend 32
Write data
MemRead a. Data memory unit b. Sign-extension unit
5 5 5
Read register 1 Read register 2 Write register Write Data Registers
4 Read data 1 Data Read data 2
ALU operation
Register numbers
Zero ALU ALU result
Data
Why do we need this stuff?

b. ALU
RegWrite a. Registers
CSC3501 S07
Abstract /Simplified View (1st look)
Generic implementation: use the program counter (PC) to supply instruction address, get the instruction from memory, read registers, use the instruction to decide exactly what to do.
CSC3501 S07
Abstract /Simplified View (2nd look)

4 Add Add
Data Register # Registers Register # Register # Data
PC
Address Instruction Instruction memory
ALU
Address Data memory
PC is incremented by 4, by most instructions, and by 4 + 4offset, by branch instructions. Jump instructions change PC differently (not shown).
CSC3501 S07
Incrementing PC & Fetching Instruction
CSC3501 S07
Two elements needed to implement R-formant ALU operations
add $t1,$t2,$t3
CSC3501 S07
Complete Datapath for R-type Instructions
CSC3501 S07
Two elements needed to implement loads and stores
lw $t1, offset_value($t2) Sw $t1, offset_value($t2)
Compute a memory address by adding the base register ($t2) to the 16-bit signed offset field contained in the instruction
CSC3501 S07
Datapath for LW and SW Instructions
CSC3501 S07
Datapath for R-type, LW & SW Instructions
CSC3501 S07
The datapath for a branch
beq $t1, $t2, offset
Compute the branch target address by adding the sign-extended offset of the instruction to PC
CSC3501 S07
Datapath for R-type, LW, SW & BEQ
CSC3501 S07
ALU
Generate the 4-bit ALU control input using a small control unit that has as inputs the function field of the instruction and a 2-bit control field ALUOp
CSC3501 S07
Control

Selecting the operations to perform (ALU, read/write, etc.) Controlling the flow of data (multiplexor inputs) Information comes from the 32 bits of the instruction Example: add $8, $17, $18 Instruction Format:
000000 10001 10010 01000 00000 100000 op rs rt rd shamt funct 10:6 5:0
31:26 25:21 20:16 15:11
ALU's operation based on instruction type and function code

CSC3501 S07
Load, store and branch instructions

Load or store instruction
35 or 43 31:26 rs rt address 15:0 25:21 20:16
Branch instruction
4 31:26 rs rt address 15:0
25:21 20:16
CSC3501 S07
Control
Must describe hardware to compute 4-bit ALU control input given instruction type 00 = lw, sw ALUOp 01 = beq, computed from instruction type 10 = arithmetic function code for arithmetic Describe it using a truth table (can turn into gates):
CSC3501 S07
Truth Table for (Main) Control Unit
CSC3501 S07
Truth Table of ALU Control Unit
CSC3501 S07
Design of (Main) Control Unit
CSC3501 S07
Design of Control Unit (J included)
CSC3501 S07
Design of 7-Function ALU Control Unit
CSC3501 S07
Cycle Time Calculation
CSC3501 S07
Single Cycle Implementation
Calculate cycle time assuming negligible delays except: memory (200ps), ALU and adders (100ps), register file access (50ps)
PCSrc
Add 4 Shift left 2 Read register 1 ALUSrc ALU operation ALU Add result
M u x
PC
Read address Instruction Instruction memory
Read data 1 Read register 2 Registers Read Write data 2 register Write data RegWrite 16 32
MemWrite Zero ALU ALU result Address Read data MemtoReg
M u x
M u x
Write data Sign extend MemRead
Data memory
CSC3501 S07
Performance of Single cycle machines

Memory (200ps), ALU and adders (100ps), register file access (50ps) Instructions mix: 25% loads, 10% stores, 45% ALU, 15% branches, 5% jumps. Which of the following implementations would be faster? Every instruction operates in a 1 clock cycle of a fixed length Every instruction operates in a 1 clock cycle of a variable length CPU execution time = Instruction count x CPI x Clock cycle time Since CPI = 1 CPU execution time = Instruction count x Clock cycle time Using the critical paths we can compute the required length for each class: R-type 400ps, Load word 600ps, Store word 550ps, Branch 350ps, jump 200ps In case 1 the clock has to be 600ps depending on the longest instruction A machine with a variable clock will have a clock cycle that varies between 200ps and 600ps. The average CPU clock cycle= 600x25% + 550x10% + 400x45% + 350x15%+200x5% = 447.5ps So the variable clock machine is faster 1.34 times
CSC3501 S07
Example
Instruction class R-type Load word Store word Branch Jump Functional units used by the instruction class Instructio n fetch Instructio n fetch Instructio n fetch Instructio n fetch Instructio n fetch Register access Register access Register access Register access ALU ALU ALU ALU Register access Register access Register access Register access
CSC3501 S07
Example
Instruction class R-type Load word Store word Branch Jump Instructio n memory 200 200 200 200 200 Register read 50 50 50 50 ALU Data memory 0 200 200 0 Register write 50 50 0 Total 100 100 100 100 400ps 600ps 550ps 350ps 200ps
CSC3501 S07
Abstract View of our single cycle process

op fun
ExtOp ALUSrc ALUctr
Main Control ALU control
Equal
MemRd MemWr
nPC_sel
MemWr
RegDst RegWr
PC
Ext
ALU
looks like a FSM with PC as state

Data Mem
CSC3501 S07
Result Store
Instruction Fetch
Next PC
Register Fetch
Mem Access
Reg. Wrt
Whats wrong with our CPI=1 processor?

Arithmetic & Logical PC Inst Memory Load PC Store PC Branch PC Reg File
mux
ALU
mux
setup
Inst Memory
mux Reg File Critical Path
ALU
Data Mem
mux setup
Inst Memory Inst Memory
Reg File Reg File
mux
ALU
Data Mem
cmp
mux
Long Cycle Time All instructions take as much time as the slowest Real memory is not as nice as our idealized memory cannot always get the job done in one (short) cycle
CSC3501 S07
Where we are headed

Single Cycle Problems: what if we had a more complicated instruction like floating point? wasteful of area One Solution: use a smaller cycle time have different instructions take different numbers of cycles a multicycle datapath:
PC
Address Instruction Memory or data Data
Instruction register
Data A Register # Registers Register # ALU B Register # ALUOut
Memory data register
CSC3501 S07
Single Cycle Processor: Conclusion
Single Cycle Problems: what if we had a more complicated instruction like floating point? a clock cycle would be much longer, thus for shorter and more often used instructions, such as add & lw, wasteful of time. One Solution: use a smaller cycle time, and have different instructions take different numbers of cycles. And that is a multi-cycle processor.
CSC3501 S07

8 3501 07 2

Caricato da

Informazioni sul documento

Descrizione originale:

Titolo originale

Copyright

Formati disponibili

Condividi questo documento

Condividi o incorpora il documento

Opzioni di condivisione

Hai trovato utile questo documento?

Questo contenuto è inappropriato?

Copyright:

Formati disponibili

8 3501 07 2

Caricato da

Copyright:

Formati disponibili

Designing MIPS Processor (Single-Cycle)

Louisiana State University

8- Single Cycle Datapath - 1

Datapath Control Unit Problems with single cycle Datapath

Louisiana State University

8- Single Cycle Datapath - 2

Instruction Count Clock cycle time Clock cycle per Instruction

Louisiana State University

8- Single Cycle Datapath - 3

The Processor: Datapath & Control

Louisiana State University

More Implementation Details

Abstract / Simplified View:

Data Register # Registers Register # Register # Data

Address Instruction Instruction memory

Address Data memory

Louisiana State University

8- Single Cycle Datapath - 5

Building the Datapath

Louisiana State University

8- Single Cycle Datapath - 6

Louisiana State University

Single Cycle Design

Louisiana State University

8- Single Cycle Datapath - 8

Elements for Datapath Design

Include the functional units we need for each instruction

MemWrite Address Read data 16 Data memory Sign extend 32

MemRead a. Data memory unit b. Sign-extension unit

Read register 1 Read register 2 Write register Write Data Registers

4 Read data 1 Data Read data 2

Zero ALU ALU result

Why do we need this stuff?

Louisiana State University

8- Single Cycle Datapath - 9

Abstract /Simplified View (1st look)

Louisiana State University

8- Single Cycle Datapath - 10

Abstract /Simplified View (2nd look)

Data Register # Registers Register # Register # Data

Address Instruction Instruction memory

Address Data memory

Louisiana State University

8- Single Cycle Datapath - 11

Incrementing PC & Fetching Instruction

Louisiana State University

8- Single Cycle Datapath - 12

Two elements needed to implement R-formant ALU operations

Louisiana State University

8- Single Cycle Datapath - 13

Complete Datapath for R-type Instructions

Louisiana State University

8- Single Cycle Datapath - 14

Two elements needed to implement loads and stores

lw $t1, offset_value($t2) Sw $t1, offset_value($t2)

8- Single Cycle Datapath - 15

Datapath for LW and SW Instructions

Louisiana State University

8- Single Cycle Datapath - 16

Datapath for R-type, LW & SW Instructions

Louisiana State University

8- Single Cycle Datapath - 17