Cpre 381 Project Report 1

CPRE 381 Final Project
MIPS Pipeline Datapath Implementation

Jiwei Xia
Junjie Fan

5/1/2014


Overview
This project requires us to implement a pipelined datapath for a MIPS processor.
In this project, we need to:

Modify the single-cycle datapath into a pipelined datapath.
Add the hazard detection unit.
Add the data forwarding unit.
Extend the instruction set.
Develop test cases for the circuit.
Run a simple for loop on it.

Description of the project
Pipelining is a technique that exploits parallelism among the instructions in a
sequential instruction stream. Each instruction is split into a sequence of steps,
and the steps of different instructions are executed concurrently. This increases
instruction throughput and so shortens the total execution time. However, it
introduces issues that never arise in the single-cycle design. In this project we
need to deal with data hazards and control hazards, and we need to assemble the
datapath correctly, so that we are able to run a simple program once the
processor is completed.

Determining the architecture of the processor
Our processor is a 32-bit MIPS processor. Its operation follows the design
presented in the CprE 381 lectures and textbook, where it is explained in detail.
We modified some parts so that it can run more instructions.



1. DataPath

Figure 1 shows the single-cycle datapath with the pipeline stages identified.
Dividing an instruction into five stages gives a five-stage pipeline.
The five stages are:

1. IF: Instruction fetch
2. ID: Instruction decode and register file read
3. EX: Execution or address calculation
4. MEM: Data memory access
5. WB: Write back

Thus we need to add pipeline registers between stages.

Figure 1. Single Cycle Datapath

After these registers are set up, we need to pass the correct values from register to
register. For example, we pass the instruction and the PC value to the IF/ID register,
and the control signals from the control unit to the ID/EX register.

Since some instructions need to carry extra values through the pipeline, we need to
add some extra registers. For example, for jal we add a control signal jal that
tells the write-back stage to use 31 as the write address for the register file.
Also, since jal writes the PC value into the register file, we need extra registers
to pass the PC value along. The original IF/ID.PC and ID/EX.PC can be used, but
EX/MEM.PC cannot hold the incremented PC, because EX/MEM.PC holds the branch
address. (Our machine predicts PC = PC+1 rather than PC+4, because the given
Memory module is word based and its index increases by 1.) As a result, we need
EX/MEM.PC.jal and MEM/WB.PC.jal to keep the PC value that we want to write into
the register file.

Figure 2. Pipeline Registers
Figure 3. Pipeline Registers in Code

2. Memory

The memory in the book is byte based, which means the address of each
instruction in the instruction memory increases by 4. Thus, when predicting the
next PC value, we would need PC = PC+4. When we read consecutive words from data
memory, we would also need to increment the address by 4.

However, in our lab, the memory module produced by the MegaWizard is word
based, which means each address points to a whole word. As a result, our
prediction for the PC is PC = PC+1.
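
The difference between the two addressing schemes can be sketched in a few lines of Python. This is only a behavioural illustration, not our Verilog: the function names are hypothetical, and adding the branch offset directly to PC+1 is an assumption inferred from the branch test later in this report, where a beq at address 0 with offset 8 lands on address 9.

```python
def next_pc(pc, word_addressed=True):
    """Sequential next PC: +1 for word-addressed memory, +4 for byte-addressed."""
    return pc + (1 if word_addressed else 4)


def branch_target(pc, offset, word_addressed=True):
    """Branch target relative to the incremented PC.

    In the byte-addressed textbook design the word offset is shifted left by 2;
    with word-addressed memory it is assumed to be added directly.
    """
    step = offset if word_addressed else offset << 2
    return next_pc(pc, word_addressed) + step


print(branch_target(0, 8))         # word-addressed: 0 + 1 + 8
print(branch_target(0, 8, False))  # byte-addressed: 0 + 4 + (8 << 2)
```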

Figure 4. Bubble Sort Instructions in the memory file

The memory has Width = 32 and Depth = 1024. But because we only use 8 bits
of the immediate field, the effective memory address ranges from 0 to 255.

Another thing we noticed is that the memory module is clocked: its output only
changes on a clock edge. In the single-cycle datapath, however, our other modules
are combinational logic, which means the result follows directly from the inputs.
The clock delay made the single-cycle datapath fail. To solve this problem, we
added a much faster clock signal to drive the memory module, so that the correct
value is available when the rising edge of the normal clock arrives.

3. Registers

The register file consists of 32 32-bit registers. Most are general-purpose
registers, but some serve special purposes.

Register  Purpose
0         The constant zero
31        Link register

4. Instruction Format

The instruction format is exactly the same as in MIPS, which makes things easier
because we can get a lot of help from the textbook.
This website helps us convert the instructions into binary and hexadecimal:
http://www.mipshelper.com/mips-converter.php

Figure 5. www.mipshelper.com converter
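
The conversion itself follows the standard MIPS R, I and J field layouts, so it can also be done by hand or with a short script. The sketch below is a minimal Python illustration with hypothetical helper names; it is not the converter used by the website.

```python
def encode_r(rs, rt, rd, funct, shamt=0):
    """R-type: opcode 0 | rs | rt | rd | shamt | funct."""
    return (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


def encode_i(opcode, rs, rt, imm):
    """I-type: opcode | rs | rt | 16-bit immediate."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


def encode_j(opcode, target):
    """J-type: opcode | 26-bit target."""
    return (opcode << 26) | (target & 0x3FFFFFF)


print(f"{encode_i(0x08, 0, 1, 1):08X}")  # addi $1, $0, 1
print(f"{encode_r(1, 2, 1, 0x20):08X}")  # add  $1, $1, $2
print(f"{encode_j(0x02, 9):08X}")        # j 9
```

The three printed words match the machine codes used in the test tables of this report.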


5. Addressing Modes

The processor supports the following addressing modes:
Branch operations
o Branch Equal (8-bit field)
o Branch Not Equal (8-bit field)
Jump operations
o Absolute jump (8-bit field)
o Jump and Link (8-bit field)
Memory load and store operations
o Register base address with 8-bit immediate field

6. Instruction Supported

Adding more instructions to the processor is the most challenging and
time-consuming part.

Since we did this project in about two days, we did not add many instructions,
but the processor supports the basic functions: R-type instructions, lw, sw,
beq, addi, ori, andi, j and jal.

We also implemented branch not equal. To implement BNE, we need another control
signal called Branch_NE, which indicates that the instruction is a branch not
equal instruction. For this instruction the branch condition becomes
(Branch_NE && !Zero).
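
A minimal Python sketch of the combined branch decision follows. It assumes that the original beq path (Branch && Zero) is simply ORed with the new bne path; the function name is hypothetical and the code only illustrates the logic, it is not our Verilog.

```python
def branch_taken(branch, branch_ne, zero):
    """True when a beq sees Zero = 1 or a bne sees Zero = 0."""
    return bool((branch and zero) or (branch_ne and not zero))


# beq with equal operands, bne with different operands: both taken.
print(branch_taken(1, 0, 1), branch_taken(0, 1, 0))
```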

We have already talked about jal; it also needs an additional control signal and
pipeline registers. The changes in the datapath are as follows:

INST    RegDst  ALUSrc  MemtoReg  RegWrite  MemRead  MemWrite  Branch  ALUOp1  ALUOp0  jal
R-Type  1       0       0         1         0        0         0       1       0       0
lw      0       1       1         1         1        0         0       0       0       0
sw      X       1       X         0         0        1         0       0       0       0
beq     X       0       X         0         0        0         1       0       1       0
addi    0       1       0         1         0        0         0       0       0       0
ori     0       1       0         1         0        0         0       1       0       0
andi    0       1       0         1         0        0         0       1       0       0
j       X       X       X         0         0        0         X       X       X       0
jal     X       X       X         1         0        0         X       X       X       1


Although this is the single-cycle datapath, we were able to adapt it relatively easily.
The key point in implementing jal is that we use IDEX_jal to select the write
address between 31 and the Rt or Rd field. This happens in the EX stage.

We then use MEMWB_jal to select the data to write. This happens in the WB stage.
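
The two selections can be sketched as a pair of multiplexers in Python. This is a hedged illustration only: the function and parameter names are hypothetical, and the ordering of the RegDst and MemtoReg multiplexers follows the textbook convention (RegDst = 1 selects Rd, MemtoReg = 1 selects the memory data).

```python
LINK_REGISTER = 31


def ex_write_register(reg_dst, rt, rd, idex_jal):
    """EX stage: choose the register file write address."""
    if idex_jal:
        return LINK_REGISTER
    return rd if reg_dst else rt


def wb_write_data(mem_to_reg, alu_result, mem_data, saved_pc, memwb_jal):
    """WB stage: choose the value written to the register file."""
    if memwb_jal:
        return saved_pc
    return mem_data if mem_to_reg else alu_result


# jal at address 0: write the saved PC (here 1) into register 31.
print(ex_write_register(0, 0, 0, 1), wb_write_data(0, 0, 0, 1, 1))
```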

7. Specifying the behavior of the clock signal

In theory, we should use the rising clock edge to update the PC and the pipeline
registers, and the falling clock edge to update the register file and the memory
unit. However, we actually use the rising clock edge for all registers and it
works well.
We think this may be because in ModelSim we do not need to deal with the time the
clock signal takes to flip. The simulation is ideal and everything happens
instantly. In real hardware we may have slew rate issues and would have to make
sure the correct value is stable before the clock changes.

Figure 6. Jal Jr added single cycle datapath

8. Design, implement, and test the ALU

Our ALU is the same as the one in the lecture slides and the textbook. It is 32
bits wide and implements bitwise AND, bitwise OR, addition, subtraction and set
less than, but shift is not supported. We could implement shift inside the ALU
using the shift module from lab 3.

Our ALU uses the carry look-ahead method to compute the addition. Each bit
produces generate and propagate signals, g and p. Each group of 4 bits is then
combined to produce group signals P and G, and we reuse the 4-bit carry
look-ahead unit on these to produce the carry out for 16 bits. Finally, we use
the ripple carry method to chain two 16-bit ALUs together into a 32-bit ALU.
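
The adder structure described above can be modelled bit by bit in Python. The sketch below is an assumption-laden illustration rather than a transcription of our Verilog: the function names are hypothetical, and propagate is taken as a OR b (a XOR b would work equally well for the carry equations).

```python
def lookahead4(p, g, c0):
    """4-bit carry look-ahead unit; p and g are 4-element lists, LSB first.

    Returns (carries into bits 0..3, carry out, group P, group G).
    """
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    big_g = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0])
    big_p = p[3] & p[2] & p[1] & p[0]
    return [c0, c1, c2, c3], big_g | (big_p & c0), big_p, big_g


def add16(a, b, c_in):
    """16-bit adder: four 4-bit groups plus a second-level look-ahead unit."""
    abits = [(a >> i) & 1 for i in range(16)]
    bbits = [(b >> i) & 1 for i in range(16)]
    p = [x | y for x, y in zip(abits, bbits)]
    g = [x & y for x, y in zip(abits, bbits)]
    # First level: group P and G of each slice (independent of the carry in).
    groups = [lookahead4(p[k:k + 4], g[k:k + 4], 0) for k in range(0, 16, 4)]
    group_p = [grp[2] for grp in groups]
    group_g = [grp[3] for grp in groups]
    # Second level: the same unit, reused on the group signals.
    group_c, c_out, _, _ = lookahead4(group_p, group_g, c_in)
    total = 0
    for j, k in enumerate(range(0, 16, 4)):
        carries, _, _, _ = lookahead4(p[k:k + 4], g[k:k + 4], group_c[j])
        for i in range(4):
            total |= (abits[k + i] ^ bbits[k + i] ^ carries[i]) << (k + i)
    return total, c_out


def add32(a, b, c_in=0):
    """Two 16-bit adders chained by rippling the carry between them."""
    low, c16 = add16(a & 0xFFFF, b & 0xFFFF, c_in)
    high, c32 = add16((a >> 16) & 0xFFFF, (b >> 16) & 0xFFFF, c16)
    return (high << 16) | low, c32


print(add32(0x0000FFFF, 1))
```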

Figure 7. 16 bit carry look-ahead ALU

ALUOp           Funct field             Operation
ALUOp1  ALUOp0  F5  F4  F3  F2  F1  F0
0       0       X   X   X   X   X   X   010
X       1       X   X   X   X   X   X   110
1       X       X   X   0   0   0   0   010
1       X       X   X   0   0   1   0   110
1       X       X   X   0   1   0   0   000
1       X       X   X   0   1   0   1   001
1       X       X   X   1   0   1   0   111

Figure 8. ALU Control Unit Truth Table

Operation is a three-bit control signal. The most significant bit is used as B_invert.
Operation[1:0] determines which function the ALU performs:

a) Logical AND (operation 00)
b) Logical OR (operation 01)
c) Addition (operation 10)
d) Subtraction (operation 10 with B_invert)
e) Set-on-Less-Than (operation 11 with B_invert)

Besides the result, the ALU also outputs an Overflow flag, which indicates that an
overflow occurred, and a Zero flag, which is set when the result is zero.
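
As a behavioural illustration, the operation codes from the ALU control table can be modelled in Python as below. The names are hypothetical, and two details are assumptions that our Verilog may handle differently: Overflow is only reported for addition and subtraction, and set-on-less-than uses the sign of the difference corrected by the adder overflow.

```python
MASK = 0xFFFFFFFF
SIGN = 0x80000000


def alu(a, b, operation):
    """32-bit ALU model. operation bit 2 is B_invert, bits 1:0 select the function.

    Returns (result, zero, overflow).
    """
    b_invert = (operation >> 2) & 1
    select = operation & 0b11
    a &= MASK
    b_eff = (~b & MASK) if b_invert else (b & MASK)
    # B_invert also feeds the carry in, so a + ~b + 1 computes a - b.
    total = (a + b_eff + b_invert) & MASK
    # Signed overflow: both adder inputs share a sign that the sum does not.
    adder_overflow = int(((a ^ total) & (b_eff ^ total) & SIGN) != 0)
    overflow = 0
    if select == 0b00:
        result = a & b_eff
    elif select == 0b01:
        result = a | b_eff
    elif select == 0b10:
        result = total
        overflow = adder_overflow
    else:
        result = ((total >> 31) & 1) ^ adder_overflow
    return result, int(result == 0), overflow


print(alu(3, 4, 0b010))  # addition
print(alu(5, 5, 0b110))  # subtraction, sets Zero
print(alu(1, 2, 0b111))  # set-on-less-than
```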

9. Drawing the datapath

Several drafts have been sketched, but there is no final version because we kept
adding new things to it. It is very similar to the ones in the textbook, with
additional multiplexers and some wires added.
It is closest to the one in Figure 6, plus the stage registers.

10. Defining the control lines

Our control lines are based on the textbook. The new control lines are
discussed here.

Jal: Our datapath adds a Jal control line. Asserting this line causes the
register file to record the PC value of the instruction in the WB stage into
register 31. As soon as the control unit detects a jump and link instruction
in the ID stage, its output jal is set and is passed on to the next stage. In
the EX stage, we use ID/EX.jal to select either 31 or Rd as the write address;
in the WB stage, we use MEM/WB.jal to select either the recorded PC value or
the data from the MemtoReg multiplexer.
Branch_NE: This signal is asserted when the instruction is a branch not
equal instruction. It is ANDed with the !Zero signal to determine whether
the branch is taken.



11. Determine the control line coding for each instruction

With all of the control lines defined in the table, we are able to code the
control unit for each instruction. The control signals are as follows:

The code for the control unit is as follows:

The page is not wide enough to show the complete equations. When we need to
add a new instruction, we add the term that asserts the signal for it;
otherwise the signal stays deasserted.

12. Implementing and testing the control module.

We did not do separate testing for the control module because this part is
embedded into the testing for instructions. As long as the instructions are
working fine, there should be no problems.

INST    RegDst  ALUSrc  MemtoReg  RegWrite  MemRead  MemWrite  Branch  ALUOp1  ALUOp0  jal  BNE
R-Type  1       0       0         1         0        0         0       1       0       0    0
lw      0       1       1         1         1        0         0       0       0       0    0
sw      X       1       X         0         0        1         0       0       0       0    0
beq     X       0       X         0         0        0         1       0       1       0    0
bne     X       0       X         0         0        0         0       0       1       0    1
addi    0       1       0         1         0        0         0       0       0       0    0
j       X       X       X         0         0        0         X       X       X       0    0
jal     X       X       X         1         0        0         X       X       X       1    0
Figure 9. Control Unit Truth Table
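
The truth table can be read as a lookup keyed by opcode. The Python sketch below is a hedged illustration, not our Verilog equations: the opcode values are the standard MIPS ones, the names are hypothetical, and treating don't-cares and unknown opcodes as 0 is an assumption made to keep the example simple.

```python
SIGNALS = ("RegDst", "ALUSrc", "MemtoReg", "RegWrite", "MemRead",
           "MemWrite", "Branch", "ALUOp1", "ALUOp0", "jal", "BNE")

# One row of Figure 9 per standard MIPS opcode; "X" marks a don't-care.
CONTROL_TABLE = {
    0x00: "10010001000",  # R-Type
    0x23: "01111000000",  # lw
    0x2B: "X1X00100000",  # sw
    0x04: "X0X00010100",  # beq
    0x05: "X0X00000101",  # bne
    0x08: "01010000000",  # addi
    0x02: "XXX000XXX00",  # j
    0x03: "XXX100XXX10",  # jal
}


def decode(instruction):
    """Return the control signals for a 32-bit instruction word."""
    opcode = (instruction >> 26) & 0x3F
    row = CONTROL_TABLE.get(opcode, "0" * len(SIGNALS))
    return {name: int(bit == "1") for name, bit in zip(SIGNALS, row)}


print(decode(0x0C000009))  # jal 9
```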


13. Designing and implementing the remaining modules

The other modules were developed and tested individually, and we have confirmed
that they work correctly. This individual testing was done in each lab
assignment, because these modules do not change as much as the datapath and
control unit do. We have used them in the single-cycle datapath and the
multi-cycle datapath. The only modules we use for the first time are the
hazard detection unit and the forwarding unit.

14. Implementing the datapath

The datapath was implemented in Verilog by connecting the modules we had
created with wires. The most important thing when coding is to make sure you
do not enter the wrong variable name into the module ports.
This is a very common mistake when assembling the datapath. You may get a wrong
value, spend hours finding out what is going on, and finally discover that you
put the wrong name in a port.
With the help of the textbook and lecture slides, and some resources from the
internet, actually connecting the modules is relatively easy.

15. Selecting the sequence of instructions to test the processor

Data Forwarding testing

PC  Instruction       Machine Code  Expected Value
0   addi $1, $0, 1    20010001      $1 = 1
1   addi $2, $0, 2    20020002      $2 = 2
2   addi $3, $0, 3    20030003      $3 = 3
3   addi $4, $0, 4    20040004      $4 = 4
4   add $1, $1, $2    00220820      $1 = 3
5   add $1, $1, $3    00230820      $1 = 6
6   NOP               00000000



Simulation result

The tested value is the same as we expected. The add $1, $1, $3 instruction
needs a value that has not yet been written into register $1, so it is
forwarded from the following stage (the EX/MEM register).

Double Data Hazard testing

PC  Instruction       Machine Code  Expected Value
0   addi $1, $0, 1    20010001      $1 = 1
1   addi $2, $0, 2    20020002      $2 = 2
2   addi $3, $0, 3    20030003      $3 = 3
3   addi $4, $0, 4    20040004      $4 = 4
4   add $1, $1, $2    00220820      $1 = 3
5   add $1, $1, $3    00230820      $1 = 6
6   add $1, $1, $4    00240820      $1 = 10
7   NOP               00000000

Simulation result

This is the Double Data Hazard case from the lecture slides: both hazards occur,
and we want to use the most recent result. Thus we used the revised MEM hazard
condition: only forward from MEM/WB if the EX hazard condition is not true.

The result is as we expected: add $1, $1, $4 takes the forwarded value from the
EX/MEM register rather than from the MEM/WB register.

Figure 10. Data forwarding test result
Figure 11. Double Hazard Data forwarding test result

Forwarding condition:

EX hazard: Data forwarding from EX/MEM register
if (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
    and (EX/MEM.RegisterRd = ID/EX.RegisterRs))
    ForwardA = 10

if (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
    and (EX/MEM.RegisterRd = ID/EX.RegisterRt))
    ForwardB = 10

MEM hazard (revision): Data forwarding from MEM/WB register
if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)
    and not (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
        and (EX/MEM.RegisterRd = ID/EX.RegisterRs))
    and (MEM/WB.RegisterRd = ID/EX.RegisterRs))
    ForwardA = 01

if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)
    and not (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
        and (EX/MEM.RegisterRd = ID/EX.RegisterRt))
    and (MEM/WB.RegisterRd = ID/EX.RegisterRt))
    ForwardB = 01
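
The same conditions can be written as one small function that is evaluated once for Rs (giving ForwardA) and once for Rt (giving ForwardB). This is a minimal Python sketch with hypothetical names, intended only to show how the EX hazard takes priority over the MEM hazard.

```python
def forward_select(source_reg, exmem_regwrite, exmem_rd, memwb_regwrite, memwb_rd):
    """Return the 2-bit forwarding select for one ALU operand.

    0b10 forwards from EX/MEM, 0b01 from MEM/WB, 0b00 uses the register file.
    """
    ex_hazard = bool(exmem_regwrite and exmem_rd != 0 and exmem_rd == source_reg)
    if ex_hazard:
        return 0b10
    mem_hazard = bool(memwb_regwrite and memwb_rd != 0 and memwb_rd == source_reg)
    if mem_hazard:
        return 0b01
    return 0b00


# Double data hazard: both later stages write $1, so the EX/MEM value wins.
print(forward_select(1, 1, 1, 1, 1))
```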

Load Use Hazard testing

PC  Instruction       Machine Code  Expected Value
0   addi $8, $0, 4    20080004      $8 = 4
1   lw $3, 2($8)      8D030002      $3 = 611861287
2   add $4, $3, $8    00682020      $4 = 611861291
3   NOP               00000000



As we can see, lw $3, 2($8) uses the result from addi $8, $0, 4, which is
forwarded from the following stage. This takes only 1 cycle.

For add $4, $3, $8, it takes two cycles because we have to stall the pipeline
until the loaded value comes out of the MEM stage.

The result is the same as we expected.

Hazard Detection Condition:
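
A minimal sketch of the load-use check, assuming the hazard detection unit follows the textbook condition (a load in the EX stage whose destination Rt is read by the instruction in the ID stage). The function name is hypothetical and the Python code is an illustration, not our Verilog.

```python
def load_use_stall(idex_memread, idex_rt, ifid_rs, ifid_rt):
    """True when the pipeline must stall for one cycle."""
    return bool(idex_memread and idex_rt in (ifid_rs, ifid_rt))


# lw $3, 2($8) in EX while add $4, $3, $8 is in ID: stall.
print(load_use_stall(1, 3, 3, 8))
```

When the function returns True, the textbook design holds the PC and the IF/ID register and inserts a bubble by zeroing the control signals that enter ID/EX.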

Control Hazard Detection

Branch flush:

PC  Instruction           Machine Code  Expected Value
0   beq $zero, $zero, 8   10000008      branch to 9
1   addi $1, $0, 1        20010001      flushed
2   addi $2, $0, 2        20020002      flushed
3   addi $3, $0, 3        20030003      flushed
4   addi $4, $0, 4        20040004      Not in the pipeline
5
9   addi $5, $0, 5        20050005      $5 = 5
7   NOP                   00000000

Figure 12. Load Use Hazard test

Simulation result

Since we resolve the branch in the MEM stage, each taken branch costs 3
cycles. We flush the three instructions that follow the branch instruction.
As a result, $1, $2 and $3 are not assigned any value. Instruction
addi $4, $0, 4 is not even fetched into the pipeline.
After the branch is taken, the PC is set to 9, addi $5, $0, 5 is executed and
$5 becomes 5.

The result is the same as we expected.
The branch takes 4 cycles to determine whether to branch or not, and
addi $5, $0, 5 takes 5 cycles to write its result back to the register file.

That is why you see 9 clock cycles on the waveform.

Figure 13. Control Hazard Detection
Figure 14. Branch Flush process

For the IFID register, we set the instruction to 0, which is essentially
add $0, $0, $0. This is equivalent to a NOP and does no harm to our data.
For the IDEX and EXMEM registers, we set the control signals to 0 so
that they do not write into memory or the register file.
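
The flush can be pictured with the pipeline registers modelled as Python dictionaries. This is only a sketch under assumptions: the field names and the list of control signals that get cleared are hypothetical stand-ins for the fields in our Verilog registers.

```python
CONTROL_KEYS = ("RegWrite", "MemRead", "MemWrite", "Branch")


def flush_pipeline(if_id, id_ex, ex_mem):
    """Return flushed copies of the three pipeline registers.

    IF/ID gets a zero instruction (a NOP); ID/EX and EX/MEM keep their data
    fields but have every listed control signal cleared.
    """
    clear = {key: 0 for key in CONTROL_KEYS}
    return {**if_id, "instruction": 0}, {**id_ex, **clear}, {**ex_mem, **clear}


if_id = {"instruction": 0x20030003, "pc": 4}
id_ex = {"RegWrite": 1, "MemRead": 0, "MemWrite": 0, "Branch": 0, "rt": 2}
ex_mem = {"RegWrite": 1, "MemRead": 0, "MemWrite": 0, "Branch": 0, "rd": 1}
print(flush_pipeline(if_id, id_ex, ex_mem))
```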

Jump flush:
PC  Instruction       Machine Code  Expected Value
0   j 9               08000009      jump to 9
1   addi $1, $0, 1    20010001      flushed
5
9   addi $5, $0, 5    20050005      $5 = 5
10  NOP               00000000

Simulation Result

The result is the same; the only difference is that the jump instruction
wastes only 1 clock cycle, since only 1 instruction behind the jump has been
fetched. We only need to flush that one.

The addi instruction still takes 5 cycles and the total number of clock cycles
needed is 7. You only see 6 on the waveform because the first one is triggered
by reset, and the write-back happens at the beginning of the WB stage.

Figure 14. Jump Flush process


Jal flush and record PC

PC  Instruction       Machine Code  Expected Value
0   jal 9             0C000009      jump to 9 and link
1   addi $1, $0, 1    20010001      flushed
4   addi $4, $0, 4    20040004      $31 = 1
5
9   addi $5, $0, 5    20050005      $5 = 5
10  NOP               00000000

Simulation Result:

We can see that the PC+1 of the jal instruction is written into register 31 at
the beginning of the fourth cycle. Again, the first cycle is triggered by reset
and cannot be seen clearly; after the first instruction, everything is as we
expected.

It also flushes the one following instruction, as the jump instruction does.
Figure 14. Jal Flush and Record process

Run a program to test the whole processor

After testing each instruction, we are able to combine them to run
some simple programs.

Bubble Sort Program

Special thanks to Nicholas Cervantes, who translated the C code of bubble
sort into MIPS assembly code and shared it with me to test my processor.

C code:
int main(void)
{
    int arr[10] = {1, 7, 10, 2, 3, 8, 4, 9, 5, 6};

    int i = 0;
    int d = 0;

    /* Bubble sort for the given array. */
    for (i = 0; i < 10; i++) {
        for (d = 0; d < 9 - i; d++) {
            if (arr[d] > arr[d+1]) {
                int swap = arr[d];
                arr[d] = arr[d+1];
                arr[d+1] = swap;
            }
        }
    }

    return 0;
}



Assembly Code:

PC  Instruction          Machine Code  Comment
0   addi $19, $0, 9      20130009      Set $19 = 9
1   addi $20, $0, 10     2014000A      Set $20 = 10
2   add $1, $0, $0       00000820      Set $1 = i = (0)
3   slt $11, $1, $20     0034582A      branch check (i<10)
#OUTER LOOP
4   beq $0, $11, 14      100B000E      Branch to where Loop Ends
5   sub $18, $19, $1     02619022      $18 = 9 - i
6   add $2, $0, $0       00001020      Set $2 = d = (0)
7   slt $11, $2, $18     0052582A      branch check (d<9-i)
8   beq $0, $11, 8       100B0008      branch to OUTER if not (d<9-i)
#INNER LOOP
9   lw $4, 0($2)         8C440000      load arr[d] -> $4
10  lw $5, 1($2)         8C450001      load arr[d+1] -> $5
11  slt $11, $5, $4      00A4582A      Note: a bubble should form
                                       (from lw, followed by read)
12  beq $0, $11, 2       100B0002      Branch if no swap needed.
13  sw $5, 0($2)         AC450000      Place arr[d+1] into arr[d]
14  sw $4, 1($2)         AC440001      Place arr[d] into arr[d+1]
15  addi $2, $2, 1       20420001      Addi $2++ (d++)
16  j 7                  08000007      Jump to the INNER LOOP check
17  addi $1, $1, 1       20210001      Addi $1++ (i++)
18  j 3                  08000003      Jump to the OUTER LOOP check
19  lw $1, 0($0)         8C010000      Load all values into the regs to
20  lw $2, 1($0)         8C020001      check their values
21  lw $3, 2($0)         8C030002
22  lw $4, 3($0)         8C040003
23  lw $5, 4($0)         8C050004
24  lw $6, 5($0)         8C060005
25  lw $7, 6($0)         8C070006
26  lw $8, 7($0)         8C080007
27  lw $9, 8($0)         8C090008
28  lw $10, 9($0)        8C0A0009
29  NOP                  00000000

Figure 15. Bubble Sort MIPS Code
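
The control flow of the sorting loop (addresses 0 to 18) can be checked with a small functional interpreter in Python. This is a sketch under stated assumptions, not a model of the pipeline: it is not cycle accurate, the tuple format and the run function are hypothetical, branch targets are computed as PC+1+offset on word-addressed memory, and the two jumps are taken to target address 7 after d++ (inner loop check) and address 3 after i++ (outer loop check).

```python
PROGRAM = [
    ("addi", 19, 0, 9),   # 0:  $19 = 9
    ("addi", 20, 0, 10),  # 1:  $20 = 10
    ("add", 1, 0, 0),     # 2:  i = 0
    ("slt", 11, 1, 20),   # 3:  $11 = (i < 10)
    ("beq", 0, 11, 14),   # 4:  leave the loop when $11 == 0
    ("sub", 18, 19, 1),   # 5:  $18 = 9 - i
    ("add", 2, 0, 0),     # 6:  d = 0
    ("slt", 11, 2, 18),   # 7:  $11 = (d < 9 - i)
    ("beq", 0, 11, 8),    # 8:  go to i++ when $11 == 0
    ("lw", 4, 0, 2),      # 9:  $4 = arr[d]
    ("lw", 5, 1, 2),      # 10: $5 = arr[d+1]
    ("slt", 11, 5, 4),    # 11: $11 = (arr[d+1] < arr[d])
    ("beq", 0, 11, 2),    # 12: skip the swap when $11 == 0
    ("sw", 5, 0, 2),      # 13: arr[d] = $5
    ("sw", 4, 1, 2),      # 14: arr[d+1] = $4
    ("addi", 2, 2, 1),    # 15: d++
    ("j", 7),             # 16: back to the inner loop check
    ("addi", 1, 1, 1),    # 17: i++
    ("j", 3),             # 18: back to the outer loop check
]


def run(program, memory, max_steps=100_000):
    """Execute the program on word-addressed data memory and return the memory."""
    regs = [0] * 32
    pc = 0
    for _ in range(max_steps):
        if pc >= len(program):
            return memory
        op, *args = program[pc]
        pc += 1  # word-addressed memory: the next PC is PC + 1
        if op == "addi":
            rt, rs, imm = args
            regs[rt] = regs[rs] + imm
        elif op in ("add", "sub", "slt"):
            rd, rs, rt = args
            if op == "add":
                regs[rd] = regs[rs] + regs[rt]
            elif op == "sub":
                regs[rd] = regs[rs] - regs[rt]
            else:
                regs[rd] = int(regs[rs] < regs[rt])
        elif op == "beq":
            rs, rt, offset = args
            if regs[rs] == regs[rt]:
                pc += offset
        elif op == "lw":
            rt, imm, rs = args
            regs[rt] = memory[regs[rs] + imm]
        elif op == "sw":
            rt, imm, rs = args
            memory[regs[rs] + imm] = regs[rt]
        elif op == "j":
            pc = args[0]
        regs[0] = 0  # register 0 is the constant zero
    raise RuntimeError("step limit reached; the program did not terminate")


print(run(PROGRAM, [1, 7, 10, 2, 3, 8, 4, 9, 5, 6]))
```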


We expect the result to be sorted from small to large, from register 1 to
register 10.

As you can see, the final result is sorted as we want.
The data in the data memory was placed randomly. Let us try one more.

The data in the data memory is like this:

The result is as follows:

Figure 16. Sorting example 1
Figure 17. Sorting example 2


Conclusion: We should have started this project earlier, so that we would have
had enough time to implement more things. On Friday morning we were still
writing the report. This class is not required for EE students, but I am quite
interested in it and fortunately I am not doing badly. As a senior EE student,
I am more familiar with Verilog than my partner, who majors in computer
engineering. In the end we got through all the labs and finished our final
project, which is a great success. All the instructions work correctly, and we
have reviewed what we learned in the textbook and verified the design. We
gained a lot from implementing this pipeline datapath project, and we
appreciate it.
