Sei sulla pagina 1di 38

Advanced Computer Architecture

MIPS ISA and PROCESSOR

Ajit Pal
Professor
Department of Computer Science and Engineering
Indian Institute of Technology Kharagpur
INDIA-721302
Case Study

MIPS Instruction Set architecture


Operands in MIPS
 Registers
– MIPS has 32 architected 32-bit registers
– Effective use of registers is key to program performance

 Data Transfer
– 32 words in registers, millions of words in main memory
– Instruction accesses memory via memory address
– Address indexes memory, a large single-dimensional array

 Alignment
– Most architectures address individual bytes
– Addresses of sequential words differ by 4
– Words must always start at addresses that are multiples of 4
Register Usage Conventions
 Registers
r0: always 0
r1: reserved for the assembler
r2, r3: function return values
r4~r7: function call arguments
r8~r15: “caller-saved” temporaries
r16~r23 “callee-saved” temporaries
r24~r25 “caller-saved” temporaries
r26, r27: reserved for the operating system
r28: global pointer
r29: stack pointer
r30: callee-saved temporaries
r31: return address
Data Types in MIPS
Data and instructions are both represented in bits
– 32-bit architectures employ 32-bit instructions
– Combination of fields specifying operations/operands
MIPS operates on:
32-bit (unsigned or 2’s complement) integers,
32-bit (single precision floating point) real numbers,
 64-bit (double precision floating point) real numbers;
 32-bit words, bytes and half words can be loaded into GPRs
 After loading into GPRs, bytes and half words are either zero
or sign bit expanded to fill the 32 bits;
 Only 32-bit units can be loaded into FPRs and 32-bit real
numbers are stored in even numbered FPRs.
 64-bit real numbers are stored in two consecutive FPRs,
starting with even-numbered register.
Floating-point numbers IEEE standard 754 float: 8-bit exponent, 23-bit
significand double:11-bit exponent, 52-bit significand
MIPS Memory Organization
• MIPS supports byte addressability:
 it implies that a byte is the smallest unit with its
address;
 32-bit word has to start at byte address that is
multiple of 4;
 Thus, 32-bit word at address 4n includes four bytes
with addresses: 4n, 4n+1, 4n+2, and 4n+3.
16-bit half word has to start at byte address that is
multiple of 2; Thus, 16-bit word at address 2n includes
two bytes with addresses: 2n and 2n+1.
 it implies that an address is given as 32-bit unsigned
MIPS Instruction Formats
R-Type Instruction Fields
– op: basic operation of instruction, called the opcode
– rs, rt: first, second register source operand
– rd: register destination operand
– shamt: shift amount for shift instructions
– funct: specifies a variant of the operation, called
function code

op rs rt rd shamt funct
MIPS Instruction Formats
• I-Type Instruction Fields
– Opcode specifies instruction format

• J-Type Instruction Fields


MIPS Addressing Modes
• Register Addressing
– Operand is a register (e.g., add $s2, $s0, $s1)

• Base or Displacement Addressing


– Operand is at memory location whose address is sum
of register and constant (e.g., lw $s1, 8($s0))
MIPS Addressing Modes
• Immediate Addressing
– Operand is constant within instruction (e.g., addi $s1,
$s0, 4)

• PC-Relative Addressing
– Address is sum of PC and constant within instruction
(e.g., beq $s0, $s1, 16)
MIPS Addressing Modes

• Pseudo-direct Addressing
– Address is 26 bits of constant within instruction
concatenated with upper 6 bits of PC (e.g., j 1000)
Survey of MIPS Instruction Set
• Arithmetic
• Addition
– add $s1, $s2, $s3 # $s1 = $2 + $s3, overflow detected
– addu $s1, $s2, $s3 # $s1 = $s2 x $s3, overflow
undetected

• Subtraction
– sub $s1, $s2, $s3 # $s1 = $s2 - $s3, overflow detected
– subu $s1, $s2, $s3 # $s1 = $s2 - $s3, overflow
undetected

• Immediate/Constants
– addi $s1, $s2, 100 # $s1 = $s2 + 100, overflow detected
– addiu $s1, $s2, 100 # $s1 = $s2 + 100, overflow
undetected
Survey of MIPS Instruction Set
• Arithmetic
• Multiply
– mult $s2, $s3 # Hi, Lo = $s2 x $s3, 64-bit signed product
– multu $s2, $s3 # Hi, Lo = $s2 x $s3, 64-bit unsigned
product

• Divide
– div $s2, $s3 # Lo = $s2/$s3, Hi = $s2 mod $s3
– divu $s2, $s3 # Lo = $s2/$s3, Hi = $s2 mod $s3,
unsigned

• Ancillary
– mfhi $s1 # $s1 = Hi, get copy of Hi
– mflo $s1 # $s1 = Lo, get copy of low
Survey of MIPS Instruction Set
• Logical

• Boolean Operations
– and $s1, $s2, $s3 # $s1 = $s2 & $s3, bit-wise AND
– or $s1, $s2, $s3 # $s1 = $s2 | $s3, bit-wise OR

• Immediate/Constants
– andi $s1, $s2, 100 # $s1 = $s2 & 100, bit-wise AND
– ori $s1, $s2, 100 # $s1 = $s2 | 100, bit-wise OR

• Shifting
– sll $s1, $s2, 10 # $s1 = $s2 << 10, shift left
– srl $s1, $s2, 10 # $s1 = $s2 >> 100, shift right
Survey of MIPS Instruction Set
• Data Transfer

• Load Operations
– lw $s1, 100($s2) # $s1 = Mem($s2+100), load word
– lbu $s1, 100($s2) # $s1 = Mem($s2+100), load byte
– lui $s1, 100 # $s1 = 100 * 2^16, load upper imm.

• Store Operations
– sw $s1, 100($s2) # Mem[$s2+100] = $s1, store word
– sb $s1, 100($s2) # Mem[$s2+100] = $s1, store byte
Survey of MIPS Instruction Set
• Control Transfer
• Jump Operations
– j 2500 # go to 10000
– jal 2500 # $ra = PC+4, go to 10000, for procedure call
– jr $ra # go to $ra, for procedure return
• Branch Operations
– beq $s1, $s2, 25 # if($s1==$s2), go to PC+4+100, PC-relative
– bne $s1, $s2, 25 # if($s1!=$s2), go to PC+4+100, PC-relative

• Comparison Operations
– slt $s1, $s2, $s3 # if($s2<$s3) $s1=1, else $s1=0, 2’s comp.
– slti $s1, $s2, 100 # if($s2<100), $s1=1, else $s1=0, 2’s comp.
– sltu $s1, $s2, $s3 # if($s2<$s3), $s1=1, else $s1=0, unsigned
– sltiu $s1, $s2, 100 # if($s2<100) $s1=1, else $s1=0, unsigned
Survey of MIPS Instruction Set
• Floating Point Operations
• Arithmetic
– {add.s, sub.s, mul.s, div.s} $f2, $f4, $f6 # single-precision
– {add.d, sub.d, mul.d, div.d} $f2, $f4, $f6 # double-precision
– Floating-point registers are used in even-odd pairs, using even
number register as its name

• Data Transfer
– {lwc1, swc1} $f1, 100($s2)
– Transfer data to/from floating-point register file

• Conditional Branch
– {c.lt.s, c.lt.d} $f2, $f4 # if ($f2 < $f4), cond=1, else cond=0
– {bclt, bclf} 25 # if (cond==1/0), go to PC+4+100
Case Study

MIPS Instruction Set Processor


Introduction
 Starting point:
■ The specification of the MIPS instruction set drives the design of the
hardware.
■ Will restrict design to integer type instructions
 Identify common functions to all instructions, and within instruction
classes – easy to do in a RISC architecture
■ Instruction fetch
■ Access one or more registers
■ Use ALU
 Asserted signals – a high or low level of a signal which implies a logically
“true” condition … an “action” level. The text will only assert a logically
high level, ie., a “1”.
 Clocking
■ Assume “edge triggered” clocking (as opposed to level sensitive).
■ A storage circuit or flip-flop stores a value on the clock transition
edge.
■ Model is flip-flops with combinational logic between them
■ Propagation delay through combinations logic between storage
elements determines clock cycle length.
■ Single clock cycle vs. multi-clock cycle design approach
Single Versus Multi-clock Cycle Design
 Start out with a single “long” clock cycle for each instruction .
■ Entire instruction gets executed in a single clock pulse
■ Controller is pure combinational logic
■ Design is simple
■ You would think that a single clock cycle per instruction
execution would give us super high performance – but not so:
Slowest instruction determines speed of all instructions.
 Because various phases of the instructions need the same
hardware resource, and all is needed at the same time (clock
pulse)
■ Some hardware is redundant – another disadvantage of
single phase
Examples:
2 memories: instruction and data memory
2 adders and an ALU
Design Summary
 Has a performance bottleneck
■ The clock cycle time is determined by the longest path
in the machine
■ The simple jmp instruction will take as long as the load
word (lw)
■ The instruction which uses the longest data path
dictates the time for all others.
 What about a variable time clock design?
■ Still a single clock
■ Clock pulse interval is a function of the opcode
■ Average time for instruction theoretically improves
But
■ It difficult to implement - lots of overhead to overcome

 Let’s start simple with a single clock cycle design for


simplicity reasons and later convert to multi-clock cycle.
Basic Abstract View of the Data Path

D a ta

R e g is t e r #
PC A d dres s In s t r u c t i o n R e g is te rs ALU A d dres s
I n s t r u c t io n R e g is t e r #
m e m o ry D a ta
R e g is t e r # m e m o ry

D a ta

Shows common functions for most instructions


Data Path for Instruction Fetching

A dd

Read
PC a d d re s s

I n s t r u c t io n
I n s t r u c t io n
m e m o ry
Basic Data Path for R-type Instruction
A L U o p e r a tio n
R ead 3
re g is te r 1
R ead
R ead d a ta 1
re g is te r 2 Z e ro
In s tr u c tio n
R e g is te r s ALU ALU
W r it e r e s u lt
re g is te r
R ead
W r it e d a ta 2
d a ta

R e g W r it e

Red lines are for control signals


generated by the controller
Adding the Data Path for lw & sw Instruction
3 A L U o p e r a tio n
R ea d
r e g is t e r 1 M e m W r it e
R e ad
d a ta 1
R ea d
I n s t r u c t io n r e g is t e r 2 Z e ro
R e g is te rs ALU ALU
W ri te Read
re s u lt A d d re ss d a ta
r e g is t e r R e ad
W ri te d a ta 2
D a ta
d a ta
m e m o ry
R e g W r it e W ri te
d a ta
16 32
Immediate S ig n M em Read
offset data e x te n d

Implements:
lw $t1, offset_value($t2)
sw $t1, offset_value($t2)
The offset value is a 16-bit signed immediate
field & must be sign extended to 32 bits
Adding the Data Path for beq Instruction
P C + 4 fr o m i n s tr u c t i o n d a ta p a th
To PC
Add Sum B r a n c h ta r g e t

S h i ft
l e ft 2

A L U o p e r a ti o n
R ead 3
I n s tr u c t io n r e g i s te r 1
R ea d
da ta 1
R ead
r e g i s te r 2 T o b ra n c h
R e g i s te r s ALU Z e ro
W r i te c o n tr o l l o g i c
r e g i s te r R ea d
da ta 2
W r i te
d a ta

R e g W rit e Implements beq $t1, $t2, offset


16 32
Offset is a signed 16 bit
S ig n
e x te n d
immediate field, & thus must be
sign extended. In addition we
shift left by 2 (make low bits are 00)
to address to a word boundary
Putting It All Together
unsuccessful branch P C S rc
Incremented PC
M
Add u
x
or beq branch
4 Add ALU
r e s u lt address
S h if t
le f t 2 Successful branch
R e g i s te r s
R e ad 3 A L U o p e r a t io n M e m W r i te
Read re g is t e r 1 A L U S rc
PC Read
a d d re s s R ead d a ta 1 M e m to R e g
r e g i s te r 2 Z ero
I n s t r u c t io n ALU A LU
W r it e R ead A dd re ss R ead
re g is t e r M r e s u lt d a ta
da ta 2 M
I n s tr u c tio n u u
m e m o ry W r it e x D ata x
d a ta m e m o ry
W r i te
R e g W r it e da ta
16 32
S ig n
e xte nd M e m R ea d

j instruction to be added later


Need control circuits to drive control lines in red.
Two control units will be designd: ALU Control & “Main Control
Instruction RegDst RegWrite ALUSrc MemRea MemWri MemToReg PCSrc ALU
d te operation

R-format 1 1 0 0 0 0 0 0000
(and)
0001
(or)
0010
(add)
0110
(sub)

lw 0 1 1 1 0 1 0 0010
(add)

sw X 0 1 0 1 X 0 0010
(add)

beq x 0 0 0 0 X 1 or 0 0110
(sub)
Control

 We next add the control unit that generates


■ write signal for each state element
■ control signals for each multiplexer
■ ALU control signal
 Input to control unit: instruction opcode and function
code
Control Unit

 Divided into two parts


■ Main Control Unit
● Input: 6-bit opcode
● Output: all control signals for Muxes, RegWrite,
MemRead, MemWrite and a 2-bit ALUOp signal
■ ALU Control Unit
● Input: 2-bit ALUOp signal generated from Main
Control Unit and 6-bit instruction function code
● Output: 4-bit ALU control signal
Truth Table for Main Control Unit
Main Control Unit
ALU Control Unit

 Must describe hardware to compute 4-bit ALU


control input given
■ 2-bit ALUOp signal from Main Control Unit
■ function code for arithmetic
 Describe it using a truth table (can turn into gates):
ALU Control bits

0010

0010

0110

0010

0110

0000

0001

0111
Putting It All Together Again
0
M
u
x
ALU
Add result 1
Add Shift PCSrc
RegDst left 2
4 Branch
MemRead
Instruction [31 26] MemtoReg
Control
ALUOp
MemWrite
ALUSrc
RegWrite

Instruction [25 21] Read


PC Read register 1
address Read
Instruction [20 16] data 1
Read
register 2 Zero
Instruction 0 Registers Read ALU ALU
[31– 0] 0 Read
M Write data 2 result Address 1
Instruction u register M data
u M
memory Instruction [15 11] x u
1 Write x Data
data x
1 memory 0
Write
data
16 32
Instruction [15 0] Sign
extend ALU
control

Instruction [5 0]

Use this for R-type, memory, & beq instructions scenarios.


Addition of the Unconditional Jump

 We now add one more op code to our single cycle design:


■ Op code 2: “j”
■ The format is op field 28-31 is a “2”
■ Remaining 26 low bits is the immediate target address

 The full 32 bit target address is computed by concatenating:


■ Upper 4 bits of PC+4
■ 26 bit immediate field of the jump instruction
■ Bits 00 in the lowest positions (word boundary)
■ See text chapter 3, p. 150

 An additional control line from the main controller will have to


be generated to select this “new” instruction

 A two bit shifter is also added to get the two low order zeros
Final Design with jump Instruction
I n s t ru c tio n [ 2 5 – 0 ] S h if t Ju m p a d dre s s [3 1 – 0 ]
le f t 2
26 28 0 1

P C + 4 [31 – 2 8 ] M M
u u
x x
A d d r e As uL ltU 1 0
Ad d S h if t
R e g D st
Jump le f t 2
4 B ra n ch
M emR ead
I n s tr u c t io n [3 1 – 2 6 ]
C o n tr o l M e m to R e g
ALUO p
M e m W r ite
A L U S rc
R e g W r it e

I n s tr u c t io n [2 5 – 2 1 ] Read
PC Read r e g is te r 1
a d d re ss Read
I n s tr u c t io n [2 0 – 1 6 ] Read d a ta 1
r e g is te r 2 Z e ro
In s t r u c tio n 0 R e g is te r s R e a d ALU
[3 1 – 0 ] ALU
M W r it e d a ta 2 0 r e s u lt A d d re ss Read 1
In s tr u c tio n u r e g is te r M d a ta
u M
m e m or y I n s tr u c t io n [1 5 – 1 1 ] x u
1 W r it e x D a ta
d a ta x
1 m e m o ry 0
W ri te
d a ta
16 32
I n s tr u c t io n [1 5 – 0 ] S ig n
e x te n d ALU
c o ntro l

I n s t ru c tio n [ 5 – 0 ]
Thanks!

Ajit Pal, IIT Kharagpur

Potrebbero piacerti anche