
Datapath and Control

Considerations

Original Design

[Figure 7.8. Three-bus organization of the datapath. Components shown: Bus A, Bus B, Bus C; register file; PC with incrementer; MUX with Constant 4 input; ALU (inputs A and B, output R); instruction decoder; IR; MDR (memory bus data lines); MAR (address lines).]

Pipelined Design

Separate instruction and data caches
PC is connected to IMAR (memory address for instruction fetches)
DMAR (memory address for data access)
Separate MDR: MDR/Read and MDR/Write
Buffers at the input and output of the ALU
Control signal pipeline
Instruction queue
Instruction decoder output

Operations that can be performed independently:
Reading an instruction from the instruction cache
Incrementing the PC
Decoding an instruction
Reading from or writing into the data cache
Reading the contents of up to two registers
Writing into one register in the register file
Performing an ALU operation

[Figure 8.18. Datapath modified for pipelined execution, with interstage buffers at the input and output of the ALU. Components shown: register file; Bus A, Bus B, Bus C; ALU (inputs A and B, output R); PC with incrementer; instruction decoder; instruction queue; IMAR; DMAR; MDR/Read; MDR/Write; instruction cache; data cache.]
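The point of the separate caches and address registers can be illustrated with a small resource-conflict check. This is only a sketch: the resource tables below are a simplified, hypothetical reading of Figures 7.8 and 8.18, and the names `PIPELINED`, `ORIGINAL`, and `conflicts` are invented for the illustration.

```python
from itertools import combinations

# Hypothetical, simplified tables: the hardware each operation occupies
# during one clock cycle in the pipelined datapath.
PIPELINED = {
    "fetch instruction": {"instruction cache", "IMAR"},
    "increment PC": {"incrementer"},
    "decode instruction": {"instruction decoder"},
    "access data": {"data cache", "DMAR", "MDR"},
    "read registers": {"register file read ports"},
    "write register": {"register file write port"},
    "ALU operation": {"ALU"},
}

# Assumed model of the original design: instruction fetches and data
# accesses share one memory bus, one MAR, and one MDR.
ORIGINAL = dict(PIPELINED)
ORIGINAL["fetch instruction"] = {"memory bus", "MAR", "MDR"}
ORIGINAL["access data"] = {"memory bus", "MAR", "MDR"}


def conflicts(table):
    """Return the pairs of operations that compete for the same hardware."""
    return [
        (a, b)
        for a, b in combinations(table, 2)
        if table[a] & table[b]
    ]


print("pipelined:", conflicts(PIPELINED))
print("original: ", conflicts(ORIGINAL))
```

Under these assumptions the pipelined table has no conflicting pair, while the original table reports that an instruction fetch and a data access cannot take place in the same cycle.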

Superscalar Operation

Overview
The maximum throughput of a pipelined
processor that issues one instruction at a
time is one instruction per clock cycle.
If we equip the processor with multiple
processing units to handle several
instructions in parallel in each processing
stage, several instructions can start
execution in the same clock cycle. This is
called multiple issue.
Such processors can achieve an instruction
execution throughput of more than one
instruction per cycle. They are called
superscalar processors.
Multiple issue requires a wider path to the
cache and multiple execution units.
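The throughput claim can be checked with a short calculation. The sketch below assumes an ideal pipeline with no stalls or hazards; the function names and the four-stage, 100-instruction example are assumptions made for illustration.

```python
import math


def ideal_cycles(n_instructions, n_stages, issue_width=1):
    """Cycles needed by an ideal pipeline (no stalls) to finish n instructions."""
    if n_instructions == 0:
        return 0
    # Instructions enter the pipeline in groups of `issue_width` per cycle.
    groups = math.ceil(n_instructions / issue_width)
    # The first group needs n_stages cycles; each later group adds one cycle.
    return n_stages + groups - 1


def throughput(n_instructions, n_stages, issue_width=1):
    """Average number of instructions completed per clock cycle."""
    return n_instructions / ideal_cycles(n_instructions, n_stages, issue_width)


single_issue = throughput(100, 4, issue_width=1)
dual_issue = throughput(100, 4, issue_width=2)
print(f"single issue: {single_issue:.2f} instructions per cycle")
print(f"dual issue:   {dual_issue:.2f} instructions per cycle")
```

With one instruction issued per cycle the throughput approaches but never exceeds 1; with two issued per cycle it can exceed 1, which is the defining property of a superscalar processor.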

Superscalar

[Figure 8.19. A processor with two execution units. Components shown: F: instruction fetch unit; instruction queue; dispatch unit; floating-point unit; integer unit; W: write results.]

Timing

[Figure 8.20. An example of instruction execution flow in the processor of Figure 8.19, assuming no hazards are encountered. Stages of each instruction, in time order:]

I1 (Fadd): F1 D1 E1A E1B E1C W1
I2 (Add):  F2 D2 E2 W2
I3 (Fsub): F3 D3 E3A E3B E3C W3
I4 (Sub):  F4 D4 E4 W4
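The flow above can be reproduced with a small scheduling sketch. Everything about the timing model is an assumption made for illustration: two instructions are fetched per cycle, floating-point execution takes three cycles and integer execution one, both execution units are pipelined, and no hazards occur. The names `LATENCY` and `schedule` are hypothetical.

```python
# Assumed execution latencies, in clock cycles, for each unit.
LATENCY = {"fp": 3, "int": 1}


def schedule(program, fetch_width=2):
    """Assign clock cycles to the F, D, E, and W stages of each instruction.

    `program` is a list of (name, unit) pairs in program order.
    """
    timeline = {}
    for index, (name, unit) in enumerate(program):
        fetch = index // fetch_width + 1      # fetch_width instructions per cycle
        decode = fetch + 1
        first_exec = decode + 1
        last_exec = first_exec + LATENCY[unit] - 1
        timeline[name] = {
            "F": fetch,
            "D": decode,
            "E": (first_exec, last_exec),     # inclusive range of cycles
            "W": last_exec + 1,
        }
    return timeline


program = [
    ("I1 (Fadd)", "fp"),
    ("I2 (Add)", "int"),
    ("I3 (Fsub)", "fp"),
    ("I4 (Sub)", "int"),
]
timeline = schedule(program)
for name, stages in timeline.items():
    print(name, stages)
```

Under these assumptions the write stages fall in cycles 6, 4, 7, and 5 for I1 to I4, so the results are written out of program order.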

Out-of-Order Execution

Hazards
Exceptions
Imprecise exceptions
Precise exceptions
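[Figure, part (a): Delayed write. W2 is held back so that results are written in program order. Stages of each instruction, in time order:]

I1 (Fadd): F1 D1 E1A E1B E1C W1
I2 (Add):  F2 D2 E2 W2
I3 (Fsub): F3 D3 E3A E3B E3C W3
I4 (Sub):  F4 D4 E4 W4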
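The delayed-write idea of part (a) can be sketched as a rule that no instruction writes its result before its predecessor in program order. The ready cycles below are hypothetical values chosen for illustration, and the sketch assumes that two results may be written in the same cycle; `delayed_writes` is an invented name.

```python
def delayed_writes(ready):
    """Return write cycles that respect program order.

    ready[i] is the first cycle in which instruction i could write its result.
    An instruction may write in the same cycle as its predecessor, never earlier.
    """
    writes = []
    for cycle in ready:
        earliest_allowed = writes[-1] if writes else cycle
        writes.append(max(cycle, earliest_allowed))
    return writes


# Hypothetical ready cycles for the four instructions, in program order.
ready = {"I1 (Fadd)": 6, "I2 (Add)": 4, "I3 (Fsub)": 7, "I4 (Sub)": 5}
writes = delayed_writes(list(ready.values()))

# Cycles each instruction spends waiting for its turn to write.
stalls = {name: w - r for (name, r), w in zip(ready.items(), writes)}

for name, w in zip(ready, writes):
    print(f"{name}: writes in cycle {w}, waits {stalls[name]} cycle(s)")
```

In this model the two integer instructions each wait two cycles. During that wait the execution unit has to hold the result, which is the cost that motivates the temporary-register scheme.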

Execution Completion
It is desirable to use out-of-order execution, so that an
execution unit is freed to execute other instructions as
soon as possible.
At the same time, instructions must be completed in
program order to allow precise exceptions.
The use of temporary registers
Commitment unit
[Figure, part (b): Using temporary registers. TW denotes a write into a temporary register. Stages of each instruction, in time order:]

I1 (Fadd): F1 D1 E1A E1B E1C W1
I2 (Add):  F2 D2 E2 TW2 W2
I3 (Fsub): F3 D3 E3A E3B E3C W3
I4 (Sub):  F4 D4 E4 TW4 W4
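The interplay of temporary registers and the commitment unit can be sketched as a queue of instructions in program order. This is a minimal model under stated assumptions, not a description of any particular processor: the class and method names (`Entry`, `CommitUnit`, `issue`, `complete`, `commit`) and the register names are invented for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    name: str
    dest: str                      # permanent register to update at commit
    value: Optional[int] = None    # result held in a temporary register
    done: bool = False
    exception: bool = False


class CommitUnit:
    """Retires instructions strictly in program order (hypothetical model)."""

    def __init__(self, registers):
        self.registers = registers
        self.queue = []            # entries in program order

    def issue(self, name, dest):
        entry = Entry(name, dest)
        self.queue.append(entry)
        return entry

    def complete(self, entry, value=None, exception=False):
        # Execution may finish out of order; the result stays temporary.
        entry.value, entry.done, entry.exception = value, True, exception

    def commit(self):
        committed = []
        while self.queue and self.queue[0].done:
            head = self.queue[0]
            if head.exception:
                # Precise exception: discard this and all later instructions.
                self.queue.clear()
                break
            self.queue.pop(0)
            self.registers[head.dest] = head.value
            committed.append(head.name)
        return committed


registers = {"F2": 0, "R3": 0}
unit = CommitUnit(registers)
i1 = unit.issue("I1", "F2")
i2 = unit.issue("I2", "R3")

unit.complete(i2, value=7)         # I2 finishes first (like TW2)
early = unit.commit()              # nothing retires: I1 is still executing
unit.complete(i1, value=5)
late = unit.commit()               # both retire, in program order
print(early, late, registers)
```

Because a result reaches a permanent register only at commit, an exception raised by an earlier instruction leaves the permanent registers untouched by any later instruction.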
