
PIPELINING

• Pipelining is an implementation technique
whereby multiple instructions are overlapped
in execution.
• It takes advantage of parallelism that exists
among the actions needed to execute an
instruction.
• Pipelining is the key implementation
technique used to make fast CPUs.

• In a computer pipeline, each step in the
pipeline completes a part of an instruction.
Each of these steps is called a pipe stage or
pipe segment.
• Throughput of an instruction pipeline is
determined by how often an instruction exits
the pipeline.
• Time per instruction on an ideal pipelined
processor = (time per instruction on the
unpipelined machine) / (number of pipe stages).
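• As a quick worked example (with assumed numbers): if the unpipelined machine takes 10 ns per instruction and the pipeline has 5 stages, the ideal pipelined time per instruction is 10 ns / 5 = 2 ns.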

RISC instruction set-MIPS64
Microprocessor without Interlocked Pipeline Stages
3 classes of instructions:
1) ALU instructions: these instructions take either two
registers or a register and a sign-extended immediate
as operands, operate on them, and store the result
into a third register.
Typical instructions include DADD, DSUB and logical
operations.
2) Load and store instructions: these instructions take a
register source, called the base register, and an
immediate field called the offset, as operands.
The sum of the contents of the base register and the
sign-extended offset, called the effective address, is
used as the memory address.
3) Branches and jumps: branches are conditional transfers of
control.
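A few representative MIPS64 instructions for the three classes (register numbers and the label are illustrative examples, not taken from the original slides):
DADD  R1, R2, R3     ; ALU register-register: R1 <- R2 + R3
DADDI R1, R2, #42    ; ALU register-immediate: R1 <- R2 + 42
LD    R4, 8(R2)      ; load: R4 <- Mem[8 + R2]
SD    R4, 8(R2)      ; store: Mem[8 + R2] <- R4
BEQZ  R4, name       ; conditional branch: if R4 == 0, branch to label name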
Every instruction in this RISC subset can be implemented in
at most 5 clock cycles. The 5 cycles are as follows.
1) Instruction fetch cycle (IF):
send the PC to memory and fetch the current instruction
from memory. Update the PC to the next sequential PC by
adding 4 (since each instruction is 4 bytes) to the PC.
2) Instruction decode/register fetch cycle (ID):
• Decode the instruction and read the registers
corresponding to register source specifiers from the
register file.
• Decoding is done in parallel with reading registers, which
is possible because the register specifiers are at a fixed
location in a RISC architecture. This technique is known as
fixed-field decoding.

3) Execution/effective address cycle (EX): the ALU performs
one of three functions depending on the instruction
type.
• Memory reference: ALU adds the base register and the
offset to form the effective address.
• Register-Register ALU instructions: ALU performs
operations specified by the ALU opcode on the values
read from the register file.
• Register-immediate ALU instructions: the ALU performs the
operation specified by the ALU opcode on the first
value read from the register file and the sign-extended
immediate.
• In a load-store architecture the effective address and
execution cycles can be combined into a single clock
cycle, since no instruction needs to simultaneously
calculate a data address and perform an operation on
the data.

4) Memory access cycle (MEM):
• If the instruction is a load, memory does a
read using the effective address computed in
the previous cycle.
• If it is a store, then the memory writes the
data from the second register read from the
register file using the effective address.
5) Write-back cycle (WB): write the result into the
register file, whether it comes from the
memory system or from the ALU.
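As an illustration (the instruction is an assumed example, not from the original slides), a load such as LD R4, 8(R2) passes through the five cycles as follows:
IF : fetch the LD instruction from memory; PC <- PC + 4
ID : decode the instruction and read base register R2 from the register file
EX : the ALU computes the effective address = contents of R2 + 8
MEM: read the data at the effective address from data memory
WB : write the loaded value into R4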

Classic 5 stage pipeline for a RISC
processor
3 observations:
• We use separate instruction and data memories,
typically implemented with separate instruction
and data caches. The use of separate caches
eliminates a conflict for a single memory that
would arise between an instruction fetch and a
data memory access.
• The register file is used in 2 stages: one for
reading in ID and one for writing in WB
• To handle a read and a write to the same register
in the same clock cycle, the write is performed in
the first half of the clock cycle and the read in
the second half.
• Basic performance issues in pipelining:
Pipelining increases the CPU instruction
throughput - the number of instructions
completed per unit of time - but does not
reduce the execution time of an individual
instruction.

Pipeline Hazards - situations that prevent the next
instruction in the instruction stream from executing
during its designated clock cycle.
1) Structural hazards: arise from resource conflicts when the
hardware cannot support all possible combinations of
instructions simultaneously in overlapped execution.
2) Data hazards: when an instruction depends on the results
of a previous instruction in a way that is exposed by the
overlapping of instructions in the pipeline.
3) Control Hazards: arise from the pipelining of branches and
other instructions that change the PC.
Hazards in pipelines can make it necessary to stall the
pipeline.
Avoiding a hazard often requires that some instructions in the
pipeline be allowed to proceed while others are delayed.

• Some performance expressions involving a
realistic pipeline in terms of CPI.
• It is assumed that the clock period is the
same for pipelined and unpipelined
implementations.
Speedup = CPI unpipelined / CPI pipelined
        = Pipeline depth / (1 + Pipeline stall cycles per instruction)
(here the ideal pipelined CPI is 1, and the unpipelined CPI is taken to equal the pipeline depth)
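• For example (with assumed numbers): a 5-stage pipeline with an average of 0.25 stall cycles per instruction gives Speedup = 5 / (1 + 0.25) = 4.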

Structural hazards
• When a processor is pipelined, the overlapped
execution of instructions requires pipelining of
functional units and duplication of resources
to allow all possible combinations of
instructions in the pipeline.
• If some combination of instructions cannot be
accommodated because of resource conflicts,
the processor is said to have a structural
hazard.

Structural hazards
• The most common instances of structural hazards arise
when some functional unit is not fully pipelined. Then a
sequence of instructions using that unpipelined unit cannot
proceed at the rate of one per clock cycle.
• Another reason arises when some resource has not been
duplicated enough to allow all combinations of instructions
in the pipeline to execute.
• Ex: a processor might have only one register-file write port, but
under certain circumstances, the pipeline might want to
perform two writes in a clock cycle. This will generate a
structural hazard.
• When a sequence of instructions encounters this hazard,
the pipeline will stall one of the instructions until the
required unit is available. Such stalls will increase the CPI
from its usual ideal value of 1.
Structural hazards

• Some pipelined processors share a single memory
pipeline for data and instructions. As a
result, when an instruction contains a data
memory reference, it will conflict with the
instruction fetch of a later instruction.
• To resolve this hazard, stall the pipeline for one clock
cycle when a data memory access occurs. A stall is
commonly called a pipeline bubble, since it floats
through the pipeline taking space but carrying no
useful work.
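A rough timing sketch of this stall (assuming a single memory port; the load's MEM access and a later instruction's IF would otherwise need memory in the same clock cycle):
Load instruction  IF   ID   EX   MEM  WB
Instruction i+1        IF   ID   EX   MEM  WB
Instruction i+2             IF   ID   EX   MEM  WB
Instruction i+3                  stall IF   ID   EX   MEM  WB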

Structural hazards
• As an alternative to this structural hazard, the
designer could provide separate memory access
for instructions, either by splitting the cache into
separate instruction and data caches, or by using
a set of buffers, usually called instruction buffers,
to hold instructions.
• Why would a designer allow structural hazards? The
primary reason is to reduce the cost of the unit, since
pipelining all the functional units, or duplicating them,
may be too costly.

Data hazards
• Data hazards occur when the pipeline changes the
order of R/W accesses to operands so that the
order differs from the order seen by sequentially
executing instructions on an unpipelined
processor.
• Ex: DADD R1, R2, R3
DSUB R4, R1, R5
AND R6, R1, R7
OR R8, R1, R9
XOR R10, R1, R11
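A rough pipeline timing chart for this sequence (register writes occur in WB and reads in ID, as described above):
DADD R1, R2, R3   IF   ID   EX   MEM  WB
DSUB R4, R1, R5        IF   ID   EX   MEM  WB
AND R6, R1, R7              IF   ID   EX   MEM  WB
OR R8, R1, R9                    IF   ID   EX   MEM  WB
XOR R10, R1, R11                      IF   ID   EX   MEM  WB
DADD writes R1 in clock cycle 5; DSUB and AND read R1 in cycles 3 and 4, before the write, while XOR reads it in cycle 6, after the write. OR reads R1 in cycle 5 and works only because the register file writes in the first half of the cycle and reads in the second half.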

• All the instructions after DADD use the result of
the DADD instruction. The DADD instruction
writes the value of R1 in the WB pipe stage, but
the DSUB instruction reads the value during its ID
stage. This is called a data hazard.
• Unless precautions are taken to prevent it, the
DSUB instruction will read the wrong value.
• The AND instruction is also affected by this
hazard. It reads the registers during clock cycle 4
and will receive the wrong results.
• The XOR instruction operates properly because its
register read occurs in clock cycle 6.

Minimizing data Hazards
A technique called Forwarding (also called
bypassing and sometimes short-circuiting) can
be used.

Using forward paths to avoid data hazards

Forwarding works as follows:
1) The ALU result from both the EX/MEM and
MEM/WB pipeline registers is always fed back to
the ALU inputs.
2) If the forwarding hardware detects that the
previous ALU operation has written the register
corresponding to a source for the current ALU
operation, control logic selects the forwarded
result as the ALU input rather than the value
read from the register file.

General Data Forwarding
• It is easy to see how data forwarding can be
used by drawing out the pipelined execution
of each instruction.
• Now consider the following instructions:
• DADD R1, R2, R3
• LD R4, 0(R1)
• SD R4, 12(R1)
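A rough sketch of the forwarding this sequence needs (the exact paths depend on the implementation):
DADD R1, R2, R3   IF   ID   EX   MEM  WB
LD R4, 0(R1)           IF   ID   EX   MEM  WB
SD R4, 12(R1)               IF   ID   EX   MEM  WB
• R1 is forwarded from DADD's EX/MEM pipeline register to the EX stage of LD (for the address calculation), and from MEM/WB to the EX stage of SD.
• R4 is forwarded from LD's MEM/WB pipeline register to the MEM stage of SD, where the store writes it to memory.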

• Can data forwarding prevent all data hazards?
• NO!
• The following sequence will still cause a data
hazard. The load does not have its data until the end
of its MEM stage, which is too late for the EX stage of
the instruction that immediately follows it, so
forwarding cannot deliver the value in time.
• LD R1, 0(R2)
• DSUB R4, R1, R5
• AND R6, R1, R7
• OR R8, R1, R9
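A rough timing sketch with the needed stall inserted (one bubble; the pipeline interlock described next is what introduces it, after which forwarding can supply R1 to DSUB's EX stage):
LD R1, 0(R2)      IF   ID   EX   MEM  WB
DSUB R4, R1, R5        IF   ID   stall EX   MEM  WB
AND R6, R1, R7              IF   stall ID   EX   MEM  WB
OR R8, R1, R9                    stall IF   ID   EX   MEM  WB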
• We can avoid the hazard by using a pipeline
interlock.
• The pipeline interlock will detect when data
forwarding will not be able to get the data to
the next instruction in time.
• A stall is introduced until the instruction can
get the appropriate data from the previous
instruction.

Control Hazards
• Control hazards are caused by branches in the
code.
• During the IF stage remember that the PC is
incremented by 4 in preparation for the next IF
cycle of the next instruction.
• What happens if a branch is performed and
we aren't simply incrementing the PC by 4?
• The easiest way to deal with the occurrence of a
branch is to perform the IF stage again once the
branch occurs.
Performing IF Twice
• We take a big performance hit by performing
the instruction fetch again whenever a branch
occurs. Note that this happens whether the branch
is taken or not. This guarantees that the PC
will get the correct value.
Instruction i     IF   ID   EX   MEM  WB
Branch (i+1)           IF   ID   EX   MEM  WB
Instruction i+2             IF   IF   ID   EX   MEM  WB

Performing IF Twice
• This method will work, but as always in
computer architecture, we should try to make
the most common case fast and efficient.
• With MIPS64, branch instructions are quite
common.
• By performing IF twice we will encounter a
performance hit of 10% to 30%.

Control Hazards (other solutions)
• The following solutions assume static branch-handling
schemes, meaning that the actions taken for a branch
do not change during execution.
• We already saw the first approach: stall the
pipeline until the branch is resolved (in our
case we repeated the IF stage until the branch
resolved and modified the PC).
• The next two examples will always make an
assumption about the branch instruction.

Control Hazards (other solutions)
• What if we treat every branch as “not taken”?
Remember that not only do we read the registers
during ID, but we also perform an equality test to
decide whether or not we need to branch.
• We can improve performance by assuming that
the branch will not be taken.
• In this case we can simply fetch the next
instruction (PC+4) and continue. The complexity
arises when the branch evaluates as taken and we
end up needing to actually take the branch.

Control Hazards (other solutions)
• If the branch is actually taken we need to clear the
pipeline of any code loaded in from the “not-
taken” path.
• Likewise we can assume that the branch is always
taken. Does this work in our “5-stage” pipeline?
• No - the branch target is not computed until the ID
cycle, the same cycle in which the branch outcome is
determined, so assuming “taken” gains us nothing here.
Some processors do compute the target address in time
for the IF stage of the next instruction, so there is
no delay.

Control Hazards (other solutions)
• The “branch not taken” scheme is the same as performing
the IF stage a second time in our 5-stage pipeline if the
branch is taken.
• If the branch is not taken, there is no performance
degradation.
• The “branch taken” scheme is of no benefit in our case
because the branch target address is not computed until
the ID stage.
• The fourth method for dealing with a control hazard is to
implement a “delayed” branch scheme.
• In this scheme an instruction is inserted into the pipeline
that is useful and not dependent on whether the branch is
taken or not. It is the job of the compiler to determine the
delayed branch instruction.
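A minimal sketch of how a compiler might fill a one-instruction branch-delay slot (the registers and label are illustrative, not from the original slides):
Before scheduling:
  DADD R1, R2, R3
  BEQZ R4, target
After scheduling (DADD moved into the delay slot):
  BEQZ R4, target
  DADD R1, R2, R3    ; executes whether or not the branch is taken
The DADD can be moved because the branch condition depends only on R4, not on the DADD's result.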
