Until now, we have assumed that there is no overlap in the execution of the basic steps of
successive instructions. A substantial improvement in performance can be achieved by
overlapping the execution of these basic steps, a technique called pipelining.
Example: Suppose that we want to perform the combined multiply and add operations
with a stream of numbers:
Ai * Bi + Ci for i = 1, 2, 3, ..., 7
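Each suboperation is implemented in a segment within the pipeline. The suboperations
performed in each segment of the pipeline are as follows:
R1 ← Ai, R2 ← Bi           Input Ai and Bi
R3 ← R1 * R2, R4 ← Ci      Multiply and input Ci
R5 ← R3 + R4               Add Ci to product
[Figure: pipeline block diagram. Ai and Bi are loaded into R1 and R2, which feed the
multiplier; the product goes to R3 while Ci goes to R4; R3 and R4 feed the adder, whose
output is stored in R5.]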
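The behaviour of this pipeline can be sketched in a few lines of Python. This is only an illustration: it assumes that segment 1 loads Ai and Bi into R1 and R2, that segment 2 forms the product in R3 while loading Ci into R4, and that segment 3 adds R3 and R4 into R5. The function name `multiply_add_pipeline` is hypothetical.

```python
def multiply_add_pipeline(a, b, c):
    """Simulate a three-segment pipeline computing a[i] * b[i] + c[i]."""
    n = len(a)
    r1 = r2 = r3 = r4 = r5 = None  # segment registers, empty at start
    results = []
    # n tasks need n + 2 clock pulses to pass through 3 segments.
    for clock in range(n + 2):
        # All registers load on the same clock pulse, so the new values
        # are computed from the old ones before anything is overwritten.
        new_r5 = r3 + r4 if r3 is not None else None        # segment 3
        new_r3 = r1 * r2 if r1 is not None else None        # segment 2
        new_r4 = c[clock - 1] if 1 <= clock <= n else None  # segment 2
        new_r1 = a[clock] if clock < n else None            # segment 1
        new_r2 = b[clock] if clock < n else None            # segment 1
        r1, r2, r3, r4, r5 = new_r1, new_r2, new_r3, new_r4, new_r5
        if r5 is not None:
            results.append(r5)
    return results
```

After the first two clock pulses fill the pipe, one result appears in R5 on every pulse.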
1. Suppose a k-segment pipeline with a clock cycle time of tp is used to execute n
tasks.
2. The first task T1 requires a time equal to k tp to complete its operation, since it
must pass through all k segments of the pipe.
3. The remaining n - 1 tasks emerge from the pipe at the rate of one task per clock
cycle, so they are completed after a further time equal to (n - 1) tp.
4. Therefore, completing n tasks using a k-segment pipeline requires k + (n - 1)
clock cycles.
SPEEDUP
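The speedup follows from the cycle count derived above. As a sketch, assume that a non-pipelined unit needs a time tn per task (the same symbol tn is used for the non-pipelined time in the numerical example later in these notes), so that n tasks take n tn without a pipeline and (k + n - 1) tp with one:

```latex
S = \frac{n \, t_n}{(k + n - 1) \, t_p}
\qquad\text{and, as } n \to \infty,\qquad
S \to \frac{t_n}{t_p}
```

If one further assumes that the non-pipelined time equals the time for one task to pass through all k segments, tn = k tp, the limiting speedup is k.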
For example:
When a finite number of tasks is executed in a pipeline, the space-time diagram
clearly shows the pipeline start-up region, where not all stages are yet in use, and
the pipeline drainage region, composed of stages that have become idle because the last
task has left them. If a pipeline executes many tasks, the overhead of start-up and
drainage can be ignored and the effective throughput of the pipeline taken to be 1 task per
cycle.
In terms of instruction execution, this corresponds to an effective CPI (cycles per
instruction) of 1. If the 5-stage pipeline has to be drained after executing 7 instructions
(as in Figure 2.b), then 7 instructions are executed in 11 cycles, making the effective CPI
equal to 11/7 = 1.57.
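The arithmetic above can be checked with a short Python sketch. The function names `pipeline_cycles` and `effective_cpi` are hypothetical and simply apply the k + (n - 1) cycle count.

```python
def pipeline_cycles(k, n):
    """Clock cycles needed to push n tasks through a k-stage pipeline."""
    return k + (n - 1)


def effective_cpi(k, n):
    """Cycles per instruction when the pipeline is drained after n instructions."""
    return pipeline_cycles(k, n) / n


cycles = pipeline_cycles(5, 7)   # 5 stages, 7 instructions
cpi = effective_cpi(5, 7)
```

As n grows, `effective_cpi(k, n)` approaches 1, which is the ideal figure quoted above.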
For example:
There are 4 segments with delays t1 = 60 ns, t2 = 70 ns, t3 = 100 ns and t4 = 80 ns, and the
interface registers have a delay of tr = 10 ns.
Using pipeline:
The clock cycle is chosen to be tp = t3 + tr = 100 + 10 = 110 ns, since it is set by the slowest segment.
Non-pipeline:
tn = t1 + t2 + t3 + t4 + tr = 60 + 70 + 100 + 80 + 10 = 320 ns
The pipeline adder has a speedup of 320/110 = 2.9.
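The same calculation can be written as a small Python sketch. The function name `pipeline_speedup` is hypothetical; it assumes, as in the example, that the clock period is the slowest segment delay plus the register delay.

```python
def pipeline_speedup(segment_delays_ns, register_delay_ns):
    """Return (tp, tn, speedup) for the given segment and register delays."""
    # The clock must accommodate the slowest segment plus its register.
    t_p = max(segment_delays_ns) + register_delay_ns
    # Without a pipeline, the data passes through every segment once.
    t_n = sum(segment_delays_ns) + register_delay_ns
    return t_p, t_n, t_n / t_p


t_p, t_n, speedup = pipeline_speedup([60, 70, 100, 80], 10)
```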
Pipeline conflicts:
There are 3 major difficulties that cause the instruction pipeline to deviate from its
normal operation.
1. Resource conflict
2. Data dependency
3. Branch difficulties
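Resource conflict:
Resource conflicts are caused by access to memory by two segments at the same time.
Data dependency:
A data dependency conflict arises when an instruction depends on the result of a previous
instruction, but this result is not yet available.
Branch difficulties:
Branch difficulties arise from branch and other instructions that change the value
of PC.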
Example:
Step:             1   2   3   4   5   6   7   8   9   10  11
Instr 1           FI  DA  FO  EX
Instr 2               FI  DA  FO  EX
Branch Instr 3            FI  DA  FO  EX
Instr 4                           -   -   FI  DA  FO  EX
Instr 5                           -   -   -   FI  DA  FO  EX
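The delay caused by the branch in this example can be sketched as follows. The sketch rests on two assumptions that are not stated explicitly in the text: instruction i would normally be fetched at step i, and fetching of useful instructions resumes only in the step after the branch has left the EX segment. The function name `fetch_steps` is hypothetical.

```python
def fetch_steps(num_instructions, branch_index, k=4):
    """Return the step at which each instruction's useful fetch (FI) begins."""
    steps = []
    for i in range(1, num_instructions + 1):
        if i <= branch_index:
            # Up to and including the branch, one instruction enters per step.
            steps.append(i)
        else:
            # The branch occupies steps branch_index .. branch_index + k - 1,
            # so the next useful fetch starts at step branch_index + k.
            steps.append(branch_index + k + (i - branch_index - 1))
    return steps


steps = fetch_steps(5, branch_index=3)
```

With a branch as instruction 3 in a 4-segment pipeline, instruction 4 is fetched at step 7 and instruction 5 at step 8, so instruction 5 completes at step 11.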
Hazard:
In a pipelined system, instruction executions are overlapped. This means that one of the
operations required by instructions Ik+1 or Ik+2 may be started and completed before
instruction Ik has completed. This difference from sequential execution can cause problems
if it is not properly considered in the design of the control.
The existence of such dependencies causes what is called a "hazard". These hazards must be
detected and resolved so that the machine produces accurate results that match the
programmer's expectations.
The hardware technique that detects and resolves hazards is called an interlock.
A hazard occurs whenever an object within the system (i.e. registers, flags or memory
locations) is accessed or modified by two separate instructions that are close enough in the
program such that they will be active simultaneously in the pipeline. In computer
architecture, a hazard is a potential problem that can happen in a pipelined processor.
Types of hazards:
1. Instruction hazard
2. Data hazard
3. Branching hazards
4. Structural hazards
1. INSTRUCTION HAZARDS:
An instruction RAW hazard occurs when the instruction to be initiated in the pipeline is
fetched from a location that is yet to be updated by some uncompleted instruction in the
pipeline. The instruction initiation must be suspended till the required change has occurred.
To handle this hazard, a centralized controller is required to keep track of the addresses in
the range sets of the instructions inside the pipeline. On every instruction fetch, the PC must
then be compared for a possible match with the addresses in the address range sets used by the
subsequent stages. A match with any of these addresses means that there is an instruction RAW
hazard, and the instruction fetch must be suspended until the instruction updating the object
moves out from the point of hazard in the pipeline.
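The comparison performed by the controller can be sketched in Python. This is a minimal illustration, assuming that each in-flight instruction's range set is available as a set of addresses; the function name `fetch_must_stall` and the example addresses are hypothetical.

```python
def fetch_must_stall(pc, range_sets):
    """Return True if the fetch address matches any in-flight range set.

    range_sets holds one set of addresses per instruction still in the
    pipeline: the locations that instruction will update.
    """
    return any(pc in addresses for addresses in range_sets)


# Two instructions in the pipeline; the second will update address 0x104.
in_flight = [{0x200}, {0x104}]
stall_needed = fetch_must_stall(0x104, in_flight)
no_stall = fetch_must_stall(0x108, in_flight)
```

Once the updating instruction leaves the pipeline, its range set is removed from the list and the suspended fetch can proceed.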
Types of hazards:
Instruction no.   Code
1.                STORE R4,A
2.                SUB R3,A
3.                STORE R3,A
4.                ADD R3,A
5.                STORE R3,A
RAW hazard:
In the above code, the second instruction must use the value of A updated by the first
instruction. If the second instruction (SUB) reads the value of A before instruction 1 has had a
chance to update it, a wrong value will be used by the CPU. This situation is called a
RAW (read-after-write) hazard.
WAR hazard:
A WAR (write-after-read) hazard between two instructions i and j occurs when instruction j
attempts to write onto some object that is being read by instruction i.
A WAR hazard exists between instructions 2 and 3, since an attempt by instruction 3 to
record a value in A before instruction 2 has read the old value is clearly wrong.
WAW hazard:
A WAW (write-after-write) hazard between two instructions i and j occurs when instruction j
attempts to write onto some object that is also required to be modified by
instruction i.
Similarly, a WAW hazard occurs between instructions 3 and 5, since an attempt by instruction
5 to store before the store of instruction 3 is clearly incorrect.
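The three definitions can be checked against the five-instruction listing with a short Python sketch. It assumes semantics that the listing does not spell out: STORE Rx,A reads Rx and writes A, while SUB and ADD R3,A read R3 and A and write R3. The names `PROGRAM` and `classify` are hypothetical.

```python
# For each instruction: (objects read, objects written). Semantics assumed.
PROGRAM = {
    1: ({"R4"}, {"A"}),        # STORE R4,A
    2: ({"R3", "A"}, {"R3"}),  # SUB R3,A
    3: ({"R3"}, {"A"}),        # STORE R3,A
    4: ({"R3", "A"}, {"R3"}),  # ADD R3,A
    5: ({"R3"}, {"A"}),        # STORE R3,A
}


def classify(i, j, program=PROGRAM):
    """Return the hazard kinds between earlier instruction i and later j."""
    reads_i, writes_i = program[i]
    reads_j, writes_j = program[j]
    kinds = set()
    if writes_i & reads_j:
        kinds.add("RAW")  # j reads what i writes
    if reads_i & writes_j:
        kinds.add("WAR")  # j writes what i reads
    if writes_i & writes_j:
        kinds.add("WAW")  # j writes what i also writes
    return kinds
```

Under these assumptions, `classify(1, 2)` reports the RAW hazard on A, `classify(2, 3)` includes the WAR hazard on A, and `classify(3, 5)` reports the WAW hazard on A.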
2. DATA HAZARDS:
Data hazards occur when data is modified. Ignoring potential data hazards can result in race
conditions. There are 3 situations in which a data hazard can occur: read after write (RAW),
write after read (WAR) and write after write (WAW).
Example:
Instr 1: R3 ← R1 + R2
Instr 2: R5 ← R3 + R2
The first instruction calculates a value to be saved in register R3, and the second uses
this value to compute a result for register R5.
Example:
R1←R2+R3
R3←R4+R5
We must ensure that the second instruction does not store its result in R3 before the
first instruction has had a chance to fetch its operands.
For example:
We must delay the WB (write back) of Instr 2 until Instr 1 has executed.
Eliminating hazards:
As an instruction is fetched, the control logic determines whether a hazard could or will occur. If
so, the control logic inserts NOPs into the pipeline. Thus, before the next instruction
is executed, the previous one has had sufficient time to complete, which prevents the hazard.
Bubble insertion
The first solution is for the assembler to detect this type of data dependency and insert
three redundant but harmless instructions (for example, adding 0 to a register or shifting a
register by 0 bits). They perform no useful work and just take up memory locations to space
out the data-dependent instructions.
We say that the assembler inserts three bubbles in the pipeline to resolve a
read-after-compute data dependency. Actually, inserting two bubbles might suffice if we note that
writing into and reading out of a register each take 1 ns.
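Bubble insertion by the assembler can be sketched as follows. This is an illustration only: each instruction is modelled as a hypothetical tuple (name, destination register, source registers), and the sketch pads with NOPs until every instruction sits at least `bubbles` slots behind the instruction that produces one of its operands.

```python
NOP = ("nop", None, ())


def insert_bubbles(program, bubbles=3):
    """Return a copy of program with NOPs spacing out dependent instructions."""
    out = []
    last_write = {}  # register -> index in `out` of its most recent writer
    for name, dest, sources in program:
        # Earliest slot at which this instruction may be placed so that
        # `bubbles` slots separate it from each producer of its operands.
        earliest = max(
            (last_write[s] + bubbles + 1 for s in sources if s in last_write),
            default=0,
        )
        while len(out) < earliest:
            out.append(NOP)
        if dest is not None:
            last_write[dest] = len(out)
        out.append((name, dest, sources))
    return out


# Hypothetical two-instruction program: the second reads $8, written by the first.
program = [("add", "$8", ("$1", "$2")), ("sub", "$9", ("$8", "$2"))]
padded = insert_bubbles(program)
```

Calling `insert_bubbles(program, bubbles=2)` models the observation that two bubbles might suffice.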
The second instruction writes into register $8, and the fourth instruction needs the result
of the third instruction in register $9. Note that writing into register $8 is completed in
cycle 6; hence, reading of the new value from register $8 is possible beginning with cycle 7.
The third instruction, however, reads out registers $8 and $2 in cycle 4 and will thus not get
the intended value for register $8. This data dependency problem can be solved by bubble
insertion or via data forwarding.
Note that in Figure 3, even though the result of the second instruction is not yet stored in
register $8 by the time the third instruction needs it, the result is in fact available at the
output of the ALU. Thus, if a bypass path is provided from the output of the ALU to one
of its inputs, the needed value can be used without waiting for it to be written into the
register. This approach is known as data forwarding.
Forwarding involves feeding output data into a previous stage of the pipeline. For instance, let's
say we want to write the value 3 to register 1, and then add 7 to register 1 and store the result
in register 2, i.e.
Instr 1: Register 1 ← 3
Instr 2: Register 2 ← Register 1 + 7
Following execution, register 2 should contain the value 10. However, if Instr 1 does
not completely exit the pipeline before Instr 2 starts execution, register 1
does not yet contain the value 3 when Instr 2 performs its addition. In such an event, Instr 2 adds 7
to the old value of register 1 (6 in this example), and so register 2 would contain 13.
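The effect of forwarding on this two-instruction example can be sketched in Python. The sketch assumes that register 1 previously held 6, which is the value implied by the wrong result 13; the function name `run_two_instructions` is hypothetical.

```python
def run_two_instructions(forwarding, old_r1=6):
    """Model Instr 1 (write 3 to register 1) overlapping Instr 2 (add 7)."""
    registers = {1: old_r1, 2: 0}
    # Instr 1 has produced the value 3 for register 1, but its write-back
    # has not happened yet when Instr 2 needs the operand.
    pending_dest, pending_value = 1, 3
    if forwarding and pending_dest == 1:
        operand = pending_value   # bypass: value taken before it is written back
    else:
        operand = registers[1]    # stale value still held in register 1
    result = operand + 7          # Instr 2 executes
    registers[pending_dest] = pending_value  # Instr 1 writes back
    registers[2] = result                    # Instr 2 writes back
    return registers


with_forwarding = run_two_instructions(forwarding=True)
without_forwarding = run_two_instructions(forwarding=False)
```

With forwarding, register 2 ends up holding 10; without it, Instr 2 uses the stale operand and register 2 holds 13.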
Operand hazard:
Logic to detect hazards in operand fetches can be worked out as follows. From the
definition of range and domain sets, we can devise a mechanism to keep track of the domain
and range sets of the various instructions passing through the pipeline stages. Each stage is
associated with one set of storage registers for the domain set addresses and another for the
range set addresses. This storage is required for all the stages beyond the decode stage. When an
instruction moves from stage to stage, it carries its range and domain set information with it.
Let k be the stage beyond which hazards can occur. For simplicity, assume that there is only
one element in each domain set and each range set. Let Dk and Rk be the domain and range
address registers associated with stage k.
If a stage needs to detect any of the 3 hazards, it must compare its own R and D registers with
those of the other stages.
The comparators are connected as follows:
1. Stage j detects a RAW hazard by comparing Dj and Ri for all i > j > k
2. Stage j detects a WAR hazard by comparing Rj and Di for all i > j > k
3. Stage j detects a WAW hazard by comparing Rj and Ri for all i > j > k
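The three comparator rules can be sketched in Python. This is an illustration under the simplifying assumption made above (one address in each domain and range register), and it takes the domain address as the object an instruction reads and the range address as the object it writes. The function name `hazards_at_stage` and the example stage contents are hypothetical.

```python
def hazards_at_stage(j, k, stages):
    """Return (kind, i) pairs for hazards that stage j detects.

    stages maps a stage number to the (D, R) address pair of the
    instruction currently in that stage; None means no address.
    """
    if j <= k:
        return []  # hazards can only occur beyond stage k
    d_j, r_j = stages[j]
    found = []
    for i in sorted(stages):
        if i <= j:
            continue  # only compare against stages with i > j
        d_i, r_i = stages[i]
        if d_j is not None and d_j == r_i:
            found.append(("RAW", i))  # rule 1: Dj against Ri
        if r_j is not None and r_j == d_i:
            found.append(("WAR", i))  # rule 2: Rj against Di
        if r_j is not None and r_j == r_i:
            found.append(("WAW", i))  # rule 3: Rj against Ri
    return found


stages = {3: ("A", "R3"), 4: ("R4", "A"), 5: ("R1", "R3")}
detected = hazards_at_stage(3, 2, stages)
```

In hardware, each comparison in the loop corresponds to one comparator wired between the registers of stage j and those of a later stage i.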