Lec11 Pipeline Introduction

CSCE430/830 Computer Architecture
Pipeline: Introduction
Lecturer: Prof. Hong Jiang
Courtesy of Prof. Yifeng Zhu, U of Maine Fall, 2006
CSCE430/830
Portions of these slides are derived from: Dave Patterson UCB
Pipeline
Pipelining Outline
Introduction
Defining Pipelining Pipelining Instructions
Hazards
Structural hazards Data Hazards Control Hazards
Performance Controller implementation
CSCE430/830
Pipeline
What is Pipelining?
A way of speeding up execution of instructions
Key idea:
overlap execution of multiple instructions
CSCE430/830
Pipeline
The Laundry Analogy

Ann, Brian, Cathy, Dave each have one load of clothes to wash, dry, and fold Washer takes 30 minutes Dryer takes 30 minutes Folder takes 30 minutes Stasher takes 30 minutes to put clothes into drawers
CSCE430/830 Pipeline
If we do laundry sequentially...
6 PM
T a s k O r d e r
10
11
12
2 AM
A B C D
30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 Time
Time Required: 8 hours for 4 loads
CSCE430/830
Pipeline
To Pipeline, We Overlap Tasks

6 PM
T a s k
10
11 Time
12
2 AM
30 30 30 30 30 30 30 A
O r d e r
B
C D
Time Required: 3.5 Hours for 4 Loads
CSCE430/830
Pipeline
To Pipeline, We Overlap Tasks

6 PM
T a s k
10
11 Time
12
2 AM
30 30 30 30 30 30 30 A
O r d e r
B
C D
CSCE430/830
Pipelining doesnt help latency of single task, it helps throughput of entire workload Pipeline rate limited by slowest pipeline stage Multiple tasks operating simultaneously Potential speedup = Number pipe stages Unbalanced lengths of pipe stages reduces speedup Time to fill pipeline and time to drain it reduces speedup Pipeline
Pipelining a Digital System

1 nanosecond = 10^-9 second 1 picosecond = 10^-12 second
Key idea: break big computation up into pieces
Separate each piece with a pipeline register
1ns
200ps
Pipeline Register
CSCE430/830
200ps
200ps
200ps
200ps
Pipeline
Pipelining a Digital System

Why do this? Because it's faster for repeated computations
Non-pipelined: 1 operation finishes every 1ns 1ns
Pipelined: 1 operation finishes every 200ps 200ps 200ps 200ps 200ps 200ps
CSCE430/830
Pipeline
Comments about pipelining

Pipelining increases throughput, but not latency
Answer available every 200ps, BUT A single computation still takes 1ns
Limitations:
Computations must be divisible into stage size Pipeline registers add overhead
CSCE430/830
Pipeline
Pipelining a Processor
Recall the 5 steps in instruction execution:
1. 2. 3. 4. 5. Instruction Fetch (IF) Instruction Decode and Register Read (ID) Execution operation or calculate address (EX) Memory access (MEM) Write result into register (WB)
Review: Single-Cycle Processor

All 5 steps done in a single clock cycle Dedicated hardware required for each step
CSCE430/830
Pipeline
Review - Single-Cycle Processor
CSCE430/830
What do we need to add to actually split the datapath into stages?
Pipeline
The Basic Pipeline For MIPS

Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7
ALU
I n s t r.
O r d e r
Ifetch
Reg
DMem
Reg
ALU
Ifetch
Reg
DMem
Reg
ALU
Ifetch
Reg
DMem
Reg
ALU
Ifetch
Reg
DMem
Reg
What do we need to add to actually split the datapath into stages?

Basic Pipelined Processor
CSCE430/830
Pipeline
Pipeline example: lw IF
CSCE430/830
Pipeline
Pipeline example: lw ID
CSCE430/830
Pipeline
Pipeline example: lw EX
CSCE430/830
Pipeline
Pipeline example: lw MEM
CSCE430/830
Pipeline
Pipeline example: lw WB
Can you find a problem?

Basic Pipelined Processor (Corrected)
CSCE430/830
Pipeline
Single-Cycle vs. Pipelined Execution
Non-Pipelined
Instruction Order
lw $1, 100($0)
200
400
ALU
600
MEM
800
REG WR
1000
1200
1400
1600
1800
Time
Instruction REG Fetch RD
lw $2, 200($0)
800ps
lw $3, 300($0)
Instruction REG Fetch RD
ALU
MEM
REG WR Instruction Fetch
800ps
800ps
Pipelined
Instruction Order
lw $1, 100($0)
200
400
ALU
600
MEM ALU
800
REG WR MEM
1000
1200
1400
1600
Time
Instruction Fetch
lw $2, 200($0)
200ps
lw $3, 300($0)
REG RD Instruction Fetch
200ps
REG RD Instruction Fetch
REG WR
REG RD
ALU
MEM
REG WR
200ps 200ps 200ps 200ps 200ps

Speedup
Consider the unpipelined processor introduced previously. Assume that it has a 1 ns clock cycle and it uses 4 cycles for ALU operations and branches, and 5 cycles for memory operations, assume that the relative frequencies of these operations are 40%, 20%, and 40%, respectively. Suppose that due to clock skew and setup, pipelining the processor adds 0.2ns of overhead to the clock. Ignoring any latency impact, how much speedup in the instruction execution rate will we gain from a pipeline?
Average instruction execution time = 1 ns * ((40% + 20%)*4 + 40%*5) = 4.4ns

Speedup from pipeline = Average instruction time unpiplined/Average instruction time pipelined = 4.4ns/1.2ns = 3.7
CSCE430/830
Pipeline
Comments about Pipelining

The good news
Multiple instructions are being processed at same time This works because stages are isolated by registers Best case speedup of N
The bad news

Instructions interfere with each other - hazards Example: different instructions may need the same piece of hardware (e.g., memory) in same clock cycle Example: instruction may require a result produced by an earlier instruction that is not yet complete
CSCE430/830
Pipeline
Pipeline Hazards
Limits to pipelining: Hazards prevent next instruction from executing during its designated clock cycle
Structural hazards: two different instructions use same h/w in same cycle Data hazards: Instruction depends on result of prior instruction still in the pipeline Control hazards: Pipelining of branches & other instructions that change the PC
CSCE430/830
Pipeline
Summary - Pipelining Overview

Pipelining increase throughput (but not latency) Hazards limit performance
Structural hazards Control hazards Data hazards
CSCE430/830
Pipeline
Pipelining Outline
Introduction
Defining Pipelining Pipelining Instructions
Hazards
Structural hazards Data Hazards Control Hazards \
Performance Controller implementation
CSCE430/830
Pipeline

Lec11 Pipeline Introduction

Caricato da

Informazioni sul documento

Copyright

Formati disponibili

Condividi questo documento

Condividi o incorpora il documento

Opzioni di condivisione

Hai trovato utile questo documento?

Questo contenuto è inappropriato?

Copyright:

Formati disponibili

Lec11 Pipeline Introduction

Caricato da

Copyright:

Formati disponibili

CSCE430/830 Computer Architecture

Portions of these slides are derived from: Dave Patterson UCB

Performance Controller implementation

The Laundry Analogy

Time Required: 8 hours for 4 loads

To Pipeline, We Overlap Tasks

To Pipeline, We Overlap Tasks

Pipelining a Digital System

Key idea: break big computation up into pieces

Separate each piece with a pipeline register

Pipelining a Digital System

Comments about pipelining

Review: Single-Cycle Processor

Review - Single-Cycle Processor

What do we need to add to actually split the datapath into stages?

The Basic Pipeline For MIPS

What do we need to add to actually split the datapath into stages?

Basic Pipelined Processor

Pipeline example: lw MEM

Can you find a problem?

Basic Pipelined Processor (Corrected)

Single-Cycle vs. Pipelined Execution

Instruction REG Fetch RD

Instruction REG Fetch RD

REG WR Instruction Fetch

REG RD Instruction Fetch

REG RD Instruction Fetch

200ps 200ps 200ps 200ps 200ps

Average instruction execution time = 1 ns * ((40% + 20%)*4 + 40%*5) = 4.4ns

Comments about Pipelining

The bad news

Summary - Pipelining Overview

Performance Controller implementation

Potrebbero piacerti anche

Average instruction execution time = 1 ns * ((40% + 20%)4 + 40%5) = 4.4ns