
PIPELINE DESIGN, DATA PATH AND CONTROL SYNTHESIS

© Giovanni De Micheli, Stanford University

Outline

Synthesis of pipelined circuits:
  - Scheduling.
  - Binding.
Data-path synthesis.
Control-unit synthesis.

High-level synthesis of pipelined circuits

Pipeline circuits:
  - Concurrent execution of operations on different data sets.
  - Increase throughput:
      - I/O data rate.
  - Preserve latency.
Applicable to:
  - General purpose processors.
  - Digital signal processors.

Example

[Figure: scheduled sequencing graph split into two pipe stages. STAGE 1 covers TIME 1 (operations 1, 2, 6, 8: *; operation 10: +) and TIME 2 (operations 3, 7: *; operation 9: +; operation 11: <). STAGE 2 covers its own TIME 1 and TIME 2.]

Synthesis of pipelined circuits

DSP applications:
  - Mainly data-path pipelining.
  - Few exceptions/interrupts.
  - Mature area.
Microprocessors:
  - Advanced features:
      - Stalls, flush, bypass, hazard avoidance.
  - Synthesis tools not ready yet.

Issues in synthesis of pipelined circuits

Partitioning:
  - Pipe-stage formation.
Scheduling:
  - Source vertex of the sequencing graph fired at constant rate.
Sharing:
  - More concurrency.
  - Binding and scheduling are affected.

Scheduling of pipelined circuits

Scheduling of non-pipelined circuit using pipelined resources.
Scheduling of pipelined circuit using non-pipelined resources:
  - Functional pipelining.
Both problems can be modeled by ILP.

Scheduling for functional pipelining

Choose:
  - cycle-time.
  - data-introduction interval δ0.
Determine (area, latency) spectrum.
Key fact:
  - Simultaneous operations at steps l + p δ0.
  - Reduced sharing.
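
The key fact above (operations at steps l + p δ0 run simultaneously) can be illustrated with a minimal Python sketch. It assumes unit-delay operations and 1-based schedule steps; the names `folded_usage` and `resources_needed` are hypothetical, and the operation types of operations 4 and 5 are an assumption (the example figure only shows types for the first stage).

```python
from collections import Counter

# Schedule from the example: operation -> start step (1-based).
START = {1: 1, 2: 1, 6: 1, 8: 1, 10: 1, 3: 2, 7: 2, 9: 2, 11: 2, 4: 3, 5: 4}
# Resource types; 4 and 5 mapped to the ALU is an assumption.
TYPE = {1: "mult", 2: "mult", 3: "mult", 6: "mult", 7: "mult", 8: "mult",
        4: "alu", 5: "alu", 9: "alu", 10: "alu", 11: "alu"}


def folded_usage(start, op_type, delta0):
    """Count operations per (type, slot), where slot = (step - 1) mod delta0.

    Steps l, l + delta0, l + 2*delta0, ... share a slot because they execute
    at the same time on different data sets.
    """
    usage = Counter()
    for op, step in start.items():
        usage[(op_type[op], (step - 1) % delta0)] += 1
    return usage


def resources_needed(start, op_type, delta0):
    """Resources of each type needed: the peak folded usage over all slots."""
    need = Counter()
    for (rtype, _slot), count in folded_usage(start, op_type, delta0).items():
        need[rtype] = max(need[rtype], count)
    return dict(need)


no_overlap = resources_needed(START, TYPE, 4)   # delta0 equal to the latency
pipelined = resources_needed(START, TYPE, 2)    # two data sets in flight
```

Under these assumptions the ALU count grows when δ0 drops from 4 to 2, which is one way to see the "reduced sharing" bullet.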

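Scheduling for functional pipelining: ILP model

$$\sum_{p=0}^{\lceil \bar{\lambda}/\delta_0 \rceil - 1} \; \sum_{i:\,T(v_i)=k} \; \sum_{m=l-d_i+1+p\delta_0}^{l+p\delta_0} x_{im} \;\le\; a_k \qquad \forall k,\ \forall l$$

Here x_{im} = 1 when operation v_i starts at step m, d_i is its delay, T(v_i) its resource type, a_k the number of resources of type k, and \bar{\lambda} the latency bound.

Used in conjunction with other constraints.
Use regular ILP solvers.

Scheduling for functional pipelining: heuristic algorithms

List scheduling:
  - Compute resource usage at each step.
  - Determine candidates.
Force-directed scheduling:
  - Operation-type distribution:
      - Account for overlapping.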
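
As a rough illustration of the list-scheduling bullets, the sketch below computes resource usage per step modulo δ0 and then picks candidates. It is a simplified sketch, not the algorithm from the course: it assumes unit-delay operations, an acyclic graph, and a plain "lowest operation number first" priority. The dependency table `PREDS` and the resource counts in `AVAIL` are assumptions made for the example.

```python
def pipelined_list_schedule(preds, op_type, avail, delta0):
    """Unit-delay list scheduling with usage counted modulo delta0."""
    # Guard: each type offers avail * delta0 (resource, slot) places in total.
    totals = {}
    for rtype in op_type.values():
        totals[rtype] = totals.get(rtype, 0) + 1
    for rtype, count in totals.items():
        if count > avail[rtype] * delta0:
            raise ValueError(f"{count} {rtype} operations do not fit")

    usage = {}   # (type, slot) -> operations already placed in that slot
    start = {}   # operation -> start step (1-based)
    step = 1
    while len(start) < len(op_type):
        slot = (step - 1) % delta0
        # Candidates: unscheduled operations whose predecessors started earlier.
        ready = [v for v in sorted(op_type)
                 if v not in start
                 and all(p in start and start[p] < step
                         for p in preds.get(v, ()))]
        for v in ready:
            key = (op_type[v], slot)
            if usage.get(key, 0) < avail[op_type[v]]:
                usage[key] = usage.get(key, 0) + 1
                start[v] = step
        step += 1
    return start


# Assumed dependencies and types for the 11-operation example.
PREDS = {3: [1, 2], 4: [3], 5: [4, 7], 7: [6], 9: [8], 11: [10]}
TYPE = {1: "mult", 2: "mult", 3: "mult", 6: "mult", 7: "mult", 8: "mult",
        4: "alu", 5: "alu", 9: "alu", 10: "alu", 11: "alu"}
AVAIL = {"mult": 3, "alu": 3}

schedule = pipelined_list_schedule(PREDS, TYPE, AVAIL, delta0=2)
```

With three multipliers, operation 8 is deferred to step 2 because the slot shared by steps 1 and 3 is already full.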

Example

[Figure: the two-stage pipelined sequencing graph (STAGE 1 and STAGE 2, each with TIME 1 and TIME 2), shown with the distribution graphs for multiplier and ALU.]

Loop folding

Reduce execution delay of a loop.
Pipeline operations inside a loop:
  - Overlap execution of operations.
  - Need a prologue and epilogue.
Use pipeline scheduling for loop graph model.
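
To make the prologue and epilogue of loop folding concrete, the following sketch lists which (iteration, body step) pairs are active at each time step when a new iteration starts every δ0 steps. The function name and the numbers (a 4-step body, δ0 = 2, 3 iterations) are illustrative assumptions, not values from the slides.

```python
def folded_loop_timeline(body_latency, delta0, iterations):
    """Return, per time step, the (iteration, body_step) pairs that execute."""
    timeline = {}
    for it in range(iterations):
        for body_step in range(body_latency):
            # Iteration `it` is launched delta0 steps after the previous one.
            timeline.setdefault(it * delta0 + body_step, []).append((it, body_step))
    return [timeline[t] for t in sorted(timeline)]


timeline = folded_loop_timeline(body_latency=4, delta0=2, iterations=3)
folded_steps = len(timeline)       # (iterations - 1) * delta0 + body_latency
unfolded_steps = 3 * 4             # iterations executed back to back
```

In this run the first two steps hold only iteration 0 (prologue), the last two hold only iteration 2 (epilogue), and the steps in between overlap two iterations.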

Example

[Figure: loop folding example, panels (a) and (b): sequencing graphs delimited by NOP vertices and scheduled over TIME 1 to TIME 4.]

Resource sharing for pipelined circuits

Scheduled graphs:
  - Determine compatibility (or conflict) graphs.
The lower the δ0 (the higher the throughput):
  - The lower the compatibility.
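
The relation between δ0 and compatibility can be checked with a small sketch. It assumes unit-delay operations, treats two operations as compatible when they have the same type and never execute at the same time (start steps differ modulo δ0), and reuses the example schedule with assumed types for operations 4 and 5. `compatible_pairs` is a hypothetical helper name.

```python
from itertools import combinations

START = {1: 1, 2: 1, 6: 1, 8: 1, 10: 1, 3: 2, 7: 2, 9: 2, 11: 2, 4: 3, 5: 4}
TYPE = {1: "mult", 2: "mult", 3: "mult", 6: "mult", 7: "mult", 8: "mult",
        4: "alu", 5: "alu", 9: "alu", 10: "alu", 11: "alu"}


def compatible_pairs(start, op_type, delta0):
    """Edges of the compatibility graph for a pipelined schedule."""
    pairs = set()
    for u, v in combinations(sorted(start), 2):
        same_type = op_type[u] == op_type[v]
        # Steps that differ by a multiple of delta0 overlap in the pipeline.
        concurrent = (start[u] - start[v]) % delta0 == 0
        if same_type and not concurrent:
            pairs.add((u, v))
    return pairs


edges_by_delta0 = {d: len(compatible_pairs(START, TYPE, d)) for d in (4, 2, 1)}
```

For this schedule the number of compatible pairs shrinks as δ0 decreases, and at δ0 = 1 no two operations can share a resource.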

Example: δ0 = 2

[Figure: pipelined scheduled sequencing graph with δ0 = 2, split into STAGE 1 and STAGE 2, each with TIME 1 and TIME 2.]

Resource sharing for pipelined circuits

Branching constructs:
  - Special care to avoid deadlocks.
Twisted pairs:
  - Two mutually compatible operation pairs with twisted dependencies.
Sharing operations in twisted pairs must be avoided.

Example

[Figure: example of a twisted pair involving + and * operations.]

Data path synthesis

Resource binding.
Connectivity synthesis:
  - Connection of resources to multiplexers, busses and registers.
  - Control unit interface.
  - I/O ports.
Physical data-path synthesis.

Example

[Figure: data path and control unit. The DATAPATH contains registers (a, 3, dx, x, y, u, r1, r2) and an ALU; the CONTROL UNIT drives the enable, mux control and ALU control (+, -, <) signals.]

Control synthesis

Synthesis of the control unit.
Logic model:
  - Synchronous FSM.
Physical implementation:
  - Microcode (ROM, PLA).
  - Hard-wired FSM.
  - Distributed FSM.

Control synthesis

Synthesize circuit that:
  - Executes scheduled operations.
  - Provides synchronization.
  - Supports:
      - Iteration.
      - Branching.
      - Hierarchy.
      - Interfaces.
Assumption:
  - Synchronous implementation.
  - Control unit is a FSM (or connection of FSMs).

Controlling scheduled operations

Simple model:
  - No branching, iteration, hierarchy.
  - No data-dependent delays.
Implementation:
  - FSM-oriented design:
      - Hardware: PLAs, gates, registers.
      - One FSM state per schedule level.
  - Microcode-oriented design:
      - Hardware: ROM, PLA, counter.

FSM-based implementation

Simple model:
  - next-state function: unconditional.
  - output function: activate operations.
Extended model:
  - Branching and iteration:
      - Conditional next-state function.
  - Hierarchy:
      - Hierarchical FSM connection.

Example

[Figure: scheduled sequencing graph (source NOP 0, TIME 1 to TIME 4, sink NOP n) and the corresponding FSM state diagram with reset transitions. Legible state outputs: {1, 2, 6, 8, 10}, {3, 7, 9, 11} and {5}.]
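
The simple FSM model (one state per schedule level, unconditional next-state function, outputs that activate operations) can be sketched in a few lines. This is an illustrative sketch only: wrapping from the last state back to the first and modeling reset as a jump to state 0 are assumptions, and the source and sink NOP states are omitted.

```python
# Operations activated at each schedule level of the example.
LEVELS = [[1, 2, 6, 8, 10], [3, 7, 9, 11], [4], [5]]


def build_fsm(levels):
    """One state per schedule level; returns (next_state, output) tables."""
    n = len(levels)
    next_state = {s: (s + 1) % n for s in range(n)}       # unconditional
    output = {s: sorted(ops) for s, ops in enumerate(levels)}
    return next_state, output


def run(next_state, output, cycles, reset_at=()):
    """Clock the FSM and record the activation signals of every cycle."""
    state, trace = 0, []
    for cycle in range(cycles):
        if cycle in reset_at:      # synchronous reset back to the first state
            state = 0
        trace.append(output[state])
        state = next_state[state]
    return trace


NEXT, OUT = build_fsm(LEVELS)
free_running = run(NEXT, OUT, cycles=5)
with_reset = run(NEXT, OUT, cycles=4, reset_at={2})
```

The extended model would replace the `next_state` table with a function of both the state and condition signals.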

Microcode implementation

Horizontal microcode:
  - One bit per activation signal.
  - One microcode word per schedule level.
  - Maximum performance.
  - Wide words.
Vertical microcode:
  - Encode each resource activation signal.
  - Shorter words.
  - One (or more) words per schedule level.

Example of horizontal microcode

[Figure: scheduled sequencing graph (TIME 1 to TIME 4) and a ROM addressed by a counter with a Reset input; each microword bit is one activation signal.]

  Address   Microword
  00        11000101010
  01        00100010101
  10        00010000000
  11        00001000000
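
The horizontal microwords in the example can be regenerated from the schedule: one word per level, one bit per operation, with bit i set when operation i is activated. The sketch below assumes operations are numbered 1 to 11 and that the leftmost bit corresponds to operation 1; `horizontal_microcode` is a hypothetical helper name.

```python
# Operations activated at each schedule level of the example.
LEVELS = [[1, 2, 6, 8, 10], [3, 7, 9, 11], [4], [5]]


def horizontal_microcode(levels, num_ops):
    """One microword per schedule level, one bit per activation signal."""
    words = []
    for ops in levels:
        active = set(ops)
        words.append("".join("1" if op in active else "0"
                             for op in range(1, num_ops + 1)))
    return words


rom = horizontal_microcode(LEVELS, num_ops=11)
# ROM address = counter value, written with two bits as in the example.
addressed = {format(addr, "02b"): word for addr, word in enumerate(rom)}
```

The word width equals the number of activation signals, which is why horizontal microcode gives wide words.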

Microcode compaction problem

Partition ROM word into fields.
Encode signals in each field.
Allow for a code for NOP.
Activation signals in each field must not be concurrent.
Problems:
  - Minimize number of fields.
  - Minimize total ROM width.

Example of vertical microcode

[Figure: ROM of 4-bit microwords feeding a decoder that produces the activation signals.]

  Microwords: 0001 0010 0110 1000 1010 0011 0111 1001 1011 0100 0101
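
The vertical microwords listed in the example are consistent with encoding each operation number in binary, one operation per word, in schedule order. The sketch below reproduces that list under this reading; reserving the all-zero code for NOP is an assumption, and `vertical_microcode` and `decode` are hypothetical names.

```python
# Operations activated at each schedule level of the example.
LEVELS = [[1, 2, 6, 8, 10], [3, 7, 9, 11], [4], [5]]


def vertical_microcode(levels, num_ops):
    """One microword per operation; code 0 is left free for NOP (assumed)."""
    width = num_ops.bit_length()          # enough bits for codes 0..num_ops
    return [format(op, f"0{width}b") for ops in levels for op in sorted(ops)]


def decode(word, num_ops):
    """Decoder: turn an encoded word back into one-hot activation signals."""
    op = int(word, 2)
    return [1 if i == op else 0 for i in range(1, num_ops + 1)]


words = vertical_microcode(LEVELS, num_ops=11)
signals_for_first_word = decode(words[0], num_ops=11)
```

Here 11 words of 4 bits replace 4 words of 11 bits, at the cost of several words (and cycles) per schedule level.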

Microcode optimization

Conflict graph:
  - Concurrent operations.
  - Optimum vertex coloring yields minimum number of fields.
Compatibility graph:
  - Non-concurrent operations.
  - Optimum clique partitioning yields minimum number of fields.
  - Minimum weighted clique partitioning yields minimum number of bits.

Example

  field   op   code
  A       1    01
  A       3    10
  A       4    11
  B       2    1
  C       6    01
  C       7    10
  C       5    11
  D       8    01
  D       9    10
  E       10   01
  E       11   10
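
The coloring view of field assignment can be tried on the example schedule. The sketch below builds the conflict graph from the schedule levels, colors it with a simple greedy heuristic (not an optimum coloring algorithm), and computes the ROM width assuming each field needs one extra code for NOP. The function names are hypothetical; `EXAMPLE_FIELDS` is the field assignment A to E from the example table.

```python
from itertools import combinations

LEVELS = [[1, 2, 6, 8, 10], [3, 7, 9, 11], [4], [5]]
EXAMPLE_FIELDS = [[1, 3, 4], [2], [6, 7, 5], [8, 9], [10, 11]]   # A, B, C, D, E


def conflict_graph(levels):
    """Operations conflict when they are activated in the same level."""
    adj = {op: set() for ops in levels for op in ops}
    for ops in levels:
        for u, v in combinations(ops, 2):
            adj[u].add(v)
            adj[v].add(u)
    return adj


def greedy_fields(adj):
    """Greedy vertex coloring; each color class becomes one ROM field."""
    color = {}
    for op in sorted(adj):
        used = {color[n] for n in adj[op] if n in color}
        color[op] = next(c for c in range(len(adj)) if c not in used)
    fields = {}
    for op, c in color.items():
        fields.setdefault(c, []).append(op)
    return list(fields.values())


def rom_width(fields):
    """Bits per word: each field encodes its signals plus a NOP code."""
    return sum(len(field).bit_length() for field in fields)


greedy = greedy_fields(conflict_graph(LEVELS))
greedy_width = rom_width(greedy)
example_width = rom_width(EXAMPLE_FIELDS)
```

On this input the greedy coloring reaches the minimum of five fields but a wider word than the example assignment, which illustrates why the number of fields and the number of bits are separate objectives.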

Example

Microword format: A B C D E

  Microwords:
  A    B   C    D    E
  01   1   01   01   01
  10   0   10   10   10
  11   0   00   00   00
  00   0   11   00   00

  Decoders and activation signals:
  D1 (field A): 1, 3, 4
  field B (single bit): 2
  D2 (field C): 6, 7, 5
  D3 (field D): 8, 9
  D4 (field E): 10, 11

Hierarchical control

Exploit the hierarchical structure of sequencing graphs.
One controller per entity.
Interconnected finite state machines.
Handshake:
  - activate signals.
  - condition signals.
  - reset signals.

Example

[Figure: interconnected CONTROL BLOCKs exchanging act (activate), reset and condition signals.]

Control synthesis for unbounded-latency sequencing graphs

Data-dependent delay operations:
  - activate signals.
  - completion signals.

[Figure: DATAPATH and CONTROL UNIT blocks.]

Synchronization problem:
  - Wait on completion signals.
  - Wait on external synchronization.
Several strategies:
  - Clustering.
  - Adaptive control.
  - Relative scheduling.
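
Waiting on completion signals can be illustrated with a cycle-by-cycle simulation. This is a generic sketch and does not implement clustering, adaptive control or relative scheduling specifically: the controller asserts the activate signals of a schedule level and holds its state until every activated operation has completed. The per-operation delays passed in `delay` stand in for data-dependent values known only at run time, and the 3-cycle delay of operation 3 is an arbitrary assumption.

```python
LEVELS = [[1, 2, 6, 8, 10], [3, 7, 9, 11], [4], [5]]


def simulate(levels, delay):
    """Return ([(cycle, activated ops), ...], total cycles)."""
    state, cycle, remaining, trace = 0, 0, {}, []
    while state < len(levels):
        if not remaining:
            # Entering a state: assert activate signals for this level.
            remaining = {op: delay.get(op, 1) for op in levels[state]}
            trace.append((cycle, sorted(remaining)))
        # One clock cycle elapses; finished operations raise completion.
        remaining = {op: d - 1 for op, d in remaining.items() if d > 1}
        cycle += 1
        if not remaining:          # all completion signals received
            state += 1
    return trace, cycle


trace, total_cycles = simulate(LEVELS, delay={3: 3})
```

With all delays equal to one cycle the same controller reduces to the bounded-delay case of one state per schedule level.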

Summary

Control synthesis:
  - Different approaches.
Implementations:
  - FSM, connection of FSMs or ROM.
Techniques:
  - Bounded delays only:
      - FSM, microcode.
  - Unbounded delays:
      - Different methods to provide synchronization.
