
ECE 720 ESL & Physical Design

Lecture 20:
Clock-Tree Synthesis
Spring 2013
W. Rhett Davis
NC State University


Announcements

Homework #6 Due Today

Homework #7 Due in 1 week


Today's Lecture

Clock Trees

Clock-Tree Design Example

Other Clocking Styles


Effect of Skew

Insertion delays through the clock tree to each clock sink are different: tCLK1, tCLK2

[Figure: clock waveforms (V vs. time) with edges at tCLK1 and tCLK2; two registers (REG), clocked at tCLK1 and tCLK2, with logic between them]

Define skew: $\delta = t_{CLK2} - t_{CLK1}$

New timing constraints
Setup time: $T + \delta \geq t_{c-q} + t_{p,logic} + t_{su}$
Hold time: $t_{hold} + \delta < t_{cd,reg} + t_{cd,logic}$
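The two constraints on the "Effect of Skew" slide can be checked numerically. The following is a minimal sketch, assuming skew is taken as tCLK2 - tCLK1 and the setup relation is "T + skew >= tc-q + tp,logic + tsu"; the function names and all numbers are hypothetical, chosen only to show that positive skew helps setup and hurts hold.

```python
def setup_ok(T, skew, t_cq, t_p_logic, t_su):
    """Setup constraint: T + skew >= t_cq + t_p_logic + t_su."""
    return T + skew >= t_cq + t_p_logic + t_su


def hold_ok(t_hold, skew, t_cd_reg, t_cd_logic):
    """Hold constraint: t_hold + skew < t_cd_reg + t_cd_logic."""
    return t_hold + skew < t_cd_reg + t_cd_logic


# Hypothetical insertion delays to the launching and capturing registers (ns)
t_clk1, t_clk2 = 0.30, 0.35
skew = t_clk2 - t_clk1  # positive: the capturing register sees the clock later

# Setup has plenty of margin, so this prints True
print(setup_ok(T=1.0, skew=skew, t_cq=0.10, t_p_logic=0.70, t_su=0.05))
# The short path is too fast for this much skew, so this prints False
print(hold_ok(t_hold=0.02, skew=skew, t_cd_reg=0.04, t_cd_logic=0.02))
```

With zero skew the same short path would pass the hold check, which is why hold violations tend to appear only after a real clock tree is built.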

Skew and Insertion Delay

So far, skew and insertion delay have been defined for a pair of registers.
When referring to a clock tree, we can also talk about the skew of the entire tree.
$t_{skew} = t_{ins(max)} - t_{ins(min)}$
$t_{ins} = t_{ins(min)} + t_{skew}/2$
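As a small illustration of the two tree-level definitions, the sketch below computes them from a list of per-sink insertion delays. The function names and the delay values are hypothetical.

```python
def tree_skew(insertion_delays):
    """t_skew = t_ins(max) - t_ins(min) over all sinks of the tree."""
    return max(insertion_delays) - min(insertion_delays)


def nominal_insertion_delay(insertion_delays):
    """t_ins = t_ins(min) + t_skew / 2, i.e. the midpoint of the delay range."""
    return min(insertion_delays) + tree_skew(insertion_delays) / 2


# Hypothetical insertion delays to four sinks, in ps
delays_ps = [350.0, 410.0, 380.0, 395.0]
print(tree_skew(delays_ps))                # 60.0
print(nominal_insertion_delay(delays_ps))  # 380.0
```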


The Clock-Tree Design Problem

How to get a signal from the clock source to the clock sinks?

The Simplest Approach

Progressively sized buffer chain to drive the large load
Problem: Large skew due to different RC wire delays

A Better Approach

A clock tree allows equalizing wire delays between branches
Shown here: Binary Clock Tree
Fan-out of 2 for each branch
Typically the highest power, lowest skew

Ideal Binary Clock Tree


H-Tree
Equalizes wirelengths to all loads
Problem: Clock sinks are typically not evenly distributed

[Figure: H-tree driven from a central CLOCK source]
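To see why an H-tree equalizes wirelengths, the sketch below builds an idealized H-tree recursively and records the wire length from the root to every leaf. The function name `h_tree`, the alternating horizontal/vertical branching with the segment length halved after each vertical step, and the dimensions are assumptions made for illustration; no buffers or RC effects are modeled.

```python
def h_tree(x, y, half, levels, horizontal=True, length=0.0):
    """Return a list of (leaf_x, leaf_y, wire_length_from_root).

    Each level branches two ways, alternating horizontal and vertical
    segments of length `half`; `half` is halved after every vertical level.
    """
    if levels == 0:
        return [(x, y, length)]
    dx, dy = (half, 0.0) if horizontal else (0.0, half)
    next_half = half if horizontal else half / 2
    leaves = []
    for sign in (-1, 1):
        leaves += h_tree(x + sign * dx, y + sign * dy, next_half,
                         levels - 1, not horizontal, length + half)
    return leaves


leaves = h_tree(0.0, 0.0, 4.0, levels=4)
print(len(leaves))                            # 16 leaves on a 4x4 grid
print({length for _, _, length in leaves})    # every leaf has the same length
```

Every leaf ends up at the same distance from the root, but only because the leaves sit on a regular grid; real flip-flop placements rarely look like this, which is the problem the slide points out.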


Zero Skew Tree

[Figure: clock tree over unevenly distributed sinks, with merging points marked]

Choose merging points so that wire lengths are equalized, even if sinks are unevenly distributed

"Planar-DME: A Single-Layer Zero-Skew Clock Tree Router", Kahng & Tsao, IEEE Trans. CAD, 1996
See also "Bounded-Skew Clock and Steiner Routing", Cong, Kahng, Koh, & Tsao, ACM Trans. Des. Auto., 1999
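The idea of choosing a merging point can be sketched for two subtrees under a simple path-length delay model, where delay is taken to be proportional to wire length. This is an illustrative simplification, not the algorithm of the cited papers (which work with merging segments and more detailed delay models); the function name and numbers are hypothetical.

```python
def merge_point(delay_a, delay_b, distance):
    """Return (x_a, x_b): wire lengths from the merging point to roots a and b.

    Path-length model: delays are expressed as equivalent wire lengths.
    The merging point balances delay_a + x_a == delay_b + x_b.
    If one subtree is too slow to balance along the direct connection,
    the merging point sits on that subtree's root and the other wire is
    elongated ("snaked") beyond the direct distance.
    """
    x_a = (distance + delay_b - delay_a) / 2
    if x_a < 0:                      # a is too slow: merge at a, elongate wire to b
        return 0.0, delay_a - delay_b
    if x_a > distance:               # b is too slow: merge at b, elongate wire to a
        return delay_b - delay_a, 0.0
    return x_a, distance - x_a


print(merge_point(0.0, 0.0, 10.0))   # (5.0, 5.0): equal subtrees merge at the midpoint
print(merge_point(4.0, 0.0, 10.0))   # (3.0, 7.0): merging point shifts toward the slower side
print(merge_point(12.0, 0.0, 10.0))  # (0.0, 12.0): wire to b is longer than the direct distance
```

Applying this bottom-up, pair by pair, is what lets the tree stay balanced even when sinks are unevenly placed.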


Well-Designed Clock Trees

We can think of a clock tree as a distributed version of a progressively sized buffer chain driving a large capacitance
A well-designed, progressively sized buffer chain tends to have the following properties
All stages have the same ratio of input capacitance to load capacitance
All stages have the same delay due to loading
All stages have the same transition times

Adhering to these properties has the following benefits
Minimum (insertion) delay (may or may not be important)
Reduced uncertainty in timing estimates
Minimized power loss due to short-circuit/direct-path/shoot-through currents
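The "same ratio at every stage" property can be sketched as follows, assuming an ideal chain with no wire capacitance or self-loading. The function name, the capacitance units, and the example values are hypothetical.

```python
def size_buffer_chain(c_in, c_load, n_stages):
    """Return (stage_ratio, input capacitance of each stage).

    With the same load-to-input capacitance ratio f at every stage,
    f ** n_stages == c_load / c_in, so f is the n-th root of the total ratio.
    """
    f = (c_load / c_in) ** (1.0 / n_stages)
    stage_caps = [c_in * f ** i for i in range(n_stages)]
    return f, stage_caps


# Hypothetical: 1 fF first-stage input, 64 fF total clock load, 3 stages
f, caps = size_buffer_chain(c_in=1.0, c_load=64.0, n_stages=3)
print(round(f, 3))                     # stage ratio, about 4
print([round(c, 3) for c in caps])     # input capacitance of each stage
```

In a clock tree the same reasoning applies per level, with the load of each buffer being the wire plus the inputs of all buffers or sinks it drives.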


Today's Lecture

Clock Trees

Clock-Tree Design Example

Other Clocking Styles


Example Design
ARM CORTEXM0 processor
~6000 cells when synthesized for the 45nm Nangate library
841 clock sinks (flip-flops)

Fixing Hold Violations

Recall that the only way to fix a hold-time violation on a path is to add more delay by adding buffers on short paths
To add this delay in Synopsys Design Compiler, first specify the amount of expected skew in the Constraints.tcl file:
    # Expected clock skew, in library time units (0.05 appears as 0.0500 in the timing report)
    set CLK_SKEW 0.05
    set_clock_uncertainty $CLK_SKEW $clkname

Then specify that these "hold-fix buffers" should be added at compile-time in the CompileAnalyze.tcl file:
    # Ask the tool to fix hold violations on paths clocked by $clkname
    set_fix_hold $clkname
    # Incremental pass that only fixes design-rule and hold violations
    compile -only_design_rule -incremental

timing_min_fast_holdcheck_tut2.rpt
Point                                   Incr       Path
----------------------------------------------------------
clock HCLK (rise edge)                  0.0000     0.0000
clock network delay (ideal)             0.0000     0.0000
u_logic/J9d3z4_reg/CK (DFFR_X1)         0.0000     0.0000 r
u_logic/J9d3z4_reg/QN (DFFR_X1)         0.0451     0.0451 f
u_logic/U1572/ZN (OAI22_X1)             0.0276     0.0727 r
u_logic/J9d3z4_reg/D (DFFR_X1)          0.0000     0.0727 r
data arrival time                                  0.0727

clock HCLK (rise edge)                  0.0000     0.0000
clock network delay (ideal)             0.0000     0.0000
clock uncertainty                       0.0500     0.0500
u_logic/J9d3z4_reg/CK (DFFR_X1)         0.0000     0.0500 r
library hold time                       0.0225     0.0725
data required time                                 0.0725
----------------------------------------------------------
data required time                                 0.0725
data arrival time                                 -0.0727
----------------------------------------------------------
slack (MET)                                        0.0002
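The arithmetic behind this hold check can be reproduced directly from the numbers in the report. The sketch below is only a recomputation for illustration (variable names are hypothetical, values are copied from the report and assumed to be in ns).

```python
# Values copied from timing_min_fast_holdcheck_tut2.rpt
clk_to_q = 0.0451           # u_logic/J9d3z4_reg CK -> QN
logic_delay = 0.0276        # u_logic/U1572 (OAI22_X1)
clock_uncertainty = 0.0500  # set by set_clock_uncertainty (CLK_SKEW)
library_hold = 0.0225       # hold time of the capturing DFFR_X1

data_arrival = clk_to_q + logic_delay
data_required = clock_uncertainty + library_hold
slack = data_arrival - data_required  # hold slack: arrival must exceed required

print(f"arrival={data_arrival:.4f} required={data_required:.4f} slack={slack:.4f}")
# prints: arrival=0.0727 required=0.0725 slack=0.0002
```

The clock uncertainty accounts for most of the required time, which shows how the skew budget set before CTS drives the insertion of hold-fix buffers.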

ARM CORTEXM0 Floorplan

Run the design flow up to the cts step to generate a template clock.ctstch file

clock.ctstch
AutoCTSRootPin   HCLK
Period           30ns
MaxDelay         30ns    # default value
MinDelay         0ns     # default value
MaxSkew          300ps   # default value
SinkMaxTran      400ps   # default value
BufMaxTran       400ps   # default value

Primary clock-tree specification file
Insertion delay constrained to be less than one clock period, but otherwise unconstrained (lots of flexibility to the tool)
Skew constrained to 300ps (which is A LOT for 45nm)
Transition time at buffer and sink outputs constrained to 400ps (again, A LOT for 45nm)

clock.ctsrpt

Primary report on clock-tree quality
How much skew is predicted?
Do we believe this?

Nr. of Subtrees                 : 1
Nr. of Sinks                    : 841
Nr. of Buffer                   : 8
Nr. of Level (including gates)  : 2
Root Rise Input Tran            : 120(ps)
Root Fall Input Tran            : 120(ps)
Max trig. edge delay at sink(R) : u_logic_T243z4_reg/CK 441.3(ps)
Min trig. edge delay at sink(R) : u_logic_Mvi2z4_reg/CK 337.4(ps)

clock.ctsrpt (continued)

1X-2X greater transition time predicted for sinks, as compared to buffers
Do we believe this?

                          (Actual)     (Required)
Max. Rise Buffer Tran  :  159.4(ps)    400(ps)
Max. Fall Buffer Tran  :  56(ps)       400(ps)
Max. Rise Sink Tran    :  237.1(ps)    400(ps)
Max. Fall Sink Tran    :  76.8(ps)     400(ps)
Min. Rise Buffer Tran  :  159.4(ps)    0(ps)
Min. Fall Buffer Tran  :  55.6(ps)     0(ps)
Min. Rise Sink Tran    :  144(ps)      0(ps)
Min. Fall Sink Tran    :  61.7(ps)     0(ps)

clock.ctsrpt (continued)
Main Tree from HCLK w/o tracing through gates:
Rise Delay [337.4(ps) 441.3(ps)] Skew [103.9(ps)]
Fall Delay [400.9(ps) 446.5(ps)] Skew=[45.6(ps)]

HCLK (0 0) load=0.0193686(pf)
HCLK__L1_I0/A (0.0026 0.0026)
HCLK__L1_I0/ZN (0.2475 0.1175) load=0.123348(pf)
HCLK__L2_I6/A (0.272 0.1419)
HCLK__L2_I6/ZN (0.3366 0.4001) load=0.114191(pf)
HCLK__L2_I5/A (0.2622 0.132)
HCLK__L2_I5/ZN (0.3852 0.412) load=0.161563(pf)
... (5 other L2 buffers)
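The skew figures in this report are just the width of each delay range, and they can be compared against the MaxSkew constraint from clock.ctstch. A minimal recomputation, with hypothetical variable names and values copied from the report and spec file (in ps):

```python
# Delay ranges from clock.ctsrpt, in ps
rise_min, rise_max = 337.4, 441.3
fall_min, fall_max = 400.9, 446.5
max_skew_ps = 300.0  # MaxSkew from clock.ctstch

rise_skew = rise_max - rise_min
fall_skew = fall_max - fall_min

print(round(rise_skew, 1), round(fall_skew, 1))    # 103.9 45.6
print(max(rise_skew, fall_skew) <= max_skew_ps)    # True: the loose constraint is met
```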

Root Pin to L1

Clock > Display > Display Clock Tree > Selected Level 1

Single buffer on periphery
Note: Special Nets turned off for clarity

L1 to L2

Clock > Display > Display Clock Tree > Selected Level 2

Buffers chosen such that an H-tree is developing

L2 to Sinks

Clock > Display > Display Clock Tree > Selected Level 3

Final level of tree
Clock sinks shown, color coded to show which sinks share the same last-stage driver
Two last-stage drivers shown here (420 sinks each)

Skew Map

Clock > Display > Display Clock Tree > Phase Delay

Red: longest collective insertion delay
Blue: shortest collective insertion delay

Longest & Shortest Paths

Clock > Display > Display Clock Tree > Max,Min Paths

Red: longest individual insertion delay
Green: shortest individual insertion delay

timing_pt_slow_trialrouted.rpt

Arrival Time

Point                                   Trans     Incr       Path
------------------------------------------------------------------
clock HCLK (rise edge)                            0.0000     0.0000
clock source latency                              0.0000     0.0000
HCLK (in)                               0.0000    0.0000 &   0.0000 r
HCLK__L1_I0/A (INV_X32)                 0.0024    0.0020 &   0.0020 r
HCLK__L1_I0/ZN (INV_X32)                0.0359    0.0305 &   0.0326 f
HCLK__L2_I4/A (INV_X32)                 0.0394    0.0209 &   0.0535 f
HCLK__L2_I4/ZN (INV_X32)                0.2055    0.2061 &   0.2597 r
u_logic_Itw2z4_reg/CK (DFFS_X1)         0.2114    0.0219 &   0.2816 r
u_logic_Itw2z4_reg/Q (DFFS_X1)          0.0871    0.4203 &   0.7019 r
u_logic_U275/A2 (AOI222_X1)             0.0871    0.0000 &   0.7019 r
u_logic_U275/ZN (AOI222_X1)             0.1135    0.1810 &   0.8829 f
u_logic_U274/A (OAI221_X1)              0.1135    0.0003 &   0.8832 f
u_logic_U274/ZN (OAI221_X1)             0.4206    0.5048 &   1.3880 r
u_logic_Dvy2z4_reg/D (DFFS_X1)          0.4206    0.0004 &   1.3884 r
data arrival time                                            1.3884

timing_slow_pt_routed_clock.rpt

Required Time

Point                                   Trans     Incr       Path
------------------------------------------------------------------
clock HCLK (rise edge)                            0.0000     0.0000
clock source latency                              0.0000     0.0000
HCLK (in)                               0.0000    0.0000 &   0.0000 r
HCLK__L1_I0/A (INV_X32)                 0.0024    0.0020 &   0.0020 r
HCLK__L1_I0/ZN (INV_X32)                0.0360    0.0306 &   0.0326 f
HCLK__L2_I1/A (INV_X32)                 0.0395    0.0221 &   0.0547 f
HCLK__L2_I1/ZN (INV_X32)                0.2486    0.2439 &   0.2987 r
u_logic_Dvy2z4_reg/CK (DFFS_X1)         0.2543    0.0247 &   0.3234 r
library hold time                                 1.1454     1.4688
data required time                                           1.4688
------------------------------------------------------------------
data required time                                           1.4688
data arrival time                                           -1.3884
------------------------------------------------------------------
slack (VIOLATED)                                            -0.0804
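With a propagated clock, the launch and capture clock paths each contribute their own insertion delay, so the skew on this particular path can be read off the two reports. The sketch below recomputes the hold slack from the reported numbers (assumed to be in ns); the variable names are hypothetical.

```python
# From the arrival-time report (launch path)
launch_clock_latency = 0.2816   # arrival at u_logic_Itw2z4_reg/CK
data_arrival = 1.3884           # arrival at u_logic_Dvy2z4_reg/D

# From the required-time report (capture path)
capture_clock_latency = 0.3234  # arrival at u_logic_Dvy2z4_reg/CK
library_hold = 1.1454           # library hold time reported for the capture flop

path_skew = capture_clock_latency - launch_clock_latency
data_required = capture_clock_latency + library_hold
slack = data_arrival - data_required

print(f"skew={path_skew:.4f} required={data_required:.4f} slack={slack:.4f}")
# prints: skew=0.0418 required=1.4688 slack=-0.0804
```

The negative slack means the data arrives before the hold window closes, so this path needs more delay or a smaller hold requirement.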

PrimeTime Results

Note that PrimeTime is computing the insertion delay in the clock tree and using it to compute the skew for every register-to-register pair
Enabled by set_propagated_clock command

Note that PrimeTime predicts insertion delay of 323 ps
337 ps to 441 ps predicted by CTS

Note that PrimeTime predicts sink transition time of 254 ps
144 ps to 237 ps predicted by CTS

Transition time at sinks is 7X greater than at buffers!
Much of this is due to the fact that the pull-up is weaker than the pull-down
Less difference between these transition times leads to a more robust clock tree, but also burns more power

timing_pt_slow_clock.rpt

Skew is more than 1.5X greater than predicted by CTS
Illustrates the difficulty of knowing clock-tree properties prior to routing

Maximum setup skew:                       0.1586
    u_logic_Wj73z4_reg/CK                 rp-+
    u_logic_Mvi2z4_reg/CK                 rp-+
Maximum hold skew:                        0.1588
    u_logic_Uyv2z4_reg/CK                 rp-+
    u_logic_J5o2z4_reg/CK                 rp-+
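The "more than 1.5X" claim can be checked against the numbers already shown, assuming the PrimeTime skew values are in ns and the CTS prediction is the 103.9 ps rise skew from clock.ctsrpt. Variable names are hypothetical.

```python
cts_skew_ns = 0.1039        # 103.9 ps rise skew predicted in clock.ctsrpt
pt_setup_skew_ns = 0.1586   # maximum setup skew from timing_pt_slow_clock.rpt
pt_hold_skew_ns = 0.1588    # maximum hold skew from timing_pt_slow_clock.rpt

setup_ratio = pt_setup_skew_ns / cts_skew_ns
hold_ratio = pt_hold_skew_ns / cts_skew_ns

# Both ratios come out slightly above 1.5
print(round(setup_ratio, 2), round(hold_ratio, 2))
```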

How to Fix Hold Violations

Add delays to inputs in PrimeTime
Input delays need to match insertion delay

Reduce MaxSkew and SinkMaxTran parameters in clock.ctstch
Should add more stages, reduce the number of sinks on each last-stage buffer

Use optDesign -postCTS -hold
Adds more "hold-fix buffers" to lengthen shortest paths

Homework #7 p2 Notes

Prob. 2 asks you to partially verify timing hierarchically

[Figure: CORTEXM0 block connected to AHB block]

Find output delays of AHB, set input delays in CORTEXM0
Make sure that AHB outputs are constrained, or you may not get anything out, for example:
    create_clock -name hclk -period 40 hclk
    set_output_delay -clock hclk 0 [all_outputs]

Make sure AHB input delays are greater than insertion delay, or CORTEXM0 will fail hold-checks
Best to modify run_pt.tcl when creating gen_indelay.tcl

Relationship of Skew & Slope

[Figure: clock waveforms (V vs. time) at tCLK1 and tCLK2, with threshold uncertainty of tr1/2 and tr2/2 around each edge; two registers (REG), clocked at tCLK1 and tCLK2, with logic between them]

Larger transition times mean more uncertainty in delay predictions
One form of uncertainty pertains to the switching threshold (can move up or down)
Hold constraint: $t_{hold} + \delta < t_{cd,reg} + t_{cd,logic}$, with skew $\delta = t_{CLK2} - t_{CLK1}$
Actual skew on a path may increase by as much as $t_{r1}/2 + t_{r2}/2$
To be 100% sure, I like to see a hold-time slack (i.e. hold margin) of at least this much on every path
For simplicity, can have a hold-time slack of $t_{r,max}$ on every path
NOTE: This is extremely conservative, recommended only for research chips
Not required on your assignments except HW#7 problem 1
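The conservative hold margin described on this slide is easy to compute once the clock transition times at the two registers are known. The sketch below uses the sink transition times from the PrimeTime reports shown earlier (0.2114 and 0.2543, assumed to be in ns); the function names are hypothetical.

```python
def hold_margin(t_r1, t_r2):
    """Worst-case extra skew from threshold uncertainty: t_r1/2 + t_r2/2."""
    return t_r1 / 2 + t_r2 / 2


def simple_hold_margin(transition_times):
    """Simpler, more conservative margin: the largest transition time."""
    return max(transition_times)


t_r1 = 0.2114  # clock transition at the launching register's CK pin
t_r2 = 0.2543  # clock transition at the capturing register's CK pin

margin = hold_margin(t_r1, t_r2)
simple = simple_hold_margin([t_r1, t_r2])
print(round(margin, 5), simple)  # 0.23285 0.2543

# A path passes this conservative check only if its hold slack is at least the margin
hold_slack = -0.0804  # slack of the violating path reported earlier
print(hold_slack >= margin)  # False
```

Because the larger transition time always bounds the average of the two, requiring a slack of tr,max on every path is a safe simplification of the per-path margin.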

Today's Lecture

Clock Trees

Clock-Tree Design Example

Other Clocking Styles


Link Insertion
[Figure: clock tree from the clock source (chip input pin) to the clock sinks (flip-flop clock pins), with links connecting branches]

Insert wires to connect branches of the tree
Tends to dissipate more power. Why?
Tends to have less skew. Why?

Clock Grids/Meshes

Wire mesh connects all clock sinks at the last level of the tree
Can be considered a special case of the link-insertion approach

Clock Mesh Synthesis

Encounter has commands to synthesize clock meshes, in addition to clock trees
Give it a try!
