Session Speaker
Ajaya Kumar.s
PEMP VSD531
Session objective
Session Topics
• Clock Tree Synthesis (CTS) goals
• Clock Skew
• Identify settings of key timing parameters for pre-CTS and post-CTS stages
• Placement - completed
• Power and ground nets – prerouted
• Estimated congestion – acceptable
• Estimated timing – acceptable (~0ns slack)
• Estimated max cap/transition – no violations
• High fanout nets:
• Reset, Scan Enable synthesized with buffers
• Clocks are still not buffered
Why are there no buffers on clock nets?
Before CTS
[Figure: a single clock source wired directly to the clock pins of all flip-flops (FF).]
All clock pins are driven by a single clock source.
CTS Goals
[Figure: the clock source drives the flip-flops (FF) through a buffer tree.]
A buffer tree is built to balance the loads and minimize the skew.
[Figure: the clock source drives the flip-flops (FF) through a delay line followed by the buffer tree.]
A "delay line" is added to meet the minimum insertion delay.
[Figure: a clock tree running from the create_clock source (IO_PAD) through gated clocks to flip-flops FF1 to FF5, with a create_generated_clock point marked; the annotated delays 0.63, 0.64, and 0.65 are labeled as matched.]
Stop pins: CTS optimizes for DRC and clock …
… skew and insertion delay are ignored.
In this example the clock pin is not defined. What is the problem here?
[Figure: CLOCK drives a flip-flop and the IP_CLK pin of an IP (FRAM) macro; the IP-internal clock CLKn has no clock definition, so IP_CLK is an implicit exclude pin.]
The macro's clock pin is marked as an implicit exclude pin, so there is no skew optimization!
Defining an explicit stop pin
[Figure: CLOCK drives a flip-flop and the macro pin IP_CLK, with an explicit stop pin defined.]
Exceptions
[Figure: the same circuit annotated with a delay of 0.42 for the IP-internal clock CLKn.]
set_clock_tree_exceptions \
-float_pins IP/IP_CLK \
-float_pin_max_delay_rise 0.15
Clock Analysis
CT optimization is run inside
clock_opt, and can be run
independently as well:
[Flow: Clock spec met? If no, run CTO (optimize_clock_tree) and check again. If yes, proceed to Routing.]
Style                          Distribution
Spines with matched branches   Multiple central structures with length- (or delay-) matched branches
Grid                           Interconnected (shorted) clock structure
Unconstrained tree
It is commonly used in automatic synthesis flows and is usually placed with little or
no restriction on the number of buffer stages and with little or no explicit matching
between interconnect delays and buffer delays.
Balanced Tree
H structure
Clock Driver
Grid Network
Central Spine
Hybrid Distribution
Regional buffers (labeled as level 4 buffers in Fig.) residing at the end of the
multilevel H-tree drive a common grid that includes all local loads
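[Figure: a clock net from source S0 through a trunk wire N0 (length L0, width W0) that branches to sinks S1 and S2 over wires of length L1, L2 and width W1, W2. Each wire is modeled with a resistance (R0, R1, R2) and a capacitance split into two halves (C0/2, C1/2, C2/2); the sinks present loads CL1 and CL2.]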
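$$t_s = \frac{r l_1}{w_1} C_{L1} - \frac{r l_2}{w_2} C_{L2}$$
The skew variation in terms of wire width variation:
$$\Delta t_s = \frac{\partial t_s}{\partial w_1}\,\Delta w_1 + \frac{\partial t_s}{\partial w_2}\,\Delta w_2 = -\frac{r l_1 C_{L1}}{w_1^2}\,\Delta w_1 + \frac{r l_2 C_{L2}}{w_2^2}\,\Delta w_2$$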
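As a rough numerical check of the sensitivity expression, the Python sketch below evaluates the first-order skew variation and compares it with the exact change in skew. It assumes each branch delay is r·l·C_L/w, as in the skew expression; the function names and all numeric values are hypothetical and chosen only for illustration.

```python
def branch_delay(r, length, c_load, width):
    """Delay of one branch, modeled as r * l * C_L / w."""
    return r * length * c_load / width


def skew(r, l1, cl1, w1, l2, cl2, w2):
    """Skew between the two sinks: delay of branch 1 minus delay of branch 2."""
    return branch_delay(r, l1, cl1, w1) - branch_delay(r, l2, cl2, w2)


def skew_variation(r, l1, cl1, w1, dw1, l2, cl2, w2, dw2):
    """First-order change in skew for small width changes dw1 and dw2."""
    return -(r * l1 * cl1 / w1**2) * dw1 + (r * l2 * cl2 / w2**2) * dw2


# Hypothetical values in arbitrary but consistent units.
R, L1, CL1, W1 = 0.1, 100.0, 0.05, 1.0
L2, CL2, W2 = 100.0, 0.05, 1.0
DW1, DW2 = 0.01, -0.01  # branch 1 gets wider, branch 2 gets narrower

estimate = skew_variation(R, L1, CL1, W1, DW1, L2, CL2, W2, DW2)
exact = (skew(R, L1, CL1, W1 + DW1, L2, CL2, W2 + DW2)
         - skew(R, L1, CL1, W1, L2, CL2, W2))
print(f"first-order estimate = {estimate:.6f}, exact change = {exact:.6f}")
```

With these numbers a 1% width mismatch in opposite directions turns a perfectly balanced pair of branches into a skewed one, and the first-order estimate stays close to the exact change.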
Different buffer delays cause phase delay variations on different source-to-sink paths,
so the given tolerable skew of a buffered clock tree, $t_s$, is divided into two components:
$$t_s = t_s^w + t_s^b$$
$t_s^b$ = tolerable skew for buffer delays
$t_s^w$ = tolerable skew for wire width variation after buffer insertion
The buffer insertion problem is to find the locations on the clock tree at which to insert
intermediate buffers; these locations are called buffer insertion points (BIPs).
Clock Skew
• Clock skew is the maximum difference in the arrival time of a clock signal at
two different components.
• Clock skew forces designers to use a large time period between clock pulses.
This makes the system slower.
• So, in addition to other objectives, clock skew should be minimized during
clock routing.
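The definition above can be made concrete with a minimal Python sketch. It assumes the clock arrival time at each flip-flop is already known; the function names and the arrival values are illustrative assumptions, not data from a real design.

```python
def clock_skew(arrival_ns):
    """Maximum difference between clock arrival times across all sinks."""
    times = list(arrival_ns.values())
    return max(times) - min(times)


def pair_skew(arrival_ns, launch, capture):
    """Signed skew for one launch/capture pair: capture arrival minus launch arrival."""
    return arrival_ns[capture] - arrival_ns[launch]


# Illustrative clock arrival times in ns (assumed values).
arrivals = {"FF1": 0.20, "FF2": 0.20, "FF3": 0.40}

global_skew = clock_skew(arrivals)
ff1_to_ff2 = pair_skew(arrivals, "FF1", "FF2")
print(f"skew over all sinks = {global_skew:.2f} ns")
print(f"skew between FF1 and FF2 = {ff1_to_ff2:.2f} ns")
```

The two numbers differ because the first looks at every sink, while the second looks only at one pair of flip-flops that exchange data.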
Local Skew
[Figure: CLOCK reaches FF1 after T1 = 0.2 ns, FF2 after T2 = 0.2 ns, and FF3 after T3 = 0.4 ns; FF1 and FF2 are on a related data path (B_OUT).]
Related path is minimized for skew
Longer runtime
Global Skew
[Figure: CLOCK reaches FF1 after T1 = 0.37 ns, FF2 after T2 = 0.38 ns, and FF3 after T3 = 0.38 ns.]
All clock delays are matched as closely as possible
Useful Skew
[Figure: CLOCK reaches FF1 after T1 = 0.11 ns, FF2 after T2 = 0.35 ns, and FF3 after T3 = 0.22 ns.]
Add clock delay to FF2 to help setup time
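The effect of the extra clock delay on FF2 can be sketched in Python. The clock delays 0.11 ns and 0.35 ns come from the figure; the clock period, clock-to-Q delay, logic delay, and setup time are hypothetical values, and the slack formula is the usual single-cycle setup equation rather than anything tool-specific.

```python
def setup_slack(period, launch_clk, capture_clk, clk_to_q, logic_delay, setup):
    """Setup slack (ns) of a single-cycle path between two flip-flops."""
    data_arrival = launch_clk + clk_to_q + logic_delay
    data_required = period + capture_clk - setup
    return data_required - data_arrival


# Hypothetical path parameters in ns.
PERIOD, CLK_TO_Q, LOGIC, SETUP = 1.0, 0.10, 0.95, 0.05

# FF1 -> FF2 with both clock pins at 0.11 ns (balanced clock delays).
balanced = setup_slack(PERIOD, 0.11, 0.11, CLK_TO_Q, LOGIC, SETUP)
# FF1 -> FF2 with FF2 delayed to 0.35 ns (useful skew).
useful = setup_slack(PERIOD, 0.11, 0.35, CLK_TO_Q, LOGIC, SETUP)

print(f"balanced clock delays: slack = {balanced:+.2f} ns")
print(f"useful skew:           slack = {useful:+.2f} ns")
```

The 0.24 ns gained on the path into FF2 is not free: the same amount is taken from the setup budget of any path launched from FF2, and the hold margin on the path into FF2 shrinks by the same amount.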
Clock Tree Optimization (1/2)
[Figure: before and after views of clock trees whose buffers carry drive-strength labels from 2X to 5X.]
[Figure: before and after views of clock trees for two optimizations, Level Adjustment and Reconfiguration.]
Timing-Driven P&R
Optimizes the logic gates, then places and routes them to meet all timing constraints
Timing Constraints
•0.5 ns
TLU
TLU model comes from the vendor and is contained in the "tech" file
Contains capacitance look-up tables only
Resistance is calculated from the net geometry and a resistance/length (unit resistance) value from the tech file

    Layer "METAL1" {
        …
        unitNomResistance = 6.4e-5
        …
    }
    CapTable "metal1_C_LATERAL_14MIN" {
        wireWidthSize = 5
        wireSpacingSize = 16
        wireWidth = (0.16, 0.32, 0.48, 0.64, 0.8)
        wireSpacing = (0.18, 0.36, 0.54, 0.72, …, 2.88)
        capValue = (0.000183764, 9.85682e-05, 6.5029e-05, …
        )
    }
    …
    CapModel "metal1Config4" {
        refLayer = "METAL1"
        lateralCapType = "Table"
        lateralCapDataMin = "metal1_C_LATERAL_14MIN"
        …
    }
TLU+
TLU+ models:
Model UDSM process effects
Contain C and R look-up tables
If TLU+ models are available, use them!
UDSM process effects:
• Conformal dielectric
• Metal fill
• Shallow trench isolation
• Copper dishing:
  • Density analysis
  • Width/spacing
• Trapezoid conductor
[Figure: a single process file (ITF) is the source for both the TLU+ models used by Astro and the nxtgrd file used by Star-RCXT.]
Mapping file
The Mapping File maps the .tf layer/via names to Star-RCXT .itf layer/via
names.
cb13.tf cb13.itf
conducting_layers
poly poly
metal1 cm
metal2 cm2
…
[Figure: the net between cells U1 and U2 modeled as an RC network (R1 to R3, C1 to C4).]
Virtual Route
Pin-to-pin timing
Detailed Route
After routing, detailed nets are available and extraction will be more
accurate
Use AWE or Arnoldi for postroute optimizations
Arnoldi is preferred when comparing to PrimeTime
• The only time this is recommended is when performing a “timing sanity check”
performed by running a timing report with all the timing panel settings in pre-
By default, asynchronous preset and clear timing arcs are not analyzed for timing.
Depending on your design, you may have to enable this setting after CTS. For example, if
your design contains a reset network that is asserted asynchronously, the tool will not
analyze preset/clear violations on the flip-flops unless this setting is enabled.
[Figure: flip-flops clocked by Clock, Clock1, and Clock2.]
Pre-CTS, the delay to the FFs is "ideal", i.e. the delay is zero, unless commands are used to "model" the clock insertion delay. Example: set_clock_latency
©M.S.Ramaiah School Of Advanced Studies
A setup timing check verifies the timing relationship between the clock and the
data pin of a flip-flop so that the setup requirement is met. In other words, the setup
check ensures that the data is available at the input of the flip-flop before it is
clocked in the flip-flop. The data should be stable for a certain amount of time,
namely the setup time of the flip-flop, before the active edge of the clock arrives at
the flip-flop.
A hold timing check ensures that a flip-flop output value that is changing does not
pass through to a capture flip-flop and overwrite its output before the flip-flop has
had a chance to capture its original value. This check is based on the hold
requirement of a flip-flop. The hold specification of a flip-flop requires that the data
being latched should be held stable for a specified amount of time after the active
edge of the clock.
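The two checks can be expressed as simple comparisons around the capture clock edge. The Python sketch below is only an illustration: the FlipFlop class, the function names, and every timing value are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FlipFlop:
    setup: float  # ns the data must be stable before the active clock edge
    hold: float   # ns the data must stay stable after the active clock edge


def setup_ok(ff, data_arrival, capture_edge):
    """Setup check: the data must arrive at least `setup` before the capture edge."""
    return data_arrival <= capture_edge - ff.setup


def hold_ok(ff, next_data_change, capture_edge):
    """Hold check: the next data change must come at least `hold` after the capture edge."""
    return next_data_change >= capture_edge + ff.hold


ff = FlipFlop(setup=0.05, hold=0.03)  # assumed library values in ns

# Setup: the capture edge is at 1.0 ns, so the data must arrive by 0.95 ns.
print(setup_ok(ff, data_arrival=0.90, capture_edge=1.0))  # meets setup
print(setup_ok(ff, data_arrival=0.97, capture_edge=1.0))  # violates setup

# Hold: the capture edge is at 0.0 ns, so the data must not change before 0.03 ns.
print(hold_ok(ff, next_data_change=0.08, capture_edge=0.0))  # meets hold
print(hold_ok(ff, next_data_change=0.02, capture_edge=0.0))  # violates hold
```

Setup is checked against the next capture edge and limits how slow the data path may be, while hold is checked against the same edge that launched the data and limits how fast it may be.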
A removal timing check ensures that there is adequate time between an active
clock edge and the release of an asynchronous control signal. The check ensures
that the active clock edge has no effect because the asynchronous control signal
remains active until removal time after the active clock edge. In other words, the
asynchronous control signal is released (becomes inactive) well after the active
clock edge so that the clock edge can have no effect.
A recovery timing check ensures that there is a minimum amount of time between
the asynchronous signal becoming inactive and the next active clock edge. In other
words, this check ensures that after the asynchronous signal becomes inactive, there
is adequate time to recover so that the next active clock edge can be effective.
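Removal and recovery can be sketched the same way, as two windows around the moment the asynchronous signal is released. The Python functions below and all timing values are hypothetical and serve only as an illustration of the two definitions.

```python
def removal_ok(async_release, prev_clock_edge, t_removal):
    """Removal check: the release must come at least t_removal after the active clock edge."""
    return async_release - prev_clock_edge >= t_removal


def recovery_ok(async_release, next_clock_edge, t_recovery):
    """Recovery check: the release must come at least t_recovery before the next active edge."""
    return next_clock_edge - async_release >= t_recovery


# Assumed values in ns: active clock edges at 0.0 and 1.0.
PREV_EDGE, NEXT_EDGE = 0.0, 1.0
T_REMOVAL, T_RECOVERY = 0.05, 0.10

for release in (0.02, 0.50, 0.95):
    print(
        f"release at {release:.2f} ns: "
        f"removal ok = {removal_ok(release, PREV_EDGE, T_REMOVAL)}, "
        f"recovery ok = {recovery_ok(release, NEXT_EDGE, T_RECOVERY)}"
    )
```

A release at 0.02 ns is too close to the preceding edge (removal fails), a release at 0.95 ns is too close to the next edge (recovery fails), and a release at 0.50 ns passes both checks.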
On-Chip Variations
Due to process variations, identical MOS transistors in different portions of the die
may not have similar characteristics. These differences are due to process variations
within the die. Note that the process parameter variations across multiple
manufactured lots can cover the entire span of process models from slow to fast.
These differences can arise due to many factors, including:
i. IR drop variation along the die area affecting the local power supply.
One important distinction with respect to the setup check of a flip-flop is that the data
to data setup check is performed on the same edge as the launch edge (unlike a
normal setup check of a flip-flop, where the capture clock edge is normally one cycle
away from the launch clock edge). Thus, the data to data setup checks are also
referred to as zero-cycle checks or same-cycle checks.
A clock gating check occurs when a gating signal can control the path of a clock
signal at a logic cell. An example is shown in Figure. The pin of the logic cell
connected to the clock is called the clock pin, and the pin to which the gating signal
is connected is the gating pin. The logic cell where the clock gating occurs is also
referred to as the gating cell.
Power Gating
Power gating involves gating off the power supply so that the power to the inactive
blocks can be turned off. This procedure is illustrated in Figure, where a footer (or a
header) MOS device is added in series with the power supply. The control signal
SLEEP is configured so that the footer (or header) MOS device is on during normal
operation of the block. Since the power gating MOS device (footer or header) is on
during normal operation, the block is powered and it operates in normal functional
mode.
Session Summary
Clock tree synthesis is one of the most important steps of IC design and can
have a significant impact on timing, power, area, etc.
Clock tree synthesis and optimization are iterative processes and can require
re-placement and rerouting several times in order to optimize clock tree
parameters.
The importance of CTS increases for technologies at 90 nm and below, especially
when low-power design techniques are applied, because they significantly change the
ratio of gate interconnects as well as the manner of building clock trees, depending
on their multi-level structures.