Agenda
Objective
Introduction to clock gating
Clock gating methodology
  Overview
  RTL synthesis
  Physical synthesis
  Clock tree synthesis
Summary of recommendations
Objective
Describe the clock gating methodology to meet target skew, insertion delay, and power
[Figure: clock gating circuit; signals D, EN, CLK]
Area savings
  Eliminating multiplexers saves area
Easy to implement
  No RTL code change is required
  Clock gating is automatically inserted by the tool
  Technology independent
Input RTL
Insert clock gating
Compile
Physical Compiler and Astro path:
  Physical Compiler: merge clock gates, placement and placement optimization
  Astro: replicate clock gates, clock tree synthesis, detail routing
IC Compiler path:
  Merge clock gates
  Placement and placement optimization
  Replicate clock gates [BETA]
  Clock tree synthesis
  Detail routing
Tool versions: Design Compiler X-2005.09, IC Compiler v1.1, Physical Compiler X-2005.09, Astro X-2005.09
RTL synthesis
  Methodology
  Clock gating considerations
Input RTL
RTL Synthesis
Minimum bitwidth
This is the minimum bitwidth of register banks that will be gated
By default, the minimum bitwidth is 3
No area or power benefit with register banks with bitwidth less than 3
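The minimum bitwidth rule can be illustrated with a small Python sketch. This is only a model of the rule as stated on the slide, not tool code; the function name and the register bank names and widths are hypothetical.

```python
# Hypothetical model of the minimum-bitwidth rule; this is not tool code.
DEFAULT_MIN_BITWIDTH = 3  # default value stated on the slide


def is_gating_candidate(bank_width, min_bitwidth=DEFAULT_MIN_BITWIDTH):
    """Return True when a register bank is wide enough to be clock gated."""
    return bank_width >= min_bitwidth


# Hypothetical register banks (each bank shares one enable): name -> bit width.
banks = {"status": 1, "mode": 2, "counter": 3, "data": 32}

gated = [name for name, width in banks.items() if is_gating_candidate(width)]
print(gated)  # ['counter', 'data']
```

Banks narrower than the threshold keep their original enable logic, since gating them gives no area or power benefit.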
[Figure: clock gates (CG) shown in the hierarchy Top and Module B; signals d1, d2, a, b, EN, clk]
Measure the Quality of Inserted Clock Gating: Report Power and Clock Gating
Use the report_power command
Cell Internal Power  = 160.6544 mW  (61%)
Net Switching Power  = 102.5581 mW  (39%)
                       --------
Total Dynamic Power  = 263.2125 mW (100%)
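The percentages in the report follow directly from the two power components. A short Python check, using only the numbers shown above (the variable names are illustrative):

```python
# Values copied from the report_power excerpt above, in mW.
internal_mw = 160.6544
switching_mw = 102.5581

total_mw = internal_mw + switching_mw
internal_pct = round(100 * internal_mw / total_mw)
switching_pct = round(100 * switching_mw / total_mw)

print(f"Total dynamic power: {total_mw:.4f} mW")
print(f"Internal: {internal_pct}%  Switching: {switching_pct}%")
```

Running the same arithmetic on reports taken before and after clock gating insertion gives a quick view of how the split between internal and switching power changes.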
Latch-based clock gates prevent a glitch on the enable from being propagated to the gated clock
[Figure: latch-based clock gate (signals D, EN, CLK, GCLK) with waveforms of CLK, EN, and GCLK]
No glitches on gated clock
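The glitch-blocking behavior can be reproduced with a zero-delay Python model. This is a sketch under stated assumptions: the latch is taken to be transparent while CLK is low (the usual arrangement for positive-edge registers), and the sampled waveforms are made up for illustration.

```python
def simulate(clk, en):
    """Return (and_only_gclk, latch_based_gclk) for sampled CLK and EN waveforms.

    Assumption: the latch is transparent while CLK is low and holds its
    value while CLK is high.
    """
    and_only, latch_based = [], []
    q = 0  # latch output
    for c, e in zip(clk, en):
        if c == 0:
            q = e  # transparent phase: latch output follows EN
        and_only.append(c & e)      # plain AND gate: EN glitches pass through
        latch_based.append(c & q)   # latch + AND: EN is frozen while CLK is high
    return and_only, latch_based


# Illustrative waveforms: EN glitches during the first high phase of CLK
# and drops briefly during the second high phase.
clk = [0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0]
en  = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1]

and_only, latch_based = simulate(clk, en)
print("AND only   :", and_only)
print("Latch based:", latch_based)
```

In this run the AND-only gate emits a spurious pulse in the first cycle and a chopped pulse in the second, while the latch-based gate emits no pulse in the first cycle and one full-width pulse in the second.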
Integrated clock gating cell
  No clock skew between latch and AND gate
  Timing analysis and CTS handle the clock gate automatically
  Setup and hold check modeled in library
  Easy to use in the flow
Discrete clock gate (separate latch and AND gate)
  Ensure minimum skew between latch and AND gate
  Specify latch clock pin as a non stop pin for CTS
  Specify the setup and hold time
  This adds complexity to the flow
Fewer clock gating cells
Better power reduction
More constrained enable
[Figure: ICGs with fanouts of 300, 108, 27, and 8]
Unbalanced clock structure
Depending on design skew requirement, may need processing for CTS QoR
fanout of each clock gate
Eliminate small fanout
Select the value based on your design
Experiments have shown that using a balanced fanout of 128 or 256 results in improved CTS QoR
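To get a feel for what a fanout limit implies, the following Python sketch computes how many clock gates a group of registers sharing one enable would need, and how evenly the registers could be spread. It is plain arithmetic rather than the tool's algorithm, and the function name is hypothetical.

```python
import math


def balanced_fanouts(num_registers, max_fanout):
    """Split registers that share one enable across the fewest clock gates
    that respect max_fanout, keeping the per-gate fanout as even as possible.

    Arithmetic illustration only; not the synthesis tool's algorithm.
    """
    if num_registers <= 0:
        return []
    gates = math.ceil(num_registers / max_fanout)
    base, extra = divmod(num_registers, gates)
    # 'extra' gates carry one more register than the rest.
    return [base + 1] * extra + [base] * (gates - extra)


print(balanced_fanouts(300, 128))  # [100, 100, 100]
print(balanced_fanouts(300, 256))  # [150, 150]
```

A tighter limit produces more clock gates with smaller, more uniform fanout, which is the trade-off between power and CTS QoR discussed above.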
Small fanout
To keep the clock gate and its register fanout together during placement, use
set physopt_disable_auto_bound_for_gated_clock false
Helps
Physical Synthesis
Gate-level design
Identify clock gates (identify_clock_gates)
Merge clock gates (merge_clock_gates)
Placement optimization
Clock tree synthesis
  Prepare your clock structure for CTS
  Replicate clock gates
[Figure: ICG fanout diagram; labeled fanouts include 300, 108, 60, 34, 31, 28, 25, and 8; enable signal EN2]
To enable insertion of always-enabled ICGs, use
set power_cg_all_registers true
[Figure: ICG fanout diagram; labeled fanouts include 108, 32, 31, 25, and 20]
Same engine used for clustering in clock tree synthesis and clock gate replication
Clock Tree Synthesis
Replicates clock gate with new instances using the same reference cell
Balances the fanout of clock gates based on design rule constraints
Considers the location of registers
In Astro, marks the output net of the clock gate as synthesized
  Astro CTS does not modify the net
  IC Compiler CTS checks the net for a DRC violation, but does not modify the net if it is DRC clean
Inserts buffers to drive registers that are not gated
The number of clock gates increases
  Clock gates are larger than clock buffers and consume more power
  Impact on power and area
The object_list is a list of instances or nets whose fanout is to be replicated
Enable sizing or relocation of ICGs
Creating Balanced Clock Fanout at RTL Versus Replicate Clock Gates Before CTS
Balanced clock fanout at RTL
  When? Insert clock gating at RTL synthesis.
  Why? CTS QoR is a priority. Enable pin timing is a priority.
Replicate clock gates
  When? Replicate clock gates before CTS.
  Why? Selected maximum fanout at RTL synthesis for maximum power savings. Need to preprocess clock structure to meet target skew.
  Based on: DRC at output of clock gate (includes input capacitance of registers and net capacitance). Clustering based on placement location.
Flow highlights
RTL synthesis
  Insert clock gating
  No max fanout constraint (default: unlimited)
  Insert always active clock gating cells
  No group bounds
Results
  Final skew: 141 ps
  Final power: 27 mW
Flow highlights
RTL synthesis
  Insert clock gating
  No max fanout constraint (default: unlimited)
  Insert always active clock gating cells
  No group bounds
Results
  Final skew: 91 ps
  Final power: 16 mW
IC Compiler only
Use clock gate optimization to optimize the timing of the enable pin after CTS
Summary
Understand the power and CTS requirements of your design
Choose the clock gating methodology based on your design requirements
  Use integrated clock gating
  Process the clock structure based on your CTS and power requirements
  Select the right fanout of clock gates during RTL synthesis
  Use merge and replication of clock gates only if necessary
Appendix
Sample scripts
Summary of clock gating methodologies
Overview of clock gating methodology using ASCII interchange format
How to handle enable signal timing
Equivalence checking in Formality
Clock gating and design-for-test
Details on replicate clock gates
Additional considerations with discrete clock gating
Sample DC Script
#Set clock gating options, max_fanout default is unlimited
set_clock_gating_style -sequential_cell latch \
    -positive_edge_logic {integrated} \
    -control_point before \
    -control_signal scan_enable

#Create a more balanced clock tree by inserting always enabled ICGs
set power_cg_all_registers true
set power_remove_redundant_clock_gates true

read_db design.gtech.db
current_design top
link
source design.cstr.tcl

#Insert clock gating
insert_clock_gating
compile

#Generate a report on clock gating inserted
report_clock_gating
#Replicate clock gates
split_clock_net object_list *latch* gate_sizing gate_relocation

#Clock tree synthesis and optimization
clock_opt
Example:
cell1 cell2
cell2 cell3
cell4 cell5
cell1, cell2, and cell3 are in the same class
cell4 and cell5 are in the same class
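The grouping in this example is a transitive closure over pairs: cell1 and cell3 end up in the same class because each is paired with cell2. A minimal Python sketch of that grouping, using a union-find structure, is shown below. It assumes each listed pair means the two cells belong together; the function name is hypothetical and this is not the tool's implementation.

```python
def equivalence_classes(pairs):
    """Group cells into classes, assuming each pair links two cells.

    Union-find sketch for illustration; not the tool's implementation.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two classes

    classes = {}
    for cell in list(parent):
        classes.setdefault(find(cell), []).append(cell)
    return sorted(sorted(members) for members in classes.values())


pairs = [("cell1", "cell2"), ("cell2", "cell3"), ("cell4", "cell5")]
print(equivalence_classes(pairs))
# [['cell1', 'cell2', 'cell3'], ['cell4', 'cell5']]
```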
Replicate clock gates before CTS.
Why? Selected maximum fanout at RTL synthesis for maximum power savings. Need to preprocess clock structure to meet target skew.
Based on: DRC at output of clock gate (includes input capacitance of registers and net capacitance). Clustering based on placement location.
Input RTL
Insert clock gating
Compile
Physical Compiler and Astro path:
  Identify clock gating cells
  Merge clock gates
  Placement and placement optimization
  Replicate clock gates (astSplitClockNet)
  Clock tree synthesis
  Detail routing
  Skew analysis
IC Compiler path:
  Identify clock gating cells
  Merge clock gates
  Placement and placement optimization
  Replicate clock gates [BETA] (split_clock_net)
  Clock tree synthesis
  Detail routing
  Skew analysis
[Figure: labels CLK, CG, Registers]
It can also be modeled by specifying a clock latency for the clock and then a modified clock latency for all the clock gate clock pins
set_clock_latency 1.7 CLK
  This is the delay seen at the input of any ungated register
set_clock_latency 1.1 $ICGClkInputPins
  This is the delay seen at the input of the clock gates
set_clock_latency 1.7 $ICGClkOutputPins
  This is the delay seen at the input of the gated registers
Formal Verification
The Synopsys formal verification tool, Formality, can perform equivalence checking when the design has inserted clock gating cells.
The following command instructs Formality to account for clock gating logic:
fm_shell > set verification_clock_gate_hold_mode any
[Figures: latch-based clock gating with and without a test control point; labels: Di, D, Q, enable logic, control logic, control point, EN, ENCLK, latch (G), flip-flops, CLK]
Complete Observability
[Figure: observe flop collecting EN1, EN2, EN3, and other observability nodes; labels: testmode, dataout, D, Q, EN, latch, CLK, unobservable point]
[Figure: clock gates (CG1) driving flip-flops (FF), with scan enable signals SE1, SE2, and SE3]
hookup_testports -se_port SE3
hookup_testports [-verbose] [-se_port port] [-tm_port port] [-se_pin pin] [-tm_pin pin]
Replication of ICG
Before replication: load on ICG: 2 pF
After replication: 8 ICGs, load on each ICG: 0.25 pF (< max cap of 0.3 pF)
DRC fixed on the output of each instance
In Astro, net is marked as synthesized
In IC Compiler, net is not marked as synthesized
Constraints
The replication of the specified instances is based on fixing DRC at the output of each instance.
The DRC constraints considered are maximum fanout, maximum capacitance, and maximum transition.
The tool converts maximum fanout and maximum transition into equivalent capacitance values, and uses the tightest of the three capacitance values as the maximum capacitance constraint.
Behavior
The tool splits the specified instance as many times as is necessary to fix the DRC on the output of each clock gate
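The constraint selection and the resulting split count can be approximated with a few lines of Python. This is a rough lower-bound estimate, not the tool's algorithm: how the tool converts fanout and transition limits into capacitance is not described here, so the equivalent values are taken as inputs, and the function and parameter names are hypothetical.

```python
import math


def min_replicas(total_load_pf, max_cap_pf,
                 fanout_equiv_cap_pf=None, transition_equiv_cap_pf=None):
    """Return (lower bound on clock gate copies, tightest capacitance limit).

    Assumptions: the equivalent capacitances for max fanout and max
    transition are supplied by the caller, and the load divides evenly
    among the copies. The real tool may use more copies.
    """
    limits = [c for c in (max_cap_pf, fanout_equiv_cap_pf, transition_equiv_cap_pf)
              if c is not None]
    tightest = min(limits)
    return math.ceil(total_load_pf / tightest), tightest


# Numbers from the replication example: 2 pF total load, 0.3 pF max cap.
copies, limit = min_replicas(2.0, 0.3)
print(copies, limit)   # 7 0.3
print(2.0 / 8 <= limit)  # the 8-ICG result (0.25 pF each) meets the limit: True
```

The estimate of 7 is only a lower bound; the 8 ICGs in the example satisfy the same limit with margin, since placement-based clustering and net capacitance also influence the final count.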
Solution
2000 registers
~120 ICGs
3000 registers
Load on each ICG < 0.35 pF
Fanout of each ICG ~ 25
Solution
~80 ICGs
Load on each ICG < 0.35 pF
3000 registers
Fanout of each ICG ~ 25
Solution
1000 registers
2 ICGs
Solution
1000 registers
200 registers
1000 registers
~15 ICGs
195 registers
195 registers
In Astro,
  Place the latch and AND gates close together
    Specify a large netweight on the net
  Get the clock to go through the latch, that is, ignore the CLK pin of the latch as a sync pin
    Use the astSetClockNonStop command
    Refer to SolvNet article 003097