Leakage Power
Growing impact of leakage power
Increase of leakage power due to scaling of transistor lengths and threshold voltages
Power budget limits use of fast leaky transistors
Challenge:
How to maintain performance scaling in face of increasing leakage power?
Observation:
Critical paths dominate leakage after applying SSST techniques
Example: PowerPC 750
5% of transistor width is low Vt, but these account for >50% of total leakage.
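The PowerPC 750 figures above imply a lower bound on how much more a low-Vt transistor leaks per unit width than the rest of the design. The sketch below works through that arithmetic; the function name is illustrative, and the 50% leakage share is used as the bound stated on the slide, not as a measured value.

```python
def per_width_leakage_ratio(low_vt_width_frac, low_vt_leak_frac):
    """Leakage per unit width of low-Vt devices divided by that of all other devices."""
    low = low_vt_leak_frac / low_vt_width_frac
    other = (1.0 - low_vt_leak_frac) / (1.0 - low_vt_width_frac)
    return low / other

# 5% of the width carrying 50% of the leakage (the slide says more than 50%,
# so the result is a lower bound on the ratio).
ratio = per_width_leakage_ratio(0.05, 0.50)
print(f"low-Vt devices leak at least {ratio:.0f}x more per unit width")
```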
Body Biasing
Vt increased by reverse-biased body effect
Large transition time and wakeup latency due to well capacitance and resistance
[Figure: transistor with body, drain, and source terminals.]
Power Gating
Sleep transistor between supply and virtual supply lines
Increased delay due to sleep transistor
[Figure: sleep signal, Vdd, virtual Vdd, and logic cells.]
Sleep Vector
Input vector which minimizes leakage
Increased delay due to mux and active energy due to spurious toggles after applying sleep vector
Outline
1. Methodology and DDFT Metrics
2. Cache Leakage Saving
   Idle subbank deactivation
4. Conclusion
Methodology
Process Technology
180nm DVT process modeled after 0.18um TSMC LVT and MVT processes
Scaled to 130, 100, and 70nm processes based on SIA roadmap
Optimistic/pessimistic leakage prediction: 2x/4x increase of leakage current density (nA/um)
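The optimistic and pessimistic predictions can be tabulated with a few lines of arithmetic. This is a minimal sketch that assumes the 2x/4x factor applies once per process generation and normalizes the 180nm leakage current density to 1.0; both are assumptions for illustration, not values from the slides.

```python
NODES_NM = [180, 130, 100, 70]  # process generations named on the slide

def project_leakage(base_density, growth_per_generation):
    """Leakage current density at each node, relative to the 180nm base value."""
    return {node: base_density * growth_per_generation ** i
            for i, node in enumerate(NODES_NM)}

optimistic = project_leakage(1.0, 2.0)   # assumed: 2x per generation
pessimistic = project_leakage(1.0, 4.0)  # assumed: 4x per generation

for node in NODES_NM:
    print(f"{node}nm: optimistic {optimistic[node]:g}x, pessimistic {pessimistic[node]:g}x")
```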
Energy measurements
Hspice simulation for 180nm process and scaled to other processes accordingly
LBB for Caches:
Local bitlines (32-bit cells) disconnected from senseamp by local-global switch.
If a subbank is not in use, turn off precharge transistors and delay precharging.
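The subbank rule above can be sketched as a small behavioral model. Everything here is a hypothetical illustration: the class name, the idle-cycle threshold used to decide that a subbank is "not in use", and the single-flag stall signal are assumptions, not the policy evaluated in the talk.

```python
class SubbankPrechargeModel:
    """Toy model: precharge is switched off for idle subbanks and restored on access."""

    def __init__(self, num_subbanks, idle_threshold):
        self.idle_threshold = idle_threshold        # assumed policy parameter
        self.idle_cycles = [0] * num_subbanks
        self.precharged = [True] * num_subbanks

    def tick(self, accessed):
        """Advance one cycle; `accessed` is a subbank index or None.

        Returns True if the access had to wait for a delayed precharge.
        """
        stalled = False
        for bank in range(len(self.precharged)):
            if bank == accessed:
                stalled = not self.precharged[bank]  # bitlines were left floating
                self.precharged[bank] = True
                self.idle_cycles[bank] = 0
            else:
                self.idle_cycles[bank] += 1
                if self.idle_cycles[bank] >= self.idle_threshold:
                    self.precharged[bank] = False    # turn off precharge transistors
        return stalled

model = SubbankPrechargeModel(num_subbanks=2, idle_threshold=2)
for bank in [0, 0, 1, 1]:
    print(bank, model.tick(bank))
```

In this toy trace, the first access to subbank 1 arrives after it has been idle for two cycles, so it reports a delayed precharge; whether that delay costs a cycle in a real design depends on the pipeline.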
[Figure: SRAM cells with BIT, BIT_BAR, and WL lines. Panel labels: Our Target, Forcing 0, Forcing 1, Forcing ?, Stay at 1.]
LBB lets bitlines float by turning off the local HVT NMOS precharge transistors
No static current draw because local bitline isolated
LBB uses leakage itself to bias bitlines to the voltage which minimizes leakage!
[Figure: Energy (pJ) versus length of sleep (cycles), Original versus LBB, at 180nm and 70nm.]
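An energy-versus-sleep-length comparison of this kind can be read as a break-even calculation: putting a structure to sleep costs some transition energy, and that cost is repaid only after enough cycles of reduced leakage. The sketch below assumes a simple linear model with hypothetical parameter names and values; none of the numbers are taken from the plots.

```python
import math

def break_even_cycles(transition_energy_pj, active_leak_pj_per_cycle, sleep_leak_pj_per_cycle):
    """Smallest sleep length (cycles) at which sleeping costs no more than staying active.

    Returns None if sleeping never saves leakage under this linear model.
    """
    saved_per_cycle = active_leak_pj_per_cycle - sleep_leak_pj_per_cycle
    if saved_per_cycle <= 0:
        return None
    return math.ceil(transition_energy_pj / saved_per_cycle)

# Hypothetical values chosen only to illustrate the calculation.
print(break_even_cycles(4.0, 0.5, 0.25))
```

Sleep periods shorter than the break-even length cost more energy than they save, which is why the transition energy of a technique matters as much as its steady-state leakage.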
Focus on I-Cache because any latency increase can be partly hidden by branch prediction
[Figure: Percentage (%) per benchmark across processes.]
Case 2 (worst) assumption (adding additional pipeline stage) 2.5% IPC decrease on average
[Figure: register file cell with write wordlines WWL[0:3] and read wordlines RWL[0:7].]
HVT transistors: green-colored
Simplified but active/leakage power-aware baseline
Comparison of DDFTs
32 x 32-b Regfile subbank
(75% zero assumed. Optimistic leakage current used.)
Process Tech. (nm)   Original (uW)   SV steady-state (uW)   LBB steady-state (uW)   NST steady-state (uW)
180                  177.9           2.0                    2.0                     1.8
130                  214.1           2.4                    2.4                     2.2
100                  263.6           3.0                    3.0                     2.7
70                   276.7           3.1                    3.1                     2.9
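The steady-state numbers can be turned into relative reductions with a short calculation. The sketch below assumes each row of the table is one process node with columns Original, SV, LBB, and NST, as the header order suggests; the dictionary layout and function name are illustrative.

```python
# Steady-state leakage power in uW, keyed by process node in nm (values from the table).
STEADY_STATE_UW = {
    180: {"Original": 177.9, "SV": 2.0, "LBB": 2.0, "NST": 1.8},
    130: {"Original": 214.1, "SV": 2.4, "LBB": 2.4, "NST": 2.2},
    100: {"Original": 263.6, "SV": 3.0, "LBB": 3.0, "NST": 2.7},
    70:  {"Original": 276.7, "SV": 3.1, "LBB": 3.1, "NST": 2.9},
}

def reduction_percent(node_nm, technique):
    """Percentage reduction in steady-state leakage power relative to Original."""
    row = STEADY_STATE_UW[node_nm]
    return 100.0 * (1.0 - row[technique] / row["Original"])

for node in STEADY_STATE_UW:
    parts = ", ".join(f"{t} {reduction_percent(node, t):.1f}%" for t in ("SV", "LBB", "NST"))
    print(f"{node}nm: {parts}")
```

These figures cover steady-state leakage only; they say nothing about the transition energy or wakeup latency of each technique.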
[Figure: Energy (pJ) for Original at 180nm and 70nm.]
[Figure: percent (%) per benchmark.]
[Figure: Total Energy Savings, percent (%) versus process (180, 130, 100, 70 nm), for NST Queue, NST Stack, LBB 16 regs/bank, and LBB 8 regs/bank.]
NST stack better than NST queue, LBB stack better than either NST
[Figure: Percentage (%) per benchmark at 180nm, 130nm, 100nm, and 70nm.]
More energy saving for wider issue processors
Read port deactivation can be combined with dead subbank deactivation.
Conclusion
Most leakage power is in critical paths
Dynamic leakage reduction (DDFT) desired
LBB allows fine-grain dynamic leakage reduction with zero or minimal performance penalty.
0% performance penalty for multiported regfiles
Follow on work:
Leakage-biased domino logic to save leakage power in critical ALUs [ VLSI Symposium 2002 ]
Acknowledgments
Thanks to Christopher Batten, Ronny Krashinsky, Rajesh Kumar, and anonymous reviewers
Funded by DARPA PAC/C award F30602-00-2-0562, NSF CAREER award CCR-0093354, and a donation from Infineon Technologies.
DDFT Examples
                             Body Biasing                      Power Gating                                 Sleep Vector
Steady-state leakage power   Less than 5% (depends on Vbody)   Less than 5% (depends on sleep transistor)   Less than 50% (depends on the circuit)
0.1~100us
No