
Abstract: Increasing area overhead is a major design concern in low-power subthreshold SRAM designs, due to stability considerations. Since power performance can only improve at the expense of large area and delay penalties, this project evaluates the trade-off between area and power-delay product for some representative subthreshold SRAM designs, including 6T, 8T, and 10T cell configurations. Analytical models for stability in subthreshold SRAM in deep-submicron technology are used to determine optimum transistor sizing for a given desired stability and supply voltage. Models for delay, power, and EOP are also given. Therefore, the trade-off between power, delay, and area for different designs can be investigated.
I. MOTIVATION
As electronics continue to be integrated into portable consumer devices, the demand grows not only for increased functionality, but also for long battery life and small physical size. This implies a need to balance ultra-low power with area-efficient design; examples include wristwatches and hearing aids. An obvious way to minimize SRAM energy per operation is to decrease VDD. This decreases active power (αCVDD²f) as well as leakage power. If VDD is decreased too sharply, however, the increased delay causes this leakage power to be integrated over a longer time interval, thus increasing the power-delay product (PDP). It has been shown that a minimum PDP corresponds to a supply located in the subthreshold region [4].
Implementing SRAM in subthreshold involves an explicit trade-off between stability and area. A typical 6T SRAM achieves the desired read/write margins by relying on ratioed current strengths set by transistor lengths/widths. But high sensitivity to VT process variations, as well as degraded Ion/Ioff ratios, renders these length/width-based ratios wholly unreliable for sub-VT SRAM. To increase read/write stability, extra peripheral circuitry and/or additions to the 6T memory cell design can be utilized, at the cost of increased area. This motivated us to investigate the area-performance trade-off for subthreshold SRAM designs.
II. PROBLEM STATEMENT
In order to optimize power, delay, and area in SRAM design, modeling of the memories is needed to characterize the behavior of the SRAM and help make design decisions before running SPICE simulations. Over the last decade, many models [5], [8] and tools [6], [7] have been proposed to predict SRAM performance. However, these models and tools are all based on the traditional 6T SRAM design operated in the superthreshold regime. Hence they do not consider stability, which is the major metric that trades off against area in subthreshold SRAM design. Therefore, in this paper stability is modeled and taken into account in the subthreshold SRAM performance trade-offs.
This paper compares the performance of the nominal 6T cell to the approaches taken by two representative sub-VT designs. Our goal is to determine the most area-efficient method of maintaining sub-VT SRAM read/write stability for applications requiring very low energy per operation.
III. SUB-VT SRAM DESIGNS
In this paper, the performance of two specific subthreshold SRAM designs [2], [4] is compared to the traditional 6T design. The design in [4] uses an 8T memory cell, which only marginally adds to the typical SRAM cell area. The extra two transistors act as a buffer that protects the stored data during a memory read. Typically in 6T SRAM, at the onset of a read, the '0' memory state is connected to a precharged bitline, which raises the node's voltage and reduces stability margins. The included buffer isolates this node from the bitline, thus allowing the read margin to equal the hold margin, which is typically much higher. Unfortunately, only a single word-line transistor, M8, blocks charge from leaking off RBL. High bitline leakage limits the number of rows that can connect to a single bitline, if the desired read current from a single row is to dominate the combined leakage from all other rows. The solution involves tying the feet of all unaccessed M7 buffers to VDD, driven through a buffer. This introduces small area and power overheads. In particular, the power overhead is small if each word is located on a single row, since only one foot must be discharged to read all the cells in a word. Since the foot of the row being read must source IREAD from all cells in the row, the pull-down strength of this buffer must be quite high. A charge pump is used to boost the buffer's input voltage to 2×VDD in order to provide such high current strength while allowing the buffer itself to be of minimum size.
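To make the row-count limit concrete, the following sketch estimates how many rows can share a read bitline if the accessed cell's read current must exceed the summed leakage of all other rows by a chosen margin. It is a rough first-order estimate, not the analysis of [4]: it assumes identical on and off devices with no DIBL, so that I_on/I_off = exp(VDD/(nV_T)). The slope factor n = 1.5, V_T = 25.9 mV, and the 10x margin are hypothetical values.

```python
import math


def max_rows_per_bitline(vdd, n=1.5, v_t=0.0259, margin=10.0):
    """Hypothetical estimate of how many rows can share one read bitline.

    Assumes the accessed row's read device is on (V_GS = VDD) while every
    other row leaks through an identical off device (V_GS = 0), with
    V_DS >> V_T so the (1 - exp(-V_DS/V_T)) factor is ~1 for both.
    'margin' is the assumed factor by which I_read must exceed total leakage.
    """
    on_off_ratio = math.exp(vdd / (n * v_t))  # I_on / I_off in sub-VT
    # Require I_on >= margin * (rows - 1) * I_off and solve for rows.
    return 1 + math.floor(on_off_ratio / margin)


for vdd in (0.25, 0.3, 0.35, 0.4):
    print(vdd, max_rows_per_bitline(vdd))
```

Because the on/off ratio grows exponentially with VDD, the row limit shrinks quickly as the supply is lowered; the buffer-foot scheme described above removes most of this leakage instead of shortening the bitline.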
Area-performance trade-offs in sub-threshold SRAM designs
EE241 Final Report
George Cramer (cramerg@eecs) and Ping-Chen Huang (pchuang@eecs)
Fig. 1. (a) 8T SRAM cell [4], (b) 10T SRAM cell [2].
Additional area overhead arises from the need to ensure write stability. The PMOS pull-up transistors are connected to a secondary supply, VVDD, which is lowered during a write in order to reduce the drive fight and ensure that a '0' can be successfully written. This technique requires that all cells connected to a given VVDD be written at the same time, since a lower VVDD drastically reduces hold margins. This causes a significant area overhead, since sense-amps and other column circuitry can no longer be shared, as would be expected with an interleaved column setup.
The design discussed in [2] uses a 10T memory cell. As with the 8T cell, the extra transistors are used as a buffer to maintain higher stability during read operations. The extra two transistors, M9 and M10, greatly reduce leakage current, both from VDD and from RBL. If node QB = '1', the high PMOS leakage (relative to NMOS) keeps QBB ≈ '1', which essentially eliminates bitline leakage. If QB = '0', QBB is held fully at '1' through the PMOS, once again yielding zero bitline leakage. In fact, the leakage is so low that a successful read can be distinguished even with 256 cells connected to a single bitline. This significantly reduces peripheral area, justifying the 10T design. Similar to [4], [2] uses a lower PMOS VDD to enable a negative write margin. In this case, VVDD is left floating during a write, so that the ground-tied bitline gradually pulls it down, weakening the pull-up PMOS until the write is successful.
Reference               [4]          [2]
Memory size             256 kb       256 kb
Area                    2.117 mm²    2.117 mm²
Technology node         65 nm        65 nm
Total power             2.2 μW       3.28 μW
Frequency               25 kHz       475 kHz
Supply                  350 mV       400 mV
Min. operating supply   350 mV       380 mV
Table I. Performance summary of SRAM designs [2], [4].
IV. PROPOSED COMPARISON/SOLUTION
There are four main performance metrics for any SRAM design: stability, delay, power, and area. Each can be expressed in terms of sizing and VDD. We assume a given constant stability for the three designs as the basis for comparison. As VDD scales down, the corresponding sizing for each design at a particular VDD can be calculated. Once the sizing is determined at a particular VDD, the power and delay can then be calculated or simulated. For subthreshold SRAM in particular, the ultimate goal is minimum overall power consumption, while the delay can be tolerated in the applications of interest. For this reason, our comparison does not seek to reduce delay specifically. Hence, the power-delay product, or energy per operation (EOP), will be the primary figure of merit in our analysis. The comparison proposed here will thus determine the area efficiency of a given design as a function of the desired EOP.
A. Modeling Stability
If stability is assumed to be constant for all designs, then the SRAM cell transistor sizes must be determined appropriately for a given supply voltage. This sizing can be determined through simulation, although this procedure is rather tedious and yields little intuition into what is really going on. Our approach was to express stability as a function of sizing and supply voltage, based on analytical expressions, and then use these expressions directly to determine transistor sizing in later simulations.
This paper models the hold, read, and write margins based on traditional butterfly plots.
a. Hold Margin
If VQ is low, VQB is high and VDS ≈ 0, VGS < 0 for M2. If VQ is high, VGS = 0 for both M2 and M3, but IPMOS > INMOS in sub-VT operation, so the current through M2 is negligible compared with that through M3. Thus, we may assume IM2 = 0 when calculating hold margin. Setting IM1 = IM3,
$$I_{S,1}\exp\left(\frac{V_Q-V_{TH,1}}{n_1V_T}\right)\left[1-\exp\left(-\frac{V_{QB}}{V_T}\right)\right]=I_{S,3}\exp\left(\frac{V_{DD}-V_Q-V_{TH,3}}{n_3V_T}\right)\left[1-\exp\left(-\frac{V_{DD}-V_{QB}}{V_T}\right)\right]$$
As shown in [10], solving for VQ yields:
$$V_Q=\frac{n_1n_3V_T}{n_1+n_3}\left[\ln\frac{I_{S,3}}{I_{S,1}}+\ln\frac{1-\exp\left(-\frac{V_{DD}-V_{QB}}{V_T}\right)}{1-\exp\left(-\frac{V_{QB}}{V_T}\right)}\right]+\frac{n_1\left(V_{DD}-V_{TH,3}\right)+n_3V_{TH,1}}{n_1+n_3}$$
Inverting this equation and then solving for SNMhold analytically is computationally intractable. However, for the regions of interest, using the provided 45 nm PTM BSIM model, it can be modeled as:
$$SNM_{hold}\ (\mathrm{V})=-0.0347+0.5\,V_{DD}$$
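Although a closed-form inverse is impractical, the hold expression above is strictly decreasing in V_QB, so it can be inverted numerically. The sketch below evaluates the closed-form V_Q(V_QB) and recovers the inverter transfer curve V_QB(V_Q) by bisection, which is the form needed to draw a butterfly plot. All device parameters are hypothetical, symmetric values chosen for illustration; they are not the 45 nm PTM values used in this paper.

```python
import math

# Hypothetical, symmetric device parameters (illustration only).
V_T = 0.0259           # thermal voltage kT/q at ~300 K (V)
VDD = 0.3              # supply voltage (V)
N1, N3 = 1.5, 1.5      # sub-threshold slope factors of M1 (NMOS) and M3 (PMOS)
VTH1, VTH3 = 0.4, 0.4  # threshold-voltage magnitudes (V)
IS1, IS3 = 1e-7, 1e-7  # current prefactors I_S,1 and I_S,3 (A)


def v_q_from_v_qb(v_qb):
    """Closed-form hold expression: the input V_Q that balances
    I_M1 = I_M3 for a given output V_QB (valid for 0 < v_qb < VDD)."""
    ratio = (1 - math.exp(-(VDD - v_qb) / V_T)) / (1 - math.exp(-v_qb / V_T))
    log_terms = math.log(IS3 / IS1) + math.log(ratio)
    return (N1 * N3 * V_T / (N1 + N3)) * log_terms + (N1 * (VDD - VTH3) + N3 * VTH1) / (N1 + N3)


def v_qb_from_v_q(v_q, iters=80):
    """Numerically invert v_q_from_v_qb by bisection.

    v_q_from_v_qb is strictly decreasing in v_qb, so the root is unique."""
    lo, hi = 1e-9, VDD - 1e-9
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if v_q_from_v_qb(mid) > v_q:
            lo = mid  # required input is too high, so the output must be higher
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Inverter transfer curve V_QB(V_Q) for one half of the butterfly plot.
vtc = [(v_q, v_qb_from_v_q(v_q)) for v_q in (0.0, 0.1, 0.15, 0.2, 0.3)]
print(vtc)
```

With symmetric devices the curve passes through (VDD/2, VDD/2). Evaluating v_qb_from_v_q over a grid of inputs, and mirroring it for the other inverter, gives the two lobes of the butterfly plot from which SNMhold is measured.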
b. Read Margin
If VQ is low, M2 has a low VDS, so IM2 << IM3, yielding the same equation as before. If VQ is high, M3 is turned off and IM2 >> IM3. Setting IM1 = IM2,
Fig. 2. (a) Hold stress, (b) read stress, (c) write stress.
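$$I_{S,1}\exp\left(\frac{V_Q-V_{TH,1}}{n_1V_T}\right)\left[1-\exp\left(-\frac{V_{QB}}{V_T}\right)\right]=I_{S,2}\exp\left(\frac{V_{DD}-V_{QB}-V_{TH,2}}{n_2V_T}\right)\left[1-\exp\left(-\frac{V_{DD}-V_{QB}}{V_T}\right)\right]$$
Solving,
$$V_Q=V_{TH,1}+n_1V_T\left[\ln\frac{I_{S,2}}{I_{S,1}}+\ln\frac{1-\exp\left(-\frac{V_{DD}-V_{QB}}{V_T}\right)}{1-\exp\left(-\frac{V_{QB}}{V_T}\right)}\right]+\frac{n_1}{n_2}\left(V_{DD}-V_{QB}-V_{TH,2}\right)$$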
Since no closed-form analytical solution for SNM exists [10], a least-squares fit to the implemented BSIM model is used instead; it models the simulated SNM very closely:
$$SNM_{read}(V_{DD})=-0.0133+0.2586\,V_{DD}+0.011\ln\frac{W_p}{W_n}-0.0201\ln\frac{W_a}{W_n}$$
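The read disturb that the read-stress equation captures can also be checked numerically without the closed form. The sketch below solves the current balance at node QB directly from the sub-threshold current model, once with the PMOS M3 as the pull-up (hold) and once with the access transistor M2 as the pull-up (read, with WL and the bitline at VDD), for a stored '0' on QB (V_Q = VDD). The device parameters are hypothetical, and the node assignment (M1 gate at Q, M2 and M3 driving QB) follows the equations above.

```python
import math

# Hypothetical parameters (illustration only): identical devices, 300 K, VDD = 0.3 V.
V_T = 0.0259
VDD = 0.3
I_S, V_TH, N = 1e-7, 0.4, 1.5


def i_sub(v_gs, v_ds):
    """Sub-threshold current I_S*exp((V_GS - V_TH)/(n V_T))*(1 - exp(-V_DS/V_T))."""
    return I_S * math.exp((v_gs - V_TH) / (N * V_T)) * (1 - math.exp(-v_ds / V_T))


def pull_down(v_q, v_qb):
    # M1: NMOS with gate at Q, drain at QB, source at ground.
    return i_sub(v_q, v_qb)


def pmos_pull_up(v_q, v_qb):
    # M3 (hold): PMOS with source at VDD, gate at Q, drain at QB.
    return i_sub(VDD - v_q, VDD - v_qb)


def access_pull_up(v_q, v_qb):
    # M2 (read): NMOS with gate (WL) and bitline at VDD, source at QB.
    # v_q is unused; it is kept so both pull-up functions share a signature.
    return i_sub(VDD - v_qb, VDD - v_qb)


def solve_v_qb(v_q, pull_up, iters=80):
    """Bisection on KCL at node QB: pull-up current equals pull-down current."""
    lo, hi = 0.0, VDD
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if pull_up(v_q, mid) > pull_down(v_q, mid):
            lo = mid  # pull-up still wins, so the node settles higher
        else:
            hi = mid
    return 0.5 * (lo + hi)


hold_low = solve_v_qb(VDD, pmos_pull_up)    # stored '0' on QB while holding
read_low = solve_v_qb(VDD, access_pull_up)  # same node during a read
print(f"hold: {hold_low * 1e3:.3f} mV, read: {read_low * 1e3:.1f} mV")
```

With these assumed values the read-disturbed '0' level sits roughly 20 mV above ground, while the hold level is essentially at ground; this raised level is what shrinks the read lobe of the butterfly plot.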
c. Write Margin
If VQ is low, M1 is off and M2 and M3 are on. If VQ is high, M3 is off and VQB ≈ 0. Therefore, solve for VQB by setting IM2 = IM3. Unlike the hold and read margin cases, using the sub-VT approximation for IM2 and IM3 does not yield an accurate solution for VQB. This is because the exponential behavior of ID(VGS) is accurate only for VGS < 200 mV, as shown in Fig. 3. This error, when applied to the drive fight between IM2 and IM3 at VQ = 0, yields a significantly different result for VQB.
Finding an accurate value of VQB depends on accurately modeling current in the moderate-inversion (near-VT) region, which is very difficult. With no other option, an expression for SNMwrite was developed by manually fitting simulation results:
the fitted SNMwrite depends on VDD, the supply ratio VDD2/VDD, and the width ratio Wa/Wp, with fitting constants 0.053, 0.463, 0.15, 0.1, and 0.5 and a max(·) term that caps the improvement, where VDD2 is the voltage seen at the source of M3.
Intuitively, the expression states that either lowering VDD2 or raising Wa/Wp will decrease the relative strength of M3, making a write easier to complete. However, this only works up to a point, since SNMwrite no longer increases once M2 completely overpowers M3.
The main obstacle to meeting stability constraints in sub-VT SRAM is VT variation. This is due to the very high sensitivity of current to VT in the subthreshold region. Thus, transistor size ratios alone will by no means ensure that stability requirements are met. However, VT variations are not considered in this paper, so we simply pick a high SNM (e.g., 150 mV), which we assume will continue to meet specifications for the desired 5σ to 6σ of variation.
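To illustrate the VT sensitivity mentioned above: with I proportional to exp(-VT/(nV_T)), a threshold shift ΔVT scales the sub-threshold current by exp(-ΔVT/(nV_T)). The sketch assumes n = 1.5, V_T = 25.9 mV, and a ±30 mV shift; these are hypothetical values, since the paper does not give σVT.

```python
import math


def current_ratio_for_vt_shift(delta_vt, n=1.5, v_t=0.0259):
    """Factor by which sub-threshold drain current changes when VT shifts by
    delta_vt (V). Since I ~ exp(-VT/(n*V_T)), the ratio is exp(-delta_vt/(n*V_T)).
    n and v_t are assumed values."""
    return math.exp(-delta_vt / (n * v_t))


# Hypothetical +/-30 mV shifts: a lower-VT (fast) and a higher-VT (slow) device.
fast = current_ratio_for_vt_shift(-0.030)
slow = current_ratio_for_vt_shift(+0.030)
print(f"fast/nominal = {fast:.2f}, slow/nominal = {slow:.2f}, fast/slow = {fast / slow:.2f}")
```

Under these assumptions a ±30 mV shift moves a single device's current by roughly 2.2x in either direction, so the current ratio between two nominally matched devices can swing by more than 4x, which is why sizing ratios alone cannot guarantee stability.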
B. Modeling Delay and Power
For a 6T SRAM cell, the read delay Td can be approximated as
$$T_d\approx\frac{C_{BL}\,\Delta V}{I_{Read}}$$
where C_BL is the bitline capacitance, ΔV is the input voltage difference required by the sense-amp, and I_Read is the read current.
$$I_{Read}=I_{0n}\exp\left(\frac{V_{dd}-V_{TN}}{nV_{th}}\right)\left[1-\exp\left(-\frac{V_{dd}}{V_{th}}\right)\right]$$
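The total power P_tot is
$$P_{tot}=\alpha C_{BL}\,\Delta V\,V_{dd}\,f+I_{leak}V_{dd}$$
where α is the activity rate, f = 1/(2Td), and I_leak is the leakage current supplied from V_dd:
$$I_{leak}=I_{0p}\exp\left(-\frac{V_{TN}}{nV_{th}}\right)\left[1-\exp\left(-\frac{V_{dd}}{V_{th}}\right)\right]$$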
Hence the EOP can be obtained, with Delay = 1/f = 2Td:
$$EOP=P_{tot}\times Delay=\alpha C_{BL}\,\Delta V\,V_{dd}+I_{leak}V_{dd}\times Delay$$
With C_BL = 20 fF, ΔV = 0.8Vdd, activity rate α = 1, and all minimum-sized devices, the analytical and simulated EOP of the traditional 6T cell are shown in Fig. 4. We cannot see a dip in this plot because α = 1, where leakage power is still small relative to dynamic power. As α decreases, leakage power starts coming into play and causes a local minimum in EOP.
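The delay, power, and EOP expressions above can be evaluated directly. The following sketch uses the values stated in the text (C_BL = 20 fF, ΔV = 0.8Vdd, f = 1/(2Td)) together with hypothetical device parameters (n = 1.5, V_th = 25.9 mV, V_TN = 0.4 V, I_0n = I_0p = 1 µA) chosen only to show the trend. Note that the (1 - exp(-V_dd/V_th)) factors cancel, so the leakage term reduces to 1.6 C_BL V_dd² (I_0p/I_0n) exp(-V_dd/(nV_th)).

```python
import math

# Hypothetical device parameters (assumptions, not values from the report).
V_TH_THERMAL = 0.0259  # thermal voltage V_th at ~300 K (V)
N_SLOPE = 1.5          # sub-threshold slope factor n
V_TN = 0.4             # NMOS threshold voltage (V)
I_0N = 1e-6            # read-current prefactor I_0n (A)
I_0P = 1e-6            # leakage prefactor I_0p (A)
# Value stated in the text.
C_BL = 20e-15          # bitline capacitance, 20 fF


def read_current(vdd):
    return I_0N * math.exp((vdd - V_TN) / (N_SLOPE * V_TH_THERMAL)) * (1 - math.exp(-vdd / V_TH_THERMAL))


def leakage_current(vdd):
    return I_0P * math.exp(-V_TN / (N_SLOPE * V_TH_THERMAL)) * (1 - math.exp(-vdd / V_TH_THERMAL))


def eop(vdd, alpha):
    """Energy per operation: EOP = P_tot * Delay, with Delay = 1/f = 2*T_d."""
    delta_v = 0.8 * vdd                       # sense-amp input swing
    t_d = C_BL * delta_v / read_current(vdd)  # read delay
    f = 1.0 / (2.0 * t_d)                     # operating frequency
    p_tot = alpha * C_BL * delta_v * vdd * f + leakage_current(vdd) * vdd
    return p_tot / f


vdds = [0.10 + 0.01 * i for i in range(41)]  # 0.10 V to 0.50 V
for alpha in (1.0, 0.01):
    energies = [eop(v, alpha) for v in vdds]
    best = min(range(len(vdds)), key=energies.__getitem__)
    print(f"alpha={alpha}: minimum EOP {energies[best]:.3e} J at VDD={vdds[best]:.2f} V")
```

With α = 1 the dynamic term dominates and EOP rises monotonically with V_dd over this sweep, consistent with the absence of a dip in Fig. 4; with α = 0.01 the leakage term becomes significant at low V_dd and an interior EOP minimum appears.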
V. ANALYSIS
Now that expressions for stability, delay, and power have been developed, it is possible to estimate the area versus EOP for each SRAM design. VVDD/VDD is assumed to be 0.8 for all cases. This is necessary to ensure a high SNMwrite in subthreshold, where PMOS is stronger than NMOS. First, we set bounds on stability: minimum SNMread = 80 mV and SNMwrite = 150 mV. Fig. 5 shows the simulated SNMread for several combinations of sizing and VDD, with the sizing picked using the SNM expressions developed in the previous section. SNMread consistently matches the expected value, with the exception of VDD = 0.3 V, where wp/wn ≈ 9. (Few SRAM designs would realistically have such a high size ratio, due to the high cost in area, so this data point is irrelevant in practice.) SNMread exceeds 80 mV for VDD = 0.5 V simply because the cell has minimum size and cannot be scaled down any further.
Fig. 3. ID as a function of VGS for both NMOS and PMOS.
Fig. 4. Analytical and simulated results of EOP versus Vdd for the 6T SRAM cell. [Figure: EOP versus Vdd (V) from 0.2 V to 0.3 V; HSPICE and MATLAB curves.]
For both the 8T and the 10T cells, the read stability margin is not an issue. Therefore, sizing is subject only to the write margin constraint. Fig. 6 shows the simulated SNMwrite as a function of VDD and sizing, with the sizing picked by setting SNMwrite = 150 mV in the expression developed in the previous section.
Once the sizing is determined at each VDD, the power, delay, EOP, and area can be obtained. Fig. 8 shows the power, delay, and EOP of the three designs. The 6T design has the smallest read delay, since its path from the internal node storing the data to the read bitline has the smallest equivalent resistance of the three designs. In our simulation setup, with α = 1, dynamic power dominates, so the 6T design, which runs at the highest frequency, has the largest power. The EOP of the 8T design is higher than that of the 10T design because the 8T design requires extra power to switch the buffer-foot inverter during each read. Fig. 9 shows the area versus EOP for the three cases. For low-EOP applications, the 6T design area must increase dramatically to meet both read and write stability requirements. Although the 10T design has more transistors, it is actually more area-efficient in the extremely low EOP regime. However, for only moderately low EOP, stability requirements are met even with minimum sizing. In this case, the 8T design requires less area.
VI. CONCLUSION
In this paper, models for stability, power, and delay are used to investigate the area-EOP trade-off for three representative subthreshold SRAM designs. Power, delay, and EOP for each design are compared as Vdd scales down. The 10T design has the smallest EOP and is the most area-efficient in the low-EOP region.
Fig. 5. Simulated SNMread for a desired SNMread = 80 mV, using the SNM model to determine sizing.
Fig. 6. Simulated SNMwrite for a desired SNMwrite = 150 mV, using the SNM model to determine sizing.
Fig. 7. Simulated SNMwrite for a desired SNMwrite ≥ 150 mV, using simulation results to determine sizing.
REFERENCES
[1] T. Kwon, D. Pavlidis, T. L. Brock, and D. C. Streit, "A D-band monolithic fundamental oscillator using InP-based HEMT's," IEEE Trans. on Microwave Theory and Tech., vol. 41, no. 12, Dec. 1993, pp. 2336-2344.
[2] B. H. Calhoun and A. P. Chandrakasan, "A 256-kb 65-nm sub-threshold SRAM design for ultra-low-voltage operation," IEEE Journal of Solid-State Circuits, vol. 42, no. 3, Mar. 2007, pp. 680-688.
[3] J. Chen, L. T. Clark, and T.-H. Chen, "An ultra-low-power memory with a subthreshold power supply voltage," IEEE Journal of Solid-State Circuits, vol. 41, no. 10, Oct. 2006, pp. 2344-2353.
[4] N. Verma and A. P. Chandrakasan, "A 256 kb 65 nm 8T subthreshold SRAM employing sense-amplifier redundancy," IEEE Journal of Solid-State Circuits, vol. 43, no. 1, Jan. 2008, pp. 141-149.
[5] B. Amrutur and M. Horowitz, "Speed and power scaling of SRAM's," IEEE Journal of Solid-State Circuits, vol. 35, no. 2, Feb. 2000, pp. 175-185.
[6] P. Shivakumar and N. P. Jouppi, "CACTI 3.0: An integrated cache timing, power, and area model," Aug. 2001.
[7] M. Mamidipaka and N. Dutt, "eCACTI: An enhanced power model for on-chip caches," Tech. Rep. CECS TR-04-28, Sep. 2004.
[8] B. Agrawal and T. Sherwood, "Guiding architectural SRAM models," International Conference on Computer Design, Oct. 2007, pp. 276-392.
[9] M. Q. Do, M. Drazdziulis, P. Larsson-Edefors, and L. Bengtsson, "Leakage-conscious architecture-level power estimation for partitioned and power-gated SRAM arrays," Proceedings of the 8th International Symposium on Quality Electronic Design, Mar. 2007, pp. 185-191.
[10] B. H. Calhoun and A. P. Chandrakasan, "Static noise margin variation for sub-threshold SRAM in 65-nm CMOS," IEEE Journal of Solid-State Circuits, vol. 41, no. 7, Jul. 2007, pp. 1673-1679.
Fig. 8. (a) Power and delay, and (b) EOP versus Vdd for the three SRAM designs. [Figure: (a) delay (ns) and power (nW) versus Vdd (V) from 0.25 V to 0.50 V; (b) EOP (fJ) versus Vdd (V); curves for 6T, 8T, and 10T.]
Fig. 9. Area versus EOP for the three SRAM designs. [Figure: total width (area) versus EOP (fJ); curves for 6T, 8T, and 10T.]