
MEMORY ORGANIZATION

SOLUTION
$F(S_0) = 1$, $F(S_1) = 1 - H(S_1) = 0.4$, $F(S_2) = 1 - H(S_2) = 0.2$, and $F(S_3) = 1 - H(S_3) = 0.1$

$$
\begin{aligned}
T &= F(S_0)\,t_1 + F(S_1)\,t_2 + F(S_2)\,t_3 + F(S_3)\,t_4 \\
  &= (1 \times 0.0001 + 0.4 \times 0.001 + 0.2 \times 50 + 0.1 \times 1000)\ \text{ms} \\
  &= (0.0001 + 0.0004 + 10.0 + 100)\ \text{ms} \\
  &= 110.0005\ \text{ms}
\end{aligned}
$$

The concept of hierarchical memory systems is popular in large mainframe computers such as the IBM 370 and the CDC 7600. For these systems, the memory hierarchy includes bipolar cache memory, MOS main memory, ferrite core memory, magnetic disks, and tapes. However, such a multilevel organization is not common in microcomputer systems.
As stated before, the actual performance of a hierarchical memory system is dependent on several factors, some of which are as follows:

• Sequence of addresses generated by the CPU.
• Number of bits per level.
• Size of the block transferred when the required address is not found in level L.
• Nature of policies that control the flow of information.

A rigorous mathematical analysis of the performance of a memory hierarchy is difficult because of the randomness associated with the hierarchical organization functions. However, satisfactory results may be obtained by employing simulation techniques. This topic is beyond the scope of this book.

5.6 CACHE MEMORIES

The performance of a computer system will be severely affected if the speed disparity between the processor and the main memory is significant. For example, the cycle time of the CDC STAR 100 processor is 40 nanoseconds, while the cycle time of its main memory is 1280 nanoseconds. In such a situation, the system performance can be significantly improved by adding a small, expensive, but fast buffer memory between the processor and the main memory. This buffer memory is called cache memory and was first implemented in the IBM 360/85 computer. Later, this concept became a part of minicomputers such as the PDP-11 and Data General's Eclipse. With the advent of VLSI technology, the cache concept is gaining acceptance in the micro world. The fourth-generation 32-bit VLSI microprocessors such as Motorola's MC68020 and Intel's 80386 include an on-board cache in their CPU.

Figure 5.15 Memory Organization of a Computer System that Employs a Cache Memory

The block diagram representation of a computer system that employs a cache is shown in Figure 5.15. Cache assumes the top level in the computer's memory system. Usually, the cache memory is much smaller than the main memory and 5 to 10 times faster.

Assume the following mode of operation: An address generated by the CPU is first sent to the cache. If this reference is found in the cache, a cache hit is said to occur, and the data pertaining to the CPU reference is transferred to the CPU from the cache. However, if the reference is not found in the cache, a cache miss is said to have occurred. When there is a cache miss, first the required data is transferred to the cache from the main memory, and then it is read by the CPU from the cache. Usually the data is transferred from the main memory to cache in blocks.

Typical block size is 4 to 64 words. When the cache is full, one of the existing blocks will be evicted using standard replacement policies such as first-in first-out (FIFO) or least recently used (LRU). These policies are discussed later in this chapter. This block transfer is carried out in anticipation that the block is very likely to be referenced again and again in the near future. Such an intuitive guess is strongly supported by many practical situations such as tight loops and repeated use of a subroutine or a data structure such as a symbol table corresponding to a source program.
The performance of a system that employs a cache can be formally analyzed as follows: If $t_c$, $h$, and $t_m$ specify the cache access time, hit ratio, and the main memory access time, respectively, then the average access time $\bar{t}$ can be determined as shown in equation 5.7:

$$\bar{t} = h\,t_c + (1 - h)(t_c + t_m) \tag{5.7}$$
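Equation 5.7 can be checked numerically with a few lines of Python. This is only an illustrative sketch: the function name `average_access_time` and its argument names are chosen here for convenience and do not come from the text.

```python
def average_access_time(h, t_c, t_m):
    """Average access time per equation 5.7.

    h   -- hit ratio, between 0 and 1
    t_c -- cache access time
    t_m -- main memory access time (same unit as t_c)
    """
    if not 0.0 <= h <= 1.0:
        raise ValueError("hit ratio must lie in [0, 1]")
    # A hit costs t_c; a miss costs t_c + t_m because both memories are accessed.
    return h * t_c + (1.0 - h) * (t_c + t_m)


# With h = 1 every reference is served by the cache alone.
print(average_access_time(1.0, 160, 960))
# With h = 0 every reference pays for the cache and the main memory.
print(average_access_time(0.0, 160, 960))
```

The two calls show the limiting cases: the average time ranges from $t_c$ (all hits) up to $t_c + t_m$ (all misses).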



The hit ratio $h$ always lies in the closed interval $[0, 1]$, and it specifies the relative number of successful references to the cache. Equation 5.7 is derived using the fact that when there is a cache hit, the main memory will not be accessed; and in the event of a cache miss, both main memory and cache will be accessed. Suppose the ratio of main memory access time to cache access time is $r$; then an expression for the efficiency of a system that employs a cache can be derived as follows:

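$$
\begin{aligned}
\text{Efficiency} = A &= \frac{t_c}{\bar{t}} \\
  &= \frac{t_c}{h\,t_c + (1 - h)(t_c + t_m)} \\
  &= \frac{1}{h + (1 - h)(1 + r)} \\
  &= \frac{1}{1 + r(1 - h)}
\end{aligned}
$$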
Note that $A$ is maximum when $h = 1$ (when all references are confined to the cache). A hit ratio of 90% ($h = 0.90$) is not uncommon with many contemporary systems.
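To see how sharply the efficiency depends on the hit ratio, the closed form $A = 1 / (1 + r(1 - h))$ can be tabulated. The sketch below is illustrative only; the function name `efficiency` and the sample value $r = 6$ are assumptions made for the demonstration.

```python
def efficiency(h, r):
    """Cache efficiency A = t_c / t_bar = 1 / (1 + r * (1 - h)).

    h -- hit ratio in [0, 1]
    r -- ratio of main memory access time to cache access time
    """
    return 1.0 / (1.0 + r * (1.0 - h))


# Tabulate A for a few hit ratios, with an assumed speed ratio r = 6.
for h in (0.5, 0.8, 0.9, 0.99, 1.0):
    print(f"h = {h:4.2f}  A = {efficiency(h, 6):.3f}")
```

Because $r$ multiplies the miss ratio $(1 - h)$, a slower main memory makes every lost percentage point of hit ratio more expensive.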
The following example works through these quantities numerically:

EXAMPLE 5-2
Calculate $\bar{t}$, $r$, and $A$ of a memory system whose parameters are as indicated:
$t_c = 160$ ns
$t_m = 960$ ns
$h = 0.90$

SOLUTION

$$
\begin{aligned}
\bar{t} &= h\,t_c + (1 - h)(t_c + t_m) \\
        &= 0.9(160) + (0.1)(960 + 160) \\
        &= 144 + 112 \\
        &= 256\ \text{ns}
\end{aligned}
$$

$$r = \frac{t_m}{t_c} = \frac{960}{160} = 6$$

$$A = \frac{1}{1 + r(1 - h)} = \frac{1}{1 + 6(0.1)} = \frac{1}{1.6} = 0.625$$

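This result indicates that, by employing a cache, the system achieves an efficiency of 62.5%: the average access time is 256 ns, compared with 960 ns for the main memory alone and 160 ns for the cache alone.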
Assume the unit of mapping is a block; then the relationship between the main and cache memory blocks can be established by using a specific mapping technique. Three mapping techniques are widely used:

• Fully associative mapping
• Direct mapping
• Set associative mapping

In a fully associative mapping scheme, a main memory block $i$ can be mapped to any cache block $j$, where

$$0 \le i \le M - 1 \quad \text{and} \quad 0 \le j \le N - 1$$

This situation is shown in Figure 5.16.

Figure 5.16 Fully Associative Mapping (main memory of M blocks, numbered 0 to M - 1; cache memory of N blocks, numbered 0 to N - 1)

From this figure, it is apparent that the main memory has $M$ blocks and the cache is divided into $N$ blocks. To determine which block of main memory is stored into the cache, a tag is required for each cache block. More formally:

Tag($j$) = address of the main memory block stored in the cache block $j$

Suppose $M = 2^m$ and $N = 2^n$; then $m$ and $n$ bits are required to specify the address of a main and cache memory block, respectively. Since a main memory block can be mapped to any cache block, the entire $m$ bits of a main memory block address have to be used as a tag. Since there are $N$ cache blocks, $N$ tags are needed. These tags can be either stored in the cache memory itself or separately stored in an associative memory called the tag directory.
In this scheme, when the CPU generates an address, the main memory block number is extracted (usually the high-order $m$ bits) and is then associatively compared with all $N$ tags stored in the tag directory for a match. If a match occurs, the corresponding cache block number is retrieved, and the cache is accessed for the required data.

If the associative search fails, then the main memory is accessed for the required data. This is a cache miss. A block of data is transferred to the cache, and the tag directory is updated accordingly. If there is no free space in the cache, then the incoming main memory block will replace an existing cache block. Such a replacement is carried out using a replacement policy such as LRU or FIFO (to be explained later). The tag directory is also updated to reflect this action.
The principal advantages of this method are its great flexibility and that the address translation process can be performed quickly because of the high-speed tag directory. However, the high cost associated with a tag directory limits the liberal implementation of this idea. An increase in the number of main memory blocks increases the size of the directory, because every tag must hold a longer block address. Similarly, when the cache is expanded, the tag directory hardware also increases. The number of comparators required to conduct an associative search is equal to the number of cache blocks.
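The lookup and replacement behavior described above can be imitated in software. The sketch below is a toy model, not a hardware description: the class name `FullyAssociativeCache` is hypothetical, a Python dictionary lookup stands in for the parallel associative search, and LRU is assumed as the replacement policy.

```python
from collections import OrderedDict


class FullyAssociativeCache:
    """Toy fully associative cache holding N blocks, with LRU replacement."""

    def __init__(self, n_blocks):
        self.n_blocks = n_blocks
        # Tag directory: maps a tag (the full main memory block number)
        # to the cached block. Insertion order tracks recency, oldest first.
        self.directory = OrderedDict()

    def access(self, block_number):
        """Return True on a cache hit, False on a cache miss."""
        tag = block_number  # the entire m-bit block number serves as the tag
        if tag in self.directory:
            self.directory.move_to_end(tag)  # mark as most recently used
            return True
        if len(self.directory) >= self.n_blocks:
            self.directory.popitem(last=False)  # evict the least recently used block
        self.directory[tag] = f"data of block {block_number}"
        return False


cache = FullyAssociativeCache(n_blocks=2)
results = [cache.access(b) for b in (0, 8, 0, 5, 8, 5)]
print(results)  # [False, False, True, False, False, True]
```

Any block may occupy any cache slot here, so the only misses are first references and references to blocks that LRU has already evicted.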
To reduce the hardware cost, the direct mapping technique is employed. A main memory block, $i$, is always mapped into the cache block $i \bmod N$. For this reason, this method is known as congruent mapping and is illustrated in Figure 5.17.

If $N = 2^n$ and $M = 2^m$, then $i \bmod N$ will be in the range of 0 to $2^n - 1$. This means that the low-order $n$ bits of the binary representation corresponding to the main block $i$ give the cache block number. This is shown in Figure 5.18.

Figure 5.18 shows that the high-order $m - n$ bits can be used as a tag to determine if a main block is stored in the cache memory. When the CPU generates an address, the low-order $n$ bits of the main memory block number field are used as the index to the tag directory, and the tag stored there is compared with the tag field of the specified main memory block number (high-order $m - n$ bits). If there is a match, the cache is accessed; otherwise the main memory is accessed. In the event of a cache miss, the incoming main memory block $i$ always replaces the cache block $i \bmod N$ because it cannot be mapped into any other cache block. Thus no replacement policy is called for.
However, the mapping restriction causes some problems. For example, let $i_1$ and $i_2$ be two main memory blocks such that $i_2 = N + i_1$. In this case, both blocks are mapped to the same cache block because

$$
\begin{aligned}
i_2 \bmod N &= (N + i_1) \bmod N \\
            &= N \bmod N + i_1 \bmod N \\
            &= 0 + i_1 \bmod N \qquad (\because\ N \bmod N = 0) \\
            &= i_1 \bmod N
\end{aligned}
$$

Even though there may be other vacant cache blocks, $i_2$ needs to be evicted to provide room for $i_1$, and vice versa. This situation leads to a drop in the hit ratio. This reduction will be significant if the processor frequently requires information from two main memory blocks that are a multiple of $N$ blocks away from one another. According to the locality of reference principle, such reference sequences are unlikely, and thus the direct mapping technique may not produce disastrous results all the time.

Figure 5.17 Direct Mapping (main memory block i maps to cache block i mod N)

Figure 5.18 Address Format under Direct Mapping Scheme (fields: tag, cache block number)
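The conflict between blocks that lie a multiple of $N$ apart is easy to reproduce in a small simulation. The following sketch is illustrative; the function name `direct_mapped_hits` and the reference sequences are assumptions, and an empty cache entry is modeled with `None`.

```python
def direct_mapped_hits(references, n_blocks):
    """Count cache hits for a sequence of main memory block numbers
    under direct mapping with N = n_blocks cache blocks."""
    tags = [None] * n_blocks  # one tag per cache block; None means empty
    hits = 0
    for i in references:
        index = i % n_blocks   # cache block number: i mod N
        tag = i // n_blocks    # remaining high-order part of the block number
        if tags[index] == tag:
            hits += 1
        else:
            tags[index] = tag  # the incoming block replaces the resident one
    return hits


N = 8
# Blocks 3 and 11 are N apart, so they compete for cache block 3.
print(direct_mapped_hits([3, 11] * 4, N))  # 0
# Blocks 3 and 4 map to different cache blocks and coexist.
print(direct_mapped_hits([3, 4] * 4, N))   # 6
```

In the first sequence every reference evicts the block needed next, although seven of the eight cache blocks remain empty.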
To achieve a compromise between fully associative and direct mapping methods, a technique called set associative mapping is employed. In this case, the cache blocks are divided into $N/S$ sets so there are $S$ cache blocks per set. In this approach, a main memory block $i$ will always be mapped to the cache set $i \bmod (N/S)$. However, within the set, the block can be placed anywhere. Therefore, direct mapping occurs at the set level, and fully associative mapping occurs within a set. A conceptual view of this idea is shown in Figure 5.19.
If $M = 2^m$, $N = 2^n$, and $S = 2^s$, then the set number lies in the range of 0 to $N/S - 1 = 2^{n-s} - 1$, and $n - s$ bits are required to specify it. Since $i \bmod (N/S)$ is also in the range of 0 to $N/S - 1$, the low-order $n - s$ bits of the main memory block address directly specify the cache set number. Therefore, the high-order $m - (n - s)$ bits of the main memory block address are treated as tag bits (see Figure 5.20).

If $s = 0$, this approach degenerates to direct mapping because each cache set contains only one block. When $s = n$, this method degenerates to fully associative mapping because the entire cache memory is treated as one set with $S = 2^n = N$ blocks per set. This method is practical because it reduces the size of the tag field from $m$ bits (fully associative) to $m - (n - s)$ bits. It is more flexible than direct mapping because there are $S$ blocks within a set (as opposed to just one block per set). For this reason, it is widely employed in many contemporary computer systems such as VAX and Amdahl computers (see Figure 5.21).

Figure 5.19 Set Associative Mapping (main memory block i maps to cache set i mod (N/S); sets are numbered 0 to N/S - 1, with S blocks per set)

Figure 5.20 Address Format under Set Associative Mapping Scheme (main memory block number of m bits, divided into a tag field and a set number field)
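The division of a main memory block number into tag and set number can be expressed with shifts and masks. The sketch below is illustrative: the function name `split_block_number` is hypothetical, and the sample block number and field widths are chosen only for the demonstration.

```python
def split_block_number(i, n, s):
    """Split main memory block number i into (tag, set_number).

    n -- log2 of the number of cache blocks (N = 2**n)
    s -- log2 of the number of blocks per set (S = 2**s), with 0 <= s <= n
    """
    if not 0 <= s <= n:
        raise ValueError("s must satisfy 0 <= s <= n")
    set_bits = n - s                          # width of the set number field
    set_number = i & ((1 << set_bits) - 1)    # low-order n - s bits: i mod (N/S)
    tag = i >> set_bits                       # remaining high-order bits
    return tag, set_number


i = 5461  # an arbitrary 13-bit block number
print(split_block_number(i, n=9, s=4))  # set associative: 5-bit set number
print(split_block_number(i, n=9, s=0))  # s = 0: direct mapping
print(split_block_number(i, n=9, s=9))  # s = n: fully associative, tag is i
```

With $s = 0$ the set number is simply the cache block number $i \bmod N$, and with $s = n$ the set number field vanishes so the whole block number becomes the tag, matching the two degenerate cases.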
IBM 370/168      1024    1
Amdahl 470V/9    2048    4
IBM 3033         1024    16
VAX-11/780       1024    2

Figure 5.21 Set Associative Cache Parameters of Typical Computer Systems

The following example is included to explain the mapping techniques discussed so far.

EXAMPLE 5-3
The parameters of a computer memory system are specified as follows:

• Main memory size = 8K blocks
• Cache memory size = 512 blocks
• Block size = 8 words

Determine the size of the tag field of the main memory address under the following conditions:

a) Fully associative mapping
b) Direct mapping
c) Set associative mapping with 16 blocks/set

SOLUTION
With the given data, compute the following:

• $M = 8\text{K} = 8192 = 2^{13}$, and thus $m = 13$.
• $N = 512 = 2^{9}$, and thus $n = 9$.
• Block size = 8 words = $2^3$ words, and thus we require 3 bits to specify a word within a block.

Using this information, we can determine the main and cache memory address formats as shown next:

Main memory address (16 bits):   | Block number (13 bits) | Word (3 bits) |
Cache memory address (12 bits):  | Cache block number (9 bits) | Word (3 bits) |

a) In this case, the size of the tag field is $m = 13$ bits:

Main memory address (16 bits):   | Tag (13 bits) | Word (3 bits) |

b) In this case, the size of the tag field is $m - n = 13 - 9 = 4$ bits:

Main memory address (16 bits):   | Tag (4 bits) | Cache block number (9 bits) | Word (3 bits) |

c) $S = 16 = 2^s$, and thus $s = 4$. Therefore, the size of the tag field is $m - n + s = 13 - 9 + 4 = 8$ bits:

Main memory address (16 bits):   | Tag (8 bits) | Set number (5 bits) | Word (3 bits) |
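The three tag sizes follow from the single expression $m - (n - s)$, with $S = N$ for fully associative mapping and $S = 1$ for direct mapping. The sketch below recomputes them; the function names `exact_log2` and `tag_field_bits` are hypothetical helpers introduced for this illustration.

```python
def exact_log2(x):
    """Return log2(x) for a positive power of two, else raise ValueError."""
    if x <= 0 or x & (x - 1):
        raise ValueError(f"{x} is not a positive power of two")
    return x.bit_length() - 1


def tag_field_bits(main_blocks, cache_blocks, blocks_per_set):
    """Tag width m - (n - s) for M main blocks, N cache blocks, S blocks per set."""
    m = exact_log2(main_blocks)
    n = exact_log2(cache_blocks)
    s = exact_log2(blocks_per_set)
    if s > n:
        raise ValueError("a set cannot hold more blocks than the cache")
    return m - (n - s)


M, N = 8192, 512
print(tag_field_bits(M, N, N))    # fully associative (S = N): 13
print(tag_field_bits(M, N, 1))    # direct mapping (S = 1): 4
print(tag_field_bits(M, N, 16))   # set associative, 16 blocks/set: 8
```

Treating the two extreme schemes as special cases of set associative mapping keeps the calculation in one place.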

If the desired information is not found in the cache, data is retrieved from the main memory, and a block of data is transferred from the main memory to the cache. When the CPU alters the contents of the cache, it is necessary to update its main memory copy. There are two ways an update operation can be carried out.

In the first approach, whenever the CPU writes something into a cache block, that block is tagged as a dirty block. When a dirty block is to be replaced with a new block, the dirty block is copied into the main memory before it is overwritten by the incoming new block. This method is called write-back. The virtue of this policy is that it avoids unnecessary writing into main memory.

In the second method, whenever the CPU alters a cache address, the same alteration is made in the main memory copy of the altered cache address immediately. This technique is known as the write-through method. This policy can be easily implemented, and it ensures that the contents of the main memory are always valid. This feature is desirable in a multiprocessor system where the main memory is shared by several processors. However, this approach may lead to several unnecessary writes of a block to main memory before it is replaced in the cache.
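The difference in main memory traffic between the two policies can be counted on a small trace. The sketch below makes several simplifying assumptions that are not in the text: the cache holds a single block, a write miss first brings the block into the cache, and any dirty block is flushed when the trace ends. The function name `count_memory_writes` is hypothetical.

```python
def count_memory_writes(trace, policy):
    """Count main memory write operations for a one-block cache.

    trace  -- list of (op, block) pairs, op being "read" or "write"
    policy -- "write-through" or "write-back"
    """
    if policy not in ("write-through", "write-back"):
        raise ValueError("unknown policy")
    resident = None   # block currently held in the cache
    dirty = False     # dirty flag of the resident block (write-back only)
    memory_writes = 0
    for op, block in trace:
        if block != resident:
            # Replacement: a dirty block is copied back before being overwritten.
            if policy == "write-back" and dirty:
                memory_writes += 1
            resident, dirty = block, False
        if op == "write":
            if policy == "write-through":
                memory_writes += 1   # main memory is updated immediately
            else:
                dirty = True         # the update is deferred until replacement
    if policy == "write-back" and dirty:
        memory_writes += 1           # assumed final flush
    return memory_writes


trace = [("write", 0)] * 5 + [("read", 1)]
print(count_memory_writes(trace, "write-through"))  # 5
print(count_memory_writes(trace, "write-back"))     # 1
```

Five successive writes to the same block cost five main memory updates under write-through but a single copy-back under write-back. Note that a copy-back moves a whole block, so the counts compare operations rather than words transferred.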
One of the important aspects of cache-memory organization design is to devise a method that ensures proper use of the cache. Usually, the tag directory contains an extra bit for each entry. This additional bit is called a valid bit. When the power is turned on, the valid bit corresponding to each cache block entry of the tag directory is reset to zero. This is done to indicate that the cache block holds invalid data. When a block of data is first transferred from the main memory to a cache block, the valid bit corresponding to this cache block is set to 1. In this arrangement, whenever the valid bit is 0, the cache access should be disabled, and such a situation is to be considered a cache miss.
The following example illustrates the effect of cache management policies discussed so far.

EXAMPLE 5-4
The access time of a cache memory is 50 ns and that of the main memory is 500 ns. It is estimated that 80% of the main memory requests are for read and the remaining are for write. The hit ratio for read access only is 0.9 and a write-through policy is used.
a) Determine the average access time considering only the read cycles.
b) What is the average time if the write requests are also taken into consideration?

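SOLUTION
a)
$$
\begin{aligned}
\bar{t} &= h\,t_c + (1 - h)(t_c + t_m) \\
        &= 0.9 \times 50 + (0.1 \times 550) \\
        &= 45 + 55\ \text{ns} \\
        &= 100\ \text{ns}
\end{aligned}
$$

b) $\bar{t}_{\text{read/write}} = (\text{read-request probability}) \times \bar{t}_{\text{read}} + (1 - \text{read-request probability}) \times \bar{t}_{\text{write}}$

Read-request probability = 0.8
Write-request probability = 0.2
$\bar{t}_{\text{read}} = \bar{t} = 100$ ns (result of part (a))
$\bar{t}_{\text{write}} = 500$ ns (because both main and cache memories are updated at the same time)

$$
\begin{aligned}
\bar{t}_{\text{read/write}} &= 0.8 \times 100 + 0.2 \times 500 \\
                            &= 80 + 100\ \text{ns} \\
                            &= 180\ \text{ns}
\end{aligned}
$$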
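The same calculation can be parameterized to see how the read fraction affects the result. This is a sketch under the example's own assumption that a write-through write costs one main memory access time; the function name `average_time_with_writes` is hypothetical.

```python
def average_time_with_writes(read_fraction, h_read, t_c, t_m):
    """Average access time with a write-through policy.

    read_fraction -- probability that a request is a read
    h_read        -- hit ratio for read accesses
    t_c, t_m      -- cache and main memory access times (same unit)
    """
    # Reads follow equation 5.7.
    t_read = h_read * t_c + (1.0 - h_read) * (t_c + t_m)
    # Writes update cache and main memory together, so they cost t_m.
    t_write = t_m
    return read_fraction * t_read + (1.0 - read_fraction) * t_write


print(round(average_time_with_writes(0.8, 0.9, 50, 500), 6))  # 180.0
print(round(average_time_with_writes(1.0, 0.9, 50, 500), 6))  # 100.0, reads only
```

Even with a 90% read hit ratio, the 20% of requests that are writes nearly double the average access time, which is the cost of keeping main memory always valid.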

The growth in IC technology has allowed manufacturers to fabricate a small cache on the CPU chip. The on-chip cache of Motorola's 32-bit microprocessor, the MC68020, is discussed next.

The MC68020 is an HMOS (high-density MOS) high-performance two-level microprogrammed microprocessor. Motorola claims it is the first 32-bit microprocessor to have evolved from a 16-bit machine to a full 32-bit machine that provides 32-bit address
