Computer Organization
Lecture 4
Caches and Memory Systems
Dr. Yuan-Shun Dai
Computer Science 504
Fall 2004
Write Policy: Write-Through vs Write-Back
Write-through: all writes update both the cache and the underlying memory (or lower-level cache)
Cached data can always be discarded, since the most up-to-date data is in memory
Cache control bit: only a valid bit
Write-back: writes update only the cache; a modified block is written to memory when it is replaced
Other advantages:
Write-through:
Memory (or other processors) always has the latest data
Simpler management of the cache
Write-back:
Much lower bandwidth, since data is often overwritten multiple times
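The bandwidth argument can be illustrated with a toy model. The sketch below is not from the lecture: the function `memory_writes`, the direct-mapped write-allocate cache, and the single dirty bit per block are all assumptions made for illustration. It counts write transactions that reach memory, not bytes transferred.

```python
def memory_writes(stores, num_blocks, policy):
    """Count write transactions reaching memory for a sequence of stores.

    `stores` is a list of block addresses being written. The cache is a toy
    direct-mapped, write-allocate cache with `num_blocks` block frames.
    `policy` is "through" (write-through) or "back" (write-back).
    """
    if policy not in ("through", "back"):
        raise ValueError("policy must be 'through' or 'back'")
    cache = {}   # frame index -> (block address, dirty flag)
    writes = 0
    for block in stores:
        idx = block % num_blocks
        resident = cache.get(idx)
        # Replacing a different, dirty block forces a write-back.
        if resident is not None and resident[0] != block and resident[1]:
            writes += 1
        if policy == "through":
            writes += 1                   # every store goes to memory
            cache[idx] = (block, False)   # cached copy is never dirty
        else:
            cache[idx] = (block, True)    # only the cache is updated
    # Flush whatever is still dirty at the end of the run.
    writes += sum(1 for _, dirty in cache.values() if dirty)
    return writes


stores = [0, 0, 0, 0, 1, 1, 0]   # block 0 and block 1 are overwritten repeatedly
print(memory_writes(stores, 4, "through"))
print(memory_writes(stores, 4, "back"))
```

With repeated stores to the same blocks, the write-back count depends on the number of distinct dirty blocks rather than on the number of stores.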
Write Buffer
[Figure: write buffer in front of DRAM (or lower-level memory)]
[Figure: per-benchmark chart with series 0->1, 1->2, 2->64, and Base; vertical axis from 0.2 to 2. Integer: compress, xlisp, espresso, eqntott. Floating point: ora, spice2g6, nasa7, alvinn, hydro2d, mdljdp2, wave5, su2cor, doduc, swm256, tomcatv, fpppp, ear, mdljsp2.]
FP programs on average: AMAT= 0.68 -> 0.52 -> 0.34 -> 0.26
Int programs on average: AMAT= 0.24 -> 0.20 -> 0.19 -> 0.19
8 KB Data Cache, Direct Mapped, 32B block, 16 cycle miss
Definitions:
Local miss rate: misses in this cache divided by the total number of memory accesses to this cache (Miss Rate_L2)
Global miss rate: misses in this cache divided by the total number of memory accesses generated by the CPU (Miss Rate_L1 x Miss Rate_L2)
Global miss rate is what matters
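The two definitions can be checked numerically. The following is a minimal sketch with made-up access counts (the function name `miss_rates` and the numbers are hypothetical, not from the lecture); it shows that the global L2 miss rate equals the product of the L1 miss rate and the local L2 miss rate.

```python
import math


def miss_rates(cpu_accesses, l1_misses, l2_misses):
    """Return (L1 miss rate, local L2 miss rate, global L2 miss rate).

    Every L1 miss becomes an access to L2, so the local L2 rate is
    measured against l1_misses, while the global rate is measured
    against all accesses generated by the CPU.
    """
    l1_rate = l1_misses / cpu_accesses
    l2_local = l2_misses / l1_misses
    l2_global = l2_misses / cpu_accesses
    return l1_rate, l2_local, l2_global


# Hypothetical counts: 1000 CPU accesses, 40 miss in L1, 20 of those miss in L2.
l1_rate, l2_local, l2_global = miss_rates(1000, 40, 20)
print(l1_rate, l2_local, l2_global)
assert math.isclose(l2_global, l1_rate * l2_local)
```

In this example the local L2 miss rate looks high (half of L2 accesses miss), yet only a small fraction of all CPU accesses go to memory, which is why the global figure is the one to compare.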
[Figure: DRAM organization: address buffer (A0...A10), row decoder, 2,048 x 2,048 memory array, word line, bit line, storage cell]
DRAM Performance
A 60 ns (tRAC) DRAM can
perform a row access only every 110 ns (tRC)
perform a column access (tCAC) in 15 ns, but the time between column accesses is at least 35 ns (tPC)
In practice, external address delays and bus turnaround make it 40 to 50 ns
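The distinction between access time and cycle time can be made concrete with a short calculation. This sketch uses only the four timing values quoted above; the constant and function names are illustrative. It computes the sustained access rate, which is set by the cycle times (tRC, tPC) rather than by the access times (tRAC, tCAC).

```python
# Timing parameters quoted in the text, in nanoseconds.
T_RAC_NS = 60    # row access time
T_RC_NS = 110    # row cycle time: minimum spacing between row accesses
T_CAC_NS = 15    # column access time
T_PC_NS = 35     # minimum spacing between column accesses


def accesses_per_second(cycle_time_ns):
    """Sustained access rate when one access can start every cycle_time_ns."""
    return 1e9 / cycle_time_ns


row_rate = accesses_per_second(T_RC_NS)
col_rate = accesses_per_second(T_PC_NS)
print(f"row accesses:    {row_rate / 1e6:.2f} million per second")
print(f"column accesses: {col_rate / 1e6:.2f} million per second")
```

A rate derived from tRAC alone (one access per 60 ns) would overstate what the part can sustain, since a new row access cannot begin until tRC has elapsed.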
Wide: CPU/Mux 1 word; Mux/Cache, Bus, Memory N words (Alpha: 64 bits & 256 bits; UltraSPARC: 512 bits)
Interleaved: CPU, Cache, Bus 1 word; Memory N modules (4 modules); example is word interleaved
Simple M.P. = 4 x (1 + 6 + 1) = 32
Wide M.P. = 1 + 6 + 1 = 8
Interleaved M.P. = 1 + 6 + 4 x 1 = 11
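The three miss penalties can be reproduced with a small function. The timing model is inferred from the formulas above rather than stated in the text: 1 cycle to send the address, 6 cycles of access time, 1 cycle to transfer a word, and a 4-word block. The function name and parameter names are illustrative.

```python
def miss_penalty(organization, words=4, addr=1, access=6, transfer=1):
    """Miss penalty in cycles for fetching a block of `words` words.

    simple:      each word pays the full address + access + transfer cost
    wide:        the whole block is accessed and transferred at once
    interleaved: the modules are accessed in parallel after one address,
                 then the words are transferred one per cycle
    """
    if organization == "simple":
        return words * (addr + access + transfer)
    if organization == "wide":
        return addr + access + transfer
    if organization == "interleaved":
        return addr + access + words * transfer
    raise ValueError(f"unknown organization: {organization}")


for org in ("simple", "wide", "interleaved"):
    print(org, miss_penalty(org))
```

Interleaving recovers most of the benefit of the wide organization while keeping the bus and cache one word wide.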
Example
Suppose a 1-word block size with the simple memory organization has a 3% miss rate, with memory accesses per instruction = 1.2 and hit time = 2 cycles. On a miss, it takes 4 cycles to send the address, 56 cycles of access time, and 4 cycles to send a word.
Then CPI = 2 + (1.2 x 3% x (4 + 56 + 4)) = 4.30
If a 2-word block size has a 2% miss rate and a 4-word block size has 1.2%,
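The one-word calculation above can be checked with a few lines of code. This is a sketch covering only the 1-word, simple-organization case stated in the example; the function name `cpi` and its parameter names are illustrative.

```python
def cpi(base_cycles, accesses_per_instr, miss_rate, miss_penalty):
    """Cycles per instruction including memory stall cycles."""
    return base_cycles + accesses_per_instr * miss_rate * miss_penalty


# 1-word block: 4 cycles to send the address, 56 access, 4 to send the word.
one_word_penalty = 4 + 56 + 4
result = cpi(2, 1.2, 0.03, one_word_penalty)
print(f"CPI = {result:.2f}")
```

The stall term is 1.2 x 0.03 x 64 = 2.304 cycles per instruction, so memory stalls more than double the base value of 2.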
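[Figure: organizations of CPU, translation buffer (TB), cache ($), and memory (MEM), showing where virtual addresses (VA) and physical addresses (PA) are used; one panel is labeled "Conventional Organization"]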
Solution to aliases
If the address bits guaranteed to match across aliases cover the index field and the cache is direct mapped, the aliased blocks must be unique; this is called page coloring
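The page coloring condition can be sketched in code. Everything here is an assumption for illustration: the helper names `cache_index` and `same_color`, the 64 KB direct-mapped cache, the 32-byte block, and the example addresses. The sketch shows that two aliases agreeing in the low-order bits that cover the index and block offset map to the same block frame, so a direct-mapped cache can hold only one copy.

```python
def cache_index(addr, cache_bytes, block_bytes):
    """Block frame that `addr` maps to in a direct-mapped cache."""
    return (addr // block_bytes) % (cache_bytes // block_bytes)


def same_color(addr1, addr2, cache_bytes):
    """True if both addresses agree in the bits covering index + block offset."""
    return addr1 % cache_bytes == addr2 % cache_bytes


CACHE_BYTES = 64 * 1024   # assumed direct-mapped cache size
BLOCK_BYTES = 32          # assumed block size

alias_a = 0x12340
alias_b = 0x52340         # differs from alias_a only above the index bits
other = 0x13340           # differs from alias_a inside the index bits

print(same_color(alias_a, alias_b, CACHE_BYTES),
      cache_index(alias_a, CACHE_BYTES, BLOCK_BYTES),
      cache_index(alias_b, CACHE_BYTES, BLOCK_BYTES))
print(same_color(alias_a, other, CACHE_BYTES),
      cache_index(other, CACHE_BYTES, BLOCK_BYTES))
```

If two aliases did not share those bits, they could occupy two different frames at once, and a write to one copy would leave the other stale.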
[Figure: address fields (page offset, address tag, index, block offset) and a datapath from CPU through TB and cache tags to L2 $ and MEM, with VA and PA labels]
[Figure: R4000 pipeline diagrams with stages IF, IS, RF, EX, DF, DS, TC, WB, showing a two-cycle load latency and a three-cycle branch latency (conditions evaluated during EX phase)]
R4000 Performance
Note ideal CPI of 1:
[Figure: CPI chart (axis 0 to 4.5) for tomcatv, su2cor, spice2g6, ora, nasa7, doduc, li, gcc, espresso, eqntott; components: Base, Load stalls, Branch stalls, FP result stalls, FP structural stalls]
[Table: summary with columns miss rate (MR), miss penalty (MP), hit time (HT), and complexity (rated 0 to 3); entries are marked with "+"]