2013
COMPUTER ARCHITECTURE
CSE Fall 2013
Faculty of Computer Science and Engineering
Department of Computer Engineering
BK TP.HCM
Vo Tan Phuong
http://www.cse.hcmut.edu.vn/~vtphuong
Chapter 5
Memory
Multilevel Caches
Random Access
Access time is practically the same for any data location on a RAM chip
Control signals: OE and WE
WE specifies a write operation
Cell Implementation:
Cross-coupled inverters store bit
Row decoder: selects the row to read/write, driven by the r-bit row address
Column decoder: selects the column to read/write, driven by the c-bit column address
Cell matrix: 2D array of tiny memory cells, 2^r × 2^c × m bits
Row latch: holds the selected row, 2^c × m bits
Sense/write amplifiers: sense and amplify data on read; drive the bit line with data in on write
The same m data lines are used for data in/out
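The row/column organization above can be illustrated with a minimal Python sketch. It assumes the upper r address bits feed the row decoder and the lower c bits feed the column decoder; that bit assignment and the function names `split_address` and `capacity_bits` are illustrative choices, not taken from the slides.

```python
def split_address(addr: int, r: int, c: int) -> tuple:
    """Split an (r + c)-bit cell address into (row, column).

    Assumption: the upper r bits select the row and the lower c bits
    select the column. Real chips may assign the bits differently.
    """
    if not 0 <= addr < (1 << (r + c)):
        raise ValueError("address does not fit in r + c bits")
    row = addr >> c               # upper r bits go to the row decoder
    col = addr & ((1 << c) - 1)   # lower c bits go to the column decoder
    return row, col


def capacity_bits(r: int, c: int, m: int) -> int:
    """Total capacity of a 2^r x 2^c x m-bit cell matrix, in bits."""
    return (1 << r) * (1 << c) * m


# Example: r = 3 row bits, c = 2 column bits, m = 8 bits per location
print(split_address(0b10110, r=3, c=2))
print(capacity_bits(r=3, c=2, m=8))
```

With r = 3 and c = 2, address 10110 splits into row 101 and column 10, and the array holds 8 × 4 × 8 bits.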
Performance Gap
DRAM: performance improves about 7% per year
Memory hierarchy: levels get faster toward the processor and bigger away from it
Main Memory (on the memory bus): access time 50 – 100 ns
Disk Storage (> 200 GB, on the I/O bus): magnetic or flash disk, access time 5 – 10 ms
Goal is to achieve
Fast speed of cache memory access
Balance the cost of the memory system
[Figure: pipelined datapath with PC, Register File, ALU, I-Cache, and D-Cache. On an I-Cache miss or D-Cache miss, the block address is sent out and the instruction block or data block is returned.]
An I-Cache miss or D-Cache miss causes the pipeline to stall
[Figure: a cache with 8 blocks (indices 000 to 111) and a main memory with 32 blocks (addresses 00000 to 11111)]
In this example:
Cache index = least significant 3 bits of the memory address
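The mapping in this example (cache index = least significant 3 bits of the memory address) can be sketched in Python as follows. The names `cache_index` and `blocks_mapping_to` are hypothetical helpers introduced only for illustration; the sizes (8 cache blocks, 32 memory blocks) follow the 3-bit and 5-bit addresses in the example.

```python
NUM_CACHE_BLOCKS = 8    # 3-bit cache index: 000 .. 111
NUM_MEMORY_BLOCKS = 32  # 5-bit memory address: 00000 .. 11111


def cache_index(address: int) -> int:
    """Direct-mapped placement: keep the least significant 3 bits."""
    return address % NUM_CACHE_BLOCKS  # same as address & 0b111


def blocks_mapping_to(index: int) -> list:
    """All memory addresses that compete for one cache index."""
    return [a for a in range(NUM_MEMORY_BLOCKS) if cache_index(a) == index]


# Every cache index is shared by 32 / 8 = 4 memory addresses.
for addr in blocks_mapping_to(0b101):
    print(format(addr, "05b"), "->", format(cache_index(addr), "03b"))
```

Addresses 00101, 01101, 10101, and 11101 all map to cache index 101, which is why a direct-mapped cache must also store a tag to tell them apart.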
Computer Architecture – Chapter 5 ©Fall 2013, CS 28
Direct-Mapped Cache
A memory address is divided into
Block address: identifies block in memory
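As a rough illustration of how a block address is obtained from a byte address, the sketch below splits an address into a block address and the byte position inside the block. The 16-byte block size and the name `split_byte_address` are assumptions made for this example, not values from the slides.

```python
BLOCK_SIZE = 16  # bytes per block (assumed; must be a power of two)


def split_byte_address(address: int, block_size: int = BLOCK_SIZE) -> tuple:
    """Return (block_address, offset_within_block) for a byte address."""
    block_address = address // block_size  # identifies the block in memory
    offset = address % block_size          # byte position inside that block
    return block_address, offset


block, offset = split_byte_address(0x1234)
print(hex(block), offset)
```

For byte address 0x1234 and 16-byte blocks, the block address is 0x123 and the byte sits at position 4 within the block.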
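[Figure: m-way associative cache. Each of the m blocks holds V (valid bit), Tag, and Block Data. The address tag is compared (=) with all m tags in parallel; a mux selects the Data of the matching block and Hit is signaled.]
[Figure: m-way set-associative cache. Each set holds m blocks, each with V, Tag, and Block Data. The m tags of the selected set are compared (=) in parallel; a mux selects the Data and Hit is signaled.]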
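The lookup performed by the comparators and mux can be sketched as a small Python model. This is a minimal illustration that tracks only tags, not block data, and it assumes LRU replacement, which the slides do not specify. The class name `SetAssociativeCache` is hypothetical. Setting `num_sets=1` gives the fully associative case.

```python
class SetAssociativeCache:
    """Tag-only model of an m-way set-associative cache (m = ways)."""

    def __init__(self, num_sets: int, ways: int):
        self.num_sets = num_sets
        self.ways = ways
        # One list of tags per set, least recently used first.
        # An absent tag plays the role of a cleared valid bit.
        self.sets = [[] for _ in range(num_sets)]

    def access(self, block_address: int) -> bool:
        """Return True on a hit, False on a miss (the block is then loaded)."""
        index = block_address % self.num_sets  # selects the set
        tag = block_address // self.num_sets   # remaining upper bits
        tags = self.sets[index]
        if tag in tags:             # hardware compares all m tags in parallel
            tags.remove(tag)
            tags.append(tag)        # mark as most recently used
            return True
        if len(tags) == self.ways:  # set is full: evict (assumed LRU policy)
            tags.pop(0)
        tags.append(tag)
        return False


trace = [0, 2, 0, 4, 2]
two_way = SetAssociativeCache(num_sets=2, ways=2)
fully = SetAssociativeCache(num_sets=1, ways=4)
print([two_way.access(b) for b in trace])
print([fully.access(b) for b in trace])
```

On this trace, blocks 0, 2, and 4 all fall into set 0 of the 2-way cache, so the last access to block 2 misses, while the 4-way fully associative cache with the same total capacity still holds it.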
Solution:
AMAT = Hit time + Miss rate × Miss penalty = 1 + 0.05 × 20 = 2 cycles = 4 ns (at 2 ns per cycle)
Without the cache, AMAT would equal the miss penalty = 20 cycles
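The arithmetic in this solution can be checked with a short Python sketch. The function name `amat` is illustrative, and the 2 ns clock period is inferred from "2 cycles = 4 ns" rather than stated directly.

```python
def amat(hit_time: float, miss_rate: float, miss_penalty: float) -> float:
    """Average memory access time = hit time + miss rate * miss penalty."""
    return hit_time + miss_rate * miss_penalty


CLOCK_NS = 2.0  # inferred from "2 cycles = 4 ns"

cycles = amat(hit_time=1, miss_rate=0.05, miss_penalty=20)
print(cycles, "cycles =", cycles * CLOCK_NS, "ns")

# Without the cache every access pays the full miss penalty.
no_cache = amat(hit_time=0, miss_rate=1.0, miss_penalty=20)
print(no_cache, "cycles")
```

Under these numbers the cache cuts the average access time from 20 cycles to 2 cycles.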
Multilevel Caches
L1: 32KB I-Cache/core and 32KB D-Cache/core, 3-cycle latency
L2: 256KB unified L2 cache/core, 8-cycle latency
L3: 32MB unified shared L3 cache (embedded DRAM), 25-cycle latency to local slice
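To see how the three latencies above combine, the sketch below extends the AMAT formula to several levels. Only the 3-, 8-, and 25-cycle latencies come from the slide; the local miss rates and the 200-cycle main memory latency are made-up values chosen purely for illustration. The model assumes that an access missing at one level pays that level's latency before going to the next one. The function name `multilevel_amat` is hypothetical.

```python
def multilevel_amat(latencies, local_miss_rates, memory_latency):
    """AMAT in cycles for a cache hierarchy, computed from the last level up.

    latencies[i] and local_miss_rates[i] describe cache level i + 1.
    Assumed model: AMAT_i = latency_i + miss_rate_i * AMAT_(i+1),
    with main memory as the final level.
    """
    result = memory_latency
    for latency, miss_rate in zip(reversed(latencies), reversed(local_miss_rates)):
        result = latency + miss_rate * result
    return result


latencies = [3, 8, 25]         # L1, L2, L3 latencies from the slide (cycles)
miss_rates = [0.10, 0.50, 0.20]  # hypothetical local miss rates
memory_latency = 200           # hypothetical main memory latency (cycles)

print(multilevel_amat(latencies, miss_rates, memory_latency))
print(multilevel_amat(latencies[:1], miss_rates[:1], memory_latency))
```

With these assumed rates the three-level hierarchy averages about 7 cycles per access, compared with about 23 cycles if L1 misses went straight to main memory.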