Copyright 2007
Chapter 8 :: Topics
Introduction
Memory System Performance Analysis
Caches
Virtual Memory
Memory-Mapped I/O
Summary
Introduction
Computer performance depends on:
Processor performance
Memory system performance
Memory Interface
[Figure: the processor drives MemWrite, Address, and WriteData to the memory (write enable WE) and receives ReadData; processor and memory are both clocked by CLK.]
Introduction
Up until now, we assumed memory could be accessed
in 1 clock cycle
But that hasn't been true since the 1980s
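Memory Hierarchy
Technology    Cost / GB    Access time
SRAM          ~ $10,000    ~ 1 ns
DRAM          ~ $100       ~ 100 ns
Hard Disk     ~ $1         ~ 10,000,000 ns
[Figure: memory hierarchy pyramid with Cache at the top, Main Memory in the middle, and Virtual Memory at the bottom; speed increases toward the top, size toward the bottom.]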
Locality
Exploit locality to make memory accesses fast
Temporal Locality:
Locality in time (e.g., if looked at a Web page recently, likely to
look at it again soon)
If data used recently, likely to use it again soon
How to exploit: keep recently accessed data in higher levels of
memory hierarchy
Spatial Locality:
Locality in space (e.g., if read one page of book recently, likely to
read nearby pages soon)
If data used recently, likely to use nearby data soon
How to exploit: when access data, bring nearby data into higher
levels of memory hierarchy too
Memory Performance
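Hit: data is found in that level of the memory hierarchy
Miss: data is not found (must go to next level)
Hit Rate = # hits / # memory accesses = 1 - Miss Rate
Miss Rate = # misses / # memory accesses = 1 - Hit Rate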
What are the hit and miss rates for the cache?
Hit Rate = 1250/2000 = 0.625
Miss Rate = 750/2000 = 0.375 = 1 - Hit Rate
Average memory access time (AMAT)
= t_cache + MR_cache(t_MM)
= [1 + 0.375(100)] cycles
= 38.5 cycles
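The calculation above can be checked with a short Python sketch. The function name `amat` and its parameter names are illustrative choices, not part of the slides; the numbers (1-cycle cache, 100-cycle main memory, 750 misses in 2000 accesses) are the ones used above.

```python
def amat(t_cache, miss_rate, t_mm):
    """Average memory access time, in cycles, for a cache backed by main memory."""
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    # Every access pays the cache time; only misses also pay the main memory time.
    return t_cache + miss_rate * t_mm


miss_rate = 750 / 2000            # 0.375, from the hit/miss rate example
print(amat(1, miss_rate, 100))    # 1 + 0.375 * 100 cycles
```

Lowering either the miss rate or the main memory access time reduces the result, which is the motivation for the cache organizations that follow.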
Cache
A safe place to hide things
Cache Terminology
Capacity (C):
the number of data bytes a cache stores
[Figure: mapping of main memory words to the sets of a 2^3 Word Cache]
Address          Data              Set Number
11...11111100    mem[0xFF...FC]    7 (111)
11...11111000    mem[0xFF...F8]    6 (110)
11...11110100    mem[0xFF...F4]    5 (101)
11...11110000    mem[0xFF...F0]    4 (100)
11...11101100    mem[0xFF...EC]    3 (011)
11...11101000    mem[0xFF...E8]    2 (010)
11...11100100    mem[0xFF...E4]    1 (001)
11...11100000    mem[0xFF...E0]    0 (000)
...
00...00100100    mem[0x00...24]    1 (001)
00...00100000    mem[0x00...20]    0 (000)
00...00011100    mem[0x00...1C]    7 (111)
00...00011000    mem[0x00...18]    6 (110)
00...00010100    mem[0x00...14]    5 (101)
00...00010000    mem[0x00...10]    4 (100)
00...00001100    mem[0x00...0C]    3 (011)
00...00001000    mem[0x00...08]    2 (010)
00...00000100    mem[0x00...04]    1 (001)
00...00000000    mem[0x00...00]    0 (000)
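The mapping above follows from slicing the address into fields. A minimal Python sketch, assuming 32-bit byte addresses, 4-byte words, and the 8-set (2^3 word) direct-mapped cache shown here; `split_address` is a name made up for this example.

```python
BYTE_OFFSET_BITS = 2   # 4-byte words
SET_BITS = 3           # 2^3 = 8 sets, one word per set


def split_address(addr):
    """Split a 32-bit byte address into (tag, set_index, byte_offset)."""
    byte_offset = addr & ((1 << BYTE_OFFSET_BITS) - 1)
    set_index = (addr >> BYTE_OFFSET_BITS) & ((1 << SET_BITS) - 1)
    tag = addr >> (BYTE_OFFSET_BITS + SET_BITS)   # remaining 27 upper bits
    return tag, set_index, byte_offset


for addr in (0x00000004, 0x00000024, 0xFFFFFFE4):
    tag, set_index, _ = split_address(addr)
    print(f"0x{addr:08X} -> set {set_index}, tag 0x{tag:07X}")
```

All three addresses land in set 1 but carry different tags, which is why the cache has to store the tag next to the data.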
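[Figure: direct mapped cache hardware. The memory address is split into a 27-bit Tag, a 3-bit Set field, and a 2-bit Byte Offset (00). The Set bits index an 8-entry x (1+27+32)-bit SRAM that holds V, Tag, and Data. The stored 27-bit tag is compared with the address tag to produce Hit, and the SRAM supplies the 32-bit Data.]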
Memory address 0x00...04: Tag = 00...00, Set = 001 (3 bits), Byte Offset = 00

# MIPS assembly
      addi $t0, $0, 5
loop: beq  $t0, $0, done
      lw   $t1, 0x4($0)
      lw   $t2, 0xC($0)
      lw   $t3, 0x8($0)
      addi $t0, $t0, -1
      j    loop
done:

Miss Rate = 3/15 = 20% (each of the three loads misses only on the first of the five iterations)

V   Tag       Data
0                               Set 7 (111)
0                               Set 6 (110)
0                               Set 5 (101)
0                               Set 4 (100)
1   00...00   mem[0x00...0C]    Set 3 (011)
1   00...00   mem[0x00...08]    Set 2 (010)
1   00...00   mem[0x00...04]    Set 1 (001)
0                               Set 0 (000)
Memory address 0x00...24: Tag = 00...01, Set = 001 (3 bits), Byte Offset = 00

# MIPS assembly
      addi $t0, $0, 5
loop: beq  $t0, $0, done
      lw   $t1, 0x4($0)
      lw   $t2, 0x24($0)
      addi $t0, $t0, -1
      j    loop
done:

V   Tag       Data
0                                                Set 7 (111)
0                                                Set 6 (110)
0                                                Set 5 (101)
0                                                Set 4 (100)
0                                                Set 3 (011)
0                                                Set 2 (010)
1   00...00   mem[0x00...04] / mem[0x00...24]    Set 1 (001)
0                                                Set 0 (000)
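The two loops above can be replayed in a few lines of Python. This is a sketch under stated assumptions: it models only the data loads (instruction fetches are ignored), the cache starts empty, and `direct_mapped_miss_rate` is a name chosen for this example.

```python
def direct_mapped_miss_rate(addresses, num_sets=8):
    """Replay byte addresses on a direct-mapped cache holding one 4-byte word per set."""
    tags = [None] * num_sets              # None models V = 0
    misses = 0
    for addr in addresses:
        word = addr >> 2                  # drop the 2-bit byte offset
        set_index = word % num_sets       # low bits of the word address
        tag = word // num_sets            # remaining upper bits
        if tags[set_index] != tag:        # invalid entry or tag mismatch
            misses += 1
            tags[set_index] = tag         # the new word replaces the old one
    return misses / len(addresses)


first_loop = [0x4, 0xC, 0x8] * 5          # three loads, five iterations
second_loop = [0x4, 0x24] * 5             # both addresses map to set 1

print(direct_mapped_miss_rate(first_loop))
print(direct_mapped_miss_rate(second_loop))
```

The first loop misses only while the cache is cold. In the second loop the two addresses keep evicting each other from set 1, so every load misses.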
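[Figure: two-way set associative cache hardware. The memory address is split into a 28-bit Tag, a 2-bit Set field, and a 2-bit Byte Offset (00). Each set has two ways, Way 1 and Way 0, each holding V, a 28-bit Tag, and 32-bit Data. A tag comparison per way produces Hit1 and Hit0; Hit1 selects which way's 32-bit word is sent out as Data, and the two signals combine into Hit.]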
# MIPS assembly
      addi $t0, $0, 5
loop: beq  $t0, $0, done
      lw   $t1, 0x4($0)
      lw   $t2, 0x24($0)
      addi $t0, $t0, -1
      j    loop
done:

        Way 1                           Way 0
V   Tag       Data              V   Tag       Data
0                               0                               Set 3
0                               0                               Set 2
1   00...10   mem[0x00...24]    1   00...00   mem[0x00...04]    Set 1
0                               0                               Set 0

Associativity reduces conflict misses
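A small simulation makes the effect of associativity concrete. The sketch below assumes one 4-byte word per block, an initially empty cache, least recently used replacement within a set, and data loads only; the function name and parameters are illustrative.

```python
from collections import OrderedDict


def set_associative_miss_rate(addresses, num_sets=4, ways=2):
    """Replay byte addresses on an N-way set associative cache with LRU replacement."""
    # One OrderedDict per set: keys are tags, ordered from least to most recently used.
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in addresses:
        word = addr >> 2
        lines = sets[word % num_sets]
        tag = word // num_sets
        if tag in lines:
            lines.move_to_end(tag)            # hit: mark as most recently used
        else:
            misses += 1
            if len(lines) == ways:
                lines.popitem(last=False)     # set is full: evict the LRU tag
            lines[tag] = None
    return misses / len(addresses)


loop = [0x4, 0x24] * 5
print(set_associative_miss_rate(loop, num_sets=4, ways=2))   # 2-way, 4 sets
print(set_associative_miss_rate(loop, num_sets=8, ways=1))   # direct mapped, 8 sets
```

Both configurations hold eight words. With two ways, the words at 0x4 and 0x24 share set 1 without evicting each other, so only the first two loads miss.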
[Figure: fully associative cache: a single set with eight ways, each holding V, Tag, and Data.]
No conflict misses
Expensive to build
Spatial Locality?
Increase block size:
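[Figure: direct mapped cache with four-word blocks. The memory address is split into a 27-bit Tag, a 1-bit Set field, a 2-bit Block Offset, and a 2-bit Byte Offset (00). Each of the two sets (Set 1, Set 0) holds V, a 27-bit Tag, and four 32-bit words. The Block Offset (00, 01, 10, 11) selects which 32-bit word is sent out as Data; the tag comparison produces Hit.]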
# MIPS assembly
      addi $t0, $0, 5
loop: beq  $t0, $0, done
      lw   $t1, 0x4($0)
      lw   $t2, 0xC($0)
      lw   $t3, 0x8($0)
      addi $t0, $t0, -1
      j    loop
done:

Memory address 0x00...0C: Tag = 00...00 (27 bits), Set = 0, Block Offset = 11 (2 bits), Byte Offset = 00

V   Tag       Data (block offset 11, 10, 01, 00)
0                                                                                 Set 1
1   00...00   mem[0x00...0C]   mem[0x00...08]   mem[0x00...04]   mem[0x00...00]   Set 0
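To see how a larger block exploits spatial locality, the same loads can be replayed on a cache that fetches a whole block per miss. The sketch assumes a direct-mapped cache with two sets and 16-byte (four-word) blocks, as in the figure above, starting empty; the function name is hypothetical.

```python
def block_cache_miss_rate(addresses, num_sets=2, block_bytes=16):
    """Replay byte addresses on a direct-mapped cache whose blocks hold several words."""
    tags = [None] * num_sets                  # None models V = 0
    misses = 0
    for addr in addresses:
        block = addr // block_bytes           # drops block offset and byte offset
        set_index = block % num_sets
        tag = block // num_sets
        if tags[set_index] != tag:
            misses += 1
            tags[set_index] = tag             # a miss brings in the whole block
    return misses / len(addresses)


loads = [0x4, 0xC, 0x8] * 5
print(block_cache_miss_rate(loads, num_sets=2, block_bytes=16))  # four-word blocks
print(block_cache_miss_rate(loads, num_sets=8, block_bytes=4))   # one-word blocks
```

With four-word blocks the first load brings in 0x0 through 0xC together, so the loads of 0xC and 0x8 hit even on the first iteration.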
Capacity: C
Block size: b
Number of blocks in cache: B = C/b
Number of blocks in a set: N
Number of Sets: S = B/N
Organization             Number of Ways (N)    Number of Sets (S = B/N)
Direct Mapped            1                     B
N-Way Set Associative    1 < N < B             B/N
Fully Associative        B                     1
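The relationships B = C/b and S = B/N also fix the widths of the address fields. The following sketch assumes 32-bit byte addresses and power-of-two sizes; `cache_geometry` and its dictionary keys are names invented for this example.

```python
def cache_geometry(capacity_bytes, block_bytes, ways, address_bits=32):
    """Derive block count, set count, and address field widths from C, b, and N."""
    blocks = capacity_bytes // block_bytes        # B = C / b
    sets = blocks // ways                         # S = B / N
    offset_bits = block_bytes.bit_length() - 1    # log2(b): block plus byte offset
    set_bits = sets.bit_length() - 1              # log2(S)
    tag_bits = address_bits - set_bits - offset_bits
    return {"B": blocks, "S": sets, "offset_bits": offset_bits,
            "set_bits": set_bits, "tag_bits": tag_bits}


# An 8-word (32-byte) cache in three organizations.
print(cache_geometry(32, 4, 1))    # direct mapped, one-word blocks
print(cache_geometry(32, 4, 2))    # 2-way set associative
print(cache_geometry(32, 16, 1))   # direct mapped, four-word blocks
```

The tag widths that come out (27, 28, and 27 bits) match the field widths drawn in the hardware figures for the same three organizations.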
Capacity Misses
Cache is too small to hold all data of interest at one time
If the cache is full and program tries to access data X that
is not in cache, cache must evict data Y to make room for
X
Capacity miss occurs if program then tries to access Y
again
X will be placed in a particular set based on its address
In a direct mapped cache, there is only one place to put X
In an associative cache, there are multiple ways where X
could go in the set.
How to choose Y to minimize chance of needing it again?
Least recently used (LRU) replacement: the least
recently used block in a set is evicted when the cache is
full.
Types of Misses
Compulsory: first time data is accessed
Capacity: cache too small to hold all data of
interest
Conflict: data of interest maps to same location in
cache
Miss penalty: time it takes to retrieve a block from
lower level of hierarchy
LRU Replacement
# MIPS assembly
lw $t0, 0x04($0)
lw $t1, 0x24($0)
lw $t2, 0x54($0)

(a)
            Way 1                              Way 0
V   U   Tag        Data              V   Tag        Data
0   0                                0                                Set 3 (11)
0   0                                0                                Set 2 (10)
1   0   00...010   mem[0x00...24]    1   00...000   mem[0x00...04]    Set 1 (01)
0   0                                0                                Set 0 (00)

(b)
            Way 1                              Way 0
V   U   Tag        Data              V   Tag        Data
0   0                                0                                Set 3 (11)
0   0                                0                                Set 2 (10)
1   1   00...010   mem[0x00...24]    1   00...101   mem[0x00...54]    Set 1 (01)
0   0                                0                                Set 0 (00)
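The U bit in tables (a) and (b) can be modelled directly. This is a sketch, not the hardware: it assumes a two-way cache with four sets and one-word blocks, that U names the way that was least recently used, and that an empty way is filled before anything is evicted (Way 0 first). The class and function names are made up for illustration.

```python
NUM_SETS = 4


class TwoWaySet:
    """One set of a two-way cache with a single use bit U."""

    def __init__(self):
        self.tags = [None, None]   # index 0 is Way 0, index 1 is Way 1; None means V = 0
        self.u = 0                 # the way that was least recently used

    def access(self, tag):
        """Access a tag in this set; return True on a hit."""
        hit = tag in self.tags
        if hit:
            way = self.tags.index(tag)
        elif None in self.tags:
            way = self.tags.index(None)   # assumption: fill an invalid way first
            self.tags[way] = tag
        else:
            way = self.u                  # evict the least recently used way
            self.tags[way] = tag
        self.u = 1 - way                  # the other way is now least recently used
        return hit


def load(sets, addr):
    """Perform a word load at a byte address; return True on a hit."""
    word = addr >> 2
    return sets[word % NUM_SETS].access(word // NUM_SETS)


cache = [TwoWaySet() for _ in range(NUM_SETS)]
for addr in (0x04, 0x24, 0x54):
    load(cache, addr)
    print(hex(addr), "set 1 tags (Way 0, Way 1):", cache[1].tags, "U =", cache[1].u)
```

After the second load the state corresponds to table (a), with U = 0. The third load replaces Way 0 and leaves U = 1, as in table (b).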
Caching Summary
What data is held in the cache?
Recently used data (temporal locality)
Nearby data (spatial locality, with larger block sizes)
Multilevel Caches
Larger caches have lower miss rates, longer access
times
Expand the memory hierarchy to multiple levels
of caches
Level 1: small and fast (e.g. 16 KB, 1 cycle)
Level 2: larger and slower (e.g. 256 KB, 2-6
cycles)
Even more levels are possible
Virtual Memory
Gives the illusion of a bigger memory without the
high cost of DRAM
Main memory (DRAM) acts as cache for the hard
disk
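Technology    Cost / GB    Access time
SRAM          ~ $10,000    ~ 1 ns
DRAM          ~ $100       ~ 100 ns
Hard Disk     ~ $1         ~ 10,000,000 ns
[Figure: memory hierarchy pyramid with Cache at the top, Main Memory in the middle, and Virtual Memory at the bottom; speed increases toward the top, capacity toward the bottom.]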
[Figure: hard disk, showing the magnetic disks and the read/write head.]
Virtual Memory
Each program uses virtual addresses
Two programs can use the same virtual address for different data
Programs don't need to be aware that others are running
One program (or virus) can't corrupt the memory used by another
This is called memory protection
Virtual Memory
Cache           Virtual Memory
Block           Page
Block Size      Page Size
Block Offset    Page Offset
Miss            Page Fault
Tag             Virtual Page Number
Address Translation
Organization:
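[Figure: address translation using the page table. The virtual address consists of a virtual page number and a 12-bit page offset, 0x47C. The virtual page number is the index into the page table. The selected entry is valid (Hit) and supplies the 15-bit physical page number 0x7FFF, so the physical address is physical page number 0x7FFF followed by page offset 0x47C.]
Page Table (entries as drawn, top to bottom)
V   Physical Page Number
0
0
1   0x0000
1   0x7FFE
0
0
0
0
1   0x0001
0
0
1   0x7FFF
0
0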
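The lookup in the figure can be sketched as a dictionary lookup. The 12-bit page offset comes from the figure; the dictionary `PAGE_TABLE`, the `PageFault` exception, and the function name are illustrative. The two mappings listed (virtual page 0x00002 to physical page 0x7FFF, and 0x7FFFD to 0x0000) are the ones that appear in the TLB figure later in the chapter; every virtual page not listed is treated here as invalid, which is a simplification.

```python
PAGE_OFFSET_BITS = 12                      # 12-bit page offset, as in the figure


class PageFault(Exception):
    """Raised when the page table has no valid entry for a virtual page."""


# Illustrative page table: virtual page number -> physical page number (valid entries only).
PAGE_TABLE = {0x00002: 0x7FFF, 0x7FFFD: 0x0000}


def translate(vaddr, page_table=PAGE_TABLE):
    """Translate a virtual address into a physical address."""
    vpn = vaddr >> PAGE_OFFSET_BITS                      # index into the page table
    offset = vaddr & ((1 << PAGE_OFFSET_BITS) - 1)       # passes through unchanged
    if vpn not in page_table:
        raise PageFault(f"no valid entry for virtual page 0x{vpn:05X}")
    return (page_table[vpn] << PAGE_OFFSET_BITS) | offset


print(hex(translate(0x0000247C)))          # virtual page 0x00002, offset 0x47C
try:
    translate(0x000073E0)                  # virtual page 0x00007 is not in the table
except PageFault as fault:
    print("page fault:", fault)
```

Only the page number is translated; the page offset is copied from the virtual address to the physical address.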
[Figure: the same page table translating a virtual address with page offset 0xF20. The selected entry is valid (Hit) and supplies physical page number 0x0001, so the physical address is physical page number 0x0001 followed by page offset 0xF20.]
[Figure: the same page table indexed by a virtual address with 19-bit virtual page number 0x00007 and page offset 0x3E0.]
TLB
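[Figure: two-entry TLB. The virtual address consists of the 19-bit virtual page number 0x00002 and the 12-bit page offset 0x47C. Each TLB entry holds V, a 19-bit Virtual Page Number, and a 15-bit Physical Page Number: Entry 1 maps virtual page number 0x7FFFD to physical page number 0x0000, and Entry 0 maps virtual page number 0x00002 to physical page number 0x7FFF. Each entry's virtual page number is compared with the address to produce Hit1 and Hit0, which combine into Hit; Hit1 selects the physical page number. The physical address is physical page number 0x7FFF followed by page offset 0x47C.]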
Memory Protection
Multiple programs (processes) run at once
Each process has its own page table
Each process can use entire virtual address space
without worrying about where other programs are
A process can only access physical pages mapped
in its page table, so it can't overwrite memory from
another process
I/O Registers:
Hold values written to the I/O devices
ReadData Multiplexer:
Selects between memory and I/O devices as source of
data sent to the processor
[Figure: memory interface: the Processor sends Address, WriteData, and WE to the Memory and receives ReadData.]
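[Figure: memory-mapped I/O hardware. The Address Decoder takes Address and MemWrite from the Processor and generates WEM (the write enable of the Memory), WE1 and WE2 (the enables, EN, of the registers for I/O Device 1 and I/O Device 2), and RDsel1:0. RDsel1:0 controls the ReadData multiplexer, whose inputs are labelled 00, 01, and 10. Memory and the I/O registers share Address, WriteData, and CLK.]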
[Figure: the memory-mapped I/O hardware during a write to I/O Device 1: the Address Decoder asserts WE1 = 1.]
[Figure: the memory-mapped I/O hardware during a read from I/O Device 1: the Address Decoder sets RDsel1:0 = 01.]
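The address decoder's behaviour can be summarized as a small truth function. This sketch rests on assumptions that the figures do not state: the two device addresses are hypothetical, and the ReadData multiplexer encoding is taken to be 00 for memory, 01 for I/O Device 1, and 10 for I/O Device 2 (the 01 case agrees with the read example above).

```python
# Hypothetical device addresses, chosen only for illustration.
DEVICE1_ADDR = 0xFFFFFFF4
DEVICE2_ADDR = 0xFFFFFFF8


def address_decoder(address, mem_write):
    """Return (WEM, WE1, WE2, RDsel) for one memory access."""
    write = int(bool(mem_write))
    if address == DEVICE1_ADDR:
        return 0, write, 0, 0b01       # enable Device 1 register, read from Device 1
    if address == DEVICE2_ADDR:
        return 0, 0, write, 0b10       # enable Device 2 register, read from Device 2
    return write, 0, 0, 0b00           # every other address goes to memory


print(address_decoder(DEVICE1_ADDR, True))     # write to I/O Device 1
print(address_decoder(DEVICE1_ADDR, False))    # read from I/O Device 1
print(address_decoder(0x10000000, True))       # ordinary memory write
```

At most one write enable is asserted for any access, so a store to a device address never modifies memory.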
LL
AX OW
SP0256
A6:1: allophone input
ALD: allophone load (the bar over the name indicates it is low-asserted, i.e. the chip loads the address when ALD goes low)
SBY: standby, indicates when the speech chip is standing by waiting for the next allophone
SP0256
1. Set ALD to 1
2. Wait until the chip asserts SBY to indicate that
it has finished speaking the previous allophone
and is ready for the next one
3. Write a 6-bit allophone to A6:1
4. Reset ALD to 0 to initiate speech
SP0256
Memory-Mapped I/O
Address       Port
0xFFFFFF00    A6:1
0xFFFFFF04    ALD
0xFFFFFF08    SBY

Allophones in Memory
Address     Data
10000010    0x20
1000000C    0x0F
10000008    0x2D
10000004    0x07
10000000    0x1B
       addi $t1, $0, 1          # $t1 = 1
       addi $t2, $0, 20         # $t2 = array size * 4
       lui  $t3, 0x1000         # $t3 = array base address
       addi $t4, $0, 0          # $t4 = 0 (array index)
start: sw   $t1, 0xFF04($0)     # ALD = 1
loop:  lw   $t5, 0xFF08($0)     # $t5 = SBY
       beq  $0, $t5, loop       # loop until SBY == 1
       add  $t5, $t3, $t4       # $t5 = address of allophone
       lw   $t5, 0($t5)         # $t5 = allophone
       sw   $t5, 0xFF00($0)     # A6:1 = allophone
       sw   $0, 0xFF04($0)      # ALD = 0 to initiate speech
       addi $t4, $t4, 4         # increment array index
       beq  $t4, $t2, done      # last allophone in array?
       j    start               # repeat
done:
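The four-step handshake can be exercised against a toy model of the chip. Everything about `FakeSP0256` is an assumption made for illustration: the real device's timing is not modelled, and "busy for a fixed number of polls" merely stands in for the time spent speaking. The allophone values are the five bytes listed in the memory table above.

```python
class FakeSP0256:
    """Toy stand-in for the speech chip; not a model of the real device's timing."""

    def __init__(self, busy_polls=3):
        self.a = 0                 # A6:1 port: 6-bit allophone input
        self.ald = 1               # ALD port (low-asserted)
        self.spoken = []           # allophones the chip has been asked to speak
        self._busy_polls = busy_polls
        self._busy = 0

    def write_ald(self, value):
        if self.ald == 1 and value == 0:      # falling edge loads the allophone
            self.spoken.append(self.a)
            self._busy = self._busy_polls     # busy while "speaking"
        self.ald = value

    def read_sby(self):
        if self._busy > 0:
            self._busy -= 1
            return 0                          # still speaking
        return 1                              # standing by


def speak(chip, allophones):
    """Drive the chip with the four-step sequence for each allophone."""
    for allophone in allophones:
        chip.write_ald(1)                     # 1. set ALD to 1
        while chip.read_sby() == 0:           # 2. wait until SBY is asserted
            pass
        chip.a = allophone & 0x3F             # 3. write the 6-bit allophone to A6:1
        chip.write_ald(0)                     # 4. reset ALD to 0 to initiate speech


chip = FakeSP0256()
speak(chip, [0x1B, 0x07, 0x2D, 0x0F, 0x20])
print([hex(x) for x in chip.spoken])
```

Polling SBY before each write is what keeps the processor from overwriting an allophone the chip has not finished speaking.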
Course Summary
You have learned about:
Combinational and sequential logic
Schematic and HDL design entry
Digital building blocks: adders, ALUs, multiplexers, decoders,
memories, etc.
Assembly language: computer architecture
Processor design: microarchitecture
Memory system design