Lecture overview
Today’s lecture
– Memory Hierarchy: Introduction
– Cache types
– Direct mapped cache
– Set associative cache
– Fully associative cache
– Measuring cache performance
– Basic cache optimizations
Memory hierarchy: An introduction
In the 1980s, the largest programs were only a few KBs
Now programs and applications range from a few KBs to hundreds of MBs
Memory requirements have become huge
For efficient execution we want our memories to be unlimited in size and extremely fast
Ideally, memory should be as fast as the processor
This would lead to a 1-cycle memory access time
Is it actually possible to have an enormous amount of fast memory?
The answer is NO
Memory hierarchy: An introduction
Two important requirements
Memory should be large
Memory should be fast
The two requirements contradict each other
A flat memory that is both large and fast is not possible
We need a hierarchy
The library analogy illustrates this.
Memory hierarchy: Library analogy
Worker
Works on the registers (pages) available to it
Can rapidly access data available on its desk in the form of books
The size of the desk is limited, and only a small amount of data can be placed on it
Data from books is accessed in a page-by-page manner
Memory hierarchy: Library analogy
Book shelf
Contains more books than the desk
Access is slower
Access is performed on book titles, not on pages
Books are brought to the desk, and data is then accessed in a page-by-page manner
Library
– Contains books
– Access is very slow
– Access is performed in the form of books
Memory hierarchy
Similar to the library analogy
Memory is organized in a hierarchy
The illusion of a large and fast memory is created through the hierarchy
In the library analogy, usually
If you reference a book, you are likely to reference it again very soon (temporal locality)
Most programs contain loops where data is accessed repeatedly, exhibiting temporal locality
If you reference a book in a particular section, you are likely to reference other books in that section (spatial locality)
Programs execute instructions largely sequentially, exhibiting spatial locality
These principles of locality are used to create a memory hierarchy
Memory hierarchy
Similar to the library analogy, multiple levels of memory hierarchy are created
We need large and fast memory
Fast memory is expensive (e.g. SRAM)
We cannot afford to have large blocks of fast memory
Slow memory is cheap (e.g. DRAM, flash memory, magnetic disk)
A large amount of slow memory is affordable
Faster but smaller memories sit physically closer to the processor
Larger but slower memories sit farther from the processor
A typical three-level memory hierarchy: 1) SRAM, 2) DRAM, 3) magnetic disk
Memory hierarchy
The goal: provide the largest possible memory at the cheapest rate with the fastest possible access
Data is transferred only between adjacent levels
Flow of data: magnetic disk to main memory, then to the cache, and finally to the processor registers
Memory hierarchy: Some important terminology
Block: the minimum unit of information that can be either present or not present in the memory hierarchy
Hit rate: the fraction of memory accesses found in a certain level of memory
Miss rate: the fraction of memory accesses not found in a certain level of memory (1 − hit rate)
Hit time: the time required to access a level of the memory hierarchy when the access is a hit
Miss penalty: the time required to replace a block in the upper level with a block from the lower level, plus the time to deliver that block to the processor
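These terms combine into a standard figure of merit (not stated on the slide itself, but standard in the literature): average memory access time (AMAT) = hit time + miss rate × miss penalty. For example, a 1-cycle hit time, a 5% miss rate, and a 100-cycle miss penalty give an AMAT of 1 + 0.05 × 100 = 6 cycles.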
The basics of cache
So far, we know that the fastest level, the one closest to the processor, is the cache: a safe place for hiding or storing data
Whenever the processor looks for data, it first requests the cache
When the processor requests data
How do we know whether the requested data item is in the cache?
How do we find the requested data item?
If we assume that each block has a fixed location in the cache, the answer is simple
We go to that location and check whether the required data is present or not
Direct mapped cache
Every block of data has a (main) memory address
Based on the memory address of a block, its location in the cache is determined
There are different mapping techniques
Direct mapped cache: a cache structure in which each memory location is mapped to exactly one location in the cache
Formula to determine the location of a memory block in the cache (illustrated in the sketch below)
(block address) modulo (number of blocks in the cache)
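A minimal C sketch of this placement rule; the cache size and block address are assumed values for illustration:

#include <stdio.h>

int main(void) {
    unsigned num_cache_blocks = 8;   /* assumed: cache holds 8 blocks   */
    unsigned block_address = 12;     /* assumed: memory block number 12 */
    /* Direct mapping: each memory block has exactly one possible slot */
    unsigned index = block_address % num_cache_blocks;
    printf("memory block %u -> cache block %u\n", block_address, index);
    return 0;   /* prints: memory block 12 -> cache block 4 */
}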
Direct mapped cache
Normally main memory is much bigger and slower than the cache
There are many more block locations in memory than in the cache
So a mapping mechanism is needed to bring memory blocks into the cache
Example
Direct mapped cache
The concept of the tag: the upper bits of the address, stored with each cache entry to identify which memory block it holds
The concept of the valid bit: indicates whether a cache entry contains valid data
Accessing a cache
Accessing a direct mapped cache
Calculating the size of a cache
The index field, combined with the tag field and the valid bit, locates a block in a direct mapped cache
In a MIPS processor we have a 32-bit address field and memory is byte addressable
For a 32-bit byte address, a direct mapped cache with 2^n blocks and 2^m words per block requires a tag field of 32 − (n + m + 2) bits
Total number of bits in a direct mapped cache of 2^n blocks = 2^n × (2^m × 32 + (32 − n − m − 2) + 1)
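A hedged worked instance (the parameter values are assumed, not from the slide): take n = 10 and m = 0, i.e. 1024 one-word blocks. The tag field is 32 − (10 + 0 + 2) = 20 bits, and the total size is 2^10 × (2^0 × 32 + 20 + 1) = 1024 × 53 = 54,272 bits, i.e. about 53 Kbits of storage to cache just 4 KiB (32 Kbits) of data; the tag and valid-bit overhead is roughly 65% of the data storage here.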
Handling cache misses
When data is requested by the processor:
Either the data is found in the cache, termed a cache hit
Or the data is not found in the cache, termed a cache miss
Whenever a cache miss occurs, the pipeline is stalled and the following steps are taken (for an instruction miss); a software sketch follows this list
Send the original PC value to memory
Instruct main memory to perform a read and wait for the memory to complete the access
Write the cache entry, putting the data, tag, and valid bit values in their relevant fields
Restart the instruction execution at the first step, which will now find the instruction in the cache
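The following is a minimal software sketch of this sequence for a direct-mapped cache with one-word blocks; the sizes are assumed, and it is an illustration of the refill logic, not the actual hardware:

#include <stdint.h>

#define NUM_BLOCKS 256               /* assumed number of cache blocks */
#define MEM_WORDS  (1u << 20)        /* assumed (small) main memory    */

struct line { int valid; uint32_t tag; uint32_t data; };
static struct line cache[NUM_BLOCKS];
static uint32_t memory[MEM_WORDS];

uint32_t read_word(uint32_t addr) {
    uint32_t block = (addr >> 2) % MEM_WORDS;  /* word (block) address   */
    uint32_t index = block % NUM_BLOCKS;       /* the one possible slot  */
    uint32_t tag   = block / NUM_BLOCKS;       /* remaining address bits */
    struct line *l = &cache[index];
    if (!(l->valid && l->tag == tag)) {        /* miss: stall and refill */
        l->data  = memory[block];              /* read from main memory  */
        l->tag   = tag;                        /* record which block ... */
        l->valid = 1;                          /* ... and mark it valid  */
    }
    return l->data;                            /* hit path / restarted access */
}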
Block size vs. cache miss rate
Larger blocks exploit spatial locality better and lead to a lower miss rate
The miss rate may go back up if the block size becomes a significant fraction of the cache size, since the cache then holds only a few blocks
Increased block size also leads to a higher miss penalty
Miss penalty: access time for the first word + transfer time for the remaining words
The increased miss penalty can be addressed with
Early restart: resume execution as soon as the requested word arrives, rather than waiting for the whole block
Critical word first: fetch the requested word first, then the remainder of the block
A numeric illustration follows.
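A hedged numeric illustration (the timings are assumed): suppose it takes 1 clock cycle to send the address, 15 cycles per DRAM access, and 1 cycle per transferred word, with a one-word-wide memory. A 4-word block then has a miss penalty of 1 + 4 × 15 + 4 × 1 = 65 cycles, while a 1-word block costs only 1 + 15 + 1 = 17 cycles. Critical word first would let the processor resume after the first 17 cycles while the rest of the block streams in.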
Handling cache writes
For a store instruction, data is written into the data cache
The data in the cache then differs from the data in memory, causing inconsistency
Consistency can be achieved through
Write-through: writes update both the cache and memory
Simple, but it slows execution down
Efficiency can be improved through a write buffer
Write-back: blocks are updated only in the cache; the modified block is written to main memory when the block is replaced
A sketch of the write-back policy follows.
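A minimal sketch of the write-back policy, under the same assumed layout as the read sketch above (write-allocate is assumed; a write-through store would instead also update memory on every store and would need no dirty bit):

#include <stdint.h>

#define NUM_BLOCKS 256
#define MEM_WORDS  (1u << 20)

struct wline { int valid, dirty; uint32_t tag, data; };
static struct wline wcache[NUM_BLOCKS];
static uint32_t wmemory[MEM_WORDS];

void store_word(uint32_t addr, uint32_t value) {
    uint32_t block = (addr >> 2) % MEM_WORDS;
    uint32_t index = block % NUM_BLOCKS;
    uint32_t tag   = block / NUM_BLOCKS;
    struct wline *l = &wcache[index];
    if (l->valid && l->tag != tag && l->dirty)          /* evicting a modified block */
        wmemory[l->tag * NUM_BLOCKS + index] = l->data; /* write it back first       */
    l->valid = 1;
    l->tag   = tag;
    l->data  = value;     /* only the cache is updated ...            */
    l->dirty = 1;         /* ... memory stays stale until replacement */
}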
Measuring cache performance
Calculating cache performance
Instruction cache miss rate = 4%
Data cache miss rate = 6%
CPI = 4 (ideal scenario with no memory stalls)
Miss penalty = 200 clock cycles
Memory accesses (loads + stores) = 40% of total instructions
Instruction miss cycles = I × 0.04 × 200 = 8 × I (8 stall cycles per instruction)
Data miss cycles = I × 0.4 × 0.06 × 200 = 4.8 × I (4.8 stall cycles per instruction)
CPI = 4 + 8 + 4.8 = 16.8
Performance comparison = 16.8 / 4 = 4.2, i.e. the machine with a perfect cache would be 4.2 times faster
What happens if the ideal CPI is reduced from 4 to 2 and the clock cycle time is kept the same?
What happens if the clock cycle time is halved and the CPI is kept the same? (Worked answers below)
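Hedged worked answers, following the same method (not given on the slide):
– Ideal CPI of 2, same clock: the stall cycles per instruction are unchanged, so CPI = 2 + 8 + 4.8 = 14.8. Memory stalls now account for 12.8 / 14.8 ≈ 86% of execution time (up from 12.8 / 16.8 ≈ 76%), and the perfect-cache machine would be 14.8 / 2 = 7.4 times faster.
– Clock cycle time halved, same CPI: if main memory speed is unchanged, the miss penalty doubles to 400 cycles, so CPI = 4 + 0.04 × 400 + 0.4 × 0.06 × 400 = 4 + 16 + 9.6 = 29.6. The machine gets faster by only 16.8 / (29.6 / 2) ≈ 1.14×, far less than the raw 2× clock improvement, because memory stalls grow.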
Improving cache performance (1)
So far we know only one mapping technique between memory and cache, i.e. the direct mapped technique
Inflexible
Only one available position for a memory block in the cache
The other extreme is the fully associative cache: a block from main memory can be placed anywhere in the cache
Flexible
All free cache locations are available
Search is made more difficult
Search time increases
To keep the search time down, the search must be performed in parallel, which requires extra hardware
The middle ground between the two extremes is the set associative cache: a block from main memory has a fixed number of possible locations (at least two)
n-way set associative cache (2-way, 4-way, 8-way, etc.)
Improving cache performance (2)
Set associative cache
A mixture of the direct mapped and fully associative caches
A block is directly mapped to a set
Inside that set it can be placed anywhere, depending upon the associativity of the mapping
Formula for a set associative cache
(block number) modulo (number of sets in the cache)
A direct mapped cache is simply a 1-way set associative cache
Comparison of different mapping techniques
Let there be a cache of 8 blocks, and suppose block number 12 is to be mapped onto this cache using the different mapping techniques
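As a worked answer (the slide's figure is not reproduced here): direct mapped: 12 mod 8 = 4, so block 12 can go only in cache block 4; 2-way set associative (4 sets): 12 mod 4 = 0, so it can go in either way of set 0; fully associative: it can be placed in any of the 8 cache blocks.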
Performance comparison of different mapping techniques
Cache size = 4 blocks
Comparison between direct mapped, 2-way set
associative, fully associative
Address sequence = 0, 8, 0, 6, 8
For direct mapped
Performance comparison of different mapping techniques
For 2-way set associative cache
Performance comparison of different mapping techniques
For fully associative cache
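A hedged worked trace of all three cases (LRU replacement assumed; the slides' tables are not reproduced):
– Direct mapped: 0 and 8 both map to cache block 0, and 6 maps to block 2. The sequence 0, 8, 0, 6, 8 gives miss, miss, miss, miss, miss: 5 misses, because 0 and 8 keep evicting each other.
– 2-way set associative (2 sets): 0, 8, and 6 all map to set 0. The trace gives miss, miss, hit, miss (6 evicts 8), miss (8 evicts 0): 4 misses.
– Fully associative: miss, miss, hit, miss, hit: 3 misses, the compulsory minimum.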
• Three misses are unavoidable for this sequence under every mapping technique
• What if there is an eight-block cache?
• What will be the effect on the set associative and direct mapped caches?
• What if there is a 16-block cache?
• What will be its effect on the direct mapped and set associative caches?
Locating a block in a set-associative cache
Any block address has three fields: tag, index, and block offset
• In a direct mapped cache, the index field takes you to the exact location
• The tag field then decides whether it is a hit or a miss
• In a set-associative cache, the index field takes you to the set
• Tag comparison is then performed for all the blocks in the set in parallel to search for a match
• In a set associative cache, when the associativity increases by a factor of 2
– The index field decreases by one bit
– The tag field increases by one bit
• In a fully associative cache, no index field is required
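A C sketch of splitting a 32-bit byte address into these fields; the field widths and the example address are assumed for illustration (4-word blocks give a 4-bit offset, 256 sets give an 8-bit index):

#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint32_t addr = 0x12345678;   /* assumed example address          */
    unsigned offset_bits = 4;     /* 4 words x 4 bytes per block      */
    unsigned index_bits  = 8;     /* 256 sets in the cache            */
    uint32_t offset = addr & ((1u << offset_bits) - 1);
    uint32_t index  = (addr >> offset_bits) & ((1u << index_bits) - 1);
    uint32_t tag    = addr >> (offset_bits + index_bits);
    /* prints: tag=0x12345 index=103 offset=8 */
    printf("tag=0x%x index=%u offset=%u\n", tag, index, offset);
    return 0;
}

Doubling the associativity halves the number of sets, moving one bit from the index field into the tag field, exactly as the list above states.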
Locating a block in a set-associative cache
• In a direct mapped cache, only a single comparator is needed
• In a set-associative cache, the number of comparators increases with the associativity
• Increasing associativity decreases the miss rate but increases hardware cost and the hit (access) time
Size of tags and set-associativity
Cache size = 4K blocks
Block size = 4 words/block
Word size = 4 bytes
Address size = 32 bits
Number of sets = ?
Total number of tag bits = ?
For direct mapped, 2-way set associative, 4-way set associative, and fully associative (worked out below)
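One way to work this out (a hedged solution consistent with the numbers above): a 4-word block is 16 bytes, so the block offset takes 4 address bits, leaving 28 bits for tag + index.
– Direct mapped: 4K = 2^12 sets, index = 12 bits, tag = 28 − 12 = 16 bits, total tag bits = 16 × 4K = 64 Kbits
– 2-way: 2K sets, index = 11 bits, tag = 17 bits, total = 17 × 4K = 68 Kbits
– 4-way: 1K sets, index = 10 bits, tag = 18 bits, total = 18 × 4K = 72 Kbits
– Fully associative: 1 set, no index, tag = 28 bits, total = 28 × 4K = 112 Kbits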
Multi-level caches
A memory hierarchy having multiple levels of caches rather than just a single cache and a memory
Multi-level caches reduce the miss penalty and hence improve cache performance
For example, a two-level cache structure allows
the primary (L1) cache to focus on reduced access time
while the secondary (L2) cache focuses on a reduced miss rate
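A hedged numeric illustration of why this helps (all figures assumed): with an L1 hit time of 1 cycle, an L1 miss rate of 5%, an L2 access time of 10 cycles, an L2 local miss rate of 20%, and a 100-cycle main memory access, the average memory access time is 1 + 0.05 × (10 + 0.2 × 100) = 2.5 cycles, versus 1 + 0.05 × 100 = 6 cycles with no L2.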
Summary of cache performance improvement
Set associative caches reduce the miss rate
Multi-level caches reduce the miss penalty
Both eventually lead to
Fewer memory stall cycles
Improved execution time and hence better performance
Four cache hierarchy questions
Q1: Where can a block be placed in a cache?
Three mapping techniques are commonly used
Direct mapped cache
If each block in main memory has only one possible destination in the cache, the cache is termed direct mapped
(block address) mod (number of blocks in the cache)
Fully associative cache
If a block can be placed anywhere in the cache, the cache is said to be fully associative
Set associative cache
If a block can be placed in a restricted set of places in the cache, the cache is set associative
A block is first mapped onto a set and can then be placed anywhere within that set
(block address) mod (number of sets in the cache)
Mapping example
Four cache hierarchy questions
Q2: How is a block found if it is in cache?