Lecture overview
Today’s lecture
– Memory Hierarchy: Introduction
– Cache types
– Direct mapped cache
– Set associative cache
– Fully associative cache
– Measuring cache performance
– Basic cache optimizations
Memory hierarchy: An introduction
In the 1980s, the largest programs were only a few KBs
Now programs and applications range from a few KBs to hundreds of MBs
Memory requirements have become huge
For efficient execution we want our memories to be unlimited in size and extremely fast
Ideally, memory should be as fast as the processor
This would lead to a 1-cycle memory access time
Is it actually possible to have an enormous amount of fast memory?
The answer is NO
Memory hierarchy: An introduction
Two important requirements
Memory should be large
Memory should be fast
The two requirements contradict each other
A flat memory that is both large and fast is not possible
We need a hierarchy
The library analogy illustrates this.
Memory hierarchy: Library analogy
Worker
Works on the registers (pages) available to it
Can rapidly access data available on its desk in the form of books
The size of the desk is limited, and only a small amount of data can be placed on it
Data from books is accessed in a page-by-page manner
Memory hierarchy: Library analogy
Book shelf
Contains more books than the desk
Access is slower
Access is performed on book titles, not on pages
Books are brought to the desk, and data is then accessed in a page-by-page manner
Library
– Contains books
– Access is very slow
– Access is performed in the form of books
Memory hierarchy
Similar to the library analogy
Memory is organized in a hierarchy
The illusion of a large and fast memory is created through the hierarchy
In the library analogy, usually
If you reference a book, you are likely to reference it again very soon (temporal locality)
Most programs contain loops where data is accessed repeatedly, exhibiting temporal locality
If you reference a book in a particular section, you are likely to reference other books in that section (spatial locality)
Programs execute instructions largely sequentially, exhibiting spatial locality
These principles of locality are used to create a memory hierarchy
Memory hierarchy
Similar to the library analogy, multiple levels of memory hierarchy are created
We need large and fast memory
Fast memory is expensive (e.g. SRAM)
We cannot afford to have large blocks of fast memory
Slow memory is cheap (e.g. DRAM, flash memory, magnetic disk)
A large amount of slow memory is affordable
Faster but smaller memories sit physically closer to the processor
Larger but slower memories sit farther from the processor
A typical three-level memory hierarchy: 1) SRAM, 2) DRAM, 3) magnetic disk
Memory hierarchy
The goal: provide the largest possible memory at the cheapest rate with the fastest possible access
Data is transferred only between adjacent levels
Flow of data: magnetic disk to main memory, then to the cache, and finally to the processor registers
Memory hierarchy: Some important terminology
Block: the minimum unit of information that can be either present or not present in the memory hierarchy
Hit rate: the fraction of memory accesses found in a certain level of memory
Miss rate: the fraction of memory accesses not found in a certain level of memory (1 − hit rate)
Hit time: the time required to access a level of the memory hierarchy when the access is a hit
Miss penalty: the time required to replace a block in the upper level with a block from the lower level, plus the time to deliver that block to the processor
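These terms combine into a standard figure of merit (not stated on the slide itself, but standard in the literature): average memory access time (AMAT) = hit time + miss rate × miss penalty. For example, a 1-cycle hit time, a 5% miss rate, and a 100-cycle miss penalty give an AMAT of 1 + 0.05 × 100 = 6 cycles.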
The basics of cache
So far, we know that the fastest level, the one closest to the processor, is the cache: a safe place for hiding or storing data
Whenever the processor looks for data, it first requests the cache
When the processor requests data
How do we know whether the requested data item is in the cache?
How do we find the requested data item?
If we assume that each block has a fixed location in the cache, the answer is simple
We go to that location and check whether the required data is present or not
Direct mapped cache
Every block of data has a (main) memory address
Based on the memory address of a block, its location in the cache is determined
There are different mapping techniques
Direct mapped cache: a cache structure in which each memory location is mapped to exactly one location in the cache
Formula to determine the location of a memory block in the cache (illustrated in the sketch below)
(block address) modulo (number of blocks in the cache)
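A minimal C sketch of this placement rule; the cache size and block address are assumed values for illustration:

#include <stdio.h>

int main(void) {
    unsigned num_cache_blocks = 8;   /* assumed: cache holds 8 blocks   */
    unsigned block_address = 12;     /* assumed: memory block number 12 */
    /* Direct mapping: each memory block has exactly one possible slot */
    unsigned index = block_address % num_cache_blocks;
    printf("memory block %u -> cache block %u\n", block_address, index);
    return 0;   /* prints: memory block 12 -> cache block 4 */
}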
Direct mapped cache
Normally main memory is much bigger and slower than the cache
There are many more block locations in memory than in the cache
So a mapping mechanism is needed to bring memory blocks into the cache
Example
Direct mapped cache
The concept of the tag: the upper bits of the address, stored with each cache entry to identify which memory block it holds
The concept of the valid bit: indicates whether a cache entry contains valid data
Accessing a cache
Accessing a direct mapped cache
Calculating the size of a cache
The index field, combined with the tag field and the valid bit, locates a block in a direct mapped cache
In a MIPS processor we have a 32-bit address field and memory is byte addressable
For a 32-bit byte address, a direct mapped cache with 2^n blocks and 2^m words per block requires a tag field of 32 − (n + m + 2) bits
Total number of bits in a direct mapped cache of 2^n blocks = 2^n × (2^m × 32 + (32 − n − m − 2) + 1)
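A hedged worked instance (the parameter values are assumed, not from the slide): take n = 10 and m = 0, i.e. 1024 one-word blocks. The tag field is 32 − (10 + 0 + 2) = 20 bits, and the total size is 2^10 × (2^0 × 32 + 20 + 1) = 1024 × 53 = 54,272 bits, i.e. about 53 Kbits of storage to cache just 4 KiB (32 Kbits) of data; the tag and valid-bit overhead is roughly 65% of the data storage here.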
Handling cache misses
When data is requested by the processor:
Either the data is found in the cache, termed a cache hit
Or the data is not found in the cache, termed a cache miss
Whenever a cache miss occurs, the pipeline is stalled and the following steps are taken (for an instruction miss); a software sketch follows this list
Send the original PC value to memory
Instruct main memory to perform a read and wait for the memory to complete the access
Write the cache entry, putting the data, tag, and valid bit values in their relevant fields
Restart the instruction execution at the first step, which will now find the instruction in the cache
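The following is a minimal software sketch of this sequence for a direct-mapped cache with one-word blocks; the sizes are assumed, and it is an illustration of the refill logic, not the actual hardware:

#include <stdint.h>

#define NUM_BLOCKS 256               /* assumed number of cache blocks */
#define MEM_WORDS  (1u << 20)        /* assumed (small) main memory    */

struct line { int valid; uint32_t tag; uint32_t data; };
static struct line cache[NUM_BLOCKS];
static uint32_t memory[MEM_WORDS];

uint32_t read_word(uint32_t addr) {
    uint32_t block = (addr >> 2) % MEM_WORDS;  /* word (block) address   */
    uint32_t index = block % NUM_BLOCKS;       /* the one possible slot  */
    uint32_t tag   = block / NUM_BLOCKS;       /* remaining address bits */
    struct line *l = &cache[index];
    if (!(l->valid && l->tag == tag)) {        /* miss: stall and refill */
        l->data  = memory[block];              /* read from main memory  */
        l->tag   = tag;                        /* record which block ... */
        l->valid = 1;                          /* ... and mark it valid  */
    }
    return l->data;                            /* hit path / restarted access */
}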
Block size vs. cache miss rate
Larger blocks exploit spatial locality better and lead to a lower miss rate
The miss rate may go back up if the block size becomes a significant fraction of the cache size, since the cache then holds only a few blocks
Increased block size also leads to a higher miss penalty
Miss penalty: access time for the first word + transfer time for the remaining words
The increased miss penalty can be addressed with
Early restart: resume execution as soon as the requested word arrives, rather than waiting for the whole block
Critical word first: fetch the requested word first, then the remainder of the block
A numeric illustration follows.
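A hedged numeric illustration (the timings are assumed): suppose it takes 1 clock cycle to send the address, 15 cycles per DRAM access, and 1 cycle per transferred word, with a one-word-wide memory. A 4-word block then has a miss penalty of 1 + 4 × 15 + 4 × 1 = 65 cycles, while a 1-word block costs only 1 + 15 + 1 = 17 cycles. Critical word first would let the processor resume after the first 17 cycles while the rest of the block streams in.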
Handling cache writes
For a store instruction, data is written into the data cache
The data in the cache then differs from the data in memory, causing inconsistency
Consistency can be achieved through
Write-through: writes update both the cache and memory
Simple, but it slows execution down
Efficiency can be improved through a write buffer
Write-back: blocks are updated only in the cache; the modified block is written to main memory when the block is replaced
A sketch of the write-back policy follows.
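A minimal sketch of the write-back policy, under the same assumed layout as the read sketch above (write-allocate is assumed; a write-through store would instead also update memory on every store and would need no dirty bit):

#include <stdint.h>

#define NUM_BLOCKS 256
#define MEM_WORDS  (1u << 20)

struct wline { int valid, dirty; uint32_t tag, data; };
static struct wline wcache[NUM_BLOCKS];
static uint32_t wmemory[MEM_WORDS];

void store_word(uint32_t addr, uint32_t value) {
    uint32_t block = (addr >> 2) % MEM_WORDS;
    uint32_t index = block % NUM_BLOCKS;
    uint32_t tag   = block / NUM_BLOCKS;
    struct wline *l = &wcache[index];
    if (l->valid && l->tag != tag && l->dirty)          /* evicting a modified block */
        wmemory[l->tag * NUM_BLOCKS + index] = l->data; /* write it back first       */
    l->valid = 1;
    l->tag   = tag;
    l->data  = value;     /* only the cache is updated ...            */
    l->dirty = 1;         /* ... memory stays stale until replacement */
}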
Measuring cache performance
Calculating cache performance
Instruction cache miss rate = 4%
Data cache miss rate = 6%
CPI = 4 (ideal scenario with no memory stalls)
Miss penalty = 200 clock cycles
Memory accesses (loads + stores) = 40% of total instructions
Instruction miss cycles = I × 0.04 × 200 = 8 × I (8 stall cycles per instruction)
Data miss cycles = I × 0.4 × 0.06 × 200 = 4.8 × I (4.8 stall cycles per instruction)
CPI = 4 + 8 + 4.8 = 16.8
Performance comparison = 16.8 / 4 = 4.2, i.e. the machine with a perfect cache would be 4.2 times faster
What happens if the ideal CPI is reduced from 4 to 2 and the clock cycle time is kept the same?
What happens if the clock cycle time is halved and the CPI is kept the same? (Worked answers below)
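Hedged worked answers, following the same method (not given on the slide):
– Ideal CPI of 2, same clock: the stall cycles per instruction are unchanged, so CPI = 2 + 8 + 4.8 = 14.8. Memory stalls now account for 12.8 / 14.8 ≈ 86% of execution time (up from 12.8 / 16.8 ≈ 76%), and the perfect-cache machine would be 14.8 / 2 = 7.4 times faster.
– Clock cycle time halved, same CPI: if main memory speed is unchanged, the miss penalty doubles to 400 cycles, so CPI = 4 + 0.04 × 400 + 0.4 × 0.06 × 400 = 4 + 16 + 9.6 = 29.6. The machine gets faster by only 16.8 / (29.6 / 2) ≈ 1.14×, far less than the raw 2× clock improvement, because memory stalls grow.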
Improving cache performance (1)
So far we know only one mapping technique between memory and cache, i.e. the direct mapped technique
Inflexible
Only one available position for a memory block in the cache
The other extreme is the fully associative cache: a block from main memory can be placed anywhere in the cache
Flexible
All free cache locations are available
Search is made more difficult
Search time increases
To keep the search time down, the search must be performed in parallel, which requires extra hardware
The middle ground between the two extremes is the set associative cache: a block from main memory has a fixed number of possible locations (at least two)
n-way set associative cache (2-way, 4-way, 8-way, etc.)
Improving cache performance (2)
Set associative cache
A mixture of the direct mapped and fully associative caches
A block is directly mapped to a set
Inside that set it can be placed anywhere, depending upon the associativity of the mapping
Formula for a set associative cache
(block number) modulo (number of sets in the cache)
A direct mapped cache is simply a 1-way set associative cache
Comparison of different mapping techniques
Let there be a cache of 8 blocks, and suppose block number 12 is to be mapped onto this cache using the different mapping techniques
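As a worked answer (the slide's figure is not reproduced here): direct mapped: 12 mod 8 = 4, so block 12 can go only in cache block 4; 2-way set associative (4 sets): 12 mod 4 = 0, so it can go in either way of set 0; fully associative: it can be placed in any of the 8 cache blocks.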
Performance comparison of different mapping techniques
Cache size = 4 blocks
Comparison between direct mapped, 2-way set
associative, fully associative
Address sequence = 0, 8, 0, 6, 8
For direct mapped
Performance comparison of different mapping techniques
For 2-way set associative cache
Performance comparison of different mapping techniques
For fully associative cache
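A hedged worked trace of all three cases (LRU replacement assumed; the slides' tables are not reproduced):
– Direct mapped: 0 and 8 both map to cache block 0, and 6 maps to block 2. The sequence 0, 8, 0, 6, 8 gives miss, miss, miss, miss, miss: 5 misses, because 0 and 8 keep evicting each other.
– 2-way set associative (2 sets): 0, 8, and 6 all map to set 0. The trace gives miss, miss, hit, miss (6 evicts 8), miss (8 evicts 0): 4 misses.
– Fully associative: miss, miss, hit, miss, hit: 3 misses, the compulsory minimum.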
• Three misses are unavoidable for this sequence under every mapping technique
• What if there is an eight-block cache?
• What will be the effect on the set associative and direct mapped caches?
• What if there is a 16-block cache?
• What will be its effect on the direct mapped and set associative caches?
Locating a block in a set-associative cache
Any block address has three fields: tag, index, and block offset
• In a direct mapped cache, the index field takes you to the exact location
• The tag field then decides whether it is a hit or a miss
• In a set-associative cache, the index field takes you to the set
• Tag comparison is then performed for all the blocks in the set in parallel to search for a match
• In a set associative cache, when the associativity increases by a factor of 2
– The index field decreases by one bit
– The tag field increases by one bit
• In a fully associative cache, no index field is required
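A C sketch of splitting a 32-bit byte address into these fields; the field widths and the example address are assumed for illustration (4-word blocks give a 4-bit offset, 256 sets give an 8-bit index):

#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint32_t addr = 0x12345678;   /* assumed example address          */
    unsigned offset_bits = 4;     /* 4 words x 4 bytes per block      */
    unsigned index_bits  = 8;     /* 256 sets in the cache            */
    uint32_t offset = addr & ((1u << offset_bits) - 1);
    uint32_t index  = (addr >> offset_bits) & ((1u << index_bits) - 1);
    uint32_t tag    = addr >> (offset_bits + index_bits);
    /* prints: tag=0x12345 index=103 offset=8 */
    printf("tag=0x%x index=%u offset=%u\n", tag, index, offset);
    return 0;
}

Doubling the associativity halves the number of sets, moving one bit from the index field into the tag field, exactly as the list above states.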
Locating a block in a set-associative cache
• In a direct mapped cache, only a single comparator is needed
• In a set-associative cache, the number of comparators increases with the associativity
• Increasing associativity decreases the miss rate but increases hardware cost and the hit (access) time
Size of tags and set-associativity
Cache size = 4K blocks
Block size = 4 words/block
Word size = 4 bytes
Address size = 32 bits
Number of sets = ?
Total number of tag bits = ?
For direct mapped, 2-way set associative, 4-way set associative, and fully associative (worked out below)
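One way to work this out (a hedged solution consistent with the numbers above): a 4-word block is 16 bytes, so the block offset takes 4 address bits, leaving 28 bits for tag + index.
– Direct mapped: 4K = 2^12 sets, index = 12 bits, tag = 28 − 12 = 16 bits, total tag bits = 16 × 4K = 64 Kbits
– 2-way: 2K sets, index = 11 bits, tag = 17 bits, total = 17 × 4K = 68 Kbits
– 4-way: 1K sets, index = 10 bits, tag = 18 bits, total = 18 × 4K = 72 Kbits
– Fully associative: 1 set, no index, tag = 28 bits, total = 28 × 4K = 112 Kbits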
Multi-level caches
A memory hierarchy having multiple levels of caches rather than just a single cache and a memory
Multi-level caches reduce the miss penalty and hence improve cache performance
For example, a two-level cache structure allows
the primary (L1) cache to focus on reduced access time
while the secondary (L2) cache focuses on a reduced miss rate
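A hedged numeric illustration of why this helps (all figures assumed): with an L1 hit time of 1 cycle, an L1 miss rate of 5%, an L2 access time of 10 cycles, an L2 local miss rate of 20%, and a 100-cycle main memory access, the average memory access time is 1 + 0.05 × (10 + 0.2 × 100) = 2.5 cycles, versus 1 + 0.05 × 100 = 6 cycles with no L2.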
Summary of cache performance improvement
Set associative caches reduce the miss rate
Multi-level caches reduce the miss penalty
Both eventually lead to
Fewer memory stall cycles
Improved execution time and hence better performance
Four cache hierarchy questions
Q1: Where can a block be placed in a cache?
Three mapping techniques are commonly used
Direct mapped cache
If each block in main memory has only one possible destination in the cache, the cache is termed direct mapped
(block address) mod (number of blocks in the cache)
Fully associative cache
If a block can be placed anywhere in the cache, the cache is said to be fully associative
Set associative cache
If a block can be placed in a restricted set of places in the cache, the cache is set associative
A block is first mapped onto a set and can then be placed anywhere within that set
(block address) mod (number of sets in the cache)
Mapping example
Four cache hierarchy questions
Q2: How is a block found if it is in cache?