
ASSIGNMENT 2

A Report on Memory Management and Organization.


1. Introduction
A memory system is a necessity for any modern computer. The term computer memory usually refers to main memory, or primary memory, which temporarily holds the data and instructions needed by the Central Processing Unit (CPU) during process execution. Today's computer memory is organized as a hierarchical structure that includes ROM, RAM, cache, and virtual memory. A perfect memory organization would have unlimited capacity and be infinitely fast, so that it never limits the processor; like the Universal Turing Machine, such an organization is not practically implementable. In reality, there is a growing gap between the speed of memory and the speed of microprocessors. In the early 1980s, the average memory access time was about 200 ns, nearly the same as the clock cycle of the 4.77 MHz microprocessors (210 ns) commonly used in the same period. Three decades later, a typical home microprocessor runs at 4 GHz (a 0.25 ns clock cycle), yet memory access time is still only around 10 ns. This growing processor-memory performance gap has become the primary obstacle to improving computer system performance.
2. Memory Hierarchical Structure
Memory was a single-level scheme in early computers. However, as computers became faster and programs grew larger, especially with multiple processes executing concurrently on the same machine, a single main memory that was both fast enough and large enough was not available. Memory hierarchies were therefore designed to provide relatively fast access at low cost, within economic and physical constraints. Because of these cost-performance trade-offs, the memory hierarchy of a modern computer usually contains registers, cache, main memory, and virtual memory. The concept of a memory hierarchy exploits the principle of locality, which states that recently accessed memory words are likely to be referenced again soon (temporal locality) and that memory words adjacent to an accessed word are likely to be accessed shortly afterwards (spatial locality). Loops, functions, procedures, and variables used for counting and totaling all exhibit temporal locality: recently referenced memory locations are likely to be referenced again in the near future. Array traversal, sequential code execution, modules, and the tendency of programmers (or compilers) to place related variable definitions near one another all exhibit spatial locality; they tend to generate clustered memory references. The principle of locality is particularly relevant to memory design for two reasons. First, in most technologies, smaller memories are faster than larger memories. Second, larger memories need more steps and more time to decode addresses and fetch the requested data.
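As a rough illustration (not part of the original report), the short C program below shows both kinds of locality in one loop: the accumulator and loop counter are reused on every iteration (temporal locality), while the array elements are adjacent in memory, so one cache line fetch serves several later iterations (spatial locality).

    #include <stdio.h>
    #include <stddef.h>

    /* Illustrative only: summing an array exhibits both kinds of locality.
     * 'sum' and 'i' are touched every iteration (temporal locality);
     * a[0], a[1], ... are contiguous in memory (spatial locality). */
    static double sum_array(const double *a, size_t n)
    {
        double sum = 0.0;
        for (size_t i = 0; i < n; i++)
            sum += a[i];
        return sum;
    }

    int main(void)
    {
        double a[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };
        printf("%f\n", sum_array(a, 8));   /* prints 36.000000 */
        return 0;
    }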
2. Cache
In modern computer systems, caches are small, high-speed, temporary storage areas that hold copies of main memory data currently in use, where the original data is expensive to access compared with the cost of accessing the cache. The success of cache memories is explained by their exploitation of the principle of locality. When the CPU needs a particular piece of information, it first checks whether it is in the cache. If it is, the CPU uses the information directly from the cache; if it is not, the CPU fetches the information from
main memory, putting a copy in the cache under the assumption that it will be needed again soon (temporal locality). Information located in cache memory is accessed in much less time than information located in main memory. Thus, the CPU spends far less time waiting for instructions and data to be fetched and stored.
To be cost-effective and to allow efficient data lookup, caches have limited size. Many aspects of cache design are important, including the selection of the cache line size, the cache fetch algorithm, the placement algorithm, and the replacement policy.
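The check-then-fetch behaviour described above can be sketched with a deliberately tiny, hypothetical direct-mapped cache model (our own example, with assumed parameters of 8 lines of 64 bytes); it only decides hit or miss and keeps a copy of missed blocks:

    #include <stdint.h>
    #include <stdbool.h>
    #include <stdio.h>

    #define NLINES    8     /* assumed number of cache lines  */
    #define LINE_SIZE 64    /* assumed line size in bytes     */

    struct line { bool valid; uint64_t tag; };
    static struct line cache[NLINES];

    static bool access_addr(uint64_t addr)
    {
        uint64_t block = addr / LINE_SIZE;   /* line-sized memory block        */
        uint64_t idx   = block % NLINES;     /* which cache line it maps to    */
        uint64_t tag   = block / NLINES;     /* identifies the block uniquely  */

        if (cache[idx].valid && cache[idx].tag == tag)
            return true;                     /* hit: use the cached copy       */

        cache[idx].valid = true;             /* miss: fetch from main memory   */
        cache[idx].tag   = tag;              /* and keep a copy for reuse      */
        return false;
    }

    int main(void)
    {
        uint64_t addrs[] = { 0x1000, 0x1008, 0x2000, 0x1010 };
        for (int i = 0; i < 4; i++)
            printf("0x%llx -> %s\n", (unsigned long long)addrs[i],
                   access_addr(addrs[i]) ? "hit" : "miss");
        return 0;
    }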
2.1. The Cache Line
The cache line is the fixed-size block of information transferred between the cache and main memory; selecting an appropriate line size is crucial to cache design.
Small line sizes have several advantages: 1) the transmission time for moving the whole line from main memory to the cache is obviously shorter than that for a long line; 2) a small line is less likely to contain unneeded information, since only a few extra bytes are brought in along with the actually requested information. On the other hand, large line sizes also have a number of benefits: 1) if more of the information in a line is actually used, fetching it all at once is more efficient; 2) the number of lines in the cache is smaller, so fewer logic gates and storage bits are required to keep and manage address tags and replacement status; 3) a larger line size means fewer sets in the cache, which reduces the associative search logic.
Note that the advantages above for long and short lines become disadvantages for the other. Another criterion for selecting a line size is its effect on the miss ratio. Normally, a large line size exploits spatial locality to lower the miss rate, but the miss penalty increases because fetching a larger block takes longer. In conclusion, selecting the optimum block size is a trade-off; typical line sizes lie in the range of 64 to 256 bytes.
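As a worked example with assumed figures of our own: with a fixed 32 KB cache, a 64-byte line gives 32768 / 64 = 512 lines and uses the low log2(64) = 6 address bits as the byte offset within a line. Doubling the line size to 128 bytes halves the line count to 256, so fewer tags and less replacement state must be stored, but every miss now transfers 128 bytes, roughly doubling the transfer portion of the miss penalty.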
2.2. Placement Algorithm
The placement algorithm determines where information is to be placed, i.e., it chooses an unallocated region for a subset. In a cache, mapping schemes are used to translate a main memory address into a cache location, to search a cache set, or to perform some combination of the two. The most commonly used placement algorithm is set-associative mapping. It divides the cache into S sets of E lines per set. Given a memory address r(i), a mapping function f maps it to a set by f(r(i)) = s(i). When E is 1, known as direct mapping, there is only one line per set and the mapping function f is many-to-one; conflicts are quite frequent if two or more currently active lines map into the same set. When S is 1, the cache is a fully associative memory, allowing a memory block to be placed anywhere in the cache. The disadvantage of fully associative mapping is that finding a mapped block requires searching all lines of the cache, which is slow and expensive. An effective compromise between these two extremes, trading off the miss rate against the miss penalty, is to select a value of E in the range of 2 to 16. Studies indicate that such a range performs almost as well as a fully associative cache at a fraction of the cost.
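The C sketch below (our own illustration, with an assumed geometry of 64-byte lines, S = 64 sets, and E = 4 lines per set) shows how a set-associative cache typically derives the set index s(i) and the tag from a memory address: the byte offset comes from the line size, the set index from S, and the remaining bits form the tag that is compared against the E lines of the selected set.

    #include <stdint.h>
    #include <stdio.h>

    #define LINE_SIZE 64   /* assumed line size       */
    #define NSETS     64   /* assumed number of sets S */

    int main(void)
    {
        uint64_t addr   = 0x0001F2A4;         /* arbitrary example address      */
        uint64_t offset = addr % LINE_SIZE;   /* byte within the line           */
        uint64_t block  = addr / LINE_SIZE;   /* line-sized block number        */
        uint64_t set    = block % NSETS;      /* set index, i.e. f(r(i)) = s(i) */
        uint64_t tag    = block / NSETS;      /* compared against E stored tags */

        printf("offset=%llu set=%llu tag=0x%llx\n",
               (unsigned long long)offset, (unsigned long long)set,
               (unsigned long long)tag);
        return 0;
    }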
2.3. Cache Fetch Algorithm and Cache Replacement Policy
The standard cache fetch algorithm, demand fetching, fetches a cache line when it is needed. If the cache is full, a cache miss requires a fetch plus a replacement, that is, removing a line from the cache. Replacement algorithms have been extensively studied for paged main memory. Even though a cache memory imposes some more stringent constraints, for
example, that the replacement algorithm must be implemented entirely in hardware and must execute very quickly, cache replacement algorithms are very similar to the replacement algorithms used for main memory; we therefore defer that discussion to section 3.
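As a hedged sketch of what such a hardware-friendly policy can look like (our own illustration with an assumed 4-way set; the full discussion is deferred to section 3), least-recently-used (LRU) replacement within one set can be modelled with a timestamp per line, the oldest timestamp marking the victim. Real hardware typically uses compact approximations such as pseudo-LRU bits instead of full counters.

    #include <stdint.h>
    #include <stdio.h>

    #define E 4  /* assumed associativity of a single cache set */

    static uint64_t clock_tick;     /* toy global time            */
    static uint64_t last_used[E];   /* per-line last-use time     */

    static void touch(int line)  { last_used[line] = ++clock_tick; } /* on a hit  */

    static int victim(void)                                          /* on a miss */
    {
        int v = 0;
        for (int i = 1; i < E; i++)
            if (last_used[i] < last_used[v]) v = i;
        return v;                   /* least recently used line   */
    }

    int main(void)
    {
        touch(0); touch(1); touch(2); touch(3); touch(1);  /* line 0 is now oldest */
        printf("evict line %d\n", victim());               /* prints: evict line 0 */
        return 0;
    }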
Finally, aspects such as the write-through or write-back policy, cache bandwidth, input/output, multi-cache consistency, the data path, and pipelining are not covered in this paper, but they are also important aspects of cache design.
2.4. Cache Optimization Techniques
In this section, some cache optimization techniques are discussed that improve cache performance by reducing hit time, increasing bandwidth, reducing miss penalty, and reducing miss rate. Compiler optimization and prefetching techniques for main memory management will be discussed in the next section.
2.4.1. Small and Multi-level Cache
Cache is extremely expensive, and a small cache decreases hit time but increases the miss rate. Thus, many computers use multiple levels of caches to address the trade-off between cache latency and hit rate. Small, fast caches on chip are backed up by larger, slower caches on separate memory chips. Using multiple levels of caches can yield an overall improvement in performance. Some modern processors have three levels of on-chip cache; for example, the AMD Phenom II has a 6 MB on-chip L3 cache and the Intel i7 has an 8 MB on-chip L3 cache.
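As a worked example with assumed figures of our own: if an L1 hit takes 1 ns with a 95% hit rate, an L2 hit takes 5 ns with a 90% local hit rate, and main memory takes 60 ns, the average access time is 1 + 0.05 x (5 + 0.10 x 60) = 1.55 ns, versus 1 + 0.05 x 60 = 4 ns with the small L1 alone, which is why adding levels pays off.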
4.2. Paging
Another method to decrease the size of the mapping information is paging. Most of the memory management schemes used previously (including segmentation) suffer from significant fragmentation when fitting variously sized chunks of memory onto the backing store. Paging avoids this problem, and because of its advantages over other methods, paging in its various forms is used in most operating systems.
Paging is implemented by breaking main memory into fixed-size blocks called frames, which hold blocks of logical memory of the same size, known as pages. When a process is executed, its pages are loaded into any available memory frames from auxiliary storage. Each page is identified by a virtual address and each frame by a physical address, which is the address of the first word in the frame. Similar to a virtual segmentation address, a virtual memory address is an ordered pair (p, d), where p is the page number and d is the offset within page p. The page number p is used as an index into a page table, which contains the base address of each page in physical memory (see figure 3). Since the page size is a power of 2, extracting (p, d) from a virtual address is trivial. There is no standard page size; the page/frame size varies between 512 bytes and 16 MB, depending on the architecture. A similar discussion about different cache line sizes can be found in section 2. Early studies, both theoretical and empirical, found that small pages gave better performance. However, a large page size can alleviate the high latency of secondary storage devices. Currently, with the rapid increase of both memory and program sizes, larger page sizes have become more desirable.
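As a brief sketch with an assumed 4 KB page size (a common value within the 512-byte to 16 MB range above), splitting a virtual address into the pair (p, d) is just a shift and a mask, precisely because the page size is a power of two:

    #include <stdint.h>
    #include <stdio.h>

    #define PAGE_SIZE   4096u   /* assumed 4 KB pages   */
    #define OFFSET_BITS 12      /* log2(PAGE_SIZE)      */

    int main(void)
    {
        uint64_t vaddr = 0x00403A17;          /* arbitrary example address */
        uint64_t p = vaddr >> OFFSET_BITS;    /* page number: 0x403        */
        uint64_t d = vaddr & (PAGE_SIZE - 1); /* offset in page: 0xA17     */

        printf("p = 0x%llx, d = 0x%llx\n",
               (unsigned long long)p, (unsigned long long)d);
        return 0;
    }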
Paging is used by almost every computer manufacturer in at least one of its products.
Under dynamic relocation, every logical address is translated and bound by the paging hardware to some real address. One benefit of a paged virtual memory system is that not all the pages of a process must reside in main memory at the same time: the main memory could
contain only the pages that the process is currently referencing. The page table is therefore used to indicate whether or not a mapped page currently resides in main memory. If the page is in memory, the page table gives the number of the corresponding frame. Otherwise, the page table yields the location in secondary storage at which the referenced page is stored. When a process references a page that is not in main memory, the processor generates a page fault, which invokes the operating system to load the missing page into memory from secondary storage. As in the implementation of segmentation, the page table can be stored in memory. Consequently, a reference to the page table requires one complete main memory cycle, and a memory reference requires a total of two memory cycles. To achieve faster translation, an associative memory, the translation lookaside buffer (TLB), can be used to retain the most recently used page table entries. The TLB is an integral part of today's MMUs. Because of its prohibitive cost, the TLB can hold only a small portion of most virtual address spaces. It is a challenge to pick a size that balances cost against having enough entries for a large percentage of references to hit in the TLB.
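As a worked example with assumed figures of our own, ignoring the TLB lookup time itself: if a main memory cycle takes 10 ns, a TLB hit needs one memory access (10 ns) while a TLB miss needs two (20 ns, page table plus data). With a 98% TLB hit ratio the effective access time is roughly 0.98 x 10 ns + 0.02 x 20 ns = 10.2 ns, versus 20 ns per reference if every translation went through the in-memory page table.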
