
Unit IV - Memory System

5.1) Basics: Memory is designed to store and retrieve data / information.

The maximum memory size is determined by the addressing scheme used. Example: a 16-bit address gives 2^16 = 64K memory locations. Modern computers are byte addressable.
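The relationship between address width and memory size can be checked with a small helper (an illustrative sketch, not part of the notes; the function name is made up):

```python
def addressable_locations(address_bits):
    """A k-bit address can select 2**k distinct locations."""
    return 2 ** address_bits

print(addressable_locations(16))          # 65536 locations
print(addressable_locations(16) // 1024)  # 64, i.e. 64K
```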

Connection of the Memory to the Processor:


Figure 5.1. Connection of the memory to the processor. (The processor's MAR drives a k-bit address bus and its MDR an n-bit data bus; the memory has up to 2^k addressable locations with a word length of n bits; control lines carry R/W, MFC, etc.)

The processor contains two registers, MAR (Memory Address Register) and MDR (Memory Data Register). To read data from memory, MAR supplies the address of the location and the data read from that location are placed in MDR (R/W = 1). To write data into memory, MAR supplies the address and MDR passes the data bits to that address (R/W = 0). MFC (Memory Function Completed) indicates whether the memory operation has completed. Operations involving consecutive address locations are called Block Transfers.

Basic Measures of Memory: Memory Access Time: the time between the initiation and the completion of an operation. Memory Cycle Time: the minimum time delay between the initiation of two successive memory operations.

Types of Memory:
RAM (Random Access Memory / Read-Write Memory)
o Static RAM, CMOS cell
o Dynamic RAM
   - Asynchronous DRAM
   - Synchronous DRAM
o Rambus Memory
ROM (Read Only Memory)
o PROM
o EPROM
o EEPROM
o Flash Memory
Cache Memory
Virtual Memory
Secondary Storage
o Magnetic Hard Disks
o Optical Disks
o Magnetic Tape Systems

5.2) Semiconductor RAM Memories: If any location in a memory can be accessed for a Read/Write operation in a fixed time that is independent of the location's address, the memory is called a Random Access Memory.

5.2.1 Internal Organization of Memory Chips: Memory cells are organized in the form of arrays. Each cell is capable of storing one bit of information. Each row of cells constitutes a memory word. The cells of a row are connected to a common line referred to as the Word Line, which is driven by the address decoder. The cells in each column are connected to a Sense/Write circuit by two bit lines. In a Read operation, this circuit reads the information from the cells of the selected word line and transmits it to the output lines.

In a Write operation, these circuits receive input information and store it in the cells of the selected word line. A control line CS (Chip Select) selects one chip among multiple chips.
Figure 5.2. Organization of bit cells in a memory chip. (A 16 x 8 array: address bits A0-A3 feed the address decoder, which drives word lines W0-W15; in each column a pair of bit lines b/b' connects the flip-flop cells to a Sense/Write circuit controlled by R/W and CS, with data input/output lines b7 ... b0.)

This chip stores 128 bits (16 words of 8 bits). It requires 14 external connections for address, data and control lines, plus two lines for power supply and ground.

Figure 5.3. Organization of a 1K x 1 memory chip. (The 10-bit address is split into a 5-bit row address, decoded to select one of word lines W0-W31 of a 32 x 32 memory-cell array, and a 5-bit column address that drives a 32-to-1 output multiplexer / input demultiplexer through the Sense/Write circuitry; control lines are R/W and CS.)

This is a slightly larger memory circuit with 1K (1024) memory cells. Organized as a 128 x 8 memory it would require 19 external connections; organized as 1K x 1 it requires only 15, since a single data line suffices. The 10-bit address is divided into a 5-bit row address and a 5-bit column address. Each row contains 32 cells. The input is demultiplexed to, and the output multiplexed from, the 32 cells of the selected row down to 1 bit.
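The 5-bit row / 5-bit column split above can be sketched as follows (a hypothetical helper written for these notes, not vendor code):

```python
def split_1k_address(addr):
    """Split the 10-bit address of the 1K x 1 chip into (row, column).

    The high-order 5 bits select one of 32 rows (word lines W0-W31);
    the low-order 5 bits pick one of 32 columns via the multiplexer.
    """
    row = (addr >> 5) & 0b11111
    col = addr & 0b11111
    return row, col

print(split_1k_address(0b10110_01101))  # (22, 13)
```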

5.2.2 Static Memories: These are memory circuits that are capable of retaining their state as long as power is applied.

Figure 5.4. A static RAM cell. (A latch connected to bit lines b and b' through transistors T1 and T2, controlled by the word line.)

Two inverters are cross-connected to form a latch. The latch is connected to two bit lines by transistors T1 and T2, which act as switches under the control of the word line. Read operation: to read the state of the SRAM cell, the word line is activated so that T1 and T2 turn on. If the cell is in state 1, the signal on bit line b is high and on b' it is low. Write operation: to set the state of the SRAM cell, the appropriate values are placed on b and b' and the word line is activated. This forces the cell into the corresponding state.

CMOS Cell: Complementary Metal Oxide Semiconductor cell

Figure 5.5. An example of a CMOS memory cell. (Transistor pairs T3/T5 and T4/T6, connected between V_supply and the cross-coupled points X and Y, form the two inverters of the latch; T1 and T2 connect X and Y to the bit lines under control of the word line.)

Transistor pairs (T3, T5) and (T4, T6) form the inverters in the latch. In state 1, point X has a high voltage, which turns T3 and T6 on and T4 and T5 off, so the voltage on bit line b is high. A continuous power supply is needed; an interruption of the power supply loses the cell contents. Even when power is restored, the cell state may not be the same; hence this is called a Volatile Memory. Advantages:
o Low power consumption
o Produces less heat

Disadvantage:
o Needs a continuous power supply

5.2.3 Asynchronous DRAMs: Static RAMs are fast, but they need more space and more transistors per cell and are expensive. DRAMs are cheap and dense, but they cannot retain their state indefinitely and need to be refreshed periodically.

Figure 5.6. A single-transistor dynamic memory cell. (A transistor T connects a capacitor C to the bit line, gated by the word line.)

DRAMs store information in the form of charge on capacitors. The charge is maintained only for tens of milliseconds, so the capacitor charge must be restored to its full value by refreshing periodically. Read operation: a sense amplifier connected to the bit line detects whether the charge stored on the capacitor is above a threshold value.
o Above the threshold value -> logic 1 on the bit line
o Below the threshold value -> logic 0 on the bit line

Reading the contents of a cell automatically refreshes all the cells of the selected row; hence reading (and refreshing) is done at row level. To store information in a cell, transistor T is turned on and the appropriate voltage is applied to the bit line, which allows C to charge to its full value.

A Dynamic Memory Chip:

Figure 5.7. Internal organization of a 2M x 8 dynamic memory chip. (Address lines A20-9 / A8-0 are latched into a row address latch under RAS (Row Address Strobe) and a column address latch under CAS (Column Address Strobe); the row decoder selects one row of the 4096 x (512 x 8) cell array, and the column decoder connects the Sense/Write circuits to data lines D7-D0; control inputs are CS and R/W.)

This is a 16-megabit DRAM chip configured as 2M x 8. The cells are organized in the form of a 4K x 4K array. The 4096 cells in each row are divided into 512 groups of 8, so each row can store 512 bytes of data. Twelve address bits are needed to select a row, and 9 address bits specify a group of 8 bits in the selected row. R/W operation:
o The row address is applied first.
o On the RAS signal, the address is loaded into the row address latch.
o The column address is loaded by the CAS signal and decoded.
o Read operation: the Sense/Write circuits read the data from the selected cells and transmit them to the D7-D0 data lines.
o Write operation: the Sense/Write circuits take the input from D7-D0 and load it into the selected cells.
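The 12-bit row / 9-bit column split of the 21-bit address can be sketched as below (an illustrative helper; the function name is made up):

```python
ROW_BITS, COL_BITS = 12, 9   # 2M x 8 chip: 21-bit address in total

def dram_address(addr):
    """Split a 21-bit address into the row address (applied with RAS)
    and the 9-bit column-group address (applied with CAS)."""
    row = (addr >> COL_BITS) & ((1 << ROW_BITS) - 1)
    col = addr & ((1 << COL_BITS) - 1)
    return row, col

print(dram_address((5 << 9) | 7))  # (5, 7): row 5, byte group 7
```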

RAS and CAS are active low and serve as asynchronous control signals. The DRAM chip is organized to read/write a number of bits in parallel.

Chips are available in sizes ranging from 1M to 256M bits. Advantages: high density and low cost.

Fast Page Mode: When the DRAM in Fig 5.7 is accessed, the contents of all 4096 cells in the selected row are sensed, but only 8 bits are placed on the data lines D7-D0. Fast Page Mode makes it possible to access the other bytes in the same row without having to reselect the row: a latch is added at the output of the sense amplifier in each column. This mode is useful for bulk transfers.

5.2.4 Synchronous DRAMs:

As technology developed, DRAMs whose operation is directly synchronized with a clock signal appeared; they are called Synchronous DRAMs. The cell array is the same as in an asynchronous DRAM.
Fig 5.8. Synchronous DRAM. (A refresh counter and the row/column address input feed a row address latch and a column address counter; the row and column decoders select cells in the array, whose contents pass through Read/Write circuits and latches to data input and output registers; a mode register and timing control block is driven by the clock and the RAS, CAS, R/W and CS inputs.)

The address and data connections are buffered by means of registers. The output of each sense amplifier is connected to a latch. A read operation loads the contents of all cells of the selected row into these latches, and the data in the latches of the selected columns are transferred to the data output register. A refresh operation merely refreshes the cell contents, leaving the latch values unchanged. SDRAMs have different modes of operation, which can be selected by writing control information into a mode register. Externally generated column timing pulses on CAS are not needed for successive accesses; an internal column counter and the clock signal provide this control. All actions are triggered by the rising edge of the clock.

Figure 5.9. Burst read of length 4 in an SDRAM. (Timing diagram showing the clock, R/W, RAS, CAS, the row and column addresses, and data words D0-D3 appearing on successive clock cycles.)

The row address is latched under control of the RAS signal. The memory typically takes 2 or 3 clock cycles to activate the selected row. Then the column address is latched by the CAS signal. After a delay of one clock cycle, the first data word is placed on the data lines, and the SDRAM automatically increments the column address to select the next data words. A refresh counter provides the addresses of the rows selected for refreshing; the refresh circuits refresh every row within 64 ms. Clock frequencies exceed 100 MHz.

Latency and Bandwidth: Transfers between memory and processor involve single words of data or small blocks of words. The speed and efficiency of these transfers have a significant impact on the performance of a computer. Performance is characterized by two parameters:
o Latency: for a single word transfer, the amount of time it takes to transfer a word of data to or from the memory; for a block transfer, the amount of time it takes to transfer the first word of data.
o Bandwidth: the number of bits or bytes that can be transferred in one second. It determines how much time is needed to transfer an entire block of data, and depends on the speed of the memory, the transfer capability of the links between memory and processor, and the speed of the bus.
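The two parameters combine in a simple first-order model of block transfer time (a sketch; the numbers used below are made up for illustration):

```python
def block_transfer_time(latency_s, block_bytes, bandwidth_bytes_per_s):
    """Idealized block transfer: latency to the first word, then the
    rest of the block streams at the given bandwidth."""
    return latency_s + block_bytes / bandwidth_bytes_per_s

# e.g. 1 microsecond latency, 64-byte block, 64 MB/s link
print(block_transfer_time(1e-6, 64, 64e6))
```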

Double Data Rate SDRAM: A standard SDRAM performs all actions on the rising edge of the clock signal. A DDR SDRAM accesses the cell array in the same way, but transfers data on both edges of the clock. The latency is the same as for standard SDRAMs, but the bandwidth is doubled for long burst transfers. The cell array is organized in two banks, each of which can be accessed separately. DDR SDRAMs and standard SDRAMs are most efficiently used in applications with block transfers.

5.2.5 Structure of larger Memories: Connecting memory chips to form a larger memory.

Figure 5.10. Organization of a 2M x 32 memory module using 512K x 8 static memory chips. (The 21-bit address A0-A20 is split into a 19-bit internal chip address and 2 high-order bits that drive a 2-bit decoder generating the chip-select signals; the chips are arranged in four rows of four, each column supplying 8 of the 32 data lines D31-24, D23-16, D15-8, D7-0.)

Static Memory Systems (Fig 5.10): Consider a memory of 2M words of 32 bits each, implemented with 512K x 8 static memory chips. CS selects a chip; when it is logic 1, the selected chips can read/write data on the data lines. The 19 low-order address bits access a specific byte location inside each chip of the selected row; the 2 high-order address bits select which of the four chip-select control signals is activated.
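The 2-bit chip select / 19-bit internal address decoding can be sketched like this (an illustrative helper for the module of Fig 5.10):

```python
def decode_module_address(addr):
    """21-bit address of the 2M x 32 module: the 2 high-order bits
    select one of the four chip rows (chip-select lines), and the 19
    low-order bits address a byte inside each 512K x 8 chip."""
    chip_select = (addr >> 19) & 0b11
    internal = addr & ((1 << 19) - 1)
    return chip_select, internal

print(decode_module_address((3 << 19) | 100))  # (3, 100)
```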

Dynamic Memory Systems: PCs use at least 32M bytes of memory; workstations use 128M bytes or more. A larger memory leads to better performance but occupies more space on the motherboard. This led to the development of larger memory units known as SIMMs (Single In-line Memory Modules) and DIMMs (Dual In-line Memory Modules).


5.2.6 Memory System Considerations: The choice of a RAM chip for a given application depends on several factors: cost, speed, power dissipation, size, etc. SRAMs are faster but more expensive, since their cells are larger; they are used in cache memories. DRAMs are slower and cheaper, with smaller cells; they are used in main memory. Memory Controller: The same address pins are used for the row and the column address. The RAS and CAS signals indicate whether the address pins carry the row or the column address, and the R/W signal specifies the required operation. The memory controller acts as the address multiplexer. Most DRAM chips do not have self-refreshing capability, so the controller also provides the refresh addresses.

Fig 5.11. Use of a memory controller. (The processor sends the full address, R/W, a request signal and the clock to the memory controller, which forwards the multiplexed row/column address together with RAS, CAS, R/W, CS and the clock to the memory; the data lines connect the processor and the memory directly.)

Refresh Overhead: All DRAM cells must be refreshed, typically every 64 ms; the time spent on refresh operations slightly reduces the time available for useful accesses.

5.2.7 Rambus Memory: Rambus is a fast signaling technology used to transfer information between chips. Instead of signals that swing between 0 and V_supply, it uses a reference voltage V_ref of about 2V; the two logic values are represented by 0.3V swings above and below V_ref. This type of signaling is called differential signaling and allows short transition times. The communication links that use this signaling form the Rambus Channel. Communicating devices such as the processor serve as masters and the RDRAM modules serve as slaves; communication is carried out by means of packets transmitted on the data lines. There are 3 types of packets:
o Request: issued by the master to indicate the type of operation; it contains the address of the desired memory location and an 8-bit count that specifies the number of bytes involved in the transfer.
o Acknowledge: the slave responds by returning a positive or negative acknowledgement packet.
o Data packet.
RDRAM chips can be assembled into larger modules, similar to SIMMs and DIMMs, called RIMMs.

5.3 ROM (Read Only Memory): A memory that supports only the reading of stored data is a Read Only Memory. 5.3.1 ROM Cell: Data are written into a ROM when it is manufactured. When the transistor is connected to ground at point P, the cell stores logic 0; otherwise it stores logic 1. To read the state of the cell, the word line is activated, T turns on, and the sense circuit delivers the output state from the bit line.

Fig 5.12. A ROM cell. (A transistor T between the bit line and point P, gated by the word line; connected at P to store a 0, not connected to store a 1.)

5.3.2 PROM (Programmable ROM): Some ROM designs allow the data to be loaded by the user; these are called Programmable ROMs. Initially the memory contains all 0s. To insert a logic 1 at a location, a high-current pulse is applied to burn out the fuse at point P.

5.3.3 EPROM (Erasable Programmable ROM): A ROM chip that allows the stored data to be erased and new data to be loaded is an EPROM. EPROMs are capable of retaining stored information for a long time. Erasure is done by exposing the chip to ultraviolet light.

5.3.4 EEPROM (Electrically Erasable Programmable ROM): To erase its contents, an EPROM has to be physically removed from the circuit and exposed to UV light. An EEPROM allows the chip contents to be erased without removing it physically. EEPROMs need different voltages for erasing, writing and reading.

5.3.5 Flash Memory: In an EEPROM it is possible to read and write the contents of every single cell. In a Flash device it is possible to read the contents of a single cell, but writing is only possible for an entire block of cells. Flash offers higher density, lower cost per bit and lower power consumption, and is used in portable battery-powered equipment. Larger modules can be implemented in two ways:
o Flash Cards
o Flash Drives

5.4 Speed, Size and Cost: SRAMs are faster but more expensive, and are used in cache memories. DRAMs are slower and cheaper, and are used in main memory.

Figure 5.13. Memory hierarchy. (From top to bottom: processor registers, primary (L1) cache, secondary (L2) cache, main memory, and magnetic-disk secondary memory; moving down the hierarchy, size increases while speed and cost per bit decrease.)

5.5 Cache Memories: Since the main memory is slow, a cache memory is used, which effectively makes the main memory appear faster to the processor. The effectiveness of the cache mechanism is based on a property of programs called locality of reference: many instructions in localized areas of the program are executed repeatedly during some period of time, while the remainder of the program is accessed relatively infrequently. Locality manifests itself in two ways:
o Temporal: a recently executed instruction is likely to be executed again very soon.
o Spatial: instructions close to a recently executed instruction are likely to be executed soon.
Data are cached in blocks; a cache block is also referred to as a cache line.

Figure 5.14. Use of a cache memory. (Processor <-> Cache <-> Main Memory.)

When a read request comes from the processor, the cache sends the data if the contents of the given address are present; otherwise the data are read from main memory. The cache has far fewer blocks than main memory, so the correspondence between main memory blocks and cache blocks is specified by a mapping function. When the cache is full and a block from main memory has to be read in, the control hardware must decide which cached block to remove to create space for the new block; this decision is made by a replacement algorithm. When a read or write operation finds the addressed word in the cache, a read hit or write hit is said to have occurred. A read hit does not involve main memory. A write hit is handled with one of two techniques:
o Write-through protocol: the cache location and the main memory location are updated simultaneously.
o Write-back (copy-back) protocol: only the cache location is updated, and it is marked with an associated flag bit (dirty or modified bit); main memory is updated later, when the block is removed from the cache.
Read miss: the addressed word of a read operation is not in the cache. Write miss: the addressed word of a write operation is not in the cache.
o With the write-through protocol, the data are written directly into main memory.
o With the write-back protocol, the block is first brought into the cache and then updated there.
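The two write policies can be contrasted with toy models (a minimal sketch with dictionaries standing in for the cache and main memory; class and method names are made up):

```python
class WriteThrough:
    """Toy model: a write updates the cache and main memory together."""
    def __init__(self):
        self.cache, self.memory = {}, {}

    def write(self, addr, value):
        self.cache[addr] = value
        self.memory[addr] = value          # both copies updated at once

class WriteBack:
    """Toy model: a write updates only the cache and sets the dirty
    bit; main memory is updated when the block is evicted."""
    def __init__(self):
        self.cache, self.memory, self.dirty = {}, {}, set()

    def write(self, addr, value):
        self.cache[addr] = value
        self.dirty.add(addr)               # mark block as modified

    def evict(self, addr):
        if addr in self.dirty:             # copy back only if dirty
            self.memory[addr] = self.cache[addr]
            self.dirty.discard(addr)
        self.cache.pop(addr, None)
```

Writing through keeps memory always current at the cost of a memory access per write; writing back defers that cost until eviction.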

5.5.1 Mapping Functions: A mapping function determines the cache locations in which memory blocks may be stored. There are 3 types: Direct, Associative and Set-Associative mapping. Consider a cache of 128 blocks of 16 words each, and a main memory of 4K blocks of 16 words each (64K words, so a 16-bit address).

Direct Mapping: Block j of main memory maps onto block j modulo 128 of the cache, so blocks 0, 128, 256, ... are all loaded into block 0 of the cache. The replacement algorithm is trivial, but contention can arise: for example, if a program uses both block 1 and block 129, they both map to cache block 1. The low-order 4 bits of the main memory address select one of the 16 words in a block; the next 7 bits (the block field) determine the cache position in which the block must be stored; and the high-order 5 bits (the tag) are compared with the tag bits of that cache location. If they match, the desired word is in the cache; otherwise it is read from main memory.

Figure 5.15. Direct-mapped cache. (Main memory blocks 0-4095 map onto cache blocks 0-127; each cache block carries a tag identifying which memory block it currently holds.)

Main memory address: Tag (5 bits) | Block (7 bits) | Word (4 bits)

Example: 11101 1111111 1100
Tag: 11101; Block: 1111111 = 127, i.e. block 127 of the cache; Word: 1100 = 12, i.e. the 12th word of that block.
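The field split above can be reproduced in code (an illustrative helper matching the 5/7/4-bit layout of the notes):

```python
def direct_map_fields(addr):
    """Split a 16-bit address into (tag, block, word) for the
    128-block, 16-words-per-block direct-mapped cache."""
    word = addr & 0xF            # low-order 4 bits: word in block
    block = (addr >> 4) & 0x7F   # next 7 bits: cache block position
    tag = (addr >> 11) & 0x1F    # high-order 5 bits: tag
    return tag, block, word

print(direct_map_fields(0b11101_1111111_1100))  # (29, 127, 12)
```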

Associative Mapping: A main memory block can be placed into any cache block position, so 12 tag bits are required to identify a memory block. The tag bits of the address from the processor are compared with the tags of all cache blocks to locate the block; hence the name associative mapping.
Figure 5.16. Associative-mapped cache. (Any of main memory blocks 0-4095 can be placed in any of cache blocks 0-127, each identified by its 12-bit tag.)

Main memory address: Tag (12 bits) | Word (4 bits)

Example: 111011111111 1100
Tag: 111011111111, identifying which of the 4096 memory blocks is resident in the cache; Word: 1100 = 12, the 12th of the 16 words in that block.

Set-Associative Mapping: A combination of the direct and associative techniques. The blocks of the cache are grouped into sets, and a memory block can be placed in any block of one specific set. For example, with 2 blocks per set, memory block 0 may occupy either cache block 0 or cache block 1. One additional control bit, called the valid bit, is provided for each block to indicate whether it contains valid data (valid bit = 0 means invalid).

Transfers from the disk to main memory are carried out by a DMA (Direct Memory Access) mechanism and bypass the cache. When a cache block is loaded for the first time from main memory, its valid bit is set to 1. A transfer from main memory to disk under the write-back protocol must use the latest data: a flush mechanism forces the dirty data from the cache to main memory before the DMA transfer takes place. The need to ensure that the two entities (processor and DMA) use identical copies of the data is referred to as cache coherence.
Figure 5.17. Set-associative-mapped cache with two blocks per set. (Cache blocks 0-127 are grouped into 64 two-block sets; main memory blocks 0-4095 map to sets, and each cached block is identified by its tag.)

Main memory address: Tag (6 bits) | Set (6 bits) | Word (4 bits)
The 4-bit word field selects one of the 16 words; the 6-bit set field determines which of the 128/2 = 64 sets may contain the desired block; the 6 tag bits are checked to see whether the block is present in that set.

Example: 111011 111111 1100
Tag: 111011; Set: 111111 = 63, i.e. set 63 of the cache; Word: 1100 = 12, the 12th word of the block.
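The 6/6/4-bit split can be sketched in the same style as before (an illustrative helper for the two-way cache of the notes):

```python
def set_assoc_fields(addr):
    """Split a 16-bit address into (tag, set, word) for the two-way
    set-associative cache (64 sets of 2 blocks)."""
    word = addr & 0xF            # 4-bit word field
    set_ = (addr >> 4) & 0x3F    # 6-bit set field (128/2 = 64 sets)
    tag = (addr >> 10) & 0x3F    # 6-bit tag field
    return tag, set_, word

print(set_assoc_fields(0b111011_111111_1100))  # (59, 63, 12)
```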

5.5.2 Replacement Algorithms: In a direct-mapped cache no replacement strategy is needed. In associative and set-associative caches, when a new block is to be brought into the cache and all the positions that it may occupy are full, the cache controller must decide which old block to overwrite. Since programs usually stay in localized areas for reasonable periods of time, recently used blocks are likely to be referenced again; hence the least recently used (LRU) block is replaced. The controller tracks usage with counters that are updated on hits and misses. The performance of LRU can be improved by introducing a small amount of randomness into the choice of block to replace.
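LRU replacement for one cache set can be sketched with an ordered dictionary (a toy model, not the counter-based hardware scheme the notes describe):

```python
from collections import OrderedDict

class LRUSet:
    """Toy LRU replacement for one set of an associative cache."""
    def __init__(self, ways):
        self.ways = ways
        self.blocks = OrderedDict()        # tag -> data, LRU first

    def access(self, tag):
        """Return 'hit', 'miss', or 'evict <tag>' for a reference."""
        if tag in self.blocks:
            self.blocks.move_to_end(tag)   # becomes most recently used
            return "hit"
        result = "miss"
        if len(self.blocks) == self.ways:  # set full: evict LRU block
            victim, _ = self.blocks.popitem(last=False)
            result = "evict %d" % victim
        self.blocks[tag] = None
        return result

s = LRUSet(ways=2)
print(s.access(1), s.access(2), s.access(1), s.access(3))
# miss miss hit evict 2
```

Note that referencing tag 1 again saved it from eviction: tag 2, the least recently used block, is the one replaced.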

5.6 Performance Considerations: Two key factors are performance and cost; the price/performance ratio should be as low as possible. Performance depends on how fast machine instructions can be brought into the processor for execution and how fast they can be executed. In the memory hierarchy, it is beneficial if transfers to and from a faster unit can be done at a rate matching that faster unit. This is not possible if both the slow and the fast units are accessed in the same manner, but it can be achieved by introducing parallelism into the organization of the slower unit. An effective way to introduce such parallelism is an interleaved organization.

5.6.1 Interleaving: If the main memory is structured as a collection of physically separate modules, each with its own ABR (Address Buffer Register) and DBR (Data Buffer Register), memory access operations may proceed in more than one module at the same time.
Figure 5.25. Addressing multiple-module memory systems. (a) Consecutive words in a module: the high-order k bits of the memory address select a module and the low-order m bits give the address within it. (b) Consecutive words in consecutive modules: the low-order k bits select a module and the high-order m bits give the address within it. Each module (Module 0 ... Module n-1, or Module 2^k - 1) has its own ABR and DBR.

First case (consecutive words in a module): the high-order k bits specify one of n modules, and the low-order m bits name a particular word in that module.

Second case (memory interleaving): the low-order k bits select a module, and the high-order m bits name a location within that module; consecutive addresses therefore fall in successive modules. To implement the interleaved structure there must be 2^k modules.
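The two addressing cases can be compared directly (an illustrative helper; k and m are the bit-field widths from Fig 5.25):

```python
def locate(addr, k, m, interleaved):
    """Map a (k+m)-bit address to (module, address-within-module).

    interleaved=False: high-order k bits pick the module (case a);
    interleaved=True:  low-order k bits pick the module (case b).
    """
    if interleaved:
        return addr & ((1 << k) - 1), addr >> k
    return addr >> m, addr & ((1 << m) - 1)

# With 4 modules (k=2), consecutive addresses hit successive modules
# only in the interleaved case:
print([locate(a, 2, 4, True)[0] for a in range(6)])   # [0, 1, 2, 3, 0, 1]
print([locate(a, 2, 4, False)[0] for a in range(6)])  # [0, 0, 0, 0, 0, 0]
```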

5.6.2 Hit Rate and Miss Penalty: The success rate in accessing information at the various levels of the memory hierarchy is described by the hit rate and miss rate. A successful access to data held in the cache is called a hit. The number of hits stated as a fraction of all attempted accesses is the hit rate; the number of misses stated as a fraction of all attempted accesses is the miss rate.

Ideally, the entire memory hierarchy would appear to the processor as a single memory unit with the access time of the on-chip cache and the size of a magnetic disk; how closely this ideal is approached depends on the hit rate, which should be well above 0.9. The extra time needed to bring the desired information into the cache on a miss is called the miss penalty.

The impact of the cache on the overall performance of the computer is expressed by the average access time: tave = hC + (1-h)M, where
o tave: average access time experienced by the processor
o h: hit rate
o C: the time to access information in the cache
o M: miss penalty, the time to access information in the main memory

Example:
o Assume that 30 percent of the instructions in a typical program perform a read/write operation, so there are 130 memory accesses for every 100 instructions executed.
o h = 0.95 for instructions, h = 0.9 for data.
o A cache access takes 1 clock cycle; the miss penalty is M = 17 clock cycles; without a cache, every memory access takes 10 cycles.
o Time without cache / Time with cache = (130 x 10) / (100(0.95 x 1 + 0.05 x 17) + 30(0.9 x 1 + 0.1 x 17)) = 1300 / 258 ≈ 5.04
o The computer with the cache therefore performs about five times better.
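The arithmetic of this example can be replayed with the tave formula (a sketch; the figures are those of the worked example above):

```python
def t_ave(h, C, M):
    """Average access time: t_ave = hC + (1 - h)M."""
    return h * C + (1 - h) * M

# 130 accesses per 100 instructions; cache hit = 1 cycle, miss
# penalty M = 17 cycles; without a cache every access takes 10 cycles.
time_without = 130 * 10
time_with = 100 * t_ave(0.95, 1, 17) + 30 * t_ave(0.9, 1, 17)
print(round(time_without / time_with, 2))  # ≈ 5.04: about 5x faster
```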

How to improve the hit rate? Use a larger cache (at increased cost), or increase the block size while keeping the total cache size constant; however, if the block size is too large, some items may not be referenced before the block is replaced, and the miss penalty increases. Using the load-through approach (forwarding the requested word to the processor as soon as it arrives, without waiting for the whole block) reduces the miss penalty.

5.6.3 Caches on the Processor Chip: The optimal place for a cache is on the processor chip, but chip space is needed for many other functions. All high-performance processors include some form of on-chip cache. Some manufacturers use two separate caches for instructions and data, as in the 68040, Pentium III and Pentium 4 processors; others use a single cache for both, as in the ARM710T processor. Which has the better hit rate? The single cache, since it adapts to the instruction/data mix. What is the advantage of separate caches? Parallelism: instructions and data can be accessed simultaneously, giving better performance. High-performance processors use 2 levels of caches. The L1 cache is faster, smaller, and located on the processor chip; it may access more than one word simultaneously and let the processor use them one at a time. The L2 cache is slower, larger, and implemented externally using SRAM chips. The average access time becomes: tave = h1C1 + (1-h1)h2C2 + (1-h1)(1-h2)M, where h1 and h2 are the hit rates of the L1 and L2 caches, C1 and C2 their access times, and M the time to access information in the main memory.
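The two-level formula can be sketched the same way (the numbers below are illustrative assumptions, not from the notes):

```python
def t_ave_two_level(h1, C1, h2, C2, M):
    """t_ave = h1*C1 + (1-h1)*h2*C2 + (1-h1)*(1-h2)*M."""
    return h1 * C1 + (1 - h1) * h2 * C2 + (1 - h1) * (1 - h2) * M

# Assumed example: L1 hits 95% at 1 cycle, L2 catches 90% of the L1
# misses at 10 cycles, main memory costs 100 cycles.
print(t_ave_two_level(0.95, 1, 0.9, 10, 100))  # about 1.9 cycles
```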

5.6.4 Other Enhancements:
Write buffer:
o A write operation in the write-through protocol results in writing the new value into main memory, which would make the processor wait for the memory function to complete. A write buffer is used for temporary storage of write requests.
o The write buffer sends these requests to main memory whenever the memory is not responding to read requests.
o A write buffer is also useful in the write-back protocol, to hold the dirty block being evicted while the new block is read.

Prefetching:
o Data can be prefetched into the cache before they are needed. A special prefetch instruction may be provided in the instruction set of the processor; executing it causes the addressed data to be loaded into the cache, as in the case of a read miss.
o Prefetch instructions can be inserted into a program either by the programmer or by the compiler. Prefetching can also be done in hardware, by circuitry that discovers a pattern in memory references and prefetches data according to that pattern. Intel's Pentium 4 processor uses both software and hardware prefetching.

Lockup-free cache:
o A processor with a lockup-free cache is able to access the cache while a miss is being serviced. A cache that can support multiple outstanding misses is called lockup-free; it includes circuitry that keeps track of all outstanding misses, even though only one miss can be serviced at a time.

5.7 Virtual Memories: Overview: The physical main memory is usually not as large as the address space spanned by the addresses the processor can issue. When a program does not completely fit into main memory, the parts of it not currently being executed are stored on secondary storage devices. Techniques that automatically move program and data blocks into physical main memory when they are required for execution are called virtual-memory techniques. The binary addresses that the processor issues for instructions or data are called virtual or logical addresses; they are translated into physical addresses by a combination of hardware and software components.

A special hardware unit called the Memory Management Unit (MMU) translates virtual addresses into physical addresses. If the desired data are in main memory, they are accessed through the cache mechanism; otherwise they are brought in from a storage device by the DMA mechanism.
Figure 5.26. Virtual memory organization. (The processor issues a virtual address to the MMU, which produces a physical address for the cache; on a cache miss the physical address goes to main memory, and data are exchanged between main memory and disk storage by DMA transfer.)

5.7.1 Address Translation: To translate virtual addresses into physical addresses, assume that all programs and data are composed of fixed-length units called pages, each of which consists of a block of words occupying contiguous locations in main memory. Pages commonly range from 2K to 16K bytes in length. A page cannot be too small, because the access time of a magnetic disk is much longer than the access time of main memory; a page cannot be too large, because a substantial portion of it might never be used, yet it would occupy valuable space in main memory. The virtual memory mechanism bridges the size and speed gaps between main memory and secondary storage.


Each virtual address generated by the processor, whether to fetch an instruction or to fetch/store an operand, is interpreted as a virtual page number (high-order bits) followed by an offset (low-order bits) that specifies the location of a particular byte within the page. Information about the main memory location of each page is kept in a page table. An area in main memory that can hold one page is called a page frame. The starting address of the page table is kept in a page table base register. Control bits in each page table entry describe the status of the page: one bit indicates the validity of the page, i.e. whether it is actually loaded in main memory; another indicates whether the page has been modified during its residency in main memory; other bits enforce access restrictions such as full read/write permission or read-only access.
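The translation step can be sketched as follows (a toy model: the dict-based page table, its entry layout and the 4K page size are assumptions for illustration, not the hardware format described here):

```python
PAGE_BITS = 12   # assume 4K-byte pages for illustration

def translate(vaddr, page_table):
    """Split a virtual address into page number and offset, then look
    up the page frame; an invalid/absent entry means a page fault."""
    vpn = vaddr >> PAGE_BITS                 # virtual page number
    offset = vaddr & ((1 << PAGE_BITS) - 1)  # byte within the page
    entry = page_table.get(vpn)
    if entry is None or not entry["valid"]:
        raise LookupError("page fault: page %d not in main memory" % vpn)
    return (entry["frame"] << PAGE_BITS) | offset

pt = {5: {"valid": True, "frame": 2}}
print(hex(translate(0x5ABC, pt)))  # virtual page 5 -> frame 2: 0x2abc
```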

Figure 5.27. Virtual-memory address translation. (The virtual address from the processor is split into a virtual page number and an offset; the page number, added to the contents of the page table base register, indexes the page table, whose entry supplies the control bits and the page frame number; the page frame concatenated with the offset forms the physical address in main memory.)

The page table information is used by the MMU for every access, so ideally it would be held within the MMU. However, since the MMU is on the processor chip and the page table may be large, only a small portion of it, consisting of the page table entries for the most recently accessed pages, can be accommodated within the MMU. A small cache, called the Translation Lookaside Buffer (TLB), is incorporated in the MMU for this purpose.

TLB: The contents of the TLB must be kept coherent with the contents of the page tables in memory; when the OS updates a page table, the corresponding TLB entries must be updated. Translation procedure:
o If the page table entry is found in the TLB, the physical address is obtained immediately.
o If there is a miss in the TLB, the required entry is obtained from the page table in main memory and the TLB is updated.
When a program generates an access request to a page that is not in main memory, a page fault is said to have occurred. If a new page is brought from the disk when main memory is full, it must replace one of the resident pages; an approximation of the LRU replacement algorithm is used, supported by a control bit that is set to 1 whenever the corresponding page is accessed, which helps determine which pages have not been used recently. Write-through is not suitable for virtual memory; modified pages are written back to disk when they are replaced.

Figure 5.28. Use of an associative-mapped TLB. (The virtual page number of the address from the processor is compared with the virtual page numbers stored in the TLB; on a hit, the corresponding control bits and page frame are read out, and the page frame concatenated with the offset gives the physical address in main memory; on a miss, the page table in memory must be consulted.)

5.8 Memory Management Requirements: Memory management routines are part of the operating system. The portion of the virtual address space in which the OS routines reside is called the system space; each user program runs in its own virtual address space, the user space, with a separate page table. No program should be allowed to destroy either the data or the instructions of other programs, so protection must be provided (supervisor/user processor states, privileged instructions). Pages shared between programs have entries in two (or more) different page tables.

5.9 Secondary Storage: Semiconductor memories cannot provide all the storage a computer needs at an acceptable cost per bit. This leads to large-capacity storage devices: magnetic disks, optical disks and magnetic tapes. 5.9.1 Magnetic Hard Disks

Figure 5.29. Magnetic disk principles.

A magnetic disk system consists of one or more disks mounted on a common spindle. A thin magnetic film is deposited on each disk, usually on both sides. The disks are placed in a rotary drive so that the magnetized surfaces move in close proximity to the read/write heads, and they rotate at a uniform speed. Each head consists of a magnetic yoke and a magnetizing coil. Digital information is stored on the magnetic film by applying current pulses of suitable polarity to the magnetizing coil. A clock is needed as a reference so that the data stored on the tracks can be read back correctly, so the clock signal is encoded together with the data. Manchester (phase) encoding is used: it is a self-clocking method that divides the time required to represent a bit into two halves; the first half carries the data value (0 or 1) and the second half provides the timing by shifting to the opposite state, so there is a transition in the middle of every bit period.
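The encoding rule can be sketched as a pair of half-cells per bit (a toy model; the polarity convention chosen here is an assumption, as the notes do not fix one):

```python
def manchester_encode(bits):
    """Phase-encode a bit stream: each bit occupies two half-cells,
    guaranteeing a transition in the middle of every bit period.
    Assumed convention: 1 -> (high, low), 0 -> (low, high)."""
    signal = []
    for b in bits:
        signal.extend([1, 0] if b else [0, 1])
    return signal

print(manchester_encode([1, 0, 1]))  # [1, 0, 0, 1, 1, 0]
```

The mid-bit transition is what lets the reader recover the clock from the data stream itself.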

Organization and Accessing of Data on a Disk:

Each surface is divided into concentric tracks, and each track is divided into sectors. The set of corresponding tracks on all surfaces of a stack of disks forms a logical cylinder; all tracks of a cylinder can be accessed without moving the read/write heads. Data are accessed by specifying the surface number, the track number, and the sector number.

Figure 5.30. Organization of one surface of a disk. (Concentric tracks 0, 1, ..., n, each divided into sectors; e.g. sector 0 of track 0, sector 0 of track 1, sector 3 of track n.)

Following the data in each sector, there is an error-correcting code (ECC). The stored information is packed more densely on inner tracks than on outer tracks. Access time:
o Seek time: the time required to move the read/write head to the proper track.
o Rotational delay (latency time): the time that elapses after the head is positioned over the correct track until the starting position of the addressed sector passes under the read/write head.
Data buffer/cache:
o A disk drive is connected to the rest of a computer system using some standard interconnection scheme, such as the SCSI bus.
o The SCSI bus is capable of transferring data at much higher rates than the rate at which data can be read from the disk tracks.
o A data buffer in the disk unit is a semiconductor memory capable of storing a few megabytes of data.
o When a read request arrives at the disk, the controller first checks whether the desired data are already available in the buffer; if so, the data can be accessed and placed on the SCSI bus in microseconds rather than milliseconds.
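The two components of access time can be combined numerically (a sketch; the 6 ms seek and 7200 rpm figures are illustrative assumptions):

```python
def avg_access_time_ms(avg_seek_ms, rpm):
    """Average access time = seek time + rotational latency, taking
    the average rotational latency as half a revolution."""
    ms_per_rev = 60000.0 / rpm         # one revolution, in ms
    return avg_seek_ms + ms_per_rev / 2

# e.g. a 7200 rpm drive with a 6 ms average seek:
print(round(avg_access_time_ms(6.0, 7200), 2))  # 6 + 4.17 -> 10.17 ms
```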

Disk Controller:

[Figure 5.31: the processor and main memory are attached to the system bus; a disk controller, also on the system bus, connects to two disk drives.]

Figure 5.31. Disks connected to the system bus.

The operation of the disk drives is controlled by a disk controller circuit, which uses the DMA scheme to transfer data between the disk and the main memory. To initiate a transfer, the controller is given:
o Main memory address: the address of the first main-memory location of the block of words involved in the transfer.
o Disk address: the location of the sector containing the beginning of the desired block of words.
o Word count: the number of words in the block to be transferred.

The disk controller's major functions are:
o Seek: causes the disk drive to move the read/write head from its current position to the desired track.
o Read
o Write
o Error checking: computes the error-correcting code (ECC) value for the data read from a given sector and compares it with the corresponding ECC value read from the disk.

Floppy Disks: Floppy disks are smaller, simpler, and cheaper disk units consisting of a flexible, removable, plastic diskette coated with magnetic material. The diskette is enclosed in a plastic jacket, which has an opening where the read/write head makes contact with the diskette. A hole in the center of the diskette allows a spindle mechanism in the disk drive to position and rotate the diskette.
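Returning to the disk controller: the three transfer parameters it receives (main memory address, disk address, word count) can be modelled as a simple structure driving a toy DMA-style copy. The class, field names, and dictionary-based "disk" and "memory" below are illustrative only, not a real driver interface:

```python
from dataclasses import dataclass

@dataclass
class DiskTransfer:
    """Parameters loaded into the disk controller before a DMA transfer
    (hypothetical names, for illustration)."""
    main_memory_address: int   # first main-memory location of the block
    disk_address: tuple        # (surface, track, sector) of the first sector
    word_count: int            # number of words to transfer

def read_block(memory, disk, xfer):
    """Simulate the controller copying word_count words from the disk
    into main memory without processor involvement (DMA)."""
    data = disk[xfer.disk_address][:xfer.word_count]
    for i, word in enumerate(data):
        memory[xfer.main_memory_address + i] = word
```

A read request thus names where the block starts on disk, where it goes in memory, and how long it is; the processor is free to do other work until the transfer completes.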

RAID Disk Arrays: Redundant Array of Inexpensive Disks

Using multiple disks makes it cheaper to provide huge storage, and also makes it possible to improve the reliability of the overall system.
o RAID 0: data striping
o RAID 1: identical copies of data on two disks
o RAID 2, 3, 4: increased reliability
o RAID 5: parity-based error recovery

5.9.2 Optical Disks: Large storage devices can also be implemented using optical means.
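Referring back to RAID 5 above: its parity-based recovery amounts to a byte-wise XOR across the data blocks of a stripe, so the contents of any one failed disk can be rebuilt from the survivors plus the parity block. A minimal sketch, not tied to any real controller:

```python
from functools import reduce

def parity_block(data_blocks):
    """RAID-5-style parity: byte-wise XOR of the data blocks in a stripe."""
    return bytes(reduce(lambda a, b: a ^ b, column)
                 for column in zip(*data_blocks))

def recover(surviving_blocks, parity):
    """Rebuild a failed disk's block: XOR the parity with the survivors.

    Works because x ^ x = 0, so XOR-ing everything except the lost
    block back into the parity leaves exactly the lost block.
    """
    return parity_block(surviving_blocks + [parity])
```

For a stripe with blocks d0, d1, d2 and parity p = d0 XOR d1 XOR d2, losing the disk holding d1 is recoverable as d0 XOR d2 XOR p.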

CD Technology:

A cross section of a small portion of a CD is shown in Fig 5.32 a. The bottom layer is polycarbonate plastic. The surface of this plastic is programmed to store data by indenting it with pits; the unindented parts are called lands. Fig 5.32 b shows what happens as the laser beam scans across the disk and encounters a transition from a pit to a land. Three different positions of the laser source and the detector are shown, as would occur while the disk is rotating. When the light reflects solely from a pit, or solely from a land, the detector sees the reflected beam as a bright spot. At pit-land and land-pit transitions the detector sees no reflected beam and detects a dark spot. Fig 5.32 c depicts several transitions between lands and pits. Each transition, detected as a dark spot, is taken to denote the binary value 1, and the flat portions represent 0s.

[Figure 5.32: (a) cross section showing pits and lands in the polycarbonate plastic; (b) transition from pit to land, with three positions of the laser source and detector (reflection over a pit or a land, no reflection at the transition); (c) stored binary pattern.]

Figure 5.32. Optical disk.
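The transition-as-1, flat-as-0 reading described above can be sketched as a simple decoder over a stream of pit/land samples, one sample per clock period. The function name and the pit=1/land=0 convention are assumptions for illustration:

```python
def decode_surface(samples):
    """Decode a pit/land sample stream from a CD surface.

    A pit<->land transition (detected as a dark spot) reads as 1;
    an unchanged level (steady bright reflection) reads as 0.
    One sample per clock period; samples use pit=1, land=0.
    """
    return [1 if samples[i] != samples[i - 1] else 0
            for i in range(1, len(samples))]
```

For example, the surface pattern land, land, pit, pit, pit, land decodes to 0 1 0 0 1: the level changes at the second and fifth clock periods only. Note this is why the clock must be recoverable from the medium: a long run of 0s is just an unchanging surface.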

o CD-ROM
o CD-Recordable (CD-R)
o CD-ReWritable (CD-RW)
o DVD Technology
o DVD-RAM

5.9.3 Magnetic Tape Systems:

Magnetic tapes are suited for off-line storage of large amounts of data. They are typically used for hard disk backup and for archival storage. Data on the tape are organized in the form of records separated by gaps. A group of related records is called a file. The beginning of a file is identified by a file mark. The control commands used are:
o Rewind tape
o Rewind and unload tape
o Erase tape
o Write tape mark
o Forward space one record / Backspace one record
o Forward space one file / Backspace one file
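The record/file organization and positioning commands above can be modelled with a toy tape drive. The class, the list-of-records model, and the "FM" file-mark sentinel are all hypothetical, chosen only to illustrate how the commands move the tape position:

```python
class TapeDrive:
    """Toy model of tape positioning: the tape is a list of records,
    file marks are the sentinel string "FM", and the drive tracks a
    single record-index position (rewind returns it to 0)."""

    def __init__(self, records):
        self.records = records
        self.pos = 0

    def rewind(self):
        self.pos = 0

    def forward_space_record(self):
        if self.pos < len(self.records):
            self.pos += 1

    def backspace_record(self):
        if self.pos > 0:
            self.pos -= 1

    def forward_space_file(self):
        # Skip records until the next file mark, then step past it,
        # leaving the position at the first record of the next file.
        while self.pos < len(self.records) and self.records[self.pos] != "FM":
            self.pos += 1
        if self.pos < len(self.records):
            self.pos += 1
```

Because access is strictly sequential, reaching a given file means spacing forward file by file from the current position (or rewinding first), which is why tapes suit archival rather than random-access workloads.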