
Computer Architecture

Unit-1:-
*Cache memory
Cache memory is a small, high-speed RAM buffer located between the processor and main memory.
Cache memory holds a copy of the instructions (instruction cache) or data (operand or data cache) currently being used by the CPU.
The main purpose of cache is to accelerate the computer while keeping its price low.
*Why do we use cache memory?
The processor is much faster than the main memory; as a result, the processor has to spend much of its time waiting while instructions and data are fetched from the main memory.
The speed of the main memory cannot be increased beyond a certain point, so good performance cannot be achieved that way. Cache memory is used to overcome this problem.
Cache memory is an architectural arrangement which makes the main memory appear faster to the processor than it really is.
Cache memory is based on a property of computer programs known as locality of reference.

*Computer Memory System:-
Types of computer memory (RAM and ROM)
Memory is an essential element of a computer, because without it a computer can't perform even simple tasks.
Computer memory is of two basic types: primary memory / volatile memory and secondary memory / non-volatile memory. Random Access Memory (RAM) is volatile memory and Read Only Memory (ROM) is non-volatile memory.
Memory Hierarchy:-

*Cache memory principles


1. Cache is intended to give memory speed approaching that of the fastest memories available, but with large size, at close to the price of slower memories.
2. The cache is checked first for all memory references.
3. If the word is not found there, the entire block in which that reference resides in main memory is stored in a cache slot, called a line.
4. Each line includes a tag (usually a portion of the main memory address) which identifies which particular block is being stored.
[Processor <-- word transfer --> Cache <-- block transfer --> Main memory]

5. Locality of reference implies that future references will likely come from this block of memory, so the cache line will probably be utilized repeatedly.
6. When the processor wants to read a word of memory, a check is made to determine whether the word is present in the cache. If it is present, it is a cache hit; if not, it is a cache miss.
7. When a cache hit occurs, the data and address buffers are disabled and communication takes place only between the processor and the cache, with no system bus traffic; the desired word is delivered to the processor from cache memory.
8. When a cache miss occurs, the desired word is first read into the cache and then transferred from the cache to the processor. For the latter case, the cache is physically interposed between the processor and main memory for all data, address and control lines.
9. The portion of memory references found in the cache is called the hit ratio.
Hit ratio = total number of hits / total number of memory accesses
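As a worked example (all the numbers here are assumed for illustration, not taken from the text): suppose the cache access time is 10 ns, the main memory access time is 100 ns, and 9 out of every 10 references hit the cache.

Hit ratio h = 9/10 = 0.9
Average access time = h × 10 ns + (1 − h) × (10 ns + 100 ns) = 9 + 11 = 20 ns

Even a modest hit ratio therefore pulls the average access time far closer to the cache's speed than to main memory's.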
[Typical cache organization: the processor connects to the cache through data, address and control lines; a data buffer and an address buffer sit between the cache and the system bus, through which main memory is reached.]

10. The cache connects to the processor via data, control and address lines. The data and address lines also attach to data and address buffers, which attach to the system bus, from which main memory is reached.
The CPU generates the read address (RA) of a word to be moved (read).
The cache is checked for a block containing RA.
If present, the word is fetched from the cache (fast) and returned.
If not present, the required block is accessed and read from main memory into the cache.
A cache line is allocated for this newly found block.
The cache includes tags to identify which block of main memory is in each cache slot.

Start
1. Receive address (RA) from the CPU.
2. Is the block containing RA in the cache?
   - Yes: fetch the RA word and deliver it to the CPU (processor). Done.
   - No: access main memory for the block containing RA; allocate a cache line for the main memory block; load the main memory block into the cache line and deliver the RA word to the CPU. Done.
(Flow chart for cache read operation)
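The same read flow can be sketched in C. This is a minimal simulation under assumed parameters (a direct-mapped cache of 16 lines with 16-byte blocks and 16-bit addresses; all names and sizes are illustrative, not any particular machine's organization):

```c
#include <stdio.h>
#include <stdint.h>
#include <string.h>

#define NUM_LINES  16
#define BLOCK_SIZE 16   /* bytes per block (line) */

typedef struct {
    int      valid;
    uint16_t tag;
} CacheLine;

static CacheLine cache[NUM_LINES];

/* Returns 1 on a hit, 0 on a miss (after filling the line). */
int cache_read(uint16_t ra)
{
    uint16_t block = ra / BLOCK_SIZE;      /* strip the word-in-block offset  */
    uint16_t line  = block % NUM_LINES;    /* direct mapping: one possible line */
    uint16_t tag   = block / NUM_LINES;    /* identifies which block is stored  */

    if (cache[line].valid && cache[line].tag == tag)
        return 1;                          /* cache hit: deliver word to CPU */

    /* Cache miss: access main memory for the block containing RA,
       allocate the cache line, load the block, then deliver the word. */
    cache[line].valid = 1;
    cache[line].tag   = tag;
    return 0;
}

int main(void)
{
    uint16_t trace[] = { 0x0100, 0x0104, 0x0108, 0x0100, 0x0200, 0x0104 };
    int hits = 0, n = (int)(sizeof trace / sizeof trace[0]);

    memset(cache, 0, sizeof cache);
    for (int i = 0; i < n; i++)
        hits += cache_read(trace[i]);

    printf("hits = %d / %d accesses\n", hits, n);
    return 0;
}
```

Running it on the short trace prints 3 hits out of 6 accesses; the two misses at the end show a direct-mapping conflict, since 0x0100 and 0x0200 compete for the same line.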


Locality of Reference:- References to memory at any given interval of time tend to be confined within a few localized areas of memory. This property is called locality of reference.
This happens because program loops and subroutine calls are encountered frequently: when a program loop is executed, the CPU executes the same code repeatedly, and when a subroutine is called, the CPU fetches the starting address of the subroutine and executes the subroutine program.
The principle states that memory references tend to cluster: over a long period of time the clusters in use change, but over a short period of time the processor works primarily within a fixed cluster of memory references.
Spatial locality:- It refers to the tendency of execution to involve a number of memory locations that are clustered. It reflects the tendency of a program to access data locations sequentially, such as when processing a table of data.
Temporal locality:- It refers to the tendency of a processor to access memory locations that have been used recently. For example, an iteration loop executes the same set of instructions repeatedly. A short sketch follows.
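A tiny C fragment (purely illustrative) shows both kinds of locality at once:

```c
#include <stdio.h>

#define N 1024

int main(void)
{
    static int a[N];
    long sum = 0;

    for (int i = 0; i < N; i++)   /* same instructions re-executed: temporal */
        sum += a[i];              /* neighbouring addresses accessed: spatial */

    printf("sum = %ld\n", sum);
    return 0;
}
```

The sequential walk over a[] is spatial locality; the repeated execution of the loop body, and the reuse of i and sum, is temporal locality. This is exactly what makes a cache line loaded for a[0] pay off for its neighbours as well.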
*Elements of cache design:-
There are a few basic design elements that serve to classify and differentiate cache architectures. They are listed below:
1. Cache Addresses
2. Cache Size
3. Mapping Function
4. Replacement Algorithm
5. Write Policy
6. Line Size
7. Number of caches
1. Cache Addresses
When virtual addresses are used, the cache can be placed between the processor and the MMU or between the MMU and main memory. A logical cache, also known as a virtual cache, stores data using virtual addresses; the processor accesses the cache directly, without going through the MMU.
A physical cache stores data using main memory physical addresses. One advantage of the logical cache is that cache access is faster than for a physical cache, because the cache can respond before the MMU performs the address translation.
2. Cache Size:
The size of the cache should be small enough so that the overall
average cost per bit is close to that of main memory alone and
large enough so that the overall average access time is close to
that of the cache alone.
3. Mapping Function:
As there are fewer cache lines than main memory blocks, an
algorithm is needed for mapping main memory blocks into
cache lines. Further, a means is needed for determining which
main memory block currently occupies a cache line. The choice
of the mapping function dictates how the cache is organized.
Three techniques can be used: direct, associative, and set
associative.
 DIRECT MAPPING: The simplest technique, known as direct mapping, maps each block of main memory into only one possible cache line (cache line number = block number modulo number of cache lines).
The direct mapping technique is simple and inexpensive to implement.
 ASSOCIATIVE MAPPING: Associative mapping overcomes the disadvantage of direct mapping by permitting each main memory block to be loaded into any line of the cache.
 SET-ASSOCIATIVE MAPPING: Set-associative mapping is a compromise that exhibits the strengths of both the direct and associative approaches. With set-associative mapping, the cache is divided into sets, and a block can be mapped into any of the lines of its set. (A sketch of the direct mapping address split follows.)
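To make direct mapping concrete, here is a small C sketch of how an address splits into tag, line and word fields. The parameters are assumed (24-bit addresses, 4-byte blocks, 16K cache lines, as in a common textbook example), not mandated by the text:

```c
#include <stdio.h>
#include <stdint.h>

#define OFFSET_BITS 2     /* 4-byte block */
#define LINE_BITS   14    /* 16K lines    */

int main(void)
{
    uint32_t addr   = 0x16339C;                         /* example address */
    uint32_t offset = addr & ((1u << OFFSET_BITS) - 1); /* word in block   */
    uint32_t line   = (addr >> OFFSET_BITS) & ((1u << LINE_BITS) - 1);
    uint32_t tag    = addr >> (OFFSET_BITS + LINE_BITS);

    /* The line field picks the single cache line the block can occupy;
       the tag disambiguates which block is actually resident there. */
    printf("tag=0x%X line=0x%X offset=%u\n", tag, line, offset);
    return 0;
}
```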
4. Replacement Algorithms:
Once the cache has been filled, when a new block is brought
into the cache, one of the existing blocks must be replaced. For
direct mapping, there is only one possible line for any particular
block, and no choice is possible. For the associative and set
associative techniques, a replacement algorithm is needed. To
achieve high speed, such an algorithm must be implemented in
hardware. Least Recently Used (LRU), Least Frequently Used (LFU) and First In First Out (FIFO) are some replacement algorithms. (A minimal LRU sketch follows.)
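As an illustration, the following C sketch implements LRU within one set of a 4-way set-associative cache. The structure (an age counter per way) is one simple way to realize LRU and is assumed for illustration; real hardware uses more compact encodings:

```c
#include <stdio.h>
#include <stdint.h>

#define WAYS 4

typedef struct {
    int      valid;
    uint32_t tag;
    unsigned age;   /* 0 = most recently used */
} Way;

/* Returns 1 on a hit, 0 on a miss (after refilling the LRU victim). */
int set_access(Way set[WAYS], uint32_t tag)
{
    /* Hit: make this way the most recent, age the ways that were newer. */
    for (int w = 0; w < WAYS; w++) {
        if (set[w].valid && set[w].tag == tag) {
            for (int k = 0; k < WAYS; k++)
                if (set[k].valid && set[k].age < set[w].age)
                    set[k].age++;
            set[w].age = 0;
            return 1;
        }
    }

    /* Miss: prefer an empty way, otherwise evict the oldest (LRU) way. */
    int victim = 0;
    for (int w = 0; w < WAYS; w++) {
        if (!set[w].valid) { victim = w; break; }
        if (set[w].age > set[victim].age)
            victim = w;
    }
    for (int k = 0; k < WAYS; k++)
        if (set[k].valid) set[k].age++;
    set[victim].valid = 1;
    set[victim].tag   = tag;
    set[victim].age   = 0;
    return 0;
}

int main(void)
{
    Way set[WAYS] = {0};
    uint32_t tags[] = { 1, 2, 3, 4, 1, 5, 1 };  /* tag 5 evicts the LRU tag, 2 */
    for (int i = 0; i < 7; i++)
        printf("tag %u: %s\n", tags[i], set_access(set, tags[i]) ? "hit" : "miss");
    return 0;
}
```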
5. Write Policy
 When a block that is resident in the cache is to be
replaced, there are two cases to consider. If the old block
in the cache has not been altered, then it may be
overwritten with a new block without first writing out the
old block. If at least one write operation has been
performed on a word in that line of the cache, then main
memory must be updated by writing the line of cache out
to the block of memory before bringing in the new block.
 The simplest policy is called write through. Using this
technique, all write operations are made to main memory
as well as to the cache, ensuring that main memory is
always valid. An alternative technique, known as write
back, minimizes memory writes. With write back, updates
are made only in the cache. When an update occurs, a
dirty bit, or use bit, associated with the line is set. Then,
when a block is replaced, it is written back to main memory if and only if the dirty bit is set. (A short sketch contrasting the two policies follows.)
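A compact C sketch of the two policies on a single, hypothetical cache line; all structures and sizes here are assumed for illustration:

```c
#include <stdio.h>
#include <stdint.h>
#include <string.h>

enum { WRITE_THROUGH, WRITE_BACK };

typedef struct {
    int      valid, dirty;
    uint32_t tag;
    uint8_t  data[16];
} Line;

static uint8_t main_mem[1 << 16];

void cache_write(Line *l, int policy, uint32_t addr, uint8_t value)
{
    l->data[addr % 16] = value;
    if (policy == WRITE_THROUGH)
        main_mem[addr] = value;   /* memory is always kept valid */
    else
        l->dirty = 1;             /* defer the memory update     */
}

void evict(Line *l, int policy, uint32_t base)
{
    if (policy == WRITE_BACK && l->dirty)
        memcpy(&main_mem[base], l->data, 16);  /* write the line back once */
    l->valid = l->dirty = 0;
}

int main(void)
{
    Line l = { .valid = 1 };
    cache_write(&l, WRITE_BACK, 0x1000, 42);
    evict(&l, WRITE_BACK, 0x1000);
    printf("mem[0x1000] = %d\n", main_mem[0x1000]);
    return 0;
}
```

With WRITE_THROUGH every call to cache_write touches main_mem, keeping memory always valid; with WRITE_BACK only the final evict does, minimizing memory writes at the cost of tracking the dirty bit.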
6. Line Size
 Another design element is the line size. When a block of
data is retrieved and placed in the cache, not only the
desired word but also some number of adjacent words is
retrieved. Basically, as the block size increases, more
useful data are brought into the cache. The hit ratio will
begin to decrease, however, as the block becomes even
bigger and the probability of using the newly fetched
information becomes less than the probability of reusing
the information that has to be replaced.
 The relationship between block size and hit ratio is complex, depending on the locality characteristics of a particular program, and no definitive optimum value has been found.
7. Number of Caches
When caches were originally introduced, the typical system had
a single cache. More recently, the use of multiple caches has
become an important aspect. There are two design issues surrounding the number of caches.

 MULTILEVEL CACHES: Most contemporary designs include both on-chip and external caches. The simplest such organization is known as a two-level cache, with the internal cache designated as level 1 (L1) and the external cache designated as level 2 (L2). There can also be 3 or more levels of cache. This helps in reducing main memory accesses.
 UNIFIED VERSUS SPLIT CACHES: Earlier on-chip cache
designs consisted of a single cache used to store
references to both data and instructions. This is the unified
approach. More recently, it has become common to split
the cache into two: one dedicated to instructions and one
dedicated to data. These two caches both exist at the
same level. This is the split cache. Using a unified cache or
a split cache is another design issue.

*Pentium 4 cache organization:-

The evolution of cache organization is seen clearly in the evolution of Intel microprocessors.

The 80386 has no on-chip cache.
The 80486 includes a single on-chip cache of 8 Kbytes, using a line size of 16 bytes and a four-way set-associative cache organization.
The Pentium (all versions) has 2 on-chip L1 caches: one for data and another for instructions.
In the Pentium 4, the L1 data cache is 8 Kbytes in size, using a line size of 64 bytes and a 4-way set-associative organization.
The L2 cache is 256 Kbytes in size, using a line size of 128 bytes, and is 8-way set associative. It feeds both L1 caches.
Pentium 4 block diagram
[Block diagram: the system bus feeds the L2 cache (256 KB) over a 64-bit path; the instruction fetch/decode unit reads from the L2 cache and fills the L1 instruction cache (12K µops), which feeds the out-of-order execution logic; the integer and FP register files serve the execution units (load address unit, store address unit, two simple integer ALUs, one complex integer ALU, FP/MMX unit, FP move unit); the L1 data cache (8 KB) connects to the L2 cache over a 256-bit path.]

The processor core consists of 4 major components:
(1) Fetch/decode unit:- Fetches program instructions in order from the L2 cache, decodes them into a series of micro-operations, and stores the results in the L1 instruction cache.
(2) Out-of-order execution logic:- Schedules execution of the micro-operations subject to data dependencies and resource availability; thus micro-operations may be scheduled for execution in a different order than they were fetched from the instruction stream.
(3) Execution units:- These units execute micro-operations, fetching the required data from the L1 data cache and temporarily storing the results in registers.
(4) Memory subsystem:- This unit includes the L2 and L3 caches and the system bus, which is used to access main memory when the L1 and L2 caches have a cache miss, and to access the system I/O resources.

*ARM cache organization:-


Each cache is implementation-defined and can be a one, two or four-way set-associative cache of configurable size. The caches are physically indexed and physically tagged. The cache sizes are configurable in the range of 1 to 64KB, but the maximum clock frequency might be affected if you increase the cache sizes beyond 16KB. Both the instruction cache and the data cache are capable of providing two words per cycle for all requesting sources.

The cache way size can be varied between 1KB and 16KB in
powers of 2. A 1KB cache size must be implemented as a 1 way
cache, and a 2KB cache must be implemented as a 2 way cache.
All other cache sizes must be implemented as 4 way set
associative. The cache line length is fixed at eight words (32
bytes).
The maximum cache way size that the processor supports is
16KB. The minimum cache way size that the processor supports
is 1KB. You can disable instruction cache and data cache
together or instruction cache and data cache individually.
Note
If a cache is implemented within the ARM1156T2-S processor,
way 0 must be present.
Write operations must occur after the Tag RAM reads and
associated address comparisons have completed. A three-entry
Write Buffer is included in the cache to enable the written
words to be held until they can be written to cache. One or two
words can be written in a single store operation. The addresses
of these outstanding writes provide an additional input into the
Tag RAM comparison for reads.
To avoid a critical path from the Tag RAM comparison to the
enable signals for the data RAMs, there is a minimum of one
cycle of latency between the determination of a hit to a
particular way, and the start of writing to the data RAM of that
way. This requires the Cache Write Buffer to be able to hold
three entries, for back-to-back writes. Accesses that read the
dirty bits must also check the Cache Write Buffer for pending
writes that result in dirty bits being set. The cache dirty bits for
the data cache are updated when the Cache Write Buffer data
is written to the RAM. This requires the dirty bits to be held as a
separate storage array (significantly, the tag arrays cannot be
written, because the arrays are not accessed during the data
RAM writes), but permits the dirty bits to be implemented as a
small RAM.
The other main operations performed by the cache are cache
line refills and write-back. These occur to particular cache ways,
which are determined at the point of the detection of the cache
miss by the victim selection logic.

To reduce overall power consumption, the number of full cache reads is reduced by exploiting the sequential nature of many cache operations, especially on the instruction side. On a cache read that is sequential to the previous cache read, only the data RAM set that was previously read is accessed, if the read is within the same cache line. The Tag RAM is not accessed at all during this sequential operation.
Cache line refills can take several cycles. The cache line
length is eight words.
The control of the level one memory system and the associated functionality, together with other system-wide control attributes, is handled through the system control coprocessor, CP15.

Internal memory:
Internal memory typically refers to main memory (RAM), but
may also refer to ROM and flash memory. In either
case, internal memory generally refers to chips rather than
disks or tapes.
 In a computer, internal memory means all of the storage spaces that are accessible by a processor without the use of the computer's input-output channels.
 Internal memory usually includes several types of storage, such as main storage, cache memory and special registers, all of which can be directly accessed by the processor.
 Primary storage (or main memory or internal memory), often referred to simply as memory, is the only storage directly accessible to the CPU. The CPU continuously reads instructions stored there and executes them as required. Any data actively operated on is also stored there in a uniform manner.
 It is also called (primary/main/temporary/semiconductor) memory.

Internal memory is classified as follows:
 Register
 Cache
 RAM: Static RAM; Dynamic RAM (SDRAM, RDRAM)
 ROM: PROM, EPROM, EEPROM, Flash
*RAM (Random Access Memory)
Random access memory, or RAM, is memory storage on a
computer that holds data while the computer is running so that
it can be accessed quickly by the processor. RAM holds the
operating system, application programs and data that is
currently being used.

RAM data is much faster to read than data stored on the hard disk. RAM is implemented in microchips and holds much less data than the hard disk. When RAM fills up, the processor must overwrite old data, which results in slower computer function. Any word stored in RAM can be accessed directly if the row and column where it is stored are known.

 Random access memory is used to store temporary but necessary information on a computer, for quick access by open programs or applications.
 RAM is a volatile yet fast type of memory used in computers, and it is more expensive to incorporate.
 RAM allows reading and writing (electrically) of data at the byte level.

Types of RAM
 Static RAM

 Dynamic RAM

Static RAM:- Static RAM stores a bit of information in a flip-flop. It is usually used for applications that do not require large-capacity RAM.
Static RAM (SRAM) is a memory technology based on flip-flops. SRAM has an access time of 2 - 10 nanoseconds. All of main memory could be fabricated from SRAM, although such a memory would be unrealistically expensive.

Dynamic RAM
Dynamic RAM stores one bit of information as a charge, using the substrate capacitance of a MOS transistor as the memory cell. To keep the stored data intact, the data must be refreshed periodically by reading and re-writing it into memory. Dynamic RAM is used for applications that require large RAM capacity, for example in a personal computer (PC).
 RDRAM (Rambus Dynamic Random Access Memory) and SDRAM (Synchronous Dynamic Random Access Memory) are types of Dynamic RAM.
 DRAM uses asynchronous memory access, i.e. memory access is not synchronized with the processor clock at the time a main memory access request is issued by the processor. In SDRAM, the DRAM access is synchronized with the processor clock.
 Dynamic RAM (DRAM) is a memory technology based on capacitors.
 Dynamic RAM is cheaper than static RAM and can be packed more densely on a computer chip.
 DRAM has an access time in the order of 60 - 100 nanoseconds, slower than SRAM.


ROM (Read Only Memory):
ROM is a type of storage medium that permanently stores data on personal computers (PCs) and other electronic devices. Because ROM is read-only, it cannot be changed; it is permanent and non-volatile, meaning it holds its contents even when power is removed.

PROM (Programmable Read Only Memory):-
PROM is a type of ROM that is programmed after the memory is constructed. Standard PROM can only be programmed once.
EPROM (Erasable Programmable Read Only Memory):-
It is a type of ROM just like PROM, but it can be programmed multiple times: it can be erased using ultraviolet rays and then reprogrammed.
EEPROM (Electrically Erasable Programmable Read Only Memory):-
It is user-modifiable read-only memory (ROM) that can be erased and reprogrammed (written to) repeatedly through the application of a higher-than-normal electrical voltage.
Unlike EPROM chips, EEPROMs do not need to be removed from the computer to be modified.
FLASH Memory:- EEPROM requires a special programming device to write data and a separate power connection; it also needs a higher voltage for operation, and the writing process is slow because data are written byte by byte.
Flash memory overcomes these problems and also provides much higher data storage capacity.

*Advanced DRAM organization:-
The traditional DRAM is constrained both by its internal architecture and by its interface to the processor's memory bus. The most common enhancements to the DRAM architecture are:
1. Synchronous DRAM (SDRAM)
2. Rambus DRAM (RDRAM)
3. Double Data Rate DRAM (DDR DRAM)
4. Cache DRAM (CDRAM)
1. Synchronous DRAM (SDRAM)
 Exchanges data with the processor synchronized to an external clock signal, running at the full speed of the processor/memory bus without imposing wait states.
 One word of data is transmitted per clock cycle (single data
rate).[All control, address, & data signals are only valid (and
latched) on a clock edge.]
 Typical clock frequencies are 100 and 133 MHz.
 SDRAM has multiple-bank internal architecture that improves
opportunities for on-chip parallelism. Generally it uses dual
data banks internally. It starts access in one bank then next, and
then receives data from first then second.
 SDRAM performs best when it is transferring large blocks of
data serially, such as for applications like word processing,
spreadsheets and multimedia.
2. Rambus DRAM (RDRAM)
 RDRAM chips are vertical packages with all pins on one side.
The chip exchanges data with the processor over 28 wires no
more than 12 cm long. The bus can address up to 320 RDRAM
chips.
 It has tricky system level design where the bus itself defines
impedance, clocking and signalling very precisely.
 More expensive memory chips.
 Concurrent RDRAMs have been used in video games, while
Direct RDRAMs have been used in computers.
 It can transfer data at a speed up to 800 MHZ.
3. Double Data Rate DRAM (DDR DRAM)
 Uses both rising (positive edge) and falling (negative) edge of
clock for data transfer. That is DDR SDRAM can send data twice
per clock cycle - once on the rising edge of the clock pulse and
once on the falling edge.
 There has been steady improvement in DDR DRAM technology. The later generations (DDR2 and DDR3) increase the data rate by increasing the operational frequency of the RAM chip and by increasing the prefetch buffer from 2 bits to 4 bits per chip.
 DDR can transfer data at a clock rate in the range of 200 MHz to 600 MHz, DDR2 in the range of 400 MHz to 1066 MHz, and DDR3 in the range of 800 MHz to 1600 MHz. (A worked bandwidth example follows.)
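As a rough worked example (the 64-bit bus width and the 200 MHz clock are assumed for illustration; neither figure comes from the text):

peak transfer rate = clock rate × transfers per clock × bus width
                   = 200 × 10^6 × 2 × 8 bytes ≈ 3.2 GB/s

The same module transferring on only one clock edge (single data rate) would move half that, 1.6 GB/s, which is exactly the doubling that gives DDR its name.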
4. Cache DRAM (CDRAM)
 Integrates a small SRAM cache onto a generic DRAM chip.
 The SRAM on CDRAM can be used in two ways
 It can be used as true cache.
 It can be used as buffer to support the serial access of a block
of data.
 This architecture achieves concurrent operation of DRAM and
SRAM synchronized with an external clock. Separate control
and address input terminals of the two portions enable
independent control of the DRAM and SRAM, thus the system
achieves continuous and concurrent operation of DRAM and
SRAM.
 CDRAM can handle CPU, direct memory access (DMA) and
video refresh at the same time by utilizing a high-speed video
interface.
 CDRAM can replace cache and main memory, and it has already been shown that a CDRAM-based system has a 10 to 50 percent performance advantage over a 256-Kbyte-cache-based system.

*Error correction:-
There are 2 types of errors:
1. Hard failure:- A permanent physical defect, for example damaged memory cells, so that the affected cells cannot reliably store data.
2. Soft error:- A random, non-destructive event that alters the contents of one or more memory cells without damaging the memory itself, so stored content may change unnoticed.

[Error-correcting code function diagram: an M-bit data word entering memory passes through a code function f, which generates K code bits; both the M and K bits are stored. On a read, f is applied again to the stored M data bits, and the resulting K bits are compared with the stored K bits; on a mismatch the corrector repairs the data word on its way out, and an uncorrectable mismatch raises an error signal.
M = data bits, K = code bits, f = code function]
In the figure above, M is the number of data bits, K the number of code bits and f the code function. The code function is applied twice: first when the data word is stored in a particular memory location, and again when the data is retrieved from that location. The two sets of code bits are compared by the comparison function; if they differ, it signals the corrector, which takes the data bits from memory, corrects them, and passes the corrected data out. If, after comparing the code bits, the error cannot be corrected, an error signal is given.
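A tiny C sketch makes the f / compare / correct scheme concrete. It uses a Hamming(7,4) code (an assumption for illustration; the text does not fix a particular code), with M = 4 data bits and K = 3 code bits:

```c
#include <stdio.h>

/* Code function f: K = 3 check bits computed over M = 4 data bits,
   using the Hamming(7,4) parity equations. */
unsigned f(unsigned d)
{
    unsigned d0 = d & 1, d1 = (d >> 1) & 1, d2 = (d >> 2) & 1, d3 = (d >> 3) & 1;
    unsigned p1 = d0 ^ d1 ^ d3;
    unsigned p2 = d0 ^ d2 ^ d3;
    unsigned p4 = d1 ^ d2 ^ d3;
    return (p4 << 2) | (p2 << 1) | p1;
}

int main(void)
{
    unsigned data  = 0xB;          /* word stored in memory (M = 4 bits)   */
    unsigned k_in  = f(data);      /* code bits stored alongside it        */

    unsigned read  = data ^ 0x4;   /* simulate a soft error: one bit flips */
    unsigned k_out = f(read);      /* code bits recomputed on the read     */

    unsigned syndrome = k_in ^ k_out;   /* the comparison in the figure    */
    printf("syndrome = %u (0 means no error)\n", syndrome);
    return 0;
}
```

Here the syndrome comes out as 6, which in the Hamming numbering is exactly the codeword position of the flipped bit, so the corrector knows which bit to invert.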

****************************************************
****************************************************
Unit-2:-

*External memory:-
External memory typically refers to storage in an external hard
drive or internet. It is also known as Auxiliary memory used in
computer system are generally magnetic disks and magnetic
tapes.
Other components are also used in as external storage like
magnetic drums, magnetic bubble memory and optical disks.
*Magnetic disks:-
A magnetic disk is a circular plate constructed of metal or plastic, coated with magnetizable material. Both sides of the disk are used, and several disks may be stacked on one spindle with read/write heads available on each surface.
All disks rotate at a high speed and are not stopped or started for access purposes. Bits are stored on the magnetized surface in spots along concentric circles called tracks.
 Tracks are commonly divided into sections called sectors. In most systems, the minimum quantity of information which can be transferred is a sector.
Some units use a single read/write head for each disk surface. In this type of unit, the track address bits are used by a mechanical assembly to move the head into the specified track position before reading or writing. In other disk systems, separate read/write heads are provided for each track on each surface.
Permanent timing tracks are used in disks to synchronize the bits and recognize the sectors.
A disk system is addressed by address bits that specify the disk number, the disk surface, the sector number and the track within the sector.
After the read/write heads are positioned on the specified track, the system has to wait until the rotating disk reaches the specified sector under the read/write head.
Information transfer is very fast once the beginning of a sector has been reached.
Disks may have multiple heads, allowing simultaneous transfer of bits from several tracks at the same time.

[Magnetic disk diagram: concentric tracks divided into sectors, with a read/write head positioned over the surface]
A track in a given sector near the circumference is longer than a track near the center of the disk. If bits are recorded with equal density, some tracks will contain more recorded bits than others. To make all the records in a sector of equal length, some disks use a variable recording density, with higher density on tracks near the center than on tracks near the circumference. This equalizes the number of bits on all tracks of a given sector.
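The addressing scheme described above amounts to computing a linear sector (block) number from the surface/track/sector fields. A small C sketch under an assumed geometry (4 surfaces, 200 tracks per surface, 32 sectors per track; none of these numbers come from the text):

```c
#include <stdio.h>

#define SURFACES          4
#define TRACKS_PER_SURF 200
#define SECTORS_PER_TRK  32

/* Map (surface, track, sector) to a linear block number, the way a
   controller locates the minimum transferable unit, the sector. */
long block_number(int surface, int track, int sector)
{
    return ((long)surface * TRACKS_PER_SURF + track) * SECTORS_PER_TRK + sector;
}

int main(void)
{
    printf("block = %ld\n", block_number(2, 57, 9));
    return 0;
}
```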
NOTE:-
 The disks that are permanently attached to the unit
assembly and can’t be removed by the occasional user are
called hard disks.
 A disk drive with removable disks is called a floppy disk drive; the disks are generally made of plastic coated with a magnetic recording material.

*RAID and solid state drives:-


In a Redundant Arrays of Independent Disks (RAID) system, multiple disks operate in parallel to store the same information. It improves storage reliability and eliminates the risk of data loss when one of the disks fails. Also, a large file can be stored across several disk units by breaking the file up into a number of smaller pieces and storing these pieces on different disks. This is called data striping.
SSD RAID (redundant array of independent disks) is a methodology commonly used to protect data by distributing redundant data blocks across multiple solid state drives (SSDs).
Storage systems generally do not use RAID to pool SSDs for performance purposes. Flash-based SSDs inherently offer higher performance than HDDs (hard disk drives), and enable faster rebuilds in parity-based RAID. Rather than improve performance, vendors typically use SSD-based RAID to protect data if a drive fails.
The term SSD RAID is sometimes used as an alternative name for a storage array that is equipped with flash-based SSDs and uses a form of RAID.
There are 3 key concepts in RAID:
*Mirroring:- Data is written simultaneously to 2 separate drives.
*Striping:- Data is split evenly across 2 or more drives.
*Parity:- Raw binary data is passed through an operation (an XOR across the data blocks) to calculate a parity block, used for redundancy and error correction.

Hard disk drive and solid state drive based systems include RAID-0 (simple striping), RAID-1 (simple or multiple mirroring), RAID-3 (byte-level striping, plus one drive dedicated to storing parity information), RAID-4 (block-level striping with a parity drive) and RAID-5 (block-level striping with a distributed parity scheme).
Striping with no redundancy or parity is often used to increase performance. Striping with parity or double parity strengthens data protection.
In most RAID types, striping redundant data blocks enables the system to reconstruct the lost information when one or more drives fail. (A small parity sketch follows.)
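The reconstruction works because parity is an XOR. A minimal C sketch (the 3-data-drive stripe and 4-byte blocks are assumed purely for illustration):

```c
#include <stdio.h>
#include <stdint.h>

#define STRIPE 4   /* bytes per block, kept tiny for illustration */

int main(void)
{
    uint8_t d0[STRIPE] = { 0x11, 0x22, 0x33, 0x44 };
    uint8_t d1[STRIPE] = { 0x55, 0x66, 0x77, 0x88 };
    uint8_t d2[STRIPE] = { 0x99, 0xAA, 0xBB, 0xCC };
    uint8_t parity[STRIPE], rebuilt[STRIPE];

    for (int i = 0; i < STRIPE; i++)
        parity[i] = d0[i] ^ d1[i] ^ d2[i];      /* written to the parity drive */

    /* Drive 1 fails: reconstruct its block from the others plus parity. */
    for (int i = 0; i < STRIPE; i++)
        rebuilt[i] = d0[i] ^ d2[i] ^ parity[i];

    printf("rebuilt[0] = 0x%02X (expected 0x%02X)\n", rebuilt[0], d1[0]);
    return 0;
}
```

XOR-ing the parity with the surviving blocks cancels them out, leaving exactly the failed drive's block.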

Advantages of SSD-based storage arrays over HDD-based storage arrays include reduced access time and superior I/O performance. However, ideal SSD RAID performance requires the optimum combination of microprocessor, cache, software and hardware resources. When all these factors work together in the best possible way, an SSD RAID can significantly outperform a RAID of comparable HDD-based storage capacity.

A typical SSD consumes less power than an HDD. When large numbers of drives are combined, the power savings of an SSD RAID array compared with an HDD RAID array can translate to lower long-term operating costs. In large data centres, the improved efficiency of SSDs compared with mechanical HDDs can also reduce the cooling cost, both in terms of simpler cooling systems and lower electricity bills.

Limitations of SSD:- SSD RAID has limitations and drawbacks, largely related to the storage media. SSDs carry a higher price per gigabyte compared with HDDs of comparable storage capacity. NAND flash-based drives are limited to a certain number of program/erase cycles before they wear out, become unreliable and require replacement.

*optical memory:-
Optical memories are used for large storage of data and provide a variety of data storage options; they can save up to 20 GB of information.
The data or information is read or written using a laser beam. Due to their low cost and high data storage capacity, these memories are widely used.
Apart from low cost, these memories have a long life. Their drawback is a relatively slow access time.
CD-ROM:- CD-ROM, or compact disk read-only memory, is an optical storage device which can easily be read by a computer but not written. CD-ROMs are stamped by the vendor, and once stamped they cannot be erased and filled with new data.
To read a CD, a CD-ROM player is needed. All CD-ROMs conform to a standard size and format, so any CD-ROM can be loaded into any CD-ROM player. In addition, CD-ROM players are capable of playing audio CDs, which share the same technology.
CD-ROMs are particularly well suited to information that requires large storage capacity, including large software applications that support color, graphics, sound and especially video.
Advantages of CD-ROM:
Storage capacity is high.
Storage cost per bit is reasonable.
Easy to carry.
Can store a variety of data.
Disadvantages of CD-ROM:
CD-ROMs are read only.
Access time is higher than that of a hard disk.
WORM:- WORM or write once read many or CD-R or CD-
recordable are kind of opctical device which provides the use of
library to write once on the CD-R.
The user can write on the disk using the CD-R disk drive
unity. But this data or information can’t be overwritten or
changed.
CD-R doesn’t allows re-writing through reading can be done
many time.

Advantages of WORM:
Storage capacity is high.
Can record once.
Reliable.
Runs longer.
Access time is good.
Disadvantages or limitations of WORM:-
Can be written only once.
Erasable optical disk:- Erasable optical disks are also called CD-RW or CD-rewritable. They give the user the liberty of erasing data already written: by burning the microscopic points on the disk surface, the disk can be reused.
Advantages of CD-RW:-
Storage capacity is very high.
Reliability is high.
Runs longer.
Easy to rewrite.
Limitations of CD-RW:-
Access time is high.

DVD-ROM, DVD-R and DVD-RAM:
DVD, or digital versatile disk, is another form of optical storage. These are higher in capacity than CDs.
Pre-recorded DVDs are mass-produced using molding machines that physically stamp the data onto the DVD. Such disks are known as DVD-ROM, because data can only be read, never written or erased.
DVD-Rs are blank recordable DVDs which can be recorded once, using optical disk recording technology and a DVD recorder, and thereafter function as a DVD-ROM.
Rewritable DVDs (DVD-RAM) can be recorded and erased multiple times.
*Magnetic tapes:-
A magnetic-tape transport consists of the electrical, mechanical and electronic components that provide the parts and control mechanism for a magnetic tape unit.
The tape itself is a strip of plastic coated with a magnetic recording material.
Bits are recorded as magnetic spots on the tape along several tracks. Usually 7 or 9 bits are recorded simultaneously to form a character, together with a parity bit.
Read/write heads are mounted one in each track, so that data can be recorded and read as a sequence of characters.
Magnetic tape units can be stopped, started, moved forward or in reverse, or rewound. However, they can't be started or stopped fast enough between individual characters.
For this reason, information is recorded in blocks referred to as records. Gaps of unrecorded tape are inserted between records, where the tape can be stopped.
The tape starts moving while in a gap and attains its constant speed by the time it reaches the next record.
Each record on tape has an identification bit pattern at the beginning and end. By reading the bit pattern at the beginning, the tape control identifies the record number.
By reading the bit pattern at the end of the record, the control recognizes the beginning of the gap.
A tape unit is addressed by specifying the record number and the number of characters in the record. Records may be of fixed or variable length.

*Input output:-
I/O operations are accomplished through external devices that provide a means of exchanging data between the external environment and the computer.
*External Devices:-
External Devices can be categorized as:-
1. Human readable: suitable for communicating with the computer user, for example video display terminals and printers.
2. Machine readable: suitable for communicating with equipment, for example sensors and actuators used in robotic applications.
3. Communication: suitable for communicating with remote devices. They may be a human readable device such as a terminal, or a machine readable device such as another computer.
*I/O modules:-
The computer would be of no use if it could not communicate with the external world. A computer must have a system to receive information from the outside world and must be able to communicate to the external world. Thus a computer includes I/O devices and I/O modules; an I/O module not only connects an I/O device to the system bus, but plays a very crucial role in between. A device which is connected to an I/O module of a computer is called a peripheral device. The I/O module is normally connected to the computer system at one end and to one or more I/O devices at the other.
An I/O module is needed because:
(a) The diversity of I/O devices makes it difficult to include all the peripheral device logic (i.e. its control commands, data formats etc.) in the CPU.
(b) I/O devices are usually slower than the memory and CPU. Therefore it is not advisable to put them on the high-speed system bus directly for communication purposes.
(c) The data format and word length used by a peripheral may be quite different from those of the CPU.
*Thus we can say that:-
(i) An I/O module is a mediator between the processor and the I/O devices.
(ii) It controls the data exchange between the external devices and main memory, or between the external devices and CPU registers.
(iii) An I/O module provides an interface internal to the computer, which connects it to the external device or peripheral.
(iv) The I/O module should not only communicate information from the CPU to the I/O device, but should also coordinate the two.
(v) In addition, since there are speed differences between the CPU and the I/O devices, the I/O module should have facilities like a buffer (storage area) and an error detection mechanism.
Functions of I/O modules:
The major functions of I/O modules are:
1. Processor communication:- This involves the following tasks:
a. Exchange of data between the processor and the I/O module.
b. Command decoding: the I/O module accepts commands sent from the processor, e.g. the I/O module for a disk drive may accept the commands READ SECTOR, WRITE SECTOR, SEEK TRACK etc.
c. Status reporting: the device must be able to report its status to the processor, e.g. disk drive busy, ready etc. Status reporting may also involve reporting various errors.
d. Address recognition: each I/O device has a unique address, and the I/O module must recognize this address.
2. Device communication:- The I/O module must be able to perform communication with the device, such as status reporting.
3. Control and timing:- The I/O module must be able to coordinate the flow of data between the internal resources (such as processor and memory) and the external devices.
4. Data buffering:- This is necessary as there is a speed mismatch between data transfers to and from the processor and memory and the external devices. Data coming from main memory are sent to an I/O module and then sent to the peripheral device at its rate.
5. Error detection:- The I/O module must also be able to detect errors and report them to the processor. These errors may be mechanical errors (such as a paper jam in a printer) or changes in the bit pattern of transmitted data. A common way of detecting such errors is by using parity bits.

[Block diagram of an I/O module. Interface to the system bus: data lines connect to data registers and to status/control registers; address lines and control lines connect to the I/O logic. Interface to the external device: one external-device interface per device, each with its own data, status and control links.]

*Programmed I/O:-
Using this technique, data transfer takes place under the direct control of the processor. The processor must continuously check the I/O device, and hence it cannot do another task; this makes the method inefficient (slow).

[Programmed I/O diagram: the processor connects to memory over the memory address bus (MAB), control bus (CB) and data bus (DB), and to the device controller over the device address bus (DAB), device control signals (DC) and data bus; the controller contains the device's data/status register and flip-flop.
DB - Data Bus
DAB - Device Address Bus
DC - Device Control Signals
MAB - Memory Address Bus
CB - Control Bus]
Characteristics of programmed I/O:-
1. In programmed I/O, the I/O operations are controlled completely by the CPU.
2. Used in real-time and embedded systems.
3. Used in CPUs which have a single input and a single output instruction, each of which selects one device.
4. The disadvantage of this technique is that the CPU spends most of its time waiting for the device to become ready. (A polling sketch follows.)
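A short C sketch of the busy-wait at the heart of programmed I/O. The status and data registers are simulated with plain variables so the sketch runs; on real hardware they would be volatile pointers to fixed, device-specific addresses (everything here is assumed for illustration):

```c
#include <stdint.h>
#include <stdio.h>

static volatile uint8_t DEV_STATUS = 0x01;  /* bit 0: data ready */
static volatile uint8_t DEV_DATA   = 0x42;

uint8_t programmed_read(void)
{
    while (!(DEV_STATUS & 0x01))
        ;                            /* CPU does nothing useful while waiting */
    DEV_STATUS &= (uint8_t)~0x01;    /* acknowledge: clear the ready bit      */
    return DEV_DATA;                 /* transfer under direct CPU control     */
}

int main(void)
{
    printf("read 0x%02X from device\n", programmed_read());
    return 0;
}
```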
*Interrupt Driven I/O:-
The basic drawback of programmed I/O is that the speed of an I/O device is much slower than that of the CPU, and because the CPU has to repeatedly check whether a device is free, or wait till the completion of the I/O, the performance of the CPU in programmed I/O goes down tremendously.
Interrupt:-
The term interrupt is used for any event that causes a temporary transfer of control of the CPU from the current program to another program, the one handling the event causing the interrupt. Interrupts are primarily issued on:
1. Initiation of I/O operations (interrupts issued by I/O devices).
2. Completion of an I/O operation.
3. Occurrence of hardware or software errors.
Interrupts can be generated by various sources internal or external to the CPU.
In interrupt-driven I/O, the processor issues a READ/WRITE instruction to the device and then continues doing its task. When the interface buffer is full and it is ready to send data to the processor, the interface sends a signal to the processor informing it that data is ready. This signal is called the interrupt signal. When the processor receives the interrupt signal it knows that the data is ready; it suspends its current job and transfers the data from the buffer to its own registers.
Disadvantages of the interrupt-driven I/O technique:-
The processor must suspend its work and later resume it. If there are many devices, each can issue an interrupt, and the processor must be able to attend to each of these based on some priority.
The Role Of The Processor In Interrupt Driven I/O:-
When an I/O device is ready to send data, the following events occur:-
1. The device issues an interrupt signal to the processor.
2. The processor finishes execution of the current instruction; it then responds to the interrupt signal.
3. The processor sends an acknowledgement signal to the device that sent the interrupt. The device then removes its interrupt signal.
4. The processor must save the state of the current task (i.e. the values of the registers, the address of the next instruction to be executed etc.). These are saved on to a stack.
5. The processor then attends to the device that issued the interrupt signal.
6. When the interrupt processing is over, the saved registers are retrieved from the stack and the processor continues its previous task from the point where it was last stopped.
When the processor detects an interrupt, it executes an interrupt-service routine. This routine polls each I/O device to determine which device caused the interrupt. This technique is called a software poll. (A sketch follows.)
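A C sketch of the software poll performed by an interrupt-service routine; the device registers are simulated as variables so the sketch runs, and all names are assumed for illustration:

```c
#include <stdint.h>
#include <stdio.h>

#define NUM_DEVICES 4

typedef struct {
    volatile uint8_t status;   /* bit 0 set => this device interrupted */
    volatile uint8_t data;
} Device;

static Device devices[NUM_DEVICES];
static uint8_t last_byte;

void interrupt_service_routine(void)   /* entered after CPU state is saved */
{
    for (int i = 0; i < NUM_DEVICES; i++) {   /* the software poll */
        if (devices[i].status & 0x01) {
            last_byte = devices[i].data;      /* transfer the data           */
            devices[i].status = 0;            /* drop the interrupt request  */
            return;  /* saved registers restored, previous task resumes */
        }
    }
}

int main(void)
{
    devices[2].status = 0x01;   /* device 2 raises an interrupt */
    devices[2].data   = 0x7F;
    interrupt_service_routine();
    printf("ISR read 0x%02X from device 2\n", last_byte);
    return 0;
}
```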
*Direct Memory Access:-
[DMA interface diagram: the DMA interface sits between the system buses and the device controller. It contains a status register, a data register and a memory address register (MAR); it connects to memory over the memory address bus (MAB) and data bus (DB), and to the processor over the device address and control buses (DAB and CB).]
This method eliminates the need for a continuous involvement of the processor in the I/O operation. The data transfer takes place as follows:
1. When a read instruction is encountered, the processor sends the device address on the Device Address Bus (DAB). This is decoded by the I/O controller, and the DMA interface of the appropriate device is selected. The processor also sends the address (in RAM) where the data is to be stored, and issues the read command.
2. The processor continues with the next instruction in the program. It has no further role to play in the data transfer.
3. The DMA status register is set to 1 to indicate the BUSY status. Data is read from the device and stored in the DMA's data register (buffer).
4. When data has been entered in the data register, the data-ready flip-flop is set to 1 and an interrupt is sent to the processor.
5. The processor completes the current instruction. It then gives up control of the MAB and DB interface, and the DMA transfers data from its data register to the memory address specified.
Cycle stealing:-
The process of taking control of a memory cycle to transfer data is known as cycle stealing. The DMA transfers one data word at a time, after which it must return control of the buses to the CPU.
The CPU delays its operation for 1 cycle to allow the DMA to steal one memory cycle. (A setup sketch follows.)
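A C sketch of how a processor might program the DMA interface described above (the register layout, names and bit values are all assumed for illustration):

```c
#include <stdint.h>
#include <stdio.h>

typedef struct {
    volatile uint32_t mem_addr;  /* where in RAM the block should go */
    volatile uint32_t count;     /* number of words to transfer      */
    volatile uint32_t control;   /* direction and start bits         */
    volatile uint32_t status;    /* BUSY flag, set by the controller */
} DmaRegs;

#define DMA_START 0x1u
#define DMA_READ  0x2u           /* device -> memory */

void dma_read_block(DmaRegs *dma, uint32_t dest, uint32_t words)
{
    dma->mem_addr = dest;        /* processor supplies the RAM address */
    dma->count    = words;
    dma->control  = DMA_READ | DMA_START;
    /* No further CPU involvement: completion arrives as an interrupt,
       while the controller steals memory cycles to move the block. */
}

int main(void)
{
    DmaRegs dma = { 0 };
    dma_read_block(&dma, 0x2000, 128);
    printf("DMA programmed: dest=0x%X words=%u control=0x%X\n",
           (unsigned)dma.mem_addr, (unsigned)dma.count, (unsigned)dma.control);
    return 0;
}
```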

*I/O Channels and processors:-


A channel is an independent hardware component that coordinates all I/O to a set of controllers. Computer systems that use I/O channels have special hardware components that handle all I/O operations.
Channels use separate, independent and low-cost processors for their functioning, which are called channel processors.
Channel processors are simple, but contain sufficient memory to handle all I/O tasks. When an I/O transfer is complete or an error is detected, the channel controller communicates with the CPU using an interrupt, and informs the CPU about the error or the task completion.
Each channel supports one or more controllers or devices. Channel programs contain lists of commands for the channel itself and for the various connected controllers or devices. Once the operating system has prepared a list of I/O commands, it executes a single I/O machine instruction to initiate the channel program; the channel then assumes control of the I/O operations until they are completed.

IBM 370 I/O Channel
The I/O processor in the IBM 370 computer is called a channel. A computer system configuration includes a number of channels, which are connected to one or more I/O devices.

Categories of I/O Channels


Following are the different categories of I/O channels:

Multiplexer
The multiplexer channel can be connected to a number of slow and medium-speed devices. It is capable of operating a number of I/O devices simultaneously.
Selector
This channel can handle only one I/O operation at a time and is
used to control one high speed device at a time.
Block-Multiplexer
It combines the features of both multiplexer and selector
channels.
The CPU can communicate directly with the channels through control lines.

*THE EXTERNAL INTERFACE: THUNDERBOLT AND INFINIBAND

Types of Interfaces

The interface to a peripheral from an I/O module must be tailored to the nature and operation of the peripheral. One major characteristic of the interface is whether it is serial or parallel.
In a parallel interface, there are multiple lines connecting the I/O module and the peripheral, and multiple bits are transferred simultaneously, just as all of the bits of a word are transferred simultaneously over the data bus.

In a serial interface, there is only one line used to transmit data,
and bits must be transmitted one at a time. A parallel interface has
traditionally been used for higher-speed peripherals, such as tape
and disk, while the serial interface has traditionally been used for
printers and terminals.
With a new generation of high-speed serial interfaces, parallel
interfaces are becoming much less common. In either case, the I/O
module must engage in a dialogue with the peripheral. In general
terms, the dialogue for a write operation is as follows:

1. The I/O module sends a control signal requesting permission to send data.
2. The peripheral acknowledges the request.
3. The I/O module transfers data (one word or a block, depending on the peripheral).
4. The peripheral acknowledges receipt of the data.
A read operation proceeds similarly. (A handshake sketch follows.)
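A C sketch of the four-step write dialogue above. The request/acknowledge lines are simulated by a stub peripheral so the sequence can run; on real hardware REQ, ACK and DATA would be signal lines or registers of the I/O module (all names here are assumed for illustration):

```c
#include <stdint.h>
#include <stdio.h>

static int REQ, ACK;
static uint8_t DATA;

static void peripheral_respond(void)   /* stub for the device's side */
{
    if (REQ && !ACK) { ACK = 1; return; }  /* 2. acknowledge the request */
    if (!REQ && ACK) {                     /* 4. acknowledge receipt     */
        printf("peripheral got 0x%02X\n", DATA);
        ACK = 0;
    }
}

void io_module_write(uint8_t word)
{
    REQ = 1;                   /* 1. request permission to send data */
    peripheral_respond();      /*    (wait until ACK goes high)      */
    DATA = word;               /* 3. transfer the data               */
    REQ = 0;
    peripheral_respond();      /* 4. peripheral acknowledges receipt */
}

int main(void)
{
    io_module_write(0x5A);
    return 0;
}
```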

Key to the operation of an I/O module is an internal buffer that can store data being passed between the peripheral and the rest of the system. This buffer allows the I/O module to compensate for the differences in speed between the system bus and its external lines.

Thunderbolt

The most recent, and fastest, peripheral connection technology to become available for general-purpose use is Thunderbolt, developed by Intel with collaboration from Apple.
One Thunderbolt cable can manage the work previously required of multiple cables.
The technology combines data, video, audio, and power into a single high-speed connection for peripherals such as hard drives, RAID (Redundant Array of Independent Disks) arrays, video-capture boxes, and network interfaces. It provides up to 10 Gbps throughput in each direction and up to 10 Watts of power to connected peripherals.

Although the technology and its associated specifications have stabilized, the introduction of Thunderbolt-equipped devices into the marketplace has, as of this writing, only slowly begun to develop. This is because a Thunderbolt-compatible peripheral interface is considerably more complex than that of a simple USB device.

 The first generation of Thunderbolt products is primarily aimed at the prosumer (professional-consumer) market, such as audiovisual editors who want to be able to move large volumes of data quickly between storage devices and laptops. As the technology becomes cheaper, Thunderbolt will find mass consumer uses, such as enabling very high-speed data backups and editing high-definition photos. Thunderbolt is already a standard feature of Apple's MacBook Pro laptop and iMac desktop computers.

The cable and connector layer provides transmission medium access. This layer specifies the physical and electrical attributes of the connector port.

The Thunderbolt protocol physical layer is responsible for link maintenance, including hot-plug detection, and for data encoding to provide highly efficient data transfer. The physical layer has been designed to introduce very minimal overhead and provides full-duplex 10 Gbps of usable capacity to the upper layers.
The common transport layer is the key to the operation of Thunderbolt and what makes it attractive as a high-speed peripheral I/O technology. Some of its features include:

• A high-performance, low-power switching architecture.
• A highly efficient, low-overhead packet format with flexible quality of service (QoS) support that allows multiplexing of bursty PCI Express transactions.

The application layer contains I/O protocols that are mapped onto the transport layer. Initially, Thunderbolt provides full support for the PCIe and DisplayPort protocols. This function is provided by a protocol adapter, which is responsible for efficient encapsulation of the mapped protocol information into transport layer packets.
Mapped protocol packets between a source device and a
destination device may be routed over a path that may cross
multiple Thunderbolt controllers. At the destination device, a
protocol adapter re-creates the mapped protocol in a way that is
indistinguishable from what was received by the source device. The
advantage of doing protocol mapping in this way is that
Thunderbolt technology–enabled product devices appear as PCIe or
DisplayPort devices to the operating system of the host computer,
thereby enabling the use of standard drivers that are available in
many operating systems today.

InfiniBand

InfiniBand is a recent I/O specification aimed at the high-end server market. The first version of the specification was released in early 2001 and has attracted numerous vendors. The standard describes an architecture and specifications for data flow among processors and intelligent I/O devices. InfiniBand has become a popular interface for storage area networking and other large storage configurations. In essence, InfiniBand enables servers, remote storage, and other network devices to be attached in a central fabric of switches and links. The switch-based architecture can connect up to 64,000 servers, storage systems, and networking devices.

INFINIBAND ARCHITECTURE: Although PCI is a reliable interconnect method and continues to provide increased speeds, up to 4 Gbps, it is a limited architecture compared to InfiniBand. With InfiniBand, it is not necessary to have the basic I/O interface hardware inside the server chassis. With InfiniBand, remote storage, networking, and connections between servers are accomplished by attaching all devices to a central fabric of switches and links. Removing I/O from the server chassis allows greater server density and allows for a more flexible and scalable data center, as independent nodes may be added as needed.

Unlike PCI, which measures distances from a CPU motherboard in centimeters, InfiniBand's channel design enables I/O devices to be placed up to 17 meters away from the server using copper, up to 300 m using multimode optical fiber, and up to 10 km with single-mode optical fiber. Transmission rates as high as 30 Gbps can be achieved.

The key elements of InfiniBand are as follows:
• Host channel adapter (HCA): Instead of a number of PCI slots, a typical server needs a single interface to an HCA that links the server to an InfiniBand switch. The HCA attaches to the server at a memory controller, which has access to the system bus and controls traffic between the processor and memory and between the HCA and memory. The HCA uses direct memory access (DMA) to read and write memory.
• Target channel adapter (TCA): A TCA is used to connect storage systems, routers, and other peripheral devices to an InfiniBand switch.
• InfiniBand switch: A switch provides point-to-point physical connections to a variety of devices and switches traffic from one link to another. Servers and devices communicate through their adapters, via the switch. The switch's intelligence manages the linkage without interrupting the servers' operation.
• Links: The link between a switch and a channel adapter, or between two switches.
• Subnet: A subnet consists of one or more interconnected switches plus the links that connect other devices to those switches.
• Router: Connects InfiniBand subnets, or connects an InfiniBand switch to a network, such as a local area network, wide area network, or storage area network.
INFINIBAND OPERATION

Each physical link between a switch and an attached interface (HCA or TCA) can support up to 16 logical channels, called virtual lanes. One lane is reserved for fabric management and the other lanes for data transport. Data are sent in the form of a stream of packets, with each packet containing some portion of the total data to be transferred, plus addressing and control information. Thus, a set of communications protocols is used to manage the transfer of data. A virtual lane is temporarily dedicated to the transfer of data from one end node to another over the InfiniBand fabric.
The InfiniBand switch maps traffic from an incoming lane to an outgoing lane to route the data between the desired end points.

*IBM zENTERPRISE 196 I/O STRUCTURE

The zEnterprise 196 is IBM's latest mainframe computer offering (at the time of this writing), introduced in 2010. The system is based on the use of the z196 chip, which is a 5.2-GHz multicore chip with four cores. The z196 architecture can have a maximum of 24 processor chips, for a total of 96 cores. In this section, we look at the I/O structure of the zEnterprise 196.

Channel Structure

The zEnterprise 196 has a dedicated I/O subsystem that manages all I/O operations, completely off-loading this processing and memory burden from the main processors. Of the 96 cores, up to 4 can be dedicated for I/O use, creating 4 channel subsystems (CSS). Each CSS is made up of the following elements:

• System assist processor (SAP): The SAP is a core processor configured for I/O operation. Its role is to offload I/O operations and manage channels and the I/O operations queues. It relieves the other processors of all I/O tasks, allowing them to be dedicated to application logic.
• Hardware system area (HSA): The HSA is a reserved part of the system memory containing the I/O configuration. It is used by SAPs. A fixed amount of 16 GB is reserved, which is not part of the customer-purchased memory. This provides for greater configuration flexibility and higher availability by eliminating planned and preplanned outages.
• Logical partitions: A logical partition is a form of virtual machine, which is in essence a logical processor defined at the operating system level. Each CSS supports up to 16 logical partitions.
• Subchannels: A subchannel appears to a program as a logical device and contains the information required to perform an I/O operation. One subchannel exists for each I/O device addressable by the CSS. A subchannel is used by the channel subsystem code running on a partition to pass an I/O request to the channel subsystem. A subchannel is assigned for each device defined to the logical partition. Up to 196k subchannels are supported per CSS.
• Channel path: A channel path is a single interface between a channel subsystem and one or more control units, via a channel. Commands and data are sent across a channel path to perform I/O requests. Each CSS can have up to 256 channel paths.
• Channel: Channels are small processors that communicate with the I/O control units (CUs). They manage the data transfer between memory and the external devices.
This elaborate structure enables the mainframe to manage a massive number of I/O devices and communication links. All I/O processing is offloaded from the application and server processors, enhancing performance. The channel subsystem processors are somewhat general in configuration, enabling them to manage a wide variety of I/O duties and to keep up with evolving requirements. The channel processors are specifically programmed for the I/O control units to which they interface.

*****************************************************************************************************************************

*****************************************************************************************************************************
