CHAPTER 1 INTRODUCTION
Hierarchical system:
A set of interrelated subsystems, each subsystem hierarchic in structure until some lowest level of elementary subsystems is reached
At each level of the system, the designer is concerned with structure and function.
Function
General computer functions:
- Data processing
- Data storage
- Data movement
- Control
Operations
Data movement
Ex., keyboard to screen
Operations
Storage
Ex., Internet download to disk; playing an mp3 file stored in memory to earphones attached to the same PC.
Operations
Processing from/to storage
Any number-crunching application that takes data from memory and stores the result back in memory. Ex., updating a bank statement
Operations
Processing from storage to I/O
Receiving packets over a network interface, verifying their CRC, then storing them in memory.
Ex., printing a bank statement
Structure
Four main structural components
CPU Main Memory I/O Devices System Interconnection
1. Central Processing Unit (CPU): controls the operation of the computer and performs its data processing functions; often simply referred to as the processor.
2. Main Memory: stores data.
3. I/O: moves data between the computer and its external environment.
4. System Interconnection: some mechanism that provides for communication among CPU, main memory, and I/O. A common example of system interconnection is a system bus, consisting of a number of wires to which all the other components attach.
[Figure: top-level structure. The Computer comprises the Central Processing Unit, Main Memory, I/O, and Systems Interconnection, with communication lines for input and output. The CPU comprises Registers, the Arithmetic and Logic Unit (ALU), and the Control Unit (with its Control Memory), tied to Memory and I/O by the System Bus.]
In 1946, von Neumann and his colleagues began the design of a new stored-program computer, referred to as the IAS computer. The IAS computer, although not completed until 1952, is the prototype of all subsequent general-purpose computers.
UNIVAC I (UNIVersal Automatic Computer)
- 1947 - Eckert-Mauchly Computer Corporation; first successful commercial computer, intended for both scientific and commercial applications
- Used by the US Bureau of the Census for 1950 calculations
- The company became part of the Sperry-Rand Corporation
- Late 1950s - UNIVAC II: faster, more memory
IBM
- Punched-card processing equipment
- 1953 - the 701: IBM's first stored-program computer, for scientific calculations
- 1955 - the 702: business applications
- Led to the 700/7000 series
Transistors
The transistor is smaller, cheaper, and dissipates less heat than a vacuum tube, but can be used in the same way as a vacuum tube to construct computers. It was invented at Bell Labs in 1947 by the team of William Shockley. Transistors led to the IBM 7000 series. DEC (Digital Equipment Corporation) was founded in 1957 and produced the PDP-1 in the same year.
Moore's Law
Increased density of components on a chip. Gordon Moore, co-founder of Intel, predicted that the number of transistors on a chip would double every year. Since the 1970s development has slowed a little:
the number of transistors doubles every 18 months.
Consequences:
- Cost of a chip has remained almost unchanged
- Higher packing density means shorter electrical paths, giving higher performance
- Smaller size gives increased flexibility
- Reduced power and cooling requirements
- Fewer interconnections increases reliability
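A minimal sketch of the doubling rule stated above; the 1971 baseline of 2,300 transistors (the Intel 4004) is an illustrative assumption, not from the text:

```python
# Idealized Moore's law: transistor count doubles every 18 months.
def transistors(year, base_year=1971, base_count=2300, doubling_months=18):
    """Projected transistor count under the doubling law."""
    periods = (year - base_year) * 12 / doubling_months
    return base_count * 2 ** periods

# Two doubling periods after 1971 (i.e., 1974): 2300 * 2^2
print(f"{transistors(1974):.0f}")  # prints 9200
```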
IBM 360 Series - first planned family of computers:
- Similar or identical O/S
- Increasing speed
- Increasing number of I/O ports (i.e., more terminals)
- Increased memory size
- Increased cost
- Multiplexed switch structure
DEC PDP - 8
- 1964: first minicomputer (named after the miniskirt!)
- Did not need an air-conditioned room
- Small enough to sit on a lab bench
- $16,000, vs. $100k+ for an IBM 360
- Embedded applications & OEM
- Bus structure
Semiconductor Memory
1970, Fairchild
- Size of a single core (i.e., 1 bit of magnetic core storage)
- Holds 256 bits
- Non-destructive read
- Much faster than core
- Capacity approximately doubles each year
Microprocessors - Intel
- 1971 - 4004: first microprocessor; all CPU components on a single chip; 4-bit; multiplication by repeated addition, no hardware multiplier!
- 1972 - 8008: 8-bit; both designed for specific applications
- 1974 - 8080: Intel's first general-purpose microprocessor
1970s Processors
1980s Processors
1990s Processors
Recent Processors
Performance Balance
While processor power has raced ahead at breakneck speed, other critical components of the computer have not kept up. The result is a need to look for performance balance: an adjusting of the organization and architecture to compensate for the mismatch among the capabilities of the various components.
- Processor speed increased
- Memory capacity increased
While processor speed has grown rapidly, the speed with which data can be transferred between main memory and the processor has lagged badly. The interface between processor and main memory is the most crucial pathway in the entire computer because it is responsible for carrying a constant flow of program instructions and data between memory chips and the processor. If memory or the pathway fails to keep pace with the processor's insistent demands, the processor stalls in a wait state, and valuable processing time is lost.
Solutions
Increased number of bits retrieved at one time
Make DRAM wider rather than deeper
I/O Devices
As computers become faster and more capable, more sophisticated applications are developed that support the use of peripherals with intensive I/O demands. Solutions
Caching Buffering Higher-speed interconnection buses More elaborate bus structures Multiple processor configurations
ARM
Designed by ARM Inc., Cambridge, England. It's not a processor, but an architecture: ARM licenses it to manufacturers. As of 2007, about 98 percent of the more than one billion mobile phones sold each year use at least one ARM processor. ARM chips are the processors in Apple's popular iPod and iPhone devices. ARM is probably the most widely used embedded processor architecture and indeed the most widely used processor architecture of any kind in the world.
ARM Evolution
ARM processors are designed to meet the needs of three system categories:
- Embedded real-time systems: systems for storage, automotive body and power-train, industrial, and networking applications
- Application platforms: devices running open operating systems including Linux, Palm OS, Symbian OS, and Windows CE in wireless, consumer entertainment, and digital imaging applications
- Secure applications: smart cards, SIM cards, and payment terminals
Performance Assessment
In evaluating processor hardware and setting requirements for new systems, performance is one of the key parameters to consider, along with cost, size, security, reliability, and in some cases power consumption. System clock speed Operations performed by a processor, such as fetching an instruction, decoding the instruction, performing an arithmetic operation, and so on are governed by a system clock. The speed of a processor is dictated by the pulse frequency produced by the clock, measured in cycles per second, or Hertz (Hz).
Performance Assessment
Clock signals are generated by a quartz crystal, which generates a constant signal wave while power is applied. This wave is converted into a digital voltage pulse stream that is provided in a constant flow to the processor circuitry. The rate of pulses is known as the clock rate, or clock speed. One increment, or pulse, of the clock is referred to as a clock cycle, or a clock tick. The time between pulses is the cycle time.
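Since the clock rate and the cycle time are reciprocals of each other, the relationship can be checked with a trivial sketch:

```python
# Cycle time is the reciprocal of the clock rate.
def cycle_time_ns(clock_hz):
    """Cycle time in nanoseconds for a clock rate given in Hz."""
    return 1e9 / clock_hz

print(cycle_time_ns(400e6))  # 2.5 ns per cycle for a 400-MHz clock
```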
System Clock
Pipelining: simultaneous execution of instructions. Conclusion: clock speed is not the whole story about performance.
We can refine this formulation by recognizing that during the execution of an instruction, part of the work is done by the processor, and part of the time a word is being transferred to or from memory. In this latter case, the time to transfer depends on the memory cycle time, which may be greater than the processor cycle time. We can rewrite the preceding equation as

T = I_c * [p + (m * k)] * t

where I_c is the instruction count, p the number of processor cycles needed to decode and execute the instruction, m the number of memory references needed, k the ratio between memory cycle time and processor cycle time, and t the processor cycle time.
Cache and the memory hierarchy help close this gap. We can express the MIPS rate in terms of the clock rate and CPI as follows:

MIPS rate = I_c / (T * 10^6) = f / (CPI * 10^6)
For example, consider the execution of a program which results in the execution of 2 million instructions on a 400-MHz processor. The program consists of four major types of instructions. The instruction mix and the CPI for each instruction type are given below, based on the result of a program trace experiment:

Instruction type | CPI | Instruction mix (%)
Arithmetic and logic | 1 | 60
Load/store with cache hit | 2 | 18
Branch | 4 | 12
Memory reference with cache miss | 8 | 10
The average CPI when the program is executed on a uniprocessor with the above trace results is CPI = (1 * 0.6) + (2 * 0.18) + (4 * 0.12) + (8 * 0.1) = 2.24. The corresponding MIPS rate is (400 * 10^6) / (2.24 * 10^6) = 178. Floating-point performance is expressed as millions of floating-point operations per second (MFLOPS), defined as follows:

MFLOPS rate = (number of executed floating-point operations in a program) / (execution time * 10^6)
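The worked example above can be reproduced directly from the instruction mix:

```python
# Weighted-average CPI from an instruction mix, then the MIPS rate
# on a 400-MHz processor (the example's trace figures).
mix = [  # (CPI, fraction of instruction count)
    (1, 0.60),  # arithmetic and logic
    (2, 0.18),  # load/store with cache hit
    (4, 0.12),  # branch
    (8, 0.10),  # memory reference with cache miss
]

cpi = sum(c * f for c, f in mix)
clock_hz = 400e6
mips = clock_hz / (cpi * 1e6)

print(round(cpi, 2), int(mips))  # 2.24 178
```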
Benchmarks
Programs designed to test performance. A benchmark suite is a collection of programs, defined in a high-level language, that together attempt to provide a representative test of a computer in a particular application or system programming area. The System Performance Evaluation Corporation (SPEC) defines and maintains the best-known collection of benchmark suites.
Averaging Results
To obtain a reliable comparison of the performance of various computers, it is preferable to run a number of different benchmark programs on each machine and then average the results. For example, with m different benchmark programs, a simple arithmetic mean can be calculated as follows:

R_A = (1/m) * sum over i of R_i

where R_i is the high-level language instruction execution rate for the ith benchmark program.
Alternative: the harmonic mean,

R_H = m / (sum over i of 1/R_i)
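A short sketch contrasting the two averages; the rates below are made-up illustrative values, not benchmark results from the text:

```python
# Arithmetic vs. harmonic mean of per-benchmark execution rates R_i.
rates = [100, 200, 400]  # illustrative R_i for m = 3 benchmarks

m = len(rates)
arithmetic = sum(rates) / m               # R_A = (1/m) * sum(R_i)
harmonic = m / sum(1 / r for r in rates)  # R_H = m / sum(1/R_i)

# The harmonic mean is always <= the arithmetic mean and is less
# distorted by one unusually fast benchmark.
print(round(arithmetic, 1), round(harmonic, 1))
```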
Amdahl's Law
Gene Amdahl: potential speed-up of a program using multiple processors. Concluded that:
- Code needs to be parallelizable
- Speedup is bound, giving diminishing returns for more processors
- Gains are task dependent
- Servers gain by maintaining multiple connections on multiple processors
Let T be the total execution time of the program using a single processor, and f the fraction of the code that can be parallelized. Then the speedup using a parallel processor with N processors that fully exploits the parallel portion of the program is as follows:

Speedup = T / [T(1 - f) + T*f/N] = 1 / [(1 - f) + f/N]
Two important conclusions can be drawn: 1. When f is small, the use of parallel processors has little effect. 2. As N approaches infinity, speedup is bound by 1/(1 - f), so that there are diminishing returns for using more processors.
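Amdahl's law and its two conclusions can be checked numerically with a minimal sketch:

```python
# Amdahl's law: speedup = 1 / ((1 - f) + f / N) for a program whose
# fraction f is parallelizable, run on N processors.
def amdahl_speedup(f, n):
    return 1.0 / ((1.0 - f) + f / n)

# With f = 0.9, adding processors gives diminishing returns,
# bounded by 1 / (1 - f) = 10.
for n in (2, 10, 100, 1000):
    print(n, round(amdahl_speedup(0.9, n), 2))
```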
Speedup
Suppose that a feature of the system is used during execution a fraction f of the time before enhancement, and that the speedup of that feature after enhancement is SU_f. Then the overall speedup of the system is

Speedup = 1 / [(1 - f) + f/SU_f]
For example, suppose that a task makes extensive use of floating-point operations, with 40% of the time consumed by floating-point operations. With a new hardware design, the floating-point module is speeded up by a factor of K. Then the overall speedup is

Speedup = 1 / (0.6 + 0.4/K)

Thus, independent of K, the maximum speedup is bound by 1/0.6 = 1.67.
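The floating-point example can be sketched with the same formula, showing the speedup approaching the 1/0.6 = 1.67 bound as K grows:

```python
# Overall speedup when a feature used fraction f of the time is
# sped up by a factor su_f: 1 / ((1 - f) + f / su_f).
def overall_speedup(f, su_f):
    return 1.0 / ((1.0 - f) + f / su_f)

# f = 0.4 (floating point); larger K approaches 1/0.6 = 1.67.
for k in (2, 10, 1000):
    print(k, round(overall_speedup(0.4, k), 2))
```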
Computer Components
The Control Unit and the Arithmetic and Logic Unit constitute the Central Processing Unit: an instruction interpreter and a module of general-purpose arithmetic and logic functions. Data and instructions must be put into the system; taken together, these are referred to as I/O components. Memory (main memory) is a place to store temporarily both instructions and data.
Top-Level View
The CPU exchanges data with memory. For this purpose, it typically makes use of two internal (to the CPU) registers: a memory address register (MAR), which specifies the address in memory for the next read or write, and a memory buffer register (MBR), which contains the data to be written into memory or receives the data read from memory. Similarly, an I/O address register (I/OAR) specifies a particular I/O device, and an I/O buffer register (I/OBR) is used for the exchange of data between an I/O module and the CPU. An I/O module transfers data from external devices to CPU and memory, and vice versa. It contains internal buffers for temporarily holding these data until they can be sent on.
Computer Function
The basic function performed by a computer is execution of a program. The processor does the actual work by executing instructions specified in the program. Instruction processing consists of two steps: the processor reads (fetches) instructions from memory one at a time and executes each instruction. Program execution consists of repeating the process of instruction fetch and instruction execution.
Computer Function
Instruction Cycle
The instruction cycle is the processing required for a single instruction. The two steps are referred to as the fetch cycle and the execute cycle. Program execution halts only if the machine is turned off, some sort of unrecoverable error occurs, or a program instruction that halts the computer is encountered.
Processor I/O
Data transfer between CPU and I/O module
Data processing
Some arithmetic or logical operation on data
Control
Alteration of sequence of operations
Interrupts
Mechanism by which other modules (e.g., I/O) may interrupt the normal sequence of processing. Classes of interrupts:
- Program: e.g., overflow, division by zero
- Timer: generated by an internal processor timer
- I/O: from an I/O controller
- Hardware failure: e.g., memory parity error
Interrupt Cycle
Added to instruction cycle Processor checks for interrupt
Indicated by an interrupt signal
Multiple Interrupts
Disable Interrupts
Processor will ignore further interrupts whilst processing one interrupt Interrupts remain pending and are checked after first interrupt has been processed Interrupts handled in sequence as they occur
Define Priorities
Low priority interrupts can be interrupted by higher priority interrupts
When higher priority interrupt has been processed, processor returns to previous interrupt
Interconnection Structures
The collection of paths connecting the various modules is called the interconnection structure. The design of this structure will depend on the exchanges that must be made among modules.
Interconnection Structures
Types of exchanges that are needed by indicating the major forms of input and output for each module type:
Memory: Typically, a memory module will consist of N words of equal length. Each word is assigned a unique numerical address (0, 1, ..., N - 1). A word of data can be read from or written into the memory.
I/O module: From an internal (to the computer system) point of view, I/O is functionally similar to memory. There are two operations, read and write. Further, an I/O module may control more than one external device. We can refer to each of the interfaces to an external device as a port and give each a unique address (e.g., 0, 1, ..., M - 1).
Interconnection Structures
- Processor: The processor reads in instructions and data, writes out data after processing, and uses control signals to control the overall operation of the system. It also receives interrupt signals.
Computer Module
Memory Connection
Receives and sends data Receives addresses (of locations)
Input
Receives data from peripheral; sends data to the computer
Output
Receives data from the computer; sends data to peripheral
CPU Connection
Reads instruction and data Writes out data (after processing)
Bus Interconnection
A bus is a communication pathway connecting two or more devices Multiple devices connect to the bus, and a signal transmitted by any one device is available for reception by all other devices attached to the bus. A bus that connects major computer components (processor, memory, I/O) is called a system bus.
Bus Structure
On any bus the lines can be classified into three functional groups:
The data lines provide a path for moving data among system modules. These lines, collectively, are called the data bus. The address lines are used to designate the source or destination of the data on the data bus. The control lines are used to control the access to and the use of the data and address lines.
Bus Structure
The operation of the bus is as follows. If one module wishes to send data to another, it must do two things: (1) obtain the use of the bus, and (2) transfer data via the bus. If one module wishes to request data from another module, it must (1) obtain the use of the bus, and (2) transfer a request to the other module over the appropriate control and address lines. It must then wait for that second module to send the data.
Bus Structure
Bus Types
Dedicated
Separate data & address lines
Multiplexed
- Shared lines, with an "address valid" or "data valid" control line
- Advantage: fewer lines
- Disadvantages: more complex control; reduced ultimate performance
Bus Arbitration
More than one module controlling the bus
Ex. CPU and DMA controller
Only one module may control bus at one time Arbitration may be centralised or distributed
Distributed
Each module may claim the bus Control logic on all modules
Timing
Co-ordination of events on the bus.
Synchronous:
- Events determined by clock signals; the control bus includes a clock line
- A single 1-0 transition sequence is a bus cycle
- All devices can read the clock line
- Usually sync on the leading edge
- Usually a single cycle for an event
PCI Bus
Peripheral Component Interconnect. Released by Intel to the public domain. 32 or 64 bit.
Error Lines
PCI Commands
Transaction between an initiator (master) and a target. The master claims the bus and determines the type of transaction, e.g., I/O read/write.
Cache Memory
Chapter 4
Terminology
Capacity: the amount of information that can be contained in a memory unit, usually in terms of words or bytes
Terminology
Access time:
For RAM: the time to address the unit and perform the transfer.
For non-random-access memory: the time to position the R/W head over the desired location.
Memory Hierarchy
Major design objective of any memory system
To provide adequate storage capacity at An acceptable level of performance At a reasonable cost
- Use a hierarchy of storage devices
- Develop automatic space-allocation methods for efficient use of the memory
- Through the use of virtual memory techniques, free the user from memory management tasks
- Design the memory and its related interconnection structure so that the processor can operate at or near its maximum rate
Memory Hierarchy
Basis of the memory hierarchy
- Registers internal to the CPU for temporary data storage (small in number but very fast)
- Main memory for data and programs (relatively large and fast)
- External permanent storage (much larger and much slower)
Consists of distinct levels of memory components Each level characterized by its size, access time, and cost per bit Each increasing level in the hierarchy consists of modules of larger capacity, slower access time, and lower cost/bit
Try to match the processor speed with the rate of information transfer from the lowest element in the hierarchy
Hierarchy List
Registers L1 Cache L2 Cache Main memory Disk cache Disk Optical Tape
Cache Memory
Cache memory is a critical component of the memory hierarchy
- Compared to the size of main memory, cache is relatively small
- Operates at or near the speed of the processor
- Very expensive compared to main memory
- Cache contains copies of sections of main memory
Cache Memory
Small amount of fast memory Sits between normal main memory and CPU May be located on CPU chip or module
Locality of Reference
The cache memory works because of locality of reference
Memory references made by the processor, for both instructions and data, tend to cluster together
Instruction loops, subroutines Data arrays, tables
Keep these clusters in high speed memory to reduce the average delay in accessing data Over time, the clusters being referenced will change -- memory management must deal with this
Cache Design
Addressing Size Mapping Function Replacement Algorithm Write Policy Block Size Number of Caches
Cache Addressing
Where does cache sit?
Between processor and virtual memory management unit Between MMU and main memory
Mapping Function
Because there are fewer cache lines than main memory blocks, an algorithm is needed for mapping main memory blocks into cache lines. The choice of the mapping function dictates how the cache is organized. Three techniques are used: direct, associative, and set associative.
Direct Mapping
The simplest technique, known as direct mapping, maps each block of main memory into only one possible cache line.
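A minimal sketch of direct mapping: the memory address is split into tag, line, and word fields, and each block maps to exactly one line (line = block number mod number of lines). The field widths below (4 words per block, 8 cache lines) are illustrative assumptions, not from the text:

```python
# Direct mapping: address -> (tag, line, word).
WORD_BITS = 2            # 4 words per block
LINE_BITS = 3            # 8 cache lines
LINES = 1 << LINE_BITS

def map_address(addr):
    word = addr & ((1 << WORD_BITS) - 1)   # word within the block
    block = addr >> WORD_BITS              # main memory block number
    line = block % LINES                   # the direct-mapping function
    tag = block >> LINE_BITS               # stored to identify the block
    return tag, line, word

# Address 0b1101_101_10: tag 0b1101, line 0b101, word 0b10.
print(map_address(0b1101_101_10))  # prints (13, 5, 2)
```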
Write Policy
Must not overwrite a cache block unless main memory is up to date
Write Through
- All writes go to main memory as well as cache
- Multiple CPUs can monitor main memory traffic to keep local (to CPU) cache up to date
- Lots of traffic; slows down writes
- Remember bogus write-through caches!
Write Back
- Updates initially made in cache only
- Update bit for cache slot is set when an update occurs
- If a block is to be replaced, write to main memory only if the update bit is set
- Other caches get out of sync
- I/O must access main memory through the cache
- N.B. 15% of memory references are writes
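The write-back behavior described above can be sketched with a single cache line and an update (dirty) bit; the class and names are illustrative assumptions, not from the text:

```python
# Write-back policy: writes update the cache only and set the dirty
# bit; main memory is written only when a dirty line is replaced.
class WriteBackLine:
    def __init__(self):
        self.tag, self.data, self.dirty = None, None, False
        self.memory_writes = 0  # counts write-backs to main memory

    def write(self, tag, data):
        if self.tag not in (None, tag) and self.dirty:
            self.memory_writes += 1  # eviction flushes the dirty line
        self.tag, self.data, self.dirty = tag, data, True

line = WriteBackLine()
line.write(1, "a")   # cached only, no memory traffic
line.write(1, "b")   # hit: still no memory traffic
line.write(2, "c")   # replacement flushes the dirty line
print(line.memory_writes)  # prints 1
```

Contrast with write-through, where every one of the three writes would have gone to main memory.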