
EC041 COMPUTER ARCHITECTURE AND

ORGANIZATION

Course Objective:
To understand how computers are constructed from a set of functional
units, and how the functional units operate, interact, and communicate.
To make students understand the concept of interfacing memory and
various I/O devices to a computer system using a suitable bus system.

Introduction: Function and structure of a computer, Functional components of a
computer, Interconnection of components, Performance of a computer.
Representation of Instructions: Machine instructions, Memory locations & Addresses,
Operands, Addressing modes, Instruction formats, Instruction sets, Instruction set
architectures - CISC and RISC architectures, Super scalar Architectures, Fixed point and
floating point operations.
Basic Processing Unit: Fundamental concepts, ALU, Control unit, Multiple bus
organization, Hardwired control, Micro programmed control, Pipelining, Data hazards,
Instruction hazards, Influence on instruction sets, Data path and control considerations,
Performance considerations.
Memory organization: Basic concepts, Semiconductor RAM memories, ROM, Speed,
size and cost, Cache memory, Improving cache performance, Memory management unit,
Concept of virtual memory, Address translation, Secondary storage devices.
I/O Organization: Accessing I/O devices, Input/output programming, Interrupts,
Exception Handling, DMA, Buses, I/O interfaces - Serial port, Parallel port, PCI bus, SCSI
bus, USB bus, FireWire and InfiniBand, I/O peripherals.

Text Books:
1. C. Hamacher, Z. Vranesic and S. Zaky, "Computer Organization", McGraw-Hill, 2012
(6th edition).
2. W. Stallings, "Computer Organization and Architecture - Designing for Performance",
Prentice Hall of India, 2012 (9th edition).
References:
1. D. A. Patterson and J. L. Hennessy, "Computer Organization and Design: The
Hardware/Software Interface", Morgan Kaufmann, 2013 (5th edition).
2. J. P. Hayes, "Computer Architecture and Organization", McGraw-Hill (3rd edition).
3. Behrooz Parhami, "Computer Architecture: From Microprocessors to Supercomputers",
Oxford University Press, 2014.

Computer organization refers to the operational units and their
interconnections that realize the architectural specifications.
It describes the function and design of the various units of digital computers
that store and process information.
These units receive information from external sources and send computed
results to external destinations.
Computer architecture encompasses the specification of an instruction set
and the hardware units that implement the instructions.
Computer architecture refers to those attributes of a system visible to a
programmer.

Computer hardware consists of electronic circuits, magnetic and optical
storage devices, displays, electromechanical devices, and communication
facilities.

Subfields or views in computer system engineering: computer architecture
takes the high-level view; computer organization takes the low-level view.

Computer Architecture, Background and Motivation

From Components to Applications

[Figure: the design hierarchy from electronic components to application
domains. On the hardware side, circuit designers, logic designers, and
computer designers build up from electronic components; on the software
side, system designers and application designers build toward the
application domains.]
What Is (Computer) Architecture?

[Figure: an architect positioned at the interface between goals and means,
and between engineering and arts. Goals: the client's requirements
(function, cost, ...) and the client's taste (mood, style, ...). Means:
construction technology (material, codes, ...) and the world of arts
(aesthetics, trends, ...).]

Like a building architect, whose place at the engineering/arts and
goals/means interfaces is seen in this diagram, a computer architect
reconciles many conflicting or competing demands.

Computer Systems and Their Parts

[Figure: the space of computer systems, with what we normally mean by the
word "computer" highlighted. Computers are classified along several
dichotomies: analog vs. digital, electronic vs. nonelectronic,
fixed-function vs. stored-program, special-purpose vs. general-purpose,
and, within general-purpose machines, number crunchers vs. data
manipulators.]

Number cruncher: a computer or program capable of performing rapid
calculations with large amounts of data.
Data manipulation is the process of changing data in an effort to make it
easier to read or more organized.

COMPUTER TYPES
Computers are classified based on parameters such as:
Speed of operation
Size
Cost
Computational power
Type of application
Embedded computers are integrated into a larger device or system in
order to automatically monitor and control a physical process or
environment.
They are used for a specific purpose rather than for general processing
tasks.
Typical applications include industrial and home automation, appliances,
telecommunication products, and vehicles.

Personal computers have achieved widespread use in homes, educational
institutions, and business and engineering office settings, primarily for
dedicated individual use.
A number of classifications are used for personal computers.
Desktop computers serve general needs and fit within a typical personal
workspace.
Advantage: cost-effective, easy to operate, suitable for general-purpose
educational or business applications.

Workstation computers offer higher computational capacity and more
powerful graphical display capabilities for engineering and scientific work.
More computational power than a PC.
Costlier.
Portable and notebook computers provide the basic features of a personal
computer in a smaller, lightweight package.
They can operate on batteries to provide mobility.
Compact form of personal computer (laptop).
Advantage is portability.

Enterprise systems (mainframes) are large computers that are meant to be
shared by a potentially large number of users who access them from some
form of personal computer over a public or private network.
More computational power.
Larger storage capacity.
Used for business data processing in large organizations.
Commonly referred to as servers or supercomputers.
Server systems support large volumes of data which frequently need to be
accessed or modified, and support request-response operation.

Supercomputers and grid computers normally offer the highest performance.
Faster than mainframes.
Help perform large-scale numerical and algorithmic calculations in a short
span of time.
Used for aircraft design and testing, military applications, and weather
forecasting.
High cost.

Grid computers provide a more cost-effective alternative.
They combine a large number of personal computers and disk storage units
in a physically distributed high-speed network, called a grid, which is
managed as a coordinated computing resource.

There is an emerging trend in access to computing facilities, known as
cloud computing.
Personal computer users access widely distributed computing and storage
server resources for individual, independent computing needs.
The Internet provides the necessary communication facility.
Cloud hardware and software service providers operate as a utility,
charging on a pay-as-you-use basis.

Price/Performance Pyramid

Classifying computers by computational power and price range:

Super        $Millions
Mainframe    $100s Ks
Server       $10s Ks
Workstation  $1000s
Personal     $100s
Embedded     $10s

Functional units of a computer

Five functionally independent main parts: Input, Memory, Arithmetic and
logic, Output, and Control units.

A computer is a sophisticated electronic calculating machine that:
Accepts input information,
Processes the information according to a list of internally stored
instructions, and
Produces the resulting output information.

All of these actions are coordinated by the control unit.
An interconnection network provides the means for the functional units to
exchange information and coordinate their actions.
A program is a list of instructions that performs a task. Programs are stored
in the memory.

Instructions specify commands to:


Transfer information within a computer (e.g., from memory to ALU),
Transfer information between the computer and I/O devices (e.g., from
keyboard to computer, or computer to printer),
Perform arithmetic and logic operations (e.g., add two numbers, perform
a logical AND).

Data are the operands upon which instructions operate. Data are numbers
and encoded characters, and are also stored in the memory.
In a broad sense, data means any digital information.

Computers use data that is encoded as a string of binary digits called bits.
The instructions and data handled by a computer must be encoded in a suitable
format.

Input unit
Binary information must be presented to a computer in a specific format.
This task is performed by the input unit:
- Interfaces with input devices.
- Accepts binary information from the input devices.
- Presents this binary information in a format expected by the computer.
- Transfers this information to the memory or processor.
[Figure: the input unit connects real-world input devices, such as a
keyboard, joystick, trackball, mouse, microphone (audio input), camera
(video input), and the Internet, to the computer's memory and processor.]

Memory unit
The memory unit stores instructions and data.
Recall that data are represented as a series of bits; to store data, the
memory unit thus stores bits.
The processor reads instructions and reads/writes data from/to the memory
during the execution of a program.
In theory, instructions and data could be fetched one bit at a time.
In practice, a group of bits is fetched at a time.
The group of bits stored or retrieved at a time is termed a word.
The number of bits in a word is termed the word length of a computer.
In order to read from or write to memory, a processor must know where to
look: an address is associated with each word location.

The processor reads/writes to/from memory based on the memory address:
It can access any word location in a short, fixed amount of time based on
the address.
Random Access Memory (RAM) provides a fixed access time independent of
the location of the word.
This access time is known as the memory access time.
The memory and processor have to communicate with each other in order to
read/write information.
To reduce communication time, a small amount of RAM (known as cache) is
tightly coupled with the processor.
Modern computers have three to four levels of RAM units with different
speeds and sizes:
The fastest, smallest is known as cache; the slowest, largest is known as
main memory.
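The word/address organization described above can be sketched as a toy model. This is a hypothetical illustration (the `WordMemory` class is not from the text), showing that each address selects one fixed-length word and access takes the same form for any address:

```python
# Hypothetical sketch of a word-addressable memory unit.
class WordMemory:
    def __init__(self, num_words, word_length):
        self.word_length = word_length           # bits per word
        self.mask = (1 << word_length) - 1       # keeps a value within one word
        self.words = [0] * num_words             # every word location starts cleared

    def write(self, address, value):
        # Store one word at the given address (extra high bits are discarded).
        self.words[address] = value & self.mask

    def read(self, address):
        # "Random access": any address is reached the same way, in fixed time.
        return self.words[address]

mem = WordMemory(num_words=1024, word_length=32)
mem.write(5, 0xDEADBEEF)
print(hex(mem.read(5)))   # -> 0xdeadbeef
```

The mask mirrors the fixed word length: a 32-bit machine stores and retrieves exactly 32 bits per access.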

Part No          Data bus width  Memory size                                Operating frequency

INTEL
8085             8               64K (2^16), A0-A15                         3 MHz
8086             16              1MB (2^20), A0-A19                         5 MHz
80186            16              1MB (2^20), A0-A19                         -
80286            16              16MB (2^24)                                8 MHz
80386DX          32              4GB (2^32)                                 33 MHz
80486DX          32              4GB (2^32) + 8K cache                      50 MHz
Pentium          64              4GB + 16K cache                            60 MHz & 66 MHz
Pentium Pro      64              64GB (2^36) + 16K L1 + 256K L2 cache       150 MHz & 166 MHz
Pentium II       64              64GB (2^36) + 32K L1 + 512K L2 cache       233 MHz to 450 MHz
Pentium II Xeon  64              64GB (2^36) + 32K L1 + 512K or 1MB L2      450 MHz
Pentium III      64              64GB (2^36) + 32K L1 + 256K L2 cache       1 GHz
Pentium 4        64              64GB (2^36) + 32K L1 + 256K L2 cache       1.3 GHz and up

MOTOROLA
6800             8               64K                                        2 MHz
68000            16              16M                                        8-20 MHz
68020            32              4G                                         12 MHz to 33 MHz
68040            32              4G + 8K cache                              40 MHz
68060            64              4G + 16K cache                             66 MHz and 75 MHz
POWER PC         64              4G + 32K cache                             90 to 120 MHz

Caches and CPUs

[Figure: the CPU exchanges addresses and data with a cache and its cache
controller; the cache controller in turn exchanges addresses and data with
main memory.]
Multi-Level Caches

The processor registers, L1 data/instruction caches, unified L2 cache,
main memory, and disk form a hierarchy that gets larger, slower, and
cheaper at each level:

Level               Size          Speed   $/Mbyte    Line size
Registers           200 B         3 ns    -          8 B
L1 d-cache/i-cache  8-64 KB       3 ns    -          32 B
Unified L2 cache    1-4 MB SRAM   6 ns    $100/MB    32 B
Main memory         128 MB DRAM   60 ns   $1.50/MB   8 KB
Disk                30 GB         8 ms    $0.05/MB   -

On-chip cache & co-processors

[Figure: on-chip integration across generations. The 80486DX puts the CPU,
co-processor, and an 8K L1 cache on one chip; the Pentium has a 16K L1
cache; the Pentium Pro module pairs a 16K L1 cache with a 256K L2 cache;
the Pentium II/III/4 module pairs a 32K L1 cache with a 512K L2 cache.]

Primary storage of the computer consists of RAM units.
The fastest, smallest unit is the cache; the slowest, largest unit is
main memory.
Primary storage is insufficient to store large amounts of data and programs.
Primary storage can be added, but it is expensive.
Large amounts of data are therefore stored on secondary storage devices:
Magnetic disks and tapes,
Optical disks (CD-ROMs).
Access to data stored in secondary storage is slower, but secondary storage
takes advantage of the fact that some information may be accessed
infrequently.
The cost of a memory unit depends on its access time; a smaller access time
implies a higher cost.

Arithmetic and logic unit (ALU)

Operations are executed in the Arithmetic and Logic Unit (ALU):
Arithmetic operations such as addition and subtraction;
Logic operations such as comparison of numbers.
In order to execute an instruction, operands need to be brought into the ALU
from the memory.
Operands are stored in general-purpose registers available in the ALU.
Access times of general-purpose registers are shorter than that of the cache.
Results of the operations are stored back in the memory or retained in the
processor for immediate use.

Output unit
Computers represent information in a specific binary form.
Output units:
- Interface with output devices.
- Accept processed results provided by the computer in specific binary form.
- Convert the information in binary form to a form understood by an
output device.
[Figure: the output unit connects the computer's memory and processor to
output devices such as a printer, graphics display, and speakers.]

Control unit
Operation of a computer can be summarized as:
Accepts information from the input units (Input unit).
Stores the information (Memory).
Processes the information (ALU).
Provides processed results through the output units (Output unit).

The operations of the input unit, memory, ALU, and output unit are
coordinated by the control unit.
The control unit generates timing signals that determine when a particular
operation takes place.

How are the functional units connected?


For a computer to achieve its operation, the functional units need to
communicate with each other.
In order to communicate, they need to be connected.

[Figure: the processor, memory, input, and output units connected by a
single 32-bit bus.]

Functional units may be connected by a group of parallel wires.


The group of parallel wires is called a bus.
Each wire in a bus can transfer one bit of information.
The number of parallel wires in a bus is equal to the word length of
a computer

Generations of Progress
The 5 generations of digital computers, and their ancestors:

Generation  Processor          Memory          I/O devices           Dominant
(begun)     technology         innovations     introduced            look & feel
0 (1600s)   (Electro-)         Wheel, card     Lever, dial,          Factory
            mechanical                         punched card          equipment
1 (1950s)   Vacuum tube        Magnetic drum   Paper tape,           Hall-size
                                               magnetic tape         cabinet
2 (1960s)   Transistor         Magnetic core   Drum, printer,        Room-size
                                               text terminal         mainframe
3 (1970s)   SSI/MSI            RAM/ROM chip    Disk, keyboard,       Desk-size mini
                                               video monitor
4 (1980s)   LSI/VLSI           SRAM/DRAM       Network, CD,          Desktop/
                                               mouse, sound          laptop micro
5 (1990s)   ULSI/GSI/WSI,      SDRAM, flash    Sensor/actuator,      Invisible,
            SOC, SOP                           point/click           embedded

The manufacturing process for an IC part:

[Figure: a silicon crystal ingot (15-30 cm in diameter) is sliced into
blank wafers (which carry defects); processing in 20-30 steps yields a
patterned wafer holding hundreds of simple, or scores of complex,
processors; a dicer cuts the wafer into dies (~1 cm on a side); a die
tester selects the good dies, which are mounted into microchips or other
parts; a part tester selects the usable parts to ship.]

Die yield = (number of good dies) / (total number of dies)

Scaling of Transistor Performance and Wires

Feature size: the minimum size of a transistor or a wire in either the x or y
dimension.
Feature size shrank from 10 microns in 1971 to 0.09 microns (90 nm) in 2006.
The density of transistors increases quadratically with a linear decrease in
feature size.
Because of this improvement in transistor density, CPUs moved quickly from
4-bit to 8-bit, to 16-bit, to 32-bit microprocessors.
Transistor performance improves linearly with decreasing feature size; wires
in an integrated circuit do not.
In particular, the signal delay for a wire increases in proportion to the
product of its resistance and capacitance.
Of course, as feature size shrinks, wires get shorter, but the resistance and
capacitance per unit length get worse.

Moore's Law

[Figure: processor performance (kIPS to TIPS) and memory chip capacity
(kb to Tb) plotted against calendar year, 1980-2010. Processor performance
grows about 1.6x per year (2x per 18 months, 10x per 5 years), tracked by
parts from the 68000 and 80286 through the 80486, Pentium, Pentium II, and
R10000. Memory chip capacity grows about 4x per 3 years, passing milestones
from 64kb through 256Mb toward 1Gb.]

Short-term challenge: putting processor and memory on the same device.

Various packaging schemes are used for connecting the processor, memory,
and other blocks, depending on the computer type and cost/performance
targets.
Memory chips are mounted on small printed circuit boards known as
daughter cards.

SIMM: single in-line memory module.

The motherboard holds the processor, system bus, various interfaces, and a
variety of connectors.
Multiple such boards are mounted in a chassis or card cage and
interconnected through a backplane.
Processor and Memory Technologies

[Figure: packaging of processor, memory, and other components.
(a) 2D or 2.5D packaging, now common: CPU and memory dies on a PC board,
joined by a bus and a connector to a backplane.
(b) 3D packaging of the future: stacked layers glued together, with
interlayer connections deposited on the outside of the stack.]

[Figure: magnetic and optical disk memory units.
(a) Cutaway view of a hard disk drive.
(b) Some removable storage media: floppy disk, CD-ROM, and magnetic tape
cartridge, typically 2-9 cm.]

Performance Trends: Bandwidth over Latency

Bandwidth or throughput: the total amount of work done in a given time,
such as megabytes per second for a disk transfer.
Latency or response time: the time between the start and the completion of an
event, such as milliseconds for a disk access.

Communication Technologies

[Figure: latency and bandwidth characteristics of different classes of
communication links, with bandwidth from 10^3 to 10^12 b/s and latency
from nanoseconds to hours. A processor bus offers the highest bandwidth at
the lowest latency; I/O networks, system-area networks (SANs), and
local-area networks (LANs) serve the same geographic location; metro-area
networks (MANs) and wide-area networks (WANs) are geographically
distributed, with the longest latencies.]

High- vs Low-Level Programming

Models and abstractions in programming, from most abstract to most
concrete:

Very high-level language objectives or tasks (run by an interpreter):
  Swap v[i] and v[i+1]

High-level language statements (one task = many statements; translated by
a compiler):
  temp = v[i]
  v[i] = v[i+1]
  v[i+1] = temp

Assembly language instructions, mnemonic (one statement = several
instructions; translated by an assembler):
  add $2,$5,$5
  add $2,$2,$2
  add $2,$4,$2
  lw  $15,0($2)
  lw  $16,4($2)
  sw  $16,0($2)
  sw  $15,4($2)
  jr  $31

Machine language instructions, binary (hex), mostly one-to-one with
assembly:
  00a51020
  00421020
  00821020
  8c620000
  8cf20004
  acf20000
  ac620004
  03e00008

Lower levels are more concrete, machine-specific, and error-prone: harder
to write, read, debug, or maintain. Higher levels are more abstract and
machine-independent: easier to write, read, debug, or maintain.

Software Systems and Applications

Categorization of software, with examples in each class:

Software
  Application: word processor, spreadsheet, circuit simulator, ...
  System: operating system
    Translator: MIPS assembler, C compiler, ...
    Manager: virtual memory, security, file system, ...
    Enabler: disk driver, display driver, printing, ...
    Coordinator: scheduling, load balancing, diagnostics, ...

Power in ICs
Power also provides challenges as devices are scaled.
Dynamic power (watts, W) in a CMOS chip: traditionally, the dominant
energy consumption has been in switching transistors.

Power_dynamic = 1/2 x Capacitive load x Voltage^2 x Frequency switched

In modern VLSI, the exact power measurement is the sum:

Power_total = Power_dynamic + Power_static + Power_leakage

Mobile devices care about battery life more than power, so energy is
the proper metric, measured in joules:

Energy_dynamic = 1/2 x Capacitive load x Voltage^2

Hence, lowering the voltage reduces both Power_dynamic and Energy_dynamic
greatly. (In the past 20 years, supply voltage has dropped from 5 V to 1 V.)

Example 1: Some microprocessors today are designed to have an adjustable
voltage, so that a 15% reduction in voltage may result in a 15% reduction
in frequency. What would be the impact on dynamic power?
Answer: Since the capacitance is unchanged, the answer is given by the
ratios of the voltages and frequencies:

Power_new / Power_old
  = ((Voltage x 0.85)^2 x (Frequency switched x 0.85))
    / (Voltage^2 x Frequency switched)
  = 0.85^3 ≈ 0.61

thereby reducing power to about 61% of the original. Since energy depends
only on the square of the voltage, the energy ratio is 0.85^2 ≈ 0.72,
thereby reducing energy to about 72% of the original.
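The dynamic-power relation and the voltage-scaling example above can be checked numerically. A minimal sketch; the function name and the capacitance/voltage/frequency values are illustrative, not from the text:

```python
# Sketch of the dynamic-power relation: P_dynamic = 1/2 * C * V^2 * f.
def dynamic_power(cap_load, voltage, freq):
    return 0.5 * cap_load * voltage**2 * freq

# Example 1: voltage and frequency both scale by 0.85, capacitance unchanged.
p_old = dynamic_power(1.0, 1.0, 1.0)               # arbitrary baseline values
p_new = dynamic_power(1.0, 0.85, 0.85)
ratio_power = p_new / p_old                        # equals 0.85**3
ratio_energy = 0.85**2                             # energy depends on V^2 only

print(round(ratio_power, 2))   # -> 0.61
print(round(ratio_energy, 2))  # -> 0.72
```

This confirms the cube law for power (V^2 times f) versus the square law for energy (V^2 alone).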

As we move from one process generation to the next (60 nm or 45 nm):
Transistor switching speed and frequency increase;
Capacitance and voltage decrease;
However, overall power consumption and energy increase.
The first microprocessors consumed tenths of a watt, while a 3.2 GHz
Pentium 4 Extreme Edition consumes 135 watts; we are reaching the limits of
what can be cooled by air.
Most microprocessors today turn off the clock of inactive modules to
save energy and dynamic power.
For example, if no floating-point instructions are executing, the clock of
the floating-point unit is disabled.

Static power is an important issue because leakage current flows even when
a transistor is off:
As the number of transistors increases, leakage current and power increase;
As feature size decreases, leakage current and power increase (why? You can
find out in the VLSI area).
As a result, very low-power systems are even gating the voltage to
inactive modules to control loss due to leakage.

Power_static = Current_static x Voltage

Cost of an Integrated Circuit (IC)

Cost of IC = (Cost of die + Cost of testing die
              + Cost of packaging and final test) / Final test yield

(The die cost is the portion of the cost that varies most between machines.)

Cost of wafer = Cost of die x Dies per wafer x Die yield, that is,

Cost of die = Cost of wafer / (Dies per wafer x Die yield)

The number of dies per wafer is approximately the wafer area divided by
the die area, minus a correction for the rectangular dies near the
periphery of the round wafer:

Dies per wafer = (pi x (Wafer diameter / 2)^2) / Die area
                 - (pi x Wafer diameter) / sqrt(2 x Die area)

The first term is the wafer area (pi r^2) divided by the die area, and is
sensitive to die size. The second term, the circumference (pi d) divided by
the diagonal of a square die, approximates the number of dies lost along
the edge. This gives the maximum number of dies per wafer.

Die yield is the fraction of good dies on a wafer. Yield is inversely
proportional to the complexity of the fabrication process:

Die yield = Wafer yield x 1 / (1 + Defects per unit area x Die area)^N

where N is a process-complexity factor, a measure of manufacturing
difficulty. For a 40 nm process in 2010, N ranged from 11.5 to 15.5, and
the defect density was 0.1 to 0.3 defects per square inch (0.016 to 0.057
defects per square centimeter).

Example: Find the number of dies per 300 mm (30 cm) wafer for a die that
is 1.5 cm on a side.
The total die area is 2.25 cm^2. Thus

Dies per wafer = (pi x (30/2)^2) / 2.25 - (pi x 30) / sqrt(2 x 2.25)
               = 706.5 / 2.25 - 94.2 / 2.12
               = 314 - 44.4 ≈ 270
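The dies-per-wafer formula and the worked example above can be verified with a short script. A sketch under the same approximation as the text; the function name is assumed:

```python
import math

# Sketch of the dies-per-wafer approximation:
# wafer area / die area, minus the edge loss (circumference / die diagonal).
def dies_per_wafer(wafer_diameter_cm, die_area_cm2):
    wafer_area = math.pi * (wafer_diameter_cm / 2) ** 2
    edge_loss = math.pi * wafer_diameter_cm / math.sqrt(2 * die_area_cm2)
    return round(wafer_area / die_area_cm2 - edge_loss)

# Worked example: 30 cm wafer, 1.5 cm x 1.5 cm die (2.25 cm^2).
print(dies_per_wafer(30, 2.25))  # -> 270
```

Note how the second term matters: ignoring the edge loss would overestimate the count by about 44 dies.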

Response Time, Throughput, and Performance

When do we say one computer is faster than another?
Response time: the time between the start and the completion of an event,
also referred to as execution time. This is what the computer user is
interested in.
Throughput: the total amount of work done in a given time. The
administrator of a large data processing center may be interested in this.
In comparing design alternatives,
the phrase "X is faster than Y" is used here to mean that the response
time or execution time is lower on X than on Y.
In particular, "X is n times faster than Y" or "the throughput of X is n
times higher than Y" will mean

n = Execution time_Y / Execution time_X

Performance Measuring
Performance is the reciprocal of execution time:

Performance_X = 1 / Execution time_X

n = Execution time_Y / Execution time_X
  = (1 / Performance_Y) / (1 / Performance_X)
  = Performance_X / Performance_Y
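The "n times faster" relation above can be expressed directly. An illustrative sketch; the function name and example times are made up:

```python
# Sketch of the relation n = ExecTime_Y / ExecTime_X = Perf_X / Perf_Y.
def speedup(exec_time_x, exec_time_y):
    return exec_time_y / exec_time_x

# If Y takes 15 s and X takes 10 s on the same program,
# X is 1.5 times faster than Y.
print(speedup(10, 15))  # -> 1.5

# The same n falls out of the performance (reciprocal) form.
perf_x, perf_y = 1 / 10, 1 / 15
print(perf_x / perf_y)  # -> 1.5
```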

A Reliable Measure: User CPU Time

Response time may include disk access, memory access, input/output
activities, CPU events, and operating system overhead: everything.
In order to get an accurate measure of performance, we use CPU time instead
of response time.
CPU time is the time the CPU spends computing a program, and does not
include time spent waiting for I/O or running other programs.
CPU time can be further divided into user CPU time (the program) and
system CPU time (the OS).
In our performance measures, we use user CPU time because of its
independence from the OS and other factors.

Four Useful Principles of Computer Architecture Design

Take advantage of parallelism
One of the most important methods for improving performance.
System-level parallelism and individual-processor-level parallelism.
Principle of locality
A property of programs.
Temporal locality and spatial locality.
Focus on the common case
For power, resource allocation, and performance.
Amdahl's law
The performance improvement to be gained from using some faster
mode of execution is limited by the fraction of the time the faster mode
can be used.
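Amdahl's law as stated above can be sketched numerically. The function name and the 80% fraction are illustrative assumptions, not from the text:

```python
# Sketch of Amdahl's law: overall speedup is limited by the fraction of
# execution time that the faster mode can be applied to.
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    return 1 / ((1 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)

# Suppose the faster mode covers 80% of the execution time.
print(round(amdahl_speedup(0.8, 10), 2))   # -> 3.57
# Even with a near-infinite speedup on that 80%, the overall gain
# is capped at 1 / (1 - 0.8) = 5x by the untouched 20%.
print(round(amdahl_speedup(0.8, 1e9), 2))  # -> 5.0
```

The cap illustrates why "focus on the common case" and Amdahl's law go together: effort spent on a rarely used mode buys little overall speedup.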
