
Advanced Computer Architecture
Humaira Ashraf

Computer Architecture

Major components of a computer

Central Processing Unit (CPU)

memory

peripheral devices

What is a computer?

Simply put, a computer is a sophisticated electronic calculating machine that:

Accepts input information,

Processes the information according to a list of internally stored instructions and

Produces the resulting output information.

Functions performed by a computer are:

Accepting information to be processed as input.

Storing a list of instructions to process the information.

Processing the information according to the list of instructions.

Providing the results of the processing as output.

What are the functional units of a computer?



Functional units of a computer

Arithmetic and logic unit (ALU):
Performs the desired operations on the input information as determined by instructions in the memory.

Input unit accepts information from:
Human operators,
Electromechanical devices,
Other computers.

Output unit sends results of processing:
To a monitor display,
To a printer.

Memory stores information:
Instructions,
Data.

Control unit coordinates various actions:
Input,
Output,
Processing.

[Figure: the Input and Output (I/O) units connected to the Memory (holding instructions Instr1, Instr2, Instr3, ... and data Data1, Data2, ...) and to the Processor, which contains the Arithmetic & Logic unit and the Control unit.]

Information in a computer -- Instructions

Instructions specify commands to:

Transfer information within a computer (e.g., from memory to ALU)

Transfer information between the computer and I/O devices (e.g., from
keyboard to computer, or computer to printer)

Perform arithmetic and logic operations (e.g., Add two numbers, Perform
a logical AND).

A sequence of instructions to perform a task is called a program, which is stored in the memory.

Processor fetches instructions that make up a program from the memory and
performs the operations stated in those instructions.

What do the instructions operate upon?



Information in a computer -- Data

Data are the operands upon which instructions operate.

Data could be:

Numbers,

Encoded characters.

Data, in a broad sense, means any digital information.

Computers use data that is encoded as a string of binary digits called bits.
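The point above can be made concrete with a short Python sketch; the 8-bit width and the ASCII code used here are illustrative choices, not properties of any particular machine.

```python
# Sketch: encoding data as strings of bits. Numbers are encoded in
# binary; characters via a character code such as ASCII (assumed here).

def number_to_bits(n, width=8):
    """Encode a non-negative integer as a fixed-width bit string."""
    return format(n, "0{}b".format(width))

def char_to_bits(c):
    """Encode a character via its ASCII code point."""
    return number_to_bits(ord(c), width=8)

print(number_to_bits(9))    # 00001001
print(char_to_bits("A"))    # 01000001 (ASCII code 65)
```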

Input unit

Binary information must be presented to a computer in a specific format. This task is performed by the input unit:
- Interfaces with input devices.
- Accepts binary information from the input devices.
- Presents this binary information in a format expected by the computer.
- Transfers this information to the memory or processor.

[Figure: input devices in the real world (keyboard, audio input) feed the Input Unit, which passes information to the computer's Memory and Processor.]

Memory unit

Memory unit stores instructions and data.

Recall, data is represented as a series of bits.

To store data, the memory unit thus stores bits.

The processor reads instructions and reads/writes data from/to the memory during the execution of a program.

In theory, instructions and data could be fetched one bit at a time.

In practice, a group of bits is fetched at a time.

The group of bits stored or retrieved at a time is termed a word.

The number of bits in a word is termed the word length of a computer.

In order to read/write to and from memory, a processor should know where to look:

An address is associated with each word location.
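As a rough sketch of word-addressable memory (assuming a 32-bit word length, which, as noted above, is machine dependent), memory can be modeled as an array indexed by address:

```python
# Toy word-addressable memory: each address names one word, and the
# processor reads/writes whole words. The 32-bit word length is an
# assumption for illustration.

WORD_LENGTH_BITS = 32

class Memory:
    def __init__(self, num_words):
        self.words = [0] * num_words

    def read(self, address):
        return self.words[address]

    def write(self, address, value):
        # Keep only word-length bits, as a real word location would.
        self.words[address] = value & ((1 << WORD_LENGTH_BITS) - 1)

mem = Memory(num_words=16)
mem.write(3, 0xDEADBEEF)
print(hex(mem.read(3)))  # 0xdeadbeef
```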



Memory unit (contd..)

The processor reads/writes to/from memory based on the memory address:

It can access any word location in a short and fixed amount of time based on the address.

Random Access Memory (RAM) provides a fixed access time independent of the location of the word.

This access time is known as the Memory Access Time.

Memory and processor have to communicate with each other in order to read/write information.

In order to reduce communication time, a small amount of RAM (known as cache) is tightly coupled with the processor.

Modern computers have three to four levels of RAM units with different speeds and sizes:

The fastest, smallest is known as cache.

The slowest, largest is known as main memory.


Memory unit (contd..)

Primary storage of the computer consists of RAM units.

Fastest, smallest unit is Cache.

Slowest, largest unit is Main Memory.

Primary storage is insufficient to store large amounts of data and programs.

Primary storage can be added, but it is expensive.

Store large amounts of data on secondary storage devices:

Magnetic disks and tapes,

Optical disks (CD-ROMS).

Access to the data stored in secondary storage is slower, but this takes advantage of the fact that some information may be accessed infrequently.

The cost of a memory unit depends on its access time; a lower access time implies a higher cost.

Computer Architecture (continued)

Memory is organized as:

bit

byte = 8 bits (smallest addressable location)

word = 4 bytes (typically; machine dependent)

Instructions consist of operation codes and addresses. Common formats:

one-address: oprn | addr1
two-address: oprn | addr1 | addr2
three-address: oprn | addr1 | addr2 | addr3

Computer Architecture (continued)

Numeric data representations

integer (exact representation)
sign-magnitude

floating point (approximate representation)
scientific notation: 0.3481 x 10^6
inherently imprecise
IEEE Standard 754-1985: a word holds a sign, an exponent field (exp), and a significand field.
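As an illustration of the sign/exponent/significand layout, Python's `struct` module can expose the fields of a double-precision value (field widths below are those of the 64-bit IEEE 754 format):

```python
# Unpack a Python float (IEEE 754 double precision) into its sign,
# exponent, and significand fields.
import struct

def decompose(x):
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    sign = bits >> 63
    exponent = (bits >> 52) & 0x7FF          # 11-bit biased exponent
    significand = bits & ((1 << 52) - 1)     # 52-bit fraction field
    return sign, exponent, significand

sign, exp, frac = decompose(1.0)
print(sign, exp, frac)  # 0 1023 0  (the exponent bias is 1023 for doubles)
```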

Arithmetic and logic unit (ALU)

Operations are executed in the Arithmetic and Logic Unit (ALU).

Arithmetic operations such as addition, subtraction.

Logic operations such as comparison of numbers.

In order to execute an instruction, operands need to be brought into the ALU from the memory.

Operands are stored in general purpose registers available in the ALU.

Access times of general purpose registers are faster than the cache.

Results of the operations are stored back in the memory or retained in the
processor for immediate use.


Output unit
Computers represent information in a specific binary form. Output units:
- Interface with output devices.
- Accept processed results provided by the computer in specific binary form.
- Convert the information in binary form to a form understood by an
output device.

[Figure: the computer's Memory and Processor feed the Output Unit, which drives real-world devices such as a printer, graphics display, and speakers.]

Control unit

Operation of a computer can be summarized as:

Accepts information from the input units (Input unit).

Stores the information (Memory).

Processes the information (ALU).

Provides processed results through the output units (Output unit).

Operations of Input unit, Memory, ALU and Output unit are coordinated by
Control unit.

Instructions control what operations take place (e.g. data transfer, processing).

The control unit generates timing signals which determine when a particular operation takes place.

How are the functional units connected?


For a computer to achieve its operation, the functional units need to
communicate with each other.
In order to communicate, they need to be connected.

[Figure: the Input, Output, Memory, and Processor units connected by a bus.]

Functional units may be connected by a group of parallel wires.
The group of parallel wires is called a bus.
Each wire in a bus can transfer one bit of information.
The number of parallel wires in a bus is equal to the word length of the computer.

Organization of cache and main memory


[Figure: the Processor, Cache memory, and Main memory connected by a bus; the cache sits between the processor and main memory.]

Why is the access time of the cache memory less than the access time of the main memory?

Simple Machine Organization (continued)

The ALU does arithmetic and logical comparisons.

AC = accumulator; holds results
MQ = memory-quotient register; holds the second portion of long results
MBR = memory buffer register; holds data while an operation executes

Simple Machine Organization (continued)

Program control determines what the computer does based on the instruction read from memory.

MAR = memory address register; holds the address of the memory cell to be read
PC = program counter; holds the address of the next instruction to be read
IR = instruction register; holds the instruction being executed
IBR = instruction buffer register; holds the right half of an instruction read from memory

Simple Machine Organization (continued)

The machine operates on a fetch-execute cycle.

Fetch:
PC -> MAR
read M(MAR) into MBR
copy the left and right instructions into IR and IBR

Execute:
address part of IR -> MAR
read M(MAR) into MBR
execute the opcode
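The fetch-execute cycle above can be sketched as a toy interpreter. The instruction format, opcodes, and memory contents below are invented for illustration, and the sketch omits the IBR/paired-instruction detail:

```python
# Toy fetch-execute loop. Instructions and data share one memory,
# addressed by word. Opcodes (LOAD/ADD/HALT) are hypothetical.

memory = {0: ("LOAD", 10), 1: ("ADD", 11), 2: ("HALT", None),
          10: 5, 11: 7}

PC, AC = 0, 0  # program counter and accumulator
while True:
    MAR = PC                 # fetch: PC -> MAR
    MBR = memory[MAR]        # read M(MAR) into MBR
    IR = MBR                 # instruction into IR
    PC += 1
    opcode, addr = IR        # execute: decode, then perform the opcode
    if opcode == "LOAD":
        AC = memory[addr]
    elif opcode == "ADD":
        AC += memory[addr]
    elif opcode == "HALT":
        break
print(AC)  # 12
```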

Architecture Families

Before the mid-60s, every new machine had a different instruction set architecture:

programs from the previous generation didn't run on the new machine

the cost of replacing software became too large

The IBM System/360 created the family concept:

a single instruction set architecture

a wide range of price and performance with the same software

Performance improvements were based on different detailed implementations:

memory path width (1 byte to 8 bytes)

faster, more complex CPU designs

greater I/O throughput and overlap

Software compatibility is now a major issue:

partially offset by high-level language (HLL) software

Multiple Register Machines

Initially, machines had only a few registers:

2 to 8 or 16 were common

registers were more expensive than memory

Most instructions operated between memory locations:

results had to start from and end up in memory, so fewer instructions were needed

fewer instructions means smaller programs and (supposedly) faster execution, although the instructions were more complex

fewer instructions and data to move between memory and ALU

But registers are much faster than memory:

about 30 times faster

Multiple Register Machines (continued)

Also, many operands are reused within a short time:

it wastes time to load an operand again the next time it's needed

Depending on the mix of instructions and operand use, having many registers may lead to less traffic to memory and faster execution.

Most modern machines use a multiple register architecture:

the maximum number is about 512; a common configuration is 32 integer and 32 floating-point registers

Register Organization

Registers divide into user-visible registers and control and status registers.

User-Visible Registers

General purpose
Data
Address
Condition codes

General Purpose Registers

Data: accumulator
Addressing: segment pointers, index registers, stack pointer

Control & Status Registers

Program counter
Instruction decoding register
Memory address register
Memory buffer register

Example Register Org.

Instruction Cycle

The instruction cycle is the time in which a single instruction is fetched from memory, decoded, and executed.

An instruction cycle requires the following subcycles:

Fetch: read the next instruction from memory into the processor.

Indirect cycle (decode cycle): may require memory access to fetch operands, and therefore more memory accesses.

Interrupt: save the current instruction and service the interrupt.

Execute: interpret the opcode and perform the indicated operation.

Instruction Cycle

[Figure: instruction cycle state diagram — the Fetch cycle, with optional Indirect, Execute, and Interrupt cycles.]

Data Flow (Fetch Diagram)

[Figure: fetch data flow — the PC supplies the address to the MAR; memory returns the instruction through the MBR into the IR, under direction of the Control Unit.]

Data Flow (Indirect Diagram)

[Figure: indirect-cycle data flow — the address field of the MBR is sent to the MAR; memory returns the operand address through the MBR, under direction of the Control Unit.]

Data Flow (Execute)

May take many forms:

Depends on the instruction being executed.

May include:
Memory read/write
Input/Output
Register transfers
ALU operations

Data Flow (Interrupt Diagram)

[Figure: interrupt data flow — the current PC is saved to memory via the MBR and MAR, and the PC is loaded with the address of the interrupt routine, under direction of the Control Unit.]

Instruction Level Parallel Processors

Pipelining led to the evaluation of ILP and, as the degree of parallel execution increased, to the use of multiple pipelined execution units.

Thus superscalar ILP processors began to appear.

They were first implemented as VLIW architectures.

Instruction Pipelining

Instruction processing is subdivided:
- Fetch / Execute instruction

The pipeline has two independent stages:

1st stage: fetches an instruction and buffers it.

2nd stage: temporarily free until the first stage passes it the buffered instruction.

While the second stage is executing the instruction, the first stage fetches and buffers the next instruction.

This is instruction prefetch or fetch overlap.

- Purpose? To speed up instruction execution.

Two-Stage Instruction Pipeline

Instruction Processing

Instruction processing is divided into six stages:

Fetch instruction (FI)
Decode instruction (DI)
Calculate operands (CO)
Fetch operands (FO)
Execute instruction (EI)
Write operand (WO)

Successive instructions in a program sequence will overlap in execution.
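Assuming one cycle per stage and no stalls, the overlap reduces to a simple count: n instructions on a k-stage pipeline finish in k + (n - 1) cycles rather than k x n. A small sketch:

```python
# Cycle counts for an ideal k-stage pipeline (one cycle per stage,
# no stalls) versus unpipelined execution.

def pipelined_cycles(n, k):
    """First instruction takes k cycles; each later one finishes 1 cycle apart."""
    return k + (n - 1) if n > 0 else 0

def unpipelined_cycles(n, k):
    return k * n

print(pipelined_cycles(9, 6))    # 14 cycles for 9 instructions, 6 stages
print(unpipelined_cycles(9, 6))  # 54 cycles without pipelining
```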

Timing Diagram for Instruction Pipeline Operation

Six-Stage CPU Instruction Pipeline

The pipeline logic must account for branches, interrupts, and the problems arising from them.

Alternative Pipeline Depiction

RISC Pipeline

1. Instruction fetch
2. Instruction decode and register fetch
3. Execute
4. Memory access
5. Register write back

CPU performance can be improved by:

increasing clock rates (a CPU running at 2 GHz!)

increasing the number of instructions executed in parallel (executing 6 instructions at the same time)

How do we increase the number of instructions to be executed?

Time and space parallelism

Pipeline (assembly line)

VLIW (very long instruction word, 1024 bits!)

Superscalar (sequential stream of instructions)

From sequential instructions to parallel execution:

Dependencies between instructions

Instruction scheduling

Preserving sequential consistency

4.2 Dependencies between instructions

Instructions often depend on each other in such a way that a particular instruction
cannot be executed until a preceding instruction or even two or three preceding
instructions have been executed.

1 Data dependencies

2 Control dependencies

3 Resource dependencies

4.2.1 Data dependencies

Read after Write

Write after Read

Write after Write

Recurrences

Data dependencies in straight-line code (RAW)

RAW dependencies:

i1: load r1, a
i2: add r2, r1, r1

i2 reads r1, which i1 writes. RAW dependencies are also called flow dependencies.

Data dependencies in straight-line code (WAR)

WAR dependencies:

i1: mul r1, r2, r3
i2: add r2, r4, r5

i2 writes r2, which i1 reads. WAR dependencies are also called anti-dependencies.

They are false dependencies and can be eliminated through register renaming:

i1: mul r1, r2, r3
i2: add r6, r4, r5

by using the compiler or the ILP processor.

Data dependencies in straight-line code (WAW)

WAW dependencies:

i1: mul r1, r2, r3
i2: add r1, r4, r5

Both instructions write r1. WAW dependencies are also called output dependencies.

They are false dependencies and can be eliminated through register renaming:

i1: mul r1, r2, r3
i2: add r6, r4, r5

by using the compiler or the ILP processor.
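Register renaming can be sketched in a few lines. The scheme below (a fresh register name per write, starting from 8 assumed architectural registers) is a simplified illustration, not how any particular processor implements it:

```python
# Register-renaming sketch: WAR and WAW dependencies reuse a register
# *name*, not a value, so giving each new write a fresh register
# removes them. Instruction format: (op, dest, src1, src2).

def rename(instructions, num_arch_regs=8):
    mapping = {r: r for r in range(num_arch_regs)}  # arch reg -> current name
    next_free = num_arch_regs
    out = []
    for op, dest, s1, s2 in instructions:
        s1, s2 = mapping[s1], mapping[s2]   # read the current names first
        mapping[dest] = next_free           # fresh name for every write
        out.append((op, next_free, s1, s2))
        next_free += 1
    return out

# The WAW example above: both instructions write r1.
prog = [("mul", 1, 2, 3), ("add", 1, 4, 5)]
print(rename(prog))  # the two writes now target different registers
```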

Data dependencies in loops


for (int i = 2; i < 10; i++) {
    x[i] = a*x[i-1] + b;
}

This loop cannot be executed in parallel.
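The recurrence can be made concrete in Python (a, b, and the initial value of x[1] are illustrative constants):

```python
# The loop above carries a RAW dependency across iterations: each x[i]
# reads x[i-1], so iteration i must wait for iteration i-1. A loop like
# y[i] = a*z[i] + b has no such recurrence and could run in parallel.

a, b = 2, 1
x = [0.0] * 10
x[1] = 1.0
for i in range(2, 10):
    x[i] = a * x[i - 1] + b   # depends on the previous iteration's result

print(x[2], x[3])  # 3.0 7.0
```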

Data dependency graphs

i1: load r1, a
i2: load r2, b
i3: add r3, r1, r2
i4: mul r1, r2, r4
i5: div r1, r2, r4
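A data-dependency graph like the one above can be built by comparing each instruction's read and write sets. This is a sketch; instructions are represented as (name, writes, reads):

```python
# Build the edges of a data-dependency graph for straight-line code.
# Edges are labeled RAW, WAR, or WAW.

def dependency_edges(instrs):
    edges = []
    for j in range(len(instrs)):
        for i in range(j):
            ni, wi, ri = instrs[i]
            nj, wj, rj = instrs[j]
            if wi & rj: edges.append((ni, nj, "RAW"))  # j reads what i wrote
            if ri & wj: edges.append((ni, nj, "WAR"))  # j writes what i read
            if wi & wj: edges.append((ni, nj, "WAW"))  # both write the same reg
    return edges

prog = [("i1", {"r1"}, {"a"}),          # i1 writes r1
        ("i2", {"r2"}, {"b"}),          # i2 writes r2
        ("i3", {"r3"}, {"r1", "r2"})]   # i3 reads r1 and r2, writes r3
print(dependency_edges(prog))  # [('i1', 'i3', 'RAW'), ('i2', 'i3', 'RAW')]
```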

Control Dependency Graph

4.2.3 Resource dependencies

An instruction is resource-dependent on a previously issued instruction if it requires a hardware resource which is still being used by the previously issued instruction.

e.g.

div r1, r2, r3
div r4, r2, r5

4.3 Instruction scheduling

Scheduling or arranging two or more instructions to be executed in parallel.

Need to detect code dependency (detection)

Need to remove false dependency (resolution)

Two basic approaches

Static: done by compiler

Dynamic: done by processor

Instruction Scheduling: ILP-instruction scheduling

4.4 Preserving sequential consistency

Care must be taken to maintain the logical integrity of the program execution:

parallel execution must mimic sequential execution as far as the logical integrity of program execution is concerned.

e.g.

add r5, r6, r7
div r1, r2, r3
jz somewhere

Branches

Branch (or branching, branched) may also refer to the act of switching execution to a different instruction sequence as a result of executing a branch instruction.

Branches
Two Types of Branch Instructions

Conditional

A programming instruction that directs the computer to another part of the program based on the results of a compare. High-level language statements, such as IF THEN ELSE and CASE, are used to express the compare and conditional branch.

Branches
Unconditional

Unconditional branch instructions such as GOTO are used to unconditionally "jump" to (begin execution of) a different instruction sequence. Machine-level branch instructions are sometimes called jump instructions.

Conditional Branch Instructions


Condition codes:

BRP X: branch to location X if the result is positive
BRZ X: branch to location X if the result is zero
BRE R1, R2, X: branch to location X if the contents of R1 = R2


Dealing with Branches


A major problem in designing an instruction pipeline is assuring a steady flow of instructions to the initial stages of the pipeline.

Since conditional branches alter the steady flow of instructions, we must come up with ways to execute them efficiently.

Major types of branches

Branch: To transfer control

Branch examples

8.1.2 How to check the results of operations for specified conditions {branch} (e.g. equals 0, negative, and so on)

ISA = Instruction Set Architecture

Alternatives for checking the operation results

Result state approach: disadvantages

The generation of the result state is not straightforward.

It requires an irregular structure and occupies additional chip area.

The result state is a sequential concept: it cannot be applied without modification in architectures which have multiple execution units.

Retaining sequential consistency for condition checking (in VLIW or superscalar processors):

Use multiple sets of condition codes or flags.

This relies on the programmer or compiler to use different sets of condition codes or flags for the different outcomes generated by different EUs.

Use Direct Check approach.

Branch Statistics

About 20% of general-purpose code is branches:

on average, every fifth instruction is a branch.

About 5-10% of scientific code is branches.

The majority of branches are conditional (80%).

75-80% of all branches are taken.

Branch statistics: Taken or not Taken

8.1.4 The branch problem: the delay caused in pipelining

More branch problems

A conditional branch can cause an even longer penalty:

evaluation of the specified condition needs an extra cycle

waiting for an unresolved condition (the result is not yet ready), e.g. waiting for the result may take 10-50 cycles

In pipelines with more than 4 stages, each branch would result in a yet larger number of wasted cycles (called bubbles).

Interpretation of the concept of branch penalty

Zero-cycle branching {in no time}
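A back-of-the-envelope model ties the branch statistics quoted earlier to the penalty. The figures below are illustrative, not measurements of any real processor:

```python
# Average cycles per instruction (CPI) when each taken branch stalls
# the pipeline for a fixed number of bubble cycles.

def avg_cpi(branch_freq, taken_frac, penalty_cycles, base_cpi=1.0):
    """Ideal base CPI plus the expected per-instruction branch penalty."""
    return base_cpi + branch_freq * taken_frac * penalty_cycles

# 20% branches, 75% taken, 3 wasted cycles ("bubbles") per taken branch:
print(avg_cpi(0.20, 0.75, 3))  # 1.45
```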

8.2 Basic approaches to branch handling

Speculative vs. multiway branching

Delayed branching: occurrence of an unused instruction slot (unconditional branch)

Basic scheme of delayed branching

Early branch detection {for scalar processors}: integrated instruction fetch and branch detection

Detect the branch instruction during fetching.

Guess taken or not taken.

Fetch the next sequential instruction or the target instruction.

Handling of unresolved conditional branches

Blocking branch processing:

The branch is simply stalled (stopped and waited) until the specified condition can be resolved.

Basic kinds of branch prediction

The fixed prediction approach: always not taken vs. always taken

Always not taken: penalty figures

Penalty figures for the always taken prediction approach

Static branch prediction

Instruction Scheduling

Instruction scheduling is a compiler optimization used to improve instruction-level parallelism, which improves performance on machines with instruction pipelines. Put more simply, without changing the meaning of the code, it tries to avoid pipeline stalls by rearranging the order of instructions.
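A toy version of one such rearrangement check: an independent instruction can be moved into the slot after a load to hide a load-use stall. The register-set representation and the helper below are illustrative, not a production scheduler:

```python
# Can `candidate` be moved between a load and its consumer? Only if it
# neither disturbs the load's destination nor the consumer's operands.
# Instruction format: (text, writes, reads) with register-name sets.

def fills_load_slot(load, candidate, consumer):
    lw, lr = load[1], load[2]
    cw, cr = candidate[1], candidate[2]
    uw, ur = consumer[1], consumer[2]
    independent_of_load = not (lw & (cw | cr)) and not (cw & lr)
    independent_of_consumer = not (cw & (uw | ur)) and not (uw & cr)
    return independent_of_load and independent_of_consumer

load     = ("load r1, a",     {"r1"}, set())
consumer = ("add r2, r1, r1", {"r2"}, {"r1"})
other    = ("mul r5, r3, r4", {"r5"}, {"r3", "r4"})
print(fills_load_slot(load, other, consumer))  # True: move it into the slot
```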

Compiler Optimization

In computing, an optimizing compiler is a compiler that tries to minimize or maximize some attributes of an executable computer program. The most common requirement is to minimize the time taken to execute a program; a less common one is to minimize the amount of memory occupied. The growth of portable computers has created a market for minimizing the power consumed by a program. Compiler optimization is generally implemented using a sequence of optimizing transformations, algorithms which take a program and transform it to produce a semantically equivalent output program that uses fewer resources.

Motivation

Single cycling implementation


Pipeline Implementation


Pipeline Hazards


Instruction Order


Instruction Dependencies


Instruction Scheduling

Preserving Dependencies


Pipeline stall

A bubble or pipeline stall is a delay in the execution of an instruction in an instruction pipeline.

During the decoding stage, the control unit will determine if the decoded instruction reads from a register that the instruction currently in the execution stage writes to. If this condition holds, the control unit will stall the instruction by one clock cycle. It also stalls the instruction in the fetch stage, to prevent the instruction in that stage from being overwritten by the next instruction in the program.
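The decode-stage check described above can be sketched directly. Instructions are represented here as (writes, reads) register sets, and the helper assumes a single execute stage:

```python
# Stall if the instruction being decoded reads a register that the
# instruction in the execute stage is about to write (a RAW hazard).

def must_stall(decoding, executing):
    """Each instruction is a (writes, reads) pair of register-name sets."""
    if executing is None:         # execute stage is empty: no hazard
        return False
    return bool(executing[0] & decoding[1])

executing = ({"r1"}, {"a"})   # load r1, a   (writes r1)
decoding  = ({"r2"}, {"r1"})  # add r2, r1, r1  (reads r1)
print(must_stall(decoding, executing))  # True -> insert a bubble
```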

Dynamic Scheduling
