
Advanced Computer Architecture
Humaira Ashraf

Computer Architecture

Major components of a computer

Central Processing Unit (CPU)

memory

peripheral devices

What is a computer?

Simply put, a computer is a sophisticated electronic calculating machine that:

Accepts input information,

Processes the information according to a list of internally stored instructions and

Produces the resulting output information.

Functions performed by a computer are:

Accepting information to be processed as input.

Storing a list of instructions to process the information.

Processing the information according to the list of instructions.

Providing the results of the processing as output.

What are the functional units of a computer?



Functional units of a computer

Arithmetic and logic unit (ALU):
Performs the desired operations on the input information as determined by instructions in the memory.

Input unit accepts information from:
Human operators,
Electromechanical devices,
Other computers.

Output unit sends results of processing:
To a monitor display,
To a printer.

Memory stores information:
Instructions,
Data.

Control unit coordinates various actions:
Input,
Output,
Processing.

[Figure: the Input and Output (I/O) units connected to the Memory (holding instructions Instr1, Instr2, Instr3, ... and data Data1, Data2, ...) and to the Processor, which contains the Arithmetic & Logic unit and the Control unit.]

Information in a computer -- Instructions

Instructions specify commands to:

Transfer information within a computer (e.g., from memory to ALU)

Transfer information between the computer and I/O devices (e.g., from
keyboard to computer, or computer to printer)

Perform arithmetic and logic operations (e.g., Add two numbers, Perform
a logical AND).

A sequence of instructions to perform a task is called a program, which is stored in the memory.

Processor fetches instructions that make up a program from the memory and
performs the operations stated in those instructions.

What do the instructions operate upon?



Information in a computer -- Data

Data are the operands upon which instructions operate.

Data could be:

Numbers,

Encoded characters.

Data, in a broad sense, means any digital information.

Computers use data that is encoded as a string of binary digits called bits.
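The point above can be made concrete with a short Python sketch; the 8-bit width and the ASCII code used here are illustrative choices, not properties of any particular machine.

```python
# Sketch: encoding data as strings of bits. Numbers are encoded in
# binary; characters via a character code such as ASCII (assumed here).

def number_to_bits(n, width=8):
    """Encode a non-negative integer as a fixed-width bit string."""
    return format(n, "0{}b".format(width))

def char_to_bits(c):
    """Encode a character via its ASCII code point."""
    return number_to_bits(ord(c), width=8)

print(number_to_bits(9))    # 00001001
print(char_to_bits("A"))    # 01000001 (ASCII code 65)
```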

Input unit

Binary information must be presented to a computer in a specific format. This task is performed by the input unit:
- Interfaces with input devices.
- Accepts binary information from the input devices.
- Presents this binary information in a format expected by the computer.
- Transfers this information to the memory or processor.

[Figure: input devices in the real world (keyboard, audio input) feed the Input Unit, which passes information to the computer's Memory and Processor.]

Memory unit

Memory unit stores instructions and data.

Recall, data is represented as a series of bits.

To store data, the memory unit thus stores bits.

The processor reads instructions and reads/writes data from/to the memory during the execution of a program.

In theory, instructions and data could be fetched one bit at a time.

In practice, a group of bits is fetched at a time.

The group of bits stored or retrieved at a time is termed a word.

The number of bits in a word is termed the word length of a computer.

In order to read/write to and from memory, a processor should know where to look:

An address is associated with each word location.
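As a rough sketch of word-addressable memory (assuming a 32-bit word length, which, as noted above, is machine dependent), memory can be modeled as an array indexed by address:

```python
# Toy word-addressable memory: each address names one word, and the
# processor reads/writes whole words. The 32-bit word length is an
# assumption for illustration.

WORD_LENGTH_BITS = 32

class Memory:
    def __init__(self, num_words):
        self.words = [0] * num_words

    def read(self, address):
        return self.words[address]

    def write(self, address, value):
        # Keep only word-length bits, as a real word location would.
        self.words[address] = value & ((1 << WORD_LENGTH_BITS) - 1)

mem = Memory(num_words=16)
mem.write(3, 0xDEADBEEF)
print(hex(mem.read(3)))  # 0xdeadbeef
```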



Memory unit (contd..)

The processor reads/writes to/from memory based on the memory address:

It can access any word location in a short and fixed amount of time based on the address.

Random Access Memory (RAM) provides a fixed access time independent of the location of the word.

This access time is known as the Memory Access Time.

Memory and processor have to communicate with each other in order to read/write information.

In order to reduce communication time, a small amount of RAM (known as cache) is tightly coupled with the processor.

Modern computers have three to four levels of RAM units with different speeds and sizes:

The fastest, smallest is known as cache.

The slowest, largest is known as main memory.


Memory unit (contd..)

Primary storage of the computer consists of RAM units.

Fastest, smallest unit is Cache.

Slowest, largest unit is Main Memory.

Primary storage is insufficient to store large amounts of data and programs.

Primary storage can be added, but it is expensive.

Store large amounts of data on secondary storage devices:

Magnetic disks and tapes,

Optical disks (CD-ROMS).

Access to the data stored in secondary storage is slower, but this takes advantage of the fact that some information may be accessed infrequently.

The cost of a memory unit depends on its access time; a lower access time implies a higher cost.

Computer Architecture (continued)

Memory is organized as:

bit

byte = 8 bits (smallest addressable location)

word = 4 bytes (typically; machine dependent)

Instructions consist of operation codes and addresses. Common formats:

one-address: oprn | addr1
two-address: oprn | addr1 | addr2
three-address: oprn | addr1 | addr2 | addr3

Computer Architecture (continued)

Numeric data representations

integer (exact representation)
sign-magnitude

floating point (approximate representation)
scientific notation: 0.3481 x 10^6
inherently imprecise
IEEE Standard 754-1985: a word holds a sign, an exponent field (exp), and a significand field.
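As an illustration of the sign/exponent/significand layout, Python's `struct` module can expose the fields of a double-precision value (field widths below are those of the 64-bit IEEE 754 format):

```python
# Unpack a Python float (IEEE 754 double precision) into its sign,
# exponent, and significand fields.
import struct

def decompose(x):
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    sign = bits >> 63
    exponent = (bits >> 52) & 0x7FF          # 11-bit biased exponent
    significand = bits & ((1 << 52) - 1)     # 52-bit fraction field
    return sign, exponent, significand

sign, exp, frac = decompose(1.0)
print(sign, exp, frac)  # 0 1023 0  (the exponent bias is 1023 for doubles)
```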

Arithmetic and logic unit (ALU)

Operations are executed in the Arithmetic and Logic Unit (ALU).

Arithmetic operations such as addition, subtraction.

Logic operations such as comparison of numbers.

In order to execute an instruction, operands need to be brought into the ALU from the memory.

Operands are stored in general purpose registers available in the ALU.

Access times of general purpose registers are faster than the cache.

Results of the operations are stored back in the memory or retained in the
processor for immediate use.


Output unit
Computers represent information in a specific binary form. Output units:
- Interface with output devices.
- Accept processed results provided by the computer in specific binary form.
- Convert the information in binary form to a form understood by an
output device.

[Figure: the computer's Memory and Processor feed the Output Unit, which drives real-world devices such as a printer, graphics display, and speakers.]

Control unit

Operation of a computer can be summarized as:

Accepts information from the input units (Input unit).

Stores the information (Memory).

Processes the information (ALU).

Provides processed results through the output units (Output unit).

Operations of Input unit, Memory, ALU and Output unit are coordinated by
Control unit.

Instructions control what operations take place (e.g. data transfer, processing).

The control unit generates timing signals which determine when a particular operation takes place.

How are the functional units connected?


For a computer to achieve its operation, the functional units need to
communicate with each other.
In order to communicate, they need to be connected.

[Figure: the Input, Output, Memory, and Processor units connected by a bus.]

Functional units may be connected by a group of parallel wires.
The group of parallel wires is called a bus.
Each wire in a bus can transfer one bit of information.
The number of parallel wires in a bus is equal to the word length of the computer.

Organization of cache and main memory


[Figure: the Processor, Cache memory, and Main memory connected by a bus; the cache sits between the processor and main memory.]

Why is the access time of the cache memory less than the access time of the main memory?

Simple Machine Organization (continued)

The ALU does arithmetic and logical comparisons.

AC = accumulator; holds results
MQ = memory-quotient register; holds the second portion of long results
MBR = memory buffer register; holds data while an operation executes

Simple Machine Organization (continued)

Program control determines what the computer does based on the instruction read from memory.

MAR = memory address register; holds the address of the memory cell to be read
PC = program counter; holds the address of the next instruction to be read
IR = instruction register; holds the instruction being executed
IBR = instruction buffer register; holds the right half of an instruction read from memory

Simple Machine Organization (continued)

The machine operates on a fetch-execute cycle.

Fetch:
PC -> MAR
read M(MAR) into MBR
copy the left and right instructions into IR and IBR

Execute:
address part of IR -> MAR
read M(MAR) into MBR
execute the opcode
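The fetch-execute cycle above can be sketched as a toy interpreter. The instruction format, opcodes, and memory contents below are invented for illustration, and the sketch omits the IBR/paired-instruction detail:

```python
# Toy fetch-execute loop. Instructions and data share one memory,
# addressed by word. Opcodes (LOAD/ADD/HALT) are hypothetical.

memory = {0: ("LOAD", 10), 1: ("ADD", 11), 2: ("HALT", None),
          10: 5, 11: 7}

PC, AC = 0, 0  # program counter and accumulator
while True:
    MAR = PC                 # fetch: PC -> MAR
    MBR = memory[MAR]        # read M(MAR) into MBR
    IR = MBR                 # instruction into IR
    PC += 1
    opcode, addr = IR        # execute: decode, then perform the opcode
    if opcode == "LOAD":
        AC = memory[addr]
    elif opcode == "ADD":
        AC += memory[addr]
    elif opcode == "HALT":
        break
print(AC)  # 12
```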

Architecture Families

Before the mid-60s, every new machine had a different instruction set architecture:

programs from the previous generation didn't run on the new machine

the cost of replacing software became too large

The IBM System/360 created the family concept:

a single instruction set architecture

a wide range of price and performance with the same software

Performance improvements were based on different detailed implementations:

memory path width (1 byte to 8 bytes)

faster, more complex CPU designs

greater I/O throughput and overlap

Software compatibility is now a major issue:

partially offset by high-level language (HLL) software

Multiple Register Machines

Initially, machines had only a few registers:

2 to 8 or 16 were common

registers were more expensive than memory

Most instructions operated between memory locations:

results had to start from and end up in memory, so fewer instructions were needed

fewer instructions means smaller programs and (supposedly) faster execution, although the instructions were more complex

fewer instructions and data to move between memory and ALU

But registers are much faster than memory:

about 30 times faster

Multiple Register Machines (continued)

Also, many operands are reused within a short time:

it wastes time to load an operand again the next time it's needed

Depending on the mix of instructions and operand use, having many registers may lead to less traffic to memory and faster execution.

Most modern machines use a multiple register architecture:

the maximum number is about 512; a common configuration is 32 integer and 32 floating-point registers

Register Organization

Registers divide into user-visible registers and control and status registers.

User-Visible Registers

General purpose
Data
Address
Condition codes

General Purpose Registers

Data: accumulator
Addressing: segment pointers, index registers, stack pointer

Control & Status Registers

Program counter
Instruction decoding register
Memory address register
Memory buffer register

Example Register Org.

Instruction Cycle

The instruction cycle is the time in which a single instruction is fetched from memory, decoded, and executed.

An instruction cycle requires the following subcycles:

Fetch: read the next instruction from memory into the processor.

Indirect cycle (decode cycle): may require memory access to fetch operands, and therefore more memory accesses.

Interrupt: save the current instruction and service the interrupt.

Execute: interpret the opcode and perform the indicated operation.

Instruction Cycle

[Figure: instruction cycle state diagram — the Fetch cycle, with optional Indirect, Execute, and Interrupt cycles.]

Data Flow (Fetch Diagram)

[Figure: fetch data flow — the PC supplies the address to the MAR; memory returns the instruction through the MBR into the IR, under direction of the Control Unit.]

Data Flow (Indirect Diagram)

[Figure: indirect-cycle data flow — the address field of the MBR is sent to the MAR; memory returns the operand address through the MBR, under direction of the Control Unit.]

Data Flow (Execute)

May take many forms:

Depends on the instruction being executed.

May include:
Memory read/write
Input/Output
Register transfers
ALU operations

Data Flow (Interrupt Diagram)

[Figure: interrupt data flow — the current PC is saved to memory via the MBR and MAR, and the PC is loaded with the address of the interrupt routine, under direction of the Control Unit.]

Instruction Level Parallel Processors

Pipelining led to the evaluation of ILP and, as the degree of parallel execution increased, to the use of multiple pipelined execution units.

Thus superscalar ILP processors began to appear.

They were first implemented as VLIW architectures.

Instruction Pipelining

Instruction processing is subdivided:
- Fetch / Execute instruction

The pipeline has two independent stages:

1st stage: fetches an instruction and buffers it.

2nd stage: temporarily free until the first stage passes it the buffered instruction.

While the second stage is executing the instruction, the first stage fetches and buffers the next instruction.

This is instruction prefetch or fetch overlap.

- Purpose? To speed up instruction execution.

Two-Stage Instruction Pipeline

Instruction Processing

Instruction processing is divided into six stages:

Fetch instruction (FI)
Decode instruction (DI)
Calculate operands (CO)
Fetch operands (FO)
Execute instruction (EI)
Write operand (WO)

Successive instructions in a program sequence will overlap in execution.
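Assuming one cycle per stage and no stalls, the overlap reduces to a simple count: n instructions on a k-stage pipeline finish in k + (n - 1) cycles rather than k x n. A small sketch:

```python
# Cycle counts for an ideal k-stage pipeline (one cycle per stage,
# no stalls) versus unpipelined execution.

def pipelined_cycles(n, k):
    """First instruction takes k cycles; each later one finishes 1 cycle apart."""
    return k + (n - 1) if n > 0 else 0

def unpipelined_cycles(n, k):
    return k * n

print(pipelined_cycles(9, 6))    # 14 cycles for 9 instructions, 6 stages
print(unpipelined_cycles(9, 6))  # 54 cycles without pipelining
```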

Timing Diagram for Instruction Pipeline Operation

Six-Stage CPU Instruction Pipeline

The pipeline logic must account for branches, interrupts, and the problems arising from them.

Alternative Pipeline Depiction

RISC Pipeline

1. Instruction fetch
2. Instruction decode and register fetch
3. Execute
4. Memory access
5. Register write back

CPU performance can be improved by:

increasing clock rates (a CPU running at 2 GHz!)

increasing the number of instructions executed in parallel (executing 6 instructions at the same time)

How do we increase the number of instructions to be executed?

Time and space parallelism

Pipeline (assembly line)

VLIW (very long instruction word, 1024 bits!)

Superscalar (sequential stream of instructions)

From sequential instructions to parallel execution:

Dependencies between instructions

Instruction scheduling

Preserving sequential consistency

4.2 Dependencies between instructions

Instructions often depend on each other in such a way that a particular instruction
cannot be executed until a preceding instruction or even two or three preceding
instructions have been executed.

1 Data dependencies

2 Control dependencies

3 Resource dependencies

4.2.1 Data dependencies

Read after Write

Write after Read

Write after Write

Recurrences

Data dependencies in straight-line code (RAW)

RAW dependencies:

i1: load r1, a
i2: add r2, r1, r1

i2 reads r1, which i1 writes. RAW dependencies are also called flow dependencies.

Data dependencies in straight-line code (WAR)

WAR dependencies:

i1: mul r1, r2, r3
i2: add r2, r4, r5

i2 writes r2, which i1 reads. WAR dependencies are also called anti-dependencies.

They are false dependencies and can be eliminated through register renaming:

i1: mul r1, r2, r3
i2: add r6, r4, r5

by using the compiler or the ILP processor.

Data dependencies in straight-line code (WAW)

WAW dependencies:

i1: mul r1, r2, r3
i2: add r1, r4, r5

Both instructions write r1. WAW dependencies are also called output dependencies.

They are false dependencies and can be eliminated through register renaming:

i1: mul r1, r2, r3
i2: add r6, r4, r5

by using the compiler or the ILP processor.
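Register renaming can be sketched in a few lines. The scheme below (a fresh register name per write, starting from 8 assumed architectural registers) is a simplified illustration, not how any particular processor implements it:

```python
# Register-renaming sketch: WAR and WAW dependencies reuse a register
# *name*, not a value, so giving each new write a fresh register
# removes them. Instruction format: (op, dest, src1, src2).

def rename(instructions, num_arch_regs=8):
    mapping = {r: r for r in range(num_arch_regs)}  # arch reg -> current name
    next_free = num_arch_regs
    out = []
    for op, dest, s1, s2 in instructions:
        s1, s2 = mapping[s1], mapping[s2]   # read the current names first
        mapping[dest] = next_free           # fresh name for every write
        out.append((op, next_free, s1, s2))
        next_free += 1
    return out

# The WAW example above: both instructions write r1.
prog = [("mul", 1, 2, 3), ("add", 1, 4, 5)]
print(rename(prog))  # the two writes now target different registers
```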

Data dependencies in loops


for (int i = 2; i < 10; i++) {
    x[i] = a*x[i-1] + b;
}

This loop cannot be executed in parallel.
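The recurrence can be made concrete in Python (a, b, and the initial value of x[1] are illustrative constants):

```python
# The loop above carries a RAW dependency across iterations: each x[i]
# reads x[i-1], so iteration i must wait for iteration i-1. A loop like
# y[i] = a*z[i] + b has no such recurrence and could run in parallel.

a, b = 2, 1
x = [0.0] * 10
x[1] = 1.0
for i in range(2, 10):
    x[i] = a * x[i - 1] + b   # depends on the previous iteration's result

print(x[2], x[3])  # 3.0 7.0
```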

Data dependency graphs

i1: load r1, a
i2: load r2, b
i3: add r3, r1, r2
i4: mul r1, r2, r4
i5: div r1, r2, r4
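A data-dependency graph like the one above can be built by comparing each instruction's read and write sets. This is a sketch; instructions are represented as (name, writes, reads):

```python
# Build the edges of a data-dependency graph for straight-line code.
# Edges are labeled RAW, WAR, or WAW.

def dependency_edges(instrs):
    edges = []
    for j in range(len(instrs)):
        for i in range(j):
            ni, wi, ri = instrs[i]
            nj, wj, rj = instrs[j]
            if wi & rj: edges.append((ni, nj, "RAW"))  # j reads what i wrote
            if ri & wj: edges.append((ni, nj, "WAR"))  # j writes what i read
            if wi & wj: edges.append((ni, nj, "WAW"))  # both write the same reg
    return edges

prog = [("i1", {"r1"}, {"a"}),          # i1 writes r1
        ("i2", {"r2"}, {"b"}),          # i2 writes r2
        ("i3", {"r3"}, {"r1", "r2"})]   # i3 reads r1 and r2, writes r3
print(dependency_edges(prog))  # [('i1', 'i3', 'RAW'), ('i2', 'i3', 'RAW')]
```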

Control Dependency Graph

4.2.3 Resource dependencies

An instruction is resource-dependent on a previously issued instruction if it requires a hardware resource which is still being used by the previously issued instruction.

e.g.

div r1, r2, r3
div r4, r2, r5

4.3 Instruction scheduling

Scheduling or arranging two or more instructions to be executed in parallel.

Need to detect code dependency (detection)

Need to remove false dependency (resolution)

Two basic approaches

Static: done by compiler

Dynamic: done by processor

Instruction Scheduling: ILP-instruction scheduling

4.4 Preserving sequential consistency

Care must be taken to maintain the logical integrity of the program execution:

parallel execution must mimic sequential execution as far as the logical integrity of program execution is concerned.

e.g.

add r5, r6, r7
div r1, r2, r3
jz somewhere

Branches

Branch (or branching, branched) may also refer to the act of switching execution to a different instruction sequence as a result of executing a branch instruction.

Branches
Two Types of Branch Instructions

Conditional

A programming instruction that directs the computer to another part of the program based on the results of a compare. High-level language statements, such as IF THEN ELSE and CASE, are used to express the compare and conditional branch.

Branches
Unconditional

Unconditional branch instructions such as GOTO are used to unconditionally "jump" to (begin execution of) a different instruction sequence. Machine-level branch instructions are sometimes called jump instructions.

Conditional Branch Instructions


Condition codes:

BRP X: branch to location X if the result is positive
BRZ X: branch to location X if the result is zero
BRE R1, R2, X: branch to location X if the contents of R1 = R2


Dealing with Branches


A major problem in designing an instruction pipeline is assuring a steady flow of instructions to the initial stages of the pipeline.

Since conditional branches alter the steady flow of instructions, we must come up with ways to execute them efficiently.

Major types of branches

Branch: To transfer control

Branch examples

8.1.2 How to check the results of operations for specified conditions {branch} (e.g. equals 0, negative, and so on)

ISA = Instruction Set Architecture

Alternatives for checking the operation results

Result state approach: disadvantages

The generation of the result state is not straightforward.

It requires an irregular structure and occupies additional chip area.

The result state is a sequential concept: it cannot be applied without modification in architectures which have multiple execution units.

Retaining sequential consistency for condition checking (in VLIW or superscalar processors):

Use multiple sets of condition codes or flags.

This relies on the programmer or compiler to use different sets of condition codes or flags for the different outcomes generated by different EUs.

Use Direct Check approach.

Branch Statistics

About 20% of general-purpose code is branches:

on average, every fifth instruction is a branch.

About 5-10% of scientific code is branches.

The majority of branches are conditional (80%).

75-80% of all branches are taken.

Branch statistics: Taken or not Taken

8.1.4 The branch problem: the delay caused in pipelining

More branch problems

A conditional branch can cause an even longer penalty:

evaluation of the specified condition needs an extra cycle

waiting for an unresolved condition (the result is not yet ready), e.g. waiting for the result may take 10-50 cycles

In pipelines with more than 4 stages, each branch would result in a yet larger number of wasted cycles (called bubbles).

Interpretation of the concept of branch penalty

Zero-cycle branching {in no time}
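A back-of-the-envelope model ties the branch statistics quoted earlier to the penalty. The figures below are illustrative, not measurements of any real processor:

```python
# Average cycles per instruction (CPI) when each taken branch stalls
# the pipeline for a fixed number of bubble cycles.

def avg_cpi(branch_freq, taken_frac, penalty_cycles, base_cpi=1.0):
    """Ideal base CPI plus the expected per-instruction branch penalty."""
    return base_cpi + branch_freq * taken_frac * penalty_cycles

# 20% branches, 75% taken, 3 wasted cycles ("bubbles") per taken branch:
print(avg_cpi(0.20, 0.75, 3))  # 1.45
```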

8.2 Basic approaches to branch handling

Speculative vs. multiway branching

Delayed branching: occurrence of an unused instruction slot (unconditional branch)

Basic scheme of delayed branching

Early branch detection {for scalar processors}: integrated instruction fetch and branch detection

Detect the branch instruction during fetching.

Guess taken or not taken.

Fetch the next sequential instruction or the target instruction.

Handling of unresolved conditional branches

Blocking branch processing:

The branch is simply stalled (stopped and waited) until the specified condition can be resolved.

Basic kinds of branch prediction

The fixed prediction approach: always not taken vs. always taken

Always not taken: penalty figures

Penalty figures for the always taken prediction approach

Static branch prediction

Instruction Scheduling

Instruction scheduling is a compiler optimization used to improve instruction-level parallelism, which improves performance on machines with instruction pipelines. Put more simply, without changing the meaning of the code, it tries to avoid pipeline stalls by rearranging the order of instructions.
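A toy version of one such rearrangement check: an independent instruction can be moved into the slot after a load to hide a load-use stall. The register-set representation and the helper below are illustrative, not a production scheduler:

```python
# Can `candidate` be moved between a load and its consumer? Only if it
# neither disturbs the load's destination nor the consumer's operands.
# Instruction format: (text, writes, reads) with register-name sets.

def fills_load_slot(load, candidate, consumer):
    lw, lr = load[1], load[2]
    cw, cr = candidate[1], candidate[2]
    uw, ur = consumer[1], consumer[2]
    independent_of_load = not (lw & (cw | cr)) and not (cw & lr)
    independent_of_consumer = not (cw & (uw | ur)) and not (uw & cr)
    return independent_of_load and independent_of_consumer

load     = ("load r1, a",     {"r1"}, set())
consumer = ("add r2, r1, r1", {"r2"}, {"r1"})
other    = ("mul r5, r3, r4", {"r5"}, {"r3", "r4"})
print(fills_load_slot(load, other, consumer))  # True: move it into the slot
```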

Compiler Optimization

In computing, an optimizing compiler is a compiler that tries to minimize or maximize some attributes of an executable computer program. The most common requirement is to minimize the time taken to execute a program; a less common one is to minimize the amount of memory occupied. The growth of portable computers has created a market for minimizing the power consumed by a program. Compiler optimization is generally implemented using a sequence of optimizing transformations, algorithms which take a program and transform it to produce a semantically equivalent output program that uses fewer resources.

Motivation

Single cycling implementation


Pipeline Implementation


Pipeline Hazards


Instruction Order


Instruction Dependencies


Instruction Scheduling

Preserving Dependencies


Pipeline stall

A bubble or pipeline stall is a delay in the execution of an instruction in an instruction pipeline.

During the decoding stage, the control unit will determine if the decoded instruction reads from a register that the instruction currently in the execution stage writes to. If this condition holds, the control unit will stall the instruction by one clock cycle. It also stalls the instruction in the fetch stage, to prevent the instruction in that stage from being overwritten by the next instruction in the program.
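The decode-stage check described above can be sketched directly. Instructions are represented here as (writes, reads) register sets, and the helper assumes a single execute stage:

```python
# Stall if the instruction being decoded reads a register that the
# instruction in the execute stage is about to write (a RAW hazard).

def must_stall(decoding, executing):
    """Each instruction is a (writes, reads) pair of register-name sets."""
    if executing is None:         # execute stage is empty: no hazard
        return False
    return bool(executing[0] & decoding[1])

executing = ({"r1"}, {"a"})   # load r1, a   (writes r1)
decoding  = ({"r2"}, {"r1"})  # add r2, r1, r1  (reads r1)
print(must_stall(decoding, executing))  # True -> insert a bubble
```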

Dynamic Scheduling
