
Computer Organization

Douglas Comer
Computer Science Department
Purdue University
250 N. University Street
West Lafayette, IN 47907-2066
http://www.cs.purdue.edu/people/comer
Copyright 2006. All rights reserved. This document may not
be reproduced by any means without written consent of the author.

XVII
Parallelism


Two Fundamental Hardware Techniques Used To Increase Performance

• Parallelism
• Pipelining


Parallelism
• Multiple copies of hardware unit used
• All copies can operate simultaneously
• Occurs at many levels of architecture
• Term parallel computer applied when parallelism dominates entire architecture


Characterizations Of Parallelism
• Microscopic vs. macroscopic
• Symmetric vs. asymmetric
• Fine-grain vs. coarse-grain
• Explicit vs. implicit


Microscopic Vs. Macroscopic Parallelism

Parallelism is so fundamental that virtually all computer systems contain some form of parallel hardware. We use the term microscopic parallelism to characterize parallel facilities that are present, but not especially visible.


Examples Of Microscopic Parallelism


• Parallel operations in an ALU
• Parallel access to general-purpose registers
• Parallel data transfer to/from physical memory
• Parallel transfer across an I/O bus


Examples Of Macroscopic Parallelism


• Symmetric parallelism
    – Refers to multiple, identical processors
    – Example: dual processor PC
• Asymmetric parallelism
    – Refers to multiple, dissimilar processors
    – Example: PC with a graphics processor


Level Of Parallelism
• Fine-grain parallelism
    – Parallelism among individual instructions or data elements
• Coarse-grain parallelism
    – Parallelism among programs or large blocks of data


Explicit And Implicit Parallelism


• Explicit
    – Visible to programmer
    – Requires programmer to initiate and control parallel activities
• Implicit
    – Invisible to programmer
    – Hardware runs multiple copies of program automatically


Parallel Architectures
• Design in which computer has reasonably large number of processors
• Intended for scaling
• Example: computer with thirty-two processors
• Not generally classified as parallel computers:
    – Dual processor computer
    – Quad processor computer


Types Of Parallel Architectures

    Name    Meaning
    ----    -------------------------------------------
    SISD    Single Instruction Single Data stream
    SIMD    Single Instruction Multiple Data streams
    MIMD    Multiple Instructions Multiple Data streams

• Known as Flynn classification


Conventional (Nonparallel) Architecture


• Known as Single Instruction Single Data
• Other terms include
    – Sequential architecture
    – Uniprocessor


Single Instruction Multiple Data (SIMD)

• Each instruction specifies a single operation
• Hardware applies operation to multiple data items


Vector Processor
• Uses SIMD architecture
• Applies a single floating point operation to an entire array of values
• Example use: normalize values in a set


Normalization On A Conventional Computer


for i from 1 to N {
    V[i] ← V[i] × Q;
}


Normalization On A Vector Processor


V ← V × Q;

• Trivial amount of code
• Special instruction called a vector instruction
• If vector V is larger than hardware capacity, multiple steps are required (see the sketch below)
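To make this concrete, the following is a minimal C sketch of SIMD-style normalization using x86 SSE intrinsics, which apply one multiply instruction to four floating point values at a time. The function name and the assumption that n is a multiple of four are illustrative, not part of the original material.

#include <xmmintrin.h>   /* x86 SSE intrinsics: one instruction works on 4 floats */

/* Illustrative helper: scale every element of v by q.
 * Assumes n is a multiple of 4; a full version would finish
 * any leftover elements with an ordinary scalar loop. */
void normalize(float *v, int n, float q)
{
    __m128 qv = _mm_set1_ps(q);              /* broadcast q into all four lanes */
    for (int i = 0; i < n; i += 4) {
        __m128 x = _mm_loadu_ps(&v[i]);      /* load four values               */
        x = _mm_mul_ps(x, qv);               /* four multiplications at once   */
        _mm_storeu_ps(&v[i], x);             /* store four results             */
    }
}

When the array is longer than the hardware vector width, the loop simply issues more vector instructions, which is the "multiple steps" case noted above.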


Graphics Processors
• Graphics hardware uses sequential bytes in memory to store pixels
• To move a window, software copies bytes
• SIMD architecture allows copies in parallel (sketch below)
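A rough sketch of the copy described above, assuming a simple framebuffer with one byte per pixel, pitch bytes per scan line, and non-overlapping source and destination rectangles; all of these assumptions are illustrative. SIMD hardware can move many bytes of each row in a single operation instead of one byte at a time.

#include <stdint.h>
#include <string.h>

/* Illustrative helper: copy a width-by-height window of pixels
 * from (src_x, src_y) to (dst_x, dst_y) in a linear framebuffer. */
void move_window(uint8_t *fb, int pitch,
                 int src_x, int src_y,
                 int dst_x, int dst_y,
                 int width, int height)
{
    for (int row = 0; row < height; row++) {
        memcpy(&fb[(dst_y + row) * pitch + dst_x],   /* destination row */
               &fb[(src_y + row) * pitch + src_x],   /* source row      */
               (size_t)width);                       /* width bytes     */
    }
}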


Multiple Instructions Multiple Data (MIMD)

• Parallel architecture with separate processors
• Each processor runs independent program
• Processors visible to programmer


Two Popular Categories Of Multiprocessors


• Symmetric
• Asymmetric


Symmetric Multiprocessor (SMP)


• Most well-known MIMD architecture
• Set of N identical processors
• Examples of groups that built SMP computers
    – Carnegie Mellon University (C.mmp)
    – Sequent Corporation (Balance 8000 and 21000)
    – Encore Corporation (Multimax)


Illustration Of A Symmetric Multiprocessor

[Figure: N identical processors P1, P2, ..., PN connected to main memory (various modules) and to I/O devices]

Asymmetric Multiprocessor (AMP)


• Set of N processors
• Multiple types of processors
• Processors optimized for specific tasks
• Often use master-slave paradigm


Example AMP Architectures


• Math (or graphics) coprocessor
    – Special-purpose processor
    – Handles floating point (or graphics) operations
    – Called by main processor as needed
• I/O processor
    – Optimized for handling interrupts
    – Programmable


Examples Of Programmable I/O Processors


• Channel (IBM mainframe)
• Peripheral Processor (CDC mainframe)


Multiprocessor Overhead
• Having many processors is not always a clear win
• Overhead arises from
    – Communication
    – Coordination
    – Contention


Communication
• Needed
    – Among processors
    – Between processors and I/O devices
• Can become a bottleneck


Coordination
• Needed when processors work together
• May require one processor to coordinate others


Contention
• Processors contend for resources
    – Memory
    – I/O devices
• Speed of resources can limit overall performance
    – Example: N − 1 processors wait while one processor accesses memory


Performance Of Multiprocessors
• Disappointing
• Bottlenecks
    – Contention for operating system (only one copy of OS can run)
    – Contention for memory and I/O
• Another problem: either need
    – One centralized cache (contention problems)
    – Coordinated caches (complex interaction)
• Many applications are I/O bound


According To John Harper


Building multiprocessor systems that scale while correctly synchronising the use of shared resources is very tricky, whence the principle: with careful design and attention to detail, an N-processor system can be made to perform nearly as well as a single-processor system. (Not nearly N times better, nearly as good in total performance as you were getting from a single processor). You have to be very good and have the right problem with the right decomposability to do better than this.

    http://www.john-a-harper.com/principles.htm


Definition Of Speedup
• Defined relative to single processor

    Speedup = τ1 / τN

• τ1 denotes the execution time on a single processor
• τN denotes the execution time on a multiprocessor with N processors
• Goal: speedup is linear in number of processors
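For example, if a program takes τ1 = 100 seconds on one processor and τ8 = 20 seconds on an eight-processor machine, the speedup is 100 / 20 = 5, well short of the ideal linear speedup of 8.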


Ideal And Typical Speedup

[Figure: speedup versus number of processors (N), for N from 1 to 16; the ideal curve is linear, while the actual curve falls increasingly below it]


Speedup For N >> 1 Processors

[Figure: speedup versus number of processors (N), for N from 1 to 32; the actual curve flattens and falls far below the ideal linear curve as N grows]

Summary Of Speedup

When used for general-purpose computing, a multiprocessor may not perform well. In some cases, added overhead means performance decreases as more processors are added.


Consequences For Programmers


• Writing code for multiprocessors is difficult
    – Need to handle mutual exclusion for shared items
    – Typical mechanism: locks


The Need For Locking


• Consider an assignment

    x = x + 1;

• Typical code is

    load  x, R5
    incr  R5
    store R5, x


Example Of Problem With Parallel Access


• Consider two processors incrementing item x
    – Processor 1 loads x into its register 5
    – Processor 1 increments its register 5
    – Processor 2 loads x into its register 5
    – Processor 1 stores its register 5 into x
    – Processor 2 increments its register 5
    – Processor 2 stores its register 5 into x
• Result: x is incremented only once; one of the two updates is lost


Hardware Locks
• Prevent simultaneous access
• Separate lock assigned to each item
• Code is

    lock    17
    load    x, R5
    incr    R5
    store   R5, x
    release 17
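Most programmers reach the same effect through a software locking primitive rather than a literal lock/release instruction pair. A minimal sketch using a POSIX threads mutex (an illustrative choice; the slides do not name a specific mechanism):

#include <pthread.h>

int x = 0;                                           /* shared item             */
pthread_mutex_t x_lock = PTHREAD_MUTEX_INITIALIZER;  /* lock assigned to item x */

void increment_x(void)
{
    pthread_mutex_lock(&x_lock);     /* corresponds to: lock 17    */
    x = x + 1;                       /* load, increment, store     */
    pthread_mutex_unlock(&x_lock);   /* corresponds to: release 17 */
}

Only one processor at a time can be between the lock and release, so the interleaving shown on the previous slide cannot occur.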


Programming Parallel Computers


• Implicit parallelism
    – Programmer writes sequential code
    – Hardware runs many copies automatically
• Explicit parallelism
    – Programmer writes code for parallel architecture
    – Code must use locks to prevent interference (see the sketch below)
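A brief sketch of explicit parallelism, assuming POSIX threads: the programmer divides the work, creates the threads, and uses a lock where the threads touch a shared item. The names, thread count, and array size are illustrative.

#include <pthread.h>

#define NTHREADS 4
#define N        1024

double a[N];                         /* data to be summed (zero-initialized here) */
double total = 0.0;                  /* shared result                             */
pthread_mutex_t total_lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg)
{
    int id = *(int *)arg;
    int chunk = N / NTHREADS;
    double partial = 0.0;

    for (int i = id * chunk; i < (id + 1) * chunk; i++)
        partial += a[i];             /* each thread sums its own slice */

    pthread_mutex_lock(&total_lock); /* lock protects the shared total */
    total += partial;
    pthread_mutex_unlock(&total_lock);
    return NULL;
}

int main(void)
{
    pthread_t tid[NTHREADS];
    int ids[NTHREADS];

    for (int i = 0; i < NTHREADS; i++) {     /* explicitly create the threads  */
        ids[i] = i;
        pthread_create(&tid[i], NULL, worker, &ids[i]);
    }
    for (int i = 0; i < NTHREADS; i++)       /* wait for all threads to finish */
        pthread_join(tid[i], NULL);
    return 0;
}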


The Point About Parallel Programming

From a programmer's point of view, a system that uses explicit parallelism is significantly more complex to program than a system that uses implicit parallelism.


Programming Symmetric And Asymmetric Multiprocessors

• Both types can be difficult to program
• Symmetric has two advantages
    – One instruction set
    – Programmer does not need to choose processor type for each task


Redundant Parallel Architectures


• Used to increase reliability
• Do not improve performance
• Multiple copies of hardware perform same function
• Can be used to
    – Test whether hardware is performing correctly
    – Serve as backup in case of hardware failure


Loose And Tight Coupling


• Tightly coupled multiprocessor
    – Multiple processors in single computer
    – Buses or switching fabrics used to interconnect processors, memory, and I/O
    – Usually one operating system
• Loosely coupled multiprocessor
    – Multiple, independent computer systems
    – Computer networks used to interconnect systems
    – Each computer runs its own operating system
    – Known as distributed computing


Cluster Computer
• Distributed computer system
• All computers work on a single problem
• Works best if problem can be partitioned into pieces (see the sketch below)
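Cluster programs are commonly written with a message-passing library; the sketch below assumes MPI, which the slides do not name, so treat it as one illustrative possibility. Each computer works on its own piece of the problem and the pieces are combined over the network.

#include <mpi.h>

/* Illustrative placeholder: compute this computer's share of the problem. */
static double do_my_piece(int rank, int size)
{
    return (double)rank;             /* stands in for real work on one partition */
}

int main(int argc, char *argv[])
{
    int rank, size;
    double partial, total;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* which computer am I?       */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* how many computers in all? */

    partial = do_my_piece(rank, size);

    /* combine the partial results on computer 0 */
    MPI_Reduce(&partial, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    MPI_Finalize();
    return 0;
}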


Grid Computing
• Form of loosely coupled distributed computing
• Uses computers on the Internet
• Popular for large scientific computations


Summary
• Parallelism is a fundamental optimization
• Computers classified as
    – SISD (e.g., conventional uniprocessor)
    – SIMD (e.g., vector computer)
    – MIMD (e.g., multiprocessor)
• Multiprocessor speedup usually less than linear


Summary (continued)

• Multiprocessors can be
    – Symmetric or asymmetric
    – Explicitly or implicitly parallel
• Programming multiprocessors is usually difficult
    – Locks needed for shared items
• Parallel systems can be
    – Tightly coupled (single computer)
    – Loosely coupled (computers connected by a network)


Questions?
