Chapter 1
Parallel Computer Models
TEXTBOOK: KAI HWANG AND NARESH JOTWANI, ADVANCED COMPUTER
ARCHITECTURE (SIE): PARALLELISM, SCALABILITY, PROGRAMMABILITY, MCGRAW
HILL EDUCATION, 3/E, 2015
In this chapter…
OBJECTIVE
INTRODUCTION
• THE STATE OF COMPUTING
• MULTIPROCESSORS AND MULTICOMPUTERS
• MULTIVECTOR AND SIMD COMPUTERS
• PRAM AND VLSI MODELS
OBJECTIVE
• Multiprocessor and multicomputer
o Shared-memory multiprocessors
o Distributed-memory multicomputers
o A taxonomy of MIMD computers
FLYNN’S CLASSIFICATION
SISD: A single control unit (CU) fetches a single instruction stream (IS) from memory. The CU then generates the appropriate control signals to direct a single processing element. Instructions are executed sequentially, though overlap can be achieved by pipelining or multiple functional units.
SIMD: A single instruction stream directs multiple processing elements, each operating on its own data stream.
MISD: Multiple instructions operate on one data stream. Heterogeneous systems operate on the same data stream and must agree on the result.
MIMD: MIMD architectures include multi-core and superscalar processors and distributed systems, using either one shared memory space or a distributed memory space.
MIMD is the most popular model, SIMD is next, and MISD is the least popular.
Parallel/Vector computers
Execute programs in MIMD mode.
2 major classes:
1. Shared-memory multiprocessors (multiple processors with a shared memory).
2. Message-passing multicomputers (an architecture in which each processor has its own memory).
Each computer node in a multicomputer system has a local memory, unshared with other nodes.
System attributes to performance
Performance depends on a perfect match between machine capability (MC) and program behavior (PB).
Machine capability (MC) can be enhanced with better hardware technology, architectural features, and efficient resource management.
Program behavior (PB) is difficult to predict because it depends on the application and on runtime conditions. Other factors: algorithm design, data structures, language efficiency, programmer skill, and compiler technology.
The instruction cycle consists of:
◦ instruction fetch,
◦ decode,
◦ operand(s) fetch,
◦ execution, and
◦ storing results.
Instruction fetch, operand fetch, and storing results require access to memory; decode and execution are carried out in the CPU.
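To make the phases concrete, here is a toy fetch-decode-execute loop in C for a hypothetical accumulator machine (entirely illustrative; the opcodes and memory layout are invented for this sketch):

```c
#include <stdint.h>
#include <stdio.h>

/* Toy fetch-decode-execute loop for a hypothetical accumulator machine,
 * illustrating the phases: instruction fetch, decode, operand fetch,
 * execute, and store result. */
enum { LOAD, ADD, STORE, HALT };

int main(void)
{
    /* Code at addresses 0..6, data at 12..14: computes 10 + 32. */
    uint8_t mem[16] = { LOAD, 12, ADD, 13, STORE, 14, HALT, 0,
                        0, 0, 0, 0, 10, 32, 0, 0 };
    uint8_t pc = 0, acc = 0;

    for (;;) {
        uint8_t op   = mem[pc++];   /* instruction fetch (memory access) */
        uint8_t addr = mem[pc++];   /* ... including the operand address */
        switch (op) {               /* decode (carried out in the CPU)   */
        case LOAD:  acc = mem[addr];  break;  /* operand fetch (memory)  */
        case ADD:   acc += mem[addr]; break;  /* execute (CPU)           */
        case STORE: mem[addr] = acc;  break;  /* store result (memory)   */
        case HALT:  printf("mem[14] = %d\n", mem[14]); return 0; /* 42   */
        }
    }
}
```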
CPI (cycles per instruction) can be divided into 2 component terms based on processor cycles and memory cycles.
Depending on the instruction type, the instruction cycle may involve 1 to 4 memory references:
◦ 1 instruction fetch,
◦ up to 2 operand fetches,
◦ 1 to store the result.
Therefore T = Ic * (p + m*k) * t, where
Ic = instruction count
p = number of processor cycles per instruction
m = number of memory references per instruction
k = ratio of memory cycle time to processor cycle time
t = processor cycle time
T = CPU time
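As a minimal numeric sketch of the formula (the sample values below are assumed, not from the book):

```c
#include <stdio.h>

/* CPU time from the performance equation T = Ic * (p + m*k) * t. */
double cpu_time(double Ic, double p, double m, double k, double t)
{
    return Ic * (p + m * k) * t;
}

int main(void)
{
    double Ic = 50e6; /* 50 million instructions (assumed workload) */
    double p  = 2.0;  /* processor cycles per instruction           */
    double m  = 1.5;  /* memory references per instruction          */
    double k  = 4.0;  /* memory cycle / processor cycle ratio       */
    double t  = 1e-9; /* 1 ns processor cycle time (1 GHz clock)    */

    printf("T = %.2f s\n", cpu_time(Ic, p, m, k, t)); /* T = 0.40 s */
    return 0;
}
```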
3) System Attributes
The 5 performance factors (Ic, p, m, k, t) are influenced by 4 system attributes:
◦ Instruction-set architecture
◦ Compiler technology
◦ CPU implementation and control
◦ Cache and memory hierarchy
Instruction-set architecture affects Ic and p (processor cycles per instruction).
Compiler technology affects Ic, p, and m (memory references per instruction).
CPU implementation and control affect p and t (processor cycle time), i.e., the total processor time needed.
Cache and memory hierarchy affect the memory access latency, i.e., k and t.
4) MIPS (million instructions per second) rate
All 4 system attributes (ISA, compiler technology, CPU implementation, and memory hierarchy) affect the MIPS rate. In terms of the factors above, MIPS rate = Ic / (T * 10^6) = f / (CPI * 10^6), where f = 1/t is the clock frequency.
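Continuing the sketch above with the same assumed numbers, the MIPS rate follows directly from T:

```c
#include <stdio.h>

int main(void)
{
    double Ic = 50e6;                          /* instruction count (assumed)      */
    double T  = Ic * (2.0 + 1.5 * 4.0) * 1e-9; /* CPU time from the formula: 0.4 s */

    double mips = Ic / (T * 1e6);              /* MIPS rate = Ic / (T * 10^6)      */
    printf("MIPS rate = %.1f\n", mips);        /* prints 125.0                     */
    return 0;
}
```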
7) Programming Environment
• Programmability depends on the programming environment provided to the user.
• A parallel environment is preferred over a sequential one.
• Factors influencing the programming environment are languages, compilers, and the OS.
• The OS must be able to manage resources, parallel scheduling, inter-process communication, synchronization, and shared-memory allocation.
Two approaches to Parallel Programming
Implicit and explicit parallelism
Implicit parallelism
Languages such as C, C++, Fortran, and Pascal are used to write the source program.
In implicit parallelism, success relies on the compiler.
The sequentially coded source program is translated into parallel object code by a parallelizing compiler.
This compiler must be able to detect parallelism and assign target machine resources.
The approach relies heavily on the “intelligence” of the parallelizing compiler, i.e., it demands less effort from the programmer, as in the sketch below.
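As an illustration, a plain sequential loop like the one below is the kind of code a parallelizing compiler can handle on its own; for instance, GCC can attempt automatic loop parallelization with -ftree-parallelize-loops=N. The program itself contains no parallel constructs:

```c
#include <stdio.h>

#define N 1000000

/* Purely sequential source: no parallel constructs anywhere.
 * A parallelizing compiler may detect that the loop iterations
 * are independent and emit parallel object code by itself. */
int main(void)
{
    static double a[N], b[N], c[N];

    for (int i = 0; i < N; i++) {   /* independent iterations */
        a[i] = i;
        b[i] = 2.0 * i;
    }
    for (int i = 0; i < N; i++)
        c[i] = a[i] + b[i];         /* candidate for auto-parallelization */

    printf("c[N-1] = %f\n", c[N - 1]);
    return 0;
}
```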
Explicit parallelism
Explicit parallelism requires more effort from the programmer, who develops the source program in C, C++, Fortran, or Pascal.
Parallelism is explicitly specified in the user program, which reduces the burden on the compiler to detect parallelism, as in the sketch below.
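One common way to specify parallelism explicitly in C is OpenMP, shown here purely as an illustration (the book's discussion is not tied to any particular library). The programmer, not the compiler, marks the loop as parallel:

```c
#include <stdio.h>
#include <omp.h>    /* compile with: gcc -fopenmp explicit.c */

#define N 1000000

int main(void)
{
    static double a[N], b[N], c[N];

    for (int i = 0; i < N; i++) {
        a[i] = i;
        b[i] = 2.0 * i;
    }

    /* Parallelism is explicitly specified by the programmer:
     * the compiler does not have to detect it. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        c[i] = a[i] + b[i];

    printf("c[N-1] = %f\n", c[N - 1]);
    return 0;
}
```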
1.2 Multiprocessors and Multicomputers
2 categories of parallel computers:
◦ Shared-memory multiprocessors
◦ Distributed-memory multicomputers
In a vector computer, instructions are fetched and decoded; decoded scalar operations are executed using the scalar functional pipelines, while decoded vector operations are sent to the vector control unit (VCU).
Parallel computers use VLSI chips to fabricate processor arrays, memory arrays and large-scale
switching networks.
Nowadays, VLSI technologies are 2-dimensional. The size of a VLSI chip is proportional to the amount
of storage (memory) space available in that chip.
We can measure the space complexity of an algorithm by the chip area (A) of its VLSI implementation. If T is the time (latency) needed to execute the algorithm, then A * T gives an upper bound on the total number of bits processed through the chip (or its I/O). For certain computations there exists a lower bound f(s), where s is the problem size, such that
A * T^2 >= O(f(s))
where A = chip area and T = time.
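One quick consequence of the bound, worked through as a hypothetical illustration (not from the book excerpt): for a fixed problem size s, A * T^2 cannot drop below f(s), so halving the execution time forces at least a fourfold increase in chip area. If the bound is met with area A at time T, i.e. A * T^2 = f(s), then running in time T/2 requires area at least
f(s) / (T/2)^2 = 4 * f(s) / T^2 = 4 * A.
In other words, the AT^2 model expresses an area-time tradeoff: faster chips must be disproportionately larger.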