Documenti di Didattica
Documenti di Professioni
Documenti di Cultura
Introduction
Winter 2007
ENGR9861 R. Venkatesan
High-Performance Computer Architecture
Resource Allocation:
Winter 2007
Parallelism:
Winter 2007
ENGR9861 R. Venkatesan
High-Performance Computer Architecture
constraints
inevitable
In the mainstream
Winter 2007
Application Trends
Demand for cycles fuels advances in hardware, and vice-versa
Cycle drives exponential increase in microprocessor performance
Drives parallel architecture harder: most demanding applications
Range of performance demands
Speedup (p processors) =
Performance (p processors)
Performance (1 processor)
ENGR9861 R. Venkatesan
High-Performance Computer Architecture
Time (p processors)
6
Winter 2007
ENGR9861 R. Venkatesan
High-Performance Computer Architecture
Winter 2007
1GIPS
Telephone
Number
Recognition
100MIPS
10MIPS
1MIPS
1980
200Words
IsolatedSpeech
Recognition
SubBand
SpeechCoding
1,000Words
Continuous
Speech
Recognition
5,000Words
Continuous
Speech
Recognition
HDTVReceiver
CIFVideo
ISDNCDStereo
Receiver
CELP
SpeechCoding
Speaker
Verication
1985
1990
1995
ENGR9861 R. Venkatesan
High-Performance Computer Architecture
ENGR9861 R. Venkatesan
High-Performance Computer Architecture
10
Commercial Computing
Also relies on parallelism for high end
Winter 2007
11
ENGR9861 R. Venkatesan
High-Performance Computer Architecture
12
Technology Trends
100
Supercomputers
Performance
10
Mainframes
Microprocessors
Minicomputers
1
0.1
1965
1970
1975
1980
1985
1990
1995
The natural building block for multiprocessors is now also about the fastest!
Winter 2007
ENGR9861 R. Venkatesan
High-Performance Computer Architecture
13
180
160
140
DEC
alpha
120
100
80
60
40
20
MIPS
Sun 4 M/120
260
0
1987
Winter 2007
1988
MIPS
M2000
1989
IBM
RS6000
540
1990
Integer
FP
HP 9000
750
1991
1992
ENGR9861 R. Venkatesan
High-Performance Computer Architecture
14
count
Parallelism in processing
Proc
Interconnect
ENGR9861 R. Venkatesan
High-Performance Computer Architecture
15
C
l
o
c
k
r
a
t
e
(
M
H
z
)
1,000
100
10
i8086
R10000
Pentium100
i80386
i80286
i8080
i8008
i4004
0.1
1970
1975
1980
1985
1990
1995
2000
2005
ENGR9861 R. Venkatesan
High-Performance Computer Architecture
16
T
r
a
n
s
is
to
r
s
10,000,000
1,000,000
i80286
100,000
R10000
Pentium
i80386
R3000
R2000
i8086
10,000
i8080
i8008
i4004
1,000
1970
Winter 2007
1980
1990
2000
1975
1985
1995
2005
ENGR9861 R. Venkatesan
High-Performance Computer Architecture
17
Storage Performance
Divergence between memory capacity and speed more
pronounced
New designs fetch many bits within memory chip; follow with fast
pipelined transfer across narrower interface
Buffer caches most recently accessed data
ENGR9861 R. Venkatesan
High-Performance Computer Architecture
18
Architectural Trends
Architecture translates technologys gifts to performance and
capability
Resolves the tradeoff between parallelism and locality
VLSI
exploited
Winter 2007
ENGR9861 R. Venkatesan
High-Performance Computer Architecture
19
great inflection point when 32-bit micro and cache fit on a chip
ENGR9861 R. Venkatesan
High-Performance Computer Architecture
20
25
2.5
20
2
Speedup
30
15
1.5
10
0.5
6+
10
15
Infinite resources and fetch bandwidth, perfect branch prediction and renaming
real caches and non-zero miss latencies
Winter 2007
ENGR9861 R. Venkatesan
High-Performance Computer Architecture
21
1,000x1000 matrix
100 x 100 matrix
1,000x1000 matrix
100 x 100 matrix
LINPACK (MFLOPS)
1,000
T94
C90
Ymp
Xmp/416
IBM Power2/990
100
DEC 8200
MIPS R4400
Xmp/14se
DEC Alpha
HP9000/735
CRAY 1s
IBM RS6000/540
10
MIPS M/2000
MIPS M/120
Sun 4/260
1
1975
Winter 2007
1980
1985
1990
1995
ENGR9861 R. Venkatesan
High-Performance Computer Architecture
2000
22
ENGR9861 R. Venkatesan
High-Performance Computer Architecture
23
ApplicationSoftware
Systolic
Arrays
Dataflow
Winter 2007
System
Software
Architecture
SIMD
MessagePassing
SharedMemory
ENGR9861 R. Venkatesan
High-Performance Computer Architecture
24
cooperation
Defines
Winter 2007
ENGR9861 R. Venkatesan
High-Performance Computer Architecture
25
CAD
Database
Multiprogramming
Shared
address
Scientific modeling
Message
passing
Data
parallel
Compilation
or library
Operating systems support
Communication hardware
Parallel applications
Programming models
Communication abstraction
User/system boundary
Hardware/software boundary
Winter 2007
ENGR9861 R. Venkatesan
High-Performance Computer Architecture
26
Programming Model
What programmer uses in coding applications
Specifies communication and synchronization
Examples:
Multiprogramming: no communication or synch. at program level
Shared address space: like bulletin board
Message passing: like letters or phone calls, explicit point to point
Data parallel: more regimented, global actions on data
Winter 2007
ENGR9861 R. Venkatesan
High-Performance Computer Architecture
27
Communication Abstraction
User level communication primitives provided
Realizes the programming model
Mapping exists between language primitives of programming model
and these primitives
Supported directly by HW, or via OS, or via user SW
Lot of debate about what to support in SW and gap between
layers
Today:
Winter 2007
ENGR9861 R. Venkatesan
High-Performance Computer Architecture
28
Communication Architecture
Comm. Arch. = User/System Interface + Implementation
User/System Interface:
Implementation:
Goals:
Performance
Broad applicability
Programmability
Scalability
Low Cost
Winter 2007
ENGR9861 R. Venkatesan
High-Performance Computer Architecture
29
Dataflow
Systolic Arrays
ENGR9861 R. Venkatesan
High-Performance Computer Architecture
30
Convenient:
Location transparency
Similar programming model to time-sharing on uniprocessors
Winter 2007
ENGR9861 R. Venkatesan
High-Performance Computer Architecture
31
P1
Pn
Common physical
addresses
P2
P0
St or e
Shared portion
of address space
Private portion
of address space
P2 pr i vat e
P1 pr i v at e
P0 pr i vat e
Writes to shared address visible to other threads (in other processes too)
Natural extension of uniprocessors model: conventional memory operations
for comm.; special atomic operations for synchronization
OS uses shared memory to coordinate processes
Winter 2007
ENGR9861 R. Venkatesan
High-Performance Computer Architecture
32
Mem
Mem
Mem
Interconnect
Processor
Mem
I/O ctrl
I/O ctrl
Interconnect
Processor
controllers
Winter 2007
33
Motivated by multiprogramming
Extends crossbar used for mem bw and I/O
Originally processor cost limited to small
P
P
I/O
I/O
C
M
Minicomputer approach
Winter 2007
I/O
I/O
34
P-Pro
module
256-KB
Interrupt
L2 $
controller
Bus interface
P-Pro
module
P-Pro
module
PCI
bridge
PCI bus
PCI
I/O
cards
PCI
bridge
PCI bus
Memory
controller
MIU
1-, 2-, or 4-way
interleaved
DRAM
Winter 2007
ENGR9861 R. Venkatesan
High-Performance Computer Architecture
35
P
$
P
$
$2
$2
CPU/mem
cards
Mem ctrl
Bus interface/switch
I/O cards
SBUS
SBUS
2 FiberChannel
SBUS
100bT, SCSI
Bus interface
36
Network
Network
M $
M $
Dance hall
Distributed memory
M $
ENGR9861 R. Venkatesan
High-Performance Computer Architecture
37
Mem
Mem
ctrl
and NI
XY
Switch
Winter 2007
ENGR9861 R. Venkatesan
High-Performance Computer Architecture
38
operations
Winter 2007
Library or OS intervention
ENGR9861 R. Venkatesan
High-Performance Computer Architecture
39
Message-Passing Abstraction
Match
ReceiveY, P, t
AddressY
Send X, Q, t
AddressX
Local process
address space
Process P
Process Q
Local process
address space
Winter 2007
ENGR9861 R. Venkatesan
High-Performance Computer Architecture
40
Simplifies programming
101
001
100
000
111
011
Winter 2007
ENGR9861 R. Venkatesan
High-Performance Computer Architecture
110
010
41
Power 2
CPU
L2 $
Memory bus
4-way
interleaved
DRAM
Memory
controller
MicroChannel bus
I/O
DMA
i860
NI
DRAM
NIC
ENGR9861 R. Venkatesan
High-Performance Computer Architecture
42
i860
L1 $
L1 $
Intel
Paragon
node
Mem
ctrl
DMA
Driver
4-way
interleaved
DRAM
2D grid network
with processing node
attached to every switch
Winter 2007
NI
ENGR9861 R. Venkatesan
High-Performance Computer Architecture
8 bits,
175 MHz,
bidirectional
43
Winter 2007
ENGR9861 R. Venkatesan
High-Performance Computer Architecture
44