Shamjith K V
shamjithkv@cdac.in
Hybrid Computing Group
CDAC, Bangalore
Overview
MPI in Action
Hybrid Programming
MPI Standards
MPI IN ACTION
MPI Ecosystem
MPI Application
MPI implementation
Compilers & Linkers
Schedulers / Resource Managers
Cluster Interconnects
Communication Stacks
And, of course, the physical resources and the OS
Hence, to get the most out of MPI, the user should know:
Some details of the MPI implementation
The system architecture
[Diagram: two MPI processes, Rank 0 and Rank 1; on each side the Application hands a Message to the MPI Implementation, which passes it through the OS onto The Network]
[Diagram: the same message path with a Fast Interconnect: Rank 0 and Rank 1 exchange a Message through the Application, MPI Implementation, and OS layers over the Fast Interconnect]
Fast Interconnect
InfiniBand
Myrinet
High-speed LAN system by Myricom
Initially proposed for building HPC clusters
Two fibre-optic cables: upstream and downstream
Fault-tolerant features
Myri-10G: 10 Gbps data rate
Compatible with 10 Gigabit Ethernet at the PHY layer
PARAMNet 3
Developed by C-DAC
GEMINI Communication co-processor
Supports 8-48 ports
Each port supports 10 Gbps full-duplex
KSHIPRA software stack
RDMA-centric
MVAPICH and Intel MPI support
Case Study
Application Case Study
N-BODY Problem
Wikipedia says: in physics, the n-body problem is the problem of predicting the individual motions of a group of celestial objects interacting with each other gravitationally.
Gravitational Force
GalaxSee
F = G * M1 * M2 / D^2
where G is the gravitational constant,
M1, M2 are the masses of the two bodies, and
D is the distance between the two bodies.
The acceleration of an object is the sum of the forces acting on it divided by its mass:
a = F / M
Acceleration is the change in velocity over time: a = dv/dt
Velocity is the change in position over time: v = dx/dt
New position:
NEW = OLD + CHANGE
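As a concrete illustration, a minimal serial sketch of one GalaxSee-style time step (the names, 1-D positions, and body count are illustrative, not taken from the GalaxSee source):

#define N_BODIES 100
#define G 6.674e-11

static double mass[N_BODIES], pos[N_BODIES], vel[N_BODIES];

/* One time step: accumulate pairwise gravitational forces,
   then apply a simple Euler update to velocity and position. */
void step(double dt)
{
    double force[N_BODIES] = {0.0};

    /* F = G * M1 * M2 / D^2, summed over all other bodies */
    for (int i = 0; i < N_BODIES; i++)
        for (int j = 0; j < N_BODIES; j++) {
            if (i == j) continue;
            double d = pos[j] - pos[i];
            double f = G * mass[i] * mass[j] / (d * d);
            force[i] += (d > 0.0) ? f : -f;   /* force points towards body j */
        }

    /* a = F/M; velocity changes by a*dt; position changes by v*dt (new = old + change) */
    for (int i = 0; i < N_BODIES; i++) {
        double a = force[i] / mass[i];
        vel[i] += a * dt;
        pos[i] += vel[i] * dt;
    }
}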
GalaxSee
GalaxSee Application
The GalaxSee program lets the user model a number of bodies in space moving under the influence of their mutual gravitational attraction.
It is effective for relatively small numbers of bodies (on the order of a few hundred).
GalaxSee Simulation
HYBRID PROGRAMMING
Programming Paradigms
[Diagram: a single CPU attached to its Memory]
Multi-Threaded Program
[Diagram: multi-threaded program, threads running on the cores of multiple CPUs, all accessing one Shared Memory]
Multi-Process Program
[Diagram: multi-process program, several processes each running on the cores/CPUs of its own node with its own Shared Memory, the nodes connected by a Network]
Distributed Memory
Many nodes: distributed memory
each node has its own local memory
not directly addressable from other nodes
Multiple sockets per node
each node has 2 sockets (chips)
Multiple cores per socket
each socket (chip) has 4 cores
Memory spans all 8 cores - shared memory
the node's full local memory is addressable from any core in any socket
Memory is attached to sockets
the 4 cores sharing a socket have the fastest access to the attached memory
Why Hybrid?
Eliminates domain decomposition at the node level
Lower memory latency and less data movement within a node
Improved application performance; reduced turn-around time
MPI+OpenMP/Thread Combination
Treat each node as an SMP
launch a single MPI process per node
create parallel threads sharing the full-node memory
typically 8 threads/node
Treat each socket as an SMP
launch one MPI process on each socket
create parallel threads sharing same-socket memory
typically 4 threads/socket
Nested Parallelism
Nested Loops
Principles
Limited parallelism on outer level
Additional inner level of parallelism
Inner level not suitable for MPI
Inner level may be suitable for OpenMP
Nested Loops
for (int i = 0; i < 1000; i++)         /* outer loop: suited to MPI */
{
    for (int j = 0; j < 10000000; j++) /* inner loop: suited to OpenMP */
    {
        c[i][j] = a[i][j] + b[i][j];
    }
}
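In hybrid form, the same loop nest can be split as follows (a sketch; it assumes rank and size have been obtained from MPI_Comm_rank/MPI_Comm_size, that 1000 divides evenly among the ranks, and that a, b, c are declared elsewhere as in the slide):

int chunk = 1000 / size;                 /* outer iterations per MPI rank */
for (int i = rank * chunk; i < (rank + 1) * chunk; i++)
{
    #pragma omp parallel for             /* inner loop shared by OpenMP threads */
    for (int j = 0; j < 10000000; j++)
    {
        c[i][j] = a[i][j] + b[i][j];
    }
}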
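The hello-hybrid source itself is not included in these slides; a minimal sketch of what such a program might look like, with each MPI rank spawning OpenMP threads that report their identity:

#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided;
    /* request thread support, since OpenMP threads run inside each rank */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    #pragma omp parallel
    {
        printf("Hello from thread %d of %d on rank %d of %d\n",
               omp_get_thread_num(), omp_get_num_threads(), rank, size);
    }

    MPI_Finalize();
    return 0;
}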
Run:
mpirun -np 2 ./hello-hybrid
MPI STANDARDS
Contents of MPI-2
One-Sided Communication Operations
MPI_Put()
for remote writes
MPI_Get()
for remote reads
MPI_Accumulate()
for remote updates.
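A minimal sketch (not from the slides) of MPI-2 one-sided communication: every rank exposes a window, and rank 1 writes into rank 0's window with MPI_Put. Run with at least two ranks; the buffer name and value are illustrative.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int buf = 0;                       /* memory exposed through the window */
    MPI_Win win;
    MPI_Win_create(&buf, sizeof(int), sizeof(int),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    MPI_Win_fence(0, win);             /* open an access epoch */
    if (rank == 1) {
        int value = 42;
        /* remote write: put 'value' into rank 0's window at displacement 0 */
        MPI_Put(&value, 1, MPI_INT, 0, 0, 1, MPI_INT, win);
    }
    MPI_Win_fence(0, win);             /* close the epoch; the put is complete */

    if (rank == 0)
        printf("Rank 0 received %d via MPI_Put\n", buf);

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}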
I/O Calls
MPI-1 relied on OS I/O functions, but MPI-2 provides MPI_File functions for dedicated parallel I/O:
int MPI_File_open(MPI_Comm comm, char *name, int mode, MPI_Info info, MPI_File *fh);
int MPI_File_seek(MPI_File fh, MPI_Offset offset, int whence);
int MPI_File_read / MPI_File_write(MPI_File fh, void *buf, int count, MPI_Datatype type, MPI_Status *status);
int MPI_File_close(MPI_File *fh);
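A minimal sketch (file name and sizes illustrative) of how these calls combine: each rank writes its own block of integers into one shared file at a rank-dependent offset.

#include <mpi.h>

#define N 4   /* integers written by each rank (illustrative) */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int buf[N];
    for (int i = 0; i < N; i++)
        buf[i] = rank * N + i;

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "out.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    /* seek to this rank's region of the shared file, then write */
    MPI_File_seek(fh, (MPI_Offset)rank * N * sizeof(int), MPI_SEEK_SET);
    MPI_File_write(fh, buf, N, MPI_INT, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}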
Parallel I/O
[Diagram: three MPI processes, Rank 0, Rank 1, and Rank 2]
[Diagram: Rank 0, Rank 1, and Rank 2; Rank 0 reads the input from the Shared File System and broadcasts it to the other ranks]
if (myrank == 0)
{
    Input_data = read();
}
MPI_Bcast(Input_data, ...);
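A fuller sketch of this read-and-broadcast pattern (file name and sizes illustrative): only rank 0 touches the shared file system, and MPI_Bcast delivers the data to every rank.

#include <mpi.h>
#include <stdio.h>

#define N 1024   /* number of doubles in the input (illustrative) */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double input_data[N];
    if (rank == 0) {
        FILE *fp = fopen("input.dat", "rb");  /* read from the shared file system */
        if (fp) {
            fread(input_data, sizeof(double), N, fp);
            fclose(fp);
        }
    }

    /* every rank, including rank 0, now holds the same input */
    MPI_Bcast(input_data, N, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    MPI_Finalize();
    return 0;
}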
[Diagram: Rank 0, Rank 1, and Rank 2; the results are gathered onto Rank 0, which writes them to the Shared File System]
MPI_Gather(output_data, ...);
if (myrank == 0)
{
    write(output_data);
}
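A fuller sketch of the gather-and-write pattern (file name and sizes illustrative): every rank contributes its local block, rank 0 collects the blocks with MPI_Gather and writes the combined result to the shared file system.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define N 256   /* doubles produced per rank (illustrative) */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double local[N];
    for (int i = 0; i < N; i++)
        local[i] = rank;                       /* stand-in for computed results */

    double *output_data = NULL;
    if (rank == 0)
        output_data = malloc((size_t)size * N * sizeof(double));

    MPI_Gather(local, N, MPI_DOUBLE, output_data, N, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    if (rank == 0) {
        FILE *fp = fopen("output.dat", "wb");  /* write on the shared file system */
        if (fp) {
            fwrite(output_data, sizeof(double), (size_t)size * N, fp);
            fclose(fp);
        }
        free(output_data);
    }

    MPI_Finalize();
    return 0;
}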
MPI-2 Miscellany
Standard startup with mpiexec
Recommended but not required
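For example, launching four processes of an application (program name illustrative):
mpiexec -n 4 ./myapp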
External Interfaces
Generalized Requests
users can create new non-blocking operations
Naming objects for debuggers and profilers
label communicators, windows, datatypes
Allow users to add error codes, classes and strings
Specifies how threads are to be handled if the implementation chooses to provide them
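A small sketch of naming objects (the communicator and label are illustrative, and MPI is assumed to be initialized already):

/* duplicate MPI_COMM_WORLD and give the copy a name that debuggers
   and profilers can display */
MPI_Comm work_comm;
MPI_Comm_dup(MPI_COMM_WORLD, &work_comm);
MPI_Comm_set_name(work_comm, "solver_workers");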
MPI-3
MPI-3 Scope
Data piggybacking
Dynamic communicators
Asynchronous dynamic process control
Hybrid Programming
Ensure that MPI has the features necessary to facilitate efficient hybrid programming
Investigate what changes are needed in MPI to better support:
Traditional thread interfaces (e.g., Pthreads, OpenMP)
Emerging interfaces (like TBB, OpenCL, CUDA, and Ct)
PGAS (UPC, CAF, etc.)
Shared Memory
THANK YOU!
Questions?