
Master in High Performance Computing

Advanced Parallel Programming LABS


The labs will be performed on the Finis Terrae (FT2) supercomputer of the Galicia
Supercomputing Center (CESGA). The use of this system was covered in the subject
Parallel Programming in the first semester. A short guide to the FT2 architecture,
together with information about how to compile and execute code on the FT2, can be
found in aula.cesga.es (see aula.cesga.es → Advanced Parallel Programming course →
Documents → MPIatFT.pdf).
For each lab you will have to write a short report explaining what you have done in
each exercise, presenting the resulting codes, and analyzing the performance. The
performance analysis involves measuring the speedup with different numbers of threads
and drawing some conclusions. Note that, to perform a good analysis and obtain
reasonable conclusions, the computing time of the tests should be at least of the order
of seconds. The report can be written in English or Spanish. The deadline for each lab
will be communicated via Slack.

OpenMP: Vectorization and Hybrid Programming


LABS 1
Starting codes are in Lab1Codes.zip file.

Vectorization with OpenMP. We are going to use the gcc compiler, so module
load gcc[/6.4.0] is needed. For testing, an interactive session requested with
compute -c 4 can be enough.
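
As a hint (these exact flags are an assumption, not part of the lab statement), the
codes can be compiled with OpenMP support while asking gcc to report which loops it
managed to vectorize:

    gcc -fopenmp -O2 -fopt-info-vec multf.c -o multf

The -fopt-info-vec option prints a note for each vectorized loop, which helps to
verify that the simd directives took effect.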

1. The code multf.c performs the product of matrices

D = A × B^T

Parallelize and vectorize the product. Analyze the performance with different
numbers of threads (1, 2, and 4), both with and without vectorization (a sketch
of the parallel+SIMD pattern is given after this list).
2. In the code saxpy.c there are two different functions for the SAXPY operation.
Vectorize the loops of N iterations that call the saxpy and saxpyi functions. The
saxpy_no_simd and saxpyi_no_simd functions (not to be vectorized) are provided
only to compare the performance with and without vectorization.
3. Using the program from the previous point, parallelize the four loops of N
iterations. Analyze the performance.
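
A minimal sketch of the parallel+SIMD pattern for exercises 1 and 2 follows; the
function and variable names, the row-major layout, and the declare simd approach are
illustrative assumptions, not the actual contents of multf.c or saxpy.c:

    #include <omp.h>

    /* Exercise 1 (sketch): D = A x B^T for n x n row-major matrices.
       The outer loop is shared among threads; the reduction over k is
       vectorized with the simd directive. */
    void mult_bt(int n, const float *A, const float *B, float *D)
    {
        #pragma omp parallel for
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++) {
                float sum = 0.0f;
                #pragma omp simd reduction(+:sum)
                for (int k = 0; k < n; k++)
                    sum += A[i*n + k] * B[j*n + k]; /* row j of B = column j of B^T */
                D[i*n + j] = sum;
            }
    }

    /* Exercise 2 (sketch): declare simd makes the compiler emit a vector
       version of the function, so a loop that calls it can itself be
       vectorized with a simd directive. */
    #pragma omp declare simd
    static float saxpy_elem(float a, float x, float y)
    {
        return a * x + y;
    }

    void saxpy_loop(long n, float a, const float *x, float *y)
    {
        #pragma omp simd
        for (long i = 0; i < n; i++)
            y[i] = saxpy_elem(a, x[i], y[i]);
    }

For exercise 3, the same loops can additionally be distributed among threads, for
example with a combined #pragma omp parallel for simd.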

Hybrid Programming. We can use either the Intel MPI implementation (module load
intel impi) or the OpenMPI one (module load gcc openmpi). There may be some
differences between them. The exercises are based on codes that you parallelized
with MPI in Parallel Programming.
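
As an illustration (the exact wrapper names and launch syntax on the FT2 may differ),
a hybrid code built with the OpenMPI module can be compiled and run along these lines:

    module load gcc openmpi
    mpicc -fopenmp pi_integral.c -o pi_integral
    export OMP_NUM_THREADS=4
    mpirun -np 4 ./pi_integral

Here -np fixes the number of MPI processes and OMP_NUM_THREADS the number of OpenMP
threads per process, so this example corresponds to the (4, 4) configuration on
16 CPUs.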

1. The code pi_integral.c computes the value of π with numerical integration in
the interval [0, 1], using N intervals of the same size and adding their areas:

    \int_0^1 \frac{1}{1 + x^2}\, dx = \big[\arctan(x)\big]_0^1 = \arctan(1) - \arctan(0) = \frac{\pi}{4}
Parallelize it using MPI. After that, include OpenMP directives to parallelize
the loop (a hybrid sketch is given after this list). Analyze the speedup for
different numbers of processes and threads for the same total number of CPUs;
that is, for a given number of CPUs, analyze the different configurations of
processes and threads. For example, for 16 CPUs the (processes, threads)
configurations can be: (1, 16), which implies OpenMP only; (2, 8); (4, 4);
(8, 2); and (16, 1), which implies MPI only.
2. The program dotprod.c computes the dot product of two vectors. Parallelize
it using MPI. After that, include OpenMP directives to parallelize the loop.
Analyze the speedup for different numbers of processes and threads for the same
number of CPUs, that is, for a given number of CPUs, analyze the different
configurations of processes and threads.
3. The program mxvnm.c computes the matrix-vector product. Parallelize with MPI
the loop over the rows of the matrix (N). After that, include OpenMP directives
to parallelize the innermost loop (M). Analyze the speedup for different numbers
of processes and threads for the same number of CPUs, that is, for a given
number of CPUs, analyze the different configurations of processes and threads.
Try different values of N and M in order to find a situation in which hybrid
programming performs better than pure OpenMP or pure MPI.
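
A minimal hybrid sketch for exercise 1 follows, assuming the midpoint rule and a
cyclic distribution of intervals among processes; the real pi_integral.c may be
organized differently:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        const long N = 100000000;      /* number of intervals (assumed value) */
        const double h = 1.0 / N;      /* width of each interval */
        int rank, size;
        double local = 0.0, pi = 0.0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Each process takes the intervals i = rank, rank+size, ...;
           within the process, the loop is shared among OpenMP threads. */
        #pragma omp parallel for reduction(+:local)
        for (long i = rank; i < N; i += size) {
            double x = (i + 0.5) * h;          /* midpoint of interval i */
            local += 4.0 / (1.0 + x * x) * h;  /* area of its rectangle */
        }

        /* Combine the partial sums of all processes on rank 0. */
        MPI_Reduce(&local, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
        if (rank == 0)
            printf("pi ~= %.15f\n", pi);

        MPI_Finalize();
        return 0;
    }

dotprod.c and mxvnm.c can follow the same pattern: MPI distributes the data among
processes, OpenMP parallelizes each process's local loop, and a collective such as
MPI_Reduce combines the partial results.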
