m --without-scalapack
- then edit the file PW/Makefile, replacing line 44:
all : tldeps pw-gpu.x manypw-gpu.x
- with the line:
all : tldeps pw-gpu.x
- finally issue the commands:
> cd ..
> make -f Makefile.gpu pw-gpu
If everything goes well, you should find the "pw-gpu.x" executable in the directory
espresso-5.0.2/bin
3 - List and Purpose of Datasets
Three datasets are provided together with this benchmark: a small one
(SiO2.tar.gz), to be used to run benchmarks inside a single node (or device)
and as a test bed for code changes; a medium one (AuSurf.tar.gz), to run
benchmarks on more than one node; and a large one (AuSurf-large.tar.gz), to
run benchmarks on many nodes.
- The SiO2.tar.gz test case requires only a few Gigabytes of main memory to run,
and can scale easily up to 32 or 64 cores.
- The AuSurf.tar.gz test case requires less than 32 Gigabytes of main memory and
2 Gigabytes of disk space, and you would need 2 or 4 nodes to run it.
It can scale up to 256 or 512 cores.
- The AuSurf-large.tar.gz test case is 8 times larger than AuSurf.tar.gz,
and it is meant for benchmark runs on 1024, 2048 or even more cores.
Usually, if the architecture is well balanced (i.e. the network performance
is good enough to support the node performance), Quantum ESPRESSO displays
linear weak scalability with the AuSurf.tar.gz and AuSurf-large.tar.gz test cases.
Under this hypothesis, you can extrapolate the performance of
AuSurf-large.tar.gz from the performance results of AuSurf.tar.gz.
4 - Run the Benchmarks
Quantum ESPRESSO reads its input from standard input and accepts several
command line parameters that control the internal distribution of data
structures. For parallel execution, Quantum ESPRESSO requires a system launcher
command (e.g. mpirun or mpiexec) to distribute the instances of Quantum
ESPRESSO across different nodes. The relevant command line parameters for this
benchmark are:
-input MY_INPUT_FILE
(tells Quantum ESPRESSO to read input from MY_INPUT_FILE)
-npool P
(tells Quantum ESPRESSO to use P pools to distribute data. P should
be less than or equal to the number of k-points, and maximum scalability is
usually reached with P exactly equal to the number of k-points. You can
read the output to find out the number of k-points of your system)
-ntg T
(tells Quantum ESPRESSO to use T task groups to distribute FFT. Usually
optimal performance can be reached with T ranging from 2 to 8)
-ndiag D
(tells Quantum ESPRESSO to use D processors to perform parallel linear
algebra computations with ScaLAPACK. D can range from 1 to the maximum
number of MPI tasks; the optimal value for D depends on the bandwidth and
latency of your network)
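The constraints stated for these parameters can be checked before a job is submitted. The following is a minimal Python sketch, not part of the benchmark suite: the function name pw_arguments and its n_kpoints argument are hypothetical, and the checks only mirror the limits given above (P not larger than the number of k-points, D between 1 and the number of MPI tasks). Quantum ESPRESSO performs its own validation at run time.

```python
def pw_arguments(input_file, mpi_tasks, npool=None, ntg=None, ndiag=None,
                 n_kpoints=None):
    """Build the pw.x argument list for the flags described above.

    Only flags that are explicitly requested are emitted. n_kpoints is
    optional: pass it when the number of k-points is known from a
    previous output, otherwise the -npool upper bound is not checked.
    """
    if mpi_tasks < 1:
        raise ValueError("mpi_tasks must be at least 1")
    args = ["-input", input_file]
    if npool is not None:
        if npool < 1:
            raise ValueError("npool must be at least 1")
        if n_kpoints is not None and npool > n_kpoints:
            raise ValueError("npool must not exceed the number of k-points")
        args += ["-npool", str(npool)]
    if ntg is not None:
        if ntg < 1:
            raise ValueError("ntg must be at least 1")
        args += ["-ntg", str(ntg)]
    if ndiag is not None:
        if not 1 <= ndiag <= mpi_tasks:
            raise ValueError("ndiag must be between 1 and the number of MPI tasks")
        args += ["-ndiag", str(ndiag)]
    return args
```

For instance, pw_arguments("ausurf.in", 16, ntg=2, ndiag=16) yields the flags of an AuSurf run with 16 MPI tasks, while asking for ndiag=16 with only 4 tasks raises an error.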
Some examples of possible command lines to execute QE are given below:
SiO2 test-case (MPI only)
a) mpirun -np 4 $QE_PATH/bin/pw.x < SiO2-50Ry.in > SiO2-50Ry.out
b) mpirun -np 4 $QE_PATH/bin/pw.x -input SiO2-50Ry.in > SiO2-50Ry.out
c) mpirun -np 16 $QE_PATH/bin/pw.x -ntg 2 -ndiag 16 < SiO2-50Ry.in > SiO2-50Ry.out
AuSurf test-case (MPI & OpenMP)
a) export OMP_NUM_THREADS=4; mpirun -np 16 $QE_PATH/bin/pw.x \
-ntg 2 -ndiag 16 < ausurf.in > ausurf.out
b) export OMP_NUM_THREADS=4; mpirun -np 32 $QE_PATH/bin/pw.x \
-ntg 4 -ndiag 16 < ausurf.in > ausurf.out
AuSurf-large test-case (MPI & OpenMP)
a) export OMP_NUM_THREADS=4; mpirun -np 128 $QE_PATH/bin/pw.x \
-ntg 2 -ndiag 64 -npool 2 < ausurf-large.in > ausurf-large.out
b) export OMP_NUM_THREADS=4; mpirun -np 512 $QE_PATH/bin/pw.x \
-ntg 4 -ndiag 64 -npool 8 < ausurf-large.in > ausurf-large.out
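In the MPI & OpenMP examples above, the number of cores in use is the number of MPI tasks times OMP_NUM_THREADS, assuming one thread per core (an assumption, since the text does not state the pinning policy). The Python sketch below assembles the same kind of launch line and reports that product. The helper name hybrid_launch is hypothetical, and the launcher is assumed to accept -np as mpirun does in the examples.

```python
def hybrid_launch(qe_path, mpi_tasks, omp_threads, pw_args, launcher="mpirun"):
    """Return (env, command, cores) for a hybrid MPI + OpenMP run of pw.x.

    env     : environment variables to export before launching
    command : argument list, e.g. for subprocess.run
    cores   : mpi_tasks * omp_threads, assuming one thread per core
    """
    if mpi_tasks < 1 or omp_threads < 1:
        raise ValueError("mpi_tasks and omp_threads must be at least 1")
    env = {"OMP_NUM_THREADS": str(omp_threads)}
    command = [launcher, "-np", str(mpi_tasks), f"{qe_path}/bin/pw.x", *pw_args]
    return env, command, mpi_tasks * omp_threads
```

With 16 tasks and 4 threads, as in AuSurf example a), the function reports 64 cores. To actually run the job, the command list can be handed to subprocess.run with env merged into a copy of os.environ and the input file opened on standard input.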
5 - Collect and Report the Results
5.1 Validate Results
To validate a benchmark result you have to check the value of the Total
Energy at convergence (ETOT). Proceed as follows:
- inside the running directory issue the command:
> grep "total energy *=" MY_OUTPUT_FILE | tail -1
- you should see a string like:
total energy = -XXXX.YYYYYYYY Ry
- or
! total energy = -XXXX.YYYYYYYY Ry
- Note that if this string is not present the result is not valid!
- The value XXXX.YYYYYYYY is the ETOT. It may vary depending on the
number of tasks/command line parameters, but its variation should be limited
to the last 3 digits.
The reference values for the datasets are reported below.
For the SiO2 test case the valid results should have ETOT:
-2622.42376YYY Ry +-0.00001
For the AuSurf test case the valid results should have ETOT:
-11427.0820YYYY Ry +-0.0001
For the AuSurf-large test case the valid results should have ETOT:
-11408.2091YYYY Ry +-0.0001
where each Y can be any digit
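The validation procedure above can be automated. The Python sketch below takes the last "total energy = ... Ry" line of an output, as the grep and tail commands do, and compares it with the reference values listed above. The function names are hypothetical, and treating the trailing Y digits of each reference as zero before applying the +- tolerance is one reading of the notation, not something the benchmark defines.

```python
import re

# Reference ETOT values (Ry) and tolerances, copied from the list above.
# Assumption: the free "Y" digits are taken as zero.
REFERENCE_ETOT = {
    "SiO2": (-2622.42376, 1e-5),
    "AuSurf": (-11427.0820, 1e-4),
    "AuSurf-large": (-11408.2091, 1e-4),
}

# Matches both "total energy = ..." and "! total energy = ..." lines,
# whatever the amount of whitespace around the "=" sign.
ETOT_PATTERN = re.compile(r"total energy\s*=\s*(-?\d+\.\d+)\s*Ry")


def last_etot(output_text):
    """Return the last total energy found in the output, or None."""
    matches = ETOT_PATTERN.findall(output_text)
    return float(matches[-1]) if matches else None


def is_valid(dataset, output_text):
    """True when the final ETOT lies within the tolerance of the reference."""
    reference, tolerance = REFERENCE_ETOT[dataset]
    etot = last_etot(output_text)
    if etot is None:
        # No "total energy" string at all: the result is not valid.
        return False
    return abs(etot - reference) <= tolerance
```

The text of MY_OUTPUT_FILE can be read with open(...).read() and passed to is_valid together with the dataset name.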
5.2 Collect and Report the results
QE has built-in profiling and timing functions, so to
evaluate the performance of a given execution you simply need to
locate the execution wall time (PWSCF_WTIME), which can be found in the
PWSCF timing string (e.g.: PWSCF : 1m52.95s CPU 0m32.17s WALL) at the
end of the output. You can use the command: grep "PWSCF *:" and
take the value labelled WALL. Here "h", "m" and "s" stand for
hours, minutes and seconds.
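To compare runs numerically it helps to convert PWSCF_WTIME to seconds. The Python sketch below extracts the WALL value from the PWSCF timing string and converts it. The function names are hypothetical, and only the "h", "m" and "s" units described above are handled; any other format raises an error instead of being guessed.

```python
import re

# Captures the text between "CPU" and "WALL" on the PWSCF timing line.
WALL_PATTERN = re.compile(r"PWSCF\s*:.*CPU\s+(.*?)\s*WALL")
# Optional hours, minutes and seconds, in that order (e.g. 0h15m, 11m52.68s).
DURATION_PATTERN = re.compile(r"(?:(\d+)h)?(?:(\d+)m)?(?:(\d+(?:\.\d+)?)s)?")


def duration_to_seconds(token):
    """Convert a string such as '1m52.95s' or '0h15m' to seconds."""
    compact = token.replace(" ", "")
    match = DURATION_PATTERN.fullmatch(compact)
    if not compact or match is None:
        raise ValueError(f"unrecognised duration: {token!r}")
    hours, minutes, seconds = match.groups()
    return int(hours or 0) * 3600 + int(minutes or 0) * 60 + float(seconds or 0)


def wall_seconds(output_text):
    """Return PWSCF_WTIME in seconds from the last PWSCF timing line."""
    matches = WALL_PATTERN.findall(output_text)
    if not matches:
        raise ValueError("no PWSCF timing line found")
    return duration_to_seconds(matches[-1])
```

Applied to the example string in the text, wall_seconds picks the 0m32.17s value labelled WALL, not the CPU time that precedes it.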
The results should be recorded and reported using the following table,
where a few sample records for different test cases are shown as an example:
Dataset      | Architecture | # Tasks | # Threads x Task | # GPU | -ntg | -ndiag | -npool | ETOT               | PWSCF_WTIME
-------------------------------------------------------------------------------------------------------------------------------
SiO2         | EURORA       | 1       | 8                | 2     | 1    | 1      | 1      | -2622.42376369 Ry  | 0h15m WALL
-------------------------------------------------------------------------------------------------------------------------------
AuSurf-large | BGQ          | 1024    | 4                | 0     | 2    | 64     | 4      | -11408.20916560 Ry | 11m52.68s WALL
-------------------------------------------------------------------------------------------------------------------------------
AuSurf       | EURORA       | 4       | 8                | 4     | 1    | 1      | 1      | -11427.08209914 Ry | 0h24m WALL
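Records for this table can be produced by a script, which avoids transcription errors. The Python sketch below joins one record into a pipe-separated row with the ten columns of the table. The names COLUMNS and format_row are hypothetical, and column alignment is left out for brevity.

```python
# Column names of the results table, in reporting order.
COLUMNS = ["Dataset", "Architecture", "# Tasks", "# Threads x Task", "# GPU",
           "-ntg", "-ndiag", "-npool", "ETOT", "PWSCF_WTIME"]


def format_row(values):
    """Join one record into a pipe-separated table row.

    Raises ValueError when the record does not have one value per column,
    so that an incomplete result is never reported silently.
    """
    if len(values) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} values, got {len(values)}")
    return " | ".join(str(value) for value in values)
```

Calling format_row(COLUMNS) gives the header line, and each run contributes one further call with its own values.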
6 - Benchmark Rules
The following Quantum ESPRESSO Benchmark Suite rules must be adhered to for
the result of an execution to be considered valid.
- Only minor changes to the source code, especially due to portability issues,
are allowed. In any case, no more than 10% of the source lines (not including
comments and empty lines) can be modified. If any other changes are
introduced, they must be reported with the results in order to be checked
and validated.
- Replacement of the numerical libraries already supported and validated for
Quantum ESPRESSO with alternative libraries is not allowed. However, the
version of a library can be substituted for a more recent release. In such
a case, the version number of the library has to be clearly mentioned when
submitting the results.
- There is no restriction on the usage of the compile-line options.
Nevertheless, for each code, the compile-line options used must be reported
with the final results.
- Use of the C pre-processor is allowed only for supported configure and make
flags as defined in file "make.sys".
- No changes are allowed to the input files provided with the benchmark suite.
- At least three results for each dataset, with different numbers of cores,
should be provided to allow for an estimation of the scalability.
- Any valid combination of Threads and Tasks is allowed. Quantum ESPRESSO
will report invalid combinations.
- Any valid combination of parallelization parameters (-ntg -ndiag -npool)
is allowed. Quantum ESPRESSO will report invalid combinations.
- Extrapolation is allowed for the largest dataset, AuSurf-large.tar.gz.
- Any information concerning non-standard execution (underutilised nodes,
user-defined MPI topologies, MPI task affinity, etc.) must be reported.
- For each execution, the numerical results of a run must pass the validation
check in order for them to be considered valid.
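The rule asking for at least three results per dataset exists so that scalability can be estimated. One common way to do this, shown here as a Python sketch and not as the procedure prescribed by the benchmark, is to compute speedup and parallel efficiency relative to the run with the fewest cores. The function name scalability is hypothetical.

```python
def scalability(results):
    """Estimate strong scaling from (cores, wall_seconds) pairs.

    Returns a list of (cores, speedup, efficiency) tuples sorted by core
    count, with the smallest run as the baseline:
        speedup    = baseline_time / time
        efficiency = speedup * baseline_cores / cores
    """
    if len(results) < 3:
        raise ValueError("at least three results are required")
    ordered = sorted(results)
    base_cores, base_time = ordered[0]
    table = []
    for cores, wall in ordered:
        if cores <= 0 or wall <= 0:
            raise ValueError("cores and wall time must be positive")
        speedup = base_time / wall
        efficiency = speedup * base_cores / cores
        table.append((cores, speedup, efficiency))
    return table
```

With made-up timings of 800 s on 64 cores, 400 s on 128 cores and 250 s on 256 cores, the second run keeps an efficiency of 1.0 while the third drops to about 0.8.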