Benchmark Instructions

Instructions to Configure and Run Quantum ESPRESSO Benchmarks
This document is intended to give a quick and simple introduction to the
Quantum ESPRESSO Benchmark suite.
The document is organized as follows: 1 - Brief Description of Quantum ESPRESSO,
2 - Download and Install Quantum ESPRESSO Benchmark Suite, 3 - List and Purpose
of Datasets, 4 - Run the Benchmarks, 5 - Collect and Report the Results,
6 - Benchmark Rules.
1 - Brief Description of Quantum ESPRESSO
Quantum ESPRESSO (http://www.quantum-espresso.org) is an integrated suite of
computer codes for electronic-structure calculations and materials modelling,
based on density-functional theory, plane waves, and pseudopotentials
(norm-conserving, ultrasoft, and projector-augmented wave). Quantum ESPRESSO
stands for opEn Source Package for Research in Electronic Structure,
Simulation, and Optimization. It is freely available to researchers around the
world under the terms of the GNU General Public License. Quantum ESPRESSO
builds upon newly restructured electronic-structure codes that have been
developed and tested by some of the original authors of novel
electronic-structure algorithms and applied in the last twenty years by some
of the leading materials modelling groups worldwide. Innovation and efficiency
are still its main focus, with special attention paid to massively parallel
architectures, and a great effort being devoted to user friendliness. Quantum
ESPRESSO is evolving towards a distribution of independent and inter-operable
codes in the spirit of an open-source project, where researchers active in the
field of electronic-structure calculations are encouraged to participate in
the project by contributing their own codes or by implementing their own ideas
into existing codes. Quantum ESPRESSO is written mostly in Fortran90, and
parallelised using MPI and OpenMP.
2 - Download and Install Quantum ESPRESSO Benchmark Suite
This benchmark suite uses the latest version of Quantum ESPRESSO, 5.0.3. The
code is publicly available from the Quantum ESPRESSO web site
www.quantum-espresso.org, or from the download pages of the developers portal
(qe-forge.org).
No authentication/registration is required.
Tarballs of the Quantum ESPRESSO source code are available at:
http://www.quantum-espresso.org/download/
Patches for the GPU-enabled version are available at:
https://github.com/fspiga/QE-GPU
The procedure to obtain the source tree for the benchmark follows. Version
5.0.3 is obtained by patching the 5.0.2 release. From a LINUX/UNIX terminal,
issue the following commands:
wget http://qe-forge.org/gf/download/frsrelease/116/403/espresso-5.0.2.tar.gz
wget http://qe-forge.org/gf/download/frsrelease/116/405/PHonon-5.0.2.tar.gz
wget http://qe-forge.org/gf/download/frsrelease/128/435/espresso-5.0.2-5.0.3.diff
wget http://qe-forge.org/gf/download/frsrelease/135/453/QE-GPU-r216.tar.gz
wget http://qe-forge.org/gf/download/frsrelease/142/452/QE-5.0.2_GPU-r216.patch
tar xvzf espresso-5.0.2.tar.gz
cd espresso-5.0.2
tar xvzf ../PHonon-5.0.2.tar.gz
tar xvzf ../QE-GPU-r216.tar.gz
patch -p1 < ../espresso-5.0.2-5.0.3.diff
patch -p1 < ../QE-5.0.2_GPU-r216.patch
The commands above should leave the source tree ready to be configured.
Three different configuration procedures are possible: CPU serial, CPU
parallel, CPU parallel + GPU.
QE is self-contained; nevertheless, for optimal performance it is
usually better to link the code with external standard
libraries: BLAS, LAPACK, FFTW, BLACS/ScaLAPACK (for the parallel build).
The parallel build requires MPI 1.1 and, optionally, OpenMP.
QE supports the use of NVIDIA GPGPUs to accelerate linear
algebra subroutines and a limited number of other time-consuming
subroutines.
2.1 - Configure serial build
A serial build is usually useful to test scalar/vector optimizations.
Proceed as follows:
a) Set the environment variables
- add the compiler executable path to the system PATH
b) Build the executable
- go inside the espresso-5.0.2 directory
> cd espresso-5.0.2
- issue the commands:
> ./configure --disable-parallel
> make all
If everything goes fine you should find the executable "pw.x" in the directory
espresso-5.0.2/bin
2.2 - Configure parallel build
The parallel build is the mainstream and default build of QE.
To obtain a parallel build proceed as follows:
a) Set the environment
Assuming you use the Intel compiler suite and Intel MPI,
set the following environment variables:
> export I_MPI_F77=ifort
> export I_MPI_CXX=icpc
> export I_MPI_ROOT=PATH_TO_INTEL_MPI
> export I_MPI_CC=icc
> export INTELMPI_HOME=PATH_TO_INTEL_MPI
> export INTEL_HOME=PATH_TO_INTEL_SUITE
> export F90=ifort
> export F77=ifort
> export CXX=icpc
> export MKL_INC=PATH_TO_MKL_INCLUDE_SUBDIR
> export MKL_INCLUDE=PATH_TO_MKL_INCLUDE_SUBDIR
> export MKL_LIB=PATH_TO_MKL_LIB_SUBDIR
> export MKL_HOME=PATH_TO_MKL
> export MKLROOT=PATH_TO_MKL
> export LD_LIBRARY_PATH=PATH_TO_MKL_LIB_SUBDIR:PATH_TO_INTEL_MPI_LIB_DIR:PATH_TO_INTEL_SUITE_LIB_DIR
> export LIBPATH=PATH_TO_MKL_LIB_SUBDIR:PATH_TO_INTEL_MPI_LIB_DIR:PATH_TO_INTEL_SUITE_LIB_DIR
> export PATH=PATH_TO_INTEL_SUITE_BIN_DIR:$PATH
b) Build the executable
- go inside the espresso-5.0.2 directory
> cd espresso-5.0.2
- issue the command:
> ./configure --enable-openmp --with-scalapack
- edit the file "make.sys" and substitute the string:
-lmkl_blacs_openmpi_lp64
- with the string:
-lmkl_blacs_intelmpi_lp64
- issue the command:
> make all
If everything goes fine you should find the executable "pw.x" in the directory
espresso-5.0.2/bin
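The "make.sys" substitution described above can also be scripted. The following is a minimal sketch, assuming a sed that accepts -i with a backup suffix (both GNU and BSD sed do). The function name use_intelmpi_blacs is hypothetical and is not part of QE.

```shell
# Hypothetical helper (not part of QE): replace the Open MPI BLACS library
# with the Intel MPI one in the make.sys file generated by ./configure.
# Usage: use_intelmpi_blacs [PATH_TO_MAKE_SYS]   (defaults to ./make.sys)
use_intelmpi_blacs() {
    makesys="${1:-make.sys}"
    # Edit in place, keeping the original file as make.sys.bak.
    sed -i.bak 's/-lmkl_blacs_openmpi_lp64/-lmkl_blacs_intelmpi_lp64/g' "$makesys"
    # Succeed only if the Intel MPI BLACS library is now referenced.
    grep -q -e '-lmkl_blacs_intelmpi_lp64' "$makesys"
}
```

Run it from inside the espresso-5.0.2 directory after ./configure and before "make all". A non-zero exit status means the string was not found, in which case make.sys should be inspected by hand.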
2.3 - Configure parallel + CUDA build
This kind of build is used to take advantage of GPGPU accelerators.
a) Set the environment
If you want to use the Intel compiler suite and CUDA, set the
following environment variables as appropriate:
> export I_MPI_CXX=icpc
> export I_MPI_ROOT=PATH_TO_INTEL_MPI
> export I_MPI_CC=icc
> export INTELMPI_HOME=PATH_TO_INTEL_MPI
> export INTEL_HOME=PATH_TO_INTEL_SUITE
> export F90=ifort
> export F77=ifort
> export CXX=icpc
> export MKL_INC=PATH_TO_MKL_INCLUDE_SUBDIR
> export MKL_INCLUDE=PATH_TO_MKL_INCLUDE_SUBDIR
> export MKL_LIB=PATH_TO_MKL_LIB_SUBDIR
> export MKL_HOME=PATH_TO_MKL
> export MKLROOT=PATH_TO_MKL
> export CUDA_SDK=PATH_TO_CUDA_SDK
> export CUDA_INCLUDE=PATH_TO_CUDA_INCLUDE_DIR
> export CUDA_HOME=PATH_TO_CUDA_HOME
> export CUDA_INC=PATH_TO_CUDA_INCLUDE_DIR
> export CUDA_LIB=PATH_TO_CUDA_LIB_DIR
> export CUDA_CFLAGS=--compiler-bindir=/usr/bin
> export NVCC_HOME=PATH_TO_NVCC_COMPILER_DIR
> export LD_LIBRARY_PATH=PATH_TO_CUDA_LIB_DIR:PATH_TO_MKL_LIB_SUBDIR:PATH_TO_INTEL_MPI_LIB_DIR:PATH_TO_INTEL_SUITE_LIB_DIR
> export LIBPATH=PATH_TO_CUDA_LIB_DIR:PATH_TO_MKL_LIB_SUBDIR:PATH_TO_INTEL_MPI_LIB_DIR:PATH_TO_INTEL_SUITE_LIB_DIR
> export PATH=PATH_TO_NVCC_COMPILER_DIR:PATH_TO_INTEL_SUITE_BIN_DIR:$PATH
b) Build the executable
- issue the commands:
> cd espresso-5.0.2
> cd GPU
> ./configure --enable-parallel --enable-openmp --enable-cuda --with-gpu-arch=35 \
  --with-cuda-dir=${CUDA_HOME} --disable-magma --enable-profiling --enable-phigemm \
  --without-scalapack
- then edit the file PW/Makefile to substitute line 44:
all : tldeps pw-gpu.x manypw-gpu.x
- with the line:
all : tldeps pw-gpu.x
- finally issue the commands:
> cd ..
> make -f Makefile.gpu pw-gpu
If everything goes fine you should find the executable "pw-gpu.x" in the directory
espresso-5.0.2/bin
3 - List and Purpose of Datasets
Three datasets are provided together with this benchmark: a small one
(SiO2.tar.gz), to be used to run benchmarks inside a single node (or device)
and as a test bed for code changes; a medium one (AuSurf.tar.gz) to run
benchmarks on more than one node; and a large one (AuSurf-large.tar.gz) to
run benchmarks on many nodes.
- The SiO2.tar.gz test case requires only a few gigabytes of main memory to
run, and can scale easily up to 32 or 64 cores.
- The AuSurf.tar.gz test case requires less than 32 gigabytes of main memory
and 2 gigabytes of disk space, and you need 2 or 4 nodes to run it.
It can scale up to 256 or 512 cores.
- The AuSurf-large.tar.gz test case is 8 times larger than AuSurf.tar.gz,
and it is meant for benchmark runs on 1024, 2048 or even more cores.
Usually, if the architecture is well balanced (i.e. the network performance
is good enough to support the node performance), Quantum ESPRESSO displays
linear weak scalability with the AuSurf.tar.gz and AuSurf-large.tar.gz test
cases. Under this hypothesis, you can extrapolate the performance of
AuSurf-large.tar.gz from the performance results of AuSurf.tar.gz.
4 - Run the Benchmarks
Quantum ESPRESSO accepts many command-line parameters that control the
internal distribution of data structures, and it can read its input from
standard input.
For parallel execution Quantum ESPRESSO requires the use of a system launcher
command (e.g. mpirun or mpiexec) to distribute the instances of Quantum
ESPRESSO across different nodes. Relevant command-line parameters for this
benchmark are:
-input MY_INPUT_FILE
(tells Quantum ESPRESSO to read input from MY_INPUT_FILE)
-npool P
(tells Quantum ESPRESSO to use P pools to distribute data. P should
be less than or equal to the number of k-points, and maximum scalability
is usually reached with P exactly equal to the number of k-points. You can
read the output to find out the number of k-points of your system)
-ntg T
(tells Quantum ESPRESSO to use T task groups to distribute FFT. Usually
optimal performance can be reached with T ranging from 2 to 8)
-ndiag D
(tells Quantum ESPRESSO to use D processors to perform parallel linear
algebra computation with ScaLAPACK. D can range from 1 to the maximum
number of MPI tasks; the optimal value of D depends on the bandwidth and
latency of your network).
Some examples of possible command lines to execute QE are reported below:
SiO2 test-case (MPI only)
a) mpirun -np 4 $QE_PATH/bin/pw.x < SiO2-50Ry.in > SiO2-50Ry.out
b) mpirun -np 4 $QE_PATH/bin/pw.x -input SiO2-50Ry.in > SiO2-50Ry.out
c) mpirun -np 16 $QE_PATH/bin/pw.x -ntg 2 -ndiag 16 < SiO2-50Ry.in > SiO2-50Ry.out
AuSurf test-case (MPI & OpenMP)
a) export OMP_NUM_THREADS=4; mpirun -np 16 $QE_PATH/bin/pw.x \
-ntg 2 -ndiag 16 < ausurf.in > ausurf.out
b) export OMP_NUM_THREADS=4; mpirun -np 32 $QE_PATH/bin/pw.x \
-ntg 4 -ndiag 16 < ausurf.in > ausurf.out
AuSurf-large test-case (MPI & OpenMP)
a) export OMP_NUM_THREADS=4; mpirun -np 128 $QE_PATH/bin/pw.x \
-ntg 2 -ndiag 64 -npool 2 < ausurf-large.in > ausurf-large.out
b) export OMP_NUM_THREADS=4; mpirun -np 512 $QE_PATH/bin/pw.x \
-ntg 4 -ndiag 64 -npool 8 < ausurf-large.in > ausurf-large.out
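The two constraints stated above for -npool and -ndiag can be checked before a job is submitted. The following is a minimal sketch: the function qe_command and its argument order are hypothetical and not part of QE, and the number of k-points has to be supplied by the user (it can be read from a previous output). The function only prints a command line; it does not run it.

```shell
# Hypothetical helper (not part of QE): check the constraints described above
# and print the resulting command line without executing it.
# Usage: qe_command NTASKS NKPOINTS NPOOL NTG NDIAG INPUT OUTPUT
qe_command() {
    ntasks=$1; nkpoints=$2; npool=$3; ntg=$4; ndiag=$5; input=$6; output=$7
    # -npool P: P should be less than or equal to the number of k-points.
    if [ "$npool" -gt "$nkpoints" ]; then
        echo "npool ($npool) exceeds the number of k-points ($nkpoints)" >&2
        return 1
    fi
    # -ndiag D: D can range from 1 to the number of MPI tasks.
    if [ "$ndiag" -lt 1 ] || [ "$ndiag" -gt "$ntasks" ]; then
        echo "ndiag ($ndiag) must be between 1 and $ntasks" >&2
        return 1
    fi
    echo "mpirun -np $ntasks \$QE_PATH/bin/pw.x -npool $npool -ntg $ntg -ndiag $ndiag -input $input > $output"
}
```

For example, assuming purely for illustration a system with 8 k-points, "qe_command 512 8 8 4 64 ausurf-large.in ausurf-large.out" prints a command line equivalent to example b) of the AuSurf-large test-case, using -input instead of standard input redirection.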
5 - Collect and Report the Results
5.1 Validate Results
To validate a benchmark result you have to check the value of the Total
Energy at convergence (ETOT). Proceed as follows:
- inside the running directory issue the command:
> grep "total energy *=" MY_OUTPUT_FILE | tail -1
- you should see a string like:
total energy   = -XXXX.YYYYYYYY Ry
- or
! total energy   = -XXXX.YYYYYYYY Ry
- Note that if this string is not present the result is not valid!
- The value -XXXX.YYYYYYYY is the ETOT. It may vary depending on the
number of tasks/command line parameters, but its variation should be limited
to the last 3 digits.
Reference values for the datasets are reported below.
For the SiO2 test case valid results should have ETOT:
-2622.42376YYY Ry +/-0.00001
For the AuSurf test case valid results should have ETOT:
-11427.0820YYYY Ry +/-0.0001
For the AuSurf-large test case valid results should have ETOT:
-11408.2091YYYY Ry +/-0.0001
where each Y can be any digit.
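The validation check above can be sketched as two small shell functions. This is an illustration under stated assumptions, not an official validation tool: the function names last_etot and etot_ok are hypothetical, and the reference value is taken to be the quoted number without the free Y digits (e.g. -2622.42376 for SiO2) with the quoted tolerance.

```shell
# Hypothetical helpers (not part of QE).

# Print the last total energy (in Ry) reported in a pw.x output file.
# Usage: last_etot MY_OUTPUT_FILE
last_etot() {
    grep 'total energy *=' "$1" | tail -1 |
        awk -F'=' '{ split($2, parts, " "); print parts[1] }'
}

# Succeed if ETOT differs from the reference REF by at most TOL.
# Usage: etot_ok ETOT REF TOL
etot_ok() {
    awk -v e="$1" -v r="$2" -v t="$3" 'BEGIN {
        d = e - r
        if (d < 0) d = -d
        exit !(d <= t)      # exit status 0 when within tolerance
    }'
}
```

A possible use for the SiO2 test case is: etot_ok "$(last_etot SiO2-50Ry.out)" -2622.42376 0.00001 && echo valid. If the output contains no total energy line, last_etot prints nothing and the result should be treated as not valid.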
5.2 Collect and Report the results
QE already has internal profiling and timing functions, so to
evaluate the performance of a given execution you simply need to
locate the execution wall time (PWSCF_WTIME), which can be found in the
PWSCF timing string (e.g.: PWSCF : 1m52.95s CPU 0m32.17s WALL) at the
end of the output. You can use the command:
> grep "PWSCF *:" MY_OUTPUT_FILE
and take the value labelled as WALL. Here "h", "m" and "s" stand for
hours, minutes and seconds.
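To compare runs it is convenient to convert the WALL value to seconds. The following is a minimal sketch, assuming the timing string has the form shown above (a CPU value followed by a WALL value, each built from "h", "m" and "s" parts). The function names pwscf_wall and wall_to_seconds are hypothetical and not part of QE.

```shell
# Hypothetical helpers (not part of QE).

# Print the WALL value of the last "PWSCF :" line of a pw.x output file.
# Usage: pwscf_wall MY_OUTPUT_FILE
pwscf_wall() {
    grep 'PWSCF *:' "$1" | tail -1 | sed 's/.*CPU *//; s/ *WALL.*//'
}

# Convert a time string such as "1m52.95s" or "0h15m" to seconds.
# Usage: wall_to_seconds TIME_STRING
wall_to_seconds() {
    printf '%s\n' "$1" | awk '{
        s = $0; gsub(/ /, "", s); total = 0
        # Consume one "<number><unit>" part at a time, unit being h, m or s.
        while (match(s, /^[0-9.]+[hms]/)) {
            value = substr(s, 1, RLENGTH - 1) + 0
            unit  = substr(s, RLENGTH, 1)
            if (unit == "h")      total += value * 3600
            else if (unit == "m") total += value * 60
            else                  total += value
            s = substr(s, RLENGTH + 1)
        }
        printf "%.2f\n", total
    }'
}
```

For the timing string quoted above, the WALL value 0m32.17s corresponds to 32.17 seconds.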
The results should be recorded and reported using the following table,
where a few sample records for different test cases are reported as examples:
Dataset      | Architecture | # Tasks | # Threads x Task | # GPU | -ntg | -ndiag | -npool | ETOT               | PWSCF_WTIME
-------------------------------------------------------------------------------------------------------------------------------
SiO2         | EURORA       | 1       | 8                | 2     | 1    | 1      | 1      | -2622.42376369 Ry  | 0h15m WALL
-------------------------------------------------------------------------------------------------------------------------------
AuSurf-large | BGQ          | 1024    | 4                | 0     | 2    | 64     | 4      | -11408.20916560 Ry | 11m52.68s WALL
-------------------------------------------------------------------------------------------------------------------------------
AuSurf       | EURORA       | 4       | 8                | 4     | 1    | 1      | 1      | -11427.08209914 Ry | 0h24m WALL
6 - Benchmark Rules
The following Quantum ESPRESSO Benchmark Suite rules must be adhered to for
the result of an execution to be considered valid.
- Only minor changes of the source code, especially due to portability issues,
are allowed. In any case no more than 10% of the source lines (not including
comments and empty lines) can be modified. If any other changes were
introduced, they must be reported with the results in order to be checked
and validated.
- Replacement of the numerical libraries already supported and validated for
Quantum ESPRESSO with alternative libraries is not allowed. However, the
version of a library can be substituted for a more recent release. In such
a case, the version number of the library has to be clearly mentioned when
submitting the results.
- There is no restriction on the usage of the compile-line options.
Nevertheless, for each code, the compile-line options used must be reported
with the final results.
- Use of the C pre-processor is allowed only for supported configure and make
flags as defined in file "make.sys".
- No changes are allowed to the input files provided with the benchmark suite.
- At least three results for each dataset, with different numbers of cores,
should be provided to allow for an estimation of the scalability.
- Any valid combination of threads and tasks is allowed. Quantum ESPRESSO
will report invalid combinations.
- Any valid combination of parallelization parameters (-ntg -ndiag -npool)
is allowed. Quantum ESPRESSO will report invalid combinations.
- Extrapolation is allowed for the largest dataset, AuSurf-large.tar.gz.
- Any information concerning non-standard execution (underutilised nodes,
user-defined MPI topologies, MPI task affinity, etc.) must be reported.
- For each execution, the numerical results of a run must pass the validation
check in order for them to be considered valid.
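
One possible way to estimate scalability from the three or more results requested above is to compute the speedup and the parallel efficiency relative to the smallest run. The following is a minimal sketch: the function name scaling_report is hypothetical, the definitions used (speedup = T_first / T, efficiency = speedup x cores_first / cores) are a common convention rather than something prescribed by these rules, and the number of cores is taken as tasks x threads per task.

```shell
# Hypothetical helper (not part of QE): read "CORES SECONDS" pairs from
# standard input, ordered by increasing core count, and print for each run
# the core count, the speedup and the parallel efficiency relative to the
# first (smallest) run.
scaling_report() {
    awk 'NR == 1 { c0 = $1; t0 = $2 }   # remember the reference run
         { speedup = t0 / $2
           printf "%d %.2f %.2f\n", $1, speedup, speedup * c0 / $1 }'
}
```

With made-up timings used only for illustration, printf '64 1000\n128 520\n256 280\n' | scaling_report prints one line per run: "64 1.00 1.00", "128 1.92 0.96" and "256 3.57 0.89".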
