USPEX Manual 9.1.0 Release

USPEX Manual 1
USPEX (Universal Structure Predictor: Evolutionary Xtallography).

A.R. Oganov, C.W. Glass, A.O. Lyakhov, P. Pertierra, M.A. Salvado, H.T. Stokes, Q. Zhu

MANUAL.

version 9.1.0. Last modified May 27, 2012. A.R. Oganov, A.O. Lyakhov

Contents:
1. Terminology in the context of structure prediction.
2. Aims and history of the project.
3. Basics of the algorithm.
4. Version history.
5. How to obtain USPEX. How to install it. Necessary citations. Codes that can work with USPEX. On which
machines can USPEX be run?
6. Input and output files. File locations.
7. Input options: the INPUT.txt file.
7.1. Type of run and system.
7.2. Population.
7.3. Survival of the fittest and selection.
7.4. Variation operators.
7.5. Constraints.
7.6. Cell.
7.7. Restart.
7.8. Details of ab initio calculations.
7.9. Hardware-related.
7.10. Remote settings.
7.11. Fingerprint settings.
7.12. Space groups.
7.13. Many-parents settings.
7.14. Statistics for developers.
8. Additional input for special cases:
8.1. molecular crystals: MOL_1, MOL_2, files.
8.2. variable-composition code.
9. How to
9.1. How to visualize results.
9.2. How to avoid trapping.
9.3. How to use seed technique.
9.4. How to set up passwordless connection from your local machine to remote cluster.
9.5. How to adapt USPEX to your cluster.
9.6. How to set up remote submission.
10. References.
11. Appendix 1: test runs.
12. Appendix 2: sample input file INPUT.txt
13. Appendix 3: sample short input file INPUT.txt
14. Appendix 4: list of space groups.
15. Appendix 5: list of most important point groups.

1. TERMINOLOGY IN THE CONTEXT OF STRUCTURE PREDICTION.
Crystal structure prediction problem: the problem of finding the most stable (lowest free energy) structure for a given
chemical composition at given external conditions (such as pressure and temperature).
Evolutionary algorithm: a broad class of global optimization algorithms operating with populations of candidate
solutions and featuring selection, production of offspring (through recipes known as variation operators), and survival of
the fittest. There is no such thing as "the evolutionary algorithm", because the construction of variation operators,
representation of solutions, type of fitness function, etc. play a crucial role in the performance of an algorithm, and for each
type of problem one needs to construct a specialized evolutionary algorithm.
Genetic algorithm: a subclass of evolutionary algorithms that uses binary "0/1" strings for representation. USPEX is not a
genetic algorithm, but an evolutionary algorithm.
Local optimization = structure relaxation.
Niching: removal of duplicate structures. USPEX does it using fingerprint functions defined in (Valle & Oganov, 2008;
Oganov & Valle, 2009).
Fingerprint function: an identifier of the structure based on interatomic distances (or, more generally, many-particle
correlation functions) (Valle & Oganov, 2008; Oganov & Valle, 2009).
Seed technique: insertion of already known reasonable structures into the initialization of USPEX structure searches.
Space group vs lattice vs crystal structure: these terms are often confused. There are only 230 space
groups and 14 Bravais lattices, but the number of possible distinct crystal structures is infinite. A space group is the set of all
symmetry operators present in the structure. A Bravais lattice is the set of translation vectors in the space group. That is, space
group and lattice are mathematical objects, while crystal structure is a physical object defining where the atoms sit.
Density functional theory (DFT), exact or approximate? In principle, DFT is exact, but in all practical calculations one
uses approximate flavors of DFT such as LDA, GGA, meta-GGA, or hybrid functionals. Looking for the global minimum
of the approximate energy surface will give realistic results only if the underlying approximations for the energy are
reasonable. For example, don't use LDA for cuprate superconductors (where LDA doesn't work well at all): the results will be
garbage! Using LDA for studying normal metals, semiconductors or ionic salts is a perfectly valid approach.
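
To make the terms "fingerprint function" and "niching" more concrete, here is a minimal Python sketch. It is not the fingerprint of Valle & Oganov: it simply histograms interatomic distances of a finite set of atoms (periodic images are ignored) and treats two structures as duplicates when the cosine distance between their histograms is small. The function names, the cutoff, the bin width and the tolerance are hypothetical choices made only for this illustration.

```python
import itertools
import math


def distance_fingerprint(positions, cutoff=5.0, bin_width=0.25):
    """Toy fingerprint: histogram of pairwise distances shorter than `cutoff`."""
    n_bins = int(round(cutoff / bin_width))
    hist = [0.0] * n_bins
    for a, b in itertools.combinations(positions, 2):
        d = math.dist(a, b)
        if d < cutoff:
            hist[min(int(d / bin_width), n_bins - 1)] += 1.0
    return hist


def cosine_distance(f1, f2):
    """0 for parallel fingerprints, 1 for fingerprints with no common bins."""
    dot = sum(x * y for x, y in zip(f1, f2))
    norm = math.sqrt(sum(x * x for x in f1)) * math.sqrt(sum(y * y for y in f2))
    if norm == 0.0:
        return 0.0  # two empty fingerprints are treated as identical
    return 1.0 - dot / norm


def remove_duplicates(structures, tolerance=0.01):
    """Niching: keep the first structure of every group of near-identical ones."""
    kept, kept_fingerprints = [], []
    for structure in structures:
        fp = distance_fingerprint(structure)
        if all(cosine_distance(fp, other) > tolerance for other in kept_fingerprints):
            kept.append(structure)
            kept_fingerprints.append(fp)
    return kept


square = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
shifted_square = [(x + 3.0, y - 2.0, z) for x, y, z in square]  # same shape, translated
chain = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)]
unique = remove_duplicates([square, shifted_square, chain])
print(len(unique))
```

Because the fingerprint depends only on distances, the translated copy of the square is recognised as a duplicate, while the chain is kept as a distinct structure.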
Some details are given further in this Manual. For more, see References or consult the book below:

2. AIMS AND HISTORY OF THE PROJECT.
The USPEX (Universal Structure Predictor: Evolutionary Xtallography; in Russian, "uspekh" means
"success", owing to the high success rate and many useful results produced by this method!) code possesses
rather unique capabilities: it allows one to predict the stable structure of a given compound at given
conditions (pressure, temperature) just from the knowledge of the chemical composition and using no
experimental information. From the beginning, this non-empirical crystal structure prediction was the main
aim of the USPEX project. This is achieved by merging a specially developed evolutionary algorithm featuring
local optimization and real-space representation with ab initio simulations. In addition to this fully non-
empirical search, USPEX allows one to predict also a large set of robust metastable structures and perform
several types of simulations using various degrees of prior knowledge.
The problem of crystal structure prediction is very old and does, in fact, constitute the central problem of
theoretical crystal chemistry. In 1988 John Maddox [1] wrote that:

"One of the continuing scandals in the physical sciences is that it remains in general impossible to predict the
structure of even the simplest crystalline solids from a knowledge of their chemical composition... Solids such as
crystalline water (ice) are still thought to lie beyond mortals' ken."

It is immediately clear that the problem at hand is that of global optimization, i.e. finding the global
minimum of the free energy of the crystal (per mole) with respect to variations of the structure. To get some
feeling of the number of possible structures, let us consider a simplified case of a fixed cubic cell with volume
V, within which one has to position N identical atoms. For further simplification let us assume that atoms can
only take discrete positions on the nodes of a grid with resolution δ. This discretisation makes the number C of
combinations of atomic coordinates finite:

$$ C = \frac{1}{(V/\delta^3)} \, \frac{(V/\delta^3)!}{[(V/\delta^3) - N]! \, N!} \tag{1} $$

If δ is chosen to be a significant fraction of the characteristic bond length (e.g., δ = 1 Å), the number of
combinations given by (1) would be a reasonable estimate of the number of local minima of the free energy. If
there is more than one type of atoms, the number of different structures significantly increases. Assuming a
typical atomic volume ~10 Å^3, and taking into account Stirling's formula ($n! \approx \sqrt{2\pi n}\,(n/e)^n$), the number of
possible structures for an element A (compound AB) is $10^{11}$ ($10^{14}$) for a system with 10 atoms in the unit cell,
$10^{25}$ ($10^{30}$) for a system with 20 atoms in the cell, and $10^{39}$ ($10^{47}$) for the case of 30 atoms in the unit cell.
One can see that these numbers are enormous and practically impossible to deal with even for small systems
with the total number of atoms N ~ 10. Even worse, these numbers increase exponentially with N. It is clear
then, that point-by-point exploration of the free energy surface going through all possible structures is not
viable, except for the simplest systems with ~1-5 atoms in the unit cell.
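
The estimates above can be checked numerically. The short Python sketch below evaluates Equation (1) with exact integer binomials for δ = 1 Å and an atomic volume of 10 Å^3, so that the grid has 10N nodes. The factor used for the AB compound (the number of ways to label half of the N occupied nodes as A and half as B) is an assumption of this example, chosen because it reproduces the orders of magnitude quoted in the text; all function names are hypothetical.

```python
import math


def count_element(n_atoms, atomic_volume=10.0, delta=1.0):
    """Equation (1): C = (1/M) * M! / ((M - N)! N!), with M = V / delta^3 grid nodes."""
    n_nodes = round(n_atoms * atomic_volume / delta**3)
    return math.comb(n_nodes, n_atoms) / n_nodes


def count_binary(n_atoms, atomic_volume=10.0, delta=1.0):
    """Assumed extension to a compound AB: every arrangement of N atoms is
    multiplied by the number of ways to choose which N/2 of them are A."""
    return count_element(n_atoms, atomic_volume, delta) * math.comb(n_atoms, n_atoms // 2)


def stirling(n):
    """Stirling's approximation n! ~ sqrt(2*pi*n) * (n/e)^n."""
    return math.sqrt(2.0 * math.pi * n) * (n / math.e) ** n


for n in (10, 20, 30):
    print(n, f"element: {count_element(n):.1e}", f"compound AB: {count_binary(n):.1e}")
```

Even for N = 10, Stirling's formula is within about 1% of the exact factorial, which is why it is adequate for order-of-magnitude estimates of this kind.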
Progress in solving this problem has been insufficient, as revealed by the results of blind tests [2], where a
large variety of methods were tested against experimentally solved structures. Thus, new approaches are
needed. Previous approaches that have been devised to solve this problem are discussed in Ref. 3. USPEX [3,4]
employs an evolutionary algorithm devised by A.R. Oganov and C.W. Glass, with major later contributions by
A.O. Lyakhov and Q. Zhu. Its efficiency draws from the problem-specific variation operators and extensive
tuning, while its reliability is largely due to the use of state-of-the-art ab initio simulations inside the
evolutionary algorithm. The strength of evolutionary simulations is that they do not necessarily require any
system-specific knowledge (except chemical composition) and are self-improving, i.e. in subsequent
generations increasingly good structures are found and used to generate new structures. This allows a
zooming in on promising regions of configuration space. Furthermore, due to the flexible nature of the
variation operators, it is very easy to incorporate additional features into an evolutionary algorithm.
A major motivation for our development of USPEX has been the discovery of the post-perovskite phase of
MgSiO3, which was made in 2004 [6,7] and has significantly changed models of the Earth's internal structure.
Several months later, when Colin W. Glass joined Oganov's group in August 2004, we started the
development of USPEX. In 2006-2008, when Yanming Ma was A.R. Oganov's postdoc, USPEX was applied to
a number of important problems. A new major turn took place in August 2007, when Andriy O. Lyakhov took
over the role of the main code developer. Qiang Zhu (current USPEXmaster and an active code developer)
joined us in September 2009. By September 2010, when USPEX was publicly released, over 50 papers
(including 2 in Nature, 5 in Phys. Rev. Lett., 4 in PNAS) had been written on USPEX or using this methodology, and
its user community numbered nearly 200. Things develop fast: we now have a larger community of developers
(which everyone is welcome to join) and over 800 users, as of May 2012.

Prediction of the crystal structure of MgSiO3 at 120 GPa (20 atoms/cell). Enthalpy of the best structure as a
function of generation is shown. Between the 6th and 12th generations the best structure is perovskite, but at the 13th
generation the global minimum (post-perovskite) is found. This simulation used no experimental information and
illustrates that USPEX can find both the stable and low-energy metastable structures in a single simulation. Each
generation contains 30 structures. This figure illustrates the slowest of ~10 calculations done by the very first version of
USPEX, and yet it is already pretty fast!

The popularity of USPEX is due to its extremely good efficiency and reliability. This was shown in the First
Blind Test for Inorganic Crystal Structure Prediction [9], where USPEX outperformed the other methods it was
tested against (simulated annealing and random sampling). Random sampling (a technique pioneered for
structure prediction by Freeman and Schmidt in 1993 and 1996, respectively, and since 2006 revived by
Pickard [12] under the name AIRSS, Ab Initio Random Structure Searching) is the simplest, but also the least
successful and reliable strategy. Due to the exponential scaling of the complexity of structure search (eq. 1),
the advantages of USPEX also increase exponentially with system size. But already for small systems such as
GaAs with 8 atoms/cell these advantages are large (random sampling requires on average 500 structure
relaxations to find the ground state in this case, while USPEX finds it after only ~30 relaxations!). For
instance, 2 out of 3 structures of SiH4 predicted by random sampling to be stable [12] turned out to be
incorrect [13]; similarly, predictions of random sampling were shown [14] to be incorrect for nitrogen [15] and for
SnH4 (compare predictions [16] of USPEX and of random sampling [17]).
Structure prediction for GaAs: a) energy distribution for relaxed random structures, b) progress of an evolutionary
simulation (thin vertical lines show generations of structures, and the grey line shows the lowest energy as a function of
generation). All energies are relative to the ground-state structure. The evolutionary simulation used a population of 10
structures. The first generation was produced randomly. Each subsequent generation was produced from the lowest-
energy 40% of the previous generation. 60% of the new structures were generated through heredity, 20% by lattice
mutation and 20% through permutation. In addition, the lowest-energy structure of the previous generation survived
into the next generation.

For larger systems random sampling tends to produce almost exclusively disordered structures with nearly
identical energies, which decreases the success rate to practically zero, as shown in the example of MgSiO3
post-perovskite with 40 atoms/supercell: random sampling fails to find the correct structure even after 120,000
relaxations, whereas USPEX finds it after just several hundred relaxations. An important note is that random
sampling runs can easily be done with USPEX, for those who would like to try, but we see this as useful mostly
for testing. Likewise, Boldyrev's version of the Particle Swarm Optimization (PSO) algorithm for crystal
structure prediction (recently re-implemented by Wang, Lv, Zhu and Ma) can be implemented on the basis of
USPEX with minor programming work, and we implemented a corrected PSO algorithm, which, however,
only shows that all existing PSO methods are much less efficient and reliable than USPEX. Again, we see the
PSO approach as suitable mostly for testing purposes, if anyone wants to try. A very powerful new method,
complementary to our evolutionary algorithm, is evolutionary metadynamics [19], a hybrid of Martonak's
metadynamics and the Oganov-Glass evolutionary approach. This method is powerful for global optimization
and for harvesting low-energy metastable structures, and even for finding possible phase transition pathways.

Sampling of the energy surface: comparison of random sampling and USPEX for a 40-atom cell of MgSiO3
with cell parameters of post-perovskite. Energies of locally optimized structures are shown. For random sampling,
1.2x10^5 structures were generated (none of them corresponded to the ground state). For the USPEX search, each generation
included 40 structures and the ground-state structure was found within 15 generations. The energy of the ground-state
structure is indicated by the arrow. This picture shows that "learning" incorporated in evolutionary search drives the
simulation towards lower-energy structures.

In its current version, the code has a minimal input: the number of atoms of each sort, pressure-temperature
conditions and algorithm parameter values: size of the population (i.e. the number of structures in each
generation), hard constraints [35], the number of structures used for producing the next generation, and the
percentage of structures obtained by lattice mutation, atomic permutation and heredity. Other parameters
define how often structure slices combined in heredity are randomly shifted, how many atomic permutations
are done per structure, and how strong lattice mutation is. The use of specially designed fingerprint functions
for niching helps to speed up structure search and prevent sticking to local minima. Optionally, calculations
can be performed under fixed lattice parameters (if these are known from experiment). It is possible to
perform variable-composition simulations, where one specifies only the atomic types, and USPEX should find
both the stable compositions and the corresponding structures. It is also possible to predict structures of
nanoparticles and to study packing of molecules in molecular crystals.

Overview of capabilities: The table below shows different styles of calculation and their mutual compatibility
in the latest version (+ means ready for production runs):
                           Non-molecular Molecular   Variable-composition  Properties   Evolutionary metadynamics
VASP                       +             -           +                     +            +
SIESTA                     +             +           +                     +            -
GULP                       +             +           +                     +            +
CP2k                       +             -           +                     -            To be implemented
QuantumEspresso            +             -           +                     -            -
DMACRYS                    +             +           +                     -            -
MD++                       +             -           -                     -            -
Seeds                      +             +           +                     +            N/A
Restart                    +             +           +                     +            +
CellSplitting              +             +           +                     +            N/A
Space groups               +             +           +                     +            +
Fingerprints               +             +           +                     +            +
Local order                +             +           +                     +            +
Molecular                  -             +           In progress           +            +
Variable-composition       +             In progress +                     +            -
Properties                 -             +           +                     +            +
Evolutionary metadynamics  +             +           -                     +            +

Post-processing, or analysis of the data, is extremely important, and in this respect USPEX also occupies a
unique niche, benefiting from an interface specifically developed for USPEX by Mario Valle in his STM4 code.
This includes analysis of thousands of structures in a matter of a few minutes, determination of
structure-property correlations, analysis of algorithm performance, quantification of the energy landscapes,
state-of-the-art visualization of the structures, determination of space groups, etc., including even
preparation of movies showing the progress of the simulation! USPEX also generates some figures, so you can
monitor its results and analyze them:
1. Energy_vs_N.tif (Fitness_vs_N.tif): energy (fitness) as a function of structure number;
2. Energy_vs_Volume.tif: energy as a function of volume;
3. BestEnthalpy.tif (BestFitness.tif): best enthalpy (fitness) as a function of generation number;
4. quasiEntropy.tif: quasi-entropy as a function of generation number;
5. Ediff_vs_N.tif: energy of the child vs parent(s) energy; different operators are marked with different
colors (this graph allows one to assess the performance of different variation operators);
6. Ei_vs_Ei+1.tif: correlation between energies from relaxation steps i and i+1; helps to detect problems
and improve input for relaxation files.
For variable compositions there are additional graphs, like:
1. C-O-volumes.tif: shows the volume/atom of the system vs composition, useful for determination of
the correct atomic volumes for the input file;
2. C-O-decomposition-enthalpy.tif: shows the enthalpy of formation as a function of composition.

3. BASICS OF OUR EVOLUTIONARY ALGORITHM.
Crystal structures (lattice vectors and atomic coordinates) are represented by real numbers - not by the often
used, but unphysical, binary "0/1" strings. Every candidate structure is locally optimized (i.e. relaxed) and
replaced by the locally optimal structure. For structure relaxation we use conjugate-gradients or
steepest-descent methods, available in many first-principles and atomistic simulation codes. Currently, USPEX can use
VASP [38], SIESTA [39], Quantum Espresso and CP2k for first-principles simulations, and GULP [40], MD++ and
DMACRYS for atomistic simulations. Structure relaxation is split into stages, starting from crude and ending
with very fine. This allows extremely accurate calculations at low cost. During optimization with ab initio
methods, the k-point grid changes in accordance with cell changes. This enables strict comparability between
all obtained free energies while keeping the computational costs low. The following variation operators are
used: heredity, lattice mutation, permutation, softmutation.
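
One way to picture a k-point grid that follows the cell is sketched below in Python. It assumes (this is an assumption of the example, not a description of the USPEX source code) that the grid is derived from a target spacing in reciprocal space: the number of divisions along each reciprocal lattice vector is its length divided by that spacing, rounded up. The function names and the `resolution` parameter, taken here in units of 2π/Å, are hypothetical.

```python
import math


def cross(u, v):
    """Cross product of two 3-vectors."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def kpoint_grid(lattice, resolution):
    """Divisions (n1, n2, n3) along the reciprocal vectors of `lattice`.

    lattice    : three real-space lattice vectors (Angstrom)
    resolution : target k-point spacing in units of 2*pi/Angstrom (assumed convention)
    """
    a1, a2, a3 = lattice
    volume = abs(sum(x * y for x, y in zip(a1, cross(a2, a3))))
    grid = []
    for u, v in ((a2, a3), (a3, a1), (a1, a2)):
        # |b_i| / (2*pi) = |a_j x a_k| / V
        length = math.sqrt(sum(c * c for c in cross(u, v))) / volume
        grid.append(max(1, math.ceil(length / resolution)))
    return tuple(grid)


cell = [(4.0, 0.0, 0.0), (0.0, 8.0, 0.0), (0.0, 0.0, 3.0)]
# Staged relaxation: a coarse grid for the crude stage, a denser one for the fine stage.
for resolution in (0.12, 0.06):
    print(resolution, kpoint_grid(cell, resolution))
```

With this convention a long cell edge automatically receives fewer k-points than a short one, and a finer resolution in later relaxation stages gives a denser grid for the same cell.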
Our procedure is the following:
1. The first generation is produced by a random-number generator (only those structures which satisfy the
hard constraints are allowed). A non-random start from some good structures provided by the user is also possible.
Though this is not necessary, we recommend that the first generation be produced randomly using space
group symmetry, but the code also works well if you don't use space groups! Even if you use symmetrized
structures, the algorithm finds whether the structure wants to break symmetry and breaks it, if needed!
2. Among the locally optimized structures, a certain number of the worst ones are rejected, and the remaining
structures participate in creating the next generation through heredity, permutation and mutation. Duplicate
structures are removed using the fingerprint method [8]. Selection probabilities for variation operators are
derived from the rank of their fitness (i.e. their calculated free energies).
During heredity, new structures are produced by matching slices (chosen in random directions and with
random positions) of the parent structures. A certain fraction of structures is produced by randomly shifting
these slices in their matching plane. Heredity for the lattice vectors matrix elements (for this matrix we use
the upper-triangular form in order to avoid unphysical whole-cell rotations) is done by taking a weighted
average, using random weights.
A certain fraction of the new generation is created by permutation (i.e. switching identities of two or more
atoms in a structure) and lattice mutation (random change of the cell vectors). Lattice mutation essentially
incorporates into our method the ideas of metadynamics [9,21], where new structures are found by building up
cell distortions of some known structure. Unlike in metadynamics, in our method the distortions are not
accumulated, so to obtain new structures the strain components should be large. Softmutation obtains new
structures by large displacements of the atoms along the eigenvectors of the softest phonon modes; this relies
on our very efficient approximate phonon technique that requires no input parameters and takes negligibly
small time.
To avoid pathological lattices all newly obtained structures are rescaled to have a certain volume, which is
then relaxed by local optimization. The value of the rescaling volume can be easily estimated either from the
equation of state of some known structure or by optimizing a random structure; this value is used only for the
first generation and for subsequent generations is adapted to the volumes of several best found structures. A
specified number of the best structures of the current generation always survive, mate and compete in the
next generation.
3. The simulation is terminated after some halting criterion is met. In our experience, for systems with up to
~10 atoms in the cell the global minimum is often found within the first few generations, for systems with ~20
atoms in the cell this usually takes up to ~10-20 generations. Among the important results of the simulation
are the stable crystal structure and a set of robust metastable structures at given pressure-temperature
conditions.
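
The overall flow of steps 1-3 can be sketched on a toy problem. The Python example below is not the USPEX implementation: candidates are plain lists of real numbers rather than crystal structures, the "energy" is the Rastrigin test function (many local minima, global minimum 0 at the origin), relaxation is a crude coordinate descent, heredity splices two parents at a random cut, and mutation adds Gaussian noise. Permutation, softmutation, fingerprint niching and volume rescaling are left out. All names and parameter values are assumptions made for illustration.

```python
import math
import random


def energy(x):
    """Toy energy surface (Rastrigin function)."""
    return sum(xi * xi - 10.0 * math.cos(2.0 * math.pi * xi) + 10.0 for xi in x)


def relax(x, step=0.1, tol=1e-4):
    """Crude local optimization: coordinate descent with a shrinking step."""
    x = list(x)
    e = energy(x)
    while step > tol:
        improved = False
        for i in range(len(x)):
            for delta in (step, -step):
                trial = x[:i] + [x[i] + delta] + x[i + 1:]
                e_trial = energy(trial)
                if e_trial < e:
                    x, e, improved = trial, e_trial, True
        if not improved:
            step *= 0.5
    return x, e


def select_parent(ranked, rng):
    """Rank-based selection: the best of n parents has weight n, the worst weight 1."""
    n = len(ranked)
    return rng.choices(ranked, weights=[n - i for i in range(n)], k=1)[0]


def evolve(dim=4, pop_size=10, n_generations=20, parent_fraction=0.4,
           heredity_fraction=0.6, seed=0):
    rng = random.Random(seed)
    # Step 1: random first generation; every candidate is locally optimized.
    population = [relax([rng.uniform(-5.0, 5.0) for _ in range(dim)])
                  for _ in range(pop_size)]
    history = []
    for _ in range(n_generations):
        population.sort(key=lambda pair: pair[1])
        history.append(population[0][1])
        # Step 2: reject the worst candidates; the rest become parents.
        n_parents = max(2, int(parent_fraction * pop_size))
        parents = [x for x, _ in population[:n_parents]]
        children = [population[0]]  # the best candidate survives unchanged
        while len(children) < pop_size:
            if rng.random() < heredity_fraction:
                p1, p2 = select_parent(parents, rng), select_parent(parents, rng)
                cut = rng.randrange(1, dim)
                child = p1[:cut] + p2[cut:]          # heredity
            else:
                parent = select_parent(parents, rng)
                child = [xi + rng.gauss(0.0, 1.0) for xi in parent]  # mutation
            children.append(relax(child))
        population = children
    # Step 3: here the halting criterion is simply a fixed number of generations.
    population.sort(key=lambda pair: pair[1])
    history.append(population[0][1])
    return population[0], history


best, history = evolve()
print("best energy per generation:", [round(e, 3) for e in history])
```

Because the best candidate always survives into the next generation, the best energy can never increase from one generation to the next, which mirrors the shape of the best-enthalpy plots produced by USPEX.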
USPEX allows one to find the stable crystal structure of a given compound at given external conditions
(pressure, temperature, etc.). Moreover, it also produces a set of robust metastable structures. Unlike
traditional simulation methods that only sample a small part of the free energy landscape close to some
minimum, our method explores the entire free energy surface on which it locates the most promising areas.
This allows one to see which aspects of structures (molecular vs coordination or metallic vs insulating
structures, atomic coordination numbers, bond lengths and angles) are required for stability and therefore
provides an interesting way of probing structural chemistry of matter at different conditions. Our present
implementation in the USPEX code is very efficient for systems with up to ~200 atoms in the unit cell. With
additional developments we expect the method to be efficient also for larger systems. Since no symmetry
constraints are imposed during simulations, symmetry is one of the results of our algorithm. This ensures that
the resulting structures are mechanically stable and do not contain any unstable Γ-point phonons.
Our approach enables crystal structure prediction without any experimental input. Essentially, the only input
is the chemical formula (then, we typically perform simulations for different numbers of formula units in the
cell). However, in some cases, we would like to be able to predict also stable stoichiometries. One of the first
steps in this direction was taken in Ref. 44, who have applied an ab initio evolutionary algorithm to find stable
alloys. However, in their work the structure was fixed (only fcc- and bcc- structures were explored). Our
algorithm incorporates variable-composition searches in order to find stable structures and likely
stoichiometries in a given system. Among the current limitations of the method is the limitation to ordered
periodic structures, but it can be overcome once it becomes possible to calculate free energies of disordered
and aperiodic structures. Another limitation is that most calculations are done at T=0 K, because finite-T free
energy optimization is too expensive at the ab initio level; such calculations with interatomic potentials have,
however, been possible since 2005 [3]! Then, clearly, the quality of the global minimum found by USPEX
depends on the quality of the ab initio description of the system. Present-day DFT simulations (e.g., within the
GGA) are adequate for most situations, but it is known that these simulations do not fully describe the bonding
and electronic structure of Mott insulators, and until recently there were problems with the DFT description
of the van der Waals bonding. In both areas there are significant achievements (see, e.g., Ref. 45-51), which
can be used for calculating ab initio free energies and evaluation of structures in evolutionary simulations.

4. VERSION HISTORY.
Versions up to 6.1 were written by C.W. Glass (he also partly debugged molecular options in 6.7.3). Starting from 6.2,
A.O. Lyakhov is the main developer. Experience with Octave from M. Salvado and P. Pertierra, later complemented by
A.O. Lyakhov, enabled the use of USPEX under Octave. Q. Zhu has finalized the molecular packing prediction code
(in v.8.5) and the powerful evolutionary metadynamics code. H.T. Stokes has contributed his powerful code for
initialization of the first generation of random structures with space group constraints in v.8.5.0, and his space group
determination code in v.8.6.0.
v.1 - Evolutionary algorithm without local optimization. Real-space representation, interface with VASP. Experimental
version. October 2004.
v.2 - CMA-ES implementation (CMA-ES is a powerful global optimization method developed by N. Hansen).
Experimental version. January 2005.
v.3 - Evolutionary algorithm with local optimization.
v.3.1 - Working versions, sequential. Major basic developments.
3.1.1-3.1.3 - Versions for debugging. April 2005.
3.1.4-3.1.5 - First production version. Based largely on heredity with slice-shifting and with minimum-parent
contribution (hard-coded to be 0.25). May 2005.
3.1.6-3.1.7 - Experimental versions (development of new options). August 2005.
3.1.8 - Adaptation of k-point grids. 15/10/2005.
3.1.9 - Improved output. 28/10/2005.
3.1.10 - Automatic control of structure relaxation (before this was done manually). Slice-shift mutation. Experimental
version. 31/10/2005.
3.1.11 - Restart from arbitrary generation. Experimental version. 04/11/2005.
3.1.12 - Production version based on v.3.1.11, variable slice-shift mutation. 11/11/2005.
3.1.13 - Adaptive scaling volume. 29/11/2005.
3.1.14 - Basic seed technique. 29/11/2005 (debugged 6/12/2005).
3.1.15 - Improved output. 15/12/2005.
v.3.2 - Massively parallel version.
3.2.1 - Massive parallelisation. 16/11/2005.
3.2.1.1 - Corrected version with an arbitrary number of CPUs/job. 21/11/2005.
3.2.2 - Improved output. Replaced InterplanarDistance constraint by LatticeParameter constraint. 15/12/2005.
v.4 - Unified parallel/sequential version.
4.1.1 - Lattice mutation. 20/12/2005 (debugged 10/01/2006).
4.2.1 - Interfaced with SIESTA. Initial population size allowed to be different from the running population size.
24/01/2006 (debugged 20/04/2006).
4.2.2 - Number of kept best individuals can be specified. Experimental version. 21/04/06.
4.2.3 - Relaxation of best structures made optional. Version with fully debugged massive parallelism. 25/04/06.
4.3.1 - Implementation for CRAY XT3 supercomputer. 22/05/2006.
4.4.1 - Interfaced with GULP. 08/05/2006.
4.4.2 - POTCAR_1, POTCAR_2 etc. allowed. 08/05/06.
v.5 - Completely rewritten and debugged version, clear modular structure of the code.
5.1.1 - New platforms: Blanc, Gonzales. Sequential mode temporarily abandoned. Atom-specific permutation, code
interoperability, on-the-fly reading of parameters from INPUT_EA.txt. 20/12/2006.
5.2.1 - SIESTA-interface for Z-matrix, rotational mutation operator, only remote job submission enabled (experimental
version). 01/03/2007.
v.6 - Production version, with both remote and local job submission enabled.
6.1.1 - Variation in CPU numbers allowed. Created standard tests. 04/04/2007.
6.1.2 - Density of k-points changes within structure relaxation (from stage_1 to stage_2 to ...), allowing extremely
accurate and cheap treatment of metals and semiconductors. 06/04/2007.
6.1.3 - To efficiently fulfil hard constraints for large systems, an optimizer has been implemented within USPEX.
07/06/2007.
6.2 - Development version.
6.3.1-6.3.2 - Introduced angular constraints for cell diagonals. Completely rewritten remote submission. Improved input
format. Further extended standard tests. 07/12/2007.
6.3.3 - X-com grid interface (with participation of S. Tikhonov and S. Sobolev). In progress (05/03/2008).
6.4.1 - Fingerprint functions for niching. 07/04/2008.
6.4.2 - Debugged fingerprinting. Separate identity threshold for survival of the fittest. 16/04/2008.
6.4.3 - Debugged SIESTA- and GULP-interfaces. 19/04/2008.
6.4.4 - Space group recognition (debugged, but still problematic), with an option to switch it off. Fast fingerprints (from
tables). 05/05/2008.
6.4.5-6.4.7 - Further debugging. Development of features for next version. June-July 2008.
6.5.1 - Split-cell method for very large systems. Easy remote submission. Variable number of best structures (energy
clustering). 16/07/2008.
6.6.1 - A very robust version, with local execution re-enabled, and improved fingerprint and split-cell implementations.
13/08/2008.
6.6.2 - Adaptation of v6.6.1 for compatibility with older versions of Matlab. 16/08/2008.
6.6.3 - Heredity with multiple parents implemented. 01/10/2008.
6.6.4 - Added a threshold for parents participating in heredity (niching). 03/10/2008.
6.6.5 - Changes in remote submission files, clean-up of the USPEX code. 11/2008.
6.6.6 - First implementation of multicomponent fingerprints. 04/12/2008.
6.6.7, 6.7.1 and 6.7.2 - Implemented quasi-entropy to measure the diversity of the population, moved CEL and SPF to
separate folder. 10/12/2008.
6.7.3 - Largely debugged the case of 2-atom molecules. Stopping criterion was based on the quasi-entropy of the
population. 14/01/2009.
v.7 - Production version, written to include variable composition.
7.1.1-7.1.7 - Series of improved versions. v.7.1.7 has been distributed to ~200 users. Variable composition partly coded,
most known bugs fixed, improved tricks based on energy landscapes. Improved cell splitting, implemented
pseudo-subcells. Implemented multicomponent fingerprints (much more sensitive to the structure than one-component
fingerprints). 28/04/2009 (version finalized 28/05/2009).
7.2.5 - First fully functional version of the variable-composition method (7.2.1-7.2.4 were development versions).
Introduced transmutation operator and compositional entropy. 6/09/2009.
7.2.7 - Thoroughly debugged, improved restart capabilities, improved seeding, interface to M. Baskes's MD++ code,
introduced perturbations within structure relaxation, introduced biased fitness function for variable-composition.
25/09/2009, further improved in versions 7.2.8/9.
7.3.0 Full fingerprint support in the variable-composition code, including niching. Fair algorithm for producing the
first generation of compositions. 22/10/2009.
7.3.1 introduced (optionally) heredity and permutation biased by local order parameters. Found and fixed two bugs in
the heredity operator. 30/09/2009.
7.4.1 introduced coordinate mutation based on local order
8
. Heredity and transmutation are also biased by local order.
Introduced computation of the hardness and new types of optimization by hardness and density. 04/01/2010
7.4.2 debugging, implementation of multiple-parents heredity biased by local order. 15/01/2010.
7.4.3 debugging, implementation of new types of optimization (to maximize structural order and diversity of the
population). Eliminated parameters volTimeConst and volBestHowMany. 24/01/2010.
v.8 Production version, written to include new types of optimization.
8.1.1-8.2.8 development versions. Local order and coordinate mutation operator. Softmutation operator. Calculation
and optimization of the hardness. Prediction of the structure of nanoparticles. Implementation of point groups. Greatly
improved overall performance. Option to perform PSO simulations (not recommended for applications, due to PSO's
inferior efficiency; use only for testing purposes). GoodBonds transformed into a matrix and used for building
nanoparticles. 22/09/2010.
8.3.1 debugged PSO method, cleaned-up input. 08/10/2010.
8.3.2 for clusters, introduced a check on connectivity (extremely useful), and improved dynamicalKeepBestHM=2
option, as well as mechanism for producing purely softmutated generations. Improved fingerprints for clusters. Interface
to Quantum Espresso and CP2K codes. 11/10/2010.
8.4 Family of development versions containing several improvements for nanoparticles. Development branches:
pseudo-metadynamics, molecular crystals.
8.5.0 initialization of the first random generation using the space group code of H.Stokes added. New formulation of
metadynamics implemented and finalized, for now in a separate code. Several debugs for varcomp, nanoparticles,
computation of hardness. 18/03/2011.
8.5.1 working version with numerous debugs. Space group initialization implemented for cases of fixed unit, variable
composition, and subcells. 20/04/2011.
8.6.0 added space group determination program from H. Stokes. Merger with the updated code for molecular crystals
(including space group initialization). Fixed a bug for SIESTA (thanks to D. Skachkov). 06/05/2011.
8.6.1-8.7.2 development versions, quite robust. Improved symmetric initialization for the case of fixed cell. Graphical
output enabled. Improved softmutation (by better criteria of mode and directional degeneracies) and heredity (by using
energy-order correlation coefficient and cosine formula for the number of trial slabs) operators. Most variables now
have default values, which allows the use of very short input files. Shortened and improved the format of log-files.
13/11/2011.
8.7.3 improved split-cell algorithm and debugged enhanced constraints, graphical output now includes random
structures. 21/11/2011.
8.7.5 fixed bugs in variable composition code, graphical output now includes many extra figures and approximate
atomic volumes are suggested to user in variable composition calculations. Added utility to extract all structures close to
convex hull for easier post-processing. 21/3/2012.
v.9 Production version, made more user-friendly and written to include new types of functionality and to set the new
standard in the field.
9.0.0 Evolutionary metadynamics and vc-NEB codes added to USPEX package, added tensor version of metadynamics,
added additional figures and post-processing tools, cleaned the code output. A few parameters removed from the input.
Improved soft-mode mutation. April 2012.
9.1.0 Release version. Cleaned up, documented. The user community is >800 people. Released 28/05/2012.

Bug reports:
Like any large code, USPEX may have bugs. If you see strange behaviour in your simulations, please report to
us by sending files INPUT.txt and log to USPEXmaster (currently Qiang Zhu).

5. HOW TO OBTAIN USPEX. HOW TO INSTALL IT. NECESSARY CITATIONS. CODES THAT CAN WORK WITH USPEX. ON
WHICH MACHINES CAN USPEX BE RUN?
USPEX is an open source public domain code, and can be downloaded at http://han.ess.sunysb.edu/~USPEX/
For ease of programming and ease of use, USPEX is written in MATLAB, and it also works under Octave (a free
MATLAB-like environment): you don't need to compile anything, just plug and play! To enhance MATLAB-version
compatibility, only basic MATLAB commands have been used.
USPEX can be used on any platform: all you need is 1 CPU where MATLAB or Octave can be run
under Linux or Unix. Using its special remote submission mechanism, USPEX will be able to connect to any
remote machine (regardless of whether MATLAB is installed there) and use it for calculations.
Trial structures generated by USPEX are relaxed and then evaluated by an external code interfaced with
USPEX. Based on the resulting ranking of relaxed structures, USPEX generates new structures, which are
again relaxed and ranked. Our philosophy is to use existing well-established ab initio codes for
structure relaxation and energy calculations. Currently, USPEX is interfaced with VASP, SIESTA, Quantum
Espresso, CP2K, DMACRYS, GULP, ASE, ATK and MD++. The choice of these codes is based on their 1) efficiency
for structure relaxation, 2) general speed, 3) robustness, 4) popularity. Of course, there are many other ab
initio codes that can satisfy these criteria, and in the future we can interface USPEX to them.
USPEX is easy to install. To run USPEX, you need to have MATLAB or Octave installed (at least on login nodes
or on your remote computer) and to have VASP, SIESTA, Quantum Espresso, CP2K, DMACRYS, GULP, ASE, ATK
or MD++ executable(s) on the compute nodes. This way you can use USPEX on any platform.
Whenever using USPEX, in all publications and reports you must cite the original papers, e.g. in the following
way: "Crystal structure prediction was done with the USPEX code [3-5], based on an evolutionary algorithm
developed by Oganov, Glass and Lyakhov and featuring local optimization, real-space representation and
flexible physically motivated variation operators."

6. INPUT AND OUTPUT FILES. FILE LOCATIONS. OVERVIEW OF CAPABILITIES.
Hint to all new users: USPEX comes with a set of test cases. Run them to check if everything works.
Use this Manual to understand the input parameters. To prepare a new calculation, it is always best to
start with one of these test cases, but be sure to modify the input (especially the level of accuracy) to
what you need. Most of the tests are provided with very crude computational settings designed to tell
the user (and developer!) whether certain functionalities of the code perform correctly. Bear this in
mind when transforming these tests to real calculations.
Input/output files depend on the external code used for structure relaxation and on the type of job
submission.
An important technical element of our philosophy is the multi-stage strategy for structure relaxation,
which has a deep rationale. Final structures and energies must be high-quality, in order to provide
correct ranking of structures by energy. Most of the newly generated structures are very far from local
minimum (e.g. contain bonds that are too short or too long) and their high-quality relaxation is
extremely expensive. However, this cost can be avoided if the first stages of relaxation are done with
cruder computational conditions: only at the last stages of structure relaxation is there a need for
high-quality calculations. First stages of structure relaxation can even be done with cheaper
approaches (e.g. interatomic potentials using GULP or MD++). You can change the computational
conditions (basis set, k-points sampling, pseudopotentials or PAW potentials) or the level of
approximation (interatomic potentials vs LDA vs GGA) or even the structure relaxation code (GULP
vs MD++ vs DMACRYS vs SIESTA vs VASP vs CP2K vs QE) during structure relaxation of each
candidate structure. Furthermore, a multi-stage strategy is needed to ensure stability of structure
relaxation: if the initial forces on atoms are too large, variable-cell optimization will often lead to the
explosion of the structure and meaningless results. Therefore, we strongly suggest first optimizing
atomic positions at constant cell parameters, or the cell shape and atomic positions with constant unit
cell volume, and only then performing the full optimization of all structural variables. While optimizing at
constant volume, you don't need to worry about Pulay stresses in plane-wave calculations and
therefore it is OK to use a small basis set at this stage, but of course for constant-pressure variable-cell
relaxation you will need a high-quality basis set. For structure optimization, you can often get away
with a small set of k-points, but don't forget to increase it sufficiently at the last stage(s) of structure
relaxation.
Now, suppose that the directory where the calculations are performed is ~/StructurePrediction.

A. Running USPEX in the sequential mode. This directory will contain:
- USPEX code: in particular, ev_alg.m and the directory FunctionFolder
- file INPUT.txt
- Subdirectory ~/StructurePrediction/Specific with VASP or SIESTA or GULP (etc.) executables, enumerated
input files for structure relaxation (INCAR_1, INCAR_2, ...), and pseudopotentials or PAW potentials.
- For VASP, files INCAR_1, INCAR_2, etc. define how relaxation and energy calculations will
be done at each stage of relaxation, and the corresponding POTCAR_1, POTCAR_2 files contain the pseudopotentials.
E.g., INCAR_1 does a very crude structure relaxation of atomic positions with fixed cell parameters, INCAR_2
crudely optimizes both atomic positions and cell parameters, keeping the volume fixed, INCAR_3 does a full
structure relaxation under constant external pressure with medium precision, and INCAR_4 does very accurate
calculations. Each higher-level structure relaxation starts from the results of a lower-level optimization and
improves them.
- For SIESTA, you need the pseudopotentials and input files sinput_1, sinput_2, ... (if you do molecular
calculations with Z-matrix) or input_1.fdf, input_2.fdf, ... (if you do standard calculations).
- For GULP, files goptions_1, goptions_2, ... and ginput_1, ginput_2, ... must be present. The former specify
what kind of optimization is performed, the latter specify the details (interatomic potentials, pressure,
temperature, number of optimization cycles, etc.).
- For MD++ - to be documented.
- For DMACRYS - to be documented.
- For CP2K, files cp2k_options_1, cp2k_options_2, ... must be present. All files should be normal CP2K
input files with all parameters except atom coordinates and cell parameters (these will be written by USPEX
together with the finishing line &END FORCE_EVAL). The name of the project should always be USPEX,
since the program reads the output from files USPEX-pos-1.xyz and USPEX-1.cell. We recommend doing the
relaxation in at least three steps, similar to VASP: first optimise only the atom positions with the lattice fixed,
and then do a full optimisation.
- For Quantum Espresso, files QEspresso_options_1, QEspresso_options_2, ... must be present. All files
should be normal QE input files with all parameters except atom coordinates, cell parameters and k-points
(these will be written by USPEX at the end of the file). We recommend doing a multi-step relaxation. E.g.,
QEspresso_options_1 does a crude structure relaxation of atomic positions with fixed cell parameters,
QEspresso_options_2 does a full structure relaxation under constant external pressure with medium precision,
and QEspresso_options_3 does very accurate calculations. Each higher-level structure relaxation starts from the
results of a lower-level relaxation.
- Subdirectory ~/StructurePrediction/Seeds containing seed structures, if the seed technique is used (otherwise
keep this directory empty). If seeds are used, copy the get* files to this directory. All seed structures should be
concatenated into a file called POSCARS (format: concatenated VASP POSCAR files), accompanied by a
file called compositions (format: each line gives the number of atoms of species A, B, C, ... in the
unit cell; there should be as many lines as there are structures in POSCARS).
- Subdirectory ~/StructurePrediction/results1 (if this is a new calculation) and results2, results3, ... (if the
calculation has been restarted or run a few times).
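
As a quick sanity check of the layout listed above, a short script can report what is missing before a VASP run is
launched. This is only a sketch in Python and is not part of USPEX: the function name missing_vasp_inputs and the
idea of passing the number of relaxation stages are assumptions made for illustration. POTCAR files are deliberately
not checked, since their numbering convention is not spelled out here.

```python
from pathlib import Path


def missing_vasp_inputs(run_dir, n_stages):
    """Return expected files/directories (relative paths) absent from run_dir.

    Hypothetical helper: checks INPUT.txt, the Specific folder and one
    INCAR_k file per relaxation stage k = 1..n_stages.
    """
    run_dir = Path(run_dir)
    expected = [run_dir / "INPUT.txt", run_dir / "Specific"]
    for stage in range(1, n_stages + 1):
        expected.append(run_dir / "Specific" / f"INCAR_{stage}")
    return [p.relative_to(run_dir).as_posix() for p in expected if not p.exists()]
```

An empty list means that the files named above are in place; it says nothing about whether their contents are sensible.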

The subdirectory ~/StructurePrediction/results1 contains the following files:
- Parameters.txt: a copy of the INPUT.txt file used in this calculation, for your reference.
- gatheredPOSCARS, enthalpies, fitness.dat, VOLUMES, KPOINTS: structures, their enthalpies, fitnesses
(may or may not be identical to enthalpies, depending on what you ask the code to optimize!), unit cell volumes,
and k-points meshes that you need to use to reproduce values given in enthalpies. If you use USPEX with
SIESTA, you will see structures in the file gatheredSTRUCS (in SIESTA format).
- BESTgatheredPOSCARS, BESTenthalpies, BESTvolumes, etc.: the same data for the best structure in each
generation.
- hardness.dat: an estimate of hardness of all structures.
- enthalpies_complete: enthalpies for all structures at each stage of relaxation.
- quasiEntropy.dat: shows the diversity of structures in each generation.
- origin: shows which structures originated from which parents and through which variation operators.
- a lot of other files, useful mostly for benchmarking rather than applications.

File Parameters.txt is a copy of the INPUT.txt file (to keep track of computational conditions). There
is a (suppressed but easy to reinstate, if needed) capability to generate structure files in formats for
other software - CEL files can be used immediately for calculating powder diffraction patterns with
the program Powdercell (freely available), and .SPF files are convenient for finding the space group
using PLATON (also a public domain code).

How to run USPEX.
First of all, set up your calculation by editing INPUT.txt. Options and keywords of this crucial file are
described below.
Then, gather the files needed for the external code doing structure relaxation; this information must be in
the folder Specific. This includes the executable (e.g. vasp), and such files as INCAR_1, INCAR_2, ...
and POTCAR_1, POTCAR_2, ... It is in these files that you specify the external p-T conditions at
which you want to predict the structure.
Once this is done, all you need to execute the code in the sequential mode is just to type:
matlab < USPEX.m > log &

File log will contain information on progress of the simulation and, if any, errors (these need to be
reported to us, if you would like to report a bug).

B. When running USPEX in the massively parallel mode, there will be a few differences. All the capabilities are
implemented; the user only needs to do minimal work to configure files for the user's computers (hence, we cannot
guarantee support for solving problems with massively parallel mode).
First, you need to edit file RemoteTemplate and enter parameters specific to your remote supercomputer and your personal
access there. Rename this file as you like (then this name should appear in INPUT.txt on the input line
whichCluster).

Also, in INPUT_EA.txt specify remote=2, and indicate the level of parallelization of your calculation (how many
structures are to be relaxed in parallel). When the calculation starts, new subdirectories CalcFold1, CalcFold2, ..., where
independent jobs are executed, will be created.

To start the calculation, you need to use a Cron daemon on your Linux machine. In your user root directory, there must
now be files:
~/call_job
~/CronTab

Here is an example of a 1-line CronTab file from one of our clusters:
*/5 * * * * sh call_job

It states that the interval between job submissions is 5 minutes and points to the file call_job, which should contain the
address of the directory where USPEX will be executed. The file call_job looks like this:

#!/bin/tcsh
source /etc/csh.login
source ${HOME}/.cshrc
cd /ExecutionDirectory
matlab < USPEX.m >> log

To activate Cron, either type
crontab CronTab

or edit Cron by typing
crontab -e

If you want to terminate this run, either edit call_job or remove this crontab by typing
crontab -r

7. INPUT OPTIONS: THE INPUT.TXT FILE.

A typical INPUT.txt file is given in Appendix 2. Below we discuss the most important options of the
input. Most options now have default values, which will be used if you skip them in the input file (this
allows you to have an extremely small input file!). Those options that have no default should always be
specified.

7.1. TYPE OF RUN AND SYSTEM
*variable calculationMethod

Meaning: specifies the type of calculation.

Possible values (characters):
USPEX - evolutionary algorithm for crystal structure prediction
VCNEB - transition path determination
META - evolutionary metadynamics (pseudo-metadynamics)

Default: USPEX

Format:
USPEX : calculationMethod

*variable calculationType

Meaning: specifies type of calculation, i.e. whether a bulk crystal, or a nanocluster structure is to be
predicted.

Possible values (integer): 1 = bulk, 2 = clusters, 4 = varcomp bulk, 11 = molecular crystals

Default: 1

Format:
1 : calculationType

Notes: If calculationType=11, i.e. a prediction for a molecular crystal is to be done, then USPEX expects you to provide
files MOL_1, MOL_2, ... with coordinates of the atoms in each type of molecule; these molecules will be placed
in the newly generated structures as whole objects (and later, if needed, relaxed).

*variable optType
Meaning: This variable allows you to specify the quantity that you want to optimize.

Possible values (characters):
enthalpy - to find the stable phases
volume - volume minimization (to find the densest structure)
hardness - hardness maximization (to find the hardest phase)
struc_order - maximization of the degree of order (to find the most ordered structure)

Default: enthalpy

Format:
enthalpy : optType

Notes: (1) If you want to do the opposite optimization, add a minus sign. For instance, to minimize the hardness, put -hardness.

Now, you need to specify what you know about the system. The number of atoms of each sort is given
by the numIons keyblock, e.g.:

*variable numIons
Meaning: Describes the number of atoms of each type.

Default: none, must specify explicitly

Format:
% numIons
4 4 12
% EndNumIons

This means there are 4 atoms of the first type, 4 of the second type, and 12 of the third type.

Notes: For variable composition calculations, you have to specify the building blocks as follows:
Format:
% numIons
2 0 3
0 1 1
% EndNumIons

This means that the first building block has formula A2C3 and the second building block has formula BC, where A, B and C are
described in the block atomType. All structures will then have the formula x A2C3 + y BC with x, y = (0, 1, 2, ...).
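
The arithmetic behind the building-block notation can be sketched as follows. This is an illustration in Python, not
USPEX code; the function name composition is hypothetical. Each row of the numIons block is one building block, and
the atom count of every type is the sum of the rows weighted by the integer multipliers x, y, ...

```python
def composition(blocks, multipliers):
    """Atom counts per type for sum_k multipliers[k] * blocks[k]."""
    n_types = len(blocks[0])
    return [sum(m * block[t] for m, block in zip(multipliers, blocks))
            for t in range(n_types)]


blocks = [[2, 0, 3],   # first building block:  A2C3
          [0, 1, 1]]   # second building block: BC

# x = 1, y = 2 gives 1*A2C3 + 2*BC = A2 B2 C5
print(composition(blocks, [1, 2]))  # [2, 2, 5]
```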

*variable atomType
Meaning: Describes the identities (e.g. numbers in Periodic Table) of atoms of each type.


Format:
% atomType
12 14 8
% EndAtomType

In this case, the first atomic type is Mg (number 12 in Mendeleev's Periodic Table of the elements), the second one is Si and
the third one is O (numbers 14 and 8, respectively). You can use atomic numbers, short names of the elements (e.g. Ca, Cl,
etc.) or full names (Carbon, etc.) in this field.

*variable valencies
Meaning: Describes the valencies of atoms of each type.


Format:
% valencies
2 4 2
% endValencies

*variable goodBonds

Meaning: specifies the minimum bond valence matrix (i.e. the ratio valence/coordination number,
extended by Brown's formula) in this system. This does not have to be accurate; just give a reasonable
guess. Like the IonDistances matrix (see below), this is a square keymatrix cast in an upper-triangular
form.

Default: 0.15

Format:
% goodBonds
1.0 1.0 0.2
0.0 1.0 0.5
0.0 0.0 1.0
% EndGoodBonds
Notes: The dimensions of this matrix must be equal either to the number of atomic species or unity. If only one number is
used, the matrix is filled with this number. The matrix above reads as follows: to be considered a bond, the Mg-Mg
distance should be short enough to have bond strength of 1.0 or more, the same for Mg-Si, Si-Si and O-O bonds (by using
such exclusive criteria, we effectively disregard these interactions as potential bonds), whereas the weakest Mg-O bond
that will be considered for hardness and softmutation calculations will have valence strength of 0.2, and Si-O will have
strength of 0.5 or more. To set this parameter just take values close to the ratio valence/(maximum possible coordination
number). Easy!
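
The two conventions described in this note (a single number filling the whole matrix, and an upper-triangular
keymatrix standing for a symmetric one) can be sketched in Python as below. This is not USPEX code: the function
names are hypothetical, treating the lower triangle as a placeholder is our reading of "upper-triangular form", and the
maximum coordination numbers used in the example (10 for Mg, 8 for Si) are illustrative assumptions only.

```python
def expand_keymatrix(values, n_species):
    """Build a full symmetric n x n matrix from a scalar or upper-triangular rows."""
    if isinstance(values, (int, float)):
        # A single number fills the whole matrix.
        return [[float(values)] * n_species for _ in range(n_species)]
    full = [[0.0] * n_species for _ in range(n_species)]
    for i in range(n_species):
        for j in range(i, n_species):
            # Only entries with j >= i are read; the lower triangle is mirrored.
            full[i][j] = full[j][i] = float(values[i][j])
    return full


def bond_valence_guess(valence, max_coordination):
    """Rule of thumb from the note: valence / (maximum possible coordination number)."""
    return valence / max_coordination


good_bonds = expand_keymatrix([[1.0, 1.0, 0.2],
                               [0.0, 1.0, 0.5],
                               [0.0, 0.0, 1.0]], 3)
print(good_bonds[2][0])           # O-Mg entry mirrors Mg-O: 0.2
print(bond_valence_guess(4, 8))   # Si-O guess: 0.5
print(bond_valence_guess(2, 10))  # Mg-O guess: 0.2
```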

*variable checkConnectivity

Meaning: switches on/off hardness calculation and connectivity related criteria in softmutation.

Possible values (integer): 0 = connectivity not checked, no hardness calculations; 1 = connectivity
taken into account, hardness is calculated.

Default: 1

Format:
1 : checkConnectivity

7.2. POPULATION
*variable populationSize
Meaning: the number of structures in each generation, except the initial one.

Default: 2*N rounded to closest 10, where N is number of atoms/cell (or maxAt for variable
composition). Upper cap is 60. Usually, you can trust these default settings.

Format:
20 : populationSize
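
The default rule quoted above can be written out as a short Python sketch (an illustration, not the USPEX source; the
function name is hypothetical). Since 2*N is always even, rounding to the closest 10 never hits a tie. The lower bound
of 10 is an assumption added here so that very small cells do not give a population of zero; the manual states only
the upper cap.

```python
def default_population_size(n_atoms):
    """2*N rounded to the closest 10, capped at 60 (lower bound of 10 is assumed)."""
    rounded = 10 * ((2 * n_atoms + 5) // 10)
    return max(10, min(rounded, 60))


print(default_population_size(12))  # 2*12 = 24 -> 20
print(default_population_size(13))  # 2*13 = 26 -> 30
print(default_population_size(40))  # 2*40 = 80 -> capped at 60
```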

*variable initialPopSize
Meaning: the number of structures in the initial generation.

Default: equal to populationSize.

Format:
20 : initialPopSize

Note: In most situations we suggest that these two parameters be equal. Sometimes it may be useful to specify the initial
population to be larger than the population size in the rest of the simulation. This allows one to explore the configuration
space better and thus have a better selection of initial structures. It is possible to have a smaller initial population as well;
this is useful if one wants to make the first population entirely from seeded structures.

*variable numGenerations
Meaning: maximum number of generations allowed for the simulation. The simulation can terminate
earlier, however, if the stopping criterion is met, i.e. when the same best structure has remained best for
stopCrit generations.

Default: 100

Format:
100 : numGenerations

*variable stopCrit
Meaning: the simulation is stopped if for stopCrit generations the best structure did not change, or
when numGenerations have expired, whichever happens first.

Default: total number of atoms for fixed-composition runs, maximum number of atoms maxAt for
variable-composition runs.

Format:
100 : stopCrit
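
The interplay of numGenerations and stopCrit can be sketched as follows. This is a Python illustration under stated
assumptions, not the USPEX implementation: the function name should_stop is hypothetical, and the best structure of
each generation is represented by an arbitrary identifier (how USPEX decides that two structures are "the same" is not
covered here).

```python
def should_stop(best_history, stop_crit, num_generations):
    """best_history holds one identifier of the best structure per finished generation."""
    if len(best_history) >= num_generations:
        return True                      # generation budget exhausted
    if len(best_history) < stop_crit:
        return False                     # not enough generations to judge yet
    recent = best_history[-stop_crit:]
    # Stop if the best structure has not changed over the last stop_crit generations.
    return all(best == recent[0] for best in recent)


print(should_stop(["s1", "s7", "s7", "s7"], stop_crit=3, num_generations=100))  # True
print(should_stop(["s1", "s7", "s7"], stop_crit=3, num_generations=100))        # False
```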

7.3. SURVIVAL OF THE FITTEST AND SELECTION
*variable keepBestHM
Meaning: defines how many best structures will survive into the next generation.

Default: 0.1*populationSize

Format:
3 : keepBestHM

Note: if reoptOld=0, these structures will be left without reoptimization, while if reoptOld=1, they will be reoptimized
(if structure relaxation is high-quality, this option will not affect the final results).

*variable bestFrac
Meaning: Fraction of the current generation that shall be used to produce the next generation.

Default: 0.7

Format:
0.7 : bestFrac

Note: This is a very important parameter, directly affecting the convergence rate of the algorithm. Values between 0.5-0.75
seem to be reasonable.
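
What bestFrac does to a generation can be illustrated with a few lines of Python. This is a sketch, not USPEX code:
the function name parent_pool is hypothetical, lower fitness is assumed to be better (as for enthalpy), and the way
the fraction is rounded to a whole number of structures is an assumption.

```python
def parent_pool(fitness, best_frac):
    """Indices of the structures allowed to act as parents (lower fitness = better)."""
    n_keep = max(1, round(best_frac * len(fitness)))
    ranked = sorted(range(len(fitness)), key=lambda i: fitness[i])
    return ranked[:n_keep]


fitness = [3.0, 1.0, 2.0, 5.0, 4.0, 0.5, 9.0, 8.0, 7.0, 6.0]
print(parent_pool(fitness, 0.7))  # the 7 best of 10: [5, 1, 2, 0, 4, 3, 9]
```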

If you use the fingerprinting method (see below), it is a good idea to keep several best structures. This
will increase the learning power of the algorithm, while not leading to decrease of structural diversity.
To that end, set dynamicalBestHM to 1 or 2.

*variable dynamicalBestHM
Meaning: specifies whether the number of surviving best structures will vary during the calculation, with
keepBestHM as the upper bound.

Possible values (integer): 0 = no variation, 1 and 2 = see note

Default: 2

Format:
1 : dynamicalBestHM

Note: If you set dynamicalBestHM=1, the code will choose up to keepBestHM best different structures (based on the
fingerprint tolerance toleranceBestHM). If dynamicalBestHM=2 (our preferred choice), then the code selects keepBestHM
maximally different structures (chosen using a clustering algorithm) in the entire energy interval corresponding to
bestFrac, and toleranceBestHM is determined automatically; this helps diversity while retaining memory of good
structures.

7.4. VARIATION OPERATORS

*variable fracGene
Meaning: percentage of structures obtained by heredity. 0.1 means 10%, etc.

Default: 0.5

Format:
0.5 : fracGene

*variable fracPerm
Meaning: percentage of structures obtained by permutation. 0.1 means 10%, etc.

Default: 0.1 if there is more than 1 type of atoms/molecules, 0 otherwise.

Format:
0.1 : fracPerm

*variable fracRotMut
Meaning: percentage of structures obtained by mutating molecular orientations. 0.1 means 10%, etc.

Default: 0.1 for molecular crystals, 0 otherwise.

Format:
0.1 : fracRotMut

The percentage of structures obtained by lattice mutation is not specified explicitly, but obtained as 1-
(fracGene+fracPerm+fracAtomsMut+fracRotMut).
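
The bookkeeping of operator fractions can be checked with a small Python sketch (illustrative only; the function name
is hypothetical, and raising an error when the fractions exceed 1 is our choice, not documented USPEX behaviour).

```python
def lattice_mutation_fraction(frac_gene, frac_perm, frac_atoms_mut, frac_rot_mut):
    """Fraction left for lattice mutation: 1 - (sum of the explicitly set fractions)."""
    remainder = 1.0 - (frac_gene + frac_perm + frac_atoms_mut + frac_rot_mut)
    if remainder < -1e-12:
        raise ValueError("operator fractions sum to more than 1")
    # Rounding removes floating-point noise such as 0.30000000000000004.
    return max(0.0, round(remainder, 12))


# Defaults for a non-molecular system with more than one atom type:
print(lattice_mutation_fraction(0.5, 0.1, 0.1, 0.0))  # 0.3
```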

For heredity, slices are chosen in random directions with a random offset.

*variable percSliceShift
Meaning: switches on, for a specified percentage of structures, random shifts of slices parallel to their
matching plane.

Default: 1.0

Format:
1.0 : percSliceShift

Note: This helps to increase structural diversity and can be viewed as an atomic coordinate mutation. Values close to zero
seem to speed up the calculation, while large values diversify the search; as always, choose a balance.

*variable howManySwaps
Meaning: For permutation, the number of pairwise swaps will be randomly drawn from a uniform
distribution between 1 and howManySwaps.

Default: 5 (we recommend specifying this parameter explicitly rather than relying on this default value)

Format:
5 : howManySwaps
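
A permutation step with a randomly drawn number of swaps can be sketched in Python as follows. This is an
illustration, not the USPEX operator: the function name permute_types is hypothetical, and restricting swaps to pairs
of sites occupied by different species is an assumption (swapping identical atoms would change nothing).

```python
import random


def permute_types(site_types, how_many_swaps, rng):
    """Swap species on 1..how_many_swaps randomly chosen pairs of unlike sites."""
    site_types = list(site_types)
    n_swaps = rng.randint(1, how_many_swaps)  # uniform, both ends included
    for _ in range(n_swaps):
        unlike = [(i, j)
                  for i in range(len(site_types))
                  for j in range(i + 1, len(site_types))
                  if site_types[i] != site_types[j]]
        if not unlike:  # only one species present: nothing to permute
            break
        i, j = rng.choice(unlike)
        site_types[i], site_types[j] = site_types[j], site_types[i]
    return site_types


print(permute_types(["Mg", "Mg", "O", "O"], 5, random.Random(1)))
```

The composition is preserved by construction; only the assignment of species to sites changes.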

*variable specificSwaps
Meaning: specifies which atom types you allow to swap in permutation.

Default: blank line

Format:
% specificSwaps
1 2
% EndSpecific

Note: In this case, atoms of type 1 could be swapped with atoms of type 2. If you want to try all possible swaps, just leave
a blank line inside this keyblock.

*variable fracAtomsMut
Meaning: specifies percentage of structures obtained by softmutation or coormutation.

Default: 0.1

Format:
0.1 : fracAtomsMut
Note: You can use softmutation or coormutation by specifying softMutTill, which gives the number of the generation after
which coormutation is used instead of softmutation.

*variable mutationDegree
Meaning: the maximum displacement in softmutation, in Angstrom. The displacement vectors for softmutation
or coormutation are scaled so that the largest displacement magnitude equals mutationDegree.

Default: 3*average atom radius

Format:
2.5 : mutationDegree
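
The scaling rule stated above (the largest displacement magnitude equals mutationDegree) can be sketched in Python.
This is an illustration only; the function name is hypothetical and the displacement vectors in the example are
arbitrary numbers, not soft-mode eigenvectors.

```python
import math


def scale_displacements(displacements, mutation_degree):
    """Rescale all vectors by one factor so the longest has length mutation_degree."""
    largest = max(math.sqrt(dx * dx + dy * dy + dz * dz)
                  for dx, dy, dz in displacements)
    if largest == 0.0:
        return [tuple(d) for d in displacements]  # nothing to scale
    factor = mutation_degree / largest
    return [(factor * dx, factor * dy, factor * dz) for dx, dy, dz in displacements]


# The longest input vector has length 5, so every vector is multiplied by 2.5/5 = 0.5.
print(scale_displacements([(3.0, 4.0, 0.0), (0.0, 0.0, 1.0)], 2.5))
```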

*variable softMutOnly
Meaning: how many generations should be produced by softmutation only.

Default: 0

Format:
% softMutOnly
1-5
% EndSoftOnly

Note: In the above example, generations up to the 5th generation (excluding, of course, the first generation) are produced by
softmutation alone. Note that upon softmutation each parent produces TWO softmutants. You can also specify particular
generations to be softmutated throughout the run, for example to softmutate every 10th generation you can write:
% softMutOnly
2 12 22 32 42
% EndSoftOnly

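For lattice mutation, we define each mutated cell vector a' as a product of the old vector a_0 and the
(I + ε) matrix:
$$\mathbf{a}' = (\mathbf{I} + \boldsymbol{\epsilon})\,\mathbf{a}_0, \qquad (1)$$
where I is the unit matrix and ε is the symmetric strain matrix, so that:
$$\mathbf{I} + \boldsymbol{\epsilon} =
\begin{pmatrix}
1+\epsilon_1 & \epsilon_6/2 & \epsilon_5/2 \\
\epsilon_6/2 & 1+\epsilon_2 & \epsilon_4/2 \\
\epsilon_5/2 & \epsilon_4/2 & 1+\epsilon_3
\end{pmatrix} \qquad (2)$$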
The strain matrix components are selected randomly from the Gaussian distribution and are only
allowed to take values between -1 and 1. Lattice mutation essentially incorporates into our method the
ideas of metadynamics [9,21], where new structures are found by building up cell distortions of some
known structure. Unlike in metadynamics, in our method the distortions are not accumulated, so to
obtain new structures the strain components should be large:

*variable mutationRate
Meaning: standard deviation of the epsilons in the strain matrix.

Default: 0.5

Format:
0.5 : mutationRate
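
The lattice mutation described above can be sketched in Python. This is an illustration under stated assumptions, not
the USPEX source: the function names are hypothetical, the six strain components are placed in a symmetric matrix
using the usual Voigt numbering, and the restriction to values between -1 and 1 is implemented by redrawing
out-of-range values (the manual does not say whether values are redrawn or truncated).

```python
import random


def random_strain_matrix(mutation_rate, rng):
    """Return I + eps with Gaussian strain components restricted to (-1, 1)."""
    eps = []
    while len(eps) < 6:
        value = rng.gauss(0.0, mutation_rate)  # standard deviation = mutationRate
        if -1.0 < value < 1.0:                 # assumed: redraw out-of-range values
            eps.append(value)
    e1, e2, e3, e4, e5, e6 = eps
    return [[1 + e1, e6 / 2, e5 / 2],
            [e6 / 2, 1 + e2, e4 / 2],
            [e5 / 2, e4 / 2, 1 + e3]]


def mutate_cell(cell, strain):
    """Apply a' = (I + eps) a0 to each lattice vector (one vector per row of cell)."""
    return [[sum(strain[i][k] * vec[k] for k in range(3)) for i in range(3)]
            for vec in cell]


cell = [[7.49, 0.0, 0.0], [0.0, 9.71, 0.0], [0.0, 0.0, 7.07]]
print(mutate_cell(cell, random_strain_matrix(0.5, random.Random(0))))
```

Atomic fractional coordinates are left untouched in this sketch; only the cell vectors are strained.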

It is a good idea to combine latmutation with a weak softmutation:

*variable DisplaceInLatmutation
Meaning: specifies softmutation as part of latmutation and sets the maximum displacement in Angstrom.

Default: 1.0

Format:
1.0 : DisplaceInLatmutation

7.5. CONSTRAINTS
Since the same structure can be represented in an infinite number of coordinate systems (this
phenomenon is known as modular invariance), there is a lot of potential redundancy in the search.
Furthermore, most of these equivalent choices will lead to very flat unit cells, which creates problems
for structure relaxation and energy calculation (e.g., very many k-points are needed). The constraint,
well known in crystallography, that the cell angles be between 60° and 120°, does not remove all
redundancies and problematic cells (e.g. thus allowed cells with α = β = γ ≈ 120° are practically flat).
Therefore we developed [35,36] a special scheme to obtain special cell shapes with shortest cell vectors.
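This transformation can be done if there is at least one lattice vector whose projection onto any other
cell vector or the diagonal vector of the opposite cell face is greater (by modulus) than half the length
of that vector, i.e. for pairs a and b, or c and (a+b), these criteria are:
$$\frac{|\mathbf{a}\cdot\mathbf{b}|}{|\mathbf{b}|} > \frac{|\mathbf{b}|}{2} \qquad (3a)$$
$$\frac{|\mathbf{a}\cdot\mathbf{b}|}{|\mathbf{a}|} > \frac{|\mathbf{a}|}{2} \qquad (3b)$$
$$\frac{|(\mathbf{a}+\mathbf{b})\cdot\mathbf{c}|}{|\mathbf{c}|} > \frac{|\mathbf{c}|}{2} \qquad (3c)$$
$$\frac{|(\mathbf{a}+\mathbf{b})\cdot\mathbf{c}|}{|\mathbf{a}+\mathbf{b}|} > \frac{|\mathbf{a}+\mathbf{b}|}{2} \qquad (3d)$$

E.g. for the criterion (3a) the new vector a* equals:
$$\mathbf{a}^* = \mathbf{a} - \mathrm{ceil}\!\left(\frac{|\mathbf{a}\cdot\mathbf{b}|}{|\mathbf{b}|^2}\right)\mathrm{sign}(\mathbf{a}\cdot\mathbf{b})\,\mathbf{b} \qquad (4)$$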
This transformation, done iteratively, completely avoids pathological cell shapes and solves the
problem. During this transformation atomic fractional coordinates are transformed so that the original
and the transformed structures are identical (during the transformation Cartesian coordinates of the
atoms remain invariant).
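
The iterative use of the pairwise criterion can be sketched in Python as below. This is a simplified illustration, not
the USPEX routine: the function names are hypothetical, only projections of one cell vector onto another are tested
(the face-diagonal criteria are left out), and the update of atomic fractional coordinates is not shown. Each update
subtracts an integer multiple of one cell vector from another, so the cell volume is unchanged.

```python
import math


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def reduce_cell_pairwise(cell, max_iter=1000):
    """Repeat a -> a - ceil(|a.b|/|b|^2) * sign(a.b) * b until no pair violates the criterion."""
    cell = [[float(x) for x in vec] for vec in cell]
    for _ in range(max_iter):
        changed = False
        for i in range(3):
            for j in range(3):
                if i == j:
                    continue
                a, b = cell[i], cell[j]
                ab, bb = dot(a, b), dot(b, b)
                # |a.b|/|b| > |b|/2 is the same as |a.b|/|b|^2 > 1/2
                if abs(ab) / bb > 0.5 + 1e-12:
                    k = math.ceil(abs(ab) / bb) * math.copysign(1.0, ab)
                    cell[i] = [x - k * y for x, y in zip(a, b)]
                    changed = True
        if not changed:
            return cell
    raise RuntimeError("cell reduction did not converge")


print(reduce_cell_pairwise([[4, 0, 0], [3, 1, 0], [0, 0, 5]]))
```

For this skewed example the first two vectors end up mutually orthogonal and shorter than the originals.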

*variable minVectorLength
Meaning: sets the minimum length of a cell parameter of a newly generated structure.

Default: diameter of the largest atom.

Format:
2.0 : minVectorLength

Commonly used computational methods (pseudopotentials, PAW, LAPW, and many parametric
interatomic potentials) break down when the interatomic distances are too small. This situation has to
be avoided and you can specify the minimum distances between each pair of atoms using the
IonDistances square keymatrix cast in an upper-triangular form, e.g.

*variable IonDistances
Meaning: sets the minimum inter-atomic distance matrix between different atom types.

Default: half of the covalent radii sum for corresponding atom pair.

Format:
% IonDistances
1.0 1.0 0.8
0.0 1.0 0.8
0.0 0.0 1.0
% EndDistances

Note: The dimensions of this matrix must be equal to the number of atomic species. The matrix above reads as follows: the
minimum Mg-Mg distance allowed in a newly generated structure is 1.0 Angstrom, the minimum Mg-Si, Si-Si and O-O
distances are also 1.0 Angstrom, and the minimum Mg-O and Si-O distances are 0.8 Angstrom. You can use this keymatrix
for incorporating further system-specific information: e.g. if you know that in your system Mg atoms prefer to be very far
apart and are never closer than 3 Angstrom, you can specify this information. Beware, however, that the larger these
minimum distances are, the more difficult it is to find structures fulfilling these constraints (especially for large systems), so a
compromise must be found in each case. With some experience you will find this easy.
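
How such a keymatrix can be used to screen a candidate structure is sketched below in Python. This is an illustration,
not the USPEX check: the function names are hypothetical, atom types are given as zero-based indices into the matrix,
and distances are computed between Cartesian positions without periodic images, which a real check on a crystal would
have to include.

```python
import math


def min_allowed(ion_distances, type_i, type_j):
    """Look up the upper-triangular keymatrix irrespective of the order of the two types."""
    i, j = sorted((type_i, type_j))
    return ion_distances[i][j]


def violates_min_distance(positions, types, ion_distances):
    """True if any pair of atoms is closer than the IonDistances entry for its types."""
    for p in range(len(positions)):
        for q in range(p + 1, len(positions)):
            distance = math.dist(positions[p], positions[q])
            if distance < min_allowed(ion_distances, types[p], types[q]):
                return True
    return False


ion_distances = [[1.0, 1.0, 0.8],
                 [0.0, 1.0, 0.8],
                 [0.0, 0.0, 1.0]]
pair = [(0.0, 0.0, 0.0), (0.9, 0.0, 0.0)]
print(violates_min_distance(pair, [0, 2], ion_distances))  # Mg-O at 0.9: allowed
print(violates_min_distance(pair, [0, 1], ion_distances))  # Mg-Si at 0.9: too close
```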

*variable constraint_enhancement
Meaning: rather technical parameter, which allows one to use stricter (by constraint_enhancement
times) constraints of the IonDistances matrix for symmetric random structures (for all variation operators,
the unenhanced IonDistances matrix still applies). Use it only if you know what you are doing.

Default: 1.

Format:
1 : constraint_enhancement

For molecular crystals, the following keyblock is useful:

*variable MolCenters
Meaning: sets the minimum inter-molecular distance matrix between centers of molecules of different
types.

Default: zero-matrix for non-molecular calculations.

Format:
% MolCenters
5.5 7.7
0.0 9.7
% EndMol
Note: In the above example, there are two types of molecules. In all generated structures the distance between geometric
centers of the molecules of the first type must be at least 5.5 Angstrom (A-A distance), between centers of the molecules of the first
and second type 7.7 Angstrom (A-B distance), and between molecules of the second type 9.7 Angstrom (B-B distance).

7.6. CELL
As mentioned above, it is useful to rescale all newly produced structures to a unit cell volume that you
believe to be reasonable for your system at relevant conditions. To find this volume, do structure
relaxation for a randomly generated structure, or just take some other (known or hypothetical)
structure at relevant conditions and use its unit cell volume. This should be specified in the
LatticeValues keyblock, e.g.:

*variable LatticeValues
Meaning: specifies the initial volume of the unit cell or known lattice parameters.

Default: no default, has to be specified by user.

Format:
% Latticevalues
125.00
% Endvalues

Notes: (1) This volume is only used as an initial guess and influences only the first generation; each structure is fully
optimized and adopts the volume corresponding to the (free) energy minimum. This keyblock also has another use: when
you know the lattice parameters (e.g., from experiment), you can specify them here in matrix form (each lattice vector
is a row of the matrix) instead of the unit cell volume, e.g.:
% Latticevalues
7.49 0.0 0.0
0.0 9.71 0.0
0.0 0.0 7.07
% Endvalues

(2) For variable composition calculations you have to specify the volume of each atom separately, e.g.:
% Latticevalues
12.5 14.0 11.0
% Endvalues
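
The rescaling described in this section can be sketched as follows. This is only an illustration with hypothetical function names. It assumes that lattice vectors are the rows of a 3x3 matrix and that, for variable composition, the target volume is the sum of the per-atom volumes weighted by the number of atoms of each type.

```python
# Hypothetical helpers, not part of USPEX.

def cell_volume(lattice):
    """Volume of a cell whose lattice vectors are the rows of a 3x3 matrix."""
    (a1, a2, a3), (b1, b2, b3), (c1, c2, c3) = lattice
    return abs(a1 * (b2 * c3 - b3 * c2)
               - a2 * (b1 * c3 - b3 * c1)
               + a3 * (b1 * c2 - b2 * c1))

def target_volume(num_ions, atomic_volumes):
    """Assumed rule: sum over atom types of (number of atoms) x (volume per atom)."""
    return sum(n * v for n, v in zip(num_ions, atomic_volumes))

def rescale_to_volume(lattice, volume):
    """Scale all lattice vectors uniformly so that the cell has the given volume."""
    factor = (volume / cell_volume(lattice)) ** (1.0 / 3.0)
    return [[factor * x for x in row] for row in lattice]
```

For example, rescale_to_volume applied to the matrix from note (1) with volume 125.00 returns a cell of the same shape whose volume is 125.00.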

If you have a large system (>20-40 atoms/cell), a randomly produced first generation will consist of
disordered structures with high energies and little diversity. This is one of the manifestations of the
curse of dimensionality, which eventually becomes hopeless for any algorithm to overcome. For this
case we proposed a trick, where a large cell is split into a suitable number of identical subcells. The
user specifies the allowed numbers of subcells (1, 2 and 4 in the example below) and the code will
search for an optimal splitting for a given cell. Here is an example:
*variable splitInto
Meaning: defines the number of identical subcells in the unit cell. If you don't want to use splitting,
just put value 1.

Default: 1

Format:
% splitInto (number of subcells into which the unit cell is split)
1 2 4
% EndSplitInto
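
The manual does not state how the optimal splitting is chosen. As an illustration only, the sketch below enumerates all ways of dividing a cell into a given number of subcells along its three axes and, as an assumed criterion, prefers the division whose subcell edges are most similar in length. Function names are hypothetical.

```python
# Hypothetical helpers, not the USPEX splitting routine.

def split_candidates(n_subcells):
    """All (na, nb, nc) with na * nb * nc == n_subcells."""
    out = []
    for na in range(1, n_subcells + 1):
        if n_subcells % na:
            continue
        rest = n_subcells // na
        for nb in range(1, rest + 1):
            if rest % nb == 0:
                out.append((na, nb, rest // nb))
    return out

def best_split(cell_lengths, n_subcells):
    """Assumed criterion: minimise the ratio of longest to shortest subcell edge."""
    def spread(split):
        edges = [length / n for length, n in zip(cell_lengths, split)]
        return max(edges) / min(edges)
    return min(split_candidates(n_subcells), key=spread)
```

For a cell with edge lengths 10, 5 and 5 Angstrom and two subcells, this sketch cuts the long axis in half.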

Subcells introduce extra translational (pseudo)symmetry. One can use, in addition to this, the full
apparatus of space groups, enabled by a powerful code contributed by H.T. Stokes [18]:

*variable symmetries
Meaning: possible space groups for crystal or point groups for clusters.

Default: 1-230 for crystals and E C2 D2 C4 C3 C6 T S2 Ch1 Cv2 S4 S6 Ch3 Th Ch2 Dh2 Ch4 D3
Ch6 O D4 Cv3 D6 Td Cv4 Dd3 Cv6 Oh Dd2 Dh3 Dh4 Dh6 Oh C5 S5 S10 Cv5 Ch5 D5 Dd5 Dh5
I Ih for clusters.

Format:
% symmetries
2-230
% endSymmetries

For clusters you have to specify the thickness of the vacuum region around the cluster.

*variable vacuumSize
Meaning: defines the amount of vacuum added around the structure (the closest distance between
neighboring clusters in adjacent unit cells).

Default: 10 for every step of relaxation

Format:
% vacuumSize
20 20 20 10 10
% endVacuumSize
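
As a rough illustration of what the vacuum size means geometrically, the sketch below builds an orthorhombic box around a cluster. The function name and the choice of an orthorhombic box are assumptions made for this example. The keyblock above gives one value per relaxation step, and the function takes the value for a single step.

```python
# Hypothetical helper, not part of USPEX.
def cluster_box(positions, vacuum):
    """Edge lengths of an orthorhombic box: extent of the cluster along each
    Cartesian axis plus the vacuum, so that periodic images of the cluster
    are separated by at least `vacuum` along each axis."""
    edges = []
    for axis in range(3):
        coords = [p[axis] for p in positions]
        edges.append(max(coords) - min(coords) + vacuum)
    return edges
```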

7.7. RESTART
Sometimes a calculation may go wrong or terminate prematurely, e.g. due to hardware failure. One
could, of course, start it again from scratch (sometimes this is necessary, if your input was incorrect!),
but often you want to continue the calculation either from the point where it stopped or even from an
earlier point. If all you want is to continue the run from where it stopped, you don't need to change
any settings (all information is stored in the *.mat files); it suffices to remove the file still_reading
and run USPEX again.
If you want to restart from a particular generation in a particular results folder, then specify
pickUpYN=1, pickUpGen=number of the generation from which you want to start, and
pickUpFolder=number of the results folder (e.g., results1, results2, ...) from which the restart needs to be
done. If pickUpGen=0, a new calculation is started; if pickUpFolder=0, the restart is done by
default from the results folder with the highest existing number. (Hint: instead of relying on these
defaults, it is better to specify the folder and generation explicitly.) The default is 0 for all three
parameters. For example, to restart a calculation performed in the folder results5 from generation number 10, specify:
1 : pickUpYN
10 : pickUpGen
5 : pickUpFolder
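
The rules for the defaults can be summarised in a short sketch. The function names are hypothetical and the code only mirrors the description above. It is not how USPEX itself resolves the restart point.

```python
import re

# Hypothetical helpers, not part of USPEX.

def latest_results_folder(names):
    """Highest N among folder names of the form 'resultsN', or None if absent."""
    numbers = []
    for name in names:
        match = re.fullmatch(r"results(\d+)", name)
        if match:
            numbers.append(int(match.group(1)))
    return max(numbers) if numbers else None

def resolve_restart(pick_up_yn, pick_up_gen, pick_up_folder, names):
    """Return (folder_number, generation) to restart from, or None for a new run."""
    if not pick_up_yn or pick_up_gen == 0:
        return None
    if pick_up_folder > 0:
        return (pick_up_folder, pick_up_gen)
    return (latest_results_folder(names), pick_up_gen)
```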

7.8. DETAILS OF AB INITIO CALCULATIONS
USPEX enjoys a powerful two-level parallelisation scheme, making its parallel scalability very hard to
match by most other computational algorithms. By far the most expensive part of the calculation is
structure relaxation. The first level of parallelisation is done within structure relaxation codes, enabling
excellent efficiency for each structure on up to ~10-10^2 CPUs. You can specify how many CPUs you
want to use for structure relaxation (at each of its steps) using the numProcessors keyblock, e.g.:
*variable numProcessors
Meaning: defines the number of processors for every optimization step.

Default: 1 for every relaxation step

Format:
% numProcessors
4 8 16 16 16
% EndProcessors

Another interesting feature of USPEX is the possibility to combine different levels of theory. Often
you don't need highly accurate structure relaxation from the beginning (only the final steps of
structure relaxation have to be highly accurate), and you may want to start structure relaxation at a
cheaper level, e.g. with interatomic potentials (using the GULP code) or with a minimal LCAO
basis set (using SIESTA), switching towards the end to more accurate calculations with large LCAO
(using SIESTA) or plane-wave basis sets (using VASP). However, use this feature with care: cheap
calculations still have to be meaningfully constructed. This feature is controlled by the abinitioCode
keyblock, e.g.:
*variable abinitioCode
Meaning: defines the code used for every optimization step.

Default: 1 for every optimization step (VASP)

Format:
% abinitioCode
3 2 2 1 1
% ENDabinit

Note: Numbers indicate the code/mode used at each step of structure relaxation: 1 means VASP, 2 means SIESTA, 3
indicates GULP, 4 is for GULP used in molecular crystal calculations (only for testing purposes), 5 is for SIESTA for
molecular crystals, 6 is for MD++ code, 7 is for Neural Networks code (only for testing purposes at this moment), 8 is for
DMACRYS, 9 is for CP2K, 10 is for Quantum Espresso, 11 is for DL POLY, 12 is for molecular VASP, 13 is for ASE,
14 is for ATK.

*variable Kresol
Meaning: specifies the reciprocal-space resolution for k-points generation (units: 2π*Angstrom^-1).

Default: from 0.2 to 0.08 linearly

Format:
% KresolStart
0.2 0.16 0.12 0.08
% Kresolend

Note: You can enter several values (one for each step of structure relaxation), starting with cruder (i.e. larger) values and
ending with high resolution. This dramatically speeds up calculations, especially for metals, where very many k-points may
be needed for accurate energy calculations. This keyblock is important if you use VASP or QuantumEspresso (with GULP
it is not needed at all, and with SIESTA you will have to define Kresol within SIESTA input files).
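
To illustrate how a resolution translates into a k-point mesh, here is a minimal sketch. The linear schedule reproduces the default described above (0.2 to 0.08). The mesh rule, n_i = ceil(|b_i| / Kresol) with reciprocal vector lengths in units of 2π/Angstrom, is a common convention and is an assumption here, not a statement about the USPEX source. All function names are hypothetical.

```python
import math

# Hypothetical helpers, not part of USPEX.

def kresol_schedule(n_steps, start=0.2, end=0.08):
    """Linearly spaced resolutions from start to end, one per relaxation step."""
    if n_steps < 2:
        raise ValueError("need at least two steps to interpolate")
    step = (end - start) / (n_steps - 1)
    return [start + i * step for i in range(n_steps)]

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def kpoint_mesh(lattice, kresol):
    """Assumed rule: divisions along reciprocal vector i = ceil(|b_i| / kresol),
    with |b_i| measured in units of 2*pi/Angstrom (lattice rows in Angstrom)."""
    a, b, c = lattice
    volume = abs(dot(a, cross(b, c)))
    mesh = []
    for r in (cross(b, c), cross(c, a), cross(a, b)):
        length = math.sqrt(dot(r, r)) / volume
        mesh.append(max(1, math.ceil(length / kresol)))
    return mesh
```

For four relaxation steps the schedule gives the values of the example keyblock above. Under the assumed rule, a cubic cell with a 10 Angstrom edge gets a 1x1x1 mesh at resolution 0.2 and a 2x2x2 mesh at resolution 0.08.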

*variable wallTime
Meaning: sets the maximum amount of wall time a single calculation is allowed to take (format:
hours:minutes, e.g. 04:30); it will be put in the job-submission files.

Default: 2:00

Format:
2:00 : wallTime

The second level of parallelization, done within USPEX, distributes the calculation over the
individuals in the same population (since within the same generation structures are independent of
each other), i.e. you can simultaneously perform up to populationSize (also typically of order 10-10^2)
structure relaxations.
*variable numParallelCalcs
Meaning: specifies how many structure relaxations you want to run in parallel (more precisely, up to
how many jobs you want to be in the queue at each moment; of course, if the machine is fully loaded,
you will have to wait, and only if the machine is sufficiently free will they be executed at the same time).

Default: 1

Format:
10 : numParallelCalcs

You need to supply the job submission files or names of executable files for each code/mode you are
using (1 - VASP, 2 - SIESTA, 3 - GULP, 4 - GULP in molecular mode, 5 - SIESTA in molecular
mode, 6 - W. Cai's MD++ code, 7 - development code, etc.) and specify them in the
commandExecutable keyblock:

*variable commandExecutable
Meaning: specifies the name of the job submission files or executables for a given code.

Default: No default, has to be specified by user.

Format:
% commandExecutable
mpirun -np 2 vasp > out
siesta
./job3_gulpNP
job2
/nfs/xt3-homes/users/alyakhov/bin/siesta<input.fdf>out
./meam-lammps_han test.tcl
timelimit -t 400 ./OptimizeNN.x > log
timelimit -t 400 ./dmacrys <mol.res.dmain >output
mpirun -np 4 cp2k.popt cg.inp > cp2k_output
mpirun -np 4 pw.x < qe.in > output
reserved for DL POLY executable
molecular VASP
ASE executable
atkpython < ATK.in > ATK.out
% EndExecutable

Note: USPEX reads the line number for every optimization step from the variable abinitioCode and then uses the
corresponding line of commandExecutable to start the optimizer. For example, abinitioCode equal to 3 3 1 means
that the first two optimization steps will be done with GULP, which is started using the script ./job3_gulpNP, and the
last step is done with VASP via the command mpirun -np 2 vasp > out.
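
The lookup described in this note amounts to 1-based indexing into the lines of the keyblock. A minimal sketch with a hypothetical function name:

```python
# Hypothetical helper, not part of USPEX.
def commands_for_steps(abinitio_code, command_executable):
    """abinitio_code: code numbers (1-based), one per relaxation step.
    command_executable: lines of the commandExecutable keyblock, in order.
    Returns the command used at each step."""
    return [command_executable[code - 1] for code in abinitio_code]

# First three lines of the example keyblock above.
EXAMPLE_LINES = ["mpirun -np 2 vasp > out", "siesta", "./job3_gulpNP"]
```

commands_for_steps([3, 3, 1], EXAMPLE_LINES) selects ./job3_gulpNP for the first two steps and the VASP command for the last one.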

7.9. HARDWARE-RELATED
USPEX is written in Matlab, and is compatible with Octave, a non-commercial analogue of Matlab.
You can actually use USPEX on virtually any platform in the remote submission mode. All you need
is Matlab/Octave to be running on your workstation. In that case, your workstation will prepare input
(including jobs), send them to the remote compute nodes, check when the calculations are done, get
the results back, analyse them and prepare new input. The amount of data being sent to and from is not
large, so the network does not need to be very fast. Job submission is, of course, machine-dependent.
To use remote submission from your workstation to a supercomputer, you must first set up a
passwordless connection from the workstation to the supercomputer. On the local workstation USPEX
could be run using periodic cron calls (e.g. starting USPEX every 10 minutes).

*variable remote
Meaning: specifies whether the jobs are submitted remotely or locally.

Possible values (integer):
0 - calculation is done on a local machine, e.g. on your desktop workstation.
1 - remote submission on supercomputers, description of which is hardcoded into the USPEX code.
2 - remote submission using a specially prepared submission file, see whichCluster.

Default: 0

Format:
0 : remote

Note: for remote calculation we strongly urge the users to use remote=2, which only requires them to modify the file
Remote_template.

*variable whichCluster
Meaning: specifies on which machine the calculations will be run.

Default: nonParallel

Format:
nonParallel : whichCluster

Note: If you select whichCluster = nonParallel, this will set up a local sequential calculation (e.g., on your desktop
workstation, with interactive energy calculations). This needs to be set with remote=0. If remote = 2, then specify
whichCluster=mySupercomputer, and create file mySupercomputer by editing Remote_template, located in the folder
RemoteSubmission (with a few examples for real supercomputers). Example:
Lomonosov : whichCluster
Here the file Lomonosov contains all the information required for remote job submission on the Lomonosov supercomputer.

Occasionally, relaxation of a particular structure may fail (hardware problems etc.), and it's a good idea
to give it another chance by specifying how many times you want a failed relaxation to be
re-attempted, which is done by specifying maxErrors (Default: 2). If after maxErrors attempts there is
still a failure (e.g. no output), this trial structure will be removed from the population.

7.10. REMOTE SETTINGS
If remote > 0, you have to specify some settings in the input file relevant to the current user and
calculation. Other (user/task independent) settings are either coded in USPEX or specified in the file
described by whichCluster parameter.

*variable username
Meaning: user name to login to remote supercomputer.

Default: by default, remote=0 and all subsequent remote settings are disabled.

Format:
aroganov : username

*variable remotePath reserved for developers. Will not influence user calculations.
*variable portNumber reserved for developers. Will not influence user calculations.

*variable localFolder
Meaning: used for remote = 1 and describes the location of your calculation folder. Basically, it
specifies which part of the path returned by unix pwd command is not relevant for calculation. The
folder structure after localFolder is usually replicated on supercomputer, if remote = 1. This allows
better organization of multiple calculations.

Default: by default, remote=0 and all remote settings are disabled.

Format:
artem : localFolder

*variable remoteFolder
Meaning: folder on supercomputer, where calculation will be performed.

Default: by default, remote=0 and all remote settings are disabled.

Format:
Blind_test : remoteFolder

Note: there is a similar parameter specified in the remote submission file: homeFolder. The actual path to the calculation
will be *homeFolder*/*remoteFolder*/CalcFolderX, where X = 1, 2, 3, ...

7.11. FINGERPRINTS SETTINGS
Before changing the following parameters we would encourage you to read the methodological papers
first, so you know what you are doing (Valle & Oganov, 2008; Oganov & Valle, 2009; Lyakhov, Oganov & Valle,
2010).
0.05 : sigmaFing
0.10 : deltaFing
10.0 : RmaxFing
0.015 : toleranceFing (if distance is less than tolerance - structures are identical)
0.015 : toleranceBestHM (if distance is less than tolerance - structures are identical)

sigmaFing is the Gaussian broadening of interatomic distances (0.05 is always OK).
deltaFing is the discretisation (in Angstrom) of the fingerprint function (0.10 should be OK).
RmaxFing is the distance cutoff (7-10 is usually OK, but sometimes you may want to use smaller
or larger values, depending on the system and what you want to do).
toleranceFing and toleranceBestHM specify the minimal cosine distances between structures that
qualify them as non-identical for participating in the production of child structures and for survival
of the fittest, respectively. Often it's OK to have the two values identical. Sometimes you may want to
sample best structures that are very different; in such cases, make toleranceBestHM much larger.
These settings are very powerful and should be chosen with experience and good judgment. A short
evolutionary or random-sampling run helps to tune the parameters. They depend on the
precision of structure relaxation and on the physics of the system (for instance, for ordering problems
fingerprints belonging to different structures will be very similar, and these tolerance parameters
should be made small).
maxDistHeredity specifies the maximal cosine distance between structures that participate in
heredity, i.e. the radius on the landscape within which structures can mate. Use with care
(or don't use at all).
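
To show how a tolerance is applied, here is a minimal sketch of a cosine distance between two fingerprint vectors. The form 0.5 * (1 - cos) is assumed here following the cited methodological papers. The function names are hypothetical, and computing the fingerprint vectors themselves (which is where sigmaFing, deltaFing and RmaxFing enter) is not shown.

```python
import math

# Hypothetical helpers, not part of USPEX.

def cosine_distance(f1, f2):
    """Assumed form: 0.5 * (1 - f1.f2 / (|f1| |f2|)); ranges from 0 to 1."""
    dot = sum(x * y for x, y in zip(f1, f2))
    norm1 = math.sqrt(sum(x * x for x in f1))
    norm2 = math.sqrt(sum(y * y for y in f2))
    return 0.5 * (1.0 - dot / (norm1 * norm2))

def is_identical(f1, f2, tolerance=0.015):
    """Structures closer than the tolerance are treated as identical."""
    return cosine_distance(f1, f2) < tolerance
```

Here the default tolerance of 0.015 is the example value of toleranceFing given above.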

7.12. SPACE GROUPS
*variable doSpaceGroup
Meaning: determines space groups and also writes output in the crystallographic *.CIF format (this
makes your life easier when preparing publications, but beware that space groups may sometimes be
under-determined if the relaxation was not very precise and very stringent tolerances were set for
the symmetry finder). This option is enabled thanks to the powerful symmetry code provided by H.T.
Stokes.

Default: 1, except for calculationType=2 (clusters), where the default is 0.

Format:
1 : doSpaceGroup (0 - no space groups, 1 - determine space groups)


7.13. MANY-PARENTS SETTINGS
*variable manyParents
Meaning: specifies whether more than two slices (or more than two parent structures) should be used
for heredity. This may be beneficial for very large systems.

Possible values (integer):
0 - only 2 parents are used, 1 slice each.
1 - many structures are used as parents, 1 slice each.
2 - two structures are used as parents, many slices (determined dynamically using parameters minSlice and maxSlice) are
chosen independently from each other.
3 - two structures are used as parents, many slices (determined dynamically using parameters minSlice and maxSlice) are
cut from the cell with a fixed offset. This is the preferred option for large systems. For example, we cut both structures
into slices of approximately the same thickness and then choose even slices from parent 1 and odd slices from parent 2,
making a multilayered sandwich.

Default: 0

Format:
3 : manyParents

minSlice, maxSlice: determine the minimal and maximal thickness of the slices that will be cut out of
the parent structures to participate in the creation of the child structure. We want them to be thick
enough to carry some information about the parent (but not so thick that multiple-parents heredity
becomes ineffective). Reasonable values for these parameters are around 1 and 6, respectively.
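
The "multilayered sandwich" of option 3 can be illustrated with a short sketch. It is a simplification with hypothetical names: the number of slices is passed in directly (instead of being derived from minSlice and maxSlice), slices are numbered from 0, and no correction of the child's composition is attempted.

```python
# Hypothetical helper, not the USPEX heredity operator.
def sandwich_child(parent1, parent2, n_slices, axis=2):
    """parent1, parent2: lists of (species, (fx, fy, fz)) with fractional
    coordinates in [0, 1). The cell is cut into n_slices slabs of equal
    thickness along `axis`; even-numbered slabs are taken from parent1 and
    odd-numbered slabs from parent2."""
    child = []
    for index, parent in enumerate((parent1, parent2)):
        for species, frac in parent:
            slab = min(int(frac[axis] * n_slices), n_slices - 1)
            if slab % 2 == index:
                child.append((species, frac))
    return child
```

A real operator would additionally have to restore the correct number of atoms of each type in the child.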
For clusters you have to specify the number of parents participating in heredity:

*variable numberparents
Meaning: defines the number of parents in heredity for clusters.

Default: 2

Format:
2 : numberparents

7.14. STATISTICS FOR DEVELOPERS
20 : repeatForStatistics

Default: 1 (i.e. no statistics will be gathered)

USPEX simulations are stochastic, and redoing the simulation with the same input parameters does not
necessarily yield the same results. While the final result (the ground state) is the same (hopefully!),
the number of steps it takes to reach it and the trajectory in chemical space differ from run to run. To
compare different algorithms you MUST collect at least some statistics, not relying on just a single
run (which may be lucky or unlucky; USPEX does not rely on luck!). This option is of interest only
to developers, and it only makes sense to collect statistics with simple potentials (e.g., using GULP or
MD++).

8. ADDITIONAL INPUT FOR SPECIAL CASES:
8.1. MOLECULAR CRYSTALS: MOL_1, MOL_2, FILES.
MOL_1 file: the file describes the structure of a molecule to be used as a whole entity, and defines
which degrees of freedom will be frozen during structure relaxation. This file and its format differ
from Siesta's Z_Matrix file (MOL_1 gives Cartesian coordinates of the atoms, whereas Z_Matrix file
defines atomic positions from bond lengths, bond angles and torsion angles). Z_Matrix file is created
using the information given in the MOL_1 file - i.e. bond lengths and all necessary angles are
calculated from the Cartesian coordinates. Which lengths and angles are important and should be used
for Z_Matrix to define atomic positions - this is exactly what columns 5-7 specify. Let's look at the
MOL_1 file for benzene C6H6:
Number of atoms: 12
0 0 0 1 0 0 0 1 1 1
1.0600 0 0 6 1 0 0 0 1 1
1.7550 1.2038 0.0000 6 2 1 0 0 0 1
3.1450 1.2038 0.0000 6 3 2 1 0 0 0
3.8400 -0.0000 0.0000 6 4 3 2 0 0 0
3.1450 -1.2038 0.0000 6 5 4 3 0 0 0
1.7550 -1.2038 0.0000 6 6 5 4 0 0 0
1.2250 2.1218 0.0000 1 3 2 1 0 0 0
3.6750 2.1218 0.0000 1 4 3 8 0 0 0
4.9000 -0.0000 0.0000 1 5 4 9 0 0 0
3.6750 -2.1218 0.0000 1 6 5 10 0 0 0
1.2250 -2.1218 0.0000 1 7 6 11 0 0 0

The first atom is Hydrogen; its coordinates are defined without reference to other atoms ("0 0 0").
The second atom is Carbon; its coordinates (in the molecular coordinate frame) in Z_matrix will be set
only by its distance from the first atom (i.e. the Hydrogen described above), but no angles ("1 0 0").
The third atom is Carbon; its coordinates will be set by its distance from the second atom and the
bond angle 3-2-1, but no torsion angle, hence we write "2 1 0".
The fourth atom is Carbon; its coordinates will be set by its distance from the third atom, bond angle
4-3-2, and torsion angle 4-3-2-1, hence we write "3 2 1".
And so forth, until we reach the final, 12th atom, which is Hydrogen, defined by its distance from the
7th atom (Carbon), bond angle 12-7-6 and torsion angle 12-7-6-11, hence "7 6 11".
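
The internal coordinates referred to above (bond length, bond angle, torsion angle) can be computed from Cartesian coordinates as in the following sketch. The function names are hypothetical and this is not the conversion routine used by USPEX. The sign convention of the torsion angle is one common choice.

```python
import math

# Hypothetical helpers, not part of USPEX.

def sub(u, v):
    return tuple(a - b for a, b in zip(u, v))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def norm(u):
    return math.sqrt(dot(u, u))

def bond_length(p1, p2):
    return norm(sub(p1, p2))

def bond_angle(p1, p2, p3):
    """Angle p1-p2-p3 in degrees (vertex at p2)."""
    u, v = sub(p1, p2), sub(p3, p2)
    cosine = max(-1.0, min(1.0, dot(u, v) / (norm(u) * norm(v))))
    return math.degrees(math.acos(cosine))

def torsion_angle(p1, p2, p3, p4):
    """Dihedral angle p1-p2-p3-p4 in degrees, in the range [-180, 180]."""
    b1, b2, b3 = sub(p2, p1), sub(p3, p2), sub(p4, p3)
    n1, n2 = cross(b1, b2), cross(b2, b3)
    unit_b2 = tuple(x / norm(b2) for x in b2)
    m1 = cross(n1, unit_b2)
    return math.degrees(math.atan2(dot(m1, n2), dot(n1, n2)))

# First four atoms of the benzene MOL_1 file above (Cartesian, Angstrom).
ATOMS = [(0.0, 0.0, 0.0), (1.06, 0.0, 0.0),
         (1.755, 1.2038, 0.0), (3.145, 1.2038, 0.0)]
```

For these atoms, the distance 2-1 is 1.06 Angstrom, the bond angle 3-2-1 is about 120 degrees, and the torsion angle 4-3-2-1 has magnitude 180 degrees because the molecule is planar.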

8.2. VARIABLE-COMPOSITION CODE.
To switch to the variable-composition mode, you have to:
1. Specify 4 : calculationType
2. Specify building blocks in numIons (see the description of numIons parameter)
3. Specify the approximate atomic volumes (or lattice vectors) for each atom type in the parameter
LatticeValues
4. Specify the following varcomp-only options:

*variable firstGeneMax
Meaning: how many different compositions are sampled in the first generation. If 0, then the number
is equal to initialPopSize/4.

Default: 11

Format:
10 : firstGeneMax

*variable minAt
Meaning: minimum number of atoms/cell for the first generation.

No default

Format:
10 : minAt

*variable maxAt
Meaning: maximum number of atoms/cell for the first generation.

No default

Format:
20 : maxAt

*variable fracTrans
Meaning: fraction of structures obtained by transmutation (0.1 = 10%).

Default: 0.1

Format:
0.1 : fracTrans

*variable howManyTrans
Meaning: maximum fraction of atoms in the structure that are transmuted (0.1 = 10%).

Default: 0.2

Format:
0.2 : howManyTrans

*variable specificTrans
Meaning: specifies allowed transmutations (leave blank if you want to allow all possible
transmutations).

Default: blank line

Format:
% specificTrans
1 2
% EndTransSpecific

Note: In this case, atoms of type 1 can be transmuted into atoms of type 2 and vice versa. If you want to try all possible
transmutations, just leave a blank line inside this keyblock.

In the case of variable-composition runs, the parameter keepBestHM takes a new meaning: all structures
on the convex hull (i.e. thermodynamically stable states of the multicomponent system) are kept as
best and survive, and the difference (if it is positive) between keepBestHM and the number of
convex-hull states is added to the survivors.
An additional variation operator is introduced, transmutation, which produces the fraction fracTrans of
the new generation. In this operator, a randomly selected atom is transmuted into another chemical
species present in the system; the new chemical identity is chosen randomly by default, or you can
specify it in the block specificTrans, just like with specific permutation swaps. The fraction of atoms
that will be transmuted is drawn randomly from a uniform distribution bounded between 0 and the
fractional parameter howManyTrans.
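
A minimal sketch of such an operator is given below. It is an illustration only: the function name is hypothetical, atom types are 0-based integers, restrictions from specificTrans are not handled, and the choice to always transmute at least one atom is an assumption of this example.

```python
import random

# Hypothetical helper, not the USPEX transmutation operator.
def transmute(species, n_types, how_many_trans, rng):
    """species: list of 0-based atom type indices of the parent structure.
    Draws the fraction of atoms to change uniformly from [0, how_many_trans]
    and gives each selected atom a different, randomly chosen type."""
    fraction = rng.uniform(0.0, how_many_trans)
    n_change = min(len(species), max(1, round(fraction * len(species))))
    child = list(species)
    for i in rng.sample(range(len(child)), n_change):
        other_types = [t for t in range(n_types) if t != child[i]]
        child[i] = rng.choice(other_types)
    return child
```

For a 20-atom parent and howManyTrans = 0.2, between 1 and 4 atoms change their chemical identity in this sketch.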
For variable-composition runs, it is particularly important to set up the first generation wisely. Choose
a suitably large initial generation size initialPopSize. Choose a reasonably large number of different
compositions firstGeneMax to be sampled in the first generation (but not too large: each
composition needs to be sampled a few times at least). Finally, choose the total number of atoms/cell
within limits minAt and maxAt (make them not too different); you may need a few calculations
with different system sizes: e.g., 4-8, 8-16, 16-30 atoms, etc.
An additional comment for VASP users: if you want to do a variable-composition run, let's say for
the C-O system, you should make sure the atomic numbers are given correctly in INPUT.txt, and
should put the files POTCAR_C and POTCAR_O in ~/StructurePrediction/Specific. USPEX will then
recognize each atom and take the appropriate POTCAR file for it in the calculations.

9. HOW TO VISUALISE RESULTS, RESTART THE SIMULATION, PERFORM DIFFERENT TYPES OF RUNS, AVOID
TRAPPING.
9.1. HOW TO VISUALIZE RESULTS: USPEX produces a large set of numbers (crystal structures, free energies,
etc.). Analysis of these data by hand can be quite tedious and time-consuming. With efficient visualisation,
this analysis can be greatly sped up and can produce valuable additional insights. Mario Valle has
specifically developed functionalities to read and visualise USPEX output files using his STM4 visualisation
toolkit [52], which we strongly recommend using in conjunction with USPEX. To use STM4 you need to have
AVS/Express installed on your computer. AVS/Express is not public domain and requires a licence.
Alternatively, you can of course visualise USPEX results with other software, e.g. OpenDX, VESTA, etc.
Different types of runs: broadly, you can do two very different types of runs, exploratory and exploitory. In
the first, you run a totally unbiased calculation with random starting structures and no prior knowledge
about the system, keeping only 1 best structure from generation to generation. For large systems, this can
be expensive. Exploitory runs concentrate much more on the low-energy part of the energy landscape and
involve non-random starting structures obtained from some prior knowledge (smaller runs, similar systems,
or the same system at different conditions), accumulating many best structures (let's say, the same number as
the population size) and focussing more on mutations and permutations. Such runs yield a very large number
of low-energy metastable structures and provide a detailed coverage of the most relevant part of the
landscape. They are especially valuable for large systems, and when a search for metastable phases is
performed. The main danger is trapping in a metastable minimum, since diversity in such simulations is
somewhat reduced. There is, of course, a continuous range of simulations between extreme exploratory and
exploitory. We can, in particular, recommend evolutionary metadynamics [19] as a powerful exploitory
approach implemented in USPEX.

9.2. HOW TO AVOID TRAPPING: first, use a sufficiently large population size (at least the same as the number
of atoms in the unit cell; for systems with complicated chemical bonding or close to a transformation, use
larger populations) and a large mutationRate (at least 0.4, but for hard materials 0.5 is perhaps better);
percSliceShift could also be increased. Basically, anything that increases the diversity of the population reduces
the chances of trapping in a local minimum. To make sure that your simulation is not trapped, it is useful to
run a second simulation with different parameters. The good news is that trapping occurs extremely seldom
in USPEX, within very broad reasonable parameter settings. A very powerful yet non-trivial way to avoid
trapping in tough cases is to use fingerprints and tune the parameters toleranceFing and toleranceBestHM, with
the parameter dynamicalBestHM set to 1. The larger those parameters are, the less likely trapping becomes.
For some systems it is the only way to find the solution; however, for simple structures it may
slow down global optimization and should be used with care.

9.3. HOW TO USE SEED TECHNIQUE? This technique is useful if, instead of starting with random structures,
you would like to input some structures that you already know for this compound or related materials. Just
create a file Seeds/POSCARS in the format of concatenated POSCAR files with an empty line at the end (this
empty line is required). Something like this example:

EA33 2.69006 5.50602 4.82874 55.2408 73.8275 60.7535 no SG
1.0
2.6901 0 0
2.6901 4.8041 0
1.3449 2.4021 3.9671
1 2 4
Direct
0.79919 0.56784 0.85959
0.79352 0.23095 0.54475
0.79354 0.91609 0.17445
0.050972 0.81606 0.85961
0.17223 0.19481 0.8596
0.43825 0.65517 0.40688
0.43823 0.20244 0.31233
EA34 7.61073 2.85726 2.85725 60.0001 79.1809 79.1805 no SG
1.0
7.6107 0 0
0.53635 2.8065 0
0.53633 1.352 2.4593
1 2 4
Direct
0.70891 0.50744 0.068339
0.37405 0.28573 0.84663
0.023663 0.069185 0.63009
0.88956 0.78056 0.34146
0.35047 0.62692 0.18782
0.59729 0.21131 0.77221
0.11644 0.37159 0.9325
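
If you already have the seed structures as separate POSCAR files, the concatenation can be scripted. The helper below is a hypothetical convenience function, not part of USPEX. It only joins the file contents and appends the required empty line.

```python
# Hypothetical helper, not part of USPEX.
def build_seeds_file(poscar_texts):
    """Concatenate the contents of several POSCAR files into the text of a
    Seeds/POSCARS file, ending with the required empty line."""
    body = "".join(text.rstrip("\n") + "\n" for text in poscar_texts)
    return body + "\n"
```

The returned string can then be written to Seeds/POSCARS, for example with open("Seeds/POSCARS", "w").write(...).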

For variable-composition calculations, an additional file Seeds/compositions is required, in which the composition
of every structure is given on a single line (for example, for C-O a first line of 0 8 would mean that the first
structure consists of O only, and 8 0 that it consists of C only):
2 4
2 4
In this case both seed structures have composition C2O4.

9.4. HOW TO SET UP PASSWORDLESS CONNECTION FROM LOCAL MACHINE TO REMOTE CLUSTER? You will
need to copy the public key from your local machine (directory ~/.ssh or ~/.ssh2) to the remote cluster. Here is
the list of commands you need to execute:
local # ssh-keygen -t dsa
local # scp ~/.ssh2/id_dsa.pub oganov@palu.cscs.ch:~/.ssh/tmp.pub
remote # cd ~/.ssh/
remote # ssh-keygen -f tmp.pub -i >> authorized_keys
remote # rm tmp.pub

9.5. HOW TO ADAPT USPEX TO YOUR CLUSTER? Well, this may require a little bit of experimentation and
remote=1. Let us assume you want local (i.e. not remote) submission and take whichCluster = gonzales. You
will need to make the following changes:

1) In the part where you get the error (ORG_STRUC.hreidar part of submitJob.m), change the following lines:
-----------------------------------------------------
if numProcessors == 1
    if ORG_STRUC.gonzales
        [nothing, tline] = unix(['bsub -W ' ORG_STRUC.wallTime ' -o schaa "prun ./' ...
            ORG_STRUC.commandExecutable{POP_STRUC.POPULATION(Ind_No).Step} '"']);
    else
        [nothing, tline] = unix(['bsub -o schaa -W ' ORG_STRUC.wallTime ' < ./' ...
            ORG_STRUC.commandExecutable{POP_STRUC.POPULATION(Ind_No).Step}]);
    end
else
    if ORG_STRUC.gonzales
        [nothing, tline] = unix(['bsub -n ' num2str(numProcessors) ' -W ' ORG_STRUC.wallTime ' -o schaa "prun ./' ...
            ORG_STRUC.commandExecutable{POP_STRUC.POPULATION(Ind_No).Step} '"']);
    else
        [nothing, tline] = unix(['bsub -x -n ' num2str(numProcessors) ' -o schaa -W ' ORG_STRUC.wallTime ' < ./' ...
            ORG_STRUC.commandExecutable{POP_STRUC.POPULATION(Ind_No).Step}]);
    end
end
-----------------------------------------------------
Then determine which command is used for job submission on your machine (qsub, bsub, etc.).

2) After that you have to extract the job number from tline. On hreidar this job number is between < and > (for
example <15341>), and that is why the following commands are used:
-------------------
a = 0;
ind1 = 0;
while ~a
    ind1 = ind1 + 1;
    a = strcmp('<', tline(ind1));
end
b = 0;
ind2 = 0;
while ~b
    ind2 = ind2 + 1;
    b = strcmp('>', tline(ind2));
end
jobNumber = str2num(tline(ind1+1:ind2-1));
-------------------
If in your case the job number is enclosed in double quotes ("), you may write something
like this:
-----------------
c = strfind(tline, '"');
jobNumber = tline(c(end-1)+1:c(end)-1);
-----------------
The main idea is that you need your own execution command and then you have to write the job number into the
variable jobNumber. If jobNumber is a number, then use str2num command to convert it to a numerical format.

3) You should also change file checkStatusC.m:
----------------------------------
[nothing, statusStr] = unix(['bjobs ' num2str(jobID)]);
doneOr = strfind(statusStr, 'found');
if ~isempty(doneOr)
    doneOr = 1;
else
    doneOr = strfind(statusStr, 'DONE');
    if ~isempty(doneOr)
        doneOr = 1;
    else
        doneOr = 0;
    end
end
----------------------------------
Use bjobs (if it works on your machine) or an analogous command supported by your supercomputer that checks job
status. Then find a word that indicates that the job is done. For example, if this word is 'finished' and the
command that checks job status is 'mystatus', then you should write something like this:
----------------------------------
[nothing, statusStr] = unix(['mystatus ' num2str(jobID)]);
doneOr = strfind(statusStr, 'finished');
if ~isempty(doneOr)
    doneOr = 1;
else
    doneOr = 0;
end
----------------------------------
Also keep in mind that if the job number is a text string (for example job14523), then you should use jobID instead of
num2str(jobID).

From v.6.7 onwards, there is a better (and strongly preferred) way. For parallel execution, set
remote=2
then choose the name of the cluster (e.g. Yangtze) and set
whichCluster = Yangtze
Now you will need to create a file called "Yangtze" containing commands and data specific to your cluster
- use Remote_template for this and modify it properly.
If you want 8 parallel calculations, each on 2 processors, then set:
numParallelCalcs=8
and
% numProcessors (how many processors per calculation)
2 2 2 2 2
% EndProcessors
See more details in the section on remote submission.

9.6. HOW TO SET UP REMOTE SUBMISSION?
As mentioned earlier, to set up the remote submission for your own supercomputer you have to set remote=2,
whichCluster= mySupercomputer, and create file mySupercomputer by editing Remote_template. You can use
so-called $variables in this remote submission description file that USPEX will replace with actual data during
the job submission.
$~$jobnumber = absence of the job number, useful if it indicates that job was done
$jobnumber = job number
$username = username specified in INPUT_EA.txt
$numProcessors = number of processors specified in INPUT_EA.txt
$wallTime = max time for each calculation specified in INPUT_EA.txt
$currentCommandExecutable = name of the execution script or command specified in INPUT_EA.txt
$currentFolder = current calculation folder (for example 'CalcFold1')
$FullPath = remotePath (from file mySupercomputer) + remoteFolder (from INPUT_EA.txt) + $currentFolder
$remFolder = remoteFolder (from INPUT_EA.txt)
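
The replacement of $variables can be pictured with the following minimal Python sketch. It is only an illustration and not the code used by USPEX: the function name substitute_variables is hypothetical, the example values are taken from the sample input in Appendix 2, the slash inserted between the remote folder and the current folder is an assumption, and the special form $~$jobnumber is not handled.

```python
import re


def substitute_variables(template, values):
    """Replace $name placeholders in template with entries from values.

    Unknown names are left unchanged, and a dollar sign preceded by a
    backslash is skipped. Both rules are assumptions of this sketch.
    """
    if not values:
        return template
    # Try longer names first so that one variable name is never
    # mistaken for a shorter name that happens to be its prefix.
    names = sorted(values, key=len, reverse=True)
    pattern = re.compile(r"(?<!\\)\$(" + "|".join(map(re.escape, names)) + r")")
    return pattern.sub(lambda match: str(values[match.group(1)]), template)


remote_path = "/gpfs/home2/"    # from the cluster file; ends with a slash
remote_folder = "Blind_test"    # from the input file; no slashes
current_folder = "CalcFold1"

values = {
    "jobnumber": "13117",
    "username": "aroganov",
    "numProcessors": 2,
    "wallTime": "2:00",
    "currentCommandExecutable": "vasp",
    "currentFolder": current_folder,
    # Assumption: a slash separates the remote folder and the current folder.
    "FullPath": remote_path + remote_folder + "/" + current_folder,
    "remFolder": remote_folder,
}

print(substitute_variables("bjobs $jobnumber", values))
print(substitute_variables("bash $FullPath/myrun_brutus $FullPath", values))
```

Sorting the names by length before building the pattern is a design choice of the sketch; it keeps the substitution unambiguous if one variable name is a prefix of another.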

It is assumed that the remote folder specified in the mySupercomputer file already exists on the remote machine.
The remote folder specified in INPUT_EA.txt is created by USPEX and is unique for each system. Please note that
the remote folder in mySupercomputer should end with a slash (/), and the remote folder in INPUT_EA.txt should
be a single folder name without any slashes.
Some non-trivial parts of the remote submission description file are:

% filesToCopyOnce
all
% EndFilesToCopyOnce

Write 'all' if you want all files in the calculation folder to be copied, or a list of file names otherwise. These
files will be copied only once, at the beginning/restart of the calculation.

% filesToCopyAlways
POTCAR
POSCAR
INCAR
KPOINTS
% EndFilesToCopyAlways

Write 'all' if you want all files in the calculation folder to be copied, or a list of file names otherwise. These
files are always copied before a new remote job is submitted. The two parameters filesToCopyOnce and
filesToCopyAlways allow the biggest files (executables for structure relaxation) to be copied only once, at the
beginning of the calculation, so that afterwards only the files that change at each new optimization step are copied.

% filesToCopyBack
OUTCAR
CONTCAR
DOSCAR
% EndFilesToCopyBack

Write 'all' if you want all files in the directory to be copied back, or a list of file names otherwise. These files
will be copied to the local machine after the job is done.
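
As an illustration of the block layout used in the description file, the following Python sketch reads a "% keyword ... % EndKeyword" block and expands the special entry 'all'. It is not part of USPEX: the names read_block and files_to_copy are hypothetical, the folder listing (including the file name 'vasp') is invented for the example, and it is assumed that keywords can be matched without regard to case or spaces.

```python
def read_block(text, name):
    """Return the non-empty lines between '% name' and '% Endname'.

    Keywords are compared case-insensitively and with spaces removed,
    so both '% jobNumber' and '%batchFile' style markers are accepted.
    """
    start = "%" + name.lower()
    end = "%end" + name.lower()
    lines = []
    inside = False
    for line in text.splitlines():
        key = line.replace(" ", "").lower()
        if key == start:
            inside = True
        elif inside and key == end:
            return lines
        elif inside and line.strip():
            lines.append(line.strip())
    raise ValueError("block %r not found or not closed" % name)


def files_to_copy(block_lines, folder_contents):
    """Expand the special entry 'all' to every file in the folder."""
    if [entry.lower() for entry in block_lines] == ["all"]:
        return list(folder_contents)
    return list(block_lines)


description = """\
% filesToCopyOnce
all
% EndFilesToCopyOnce

% filesToCopyBack
OUTCAR
CONTCAR
DOSCAR
% EndFilesToCopyBack
"""

# Hypothetical contents of a calculation folder.
folder = ["POTCAR", "POSCAR", "INCAR", "KPOINTS", "vasp"]

print(files_to_copy(read_block(description, "filesToCopyOnce"), folder))
print(files_to_copy(read_block(description, "filesToCopyBack"), folder))
```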

% jobNumber
<
>
1
1
1
-1
% EndJobNumber

This parameter describes how USPEX should find the job number (jobID) in the submit response string. For
example, for the string ' Job <13117> is submitted to default queue <mono>.' you should write 6 strings,
containing <, >, 1, 1, 1, -1. The first two strings give the characters to search for in the submit response,
the next two give which occurrence of each character to use, and the last two give the offsets from these
characters to the job number. In the example above this means that jobID lies between the first appearance of <
and the first appearance of >. Possible special characters are LineStart and LineEnd, which mark the beginning
and the end of the submit response line.
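
The six jobNumber strings can be read as in the following Python sketch, which is one possible interpretation and not the USPEX implementation. The function names are hypothetical, the offsets are assumed to be added to the positions of the two characters with both ends inclusive, and the special characters LineStart and LineEnd are not handled.

```python
def nth_index(text, token, occurrence):
    """Return the 0-based index of the occurrence-th (1-based) token in text."""
    index = -1
    for _ in range(occurrence):
        index = text.find(token, index + 1)
        if index == -1:
            raise ValueError("%r occurs fewer than %d times" % (token, occurrence))
    return index


def extract_job_id(response, start_token, end_token,
                   start_occurrence, end_occurrence,
                   start_offset, end_offset):
    """Cut the job number out of the submit response string."""
    # Position of the first character of the job number.
    first = nth_index(response, start_token, start_occurrence) + start_offset
    # Position of the last character of the job number (inclusive).
    last = nth_index(response, end_token, end_occurrence) + end_offset
    return response[first:last + 1]


response = " Job <13117> is submitted to default queue <mono>."
print(extract_job_id(response, "<", ">", 1, 1, 1, -1))  # prints 13117
```

With the occurrence numbers set to 2 and 2 instead, the same sketch returns the queue name mono, which shows why the occurrence numbers are needed when a character appears several times in the response.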

% doneCommand
bjobs $jobnumber
% EndDoneCommand

Specifies the command to execute to check if work is done. $variables are allowed. Examples:
bjobs $jobnumber
llq $jobnumber
"grep Exiting... $FullPath/$jobnumber/manager.log"

% doneAnswer
DONE
% EndDoneAnswer

String that should be present in the answer from doneCommand, indicating that the calculation is finished.
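
The interplay of doneCommand and doneAnswer can be sketched in Python as follows. This is an illustration under stated assumptions, not the USPEX code: the function name job_is_done is hypothetical, the command is assumed to run in a shell, and both standard output and standard error are searched for the answer string.

```python
import subprocess


def job_is_done(done_command, done_answer):
    """Run done_command in a shell and report whether done_answer occurs in its output."""
    result = subprocess.run(done_command, shell=True,
                            capture_output=True, text=True)
    # Assumption: the answer may be printed to either output stream.
    return done_answer in (result.stdout + result.stderr)


# A harmless stand-in for a real scheduler query such as 'bjobs 13117'.
print(job_is_done("echo 'JOBID 13117 DONE'", "DONE"))
```

In a real setup the first argument would be the doneCommand line after $variable replacement, and the second the doneAnswer string.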

% submitCommand
"bash $FullPath/myrun_brutus $FullPath"
% EndSubmitCommand

Name of the command to execute to submit the job; $variables are allowed. Examples:
"bash $FullPath/myrun_brutus $FullPath"
"llsubmit $FullPath/job"
"/nethome/$username/$FullPath/myrun ~/$FullPath -np $numProcessors -maxtime $wallTime $currentCommandExecutable"

%batchFile
echo path=\$1 > myrun_brutus
echo cd \$path \&\& bsub -o output-file -n $numProcessors ompirun ./$currentCommandExecutable >> myrun_brutus
%EndBatchFile

This creates the batch file to execute. Note that $variables will be replaced by the respective strings before
the echo command is used.

10. REFERENCES.
1. Maddox J. (1988). Crystals from first principles. Nature 335, 201.
2. Day G.M., Motherwell W.D.S., Ammon H.L., Boerrigter S.X.M., Della Valle R.G., Venuti E., Dzyabchenko A., Dunitz
J.D., Schweizer B., van Eijck B.P., Erk P., Facelli J.C., Bazterra V.E., Ferraro M.B., Hofmann D.W.M., Leusen F.J.J., Liang
C., Pantelides C.C., Karamertzanis P.G., Price S.L., Lewis T.C., Nowell H., Torrisi A., Scheraga H.A., Arnautova Y.A.,
Schmidt M.U. & Verwer P. (2005). A third blind test of crystal structure prediction. Acta Cryst. B61, 511-527.
3. Oganov A.R. & Glass C.W. (2006). Crystal Structure Prediction using Ab Initio Evolutionary Techniques: Principles
and Applications. J. Chem. Phys. 124, art. 244704.
4. Glass C.W., Oganov A.R., Hansen N. (2006). USPEX evolutionary crystal structure prediction. Comp. Phys. Comm.
175, 713-720.
5. Lyakhov A.O., Oganov A.R., Valle M. (2010). How to predict very large and complex crystal structures. Comp. Phys.
Comm. 181, 1623-1632.
6. Oganov A.R., Ono S. (2004). Theoretical and experimental evidence for a post-perovskite phase of MgSiO₃ in Earth's
D" layer. Nature 430, 445-448.
7. Murakami M., Hirose K., Kawamura K., Sata N., Ohishi Y. (2004). Post-perovskite phase transition in MgSiO₃.
Science 304, 855-858.
8. Oganov A.R., Valle M. (2009). How to quantify energy landscapes of solids. J. Chem. Phys. 130, 104504.
9. Oganov A.R., Schön J.C., Jansen M., Woodley S.M., Tipton W.W., Hennig R.G. (2010). First blind test of inorganic
crystal structure prediction. In Modern Methods of Crystal Structure Prediction (ed. A.R. Oganov), Wiley-VCH. In
press.
10. Freeman C.M., Newsam J.M., Levine S.M., Catlow C.R.A. (1993). Inorganic crystal structure prediction using
simplified potentials and experimental unit cells: application to the polymorphs of titanium dioxide. J. Mater.
Chem. 3, 531-535.
11. Schmidt M.U., Englert U. (1996). Prediction of crystal structures. J. Chem. Soc. Dalton Trans. 10, 2077-2082.
12. Pickard C.J., Needs R.J. (2006). High-pressure phases of silane. Phys. Rev. Lett. 97, art. 045504.
13. Martinez-Canales M., Oganov A.R., Lyakhov A., Ma Y., Bergara A. (2009). Novel structures of silane under pressure.
Phys. Rev. Lett. 102, 087005.
14. Ma Y., Oganov A.R., Xie Y., Li Z., Kotakoski J. (2009). Novel high pressure structures of polymeric nitrogen. Phys. Rev.
Lett. 102, 065501.
15. Pickard C.J., Needs R.J. (2009). High-pressure phases of nitrogen. Phys. Rev. Lett. 102, 125702.
16. Gao G., Oganov A.R., Li Z., Li P., Cui T., Bergara A., Ma Y., Iitaka T., Zou G. (2010). Crystal structures and
superconductivity of stannane under high pressure. Proc. Natl. Acad. Sci. 107, 1317-1320.
17. Pickard C.J., Needs R.J. (2009). Structures at high pressure from random searching. Phys. Status Solidi 246, 536-540.
18. Lyakhov A.O., Oganov A.R., Stokes H.T. (2012). Submitted.
19. Zhu Q., Oganov A.R., Lyakhov A.O. (2012). Evolutionary metadynamics: a novel method to predict crystal structures.
Cryst.Eng.Comm. 14, 3596-3601.
20. Martoňák R., Laio A., Parrinello M. (2003). Predicting crystal structures: The Parrinello-Rahman method revisited.
Phys. Rev. Lett. 90, art. 075503.
21. Martoňák R., Laio A., Bernasconi M., Ceriani C., Raiteri P., Zipoli F., Parrinello M. (2005). Simulation of structural
phase transitions by metadynamics. Z. Krist. 220, 489-498.
22. Pannetier J., Bassasalsina J., Rodriguez-Carvajal J., Caignaert V. (1990). Prediction of crystal structures from crystal
chemistry rules by simulated annealing. Nature 346, 343-345.
23. Schön J.C., Jansen M. (1996). First step towards planning of syntheses in solid-state chemistry: Determination of
promising structure candidates by global optimization. Angew. Chem. Int. Ed. 35, 1287-1304.
24. Wales D.J., Doye J.P.K. (1997). Global optimization by basin-hopping and the lowest energy structures of Lennard-
Jones clusters containing up to 110 atoms. J. Phys. Chem. A101, 5111-5116.
25. Goedecker S. (2004). Minima hopping: An efficient search method for the global minimum of the potential energy
surface of complex molecular systems. J. Chem. Phys. 120, 9911-9917.
26. Bush T.S., Catlow C.R.A. & Battle P.D. (1995). Evolutionary programming techniques for predicting inorganic crystal
structures. J. Mater. Chem. 5, 1269-1272.
27. Woodley S.M., Battle P.D., Gale J.D., Catlow C.R.A. (1999). The prediction of inorganic crystal structures using a
genetic algorithm and energy minimization. Phys. Chem. Chem. Phys. 1, 2535-2542.
28. Woodley S.M. (2004). Prediction of crystal structures using evolutionary algorithms and related techniques.
Structure and Bonding 110, 95-132.
29. Deaven D.M., Ho K.M. (1995). Molecular geometry optimization with a genetic algorithm. Phys. Rev. Lett. 75, 288-
291.
30. Oganov A.R., Martoňák R., Laio A., Raiteri P., Parrinello M. (2005). Anisotropy of Earth's D" layer and stacking
faults in the MgSiO₃ post-perovskite phase. Nature 438, 1142-1144.
31. Michalewicz Z., Fogel D.B. (2004) How to Solve It: Modern Heuristics. Berlin, Springer.
32. Cartwright H.M. (2004). An introduction to evolutionary computation and evolutionary algorithms. Structure and
Bonding 110, 1-32.
33. Usually called genetic crossover or two-parent variation operator.
34. Like the method of Deaven and Ho (Ref. 29), our evolutionary algorithm involves local optimization of each
generated structure and a real-space representation of the atomic positions. Note that the method of Ref. 29 was
developed specifically for molecules and has never been adapted or applied to the prediction of structures of crystals.
35. Oganov A.R., Glass C.W. (2008). Evolutionary crystal structure prediction as a tool in materials design. J. Phys.: Cond.
Matter 20, art. 064210.
36. Lyakhov A.O., Oganov A.R., Valle M. (2010). Crystal structure prediction using evolutionary approach. In: Modern
methods of crystal structure prediction (ed. A.R. Oganov), pp. 147-180. Berlin: Wiley-VCH.
37. The hard constraints are: the minimum acceptable interatomic distance, the minimum value of a lattice parameter,
the minimum and maximum cell angles. The minimum interatomic distance is usually set to values about 0.7 Å, to
exclude unphysical starting structures and problems with large pseudopotential core overlaps. The minimum length
of a lattice parameter is usually set to less than the diameter of the largest atom under pressure.
38. Kresse G. & Furthmüller J. (1996). Efficient iterative schemes for ab initio total-energy calculations using a plane
wave basis set. Phys. Rev. B54, 11169-11186.
39. Soler J.M., Artacho E., Gale J.D., Garcia A., Junquera J., Ordejon P., Sanchez-Portal D. (2002). The SIESTA method for ab
initio order-N materials simulation. J. Phys.: Condens. Matter 14, 2745-2779.
40. Gale J.D. (2005). GULP: Capabilities and prospects. Z. Krist. 220, 552-554.
41. The list of initial applications (as of December 2005), performed just before we submitted our first
methodological papers, includes: hydrogen at 200 and 600 GPa (2,3,4,6,8,12,16 atoms/cell), carbon (at fixed cell
parameters of diamond and with variable cell at 0, 100, 300, 500, 1000, 2000 GPa and with 8 atoms/cell), silicon
(10,14,20 GPa with 8 atoms/cell), xenon (at 200 and 1000 GPa, 8 atoms/cell), nitrogen at 100 GPa (with 6, 8, 12, 16
atoms/cell) and at 250 GPa (with 8 atoms/cell), oxygen (at experimental cell parameters of the ε- and ζ-phases, and
with variable cell at 25, 130 and 250 GPa using 16 atoms/cell), iron at 350 GPa (8 atoms/cell), sulphur at 12 GPa
(with 3,4,6,8,9,12 atoms/cell), chlorine at 100 GPa (8 atoms/cell), fluorine (50 and 100 GPa, 8 atoms/cell), SiO₂ at 0
GPa (9 atoms/cell), H₂O at 0 GPa (12 atoms/cell), CO₂ at 50 GPa (3,6,9,12,18,24 atoms/cell), TiO₂ (at experimental
cell parameters of anatase, 12 atoms/cell), Al₂O₃ at 300 GPa (20 atoms/cell), Si₂N₂O at 0 GPa (10 atoms/cell), SrSiN₂
at 0 GPa (16 atoms/cell), urea (NH₂)₂CO at experimental cell parameters (16 atoms/cell), MgSiO₃ with 20 atoms/cell
(at experimental cell parameters of post-perovskite, and with variable cell at 80 and 120 GPa), MgCO₃ at 150 GPa
(5, 10, 20 atoms/cell), CaCO₃ (at 50, 80, 150 GPa with 5, 10, 20 atoms/cell; also at experimental cell parameters of
post-aragonite, 10 atoms/cell).
42. Umemoto K., Wentzcovitch R.M., Allen P.B. (2006). Dissociation of MgSiO₃ in the cores of gas giants and terrestrial
exoplanets. Science 311, 983-986.
43. Oganov A.R., Glass C.W., Ono S. (2006). High-pressure phases of CaCO₃: crystal structure prediction and
experiment. Earth Planet. Sci. Lett. 241, 95-103.
44. Jóhannesson G.H., Bligaard T., Ruban A.V., Skriver H.L., Jacobsen K.W., and Nørskov J.K. (2002). Combined Electronic
Structure and Evolutionary Search Approach to Materials Design. Phys. Rev. Lett. 88, art. 255506.
45. Becke A.D. (1993). Density-functional thermochemistry. 3. The role of exact exchange. J. Chem. Phys. 98, 5648-
5652.
46. Tao J.M., Perdew J.P., Staroverov V.N., Scuseria G.E. (2003). Climbing the density functional ladder: Nonempirical
meta-generalized gradient approximation designed for molecules and solids. Phys. Rev. Lett. 91, art. 146401.
47. Fuchs M., Gonze X. (2002). Accurate density functionals: Approaches using the adiabatic-connection fluctuation-
dissipation theorem. Phys. Rev. B65, art. 235109.
48. Liechtenstein A.I., Anisimov V.I., Zaanen J. (1995). Density functional theory and strong interactions: orbital
ordering in Mott-Hubbard insulators. Phys. Rev. B52, R5467-R5470.
49. Strange P., Svane A., Temmerman W.M., Szotek Z., Winter H. (1999). Understanding the valency of rare earths from
first-principles theory. Nature 399, 756-758.
50. Georges A., Kotliar G., Krauth W., Rozenberg M.J. (1996). Dynamical mean-field theory of strongly correlated
fermion systems and the limit of infinite dimensions. Rev. Mod. Phys. 68, 13-125.
51. Foulkes W.M.C., Mitas L., Needs R.J., & Rajagopal G. (2001). Quantum Monte Carlo simulations of solids. Rev. Mod.
Phys. 73, 33-83.
52. Valle M. (2005). STM3: a chemistry visualization platform. Z. Krist. 220, 585-588.
53. Martoňák R., Oganov A.R., Glass C.W. (2007). Crystal structure prediction and simulations of structural
transformations: metadynamics and evolutionary algorithms. Phase Transitions 80, 277-298.

11. APPENDIX 1: TEST RUNS.

Figure (panels a-d). Evolutionary structure search for Au₈Pd₄. (a, b) Evolution of the total energy (for clarity,
panel (b) zooms in on the lowest-energy region of the same data set); (c) the lowest-energy structure found in our
evolutionary simulation; (d) the lowest-energy structure found by cluster expansion. In our simulations, the first
generation of structures was produced randomly. In each generation, there are 40 structures, the best 60% of
which participated in producing the next generation of structures, 70% of which were created by heredity,
20% by permutation and 10% by lattice mutation. The 4 best structures of a given generation survive into the next
generation. Note that our structure (c) is the lowest-energy known structure for this compound.

12. APPENDIX 2. SAMPLE INPUT.

PARAMETERS EVOLUTIONARY ALGORITHM

USPEX : calculationMethod (USPEX, VCNEB, META)

******************************************
* TYPE OF RUN AND SYSTEM *
******************************************
1 : calculationType (1 = bulk, 2 = clusters, 4 = varcomp bulk, 11 = molecular crystals)

% Possible symmetry of the randomly created structures; space groups for crystals, point groups for clusters
% symmetries
1-230
% endSymmetries

% optimisation criteria
enthalpy : optType (optimise by: enthalpy, volume, hardness, struc_order, aver_dist, mag_moment)

% numbers of ions of each type
% numIons
4 8 16
% EndNumIons

% Here come the atomic numbers of the atoms involved
% atomType
12 13 8
% EndAtomType

% For hardness/softmutation, define the next few parameters (valence, etc)
% valencies
2 3 2
% endValencies

% bonds with nu higher than this value are always included into hardness formula, even if connectivity isn't
% increased. Need either matrix (like minDist) or a single value
%%%%%%%%%%%%%%%%
% goodBonds
0.15
% EndGoodBonds
%%%%%%%%%%%%%%%%

0 : checkConnectivity

******************************************
* POPULATION *
******************************************
40 : populationSize (how many individuals per generation)
40 : initialPopSize (how many individuals in the first generation - if =0 then equal to the size specified above)
50 : numGenerations (how many generations shall be calculated)
40 : stopCrit (max number of generations with the same best structure before stoppage)

******************************************
* SURVIVAL OF THE FITTEST AND SELECTION *
******************************************
5 : keepBestHM (how many structures should survive and compete in the next generation)
1 : dynamicalBestHM (1: number of surviving structures varies during calculations with previous parameter as upper bound)
0 : reoptOld (should the old structures be reoptimized? 1:yes, 0:no)
0.6 : bestFrac (What fraction of current generation shall be used to produce the next generation)

******************************************
* VARIATION OPERATORS *
******************************************
0.50 : fracGene (what fraction of generated individuals shall be produced by heredity)
0.10 : fracRand (what fraction of generated individuals shall be produced randomly from space groups specified by user)
1.0 : percSliceShift (what fraction of the structures produced by heredity shall be shifted in all dimensions)

0.2 : fracPerm (what fraction of the generated individuals shall be produced by permutations)
5 : howManySwaps (max of the uniform, between 1 and max, distribution of the number of swaps per MUTANT).
%%%%%%%%%%%%%%%%%%%
% The following are the swaps you want to allow. One line corresponds to a set of interchangeable atoms.
% specificSwaps (write here the swaps you want to allow)

% EndSpecific
%%%%%%%%%%%%%%%%%%%

0.2 : fracAtomsMut (what fraction of the generated individuals shall be produced by atom position mutations)
3.0 : mutationDegree
% softMutOnly
0
% EndSoftOnly
200 : softMutTill

% percentage of structures produced by lattice mutation = 1.0-(FracGene+FracPerm+FracRotMut)
% so don't need to specify explicitly
0.50 : mutationRate (standard deviation of the epsilons in the strain matrix)
1.00 : DisplaceInLatmutation

****************************************
* CONSTRAINTS *
****************************************
2.0 : minVectorLength ( minimal length of any lattice vector)

%%%%%%%%%%%%%%%%
% IonDistances
0.6 0.6 0.6
0.0 0.6 0.6
0.0 0.0 0.6
% EndDistances
%%%%%%%%%%%%%%%%%

*****************************************
* CELL *
*****************************************
% The following is what you know about the lattice. If you know the lattice vectors,
% type them in as 3x3 matrix. If not, type the estimated volume.
% Latticevalues (this word MUST stay here, type values below)
210.0
% Endvalues (this word MUST stay here)

% splitInto (possible number of atoms per one subcell)
2
% EndSplitInto

*****************************************
* RESTART *
*****************************************
0 : pickUpYN
0 : pickUpGen
0 : pickUpFolder

*****************************************
* DETAILS OF AB INITIO CALCULATIONS *
*****************************************
abinitioCode (which code from CommandExecutable shall be used for calculation? vasp (1), siesta (2), gulp (3), etc)
3 3 3 3 3 3 3
ENDabinit

% numProcessors (how many processors per calculation)
1 1 1 1 1 1 1
% EndProcessors

%Resolution for KPOINTS - one number per step or just one number in total)
% KresolStart
0.16 0.14 0.12 0.09 0.07
% Kresolend

2:00 : wallTime (max time for each calculation)
1 : numParallelCalcs (how many parallel calculations shall be performed)

%%%%%%%%%%%%%%%%%
% commandExecutable
mpirun -np 2 vasp > out
siesta
./job3_gulpNP
job2
/nfs/xt3-homes/users/alyakhov/bin/siesta<input.fdf
./meam-lammps_han test.tcl
timelimit -t 400 ./OptimizeNN.x > log
timelimit -t 400 ./dmacrys <mol.res.dmain >output
mpirun -np 4 cp2k.popt cg.inp > cp2k_output
mpirun -np 4 pw.x < qe.in > output
reserved for DL POLY executable
molecular VASP
ASE executable
atkpython < ATK.in > ATK.out
% EndExecutable
%%%%%%%%%%%%%%%%%%

1 : doSpaceGroup (0 - no space group, 1 - calculate space groups)

*****************************************
* HARDWARE-RELATED *
*****************************************
0 : remoteRegime
nonParallel : whichCluster
7 : maxErrors

******************************************
% REMOTE SETTINGS (only if REMOTE>1) *
******************************************
aroganov : username (user name to login to remote supercomputer)
/gpfs/home2 : remotePath (user home folder at supercomputer)
2134 : portNumber
artem : localFolder (TALC folder)
Blind_test : remoteFolder (remote folder)

******************************************
% FINGERPRINTS SETTINGS *
******************************************
0.05 : sigmaFing
0.10 : deltaFing
8.0 : RmaxFing
0.01 : toleranceFing (if distance is less than tolerance - structures are identical)
0.04 : toleranceBestHM

% repeatForStatistics
1
% EndRepeatForStatistics
*************************************************************************
* END OF INPUT *
*************************************************************************

13. APPENDIX 3. SAMPLE OF A SHORT INPUT (MOST PARAMETERS ARE DEFAULTS).
PARAMETERS EVOLUTIONARY ALGORITHM
% Example of the short input, using most options as defaults

% numIons
2 4 8
% EndNumIons

% atomType
12 13 8
% EndAtomType

% valencies
2 3 2
% endValencies

40 : populationSize (how many individuals per generation)
40 : initialPopSize (how many individuals in the first generation - if =0 then equal to the size specified above)
50 : numGenerations (how many generations shall be calculated)

2.0 : minVectorLength (minimal length of any lattice vector)

% Minimal inter-atomic distances matrix of the different ion types.
% IonDistances
0.6 0.6 0.6
0.0 0.6 0.6
0.0 0.0 0.6
% EndDistances

% Latticevalues (this word MUST stay here, type values below)
106.0
% Endvalues (this word MUST stay here)

abinitioCode (which code from CommandExecutable shall be used for calculation? vasp (1), siesta (2), gulp (3), etc)
3 3 3 3 3 3 3
ENDabinit

%%%%%%%%%%%%%%%%%
% What follows is the unix command to call optimizer - exactly as you would type it in the terminal
% commandExecutable
mpirun -np 2 vasp-1.5 > out
siesta
./job3_gulpNP
% EndExecutable
%%%%%%%%%%%%%%%%%%
14. APPENDIX 4. LIST OF 230 SPACE GROUPS.
1 P1 2 P-1 3 P2 4 P2₁ 5 C2
6 Pm 7 Pc 8 Cm 9 Cc 10 P2/m
11 P2₁/m 12 C2/m 13 P2/c 14 P2₁/c 15 C2/c
16 P222 17 P222₁ 18 P2₁2₁2 19 P2₁2₁2₁ 20 C222₁
21 C222 22 F222 23 I222 24 I2₁2₁2₁ 25 Pmm2
26 Pmc2₁ 27 Pcc2 28 Pma2 29 Pca2₁ 30 Pnc2
31 Pmn2₁ 32 Pba2 33 Pna2₁ 34 Pnn2 35 Cmm2
36 Cmc2₁ 37 Ccc2 38 Amm2 39 Abm2 40 Ama2
41 Aba2 42 Fmm2 43 Fdd2 44 Imm2 45 Iba2
46 Ima2 47 Pmmm 48 Pnnn 49 Pccm 50 Pban
51 Pmma 52 Pnna 53 Pmna 54 Pcca 55 Pbam
56 Pccn 57 Pbcm 58 Pnnm 59 Pmmn 60 Pbcn
61 Pbca 62 Pnma 63 Cmcm 64 Cmca 65 Cmmm
66 Cccm 67 Cmma 68 Ccca 69 Fmmm 70 Fddd
71 Immm 72 Ibam 73 Ibca 74 Imma 75 P4
76 P4₁ 77 P4₂ 78 P4₃ 79 I4 80 I4₁
81 P-4 82 I-4 83 P4/m 84 P4₂/m 85 P4/n
86 P4₂/n 87 I4/m 88 I4₁/a 89 P422 90 P42₁2
91 P4₁22 92 P4₁2₁2 93 P4₂22 94 P4₂2₁2 95 P4₃22
96 P4₃2₁2 97 I422 98 I4₁22 99 P4mm 100 P4bm
101 P4₂cm 102 P4₂nm 103 P4cc 104 P4nc 105 P4₂mc
106 P4₂bc 107 I4mm 108 I4cm 109 I4₁md 110 I4₁cd
111 P-42m 112 P-42c 113 P-42₁m 114 P-42₁c 115 P-4m2
116 P-4c2 117 P-4b2 118 P-4n2 119 I-4m2 120 I-4c2
121 I-42m 122 I-42d 123 P4/mmm 124 P4/mcc 125 P4/nbm
126 P4/nnc 127 P4/mbm 128 P4/mnc 129 P4/nmm 130 P4/ncc
131 P4₂/mmc 132 P4₂/mcm 133 P4₂/nbc 134 P4₂/nnm 135 P4₂/mbc
136 P4₂/mnm 137 P4₂/nmc 138 P4₂/ncm 139 I4/mmm 140 I4/mcm
141 I4₁/amd 142 I4₁/acd 143 P3 144 P3₁ 145 P3₂
146 R3 147 P-3 148 R-3 149 P312 150 P321
151 P3₁12 152 P3₁21 153 P3₂12 154 P3₂21 155 R32
156 P3m1 157 P31m 158 P3c1 159 P31c 160 R3m
161 R3c 162 P-31m 163 P-31c 164 P-3m1 165 P-3c1
166 R-3m 167 R-3c 168 P6 169 P6₁ 170 P6₅
171 P6₂ 172 P6₄ 173 P6₃ 174 P-6 175 P6/m
176 P6₃/m 177 P622 178 P6₁22 179 P6₅22 180 P6₂22
181 P6₄22 182 P6₃22 183 P6mm 184 P6cc 185 P6₃cm
186 P6₃mc 187 P-6m2 188 P-6c2 189 P-62m 190 P-62c
191 P6/mmm 192 P6/mcc 193 P6₃/mcm 194 P6₃/mmc 195 P23
196 F23 197 I23 198 P2₁3 199 I2₁3 200 Pm-3
201 Pn-3 202 Fm-3 203 Fd-3 204 Im-3 205 Pa-3
206 Ia-3 207 P432 208 P4₂32 209 F432 210 F4₁32
211 I432 212 P4₃32 213 P4₁32 214 I4₁32 215 P-43m
216 F-43m 217 I-43m 218 P-43n 219 F-43c 220 I-43d
221 Pm-3m 222 Pn-3n 223 Pm-3n 224 Pn-3m 225 Fm-3m
226 Fm-3c 227 Fd-3m 228 Fd-3c 229 Im-3m 230 Ia-3d

15. APPENDIX 5. LIST OF ALL CRYSTALLOGRAPHIC AND THE MOST IMPORTANT NON-
CRYSTALLOGRAPHIC POINT GROUPS IN SCHOENFLIES AND HERMANN-MAUGUIN (INTERNATIONAL)
NOTATIONS.

(A minus sign before a digit denotes an overbar, as in the space group symbols of Appendix 4.)

Hermann-Mauguin   Schoenflies   In USPEX
Crystallographic point groups
1                 C1            C1 or E
2                 C2            C2
222               D2            D2
4                 C4            C4
3                 C3            C3
6                 C6            C6
23                T             T
-1                S2            S2
m                 C1h           Ch1
mm2               C2v           Cv2
-4                S4            S4
-3                S6            S6
-6                C3h           Ch3
m-3               Th            Th
2/m               C2h           Ch2
mmm               D2h           Dh2
4/m               C4h           Ch4
32                D3            D3
6/m               C6h           Ch6
432               O             O
422               D4            D4
3m                C3v           Cv3
622               D6            D6
-43m              Td            Td
4mm               C4v           Cv4
-3m               D3d           Dd3
6mm               C6v           Cv6
m-3m              Oh            Oh
-42m              D2d           Dd2
-62m              D3h           Dh3
4/mmm             D4h           Dh4
6/mmm             D6h           Dh6
Important non-crystallographic point groups
5                 C5            C5
5/m               S5            S5
-5                S10           S10
5m                C5v           Cv5
-10               C5h           Ch5
52                D5            D5
-5m               D5d           Dd5
-10 2m            D5h           Dh5
532               I             I
-5-3m             Ih            Ih
