
GOLD

User Guide & Tutorials


Copyright 2006 The Cambridge Crystallographic Data Centre
Registered Charity No 800579
Conditions of Use
GOLD and its associated documentation and software, including SILVER, (together the Program),
are copyright works and all rights are protected. Use of the Program is permitted solely in accordance
with a valid Software Licence Agreement and the Program is proprietary. All persons accessing the
Program should make themselves aware of the conditions contained in the Software Licence
Agreement.
In particular:
• The Program is to be treated as confidential and may NOT be disclosed or re-distributed in any form, in whole or in part, to any third party.
• No representations, warranties, or liabilities are expressed or implied in the supply of the Program by CCDC, its servants or agents, except where such exclusion or limitation is prohibited, void or unenforceable under governing law.
GOLD © 2006 CCDC Software Ltd.
SILVER © 2006 CCDC Software Ltd.
Implementation of ChemScore within GOLD © Astex Technology
All rights reserved
Licences may be obtained from:
CCDC Software Ltd.
12 Union Road
Cambridge CB2 1EZ
United Kingdom
Email: admin@ccdc.cam.ac.uk
Web: www.ccdc.cam.ac.uk
Telephone: +44-1223-336408
236 Index
Threonine hydroxyls, orientation of 10
Top-ranked docking solution 115
Torsion angle distribution file
adding a new distribution to 83
available choices of 83
editing 84
expand directive 85
format of an individual distribution 85
format of header 84
gold.tordist 83
gold.tordist.new 83
mimumba.tordist 83
period directive 85
selecting in front end 5
Torsion angle distributions
adding a new 83
basic use of 83
distributions file 83
examples 87
matching to ligand torsions 88
Torsion angles, allowing protein side chain flexibility 18
Torsion angles, fixing at input conformation via the gold.conf 66
Tutorials 162
TYPE_DEF (in torsion angle distribution file) 85
U
Use Distributions (check box in front end) 5
User Defined Score
overview of 62
User Defined Score (check box in front end) 5
User-defined scoring function, constructing 62
V
Valence angle, bending energy term for covalent complexes 33
Valence angles 33
Validation of docking predictions
effect of number of ligand atoms, first validation 153
effect of number of ligand atoms, second validation 160
effect of number of ligand H-bonding atoms, first validation 153
effect of number of ligand H-bonding atoms, second validation 160
effect of number of ligand torsions, first validation 153
effect of number of ligand torsions, second validation 160
first series of experiments 153
resolution of protein structure 159
root mean square deviations, first validation 154
second series of experiments 160
subjective analysis compared with rms deviations 158
using the CCDC/Astex validation test set 131
van der Waals (entry box in front end) 5
van der Waals annealing parameter
explanation of 91
setting 5
Van der Waals energy
annealing of 91
external (Goldscore) 46
external, scaling of (Goldscore) 46
internal (Goldscore) 46
listed in ligand log file 119
parameters (Goldscore) 46
Virtual screening 98
Visualisation
grommitt 142
using the front end 4
W
Water molecules 16
end) 3
Selection Pressure (entry box in front end) 7
Selection Pressure (genetic algorithm parameter)
default values 96
explanation of 90
setting value of 96
Serine hydroxyls, orientation of 18
Set atom types (check boxes in front end) 4
setting up proteins
heme containing 15
side chain conformations, defining 19
Side chain flexibility 18
side chain rotamer energy, specifying 24
SILVER 125
analysis of docking results 125
exporting results to 117, 125
visualising docking result 124
Slave process 101
smart_rms 143
Soft potentials, using 47
specifying flexible side chains 19
specifying torsion angle tolerances for rotatable side chains 19
specifying torsion angles for rotatable side chains 19
Speed of GOLD
and reliability 97
effect of early termination 93
effect of genetic algorithm parameters 94
number of dockings 93
Split soft potentials, using 47
standard rotamer library, using 20
Starting geometry
of ligand 30
of protein 18
of protein hydroxyl groups 18
Stereochemistry of ligand 31
Sub-directories, creating for output files 112
Submit&Exit (button in front end) 3
Submitting to background 100
Substructure Constraint (menu item in front end) 72
Substructure-based constraints
setting up 72
Sulfoxide
atom type conventions 44
bond type conventions 44
Sulphonamide
atom type conventions 39
bond type conventions 39
Sulphonate
atom type conventions 39
bond type conventions 39
Sulphone
atom type conventions 39
bond type conventions 39
SYB_TYPE (in torsion angle distribution file) 85
Symmetry, handling of in RMSD calculations 143
T
tag names in output files 151
Tags 151
Tautomerism
of histidine 10
of ligand 30
Template Similarity Constraint (menu item in front end) 79
Template similarity constraints
overview 79
setting up 79
Contacting User Support
If you have any technical or scientific queries concerning this CCDC product, please contact
User Support, who will try to help.
Email: support@ccdc.cam.ac.uk
Website: http://www.ccdc.cam.ac.uk/support
Tel: +44 1223 336022
A list of frequently asked questions (FAQs) is available at the website address given above. This
resource is continually being updated with answers to common questions. Please scan the archive for
the relevant product before making use of our email and telephone support service.
If you need to contact User Support, please try to provide the following information:
• The name and version number of the product with which you are having problems.
• The make, model and operating system of the workstation you are using.
• A clear description of the problem and the circumstances under which it occurred.
Also be prepared to email error messages and other output. This information is always useful when
trying to determine the cause of a problem.
We try to deal with User Support queries within one working day, but sometimes problems can take
longer to solve. When this happens we will keep you informed of our progress and try to provide you
with an answer as quickly as possible.
setting a bond as fully rotatable 38
using 38
Rotatable-bond freezing term, in ChemScore 56
rotating a bond during docking using the rotatable bond override file 38
Run (button in front end) 3
Running GOLD
configuration file, use of 100
directory, use new 99
error messages 124
from command line 100
in background 100
interactive diagnostics 100
interactively 100
parallel mode 101
S
S.a (GOLD internal atom type) 45
S.m (GOLD internal atom type) 45
Save&Exit (button in front end) 3
Scaffold match constraint 80
method 81
setting up 81
Scaffold match constraint, overview 80
Scoring function
angle bending term for covalent complexes 33
apparent increase in during genetic algorithm run 119
bond angle term for covalent complexes 33
bump checking 48
ChemScore
block functions 50
clash penalty 56
constraint terms 58
covalent term 58
explanation of hydrogen-bond terms 52
hydrogen-bond terms 52
ligand torsional strain 56
lipophilic term 54
metal-binding 54
overview of 49
parameter file 58
parameters, altering 58
rotatable-bond freezing term 56
choice of GoldScore, ChemScore, User Defined Score 46
correlation with binding affinity 137
customising parameters 127
GoldScore
atom radii 46
energy parameters 46
external van der Waals energy 46
hydrogen bond directionality parameters 46
hydrogen bond energy, ligand intramolecular 46
hydrogen bond energy, protein-ligand 46
internal van der Waals energy 46
overview of 46
parameter file 48, 59
parameters, altering 59
polarisability parameters 46
scaling of external van der Waals energy 46
van der Waals energy, ligand 46
van der Waals energy, protein-ligand 46
list of, in log file 115
ranking of, for docking solutions 116
torsional parameters 83
User Defined Score 62
overview of 62
valence angle term for covalent complexes 33
scoring function limitations when using flexible side chains 18
Scoring function terms
exporting to SILVER 125
in output files, definition 151
saving to output files 111
Scoring function, adding user terms 62
sd format 31
SD-style 151
SD-style tags 151
Select editing panels, Input (check box in front
disabling 101
FAQs 162
log files 103
PVM (Parallel Virtual Machine) 101
R
Radius
of atom, for use in GoldScore fitness function 46
of binding site 24
ranked_structure... mol2 files 112
Ranking of docked solutions 116
Read hydrophobic fitting points (check box in front end) 5
References describing GOLD 147
Region (hydrophobic) constraints 77
Relative ligand energy 62
Reliability of predictions
as function of number of ligand atoms, first validation 153
as function of number of ligand atoms, second validation 160
as function of number of ligand H-bonding atoms, first validation 153
as function of number of ligand H-bonding atoms, second validation 160
as function of number of ligand torsions, first validation 153
as function of number of ligand torsions, second validation 160
binding affinity, alpha chymotrypsin 139
binding affinity, FKBP12 140
binding affinity, influenza A neuraminidase 138
examples 136
methodology (binding affinity tests) 137
methodology (docking orientation tests) 129
resolution of protein structure 159
root mean square deviations in first validation 154
subjective analysis compared with rms deviations 158
validation, first series of experiments 129
validation, second series of experiments 130
REMOVE_HIGH_ENERGY (parameter in torsion angle distribution file) 84
Reordering (message in log file) 119
Rescore log file 117
Rescoring 106
output files 117
overview 106
setting up 106
Rescoring solution file 117
resetting bond types 38
Resolution of protein structure, and prediction accuracy 159
Rigid ligand docking 66
Rings, varying conformation of 64
rms_analysis 144
rnk file 115
rotamer command in gold.conf 19
limitations 19
rotamer library (standard), using 20
rotamer_lib command block in gold.conf 19
rotamer_library.txt file
commenting out unrequired torsions 20
location 20
using 20
rotamers 18
rotatable_bond_override.mol2 file
fixing an angle at its input angle 38
flipping a bond 38
retyping a bond as an amide (am) bond type 38
Table of Contents
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
2 Overview of the GOLD Front End . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2.1 Control Panel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2.2 Input Parameters and Files Panel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.3 Fitness Function Settings Panel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.4 Genetic Algorithm Parameters Panel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.5 Parallel Operation Panel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
3 Setting Up the Protein . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
3.1 Essential Steps in Setting Up the Protein . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
3.2 Protein Hydrogen Atoms, Ionisation States and Tautomeric States . . . . . . . . . . . . . . . . . . 10
3.3 Metal Ions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
3.3.1 Preparing a Protein Input File which Contains a Metal Ion . . . . . . . . . . . . . . . . . . . . 10
3.3.2 Automatic Determination of Metal Coordination Geometries . . . . . . . . . . . . . . . . . . 11
3.3.3 Specifying Metal Coordination Geometries Manually . . . . . . . . . . . . . . . . . . . . . . . 12
3.3.4 Defining Custom Metal Coordination Geometries . . . . . . . . . . . . . . . . . . . . . . . . . . 14
3.3.5 Metal-Ligand Interactions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.3.6 Heme Containing Proteins . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.4 Water Molecules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
3.4.1 Methodology For Handling Waters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
3.4.2 Specifying Waters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
3.5 Rotatable O-H and NH3 Groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3.6 Flexible Side Chains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3.6.1 Introduction to Side-Chain Flexibility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3.6.2 Specifying a Flexible Side Chain . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3.6.3 Using a Standard Rotamer Library . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.6.4 Allowing a Localised Backbone Movement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.6.5 Protein-Protein Clashes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
3.6.6 Specifying the Energy of a Side-Chain Rotamer . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.7 Large Backbone Movements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.8 Defining the Binding Site . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.8.1 Defining a Binding Site from a Point . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.8.2 Defining a Binding Site from an Atom . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
Processes, maximum number of 105
Program crash, action required in event of 124
Program speed
and reliability 97
effect of early termination 93
effect of genetic algorithm parameters 94
number of dockings 93
Protein
active site definition 24
aspartic acid 10
atom charges 10
atom labels 9
atom types 36
binding site definition 24
bond types 36
cavity detection 28
charges on atoms 10
conformation 18
disulphide bridges 9
file formats 29
file name definition 29
flexibility 18
glutamic acid 10
histidine 10
hydrogen atoms 10
initialised 112
ionisation states 10
metal ions 10
mol2 format 29
pdb format 29
protonation states 10
radius of binding site 24
resolution, correlation with prediction accuracy 159
selecting, in front end 4
serine 18
setting up 9
tautomeric states 10
threonine 18
water molecules 16
protein
dummy atoms 163
lone pairs 163
metal atoms 163
protonation state 163
setting up 163
Protein (entry box in front end) 4
protein backbone movement (large), defining 24
protein backbone movement (localised), defining 20
protein energy term, in GoldScore 23
Protein flexibility
allowing large backbone movement 24
protein-protein clash penalisation, turning off 23
protein-protein clashes, penalisation 23
scoring function limitations 18
side chain flexibility 18
specifying allowed rotatable side chains 19
specifying the energy 24
using a standard rotamer library 20
Protein flexibility
allowing a localised backbone movement 20
Protein H bond constraints
overview of 74
setting up 75
Protein log file 118
protein-protein clash penalisation, turning off 23
protein-protein clashes, penalising when using rotatable side chains 23
Protonation states
of ligand 30
of protein residues 10
PVM
console 102
disabling 101
FAQs 162
log files 103
Parallel Virtual Machine (PVM) 101
Parameter file
ChemScore
editing 58
explanation of 58
GoldScore
editing 48
explanation of 48
selecting in front end 4
Parameter File (entry box in front end) 4
pdb format
for ligand 31
for protein 29
problems of defining bond type 31
Peptide linkages
flipping between cis and trans (in ligands) 65
period (directive in torsion angle distribution file) 85
Phosphate
atom type conventions 39
bond type conventions 39
Planar nitrogen, flipping 65
Polar protein hydrogen atoms
explanation 115
saving to file 111
Polarisability, of atom, for use in GoldScore fitness function 46
Population Size (entry box in front end) 7
Population Size (genetic algorithm parameter)
default values 96
explanation of 89
relationship to program speed 94
setting value of 96
postprocessing of ligand rotatable bonds, switching off 38
Predictions, accuracy of
as function of number of ligand atoms, first validation 153
as function of number of ligand atoms, second validation 160
as function of number of ligand H-bonding atoms, first validation 153
as function of number of ligand H-bonding atoms, second validation 160
as function of number of ligand torsions, first validation 153
as function of number of ligand torsions, second validation 160
binding affinity, alpha chymotrypsin 139
binding affinity, FKBP12 140
binding affinity, influenza A neuraminidase 138
examples 136
methodology (binding affinity tests) 137
methodology (docking orientation tests) 129
resolution of protein structure 159
root mean square deviations in first validation 154
subjective analysis compared with rms deviations 158
validation, first series of experiments 129
validation, second series of experiments 130
Preferences
.gold_preferences 127
ChemScore
fitness function parameters 58
default genetic algorithm parameter settings 96
GoldScore
fitness function parameters 59
torsion angle distributions 84
Process file 124
Process scheduler, for parallel operation 104
process_tab 88
3.8.3 Defining a Binding Site from a List of Atoms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.8.4 Defining a Binding Site from a Single Residue . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.8.5 Defining a Binding Site from a List of Residues . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.8.6 Defining a Binding Site from a Reference Ligand . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.8.7 Cavity Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.8.8 Output of Cavity Volume . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.9 Protein File Formats . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.10 Specifying the Protein File Name . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
4 Setting Up Ligands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
4.1 Essential Steps in Setting Up a Ligand . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
4.2 Ligand Hydrogen Atoms, Ionisation States and Tautomeric States . . . . . . . . . . . . . . . . . . 30
4.3 Ligand Geometry, Conformation and Stereochemistry . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
4.4 Ligand File Formats . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
4.5 Specifying the Ligand File(s) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
4.6 Setting Up Covalently Bound Ligands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
4.6.1 Method Used for Docking Covalently Bound Ligands . . . . . . . . . . . . . . . . . . . . . . . 33
4.6.2 Setting Up a Single Covalent Link . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
4.6.3 Setting Up Substructure-Based Covalent Links . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
5 Atom and Bond Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
5.1 Atom and Bond Type Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
5.2 Automatically Setting Atom and Bond Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
5.3 Manually Setting Atom and Bond Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
5.4 Overriding Automatic Bond Settings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
5.5 Atom and Bond Type Conventions for Difficult Groups . . . . . . . . . . . . . . . . . . . . . . . . . . 39
5.6 Internal GOLD Atom Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
6 Fitness Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
6.1 Choice of Fitness Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
6.2 GoldScore Fitness Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
6.2.1 Docking With Localised Soft Potentials: An Alternative Form for the External Van der Waals Contribution . . . . . . . . . . 47
6.2.2 Bump Checking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
6.3 Altering GoldScore Fitness-Function Parameters; the GoldScore File . . . . . . . . . . . . . . . 48
6.4 ChemScore Fitness Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
6.4.1 Introduction to ChemScore . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
6.4.2 Block Functions in ChemScore . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
6.4.3 Hydrogen-Bond Terms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
6.4.4 Metal-Binding and Lipophilic Terms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
6.4.5 Rotatable-Bond Freezing Term . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
6.4.6 Clash Penalty and Internal Torsion Terms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
6.4.7 Covalent Term . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
6.4.8 Constraint Terms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
6.5 Altering ChemScore Fitness-Function Parameters; the ChemScore File . . . . . . . . . . . . . . 58
6.6 Altering GOLD Parameters: the gold.params File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
6.7 Kinase Scoring Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
6.8 Heme Scoring Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
6.9 Internal Energy Offset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
6.10 User Defined Fitness Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
7 Ligand Flexibility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
7.1 Flipping Ring Corners . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
7.2 Flipping Amide Bonds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
7.3 Flipping Planar Nitrogens . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
7.4 Flipping Pyramidal Nitrogens . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
7.5 Intramolecular Hydrogen Bonds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
7.6 Protonated Carboxylic Acids . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
7.7 Fixing Rotatable Bonds at Their Input Conformation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
8 Setting and Releasing Constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
8.1 Using the Constraint Editor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
8.2 Distance Constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
8.2.1 Setting Up a Distance Constraint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
8.2.2 Method Used for Substructure-Based Distance Constraints . . . . . . . . . . . . . . . . . . . 71
8.2.3 Setting Up Substructure-Based Distance Constraints . . . . . . . . . . . . . . . . . . . . . . . . 72
8.3 Hydrogen Bond Constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
8.3.1 Setting Up Hydrogen Bond Constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
8.3.2 Method Used for Protein H Bond Constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
8.3.3 Setting up Protein H Bond Constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
8.4 Region (Hydrophobic) Constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
explanation of 91
setting values of 96
Output
controlling amount 109
controlling information written to files 111
tag names 151
Output Directory (entry box in front end) 4
Output files
active_atoms, set in gold_protein.mol2 112
atom type errors 124
best docking solution 115
bestranking.lst 116
cluster analysis 144
comparison of docking solutions 120
directories 112
docked ligand files 112
donor_hydrogens, set in gold_protein.mol2 112
energy values 119
error messages 124
fitness function may appear to increase 119
fitness function scores 115
formats same as input files 112
gold.err 124
gold.pid 124
gold_ligand.mol2 112
gold_protein.log 118
gold_protein.mol2 112
gold_solution... mol2 files 118
hydrogen bond energy 119
initialised ligand file 112
initialised protein file 112
ligand log file 118
links, symbolic, between ligand docking files 112
log file 118
lone_pairs, set in gold_protein.mol2 112
naming conventions 109
process file 124
protein log file 118
ranked_structure... mol2 files 112
ranking of docked solutions 116
reordering (message in log file) 118
rescore log file 117
rescore.mol2 file 117
rms comparison of docked solution 120
rnk 115
sub-directories 112
symbolic links between ligand docking files 112
van der Waals energy 119
overriding ligand bond types 38
Overview
of fitness functions 46
of front end 3
of genetic algorithm 89
of GOLD 1
of torsion angle distributions 83
Oxygen, anionic
atom type conventions 39
bond type conventions 39
P
Parallel (check box in front end) 3
Parallel mode of running
host 101
how it works 101
maximum number of processes 105
multi-processor machines 101
PVM 101
PVM log files 103
selecting and deselecting machines 104
using the console 102
Parallel Operation (panel in front end) 8
Parallel Virtual Machine
console 102
processing 101
Mutate (entry box in front end) 7
Mutate (genetic algorithm parameter)
default values 96
explanation of 91
setting value of 96
N
N.acid (GOLD internal atom type) 45
N.plc (GOLD internal atom type) 45
N_BINS (parameter in torsion angle distribution file) 84
Naming conventions for ligand output files 109
NEIGHBOURS (in torsion angle distribution file) 85
Neuraminidase binding affinity 138
Niche Size (entry box in front end) 7
Niche Size (genetic algorithm parameter)
default values 96
explanation of 91
setting value of 96
Niching 7
Nitro
atom type conventions 39
bond type conventions 39
Nitrogen, anionic
atom type conventions 39
bond type conventions 39
Nitrogen, cationic
atom type conventions 39
bond type conventions 39
NODE (in torsion angle distribution file) 85
Non-bonded contacts, allowing short 48
N-oxide
atom type conventions 39
bond type conventions 39
Number of Constraints (display box in front end) 5
Number of dockings
early termination 93
effect on program speed 93
setting 93
Number of Islands (entry box in front end) 7
Number of Islands (genetic algorithm parameter)
default values 96
explanation of 90
setting value of 96
Number of ligand atoms, effect on prediction accuracy
first validation 153
second validation 160
Number of Ligand Bumps (display box in front end) 5
Number of ligand H-bonding atoms, effect on prediction accuracy
first validation 153
second validation 160
Number of ligand torsions, effect on prediction accuracy
first validation 153
second validation 160
Number of Ligands (display box in front end) 4
Number of Operations (entry box in front end) 7
Number of Operations (genetic algorithm parameter)
default values 96
explanation of 90
relation to program speed 94
setting value of 96
O
Operator weights in genetic algorithm
default values 96
8.4.1 Method Used for Region (Hydrophobic) Constraints . . . . . . . . . . . . . . . . . . . . . . . . 77
8.4.2 Setting Up Region (Hydrophobic) Constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
8.5 Template Similarity Constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
8.5.1 Method Used for Template Similarity Constraints . . . . . . . . . . . . . . . . . . . . . . . . . . 79
8.5.2 Setting Up a Template Similarity Constraint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
8.6 Scaffold Match Constraint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
8.6.1 Method Used for Scaffold Match Constraint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
8.6.2 Setting Up Scaffold Match Constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
9 Torsion Angle Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
9.1 Basic Use of Torsion Angle Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
9.2 Choice of Torsion Angle Distribution Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
9.3 Editing Torsion Angle Distribution Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
9.3.1 Format of Torsion Angle Distribution File Header . . . . . . . . . . . . . . . . . . . . . . . . . . 84
9.3.2 Format of Torsion Angle Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
9.3.3 Example Torsion Angle Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
9.3.4 Extracting Torsion Angle Distributions from the Cambridge Structural Database . . 88
9.4 Matching Torsion Angle Distributions at Run Time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
10 Genetic Algorithm Parameter Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
10.1 Genetic Algorithm Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
10.2 Population Size . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
10.3 Selection Pressure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
10.4 Number of Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
10.5 Number of Islands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
10.6 Niche Size . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
10.7 Operator Weights: Migrate, Mutate, Crossover . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
10.8 Van der Waals and Hydrogen Bonding Annealing Parameters . . . . . . . . . . . . . . . . . . . . . 91
10.9 Hydrophobic Fitting Points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
11 Balancing Reliability and Speed . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
11.1 Number of Dockings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
11.2 Early Termination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
11.3 Controlling Reliability and Speed with GA Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
11.3.1 Relationship between GA Parameters and Speed . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
11.3.2 Using Automatic GA Parameter Settings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
Table of Contents ix
11.3.3 Using Pre-Defined GA Parameter Settings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
11.3.4 Benchmarking of Reliability/Speed for Pre-defined GA Parameter Settings . . . . . . 97
11.3.5 GA Parameter Settings for Virtual Screening . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
12 Running GOLD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
12.1 Required Input Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
12.2 Starting GOLD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
12.3 Running Interactively; Interactive Diagnostics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
12.4 Submitting a GOLD job to the Background from the Front End . . . . . . . . . . . . . . . . . . . 100
12.5 Running GOLD from the Command Line . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
12.6 Running in Parallel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
12.6.1 Parallel Virtual Machine (PVM) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
12.6.2 Using the PVM Console . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
12.6.3 Diagnosis of PVM Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
12.6.4 Selecting and Deselecting Machines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
12.6.5 Setting the Maximum Number of Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
12.6.6 Using GOLD with your own PVM Installation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
13 Rescoring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
13.1 Rescoring Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
13.2 Setting Up a Rescoring Run . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
14 Output Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
14.1 Controlling the Amount of Output . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
14.2 Controlling the Information Written to Output Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
14.3 Specifying Directories for Output Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
14.4 Files Containing the Initialised Protein and Ligand . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
14.5 Files Containing the Docked Ligand(s) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
14.6 Files Containing Protein Binding-Site Geometry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
14.7 Files Containing Fitness Function Rankings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
14.7.1 File Containing Ranked Fitness Scores for an Individual Ligand . . . . . . . . . . . . . . 115
14.7.2 File Containing Ranked Fitness Scores for a Set of Ligands . . . . . . . . . . . . . . . . . . 116
14.8 Files Containing the Results of Rescoring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
14.8.1 Rescore Solution File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
14.8.2 Rescore Log File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
14.9 Protein Log File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
228 Index
setting up 30
starting geometry 30
stereochemistry 31
tautomeric states 30
valence angles 33
Ligand Editor (window in front end) 32
Ligand energy 62
Ligand energy correction 62
Ligand input files 4
formats 31
multiple ligands 32
Ligand internal torsional strain, in ChemScore 56
Ligand log file 118
best docking solution 115
cluster analysis 120
comparison of docking solutions 120
energy values 119
fitness function may appear to increase 119
fitness function scores 115
reordering (message in log file) 119
rms comparison of docked solutions 120
Ligand output files
best docking solution 115
directories 112
docked ligand files 112
formats same as input files 112
gold_ligand.mol2 112
gold_solution... mol2 files 118
initialised ligand file 112
links, symbolic, between ligand docking files 112
naming conventions 109
ranked_structure... mol2 files 112
rnk 115
sub-directories 112
symbolic links between ligand docking files 112
limitations (scoring function) when using flexible side chains 18
LINKAGE (in torsion angle distribution file) 85
Links, symbolic, between ligand docking files 112
Lipophilic term, in ChemScore 54
Literature references describing GOLD 147
Log file
ligand 118
protein 118
lone_pairs (set in gold_protein.mol2) 112
M
Maximum number of distributed processes (entry box in front end) 8
Maximum number of processes, setting 105
Metal ions
custom coordination geometries 14
determination of coordination geometries 11
preparation of input files 10
specifying coordination geometries 12
Metal ligand interactions 15
Metal-binding term, in ChemScore 54
Migrate (entry box in front end) 7
Migrate (genetic algorithm parameter)
default values 96
explanation of 91
setting value of 96
mimumba.tordist 83
mol format 31
mol2 format
for ligands 31
for multiple ligands 32
for protein 29
Multiple ligands, docking of 116
Multi-processor machines, use in parallel
Index 227
I
Identify ligand, utility 145
improper torsions, defining 20
Influenza A neuraminidase binding affinity 138
Initial geometry 31
Initialised ligand 112
Initialised protein 112
Input files 99
Input Parameters and Files (panel in front end) 4
Interactive use
run-time diagnostics 100
Internal energy of ligand 62
Internal H-Bonds (menu item in front end) 66
Internal ligand energy offset 62
Internal van der Waals energy (Goldscore) 46
Interrupt GA (button in GOLD Output window) 100
Intramolecular hydrogen bonds in ligand
switching on and off 66
Introduction
to fitness functions 46
to front end 3
to genetic algorithm 89
to GOLD 1
to torsion angle distributions 83
Ionisation states
of ligand 30
of protein residues 10
K
kinase scoring function (ChemScore), using 59
L
Lennard-Jones potentials, using localised soft potentials 47
Library screening 98
Library screening settings (menu item in front end) 98
Ligand
Add Ligand (window in front end) 32
Add/Delete Ligand (button in front end) 4
atom charges 30
atom types 36
bond angles 31
bond lengths 31
bond types 36
bond types, specifying in pdb files 31
charges, atomic 30
chiral 31
conformation 31
diastereomers 31
enantiomers 31
file formats 31
file name definition 32
flexibility 64
geometry 31
hydrogen atoms 30
initialised 112
input files 31
ionisation states 30
Ligand Editor (window in front end) 32
mol format 31
mol2 format 31
output files 112
pdb format 31
prediction accuracy, as function of number of atoms 153
prediction accuracy, as function of number of H-bonding atoms 153
prediction accuracy, as function of number of torsions 153
protonation states 30
rings 64
sd format 31
selecting, in front end 4
x Table of Contents
14.10 Ligand Log File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
14.10.1 Information on the Progress of Docking Runs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
14.10.2 Comparison of Docking Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
14.10.3 Identification of Different Binding Modes (Clustering of Ligand Poses) . . . . . . . . 122
14.11 File Containing Error Messages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
14.12 Process File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
14.13 Viewing Docked Solutions in SILVER . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
14.14 Exporting Fitness-Function Data to SILVER . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
15 Saving and Reusing Program Settings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
15.1 Saving and Re-using Program Settings in Configuration Files . . . . . . . . . . . . . . . . . . . . 126
15.2 Customising Fitness Function Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
15.3 Customising the Torsion Angle Distribution File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
15.4 Creating Customised Default Genetic Algorithm Parameter Settings . . . . . . . . . . . . . . . 127
16 Accuracy of Predictions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
16.1 Correlation between Predicted and Observed Ligand Positions . . . . . . . . . . . . . . . . . . . . 129
16.1.1 Initial Validation of Docking Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
16.1.2 Follow-Up Validation of Docking Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
16.1.3 Validation using the CCDC/Astex Test Set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
16.1.4 Examples of GOLD Dockings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
16.2 Correlation between Fitness Function and Biological Activity . . . . . . . . . . . . . . . . . . . . 137
16.2.1 Prediction of Binding Affinity to Influenza A Neuraminidase . . . . . . . . . . . . . . . . 138
16.2.2 Prediction of Binding Affinity to Alpha Chymotrypsin . . . . . . . . . . . . . . . . . . . . . . 139
16.2.3 Prediction of Binding Affinity to FKBP12 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
17 Context-Dependent Help . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
18 Utility Programs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
18.1 grommitt . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
18.2 smart_rms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
18.3 rms_analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
18.4 identify_ligand.py . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
19 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
20 Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148
21 Appendix A: List of Atom and Bond Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
22 Appendix B: Additional Tags in Output Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
23 Appendix C: GOLD Predictions in First Series of Validation Tests . . . . . . . . . . . . . . . . . 153
24 Appendix D: GOLD Predictions in Second Series of Validation Tests . . . . . . . . . . . . . . . 160
25 Appendix E: GOLD Tutorials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
26 Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 219
226 Index
gold_ligand.mol2 112
gold_protein.log 118
gold_protein.mol2 112
gold_solution... mol2 files 118
GoldScore
atom radii 46
energy parameters 46
heme scoring function, using 48
hydrogen bond energy, ligand intramolecular 46
hydrogen bond energy, protein-ligand 46
internal van der Waals energy 46
overview of 46
parameter file 48, 59
parameters, altering 59
polarisability parameters 46
rescoring with 106
scaling of external van der Waals energy 46
torsional energy of ligand 46
van der Waals energy, ligand 46
van der Waals energy, protein-ligand 46
GoldScore (check box in front end) 5
GoldScore fitness terms, in output files 151
goldscore.p450_csd.params file 60
goldscore.p450_pdb.params file 60
goldscore.params file 48
grommitt 142
Guanidinium
atom type conventions 39
bond type conventions 39
H
Help, context sensitive 141
heme scoring function
different parameter files 60
making planar heme N atoms lipophilic (ChemScore) 60
using 60
using CSD data 60
using PDB data 60
heme-containing proteins, how to set up 15
Histidine, defining ionisation and tautomeric state of 10
Host file name (button in process scheduler) 104
Host machines, selecting 104
Hydrogen atoms
constraining to form hydrogen bonds 66
histidine 10
ionisable groups 10
ligand 30
necessity of including 10, 30
protein 10
serine hydroxyls 10
threonine hydroxyls 10
hydrogen bond directionality parameters 46
Hydrogen bond energy
annealing of 91
annealing parameters 91
directionality parameters (Goldscore) 46
ligand intramolecular (Goldscore) 46
ligand intramolecular, switching on and off 66
listed in ligand log file 119
parameters 91
protein-ligand (Goldscore) 46
Hydrogen Bonding (entry box in front end) 5
Hydrogen bonding annealing parameter
explanation of 91
setting 5
Hydrogen-bond terms, in ChemScore 52
Hydrophobic fitting points
explanation of 92
setting value of 92
Index 225
introduction to 18
overview 18
protein-protein clash penalisation, turning off 23
protein-protein clashes, penalisation 23
rotamer command in gold.conf 19
rotamer command limitations 19
rotamer_lib block in gold.conf 19
scoring function limitations 18
specifying 19
specifying the energy 24
using a standard rotamer library 20
flexible side chain conformations, defining 19
Flip Amide Bonds (menu item in front end) 64
Flip Planar N (menu item in front end) 65
Flip Protonated Carboxylic Acid (menu item in front end) 66
Flip Pyramidal N (menu item in front end) 66
Flip Ring Corners (menu item in front end) 64
flipping a bond during docking using the rotatable bond override file 38
Formats
of ligand files 31
of output files 112
of protein files 29
problems with pdb 31
FRAGMENT (in torsion angle distribution file) 85
Front end, overview 3
G
GA (check box in front end) 3
Genetic algorithm
and accuracy 129
and speed 94
annealing parameters 91
automatic determination of optimal settings 94
basic description 89
benchmarking of default parameter sets 97
chromosome 89
crossover 91
customising default settings 127
FINAL_VIRTUAL_PT_MATCH_MAX 91
FINISH_VDW_LINEAR_CUTOFF 91
hydrogen bonding, annealing parameter 91
hydrophobic fitting points 92
library screening parameters 98
migrate 91
mutate 91
niche size 91
number of islands 90
number of operations 90
operator weights 91
overview 89
parameter settings for virtual screening 98
population size 89
prediction accuracy 129
selection pressure 90
setting parameters 96
van der Waals, annealing parameter 91
virtual screening 98
Genetic Algorithm Parameters (panel in front end) 7
Geometry, starting
of ligand 30
of protein 18
of protein hydroxyl groups 18
Glutamic acid, defining ionisation state of 10
gold.conf 126
gold.err 124
gold.params 59
gold.pid 124
gold.tordist 83
gold.tordist.new 83
GOLD User Guide
1. Introduction (see page 1)
2. Overview of the GOLD Front End (see page 3)
3. Setting Up the Protein (see page 9)
4. Setting Up Ligands (see page 30)
5. Atom and Bond Types (see page 36)
6. Fitness Functions (see page 46)
7. Ligand Flexibility (see page 64)
8. Setting and Releasing Constraints (see page 68)
9. Torsion Angle Distributions (see page 83)
10. Genetic Algorithm Parameter Definitions (see page 89)
11. Balancing Reliability and Speed (see page 93)
12. Running GOLD (see page 99)
13. Rescoring (see page 106)
14. Output Options (see page 109)
15. Saving and Reusing Program Settings (see page 126)
16. Accuracy of Predictions (see page 129)
17. Context-Dependent Help (see page 141)
18. Utility Programs (see page 142)
19. References (see page 147)
20. Acknowledgments (see page 148)
1. Introduction
GOLD (Genetic Optimisation for Ligand Docking) is a genetic algorithm for docking flexible
ligands into protein binding sites.
A version of SILVER is supplied with GOLD. SILVER has two purposes. First, it serves as a
browser for visualising protein-ligand dockings from GOLD. Second, it allows you to define
and calculate a wide variety of descriptors (parameters that describe dockings) which may be
used to analyse the results of a docking run. For further information refer to the SILVER User
Guide.
GOLD provides all the functionality required for docking ligands into protein binding sites from
prepared input files (see Section 3.1, page 9, and Section 4.1, page 30). GOLD will typically
be used in conjunction with a molecular modelling program, since you will need to create and
edit the starting models, e.g. to add all hydrogen atoms, including those necessary for defining
the correct ionisation and tautomeric states of the residues. Commonly used molecular modelling
environments include:
SYBYL (http://www.tripos.com/)
Insight II or Cerius2 (http://www.accelrys.com/).
Predicting how a small molecule will bind to a protein is difficult, and no program can guarantee
224 Index
flipping of planar nitrogen 65
flipping of protonated carboxylic acids 66
flipping of ring corners 64
flipping of pyramidal nitrogen 66
intramolecular hydrogen bonds 66
Fitness Flags (button in front end) 5
Fitness function
angle bending term for covalent complexes 33
apparent increase in during genetic algorithm run 119
bond angle term for covalent complexes 33
ChemScore 49
block functions 50
clash penalty 56
constraint terms 58
covalent term 58
explanation of hydrogen-bond terms 52
hydrogen-bond terms 52
ligand torsional strain 56
lipophilic term 54
metal-binding term 54
overview of 49
parameter file 58
parameters, altering 58
rotatable-bond freezing term 56
choice of GoldScore, ChemScore, User Defined Score 46
correlation with binding affinity 137
customising parameters 127
GoldScore 46
atom radii 46
bump checking 48
energy parameters 46
external van der Waals energy 46
hydrogen bond directionality parameters 46
hydrogen bond energy, ligand intramolecular 46
hydrogen bond energy, protein-ligand 46
internal van der Waals energy 46
overview of 46
parameter file 48
parameters, altering 48
polarisability parameters 46
scaling of external van der Waals energy 46
torsional energy of ligand 46
van der Waals energy, ligand 46
van der Waals energy, protein-ligand 46
list of, in log file 115
ranking of, for docking solutions 116
torsional parameters 83
User Defined Score 62
overview of 62
valence angle term for covalent complexes 33
Fitness Function (check box in front end) 3
Fitness Function Settings (panel in front end) 5
Fitness function, limitations when using flexible side chains 18
Fitness terms
Chemscore, definition 151
exporting to SILVER 125
Goldscore, definition 151
in output files, definition 151
saving to output files 111
fixing a bond at its input angle using the rotatable bond override file 38
Fixing rotatable bonds at input conformation via the gold.conf 66
FKBP12 binding affinity 140
Flexibility, treatment of
for ligands 89
for protein hydroxyl groups 89
for proteins 18
for rings 64
flexible groups 163
dummy atom 163
set as rigid 163
Flexible protein side chains
chi command in gold.conf 19
chi command limitations 19
commenting out unrequired rotamer lines 20
defining torsion tolerances 19
defining torsions 19
Index 223
valence angle term for covalent complexes 33
Energy values
docking solutions ranked by 119
listed in log file 119
Enolate
atom type conventions 39
bond type conventions 39
Error messages
atom typing 124
during interactive use 124
gold.err 124
Examples of docking results 136
Exit (button in front end) 3
External van der Waals energy 91
F
FAQs 162
File formats
for ligands 31
for output files 112
for proteins 29
problems with pdb 31
File names
conventions for ligand output files 112
specifying for ligand 32
specifying for protein 29
File, configuration 126
Files, input 4
Files, output
active_atoms, set in gold_protein.mol2 112
atom type errors 124
best docking solution 115
bestranking.lst 116
cluster analysis 144
comparison of docking solutions 120
directories 112
docked ligand files 112
donor_hydrogens, set in gold_protein.mol2 112
energy values 119
error messages 124
fitness function may appear to increase 119
fitness function scores 115
formats same as input files 112
gold.err 124
gold.pid 124
gold_ligand.mol2 112
gold_protein.log 118
gold_protein.mol2 112
gold_solution... mol2 files 118
hydrogen bond energy 119
initialised ligand file 112
initialised protein file 112
ligand log file 118
links, symbolic, between ligand docking files 112
log file 118
lone_pairs, set in gold_protein.mol2 112
naming conventions 109
process file 124
protein log file 118
ranked_structure... mol2 files 112
ranking of docked solutions 116
reordering (message in log file) 119
rms comparison of docked solutions 120
rnk 115
sub-directories 112
symbolic links between ligand docking files 112
van der Waals energy 119
FINAL_VIRTUAL_PT_MATCH_MAX 91
FINISH_VDW_LINEAR_CUTOFF 91
Fit point file (button in front end) 92
Fitness flags
flipping of amide bonds 64
2 GOLD User Guide
success. The next best thing is to measure as accurately as possible the reliability of the program,
i.e. the chance that it will make a successful prediction in a given instance. For that reason,
GOLD has been tested on a large number of complexes extracted from the Protein Data Bank
(see Section 16.1, page 129). The overall conclusion of these tests was that the top-ranked
GOLD solution was correct in 70-80% of cases.
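The success rate quoted above is, in effect, the fraction of test complexes for which the top-ranked pose reproduces the experimentally observed ligand position. The following minimal Python sketch illustrates that bookkeeping; it is not GOLD code. It assumes that poses are supplied as matched lists of atom coordinates (same atom order, no symmetry correction, no superposition) and that a pose counts as correct when its rms deviation from the crystal structure is at most 2.0 Å, a threshold commonly used in docking validation but assumed here rather than taken from this guide. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical helpers for illustration only; not part of GOLD.
Coord = Tuple[float, float, float]


def rmsd(pose: Sequence[Coord], reference: Sequence[Coord]) -> float:
    """Root-mean-square deviation between two poses with identical atom order.

    No superposition or symmetry correction is applied: docked and
    experimental ligands are assumed to share the protein's frame.
    """
    if len(pose) != len(reference) or not pose:
        raise ValueError("poses must be non-empty and have the same atom count")
    total = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(pose, reference):
        total += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(total / len(pose))


def success_rate(top_poses: List[Sequence[Coord]],
                 crystal_poses: List[Sequence[Coord]],
                 threshold: float = 2.0) -> float:
    """Fraction of complexes whose top-ranked pose lies within `threshold` (Å).

    The 2.0 Å default is an assumed convention, not a GOLD setting.
    """
    hits = sum(1 for top, xtal in zip(top_poses, crystal_poses)
               if rmsd(top, xtal) <= threshold)
    return hits / len(crystal_poses)


if __name__ == "__main__":
    crystal = [[(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)],
               [(0.0, 0.0, 0.0), (0.0, 1.5, 0.0)]]
    docked = [[(0.3, 0.0, 0.0), (1.8, 0.0, 0.0)],  # shifted 0.3 Å: counted correct
              [(5.0, 0.0, 0.0), (5.0, 1.5, 0.0)]]  # shifted 5 Å: counted wrong
    print(success_rate(docked, crystal))  # 0.5
```

Real validation studies usually also handle symmetry-equivalent atoms, which this sketch deliberately ignores to keep the arithmetic visible.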
GOLD offers a choice of scoring functions: GoldScore (see Section 6.2, page 46), ChemScore
(see Section 6.4, page 49) and User Defined Score, which allows users to modify an existing
function or implement their own scoring function (see Section 6.10, page 62). GoldScore and
ChemScore differ in that one may give a successful prediction where the other fails, but their
overall success rates are about the same (see Section 16., page 129).
Different values of the genetic algorithm parameters may be used to control the balance between
the speed of GOLD and the reliability of its predictions (see Section 11., page 93). GOLD will
only produce reliable results if it is used properly; correct atom typing for both protein and
ligand is particularly important (see Section 5., page 36).
GOLD may be used in serial or parallel modes (see Section 12.6, page 101).
2. Overview of the GOLD Front End
The GOLD front end consists of five panels, not all of which need be displayed at the same
time. These are:
Control panel (see Section 2.1, page 3)
Input Parameters and Files panel (see Section 2.2, page 4)
Fitness Function Settings panel (see Section 2.3, page 5)
Genetic Algorithm Parameters panel (see Section 2.4, page 7)
Parallel Operation panel (see Section 2.5, page 8)
2.1 Control Panel
The Control panel of the GOLD front end contains the following buttons, entry boxes and check
boxes:
Run: Starts an interactive GOLD job.
Settings: Offers a choice of genetic algorithm parameter settings (see Section 11.3.3, page 96).
Save&Exit: Saves the current parameter settings in a configuration file for later use, and closes
the front end (see Section 15.1, page 126).
Submit&Exit: Starts a GOLD run in the background (and also saves the parameter settings as a
configuration file), then closes the front end.
Exit: Closes the front end without saving the current parameter settings.
Configuration File: Reads parameter settings from a previously saved configuration file and
loads the parameter values into the front end. The name of the required configuration file must
be typed into the entry box.
Help: Brings up help documentation.
Select editing panels:
Input: Switches on and off the display of the Input Parameters and Files panel (see Section
2.2, page 4).
Fitness Function: Switches on and off the display of the Fitness Function Settings panel (see
Section 2.3, page 5).
GA: Switches on and off the display of the Genetic Algorithm Parameters panel (see Section
222 Index
ring conformations 64
Context sensitive help 141
Control panel 3
Correction term, ligand energy 62
Covalent (check box in front end) 4
Covalent constraints
angle-bending term in 33
method used 33
overview 33
Covalent substructure-based constraints, setting up 34
Covalent term, in ChemScore 58
Crash, action required in event of 124
Create output sub-directories (check box in front end) 4
Crossover (entry box in front end) 7
Crossover (genetic algorithm parameter)
default values 96
explanation of 91
setting value of 96
Customising
default genetic algorithm parameter settings 127
fitness function parameter file 127
torsion angle distribution file 127
D
Default (button in front end) 3
Default settings, of genetic algorithm parameters 96
Define active site from (buttons in front end) 4
DELTA_E (parameter in torsion angle distribution file) 84
Detect Cavity (check box in front end) 4
Diastereomers 31
DIRECTIVE (in torsion angle distribution file) 85
Directory
for input 32
for output 112
output sub-directories 112
Display/Output Options (button in front end) 4
Distributed processes, setting maximum number of 105
Distributions File (button in front end) 5
Disulphide bridges 9
Docking solutions
examples 136
geometrical comparison 120
ranking of 116
donor_hydrogens (set in gold_protein.mol2) 112
E
Edit Constraints (button in front end) 5
Edit Distributions (button in front end) 5
Edit Parameters (button in front end) 4
Enantiomers 31
energy (rotatable side chain), specifying 24
Energy parameters
angle bending term for covalent complexes 33
bond angle term for covalent complexes 33
ChemScore
parameter file 58
parameters, altering 58
GoldScore 46
altering 59
atom radii 46
overview of 46
parameter file 59
polarisability parameters 46
scaling of external van der Waals energy 46
torsional 46
van der Waals 46
hydrogen bond 73
Index 221
using for GOLD validation 131
C-H...O interactions, accounting for 59
Charges, atomic
for ligand 30
for protein 10
ChemScore
block functions 50
clash penalty 56
constraint terms 58
covalent term 58
explanation of hydrogen-bond terms 52
heme scoring function
making heme N atoms lipophilic 60
using 60
hydrogen-bond terms 52
kinase scoring function, using 59
ligand torsional strain 56
lipophilic term 54
metal binding term 54
parameter file 58
parameters, altering 58
rescoring with 106
rotatable-bond freezing term 56
weak CH...O bonding term 59
ChemScore (check box in front end) 5
ChemScore fitness terms, in output files 151
chemscore.p450_csd.params file 60
chemscore.p450_pdb.params file 60
chi command in gold.conf 19
limitations 19
Chiral ligands 31
Choose machines (entry box in front end) 8
Chromosome 89
Clash penalty, in ChemScore 56
Cluster analysis
calculation with rms_analysis 144
in ligand log file 120
Command line, running GOLD from 100
Comparison of docking solutions 120
Conditions of use i
Configuration file
creating with front end 3
description 126
use in command-line mode 100
Configuration File (entry box in front end) 3
Conformation
of ligand 31
of protein 18
of protein hydroxyl groups 18
of rings 64
Consensus scoring 106
Constraint editor 68
Constraint terms, in ChemScore 58
Constraints
covalent, overview 33
distance 69
Fixing rotatable bonds via the gold.conf 66
hydrogen bonds, forcing between protein and ligand 73
region (hydrophobic) 77
scaffold match constraint, overview 80
scaffold match constraint, setting up 81
scaffold match, method 81
scaffold match, setting up 81
substructure-based covalent, setting up 34
substructure-based, setting up 72
template similarity, overview 79
template similarity, setting up 79
Constraints, relaxing
amide conformations 64
hydrogen bonds, ligand intramolecular 66
planar nitrogens 65
Protonated carboxylic acid conformations 66
pyramidal nitrogen conformations 66
4 GOLD User Guide
2.4, page 7).
Parallel: Switches on and off the display of the Parallel Operation panel (see Section 2.5,
page 8).
2.2 Input Parameters and Files Panel
The Input Parameters and Files panel contains the following buttons, entry boxes, check boxes,
etc.:
Protein: Allows specification of the protein input file (see Section 3.10, page 29).
Edit Ligand File List: Allows selection of input ligand file(s) (see Section 4.5, page 32).
Waters: Allows specification of water molecules. During docking, GOLD allows waters to be
switched on and off (i.e. to be bound or displaced) and to rotate (to optimise hydrogen
bonding) (see Section 3.4, page 16).
Metals: Allows specification of metal coordination geometries (see Section 3.3, page 10).
Set atom types: Controls whether atom types will be set manually or automatically for (a) the
ligand(s) and (b) the protein (see Section 5., page 36).
Allow early termination: If switched on, instructs GOLD to terminate docking on a given ligand
if a user-specified criterion is met (see Section 11.2, page 93). The criterion is that the n
top-ranked solutions obtained so far lie within an rms deviation of x of one another, where n
and x are user-defined quantities.
Define active site from: Allows specification of the position of the binding site with respect to a
point, a protein atom close to the centre of the site, a set of protein atoms lining the site, or a
reference ligand (see Section 3.8, page 24).
Active site radius: Allows specification of the radius of the binding site, in Å (see Section 3.8,
page 24).
Detect Cavity: Switches cavity detection on and off (if switched on, the calculation will be
confined to concave regions in the vicinity of the binding-site) (see Section 3.8.7, page 28).
Covalent: Allows specification of a protein-ligand covalent bond (see Section 4.6, page 33).
Display: Allows docking solutions to be viewed in SILVER visualiser (see Section 14.13, page
124).
Output: Provides control over the amount, format and directory structure of GOLD output (see
Section 14., page 109).
Edit Parameters: Copies the default parameter file to a user area so that, e.g., GoldScore fitness-
function parameters and other GOLD settings can be modified (see Section 6.3, page 48).
Parameter File: Specifies which parameter file will be used; this contains parameters used by the
GoldScore fitness function together with parameters that control the general operation of GOLD
(see Section 6.6, page 59).
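The early-termination criterion described under Allow early termination (the n top-ranked answers lie within x rms deviation of one another) can be sketched as follows. This is a minimal illustration, not GOLD's implementation: it assumes each pose is stored as a (score, coordinates) pair with atoms listed in the same order, and that RMS deviation is measured in place without superposition. The names rmsd and should_terminate are hypothetical.

```python
import itertools
import math

Coords = list[tuple[float, float, float]]


def rmsd(a: Coords, b: Coords) -> float:
    """In-place RMS deviation between two poses of the same ligand (no superposition)."""
    if len(a) != len(b):
        raise ValueError("poses must list the same atoms in the same order")
    total = sum(math.dist(p, q) ** 2 for p, q in zip(a, b))
    return math.sqrt(total / len(a))


def should_terminate(poses: list[tuple[float, Coords]], n: int, x: float) -> bool:
    """True once the n top-scoring poses found so far all lie within x of one another."""
    if len(poses) < n:
        return False
    top = sorted(poses, key=lambda pose: pose[0], reverse=True)[:n]
    return all(rmsd(a[1], b[1]) <= x for a, b in itertools.combinations(top, 2))


pose = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
shifted = [(0.5, 0.0, 0.0), (2.0, 0.0, 0.0)]  # the same pose moved by 0.5 Å
runs = [(50.2, pose), (49.8, pose), (48.1, shifted)]
print(should_terminate(runs, n=3, x=1.5))  # True
```

Checking every pair (rather than each pose against the best one) matches the wording "within x rms deviation of one another".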
2.3 Fitness Function Settings Panel
The Fitness Function and Search Settings panel contains the following buttons, entry boxes,
check boxes, etc.
GoldScore, ChemScore, User Defined Score: Provides control over which fitness function is to be
used (see Section 6., page 46). The appearance of the rest of the panel will depend on which
function is selected.
Appearance if GoldScore selected:
Appearance if ChemScore selected:
API 62
Aromatic bond type 37
Aromatic nitrogen
atom type conventions 39
bond type conventions 39
Aspartic acid, defining ionisation state of 10
Astex/CCDC validation test set
using for GOLD validation 131
Atom charges
for ligand 30
for protein 10
Atom polarisabilities, for use in GoldScore fitness function 46
Atom radii, for use in GoldScore fitness function 46
Atom types
automatic assignment 36
errors, reporting of 36
manual assignment 37
ATOM_DEF (in torsion angle distribution file) 85
automatic bond settings, overriding 38
Automatic GA parameter settings 94
B
backbone movement (large), dealing with 24
backbone movement (localised), allowing 20
Background, submitting GOLD job to 100
Basic group, defining ionisation state of
in ligand 30
in protein 10
Best docking solution 115
bestranking.lst 116
Binding affinity
alpha chymotrypsin 139
correlation with fitness function 137
FKBP12 140
influenza A neuraminidase 138
Binding site
cavity detection 28
defining from a point 25
defining from a set of atoms 26
defining from an atom 25
radius of 24
Biological activity
alpha chymotrypsin 139
correlation with fitness function 137
FKBP12 140
influenza A neuraminidase 138
Block functions, in ChemScore 50
Bond angle, bending energy term for covalent complexes 33
Bond angles 31
Bond lengths 31
Bond types
amides 37
aromatic 36
of difficult groups 39
specifying in pdb files 31
bond types (ligand), overriding 38
Bump checking 48
C
Cambridge Structural Database, extracting torsion angle distributions from 83
Carboxylate
atom type conventions 39
bond type conventions 39
Cationic nitrogen
atom type conventions 39
bond type conventions 39
Cavity detection 28
CCDC/Astex validation test set
Index
Numerics
3D visualisation with grommitt 142
A
account for topology (check box in front end) 69
Accuracy of predictions 129
as function of number of ligand atoms, first validation 153
as function of number of ligand atoms, second validation 160
as function of number of ligand H-bonding atoms, first validation 153
as function of number of ligand H-bonding atoms, second validation 160
as function of number of ligand torsions, first validation 153
as function of number of ligand torsions, second validation 160
binding affinity, alpha chymotrypsin 139
binding affinity, FKBP12 140
binding affinity, influenza A neuraminidase 138
examples 136
methodology (binding affinity tests) 137
methodology (docking orientation tests) 129
resolution of protein structure 159
root mean square deviations in first validation 154
subjective analysis compared with RMS deviations 158
validation, first series of experiments 129
validation, second series of experiments 130
Acidic group, defining ionisation state of
in ligand 30
in protein 10
Acknowledgements 148
Active site
cavity detection 28
defining from a point 25
defining from a reference ligand 28
defining from a residue 26
defining from a set of atoms 26
defining from a set of residues 27
defining from an atom 25
radius of 24
Active site radius (entry box in front end) 4
active_atoms (set in gold_protein.mol2) 112
Activity, biological
correlation with fitness function 137
Add Ligand (window in front end) 32
Add/Delete Ligand (button in front end) 4
Allow early termination (check box in front end) 4
Alpha chymotrypsin binding affinity 139
amide bond, retyping to using the rotatable bond override file 38
Amide linkages
bond type 36
conformation around 31
flipping between cis and trans (in ligands) 64
Amidinium
atom type conventions 39
bond type conventions 39
Anionic nitrogen
atom type conventions 39
bond type conventions 39
Anionic oxygen
atom type conventions 39
bond type conventions 39
Annealing parameters 91
FINAL_VIRTUAL_PT_MATCH_MAX 91
FINISH_VDW_LINEAR_CUTOFF 91
hydrogen bonding 91
van der Waals 91
Appearance if User Defined Score selected:
Rescore: Used to rescore a docked ligand pose with an alternative scoring function (see Section
13., page 106).
Fitness and Search Options: Used to control:
Ligand flexibility during docking, including: whether ligand ring conformations are varied,
whether torsion angles around ligand amide bonds and bonds to trigonal nitrogen are allowed
to vary during docking, whether intramolecular hydrogen bonds are permitted between ligand
atoms, and whether protonated carboxylic acids are permitted to rotate or flip (see Section 7.,
page 64).
The use of an internal energy offset. This will offset the internal energy of the ligand (internal
torsion, van der Waals and hydrogen bonding terms, if applicable) by the best internal energy
found; i.e., when enabled, the internal energy will be taken relative to a near-optimal reference
state. This allows any internal energy that is implicit in the structure, i.e. that cannot be removed
by a change in conformation, to be ignored (see Section 6.9, page 62).
The use of torsional distributions. These can be used by GOLD to restrict ligand
conformational searches to regions of torsion-angle space that are observed in small-molecule
crystal structures (see Section 9., page 83).
Use of hydrophobic fitting points. This allows specification of a fit point file, i.e. a file of
customised hydrophobic fitting points (see Section 10.9, page 92).
Edit Constraints: Allows specification of distance constraints, hydrogen bond constraints,
regional (hydrophobic) constraints, and binding mode similarity constraints (see Section 8., page
68).
Constraints: Displays the number of constraints currently set.
Number of Ligand Bumps: Instructs GOLD to allow up to n short protein-ligand contacts, where
n is user-specified (see Section 6.2.2, page 48) (not available with ChemScore).
Van der Waals: Only available if GoldScore selected. Allows specification of the van der Waals
annealing parameter (see Section 10.8, page 91).
Hydrogen Bonding: Only available if GoldScore selected. Allows specification of the hydrogen-bonding
annealing parameter (see Section 10.8, page 91).
ChemScore Parameter File: Only available if ChemScore selected. Allows the default file
containing ChemScore parameters to be replaced by a user-specified file.
Scoring Function Shared Object Name (UNIX) or Scoring Function DLL Name (Windows):
Only available if User Defined Score selected. Allows selection of the user's own scoring function
by specifying a path to a dynamically loadable shared object library.
2.4 Genetic Algorithm Parameters Panel
The Genetic Algorithm Parameters panel contains the following buttons, entry boxes, check
boxes, etc. (see Section 10., page 89):
Select GA Presets and Automatic Settings: Allows specification of speed and accuracy of
docking runs. Either select from a range of preset GA settings, or use automatic settings which
will optimise the number of GA operations for each ligand docked (settings will be determined
automatically according to the number of rotatable bonds, number of flexible ring corners, size
of binding site etc.) (see Section 11.3, page 94).
Population Size: Allows specification of the population size (i.e. the number of chromosomes
that will be used on each island) (see Section 10.2, page 89).
Selection Pressure: Allows specification of the selection pressure (see Section 10.3, page 90).
Number of Operations: Allows specification of the total number of operations to be performed
in a genetic algorithm run (this is the key determinant of program calculation time) (see Section
10.4, page 90).
Number of Islands: Allows the genetic algorithm to be split over n islands, where n is user
specified (see Section 10.5, page 90).
Niche Size: Allows specification of the niche size to be used (see Section 10.6, page 91). Niching
is a method for trying to keep diversity within the population by avoiding generation of > n very
similar chromosomes, where n is user defined. Niching is switched off after 90% of the GA run.
Migrate/Mutate/Crossover: Controls the relative frequencies with which the three types of
genetic operations occur. Migrate should be zero if Number of Islands is one, since it refers to
migration of chromosomes from one island to another (see Section 10.7, page 91).
Note: You are recommended to use automatic settings, or one of the default parameter sets
offered in the GOLD front end (see Section 11.3, page 94).
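One way to read the Migrate/Mutate/Crossover settings is as relative weights for choosing the next genetic operation. The sketch below illustrates only that reading, together with the rule that migration makes no sense on a single island. The function name choose_operator and the weight values are hypothetical, and GOLD's own operator scheduling may differ.

```python
import random


def choose_operator(weights: dict[str, float], n_islands: int, rng: random.Random) -> str:
    """Pick 'migrate', 'mutate' or 'crossover' with probability proportional to its weight."""
    if n_islands == 1 and weights.get("migrate", 0) != 0:
        # Migration moves chromosomes between islands, so it needs at least two islands.
        raise ValueError("migrate weight must be zero when there is only one island")
    names = list(weights)
    return rng.choices(names, weights=[weights[k] for k in names], k=1)[0]


rng = random.Random(42)
weights = {"migrate": 10, "mutate": 95, "crossover": 95}  # illustrative values only
counts = {name: 0 for name in weights}
for _ in range(1000):
    counts[choose_operator(weights, n_islands=5, rng=rng)] += 1
print(counts)  # counts roughly proportional to the weights
```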
mode still appears in some solutions but these invariably have lower scores.
This ends the tutorial.
3. Cross-Docking into 1x7r with a Soft Potential applied to Leu 346
View the file gold_1x7r_1l21_SP.conf using a text editor. This file has been set up so
that a soft VdW potential with 2-4 functional form has been applied to one residue only, Leu346.
This replaces the default 4-8 functional form that applies to the rest of the protein.
The keyword that has been introduced to set the soft potential for Leu346 is at the end of the file
and is reproduced below. The numeral in brackets, (1) in this case, indicates that a 2-4 form has
been applied. If this number were (2), a softer 1-2 functional form would be applied.
Further information is available (see Section 6.2.1, page 47).
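To see why a 2-4 form is softer than the default 4-8 form, the sketch below evaluates a generic m-n pair potential scaled so that its minimum, -eps, lies at r = r0. This is an assumption made for illustration only. GoldScore's actual van der Waals term uses its own atom-pair parameters and cut-offs, and pair_energy is a hypothetical name.

```python
def pair_energy(r: float, r0: float, eps: float, m: int, n: int) -> float:
    """Generic m-n pair term with its minimum (-eps) at r = r0; requires n > m."""
    if n <= m:
        raise ValueError("repulsive exponent n must exceed attractive exponent m")
    x = r0 / r
    return eps / (n - m) * (m * x**n - n * x**m)


FORMS = {"default 4-8": (4, 8), "soft 2-4, i.e. (1)": (2, 4), "softer 1-2, i.e. (2)": (1, 2)}

# Energy (in units of eps) as the contact distance shrinks below r0.
for label, (m, n) in FORMS.items():
    row = [pair_energy(f, 1.0, 1.0, m, n) for f in (1.0, 0.9, 0.8, 0.7)]
    print(f"{label:22s}", " ".join(f"{e:7.3f}" for e in row))
```

At r = 0.8 r0 the 4-8 term is already repulsive, while the 2-4 and 1-2 terms are still attractive. This illustrates how softening Leu346 can tolerate the closer contacts caused by the loop movement.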
Run the docking job gold_1x7r_1l21_SP.conf and analyse the results using SILVER. We
recommend you run this job using the command line. Instructions are available on using the
command line in Windows or under Unix (see Section 12.5, page 100).
This time you should find that the highest scoring solutions correspond very closely with the 1l2i
binding mode (see below). These solutions will have scores of 43-45. The reversed binding
2.5 Parallel Operation Panel
The Parallel Operation panel contains the following buttons, entry boxes, check boxes, etc.:
Maximum number of distributed processes: Shows the number of GOLD processes that will run
simultaneously. This should normally be set equal to the number of processors available for the
GOLD job to run on (see Section 12.6.5, page 105).
Choose machines: Allows specification of the machines on which the GOLD job is to be run
(see Section 12.6.4, page 104).
Note: The Parallel Operation panel is only accessible if PVM has been set up (see Section 12.6,
page 101).
3. Setting Up the Protein
3.1 Essential Steps in Setting Up the Protein (see page 9)
3.2 Protein Hydrogen Atoms, Ionisation States and Tautomeric States (see page 10)
3.3 Metal Ions (see page 10)
3.4 Water Molecules (see page 16)
3.5 Rotatable O-H and NH3 Groups (see page 18)
3.6 Flexible Side Chains (see page 18)
3.7 Large Backbone Movements (see page 24)
3.8 Defining the Binding Site (see page 24)
3.9 Protein File Formats (see page 29)
3.10 Specifying the Protein File Name (see page 29)
3.1 Essential Steps in Setting Up the Protein
You can either input the whole protein structure to GOLD, or just those residues that are in the
active site region. The latter leads to somewhat shorter run times, since both protein initialisation
and cavity detection will be quicker.
If you input only the region of interest around the binding site, you must ensure that all the
residues you include are complete. You should also include all residues within a 5 Å radius from
the solvent-accessible surface of the cavity.
Add all hydrogen atoms, including those necessary to define the correct ionisation and
tautomeric states of residues such as Asp, Glu and His (see Section 3.2, page 10).
Ensure that all bond types are correct. If they are, and hydrogen atoms have been placed on the
correct atoms, GOLD will deduce atom types automatically (see Section 5.2, page 36). This also
applies to PDB input files, but only for known residues (i.e. there is no HET group library).
GOLD connects atoms within residues on the basis of proximity. Double bonds are assigned as
appropriate for the naturally occurring protein residues. For this to work correctly:
Residues should be in sequence order, and correctly named.
All atoms should be properly labelled (CA, CB etc.).
Any unusual bonds (disulphide bridges, etc.) should have CONECT records.
If a metal ion is present, ensure that all bonds between the ion and coordinating protein or water
atoms are deleted (GOLD will re-find them automatically). Metals should be within bonding
distance of at least two protein and/or water atoms in the active site so that GOLD can infer
likely coordination geometries. (see Section 3.3, page 10)
Save the protein in, e.g., MOL2 format.
GOLD assigns atom types from the information about element types and bond orders in the
input structure file, so it is important that these are correct. However, if for any reason, GOLD is
unable to deduce an atom type, then the atom in question will be replaced with a dummy atom
type Du. If this is the case a warning message will be given in the gold_protein.log file.
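If you input only the binding-site region, the rule of keeping complete residues within 5 Å of the cavity can be sketched as below. This assumes that the cavity surface has been sampled as a list of points and that atoms are given as (residue id, coordinates) pairs. residues_near and truncate are hypothetical helpers, not part of GOLD.

```python
import math

Atom = tuple[str, tuple[float, float, float]]  # (residue id, xyz): hypothetical layout


def residues_near(atoms: list[Atom], points, cutoff: float = 5.0) -> set[str]:
    """Ids of residues having at least one atom within `cutoff` Å of any cavity point."""
    keep: set[str] = set()
    for res_id, xyz in atoms:
        if res_id not in keep and any(math.dist(xyz, p) <= cutoff for p in points):
            keep.add(res_id)
    return keep


def truncate(atoms: list[Atom], points, cutoff: float = 5.0) -> list[Atom]:
    """Keep every atom of each selected residue, so no residue is left incomplete."""
    keep = residues_near(atoms, points, cutoff)
    return [(res_id, xyz) for res_id, xyz in atoms if res_id in keep]


atoms = [("LEU346", (0.0, 0.0, 0.0)), ("LEU346", (10.0, 0.0, 0.0)), ("MET343", (20.0, 0.0, 0.0))]
print(truncate(atoms, points=[(3.0, 0.0, 0.0)]))  # both LEU346 atoms kept, MET343 dropped
```

The selection is made per residue rather than per atom, so the distant LEU346 atom is still kept and the residue stays complete.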
Most of the binding site is well superimposed. However above the ligands you can see that there
is movement of a protein loop that brings Leu346 closer in to the ligand in 1x7r than in 1l2i. This
superposition suggests that a clash would exist if the ligand from 1l2i were docked into 1x7r.
This might prevent the correct binding mode being rated highly if using a scoring function such
as GoldScore, with a clash term that increases sharply with proximity to the protein. Other
residues such as Met343 also do not superimpose well as a consequence of this loop movement.
However these residue shifts appear to have less of an impact on the size of the active site than
does that of Leu346.
You can view this superposition yourself by opening SILVER or another protein visualiser and
reading in the file 1x7r_1l2i_sup.mol2.
2. Cross-Docking into 1x7r with no Soft Potential Applied
The files 1l2i_prot.mol2 and 1x7r_prot.mol2 are the protein models derived from the
PDB entries 1l2i and 1x7r, respectively. 1l2i_lig.mol2 is the ligand structure obtained from, and in the
same frame of reference as, 1l2i.
The GOLD configuration file gold_1l2i_1l2i.conf is set up to dock the 1l2i ligand back
into the 1l2i protein structure. Run this GOLD job and analyse the results in SILVER to check
that the crystallographic binding mode is indeed retrieved. Read in the file 1l2i_lig.mol2
to make the comparison.
The GOLD configuration file gold_1x7r_1l21.conf is set up to dock the 1l2i ligand
into the 1x7r protein structure. Run this GOLD job and analyse the results in SILVER. Read in
the file 1x7r_1l2i_sup.mol2 to compare the docked poses with the binding mode found
in 1l2i. You may find that there are some solutions which have approximately the right binding
mode and which return scores of between 23 and 25. However there should also exist higher
ranking poses with scores of between 28 and 32. These poses have the ligand rotated through
180 degrees about its long axis, as shown in the superposition below (crystallographic binding
mode colour coded orange, GOLD docking pose colour coded green).
Tutorial 7: Docking using Localised Soft Potentials
1. Introduction (see page 215)
2. Cross-Docking into 1x7r with no Soft Potential Applied (see page 216)
3. Cross-Docking into 1x7r with a Soft Potential applied to Leu 346 (see page 217)
1. Introduction
The object of this tutorial is to demonstrate how to employ the Localised Soft Potential option
that is available when using GoldScore. This option allows you to soften the VdW clash
component of the GoldScore for one or more residues in the protein. We will examine the
docking of a ligand to two different crystal structures of Estrogen Receptor Alpha. The
structures differ in that a small loop movement constrains the binding site of one of the
structures (PDB code 1x7r) slightly more than that of the other structure (PDB code 1l2i).
All files referred to in this tutorial can be found in <GOLD_DIR>/examples/tutorial7
where <GOLD_DIR> is the location of your GOLD installation.
The figure below shows the superposition of both protein structures. 1x7r corresponds to the
protein colour coded light blue and the ligand colour coded green, 1l2i corresponds to the
protein colour coded orange with the ligand colour coded yellow.
The presence of dummy atoms should not significantly affect the docking prediction, since
dummy atoms are considered neither donors nor acceptors.
3.2 Protein Hydrogen Atoms, Ionisation States and Tautomeric States
GOLD uses an all-atom model, so the protein must have all hydrogen atoms added.
The precise geometrical positions of Ser, Thr and Tyr hydroxyl hydrogen atoms or Lys NH3+
hydrogen atoms do not matter, as their orientation will be optimised during the GOLD run.
GOLD deduces the hydrogen-bonding abilities of protein residues from the presence or absence
of hydrogen atoms. For example, you can control the protonation and tautomeric state of Asp,
Glu and His residues by adding or removing appropriate hydrogen atoms.
If incorrect ionisation or tautomeric states are inferred by the program, it is unlikely that correct
protein-ligand binding modes will be predicted. GOLD will not vary tautomeric or ionisation
states during docking; if you are unsure about, e.g., the tautomeric state of a His residue, you
should perform separate GOLD runs using the different possibilities.
GOLD ignores atom charges, both formal and partial. It deduces whether an atom is charged by
counting the bond orders of the bonds that it forms and comparing the result with the atom's
normal valency.
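The valence rule in the preceding paragraph can be illustrated for nitrogen and oxygen as below. This sketch makes simplifying assumptions: hydrogens are explicit, bond orders are Kekulé single or double bonds only, and aromatic or other special bond types are not handled. implied_charge is a hypothetical name, and GOLD's own atom typing covers many more cases (see Section 5., page 36).

```python
NORMAL_VALENCE = {"N": 3, "O": 2}


def implied_charge(element: str, bond_orders: list[int]) -> int:
    """Charge implied by comparing the sum of bond orders with the normal valency.

    Hydrogens must be explicit, since a missing H changes the bond-order sum.
    """
    return sum(bond_orders) - NORMAL_VALENCE[element]


# Lys NZ bonded to C and three H atoms (all single bonds): cationic
print(implied_charge("N", [1, 1, 1, 1]))  # 1
# Carboxylate oxygen with a single bond to C: anionic
print(implied_charge("O", [1]))           # -1
# His NE2 with one double and one single bond to C, no H (neutral tautomer)
print(implied_charge("N", [2, 1]))        # 0
# The same NE2 after adding an H atom: protonated His
print(implied_charge("N", [2, 1, 1]))     # 1
```

The last two lines show why adding or removing a hydrogen atom is enough to control the His protonation state.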
3.3 Metal Ions
3.3.1 Preparing a Protein Input File which Contains a Metal Ion (see page 10)
3.3.2 Automatic Determination of Metal Coordination Geometries (see page 11)
3.3.3 Specifying Metal Coordination Geometries Manually (see page 12)
3.3.4 Defining Custom Metal Coordination Geometries (see page 14)
3.3.5 Metal-Ligand Interactions (see page 15)
3.3.6 Heme Containing Proteins (see page 15)
3.3.1 Preparing a Protein Input File which Contains a Metal Ion
There are some additional requirements when preparing a protein input file which contains a
metal ion.
The metal ion must be coordinated to at least two protein atoms or water molecules so that
GOLD can predict the coordination geometry (see Section 3.3.2, page 11).
In the protein input file, the metal ion should not have any bonds to coordinating atoms. If these
are present in the original PDB file, they must be deleted.
In order to model metal ions within SYBYL, you need to load a parameter file (otherwise, all
metal ions will be assigned dummy-atom types). Add the following line to your ~/.sybylrc file:
parameter open $TA_ROOT/demo/metals.tpd
There may be problems in the way that SYBYL handles metal ions: they are not always well
behaved in the minimiser, and typically have valencies of 4 or 6, which may mean that hydrogen
atoms are added to the metal when you add hydrogen atoms to the protein.
Note: GOLD can only handle the hardcoded metal atom types (see Section 3.3.2, page 11); it is
not possible to add user defined metal atom types.
3.3.2 Automatic Determination of Metal Coordination Geometries
GOLD is able to recognise the following metal coordination geometries:
In order to determine the coordination geometry of a particular metal atom, GOLD performs a
permuted superimposition of coordination geometry templates onto the coordinating atoms
found in the protein (e.g. if there are only two coordinating atoms in the protein, then every
unique pair of coordinating template atoms is selected and superimposed on the system in the
protein).
Coordination fitting points are then generated using the template that gives the best fit (based on
RMSd).
The geometry templates used for given metals are defined in the gold.params file in the section
headed # Metals (for explanation of parameters refer to comments in the gold.params file):
Template  Geometry                 Coordination number
TETR      Tetrahedral              n=4
TBP       Trigonal bipyramidal     n=5
OCT       Octahedral               n=6
CTP       Capped trigonal prism    n=7
PBP       Pentagonal bipyramidal   n=7
SQAP      Square prism             n=8
ICO       Icosahedral              n=10
DOD       Dodecahedral             n=12
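The permuted superimposition described above can be sketched with NumPy as below. This is an illustration, not GOLD's code. GOLD's exact fitting procedure is not described here, so the rotation step uses the standard Kabsch algorithm as a stand-in. The sketch also assumes that templates are unit vectors centred on the metal and that only rotation about the metal is allowed, and it includes only the TETR, TBP and OCT templates. The allowed geometries and the Zn and Mg coordination distances come from the metal parameter table in Section 3.3.3. TEMPLATES, ALLOWED, best_rotation and fit_geometry are hypothetical names.

```python
import itertools
import math

import numpy as np

S3 = math.sqrt(3.0) / 2.0
# Unit vectors from the metal (at the origin) to the ideal coordination positions.
TEMPLATES = {
    "TETR": np.array([[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]) / math.sqrt(3.0),
    "TBP": np.array([[0, 0, 1], [0, 0, -1], [1, 0, 0], [-0.5, S3, 0], [-0.5, -S3, 0]]),
    "OCT": np.array([[1, 0, 0], [-1, 0, 0], [0, 1, 0], [0, -1, 0], [0, 0, 1], [0, 0, -1]], dtype=float),
}
# Allowed geometries per metal (coordination numbers 4 -> TETR, 5 -> TBP, 6 -> OCT).
ALLOWED = {"Zn": ["TETR", "TBP", "OCT"], "Mg": ["TETR", "OCT"]}


def best_rotation(P: np.ndarray, Q: np.ndarray) -> np.ndarray:
    """Kabsch algorithm: proper rotation R minimising sum |R p_i - q_i|^2 (no translation)."""
    U, _, Vt = np.linalg.svd(P.T @ Q)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # d = -1 would mean a reflection; correct for it
    return Vt.T @ np.diag([1.0, 1.0, d]) @ U.T


def fit_geometry(metal, ligands, metal_symbol: str, distance: float):
    """Try every ordered assignment of template points to coordinating atoms; keep the lowest RMSd.

    Returns (template name, RMSd of the unit directions, fitting points for the unused sites).
    """
    metal = np.asarray(metal, dtype=float)
    dirs = np.asarray(ligands, dtype=float) - metal
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)  # compare directions only
    k = len(dirs)
    best = None
    for name in ALLOWED[metal_symbol]:
        T = TEMPLATES[name]
        if len(T) < k:
            continue  # template has too few sites for the observed coordinating atoms
        for idx in itertools.permutations(range(len(T)), k):
            P = T[list(idx)]
            R = best_rotation(P, dirs)
            rms = float(np.sqrt(np.mean(np.sum((P @ R.T - dirs) ** 2, axis=1))))
            if best is None or rms < best[1]:
                unused = [i for i in range(len(T)) if i not in idx]
                # Unoccupied template sites become coordination fitting points.
                best = (name, rms, metal + distance * (T[unused] @ R.T))
    return best


tet = TEMPLATES["TETR"]
metal = np.array([1.0, 2.0, 3.0])
ligands = metal + 2.09 * tet[:3]  # three coordinating atoms of an ideal tetrahedron
name, rms, points = fit_geometry(metal, ligands, "Zn", 2.09)
print(name, points)  # one free tetrahedral site gives one fitting point
```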
this means we have to be careful to pick a Gln192 rotamer that is folded away from the binding
region but also does not clash with the arginine residue. A way round this is to add the command
penalise_protein_clashes = 0 to the rotamer_lib command block (place it anywhere between
rotamer_lib and end_rotamer_lib). This will switch off calculation of clashes between flexible
side-chain atoms and neighbouring protein atoms, allowing Gln192 to approach nearby residues
closely. While physically unrealistic, this is a pragmatic tactic that might well work (and is not as
egregious as it sounds, since, in reality, Arg143 can probably move away from Gln192 if it needs
to).
Obviously, you can experiment with these options if you wish.
This ends the tutorial.
Also, the best solution from the flexible run has a much higher GoldScore value (75.7161) than
was obtained from the rigid run.
Again, you can view these results in SILVER if you wish. The movements of the flexible side
chain Gln192 can be seen more effectively if the show protein hydrogens tick box is deactivated
and the Gln192 residue is selected as a protein subset (Descriptors, Define a protein subset, By
residue...). The newly defined subset can be selected by picking it from the Subset highlighting
pull-down menu in SILVER.
5. Choosing Side-Chain Rotamers
Two decisions must be made when using the flexible side-chain facility: (a) which side chains
are made flexible; (b) how flexible is each side chain made? It is important to recognise that the
more flexibility is introduced, the larger the search space becomes. Particularly with high-
throughput runs, when relatively little time can be allowed per ligand, this may seriously
decrease the chance of finding the global minimum.
A sensible strategy is therefore to make a side chain flexible only if you have some a priori
reason to suppose that it will move, as we have (from X-ray structures) in the tutorial example.
On the other hand, we probably allowed Gln192 more movement than necessary in the above
experiments. As long as it can adopt the native 1fax position and one other position in which it is
folded away from the binding site, that might well have been enough.
One problem is that, in some conformations, Gln192 tends to clash with Arg143. At first sight,
For example, for a Zn atom GOLD will attempt to match coordination geometries 4, 5 and 6
(tetrahedral, trigonal bipyramidal, and octahedral templates) onto the coordinating atoms found
in the protein.
The template that gives the best match will then be used to generate coordination fitting points.
Details of the coordination geometry determination are given in the gold_protein.log file.
The output file gold_protein.mol2 will contain a number of dummy atoms representing idealised
coordination positions. These dummy atoms will be connected to the metal ion. Any unoccupied
coordination points will then be available for ligand binding (see Section 3.3.5, page 15).
3.3.3 Specifying Metal Coordination Geometries Manually
It is possible to manually specify coordination geometries for particular metal atoms. This can
be used to allow non-standard metal coordination geometries, or to limit the number of possible
geometries that GOLD checks (i.e. it is possible to overrule the default geometries for the
corresponding metal type defined in the gold.params file (see Section 3.3.2, page 11)).
Click on the Metals button in the Input Parameters and Files section of the GOLD front-end.
The Metal Selection window will appear:
      Sybyl      Atom type (default   H-bonding type: Donor (D),   Allowed coordination   Coordination
      atom type  or elucidated)       Acceptor (A), or Metal (M)   geometries             distance (Å)
MGD   Mg         DEF                  M                            4, 6                   2.05
ZND   Zn         DEF                  M                            4, 5, 6                2.09
MND   Mn         DEF                  M                            4, 6                   2.06
FED   Fe         DEF                  M                            4, 6                   1.98
CAD   Ca         DEF                  M                            6, 7                   2.44
COBD  Co.oh      DEF                  M                            6                      2.09
GDD   Gd         DEF                  M                            6                      2.44
Type in the atom number of the metal (as it appears in the protein input file), then select the
allowed coordination geometries from the list.
Note: If the list of pre-defined coordination geometries does not contain a suitable geometry then
you can define a custom metal coordination geometry (see Section 3.3.4, page 14).
Once the allowed geometries have been selected for a particular metal atom click on the Add
metal or Update selected metal button to add the selection to the Current Metal Settings.
Repeat the above procedure if you want to specify coordination geometries for additional metal
atoms.
To edit a Current Metal Setting (e.g. to change the allowed coordination geometries) highlight
the corresponding entry in the Current Metal Settings list, make the required change and then hit
the Add metal or Update selected metal button.
To remove an entry from the Current Metal Settings highlight the entry and hit the Delete
Selection button, or to remove all entries hit the Clear List button.
Click on Done in the Metal Selection window when you are satisfied with the chosen metals and
their allowed coordination geometries. When you finish, the count of Metals will be updated in
the GOLD front end.
You can see this for yourself in SILVER by the following sequence of operations:
Open SILVER.
Select File followed by Load GOLD run results... and use the file browser to select the file
non_flexible.conf.
In the SILVER interface, select the ligand docking with the highest GoldScore (scores are
given at the end of each line in the Ligands list box).
Switch off the Clear ligands on loading check box (near the bottom right of the SILVER
window).
Select File followed by Load a ligand... and pick 1fax_1lpg_super.mol2 from the file browser.
Switch on the Display multiple ligands check box and then select LIM::IMA_301_pdb1lpg_1
(the experimental position of the 1lpg ligand) from the Ligands list (it should be item 14 in the
list). You can now see the discrepancy between the experimental pose and the top-ranked
solution from the non-flexible run.
Read the flexible.conf into SILVER in the way described above and compare the top-ranked
solution with the experimental position of the 1lpg ligand. In contrast, the top-ranked solution
from the flexible run is much better. It is not perfect (in particular, the benzamidine moiety is
somewhat displaced), but the benzyloxy side chain is now roughly in the right position, the
Gln192 side chain having moved out of the way:
first rotamer line specifies a side-chain conformation with chi1 = 62 (plus or minus 13) degrees,
chi2 = 180 (plus or minus 14), chi3 = 20 (plus or minus 16).
Note: GOLD will round any tolerance values that are not multiples of ten up to the next 10, thus
GOLD will process the first line as chi1 = 62 (plus or minus 20) degrees, chi2 = 180 (plus or
minus 20), chi3 = 20 (plus or minus 20).
The next line defines the exact rotamer chi1 = 70, chi2 = -75, chi3 = 0; and so on.
During docking, Gln192 will be allowed to take up any conformation that falls within any of the
rotamer definitions. The rotamers are based on a library of highly-populated side-chain
conformations described in S. C. Lovell, J. M. Word, J. S. Richardson & D. C. Richardson,
Proteins, 40, 389-408, 2000. A digest of this information in a format suitable for copy-and-paste
into GOLD configuration files is available in <GOLD_DIR>/gold/rotamer_library.txt
(see Section 3.6.3, page 20).
The final line, end_rotamer_lib, closes this flexible side-chain definition. Had we wished, we
could have added further rotamer_lib command blocks to specify other flexible side chains, up
to a maximum of 10.
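The tolerance-rounding rule in the Note above, and the test of whether a side-chain conformation falls within any rotamer definition, can be sketched as follows. This assumes that each rotamer is a list of (centre, tolerance) pairs in degrees and that angle differences wrap at ±180 degrees. The function names are hypothetical, and the sketch is not GOLD's code.

```python
import math


def round_tolerance(tol: float) -> int:
    """Round a tolerance up to the next multiple of 10 (multiples of 10 are unchanged)."""
    return int(math.ceil(tol / 10.0)) * 10


def angle_diff(a: float, b: float) -> float:
    """Signed difference a - b, wrapped into the range [-180, 180) degrees."""
    return (a - b + 180.0) % 360.0 - 180.0


def in_rotamer(chis, rotamer) -> bool:
    """True if every chi angle lies within its (rounded) tolerance of the rotamer centre."""
    return all(abs(angle_diff(chi, centre)) <= round_tolerance(tol)
               for chi, (centre, tol) in zip(chis, rotamer))


def allowed(chis, library) -> bool:
    """A conformation is allowed if it falls within any rotamer definition."""
    return any(in_rotamer(chis, rot) for rot in library)


# The two Gln192 rotamer lines quoted above: (centre, tolerance) for chi1, chi2, chi3.
GLN_LIB = [[(62, 13), (180, 14), (20, 16)], [(70, 0), (-75, 0), (0, 0)]]
print(allowed((80, 180, 20), GLN_LIB))  # True: chi1 is 18 degrees off, outside 13 but inside the rounded 20
```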
4. Comparison of Flexible and Non-Flexible Results
If you wish, you can run the two GOLD jobs using the configuration files described in the
preceding section. Alternatively, you can view the results that we have generated. Since GOLD
is non-deterministic, any results that you get might differ from ours, but the general trends are
likely to be the same.
The results of our docking runs with rigid and flexible Gln192 side chain are in the directories
non_flexible and flexible, respectively.
As expected, none of the solutions produced in our non-flexible run is correct; all have the
benzyloxy side chain seriously misplaced. The top-ranked docking has a GoldScore of 63.8592
and is shown below with the true ligand position for reference:
3.3.4 Defining Custom Metal Coordination Geometries
It is possible to specify custom metal coordination geometries which can subsequently be used
to derive ligand binding points around particular metal atoms.
GOLD will normalise the size of the custom polyhedron to the appropriate metal-chelator
distance before matching it to the metal and the coordinating atoms found in the protein.
Click on the Metals button in the Input Parameters and Files section of the GOLD front-end
then, in the Metal Selection window, click on the Set Up Custom Metal Polyhedrons button. The
Define Custom Metal Coordination geometries window will appear:
A custom metal polyhedron may contain up to nine points. Each point in the custom polyhedron
must be specified using a vector (assuming the centre of your polyhedron is at the origin).
For example, to set up a custom square planar geometry you must specify four points using the
following vectors:
0, 1, 0
1, 0, 0
-1, 0, 0
0, -1, 0
Assuming the metal is on the origin (0,0,0), GOLD will then attempt to match the specified
vectors onto the metal-to-protein-atom vectors found in the protein (vectors are normalised to a
metal-to-chelator distance of 2.0 Å).
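The normalisation step can be sketched as below for the square planar example. This is a minimal illustration that assumes the metal sits at the origin and that the target distance is the 2.0 Å quoted above. normalise_polyhedron is a hypothetical name, and the nine-point limit is taken from the text above.

```python
import math


def normalise_polyhedron(vectors, distance: float = 2.0, max_points: int = 9):
    """Scale each custom-polyhedron vector (metal at the origin) to the metal-chelator distance."""
    if not 1 <= len(vectors) <= max_points:
        raise ValueError(f"a custom polyhedron needs between 1 and {max_points} points")
    scaled = []
    for v in vectors:
        length = math.sqrt(sum(c * c for c in v))
        if length == 0.0:
            raise ValueError("a zero-length vector has no direction")
        scaled.append(tuple(distance * c / length for c in v))
    return scaled


square_planar = [(0, 1, 0), (1, 0, 0), (-1, 0, 0), (0, -1, 0)]
print(normalise_polyhedron(square_planar))
# A non-unit vector such as (3, 4, 0) is scaled to (1.2, 1.6, 0.0), i.e. length 2.0.
```

Because only direction matters after normalisation, any vector length may be entered, but a zero vector is rejected.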
Once vectors for each point in the polyhedron have been defined click on the Add metal
coordination polyhedron or Update selected polyhedron button to add the custom definition to
the Current Metal Polyhedron Settings.
Repeat the above procedure if you want to specify additional custom polyhedra. It is possible
to set up to three custom metal polyhedra.
To edit a Current Metal Polyhedron Setting, highlight the corresponding entry in the Current
Metal Polyhedron Settings list, make the required change and then hit the Add metal
coordination polyhedron or Update selected polyhedron button.
To remove an entry from the Current Metal Polyhedron Settings highlight the entry and hit the
Delete Selection button, or to remove all entries hit the Clear List button.
Click on Done in the Define Custom Metal Coordination geometries window when you are
satisfied with the custom coordination geometries that have been defined.
The count of Custom Metal Polyhedron will be updated and the custom geometries will be
available for selection from the Metal Selection window (see Section 3.3.3, page 12).
3.3.5 Metal-Ligand Interactions
Metal coordination in GOLD is modelled as 'pseudo-hydrogen bonding'.
Metal-ligand interactions will typically involve the metal binding to, for example, carboxylate
ions, deprotonated histidines (i.e. negatively charged), and phenolates. Therefore metals can be
considered to bind to H-bond acceptors and the metal will compete with H-bond donors for
interaction.
Consequently, GOLD uses the following approach for handling metals:
Virtual coordination points are added at locations where GOLD is missing a coordination site.
These coordination points are then used as fitting points that can bind to acceptors.
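The idea of adding virtual coordination points where a site is missing can be sketched as follows. This is an illustration under stated assumptions, not GOLD's documented procedure: the polyhedron is assumed to be already oriented on the metal, a vertex counts as occupied when a protein chelator direction lies within a hypothetical 30 degree tolerance of it, and every unoccupied vertex becomes a fitting point 2.0 Å from the metal.

```python
import math


def _unit(v):
    """Return v scaled to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def virtual_points(metal, vertices, chelators, max_angle_deg=30.0, distance=2.0):
    """Return positions for unoccupied polyhedron vertices (hypothetical helper).

    metal: (x, y, z) of the metal ion.
    vertices: direction vectors of a polyhedron assumed to be already oriented.
    chelators: coordinates of protein atoms coordinating the metal.
    max_angle_deg: assumed angular tolerance, not a GOLD parameter.
    """
    cos_tol = math.cos(math.radians(max_angle_deg))
    chelator_dirs = [_unit(tuple(a - m for a, m in zip(atom, metal))) for atom in chelators]
    points = []
    for vertex in vertices:
        u = _unit(vertex)
        occupied = any(sum(a * b for a, b in zip(u, d)) >= cos_tol for d in chelator_dirs)
        if not occupied:
            # Free coordination site: place a fitting point that can bind an acceptor.
            points.append(tuple(m + distance * c for m, c in zip(metal, u)))
    return points


square_planar = [(0, 1, 0), (1, 0, 0), (-1, 0, 0), (0, -1, 0)]
print(virtual_points((0.0, 0.0, 0.0), square_planar, [(0.0, 2.1, 0.0), (2.0, 0.1, 0.0)]))
```

With two of the four square-planar sites filled by protein atoms, the two remaining vertices are returned as candidate ligand binding points.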
3.3.6 Heme Containing Proteins
The paper Kirton et al, Proteins: Structure, Function, and Bioinformatics, 58, 836-844, 2005
describes the use of ligand specific iron parameters in the context of docking to heme-containing
proteins. This extended metal parameterisation is available for fine-tuning metal
interactions, so that, for example, metal-ligand interactions can be addressed specifically
according to the type of metal contact.
The protein does not need to be set up in a special way to make use of these parameters; however,
the standard set-up should be followed (see Section 3.3.1, page 10).
Further information on setting up a GOLD run with these settings is available (see Section 6.8,
page 60).
GOLD conventions.
These two files may be viewed in SILVER if desired.
3. Preparation of Configuration Files
Two GOLD configuration files have been prepared. The first, non_flexible.conf, was set up in
the normal way using the GOLD graphical user interface. It corresponds to a standard docking
of the 1lpg ligand into the 1fax binding site, using slow search settings (100,000 GA operations)
and allowing no side-chain flexibility. The considerations outlined in the preceding part of this
tutorial suggest that this docking protocol is unlikely to give good results.
The second file, flexible.conf, defines a docking in which the Gln192 side chain is allowed to
move. It was set up by editing the original configuration file, non_flexible.conf, in a text editor.
Side-chain flexibility is currently not available via the GOLD graphical interface; you must
directly edit additional command lines into the .conf file (see Section 3.6.2, page 19) and run
GOLD via the command line.
Comparing the two configuration files, you will see that the flexible version contains the
following additional lines at the end:
rotamer_lib
name gln_192
chi1 2817 2818 2821 2822
chi2 2818 2821 2822 2823
chi3 2821 2822 2823 2825
rotamer 62 (13) 180 (14) 20 (16)
rotamer 70 -75 0
... several more rotamer lines ...
end_rotamer_lib
Collectively, these lines define the torsional flexibility that the Gln192 side-chain will be
allowed to have during docking.
The first line specifies that a rotamer_lib command block is beginning.
The second line specifies a unique name for this rotamer_lib command block - any text can be
used, but a useful convention is to use the name of the side chain to which the command block
pertains.
The next three lines specify the torsion angles that are to be made variable. If you open
1fax_protein.mol2 in a text editor, you will see that the atom numbers 2817, 2818, 2821, 2822,
2823 and 2825 correspond to N, CA, CB, CG, CD and NE2, respectively, of Gln192. This means
that chi1, chi2 and chi3 correspond, respectively, to rotation around the Cα-Cβ, Cβ-Cγ and
Cγ-Cδ bonds. When defining these torsion angles, you must start from the atom nearest the
backbone and move out along the side chain, e.g. chi3 2825 2823 2822 2821 would be invalid.
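Instead of searching the MOL2 file by eye, the atom names behind a set of atom numbers can be looked up with a few lines of code. This sketch assumes the standard Tripos MOL2 layout, where each line of the `@<TRIPOS>ATOM` section starts with the atom ID followed by the atom name; the helper name `atom_names` is hypothetical.

```python
def atom_names(mol2_text, atom_ids):
    """Map atom IDs to atom names using the @<TRIPOS>ATOM section of a MOL2 file."""
    wanted = set(atom_ids)
    names = {}
    in_atoms = False
    for line in mol2_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("@<TRIPOS>"):
            # Only the ATOM section holds "atom_id atom_name x y z atom_type ..." records.
            in_atoms = stripped == "@<TRIPOS>ATOM"
            continue
        if in_atoms and stripped:
            fields = stripped.split()
            atom_id = int(fields[0])
            if atom_id in wanted:
                names[atom_id] = fields[1]
    # None marks an ID that was not found in the file.
    return [names.get(i) for i in atom_ids]
```

For example, `atom_names(open("1fax_protein.mol2").read(), [2817, 2818, 2821, 2822, 2823, 2825])` lists the names of the six atoms used in the chi definitions above, which makes it easy to confirm that they run outwards from the backbone.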
The rotamer lines define the allowed values or ranges of values for chi1, chi2 and chi3. Thus, the
right-hand corner of the plot, Gln192, adopts a variety of positions according to which ligand is
bound. The Gln192 position highlighted in purple is taken from 1lpg, that shown in orange is
taken from 1fax.
The next figure was produced by superimposing 1lpg and 1fax. It shows the 1fax binding site
and the 1lpg ligand. Gln192 is highlighted in orange. It is immediately clear that the 1lpg ligand
cannot be docked accurately into the 1fax binding site if Gln192 is not allowed to move, since
there is a severe steric clash between these two.
To see this more clearly, you can open SILVER and read in the file 1fax_1lpg_super.mol2 from
<GOLD_DIR>/examples/tutorial6 via File, Load a ligand; this is the superposition
from which the above figure was generated. Superimpose the 1lpg ligand with the 1fax protein
fragment by selecting the Display multiple ligands tickbox then clicking on PCN::pdb1fax-
A_1 and LIM::IMA_301_pdb1lpg_1 from the Ligands list.
2. Preparation of Input Files
The file 1fax_protein.mol2 contains the binding site from 1fax. It has been set up for docking in
the normal way. Parts of the protein remote from the binding site have been deleted in order to
speed up the calculation, and hydrogen atoms have been placed on the protein in order to ensure
that ionisation and tautomeric states are defined unambiguously (see Section 3.1, page 9).
The ligand from 1lpg has also been set up for docking (see Section 4.1, page 30). It is stored in
1lpg_ligand.mol2. Again, attention has been given to protonation states (e.g. the benzamidine
group has been built in its protonated form) and the bond types have been set in accordance with
3.4 Water Molecules
3.4.1 Methodology For Handling Waters (see page 16)
3.4.2 Specifying Waters (see page 16)
3.4.1 Methodology For Handling Waters
Water molecules often play key roles in protein-ligand recognition. Water molecules can either
form mediating hydrogen bonds between protein and ligand, or be displaced by the ligand on
binding.
GOLD allows waters to switch on and off (i.e. to be bound or displaced) and to rotate around
their three principal axes (to optimise hydrogen bonding) during docking.
To predict whether a specific water molecule should be bound or displaced, GOLD estimates the
free-energy change, $\Delta G_b$, associated with transferring a water molecule from the bulk solvent to
its binding site in a protein-ligand complex. $\Delta G_b$ for a given water molecule is defined as:
$$\Delta G_b(W) = \Delta G_p(W) + \Delta G_i(W)$$
$\Delta G_p(W)$ is a constant penalty added for each water molecule that is switched on and represents
the loss of rigid-body entropy on binding to the target (hence rewarding water displacement).
Note: $\Delta G_p$ values were optimised against a training set of 58 protein-ligand complexes for four
targets (HIV-1 protease, factor Xa, thymidine kinase and the oligopeptide-binding protein OppA)
where water molecules play key roles in recognition. Further details can be found in
"Modeling Water Molecules in Protein-Ligand Docking Using GOLD" (see References, page 147).
$\Delta G_i(W)$ represents the intrinsic binding affinity of a water molecule and contains contributions
resulting from interactions that the water forms with the protein and ligand (changes in the
interactions between protein and ligand caused by introduction of the water are also accounted
for).
Therefore, for a water molecule to be bound to a protein-ligand complex, its intrinsic binding
affinity needs to outweigh the loss of rigid-body entropy on binding.
3.4.2 Specifying Waters
GOLD allows you to switch specific water molecules on or off (i.e. you can specify whether a
particular water should be present or absent in the protein). Alternatively, GOLD can
automatically determine whether a specific water should be bound or displaced by toggling it on
and off during the docking run. The orientation of the water hydrogen atoms can also be
optimised by GOLD during docking.
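The bound-or-displaced criterion from Section 3.4.1, ΔG_b(W) = ΔG_p(W) + ΔG_i(W), can be written out as a tiny sketch. It assumes a free-energy sign convention in which the entropy penalty ΔG_p is positive, a favourable intrinsic affinity ΔG_i is negative, and a water stays bound only when ΔG_b is negative; the numbers used below are hypothetical, not GOLD's fitted values.

```python
def water_binding_free_energy(dg_penalty, dg_intrinsic):
    """ΔG_b(W) = ΔG_p(W) + ΔG_i(W)."""
    return dg_penalty + dg_intrinsic


def water_is_bound(dg_penalty, dg_intrinsic):
    """True when the intrinsic affinity outweighs the rigid-body entropy penalty.

    Assumes ΔG_p > 0 (penalty) and favourable ΔG_i < 0, so binding means ΔG_b < 0.
    """
    return water_binding_free_energy(dg_penalty, dg_intrinsic) < 0.0


print(water_is_bound(2.0, -3.5))  # strong interactions: water kept
print(water_is_bound(2.0, -1.0))  # weak interactions: water displaced
```

A water set to Toggle is, in effect, subjected to this comparison for each ligand pose, which is why the penalty term rewards displacing poorly bound waters.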
To specify settings for key water molecules, click on the Waters button in the Input Parameters
and Files section of the GOLD front-end. The Water Selection window will appear:
For each water molecule that you want to either include in or exclude from the docking, you need
to specify:
The atom number of the water oxygen atom (as defined in the protein input MOL2 file).
The state of the water, available options are:
On: use the water for docking (i.e. present)
Off: do not use the water for docking (i.e. absent)
Toggle: have GOLD decide whether the water should be present or absent (i.e. bound or
displaced) during docking.
The orientation of the water hydrogen atoms, available options are:
Freeze: use the orientation specified in the input file
Spin: have GOLD automatically optimise the orientation of the hydrogen atoms.
Once the allowed state and orientation of a water molecule have been specified, click on the Add
water or Update selected water button to add the water molecule to the Current Water Settings.
Repeat the above procedure if you want to specify additional water molecules.
To edit a Current Water Setting (e.g. to change the state) highlight the corresponding entry in the
Current Water Settings list, make the required change and then hit the Add water or Update
selected water button.
To remove an entry from the Current Water Settings highlight the entry and hit the Delete
Tutorial 6: Docking with a Flexible Side Chain
1. Introduction (see page 208)
2. Preparation of Input Files (see page 209)
3. Preparation of Configuration Files (see page 210)
4. Comparison of Flexible and Non-Flexible Results (see page 211)
5. Choosing Side-Chain Rotamers (see page 213)
1. Introduction
The object of this tutorial is to demonstrate how to dock a ligand into a binding site which is
known to contain a flexible side chain. The example will involve docking the ligand from PDB
entry 1lpg into the protein binding site taken from 1fax. These structures are of blood
coagulation factor Xa, complexed with two different ligands.
All files referred to in this tutorial can be found in <GOLD_DIR>/examples/tutorial6
where <GOLD_DIR> is the location of your GOLD installation.
The figure below shows a superposition of several experimental determinations of the factor Xa
binding site, complexed with a variety of different ligands (not shown). Only a small part of the
binding site is displayed.
While it is clear that parts of the binding site are rigid, their positions hardly moving from one
structure to the next, other parts are more inclined to move. In particular, the residue at the top
6. Changing the Scoring function
You may wish to stop the tutorial here. However, optionally, you can run through the tutorial
again, this time having the GoldScore option set in the Fitness Function and Search Settings box.
You will find that similar results are obtained. When all waters are turned off, two binding
modes are generally found that score similarly well. One of these binding modes actually
superimposes the reference ligand very well, and allowing waters to toggle does not
significantly improve the superimposition in this case. However, allowing the waters to toggle
does result in only this one binding mode being returned. The second, spurious binding mode is
successfully eliminated.
This ends the tutorial.
Selection button, or to remove all entries hit the Clear List button.
Click on Done in the Water Selection window when you are satisfied with the waters specified
and their allowed state and orientations. When you finish, the count of Waters will be updated in
the GOLD front end.
Any unspecified waters that are part of the protein are considered to be On, automatically.
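The water settings described in this section can be modelled as a small data structure. The sketch below represents the On/Off/Toggle states and Freeze/Spin orientations and applies the rule that unspecified waters are On. The orientation given to unspecified waters (Freeze here) is an assumption, since the guide does not state it, and the class and function names are hypothetical.

```python
from enum import Enum


class WaterState(Enum):
    ON = "on"          # water present during docking
    OFF = "off"        # water absent during docking
    TOGGLE = "toggle"  # GOLD decides: bound or displaced


class Orientation(Enum):
    FREEZE = "freeze"  # keep hydrogen positions from the input file
    SPIN = "spin"      # let GOLD optimise hydrogen positions


def effective_water_settings(all_water_oxygens, specified):
    """Return {oxygen_atom_id: (state, orientation)} for every water in the protein.

    specified: settings entered in the Water Selection window, keyed by the
    water oxygen atom number. Unspecified waters are On (per the guide); giving
    them Freeze orientation is an assumption of this sketch.
    """
    default = (WaterState.ON, Orientation.FREEZE)
    return {oxygen: specified.get(oxygen, default) for oxygen in all_water_oxygens}


# Hypothetical example: one water toggled with spinning hydrogens, others unspecified.
settings = effective_water_settings(
    [2168, 2169, 2176], {2176: (WaterState.TOGGLE, Orientation.SPIN)}
)
for oxygen, (state, orientation) in settings.items():
    print(oxygen, state.value, orientation.value)
```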
3.5 Rotatable O-H and NH3 Groups
The torsion angles of Ser, Thr and Tyr hydroxyl groups will be optimised by GOLD so their
starting positions do not matter. Specifically, each Ser, Thr and Tyr OH will be allowed to rotate
to optimise its hydrogen-bonding to the ligand, unless it is held in place by strong H-bonds to
neighbouring protein residues. Lysine NH3+ groups are similarly optimised.
3.6 Flexible Side Chains
3.6.1 Introduction to Side-Chain Flexibility (see page 18)
3.6.2 Specifying a Flexible Side Chain (see page 19)
3.6.3 Using a Standard Rotamer Library (see page 20)
3.6.4 Allowing a Localised Backbone Movement (see page 20)
3.6.5 Protein-Protein Clashes (see page 23)
3.6.6 Specifying the Energy of a Side-Chain Rotamer (see page 24)
3.6.1 Introduction to Side-Chain Flexibility
You may specify that one or more protein side chains are to be treated as flexible. Each flexible
side chain will be allowed to undergo torsional rotation around one or more of its acyclic bonds.
This option is only available if you are using the GoldScore scoring function (see Section 6.2,
page 46).
Making a side chain flexible can make docking more difficult because it increases the search
space that must be explored. It may also increase the chance of false positives (i.e. ligands that
appear to dock well but do not actually bind). Therefore, you should only make a side chain
flexible if you have good reason to believe (e.g. from X-ray data) that it is likely to move in
response to ligand binding.
At present, side chains may only be made flexible by directly editing the GOLD configuration
file (see Section 15.1, page 126), i.e. the option is not available via the GOLD graphical user
interface. Therefore, you need to set up the GOLD job as normal in the graphical interface, hit
Save & Exit to save the configuration file (which, by default, is named gold.conf), then manually
edit this file as detailed below.
3.6.2 Specifying a Flexible Side Chain
For each side chain that you want to make flexible, you should add a rotamer_lib block of
commands to the end of the gold.conf file. This specifies the name of the side chain, the torsion
angles that are permitted to vary, and the allowed values or ranges of values for those torsion
angles. You can have up to 10 rotamer_lib blocks in a given configuration file, each one
pertaining to a particular protein side chain.
For example, consider the following rotamer_lib command block:
rotamer_lib
name tyr370
chi1 497 498 501 502
chi2 498 501 502 503
rotamer 60 90
rotamer -65 (10) -85 (10:15)
end_rotamer_lib
The text following name is a unique identifier of this rotamer_lib command block. Any text can
be used but the obvious choice is the name of the side chain that the command block refers to, in
this case Tyr370.
The chi1 command specifies the atom numbers of the atoms defining the first rotatable torsion.
In the example, this corresponds to rotation around Cα-Cβ, so the atoms will be the backbone N
(= atom 497), CA (498), CB (501) and CG (502). It is necessary to specify the atoms from the
backbone outwards, i.e. chi1 502 501 498 497 would be invalid.
The chi2 command specifies the second rotatable torsion. In this example, this corresponds to
rotation around Cβ-Cγ, so the atoms are CA (498), CB (501), CG (502) and CD1 (503).
You may specify up to 8 chi commands in a given rotamer_lib block.
Each rotamer line describes one allowed conformation for the side chain.
Thus, the first rotamer command specifies the first set of allowed values for chi1 and chi2. In the
example, this is chi1 = 60, chi2 = 90.
The second rotamer command specifies the second set of allowed values. The format x (y)
specifies the range (x - y) to (x + y), while x (y:z) specifies the range (x - y) to (x + z).
Note: in practice, because the torsion angle distribution is divided into 10-degree bins, users may
see angles outside the specified input range, as range boundaries are rounded up or down to the
next bin. For instance, the actual sampled range for chi2 will be -100 to -60 degrees.
In summary, the effect of this rotamer_lib command block is therefore to allow Tyr370 to adopt
the conformation of precisely chi1 = 60, chi2 = 90, or any conformation in the range chi1 = -80
to -50, chi2 = -100 to -70, with a preference for those angles in the centre of the range.
You can have up to 50 rotamer commands in a rotamer_lib block.
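The rotamer_lib syntax above is simple enough to check before running GOLD. The following sketch parses one block, converts each rotamer value into a nominal (low, high) range using the `x`, `x (y)` and `x (y:z)` rules, and enforces the stated limits of 8 chi commands and 50 rotamer commands. It reports the ranges as written, without the 10-degree bin rounding mentioned in the note; other keywords (e.g. `energy`) are skipped, and the function names are hypothetical.

```python
import re

# One torsion value: "x", "x (y)" or "x (y:z)".
_VALUE = re.compile(
    r"(-?\d+(?:\.\d+)?)"                                             # centre value x
    r"(?:\s*\(\s*(\d+(?:\.\d+)?)(?:\s*:\s*(\d+(?:\.\d+)?))?\s*\))?"  # optional (y) or (y:z)
)

MAX_CHI = 8        # chi commands allowed per rotamer_lib block
MAX_ROTAMERS = 50  # rotamer commands allowed per rotamer_lib block


def parse_rotamer(line):
    """Turn a 'rotamer ...' line into a list of (low, high) ranges in degrees."""
    ranges = []
    for m in _VALUE.finditer(line.split(None, 1)[1]):
        x = float(m.group(1))
        if m.group(2) is None:            # exact value
            ranges.append((x, x))
        elif m.group(3) is None:          # x (y): x - y to x + y
            y = float(m.group(2))
            ranges.append((x - y, x + y))
        else:                             # x (y:z): x - y to x + z
            y, z = float(m.group(2)), float(m.group(3))
            ranges.append((x - y, x + z))
    return ranges


def parse_rotamer_lib(text):
    """Parse one rotamer_lib ... end_rotamer_lib block into (name, chis, rotamers)."""
    name, chis, rotamers = None, [], []
    for raw in text.splitlines():
        fields = raw.split()
        if not fields or fields[0].startswith("#"):
            continue                      # blank or commented-out line
        key = fields[0]
        if key == "name":
            name = " ".join(fields[1:])
        elif re.fullmatch(r"chi\d+", key):
            if len(fields) != 5:
                raise ValueError("a chi line needs exactly four atom numbers: " + raw)
            chis.append([int(a) for a in fields[1:]])
        elif key == "rotamer":
            rotamers.append(parse_rotamer(raw))
    if len(chis) > MAX_CHI or len(rotamers) > MAX_ROTAMERS:
        raise ValueError("too many chi or rotamer commands")
    for r in rotamers:
        if len(r) != len(chis):
            raise ValueError("each rotamer needs one value per chi angle")
    return name, chis, rotamers
```

Applied to the Tyr370 block above, the second rotamer line yields chi1 from -75 to -55 and chi2 from -95 to -70 degrees before any bin rounding.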
docking solutions are found, none of which closely resemble the correct binding mode.
5.2 All waters turned off
From Load GOLD run results ... read in the gold.conf file corresponding to your second set
of results (Alternatively, turn off the Clear ligands on loading flag in the SILVER window
already open, and read the results into there. This will allow you to directly compare docks from
different runs).
Click on Display Multiple ligands and then display the reference ligand.
Check each solution in turn against that of the reference ligand. Now it is likely that only one
docking mode is represented. This docking mode is close to that of the reference ligand. It is not
a perfect superposition though, as the ligand attempts to contact the protein, along its edge, more
closely than it does in reality. The values of the docking scores for this run are higher than those
of the previous run.
5.3 All waters toggled
From Load GOLD run results ... read in the gold.conf file corresponding to your third set of
results (Alternatively, read the results into a SILVER window containing the other two sets of
docks.)
Click on Display Multiple ligands and then display the reference ligand.
Again only one docking mode should be observed. This docking mode is now much closer to
that of the reference ligand. Also the scores for this run are higher than the two previous runs.
Notice that the two waters able to interact with the NH2 of the ligand have also been able to
optimise their interactions with H-bond acceptor functionality in the protein, so that both are
making three good H-bonds. The third water has been excluded in all the reported docking
poses.
Any warning messages produced will be displayed in a separate GA Program Error Message
window. Select Dismiss to close this window.
Once the job is complete the message GA Done will appear.
4.2 All waters turned off
Now access the Water Selection pane again. Double click the top Water atom Number so it
becomes displayed above the Add water or Update selected water button. Change the Water
State to Off and then hit Add water or Update selected water.
Repeat for both the other waters, so that they are now all turned off. Return to the main interface.
Go to the Output... pane and change the output sub-directory to a new one.
Edit the name of the GOLD configuration file in the Configuration File text box in the top pane
to gold2.conf.
Hit Run. There is no need to change any other settings.
4.3 All waters toggled
Return to the Water Selection pane again. Change the state of each water to Toggle and ensure
that the water orientation is set to Spin. Return to the main interface.
Go to the Output... pane and change the output sub-directory to a new one.
Edit the name of the GOLD configuration file in the Configuration File text box in the top pane
to gold3.conf.
Hit Run. There is no need to change any other settings for this job. Because allowing the waters
to toggle on or off normally increases the size of the search necessary to find a good docking
mode, it is generally recommended to increase the search time allowed per ligand, when
toggling waters. The search problem becomes harder the more waters are included. In this
case, because of the small size of the binding site and the fact that the ligand has no rotatable
bonds, the search problem is not large and the same settings can be used throughout.
5. Analysis of results
5.1 All waters turned on (see page 205)
5.2 All waters turned off (see page 206)
5.3 All waters toggled. (see page 206)
5.1 All waters turned on
Open SILVER and from Load GOLD run results ... read in the gold.conf file corresponding
to your first set of results.
Click on Display Multiple ligands and then display the reference ligand.
Check each solution in turn against that of the reference ligand. You should find that several
3.6.3 Using a Standard Rotamer Library
The file <GOLD_DIR>/gold/rotamer_library.txt contains information taken from
the paper The Penultimate Rotamer Library, S. C. Lovell, J. M. Word, J. S. Richardson & D. C.
Richardson, Proteins, 40, 389-408, 2000. It is a compilation of the most commonly observed
side-chain conformations for the naturally occurring amino acids.
To make use of the rotamer information for a given residue, copy and paste the relevant
rotamer_lib section into the GOLD configuration file. The residue name should be changed to
something more meaningful, e.g. name 1qon_TYR370. The atom numbers that define each
torsion angle (starting with the residue backbone N atom) should be entered on the lines starting
chi, e.g. the opening lines of the template:
rotamer_lib
name tyrosine
chi1 <at1 at2 at3 at4>
chi2 <at1 at2 at3 at4>
rotamer 62 (13) 90 (13)
rotamer -177 (11) 80 (11)
rotamer -65 (11) -85 (11)
rotamer -65 (11) -30 (18)
end_rotamer_lib
might be edited to:
rotamer_lib
name 1qon_TYR370
chi1 497 498 501 502
chi2 498 501 502 503
etc.
All defined torsion angles, i.e. rotamer lines, can be used if required. Rotamer lines that are not
needed can be deleted or commented out (by inserting the character # at the start of the line).
Tolerances (in brackets) can be edited or deleted altogether. These tolerances allow some leeway
for torsion angles (see Section 3.6.2, page 19) and in the file represent the positions of peak half-
height either side of the torsion distribution peak, as determined by Lovell et al.
3.6.4 Allowing a Localised Backbone Movement
Quite often, a side-chain rotation is accompanied by a small change in the local backbone
conformation. For example, the figure below shows a detail from an overlay of two PDB
structures (1qon, 1dx4) of the same enzyme:
Not only has the Tyr side chain rotated around Cα-Cβ and Cβ-Cγ, but there has also been a small
backbone movement, primarily affecting the position of the Cα atom.
Although minor (the two Cα positions are only 0.6 Å apart), this movement is extremely
important because it alters the vector direction Cα-Cβ, and this can have a big leverage effect on
the positions of atoms further down the side chain. In this case, it is impossible to overlay the
Tyr370 side chain of 1dx4 closely onto that of 1qon simply by rotating around the Cα-Cβ and
Cβ-Cγ bonds. This is about as close as one can get:
be using both these options shortly.
Click on the Add water or Update selected water button.
Repeat for waters 2169 and 2176.
Click Done.
4. Running GOLD Dockings
4.1 All waters turned on (see page 204)
4.2 All waters turned off (see page 205)
4.3 All waters toggled (see page 205)
4.1 All waters turned on
The gold.conf file for this docking job uses the ChemScore scoring function. The Genetic
Algorithm parameters used are the defaults.
Ensure that the allow early termination flag is set on.
Hit Output... on the Input Parameters and Files page and put the name of an appropriate sub-
directory in the Output directory... box. Click Done to get back to the main interface.
Edit the name of the GOLD configuration file in the Configuration File text box in the top pane
to gold1.conf.
Click on the Run button in the Control panel of the GOLD front end, this will start the GOLD
job interactively. As the job progresses output will be displayed in the GOLD Output window.
the hydrogen positions on the waters have been optimised for maximal hydrogen bonding. This
doesn't matter, as the water hydrogen positions can be optimised during docking.
3. Setting up protein bound waters
A configuration file gold.conf has been provided for this tutorial which will automatically
load most of the settings and parameter values for this tutorial into the GOLD front end.
Open GOLD and click on the Configuration File button within the Control panel of the GOLD
front end, then select the file gold.conf from <GOLD_DIR>/examples/tutorial5
and hit Open.
We will need to identify the waters in the binding site that we particularly want to consider, and
set up their chosen states. Click on the Waters button on the Input Parameters and Files pane.
This will take you to the Water Selection pane.
The atom IDs for the waters that we need, are those of the water oxygens in the protein.mol2 file.
The relevant atom IDs are 2168 and 2169 for the waters that are not occluded by the ligand, and
2176 for the water that is so occluded.
Type 2168 in the water atom no. (oxygen) box and ensure that the water state is set to On and
water orientation is set to Spin. This sets this water to be always present in the binding site and
allows the hydrogen positions to vary during docking, in order to maximise the hydrogen
bonding score both from interactions with the protein and the ligand. The Off water state
option allows a water to be removed from consideration during docking. The Toggle option
sets a water up so that it may either be removed, or kept and made use of in terms of hydrogen
bonding, depending on which arrangement scores most highly for a given ligand pose. We will
The backbone movement can be mimicked by allowing the Cα atom and the attached side chain
to rotate around the N-C vector, where N and C are the backbone atoms on either side of the Cα
atom. This is defined as a rotation of the improper torsion defined by the atom sequence CA-N-
C-CA (atom numbers 498, 497, 499 and 498 in this example):
rotamer_lib
name tyr370
chi1 498 497 499 498
chi2 497 498 501 502
chi3 498 501 502 503
rotamer 0 (30) 62 (11) 90 (11)
rotamer 0 (30) -65 (11) -85 (21)
end_rotamer_lib
This is the rotamer_lib block used as an example earlier (see Section 3.6.2, page 19), except that
an additional improper torsion has been defined as chi1 and the original chi1 and chi2 have been
renamed as chi2 and chi3. The specification 0 (30) for the improper torsion angle will allow a
rotation of ±30 degrees around the N-C vector, the zero angle corresponding to the Cα
position given in the protein input file.
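Geometrically, the improper-torsion move rotates Cα and everything attached to it about the line through the flanking backbone N and C atoms. The sketch below shows that rotation using Rodrigues' rotation formula; it illustrates the geometry only and is not taken from GOLD's implementation. The helper names are hypothetical.

```python
import math


def rotate_about_axis(point, axis_start, axis_end, angle_deg):
    """Rotate `point` about the line axis_start -> axis_end (Rodrigues' formula)."""
    axis = [e - s for s, e in zip(axis_start, axis_end)]
    norm = math.sqrt(sum(c * c for c in axis))
    k = [c / norm for c in axis]                      # unit axis vector
    v = [p - s for p, s in zip(point, axis_start)]    # point relative to the axis
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    k_dot_v = sum(a * b for a, b in zip(k, v))
    k_cross_v = [k[1] * v[2] - k[2] * v[1],
                 k[2] * v[0] - k[0] * v[2],
                 k[0] * v[1] - k[1] * v[0]]
    rotated = [v[i] * cos_t + k_cross_v[i] * sin_t + k[i] * k_dot_v * (1.0 - cos_t)
               for i in range(3)]
    return tuple(r + s for r, s in zip(rotated, axis_start))


def move_side_chain(n_atom, c_atom, moving_atoms, angle_deg):
    """Rotate CA and its side-chain atoms about the backbone N-C vector.

    An angle of 0 keeps the input-file positions; '0 (30)' corresponds to
    sampling angles between roughly -30 and +30 degrees.
    """
    return [rotate_about_axis(a, n_atom, c_atom, angle_deg) for a in moving_atoms]
```

Because N and C lie on the rotation axis, they stay fixed, while Cα swings on a small arc and carries the whole side chain with it.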
It is not easy to decide on suitable rotation limits for improper torsions - a trial and error
approach is normally required - but they often need to be quite large. For example, an improper
rotation of about +40 degrees has to be applied to Tyr370 of 1dx4 for it to be possible to overlay
the side chain closely onto the 1qon Tyr370 position.
3.6.5 Protein-Protein Clashes
By default, when a flexible side chain is moved during docking, GOLD checks whether any of
its atoms clash with atoms in neighbouring residues. This gives rise to an extra Protein Energy
term which contributes to the total GoldScore value.
The term is computed by summing the van der Waals interactions of all pairs of protein atoms
which satisfy the following conditions: (a) at least one of the protein atoms is in a flexible side
chain; (b) the van der Waals term for that pair of atoms is repulsive. The van der Waals
interactions will be estimated using the same potential as is used for the protein-ligand vdw term
(by default, this is a 4-8 potential).
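The two conditions above can be made concrete with a sketch. It sums only positive (repulsive) pair terms for protein atom pairs in which at least one atom belongs to a flexible side chain. Several details are assumptions: the 4-8 form `eps * ((r0/r)**8 - 2*(r0/r)**4)` with a single `r0`/`eps` (GOLD's real parameters depend on atom types), "repulsive" taken to mean a positive pair energy, and pairs within the same residue being skipped because the text refers to clashes with neighbouring residues.

```python
import math


def pair_vdw(r, r0=4.0, eps=1.0):
    """Illustrative 4-8 pair term: minimum of -eps at r = r0, positive at short range."""
    s4 = (r0 / r) ** 4
    return eps * (s4 * s4 - 2.0 * s4)


def protein_clash_energy(atoms, flexible, r0=4.0, eps=1.0):
    """Sum repulsive pair terms for protein atom pairs involving a flexible atom.

    atoms: {atom_id: (residue_label, (x, y, z))}
    flexible: set of atom IDs belonging to flexible side chains.
    """
    ids = sorted(atoms)
    total = 0.0
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if a not in flexible and b not in flexible:
                continue                 # condition (a): one atom must be flexible
            res_a, xyz_a = atoms[a]
            res_b, xyz_b = atoms[b]
            if res_a == res_b:
                continue                 # assumption: intra-residue pairs ignored
            e = pair_vdw(math.dist(xyz_a, xyz_b), r0, eps)
            if e > 0.0:                  # condition (b): only repulsive terms count
                total += e
    return total
```

Attractive contacts never lower this term, so it can only penalise a side-chain placement; this matches its role as a clash check rather than a full protein energy.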
The protein-protein clash term can be switched off by including the command
penalise_protein_clashes = 0 anywhere in a rotamer_lib block, e.g.
rotamer_lib
name tyr370
penalise_protein_clashes = 0
chi1 497 498 501 502
chi2 498 501 502 503
rotamer 62 (13) 90 (13)
rotamer -65 (11) -85 (11)
end_rotamer_lib
This will switch off calculation of the protein-protein clash term for all flexible side chains, not
just the one corresponding to the rotamer_lib block in which you have placed the
penalise_protein_clashes = 0 command.
Tutorial 5: Docking with Water in the Binding Site
1. Introduction (see page 202)
2. Preparation of Input Files (see page 202)
3. Setting up protein bound waters (see page 203)
4. Running GOLD Dockings (see page 204)
5. Analysis of results (see page 205)
6. Changing the Scoring function (see page 207)
1. Introduction
The object of this tutorial is to investigate docking to a binding site that contains water
molecules which a ligand may either displace, or alternatively, make use of through hydrogen
bond interactions.
The protein used here is acetylcholine esterase (PDB entry code 1ACJ), the protein that, more
than any other, is essential for the correct transmittal of nerve impulses in the brain and around
the body. The ligand is tacrine, an acetylcholine esterase inhibitor used as a drug to treat
Alzheimer's disease. The active site of the enzyme has been modelled with three water
molecules in it, each of which makes hydrogen bonds with the protein.
This tutorial will illustrate the requirements for setting up and running dockings in which the
protein binding site features one or more water molecules. The example chosen mimics the
situation where a researcher has a crystal structure of a protein binding site, and is unsure which
and how many of the waters in that binding site should be included in the model for use in an
inhibitor design effort.
2. Preparation of Input Files
Open SILVER and read in the file protein.mol2 from <GOLD_DIR>/examples/
tutorial5.
The acetylcholine esterase protein.mol2, has already been set up in accordance with the
guidelines for the preparation of protein input files (see Section 3., page 9).
The full protein is not displayed. Parts of the protein remote from the binding site have been
deleted in order to speed up the calculation (see Section 3.1, page 9). Hydrogen atoms have been
placed on the protein in order to ensure that the ionisation and tautomeric states are defined
unambiguously (see Section 3.2, page 10).
Read in the file ligand_reference.mol2 from <GOLD_DIR>/examples/tutorial5. You
will be able to see how the ligand tacrine chooses to bind. You will find that two of the waters in
the active site are within hydrogen bonding distance of the NH2 of the ligand, as well as of
hydrogen bond acceptors on the protein. The third water is at a position where it can make a
hydrogen bond to the same histidine backbone carbonyl as the protonated ring nitrogen of the
ligand. This water cannot be accommodated if tacrine takes up its normal binding mode. None of
gold_soln_ligands_m1_3.mol2).
Using SILVER, read in the docking results by specifying the gold.conf file.
The position and orientation of the terminal sulphonamide groups in the docked solutions should
be similar to that observed in the co-crystallised ETS inhibitor (i.e. coordinated to the zinc
within the protein via the sulphonamide nitrogen).
In the example below, the terminal sulphonamide group of GOLD's top-ranked solution can be
seen to satisfy the specified constraint and reproduces the known binding mode of the co-
crystallised ETS inhibitor:
This ends the tutorial.
3.6.6 Specifying the Energy of a Side-Chain Rotamer
An energy may be assigned to a given rotamer, e.g. as follows:
rotamer_lib
name tyr370
chi1 497 498 501 502
chi2 498 501 502 503
rotamer 62 (11) 90 (11)
energy 10
rotamer -65 (11) -85 (18)
end_rotamer_lib
This will penalise (i.e. reduce) the GoldScore value by 10 units if the Tyr370 side chain is placed
in the chi1 = 62, chi2 = 90 conformation. In other words, it makes this conformation less
favourable.
Had the command energy -10 been included, its effect would have been to improve (i.e.
increase) the GoldScore value.
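The effect of an `energy` line can be sketched in two steps: read the energies from the block, then subtract the energy of the rotamer actually chosen from the fitness. The sketch assumes, as in the Tyr370 example, that an `energy` line applies to the rotamer line immediately above it, and that rotamers without one contribute nothing; the function names are hypothetical.

```python
def rotamer_energies(block):
    """Map 0-based rotamer index -> energy from 'energy' lines in a rotamer_lib block."""
    energies = {}
    index = -1
    for raw in block.splitlines():
        fields = raw.split()
        if not fields or fields[0].startswith("#"):
            continue
        if fields[0] == "rotamer":
            index += 1
        elif fields[0] == "energy":
            if index < 0:
                raise ValueError("energy line appears before any rotamer line")
            energies[index] = float(fields[1])
    return energies


def adjusted_goldscore(base_score, rotamer_index, energies):
    """Subtract the chosen rotamer's energy: +10 lowers the score, -10 raises it."""
    return base_score - energies.get(rotamer_index, 0.0)


block = """rotamer_lib
name tyr370
chi1 497 498 501 502
chi2 498 501 502 503
rotamer 62 (11) 90 (11)
energy 10
rotamer -65 (11) -85 (18)
end_rotamer_lib"""
energies = rotamer_energies(block)
print(adjusted_goldscore(50.0, 0, energies))  # first rotamer is penalised
print(adjusted_goldscore(50.0, 1, energies))  # second rotamer is unaffected
```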
3.7 Large Backbone Movements
The only way of dealing with large backbone movements in GOLD is to perform separate
docking runs on different binding-site conformations.
Small backbone movements in the vicinity of a flexible side chain may be allowed by including
the improper torsion angle CA-N-C-CA in a rotamer_lib command block (see Section 3.6.4,
page 20). Another option you can try is to apply a Localised Soft Potential to one or more
residues in the loop (see Section 6.2.1, page 47) (GoldScore only).
3.8 Defining the Binding Site
You must specify the approximate centre and extent of the binding site. This can be done in
several ways:
from a point (see Section 3.8.1, page 25);
from a protein atom (see Section 3.8.2, page 25);
from a file containing a list of atoms (see Section 3.8.3, page 26);
from a protein residue (see Section 3.8.4, page 26);
from a file containing a list of residues (see Section 3.8.5, page 27);
from a reference ligand (see Section 3.8.6, page 28).
You can use cavity detection to confine the calculation to regions enclosed within concave parts
of the binding site surface (see Section 3.8.7, page 28).
The cavity volume, as determined by the cavity detection algorithm, can also be output (see
Section 3.8.8, page 29).
3.8.1 Defining a Binding Site from a Point
Switch on the button labelled Point in the GOLD front end.
In the three boxes, type the orthogonal x,y,z coordinates of a single solvent-accessible point
approximately at the centre of the active site in the protein.
The approximate radius of the binding site must also be specified. By default the binding site
radius is set to 10.0 Å. Type a radius in the box labelled Active site radius.
If r is the radius, the binding site will be defined as all atoms within r of the specified point.
The radius should be large enough to contain any possible binding mode of the ligand.
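The point-plus-radius rule above amounts to a simple distance filter. The sketch below is illustrative only: the Atom record and the function name are hypothetical, and GOLD itself reads atoms from the protein input file.

```python
import math
from typing import NamedTuple


class Atom(NamedTuple):
    number: int  # atom number as in the protein input file
    x: float
    y: float
    z: float


def atoms_within(atoms, centre, radius=10.0):
    """Return atoms lying within `radius` (default 10.0 Å, as in the front end) of centre."""
    return [a for a in atoms if math.dist((a.x, a.y, a.z), centre) <= radius]
```

For the atom-based definition described next, the same filter applies with the chosen protein atom's coordinates used as the centre.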
3.8.2 Defining a Binding Site from an Atom
Switch on the button labelled Atom in the GOLD front end.
Type in the atom number (as it appears in the protein input file) of a single solvent-accessible
protein atom close to the centre of the active site of the protein.
The approximate radius of the binding site must also be specified. By default the binding site
radius is set to 10.0 Å. Type a radius in the box labelled Active site radius.
If r is the radius, the binding site will be defined as all atoms within r of the specified protein
atom.
The radius should be large enough to contain any possible binding mode of the ligand.
The file gives total fitness scores and a breakdown of the fitness into its constituent energy
terms. For GoldScore, these are the two vdw energy terms (protein-ligand and internal ligand),
an internal ligand torsion term, and two hydrogen-bonding terms (protein-ligand and ligand
intramolecular).
An additional constraint scoring term S(con) is also listed. For docking solutions which satisfy
the specified distance constraint the contribution from this scoring term will be 0.00. However,
for solutions in which the constrained distances lie outside the specified bounds a negative
S(con) score will be applied, thus reducing the overall fitness.
Further details relating to substructure-based constraints are given within individual ligand log
files. Your output directory should contain ten ligand log files gold_ligands_m#.log, one for
each ligand.
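When scripting around these results, it can help to decode the output file names. The sketch below assumes only the pattern visible in this guide's examples (gold_soln_ligands_m1_3.mol2 for ligand 1, docking attempt 3, and gold_ligands_m1.log for ligand 1). The regular expression and helper names are hypothetical, and other naming cases are not covered.

```python
import re

# Pattern inferred from examples such as gold_soln_ligands_m1_3.mol2 (an assumption).
SOLN = re.compile(r"^gold_soln_(?P<stem>.+)_m(?P<ligand>\d+)_(?P<attempt>\d+)\.mol2$")


def parse_solution_name(name):
    """Return (stem, ligand index, docking attempt) from a solution file name."""
    m = SOLN.match(name)
    if m is None:
        raise ValueError(f"not a solution file name: {name}")
    return m["stem"], int(m["ligand"]), int(m["attempt"])


def expected_log_names(stem, n_ligands):
    """Ligand log files expected in the output directory, one per ligand."""
    return [f"gold_{stem}_m{i}.log" for i in range(1, n_ligands + 1)]
```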
Open and inspect the ligand log file corresponding to the first ligand in the input file, i.e.
gold_ligands_m1.log. This file will contain the distance bounds as specified in the constraint and
the actual distance observed in the docked solution:
From your bestranking.lst file, identify GOLD's top-ranked solution for the ligand with the best
total fitness score (in the example bestranking.lst file given above this would be
As with standard distance constraints, the fitness score is reduced for solutions which do not
satisfy the constraint. The amount by which the score is reduced is determined by a user-defined
weight term. Set the value of the Spring const. to 20.0, then click on the Add constraint or update
selected constraint button to add the constraint to the Current Constraints list. Hit Done to close
the Constraint Editor.
4. Running GOLD
The time taken by GOLD to dock ligands can be controlled by altering the values of the genetic
algorithm (GA) parameters (see Section 10., page 89). GOLD runs for a fixed number of genetic
operations (crossover, migration, mutation). Therefore, reducing the number of GA operations
performed during the course of a run will make GOLD run faster; however, the search will
be less exhaustive.
GOLD can decide on the optimal settings to use for a given ligand (see Section 11.3, page 94).
To enable automatic GA settings, click on the Select GA Presets and Automatic Settings button in
the Genetic Algorithm Parameters panel (or hit Settings in the Control panel) then, in the
Settings selector window, click on Use automatic settings. Ensure the Search efficiency is set to
100%, then hit Done.
Click on the Output button within the Input Files and Parameters panel, then hit the Output
Directory... button. Specify a directory to which you have write permission; this is where the
GOLD output files will be written. Select Ok to close the Output preferences window.
Click on the Run button in the Control panel of the GOLD front end to start the GOLD
job interactively. As the job progresses, output will be displayed in the GOLD Output window.
Any warning messages produced will be displayed in a separate GA Program Error Message
window. Select Dismiss to close this window.
Once the job is complete the message GA Done will appear.
5. Analysis of Output
A file called bestranking.lst is written for batch jobs on multiple ligands. Open and inspect the
file bestranking.lst from your specified output directory in a text editor. This file gives a
running summary of the best solution that has been obtained for each docked ligand.
The listed file names correspond to the names of the files containing the best solution found for
each ligand. For example, in the file below, gold_soln_ligands_m1_3.mol2 contains the best
answer found for the first ligand (m1) in the input file:
3.8.3 Defining a Binding Site from a List of Atoms
Switch on the button labelled Atom nos. in the GOLD front end.
A file which contains a list of protein atom numbers must be specified.
Multiple atom numbers are permitted on each line of the file, so it is possible to re-use an
existing active site definition by using the list of active atoms printed in the protein.log file.
Example file format is shown below:
Each number is the index of an atom in the input protein file.
The list should contain all the solvent-accessible atoms which are required to explicitly define
the protein active site, since all acceptor and donor hydrogen atoms available to the ligand are
taken from the list.
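Because several atom numbers may appear on each line, a reader for such a file only needs to split on whitespace. The sketch below is a hypothetical helper, not GOLD code. It assumes whitespace-separated integers with no comments, and it drops repeated numbers while keeping the order of first appearance (a design choice of this sketch, not a documented GOLD behaviour).

```python
from __future__ import annotations


def read_atom_list(text: str) -> list[int]:
    """Collect protein atom numbers from a multi-number-per-line list."""
    numbers: list[int] = []
    seen: set[int] = set()
    for line in text.splitlines():
        for token in line.split():
            n = int(token)  # raises ValueError on a non-numeric token
            if n not in seen:  # keep the first occurrence only
                seen.add(n)
                numbers.append(n)
    return numbers
```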
3.8.4 Defining a Binding Site from a Single Residue
The ability to define a binding site from a single residue is not available from the GOLD front
end. To do this, you need to edit the gold.conf file (see Section 15.1, page 126) and add the
commands:
floodfill_atom_no = <atom number>
floodfill_center = residue
The atom number given can be any atom within the residue you want to define the active site
from. GOLD will then get the substructure ID from that atom and find all other atoms that
belong to the same substructure.
Note: in order to define the active site in this way, the amino acid substructures must be properly
defined in the protein input file.
The approximate radius of the binding site must also be specified. By default the binding site
radius is set to 10.0 Å. Type a radius in the box labelled Active site radius.
If r is the specified radius, all protein atoms within r of each atom in the selected residue are
found, then all these atoms plus the atoms of their associated residues are used for the active site
definition.
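The single-residue procedure can be sketched in three steps: look up the substructure of the chosen atom, find protein atoms within r of any atom in that substructure, then expand each hit to its whole residue. The Atom record and its subst field are hypothetical stand-ins for the substructure IDs in a MOL2 file. This illustrates the rule described above, not GOLD's implementation.

```python
from __future__ import annotations

import math
from typing import NamedTuple


class Atom(NamedTuple):
    number: int                      # atom number in the protein file
    subst: int                       # substructure (residue) ID, hypothetical field
    xyz: tuple[float, float, float]  # coordinates in Å


def site_from_residue(atoms: list[Atom], atom_no: int, radius: float = 10.0) -> set[int]:
    by_no = {a.number: a for a in atoms}
    seed_subst = by_no[atom_no].subst             # residue of the chosen atom
    seed = [a for a in atoms if a.subst == seed_subst]
    # Residues owning at least one atom within `radius` of any seed atom.
    hit_substs = {a.subst for a in atoms
                  if any(math.dist(a.xyz, s.xyz) <= radius for s in seed)}
    # Include every atom of those residues.
    return {a.number for a in atoms if a.subst in hit_substs}
```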
3.8.5 Defining a Binding Site from a List of Residues
The ability to define a binding site from a list of residues is not available from the GOLD front
end. To do this, you need to edit the gold.conf file (see Section 15.1, page 126) and add the
commands:
floodfill_center = list_of_residues
cavity_file = <path to text file>
GOLD will then read the specified text file (e.g. list_of_residues.txt) and extract the
residues listed.
The list of residues can be extracted from any text file, including a standard GOLD solution file
(GOLD writes the active site residues list to the solution files if output of rotatable hydrogens is
turned on).
The following formatting restrictions apply:
The list must begin with the following tag on its own line:
> <Gold.Protein.ActiveResidues>
The list must end with a blank line (or the end of the text file).
GOLD will read multiple residue names from one line, but lines must not exceed 250
characters in length.
Residue names must be separated by a space, for example:
> <Gold.Protein.ActiveResidues>
HIS69 ARG71 GLU72 ARG127 ASN144 ARG145 GLY155 ALA156 GLU163
THR164 HIS196 SER197 TYR198 SER199 LEU201 LEU203 ILE243 ILE244
ILE247 TYR248 GLN249 ALA250 GLY253 SER254 ILE255 THR268 GLU270
PHE279 ZN309
The list should contain all the residues which are required to explicitly define the protein active
site, since all acceptor and donor hydrogen atoms available to the ligand are taken from the list.
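The formatting restrictions listed above can be expressed as a short reader. This is a sketch that follows those stated rules: the tag line, termination at a blank line or end of file, space-separated names, and the 250-character line limit. The function name is hypothetical, and the sketch is not GOLD's own parser.

```python
from __future__ import annotations

TAG = "> <Gold.Protein.ActiveResidues>"
MAX_LINE = 250


def read_active_residues(text: str) -> list[str]:
    lines = text.splitlines()
    try:
        start = next(i for i, line in enumerate(lines) if line.strip() == TAG)
    except StopIteration:
        raise ValueError("ActiveResidues tag not found") from None
    residues: list[str] = []
    for line in lines[start + 1:]:
        if not line.strip():
            break  # a blank line ends the list
        if len(line) > MAX_LINE:
            raise ValueError(f"line longer than {MAX_LINE} characters")
        residues.extend(line.split())  # names are separated by spaces
    return residues
```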
Click on the Substructure file name button, then select the file substructure.mol2 from
<GOLD_DIR>/examples/tutorial4 and hit Open.
Enter the Protein atom number and Substructure atom number to which the constraint applies.
These are 2041 (the zinc atom number in the protein.mol2 input file) and 4 (the sulphonamide
nitrogen atom in the substructure.mol2 file) respectively.
Specify the allowed range of separation by entering a Maximum separation of 2.50 and a
Minimum separation of 1.50 (distances are in Å).
When setting up a distance constraint the protein and ligand atom numbers, as defined in the
MOL2 input files, must be used. The maximum and minimum separation of the constrained
atoms must also be entered (distances are in Å).
During a GOLD run, if a constrained distance is found to lie outside the specified bounds, a
spring energy term is used to reduce the fitness score. The spring energy term is $E = kx^2$, where $x$
is the difference between the distance and the closest constraint bound and $k$ is a user-defined
spring constant.
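The spring term above can be written as a small function. It returns no penalty inside the bounds and $kx^2$ outside them. The function name is hypothetical, and the sketch does not show how GOLD combines this energy with the rest of the fitness score.

```python
def spring_energy(distance: float, d_min: float, d_max: float, k: float) -> float:
    """E = k * x**2, where x is the distance to the closest bound (0 inside the bounds)."""
    if d_min <= distance <= d_max:
        return 0.0
    x = d_min - distance if distance < d_min else distance - d_max
    return k * x * x
```

With the tutorial's settings (k = 20.0, bounds 1.50 to 2.50 Å), a zinc-nitrogen distance of 3.0 Å gives x = 0.5 Å and a spring energy of 5.0.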
Select Cancel to close the Constraint Editor.
3.2 Substructure-Based Distance Constraints
It is possible to apply a distance constraint to multiple ligands which have a common
substructure or functional group.
In order to use a substructure-based distance constraint it is first necessary to create a file
containing the common substructure in MOL2 format.
The substructure-based constraint forces GOLD to limit the distance between a protein atom and
one atom of this functional group.
During docking the constraint will be applied to any ligands which contain the specified
substructure (matching is performed on the basis of the atom types and 2D connectivity) and the
resulting solutions will be biased towards the specified distance range.
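Matching on atom types and 2D connectivity is, in general terms, a subgraph-matching problem. The sketch below is a generic backtracking illustration of that idea, not GOLD's algorithm. It maps each substructure atom to a distinct ligand atom of the same type, so that every substructure bond is also a ligand bond, and it ignores bond orders and geometry. The atom-type strings in the usage are illustrative only.

```python
from __future__ import annotations


def find_matches(q_types: list[str], q_bonds: list[tuple[int, int]],
                 t_types: list[str], t_bonds: list[tuple[int, int]]) -> list[dict[int, int]]:
    """All mappings of query (substructure) atoms onto target (ligand) atoms."""
    def adjacency(n, bonds):
        adj = [set() for _ in range(n)]
        for a, b in bonds:
            adj[a].add(b)
            adj[b].add(a)
        return adj

    q_adj = adjacency(len(q_types), q_bonds)
    t_adj = adjacency(len(t_types), t_bonds)
    matches: list[dict[int, int]] = []

    def extend(mapping: dict[int, int]) -> None:
        qi = len(mapping)  # query atoms are mapped in index order
        if qi == len(q_types):
            matches.append(dict(mapping))
            return
        used = set(mapping.values())
        for ti, t_type in enumerate(t_types):
            if ti in used or t_type != q_types[qi]:
                continue
            # Every already-mapped neighbour of qi must be bonded to ti.
            if all(mapping[qn] in t_adj[ti] for qn in q_adj[qi] if qn in mapping):
                mapping[qi] = ti
                extend(mapping)
                del mapping[qi]

    extend({})
    return matches


# Illustrative sulphonamide query (S, O, O, N) and an Ar-SO2-NH ligand fragment.
QUERY = (["S.O2", "O.2", "O.2", "N.3"], [(0, 1), (0, 2), (0, 3)])
LIGAND = (["C.ar", "S.O2", "O.2", "O.2", "N.3", "H"],
          [(0, 1), (1, 2), (1, 3), (1, 4), (4, 5)])
```

Here find_matches(*QUERY, *LIGAND) finds two matches that differ only in which oxygen plays which role. Equivalent atoms of this kind are the reason for options such as topology matching.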
A substructure file containing a sulphonamide group has been provided for this tutorial. Open
and inspect the file substructure.mol2 from <GOLD_DIR>/examples/tutorial4 within
SILVER. When creating your own substructure files it is recommended that you set atom types
manually (see Section 8.2.3, page 72) since an incomplete fragment can cause problems with
automatic atom-typing.
Click on the Edit Constraints button to bring up the Constraint Editor. Then, select Substructure
Constraint from the list of constraint types:
Note: It is possible to use cavity detection when defining the active site from a list of residues.
With cavity detection enabled, the cavity definition will be restricted to those specified atoms
that are solvent-accessible (see Section 3.8.7, page 28).
3.8.6 Defining a Binding Site from a Reference Ligand
Switch on the button labelled Ligand in the GOLD front end.
Enter the name of a file which contains a reference ligand. This could be a ligand in a known
binding mode, or the co-crystallised ligand.
By default all protein atoms within 5.0 Å of each ligand atom are found, then all these atoms
plus the atoms of their associated residues are used for the active site definition.
These default settings can yield very large binding site definitions. To use only those protein
atoms within the cavity distance threshold of each ligand atom (i.e. do not also include all atoms
of their associated residues), edit the gold.conf file (see Section 15.1, page 126) and enter the
keyword atom on the following line:
floodfill_center cavity_from_ligand <distance> atom
Note: the cavity distance threshold can also be changed by specifying a new
<distance> value.
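The difference between the default behaviour and the atom keyword can be sketched as follows. The cutoff plays the role of <distance>, and atom_mode mirrors the atom keyword. The input layout of (atom number, residue label, coordinates) tuples and the function name are hypothetical.

```python
import math


def site_from_ligand(protein, ligand_xyz, cutoff=5.0, atom_mode=False):
    """protein: list of (atom_number, residue_label, (x, y, z)); returns atom numbers."""
    near = {num for num, _, xyz in protein
            if any(math.dist(xyz, lig) <= cutoff for lig in ligand_xyz)}
    if atom_mode:
        return near  # 'atom' keyword: only atoms within the cutoff
    # Default: also include every atom of the residues those atoms belong to.
    residues = {res for num, res, _ in protein if num in near}
    return {num for num, res, _ in protein if res in residues}
```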
Note: It is possible to use cavity detection when defining the active site from a reference ligand.
With cavity detection enabled, the cavity definition will be restricted to those specified atoms
that are solvent-accessible (see Section 3.8.7, page 28).
3.8.7 Cavity Detection
A cavity detection algorithm (Hendlich, Rippmann and Barnickel, LIGSITE: Automatic and
efficient detection of potential small molecule binding sites in proteins, Merck technical report,
1997) is used to restrict the region of interest to concave, solvent-accessible surfaces.
Cavity detection is enabled by switching on the button labelled Detect Cavity.
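The LIGSITE idea can be illustrated on a toy grid. Each solvent grid point is scanned along the three axes and the four cube diagonals, and each line on which protein is met in both directions counts as a protein-solvent-protein event. Points with enough events are treated as enclosed. This is a simplified sketch of the published idea under that reading, not GOLD's implementation. The grid representation and the min_events threshold are assumptions.

```python
from itertools import product

# Three axes plus the four cube diagonals; each line is scanned both ways.
DIRECTIONS = [(1, 0, 0), (0, 1, 0), (0, 0, 1),
              (1, 1, 1), (1, 1, -1), (1, -1, 1), (-1, 1, 1)]


def hits_protein(start, step, occupied, shape):
    """Walk from start (exclusive) along step; True if a protein cell is met."""
    x, y, z = start
    dx, dy, dz = step
    nx, ny, nz = shape
    while True:
        x, y, z = x + dx, y + dy, z + dz
        if not (0 <= x < nx and 0 <= y < ny and 0 <= z < nz):
            return False  # left the grid without meeting protein
        if (x, y, z) in occupied:
            return True


def psp_counts(occupied, shape):
    """Protein-solvent-protein events (0 to 7) for every solvent grid point."""
    counts = {}
    for point in product(*(range(n) for n in shape)):
        if point in occupied:
            continue
        counts[point] = sum(
            hits_protein(point, d, occupied, shape)
            and hits_protein(point, tuple(-c for c in d), occupied, shape)
            for d in DIRECTIONS)
    return counts


def pocket_points(occupied, shape, min_events):
    """Solvent points enclosed along at least min_events of the seven lines."""
    return {p for p, n in psp_counts(occupied, shape).items() if n >= min_events}
```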
3.8.8 Output of Cavity Volume
The cavity volume, as determined by the cavity detection algorithm, can be output. To do this,
you need to edit the gold.params file and add the command:
CAVITY_VOLUMES=1
The volume of the cavity will be written to the gold_protein.log file, e.g.
Volume of docking regions (stochastic sampling)
Box volume: 39060.00
Acc. volume: 26703.33
Surf. volume: 15492.00
Probe rad. 1.400, Samples: 117180
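To collect these values from many runs, the volume lines can be extracted with a regular expression. The sketch assumes the lines appear exactly as in the example above. The dictionary keys are names chosen for this sketch, and no units are implied.

```python
import re

# Matches lines such as "Box volume: 39060.00" (format taken from the example log).
VOLUME_LINE = re.compile(r"^\s*(Box|Acc\.|Surf\.) volume:\s*([0-9.]+)")


def parse_cavity_volumes(log_text):
    """Return e.g. {'Box': ..., 'Acc': ..., 'Surf': ...} from gold_protein.log text."""
    volumes = {}
    for line in log_text.splitlines():
        m = VOLUME_LINE.match(line)
        if m:
            volumes[m.group(1).rstrip(".")] = float(m.group(2))
    return volumes
```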
3.9 Protein File Formats
Acceptable protein file formats are PDB and MOL2.
3.10 Specifying the Protein File Name
Click on the Protein button in the GOLD front-end. The file selection window will appear.
Use the file selection window to choose the protein data file. When you have finished, the file
name will appear in the entry box next to the Protein button in the GOLD front end.
3. Distance Constraints
3.1 Standard Distance Constraints (see page 196)
3.2 Substructure-Based Distance Constraints (see page 197)
Any distance between a ligand atom and a protein atom can be constrained, or restrained, to lie
between minimum and maximum distance bounds.
GOLD features two types of distance constraint:
A standard distance constraint for use with individual ligands (see Section 3.1, page 196).
A substructure-based distance constraint for use with multiple ligands which have a common
functional group (see Section 3.2, page 197).
3.1 Standard Distance Constraints
Click on the Edit Constraints button to bring up the Constraint Editor. Then, select Distance
Constraint from the list of constraint types:
The terminal sulphonamide nitrogen atom of the ligand clearly coordinates to the zinc. We can
attempt to reproduce this known binding mode within GOLD with the introduction of a distance
constraint during docking.
Ten ligands, each structurally similar to the ETS inhibitor, will be screened using GOLD. These
ligands were identified using Relibase+, a program for search and analysis of protein-ligand
complexes (http://www.ccdc.cam.ac.uk/products/life_sciences/relibase/).
These ligands (ligand.mol2) are available from <GOLD_DIR>/examples/tutorial4. Note
that each of the ten ligands in this file features a terminal sulphonamide group.
A configuration file (gold.conf) has been provided for this tutorial which will automatically load
the settings and parameter values for this tutorial into the GOLD front end.
Open GOLD and click on the Configuration File button within the Control panel of the GOLD
front end, then select the file gold.conf from <GOLD_DIR>/examples/tutorial4 and hit
Open.
4. Setting Up Ligands
4.1 Essential Steps in Setting Up a Ligand (see page 30)
4.2 Ligand Hydrogen Atoms, Ionisation States and Tautomeric States (see page 30)
4.3 Ligand Geometry, Conformation and Stereochemistry (see page 31)
4.4 Ligand File Formats (see page 31)
4.5 Specifying the Ligand File(s) (see page 32)
4.6 Setting Up Covalently Bound Ligands (see page 33)
4.1 Essential Steps in Setting Up a Ligand
Add all hydrogen atoms, including those necessary to define the correct ionisation and
tautomeric states (see Section 4.2, page 30).
Ensure that all bond types are correct. If they are, and hydrogen atoms have been placed on the
correct atoms, GOLD will deduce atom types automatically when atom typing is turned on (see
Section 5.2, page 36).
GOLD assigns atom types from the information about element types and bond orders in the
input structure file, so it is important that these are correct. However, if for any reason, GOLD is
unable to deduce an atom type, then the atom in question will be replaced with a dummy atom
type Du. If this is the case a warning message will be given in the gold_protein.log file.
The presence of dummy atoms should not significantly affect the docking prediction, since
dummy atoms are considered neither donors nor acceptors.
There is usually a right and a wrong way to code groups which can be drawn in more than one
way (i.e. have more than one canonical form), such as nitro, carboxylate and amidinium (see
Section 5.5, page 39).
The starting geometry of the ligand should be reasonably low in energy, since GOLD will not
alter bond lengths or angles, or rotate rigid bonds such as amide linkages, double bonds and
certain bonds to trigonal nitrogens. However, GOLD will optimise the values of torsion angles
around rotatable bonds.
Save the ligand as a MOL2 file (i.e. Tripos format) or a MOL file (i.e. MDL SD format). It is also
possible (but not recommended) to use PDB format. If using PDB format, CONECT records
should also be included (see Section 4.4, page 31).
4.2 Ligand Hydrogen Atoms, Ionisation States and Tautomeric States
GOLD uses an all-atom model, so the ligand must have all hydrogen atoms added.
The precise geometrical positions of rotatable (e.g. hydroxyl and amino) hydrogen atoms do not
matter, as they will be optimised during the GOLD run.
GOLD deduces hydrogen-bonding abilities from the presence or absence of hydrogen atoms.
For example, you can control the protonation state of a carboxylic acid group by adding or
removing the ionisable hydrogen atom.
If incorrect ionisation or tautomeric states are inferred by the program, it is unlikely that correct
protein-ligand binding modes will be predicted. If you are unsure about, e.g., the preferred
ionisation state of the ligand, you should perform separate GOLD runs using the different
possibilities.
GOLD ignores atom charges, both formal and partial. It deduces whether an atom is charged by
counting the bond orders of the bonds that it forms and comparing the result with the atom's
normal valency.
4.3 Ligand Geometry, Conformation and Stereochemistry
The ligand conformation will be varied by GOLD during docking. The starting conformation
therefore does not matter.
GOLD will not alter bond lengths or angles. These parameters should therefore be set to
reasonably optimum values. A good practice is to build the ligand in an arbitrary conformation
and then perform a few cycles of molecular-mechanics minimisation to take the ligand close to
its local potential-energy minimum.
Ring conformations and the torsion angles around rigid bonds such as amide linkages, double
bonds and certain bonds to trigonal nitrogens will normally be fixed at their starting values.
However, you can use the Fitness and Search Options button in the GOLD front end to enable
some of these features to vary (see Section 7., page 64).
GOLD will not alter stereochemistry. If you are unsure about the stereochemistry of the ligand,
you must generate all alternatives and dock each separately. It is meaningful to make
comparisons between fitness scores for dockings of different stereoisomers.
4.4 Ligand File Formats
Acceptable ligand file formats are MOL2 (i.e. Tripos format), MOL (i.e. MDL SD format) and
PDB (although we do not recommend the use of PDB format). Files in MOL format may also
have the extension .mdl or .sdf.
Only MOL2 may be used if you wish to set ligand atom types manually.
An extension to the PDB file format is required if it is used for storing the ligand structure.
Specifically, a bond specified twice in a single CONECT record is assumed to be a double bond,
and a bond specified three times in a single CONECT record is assumed to be a triple bond. For
example, the following CONECT records both specify a double bond between atoms with serial
numbers 25 and 26:
CONECT 25 20 26 30 26
CONECT 26 25 27 52 25
This mechanism for specifying bond orders is forced by the lack of a bond-order field in the
standard PDB format, and seems to offer lots of scope for users to commit errors. For that
reason, we recommend that the PDB format is not used for ligands.
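The repeated-serial convention can be decoded by counting how often each partner appears in a single record. The sketch assumes whitespace-separated fields, as in the example records above. Real PDB CONECT records use fixed-width columns, in which large serial numbers can run together, so this shortcut is an assumption rather than a full parser. The function name is hypothetical.

```python
from __future__ import annotations

from collections import Counter


def conect_bond_orders(records: list[str]) -> dict[tuple[int, int], int]:
    """Bond orders implied by repeated partners in CONECT records."""
    orders: dict[tuple[int, int], int] = {}
    for rec in records:
        fields = rec.split()  # ASSUMPTION: fields separated by whitespace
        if not fields or fields[0] != "CONECT":
            continue
        atom = int(fields[1])
        for partner, count in Counter(int(f) for f in fields[2:]).items():
            key = (min(atom, partner), max(atom, partner))
            # Listed twice = double bond, three times = triple bond.
            orders[key] = max(orders.get(key, 0), count)
    return orders
```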
Tutorial 4: Use of Substructure Based Distance Constraints
1. Introduction (see page 194)
2. Input Files (see page 194)
3. Distance Constraints (see page 196)
4. Running GOLD (see page 199)
5. Analysis of Output (see page 199)
1. Introduction
The object of this tutorial is to assess the binding of a small number of structurally related
ligands to carbonic anhydrase II, PDB entry code 1cil. In the ETS inhibitor a terminal
sulphonamide nitrogen atom is observed to coordinate to a zinc atom within the protein binding
site.
This tutorial will illustrate how GOLD can be used to screen a number of compounds in order to
identify ligands with potential activity. The use of constraints in order to bias solutions towards
the observed binding mode of the inhibitor will also be demonstrated, as well as the use of
automatic speed settings.
2. Input Files
Open SILVER and read in the file protein.mol2 from <GOLD_DIR>/examples/
tutorial4. The original protein PDB file <GOLD_DIR>/examples/tutorial4/
1CIL.pdb has also been provided should you wish to set up the protein for yourself.
Carbonic anhydrase II, 1cil, protein.mol2, has already been set up in accordance with the
guidelines for the preparation of protein input files (see Section 3.2, page 10).
Upon inspection of the protein you will see that the zinc atom is coordinated to three histidine
residues; the one remaining zinc coordination site is available for binding to the ligand.
Read in the file ligand_reference.mol2 from <GOLD_DIR>/examples/tutorial4. Inspect
the crystallographically observed position of the ETS inhibitor (shown in green) within the
protein binding site:
Interactions between the cyclic urea inhibitor and HIV-1 protease can be divided into two
groups: those that anchor the scaffold in the active site and those that fix the substituents in the
target subsites.
Confirm that the hydrogen bonds specified in the constraints are formed as expected to the cyclic
urea scaffold by measuring the relevant contact distances. Identify any additional hydrogen
bonding interactions between the benzimidazole substituents and the target subsites within the
protein.
This ends the tutorial.
4.5 Specifying the Ligand File(s)
Any number of ligands can be specified, either by selecting several individual files, or by
selecting a directory containing several ligand files, or by selecting a single file containing
several ligands (i.e. a multi-MOL2 or SD file). GOLD will dock each in turn.
Click on the Edit Ligand File List button in the GOLD front-end. The Ligand Selection for
docking run window will appear:
Click on the Filename button. In the resulting dialog select the required file or directory and hit
Open.
Specify the number of times each ligand is to be docked by entering a value in the No. of GA
runs box (see Section 11.1, page 93).
When using a single file containing several ligands (i.e. a multi-MOL2 or SD file), it is possible to
dock only specific ligands in that file. Enable the Specify ligand numbers check-box and specify
the ligands at which docking should start and finish (by entering the number relating to the
position of each ligand within the file).
Note: Unless specified otherwise GOLD will, by default, start at the first ligand and finish at the
last ligand in the file.
Once a selection has been made, click on the Add file or Update selected file button, or the Add
all files in directory button, to add the chosen ligand file or directory to the Current Ligand File
Selection.
Repeat the above procedure if you want to select further ligands for docking.
To edit a Ligand File Selection (e.g. to change the number of times the ligand will be docked)
highlight the ligand file with the mouse, make the required change and hit the Add file or Update
selected file button.
To remove a file from the Current Ligand File Selection highlight the ligand file with the mouse
and hit the Delete Selection button, or to remove all files hit the Clear List button.
Click on Done in the Ligand Selection window when you are satisfied with the selected ligands.
When you finish, the count of ligands will be updated in the front end.
4.6 Setting Up Covalently Bound Ligands
GOLD is able to dock covalently bound inhibitors, but only if you specify which ligand atom is
bonded to which protein atom. GOLD supports two types of covalent link:
A covalent link for use with individual ligands (see Section 4.6.2, page 34).
A substructure-based covalent link for use with multiple ligands which have a common
functional group (see Section 4.6.3, page 34).
4.6.1 Method Used for Docking Covalently Bound Ligands (see page 33)
4.6.2 Setting Up a Single Covalent Link (see page 34)
4.6.3 Setting Up Substructure-Based Covalent Links (see page 34)
4.6.1 Method Used for Docking Covalently Bound Ligands
GOLD is able to dock covalently bound inhibitors, but only if you specify which ligand atom is
bonded to which protein atom.
The program assumes that there is just one atom linking the ligand to the protein (e.g. the O in a
serine residue). Both protein and ligand files are set up with the link atom included (so, if the
serine O is the link atom, it will appear in both the protein and ligand input files). Ideally the link
atom, in both the ligand and the protein, will have a free valence available through which the
link can be made. If the link atom on the ligand does not have a free valence, having a hydrogen
instead, then the docking will proceed and the hydrogen will be ignored in terms of its
contribution to the fitness score. It will however still be displayed when docking poses are
visualised.
Inside the GOLD least-squares fitting routine, the link atom in the ligand will be forced to fit
onto the link atom in the protein.
In order to make sure that the geometry of the bound ligand is correct, the angle-bending
potential from the Tripos Force Field has been incorporated into the fitness function. On
evaluating the score for the docked ligand, the angle-bending energy for the link atom is
included in the calculation of the fitness score.
This seems to work well in the systems on which GOLD was validated. However, since the
protein is held rigid (apart from hydroxyl hydrogen atoms), it does require that the position of
the link atom in the protein is sensible.
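As an illustration of an angle-bending term at the link atom, the sketch below measures the angle between a neighbouring atom, the link atom and another neighbouring atom, and applies a harmonic penalty $k(\theta - \theta_0)^2$. The harmonic form is assumed here as the usual shape of a Tripos-style bending term. The values of k and θ0 are hypothetical placeholders, not GOLD's parameters.

```python
import math


def bond_angle(a, b, c):
    """Angle a-b-c in degrees, with b (the link atom) as the vertex."""
    v1 = [p - q for p, q in zip(a, b)]
    v2 = [p - q for p, q in zip(c, b)]
    dot = sum(x * y for x, y in zip(v1, v2))
    cos_t = dot / (math.hypot(*v1) * math.hypot(*v2))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))  # clamp rounding error


def angle_bend_energy(theta, theta0, k):
    """Harmonic bending energy k * (theta - theta0)**2, angles in degrees (assumed form)."""
    return k * (theta - theta0) ** 2
```

A link geometry far from its ideal angle therefore contributes a larger bending energy to the fitness evaluation than one close to it.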
terms for each docking performed on the ligand. In the example below the fitness score for the
solution found on the first docking attempt (gold_soln_ligand_m1_1.mol2) is shown:
A constraint scoring term S(con) is listed for each docking. If a solution predicted by GOLD
satisfies all of the protein H bond constraints then the contribution from this scoring term will be
0.00. However, for solutions in which not all of the constraints are satisfied, a penalty will be
applied to the fitness score for each constrained H bond that is not formed. The value of this
penalty is the Constraint weight previously specified (see Section 3.2.1, page 187).
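The penalty rule above can be sketched as follows. Each constraint lists the protein atoms that may satisfy it (for example, either Asp25 carboxylate oxygen), and each constraint with no hydrogen bond formed costs its Constraint weight. The Minimum H bond geometry weight and GOLD's actual hydrogen-bond detection are not modelled. The class and function names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class HBondConstraint:
    protein_atoms: frozenset[int]  # any one of these atoms satisfies the constraint
    weight: float = 10.0           # the Constraint weight


def constraint_score(constraints, hbonded_protein_atoms):
    """S(con): 0 when every constraint is met, else minus the weights of unmet ones."""
    formed = set(hbonded_protein_atoms)
    return -sum((c.weight for c in constraints if not (c.protein_atoms & formed)), 0.0)
```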
The details of each specified protein H bond constraint satisfied in the solution are listed and an
overall constraint score is given. A list of all hydrogen bonds formed between ligand and protein
is also provided in the ligand log file.
From your ligand_m1.rnk output file, identify GOLD's top-ranked solution. Docking attempts are
listed in decreasing order of fitness, so the best solution is placed first. Load the GOLD results
into SILVER and display the top-ranked solution.
Inspect how well the docked inhibitor fits within the protein binding site as predicted by GOLD:
Once all of these protein H bond constraints have been set up, the Constraint Editor window
should contain four individual constraints:
Hit Done to close the Constraint Editor window.
4. Running GOLD
Click on the Output button within the Input Files and Parameters panel, then hit the Output
Directory... button. Specify a directory to which you have write permission; this is where the
GOLD output files will be written. Select Ok to close the Output preferences window.
Click on the Run button in the Control panel of the GOLD front end to start the GOLD
job interactively. As the job progresses, output will be displayed in the GOLD Output window.
Once the job is complete the message GA Done will appear.
5. Analysis of Output
Open the file gold_ligand_m1.log, from your specified output directory, using a text editor.
This file will give a total fitness score and a breakdown of the fitness into its constituent energy
4.6.2 Setting Up a Single Covalent Link
Set up the protein and ligand structures so that they both contain the link atom (see Section 4.6.1,
page 33).
In the GOLD front end, click on the Covalent button in the Input Parameters and Files panel.
In the resulting Covalently-bound ligand settings dialog enable the Apply covalent docking
check-box and select Atom to atom for the type of link.
Enter the atom numbers (as defined in the Sybyl MOL2 input files, or use PDB sequence
numbers if PDB input is used) of the link atom in the protein and ligand into the appropriate
boxes:
Hit Done to accept the current selections and close the Covalently-bound ligand settings dialog.
Any constraint created can be altered by selecting the relevant line in the Constraint Editor
window, clicking on the Add or Edit button in this window and selecting Edit selected to edit the
constraint.
4.6.3 Setting Up Substructure-Based Covalent Links
It is possible to apply a covalent link to multiple ligands which have a common functional group.
During docking the link will be applied to any ligands which contain a specified substructure
(matching is performed on the basis of the atom types and 2D connectivity).
Note: the substructure must be a sub-graph rather than a complete molecule.
To use a substructure-based covalent link, first create a file containing the substructure in MOL2
format (e.g. substructure.mol2). It is recommended that you set atom types manually (see
Section 5.3, page 37) since an incomplete fragment can cause problems with automatic atom-
typing. The actual conformation of the group in this file is not important, as only the atom types
and 2D connectivity will be used.
Click on the Covalent button in the Input Parameters and Files panel of the GOLD front-end.
This will open the Covalently-bound ligand settings dialog:
Enable the Apply covalent docking check-box and select Atom to substructure for the type of
link.
Click on the Substructure file button, then select the required substructure file and hit Open.
Enter the Protein atom number and Substructure atom number to which the covalent link
applies (numbering as in the MOL2 files).
Enable the Topology matching check-box if the constraint refers to a substructure atom (and
therefore a ligand atom) which is topologically equivalent to other atoms (e.g. it is one of the
oxygen atoms of an ionised carboxylate group); GOLD will then use whichever of the
equivalent atoms gives the best result.
Hit Done to accept the current selections and close the Covalently-bound ligand settings dialog.
By default the Constraint weight and Minimum H bond geometry weight should be 10.0 and
0.005 respectively. Select Add constraint or update selected constraint to accept these values.
The specified constraint should now appear in the Current Constraints list.
Specify protein H bond constraints for the three remaining key hydrogen bonding interactions as outlined in the table below:
Protein H bonding group   Atom number(s)   Constraint weight   Minimum H bond geometry weight
Ile50                     1388             10.0                0.005
Ile50                     468              10.0                0.005
Asp25                     1161 or 1162     10.0                0.005
Protein H bond constraints can be used in order to attempt to reproduce these key interactions
during docking.
Specify that either oxygen atom of the carboxylate group of Asp25 should form a hydrogen bond
to the ligand by entering the corresponding Protein atom(s) required to form H-bond in the
Constraint Editor window:
5. Atom and Bond Types
5.1 Atom and Bond Type Overview (see page 36)
5.2 Automatically Setting Atom and Bond Types (see page 36)
5.3 Manually Setting Atom and Bond Types (see page 37)
5.4 Overriding Automatic Bond Settings (see page 38)
5.5 Atom and Bond Type Conventions for Difficult Groups (see page 39)
5.6 Internal GOLD Atom Types (see page 45)
5.1 Atom and Bond Type Overview
Each protein and ligand atom must be assigned an atom type which is used, for example, to
determine whether the atom is capable of forming hydrogen bonds.
GOLD atom typing is based on SYBYL atom types. Internally, GOLD also uses some additional
atom types (see Section 5.6, page 45).
SYBYL bond types are also used.
Correct assignment of atom and bond types is crucial.
GOLD assigns atom types from the information about element types and bond orders in the input structure file, so it is important that these are correct. However, if for any reason GOLD is unable to deduce an atom type, the atom in question will be assigned the dummy atom type Du. In this case, a warning message is written to the gold_protein.log file.
The presence of dummy atoms should not significantly affect the docking prediction, since dummy atoms are considered neither donors nor acceptors.
Atom types may be set manually, provided you are using MOL2 input files (see Section 5.3,
page 37).
Alternatively, they can be set automatically (see Section 5.2, page 36). Unless you are an expert
GOLD user or are dealing with a very unusual ligand structure, you are recommended to use this
option. However, you still need to input the ligand and protein structures correctly, e.g. with
correct bond orders and appropriate protonation states.
5.2 Automatically Setting Atom and Bond Types
Unless you are an expert GOLD user or are dealing with a very unusual ligand structure, you are
recommended to use the automatic atom-type assigner. This requires that the Set atom types
check buttons are switched on in the GOLD front end.
GOLD assigns atom types from the information about element types and bond orders in the input structure file, so it is important that these are correct (see Section 5.5, page 39). However, if for any reason GOLD is unable to deduce an atom type, the atom in question will be assigned the dummy atom type Du.
It does not matter whether the bonds in an aromatic ring are coded as aromatic (ar) or as alternating single and double bonds, as the GOLD atom-type assigner will automatically assign the special SYBYL bond type ar where appropriate.
The atom-type assigner will also detect amide linkages and assign them the SYBYL bond type
am.
Care should be taken when using the type-assignment software on protein input files. In
particular, the software is likely to be unreliable if protein residues have been partially deleted,
so that some atoms appear to have free valencies. This situation can be avoided by ensuring that
all residues included in the input file are complete.
There is usually a right and a wrong way to code groups which can be drawn in more than one
way (i.e. have more than one canonical form), such as nitro, carboxylate and amidinium. A list
of correct bond types for some of the common, difficult groups is available (see Section 5.5,
page 39).
Because correct atom typing is so important, any messages from the type checker are logged in
both the gold_protein.log file and the gold.err file. These errors will also be displayed in a
separate window if GOLD is run through the front end.
5.3 Manually Setting Atom and Bond Types
If you do not want to use the automatic atom- and bond-type assignment available in GOLD,
you can define the atom and bond types yourself, provided that you use MOL2 format. This
option is useful when you want to set unusual atom types or user-defined types.
GOLD atom typing is based on SYBYL atom types (see Appendix A: List of Atom and Bond
Types, page 149).
SYBYL bond types are also used (see Appendix A: List of Atom and Bond Types, page 149).
Even if atom types are set manually, the automatic atom-type assignment software is still run to
check the ligand structure for inconsistencies. Any errors will be recorded in both the log file
and the error file. In most cases, input types will not be reset.
If for any reason GOLD is unable to deduce an atom type, then the atom in question will be
replaced with a dummy atom type Du.
Bond types must be correctly set (see Section 5.5, page 39). This is normally just a case of
checking single and double bonds. However, the amide bond must be set to the am bond type.
Also, the ar bond type is used for delocalised bonds (e.g. in carboxylate, phosphate and
guanidinium ions) as well as for aromatic bonds.
Atom types should conform to those expected in SYBYL. In particular, sp2 oxygen is atom type
O.2, sp3 oxygen is O.3, tetrahedral nitrogen is N.3 (or N.4 if protonated), planar (non-amide)
nitrogen is N.pl3 and the planar amide nitrogen is N.am. The atom type O.co2 should be used for
the oxygens of carboxylate and phosphate ions or the singly-charged oxygen of phenolates.
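The carboxylate conventions above (O.co2 oxygens joined by ar bonds) lend themselves to a simple consistency check on a typed structure. This is a minimal sketch under stated assumptions: the helper name check_carboxylates and its dictionary/tuple input format are hypothetical, hydrogens are assumed to be explicit, and a "carboxylate-like" group is simplistically taken to be a carbon bonded to exactly two oxygens that have no other bonds.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def check_carboxylates(atom_types: Dict[int, str],
                       bonds: List[Tuple[int, int, str]]) -> List[str]:
    """Report carboxylate-like groups whose atom or bond types break the conventions.

    atom_types maps atom id -> SYBYL atom type; bonds are (atom1, atom2, bond_type).
    """
    neighbours = defaultdict(list)  # atom id -> [(neighbour id, bond type)]
    for a, b, btype in bonds:
        neighbours[a].append((b, btype))
        neighbours[b].append((a, btype))

    problems: List[str] = []
    for atom_id, atype in atom_types.items():
        if not atype.startswith("C."):
            continue
        # Oxygens bonded only to this carbon (assumes explicit hydrogens).
        terminal_oxygens = [(n, bt) for n, bt in neighbours[atom_id]
                            if atom_types[n].startswith("O.") and len(neighbours[n]) == 1]
        if len(terminal_oxygens) != 2:
            continue
        for n, bt in terminal_oxygens:
            if atom_types[n] != "O.co2":
                problems.append(f"atom {n}: type {atom_types[n]}, expected O.co2")
            if bt != "ar":
                problems.append(f"bond {atom_id}-{n}: type {bt}, expected ar")
    return problems
```

An acetate drawn with one double and one single C-O bond and typed O.2/O.3 yields four problem messages; the same group typed O.co2 with two ar bonds yields none.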
If an atom is mis-typed, GOLD may assign it the wrong H-bond donor or acceptor properties. Therefore, correct atom-type assignment is crucial. An N.3 donor (tetrahedral nitrogen) is very different from an N.4 (protonated nitrogen) or an N.pl3 (planar trigonal nitrogen) donor. The assignment of rotatable bonds may also be affected. If a bond has
GOLD. The Minimum H bond geometry weight takes values from 0 to 1; by default, it is set to 0.005.
3.2.2 Specifying Multiple Constraints
Using the Constraint Editor it is possible to specify several different protein H bond constraints,
with different weights for each constraint. Simply specify the protein atom number and required
weight and click on the Add constraint or Update selected constraint button to add the constraint
definition to the Current Constraints. Repeat the procedure to set up further constraints; each constraint will be displayed on a separate line in the Constraint Editor window.
For a given protein H bond constraint more than one protein atom number can be entered in the
Protein atom(s) required to form H-bond input box. This will instruct GOLD to use an either-or
type of constraint during docking. For example, specifying two protein atoms m and n, separated
by a space, will result in the constraint being satisfied if an H bond is formed to either m or n
during docking. This is of particular use when defining constraints involving, for example,
carboxylates where it is not important which oxygen atom forms an H bond, provided one does.
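The either-or behaviour can be pictured as a set test: the constraint holds if any listed atom forms an H bond. This is a minimal sketch; the helper names and the idea of a precomputed set of H-bonded protein atoms for a pose are hypothetical, used only to illustrate the logic described above.

```python
from typing import Iterable, List


def parse_protein_atoms(field: str) -> List[int]:
    """Split a space-separated 'Protein atom(s) required to form H-bond' entry."""
    return [int(token) for token in field.split()]


def either_or_satisfied(required: Iterable[int], hbonded_protein_atoms: Iterable[int]) -> bool:
    """True if at least one of the listed protein atoms forms an H bond in the pose."""
    return not set(required).isdisjoint(hbonded_protein_atoms)
```

For instance, with the entry "1161 1162" for the two Asp25 carboxylate oxygens, a pose in which only atom 1162 is H-bonded satisfies the constraint, while a pose in which neither is H-bonded does not.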
3.2.3 Defining the Protein H Bond Constraints
The crystal structures of HIV-1 protease in complex with a number of cyclic urea inhibitors have
been determined. It has been observed that the central urea moiety is anchored in the active site
of the protease by six key hydrogen bonds:
Two hydrogen bonds between the urea oxygen atom and the protein backbone peptide groups of Ile50 and Ile50' (shown below).
Four hydrogen bonds between the cyclic urea diol and the carboxylate groups of the catalytic aspartate residues (Asp25 and Asp25') (shown below).
3.2.1 General Methodology
Click on the Edit Constraints button to bring up the Constraint Editor. Then, select Protein H-
Bond Constraint from the list of constraint types:
When specifying a protein hydrogen bond constraint the protein atom number, as defined in the
MOL2 input file, must be entered.
GOLD will then be biased towards finding solutions in which the specified protein atom forms
hydrogen bonds. However, as with standard hydrogen bond constraints such a solution is not
guaranteed.
During the GOLD run the fitness score of a given docking will be penalised for every protein H-
bond constraint that is not satisfied.
The Constraint weight is the strength of bias applied to the formation of a specified hydrogen
bond in the least squares mapping algorithm within GOLD. The Constraint weight is also the
value of the penalty applied to the fitness score for each constrained H bond that is not formed.
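The penalty rule (the Constraint weight is subtracted for every unsatisfied constraint) can be sketched as follows. This is an illustration under our own assumptions: the class and function names are hypothetical, the per-atom H-bond geometry quality (0 to 1) is taken as a given input, and we assume an H bond counts when its quality is at least the Minimum H bond geometry weight. GOLD's internal scoring of H-bond geometry is not reproduced.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class ProteinHBondConstraint:
    protein_atoms: Sequence[int]         # either-or list of protein atom numbers
    weight: float = 10.0                 # Constraint weight, also the penalty if unmet
    min_geometry_weight: float = 0.005   # Minimum H bond geometry weight (0 to 1)


def constraint_penalty(constraints: List[ProteinHBondConstraint],
                       hbond_quality: Dict[int, float]) -> float:
    """Sum of the weights of all constraints with no qualifying H bond.

    hbond_quality maps protein atom number -> best H-bond geometry quality (0 to 1)
    found for that atom in the current pose (a hypothetical input).
    """
    penalty = 0.0
    for c in constraints:
        satisfied = any(hbond_quality.get(a, 0.0) >= c.min_geometry_weight
                        for a in c.protein_atoms)
        if not satisfied:
            penalty += c.weight
    return penalty
```

With constraints on atoms 1388, 468 and "1161 1162" at the default weights, a pose that H-bonds only 1388 and 1162 is penalised by 10.0, which would then be subtracted from its fitness score.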
The Minimum H bond geometry weight is a user defined score that determines how good a
hydrogen bonding interaction has to be in order for it to be considered a hydrogen bond by
the wrong type, it may be inappropriately allowed to rotate freely.
The BUILD menu in SYBYL has a MODIFY sub-menu for altering atom/bond types. There is
also a dialogue box for displaying atom- and bond-type labels.
A list of atom and bond type conventions for some common, difficult groups is available (see
Section 5.5, page 39).
5.4 Overriding Automatic Bond Settings
When using fitness flags, e.g. Flip amide bonds (see Section 7.2, page 64) or Flip all planar R-
NR1R2 (see Section 7.3, page 65), the bond in question is treated in a specific manner at ligand
initialisation to prepare it for the docking run (in both the aforementioned cases, the bond is
flattened at ligand initialisation prior to it being flipped during docking).
If, for example, a bond should rotate freely rather than flip during docking, this fine-grained control can be achieved by using the rotatable_bond_override.mol2 file, found in the $GOLD_DIR/gold/ directory. Some fragments are already provided (and can be edited), but user-specific ones may also be added. Instructions on how to do this, as well as further information, can be found in the file itself.
This is particularly useful if further control is sought over more than one ligand with a common
substructure in a ligand library file.
This feature is only available if using GOLD via the command line.
If the rotatable_bond_override.mol2 file is to be used, lines of the following type should be
inserted into the gold.conf:
postprocess_bonds = 1
rotatable_bond_override_file = <full_path_to_rotatable_bond_override_file>
The new bond type(s) are specified in the rotatable_bond_override.mol2 file, in the
@<TRIPOS>COMMENT part of the molecule file. The following format should be used:
RESET_BOND_TYPE <bond_number> <fix | flip | 1 | am>
fix keeps the bond at its input angle. This option can also be specified for a single ligand
docking via the gold.conf (see Section 7.7, page 66).
flip causes the bond to be flipped by 180 degrees from its input geometry.
1 re-types the bond to a single bond, thus it is treated as fully rotatable.
am re-types the bond as an amide bond.
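Because the directive has a fixed shape, it is easy to validate before running GOLD. This is a minimal sketch that reads RESET_BOND_TYPE lines from the text of a @<TRIPOS>COMMENT block and accepts only the four options listed above; the function name is hypothetical, and we assume one directive per line with other comment lines ignored.

```python
import re
from typing import List, Tuple

VALID_OPTIONS = {"fix", "flip", "1", "am"}
_DIRECTIVE = re.compile(r"^\s*RESET_BOND_TYPE\s+(\d+)\s+(\S+)\s*$")


def parse_reset_directives(comment_text: str) -> List[Tuple[int, str]]:
    """Extract (bond_number, option) pairs from a @<TRIPOS>COMMENT block."""
    directives: List[Tuple[int, str]] = []
    for line in comment_text.splitlines():
        match = _DIRECTIVE.match(line)
        if not match:
            continue  # not a RESET_BOND_TYPE line
        bond_number, option = int(match.group(1)), match.group(2)
        if option not in VALID_OPTIONS:
            raise ValueError(f"unknown option {option!r} for bond {bond_number}")
        directives.append((bond_number, option))
    return directives
```

For a comment block containing "RESET_BOND_TYPE 3 flip" and "RESET_BOND_TYPE 7 1", this returns [(3, "flip"), (7, "1")].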
A report detailing what has been matched can be found in the gold_ligand.log file:
Postprocessing is done by default, even if the line postprocess_bonds = 1 is not present in the gold.conf. Postprocessing can be switched off by adding the line postprocess_bonds = 0 to the gold.conf and running GOLD via the command line.
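The default-on behaviour means that only an explicit postprocess_bonds = 0 disables postprocessing. This is a minimal sketch of reading the effective setting from gold.conf text; the function name is hypothetical, and we assume "key = value" lines and that the last occurrence wins if the key is repeated.

```python
def postprocessing_enabled(conf_text: str) -> bool:
    """Effective postprocess_bonds setting for the given gold.conf contents.

    Postprocessing is on unless a 'postprocess_bonds = 0' line is present.
    If the key appears more than once, the last occurrence wins (an assumption).
    """
    enabled = True  # default when the line is absent
    for line in conf_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "postprocess_bonds":
            enabled = value.strip() != "0"
    return enabled
```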
If the postprocess instruction and rotatable bond override file are used, the geometry is overruled regardless of whether the associated fitness flag is on or off.
If a torsion distribution can be found and matched, this will be used to bias the geometry of the
re-typed bond.
Care should be taken to ensure that the correct substructure is defined in the rotatable_bond_override.mol2 file. If a substructure cannot be matched, the bond override will not be used.
5.5 Atom and Bond Type Conventions for Difficult Groups
Use of correct atom and bond types in GOLD is important for producing good results.
In order for the GOLD atom-type assigner to work correctly, it is necessary for the input
structures to have correct bond orders. This can be difficult when a ligand contains a group that
can be drawn in more than one way (i.e. a group which has more than one canonical form). In
such cases, there is usually a right and a wrong way for GOLD, and you need to know which is
which.
This section explains how to set the bond orders of some common difficult groups. It also shows
the atom types that GOLD will assign if bond types are set correctly (or that you must assign if
you are setting atom types manually).
Amidinium (see page 40)
Carboxylate (see page 40)
Enolate/phenolate oxygen (see page 40)
Guanidinium (see page 41)
front end, then select the file gold.conf from <GOLD_DIR>/examples/tutorial3 and hit
Open.
3. Hydrogen Bonding Constraints
GOLD features two types of hydrogen bonding constraints:
A standard hydrogen bond constraint can be used to force a hydrogen bond between a specific
protein atom and a specific ligand atom (see Section 3.1, page 186).
A protein hydrogen bond constraint can be used to specify that a particular protein atom
should be hydrogen-bonded to the ligand, but without specifying to which ligand atom (see
Section 3.2, page 186).
3.1 Standard Hydrogen Bond Constraints
A standard hydrogen bond constraint allows a particular ligand atom to be constrained to form a
hydrogen bond to a particular protein atom.
Click on the Edit Constraints button to bring up the Constraint Editor. Then, select H-Bond
Constraint from the list of constraint types.
When specifying a hydrogen bond constraint the ligand and protein atom numbers, as defined in
the MOL2 input files, must be entered (if PDB input files are used, specify the sequence
number).
One of the atoms must be an H-bond donor and the other should be an acceptor. The protein
atom must also be available for ligand binding (i.e. solvent accessible).
Once defined, an H-bond constraint is incorporated into the least-squares fitting routine used by
GOLD to dock the ligand. The constraint has a weight of 5 relative to a normal hydrogen bond.
Thus, the docking will be biased towards solutions which include the specified hydrogen bond.
The hydrogen bond constraint weighting can be altered within the Fitness Function section of
the GOLD parameters file by changing the value of the parameter CONSTRAINT_WT.
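The effect of a weight of 5 in a least-squares fit can be seen in the simplest case, a translation-only fit. This is a deliberately simplified sketch: GOLD's fitting routine also optimises orientation and conformation, and the function below is our own illustration of weighted least squares, not GOLD code. Minimising the sum of w_i |ligand_i + t - target_i|^2 over the translation t gives the weighted mean displacement.

```python
import numpy as np


def weighted_translation(ligand_pts: np.ndarray, target_pts: np.ndarray,
                         weights: np.ndarray) -> np.ndarray:
    """Translation t minimising sum_i w_i * |ligand_i + t - target_i|^2.

    Setting the gradient to zero gives t = sum_i w_i (target_i - ligand_i) / sum_i w_i.
    """
    w = weights[:, None]  # column vector so each weight scales one row
    return (w * (target_pts - ligand_pts)).sum(axis=0) / weights.sum()
```

With two pairs whose displacements are (1, 0, 0) for a normal H bond (weight 1) and (0, 1, 0) for a constrained one (weight 5), the fitted translation is (1/6, 5/6, 0): the constrained pair pulls the fit five times as hard.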
Select Cancel to close the Constraint Editor.
3.2 Protein Hydrogen Bond Constraints
3.2.1 General Methodology (see page 187)
3.2.2 Specifying Multiple Constraints (see page 188)
3.2.3 Defining the Protein H Bond Constraints (see page 188)
A protein hydrogen bond constraint can be used to specify that a particular protein atom should
be hydrogen-bonded to the ligand, but without specifying to which ligand atom (see Section
8.3.3, page 75).
Tutorial 3: Use of Hydrogen Bonding Constraints
1. Introduction (see page 185)
2. Input Files (see page 185)
3. Hydrogen Bonding Constraints (see page 186)
4. Running GOLD (see page 191)
5. Analysis of Output (see page 191)
1. Introduction
The design of new and more potent antiretroviral agents for the human immunodeficiency virus
(HIV) continues to be the focus of much attention. The crystal structures of HIV-1 protease in
complex with a number of cyclic urea inhibitors have been determined in order to identify the
key interactions responsible for the high potency of this class of inhibitor (see: Jadhav et al. J.
Med. Chem., (1997) 40, 181). The C2-symmetric cyclic urea scaffold is well suited to interact with the viral protease; it has been observed that these inhibitors are anchored in the active site of the protease by six key hydrogen bonds.
The object of this tutorial is to investigate the binding mode of a cyclic urea inhibitor with HIV-
1 protease, PDB entry code 1qbt. The use of hydrogen bonding constraints in order to reproduce
these key interactions will also be illustrated.
2. Input Files
Open SILVER and read in and inspect the file protein.mol2 from <GOLD_DIR>/examples/
tutorial3. The original PDB file <GOLD_DIR>/examples/tutorial3/1QBT.pdb
has also been provided should you wish to set up the protein for yourself.
HIV-1 protease, protein.mol2, has already been set up in accordance with the guidelines for the
preparation of protein input files (see Section 3., page 9).
An important feature of cyclic urea inhibitors is their ability, upon binding, to displace a
structural water molecule present within the active site of the protein. In this example, all water
molecules have been deleted from protein.mol2. However, in other complexes you may not
know whether water molecules should form mediating hydrogen bonds, or be displaced by the
ligand on binding. GOLD allows waters to switch on and off (i.e. to be bound or displaced) and
to rotate (to optimise hydrogen bonding) during docking (see Section 3.4, page 16).
The cyclic urea inhibitor has already been prepared in accordance with the requirements for
setting up the ligand (see Section 4., page 30).
Open the file ligand.mol2 from <GOLD_DIR>/examples/tutorial3 within SILVER and
inspect the structure.
A configuration file (gold.conf) has been provided for this tutorial which will automatically load
the settings and parameter values for this tutorial into the GOLD front end.
Open GOLD and click on the Configuration File button within the Control panel of the GOLD
N-oxide (see page 41)
Nitro (see page 42)
Nitrogen (anionic) (see page 42)
Nitrogen (cationic, aromatic) (see page 42)
Oxygen (anionic) (see page 43)
Phosphate (bridging) (see page 43)
Phosphate (terminal) (see page 43)
Sulphonamide (see page 44)
Sulphonate (see page 44)
Sulphone (see page 44)
Sulfoxide (sulfinyl) (see page 44)
Amidinium
Carboxylate
Enolate/phenolate oxygen
or:
Guanidinium
N-oxide
or:
or:
Metal coordination in GOLD is modelled as pseudo-hydrogen bonding. Metal-ligand
interactions will typically involve the metal binding to, for example, carboxylate ions,
deprotonated histidines (i.e. negatively charged), and phenolates. Therefore metals can be
considered to bind to H-bond acceptors and the metal will compete with H-bond donors for
interaction.
This ends the tutorial.
5.1 Protein Log File
Open the file gold_protein.log from your specified output directory (see Section 3., page 179) and inspect it using a text editor.
The gold_protein.log file will contain details of the parameterisation of the protein and the
determination of the ligand binding site. Information relating to the metal and the determination
of the coordination geometry will also be given:
Check that the coordination geometry has been correctly overruled, and that the matched geometry is tetrahedral. Further information about the contents of the gold_protein.log file is given elsewhere (see Section 14.9, page 118).
5.2 Files Containing the Protein and Docked Ligands
Open the file gold_protein.mol2 (located within your specified output directory; see Section 3., page 179) and inspect it using SILVER. The protein file now contains a number of dummy atoms representing idealised metal coordination positions. These dummy atoms will be connected to the metal ion.
At locations where GOLD is missing a coordination site (i.e. coordination points not bound to
the protein) virtual coordination points are added. These coordination points are then used as
fitting points that can bind to acceptors.
From your specified output directory identify the top-ranked solution predicted by GOLD,
ranked_ligand_m1_1.mol2 and open this file within SILVER.
Inspect how well the docked benzyl succinate inhibitor fits within the protein binding site.
The zinc (shown in blue) is coordinated to the protein via two histidine residues and a
carboxylate group. In the example shown below, the remaining zinc coordination site is used to
bind the benzyl succinate inhibitor (shown coloured in green) via interaction with a carboxylate
ion acceptor along the direction of the carbonyl oxygen lone pair:
Nitro
Nitrogen (anionic)
For example, an anionic imidazole ring would be:
Nitrogen (cationic, aromatic)
For example, the pteridine ring system in methotrexate (PDB code 4DFR) would be:
Oxygen (anionic)
For example, in a serine protease transition-state analogue this would be:
Phosphate (bridging)
Phosphate (terminal)
Click on the Add metal or Update selected metal button to add the selection to the Current Metal
Settings. Hit Done to close the Metal Selection window.
5. Running GOLD and Analysis of Output
5.1 Protein Log File (see page 183)
5.2 Files Containing the Protein and Docked Ligands (see page 183)
Click on the Run button in the Control panel of the GOLD front end. This will start the GOLD job interactively. As the job progresses, output written to the gold_ligand_m1.log file will also be displayed in the GOLD Output window. Once the job is complete, the message GA Done will appear.
Open and inspect the GOLD parameters file by clicking on Edit Parameters within the Input
Files and Parameters panel of the front end and then selecting Yes in the Copy parameter file?
window.
The parameters used by GOLD for each metal are listed; for an explanation of the parameters, refer to the comments in the gold.params file. Additional metal parameterisation can also be found within the H_BOND TABLE.
For our Zn atom GOLD will therefore attempt to match coordination geometries 4, 5 and 6
(tetrahedral, trigonal bipyramidal, and octahedral templates) onto the coordinating atoms found
in the protein. The template that gives the best match will then be used to generate coordination
fitting points.
4.2 Manually Specifying Metal Coordination Geometries
It is possible to manually specify coordination geometries for particular metal atoms. This can
be useful in allowing non-standard metal coordination geometries, or to limit the number of
possible geometries that GOLD checks (i.e. to overrule the default geometries for the
corresponding metal type defined in the gold.params file).
In this example, the zinc atom is clearly tetrahedral (the Zn is coordinated to two histidine
residues and a carboxylate group in the protein, the fourth coordination site is available to bind
to the benzyl succinate inhibitor). We can therefore instruct GOLD to match against the
tetrahedral template only when determining the coordination geometry.
Click on the Metals button in the Input Parameters and Files section of the GOLD front-end. In
the resulting Metal Selection window specify the Metal atom no. 2096 (this is the Zn atom
number as defined in the protein input MOL2 file), and select (4) tetrahedral from the list of
allowed metal coordinations:
H-bonding type  Sybyl atom type  Atom type (default or elucidated)  Donor (D), Acceptor (A) or Metal (M)  Allowed coordination geometries  Coordination distance (Å)
MGD    Mg      DEF   M   4, 6      2.05
ZND    Zn      DEF   M   4, 5, 6   2.09
MND    Mn      DEF   M   4, 6      2.06
FED    Fe      DEF   M   4, 6      1.98
CAD    Ca      DEF   M   6, 7      2.44
COBD   Co.oh   DEF   M   6         2.09
GDD    Gd      DEF   M   6         2.44
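Each metal row packs six fields into one line, with the allowed geometries given as a comma-separated list. This is a minimal sketch of splitting such a row into named fields; the class and function names are hypothetical, and it assumes the row is written exactly as in the table above (whitespace-separated, distance last). The real layout of gold.params may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MetalType:
    hbond_type: str         # e.g. ZND
    sybyl_type: str         # e.g. Zn
    atom_type: str          # default or elucidated, e.g. DEF
    role: str               # D, A or M
    geometries: List[int]   # allowed coordination geometries
    distance: float         # coordination distance


def parse_metal_row(row: str) -> MetalType:
    """Parse a row such as 'ZND Zn DEF M 4, 5, 6 2.09'."""
    fields = row.split()
    hbond_type, sybyl_type, atom_type, role = fields[:4]
    distance = float(fields[-1])
    geom_text = " ".join(fields[4:-1])  # e.g. "4, 5, 6"
    geometries = [int(g) for g in geom_text.replace(",", " ").split()]
    return MetalType(hbond_type, sybyl_type, atom_type, role, geometries, distance)
```

For the zinc row this gives geometries [4, 5, 6], the tetrahedral, trigonal bipyramidal and octahedral templates that GOLD tries for Zn.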
Sulphonamide
GOLD will treat the nitrogen atom as a planar, trigonal nitrogen, i.e. not capable of accepting a hydrogen bond. However, if the geometry read into GOLD is pyramidal, the sulphonamide nitrogen atom is now typed as N.3 rather than N.pl3 and is treated as an H-bond acceptor (i.e. it has a fitting point), allowing it to coordinate metals.
Sulphonate
Sulphone
Sulfoxide (sulfinyl)
5.6 Internal GOLD Atom Types
GOLD uses four internal atom types which are not recognised by SYBYL. These are N.plc
(nitrogen donors in a protonated delocalised system, such as a guanidinium ion), N.acid (acidic
nitrogen, e.g. in tetrazole or sulphonamide ions), S.a (sulphur acceptors) and S.m (charged
sulphur atoms). You should not really need to know about these, but all assignments of the N.plc,
N.acid, S.a and S.m atom types are logged in the gold.log file, so you can check to see if
everything is working as you would expect.
4. The Handling and Parameterisation of Metals in GOLD
4.1 Automatic Determination of Metal Coordination Geometries (see page 180)
4.2 Manually Specifying Metal Coordination Geometries (see page 181)
GOLD is able to predict binding to seven metal ions: Mg, Zn, Fe, Mn, Ca, Co and Gd.
No special instructions are needed to dock to metal ions; they will be handled automatically when present in the protein binding site.
4.1 Automatic Determination of Metal Coordination Geometries
GOLD will automatically recognise the following metal coordination geometries:
In order to determine the coordination geometry of a particular metal atom GOLD performs a
permuted superimposition of coordination geometry templates onto the coordinating atoms
found in the protein. Coordination fitting points are then generated using the template that gives
the best fit (based on RMSd).
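A permuted superimposition can be sketched as follows: for each template, try every assignment of the coordinating atoms to template vertices, superimpose by the optimal rotation (Kabsch algorithm) about the metal, and keep the lowest RMSd. This is an illustration under our own assumptions, not GOLD's code. Only three templates are included, their idealised vertex vectors are our own, directions are normalised so that metal-ligand distances are ignored, and the generation of fitting points from the unused vertices is not shown.

```python
import itertools

import numpy as np


def _unit(v) -> np.ndarray:
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v, axis=1, keepdims=True)


# Idealised unit vectors from the metal to each coordination site.
TEMPLATES = {
    "TETR": _unit([[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]),
    "TBP": _unit([[0, 0, 1], [0, 0, -1], [1, 0, 0],
                  [-0.5, np.sqrt(3) / 2, 0], [-0.5, -np.sqrt(3) / 2, 0]]),
    "OCT": _unit([[1, 0, 0], [-1, 0, 0], [0, 1, 0],
                  [0, -1, 0], [0, 0, 1], [0, 0, -1]]),
}


def kabsch_rmsd(P: np.ndarray, Q: np.ndarray) -> float:
    """RMSd between Q and P after the optimal rotation of P about the origin."""
    H = P.T @ Q
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # avoid an improper rotation
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = P @ R.T - Q
    return float(np.sqrt((diff ** 2).sum() / len(P)))


def best_template(metal, coordinating_atoms, templates=TEMPLATES):
    """Return (template name, RMSd) of the best permuted superimposition."""
    Q = _unit(np.asarray(coordinating_atoms, dtype=float) - np.asarray(metal, dtype=float))
    best = (None, np.inf)
    for name, T in templates.items():
        if len(Q) > len(T):
            continue  # template has too few sites
        for perm in itertools.permutations(range(len(T)), len(Q)):
            rmsd = kabsch_rmsd(T[list(perm)], Q)
            if rmsd < best[1]:
                best = (name, rmsd)
    return best
```

Three coordinating atoms placed on rotated tetrahedral directions around a zinc position are matched to TETR with an RMSd of essentially zero, leaving the fourth tetrahedral site free for the ligand.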
The geometry templates used for a given metal are defined in the gold.params file in the section
headed # Metals:
Template   Geometry                 Coordination number
TETR       Tetrahedral              n=4
TBP        Trigonal bipyramidal     n=5
OCT        Octahedral               n=6
CTP        Capped trigonal prism    n=7
PBP        Pentagonal bipyramidal   n=7
SQAP       Square prism             n=8
ICO        Icosahedral              n=10
DOD        Dodecahedral             n=12
The benzyl succinate inhibitor has also been set up in accordance with the guidelines for the
preparation of ligands (see Section 4.1, page 30).
Open and inspect the file ligand.mol2 from <GOLD_DIR>/examples/tutorial2.
3. The GOLD Configuration File
All of the parameters and settings required to define a particular GOLD job may be saved as a
configuration file (gold.conf) (see Section 15., page 126). This text file will include details of the
ligand, the protein binding site, the fitness-function parameter file to be used, the torsion
distribution file to be used, and the genetic algorithm parameters. Therefore there is no need to
specify protein.mol2 and ligand.mol2 input files, as these will be read in upon opening
gold.conf.
A configuration file has been provided for this tutorial. Open GOLD and click on the
Configuration File button within the Control panel of the GOLD interface, then select the file
gold.conf from <GOLD_DIR>/examples/tutorial2 and hit Open. This will
automatically load the settings and parameter values for this tutorial into the GOLD front end.
Click on the Output button within the Input Files and Parameters panel, then hit the Output Directory... button. Specify a directory to which you have write permission; this is where the GOLD output files will be written. Select Ok to close the Output preferences window.
6. Fitness Functions
6.1 Choice of Fitness Functions (see page 46)
6.2 GoldScore Fitness Function (see page 46)
6.3 Altering GoldScore Fitness-Function Parameters; the GoldScore File (see page 48)
6.4 ChemScore Fitness Function (see page 49)
6.5 Altering ChemScore Fitness-Function Parameters; the ChemScore File (see page 58)
6.6 Altering GOLD Parameters: the gold.params File (see page 59)
6.7 Kinase Scoring Function (see page 59)
6.8 Heme Scoring Function (see page 60)
6.9 Internal Energy Offset (see page 62)
6.10 User Defined Fitness Function (see page 62)
6.1 Choice of Fitness Functions
GOLD offers a choice of fitness functions: GoldScore (see Section 6.2, page 46), ChemScore
(see Section 6.4, page 49) and User Defined Score.
The GoldScore and ChemScore functions are about equally reliable although, on any given problem, one may give a good prediction and the other not.
Therefore, when screening large numbers of compounds, rescoring docking poses with
alternative scoring functions and considering the best results from each (consensus scoring) can
have a favourable impact on the overall rank ordering of ligands (see Section 13., page 106).
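Because scores from different functions are on different scales, a consensus is often built from ranks rather than raw values. This is a minimal sketch of one such scheme, which orders ligands by the best rank they achieve under any function and breaks ties by rank sum. The scheme and function names are our own illustration of "considering the best results from each", not a procedure defined by GOLD.

```python
from typing import Dict, List


def rank(scores: Dict[str, float]) -> Dict[str, int]:
    """Rank ligands by score, 1 = best (higher fitness is better)."""
    ordered = sorted(scores, key=lambda lig: scores[lig], reverse=True)
    return {lig: i + 1 for i, lig in enumerate(ordered)}


def consensus_order(*score_tables: Dict[str, float]) -> List[str]:
    """Order ligands by their best (lowest) rank under any scoring function."""
    ranks = [rank(t) for t in score_tables]
    ligands = set.intersection(*(set(t) for t in score_tables))
    return sorted(ligands, key=lambda lig: (min(r[lig] for r in ranks),
                                            sum(r[lig] for r in ranks),
                                            lig))
```

If GoldScore ranks ligands A, B, C as 1, 2, 3 and ChemScore ranks them 3, 1, 2, both A and B reach rank 1 somewhere; B comes first on the rank-sum tie-break, giving B, A, C.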
GoldScore is the original GOLD scoring function and is selected by default.
User Defined Score allows users to implement their own scoring function (or modify an existing
scoring function) by specifying a path to a dynamically loadable shared object library (see
Section 6.10, page 62).
6.2 GoldScore Fitness Function
The GOLD fitness function is made up of four components:
protein-ligand hydrogen bond energy (external H-bond)
protein-ligand van der Waals (vdw) energy (external vdw)
ligand internal vdw energy (internal vdw)
ligand torsional strain energy (internal torsion)
Optionally, a fifth component, ligand intramolecular hydrogen bond energy (internal H-bond),
may be added.
If any constraints have been specified, then an additional constraint scoring contribution S(con)
will be made to the final fitness score. Similarly, when docking covalently bound ligands a
covalent term S(cov) will be present.
Note: By default, output files will contain a single internal energy term S(int) which is the sum of the internal torsion and internal vdw terms. To write these component terms to output files you will need to edit the gold.params file (see Section 6.3, page 48) to include the following line:
VERBOSE_SCORE = 1
Empirical parameters used in the fitness function (hydrogen bond energies, atom radii and
polarisabilities, torsion potentials, hydrogen bond directionalities, etc.) are taken from the
GOLD parameter file. They can be customised by copying the file, editing the copy, and
instructing GOLD to use the edited file (see Section 6.3, page 48).
The fitness score is taken as the negative of the sum of the component energy terms, so that
larger fitness scores are better.
The external vdw score is multiplied by a factor of 1.375 when total fitness score is computed.
This is an empirical correction to encourage protein-ligand hydrophobic contact.
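The combination rule described above (negate the sum of component energies, with the external vdw term scaled by 1.375) can be written out directly. This is a minimal sketch under stated assumptions: the inputs are component energies (more negative means more favourable), the optional internal H-bond term defaults to zero, and the constraint S(con) and covalent S(cov) contributions are left out. The per-term energy definitions live in the GOLD parameter file and are not reproduced.

```python
def goldscore_fitness(ext_hbond: float, ext_vdw: float,
                      int_vdw: float, int_torsion: float,
                      int_hbond: float = 0.0) -> float:
    """Fitness = -(sum of component energies), with external vdw scaled by 1.375.

    Larger fitness is better. int_hbond is the optional ligand intramolecular
    H-bond term; it defaults to 0 (i.e. switched off).
    """
    EXT_VDW_FACTOR = 1.375  # empirical boost for protein-ligand hydrophobic contact
    return -(ext_hbond + EXT_VDW_FACTOR * ext_vdw + int_vdw + int_torsion + int_hbond)
```

For example, energies of -10.0 (external H-bond), -20.0 (external vdw), 2.0 (internal vdw) and 3.0 (torsion) give a fitness of 32.5.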
During a docking run, the fitness score may appear to get worse as the docking proceeds. This is
due to the fact that the effects of poor H-bond geometry and close nonbonded contacts are
artificially down-weighted at early stages of the docking (annealing) (see Section 10.8, page
91). Only the final fitness score (i.e. from the completed docking) has any meaning.
The fitness function has been optimised for the prediction of ligand binding positions rather than
the prediction of binding affinities, although some correlation with the latter has been found (see
Section 16.2, page 137).
6.2.1 Docking With Localised Soft Potentials: An Alternative Form for the External Van der
Waals Contribution (see page 47)
6.2.2 Bump Checking (see page 48)
6.2.1 Docking With Localised Soft Potentials: An Alternative Form for the External Van der
Waals Contribution
GoldScore uses Lennard-Jones functional forms for both the External and Internal Van der
Waals contributions to the Fitness Function. By default a 6-12 potential is applied to the Internal
Van der Waals contribution and a 4-8 potential is applied to the External Van der Waals
contribution. These defaults are defined in the gold.params file (see Section 6.3, page 48).
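The difference between the 6-12 and 4-8 forms is easiest to see with the textbook generalised Lennard-Jones (n-m) potential, written so that the minimum is -eps at r = r0. This is a minimal sketch of that standard form only; GOLD's actual radii, well depths, and split potentials are defined in gold.params and are not reproduced here.

```python
def lj_mn(r: float, r0: float = 1.0, eps: float = 1.0, m: int = 6, n: int = 12) -> float:
    """Generalised Lennard-Jones (n-m) potential with minimum -eps at r = r0.

    E(r) = eps * [ m/(n-m) * (r0/r)^n - n/(n-m) * (r0/r)^m ]
    """
    x = r0 / r
    return eps * (m / (n - m) * x ** n - n / (n - m) * x ** m)
```

At a short contact of r = 0.8 r0, the 6-12 form gives a repulsion of roughly 6.9 eps while the 4-8 form gives roughly 1.1 eps, which is why the 4-8 form is the softer choice for the external term.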
The 4-8 potential form for the External contribution is selected as being optimum for general use. However, there are cases where this potential form may be too severe in the short-contact (i.e. the clash) component. This would arise, for instance, where part of the binding site is made up of a loop which is known to move aside slightly to accommodate large ligands. In such cases it is possible to apply a softer 'Split Van der Waals Potential' to selected residues. Two alternative soft 'Split Potential' forms are parameterised in the gold.params file:
Tutorial 2: Handling of Metals in GOLD
1. Introduction (see page 178)
2. Preparation of Input Files (see page 178)
3. The GOLD Configuration File (see page 179)
4. The Handling and Parameterisation of Metals in GOLD (see page 180)
5. Running GOLD and Analysis of Output (see page 182)
1. Introduction
The object of this tutorial is to investigate the binding mode of a benzyl succinate inhibitor in
carboxypeptidase A (PDB entry code 1cbx). In this example, the benzyl succinate inhibitor is
known to coordinate to a zinc atom within the ligand binding site of the protein.
This tutorial will illustrate the requirements for setting up and running a docking in which the
protein binding site features a metal ion. Additional information will also be provided on the
handling and parameterisation of metals in GOLD.
2. Preparation of Input Files
Open SILVER and read in the file protein.mol2 from <GOLD_DIR>/examples/tutorial2. The
original PDB file <GOLD_DIR>/examples/tutorial2/1CBX.pdb has also been provided should you
wish to set up the protein for yourself (please note that the inhibitor coordinates will need to be
deleted when preparing the protein).
The carboxypeptidase A input file, protein.mol2, has already been set up in accordance with the
guidelines for the preparation of protein input files (see Section 3., page 9).
Upon inspection of protein.mol2 you should notice that parts of the protein remote from the
binding site have been deleted in order to speed up the calculation (see Section 2.1, page 163),
and that hydrogen atoms have been placed on the protein in order to ensure that the ionisation
and tautomeric states are defined unambiguously (see Section 3.2, page 10).
There are some additional requirements when preparing a protein input file which contains a
metal ion:
In the protein input file it is essential that the metal ion is coordinated to at least two protein
atoms or water molecules so that GOLD can determine the correct coordination geometry.
In the protein input file, the metal ion must not have any bonds to coordinating atoms. If these
are present in the original protein file, they must be deleted.
On closer inspection of the protein.mol2 input file you will see that the zinc atom is coordinated
to two histidine residues and a carboxylate group. All bonds to coordinating atoms have been
removed:
Using this methodology GOLD has been validated against a large number of protein-ligand
complexes taken from the PDB. Further details and the entire validation test set are available for
download (see Section 16.1.3, page 131).
This ends the tutorial.
EXTERNAL_POTENTIAL(1) = 4-8 2-4    (Form 1)
EXTERNAL_POTENTIAL(2) = 4-8 1-2    (Form 2)
The first term of each form describes long range interactions, the second term describes short
range interactions. The point of change-over is at the 4-8 potential minimum and the second
term is set such that both terms take the same value at this point. The function therefore remains
continuous and the minimum point is the same as with the default 4-8 potential.
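To see why the split form stays continuous, the sketch below uses a generic n-m Lennard-Jones expression normalised so that every variant has its minimum value, -eps, at the same distance r0. Applying the 4-8 expression beyond r0 and a softer 2-4 (Form 1) or 1-2 (Form 2) expression inside r0 then gives a curve with the same minimum and no jump. The normalisation and the names `lj_nm`, `split_potential`, `r0` and `eps` are assumptions made for illustration; they are not taken from gold.params, and GOLD's own implementation may differ.

```python
def lj_nm(r, r0, eps, n, m):
    """Generic n-m Lennard-Jones form (n > m) whose minimum, -eps, lies at r = r0."""
    x = r0 / r
    return eps / (n - m) * (m * x ** n - n * x ** m)


def split_potential(r, r0, eps, form=1):
    """Hypothetical split potential: 4-8 beyond r0, a softer form inside r0.

    form=1 uses a 2-4 short-range term and form=2 a 1-2 term, mirroring
    EXTERNAL_POTENTIAL(1) and EXTERNAL_POTENTIAL(2). Both equal -eps at r0,
    so the combined curve is continuous with its minimum unchanged.
    """
    if r >= r0:
        return lj_nm(r, r0, eps, 8, 4)  # long-range part: the default 4-8 form
    n, m = {1: (4, 2), 2: (2, 1)}[form]
    return lj_nm(r, r0, eps, n, m)  # short-range (clash) part: softer form


if __name__ == "__main__":
    r0, eps = 4.0, 1.0
    for r in (3.0, 3.2, 4.0, 5.0):
        print(r,
              round(lj_nm(r, r0, eps, 8, 4), 3),
              round(split_potential(r, r0, eps, form=1), 3),
              round(split_potential(r, r0, eps, form=2), 3))
```

In this sketch, at r = 0.8 r0 the default 4-8 form is already repulsive while both split forms are still slightly attractive, which is the softening of the clash component described above.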
To apply one of these two soft potentials to a single residue, edit the gold.conf file (see
Section 15.1, page 126) and add the following instruction:
alt_residues(form) = <residue>
Where form is the 'Split Potential' form to be applied (i.e. 1 or 2), and <residue> is the
residue to which the split potential is to be applied. e.g. specifying
alt_residues(1) = ALA148
will apply the split potential of form 1 to the residue Ala 148.
More than one residue can be specified, and both potential forms can be used in the same GOLD
run. In the example below, two residues are assigned split potentials of form 1, and one is
assigned a split potential of form 2.
alt_residues(1) = ALA148 ARG150
alt_residues(2) = ARG149
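If you generate or check gold.conf files with scripts, the alt_residues syntax shown above is easy to read back. The sketch below is a hypothetical helper, not part of GOLD: it only understands the `alt_residues(form) = <residue> ...` lines described in this section and ignores everything else in the file.

```python
import re
from collections import defaultdict

# Matches lines such as "alt_residues(1) = ALA148 ARG150"
ALT_RE = re.compile(r"^\s*alt_residues\((\d+)\)\s*=\s*(.+?)\s*$")


def parse_alt_residues(conf_text):
    """Return {form: [residues]} collected from alt_residues lines in gold.conf-style text."""
    forms = defaultdict(list)
    for line in conf_text.splitlines():
        match = ALT_RE.match(line)
        if match:
            forms[int(match.group(1))].extend(match.group(2).split())
    return dict(forms)


def residue_form(forms, residue):
    """Split-potential form assigned to a residue, or None if it keeps the default 4-8 potential."""
    for form, residues in forms.items():
        if residue in residues:
            return form
    return None


example = """alt_residues(1) = ALA148 ARG150
alt_residues(2) = ARG149
"""
forms = parse_alt_residues(example)
print(forms)  # {1: ['ALA148', 'ARG150'], 2: ['ARG149']}
```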
6.2.2 Bump Checking
Normally, a bump check is made to guard against unreasonably close contacts between ligand
and protein atoms. However, if (and only if) GoldScore is being used, you can permit n ligand
atoms to penetrate the protein by entering n in the No. of Ligand Bumps entry box, e.g.
6.3 Altering GoldScore Fitness-Function Parameters; the GoldScore File
A GoldScore parameter file, goldscore.params, is provided in the $GOLD_DIR/gold
directory.
The goldscore.params file is used by default.
Instructions on how to make use of the extended metal parameters are given elsewhere (see
Section 6.8, page 60).
To make use of the new metal parameters, either replace the default file by renaming
goldscore.p450_<csd|pdb>.params to goldscore.params, or specify one of the p450 parameter
files via the GOLD interface or in the gold.conf file.
6.4 ChemScore Fitness Function
6.4.1 Introduction to ChemScore (see page 49)
6.4.2 Block Functions in ChemScore (see page 50)
6.4.3 Hydrogen-Bond Terms (see page 52)
6.4.4 Metal-Binding and Lipophilic Terms (see page 54)
6.4.5 Rotatable-Bond Freezing Term (see page 56)
6.4.6 Clash Penalty and Internal Torsion Terms (see page 56)
6.4.7 Covalent Term (see page 58)
6.4.8 Constraint Terms (see page 58)
6.4.1 Introduction to ChemScore
The ChemScore scoring function is published in:
M. D. Eldridge, C. W. Murray, T. R. Auton, G. V. Paolini and R. P. Mee, J. Comput.-Aided Mol. Des., 11, 425-445 (1997).
C. A. Baxter, C. W. Murray, D. E. Clark, D. R. Westhead and M. D. Eldridge, Proteins, 33, 367-382 (1998).
ChemScore was derived empirically from a set of 82 protein-ligand complexes for which
measured binding affinities were available.
Unlike GoldScore, the ChemScore function was trained by regression against measured affinity
data, although there is no clear indication that it is superior to GoldScore in predicting affinities.
ChemScore estimates the total free energy change that occurs on ligand binding as:

$$\Delta G_{\mathrm{binding}} = \Delta G_0 + \Delta G_{\mathrm{hbond}} + \Delta G_{\mathrm{metal}} + \Delta G_{\mathrm{lipo}} + \Delta G_{\mathrm{rot}}$$

Each component of this equation is the product of a term dependent on the magnitude of a
particular physical contribution to free energy (e.g. hydrogen bonding) and a scale factor
determined by regression, i.e.
11.3 Files Containing The Docked Ligand (gold_soln_ligand_m#_n.mol2)
The N-phosphonacetyl-L-aspartate ligand will have been docked a number of times, so a set of
files will have been produced, each containing the results of a separate docking attempt.
The result of each docking attempt is written out as gold_soln_ligand_m1_n.mol2, where n is the
number of the docking solution (1, 2, 3, ...) and m1 is an index to the ligand (in this example, only
one ligand was docked).
Note that the file gold_soln_ligand_m1_1.mol2 is not the best GOLD prediction, it is just the
solution found in the first docking attempt. However, as GOLD proceeds, symbolic links are
created: ranked_ligand_m1_1.mol2 will point to the current top-ranked solution,
ranked_ligand_m1_2.mol2 will point to the second-best solution, and so on.
Open and inspect the top ranked solution predicted by GOLD within your visualisation software
package.
A simple test of the effectiveness of a docking program is to take a protein-ligand complex from
the PDB and extract the ligand. The docking program can then be used to predict the binding
mode of the ligand and a comparison made with the crystallographically observed position. The
crystallographically observed conformation of the docked N-phosphonacetyl-L-aspartate ligand
is provided. Open the file ligand_reference.mol2 from <GOLD_DIR>/examples/tutorial1 and
compare this with the solution predicted by GOLD.
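The comparison is normally quantified as a root mean square deviation (rmsd) over matching ligand atoms, the same measure GOLD reports between docked poses. A minimal sketch, assuming both structures are already in the protein's coordinate frame, list their atoms in the same order, and have been read from the MOL2 files by some other means (file parsing and molecular symmetry are not handled here):

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally ordered lists of (x, y, z) tuples.

    No superposition is performed: the docked and reference ligands are assumed
    to share the protein's coordinate frame and the same atom order.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Two-atom toy example (hypothetical coordinates): every atom shifted by 1 Angstrom in z
reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
docked = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0)]
print(rmsd(reference, docked))  # 1.0
```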
In the figure below the crystallographically observed reference structure ligand_reference.mol2
(shown in green) is compared with the top-ranked solution predicted by GOLD (shown coloured
by element):
fitness score, while the solution found for docking attempt number 1 has the worst fitness:
11.2 Fitness Function Rankings Files (ligand_m1.rnk and bestranking.lst)
Open and inspect the file ligand_m1.rnk in a text editor. This file contains a summary of the
fitness scores for all the docking attempts on the N-phosphonacetyl-L-aspartate ligand.
The docking attempts are listed according to fitness score, so the best solution is placed first.
The file gives total fitness scores and a breakdown of the fitness into its constituent energy
terms. For GoldScore, these are the two vdw energy terms (protein-ligand and internal ligand),
an internal ligand torsion term, and two hydrogen-bonding terms (protein-ligand and ligand
intramolecular). For example:
A file called bestranking.lst is written when running dockings with multiple ligands. This gives a
continuous summary of the best solution that has been obtained for each completed ligand. The
file gives total fitness scores and a breakdown of the fitness into its constituent energy terms.
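$$\begin{aligned}
\Delta G_0 &= v_0 \\
\Delta G_{\mathrm{hbond}} &= v_1 P_{\mathrm{hbond}} \\
\Delta G_{\mathrm{metal}} &= v_2 P_{\mathrm{metal}} \\
\Delta G_{\mathrm{lipo}} &= v_3 P_{\mathrm{lipo}} \\
\Delta G_{\mathrm{rot}} &= v_4 P_{\mathrm{rot}}
\end{aligned}$$

Here, the v terms are the regression coefficients and the P terms represent the various types of
physical contributions to binding.
The final ChemScore value is obtained by adding in a clash penalty and internal torsion terms,
which militate against close contacts in docking and poor internal conformations. Covalent and
constraint scores may also be included.

$$\mathrm{ChemScore} = \Delta G_{\mathrm{binding}} + P_{\mathrm{clash}} + c_{\mathrm{internal}} P_{\mathrm{internal}} + \left(c_{\mathrm{covalent}} P_{\mathrm{covalent}} + P_{\mathrm{constraint}}\right)$$

6.4.2 Block Functions in ChemScore
ChemScore uses block functions throughout its implementation to describe contact terms of
various types.
A block function is of the following form:

$$B(x, x_{\mathrm{ideal}}, x_{\mathrm{max}}) = \begin{cases} 1 & \text{if } x \le x_{\mathrm{ideal}} \\ 1.0 - \dfrac{x - x_{\mathrm{ideal}}}{x_{\mathrm{max}} - x_{\mathrm{ideal}}} & \text{if } x_{\mathrm{ideal}} < x \le x_{\mathrm{max}} \\ 0 & \text{if } x > x_{\mathrm{max}} \end{cases}$$

This functional form looks like:
[Figure: the block function B plotted against x]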
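The block function can be transcribed directly into Python, as a minimal sketch for experimenting with parameter values. The example deviations use the defaults DELTA_R_IDEAL = 0.25 and DELTA_R_MAX = 0.65 from the H-bond distance table in Section 6.4.3; the function name `block` is hypothetical.

```python
def block(x, x_ideal, x_max):
    """Block function B(x, x_ideal, x_max): 1 up to x_ideal, linear ramp to 0 at x_max."""
    if x <= x_ideal:
        return 1.0
    if x <= x_max:
        return 1.0 - (x - x_ideal) / (x_max - x_ideal)
    return 0.0


# Deviations of an H...A distance from its ideal value (Angstrom)
for dr in (0.10, 0.25, 0.45, 0.65, 0.80):
    print(f"dr = {dr:.2f}  B = {block(dr, 0.25, 0.65):.2f}")
```

Deviations inside the tolerance window score fully, the score falls linearly across the window between x_ideal and x_max, and anything beyond x_max contributes nothing.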
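In the GOLD implementation of ChemScore, the block function is sometimes convolved with a
Gaussian function:

$$B'(x, x_{\mathrm{ideal}}, x_{\mathrm{max}}, \sigma) = \frac{\int_{-\infty}^{+\infty} B(x + u, x_{\mathrm{ideal}}, x_{\mathrm{max}})\, g(u, \sigma)\, du}{\int_{-\infty}^{+\infty} g(u, \sigma)\, du}, \qquad g(u, \sigma) = e^{-u^2 / 2\sigma^2}$$

The effect is to smooth the function, e.g.:
[Figure: the smoothed block function plotted against x, with x_ideal and x_max marked and values running from 1.0 down to 0.0]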
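The convolution is easiest to understand numerically. The sketch below evaluates B' with the trapezoidal rule, truncating both integrals at ±6σ; the truncation, the step count and the function names are choices made for this illustration, not values used by GOLD.

```python
import math


def block(x, x_ideal, x_max):
    """Block function B(x, x_ideal, x_max)."""
    if x <= x_ideal:
        return 1.0
    if x <= x_max:
        return 1.0 - (x - x_ideal) / (x_max - x_ideal)
    return 0.0


def smoothed_block(x, x_ideal, x_max, sigma, n_steps=2001):
    """Numerical B'(x, x_ideal, x_max, sigma): B averaged with Gaussian weights g(u, sigma).

    The integrals are truncated at +/- 6 sigma and evaluated with the trapezoidal rule.
    """
    if sigma <= 0.0:
        return block(x, x_ideal, x_max)  # no smoothing
    lo, hi = -6.0 * sigma, 6.0 * sigma
    step = (hi - lo) / (n_steps - 1)
    num = den = 0.0
    for i in range(n_steps):
        u = lo + i * step
        w = math.exp(-u * u / (2.0 * sigma * sigma))  # g(u, sigma)
        if i in (0, n_steps - 1):
            w *= 0.5  # trapezoidal end-point weight
        num += block(x + u, x_ideal, x_max) * w
        den += w
    return num / den  # the common step size cancels between numerator and denominator


for x in (0.0, 0.25, 0.45, 0.65, 0.9):
    print(x, round(smoothed_block(x, 0.25, 0.65, 0.1), 4))
```

Compared with the plain block function, the corners at x_ideal and x_max are rounded off, so a small change in geometry always produces a small change in score.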
11.1 The Ligand Log File (gold_ligand_m1.log)
Ten docking runs have been set up for this ligand and, for each of these docking runs, the
progress of the genetic algorithm is displayed in the GOLD Output window. This information is
also recorded in the ligand log file gold_ligand_m1.log (where m1 is the index of the ligand in
the input file).
Open and inspect gold_ligand_m1.log using a text editor (a section of an example ligand log file
is shown):
Following the completion of all docking runs on the ligand, the results from the different runs
are compared. The end of the gold_ligand_m1.log file will include a matrix of root mean square
deviations (rmsd) between the various docked ligand positions (see Section 14.10.2, page 120).
A clustering report is also given which can be used to identify different binding modes (see
Section 14.10.3, page 122). It is possible that fewer than the specified ten dockings were
completed due to the Allow early termination option being selected (see Section 5., page 167). In
the example output shown below, the solution found for docking attempt number 2 has the best
Any error or warning messages produced will be displayed in a separate GA Program Error
Message window (this might normally contain a number of warning messages relating to the
GOLD atom type assigner). These messages can be safely ignored.
Once the job is complete the message GA Done will appear in the GOLD Output window. The
output displayed is also written to the ligand.log file but can be saved under a different filename
by selecting the Save Output button.
Dismiss the GOLD Output window by clicking on the Dismiss button.
11. Analysis of Output
11.1 The Ligand Log File (gold_ligand_m1.log) (see page 174)
11.2 Fitness Function Rankings Files (ligand_m1.rnk and bestranking.lst) (see page 175)
11.3 Files Containing The Docked Ligand (gold_soln_ligand_m#_n.mol2) (see page 176)
The specified output directory (see Section 9., page 170) will contain a number of files
including:
Files containing the initialised protein and ligand (gold_protein.mol2 and gold_ligand.mol2)
Files containing the docked ligand (gold_soln_ligand_m1_n.mol2)
Files containing fitness function rankings (ligand_m1.rnk and bestranking.lst)
Protein and ligand log files (gold_protein.log and gold_ligand_m1.log)
Files containing error messages (gold.err); this file will be empty if no errors are found.
Some of these output files will be dealt with in detail below. Further information on the content
of all these output files is available (see Section 14., page 109).
6.4.3 Hydrogen-Bond Terms
The hydrogen-bond term is computed as a sum over all possible donor-acceptor pairs, such that
one atom belongs to the protein and the other to the ligand.
Each term in the summation is the product of three Gaussian-smoothed block functions (see
Section 6.4.2, page 50). The purpose of the block functions is to reduce the contribution of a
hydrogen bond according to how much its geometry deviates from (a) ideal H...A distance, (b)
ideal D-H...A angle and (c) ideal directionality with respect to the acceptor atom. The maximum
contribution of a given donor-acceptor pair to the summation is 1; this will occur if the pair forms
a hydrogen bond of ideal geometry.
$$P_{\mathrm{hbond}} = \sum_{\text{all donor-acceptor pairs}} B'(\Delta r, \Delta r_{\mathrm{ideal}}, \Delta r_{\mathrm{max}}, \sigma_r)\; B'(\Delta\alpha, \Delta\alpha_{\mathrm{ideal}}, \Delta\alpha_{\mathrm{max}}, \sigma_\alpha)\; B'^{*}(\Delta\beta, \Delta\beta_{\mathrm{ideal}}, \Delta\beta_{\mathrm{max}}, \sigma_\beta)$$

The tables below describe the various parameters in this equation, their meanings, and what they
are called in the ChemScore parameter file (see Section 6.5, page 58).

D-H..A distance parameters (D = Donor, A = Acceptor)

Term | Meaning | Name in ChemScore file | Default value
--- | --- | --- | ---
$r$ | The ideal hydrogen..acceptor (H...A) distance (in Å) | R_IDEAL | 1.85
$\Delta r$ | The absolute deviation of the actual H..A separation from $r$ | Calculated for each H-bond | -
$\Delta r_{\mathrm{ideal}}$ | The tolerance window around the H..A distance, $r$, within which the H-bond is regarded as ideal | DELTA_R_IDEAL | 0.25
$\Delta r_{\mathrm{max}}$ | The maximum possible deviation from the ideal distance; above this, the interaction is not regarded as an H-bond | DELTA_R_MAX | 0.65
$\sigma_r$ | The Gaussian smearing sigma associated with this term | HBOND_R_SIGMA | 0.1
D-H..A angle parameters (D = Donor, A = Acceptor)

Term | Meaning | Name in ChemScore file | Default value
--- | --- | --- | ---
$\alpha$ | The ideal D-H..A angle (in degrees) | ALPHA_IDEAL | 180.0
$\Delta\alpha$ | The absolute deviation of the actual D-H..A angle from $\alpha$ | Calculated for each H-bond | -
$\Delta\alpha_{\mathrm{ideal}}$ | The tolerance window around the D-H..A angle, $\alpha$, within which the H-bond is regarded as ideal | DELTA_ALPHA_IDEAL | 30.0
$\Delta\alpha_{\mathrm{max}}$ | The maximum possible deviation from the ideal D-H..A angle; above this, the interaction is not regarded as an H-bond | DELTA_ALPHA_MAX | 80.0
$\sigma_\alpha$ | The Gaussian smearing sigma associated with this term | HBOND_ALPHA_SIGMA | 10.0
D-H..A-X acceptor-centred angle parameters (D = Donor, A = Acceptor, X = Heavy atom attached to A)

Term | Meaning | Name in ChemScore file | Default value
--- | --- | --- | ---
$\beta$ | The ideal H..A-X angle (in degrees) | BETA_IDEAL | 180.0
Filter out all solutions with fitness scores lower than a specified value
By default the Keep all solutions option from the Selecting Docked Solutions panel in the Output
preferences window should be selected:
Select Done to close the Output preferences window.
10. Running GOLD
The main Control panel of the GOLD front end contains a number of options, including:
The Run button, which will start a GOLD job, and display the output to the screen until
completion of the job.
Save&Exit which will save all the settings defined in the GOLD front end in a configuration
file (gold.conf) and then close the front end. The configuration file includes details of the
ligand, the protein binding site, the fitness-function parameter file to be used, the torsion
distribution file to be used, and the genetic algorithm parameters (see Section 15., page 126).
Submit&Exit which will start a GOLD run in the background (and also save a configuration
file), then close the front end.
The Configuration File button which enables the settings from a previously saved
configuration file to be opened. This will automatically load the saved parameter values into
the front end (see Section 15., page 126).
Click on the Run button in the GOLD front end.
As the job progresses output will be displayed in a GOLD Output window:
Ensure that the Save rnk files and Save solution log files check boxes are switched on; this will
instruct GOLD to retain output files listing fitness-function rankings and ligand log files. The
content of these files is discussed later (see Section 11., page 173).
By default, docking solutions will be written out in the same format as was used for input (i.e.
MOL2 format); ensure that the Same as input output file format option is selected.
Click on the Output Directory... button and specify a directory to which you have write
permission; this is where the GOLD output files will be written.
It is possible to write additional information to docked solution files. This information is written
to SD file tags; for MOL2 files, these tags are written to comment blocks. This information is
particularly important for post-processing docking results with SILVER. For the purpose of this
tutorial the Information in File settings can be left at their default settings.
GOLD can produce a large amount of output. However, it is possible to cut this down by
applying output filter options. These options can be used to:
Specify that all docking solutions are saved
Retain only the n best docking solutions
Save the top-ranked solution for the best m ligands only
Term | Meaning | Name in ChemScore file | Default value
--- | --- | --- | ---
$\Delta\beta$ | The absolute deviation of the actual H..A-X angle from $\beta$ | Calculated for each H-bond | -
$\Delta\beta_{\mathrm{ideal}}$ | The tolerance window around the H..A-X angle, $\beta$, within which the H-bond is regarded as ideal | DELTA_BETA_IDEAL | 70.0
$\Delta\beta_{\mathrm{max}}$ | The maximum possible deviation from the ideal H..A-X angle; above this, the interaction is not regarded as an H-bond | DELTA_BETA_MAX | 80.0
$\sigma_\beta$ | The Gaussian smearing sigma associated with this term | HBOND_BETA_SIGMA | 10.0

The third block function in the H-bond equation, $B'^{*}$, is the product of all possible values for a
given hydrogen bond. For example, a tertiary amine acceptor has three covalently bound atoms that
could be deemed to be the X atom: in this case, the term added for an H-bond to the amine is the
product of the block-function values for all three possible H..A-X angles.
Hydrogen bonds have a regression coefficient associated with them, $v_1$ (see Section 6.4.1, page
49). By default, this is set to 3.34. The name of this coefficient in the ChemScore parameter file
(see Section 6.5, page 58) is HBOND_COEFFICIENT.
6.4.4 Metal-Binding and Lipophilic Terms
The metal-binding term in ChemScore is computed as a sum over all possible metal-ion...acceptor
pairs, where the acceptor is an atom in the ligand that is capable of binding to a metal.
Each term in the summation is a Gaussian-smoothed block function (see Section 6.4.2, page 50)
whose purpose is to reduce the contribution of the metal-acceptor interaction if the geometry is
not ideal.

$$P_{\mathrm{metal}} = \sum_{\text{all protein metals}} \; \sum_{\text{all ligand acceptors}} B'(r_{aM}, R_{\mathrm{ideal}}, R_{\mathrm{max}}, \sigma_{\mathrm{metal}})$$

The table below describes the various parameters in this equation, their meanings, and what they
are called in the ChemScore parameter file (see Section 6.5, page 58).
Metal-binding parameters in ChemScore

Term | Meaning | Name in ChemScore file | Default value
--- | --- | --- | ---
$r_{aM}$ | The actual acceptor-metal distance (in Å) | Calculated for each acceptor-metal pair | -
$R_{\mathrm{ideal}}$ | The ideal acceptor-metal distance | METAL_R1 | 2.6
$R_{\mathrm{max}}$ | The maximum acceptor-metal distance to be considered a binding interaction | METAL_R2 | 3.0
$\sigma_{\mathrm{metal}}$ | The Gaussian smearing sigma associated with this term | METAL_R_SIGMA | 0.1

The metal-binding term has a regression coefficient associated with it, $v_2$ (see Section 6.4.1,
page 49). By default, this is set to 6.03. The name of this coefficient in the ChemScore
parameter file (see Section 6.5, page 58) is METAL_COEFFICIENT.
The lipophilic term is defined in a similar way:

$$P_{\mathrm{lipo}} = \sum_{\text{all protein lipophilic atoms}} \; \sum_{\text{all ligand lipophilic atoms}} B'(r_{ll}, R_{\mathrm{ideal}}, R_{\mathrm{max}}, \sigma_{\mathrm{lipo}})$$

The table below describes the various parameters in this equation, their meanings, and what they
are called in the ChemScore parameter file (see Section 6.5, page 58).

Lipophilic parameters in ChemScore

Term | Meaning | Name in ChemScore file | Default value
--- | --- | --- | ---
$r_{ll}$ | The actual distance between the pair of lipophilic atoms (in Å) | Calculated for each atom-atom pair | -
$R_{\mathrm{ideal}}$ | The ideal atom...atom distance separation | LIPO_R1 | 4.1
$R_{\mathrm{max}}$ | The maximum separation, beyond which no interaction is deemed to occur | LIPO_R2 | 7.1
$\sigma_{\mathrm{lipo}}$ | The Gaussian smearing sigma associated with this term | LIPO_R_SIGMA | 0.1
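The contact terms of Sections 6.4.3 and 6.4.4 can be sketched by combining the block function, its Gaussian-smoothed form and the default values tabulated above. This is an illustration under stated assumptions, not GOLD's implementation: the smoothing integral is evaluated numerically, the acceptor-angle factor is taken as a product over all X atoms (following the tertiary-amine example above), the ΔG_0, clash, torsion and rotatable-bond terms are omitted, and all function names are hypothetical rather than part of GOLD or its Scoring Function API.

```python
import math

# Default regression coefficients quoted in Sections 6.4.3 and 6.4.4
V1_HBOND, V2_METAL, V3_LIPO = 3.34, 6.03, 0.117


def block(x, x_ideal, x_max):
    """Block function: 1 up to x_ideal, linear ramp down to 0 at x_max."""
    if x <= x_ideal:
        return 1.0
    if x <= x_max:
        return 1.0 - (x - x_ideal) / (x_max - x_ideal)
    return 0.0


def smoothed_block(x, x_ideal, x_max, sigma, n_steps=1201):
    """Gaussian-smoothed block function B', integrated numerically over +/- 6 sigma."""
    lo = -6.0 * sigma
    step = 12.0 * sigma / (n_steps - 1)
    num = den = 0.0
    for i in range(n_steps):
        u = lo + i * step
        w = math.exp(-u * u / (2.0 * sigma * sigma))
        if i in (0, n_steps - 1):
            w *= 0.5  # trapezoidal end-point weight
        num += block(x + u, x_ideal, x_max) * w
        den += w
    return num / den


def hbond_pair(h_a_dist, d_h_a_angle, h_a_x_angles):
    """One donor-acceptor pair: product of smoothed block functions (at most 1.0)."""
    term = smoothed_block(abs(h_a_dist - 1.85), 0.25, 0.65, 0.1)        # H..A distance
    term *= smoothed_block(abs(d_h_a_angle - 180.0), 30.0, 80.0, 10.0)  # D-H..A angle
    for beta in h_a_x_angles:  # B'*: one factor per heavy atom X bonded to A
        term *= smoothed_block(abs(beta - 180.0), 70.0, 80.0, 10.0)
    return term


def metal_pair(r_am):
    return smoothed_block(r_am, 2.6, 3.0, 0.1)  # METAL_R1, METAL_R2, METAL_R_SIGMA


def lipo_pair(r_ll):
    return smoothed_block(r_ll, 4.1, 7.1, 0.1)  # LIPO_R1, LIPO_R2, LIPO_R_SIGMA


def contact_terms(hbonds, metal_dists, lipo_dists):
    """Return (dG_hbond, dG_metal, dG_lipo), each a coefficient times its P term."""
    p_hbond = sum(hbond_pair(*h) for h in hbonds)
    p_metal = sum(metal_pair(r) for r in metal_dists)
    p_lipo = sum(lipo_pair(r) for r in lipo_dists)
    return V1_HBOND * p_hbond, V2_METAL * p_metal, V3_LIPO * p_lipo


# Hypothetical geometry: one H-bond, one metal contact, three lipophilic pairs
print(contact_terms([(1.9, 170.0, [150.0])], [2.2], [4.0, 6.5, 9.0]))
```

Each function mirrors one row of the equations above, so changing a default value in the sketch is a quick way to see how a parameter in the ChemScore file would reshape that term.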
are shown):
Care should be taken when altering these parameter settings, and it is recommended that you use
one of the predefined parameter sets offered. Alternatively, GOLD can decide on the optimal
settings to use for a given ligand (see Section 11.3, page 94).
To enable automatic GA settings, click on the Select GA Presets and Automatic Settings button in
the Genetic Algorithm Parameters panel (or hit Settings in the Control panel) then, in the
Settings selector window, click on Use automatic settings. Ensure the Search efficiency is set to
100%, then hit Done.
The criteria used by GOLD to determine the optimal GA parameter settings for a given ligand
include: the number of rotatable bonds in the ligand, ligand flexibility, i.e. number of flexible
ring corners, flippable nitrogens, etc., the volume of the protein binding site, and the number of
water molecules considered during docking. Details of the exact settings used will be given in
the ligand log file gold_ligand_m1.log (see Section 14.10, page 118).
9. Setting Output Preferences
Select the Output... button in the GOLD front end to open the Output preferences window:
GoldScore is the original GOLD scoring function and is made up of four components:
protein-ligand hydrogen bond energy (external H-bond)
protein-ligand van der Waals (vdw) energy (external vdw)
ligand internal vdw energy (internal vdw)
ligand torsional strain energy (internal torsion)
It is possible to alter the empirical parameters used in the fitness function (hydrogen bond
energies, atom radii and polarisabilities, torsion potentials, hydrogen bond directionalities, etc.)
within the GOLD parameters file. The default GOLD parameters file (gold.params) can be
found in:
UNIX: $GOLD_DIR/gold.params
Windows: <InstallDir>/GOLD/gold.params
where <InstallDir> is usually C:/Program Files/CCDC
For the purpose of this tutorial ensure that the Parameter File entry box in the Input Parameters
and Files section of the GOLD front end is set to gold.params, or DEFAULT when used for the
first time.
Torsion angle distributions, extracted from the Cambridge Structural Database (CSD), can be
used to restrict the ligand conformational space sampled by the genetic algorithm. Using torsion
angle distributions in this way may improve the chances of GOLD finding the correct answer by
biasing the search towards ligand torsion-angle values that are commonly observed in crystal
structures. It may also improve convergence and so make GOLD usable with faster settings (see
Section 9.1, page 83).
By default the use of torsion angle distributions should be enabled. Click on the Fitness & Search
options button in the GOLD front end. In the resulting window ensure the check box labelled
Use torsion angle distributions from the CSD is switched on.
8. Genetic Algorithm Parameter Settings
GOLD optimises the fitness score using a genetic algorithm (GA) (see Section 10., page 89).
A number of parameters control the precise operation of the genetic algorithm. Genetic
algorithm parameter settings can be specified in the GOLD front end (standard default settings
The difference between the metal and lipophilic parameterisation is that the lipophilic term is
scored over a much longer range.
Lipophilic atoms are defined as non-accepting sulphurs, non-polar carbon atoms (polar carbon
atoms are carbon atoms attached to two or more polar atoms), and non-ionic chlorine, bromine
and iodine atoms.
The lipophilic term has a regression coefficient associated with it, $v_3$ (see Section 6.4.1, page
49). By default, this is set to 0.117. The name of this coefficient in the ChemScore parameter
file (see Section 6.5, page 58) is LIPO_COEFFICIENT.
6.4.5 Rotatable-Bond Freezing Term
The following formula is used to estimate the entropic loss that occurs when single, acyclic
bonds in the ligand become non-rotatable upon binding:

$$P_{\mathrm{rot}} = 1 + \left(1 - \frac{1}{N_{\mathrm{rot}}}\right) \sum_{r} \frac{P_{nl}(r) + P'_{nl}(r)}{2}$$

$N_{\mathrm{rot}}$ is the number of frozen rotatable bonds in the ligand (a bond is considered frozen if one or
more atoms on both sides of the rotatable bond are in contact with the protein). The expression is
deemed to have a value of zero if there are no rotatable bonds in the ligand.
$P_{nl}(r)$ and $P'_{nl}(r)$ are the percentages of non-hydrogen atoms on either side of the rotatable bond
that are not lipophilic. For example, if there are 10 non-hydrogen atoms on one side of the bond,
of which 3 are not lipophilic, and there are 20 non-hydrogen atoms on the other side, of which 2
are not lipophilic, then $P_{nl}(r)$ and $P'_{nl}(r)$ are 30% and 10%, respectively.
The regression coefficient associated with this term, $v_4$ (see Section 6.4.1, page 49), has the
default value 2.56. The name of this coefficient in the ChemScore parameter file (see Section
6.5, page 58) is ROT_COEFFICIENT.
6.4.6 Clash Penalty and Internal Torsion Terms
Clashes between protein and ligand atoms and ligand internal torsional strain are accommodated
by penalty terms.
These terms are included to prevent poor geometries in docking.
The clash penalty terms in ChemScore depend on the nature of the contact, i.e. whether it is a
hydrogen-bonding contact, a metal-binding contact or neither of these.
Any hydrogen bond with an H...A distance shorter than $r_{\mathrm{hbond}}$ contributes a clash term of:
$$P^{\mathrm{hbond}}_{\mathrm{clash}} = 20.0\,\Delta G_{\mathrm{hbond}} \left(\frac{r_{\mathrm{hbond}} - r}{r_{\mathrm{hbond}}}\right)$$

The value of $r_{\mathrm{hbond}}$ (default = 1.6) can be changed by altering the parameter
CLASH_RADIUS_HBOND in the ChemScore file (see Section 6.5, page 58).
Any metal coordination contact shorter than $r_{\mathrm{metal}}$ contributes a clash term of:

$$P^{\mathrm{metal}}_{\mathrm{clash}} = 20.0\,\Delta G_{\mathrm{metal}} \left(\frac{r_{\mathrm{metal}} - r}{r_{\mathrm{metal}}}\right)$$

The value of $r_{\mathrm{metal}}$ (default = 1.3*) can be changed by altering the parameter
CLASH_RADIUS_METAL in the ChemScore file (see Section 6.5, page 58).
All other ligand-protein interatomic contacts contribute clash terms of the following form:

$$P^{\mathrm{other}}_{\mathrm{clash}} = 1.0 + 4.0 \left(\frac{r_{\mathrm{clash}} - r}{r_{\mathrm{clash}}}\right)$$

$r_{\mathrm{clash}}$ varies with contact type: for contacts to protein sulphur atoms, it is set to 3.35; for all
other contacts, it is set to 3.10. These settings correspond to the parameters
CLASH_RADIUS_SULPHUR and CLASH_RADIUS_GENERAL in the ChemScore file (see
Section 6.5, page 58).
Internal ligand strain is accommodated by clash terms in combination with torsional strain terms
of the form:

$$P_{\mathrm{internal}} = \sum_{\text{all rotatable bonds}} A\left[1 - \cos(n\theta - \theta_0)\right]$$
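The rotatable-bond freezing term of Section 6.4.5 is easy to check by hand. The sketch below reproduces the worked example there (30% and 10% non-lipophilic heavy atoms) and scales the result by the default ROT_COEFFICIENT, 2.56. It assumes that the percentages enter the formula as fractions (30% as 0.3) and that the sum runs over the frozen bonds; the function names are hypothetical.

```python
def rot_freezing_term(frozen_bonds):
    """P_rot for a list of frozen rotatable bonds.

    frozen_bonds: one (p_nl, p_nl_prime) pair per frozen bond, each the fraction
    (0-1) of non-hydrogen atoms on that side of the bond that are not lipophilic.
    """
    n_rot = len(frozen_bonds)
    if n_rot == 0:
        return 0.0  # defined as zero when there are no rotatable bonds
    total = sum((p + q) / 2.0 for p, q in frozen_bonds)
    return 1.0 + (1.0 - 1.0 / n_rot) * total


def non_lipophilic_fraction(n_not_lipophilic, n_heavy):
    """Fraction of heavy atoms on one side of a bond that are not lipophilic."""
    return n_not_lipophilic / n_heavy


V4_ROT = 2.56  # default ROT_COEFFICIENT

# Worked example: 3 of 10 and 2 of 20 heavy atoms are not lipophilic
bond = (non_lipophilic_fraction(3, 10), non_lipophilic_fraction(2, 20))
for n_frozen in (1, 2):
    p_rot = rot_freezing_term([bond] * n_frozen)
    print(n_frozen, round(p_rot, 3), round(V4_ROT * p_rot, 3))
```

With a single frozen bond the factor (1 - 1/N_rot) vanishes, so P_rot is exactly 1 regardless of the lipophilicity fractions; the fractions only start to matter once two or more bonds are frozen.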
The orthogonal x, y, z coordinates of a solvent-accessible point approximately at the centre of the
active site should be entered. The binding site in 1acm has already been centred on the origin,
so in this case the coordinates can be left as 0.0, 0.0, 0.0.
The approximate radius of the binding site must also be specified. By default the binding site
radius is set to 10.0 Å; ensure that this is the case. This radius should be large enough to contain
any possible binding mode of the N-phosphonacetyl-L-aspartate ligand.
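Conceptually, the point-and-radius definition selects the protein region within 10.0 Å of the chosen centre. The sketch below shows only that distance test, with made-up atom names and coordinates; the LIGSITE cavity detection that GOLD applies afterwards, described next, is not modelled.

```python
import math


def atoms_within_radius(atoms, centre=(0.0, 0.0, 0.0), radius=10.0):
    """Return the names of atoms lying within `radius` (Angstrom) of `centre`.

    atoms: iterable of (name, (x, y, z)) tuples.
    """
    return [name for name, xyz in atoms if math.dist(xyz, centre) <= radius]


# Hypothetical atoms: one 3.0 A from the origin, one about 13.9 A away
protein_atoms = [("atom_1", (1.0, 2.0, 2.0)), ("atom_2", (8.0, 8.0, 8.0))]
print(atoms_within_radius(protein_atoms))  # ['atom_1']
```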
A cavity detection algorithm, LIGSITE, is used to restrict the region of interest to concave,
solvent-accessible surfaces. Ensure that cavity detection is enabled by switching on the button
labelled Detect Cavity:
7. Fitness Function and Search Settings
During a docking run the solutions found by GOLD are scored according to a fitness function
(see Section 6., page 46).
GOLD offers a choice of three fitness functions, GoldScore (see Section 6.2, page 46),
ChemScore (see Section 6.4, page 49) and User Defined Score (see Section 6.10, page 62).
The User Defined Score allows you to modify existing scoring functions, or to implement a
completely new scoring function using an Applications Programming Interface (API). A good
knowledge of the C programming language is required together with some experience in using
GOLD. Full documentation for the GOLD Scoring Function API is provided:
UNIX: $GOLD_DIR/gold/api_doc/index.html
Windows: <InstallDir>/GOLD/gold/api_doc/index.html
where <InstallDir> is usually C:/Program Files/CCDC
Ensure that the default GoldScore scoring function is selected within the Fitness Function and
Search Settings panel of the GOLD front end (see Section 6., page 46):
Add single ligands
Select a complete directory of ligand files.
Specify a single file containing several ligands (i.e. a multi-MOL2 or SD file).
Click on the Filename button and select ligand.mol2 from <GOLD_DIR>/examples/tutorial1.
The number of dockings to be performed on each ligand is specified by entering a value for the
No. of GA runs. By default this is set to ten; if not, set the number of docking runs to ten.
Click on Add file or Update selected file; the filename of the selected ligand and the number of
dockings are now displayed in the Current Ligand File Selection list. Hit Done to close the
Ligand selection for docking run window.
5. Input Parameters and Files Settings
The specified protein input file should be displayed within the Input Parameters and Files panel
of the GOLD front end, and the Ligands Count should be displayed as 1.
By default the Set atom types check button for the Ligand only should be switched on in the
Input Parameters and Files panel; further information on atom type assignment is provided (see
Section 5.1, page 36). If this is not the case, enable the Set atom types option for the Ligand.
By default the Allow early termination check box should be switched on and contain the
following early termination criteria:
This will instruct GOLD to terminate the docking if, at any point, the best three solutions found
are all within 1.5 Å rmsd of each other. In this case, it is probable that the answer is correct and
further docking runs will not be required.
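The early termination rule can be expressed in a few lines. This sketch assumes that higher fitness is better, that each solution is a (fitness, coordinates) pair with atoms in a fixed order, and that "within 1.5 Å" means an rmsd no greater than 1.5; the fitness values and coordinates are made up, and the code illustrates the rule as described rather than GOLD's implementation.

```python
import math
from itertools import combinations


def rmsd(coords_a, coords_b):
    """rmsd between two equally ordered lists of (x, y, z) tuples, without superposition."""
    sq = sum((p - q) ** 2 for a, b in zip(coords_a, coords_b) for p, q in zip(a, b))
    return math.sqrt(sq / len(coords_a))


def early_termination(solutions, n_best=3, max_rmsd=1.5):
    """True if the n_best highest-fitness solutions are all within max_rmsd of each other.

    solutions: list of (fitness, coords) tuples, higher fitness being better.
    """
    if len(solutions) < n_best:
        return False
    best = sorted(solutions, key=lambda s: s[0], reverse=True)[:n_best]
    return all(rmsd(a[1], b[1]) <= max_rmsd for a, b in combinations(best, 2))


pose = [(0.0, 0.0, 0.0), (1.4, 0.0, 0.0)]
shifted = [(x, y, z + 0.5) for x, y, z in pose]  # 0.5 A from pose
far = [(x + 6.0, y, z) for x, y, z in pose]      # 6.0 A from pose
print(early_termination([(52.1, pose), (51.8, shifted), (50.3, pose), (12.0, far)]))  # True
print(early_termination([(60.0, far), (52.1, pose), (51.8, shifted)]))                # False
```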
6. Defining the Ligand Binding Site
It is necessary to specify the approximate centre and extent of the protein binding site; this can
be done in a number of ways, including:
from a point (see Section 3.8.1, page 25);
from a protein atom (see Section 3.8.2, page 25);
from a file containing a list of atoms (see Section 3.8.3, page 26);
from a protein residue (see Section 3.8.4, page 26);
from a file containing a list of residues (see Section 3.8.5, page 27);
from a reference ligand (see Section 3.8.6, page 28).
For this example, switch on the button labelled Point in the GOLD front end:
Bonds are deemed to be rotatable if they are single and acyclic and involve pairs of atoms with
hybridisation states sp3-sp3, sp3-sp2 or sp2-sp2.
The parameters A, n and $\theta_0$ in the above equation are set in the ChemScore file (see Section 6.5,
page 58). The relevant lines are SP3_SP3_BOND, SP3_SP2_BOND, SP2_SP2_BOND and
UNKNOWN_BOND. The syntax is of the form:
SP3_SP3_BOND A n θ0
For example:
SP3_SP3_BOND 0.18750 3.0 3.1515926
The overall contribution of intramolecular strain to the scoring function is scaled by the
coefficient called INTRA_COEFFICIENT in the ChemScore file (see Section 6.5, page 58).
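Scripts that compare customised ChemScore files may need to read these torsion lines. The sketch below parses only the documented `NAME A n θ0` syntax for the four bond classes listed above; the helper name is hypothetical, and other lines of the file are ignored because their format is not described here.

```python
BOND_CLASSES = ("SP3_SP3_BOND", "SP3_SP2_BOND", "SP2_SP2_BOND", "UNKNOWN_BOND")


def parse_torsion_lines(text):
    """Return {bond_class: (A, n, theta0)} for 'NAME A n theta0' lines in ChemScore-file text."""
    params = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 4 and fields[0] in BOND_CLASSES:
            a, n, theta0 = (float(value) for value in fields[1:])
            params[fields[0]] = (a, n, theta0)
    return params


example = "SP3_SP3_BOND 0.18750 3.0 3.1515926\n"
print(parse_torsion_lines(example))  # {'SP3_SP3_BOND': (0.1875, 3.0, 3.1515926)}
```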
6.4.7 Covalent Term
When covalent bonding is switched on (see Section 4.6, page 33) the ChemScore function is
modified in the following ways:
The clash term (see Section 6.4.6, page 56) is reduced so that no clash is registered for 1-2 or
1-3 contacts around the link atoms in the protein and ligand.
Torsion terms (see Section 6.4.6, page 56) are added for the rotatable parts of the linkage.
A valence-angle bending term is added to the overall energy to penalize poor link geometries.
The weight of the covalent link energy in the ChemScore function is controlled by the parameter
called LINK_BEND_COEFFICIENT in the ChemScore parameter file (see Section 6.5, page
58).
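The reduced clash term can be pictured as a graph problem on the bonded protein-ligand complex. This is a sketch under stated assumptions, and GOLD's actual rule may differ. Atoms are integer ids, and the bonds include the covalent link bond. The phrase "around the link atoms" is interpreted to mean pairs that involve a link atom or are bridged by one. The function name is hypothetical.

```python
from itertools import combinations


def excluded_clash_pairs(bonds, link_atoms):
    """Atom pairs that are 1-2 or 1-3 related around a link atom.

    bonds: iterable of (atom_i, atom_j) integer ids, including the covalent link.
    link_atoms: ids of the protein and ligand link atoms.
    The interpretation of "around the link atoms" is an assumption.
    """
    neighbours = {}
    for i, j in bonds:
        neighbours.setdefault(i, set()).add(j)
        neighbours.setdefault(j, set()).add(i)
    pairs = set()
    for link in link_atoms:
        direct = neighbours.get(link, set())
        for n in direct:
            pairs.add(frozenset((link, n)))          # 1-2 contact with the link atom
            for m in neighbours[n] - {link}:
                pairs.add(frozenset((link, m)))      # 1-3 contact via neighbour n
        for n1, n2 in combinations(sorted(direct), 2):
            pairs.add(frozenset((n1, n2)))           # 1-3 contact through the link atom
    return pairs
```

For a chain protein(1)-link(2)-link(3)-ligand(4), the pair (1, 4) is a 1-4 contact and is still scored for clashes.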
6.4.8 Constraint Terms
Constraints (see Section 8., page 68) are implemented in ChemScore in the same way as they are
in GoldScore.
6.5 Altering ChemScore Fitness-Function Parameters: the ChemScore File
The ChemScore parameter file is stored in the GOLD distribution directory. It contains all the
parameters used by the GOLD implementation of ChemScore. A full description of the meaning
of the various parameters is given elsewhere (see Section 6.4, page 49).
The ChemScore file can be customised by copying it, editing the copy, and instructing GOLD to
use the edited file.
A copy of the default file will be placed in your current directory (where it will be called
chemscore.params) if you click on the ChemScore File button in the GOLD front end.
The entry box next to the ChemScore File button in the GOLD front end should say DEFAULT if
you want to use the default ChemScore parameter file. If you want to use a customised version
of the file, click on the ChemScore File button to select the required file or directly type the file
name into the entry box.
The format of the ChemScore file is quite strict: incorrect editing may cause GOLD to behave in
unexpected ways or even to crash. Because of the large number of parameters, no guarantee can
be given that the program will behave reliably with anything other than the default
parameterisation.
6.6 Altering GOLD Parameters: the gold.params File
The parameter file gold.params is stored in the GOLD distribution directory. It contains all of
the parameters used by GOLD (e.g. hydrogen bond energies, atom radii and polarisabilities,
torsion potentials, hydrogen bond directionalities, etc.) other than those that are specified in
the configuration file (i.e. those that can be set via the GOLD front end).
It also contains parameters that control the general behaviour of GOLD, e.g. whether the final
solution from a genetic algorithm run is to be minimised via a Simplex procedure before being
saved.
The parameter file can be customised by copying it, editing the copy, and instructing GOLD to
use the edited file.
Click on the Edit Parameters button to edit the parameter file. If the parameter file is set to
DEFAULT then the standard GOLD distribution parameter file is copied to the current directory.
GOLD gets the location of the parameter file from the configuration file line param_file =
<parameter file location>. This is most easily defined using the Parameter File button in the
front end.
The Parameter File entry box in the GOLD front end should say DEFAULT if you want to use
the default GOLD parameter file. You can click on the button to pick an alternative parameter
file, or directly type a file name into the entry box.
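The Parameter File button is the easiest way to set the `param_file = <parameter file location>` line. If you prefer to edit gold.conf by hand, the helper below sketches a safe edit. It assumes one `key = value` setting per line and leaves every other line untouched. The function name and the example path are hypothetical.

```python
def set_conf_option(conf_text, key, value):
    """Return gold.conf-style text with `key = value` set, replacing any existing line.

    Assumes one `key = value` setting per line; other lines are kept unchanged.
    """
    new_line = f"{key} = {value}"
    lines = conf_text.splitlines()
    for i, line in enumerate(lines):
        if "=" in line and line.split("=", 1)[0].strip() == key:
            lines[i] = new_line
            break
    else:
        lines.append(new_line)  # key not present yet
    return "\n".join(lines) + "\n"
```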
The format of the parameter file is quite strict: incorrect editing may cause GOLD to behave in
unexpected ways or even to crash. Because of the large number of parameters, no guarantee can
be given that the program will behave reliably with anything other than the default
parameterisation.
For more information see the comments in the parameter file, gold.params.
6.7 Kinase Scoring Function
Weak CH...O interactions can be accounted for by including a ChemScore term that calculates
a contribution for weak hydrogen bonds. This term can be useful when dealing with particular
proteins; e.g. most kinases contain weak N-heterocycle CH...O hydrogen bonds.
This term can be enabled by editing the chemscore.params file (see Section 6.5, page 58). The
The ligand has been minimised into a low-energy starting conformation and the atom types have
been checked for accuracy (see Section 4.3, page 31).
2.3 Atom Type Assignment
Each protein and ligand atom must be assigned an atom type which is used to determine whether
the atom is capable of forming hydrogen bonds. GOLD atom typing is based on SYBYL
(http://www.tripos.com/) atom types. SYBYL bond types are also used.
GOLD will automatically assign atom types provided the Set atom types check buttons are
switched on in the Input Parameters and Files panel of the GOLD front end.
GOLD deduces atom types from the information about element types and bond orders in the
input structure file. It is therefore crucial that both the protein and ligand input files are prepared
according to the guidelines provided (see Section 5.1, page 36).
3. Specifying the Protein Input File
Open GOLD and click on the Protein button in the Input Parameters and Files section of the
front end to bring up the file selection window.
Select protein.mol2 from <GOLD_DIR>/examples/tutorial1, then click on Open.
4. Specifying the Ligand Input File
Click on the Edit Ligand File List button in the GOLD front end. The Ligand selection for
docking run window will appear:
From here it is possible to:
All other parts of the protein will be kept rigid, so the only way of dealing with a truly flexible
binding site is to perform separate GOLD runs on different binding-site conformations.
2.2 Preparing the Ligand Input File
The N-phosphonacetyl-L-aspartate ligand has already been prepared in accordance with the
requirements for setting up the ligand (see Section 4., page 30).
Within SILVER read in the file ligand.mol2 from <GOLD_DIR>/examples/tutorial1
and inspect the structure:
Acceptable ligand input file formats are MOL2 (i.e. Tripos format) or MOL (i.e. MDL SD
format). PDB files can also be used, although we do not recommend the use of PDB format for
ligands (see Section 4.4, page 31).
All hydrogen atoms must be present in the ligand input file (see Section 4.2, page 30). In this
example, all hydrogen atoms have been added thus ensuring that the ionisation and tautomeric
states are defined unambiguously.
Certain groups can be represented in more than one way (i.e. have more than one canonical
form), such as nitro, carboxylate and amidinium. In such cases, there is usually a right and a
wrong representation for use in GOLD. The conventions used for some common difficult groups
and further help on setting up the ligand is provided (see Section 4., page 30).
following parameters are used:
# CH...O PARAMETERS
# ================
CHO_COEFFICIENT -2.00
# OFF no CHO term
# SPECIAL only CH adjacent to heteroatoms
# ARO all aromatic CH
CHO_TYPE OFF
#CHO_TYPE SPECIAL
CHO_R_IDEAL 2.35
CHO_DELTA_R_IDEAL 0.25
CHO_DELTA_R_MAX 0.65
CHO_ALPHA_IDEAL 180.0
CHO_DELTA_ALPHA_IDEAL 50.0
CHO_DELTA_ALPHA_MAX 100.0
CHO_BETA_IDEAL 180.0
CHO_DELTA_BETA_IDEAL 70.0
CHO_DELTA_BETA_MAX 80.0
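The ideal/delta/max pattern in these parameters suggests piecewise-linear "block" functions. The sketch below assumes that the CH...O term works like ChemScore's hydrogen-bond term. Under that assumption, each geometric deviation scores 1 inside its ideal window, falls linearly to 0 at its maximum, and the product is scaled by CHO_COEFFICIENT. This functional form is an assumption, and this section does not define the angles alpha and beta. The numbers are copied from the listing above. The function names are hypothetical.

```python
# Values copied from the CH...O parameter listing.
CHO = {
    "CHO_COEFFICIENT": -2.00,
    "CHO_R_IDEAL": 2.35, "CHO_DELTA_R_IDEAL": 0.25, "CHO_DELTA_R_MAX": 0.65,
    "CHO_ALPHA_IDEAL": 180.0, "CHO_DELTA_ALPHA_IDEAL": 50.0, "CHO_DELTA_ALPHA_MAX": 100.0,
    "CHO_BETA_IDEAL": 180.0, "CHO_DELTA_BETA_IDEAL": 70.0, "CHO_DELTA_BETA_MAX": 80.0,
}


def block(deviation, delta_ideal, delta_max):
    """Assumed block function: 1 within delta_ideal, linear fall-off to 0 at delta_max."""
    d = abs(deviation)
    if d <= delta_ideal:
        return 1.0
    if d >= delta_max:
        return 0.0
    return 1.0 - (d - delta_ideal) / (delta_max - delta_ideal)


def cho_term(r, alpha, beta, p=CHO):
    """Sketch of one CH...O contribution (r in Å, angles in degrees)."""
    g_r = block(r - p["CHO_R_IDEAL"], p["CHO_DELTA_R_IDEAL"], p["CHO_DELTA_R_MAX"])
    g_a = block(alpha - p["CHO_ALPHA_IDEAL"], p["CHO_DELTA_ALPHA_IDEAL"], p["CHO_DELTA_ALPHA_MAX"])
    g_b = block(beta - p["CHO_BETA_IDEAL"], p["CHO_DELTA_BETA_IDEAL"], p["CHO_DELTA_BETA_MAX"])
    return p["CHO_COEFFICIENT"] * g_r * g_a * g_b
```

Under these assumptions, an ideal contact at 2.35 Å contributes -2.0. A contact at 2.80 Å lies halfway down the distance ramp and contributes about -1.0.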
To enable calculation of the weak CH...O hydrogen-bonding term S(cho), CHO_TYPE
should be set to SPECIAL. This enables the recognition of activated CH groups for
hydrogen bonding. Activated CH groups are those in aromatic rings adjacent to nitrogen atoms (e.g. the CHs
in an imidazole ring). These groups are recognised both in the ligand and in the protein active site.
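Switching the term on amounts to changing the active CHO_TYPE line in your copy of the ChemScore file (see Section 6.5). The sketch below edits only the uncommented CHO_TYPE line and leaves commented alternatives such as `#CHO_TYPE SPECIAL` as they are. The accepted values OFF, SPECIAL and ARO are taken from the comment lines in the listing. The function name is hypothetical.

```python
CHO_MODES = ("OFF", "SPECIAL", "ARO")


def set_cho_type(params_text, mode):
    """Return ChemScore parameter text with the active CHO_TYPE line set to `mode`.

    Commented lines such as '#CHO_TYPE SPECIAL' are left untouched.
    """
    if mode not in CHO_MODES:
        raise ValueError(f"CHO_TYPE must be one of {CHO_MODES}, got {mode!r}")
    lines = []
    for line in params_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "CHO_TYPE":
            line = f"CHO_TYPE {mode}"
        lines.append(line)
    return "\n".join(lines) + "\n"
```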
For further details please refer to Virtual Screening Using Protein-Ligand Docking: Avoiding
Artificial Enrichment (see Section 19., page 147).
6.8 Heme Scoring Function
The heme scoring function is available for both GoldScore (see Section 6.2, page 46) and
ChemScore (see Section 6.4, page 49).
By default, GOLD makes no distinction between different H-bond acceptors in terms of their
strength of interaction with the metal. A recent publication by Kirton et al. (S. B. Kirton, C. W.
Murray, M. L. Verdonk and R. D. Taylor, Proteins: Structure, Function, and Bioinformatics, 58,
836-844, 2005) demonstrated how metal parameters can be set up in GOLD for both GoldScore
and ChemScore to take account of different H-bond acceptor types. Kirton et al. described the
use of ligand-specific iron parameters in the context of docking to heme-containing proteins and
demonstrated improved performance. It is now possible in GOLD to optionally use these
parameters.
The parameters are derived from contact statistics obtained from the CSD and PDB databases.
Parameters were derived for both GoldScore and ChemScore.
These parameters can be used by choosing the appropriate .params file from those that have
been supplied with the GOLD installation. The .params files that are available are:
goldscore.p450_csd.params
goldscore.p450_pdb.params
chemscore.p450_csd.params
chemscore.p450_pdb.params
The files are located within the $GOLD_DIR/gold directory. The graphic below shows the
iron parameters for GoldScore, derived from the CSD, as displayed in the
goldscore.p450_csd.params file.
To employ one of the files, click on either the GoldScore Parameter File: button (if using
GoldScore) or the ChemScore Parameter File: button (if using ChemScore), navigate to the
$GOLD_DIR/gold directory, select the required file and then click on Open.
Kirton et al. found it necessary to assign the planar nitrogens in the heme molecules as
lipophilic when using the ChemScore scoring function. To bring this about, the
chemscore.p450 parameter files contain the additional keyword:
MAKE_PLANAR_N_LIPO 1
NOTE: Use of this keyword has only been validated for nitrogen atoms within heme-containing
proteins. Improvements in docking performance when it is used with proteins that do not contain heme
are not guaranteed.
Acceptable protein input file formats for GOLD are PDB and MOL2.
The protein input file may be the entire protein structure, or consist of just those residues that are
in the region of the ligand binding site. GOLD searches for contacts out to a distance of 20.0 Å.
In this example, parts of the protein remote from the binding site have been deleted in order to
speed up the calculation. The protein has been cut down to a radius of 20.0 Å around the ligand
binding site. This ensures that enough of the protein has been retained for all of the residues
that might reasonably interact with the ligand to be present.
All hydrogen atoms must be present in the protein input file (see Section 3.2, page 10). In this
example, hydrogen atoms have been placed on the protein using a molecular modelling
program (see Section 1., page 1) in order to ensure that ionisation and tautomeric states are
defined unambiguously. Obviously, this involved making hypotheses about the protonation
states of residues such as His, Glu and Asp.
GOLD allows for partial protein flexibility. Specifically, the torsion angles of Ser, Thr and Tyr
hydroxyl groups will be allowed to rotate during docking in order to optimise their hydrogen
bonding to the ligand. Lysine NH3+ groups are similarly optimised.
Note: the optimised positions of polar protein hydrogen atoms that are generated during docking
(these will usually be different for each docked ligand pose) can be saved to the docked solution
file (see Section 14.2, page 111).
Tutorial 1: A Step-By-Step Guide to Using GOLD
1. Introduction (see page 163)
2. Preparation of Input Structures for Use in GOLD (see page 163)
3. Specifying the Protein Input File (see page 166)
4. Specifying the Ligand Input File (see page 166)
5. Input Parameters and Files Settings (see page 167)
6. Defining the Ligand Binding Site (see page 167)
7. Fitness Function and Search Settings (see page 168)
8. Genetic Algorithm Parameter Settings (see page 169)
9. Setting Output Preferences (see page 170)
10. Running GOLD (see page 172)
11. Analysis of Output (see page 173)
1. Introduction
This tutorial aims to provide a step-by-step guide to using GOLD. To illustrate this, the
procedure for setting up and running an example docking will be explained and additional
information will be provided on related issues.
In this example GOLD will be used to determine the binding mode of N-phosphonacetyl-L-
aspartate with the aspartate carbamoyltransferase, PDB entry code 1acm.
2. Preparation of Input Structures for Use in GOLD
2.1 Preparing the Protein Input File (see page 163)
2.2 Preparing the Ligand Input File (see page 165)
2.3 Atom Type Assignment (see page 166)
GOLD will only produce reliable results if the protein and ligand input files are set up correctly.
It is therefore essential that a number of key steps are followed when preparing any input
structure for use in GOLD (see Section 3.1, page 9 and Section 4.1, page 30).
2.1 Preparing the Protein Input File
The aspartate carbamoyltransferase, 1acm, has already been prepared in accordance with the
requirements for setting up the protein (see Section 3.1, page 9).
Open SILVER and read in the file protein.mol2 from <GOLD_DIR>/examples/tutorial1
and inspect the structure:
6.9 Internal Energy Offset
Click on the Fitness & Search Options button and switch on the Offset internal ligand energy by
best energy that is encountered during run check-box.
Enabling this option will result in the internal energy terms (internal torsion, internal vdw, and
internal Hbond) being corrected according to the best energy encountered for these terms during
the run.
By applying this correction, the internal energy will be calculated with respect to that of a
close-to-optimal unbound structure, thereby taking into account any irreducible internal energy.
The internal energy offset can be used with both GoldScore and ChemScore.
For ChemScore, the ligand energy correction value is written to the docked solution files in the
tag <Gold.Chemscore.Internal.Correction>. This is the best (i.e. minimum energy) value
encountered.
For GoldScore, the correction value is written to the docked solution files in the tag
<Gold.Goldscore.Internal.Correction>. This is the best score (i.e. the maximum value)
encountered.
In both cases, the best value encountered is subtracted from the ligand score (or energy) value before
it is passed to the final GoldScore or ChemScore energy term. Note: the final ChemScore
energy is converted to the ChemScore score by taking its negative.
Note: The .rnk file is corrected at the end of a run with the best energy encountered after all
docking attempts on a particular ligand (individual solution files are not). Therefore, you may
observe small differences in the best energy between the solution files and the rank file.
Increasing the number of dockings or the number of GA operations in each docking will make
the discrepancy less pronounced.
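The offset arithmetic, and the reason solution files and the .rnk file can disagree, can be illustrated with made-up numbers. This is a sketch under two assumptions. First, each solution file is corrected with the best value known when it was written. Second, the .rnk file uses the best value over the whole run. Lower is better for ChemScore internal energies and higher is better for GoldScore. The function names are hypothetical.

```python
def _better(a, b, higher_is_better):
    return a > b if higher_is_better else a < b


def offset_by_running_best(values, higher_is_better):
    """Correct each value by the best value seen up to that docking (assumed solution-file behaviour)."""
    best = None
    out = []
    for v in values:
        if best is None or _better(v, best, higher_is_better):
            best = v
        out.append(v - best)
    return out


def offset_by_final_best(values, higher_is_better):
    """Correct every value by the best value of the whole run (as for the .rnk file)."""
    best = max(values) if higher_is_better else min(values)
    return [v - best for v in values]
```

With ChemScore internal energies of 3.0, 1.5 and 2.0 from three dockings, the first solution is offset by 3.0 when it is written. In the rank file it is offset by 1.5, the best value over the run. The two corrected values therefore differ for that solution.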
6.10 User Defined Fitness Function
In addition to the choice of scoring functions currently provided, i.e., GoldScore and
ChemScore, users can now implement their own scoring function, which can be accessed from
the GOLD front end by selecting User Defined Score:
The GOLD scoring function Application Programming Interface (API) allows users to modify
the GOLD scoring-function mechanism in order to:
Calculate and write out additional data after each docking
Add extra terms to the scoring function
Implement a completely new scoring function
Full documentation for the GOLD Scoring Function Application Programming Interface (API)
is provided with the GOLD distribution:
UNIX: $GOLD_DIR/gold/api_doc/index.html
Windows: <InstallDir>/GOLD/gold/api_doc/index.html
where <InstallDir> is usually C:/Program Files/CCDC
see: GOLD Scoring Function Application Programming Interface (API) documentation.
A good knowledge of the C programming language is required together with some experience in
using GOLD.
Selecting Scoring Function Shared Object Name (UNIX) or Scoring Function DLL Name
(Windows) from the Fitness Function Settings panel enables you to specify a path to a
dynamically loadable shared object library.
GOLD uses shared objects (or dynamically loadable libraries) to allow new or modified scoring
functions to be plugged in. Two shared object files are relevant:
The main GOLD shared object, which is called libgold.so (UNIX) or gold.dll
(Windows)
The scoring-function shared objects which, by default, are called libfitfunc_dll.so
(UNIX), goldscore.dll or chemscore.dll (Windows)
On UNIX the file libgold.so is included in the GOLD distribution, together with two
versions of libfitfunc_dll.so, one implementing the normal GOLD scoring function and
the other implementing the ChemScore function.
On Windows the file gold.dll is included in the GOLD distribution, together with two files
called goldscore.dll, for implementing the normal GOLD scoring function, and
chemscore.dll, for implementing the ChemScore function.
This shared-object mechanism effectively provides a means by which data may be intercepted and modified during
docking. Users may therefore post-process the results of a docking, modify the GOLD scoring
function, or implement their own scoring function, by building their own versions of
libfitfunc_dll.so (UNIX) or, e.g., goldscore.dll (Windows).
Appendix E: GOLD Tutorials
In order to familiarise yourself with GOLD, it is recommended that you work through the tutorial
examples provided. Tutorial 1 will go through the process of setting up and running an example
docking in some detail; subsequent tutorials will be more concise but will introduce other, more
advanced, aspects of the program.
For the purpose of these tutorials it is assumed that the user has access to either SILVER
(supplied with GOLD) or another visualisation program (for instructions on how to use SILVER
refer to the SILVER User Guide). In addition, if you wish to set up your own protein and ligand
input files (see Section 3.1, page 9 and Section 4.1, page 30), you will need access to
a molecular modelling program. Full details of the software requirements needed in order to use
GOLD are given elsewhere (see Section 1., page 1).
Please note: due to the non-deterministic nature of GOLD, results may vary from those described
in the tutorials.
Tutorial 1: A Step-By-Step Guide to Using GOLD (see page 163)
Tutorial 2: Handling of Metals in GOLD (see page 178)
Tutorial 3: Use of Hydrogen Bonding Constraints (see page 185)
Tutorial 4: Use of Substructure Based Distance Constraints (see page 194)
Tutorial 5: Docking with Water in the Binding Site (see page 202)
Tutorial 6: Docking with a Flexible Side Chain (see page 208)
Tutorial 7: Docking using Localised Soft Potentials (see page 215)
Errors or Wrong     53.9   38.5   3.6

Correlation of prediction quality with number of flexible torsions in ligand
Prediction Result   Max    Avg    Min
Good or Close       24     9.0    0
Errors or Wrong     14     8.4    3
7. Ligand Flexibility
7.1 Flipping Ring Corners (see page 64)
7.2 Flipping Amide Bonds (see page 64)
7.3 Flipping Planar Nitrogens (see page 65)
7.4 Flipping Pyramidal Nitrogens (see page 66)
7.5 Intramolecular Hydrogen Bonds (see page 66)
7.6 Protonated Carboxylic Acids (see page 66)
7.7 Fixing Rotatable Bonds at Their Input Conformation (see page 66)
7.1 Flipping Ring Corners
Click on the Fitness & Search Options button and switch on the Flip ring corners check-box to
allow free corners of ligand rings to flip. This will result in GOLD performing a limited
conformational search of cyclic systems by allowing free corners of rings to flip above or below
the plane of their neighbouring atoms.
If the Flip ring corners check box is not switched on then rings will be held rigid at the input
conformation during docking.
The rules governing flipping of ring corners in GOLD are given in:
A. W. R. Payne and R. C. Glen, J. Mol. Graphics, 1993, 10, 74-91
7.2 Flipping Amide Bonds
During initialisation of the ligand, amides (including thioamides, ureas, and thioureas) will be set
to the trans conformation.
Click on the Fitness & Search Options button and switch on the Flip amide bonds check box to
allow amides, thioamides, ureas, and thioureas in the ligand to flip between cis and trans.
In order to flip between cis and trans conformations the CO-NRR' torsion is first made planar (at
the initialised trans conformation).
Note: N,N-disubstituted amides are not made planar; CO-NH2 will be set so that the NH2 group
is in plane with the CO (care must be taken that the input RNH2 group itself is planar, since
GOLD will not change this).
On occasion this flattening of the CO-NRR' torsion may result in clashes in the initialised
structure. If this occurs, it is advisable to turn off normalisation of amide bonds using the
FLATTEN_BONDS keyword in the gold.params file. In this case it is recommended to fix
the bond by switching off Flip amide bonds, or by explicitly specifying that the appropriate
rotatable bonds are held at their input conformation (see Section 7.7, page 66).
If the use of torsion angle distributions has been enabled (see Section 9., page 83), GOLD will
attempt to match amide torsions against the torsion angle distributions file. If an amide torsion
matches, this will override the Flip amide bonds flag setting.
Note: Data in the CSD show that both cis and trans conformations occur in ureas. It is therefore
recommended that amide flipping be turned on in order to sample R-N-C(O)-N torsions of 0
degrees when docking ureas.
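Before deciding whether to enable Flip amide bonds, you may want to check whether an amide or urea in your input file is cis or trans. The sketch below is a generic signed-dihedral calculation. The caller chooses which four atoms define the torsion of interest, for example the R-N-C(O)-N torsion of a urea. The 90 degree cut-off between cis and trans is an illustrative assumption. The helper names are hypothetical.

```python
import math


def _sub(p, q):
    return [a - b for a, b in zip(p, q)]


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def _cross(u, v):
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]


def torsion_deg(p0, p1, p2, p3):
    """Signed torsion angle p0-p1-p2-p3 in degrees, in the range [-180, 180]."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = [c / norm for c in b1]
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, [_dot(b0, b1) * c for c in b1])
    w = _sub(b2, [_dot(b2, b1) * c for c in b1])
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def cis_or_trans(angle_deg):
    """Label a torsion as cis (near 0 degrees) or trans (near 180 degrees)."""
    return "cis" if abs(angle_deg) < 90.0 else "trans"
```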
7.3 Flipping Planar Nitrogens
Click on the Fitness & Search Options button and switch on the Flip all planar R-NR1R2 check
box to allow planar trigonal nitrogens in the ligand (bound to sp2 carbons) to flip between cis
and trans conformations during docking (otherwise, they will be held fixed at the input
geometry).
It is possible to further specify whether or not ring-NHR and ring-NRR' groups are also allowed
to flip (i.e. rotate 180 deg.).
When running GOLD from the command line a number of keyword modifiers can be specified
after the flip_planar_n command in the gold.conf file:
flip_planar_n = <1|0> <keyword>
These keywords allow further control over the behaviour of this flag. The following keywords
can be used:
flip_ring_NRR
flip_ring_NHR
This allows flipping of ring-NHR or ring-NRR groups and is equivalent to using the including
ring-NHR and including ring-NRR settings in the interface.
fix_ring_NRR
fix_ring_NHR
This fixes these bonds at their input conformation and is equivalent to using the do not flip ring-
NHR and do not flip ring-NRR settings in the interface.
rot_ring_NRR
rot_ring_NHR
Use these keywords to allow free rotation of ring-NHR or ring-NRR groups.
For example, setting flip_planar_n = 1 fix_ring_NRR will allow all planar R3N
groups to flip, but will fix ring-NRR groups.
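When gold.conf files are generated by a script, a small helper can help avoid typos in these modifiers. This is a sketch. It assumes that several keywords may follow the flag and that each ring group (NRR, NHR) should receive at most one behaviour. Neither assumption is stated above. The function name is hypothetical.

```python
RING_KEYWORDS = {
    "flip_ring_NRR", "flip_ring_NHR",
    "fix_ring_NRR", "fix_ring_NHR",
    "rot_ring_NRR", "rot_ring_NHR",
}


def flip_planar_n_line(enabled, *keywords):
    """Build a `flip_planar_n = <1|0> <keyword> ...` line for gold.conf.

    Assumes several keywords may follow the flag and that each ring group
    (NRR, NHR) should receive at most one behaviour.
    """
    unknown = [k for k in keywords if k not in RING_KEYWORDS]
    if unknown:
        raise ValueError(f"unknown keyword(s): {unknown}")
    for group in ("NRR", "NHR"):
        chosen = [k for k in keywords if k.endswith("_ring_" + group)]
        if len(chosen) > 1:
            raise ValueError(f"conflicting settings for ring-{group}: {chosen}")
    return " ".join(["flip_planar_n =", "1" if enabled else "0", *keywords])
```

`flip_planar_n_line(True, "fix_ring_NRR")` reproduces the example setting given in the text.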
Appendix D: GOLD Predictions in Second Series of Validation Tests
3D plots of individual predictions are available on the CCDC web page.
The tables in this section list:
Subjective classification of GOLD predictions (see page 153)
Correlation of prediction quality with number of heavy atoms in ligand (see page 158)
Correlation of prediction quality with percentage of heavy atoms in ligand that can form
hydrogen bonds (see page 158)
Correlation of prediction quality with number of flexible torsions in ligand (see page 158)
Subjective classification of GOLD predictions
Subjective Result         No.   PDB Codes
Good                      12    1BMA 1CIL 1FRP 2GBP 1GLP 1LAH 1LPM 1MMQ 1MRG 1TRK 1TNL 1WAP
Close                     13    1ATL 1BBP 1BYB 1CBS 1COM 1FEN 1HFC 1IMB 1LCP 1NCO 1TNG 1TNI 1TPH
Some significant errors   6     2CMD 1CTR 2LGS 1LNA 1SNC 1UKZ
Wrong                     3     1CDG 1LMO 1TYL

Correlation of prediction quality with number of heavy atoms in ligand
Prediction Result   Max    Avg    Min
Good or Close       48     21.2   8
Errors or Wrong     29     19.9   10

Correlation of prediction quality with percentage of heavy atoms in ligand that can form hydrogen
bonds
Prediction Result   Max    Avg    Min
Good or Close       60.0   29.5   0.0
Correlation of prediction quality with protein resolution
Resolution (Å)    Total No.   No. Good + No. Close   No. Errors + No. Wrong
> 1.0, <= 1.5     2           2                      0
> 1.5, <= 2.0     44          34                     10
> 2.0, <= 2.5     32          24                     8
> 2.5, <= 3.0     20          11                     9
> 3.0             1           0                      1
7.4 Flipping Pyramidal Nitrogens
Click on the Fitness & Search Options button and switch on the Flip pyramidal N check box to
allow pyramidal (i.e. non-planar sp3) nitrogens to invert during docking (otherwise, they will be
held fixed at the input geometry).
Given a non-planar group RRRN or tetrahedrally surrounded RRRNH, the Flip pyramidal N
switch enables flipping of the local stereochemistry around the nitrogen (the energy barrier for
this umbrella-like change of geometry around the nitrogen is low).
Flipping only changes the stereochemistry around RRRN and RRRNH nitrogens. It does not
affect other chiral centers.
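Geometrically, the umbrella inversion of a pyramidal RRRN nitrogen moves the nitrogen to the other side of the plane through its three substituents. The sign of the triple product (the local handedness) changes as a result. The sketch below moves only the N atom, which is a simplification. It does not cover the RRRNH case, where the hydrogen would also need repositioning, and it is not GOLD's procedure. The helper names are hypothetical.

```python
def _sub(p, q):
    return [a - b for a, b in zip(p, q)]


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def _cross(u, v):
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]


def invert_nitrogen(n_pos, r1, r2, r3):
    """Reflect the N position through the plane of its three substituents (simplified RRRN flip)."""
    normal = _cross(_sub(r2, r1), _sub(r3, r1))
    norm_sq = _dot(normal, normal)
    if norm_sq == 0.0:
        raise ValueError("substituents are collinear; plane is undefined")
    scale = 2.0 * _dot(_sub(n_pos, r1), normal) / norm_sq
    return tuple(n - scale * c for n, c in zip(n_pos, normal))


def handedness(n_pos, r1, r2, r3):
    """Sign (+1, -1 or 0) of the triple product; it changes sign when N inverts."""
    a, b, c = _sub(r1, n_pos), _sub(r2, n_pos), _sub(r3, n_pos)
    volume = _dot(a, _cross(b, c))
    return (volume > 0) - (volume < 0)
```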
7.5 Intramolecular Hydrogen Bonds
Click on the Fitness & Search Options button and switch on the Internal H-bonds check box to
allow intramolecular hydrogen bonds in the ligand to be formed during docking.
Use this option with care, as it can cause ligands such as methotrexate to curl up.
7.6 Protonated Carboxylic Acids
Click on the Fitness & Search Options button and switch on the Protonated carboxylic acids
check box. Protonated carboxylic acids can then either be allowed to flip (i.e. rotate 180 deg.) or
rotate freely during docking.
If the Protonated carboxylic acids check box is not switched on then these groups will be held
rigid at their input conformation.
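A flip of a protonated carboxylic acid can be pictured as a 180 degree rotation of the acidic hydrogen about the C-O(H) bond. That choice of axis is an assumption, since the text does not name the bond. The generic helper below applies Rodrigues' rotation formula to rotate a point about an axis through two atoms. The helper and the coordinates in the test are hypothetical.

```python
import math


def rotate_about_bond(point, axis_from, axis_to, angle_deg):
    """Rotate `point` about the axis axis_from -> axis_to (Rodrigues' formula)."""
    k = [b - a for a, b in zip(axis_from, axis_to)]
    length = math.sqrt(sum(c * c for c in k))
    if length == 0.0:
        raise ValueError("axis atoms coincide")
    k = [c / length for c in k]
    v = [p - a for p, a in zip(point, axis_from)]
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    k_dot_v = sum(kc * vc for kc, vc in zip(k, v))
    k_cross_v = [k[1] * v[2] - k[2] * v[1],
                 k[2] * v[0] - k[0] * v[2],
                 k[0] * v[1] - k[1] * v[0]]
    return tuple(
        a + vc * cos_t + kxv * sin_t + kc * k_dot_v * (1.0 - cos_t)
        for a, vc, kxv, kc in zip(axis_from, v, k_cross_v, k)
    )
```

A flip corresponds to `angle_deg=180`. Free rotation would instead sample arbitrary angles about the same axis.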
7.7 Fixing Rotatable Bonds at Their Input Conformation
GOLD was designed to dock flexible ligands into protein binding sites. However, sometimes it
can be useful to fix the geometry of part or all of the ligand e.g. in order to study the possible
binding of a pre-determined ligand geometry.
The ability to fix rotatable bonds at their input conformation is not available from the GOLD
front end. To do this, you need to edit the gold.conf file (see Section 15.1, page 126). The
following options are available:
To fix the rotatable bond between two specified atoms, add the following line to the
gold.conf file:
fix_rotatable_bond = <atom number 1> <atom number 2>
(numbering as in the input file).
Note: The ability to fix rotatable bonds at their input conformation is also available using the
rotatable_bond_override.mol2 file (see Section 5.4, page 38). This is particularly useful when
docking a library of ligands that share a common substructure, whereas the method above
is more suitable when docking an individual ligand.
To fix all rotatable bonds in the ligand at their input conformation, add the following line to
the gold.conf file:
fix_rotatable_bond = all
To fix all non-terminal rotatable bonds (i.e. not -CH3, -OH, etc.), add the following line to the
gold.conf file:
fix_rotatable_bond = all_but_terminal
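The three forms of the fix_rotatable_bond line can be generated and checked with a small script. This is a sketch that rests on two assumptions. First, atom numbers are taken to be 1-based, as in the input file. Second, one line per bond is written when several individual bonds are fixed, which the text above does not state explicitly. The function name is hypothetical.

```python
def fix_rotatable_bond_lines(spec):
    """Return gold.conf lines fixing rotatable bonds at their input conformation.

    spec: "all", "all_but_terminal", or an iterable of (atom1, atom2) pairs
    numbered as in the input file (assumed 1-based). One line per pair is an
    assumption.
    """
    if isinstance(spec, str):
        if spec not in ("all", "all_but_terminal"):
            raise ValueError(f"unsupported option: {spec!r}")
        return [f"fix_rotatable_bond = {spec}"]
    lines = []
    for atom_1, atom_2 in spec:
        if atom_1 == atom_2 or min(atom_1, atom_2) < 1:
            raise ValueError(f"invalid atom pair: {(atom_1, atom_2)}")
        lines.append(f"fix_rotatable_bond = {atom_1} {atom_2}")
    return lines
```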
Note: When fixing all rotatable bonds at their input conformation (i.e. performing a rigid ligand
docking) GOLD will try to find the best orientation of the ligand in the binding site by mapping
donor-acceptor (as well as hydrophobic-hydrophobic) fitting points. However, GOLD will not
perform a local optimisation (simplex) on the final solution. This may lead to penalisation of
near-optimal conformations. Performing a few cycles of molecular-mechanics minimisation
before docking may help to take the ligand close to its local potential-energy minimum.
Correlation of subjective classification with rms deviation
Rms Devn. (Å)    Total No.   No. Good   No. Close   No. Errors   No. Wrong
<= 0.5           8           8          0           0            0
> 0.5, <= 1.0    27          24         3           0            0
> 1.0, <= 1.5    20          7          13          0            0
> 1.5, <= 2.0    11          2          9           0            0
> 2.0, <= 2.5    2           0          2           0            0
> 2.5, <= 3.0    3           0          2           1            0
> 3.0            28          0          1           8            19

Correlation of prediction quality with number of heavy atoms in ligand
Prediction Result   Max    Avg    Min
Good or Close       52     20.4   6
Errors or Wrong     55     24.3   9

Correlation of prediction quality with percentage of heavy atoms in ligand that can form hydrogen
bonds
Prediction Result   Max    Avg    Min
Good or Close       66.7   31.9   8.7
Errors or Wrong     53.9   25.1   4.8

Correlation of prediction quality with number of flexible torsions in ligand
Prediction Result   Max    Avg    Min
Good or Close       28     7.9    0
Errors or Wrong     40     11.4   0
PDB Code | Rms Devn. of Top-Ranked Solution | Rms Devn. of Closest Solution | Average Rms Devn. of All Solutions | Rms Devn. of Worst Solution | Subjective Rating
1ETR 4.23 1.55 5.65 12.81 Errors
1NIS 4.29 3.49 3.99 4.31 Wrong
2MCP 4.37 2.45 4.43 8.26 Wrong
6RSA 4.42 4.29 4.50 5.24 Errors
1RDS 4.78 1.49 6.00 11.00 Errors
1ACK 4.99 3.82 4.95 10.10 Errors
2AK3 5.08 2.41 5.43 10.20 Wrong
3CLA 5.45 2.22 5.59 6.88 Wrong
4FAB 5.69 1.24 3.60 6.69 Wrong
1BAF 6.12 4.96 5.76 6.17 Errors
1MCR 6.23 3.40 5.32 6.73 Wrong
2RO7 8.23 8.23 11.32 17.12 Wrong
1ICN 8.63 4.14 9.92 16.98 Wrong
1IGJ 9.42 9.08 10.43 13.21 Wrong
2MTH 10.12 0.90 4.65 10.12 Wrong
1TDB 10.48 4.47 8.57 12.06 Wrong
1HDC 10.49 1.65 10.64 13.50 Errors
1LIC 10.78 6.32 12.88 15.65 Errors
1ETA 11.21 7.19 9.69 12.84 Wrong
1IDA 12.12 1.41 6.84 14.43 Close
1EED 12.43 2.87 10.06 13.78 Wrong
1AAQ 12.85 1.52 7.04 15.35 Wrong
2PLV 13.92 9.11 12.65 16.21 Wrong
1HRI 14.01 11.70 14.40 16.97 Wrong
8. Setting and Releasing Constraints
8.1 Using the Constraint Editor (see page 68)
8.2 Distance Constraints (see page 69)
8.3 Hydrogen Bond Constraints (see page 73)
8.4 Region (Hydrophobic) Constraints (see page 77)
8.5 Template Similarity Constraints (see page 79)
8.6 Scaffold Match Constraint (see page 80)
8.1 Using the Constraint Editor
Click on the Edit Constraints button within the Fitness Function and Search Settings panel of
the GOLD front end. This will open the Constraints Editor:
To define a constraint, select a constraint type from those listed and specify the required settings.
The following constraint types are available:
Distance constraint, for use with individual ligands (see Section 8.2, page 69).
Substructure based distance constraint, for use with multiple ligands that have a common
substructure or functional group (see Section 8.2, page 69).
Hydrogen bond constraint, for specifying a hydrogen bond between a particular ligand atom
and a particular atom in the protein (see Section 8.3, page 73).
Protein hydrogen bond constraint, for specifying that a particular protein atom should be
hydrogen-bonded to the ligand, but without specifying to which ligand atom (see Section 8.3,
page 73).
Region (hydrophobic) constraint, for biasing the docking towards solutions in which
particular regions of the binding site are occupied by specific ligand atoms or types of ligand
atom (see Section 8.4, page 77).
Template similarity constraint, for biasing the conformation of docked ligands towards a
given solution, or template (see Section 8.5, page 79).
Once the settings for a constraint have been specified click on the Add constraint or Update
selected constraint button to add the constraint definition to the Current Constraints.
Repeat the above procedure if you want to specify additional constraints.
To edit a constraint highlight the corresponding entry in the Current Constraints list, make the
required change and then hit the Add constraint or Update selected constraint button.
To remove a constraint from the Current Constraints list highlight the entry and hit the Delete
Selection button, or to remove all entries hit the Clear List button.
It is possible to instruct GOLD not to dock ligands when the specified constraint is physically
impossible to satisfy (e.g. if no suitable group is present in the ligand to form the required H-
bond constraint). This is done by selecting the Never dock a ligand when a constraints is
physically impossible check box in the Constraint Editor.
Click on Done in the Constraints Setup window when you are satisfied with the constraints
specified. The count of Constraints will be updated in the GOLD front end.
Note: When using constraints GOLD will be biased towards finding solutions in which the
specified constraint is satisfied. However, it is important to remember that such a solution is not
guaranteed (i.e. it is not possible to force a constraint to be satisfied in the final solution).
8.2 Distance Constraints
Any distance between a ligand and protein atom (or between two ligand atoms) can be
constrained to lie between minimum and maximum distance bounds. GOLD features two types
of distance constraint:
A standard distance constraint for use with individual ligands (see Section 8.2.1, page 70).
A substructure-based distance constraint for use with multiple ligands which have a common
functional group (see Section 8.2.3, page 72).
156 GOLD User Guide
PDB Code | Rms Devn. of Top-Ranked Solution | Rms Devn. of Closest Solution | Average Rms Devn. of All Solutions | Rms Devn. of Worst Solution | Subjective Rating
1GLQ | 1.35 | 0.97 | 3.77 | 9.47 | Close
1PHG | 1.35 | 1.35 | 3.57 | 4.59 | Close
4EST | 1.38 | 1.04 | 2.76 | 4.96 | Close
1DRI | 1.41 | 1.04 | 1.35 | 1.43 | Close
4DFR | 1.44 | 0.80 | 3.98 | 10.85 | Good
1GHB | 1.45 | 1.22 | 2.59 | 4.80 | Close
5P2P | 1.55 | 1.24 | 6.15 | 11.69 | Close
4CTS | 1.57 | 1.56 | 1.57 | 1.61 | Close
3CPA | 1.58 | 0.90 | 1.47 | 1.89 | Close
1APT | 1.62 | 1.62 | 6.50 | 9.97 | Close
1TMN | 1.68 | 1.46 | 5.25 | 10.61 | Close
1DWD | 1.71 | 1.71 | 6.50 | 9.56 | Close
1FKG | 1.81 | 1.67 | 6.26 | 11.32 | Good
1HEF | 1.87 | 1.87 | 10.01 | 14.04 | Good
1TKA | 1.88 | 0.86 | 2.54 | 5.09 | Close
1BLH | 1.95 | 0.53 | 1.60 | 2.31 | Close
1RNE | 2.00 | 1.79 | 6.70 | 10.90 | Close
1EPB | 2.08 | 2.03 | 6.50 | 12.91 | Close
1IVE | 2.16 | 1.23 | 2.05 | 2.17 | Close
1AZM | 2.52 | 2.25 | 2.46 | 2.56 | Close
3GCH | 2.64 | 1.67 | 1.99 | 2.64 | Close
1EAP | 3.00 | 1.33 | 3.78 | 10.48 | Errors
1DID | 3.72 | 0.51 | 3.59 | 5.88 | Wrong
1ROB | 3.75 | 0.80 | 3.83 | 7.43 | Errors
1MUP | 3.96 | 3.41 | 4.10 | 4.58 | Wrong
1ACJ | 4.00 | 0.23 | 3.73 | 5.52 | Wrong
PDB Code | Rms Devn. of Top-Ranked Solution | Rms Devn. of Closest Solution | Average Rms Devn. of All Solutions | Rms Devn. of Worst Solution | Subjective Rating
1ABE | 0.86 | 0.73 | 1.12 | 3.06 | Good
1ACO | 0.86 | 0.80 | 1.49 | 3.43 | Good
1COY | 0.86 | 0.54 | 3.15 | 6.63 | Good
8GCH | 0.86 | 0.86 | 5.84 | 8.54 | Good
1LST | 0.87 | 0.47 | 0.84 | 1.07 | Good
1XID | 0.92 | 0.92 | 1.95 | 2.38 | Close
2SIM | 0.92 | 0.73 | 1.20 | 1.56 | Good
1HDY | 0.94 | 0.79 | 1.30 | 2.08 | Good
3PTB | 0.96 | 0.64 | 0.91 | 1.78 | Good
1HSL | 0.97 | 0.63 | 0.81 | 0.97 | Good
2CGR | 0.99 | 0.82 | 0.98 | 1.05 | Good
1LDM | 1.00 | 1.00 | 1.00 | 1.00 | Close
1MRK | 1.01 | 0.74 | 1.45 | 5.86 | Good
1DIE | 1.03 | 0.86 | 1.94 | 3.82 | Close
6ABP | 1.08 | 0.27 | 0.99 | 3.05 | Close
1HYT | 1.10 | 1.01 | 1.11 | 1.15 | Good
1AEC | 1.11 | 0.35 | 1.42 | 6.07 | Good
4PHV | 1.11 | 1.02 | 5.74 | 12.87 | Good
3HVT | 1.12 | 1.12 | 4.25 | 4.81 | Close
1DBB | 1.17 | 0.43 | 4.86 | 11.48 | Good
2YHX | 1.19 | 1.12 | 2.99 | 8.58 | Close
6RNT | 1.20 | 0.72 | 4.16 | 8.17 | Close
1PHA | 1.24 | 0.86 | 2.88 | 6.14 | Close
1POC | 1.27 | 1.20 | 2.73 | 12.37 | Good
2DBL | 1.31 | 1.29 | 8.65 | 16.31 | Close
2PK4 | 1.34 | 1.11 | 1.83 | 7.01 | Close
8.2.1 Setting Up a Distance Constraint (see page 70)
8.2.2 Method Used for Substructure-Based Distance Constraints (see page 71)
8.2.3 Setting Up Substructure-Based Distance Constraints (see page 72)
8.2.1 Setting Up a Distance Constraint
A distance between a specified ligand and protein atom (or between two ligand atoms) can be
constrained to lie between minimum and maximum distance bounds.
During a GOLD run, if a constrained distance is found to lie outside its bounds, a spring energy
term is used to reduce the fitness score, i.e.
$$E = k x^2$$
where:
x is the difference between the distance and the closest constraint bound;
k is a user-defined spring constant.
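A minimal Python sketch of the spring term, assuming x is measured to the nearer bound and is zero whenever the distance lies inside the bounds. The function name `spring_penalty` is illustrative only, and how GOLD combines this term with the rest of the fitness score is not shown.

```python
def spring_penalty(distance: float, d_min: float, d_max: float, k: float) -> float:
    """Spring energy E = k * x**2 for a distance constraint (illustrative sketch).

    x is the difference between the distance and the closest bound;
    no penalty is applied while the distance lies within [d_min, d_max].
    """
    if d_min > d_max:
        raise ValueError("d_min must not exceed d_max")
    if distance < d_min:
        x = d_min - distance
    elif distance > d_max:
        x = distance - d_max
    else:
        x = 0.0
    return k * x * x


# Bounds 2.5-3.5 Å, spring constant 5.0; the distance is 1.0 Å above the upper bound
print(spring_penalty(4.5, 2.5, 3.5, 5.0))  # prints 5.0
```

Because the penalty grows with the square of the violation, small excursions outside the bounds cost little while large ones are strongly discouraged.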
To constrain a distance, click on the Edit Constraints button to bring up the Constraint Editor.
Then, select Distance Constraint from the list of constraint types.
Specify the required settings using the protein and ligand atom numbers as defined in the MOL2
input files (if PDB input is used, use the sequence number). The maximum and minimum
separation of the constrained atoms must be entered (distances are in Å), and the spring constant
must also be specified. For example:
If the specified ligand atom is topologically equivalent to other atoms in the ligand (e.g. it is one
of the oxygen atoms of an ionised carboxylate group), GOLD will automatically compute the
constraint term using whichever of the equivalent atoms gives the best value.
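The "best equivalent atom" rule can be sketched as follows. This is an illustration under stated assumptions: the coordinates of the topologically equivalent atoms are already known, and "best" is taken to mean the smallest spring penalty. `best_equivalent_term` is a hypothetical name, not part of GOLD.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def best_equivalent_term(
    protein_xyz: Point,
    equivalent_xyz: Sequence[Point],
    d_min: float,
    d_max: float,
    k: float,
) -> Tuple[int, float]:
    """Return (index, penalty) for the equivalent ligand atom giving the
    smallest spring penalty (assumed meaning of "best")."""
    best = None
    for i, xyz in enumerate(equivalent_xyz):
        d = math.dist(protein_xyz, xyz)
        x = max(d_min - d, 0.0, d - d_max)  # zero inside the bounds
        penalty = k * x * x
        if best is None or penalty < best[1]:
            best = (i, penalty)
    if best is None:
        raise ValueError("no equivalent atoms supplied")
    return best


# Two carboxylate oxygens; only the second lies within 2.5-3.2 Å of the protein atom
print(best_equivalent_term((0.0, 0.0, 0.0), [(4.0, 0.0, 0.0), (3.0, 0.0, 0.0)], 2.5, 3.2, 5.0))
```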
Click on the Add constraint or Update selected constraint button to add the constraint definition
to the Current Constraints (see Section 8.1, page 68).
8.2.2 Method Used for Substructure-Based Distance Constraints
It is possible to apply a distance constraint to multiple ligands which have a common functional
group.
The constraint forces GOLD to limit the distance between a protein atom and one atom of this
functional group. Docking solutions will be biased towards the specified distance range.
During docking the constraint will be applied to any ligands which contain the specified
substructure (matching is performed on the basis of the atom types and 2D connectivity) and the
resulting solutions will be biased towards the specified distance range. GOLD always accounts
for topology in the substructure.
Rms deviations between GOLD predictions and observed ligand positions
PDB Code | Rms Devn. of Top-Ranked Solution | Rms Devn. of Closest Solution | Average Rms Devn. of All Solutions | Rms Devn. of Worst Solution | Subjective Rating
1ULB | 0.32 | 0.32 | 0.38 | 0.53 | Good
2CTC | 0.32 | 0.24 | 0.38 | 1.94 | Good
1MDR | 0.36 | 0.36 | 0.50 | 0.65 | Good
2ADA | 0.40 | 0.40 | 0.47 | 6.20 | Good
1SRJ | 0.42 | 0.42 | 4.86 | 1.11 | Good
3AAH | 0.42 | 0.36 | 0.66 | 0.49 | Good
1TPP | 0.43 | 0.37 | 0.43 | 0.61 | Good
1ASE | 0.49 | 0.36 | 0.60 | 1.31 | Good
1AHA | 0.51 | 0.51 | 0.51 | 0.51 | Good
1CBX | 0.54 | 0.49 | 0.53 | 0.58 | Good
1PBD | 0.57 | 0.18 | 0.45 | 0.70 | Good
2CHT | 0.59 | 0.57 | 0.62 | 0.85 | Good
1STP | 0.69 | 0.56 | 0.67 | 0.98 | Good
1XIE | 0.69 | 0.69 | 2.20 | 4.93 | Good
1FKI | 0.71 | 0.71 | 1.81 | 6.22 | Good
1DBJ | 0.72 | 0.39 | 4.16 | 6.13 | Good
2PHH | 0.72 | 0.63 | 0.68 | 0.73 | Good
1SLT | 0.78 | 0.78 | 6.64 | 8.43 | Good
7TIM | 0.78 | 0.64 | 0.81 | 1.71 | Good
3TPI | 0.80 | 0.36 | 0.91 | 1.98 | Good
1ACM | 0.81 | 0.79 | 1.01 | 1.23 | Good
1CPS | 0.84 | 0.60 | 1.91 | 6.56 | Good
1PHD | 0.85 | 0.32 | 0.85 | 2.15 | Good
Appendix C: GOLD Predictions in First Series of Validation Tests
3D plots of individual predictions are available on the CCDC web page.
The tables in this section list:
Subjective classification of GOLD predictions (see page 153)
Rms deviations between GOLD predictions and observed ligand positions (see page 154)
Correlation of subjective classification with rms deviation (see page 158)
Correlation of prediction quality with number of heavy atoms in ligand (see page 158)
Correlation of prediction quality with percentage of heavy atoms in ligand that can form
hydrogen bonds (see page 158)
Correlation of prediction quality with number of flexible torsions in ligand (see page 158)
Correlation of prediction quality with protein resolution (see page 159)
Subjective classification of GOLD predictions
Subjective Result | No. | PDB Codes
Good | 41 | 1ABE 1ACM 1ACO 1CBX 1COY 1CPS 1DBB 1DBJ 1FKG 1FKI 1HDY 1HEF 1HYT 1LST 1MDR 1MRK 1PBD 1PHD 1POC 1SRJ 1STP 1TPP 1ULB 1XIE 2ADA 2CGR 2CHT 2CTC 2PHH 2SIM 3AAH 3PTB 3TPI 4DFR 4PHV 7TIM 8GCH 1AEC 1AHA 1ASE 1HSL
Close | 30 | 1BLH 1DIE 1DR1 1DWD 1EPB 1GHB 1GLQ 1IDA 1IVE 1LDM 1PHA 1PHG 1RNE 1SLT 1TKA 1TMN 1XID 2DBL 2PK4 2YHX 3CPA 3GCH 3HVT 4CTS 5P2P 6ABP 6RNT 1APT 1AZM 4EST
Some significant errors | 9 | 1BAF 1EAP 1ETR 1HDC 1LIC 1RDS 1ROB 6RSA 1ACK
Wrong | 19 | 1AAQ 1ACJ 1DID 1EED 1ETA 1HRI 1ICN 1IGJ 1MCR 1MUP 2R07 1NIS 1TDB 2AK3 2MTH 2PLV 3CLA 4FAB 2MCP
Note: the substructure must be a sub-graph rather than a complete molecule.
As with normal distance constraints (see Section 8.2.1, page 70), the score is reduced for
unfavourable ligand solutions. The amount of decrease in the score is determined by a weight
term that the user must supply.
8.2.3 Setting Up Substructure-Based Distance Constraints
To use a substructure-based distance constraint, first create a file containing the substructure in
MOL2 format (e.g. substructure.mol2). It is recommended that you set atom types manually (see
Section 5.3, page 37) since an incomplete fragment can cause problems with automatic atom-
typing. The actual conformation of the group in this file is not important, as only the atom types
and 2D connectivity will be used.
Click on the Edit Constraints button to bring up the Constraint Editor. Then, select Substructure
Constraint from the list of constraint types.
Click on the Substructure file name button, then select the substructure file and hit Open.
Enter the Protein atom number and Substructure atom number to which the distance constraint
applies (numbering as in the MOL2 files).
Specify the allowed range of separation by entering a Maximum separation and a Minimum
separation (distances are in Å).
Enter the spring constant (i.e. the weight of the term). This causes a spring-based distance
constraint to be added for the specified substructure atom and protein atom. The weight specifies
the spring energy term; usually, a weight in the range of 5 to 10 will work well.
It is possible to define a distance constraint from a centroid of a ring in the ligand. To do this
specify an atom within the ring of interest and enable the Use ring center nearest to selected atom
in ligand check-box. The closest ring center to the selected atom will be used.
Note: when defining a distance constraint involving a ring center ensure that the maximum and
minimum separations are adjusted accordingly.
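The ring-centre option can be pictured with the sketch below. It assumes each ring is given as a list of atom coordinates and that the ring centre is the unweighted centroid of its atoms; the guide does not define how GOLD computes the centre, so treat this as an assumption. `ring_centroid` and `nearest_ring_centre` are hypothetical helpers.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def ring_centroid(ring: Sequence[Point]) -> Point:
    """Unweighted centroid of a ring's atom coordinates (assumed definition)."""
    n = len(ring)
    return (
        sum(p[0] for p in ring) / n,
        sum(p[1] for p in ring) / n,
        sum(p[2] for p in ring) / n,
    )


def nearest_ring_centre(atom_xyz: Point, rings: Sequence[Sequence[Point]]) -> Point:
    """Return the ring centre closest to the selected atom.

    For an atom in a fused ring system, the nearer ring centre is chosen.
    """
    if not rings:
        raise ValueError("no rings supplied")
    centres = [ring_centroid(r) for r in rings]
    return min(centres, key=lambda c: math.dist(atom_xyz, c))


# Two fused square "rings" sharing an edge; the selected atom belongs to the second
ring_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
ring_b = [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (2.0, 1.0, 0.0), (1.0, 1.0, 0.0)]
print(nearest_ring_centre((2.0, 0.0, 0.0), [ring_a, ring_b]))
```

As the note above says, the separation bounds then refer to the ring centre rather than to an atom, so they usually need to be widened or shifted.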
If the constraint refers to a substructure atom (and therefore a ligand atom) which is
topologically equivalent to other atoms (e.g. it is one of the oxygen atoms of an ionised
carboxylate group), GOLD will automatically compute the constraint term using whichever of
the equivalent atoms gives the best value.
Click on the Add constraint or Update selected constraint button to add the constraint definition
to the Current Constraints (see Section 8.1, page 68).
8.3 Hydrogen Bond Constraints
Two types of hydrogen bond constraints may be specified:
A hydrogen bond constraint: H Bond Constraint (see Section 8.3.1, page 73), which can be
used to force a hydrogen bond between a particular protein atom and a particular ligand atom.
A protein hydrogen bond constraint: Protein H Bond Constraint (see Section 8.3.3, page 75),
which can be used to specify that a particular protein atom should be hydrogen-bonded to the
ligand, but without specifying to which ligand atom.
8.3.1 Setting Up Hydrogen Bond Constraints (see page 73)
8.3.2 Method Used for Protein H Bond Constraints (see page 74)
8.3.3 Setting up Protein H Bond Constraints (see page 75)
8.3.1 Setting Up Hydrogen Bond Constraints
A ligand atom may be constrained to form a hydrogen bond to a particular protein atom. One
atom should be a donatable hydrogen atom (you must give the number of the hydrogen atom, not
the O or N atom to which it is attached) and the other should be an acceptor. The protein atom
should be available for ligand binding (i.e. solvent accessible).
Note: this constraint does not work with metals.
The constraint is incorporated into the least-squares fitting routine used by GOLD. Thus, when
least-squares fitting is used to dock the ligand (by attempting to form hydrogen bonds encoded
within the chromosome) the constraint is added to the least-squares mapping. The constraint has
Note: Certain docking-score terms are the product of a term dependent on the magnitude of a
particular physical contribution (e.g. hydrogen bonding) and a scale factor determined e.g. by a
regression coefficient.
The docking-score term descriptors included in the output file can therefore consist of weighted
terms, non-weighted terms or both (as specified in the GOLD Output Preferences).
Weighted terms will be indicated as such in the tag name, e.g.
Gold.Chemscore.Hbond.Weighted.
Name | Explanation | See
Gold.Goldscore.Internal.Correction | Internal ligand energy offset | Section 6.9, page 62
Gold.Chemscore.ZeroCoef | The Chemscore zero coefficient | Section 6.4.1, page 49
Gold.Chemscore.Rot | Rotatable-bond freezing term contribution to Chemscore value | Section 6.4.5, page 56
Gold.Chemscore.Fitness | Total Chemscore fitness value of docked ligand | Section 6.4.1, page 49
Gold.Chemscore.Hbond | Protein-ligand H-bond contribution to Chemscore value | Section 6.4.3, page 52
Gold.Chemscore.Lipo | Protein-ligand lipophilic contribution to the Chemscore value | Section 6.4.4, page 54
Gold.Chemscore.Metal | Metal-binding contribution to Chemscore value | Section 6.4.4, page 54
Gold.Chemscore.internal_Hbond | Internal ligand intramolecular H-bond contribution to Chemscore value | Section 6.4.3, page 52
Gold.Chemscore.DEClash | Protein-ligand clash penalty to the Chemscore value | Section 6.4.6, page 56
Gold.Chemscore.DEInternal | Internal ligand torsional strain penalty to the Chemscore value | Section 6.4.6, page 56
Gold.Chemscore.DG | Free energy change (that occurs on ligand binding) contribution to Chemscore value | Section 6.4.1, page 49
Gold.Chemscore.Covalent | Covalent bonding contribution to Chemscore value | Section 6.4.7, page 58
Gold.Chemscore.Constraint | Constraint contribution to Chemscore value | Section 6.4.8, page 58
Gold.Chemscore.CHOScore | Contribution for weak CH...O H-bonds | Section 6.7, page 59
Gold.Chemscore.Internal.Correction | Internal ligand energy offset | Section 6.9, page 62
Appendix B: Additional Tags in Output Files
Solution output files for the docked ligand(s) can contain additional information such as the
scoring function terms and the rotated protein hydrogen atom positions that were generated
during the docking.
This information can be written to SD file tags; for MOL2 files, these tags are written to
comment blocks. This additional information is particularly important when post-processing
docking results with SILVER. It is possible to control the information written to solution files
from the Output Preferences window (see Section 14.2, page 111).
The table below lists the tag names that you are likely to see in GOLD solution files:
Name | Explanation | See
Gold.Protein.ActiveResidues | List of protein residues used to define the binding site | Section 3.8.5, page 27
Gold.Protein.RotatedAtoms | Optimised positions of polar protein hydrogen atoms that are generated during docking | Section 14.6, page 115
Gold.Protein.RotatedWaterAtoms | Optimised positions of water hydrogen atoms generated during docking | Section 3.4, page 16
Gold.Protein.RotatedTorsions | Optimised torsions for rotatable bonds in the ligand, and for protein side chain torsions which have been specified as being allowed to rotate during docking | Section 3.6, page 18
Gold.Id.Protein | Enables a solution to be associated with its protein |
Gold.Goldscore.Fitness | Total GoldScore fitness value of docked ligand | Section 6.2, page 46
Gold.Goldscore.External.Hbond | Protein-ligand H-bond contribution to GoldScore value | Section 6.2, page 46
Gold.Goldscore.External.Vdw | Protein-ligand vdw contribution to GoldScore value | Section 6.2, page 46
Gold.Goldscore.Internal.Hbond | Internal ligand intramolecular H-bond contribution to GoldScore value | Section 6.2, page 46
Gold.Goldscore.Internal.Vdw | Internal ligand vdw contribution to GoldScore value | Section 6.2, page 46
Gold.Goldscore.Internal.Torsion | Internal ligand torsion-strain contribution to GoldScore value | Section 6.2, page 46
Gold.Goldscore.Covalent.Energy | Covalent bonding contribution to GoldScore value | Section 6.2, page 46
Gold.Goldscore.Constraint.Score | Constraint contribution to GoldScore value | Section 6.2, page 46
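Because these tags are ordinary SD data items, they can be read back without GOLD, for example to tabulate score terms. The sketch below assumes the standard SD data-item layout (a `> <TAG>` line, one or more value lines, then a blank line, with records separated by `$$$$`); it does not handle the comment blocks used in MOL2 output. The function name `read_sd_tags` and the sample record values are invented for illustration.

```python
import re
from typing import Dict, List

# A data-item header line such as "> <Gold.Goldscore.Fitness>"
TAG_LINE = re.compile(r"^>.*<([^>]+)>")


def read_sd_tags(text: str) -> List[Dict[str, str]]:
    """Return one {tag: value} dict per SD record."""
    records: List[Dict[str, str]] = []
    for block in text.split("$$$$"):
        if not block.strip():
            continue  # whitespace after the last record
        tags: Dict[str, str] = {}
        lines = block.splitlines()
        i = 0
        while i < len(lines):
            match = TAG_LINE.match(lines[i])
            i += 1
            if match:
                values = []
                # the value runs until the next blank line
                while i < len(lines) and lines[i].strip():
                    values.append(lines[i])
                    i += 1
                tags[match.group(1)] = "\n".join(values)
        records.append(tags)
    return records


# Invented sample record (not real GOLD output)
sample = (
    "ligand_1\n  GOLD\n\n"
    "  0  0  0  0  0  0  0  0  0  0999 V2000\nM  END\n"
    "> <Gold.Goldscore.Fitness>\n52.31\n\n"
    "> <Gold.Protein.ActiveResidues>\nHIS57 SER195\n\n"
    "$$$$\n"
)
for tags in read_sd_tags(sample):
    goldscore = {k: v for k, v in tags.items() if k.startswith("Gold.Goldscore.")}
    print(goldscore)
```

Filtering on the tag prefix (or on a `.Weighted` suffix) is a simple way to separate the GoldScore, ChemScore and weighted terms described in this appendix.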
a weight of 5 relative to a normal hydrogen bond taken from the chromosome.
To specify a hydrogen bond constraint, click on the Edit Constraints button to bring up the
Constraint Editor. Then, select H-Bond Constraint from the list of constraint types.
Specify the ligand and protein atom numbers as defined in the MOL2 input files (if PDB input is
used, use the sequence number):
The hydrogen bond constraint weighting can be altered within the # FITNESS FUNCTION
section of the GOLD parameters file by changing the value of the parameter CONSTRAINT_WT.
Click on the Add constraint or Update selected constraint button to add the constraint definition
to the Current Constraints (see Section 8.1, page 68).
8.3.2 Method Used for Protein H Bond Constraints
A protein hydrogen bond constraint can be used to specify that a particular protein atom should
be hydrogen-bonded to the ligand, but without specifying to which ligand atom.
GOLD will be biased towards finding solutions in which the specified protein atoms form
hydrogen bonds. The fitness score of a given docking will be penalised by a user specified value
c for every protein H-bond constraint that is not satisfied (i.e. for every protein atom that you
have specified should form a hydrogen bond but does not).
GOLD assesses the geometry of each required hydrogen bond on a scale of 0 to 1, where 1
denotes perfect geometry. If this geometry weight for a constrained H-bond falls below the
Minimum H-bond geometry weight specified by the user, the interaction is not considered to be
an H-bond, and a penalty is applied to the fitness score for the unfulfilled hydrogen bond. The
magnitude of this penalty is equal to the weight specified for the constraint.
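The penalty rule above can be sketched as follows, assuming each constraint is summarised by the best geometry weight (0 to 1) achieved in the current pose and by its user-specified constraint weight. The default of 0.005 mirrors the default Minimum H bond geometry weight given in Section 8.3.3; the function name is hypothetical and this is not GOLD's own code.

```python
from typing import Iterable, Tuple


def protein_hbond_penalty(
    constraints: Iterable[Tuple[float, float]],
    min_geometry_weight: float = 0.005,
) -> float:
    """Total penalty for unsatisfied protein H-bond constraints.

    Each item is (geometry_weight, constraint_weight): the best H-bond geometry
    weight (0 = none, 1 = perfect) found for that constraint in the pose, and the
    user-specified weight that is applied as a penalty if the H-bond is not formed.
    """
    penalty = 0.0
    for geometry_weight, constraint_weight in constraints:
        if not 0.0 <= geometry_weight <= 1.0:
            raise ValueError("geometry weight must lie between 0 and 1")
        if geometry_weight < min_geometry_weight:  # not counted as an H-bond
            penalty += constraint_weight
    return penalty


# Three constraints: one well formed, one absent, one too poor to count
print(protein_hbond_penalty([(0.9, 5.0), (0.0, 5.0), (0.004, 10.0)]))  # prints 15.0
```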
Each trial ligand docking in a genetic algorithm run is generated by a least-squares fit of
mapping points (H-bonding or hydrophobic binding points on the protein with complementary
points on the ligand). The inclusion of a protein H-bond constraint ensures that at least one of
the specified protein atoms is included as one of the mapping points, i.e. use of the specified
points is enforced at the mapping stage of the algorithm.
If a ligand simply does not contain sufficient complementary hydrogen-bonding atom(s) to
satisfy the specified protein H-bond constraints (e.g. you require an H-bond to a protein acceptor
but the ligand contains no donors), then GOLD can be set up not to dock ligands when the
specified constraint is physically impossible to satisfy (see Section 8.1, page 68).
8.3.3 Setting up Protein H Bond Constraints
A protein hydrogen bond constraint can be used to specify that a particular protein atom should
be hydrogen-bonded to the ligand, but without specifying to which ligand atom.
To do this, click on the Edit Constraints button to bring up the Constraint Editor. Then, select
Protein H-Bond Constraint from the list of constraint types.
Specify which protein atoms are to form hydrogen bonds by typing their atom numbers, as
defined in the MOL2 input file, into the Protein atom required to form H-bond entry box.
Note: Either a donatable hydrogen atom (you must give the number of the hydrogen atom, not
the O or N atom to which it is attached) or an acceptor can be specified. The protein atom should
be available for ligand binding (e.g. solvent accessible). This constraint does not work with
metals.
Bond types:
single 1
double 2
triple 3
aromatic ar
amide am
delocalised, e.g. in carboxylate, guanidinium ar
Appendix A: List of Atom and Bond Types
GOLD uses SYBYL atom and bond types as follows:
Atom types:
Hydrogen H
Carbon sp3 C.3
Carbon sp2 C.2
Carbon sp C.1
Carbon aromatic C.ar
Carbocation (guanidinium) C.cat
Nitrogen sp3 N.3
Nitrogen sp2 N.2
Nitrogen sp N.1
Nitrogen aromatic, e.g. in pyridine N.ar
Nitrogen amide N.am
Nitrogen trigonal planar, e.g. in nitro, pyrrole N.pl3
Nitrogen sp3 positively charged, e.g. in lysine N.4
Oxygen sp3 O.3
Oxygen sp2 O.2
Oxygen in carboxylates and phosphates O.co2
Sulphur sp3 S.3
Sulphur sp2 S.2
Sulphoxide sulphur S.o
Sulphone sulphur S.o2
Phosphorus sp3 P.3
Halogens, metals: normal element symbols, e.g. F, Cl, Ca, Zn
The Constraint weight is the strength of bias applied to the formation of a specified hydrogen
bond in the least squares mapping algorithm within GOLD. The Constraint weight is also the
value of the penalty applied to the fitness score for each constrained H bond that is not formed.
The Minimum H bond geometry weight is a user-defined score that determines how good a
hydrogen bonding interaction has to be in order for it to be considered a hydrogen bond by
GOLD. It takes values from 0 to 1; by default it is set to 0.005.
For a given protein H bond constraint more than one protein atom number can be entered in the
Protein atom entry box. This will instruct GOLD to use an either-or type of constraint during
docking. For example, specifying two protein atoms, acceptor m and acceptor n, separated by a
space, will result in the constraint being satisfied if an H bond is formed to either m or n during
docking. This is of use when defining constraints involving, for example, carboxylates where it
is not important which oxygen atom forms an H bond, provided that one of them does.
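The either-or behaviour can be sketched as below. Assumptions: the entry box holds space-separated protein atom numbers, and the geometry weight of the best H-bond to each protein atom in the pose is already known. The atom numbers 1201, 1202 and 1305 and both helper names are hypothetical.

```python
from typing import Dict, List


def parse_protein_atoms(entry: str) -> List[int]:
    """Split the space-separated contents of the protein atom entry box."""
    atoms = [int(token) for token in entry.split()]
    if not atoms:
        raise ValueError("no protein atoms given")
    return atoms


def either_or_penalty(
    entry: str,
    geometry_weight_by_atom: Dict[int, float],
    constraint_weight: float,
    min_geometry_weight: float = 0.005,
) -> float:
    """Return 0 if any listed protein atom forms an H-bond, else the constraint weight."""
    satisfied = any(
        geometry_weight_by_atom.get(atom, 0.0) >= min_geometry_weight
        for atom in parse_protein_atoms(entry)
    )
    return 0.0 if satisfied else constraint_weight


# Hypothetical carboxylate oxygens 1201 and 1202: an H-bond to either one is enough
print(either_or_penalty("1201 1202", {1202: 0.7}, 5.0))  # prints 0.0
print(either_or_penalty("1201 1202", {1305: 0.9}, 5.0))  # prints 5.0
```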
Click on the Add constraint or Update selected constraint button to add the constraint definition
to the Current Constraints (see Section 8.1, page 68). Using the Constraints Editor it is possible
to specify several different protein H bond constraints, with different weights for each
constraint.
8.4 Region (Hydrophobic) Constraints
This constraint can be used to bias the docking towards solutions in which particular regions of
the binding site are occupied by specific ligand atoms (or types of ligand atom, e.g. hydrophobic
atoms).
8.4.1 Method Used for Region (Hydrophobic) Constraints (see page 77)
8.4.2 Setting Up Region (Hydrophobic) Constraints (see page 77)
8.4.1 Method Used for Region (Hydrophobic) Constraints
This constraint can be used to bias the docking towards solutions in which particular regions of
the binding site are occupied by specific ligand atoms (or types of ligand atom).
For each region (hydrophobic) constraint specified, a sphere is placed at an explicitly defined
position (using x,y,z coordinates) within the binding site. Each sphere is assigned a user-defined
radius, so a sphere can be adjusted if required, e.g. to fill an entire pocket in the binding site.
The minimum radius that can be set is 0.5 Å.
A contribution (determined according to a user-specified weighting) is then added to the score
for each specified non-hydrogen ligand atom that lies within the designated sphere.
Note: A contribution is added to the score for each atom located within the sphere, (i.e. the total
contribution will depend on the number of atoms found in the region of interest and ultimately
the ligand-accessible volume of the region).
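A minimal sketch of the score added by one region constraint, assuming an atom counts as inside when its distance from the sphere centre is at most the radius (the guide does not say whether the boundary is inclusive). The function name and the input layout of (element, coordinates) pairs are illustrative.

```python
import math
from typing import Iterable, Tuple

Point = Tuple[float, float, float]


def region_score(
    atoms: Iterable[Tuple[str, Point]],
    centre: Point,
    radius: float,
    contribution: float,
) -> float:
    """Score added by one region constraint.

    atoms holds (element, xyz) for the ligand atoms selected for the constraint;
    every non-hydrogen atom inside the sphere adds `contribution`.
    """
    if radius < 0.5:
        raise ValueError("the minimum radius is 0.5 Å")
    inside = sum(
        1 for element, xyz in atoms
        if element != "H" and math.dist(xyz, centre) <= radius  # inclusive boundary assumed
    )
    return inside * contribution


ligand = [("C", (0.0, 0.0, 0.0)), ("C", (1.0, 0.0, 0.0)),
          ("H", (0.5, 0.0, 0.0)), ("C", (5.0, 0.0, 0.0))]
print(region_score(ligand, (0.0, 0.0, 0.0), 1.5, 2.0))  # prints 4.0
```

Two heavy atoms fall inside the sphere, so the contribution is counted twice; the hydrogen is ignored even though it is inside.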
The ligand atoms used in the constraint can be specified explicitly from a list of atom numbers
(as defined in the MOL2 input file). Alternatively, it is possible to use all hydrophobic ligand
atoms, or to use only those hydrophobic atoms in aromatic rings. Atoms considered to be
hydrophobic include:
Carbon atoms bound to at least two H or C atoms.
Atoms typed C.cat.
Atoms typed S.3 and bound to two carbons.
H atoms bound to an sp2, sp3 or aromatic carbon (Note: only heavy atoms found within the
sphere will contribute to the score).
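The hydrophobic-atom rules listed above can be written as a small predicate over SYBYL atom types. This sketch assumes each atom's type and the types of its bonded neighbours are available, and it reads "bound to two carbons" literally as exactly two; it is an illustration, not GOLD's own code.

```python
from typing import Sequence


def element(sybyl_type: str) -> str:
    """Element part of a SYBYL type, e.g. 'C.ar' -> 'C', 'Cl' -> 'Cl'."""
    return sybyl_type.split(".")[0]


def is_hydrophobic(atom_type: str, neighbour_types: Sequence[str]) -> bool:
    """Apply the hydrophobic-atom rules listed above to one atom."""
    if atom_type == "C.cat":
        return True
    el = element(atom_type)
    if el == "C":
        # carbon bound to at least two H or C atoms
        return sum(element(t) in ("C", "H") for t in neighbour_types) >= 2
    if atom_type == "S.3":
        # read literally: bound to exactly two carbons
        return sum(element(t) == "C" for t in neighbour_types) == 2
    if el == "H":
        # hydrogen on an sp2, sp3 or aromatic carbon
        return any(t in ("C.2", "C.3", "C.ar") for t in neighbour_types)
    return False


print(is_hydrophobic("C.3", ["C.3", "H", "H", "H"]))  # methyl carbon: True
print(is_hydrophobic("C.2", ["O.2", "N.am", "C.3"]))  # amide carbonyl carbon: False
```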
Details of the region (hydrophobic) constraint calculation, including the final contribution to the
fitness score, are given in the ligand log file (see Section 14.10, page 118).
8.4.2 Setting Up Region (Hydrophobic) Constraints
Click on the Edit Constraints button to bring up the Constraint Editor. Then, select Region
(Hydrophobic) Constraint from the list of constraint types.
20. Acknowledgments
GOLD was written by Gareth Jones (University of Sheffield, UK) in a DTI LINK collaboration
with GlaxoWellcome and the Cambridge Crystallographic Data Centre (CCDC).
Funding was provided by the Biotechnology and Biological Sciences Research Council, the
Department of Trade and Industry, the Medical Research Council, GlaxoWellcome Ltd and
CCDC.
Peter Willett (University of Sheffield), Robert Glen (Wellcome), Andrew Leach
(GlaxoWellcome) and Jacques Barbanton (Lipha Pharmaceuticals) are also thanked for
significant contributions to the development of GOLD.
ChemScore in GOLD was implemented by Astex Technology, Cambridge, UK.
CCDC staff involved in GOLD are Jason Cole, Simon Bowden and Robin Taylor.
One of the torsion libraries supplied with GOLD was developed by Gerhard Klebe and Thomas
Mietzner (BASF).
19. References
Molecular Recognition of Receptor Sites Using a Genetic Algorithm with a Description of
Desolvation
G. Jones, P. Willett and R. C. Glen
J. Mol. Biol., 245, 43-53, 1995
Development and Validation of a Genetic Algorithm for Flexible Docking
G. Jones, P. Willett, R. C. Glen, A. R. Leach and R. Taylor
J. Mol. Biol., 267, 727-748, 1997
A New Test Set for Validating Predictions of Protein-Ligand Interactions
J. W. M. Nissink, C. Murray, M. Hartshorn, M. L. Verdonk, J. C. Cole and R. Taylor
Proteins, 49(4), 457-471, 2002
Life-science Applications of the Cambridge Structural Database
R. Taylor
Acta Cryst., D58, 879-888, 2002
Improved Protein-Ligand Docking using GOLD
M. L. Verdonk, J. C. Cole, M. J. Hartshorn, C. W. Murray, R. D. Taylor
Proteins, 52, 609-623, 2003
Virtual Screening Using Protein-Ligand Docking: Avoiding Artificial Enrichment
Marcel L. Verdonk, Valerio Berdini, Michael J. Hartshorn, Wijnand T. M. Mooij, Christopher W.
Murray, Richard D. Taylor, and Paul Watson
J. Chem. Inf. Comput. Sci., 44, 793-806, 2004
Protein-Ligand Docking and Virtual Screening with GOLD
J. C. Cole, J. W. M. Nissink, R. Taylor
in Virtual Screening in Drug Discovery (Eds. B. Shoichet, J. Alvarez), Taylor & Francis CRC
Press, Boca Raton, Florida, USA, 2005
Modeling Water Molecules in Protein-Ligand Docking Using GOLD
Marcel L. Verdonk, Gianni Chessari, Jason C. Cole, Michael J. Hartshorn, Christopher W.
Murray, J. Willem M. Nissink, Richard D. Taylor, and Robin Taylor
J. Med. Chem., 48, 6504-6515, 2005
Comparing protein-ligand docking programs is difficult
Jason C. Cole, Christopher W. Murray, J. Willem M. Nissink, Richard D. Taylor, Robin Taylor
Proteins, 60, 325-332, 2005
Specify the ligand atoms to be used in the constraint by selecting either All hydrophobic ligand
atoms, Hydrophobic ligand atoms in aromatic rings, or User-specified list. If User-specified list is
selected, enter the ligand atom numbers (as defined in the MOL2 input file) into the Ligand
atoms entry box. Atom numbers should be separated by spaces.
Specify the position of the centre of the sphere (defined using x,y,z coordinates), and the radius
of the sphere (distances are in Å).
A score contribution must also be specified. This is the value that will be added to the fitness
score for each specified non-hydrogen ligand atom found within the sphere region.
Note: the total contribution added will therefore depend on the number of atoms located within
the sphere.
Click on the Add constraint or Update selected constraint button to add the constraint definition
to the Current Constraints (see Section 8.1, page 68). Using the Constraints Editor it is possible
to define multiple region (hydrophobic) constraints.
8.5 Template Similarity Constraints
This constraint can be used to bias the conformation of docked ligands towards a given solution,
or template.
8.5.1 Method Used for Template Similarity Constraints (see page 79)
8.5.2 Setting Up a Template Similarity Constraint (see page 79)
8.5.1 Method Used for Template Similarity Constraints
This constraint will bias the conformation of docked ligands towards a given solution. This
solution, or template, can, for example, be another ligand in a known conformation, a common
core (useful when docking ligands of a combinatorial set), or it may just be a large substructure
that is expected, or known, to bind in a certain way.
The template must be supplied as a MOL2 file or PDB file.
Unlike the distance-based constraints, which reduce the score for ligands that adopt
unfavourable orientations, this constraint will add an energy term to the score based on the
similarity between the ligand being docked and the template provided. The similarity between
the two is evaluated as a Gaussian overlap term.
The similarity constraint can be applied in three ways that differ in the way that the overlap
between ligand and template is calculated. The similarity can be evaluated:
by using the overlap between all donor atoms in the template and the ligand being docked.
by using the overlap between all acceptor atoms in the template and the ligand being docked.
by using the overlap of all atoms of the template (this can be regarded as a ligand-shape
constraint).
The energy term to be added is calculated as similarity times weight (the similarity value is
between 0 and 1, where 1 indicates identity of template and ligand).
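The guide does not give the functional form of the Gaussian overlap, so the sketch below rests on explicit assumptions: each atom is treated as a Gaussian with a hypothetical exponent `alpha`, overlaps are summed over all atom pairs, and the result is normalised so that identical point sets score 1 and non-overlapping sets score close to 0. For the donor or acceptor variants, pass only the donor or acceptor coordinates of ligand and template. All names here are illustrative.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def gaussian_overlap(a: Sequence[Point], b: Sequence[Point], alpha: float) -> float:
    """Sum of Gaussian overlaps exp(-alpha * r**2) over all atom pairs (assumed form)."""
    return sum(math.exp(-alpha * math.dist(p, q) ** 2) for p in a for q in b)


def similarity(ligand: Sequence[Point], template: Sequence[Point], alpha: float = 1.0) -> float:
    """Normalised overlap in [0, 1]: 1 for identical point sets, near 0 for no overlap."""
    if not ligand or not template:
        raise ValueError("empty point set")
    o_lt = gaussian_overlap(ligand, template, alpha)
    o_ll = gaussian_overlap(ligand, ligand, alpha)
    o_tt = gaussian_overlap(template, template, alpha)
    return o_lt / math.sqrt(o_ll * o_tt)


def template_term(ligand: Sequence[Point], template: Sequence[Point],
                  weight: float, alpha: float = 1.0) -> float:
    """Energy term added to the score: similarity times weight."""
    return weight * similarity(ligand, template, alpha)


template = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
shifted = [(x + 0.5, y, z) for x, y, z in template]
print(template_term(template, template, 10.0))  # perfect overlap gives the full weight
print(template_term(shifted, template, 10.0))   # partial overlap gives less
```

Normalising by the self-overlaps keeps the similarity between 0 and 1, so the weight directly sets the largest term that can be added, in line with the description above.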
Note: If you wish to place a fragment at an exact specified position in the binding site, as
opposed to biasing the docking, use the scaffold match constraint (see Section 8.6, page 80).
8.5.2 Setting Up a Template Similarity Constraint
Click on the Edit Constraints button to bring up the Constraint Editor. Then, select Template
Similarity Constraint from the list of constraint types.
Fill in the form to specify the similarity type to be used [H-bond donor overlap, H-bond-acceptor
overlap, or shape overlap (see Section 8.5.1, page 79)]; the similarity template file; and the
weight of the constraint.
identify_ligand.py can be invoked from the command line. The structure of the command is:
identify_ligand.py <ligand data file> <ligand number>
Note: identify_ligand.py is a Python script and as such requires a working installation of Python
(http://www.python.org).
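To illustrate the lookup described in Section 18.4, here is a rough, hypothetical re-implementation. It is not the identify_ligand.py script shipped with GOLD. It handles SD and MOL2 text only (not PDB), treats the first line of an SD record and the line after `@<TRIPOS>MOLECULE` as the ligand name, and chooses the format from the file extension.

```python
import sys


def nth_ligand_name(text: str, n: int, fmt: str) -> str:
    """Return the name of the nth (1-based) ligand in SD or MOL2 text."""
    if n < 1:
        raise ValueError("ligand numbers start at 1")
    text = text.replace("\r\n", "\n")
    if fmt == "sdf":
        records = [r for r in text.split("$$$$") if r.strip()]
        record = records[n - 1]
        if record.startswith("\n"):
            record = record[1:]  # newline left over from the previous "$$$$"
        return record.split("\n", 1)[0].strip()  # first line holds the name
    if fmt == "mol2":
        blocks = text.split("@<TRIPOS>MOLECULE")[1:]
        return blocks[n - 1].split("\n")[1].strip()  # line after the record header
    raise ValueError(f"unsupported format: {fmt}")


if __name__ == "__main__" and len(sys.argv) == 3:
    path, number = sys.argv[1], int(sys.argv[2])
    fmt = "mol2" if path.lower().endswith(".mol2") else "sdf"
    with open(path) as handle:
        print(nth_ligand_name(handle.read(), number, fmt))
```

Usage mirrors the command structure above, e.g. `python this_sketch.py ligands.sdf 3`, where both file names are placeholders.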
For example, the table of rms deviations below for nine dockings of a molecule produces the
following clustering with the complete linkage method:
18.4 identify_ligand.py
identify_ligand.py can be used to extract a specific ligand description from PDB SDFile or
MOL2 format input files.
It requires a filename and a ligand number (n) as arguments and then locates the nth ligand in the
file. If any descriptive information, such as the ligand name, is available for that ligand, it is then
displayed.
     2    3    4    5    6    7    8    9
1    0.8  1.1  1.0  1.0  1.4  2.3  5.0  4.6
2         0.9  1.1  1.1  1.2  2.3  5.2  4.6
3              0.4  0.8  0.9  2.3  5.0  4.5
4                   0.6  1.1  2.3  4.9  4.5
5                        1.3  2.0  4.9  4.5
6                             1.8  5.1  4.4
7                                  5.3  4.5
8                                       2.4

Step   Distance between clusters being merged   Clusters
1      0.40    1 | 2 | 3, 4 | 9 | 5 | 6 | 7 | 8 |
2      0.84    1 | 2 | 7 | 3, 4, 5 | 9 | 8 | 6 |
3      0.84    1, 2 | 7 | 3, 4, 5 | 9 | 8 | 6 |
4      1.13    1, 2, 3, 4, 5 | 7 | 6 | 9 | 8 |
5      1.42    1, 2, 3, 4, 5, 6 | 7 | 8 | 9 |
6      2.35    1, 2, 3, 4, 5, 6, 7 | 9 | 8 |
7      2.38    1, 2, 3, 4, 5, 6, 7 | 8, 9 |
8      5.28    1, 2, 3, 4, 5, 6, 7, 8, 9 |
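The complete-linkage procedure can be reproduced from the rms deviation matrix above with a short Python sketch. Because the printed matrix is rounded to one decimal place, the merge distances come out as 0.4, 0.8, 0.8, 1.1, 1.4, 2.3, 2.4 and 5.3 rather than the two-decimal values in the table, and the two merges at 0.8 happen in the opposite order to steps 2 and 3 (a tie in the rounded data). Function and variable names are illustrative.

```python
from typing import Dict, FrozenSet, List, Tuple

# Upper triangle of the rms deviation matrix above (Å), keyed by docking number
ROWS = {
    1: [0.8, 1.1, 1.0, 1.0, 1.4, 2.3, 5.0, 4.6],
    2: [0.9, 1.1, 1.1, 1.2, 2.3, 5.2, 4.6],
    3: [0.4, 0.8, 0.9, 2.3, 5.0, 4.5],
    4: [0.6, 1.1, 2.3, 4.9, 4.5],
    5: [1.3, 2.0, 4.9, 4.5],
    6: [1.8, 5.1, 4.4],
    7: [5.3, 4.5],
    8: [2.4],
}
DIST: Dict[FrozenSet[int], float] = {
    frozenset((i, j)): d
    for i, row in ROWS.items()
    for j, d in enumerate(row, start=i + 1)
}


def complete_linkage(
    items: List[int], dist: Dict[FrozenSet[int], float]
) -> List[Tuple[float, FrozenSet[int]]]:
    """Agglomerative clustering where the distance between two clusters is
    the largest pairwise distance between their members."""
    clusters = [frozenset([i]) for i in items]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = max(dist[frozenset((i, j))] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merged = clusters[a] | clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
        merges.append((d, merged))
    return merges


for step, (d, cluster) in enumerate(complete_linkage(list(range(1, 10)), DIST), start=1):
    print(step, d, sorted(cluster))
```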
The similarity template file should contain the template molecule or fragment in its docked
position (i.e. expressed with respect to the same coordinate frame as the protein and with the
coordinates required to place it in the correct pose).
The weight term determines the maximum energy term that would be added to the score in the
case of perfect overlap between ligand and template. As an initial value for this term, we suggest
a value between 5 and 30.
Click on the Add constraint or Update selected constraint button to add the constraint definition
to the Current Constraints (see Section 8.1, page 68). Using the Constraints Editor it is possible
to define multiple constraints, e.g. one for donors and one for acceptors.
8.6 Scaffold Match Constraint
The scaffold match constraint can be used to place a fragment at an exact specified position in
the binding site; the geometry of the fragment will not be altered during docking.
8.6.1 Method Used for Scaffold Match Constraint (see page 81)
8.6.2 Setting Up Scaffold Match Constraints (see page 81)
8.6.1 Method Used for Scaffold Match Constraint
This constraint will attempt to place a ligand onto a given scaffold location. The scaffold can,
for example, be a common core or fragment (useful when docking ligands of a combinatorial
set), or it may just be a substructure known to adopt a certain binding position.
The scaffold must be supplied as a MOL2 file. The file should contain the scaffold fragment in its
docked position (i.e. expressed in the same coordinate frame as the protein and with the
coordinates required to place it in the correct pose).
Note: It is important that the Sybyl atom and bond types in the scaffold mol2 file match those in
the scaffold portion of the ligand. The scaffold matching algorithm matches heavy atoms only.
However, it is recommended that the scaffold have hydrogens correctly placed on all appropriate
atoms other than at the unfulfilled valency of the substitution point, which must not be blocked by
hydrogen.
Unlike the template similarity constraint, which will bias the docking by adding an energy term
to the score based on the similarity between the ligand being docked and the template provided,
this constraint is enforced at the mapping stage in GOLD. Ligand placements are generated
using a best least-squares fit to the scaffold heavy-atom positions; in other words, this constraint
forces all atoms in the matching portion of the ligand to lie very close to, or coincide with, the
corresponding scaffold atoms. There is no S(con) contribution to the fitness score to bias dockings.
How closely ligand atoms fit onto the scaffold is governed by a user-specified weight. Setting a
higher weight will force the ligand to be placed onto the scaffold locations more strictly. A
default weight of 5.0 is used.
Note: setting high weights can have a detrimental effect on the fitness score if the placement
results in, e.g., bad protein-ligand clashes. If desired, values below 1 can be used to achieve a
more lenient overlay.
Symmetry effects (such as the flipping of a phenyl ring by 180 degrees) are not taken into
account during matching of the ligand onto the scaffold. Therefore, a scaffold that will give a
unique match should ideally be provided.
For a given ligand, it is not possible to match multiple scaffolds at the same time. Scaffolds are
evaluated in the order supplied by the user, and the first scaffold that matches the ligand will be
used. It is therefore possible to specify two or more different scaffolds, which can be useful when
docking multiple different series of compounds.
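The first-match rule can be sketched in Python as below. This is a simplified, hypothetical illustration: `scaffold_matches` only checks that the ligand contains at least as many heavy atoms of each SYBYL type as the scaffold, whereas GOLD's matcher also takes connectivity and bond types into account. The function names and the example atom-type lists are invented for illustration.

```python
from collections import Counter


def scaffold_matches(scaffold_types, ligand_types):
    """Hypothetical, simplified test: every heavy-atom SYBYL type in the scaffold
    must occur in the ligand at least as often (connectivity is ignored)."""
    need = Counter(t for t in scaffold_types if t != "H")
    have = Counter(t for t in ligand_types if t != "H")
    return all(have[t] >= n for t, n in need.items())


def choose_scaffold(scaffolds, ligand_types):
    """Return the name of the first scaffold, in user-supplied order, that matches."""
    for name, types in scaffolds:
        if scaffold_matches(types, ligand_types):
            return name
    return None  # no scaffold matched this ligand


scaffolds = [
    ("pyridine_core", ["N.ar"] + ["C.ar"] * 5),
    ("phenyl_core", ["C.ar"] * 6),
]
ligand = ["C.ar"] * 6 + ["C.3", "O.3", "H", "H"]
print(choose_scaffold(scaffolds, ligand))  # phenyl_core
```

In this simplified test a ligand with one N.ar and six C.ar atoms satisfies both scaffolds, so the order of the list decides which one is used.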
8.6.2 Setting Up Scaffold Match Constraints
Click on the Edit Constraints button to bring up the Constraint Editor. Then, select Scaffold
Match Constraint from the list of constraint types.
"C:\Program Files\CCDC\GOLD\gold\d_win32\bin\smartrms_win32.exe" [-hv]
conformation_1 conformation_2
The flags are:
h use heavy atoms only (the calculation easily becomes intractable if Hs are included).
v verbose output.
conformation_1 and conformation_2 are MOL2 files containing the two conformations.
18.3 rms_analysis
rms_analysis calculates an rms difference matrix for a set of structures (as MOL2 files) and
performs hierarchical cluster analysis. A graph isomorphism algorithm is used to determine
optimal rms values.
rms_analysis can be invoked from the command line.
The structure of the command is dependent on the platform being used:
UNIX:
$GOLD_DIR/utilities/rms_analysis -method [simple|complete|group_average] <file1>.mol2
<file2>.mol2 <file3>.mol2 <file4>.mol2...
Note: this command will only work if users have their GOLD_DIR environment variable
correctly set. For example, to carry out a simple cluster analysis of the files file1.mol2 and
file2.mol2, the following command would be used:
$GOLD_DIR/utilities/rms_analysis -method simple file1.mol2 file2.mol2
Windows (via the command prompt):
<install_dir>\gold\d_win32\bin\rms_analysis_win32.exe -method
[simple|complete|group_average] <file1>.mol2 <file2>.mol2 <file3>.mol2 <file4>.mol2...
where <install_dir> is the GOLD installation directory. If specifying the full path, the path
will need to be enclosed in inverted commas, e.g.:
"C:\Program Files\CCDC\GOLD\gold\d_win32\bin\rms_analysis_win32.exe" -method
[simple|complete|group_average] <file1>.mol2 <file2>.mol2 <file3>.mol2 <file4>.mol2...
Choose simple for single linkage cluster analysis, complete for complete linkage, group_average
for group average.
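The three -method options correspond to standard agglomerative linkage criteria. The Python sketch below is a hypothetical illustration rather than the rms_analysis code: it clusters structures numbered from 0 using a precomputed, symmetric rms matrix and reports the rms level at which each merge happens.

```python
def hierarchical_clusters(rms, method="simple"):
    """Agglomerative clustering of structures 0..n-1 from a symmetric rms matrix.

    Returns the merges as (rms_level, sorted_member_list), in merge order.
    """
    linkage = {
        "simple": min,                               # single linkage
        "complete": max,                             # complete linkage
        "group_average": lambda d: sum(d) / len(d),  # group average
    }[method]
    clusters = [frozenset([i]) for i in range(len(rms))]
    merges = []
    while len(clusters) > 1:
        best = None
        # Find the pair of clusters with the smallest linkage distance.
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                dists = [rms[i][j] for i in clusters[a] for j in clusters[b]]
                level = linkage(dists)
                if best is None or level < best[0]:
                    best = (level, a, b)
        level, a, b = best
        merged = clusters[a] | clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
        merges.append((level, sorted(merged)))
    return merges


matrix = [[0.0, 1.0, 4.0],
          [1.0, 0.0, 2.0],
          [4.0, 2.0, 0.0]]
for method in ("simple", "complete", "group_average"):
    print(method, hierarchical_clusters(matrix, method))
```

With this toy matrix all three methods first merge structures 0 and 1 at rms 1.0; the final merge happens at 2.0 (simple), 4.0 (complete) or 3.0 (group_average).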
18.2 smart_rms
smart_rms calculates the rms difference between two conformations of the same structure, while
taking account of symmetry effects (such as the flipping of a phenyl ring by 180 degrees). Using
a graph isomorphism algorithm, an rms score is calculated for each way of mapping the
molecule onto itself.
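The idea behind smart_rms can be illustrated with a small Python sketch: compute the RMS for every permitted atom mapping and keep the smallest. This is not the smart_rms implementation. Here the symmetry-equivalent mappings are listed by hand (smart_rms derives them with a graph isomorphism algorithm), the coordinates are compared as given with no superposition, and the function names and toy coordinates are illustrative.

```python
import math


def rms(coords_a, coords_b, mapping):
    """RMS deviation when atom i of conformation A is paired with atom mapping[i] of B."""
    total = 0.0
    for i, j in enumerate(mapping):
        total += sum((p - q) ** 2 for p, q in zip(coords_a[i], coords_b[j]))
    return math.sqrt(total / len(mapping))


def symmetry_aware_rms(coords_a, coords_b, mappings):
    """Smallest RMS over all supplied mappings of the molecule onto itself."""
    return min(rms(coords_a, coords_b, m) for m in mappings)


# Toy molecule: atom 0 is unique, atoms 1 and 2 are symmetry-equivalent.
conf_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)]
conf_b = [(0.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]  # atoms 1 and 2 swapped
mappings = [(0, 1, 2), (0, 2, 1)]  # identity and the swap
print(rms(conf_a, conf_b, (0, 1, 2)))                # naive: about 1.633
print(symmetry_aware_rms(conf_a, conf_b, mappings))  # 0.0
```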
smart_rms can be invoked from the command line. The following platform-dependent
commands should be used.
UNIX platforms:
$GOLD_DIR/utilities/smart_rms [-hv] conformation_1 conformation_2
Windows platforms (at the Windows command prompt):
<install_dir>\gold\d_win32\bin\smartrms_win32.exe [-hv] conformation_1 conformation_2
where <install_dir> is the GOLD installation directory. If specifying the full path, the path
will need to be enclosed in inverted commas, e.g.:
The scaffold structure file should contain the scaffold molecule or fragment in its docked
position (i.e. within the same coordinate frame as the protein).
The Scaffold Match Constraint Weight determines how closely ligand atoms fit onto the
scaffold. Setting a higher weight will force the ligand to be placed onto the scaffold locations
more strictly.
By default, all heavy atoms in the supplied scaffold structure file will be used for matching.
However, it is possible to specify only a subset of those atoms in the scaffold structure (these
may include non-heavy atoms). Atoms should be specified using the atom indices as defined in
the scaffold structure file (indices should be separated by a single space). Limiting the number of
atoms to be matched can be useful for large, rigid scaffolds. In such a case, specifying only a few
atoms distributed throughout the scaffold can be sufficient to obtain a good 3D superimposition.
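The atom-selection rule can be sketched in Python, assuming the scaffold atoms have already been read from the MOL2 file into (index, SYBYL type) pairs and that hydrogens carry the type H. The function names are hypothetical, and `split()` accepts any whitespace, which is more lenient than the single space expected by the front end.

```python
def parse_atom_indices(text):
    """Parse a space-separated list of MOL2 atom indices, e.g. "1 4 7"."""
    return [int(tok) for tok in text.split()]


def atoms_for_matching(scaffold_atoms, index_text=""):
    """Select the scaffold atoms used for matching.

    scaffold_atoms: list of (mol2_index, sybyl_type) tuples.
    With no indices given, all heavy atoms are used; otherwise only the listed
    atoms (which may include hydrogens) are returned, in index order.
    """
    if not index_text.strip():
        return [atom for atom in scaffold_atoms if atom[1] != "H"]
    wanted = set(parse_atom_indices(index_text))
    by_index = {atom[0]: atom for atom in scaffold_atoms}
    missing = wanted - set(by_index)
    if missing:
        raise ValueError(f"atom indices not in scaffold: {sorted(missing)}")
    return [by_index[i] for i in sorted(wanted)]


scaffold = [(1, "C.ar"), (2, "C.ar"), (3, "H"), (4, "N.ar")]
print(atoms_for_matching(scaffold))         # [(1, 'C.ar'), (2, 'C.ar'), (4, 'N.ar')]
print(atoms_for_matching(scaffold, "4 3"))  # [(3, 'H'), (4, 'N.ar')]
```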
Click on the Add constraint or Update selected constraint button to add the constraint definition
to the Current Constraints (see Section 8.1, page 68).
9. Torsion Angle Distributions
9.1 Basic Use of Torsion Angle Distributions (see page 83)
9.2 Choice of Torsion Angle Distribution Files (see page 83)
9.3 Editing Torsion Angle Distribution Files (see page 84)
9.4 Matching Torsion Angle Distributions at Run Time (see page 88)
9.1 Basic Use of Torsion Angle Distributions
Torsion angle distributions extracted from the Cambridge Structural Database (CSD) can be
input to GOLD. These distributions are used to restrict the ligand conformational space sampled
by the genetic algorithm.
Using torsion angle distributions in this way will not make GOLD go any faster. However, it
may improve the chances of GOLD finding the correct answer by biasing the search towards
ligand torsion-angle values that are commonly observed in crystal structures. It may also
improve convergence and so make GOLD usable with faster settings (see Section 11.3, page 94).
To enable the use of torsion angle distributions, click on the Fitness & Search Options button in
the Fitness Function and Search Settings panel in the GOLD front end, then, in the resulting
window, switch on the check box labelled Use torsion angle distributions from the CSD.
9.2 Choice of Torsion Angle Distribution Files
Three torsion angle distribution files are provided:
gold.tordist - this is the default file.
gold.tordist.new - this contains all the torsions in gold.tordist and many more new
distributions. However, many of these newer torsions have very few hits in the CSD and no
significant improvement was found when using this new file in GOLD.
mimumba.tordist - this contains all the torsional distributions used in the MIMUMBA
program (Klebe and Mietzner, J.Comput.-Aided Mol.Des., 8, 583-606, 1994).
Click on the Distributions File button in the GOLD front end to pick a torsion angle distribution
file. Alternatively, type the required file into the entry box.
It is possible to customise torsion angle distribution information by editing one of the standard
torsion angle distribution files (see Section 9.3, page 84).
18. Utility Programs
A number of utility programs are supplied to assist in the analysis of GOLD docking results.
The following utility is available in the sgi_utils directory of the GOLD distribution:
18.1 grommitt (see page 142) - used for simple visualisation of dockings, available for SGI
users running IRIX only.
The following utilities are available in the utilities directory of the GOLD distribution:
18.2 smart_rms (see page 143) - computes rms deviations between two conformations of
the same structure.
18.3 rms_analysis (see page 144) - performs cluster analysis on a set of docking solutions.
18.4 identify_ligand.py (see page 145) - extracts descriptive information such as ligand
name for a specified structure record in a file.
18.1 grommitt
grommitt is a simple molecular viewer for examining binding modes; it is available only to SGI
users running IRIX.
When GOLD is being run interactively, grommitt can be used to display the current top solution
from a genetic algorithm run. To do this, click on the Display/Output Options button in the
GOLD front end (see Section 2.2, page 4).
grommitt can also be opened from the command line, e.g. to display overlays of SYBYL MOL2
files. The structure of the command is:
grommitt [-chp] <files>
The flags are:
c each molecule is coloured differently. Normally, molecules are coloured by atom type.
h only display heavy atoms.
p pretty (but slow) display.
<files> is a list of SYBYL MOL2 and/or PDB files.
grommitt is useful for visualising a set of GOLD solutions, e.g. to see at a glance if all solutions
are identical or whether there are several different binding modes. For example:
%grommitt -h gold_soln*
displays the window:
Non-parametric tests indicate that GOLD score and activity are not significantly correlated
(Spearman rₛ = -0.564, p = 0.056; Kendall τ = -0.382, p = 0.086).
There is not a statistically significant relationship between the GOLD score and activity. It is
worth noting that the compounds are all structurally similar and all are active.
17. Context-Dependent Help
Context-dependent help is available in the front end, by clicking the middle mouse button on the
item for which information is required. For example, clicking on:
brings up this help window:
9.3 Editing Torsion Angle Distribution Files
To edit the torsion angle distribution file click on the Edit Distributions button in the Fitness
Function and Search options window (accessible by clicking on the Fitness & Search Options
button in the Fitness Function and Search Settings panel in the GOLD front end).
If you are using the default torsion angle distribution file, it will be copied to the current
directory.
The format of entries in the torsion angle distribution file is quite strict: incorrect editing of the
file may cause GOLD to behave in unexpected ways or even to crash.
9.3.1 Format of Torsion Angle Distribution File Header (see page 84)
9.3.2 Format of Torsion Angle Distributions (see page 85).
9.3.3 Example Torsion Angle Distributions (see page 87).
9.3.4 Extracting Torsion Angle Distributions from the Cambridge Structural Database (see
page 88)
9.3.1 Format of Torsion Angle Distribution File Header
The first section of the torsion angle distribution file sets parameters and tells GOLD what to do
with the distributions.
N_BINS is the number of bins used in the torsion histogram.
REMOVE_HIGH_ENERGY and DELTA_E are parameters that can be used to control the
filtering out of high-energy torsion angles.
If torsion angle distributions are used, GOLD will no longer sample over 360 degrees but will
constrain the torsion to values contained in the histogram. However, if a histogram contains a
large number of entries, there may be some high-energy torsions within the histogram. GOLD
therefore provides a method for filtering out such high-energy torsions: set
REMOVE_HIGH_ENERGY = 1 and DELTA_E = E to remove those bars in the histogram that
correspond to torsions that are E kcal/mol higher in energy than the most populated state. The
ground state of the torsion is assumed to correspond to the maximum peak in the torsional
histogram. The energy difference between this ground state and any other peak in the torsion
angle histogram is then assumed to be approximately given by the partition function.
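The filtering step can be sketched in Python using the partition-function relation given in this section, ratio = exp(DELTA_E/0.5898). The function names are hypothetical; the sketch only illustrates which bars a given DELTA_E would remove. The computed ratios (about 161.8, 69.3 and 29.7 for DELTA_E = 3.0, 2.5 and 2.0) agree with the tabulated values to within one unit.

```python
import math


def delta_e_ratio(delta_e):
    """high/low ratio for a given DELTA_E (kcal/mol): ratio = exp(DELTA_E / 0.5898)."""
    return math.exp(delta_e / 0.5898)


def remove_high_energy(histogram, delta_e):
    """Zero every bar whose height is at most (largest bar) / ratio,
    as with REMOVE_HIGH_ENERGY = 1."""
    threshold = max(histogram) / delta_e_ratio(delta_e)
    return [h if h > threshold else 0 for h in histogram]


for de in (3.0, 2.5, 2.0):
    print(de, round(delta_e_ratio(de), 1))  # about 161.8, 69.3 and 29.7
print(remove_high_energy([100, 50, 2, 1, 0], 2.5))  # [100, 50, 2, 0, 0]
```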
The following table indicates the relationship between the value of DELTA_E and the ratio
high/low, where high is the height of the biggest bar in the histogram and low is the height below
which bars will be removed from the histogram:
DELTA_E    ratio
3.0        161
2.5        69
2.0        30
For example, if REMOVE_HIGH_ENERGY=1 and DELTA_E = 2.5, those bars which are 1/69th
or less of the height of the largest bar will be removed from the histogram and torsion angles
corresponding to these bars will never be sampled by the genetic algorithm.
The relationship between DELTA_E and ratio, based on the partition function, is:
ratio = exp (DELTA_E/0.5898)
9.3.2 Format of Torsion Angle Distributions
Each torsion angle distribution entry comprises three lines: the first line is the name of the
torsion angle; the second line is the definition of the torsion angle; the third line is the histogram.
The histogram should be a list of space-separated integers. The ith integer should be the number
of observations in the torsion-angle range of the ith bin. There should be N_BINS integers in all.
The first bin starts at -180 degrees and the last bin ends at +180.
Torsion angle distributions are defined using Backus-Naur Form (BNF) grammar, as follows (all
the symbols in the table are part of the grammar except for ||, which is used to indicate
alternative fields):
TORSION         NODE | NODE | NODE | NODE | ||
                NODE | NODE | NODE | NODE | DIRECTIVE ||
                NODE | NODE | NODE | NODE | DIRECTIVE | DIRECTIVE
DIRECTIVE       expand <min> <max> || period <min> <max>
NODE            ATOM || ATOM (NEIGHBOURS)
NEIGHBOURS      NEIGHBOUR_NODE || NEIGHBOUR_NODE NEIGHBOURS
NEIGHBOUR_NODE  NODE || HYDROGENS
HYDROGENS       0H || 1H || 2H || 3H
ATOM            ATOM_DEF || ATOM_DEF [FRAGMENT]
FRAGMENT        ribose || adenine || uracil || benzene
ATOM_DEF        TYPE_DEF || LINKAGE<no space>TYPE_DEF
Non-parametric tests indicate that GOLD score and activity are significantly correlated
according to the Kendall test but not according to the Spearman test (Spearman rₛ = -0.191,
p = 0.065; Kendall τ = -0.150, p = 0.033).
These inhibitors are all extremely hydrophobic, representing a difficult case for GOLD.
Note: for this dataset and target, GOLD does not predict active molecules as inactive. This is
advantageous in virtual screening applications: in this context, inactives predicted as actives are
acceptable, whereas actives predicted as inactive are not.
16.2.3 Prediction of Binding Affinity to FKBP12
GOLD was used to dock a set of 13 FK506BP inhibitors (data from Holt et al., J. Am. Chem. Soc.,
1993, 115, 9925). 20 docking runs were performed on each complex and the best fitness score
recorded.
A plot of fitness score against measured Ki is shown below:
The GOLD scores are a good indicator of activity for this series. It is most unlikely that this level
of prediction could have arisen through chance (χ² = 15.27, p < 0.001, 1 degree of freedom).
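The reported chi-squared value can be reproduced from the table of observed versus predicted activity for this set (14 and 1 for the 15 observed actives, 5 and 14 for the 19 observed inactives). The Python sketch below computes Pearson's chi-squared statistic for a 2 x 2 table without a continuity correction; the function name is illustrative.

```python
def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    chi2 = 0.0
    for r in range(2):
        for k in range(2):
            expected = rows[r] * cols[k] / n  # count expected if rows and columns are independent
            chi2 += (observed[r][k] - expected) ** 2 / expected
    return chi2


# Rows: observed active / inactive; columns: predicted active / inactive.
print(round(chi_squared_2x2(14, 1, 5, 14), 2))  # 15.27
```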
16.2.2 Prediction of Binding Affinity to Alpha Chymotrypsin
GOLD was used to dock a set of 94 alpha-chymotrypsin inhibitors (data from Stewart et al., T.
C. Methods, 1990, 3, 713).
A plot of fitness score against measured Ki is shown below:
The graph below omits the two outliers:
                    Predicted active   Predicted inactive
Observed active     14                 1
Observed inactive   5                  14
This grammar allows torsions to be specified as four fragment nodes. Each node defines an atom
type and, optionally, a set of neighbours to which the atom is connected. Each of the neighbours
is a node or an exact count of the number of hydrogen atoms to which the atom is bonded. Atom
types are defined using SYBYL atom types or elemental atom types. The atom can also be
required to be part of a pre-defined fragment.
Bonding environments can also be specified, using the symbols ~, = and -, which indicate,
respectively, that an atom forms an aromatic, double or single bond to its parent node.
Note: ~, = and - should therefore not be used on the first atom of a torsion definition, which has
no parent node; these bond types are specified for substituents only.
A node is a parent of all its neighbours and a top level node in the torsion definition is a parent of
subsequent nodes in the torsion.
There are currently four fragments available, one of which (the uracil fragment) matches both
thymine and uracil. More fragments can easily be added. The Ullmann algorithm is used to
determine if an atom belongs to a fragment. Fragments are defined through SYBYL atom types
and connectivity (exact bond types are not used). Only heavy atoms are considered. Currently,
fragments are precompiled, but they could be read in at run-time if required.
Directives are allowed to take account of special circumstances. There are two directives:
expand and period.
The expand directive has the form expand <min> <max> where <max> - <min> = 180.0 or
<min> = 0. This directive is used for torsions where the CSD query has symmetry and torsions
are only measured over <min> to <max> degrees. However, although the CSD query may have
two-fold symmetry, often the matched structure does not. The expand directive fills out the rest
of the histogram with the correct values.
The period directive takes account of those torsional distributions for which the matched
structure has symmetry. This directive has the form period <pmin> <pmax>. The distribution
will only be expanded between angles <pmin> and <pmax>.
TYPE_DEF        SYB_TYPE || EL_TYPE
LINKAGE         ~ || = || -
SYB_TYPE        C.3 || C.2 || C.1 || C.ar || C.cat || N.3 || N.2 || N.1 || N.ar || N.am ||
                N.pl3 || N.4 || O.3 || O.2 || O.co2 || S.3 || S.2 || S.o || S.o2 || P.3 ||
                H || F || Cl || Br || I
EL_TYPE         C || N || O || S || P
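A torsion definition line can be split into its four nodes and optional directives with a short Python sketch. This is a hypothetical helper, not GOLD's parser: it assumes that fields are separated by |, reads only the top-level atom of each node (optional linkage symbol, atom type and [FRAGMENT] label) and does not descend into neighbour lists.

```python
import re

ATOM_RE = re.compile(r"\s*([~=-]?)([A-Za-z]+(?:\.[A-Za-z0-9]+)?)\s*(?:\[(\w+)\])?")
DIRECTIVE_RE = re.compile(r"(expand|period)\s+(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)")


def split_torsion_definition(line):
    """Return (four NODE strings, list of (directive, min, max) tuples)."""
    fields = [f.strip() for f in line.split("|") if f.strip()]
    nodes, rest = fields[:4], fields[4:]
    if len(nodes) != 4:
        raise ValueError("a torsion definition needs exactly four nodes")
    directives = []
    for text in rest:
        m = DIRECTIVE_RE.fullmatch(text)
        if not m:
            raise ValueError(f"unrecognised directive: {text!r}")
        directives.append((m.group(1), float(m.group(2)), float(m.group(3))))
    return nodes, directives


def head_atom(node):
    """(linkage, atom type, fragment or None) for the top-level atom of a NODE."""
    m = ATOM_RE.match(node)
    return m.group(1), m.group(2), m.group(3)


nodes, directives = split_torsion_definition(
    "C | C.3 (2H) | C.ar (~C.ar (0H)) | ~C.ar (0H) | expand 0.0 180.0")
print([head_atom(n) for n in nodes])
print(directives)  # [('expand', 0.0, 180.0)]
```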
9.3.3 Example Torsion Angle Distributions
Here are some examples of torsion angle distributions extracted from the Cambridge Structural
Database and in the correct format:
DIAGRAM
acid T1
C.2 (O.co2 O.co2) | C.3 (2H) | C.3 (2H) | C
41 8 0 0 0 0 0 0 0 1 8 7 2 0 0 0 0 1 1 0 0 0 1 0 4 1 0 1 0 0 0 0
0 2 2 41
DIAGRAM
acid T2
O.co2 | C.2 (O.co2) | C.3 (2H) | C.3 (2H C)
8 5 1 3 2 1 3 2 3 2 3 3 4 0 3 2 7 11 15 9 1 4 1 0 2 1 4 4 1 3 3 6
0 3 5 7
DIAGRAM
amide nh T2
C.2 (=O.2 N.am (1H)) | C.3 (1H C.3) | N.am (1H) | C.2 (=O.2)
1 1 14 16 29 25 23 38 35 50 82 156 53 6 1 0 0 0 0 0 0 1 1 14 17 15 4 4 2
1 2 5 2 2 0 0
DIAGRAM
uracil
O.3 [ribose] | C.3 [ribose] | N.am [uracil] (C.2 (1H))| C.2 [uracil] (=O.2)
24 73 85 44 59 60 40 14 8 3 2 0 0 0 0 0 0 0 0 0 0 0 0 0 7 5 3 0 0 1 4
3 3 5 10 6
DIAGRAM
benzyl sub
C | C.3 (2H) | C.ar (~C.ar (0H)) | ~C.ar (0H) | expand 0.0 180.0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 9 27 76 64 15 7 4 2
0 0 0 0
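The histogram layout described in Section 9.3.2 (N_BINS bins from -180 to +180 degrees) can be sketched in Python using the benzyl sub entry above, which has 36 bins of 10 degrees. Two parts are assumptions rather than documented GOLD behaviour: `mirror_expand` reads expand 0.0 180.0 as copying each bin in 0 to +180 onto its mirror image in -180 to 0, and `sample_angle` draws an angle uniformly within a bin chosen in proportion to its count.

```python
import random


def parse_histogram(text, n_bins=36):
    """Parse the histogram line(s) of a torsion entry into N_BINS integer counts."""
    counts = [int(tok) for tok in text.split()]
    if len(counts) != n_bins:
        raise ValueError(f"expected {n_bins} counts, got {len(counts)}")
    return counts


def bin_range(i, n_bins=36):
    """Angle range (degrees) of bin i; bin 0 starts at -180, the last ends at +180."""
    width = 360.0 / n_bins
    low = -180.0 + i * width
    return low, low + width


def mirror_expand(counts):
    """Assumed meaning of 'expand 0.0 180.0': give the bin for -tau the count of tau."""
    n = len(counts)
    expanded = list(counts)
    for i in range(n // 2, n):  # bins covering 0..+180
        expanded[n - 1 - i] = counts[i]
    return expanded


def sample_angle(counts, rng):
    """Pick a bin with probability proportional to its count, then an angle in it."""
    i = rng.choices(range(len(counts)), weights=counts)[0]
    low, high = bin_range(i, len(counts))
    return rng.uniform(low, high)


benzyl = "0 " * 23 + "1 9 27 76 64 15 7 4 2 " + "0 0 0 0"  # benzyl sub histogram
counts = mirror_expand(parse_histogram(benzyl))
print(bin_range(26), counts[26], counts[9])  # (80.0, 90.0) 76 76
print(sample_angle(counts, random.Random(0)))
```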
activity. This has varied from a clear relationship for a test set of neuraminidase inhibitors,
through a discernible relationship for alpha-chymotrypsin inhibitors, to no statistically significant
relationship for FK506 inhibitors.
16.2.1 Prediction of Binding Affinity to Influenza A Neuraminidase (see page 138)
16.2.2 Prediction of Binding Affinity to Alpha Chymotrypsin (see page 139)
16.2.3 Prediction of Binding Affinity to FKBP12 (see page 140)
16.2.1 Prediction of Binding Affinity to Influenza A Neuraminidase
GOLD was used to dock a set of 34 neuraminidase inhibitors. 25 docking runs were performed
on each complex and the best fitness score recorded.
A plot of fitness score against measured IC50 (data supplied by GlaxoWellcome) is shown
below:
There are no compounds with low fitness and high activity and there is evidence of a correlation
(Spearman rₛ = -0.649, p < 0.001; Kendall τ = -0.483, p < 0.001).
Considering 10 µM to be a cutoff for activity, there are 15 actives and 19 inactives. Using a
GOLD score of 74 or above as a predictor of activity gives:
Classified in the validation experiments as a prediction that was wrong (1ICN - oleate docked
into a fatty-acid binding protein):
16.2 Correlation between Fitness Function and Biological Activity
The GOLD fitness function was designed to discriminate between different binding modes of
the same molecule. Extra terms are probably required to compare different molecules. For
example, a term is probably required to account for the entropic loss associated with freezing
rotatable bonds when the ligand binds.
Nevertheless, some correlation has been observed between GOLD fitness scores and biological
9.3.4 Extracting Torsion Angle Distributions from the Cambridge Structural Database
The command process_tab (only available on SGI machines) will extract the torsion angle
histogram from the .tab file produced by a search of the Cambridge Structural Database, and
reformat it so that it can be added into the GOLD torsional distribution file.
9.4 Matching Torsion Angle Distributions at Run Time
GOLD identifies each rotatable bond in the ligand and attempts to match it to a torsion angle
distribution in the torsion angle distribution file. This includes bonds that are identified by
GOLD as flippable (e.g., if torsion distributions are switched on, ligand carboxylic acid (O)C-OH
groups will also use a torsion distribution).
In some cases, a rotatable bond may match more than one torsion angle distribution. If this
happens, a score is calculated for each torsion angle distribution and the distribution with the
highest score is selected.
Note: a weighting scheme is used when matching rotatable bonds in the ligand to a torsion angle
distribution such that more specific torsion definitions are taken in preference to more generic
ones.
Each portion of the torsion angle distribution contributes to the score as follows:
Element atom type 1.5
SYBYL atom type 2.0
Fragment 3.0
Hydrogen count 2.0
Bond linkage 0.5
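The weighting scheme can be illustrated with a hypothetical Python sketch. It assumes that each occurrence of a component in a torsion definition adds its weight once and that the candidate with the highest total is chosen; the exact counting rules GOLD uses are not given here, and the generic C-C entry is invented for comparison.

```python
# Weights from the table above, per component of a torsion definition.
MATCH_WEIGHTS = {
    "element": 1.5,    # element atom type, e.g. C
    "sybyl": 2.0,      # SYBYL atom type, e.g. C.ar
    "fragment": 3.0,   # fragment label, e.g. [ribose]
    "hydrogens": 2.0,  # hydrogen count, e.g. (2H)
    "linkage": 0.5,    # bond linkage symbol, e.g. ~
}


def match_score(components):
    """Sum the weights of the components used in one torsion definition (assumed rule)."""
    return sum(MATCH_WEIGHTS[c] for c in components)


def pick_distribution(candidates):
    """candidates: (name, components) pairs that all match the rotatable bond."""
    return max(candidates, key=lambda cand: match_score(cand[1]))[0]


generic = ("generic C-C", ["element"] * 4)  # hypothetical C | C | C | C entry
# acid T1: C.2 (O.co2 O.co2) | C.3 (2H) | C.3 (2H) | C
acid_t1 = ("acid T1", ["sybyl"] * 5 + ["hydrogens"] * 2 + ["element"])
print(match_score(generic[1]), match_score(acid_t1[1]))  # 6.0 15.5
print(pick_distribution([generic, acid_t1]))  # acid T1
```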
10. Genetic Algorithm Parameter Definitions
10.1 Genetic Algorithm Overview (see page 89)
10.2 Population Size (see page 89)
10.3 Selection Pressure (see page 90)
10.4 Number of Operations (see page 90)
10.5 Number of Islands (see page 90)
10.6 Niche Size (see page 91)
10.7 Operator Weights: Migrate, Mutate, Crossover (see page 91)
10.8 Van der Waals and Hydrogen Bonding Annealing Parameters (see page 91)
10.9 Hydrophobic Fitting Points (see page 92)
10.1 Genetic Algorithm Overview
GOLD optimises the fitness score by using a genetic algorithm.
A population of potential solutions (i.e. possible docked orientations of the ligand) is set up at
random. Each member of the population is encoded as a chromosome, which contains
information about the mapping of ligand H-bond atoms onto (complementary) protein H-bond
atoms, mapping of hydrophobic points on the ligand onto protein hydrophobic points, and the
conformation around flexible ligand bonds and protein OH groups.
Each chromosome is assigned a fitness score based on its predicted binding affinity and the
chromosomes within the population are ranked according to fitness.
The population of chromosomes is iteratively optimised. At each step, a point mutation may
occur in a chromosome, or two chromosomes may mate to give a child. The selection of parent
chromosomes is biased towards fitter members of the population, i.e. chromosomes
corresponding to ligand dockings with good fitness scores.
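The steady-state scheme described above can be sketched as a toy genetic algorithm in Python. This is a simplified illustration under stated assumptions, not GOLD's implementation: chromosomes here are plain lists of floats rather than encoded H-bond mappings, hydrophobic mappings and torsions, the fitness function is a made-up stand-in, and parent selection uses simple rank-based weights.

```python
import random


def run_ga(fitness, n_genes, popsize=20, maxops=2000, p_mutate=0.5, seed=0):
    """Toy steady-state GA over chromosomes of floats in [0, 1)."""
    rng = random.Random(seed)
    pop = [[rng.random() for _ in range(n_genes)] for _ in range(popsize)]
    pop.sort(key=fitness, reverse=True)  # rank the population, best first
    # Rank-based weights bias parent selection towards fitter members.
    weights = [popsize - rank for rank in range(popsize)]
    for _ in range(maxops):
        if n_genes == 1 or rng.random() < p_mutate:
            child = list(rng.choices(pop, weights=weights)[0])
            child[rng.randrange(n_genes)] = rng.random()  # point mutation
        else:
            mum, dad = rng.choices(pop, weights=weights, k=2)
            cut = rng.randrange(1, n_genes)
            child = mum[:cut] + dad[cut:]  # one-point crossover
        pop[-1] = child  # the child replaces the worst member
        pop.sort(key=fitness, reverse=True)
    return pop[0]


def toy_fitness(genes):
    """Hypothetical fitness: best when every gene equals 0.5."""
    return -sum((g - 0.5) ** 2 for g in genes)


start = run_ga(toy_fitness, n_genes=3, maxops=0)  # best of the random population
best = run_ga(toy_fitness, n_genes=3)
print(toy_fitness(start) <= toy_fitness(best))  # True
```

Because only the worst member is ever replaced, the best fitness in the population can never decrease during the run.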
A number of parameters control the precise operation of the genetic algorithm, viz.
Population Size (see page 89)
Selection Pressure (see page 90)
Number of Operations (see page 90)
Number of Islands (see page 90)
Niche Size (see page 91)
Operator Weights: Migrate, Mutate, Crossover (see page 91)
Van der Waals and Hydrogen Bonding Annealing Parameters (see page 91)
Changes to genetic algorithm parameters should be made with care (see Section 11.3, page 94).
10.2 Population Size
The genetic algorithm maintains a set of possible solutions to the problem. Each possible
solution is known as a chromosome and the set of solutions is termed a population.
16.1.4 Examples of GOLD Dockings
The plots below show examples of GOLD dockings:
Classified in the validation experiments as a good prediction (4PHV - a peptide-like ligand
docked into HIV protease):
Classified in the validation experiments as a close prediction (1GLQ - a nitrophenyl-substituted
peptide ligand docked into glutathione-S-transferase):
Classified in the validation experiments as a prediction with significant errors (1EAP - a
succinylaminophosphonate ligand docked into an antibody):
The aspartic protease set contains a high proportion of large ligands with several rotatable
bonds; these complexes are difficult cases for docking. The lyases are difficult to dock as the
set features relatively shallow binding sites and polar ligands that are partly solvent-exposed
(examples are 1aco and 2h4n); crystal waters sometimes mediate binding (examples are 1pdz,
1okm).
However, it is extremely difficult to draw conclusions from data obtained using such small sets.
When GOLD solutions are classified as good or wrong using an RMS threshold of 2.0 Å, a
simple chi-squared test can be used to decide whether or not the observed result really is
different from the success rate obtained for the clean list.
This test does show that the set of aspartic proteases can be regarded as different at a confidence level
of P<0.025. The lyase and lectin sets have significantly different results when P=0.10 is allowed,
and for the isomerases P<0.25 applies. The results for all other sets may just differ by chance,
and are not significantly different from the results obtained for the clean list.
Alternatively, F statistics can be used to decide whether a subset is really different from the clean
list in terms of RMS value. In this case, the F ratio is calculated using the null hypothesis that the
average RMS for the clean list of 224 entries and each sublist is equal.
Results for F indicate that only the subsets containing aspartic proteases and isomerases (and
possibly the lectin set) are significantly different from the clean list, showing clearly that it is
very difficult to draw any meaningful conclusions from the results for such small sets.
Influence of Mediating Water Molecules on GOLD Results
Waters have been removed from all complexes prior to docking. This probably lowers
performance of the docking algorithm, as waters can mediate interactions that are essential for
ligand binding. To estimate this effect, a subset of structures was identified with at least one
strongly bound water molecule within 2.9 Å of both protein and ligand moieties.
GOLD success rates for this subset (40 entries) and structures lacking mediating water
molecules (55 entries) are reported below. All entries are subsets of the clean list. There seems to
be a trend towards lower success rates for structures that contain water-mediated contacts
between ligand and protein, although the impact of leaving water molecules out is not as large as
might be expected.
GOLD results for complexes with and without waters that mediate protein-ligand binding:
The variable Population Size (or popsize) is the number of chromosomes in the population. If
n_islands is greater than one (i.e. the genetic algorithm is split over two or more islands),
popsize is the population on each island.
Changes to genetic algorithm parameters should be made with care (see Section 11.3, page 94).
10.3 Selection Pressure
Each of the genetic operations (crossover, migration, mutation) (see Section 10.7, page 91) takes
information from parent chromosomes and assembles this information in child chromosomes.
The child chromosomes then replace the worst members of the population.
The selection of parent chromosomes is biased towards those of high fitness, i.e. a fit
chromosome is more likely to be a parent than an unfit one.
The selection pressure is defined as the ratio between the probability that the most fit member of
the population is selected as a parent to the probability that an average member is selected as a
parent. Too high a selection pressure will result in the population converging too early.
For the GOLD docking algorithm, a selection pressure of 1.1 seems appropriate, although 1.125
may be better for library screening where the aim is faster convergence.
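One common way to realise a given selection pressure is linear ranking, in which selection probability falls linearly with rank. The Python sketch below is an assumption for illustration (this section does not say which scheme GOLD uses); it builds probabilities whose fittest-to-average ratio equals the pressure, e.g. 1.1 or 1.125.

```python
def ranking_probabilities(n, pressure):
    """Linear-ranking selection probabilities for n members, fittest first.

    p(fittest) / p(average member) == pressure; valid for 1 <= pressure <= 2.
    """
    if not 1.0 <= pressure <= 2.0:
        raise ValueError("linear ranking needs 1 <= pressure <= 2")
    if n == 1:
        return [1.0]
    probs = []
    for rank in range(n):  # rank 0 is the fittest member
        frac = (n - 1 - rank) / (n - 1)  # 1.0 for the fittest, 0.0 for the least fit
        probs.append((2.0 - pressure + 2.0 * (pressure - 1.0) * frac) / n)
    return probs


probs = ranking_probabilities(100, 1.1)
print(round(probs[0] / (1 / 100), 3))  # 1.1 (fittest vs. average member)
print(round(sum(probs), 12))           # 1.0
```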
Changes to genetic algorithm parameters should be made with care (see Section 11.3, page 94).
10.4 Number of Operations
The genetic algorithm starts off with a random population (each value in every chromosome is
set to a random number). Genetic operations (crossover, migration, mutation) (see Section 10.7,
page 91) are then applied iteratively to the population. The parameter Number of Operations (or
maxops) is the number of genetic operations that are applied over the course of a GA run.
It is the key parameter in determining how long a GOLD run will take.
Changes to genetic algorithm parameters should be made with care (see Section 11.3, page 94).
10.5 Number of Islands
Rather than maintaining a single population, the genetic algorithm can maintain a number of
populations that are arranged as a ring of islands. Specifically, the algorithm maintains n_islands
populations, each of size popsize.
Individuals can migrate between adjacent islands using the migration operator.
The effect of n_islands on the efficiency of the genetic algorithm is uncertain.
Changes to genetic algorithm parameters should be made with care (see Section 11.3, page 94).
10.6 Niche Size
Niching is a common technique used in genetic algorithms to preserve diversity within the
population.
In GOLD, two individuals share the same niche if the rmsd between the coordinates of their
donor and acceptor atoms is less than 1.0 Å.
When adding a new individual to the population, a count is made of the number of individuals in
the population that inhabit the same niche as the new chromosome. If there are more than
NicheSize individuals in the niche, then the new individual replaces the worst member of the
niche rather than the worst member of the total population.
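The niche rule can be sketched in Python as follows. The member representation (a dict holding a fitness and the donor/acceptor coordinates) and the function names are hypothetical; the sketch assumes the coordinates of two individuals are listed in the same atom order, so the rmsd needs no alignment.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) donor/acceptor positions."""
    total = sum(
        (p - q) ** 2 for a, b in zip(coords_a, coords_b) for p, q in zip(a, b)
    )
    return math.sqrt(total / len(coords_a))


def insert_with_niching(population, child, niche_size, niche_cutoff=1.0):
    """Insert child in place of an existing member and return the replaced index."""
    niche = [
        i for i, member in enumerate(population)
        if rmsd(member["coords"], child["coords"]) < niche_cutoff
    ]
    if len(niche) > niche_size:
        candidates = niche  # crowded niche: replace the worst member of the niche
    else:
        candidates = range(len(population))  # otherwise: the worst member overall
    victim = min(candidates, key=lambda i: population[i]["fitness"])
    population[victim] = child
    return victim


def member(fitness, x):
    """Hypothetical member with a single donor/acceptor atom at (x, 0, 0)."""
    return {"fitness": fitness, "coords": [(x, 0.0, 0.0)]}


population = [member(5.0, 0.0), member(6.0, 0.1), member(7.0, 0.2), member(1.0, 10.0)]
print(insert_with_niching(population, member(8.0, 0.05), niche_size=2))  # 0
```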
Changes to genetic algorithm parameters should be made with care (see Section 11.3, page 94).
10.7 Operator Weights: Migrate, Mutate, Crossover
The operator weights are the parameters Mutate, Migrate and Crossover (or pt_cross).
They govern the relative frequencies of the three types of operations that can occur during a
genetic optimisation: point mutation of the chromosome, migration of a population member
from one island to another, and crossover (sexual mating) of two chromosomes.
Each time the genetic algorithm selects an operator, it does so at random. Any bias in this choice
is determined by the operator weights. For example, if Mutate is 40 and Crossover is 10 then, on
average, four mutations will be applied for every crossover.
The migrate weight should be zero if there is only one island, otherwise migration should occur
about 5% of the time.
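Turning operator weights into probabilities, and the ring of islands, can be sketched in Python. The function names are hypothetical, and forcing the migrate weight to zero for a single island simply encodes the recommendation above.

```python
import random


def operator_probabilities(mutate, crossover, migrate, n_islands):
    """Turn the operator weights into selection probabilities."""
    if n_islands == 1:
        migrate = 0  # no neighbouring island to migrate to
    total = mutate + crossover + migrate
    return {
        "mutate": mutate / total,
        "crossover": crossover / total,
        "migrate": migrate / total,
    }


def island_neighbours(island, n_islands):
    """Islands form a ring, so migration is between adjacent islands only."""
    return ((island - 1) % n_islands, (island + 1) % n_islands)


def choose_operator(probs, rng):
    """Pick an operator at random, biased by its probability."""
    names = list(probs)
    return rng.choices(names, weights=[probs[n] for n in names])[0]


probs = operator_probabilities(mutate=40, crossover=10, migrate=0, n_islands=1)
print(probs["mutate"] / probs["crossover"])  # 4.0
print(choose_operator(probs, random.Random(1)) in ("mutate", "crossover"))  # True
```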
Changes to genetic algorithm parameters should be made with care (see Section 11.3, page 94).
10.8 Van der Waals and Hydrogen Bonding Annealing Parameters
When GoldScore is being used, the annealing parameters, van der Waals and Hydrogen
Bonding, allow poor steric contacts and poor hydrogen bonds to occur at the beginning of a
genetic algorithm run, in the expectation that they will evolve to better solutions.
At the start of a GOLD run, external van der Waals (vdw) energies are cut off when
E_ij > (van der Waals) * k_ij, where k_ij is the depth of the vdw well between atoms i and j.
At the end of the run, the cut-off value is FINISH_VDW_LINEAR_CUTOFF. This allows a few bad
bumps to be tolerated at the beginning of the run.
Similarly, the parameters Hydrogen Bonding and FINAL_VIRTUAL_PT_MATCH_MAX are
used to set starting and finishing values of max_distance (the distance between donor hydrogen
and fitting point must be less than max_distance for the bond to count towards the fitness score).
This allows poor hydrogen bonds to occur at the beginning of a GA run.
Both the vdw and H-bond annealing must be gradual and the population allowed plenty of time
to adapt to changes in the fitness function.
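The annealing schedule can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not GOLD's code: it assumes that both cut-offs change linearly with the number of operations performed (as the name FINISH_VDW_LINEAR_CUTOFF suggests), that "cut off" means a pair energy above the threshold is capped at the threshold, and that the vdw multiplier grows during the run so that bumps are penalised less at the start. The finishing values in the example are placeholders; the real values of FINISH_VDW_LINEAR_CUTOFF and FINAL_VIRTUAL_PT_MATCH_MAX are GOLD parameters that this sketch does not reproduce. All function names are hypothetical.

```python
def annealed_value(start, finish, ops_done, max_ops):
    """Linearly interpolate from `start` (first operation) to `finish` (last operation)."""
    if max_ops <= 1:
        return finish
    t = min(max(ops_done / (max_ops - 1), 0.0), 1.0)
    return start + (finish - start) * t


def capped_vdw_energy(e_ij, k_ij, vdw_factor):
    """Cap a pair energy E_ij at vdw_factor * k_ij (k_ij: depth of the vdw well)."""
    return min(e_ij, vdw_factor * k_ij)


def hbond_counts(distance, max_distance):
    """An H-bond contributes only if the donor H to fitting point distance is below max_distance."""
    return distance < max_distance


if __name__ == "__main__":
    max_ops = 100000                   # Number of Operations (maxops), Set A
    vdw_start, vdw_finish = 4.0, 25.0  # 4.0 = Set A van der Waals; 25.0 is a placeholder
    hb_start, hb_finish = 2.5, 1.5     # 2.5 = Set A Hydrogen Bonding; 1.5 is a placeholder
    for ops in (0, 50000, 99999):
        vdw = annealed_value(vdw_start, vdw_finish, ops, max_ops)
        hb = annealed_value(hb_start, hb_finish, ops, max_ops)
        print(ops, round(vdw, 3), round(hb, 3), capped_vdw_energy(10.0, 0.2, vdw))
```

Interpolating against the operation count, rather than wall-clock time, keeps the schedule identical however fast the machine is, which matches the requirement that the population be given plenty of time to adapt.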
Changes to genetic algorithm parameters should be made with care (see Section 11.3, page 94).
134 GOLD User Guide
GOLD Performance as a Function of Protein Type
Success rates for GOLD as a function of protein type are given below. Statistical analysis was
performed to assess whether the differences are real or may have arisen by chance. This check is
essential, as the sets being considered here are very small.
Performance appears to be above average for the metalloprotease, kinase, isomerase and lectin
sets. However, performance seems to be lower than expected for the aspartic protease and lyase
sets.
GOLD Performance
A brief overview of the results obtained for GOLD with the CCDC/Astex test set is given
below. Figure 1 shows GOLD success rates as a function of the number of torsion angles in the
ligand. Results were obtained using the default settings; the values shown are averages over a set
of 50 validation runs, and standard deviations are given. Dockings with an RMS (root mean
square) deviation of atomic coordinates of 2.0 Å or less from the experimental structure were
considered to be good results.
The following table shows the GOLD results for the clean set; results were calculated using both
the default settings and the threefold speed-up settings. As can be seen, there is a trade-off
between speed and reliability. All success rates are average values over 50 validation runs.
Standard deviation is given in parentheses.
10.9 Hydrophobic Fitting Points
GOLD automatically calculates a list of hydrophobic fitting points in the binding site. These are
used during the generation of trial docking solutions to map hydrophobic ligand atoms into
favourable regions of the binding site.
GOLD generates its hydrophobic fitting points by placing a fine grid over the binding site. At
each grid position, the van der Waals interaction energy between a bare carbon atom and the
protein is evaluated. By default, positions at which the interaction energy is below -2.5 kcal/
mole are added to the list of fitting points.
Note: the potential and threshold for selecting fitting points can be changed by editing the
gold.params file and changing the values of INTERNAL_POTENTIAL_FITPTS and
E_FITPT_THRESHOLD.
In this way, a map is constructed that contains positions onto which the placement of a
hydrophobic ligand atom should be favourable.
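The grid-based selection of fitting points can be illustrated with a toy Python sketch. The probe potential here is a generic 6-12 Lennard-Jones term with made-up parameters (epsilon, r_min), not GOLD's internal potential (which is selected by INTERNAL_POTENTIAL_FITPTS); only the idea of keeping grid positions whose probe energy falls below a threshold (E_FITPT_THRESHOLD, -2.5 kcal/mole by default) is taken from the text. All function names are hypothetical.

```python
import itertools
import math


def lj_energy(r, epsilon=0.2, r_min=4.0):
    """Toy 6-12 Lennard-Jones energy (kcal/mol); minimum of -epsilon at r = r_min."""
    x = (r_min / r) ** 6
    return epsilon * (x * x - 2.0 * x)


def probe_energy(point, protein_atoms, min_contact=1.0):
    """Sum of the probe carbon's interaction energies with every protein atom."""
    total = 0.0
    for atom in protein_atoms:
        r = math.dist(point, atom)
        if r < min_contact:  # probe sits inside an atom: never a fitting point
            return math.inf
        total += lj_energy(r)
    return total


def axis(lo, hi, step):
    """Evenly spaced values from lo to hi inclusive."""
    n = int(math.floor((hi - lo) / step + 1e-9)) + 1
    return [lo + i * step for i in range(n)]


def fitting_points(protein_atoms, lower, upper, spacing=0.25, threshold=-2.5):
    """Grid positions in the box [lower, upper] whose probe energy is below threshold."""
    grid = itertools.product(*(axis(lo, hi, spacing) for lo, hi in zip(lower, upper)))
    return [p for p in grid if probe_energy(p, protein_atoms) < threshold]


if __name__ == "__main__":
    # Six toy "protein" atoms 4 Å from the origin, forming a small pocket.
    atoms = [(4, 0, 0), (-4, 0, 0), (0, 4, 0), (0, -4, 0), (0, 0, 4), (0, 0, -4)]
    # With these toy parameters no position can reach -2.5 kcal/mol (six atoms
    # contribute at most -0.2 each), so a weaker illustrative threshold is used.
    pts = fitting_points(atoms, (-1, -1, -1), (1, 1, 1), spacing=0.5, threshold=-1.0)
    print(len(pts), pts[:3])
```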
The ligand fitting points are used for the matching of hydrophobic regions.
By default only carbon atoms in the ligand are considered when identifying fitting points. The
selection of suitable ligand atoms can be extended to include carbon, halogen and non-polar
sulfur atoms by uncommenting the following line in the gold.params file:
#LIGAND_FITPTS_SELECTION EXTENDED_HAL_S
During docking, GOLD selects a list of lipophilic ligand atoms and matches them onto a subset
of the hydrophobic fitting points.
It is possible to use customised hydrophobic fitting points. This might be appropriate if GOLD is
not giving good results on a particular protein and you suspect that the fault may lie in the
placement of hydrophobic ligand groups.
Customised fitting points must be supplied in a MOL2 format file that contains a list of dummy
atoms at the desired fitting-point locations. The supplied fitting points should sample all regions
of interest in the cavity, so that the docking algorithm has sufficient alternatives for placement of
hydrophobic ligand atoms within the cavity. GOLD uses gridded points that are spaced by
0.25 Å; for a speed-up in calculation, a larger spacing could be used.
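A customised fitting-point file can be produced with a short Python sketch like the one below. It writes each point as a dummy atom (Tripos atom type Du) in a minimal MOL2 file; the exact MOL2 layout GOLD requires beyond "dummy atoms in MOL2 format" is an assumption here, so a SuperStar-generated file is the better reference. The cubic grid helper is hypothetical and simply shows how the 0.25 Å spacing compares with a coarser one; in practice the points should be restricted to the regions of interest in the cavity.

```python
def cubic_grid(center, half_width, spacing=0.25):
    """Points on a cubic grid with the given spacing (Å), centred on `center`."""
    n = int(round(half_width / spacing))
    cx, cy, cz = center
    steps = range(-n, n + 1)
    return [(cx + i * spacing, cy + j * spacing, cz + k * spacing)
            for i in steps for j in steps for k in steps]


def mol2_text(points, name="fitting_points"):
    """Minimal MOL2 text with one dummy (Du) atom per fitting point and no bonds."""
    lines = [
        "@<TRIPOS>MOLECULE",
        name,
        f"{len(points)} 0 0 0 0",  # atoms, bonds, substructures, features, sets
        "SMALL",
        "NO_CHARGES",
        "",
        "@<TRIPOS>ATOM",
    ]
    for i, (x, y, z) in enumerate(points, start=1):
        lines.append(f"{i:7d} Du{i:<5d} {x:10.4f} {y:10.4f} {z:10.4f} Du")
    return "\n".join(lines) + "\n"


def write_mol2(points, path):
    """Write the fitting points to `path` in MOL2 format."""
    with open(path, "w") as fh:
        fh.write(mol2_text(points))


if __name__ == "__main__":
    fine = cubic_grid((0.0, 0.0, 0.0), 1.0)                 # 0.25 Å spacing: 9 x 9 x 9 points
    coarse = cubic_grid((0.0, 0.0, 0.0), 1.0, spacing=0.5)  # 0.5 Å spacing: 5 x 5 x 5 points
    print(len(fine), len(coarse))
    print(mol2_text(coarse[:2]))
```

Doubling the spacing cuts the number of points roughly eightfold, which is where the speed-up from a coarser grid comes from.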
To make GOLD use a customised fitting-point file, click on the Fitness & Search options button
in the GOLD front end, then switch on the Read hydrophobic fitting points check box in the
Fitness Function and Search Options window. Finally, hit the Fit point file... button to open a file
selection window from which your customised file can be located.
Customised fitting points can, for example, be generated by the CCDC program SuperStar,
which offers the possibility of writing out a file of GOLD fitting points in the appropriate format
(see SuperStar manual sections on SAVE_GOLD_FITTING_POINTS and
GOLD_MIN_PROPENSITY).
11. Balancing Reliability and Speed
11.1 Number of Dockings (see page 93)
11.2 Early Termination (see page 93)
11.3 Controlling Reliability and Speed with GA Parameters (see page 94)
11.1 Number of Dockings
GOLD will dock each ligand several times, starting each time from a different random
population of ligand orientations. The results of the different docking runs are ranked by fitness
score.
The number of dockings to be performed on each ligand is set when the ligand file is defined
(see Section 4.5, page 32).
By default the number of dockings to be performed on each ligand is 10.
The total time spent docking a ligand obviously depends on the number of docking runs, so you
can make GOLD go faster by reducing this number. However, it is useful to perform at least a
few docking runs on each ligand. This increases the chances of getting the right answer. Also, if
the same answer is found in several different docking runs, it is usually a strong indicator that
the answer is correct.
The early termination option (see Section 11.2, page 93) can be used to prevent GOLD wasting
time performing multiple docking runs on easy ligands.
11.2 Early Termination
The early termination option instructs GOLD to terminate docking runs on a given ligand as
soon as a specified number of runs have given essentially the same answer. In this situation, it is
probable that the answer is correct, and GOLD will just be wasting time if it performs more
docking runs on that ligand.
To switch early termination on, click on the Allow early termination check box in the GOLD
front end (i.e. so that the box is coloured red). Then specify the early termination criterion. For
example, GOLD can be instructed to stop docking a ligand if it reaches a state in which the best
three solutions found so far are all within 1.5 Å rmsd of each other.
The rms deviation takes account of any ligand symmetry.
Early termination does not always save as much time as you might think, because it tends to be
invoked for easy (i.e. relatively rigid) ligands, which are quick to dock anyway.
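The early-termination test can be sketched in Python as below. This is an illustration under assumptions, not GOLD's code: poses are compared atom by atom in the fixed protein frame (no superposition), ligand symmetry is ignored (unlike GOLD, which accounts for it), and a higher fitness score is taken to mean a better solution. The function names are hypothetical.

```python
import itertools
import math


def rmsd(a, b):
    """RMSD (Å) between two poses given as equal-length lists of (x, y, z).

    Atoms are matched by index; no superposition and no symmetry handling.
    """
    if not a or len(a) != len(b):
        raise ValueError("poses must be non-empty and have the same number of atoms")
    return math.sqrt(sum(math.dist(p, q) ** 2 for p, q in zip(a, b)) / len(a))


def can_terminate(solutions, n_best=3, max_rmsd=1.5):
    """True if the n_best fittest solutions are all within max_rmsd of each other.

    solutions: list of (fitness, pose) pairs from the docking runs done so far.
    """
    if len(solutions) < n_best:
        return False
    best = sorted(solutions, key=lambda s: s[0], reverse=True)[:n_best]
    return all(rmsd(p, q) <= max_rmsd
               for (_, p), (_, q) in itertools.combinations(best, 2))


if __name__ == "__main__":
    pose = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
    shifted = [(x + 0.5, y, z) for x, y, z in pose]  # every atom moved by 0.5 Å
    far = [(x + 4.0, y, z) for x, y, z in pose]      # every atom moved by 4.0 Å
    print(can_terminate([(60.0, pose), (59.0, shifted), (58.0, pose)]))
    print(can_terminate([(60.0, pose), (59.0, far), (58.0, pose)]))
```

Because every pair among the top solutions must agree, a single outlier among the best three is enough to keep GOLD docking.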
TABLE I. Optimal sets (clean lists) with different resolution thresholds of none, 2.5 Å, and 2.0 Å
Full set (305 entries)
1a07 1a0q 1a1b 1a1e 1a28 1a42 1a4g 1a4k 1a4q 1a6w
1a9u 1aaq 1abe 1abf 1acj 1acl 1acm 1aco 1aec 1aha
1ai5 1aj7 1ake 1aoe 1apt 1apu 1aqw 1ase 1atl 1azm
1b58 1b59 1b6n 1b9v 1baf 1bbp 1bgo 1bl7 1blh 1bma
1bmq 1byb 1byg 1c12 1c1e 1c2t 1c5c 1c5x 1c83 1cbs
1cbx 1cdg 1cf8 1cil 1cin 1ckp 1cle 1com 1coy 1cps
1cqp 1ctr 1ctt 1cvu 1cx2 1d0l 1d3h 1d4p 1dbb 1dbj
1dbm 1dd7 1dg5 1dhf 1did 1die 1dmp 1dog 1dr1 1dwb
1dwc 1dwd 1dy9 1eap 1ebg 1eed 1ei1 1ejn 1ela 1elb
1elc 1eld 1ele 1eoc 1epb 1epo 1eta 1etr 1ets 1ett
1etz 1f0r 1f0s 1f3d 1fax 1fbl 1fen 1fgi 1fig 1fkg
1fki 1fl3 1flr 1frp 1ghb 1glp 1glq 1gpy 1hak 1hdc
1hdy 1hef 1hfc 1hiv 1hos 1hpv 1hri 1hsb 1hsl 1htf
1hti 1hvr 1hyt 1ibg 1icn 1ida 1igj 1imb 1ivb 1ivc
1ivd 1ive 1ivq 1jao 1jap 1kel 1kno 1lah 1lcp 1ldm
1lic 1lkk 1lmo 1lna 1lpm 1lst 1lyb 1lyl 1mbi 1mcq
1mcr 1mdr 1ml1 1mld 1mmb 1mmq 1mnc 1mrg 1mrk 1mts
1mtw 1mup 1nco 1ngp 1nis 1nsd 1okl 1okm 1pbd 1pdz
1pgp 1pha 1phd 1phf 1phg 1poc 1ppc 1pph 1ppi 1ppl
1pso 1ptv 1qbr 1qbt 1qbu 1qcf 1qh7 1ql7 1qpe 1qpq
1rbp 1rds 1rne 1rnt 1rob 1rt2 1sln 1slt 1snc 1srf
1srg 1srh 1srj 1stp 1tdb 1tka 1tlp 1tmn 1tng 1tnh
1tni 1tnl 1tph 1tpp 1trk 1tyl 1ukz 1ulb 1uvs 1uvt
1vgc 1vrh 1wap 1xid 1xie 1xkb 1ydr 1yds 1ydt 1yee
25c8 2aad 2ack 2ada 2ak3 2cgr 2cht 2cmd 2cpp 2ctc
2dbl 2er7 2fox 2gbp 2h4n 2ifb 2lgs 2mcp 2mip 2pcp
2phh 2pk4 2plv 2qwk 2r04 2r07 2sim 2tmn 2tsc 2yhx
2ypi 3cla 3cpa 3erd 3ert 3gch 3gpb 3hvt 3mth 3nos
3pgh 3ptb 3tpi 4aah 4cox 4cts 4dfr 4er2 4est 4fab
4fbp 4lbd 4phv 4tpi 5abp 5cpp 5er1 5p2p 6abp 6cpa
6rnt 6rsa 7cpa 7tim 8gch
Clean list (224 entries)
1a28 1a42 1a4g 1a4q 1a6w 1a9u 1aaq 1abe 1abf 1acj
1acl 1acm 1aco 1aec 1ai5 1aoe 1apt 1apu 1aqw 1ase
1atl 1azm 1b58 1b59 1b9v 1baf 1bbp 1bgo 1bl7 1blh
1bma 1bmq 1byb 1byg 1c12 1c1e 1c5c 1c5x 1c83 1cbs
1cbx 1cdg 1cil 1ckp 1cle 1com 1coy 1cps 1cqp 1cvu
1cx2 1d0l 1d3h 1d4p 1dbb 1dbj 1dd7 1dg5 1dhf 1did
1dmp 1dog 1dr1 1dwb 1dwc 1dwd 1dy9 1eap 1ebg 1eed
1ei1 1ejn 1eoc 1epb 1epo 1eta 1etr 1ets 1ett 1f0r
1f0s 1f3d 1fax 1fen 1fgi 1fkg 1fki 1fl3 1flr 1frp
1glp 1glq 1hak 1hdc 1hfc 1hiv 1hos 1hpv 1hri 1hsb
1hsl 1htf 1hvr 1hyt 1ibg 1ida 1imb 1ivb 1ivq 1jap
1kel 1lah 1lcp 1ldm 1lic 1lna 1lpm 1lst 1lyb 1lyl
1mbi 1mcq 1mdr 1mld 1mmq 1mrg 1mrk 1mts 1mup 1nco
1ngp 1nis 1okl 1okm 1pbd 1pdz 1phd 1phg 1poc 1ppc
1pph 1ppi 1pso 1ptv 1qbr 1qbu 1qcf 1qpe 1qpq 1rds
1rne 1rnt 1rob 1rt2 1slt 1snc 1srj 1tdb 1tlp 1tmn
1tng 1tnh 1tni 1tnl 1tpp 1trk 1tyl 1ukz 1ulb 1uvs
1uvt 1vgc 1wap 1xid 1xie 1ydr 1ydt 1yee 25c8 2aad
2ack 2ada 2ak3 2cht 2cmd 2cpp 2ctc 2dbl 2fox 2gbp
2h4n 2ifb 2lgs 2mcp 2pcp 2phh 2pk4 2qwk 2r07 2tmn
2tsc 2yhx 2ypi 3cla 3cpa 3erd 3ert 3gpb 3hvt 3tpi
4aah 4cox 4cts 4dfr 4est 4fbp 4lbd 4phv 5abp 5cpp
5er1 6rnt 6rsa 7tim
Clean list, resolution threshold 2.0 (92 entries)
1a28 1a4q 1a6w 1abe 1abf 1aec 1aoe 1apt 1apu 1aqw
1atl 1b58 1b59 1bma 1byb 1c1e 1c5c 1c5x 1c83 1cbs
1cil 1coy 1d0l 1d3h 1ejn 1eta 1f3d 1fen 1flr 1glp
1glq 1hfc 1hpv 1hsb 1hsl 1hvr 1hyt 1ida 1jap 1kel
1lcp 1lic 1lna 1lst 1mld 1mmq 1mrg 1mrk 1mts 1nco
1phd 1phg 1ppc 1pph 1qbr 1qbu 1rds 1rnt 1rob 1slt
1snc 1srj 1tmn 1tng 1tnh 1tni 1tnl 1tpp 1tyl 1ukz
1vgc 1wap 1xid 1xie 2ak3 2cmd 2cpp 2ctc 2fox 2gbp
2h4n 2qwk 2tmn 2tsc 3cla 3ert 3tpi 4dfr 4est 5abp
6rnt 7tim
16.1.3 Validation using the CCDC/Astex Test Set
CCDC/Astex Validation Overview (see page 131)
GOLD Performance (see page 133)
GOLD Performance as a Function of Protein Type (see page 134)
Influence of Mediating Water Molecules on GOLD Results (see page 135)
CCDC/Astex Validation Overview
The CCDC/Astex test set of protein-ligand complexes was used to determine the GOLD success
rates (see http://www.ccdc.cam.ac.uk/products/life_sciences/validate/). The set consists of 305
protein-ligand complexes. All complexes have had their protonation states set manually, and
have been checked extensively. It is a considerably extended version of the original GOLD
validation test set.
From this set, a set of 224 reliable complexes was selected. This clean set excluded all
complexes that might be unreliable. Complexes were considered to be unsuitable if they did not
pass the following checks:
Involvement of crystallographically-related protein units in ligand binding.
Identification of bad clashes between protein side chains and the ligand.
Presence of structural errors, and/or inconsistency of ligand placement with crystal structure
electron density.
Limiting the clean list to resolutions better than 2.0 Å left 92 entries, for which results are also
shown.
In addition, the set has been pruned to ensure diversity in terms of protein-ligand structures.
The full list of 305 entries, the clean list of 224 entries, and the limited clean list of 92 entries
are shown in Table I.
11.3 Controlling Reliability and Speed with GA Parameters
11.3.1 Relationship between GA Parameters and Speed (see page 94)
11.3.2 Using Automatic GA Parameter Settings (see page 94)
11.3.3 Using Pre-Defined GA Parameter Settings (see page 96)
11.3.4 Benchmarking of Reliability/Speed for Pre-defined GA Parameter Settings (see page 97)
11.3.5 GA Parameter Settings for Virtual Screening (see page 98)
11.3.1 Relationship between GA Parameters and Speed
The time taken by GOLD to dock ligands can be controlled by altering the values of the genetic
algorithm (GA) parameters (see Section 10., page 89).
GOLD runs for a fixed number of genetic operations (crossover, migration, mutation). The
easiest way to make GOLD go faster is to reduce the number of GA operations performed in the
course of a run. This is done through the Number of Operations variable (this parameter is called
maxops in the configuration file).
A reduction in Number of Operations is likely to change the optimum values of several other GA
parameters, particularly popsize, van der Waals and Hydrogen Bonding.
GOLD manipulates a pool of chromosomes of size popsize * Number of Islands. The size of this
pool should be such that the optimisation converges within the specified maximum number of
operations, Number of Operations. If the pool size is too small for a given value of Number of
Operations, the algorithm will converge prematurely. Conversely, if the pool size is too large the
algorithm will terminate before it has converged.
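As a rough illustration of how the pool size scales with Number of Operations, the short Python sketch below computes popsize * Number of Islands, and the number of operations available per chromosome, for the three GA parameter sets benchmarked in Section 11.3.4. The per-chromosome ratio is only our own indicator of how much work each pool member receives, not a GOLD parameter, and apart from maxops the dictionary keys are illustrative rather than configuration-file keywords.

```python
# GA settings from the Set A / Set B / Set C table in Section 11.3.4.
GA_SETS = {
    "A": {"maxops": 100000, "popsize": 100, "islands": 5},
    "B": {"maxops": 10000, "popsize": 100, "islands": 1},
    "C": {"maxops": 1000, "popsize": 50, "islands": 1},
}


def pool_size(popsize, islands):
    """Total number of chromosomes manipulated: popsize * Number of Islands."""
    return popsize * islands


def ops_per_chromosome(maxops, popsize, islands):
    """Genetic operations available per pool member over the whole run."""
    return maxops / pool_size(popsize, islands)


if __name__ == "__main__":
    for name, s in GA_SETS.items():
        print(name, pool_size(s["popsize"], s["islands"]),
              ops_per_chromosome(s["maxops"], s["popsize"], s["islands"]))
```

Set A works with a pool of 500 chromosomes; Set C, with only 1,000 operations, shrinks the pool to 50, so the pool is reduced along with the number of operations.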
The annealing parameters van der Waals and Hydrogen Bonding allow poor hydrogen bonds to
occur at the beginning of a genetic algorithm run, in the expectation that they will evolve to
better solutions. Both the vdw and H-bond annealing must be gradual and the population
allowed plenty of time to adapt to changes in the fitness function.
Because of these factors, it is difficult to set GA parameters by hand and you are recommended
to use automatic (ligand dependent) GA parameter settings (see Section 11.3.2, page 94), or one
of the default parameter sets offered in the GOLD front end (see Section 11.3.3, page 96).
11.3.2 Using Automatic GA Parameter Settings
The number of genetic operations performed (crossover, migration, mutation) is the key
parameter in determining how long a GOLD run will take (i.e. this parameter controls the
coverage of the search space).
GOLD can automatically calculate an optimal number of operations for a given ligand, thereby
making the most efficient use of search time; e.g. small ligands containing only one or two
rotatable bonds will generally require fewer genetic operations than larger, highly flexible
ligands.
The criteria used by GOLD to determine the optimal GA parameter settings for a given ligand
include: the number of rotatable bonds in the ligand, ligand flexibility, i.e. number of flexible
ring corners, flippable nitrogens, etc. (see Section 7., page 64), the volume of the protein binding
site, and the number of water molecules considered during docking (see Section 3.4, page 16).
The exact number of GA operations contributed by each of these factors (e.g. by each rotatable
bond in the ligand) is defined in the gold.params file (see Section 6.3, page 48).
To enable automatic GA settings, click on the Select GA Presets and Automatic Settings button in
the Genetic Algorithm Parameters panel (or hit Settings in the Control panel) then, in the
Settings selector window, click on Use automatic settings.
GOLD runs for a fixed number of genetic operations. Limiting this number will increase docking
speed, but the search space will be less well explored (see Section 11.3, page 94). The Search
efficiency setting can be used to balance the speed of docking against the predictive accuracy
(i.e. the reliability) of the results. With the Search efficiency set at 100%, GOLD will attempt to
apply optimal settings for each ligand; for a ligand with five rotatable bonds this will be around
30,000 GA operations. If the Search efficiency is set to 50%, GOLD will perform around 15,000
operations, thereby speeding up the docking by a factor of two. Similarly, by setting a Search
efficiency greater than 100%, it is possible to make the search more exhaustive (but slower).
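The arithmetic of the Search efficiency setting can be sketched as follows. GOLD's own estimate of the optimal number of operations (built from the contributions defined in gold.params) is not reproduced here; the sketch simply takes that estimate as an input, scales it by the efficiency percentage, and optionally clamps it to user-specified minimum and maximum numbers of operations. The function name is hypothetical.

```python
def scaled_operations(optimal_ops, efficiency_percent, min_ops=None, max_ops=None):
    """Number of GA operations for one ligand at a given Search efficiency (%).

    optimal_ops: the per-ligand estimate GOLD would use at 100% efficiency.
    min_ops / max_ops: optional user-specified bounds on the result.
    """
    if efficiency_percent <= 0:
        raise ValueError("search efficiency must be positive")
    ops = round(optimal_ops * efficiency_percent / 100)
    if min_ops is not None:
        ops = max(ops, min_ops)
    if max_ops is not None:
        ops = min(ops, max_ops)
    return ops


if __name__ == "__main__":
    # Figure from the text: about 30,000 operations for five rotatable bonds at 100%.
    for eff in (50, 100, 200):
        print(eff, scaled_operations(30000, eff))
    print(scaled_operations(30000, 50, min_ops=20000))  # a user minimum overrides the preset
```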
The Minimum number of operations in run will be updated automatically according to the
Search efficiency that is set. The automatic preset can be overridden to ensure that every ligand is
subjected to at least a user-specified number of operations. Similarly, the Maximum number of
operations in run can be set manually.
Results:
GOLD failed to produce an answer for 1ACL because the ligand contains no hydrogen-bonding
atoms (this problem has since been fixed). The subsequent analysis was therefore based on results
for 99 complexes.
In summarising the results, the GOLD prediction is defined as the best of the 20 dockings
according to the GOLD fitness score and not the docking that is closest to the experimental
result.
Each GOLD prediction was assigned to one of 4 subjective categories: good, close, errors or
wrong. Each prediction was also ranked by its rms with respect to the observed ligand position.
GOLD achieved a 71% rate of successful predictions (good or close).
3D plots of individual predictions are available on the CCDC web page.
Detailed tabulations of the predictions are in Appendix C: GOLD Predictions in First Series of
Validation Tests (see page 153).
16.1.2 Follow-Up Validation of Docking Results
The GOLD algorithm was improved in various ways following the first set of validation tests. A
second set of tests was then performed on 34 additional complexes in order to ensure that GOLD
had not been over-trained on the original set. The method used was the same as in the first set of
validation tests.
Results:
GOLD achieved a 74% rate of successful predictions (good or close).
3D plots of individual predictions are available on the CCDC web page.
Detailed tabulations of the predictions are in Appendix D: GOLD Predictions in Second Series
of Validation Tests (see page 160).
16. Accuracy of Predictions
16.1 Correlation between Predicted and Observed Ligand Positions (see page 129)
16.2 Correlation between Fitness Function and Biological Activity (see page 137)
16.1 Correlation between Predicted and Observed Ligand Positions
NOTE: This section and Appendix B summarise validation tests done when GOLD was first
developed using the GoldScore fitness function. Recently (2001-2), we have significantly
expanded the size of the test set and done comparisons between GoldScore and ChemScore. The
new validations do not change the basic conclusions outlined below in any major way and give
preliminary indications that GoldScore and ChemScore have comparable overall success rates.
A simple test of the effectiveness of a docking program is to take a protein-ligand complex from
the Protein Data Bank and extract the ligand. The docking program can then be used to predict
the binding mode of the ligand and a comparison made with the crystallographically observed
position. This methodology has been used to validate GOLD. Tests were done in two phases:
first, on a test set of 100 complexes; later, on an additional 34 complexes as a check against
over-training.
16.1.1 Initial Validation of Docking Results (see page 129)
16.1.2 Follow-Up Validation of Docking Results (see page 130)
16.1.3 Validation using the CCDC/Astex Test Set (see page 131)
16.1.4 Examples of GOLD Dockings (see page 136)
16.1.1 Initial Validation of Docking Results
The method used for each test calculation was as follows:
100 protein-ligand complexes were selected from the Protein Data Bank.
Parts of the protein remote from the binding site were deleted. Enough of the protein was
retained to ensure that all residues were present that might reasonably interact with the ligand.
The ligand was extracted from the protein binding site.
Hydrogen atoms were placed on both the protein and the ligand in order to ensure that ionisation
and tautomeric states were defined unambiguously. This involved making hypotheses about the
protonation states of residues such as His, Glu and Asp.
The ligand was minimised into a low-energy conformation.
The atom types of both the protein and ligand were checked for accuracy.
In almost all test runs, all water molecules were deleted from the protein structure. This is not
strictly defensible since water molecules often mediate protein-ligand binding. However, if more
careful judgements were made on which waters to remove, the effect would be to improve the
accuracy of the GOLD predictions. Hence, the deletion of all waters is a conservative strategy
which will make GOLD look less reliable than it really is, rather than more reliable.
20 docking runs were performed on each test complex, using the slowest default setting of the
genetic algorithm parameters.
When using automatic GA parameter settings, the parameters controlling the precise operation
of the genetic algorithm (population size, selection pressure, Niche size, etc.) will be set to auto
in the Genetic Algorithm Parameters panel. The actual GA settings used will be reported in the
ligand log file (see Section 14.10, page 118).
11.3.3 Using Pre-Defined GA Parameter Settings
To use one of the pre-defined GA parameter settings click on the Select GA Presets and
Automatic Settings button in the Genetic Algorithm Parameters panel, or hit Settings in the
Control panel, to open the Settings selector window.
Select Choose presets and choose from one of the pre-defined GA parameter settings listed.
The Default settings deliver high predictive accuracy but are relatively slow. Default settings are
recommended for use with large highly-flexible ligands, or for research applications where
speed of docking is not an issue and optimal accuracy is required.
The 2 times speed-up or 3 times speed-up settings are progressively quicker (predictive
reliability will fall off, but quite slowly). These settings are recommended for use with
compounds containing up to six flexible bonds and/or ring corners (see Section 7.1, page 64).
The 7-8 times speed-up settings will give comparable predictive accuracy to the slow, Default
settings when docking small ligands. These settings are recommended for use with ligands
containing one or two rotatable torsions and for virtual screening work (see Section 11.3.5, page
98).
It is possible to create your own default GA settings. To do this, you must edit the file
.gold_preferences in your home directory (see Section 15.4, page 127).
Individual GA parameter settings can be specified in the GOLD front end by typing directly into
the input boxes in the Genetic Algorithm Parameters panel (see Section 2.4, page 7). However,
it is recommended that you use one of the pre-defined GA parameter settings as opposed to
altering individual GA parameters, because the optimum values of the parameters are highly
correlated.
11.3.4 Benchmarking of Reliability/Speed for Pre-defined GA Parameter Settings
We have performed a great many experiments with different genetic algorithm (GA) settings.
Three such settings are summarised below:
We used GOLD with each of these settings to dock 100 ligands into their binding sites, using a
test set of 100 protein-ligand complexes selected from the PDB. 20 docking runs were done on
each ligand with each GA set. Root mean square deviations (rmsd) were computed between the
experimental result and the GOLD solution ranked top by the fitness function, and also between
the experimental result and the closest of the 20 dockings (i.e. not necessarily the top-ranked
solution). Results were:
GA Parameter            Set A     Set B     Set C
Number of Operations    100000    10000     1000
Population Size         100       100       50
Selection Pressure      1.1       1.1       1.125
Number of Islands       5         1         1
Crossover               95        100       100
Mutate                  95        100       100
Migrate                 10        0         0
Niche Size              2         2         2
Hydrogen Bonding        2.5       2.0       5.0
van der Waals           4.0       10.0      10.0
Edit into the file a line such as:
default_ga_setting /home/golduser/configfiles/myconfig.conf my protein
and create a configuration file (called /home/golduser/configfiles/myconfig.conf in the above
case) containing the desired GA settings.
The settings will appear in the Settings Selector window the next time GOLD is opened.
15.2 Customising Fitness Function Parameters
GOLD parameters are stored in the gold.params file in the GOLD distribution directory. It
can be customised by copying it, editing the copy, and instructing GOLD to use the edited file.
Parameters specific to GoldScore are stored in files of the type
goldscore.p450_<csd|pdb>.params (see Section 6.3, page 48).
The ChemScore fitness-function parameters are stored in the ChemScore file, which can
also be customised (see Section 6.5, page 58).
15.3 Customising the Torsion Angle Distribution File
It is possible to customise torsion distribution information by copying one of the standard torsion
distribution files, editing it, and instructing GOLD to use the edited file (see Section 9.3, page
84).
15.4 Creating Customised Default Genetic Algorithm Parameter Settings
A number of pre-defined genetic algorithm (GA) settings are offered when GOLD is opened.
It is possible to add your own default GA settings to this window.
To do this, you must edit the file .gold_preferences in your home directory. This file will
be created the first time you run GOLD, and will look something like this:
rmsd < n: number of predictions (out of 100) within n Å rmsd of the observed result.
In the GOLD front-end, the GA parameter set called Default settings corresponds to Set A
above; 7-8 times speed-up corresponds to Set B; and library screening settings corresponds to
Set C.
For careful work, we recommend the slow standard setting A, which typically finds correct
solutions in 70-80% of cases. Set C, which is fast enough for virtual library-screening, is
inevitably less accurate, but still finds the correct solution 60-70% of the time.
11.3.5 GA Parameter Settings for Virtual Screening
Existing GOLD users may have library screening settings available as one of the default
preferences. However, due to general advances in processor speed, we now recommend using
7-8 times speed-up for virtual screening work in order to take advantage of the associated
improvement in accuracy (see Section 11.3.1, page 94).
Note: If library screening settings are not available as a default preference, you can re-enable
them by editing the .gold_preferences file in your home directory (see Section 15.4, page 127).
         top-ranked;   top-ranked;   closest;      closest;
         rmsd < 2 Å    rmsd < 3 Å    rmsd < 2 Å    rmsd < 3 Å
Set A    70            79            83            88
Set B    64            77            79            89
Set C    62            68            72            86
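The difference between the top-ranked and closest columns can be made concrete with a small Python sketch. Each ligand's dockings are represented as hypothetical (fitness, rmsd) pairs, with higher fitness assumed to be better; the sample values are made up for illustration and are not taken from the benchmark. The function names are hypothetical.

```python
def top_ranked_rmsd(dockings):
    """rmsd of the docking with the best (highest) fitness score."""
    return max(dockings, key=lambda d: d[0])[1]


def closest_rmsd(dockings):
    """Smallest rmsd among all dockings, regardless of fitness."""
    return min(rmsd for _, rmsd in dockings)


def percent_within(values, cutoff):
    """Percentage of values strictly below the cutoff (Å)."""
    return 100.0 * sum(v < cutoff for v in values) / len(values)


if __name__ == "__main__":
    # Two made-up ligands, each with a few (fitness, rmsd) dockings.
    ligands = [
        [(52.1, 1.2), (48.7, 0.8), (40.3, 4.1)],
        [(61.4, 3.5), (59.9, 1.0), (55.0, 2.7)],
    ]
    top = [top_ranked_rmsd(d) for d in ligands]
    closest = [closest_rmsd(d) for d in ligands]
    print(percent_within(top, 2.0), percent_within(closest, 2.0))
```

The closest column is always at least as good as the top-ranked one, because it ignores whether the fitness function actually picked out the best pose.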
12. Running GOLD
12.1 Required Input Files (see page 99)
12.2 Starting GOLD (see page 99)
12.3 Running Interactively; Interactive Diagnostics (see page 100)
12.4 Submitting a GOLD job to the Background from the Front End (see page 100)
12.5 Running GOLD from the Command Line (see page 100)
12.6 Running in Parallel (see page 101)
12.1 Required Input Files
The following files must be available before a GOLD job can be run:
One or more files containing the ligand(s) to be docked, in MOL2, MOL, SD or PDB format (but
PDB format is not recommended for ligand files) (see Section 4., page 30).
A file containing the protein (or the part of a protein) into which the ligand is to be docked. This
may be in PDB or MOL2 format (see Section 3., page 9).
GOLD also needs a configuration file, which contains the names of the protein and ligand files,
and all the user-defined parameters such as genetic algorithm parameter settings, fitness flags,
etc. The configuration file can be created manually, but it is usually easier and preferable to
create it with the GOLD graphical front end (the file is written automatically when the Run, Save
& Exit or Submit & Exit buttons are hit) (see Section 2.1, page 3).
In addition, GOLD uses a parameter file (see Section 6.3, page 48) and (optionally) a torsion
distribution file (see Section 9., page 83). If the ChemScore fitness function is selected, it will
also use a ChemScore file (see Section 6.5, page 58). All these files are supplied in the GOLD
distribution and, by default, will be found automatically by the program. If required, any of the
files can be copied to a user's directory and edited, and GOLD can then be directed to use the
edited file.
12.2 Starting GOLD
GOLD opens output log files, so each GOLD run should be performed in a separate directory.
Create a directory in which to run GOLD and copy the protein and ligand files into it.
You can also write each set of ligand output files to its own sub-directory.
GOLD can be run from the command line or via the graphical front end. The easiest way to get
started is to use the front end (see Section 2., page 3).
From the front end, you can run a GOLD job interactively (see Section 12.3, page 100), submit it
to the background (see Section 12.4, page 100), or save the configuration file so that GOLD may
be started from the command line (see Section 12.5, page 100).
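Setting up one directory per job can be scripted. The Python sketch below copies a configuration file and its input files into a fresh directory and returns the gold_auto command line described in Section 12.5. It assumes that gold_auto (in $GOLD_DIR/bin) is on the PATH and that the configuration file refers to the protein and ligand files by names relative to the run directory; the helper names are hypothetical.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def prepare_run_dir(run_dir, conf_file, input_files):
    """Create a new directory for one GOLD job and copy its input files into it.

    Returns the command to run inside that directory.
    """
    run_dir = Path(run_dir)
    run_dir.mkdir(parents=True, exist_ok=False)  # never mix the log files of two jobs
    for f in [conf_file, *input_files]:
        shutil.copy(f, run_dir)
    return ["gold_auto", Path(conf_file).name]


def launch(cmd, run_dir):
    """Start the job without waiting for it (like '&' in the shell)."""
    return subprocess.Popen(cmd, cwd=run_dir)


if __name__ == "__main__":
    # Dry run with a dummy file: prepare the directory and show the command only.
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        conf = tmp / "gold.conf"
        conf.write_text("# placeholder configuration\n")
        cmd = prepare_run_dir(tmp / "run_01", conf, [])
        print(cmd, sorted(p.name for p in (tmp / "run_01").iterdir()))
        # launch(cmd, tmp / "run_01")  # only on a machine with GOLD installed
```

Refusing to reuse an existing directory is a deliberate choice here: it enforces the one-run-per-directory rule rather than silently overwriting an earlier job's log files.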
15. Saving and Reusing Program Settings
15.1 Saving and Re-using Program Settings in Configuration Files (see page 126)
15.2 Customising Fitness Function Parameters (see page 127)
15.3 Customising the Torsion Angle Distribution File (see page 127)
15.4 Creating Customised Default Genetic Algorithm Parameter Settings (see page 127)
15.1 Saving and Re-using Program Settings in Configuration Files
The configuration file is a text file which specifies the GOLD calculation that is to be run,
including details of the ligand, the protein binding site, the fitness-function parameter file to be
used, the torsion distribution file to be used, and the genetic algorithm parameters. Although the
file can be generated with a standard text editor, the easiest way to create it is to use the GOLD
front end (see Section 2.1, page 3).
Any settings that have been defined in the GOLD front end can be saved as a configuration file
by selecting the button Save & Exit. Alternatively, the file will be saved automatically if you
start a GOLD job from the front end with the Submit & Exit or Run buttons.
By default, the configuration file will be saved in the directory from which GOLD was opened
and will be called gold.conf. Use the entry box next to the Configuration File button to
change the file name and/or directory (any file name can be used).
Once a configuration file has been created, it can be re-used, either as a quick way of reading
program settings into the GOLD front end or to run GOLD from the command line (see Section
12., page 99).
To load a previously created configuration file into the front end, enter the file name into the box
next to the Configuration File button and hit return. The parameters read in from the
configuration file will overwrite any parameters that have already been set in the GOLD front
end.
If you have a valid configuration file (i.e. one that completely specifies a GOLD job), you can
run GOLD from the command line by using a simple command available in $GOLD_DIR/bin.
For example, if the configuration file is gold.conf, the command is:
% gold_auto gold.conf &
If you find yourself using a configuration file over and over again, you may want to add it to the
options listed in the GOLD start-up window (the Settings Selector window). This is done by
editing the file .gold_preferences in your home directory (see Section 15.4, page 127).
Note: SGI users running IRIX will also be given the option to use grommitt for simple
visualisation of docking results (see Section 18.1, page 142).
14.14 Exporting Fitness-Function Data to SILVER
It is possible to write additional information to docked solution files.
This information includes the values of the individual fitness-function components and is written
to SD file tags; for MOL2 files, these tags are written to comment blocks (see Section 14.2, page
111).
This information can be utilised by SILVER (supplied with GOLD). SILVER allows you to
define and calculate a wide variety of descriptors (parameters that describe dockings) which may
be used to analyse the results of a docking run. For further information, refer to the SILVER
User Guide.
12.3 Running Interactively; Interactive Diagnostics
GOLD can be run interactively by hitting the Run button in the front end. However, since
docking often takes several minutes or even hours, it is usually better to run the job in the
background.
If GOLD is run interactively, output that is written to the log files is also displayed in a window.
The parallel version only gives a summary as it is not possible to track multiple files.
You can use the Interrupt GA button to interrupt GOLD and terminate the docking run.
If any error conditions are encountered, they will be displayed in another window. Note that only
fatal errors are reported for the parallel version.
When GOLD is being run interactively, SILVER can be used to display the current top solution
from a genetic algorithm run (see Section 18.1, page 142). To do this, click on the Display
options button in the GOLD front end.
12.4 Submitting a GOLD job to the Background from the Front End
You can submit a GOLD job to the background by using the Submit&Exit button in the front end,
having first specified all the required information, such as protein and ligand file names,
parameter settings, etc.
12.5 Running GOLD from the Command Line
Unix platforms:
GOLD can be run directly in the background by using a simple command available in $GOLD_DIR/bin:
% gold_auto gold.conf &
where gold.conf is the name of a configuration file.
Windows:
GOLD can be run on Windows by starting a command prompt, navigating to the directory
containing the gold.conf file and running the following command:
"C:\Program Files\CCDC\gold_v3.1\gold\d_win32\bin\gold_win32.exe"
The above command assumes that GOLD is installed in the default installation directory and
that the configuration file is called gold.conf. If the configuration file has a different name
(e.g. new_conf_filename.conf), it must be given on the command line:
"C:\Program Files\CCDC\gold_v3.1\gold\d_win32\bin\gold_win32.exe" new_conf_filename.conf
12.6 Running in Parallel
12.6.1 Parallel Virtual Machine (PVM) (see page 101)
12.6.2 Using the PVM Console (see page 102)
12.6.3 Diagnosis of PVM Problems (see page 103)
12.6.4 Selecting and Deselecting Machines (see page 104)
12.6.5 Setting the Maximum Number of Processes (see page 105)
12.6.6 Using GOLD with your own PVM Installation (see page 105)
12.6.1 Parallel Virtual Machine (PVM)
The parallel version of GOLD uses PVM (Parallel Virtual Machine) in its operation. PVM is a
3rd party public-domain library of routines that allows a program to schedule and harvest results
across a network of machines and/or processors.
PVM is supplied with GOLD for UNIX-based platforms only (parallel versions can only be run
on Windows with third party applications) and allows users to distribute jobs over their network,
across a virtual cluster of machines in order to harness the processing power of multiple
machines concurrently.
If PVM is not installed, GOLD disables the parallel version. There is also an option, -np, which
allows you to disable the parallel version, if required:
UNIX: $GOLD_DIR/bin/gold -np
Windows: <InstallDir>/bin/gold -np
Cluster 1 : bestranking structure is gold_soln_ligand_m1_8.mol2
Cluster 2 : bestranking structure is gold_soln_ligand_m1_10.mol2
Cluster 3 : bestranking structure is gold_soln_ligand_m1_4.mol2
Cluster 4 : bestranking structure is gold_soln_ligand_m1_9.mol2
14.11 File Containing Error Messages
The file gold.err lists any errors found by the program. These are generally fatal and cause
the program to stop. It is a good idea to check gold.err if something goes wrong.
Errors found by the atom-type checker are written to gold.err. If you are unsure about your
atom typing you should therefore check this file. For example:
In the parallel version, warning messages are logged in individual error files - one for each
process. They are not sent back to the central parallel scheduling process.
gold.err is line buffered so errors are logged immediately. If you are running GOLD
interactively, the contents of gold.err will appear in a separate window.
14.12 Process File
The file gold.pid records the user, host and process number of the GOLD job. It is deleted
when GOLD exits. Its purpose is to stop the user running two GOLD jobs in the same directory.
If the machine goes down, or GOLD crashes or is killed with signal 9, you will need to remove
gold.pid before you can run another GOLD job in the same directory.
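As an illustration, the check described above can be scripted before launching a job. This is a minimal Python sketch, not part of the GOLD distribution: the function names are hypothetical, and it only tests whether the file exists rather than parsing its contents, whose exact layout is not documented here.

```python
from pathlib import Path


def gold_pid_present(job_dir: str) -> bool:
    """Return True if a gold.pid file exists in job_dir.

    Its presence means a GOLD job is running there, or a previous job
    died without removing it (machine failure, crash or kill -9).
    """
    return (Path(job_dir) / "gold.pid").exists()


def remove_stale_pid(job_dir: str) -> bool:
    """Delete gold.pid; only call this once you are sure no job is running.

    Returns True if a file was actually removed.
    """
    pid_file = Path(job_dir) / "gold.pid"
    if pid_file.exists():
        pid_file.unlink()
        return True
    return False


if gold_pid_present("."):
    print("gold.pid found: check for a running job before removing it")
```

Deleting the file is deliberately a separate step, so a script cannot silently clear the lock of a job that is still running.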
14.13 Viewing Docked Solutions in SILVER
To visualise docked solutions in SILVER click on Display options, then select either Show in
SILVER to view all results after a docking run has completed, or Show in SILVER now in order
to visualise current results immediately.
In the above example, at a clustering distance of 0.75 Å, there are four different clusters of
solutions:
0.90 | 1 2 3 5 | 4 7 | 6 9 10 | 8 | files (d= 0.75 )
Note: Clusters are separated by the | symbol and rankings are used rather than run numbers (see
Section 14.5, page 112).
The first cluster contains four solutions, ranked 1, 2, 3 and 5; the best-ranking structure in
this cluster is ranked_structure_m#_1.mol2, which corresponds to the docked solution
gold_soln_ligand_m1_8.mol2. Likewise, the second cluster contains two solutions, ranked 4
and 7; the best-ranking structure in this cluster is ranked_structure_m#_4.mol2, which
corresponds to the docked solution gold_soln_ligand_m1_10.mol2, and so on for the third and
fourth clusters.
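Because the members of each cluster are listed in ranked order, the best-ranking solution of a cluster is simply its lowest ranking number. The short Python sketch below uses that rule. The function name is hypothetical, and the ranking-to-file mapping is filled in from the cluster listing GOLD writes for this example, covering only the rankings that head each cluster.

```python
def best_per_cluster(clusters, rank_to_file):
    """One report line per cluster, naming its best-ranked solution.

    clusters: lists of rankings, as in the clustering report
    rank_to_file: maps a ranking to its gold_soln_* file name
    """
    lines = []
    for number, members in enumerate(clusters, start=1):
        best = min(members)  # ranking 1 has the highest fitness, so the lowest number wins
        lines.append(f"Cluster {number} : best-ranking structure is {rank_to_file[best]}")
    return lines


clusters = [[1, 2, 3, 5], [4, 7], [6, 9, 10], [8]]
rank_to_file = {
    1: "gold_soln_ligand_m1_8.mol2",
    4: "gold_soln_ligand_m1_10.mol2",
    6: "gold_soln_ligand_m1_4.mol2",
    8: "gold_soln_ligand_m1_9.mol2",
}
for line in best_per_cluster(clusters, rank_to_file):
    print(line)
```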
Symbolic links will be generated in the output directory which will link to the top-ranked
solution in each cluster:
Parallel GOLD dockings are distributed over a PVM at the ligand level such that each ligand is
assigned to a particular node within the PVM and then docked. Results are returned to the PVM
Master machine whilst new ligands are distributed amongst idle machines within the PVM until
the GOLD job is completed.
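Ligand-level scheduling follows the same pattern as a worker pool: each idle worker is handed the next whole ligand, and results are gathered in one place. The following Python sketch uses a thread pool purely as an analogy. It does not use PVM, and dock_ligand is a hypothetical placeholder for a real docking run.

```python
from concurrent.futures import ThreadPoolExecutor


def dock_ligand(ligand: str) -> tuple[str, float]:
    """Hypothetical placeholder for docking one whole ligand on one node."""
    return ligand, float(len(ligand))  # dummy "fitness" for illustration only


def dock_all(ligands: list[str], n_workers: int) -> dict[str, float]:
    """Hand whole ligands to a fixed pool of workers and collect results centrally.

    The executor gives each idle worker the next undocked ligand, which mirrors
    the master/slave pattern described above.
    """
    results: dict[str, float] = {}
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for ligand, fitness in pool.map(dock_ligand, ligands):
            results[ligand] = fitness
    return results


print(dock_all(["lig_a", "lig_bb", "lig_ccc"], n_workers=2))
```

Distributing whole ligands keeps each unit of work independent, so a worker never waits on another worker's result.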
PVM works by using daemons. When you start PVM, a daemon will be created on the machine
you are using (we will call this machine the master). You can then add further computers (which
we will call slaves) to the virtual machine (see Section 12.6.4, page 104). Adding each new
machine will start a slave daemon on that machine.
You can only use each host as a member of one virtual machine. This is because a user can only
have one daemon running on a given machine.
When using GOLD with PVM, it is strongly recommended that you pick one machine as master
and always use that machine for setting up and starting GOLD jobs.
To run parallel GOLD using PVM, passwordless shell access (either RSH or SSH) must be set
up between all of the machines that you wish to use in your PVM cluster. Your systems
administrator should be able to set this up for you. To get PVM to work with SSH you need to
set a global environment variable $PVM_RSH to ssh on all systems that you intend to use in the
PVM cluster.
PVM user manual pages can be found in $PVM_ROOT/man. For more information, see the
PVM home page at http://www.netlib.org/pvm3.
12.6.2 Using the PVM Console
The PVM software provides a command line console. Once you have set the environment
variable $PVM_ROOT you can start it by typing:
$PVM_ROOT/lib/pvm
at the command line.
The setenv command in the console will generate a listing of the local environment set in
PVM.
The conf command will tell you which hosts are currently present within your virtual
machine.
The PVM console allows you to add machines to and delete machines from your virtual machine
using add and delete, as well as view details about PVM. If there are problems with a specific
node, or machine, try the command:
add <node-name>
and see if it generates any useful information about the cause of the problem.
GOLD provides a simple interface to PVM that allows you to add machines (see Section 12.6.4,
page 104); however, you should use the console to remove them. If you delete them in the GOLD
interface, they are only flagged as "do not use", because we cannot guarantee that a user is not
using PVM for other purposes.
Adding a machine will not affect any other software, but deleting a machine might.
12.6.3 Diagnosis of PVM Problems
If you are having difficulty getting PVM running correctly on your system, in the first instance
please check the following:
1. Check that the environment variable
$PVM_ROOT
is set correctly and globally on all machines within the PVM cluster.
2. Check that your system temporary area is not full. We have occasionally heard of cases where
PVM could not start correctly because /tmp on the user's machine was full.
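Both checks can be scripted for the machine you are logged in to. This is a Python sketch under stated assumptions: the function name and the 10 MB free-space threshold are hypothetical. Because it only inspects the local environment, it must be run on every host in the PVM cluster.

```python
import os
import shutil


def pvm_preflight(tmp_dir: str = "/tmp", min_free_bytes: int = 10 * 1024 * 1024) -> list[str]:
    """Run the two basic checks on this machine and return any problems found."""
    problems = []
    pvm_root = os.environ.get("PVM_ROOT")
    if not pvm_root:
        problems.append("PVM_ROOT is not set")
    elif not os.path.isdir(pvm_root):
        problems.append(f"PVM_ROOT points to a missing directory: {pvm_root}")
    free = shutil.disk_usage(tmp_dir).free  # bytes available in the temporary area
    if free < min_free_bytes:
        problems.append(f"{tmp_dir} has only {free} bytes free")
    return problems


print(pvm_preflight() or "basic checks passed")
```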
Once you have performed these checks, you can begin diagnosing the root cause of the problem.
The UNIX GOLD distribution includes a PVM diagnostics script called test_pvm.sh. To run this
script, please execute the command:
$GOLD_DIR/bin/test_pvm.sh
and follow the on-screen instructions. If you are unable to interpret the information generated by
this script, please send the entire output by email to support@ccdc.cam.ac.uk and we will
diagnose any PVM problems you may have.
Additional diagnostic information can be obtained from various files that can be found on the
machines within your PVM cluster. In particular, the PVM log files are often very useful. Each
daemon generates its own log file. They take the form:
/tmp/pvml.<user id>
and are generated on both the PVM master machine and the PVM slave machines. They can
contain relevant information (or sometimes lack expected lines) that indicates the source of the
problem.
For example, if PVM is configured correctly you should expect to see the text line
Running on <platform type>
14.10.3 Identification of Different Binding Modes (Clustering of Ligand Poses)
GOLD clusters docked solutions according to how similar the poses are in terms of their RMSd
(see Section 14.10.2, page 120). A link can be generated to the top ranked solution from each
distinct cluster. This can be useful in identifying different ligand binding modes. Considering
solutions from different clusters is often more relevant than taking the top n ranked poses since
these will often be very similar (i.e. all from the same cluster of solutions).
Open the Output Preferences window by hitting the Output button in the GOLD front end. Then,
switch on the Create links for different binding modes check-box, and specify an RMSd
clustering distance (this determines how similar the poses are in each cluster of solutions). By
default the clustering distance is 0.75 Å:
A clustering report is given at the end of the ligand log file. The clusters themselves and the
individual solutions within each cluster are in ranked order (i.e. the first member of the first
cluster is always the top-ranked solution). For example, output from a run of 10 GA dockings
may look like:
In this case, solution number 4 had the largest fitness score (this solution will be in
gold_soln_ligand_m#_4.mol2, which will be symbolically linked to
ranked_ligand_m#_1.mol2), while solution number 3 had the worst fitness.
The numbers in the matrix of rms deviations refer to the rankings, not the run numbers (e.g. row
1 of the above matrix refers to the solution with the best fitness score, contained in
ranked_ligand_m#_1.mol2).
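The symmetry handling mentioned in Section 14.10.2 works by taking the RMSD between two poses as the minimum over all atom mappings that leave the ligand graph unchanged. The minimal Python sketch below is not GOLD's implementation. The mappings (automorphisms) are supplied by the caller rather than found by graph isomorphism, and the poses are compared in place without superposition, which is the usual choice when both poses sit in the same binding-site frame.

```python
import math

Coord = tuple[float, float, float]


def rmsd(a: list[Coord], b: list[Coord]) -> float:
    """RMSD of two poses whose atoms are listed in the same order (no superposition)."""
    total = sum((ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
                for (ax, ay, az), (bx, by, bz) in zip(a, b))
    return math.sqrt(total / len(a))


def symmetric_rmsd(a: list[Coord], b: list[Coord], mappings: list[list[int]]) -> float:
    """Smallest RMSD over symmetry-equivalent atom mappings.

    mappings[k][i] gives the atom of pose b that plays the role of atom i of
    pose a; the identity mapping should be included. Finding these mappings
    (e.g. by graph isomorphism) is assumed to be done elsewhere.
    """
    return min(rmsd(a, [b[j] for j in mapping]) for mapping in mappings)


# Toy carboxylate-like fragment: the two oxygens (atoms 1 and 2) are swapped in pose b.
pose_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)]
pose_b = [(0.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
print(rmsd(pose_a, pose_b))                                    # about 1.63
print(symmetric_rmsd(pose_a, pose_b, [[0, 1, 2], [0, 2, 1]]))  # 0.0
```

Without the symmetry correction, two chemically identical poses would appear 1.63 Å apart simply because equivalent atoms were numbered differently.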
Finally, the rms deviations are used as input to a hierarchical cluster analysis, using the complete
linkage algorithm. Each line shows one iteration of the clustering algorithm, the distance
between the clusters that were merged at that step, and the contents of the current set of clusters.
Clusters are separated by the | symbol and rankings are used rather than run numbers. For
example, the solutions ranked_ligand_m#_2.mol2 and
ranked_ligand_m#_4.mol2 were merged in the first step of the following cluster
analysis:
Final Ranking 4 2 5 1 3
_______________________________
RMSD Matrix of RANKED solutions
       2     3     4     5
1:   4.8   4.7   5.1  10.1
2:         4.0   3.1  10.9
3:               4.1  10.4
4:                    11.0
Clustering using complete linkage.
Structure ids are RANKING
Dist Clusters...
3.14 | 4 2 | 3 | 5 | 1 |
4.06 | 4 2 3 | 1 | 5 |
5.07 | 4 2 3 1 | 5 |
10.95 | 4 2 3 1 5 |
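The complete-linkage procedure can be sketched in Python as follows (function names are hypothetical; this is not GOLD's code). Given the one-decimal RMSD matrix above, it merges rankings 2 and 4, then adds 3, then 1, then 5, as in the log. The merge distances come out as 3.1, 4.1, 5.1 and 11.0 rather than 3.14, 4.06, 5.07 and 10.95 because the printed matrix is rounded, and members are listed in numeric order rather than in the order GOLD prints them. The cut helper returns the clusters present at a chosen clustering distance, which is one way a setting such as the default 0.75 Å can be applied.

```python
from itertools import combinations


def complete_linkage(items, rmsd_matrix):
    """Agglomerative clustering with complete linkage.

    items: solution rankings, e.g. [1, 2, 3, 4, 5]
    rmsd_matrix: dict mapping (i, j) with i < j to the RMSD between rankings i and j
    Returns one (merge distance, clusters) tuple per merge step.
    """
    def dist(i, j):
        return rmsd_matrix[(min(i, j), max(i, j))]

    clusters = [[i] for i in items]
    steps = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Complete linkage: clusters are as far apart as their two most distant members.
            link = max(dist(i, j) for i in clusters[a] for j in clusters[b])
            if best is None or link < best[0]:
                best = (link, a, b)
        link, a, b = best
        merged = sorted(clusters[a] + clusters[b])
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
        steps.append((link, [list(c) for c in clusters]))
    return steps


def cut(items, steps, distance):
    """Clusters that exist once every merge at or below `distance` has been made."""
    current = [[i] for i in items]
    for link, clusters in steps:
        if link > distance:
            break
        current = clusters
    return current


matrix = {(1, 2): 4.8, (1, 3): 4.7, (1, 4): 5.1, (1, 5): 10.1,
          (2, 3): 4.0, (2, 4): 3.1, (2, 5): 10.9,
          (3, 4): 4.1, (3, 5): 10.4,
          (4, 5): 11.0}
steps = complete_linkage([1, 2, 3, 4, 5], matrix)
for link, clusters in steps:
    print(link, clusters)
print(cut([1, 2, 3, 4, 5], steps, 4.5))  # [[1], [5], [2, 3, 4]]
```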
somewhere in the PVM log file on the PVM master machine.
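You can look for that line with a short script. In this Python sketch the helper names are hypothetical. The log path follows the /tmp/pvml.<user id> pattern given above, using the numeric user id (Unix only), and the check simply searches for the text Running on anywhere in a line.

```python
import os
from pathlib import Path


def pvm_log_path(tmp_dir: str = "/tmp") -> Path:
    """This user's PVM daemon log, /tmp/pvml.<user id> (numeric uid, Unix only)."""
    return Path(tmp_dir) / f"pvml.{os.getuid()}"


def pvm_log_looks_healthy(log_text: str) -> bool:
    """True if some line of the log contains 'Running on' (followed by the platform type)."""
    return any("Running on" in line for line in log_text.splitlines())


log = pvm_log_path()
if log.exists():
    healthy = pvm_log_looks_healthy(log.read_text(errors="replace"))
    print(log, "OK" if healthy else "no 'Running on' line")
else:
    print(log, "not found")
```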
For further information, please consult the PVM troubleshooting guide:
http://www.netlib.org/pvm3/book/node1.html
12.6.4 Selecting and Deselecting Machines
Click on Choose machines in the GOLD front end to launch the parallel process scheduling
window:
The scheduling window allows you to select a set of machines, across which a parallel GOLD
job will be distributed. A GOLD job may be distributed across multiple processors on a single
machine, or across several single-processor machines, or across several multiple-processor
machines.
The process scheduler allows you to add suitable hosts into the schedule for use in docking a
ligand.
By clicking on the Add button you can add new machines to your schedule:
Type in a host name (your administrator must install GOLD so that it knows the names of
available host machines).
For each host chosen, you need to specify a value for Number of Processes. This tells GOLD
how many separate docking runs to start on that machine. For single-processor machines,
Number of Processes should usually be set to 1; on machines with more than one processor, it
should usually be greater than one, depending on how many of the machine's processors you
wish to use.
The Host file name button allows you to read a file that contains a host configuration previously
created when using parallel GOLD. If you click on this button, GOLD then prompts you for a
file to read. It will read hosts and numbers of processes from this file, and attempt to add these
hosts to your configuration.
12.6.5 Setting the Maximum Number of Processes
The entry box labelled Maximum number of distributed processes allows specification of the
maximum number of GOLD processes that can run simultaneously. This should normally be set
equal to the number of processors available for the GOLD job to run on.
Note: If the maximum number of distributed processes is set to a number greater than the total
number of processes listed for the individual hosts in the PVM configuration, GOLD will spawn
more jobs on each machine than specified until the maximum number of distributed processes is
being run. In general, any discrepancy between the numbers of processes listed per host and the
maximum number of distributed processes can lead to more or fewer processes than intended
running on each machine.
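Because this discrepancy is easy to introduce, it can help to compare the two settings before submitting a job. This Python sketch uses a hypothetical function name, and the host names and counts are illustrative. It only reports a mismatch; it does not predict how GOLD will actually place any extra processes.

```python
def check_process_settings(processes_per_host: dict[str, int], max_distributed: int) -> str:
    """Compare the per-host Number of Processes values with the overall maximum."""
    listed = sum(processes_per_host.values())
    if max_distributed > listed:
        return (f"maximum ({max_distributed}) exceeds the {listed} processes listed: "
                "GOLD may start more jobs per host than specified")
    if max_distributed < listed:
        return (f"maximum ({max_distributed}) is below the {listed} processes listed: "
                "fewer processes than listed will run")
    return "ok"


hosts = {"node1": 2, "node2": 1, "node3": 4}  # illustrative host names and counts
print(check_process_settings(hosts, 7))   # ok
print(check_process_settings(hosts, 10))
```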
12.6.6 Using GOLD with your own PVM Installation
In some circumstances, users may prefer to run parallel GOLD using a pre-existing installation
of PVM rather than the version packaged with the UNIX GOLD installer. However, this can
cause difficulties, since the parallel components of GOLD are compiled against the version of
PVM packaged with GOLD, using specific compiler flags.
If the user's version of PVM is significantly different, parallel GOLD may not function correctly
in its default configuration. The solution is to re-compile the PVM parts of GOLD on the user's
system. For this reason, the UNIX GOLD distribution includes a tar-gzip patch file that rebuilds
the PVM part of GOLD on the local system. It also recompiles the front end and the PVM
shared object used in the main GOLD process.
If you would like to try recompiling the parallel components of GOLD on your own system, you
will find the required patch file here:
$GOLD_DIR/gold_pvm_patch.tar.gz
Please unpack this file and consult the ReadMe file for further details.
14.10.2 Comparison of Docking Solutions
Following the completion of all docking runs on a ligand, the results from the different runs are
compared in the ligand log file.
The file will include a matrix of rms deviations between the various docked ligand positions.
The rms deviation algorithm takes account of symmetry effects, using a graph isomorphism
algorithm. For example:
The progress of each docking run (see Section 14.10.1, page 119).
A comparison of the various docking solutions found (see Section 14.10.2, page 120).
Clustering of ligand poses, for identification of solutions with different binding modes (see
Section 14.10.3, page 122).
You can choose not to save ligand log files if you prefer (see Section 14.1, page 109).
14.10.1 Information on the Progress of Docking Runs
As each docking run is performed on a ligand, the progress of the genetic algorithm is recorded
in the ligand log file.
The best (most fit) individual at any time is listed. The total fitness and its component terms are
also displayed.
For GoldScore, the internal vdw energy includes the ligand torsional energy. The external vdw
energy is normally scaled by a factor of 1.375 and summed with the other components to give
the total fitness (this is to encourage hydrophobic contact between the protein and ligand).
During a docking run, the fitness score may appear to get worse as the docking proceeds. This is
because the effects of poor H-bond geometry and close non-bonded contacts are artificially
down-weighted at early stages of the docking (annealing). Only the final fitness score (i.e. from
the completed docking) is meaningful.
The message Reordering... refers to a re-ranking of the GA populations caused by the annealing
process.
At the end of the GA run, the solution is output and summarised.
Here is an example output:
13. Rescoring
Different scoring functions may perform better for selected cases. You may find, for example,
that ChemScore outperforms GoldScore in ranking actives or one protein class, whereas the
reverse will apply for other classes.
Therefore, when screening large numbers of compounds, rescoring docking poses with
alternative scoring functions and considering the best results from each (consensus scoring) can
have a favourable impact on the overall rank ordering of ligands.
13.1 Rescoring Overview (see page 106)
13.2 Setting Up a Rescoring Run (see page 106)
13.1 Rescoring Overview
It is possible to rescore a single ligand or a set of ligands in one or more files.
Typically, a user will rescore GOLD solution files with an alternative scoring function.
However, it is also possible to score a known ligand pose from an alternative source (for
example, from a known crystal structure or a solution from another docking program).
Note: when rescoring a pose from a source other than a GOLD solution file, it will not be possible
to use the optimised positions of polar protein hydrogen atoms (see Section 13.2, page 106).
Rescoring, like docking, requires a prepared protein input file and a fully defined binding site
(preferably the same definition that was used for the original docking). The ligand file, scoring
function and output preferences must also all be specified (see Section 13.2, page 106).
GOLD can perform a local optimisation of the ligand conformation that is to be rescored. This is
important because even a slight adjustment of the pose (via a simple minimisation in an
appropriate force field) can greatly increase the fitness score.
When rescoring a GOLD solution file, it is possible to use the positions of the rotatable protein
hydrogens that were generated during the original docking as a starting point for the
minimisation. If these are not available then the default hydrogen atoms positions specified in
the protein input file will be used.
Rescored solution files can be written out that will contain the new scoring function terms and
can be used with SILVER (see Section 13.2, page 106).
It is not possible to use the rescore feature if GOLD is being run in parallel (see Section 12.6,
page 101).
13.2 Setting Up a Rescoring Run
Rescoring requires essentially the same information as a normal docking run. You will therefore
need to:
Provide a prepared protein input file (see Section 3.10, page 29).
Define the binding site (preferably the same definition that was used for the original docking),
i.e. you must specify the approximate centre and extent of the binding site (see Section 13.1,
page 106).
Use the ligand selection dialog to specify the ligand file you wish to rescore.
Note: When the Rescore check-box is switched on, the ligand selection dialog will contain an
additional option. Hit the Add all solutions in directory button to automatically add all GOLD
solution files (i.e. all files named gold_soln_*) in the specified directory to the Current
Ligand File Selection.
Specify the fitness function to be used for the rescoring (see Section 6.1, page 46).
Switch on the Rescore check-box in the Fitness Function Settings section of the GOLD front-
end. To specify the settings to be used for the rescoring run hit the Options button. This will open
the Rescoring Settings dialog:
The following Calculation Options are available:
Perform local optimisation (simplexing)
Enable this check-box to minimise the docked ligand pose before rescoring. Simplexing is
important if you are to obtain meaningful scores: due to the nature of scoring functions, small
changes in the location or conformation of the pose can have large effects on the calculated
score. Note: simplexing can also affect rotatable protein hydrogen atoms (see Section 14.6,
page 115).
Retrieve rotatable H positions from file if available
When rescoring a GOLD solution file it is possible to use the optimised positions of the polar
protein hydrogen atoms that were generated during the original docking (see Section 14.6,
page 115). If this option is not switched on (or no rotatable H positions are available) then the
default hydrogen atoms positions specified in the protein input file will be used.
For example, if concatenated_output = Myfile.mol2 the log file will be named
Myfile.rescore.log.
For each rescored ligand a total fitness score and the component scoring terms are listed.
Status gives an indication of whether or not there were any errors during the rescoring run.
Simplex indicates whether or not a locally optimised ligand pose was used for the rescoring. 1
indicates that the minimised pose was used, 0 indicates that the minimised pose was not used
and - indicates that simplexing was not switched on (see Section 13.2, page 106).
Note: When Perform local optimisation (simplexing) is switched on the minimised conformation
will only be used for the rescoring if this results in an improvement to the fitness score.
When a minimised ligand pose is used for the rescoring an RMSd measure is given of the final
minimised orientation with respect to the input ligand conformation.
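The rule for choosing between the input and minimised poses, and the resulting Simplex value, can be summarised as a small Python function. This sketch assumes that a higher fitness score is better, as for the scores used throughout this guide. The function name is hypothetical.

```python
from typing import Optional


def rescore_choice(input_score: float, minimised_score: Optional[float] = None,
                   simplex_on: bool = True) -> tuple[float, str]:
    """Return (score used, Simplex flag).

    Flag values: '1' minimised pose used, '0' minimised pose not used,
    '-' simplexing switched off.
    """
    if not simplex_on:
        return input_score, "-"
    # The minimised pose is kept only if it improves the fitness score.
    if minimised_score is not None and minimised_score > input_score:
        return minimised_score, "1"
    return input_score, "0"


print(rescore_choice(50.0, 55.0))                   # (55.0, '1')
print(rescore_choice(50.0, 48.0))                   # (50.0, '0')
print(rescore_choice(50.0, None, simplex_on=False)) # (50.0, '-')
```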
The example file below was generated by rescoring the best solution found (m2) for the second
ligand in the solution file results.mol2:
14.9 Protein Log File
The protein log file gold_protein.log details the parameterisation of the protein and the
determination of the binding site.
The cavity volume, as determined by the cavity detection algorithm, can also be output to the
gold_protein.log file (see Section 3.8, page 24).
The file is line buffered, so you can see how the algorithm is progressing even when GOLD is
run in the background.
14.10 Ligand Log File
The progress of each genetic algorithm run is listed in the ligand log file
gold_<ligand_file_name>_m#.log. Here, m# is an index to the number of the ligand
in the input file, e.g. m3 indicates that the log file refers to the third ligand in the input ligand file
(remember that an input file may contain more than one ligand).
The log files are line buffered, so you can see how the algorithm is progressing even when
GOLD is run in the background.
The parallel version of GOLD creates several temporary log files for each ligand, named
gold_soln_<ligand_file_name>_m#_<N>.log where <N> is a docking-run number.
Once all the docking runs for the ligand have been completed, these files are concatenated
together into the single log file gold_soln_<ligand_file_name>_m#.log.
The ligand log file contains information on:
14.8 Files Containing the Results of Rescoring
GOLD writes two types of file which contain the results of a rescoring run:
A structure file containing the docked ligand pose after rescoring (see Section 14.8.1, page 117)
A log file containing the scoring function terms obtained for the rescoring run (see Section
14.8.2, page 117)
14.8.1 Rescore Solution File
A file containing the docked ligand solution(s) after rescoring can be written. You can control
whether or not this file is written from within the Rescoring Settings window (see Section 13.2,
page 106).
If specified, solutions will be written with the default filename rescore.mol2 (MOL2 or SD
output can be selected; see Section 14.5, page 112). To specify an alternative filename (for both
the rescore solution and log files), add the following line to the gold.conf file:
concatenated_output = <filename.mol2>
For example, if concatenated_output = Myfile.mol2 the rescore mol2 file will be
named Myfile.mol2.
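The naming rule can be expressed as a short Python helper. It is a sketch based on the Myfile.mol2 example in this section: the log name is formed by replacing the extension with .rescore.log. Its behaviour for other extensions, or for names that include a directory, is an assumption. The function name is hypothetical.

```python
from pathlib import PurePath
from typing import Optional


def rescore_output_names(concatenated_output: Optional[str] = None) -> tuple[str, str]:
    """Return (solution file, log file) names for a rescoring run."""
    if concatenated_output is None:
        return "rescore.mol2", "rescore.log"          # defaults described above
    path = PurePath(concatenated_output)
    log = path.with_name(path.stem + ".rescore.log")  # Myfile.mol2 -> Myfile.rescore.log
    return concatenated_output, str(log)


print(rescore_output_names("Myfile.mol2"))  # ('Myfile.mol2', 'Myfile.rescore.log')
```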
Solution files will contain the new scoring function terms and the positions of rotatable protein
hydrogen atoms generated during rescoring (see Section 13.2, page 106).
A full description of the additional tags written to solution output files is available in Appendix
B: Additional Tags in Output Files (see page 151).
14.8.2 Rescore Log File
The rescore log file rescore.log summarises the outcome of the rescoring run. To specify an
alternative filename (for both the rescore solution and log files), add the following line to the
gold.conf file:
concatenated_output = <filename.mol2>
The following Output options are available:
Write structures to file for SILVER
Enable this check-box to write out docked ligand solutions after rescoring. Solutions will be
written to the file rescore.mol2 (to specify an alternative filename, see Section 14.8.1,
page 117); MOL2 or SD output can be specified (see Section 14.5, page 112). Solution files
will contain the new scoring function terms and can be used with SILVER.
Note: If writing of this file is switched off, only the rescore.log file will be written (see
Section 14.8, page 117).
Replace relevant tags in file
When rescoring a GOLD solution file enable this check-box to overwrite the list of active
residues and the rotated protein hydrogen atom positions generated during the original
docking with those resulting from the rescoring run. If you select not to replace relevant tags
then rescore.mol2 will contain both the binding site definition of the original docking
and that of the subsequent rescoring run.
Hit Done to close the Rescoring Settings dialog and start the GOLD job in the usual way (see
Section 12., page 99).
Output that is written to the rescore.log file is also displayed in the GOLD Output window.
Note: For how to specify an alternative rescore log filename, see Section 14.8.2, page 117.
14. Output Options
14.1 Controlling the Amount of Output (see page 109)
14.2 Controlling the Information Written to Output Files (see page 111)
14.3 Specifying Directories for Output Files (see page 112)
14.4 Files Containing the Initialised Protein and Ligand (see page 112)
14.5 Files Containing the Docked Ligand(s) (see page 112)
14.6 Files Containing Protein Binding-Site Geometry (see page 115)
14.7 Files Containing Fitness Function Rankings (see page 115)
14.8 Files Containing the Results of Rescoring (see page 117)
14.9 Protein Log File (see page 118)
14.10 Ligand Log File (see page 118)
14.11 File Containing Error Messages (see page 124)
14.12 Process File (see page 124)
14.13 Viewing Docked Solutions in SILVER (see page 124)
14.14 Exporting Fitness-Function Data to SILVER (see page 125)
14.1 Controlling the Amount of Output
GOLD can produce a lot of output and you may wish to cut it down.
To do this, hit the Output... button in the GOLD front end to open the Output Preferences
window.
Use the File and Format Options to specify whether you want files listing fitness-function
rankings (see Section 14.7, page 115), ligand log files (see Section 14.10, page 118), and/or links
for different binding modes (see Section 14.10.3, page 122). For example, the settings below
will produce log files but not ranking files or links for different binding modes:
Use the Selecting Docked Solutions options to specify whether you want to save:
All docking solutions:
gold_soln_ligand_file_m5_8.mol2, which is symbolically linked to
ranked_ligand_file_m5_2.mol2, since it is the second best of the docking attempts for
this molecule:
You can choose not to save ligand rnk files if you prefer (see Section 14.1, page 109).
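The file-name patterns used in this chapter can be collected in one place. This Python sketch (hypothetical function name) reproduces the examples for the fifth ligand of ligand_file.mol2. It assumes MOL2 output and does not cover the temporary log files written by the parallel version.

```python
def ligand_file_names(ligand_file: str, index: int, run: int, rank: int) -> dict[str, str]:
    """Build output names for ligand number `index` (the m# value) in `ligand_file`."""
    stem = ligand_file.rsplit(".", 1)[0]  # 'ligand_file.mol2' -> 'ligand_file'
    tag = f"{stem}_m{index}"
    return {
        "log": f"gold_{tag}.log",                     # ligand log file
        "rnk": f"{tag}.rnk",                          # ranked fitness scores
        "solution": f"gold_soln_{tag}_{run}.mol2",    # docking run number `run`
        "ranked_link": f"ranked_{tag}_{rank}.mol2",   # link to the solution at `rank`
    }


print(ligand_file_names("ligand_file.mol2", index=5, run=8, rank=2))
```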
14.7.2 File Containing Ranked Fitness Scores for a Set of Ligands
A file called bestranking.lst is written for batch jobs on multiple ligands. This gives a
continuous summary of the best solution that has been obtained for each completed ligand.
To specify an alternative filename, add the following line to the gold.conf file:
bestranking_list_name = <filename.lst>
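If you edit gold.conf from a script, a helper such as the one below sets an option, or appends it when absent. It is a minimal Python sketch that assumes the simple one-option-per-line key = value syntax shown in this guide, the same form used for gold.params lines such as SAVE_CAVITY = 1. The function name is hypothetical.

```python
import re


def set_conf_option(text: str, key: str, value: str) -> str:
    """Set `key = value` in configuration text.

    Replaces the first existing line for `key`, or appends a new line if the
    key is absent.
    """
    pattern = re.compile(rf"^[ \t]*{re.escape(key)}[ \t]*=.*$", re.MULTILINE)
    line = f"{key} = {value}"
    if pattern.search(text):
        # A function replacement stops backslashes in `value` being read as escapes.
        return pattern.sub(lambda _match: line, text, count=1)
    if text and not text.endswith("\n"):
        text += "\n"
    return text + line + "\n"


conf = "bestranking_list_name = old.lst\n"
print(set_conf_option(conf, "bestranking_list_name", "screen1.lst"), end="")
```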
The file gives total fitness scores and a breakdown of the fitness into its constituent energy
terms. For GoldScore, these are the two vdw energy terms (protein-ligand and internal ligand),
an internal ligand torsion term, and two hydrogen-bonding terms (protein-ligand and ligand
intramolecular). The external vdw term is scaled by a factor of 1.375 in constructing the total
fitness score (this is an empirical correction to encourage protein-ligand hydrophobic contact).
Note: by default the file will contain a single internal energy term S(int) which is the sum of the
internal torsion and internal vdw terms (see Section 6.2, page 46).
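Written out, the total described above is Fitness = S(hb_ext) + 1.375 × S(vdw_ext) + S(hb_int) + S(int), where S(int) combines the internal vdw and torsion terms. The Python sketch below assumes each term is reported with the sign in which it contributes to the sum. The function name and the numbers in the example are illustrative, not GOLD output.

```python
EXTERNAL_VDW_SCALE = 1.375  # empirical factor quoted in this guide


def goldscore_fitness(hb_ext: float, vdw_ext: float, hb_int: float,
                      vdw_int: float, tors_int: float) -> float:
    """Total GoldScore-style fitness from its component terms (sketch)."""
    s_int = vdw_int + tors_int  # the single S(int) term reported by default
    return hb_ext + EXTERNAL_VDW_SCALE * vdw_ext + hb_int + s_int


# Made-up component values, for arithmetic only: 5.0 + 1.375 * 40.0 + 0.0 + (-3.0 - 2.0) = 55.0
print(goldscore_fitness(hb_ext=5.0, vdw_ext=40.0, hb_int=0.0, vdw_int=-3.0, tors_int=-2.0))
```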
The example file below was generated from a ligand input file containing 5 ligands. The listed
file names correspond to the names of the files containing the best solution found for each
ligand, e.g. gold_soln_ligs_m1_3.mol2 contains the best answer found for the first ligand in the
input file.
N-phosphonacetyl-L-aspartate
the line SET_UNIQUE_SOLN_TITLES = 0 in the gold.params file should be changed to read
SET_UNIQUE_SOLN_TITLES = 1.
A description of the various other tags available can be found in Appendix B: Additional Tags
in Output Files (see page 151).
14.6 Files Containing Protein Binding-Site Geometry
During docking, GOLD will keep the protein geometry fixed except that it will optimise
hydrogen-bond geometries by rotating groups such as serine OH and lysine NH3. This means
that the coordinates of polar hydrogen atoms such as these will change.
Files can be written out that contain the conformation of the cavity residues around the docked
ligand (and, specifically, the optimised positions of the protein H-bonding hydrogen atoms) for
each docking. To do this, you need to edit the gold.params file and add the command
SAVE_CAVITY = 1.
The optimised positions of polar protein hydrogen atoms that are generated during docking can
also be written to the docked solution file. This information can be written to SD file tags; for
MOL2 files, these tags are written to comment blocks (see Section 14.2, page 111).
14.7 Files Containing Fitness Function Rankings
GOLD writes two types of file which summarise the fitness-function scores of docked ligands:
One pertains to an individual ligand (see Section 14.7.1, page 115).
The other pertains to a set of ligands (see Section 14.7.2, page 116).
14.7.1 File Containing Ranked Fitness Scores for an Individual Ligand
A file called <ligand_file_name>_m#.rnk is written for each ligand (m# refers to the
position of the ligand in the input file - remember that a given ligand input file may contain more
than one ligand). This file contains a summary of the fitness scores for all the docking attempts
on that ligand. The docking attempts are listed in decreasing order of fitness score, so the best
solution is placed first.
The file gives total fitness scores and a breakdown of the fitness into its constituent energy
terms. For GoldScore, these are the two vdw energy terms (protein-ligand and internal ligand),
an internal ligand torsion term, and two hydrogen-bonding terms (protein-ligand and ligand
intramolecular). The external vdw term is scaled by a factor of 1.375 in constructing the total
fitness score (this is an empirical correction to encourage protein-ligand hydrophobic contact).
The example file below corresponds to the fifth ligand in the input file ligand_file.mol2
and is therefore called ligand_file_m5.rnk. The solution Mol No 8 corresponds to the file
or just the n best solutions for each ligand, where n is a user-specified number (e.g. n = 5 in
the screenshot below):
or just the top solution, and for only those m ligands with the best fitness scores, where m is
user specified (e.g. m = 100 in the example below):
In addition, you can filter out all solutions with fitness scores lower than a specified value by
switching on the button labelled Reject solutions with fitness lower than and typing in the
required value. For example, the settings below will save a maximum of 3 solutions for each
ligand and will not keep any solution with a fitness lower than 50:
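The sketch below mimics these selection rules on plain lists of fitness scores to show the combined effect of the settings. It is not how GOLD implements them, and the function names are hypothetical. A score exactly equal to the threshold is kept, because only fitness values *lower than* the threshold are rejected.

```python
from typing import Dict, List, Optional, Tuple


def select_solutions(
    fitness_by_ligand: Dict[str, List[float]],
    max_per_ligand: int,
    min_fitness: Optional[float] = None,
) -> Dict[str, List[float]]:
    """Keep at most max_per_ligand best scores per ligand, rejecting any below min_fitness."""
    kept = {}
    for ligand, scores in fitness_by_ligand.items():
        ranked = sorted(scores, reverse=True)
        if min_fitness is not None:
            ranked = [s for s in ranked if s >= min_fitness]
        kept[ligand] = ranked[:max_per_ligand]
    return kept


def top_ligands(fitness_by_ligand: Dict[str, List[float]], m: int) -> List[Tuple[str, float]]:
    """Best single solution for each of the m ligands with the highest best score."""
    best = [(lig, max(scores)) for lig, scores in fitness_by_ligand.items() if scores]
    best.sort(key=lambda item: item[1], reverse=True)
    return best[:m]
```

With the settings described above (3 solutions per ligand, threshold 50), a ligand whose dockings all score below 50 would contribute no saved solutions.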
14.2 Controlling the Information Written to Output Files
It is possible to write additional information to docked solution files. This information is written
to SD file tags; for MOL2 files, these tags are written to comment blocks.
For post-processing docking results with SILVER it is particularly important that the scoring
function terms and the rotated protein hydrogen atom positions are saved.
Hit the Output... button in the GOLD front end to open the Output Preferences window. Use the
Information in File options to control what information is written to docked ligand files (see
Section 14.5, page 112).
The following options are available:
Save lone pairs in files
Some 3rd-party programs have difficulty reading files which contain lone pairs. You can stop
GOLD from including lone pairs in docked solution files by switching off this check-box.
Save rotated hydrogens in file
SILVER uses the optimised positions of polar protein hydrogen atoms that are generated
during docking (these will usually be different for each docked ligand pose). Enable this
check-box to save the positions of rotated protein hydrogen atoms to docked solution files.
Save score in output file
Enable this check-box if you want the docked solution files to include the docking-score
terms, i.e. the total GoldScore or ChemScore value for each docking, and its components such
as protein-ligand H-bond energy, internal ligand strain energy, etc.
Output weighted SF terms
Certain docking scoring function terms are the product of a term dependent on the magnitude
Output files for the docked ligand(s) may also contain additional information such as the scoring
function terms and the rotated protein hydrogen atom positions specific to that solution.
This information can be written to SD file tags; for MOL2 files, these tags are written to
comment blocks. It is possible to control the information written to solution files from the
Output Preferences window (see Section 14.2, page 111).
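Because the terms are stored as SD data items, a few lines of code can extract them. The sketch below is a minimal reader for the standard SD data-item layout. In that layout, each item has a header line starting with `>` that carries the name in angle brackets. One or more value lines and then a blank line follow, and `$$$$` closes each record. The sketch assumes no particular GOLD tag names, and it does not handle the MOL2 comment-block variant. `read_sd_tags` is a hypothetical helper.

```python
import re
from typing import Dict, List, Optional

# A data header line starts with '>' and carries the field name in angle brackets.
_HEADER = re.compile(r"^>.*?<([^>]+)>")


def read_sd_tags(sd_text: str) -> List[Dict[str, str]]:
    """Return one {tag: value} dict per $$$$-terminated record in SD-format text."""
    records: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    tag: Optional[str] = None
    value_lines: List[str] = []
    for line in sd_text.splitlines():
        if line.strip() == "$$$$":
            if tag is not None:  # data item not closed by a blank line
                current[tag] = "\n".join(value_lines)
            records.append(current)
            current, tag, value_lines = {}, None, []
            continue
        match = _HEADER.match(line)
        if match:
            if tag is not None:  # previous item had no blank-line terminator
                current[tag] = "\n".join(value_lines)
            tag, value_lines = match.group(1), []
        elif tag is not None:
            if line.strip():
                value_lines.append(line)
            else:
                # A blank line ends the current data item.
                current[tag] = "\n".join(value_lines)
                tag = None
    return records
```

Lines belonging to the connection table are ignored, because they appear before the first data header of each record.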
Solution file title strings take the form
<file_basename>|<p>|[cov<r>|]dock<q>
where
<file_basename> is the base name of the ligand input file
<p> is the molecule number in the file
<q> is the number of the docking
<r> is the covalent attachment atom. This part is only printed for covalent dockings.
For example (mol2 file):
ligand|mol2|1|dock4
where the ligand filename is ligand.mol2, the structure is number 1 in the molecule input
file, and the solution is from the fourth docking (dock4). For the equivalent SD input file,
the title would be:
ligand|sd|1|dock4
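Title strings of this form can be split programmatically. The sketch below is written against the examples above, in which the file extension (mol2 or sd) appears as a separate |-delimited field after the base name. It treats everything before the molecule number as the file part. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SolutionTitle:
    file_part: str                # file name fields, e.g. "ligand|mol2"
    molecule: int                 # <p>: molecule number in the input file
    docking: int                  # <q>: docking number
    covalent_atom: Optional[str]  # <r>: present only for covalent dockings


def parse_title(title: str) -> SolutionTitle:
    """Split a '<file>|<p>|[cov<r>|]dock<q>' solution title into its parts."""
    fields = title.strip().split("|")
    if len(fields) < 3 or not fields[-1].startswith("dock"):
        raise ValueError(f"unrecognised solution title: {title!r}")
    docking = int(fields[-1][len("dock"):])
    rest = fields[:-1]
    covalent_atom = None
    if rest[-1].startswith("cov"):
        covalent_atom = rest[-1][len("cov"):]
        rest = rest[:-1]
    molecule = int(rest[-1])
    return SolutionTitle("|".join(rest[:-1]), molecule, docking, covalent_atom)
```

For example, `parse_title("ligand|mol2|1|dock4")` yields molecule 1 and docking 4, and it reports no covalent attachment atom.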
To revert to the historic output i.e. to output only the structure name e.g.
Each ligand will normally be docked several times, so a given input ligand will produce a set of
files, each containing the results of a separate docking attempt.
Suppose that the original ligand file is structure.mol2 (this can contain more than one
ligand, in which case each will be docked). As the GOLD job progresses, the result of each
docking attempt is written out as gold_soln_structure_m#_n.mol2, where n is the
solution number 1, 2, 3, ... and m# is the number of the ligand, i.e. m1 for the first ligand, m2 for
the second, etc.
Note that the file gold_soln_structure_m1_1.mol2 is not the best GOLD prediction; it
is just the solution found in the first docking attempt. However, as GOLD proceeds, symbolic
links are created: ranked_structure_m#_1.mol2 will always point to the current top-
ranked solution, ranked_structure_m#_2.mol2 will point to the second-best solution,
and so on.
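A short sketch can illustrate how the per-docking files relate to the ranked links. It assumes that the solution files keep the input file's extension, as in the .mol2 example above. It only computes names and does not create symbolic links. `solution_name` and `ranked_links` are hypothetical helpers.

```python
from pathlib import Path
from typing import Dict


def solution_name(ligand_file: str, m: int, n: int) -> str:
    """gold_soln_<stem>_m<m>_<n><ext>: result of docking attempt n on ligand m."""
    p = Path(ligand_file)
    return f"gold_soln_{p.stem}_m{m}_{n}{p.suffix}"


def ranked_links(ligand_file: str, m: int, fitness_by_attempt: Dict[int, float]) -> Dict[str, str]:
    """Map each ranked_<stem>_m<m>_<k> name to the solution file it would point at."""
    p = Path(ligand_file)
    # Attempt numbers sorted by decreasing fitness: rank 1 is the best so far.
    order = sorted(fitness_by_attempt, key=fitness_by_attempt.get, reverse=True)
    return {
        f"ranked_{p.stem}_m{m}_{k}{p.suffix}": solution_name(ligand_file, m, n)
        for k, n in enumerate(order, start=1)
    }
```

The ranking is recomputed as new dockings finish, so the target of each ranked_ name can change during a run. For this reason, the ranked_ names identify the current best pose, whereas the gold_soln numbering does not.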
Alternatively, you can specify that all saved docking solutions for all ligands are to be
concatenated and written to a single file. To do this, open the Output Preferences dialogue by
hitting the Output... button in the GOLD front end. Then, switch on the Save solutions to one file
check-box, hit the Solutions file name button, and specify the required file name in the resulting
pop-up, e.g.
of a particular physical contribution (e.g. hydrogen bonding) and a scale factor (determined,
for example, by a regression coefficient). The scoring function terms included in the output file
can therefore be weighted terms, non-weighted terms or both. To include weighted
terms, enable this check-box.
Output non-weighted SF terms
Enable this check-box to include non-weighted scoring function terms in the output file.
No SD-style tags in mol2 files
Enable this check-box to prevent SD-style tags being written to comment blocks in MOL2
solution files.
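A weighted term is a non-weighted term multiplied by its scale factor. The sketch below shows this per-term multiplication. The term names are hypothetical. The 1.375 factor is the GoldScore external vdw scale described in Section 14.7.1; other weights would come from the scoring function's own coefficients. Terms without a listed weight are assumed to have a weight of 1.

```python
from typing import Dict


def weighted_terms(raw: Dict[str, float], weights: Dict[str, float]) -> Dict[str, float]:
    """Weighted term = scale factor x non-weighted term (weight 1 if none is given)."""
    return {name: weights.get(name, 1.0) * value for name, value in raw.items()}
```

For example, a raw external vdw term of 20 becomes 27.5 when it is weighted by 1.375.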
14.3 Specifying Directories for Output Files
Hit the Output... button in the GOLD front end to open the Output Preferences window.
Use the Output directory... entry box to specify the directory to which output files will be
written.
When more than one ligand is being docked, switch on the Create output sub-directories
check-box if you want results for each ligand to be written to a separate sub-directory.
14.4 Files Containing the Initialised Protein and Ligand
GOLD produces the following output files:
gold_ligand.mol2 is the original ligand datafile with lone pairs added and the sets
DONOR_HYDROGENS and LONE_PAIRS defined.
gold_protein.mol2 is the original protein datafile with lone pairs added to binding site
atoms and the sets DONOR_HYDROGENS and LONE_PAIRS defined. The binding site is
defined in the set CAVITY_ATOMS.
Note: these set-definitions in the gold_protein.mol2 file are only accessible (i.e. visible)
through SYBYL.
14.5 Files Containing the Docked Ligand(s)
By default, docked ligands will be written out in the same format as was used for input. To
change this, hit the Output... button in the GOLD front end to open the Output Preferences
window. Then use the File and Format Options to specify the required output format. For
example:
