
Tutorial 1: Simulating a box of water

Before you begin


Create a directory to work with the tutorial files (e.g., mkdir tutorial1). This will create a
directory called tutorial1 in your current working directory. We recommend that you use your
$WORK directory on Stampede as the storage space for all of the tutorials and we will assume
that this is the case for all of the following tutorials.
For the purposes of this tutorial and all of the others that follow, we will additionally assume that:
(1) AmberTools15/Amber14 has been installed somewhere (in the case of Stampede, it is
located in /work/02555/psn/amber14)
(2) you have run the setup script amber.sh in the amber14 directory via the source
command or similar (e.g., source /work/02555/psn/amber14/amber.sh)
Modify these names and conventions as necessary to match those of the system that you are
using!
Initial structures of water molecule and water box
We have provided Protein Data Bank (PDB) format coordinate files of both a single water
molecule (wat.pdb) and a box of 640 molecules (wat_box.pdb). Should you want to build these
files yourself, we've included some instructions in the appendix of this tutorial.
System setup
Before we begin any sort of simulation, we need to create parameter/topology and coordinate
files that Amber can understand. Using a text editor (e.g., nano) create a file setup_water.leap
with the following content:
source leaprc.ff14SB
wliq = loadpdb wat_box.pdb
setbox wliq vdw
saveamberparm wliq wat_box.prmtop wat_box.inpcrd
wgas = loadpdb wat.pdb
saveamberparm wgas wat_gas.prmtop wat_gas.inpcrd
quit

Here is a brief translation of the lines in this file:

source leaprc.ff14SB: load parameters for the AMBER ff14SB force field, which by
default also loads the parameters for the TIP3P water model
wliq = loadpdb wat_box.pdb: load the PDB file with the water box coordinates into a
unit called wliq
setbox wliq vdw: calculate the dimensions of the periodic box by finding the min/max
coordinates of the water box system and adding the applicable van der Waals radii as
a buffer
saveamberparm wliq ...: write out Amber-compatible parameter/topology and
coordinate files for the water box system
wgas = loadpdb wat.pdb: load the PDB file with the single water molecule coordinates
into a unit called wgas
saveamberparm wgas ...: write out Amber-compatible parameter/topology and
coordinate files for the gas phase system
quit: tell tleap to quit when it's done

We can now run tleap to create the desired files from the command line:
tleap -f setup_water.leap

At this point it is worthwhile to create a directory structure that will help us in keeping various
things organized. Here is one suggestion:
mkdir analysis mdcrd mdin mdout rst

This creates five subdirectories inside the tutorial directory. All of the popular MD packages
(Amber included) create many different output files, and it is a good idea to organize things so
that you can find them easily.
Minimization
In general energy minimization is a necessary prerequisite to running a molecular dynamics
simulation. If a simulation is started while the system is in a particularly unfavorable
configuration, it will almost surely crash due to numerical instabilities caused by large forces. In
this case minimization is essential because our water box was built without any regard to
relative molecular orientations (although the program that created the box does respect
intermolecular spacing).
To minimize this system we will use the Amber executable pmemd.MPI, a parallel version of the
pmemd MD program. (Quick aside: AmberTools is distributed with sander, a free executable
that can carry out all of the same calculations and in fact has greater functionality than pmemd,
which is distributed with the non-free Amber package. But pmemd is ~2x faster than sander,
or even more in some cases, and so we will use it extensively here.)
Using a text editor create a file mdin/min.in with the following content:
liquid minimization
&cntrl
imin = 1,
ntb = 1, cut = 9.0,
ntc = 1, ntf = 1,
maxcyc = 1000, ncyc = 500
/
The first line of the file is a title line. Pretty much anything can be written here. Beyond that,
here is a brief translation of this file:

&cntrl: everything that follows belongs to the cntrl namelist (a list of
parameters for controlling the simulation)
imin = 1: flag for doing minimization (vs. real MD)
ntb = 1: use constant volume periodic boundaries
cut = 9.0: set the real space cut-off for both electrostatics and van der Waals forces to
9.0 Å
ntc = 1: SHAKE is not used to constrain any bond lengths
ntf = 1: compute all bonded interactions (needed when SHAKE is not used)
maxcyc = 1000: the maximum number of minimization steps to be carried out
ncyc = 500: carry out steepest descent minimization for the first 500 steps, followed
by conjugate gradient minimization (provided that ntmin = 1, which is the default if not
otherwise specified)

We can now perform the energy minimization from the command line (type everything below on
the same line):
ibrun -np 16 pmemd.MPI -O -i mdin/min.in -o mdout/wat_box_min.out -p
wat_box.prmtop -c wat_box.inpcrd -r rst/wat_box_min.rst

This tells pmemd.MPI to run on 16 cores (the max number in each node) using the specified
input script (-i), prmtop file (-p), and inpcrd file (-c). It will output final coordinates to a restart file
(-r) and a variety of simulation output to a text file (-o). (Note: On the majority of computers the
command to run an executable in parallel is mpirun, not ibrun.)
Take a look inside mdout/wat_box_min.out to see what the output looks like. In general you
should find the potential energy (ENERGY) and root-mean-square gradient of the potential
energy (RMS) decreasing as the minimization progresses. (The RMS gradient provides some
sense of the average magnitude of the force acting on each atom.)
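
To check this trend without scrolling through the whole file, the step summaries can be pulled out with a short script. The sketch below is illustrative only: it assumes that each summary consists of a header row naming the NSTEP, ENERGY, and RMS columns followed by one row of values, the function name parse_min_steps is our own, and the sample text is synthetic rather than real pmemd output.

```python
def parse_min_steps(text):
    """Return a list of (nstep, energy, rms) tuples from minimization output text."""
    lines = text.splitlines()
    records = []
    for i, line in enumerate(lines[:-1]):
        # Assumed layout: a header row naming the columns, then a row of values.
        if line.split()[:3] == ["NSTEP", "ENERGY", "RMS"]:
            values = lines[i + 1].split()
            if len(values) >= 3:
                records.append((int(values[0]), float(values[1]), float(values[2])))
    return records


# Synthetic sample (made-up numbers) that mimics the assumed layout.
SAMPLE = """
   NSTEP       ENERGY          RMS            GMAX         NAME    NUMBER
      1      -5.0000E+03     2.0000E+01     9.0000E+01     O        1

   NSTEP       ENERGY          RMS            GMAX         NAME    NUMBER
     50      -6.0000E+03     5.0000E+00     2.0000E+01     O        7
"""

for nstep, energy, rms in parse_min_steps(SAMPLE):
    print(nstep, energy, rms)
```

To use it on a real run, read the contents of mdout/wat_box_min.out into a string and pass that to parse_min_steps instead of SAMPLE.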
Heating
Now we want to bring our simulation box up to the desired temperature. In this case we'll run
the simulation at room temperature (298.15 K). Create a new text file mdin/heat.in with the
following content:
heat liquid
&cntrl
imin = 0, irest = 0, ntx = 1,
ntb = 1, cut = 9.0,
ntc = 2, ntf = 2,
tempi = 100.0, temp0 = 298.15, ntt = 3, gamma_ln = 1.0, ig = -1,
nstlim = 15000, dt = 0.002,
ntpr = 250, ntwx = 250, ntwr = 5000,
ioutfm = 1, nmropt = 1,
/
&wt
type='TEMP0', istep1=0, istep2=10000, value1=100.0, value2=298.15
/
&wt
type='END'
/

Note that there are many of the same terms as used in our minimization script, but some have
different values. There are also many new terms. Here are brief descriptions of all of the terms
that are either new or changed:

imin = 0: perform an MD run (as opposed to energy minimization)
irest = 0: run as a new simulation (i.e., not a continuation)
ntx = 1: read only the coordinates (not velocities) from the input coordinate file
ntc = 2: use SHAKE to constrain heavy atom-to-H bonds
ntf = 2: do not evaluate bonded energy terms for heavy atom-to-H bonds
tempi = 100.0: initial temperature (in K) of the simulation
temp0 = 298.15: desired target temperature (in K) of the simulation
ntt = 3, gamma_ln = 1.0, ig = -1: use the Langevin thermostat to maintain temperature
with a collision frequency of 1.0 ps^-1 and a random number generator seed based on
the current date and time (about as random as anything)
nstlim = 15000: perform 15000 MD steps (simulation length = nstlim x dt)
dt = 0.002: use an MD time step of 0.002 ps = 2 fs
ntpr = 250: print simulation output every 250 steps
ntwx = 250: write simulation trajectory every 250 steps
ntwr = 5000: write a simulation restart file every 5000 steps
ioutfm = 1: write the simulation trajectory as a NetCDF (binary) file
nmropt = 1: use the NMR restraints specified in the namelist(s) that follow(s)

The remaining lines are separate namelists that enable us to vary the thermostat settings on the
fly. In particular from steps 0 to 10000, we linearly increase the thermostat temperature from
100.0 to 298.15 K. Then from steps 10001 to 15000 we maintain the thermostat temperature at
298.15 K. With modern MD packages it is not clear that we need to be so conservative with our
heating protocol, but starting from a lower temperature can potentially improve simulation
stability for systems that are difficult to minimize.
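
As a quick illustration of the schedule described above, the following sketch evaluates the thermostat target at a few steps. It is only a model of the ramp as this tutorial describes it (linear between istep1 and istep2, constant afterwards); the function name thermostat_target is hypothetical and is not part of Amber.

```python
def thermostat_target(step, istep1=0, istep2=10000, value1=100.0, value2=298.15):
    """Target temperature (K) at a given MD step for a linear TEMP0 ramp."""
    if step <= istep1:
        return value1
    if step >= istep2:
        return value2  # ramp finished: hold the final temperature
    fraction = (step - istep1) / (istep2 - istep1)
    return value1 + fraction * (value2 - value1)


dt_ps = 0.002  # MD time step from heat.in, in ps
for step in (0, 2500, 5000, 10000, 15000):
    time_ps = step * dt_ps
    print(f"step {step:6d}  t = {time_ps:5.1f} ps  target = {thermostat_target(step):7.2f} K")
```

With nstlim = 15000 and dt = 0.002 ps, the whole heating run covers 30 ps, of which the first 20 ps are the ramp.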
Run this NVT heating simulation with the following command:
ibrun -np 16 pmemd.MPI -O -i mdin/heat.in -o mdout/wat_box_heat.out -p
wat_box.prmtop -c rst/wat_box_min.rst -r rst/wat_box_heat.rst -x
mdcrd/wat_box_heat.mdcrd

Note that we use the minimized coordinates restart file as our starting coordinates (-c) and we
now also output a simulation trajectory file (-x), which contains the coordinates of all of the
atoms in our system (and optionally the velocities) at every 250th step of the simulation.
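
Because every stage of this tutorial follows the same file naming pattern, the command line can also be assembled programmatically. The helper below is a hypothetical convenience (pmemd_command is our own name, not an Amber tool) that only builds and prints the command; it assumes the directory layout suggested earlier and does not launch anything.

```python
import shlex


def pmemd_command(stage, previous_rst, ncores=16, launcher="ibrun", write_traj=True):
    """Assemble the pmemd.MPI command line for one stage of the tutorial."""
    cmd = [launcher, "-np", str(ncores), "pmemd.MPI", "-O",
           "-i", f"mdin/{stage}.in",            # input script
           "-o", f"mdout/wat_box_{stage}.out",  # text output
           "-p", "wat_box.prmtop",              # parameter/topology file
           "-c", previous_rst,                  # starting coordinates
           "-r", f"rst/wat_box_{stage}.rst"]    # final restart file
    if write_traj:
        cmd += ["-x", f"mdcrd/wat_box_{stage}.mdcrd"]  # trajectory file
    return cmd


print(shlex.join(pmemd_command("heat", "rst/wat_box_min.rst")))
```

On systems that use mpirun rather than ibrun, pass launcher="mpirun"; for the minimization stage, pass write_traj=False, since no trajectory is written there.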
Equilibration (round #1)
Now that our system is up to temperature, we can bring it to the correct pressure/density. By
now you've hopefully gotten the hang of the organization scheme we're using. Create a new
text file called mdin/eq1.in with the following content:
equilibrate liquid (Berendsen barostat)
&cntrl
imin = 0, irest = 1, ntx = 5,
ntb = 2, cut = 9.0,
pres0 = 1.013, ntp = 1, barostat = 1, taup = 2.0,
ntc = 2, ntf = 2,
temp0 = 298.15, ntt = 3, gamma_ln = 1.0, ig = -1,
nstlim = 25000, dt = 0.002,
ntpr = 250, ntwx = 250, ntwr = 5000,
ioutfm = 1, iwrap = 1
/

This input script contains only a few new or changed terms that are not self-explanatory:

irest = 1, ntx = 5: continue this simulation from a restart file, reading both the
coordinates and velocities from that restart file
ntb = 2: use constant pressure (i.e., variable volume) periodic boundaries
pres0 = 1.013, ntp = 1, barostat = 1, taup = 2.0: set the desired target pressure of the
simulation to 1.013 bar (1 atm) using an isotropic Berendsen barostat with a time
constant of 2.0 ps
iwrap = 1: ensure that all molecules are inside the original system box when writing
out restart and trajectory files

Run this NPT simulation with the following command:


ibrun -np 16 pmemd.MPI -O -i mdin/eq1.in -o mdout/wat_box_eq1.out -p
wat_box.prmtop -c rst/wat_box_heat.rst -r rst/wat_box_eq1.rst -x
mdcrd/wat_box_eq1.mdcrd

Let's analyze the output file to monitor the convergence of the density of the system. On the
command line, input the following command:
grep "Density" mdout/wat_box_eq1.out

The grep command outputs lines in a file (or files) that contain the expression in quotes.
(Strictly speaking, quotes aren't necessary if the expression is a single word, number, etc.) You
should see that the density trends toward ~1.0 g/cm^3 and then fluctuates near that value, with
the exception of the last two numbers. (Why? Hint: Go look at the simulation output file!)
If we wanted to extract this data in such a way that might be useful for plotting, we could do
something like:
grep "Density" mdout/wat_box_eq1.out | head -100 | awk '{print $3}' >
analysis/eq1_density.dat

The awk command is quite versatile in parsing text files and even performing some basic data
manipulation. Here we've used it to print out the 3rd column of the grep output, which we've
truncated to include only the first 100 lines. You should now be able to plot the contents of the
resulting file (analysis/eq1_density.dat) using the tool of your choice. In this case it takes roughly 20 ps
for the system to reach the equilibrium density. (Note that the simulation output is written every
250 steps or 0.5 ps.)
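
The same extraction can be done in Python if you would like a time axis alongside the densities. This is a minimal sketch, assuming the output contains lines of the form "Density = value" and that the last two matches are the run average and RMS fluctuation; the function names and the sample text (made-up numbers) are ours.

```python
import re

DENSITY_PATTERN = re.compile(r"Density\s*=\s*(\S+)")


def extract_densities(text, n_summary=2):
    """Return per-frame densities, dropping the trailing summary values."""
    values = [float(m.group(1)) for m in DENSITY_PATTERN.finditer(text)]
    return values[:-n_summary] if n_summary else values


def density_series(text, ntpr=250, dt=0.002):
    """Pair each density with the time (ps) elapsed since the start of the run."""
    densities = extract_densities(text)
    return [((i + 1) * ntpr * dt, rho) for i, rho in enumerate(densities)]


# Synthetic sample that mimics the assumed layout (not real simulation data).
SAMPLE = """
 NSTEP =      250   TIME(PS) =      30.500
                                Density    =     0.9500
 NSTEP =      500   TIME(PS) =      31.000
                                Density    =     0.9800
      A V E R A G E S
                                Density    =     0.9650
      R M S  F L U C T U A T I O N S
                                Density    =     0.0150
"""

for time_ps, rho in density_series(SAMPLE):
    print(f"{time_ps:6.1f} ps  {rho:.4f} g/cm^3")
```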
Equilibration (round #2)

Now that our system is at the correct temperature and pressure, we can switch the type of
barostat to one that is more suited to equilibrium conditions (and generates a rigorously correct
NPT ensemble!). Create a new text file mdin/eq2.in with the following content:
equilibrate liquid (Monte Carlo barostat)
&cntrl
imin = 0, irest = 1, ntx = 5,
ntb = 2, cut = 9.0,
pres0 = 1.013, ntp = 1, barostat = 2, mcbarint = 100,
ntc = 2, ntf = 2,
temp0 = 298.15, ntt = 3, gamma_ln = 1.0, ig = -1,
nstlim = 25000, dt = 0.002,
ntpr = 500, ntwx = 500, ntwr = 5000,
ioutfm = 1, iwrap = 1
/

This input script contains just two new terms in this one line: pres0 = 1.013, ntp = 1, barostat =
2, mcbarint = 100. Instead of using a Berendsen barostat, we have now switched to a Monte
Carlo barostat in which new system volumes are attempted every 100 steps.
Run this NPT simulation with the following command:
ibrun -np 16 pmemd.MPI -O -i mdin/eq2.in -o mdout/wat_box_eq2.out -p
wat_box.prmtop -c rst/wat_box_eq1.rst -r rst/wat_box_eq2.rst -x
mdcrd/wat_box_eq2.mdcrd

We won't take too much time to analyze this output, but it wouldn't hurt to do a quick sanity
check of the density to ensure that it is fluctuating about an equilibrium value and not drifting
(which would indicate that more equilibration is necessary). You can also see the outcomes of
the Monte Carlo barostat swaps in the simulation output. On average, we would like these to be
accepted about 50% of the time for the most efficient sampling of phase space.
Production
Now we are ready to do a production simulation from which we can get statistics that are
rigorously correct with regards to the NPT ensemble. Create a new text file mdin/prod.in with
the following content:
production simulation
&cntrl
imin = 0, irest = 1, ntx = 5,
ntb = 2, cut = 9.0,
pres0 = 1.013, ntp = 1, barostat = 2, mcbarint = 100,
ntc = 2, ntf = 2,
temp0 = 298.15, ntt = 3, gamma_ln = 1.0, ig = -1,
nstlim = 1000000, dt = 0.002,
ntpr = 500, ntwx = 500, ntwr = 50000,
ioutfm = 1, iwrap = 1
/

The differences between this input script and mdin/eq2.in are that the length of the simulation
has been greatly extended (to 1.0 x 10^6 steps or 2.0 ns) and that restart files are written less
often (ntwr = 50000). In general it is a good idea for your final
equilibration and production scripts to be identical in their parameters aside from simulation
length and output frequencies.
Run the production simulation with the following command:
ibrun -np 16 pmemd.MPI -O -i mdin/prod.in -o mdout/wat_box_prod.out -p
wat_box.prmtop -c rst/wat_box_eq2.rst -r rst/wat_box_prod.rst -x
mdcrd/wat_box_prod.mdcrd

This simulation will take several minutes to run. If you want an estimate of when it will
complete, open the file named mdinfo that is automatically created by pmemd. (You can control
the name and location of this file using the flag -inf. This will come in handy for later tutorials.)
Computing the average density
If you examine your production simulation output, you should find that the average density is
~0.987 g/cm^3, which differs slightly from the known experimental value at these conditions
(0.997 g/cm^3). Is this just a statistical fluke due to our finite sampling or a real anomaly? We
can use some basic data analysis techniques to find out.
First, extract the densities from the production simulation output file. (Remember to exclude the
last two densities, as these correspond to the mean and RMS values.) Next, we need to create
a program to compute the standard error of the mean density. Standard errors are generally
calculated as the sample standard deviation (sx) divided by the square root of the number of
samples (n):

$$\mathrm{SE}_{\bar{x}} = \frac{s_x}{\sqrt{n}}$$

This formula assumes that each sample (i.e., simulation snapshot) is statistically independent,
but it is important to note that successive observations taken from MD simulations can be
strongly correlated because the atomic positions at time t + Δt depend on the positions at time t.
This is counteracted somewhat by the fact that we do not record observables at every time step,
but rather at every jth step (such that n = t_sim/j), as specified in our input script, where t_sim is the
total simulation length (in steps) and j is the frequency at which output is written out (again in
steps).
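
For reference, the naive estimate (the one that treats every recorded snapshot as independent) takes only a few lines. The sketch below uses Python's standard library; the function name and the demo values are our own placeholders, not simulation results.

```python
import math
import statistics


def naive_standard_error(samples):
    """Standard error of the mean, assuming statistically independent samples."""
    return statistics.stdev(samples) / math.sqrt(len(samples))


# Number of recorded samples in the production run: n = t_sim / j
nstlim, ntpr = 1_000_000, 500
n_records = nstlim // ntpr
print("recorded samples:", n_records)

# Placeholder densities (g/cm^3), used only to exercise the function.
demo = [0.985, 0.990, 0.987, 0.983, 0.991, 0.986]
print("naive standard error:", naive_standard_error(demo))
```

For correlated MD data this number is a lower bound on the true uncertainty, which is why a correction is needed.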
We therefore need a method to estimate how many independent samples of the density we
actually have, and one way to do this is by determining the characteristic timescale of the
autocorrelation function of the density. We will not cover the theory here (more can be found in
Frenkel and Smit's book, among others), but the standard error for simulation observables can
be estimated as:

$$\mathrm{SE}_{\bar{x}} \approx \frac{s_x}{\sqrt{n / (2\, n_{\mathrm{corr},x})}}$$

where n_corr,x is the characteristic number of snapshots for the autocorrelation in the observable
of interest to decay to (approximately!) zero. (If you are unfamiliar with autocorrelation
functions, there is a nice explanation available on Wikipedia, as well as in the free energy
simulations review by Shirts and Mobley that is part of our Suggested Reading list.)
In practice n_corr,x is usually calculated by assuming that the autocorrelation function has a simple
exponential form:

$$\mathrm{ACF}_x(\tau) \approx e^{-\tau / n_{\mathrm{corr},x}}$$

Therefore by fitting a function of this form to the autocorrelation function, we can estimate this
characteristic timescale. Because this analysis is nontrivial, we have placed an example
IPython notebook on the workshop website (water_analysis.ipynb) that uses the scipy and
numpy libraries to carry out such an analysis. (Don't worry, you'll get a chance to adapt it for
the next task!)
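
If you would like to see the idea before opening the notebook, here is a rough standard-library sketch. It is not the workshop notebook's method: instead of fitting an exponential, it takes n_corr,x to be the first lag at which the autocorrelation function drops below 1/e, which gives the same answer for a purely exponential decay. All function names are hypothetical, and the demo series is synthetic correlated noise rather than density data.

```python
import math
import random
import statistics


def autocorrelation(samples, max_lag):
    """Normalized autocorrelation function for lags 0..max_lag."""
    n = len(samples)
    mean = statistics.fmean(samples)
    dev = [x - mean for x in samples]
    var = sum(d * d for d in dev) / n
    return [sum(dev[i] * dev[i + lag] for i in range(n - lag)) / ((n - lag) * var)
            for lag in range(max_lag + 1)]


def correlation_length(acf):
    """First lag where the ACF falls below 1/e (stand-in for an exponential fit)."""
    for lag, value in enumerate(acf):
        if value < 1.0 / math.e:
            return lag
    return len(acf)  # the ACF never decayed within the lags examined


def corrected_standard_error(samples, max_lag=100):
    """Standard error using n / (2 n_corr) as the number of independent samples."""
    n_corr = correlation_length(autocorrelation(samples, max_lag))
    n_eff = len(samples) / (2.0 * n_corr)
    return statistics.stdev(samples) / math.sqrt(n_eff), n_corr


# Synthetic correlated series: x[t] = 0.9 * x[t-1] + Gaussian noise.
rng = random.Random(0)
series, x = [], 0.0
for _ in range(10000):
    x = 0.9 * x + rng.gauss(0.0, 1.0)
    series.append(x)

se, n_corr = corrected_standard_error(series)
naive = statistics.stdev(series) / math.sqrt(len(series))
print("n_corr:", n_corr, " naive SE:", naive, " corrected SE:", se)
```

The corrected error is larger than the naive one by a factor of sqrt(2 n_corr,x), which is the price of correlated sampling.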
Computing the enthalpy of vaporization
In addition to checking that our simulation obtains the correct density for liquid water at 298.15
K, you may also be curious about the accuracy of the intermolecular interactions. One way to
measure this is to calculate the enthalpy of vaporization, which is the energetic cost of
converting one mole of a substance from the liquid phase to the gas phase.
The enthalpy of vaporization can be calculated from MD simulations as:

$$\Delta H_{\mathrm{vap}} = U_{\mathrm{gas}} - U_{\mathrm{liquid}} + RT$$

where U_gas is the (simulation average) gas phase potential energy, U_liquid is the (simulation
average) liquid phase potential energy per molecule, R is the universal gas constant, and T is
the temperature. There is some debate about how to incorporate the effect of polarization in
calculating such enthalpies when using fixed-charge force fields, but that is a lengthy discussion
indeed!
In the case of TIP3P water (or indeed any rigid, fixed-charge water model), U_gas is actually zero
because the bonds are frozen (via SHAKE) and the molecule is too small to have any
intramolecular nonbonded interactions. Therefore we can calculate the enthalpy of vaporization
using the liquid phase (i.e., water box) simulation alone.
The experimental value for the enthalpy of vaporization of water at 298.15 K and 1.0 atm is
10.52 kcal/mol. See how your simulation data compare!
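
The arithmetic is simple enough to script. The sketch below assumes that energies are in kcal/mol, that the liquid energy you supply is the average potential energy of the whole 640-molecule box (so it must be divided by the number of molecules), and that U_gas = 0 as argued above. The function name is ours, and the example energy is a placeholder rather than a simulation result.

```python
R_KCAL = 1.987204e-3  # universal gas constant in kcal/(mol K)


def enthalpy_of_vaporization(u_liquid_total, n_molecules, temperature, u_gas=0.0):
    """Delta H_vap = U_gas - U_liquid/N + R*T, with all energies in kcal/mol."""
    return u_gas - u_liquid_total / n_molecules + R_KCAL * temperature


# Placeholder value chosen only to exercise the function; substitute the average
# potential energy of the box from your own production output.
example_u_liquid = -6400.0
print("RT =", R_KCAL * 298.15, "kcal/mol")
print("Delta H_vap =", enthalpy_of_vaporization(example_u_liquid, 640, 298.15), "kcal/mol")
```

Note that the RT term contributes only about 0.59 kcal/mol at 298.15 K, so the result is dominated by the liquid phase potential energy per molecule.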
Radial distribution function
One final analysis we can do to check if our water box is physically reasonable is to calculate
the radial distribution function (rdf) for the O-O (or O-H or H-H) distances in the water. This will
give us some insight into the structure of our bulk water at the molecular scale. This analysis
will require us to use another executable of AmberTools, cpptraj, which can perform a variety
of trajectory analysis and Amber file modification tasks.

cpptraj can be run interactively or with a script; we will choose the former option for now. At
the command line enter the following:

cpptraj -p wat_box.prmtop

You will now see a prompt where you can type in commands. Enter the following:
trajin mdcrd/wat_box_prod.mdcrd
radial analysis/wat_OO_rdf.dat 0.05 8.0 @O volume
run
quit

Let's unpack this syntax line-by-line:

(1) trajin mdcrd/wat_box_prod.mdcrd: read in the MD trajectory file
(2) radial analysis/wat_OO_rdf.dat 0.05 8.0 @O volume: calculate the oxygen-to-oxygen
(@O) rdf out to 8.0 Å, using bin widths of 0.05 Å for the histogram and using the average
volume of the system to properly normalize the rdf
(3) run: carry out all of the commands
(4) quit: self-explanatory
The rdf is placed in a text file analysis/wat_OO_rdf.dat that has two columns in it: one for O-O
distance and one for the value of the rdf, also known as g(r). Plot this using the plotting tool of
your choice and compare it directly to experimental data (e.g., Figure 4A in
http://pubs.acs.org/doi/abs/10.1021/cr0006831).
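
Before plotting, it can be handy to locate the main peak numerically. The following sketch assumes that the rdf file holds two whitespace-separated columns with any header lines starting with "#", and it relies on the tallest O-O peak being the first-neighbor peak, as is normally the case for liquid water. The function names and the sample data (made-up numbers) are ours.

```python
def read_rdf(text):
    """Parse two-column (r, g(r)) data, skipping blank lines and '#' comments."""
    data = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        r, g = stripped.split()[:2]
        data.append((float(r), float(g)))
    return data


def tallest_peak(data):
    """Return the (r, g) pair with the largest g(r)."""
    return max(data, key=lambda point: point[1])


# Synthetic sample in the assumed layout (not real simulation data).
SAMPLE = """#Distance  g(r)
2.25  0.00
2.50  0.80
2.75  2.90
3.00  1.60
3.25  0.95
"""

r_peak, g_peak = tallest_peak(read_rdf(SAMPLE))
print(f"tallest peak: g(r) = {g_peak} at r = {r_peak} Angstrom")
```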

Appendix: Creating PDB files of a single water molecule and water box
Use the molecule editor of your choice to save a PDB file of a single water molecule. One
editor that we recommend is Avogadro (available at http://avogadro.cc/wiki/Main_Page). In
Avogadro this task can be done in the following way:
(1) Select the Draw Tool (looks like a pencil).
(2) In the Element: drop-down menu, select Oxygen (8).
(3) Click anywhere in the black View window. You should see a water molecule appear. If
you see a lone oxygen atom, then undo this action, make sure the Adjust Hydrogens
box is checked, and click in the View window again.
(4) File → Save As → wat.pdb (into the tutorial directory)
Note: You will need to do some editing of this file with a text editor to make sure that tleap
recognizes this as a water molecule. In particular you must:

Change HETATM to ATOM (with two spaces after the word ends)
Change the residue number of the hydrogen atoms (5th column) from 0 to 1
Change the names of the two hydrogen atoms (3rd column) from H (with one space
after the letter) to H1 and H2, respectively

At room temperature, water has a density of ~1 g/cm^3, but we want to convert this to molecules
per cubic ångström (molecules/Å^3) so that we can determine an appropriate box size. Using the
following formula, we find:
1 g/cm^3 x (10^-8 cm/Å)^3 x (6.022 x 10^23 molecules/mol) x 1/(18.015 g/mol) = 0.0334 molecules/Å^3
Therefore we need a box of volume 640 molecules/(0.0334 molecules/Å^3) = 19200 Å^3 to fit
these molecules at the appropriate density. This implies a side length of 26.8 Å, which we will
conservatively round up to 27 Å.
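
The same unit conversion can be written as a small script, which is convenient if you want a different number of molecules or a different target density. The function and constant names below are our own; the physical constants are the ones used in the formula above.

```python
AVOGADRO = 6.022e23        # molecules/mol
WATER_MOLAR_MASS = 18.015  # g/mol
CM_PER_ANGSTROM = 1.0e-8   # cm per angstrom


def number_density(mass_density=1.0):
    """Molecules per cubic angstrom for a mass density given in g/cm^3."""
    return mass_density * CM_PER_ANGSTROM ** 3 * AVOGADRO / WATER_MOLAR_MASS


def cubic_box_side(n_molecules, mass_density=1.0):
    """Side length (angstrom) of a cube holding n_molecules at the given density."""
    volume = n_molecules / number_density(mass_density)
    return volume ** (1.0 / 3.0)


print("number density:", number_density(), "molecules/A^3")
print("box side for 640 molecules:", cubic_box_side(640), "A")
```

Carrying full precision gives a side length slightly under 26.8 Å, consistent with the rounded value in the text.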
We can then build the water box using any number of tools. One popular, free, and open-source tool is called packmol (http://www.ime.unicamp.br/~martinez/packmol/). Once you have
downloaded and compiled this software, create a text file wat_box.inp with the following content:
tolerance 3.00
output wat_box.pdb
filetype pdb
add_amber_ter
structure wat.pdb
number 640
inside cube 0. 0. 0. 27.
end structure

You can then build the water box by typing the following at the command line:
$path-to-packmol/packmol < wat_box.inp

where $path-to-packmol should be replaced with the full path to wherever packmol has been
installed.

An alternative way to approach this problem is to use the built-in solvate function of tleap. If
you want to go this route, create and modify the wat.pdb file as directed above. Then we create
the following script to be executed by tleap:
source leaprc.ff14SB
wgas = loadpdb wat.pdb
saveamberparm wgas wat_gas.prmtop wat_gas.inpcrd
wliq = copy wgas
solvatebox wliq TIP3PBOX 12.11
saveamberparm wliq wat_box.prmtop wat_box.inpcrd
quit

Unfortunately it is not possible to specify the number of desired solvent molecules with tleap,
but creating a box that extends at least 12.11 Å from the starting water molecule will yield ~640
water molecules.
