
Practice 10: Using pydock to predict interaction modes of two peptide chains

Jordi Villà i Freixa

Computational Biochemistry and Biophysics Lab
Structural Biology, 4th-year Biology, Universitat Pompeu Fabra
March 2012

In this small tutorial/exercise, we will use pydock[2] to test possible binding positions between two polypeptide chains. The objective is to become familiar with the concept of protein docking and its relation with scoring functions, and to understand the utility of the available software for this computationally demanding task.

1 pydock
pydock is one of the programs of choice when trying to predict the structure of a protein complex from its isolated components. This is one of the most difficult problems to solve in computational biology, due to the complexity of molecular interactions, the number of possible orientations, especially given the flexibility of the interacting molecules, and the difficulty in designing scoring functions able to discriminate correct from incorrect interactions.

In the practice, we will work through an example extracted from the tutorial that pydock developers have created based on a target from CAPRI (footnote 1). Later on, the students will use the same technique to evaluate possible docking positions of the system they are modelling. Because the calculations are expensive, we will use the practice's cluster.

Adapted from "EMBO Practicals" by Dr. Fernandez-Recio's group at Barcelona Supercomputing Center

2 Example: Target 26 from CAPRI 2005

The first exercise consists in reproducing the binding mode of the complex formed by TolB and Pal, which corresponded to Target 26 from CAPRI[4]. We will adapt the exercise proposed by Dr. Fernandez-Recio's group at the Barcelona Supercomputing Center, who have kindly shared the necessary material with us.


2.1 Experimental data

The first thing to realize is that docking studies are costly and can result in badly wrong predictions. Any available experimental information can therefore come in handy to reduce wrong assignments. In this case, we can extract valuable information from Ray et al.[5]:

(...) The Tol-Pal system of Escherichia coli is involved in maintaining outer membrane stability acting as a barrier to the entry of macromolecules into the bacteria, thus providing protection against deleterious actions of bacteriocins and digestive enzymes. The periplasmic protein TolB was shown to interact with the outer membrane, peptidoglycan-associated proteins OmpA, Lpp, and Pal (4, 7). Thus, TolB and Pal could be part of a multicomponent system linking the outer membrane to peptidoglycan. The aim of this study was to determine the regions of TolB involved in the interaction of the protein with Pal. To this end, we used suppressor genetic techniques which had previously allowed us to characterize the regions of interaction between TolQ, TolR, and TolA (10, 18). Pal point mutations were identified, and some of them involved residues important for interaction with TolB (7). These mutations induce sensitivity to sodium cholate and release of periplasmic proteins in the medium. We used these pal mutants to search for suppressors in tolB.

(...) Isolation of extragenic suppressor mutations of pal A88V in tolB. Twelve mutations affecting 11 different residues of tolB were isolated as suppressor mutations of pal A88V (Table 2). They enabled the pal A88V mutant to grow on plates containing sodium cholate and lowered its excretion of periplasmic enzymes, some mutants being more efficient than others in suppressing the pal A88V phenotype. In most cases, the tolB mutations could not suppress the phenotypes of tolerance to colicins A and E2 of mutant pal A88V. Three tolB point mutations (H246Y, A249V, and T292I) affected the activity of TolB, whereas the others had phenotypes similar to the wild type. All the extragenic suppressor mutations of pal A88V are located in the C-terminal region of TolB. This suggests that this region of TolB is important for its interaction with Pal.

(...) Isolation of intragenic suppressor mutations of pal A88V. Mutations pal S99F and pal E102K were both isolated as intragenic suppressor mutations of pal A88V. The pal E102K mutation was previously described as a pal-defective mutant (7). Both pal S99F and pal E102K mutations enabled the pal A88V mutant to grow in the presence of sodium cholate and lowered its excretion of periplasmic enzymes, mutant pal E102K being more efficient than mutant pal S99F as a suppressor mutation (Table 1). Thus, the conformation of the region from residues 88 to 102 appeared to be important for Pal function. (...)

1 CAPRI (Critical Assessment of PRediction of Interactions) is a competition in which computational biologists try to predict the structure of a protein complex that has been solved experimentally.



Section 3.1 gives details on the different modules used in this exercise. Basically, the protocol consists of the following steps:

1. prepare the input files (setup);
2. get docking poses with Fast Fourier Transform (FFT) methods;
3. rank the poses with a scoring function;
4. add experimental data;
5. predict the interaction interface between the two chains;
6. use the desolvation energy value to analyze the optimal docking area (ODA); and, finally,
7. compare our results with the experimentally solved structure deposited in the PDB.

The package luke.upf.edu:/cursos/BE/exercises/practica_10/practica_10.tar.gz contains all the files needed for this practice. Copy it into your user home at the cluster and decompress it with

$ tar xzf practica_10.tar.gz

To be able to use PyDock you should load the environment variables from the configuration file:

$ source /cursos/BE/programari/bashrc_luke

It is recommended to create a working directory where you can leave all the generated files. Once you have decompressed the file, find a directory called CAPRI. In what follows, we assume that you have created the work directory at the same level. That is, where a listing initially shows

$ ls
./CAPRI ./OUTPUT

it should show the following once the working directory is in place:

./CAPRI ./work

2.2.1 Preparation of the files

The first step is to create a file in pdb format that can be used by docking programs. The pdb file corresponding to the largest chain in the future complex will be labeled receptor and the smaller one ligand.

1. download the file for the receptor: PDB code 1C5K
2. download the file for the ligand: PDB code 1OAP
3. copy the file T26.ini to your working directory

$ cd work
$ wget http://www.rcsb.org/pdb/files/1c5k.pdb.gz
$ wget http://www.rcsb.org/pdb/files/1oap.pdb.gz
$ gunzip 1c5k.pdb.gz 1oap.pdb.gz
$ cp ../CAPRI/T26.ini .

Take a look at the contents of T26.ini. Once you understand what it does (and after modifying the strings of interest) you can run the preparation of files by doing:

$ pydock T26 setup

As a result, the files for the receptor and the ligand, ready for use with pydock, will be created.

2.2.2 Fast Fourier Transform (FFT) methods

With these methods a set of relative positions between the two chains (docking poses) will be generated. In fact, pydock can be used to evaluate, with its scoring function, docking results made with other programs. Generating these results is costly, so a queue will be used to send the calculations (see section 3.2). We can use either zdock[1] or ftdock[3] for this task (footnote 2). To execute these programs, run:

$ pydock T26 ftdock

and/or

$ pydock T26 zdock

In the current example we can skip this step and take the results that were previously generated:

$ cp ../CAPRI/T26.ftdock .

Obviously, when working on the proposed exercise you would use the queue to run your own calculations. Finally, we obtain the transformation matrices that give the relative positions of the two chains, using the results from ftdock and/or zdock:

$ pydock T26 rotzdock
$ pydock T26 rotftdock
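As a rough illustration of why FFT methods make the translational search affordable, the following Python sketch scores every shift of a toy one-dimensional "ligand" profile against a "receptor" profile in one pass, using a circular cross-correlation computed with a small hand-written FFT. It is not the ftdock or zdock algorithm: real programs work on three-dimensional grids with more elaborate shape and electrostatic encodings, and the profiles, function names and overlap score used here are assumptions made only for the example.

```python
import cmath

def fft(values, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def correlate_fft(receptor, ligand):
    """score[s] = sum_i receptor[i] * ligand[(i - s) mod n], for all s at once."""
    n = len(receptor)
    fr = fft(receptor)
    fl = fft(ligand)
    product = [a * b.conjugate() for a, b in zip(fr, fl)]
    # Divide by n because the inverse transform above is unnormalized.
    return [round((v / n).real, 6) for v in fft(product, inverse=True)]

def correlate_direct(receptor, ligand):
    """Same score computed shift by shift (the slow reference version)."""
    n = len(receptor)
    return [sum(receptor[i] * ligand[(i - s) % n] for i in range(n))
            for s in range(n)]

# Toy profiles (invented): the receptor "groove" sits at cells 2-3,
# the ligand occupies cells 0-1 before being shifted.
receptor = [0, 0, 1, 1, 0, 0, 0, 0]
ligand = [1, 1, 0, 0, 0, 0, 0, 0]

scores = correlate_fft(receptor, ligand)
best_shift = scores.index(max(scores))
print(scores, best_shift)
```

The direct version needs one pass over the grid per shift, whereas the FFT version obtains all shifts from three transforms, which is the reason grid-based docking programs favour it.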

2 Notice that the ZDOCK tutorial also includes the molecules used in here:




2.3 Using pydock to re-rank the complexes list

pydock includes a more appropriate scoring function for re-ranking the relative positions of the chains obtained in the previous step. Therefore, we must give the program the ordered list of complexes and make a new calculation with the new scoring function. To do this, execute:

$ pydock T26 dockser

Beware! This calculation can take hours, so you must send it to the queue as described in section 3.2. At the end of the run, a file named T26.ene is created that contains a ranking of the different conformations according to their energy as scored by pydock. For instance:

Conf      Ele   Desolv      VDW    Total  RANK
  82  -13.008  -18.137   60.501  -25.095     1
  93   -4.872  -17.757    5.445  -22.085     2
  68  -17.781   -5.304   14.340  -21.651     3
  21  -14.638   -7.127   17.284  -20.037     4
  17   -6.582  -15.011   32.695  -18.323     5

Conf: conformation file number
Ele: electrostatic component
Desolv: desolvation component
VDW: van der Waals component
Total: weighted sum of the previous three (the van der Waals term enters with a weight of 0.1 by default)
RANK: order according to the binding energy

The module pyDockSER can also be used in an interactive way, for instance to re-rank using only the polar energy components, which can give good results for systems with only transient interactions (as in a signaling cascade). In other cases, such as antigen-antibody interactions or dimers, the van der Waals component might be the determining factor.

adminBE@luke:~/test/work> pydock
For batch use: pydock file <scriptfile> or pydock <dockname> <module>
PyDock loaded...
Now you can run pyDock commands interactively...
>>> import pyDockSER
>>> pyDockSER.recalcEne("T26.ene", "T26.ene_0.0VDW", 1, 1, 0.0, 1)

where we used the function recalcEne, which has the following 6 arguments:

argument 1 = original input file
argument 2 = output file
argument 3 = weight that we want to assign to the electrostatic component
argument 4 = weight that we want to assign to the desolvation component
argument 5 = weight that we want to assign to the vdW component (0.1 by default)
argument 6 = parameter under development

Once you have reordered, you will have the following (it is also possible to use Excel to do the same job, although the interactive mode, based on Python, is very useful):

Conf      Ele   Desolv      VDW    Total  RANK
  82  -13.008  -18.137   60.501  -31.145     1
  24   -8.399  -18.068   94.627  -26.467     2
  68  -17.781   -5.304   14.340  -23.085     3
  93   -4.872  -17.757    5.445  -22.629     4
  21  -14.638   -7.127   17.284  -21.765     5

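The effect of changing the weights can be checked by hand with a few lines of Python. The sketch below is not the recalcEne implementation; it simply recomputes Total as a weighted sum of the three components for the rows shown in the two example tables and sorts them. It assumes that the Ele and Desolv values are negative and the VDW values positive, which is the sign assignment under which the printed totals equal Ele + Desolv + 0.1 × VDW. The function name `rerank` is hypothetical.

```python
# Rows taken from the two example tables: (Conf, Ele, Desolv, VDW).
# Assumed signs: Ele and Desolv negative, VDW positive.
ROWS = [
    (82, -13.008, -18.137, 60.501),
    (93, -4.872, -17.757, 5.445),
    (68, -17.781, -5.304, 14.340),
    (21, -14.638, -7.127, 17.284),
    (17, -6.582, -15.011, 32.695),
    (24, -8.399, -18.068, 94.627),
]

def rerank(rows, w_ele=1.0, w_desolv=1.0, w_vdw=0.1):
    """Return (conf, total) pairs sorted from lowest to highest weighted energy."""
    scored = [(conf, w_ele * ele + w_desolv * desolv + w_vdw * vdw)
              for conf, ele, desolv, vdw in rows]
    return sorted(scored, key=lambda pair: pair[1])

default_rank = rerank(ROWS)            # default weights: 1, 1, 0.1
no_vdw_rank = rerank(ROWS, w_vdw=0.0)  # van der Waals term switched off

print([conf for conf, _ in default_rank[:5]])
print([conf for conf, _ in no_vdw_rank[:5]])
```

With the default weights the first five conformations come out in the order of the first table, and with the van der Waals weight set to zero conformation 24 moves up to second place, as in the second table.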
2.4 Using experimental constraints to guide the docking
pydock allows us to add experimental constraints. These come from knowing which residues are important in the interaction. In particular, pydock allows us to define which residues are known to be at the interface (from mutation experiments, for example). In pydock, such restraints are imposed by requiring that the center of mass of the residue in question lies at a distance of less than 6 Å from any (non-hydrogen) atom of the other molecule. To add the constraints we only write them in the file T26.ini:

[receptor]
pdb    = 1c5k.pdb
mol    = A
newmol = A
restr  = A.His246,A.Ala249

[ligand]
pdb    = 1oap.pdb
mol    = A
newmol = B
restr  = B.Ala88

and execute

$ pydock T26 dockrst
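The distance criterion described above can be sketched in a few lines of Python. This is only an illustration of the geometric test, not the code pydock runs: the function names, the approximate atomic masses and the toy coordinates are all assumptions made for the example.

```python
import math

# Approximate atomic masses for the elements used below (assumed values).
MASS = {"C": 12.011, "N": 14.007, "O": 15.999, "S": 32.06, "H": 1.008}

def center_of_mass(atoms):
    """atoms: list of (element, x, y, z). Returns the mass-weighted centre."""
    total = sum(MASS[atom[0]] for atom in atoms)
    return tuple(sum(MASS[atom[0]] * atom[i] for atom in atoms) / total
                 for i in (1, 2, 3))

def restraint_satisfied(residue_atoms, partner_atoms, cutoff=6.0):
    """True if the residue centre of mass lies within `cutoff` angstroms
    of any non-hydrogen atom of the partner molecule."""
    com = center_of_mass(residue_atoms)
    return any(math.dist(com, (x, y, z)) < cutoff
               for element, x, y, z in partner_atoms if element != "H")

# Toy coordinates (invented) for one restrained residue and two partner poses.
residue = [("N", 0.0, 0.0, 0.0), ("C", 1.5, 0.0, 0.0),
           ("C", 2.0, 1.4, 0.0), ("O", 3.2, 1.6, 0.0)]
partner_near = [("H", 2.0, 1.0, 1.0), ("C", 2.0, 1.0, 5.0)]
partner_far = [("H", 2.0, 1.0, 1.0), ("C", 2.0, 1.0, 12.0)]

print(restraint_satisfied(residue, partner_near))
print(restraint_satisfied(residue, partner_far))
```

In the second pose the only nearby partner atom is a hydrogen, so the restraint is not satisfied even though that atom is about 1 Å away.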






2.5 Interaction interface prediction

2.5.1 From the docking results

From the top 100 solutions according to the pydock ranking, we can find the consensus region of interaction by:

$ pydock T26 patch

The result is a series of files with the residues and their NIP (normalized interface propensity) values; the same value is also written in the B-factor column of PDB files:

NIP = 0: the residue is as likely to be at the interface as anywhere else on the surface (random propensity)
NIP < 0: the residue appears at the interface less often than expected at random
NIP > 0: the residue has a propensity to be at the interface

As this information is stored in the B-factor column, any visualization program able to color the residues according to this value will be useful. One of them is ICM Browser, but VMD, choosing "coloring method = beta", is also capable of showing this information.
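Besides coloring the structure, the values can be pulled out of the B-factor column with a short script. The sketch below assumes that the files follow the standard PDB ATOM record layout (chain in column 22, residue number in columns 23-26, B-factor in columns 61-66) and, as a labelled assumption, that positive values mark residues that tend to be at the interface. The toy lines and their values are invented for the example, and `make_atom` is only a helper to build them.

```python
def nip_per_residue(pdb_lines):
    """Map (chain, residue number, residue name) to the value stored in the
    B-factor field of the first ATOM record found for that residue."""
    values = {}
    for line in pdb_lines:
        if line.startswith("ATOM"):
            key = (line[21], int(line[22:26]), line[17:20])
            values.setdefault(key, float(line[60:66]))
    return values

def make_atom(serial, name, res, chain, resseq, b):
    """Build a fixed-width ATOM line for the toy example (coordinates are zero)."""
    return ("ATOM  {:5d} {:<4s} {:3s} {}{:4d}    {:8.3f}{:8.3f}{:8.3f}{:6.2f}{:6.2f}"
            .format(serial, name, res, chain, resseq, 0.0, 0.0, 0.0, 1.0, b))

# Invented values, only to exercise the parser.
toy = [make_atom(1, "N", "HIS", "A", 246, 0.45),
       make_atom(2, "CA", "HIS", "A", 246, 0.45),
       make_atom(3, "N", "GLY", "A", 10, -0.20)]

nip = nip_per_residue(toy)
likely_interface = sorted(key for key, value in nip.items() if value > 0)
print(likely_interface)
```

To use it on a real file, pass the lines of the file instead of the toy list, for example `nip_per_residue(open(path))` with `path` pointing to one of the generated PDB files.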

2.5.2 From the desolvation energy

Another tool in pydock lets us evaluate the optimal docking area (ODA) from the desolvation energy. Keep in mind that proteins that form multiple complexes will be very prone to surface interaction. We can calculate this value by:

$ pydock 1c5k.pdb oda

Once again, the information is stored in the column B-factor.


2.6 Comparison of the prediction with the experimental structure

You can use any visualization program to see the final results. However, we must have the PDB file for the predicted conformation, and so far we only have the translation and rotation matrices. In order to generate the corresponding PDB file you can use:

adminBE@luke:~/test/work> pydock
For batch use: pydock file <scriptfile> or pydock <dockname> <module>
PyDock loaded...
Now you can run pyDock commands interactively...
>>> import pyDockMakePDB
>>> pyDockMakePDB.main("T26", 3, 3)
Reading pdb files...
Conformation 3
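What "generating the PDB from the matrices" means geometrically can be sketched as follows: every ligand coordinate x is replaced by R x + t, where R is a 3x3 rotation matrix and t a translation vector. This is a generic rigid-body transform written for illustration, not the pyDockMakePDB code, and it makes no claim about how the pydock rotation file stores its values; the toy rotation, translation and coordinates are invented.

```python
import math

def apply_pose(coords, rotation, translation):
    """Apply x' = R x + t to every (x, y, z) tuple in coords."""
    moved = []
    for x, y, z in coords:
        moved.append(tuple(
            rotation[i][0] * x + rotation[i][1] * y + rotation[i][2] * z
            + translation[i]
            for i in range(3)))
    return moved

# Toy pose: rotate 90 degrees about the z axis, then shift 10 units along x.
angle = math.radians(90)
R = [[math.cos(angle), -math.sin(angle), 0.0],
     [math.sin(angle), math.cos(angle), 0.0],
     [0.0, 0.0, 1.0]]
t = (10.0, 0.0, 0.0)

ligand = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
posed = apply_pose(ligand, R, t)
print(posed)
```

Because the transform is rigid, distances between ligand atoms are unchanged; only the position of the ligand relative to the receptor differs from one conformation to the next.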


pydock offers the possibility to also add the reference PDB to the file *.ini:

[receptor]
pdb    = REC.pdb
mol    = A
newmol = A

[ligand]
pdb    = LIG.pdb
mol    = C
newmol = B

[reference]
pdb       = XXX.pdb
recmol    = L
ligmol    = I
newrecmol = A
newligmol = B


3.1 pydock quick reference

To execute the different modules that compose pydock:

$ pydock dockingNAME moduleNAME

In the first example above, dockingNAME is T26 and moduleNAME is setup.

Table 1: Typical pydock modules and files

module name          input files                   output files

setup                dockingNAME.ini               dockingNAME_rec.pdb
                                                   dockingNAME_lig.pdb

ftdock/zdock         dockingNAME_rec.pdb           dockingNAME.(ftdock/zdock)
                     dockingNAME_lig.pdb

rotftdock/rotzdock   dockingNAME.(ftdock/zdock)    dockingNAME.rot

dockser              dockingNAME_rec.pdb           dockingNAME.ene
                     dockingNAME_lig.pdb
                     dockingNAME.rot

dockrst              dockingNAME.ini               dockingNAME.eneRST
                     dockingNAME_rec.pdb           dockingNAME.rst
                     dockingNAME_lig.pdb
                     dockingNAME.rot
                     dockingNAME.ene

patch                dockingNAME_rec.pdb           dockingNAME.recNIP
                     dockingNAME_lig.pdb           dockingNAME_rec.pdb.nip
                     dockingNAME.rot               dockingNAME.lignip
                     dockingNAME.ene               dockingNAME_lig.pdb.nip

Additional tools     X.pdb                         dockingNAME_rec.pdb.oda
(oda)                Y.pdb                         dockingNAME_rec.pdb.oda.ODAtab
                                                   dockingNAME_lig.pdb.oda
                                                   dockingNAME_lig.pdb.oda.ODAtab




3.2 Using the Sun Grid Engine to send jobs

The server has a system of queues to send the calculations. You can find a good tutorial on how to use the system: gridengine62u2/Using

Briefly, the jobs are sent to a queue, which manages the proper use of resources when there are many users on a machine. To do this we must write a file such as:

#!/bin/bash
#$ -S /bin/bash
#$ -q llicen.q
#$ -o /homes/users/adminBE/test/work/send.o
#$ -e /homes/users/adminBE/test/work/send.e

cd /homes/users/adminBE/test/work
source /cursos/BE/programari/bashrc_luke

pydock T26 zdock

This file is used to launch a job with zdock as explained above. Basically, it tells the system which queue it must use (llicen.q) and sets a series of parameters that control where STDOUT and STDERR are stored. The lines needed to run the program follow. Once you have written a file adapted to your particular problem, send it to the queue with:

$ qsub <file>

To see whether it is running or not, run:

$ qstat

Another tool of interest lets us obtain a list of all available queues:

jvilla@luke:~/work> qconf -sql
all.q
cursext.q
curstest.q
llicen.q
master.q
masterlong.q

or show the characteristics of a particular queue (qconf -sq all.q, for example).
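Since every pydock module that goes to the queue needs the same kind of job file, it can be convenient to generate it instead of editing it by hand. The Python sketch below builds the text of a job file with the same -S, -q, -o and -e directives as the example; the function name `sge_script`, its parameters and the default output tag are assumptions made for illustration and are not part of pydock or of the Grid Engine tools.

```python
def sge_script(workdir, command, queue="llicen.q",
               env_file="/cursos/BE/programari/bashrc_luke", tag="send"):
    """Return the text of a Grid Engine job file for one command."""
    lines = [
        "#!/bin/bash",
        "#$ -S /bin/bash",                         # shell used to run the job
        "#$ -q " + queue,                          # queue to submit to
        "#$ -o {}/{}.o".format(workdir, tag),      # where STDOUT is stored
        "#$ -e {}/{}.e".format(workdir, tag),      # where STDERR is stored
        "",
        "cd " + workdir,
        "source " + env_file,                      # load the pydock environment
        "",
        command,
    ]
    return "\n".join(lines) + "\n"

script = sge_script("/homes/users/adminBE/test/work", "pydock T26 dockser")
print(script)
```

The resulting text can be written to a file and submitted with qsub in the usual way; only the working directory and the command change from one job to the next.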


The material contained herein is based on the document "EMBO practicals: algorithm for docking: application to a real CAPRI case" by Dr. Fernandez-Recio's group at the Barcelona Supercomputing Center. The files used for the practice were provided by Dr. Carles Pons, from the same laboratory.

[1] R. Chen, L. Li, and Z. Weng. ZDOCK: an initial-stage protein-docking algorithm. Proteins: Structure, Function, and Bioinformatics, 52(1):80-87, 2003.

[2] T. Cheng, T. Blundell, and J. Fernández-Recio. pyDock: electrostatics and desolvation for effective scoring of rigid-body protein-protein docking. Proteins: Structure, Function, and Bioinformatics, 68(2):503-515, 2007.

[3] H. Gabb, R. Jackson, and M. Sternberg. Modelling protein docking using shape complementarity, electrostatics and biochemical information. Journal of Molecular Biology, 272(1):106-120, 1997.

[4] S. Grosdidier, C. Pons, A. Solernou, and J. Fernández-Recio. Prediction and scoring of docking poses with pyDock. Proteins: Structure, Function, and Bioinformatics, 69(4):852-858, 2007.

[5] M. Ray, P. Germon, A. Vianney, R. Portalier, and J. Lazzaroni. Identification by genetic suppression of Escherichia coli TolB residues important for TolB-Pal interaction. Journal of Bacteriology, 182(3):821, 2000.