Protein Secondary Structure Prediction 1

Protein Secondary Structure
Prediction
K.Akila,
Lecturer,
Department of Bioinformatics
Jamal Mohamed College,
Trichy.
Proteins
• Proteins play a crucial role in virtually all biological processes, with a broad range of functions.
• The activity of an enzyme or the function of a protein is governed by its three-dimensional structure.
[Figure: example protein structures. H11_MOUSE, histocompatibility antigen; VE2_BPV1, bovine DNA-binding domain.]
20 amino acids - the building blocks
Clickable map at: http://www.russell.embl-heidelberg.de/aas/

Secondary structure = spatial arrangement of amino acid residues that are adjacent in the primary structure
Reasons for Predicting Secondary Structure
• Starting point for prediction of tertiary and quaternary structure.
• Insight into the biological function of a protein.
• Facilitates alignment for homology modeling of distantly related proteins.
• Insight for data analysis and mutagenesis experiments when the structure is not known.
• Because secondary structure is largely local, only the amino acid sequence is needed.
Use of amino acid properties in prediction schemes
[Diagram: Sequence → Propensity function (plus other inputs) → Vector of propensities → Prediction function (plus other inputs) → Prediction]
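The scheme in the diagram can be sketched in Python as follows. This is only an illustration under stated assumptions, not a published method: the propensity values are the partial Chou-Fasman table shown in these slides, while the function names (`propensity_vector`, `predict_by_window`), the window width of 5, and the neutral value 1.0 for residues missing from the partial table are all hypothetical choices made for the example.

```python
# Partial Chou-Fasman propensities (P_alpha, P_beta, P_turn), copied from
# the partial table in these slides.
PROPENSITY = {
    "E": (1.51, 0.37, 0.74),
    "M": (1.45, 1.05, 0.60),
    "A": (1.42, 0.83, 0.66),
    "V": (1.06, 1.70, 0.50),
    "I": (1.08, 1.60, 0.50),
    "Y": (0.69, 1.47, 1.14),
    "P": (0.57, 0.55, 1.52),
    "G": (0.57, 0.75, 1.56),
}
NEUTRAL = (1.0, 1.0, 1.0)  # assumption: residues not in the partial table
CLASSES = "HEC"            # helix, sheet, turn (labels as used in the slides)


def propensity_vector(sequence):
    """Propensity function: map each residue to its (P_alpha, P_beta, P_turn)."""
    return [PROPENSITY.get(aa, NEUTRAL) for aa in sequence.upper()]


def predict_by_window(vector, width=5):
    """Prediction function (hypothetical): average each propensity over a
    sliding window and pick the class with the highest mean."""
    half = width // 2
    labels = []
    for i in range(len(vector)):
        window = vector[max(0, i - half): i + half + 1]
        means = [sum(p[k] for p in window) / len(window) for k in range(3)]
        labels.append(CLASSES[means.index(max(means))])
    return "".join(labels)


if __name__ == "__main__":
    seq = "EAEAMAVIVIVYPGPG"  # made-up sequence for illustration
    print(seq)
    print(predict_by_window(propensity_vector(seq)))
```

The two functions mirror the two boxes of the diagram; "other inputs" (for example, window width) enter as extra arguments.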
Primary structure
Primary structure refers to the "linear" sequence of amino acids.
Types of Secondary Structures
• α Helices
• β Sheets
• Loops
• Coils
α Helix
• Most abundant secondary structure
• 3.6 amino acids per turn
• Hydrogen bond formed between every fourth residue (residue i to residue i+4)
• Average length: 10 amino acids, or about 3 turns
• Length varies from 5 to 40 amino acids
Contd.
• Normally found on the surface of protein cores.
• Interact with the aqueous environment
– Inner-facing side has hydrophobic amino acids.
– Outer-facing side has hydrophilic amino acids.
• Every third or fourth amino acid tends to be hydrophobic.
• This pattern can be detected computationally.
• Rich in alanine (A), glutamic acid (E), leucine (L), and methionine (M).
• Poor in proline (P), glycine (G), tyrosine (Y), and serine (S).
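One common way to detect this hydrophobic periodicity computationally is a hydrophobic moment. The sketch below is an assumption-laden illustration and is not taken from the slides: it uses the Kyte-Doolittle hydropathy scale and a rotation of 100° per residue (360° divided by 3.6 residues per turn). The function name `hydrophobic_moment` and the example sequences are hypothetical.

```python
import math

# Kyte-Doolittle hydropathy scale (higher value = more hydrophobic).
# The slides do not name a scale; this choice is an assumption.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_moment(window, angle_deg=100.0):
    """Mean hydrophobic moment of a sequence window.

    Each residue is placed on a helical wheel, rotated angle_deg from the
    previous one, and its hydropathy is summed as a 2D vector. A large
    value means hydrophobic residues cluster on one face of the helix.
    """
    x = y = 0.0
    for n, aa in enumerate(window.upper()):
        h = KYTE_DOOLITTLE[aa]
        theta = math.radians(n * angle_deg)
        x += h * math.cos(theta)
        y += h * math.sin(theta)
    return math.hypot(x, y) / len(window)


if __name__ == "__main__":
    amphipathic = "LKKLLKKLKKLLKKLKKL"  # made-up helix-like pattern
    blocky = "LLLLLLLLLKKKKKKKKK"       # same composition, no periodicity
    print(round(hydrophobic_moment(amphipathic), 2))
    print(round(hydrophobic_moment(blocky), 2))
```

Sliding this function along a sequence and looking for windows with a high moment is one simple way to flag candidate surface helices.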
β Sheet
• Hydrogen bonds form between 5-10 consecutive amino acids in one portion of the chain and another 5-10 farther down the chain.
• Interacting regions may be adjacent, connected by a short loop, or far apart, with other structures in between.
• Slight counterclockwise rotation
– Alpha carbons (as well as R side groups) alternate above and below the sheet.
– Prediction is difficult because of the wide range of φ and ψ angles.
Loop
• Regions between α helices and β sheets.
• Various lengths and three-dimensional configurations.
• Located on the surface of the structure.
• Hairpin loops: a complete turn in the polypeptide chain (as in anti-parallel β sheets).
• More variable in sequence.
• Tend to have charged and polar amino acids.
• Frequently a component of active sites.
Secondary structure prediction
• Historically, the first structure prediction methods predicted secondary structure
• Can be used to improve alignment accuracy
• Can be used to detect domain boundaries within proteins with remote sequence homology
• Often the first step towards 3D structure prediction
• Informative for mutagenesis studies

Contd.
• In either case, amino acid propensities should be useful for predicting secondary structure
• Two classical methods that use previously determined propensities:
– Chou-Fasman
– Garnier-Osguthorpe-Robson
Chou-Fasman method
• Uses a table of conformational parameters (propensities) determined primarily from measurements of secondary structure by CD spectroscopy
• The table consists of one “likelihood” for each structure for each amino acid
Chou-Fasman propensities (partial table)
Amino Acid   Pα     Pβ     Pt
Glu          1.51   0.37   0.74
Met          1.45   1.05   0.60
Ala          1.42   0.83   0.66
Val          1.06   1.70   0.50
Ile          1.08   1.60   0.50
Tyr          0.69   1.47   1.14
Pro          0.57   0.55   1.52
Gly          0.57   0.75   1.56
Chou-Fasman method
• A prediction is made for each type of structure for each amino acid
– This can result in ambiguity if a region has high propensities for both helix and sheet (the higher value is usually chosen, with exceptions)
Chou-Fasman method
• Calculation rules are somewhat ad hoc
• Example: method for helix
– Search for a nucleating region where 4 out of 6 amino acids have Pα > 1.03
– Extend until 4 consecutive amino acids have an average Pα < 1.00
– If the region is at least 6 amino acids long, has an average Pα > 1.03, and average Pα > average Pβ, consider the region to be helix
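The three helix rules above can be sketched as follows. This is one possible reading of the rules, not the reference Chou-Fasman implementation: the propensities are the partial table from these slides, residues missing from it get a neutral 1.0 (an assumption), and the exact way the region is extended one residue at a time is a design choice made for the example. The function name `find_helices` is hypothetical.

```python
# Partial propensities from the table in these slides.
P_ALPHA = {"E": 1.51, "M": 1.45, "A": 1.42, "V": 1.06,
           "I": 1.08, "Y": 0.69, "P": 0.57, "G": 0.57}
P_BETA = {"E": 0.37, "M": 1.05, "A": 0.83, "V": 1.70,
          "I": 1.60, "Y": 1.47, "P": 0.55, "G": 0.75}
DEFAULT = 1.0  # assumption: neutral value for residues not in the table


def mean(values):
    return sum(values) / len(values)


def find_helices(sequence):
    """Return half-open (start, end) index pairs of predicted helices."""
    seq = sequence.upper()
    pa = [P_ALPHA.get(aa, DEFAULT) for aa in seq]
    pb = [P_BETA.get(aa, DEFAULT) for aa in seq]
    n = len(seq)
    helices = []
    i = 0
    while i + 6 <= n:
        # Rule 1: nucleation, at least 4 of 6 residues with P_alpha > 1.03.
        if sum(p > 1.03 for p in pa[i:i + 6]) < 4:
            i += 1
            continue
        start, end = i, i + 6
        # Rule 2: extend to the right until the 4-residue window ending at
        # the candidate residue has an average P_alpha below 1.00.
        while end < n and mean(pa[end - 3:end + 1]) >= 1.00:
            end += 1
        # Same test to the left, without running into a previous helix.
        floor = helices[-1][1] if helices else 0
        while start > floor and mean(pa[start - 1:start + 3]) >= 1.00:
            start -= 1
        # Rule 3: length, average P_alpha, and helix-versus-sheet check.
        avg_a, avg_b = mean(pa[start:end]), mean(pb[start:end])
        if end - start >= 6 and avg_a > 1.03 and avg_a > avg_b:
            helices.append((start, end))
        i = end  # continue scanning after the examined region
    return helices


if __name__ == "__main__":
    print(find_helices("PGPGEAEMAEAMPGPG"))  # made-up test sequence
```

A run of valine illustrates the last rule: it nucleates (Pα = 1.06) but is rejected because its average Pβ is higher than its average Pα.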
Accuracy of Chou-Fasman predictions
• Sequences whose 3D structures are known are processed so that each residue is “assigned” to a given secondary structure class by looking at the backbone angles
• Three classes are most often used (helix=H, sheet=E, turn=C), but sometimes four classes are used (helix, sheet, turn, loop)
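Once every residue has an assigned class, a prediction can be scored residue by residue. The sketch below shows one plausible way to do this, assuming the true and predicted classes are given as equal-length strings over H, E, and C. It computes a confusion matrix and the percentage of correctly predicted residues (often called Q3). The slides do not state how their reported average accuracy was computed, so this should not be read as that exact procedure; the function names are hypothetical.

```python
from collections import Counter


def confusion_matrix(true_states, predicted_states, classes="HEC"):
    """Count residues by (true class, predicted class)."""
    if len(true_states) != len(predicted_states):
        raise ValueError("state strings must have the same length")
    counts = Counter(zip(true_states, predicted_states))
    return {t: {p: counts[(t, p)] for p in classes} for t in classes}


def q3(true_states, predicted_states):
    """Percentage of residues whose predicted class matches the true class."""
    if len(true_states) != len(predicted_states):
        raise ValueError("state strings must have the same length")
    correct = sum(t == p for t, p in zip(true_states, predicted_states))
    return 100.0 * correct / len(true_states)


if __name__ == "__main__":
    true_ss = "HHHHEEEECC"  # made-up assignment
    pred_ss = "HHHEEEECCC"  # made-up prediction
    for true_class, row in confusion_matrix(true_ss, pred_ss).items():
        print(true_class, row)
    print("Q3 =", q3(true_ss, pred_ss))
```

Dividing each row of the matrix by its row total gives per-class percentages.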
Confusion matrix for Chou-Fasman method on 78 proteins
(rows: true class; columns: predicted class)
True   H      E      C      Unknown
H      47.5   3.0    4.3    45.2
E      20.8   16.8   7.1    55.4
C      6.4    3.6    38.0   52.0

Average accuracy = 54.4
Data from ZY Zhu, Protein Engineering 8:103-109, 1995
Garnier-Osguthorpe-Robson
• Uses a table of propensities calculated primarily from structures determined by X-ray crystallography
• The table consists of one “likelihood” for each structure for each amino acid for each position in a 17 amino acid window
Garnier-Osguthorpe-Robson
• Analogous to searching for “features” with a 17 amino acid wide frequency matrix
• One matrix for each “feature”
– α-helix
– β-sheet
– turn
– coil
• The highest scoring “feature” is found at each location
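The window-scoring idea can be sketched as below. The scoring loop follows the description above (one 17-position matrix per feature, highest score wins at each residue), but the matrices are toy placeholders and are NOT the published GOR parameters: `toy_matrices` derives them from the partial Chou-Fasman table in these slides with an arbitrary distance weighting, and uses all zeros for coil. The function names and the state letters H, E, T, C are assumptions for the example.

```python
import math

WINDOW = 17
HALF = WINDOW // 2  # 8 residues on each side of the central position


def predict(sequence, matrices):
    """Score every state at every position with its 17-wide matrix and
    keep the highest scoring state. matrices maps a state letter to a list
    of 17 dicts (amino acid -> score); missing entries score 0."""
    seq = sequence.upper()
    result = []
    for i in range(len(seq)):
        best_state, best_score = None, -math.inf
        for state, matrix in matrices.items():
            score = 0.0
            for offset in range(-HALF, HALF + 1):
                j = i + offset
                if 0 <= j < len(seq):  # ignore window positions off the ends
                    score += matrix[offset + HALF].get(seq[j], 0.0)
            if score > best_score:
                best_state, best_score = state, score
        result.append(best_state)
    return "".join(result)


def toy_matrices():
    """Placeholder matrices, NOT real GOR parameters (illustration only)."""
    table = {  # (P_alpha, P_beta, P_turn) from the partial table in the slides
        "E": (1.51, 0.37, 0.74), "M": (1.45, 1.05, 0.60),
        "A": (1.42, 0.83, 0.66), "V": (1.06, 1.70, 0.50),
        "I": (1.08, 1.60, 0.50), "Y": (0.69, 1.47, 1.14),
        "P": (0.57, 0.55, 1.52), "G": (0.57, 0.75, 1.56),
    }
    # Coil is listed first so that it wins ties when nothing else scores.
    matrices = {"C": [{} for _ in range(WINDOW)]}
    for k, state in enumerate("HET"):
        matrices[state] = [
            {aa: math.log(p[k]) / (1 + abs(offset)) for aa, p in table.items()}
            for offset in range(-HALF, HALF + 1)
        ]
    return matrices


if __name__ == "__main__":
    print(predict("EAEAMAEAVIVIVYPGPG", toy_matrices()))  # made-up sequence
```

With real parameters, each matrix entry would be estimated from known structures for that amino acid at that window position.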
Accuracy of predictions
• GOR is much better than Chou-Fasman at recognizing β-sheets
• Both methods are only about 55-65% accurate.
