
Objective:

To study the structural databases

Theory:
NCBI provides two structure-related databases:
1. MMDB
2. CDD

1. MMDB (Molecular Modeling Database)


The Molecular Modeling Database (MMDB) contains 3D macromolecular structures,
including proteins and polynucleotides. MMDB contains over 28,000 structures and is
linked to the rest of the NCBI databases, including sequences, bibliographic citations,
taxonomic classifications, and sequence and structure neighbors. Entrez is the
integrated, text-based search and retrieval system used at NCBI for the major
databases, including PubMed, Nucleotide and Protein Sequences, Protein Structures,
Complete Genomes, Taxonomy, and others.
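
Because MMDB is reachable through Entrez, a text search can also be expressed programmatically. The sketch below only builds a query URL for NCBI's E-utilities ESearch endpoint. It assumes that the Entrez database name for MMDB is `structure`; the helper name `build_structure_search_url` and the search term are illustrative and not part of this practical.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities ESearch service.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_structure_search_url(term, retmax=5):
    """Return an ESearch URL for a text query against the Entrez 'structure' database.

    Assumption: 'structure' is the Entrez name under which MMDB records are searched.
    """
    params = {"db": "structure", "term": term, "retmax": retmax}
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


# Illustrative query; no network request is made here.
url = build_structure_search_url("trypsin")
print(url)
```

Opening the printed URL in a browser (or fetching it with any HTTP client) should return an XML list of matching structure identifiers.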

2. CDD (Conserved Domain Database)


The Conserved Domain Database (CDD) is a database of well-annotated multiple sequence
alignment models and derived database search models, for ancient domains and full-length proteins.
Domains can be thought of as distinct functional and/or structural units of a protein. These two
classifications coincide rather often, as a matter of fact, and what is found as an independently
folding unit of a polypeptide chain also carries specific function. Domains are often identified as
recurring (sequence or structure) units, which may exist in various contexts. In molecular evolution
such domains may have been utilized as building blocks, and may have been recombined in
different arrangements to modulate protein function. CDD defines conserved domains as recurring
units in molecular evolution, the extents of which can be determined by sequence and structure
analysis.
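
The idea of a domain as a recurring, conserved unit can be illustrated with a toy multiple sequence alignment. The sketch below scores each alignment column by how often its most common residue occurs. This is a simplified illustration, not the scoring that CDD itself uses, and the alignment is invented for the example.

```python
from collections import Counter


def column_conservation(alignment):
    """Return, per column, the fraction of sequences sharing the most common residue.

    Gaps ('-') never count as the conserved residue, but they do count
    in the denominator, so gappy columns score lower.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for column in zip(*alignment):
        residues = [r for r in column if r != "-"]
        if not residues:
            scores.append(0.0)
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        scores.append(top_count / len(column))
    return scores


# Hypothetical aligned fragments, invented for illustration.
toy_alignment = ["MKVLA", "MKILA", "MRVLA", "MK-LA"]
scores = column_conservation(toy_alignment)
print(scores)
```

Runs of consecutive high-scoring columns are the kind of region a conserved-domain model would be built around.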
Objective:
To study the pairwise sequence similarity search using BLAST
algorithm.

Theory:
The BLAST program was designed by Stephen Altschul, Warren Gish, Webb Miller, Eugene Myers, and
David J. Lipman at the National Institutes of Health (NIH) and was published in the Journal of Molecular
Biology in 1990. BLAST (Basic Local Alignment Search Tool) is a heuristic search algorithm: rather than
evaluating every possible alignment, it takes a nucleotide or protein sequence as input and compares it
with sequences in existing databases such as GenBank at NCBI. It finds regions of local similarity between
sequences and calculates the statistical significance of the matches. It can also be used to infer functional
and evolutionary relationships between sequences. The search splits the query into words of a
fixed size, compares each word with the database sequences, and assigns a score to each
comparison. Words that score above a threshold are taken as matches, and the alignment
is extended from each match in both directions. Once extension is complete, the total score is calculated,
and the alignment is displayed on the BLAST result page only if the total score exceeds the threshold value.
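
The word-match-then-extend idea described above can be sketched in a few lines. This is a toy, ungapped version under simplifying assumptions: words must match exactly, scoring is +1/-1, and extension stops once the score falls a fixed amount below the best seen. Real BLAST uses scoring matrices, neighbourhood words, gapped extension and statistics, none of which are modelled here. All function names and sequences are illustrative.

```python
def find_seeds(query, subject, word_size=3):
    """Yield (query_pos, subject_pos) pairs where a query word matches the subject exactly."""
    index = {}
    for j in range(len(subject) - word_size + 1):
        index.setdefault(subject[j:j + word_size], []).append(j)
    for i in range(len(query) - word_size + 1):
        for j in index.get(query[i:i + word_size], []):
            yield i, j


def _extend(query, subject, qi, sj, step, start_score, match, mismatch, drop):
    """Walk in one direction; return the best score and how many residues reached it."""
    score = best = start_score
    best_len = length = 0
    while 0 <= qi < len(query) and 0 <= sj < len(subject):
        score += match if query[qi] == subject[sj] else mismatch
        length += 1
        if score > best:
            best, best_len = score, length
        elif best - score >= drop:
            break  # score dropped too far below the best: stop extending
        qi += step
        sj += step
    return best, best_len


def extend_seed(query, subject, i, j, word_size=3, match=1, mismatch=-1, drop=2):
    """Extend a seed to the left and right without gaps; return (score, aligned query segment)."""
    seed_score = word_size * match
    best, left = _extend(query, subject, i - 1, j - 1, -1, seed_score, match, mismatch, drop)
    best, right = _extend(query, subject, i + word_size, j + word_size, 1, best, match, mismatch, drop)
    return best, query[i - left:i + word_size + right]


def best_local_hit(query, subject, word_size=3, threshold=5):
    """Return the best-scoring extended seed, or None if nothing reaches the threshold."""
    hits = [extend_seed(query, subject, i, j, word_size)
            for i, j in find_seeds(query, subject, word_size)]
    best = max(hits, default=None)
    return best if best is not None and best[0] >= threshold else None


# Hypothetical sequences, invented for illustration.
hit = best_local_hit("ACGTTGCA", "TTACGTTGAA")
print(hit)
```

Raising `threshold` above the best score makes the function return `None`, mirroring how alignments below the cutoff are not reported.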

BLAST Procedure
This is the common procedure for any BLAST program.

Step 1: Select the BLAST program.


Step 2: Enter a query sequence or upload a file containing sequence.
Step 3: Select the database to search.
Step 4: Select the algorithm and the parameters of the algorithm for the search.
Step 5: Run the BLAST program.

Step 1: The BLAST program was selected

The BLAST program (BLASTp, BLASTn, BLASTx, tBLASTn, or tBLASTx) was chosen on the BLAST website,
http://blast.ncbi.nlm.nih.gov/Blast.cgi; here, nucleotide BLAST was clicked.

Step 2: A query sequence was entered or a file containing sequence was uploaded

A query sequence was entered by pasting it into the query box or by uploading a FASTA file
containing the sequence for the similarity search. This step was the same for all BLAST programs.
An accession number can also be entered instead. The Simulator tab was visited to learn more about
how to retrieve a query sequence.
Figure 1: Enter a query sequence or upload a file containing sequence
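
Since the query is supplied in FASTA format, it can help to see how such a file is structured: a header line starting with `>` followed by one or more sequence lines. The parser below is a minimal sketch; the record name `example_seq` and its sequence are invented for illustration.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict mapping record ID to sequence.

    The record ID is the first whitespace-delimited token after '>'.
    Sequence lines belonging to one record are concatenated.
    """
    records = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:].split()[0]
            records[current] = []
        elif current is not None:
            records[current].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


# Hypothetical record, invented for illustration.
fasta_text = """>example_seq hypothetical description
ATGGCCCTGT
GGATGCGCCT
"""
sequences = parse_fasta(fasta_text)
print(sequences)
```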

Step 3: The database to search was selected

The user first has to know which databases are available and what types of sequences they
contain. A sequence similarity search looks for sequences similar to the query
sequence in the selected databases (Figure 2).

Figure 2: Select database to search

Step 4: The algorithm and its parameters were selected for the search

Some BLAST programs offer several algorithms, and the user has to specify which one to use.
Nucleotide BLAST offers MegaBLAST, which searches for highly similar sequences; discontiguous
MegaBLAST, which searches for more dissimilar sequences; and BLASTn, which searches for somewhat
similar sequences. Protein BLAST offers BLASTp, which compares a protein query against a protein
database; PSI-BLAST, which performs a position-specific search iteratively; PHI-BLAST, which
searches the database for sequences containing a particular pattern present in the query (the user
enters the pattern in the PHI pattern box provided); and DELTA-BLAST (Domain Enhanced Lookup Time
Accelerated BLAST), which searches multiple sequences and aligns them to find protein homology.
The algorithm parameters required to run BLAST programs are Target sequences, Short queries,
E-value, Word size, Query range, scoring parameters (Match/Mismatch scores and Gap penalties),
and filters (Filter and Mask). Default values are provided, but the user can adjust them as
needed, as shown in Figure 3.

Figure 3: Algorithm and the parameters
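
To give a feel for the E-value parameter, the sketch below uses the standard relation between a bit score S' and the expected number of chance hits, E = m * n * 2^(-S'), where m is the query length and n is the database length. It is a simplification: BLAST itself applies effective-length corrections that are ignored here, and the numbers are invented for illustration.

```python
def expected_hits(bit_score, query_length, database_length):
    """Approximate E-value from a bit score: E = m * n * 2**(-S').

    Assumption: raw lengths are used; BLAST's effective-length
    corrections are not modelled.
    """
    return query_length * database_length * 2.0 ** (-bit_score)


def passes_cutoff(bit_score, query_length, database_length, e_cutoff=10.0):
    """Return True if the hit would be reported under the given E-value cutoff."""
    return expected_hits(bit_score, query_length, database_length) <= e_cutoff


# Hypothetical values: a 100-residue query against a 1,000,000-residue database.
e_weak = expected_hits(20, 100, 1_000_000)
e_strong = expected_hits(40, 100, 1_000_000)
print(e_weak, e_strong)
```

A higher bit score gives a smaller E-value, so tightening the E-value cutoff keeps only the stronger matches.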

Step 5: The BLAST program was run

The BLAST search was submitted by clicking the BLAST button at the end of the page.
A screenshot of the result is shown in Figure 4.

Figure 4: Run the BLAST program

BLAST Result:
After the query sequence is submitted for a sequence similarity search, the result page appears
with information such as the query ID, description, molecule type, sequence length, database name,
and BLAST program. It also shows any putative conserved domains detected during the search.
The query sequence is represented as a numbered red bar below the color key. Database hits are shown
below the query (red) bar, ordered by alignment score, so the most closely related sequences are
placed nearest to the query. The user can see more details about an alignment by hovering the
mouse over its colored bar, as shown in Figure 5.
Objective:
Secondary structure prediction for amino acid sequence of a
given protein

Theory:
PSI-BLAST based secondary structure PREDiction (PSIPRED) is a method used to
investigate protein structure. Its algorithm uses artificial neural networks, a machine learning
method. Secondary structure is the general three-dimensional form of local segments
of biopolymers such as proteins and nucleic acids (DNA, RNA). It does not, however, describe
specific atomic positions in three-dimensional space, which are considered to be the tertiary
structure. Secondary structure can be formally defined by the hydrogen bonds of the biopolymer, as
observed in an atomic-resolution structure.
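
A secondary structure prediction is commonly written as one letter per residue. The sketch below assumes the usual three-state alphabet (H for helix, E for strand, C for coil) and summarizes such a string into contiguous segments. The prediction string is invented for illustration and is not PSIPRED output for any real protein.

```python
from itertools import groupby


def secondary_structure_segments(prediction):
    """Collapse a per-residue state string into (state, start, end) segments.

    Positions are 1-based and inclusive, matching residue numbering.
    """
    segments = []
    position = 1
    for state, run in groupby(prediction):
        length = len(list(run))
        segments.append((state, position, position + length - 1))
        position += length
    return segments


def state_fraction(prediction, state):
    """Return the fraction of residues assigned to the given state."""
    return prediction.count(state) / len(prediction)


# Hypothetical three-state prediction, invented for illustration.
prediction = "CCHHHHHHCCEEEECC"
segments = secondary_structure_segments(prediction)
print(segments)
print(state_fraction(prediction, "H"))
```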

PSIPRED Procedure

Step 1: The PSIPRED program was selected.

Step 2: The sequence was obtained from NCBI and entered in FASTA format. Insulin was selected here.
Step 3: The database to search was selected.
Step 4: The algorithm and its parameters were selected for the search.
Step 5: The PSIPRED program was run by clicking the Predict button.

PSIPRED Result:
Objective:
Tertiary structure prediction for amino acid sequence of a given
protein

Theory:
The practical role of protein structure prediction is now more important than ever. Massive amounts
of protein sequence data are produced by modern large-scale DNA sequencing efforts such as
the Human Genome Project. Despite community-wide efforts in structural genomics, the output of
experimentally determined protein structures—typically by time-consuming and relatively
expensive X-ray crystallography or NMR spectroscopy—is lagging far behind the output of protein
sequences.

PyMOL Procedure

PyMOL is great for casual visualization of biological molecules. In this example, a PDB file
describing a protein is loaded and its style and color are tweaked.

The end result will look something like this


Default buttons for viewing with a 3-button mouse

1. A PDB coordinates file for the protein of interest was obtained. (The RCSB Protein Data Bank,
a public structure repository containing over 40,000 protein structures in PDB format
available for download, is a good place to look.) For this example, trypsin was used.
2. The PDB file was opened using File => Open... from the menu bar. The protein's
structure appeared, typically rendered as simple bonding lines.
3. The right side of the Viewer showed the loaded PDB as an object, along with its command
buttons. Each button contains a submenu with more options. S was clicked,
then cartoon, to show the protein's secondary structure in the popular cartoon form.
 - The lines were still visible on top of the cartoon; to hide them, H was clicked,
then lines.
4. To change the color of each protein chain (as defined in the coordinate file), C was
clicked, then chainbows was selected from the by chain menu. "Chainbows" colors the residues
in each protein chain as a rainbow that begins with blue and ends with red.
 - Another common coloring method assigns a single color to each chain: click C, then
select by chain from the by chain menu.
5. The protein was clicked and dragged to change the view. A list of mouse button modes is
shown below the object control panel:
 - Rota: Rotate
 - Move: Translate object along an axis
 - MoveZ: aka Zoom
 - Sele: Select
 - Slab:
 - Cent:
 - PkAt:
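
The menu clicks above have scripted counterparts in PyMOL's Python API. The sketch below wraps them in one function that receives the `cmd` and `util` modules as arguments, so it can be read and tested without PyMOL installed. It assumes that `cmd.load`, `cmd.hide`, `cmd.show`, `util.chainbow` and `util.cbc` behave like the corresponding menu actions; the function name `style_protein` and the file name used in the note are illustrative.

```python
import os


def style_protein(cmd, util, pdb_path, per_chain_color=False):
    """Load a PDB file and show it as a cartoon, coloured by chain.

    cmd, util: the PyMOL modules (from pymol import cmd, util), passed in
               explicitly so that stand-ins can be used outside PyMOL.
    pdb_path:  path to a PDB coordinates file.
    per_chain_color: if True, give each chain one colour (assumed to match
               C > by chain > by chain); otherwise use chainbows.
    """
    name = os.path.splitext(os.path.basename(pdb_path))[0]
    cmd.load(pdb_path, name)      # File => Open...
    cmd.hide("lines", name)       # H > lines
    cmd.show("cartoon", name)     # S > cartoon
    if per_chain_color:
        util.cbc(name)            # one colour per chain
    else:
        util.chainbow(name)       # rainbow along each chain
    return name
```

Inside a PyMOL session this would be called as `style_protein(cmd, util, "trypsin.pdb")` after `from pymol import cmd, util`, where `trypsin.pdb` stands for whatever file was downloaded.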

PyMOL Result:
3D structure of trypsin in PyMOL

3D structure of trypsin in cartoon form


Objective:
Establishment of method for gene and protein phylogeny.

Theory:
In biology, phylogenetics is the study of the evolutionary history and relationships among
individuals or groups of organisms (e.g. species, or populations). These relationships are discovered
through phylogenetic inference methods that evaluate observed heritable traits, such
as DNA sequences or morphology under a model of evolution of these traits. The result of these
analyses is a phylogeny (also known as a phylogenetic tree) – a diagrammatic hypothesis about the
history of the evolutionary relationships of a group of organisms.
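
One simple way to turn observed sequence traits into input for tree building is a pairwise distance. The sketch below computes the p-distance (the proportion of differing sites) between aligned sequences. It is only an illustration of the idea; tree-building programs generally use more elaborate evolutionary models, and the sequences are invented for the example.

```python
def p_distance(seq_a, seq_b):
    """Return the proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap ('-') are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def distance_matrix(aligned):
    """Return a dict mapping (name_1, name_2) to the p-distance for every pair."""
    names = list(aligned)
    return {(m, n): p_distance(aligned[m], aligned[n]) for m in names for n in names}


# Hypothetical aligned sequences, invented for illustration.
toy = {"A": "ACGTACGT", "B": "ACGTACGA", "C": "ACGAACCA"}
matrix = distance_matrix(toy)
print(matrix[("A", "B")], matrix[("A", "C")], matrix[("B", "C")])
```

Here A and B differ least, so a distance-based method would group them together before joining C.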

Phylogeny.fr
Phylogeny.fr has been designed to provide a high performance platform that transparently chains programs
relevant to phylogenetic analysis in a comprehensive, and flexible pipeline. Although phylogenetic
aficionados will be able to find most of their favorite tools and run sophisticated analysis, the primary
philosophy of Phylogeny.fr is to assist biologists with no experience in phylogeny in analyzing their data in a
robust way.
The Phylogeny.fr platform offers a phylogeny pipeline which can be executed through three main modes:
1. The "One Click mode" targets users that do not wish to deal with program and parameter selection. By
default, the pipeline is already set up to run and connect programs recognized for their accuracy and speed
(MUSCLE for multiple alignment and PhyML for phylogeny) to reconstruct a robust phylogenetic tree from a
set of sequences.
2. In the "Advanced mode", the Phylogeny.fr server proposes the succession of the same programs but users
can choose the steps to perform (multiple sequence alignment, phylogenetic reconstruction, tree drawing) and
the options of each program.
3. The "A la carte mode" offers the possibility of running and testing more alignment and phylogeny
programs, such as MUSCLE, ClustalW, T-Coffee, PhyML, BioNJ, TNT, ...

"One Click" mode


This is a "default" mode which proposes a pipeline already set up to run and connect programs recognized
for their accuracy and speed (MUSCLE for multiple alignment, optionally Gblocks for alignment
curation, PhyML for phylogeny and finally TreeDyn for tree drawing) to reconstruct a robust phylogenetic
tree from a set of sequences.
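
The alignment curation step in this pipeline removes unreliable columns before the tree is built. The sketch below shows the general idea with a much simpler rule than Gblocks actually applies: it drops every column whose gap fraction exceeds a cutoff. The function name, the cutoff, and the sequences are all illustrative.

```python
def drop_gappy_columns(alignment, max_gap_fraction=0.0):
    """Return a copy of the alignment without columns that have too many gaps.

    alignment: dict mapping sequence name to aligned sequence (equal lengths).
    max_gap_fraction: columns with a larger fraction of '-' are removed.
    Note: this is a toy filter, not the Gblocks algorithm.
    """
    names = list(alignment)
    columns = zip(*(alignment[name] for name in names))
    kept = [col for col in columns if col.count("-") / len(col) <= max_gap_fraction]
    return {name: "".join(col[k] for col in kept) for k, name in enumerate(names)}


# Hypothetical alignment, invented for illustration.
raw = {"s1": "AC-GT", "s2": "ACTGT", "s3": "A-TGT"}
curated = drop_gappy_columns(raw)
print(curated)
```

With a looser cutoff such as `max_gap_fraction=0.5`, columns with a single gap among three sequences are retained.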

Procedure
Step 1: The Phylogeny.fr program was selected.
Step 2: The "One Click" mode analysis was selected.
Step 3: The query sequences were entered, or a file containing the sequences in FASTA format
(obtained from NCBI) was uploaded. For example, insulin and its derivatives were selected.
Step 4: The Phylogeny.fr program was run by clicking the Submit button.

Phylogeny.fr Result:
