ENSEMBL Project - Browsing Tools Human Genome

Glossary
• Agroinformatics / Agricultural informatics:

Agroinformatics concentrates on the aspects of
bioinformatics dealing with plant genomes.
• Alignment :The process of lining up two or more
sequences to achieve maximal levels of identity (and
conservation, in the case of amino acid sequences) for
the purpose of assessing the degree of similarity and
the possibility of homology. Arrangement of two or more
nucleotide or protein sequences to maximize the
number of matching monomers.
• Alignment score: A numerical value that describes the
quality of a sequence alignment.
• Algorithm :A fixed procedure embodied in a computer
program. A set of rules for calculating or problem
solving carried out by a computer program.
• Bioinformatics :The merger of biotechnology and
information technology with the goal of revealing new
insights and principles in biology.
• Annotation: 1-Finding genes and other important
elements in raw sequence data (structural annotation).2-
Determining the function of genes and proteins
(functional annotation)
• BLAST Basic Local Alignment Search Tool. (Altschul
et al.) A sequence comparison algorithm optimized for
speed used to search sequence databases for optimal
local alignments to a query. The initial search is done for
a word of length "W" that scores at least "T" when
compared to the query using a substitution matrix. Word
hits are then extended in either direction in an attempt to
generate an alignment with a score exceeding the
threshold of "S". The "T" parameter dictates the speed
and sensitivity of the search.
• Database: On a computer, a collection of data records
either in a single file or as multiple files. The central
component of a database management system.
• Database management system (DBMS): A software suite including a
database and utilities required to organize, search, and update it, maintain
data security and control access.
• Domain: Usually used to describe part of a protein that can fold and carry
out a function independently, but sometimes used more generally to indicate
part of a protein sequence, for instance a ‘glycine-rich domain’, or a
geometrically distinct part of a protein structure.
• E value Expectation value used to test the significance of a sequence
similarity score. The number of different alignments with scores equivalent to
or better than S that are expected to occur in a database search by chance.
The lower the E value, the more significant the score.
• FASTA The first widely used algorithm (a sequence alignment algorithm) for
database similarity searching. The program looks for optimal local
alignments by scanning the sequence for small matches called "words".
Initially, the scores of segments in which there are multiple word hits are
calculated ("init1"). Later the scores of several segments may be summed to
generate an "initn" score. An optimized alignment that includes gaps is
shown in the output as "opt". The sensitivity and speed of the search are
inversely related and controlled by the "k-tup" variable which specifies the
size of a "word". (Pearson and Lipman)
• Global Alignment:The alignment of two nucleic acid or protein
sequences over their entire length
• Heuristic: Of a computer program, making guesses to obtain
approximate results but much faster than possible with exhaustive
searching.
• Homology :Similarity attributed to descent from a common
ancestor. An evolutionary relationship of two molecules deriving
from a common ancestor.
• Identity :The extent to which two (nucleotide or amino acid)
sequences are invariant.
• K :A statistical parameter used in calculating BLAST scores that can
be thought of as a natural scale for search space size. The value K
is used in converting a raw score (S) to a bit score (S').
• lambda :A statistical parameter used in calculating BLAST scores
that can be thought of as a natural scale for scoring system. The
value lambda is used in converting a raw score (S) to a bit score
(S').
• Local Alignment :The alignment of some portion of two nucleic
acid or protein sequences
• Motif :A short conserved region in a protein sequence. Motifs are
frequently highly conserved parts of domains.
• Multiple Sequence Alignment :An alignment of three or more
sequences with gaps inserted in the sequences such that residues
with common structural positions and/or ancestral residues are
aligned in the same column. Clustal W is one of the most widely
used multiple sequence alignment programs
• Optimal Alignment :An alignment of two sequences with the
highest possible score.
• Orthologous :Homologous sequences in different species that
arose from a common ancestral gene during speciation; may or may
not be responsible for a similar function.
• P value :The probability of an alignment occurring with the score in
question or better. The p value is calculated by relating the
observed alignment score, S, to the expected distribution of HSP
scores from comparisons of random sequences of the same length
and composition as the query to the database. The most highly
significant P values will be those close to 0. P values and E values
are different ways of representing the significance of the alignment.
• Paralogous :Homologous sequences within a single species that arose by
gene duplication.
• Profile :A table that lists the frequencies of each amino acid in each
position of protein sequence. Frequencies are calculated from multiple
alignments of sequences containing a domain of interest.
• Proteome: The entire complement of proteins produced by a particular
genome, including variants of the same basic protein generated by post-
translational modification etc. The study of the proteome is known as
proteomics.
• Proteomics :Systematic analysis of protein expression of normal and
diseased tissues that involves the separation, identification and
characterization of all of the proteins in an organism.
• PSI-BLAST :Position-Specific Iterative BLAST. An iterative search using the
BLAST algorithm. A profile is built after the initial search, which is then used
in subsequent searches. The process may be repeated, if desired, with new
sequences found in each cycle used to refine the profile. (Altschul et al.)
• PSSM :Position-specific scoring matrix; see profile. The PSSM gives the
log-odds score for finding a particular matching amino acid in a target
sequence.
• Query :The input sequence (or other type of search term) with which all of
the entries in a database are to be compared.
• Primary database: A database for primary sequence data. The
primary nucleotide databases are NCBI GenBank, the European
Molecular Biology Laboratory (EMBL) Nucleotide Sequence
Database, and the DNA Data Bank of Japan (DDBJ). The primary protein
databases are SWISS-PROT and TrEMBL.
• Secondary database: A database of sequence information derived
from the data in primary databases. Examples include PROSITE,
BLOCKS, Pfam and PRINTS.
• Relational database: A database in which data records are
organized as tables, allowing the data from tables containing similar
fields to be linked together.
• SWISS-PROT: Database of confirmed protein sequences with
extensive annotations. Maintained by the Swiss Institute of
Bioinformatics.
• TrEMBL: Translated EMBL. Database of protein sequences
translated from the EMBL nucleotide sequence database. Not as
extensively annotated as SWISS-PROT
• SQL: Structured Query Language. The industry-standard language
used to interrogate and process data in relational databases.
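As a minimal sketch of the "Relational database" and "SQL" entries above, the following uses Python's built-in sqlite3 module to link two tables through a shared field. The table names (protein, annotation), the column names, and the rows are hypothetical and invented purely for illustration; they do not reflect the schema of any real sequence database.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with two tables that share an 'accession' field."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE protein (accession TEXT PRIMARY KEY, name TEXT);
        CREATE TABLE annotation (accession TEXT, feature TEXT);
        """
    )
    # Hypothetical records, not taken from any real database.
    conn.executemany(
        "INSERT INTO protein VALUES (?, ?)",
        [("X0001", "example kinase"), ("X0002", "example transporter")],
    )
    conn.executemany(
        "INSERT INTO annotation VALUES (?, ?)",
        [
            ("X0001", "glycine-rich domain"),
            ("X0001", "ATP-binding motif"),
            ("X0002", "transmembrane region"),
        ],
    )
    conn.commit()
    return conn


def features_for(conn, accession):
    """Join the two tables on the shared field and return (name, feature) rows."""
    query = (
        "SELECT p.name, a.feature "
        "FROM protein AS p JOIN annotation AS a ON p.accession = a.accession "
        "WHERE p.accession = ? ORDER BY a.feature"
    )
    return conn.execute(query, (accession,)).fetchall()


if __name__ == "__main__":
    db = build_demo_db()
    for name, feature in features_for(db, "X0001"):
        print(name, "|", feature)
```

The JOIN on the accession column is what the glossary means by linking tables that contain similar fields.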
ENSEMBL project- Browsing
tools human genome
Ensembl
From Wikipedia, the free encyclopedia
• Ensembl is a bioinformatics research
project aiming to "develop a software
system which produces and maintains
automatic annotation on selected
eukaryotic genomes". It is run in a
collaboration between the
Wellcome Trust Sanger Institute and the
European Bioinformatics Institute, an
outstation of the
European Molecular Biology Laboratory.
Goals of Ensembl
The Ensembl project aims to provide:

• Accurate, automatic analysis of genome
data
• Analysis and annotation maintained on the
current data
• Presentation of the analysis to all via the
Web
• Distribution of the analysis to other
bioinformatics laboratories
Software and data
• The project is open source - all data and all software that
is produced in the project can be freely accessed and
used.
• Most of the software produced and used is written in the
language Perl and is based on the BioPerl infrastructure.
The Perl API can be easily employed in other genomic
projects e.g. for the annotation of gene or clone lists. The
website code uses an extensible plugins system which
allows groups to modify the website for their own data
sets, e.g. Vega which stores and displays manual
annotation.
• Also available is an API in Java.
Current species
• The annotated genomes include most finished vertebrates and
selected model organisms. Currently this includes:
• Chordates
– Mammals: Human, Mouse, Rat, Chimp, Macaque, Dog, Cow, Elephant
(pre), Opossum, Rabbit (preliminary data), Armadillo (pre), Tenrec (pre)
– Birds: Chicken
– Fish: Takifugu rubripes (Fugu), Tetraodon nigroviridis, Danio rerio
(Zebrafish)
– Frog: Xenopus tropicalis
– Ancient relatives: Ciona intestinalis, Ciona savignyi (pre)
• Invertebrates
– Insects: Anopheles gambiae (Mosquito), Honeybee, Drosophila
melanogaster (Fruitfly), Aedes aegypti (Mosquito)
– Worm: Caenorhabditis elegans
• Yeast: Saccharomyces cerevisiae (Baker's yeast)
Usage
• The service is used by molecular biologists and
bioinformaticians around the world working with
genome data of the above organisms. The
predictions of coding, controlling and other
elements in the genomes can be compared with
primary research data and with common
repositories of current genomic knowledge (
Biological Databases).
• The comparison of organisms (
comparative genomics or also intergenomics)
with respect to their gene structures and the
coded proteins is of special interest. The synteny
view can be useful educational material for
school classes.
Human genome
• The human genome project is the result of an
international consortium among many different
sequencing and bioinformatics centers. A wealth of data
is available, including the annotated assembled genomic
sequence, transcript sequence, library resources,
expression data, map data, disease and functional
information, and more. The result is an unprecedented
amount of knowledge concerning human genetics that
will eventually result in breakthroughs in understanding
human biology as well as significant medical advances.
• A challenge facing researchers today is that of analyzing
and integrating the plethora of data available. The
human genome sequence provides a critical foundation
for continued advances in medicine, basic research, and
clinical diagnostic technologies.
Karyotype
Map of the human X chromosome (from the NCBI website). Assembly of the
human genome is one of the greatest achievements of bioinformatics.

Usage of SWISS-PROT, EMBL, and
BLAST software for similarity
searches: comparing two
sequences and building a multiple
sequence alignment
Swiss-Prot
• Swiss-Prot is a manually curated
biological database of protein sequences. Swiss-
Prot was created in 1986 by Amos Bairoch
during his PhD and developed by the
Swiss Institute of Bioinformatics and the
European Bioinformatics Institute. Swiss-Prot
strives to provide reliable protein sequences
associated with a high level of annotation (such
as the description of the function of a protein, its
domain structure, post-translational
modifications, variants, etc.), a minimal level of
redundancy and high level of integration with
other databases.
• In 2002, the UniProt consortium was created: it is a
collaboration between the Swiss Institute of
Bioinformatics, the European Bioinformatics Institute and
the Protein Information Resource (PIR), funded by the
National Institutes of Health. Swiss-Prot and its
automatically curated supplement TrEMBL, have joined
with the Protein Information Resource protein database
to produce the UniProt Knowledgebase, the world's most
comprehensive catalogue of information on proteins. The
UniProtKB/Swiss-Prot release 51.3 from 12 December
2006 contains 250,296 entries.
• The UniProt consortium produced 3 database
components, each optimised for different uses. The
UniProt Knowledgebase (UniProtKB (Swiss-Prot +
TrEMBL)), the UniProt Non-redundant Reference
(UniRef) databases, which combine closely related
sequences into a single record to speed similarity
searches and the UniProt Archive (UniParc), which is a
comprehensive repository of protein sequences,
reflecting the history of all protein sequences.
European Molecular Biology Laboratory
• The European Molecular Biology Laboratory (EMBL)
is a molecular biology research institution supported by
19 European countries. The EMBL was created in 1974
and has laboratories in Heidelberg, Germany; Hamburg,
Germany; Grenoble, France; and Hinxton, UK, and an
external Research Programme in Monterotondo, Italy.
• Cell biology and biophysics, developmental biology,
gene expression, structural biology and
computational biology are the major fields of research at
EMBL Heidelberg.
• Many scientific breakthroughs have been made at EMBL
Heidelberg, most notably the first systematic genetic
analysis of embryonic development in the fruit fly by
Christiane Nüsslein-Volhard and Eric Wieschaus, for
which they were awarded the Nobel Prize for Medicine in
1995.
• Heidelberg is the largest centre for biomedical research
in Germany and home to the oldest German university,
the Ruprecht-Karls-Universität Heidelberg.
BLAST
• Developer: Altschul S.F., Gish W., Miller W., Myers E.W., Lipman D.J., NCBI
• Latest release: 2.2.15
• OS: UNIX, Linux, Mac, MS-Windows
• Use: Bioinformatics tool
• In bioinformatics, Basic Local Alignment Search Tool, or BLAST, is
an algorithm for comparing primary biological sequence information,
such as the amino-acid sequences of different proteins or the
nucleotides of DNA sequences.
• A BLAST search enables a researcher to compare a query
sequence with a library or database of sequences, and identify
library sequences that resemble the query sequence above a
certain threshold. For example, following the discovery of a
previously unknown gene in the mouse, a scientist will typically
perform a BLAST search of the human genome to see if human
beings carry a similar gene; BLAST will identify sequences in the
human genome that resemble the mouse gene based on similarity
of sequence.
Agroinformatics
• BLAST is one of the most widely used bioinformatics programs,
probably because it addresses a fundamental problem and the
algorithm emphasizes speed over sensitivity. This emphasis on
speed is vital to making the algorithm practical on the huge genome
databases currently available, although subsequent algorithms can
be even faster.
• Examples of other questions that researchers use BLAST to answer
are
• Which bacterial species have a protein that is related in lineage to a
certain protein whose amino-acid sequence I know?
• Where does the DNA that I've just sequenced come from?
• What other genes encode proteins that exhibit structures or motifs
such as the one I've just determined?
• BLAST is also often used as part of other algorithms that require
approximate sequence matching.
• The BLAST algorithm and the computer program that implements it
were developed by Stephen Altschul, Warren Gish, David Lipman at
the U.S. National Center for Biotechnology Information (NCBI),
Webb Miller at The Pennsylvania State University, and Gene Myers
at the University of Arizona.
• Input and output comply with the FASTA format.
Algorithm
• To run, BLAST requires two sequences as
input: a query sequence (also called the
target sequence) and a sequence
database. BLAST will find subsequences
in the query that are similar to
subsequences in the database. In typical
usage, the query sequence is much
smaller than the database, e.g., the query
may be one thousand nucleotides while
the database is several billion nucleotides.
BLAST searches for high scoring
sequence alignments between the query sequence
and sequences in the database using a heuristic
approach that approximates the
Smith-Waterman algorithm. The exhaustive Smith-
Waterman approach is too slow for searching large
genomic databases such as GenBank. Therefore,
the BLAST algorithm uses a heuristic approach
that is slightly less accurate than Smith-Waterman
but over 50 times faster. The speed and relatively
good accuracy of BLAST are the key technical
innovation of the BLAST programs and arguably
why the tool is the most popular bioinformatics
search tool.
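For comparison with the heuristic, here is a minimal sketch of the exhaustive Smith-Waterman approach that BLAST approximates. It computes only the best local alignment score (no traceback) and uses a simple linear gap penalty. The scoring values (match +1, mismatch -1, gap -2) are illustrative assumptions, not BLAST defaults. The nested loops visit every cell of the query-by-subject table, which is why this approach becomes slow on large databases.

```python
def smith_waterman_score(a, b, match=1, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b.

    Uses the Smith-Waterman recurrence with a linear gap penalty and keeps
    only two rows of the dynamic-programming table in memory.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below zero: a local alignment may start anywhere.
            curr[j] = max(0, diagonal, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("AGTTAC", "ACTTAG"))
```

The running time grows with the product of the two sequence lengths, so a query of a thousand nucleotides against billions of database nucleotides requires on the order of trillions of cell updates.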
• The BLAST algorithm can be conceptually divided into three stages.
• In the first stage, BLAST searches for exact matches of a small fixed
length W between the query and sequences in the database. For
example, given the sequences AGTTAC and ACTTAG and a word
length W = 3, BLAST would identify the matching substring TTA that
is common to both sequences. By default, W = 11 for nucleotide seeds.
• In the second stage, BLAST tries to extend the match in both
directions, starting at the seed. The ungapped alignment process
extends the initial seed match of length W in each direction in an
attempt to boost the alignment score. Insertions and deletions are
not considered during this stage. For our example, the ungapped
alignment between the sequences AGTTAC and ACTTAG centered
around the common word TTA would be:
    ..A G T T A C..
      |   | | |
    ..A C T T A G..
If a high-scoring ungapped alignment is found, the database sequence
is passed on to the third stage.
• In the third stage, BLAST performs a gapped alignment between the
query sequence and the database sequence using a variation of the
Smith-Waterman algorithm. Statistically significant alignments are
then displayed to the user.
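The first two stages can be sketched in a few lines, using the example sequences AGTTAC and ACTTAG from the text. This is a simplified illustration, not the NCBI implementation: the function names, the scoring values (match +1, mismatch -1), and the stopping rule (stop once the running score falls more than xdrop below the best score seen) are assumptions chosen for the sketch.

```python
def find_seeds(query, subject, w):
    """Stage 1: return (query_pos, subject_pos) pairs where a length-w word matches exactly."""
    index = {}
    for i in range(len(query) - w + 1):
        index.setdefault(query[i:i + w], []).append(i)
    seeds = []
    for j in range(len(subject) - w + 1):
        for i in index.get(subject[j:j + w], []):
            seeds.append((i, j))
    return seeds


def _extend(query, subject, qi, si, step, match, mismatch, xdrop):
    """Walk from (qi, si) in direction step; return (best gain, positions kept)."""
    best = running = 0
    best_len = length = 0
    while 0 <= qi < len(query) and 0 <= si < len(subject):
        running += match if query[qi] == subject[si] else mismatch
        length += 1
        if running > best:
            best, best_len = running, length
        elif best - running > xdrop:
            break  # the score has dropped too far; give up on this direction
        qi += step
        si += step
    return best, best_len


def extend_ungapped(query, subject, qpos, spos, w, match=1, mismatch=-1, xdrop=2):
    """Stage 2: extend a seed in both directions without gaps.

    Returns (score, query_start, query_end) with query_end exclusive.
    """
    right, right_len = _extend(query, subject, qpos + w, spos + w, 1, match, mismatch, xdrop)
    left, left_len = _extend(query, subject, qpos - 1, spos - 1, -1, match, mismatch, xdrop)
    return w * match + left + right, qpos - left_len, qpos + w + right_len


if __name__ == "__main__":
    for q, s in find_seeds("AGTTAC", "ACTTAG", 3):
        print((q, s), extend_ungapped("AGTTAC", "ACTTAG", q, s, 3))
```

For the example pair, the only seed is the word TTA at position 2 of both sequences, and neither flank improves the score, so the ungapped alignment stays at the seed itself.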
An extremely fast but considerably less
sensitive alternative to BLAST that
compares nucleotide sequences to the
genome is BLAT (Blast Like Alignment
Tool). A version designed for comparing
multiple large genomes or chromosomes is
BLASTZ. Also there is another well-known
software called PatternHunter which
produces significantly better sensitivity
results than BLAST at the same speed or
very similar sensitivity results at a much
faster speed.
Parallel BLAST
• Parallel BLAST versions are implemented
using MPI and Pthreads and have been ported to
various platforms including Windows, Linux,
Solaris, OS X, and AIX. Popular
approaches to parallelizing BLAST include
query distribution, hash table
segmentation, computation parallelization,
and database segmentation (partitioning).
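Database segmentation can be illustrated with a toy sketch: the database is split into segments, each segment is searched by its own worker, and the partial hit lists are merged. Everything here is an assumption made for illustration. Python threads stand in for the MPI or Pthreads workers mentioned above, and the similarity measure (number of shared three-letter words) is a placeholder, not BLAST scoring.

```python
from concurrent.futures import ThreadPoolExecutor


def words(seq, w=3):
    """Return the set of distinct length-w words in seq."""
    return {seq[i:i + w] for i in range(len(seq) - w + 1)}


def search_segment(query, segment):
    """Score every (name, sequence) entry of one database segment against the query."""
    query_words = words(query)
    return [(name, len(query_words & words(seq))) for name, seq in segment]


def segmented_search(query, database, n_segments=2):
    """Split the database, search the segments concurrently, and merge the hits."""
    segments = [database[i::n_segments] for i in range(n_segments)]
    with ThreadPoolExecutor(max_workers=n_segments) as pool:
        partial = list(pool.map(lambda seg: search_segment(query, seg), segments))
    hits = [hit for part in partial for hit in part if hit[1] > 0]
    # Best score first; ties are broken by name so the output order is stable.
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))


if __name__ == "__main__":
    toy_db = [("a", "ACTTAG"), ("b", "GGGGGG"), ("c", "AGTTAC"), ("d", "TTACCC")]
    print(segmented_search("AGTTAC", toy_db))
```

Because each segment is scored independently, the merge step is only a concatenation and a sort. A real implementation would also have to correct the statistics for the size of the full database.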
Program
• The BLAST program can either be downloaded
and run as a command-line utility "blastall" or
accessed for free over the web. The BLAST web
server, hosted by the NCBI, allows anyone with
a web browser to perform similarity searches
against constantly updated databases of
proteins and DNA that include most of the newly
sequenced organisms.
• BLAST is actually a family of programs (all
included in the blastall executable). The
following are some of the programs, ranked
mostly in order of importance:
• Nucleotide-nucleotide BLAST (blastn): This program, given a DNA query, returns
the most similar DNA sequences from the DNA database that the user specifies.
• Protein-protein BLAST (blastp): This program, given a protein query, returns the
most similar protein sequences from the protein database that the user specifies.
• Position-Specific Iterative BLAST (PSI-BLAST): One of the more recent BLAST
programs, this program is used for finding distant relatives of a protein. First, a list of
all closely related proteins is created. Then these proteins are combined into a
"profile" that is a sort of average sequence. A query against the protein database is
then run using this profile, and a larger group of proteins found. This larger group is
used to construct another profile, and the process is repeated.
By including related proteins in the search, PSI-BLAST is much more sensitive in
picking up distant evolutionary relationships than the standard protein-protein BLAST.
• Nucleotide 6-frame translation-protein (blastx): This program compares the six-
frame conceptual translation products of a nucleotide query sequence (both strands)
against a protein sequence database.
• Nucleotide 6-frame translation-nucleotide 6-frame translation (tblastx): This
program is the slowest of the BLAST family. It translates the query nucleotide
sequence in all six possible frames and compares it against the six-frame translations
of a nucleotide sequence database. The purpose of tblastx is to find very distant
relationships between nucleotide sequences.
• Protein-nucleotide 6-frame translation (tblastn): This program compares a protein
query against the six-frame translations of a nucleotide sequence database.
• Large numbers of query sequences (megablast): When comparing large numbers
of input sequences via the command-line BLAST, "megablast" is much faster than
running BLAST multiple times. It basically concatenates many input sequences
together to form a large sequence before searching the BLAST database, then
post-analyzes the search results to glean individual alignments and statistical values.
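The "profile" that PSI-BLAST builds can be sketched as a table of per-column amino acid frequencies, converted to log-odds scores (a PSSM). This is a simplified illustration under stated assumptions: a uniform background frequency of 1/20, a pseudocount of 1 per residue, and a tiny made-up alignment. PSI-BLAST itself uses sequence weighting and more refined pseudocounts.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(aligned, pseudocount=1.0):
    """Return one {amino acid: frequency} dict per column of the alignment."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have equal length")
    profile = []
    for col in range(length):
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for seq in aligned:
            if seq[col] in counts:  # gaps and unknown residues are skipped
                counts[seq[col]] += 1
        total = sum(counts.values())
        profile.append({aa: count / total for aa, count in counts.items()})
    return profile


def to_pssm(profile, background=1 / 20):
    """Convert frequencies to log-odds scores (base 2) against a uniform background."""
    return [{aa: math.log2(freq / background) for aa, freq in col.items()} for col in profile]


def score_segment(pssm, segment):
    """Sum the position-specific scores of an ungapped segment (same length as the PSSM)."""
    return sum(col[aa] for col, aa in zip(pssm, segment))


if __name__ == "__main__":
    pssm = to_pssm(build_profile(["GKS", "GKT", "GRS"]))
    print(round(score_segment(pssm, "GKS"), 2), round(score_segment(pssm, "WWW"), 2))
```

Residues that recur in a column score above zero and residues never seen there score below zero, which is how a profile can be more sensitive to distant relatives than a single query sequence.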
Setting up a BLAST search
Step 1. Plan the search
Step 2. Enter the query sequence
Step 3. Choose the appropriate search parameters
Step 4. Submit the query
Deciphering the BLAST output
Step 1. Examine the alignment scores and statistics
Step 2. Examine the alignments
Step 3. Review search details to plan the next step
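The statistics examined in Step 1 are tied together by the K and lambda parameters from the glossary. The sketch below uses the standard Karlin-Altschul relations, which the slides do not spell out: bit score S' = (lambda * S - ln K) / ln 2, E = m * n * 2^(-S'), and P = 1 - exp(-E), where m and n are the query and database lengths. The function names and the parameter values in the usage example are illustrative assumptions.

```python
import math


def bit_score(raw_score, lam, k):
    """Convert a raw score S to a bit score S' using the parameters lambda and K."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def e_value(bits, query_len, db_len):
    """Expected number of chance alignments scoring at least this well."""
    return query_len * db_len * 2.0 ** (-bits)


def p_value(e):
    """Probability of at least one chance alignment, given the expectation e."""
    return -math.expm1(-e)  # equals 1 - exp(-e), but stays accurate for tiny e


if __name__ == "__main__":
    # Illustrative values only; real lambda and K depend on the scoring system.
    bits = bit_score(raw_score=100, lam=0.267, k=0.041)
    e = e_value(bits, query_len=300, db_len=1_000_000)
    print(f"bit score {bits:.1f}, E = {e:.3g}, P = {p_value(e):.3g}")
```

For small E the P value and E value are nearly equal, which is why BLAST output can report E values alone without losing information.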
Post-BLAST analysis
Perform a PSI-BLAST analysis
Create a multiple alignment
Try motif searching with PHI-BLAST
• The core of NCBI's BLAST services is BLAST 2.0,
otherwise known as "Gapped BLAST". This service is
designed to take protein and nucleic acid sequences and
compare them against a selection of NCBI databases.
• The BLAST algorithm was written balancing speed and
increased sensitivity for distant sequence relationships.
Instead of relying on global alignments (commonly seen
in multiple sequence alignment programs) BLAST
emphasizes regions of local alignment to detect
relationships among sequences which share only
isolated regions of similarity (Altschul et al., 1990).
Therefore, BLAST is not merely a tool for viewing sequences
aligned with each other or for finding homology, but a
program for locating regions of sequence similarity with a
view to comparing structure and function.
Selecting the BLAST Program
The BLAST search pages allow you to select from several different programs.
Below is a table of these programs.
• Program Description
• Blastp :Compares an amino acid query sequence
against a protein sequence database.
• Blastn:Compares a nucleotide query sequence against a
nucleotide sequence database.
• Blastx:Compares a nucleotide query sequence
translated in all reading frames against a protein
sequence database. You could use this option to find
potential translation products of an unknown nucleotide
sequence.
• Tblastn: Compares a protein query sequence against a
nucleotide sequence database dynamically translated in
all reading frames.
• Tblastx: Compares the six-frame translations of a
nucleotide query sequence against the six-frame
translations of a nucleotide sequence database. Please
note that the tblastx program cannot be used with the nr
database on the BLAST Web page because it is
computationally intensive.
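The six-frame translation used by blastx, tblastn and tblastx can be sketched as follows: three reading frames on the given strand and three on its reverse complement. The sketch assumes the standard genetic code and upper- or lower-case A, C, G, T input; codons containing any other character are translated as "X". The frame labels (+1 to +3, -1 to -3) are a naming convention chosen for this example.

```python
BASES = "TCAG"
# Standard genetic code, listed in TCAG order for each codon position.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons from the start of dna; a trailing partial codon is dropped."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frames(dna):
    """Return the six conceptual translations keyed by frame label."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


if __name__ == "__main__":
    for label, protein in sorted(six_frames("ATGGCC").items()):
        print(label, protein)
```

Translating both the query and every database sequence in six frames multiplies the number of comparisons, which is consistent with tblastx being described as the slowest program of the family.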
To select a BLAST program for
your search
1. Open the Basic BLAST search page.
2. From the "Program" Pull Down Menu
select the appropriate program.
Figure 1. Using the pull down menu to select a BLAST
program.
Proteins
• Database & Description
• Nr : All non-redundant GenBank CDS
translations+PDB+SwissProt+PIR+PRF
• month: All new or revised GenBank CDS
translation+PDB+SwissProt+PIR released in the last 30 days.
• Swissprot: The last major release of the SWISS-PROT protein
sequence database (no updates). These are uploaded to our
system when they are received from EMBL
• Patents: Protein sequences derived from the Patent division of
GenBank.
• Yeast: Yeast (Saccharomyces cerevisiae) protein sequences. This
database is not to be confused with a listing of all Yeast protein
sequences. It is a database of the protein translations of the Yeast
complete genome.
• E. coli :E. coli (Escherichia coli) genomic CDS translations
• Pdb: Sequences derived from the 3-dimensional structures in the
Brookhaven Protein Data Bank.
• kabat [kabatpro]: Kabat's database of sequences of immunological
interest. For more information http://immuno.bme.nwu.edu/
• Alu: Translations of select Alu repeats from REPBASE, suitable for
masking Alu repeats from query sequences. It is available at ftp://
ncbi.nlm.nih.gov/pub/jmc/alu. See "Alu alert" by Claverie and
Makalowski, Nature vol. 371, page 752 (1994).
Nucleotides
• Database Description
• Nr: All non-redundant GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS,
GSS, or HTGS sequences).
• Month: All new or revised GenBank+EMBL+DDBJ+PDB sequences released in the
last 30 days.
• Dbest: Non-redundant database of GenBank+EMBL+DDBJ EST Divisions
• Dbsts: Non-redundant database of GenBank+EMBL+DDBJ STS Divisions.
• mouse ests: The non-redundant Database of GenBank+EMBL+DDBJ EST Divisions
limited to the organism mouse.
• human ests: The Non-redundant Database of GenBank+EMBL+DDBJ EST Divisions
limited to the organism human.
• other ests: The non-redundant database of GenBank+EMBL+DDBJ EST Divisions all
organisms except mouse and human.
• Yeast: Yeast (Saccharomyces cerevisiae) genomic nucleotide sequences. Not a
collection of all Yeast nucleotide sequences, but the sequence fragments from the
Yeast complete genome
• E. coli: E. coli (Escherichia coli) genomic nucleotide sequences.
• Pdb:Sequences derived from the 3-dimensional structure of proteins.
• Kabat [kabatnuc]: Kabat's database of sequences of immunological interest. For
more information http://immuno.bme.nwu.edu/
• Patents: Nucleotide sequences derived from the Patent division of GenBank.
• vector: Vector subset of GenBank(R), NCBI, (ftp://ncbi.nlm.nih.gov/pub/blast/db/
directory).
• Mito: Database of mitochondrial sequences (Rel. 1.0, July 1995).
• Alu: Select Alu repeats from REPBASE,
suitable for masking Alu repeats from
query sequences. It is available at ftp://
ncbi.nlm.nih.gov/pub/jmc/alu. See "Alu
alert" by Claverie and Makalowski, Nature
vol. 371, page 752 (1994).
• Epd: Eukaryotic Promoter Database, ISREC in
Epalinges s/Lausanne (Switzerland).
• Gss: Genome Survey Sequence, includes
single-pass genomic data, exon-trapped
sequences, and Alu PCR sequences.
• Htgs: High Throughput Genomic
Sequences.
Figure 2. Using the Pull Down Menu to select the
BLAST database.
Entering your Sequence
• The BLAST web pages accept input
sequences in three formats: FASTA
sequence format, NCBI Accession
numbers, or GIs.
• FASTA Format
• A description of the FASTA format is
located on the Basic BLAST search
pages.
1. Open your FASTA formatted sequence in a text editor as
plain text.
2. Use your mouse to highlight the entire sequence.
3. Select Edit/Copy from the menu in your text editor.
4. Go to the BLAST search page in your web browser.
5. Use your mouse to select the main input field titled
"Enter your input data here", by clicking it once.
6. Select Edit/Paste from the browser's menu.
7. You should now see your FASTA sequence in this field.
8. Set the pull down menu to "Sequence in FASTA format".
Figure 3. Example of a FASTA sequence in the input
field.
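A FASTA record is a header line beginning with ">" followed by one or more lines of sequence. As a minimal sketch, a hypothetical helper like parse_fasta below can be used to check a sequence locally before pasting it into the input field. The function name and the sample records are assumptions for illustration.

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text.

    The header is the text after '>' on the definition line; sequence lines
    are joined with whitespace removed. Lines before the first header are ignored.
    """
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


if __name__ == "__main__":
    sample = ">seq1 demo record\nAGTT\nAC\n>seq2\nACTTAG\n"
    for name, seq in parse_fasta(sample):
        print(name, len(seq), seq)
```

Joining the wrapped lines recovers the full sequence regardless of how the file was line-wrapped.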
Accession or GI number
• If you know the Accession number or the GI of a
sequence in GenBank, you can use this as the
query sequence in a BLAST search.
1. Go to the BLAST search page in your web
browser.
2. Use your mouse to select the main input field
titled "Enter your input data here", by clicking it
once.
3. Using the keyboard enter the GenBank
Accession number or the GI number.
4. Set the Pull Down Menu to "Accession or GI".
Submitting your Search
1. Make sure you have selected the correct
BLAST program and BLAST database.
2. If you have entered your FASTA
sequence or an Accession or GI number,
click the "Submit Query" button.
3. BLAST will now open a new window and
tell you it is working on your search.
4. Once your results are computed they will
be presented in the window.
Introduction to a BLAST Query
Open a new browser window so that the BLAST program
can be compared to the tutorial. Notice that the tutorial
page resembles the Query form for an ADVANCED
BLAST search, however, the elements of the Query form
have been reorganized on the tutorial page to facilitate
describing them. Explanatory notes have been added in
light grey boxes. Additional details about BLAST are
available through the buttons.
The BLAST browser window may be left open and used
in parallel, or it may be closed while browsing through
this tutorial. Scroll down the tutorial page to learn how to
submit a BLAST search, step by step. When you are
ready, the button will take you to the BLAST output page
where the results of this search can be examined.
