BIO310 Lecture 10

Lecture: 10
Introduction to Genomics
Course Instructor:
Dr. Anum Masood
Genome
Genome Size
• The complete DNA sequence defines what we call a genome.
• The genome is therefore the total genetic information that is carried
within the cell.
• That includes the DNA in the nucleus and the DNA in any of the
organelles.
• This is new: it turns out that some organelles also contain DNA.
• In animals, such organelles are the mitochondria; in plants, they are
the mitochondria and the chloroplasts.
Genome
• Despite this, when we refer to the genome of a eukaryote, we usually
mean the DNA in the nucleus, and we refer to the genetic material in the
mitochondria as the mitochondrial genome.
• Nuclear DNA is usually not a single molecule; it is broken into pieces,
each organized into a chromosome.
• Humans have 23 pairs of chromosomes.
• But do not worry about chromosomes at this point (at least not for the
next few lectures).
Genome
Genome Size
• So what is the total length of the DNA sequence?
• It depends on the organism.
• Among cellular organisms, prokaryotes (e.g. bacteria) have the shortest
genomes.
• The length of a DNA sequence is expressed in base pairs (bp); a base
pair is a unit consisting of two nucleobases bound to each other by
hydrogen bonds.
• Simply put, one base pair is one nucleotide on each strand.
• The total number of nucleotides in one of the strands is therefore the
size (length) of the genome.
Genome
Genome Size
• Bacteria have genomes ranging from 0.5 to 13 Mbp in length.
• The unit Mbp means mega base pairs, or 10^6 base pairs.
• In practice, this is usually written simply as Mb (megabases).
• Eukaryotic genomes are generally larger, ranging from 8 Mb to 670 Gb.
• Viruses have much smaller genomes, typically from 5 to 50 kb.
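To make the units concrete, here is a minimal Python sketch that expresses a genome length given in base pairs in the largest fitting unit. It assumes decimal prefixes (1 kb = 10^3 bp, 1 Mb = 10^6 bp, 1 Gb = 10^9 bp); the function name `format_genome_size` is hypothetical, chosen only for illustration.

```python
def format_genome_size(bp: int) -> str:
    """Express a genome length given in base pairs using the largest fitting unit."""
    # Decimal (SI) prefixes: 1 Gb = 10^9 bp, 1 Mb = 10^6 bp, 1 kb = 10^3 bp
    units = [(10**9, "Gb"), (10**6, "Mb"), (10**3, "kb")]
    for factor, unit in units:
        if bp >= factor:
            # ':g' drops a trailing '.0', so 13.0 prints as 13
            return f"{bp / factor:g} {unit}"
    return f"{bp} bp"


# Lengths taken from the ranges quoted above
print(format_genome_size(5_000))            # 5 kb   (lower end of the viral range)
print(format_genome_size(13_000_000))       # 13 Mb  (upper end of the bacterial range)
print(format_genome_size(670_000_000_000))  # 670 Gb (upper end of the eukaryotic range)
```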

Genome
Genome Size
What should be cheaper and faster: DNA/RNA or
protein sequencing?
DNA/RNA sequencing is faster and
cheaper, in part because of the smaller alphabet:
four nucleotides vs. twenty amino acids.
The two strands of DNA are complementary!
Why complementarity?
T always faces A, and G always faces C, in a one-to-one
reciprocal relationship.
If we know the sequence of one strand, we can deduce the
sequence of the other strand.
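Because the pairing rule is one-to-one, the facing strand can be derived mechanically. A minimal sketch, assuming the input contains only A, C, G, and T in upper or lower case (any other character, such as N for an unknown base, raises a `KeyError`); `PAIR` and `complement` are illustrative names.

```python
# Watson-Crick pairing rule: each base has exactly one partner
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Return the base-by-base complement of a DNA strand, without reversing it."""
    return "".join(PAIR[base] for base in strand.upper())


facing = complement("ATGCTGA")
print(facing)              # TACGACT (the facing strand, read 3' to 5')
print(complement(facing))  # ATGCTGA: applying the rule twice restores the original
```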
quiz
There is 20% adenine in the genome of
a newly sequenced bacterium from
Antarctica. What is the percentage of G,
C, and T in the genome?
Acknowledgements: ©Arshan Nasir, PhD

quiz
If A = 20%, then T is also 20%, because A always pairs with T. G and C
therefore make up the remaining 60%, and since G always pairs with C,
G = C = 30%.
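The quiz reasoning can be written as a tiny function. This is a sketch assuming double-stranded DNA, where complementary pairing forces A = T and G = C; the function name `composition_from_adenine` is hypothetical.

```python
def composition_from_adenine(a_percent: float) -> dict:
    """Infer the base composition (%) of double-stranded DNA from its A content."""
    # A pairs with T, so T% = A%; A% cannot exceed 50 because A + T <= 100
    if not 0 <= a_percent <= 50:
        raise ValueError("A content of double-stranded DNA must be between 0 and 50%")
    t_percent = a_percent
    # G pairs with C, so the remainder is split equally between them
    g_percent = c_percent = (100 - a_percent - t_percent) / 2
    return {"A": a_percent, "T": t_percent, "G": g_percent, "C": c_percent}


print(composition_from_adenine(20))  # {'A': 20, 'T': 20, 'G': 30.0, 'C': 30.0}
```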

Example
• 5’-ATGCTGA-3’
• What is the complementary sequence?

• 5’-ATGCTGA-3’
• 3’-TACGACT-5’
• How is this reported?

• 5’-ATGCTGA-3’ and 5’-TCAGCAT-3’
• What does it mean?

• Both sequences are written 5' to 3', and they correspond to the two
facing strands of the same DNA molecule.
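The reporting convention (both strands written 5' to 3') amounts to complementing each base and then reversing the result. A minimal sketch, assuming the input contains only A, C, G, and T; `reverse_complement` is an illustrative name, not a specific library function.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(strand: str) -> str:
    """Return the opposite strand, written 5' to 3' (the reporting convention)."""
    # Walk the input from its 3' end and complement each base,
    # so the result also reads 5' -> 3'
    return "".join(PAIR[base] for base in reversed(strand.upper()))


print(reverse_complement("ATGCTGA"))  # TCAGCAT
```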
Palindromes
• A fascinating property of DNA complementarity
is that sometimes the two strands have the same
sequence when each is read 5' to 3'
• Such sequences are known as palindromes and are very important:
• Recognized by restriction enzymes
• Important binding sites
• A palindromic sequence is a nucleic acid
sequence (DNA or RNA) that is the same whether
read 5' to 3' on one strand or 5' to 3' on the
complementary strand with which it forms a
double helix.
Palindromes
• A mirror-like palindrome reads the same
forward and backward on a single DNA
strand, as in
GTAATG
• An inverted-repeat palindrome also
reads the same forward and backward,
but the forward and backward sequences
are found on complementary DNA strands
(5'-GAATTC-3' pairs with 3'-CTTAAG-5',
which also reads GAATTC from 5' to 3')
One more stop: the genetic code
• Proteins and RNA are encoded by DNA
• Sequencing proteins is difficult (isolation, funding restrictions, and
a larger alphabet)
• So what should we do?
• Can we predict protein and RNA sequences simply from DNA?
• Yes. This is a very useful activity in bioinformatics
DNA Coding Regions: Pretending to Work with Protein Sequences
• Determining the sequence of a protein is much more difficult than
sequencing DNA, but all the proteins that a given organism (whether a
microbe or a human being) can synthesize are encoded in the DNA
sequence of its genome.
• Thus, the smart shortcut that molecular biologists have been using is to
read protein sequences directly at the information source: the DNA
sequence!
• This way, we can pretend to know the amino-acid sequence of a protein
without ever sequencing the protein itself.
Turning DNA into proteins: The genetic code
• When you know a DNA sequence, you can translate it into the
corresponding protein sequence using the genetic code, in the very
same way the cell itself generates a protein sequence.
• The genetic code is universal (with some exceptions; otherwise life
would be too simple!), and it is nature’s solution to the problem of how
to uniquely relate a sequence written with 4 nucleotides (A, T, G, C) to a
suite of 20 amino acids.
• In bioinformatics, we use symbols (rather than actual chemicals) to do
the same.
• Understanding how the cell does this was one of the most brilliant
achievements of the biologists of the 1960s.
• Yet the final answer can be contained in a (miraculously small) table.
From a given starting point in your DNA sequence,
read the sequence 3 nucleotides (one triplet)
at a time. Then consult the genetic code table to find
which amino acid corresponds to the current triplet
(technically referred to as a codon).
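The reading procedure above can be sketched in Python. This assumes the standard genetic code only (it does not cover the exceptions mentioned earlier, such as the variant code of vertebrate mitochondria), builds the table from the usual compact 64-letter string with codons enumerated in TCAG order, and marks stop codons with `*`, a common display convention. `CODON_TABLE` and `translate` are illustrative names.

```python
from itertools import product

# Standard genetic code; codons enumerated in TCAG order: TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str, start: int = 0) -> str:
    """Translate a 5'->3' DNA sequence codon by codon, beginning at index `start`.

    Stop codons appear as '*'; a trailing partial codon is ignored.
    """
    dna = dna.upper()
    return "".join(
        CODON_TABLE[dna[i:i + 3]] for i in range(start, len(dna) - 2, 3)
    )


print(translate("ATGGCTTGGTAA"))  # MAW*
```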
• If your DNA sequence is correctly listed in the 5' to 3' orientation, you
also generate the protein sequence in the conventional N- to C-terminus
orientation.
• Thus, if you know where a protein-coding region starts in a DNA
sequence, your computer can pretend to be a cell and generate the
corresponding amino-acid sequence!
• This simple computer translation exercise is at the origin of most of the
so-called protein sequences that you can find in databases.
• The resulting protein sequence depends entirely on the way you
converted your DNA sequence into triplets before using the genetic
code.
• For instance, using the second position as the starting point leads to a
different set of triplets, and hence a different protein sequence.
Beginning with the third position (GGA-AGT- . . .) again leads to an
entirely different translation.
Turning DNA into proteins: Reading Frames
• Because of the triplet-based genetic code, a given DNA interval, on a given strand,
can theoretically be translated in three different ways.
• These three perspectives are known in the field as reading frames.
• Because either DNA strand can encode proteins, a total of six reading
frames are possible when translating a DNA sequence into proteins.
• With very few exceptions (found in exotic viruses), only one of these six frames is
used for any given DNA coding region.
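Putting the pieces together, a minimal six-frame translation sketch: three offsets on the given strand and three on its reverse complement. It assumes the standard genetic code, and the frame labels +1..+3 and -1..-3 are one common convention (tools differ in how they number the reverse frames); all names are illustrative.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(dna: str) -> str:
    """Opposite strand, written 5' to 3'."""
    return "".join(PAIR[base] for base in reversed(dna.upper()))


def translate(dna: str, start: int = 0) -> str:
    """Codon-by-codon translation from index `start`; stops shown as '*'."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(start, len(dna) - 2, 3))


def six_frame_translation(dna: str) -> dict:
    """Translate all three offsets on each strand (labels +1..+3 and -1..-3)."""
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna, offset)
        frames[f"-{offset + 1}"] = translate(rc, offset)
    return frames


for label, protein in six_frame_translation("ATGGCTTGGTAA").items():
    print(label, protein)
# +1 MAW*
# -1 LPSH
# +2 WLG
# -2 YQA
# +3 GLV
# -3 TKP
```

In this toy sequence, only frame +1 starts with ATG and ends at a stop codon (TAA).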
Open Reading Frame (ORF)
• An interval of DNA sequence that begins with a start codon (ATG, coding for
M = methionine) and continues in the same frame, free of stop codons (TAA,
TGA, or TAG), until the next in-frame stop codon is called an open reading
frame (ORF)
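The definition translates directly into a scan over the three frames of one strand. A minimal sketch under stated assumptions: it searches only the given strand (running it on the reverse complement would cover the other three frames), requires each ORF to end with an in-frame stop codon that is included in the reported interval, skips ATGs nested inside an ORF already found, and uses 0-based, end-exclusive coordinates. `find_orfs` and its `min_codons` parameter are hypothetical names; real gene finders apply far more elaborate criteria.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 1) -> list:
    """Return (frame, start, end) for ORFs on the given strand.

    Coordinates are 0-based and end-exclusive; the interval runs from the ATG
    up to and including the first in-frame stop codon. `min_codons` counts
    the codons before the stop (the ATG included).
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(dna):
            if dna[i:i + 3] != "ATG":
                i += 3
                continue
            # Extend codon by codon until an in-frame stop codon is found
            j = i
            while j + 3 <= len(dna) and dna[j:j + 3] not in STOP_CODONS:
                j += 3
            if j + 3 > len(dna):
                break  # no in-frame stop before the sequence ends: not counted
            if (j - i) // 3 >= min_codons:
                orfs.append((frame, i, j + 3))
            i = j + 3  # resume after the stop; ATGs nested inside are skipped
    return orfs


seq = "CCATGAAATTTTAGGG"
for frame, start, end in find_orfs(seq):
    print(frame, start, end, seq[start:end])  # 2 2 14 ATGAAATTTTAG
```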
[Figures: Six ORFs Example-1; Six ORFs Example-2]
• Keep in mind that some DNA sequences do not encode proteins at all, and
that higher organisms have large pieces of noncoding DNA inserted within
their genes.
• A large part of bioinformatics is devoted to developing methods to
locate protein-coding regions in DNA sequences, to delineate precisely
where genes start and end, and to find where they are interrupted by
noncoding intervals (called introns).
Questions