HUMAN TRANSCRIPTOME
By Dr. Ina Garg

TRANSCRIPTOME: A BRIEF HISTORY

Transcriptomics is the study of RNA, a single-stranded nucleic acid that was not considered separately from the DNA world until Francis Crick formulated the central dogma in 1958.
In 1961, Jacob and Monod proposed that a protein-coding gene is transcribed into a special short-lived intermediate associated with the ribosome, which was designated messenger RNA (mRNA).
A short, stable RNA, transfer RNA (tRNA), was identified as the predicted
“adaptor”.
Shortly afterwards, ribosomal RNA (rRNA), also involved in protein synthesis, was purified.

 From the late 1970s onward, Altman and Cech independently showed that RNA can function as a catalyst.
 In 1982, Kruger put forward the “ribozyme” concept, demonstrating that RNA could act as both genetic material (like DNA) and a biological catalyst (like protein enzymes).
 In 1998, Fire and Mello found that double-stranded RNAs (dsRNAs) could recognize specific mRNA sequences and trigger degradation of the target mRNAs, a process named RNA interference (RNAi).
 Further studies indicated that the molecules directly causing RNAi were short dsRNA fragments of 21–25 base pairs, called small interfering RNAs (siRNAs).
 In 1977, Sharp and Roberts showed that the mRNA sequence of adenovirus is discontinuously distributed in the genome, suggesting that a typical eukaryotic gene consists of exons, the protein-coding sequences, and introns, the non-coding sequences.
TRANSCRIPTOME
 A transcriptome is the collection of all the mRNA molecules, or transcripts, transcribed from the DNA (genome) at any one time in a particular cell.
 Largely specifies the composition of the proteome
 Expression is triggered by environmental factors
 If conditions are not optimal, expression of a gene can be switched off completely
TRANSCRIPTOMICS SCOPE
 Reflects the genes that are being actively expressed at any given time, except where phenomena such as mRNA degradation or transcriptional attenuation intervene.
 The study of transcriptomics, also referred to as expression profiling, examines the expression level of mRNAs in a given cell population, often using high-throughput techniques based on DNA microarray technology.
TRANSCRIPTOMICS AIMS
I. To catalogue all species of transcripts.
II. To determine the transcriptional structure of genes.
III. To quantify the changing expression levels of each transcript during development and under different conditions.
TRANSCRIPTOMES GIVE US INFORMATION ON GENE EXPRESSION
Why use transcriptomes in biological research?
Pros:
• Easy, accessible
• Immediate access to the protein-coding portion of the genome
• Identify alternative splicing
• Identify SNPs in coding regions
Cons:
• Snapshot in time (different times, different expression patterns)
• Absence of gene expression does not mean the gene is absent from the genome
• Difficult to ensure that you have sampled a single cell type
• Statistical analysis is highly dependent on experimental design
TECHNOLOGIES
 Hybridization-based approaches
– fluorescently labelled cDNA with custom-made microarrays
– commercial high-density oligo microarrays
 Sequence-based approaches
– Sanger sequencing of cDNA or EST libraries
– serial analysis of gene expression (SAGE)
– cap analysis of gene expression (CAGE)
– massively parallel signature sequencing (MPSS)
CDNA LIBRARIES AND DATA MINING
 Collections of cDNAs were obtained from various normal and diseased tissues, as well as from reference cell lines.
 The functional characterization of transcribed sequences progressed at the same time.
 Short, single-pass partial sequences of these cDNAs, known to be expressed in the tissues or cell types analyzed, were designated expressed sequence tags (ESTs).
 As entries increased, it became feasible to merge overlapping partial sequences and eventually to define full-length open reading frames (ORFs).
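The merging of overlapping partial sequences can be sketched as a greedy overlap assembly. This is a minimal illustration under simplifying assumptions, not a real EST clustering pipeline: reads are treated as error-free and on the same strand, and `MIN_OVERLAP`, `overlap` and `greedy_merge` are hypothetical names with an arbitrary 5-base minimum overlap. Real tools must also cope with sequencing errors, reverse complements and splice variants.

```python
MIN_OVERLAP = 5  # hypothetical minimum suffix/prefix overlap, in bases


def overlap(a: str, b: str, min_len: int = MIN_OVERLAP) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if shorter than min_len)."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_merge(ests: list[str]) -> list[str]:
    """Repeatedly merge the pair of sequences with the longest overlap until none remain."""
    contigs = list(ests)
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            return contigs  # no overlaps left: remaining sequences are separate contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)] + [merged]


ests = ["ATGGCTTCAAGG", "TCAAGGTTACCA", "TTACCAGGATAA"]  # made-up overlapping ESTs
print(greedy_merge(ests))  # -> ['ATGGCTTCAAGGTTACCAGGATAA']
```

Greedy merging is simple but can join the wrong neighbours when repeats or related genes share sequence.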
 Global gene expression information provided by cDNA/EST databases made “electronic northern” analysis feasible.
 Electronic northern analysis facilitated prediction of expression changes between normal and diseased tissues.
 Extensive mining of EST databases using stringent statistical tests permitted identification of candidate genes whose altered (stimulated or reduced) expression correlated with the disease state.
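An electronic northern essentially compares how often a gene's ESTs occur in two cDNA libraries of different sizes. A minimal sketch, assuming Fisher's exact test on a 2×2 table of EST counts (one reasonable choice; the text does not name the statistical test) and made-up counts; `fisher_exact_two_sided` is a hypothetical helper written with the standard library only.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Row 1: ESTs of the gene of interest; row 2: all other ESTs.
    Column 1: library 1 (e.g. normal tissue); column 2: library 2 (e.g. diseased tissue).
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability that x of the gene's ESTs fall in library 1
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one;
    # the small tolerance absorbs floating-point rounding in ties.
    return min(1.0, sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9)))


# Made-up counts: gene seen in 2 of 10,000 normal-library ESTs and 25 of 12,000 tumor-library ESTs
p = fisher_exact_two_sided(2, 25, 10_000 - 2, 12_000 - 25)
print(f"two-sided p = {p:.2e}")
```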
CDNA SUBTRACTION
 The data mining approach was limited by the existing sequence
information and the available gene annotations.
 To circumvent this bias, researchers developed several methods:
 cDNA subtraction
 Differential display PCR (DD)
 Representational difference analysis
 Serial analysis of gene expression (SAGE).
 cDNA subtraction is a method for isolating the cDNA molecules that distinguish two related cDNA samples.
 cDNAs prepared from the two cell types to be compared are rendered single-stranded, mixed, and incubated to allow annealing of sequences common to both cell types.
 These common sequences hybridize, while sequences unique to one of the cell types stay single-stranded.
 Single-stranded and double-stranded cDNAs are separated by hydroxylapatite chromatography.
 Subsequently, the unique cDNA fragments are cloned and sequenced.

Subtractive hybridization requires two populations of nucleic acids:
– Tester (tracer): contains the target nucleic acid (the DNA or RNA differences that one wants to identify)
– Driver: lacks the target sequences
The two populations are hybridized at a driver-to-tester ratio of at least 10:1. Because of the large excess of driver molecules, tester sequences are more likely to form driver-tester hybrids than double-stranded tester.
Only the sequences common to tester and driver form these hybrids, however; the remaining tester sequences stay single-stranded or form tester-tester pairs.
 The driver-tester hybrids, double-stranded driver and any single-stranded driver molecules are subsequently removed (the "subtractive" step), leaving only tester molecules enriched for sequences not found in the driver (the target sequences).
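Why a large driver excess enriches the target sequences can be shown with a deliberately simplified model. It assumes that every tester strand anneals and picks its complementary partner from the tester and driver copies of the same sequence in proportion to their abundance, ignoring kinetics and incomplete hybridization. Under that assumption a shared sequence keeps only tester/(tester + driver) of its tester strands after the subtractive step, whereas target sequences, which have no driver copies, are all retained; at the 10:1 ratio in the text this gives about an 11-fold enrichment in a single round. The function name is hypothetical.

```python
def fraction_retained(tester: float, driver: float) -> float:
    """Fraction of one sequence's tester strands left after driver-containing hybrids are removed.

    Toy model: each tester strand anneals to a complementary strand drawn from the
    tester and driver copies of the same sequence in proportion to their abundance.
    """
    total = tester + driver
    return tester / total if total > 0 else 0.0


RATIO = 10.0  # driver:tester excess ("at least 10:1" in the text)

shared = fraction_retained(1.0, RATIO)  # sequence present in both tester and driver
target = fraction_retained(1.0, 0.0)    # target sequence: absent from the driver
print(f"shared kept {shared:.3f}, target kept {target:.3f}, "
      f"enrichment {target / shared:.1f}-fold")
```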
PROCESS
DISADVANTAGES
 Abundant mRNAs (cDNAs) are over-represented due to the lack of normalization.
 Rare transcripts may not be detected at all.
These limitations are overcome by suppression subtractive hybridization (SSH), a PCR-based subtraction method that combines normalization and subtraction in a single procedure.
The normalization step equalizes the abundance of cDNA fragments within the target population, and the subtraction step excludes sequences that are common to the cell populations being contrasted.
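The normalization step in SSH exploits second-order hybridization kinetics: abundant sequences reanneal faster than rare ones, so after a fixed time their single-stranded concentrations converge. A minimal sketch, assuming ideal second-order reannealing, for which a sequence starting at concentration C0 is still single-stranded at C0 / (1 + k·C0·t); the rate constant, time and abundances are illustrative values, not measurements. With the values below, a 1000-fold difference in starting abundance shrinks to roughly 1.1-fold among the single-stranded molecules.

```python
def single_stranded(c0: float, k: float, t: float) -> float:
    """Single-stranded concentration left after time t under ideal second-order reannealing.

    Solves dC/dt = -k * C**2 with C(0) = c0, giving C(t) = c0 / (1 + k * c0 * t).
    """
    return c0 / (1.0 + k * c0 * t)


# Hypothetical starting abundances spanning a 1000-fold range (arbitrary units)
abundances = {"abundant": 1000.0, "medium": 10.0, "rare": 1.0}
K, T = 1.0, 10.0  # illustrative rate constant and hybridization time

after = {name: single_stranded(c0, K, T) for name, c0 in abundances.items()}
spread_before = max(abundances.values()) / min(abundances.values())
spread_after = max(after.values()) / min(after.values())
print(f"abundance spread: {spread_before:.0f}-fold before, {spread_after:.2f}-fold after")
```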
 Differential display PCR is a method to separate and clone
individual mRNAs that are differentially expressed.
 A set of oligonucleotide primers is used: one anchored to the polyadenylated tail of a subset of mRNAs, the other short and arbitrary in sequence so that it anneals at different sites relative to the first primer.
 The mRNA subpopulations defined by the primer pairs are amplified after reverse transcription, and the products are resolved (displayed) on DNA sequencing gels.
 Many samples can be run in parallel to reveal differences in mRNA composition.
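How the anchored primer splits the mRNA population into subsets can be sketched by matching primers to the two bases just upstream of the poly(A) tail. This assumes anchored primers of the form T12MN (twelve T residues, then M = A, C or G and N = A, C, G or T, giving 12 primers, as used in early differential display protocols), mRNAs written 5′→3′ in the DNA alphabet, and hypothetical helper names. The arbitrary upstream primer is not modelled.

```python
from itertools import product

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

# 12 anchored primers T12MN; M excludes T so the anchor cannot pair inside the poly(A) tail
ANCHORED_PRIMERS = ["T" * 12 + m + n for m, n in product("ACG", "ACGT")]


def primes(primer: str, mrna: str) -> bool:
    """True if the anchored primer can pair at the poly(A) junction of this mRNA (5'->3')."""
    body = mrna.rstrip("A")  # drop the poly(A) tail
    m, n = primer[-2], primer[-1]
    # M pairs with the last base before poly(A); N pairs with the base before that
    return len(body) >= 2 and body[-1] == COMPLEMENT[m] and body[-2] == COMPLEMENT[n]


def subsets(mrnas: dict[str, str]) -> dict[str, list[str]]:
    """Group mRNA names by the anchored primer that would reverse-transcribe them."""
    return {p: [name for name, seq in mrnas.items() if primes(p, seq)]
            for p in ANCHORED_PRIMERS}


toy_mrnas = {
    "geneX": "ATGCCCGTCAAAAAAAA",    # ...TC before the poly(A) tail
    "geneY": "ATGAAAGGGCTAAAAAAAA",  # ...CT before the poly(A) tail
}
for primer, names in subsets(toy_mrnas).items():
    if names:
        print(primer, names)
```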
SERIAL ANALYSIS OF GENE EXPRESSION
 Developed at the Johns Hopkins Oncology Center (USA) by Dr. Victor Velculescu and colleagues in 1995.
 Serial analysis of gene expression (SAGE) is an approach that allows rapid and detailed analysis of overall gene expression patterns.
 SAGE is based mainly on two principles: representation of mRNAs (cDNAs) by short sequence tags, and concatenation of these tags for cloning, which allows efficient sequencing analysis.
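The tag-based representation of a transcript can be sketched as follows, assuming NlaIII (CATG) as the anchoring enzyme, a 10-base tag taken immediately 3′ of the 3′-most CATG site (the fragment that stays on the beads), and cDNAs written 5′→3′ in the mRNA sense. `sage_tag` and the sequences are hypothetical.

```python
from typing import Optional

ANCHOR = "CATG"  # NlaIII recognition site (anchoring enzyme)
TAG_LEN = 10     # assumed tag length immediately 3' of the anchoring site


def sage_tag(cdna: str, tag_len: int = TAG_LEN) -> Optional[str]:
    """Return the tag next to the 3'-most CATG site of a cDNA written 5'->3'.

    Returns None if the cDNA has no CATG site or too little sequence after it.
    """
    pos = cdna.rfind(ANCHOR)  # 3'-most site: the fragment retained on the beads
    if pos == -1:
        return None
    start = pos + len(ANCHOR)
    tag = cdna[start:start + tag_len]
    return tag if len(tag) == tag_len else None


# Made-up transcripts; in practice each tag would be looked up in a tag-to-gene table
transcripts = {
    "geneA": "GGCATGAAAAACCCCCTTCATGACGTACGTACTTTAAAAAA",
    "geneB": "ATGGGTTTAAACCCGGGAAAAAA",  # no CATG site: no tag
}
for name, seq in transcripts.items():
    print(name, sage_tag(seq))
```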
SAGE
1. Isolate mRNA.
2. (a) Add biotin-labeled dT primer.
(b) Synthesize ds cDNA.
3. (a) Bind to streptavidin-coated beads.
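(b) Cleave with the “anchoring enzyme” (a restriction enzyme).
 A type II restriction enzyme with a 4-bp recognition site is used (e.g., NlaIII, which recognizes CATG).
 The cleaved cDNA fragments average about 256 bp, with sticky ends created.
(c) Discard the loose fragments, keeping only the 3′-most fragment of each cDNA, which stays bound to the beads.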
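The ~256 bp average follows from a probability argument: in random sequence with equal base frequencies, a given 4-base site such as CATG starts at any position with probability (1/4)^4 = 1/256, so sites lie on average 256 bp apart. The sketch below checks this by simulation under that random-sequence assumption; real cDNAs have uneven base composition, so actual fragment lengths vary around this value. The function name is hypothetical.

```python
import random

SITE = "CATG"              # NlaIII recognition site
EXPECTED = 4 ** len(SITE)  # one site per 4^4 = 256 bp in random sequence


def mean_site_spacing(length: int, seed: int = 0) -> float:
    """Mean distance between consecutive CATG sites in a random, uniform-composition sequence."""
    rng = random.Random(seed)
    seq = "".join(rng.choices("ACGT", k=length))
    positions = []
    pos = seq.find(SITE)
    while pos != -1:
        positions.append(pos)
        pos = seq.find(SITE, pos + 1)
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    return sum(gaps) / len(gaps)


print(EXPECTED, round(mean_site_spacing(1_000_000)))
```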
5. DIVIDE INTO TWO POOLS AND ADD LINKER SEQUENCES
 The captured cDNAs are then ligated to linkers at their ends.
 Linkers must contain:

 NlaIII 4-nucleotide cohesive overhang.
 Type IIs recognition sequence.
 PCR primer sequence.
[Figure: linkers ligated to the bead-bound cDNAs in Pool A and Pool B]
6. CLEAVING WITH TAGGING ENZYME
 The tagging enzyme (usually BsmFI) cleaves the DNA, releasing the linker-adapted SAGE tag from each cDNA.
 The ends are repaired to make blunt-ended tags using DNA polymerase (Klenow fragment) and dNTPs.
7. Combine pools and ligate (formation of ditags).
 Tags from the two pools are ligated to each other to create a “ditag” with linkers on either end.
 The two tags are joined using T4 DNA ligase.
8. Amplify ditags, then cleave with anchoring enzyme.
 This leaves a 4-nucleotide “sticky” end, CATG (the NlaIII site, which is palindromic and so reads CATG on both strands), at each end of the ditag.
9. Ligate ditags (concatemerization of ditags).
 Each ditag is separated from the next by an anchoring enzyme (AE) site, allowing the scientist and the computer to recognize where one ditag ends and the next begins.
10. Sequence and record the tags and their frequencies.
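Steps 9 and 10 can be sketched in code: split each concatemer read at the CATG sites, recover the two tags of each ditag and count them. This assumes 10-base tags, ditags of 20–24 bases between anchoring sites, and tail-to-tail ligation, so the second tag of each ditag is read on the opposite strand and must be reverse-complemented. Function names and sequences are hypothetical; published SAGE analyses may also discard repeated ditags as likely PCR duplicates, which is omitted here.

```python
from collections import Counter

ANCHOR = "CATG"  # NlaIII site separating ditags
TAG_LEN = 10     # assumed tag length, excluding the CATG


def reverse_complement(seq: str) -> str:
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq))


def tags_from_concatemer(read: str, tag_len: int = TAG_LEN) -> list[str]:
    """Split a concatemer read at CATG sites and recover both tags of each ditag."""
    tags = []
    for ditag in read.split(ANCHOR):
        # Keep pieces of plausible ditag length (assumed 20-24 bases); this drops
        # flanking linker/vector sequence and truncated ends.
        if 2 * tag_len <= len(ditag) <= 2 * tag_len + 4:
            tags.append(ditag[:tag_len])                       # first tag, read directly
            tags.append(reverse_complement(ditag[-tag_len:]))  # second tag is on the other strand
    return tags


def tag_frequencies(reads: list[str]) -> Counter:
    """Count every tag recovered from all sequenced concatemers."""
    counts = Counter()
    for read in reads:
        counts.update(tags_from_concatemer(read))
    return counts


tag_a, tag_b = "ACGTACGTAA", "TTTGGGCCCA"  # made-up tags
ditag = tag_a + reverse_complement(tag_b)  # two tags joined tail to tail
read = ANCHOR + ditag + ANCHOR + ditag + ANCHOR
print(tag_frequencies([read]))
```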
APPLICATIONS OF SAGE
 To analyze differences between the gene expression patterns of cancer cells and their normal counterparts.
 Allows rapid, detailed analysis of thousands of transcripts in a cell.
 Comparing different types of cells generates profiles that help us understand healthy cells and pathogenesis.
 To identify downstream targets of oncogenes and tumor suppressor genes.
ADVANTAGES:
 The mRNA sequence does not need to be known in advance, so previously unknown genes or variants can be discovered.
 Accurate and quantitative, since transcripts are counted directly as tags.
PROBLEMS IN SAGE…
• The gene tag is extremely short (13 or 14 bp), so if the tag is derived from an unknown gene, it is difficult to analyze with such a short sequence.
• The type II restriction enzyme does not always yield fragments of the same length.
• mRNA levels and protein expression do not always correlate.
CAP ANALYSIS OF GENE EXPRESSION
 Cap analysis of gene expression (CAGE) is used to produce a snapshot of the 5′ end of the mRNA population in a biological sample (the transcriptome).
 The small fragments from the very beginnings of mRNAs (the 5′ ends of capped transcripts) are extracted, reverse-transcribed to DNA, PCR-amplified and sequenced.
 Unlike SAGE, in which tags come from other parts of transcripts, CAGE is primarily used to locate exact transcription start sites in the genome. This knowledge in turn allows a researcher to investigate the promoter structure necessary for gene expression.
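Locating transcription start sites from CAGE data comes down to counting where tag 5′ ends map and grouping nearby positions. A minimal sketch, assuming tags are already aligned and reduced to (chromosome, strand, 5′-end position) tuples, and using a simple distance rule: positions no more than a hypothetical `MAX_GAP` of 20 bp apart on the same strand form one cluster, and the position with the most tags is reported as the dominant start site. Dedicated CAGE analysis tools use more elaborate clustering; the names and coordinates here are hypothetical.

```python
from collections import Counter, defaultdict

MAX_GAP = 20  # hypothetical: merge 5' ends no more than this many bp apart


def tss_clusters(tag_ends: list[tuple[str, str, int]], max_gap: int = MAX_GAP) -> list[tuple]:
    """Cluster CAGE 5' ends; return (chrom, strand, start, end, dominant_tss, tag_count) tuples."""
    counts = Counter(tag_ends)  # tags per exact (chrom, strand, position)
    by_strand = defaultdict(list)
    for (chrom, strand, pos), n in counts.items():
        by_strand[(chrom, strand)].append((pos, n))

    clusters = []
    for (chrom, strand), sites in by_strand.items():
        sites.sort()
        current = [sites[0]]
        for pos, n in sites[1:]:
            if pos - current[-1][0] <= max_gap:
                current.append((pos, n))  # close enough: extend the current cluster
            else:
                clusters.append(_summarize(chrom, strand, current))
                current = [(pos, n)]
        clusters.append(_summarize(chrom, strand, current))
    return clusters


def _summarize(chrom: str, strand: str, sites: list[tuple[int, int]]) -> tuple:
    dominant = max(sites, key=lambda s: s[1])[0]  # position supported by the most tags
    return (chrom, strand, sites[0][0], sites[-1][0], dominant, sum(n for _, n in sites))


tags = ([("chr1", "+", 1000)] * 5 + [("chr1", "+", 1003)] * 2
        + [("chr1", "+", 5000)] * 3 + [("chr1", "-", 1000)])  # made-up mapped 5' ends
for cluster in tss_clusters(tags):
    print(cluster)
```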
Sanger sequencing of EST or cDNA libraries provided information for genome annotation in the early days of genome research.
Because of limited throughput and high cost, quantitative transcriptome analysis is impractical with EST methods.
With SAGE and CAGE, multiple 3′ and 5′ cDNA end tags, respectively, were concatenated into a single clone.
However, because of the high cost of Sanger sequencing and the difficulty of mapping short (~20 bp) sequence tags to the genome, CAGE and SAGE were soon replaced by DNA microarrays.
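The mapping problem can be quantified with a back-of-the-envelope estimate. Assuming a random genome of about 3.1 × 10^9 bp searched on both strands, a particular tag of length L is expected to occur by chance roughly 2G / 4^L times: about 5,900 times for a 10-base tag but only about 0.006 times for a 20-base tag. Real genomes are not random and contain repeats, so actual ambiguity is higher; the genome size and function name are illustrative assumptions.

```python
GENOME_BP = 3.1e9  # approximate human genome size in bp (illustrative assumption)


def expected_chance_matches(tag_len: int, genome_bp: float = GENOME_BP) -> float:
    """Expected random occurrences of one tag sequence in a random genome, both strands."""
    return 2 * genome_bp / 4 ** tag_len


for length in (10, 14, 17, 20):
    print(f"{length} bp tag: ~{expected_chance_matches(length):.3g} chance matches")
```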
TRANSCRIPTOME ANALYSIS BASED ON MICROARRAYS
 DNA microarray or chip methods are based on nucleic acid hybridization.
 Fluorescently labelled cDNAs are incubated with oligonucleotide probes on the chip, and the abundance of RNA is then determined by measuring fluorescence intensity.
 High-density gene chips allowed relatively low-cost gene expression profiling.
 Specific microarrays were designed according to the purpose of the experiment, such as arrays to detect different isoforms arising from alternative splicing.
[Figure: microarray vs. SAGE]
BASIC FLOWCHART OF TRANSCRIPTOME STUDY USING MICROARRAY/DNA CHIP
1. Collect mRNA molecules from a cell.
2. Use reverse transcriptase (RT) enzyme to produce cDNA molecules from the mRNA.
3. Label the cDNA with fluorescent dyes.
4. Prepare the microarray/DNA chip (cDNA from reference genes, or an oligonucleotide mixture, on the slide).
5. Place the labeled cDNA on the microarray slide, allowing hybridization of the labeled cDNA with the complementary cDNA on the array.
6. Scan the array slide.
7. A larger mRNA amount in the cell (more expression) gives more hybridization, more fluorescence and a greater intensity of expression.
• Rapid comparison of two transcriptomes can be achieved by running them simultaneously on identical arrays and checking the hybridization patterns of the two.
[Figure: Microarray and DNA chip]

 For hybridization to the probe nucleic acids on the microarray, a labeled sample nucleic acid is needed.
 RNA extracted from the sample material can be used for synthesis of labeled complementary DNA without amplification of the sample RNA, or for the production of antisense RNA.
 Common amplification procedures use bacteriophage T7, T3 or SP6 RNA polymerases to transcribe RNA from DNA templates.
 By including labeled nucleotides in the in vitro transcription reaction, one can incorporate labels into the synthesized RNA.
 In a typical two-colour experiment, RNA is extracted from tumor tissue and
neighbouring normal tissue.
 The RNA samples are labelled with different fluorophores, tumor RNA with
a red fluorophore, normal RNA with a green one.
 Both samples are hybridized together on the microarray. If a spot then appears red, the gene is expressed more highly in tumor tissue than in normal tissue.
 If the spot looks green, higher expression in normal tissue than in tumor tissue has been detected.
 Yellow means equal expression in tumor and normal tissue.

Microarray/DNA chip after hybridization. Color intensity shows the level of hybridization. The cDNA prepared from mRNA is first labeled with a fluorescent marker (such as Cy3 or Cy5) and then hybridized to the array to produce such a pattern.
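The colour reading of a two-colour spot is usually expressed as a log ratio of the two channel intensities. A minimal sketch, assuming background-corrected intensities with red = tumor and green = normal as in the text, and an illustrative two-fold cut-off for calling a spot red or green; the threshold, the function name and the intensities are assumptions.

```python
import math

FOLD_CUTOFF = 2.0  # illustrative: call red/green only beyond a two-fold difference


def spot_call(red: float, green: float, cutoff: float = FOLD_CUTOFF) -> tuple[float, str]:
    """Return log2(red/green) and the colour the spot would appear to have."""
    m = math.log2(red / green)  # > 0: higher in tumor (red); < 0: higher in normal (green)
    if m >= math.log2(cutoff):
        return m, "red (higher in tumor)"
    if m <= -math.log2(cutoff):
        return m, "green (higher in normal)"
    return m, "yellow (similar expression)"


for red, green in [(800, 200), (150, 600), (500, 450)]:  # made-up intensities
    print(red, green, spot_call(red, green))
```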
COMPLICATIONS
• Hybridization analysis will have insufficient specificity to
distinguish between every mRNA that could be present.
– Two mRNAs with similar sequences may cross-hybridize with each other’s specific probe on the array.
– Paralogous genes active in the same tissue: a group of related mRNAs can hybridize with probes for members of the same gene family.
– Two or more different mRNAs could have been derived from the same gene (alternative splicing).
• Distinguishing which specific mRNA is present, and how much of it is present, becomes difficult.
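A crude way to anticipate cross-hybridization is to ask whether a probe nearly matches transcripts other than its intended target. A minimal sketch, assuming the number of mismatches in an ungapped alignment as a stand-in for hybridization stability (real probe design uses thermodynamic models), probes written in the same sense as the transcript region they detect, and a hypothetical tolerance of two mismatches; all names and sequences are made up.

```python
MAX_MISMATCHES = 2  # hypothetical tolerance for a potentially cross-hybridizing site


def min_mismatches(probe: str, transcript: str) -> int:
    """Fewest mismatches between the probe and any same-length window of the transcript."""
    best = len(probe)
    for i in range(len(transcript) - len(probe) + 1):
        window = transcript[i:i + len(probe)]
        best = min(best, sum(p != t for p, t in zip(probe, window)))
    return best


def cross_hybridization_risk(probe: str, target: str, transcripts: dict[str, str],
                             max_mm: int = MAX_MISMATCHES) -> list[str]:
    """Names of non-target transcripts that the probe nearly matches."""
    return [name for name, seq in transcripts.items()
            if name != target and min_mismatches(probe, seq) <= max_mm]


probe = "ACGTTGCAAC"  # made-up probe designed for geneA
transcripts = {
    "geneA": "GGACGTTGCAACTT",
    "geneA_paralog": "GGACGTTGCTACTT",  # one mismatch within the probe region
    "geneB": "TTTTTTTTTTTTTT",
}
print(cross_hybridization_risk(probe, "geneA", transcripts))
```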
[Figure: concept of alternative splicing]
COMPLICATIONS
• When comparing more than one transcriptome, one must ensure that differences in mRNA amount and hybridization intensity are due to genuine differences in transcripts rather than to experimental error.
• Experimental errors could include:
– Amount of target DNA on the array
– Efficiency with which the probe has been labeled
– Effectiveness of the hybridization process
NORMALIZATION PROCEDURES TO COUNTER EXPERIMENTAL FACTORS
• Normalization enables results from different array experiments to be accurately compared
• Normalization procedure:
– Negative controls, so that background can be determined in each experiment
– Positive controls, which should always give identical signals
• In vertebrates, the actin gene is used as a positive control
– Its expression level is fairly constant in any particular tissue
– Even in developmental or disease states
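The two kinds of controls can be combined in a simple normalization. A minimal sketch, assuming each array reports raw spot intensities keyed by gene name, background estimated as the mean of the negative-control spots, and every background-corrected signal expressed relative to the actin spot; the function name, gene names and intensities are hypothetical. In the example, the second array has higher background and twice the labelling efficiency, yet geneX gets the same normalized value on both.

```python
from statistics import mean


def normalize_array(raw: dict[str, float], negative_controls: list[str],
                    positive_control: str = "actin") -> dict[str, float]:
    """Subtract background, then express each signal relative to the positive control."""
    background = mean([raw[name] for name in negative_controls])
    corrected = {gene: max(value - background, 0.0)
                 for gene, value in raw.items() if gene not in negative_controls}
    reference = corrected[positive_control]
    if reference <= 0:
        raise ValueError("positive control signal is not above background")
    return {gene: value / reference for gene, value in corrected.items()}


# Hypothetical arrays: array 2 has higher background and twice the labelling efficiency
array1 = {"neg1": 100, "neg2": 120, "actin": 2110, "geneX": 610}
array2 = {"neg1": 300, "neg2": 340, "actin": 4320, "geneX": 1320}
for array in (array1, array2):
    print(normalize_array(array, ["neg1", "neg2"]))
```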
APPLICATIONS
 Studying the transcriptome can lead to various applications:
– Transcriptomes of stem cells and cancer cells can be studied by researchers to understand cellular differentiation and carcinogenesis
– Transcriptomes of human oocytes and embryos can be studied to understand molecular mechanisms and signaling pathways in embryonic development
– Transcriptomics is used in biomarker discovery

MICROARRAY APPLICATIONS IN CANCER PATHOGENESIS AND DIAGNOSIS
 Can be used to elucidate the mechanisms of tumorigenesis and metastasis, particularly to study the complexity of the underlying processes.
 Cancer classification based on microarray studies aims at identifying characteristics beyond anatomical site and histopathology.
 There are three types of microarray-based approaches:
(i) class comparison,
(ii) class discovery, and
(iii) class prediction.
 Class comparison - used to compare the expression profiles of two
(or more) predefined classes.
 Class discovery - used to identify novel subtypes within an apparently homogeneous population.
The starting point is usually an apparently homogeneous group of specimens, in which a hidden subset behaves aberrantly or has unrecognized features.
The problem of variable response to cancer treatment falls under this category: patients who are stratified into treatment groups according to standard histopathological criteria often respond differently to therapy.
 Class prediction means to find a set of features that are
predictive for a certain, predefined class.
It usually builds on class discovery, followed by establishing a classifier.
A classifier is a set of features, such as genes, proteins or microRNAs, that serve as surrogate markers for a certain class.
This is the common approach to identify predictive gene sets or gene signatures that can predict clinical outcome or therapy response.
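Class prediction can be illustrated with one of the simplest classifiers, a nearest-centroid rule: the mean expression profile of each predefined class is computed from training specimens, and a new specimen is assigned to the class with the closest mean. This is a minimal sketch with made-up values for a hypothetical three-gene signature; published signatures also involve feature selection, larger cohorts and independent validation.

```python
import math


def centroids(samples: list[list[float]], labels: list[str]) -> dict[str, list[float]]:
    """Mean expression profile (one value per gene) for each class label."""
    groups: dict[str, list[list[float]]] = {}
    for profile, label in zip(samples, labels):
        groups.setdefault(label, []).append(profile)
    return {label: [sum(values) / len(values) for values in zip(*profiles)]
            for label, profiles in groups.items()}


def predict(profile: list[float], class_centroids: dict[str, list[float]]) -> str:
    """Assign the class whose centroid is nearest in Euclidean distance."""
    return min(class_centroids, key=lambda label: math.dist(profile, class_centroids[label]))


# Made-up log-scale expression of a hypothetical three-gene signature
train = [[8.0, 2.0, 5.0], [7.5, 2.5, 5.2], [3.0, 7.0, 5.1], [2.5, 7.5, 4.9]]
labels = ["responder", "responder", "non-responder", "non-responder"]
model = centroids(train, labels)
print(predict([7.0, 3.0, 5.0], model), predict([3.2, 6.8, 5.0], model))
```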
RNA-SEQ: SEQUENCING APPROACHES TO STUDY THE TRANSCRIPTOME
• RNA-Seq transcriptomics replaces the hybridization of nucleotide probes with the sequencing of individual cDNAs produced from the target RNA.
• It has the potential to overcome the limitations of microarray technology.
 The prototype of NGS is massively parallel signature sequencing (MPSS), which applies four rounds of restriction enzyme digestion and ligation reactions to determine the nucleotide sequence of cDNA ends, generating a 17–20 bp sequence as the fingerprint of a corresponding RNA.
 However, due to the nature of digestion and ligation reactions, a large fraction of the sequence signatures obtained is not long enough to be unique fingerprints of RNA molecules.
 High-throughput sequencing, also called next-generation sequencing (NGS), has the capacity to sequence full genomes. By comparison, bacteriophage φX174, the first DNA genome to be sequenced, has only 5,386 nucleotides.
The mRNA is extracted from the organism, fragmented and copied into stable ds-cDNA (blue). The ds-cDNA is sequenced using high-throughput, short-read sequencing methods. These sequences can then be aligned to a reference genome sequence to reconstruct which genome regions were being transcribed.
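After alignment, expression is commonly quantified by counting the reads that fall in each gene and normalizing for gene length and sequencing depth. A minimal sketch, assuming reads reduced to (chromosome, alignment start) pairs, non-overlapping gene intervals whose span is used as gene length, and TPM (transcripts per million) as the normalization; gene names, coordinates and reads are hypothetical.

```python
def count_reads(read_starts, genes):
    """Count reads whose alignment start falls inside each gene (half-open intervals)."""
    counts = {name: 0 for name in genes}
    for chrom, pos in read_starts:
        for name, (g_chrom, start, end) in genes.items():
            if chrom == g_chrom and start <= pos < end:
                counts[name] += 1
    return counts


def tpm(counts, lengths):
    """Transcripts per million: divide counts by length, then rescale so values sum to 1e6."""
    rates = {g: counts[g] / lengths[g] for g in counts}
    total = sum(rates.values())
    return {g: (r / total * 1e6 if total else 0.0) for g, r in rates.items()}


genes = {"geneA": ("chr1", 0, 1000), "geneB": ("chr1", 2000, 4000)}  # hypothetical
reads = ([("chr1", 100 + i) for i in range(10)]
         + [("chr1", 2500 + i) for i in range(20)]
         + [("chr1", 1500 + i) for i in range(5)])  # last 5 reads are intergenic
lengths = {name: end - start for name, (_, start, end) in genes.items()}
counts = count_reads(reads, genes)
print(counts, tpm(counts, lengths))
```

Here geneB has twice as many reads as geneA but is also twice as long, so both receive the same TPM (500,000 each, since only two genes are present).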
APPLICATIONS
 Disease diagnosis.
 RNA-Seq is an approach to discover, profile and quantify RNA transcripts across the whole transcriptome. Because it does not use predesigned probes or primers, it is an unbiased, hypothesis-free approach.
 RNA SEQUENCING IS USED WIDELY FOR:
 Studies of gene expression and discovery of novel transcripts and isoforms
 SNP discovery in the coding portions of the genome
 De novo transcriptome assembly
 Basis for expression quantification in RNA-Seq
[Figure: microarray vs. RNA-Seq]
SUMMARY
 Microarray-based analysis of gene expression permits assessment of normal mRNA levels and disease-related alterations on a genome-wide scale.
 Rigorous statistical assessment of microarray data and the use of sufficiently large numbers of specimens are obligatory because of the inherent problem of multiple testing.
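The multiple-testing problem arises because thousands of genes are tested at once: at p < 0.05, about 5% of genes that do not change at all would still be called significant. One widely used remedy, shown here as an example rather than the procedure prescribed by the text, is the Benjamini-Hochberg method for controlling the false discovery rate. A minimal sketch with made-up p-values; the function name is hypothetical.

```python
def benjamini_hochberg(pvalues: list[float], q: float = 0.05) -> list[bool]:
    """Flag p-values that are significant while controlling the false discovery rate at q."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p-value
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * q:
            cutoff_rank = rank  # largest rank meeting the step-up condition
    significant = [False] * m
    for idx in order[:cutoff_rank]:
        significant[idx] = True  # reject every hypothesis up to that rank
    return significant


# Made-up p-values for ten genes
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
print(sum(p < 0.05 for p in pvals), sum(benjamini_hochberg(pvals)))
```

Of the ten made-up p-values, five fall below 0.05, but only the two smallest survive the false-discovery-rate control.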
 In molecular pathology, gene expression patterns (often called gene
signatures, molecular portraits, or gene profiles) were identified
that correlate with disease state or clinical outcome. This enables
the nomination of candidate genes that may play a regulatory role in
the processes under study.
 Disease entities indistinguishable by histopathological criteria were defined by expression profiling as well.
 For example, gene signatures related to lymphoma subtypes and to malignant breast cancer were identified. A number of signatures related to prediction of prognosis and response to therapy are already in clinical use.
 Expression profiling essentially generates only correlative information. In particular, gene signatures diagnostic of disease prognosis and therapy response have to be thoroughly validated through unbiased analysis of large cohorts of specimens.
 In systems-based analysis, microarray profiling (together with other “-omics” technologies) has become an indispensable tool for assessing the impact of pharmacological or genetic perturbations on the transcriptome.
