
HUMAN TRANSCRIPTOME

By Dr. Ina Garg


TRANSCRIPTOME: A BRIEF HISTORY

Transcriptomics is the study of RNA, a single-stranded nucleic acid, which was
not considered separately from the DNA world until Francis Crick formulated
the central dogma in 1958.

In 1961, Jacob and Monod proposed a model that the protein-coding gene is
transcribed into a special short-lived intermediate associated with the
ribosome, which was designated as mRNA.

A short, stable RNA, transfer RNA (tRNA), was identified as the predicted
“adaptor”.

Shortly afterwards, ribosomal RNA (rRNA), which is involved in protein synthesis, was purified.


• From the late 1970s onwards, Altman and Cech independently showed that RNA can function as a catalyst.

• In 1982, Kruger et al. put forward the “ribozyme” concept, demonstrating that RNA could act as both genetic material (like DNA) and a biological catalyst (like protein enzymes).

• In 1998, Fire and Mello found that double-stranded RNAs (dsRNAs) could recognize a specific mRNA sequence and lead to degradation of the target mRNA, a phenomenon known as RNA interference (RNAi).

• Further studies indicated that the molecules that directly cause RNAi are short dsRNA fragments of 21–25 base pairs, called small interfering RNAs (siRNAs).

• In 1977, Sharp and Roberts showed that the mRNA sequence of adenovirus is distributed discontinuously in the genome, and therefore suggested that a typical eukaryotic gene consists of exons, the protein-coding sequences, and introns, the non-coding sequences.

TRANSCRIPTOME
• A transcriptome is the collection of all the mRNA molecules, or transcripts, transcribed from the DNA (genome) at any one time in a particular cell.

• It specifies the composition of the proteome.

• Its expression is triggered by environmental factors.

• If conditions are not optimal, total expression can be switched off.

TRANSCRIPTOMICS SCOPE
• The transcriptome reflects the genes that are being actively expressed at any given time, with the exception of mRNA degradation phenomena such as transcriptional attenuation.

• Transcriptomics, also referred to as expression profiling, examines the expression level of mRNAs in a given cell population, often using high-throughput techniques based on DNA microarray technology.

TRANSCRIPTOMICS AIMS
I. To catalogue all species of transcripts.

II. To determine the transcriptional structure of genes.

III. To quantify the changing expression levels of each transcript during development and under different conditions.

TRANSCRIPTOMES GIVE US INFORMATION ON GENE EXPRESSION
Why use transcriptomes in biological research?

Pros
• Easy, accessible
• Immediate access to the protein-coding portion of the genome
• Identify alternative splicing
• Identify SNPs in coding regions

Cons
• Snapshot in time (different times, different expression patterns)
• Absence of expression does not mean that a gene is absent from the genome
• Difficult to ensure that you have sampled a single cell type
• Statistical analysis is highly dependent on experimental design

TECHNOLOGIES
• Hybridization-based approaches
– fluorescently labelled cDNA with custom-made microarrays
– commercial high-density oligo microarrays

• Sequence-based approaches
– Sanger sequencing of cDNA or EST libraries
– serial analysis of gene expression (SAGE)
– cap analysis of gene expression (CAGE)
– massively parallel signature sequencing (MPSS)

CDNA LIBRARIES AND DATA MINING
• Collections of cDNAs were obtained from various normal and diseased tissues, as well as from reference cell lines.

• The functional characterization of transcribed sequences progressed at the same time.

• The corresponding cDNAs known to be expressed in the tissues or cell types analyzed were designated expressed sequence tags (ESTs).

• As the number of entries increased, it became feasible to merge overlapping partial sequences and eventually to define full-length open reading frames (ORFs).

• Global gene expression information provided by cDNA/EST databases made the electronic northern feasible.

• The electronic northern analysis facilitated prediction of expression changes between normal and diseased tissues.

• Extensive mining of EST databases using stringent statistical tests permitted identification of candidate genes whose altered (stimulated or reduced) expression correlated with the disease state.
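
An electronic northern can be sketched as a comparison of EST counts for one gene between two libraries. The example below is only an illustration under stated assumptions: the function names, the EST counts, and the choice of a one-sided Fisher's exact test are not taken from the slides or from any particular EST database.

```python
from math import comb

def fisher_one_sided(k, n, K, N):
    """P(X >= k) when n ESTs are drawn from a pool of N ESTs,
    K of which belong to the gene of interest (hypergeometric tail)."""
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)

def electronic_northern(gene_disease, total_disease, gene_normal, total_normal):
    """Return (fold change, p-value) for one gene, diseased vs. normal library."""
    N = total_disease + total_normal      # all ESTs in both libraries
    K = gene_disease + gene_normal        # all ESTs assigned to the gene
    p = fisher_one_sided(gene_disease, total_disease, K, N)
    if gene_normal == 0:
        fold = float("inf")               # gene seen only in the diseased library
    else:
        fold = (gene_disease / total_disease) / (gene_normal / total_normal)
    return fold, p

# Hypothetical counts: 30 of 10,000 ESTs in tumor, 5 of 10,000 in normal tissue.
fold, p = electronic_northern(30, 10_000, 5, 10_000)
print(f"fold change = {fold:.1f}, one-sided p = {p:.2e}")
```

In practice every gene in the database would be tested this way, so the resulting p-values would still need correction for multiple testing.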

CDNA SUBTRACTION
• The data mining approach was limited by the existing sequence information and the available gene annotations.

• To circumvent this bias, researchers established several methods:
– cDNA subtraction
– Differential display PCR (DD)
– Representational difference analysis
– Serial analysis of gene expression (SAGE)
• cDNA subtraction is a method for isolating the cDNA molecules that distinguish two related cDNA samples.

• cDNAs prepared from the two cell types to be compared are rendered single-stranded, then mixed and incubated to allow annealing of sequences common to both cell types.

• These common sequences will hybridize, while sequences unique to one of the cell types will stay single-stranded.

• Single-stranded and double-stranded cDNAs are separated by hydroxylapatite chromatography.

• Subsequently, the unique cDNA fragments are cloned and sequenced.

Subtractive hybridization requires two populations of nucleic acids:
• Tester (tracer): contains the target nucleic acid (the DNA or RNA differences that one wants to identify).
• Driver: lacks the target sequences.

The two populations are hybridized with a driver-to-tester ratio of at least 10:1. Because of the large excess of driver molecules, tester sequences are more likely to form driver-tester hybrids than double-stranded tester.

Only the sequences common to the tester and the driver form such hybrids, however, leaving the remaining tester sequences either single-stranded or in tester-tester pairs.

• The driver-tester hybrids, double-stranded driver and any single-stranded driver molecules are subsequently removed (the "subtractive" step), leaving only tester molecules enriched for sequences not found in the driver (the target sequences).
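
The logic of the subtractive step can be sketched in a few lines. This is a deliberately idealized model, not a description of the wet-lab kinetics: the sequence names are hypothetical, hybridization is treated as exact matching, and `driver_pairing_fraction` assumes that a shared tester strand pairs at random with any complementary strand in the mix.

```python
def subtract(tester, driver):
    """Return the tester cDNAs that have no counterpart in the driver population."""
    driver_set = set(driver)
    return [seq for seq in tester if seq not in driver_set]

def driver_pairing_fraction(driver_to_tester_ratio):
    """Idealized chance that a shared tester strand anneals to a driver strand
    rather than to another tester strand, assuming random pairing."""
    r = driver_to_tester_ratio
    return r / (r + 1)

# Hypothetical transcript names standing in for cDNA sequences.
tester = ["geneA", "geneB", "geneC"]
driver = ["geneA", "geneC", "geneD"]
print(subtract(tester, driver))               # sequences unique to the tester
print(round(driver_pairing_fraction(10), 3))  # share captured by driver at 10:1
```

Under this simple assumption, a 10:1 excess of driver captures roughly ten in eleven of the shared tester strands in one round, which is why a large driver excess is used.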

PROCESS
[Figure: subtractive hybridization process]

DISADVANTAGE
• Abundant mRNAs (cDNAs) are over-represented due to the lack of normalization.

• Rare transcripts may not be detected at all.

• These limitations are overcome by suppression subtractive hybridization (SSH), a PCR-based subtraction method that combines normalization and subtraction into a single procedure.

• The normalization step equalizes the abundance of cDNA fragments within the target population, and the subtraction step excludes sequences that are common to the cell populations being compared.

• Differential display PCR is a method to separate and clone individual mRNAs that are differentially expressed.

• A set of oligonucleotide primers is used, one anchored to the polyadenylated tail of a subset of mRNAs, the other short and arbitrary in sequence so that it anneals at different positions relative to the first primer.

• The mRNA subpopulations defined by the primer pairs are amplified after reverse transcription, and the products are resolved (displayed) on DNA sequencing gels.

• Many samples can be run in parallel to reveal differences in mRNA composition.

SERIAL ANALYSIS OF GENE EXPRESSION
• Developed by Dr. Victor Velculescu at the Johns Hopkins University Oncology Center (USA) in 1995.

• Serial analysis of gene expression (SAGE) is an approach that allows rapid and detailed analysis of overall gene expression patterns.

• SAGE is based mainly on two principles: representation of mRNAs (cDNAs) by short sequence tags, and concatenation of these tags for cloning to allow efficient sequencing analysis.

SAGE
1. Isolate mRNA.

2. (a) Add biotin-labeled dT primer.
   (b) Synthesize ds cDNA.

3. (a) Bind to streptavidin-coated beads.
   (b) Cleave with the “anchoring enzyme” (a restriction enzyme).
       • A type II restriction enzyme is used (e.g. NlaIII).
       • cDNA fragments averaging about 256 bp, with sticky ends, are created.
   (c) Discard loose fragments.

5. DIVIDE INTO TWO POOLS AND ADD LINKER SEQUENCES

• Captured cDNAs are then ligated to linkers at their ends.

• Linkers must contain:
– NlaIII 4-nucleotide cohesive overhang.
– Type IIS recognition sequence.
– PCR primer sequence.

[Figure: linkers ligated to biotinylated (B) cDNAs in Pool A and Pool B]

6. CLEAVING WITH TAGGING ENZYME
• The tagging enzyme (usually BsmFI) cleaves the DNA, releasing the linker-adapted SAGE tag from each cDNA.
• The ends are repaired to make blunt-ended tags using DNA polymerase (Klenow fragment) and dNTPs.

7. Combine pools and ligate (formation of ditags).

• The tags from the two pools are ligated to each other to create a “ditag” with linkers on either end.
• The two tags are joined using T4 DNA ligase.

8. Amplify ditags, then cleave with anchoring enzyme.

• This leaves a “sticky” end with the sequence GTAC (or CATG on the other strand) at each end of the ditag.

9. Ligate ditags (concatemerization of ditags).

• Each ditag is flanked by an anchoring enzyme (AE) site, allowing the scientist and the computer to recognize where one ends and the next begins.

10. Sequence and record the tags and their frequencies.
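
The tag-counting side of SAGE can be sketched in code. The example is a simplified illustration rather than a real SAGE analysis pipeline: the 10-base tag length, the function names and the sequences are assumptions, linker handling and duplicate-ditag filtering are ignored, and the only facts taken from the steps above are the NlaIII anchoring site (CATG) and the tag-ditag-concatemer structure.

```python
from collections import Counter

ANCHOR = "CATG"   # NlaIII recognition site (anchoring enzyme)
TAG_LEN = 10      # assumed number of tag bases read downstream of the anchor

def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def sage_tag(cdna):
    """Tag next to the 3'-most anchoring site of a cDNA (sense strand, 5'->3')."""
    pos = cdna.rfind(ANCHOR)
    if pos == -1:
        return None                       # transcript has no anchoring site
    start = pos + len(ANCHOR)
    tag = cdna[start:start + TAG_LEN]
    return tag if len(tag) == TAG_LEN else None

def tags_from_concatemer(concatemer):
    """Split a sequenced concatemer at anchor sites and count both tags of each ditag."""
    counts = Counter()
    for ditag in concatemer.split(ANCHOR):
        if len(ditag) < 2 * TAG_LEN:
            continue                      # empty ends or truncated ditags
        counts[ditag[:TAG_LEN]] += 1              # tag read in forward orientation
        counts[revcomp(ditag[-TAG_LEN:])] += 1    # partner tag is in reverse orientation
    return counts

# Hypothetical cDNA with two CATG sites; only the 3'-most one defines the tag.
print(sage_tag("TTCATGGGGCATGACGTTGCAATAAAAAA"))
```

The tag frequencies returned by `tags_from_concatemer` correspond to the "tags and their frequencies" recorded in step 10; mapping each tag back to a gene would require a separate reference table.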

APPLICATIONS OF SAGE
• To analyze differences between the gene expression patterns of cancer cells and their normal counterparts.

• Allows rapid, detailed analysis of thousands of transcripts in a cell.

• By comparing different types of cells, to generate profiles that help to understand healthy cells and pathogenesis.

• To identify downstream targets of oncogenes and tumor suppressor genes.

ADVANTAGES:
• The mRNA sequence does not need to be known in advance, so unknown genes or gene variants can be discovered.

• Accurate: expression is quantified by counting tags.

PROBLEMS IN SAGE
• The gene tag is extremely short (13 or 14 bp), so if the tag is derived from an unknown gene, it is difficult to analyze with such a short sequence.

• The type II restriction enzyme does not yield fragments of the same length.

• mRNA levels and protein expression do not always correlate.

CAP ANALYSIS OF GENE EXPRESSION
• Cap analysis of gene expression (CAGE) is used to produce a snapshot of the 5′ end of the mRNA population in a biological sample (the transcriptome).

• The small fragments from the very beginnings of mRNAs (5′ ends of capped transcripts) are extracted, reverse-transcribed to DNA, PCR-amplified and sequenced.

• Unlike SAGE, in which tags come from other parts of transcripts, CAGE is primarily used to locate exact transcription start sites in the genome. This knowledge in turn allows a researcher to investigate the promoter structure necessary for gene expression.

• Sanger sequencing of EST or cDNA libraries provided information for genome annotation in the early days of genome research.

• Due to limitations on throughput and cost, quantitative transcriptome analysis is not practical using EST methods.

• With SAGE and CAGE, multiple 3′ and 5′ cDNA ends, respectively, were concatenated into one clone.

• However, due to the high cost of Sanger sequencing and the difficulty of mapping the short (~20 bp) sequence tags to the genome, CAGE and SAGE were soon largely replaced by DNA microarrays.

TRANSCRIPTOME ANALYSIS BASED ON MICROARRAYS
• The DNA microarray or chip method is based on nucleic acid hybridization.

• Fluorescently labelled cDNAs are incubated with oligonucleotide probes on the chip, and the abundance of RNA is then determined by measuring fluorescence intensity.

• High-density gene chips allowed relatively low-cost gene expression profiling.

• Specific microarrays were designed according to the purpose of the experiment, such as arrays to detect different isoforms arising from alternative splicing.

[Figure: comparison of microarray and SAGE]

BASIC FLOWCHART OF TRANSCRIPTOME STUDY USING MICROARRAY/DNA CHIP
1. Collect mRNA molecules from a cell.
2. Use reverse transcriptase (RT) enzyme to produce cDNA molecules from the mRNA.
3. Label cDNA with fluorescent dyes.
4. Prepare microarray/DNA chip (cDNA from reference genes or oligonucleotide mixture).
5. Place labeled cDNA on microarray slide.
6. Hybridization of labeled cDNA with (complementary) cDNA on microarray.
7. Scan array slide.
8. Larger mRNA amount in cell (more expression) gives more hybridization, more fluorescence, and more intensity of expression.

• Rapid comparison of two transcriptomes can be achieved by running them simultaneously on identical arrays and checking the hybridization patterns of the two.

[Figure: Microarray and DNA chip]

• For hybridization to the probe nucleic acids on the microarray, a labeled sample nucleic acid is needed.

• RNA extracted from the sample material can be used for synthesis of labeled complementary DNA without amplification of the sample RNA, or for the production of antisense RNA.

• Common amplification procedures utilize bacteriophage T7, T3 or SP6 polymerases to transcribe RNA from DNA templates.

• By including labeled nucleotides in the in vitro transcription reaction, one can incorporate labels into the synthesized RNA.
• In a typical two-colour experiment, RNA is extracted from tumor tissue and neighbouring normal tissue.

• The RNA samples are labelled with different fluorophores: tumor RNA with a red fluorophore, normal RNA with a green one.

• Both samples are hybridized together on the microarray. If a spot then appears red, this means higher expression of that gene in tumor tissue compared to normal tissue.

• If the spot looks green, higher expression in normal tissue compared to tumor tissue has been detected.

• Yellow means equal expression in tumor versus normal tissue.

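
The red/green/yellow reading of a two-colour spot corresponds to the log ratio of the two channel intensities. The sketch below illustrates that idea only: the function name, the intensities and the two-fold threshold (`threshold=1.0` on the log2 scale) are assumptions, not values from the slides.

```python
from math import log2

def classify_spot(red, green, threshold=1.0):
    """Classify one spot from background-corrected intensities.

    red   : signal of the tumor sample (red fluorophore)
    green : signal of the normal sample (green fluorophore)
    Returns (log2 ratio, colour call).
    """
    m = log2(red / green)
    if m >= threshold:
        return m, "red: higher in tumor"
    if m <= -threshold:
        return m, "green: higher in normal"
    return m, "yellow: similar expression"

# Hypothetical spot intensities (red, green).
for red, green in [(800, 200), (100, 400), (500, 500)]:
    print(red, green, classify_spot(red, green))
```

Working on the log scale makes a four-fold increase (+2) and a four-fold decrease (-2) symmetric around zero, which a raw ratio does not.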
[Figure: Microarray/DNA chip after hybridization.] Color intensity shows the level of hybridization. The cDNA prepared from mRNA is first labeled with a fluorescent marker (like Cy3 and Cy5), then hybridized with the array to produce such a pattern.

COMPLICATIONS
• Hybridization analysis will have insufficient specificity to distinguish between every mRNA that could be present.
– Two mRNAs with similar sequences may cross-hybridize with each other’s specific probe on the array.
– Paralogous genes may be active in the same tissue: a group of related mRNAs can hybridize with members of the same gene family.
– Two or more different mRNAs could have been derived from the same gene (alternative splicing).
– Distinguishing which specific mRNA is present, and how much is present, becomes difficult.

[Figure: concept of alternative splicing]

COMPLICATIONS
• When comparing more than one transcriptome, differences in mRNA amount and hybridization intensity must reflect genuine differences in transcript levels rather than experimental error.

• Experimental errors could include:
– Amount of target DNA on the array
– Efficiency with which the probe has been labeled
– Effectiveness of the hybridization process

NORMALIZATION PROCEDURES TO COUNTER EXPERIMENTAL FACTORS
• Enables results from different array experiments to be accurately compared
• Normalization procedure:
– Negative controls, so that background can be determined in each experiment
– Positive controls, which should always give identical signals
• In vertebrates, the actin gene is often used as a positive control
– Its expression level is fairly constant in any particular tissue
– Even in developmental or diseased states

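
A minimal sketch of how negative and positive controls might be used to put two arrays on a common scale is given below. It is one simple scheme chosen for illustration, not a standard microarray normalization method: the function name, the spot names, the intensities and the `reference_level` are all assumptions.

```python
from statistics import mean

def normalize_array(signals, negative_controls, positive_control,
                    reference_level=1000.0):
    """Background-correct one array and rescale it to a positive control.

    signals           : dict mapping spot name -> raw intensity
    negative_controls : spot names used to estimate background
    positive_control  : spot expected to give the same signal on every array
    """
    background = mean(signals[name] for name in negative_controls)
    corrected = {g: max(v - background, 0.0) for g, v in signals.items()}
    if corrected[positive_control] == 0:
        raise ValueError("positive control is not above background")
    scale = reference_level / corrected[positive_control]
    return {g: v * scale for g, v in corrected.items()
            if g not in negative_controls}

# Two hypothetical arrays; the second was labeled and scanned about twice as brightly.
array1 = {"neg1": 50, "neg2": 50, "actin": 1050, "geneX": 550}
array2 = {"neg1": 100, "neg2": 100, "actin": 2100, "geneX": 1100}
print(normalize_array(array1, ["neg1", "neg2"], "actin"))
print(normalize_array(array2, ["neg1", "neg2"], "actin"))
```

After normalization both arrays report the same value for `geneX`, so the apparent two-fold difference in the raw data is attributed to the experiment rather than to the transcript.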
APPLICATIONS
• Studying the transcriptome can lead to various applications:

– Transcriptomes of stem cells and cancer cells can be studied to understand cellular differentiation and carcinogenesis

– Transcriptomes of human oocytes and embryos can be studied to understand molecular mechanisms and signaling pathways in embryonic development

– Used in biomarker discovery


MICROARRAY APPLICATIONS IN CANCER PATHOGENESIS AND DIAGNOSIS
• Can be used to elucidate the mechanisms of tumorigenesis and metastasis, particularly to study the complexity of the underlying processes.

• Cancer classification based on microarray studies aims at identifying characteristics beyond anatomical site and histopathology.

• There are three types of microarray-based approaches:

(i) class comparison,

(ii) class discovery, and

(iii) class prediction.

• Class comparison: used to compare the expression profiles of two (or more) predefined classes.

• Class discovery: used to identify novel subtypes within an apparently homogeneous population. The starting point is usually a homogeneous group of specimens in which a concealed proportion behaves aberrantly or exhibits invisible or unknown features. The problem of cancer treatment falls under this category: patients who are stratified into treatment groups according to standard histopathological criteria often respond differently to therapy.

• Class prediction: finding a set of features that are predictive for a certain, predefined class. It is usually based on class discovery, followed by the establishment of a classifier.

• A classifier is a set of features, such as genes, proteins or microRNAs, that are surrogate markers for a certain class.

• This is the common approach to identify predictive gene sets or gene signatures that can predict clinical outcome or therapy response.
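
As a toy illustration of class prediction, the sketch below builds a nearest-centroid classifier from a two-gene signature. The choice of a nearest-centroid rule, the class labels and the expression values are assumptions made for the example; the slides do not name a specific classification method.

```python
from math import dist
from statistics import mean

def train_centroids(profiles, labels):
    """Average the signature-gene expression profile within each class."""
    centroids = {}
    for label in set(labels):
        members = [p for p, l in zip(profiles, labels) if l == label]
        centroids[label] = [mean(column) for column in zip(*members)]
    return centroids

def predict(centroids, profile):
    """Assign a new profile to the class with the closest centroid."""
    return min(centroids, key=lambda label: dist(centroids[label], profile))

# Hypothetical training data: expression of two signature genes per patient.
profiles = [[8.0, 1.0], [9.0, 2.0], [1.0, 7.0], [2.0, 9.0]]
labels = ["responder", "responder", "non-responder", "non-responder"]

centroids = train_centroids(profiles, labels)
print(predict(centroids, [7.5, 2.0]))
print(predict(centroids, [2.0, 8.5]))
```

A real gene signature would need to be validated on an independent cohort before its predictions could be trusted.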
RNA-SEQ: SEQUENCING APPROACHES TO STUDY THE TRANSCRIPTOME
• RNA-Seq transcriptomics replaces the hybridization of nucleotide probes with sequencing of individual cDNAs produced from the target RNA.
• It has the potential to overcome the limitations of microarray technology.

• The prototype of NGS is massively parallel signature sequencing (MPSS), which applies four rounds of restriction enzyme digestion and ligation reactions to determine the nucleotide sequence of cDNA ends, generating a 17–20 bp sequence as the fingerprint of the corresponding RNA.

• However, due to the nature of the digestion and ligation reactions, a large fraction of the sequence signatures obtained are not long enough to be unique fingerprints of RNA molecules.

• High-throughput sequencing, also called next-generation sequencing (NGS), has the capacity to sequence full genomes. By comparison, bacteriophage φX174, the first DNA genome to be sequenced, is a viral genome of only 5,386 nucleotides.
[Figure: RNA-Seq workflow.] The mRNA is extracted from the organism, fragmented and copied into stable ds-cDNA (shown in blue). The ds-cDNA is sequenced using high-throughput, short-read sequencing methods. These sequences can then be aligned to a reference genome sequence to reconstruct which genome regions were being transcribed.

APPLICATION
• Disease diagnosis.

• RNA-Seq is an approach to discover, profile and quantify RNA transcripts across the whole transcriptome. It does not use predesigned probes or primers, so it is an unbiased, hypothesis-free approach.

• RNA SEQUENCING IS USED WIDELY FOR:
– Studies of gene expression and discovery of novel transcripts and isoforms
– SNP discovery in the coding portions of the genome
– De novo transcriptome assembly
– Expression quantification
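
Expression quantification from RNA-Seq starts from the number of reads aligned to each gene. One common way to make those counts comparable is transcripts per million (TPM), which the slides do not name; the sketch below assumes TPM as the measure, and the gene names, read counts and gene lengths are hypothetical.

```python
def tpm(read_counts, lengths_bp):
    """Convert per-gene read counts to transcripts per million (TPM).

    read_counts : dict mapping gene -> number of aligned reads
    lengths_bp  : dict mapping gene -> transcript length in base pairs
    """
    # Reads per base: longer transcripts yield more fragments, so divide by length.
    rates = {g: read_counts[g] / lengths_bp[g] for g in read_counts}
    total = sum(rates.values())
    # Rescale so that the values of one sample sum to one million.
    return {g: rate / total * 1e6 for g, rate in rates.items()}

counts = {"geneA": 100, "geneB": 200}
lengths = {"geneA": 1000, "geneB": 4000}
print(tpm(counts, lengths))
```

In this example `geneB` has twice as many reads as `geneA` but is four times longer, so its TPM is only half that of `geneA`.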
[Figure: comparison of microarray and RNA-Seq]

SUMMARY
• Microarray-based analysis of gene expression permits assessment of normal mRNA levels and disease-related alterations on a genome-wide scale.

• Rigorous statistical assessment of microarray data and the use of sufficiently high numbers of specimens are obligatory due to the inherent problem of multiple testing.
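
The multiple-testing problem arises because thousands of genes are tested at once, so some small p-values are expected by chance alone. One widely used remedy, chosen here for illustration and not named in the slides, is the Benjamini-Hochberg procedure for controlling the false discovery rate; the p-values below are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])   # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):          # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min         # enforce monotone adjusted values
    return adjusted

# Hypothetical per-gene p-values from a class comparison.
p = [0.01, 0.04, 0.03, 0.20]
print(benjamini_hochberg(p))
```

Genes whose adjusted value falls below the chosen false discovery rate (for example 0.05) would be reported as differentially expressed.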

• In molecular pathology, gene expression patterns (often called gene signatures, molecular portraits, or gene profiles) were identified that correlate with disease state or clinical outcome. This enables the nomination of candidate genes that may play a regulatory role in the processes under study.

• Disease entities indistinguishable by histopathological criteria were defined by expression profiling as well.

• For example, gene signatures related to lymphoma subtypes and to malignant breast cancer were identified. A number of signatures related to prediction of prognosis and response to therapy are already in clinical use.
• Expression profiling essentially generates correlative information only. In particular, gene signatures diagnostic of disease prognosis and therapy response have to be thoroughly validated through unbiased analysis of large cohorts of specimens.

• In systems-based analysis, microarray profiling (together with other “-omics” technologies) has become an indispensable tool for assessing the impact of pharmacological or genetic perturbations on the transcriptome.
