[Figure: generalized NGS analysis workflow: Question → Raw reads → Pre-processing → Alignment / de novo Assembly → Application specific (variant calling, count matrix, ...) → Compare samples / methods → Answer?]
36626 - Next Generation Sequencing Analysis
Generalized NGS analysis
Main data reductive steps
[Figure: NGS workflow: Question → Sample prep & Sequencing → Raw reads → Pre-processing → Alignment / de novo Assembly → Application specific (variant calling, count matrix, ...) → Compare samples / methods → Answer?, annotated with data size, which is reduced from the raw reads down to SNPs, genes, regions]
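As a toy illustration of how the application-specific step reduces data, the sketch below collapses per-read gene assignments into a count matrix (genes × samples). It assumes reads have already been aligned and assigned to genes; the input format (one (sample, gene) pair per read) and the function name `count_matrix` are hypothetical.

```python
from collections import Counter

def count_matrix(assignments):
    """Collapse (sample, gene) read assignments into a gene x sample count table.

    Hypothetical input: one (sample, gene) tuple per aligned read.
    Returns (genes, samples, matrix), where matrix[i][j] is the number of
    reads from samples[j] assigned to genes[i].
    """
    counts = Counter(assignments)          # reads per (sample, gene) pair
    genes = sorted({g for _, g in assignments})
    samples = sorted({s for s, _ in assignments})
    # Counter returns 0 for pairs that were never observed
    matrix = [[counts[(s, g)] for s in samples] for g in genes]
    return genes, samples, matrix

# Six reads are reduced to a 2 x 2 table of counts.
reads = [("s1", "geneA"), ("s1", "geneA"), ("s1", "geneB"),
         ("s2", "geneA"), ("s2", "geneB"), ("s2", "geneB")]
genes, samples, matrix = count_matrix(reads)
print(genes, samples, matrix)
```

In real data the reduction is far larger: millions of reads per sample end up as one integer per gene.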
What is sequence data?
Sequences are stored in FASTA files: a header line starting with ">", followed by the sequence.
Header:
>gi|218693476|ref|NC_011748.1| Escherichia coli 55989 chromosome, complete genome
Sequence:
GTAAGTATTTTTCAGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGT
GTCTGATAGCAGCTTCTGAACTGGTTACCTGCCGTGAGTAAATTAAAATTTTATTGACTTAGGTCACTAA
ATACTTTAACCAATATAGGCATAGCGCACAGACAGATAAAAATTACAGAGTACACAACATCCATGAAACG
CATTAGCACCACCATTACCACCACCATCACCATTACCACAGGTAACGGTGCGGGCTGACGCGTACAGGAA
ACACAGAAAAAAGCCCGCACCTGACAGTGCGGGCTTTTTTTTCGACCAAAGGTAACGAGGTAACAACCAT
GCGAGTGTTGAAGTTCGGCGGTACATCAGTGGCAAATGCAGAACGTTTTCTGCGTGTTGCCGATATTCTG
GAAAGCAATGCCAGGCAGGGGCAGGTGGCCACCGTCCTCTCTGCCCCCGCCAAAATCACCAACCACCTGG
TGGCGATGATTGAAAAAACCATTAGCGGCCAGGATGCTTTACCCAATATCAGCGATGCCGAACGTATTTT
TGCCGAACTTTTGACGGGACTCGCCGCCGCCCAGCCGGGGTTCCCGCTGGCGCAATTGAAAACTTTCGTC
GATCAGGAATTTGCCCAAATAAAACATGTCCTGCATGGCATTAGTTTGTTGGGGCAGTGCCCGGATAGCA
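A minimal FASTA reader, assuming the layout shown above: a header line starting with ">" followed by one or more sequence lines, which are joined into one string. `read_fasta` is a hypothetical helper; for real work a library such as Biopython's `Bio.SeqIO` is the usual choice.

```python
import io

def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)  # emit the previous record
            header, chunks = line[1:], []      # drop the leading '>'
        else:
            chunks.append(line)  # sequence may be wrapped over many lines
    if header is not None:
        yield header, "".join(chunks)  # emit the last record

example = io.StringIO(
    ">gi|218693476|ref|NC_011748.1| Escherichia coli 55989 chromosome, complete genome\n"
    "GTAAGTATTTTTCAGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGT\n"
    "GTCTGATAGCAGCTTCTGAACTGGTTACCTGCCGTGAGTAAATTAAAATTTTATTGACTTAGGTCACTAA\n"
)
records = list(read_fasta(example))
header, seq = records[0]
print(header.split("|")[3], seq[:10])
```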
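Reads are stored in FASTQ files, four lines per read: a header starting with "@", the sequence, a "+" separator, and one quality character per base.
@ILLUMINA-C90280_0030_FC:5:1:2675:1090#NNNNNN/1
ATTCCCGGCCTTTTTCCAGGCCTGCCTGCTCGAGC    (sequence)
+
BAAAGECEE<EEDFEDF3DBDBB=A+==>9>>88?    (qualities: encoded probability that each base call is wrong)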
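A minimal FASTQ reader, assuming the strict four-line layout shown above (header starting with "@", sequence, "+" separator, quality string of the same length) with no line wrapping. `read_fastq` is a hypothetical helper, not a library function.

```python
import io
from itertools import islice

def read_fastq(handle):
    """Yield (read_id, sequence, qualities) from a four-line-per-record FASTQ stream."""
    while True:
        record = [line.rstrip("\n") for line in islice(handle, 4)]
        if not record:
            return  # end of file
        if len(record) < 4:
            raise ValueError("truncated FASTQ record")
        header, seq, plus, qual = record
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed FASTQ record")
        if len(seq) != len(qual):
            raise ValueError("sequence and quality lengths differ")
        yield header[1:], seq, qual  # drop the leading '@'

example = io.StringIO(
    "@ILLUMINA-C90280_0030_FC:5:1:2675:1090#NNNNNN/1\n"
    "ATTCCCGGCCTTTTTCCAGGCCTGCCTGCTCGAGC\n"
    "+\n"
    "BAAAGECEE<EEDFEDF3DBDBB=A+==>9>>88?\n"
)
read_id, seq, qual = next(read_fastq(example))
print(read_id, len(seq), len(qual))
```

The length check matters because every base must have exactly one quality character.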
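Each quality character is stored as its ASCII code: B A A → 66 65 65
Q ~ Prob (that the base call is wrong)
10 ~ 0.1
20 ~ 0.01
30 ~ 0.001
40 ~ 0.0001
Read directly as Q values, 66 65 65 would imply error probabilities around 1e-6 or below; in practice an encoding offset (33 or 64, depending on the FASTQ variant) is first subtracted from the ASCII code.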
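The Q ~ Prob pairs above follow the Phred scale, where each increase of 10 in Q is a tenfold drop in the probability P that the base call is wrong:

```latex
Q = -10\,\log_{10} P
\qquad\Longleftrightarrow\qquad
P = 10^{-Q/10},
\qquad\text{e.g. } Q = 20 \;\Rightarrow\; P = 10^{-2} = 0.01
```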
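Decoding a quality string takes two steps: subtract the ASCII offset to get Q, then convert Q to an error probability. The sketch assumes the Phred+33 offset. The example string above appears to use Phred+33, since it contains characters below ASCII 64, such as "+" (43) and "3" (51), which would give negative Q under Phred+64. `decode_qualities` is a hypothetical helper.

```python
def decode_qualities(qual, offset=33):
    """Return (Q, error probability) for each character of a FASTQ quality string.

    offset=33 assumes Phred+33 encoding; some older Illumina files use 64.
    """
    result = []
    for char in qual:
        q = ord(char) - offset  # ASCII code minus encoding offset
        if q < 0:
            raise ValueError(f"character {char!r} is invalid for offset {offset}")
        result.append((q, 10 ** (-q / 10)))  # Phred: P = 10^(-Q/10)
    return result

decoded = decode_qualities("BAAAGECEE<EEDFEDF3DBDBB=A+==>9>>88?")
for q, p in decoded[:3]:
    print(q, f"{p:.1e}")  # prints 33 5.0e-04, then 32 6.3e-04 twice
```

Passing `offset=64` for this string raises an error on "+", which is one practical way to detect a wrong encoding guess.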
Exercise: http://www.cbs.dtu.dk/courses/27626/Exercises/Data_basics_exercise.php