Pages From SAMv1-15

5 Indexing BAM

Indexing aims to achieve fast retrieval of alignments overlapping a specified region without scanning all of the alignments. A BAM file must be sorted by reference ID and then by leftmost coordinate before indexing.

5.1 Algorithm

5.1.1 Basic binning index

The UCSC binning scheme was suggested by Richard Durbin and Lincoln Stein and is explained by Kent et al. (2002). In this scheme, each bin represents a contiguous genomic region which is either fully contained in or non-overlapping with another bin; each alignment is associated with the bin that represents the smallest region containing the entire alignment. The binning scheme is essentially a representation of an R-tree. A distinct bin uniquely corresponds to a distinct internal node in an R-tree. Bin A is a child of Bin B if the region represented by A is contained in B.

To find the alignments that overlap a specified region, we need to get the bins that overlap the region, and then test each alignment in those bins to check for overlap. To quickly find alignments associated with a specified bin, we can keep in the index the start file offsets of chunks of alignments which all have the bin. As alignments are sorted by leftmost coordinate, alignments having the same bin tend to be clustered together on disk, so a bin is usually associated with only a few chunks. Traversing all the alignments having the same bin therefore usually needs only a few seek calls. Given the set of bins that overlap the specified region, we can visit alignments in the order of their leftmost coordinates and stop seeking once an alignment falls outside the required region. This strategy saves half of the seek calls on average.

In BAM, each bin may span 2^29, 2^26, 2^23, 2^20, 2^17 or 2^14 bp (footnote 3). Bin 0 spans a 512Mbp region, bins 1-8 span 64Mbp, 9-72 8Mbp, 73-584 1Mbp, 585-4680 128Kbp and bins 4681-37448 span 16Kbp regions.

5.1.2 Reducing small chunks
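Before turning to chunk joining, the bin layout of Section 5.1.1 can be made concrete with a minimal Python sketch. It assumes 0-based half-open intervals [beg, end) and derives everything from the bin sizes and first bin numbers listed above; the names `LEVELS`, `region_to_bin` and `bins_overlapping` are illustrative and are not taken from the specification.

```python
# (shift, first bin number) for each level, smallest bins first.
# 2**14 bp bins start at bin 4681, 2**17 at 585, 2**20 at 73,
# 2**23 at 9 and 2**26 at 1; bin 0 covers the whole 2**29 bp range.
LEVELS = [(14, 4681), (17, 585), (20, 73), (23, 9), (26, 1)]


def region_to_bin(beg, end):
    """Return the smallest bin that fully contains [beg, end)."""
    last = end - 1  # last base covered by the interval
    for shift, first_bin in LEVELS:
        # Both ends fall into the same bin at this level.
        if beg >> shift == last >> shift:
            return first_bin + (beg >> shift)
    return 0


def bins_overlapping(beg, end):
    """Return every bin whose region overlaps [beg, end)."""
    last = end - 1
    bins = [0]  # bin 0 overlaps every region
    for shift, first_bin in reversed(LEVELS):
        lo = first_bin + (beg >> shift)
        hi = first_bin + (last >> shift)
        bins.extend(range(lo, hi + 1))
    return bins


print(region_to_bin(0, 100))          # fits in the first 16Kbp bin
print(region_to_bin(16383, 16385))    # straddles two 16Kbp bins
print(bins_overlapping(0, 1))
```

An alignment is stored under `region_to_bin`, while a query walks the list returned by `bins_overlapping` and tests the alignments in each of those bins.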

Around the boundary of two adjacent bins, we may see many small chunks, some belonging to a smaller bin and the rest to a larger bin. To reduce the number of seek calls, we may join two chunks having the same bin if they are close to each other. After this process, a joined chunk will contain alignments with different bins. We need to keep in the index the file offset of the end of each chunk to identify its boundaries.

5.1.3 Combining with linear index
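Before moving on to the linear index, the chunk-joining step described above can be sketched as follows. This is only an illustration under stated assumptions: chunks are modelled as (start, end) pairs of plain integer file offsets belonging to one bin, and `max_gap` is a hypothetical threshold, since the text does not say how close two chunks must be to be joined.

```python
def join_chunks(chunks, max_gap):
    """Join (start, end) file-offset chunks of one bin that lie close together.

    max_gap is a hypothetical tuning parameter: two chunks are joined when
    the gap between them is at most max_gap.
    """
    merged = []
    for start, end in sorted(chunks):
        if merged and start - merged[-1][1] <= max_gap:
            # Close enough to the previous chunk: extend it.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


print(join_chunks([(0, 100), (120, 200), (5000, 6000)], max_gap=64))
```

The joined chunk (0, 200) also covers the bytes between offsets 100 and 200 that belonged to other bins, which is why each chunk must record its end offset as well as its start.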

For an alignment starting beyond 64Mbp, we always need to seek to some chunks in bin 0, which can be avoided by using a linear index. In the linear index, for each tiling 16384bp window on the reference, we record the smallest file offset of the alignments that start in the window. Given a region [rbeg,rend), we only need to visit a chunk whose end file offset is larger than the file offset of the 16kbp window containing rbeg. With both binning and linear indices, we can retrieve alignments in most regions with just one seek call.

5.1.4 A conceptual example
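Before the conceptual example, the linear-index filter just described can be sketched in Python. The sketch assumes alignments are given as (start position, file offset) pairs with plain integer offsets; `build_linear_index` and `chunks_to_visit` are illustrative names. Visiting every chunk when the window holds no alignment start is a conservative assumption of this sketch, not something the text prescribes.

```python
WINDOW_SHIFT = 14  # 2**14 = 16384 bp per tiling window


def build_linear_index(alignments, ref_len):
    """Smallest file offset of the alignments starting in each window.

    alignments: iterable of (start_pos, file_offset) pairs.
    Windows in which no alignment starts are left as None.
    """
    n_windows = (ref_len + (1 << WINDOW_SHIFT) - 1) >> WINDOW_SHIFT
    index = [None] * n_windows
    for pos, offset in alignments:
        w = pos >> WINDOW_SHIFT
        if index[w] is None or offset < index[w]:
            index[w] = offset
    return index


def chunks_to_visit(chunks, linear_index, rbeg):
    """Keep only chunks ending after the offset of the window holding rbeg."""
    min_offset = linear_index[rbeg >> WINDOW_SHIFT]
    if min_offset is None:
        # Assumption: with no recorded offset, skip nothing.
        return list(chunks)
    return [(start, end) for start, end in chunks if end > min_offset]


alignments = [(100, 10), (20000, 500), (20500, 520), (70000, 900)]
index = build_linear_index(alignments, ref_len=100000)
print(index)
print(chunks_to_visit([(0, 400), (450, 950)], index, rbeg=20000))
```

In the usage lines, the chunk ending at offset 400 is skipped because every alignment starting in the window that contains position 20000 lies at offset 500 or later.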

Suppose we have a genome shorter than 144kbp. We can design a binning scheme which consists of three types of bins: bin 0 spans 0-144kbp, bins 1, 2 and 3 span 48kbp each and bins 4 to 12 span 16kbp each. An alignment starting at 65kbp and ending at 67kbp would have bin number 8, which is the smallest bin containing the alignment. Similarly, an alignment starting at 51kbp and ending at 70kbp
3 Due to a limitation in the current indexing scheme, a chromosome sequence longer than 2^29 - 1 is not supported during indexing.
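The three-level scheme of the conceptual example in Section 5.1.4 is small enough to check by hand or with a short Python sketch. The function name `toy_bin` is illustrative, intervals are treated as half-open [beg, end), and 1 kbp is assumed to mean 1000 bp. The second call uses the 51-70kbp interval from the example; the value it returns is what this sketch computes, not a statement quoted from the text.

```python
KBP = 1000  # assumption: 1 kbp = 1000 bp in this toy example


def toy_bin(beg, end):
    """Smallest bin containing [beg, end) in the three-level toy scheme.

    Bin 0 spans 0-144kbp, bins 1-3 span 48kbp each,
    and bins 4-12 span 16kbp each.
    """
    last = end - 1
    if beg // (16 * KBP) == last // (16 * KBP):
        return 4 + beg // (16 * KBP)
    if beg // (48 * KBP) == last // (48 * KBP):
        return 1 + beg // (48 * KBP)
    return 0


print(toy_bin(65 * KBP, 67 * KBP))  # 8, as stated in the text
print(toy_bin(51 * KBP, 70 * KBP))  # crosses the 64kbp boundary
print(toy_bin(40 * KBP, 49 * KBP))  # crosses the 48kbp boundary
```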
