Pooja Anshul Saxena Engr 692: Special Topics - Computational Biology

Pooja Anshul Saxena Engr 692: Special Topics Computational Biology
Global Sequence Alignment

The NeedlemanWunsch algorithm performs a global alignment on two sequences It is an example of dynamic programming, and was the first application of dynamic programming to biological sequence comparison Suitable when the two sequences are of similar length, with a significant degree of similarity throughout Aim: The best alignment over the entire length of two sequences
Three steps in NeedlemanWunsch Algorithm

Initialization Scoring Trace back (Alignment) Consider the two DNA sequences to be globally aligned are: ATCG (x=4, length of sequence 1) TCG (y=3, length of sequence 2)
Scoring Scheme
Match Score = +1 Mismatch Score = -1 Gap penalty = -1 Substitution Matrix
A A C G T 1 -1 -1 -1 C -1 1 -1 -1 G -1 -1 1 -1 T -1 -1 -1 1
Initialization Step
Create a matrix with X +1 Rows and Y +1 Columns The 1st row and the 1st column of the score matrix are filled as multiple of gap penalty
T 0 A T C G -1 -2 -3 -4 -1 C -2 G -3
Scoring
The score of any cell C(i, j) is the maximum of: scorediag = C(i-1, j-1) + S(I, j) scoreup = C(i-1, j) + g scoreleft = C(i, j-1) + g where S(I, j) is the substitution score for letters i and j, and g is the gap penalty
Scoring .
Example: The calculation for the cell C(2, 2): scorediag = C(i-1, j-1) + S(I, j) = 0 + -1 = -1 scoreup = C(i-1, j) + g = -1 + -1 = -2 scoreleft = C(i, j-1) + g = -1 + -1 = -2
T 0 A T C G -1 -2 -3 -4 -1 -1 C -2 G -3
Scoring .
Final Scoring Matrix

T 0 A T C G -1 -2 -3 -4 -1 -1 0 -1 -2 C -2 -2 -1 1 0 G -3 -3 -2 0 2
Trace back
The trace back step determines the actual alignment(s) that result in the maximum score There are likely to be multiple maximal alignments Trace back starts from the last cell, i.e. position X, Y in the matrix Gives alignment in reverse order
Trace back .
There are three possible moves: diagonally (toward the top-left corner of the matrix), up, or left Trace back takes the current cell and looks to the neighbor cells that could be direct predecessors. This means it looks to the neighbor to the left (gap in sequence #2), the diagonal neighbor (match/mismatch), and the neighbor above it (gap in sequence #1). The algorithm for trace back chooses as the next cell in the sequence one of the possible predecessors
Trace back .
T 0 A T C G
C -2 -2 -1 1 0
G -3 -3 -2 0 2
-1 -1 0 -1 -2
-1 -2 -3 -4
The only possible predecessor is the diagonal match/mismatch neighbor. If more than one possible predecessor exists, any can be chosen. This gives us a current alignment of Seq 1: G | Seq 2: G
Trace back .
Final Trace back

T
0 A T C G -1 -2 -3 -4 -1 -1 0 -1 -2
C
-2 -2 -1 1 0
G
-3 -3 -2 0 2
Best Alignment: ATC G | | | | _TCG
Local Sequence Alignment

The Smith-Waterman algorithm performs a local alignment on two sequences It is an example of dynamic programming Useful for dissimilar sequences that are suspected to contain regions of similarity or similar sequence motifs within their larger sequence context Aim: The best alignment over the conserved domain of two sequences
Differences in NeedlemanWunsch and Smith-Waterman Algorithms:
In the initialization stage, the first row and first column are all filled in with 0s While filling the matrix, if a score becomes negative, put in 0 instead In the traceback, start with the cell that has the highest score and work back until a cell with a score of 0 is reached.
Three steps in Smith-Waterman Algorithm

Initialization Scoring Trace back (Alignment) Consider the two DNA sequences to be globally aligned are: ATCG (x=4, length of sequence 1) TCG (y=3, length of sequence 2)
Scoring Scheme
Match Score = +1 Mismatch Score = -1 Gap penalty = -1 Substitution Matrix
A A C G T 1 -1 -1 -1 C -1 1 -1 -1 G -1 -1 1 -1 T -1 -1 -1 1
Initialization Step
Create a matrix with X +1 Rows and Y +1 Columns The 1st row and the 1st column of the score matrix are filled with 0s
T 0 A T C G 0 0 0 0 0 C 0 G 0
Scoring
The score of any cell C(i, j) is the maximum of: scorediag = C(i-1, j-1) + S(I, j) scoreup = C(i-1, j) + g scoreleft = C(i, j-1) + g And 0 (here S(I, j) is the substitution score for letters i and j, and g is the gap penalty)
Scoring .
Example: The calculation for the cell C(2, 2): scorediag = C(i-1, j-1) + S(I, j) = 0 + -1 = -1 scoreup = C(i-1, j) + g = 0 + -1 = -1 scoreleft = C(i, j-1) + g = 0 + -1 = -1
T 0 A T C G 0 0 0 0 0 0 C 0 G 0
Scoring .
Final Scoring Matrix

T 0 A T C G 0 0 0 0 0 0 1 0 0 C 0 0 0 2 1 G 0 0 0 1 3
Note: It is not mandatory that the last cell has the maximum alignment score!
Trace back
The trace back step determines the actual alignment(s) that result in the maximum score There are likely to be multiple maximal alignments Trace back starts from the cell with maximum value in the matrix Gives alignment in reverse order
Trace back .
There are three possible moves: diagonally (toward the top-left corner of the matrix), up, or left Trace back takes the current cell and looks to the neighbor cells that could be direct predecessors. This means it looks to the neighbor to the left (gap in sequence #2), the diagonal neighbor (match/mismatch), and the neighbor above it (gap in sequence #1). The algorithm for trace back chooses as the next cell in the sequence one of the possible predecessors. This continues till cell with value 0 is reached.
Trace back .
T 0 A T C G
C 0 0 0 2 1
G 0 0 0 1 3
0 0 1 0 0
0 0 0 0
The only possible predecessor is the diagonal match/mismatch neighbor. If more than one possible predecessor exists, any can be chosen. This gives us a current alignment of Seq 1: G | Seq 2: G
Trace back .
Final Trace back

T
0 A T C G 0 0 0 0 0 0 1 0 0
C
0 0 0 2 1
G
0 0 0 1 3
Best Alignment: TCG | | | TCG

Pooja Anshul Saxena Engr 692: Special Topics - Computational Biology

Caricato da

Informazioni sul documento

Descrizione originale:

Titolo originale

Copyright

Formati disponibili

Condividi questo documento

Condividi o incorpora il documento

Opzioni di condivisione

Hai trovato utile questo documento?

Questo contenuto è inappropriato?

Copyright:

Formati disponibili

Pooja Anshul Saxena Engr 692: Special Topics - Computational Biology

Caricato da

Copyright:

Formati disponibili

Pooja Anshul Saxena Engr 692: Special Topics Computational Biology

Global Sequence Alignment

Three steps in NeedlemanWunsch Algorithm

Final Scoring Matrix

Final Trace back

Best Alignment: ATC G | | | | _TCG

Local Sequence Alignment

Differences in NeedlemanWunsch and Smith-Waterman Algorithms:

Three steps in Smith-Waterman Algorithm

Final Scoring Matrix

Final Trace back

Best Alignment: TCG | | | TCG

Potrebbero piacerti anche