In this homework, we will implement the algorithm for sequence alignment with gap opening and extension. We will then perform a series of pairwise alignments of various protein sequences, and compare the results obtained by the two techniques we discussed in class: sequence alignment with fixed gap penalties, and sequence alignment with gap opening and extension. To get started:
Download the sequences of the following six proteins (listed by PDB ID):
1) 1MBC   2) 1LHS   3) 1MYT
4) 1YMC   5) 1MBA   6) 1LH1
Also, download the file blosum50.txt, which contains the Blosum50 scoring matrix.
Note that the alignSequences() function is somewhat different from the one
shown in class. The modifications were introduced to clean up the code and improve performance.
Please see the comments inside alignSequences.m for more details.
Use alignSequences() and the lecture notes on gap opening and extension in order to complete the implementation of the alignGapOpenExt() function.
Use the alignGapOpenExt() function to find all pairwise sequence alignment
scores, using the Blosum50 matrix, a gap opening score of -10, and a gap extension
score of -1. Generate and print out the 6x6 matrix of pairwise alignment scores. How is this
matrix different from the one obtained in part 3?
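The 6x6 score matrix can be assembled with a double loop over sequence pairs. The following is a minimal sketch in Python, not the course's MATLAB code. The names pairwise_score_matrix, format_matrix, and score_fn are hypothetical; score_fn stands in for whichever alignment routine returns a single score for two sequences. The sketch assumes the score is symmetric in its two arguments, so each pair is aligned only once and the result is mirrored.

```python
def pairwise_score_matrix(seqs, score_fn):
    """Return an n x n list-of-lists of alignment scores.

    score_fn(a, b) is a placeholder for the alignment routine; it is
    assumed (not guaranteed by the assignment) to be symmetric.
    """
    n = len(seqs)
    scores = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):          # upper triangle, including the diagonal
            s = score_fn(seqs[i], seqs[j])
            scores[i][j] = s
            scores[j][i] = s           # mirror into the lower triangle
    return scores


def format_matrix(labels, scores):
    """Format the matrix as right-aligned text with row and column labels."""
    width = max(len(str(v)) for row in scores for v in row)
    width = max(width, max(len(lab) for lab in labels)) + 2
    header = " " * width + "".join(lab.rjust(width) for lab in labels)
    rows = [
        lab.rjust(width) + "".join(str(v).rjust(width) for v in row)
        for lab, row in zip(labels, scores)
    ]
    return "\n".join([header] + rows)


if __name__ == "__main__":
    # Toy strings and a toy score (count of identical positions),
    # used only to exercise the loop; these are not the PDB sequences.
    toy_seqs = ["AAA", "AAB", "ABB"]
    toy_score = lambda a, b: sum(x == y for x, y in zip(a, b))
    matrix = pairwise_score_matrix(toy_seqs, toy_score)
    print(format_matrix(["s1", "s2", "s3"], matrix))
```

Running the same loop once with each alignment function gives two matrices that can be compared entry by entry.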
Print out the alignments of 1MBC vs. 1YMC, obtained using both alignment
techniques. Likewise, print out the alignments of 1LHS vs. 1LH1.
If your gap opening and extension code works correctly, you will notice that the two alignments
of 1MBC vs. 1YMC are very similar, both in aligned sequences and
in score. However, the two alignments of 1LHS vs. 1LH1 are quite
different from one another: the alignment obtained using the gap opening and extension method
is considerably longer, scores higher, and contains more gaps. Explain this outcome.
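When comparing the two sets of alignments, it can help to rescore a finished alignment by hand under each penalty scheme. The following Python sketch (again, not the course's MATLAB code) does this for two gapped strings of equal length. Several things here are assumptions rather than part of the assignment. The function name score_alignment and the toy_sub scoring function are hypothetical. The first position of a gap is charged the opening score only, and each further position the extension score; the lecture notes may use a different convention. The fixed-penalty value of -8 in the example is a placeholder, since the handout does not state one.

```python
def score_alignment(a, b, sub, gap_open=-10, gap_extend=-1):
    """Score an existing alignment given as two equal-length gapped strings.

    '-' marks a gap. Assumed convention: the first column of a gap costs
    gap_open, every following column of the same gap costs gap_extend.
    Setting gap_open == gap_extend reproduces a fixed per-position penalty.
    """
    if len(a) != len(b):
        raise ValueError("aligned strings must have equal length")
    total = 0
    prev_gap = None  # which string ('a' or 'b') had a gap in the previous column
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            raise ValueError("a column cannot contain two gaps")
        if x == "-" or y == "-":
            side = "a" if x == "-" else "b"
            total += gap_extend if side == prev_gap else gap_open
            prev_gap = side
        else:
            total += sub(x, y)
            prev_gap = None
    return total


def toy_sub(x, y):
    """Hypothetical substitution score (NOT Blosum50): +5 match, -4 mismatch."""
    return 5 if x == y else -4


if __name__ == "__main__":
    top, bottom = "ACDEFG", "AC---G"   # one gap of length 3
    print(score_alignment(top, bottom, toy_sub))           # opening/extension scheme
    print(score_alignment(top, bottom, toy_sub, -8, -8))   # fixed penalty (placeholder value)
```

Replacing toy_sub with a lookup into the Blosum50 matrix lets the same function rescore the printed alignments of 1MBC vs. 1YMC and 1LHS vs. 1LH1.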
Submit your implementation of the alignGapOpenExt() function.
For part (6), submit your explanation. You do not need to submit a printout of your sequence
alignments, though feel free to do so.