Homework 4: (Due Thursday, 11/01)

In this homework, we will implement the algorithm for sequence alignment with gap opening and extension. We will then perform a series of pairwise alignments of various protein sequences, and compare the results obtained by the two techniques we discussed in class: sequence alignment with fixed gap penalties, and sequence alignment with gap opening and extension. To get started:
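
For reference, the two techniques differ only in how a run of g consecutive gap positions is scored. The sketch below assumes a common convention: the opening score covers the first position of a gap, and the extension score covers each additional position. The symbols d, d_o and e are introduced here for illustration only, so check the convention against the lecture notes.

```latex
\begin{align*}
\gamma_{\mathrm{fixed}}(g)    &= g\,d,           & d   &= -10,\\
\gamma_{\mathrm{open/ext}}(g) &= d_o + (g-1)\,e, & d_o &= -10,\; e = -1.
\end{align*}
```

For example, a single gap of length g = 5 scores 5(-10) = -50 with the fixed penalty, but -10 + 4(-1) = -14 with gap opening and extension. For g = 1, both schemes give -10.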

  1. Download the following proteins from the Protein Data Bank (PDB):
           1) 1MBC        3) 1MYT      5) 1MBA      
           2) 1LHS        4) 1YMC      6) 1LH1
    
    Also, download the file blosum50.txt, which contains the BLOSUM50 scoring matrix.

  2. Download the following MATLAB files from the CS321 website:
    readScoreMatrix.m: reads in the BLOSUM50 scoring matrix file
    pickSequence.m: extracts the sequence from a PDB file
    sym2pos.m: converts amino acid one-letter codes to integers between 1 and 20
    pos2sym.m: converts integers between 1 and 20 to amino acid one-letter codes

  3. Download alignSequences.m, a MATLAB function that performs pairwise sequence alignments. Use it to compute the alignment scores for all pairs of the six protein sequences above, using the BLOSUM50 scoring matrix and a gap penalty of -10. Generate and print out a 6x6 matrix of pairwise alignment scores.

    Note that the alignSequences() function is somewhat different from the one shown in class. The modifications were introduced to clean up the code and improve performance. Please see the comments inside alignSequences.m for more details.

  4. Download alignGapOpenExt.m. This is the skeleton of a function that performs sequence alignment with gap opening and extension. You will need to complete this function yourself; however, do not modify any of the provided code. Refer to the function alignSequences() and the lecture notes on gap opening and extension in order to complete the implementation.

  5. Use your alignGapOpenExt() function to compute all pairwise alignment scores, using the BLOSUM50 matrix, a gap opening score of -10, and a gap extension score of -1. Generate and print out the 6x6 matrix of pairwise alignment scores. How does this matrix differ from the one obtained in part 3?

  6. Download pprintAlignment.m. This function "pretty prints" a pair of aligned sequences, highlighting regions where identical amino acids are aligned against one another. Use it to print the two alignments of 1MBC vs. 1YMC, one obtained with each alignment technique. Likewise, print the two alignments of 1LHS vs. 1LH1. If your gap opening and extension code works correctly, you will notice that the two alignments of 1MBC vs. 1YMC are very similar, both in the aligned sequences and in score. However, the two alignments of 1LHS vs. 1LH1 are quite different from one another: the alignment obtained with the gap opening and extension method is considerably longer, scores higher, and contains more gaps. Explain this outcome.
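
The two scoring schemes used in steps 3 and 5 can be sketched in Python as score-only dynamic programs. This is a minimal illustration, not the course's MATLAB code, and it rests on several assumptions. It assumes global (Needleman-Wunsch style) alignment, because the handout does not say whether alignSequences.m is global or local. It assumes that a gap of length g scores gap_open + (g - 1) * gap_ext. It also assumes that blosum50.txt has a header row of one-letter codes followed by one labelled row per residue. All function names are hypothetical.

```python
import math
from typing import Callable, Dict, List, Tuple

# Sketch only: assumes global alignment, and that a gap of length g
# scores gap_open + (g - 1) * gap_ext. Names are hypothetical.
NEG_INF = -math.inf
Score = Callable[[str, str], int]


def parse_score_matrix(text: str) -> Dict[Tuple[str, str], int]:
    """Parse a BLOSUM-style table (assumed layout: '#' comment lines,
    a header row of one-letter codes, then one labelled row per residue)."""
    rows = [ln.split() for ln in text.splitlines()
            if ln.strip() and not ln.lstrip().startswith("#")]
    header = rows[0]
    table = {}
    for row in rows[1:]:
        for col, value in zip(header, row[1:]):
            table[(row[0], col)] = int(value)
    return table


def align_fixed(a: str, b: str, s: Score, gap: int) -> int:
    """Global alignment score with a fixed penalty per gap position."""
    n, m = len(a), len(b)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + s(a[i - 1], b[j - 1]),
                          F[i - 1][j] + gap,   # a[i-1] against a gap
                          F[i][j - 1] + gap)   # b[j-1] against a gap
    return F[n][m]


def align_open_ext(a: str, b: str, s: Score,
                   gap_open: int, gap_ext: int) -> float:
    """Global alignment score with gap opening and extension
    (three-state recurrence in the style of Gotoh)."""
    n, m = len(a), len(b)
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # ends a[i-1]:b[j-1]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # ends a[i-1]:gap
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # ends gap:b[j-1]
    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = gap_open + (i - 1) * gap_ext
    for j in range(1, m + 1):
        Y[0][j] = gap_open + (j - 1) * gap_ext
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            M[i][j] = s(a[i - 1], b[j - 1]) + max(
                M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            # Starting a gap costs gap_open; continuing one costs gap_ext.
            X[i][j] = max(M[i - 1][j] + gap_open, X[i - 1][j] + gap_ext,
                          Y[i - 1][j] + gap_open)
            Y[i][j] = max(M[i][j - 1] + gap_open, Y[i][j - 1] + gap_ext,
                          X[i][j - 1] + gap_open)
    return max(M[n][m], X[n][m], Y[n][m])


def score_matrix(seqs: List[str],
                 align: Callable[[str, str], float]) -> List[List[float]]:
    """All pairwise scores, e.g. a 6x6 table for six sequences."""
    return [[align(x, y) for y in seqs] for x in seqs]
```

As an example, take a hypothetical list seqs of the six sequences and s = lambda x, y: blosum[(x, y)]. Then score_matrix(seqs, lambda x, y: align_fixed(x, y, s, -10)) and score_matrix(seqs, lambda x, y: align_open_ext(x, y, s, -10, -1)) produce the two tables. A local variant would also allow a score of 0 (starting a fresh alignment) in the recurrence and report the best cell rather than the bottom-right one.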

Submit the homework on paper only. Hand in the printout of both alignment score matrices, the code used to generate them, and the code for your alignGapOpenExt() function. For part (6), submit your explanation. You do not need to submit a printout of your sequence alignments, though feel free to do so.
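
When comparing the alignments in part (6), it can help to count how many gaps each alignment opens and how many gap positions it contains. The following minimal Python sketch assumes an alignment is stored as two equal-length strings with '-' marking gaps. The representation used by pprintAlignment.m is not specified here, and both function names are hypothetical.

```python
from typing import Tuple


def gap_stats(aligned: str) -> Tuple[int, int]:
    """Return (number of gap runs, number of gap positions) in one aligned row.
    Assumes '-' is the gap character."""
    runs = positions = 0
    previous_was_gap = False
    for ch in aligned:
        is_gap = ch == "-"
        if is_gap:
            positions += 1
            if not previous_was_gap:
                runs += 1  # a new gap is opened at this column
        previous_was_gap = is_gap
    return runs, positions


def pretty(top: str, bottom: str) -> str:
    """Three-line view that marks identical aligned residues with '|'."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have equal length")
    marks = "".join("|" if x == y and x != "-" else " "
                    for x, y in zip(top, bottom))
    return "\n".join([top, marks, bottom])
```

Sum the counts over both rows. Under the convention assumed above, the total gap score is runs × (-10) + (positions - runs) × (-1) with gap opening and extension, and positions × (-10) with the fixed penalty.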