Review Questions for the Midterm

CS 426 - Introduction to Computational Biology - Fall 2003

Some of the following questions are from http://www.roselab.jhu.edu/~przytyck/Exam2001.doc and from http://www.cs.princeton.edu/courses/archive/fall00/cs597c/ps1.htm.

  1. Fill in the cost-table below assuming gap penalty = -2, match = +1, mismatch = -1. Fill in the other table to show the path-costs determined during the dynamic programming.  Write down the optimal alignment and show the path in the matrix that corresponds to this alignment. If there is more than one alignment with the maximum score write all these alignments.

    cost A G C G
    A        
    A        
    A        
    C        
    G        

    DP   A G C G
               
    A          
    A          
    A          
    C          
    G          

  2. What is an amino-acid substitution matrix? What is the difference between PAM1 and PAM250?
     
  3. Given is a set of multiple aligned sequences. Compute the sequence profile for this set.
    ATAATAC
    ATAATAG
    ATAATTC
    ATATTAC
    ATAATAA
      1 2 3 4 5 6 7
    A              
    C              
    T              
    G              

     

  4. Assume scoring s(a,b) such that
    s(a,b) =        2 if a=b
    0 if a and b are different and are not gaps
    -1 if one of a or b is a gap
    0 if both a and b are gaps

    Compute the SP (Sum of Pairs) score for the following multiple alignment:
    AATT-A
    AA-TAA
    AC-TAA
    .
     

  5. Multiple Alignments:
    For this problem, use the sum-of-pairs metric and follow the "once a
    gap, always a gap" rule. Consider the following three sequences.

    (1) ACGTC
    (2) TCCT
    (3) ACGTCCT
     

    1. Compute all three optimal pairwise alignments assuming a cost of 2 for each deletion and 3 for each substitution. Give the cost of each alignment.
    2. Compute a progressive multiple alignment starting with the pairwise alignment (1,3). Now use the pairwise alignment (2,3) to merge sequence 2 into the multiple alignment. Show the resulting alignment and give its cost.
    3. Repeat problem 2, but this time use the pairwise alignment (1,2) to merge sequence 2 into the multiple alignment. Show the resulting alignment and give its cost. Are the two alignments the same? Which has a lower cost?
    4. What is the optimal multiple alignment?
    5. Suppose you charge a cost of 1 for each deletion and 1 for each substitution. What is the optimal alignment? Is it unique?
       
  6. Consider threading algorithm as applied to RNA (rather than proteins that we discussed in class). It turns out that Z scores computed for RNA are usually much higher than Z scores computed for proteins. Can you explain?
     
  7. Consider a refinement to HP model: An H amino acid with a missing neighbor (it is on the surface) is penalized by +1 (H-H remains –1 , and H-P/P-P remain 0). For the following polymer below suggest an optimal HP sequence without a constraint on sequence composition.