Review Questions for the Midterm
CS 426 - Introduction to Computational Biology - Fall 2003
Some of the following questions are from http://www.roselab.jhu.edu/~przytyck/Exam2001.doc
and from http://www.cs.princeton.edu/courses/archive/fall00/cs597c/ps1.htm.
- Fill in the cost-table below assuming gap penalty =
-2, match = +1, mismatch = -1. Fill in the other table to show the path-costs
determined during the dynamic programming. Write down the optimal alignment and show the
path in the matrix that corresponds to this alignment. If there is more than
one alignment with the maximum score write all these alignments.
- What is an amino-acid substitution matrix? What is the difference between
PAM1 and PAM250?
- Given is a set of multiple aligned sequences. Compute the sequence profile for
this set.
ATAATAC
ATAATAG
ATAATTC
ATATTAC
ATAATAA
- Assume scoring s(a,b) such that
| s(a,b)
= |
|
2 if a=b |
| 0 if a and b are different and are not gaps |
| -1 if one of a or b is a gap |
| 0 if both a and b are gaps |
Compute the SP (Sum of Pairs) score for the following multiple alignment:
AATT-A
AA-TAA
AC-TAA.
- Multiple Alignments:
For this problem, use the sum-of-pairs metric and follow the "once a
gap, always a gap" rule. Consider the following three sequences.
(1) ACGTC
(2) TCCT
(3) ACGTCCT
- Compute all three optimal pairwise alignments assuming a cost of 2 for
each deletion and 3 for each substitution. Give the cost of each alignment.
- Compute a progressive multiple alignment starting with the pairwise
alignment (1,3). Now use the pairwise alignment (2,3) to merge sequence 2
into the multiple alignment. Show the resulting alignment and give its cost.
- Repeat problem 2, but this time use the pairwise alignment (1,2) to merge
sequence 2 into the multiple alignment. Show the resulting alignment and
give its cost. Are the two alignments the same? Which has a lower cost?
- What is the optimal multiple alignment?
- Suppose you charge a cost of 1 for each deletion and 1 for each
substitution. What is the optimal alignment? Is it unique?
- Consider threading algorithm as applied to RNA (rather than proteins that
we discussed in class). It turns out that Z scores computed for RNA are
usually much higher than Z scores computed for proteins. Can you explain?
- Consider a refinement to HP model: An H amino
acid with a missing neighbor (it is on the surface) is penalized by +1 (H-H
remains –1 , and H-P/P-P remain 0). For the following polymer below
suggest an optimal HP sequence without a constraint on sequence composition.
