CS 426 - HW2

Due: Friday, Oct 3

  1. Visit the NCBI website (http://www.ncbi.nlm.nih.gov/) to use BLAST.  The goal is to determine whether any known protein has an amino acid sequence matching the string CORNELL UNIVERSITY.  Several versions of BLAST are available; for this kind of query, the most appropriate one is the protein search for "short, nearly exact matches".
    1. Not all the letters correspond to valid amino acids.  Which letters must be removed?
    2. Search for the modified string using BLAST.  Use the nr database ("nr" stands for non-redundant).  Copy the data for the strongest pairwise match (this should be about 10 lines) and turn this in.
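Before the query can be pasted into BLAST, the string has to be reduced to amino-acid letters. Below is a minimal sketch. It assumes that "valid" means one of the 20 standard one-letter codes. NCBI also recognizes some extra codes (for example B, Z, and X for ambiguous residues and U for selenocysteine), so widen `STANDARD_AA` if your answer should allow them. `to_protein_query` and `removed_letters` are hypothetical helpers, not part of any NCBI tool.

```python
# Assumption: "valid" means one of the 20 standard one-letter amino-acid codes.
STANDARD_AA = frozenset("ACDEFGHIKLMNPQRSTVWY")


def to_protein_query(text: str) -> str:
    """Uppercase the text and keep only standard amino-acid letters (spaces are dropped)."""
    return "".join(ch for ch in text.upper() if ch in STANDARD_AA)


def removed_letters(text: str) -> set:
    """Letters in the text that are not standard amino-acid codes."""
    return {ch for ch in text.upper() if ch.isalpha() and ch not in STANDARD_AA}
```

For example, `to_protein_query("CORNELL UNIVERSITY")` produces the string to submit, and `removed_letters("CORNELL UNIVERSITY")` lists what was dropped.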
            
  2. Use the Center Star Method to compute a multiple alignment for the following words: BLATHER, BARTER, HEATER, and THERE.  For partial credit you must show your work.  Use the following simple scoring matrix (this is the same one that was used in the Friday section):
    d(x,y) = 0 if x and y are identical
           = 1 if x and y are nonidentical vowels
           = 2 if x and y are nonidentical consonants
           = 2 if one of x and y is a space
           = 3 if one of x and y is a consonant and the other is a vowel
    Note that Y is both a vowel and a consonant, so its score is 1 when compared to another vowel and 2 when compared to another consonant.  
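The scoring matrix and the first step of the Center Star Method (choosing the center) can be checked with a short program. This is a sketch under stated assumptions, not the required hand method. It assumes that "-" stands for the space, that the vowels are A, E, I, O, U with Y handled as the note describes, and that pairwise distances come from the usual global-alignment (edit-distance style) dynamic programming under d. The names `d`, `pair_distance`, and `choose_center` are hypothetical. The sketch stops after picking the center. Aligning each word to the center and merging those pairwise alignments ("once a gap, always a gap") is left to you.

```python
GAP = "-"              # assumption: "-" stands for the space character in d(x, y)
VOWELS = set("AEIOU")  # assumption: the usual vowels; Y is handled below


def d(x: str, y: str) -> int:
    """Scoring matrix from problem 2 (a distance, so lower is better)."""
    if x == y:
        return 0
    if x == GAP or y == GAP:
        return 2
    if (x in VOWELS or x == "Y") and (y in VOWELS or y == "Y"):
        return 1  # nonidentical vowels, including Y against a vowel
    if x not in VOWELS and y not in VOWELS:
        return 2  # nonidentical consonants, including Y against a consonant
    return 3      # one consonant and one vowel


def pair_distance(s: str, t: str) -> int:
    """Minimum-cost global alignment of s and t under d."""
    m, n = len(s), len(t)
    D = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        D[i][0] = D[i - 1][0] + d(s[i - 1], GAP)
    for j in range(1, n + 1):
        D[0][j] = D[0][j - 1] + d(GAP, t[j - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            D[i][j] = min(
                D[i - 1][j - 1] + d(s[i - 1], t[j - 1]),  # align s[i-1] with t[j-1]
                D[i - 1][j] + d(s[i - 1], GAP),           # s[i-1] opposite a space
                D[i][j - 1] + d(GAP, t[j - 1]),           # t[j-1] opposite a space
            )
    return D[m][n]


def choose_center(words: list) -> tuple:
    """Center Star step 1: the center minimizes its summed distance to all words."""
    totals = [sum(pair_distance(w, v) for v in words) for w in words]
    best = min(range(len(words)), key=lambda k: totals[k])
    return words[best], totals
```

Usage: `choose_center(["BLATHER", "BARTER", "HEATER", "THERE"])` returns the center word and each word's total distance. Ties are broken in favor of the earlier word.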
     
  3. Using the scoring matrix from the previous problem, consider the following multiple alignment.
    BL-A-THER
    B--ART-ER
    -HEA-T-ER
    THE-R--E-
    1. What is the consensus string for this multiple alignment?
    2. Determine the profile for this multiple alignment.
    3. Use the profile to find the best alignment for THREAD.  Show both the final alignment and the dynamic programming matrix that you used to find this alignment (i.e., show the values in the matrix before finding the path through it).
    4. Suppose that the characters B, C, D, A, and A are aligned (i.e., they appear in a single column of a multiple alignment).  What would the consensus character be for this column?  (Be careful here.)
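Profiles, consensus characters, and the string-to-profile matrix can be computed mechanically to check hand work. This is a hedged sketch that rests on several assumptions. First, "-" is the space. Second, the consensus character of a column is taken to be the character (a letter A to Z, or the space) that minimizes the summed distance d to the column's entries. This is one common textbook definition, and it can differ from simply taking the most frequent character. Third, profile entries are fractions, and the cost of aligning a character x with a profile column is the expected distance, the sum of p(y) * d(x, y). If the course uses raw counts instead, every matrix entry is multiplied by the number of sequences. All function names are hypothetical. `profile_dp` returns only the filled matrix, with no traceback.

```python
from collections import Counter
from string import ascii_uppercase

GAP = "-"              # assumption: "-" stands for the space character
VOWELS = set("AEIOU")  # assumption: the usual vowels; Y handled below
ALIGNMENT = ["BL-A-THER", "B--ART-ER", "-HEA-T-ER", "THE-R--E-"]


def d(x: str, y: str) -> int:
    """Scoring matrix from problem 2; Y counts as both a vowel and a consonant."""
    if x == y:
        return 0
    if x == GAP or y == GAP:
        return 2
    if (x in VOWELS or x == "Y") and (y in VOWELS or y == "Y"):
        return 1
    if x not in VOWELS and y not in VOWELS:
        return 2
    return 3


def profile(alignment: list) -> list:
    """Fraction of each character (space included) in every column."""
    n = len(alignment)
    return [{ch: k / n for ch, k in Counter(col).items()} for col in zip(*alignment)]


def consensus_char(column, alphabet: str = ascii_uppercase + GAP) -> str:
    """Character minimizing summed distance to the column; ties go to the earliest in alphabet."""
    return min(alphabet, key=lambda c: sum(d(c, x) for x in column))


def consensus(alignment: list) -> str:
    """Column-by-column consensus characters (spaces kept)."""
    return "".join(consensus_char(col) for col in zip(*alignment))


def column_cost(x: str, col: dict) -> float:
    """Expected distance between x and a character drawn from profile column col."""
    return sum(p * d(x, y) for y, p in col.items())


def profile_dp(s: str, prof: list) -> list:
    """Fill the (minimization) DP matrix for aligning string s to profile prof."""
    m, n = len(s), len(prof)
    V = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        V[i][0] = V[i - 1][0] + d(s[i - 1], GAP)  # s[i-1] against a new all-space column
    for j in range(1, n + 1):
        V[0][j] = V[0][j - 1] + column_cost(GAP, prof[j - 1])  # space in s
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            V[i][j] = min(
                V[i - 1][j - 1] + column_cost(s[i - 1], prof[j - 1]),
                V[i - 1][j] + d(s[i - 1], GAP),
                V[i][j - 1] + column_cost(GAP, prof[j - 1]),
            )
    return V
```

Usage: `consensus(ALIGNMENT)` gives the consensus, and `profile_dp("THREAD", profile(ALIGNMENT))` gives the matrix to report. Because candidates include letters absent from a column, `consensus_char` is also worth running on the column in part 4.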
       
  4. Find a simple example to show that the Center Star Method does not necessarily produce the optimal sum-of-pairs alignment.  Use the scoring matrix from the previous problems.  Part of the grade on this problem is based on how simple your example is (i.e., simpler is better).
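A candidate counterexample can be verified by comparing the sum-of-pairs (SP) cost of the Center Star alignment with the true optimum. Below is a sketch, assuming "-" is the space and using the scoring matrix from problem 2. For three strings, the optimum can be found exactly by a 3-D dynamic programming table. This is the standard exact method, but its cost grows exponentially with the number of sequences, so it is practical only for tiny examples like these. `sp_score` and `optimal_sp_3` are hypothetical names.

```python
from itertools import combinations, product

GAP = "-"              # assumption: "-" stands for the space character
VOWELS = set("AEIOU")  # assumption: the usual vowels; Y handled below


def d(x: str, y: str) -> int:
    """Scoring matrix from problem 2; Y counts as both a vowel and a consonant."""
    if x == y:
        return 0
    if x == GAP or y == GAP:
        return 2
    if (x in VOWELS or x == "Y") and (y in VOWELS or y == "Y"):
        return 1
    if x not in VOWELS and y not in VOWELS:
        return 2
    return 3


def sp_column(column) -> int:
    """Sum-of-pairs cost of one column: d summed over every pair of rows."""
    return sum(d(x, y) for x, y in combinations(column, 2))


def sp_score(alignment: list) -> int:
    """Sum-of-pairs cost of a multiple alignment given as equal-length rows."""
    return sum(sp_column(col) for col in zip(*alignment))


def optimal_sp_3(a: str, b: str, c: str) -> int:
    """Exact minimum sum-of-pairs cost for three strings via 3-D dynamic programming."""
    INF = float("inf")
    V = [[[INF] * (len(c) + 1) for _ in range(len(b) + 1)] for _ in range(len(a) + 1)]
    V[0][0][0] = 0
    for i in range(len(a) + 1):
        for j in range(len(b) + 1):
            for k in range(len(c) + 1):
                # Each column consumes one character from a nonempty subset of the strings.
                for di, dj, dk in product((0, 1), repeat=3):
                    if (di, dj, dk) == (0, 0, 0) or di > i or dj > j or dk > k:
                        continue
                    col = (a[i - 1] if di else GAP,
                           b[j - 1] if dj else GAP,
                           c[k - 1] if dk else GAP)
                    cand = V[i - di][j - dj][k - dk] + sp_column(col)
                    if cand < V[i][j][k]:
                        V[i][j][k] = cand
    return V[len(a)][len(b)][len(c)]
```

Run the Center Star Method by hand on a candidate triple and score its result with `sp_score`. If that score is strictly larger than `optimal_sp_3` on the same three strings, the example shows that Center Star missed the optimum.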
     
  5. On the course website you can find a file with 10 sequences; they all belong to the same enzyme class. 
    1. Use ClustalW to obtain the multiple sequence alignment of these 10 sequences.
    2. Using the sequences, identify the enzyme class to which they belong.
    3. Use the Swiss-Prot database to find the sequence of the enzyme alcohol dehydrogenase in the organism Escherichia coli.
    Please submit (a) your alignment, (b) the name and EC number of the enzyme class, and (c) the sequence and accession number that you find.