Visit the NCBI website (http://www.ncbi.nlm.nih.gov/)
to use BLAST. The goal is to determine if any known proteins have an
amino acid sequence that corresponds to the string CORNELL UNIVERSITY. There are
multiple versions of BLAST available. For this kind of query, the most
appropriate choice is the protein search for "short, nearly exact
matches".
Not all the letters correspond to valid amino acids. Which
letters must be removed?
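One way to check a candidate query is sketched below. It assumes that "valid" means the 20 standard one-letter amino acid codes; ambiguity codes such as B, Z, and X and the rare residues U and O are treated as invalid here, which may be stricter than what BLAST itself accepts. The helper name `split_valid` is hypothetical.

```python
# The 20 standard amino acid one-letter codes (assumption: only these are "valid").
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")

def split_valid(text):
    """Return (kept, dropped): the letters that are standard amino acid
    codes, in order, and the sorted set of letters that are not."""
    letters = [c for c in text.upper() if c.isalpha()]  # ignore spaces and punctuation
    kept = "".join(c for c in letters if c in STANDARD_AMINO_ACIDS)
    dropped = sorted({c for c in letters if c not in STANDARD_AMINO_ACIDS})
    return kept, dropped

# Example on an unrelated string:
print(split_valid("BLAST JOB"))  # → ('LAST', ['B', 'J', 'O'])
```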
Search for the modified string using BLAST. Use the nr database
("nr" stands for non-redundant). Copy the data for the
strongest pairwise match (this should be about 10 lines) and turn it in.
Use the Center Star Method to compute a multiple alignment for the
following words: BLATHER, BARTER, HEATER, and THERE. For partial credit you must show your work.
Use the following
simple scoring matrix (this is the same one that was used in the Friday
section):
d(x,y) = 0 if x and y are identical
       = 1 if x and y are nonidentical vowels
       = 2 if x and y are nonidentical consonants
       = 2 if one of x and y is a space
       = 3 if one of x and y is a consonant and the other is a vowel
Note that Y is both a vowel and a consonant, so its score is 1 when
compared to another vowel and 2 when compared to another
consonant.
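The scoring rule and the first step of the Center Star Method can be sketched as follows. This is a minimal illustration, not the course's reference solution: it assumes a space is written as "-", that inputs are uppercase, and that the center is the word with the smallest summed pairwise alignment distance (ties go to the first such word). The names `d`, `pair_distance`, and `choose_center` are chosen for this sketch, and the step that merges pairwise alignments into a multiple alignment is not shown.

```python
VOWELS = set("AEIOU")
GAP = "-"  # assumption: a space in the alignment is written as "-"

def d(x, y):
    """Distance between two uppercase characters under the matrix above."""
    if x == y:
        return 0
    if GAP in (x, y):
        return 2
    if "Y" in (x, y):
        # Y counts as a vowel against vowels and a consonant against consonants.
        other = y if x == "Y" else x
        return 1 if other in VOWELS else 2
    if (x in VOWELS) != (y in VOWELS):
        return 3  # one vowel, one consonant
    return 1 if x in VOWELS else 2

def pair_distance(s, t):
    """Minimum total distance over all global alignments of s and t."""
    rows, cols = len(s) + 1, len(t) + 1
    cost = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        cost[i][0] = cost[i - 1][0] + d(s[i - 1], GAP)
    for j in range(1, cols):
        cost[0][j] = cost[0][j - 1] + d(GAP, t[j - 1])
    for i in range(1, rows):
        for j in range(1, cols):
            cost[i][j] = min(
                cost[i - 1][j - 1] + d(s[i - 1], t[j - 1]),  # match or substitute
                cost[i - 1][j] + d(s[i - 1], GAP),           # space in t
                cost[i][j - 1] + d(GAP, t[j - 1]),           # space in s
            )
    return cost[-1][-1]

def choose_center(words):
    """Return the word minimizing the sum of distances to all words."""
    return min(words, key=lambda w: sum(pair_distance(w, v) for v in words))
```

Calling `choose_center(["BLATHER", "BARTER", "HEATER", "THERE"])` would pick the center string; the full matrix of values from `pair_distance` is what "show your work" would normally include.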
Using the scoring matrix from the previous problem, consider the
following multiple alignment:
BL-A-THER
B--ART-ER
-HEA-T-ER
THE-R--E-
What is the consensus string for this multiple alignment?
Determine the profile for this multiple alignment.
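As a rough illustration of what a profile contains, the sketch below computes, for each column, the fraction of rows holding each character. It assumes the profile is defined by relative frequencies and that "-" is counted like any other character; if the course defines the profile with raw counts, drop the division. The function name `profile` is chosen for this sketch.

```python
from collections import Counter

def profile(rows):
    """Return one dict per column mapping character -> relative frequency."""
    if len({len(r) for r in rows}) != 1:
        raise ValueError("all rows of an alignment must have the same length")
    n = len(rows)
    # zip(*rows) walks the alignment column by column.
    return [{ch: count / n for ch, count in Counter(col).items()}
            for col in zip(*rows)]

alignment = ["BL-A-THER",
             "B--ART-ER",
             "-HEA-T-ER",
             "THE-R--E-"]
prof = profile(alignment)  # a list with one frequency dict per column
```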
Use the profile to find the best alignment for THREAD. Show both the
final alignment and the dynamic programming matrix that you used to
find this alignment (i.e., show the values that are in the matrix
before finding the path through the matrix).
Suppose that the characters B, C, D, A, and A are aligned (i.e., they
appear in a single column of a multiple alignment). What would the
consensus character be for this column? (Be careful here.)
Find a simple example to show that the Center Star Method does not
necessarily produce the optimal sum-of-pairs alignment. Use the
scoring matrix from the previous problems. Part of the grade on this
problem is based on how simple your example is (i.e., simpler is better).
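To compare a Center Star alignment against a hand-built alternative, it helps to score both under the sum-of-pairs measure. The sketch below is one way to do that, assuming a space is written as "-", that inputs are uppercase, and that two spaces in the same column cost 0 (they are identical under the matrix). The names `d` and `sp_score` are chosen for this sketch.

```python
from itertools import combinations

VOWELS = set("AEIOU")
GAP = "-"  # assumption: a space in the alignment is written as "-"

def d(x, y):
    """Distance between two uppercase characters under the scoring matrix."""
    if x == y:
        return 0
    if GAP in (x, y):
        return 2
    if "Y" in (x, y):
        other = y if x == "Y" else x
        return 1 if other in VOWELS else 2
    if (x in VOWELS) != (y in VOWELS):
        return 3
    return 1 if x in VOWELS else 2

def sp_score(rows):
    """Sum of d over every pair of rows and every column of the alignment."""
    if len({len(r) for r in rows}) != 1:
        raise ValueError("all rows of an alignment must have the same length")
    return sum(d(a, b)
               for r1, r2 in combinations(rows, 2)
               for a, b in zip(r1, r2))

print(sp_score(["A", "A", "E"]))  # → 2
```

A counterexample then amounts to two alignments of the same strings where the one produced by the Center Star Method has the larger `sp_score`.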
On the course website you can find a file with 10 sequences; they all
belong to the same enzyme class.
Use ClustalW to obtain the multiple
sequence alignment of these 10 sequences.
Using the sequences, find the enzyme class they belong to.
Use the Swiss-Prot database to find the sequence of the enzyme alcohol dehydrogenase
in the organism Escherichia coli.
Please submit (a) your alignment, (b)
the name and EC number of the enzyme class, and (c) the sequence and accession
number that you find.