CS 426 - HW4

Due: Friday, Nov 14

In lecture, we noticed that the tree that appears on transparency #19 could not have been produced by the UPGMA algorithm because it does not satisfy the ultrametric property.
1. Determine the correct UPGMA tree. The UPGMA algorithm appears on transparency #16; note that each new node is at height (measured from the leaves) d(C_i,C_j)/2.
2. Find a phylogenetic tree program based on UPGMA. Run the program using the data from the table on transparency #18. Turn in the part of the program output that includes the resulting tree.
  [PHYLIP is one package that includes the UPGMA algorithm, but you can use a different program if you want. There is a PHYLIP web server (http://bioweb.pasteur.fr/intro-uk.html) set up by the Institut Pasteur in Paris. The online documentation includes example input. Make sure your input looks like the example (including spacing) --- PHYLIP is picky about its input format.]
3. Find a phylogenetic tree program based on Neighbor Joining. Run the program using the data from the table on transparency #18. Turn in the part of the program output that includes the resulting tree. [You can use PHYLIP or some other program.]
4. The tree produced in part 3 doesn't look the same as the tree (on transparency #18) that was used to derive the distance table. Is it really the same? Explain.
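For checking a hand computation, the UPGMA procedure referred to above can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions, not the transparency's exact pseudocode: the function name upgma, the 4-taxon matrix, and the taxon labels A-D are all hypothetical (the table from transparency #18 is not reproduced here), and cluster distances are updated as size-weighted averages. Each new node is placed at height d(C_i,C_j)/2, as noted in part 1.

```python
def upgma(names, matrix):
    """Cluster taxa with UPGMA.

    names  : list of leaf labels
    matrix : full symmetric distance matrix (list of lists)
    Returns (newick_string, list_of_node_heights_in_merge_order).
    """
    n = len(names)
    # Active clusters: id -> (newick text, number of leaves, node height)
    clusters = {i: (names[i], 1, 0.0) for i in range(n)}
    # Pairwise distances between active clusters, keyed by (smaller id, larger id)
    d = {(i, j): matrix[i][j] for i in range(n) for j in range(i + 1, n)}
    next_id = n
    heights = []
    while len(clusters) > 1:
        # Pick the closest pair of clusters (ties: first one encountered).
        (i, j), dij = min(d.items(), key=lambda kv: kv[1])
        (ti, si, hi), (tj, sj, hj) = clusters.pop(i), clusters.pop(j)
        h = dij / 2  # new node height, measured from the leaves
        newick = f"({ti}:{h - hi:g},{tj}:{h - hj:g})"
        # Distance from the merged cluster to every remaining cluster k:
        # average of the two old distances, weighted by cluster size.
        for k in clusters:
            dik = d[(min(i, k), max(i, k))]
            djk = d[(min(j, k), max(j, k))]
            d[(k, next_id)] = (si * dik + sj * djk) / (si + sj)
        # Drop every distance that involves the two merged clusters.
        d = {p: v for p, v in d.items() if i not in p and j not in p}
        clusters[next_id] = (newick, si + sj, h)
        heights.append(h)
        next_id += 1
    (newick, _, _), = clusters.values()
    return newick + ";", heights


# Hypothetical ultrametric example (NOT the data from transparency #18).
example = [
    [0, 2, 6, 6],
    [2, 0, 6, 6],
    [6, 6, 0, 4],
    [6, 6, 4, 0],
]
tree, heights = upgma(["A", "B", "C", "D"], example)
print(tree)
print(heights)
```

The returned string is in Newick format with branch lengths, which most tree viewers accept, so the result can be compared against the output of whichever UPGMA program you use.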
Use the Swiss-Prot Protein Knowledgebase to find sequences for the following proteins: MYG_BOVIN, MYG_CHICK, MYG_HORSE, MYG_HUMAN, MYG_LYCPI, MYG_MOUSE, MYG_PIG, MYG_RABIT. These are myoglobins from cow, chicken, horse, human, wild dog, mouse, pig, and rabbit, respectively. (You can do a search for myoglobin and then checkmark each of the listed proteins.)
1. Run ClustalW to align the protein sequences; this is one of the "Result Options" that you can use once you have selected the proteins. Turn in the alignment output.
2. By default, PHYLIP's protpars (protein parsimony) program draws the resulting trees as if the first protein in the input were an outgroup. Of these species, which one would make the most sense to use as an outgroup?
3. Use the program protpars from PHYLIP to produce the most parsimonious tree (or trees) using the first 60 amino acids for each of the above proteins. Make sure your input looks like the example in the online documentation (including spacing) --- PHYLIP is picky about its input format. Turn in the outfile (the part that includes the tree pictures).
4. According to these results, which species are most closely related to humans?
After you have run ClustalW (part 1 of the previous problem), you can use DistmatP in the "Result Options" to convert your alignment into a distance matrix.
1. Turn in the distance matrix.
2. Use the distance matrix to run a phylogenetic tree program based on Neighbor Joining (you can use PHYLIP or some other program; you may have to retype the distance matrix). Turn in the resulting tree.
3. Discuss how this tree compares with the tree(s) produced by parsimony.
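If the distance matrix has to be retyped for PHYLIP, a small helper can reduce spacing mistakes. The sketch below assumes the square-matrix layout used by PHYLIP's distance programs: the number of taxa on the first line, then one row per taxon with the name padded to 10 characters followed by the distances. The function name phylip_distance_matrix and the numeric values are hypothetical; substitute the DistmatP output, and check the result against the example in the online documentation.

```python
def phylip_distance_matrix(names, matrix):
    """Format a square distance matrix as PHYLIP-style text."""
    lines = [f"{len(names):5d}"]
    for name, row in zip(names, matrix):
        # PHYLIP reads the first 10 characters of each row as the taxon name.
        label = name[:10].ljust(10)
        lines.append(label + "".join(f"{x:9.4f}" for x in row))
    return "\n".join(lines) + "\n"


# Placeholder distances for illustration only (not real DistmatP output).
names = ["MYG_HUMAN", "MYG_PIG", "MYG_CHICK"]
matrix = [
    [0.00, 0.10, 0.30],
    [0.10, 0.00, 0.28],
    [0.30, 0.28, 0.00],
]
print(phylip_distance_matrix(names, matrix), end="")
```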
The following Markov model was designed to generate sequences in the helical conformation.
We assume two types of residues: alpha (helix) and beta (something else). An alpha helix is defined as a contiguous run of alpha residues. The parameters of the model are as follows:
- If the previous residue was not alpha, then there is a probability 0.3 that the next residue will be alpha.
- If the previous residue was alpha, then there is a probability 0.7 that the next residue will be alpha (this is sometimes called the cooperativity effect in secondary structure formation).
- The zero state (opening amino acids) has alpha and beta residues with equal probabilities.
Write a program that generates sequences consistent with the above model, and use it to produce 100 sequences of length 100. In your report, include the program and the average length of the observed continuous helices.
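One possible shape for such a program is sketched below in Python. It is an illustration under stated assumptions rather than a reference solution: the function names (generate, helix_lengths, mean_helix_length), the single-letter encoding "a"/"b" for alpha/beta, and the fixed random seed are all choices made here, and a helix is counted as any maximal run of "a", including runs cut off at either end of a sequence.

```python
import random


def generate(length, rng, p_start=0.5, p_after_alpha=0.7, p_after_beta=0.3):
    """Sample one sequence of 'a' (alpha) and 'b' (beta) residues."""
    seq = []
    p_alpha = p_start  # zero state: alpha and beta equally likely
    for _ in range(length):
        state = "a" if rng.random() < p_alpha else "b"
        seq.append(state)
        # The next residue depends only on the current one (first-order Markov).
        p_alpha = p_after_alpha if state == "a" else p_after_beta
    return "".join(seq)


def helix_lengths(seq):
    """Lengths of all maximal runs of 'a' in seq."""
    return [len(run) for run in seq.split("b") if run]


def mean_helix_length(seqs):
    """Average helix length pooled over all sequences."""
    lengths = [n for s in seqs for n in helix_lengths(s)]
    return sum(lengths) / len(lengths) if lengths else 0.0


rng = random.Random(426)  # fixed seed so the run is repeatable
sequences = [generate(100, rng) for _ in range(100)]
print(sequences[0])
print("average helix length:", mean_helix_length(sequences))
```

Pooling all helices before averaging is one design choice; averaging per sequence first and then across sequences gives a slightly different number, so state in the report which definition was used.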