Introduction

An active area of research in computational biology is predicting the structure and function of a protein from its amino acid sequence. One current approach is to compare two sequences: a probe sequence whose structure and function are unknown, and a template sequence whose structure and function are known. By comparing the probe sequence against a database that is representative of all known proteins, its structure and function can be inferred from the templates to which it is homologous.

The probe sequence is compared to each template sequence by pairwise alignment using dynamic programming. The success of this method depends on the substitution matrix, or scoring function, which quantifies how well a position in the probe matches a position in the template. My work focused on testing two new scoring functions: one based on secondary structure, and one that combines amino acid, secondary structure, and environment information.
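
To make the role of the scoring function concrete, the following is a minimal sketch of pairwise alignment by dynamic programming. It is an illustration under stated assumptions, not the implementation used in this work: it assumes a global alignment with a linear gap penalty, and the names global_alignment_score and identity_score, the gap value of -2.0, and the match/mismatch values are all hypothetical. The point it shows is that the scoring function is a pluggable argument, so a different substitution matrix can be swapped in without changing the alignment procedure.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def global_alignment_score(
    probe: Sequence[T],
    template: Sequence[T],
    score: Callable[[T, T], float],
    gap: float = -2.0,  # assumed linear gap penalty, chosen for illustration
) -> float:
    """Return the best global alignment score of probe against template."""
    n, m = len(probe), len(template)

    # dp[i][j] holds the best score for aligning probe[:i] with template[:j].
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]

    # Aligning a prefix against nothing costs one gap per position.
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                # Pair position i of the probe with position j of the template.
                dp[i - 1][j - 1] + score(probe[i - 1], template[j - 1]),
                # Leave probe position i unpaired (gap in the template).
                dp[i - 1][j] + gap,
                # Leave template position j unpaired (gap in the probe).
                dp[i][j - 1] + gap,
            )
    return dp[n][m]


def identity_score(a: str, b: str) -> float:
    """Toy scoring function (hypothetical): +1 for a match, -1 otherwise."""
    return 1.0 if a == b else -1.0


if __name__ == "__main__":
    print(global_alignment_score("ACGT", "AGT", identity_score))
```

Because the positions are passed as generic sequence items, the same routine could in principle be called on strings of secondary structure states, or on per-position tuples that combine several properties, provided a scoring function over those items is supplied.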