Computer Science Colloquium Series, Spring 2008

Humans differ in many "phenotypes" such as weight, hair color and, more importantly, disease susceptibility. These phenotypes are largely determined by each individual's specific genotype, stored in the 3.2 billion bases of his or her genome sequence. Deciphering the sequence by finding which sequence variations cause a certain phenotype would have a great impact. The recent advent of high-throughput genotyping methods has enabled retrieval of an individual's sequence information on a genome-wide scale. Classical approaches have focused on identifying which sequence variations are associated with a particular phenotype. However, the complexity of the cellular mechanisms through which sequence variations cause a particular phenotype makes it difficult to infer such causal relationships directly. In this talk, I will present machine learning approaches that address these challenges by explicitly modeling the cellular mechanisms induced by sequence variations. Our approach takes as input genome-wide expression measurements and aims to generate a finer-grained hypothesis such as "sequence variations S induce cellular processes M, which lead to changes in the phenotype P". Furthermore, we have developed the "meta-prior algorithm", which can learn the regulatory potential of each sequence variation based on its intrinsic characteristics. This improvement helps to identify a true causal sequence variation among the many variations in the same chromosomal region. Our approaches have led to novel insights into sequence variations, and some of the hypotheses have been validated through biological experiments. Many of these machine learning techniques apply to a wide range of problems; as an example, I will present the meta-prior algorithm in the context of movie rating prediction using the Netflix data set.

Su-In Lee is a Ph.D. candidate at Stanford University, where she is a member of the Stanford Artificial Intelligence Laboratory. Her research focuses on devising computational methodologies for understanding the genetic basis of complex traits. She is also interested in developing general machine learning algorithms for broader applications. Su-In graduated summa cum laude with a B.Sc. in Electrical Engineering and Computer Science from the Korea Advanced Institute of Science and Technology and was a recipient of the Stanford Graduate Fellowship.

Tuesday, April 10, 2008, 4:15 pm, B17 Upson Hall
Su-In Lee Stanford University
Machine Learning Approaches for Understanding the Genetic Basis of Complex Traits