Monday, April 10, 2006
4:30 pm
B17 Upson Hall

Computer Science
Spring 2006

New Life Sciences

Dana Pe'er

Harvard Medical School


Elucidating Function and Organization of Molecular Networks:
From Molecules to a System


With the advent of high-throughput technologies, data in molecular biology is accumulating at a staggering rate. While this flood of data bears much promise, the computational methodology needed to analyze such data remains a challenging bottleneck. My objective is to understand the architecture of molecular networks and elucidate a global view of how the molecular network processes combinations of signals to compute and execute a concerted cellular decision and response. I approach this goal with a computational toolbox and develop methods that integrate and explain data from diverse high-throughput technologies to uncover novel insights about the workings of living cells and organisms.

Understanding “cellular computation” requires knowledge of the network architecture and the influences among its components. To address this challenge, I take a hierarchical modeling approach. At the finest resolution, I develop models that infer the detailed connectivity and influences between network components. In this talk, I will demonstrate how we applied Bayesian networks to the automated derivation of causal influences in signaling networks. This relied on state-of-the-art technology that simultaneously measures the levels of multiple signaling components in thousands of individual human cells. Our method automatically discovered, de novo, most of the traditionally established influences between the measured signaling components, as well as novel inter-pathway crosstalk, which we experimentally verified. A key distinction of our approach is the use of single-cell measurements, which avoids population averaging that often masks true activities.

However, molecular networks are complex, consisting of a web of thousands of components interacting via a wide range of molecular mechanisms. To address this challenge, I develop probabilistic models that integrate heterogeneous data and exploit biological principles such as modularity to obtain efficient representations and learning algorithms. I will present a probabilistic method that integrates genotype and gene expression data to discover regulatory modules: sets of genes that are controlled by the same regulatory program. Our method automatically identifies the genes comprising each module and the regulatory program controlling their behavior. Additionally, our method provides fine-grained mechanistic explanations for the effect of individual genetic variation on regulatory interactions. By taking in the system as a whole, our analysis suggests insights into how genetic variation adapts the regulatory network to different environments.