Raluca Gordan

Duke University

Transcriptional regulation of gene expression is largely carried out by regulatory proteins called transcription factors (TFs), which bind specific DNA sites across the genome and thereby activate or repress the expression of target genes. Identifying the DNA binding motifs of TFs from genome-wide experimental data (such as ChIP-chip) is challenging because TF binding motifs are typically short and degenerate, which makes them difficult to distinguish from the genomic background. My work on DNA motif discovery addresses this low signal-to-noise ratio by using additional biological information to derive positional priors. These priors are incorporated into a Gibbs sampling algorithm to bias the search for TF binding sites toward DNA regions that are more likely to be functional, such as regions with low nucleosome occupancy or evolutionarily conserved regions. I will show that incorporating this additional biological information increases the accuracy of DNA motifs predicted from ChIP-chip data by up to 52%.
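To make the idea of a positional prior concrete, here is a minimal sketch of a one-site-per-sequence Gibbs motif sampler in which each candidate site's weight is multiplied by a prior value for its start position. It is a simplified illustration under stated assumptions, not the speaker's algorithm: the function names (`build_pwm`, `site_weights`, `gibbs_motif_search`), the uniform background model, the pseudocount, and the prior format (one nonnegative weight per possible start) are all hypothetical choices made here for illustration, and sequences are assumed to contain only uppercase A, C, G, T.

```python
import random

BASES = "ACGT"


def build_pwm(seqs, starts, width, skip, pseudocount=0.5):
    """Column-wise base probabilities from the current sites, leaving out sequence `skip`."""
    counts = [{b: pseudocount for b in BASES} for _ in range(width)]
    for k, (seq, start) in enumerate(zip(seqs, starts)):
        if k == skip:
            continue
        for j in range(width):
            counts[j][seq[start + j]] += 1
    return [{b: c / sum(col.values()) for b, c in col.items()} for col in counts]


def site_weights(seq, pwm, prior, background):
    """Unnormalized weight of each start: PWM/background likelihood ratio times the positional prior."""
    width = len(pwm)
    weights = []
    for s in range(len(seq) - width + 1):
        ratio = 1.0
        for j in range(width):
            base = seq[s + j]
            ratio *= pwm[j][base] / background[base]
        weights.append(ratio * prior[s])
    return weights


def gibbs_motif_search(seqs, width, priors, iterations=100, seed=0):
    """One-site-per-sequence Gibbs sampler whose site choices are weighted by `priors`.

    priors[i][s] is a (hypothetical) nonnegative weight for a site starting at
    position s of seqs[i]; each list needs len(seqs[i]) - width + 1 entries and
    at least one positive value.
    """
    rng = random.Random(seed)
    background = {b: 0.25 for b in BASES}  # assumption: uniform background model
    for seq, prior in zip(seqs, priors):
        if len(prior) != len(seq) - width + 1:
            raise ValueError("each prior needs one weight per possible start position")
    # Initialize each site by sampling a start position from its prior.
    starts = [rng.choices(range(len(p)), weights=p)[0] for p in priors]
    for _ in range(iterations):
        for i, seq in enumerate(seqs):
            # Re-estimate the motif without sequence i, then resample its site.
            pwm = build_pwm(seqs, starts, width, skip=i)
            weights = site_weights(seq, pwm, priors[i], background)
            starts[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return starts
```

With a uniform prior (all weights equal), this reduces to an ordinary Gibbs motif sampler. A prior derived from, say, low nucleosome occupancy or conservation scores would shift the sampled sites toward the favored positions, and positions with zero prior weight are never chosen.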


Despite the improvement gained from additional biological information, DNA motif discovery remains especially challenging when applied to in vivo TF binding data, because the observed TF-DNA interactions are not necessarily direct. Some TFs associate with DNA only indirectly via co-regulatory factors, while others exhibit both direct and indirect binding. I will present a novel algorithm for analyzing in vivo TF binding data that distinguishes direct from indirect TF-DNA interactions by combining the in vivo data with nucleosome occupancy data and in vitro DNA binding motifs in a principled statistical framework.


Combining different types of biological data is also key to understanding how TFs cooperate or compete to achieve their regulatory functions. My future research will focus on finding regulatory modules that are bound combinatorially by sets of TFs, and on understanding how TFs with similar DNA binding specificities compete for genomic sites in vivo and regulate different sets of genes. To achieve these goals, I will design and apply computational methods that combine TF binding data, DNA accessibility data, histone modification data, evolutionary conservation data, and potentially other types of information relevant to DNA binding and transcriptional regulation.


B17 Upson Hall

Thursday, February 24, 2011

Refreshments at 3:45pm in the Upson 4th Floor Atrium


Computer Science


Spring 2011


Combining Different Types of Biological Data to Identify Regulatory Interactions Between Proteins and DNA