Abstract

Uri Keich: Estimating the significance of sequence motifs

Efficient and accurate evaluation of statistical significance is an essential requirement for motif-finding tools. One widely used significance criterion is the p-value of the motif's information content, or entropy score. The computation schemes currently used in popular motif-finding programs can unwittingly provide poor approximations of this p-value. We present an approach for estimating it quickly and reliably, and the approach can be applied more generally. We then show that in the context of twilight zone searches, that is, searches for relatively weak motifs, the paradigm of relying on entropy scores and their p-values can, surprisingly, lead to undesirable results. These findings lead us to consider alternative approaches to analyzing the significance of motifs.
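
To make the quantity concrete, the following is a minimal sketch of the entropy score of a gapless alignment and a naive Monte Carlo estimate of its p-value under an i.i.d. background model. It is not the estimation approach presented in the talk: the function names, the background model, and the sampling scheme are all assumptions made for illustration. The score is taken to be the sum over alignment columns of n * sum_a f_a * log(f_a / b_a), where n is the number of aligned sequences, f_a is the observed frequency of letter a in the column, and b_a is its background frequency.

```python
import math
import random
from collections import Counter


def entropy_score(alignment, background):
    """Information content of a gapless alignment (list of equal-length strings).

    Assumed definition: sum over columns of n * sum_a f_a * log(f_a / b_a).
    """
    n = len(alignment)
    width = len(alignment[0])
    score = 0.0
    for j in range(width):
        counts = Counter(seq[j] for seq in alignment)
        for letter, count in counts.items():
            # count * log(f_a / b_a), with f_a = count / n
            score += count * math.log((count / n) / background[letter])
    return score


def monte_carlo_pvalue(observed, n_seqs, width, background, trials=10000, seed=0):
    """Naive estimate of P(score >= observed) for a random alignment.

    Illustrative only: each random alignment is drawn letter by letter
    from the background distribution (an i.i.d. null model, assumed here).
    """
    rng = random.Random(seed)
    letters = list(background)
    weights = [background[a] for a in letters]
    hits = 0
    for _ in range(trials):
        sample = [
            "".join(rng.choices(letters, weights=weights, k=width))
            for _ in range(n_seqs)
        ]
        if entropy_score(sample, background) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (trials + 1)


if __name__ == "__main__":
    uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
    motif = ["ACGT", "ACGT", "ACGA", "ACGT"]
    s = entropy_score(motif, uniform)
    print(s, monte_carlo_pvalue(s, len(motif), 4, uniform, trials=2000))
```

Direct sampling of this kind cannot resolve p-values much smaller than 1/trials, which is one reason fast and accurate computation schemes matter for the very small p-values that arise in motif finding.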