Abstract
Uri Keich: Estimating the significance of sequence motifs
Efficient and accurate statistical significance evaluation is an essential
requirement for motif-finding tools. One such widely used significance
criterion is the p-value of the motif's information content or entropy score.
The computation schemes currently used in popular motif-finding programs can
provide poor approximations without the user being aware of it. We present an
approach for fast and reliable estimation of this p-value that can also be
applied more generally. We then
show that in the context of twilight zone searches, or searches for relatively
weak motifs, the paradigm of relying on entropy scores and their p-values can
surprisingly lead to undesirable results. These findings lead us to consider
alternative approaches to analyzing the significance of motifs.
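
For readers unfamiliar with the score discussed above, the following is a minimal illustrative sketch, not the method presented in the talk. It assumes the common definition of the information content of an ungapped alignment, I = sum over columns i and letters j of n_ij * log((n_ij / n) / b_j), and an i.i.d. background model. The function names, the toy background, and the naive Monte Carlo estimator are all assumptions made for illustration; naive sampling of this kind is too slow for the very small p-values that arise in practice, which is part of why faster schemes are of interest.

```python
import math
import random
from collections import Counter


def information_content(alignment, background):
    """Entropy (information content) score of an ungapped alignment.

    alignment:  list of equal-length strings, one per aligned site.
    background: dict mapping each letter to its background frequency.
    Assumed definition: sum over columns and letters of
    n_ij * log((n_ij / n) / b_j), with n the number of sequences.
    """
    n = len(alignment)
    score = 0.0
    for column in zip(*alignment):
        for letter, count in Counter(column).items():
            score += count * math.log((count / n) / background[letter])
    return score


def monte_carlo_pvalue(observed, n_seqs, width, background,
                       trials=2000, seed=0):
    """Naive Monte Carlo estimate of P(score >= observed) for one random
    alignment drawn i.i.d. from the background (illustrative only)."""
    rng = random.Random(seed)
    letters = list(background)
    weights = [background[a] for a in letters]
    hits = 0
    for _ in range(trials):
        sample = ["".join(rng.choices(letters, weights=weights, k=width))
                  for _ in range(n_seqs)]
        # Small tolerance guards against floating-point rounding.
        if information_content(sample, background) >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (trials + 1)


# Hypothetical toy input: two perfectly conserved sites, uniform background.
uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
toy_alignment = ["AAAA", "AAAA"]
toy_score = information_content(toy_alignment, uniform)
toy_pvalue = monte_carlo_pvalue(toy_score, n_seqs=2, width=4,
                                background=uniform)
```

Note that this estimates the p-value of a single random alignment of the given size; it does not account for the search over many candidate alignments that a motif finder performs.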