The Disk-Covering Method for Phylogenetic Tree Reconstruction

Tandy Warnow
Department of Computer Science
The University of Texas at Austin


Phylogenetic trees, also known as evolutionary trees, model the evolution of biological species or genes
from a common ancestor. Most computational problems associated with phylogenetic tree reconstruction are
very hard (specifically, they are NP-hard, and are practically hard, as real datasets can take years
of analysis, without provably optimal solutions being found). Finding ways of speeding up the
solutions to these problems is of major importance to systematic biologists. Other approaches take only
polynomial time and have provable performance guarantees under Markov models of evolution; however, our recent work shows
that the sequence lengths that suffice for these methods to be accurate with high probability grows exponentially
in the diameter of the underlying tree.

In this talk, we will describe new dataset decomposition techniques, called the Disk-Covering Methods, for phylogenetic tree reconstruction. This basic algorithmic technique uses interesting graph theory, and can be used to reduce the sequence length requirement of polynomial time methods, so that polynomial length sequences suffice for accuracy with high probability (instead of exponential). We also use this technique to speed up the solution of NP-hard optimization problems, such as maximum likelihood and maximum parsimony.