Short sequence insertions and deletions (indels) account for as much sequence diversity in mammalian genomes as nucleotide substitutions do, if not more. Despite this, indel mutations have received relatively little attention. Instead, they have been treated as “nuisance” mutations, to be removed by sequence alignment so that substitution patterns may be studied.
I will try to convince the audience that indels are worthy of our attention. First, although inferring indel events from sequence data (by alignment) is an old problem, I will show that a statistical approach leads to new insights and better algorithms. Next, I will talk about a new way to infer indel rates, leading to the conclusion that indels are more prevalent than previously appreciated. Finally, I will show that indels can be put to good use: using a model for their distribution under neutrality, I will identify a good fraction of the evolutionarily conserved DNA in the human genome. In addition, the model can be used to estimate the total amount of such conserved DNA.
Preliminary results suggest that much more of our genome is conserved than the currently accepted estimate of 5% implies.