An empirical study of the effect of sequence alignment on phylogenetic analysis
Seminar Room 1, Newton Institute
Abstract
Phylogenetic analyses start with a multiple sequence alignment, which is often accepted as known despite wide recognition that alignment errors may affect downstream phylogenetic analysis. Many phylogenetic methods involve testing which of a range of competing hypotheses best describes the evolution of a set of sequences. These tests may be justified statistically when using the correct alignment, but errors in the alignment lead to non-homologous characters being placed together, which in turn may systematically bias the test. We investigate empirically the impact of different alignment methods on phylogenetic analyses and assess the relative impact of the approximations used by different alignment methods. We examine the effect of alignment on two phylogenetic analyses that are commonly used in computational biology: the inference of a maximum-likelihood tree using RAxML, and a test for positive selection by comparing the M7 and M8 models in PAML. We test 200 sets of sequences from the Adaptive Evolution Database using the popular aligners ClustalW, Muscle, MAFFT, ProbCons, and the phylogenetic aligner Prank. We also sample from the posterior distribution of the statistical aligner BAli-Phy, which enables us to compare the impact of aligner choice with the uncertainty arising from a single aligner. The algorithmic basis of an aligner tends to determine the outcome of the phylogenetic analysis. For example, trees estimated from progressive aligners tend to be more similar to one another than those estimated from phylogenetically aware (Prank) or consensus (ProbCons) aligners. Moreover, the spread of phylogenetic parameter estimates inferred from BAli-Phy’s posterior distribution of alignments is much smaller than the differences between other aligners, suggesting that the differences between aligners are larger than would be expected by chance.
Of the aligners examined, our results suggest that the phylogenetically informed Prank provides the closest approximation to full statistical alignment.
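As an illustration of the positive-selection test mentioned above, the following minimal sketch compares M7 and M8 log-likelihoods with a likelihood ratio test. It assumes the common convention of referring twice the log-likelihood difference to a chi-square distribution with 2 degrees of freedom (M8 adds two free parameters to M7); the abstract does not state which reference distribution was used. The function name `lrt_m7_m8` and the log-likelihood values are hypothetical and are not results from the study.

```python
import math
from typing import Tuple


def lrt_m7_m8(lnl_m7: float, lnl_m8: float) -> Tuple[float, float]:
    """Return (statistic, p_value) for an M7 vs M8 likelihood ratio test.

    Assumption: the statistic 2 * (lnL_M8 - lnL_M7) is compared with a
    chi-square distribution with 2 degrees of freedom. For 2 degrees of
    freedom the survival function has the closed form exp(-x / 2).
    """
    # M7 is nested in M8, so a negative difference can only come from
    # optimisation error; clamp it to zero.
    stat = max(0.0, 2.0 * (lnl_m8 - lnl_m7))
    p_value = math.exp(-stat / 2.0)
    return stat, p_value


# Hypothetical (lnL_M7, lnL_M8) pairs for one gene under two alignments.
example_lnl = {
    "aligner_A": (-2345.10, -2340.20),
    "aligner_B": (-2360.75, -2360.10),
}

for name, (m7, m8) in example_lnl.items():
    stat, p = lrt_m7_m8(m7, m8)
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"{name}: 2*dlnL = {stat:.2f}, p = {p:.4f} ({verdict} at 5%)")
```

With made-up numbers like these, the same gene can pass or fail the test depending on which alignment is supplied, which is the kind of sensitivity the study sets out to measure.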