Isaac Newton Institute for Mathematical Sciences

Mathematical and Statistical Aspects of Molecular Biology

Base frequency parameters in statistical phylogenetic inference

Authors: Vivek Gowri-Shankar (Department of Computer Science), Magnus Rattray (Department of Computer Science)

Abstract

Likelihood-based molecular phylogenetic inference methods are intrinsically linked to a probabilistic model that describes the process of biological sequence evolution. Usually, each site in a sequence alignment is considered independently, and parametric Markov models are commonly used to compute the probability of one base being replaced by another over time. The parameters of these Markov processes are closely linked to the chemical properties of nucleotides and synthesized molecules and to the biological constraints acting on them. For instance, base frequency parameters describe the frequencies of the different bases, which are assumed to be homogeneous over evolutionary time spans and across sequence sites. Perhaps the most important development in statistical phylogenetic inference over the past ten years is the incorporation of among-site rate variation into substitution models. We show here that the assumption of homogeneous composition among sites is clearly violated: base frequencies can differ markedly between sites, and these differences are correlated with the substitution rate at each site. Possible effects of this model misspecification on the estimation of evolutionary parameters are suggested. A previous study that sought to infer the environmental temperature of the common ancestor of all life forms from the G+C content of its rRNA sequences is challenged in light of this finding.
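
To make the role of base frequency parameters concrete, the following is a minimal sketch of substitution probabilities under the F81 model, one of the simplest Markov models with unequal base frequencies. The abstract does not say which substitution model the authors use, so the choice of F81, the function name `f81_transition_matrix`, the `rate` multiplier standing in for a per-site rate, and the example frequencies are all assumptions made for illustration only.

```python
import math

BASES = "ACGT"  # row/column order used for the matrices below


def f81_transition_matrix(pi, t, rate=1.0):
    """Return the 4x4 matrix P with P[i][j] = Pr(base j after time t | base i now).

    Assumes the F81 model: the chance of changing to base j is proportional
    to its stationary frequency pi[j]. `rate` is a hypothetical per-site
    multiplier illustrating among-site rate variation.
    """
    if len(pi) != 4 or abs(sum(pi) - 1.0) > 1e-9:
        raise ValueError("pi must hold four frequencies summing to 1")
    # Scale so that one unit of time is one expected substitution per site.
    beta = 1.0 / (1.0 - sum(p * p for p in pi))
    decay = math.exp(-beta * rate * t)
    return [
        [decay * (1.0 if i == j else 0.0) + (1.0 - decay) * pi[j]
         for j in range(4)]
        for i in range(4)
    ]


# Illustrative G+C-rich composition (made-up values, not taken from the study).
pi = [0.1, 0.4, 0.4, 0.1]
P_slow = f81_transition_matrix(pi, t=0.5, rate=0.2)  # slowly evolving site
P_fast = f81_transition_matrix(pi, t=0.5, rate=3.0)  # fast evolving site
```

In this sketch every site shares the same `pi` and only `rate` differs between sites, which is exactly the homogeneity assumption the abstract calls into question. A model in which composition varies with rate would need a separate frequency vector for each rate class.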