Assessment of methods for amino acid matrix selection and their use on empirical data shows that ad hoc assumptions for choice of matrix are not justified

Keane, Thomas and Creevey, Christopher and Pentony, Melissa and Naughton, Thomas J. and McInerney, James (2006) Assessment of methods for amino acid matrix selection and their use on empirical data shows that ad hoc assumptions for choice of matrix are not justified. BMC Evolutionary Biology, 6. pp. 29-46.

Abstract

Background: In recent years, model-based approaches such as maximum likelihood have become the methods of choice for constructing phylogenies. A number of authors have shown the importance of using adequate substitution models in order to produce accurate phylogenies. In the past, many empirical models of amino acid substitution have been derived using a variety of different methods and protein datasets. These matrices are normally used as surrogates, rather than deriving the maximum likelihood model from the dataset being examined. With few exceptions, selection between alternative matrices has been carried out in an ad hoc manner.

Results: We start by highlighting the potential dangers of arbitrarily choosing protein models by demonstrating an empirical example where a single alignment can produce two topologically different and strongly supported phylogenies using two different arbitrarily chosen amino acid substitution models. We demonstrate that in simple simulations, statistical methods of model selection are indeed robust and likely to be useful for protein model selection. We have investigated patterns of amino acid substitution among homologous sequences from the three Domains of life and our results show that no single amino acid matrix is optimal for any of the datasets. Perhaps most interestingly, we demonstrate that for two large datasets derived from the proteobacteria and archaea, one of the most favored models in both datasets is a model that was originally derived from retroviral Pol proteins.

Conclusion: This demonstrates that choosing protein models based on their source or method of construction may not be appropriate.

Item Type:	Article
Keywords:	Assessment; Amino acid matrix; Empirical data; Ad hoc assumptions
Academic Unit:	Faculty of Science and Engineering > Biology
Item ID:	916
Depositing User:	Dr. James McInerney
Date Deposited:	26 Feb 2008
Journal or Publication Title:	BMC Evolutionary Biology
Publisher:	BioMed Central Ltd
Refereed:	Yes
URI:	http://www.biomedcentral.com/1471-2148/6...
Use Licence:	This item is available under a Creative Commons Attribution Non Commercial Share Alike Licence (CC BY-NC-SA).

MURAL - Maynooth University Research Archive Library
