Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis

Authors
J. Castresana
Citation
J. Castresana, Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis, MOL BIOL EV, 17(4), 2000, pp. 540-552
Number of citations
57
Subject categories
Biology, Experimental Biology
Journal title
MOLECULAR BIOLOGY AND EVOLUTION
ISSN journal
0737-4038
Volume
17
Issue
4
Year of publication
2000
Pages
540 - 552
Database
ISI
SICI code
0737-4038(200004)17:4<540:SOCBFM>2.0.ZU;2-G
Abstract
The use of some multiple-sequence alignments in phylogenetic analysis, particularly those that are not very well conserved, requires the elimination of poorly aligned positions and divergent regions, since they may not be homologous or may have been saturated by multiple substitutions. A computerized method that eliminates such positions and at the same time tries to minimize the loss of informative sites is presented here. The method is based on the selection of blocks of positions that fulfill a simple set of requirements with respect to the number of contiguous conserved positions, lack of gaps, and high conservation of flanking positions, making the final alignment more suitable for phylogenetic analysis. To illustrate the efficiency of this method, alignments of 10 mitochondrial proteins from several completely sequenced mitochondrial genomes belonging to diverse eukaryotes were used as examples. The percentages of removed positions were higher in the most divergent alignments. After removing divergent segments, the amino acid composition of the different sequences was more uniform, and pairwise distances became much smaller. Phylogenetic trees show that topologies can be different after removing conserved blocks, particularly when there are several poorly resolved nodes. Strong support was found for the grouping of animals and fungi but not for the position of more basal eukaryotes. The use of a computerized method such as the one presented here reduces to a certain extent the necessity of manually editing multiple alignments, makes the automation of phylogenetic analysis of large data sets feasible, and facilitates the reproduction of the final alignment by other researchers.
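
The block-selection criteria named in the abstract (contiguous conserved positions, lack of gaps, and highly conserved flanking positions) can be illustrated with a minimal Python sketch. This is a simplified illustration, not the published program: the function names (classify_columns, select_blocks, extract_blocks), the thresholds (conserved_frac, highly_frac), and the limits (max_nonconserved, min_block) are all hypothetical values chosen for the example, and the abstract does not specify how the actual method sets or applies them.

```python
from collections import Counter
from itertools import groupby


def classify_columns(alignment, conserved_frac=0.5, highly_frac=0.85):
    """Label each column: 'G' gap, 'H' highly conserved, 'C' conserved, 'N' nonconserved.

    The two fractions are illustrative assumptions, not values from the paper.
    """
    n = len(alignment)
    labels = []
    for col in zip(*alignment):
        if "-" in col:
            labels.append("G")  # any gap disqualifies the column
            continue
        top = Counter(col).most_common(1)[0][1]  # count of the commonest residue
        if top / n >= highly_frac:
            labels.append("H")
        elif top / n > conserved_frac:
            labels.append("C")
        else:
            labels.append("N")
    return labels


def select_blocks(labels, max_nonconserved=2, min_block=3):
    """Return half-open (start, end) column ranges of the retained blocks."""
    # Drop gap columns and runs of nonconserved columns that are too long.
    keep = [lab != "G" for lab in labels]
    pos = 0
    for lab, run in groupby(labels):
        length = len(list(run))
        if lab == "N" and length > max_nonconserved:
            keep[pos:pos + length] = [False] * length
        pos += length

    blocks = []
    pos = 0
    for kept, run in groupby(keep):
        start, end = pos, pos + len(list(run))
        pos = end
        if not kept:
            continue
        # Trim each candidate block until both flanks are highly conserved.
        while start < end and labels[start] != "H":
            start += 1
        while end > start and labels[end - 1] != "H":
            end -= 1
        # Discard blocks that are too short after trimming.
        if end - start >= min_block:
            blocks.append((start, end))
    return blocks


def extract_blocks(alignment, blocks):
    """Concatenate the retained column ranges for every sequence."""
    return ["".join(seq[s:e] for s, e in blocks) for seq in alignment]
```

Calling classify_columns on a list of equal-length aligned strings, passing the labels to select_blocks, and then passing the resulting ranges to extract_blocks yields a trimmed alignment. Because the outcome depends only on the input and the stated parameters, the same trimmed alignment can be regenerated by anyone, which is the reproducibility benefit the abstract describes.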