G. Vogt et al., AN ASSESSMENT OF AMINO-ACID EXCHANGE MATRICES IN ALIGNING PROTEIN SEQUENCES - THE TWILIGHT ZONE REVISITED, Journal of Molecular Biology, 249(4), 1995, pp. 816-831
The sensitivity of most protein sequence alignment methods depends
strongly on the quality of the comparison matrices used. These matrices,
which assign weights or similarity scores to every possible amino acid
substitution pair, are utilized to differentiate amongst the various
possible alignments of two or more sequences. There are many ways to
generate these exchange weights and new matrices are constantly
published. There has been no overall assessment of these various
matrices when applied in different alignment techniques and over many
protein folds and families, both close and distant and with the use of
several gap
penalty values. In this work, a set of amino acid sequences matched
by superposition of known protein tertiary topologies is used to test
the alignment accuracy of the different method/matrix/penalty
combinations. The comparisons show relatively similar results for the
top scoring matrices, a preference for the global alignment method of
Needleman
and Wunsch, and the importance of matrix modification and optimized
gap penalties. The relationship between the percentage identity in a
resulting alignment and the level of correctness to be expected is
given for the top-performing matrix, resulting in a better definition
of the so-called "twilight zone". Estimates are made for the probability
that two sequences, aligned at a certain level of residue percentage
identity, are in fact unrelated.
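
For orientation, the method/matrix/penalty combination described above can be sketched as follows. This is a minimal illustration, not the procedure used in the paper: the function names (needleman_wunsch, percent_identity, toy_score) are hypothetical, the scoring function is a toy match/mismatch scheme rather than any of the exchange matrices assessed, the gap penalty is a simple linear one, and the percentage-identity definition (identical residues over ungapped columns) is an assumption, since the abstract does not state which definition was used.

```python
def toy_score(x, y):
    # Hypothetical stand-in for an exchange matrix: +2 for identity, -1 otherwise.
    return 2 if x == y else -1


def needleman_wunsch(a, b, score=toy_score, gap=2):
    """Global alignment with a linear gap penalty (integer scores assumed).

    Returns (total_score, aligned_a, aligned_b), with '-' marking gaps.
    """
    n, m = len(a), len(b)
    # dp[i][j] holds the best score for aligning a[:i] with b[:j].
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = -gap * i
    for j in range(1, m + 1):
        dp[0][j] = -gap * j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + score(a[i - 1], b[j - 1]),  # pair residues
                dp[i - 1][j] - gap,                            # gap in b
                dp[i][j - 1] - gap,                            # gap in a
            )
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + score(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] - gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return dp[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aligned_a, aligned_b):
    # Assumed definition: identical residues divided by ungapped columns.
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


if __name__ == "__main__":
    total, top, bottom = needleman_wunsch("HEAGAWGHEE", "PAWHEAE")
    print(total)
    print(top)
    print(bottom)
    print(percent_identity(top, bottom))
```

Passing a different score function or gap value corresponds to changing the matrix or penalty in a method/matrix/penalty combination; a real exchange matrix would replace toy_score with a lookup over all amino acid pairs.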