Large-scale comparison of protein sequence alignment algorithms with structure alignments

Citation
Jm. Sauder et al., Large-scale comparison of protein sequence alignment algorithms with structure alignments, PROTEINS, 40(1), 2000, pp. 6-22
Number of citations
61
Subject Categories
Biochemistry & Biophysics
Journal title
PROTEINS-STRUCTURE FUNCTION AND GENETICS
ISSN journal
0887-3585
Volume
40
Issue
1
Year of publication
2000
Pages
6 - 22
Database
ISI
SICI code
0887-3585(20000701)40:1<6:LCOPSA>2.0.ZU;2-6
Abstract
Sequence alignment programs such as BLAST and PSI-BLAST are used routinely in pairwise, profile-based, or intermediate-sequence-search (ISS) methods to detect remote homologies for the purposes of fold assignment and comparative modeling. Yet the sequence alignment quality of these methods at low sequence identity is not known. We have used the CE structure alignment program (Shindyalov and Bourne, Prot Eng 1998;11:739) to derive sequence alignments for all superfamily and family-level related proteins in the SCOP domain database. CE aligns structures and their sequences based on distances within each protein, rather than on interprotein distances. We compared BLAST, PSI-BLAST, CLUSTALW, and ISS alignments with the CE structural alignments. We found that global alignments with CLUSTALW were very poor at low sequence identity (<25%), as judged by the CE alignments. We used PSI-BLAST to search the nonredundant sequence database (nr) with every sequence in SCOP using up to four iterations. The resulting matrix was used to search a database of SCOP sequences. PSI-BLAST is only slightly better than BLAST in alignment accuracy on a per-residue basis, but PSI-BLAST matrix alignments are much longer than BLAST's, and so align correctly a larger fraction of the total number of aligned residues in the structure alignments. Any two SCOP sequences in the same superfamily that shared a hit or hits in the nr PSI-BLAST searches were identified as linked by the shared intermediate sequence. We examined the quality of the longest SCOP-query/SCOP-hit alignment via an intermediate sequence, and found that ISS produced longer alignments than PSI-BLAST searches alone, of nearly comparable per-residue quality. At 10-15% sequence identity, BLAST correctly aligns 28%, PSI-BLAST 40%, and ISS 46% of residues according to the structure alignments. We also compared CE structure alignments with FSSP structure alignments generated by the DALI program.
In contrast to the sequence methods, CE and structure alignments from the FSSP database identically align 75% of residue pairs at the 10-15% level of sequence identity, indicating that there is substantial room for improvement in these sequence alignment methods. BLAST produced alignments for 8% of the 10,665 nonimmunoglobulin SCOP superfamily sequence pairs (nearly all <25% sequence identity), PSI-BLAST matched 17% and the double-PSI-BLAST ISS method aligned 38% with E-values <10.0. The results indicate that intermediate sequences may be useful not only in fold assignment but also in achieving more complete sequence alignments for comparative modeling. (C) 2000 Wiley-Liss, Inc.
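The abstract distinguishes two ways of scoring a sequence alignment against a structure alignment: accuracy on a per-residue basis, and the fraction of all structurally aligned residues that are aligned correctly. The Python sketch below illustrates one plausible reading of that distinction. It is not the authors' code, and the abstract does not give their exact definitions. The function names `aligned_pairs` and `score_alignment`, the gapped-string input format, and the `start_a`/`start_b` offsets for local alignments are all assumptions introduced here for illustration.

```python
def aligned_pairs(row_a, row_b, start_a=0, start_b=0):
    """Return the set of (i, j) residue-index pairs aligned in two gapped rows.

    row_a and row_b are equal-length strings with '-' marking gaps.
    start_a and start_b are the 0-based positions in the full sequences
    where a local alignment begins (hypothetical convention).
    """
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    pairs = set()
    i, j = start_a, start_b
    for char_a, char_b in zip(row_a, row_b):
        has_a = char_a != "-"
        has_b = char_b != "-"
        if has_a and has_b:
            pairs.add((i, j))
        # Advance each sequence index only when that row holds a residue.
        if has_a:
            i += 1
        if has_b:
            j += 1
    return pairs


def score_alignment(seq_pairs, struct_pairs):
    """Score a sequence alignment against a structure alignment.

    Returns (accuracy, coverage), where
      accuracy = correct pairs / pairs in the sequence alignment
      coverage = correct pairs / pairs in the structure alignment
    Both definitions are assumptions made for this sketch.
    """
    correct = len(seq_pairs & struct_pairs)
    accuracy = correct / len(seq_pairs) if seq_pairs else 0.0
    coverage = correct / len(struct_pairs) if struct_pairs else 0.0
    return accuracy, coverage


if __name__ == "__main__":
    # Toy data, not taken from the paper.
    structure = aligned_pairs("ACDEFGH", "AC-EFGH")
    short_local = aligned_pairs("ACDE", "AC-E")
    accuracy, coverage = score_alignment(short_local, structure)
    print(f"accuracy={accuracy:.2f} coverage={coverage:.2f}")
```

In this toy case the short local alignment agrees with the structure alignment at every position it covers, yet it recovers only half of the structurally aligned pairs. That matches the pattern the abstract reports: a longer alignment of similar per-residue quality aligns correctly a larger fraction of the structurally aligned residues.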