Sequence alignment programs such as BLAST and PSI-BLAST are used routinely
in pairwise, profile-based, or intermediate-sequence-search (ISS) methods
to detect remote homologies for the purposes of fold assignment and
comparative modeling. Yet, the sequence alignment quality of these methods
at low sequence identity is not known. We have used the CE structure
alignment program (Shindyalov and Bourne, Prot Eng 1998;11:739) to derive
sequence alignments for all superfamily and family-level related proteins
in the SCOP domain database. CE aligns structures and their sequences based
on distances within each protein, rather than on interprotein distances. We
compared BLAST, PSI-BLAST, CLUSTALW, and ISS alignments with the CE
structural alignments. We found that global alignments with CLUSTALW were
very poor at low sequence identity (<25%), as judged by the CE alignments.
We used PSI-BLAST to
search the nonredundant sequence database (nr) with every sequence in SCOP
using up to four iterations. The resulting matrix was used to search a
database of SCOP sequences. PSI-BLAST is only slightly better than BLAST in
alignment accuracy on a per-residue basis, but PSI-BLAST matrix alignments
are
much longer than BLAST's, and so align correctly a larger fraction of the
total number of aligned residues in the structure alignments. Any two SCOP
sequences in the same superfamily that shared a hit or hits in the nr
PSI-BLAST searches were identified as linked by the shared intermediate
sequence. We examined the quality of the longest SCOP-query/SCOP-hit
alignment via
an intermediate sequence, and found that ISS produced longer alignments
than PSI-BLAST searches alone, of nearly comparable per-residue quality. At
10-15% sequence identity, BLAST correctly aligns 28%, PSI-BLAST 40%, and ISS
46% of residues according to the structure alignments. We also compared CE
structure alignments with FSSP structure alignments generated by the DALI
program. In contrast to the sequence methods, CE and structure alignments
from the FSSP database identically align 75% of residue pairs at the 10-15%
level of sequence identity, indicating that there is substantial room for
improvement in these sequence alignment methods. BLAST produced alignments
for 8% of the 10,665 nonimmunoglobulin SCOP superfamily sequence pairs
(nearly all <25% sequence identity), PSI-BLAST matched 17% and the
double-PSI-BLAST ISS method aligned 38% with E-values <10.0. The results
indicate that intermediate sequences may be useful not only in fold
assignment but also in
achieving more complete sequence alignments for comparative modeling.
(C) 2000 Wiley-Liss, Inc.
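
The abstract distinguishes two measures: alignment accuracy on a per-residue
basis, and the fraction of the structure-alignment residues that a sequence
method aligns correctly. The sketch below is one way these two quantities
could be computed. It is not the authors' code. It assumes that a residue
pair counts as correct only when the identical pair of positions appears in
the reference structure alignment. The function names, the gap character,
and the toy sequences are all hypothetical.

```python
from typing import Set, Tuple

Pair = Tuple[int, int]  # (residue index in sequence A, residue index in sequence B)


def aligned_pairs(row_a: str, row_b: str,
                  start_a: int = 0, start_b: int = 0,
                  gap: str = "-") -> Set[Pair]:
    """Return the set of residue-index pairs aligned by two gapped rows.

    start_a / start_b give the index of the first residue of each row in
    its full sequence, so that local alignments can be compared with a
    reference that covers the full-length sequences.
    """
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    pairs: Set[Pair] = set()
    i, j = start_a, start_b
    for ca, cb in zip(row_a, row_b):
        if ca != gap and cb != gap:
            pairs.add((i, j))
        # Advance each index only when its row holds a residue, not a gap.
        if ca != gap:
            i += 1
        if cb != gap:
            j += 1
    return pairs


def score_against_reference(test: Set[Pair],
                            reference: Set[Pair]) -> Tuple[float, float]:
    """Return (per_residue_accuracy, fraction_of_reference_recovered).

    per_residue_accuracy: correct pairs / pairs in the test alignment.
    fraction_of_reference_recovered: correct pairs / pairs in the reference.
    """
    correct = test & reference
    accuracy = len(correct) / len(test) if test else 0.0
    recovered = len(correct) / len(reference) if reference else 0.0
    return accuracy, recovered


# Toy data (invented): a 9-residue reference alignment without gaps, and a
# shorter local sequence alignment that starts at residue 2 of both sequences.
reference = aligned_pairs("ACDEFGHIK", "ACDEFGHIK")
test = aligned_pairs("DEF-G", "DEFGH", start_a=2, start_b=2)
accuracy, recovered = score_against_reference(test, reference)
```

In this toy case three of the four test pairs match the reference, so the
per-residue accuracy is 0.75, while only three of the nine reference pairs
are recovered. A longer alignment of similar per-residue accuracy would
raise the second number without changing the first, which is the pattern
the abstract reports for PSI-BLAST and ISS relative to BLAST.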