J.D. Thompson et al., Improved sensitivity of profile searches through the use of sequence weights and gap excision, Computer Applications in the Biosciences, 10(1), 1994, pp. 19-29
Position-specific substitution matrices, known as profiles, derived from multiple sequence alignments are currently used to search sequence databases for distantly related members of protein families. The performance of these database searches is improved by using (i) a sequence weighting scheme that assigns higher weights to more distantly related sequences, based on branch lengths derived from phylogenetic trees, (ii) exclusion of positions consisting mainly of padding (gap) characters at sites of insertions or deletions, and (iii) the BLOSUM62 residue comparison matrix. A natural consequence of these modifications is an improvement in the alignment of new sequences to the profiles. However, the accuracy of the alignments can be increased further by employing a similarity residue comparison matrix. These developments are implemented in a program called PROFILEWEIGHT, which runs on Unix and VAX computers. The only input the program requires is the multiple sequence alignment; its output is a profile designed to be used by existing searching and alignment programs. Test results from database searches with four different families of proteins show the improved sensitivity of the weighted profiles.
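The abstract names three ingredients (tree-based sequence weights, gap excision, and a substitution matrix) but gives no formulas. The sketch below shows one way they could fit together, under explicit assumptions. Weights use a simple branch-sharing rule: each branch's length is split equally among the sequences beneath it, similar in spirit to the tree-based weights in CLUSTAL W. "Mainly padding characters" is read as "more than half gaps" via a hypothetical `max_gap_fraction=0.5`. Each profile entry is an assumed weighted average of substitution scores over the residues observed in that column. A toy three-residue table stands in for the full BLOSUM62 matrix. `Node`, `tree_weights`, `excise_gappy_columns` and `weighted_profile` are hypothetical names. This is not PROFILEWEIGHT's code or its exact formulas.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

GAP = "-"


@dataclass
class Node:
    """Rooted tree node; `length` is the branch length to the parent."""
    length: float = 0.0
    name: Optional[str] = None
    children: List["Node"] = field(default_factory=list)


def count_leaves(node: Node) -> int:
    if not node.children:
        return 1
    return sum(count_leaves(c) for c in node.children)


def tree_weights(root: Node) -> Dict[str, float]:
    """Assumed rule: split each branch's length equally among the leaves
    below it; a sequence's weight is the sum of its shares up to the root."""
    weights: Dict[str, float] = {}

    def walk(node: Node, acc: float) -> None:
        acc += node.length / count_leaves(node)
        if not node.children:
            weights[node.name] = acc
        for child in node.children:
            walk(child, acc)

    walk(root, 0.0)
    return weights


def excise_gappy_columns(aln: Dict[str, str], max_gap_fraction: float = 0.5) -> List[int]:
    """Keep the indices of columns whose gap fraction is at most the
    (hypothetical) threshold; mostly-gap columns are dropped."""
    seqs = list(aln.values())
    keep = []
    for i in range(len(seqs[0])):
        gaps = sum(1 for s in seqs if s[i] == GAP)
        if gaps / len(seqs) <= max_gap_fraction:
            keep.append(i)
    return keep


def weighted_profile(aln, weights, matrix, alphabet, max_gap_fraction=0.5):
    """For each kept column, score residue b as the weighted mean of
    matrix[a][b] over the residues a in that column (gaps are skipped)."""
    profile = []
    for i in excise_gappy_columns(aln, max_gap_fraction):
        present = [(n, s[i]) for n, s in aln.items() if s[i] != GAP]
        total = sum(weights[n] for n, _ in present)
        row = {}
        for b in alphabet:
            score = sum(weights[n] * matrix[a][b] for n, a in present)
            row[b] = score / total if total > 0 else 0.0
        profile.append(row)
    return profile


# Toy three-residue scoring table standing in for BLOSUM62.
MATRIX = {
    "A": {"A": 4, "L": -1, "V": 0},
    "L": {"A": -1, "L": 4, "V": 1},
    "V": {"A": 0, "L": 1, "V": 4},
}

# Tree ((s1:0.1, s2:0.1):0.2, s3:0.5): s3 is the most distant sequence.
root = Node(children=[
    Node(length=0.2, children=[Node(0.1, "s1"), Node(0.1, "s2")]),
    Node(length=0.5, name="s3"),
])
aln = {"s1": "AL-V", "s2": "AL-V", "s3": "VLAV"}

weights = tree_weights(root)
print(weights)                    # {'s1': 0.2, 's2': 0.2, 's3': 0.5}
print(excise_gappy_columns(aln))  # [0, 1, 3]
for row in weighted_profile(aln, weights, MATRIX, "ALV"):
    print({b: round(v, 3) for b, v in row.items()})
```

In this toy case the two closely related sequences s1 and s2 share the weight of their common branch, so the more distant s3 ends up with a larger weight and pulls column 0 of the profile toward V. Column 2 is two-thirds gaps and is excised.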