Motivation: The Sequence Search Algorithm Assessment and Testing Toolkit
(SAT) aims to be a complete package for the comparison of different protein
homology search algorithms. The structural classification of proteins can
provide us with a clear criterion for judgement in homology detection.
There have been several assessments based on structural sequences with
classifications, but a good deal of similar work is now being repeated with
locally developed procedures and programs. The SAT will provide developers
with a complete package which will save time and produce more comparable
performance assessments for search algorithms. The package is complete in
the sense that it provides a large non-redundant sequence resource
database, a well-characterized query database of protein domains, all the
parsers and some previous results from PSI-BLAST and a hidden Markov model
algorithm.
Results: An analysis of two different data sets was carried out using the
SAT package. It compared the performance of a full protein sequence
database (RSDB100) with a non-redundant representative sequence database
derived from it (RSDB50). The performance measurement indicated that the
full database is sub-optimal for a homology search. This result justifies
the use of the much smaller and faster RSDB50 rather than RSDB100 for the
SAT.