Motivation: Statistical models of protein families, such as position-specific scoring matrices, profiles and hidden Markov models, have been used effectively to find remote homologs when given a set of known protein family members. Unfortunately, training these models typically requires a relatively large set of training sequences. Recent work (Grundy, J. Comput. Biol., 5, 479-492, 1998) has shown that, when only a few family members are known, several theoretically justified statistical modeling techniques fail to provide homology detection performance on a par with Family Pairwise Search (FPS), an algorithm that combines scores from a pairwise sequence similarity algorithm such as BLAST.
Results: The present paper provides a model-based algorithm that improves FPS by incorporating hybrid motif-based models of the form generated by Cobbler (Henikoff and Henikoff, Protein Sci., 6, 698-705, 1997). For the 73 protein families investigated here, this cobbled FPS algorithm provides better homology detection performance than either Cobbler or FPS alone. This improvement is maintained when BLAST is replaced with the full Smith-Waterman algorithm.