The growth in protein sequence data has placed a premium on ways to infer
structure and function of the newly sequenced proteins. One of the most
effective ways is to identify a homologous relationship with a protein
about which more is known. While close evolutionary relationships can be
confidently determined with standard methods, the difficulty increases as
the relationships become more distant. All of these methods rely on some
score function to measure sequence similarity. The choice of score
function is especially critical for these distant relationships. We
describe a new method of determining a score function, optimizing the
ability to discriminate between homologs and non-homologs. We find that
this new score function performs better than standard score functions for
the identification of distant homologies. Proteins 2000;41:498-503.
© 2000 Wiley-Liss, Inc.