Data mining in genome sequences can identify distant homologues of known
protein families, and is most powerful if solved structures are available to
reveal the three-dimensional implications of very dissimilar sequences. Here
we describe putative serpin sequences identified with very high statistical
significance in the Caenorhabditis elegans genome. When mapped onto
vertebrate serpins such as alpha(1)-antitrypsin, they suggest novel
structural features. Some appear complete, some show extensive deletions, and
others appear to contain only the C-terminal part of the known serpin fold,
probably in partnership with N-terminal regions that have conformations
unlike those of known serpins. The observation of such striking sequence
similarity, in proteins that must have significantly different overall
structures, substantially extends the structural characteristics of the
serpin family of proteins. Proteins 1999;36:31-41. © 1999 Wiley-Liss, Inc.