Motivation: The sizes of protein domains observed in the 3D-structure
database follow a surprisingly narrow distribution. Structural domains are
furthermore formed from a single-chain continuous segment in over 80% of
instances. These observations imply that some choices of domain boundaries
on an otherwise uncharacterized sequence are more likely than others, based
solely on the size and segment number of predicted domains. This property
might be used to guess the locations of protein domain boundaries.
Results: To test this possibility, we enumerate putative domain boundaries
and calculate their relative likelihood under a probability model that
considers only the size and segment number of predicted domains. In a
cross-validated test using sequences with known 3D structure, we ask
whether the most likely guesses agree with the observed domain structure.
We find that domain boundary predictions are surprisingly successful for
sequences up to 400 residues long, and that guessing domain boundaries in
this way can improve the sensitivity of threading analysis.
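
The enumerate-and-score idea described above can be illustrated with a minimal sketch. It is not the probability model used in this work: the normal size distribution, its parameters (MEAN_SIZE, SD_SIZE), the minimum domain size (MIN_SIZE) and the function names are all hypothetical choices made for illustration. As a further simplification, it considers only domains formed from one continuous segment, so the segment-number term is left out.

```python
import math

# Assumed, illustrative parameters (not taken from the source).
MEAN_SIZE = 150.0  # hypothetical typical domain size, in residues
SD_SIZE = 60.0     # hypothetical spread of domain sizes
MIN_SIZE = 30      # hypothetical smallest domain considered


def size_log_likelihood(size):
    """Log density of a normal distribution, a stand-in for the observed size distribution."""
    z = (size - MEAN_SIZE) / SD_SIZE
    return -0.5 * z * z - math.log(SD_SIZE * math.sqrt(2.0 * math.pi))


def enumerate_splits(length, max_domains=3):
    """Yield tuples of boundary positions cutting the chain into continuous domains."""
    def extend(start, boundaries):
        yield tuple(boundaries)
        if len(boundaries) + 1 >= max_domains:
            return
        # Every new cut leaves at least MIN_SIZE residues on both sides.
        for cut in range(start + MIN_SIZE, length - MIN_SIZE + 1):
            yield from extend(cut, boundaries + [cut])

    yield from extend(0, [])


def rank_splits(length, max_domains=3):
    """Return (relative likelihood, boundaries) pairs, most likely first."""
    scored = []
    for boundaries in enumerate_splits(length, max_domains):
        edges = (0,) + boundaries + (length,)
        sizes = [b - a for a, b in zip(edges, edges[1:])]
        scored.append((sum(size_log_likelihood(s) for s in sizes), boundaries))
    # Normalise over all enumerated splits; subtracting the top score
    # before exponentiating avoids numerical underflow.
    top = max(score for score, _ in scored)
    total = sum(math.exp(score - top) for score, _ in scored)
    return sorted(
        ((math.exp(score - top) / total, b) for score, b in scored),
        reverse=True,
    )
```

Calling rank_splits(400) returns every enumerated split of a 400-residue chain into at most three domains, ordered by relative likelihood under these assumed parameters. The top-ranked entries play the role of the "most likely guesses" that would be compared with the observed domain structure.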