A combinatorial method was developed to calculate complete distributions of
the Tanimoto coefficient (Tc) for binary fingerprint (FP) representations
of specified length, regardless of the chemical parameters they reflect.
Theoretical Tc distributions calculated for FPs of up to 67 bit positions
revealed significant statistical preferences for certain Tc values. Tc
distributions calculated in a large compound database using different FPs
mirrored the effects identified by our general analysis. These findings
indicate that an average Tc is biased by statistically preferred values.
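The idea of a complete, chemistry-independent Tc distribution for a given FP length can be illustrated with a short combinatorial sketch. It is an assumption of this example, not necessarily the published method, that every ordered pair of length-n bit strings is counted once with equal weight. Each pair is then summarized by x (bits set only in FP A), y (set only in B), z (set in both) and w = n - x - y - z (set in neither). The number of pairs sharing these counts is the multinomial coefficient n!/(x! y! z! w!), and the pair's Tc is z/(x + y + z). The pair of two all-zero FPs has an undefined Tc (0/0) and is skipped here, which is also an assumption. The names `tc_distribution` and `mean_tc` are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from math import factorial


def tc_distribution(n: int) -> Counter:
    """Count ordered pairs of length-n binary FPs by their exact Tc value.

    Hypothetical helper (sketch): every ordered pair of bit strings is
    weighted equally, and the undefined all-zero pair is excluded.
    """
    dist = Counter()
    for x in range(n + 1):                  # bits set only in A
        for y in range(n + 1 - x):          # bits set only in B
            for z in range(n + 1 - x - y):  # bits set in both A and B
                if x + y + z == 0:
                    continue  # both FPs empty: Tc = 0/0 is undefined
                w = n - x - y - z           # bits set in neither FP
                # Number of ordered pairs with exactly these counts.
                count = factorial(n) // (
                    factorial(x) * factorial(y) * factorial(z) * factorial(w)
                )
                # Tc = common bits / bits set in either FP, kept exact.
                dist[Fraction(z, x + y + z)] += count
    return dist


def mean_tc(dist: Counter) -> Fraction:
    """Exact average Tc over all pairs counted in `dist` (hypothetical helper)."""
    total = sum(dist.values())
    return sum(tc * c for tc, c in dist.items()) / total


if __name__ == "__main__":
    dist = tc_distribution(10)
    # Show the most frequently occurring Tc values for 10-bit FPs.
    for tc, count in dist.most_common(5):
        print(f"Tc = {float(tc):.3f}: {count} pairs")
    print("mean Tc:", float(mean_tc(dist)))
```

For n = 2, for example, the 15 countable pairs split into 8 with Tc = 0, 4 with Tc = 1/2 and 3 with Tc = 1, giving a mean Tc of 1/3. Because the loops run over counts rather than over individual bit strings, the cost grows roughly with n³ instead of 4ⁿ. This keeps lengths such as 67 bits tractable, and `Fraction` keys ensure that equal Tc values arising from different (x, y, z) combinations are pooled exactly.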