A stochastic procedure for similarity searching in large virtual
combinatorial libraries is presented. The method avoids explicit enumeration
and calculation of descriptors for every virtual compound, yet provides an
optimal or nearly optimal similarity selection in a reasonable time frame.
It is based on the principle of probability sampling and the recognition
that each reagent is represented in a combinatorial library by multiple
products. The method proceeds in three stages. First, a small fraction of
the products is selected at random and ranked according to their similarity
against the query structure. The top-ranking compounds are then identified
and deconvoluted into a list of "preferred" reagents. Finally, all the
cross-products of these preferred reagents are enumerated in an exhaustive
manner, and systematically compared to the target to obtain the final
selection. This procedure has been applied to produce similarity selections
from several virtual combinatorial libraries, and the dependency of the
quality of the selections on several selection parameters has been analyzed.
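
The three stages described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes a toy two-component library in which each reagent carries a single numeric descriptor, a product is described by the pair of its reagents' descriptors, and similarity is taken as Euclidean distance to the query. The function name and the parameters sample_size, n_top, n_select and seed are all hypothetical.

```python
import itertools
import math
import random


def stochastic_similarity_search(reagents_a, reagents_b, query,
                                 sample_size, n_top, n_select, seed=0):
    """Toy three-stage selection; all names and parameters are assumptions."""
    rng = random.Random(seed)

    def rank_key(product):
        # Assumed descriptor: the pair of reagent values for this product.
        a, b = product
        dist = math.dist((reagents_a[a], reagents_b[b]), query)
        # The index pair breaks ties so the ordering is deterministic.
        return (dist, product)

    # Stage 1: draw random products (as reagent index pairs) instead of
    # enumerating the full library, then rank them against the query.
    sample = {(rng.randrange(len(reagents_a)), rng.randrange(len(reagents_b)))
              for _ in range(sample_size)}
    ranked = sorted(sample, key=rank_key)

    # Stage 2: deconvolute the top-ranking products into preferred reagents.
    top = ranked[:n_top]
    preferred_a = sorted({a for a, _ in top})
    preferred_b = sorted({b for _, b in top})

    # Stage 3: exhaustively enumerate the cross-products of the preferred
    # reagents and keep the products closest to the query.
    candidates = itertools.product(preferred_a, preferred_b)
    return sorted(candidates, key=rank_key)[:n_select]


# Usage: a 100 x 100 virtual library (10,000 products), sampling 500 of them.
reagents_a = [i / 10 for i in range(100)]
reagents_b = [i / 10 for i in range(100)]
selection = stochastic_similarity_search(
    reagents_a, reagents_b, query=(2.0, 7.0),
    sample_size=500, n_top=20, n_select=5)
```

Because every top-ranking sampled product is itself a cross-product of preferred reagents, the final selection in this sketch is never worse than the best products found in the random sample. How close it comes to the true optimum depends on the sample size and on the number of top-ranking products that are deconvoluted.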