This paper describes the general design and application of CerBeruS, a
computer-based system for supporting the process of sequential screening.
CerBeruS stands for cluster-based selection, with cluster analysis forming
the pivotal part of the system. CerBeruS uses Ward's clustering method to
partition the data set to be screened into smaller, more homogeneous
subsets. One representative is picked from each subset and suggested as a
screening candidate. Although the number of compounds submitted to
screening is
most often driven by the capacity of the assay, CerBeruS provides a
statistical measure for determining the optimal number of clusters in the
data set.
This measure forms a point of reference for all screening experiments.
Different hierarchies of subsets are stored in an Oracle database. Information
about the size and content of a cluster can be retrieved from this database
via a Visual Basic application. How these components work together in the
CerBeruS system is demonstrated on a large data set. In addition, we show
that, using the statistical measure, one can find an optimal trade-off
between screening effort and number of hits.
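
The cluster-then-select idea described above can be illustrated with a minimal sketch. This is not the CerBeruS implementation: it is a naive pure-Python version of Ward's merging criterion, and the function names, the toy descriptor values, and the rule for picking a representative (the member closest to the cluster centroid) are all assumptions made for illustration.

```python
from itertools import combinations


def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    na, nb = len(a), len(b)
    return na * nb / (na + nb) * sq_dist(centroid(a), centroid(b))


def ward_clusters(points, k):
    """Naive agglomerative Ward clustering down to k clusters (O(n^3))."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        # Merge the pair whose union increases the total variance least.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: ward_cost(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return clusters


def representatives(clusters):
    """Assumed selection rule: the member closest to its cluster centroid."""
    return [min(c, key=lambda p: sq_dist(p, centroid(c))) for c in clusters]


# Hypothetical 2-D descriptors for seven compounds (illustrative only).
descriptors = [
    (0.0, 0.0), (0.2, 0.0), (0.1, 0.1),
    (5.0, 5.0), (5.0, 5.4), (5.2, 5.2),
    (10.0, 0.0),
]
clusters = ward_clusters(descriptors, k=3)
candidates = representatives(clusters)
```

With these toy descriptors and k = 3, the two tight groups and the isolated point end up in separate clusters, and one screening candidate is proposed per cluster. In practice k would be bounded by assay capacity or chosen with a statistical criterion such as the one the paper refers to, which this sketch does not attempt to reproduce.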