A sequential approach for identifying lead compounds in large chemical databases

Authors

Abt, M; Lim, Y; Sacks, J; Xie, M; Young, SS

Citation

M. Abt et al., A sequential approach for identifying lead compounds in large chemical databases, STAT SCI, 16(2), 2001, pp. 154-168

Citations number

Subject categories

Mathematics

Journal title

STATISTICAL SCIENCE

ISSN journal

0883-4237

Volume

16

Issue

2

Year of publication

2001

Pages

154 - 168

Database

ISI

SICI code

0883-4237(200105)16:2<154:ASAFIL>2.0.ZU;2-9

Abstract

At the early stage of drug discovery, many thousands of chemical compounds can be synthesized and tested (assayed) for potency (activity) with high throughput screening (HTS). With ever-increasing numbers of compounds to be tested (now often in the neighborhood of 500,000) it remains a challenge to find strategies via sequential design that reduce costs while locating classes of active compounds.

Initial screening of a modest number of selected compounds (first-stage) is used to construct a structure-activity relationship (SAR). Based on this model, a second-stage sample is selected, the SAR updated and, if no more sampling is done, the activities of not yet tested compounds are predicted. Instead of stopping, the SAR could be used to determine another stage of sampling after which the SAR is updated and the process repeated.

We use existing data on the potency and chemical structure of 70,223 compounds to investigate various sequential testing schemes. Evidence on two assays supports the conclusion that a rather small number of samples selected according to the proposed scheme can more than triple the rate at which active compounds are identified and also produce SARs effective for identifying chemical structure. A different set of 52,883 compounds is used to confirm our findings.

One surprising conclusion of the study is that the design of the initial sample stage may be unimportant: random selection or systematic methods based on chemical structures are equally effective.
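
Illustrative sketch (not part of the original record): the staged procedure described in the abstract can be outlined in a few lines of Python. This is only a toy under stated assumptions. The abstract does not say which SAR model the authors fit, so a smoothed per-descriptor hit rate stands in for it here, and the names `fit_sar`, `predict`, `sequential_screen`, and the binary descriptor encoding are all hypothetical.

```python
import random


def fit_sar(features, results):
    """Toy SAR stand-in (an assumption, not the paper's model).

    Returns, for each descriptor bit, the smoothed fraction of tested
    compounds carrying that bit that were active.
    `results` maps compound index -> observed activity (True/False).
    """
    n_bits = len(features[0])
    hits = [0] * n_bits
    seen = [0] * n_bits
    for idx, active in results.items():
        for bit, present in enumerate(features[idx]):
            if present:
                seen[bit] += 1
                hits[bit] += int(active)
    # Laplace smoothing: a descriptor never seen in testing scores 0.5.
    return [(h + 1) / (s + 2) for h, s in zip(hits, seen)]


def predict(sar, compound):
    """Score a compound as the mean hit rate of the descriptors it carries."""
    rates = [rate for rate, present in zip(sar, compound) if present]
    return sum(rates) / len(rates) if rates else 0.5


def sequential_screen(features, assay, first_stage, stage_size, n_stages, seed=0):
    """Random first stage, then repeated SAR-guided stages.

    `assay(i)` is a hypothetical callable that returns the measured
    activity of compound i; it is called once per tested compound.
    Returns {compound index: observed activity} for everything tested.
    """
    rng = random.Random(seed)
    first = rng.sample(range(len(features)), first_stage)
    results = {i: assay(i) for i in first}
    for _ in range(n_stages):
        sar = fit_sar(features, results)  # update the SAR on all data so far
        untested = [i for i in range(len(features)) if i not in results]
        untested.sort(key=lambda i: predict(sar, features[i]), reverse=True)
        for i in untested[:stage_size]:  # assay the top-ranked compounds
            results[i] = assay(i)
    return results
```

After the last stage, `predict(fit_sar(features, results), features[i])` gives a score for any compound `i` that was never assayed, which corresponds to the prediction step the abstract mentions. The random first stage is used only because the abstract reports that random and structure-based initial designs performed equally well.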