R.P. Sheridan and S.K. Kearsley, Using a Genetic Algorithm to Suggest Combinatorial Libraries, Journal of Chemical Information and Computer Sciences, 35(2), 1995, pp. 310-320
Number of citations
13
Subject categories
Information Science & Library Science; Computer Application, Chemistry & Engineering; Computer Science Interdisciplinary Applications; Chemistry; Computer Science Information Systems
In combinatorial synthesis, molecules are assembled by linking chemically similar fragments. Since the number of available chemical fragments often greatly exceeds the number of distinct fragments that can be used in one synthetic experiment, choosing a subset of fragments becomes problematical. For example, only a few dozen distinct primary and secondary amines have ever been reported to have been used in constructing a library of peptoids (oligomers of N-substituted glycine), while there are several thousand suitable primary and secondary amines that are commercially available. If a combinatorial library is to be constructed with a particular biological activity in mind, computer-based structure-activity methods can be used to rationally select a subset of fragments. In principle one would computationally generate every possible molecule as a combination of fragments, score each molecule by the likelihood of its being active, and select those fragments that occur in high-scoring molecules. For many cases there are too many combinations to take this exhaustive approach, but genetic algorithms can be used to quickly find high-scoring molecules by sampling a small subset of the total combinatorial space. In this paper we demonstrate how a genetic algorithm is used to select a subset of amines for the construction of a tripeptoid library. We show three examples. In the first example, the scoring is based on the similarity of the tripeptoids to a specific tripeptoid target. Since the target itself can be generated in this example, we have an opportunity to experiment with the protocol of our genetic algorithm. In the second example, scoring is based on the similarity to two tetrapeptide CCK antagonists. In the third, scoring is done by a trend vector derived from activity data on ACE inhibitors. In all cases we show that the genetic algorithm can find, in a modest amount of computer time, high-scoring peptoids that resemble the targets.
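
The selection scheme the abstract describes can be illustrated with a minimal Python sketch. It is not the authors' protocol: the encoding (a tripeptoid as a tuple of three amine indices), truncation selection, one-point crossover, the mutation rate, and the one-number-per-amine descriptor used for the toy similarity score are all assumptions made for illustration, and the names run_ga, suggest_amines and make_similarity_score are hypothetical.

```python
import random
from collections import Counter


def make_similarity_score(desc, target):
    """Toy stand-in for a similarity score: closer descriptors score higher.

    `desc` holds one hypothetical numeric descriptor per amine; `target` is
    a tuple of three amine indices. Real scoring would use structural
    descriptors, which are not modelled here.
    """
    def score(mol):
        return -sum(abs(desc[x] - desc[t]) for x, t in zip(mol, target))
    return score


def run_ga(n_amines, score, pop_size=60, generations=40,
           mutation_rate=0.1, seed=0):
    """Evolve tripeptoids; return the final population, best first."""
    rng = random.Random(seed)
    pop = [tuple(rng.randrange(n_amines) for _ in range(3))
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        parents = pop[: pop_size // 2]        # keep the better half (assumed)
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randint(1, 2)           # one-point crossover
            child = list(a[:cut] + b[cut:])
            for i in range(3):                # point mutation: swap in a random amine
                if rng.random() < mutation_rate:
                    child[i] = rng.randrange(n_amines)
            children.append(tuple(child))
        pop = parents + children
    pop.sort(key=score, reverse=True)
    return pop


def suggest_amines(pop, top_n, k):
    """Pick up to k amines that occur most often in the top_n molecules."""
    counts = Counter(a for mol in pop[:top_n] for a in set(mol))
    return [a for a, _ in counts.most_common(k)]
```

Because the better half of each generation is carried over unchanged, the best score never decreases from one generation to the next. The amines returned by suggest_amines play the role of the fragment subset suggested for the library; only a small part of the n_amines**3 combinatorial space is ever scored.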