L. Xue et al., Database searching for compounds with similar biological activity using short binary bit string representations of molecules, J CHEM INF, 39(5), 1999, pp. 881-886
Number of citations
32
Subject Categories
Chemistry
Journal title
JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES
In an effort to identify biologically active molecules in compound
databases, we have investigated similarity searching using short binary bit
strings with a maximum of 54 bit positions. These "minifingerprints" (MFPs)
were designed to account for the presence or absence of structural fragments
and/or aromatic character, flexibility, and hydrogen-bonding capacity of
molecules. MFP design was based on an analysis of distributions of molecular
descriptors and structural fragments in two large compound collections. The
performance of different MFPs and a reference fingerprint was tested by
systematic "one-against-all" similarity searches of molecules in a database
containing 364 compounds with different biological activities. For each
fingerprint, the most effective similarity cutoff value was determined. An
MFP accounting for only 32 structural fragments showed less than 2% false
positive similarity matches and correctly assigned on average approximately
40% of the compounds with the same biological activity to a query molecule.
Inclusion of three numerical two-dimensional (2D) molecular descriptors
increased the performance by 15%. This MFP performed better than a complex
2D fingerprint. At a similarity cutoff value of 0.85, the 2D fingerprint
totally eliminated false positives but recognized less than 10% of the
compounds within the same activity class.
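
The "one-against-all" search with a similarity cutoff described in the abstract can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not name the similarity metric, so the Tanimoto coefficient is assumed here, and the function names, the toy bit strings, and the way the two rates are averaged are all hypothetical rather than taken from the paper.

```python
from typing import List, Tuple


def tanimoto(a: int, b: int) -> float:
    """Tanimoto coefficient of two bit strings stored as Python ints.

    Assumption: the abstract does not state which similarity metric was used.
    """
    common = bin(a & b).count("1")   # bits set in both fingerprints
    total = bin(a | b).count("1")    # bits set in at least one fingerprint
    return common / total if total else 0.0


def one_against_all(fps: List[int], labels: List[str],
                    cutoff: float) -> Tuple[float, float]:
    """Use every molecule once as the query against all others.

    Returns (mean fraction of same-class compounds recovered,
             mean fraction of other-class compounds wrongly matched).
    Queries with no same-class (or no other-class) partners are skipped
    in the corresponding average.
    """
    recalls, fp_rates = [], []
    for i, query in enumerate(fps):
        same = other = hit_same = hit_other = 0
        for j, candidate in enumerate(fps):
            if i == j:
                continue
            is_hit = tanimoto(query, candidate) >= cutoff
            if labels[j] == labels[i]:
                same += 1
                hit_same += is_hit
            else:
                other += 1
                hit_other += is_hit
        if same:
            recalls.append(hit_same / same)
        if other:
            fp_rates.append(hit_other / other)
    mean = lambda xs: sum(xs) / len(xs) if xs else 0.0
    return mean(recalls), mean(fp_rates)


# Toy 4-bit fingerprints with made-up activity classes (not data from the paper).
toy_fps = [0b1111, 0b1110, 0b0001, 0b0011]
toy_labels = ["A", "A", "B", "B"]
recall, false_positive_rate = one_against_all(toy_fps, toy_labels, cutoff=0.7)
```

Repeating such a search over a range of cutoff values and comparing the two rates is one way the "most effective similarity cutoff value" for a fingerprint could be chosen. A higher cutoff lowers the false positive rate but also the fraction of active compounds recovered.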