Database searching for compounds with similar biological activity using short binary bit string representations of molecules

Citation
L. Xue et al., Database searching for compounds with similar biological activity using short binary bit string representations of molecules, J. Chem. Inf. Comput. Sci., 39(5), 1999, pp. 881-886
Number of citations
32
Subject Categories
Chemistry
Journal title
JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES
Journal ISSN
0095-2338
Volume
39
Issue
5
Year of publication
1999
Pages
881 - 886
Database
ISI
SICI code
0095-2338(199909/10)39:5<881:DSFCWS>2.0.ZU;2-R
Abstract
In an effort to identify biologically active molecules in compound databases, we have investigated similarity searching using short binary bit strings with a maximum of 54 bit positions. These "minifingerprints" (MFPs) were designed to account for the presence or absence of structural fragments and/or aromatic character, flexibility, and hydrogen-bonding capacity of molecules. MFP design was based on an analysis of distributions of molecular descriptors and structural fragments in two large compound collections. The performance of different MFPs and a reference fingerprint was tested by systematic "one-against-all" similarity searches of molecules in a database containing 364 compounds with different biological activities. For each fingerprint, the most effective similarity cutoff value was determined. An MFP accounting for only 32 structural fragments showed less than 2% false positive similarity matches and correctly assigned on average approximately 40% of the compounds with the same biological activity to a query molecule. Inclusion of three numerical two-dimensional (2D) molecular descriptors increased the performance by 15%. This MFP performed better than a complex 2D fingerprint. At a similarity cutoff value of 0.85, the 2D fingerprint totally eliminated false positives but recognized less than 10% of the compounds within the same activity class.
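
Illustrative sketch (not part of the original record)
The "one-against-all" search with a similarity cutoff described in the abstract can be sketched roughly as follows. The abstract does not name the similarity metric, so the Tanimoto coefficient used here is an assumption, as are the function names (tanimoto, search, hit_rates), the integer encoding of bit strings, and the toy 8-bit fingerprints and activity classes, none of which come from the paper.

```python
from typing import Dict, List, Tuple


def tanimoto(a: int, b: int) -> float:
    """Tanimoto coefficient of two bit strings stored as Python ints.

    Assumption: the abstract does not state which metric was used.
    """
    common = bin(a & b).count("1")   # bits set in both fingerprints
    union = bin(a | b).count("1")    # bits set in at least one
    return common / union if union else 0.0


def search(query_id: str, fps: Dict[str, int], cutoff: float) -> List[str]:
    """Compare one query against all other molecules; keep those at or above the cutoff."""
    query = fps[query_id]
    return [m for m, fp in fps.items()
            if m != query_id and tanimoto(query, fp) >= cutoff]


def hit_rates(query_id: str, fps: Dict[str, int],
              classes: Dict[str, str], cutoff: float) -> Tuple[float, float]:
    """Return (fraction of same-class compounds found, false-positive fraction)."""
    target = classes[query_id]
    others = [m for m in fps if m != query_id]
    same = [m for m in others if classes[m] == target]
    different = [m for m in others if classes[m] != target]
    hits = search(query_id, fps, cutoff)
    found = sum(1 for h in hits if classes[h] == target)
    false_pos = len(hits) - found
    recall = found / len(same) if same else 0.0
    fp_rate = false_pos / len(different) if different else 0.0
    return recall, fp_rate


# Hypothetical toy data: 8-bit fingerprints in two made-up activity classes.
FPS = {"a1": 0b11110000, "a2": 0b11100000, "a3": 0b11110001,
       "b1": 0b00001111, "b2": 0b00011111}
CLASSES = {"a1": "A", "a2": "A", "a3": "A", "b1": "B", "b2": "B"}

if __name__ == "__main__":
    for cutoff in (0.75, 0.80):
        print(cutoff, search("a1", FPS, cutoff), hit_rates("a1", FPS, CLASSES, cutoff))
```

In this toy data, raising the cutoff from 0.75 to 0.80 drops one same-class hit while the false-positive fraction stays at zero. This is the same trade-off the abstract reports for the 2D fingerprint at a cutoff of 0.85.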