Combinatorial preferences affect molecular similarity/diversity calculations using binary fingerprints and Tanimoto coefficients

Citation
J.W. Godden et al., Combinatorial preferences affect molecular similarity/diversity calculations using binary fingerprints and Tanimoto coefficients, J CHEM INF, 40(1), 2000, pp. 163-166
Number of citations
15
Subject categories
Chemistry
Journal title
JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES
Journal ISSN
0095-2338
Volume
40
Issue
1
Year of publication
2000
Pages
163 - 166
Database
ISI
SICI code
0095-2338(200001/02)40:1<163:CPAMSC>2.0.ZU;2-J
Abstract
A combinatorial method was developed to calculate complete distributions of the Tanimoto coefficient (Tc) for binary fingerprint (FP) representations of specified length, regardless of the chemical parameters they reflect. Theoretical Tc distributions were calculated for FPs consisting of up to 67 bit positions, which revealed significant statistical preferences of certain Tc values. Calculation of Tc distributions in a large compound database using different FPs mirrored the effects identified by our general analysis. On the basis of these findings, an average Tc is biased by statistically preferred values.
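
The kind of enumeration the abstract describes can be illustrated with a minimal Python sketch. It is not the authors' procedure, which this record does not give. It assumes that Tc = c / (a + b - c), where a and b are the numbers of bits set in the two fingerprints and c is the number of bits set in both, that every possible fingerprint of the given length is equally likely, that pairs are counted as ordered, and that the pair of two all-zero fingerprints (Tc undefined) is excluded. The function name `tanimoto_distribution` is hypothetical.

```python
from collections import Counter
from fractions import Fraction
from math import comb


def tanimoto_distribution(n_bits):
    """Count ordered fingerprint pairs of length n_bits for each exact Tc value.

    Assumption: all 2**n_bits fingerprints are equally likely; the pair of two
    all-zero fingerprints is skipped because Tc is undefined for it.
    """
    counts = Counter()
    for union in range(1, n_bits + 1):        # positions set in A or B
        for common in range(union + 1):       # positions set in both A and B
            # Choose the union positions, then which of them are common;
            # each remaining union position is set in exactly one of A or B.
            n_pairs = (comb(n_bits, union)
                       * comb(union, common)
                       * 2 ** (union - common))
            counts[Fraction(common, union)] += n_pairs
    return counts


if __name__ == "__main__":
    dist = tanimoto_distribution(8)
    total = sum(dist.values())
    # List Tc values from most to least frequent to expose preferred values.
    for tc, n_pairs in sorted(dist.items(), key=lambda kv: -kv[1]):
        print(f"Tc = {str(tc):>5}  fraction of pairs = {n_pairs / total:.4f}")
```

Because many different (c, a + b - c) combinations reduce to the same fraction, values such as 0, 1/2, and 1/3 collect far more pairs than fractions with large denominators. Under these assumptions, that is one way a preference for certain Tc values can arise from counting alone.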