Dx. Xie et al., An efficient projection protocol for chemical databases: Singular value decomposition combined with truncated-Newton minimization, J CHEM INF, 40(1), 2000, pp. 167-177
Number of citations
23
Subject Categories
Chemistry
Journal title
JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES
A rapid algorithm for visualizing large chemical databases in a
low-dimensional space (2D or 3D) is presented as a first step in database
analysis and design applications. The projection mapping of the compound
database (described as vectors in the high-dimensional space of chemical
descriptors) is based on the singular value decomposition (SVD) combined with
a minimization procedure implemented with the efficient truncated-Newton
program package (TNPACK). Numerical experiments on four chemical datasets
with real-valued descriptors (ranging from 58 to 27 255 compounds) show that
the SVD/TNPACK projection duo achieves a reasonable accuracy in 2D, varying
from 30% to about 100% of pairwise distance segments that lie within 10% of
the original distances. The lowest percentages, corresponding to scaled
datasets, can be made close to 100% with projections onto a 10-dimensional
space. We also show that the SVD/TNPACK duo is efficient for minimizing the
distance error objective function (especially for scaled datasets), and that
TNPACK is much more efficient than a current popular approach of steepest
descent minimization in this application context. Applications of our
projection technique to similarity and diversity sampling in drug design can
be envisioned.
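
The two-stage idea in the abstract (an SVD projection used as the starting
point for a truncated-Newton refinement of a distance error function) can be
illustrated with a minimal Python sketch. Several things here are assumptions
and not taken from the paper: the objective is a plain sum of squared
differences between projected and original distances, SciPy's "Newton-CG"
method stands in for TNPACK, the function names (svd_project, distance_error,
percent_within, project) are hypothetical, and the data are random numbers
used only to make the example runnable.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.spatial.distance import pdist


def svd_project(X, k=2):
    """Project the rows of X onto the top-k singular directions."""
    Xc = X - X.mean(axis=0)                      # center the descriptors
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k] * s[:k]                      # coordinates in k dimensions


def distance_error(y_flat, n, k, d_orig, pairs):
    """Assumed objective: 0.5 * sum over pairs of (d_ij - delta_ij)^2.

    Returns the value and its gradient with respect to the flattened
    projected coordinates.
    """
    Y = y_flat.reshape(n, k)
    i, j = pairs
    diff = Y[i] - Y[j]
    d = np.linalg.norm(diff, axis=1)
    r = d - d_orig
    w = r / np.maximum(d, 1e-12)                 # guard against d == 0
    g_pairs = w[:, None] * diff
    G = np.zeros_like(Y)
    np.add.at(G, i, g_pairs)                     # contribution to point i
    np.add.at(G, j, -g_pairs)                    # opposite sign for point j
    return 0.5 * np.sum(r ** 2), G.ravel()


def percent_within(Y, d_orig, tol=0.10):
    """Percentage of pairwise distances within tol of the original ones."""
    d = pdist(Y)
    return 100.0 * np.mean(np.abs(d - d_orig) <= tol * d_orig)


def project(X, k=2, maxiter=200):
    """SVD start followed by a truncated-Newton (Newton-CG) refinement."""
    n = X.shape[0]
    d_orig = pdist(X)                            # original pairwise distances
    pairs = np.triu_indices(n, k=1)              # same pair order as pdist
    Y0 = svd_project(X, k)
    res = minimize(distance_error, Y0.ravel(),
                   args=(n, k, d_orig, pairs),
                   jac=True, method="Newton-CG",
                   options={"maxiter": maxiter})
    return Y0, res.x.reshape(n, k), d_orig


# Synthetic stand-in for a descriptor matrix (40 "compounds", 8 descriptors).
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 8))
Y0, Y, d_orig = project(X)
print("within 10% after SVD only:  ", percent_within(Y0, d_orig))
print("within 10% after refinement:", percent_within(Y, d_orig))
```

Starting the minimizer from the SVD coordinates rather than from random
positions mirrors the pairing described in the abstract. The percentage
reported by percent_within corresponds to the accuracy measure quoted there
(distances within 10% of the originals), but the numbers printed for this
synthetic data say nothing about the results in the paper.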