An efficient projection protocol for chemical databases: Singular value decomposition combined with truncated-Newton minimization

Citation
Dx. Xie et al., An efficient projection protocol for chemical databases: Singular value decomposition combined with truncated-Newton minimization, J CHEM INF, 40(1), 2000, pp. 167-177
Number of citations
23
Subject categories
Chemistry
Journal title
JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES
ISSN journal
0095-2338
Volume
40
Issue
1
Year of publication
2000
Pages
167 - 177
Database
ISI
SICI code
0095-2338(200001/02)40:1<167:AEPPFC>2.0.ZU;2-2
Abstract
A rapid algorithm for visualizing large chemical databases in a low-dimensional space (2D or 3D) is presented as a first step in database analysis and design applications. The projection mapping of the compound database (described as vectors in the high-dimensional space of chemical descriptors) is based on the singular value decomposition (SVD) combined with a minimization procedure implemented with the efficient truncated-Newton program package (TNPACK). Numerical experiments on four chemical datasets with real-valued descriptors (ranging from 58 to 27 255 compounds) show that the SVD/TNPACK projection duo achieves a reasonable accuracy in 2D, varying from 30% to about 100% of pairwise distance segments that lie within 10% of the original distances. The lowest percentages, corresponding to scaled datasets, can be made close to 100% with projections onto a 10-dimensional space. We also show that the SVD/TNPACK duo is efficient for minimizing the distance error objective function (especially for scaled datasets), and that TNPACK is much more efficient than a current popular approach of steepest descent minimization in this application context. Applications of our projection technique to similarity and diversity sampling in drug design can be envisioned.
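The two-stage idea in the abstract (an SVD projection used as the starting point for a truncated-Newton minimization of a distance error) can be illustrated with a minimal Python sketch. It is not the authors' implementation. TNPACK is replaced here by SciPy's `TNC` truncated-Newton method, and the objective (half the sum of squared differences between projected and original pairwise distances) is an assumed form, since the abstract does not give the exact function. All function names are hypothetical, and the dense distance matrices only suit small datasets.

```python
import numpy as np
from scipy.optimize import minimize


def pairwise_distances(A):
    """Full (n, n) matrix of Euclidean distances between the rows of A."""
    diff = A[:, None, :] - A[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def svd_project(X, k=2):
    """Project the centered rows of X onto the top-k singular directions."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k] * s[:k]


def distance_error(y_flat, delta, k):
    """Assumed objective: 0.5 * sum_{i<j} (d_ij - delta_ij)^2, with gradient."""
    n = delta.shape[0]
    Y = y_flat.reshape(n, k)
    diff = Y[:, None, :] - Y[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    resid = d - delta
    # The full matrix counts every pair twice, hence 0.25 instead of 0.5.
    value = 0.25 * (resid ** 2).sum()
    # Guard against division by zero on the diagonal (and coincident points).
    safe_d = np.where(d > 0, d, 1.0)
    w = np.where(d > 0, resid / safe_d, 0.0)
    grad = (w[:, :, None] * diff).sum(axis=1)
    return value, grad.ravel()


def refine(Y0, delta):
    """Refine an initial projection with SciPy's truncated-Newton method (TNC).

    This stands in for TNPACK, which is not part of SciPy.
    """
    n, k = Y0.shape
    res = minimize(distance_error, Y0.ravel(), args=(delta, k),
                   jac=True, method="TNC")
    return res.x.reshape(n, k), float(res.fun)


def fraction_within(Y, delta, tol=0.10):
    """Fraction of pairwise distances within tol (relative) of the originals."""
    d = pairwise_distances(Y)
    iu = np.triu_indices(delta.shape[0], k=1)
    return float(np.mean(np.abs(d[iu] - delta[iu]) <= tol * delta[iu]))


# Synthetic stand-in for a descriptor matrix (compounds x descriptors).
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 8))
delta = pairwise_distances(X)

Y0 = svd_project(X, k=2)      # stage 1: SVD starting point
Y, err = refine(Y0, delta)    # stage 2: truncated-Newton refinement

print("within 10% after SVD only:  ", fraction_within(Y0, delta))
print("within 10% after refinement:", fraction_within(Y, delta))
```

The `fraction_within` helper mirrors the accuracy measure quoted in the abstract: the share of pairwise distance segments that lie within 10% of the original distances. Passing a larger `k` to `svd_project` corresponds to projecting onto a higher-dimensional space, such as the 10-dimensional case mentioned for scaled datasets.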