Multidimensional scaling and visualization of large molecular similarity tables

Authors

Agrafiotis, DK; Rassokhin, DN; Lobanov, VS

Citation

D.K. Agrafiotis et al., Multidimensional scaling and visualization of large molecular similarity tables, J COMPUT CH, 22(5), 2001, pp. 488-500

Citations number

Subject Categories

Chemistry

Journal title

JOURNAL OF COMPUTATIONAL CHEMISTRY

ISSN journal

0192-8651

Volume

22

Issue

5

Year of publication

2001

Pages

488 - 500

Database

ISI

SICI code

0192-8651(20010415)22:5<488:MSAVOL>2.0.ZU;2-U

Abstract

Multidimensional scaling (MDS) is a collection of statistical techniques that attempt to embed a set of patterns described by means of a dissimilarity matrix into a low-dimensional display plane in a way that preserves their original pairwise interrelationships as closely as possible. Unfortunately, current MDS algorithms are notoriously slow, and their use is limited to small data sets. In this article, we present a family of algorithms that combine nonlinear mapping techniques with neural networks, and make possible the scaling of very large data sets that are intractable with conventional methodologies. The method employs a nonlinear mapping algorithm to project a small random sample, and then "learns" the underlying transform using one or more multilayer perceptrons. The distinct advantage of this approach is that it captures the nonlinear mapping relationship in an explicit function, and allows the scaling of additional patterns as they become available, without the need to reconstruct the entire map. A novel encoding scheme is described, allowing this methodology to be used with a wide variety of input data representations and similarity functions. The potential of the algorithm is illustrated in the analysis of two combinatorial libraries and an ensemble of molecular conformations. The method is particularly useful for extracting low-dimensional Cartesian coordinate vectors from large binary spaces, such as those encountered in the analysis of large chemical data sets. (C) 2001 John Wiley & Sons, Inc.
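
The two-stage idea summarized in the abstract (embed a small random sample, then learn the mapping as an explicit function and apply it to the remaining patterns) can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration and is not taken from the article: the data are synthetic Gaussian feature vectors compared with Euclidean distance, the sample is embedded with a SMACOF-style Guttman update rather than the authors' nonlinear mapping algorithm, a single small perceptron is trained by plain gradient descent, and the encoding scheme mentioned in the abstract is not reproduced. The function names (`pairwise_dist`, `embed_sample`, `train_mlp`) are hypothetical.

```python
import numpy as np

def pairwise_dist(X):
    """Euclidean distance matrix between the rows of X."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    d = np.sqrt(np.maximum(d2, 0.0))
    np.fill_diagonal(d, 0.0)  # remove round-off on the diagonal
    return d

def stress(D, Y):
    """Raw stress: sum over pairs of squared distance errors."""
    return float(np.sum((pairwise_dist(Y) - D) ** 2) / 2.0)

def embed_sample(D, dim=2, iters=300, seed=0):
    """Embed a small dissimilarity matrix with Guttman (SMACOF) updates.
    Assumption: stands in for the article's nonlinear mapping step."""
    rng = np.random.default_rng(seed)
    n = D.shape[0]
    Y = rng.normal(size=(n, dim))
    history = [stress(D, Y)]
    for _ in range(iters):
        d = pairwise_dist(Y)
        with np.errstate(divide="ignore", invalid="ignore"):
            B = np.where(d > 0, -D / d, 0.0)
        np.fill_diagonal(B, 0.0)
        np.fill_diagonal(B, -B.sum(axis=1))
        Y = (B @ Y) / n
        history.append(stress(D, Y))
    return Y, history

def train_mlp(X, Y, hidden=16, steps=3000, lr=0.02, seed=0):
    """One-hidden-layer tanh perceptron fitted by full-batch gradient descent."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=1 / np.sqrt(X.shape[1]), size=(X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(scale=1 / np.sqrt(hidden), size=(hidden, Y.shape[1]))
    b2 = np.zeros(Y.shape[1])
    losses = []
    for _ in range(steps):
        H = np.tanh(X @ W1 + b1)
        err = H @ W2 + b2 - Y
        losses.append(float(np.mean(err ** 2)))
        g_out = 2.0 * err / err.size           # gradient of the mean squared error
        g_hid = (g_out @ W2.T) * (1.0 - H ** 2)  # backpropagate through tanh
        W2 -= lr * (H.T @ g_out)
        b2 -= lr * g_out.sum(axis=0)
        W1 -= lr * (X.T @ g_hid)
        b1 -= lr * g_hid.sum(axis=0)

    def predict(Z):
        return np.tanh(Z @ W1 + b1) @ W2 + b2

    return predict, losses

# Synthetic stand-in for a large descriptor table (assumption, not real data).
rng = np.random.default_rng(42)
X = rng.normal(size=(2000, 16))

# Stage 1: embed a small random sample from its dissimilarity matrix.
sample = rng.choice(len(X), size=100, replace=False)
Y_sample, stress_history = embed_sample(pairwise_dist(X[sample]))

# Stage 2: learn features -> coordinates on the sample, then map every pattern.
scale = Y_sample.std()
predict, losses = train_mlp(X[sample], Y_sample / scale)
coords = predict(X) * scale  # 2-D coordinates for all 2000 patterns
```

In this sketch only the 100 x 100 sample matrix is ever built; the remaining patterns are placed by evaluating the trained function, which is why new patterns can be added without recomputing the map. How faithfully the learned function reproduces a full MDS solution depends on the data and the sample size, and the sketch makes no claim about that.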