A novel method for computing chemical similarity from chemical substructure
descriptors is described. This new method, called LaSSI, uses the singular
value decomposition (SVD) of a chemical descriptor-molecule matrix to create
a low-dimensional representation of the original descriptor space. Ranking
molecules by similarity to a probe molecule in the reduced-dimensional space
has several advantages over analogous ranking in the original descriptor
space: matching latent structures is more robust than matching discrete
descriptors, choosing the number of singular values provides a rational way
to vary the "fuzziness" of the search, and the reduction in the
dimensionality of the chemical space increases searching speed. LaSSI also
allows the calculation of the similarity between two descriptors and between
a descriptor and a molecule.
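
The ranking step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `lassi_rank`, the toy matrix, and the use of cosine similarity in the latent space are all assumptions made for illustration, since the abstract does not specify the similarity measure. Rows of `X` are descriptors, columns are molecules, and `k` is the number of singular values retained.

```python
import numpy as np

def lassi_rank(X, probe, k):
    """Rank the molecules (columns of X) against a probe in a k-dim latent space.

    Hypothetical helper: cosine similarity is an assumed choice of measure.
    """
    # Thin SVD of the descriptor-molecule matrix: X = U @ diag(s) @ Vt
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    Uk, sk = U[:, :k], s[:k]
    # Latent coordinates of each molecule; equal to Uk.T @ X, shape (n_mol, k)
    mols = (sk[:, None] * Vt[:k]).T
    # Project the probe's descriptor vector into the same latent space
    q = Uk.T @ probe
    # Cosine similarity; the small constant guards against division by zero
    denom = np.linalg.norm(mols, axis=1) * np.linalg.norm(q) + 1e-12
    sims = mols @ q / denom
    # Molecule indices from most to least similar
    return np.argsort(-sims), sims

# Toy data (made up): 5 descriptors x 4 molecules, entries are descriptor counts
X = np.array([
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 0],
], dtype=float)

order, sims = lassi_rank(X, probe=X[:, 0], k=2)
print(order, np.round(sims, 3))
```

In this sketch a smaller `k` merges more descriptors into shared latent dimensions, which corresponds to a "fuzzier" search, while `k` equal to the rank of `X` reproduces ranking in the original descriptor space. Descriptor coordinates could be obtained analogously from the rows of `Uk * sk`, again as an assumption about one reasonable convention.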