Images of similarity: A visual exploration of optimal similarity metrics and scaling properties of TREC topic-document sets

Authors
Citation
M. Rorvig, Images of similarity: A visual exploration of optimal similarity metrics and scaling properties of TREC topic-document sets, J AM S INFO, 50(8), 1999, pp. 639-651
Number of citations
49
Subject Categories
Library & Information Science
Journal title
JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE
ISSN journal
0002-8231
Volume
50
Issue
8
Year of publication
1999
Pages
639 - 651
Database
ISI
SICI code
0002-8231(199906)50:8<639:IOSAVE>2.0.ZU;2-6
Abstract
Multiple similarity measures for five TREC topic-document sets from the LDC TREC Collection Disk 1 are derived from the full text of documents. Each measure on each set is scaled using SAS MDS under ordinal, interval, and MLE assumptions. The resulting 75 permutations are plotted. It is suggested that cosine-vector and overlap measures for similarity appear to recover optimal data relationships among the documents of the five sets. MLE assumptions appear to be required to model the data adequately.
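
For readers unfamiliar with the two measures the abstract singles out, the following is a minimal Python sketch of a cosine-vector similarity and an overlap similarity between two documents. The abstract does not give the paper's exact definitions, so everything here is an assumption made for illustration: the whitespace tokenizer, the raw term-count weighting, the use of the overlap coefficient on term sets (shared terms divided by the size of the smaller set), and the helper names term_counts, cosine_similarity, and overlap_similarity.

```python
import math
from collections import Counter


def term_counts(text):
    """Hypothetical tokenizer: lowercase, split on whitespace, count terms."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two term-count vectors (assumed weighting)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


def overlap_similarity(a, b):
    """Overlap coefficient on term sets: shared terms / size of the smaller set.

    This is one common definition; the paper may define overlap differently.
    """
    terms_a, terms_b = set(a), set(b)
    if not terms_a or not terms_b:
        return 0.0
    return len(terms_a & terms_b) / min(len(terms_a), len(terms_b))


if __name__ == "__main__":
    doc1 = term_counts("retrieval of text documents")
    doc2 = term_counts("retrieval of image documents")
    print(cosine_similarity(doc1, doc2))
    print(overlap_similarity(doc1, doc2))
```

Computing either function for every pair of documents in a set yields a symmetric similarity matrix, which is the kind of input a multidimensional scaling procedure takes.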