M. Rorvig, Images of similarity: A visual exploration of optimal similarity metrics and scaling properties of TREC topic-document sets, J AM S INFO, 50(8), 1999, pp. 639-651
Number of citations
49
Subject Categories
Library & Information Science
Journal title
JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE
Multiple similarity measures for five TREC topic-document sets from the LDC TREC Collection Disk 1 are derived from the full text of the documents. Each measure on each set is scaled using SAS MDS under ordinal, interval, and MLE assumptions, and the resulting 75 permutations (five sets × five measures × three scaling assumptions) are plotted. It is suggested that the cosine-vector and overlap measures of similarity appear to recover optimal data relationships among the documents of the five sets, and that MLE assumptions appear to be required to model the data adequately.
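The abstract names the cosine-vector and overlap measures but does not give their formulas. The sketch below is offered only as a point of reference and uses common information-retrieval definitions. Cosine is the dot product of two term vectors divided by the product of their lengths. The overlap coefficient is the shared term weight divided by the smaller document's total weight. Several details are assumptions, not the paper's method: the raw word-count weighting, the tokenizer, the 1 − s conversion to dissimilarities, and the helper names `term_vector` and `dissimilarity_matrix`. The scaling itself was done in SAS MDS and is not reproduced here.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Lowercased word counts (hypothetical stand-in for the paper's indexing)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(x, y):
    """Cosine of the angle between two term-frequency vectors."""
    dot = sum(x[t] * y[t] for t in x.keys() & y.keys())
    norm_x = math.sqrt(sum(v * v for v in x.values()))
    norm_y = math.sqrt(sum(v * v for v in y.values()))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return dot / (norm_x * norm_y)


def overlap(x, y):
    """Overlap coefficient: shared weight over the smaller vector's total weight.

    For binary vectors this reduces to |A & B| / min(|A|, |B|).
    """
    shared = sum(min(x[t], y[t]) for t in x.keys() & y.keys())
    smaller = min(sum(x.values()), sum(y.values()))
    return shared / smaller if smaller else 0.0


def dissimilarity_matrix(docs, sim):
    """Pairwise 1 - similarity matrix (assumed conversion), the kind of input an MDS routine scales."""
    vecs = [term_vector(d) for d in docs]
    n = len(vecs)
    return [
        [0.0 if i == j else 1.0 - sim(vecs[i], vecs[j]) for j in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    docs = ["oil prices rise", "oil prices fall", "court ruling on appeal"]
    for name, sim in (("cosine", cosine), ("overlap", overlap)):
        print(name, dissimilarity_matrix(docs, sim))
```

For example, "a b" and "a b c d" get a cosine of about 0.707, because they share two terms out of lengths √2 and 2. Their overlap is 1.0, because every term of the shorter document also appears in the longer one. Overlap is therefore insensitive to differences in document length in a way that cosine is not.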