H. Schutze and Jo. Pedersen, A COOCCURRENCE-BASED THESAURUS AND 2 APPLICATIONS TO INFORMATION-RETRIEVAL, Information processing & management, 33(3), 1997, pp. 307-318
Number of citations
32
Subject Categories
Information Science & Library Science; Computer Science Information Systems
This paper presents a new method for computing a thesaurus from a text
corpus. Each word is represented as a vector in a multi-dimensional
space that captures cooccurrence information. Words are defined to be
similar if they have similar cooccurrence patterns. Two different
methods for using these thesaurus vectors in information retrieval are
shown to significantly improve performance over the Tipster reference
corpus as compared to a term vector space baseline. (C) 1997 Elsevier
Science Ltd.
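
The idea in the abstract, that words count as similar when their cooccurrence patterns are similar, can be illustrated with a minimal sketch. This is not the paper's method: the abstract does not say how the vectors are built or compared, so the fixed context window, the raw counts, the cosine measure, the toy corpus, and the function names `cooccurrence_vectors` and `cosine` are all assumptions made here for illustration.

```python
from collections import Counter, defaultdict
from math import sqrt


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions of it.

    The window size and raw counts are illustrative assumptions,
    not details taken from the paper.
    """
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (assumed measure)."""
    dot = sum(count * v[term] for term, count in u.items() if term in v)
    norm_u = sqrt(sum(c * c for c in u.values()))
    norm_v = sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy corpus, invented for this example.
corpus = [
    "the car drove down the road".split(),
    "the truck drove down the road".split(),
    "the cat sat on the mat".split(),
]

vectors = cooccurrence_vectors(corpus)
sim_car_truck = cosine(vectors["car"], vectors["truck"])
sim_car_cat = cosine(vectors["car"], vectors["cat"])
print(sim_car_truck, sim_car_cat)
```

In this toy corpus "car" and "truck" appear in identical contexts and so score higher than "car" and "cat", which share only the context word "the". Neither word ever cooccurs with the other, which is the point of comparing cooccurrence patterns rather than direct cooccurrence.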