A COOCCURRENCE-BASED THESAURUS AND 2 APPLICATIONS TO INFORMATION-RETRIEVAL

Citation
H. Schutze and J.O. Pedersen, A COOCCURRENCE-BASED THESAURUS AND 2 APPLICATIONS TO INFORMATION-RETRIEVAL, Information Processing & Management, 33(3), 1997, pp. 307-318
Number of citations
32
Subject categories
Information Science & Library Science; Computer Science, Information Systems
Journal ISSN
0306-4573
Volume
33
Issue
3
Year of publication
1997
Pages
307 - 318
Database
ISI
SICI code
0306-4573(1997)33:3<307:ACTA2A>2.0.ZU;2-#
Abstract
This paper presents a new method for computing a thesaurus from a text corpus. Each word is represented as a vector in a multi-dimensional space that captures cooccurrence information. Words are defined to be similar if they have similar cooccurrence patterns. Two different methods for using these thesaurus vectors in information retrieval are shown to significantly improve performance over the Tipster reference corpus as compared to a term vector space baseline. (C) 1997 Elsevier Science Ltd.
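
The idea in the abstract, that words are similar when their cooccurrence patterns are similar, can be illustrated with a minimal Python sketch. This is not the paper's implementation: the toy corpus, the symmetric window of two tokens, the raw counts, and the use of cosine similarity are all assumptions made for illustration, and the function names `cooccurrence_vectors` and `cosine` are hypothetical.

```python
from collections import Counter, defaultdict
from math import sqrt


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions.

    The window size and the use of raw counts are illustrative assumptions.
    """
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(count * v[term] for term, count in u.items())
    norm_u = sqrt(sum(c * c for c in u.values()))
    norm_v = sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy corpus (hypothetical): "car" and "automobile" share the same contexts.
corpus = [
    "the car drove down the road".split(),
    "the automobile drove down the road".split(),
    "the banana grew on the tree".split(),
]

vectors = cooccurrence_vectors(corpus)
sim_car_automobile = cosine(vectors["car"], vectors["automobile"])
sim_car_banana = cosine(vectors["car"], vectors["banana"])

print("car vs automobile:", sim_car_automobile)
print("car vs banana:", sim_car_banana)
```

In this toy setting "car" and "automobile" never occur together, yet they score as more similar to each other than "car" and "banana" do, because they appear next to the same neighbouring words. A real corpus would need a far larger vocabulary and some way of keeping the vector space manageable, which this sketch does not attempt.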