R. Chandrasekar and B. Srinivas, GLEAN - USING SYNTACTIC INFORMATION IN DOCUMENT FILTERING, Information Processing & Management, 34(5), 1998, pp. 623-640
Number of citations
36
Subject Categories
Information Science & Library Science; Computer Science, Information Systems
In this paper, we describe a system called Glean, which is based on the idea that coherent text contains significant latent information, such as syntactic structure and patterns of language use, which can be used to enhance the performance of information retrieval systems. We propose an approach to increase the precision of information retrieval that makes use of syntactic information obtained using a supertagger. In this approach, patterns based on local syntactic context are induced from training material. These patterns are used to refine the set of documents retrieved by a standard Web search engine or an information retrieval system, by selecting relevant information and filtering out irrelevant items. We show that syntactic information does improve the effectiveness of filtering irrelevant documents, and that supertagging is more effective than part of speech tagging in filtering documents. Further, we also show how the extent of syntactic context affects filtering performance. We discuss the relationship between Glean and other attempts at improving information retrieval performance. (C) 1998 Published by Elsevier Science Ltd. All rights reserved.
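
The filtering idea summarized in the abstract (induce patterns from the local syntactic context of a keyword in training material, then keep only retrieved text that matches a pattern) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the keyword "appointed", the toy sentences, and the hand-assigned Penn Treebank-style part-of-speech tags (used here as a stand-in for supertags) are all assumptions made for illustration.

```python
from typing import Iterable, List, Set, Tuple

# A sentence is a list of (word, tag) pairs. The tags below are hand-assigned
# POS-style placeholders, NOT output of a real supertagger (assumption).
Tagged = List[Tuple[str, str]]
Pattern = Tuple[str, ...]


def contexts(sentence: Tagged, keyword: str, width: int) -> Set[Pattern]:
    """Return the tag windows of +/- `width` positions around each `keyword`."""
    found: Set[Pattern] = set()
    for i, (word, _) in enumerate(sentence):
        if word.lower() == keyword:
            window = sentence[max(0, i - width): i + width + 1]
            found.add(tuple(tag for _, tag in window))
    return found


def induce_patterns(relevant: Iterable[Tagged], irrelevant: Iterable[Tagged],
                    keyword: str, width: int) -> Set[Pattern]:
    """Keep tag windows seen in relevant training text but not in irrelevant text."""
    pos: Set[Pattern] = set()
    for s in relevant:
        pos |= contexts(s, keyword, width)
    neg: Set[Pattern] = set()
    for s in irrelevant:
        neg |= contexts(s, keyword, width)
    return pos - neg


def keep(sentence: Tagged, patterns: Set[Pattern], keyword: str, width: int) -> bool:
    """A sentence passes the filter if any keyword context matches a pattern."""
    return bool(contexts(sentence, keyword, width) & patterns)


# Toy training material (hypothetical).
relevant = [[("The", "DT"), ("board", "NN"), ("appointed", "VBD"),
             ("Smith", "NNP"), ("chairman", "NN")]]
irrelevant = [[("a", "DT"), ("well", "RB"), ("appointed", "VBN"), ("room", "NN")]]

patterns = induce_patterns(relevant, irrelevant, "appointed", width=1)

candidate_a = [("The", "DT"), ("company", "NN"), ("appointed", "VBD"), ("Jones", "NNP")]
candidate_b = [("the", "DT"), ("appointed", "VBN"), ("hour", "NN")]

print(keep(candidate_a, patterns, "appointed", 1))
print(keep(candidate_b, patterns, "appointed", 1))
```

The `width` parameter in this sketch plays the role of the "extent of syntactic context" mentioned in the abstract: a wider window yields more specific patterns, a narrower one yields more general patterns.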