GLEAN - USING SYNTACTIC INFORMATION IN DOCUMENT FILTERING

Citation
R. Chandrasekar and B. Srinivas, GLEAN - USING SYNTACTIC INFORMATION IN DOCUMENT FILTERING, Information Processing & Management, 34(5), 1998, pp. 623-640
Number of citations
36
Subject categories
Information Science & Library Science; Computer Science Information Systems
Journal ISSN
0306-4573
Volume
34
Issue
5
Year of publication
1998
Pages
623 - 640
Database
ISI
SICI code
0306-4573(1998)34:5<623:G-USII>2.0.ZU;2-0
Abstract
In this paper, we describe a system called Glean, which is based on the idea that coherent text contains significant latent information, such as syntactic structure and patterns of language use, which can be used to enhance the performance of information retrieval systems. We propose an approach to increase the precision of information retrieval that makes use of syntactic information obtained using a supertagger. In this approach, patterns based on local syntactic context are induced from training material. These patterns are used to refine the set of documents retrieved by a standard Web search engine or an information retrieval system, by selecting relevant information and filtering out irrelevant items. We show that syntactic information does improve the effectiveness of filtering irrelevant documents, and that supertagging is more effective than part of speech tagging in filtering documents. Further, we also show how the extent of syntactic context affects filtering performance. We discuss the relationship between Glean and other attempts at improving information retrieval performance. © 1998 Published by Elsevier Science Ltd. All rights reserved.
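
Illustrative sketch (not from the paper)
The abstract describes inducing patterns from local syntactic context and using them to filter retrieved documents. The Python sketch below is only a rough illustration of that general idea, not the authors' method. Everything in it is an assumption made for the example: the toy tag lexicon TOY_TAGS stands in for a real supertagger or part-of-speech tagger, and the function names, the keyword, and the sample sentences are hypothetical. The width parameter stands for the extent of syntactic context mentioned in the abstract.

```python
from collections import Counter

# Hypothetical toy lexicon; a stand-in for a real supertagger or POS tagger.
# Any word not listed is tagged "N".
TOY_TAGS = {
    "appointed": "V", "was": "AUX", "as": "P",
    "the": "DET", "a": "DET", "new": "ADJ",
}


def tag(tokens):
    """Assign one coarse tag per token using the toy lexicon."""
    return [TOY_TAGS.get(token.lower(), "N") for token in tokens]


def contexts(text, keyword, width):
    """Return (left_tags, right_tags) windows around each keyword occurrence."""
    tokens = text.split()
    tags = tag(tokens)
    windows = []
    for i, token in enumerate(tokens):
        if token.lower() == keyword:
            left = tuple(tags[max(0, i - width):i])
            right = tuple(tags[i + 1:i + 1 + width])
            windows.append((left, right))
    return windows


def induce_patterns(examples, keyword, width):
    """Keep tag windows seen in relevant examples and never in irrelevant ones."""
    relevant, irrelevant = Counter(), Counter()
    for text, is_relevant in examples:
        target = relevant if is_relevant else irrelevant
        for window in contexts(text, keyword, width):
            target[window] += 1
    return {window for window in relevant if window not in irrelevant}


def filter_documents(docs, patterns, keyword, width):
    """Keep documents in which some keyword occurrence matches a pattern."""
    return [
        doc for doc in docs
        if any(window in patterns for window in contexts(doc, keyword, width))
    ]


# Hypothetical training material: (sentence, relevant?) pairs.
training = [
    ("Smith was appointed as chairman", True),
    ("the appointed time passed", False),
]
retrieved = [
    "Jones was appointed as director",
    "At the appointed hour we left",
]

patterns = induce_patterns(training, "appointed", width=1)
kept = filter_documents(retrieved, patterns, "appointed", width=1)
print(kept)
```

In this sketch a larger width makes each pattern more specific, so it matches fewer documents; that is the kind of trade-off the abstract refers to when it mentions the extent of syntactic context.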