Enhancing concept-based retrieval based on minimal term sets

Citation
Ah. Alsaffar et al., Enhancing concept-based retrieval based on minimal term sets, J INTELL IN, 14(2-3), 2000, pp. 155-173
Number of citations
11
Subject Categories
Information Technology & Communication Systems
Journal title
JOURNAL OF INTELLIGENT INFORMATION SYSTEMS
Journal ISSN
0925-9902
Volume
14
Issue
2-3
Year of publication
2000
Pages
155 - 173
Database
ISI
SICI code
0925-9902(200003)14:2-3<155:ECRBOM>2.0.ZU;2-1
Abstract
There is considerable interest in bridging the terminological gap that exists between the way users prefer to specify their information needs and the way queries are expressed in terms of keywords or text expressions that occur in documents. One of the approaches proposed for bridging this gap is based on technologies for expert systems. The central idea of such an approach was introduced in the context of a system called Rule Based Information Retrieval by Computer (RUBRIC). In RUBRIC, user query topics (or concepts) are captured in a rule base represented by an AND/OR tree. The evaluation of the AND/OR tree is essentially based on minimum and maximum weights of query terms for conjunctions and disjunctions, respectively. The time to generate the retrieval output of the AND/OR tree for a given query topic is exponential in the number of conjunctions in the DNF expression associated with the query topic. In this paper, we propose a new approach for computing the retrieval output. The proposed approach involves preprocessing of the rule base to generate Minimal Term Sets (MTSs) that speed up the retrieval process. The computational complexity of the on-line query evaluation following the preprocessing is polynomial in m. We show that the computation and use of MTSs allows a user to choose query topics that best suit their needs and to use retrieval functions that yield a more refined and controlled retrieval output than is possible with the AND/OR tree when document terms are binary. We incorporate the p-Norm model into the process of evaluating MTSs to handle the case where weights of both documents and query terms are non-binary.
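The min/max evaluation of an AND/OR tree mentioned in the abstract, and the standard p-Norm scoring formulas, can be illustrated with a minimal Python sketch. This is not the paper's or RUBRIC's implementation: the class names `Term`, `And`, `Or`, the functions `evaluate`, `p_norm_or`, `p_norm_and`, and the example terms are all hypothetical, rule weights are omitted, and the MTS preprocessing itself is not shown because the abstract does not specify it. The p-Norm functions follow the usual extended Boolean formulation and assume weights in [0, 1] with p >= 1.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Union


@dataclass
class Term:
    name: str  # leaf: a single document term


@dataclass
class And:
    children: List["Node"]  # conjunction of sub-concepts


@dataclass
class Or:
    children: List["Node"]  # disjunction of sub-concepts


Node = Union[Term, And, Or]


def evaluate(node: Node, doc_weights: Dict[str, float]) -> float:
    """Score a document against an AND/OR tree: min for AND, max for OR.

    Terms missing from the document get weight 0.0 (an assumption).
    """
    if isinstance(node, Term):
        return doc_weights.get(node.name, 0.0)
    scores = [evaluate(child, doc_weights) for child in node.children]
    return min(scores) if isinstance(node, And) else max(scores)


def p_norm_or(query_w: Sequence[float], doc_w: Sequence[float], p: float) -> float:
    """p-Norm similarity for a weighted OR query."""
    num = sum((a ** p) * (d ** p) for a, d in zip(query_w, doc_w))
    den = sum(a ** p for a in query_w)
    return (num / den) ** (1.0 / p)


def p_norm_and(query_w: Sequence[float], doc_w: Sequence[float], p: float) -> float:
    """p-Norm similarity for a weighted AND query."""
    num = sum((a ** p) * ((1.0 - d) ** p) for a, d in zip(query_w, doc_w))
    den = sum(a ** p for a in query_w)
    return 1.0 - (num / den) ** (1.0 / p)


# Hypothetical query topic: (bomb AND attack) OR terrorism
topic = Or([And([Term("bomb"), Term("attack")]), Term("terrorism")])
score = evaluate(topic, {"bomb": 0.8, "attack": 0.6})
```

In this sketch the conjunction takes the weaker of its two term weights and the disjunction takes the stronger branch. With p = 1 both p-Norm functions reduce to a weighted average of the document weights, and as p grows they approach the strict max (OR) and min (AND) behaviour of the tree evaluation.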