EFFICIENT SIGNATURE FILE METHODS FOR TEXT RETRIEVAL

Citation
Dl. Lee et al., EFFICIENT SIGNATURE FILE METHODS FOR TEXT RETRIEVAL, IEEE Transactions on Knowledge and Data Engineering, 7(3), 1995, pp. 423-435
Number of citations
21
Subject Categories
Information Science & Library Science; Computer Sciences, Special Topics; Engineering, Electrical & Electronic; Computer Science, Artificial Intelligence
Journal ISSN
1041-4347
Volume
7
Issue
3
Year of publication
1995
Pages
423 - 435
Database
ISI
SICI code
1041-4347(1995)7:3<423:ESFMFT>2.0.ZU;2-Z
Abstract
Signature files have been studied extensively as an access method for textual databases. Many approaches have been proposed for searching signature files efficiently. However, different methods make different assumptions and use different performance measures, making it difficult to compare their performance. In this paper, we study three basic methods proposed in the literature, namely, the indexed descriptor file, the two-level superimposed coding scheme, and the partitioned signature file approach. The contribution of this paper is twofold. First, we present a uniform analytical performance model so that the methods can be compared fairly and consistently. The analysis shows that the two-level superimposed coding scheme, if stored in a transposed file, has the best performance. Second, we extend the two-level superimposed coding method into a multilevel superimposed coding method; we obtain the optimal number of levels for the multilevel method and show that, for databases of reasonable size, the optimal value is much larger than 2, the value assumed in the two-level method. The accuracy of the analytical formula is demonstrated by simulation.
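For readers unfamiliar with the basic mechanism, the following Python snippet is a minimal, illustrative sketch of single-level superimposed coding stored in a transposed (bit-sliced) file. It is not the paper's two-level or multilevel scheme, and it does not reproduce the paper's performance model. The signature width `F`, the number of bits per word `M`, the SHA-256 based bit selection, and all function names are hypothetical choices made for this example. Each word sets up to `M` bits, and a block's signature is the bitwise OR of its word signatures. The transposed file stores one "slice" per bit position, so a query only has to read and AND together the slices for the bits set in the query signature. Candidates may include false drops, so they must still be checked against the actual text.

```python
import hashlib

F = 64  # signature width in bits (hypothetical value)
M = 3   # bits set per word (hypothetical value)


def word_signature(word: str, f: int = F, m: int = M) -> int:
    """Set up to m pseudo-random bit positions for one word (collisions may set fewer)."""
    sig = 0
    for i in range(m):
        digest = hashlib.sha256(f"{i}:{word}".encode()).digest()
        sig |= 1 << (int.from_bytes(digest[:4], "big") % f)
    return sig


def block_signature(words, f: int = F, m: int = M) -> int:
    """Superimpose (OR together) the signatures of all words in a text block."""
    sig = 0
    for w in words:
        sig |= word_signature(w, f, m)
    return sig


def build_bit_slices(block_sigs, f: int = F):
    """Transpose the signature file: slice j records bit j of every block signature."""
    slices = [0] * f
    for b, sig in enumerate(block_sigs):
        for j in range(f):
            if (sig >> j) & 1:
                slices[j] |= 1 << b  # bit b of slice j corresponds to block b
    return slices


def query(slices, n_blocks: int, words, f: int = F, m: int = M):
    """Return candidate blocks by ANDing only the slices whose bit is set in the query."""
    qsig = block_signature(words, f, m)
    candidates = (1 << n_blocks) - 1  # start with every block as a candidate
    for j in range(f):
        if (qsig >> j) & 1:
            candidates &= slices[j]
    return [b for b in range(n_blocks) if (candidates >> b) & 1]


# Usage: three tiny text blocks, then a single-word query.
blocks = [["signature", "file", "text"], ["inverted", "index"], ["signature", "coding"]]
slices = build_bit_slices([block_signature(b) for b in blocks])
hits = query(slices, len(blocks), ["signature"])
# Candidates can contain false drops, so verify them against the real text.
true_hits = [b for b in hits if "signature" in blocks[b]]
```

Because superimposed coding never produces false dismissals, every block containing the query word is guaranteed to appear in `hits`. The bit-sliced layout is what makes the "transposed file" attractive: a query touches at most `M` slices instead of scanning all `F` bits of every stored signature.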