Document ranking based upon Markov chains

Citation
C. Danilowicz et J. Balinski, Document ranking based upon Markov chains, INF PR MAN, 37(4), 2001, pp. 623-637
Citations number
25
Categorie Soggetti
Library & Information Science","Information Tecnology & Communication Systems
Journal title
INFORMATION PROCESSING & MANAGEMENT
ISSN journal
03064573 → ACNP
Volume
37
Issue
4
Year of publication
2001
Pages
623 - 637
Database
ISI
SICI code
0306-4573(200107)37:4<623:DRBUMC>2.0.ZU;2-G
Abstract
One of the most important problems in information retrieval is determining the order of documents in the answer returned to the user. Many methods and algorithms fur document ordering have been proposed. The method introduced in this paper differs from them especially in that it uses a probabilistic model of a document set. In this model documents are regarded as states of a Markov chain, where transition probabilities are directly proportional t o similarities between documents. Steady-state probabilities reflect simila rities of particular documents to the whole answer set. If documents are or dered according to these probabilities, at the top of a list there will be documents that are the best representatives of the set, and at the bottom t hose which are the worst representatives. The method was tested against dat abases INSPEC and Networked Computer Science Technical Reference Library (N CSTRL). Test results are positive. Values of the Kendall rank correlation c oefficient indicate high similarity between rankings generated by the propo sed method and rankings produced by experts. Results are comparable with ra nkings generated by the vector model using standard weighting schema tf . i df. (C) 2001 Elsevier Science Ltd. All rights reserved.