ITA
ENG

Document ranking based upon Markov chains

Authors

Danilowicz, C Balinski, J

Citation

C. Danilowicz et J. Balinski, Document ranking based upon Markov chains, INF PR MAN, 37(4), 2001, pp. 623-637

Citations number

Categorie Soggetti

Library & Information Science","Information Tecnology & Communication Systems

Journal title

INFORMATION PROCESSING & MANAGEMENT

ISSN journal

03064573 → ACNP

Volume

Issue

Year of publication

2001

Pages

623 - 637

Database

ISI

SICI code

0306-4573(200107)37:4<623:DRBUMC>2.0.ZU;2-G

Abstract

One of the most important problems in information retrieval is determining the order of documents in the answer returned to the user. Many methods and algorithms fur document ordering have been proposed. The method introduced in this paper differs from them especially in that it uses a probabilistic model of a document set. In this model documents are regarded as states of a Markov chain, where transition probabilities are directly proportional t o similarities between documents. Steady-state probabilities reflect simila rities of particular documents to the whole answer set. If documents are or dered according to these probabilities, at the top of a list there will be documents that are the best representatives of the set, and at the bottom t hose which are the worst representatives. The method was tested against dat abases INSPEC and Networked Computer Science Technical Reference Library (N CSTRL). Test results are positive. Values of the Kendall rank correlation c oefficient indicate high similarity between rankings generated by the propo sed method and rankings produced by experts. Results are comparable with ra nkings generated by the vector model using standard weighting schema tf . i df. (C) 2001 Elsevier Science Ltd. All rights reserved.