H. Abu-salem et al., Stemming methodologies over individual query words for an Arabic Information Retrieval System, J AM S INFO, 50(6), 1999, pp. 524-529
Number of citations
24
Subject Categories
Library & Information Science
Journal title
JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE
Stemming is one of the most important factors that affect the performance of information retrieval systems. This article investigates how to improve the performance of an Arabic Information Retrieval System (Arabic-IRS) by imposing the retrieval method over individual words of a query depending on the importance of the WORD, the STEM, or the ROOT of the query terms in the database. This method, called Mixed Stemming, computes term importance using a weighting scheme that uses the Term Frequency (TF) and the Inverse Document Frequency (IDF), called TFxIDF. An extended version of the Arabic-IRS system is designed, implemented, and evaluated to reduce the number of irrelevant documents retrieved. The results of the experiment suggest that the proposed method outperforms the Word index method using the Binary scheme and the Word index method using the TFxIDF weighting scheme. It also outperforms the Stem index method using the Binary weighting scheme but does not outperform the Stem index method using the TFxIDF weighting scheme, and again it outperforms the Root index method using the Binary weighting scheme but does not outperform the Root index method using the TFxIDF weighting scheme.
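
The abstract does not give the exact weighting formula or the rule used to choose between WORD, STEM, and ROOT, so the following is only a minimal sketch under stated assumptions. It assumes the common textbook form weight = tf * log(N / df), and it assumes that each query word is matched at whichever index level has the highest importance score. The function names tf_idf and pick_level, and the placeholder tokens, are hypothetical and do not come from the article.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Assumed formula (not confirmed by the abstract): tf * log(N / df),
    where tf is the raw count of the term in the document, N is the
    number of documents, and df is the number of documents containing
    the term.
    """
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each term once per document
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights


def pick_level(scores):
    """Pick the index level with the highest importance score.

    `scores` maps a level name ("word", "stem", "root") to a score.
    This selection rule is an illustrative assumption.
    """
    return max(scores, key=scores.get)


# Placeholder tokens stand in for indexed Arabic terms.
docs = [["t1", "t2", "t1"], ["t2", "t3"]]
weights = tf_idf(docs)

# Hypothetical importance scores for one query word at each level.
level = pick_level({"word": 0.1, "stem": 0.5, "root": 0.3})
```

In this sketch a term that occurs in every document (t2 above) receives a weight of zero, which is the usual effect of the IDF factor: terms that do not discriminate between documents contribute nothing to ranking.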