Stemming methodologies over individual query words for an Arabic Information Retrieval System

Citation
H. Abu-salem et al., Stemming methodologies over individual query words for an Arabic Information Retrieval System, J AM S INFO, 50(6), 1999, pp. 524-529
Number of citations
24
Subject categories
Library & Information Science
Journal title
JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE
Journal ISSN
0002-8231
Volume
50
Issue
6
Year of publication
1999
Pages
524 - 529
Database
ISI
SICI code
0002-8231(19990501)50:6<524:SMOIQW>2.0.ZU;2-I
Abstract
Stemming is one of the most important factors that affect the performance of information retrieval systems. This article investigates how to improve the performance of an Arabic Information Retrieval System (Arabic-IRS) by imposing the retrieval method over individual words of a query depending on the importance of the WORD, the STEM, or the ROOT of the query terms in the database. This method, called Mixed Stemming, computes term importance using a weighting scheme that uses the Term Frequency (TF) and the Inverse Document-frequency (IDF), called TFxIDF. An extended version of the Arabic-IRS system is designed, implemented, and evaluated to reduce the number of irrelevant documents retrieved. The results of the experiment suggest that the proposed method outperforms the Word index method using the Binary scheme and the Word index method using the TFxIDF weighting scheme. It also outperforms the Stem index method using the Binary weighting scheme but does not outperform the Stem index method using the TFxIDF weighting scheme, and again it outperforms the Root index method using the Binary weighting scheme but does not outperform the Root index method using the TFxIDF weighting scheme.
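
The per-word selection described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It rests on several assumptions that the abstract does not state: the TFxIDF variant used (collection term frequency multiplied by log(N/df)), the selection rule (pick the index level whose form of the query term scores highest), and the function names `idf`, `tfidf_importance`, and `choose_level`, which are hypothetical. The transliterated tokens are hand-written placeholders, not the output of a real Arabic stemmer.

```python
import math

def idf(term, docs):
    """Inverse document frequency, assumed here to be log(N / df)."""
    df = sum(1 for doc in docs if term in doc)
    return math.log(len(docs) / df) if df else 0.0

def tfidf_importance(term, docs):
    """Assumed importance: total term frequency in the collection times IDF."""
    tf = sum(doc.count(term) for doc in docs)
    return tf * idf(term, docs)

def choose_level(forms, indexes):
    """Score one query term at each index level and return the best level.

    forms   maps a level ('word', 'stem', 'root') to the term's form there.
    indexes maps the same levels to the collection tokenised at that level.
    Picking the maximum score is an assumption made for this sketch.
    """
    scores = {level: tfidf_importance(forms[level], indexes[level])
              for level in forms}
    return max(scores, key=scores.get), scores

# Toy collection of five documents, tokenised by hand at each level
# (illustrative placeholders only).
indexes = {
    "word": [["alkitab", "jadid"], ["kitabuhu", "qadim"],
             ["maktaba", "kabira"], ["bayt", "kabir"], ["madrasa", "jadida"]],
    "stem": [["kitab", "jadid"], ["kitab", "qadim"],
             ["maktaba", "kabir"], ["bayt", "kabir"], ["madrasa", "jadid"]],
    "root": [["ktb", "jdd"], ["ktb", "qdm"],
             ["ktb", "kbr"], ["byt", "kbr"], ["drs", "jdd"]],
}

# One query term expressed at the three levels.
forms = {"word": "alkitab", "stem": "kitab", "root": "ktb"}

best, scores = choose_level(forms, indexes)
print(best, scores)
```

In this toy collection the stem form scores highest: it matches more documents than the surface word while remaining rarer than the root. A full system would repeat the choice for every query word and then retrieve from the index selected for each one.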