USING LATENT SEMANTIC INDEXING FOR MULTILANGUAGE INFORMATION-RETRIEVAL

Authors
Citation
M.W. Berry and P.G. Young, USING LATENT SEMANTIC INDEXING FOR MULTILANGUAGE INFORMATION-RETRIEVAL, Computers and the Humanities, 29(6), 1995, pp. 413-429
Number of citations
15
Subject Categories
Art & Humanities General; Computer Sciences, Special Topics; Computer Science Interdisciplinary Applications
Journal ISSN
0010-4817
Volume
29
Issue
6
Year of publication
1995
Pages
413 - 429
Database
ISI
SICI code
0010-4817(1995)29:6<413:ULSIFM>2.0.ZU;2-S
Abstract
In this paper, a method for indexing cross-language databases for conceptual query matching is presented. Two languages (Greek and English) are combined by appending a small portion of documents from one language to the identical documents in the other language. The proposed merging strategy duplicates less than 7% of the entire database (made up of different translations of the Gospels). Previous strategies duplicated up to 34% of the initial database in order to perform the merger. The proposed method retrieves a larger number of relevant documents for both languages with higher cosine rankings when Latent Semantic Indexing (LSI) is employed. Using the proposed merge strategies, LSI is shown to be effective in retrieving documents from either language (Greek or English) without requiring any translation of a user's query. An effective Bible search product needs to allow the use of natural language for searching (queries); LSI enables the user to form queries using natural expressions in the user's own native language. The merging strategy proposed in this study enables LSI to retrieve relevant documents effectively using a minimum of the database in a foreign language.
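The general idea in the abstract (train LSI on a small set of documents that contain both languages, then match monolingual documents and queries in the resulting space by cosine) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy phrases, the raw term-count weighting, the fold-in formula, and all function names (`train_lsi`, `fold_in`, `rank`) are assumptions chosen for illustration, and the rank `k` is set to the number of toy training documents only to keep the example deterministic.

```python
import numpy as np

def term_vector(text, vocab):
    """Raw term-count vector over the training vocabulary (unknown words are ignored)."""
    v = np.zeros(len(vocab))
    for w in text.lower().split():
        if w in vocab:
            v[vocab[w]] += 1.0
    return v

def train_lsi(merged_docs, k):
    """Build a term-document matrix from merged (two-language) documents
    and keep the k largest singular triplets."""
    words = sorted({w for d in merged_docs for w in d.lower().split()})
    vocab = {w: i for i, w in enumerate(words)}
    A = np.column_stack([term_vector(d, vocab) for d in merged_docs])
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    return vocab, U[:, :k], s[:k]

def fold_in(text, model):
    """Project a monolingual document or query into the k-dimensional space."""
    vocab, U_k, s_k = model
    return (term_vector(text, vocab) @ U_k) / s_k

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def rank(query, docs, model):
    """Return (score, document) pairs sorted by decreasing cosine to the query."""
    q = fold_in(query, model)
    scored = [(cosine(q, fold_in(d, model)), d) for d in docs]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)

# Hypothetical training set: each document is an English phrase with its
# Greek counterpart appended (toy data, not taken from the paper's corpus).
merged_docs = [
    "light world φως κοσμος",
    "bread life αρτος ζωη",
    "shepherd sheep ποιμην προβατα",
]
model = train_lsi(merged_docs, k=3)

# Monolingual Greek documents folded in afterwards, queried in English.
greek_docs = ["φως κοσμος", "αρτος ζωη", "ποιμην προβατα"]
results = rank("light", greek_docs, model)
```

In this sketch the English query is never translated; it lands near the Greek document only because both share a merged training document. In practice `k` would be much smaller than the number of training documents, and the term weighting would typically be more elaborate than raw counts.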