A METHOD FOR DISAMBIGUATING WORD SENSES IN A LARGE CORPUS

Citation
Wa. Gale et al., A METHOD FOR DISAMBIGUATING WORD SENSES IN A LARGE CORPUS, Computers and the Humanities, 26(5-6), 1992, pp. 415-439
Number of citations
37
Subject Categories
Art & Humanities General; Computer Sciences, Special Topics; Computer Applications & Cybernetics
Journal ISSN
0010-4817
Volume
26
Issue
5-6
Year of publication
1992
Pages
415 - 439
Database
ISI
SICI code
0010-4817(1992)26:5-6<415:AMFDWS>2.0.ZU;2-9
Abstract
Word sense disambiguation has been recognized as a major problem in natural language processing research for over forty years. Both quantitative and qualitative methods have been tried, but much of this work has been stymied by difficulties in acquiring appropriate lexical resources. The availability of this testing and training material has enabled us to develop quantitative disambiguation methods that achieve 92% accuracy in discriminating between two very distinct senses of a noun. In the training phase, we collect a number of instances of each sense of the polysemous noun. Then, in the testing phase, we are given a new instance of the noun and asked to assign it to one of the senses. We approach this task by comparing the context of the unknown instance with the contexts of known instances, using a Bayesian argument that has been applied successfully in related tasks such as author identification and information retrieval. The proposed method is probably most appropriate for those aspects of sense disambiguation that are closest to the information retrieval task. In particular, it was designed to disambiguate senses that are usually associated with different topics.
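The training and testing phases described in the abstract can be illustrated with a naive Bayes classifier. Each candidate sense s receives the score log P(s) plus the sum, over the words w in the unknown instance's context, of log P(w | s), and the instance is assigned to the highest-scoring sense. This is a minimal sketch under stated assumptions, not the authors' implementation: the names `context_window` and `NaiveBayesSenseClassifier`, the default window width, the add-one smoothing, and the toy "bank" data are all hypothetical choices made for illustration. The sketch does not reproduce the reported 92% accuracy.

```python
import math
from collections import Counter


def context_window(tokens, index, width=50):
    """Return up to `width` tokens on each side of tokens[index], excluding the
    ambiguous word itself. The default width of 50 is an illustrative assumption."""
    left = tokens[max(0, index - width):index]
    right = tokens[index + 1:index + 1 + width]
    return left + right


class NaiveBayesSenseClassifier:
    """Hypothetical sketch: scores each sense with Bayes' rule, treating the
    context words as independent given the sense (naive Bayes)."""

    def __init__(self):
        self.word_counts = {}             # sense -> Counter of context words
        self.total_words = Counter()      # sense -> number of context tokens seen
        self.instance_counts = Counter()  # sense -> number of training instances
        self.vocabulary = set()           # every context word seen in training

    def train(self, sense, context):
        """Training phase: record the context of one known instance of `sense`."""
        self.word_counts.setdefault(sense, Counter()).update(context)
        self.total_words[sense] += len(context)
        self.instance_counts[sense] += 1
        self.vocabulary.update(context)

    def score(self, sense, context):
        """Log prior plus summed log likelihoods of the context words under `sense`."""
        n_instances = sum(self.instance_counts.values())
        log_prob = math.log(self.instance_counts[sense] / n_instances)
        counts = self.word_counts[sense]
        # Add-one (Laplace) smoothing: a simplifying assumption, not the paper's estimator.
        denom = self.total_words[sense] + len(self.vocabulary)
        for word in context:
            if word not in self.vocabulary:
                continue  # words never seen in training carry no evidence here
            log_prob += math.log((counts[word] + 1) / denom)
        return log_prob

    def classify(self, context):
        """Testing phase: assign the unknown instance to the highest-scoring sense."""
        return max(self.word_counts, key=lambda sense: self.score(sense, context))


# Toy usage with invented contexts for the noun "bank" (hypothetical data).
clf = NaiveBayesSenseClassifier()
clf.train("finance", ["deposit", "money", "loan", "account"])
clf.train("finance", ["interest", "money", "branch", "teller"])
clf.train("river", ["water", "fishing", "shore", "mud"])
clf.train("river", ["river", "water", "grass", "flood"])

tokens = "she walked to the bank to deposit money".split()
context = context_window(tokens, tokens.index("bank"), width=3)
print(clf.classify(context))  # finance
```

Scores are summed in log space so that multiplying many small word probabilities over a wide context window does not underflow. Skipping words never seen in training is one simple way to keep them from favouring whichever sense happens to have the smaller smoothing denominator.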