A GENERALIZED ALGORITHM FOR JAPANESE MORPHOLOGICAL ANALYSIS AND A COMPARATIVE-EVALUATION OF SOME HEURISTICS

Citation
T. Hisamitsu and Y. Nitta, A GENERALIZED ALGORITHM FOR JAPANESE MORPHOLOGICAL ANALYSIS AND A COMPARATIVE-EVALUATION OF SOME HEURISTICS, Systems and Computers in Japan, 26(1), 1995, pp. 73-87
Number of citations
16
Subject Categories
Computer Science Hardware & Architecture; Computer Science Information Systems; Computer Science Theory & Methods
ISSN journal
0882-1666
Volume
26
Issue
1
Year of publication
1995
Pages
73 - 87
Database
ISI
SICI code
0882-1666(1995)26:1<73:AGAFJM>2.0.ZU;2-U
Abstract
In ordinary written Japanese, words are not separated by spaces. Therefore, morphological analysis involves segmenting and tagging sentences. Since each sentence has a huge number of possible tagged segmentations, various criteria have been proposed for making plausible decisions. However, there are still no unified frameworks that incorporate various heuristics, and there has been no comparative evaluation of commonly used heuristics. This paper presents a clear framework to describe various heuristics, and an N-best algorithm for extracting optimal solutions. The time complexity of this algorithm is O(nN log_2(1 + N)), where n is the sentence length. The advantage of the N-best algorithm over the standard beam search algorithm is also discussed. This paper also presents a comparative evaluation of three major heuristics, and proposes a precise and portable rule-based heuristic. Estimation was done using the aforementioned algorithm and six criteria. The newly proposed heuristic is based upon the Extended Least Bunsetsu (Phrase) Number method.
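To make the idea of N-best extraction concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a hypothetical toy lexicon (`TOY_LEXICON`) that maps each word to an additive cost, omits tagging entirely, and uses a cost of 1 per word as a rough stand-in for a "fewest units" heuristic. The function name `n_best_segmentations` and all data are illustrative assumptions. The sketch keeps up to N cheapest partial analyses at every character position, which is enough to recover the N cheapest full segmentations when costs are additive; it makes no claim to reach the complexity bound stated in the abstract.

```python
import heapq

# Hypothetical toy lexicon (an assumption for illustration only):
# every word costs 1, so the total cost is the number of words.
TOY_LEXICON = {"a": 1, "b": 1, "ab": 1, "c": 1, "d": 1, "cd": 1, "bcd": 1}


def n_best_segmentations(sentence, lexicon, n_best):
    """Return up to n_best (cost, words) pairs, cheapest first.

    lexicon must be a non-empty dict mapping words to additive costs.
    """
    # best[i] holds up to n_best partial analyses covering sentence[:i].
    best = [[] for _ in range(len(sentence) + 1)]
    best[0] = [(0, ())]
    max_len = max(len(word) for word in lexicon)

    for end in range(1, len(sentence) + 1):
        candidates = []
        # Try every lexicon word that could end at position `end`.
        for start in range(max(0, end - max_len), end):
            word = sentence[start:end]
            if word in lexicon:
                for cost, words in best[start]:
                    candidates.append((cost + lexicon[word], words + (word,)))
        # Keep only the n_best cheapest partial analyses at this position.
        best[end] = heapq.nsmallest(n_best, candidates)

    return best[len(sentence)]


if __name__ == "__main__":
    for cost, words in n_best_segmentations("abcd", TOY_LEXICON, 2):
        print(cost, " | ".join(words))
```

Unlike a beam search that prunes by a fixed beam width, pruning to N per position here cannot discard any prefix of a true top-N solution, because any full analysis built on a discarded prefix is beaten by at least N analyses built on the kept ones with the same suffix.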