T. Hisamitsu and Y. Nitta, A Generalized Algorithm for Japanese Morphological Analysis and a Comparative Evaluation of Some Heuristics, Systems and Computers in Japan, 26(1), 1995, pp. 73-87
Number of citations: 16
Subject categories: Computer Science Hardware & Architecture; Computer Science Information Systems; Computer Science Theory & Methods
In ordinary written Japanese, words are not separated by spaces. Morphological analysis therefore involves both segmenting and tagging sentences. Because each sentence admits a huge number of possible tagged segmentations, various criteria have been proposed for choosing the most plausible one. However, there is still no unified framework that incorporates these heuristics, and there has been no comparative evaluation of the commonly used ones. This paper presents a clear framework for describing various heuristics, together with an N-best algorithm for extracting the optimal solutions. The time complexity of this algorithm is $O(nN\log_2(1+N))$, where n is the sentence length and N is the number of solutions extracted. The advantages of the N-best algorithm over the standard beam search algorithm are also discussed. The paper further presents a comparative evaluation of three major heuristics and proposes a precise and portable rule-based heuristic. The evaluation was carried out using the aforementioned algorithm and six criteria. The newly proposed heuristic is based on the Extended Least Bunsetsu (Phrase) Number method.
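The abstract does not spell out the algorithm, so what follows is only a minimal sketch of the general idea it describes, under stated assumptions. Candidate morphemes form a lattice over character positions, a heuristic is expressed as a cost on each morpheme, and the N cheapest partial tagged segmentations are kept at every position. This is not the authors' algorithm, and it does not attempt to match the complexity bound given in the abstract. The names `Edge`, `n_best` and `EDGES`, the toy lattice for くるまでまつ, and the cost scheme (1 per independent word, 0 per particle, as a crude stand-in for counting bunsetsu) are all hypothetical.

```python
import heapq
from typing import NamedTuple


class Edge(NamedTuple):
    """One candidate morpheme in the lattice (hypothetical representation)."""
    start: int    # index of the first character covered
    end: int      # index just past the last character covered (end > start)
    surface: str
    tag: str
    cost: int     # heuristic cost; the scheme used below is an assumption


def n_best(length, edges, n):
    """Return up to n (cost, path) pairs spanning characters 0..length, cheapest first."""
    incoming = {}
    for e in edges:
        incoming.setdefault(e.end, []).append(e)
    # best[i] holds the n cheapest partial analyses ending at position i.
    best = {0: [(0, ())]}
    for pos in range(1, length + 1):
        candidates = (
            (cost + e.cost, path + (e,))
            for e in incoming.get(pos, [])
            for cost, path in best[e.start]
        )
        # nsmallest keeps only n candidates and is stable for equal costs.
        best[pos] = heapq.nsmallest(n, candidates, key=lambda cp: cp[0])
    return best[length]


# Toy lattice for くるまでまつ. Hypothetical cost: 1 per independent word,
# 0 per particle, a crude stand-in for counting bunsetsu.
EDGES = [
    Edge(0, 2, "くる", "verb", 1),
    Edge(0, 3, "くるま", "noun", 1),
    Edge(2, 3, "ま", "noun", 1),
    Edge(2, 4, "まで", "particle", 0),
    Edge(3, 4, "で", "particle", 0),
    Edge(4, 6, "まつ", "verb", 1),
]

for cost, path in n_best(6, EDGES, 3):
    print(cost, "/".join(e.surface for e in path))
```

Under this toy cost, くる/まで/まつ and くるま/で/まつ tie at cost 2, and くる/ま/で/まつ follows at cost 3. A cost based on phrase counts alone cannot break that tie. Choosing and refining the cost function is exactly where the heuristics the paper compares would come in.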