SPOKEN LANGUAGE IDENTIFICATION BY ERGODIC HMMS AND ITS STATE SEQUENCES

Citation
S. Nakagawa et al., SPOKEN LANGUAGE IDENTIFICATION BY ERGODIC HMMS AND ITS STATE SEQUENCES, Electronics and communications in Japan. Part 3, Fundamental electronic science, 77(6), 1994, pp. 70-79
Number of citations
15
Subject Categories
Engineering, Electrical & Electronic
Journal ISSN
1042-0967
Volume
77
Issue
6
Year of publication
1994
Pages
70 - 79
Database
ISI
SICI code
1042-0967(1994)77:6<70:SLIBEH>2.0.ZU;2-8
Abstract
This paper describes an automatic text- and speaker-independent language identification method based on hidden Markov models (HMMs) of acoustic features. The HMMs are used to represent the phonotactics of each language, since each language has its own characteristic phonotactics. The HMM topology is fully connected (ergodic): any state can transition to any state. Two kinds of HMM are used: the discrete HMM (DHMM) with a codebook and the continuous-density HMM (CHMM). The HMMs were trained using both the Baum-Welch (forward-backward) algorithm and the Viterbi algorithm; the latter was used to emphasize the state transition probabilities. For comparison, identification experiments were also conducted using a Gaussian mixture distribution model with one state. This single-state model gave the same performance as the HMM trained with the Baum-Welch algorithm, because the transitions between states, which reflect the characteristics of each language, have little effect on the likelihood scores. This problem was addressed by emphasizing the transition probabilities and using the Viterbi algorithm, which improved the recognition rates. A trigram model of the optimal state sequence is also introduced; combining it with the HMM produced the best results.
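The scoring idea in the abstract can be illustrated with a minimal sketch: one discrete ergodic HMM per language, Viterbi decoding of a codebook-symbol sequence, and selection of the language whose model scores highest. Everything named here is an assumption for illustration and is not taken from the paper: the functions `viterbi_log_score` and `identify`, the `trans_weight` exponent as one possible way to emphasize transition probabilities, and the toy two-state models. The state-sequence trigram is not covered.

```python
import math


def viterbi_log_score(obs, start, trans, emit, trans_weight=1.0):
    """Return (best log score, best state path) for a discrete ergodic HMM.

    trans_weight scales the log transition probabilities. This is a
    hypothetical way to emphasize transitions, not the paper's scheme.
    """
    n_states = len(start)
    # Initialise with the first observation symbol.
    delta = [math.log(start[s]) + math.log(emit[s][obs[0]])
             for s in range(n_states)]
    backptrs = []
    for o in obs[1:]:
        new_delta, ptr = [], []
        for j in range(n_states):
            # Ergodic topology: every state i may precede every state j.
            best_i = max(
                range(n_states),
                key=lambda i: delta[i] + trans_weight * math.log(trans[i][j]),
            )
            ptr.append(best_i)
            new_delta.append(delta[best_i]
                             + trans_weight * math.log(trans[best_i][j])
                             + math.log(emit[j][o]))
        delta = new_delta
        backptrs.append(ptr)
    # Trace the optimal state sequence back from the best final state.
    last = max(range(n_states), key=lambda s: delta[s])
    path = [last]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    path.reverse()
    return delta[last], path


def identify(obs, models, trans_weight=1.0):
    """Pick the language whose HMM gives the highest Viterbi log score."""
    scores = {lang: viterbi_log_score(obs, *m, trans_weight=trans_weight)[0]
              for lang, m in models.items()}
    return max(scores, key=scores.get)


# Toy models (assumed): identical emissions, different transition structure.
start = [0.5, 0.5]
emit = [[0.9, 0.1], [0.1, 0.9]]
models = {
    "A": (start, [[0.9, 0.1], [0.1, 0.9]], emit),  # states tend to persist
    "B": (start, [[0.1, 0.9], [0.9, 0.1]], emit),  # states tend to alternate
}
```

Because the two toy models share emission distributions, only the transition structure can separate them, which mirrors the abstract's point that the transitions carry the language-specific information. Calling `identify([0, 0, 0, 1, 1, 1], models)` and `identify([0, 1, 0, 1, 0, 1], models)` exercises the two cases.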