S. Nakagawa et al., SPOKEN LANGUAGE IDENTIFICATION BY ERGODIC HMMS AND ITS STATE SEQUENCES, Electronics and communications in Japan. Part 3, Fundamental electronic science, 77(6), 1994, pp. 70-79
This paper describes an automatic text- and speaker-independent language identification method based on hidden Markov models (HMMs) of acoustic features. The HMMs are used to represent the phonotactics of each language. Each language has its own phonotactics. The HMM topology is fully connected (ergodic): any state can transition to any state. Two kinds of HMM are used: the discrete HMM (DHMM), which uses a codebook, and the continuous density HMM (CHMM). The HMMs were trained using both the Baum-Welch (forward-backward) algorithm and the Viterbi algorithm; the latter was used to emphasize the state transition probabilities. For comparison, identification experiments were also conducted using a single-state Gaussian mixture model. This single-state model gave the same performance as the HMM trained with the Baum-Welch algorithm, because the transitions between states, which reflect the characteristics of each language, have little effect on the likelihood scores. This problem was addressed by emphasizing the transition probabilities and using the Viterbi algorithm, which improved the recognition rates. A trigram model of the optimal state sequence is also introduced; combining it with the HMM produced the best results.
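
As a rough illustration of the scoring idea described above, the following Python sketch runs the Viterbi algorithm on a discrete ergodic HMM per language and picks the language with the highest best-path log score. It is a minimal sketch, not the authors' implementation. The two toy models, the names `viterbi_log_score`, `identify`, `lang_a` and `lang_b`, and the `trans_weight` exponent are all assumptions made for illustration; in particular, the abstract does not say how the transition probabilities are emphasized, so scaling the log transition terms is only one plausible reading.

```python
import math


def viterbi_log_score(obs, pi, A, B, trans_weight=1.0):
    """Return (best-path log score, best state path) for a discrete HMM.

    obs: list of codebook indices; pi: initial probabilities;
    A[i][j]: transition probability i -> j (ergodic: all entries > 0);
    B[s][o]: probability that state s emits symbol o.
    trans_weight is a hypothetical knob that scales the log transition terms.
    """
    n = len(pi)
    delta = [math.log(pi[s]) + math.log(B[s][obs[0]]) for s in range(n)]
    backpointers = []
    for o in obs[1:]:
        prev, delta, step = delta, [], []
        for j in range(n):
            # Best predecessor of state j under the weighted transition score.
            i_best = max(range(n),
                         key=lambda i: prev[i] + trans_weight * math.log(A[i][j]))
            step.append(i_best)
            delta.append(prev[i_best]
                         + trans_weight * math.log(A[i_best][j])
                         + math.log(B[j][o]))
        backpointers.append(step)
    last = max(range(n), key=lambda s: delta[s])
    path = [last]
    for step in reversed(backpointers):
        path.append(step[path[-1]])
    path.reverse()
    return delta[last], path


def identify(obs, models, trans_weight=1.0):
    """Score obs against every language model and return (best language, scores)."""
    scores = {lang: viterbi_log_score(obs, *params, trans_weight=trans_weight)[0]
              for lang, params in models.items()}
    return max(scores, key=scores.get), scores


# Toy two-state, two-symbol models (made-up numbers). Both share the same
# emissions and differ only in their transitions: lang_a tends to stay in a
# state, lang_b tends to alternate between states.
PI = [0.5, 0.5]
B = [[0.9, 0.1], [0.1, 0.9]]
models = {
    "lang_a": (PI, [[0.9, 0.1], [0.1, 0.9]], B),
    "lang_b": (PI, [[0.1, 0.9], [0.9, 0.1]], B),
}

best, scores = identify([0, 0, 0, 1, 1, 1], models)
print(best, scores)
```

Because the toy models share their emission probabilities, any difference in score comes from the transitions alone. The best state path returned by `viterbi_log_score` is also the kind of sequence over which a state trigram, as mentioned in the abstract, could be estimated.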