We study the adaptation of an existing language-identification system
to new languages using a limited amount of training data. The platform
used for this study is the system recently developed by Yan and Barnard
(1995a,b), which exploits phonotactic constraints based on
language-dependent phone recognition. Using the proposed language-model
re-estimation technique based on probabilistic gradient descent, two new
approaches and their combination are proposed and tested. All of these
approaches modify the phonotactic language models so that they no longer
equal the conventional maximum-likelihood estimates. The differences
between these methods can be viewed as different ways of resampling the
information in the same amount of data. Experiments were conducted on the
standard OGI_TS database (Muthusamy et al., 1992). For comparison, the
baseline system (with conventional model estimation) was subjected to the
same set of tests. Systems trained with different amounts of training
data in the new languages were evaluated. The results demonstrate that,
compared with conventional model estimation, the new methods improve
adaptation to new languages. The success of the discriminative model
shows that conventional model estimation is not optimal for language
identification, and that improvements can be obtained by modifying the
maximum-likelihood estimates of the language models.
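The contrast between maximum-likelihood and discriminatively modified phonotactic models can be illustrated with a small sketch. The abstract does not spell out the re-estimation technique, so the Python below rests on stated assumptions: a single phone stream (rather than several language-dependent phone recognizers), add-k smoothed bigram phonotactic models standing in for the conventional maximum-likelihood estimate, and, as a swapped-in stand-in for the authors' probabilistic-gradient-descent method, a generic gradient-ascent step on the log posterior of the correct language (an MMI-style conditional-likelihood criterion). The function names (`ml_logits`, `score`, `identify`, `log_posteriors`, `discriminative_step`), the phone inventory, and the toy sequences are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy phone inventory; a real system would use the phone labels
# emitted by its phone recognizer(s).
PHONES = ["a", "b", "k", "s"]


def bigram_counts(seq):
    """Count (history, next) phone bigrams in one phone sequence."""
    return Counter(zip(seq, seq[1:]))


def ml_logits(sequences, k=1.0):
    """Add-k smoothed relative-frequency bigram model, stored as log-probs.

    k=0 would give the plain maximum-likelihood estimate (undefined for
    unseen histories); k>0 avoids log(0) with little training data.
    """
    counts = Counter()
    for seq in sequences:
        counts.update(bigram_counts(seq))
    theta = {}
    for a in PHONES:
        total = sum(counts[(a, b)] for b in PHONES) + k * len(PHONES)
        for b in PHONES:
            theta[(a, b)] = math.log((counts[(a, b)] + k) / total)
    return theta


def cond_prob(theta, a, b):
    """P(b | a) as a softmax over the logits theta[(a, .)]."""
    z = sum(math.exp(theta[(a, c)]) for c in PHONES)
    return math.exp(theta[(a, b)]) / z


def score(theta, seq):
    """Log-likelihood of a phone sequence under one phonotactic bigram model."""
    return sum(n * math.log(cond_prob(theta, a, b))
               for (a, b), n in bigram_counts(seq).items())


def identify(models, seq):
    """Pick the language whose phonotactic model scores the sequence highest."""
    return max(models, key=lambda lang: score(models[lang], seq))


def log_posteriors(models, seq):
    """Log P(language | seq), assuming equal language priors."""
    scores = {lang: score(th, seq) for lang, th in models.items()}
    m = max(scores.values())
    log_z = m + math.log(sum(math.exp(s - m) for s in scores.values()))
    return {lang: s - log_z for lang, s in scores.items()}


def discriminative_step(models, seq, true_lang, lr=0.1):
    """One gradient-ascent step on log P(true_lang | seq), updating in place.

    Generic conditional-likelihood update (an assumption, not the paper's
    exact rule): after it, the logits no longer equal their ML values.
    """
    post = log_posteriors(models, seq)
    counts = bigram_counts(seq)
    hist = Counter()
    for (a, _), n in counts.items():
        hist[a] += n
    for lang, theta in models.items():
        # dJ/dscore_lang = [lang == true_lang] - P(lang | seq)
        weight = (1.0 if lang == true_lang else 0.0) - math.exp(post[lang])
        # dscore/dtheta[(a, b)] = n(a, b) - n(a, .) * P(b | a); computed fully
        # before any logit of this language is changed.
        grads = {(a, b): counts[(a, b)] - hist[a] * cond_prob(theta, a, b)
                 for a in PHONES for b in PHONES}
        for key, g in grads.items():
            theta[key] += lr * weight * g


if __name__ == "__main__":
    # Hypothetical toy training and adaptation data.
    models = {
        "lang_x": ml_logits([list("ababa"), list("bab")]),
        "lang_y": ml_logits([list("ksks"), list("sksa")]),
    }
    seq = list("saba")
    print("P(lang_x | seq) before:", math.exp(log_posteriors(models, seq)["lang_x"]))
    for _ in range(20):
        discriminative_step(models, seq, "lang_x", lr=0.1)
    print("P(lang_x | seq) after: ", math.exp(log_posteriors(models, seq)["lang_x"]))
```

Initializing the logits at the smoothed ML values and then stepping on the posterior mirrors the abstract's point only in spirit: the update is driven by how well the models separate competing languages rather than by how well each one fits its own data, which is one plausible reason such models can help when new-language training data is scarce.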