This paper discusses the research directions pursued jointly at the
Anthropic Signal Processing Group of the Oregon Graduate Institute and at
the Speech and Vision Laboratory of the Indian Institute of Technology
Madras. Current methods for speaker verification are based on modeling the
speaker characteristics using Gaussian mixture models (GMM). The
performance of these systems degrades significantly if the target speakers
use a telephone handset that is different from the one used during
training. Conventional methods for channel normalization include
utterance-based mean subtraction (MS) and RelAtive SpecTrAl (RASTA)
filtering. In this paper we introduce a novel method for designing filters
that are capable of normalizing the variability introduced by different
telephone handsets. The design of the filter is based
on the estimated second-order statistics of handset variability. This
filter is applied to the logarithmic energy outputs of Mel-spaced filter
banks.
We also demonstrate the effectiveness of the proposed channel normalizing
filter in improving speaker verification performance in mismatched
conditions. GMM-based systems often use thousands of mixture components
and hence require a large number of parameters to characterize each target
speaker. In
order to address this issue we propose an alternative to GMM for modeling
speaker characteristics. The alternative is based on speaker-specific
mapping and it relies on a speaker-independent representation of speech.
© 2000 Academic Press.
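
As an illustration of the conventional baseline named in the abstract, utterance-based mean subtraction (MS) can be sketched as follows. This is a minimal sketch under stated assumptions, not the filter proposed in the paper: the handset is modeled as a fixed per-band gain, which becomes an additive offset in the logarithmic energy domain, and the feature matrix is synthetic random data standing in for log Mel filter-bank energies. The function name `mean_subtract` and all array shapes are hypothetical.

```python
import numpy as np


def mean_subtract(log_energies: np.ndarray) -> np.ndarray:
    """Utterance-based mean subtraction.

    log_energies has shape (num_frames, num_bands); the per-band mean
    over the whole utterance is removed from every frame.
    """
    return log_energies - log_energies.mean(axis=0, keepdims=True)


rng = np.random.default_rng(0)

# Synthetic stand-in for log Mel filter-bank energies (assumption: 200 frames, 20 bands).
clean = rng.normal(size=(200, 20))

# Assumption: a handset acts as a fixed per-band gain, i.e. a constant offset in the log domain.
handset_offset = rng.normal(size=(1, 20))
observed = clean + handset_offset

normalized_clean = mean_subtract(clean)
normalized_observed = mean_subtract(observed)

# Under this idealized model the handset offset is removed by the normalization.
print(np.allclose(normalized_clean, normalized_observed))
```

Under this idealized model the two normalized feature matrices agree up to floating-point rounding. Real handsets also introduce variability that a constant per-band offset does not capture.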