
Data-driven temporal filters and alternatives to GMM in speaker verification

Authors

Malayath, N; Hermansky, H; Kajarekar, S; Yegnanarayana, B

Citation

N. Malayath et al., Data-driven temporal filters and alternatives to GMM in speaker verification, DIGIT SIG P, 10(1-3), 2000, pp. 55-74

Citations number

Subject Categories

Electrical & Electronics Engineering

Journal title

DIGITAL SIGNAL PROCESSING

ISSN journal

1051-2004

Volume

10

Issue

1-3

Year of publication

2000

Pages

55 - 74

Database

ISI

SICI code

1051-2004(200001/07)10:1-3<55:DTFAAT>2.0.ZU;2-I

Abstract

This paper discusses the research directions pursued jointly at the Anthropic Signal Processing Group of the Oregon Graduate Institute and at the Speech and Vision Laboratory of the Indian Institute of Technology Madras. Current methods for speaker verification are based on modeling the speaker characteristics using Gaussian mixture models (GMM). The performance of these systems degrades significantly if the target speakers use a telephone handset different from the one used during training. Conventional methods for channel normalization include utterance-based mean subtraction (MS) and RelAtive SpecTrAl (RASTA) filtering. In this paper we introduce a novel method for designing filters that are capable of normalizing the variability introduced by different telephone handsets. The design of the filter is based on the estimated second-order statistics of handset variability. This filter is applied to the logarithmic energy outputs of Mel-spaced filter banks. We also demonstrate the effectiveness of the proposed channel-normalizing filter in improving speaker verification performance in mismatched conditions. GMM-based systems often use thousands of mixture components and hence require a large number of parameters to characterize each target speaker. To address this issue, we propose an alternative to GMM for modeling speaker characteristics. The alternative is based on speaker-specific mapping and relies on a speaker-independent representation of speech. (C) 2000 Academic Press.
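
The conventional baseline named in the abstract, utterance-based mean subtraction (MS), can be illustrated with a minimal sketch. It is not the data-driven filter proposed in the paper, whose design is not given in this record. The sketch assumes that a fixed handset response appears as a constant per-band offset in the log filter-bank energies; the function name `mean_subtract` and the toy values are hypothetical.

```python
def mean_subtract(log_energies):
    """Subtract the per-band utterance mean from every frame.

    log_energies: list of frames, each a list of log filter-bank energies.
    """
    n_frames = len(log_energies)
    n_bands = len(log_energies[0])
    # Mean of each band taken over the whole utterance
    means = [sum(frame[b] for frame in log_energies) / n_frames
             for b in range(n_bands)]
    return [[frame[b] - means[b] for b in range(n_bands)]
            for frame in log_energies]


# Hypothetical toy utterance: 3 frames x 2 bands of log energies
clean = [[1.0, 2.0], [3.0, 0.0], [2.0, 4.0]]

# Assumption: a fixed channel adds a constant offset per band in the log domain
offset = [0.5, -1.5]
shifted = [[v + o for v, o in zip(frame, offset)] for frame in clean]

normalized_clean = mean_subtract(clean)
normalized_shifted = mean_subtract(shifted)

# Largest difference between the two normalized versions
max_diff = max(abs(x - y)
               for fa, fb in zip(normalized_clean, normalized_shifted)
               for x, y in zip(fa, fb))
print(max_diff)
```

Under that assumption the two normalized utterances agree up to floating-point rounding, which is why MS removes a stationary channel but cannot handle variability that changes within an utterance.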