Modeling of the glottal flow derivative waveform with application to speaker identification

Citation
M.D. Plumpe et al., Modeling of the glottal flow derivative waveform with application to speaker identification, IEEE SPEECH, 7(5), 1999, pp. 569-586
Number of citations
35
Subject categories
Electrical & Electronics Engineering
Journal title
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING
ISSN journal
1063-6676
Volume
7
Issue
5
Year of publication
1999
Pages
569 - 586
Database
ISI
SICI code
1063-6676(199909)7:5<569:MOTGFD>2.0.ZU;2-1
Abstract
An automatic technique for estimating and modeling the glottal flow derivative source waveform from speech, and applying the model parameters to speaker identification, is presented. The estimate of the glottal flow derivative is decomposed into coarse structure, representing the general flow shape, and fine structure, comprising aspiration and other perturbations in the flow, from which model parameters are obtained. The glottal flow derivative is estimated using an inverse filter determined within a time interval of vocal-fold closure that is identified through differences in formant frequency modulation during the open and closed phases of the glottal cycle. This formant motion is predicted by Ananthapadmanabha and Fant to be a result of time-varying and nonlinear source/vocal tract coupling within a glottal cycle. The glottal flow derivative estimate is modeled using the Liljencrants-Fant model to capture its coarse structure, while the fine structure of the flow derivative is represented through energy and perturbation measures. The model parameters are used in a Gaussian mixture model speaker identification (SID) system. Both coarse- and fine-structure glottal features are shown to contain significant speaker-dependent information. For a large TIMIT database subset, averaging over male and female SID scores, the coarse-structure parameters achieve about 60% accuracy, the fine-structure parameters give about 40% accuracy, and their combination yields about 70% correct identification. Finally, in preliminary experiments on the counterpart telephone-degraded NTIMIT database, about a 5% error reduction in SID scores is obtained when source features are combined with traditional mel-cepstral measures.
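Illustrative sketch
The Gaussian mixture model SID step named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes diagonal-covariance mixtures that have already been trained, and the function names (gmm_log_likelihood, identify_speaker), the speaker labels, and all numeric values are hypothetical toy choices. Each speaker model scores the test feature frames, and the speaker with the highest total log-likelihood is selected.

```python
import math

def gmm_log_likelihood(frames, weights, means, variances):
    """Total log-likelihood of feature frames under a diagonal-covariance GMM."""
    total = 0.0
    for x in frames:
        # Log of each weighted component density evaluated at frame x.
        comp_logs = []
        for w, mu, var in zip(weights, means, variances):
            log_p = math.log(w)
            for xi, mi, vi in zip(x, mu, var):
                log_p += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
            comp_logs.append(log_p)
        # Log-sum-exp over components for numerical stability.
        peak = max(comp_logs)
        total += peak + math.log(sum(math.exp(c - peak) for c in comp_logs))
    return total

def identify_speaker(frames, speaker_models):
    """Return the speaker whose GMM gives the highest total log-likelihood."""
    return max(
        speaker_models,
        key=lambda name: gmm_log_likelihood(frames, *speaker_models[name]),
    )

# Hypothetical toy models as (weights, means, variances); not values from the paper.
speaker_models = {
    "spk_a": ([0.5, 0.5], [[0.0, 0.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]),
    "spk_b": ([0.5, 0.5], [[5.0, 5.0], [6.0, 6.0]], [[1.0, 1.0], [1.0, 1.0]]),
}

# Hypothetical test frames standing in for glottal source feature vectors.
test_frames = [[0.2, -0.1], [0.9, 1.1], [0.4, 0.6]]
best = identify_speaker(test_frames, speaker_models)
```

In a system of the kind the abstract describes, each frame would hold the coarse- and fine-structure glottal parameters (optionally alongside mel-cepstral features), and the mixture parameters would be estimated from each speaker's training data rather than set by hand.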