W. Ding et al., Simultaneous Estimation of Vocal-Tract and Voice Source Parameters Based on an ARX Model, IEICE Transactions on Information and Systems, E78D(6), 1995, pp. 738-743
A novel adaptive pitch-synchronous analysis method is proposed to
estimate vocal tract (formant/antiformant) and voice source parameters
simultaneously from speech waveforms. We use the parametric
Rosenberg-Klatt (RK) model to generate a glottal waveform and an
autoregressive-exogenous (ARX) model to represent the voiced speech
production process. The Kalman filter algorithm is used to estimate the
formant/antiformant parameters from the coefficients of the ARX model,
and the simulated annealing method is employed as a nonlinear
optimization approach to estimate the voice source parameters. The two
approaches work together in a system identification procedure to find
the best set of parameters for both models. Using synthetic speech, the
new method was compared with several other approaches in terms of the
accuracy of the estimated parameter values and was found to be
superior. We also show that the proposed method can accurately estimate
the parameters from natural speech sounds. A major application of the
analysis method lies in a concatenative formant synthesizer, which
allows flexible control of the voice quality of synthetic speech.
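
As a rough illustration of the two models named in the abstract, the sketch below generates one pitch period of an RK-style glottal flow derivative and passes it through an ARX difference equation. The abstract gives no formulas, so everything here is an assumption: the polynomial form of the pulse (flow a*t^2 - b*t^3 during the open phase, a parameterization often associated with the Klatt synthesizer), the parameter names av and oq, the sign convention of the ARX equation, and the function names rk_flow_derivative and arx_synthesize. It shows only the forward (synthesis) direction, not the Kalman filter or simulated annealing estimation steps.

```python
import numpy as np


def rk_flow_derivative(f0, fs, av=1.0, oq=0.6):
    """One pitch period of an RK-style glottal flow derivative (assumed form).

    f0: fundamental frequency in Hz, fs: sampling rate in Hz,
    av: amplitude parameter, oq: open quotient (fraction of the period
    during which the glottis is open). All names are hypothetical.
    """
    t0 = 1.0 / f0                      # pitch period in seconds
    n_period = int(round(fs * t0))     # samples per pitch period
    t = np.arange(n_period) / fs
    te = oq * t0                       # end of the open phase
    a = 27.0 * av / (4.0 * oq**2 * t0)
    b = 27.0 * av / (4.0 * oq**3 * t0**2)
    # Flow is a*t^2 - b*t^3 while the glottis is open; this is its derivative.
    return np.where(t < te, 2.0 * a * t - 3.0 * b * t**2, 0.0)


def arx_synthesize(u, a, b, e=None):
    """Run an ARX difference equation (assumed sign convention):

        s[n] = -sum_{i>=1} a[i-1] * s[n-i] + sum_{j>=0} b[j] * u[n-j] + e[n]

    u: exogenous input (glottal source), a: AR coefficients a_1..a_p,
    b: input coefficients b_0..b_q, e: optional residual (zero if omitted).
    """
    u = np.asarray(u, dtype=float)
    e = np.zeros(len(u)) if e is None else np.asarray(e, dtype=float)
    s = np.zeros(len(u))
    for n in range(len(u)):
        acc = e[n]
        for j, bj in enumerate(b):
            if n - j >= 0:
                acc += bj * u[n - j]
        for i, ai in enumerate(a, start=1):
            if n - i >= 0:
                acc -= ai * s[n - i]
        s[n] = acc
    return s


if __name__ == "__main__":
    # Illustrative values only: a 100 Hz source through one resonance.
    fs = 16000
    source = np.tile(rk_flow_derivative(100.0, fs), 5)
    r, theta = 0.97, 2.0 * np.pi * 700.0 / fs   # pole radius and angle
    speech = arx_synthesize(source, a=[-2.0 * r * np.cos(theta), r * r], b=[1.0])
    print(len(speech))
```

In an analysis setting the direction is reversed: given the observed speech s[n], the source parameters and the ARX coefficients are adjusted until the residual e[n] is as small as possible, which is the role the abstract assigns to the simulated annealing and Kalman filter steps.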