T. Yahagi and Y. Soeda, ESTIMATION OF GLOTTAL WAVES BASED ON NONMINIMUM-PHASE MODELS, Electronics and communications in Japan. Part 3, Fundamental electronic science, 81(11), 1998, pp. 56-66
Since the characteristics of the glottal sound source due to vibration
of the vocal cords have a great effect on the quality of synthesized
speech, there have been intensive studies on glottal waves. It is known
experimentally that the waveform is a rounded asymmetrical triangular
wave. Many voiced source models have been proposed in which the glottal
waves are parametrically represented as possible approaches toward
a more natural synthesized speech. There are many unsolved problems,
however, since the characteristics of the glottal source must be known
for various speech utterances in order to construct the source model.
Methods of estimating the glottal wave from the observed speech signal
include the inverse filtering method, where a filter with the inverse
characteristic to the transfer function of the vocal tract is used. In
this method, however, there remains the problem of how the essential
error due to the separate estimations of the vocal-tract transfer
function and the glottal wave can be eliminated. This paper proposes a
new estimation algorithm for glottal waves, where the characteristics of
the glottal waves and the vocal tract are estimated simultaneously by
considering the vocal-tract transfer function, including the
characteristics of the glottal source. In the proposed method, the speech
generation process is represented by a nonminimum-phase model including
the characteristics of the glottal source, and the glottal wave is
estimated by estimating the parameters of the transfer function. In the
estimation of the glottal wave, the unknown driving input signal must be
estimated in parallel to the estimation of the transfer function
parameters. An approximate inverse system is introduced in the proposed
method, since the inverse system for the transfer function of the
nonminimum-phase model is unstable. Using the proposed model, the glottal
wave can be directly estimated when the vocal-tract characteristic can be
represented by an all-pole model. It is also possible to use the
nonminimum-phase ARMA model in this method for the analysis/synthesis of
speech that includes glottal waves. The glottal wave is estimated by
simulation as well as by observation of actual vowels, and satisfactory
results are obtained, indicating the usefulness of the proposed
estimation algorithm. (C) 1998 Scripta Technica.
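
The abstract notes that the exact inverse of a nonminimum-phase transfer function is unstable and that an approximate inverse is used instead. The sketch below illustrates that general point only; it is not the authors' algorithm. It assumes a known toy transfer function H(z) = 1 - 2z^{-1} (zero at z = 2, outside the unit circle), whereas the paper estimates the parameters and the driving input jointly. The approximate inverse shown is a standard delayed least-squares FIR inverse, swapped in for illustration. The function names, the filter length, the delay, and the triangular test pulse are all hypothetical choices.

```python
import numpy as np

def causal_inverse_impulse_response(h, n):
    """First n samples of the causal impulse response of 1/H(z), FIR h."""
    g = np.zeros(n)
    g[0] = 1.0 / h[0]
    for k in range(1, n):
        acc = 0.0
        for j in range(1, min(k, len(h) - 1) + 1):
            acc += h[j] * g[k - j]
        g[k] = -acc / h[0]
    return g

def delayed_ls_inverse(h, length, delay):
    """Least-squares FIR g with conv(h, g) close to a unit impulse at `delay`."""
    rows = len(h) + length - 1
    conv_matrix = np.zeros((rows, length))
    for col in range(length):
        conv_matrix[col:col + len(h), col] = h  # shifted copies of h
    target = np.zeros(rows)
    target[delay] = 1.0
    g, *_ = np.linalg.lstsq(conv_matrix, target, rcond=None)
    return g, target

# Assumed toy system: zero at z = 2, so it is nonminimum phase.
h = np.array([1.0, -2.0])

# The exact causal inverse has a pole at z = 2 and its response diverges.
exact = causal_inverse_impulse_response(h, 20)

# Approximate inverse: stable FIR filter, at the cost of a fixed delay.
length, delay = 32, 32
approx, target = delayed_ls_inverse(h, length, delay)
residual = np.max(np.abs(np.convolve(h, approx) - target))

# Hypothetical asymmetric triangular pulse standing in for a source waveform.
x = np.concatenate([np.linspace(0.0, 1.0, 30, endpoint=False),
                    np.linspace(1.0, 0.0, 10)])
y = np.convolve(x, h)                                  # observed signal
x_hat = np.convolve(y, approx)[delay:delay + len(x)]   # delayed estimate

print("last exact-inverse tap:", exact[-1])
print("max equalization residual:", residual)
print("max recovery error:", np.max(np.abs(x_hat - x)))
```

Allowing a delay is what makes the inverse realizable here: the stable inverse of a system with zeros outside the unit circle is anticausal, and shifting it by `delay` samples lets a causal FIR filter approximate it.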