T. Dutoit and B. Gosselin, ON THE USE OF A HYBRID HARMONIC/STOCHASTIC MODEL FOR TTS SYNTHESIS-BY-CONCATENATION, Speech Communication, 19(2), 1996, pp. 119-143
In this paper, we address the possibilities offered by hybrid
harmonic/stochastic (H/S) models in the context of wide-band
text-to-speech synthesis based on segment concatenation. After briefly
recalling the hypotheses underlying such models and comprehensively
reviewing a well-known analysis algorithm, namely the one provided by
the multi-band excited (MBE) analysis framework, we study how H/S
models allow the prosody of segments to be modified and how segment
concatenation can be organized so as to minimize mismatches at segment
boundaries. In this context, we introduce an original concatenation
algorithm which takes advantage of some analysis errors. Speech
synthesis algorithms are then described, including an original
synthesis technique based on judiciously prepared IFFTs, and the final
segmental quality is detailed. More particularly, we examine the
differences in the quality obtained when using the model in a
narrow-band speech coding context and in a wide-band,
concatenation-based synthesis context. We study three possible causes
for these differences: the choice of an analysis criterion, the
inadequacy of the model due to pitch variations, and the effect of
coarticulation on phases.