Control of spectral dynamics in concatenative speech synthesis

Citation
J. Wouters and M.W. Macon, Control of spectral dynamics in concatenative speech synthesis, IEEE SPEECH, 9(1), 2001, pp. 30-38
Number of citations
30
Subject categories
Electrical & Electronics Engineering
Journal title
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING
Journal ISSN
1063-6676
Volume
9
Issue
1
Year of publication
2001
Pages
30 - 38
Database
ISI
SICI code
1063-6676(200101)9:1<30:COSDIC>2.0.ZU;2-1
Abstract
Current speech synthesis methods based on the concatenation of waveform units can produce highly intelligible speech capturing the identity of a particular speaker. However, the quality of concatenated speech often suffers from discontinuities between the acoustic units, due to contextual differences and variations in speaking style across the database. In this paper, we present methods to spectrally modify speech units in a concatenative synthesizer to correspond more closely to the acoustic transitions observed in natural speech. First, a technique called "unit fusion" is proposed to reduce spectral mismatch between units. In addition to concatenation units, a second, independent tier of units is selected that defines the desired spectral dynamics at concatenation points. Both unit tiers are "fused" to obtain natural transitions throughout the synthesized utterance. The unit fusion method is further extended to control the perceived degree of articulation of concatenated units. In the second part of the paper, a signal processing technique based on sinusoidal modeling is presented that enables high-quality resynthesis of units with a modified spectral shape.