Current speech synthesis methods based on the concatenation of waveform units can produce highly intelligible speech capturing the identity of a particular speaker. However, the quality of concatenated speech often suffers from discontinuities between the acoustic units, due to contextual differences and variations in speaking style across the database. In this paper, we present methods to spectrally modify speech units in a concatenative synthesizer to correspond more closely to the acoustic transitions observed in natural speech. First, a technique called "unit fusion" is proposed to reduce spectral mismatch between units. In addition to the concatenation units, a second, independent tier of units is selected that defines the desired spectral dynamics at concatenation points. Both unit tiers are "fused" to obtain natural transitions throughout the synthesized utterance. The unit fusion method is further extended to control the perceived degree of articulation of the concatenated units.
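As an illustration of the kind of operation unit fusion involves, the following Python sketch blends the spectral-envelope frames of two concatenation units toward a transition template around the joint. The function name `fuse_units`, the linear cross-fade weights, and the fixed `fuse_len` are our own simplifying assumptions for exposition, not the paper's actual algorithm.

```python
import numpy as np

def fuse_units(unit_a, unit_b, transition, fuse_len=5):
    """Blend spectral-envelope frames of two concatenation units toward a
    transition template near the joint.

    unit_a, unit_b : (n_frames, n_dims) envelope frames (e.g., cepstra)
                     for the left and right unit.
    transition     : (2 * fuse_len, n_dims) target envelope trajectory
                     around the joint, taken from the second unit tier.
    fuse_len       : number of frames on each side of the joint to modify.
    """
    a, b = unit_a.copy(), unit_b.copy()
    # Weight ramps from 0 (far from the joint) to 1 (at the joint), so the
    # boundary frames match the transition template exactly while frames in
    # the unit interiors are left untouched.
    w = np.linspace(0.0, 1.0, fuse_len)[:, None]
    a[-fuse_len:] = (1 - w) * a[-fuse_len:] + w * transition[:fuse_len]
    b[:fuse_len] = (1 - w[::-1]) * b[:fuse_len] + w[::-1] * transition[fuse_len:]
    return np.vstack([a, b])

# Toy usage: two 20-frame units with 12-dimensional envelopes.
rng = np.random.default_rng(0)
ua, ub = rng.standard_normal((20, 12)), rng.standard_normal((20, 12))
trans = rng.standard_normal((10, 12))
fused = fuse_units(ua, ub, trans)
print(fused.shape)  # (40, 12)
```

The key point of the sketch is that the transition trajectory comes from an independently selected unit, so the spectral dynamics at the joint follow data observed in natural speech rather than an ad hoc interpolation between the two concatenated units.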
In the second part of the paper, a signal processing technique based on sinusoidal modeling is presented that enables high-quality resynthesis of units with a modified spectral shape.
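To make the resynthesis idea concrete, the sketch below rescales each sinusoidal partial's amplitude by the ratio of a target spectral envelope to the original envelope at that partial's frequency, then resynthesizes by overlap-add of an oscillator bank. This is a minimal illustration of envelope-based spectral modification under assumed inputs; the function name, parameters, and the simple Hann-window overlap-add are hypothetical, and the paper's actual sinusoidal model is more elaborate (e.g., in its analysis and phase handling).

```python
import numpy as np

def resynth_modified(frames, fs, frame_len, hop, freq_grid, orig_env, target_env):
    """Overlap-add resynthesis from per-frame sinusoidal parameters, with
    each partial's amplitude rescaled by the ratio of a target spectral
    envelope to the original envelope at the partial's frequency.

    frames               : list of (freqs_hz, amps, phases) per analysis frame
    orig_env, target_env : (n_frames, n_bins) linear-amplitude envelopes
                           sampled on freq_grid (Hz)
    """
    out = np.zeros(hop * (len(frames) - 1) + frame_len)
    win = np.hanning(frame_len)  # 50% overlap gives roughly constant gain
    t = np.arange(frame_len) / fs
    for i, (freqs, amps, phases) in enumerate(frames):
        # Envelope-ratio gain: reshape the frame's spectrum so each partial
        # follows the target envelope instead of the original one.
        gain = (np.interp(freqs, freq_grid, target_env[i])
                / np.maximum(np.interp(freqs, freq_grid, orig_env[i]), 1e-8))
        frame = np.zeros(frame_len)
        for f, a, p in zip(freqs, amps * gain, phases):
            frame += a * np.cos(2 * np.pi * f * t + p)  # bank of oscillators
        out[i * hop:i * hop + frame_len] += win * frame
    return out

# Toy usage: two frames of three harmonics of 200 Hz at 16 kHz, with the
# target envelope doubling every amplitude.
fs, frame_len, hop = 16000, 512, 256
grid = np.linspace(0, fs / 2, 64)
frames = [(np.array([200.0, 400.0, 600.0]), np.ones(3), np.zeros(3))] * 2
flat = np.ones((2, 64))
y = resynth_modified(frames, fs, frame_len, hop, grid, flat, 2 * flat)
print(y.shape)  # (768,)
```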