Speech analysis and synthesis using an AM-FM modulation model

Citation
A. Potamianos and P. Maragos, Speech analysis and synthesis using an AM-FM modulation model, SPEECH COMM, 28(3), 1999, pp. 195-209
Number of citations
33
Subject Categories
Computer Science & Engineering
Journal title
SPEECH COMMUNICATION
Journal ISSN
0167-6393
Volume
28
Issue
3
Year of publication
1999
Pages
195 - 209
Database
ISI
SICI code
0167-6393(199907)28:3<195:SAASUA>2.0.ZU;2-7
Abstract
In this paper, the AM-FM modulation model is applied to speech analysis, synthesis and coding. The AM-FM model represents the speech signal as the sum of formant resonance signals, each of which contains amplitude and frequency modulation. Multiband filtering and demodulation using the energy separation algorithm are the basic tools used for speech analysis. First, multiband demodulation analysis (MDA) is applied to the problem of fundamental frequency estimation, using the average instantaneous frequencies as estimates of pitch harmonics. The MDA pitch tracking algorithm is shown to produce smooth and accurate fundamental frequency contours. Next, the AM-FM modulation vocoder is introduced, which represents speech as the sum of resonance signals. A time-varying filterbank is used to extract the formant bands and then the energy separation algorithm is used to demodulate the resonance signals into the amplitude envelope and instantaneous frequency signals. Efficient modeling and coding (at 4.8-9.6 kbits/sec) algorithms are proposed for the amplitude envelope and instantaneous frequency of speech resonances. Finally, the perceptual importance of modulations in speech resonances is investigated and it is shown that amplitude modulation patterns are both speaker and phone dependent. (C) 1999 Elsevier Science B.V. All rights reserved.
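Illustrative sketch (not part of the source record)
The demodulation step named in the abstract can be illustrated with a minimal Python sketch. The abstract does not say which variant of the energy separation algorithm the paper uses; the code below assumes the commonly cited DESA-2 form built on the discrete Teager-Kaiser energy operator, and it assumes the input is already a single band-passed resonance signal (the multiband or time-varying filterbank stage is omitted). The function names `teager_energy` and `desa2` are hypothetical and chosen for illustration only.

```python
import numpy as np


def teager_energy(x):
    """Discrete Teager-Kaiser energy: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    Returns an array two samples shorter than x (the endpoints are dropped).
    """
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]


def desa2(x):
    """Estimate amplitude envelope and instantaneous frequency (rad/sample).

    Assumed variant: DESA-2, which uses the symmetric difference
    y[n] = x[n+1] - x[n-1]. Outputs are four samples shorter than x.
    """
    x = np.asarray(x, dtype=float)
    psi_x = teager_energy(x)[1:-1]      # trimmed to align with psi_y
    y = x[2:] - x[:-2]                  # symmetric difference of the signal
    psi_y = teager_energy(y)

    tiny = np.finfo(float).tiny         # guard against division by zero
    ratio = psi_y / (2.0 * np.maximum(psi_x, tiny))
    # clip keeps arccos inside its domain when noise pushes the ratio out
    omega = 0.5 * np.arccos(np.clip(1.0 - ratio, -1.0, 1.0))
    amplitude = 2.0 * psi_x / np.sqrt(np.maximum(psi_y, tiny))
    return amplitude, omega
```

For a constant-amplitude sinusoid, these estimates should recover the amplitude and the frequency in radians per sample; multiplying the frequency by the sampling rate and dividing by 2π converts it to Hz. On real speech the estimates are noisy and are typically smoothed or averaged over short frames, which is consistent with the abstract's use of average instantaneous frequency.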