PITCH-SYNCHRONOUS FRAME-BY-FRAME AND SEGMENT-BASED ARTICULATORY ANALYSIS-BY-SYNTHESIS

Authors

GUPTA SK; SCHROETER J

Citation

S.K. Gupta and J. Schroeter, PITCH-SYNCHRONOUS FRAME-BY-FRAME AND SEGMENT-BASED ARTICULATORY ANALYSIS-BY-SYNTHESIS, The Journal of the Acoustical Society of America, 94(5), 1993, pp. 2517-2530

Subject Categories

Acoustics

Journal title

The Journal of the Acoustical Society of America

ISSN journal

0001-4966

Volume

94

Issue

5

Year of publication

1993

Pages

2517 - 2530

Database

ISI

SICI code

0001-4966(1993)94:5<2517:PFASAA>2.0.ZU;2-0

Abstract

This paper presents a pitch-synchronous analysis-by-synthesis procedure for estimating model parameters for voiced speech. These model parameters describe the vocal-tract shape and the time derivative of the glottal area function. The excitation waveform is derived from the glottal area function by incorporating source-tract interaction using the current vocal-tract input impedance. The corresponding analysis procedure for estimating the model parameters once every pitch period is outlined. A significant improvement in quality was obtained for the new pitch-synchronous analysis/synthesis procedure relative to the fixed-frame-length-based scheme used previously. It was also found that the new pitch-synchronous articulatory analysis/synthesis scheme achieves lower rms spectral distortion values than the 2.4 kb/s Federal Standard LPC-10E algorithm. A segment-based procedure for estimating the vocal-tract model parameters at a rate much lower than the current pitch is described. In this segment-based analysis-by-synthesis approach, the model parameters are estimated every 50-100 ms. The parameters for the intermediate pitch periods are derived by interpolation. The segments are selected using a maximum likelihood segmentation algorithm that segments an utterance into diphonelike units. A segment-based parameter optimization scheme could lead to a highly economical representation of the speech signal for potential applications in very low bit rate speech coding and speech storage. The above schemes were optimized for a pilot test sentence and then evaluated using eight test sentences for a log area and the Coker articulatory model representation of the vocal tract. Nine listeners were asked to judge the quality of the synthesis in a paired-comparison test and the results were analyzed using a nonparametric one-tailed sign test.
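As an illustration of the one-tailed sign test mentioned above, the following is a minimal sketch in Python. The function name `one_tailed_sign_test` and the preference tallies are hypothetical and are not taken from the paper; the abstract does not say how judgments were pooled across listeners and sentences, so the example simply assumes one preference per listener with ties discarded.

```python
from math import comb


def one_tailed_sign_test(wins: int, losses: int) -> float:
    """Return P(X >= wins) for X ~ Binomial(wins + losses, 0.5).

    Under the null hypothesis neither scheme is preferred, so each
    non-tied judgment favors scheme A with probability 0.5.
    Ties are assumed to have been dropped before counting.
    """
    n = wins + losses
    if n == 0:
        return 1.0  # no informative judgments at all
    # Sum the upper tail of the binomial distribution.
    return sum(comb(n, k) for k in range(wins, n + 1)) / 2 ** n


# Hypothetical tally (not from the paper): 8 of 9 listeners prefer scheme A.
p_value = one_tailed_sign_test(wins=8, losses=1)
print(f"one-tailed p-value: {p_value:.4f}")
```

A small p-value would indicate a significant preference for scheme A; with an even split, such as 5 against 4, the p-value stays near 0.5 and no preference can be claimed.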
For the log-area representation of the vocal tract, we found a significant degradation in speech quality for the segment-based optimization procedure relative to the frame-based procedure. However, for the Coker model representation, the degradation was found to be insignificant. This shows that unlike cross-sectional areas, the movement of various articulators in the vocal tract during speech production can be described with sufficient accuracy by specifying the position of these articulators and by using an interpolation function at time intervals much longer than a pitch period.
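The idea of specifying articulator positions at sparse segment boundaries and interpolating in between can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not specify the interpolation function, so plain linear interpolation is assumed here, and the function name `interpolate_parameters`, the anchor times, and the parameter values are all hypothetical.

```python
from bisect import bisect_right


def interpolate_parameters(anchor_times, anchor_params, pitch_times):
    """Linearly interpolate parameter vectors at pitch-period times.

    anchor_times  : strictly increasing times (s) at which parameters
                    were estimated (e.g. every 50-100 ms).
    anchor_params : one parameter vector per anchor time.
    pitch_times   : times (s) of the intermediate pitch periods.
    Times outside the anchor range are clamped to the nearest anchor.
    """
    result = []
    for t in pitch_times:
        if t <= anchor_times[0]:
            result.append(list(anchor_params[0]))
            continue
        if t >= anchor_times[-1]:
            result.append(list(anchor_params[-1]))
            continue
        # Index of the anchor at or just before time t.
        i = bisect_right(anchor_times, t) - 1
        t0, t1 = anchor_times[i], anchor_times[i + 1]
        w = (t - t0) / (t1 - t0)  # 0 at left anchor, 1 at right anchor
        result.append([(1 - w) * a + w * b
                       for a, b in zip(anchor_params[i], anchor_params[i + 1])])
    return result


# Hypothetical example: two anchors 80 ms apart, two parameters each.
anchors = [0.0, 0.08]
params = [[0.0, 1.0], [1.0, 3.0]]
frames = interpolate_parameters(anchors, params, [0.0, 0.02, 0.04, 0.08])
for frame in frames:
    print(frame)
```

With anchors 50-100 ms apart and pitch periods of a few milliseconds, only a small fraction of the parameter vectors would need to be stored or transmitted, which is the economy the abstract points to.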