T.-P. Jung et al., "Deriving gestural scores from articulator-movement records using weighted temporal decomposition," IEEE Transactions on Speech and Audio Processing, 4(1), 1996, pp. 2-18
A computational model that maps articulatory data to an articulatory-phonetic representation is examined in this paper. The approach uses the positions of two lip pellets and four tongue pellets, tracked by X-ray microbeam and nonlinearly transformed into a new Cartesian space in which the new x and y values represent the distance back along the opposing vocal tract wall and the distance perpendicular to that wall.
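As a rough sketch of such a transform, one could represent the opposing tract wall as a polyline and map each pellet position to an arc-length coordinate along the wall plus a perpendicular distance from it. The Python below is purely illustrative; the wall representation, function name, and nearest-segment rule are assumptions, not the paper's procedure.

    import numpy as np

    def wall_coordinates(pellet, wall):
        # pellet: (2,) x/y position; wall: (N, 2) polyline tracing the
        # opposing vocal tract wall (an assumed representation).
        seg_start, seg_vec = wall[:-1], wall[1:] - wall[:-1]
        seg_len = np.linalg.norm(seg_vec, axis=1)
        # Project the pellet onto each wall segment, clamped to the segment.
        t = np.einsum('ij,ij->i', pellet - seg_start, seg_vec) / seg_len**2
        t = np.clip(t, 0.0, 1.0)
        foot = seg_start + t[:, None] * seg_vec
        dist = np.linalg.norm(pellet - foot, axis=1)
        k = np.argmin(dist)  # nearest wall segment
        # New x: distance back along the wall; new y: distance from the wall.
        along = seg_len[:k].sum() + t[k] * seg_len[k]
        return along, dist[k]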
The transformed articulatory data, together with simultaneously recorded electroglottograph data, then serve as the input representation for the computational model, which uses temporal decomposition to model multichannel trajectories. Temporal decomposition constructs a set of target functions from data-derived basis functions using a form of adaptive Gauss-Seidel iteration.
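The core numerical step can be sketched as a Gauss-Seidel solve of the least-squares fit of targets to trajectories given fixed basis functions. This is a generic, unweighted sketch; the paper's adaptive, weighted variant and the construction of the basis functions themselves are not reproduced here.

    import numpy as np

    def targets_gauss_seidel(Y, Phi, n_iter=50):
        # Y: (T, C) multichannel trajectories; Phi: (T, K) basis functions.
        # Fit target vectors A (K, C) so that Y ~= Phi @ A by sweeping the
        # normal equations (Phi.T @ Phi) A = Phi.T @ Y one row at a time.
        G, B = Phi.T @ Phi, Phi.T @ Y
        A = np.zeros((G.shape[0], Y.shape[1]))
        for _ in range(n_iter):
            for k in range(G.shape[0]):
                # Gauss-Seidel update: solve row k using the latest A.
                A[k] = (B[k] - G[k] @ A + G[k, k] * A[k]) / G[k, k]
        return A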
The resultant target functions, in conjunction with the weights for each basis function, are then used to derive the articulatory-phonetic representation called a gestural score. This method is applied to the task of estimating the gestural score for various CVC syllables embedded in frame sentences in the stimulus set.
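One illustrative way to read gesture timing off such a decomposition is to threshold each basis function's activation in time; the half-maximum rule below is an assumption made for the sake of the example, not the paper's derivation procedure.

    import numpy as np

    def activation_intervals(Phi, frac=0.5):
        # Phi: (T, K) basis functions. Treat basis k as 'active' while it
        # exceeds a fraction of its peak; return (onset, offset) frames.
        intervals = []
        for k in range(Phi.shape[1]):
            active = Phi[:, k] > frac * Phi[:, k].max()
            edges = np.diff(active.astype(int))
            on = np.flatnonzero(edges == 1) + 1
            off = np.flatnonzero(edges == -1) + 1
            if active[0]:
                on = np.r_[0, on]
            if active[-1]:
                off = np.r_[off, active.size]
            intervals.append(list(zip(on, off)))
        return intervals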
To determine the adequacy of the derived gestural scores, two evaluations were performed: a perception test and a classification test using an automatic recognizer based on a neural network model. High recognition rates from both the perceptual experiments and the automatic recognizer support the hypothesis that sufficient information is available in the resultant gestural scores to allow accurate identification of the phonetic elements.