T.-P. Jung et al., "Deriving gestural scores from articulator-movement records using weighted temporal decomposition," IEEE Transactions on Speech and Audio Processing, 4(1), 1996, pp. 2-18
A computational model that maps articulatory data to an articulatory-phonetic representation is examined in this paper. The approach uses the positions of two lip pellets and four tongue pellets, tracked by X-ray microbeam and nonlinearly transformed into a new Cartesian space in which the new x and y values represent the distance back along the opposing vocal tract wall and the distance perpendicular to that wall.
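As a rough sketch of such a transform, one could represent the opposing tract wall as a polyline and map each pellet position to an arc-length coordinate along the wall plus a perpendicular distance from it. The Python below is purely illustrative; the wall representation, function name, and nearest-segment rule are assumptions, not the paper's procedure.

    import numpy as np

    def wall_coordinates(pellet, wall):
        # pellet: (2,) x/y position; wall: (N, 2) polyline tracing the
        # opposing vocal tract wall (an assumed representation).
        seg_start, seg_vec = wall[:-1], wall[1:] - wall[:-1]
        seg_len = np.linalg.norm(seg_vec, axis=1)
        # Project the pellet onto each wall segment, clamped to the segment.
        t = np.einsum('ij,ij->i', pellet - seg_start, seg_vec) / seg_len**2
        t = np.clip(t, 0.0, 1.0)
        foot = seg_start + t[:, None] * seg_vec
        dist = np.linalg.norm(pellet - foot, axis=1)
        k = np.argmin(dist)  # nearest wall segment
        # New x: distance back along the wall; new y: distance from the wall.
        along = seg_len[:k].sum() + t[k] * seg_len[k]
        return along, dist[k]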
The transformed articulatory data, together with simultaneously recorded electroglottograph data, then serve as the input representation for the computational model, which uses temporal decomposition to model multichannel trajectories. Temporal decomposition constructs a set of target functions from data-derived basis functions using a form of adaptive Gauss-Seidel iteration.
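The core numerical step can be sketched as a Gauss-Seidel solve of the least-squares fit of targets to trajectories given fixed basis functions. This is a generic, unweighted sketch; the paper's adaptive, weighted variant and the construction of the basis functions themselves are not reproduced here.

    import numpy as np

    def targets_gauss_seidel(Y, Phi, n_iter=50):
        # Y: (T, C) multichannel trajectories; Phi: (T, K) basis functions.
        # Fit target vectors A (K, C) so that Y ~= Phi @ A by sweeping the
        # normal equations (Phi.T @ Phi) A = Phi.T @ Y one row at a time.
        G, B = Phi.T @ Phi, Phi.T @ Y
        A = np.zeros((G.shape[0], Y.shape[1]))
        for _ in range(n_iter):
            for k in range(G.shape[0]):
                # Gauss-Seidel update: solve row k using the latest A.
                A[k] = (B[k] - G[k] @ A + G[k, k] * A[k]) / G[k, k]
        return A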
The resultant target functions, in conjunction with the weights for each basis function, are then used to derive the articulatory-phonetic representation called a gestural score. This method is applied to the task of estimating the gestural score for various CVC syllables embedded in frame sentences in the stimulus set.
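One illustrative way to read gesture timing off such a decomposition is to threshold each basis function's activation in time; the half-maximum rule below is an assumption made for the sake of the example, not the paper's derivation procedure.

    import numpy as np

    def activation_intervals(Phi, frac=0.5):
        # Phi: (T, K) basis functions. Treat basis k as 'active' while it
        # exceeds a fraction of its peak; return (onset, offset) frames.
        intervals = []
        for k in range(Phi.shape[1]):
            active = Phi[:, k] > frac * Phi[:, k].max()
            edges = np.diff(active.astype(int))
            on = np.flatnonzero(edges == 1) + 1
            off = np.flatnonzero(edges == -1) + 1
            if active[0]:
                on = np.r_[0, on]
            if active[-1]:
                off = np.r_[off, active.size]
            intervals.append(list(zip(on, off)))
        return intervals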
To determine the adequacy of the derived gestural scores, two evaluations were performed: a perception test and a classification test using an automatic recognizer based on a neural network model. High recognition rates from both the perceptual experiments and the automatic recognizer support the hypothesis that sufficient information is available in the resultant gestural scores to allow accurate identification of the phonetic elements.