This article describes a neural network model that addresses the
acquisition of speaking skills by infants and subsequent motor
equivalent production of speech sounds. The model learns two mappings
during a babbling phase. A phonetic-to-orosensory mapping specifies a
vocal tract
target for each speech sound; these targets take the form of convex
regions in orosensory coordinates defining the shape of the vocal
tract. The babbling process wherein these convex region targets are
formed explains how an infant can learn phoneme-specific and
language-specific limits on acceptable variability of articulator
movements. The model
also learns an orosensory-to-articulatory mapping wherein cells coding
desired movement directions in orosensory space learn articulator
movements that achieve these orosensory movement directions. The
resulting mapping provides a natural explanation for the formation of
coordinative structures. This mapping also makes efficient use of
redundancy in the articulator system, thereby providing the model with
motor equivalent capabilities. Simulations verify the model's ability
to compensate for constraints or perturbations applied to the
articulators automatically and without new learning, and to explain
contextual variability seen in human speech production.
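
The two mappings can be illustrated with a toy sketch. It is not the model's actual network. It assumes a hypothetical linear vocal tract with three articulators (jaw, lip, tongue) and two orosensory dimensions (lip aperture, tongue height), uses axis-aligned boxes as the convex region targets, and replaces the learned orosensory-to-articulatory weights with a fixed transpose rule. All names and numbers below are illustrative assumptions.

```python
# Hypothetical toy vocal tract: 3 articulators -> 2 orosensory dimensions.
# Rows: lip aperture, tongue height. Columns: jaw, lip, tongue.
JACOBIAN = [[-1.0, -1.0, 0.0],
            [1.0, 0.0, 1.0]]
REST = [1.0, 0.0]  # orosensory state when all articulators are at 0


def orosensory(artic):
    """Forward map from articulator positions to orosensory coordinates."""
    return [REST[i] + sum(JACOBIAN[i][j] * artic[j] for j in range(3))
            for i in range(2)]


def learn_target(samples):
    """Grow an axis-aligned box (a convex region) around acceptable samples."""
    lows = [min(s[i] for s in samples) for i in range(2)]
    highs = [max(s[i] for s in samples) for i in range(2)]
    return lows, highs


def direction_to_target(state, target):
    """Vector from the state to the nearest point of the box (zero inside)."""
    lows, highs = target
    return [min(max(x, lo), hi) - x for x, lo, hi in zip(state, lows, highs)]


def produce(target, blocked=(), gain=0.2, steps=500, tol=1e-6):
    """Move the free articulators until the state is within tol of the target."""
    artic = [0.0, 0.0, 0.0]
    for _ in range(steps):
        d = direction_to_target(orosensory(artic), target)
        if all(abs(x) < tol for x in d):
            break
        for j in range(3):
            if j in blocked:
                continue  # a constrained articulator cannot move
            # Transpose rule (an assumption): each articulator moves in
            # proportion to its effect on the desired orosensory direction.
            artic[j] += gain * sum(JACOBIAN[i][j] * d[i] for i in range(2))
    return artic


# Made-up "babbled" samples for one sound: the lip aperture is tightly
# constrained, while the tongue height is allowed to vary widely.
target = learn_target([[0.0, 0.2], [0.1, 0.6], [0.05, 0.4]])
free = produce(target)
jaw_blocked = produce(target, blocked={0})  # jaw held fixed at 0
```

In this sketch both runs end at the same target region with different articulator configurations. Because the orosensory direction vector stays nonzero until the target is reached, the free articulators take over for the blocked jaw without any change to the mapping. This is the sense of motor equivalence the sketch is meant to convey.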