To obtain an accurate phone sequence from a continuous speech signal,
we propose a novel approach that tightly couples bottom-up and
top-down processing. The bottom-up path consists of segmentation,
recognition, and labeling; the top-down path consists of labeling,
speech generation, and segmentation. In this manner, the four processes
form a closed feedback loop that efficiently reaches an optimal
interpretation given a noisy observation of the speech signal and a
priori knowledge. The major goal of this paper is to identify the
system model using both stochastic estimation theory and mean field
theory. Experimental results are obtained on the TIMIT database. It is
shown that adding the top-down path to the traditional bottom-up path
improves the recognition rate by 19.7% and reduces the error
(substitution, deletion, and insertion) rate by 16.1%. As a result, the
overall system transcribes the incoming continuous signal into a
sequence over the 61 phone classes at a recognition rate of 73.7%.
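
For readers unfamiliar with how substitution, deletion, and insertion errors are usually counted, the following is a minimal sketch of the conventional approach: align the recognized phone sequence against the reference by minimum edit distance, then tally each error type. The scoring protocol actually used in the experiments is not stated here, so this is an assumption rather than the authors' procedure; the function names `count_errors` and `error_rate` are hypothetical.

```python
from typing import List, Tuple


def count_errors(ref: List[str], hyp: List[str]) -> Tuple[int, int, int]:
    """Return (substitutions, deletions, insertions) of a minimum-edit alignment.

    Assumption: every edit costs 1, as in plain Levenshtein distance.
    """
    n, m = len(ref), len(hyp)
    # cell[i][j] = (total_edits, subs, dels, ins) for ref[:i] versus hyp[:j]
    cell = [[(0, 0, 0, 0)] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cell[i][0] = (i, 0, i, 0)  # all reference phones deleted
    for j in range(1, m + 1):
        cell[0][j] = (j, 0, 0, j)  # all hypothesis phones inserted
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            t, s, d, k = cell[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diagonal = (t, s, d, k)          # correct phone, no cost
            else:
                diagonal = (t + 1, s + 1, d, k)  # substitution
            t, s, d, k = cell[i - 1][j]
            deletion = (t + 1, s, d + 1, k)
            t, s, d, k = cell[i][j - 1]
            insertion = (t + 1, s, d, k + 1)
            # Tuples compare by total edits first, so this keeps a minimum-edit path.
            cell[i][j] = min(diagonal, deletion, insertion)
    _, subs, dels, ins = cell[n][m]
    return subs, dels, ins


def error_rate(ref: List[str], hyp: List[str]) -> float:
    """Error rate as (S + D + I) / N, with N the number of reference phones."""
    subs, dels, ins = count_errors(ref, hyp)
    return (subs + dels + ins) / len(ref)
```

For example, `count_errors(["a", "b", "c", "d"], ["a", "x", "c"])` yields one substitution and one deletion, so `error_rate` returns 0.5 for that pair. Because insertions are counted against the reference length, this error rate and a "percent correct" recognition rate need not sum to 100%.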