S. Rajendran and B. Yegnanarayana, WORD BOUNDARY HYPOTHESIZATION FOR CONTINUOUS SPEECH IN HINDI BASED ON F-0 PATTERNS, Speech Communication, 18(1), 1996, pp. 21-46
This paper proposes an algorithm based on F-0 patterns to hypothesize
word boundaries and function words in continuous speech in Hindi. It
makes use of properties of the F-0 contour in Hindi such as the
declination tendency, resetting and fall-rise patterns. The syllabic
units are identified by using the energy contour, pitch and the
first-order LP coefficient. Each syllabic unit is assigned an accent
value L (Low), H or h (High) by (i) comparing the F-0 value at the
midpoint of each syllabic nucleus with that of the previous syllabic
unit and (ii) comparing the F-0 values at two different points within
each syllabic unit in a sequence having an accent value L. Word
boundaries are placed between the adjacent syllabic units (i) H and L,
(ii) h and L, (iii) L and L, (iv) L and h and (v) H and h. An
evaluation conducted on a corpus of 50 sentences in Hindi read aloud by
five native speakers in an ordinary office environment showed that
about 74 percent of the word boundaries and about 28 percent of the
function words were correctly identified. The results of the word
boundary hypothesization can be used to improve the performance of the
acoustic-phonetic, lexical and syntactic modules in a speech-to-text
conversion system. The robustness of the algorithm under noisy speech
input conditions and on telephone speech is also discussed.
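
The boundary placement rule stated above can be illustrated with a
minimal sketch. It assumes the accent values have already been assigned
and are given as a list of the labels "L", "H" and "h", one per syllabic
unit. The names BOUNDARY_PAIRS and hypothesize_boundaries are
hypothetical and chosen only for illustration. The accent assignment
step is not shown, because the abstract does not give the F-0 comparison
thresholds.

```python
# Adjacent accent pairs (previous unit, next unit) between which the
# abstract says a word boundary is placed: H-L, h-L, L-L, L-h and H-h.
BOUNDARY_PAIRS = {("H", "L"), ("h", "L"), ("L", "L"), ("L", "h"), ("H", "h")}

VALID_ACCENTS = {"L", "H", "h"}


def hypothesize_boundaries(accents):
    """Return every index i for which a word boundary is hypothesized
    between accents[i] and accents[i + 1].

    `accents` is a list of accent labels, one per syllabic unit
    (an assumed input representation, not taken from the paper).
    """
    for label in accents:
        if label not in VALID_ACCENTS:
            raise ValueError(f"unknown accent label: {label!r}")
    # Pair each unit with its successor and keep the positions whose
    # pair appears in the rule table.
    return [
        i
        for i, pair in enumerate(zip(accents, accents[1:]))
        if pair in BOUNDARY_PAIRS
    ]


if __name__ == "__main__":
    # Made-up accent sequence, for illustration only.
    sequence = ["L", "H", "L", "L", "h", "H", "h"]
    print(hypothesize_boundaries(sequence))
```

Under this reading of the rule, no boundary is placed for the remaining
adjacent pairs (L and H, H and H, h and h, h and H), since the abstract
does not list them.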