This paper describes speech endpoint detection methods for continuous speech recognition systems used over telephone networks. Speech input to these systems may be contaminated not only by ambient noise but also by irrelevant sounds generated by users, such as coughs, tongue clicks, lip noises and certain out-of-task utterances. Under these adverse conditions, robust speech endpoint detection remains an unsolved problem. In fact, we found that speech endpoint detection errors occurred in over 10% of the inputs in field trials of a voice-activated telephone extension system. These errors were caused by (1) low SNR, (2) long pauses between phrases and (3) irrelevant sounds prior to task sentences. To solve the first two problems, we propose a real-time speech ending point detection algorithm based on the implicit approach, which finds a sentence end by comparing the likelihood of a complete sentence hypothesis with the likelihoods of other hypotheses. For the third problem, we propose a speech beginning point detection algorithm which rejects irrelevant sounds by using likelihood ratio and duration conditions. The effectiveness of these methods was evaluated under various conditions. As a result, we found that the ending point detection algorithm was not affected by long pauses and that the beginning point detection algorithm successfully rejected irrelevant sounds by using phone HMMs that fit the task. Furthermore, a garbage model of irrelevant sounds was also evaluated, and we found that the garbage modeling technique and the proposed method compensated for each other's weak points and that the best recognition accuracy was achieved by integrating these methods. (C) 1999 Elsevier Science B.V. All rights reserved.
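
The two detection ideas summarized above can be illustrated with a minimal sketch. It is not the authors' implementation, and several points are assumptions: that a decoder already supplies per-frame log-likelihoods, that the likelihood ratio is taken between a speech model and a background model, and that an end is declared once the complete-sentence hypothesis has led for a few consecutive frames. The function names, thresholds and frame counts are all hypothetical.

```python
from typing import Optional, Sequence


def detect_beginning(
    speech_ll: Sequence[float],
    background_ll: Sequence[float],
    ratio_threshold: float = 0.0,
    min_frames: int = 5,
) -> Optional[int]:
    """Return the start frame of the first run passing both conditions.

    Likelihood-ratio condition (assumed form): the per-frame log-likelihood
    ratio speech_ll[t] - background_ll[t] exceeds ratio_threshold.
    Duration condition (assumed form): the run lasts at least min_frames.
    Shorter runs, such as clicks, are rejected.
    """
    run_start: Optional[int] = None
    for t, (s, b) in enumerate(zip(speech_ll, background_ll)):
        if s - b > ratio_threshold:
            if run_start is None:
                run_start = t
            if t - run_start + 1 >= min_frames:
                return run_start
        else:
            run_start = None  # run broken: discard the candidate
    return None


def detect_ending(
    complete_ll: Sequence[float],
    best_other_ll: Sequence[float],
    margin: float = 0.0,
    hold_frames: int = 3,
) -> Optional[int]:
    """Return the frame at which a sentence end is declared, if any.

    complete_ll[t] is the log-likelihood of the best complete-sentence
    hypothesis at frame t; best_other_ll[t] is that of the best competing
    hypothesis. The end is declared once the complete-sentence hypothesis
    leads by more than margin for hold_frames consecutive frames
    (the hold is an assumption of this sketch).
    """
    held = 0
    for t, (c, o) in enumerate(zip(complete_ll, best_other_ll)):
        held = held + 1 if c - o > margin else 0
        if held >= hold_frames:
            return t
    return None
```

In this sketch a pause inside a sentence does not trigger an end by itself: as long as an incomplete hypothesis scores higher than the complete-sentence hypothesis, `detect_ending` keeps waiting, which mirrors the robustness to long pauses claimed for the implicit approach.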