Robust speech detection method for telephone speech recognition system

Authors

Kuroiwa, S; Naito, M; Yamamoto, S; Higuchi, N

Citation

S. Kuroiwa et al., Robust speech detection method for telephone speech recognition system, SPEECH COMM, 27(2), 1999, pp. 135-148

Citations number

Subject categories

Computer Science & Engineering

Journal title

SPEECH COMMUNICATION

ISSN journal

0167-6393

Volume

27

Issue

2

Year of publication

1999

Pages

135 - 148

Database

ISI

SICI code

0167-6393(199903)27:2<135:RSDMFT>2.0.ZU;2-X

Abstract

This paper describes speech endpoint detection methods for continuous speech recognition systems used over telephone networks. Speech input to these systems may be contaminated not only by various ambient noises but also by irrelevant sounds generated by users, such as coughs, tongue clicks, lip noises and certain out-of-task utterances. Under these adverse conditions, robust speech endpoint detection remains an unsolved problem. We found, in fact, that speech endpoint detection errors occurred in over 10% of the inputs in field trials of a voice-activated telephone extension system. These errors were caused by (1) low SNR, (2) long pauses between phrases and (3) irrelevant sounds prior to task sentences. To solve the first two problems, we propose a real-time speech ending point detection algorithm based on the implicit approach, which finds a sentence end by comparing the likelihood of a complete sentence hypothesis with that of other hypotheses. For the third problem, we propose a speech beginning point detection algorithm which rejects irrelevant sounds by using likelihood ratio and duration conditions. The effectiveness of these methods was evaluated under various conditions. We found that the ending point detection algorithm was not affected by long pauses and that the beginning point detection algorithm successfully rejected irrelevant sounds by using phone HMMs that fit the task. Furthermore, a garbage model of irrelevant sounds was also evaluated; we found that the garbage modeling technique and the proposed method compensated for each other's weak points and that the best recognition accuracy was achieved by integrating these methods. (C) 1999 Elsevier Science B.V. All rights reserved.
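
The two decision rules named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no formulas, so the `Segment` fields, the duration normalisation of the likelihood ratio, the thresholds and the `hold_frames` persistence check are all assumptions chosen only to make the idea concrete.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class Segment:
    """Candidate speech segment. All fields are hypothetical."""
    duration_s: float         # segment length in seconds
    loglik_task: float        # log-likelihood under task phone HMMs
    loglik_background: float  # log-likelihood under a background model


def accept_beginning(seg: Segment,
                     min_llr_per_s: float = 2.0,
                     min_duration_s: float = 0.2) -> bool:
    """Accept a segment as a speech beginning only if it passes both a
    duration condition and a likelihood-ratio condition (assumed forms)."""
    if seg.duration_s < min_duration_s:
        return False  # too short: treated as a click, cough or lip noise
    # Log-likelihood ratio, normalised by duration (an assumption).
    llr = (seg.loglik_task - seg.loglik_background) / seg.duration_s
    return llr >= min_llr_per_s


def find_ending_frame(frames: Iterable[Tuple[float, float]],
                      margin: float = 0.0,
                      hold_frames: int = 3) -> Optional[int]:
    """Return the first frame index at which the complete-sentence
    hypothesis has beaten the best competing hypothesis by more than
    `margin` for `hold_frames` consecutive frames, or None.

    Each item is (loglik_complete_sentence, loglik_best_other)."""
    run = 0
    for i, (complete, other) in enumerate(frames):
        if complete - other > margin:
            run += 1
            if run >= hold_frames:
                return i
        else:
            run = 0  # the lead was interrupted; start counting again
    return None
```

In this sketch a long pause alone does not end the utterance: the ending point is declared only when the complete-sentence hypothesis stays ahead, which mirrors the implicit approach described above in spirit only.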