In this paper we propose a method for enhancement of speech in the presence
of additive noise. The objective is to selectively enhance the high
signal-to-noise ratio (SNR) regions of the noisy speech in the temporal and
spectral domains, without causing significant distortion in the resulting
enhanced speech. Enhancement is carried out at three levels: (a) at the
gross level, by identifying the regions of speech and noise in the temporal
domain; (b) at the finer level, by identifying the high and low SNR portions
of the noisy speech; (c) at the short-time spectrum level,
by enhancing the spectral peaks over spectral valleys. The basis for the
proposed approach is to analyze the linear prediction (LP) residual signal
in short (1-2 ms) segments to determine whether a segment belongs to a noise
region or a speech region. In the speech regions the inverse spectral
flatness factor is significantly higher than in the noise regions. The LP
residual signal enables us to deal with short segments of data due to the
uncorrelatedness
of the samples. Processing of noisy speech for enhancement involves mostly
weighting the LP residual signal samples. The weighted residual signal
samples are used to excite the time-varying all-pole filter to produce
enhanced
speech. As the additive noise level in the speech signal is increased, the
quality of the resulting enhanced speech decreases progressively due to loss
of speech information in the low-SNR, high-noise regions. Thus the
degradation in performance of enhancement is graceful as the overall SNR of
the
noisy speech is decreased. © 1999 Elsevier Science B.V. All rights reserved.
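
The analysis, weighting and resynthesis chain described above can be sketched for a single frame as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The LP coefficients are estimated with the autocorrelation method, and the weight rule in `segment_weights` is a hypothetical energy-based placeholder, not the weight function of the paper. All function names, the LP order of 10, the 16-sample segment (2 ms at an assumed 8 kHz sampling rate) and the weight floor of 0.1 are illustrative choices.

```python
import math
import random

def autocorr(x, max_lag):
    """Autocorrelation r[0..max_lag] of the sequence x."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the LP normal equations by the Levinson-Durbin recursion.
    Returns a[0..order] with a[0] = 1, so that A(z) = sum_k a[k] z^-k."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # degenerate input (e.g. all zeros): stop the recursion
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a

def lp_residual(x, a):
    """Inverse filtering with zero initial state: e[n] = sum_k a[k] x[n-k]."""
    p = len(a) - 1
    return [sum(a[k] * x[n - k] for k in range(min(p, n) + 1))
            for n in range(len(x))]

def all_pole_synth(e, a):
    """All-pole filter 1/A(z): y[n] = e[n] - sum_{k>=1} a[k] y[n-k]."""
    p = len(a) - 1
    y = []
    for n in range(len(e)):
        y.append(e[n] - sum(a[k] * y[n - k] for k in range(1, min(p, n) + 1)))
    return y

def segment_weights(e, seg_len, floor=0.1):
    """HYPOTHETICAL weight rule (not from the paper): each short segment gets
    a weight equal to its residual energy relative to the strongest segment,
    clipped below at `floor`."""
    energies = [sum(v * v for v in e[i:i + seg_len])
                for i in range(0, len(e), seg_len)]
    peak = max(energies) or 1.0
    w = []
    for idx, en in enumerate(energies):
        seg_n = min(seg_len, len(e) - idx * seg_len)
        w.extend([max(floor, en / peak)] * seg_n)
    return w

def enhance_frame(x, order=10, seg_len=16, floor=0.1):
    """LP analysis, residual weighting, all-pole resynthesis for one frame."""
    a = levinson_durbin(autocorr(x, order), order)
    e = lp_residual(x, a)
    w = segment_weights(e, seg_len, floor)
    return all_pole_synth([wi * ei for wi, ei in zip(w, e)], a)

# Synthetic demo: a noisy sinusoid standing in for one frame of noisy speech.
random.seed(0)
frame = [math.sin(0.3 * n) + 0.1 * random.gauss(0.0, 1.0) for n in range(200)]
enhanced = enhance_frame(frame)
```

With all weights set to one, the synthesis filter exactly undoes the inverse filter and the frame is reproduced, so any change in the output comes from the residual weights alone. A time-varying system would recompute the LP coefficients frame by frame and carry the filter state across frame boundaries, which this single-frame sketch omits.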