Separation of speech from interfering sounds based on oscillatory correlation

Citation
D.L. Wang and G.J. Brown, Separation of speech from interfering sounds based on oscillatory correlation, IEEE Transactions on Neural Networks, 10(3), 1999, pp. 684-697
Number of citations
57
Subject categories
AI Robotics and Automatic Control
Journal title
IEEE TRANSACTIONS ON NEURAL NETWORKS
ISSN journal
1045-9227
Volume
10
Issue
3
Year of publication
1999
Pages
684 - 697
Database
ISI
SICI code
1045-9227(199905)10:3<684:SOSFIS>2.0.ZU;2-G
Abstract
A multistage neural model is proposed for an auditory scene analysis task: segregating speech from interfering sound sources. The core of the model is a two-layer oscillator network that performs stream segregation on the basis of oscillatory correlation. In the oscillatory correlation framework, a stream is represented by a population of synchronized relaxation oscillators, each of which corresponds to an auditory feature, and different streams are represented by desynchronized oscillator populations. Lateral connections between oscillators encode harmonicity and proximity in frequency and time. Prior to the oscillator network are a model of the auditory periphery and a stage in which mid-level auditory representations are formed. The model has been systematically evaluated using a corpus of voiced speech mixed with interfering sounds, and produces improvements in terms of signal-to-noise ratio for every mixture. The performance of the model is compared with other studies on computational auditory scene analysis. A number of issues, including biological plausibility and real-time implementation, are also discussed.
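
Illustration. The abstract's "relaxation oscillators" are, in this line of work, typically the Terman-Wang type. The listing below is a minimal sketch of one such oscillator under assumed, illustrative parameter values (dt, eps, gamma, beta, I and the helper name terman_wang_step are not taken from the paper); the lateral coupling and global inhibition that produce within-stream synchrony and between-stream desynchrony are omitted.

import math

# One forward-Euler step of a Terman-Wang relaxation oscillator
# (sketch with assumed parameters; coupling and noise terms omitted):
#   dx/dt = 3x - x^3 + 2 - y + I                      (fast excitatory variable)
#   dy/dt = eps * (gamma * (1 + tanh(x / beta)) - y)  (slow inhibitory variable)
def terman_wang_step(x, y, I, dt=0.01, eps=0.02, gamma=6.0, beta=0.1):
    dx = 3.0 * x - x ** 3 + 2.0 - y + I
    dy = eps * (gamma * (1.0 + math.tanh(x / beta)) - y)
    return x + dt * dx, y + dt * dy

# With a positive input I the fast variable x alternates between an
# active (high) and a silent (low) phase; in the full network, lateral
# connections pull oscillators belonging to one stream into synchrony
# while inhibition keeps different streams desynchronized.
x, y = -2.0, 0.0
for _ in range(20000):
    x, y = terman_wang_step(x, y, I=0.8)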