A multistage neural model is proposed for an auditory scene analysis task: segregating speech from interfering sound sources. The core of the model is a two-layer oscillator network that performs stream segregation on the basis of oscillatory correlation. In the oscillatory correlation framework, a stream is represented by a population of synchronized relaxation oscillators, each of which corresponds to an auditory feature, and different streams are represented by desynchronized oscillator populations. Lateral connections between oscillators encode harmonicity as well as proximity in frequency and time. The oscillator network is preceded by a model of the auditory periphery and a stage in which mid-level auditory representations are formed. The model has been systematically evaluated on a corpus of voiced speech mixed with interfering sounds, and it produces an improvement in signal-to-noise ratio for every mixture. The performance of our model is compared with that reported in other studies of computational auditory scene analysis. A number of issues, including biological plausibility and real-time implementation, are also discussed.
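For illustration only, the sketch below shows the oscillatory correlation idea in miniature, assuming Terman-Wang-style relaxation oscillator dynamics (a common choice in such models) with excitatory lateral coupling inside each hypothetical stream and a shared global inhibitor to push different streams out of phase. The group structure, parameter values, and thresholds are illustrative assumptions, not the specification of the model described above.

```python
# Minimal sketch (not the paper's exact network): relaxation oscillators
# with excitatory coupling within each group and a global inhibitor, so that
# oscillators in one group synchronize while different groups desynchronize.
import numpy as np

# Two hypothetical "streams": oscillators {0, 1} and {2, 3}.
groups = [[0, 1], [2, 3]]
n = 4

# Excitatory lateral weights: connect oscillators within the same group only.
W = np.zeros((n, n))
for g in groups:
    for i in g:
        for j in g:
            if i != j:
                W[i, j] = 1.0

# Illustrative oscillator parameters.
eps, gamma, beta = 0.02, 6.0, 0.1     # slow-variable rate, gain, sharpness
I = 0.8                               # external input (all oscillators driven)
theta_x, theta_z = -0.5, 0.1          # thresholds for coupling / inhibitor
W_z, phi = 1.5, 3.0                   # inhibitor weight and rate

rng = np.random.default_rng(0)
x = rng.uniform(-2.0, 2.0, n)         # fast (membrane-like) variables
y = rng.uniform(0.0, 6.0, n)          # slow recovery variables
z = 0.0                               # global inhibitor

dt, steps = 0.005, 60000
active_trace = []

for t in range(steps):
    fired = (x > theta_x).astype(float)
    S = W @ fired - W_z * (z > theta_z)            # lateral excitation minus inhibition
    dx = 3 * x - x**3 + 2 - y + I + S              # fast (cubic) dynamics
    dy = eps * (gamma * (1 + np.tanh(x / beta)) - y)
    dz = phi * (float(np.any(x > theta_z)) - z)    # inhibitor tracks any active oscillator
    x, y, z = x + dt * dx, y + dt * dy, z + dt * dz
    if t % 2000 == 0:
        active_trace.append((x > 0).astype(int))   # record who is in the active phase

# Within a group the active phases should line up over time; across groups
# they should tend to alternate, which is the oscillatory correlation signature.
for row in active_trace:
    print(row)
```

The design point the sketch tries to convey is that grouping is carried by timing rather than by a separate labelling mechanism: the lateral weights determine which oscillators pull each other into the active phase together, and the global inhibitor keeps unrelated populations from firing at the same time.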