Approaches to speaker detection and tracking in conversational speech

Citation
Rb. Dunn et al., Approaches to speaker detection and tracking in conversational speech, DIGIT SIG P, 10(1-3), 2000, pp. 93-112
Number of citations
15
Subject categories
Electrical & Electronics Engineering
Journal title
DIGITAL SIGNAL PROCESSING
Journal ISSN
1051-2004
Volume
10
Issue
1-3
Year of publication
2000
Pages
93 - 112
Database
ISI
SICI code
1051-2004(200001/07)10:1-3<93:ATSDAT>2.0.ZU;2-Q
Abstract
Two approaches to detecting and tracking speakers in multispeaker audio are described. Both approaches use an adapted Gaussian mixture model, universal background model (GMM-UBM) speaker detection system as the core speaker recognition engine. In one approach, the individual log-likelihood ratio scores, which are produced on a frame-by-frame basis by the GMM-UBM system, are used to first partition the speech file into speaker homogenous regions and then to create scores for these regions. We refer to this approach as internal segmentation. Another approach uses an external segmentation algorithm, based on blind clustering, to partition the speech file into speaker homogenous regions. The adapted GMM-UBM system then scores each of these regions as in the single-speaker recognition case. We show that the external segmentation system outperforms the internal segmentation system for both detection and tracking. In addition, we show how different components of the detection and tracking algorithms contribute to the overall system performance. (C) 2000 Academic Press.
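
Illustrative sketch (not from the paper)
The internal segmentation idea in the abstract, frame-level log-likelihood ratios that are first used to partition the file and then to score the resulting regions, can be illustrated roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not say how the frame scores are smoothed or thresholded, so the moving-average window, the zero threshold, the mean-LLR region score, and the function names (gmm_frame_loglik, frame_llr, segment_by_llr) are all hypothetical choices made for illustration. Each model is assumed to be a diagonal-covariance GMM given as a (weights, means, variances) tuple.

```python
import numpy as np


def gmm_frame_loglik(frames, weights, means, variances):
    """Per-frame log-likelihood under a diagonal-covariance GMM.

    frames: (T, D); weights: (M,); means and variances: (M, D).
    """
    diff = frames[:, None, :] - means[None, :, :]              # (T, M, D)
    log_comp = -0.5 * (
        np.sum(diff ** 2 / variances, axis=2)
        + np.sum(np.log(2.0 * np.pi * variances), axis=1)
    )                                                          # (T, M)
    log_weighted = log_comp + np.log(weights)
    # Log-sum-exp over mixture components, stabilised by the per-frame peak.
    peak = log_weighted.max(axis=1, keepdims=True)
    return (peak + np.log(np.exp(log_weighted - peak).sum(axis=1, keepdims=True)))[:, 0]


def frame_llr(frames, target, ubm):
    """Frame-by-frame log-likelihood ratio: target speaker model vs. UBM."""
    return gmm_frame_loglik(frames, *target) - gmm_frame_loglik(frames, *ubm)


def segment_by_llr(llr, window=5, threshold=0.0):
    """Assumed partitioning rule: smooth the frame LLRs, keep runs above threshold.

    Returns a list of (start_frame, end_frame_exclusive, mean_llr) tuples.
    """
    kernel = np.ones(window) / window
    smoothed = np.convolve(llr, kernel, mode="same")
    active = smoothed > threshold
    # Pad with False so every run of active frames has a start and an end.
    padded = np.concatenate(([False], active, [False]))
    changes = np.flatnonzero(padded[1:] != padded[:-1])
    starts, ends = changes[0::2], changes[1::2]
    return [(int(s), int(e), float(llr[s:e].mean())) for s, e in zip(starts, ends)]


if __name__ == "__main__":
    # Toy one-dimensional data: background frames, a target-like stretch, background.
    frames = np.concatenate([np.zeros(20), np.full(20, 2.0), np.zeros(20)])[:, None]
    target = (np.array([1.0]), np.array([[2.0]]), np.array([[1.0]]))
    ubm = (np.array([1.0]), np.array([[0.0]]), np.array([[1.0]]))
    for start, end, score in segment_by_llr(frame_llr(frames, target, ubm)):
        print(f"frames {start}-{end - 1}: mean LLR {score:.2f}")
```

In the external segmentation approach described in the abstract, the partitioning step would instead come from a separate blind clustering algorithm, and only the region scoring would use the GMM-UBM system.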