Approaches to speaker detection and tracking in conversational speech

Authors

Dunn, RB; Reynolds, DA; Quatieri, TF

Citation

R.B. Dunn et al., Approaches to speaker detection and tracking in conversational speech, Digital Signal Processing, 10(1-3), 2000, pp. 93-112

Citations number

Subject categories

Electrical & Electronics Engineering

Journal title

DIGITAL SIGNAL PROCESSING

ISSN journal

1051-2004

Volume

10

Issue

1-3

Year of publication

2000

Pages

93 - 112

Database

ISI

SICI code

1051-2004(200001/07)10:1-3<93:ATSDAT>2.0.ZU;2-Q

Abstract

Two approaches to detecting and tracking speakers in multispeaker audio are described. Both approaches use an adapted Gaussian mixture model, universal background model (GMM-UBM) speaker detection system as the core speaker recognition engine. In one approach, the individual log-likelihood ratio scores, which are produced on a frame-by-frame basis by the GMM-UBM system, are used to first partition the speech file into speaker homogenous regions and then to create scores for these regions. We refer to this approach as internal segmentation. Another approach uses an external segmentation algorithm, based on blind clustering, to partition the speech file into speaker homogenous regions. The adapted GMM-UBM system then scores each of these regions as in the single-speaker recognition case. We show that the external segmentation system outperforms the internal segmentation system for both detection and tracking. In addition, we show how different components of the detection and tracking algorithms contribute to the overall system performance. (C) 2000 Academic Press.
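
The frame-by-frame log-likelihood ratio scoring mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the model parameters or the partitioning rule, so the diagonal-covariance GMM layout (a list of `(weight, mean, var)` tuples), the function names, and the simple threshold-crossing segmentation in `segment_by_sign` are all assumptions made for illustration only.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(x, gmm):
    """Log-likelihood of x under a GMM given as (weight, mean, var) tuples."""
    terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
    peak = max(terms)
    # Log-sum-exp over mixture components for numerical stability.
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def frame_llrs(frames, speaker_gmm, ubm):
    """Per-frame log-likelihood ratio: log p(x | speaker) - log p(x | UBM)."""
    return [
        gmm_log_likelihood(x, speaker_gmm) - gmm_log_likelihood(x, ubm)
        for x in frames
    ]


def segment_by_sign(llrs, threshold=0.0):
    """Hypothetical partitioning rule (an assumption, not from the paper):
    group consecutive frames that fall on the same side of the threshold
    and return (start, end, mean_llr) for each run, with end exclusive."""
    segments = []
    start = 0
    for i in range(1, len(llrs) + 1):
        at_end = i == len(llrs)
        if at_end or (llrs[i] > threshold) != (llrs[start] > threshold):
            chunk = llrs[start:i]
            segments.append((start, i, sum(chunk) / len(chunk)))
            start = i
    return segments
```

With a one-component speaker model centred at 2.0 and a one-component background model centred at 0.0 (both with unit variance), frames near 2.0 receive positive ratios and frames near 0.0 receive negative ones, so `segment_by_sign` splits the sequence where the sign changes. A practical system would use many mixture components, speaker models adapted from the UBM, and some smoothing of the frame scores before partitioning.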