This paper, based on three presentations made in 1998 at the RLA2C Workshop
in Avignon, discusses the evaluation of speaker recognition systems from
several perspectives. A general discussion of the speaker recognition task
and the challenges and issues involved in its evaluation is offered. The
NIST evaluations in this area and specifically the 1998 evaluation, its
objectives, protocols and test data, are described. The algorithms used by
the systems that were developed for this evaluation are summarized, compared
and contrasted. Overall performance results of this evaluation are presented
by means of detection error trade-off (DET) curves. These show the
performance trade-off of missed detections and false alarms for each system
and the effects on performance of training condition, test segment duration,
the speakers' sex and the match or mismatch of training and test handsets.
Several factors that were found to have an impact on performance, including
pitch frequency, handset type and noise, are discussed and DET curves
showing their effects are presented. The paper concludes with some
perspective on the history of this technology and where it may be going.
(C) 2000 Elsevier Science B.V. All rights reserved.
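
As an illustration of the DET curves mentioned above, the following minimal sketch computes the miss and false-alarm rates that such a curve plots, one pair per decision threshold. It is not the NIST scoring software. The function names `det_points` and `probit`, the convention that a higher score means "more likely the target speaker", and the clipping constant `eps` are all assumptions made for this example. DET curves are usually drawn on normal-deviate (probit) axes, which is what `probit` approximates.

```python
from statistics import NormalDist


def det_points(target_scores, impostor_scores):
    """Return (threshold, p_miss, p_false_alarm) for each candidate threshold.

    Assumes higher scores indicate the target speaker (hypothetical convention).
    """
    thresholds = sorted(set(target_scores) | set(impostor_scores))
    points = []
    for t in thresholds:
        # A trial is accepted when its score is >= the threshold.
        # Miss: a target trial that is rejected.
        p_miss = sum(s < t for s in target_scores) / len(target_scores)
        # False alarm: an impostor trial that is accepted.
        p_fa = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        points.append((t, p_miss, p_fa))
    return points


def probit(p, eps=1e-6):
    """Map a probability to the normal-deviate scale used on DET axes.

    Probabilities are clipped to [eps, 1 - eps] so that 0 and 1 stay finite.
    """
    p = min(max(p, eps), 1 - eps)
    return NormalDist().inv_cdf(p)


if __name__ == "__main__":
    # Toy scores, invented purely for illustration.
    targets = [2.0, 1.5, 0.5]
    impostors = [1.0, 0.0, -1.0]
    for t, p_miss, p_fa in det_points(targets, impostors):
        print(f"threshold={t:5.2f}  P_miss={p_miss:.3f}  P_fa={p_fa:.3f}  "
              f"probit=({probit(p_fa):.2f}, {probit(p_miss):.2f})")
```

Plotting `probit(p_fa)` against `probit(p_miss)` for every threshold would give one DET curve. Repeating this for subsets of trials (for example, matched versus mismatched handsets) would give the kind of per-condition comparison the paper describes.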