J. Kreiman and B. R. Gerratt, VALIDITY OF RATING-SCALE MEASURES OF VOICE QUALITY, The Journal of the Acoustical Society of America, 104(3), 1998, pp. 1598-1608
The validity of perceptual measures of vocal quality has been neglected
in studies of voice, which focus more commonly on rater reliability.
Validity depends in part on reliability, because an unreliable test
does not measure what it is intended to measure. However, traditional
measures of rating reliability only partially represent interrater
agreement, because they cannot reflect variations or patterns of
agreement for specific voice samples. In this paper the likelihood that
two raters would agree in their ratings of a single voice is examined,
for each voice in five previously gathered data sets. Results do not
support the continued assumption that traditional rating procedures
produce useful indices of listeners' perceptions. Listeners agreed very
poorly in the midrange of scales for breathiness and roughness, and
mean ratings in the midrange of such scales did not represent the
extent to which a voice possesses a quality, but served only to
indicate that listeners disagreed. Techniques like analysis by
synthesis or judgment of similarity avoid decomposing quality into
constituent dimensions, and do not require a listener to compare an
external stimulus to an unstable internal representation, thus
decreasing the error in measures of quality. Modeling individual
differences in perception can increase the variance accounted for in
models of quality, further reducing the error in perceptual measures.
Thus such techniques may provide valid alternatives to current
approaches. (C) 1998 Acoustical Society of America.
[S0001-4966(98)04708-0]
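
The per-voice quantity examined in the abstract, the likelihood that two raters agree in their ratings of a single voice, can be illustrated with a minimal sketch. This is not the authors' procedure: the function name `pairwise_agreement`, the `tolerance` parameter, the 7-point scale, and the ratings below are all hypothetical, chosen only to show how a midrange mean can coexist with low agreement.

```python
from itertools import combinations


def pairwise_agreement(ratings, tolerance=0):
    """Fraction of rater pairs whose ratings of one voice differ by at
    most `tolerance` scale points (hypothetical agreement criterion)."""
    pairs = list(combinations(ratings, 2))
    if not pairs:
        raise ValueError("need at least two ratings for one voice")
    agreeing = sum(1 for a, b in pairs if abs(a - b) <= tolerance)
    return agreeing / len(pairs)


# Hypothetical ratings on a 7-point scale, one list per voice.
voices = {
    "voice_A": [1, 1, 1, 2],  # ratings cluster near the scale endpoint
    "voice_B": [2, 4, 6, 4],  # midrange mean, ratings spread widely
}

for name, ratings in voices.items():
    mean = sum(ratings) / len(ratings)
    print(name, round(mean, 2), round(pairwise_agreement(ratings), 3))
```

In this made-up data, the mean of 4.0 for `voice_B` sits in the middle of the scale even though only one of the six rater pairs gives the same rating, which is the kind of pattern a single reliability coefficient computed over all voices would not reveal.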