Psychoacoustical evaluation of PSOLA. II. Double-formant stimuli and the role of vocal perturbation

Citation
R.W.L. Kortekaas and A. Kohlrausch, Psychoacoustical evaluation of PSOLA. II. Double-formant stimuli and the role of vocal perturbation, J. Acoust. Soc. Am., 105(1), 1999, pp. 522-535
Number of citations
41
Subject categories
Multidisciplinary, Optics & Acoustics
Journal title
JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA
ISSN journal
0001-4966
Volume
105
Issue
1
Year of publication
1999
Pages
522 - 535
Database
ISI
SICI code
0001-4966(199901)105:1<522:PEOPID>2.0.ZU;2-F
Abstract
This article presents the results of listening experiments and psychoacoustical modeling aimed at evaluating the pitch synchronous overlap-and-add (PSOLA) technique. This technique can be used for simultaneous modification of pitch and duration of natural speech, using simple and efficient time-domain operations on the speech waveform. The first set of experiments tested the ability of subjects to discriminate double-formant stimuli, modified in fundamental frequency using PSOLA, from unmodified stimuli. Of the potential auditory discrimination cues induced by PSOLA, cues from the first formant were found to generally dominate discrimination performance. In the second set of experiments, the influence of vocal perturbation, i.e., jitter and shimmer, on discriminability of PSOLA-modified single-formant stimuli was determined. The data show that discriminability deteriorates at most modestly in the presence of jitter and shimmer. With the exception of a few conditions, the trends in these data could be replicated by using either a modulation-discrimination or an intensity-discrimination model, depending on the formant frequency. As a baseline experiment, detection thresholds for jitter and shimmer were measured. Thresholds for jitter could be replicated by using either the modulation-discrimination or the intensity-discrimination model, depending on the (mean) fundamental frequency of the stimuli. The thresholds for shimmer could be accurately predicted for stimuli with a 250-Hz fundamental, but less accurately in the case of a 100-Hz fundamental. (C) 1999 Acoustical Society of America. [S0001-4966(99)05201-7].