Psychoacoustical evaluation of PSOLA. II. Double-formant stimuli and the role of vocal perturbation

Authors

Kortekaas, RWL; Kohlrausch, A

Citation

R.W.L. Kortekaas and A. Kohlrausch, Psychoacoustical evaluation of PSOLA. II. Double-formant stimuli and the role of vocal perturbation, J ACOUST SO, 105(1), 1999, pp. 522-535

Subject Categories

Multidisciplinary, Optics & Acoustics

Journal title

JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA

ISSN journal

0001-4966

Volume

105

Issue

1

Year of publication

1999

Pages

522 - 535

Database

ISI

SICI code

0001-4966(199901)105:1<522:PEOPID>2.0.ZU;2-F

Abstract

This article presents the results of listening experiments and psychoacoustical modeling aimed at evaluating the pitch synchronous overlap-and-add (PSOLA) technique. This technique can be used for simultaneous modification of pitch and duration of natural speech, using simple and efficient time-domain operations on the speech waveform. The first set of experiments tested the ability of subjects to discriminate double-formant stimuli, modified in fundamental frequency using PSOLA, from unmodified stimuli. Of the potential auditory discrimination cues induced by PSOLA, cues from the first formant were found to generally dominate discrimination performance. In the second set of experiments the influence of vocal perturbation, i.e., jitter and shimmer, on discriminability of PSOLA-modified single-formant stimuli was determined. The data show that discriminability deteriorates at most modestly in the presence of jitter and shimmer. With the exception of a few conditions, the trends in these data could be replicated by either using a modulation-discrimination or an intensity-discrimination model, dependent on the formant frequency. As a baseline experiment, detection thresholds for jitter and shimmer were measured. Thresholds for jitter could be replicated by using either the modulation-discrimination or the intensity-discrimination model, dependent on the (mean) fundamental frequency of stimuli. The thresholds for shimmer could be accurately predicted for stimuli with a 250-Hz fundamental, but less accurately in the case of a 100-Hz fundamental. (C) 1999 Acoustical Society of America. [S0001-4966(99)05201-7].