
Tree-based modeling of prosodic phrasing and segmental duration for Korean TTS systems

Authors

Lee, S; Oh, YH

Citation

S. Lee and Y.H. Oh, Tree-based modeling of prosodic phrasing and segmental duration for Korean TTS systems, SPEECH COMM, 28(4), 1999, pp. 283-300

Citations number

Subject Categories

Computer Science & Engineering

Journal title

SPEECH COMMUNICATION

ISSN journal

0167-6393

Volume

28

Issue

4

Year of publication

1999

Pages

283 - 300

Database

ISI

SICI code

0167-6393(199908)28:4<283:TMOPPA>2.0.ZU;2-7

Abstract

This study describes the tree-based modeling of prosodic phrasing, pause duration between phrases and segmental duration for Korean TTS systems. We collected 400 sentences from various genres and built a corresponding speech corpus uttered by a professional female announcer. The phonemic and prosodic boundaries were manually marked on the recorded speech, and morphological analysis, grapheme-to-phoneme conversion and syntactic analysis were also done on the text. A decision tree and regression trees were trained on 240 sentences (of approximately 20 min length), and tested on 160 sentences (of approximately 13 min length). Features for modeling prosody are proposed, and their effectiveness is measured by interpreting the resulting trees. The misclassification rate of the decision tree was 14.46%, and the RMSEs of the regression trees, which predict pause duration and segmental duration, were 132 and 22 ms, respectively, for the test set. To understand the performance of our approach in the run time of TTS systems, we trained and tested trees with the output of our text analyzer. The misclassification rate and the RMSE were 18.49% and 134 ms, respectively, for prosodic phrasing and pause duration on the test set. (C) 1999 Elsevier Science B.V. All rights reserved.
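
The two evaluation measures named in the abstract, misclassification rate for the phrase-break decision tree and RMSE for the duration regression trees, can be illustrated with a minimal sketch. The function names and the toy data below are hypothetical and are not taken from the paper; only the standard definitions of the two measures are assumed.

```python
import math


def misclassification_rate(true_labels, predicted_labels):
    """Fraction of items whose predicted label differs from the reference label."""
    if len(true_labels) != len(predicted_labels) or not true_labels:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = sum(1 for t, p in zip(true_labels, predicted_labels) if t != p)
    return errors / len(true_labels)


def rmse(true_values, predicted_values):
    """Root mean squared error between reference and predicted values."""
    if len(true_values) != len(predicted_values) or not true_values:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((t - p) ** 2 for t, p in zip(true_values, predicted_values))
    return math.sqrt(squared / len(true_values))


# Hypothetical toy data (not from the paper): phrase-break labels at word
# boundaries and pause durations in milliseconds.
true_breaks = ["break", "none", "none", "break", "none"]
pred_breaks = ["break", "none", "break", "break", "none"]
true_pauses_ms = [300.0, 450.0, 120.0]
pred_pauses_ms = [280.0, 500.0, 150.0]

rate = misclassification_rate(true_breaks, pred_breaks)
print(f"misclassification rate: {100 * rate:.2f}%")
print(f"pause duration RMSE: {rmse(true_pauses_ms, pred_pauses_ms):.1f} ms")
```

Read this way, the reported 14.46% is the share of test-set boundaries that the decision tree labeled incorrectly, and 132 ms and 22 ms are the RMSEs for pause duration and segmental duration on the test set.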