Speech visualization by integrating features for the hearing impaired

Citation
A. Watanabe et al., "Speech visualization by integrating features for the hearing impaired," IEEE Transactions on Speech and Audio Processing, 8(4), 2000, pp. 454-466
Number of citations
27
Subject categories
Electrical & Electronics Engineering
Journal title
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING
ISSN journal
1063-6676
Volume
8
Issue
4
Year of publication
2000
Pages
454 - 466
Database
ISI
SICI code
1063-6676(200007)8:4<454:SVBIFF>2.0.ZU;2-X
Abstract
This paper describes the development of a new speech visualization system that creates readable patterns by integrating different speech features into a single picture. The system extracts the phonemic and prosodic features from speech signals and converts them into a visual image using neither speech segmentation nor speech recognition. We used four time-delay neural networks (TDNN's) to generate phonemic features in the new system. Training the TDNN's on three selected frames of eight kinds of acoustic parameters showed significant improvement in performance. The TDNN outputs control the brightness of the patterns used for consonants; that is, each consonant pattern is represented by a different white texture whose brightness is weighted by the output of the corresponding TDNN. All the weighted consonant patterns are simply added and then overlaid synchronously on colors derived from the formant frequencies. When this is done, phonemic sequences and boundaries manifest themselves in the resulting visual patterns. In addition, the color of a single vowel sandwiched between consonants looks uniform. These visual phenomena are very useful for decoding the complex speech code, which is generated by the continuous movements of the speech organs. We evaluated the visualized speech in a preliminary test. When three students read the patterns of 75 words uttered by four males (300 items), the learning curves showed a steep rise and the correct-answer rate reached 96-99%. The learning effect was durable: after five months away from the system, a subject read 96.3% of the 300 tokens with a response time that averaged only 1.3 s/word.
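
The abstract's compositing step (TDNN outputs weighting the brightness of consonant textures, which are summed and overlaid on formant-derived colors) can be illustrated with a minimal sketch. This is not the authors' code; all array names, shapes, and the additive-blending choice are illustrative assumptions based only on the abstract's description.

import numpy as np

def composite_frame(tdnn_outputs, consonant_textures, formant_color):
    """Composite one visual frame of the speech pattern (illustrative).

    tdnn_outputs       : (K,) array of TDNN activations in [0, 1], one per
                         consonant class (the paper uses four TDNN's).
    consonant_textures : (K, H, W) array of fixed white textures, one per
                         consonant class, values in [0, 1].
    formant_color      : (H, W, 3) RGB image whose colors are assumed to be
                         derived from the formant frequencies.
    """
    # Brightness-weight each consonant texture by its TDNN output, then
    # simply add the weighted textures, as the abstract describes.
    weighted = np.tensordot(tdnn_outputs, consonant_textures, axes=1)  # (H, W)
    weighted = np.clip(weighted, 0.0, 1.0)

    # Overlay the summed white textures on the formant-based colors; additive
    # blending is one plausible reading of "overlaid synchronously on colors".
    frame = formant_color + weighted[..., None]
    return np.clip(frame, 0.0, 1.0)

# Example with two consonant classes on a 4x4 frame (dummy data).
rng = np.random.default_rng(0)
out = composite_frame(
    tdnn_outputs=np.array([0.8, 0.2]),
    consonant_textures=rng.random((2, 4, 4)),
    formant_color=rng.random((4, 4, 3)),
)
print(out.shape)  # (4, 4, 3)

Under this reading, phoneme boundaries emerge visually because the dominant TDNN output, and hence the brightest texture, changes from frame to frame without any explicit segmentation.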