Pitch detection and formant analysis of Arabic speech processing

Citation
A. Cherif et al., Pitch detection and formant analysis of Arabic speech processing, APPL ACOUST, 62(10), 2001, pp. 1129-1140
Number of citations
7
Subject categories
Optics & Acoustics
Journal title
APPLIED ACOUSTICS
Journal ISSN
0003-682X
Volume
62
Issue
10
Year of publication
2001
Pages
1129 - 1140
Database
ISI
SICI code
0003-682X(200110)62:10<1129:PDAFAO>2.0.ZU;2-1
Abstract
Speech processing and synthesis has been an actively researched area for many years, with renewed interest in electronics, artificial intelligence, telecommunications and even medicine. Speaker recognition systems, new low-bit-rate coders, speech synthesis, assistive devices for handicapped persons, and the identification of certain neurological and ENT (ORL) pathologies through vocal analysis are among the most promising applications in this field. For all of these applications, speech processing constitutes an essential stage in the extraction and identification of vocal parameters (pitch, formants, timbre...) which depend on the physical, physiological and linguistic structure of the spoken language. Moreover, the variability of the speech signal (children's, male and female voices) and its prosodic aspects (shouted, sung sounds...) make the task more difficult and oblige us to observe and acquire a large quantity of speech signals in order to extract what is relevant. We have therefore improved the processing stage by developing a user-friendly hardware and software environment under MATLAB 5.2. The originality of the work is that the developed program runs in real time when associated with the MATLAB real-time toolbox. The new speech processing program computes the pitch period, extracts the formant frequencies of Arabic speech and identifies the speaker's vocal timbre. The database consists of phonetically balanced Arabic sentences pronounced by several speakers. After acquisition, conversion and segmentation, we identify voiced and unvoiced (V/UV) speech by analysing the evolution of its zero-crossing rate. We then compute the fundamental frequency, the formants and the spectral envelope (vocal timbre). These parameters are used not only in speech synthesis and recognition but also in predicting the speaker's emotional and psychological state. (C) 2001 Elsevier Science Ltd. All rights reserved.
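
The processing chain described in the abstract (zero-crossing-based V/UV decision, fundamental-frequency estimation, formant and spectral-envelope extraction) was implemented by the authors under MATLAB 5.2. As a rough illustration only, the sketch below shows one conventional way such a chain can be realised, here in Python/NumPy: a zero-crossing rate for the V/UV decision, an autocorrelation peak for the pitch, and LPC pole angles for the formants. The function names, frame length, thresholds and LPC order are assumptions made for the example, not values taken from the paper.

```python
import numpy as np

def zero_crossing_rate(frame):
    """Fraction of sign changes per sample (high for unvoiced frames)."""
    signs = np.sign(frame)
    signs[signs == 0] = 1
    return float(np.mean(signs[:-1] != signs[1:]))

def pitch_autocorr(frame, fs, fmin=60.0, fmax=400.0):
    """Fundamental frequency (Hz) from the autocorrelation peak
    inside a plausible pitch-lag range."""
    frame = frame - np.mean(frame)
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_lo, lag_hi = int(fs / fmax), int(fs / fmin)
    lag = lag_lo + int(np.argmax(ac[lag_lo:lag_hi]))
    return fs / lag

def formants_lpc(frame, fs, order=12, n_formants=3):
    """Formant frequencies (Hz) from the angles of the LPC poles,
    using a Levinson-Durbin solution of the autocorrelation equations."""
    frame = frame * np.hamming(len(frame))
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    a = np.zeros(order + 1)
    a[0], err = 1.0, r[0]
    for i in range(1, order + 1):          # Levinson-Durbin recursion
        k = -np.dot(a[:i], r[i:0:-1]) / err
        a[:i + 1] = a[:i + 1] + k * a[:i + 1][::-1]
        err *= 1.0 - k * k
    poles = np.roots(a)
    poles = poles[np.imag(poles) > 0]      # keep one pole of each conjugate pair
    freqs = np.sort(np.angle(poles) * fs / (2.0 * np.pi))
    freqs = freqs[freqs > 90.0]            # drop the spectral-tilt root near 0 Hz
    return freqs[:n_formants]

if __name__ == "__main__":
    # Synthetic voiced-like test frame: a 150 Hz fundamental plus a weak
    # higher component, 30 ms at 16 kHz (all values illustrative).
    fs, dur = 16000, 0.03
    t = np.arange(int(fs * dur)) / fs
    frame = np.sin(2 * np.pi * 150 * t) + 0.3 * np.sin(2 * np.pi * 700 * t)
    if zero_crossing_rate(frame) < 0.1:    # low ZCR -> treat frame as voiced
        print("F0  ~", round(pitch_autocorr(frame, fs), 1), "Hz")
        print("Formant candidates:", np.round(formants_lpc(frame, fs), 1))
```

In a real-time setting such as the one described, routines of this kind would be applied frame by frame to the segmented signal, with the V/UV decision gating the pitch and formant computations.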