DIRECT SPEECH FEATURE ESTIMATION USING AN ITERATIVE EM ALGORITHM FOR VOCAL FOLD PATHOLOGY DETECTION

Authors

GAVIDIACEBALLOS L HANSEN JHL

Citation

L. Gavidiaceballos and J. H. L. Hansen, DIRECT SPEECH FEATURE ESTIMATION USING AN ITERATIVE EM ALGORITHM FOR VOCAL FOLD PATHOLOGY DETECTION, IEEE Transactions on Biomedical Engineering, 43(4), 1996, pp. 373-383

Subject categories

Engineering, Biomedical

Journal title

IEEE Transactions on Biomedical Engineering

ISSN journal

0018-9294

Volume

43

Issue

4

Year of publication

1996

Pages

373 - 383

Database

ISI

SICI code

0018-9294(1996)43:4<373:DSFEUA>2.0.ZU;2-Z

Abstract

The focus of this study is to formulate a speech parameter estimation algorithm for analysis/detection of vocal fold pathology. The speech processing algorithm proposed estimates features necessary to formulate a stochastic model to characterize healthy and pathology conditions from speech recordings. The general idea is to separate speech components under healthy and assumed pathology conditions. This problem is addressed using an iterative maximum-likelihood (ML) estimation procedure, based on the estimation-maximization (EM) algorithm. A new feature for characterizing pathology, termed enhanced-spectral-pathology component (ESPC), is estimated and shown to vary consistently between healthy and pathology conditions. It is also shown that the mean-area-peak-value (MAPV) and the weighted-slope (WSLOPE) indexes, which are obtained from the ESPC estimate, are meaningful measures of speech pathology conditions. For classification purposes, a five-state hidden-Markov-model (HMM) recognizer was formulated, based on the MAPV, WSLOPE, and ESPC spectral features. A set of log Mel-frequency filter bank coefficients were used to parameterize the ESPC feature. An evaluation of the HMM-based classifier was performed using speech recordings from healthy and vocal fold cancer patients of sustained vowel sounds. It is shown that while both MAPV and WSLOPE are useful features for vocal fold pathology detection, superior performance was achieved using a finer spectral representation of ESPC (e.g., a detection rate of 88.7% for pathology and 92.8% for healthy condition). One main advantage of the proposed method is that it does not require direct estimation of the glottal flow waveform. Therefore, the limitation of the inability to characterize vocal fold pathology, due to incomplete glottal closure, is no longer an issue. The results suggest that general analysis of the ESPC feature can provide a quantitative, noninvasive approach for analysis, detection, and characterization of speech production under vocal fold pathology.
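
The abstract states that the ESPC feature was parameterized with log Mel-frequency filter bank coefficients but gives no configuration. As a rough illustration only, the sketch below shows how such coefficients are commonly computed from a power spectrum. It is not the authors' implementation: the function names, the number of filters, the triangular filter shape, and the Mel formula 2595·log10(1 + f/700) are all assumptions chosen for the example.

```python
import math

def hz_to_mel(f_hz):
    # A common Mel-scale convention (assumed here, not taken from the paper).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filter_bank(n_filters, n_bins, sample_rate):
    """Build triangular filters over n_bins spectrum bins spanning 0..sample_rate/2."""
    nyquist = sample_rate / 2.0
    max_mel = hz_to_mel(nyquist)
    # n_filters + 2 edge frequencies, equally spaced on the Mel scale.
    edges_hz = [mel_to_hz(max_mel * i / (n_filters + 1))
                for i in range(n_filters + 2)]
    bin_hz = [nyquist * k / (n_bins - 1) for k in range(n_bins)]
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = edges_hz[j - 1], edges_hz[j], edges_hz[j + 1]
        row = []
        for f in bin_hz:
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))   # rising edge
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))   # falling edge
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def log_mel_coefficients(power_spectrum, sample_rate, n_filters=8, floor=1e-10):
    """Return the log of each filter's weighted energy (hypothetical helper)."""
    bank = mel_filter_bank(n_filters, len(power_spectrum), sample_rate)
    coeffs = []
    for row in bank:
        energy = sum(w * p for w, p in zip(row, power_spectrum))
        coeffs.append(math.log(max(energy, floor)))  # floor avoids log(0)
    return coeffs

# Usage: a synthetic 129-bin spectrum with all energy near 156 Hz (8 kHz sampling).
spectrum = [0.0] * 129
spectrum[5] = 1.0
coeffs = log_mel_coefficients(spectrum, sample_rate=8000)
```

In this toy input the energy sits in a single low-frequency bin, so only the lowest filter responds and the remaining coefficients fall to the log of the floor value. A real front end would apply the same step to each short-time frame of the estimated spectral component.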