Classification of general audio data for content-based retrieval

Authors

Li, DG; Sethi, IK; Dimitrova, N; McGee, T

Citation

D.G. Li et al., Classification of general audio data for content-based retrieval, PATT REC L, 22(5), 2001, pp. 533-544

Subject categories

AI Robotics and Automatic Control

Journal title

PATTERN RECOGNITION LETTERS

Journal ISSN

0167-8655

Volume

22

Issue

5

Year of publication

2001

Pages

533 - 544

Database

ISI

SICI code

0167-8655(200104)22:5<533:COGADF>2.0.ZU;2-9

Abstract

In this paper, we address the problem of classification of continuous general audio data (GAD) for content-based retrieval, and describe a scheme that is able to classify audio segments into seven categories consisting of silence, single speaker speech, music, environmental noise, multiple speakers' speech, simultaneous speech and music, and speech and noise. We studied a total of 143 classification features for their discrimination capability. Our study shows that cepstral-based features such as the Mel-frequency cepstral coefficients (MFCC) and linear prediction coefficients (LPC) provide better classification accuracy compared to temporal and spectral features. To minimize the classification errors near the boundaries of audio segments of different type in general audio data, a segmentation-pooling scheme is also proposed in this work. This scheme yields classification results that are consistent with human perception. Our classification system provides over 90% accuracy at a processing speed dozens of times faster than the playing rate. (C) 2001 Elsevier Science B.V. All rights reserved.
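
The abstract does not describe how the segmentation-pooling scheme works. As a rough illustration only, the sketch below assumes one plausible reading: frame-level labels are pooled by majority vote within each detected segment, so isolated misclassifications near a boundary are overridden. The function name `pool_labels`, the label strings, and the boundary format are hypothetical and are not taken from the paper.

```python
from collections import Counter


def pool_labels(frame_labels, boundaries):
    """Give every frame in a segment that segment's most common label.

    frame_labels: per-frame class labels from some frame-level classifier
                  (hypothetical input, not the paper's data format).
    boundaries:   sorted frame indices at which a new segment starts,
                  excluding 0 (hypothetical output of a segmentation step).
    """
    edges = [0] + list(boundaries) + [len(frame_labels)]
    pooled = []
    for start, end in zip(edges, edges[1:]):
        if start == end:
            continue  # skip empty segments
        # Majority vote: an assumed pooling rule, not confirmed by the abstract.
        winner = Counter(frame_labels[start:end]).most_common(1)[0][0]
        pooled.extend([winner] * (end - start))
    return pooled


frames = ["speech", "speech", "music", "speech",
          "music", "music", "speech", "music"]
smoothed = pool_labels(frames, boundaries=[4])
```

Here the stray "music" frame in the first segment and the stray "speech" frame in the second are replaced by their segment's majority label. A real system would also need a segmentation step to find the boundaries, which this sketch takes as given.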