Fundamental frequency estimation based on the joint time-frequency analysis of harmonic spectral structure

Authors
Citation
Dj. Liu and Ct. Lin, Fundamental frequency estimation based on the joint time-frequency analysis of harmonic spectral structure, IEEE SPEECH, 9(6), 2001, pp. 609-621
Number of citations
24
Subject Categories
Electrical & Electronics Engineering
Journal title
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING
Journal ISSN
1063-6676
Volume
9
Issue
6
Year of publication
2001
Pages
609 - 621
Database
ISI
SICI code
1063-6676(200109)9:6<609:FFEBOT>2.0.ZU;2-Q
Abstract
In this paper, we propose a new scheme to analyze the spectral structure of speech signals for fundamental frequency estimation. First, we propose a pitch measure to detect the harmonic characteristics of voiced sounds on the spectrum of a speech signal. This measure utilizes the properties that there are distinct impulses located at the positions of fundamental frequency and its harmonics, and the energy of voiced sound is dominated by the energy of these distinct harmonic impulses. The spectrum can be obtained by the fast Fourier transform (FFT); however, it may be destroyed when the speech is interfered with by additive noise. To enhance the robustness of the proposed scheme in noisy environments, we apply the joint time-frequency analysis (JTFA) technique to obtain the adaptive representation of the spectrum of speech signals. The adaptive representation can accurately extract important harmonic structure of noisy speech signals at the expense of high computation cost. To solve this problem, we further propose a fast adaptive representation (FAR) algorithm, which reduces the computation complexity of the original algorithm by 50%. The performance of the proposed fundamental-frequency estimation scheme is evaluated on a large database with or without additive noise. The performance is compared to that of other approaches on the same database. The experimental results show that the proposed scheme performs well on clean speech and is robust in noisy environments.
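The idea behind the pitch measure in the abstract (energy concentrated in impulses at the fundamental frequency and its harmonics) can be illustrated with a minimal Python sketch. This is not the authors' measure, and it does not implement JTFA or the FAR algorithm; it is a plain FFT-based harmonic energy ratio written under stated assumptions. The function names `harmonic_score` and `estimate_f0`, the 10 Hz band half-width, the 60-400 Hz search range, and the penalty for covered bandwidth are all hypothetical choices made for illustration.

```python
import numpy as np


def harmonic_score(frame, fs, f0, n_fft=4096, half_width_hz=10.0):
    """Hypothetical pitch measure: share of spectral energy near multiples
    of f0, minus the share of the spectrum those bands cover.

    The subtraction is an assumption of this sketch. Without it, f0/2 would
    score at least as well as f0, because its bands also contain every
    harmonic of f0.
    """
    window = np.hanning(len(frame))
    power = np.abs(np.fft.rfft(frame * window, n_fft)) ** 2
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)

    # Distance from each FFT bin to the nearest multiple of f0.
    offset = np.abs((freqs + f0 / 2.0) % f0 - f0 / 2.0)
    # Keep bins close to a harmonic, excluding the band around 0 Hz.
    mask = (offset <= half_width_hz) & (freqs >= f0 / 2.0)

    total = power.sum()
    if total == 0.0:
        return 0.0
    return power[mask].sum() / total - mask.mean()


def estimate_f0(frame, fs, f_min=60.0, f_max=400.0, step=1.0):
    """Return the candidate f0 (Hz) with the highest harmonic score."""
    candidates = np.arange(f_min, f_max + step, step)
    scores = [harmonic_score(frame, fs, f0) for f0 in candidates]
    return float(candidates[int(np.argmax(scores))])
```

A usage sketch: for a synthetic voiced frame of 1024 samples at 8 kHz, built from ten harmonics of 200 Hz with amplitudes falling as 1/k, `estimate_f0(frame, 8000.0)` should land close to 200 Hz. On noisy speech a plain FFT measure like this is expected to degrade, which is the problem the abstract's adaptive representation is meant to address.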