Dj. Liu and Ct. Lin, Fundamental frequency estimation based on the joint time-frequency analysis of harmonic spectral structure, IEEE SPEECH, 9(6), 2001, pp. 609-621
In this paper, we propose a new scheme to analyze the spectral structure of
speech signals for fundamental frequency estimation. First, we propose a
pitch measure to detect the harmonic characteristics of voiced sounds in the
spectrum of a speech signal. This measure exploits two properties: distinct
impulses are located at the positions of the fundamental frequency and its
harmonics, and the energy of a voiced sound is dominated by the energy of
these distinct harmonic impulses. The spectrum can be obtained by the fast
Fourier transform (FFT); however, it may be corrupted when the speech is
contaminated by additive noise. To enhance the robustness of the proposed
scheme in noisy environments, we apply the joint time-frequency analysis
(JTFA) technique to obtain an adaptive representation of the spectrum of
speech signals. The adaptive representation can accurately extract the
important harmonic structure of noisy speech signals, at the expense of high
computational cost. To solve this problem, we further propose a fast adaptive
representation (FAR) algorithm, which reduces the computational complexity of
the original algorithm by 50%. The performance of the proposed
fundamental-frequency estimation scheme is evaluated on a large database with
and without additive noise, and is compared to that of other approaches on
the same database. The experimental results show that the proposed scheme
performs well on clean speech and is robust in noisy environments.
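
The abstract does not give the formula of the pitch measure, so the following is only a minimal sketch of the general idea it describes: score each candidate fundamental by how much of the spectral power falls on that candidate and its harmonics. It uses a plain FFT spectrum, not the JTFA adaptive representation or the FAR algorithm. The function names, the number of harmonics, the FFT size, and the search range are all assumptions made for illustration, not values from the paper.

```python
import numpy as np


def harmonic_pitch_measure(frame, fs, f0_candidates, n_harmonics=5, n_fft=4096):
    """Score each candidate f0 by the share of spectral power found at the
    FFT bins nearest to its first n_harmonics harmonics (illustrative only)."""
    windowed = frame * np.hanning(len(frame))
    power = np.abs(np.fft.rfft(windowed, n_fft)) ** 2
    total = power.sum() + 1e-12          # guard against an all-zero frame
    bin_hz = fs / n_fft                  # frequency spacing of the FFT bins
    scores = np.empty(len(f0_candidates))
    for i, f0 in enumerate(f0_candidates):
        # Nearest bin to each harmonic h * f0, h = 1..n_harmonics.
        bins = np.rint(np.arange(1, n_harmonics + 1) * f0 / bin_hz).astype(int)
        bins = bins[bins < len(power)]   # drop harmonics above Nyquist
        scores[i] = power[bins].sum() / total
    return scores


def estimate_f0(frame, fs, f_min=60.0, f_max=400.0, step=1.0):
    """Return the candidate with the highest score, and that score."""
    candidates = np.arange(f_min, f_max + step, step)
    scores = harmonic_pitch_measure(frame, fs, candidates)
    best = int(np.argmax(scores))
    return float(candidates[best]), float(scores[best])
```

Applied to a 64 ms synthetic frame sampled at 8 kHz and built from a 200 Hz tone plus four harmonics, estimate_f0(frame, 8000) should return a value close to 200 Hz. Using a fixed number of harmonics per candidate is one simple way to penalize sub-harmonic candidates, which would otherwise collect the same peaks; a noise-robust version would replace the FFT spectrum with a cleaner representation, which is the role the abstract assigns to JTFA.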