Experiments and results

The presented F0 estimation algorithm was evaluated on the Keele pitch reference database for clean speech. We measured the voiced error rate (VE), the unvoiced error rate (UE) and the gross pitch error rate (GPE). A voiced error occurs if a voiced frame is recognized as unvoiced; an unvoiced error occurs if an unvoiced frame is identified as voiced; and a gross pitch error is counted if the estimated F0 differs by more than 20% from the reference pitch. The precision is given by the root-mean-square error (RMSE) in Hz over all frames classified as correct. The results for our parallel, cognition-oriented F0 estimation algorithm (PCO) are given in Table 9.1. We also cite the results of other state-of-the-art F0 estimation or pitch detection algorithms where these figures were available: RAPT, PSHF and non-negative matrix factorization (NMF) [ROA 07, SHA 05, TAL 95]. RAPT is one of the best time-domain algorithms based on cross-correlation and dynamic programming.
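The four measures defined above can be illustrated with a minimal Python sketch. It is not the evaluation code used for Table 9.1. The function name `f0_error_rates`, the convention that an F0 of 0 marks an unvoiced frame, and the choice of denominators (VE over reference-voiced frames, UE over reference-unvoiced frames, GPE over frames voiced in both tracks) are assumptions made for illustration; the text does not specify them.

```python
import math

def f0_error_rates(ref_f0, est_f0, gross_threshold=0.20):
    """Compute VE, UE, GPE (in %) and RMSE (in Hz) from per-frame F0 tracks.

    Assumption: an F0 value of 0 marks an unvoiced frame.
    Assumption: the denominators chosen below; the source does not state them.
    """
    n_voiced = n_unvoiced = n_both_voiced = 0
    voiced_errors = unvoiced_errors = gross_errors = 0
    squared_errors = []  # only frames without a gross error contribute

    for ref, est in zip(ref_f0, est_f0):
        if ref > 0:
            n_voiced += 1
            if est <= 0:
                voiced_errors += 1      # voiced frame recognized as unvoiced
            else:
                n_both_voiced += 1
                if abs(est - ref) > gross_threshold * ref:
                    gross_errors += 1   # deviation of more than 20%
                else:
                    squared_errors.append((est - ref) ** 2)
        else:
            n_unvoiced += 1
            if est > 0:
                unvoiced_errors += 1    # unvoiced frame identified as voiced

    def percent(count, total):
        return 100.0 * count / total if total else math.nan

    rmse = (math.sqrt(sum(squared_errors) / len(squared_errors))
            if squared_errors else math.nan)
    return {
        "VE": percent(voiced_errors, n_voiced),
        "UE": percent(unvoiced_errors, n_unvoiced),
        "GPE": percent(gross_errors, n_both_voiced),
        "RMSE": rmse,
    }

# Toy example with six frames (hypothetical values, not Keele data).
reference = [100, 100, 0, 0, 200, 200]
estimate = [100, 0, 0, 150, 250, 202]
rates = f0_error_rates(reference, estimate)
```

In the toy example, one of four voiced frames is missed, one of two unvoiced frames is marked voiced, and one of the three frames voiced in both tracks deviates by more than 20%, so only the two remaining frames enter the RMSE.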

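| Algorithm | VE (%) | UE (%) | GPE (%) | RMSE (Hz) |
|-----------|--------|--------|---------|-----------|
| PCO       | 4.84   | 4.12   | 1.96    | 5.89      |
| RAPT      | 3.2    | 6.8    | 2.2     | 4.4       |
| PSHF      | 4.51   | 5.06   | 0.61    | 2.46      |
| NMF       | 7.7    | 4.6    | 0.9     | 4.3       |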

Table 9.1. Results of the parallel, cognition-oriented F0 estimation (PCO) in comparison with other state-of-the-art algorithms on the Keele pitch reference database

The results show that the voiced and unvoiced error rates of PCO are comparable to those of the other state-of-the-art algorithms. In fact, the sum of the voiced and unvoiced error rates is smallest for PCO, namely 8.96%, whereas it is 10% for RAPT, 9.57% for PSHF and 12.3% for NMF. The gross pitch error rate (GPE) of 1.96% is lower than that of RAPT (2.2%) but clearly not as low as those of the frequency-domain algorithms PSHF and NMF. Our algorithm is a pure time-domain method, and in terms of GPE it performs better than RAPT, which also operates in the time domain but uses the normalized cross-correlation function. Moreover, the GPE of PCO is far lower than those of Praat, YIN and SAFE, as given in Table 9.2. The GPEs of these three algorithms are reported in [CHU 12]. For clarity, Table 9.2 also gives the GPEs of the algorithms PSHF, NMF and RAPT cited in Table 9.1.
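The combined voicing error quoted above is simply VE plus UE for each algorithm. The following short sketch recomputes the sums from the values in Table 9.1; the variable names are illustrative only.

```python
# (VE %, UE %) per algorithm, copied from Table 9.1.
voicing_errors = {
    "PCO": (4.84, 4.12),
    "RAPT": (3.2, 6.8),
    "PSHF": (4.51, 5.06),
    "NMF": (7.7, 4.6),
}

# Sum of voiced and unvoiced error rates, rounded to two decimals
# to suppress floating-point noise.
combined = {name: round(ve + ue, 2) for name, (ve, ue) in voicing_errors.items()}

# Algorithm with the smallest combined voicing error.
lowest = min(combined, key=combined.get)
```

The sums come out as 8.96 for PCO, 10.0 for RAPT, 9.57 for PSHF and 12.3 for NMF, matching the figures in the text.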

|         | PSHF | NMF | PCO  | RAPT | Praat | YIN  | SAFE |
|---------|------|-----|------|------|-------|------|------|
| GPE (%) | 0.61 | 0.9 | 1.96 | 2.2  | 3.22  | 2.94 | 2.98 |

Table 9.2. Gross pitch error rates of PCO compared with those of standard F0 estimation algorithms on the Keele pitch reference database

The root-mean-square error (RMSE), a measure of the precision of the correctly calculated F0 estimates, is 5.89 Hz and thus higher than that of the other algorithms, as given in Table 9.1. This can be explained as follows. The fundamental frequency F0 is defined as the inverse of the time between two consecutive minimum signal peaks or two consecutive maximum peaks. However, maximum or minimum peaks may be inclined to the left or to the right, and there is often a set of close peaks around the maximum or minimum peak, so that F0 is not calculated as accurately as with other methods. Nevertheless, the accuracy of the F0 estimates can certainly be improved by adjustment methods.
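The sensitivity described above can be made concrete with a small sketch. It is not the PCO algorithm itself; it only shows how an F0 value computed as the inverse of the time between two consecutive peaks reacts when one peak position is off by a single sample. The function name `f0_from_peak_indices` and the 20 kHz sampling rate are assumptions chosen for illustration.

```python
def f0_from_peak_indices(peak_indices, sample_rate_hz):
    """Return one F0 value (Hz) per period, computed as the inverse of the
    time between consecutive peaks given as sample indices."""
    f0_values = []
    for start, end in zip(peak_indices, peak_indices[1:]):
        if end <= start:
            raise ValueError("peak indices must be strictly increasing")
        period_s = (end - start) / sample_rate_hz
        f0_values.append(1.0 / period_s)
    return f0_values

SAMPLE_RATE_HZ = 20_000  # assumed sampling rate, for illustration only

# Peaks exactly 100 samples apart: a 5 ms period, i.e. 200 Hz.
exact = f0_from_peak_indices([0, 100, 200], SAMPLE_RATE_HZ)

# The middle peak is located one sample too late, for example because
# the peak is inclined or surrounded by close neighbouring peaks.
shifted = f0_from_peak_indices([0, 101, 200], SAMPLE_RATE_HZ)
```

Under these assumptions, a one-sample shift of a single peak moves the two neighbouring estimates from 200 Hz to about 198.0 Hz and 202.0 Hz, an error of roughly 2 Hz each, which is the same order of magnitude as the RMSE differences in Table 9.1.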

 