Experiments and results
The presented F0 estimation algorithm was evaluated on the Keele pitch reference database for clean speech. We measured the voiced error rate (VE), the unvoiced error rate (UE) and the gross pitch error rate (GPE). A voiced error occurs when a voiced frame is recognized as unvoiced, an unvoiced error occurs when an unvoiced frame is identified as voiced, and a gross pitch error is counted when the estimated F0 differs by more than 20% from the reference pitch. The precision is given by the root-mean-square error (RMSE) in Hz over all frames whose F0 estimate is classified as correct. The results for our parallel, cognition-oriented F0 estimation algorithm (PCO) are given in Table 9.1. We also cite the results of other state-of-the-art F0 estimation or pitch detection algorithms where these figures were available: RAPT, PSHF and non-negative matrix factorization (NMF) [ROA 07, SHA 05, TAL 95]. RAPT is one of the best time-domain algorithms; it is based on cross-correlation and dynamic programming.
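The four measures defined above can be computed frame by frame. The following Python sketch assumes that the reference and estimated F0 tracks are equally long lists of per-frame values in Hz, with 0 marking an unvoiced frame. The denominators are our assumption, since the text does not state them: VE is normalized by the number of reference-voiced frames, UE by the number of reference-unvoiced frames, GPE by the frames voiced in both tracks, and RMSE is taken over jointly voiced frames without a gross error. The function name `f0_error_metrics` is hypothetical.

```python
import math
from typing import Sequence


def f0_error_metrics(ref: Sequence[float], est: Sequence[float],
                     gross_threshold: float = 0.20) -> dict:
    """Return VE, UE, GPE (in %) and RMSE (in Hz) for two per-frame F0 tracks.

    A value of 0 marks an unvoiced frame. The normalization of each rate is an
    assumption of this sketch (see the lead-in), not taken from the source.
    """
    if len(ref) != len(est):
        raise ValueError("reference and estimate must have the same length")
    n_voiced = n_unvoiced = n_both = n_fine = 0
    voiced_err = unvoiced_err = gross_err = 0
    sq_sum = 0.0
    for r, e in zip(ref, est):
        r_voiced, e_voiced = r > 0, e > 0
        if r_voiced:
            n_voiced += 1
            if not e_voiced:
                voiced_err += 1        # voiced frame recognized as unvoiced
        else:
            n_unvoiced += 1
            if e_voiced:
                unvoiced_err += 1      # unvoiced frame identified as voiced
        if r_voiced and e_voiced:
            n_both += 1
            if abs(e - r) > gross_threshold * r:
                gross_err += 1         # deviates by more than 20% from reference
            else:
                sq_sum += (e - r) ** 2  # correct frame: contributes to RMSE
                n_fine += 1

    def pct(count: int, total: int) -> float:
        return 100.0 * count / total if total else 0.0

    return {
        "VE": pct(voiced_err, n_voiced),
        "UE": pct(unvoiced_err, n_unvoiced),
        "GPE": pct(gross_err, n_both),
        "RMSE": math.sqrt(sq_sum / n_fine) if n_fine else 0.0,
    }


# Toy example with six frames; 0 marks unvoiced frames.
metrics = f0_error_metrics([100, 100, 0, 0, 200, 150], [102, 0, 0, 120, 250, 147])
print(metrics)  # VE 25.0, UE 50.0, GPE about 33.3, RMSE about 2.55
```

In the toy example, one of four voiced frames is missed (VE 25%), one of two unvoiced frames is marked voiced (UE 50%), and the 250 Hz estimate for a 200 Hz reference counts as a gross error, so only the two remaining frames enter the RMSE.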
Algorithm   VE (%)   UE (%)   GPE (%)   RMSE (Hz)
PCO         4.84     4.12     1.96      5.89
RAPT        3.2      6.8      2.2       4.4
PSHF        4.51     5.06     0.61      2.46
NMF         7.7      4.6      0.9       4.3

Table 9.1. Results of the parallel, cognition-oriented F0 estimation (PCO) in comparison with other state-of-the-art algorithms on the Keele pitch reference database
The results show that the voiced and unvoiced error rates of PCO are comparable to those of the other state-of-the-art algorithms. In fact, the sum of the voiced and unvoiced error rates is smallest for PCO, namely 8.96%, compared with 10% for RAPT, 9.57% for PSHF and 12.3% for NMF. The GPE of PCO, 1.96%, is lower than that of RAPT (2.2%) but clearly not as low as those of the frequency-domain algorithms PSHF (0.61%) and NMF (0.9%). PCO is a purely time-domain method, and it achieves a lower GPE than RAPT, which also operates in the time domain but uses the normalized cross-correlation function. Moreover, the GPE of PCO is clearly lower, by about 1 to 1.3 percentage points, than those of Praat, YIN and SAFE, as given in Table 9.2. The GPE values of these algorithms are reported in [CHU 12]. For comparison, Table 9.2 also repeats the GPE values of PSHF, NMF and RAPT from Table 9.1.
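The combined voicing error quoted above is simply VE + UE for each row of Table 9.1. The short Python sketch below recomputes these sums from the table values and ranks the algorithms; the dictionary `TABLE_9_1` and the function `combined_voicing_error` are hypothetical names introduced only for illustration.

```python
# (VE %, UE %) per algorithm, copied from Table 9.1.
TABLE_9_1 = {
    "PCO": (4.84, 4.12),
    "RAPT": (3.2, 6.8),
    "PSHF": (4.51, 5.06),
    "NMF": (7.7, 4.6),
}


def combined_voicing_error(table):
    """Return (algorithm, VE + UE) pairs sorted from smallest to largest sum."""
    # Round to two decimals to remove floating-point noise from the addition.
    sums = {name: round(ve + ue, 2) for name, (ve, ue) in table.items()}
    return sorted(sums.items(), key=lambda item: item[1])


for name, total in combined_voicing_error(TABLE_9_1):
    print(f"{name:5s} {total:6.2f} %")
```

The ranking places PCO first, followed by PSHF, RAPT and NMF, which matches the sums given in the text.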
Algorithm   PSHF   NMF   PCO    RAPT   Praat   YIN    SAFE
GPE (%)     0.61   0.9   1.96   2.2    3.22    2.94   2.98

Table 9.2. Gross pitch error rates of PCO compared with standard F0 estimation algorithms on the Keele pitch reference database
The root-mean-square error (RMSE) of 5.89 Hz, a measure of the precision of the correctly estimated F0 values, is higher than that of the other algorithms, as given in Table 9.1. This can be explained as follows. In PCO, the fundamental frequency F0 is determined as the inverse of the time between two successive minimum signal peaks or two successive maximum peaks. However, these maximum or minimum peaks may be skewed to the left or to the right, and there is often a set of closely spaced peaks around the actual maximum or minimum, so F0 is not calculated as accurately as with other methods. The accuracy of the F0 estimates could, however, be improved by adjustment methods.
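To illustrate why peak-based period measurement limits precision, the following Python sketch estimates F0 as the inverse of the interval between successive maximum peaks of a sampled signal. It is not the PCO algorithm. The cluster suppression controlled by the hypothetical `min_distance` parameter and the three-point parabolic interpolation of each peak position are our own assumptions, shown as one possible adjustment method. Minimum peaks can be handled in the same way by applying the functions to the negated signal.

```python
import math
from typing import List, Sequence


def local_maxima(x: Sequence[float], min_distance: int) -> List[int]:
    """Indices of local maxima; within a cluster of maxima closer together
    than min_distance samples, only the largest one is kept."""
    peaks: List[int] = []
    for i in range(1, len(x) - 1):
        if not (x[i - 1] < x[i] >= x[i + 1]):
            continue
        if peaks and i - peaks[-1] < min_distance:
            # Same cluster as the previous peak: keep whichever is larger.
            if x[i] > x[peaks[-1]]:
                peaks[-1] = i
        else:
            peaks.append(i)
    return peaks


def parabolic_offset(x: Sequence[float], i: int) -> float:
    """Offset (in samples) of the vertex of the parabola through x[i-1], x[i],
    x[i+1]; it lies in [-0.5, 0.5] when x[i] is a local maximum."""
    a, b, c = x[i - 1], x[i], x[i + 1]
    denom = a - 2.0 * b + c
    return 0.0 if denom == 0 else 0.5 * (a - c) / denom


def f0_from_peaks(x: Sequence[float], fs: float, min_distance: int,
                  refine: bool = True) -> List[float]:
    """One F0 estimate (Hz) per pair of successive maximum peaks."""
    peaks = local_maxima(x, min_distance)
    positions = [(p + parabolic_offset(x, p)) if refine else float(p)
                 for p in peaks]
    return [fs / (t2 - t1) for t1, t2 in zip(positions, positions[1:])]


# Synthetic 123.4 Hz tone sampled at 8 kHz (period of about 64.8 samples).
fs, f0 = 8000.0, 123.4
x = [math.sin(2 * math.pi * f0 * n / fs) for n in range(800)]
raw = f0_from_peaks(x, fs, min_distance=40, refine=False)
fine = f0_from_peaks(x, fs, min_distance=40, refine=True)
print(max(abs(v - f0) for v in raw), max(abs(v - f0) for v in fine))
```

With integer peak positions, each period is quantized to whole samples, so this tone yields estimates of about 123.08 Hz (65 samples) or 125 Hz (64 samples); the parabolic refinement places the peaks between samples and reduces the error considerably. On real speech, skewed peaks and peak clusters would still leave a residual error.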