The author wishes to thank Prof. Jozsef Szakos of the Hong Kong Polytechnic University for valuable comments and Prof. Guy Aston of the University of Bologna, Italy, for his careful proofreading. She is also very grateful to Christian Singer, who implemented the basic version of the pause-finding algorithm as part of his diploma thesis.


[ACH 05] Achan K., Roweis S., Hertzmann A. et al., “A segment-based probabilistic generative model of speech”, Proceedings of ICASSP, pp. 221-224, 2005.

[BOE 01] Boersma P., “PRAAT, a system for doing phonetics by computer”, Glot International, vol. 5, no. 9/10, pp. 341-345, 2001.

[CHU 12] Chu W., Alwan A., “SAFE: a statistical approach to F0 estimation under clean and noisy conditions”, IEEE Transactions on Audio, Speech and Language Processing, vol. 20, no. 3, pp. 933-944, 2012.

[DEC 02] de Cheveigné A., Kawahara H., “YIN, a fundamental frequency estimator for speech and music”, Journal of the Acoustical Society of America, vol. 111, no. 4, pp. 1917-1930, 2002.

[EWE 10] Ewender T., Pfister B., “Accurate pitch marking for prosodic modification of speech segments”, Proceedings of INTERSPEECH, pp. 178-181, 2010.

[GHA 14] Ghahremani P., Baba Ali B., Povey D. et al., “A pitch extraction algorithm tuned for automatic speech recognition”, Proceedings of INTERSPEECH, pp. 2494-2498, 2014.

[GLA 15] Glavitsch U., He L., Dellwo V., “Stable and unstable intervals as a basic segmentation procedure of the speech signal”, Proceedings of INTERSPEECH, pp. 31-35, 2015.

[KAH 11] Kahneman D., Thinking, Fast and Slow, Farrar, Straus and Giroux, New York, 2011.

[KUH 77] Kuhn T.S., “Second thoughts on paradigms”, The Essential Tension: Selected Studies in Scientific Tradition and Change, The University of Chicago Press, Chicago, pp. 293-319, 1977.

[MAR 72] Markel J.D., “The SIFT algorithm for fundamental frequency estimation”, IEEE Transactions on Audio and Electroacoustics, vol. 20, no. 5, pp. 367-377, 1972.

[MOO 08] Moore B.C.J., An Introduction to the Psychology of Hearing, Emerald, Bingley, 2008.

[PEH 11] Peharz R., Wohlmayr M., Pernkopf F., “Gain-robust multi-pitch tracking using sparse nonnegative matrix factorization”, Proceedings of ICASSP, pp. 5416-5419, 2011.

[PLA 95] Plante F., Meyer G.F., Ainsworth W.A., “A pitch extraction reference database”, Proceedings of Eurospeech, pp. 837-840, 1995.

[RAB 75] Rabiner L.R., Sambur M.R., “An algorithm for detecting the endpoints of isolated utterances”, Bell System Technical Journal, vol. 54, no. 2, 1975.

[RAB 76] Rabiner L.R., Cheng M.J., Rosenberg A.E. et al., “A comparative performance study of several pitch detection algorithms”, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 24, no. 5, pp. 399-418, 1976.

[ROA 07] Roa S., Bennewitz M., Behnke S., “Fundamental frequency estimation based on pitch-scaled harmonic filtering”, Proceedings of ICASSP, pp. 397-400, 2007.

[ROD 11] Rodero E., “Intonation and emotion: influence of pitch levels and contour type on creating emotions”, Journal of Voice, vol. 25, no. 1, pp. e25-e34, 2011.

[SEC 83] Secrest B.G., Doddington G.R., “An integrated pitch tracking algorithm for speech systems”, Proceedings of ICASSP, pp. 1352-1355, 1983.

[SHA 05] Sha F., Saul L.K., “Real-time pitch determination of one or more voices by nonnegative matrix factorization”, Advances in Neural Information Processing Systems, MIT Press, vol. 17, pp. 1233-1240, 2005.

[TAL 95] Talkin D., “A robust algorithm for pitch tracking (RAPT)”, Speech Coding and Synthesis, Elsevier Science B.V., Amsterdam, 1995.
