An Overview of Pitch Detection Algorithms Alexandre Savard MUMT611: Music Information Acquisition, Preservation, and Retrieval February 2006.

Content
Introduction
– Classification
– Applications
– Problems and Constraints
Time Domain Algorithms
Frequency Domain Algorithms
Alternative Techniques
Conclusion

Introduction: Prior Definitions
– Pitch: the perceptual appreciation of the highness or lowness of a sound. It is related to the periodicity of the sound.
– Frequency: a physical attribute of a sound or of any other type of signal. It describes the number of times a repeating event occurs per unit of time.
– Fundamental Frequency: in a complex sound or signal, the lowest partial.

Introduction: Applications of Pitch Tracking
– Automatic music transcription from audio signals to common music notation or to MIDI note numbers
– Score following
– Musical queries by singing or humming
– Acoustic feature for human-computer interaction
– Sound-editing operations such as pitch shifting and time scaling

Introduction: Non-Exclusive Classification
– Voice (speech, singing)
– Instrumental
– Monophonic
– Polyphonic
– Time-Based Algorithm
– Spectral-Based Algorithm
– Alternative

Introduction: Generally Encountered Problems
– Noise
– Reverberation
– Other sounds from the environment
– Shortness of the sustained part of certain sounds
– Sounds must be analyzed right after the attack transient, when they are not yet fully stable
– Detuning during the sustained part of a sound
– Minimal output delay required for real-time use

Introduction: Music-Specific Difficulties
– Large frequency range of musical instruments
– Many instrumental sounds have inharmonic partials
– Expressive factors (glissando, vibrato, trill)
– Need for fast algorithms for real-time processing
– Multiphonics

Time Domain
– Zero-Crossing Detection
– Autocorrelation Function
– Average Magnitude Difference Function

Time Domain: Zero-Crossing Detection
– Based on a direct application of the definition of periodicity
– Counts the number of times the signal crosses a reference level
– Computationally inexpensive
– Weak against noise
– Weak when analyzing signals with energy at high frequencies
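
The idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `zero_crossing_pitch`, the zero reference level, and the synthetic test tones are all hypothetical and not taken from the presentation. The second tone adds a strong fifth harmonic to show how high-frequency energy produces extra crossings and misleads the estimate.

```python
import math

def zero_crossing_pitch(signal, sample_rate):
    """Estimate f0 from the average spacing of upward zero crossings."""
    crossings = [
        i for i in range(1, len(signal))
        if signal[i - 1] < 0.0 <= signal[i]
    ]
    if len(crossings) < 2:
        return None  # not enough crossings to measure a period
    # One upward crossing per period is assumed, so the average spacing
    # between successive crossings is taken as the period in samples.
    period = (crossings[-1] - crossings[0]) / (len(crossings) - 1)
    return sample_rate / period

sample_rate = 8000
f0 = 200.0
theta = [2 * math.pi * f0 * n / sample_rate for n in range(800)]

# A pure tone: one upward crossing per period.
pure = [math.sin(t + 0.3) for t in theta]
estimate = zero_crossing_pitch(pure, sample_rate)

# The same tone with a strong 5th harmonic: several crossings per period.
rich = [math.sin(t + 0.3) + 1.5 * math.sin(5 * t + 0.3) for t in theta]
misled = zero_crossing_pitch(rich, sample_rate)
```

For the pure tone the estimate lands on the true fundamental, while for the second tone it follows the harmonic instead, which is the weakness listed above.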

Time Domain: Autocorrelation Technique
– Cross-correlation is an operation that measures the similarity between two signals.
– The corresponding samples of one signal and of a time-shifted version of the other are multiplied and added together.
– The cross-correlation function then has a peak at the offset value that corresponds to the maximum of similarity.

Time Domain: Autocorrelation Technique
– Autocorrelation is the cross-correlation of a signal with itself.
– The maximum of similarity occurs for a time shift of zero.
– In theory, another maximum should occur when the time shift corresponds to the fundamental period.
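
A minimal sketch of this principle follows, assuming a direct (multiply-and-add) autocorrelation and a search restricted to a plausible pitch range so that the trivial maximum at zero shift is skipped. The function names, the `f_min`/`f_max` parameters, and the test tone are illustrative assumptions, not part of the original slides.

```python
import math

def autocorrelation(signal, max_lag):
    """r[lag] = sum over n of signal[n] * signal[n + lag]."""
    n = len(signal)
    return [
        sum(signal[i] * signal[i + lag] for i in range(n - lag))
        for lag in range(max_lag + 1)
    ]

def autocorrelation_pitch(signal, sample_rate, f_min, f_max):
    """Pick the highest autocorrelation value inside the allowed lag range."""
    min_lag = int(sample_rate / f_max)  # shortest period considered
    max_lag = int(sample_rate / f_min)  # longest period considered
    r = autocorrelation(signal, max_lag)
    best_lag = max(range(min_lag, max_lag + 1), key=lambda lag: r[lag])
    return sample_rate / best_lag

sample_rate = 8000
f0 = 200.0
tone = []
for n in range(800):
    t = 2 * math.pi * f0 * n / sample_rate
    # Fundamental plus two weaker harmonics.
    tone.append(math.sin(t) + 0.5 * math.sin(2 * t) + 0.25 * math.sin(3 * t))

estimate = autocorrelation_pitch(tone, sample_rate, f_min=80.0, f_max=500.0)
```

Because the sum runs over fewer terms as the lag grows, the peak at one period is slightly higher than the peak at two periods, which helps this simple picker avoid the lower octave on a clean signal.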

Time Domain: Autocorrelation Technique
– Less effective for high fundamental frequencies.
– Computed directly as a sum of products (like a convolution), it is very expensive.
– Computational efficiency can be improved by using the FFT algorithm instead of the direct computation. This reduces the cost from O(N^2) to O(N log N).
– Most variants of this technique differ in the mathematical definition of the autocorrelation used, in the way the maxima are located, and in how errors in the identification of the maximum are mitigated.
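
The FFT shortcut can be sketched as follows: the signal is zero-padded, transformed, its squared magnitude is taken, and the result is transformed back. To keep the example self-contained it uses a small recursive radix-2 FFT written with the standard library; in practice an FFT library would be used instead. All function names and the test signal are assumptions made for illustration.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def ifft(spectrum):
    """Inverse FFT via the conjugation identity."""
    n = len(spectrum)
    return [v.conjugate() / n for v in fft([s.conjugate() for s in spectrum])]

def autocorrelation_direct(signal):
    """O(N^2) multiply-and-add definition."""
    n = len(signal)
    return [sum(signal[i] * signal[i + lag] for i in range(n - lag))
            for lag in range(n)]

def autocorrelation_fft(signal):
    """O(N log N): inverse FFT of the power spectrum of the padded signal."""
    n = len(signal)
    size = 1
    while size < 2 * n:  # padding avoids circular wrap-around
        size *= 2
    padded = list(signal) + [0.0] * (size - n)
    power = [abs(v) ** 2 for v in fft(padded)]
    return [v.real for v in ifft(power)[:n]]

tone = [math.sin(2 * math.pi * 200.0 * n / 8000) for n in range(128)]
direct = autocorrelation_direct(tone)
fast = autocorrelation_fft(tone)
largest_difference = max(abs(a - b) for a, b in zip(direct, fast))
```

Both routes give the same values up to rounding error, so the peak-picking step is unchanged; only the cost of computing the function differs.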

Time Domain: Average Magnitude Difference Function
– It is an alternative to the autocorrelation function.
– It computes the difference between the signal and a time-shifted version of itself.
– While the autocorrelation has peaks at maximum similarity, the average magnitude difference function has valleys.
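
A minimal sketch of the average magnitude difference function, assuming the common definition as the mean absolute difference between the signal and its shifted copy. The names `amdf` and `amdf_pitch`, the search range, and the slowly decaying test tone are hypothetical choices made for this example.

```python
import math

def amdf(signal, max_lag):
    """d[lag] = mean of |signal[n] - signal[n + lag]|."""
    n = len(signal)
    return [
        sum(abs(signal[i] - signal[i + lag]) for i in range(n - lag)) / (n - lag)
        for lag in range(max_lag + 1)
    ]

def amdf_pitch(signal, sample_rate, f_min, f_max):
    """Pick the deepest valley inside the allowed lag range."""
    min_lag = int(sample_rate / f_max)
    max_lag = int(sample_rate / f_min)
    d = amdf(signal, max_lag)
    best_lag = min(range(min_lag, max_lag + 1), key=lambda lag: d[lag])
    return sample_rate / best_lag

sample_rate = 8000
f0 = 200.0
tone = []
for n in range(800):
    t = 2 * math.pi * f0 * n / sample_rate
    envelope = math.exp(-n / 2000.0)  # slow decay, as in a plucked note
    tone.append(envelope * (math.sin(t) + 0.5 * math.sin(2 * t)
                            + 0.25 * math.sin(3 * t)))

estimate = amdf_pitch(tone, sample_rate, f_min=80.0, f_max=500.0)
```

Only subtractions and absolute values are needed, with no multiplications inside the sum, which is one reason this function is attractive as a cheaper substitute for the autocorrelation.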

Time Domain: Other Temporal Algorithms
– Waveform Maximum Detection
– Sum Magnitude Difference Squared Function
– Average Squared Difference Function
– Cumulative Mean Normalized Difference Function
– Circular Average Magnitude Difference Function
– Adaptive Filter

Time Domain: Other Temporal Algorithms
– Adaptive Filter
– Super Resolution Pitch Determination

Frequency Domain
– Harmonic Product Spectrum
– Cepstrum

Frequency Domain: Harmonic Product Spectrum
– The FFT is used to convert the temporal representation of the sound into its spectral representation.
– The method assumes that all signals are made of harmonic partials.
– The spectrum is compressed by integer factors corresponding to the harmonic numbers.
– Multiplying the compressed spectra with the original one amplifies the peak at the fundamental frequency.

Frequency Domain: Harmonic Product Spectrum
– The highest peak most likely corresponds to the fundamental frequency.
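
The compress-and-multiply steps can be sketched as below. This is an illustration under simplifying assumptions: the test tone has partials that fall exactly on FFT bins, no analysis window is applied, and three harmonics are multiplied. The function names, the `num_harmonics` parameter, and the small standard-library FFT helper are all hypothetical; the tone deliberately has a weak fundamental so that the strongest raw spectral peak is the second harmonic.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def harmonic_product_spectrum(signal, sample_rate, num_harmonics=3):
    """Multiply the magnitude spectrum by its compressed copies."""
    n = len(signal)
    magnitude = [abs(v) for v in fft(signal)[: n // 2]]
    limit = len(magnitude) // num_harmonics
    hps = []
    for k in range(limit):
        product = 1.0
        for h in range(1, num_harmonics + 1):
            product *= magnitude[k * h]  # bin k of the spectrum compressed by h
        hps.append(product)
    peak_bin = max(range(1, limit), key=lambda k: hps[k])  # skip DC
    return peak_bin * sample_rate / n

sample_rate = 8000
f0 = 250.0
amplitudes = [0.3, 1.0, 0.8, 0.6]  # weak fundamental, strong 2nd harmonic
tone = [
    sum(a * math.sin(2 * math.pi * (h + 1) * f0 * n / sample_rate)
        for h, a in enumerate(amplitudes))
    for n in range(1024)
]

spectrum = [abs(v) for v in fft(tone)[: len(tone) // 2]]
raw_peak_hz = (max(range(1, len(spectrum)), key=lambda k: spectrum[k])
               * sample_rate / len(tone))
estimate = harmonic_product_spectrum(tone, sample_rate)
```

Here the raw spectral maximum sits one octave above the fundamental, while the product of the compressed spectra peaks at the fundamental itself.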

Frequency Domain: Harmonic Product Spectrum
– Highly robust in noisy environments
– Less effective for sounds that are not made of harmonic components
– Computationally inexpensive
– Octave errors can occur

Frequency Domain: Cepstrum
– The cepstrum is defined as the inverse Fourier transform of the logarithm of the power spectrum of a signal.
– It extracts periodicity from the spectrum.
– Informally, it can be written as:
  $$c(\tau) = \mathcal{F}^{-1}\left\{ \log \left| \mathcal{F}\{ s(t) \} \right|^{2} \right\}$$
– The result has a peak at the value of $\tau$ that corresponds to the fundamental period.

Frequency Domain: Calculation of the Cepstrum for Voice
– In the source-filter model, voiced speech s(t) can be considered as the convolution of a pulse train p(t) with the impulse response of the vocal tract h(t).
– In the spectrum, the convolution becomes a product:
  $$S(f) = P(f)\,H(f)$$
– Taking the logarithm of the magnitude on both sides, we then obtain a sum:
  $$\log |S(f)| = \log |P(f)| + \log |H(f)|$$

Frequency Domain: Cepstrum
– The logarithm operation flattens the spectrum, which makes the method more robust to formants.
– However, this same operation raises the noise level.
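
A small sketch of cepstral pitch estimation on a source-filter style signal follows. Everything specific here is an assumption made for illustration: the pulse train is filtered by a one-pole decay standing in for the vocal tract, a Hann window is applied before the transform, a small floor is added to the power spectrum to avoid taking the logarithm of zero, and the FFT helper is written with the standard library only.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def ifft(spectrum):
    """Inverse FFT via the conjugation identity."""
    n = len(spectrum)
    return [v.conjugate() / n for v in fft([s.conjugate() for s in spectrum])]

def power_cepstrum(signal, floor=1e-12):
    """Inverse FFT of the log power spectrum; floor avoids log(0)."""
    log_power = [math.log(abs(v) ** 2 + floor) for v in fft(signal)]
    return [v.real for v in ifft(log_power)]

def cepstrum_pitch(signal, sample_rate, f_min, f_max):
    """Pick the highest cepstral peak inside the allowed period range."""
    n = len(signal)
    windowed = [s * 0.5 * (1.0 - math.cos(2 * math.pi * i / n))
                for i, s in enumerate(signal)]  # Hann window
    c = power_cepstrum(windowed)
    min_lag = int(sample_rate / f_max)
    max_lag = int(sample_rate / f_min)
    best = max(range(min_lag, max_lag + 1), key=lambda q: c[q])
    return sample_rate / best

sample_rate = 8000
period = 32  # samples between pulses, i.e. 250 Hz
voiced = []
state = 0.0
for n in range(2048):
    pulse = 1.0 if n % period == 0 else 0.0
    state = pulse + 0.9 * state  # one-pole filter applied to the pulse train
    voiced.append(state)
frame = voiced[-1024:]  # steady-state frame of 1024 samples

estimate = cepstrum_pitch(frame, sample_rate, f_min=80.0, f_max=500.0)
```

The smooth contribution of the filter ends up at short periods, below the search range, while the pulse train produces the peak at the fundamental period that the picker selects.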

Frequency Domain: Other Frequency Domain Algorithms
– Maximum Likelihood
– Linear Prediction Coding
– Spectral Autocorrelation

Alternative Technique: Teager Energy Function
– Referring again to the source-filter model, the voice can be represented by a pulse train filtered by the vocal tract.
– The pulse train is produced by the successive opening and closure of the glottis.
– The production of speech is closely related to the release of energy through the glottis.
– Each opening/closure of the glottis results in a peak of energy in the signal.

Alternative Technique: Teager Energy Function
– The Teager energy function is a non-linear operator that defines the instantaneous energy, in its standard discrete-time form, as:
  $$\Psi[x(n)] = x^{2}(n) - x(n-1)\,x(n+1)$$
– It is derived from the total energy of an oscillatory spring-mass system.
– Estimating the periodicity of the energy peaks of the signal leads to an approximation of the fundamental frequency.
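
The following sketch applies the standard discrete Teager energy operator, x(n)^2 - x(n-1) x(n+1), and estimates pitch from the spacing of its prominent peaks. The peak-picking rule (local maxima above half of the largest value), the function names, and the test signal are assumptions made for illustration; the signal is a damped resonance re-excited once per period, standing in for glottal pulses exciting the vocal tract.

```python
import math

def teager_energy(signal):
    """psi[n] = x[n]^2 - x[n-1] * x[n+1], defined for interior samples."""
    return [
        signal[n] ** 2 - signal[n - 1] * signal[n + 1]
        for n in range(1, len(signal) - 1)
    ]

def teager_pitch(signal, sample_rate, relative_threshold=0.5):
    """Average spacing of the prominent Teager-energy peaks gives f0."""
    psi = teager_energy(signal)
    threshold = relative_threshold * max(psi)
    peaks = [
        i for i in range(1, len(psi) - 1)
        if psi[i] >= threshold and psi[i - 1] < psi[i] >= psi[i + 1]
    ]
    if len(peaks) < 2:
        return None  # not enough energy peaks to measure a period
    period = (peaks[-1] - peaks[0]) / (len(peaks) - 1)
    return sample_rate / period

sample_rate = 8000
period = 40  # samples between excitations, i.e. 200 Hz
voiced = [
    (0.7 ** (n % period)) * math.cos(1.0 * (n % period))
    for n in range(400)
]
estimate = teager_pitch(voiced, sample_rate)

# On a steady sinusoid the operator is constant: A^2 * sin^2(omega).
steady = teager_energy([0.5 * math.cos(0.2 * n) for n in range(100)])
```

Between excitations the energy of the damped resonance decays steadily, so the only prominent peaks are the ones at the excitation instants.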

Alternative Technique: Miscellaneous Techniques
– Wavelet Transform
– Bayesian Statistical Model
– Hidden Markov Model
– Graphical Probabilistic Models
– Perceptual Pitch Detector

Conclusion
