Pitch Tracking. Jyh-Shing Roger Jang (張智星), MIR Lab (Multimedia Information Retrieval Lab)

Presentation transcript:

Pitch Tracking. Jyh-Shing Roger Jang (張智星), MIR Lab (Multimedia Information Retrieval Lab), Dept. of Computer Science, National Tsing Hua University (CS, NTHU). jang@mirlab.org, http://mirlab.org/jang

Pitch
Definition of pitch:
- Fundamental frequency (FF, in Hz): reciprocal of the fundamental period in a quasi-periodic waveform
- Pitch (in semitones): obtained from the fundamental frequency through a log-based transformation (to be detailed later)
Characteristics of pitch:
- Noise and unvoiced sounds do not have pitch.

Pitch Tracking
Pitch tracking: computing the pitch vector of a given waveform, i.e. the pitch over the whole recording.
Applications:
- Query by singing/humming
- Tone recognition for Mandarin
- Intonation scoring for English
- Prosody analysis for speech synthesis
- Pitch scaling and duration modification

Pitch Tracking Algorithms
Two categories of pitch tracking algorithms:
- Time domain
  - ACF (autocorrelation function)
  - AMDF (average magnitude difference function)
  - SIFT (simplified inverse filter tracking)
- Frequency domain
  - Harmonic product spectrum method
  - Cepstrum method

Typical Steps for Pitch Tracking
1. Chop the signal into frames (aka frame blocking).
2. Compute a pitch function (ACF, AMDF, etc.) for each frame.
3. Determine the pitch of each frame by max/min picking of the pitch function.
4. Remove unreliable pitch via volume/clarity thresholding.
5. Smooth the whole pitch vector via a median filter, etc.
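
The smoothing step can be illustrated with a small median filter. This is a minimal sketch, not the code behind the slides: the function name `median_smooth` and the window width are hypothetical, and the window is simply shortened at both ends of the vector.

```python
from statistics import median

def median_smooth(pitch, width=3):
    """Median-filter a pitch vector (window shrinks at the two ends)."""
    half = width // 2
    return [median(pitch[max(0, i - half): i + half + 1])
            for i in range(len(pitch))]

# A pitch vector (in semitones) with two isolated outliers: 72 and 49.
pitch = [60, 60, 72, 60, 61, 61, 49, 61, 62]
smoothed = median_smooth(pitch, width=3)
```

Isolated octave-like jumps are replaced by the value of their neighbours, while slow pitch changes pass through unchanged.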

Frame Blocking
- Frame size = 256 points
- Overlap = 84 points
- Frame rate = fs/(frameSize - overlap) = 11025/(256 - 84) ≈ 64 pitch values per second
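
The frame-rate arithmetic can be checked with a short frame-blocking sketch. The function name `frame_blocking` and the silent stand-in signal are assumptions made for illustration; only the frame size, overlap, and sample rate come from the slide.

```python
def frame_blocking(signal, frame_size=256, overlap=84):
    """Split a signal into overlapping frames; incomplete trailing samples are dropped."""
    hop = frame_size - overlap  # number of new samples per frame
    return [signal[start:start + frame_size]
            for start in range(0, len(signal) - frame_size + 1, hop)]

fs = 11025
signal = [0.0] * fs                 # one second of silence as a stand-in signal
frames = frame_blocking(signal)
frame_rate = fs / (256 - 84)        # about 64.1 frames (pitch values) per second
```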

ACF: Autocorrelation Function
Figure: a 128-point frame s(i) (indices 1 to 128) and its shifted copy s(i+t) for t = 30; the pitch period is 30.
acf(30) = inner product of the overlapping parts of s(i) and s(i+t).
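
A direct implementation of this inner-product definition might look as follows. It is a sketch under stated assumptions: lags are 0-based here (lag 0 is the unshifted frame), whereas the slides appear to use 1-based indices, and the 128-point sinusoid with period 30 is a synthetic stand-in for the frame in the figure.

```python
import math

def acf(frame):
    """acf[lag] = inner product of the frame with itself shifted by `lag` samples."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(n)]

period = 30
frame = [math.sin(2 * math.pi * i / period) for i in range(128)]
values = acf(frame)

# Search away from lag 0, where the ACF always has its global maximum.
peak_lag = max(range(15, 46), key=lambda lag: values[lag])
```

For this frame the peak in the searched range falls at the pitch period.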

ACF Example 1
- File: sunday.wav
- Sample rate = 16 kHz
- Frame size = 512 (starting from point 9000)
- Fundamental frequency: the maximum of the ACF (away from zero lag) occurs at index 132; with 1-based indexing the lag is 132 - 1 = 131 samples, so FF = 16000/(132 - 1) ≈ 122.137 Hz

ACF Example 2
If the range of human FF is [40, 1000] Hz, the search for the pitch point is restricted as follows:
- Min FF = 40 Hz → acf(fs/40:end) is not considered.
- Max FF = 1000 Hz → acf(1:fs/1000) is not considered.
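
The range restriction can be folded into the peak search. The sketch below assumes 0-based lags and rounds the lag limits down, so the boundaries differ by about one sample from the 1-based slices on the slide. The function name `acf_pitch` and the 125 Hz test tone are hypothetical.

```python
import math

def acf_pitch(frame, fs, min_ff=40.0, max_ff=1000.0):
    """Estimate FF (Hz) from the ACF peak, searching only lags allowed by [min_ff, max_ff]."""
    n = len(frame)
    min_lag = int(fs / max_ff)              # shorter lags would imply FF > max_ff
    max_lag = min(int(fs / min_ff), n - 1)  # longer lags would imply FF < min_ff

    def acf_at(lag):
        return sum(frame[i] * frame[i + lag] for i in range(n - lag))

    best_lag = max(range(min_lag, max_lag + 1), key=acf_at)
    return fs / best_lag

fs = 16000
frame = [math.sin(2 * math.pi * 125 * i / fs) for i in range(512)]  # 125 Hz tone
ff = acf_pitch(frame, fs)
```

Excluding the short lags also keeps the search away from the zero-lag maximum of the ACF.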

Pitch Tracking via ACF
Specs:
- Sample rate = 11025 Hz
- Frame size = 353 points ≈ 32 ms
- Overlap = 0
- Frame rate = 11025/353 ≈ 31.2 frames/sec
Playback: soo.wav, sooPitch.wav

Variations of ACF to Avoid Tapering
- Normalized version
- Half-frame shifting
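
The formulas for these two variations are not part of the transcript. The sketch below shows one plausible reading of each name and should be treated as an assumption: the normalized version divides each ACF value by the number of overlapping samples, and half-frame shifting shifts only the first half of the frame so that every lag uses the same number of terms. Both function names are hypothetical.

```python
def acf_normalized(frame, max_lag):
    """ACF divided by the overlap length, so longer lags are not penalized."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag)) / (n - lag)
            for lag in range(max_lag + 1)]

def acf_half_frame(frame):
    """ACF where only the first half of the frame is shifted: every lag sums n//2 terms."""
    half = len(frame) // 2
    return [sum(frame[i] * frame[i + lag] for i in range(half))
            for lag in range(half + 1)]

# A constant frame makes the effect visible: neither result decays with the lag.
flat = [1.0] * 64
normalized = acf_normalized(flat, 32)
half_shifted = acf_half_frame(flat)
```

With the plain ACF the same constant frame would give values falling linearly from 64 to 32 over these lags, which is the tapering both variations aim to avoid.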

Variations of ACF to Normalize Range
The ACF can be normalized to the range [-1, 1]; the normalization is based on an inequality.
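
The normalization formula and the inequality are missing from the transcript. One normalization that does produce values in [-1, 1] divides twice the inner product by the summed energies of the two overlapping parts; it relies on the elementary inequality |2xy| <= x^2 + y^2. Whether this is the formula on the original slide is an assumption, and the function name `acf_unit_range` is hypothetical.

```python
import math

def acf_unit_range(frame, max_lag):
    """2 * sum(x*y) / (sum(x^2) + sum(y^2)) over the overlapping parts x and y."""
    n = len(frame)
    out = []
    for lag in range(max_lag + 1):
        head, tail = frame[:n - lag], frame[lag:]
        cross = sum(x * y for x, y in zip(head, tail))
        energy = sum(x * x for x in head) + sum(y * y for y in tail)
        # |2xy| <= x^2 + y^2 holds term by term, so the ratio stays within [-1, 1].
        out.append(2.0 * cross / energy if energy > 0 else 0.0)
    return out

frame = [math.sin(2 * math.pi * i / 30) for i in range(128)]
values = acf_unit_range(frame, 60)
```

A value close to 1 at a nonzero lag indicates a strongly periodic frame, which makes this form convenient for thresholding.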

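AMDF: Average Magnitude Difference Function
Figure: a 128-point frame s(i) (indices 1 to 128) and its shifted copy s(i+t) for t = 30; the pitch period is 30.
amdf(30) = sum of the absolute differences over the overlapping parts of s(i) and s(i+t).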
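
The AMDF can be sketched in the same way as the ACF, replacing the inner product with a sum of absolute differences and looking for a minimum instead of a maximum. Lags are 0-based here, and the sinusoidal test frame is a synthetic stand-in for the frame in the figure.

```python
import math

def amdf(frame):
    """amdf[lag] = sum of |s(i) - s(i + lag)| over the overlapping part."""
    n = len(frame)
    return [sum(abs(frame[i] - frame[i + lag]) for i in range(n - lag))
            for lag in range(n)]

period = 30
frame = [math.sin(2 * math.pi * i / period) for i in range(128)]
values = amdf(frame)

# Search away from lag 0, where the AMDF is trivially zero.
valley_lag = min(range(15, 46), key=lambda lag: values[lag])
```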

AMDF Example
- File: sunday.wav
- Sample rate = 16 kHz
- Frame size = 512 (starting from point 9000)
- Fundamental frequency: the minimum of the AMDF (away from zero lag) occurs at index 132; with 1-based indexing the lag is 132 - 1 = 131 samples, so FF = 16000/(132 - 1) ≈ 122.137 Hz

Variations of AMDF to Avoid Tapering
- Normalized version
- Half-frame shifting

Combining ACF and AMDF
Figure panels: frame, ACF, AMDF, ACF/AMDF.
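
The last panel suggests dividing the ACF by the AMDF, so that a lag with a high ACF and a low AMDF stands out even more. The sketch below is one way to do that. The small constant `eps`, which guards against division by zero, and the function name `acf_over_amdf` are assumptions, not details from the slides.

```python
import math

def acf_over_amdf(frame, max_lag, eps=1e-6):
    """Return ACF/AMDF for lags 1..max_lag; element k corresponds to lag k + 1."""
    n = len(frame)
    ratio = []
    for lag in range(1, max_lag + 1):
        pairs = [(frame[i], frame[i + lag]) for i in range(n - lag)]
        acf_val = sum(x * y for x, y in pairs)
        amdf_val = sum(abs(x - y) for x, y in pairs)
        ratio.append(acf_val / (amdf_val + eps))
    return ratio

frame = [math.sin(2 * math.pi * i / 30) for i in range(128)]
ratio = acf_over_amdf(frame, 45)
best_lag = 1 + max(range(len(ratio)), key=lambda k: ratio[k])
```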

Example of Pitch Tracking

UPDUDP (1/4)
UPDUDP: Unbroken Pitch Determination Using DP (dynamic programming)
Goal: to take pitch smoothness into consideration.
The cost function is defined in terms of:
- A given path in the AMDF matrix
- The number of frames
- The transition penalty
- The exponent of the transition difference

UPDUDP (2/4)
Optimum-value function D(i, j): the minimum cost of a path from frame 1 to position (i, j).
The DP is specified by a recurrence formula, initial conditions, and the optimum cost.
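
The formulas themselves are not in the transcript, so the following is only a sketch of a DP consistent with the description. It assumes the path cost is the sum of the AMDF values along the path plus `theta * |j - k| ** m` for each transition between consecutive lags, with `theta` as the transition penalty and `m` as the exponent. These symbol names, the function name `updudp`, and the toy AMDF matrix are hypothetical.

```python
def updudp(amdf_matrix, theta=1.0, m=2):
    """Find the minimum-cost path through an AMDF matrix (rows = frames, columns = lags)."""
    n_frames, n_lags = len(amdf_matrix), len(amdf_matrix[0])
    D = [list(amdf_matrix[0])]          # initial condition: D(1, j) = amdf_1(j)
    back = [[0] * n_lags]
    for i in range(1, n_frames):
        row, ptr = [], []
        for j in range(n_lags):
            # Recurrence: D(i, j) = amdf_i(j) + min_k [D(i-1, k) + theta * |k - j|^m]
            k_best = min(range(n_lags),
                         key=lambda k: D[i - 1][k] + theta * abs(k - j) ** m)
            row.append(amdf_matrix[i][j] + D[i - 1][k_best]
                       + theta * abs(k_best - j) ** m)
            ptr.append(k_best)
        D.append(row)
        back.append(ptr)
    j = min(range(n_lags), key=lambda k: D[-1][k])   # optimum cost = min_j D(n, j)
    total, path = D[-1][j], [j]
    for i in range(n_frames - 1, 0, -1):             # backtrack to recover the path
        j = back[i][j]
        path.append(j)
    return path[::-1], total

# The middle frame has a spurious valley at column 4 that is slightly deeper than column 2.
toy = [[5, 5, 0, 5, 5],
       [5, 5, 0.5, 5, 0],
       [5, 5, 0, 5, 5]]
smooth_path, smooth_cost = updudp(toy, theta=1.0, m=2)
greedy_path, greedy_cost = updudp(toy, theta=0.0, m=2)
```

With a zero penalty the path follows the per-frame minima and jumps to the spurious valley; with a positive penalty it stays on the smooth track.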

UPDUDP (3/4) A typical example

UPDUDP (4/4) Insensitivity in

Harmonic Product Spectrum hps.m

Frequency to Semitone Conversion
- Semitone: a music scale based on A440
- Reasonable pitch range: E2 to C6, i.e. 82 Hz to 1047 Hz
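
The conversion formula is missing from the transcript. A common log-based mapping anchored at A440 is the MIDI note-number convention, in which 440 Hz maps to 69 and each doubling of frequency adds 12 semitones. The sketch below assumes that convention; the function name `freq_to_semitone` is hypothetical.

```python
import math

def freq_to_semitone(freq_hz):
    """Map a frequency in Hz to a semitone value (MIDI convention: A4 = 440 Hz -> 69)."""
    return 69 + 12 * math.log2(freq_hz / 440.0)

low = freq_to_semitone(82.41)     # E2
high = freq_to_semitone(1046.5)   # C6
```

Under this convention the pitch range quoted on the slide corresponds to semitone values of about 40 to 84.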

Unreliable Pitch Removal
Pitch removal via volume thresholding.
Plot produced by the self-demo of ptByPf.m.

Unreliable Pitch Removal
Pitch removal via volume/clarity thresholding.
Plot produced by the self-demo of ptByPf.m.
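
Thresholding of this kind can be sketched as below. This is not the logic of ptByPf.m, which is not shown in the transcript. The sketch assumes that volume is the sum of absolute values of a zero-mean frame, that clarity is a per-frame periodicity score in [0, 1] computed elsewhere, and that a removed pitch is marked with 0. The function names and threshold values are hypothetical.

```python
def frame_volume(frame):
    """Volume as the sum of absolute values after removing the frame mean."""
    mean = sum(frame) / len(frame)
    return sum(abs(x - mean) for x in frame)

def remove_unreliable(pitch, volume, clarity, volume_th, clarity_th):
    """Set pitch to 0 for frames that are too quiet or not periodic enough."""
    return [p if v >= volume_th and c >= clarity_th else 0
            for p, v, c in zip(pitch, volume, clarity)]

pitch = [60, 61, 62]
volume = [10.0, 1.0, 10.0]       # the second frame is too quiet
clarity = [0.9, 0.9, 0.2]        # the third frame is not periodic enough
cleaned = remove_unreliable(pitch, volume, clarity, volume_th=5.0, clarity_th=0.5)
```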

Rest Handling
Figure panels: with rests; without rests.

Rest Handling
- Original pitch vectors contain rests.
- Rests are replaced by the previous nonzero pitch: good for LS (linear scaling).
- Rests are removed: good for DTW (dynamic time warping).
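
Both rest-handling options are a few lines each. The sketch assumes that a rest is stored as a pitch value of 0 and that rests before the first note are left at 0 when filling; the function names are hypothetical.

```python
def fill_rests(pitch):
    """Replace each rest (0) by the previous nonzero pitch; leading rests stay 0."""
    filled, previous = [], 0
    for p in pitch:
        if p != 0:
            previous = p
        filled.append(previous)
    return filled

def drop_rests(pitch):
    """Remove all rests (0) from the pitch vector."""
    return [p for p in pitch if p != 0]

pitch = [0, 60, 0, 0, 62, 0]
filled = fill_rests(pitch)
dropped = drop_rests(pitch)
```

Filling keeps the vector length, and therefore the timing, intact, while dropping shortens the vector to the voiced frames only.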

Typical Result of Pitch Tracking
Pitch tracking via autocorrelation for "茉莉花" (Jasmine Flower)

Comparison of Pitch Vectors
Yellow line: target pitch vector

Demo of Pitch Tracking
- Real-time display of ACF for pitch tracking: toolbox/sap/goPtByAcf.mdl
- Real-time pitch tracking of microphone input: toolbox/sap/goPtByAcf2.mdl
- Pitch scaling: pitchShiftDemo/project1.exe, pitchShift-multirate/multirate.m
- Intonation assessment: ap170/matlab/goDemo.m