Query by Singing/Humming via Dynamic Programming


J.-S. Roger Jang (張智星)
jang@mirlab.org, http://mirlab.org/jang
MIR Lab, CSIE Dept., National Taiwan University

Introduction to Query by Singing/Humming
Query by singing/humming (QBSH; 哼唱選歌, "song selection by humming/singing")
Goal: identify a song by singing or humming part of it.
Demos:
  http://mirlab.org/demo/miracle (PC)
  http://www.midomi.com (PC)
  http://www.soundhound.com (mobile)
Approach to QBSH:
  Pitch tracking: convert the singing/humming into a pitch vector.
  Retrieval: compute the distance between the pitch vector and each song in the database.
Our homework: explore how dynamic programming (DP) can compute this distance in the retrieval part of QBSH.
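The retrieval step amounts to scoring every song against the query and sorting by distance. The sketch below assumes the database is a plain dict from a song name to its note vector and takes the distance function as a parameter; `rank_songs` and the toy `mean_diff` distance are hypothetical placeholders, not the course's code.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# A distance maps (pitch vector, note vector) to a non-negative score.
Distance = Callable[[Sequence[float], Sequence[float]], float]


def rank_songs(pitch_vector: Sequence[float],
               database: Dict[str, Sequence[float]],
               distance: Distance) -> List[Tuple[str, float]]:
    """Score every song against the query and sort from best to worst match."""
    scores = [(name, distance(pitch_vector, notes)) for name, notes in database.items()]
    # A smaller distance means a better match, so sort in ascending order.
    return sorted(scores, key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical placeholder distance: difference of the two means.
    # The DP alignment distance developed in this homework would replace it.
    def mean_diff(p: Sequence[float], q: Sequence[float]) -> float:
        return abs(sum(p) / len(p) - sum(q) / len(q))

    toy_db = {"song_a": [60, 62, 64], "song_b": [70, 72, 74]}
    print(rank_songs([61.0, 62.5, 63.0], toy_db, mean_diff))
```

Passing the distance in as a function keeps the ranking loop unchanged when the placeholder is swapped for a DP-based distance.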

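Examples of Pitch Vectors and Music Notes
MIDI numbers: used in MIDI files; also known as semitones (one unit is one semitone, and middle C is 60).
Music note vector: integer semitones.
  Example: MIDI file of 小星星 ("Little Star", i.e., Twinkle, Twinkle, Little Star)
  Note vector: [60 60 67 67 69 69 67 65 65 64 64 62 62 60]
Pitch vector: real-valued semitones.
  Examples of singing clips, each with its pitch vector: 小星星 ("Little Star") and 在那遙遠的地方 ("In That Faraway Place").
  Pitch rate = 31.25 pitches/second (one pitch every 32 ms).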
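A pitch tracker usually reports frequencies in Hz; the real-valued semitones of a pitch vector can be obtained with the standard MIDI mapping, in which A4 = 440 Hz is semitone 69. This sketch assumes that mapping; the function names are illustrative, and the frame period only restates the slide's pitch rate.

```python
import math

PITCH_RATE = 31.25               # pitches per second, as stated on the slide
FRAME_PERIOD = 1 / PITCH_RATE    # seconds between consecutive pitch values


def freq_to_semitone(freq_hz: float) -> float:
    """Frequency in Hz to a real-valued MIDI semitone (440 Hz maps to 69)."""
    return 69 + 12 * math.log2(freq_hz / 440.0)


def semitone_to_freq(semitone: float) -> float:
    """Inverse mapping: real-valued MIDI semitone back to Hz."""
    return 440.0 * 2 ** ((semitone - 69) / 12)


if __name__ == "__main__":
    print(freq_to_semitone(440.0))              # 69.0
    print(round(freq_to_semitone(261.63), 2))   # middle C, close to 60
    print(FRAME_PERIOD)
```

Because the mapping is logarithmic, each doubling of frequency adds 12 semitones, which is why pitch and note vectors can be compared with plain differences.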

Our Task: Optimal Alignment
How do we find the distance between a pitch vector p(i), i = 1, ..., m, and a note vector q(j), j = 1, ..., n? We need the optimal alignment, which DP can find.
Example alignment path (12 pitches, 4 notes): (1,1), (2,1), (3,2), (4,2), (5,2), (6,2), (7,3), (8,3), (9,4), (10,4), (11,4), (12,4)
Distance: $\sum_{(i,j)\in\text{path}} |p(i)-q(j)|$, the sum of absolute pitch differences along the path.
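The distance of a single, fixed alignment path can be checked by hand or with a few lines of code. This sketch assumes the distance is the sum of |p(i) - q(j)| over the path, and that a valid path visits every p(i) once in order, starts at q(1), and moves the note index by 0 or 1 per step (constraints read off the example path). `path_distance` is a hypothetical helper; the pitch values in the usage are made up.

```python
from typing import List, Sequence, Tuple


def path_distance(p: Sequence[float], q: Sequence[float],
                  path: List[Tuple[int, int]]) -> float:
    """Sum of |p(i) - q(j)| over an alignment path of 1-based (i, j) pairs."""
    # Each pitch p(i) is aligned to exactly one note, in order (i = 1..m).
    if not path or [i for i, _ in path] != list(range(1, len(p) + 1)):
        raise ValueError("every p(i) must appear exactly once, in order")
    # Anchored at the beginning: p(1) is aligned to q(1).
    if path[0][1] != 1:
        raise ValueError("the path must start at q(1)")
    # The note index stays the same or advances by one, and never passes q(n).
    for (_, j_prev), (_, j) in zip(path, path[1:]):
        if j - j_prev not in (0, 1) or j > len(q):
            raise ValueError("invalid step in the note index")
    return sum(abs(p[i - 1] - q[j - 1]) for i, j in path)


if __name__ == "__main__":
    # Four pitches aligned to two notes: 1 + 1 + 1 + 0
    print(path_distance([61, 59, 63, 62], [60, 62], [(1, 1), (2, 1), (3, 2), (4, 2)]))
```

DP's job is to find, among all such valid paths, the one with the smallest value of this sum without enumerating them.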

Three-step Formula of DP for Alignment
Optimum-value function: D(i,j) is the minimum distance between p(1:i) and q(1:j), given that p(i) is assigned to q(j).
Recurrent equation: $D(i,j) = |p(i)-q(j)| + \min\{D(i-1,j),\ D(i-1,j-1)\}$
Answer: $\min_{1 \le j \le n} D(m,j)$
Assumptions:
  Anchored at the beginning: p(1) is assigned to q(1), so $D(1,1) = |p(1)-q(1)|$.
  No rests in p: p contains no zeros.
  No key transposition is needed for p.
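A direct transcription of the three-step formula follows. It assumes the recurrence D(i,j) = |p(i)-q(j)| + min(D(i-1,j), D(i-1,j-1)), the anchor D(1,1) = |p(1)-q(1)|, and the answer min over j of D(m,j). `dp_distance` is a hypothetical name. Indices are 0-based in the code, so `D[i][j]` stands for D(i+1, j+1).

```python
import math
from typing import List, Sequence


def dp_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Minimum alignment distance between pitch vector p and note vector q.

    D(i,j) = |p(i) - q(j)| + min(D(i-1,j), D(i-1,j-1)), anchored at
    D(1,1) = |p(1) - q(1)|; the answer is min over j of D(m,j).
    """
    m, n = len(p), len(q)
    D: List[List[float]] = [[math.inf] * n for _ in range(m)]
    D[0][0] = abs(p[0] - q[0])  # anchored: p(1) is assigned to q(1)
    # Row 0 keeps inf for j > 0: p(1) cannot start on a later note.
    for i in range(1, m):
        for j in range(n):
            stay = D[i - 1][j]  # p(i) on the same note as p(i-1)
            advance = D[i - 1][j - 1] if j > 0 else math.inf  # the first note has no predecessor
            D[i][j] = abs(p[i] - q[j]) + min(stay, advance)
    # The last pitch need not land on the last note, so take the best of the last row.
    return min(D[m - 1])


if __name__ == "__main__":
    # Values from the walk-through table, reading q(1) from its bottom row.
    p = [21, 25, 12, 14, 28, 26, 21, 31, 21, 26, 23]
    q = [20, 10, 30, 20, 40]
    print(dp_distance(p, q))  # 38
```

The table has m x n cells and each is filled in constant time, so the cost per song is O(mn).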

Walk-through Example
Pitch vector p = [21 25 12 14 28 26 21 31 21 26 23] (m = 11); note vector q = [20 10 30 20 40] (n = 5), with q(1) in the bottom row of the table.
Table of |p(i)-q(j)|:

q \ p   21  25  12  14  28  26  21  31  21  26  23
  40    19  15  28  26  12  14  19   9  19  14  17
  20     1   5   8   6   8   6   1  11   1   6   3
  30     9   5  18  16   2   4   9   1   9   4   7
  10    11  15   2   4  18  16  11  21  11  16  13
  20     1   5   8   6   8   6   1  11   1   6   3
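To go from the minimum distance to the alignment path itself, the DP can store which predecessor each cell came from and trace back from the best cell of the last row. This sketch assumes the same recurrence and free-end answer as the three-step formula and reuses the walk-through numbers; `distance_table` and `dp_align` are hypothetical names.

```python
import math
from typing import List, Sequence, Tuple


def distance_table(p: Sequence[float], q: Sequence[float]) -> List[List[float]]:
    """Rows of |p(i) - q(j)|, one row per note q(j), as in the walk-through table."""
    return [[abs(pi - qj) for pi in p] for qj in q]


def dp_align(p: Sequence[float], q: Sequence[float]) -> Tuple[float, List[Tuple[int, int]]]:
    """Return the minimum distance and one optimal path of 1-based (i, j) pairs."""
    m, n = len(p), len(q)
    D = [[math.inf] * n for _ in range(m)]
    back = [[0] * n for _ in range(m)]  # note index (0-based) of each cell's predecessor
    D[0][0] = abs(p[0] - q[0])
    for i in range(1, m):
        for j in range(n):
            candidates = [(D[i - 1][j], j)]  # stay on the same note
            if j > 0:
                candidates.append((D[i - 1][j - 1], j - 1))  # advance from the previous note
            best, best_j = min(candidates)  # on a tie, the smaller note index wins
            D[i][j] = abs(p[i] - q[j]) + best
            back[i][j] = best_j
    # Free end: start from the best cell in the last row and walk back to (1, 1).
    j = min(range(n), key=lambda k: D[m - 1][k])
    dist = D[m - 1][j]
    path = []
    for i in range(m - 1, -1, -1):
        path.append((i + 1, j + 1))
        if i > 0:
            j = back[i][j]
    path.reverse()
    return dist, path


if __name__ == "__main__":
    p = [21, 25, 12, 14, 28, 26, 21, 31, 21, 26, 23]
    q = [20, 10, 30, 20, 40]
    for qj, row in zip(q, distance_table(p, q)):
        print(qj, row)
    print(dp_align(p, q))
```

On this data the optimal path ends on q(4) rather than q(5), and summing the table entries along the returned path reproduces the minimum distance.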

Hints and Caveats
Useful hints for implementation:
  Handle the recurrent equation carefully when i = 1 or j = 1.
  Pad an extra layer with D(i,j) = inf to simplify the code.
Caveats:
  The optimum path may not be unique, but the minimum distance is.
  The last element of the pitch vector does not have to be assigned to the last music note.
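The padding hint can be sketched by adding an extra row and column (index 0) filled with inf, so one recurrence covers every cell. One assumption beyond the slide: the corner D[0][0] is set to 0 rather than inf, which makes the general recurrence produce D(1,1) = |p(1)-q(1)| and inf elsewhere in row 1, enforcing the anchor. The usage also illustrates both caveats; `dp_table_padded` is a hypothetical name.

```python
import math
from typing import List, Sequence


def dp_table_padded(p: Sequence[float], q: Sequence[float]) -> List[List[float]]:
    """DP table with a padding row/column of inf at index 0.

    Assumption: D[0][0] = 0, so D[1][1] = |p(1) - q(1)| and every other cell
    of row 1 stays inf, with no special case for i = 1 or j = 1.
    """
    m, n = len(p), len(q)
    D = [[math.inf] * (n + 1) for _ in range(m + 1)]
    D[0][0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            D[i][j] = abs(p[i - 1] - q[j - 1]) + min(D[i - 1][j], D[i - 1][j - 1])
    return D


if __name__ == "__main__":
    # Non-unique optimum: D(3,2) is reached from D(2,2) or D(2,1) at equal cost,
    # so two different paths share the same minimum distance.
    D = dp_table_padded([60, 62, 64], [60, 64])
    print(D[2][1], D[2][2], min(D[3][1:]))
    # Free end: the best cell of the last row can be smaller than D(m, n).
    D = dp_table_padded([21, 25, 12, 14, 28, 26, 21, 31, 21, 26, 23], [20, 10, 30, 20, 40])
    print(min(D[-1][1:]), D[-1][-1])
```

With the walk-through data the last row's minimum (38) is well below D(m,n) (52), which is why the answer takes the minimum over j instead of reading D(m,n).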

Example
Singing clip of 小星星 ("Little Star") vs. MIDI file of 三輪車 ("Tricycle"), a different song
[Figures: alignment path; pitch vectors]

Example
Singing clip of 小星星 ("Little Star") vs. MIDI file of 小星星
[Figures: alignment path; pitch vectors]

Retrieval Result
The song with the minimum distance is returned as the answer.
Other considerations: key transposition, anchor point, music note duration.
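Key transposition is listed as an open consideration. One simple way to relax the "no key transposition" assumption, shown here as an assumption rather than the course's method, is to shift p by whole semitones (here -12 to +12), keep the smallest DP distance, and return the song whose best distance is minimal. The function names, the shift range, and the second note vector are made up for illustration; the first note vector is the opening of the slide's 小星星 example.

```python
import math
from typing import Dict, Iterable, Sequence, Tuple


def dp_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Anchored DP distance with a free end, keeping only one rolling row of D."""
    row = [math.inf] * len(q)
    row[0] = abs(p[0] - q[0])  # p(1) is assigned to q(1)
    for x in p[1:]:
        # The new row is built entirely from the previous row before rebinding.
        row = [abs(x - qj) + min(row[j], row[j - 1] if j > 0 else math.inf)
               for j, qj in enumerate(q)]
    return min(row)  # the last pitch may end on any note


def transposed_distance(p: Sequence[float], q: Sequence[float],
                        shifts: Iterable[int] = range(-12, 13)) -> Tuple[float, int]:
    """Hypothetical key transposition: try whole-semitone shifts of p, keep the best."""
    best, best_shift = math.inf, 0
    for s in shifts:
        d = dp_distance([x + s for x in p], q)
        if d < best:
            best, best_shift = d, s
    return best, best_shift


def retrieve(p: Sequence[float], database: Dict[str, Sequence[float]]) -> str:
    """Return the name of the song with the minimum transposed distance."""
    return min(database, key=lambda name: transposed_distance(p, database[name])[0])


if __name__ == "__main__":
    songs = {
        "little_star": [60, 60, 67, 67, 69, 69, 67],
        "other": [60, 64, 67, 72, 67, 64, 60],  # made-up second song
    }
    hummed = [55, 55, 55, 62, 62, 64, 64, 62]  # sung five semitones lower
    print(transposed_distance(hummed, songs["little_star"]))
    print(retrieve(hummed, songs))
```

Trying 25 shifts multiplies the cost per song by 25; subtracting the means of p and q would be a cheaper alternative, at the risk of a less exact key match.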