Retrieval Methods for QBSH (Query By Singing/Humming)
J.-S. Roger Jang (張智星), Multimedia Information Retrieval


1 Retrieval Methods for QBSH (Query By Singing/Humming)
J.-S. Roger Jang (張智星)
jang@mirlab.org
http://mirlab.org/jang
Multimedia Information Retrieval Lab, CSIE Dept., National Taiwan University

2 Retrieval Methods for QBSH
- Goal
  - Find the most similar melody in the database
- Challenges
  - Robust pitch tracking for various acoustic inputs
    - Input from mobile devices
    - Input at a noisy karaoke box
  - Comparison methods should be able to deal with…
    - Key variations in users' input (for instance, due to gender difference)
    - Tempo variations in users' input
    - Reasonable response time, e.g., 5 seconds

3 Evaluation of QBSH Methods
- Two categories for evaluating QBSH methods
  - Efficiency: How fast is the system?
    - Can it deal with a music database of size 100K?
  - Effectiveness: How accurate is the system?
    - Top-10 recognition rate for n queries: (1+0+0+1+1…)/n
    - Top-10 mean reciprocal rank for n queries: (1/3+1/inf+1/4+1/2+1/5…)/n
    - True positive and true negative to deal with the out-of-vocabulary (OOV) problem
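The two effectiveness measures above can be sketched in a few lines. This is a minimal illustration, assuming each query is summarized by the rank of its correct song (with `math.inf` when the song is not retrieved at all); the function names and the sample ranks are hypothetical, not taken from the slides.

```python
import math

def top_k_recognition_rate(ranks, k=10):
    """Fraction of queries whose correct song appears within the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)

def top_k_mean_reciprocal_rank(ranks, k=10):
    """Mean of 1/rank, where a rank outside the top k contributes 0."""
    return sum(1.0 / r if r <= k else 0.0 for r in ranks) / len(ranks)

# Hypothetical ranks of the correct song for five queries.
ranks = [3, math.inf, 4, 2, 5]
rate = top_k_recognition_rate(ranks)      # 0.8 (four of five within the top 10)
mrr = top_k_mean_reciprocal_rank(ranks)   # (1/3 + 0 + 1/4 + 1/2 + 1/5) / 5, about 0.257
```

A missed query counts as 0 in both measures, which matches the 1/inf term in the slide's example.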

4 Types of QBSH Approaches
- Categories of approaches to QBSH
  - Histogram/statistics-based
  - Note vs. note
    - Edit distance
  - Frame vs. note
    - HMM
  - Frame vs. frame
    - Linear scaling, DTW, recursive alignment

5 Range Comparison
- Concept
  - Reject a song if the range does not match:
- Characteristics
  - Extremely fast
  - Not effective
  - Good for initial filtering
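The exact rejection condition is not preserved in this transcript, so the following is only one plausible reading: compare the pitch range (highest minus lowest voiced pitch, in semitones) of the query with that of a candidate segment, and reject when they differ by more than a tolerance. The function names and the default `tolerance` are assumptions made for illustration.

```python
def pitch_range(pitch):
    """Range in semitones over voiced frames (0 marks a rest/silence)."""
    voiced = [p for p in pitch if p > 0]
    return max(voiced) - min(voiced)

def passes_range_filter(query, candidate, tolerance=4.0):
    """Keep the candidate only if its pitch range is close to the query's.

    `tolerance` is a hypothetical threshold, not a value from the slides.
    """
    return abs(pitch_range(query) - pitch_range(candidate)) <= tolerance

query = [0, 60, 62, 64, 0]             # range 4
near = [55, 57, 59, 62]                # range 7  -> kept
far = [50, 58, 70]                     # range 20 -> rejected
kept = [c for c in (near, far) if passes_range_filter(query, c)]
```

Because the check needs only a max and a min per vector, it costs almost nothing, which fits its role as an initial filter before a more accurate comparison.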

6 Linear Scaling (LS)
- Concept
  - Scale the query linearly to match the candidates
- Assumption
  - Uniform tempo variation
- Rest handling
  - Cut leading and trailing zeros (silence)
  - All the other zeros (rests) are replaced with the previous non-zero pitch
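The rest-handling rule above can be sketched as follows, assuming the pitch vector is a list of per-frame values in which 0 marks silence or a rest. The function name `handle_rests` is hypothetical.

```python
def handle_rests(pitch):
    """Trim leading/trailing zeros, then fill inner zeros with the previous pitch."""
    start, end = 0, len(pitch)
    while start < end and pitch[start] == 0:
        start += 1
    while end > start and pitch[end - 1] == 0:
        end -= 1
    cleaned = []
    for p in pitch[start:end]:
        # After trimming, the first frame is non-zero, so cleaned[-1] always exists here.
        cleaned.append(p if p != 0 else cleaned[-1])
    return cleaned

cleaned = handle_rests([0, 0, 60, 60, 0, 0, 62, 0, 64, 0])
# cleaned == [60, 60, 60, 60, 62, 62, 64]
```

An all-silent input yields an empty list, which a caller would need to treat as an invalid query.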

7 Linear Scaling
- Scale the query pitch linearly to match the candidates
[Figure: original input pitch; versions stretched by 1.25 and 1.5 and compressed by 0.75 and 0.5; target pitch in database; best match]
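A minimal sketch of the scaling search follows. It tries the scaling factors named in the figure (plus 1.0), resamples the query by linear interpolation, and keeps the factor with the smallest distance. Several choices here are assumptions for illustration rather than details from the slides: the match is anchored at the start of the target, key transposition is handled by a mean shift, and the distance is the mean absolute difference. All function names are hypothetical.

```python
def resample(x, n):
    """Linearly interpolate the sequence x to length n."""
    m = len(x)
    if m == 1 or n == 1:
        return [float(x[0])] * n
    out = []
    for i in range(n):
        pos = i * (m - 1) / (n - 1)      # position in the original index space
        lo = min(int(pos), m - 2)
        frac = pos - lo
        out.append(x[lo] * (1 - frac) + x[lo + 1] * frac)
    return out

def linear_scaling(query, target, factors=(0.5, 0.75, 1.0, 1.25, 1.5)):
    """Return (best distance, best factor) over the candidate scaling factors."""
    best = (float("inf"), None)
    for f in factors:
        n = round(len(query) * f)
        if n < 1 or n > len(target):
            continue                      # scaled query would not fit the target
        scaled = resample(query, n)
        segment = target[:n]              # assumption: match anchored at the start
        shift = sum(t - s for s, t in zip(scaled, segment)) / n   # mean key shift
        dist = sum(abs(s + shift - t) for s, t in zip(scaled, segment)) / n
        if dist < best[0]:
            best = (dist, f)
    return best
```

For example, a query that is a faster, transposed rendition of the first 12 target frames is best matched by stretching it by 1.5:

```python
target = [60 + i for i in range(20)]
query = [v + 5 for v in resample(target[:12], 8)]
dist, factor = linear_scaling(query, target)   # factor == 1.5, dist close to 0
```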

8 Strength and Weakness of LS
- Strength
  - One-shot for dealing with key transposition
  - Efficient and effective
  - Indexing methods available
- Weakness
  - Cannot deal with non-uniform tempo variations
- Typical mapping path

9 Shorten or Lengthen a Pitch Vector
- Given a pitch vector x of length m, how to shorten or lengthen it to length n?
  - x2 = interp1(1:m, x, linspace(1, m, n));
  - Examples
    - m=7, n=13
    - m=7, n=9
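For readers without MATLAB, the `interp1` call above can be approximated in plain Python. This is a sketch of the same linear interpolation over `n` evenly spaced positions between the first and last sample; the function name `resample` and the sample pitch values are hypothetical.

```python
def resample(x, n):
    """Linearly interpolate x (length m) to length n, endpoints preserved."""
    m = len(x)
    if m == 1 or n == 1:
        return [float(x[0])] * n
    out = []
    for i in range(n):
        pos = i * (m - 1) / (n - 1)   # 0-based analogue of linspace(1, m, n)
        lo = min(int(pos), m - 2)     # left neighbour; clamp so lo + 1 stays valid
        frac = pos - lo
        out.append(x[lo] * (1 - frac) + x[lo + 1] * frac)
    return out

x = [60, 62, 64, 65, 67, 69, 71]      # m = 7
longer = resample(x, 13)              # n = 13: midpoints are inserted between samples
# longer == [60, 61, 62, 63, 64, 64.5, 65, 66, 67, 68, 69, 70, 71]
shorter = resample(x, 9)              # n = 9: samples at a step of 0.75
```

With m=7 and n=13 the step between sample positions is exactly 0.5, so every second output value is an original sample and the rest are midpoints.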

10 Distance Function for LS
- Commonly used distance function for LS
  - Normalized Lp-norm
- Characteristics
  - Usually p=1 or 2 for LS
  - Normalization to get rid of length variations
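The formula itself is not preserved in this transcript. One common way to normalize an Lp-norm for length is to divide the sum of p-th powers by the vector length before taking the p-th root; the sketch below assumes that form, and the function name is hypothetical.

```python
def normalized_lp_distance(x, y, p=1):
    """Length-normalized Lp distance between two equal-length pitch vectors."""
    if len(x) != len(y) or not x:
        raise ValueError("vectors must be non-empty and of equal length")
    n = len(x)
    return (sum(abs(a - b) ** p for a, b in zip(x, y)) / n) ** (1.0 / p)

x = [60, 62, 64]
y = [61, 62, 66]
d1 = normalized_lp_distance(x, y, p=1)   # (1 + 0 + 2) / 3 = 1.0
d2 = normalized_lp_distance(x, y, p=2)   # sqrt((1 + 0 + 4) / 3), about 1.291
```

Under this normalization, repeating both vectors leaves the distance unchanged, so scaled queries of different lengths can be compared on the same footing.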

11 Key Transposition in LS
- How to find the best transposed query that has the smallest distance from the database items:
  - Best transposition
  - In practice…
[Figure labels: query, database item, transposed query]
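The slide's formulas are not preserved in this transcript. As a hedged sketch, the code below relies on two standard results: the constant shift that minimizes the L1 distance is the median of the per-frame differences, and the shift that minimizes the L2 distance is their mean. Whether this matches the slide's "in practice" recipe is not known; the function names are hypothetical.

```python
import statistics

def best_transposition(query, item, p=1):
    """Shift to add to the query so that its Lp distance to the item is smallest."""
    diffs = [d - q for q, d in zip(query, item)]
    if p == 1:
        return statistics.median(diffs)   # minimizes the sum of absolute differences
    if p == 2:
        return statistics.mean(diffs)     # minimizes the sum of squared differences
    raise ValueError("only p=1 and p=2 are handled in this sketch")

def transpose(query, shift):
    """Apply a constant key shift (in semitones) to every frame."""
    return [q + shift for q in query]

query = [60, 62, 64, 65]
item = [65, 67, 69, 74]                        # same tune, 5 semitones up, one outlier frame
shift_l1 = best_transposition(query, item, 1)  # 5.0 (the median ignores the outlier)
shift_l2 = best_transposition(query, item, 2)  # 6   (the mean is pulled by the outlier)
```

The example shows why the choice of p matters: a single badly tracked frame moves the L2 shift by a full semitone but leaves the L1 shift untouched.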

12 Example of Linear Scaling via L1 Norm
- linScaling01.m

13 Linear Scaling via L1 and L2 Norm
- linScaling02.m

14 DTW (Dynamic Time Warping)
- About DTW
  - DTW introduction
  - DTW for QBSH
- #1 method for task 2 in QBSH/MIREX 2006
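For orientation, here is the textbook form of DTW as a short sketch. It uses the basic step pattern (insert, delete, match) with absolute pitch difference as the local cost, and it requires the whole query to align with the whole target. The variants covered in the linked material may use different local path constraints, endpoint conditions, or key-transposition handling; none of those are shown here, and the function name is hypothetical.

```python
def dtw_distance(query, target):
    """Accumulated cost of the best warping path between two pitch vectors."""
    m, n = len(query), len(target)
    inf = float("inf")
    # cost[i][j]: best accumulated cost aligning query[:i] with target[:j]
    cost = [[inf] * (n + 1) for _ in range(m + 1)]
    cost[0][0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            local = abs(query[i - 1] - target[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # query frame repeated in time
                                     cost[i][j - 1],      # target frame repeated in time
                                     cost[i - 1][j - 1])  # both advance
    return cost[m][n]

# The same three notes held for uneven durations: a non-uniform tempo change.
d = dtw_distance([60, 62, 64], [60, 60, 62, 62, 62, 64])   # 0.0
```

Unlike linear scaling, the warping path can stretch each note by a different amount, which is why the uneven example above still aligns at zero cost.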

15 RA (Recursive Alignment)
- Characteristics
  - Combine characteristics of LS & DTW
  - #1 method for task 1 in QBSH/MIREX 2006
- A typical mapping path

16 Modified Edit Distance
- Note segmentation
- Modified edit distance

