Music Information Retrieval: Overview and Challenges


1 Music Information Retrieval: Overview and Challenges
J.-S. Roger Jang (張智星) Multimedia Information Retrieval (MIR) Lab CSIE Dept, National Taiwan Univ. 2017/4/25

2 Outline
Music Information Retrieval (MIR)
  Intro to MIR
  Intro to ISMIR & MIREX
Two classical paradigms of MIR
  QBSH (query by singing/humming)
  AFP (audio fingerprinting)
Conclusions

3 Introduction to QBSH
QBSH: Query by Singing/Humming
  Input: Singing or humming from a microphone
  Output: A ranked list retrieved from the song database according to similarity to the query
Progression
  First paper: Around 1994
  Extensive studies since 2001
  State of the art: QBSH tasks at ISMIR/MIREX, since 2006

4 Two Steps in QBSH
Pitch tracking: To detect the period of a waveform
  Time domain
    ACF (Autocorrelation function)
    NSDF (Normalized squared difference function)
    AMDF (Average magnitude difference function)
  Frequency domain
    Harmonic product spectrum
    Cepstrum
Database comparison: To find similarity between query and database songs
  Linear scaling
  Dynamic time warping
  Recursive alignment
  Hybrid methods
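As an illustration of one of the comparison methods listed above, here is a minimal sketch of dynamic time warping (DTW) between two pitch vectors given in semitones. It is the textbook form of DTW, not necessarily the variant used in the lab's system; the function name `dtw_distance` and the absolute-difference local cost are assumptions made for this example.

```python
def dtw_distance(query, target):
    """Textbook DTW: minimum accumulated |pitch difference| over all
    monotonic alignments of the two sequences (hypothetical helper)."""
    inf = float("inf")
    n, m = len(query), len(target)
    # cost[i][j] = best accumulated cost aligning query[:i] with target[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(query[i - 1] - target[j - 1])
            # Extend the cheapest of the three allowed predecessor cells
            cost[i][j] = d + min(cost[i - 1][j],      # query note repeated
                                 cost[i][j - 1],      # target note repeated
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


# A query sung twice as fast as the target still aligns with zero cost.
fast_query = [60, 62, 64]
slow_target = [60, 60, 62, 62, 64, 64]
distance = dtw_distance(fast_query, slow_target)
```

Unlike linear scaling, DTW lets the tempo vary inside the query, at the price of a cost table whose size is the product of the two lengths.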

5 Frame Blocking for Pitch Tracking
Sample rate = 16 kHz
Frame size = 512 samples
Frame duration = 512/16000 = 0.032 s = 32 ms
Overlap = 192 samples
Hop size = frame size - overlap = 512 - 192 = 320 samples
Frame rate = 16000/320 = 50 frames/sec = Pitch rate
(Figure labels: Overlap, Zoom in, Frame)
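The arithmetic on this slide can be checked with a short sketch. The helper name `frame_blocking` is hypothetical, and dropping the incomplete frame at the end of the signal is an assumption of this example, not something the slide states.

```python
def frame_blocking(signal, frame_size=512, overlap=192):
    """Split samples into overlapping frames; an incomplete tail is dropped."""
    hop = frame_size - overlap
    return [signal[i:i + frame_size]
            for i in range(0, len(signal) - frame_size + 1, hop)]


sample_rate = 16000                                   # Hz
frame_size = 512                                      # samples
overlap = 192                                         # samples
hop_size = frame_size - overlap                       # 320 samples
frame_duration_ms = 1000 * frame_size / sample_rate   # 32.0 ms
frame_rate = sample_rate / hop_size                   # 50.0 frames/sec

# One second of dummy samples (0, 1, 2, ...) just to show the framing.
frames = frame_blocking(list(range(sample_rate)))
```

One second of audio yields 49 complete frames here rather than 50, because the last hop positions do not leave room for a full 512-sample frame.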

6 ACF: Auto-correlation Function
(Figure: a frame plotted over sample indices 1 to 128)
Original frame: s(t)
Shifted frame: s(t-τ), with τ = 30
acf(30) = inner product of the overlapping part
Pitch period: the lag τ at which acf(τ) peaks
To be safe, the frame size needs to cover at least two fundamental periods!
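A minimal sketch of ACF-based pitch detection for a single frame follows. The function names and the search range of 82 Hz to 1047 Hz are assumptions for this example; a practical tracker would add steps such as volume thresholding and smoothing that are not shown here.

```python
import math


def acf(frame, max_lag):
    """acf[tau] = inner product of the frame and its copy shifted by tau,
    taken over the overlapping part only."""
    n = len(frame)
    return [sum(frame[t] * frame[t - tau] for t in range(tau, n))
            for tau in range(max_lag + 1)]


def acf_pitch(frame, sample_rate, min_hz=82.0, max_hz=1047.0):
    """Return the pitch (Hz) whose period maximizes the ACF.
    min_hz and max_hz are assumed search bounds for this sketch."""
    min_lag = int(sample_rate / max_hz)   # shortest period considered
    max_lag = int(sample_rate / min_hz)   # longest period considered
    values = acf(frame, max_lag)
    best_lag = max(range(min_lag, max_lag + 1), key=lambda tau: values[tau])
    return sample_rate / best_lag


# Synthetic test frame: a 200 Hz sine, 512 samples at 16 kHz (period = 80 samples).
sample_rate = 16000
frame = [math.sin(2 * math.pi * 200 * t / sample_rate) for t in range(512)]
pitch_hz = acf_pitch(frame, sample_rate)
```

Because the inner product only covers the overlapping part, the ACF tapers as the lag grows, which is one reason the frame should span at least two fundamental periods.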

7 Frequency to Semitone Conversion
Semitone: A music scale based on A440
Reasonable pitch range: E2 - C6 (about 82 Hz - 1047 Hz)
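The conversion formula itself did not survive in this transcript. The sketch below assumes the common MIDI note-number convention, in which A440 maps to semitone 69 and each octave spans 12 semitones; the function names are hypothetical.

```python
import math


def freq_to_semitone(freq_hz):
    """Assumed MIDI convention: semitone = 69 + 12 * log2(freq / 440)."""
    return 69 + 12 * math.log2(freq_hz / 440.0)


def semitone_to_freq(semitone):
    """Inverse mapping: freq = 440 * 2 ** ((semitone - 69) / 12)."""
    return 440.0 * 2 ** ((semitone - 69) / 12)


a4 = freq_to_semitone(440.0)    # A440 itself
e2_hz = semitone_to_freq(40)    # E2 under this convention
c6_hz = semitone_to_freq(84)    # C6 under this convention
```

Working in semitones makes pitch differences linear, so transposing a melody to another key becomes a constant offset.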

8 Demos Pitch related demos Pitch tracking Pitch shift

9 Basic Comparison Method: Linear Scaling
Scale the query pitch linearly to match the candidates
(Figure: the original input pitch is compressed by 0.5 and 0.75 and stretched by 1.25 and 1.5; each version is compared with the target pitch in the database, and the best match is kept)
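Linear scaling can be sketched as follows, using the five scaling factors from the figure. The function names, the linear-interpolation resampler, the mean-based key shift, and the mean absolute difference are all assumptions of this example rather than details given on the slide.

```python
def resample(pitch, new_len):
    """Linearly interpolate a pitch vector to new_len points."""
    if new_len == 1:
        return [pitch[0]]
    out = []
    for i in range(new_len):
        pos = i * (len(pitch) - 1) / (new_len - 1)
        lo = int(pos)
        hi = min(lo + 1, len(pitch) - 1)
        frac = pos - lo
        out.append(pitch[lo] * (1 - frac) + pitch[hi] * frac)
    return out


def linear_scaling_distance(query, target, factors=(0.5, 0.75, 1.0, 1.25, 1.5)):
    """Try each scaling factor and return (best distance, best factor)."""
    best = (float("inf"), None)
    for factor in factors:
        n = round(len(query) * factor)
        if n < 2 or n > len(target):
            continue                      # scaled query would not fit
        scaled = resample(query, n)
        # Assumed key transposition: shift the query to the target's mean pitch
        shift = sum(target[:n]) / n - sum(scaled) / n
        dist = sum(abs(s + shift - t) for s, t in zip(scaled, target)) / n
        if dist < best[0]:
            best = (dist, factor)
    return best


# Query = start of the target melody, sung two semitones higher at the same tempo.
target = [60 + (i * i) % 7 for i in range(30)]
query = [p + 2 for p in target[:20]]
dist, factor = linear_scaling_distance(query, target)
```

Each factor costs only one resampling and one pass over the vectors, which is why linear scaling is cheap enough to run against every candidate in the database.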

10 Typical Result of Pitch Tracking
Pitch tracking via autocorrelation for "茉莉花" (Jasmine Flower)

11 Comparison of Pitch Vectors
Yellow line: Target pitch vector

12 QBSH Demos
QBSH demos by our lab
  QBSH on the web: MIRACLE
  QBSH on toys
Existing commercial QBSH systems

13 Our QBSH System: Miracle
Clients: PC, PDA/Smartphone, Cellular
Master server: single server with GPU (NVIDIA 560 Ti, 384 cores, speedup factor = 10)
Request to master server: pitch vector
Response from master server: search result
Database size: ~20,000

14 Improving QBSH
Many ways to improve QBSH
  Sorted error vector
  Various weights for rests
  Re-ranking for better accuracy
  Better memory arrangement in GPU

15 Intro to Audio Fingerprinting (AFP)
Goal
  Identify a noisy version of a given audio clip
Also known as…
  “Query by exact example” → no “cover versions” are allowed

16 AFP Applications
Commercial applications of AFP
  Music identification & purchase
  Royalty assignment (over radio)
  TV shows or commercials ID (over TV)
  Copyright violation (over web)
Major commercial players
  Shazam, Soundhound, Intonow, Viggle…

17 Two Stages in AFP
Offline
  Feature extraction
  Hash table construction for songs in database
  Inverted indexing
Online
  Feature extraction
  Hash table search
  Ranked list of the retrieved songs/music

18 Robust Feature Extraction
Various kinds of features for AFP
  Invariance along time and frequency
  Landmark of a pair of local maxima
  Wavelets
Extensive tests required for choosing the best features

19 Representative Approaches to AFP
Philips
  J. Haitsma and T. Kalker, “A highly robust audio fingerprinting system”, ISMIR 2002.
Shazam
  A. Wang, “An industrial-strength audio search algorithm”, ISMIR 2003.
Google
  S. Baluja and M. Covell, “Content fingerprinting using wavelets”, Euro. Conf. on Visual Media Production, 2006.
  V. Chandrasekhar, M. Sharifi, and D. A. Ross, “Survey and evaluation of audio fingerprinting schemes for mobile query-by-example applications”, ISMIR 2011.

20 Improvement on AFP Re-ranking of AFP by learning to rank
Demo:

21 Shazam’s Method
Ideas
  Take advantage of music local structures
    Find salient peaks on spectrogram
    Pair peaks to form landmarks for comparison
  Efficient search by hash tables
    Use positions of landmarks as hash keys
    Use song ID and offset time as hash values
    Use time constraints to find matched landmarks
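The ideas above can be sketched in a few functions: pair peaks into landmarks, store them in a hash table keyed by landmark position, and rank songs by how many matched landmarks agree on one time offset. This is an illustration under stated assumptions, not Shazam's actual implementation; the function names, the tuple-shaped hash key `(f1, f2, dt)`, and the parameters `max_dt`, `max_df` and `fan_out` are all hypothetical.

```python
from collections import Counter, defaultdict


def make_landmarks(peaks, max_dt=32, max_df=16, fan_out=3):
    """Pair each peak (t, f) with up to fan_out later peaks in its target zone.
    Returns a list of (hash key, anchor time) with key = (f1, f2, t2 - t1)."""
    peaks = sorted(peaks)
    landmarks = []
    for i, (t1, f1) in enumerate(peaks):
        paired = 0
        for t2, f2 in peaks[i + 1:]:
            if t2 - t1 > max_dt:
                break                                  # past the target zone
            if t2 > t1 and abs(f2 - f1) <= max_df:
                landmarks.append(((f1, f2, t2 - t1), t1))
                paired += 1
                if paired == fan_out:
                    break
    return landmarks


def build_index(songs):
    """Offline stage: hash key -> list of (song ID, offset time)."""
    index = defaultdict(list)
    for song_id, peaks in songs.items():
        for key, t in make_landmarks(peaks):
            index[key].append((song_id, t))
    return index


def query_index(index, query_peaks):
    """Online stage: vote on (song, time shift); matched landmarks of the
    true song share one shift. Returns [(song ID, votes, shift)], best first."""
    votes = Counter()
    for key, t_query in make_landmarks(query_peaks):
        for song_id, t_song in index.get(key, []):
            votes[(song_id, t_song - t_query)] += 1
    best = {}
    for (song_id, shift), count in votes.items():
        if song_id not in best or count > best[song_id][0]:
            best[song_id] = (count, shift)
    ranked = [(song_id, count, shift) for song_id, (count, shift) in best.items()]
    return sorted(ranked, key=lambda item: -item[1])


# Toy peak lists as (frame index, frequency bin); the query is song A from frame 5 on.
songs = {
    "A": [(0, 10), (5, 20), (12, 15), (20, 30), (27, 22), (35, 12)],
    "B": [(0, 40), (6, 45), (13, 50), (21, 42), (30, 48)],
}
index = build_index(songs)
query_peaks = [(t - 5, f) for t, f in songs["A"] if t >= 5]
result = query_index(index, query_peaks)
```

Voting on the time shift is what enforces the time constraint: a chance hash collision rarely agrees with the other matches on the same shift, so it contributes little to any one bin.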

22 How to Find Salient Peaks
We need to find peaks that are salient along both frequency and time axes
  Frequency axis: Gaussian local smoothing
  Time axis: Decaying threshold over time

23 How to Find Initial Threshold?
Goal
  To suppress neighboring peaks
Ideas
  Find the local max. of mag. spectra of initial 10 frames
  Superimpose a Gaussian on each local max.
  Find the max. of all Gaussians

24 How to Update the Threshold along Time?
Decay the threshold
Find local maxima larger than the threshold → salient peaks
Define the new threshold as the max of the old threshold and the Gaussians passing through the active local maxima
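The initial threshold and its update over time can be sketched as a forward pass over a magnitude spectrogram. Several details are assumptions of this example: the function names, the values of `sigma` and `decay`, taking the bin-wise maximum over the initial frames before looking for local maxima, and visiting the candidates of each frame from strongest to weakest.

```python
import math


def local_maxima(x):
    """Indices that rise from the left neighbor and do not rise to the right."""
    return [k for k in range(1, len(x) - 1) if x[k - 1] < x[k] >= x[k + 1]]


def gaussian(n_bins, center, height, sigma):
    """Gaussian bump along the frequency axis, peaking at `center`."""
    return [height * math.exp(-0.5 * ((k - center) / sigma) ** 2)
            for k in range(n_bins)]


def initial_threshold(spectrogram, sigma, init_frames=10):
    """Max of Gaussians placed on the local maxima of the initial frames."""
    n_bins = len(spectrogram[0])
    head = spectrogram[:init_frames]
    envelope = [max(frame[k] for frame in head) for k in range(n_bins)]
    threshold = [0.0] * n_bins
    for k in local_maxima(envelope):
        bump = gaussian(n_bins, k, envelope[k], sigma)
        threshold = [max(a, b) for a, b in zip(threshold, bump)]
    return threshold


def salient_peaks(spectrogram, sigma=2.0, decay=0.9):
    """Forward pass: return (frame index, frequency bin) of salient peaks."""
    n_bins = len(spectrogram[0])
    threshold = initial_threshold(spectrogram, sigma)
    peaks = []
    for t, frame in enumerate(spectrogram):
        threshold = [decay * v for v in threshold]           # decay over time
        for k in sorted(local_maxima(frame), key=lambda k: -frame[k]):
            if frame[k] > threshold[k]:                      # salient peak
                peaks.append((t, k))
                bump = gaussian(n_bins, k, frame[k], sigma)  # raise threshold
                threshold = [max(a, b) for a, b in zip(threshold, bump)]
    return peaks


# Toy spectrogram: 3 frames x 8 frequency bins of magnitudes.
spectrogram = [
    [0, 0, 0, 10, 0, 6, 0, 0],   # strong peak at bin 3, weaker neighbor at bin 5
    [0, 0, 0, 5, 0, 0, 0, 0],    # bin 3 again, but below the decayed threshold
    [0, 0, 0, 0, 0, 0, 8, 0],    # new peak at bin 6
]
peaks = salient_peaks(spectrogram)
```

In this toy input the weaker peak at bin 5 and the repeated peak at bin 3 are both suppressed, which is the behavior the Gaussian spreading and the time decay are meant to produce.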

25 Time-decaying Thresholds
(Figure panels: forward pass and backward pass)

26 How to Pair Salient Peaks?
Target zone

27 Salient Peaks and Landmarks
Peak picking after forward smoothing
Matched landmarks (green)
(Source: Dan Ellis)

28 Landmarks for Hash Table Access

29 Optimization Strategies for AFP
Several ways to optimize AFP
  Strategy for query landmark extraction
  Confidence measure
  Incremental retrieval
  Better use of the hash table
  Re-ranking for better performance

30 Demos of Audio Fingerprinting
Commercial apps
  Shazam
  Soundhound
Our demo

31 QBSH vs. AFP
             QBSH                   AFP
Goal         MIR                    MIR
Features     Pitch                  Landmarks
             Perceptible            Not perceptible
             Small data size        Big data size
Method       LS (linear scaling)    Matched LM (landmarks)
Database     Harder to collect      Easier to collect
             Small storage          Large storage
Bottleneck   CPU/GPU-bound          I/O-bound

32 Conclusions
Successful applications in MIR
  QBSH
  AFP
Due to
  Faster, bigger memory
  Advances in GPU/CPU (Moore’s law)
  New machine learning methods
Challenges in MIR
  Audio melody extraction from polyphonic music
  Database collection for QBSH
  Cover song ID (which cannot be handled by AFP)
  Polyphonic music transcription

33 Thank you for your attention! Questions & comments?

