
1 Discussions on Audio Melody Extraction (AME)
J.-S. Roger Jang (張智星)
jang@mirlab.org
http://mirlab.org/jang
MIR Lab, CSIE Dept., National Taiwan University

2 Outline
- Dataset preparation for AME
- Suggestions for the AME task in MIREX

3 Dataset Preparation for AME
Goals:
- Large enough to have statistical significance
- Diversified content for better generalization
- More instrumental music
- Full songs instead of excerpts
- Annotation procedure should be standardized and fully documented
- Music content should be as professional as possible
How about two datasets?
Reference: J. Salamon and J. Urbano, "Current Challenges in the Evaluation of Predominant Melody Extraction Algorithms", ISMIR, 2012

4 Suggestions for the AME Task in MIREX
Goal: to simplify the task so as to
- Reduce the entry barrier, since the basic task is already hard enough
- Encourage more people to participate, so that it can also promote other tasks such as cover song ID
Directions for simplification:
- Datasets for different lead instruments
  - Lead singer only: a subset of Type A
  - Other lead instruments: a subset of Type A
- Submissions
  - Different submissions for different datasets
  - Train/test procedures
- Simpler criteria
  - Get rid of the +5 dB and -5 dB conditions? (see the sketch after this slide)
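To my understanding, the +5 dB / -5 dB conditions refer to evaluation mixes in which the melody stem is re-mixed against the accompaniment at different relative levels. Below is a minimal sketch of such a remix, assuming separate melody and accompaniment stems of equal length at the same sample rate; the function name and signature are illustrative, not an existing MIREX tool.

```python
import numpy as np

def remix_at_level(melody, accompaniment, level_db):
    """Scale the melody stem so that the melody-to-accompaniment
    energy ratio of the mix equals level_db (e.g. -5, 0, or +5 dB)."""
    p_mel = np.mean(melody ** 2)
    p_acc = np.mean(accompaniment ** 2)
    gain = np.sqrt(p_acc / p_mel * 10.0 ** (level_db / 10.0))
    return gain * melody + accompaniment

# Toy stems: one second of a 220 Hz "melody" over white-noise "accompaniment".
sr = 8000
t = np.arange(sr) / sr
melody = 0.3 * np.sin(2 * np.pi * 220 * t)
accompaniment = 0.1 * np.random.randn(sr)

hard_mix = remix_at_level(melody, accompaniment, -5.0)  # melody 5 dB below accompaniment
easy_mix = remix_at_level(melody, accompaniment, +5.0)  # melody 5 dB above accompaniment
```

Dropping these conditions would leave a single 0 dB (as-released) mix per song, which is what the simpler criteria bullet seems to suggest.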

5 Three Definitions of Melody
- Type A: the f0 curve of the most predominant melodic source in the recording, and only that source. For example, if there is a lead singer but also a guitar solo, the annotation includes only the lead singer. This is closest to the definition currently used in MIREX.
- Type B: the f0 of the most predominant melodic source in the recording at any given point in time. In this more relaxed definition, the f0 curve can include the pitch of several sources (but only one source at any point in time). To create it, we annotated all the pitch tracks we considered melodic (e.g., lead voice, solos) and ranked them from most to least predominant; the final f0 curve then takes, at every timestamp, the f0 value of the most predominant source active at that time.
- Type C: the multi-f0 curve of all melodic instruments. This is closer to multi-f0 tracking in that several melodic f0 values may be active at the same time. Unlike multi-f0 tracking, however, we do not annotate all pitched instruments in the track (e.g., the bass line is excluded), only the tracks considered melodic. Under this definition, an algorithm's estimate is counted as correct if it matches any of the active melodic sources in the annotation.
(Type B construction and Type C scoring are sketched after this slide.)
Reference: R. M. Bittner, J. Salamon, M. Tierney, M. Mauch, C. Cannam and J. P. Bello, "MedleyDB: A Multitrack Dataset for Annotation-Intensive MIR Research", ISMIR, 2014
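To make Type B and Type C concrete, here is a minimal sketch, assuming per-source f0 annotations already sampled on a common time grid and ordered by predominance. The function names, the 0-means-unvoiced convention, and the 50-cent tolerance are illustrative assumptions, not MedleyDB or MIREX code.

```python
import numpy as np

def type_b_melody(f0_by_rank):
    """Build a Type B melody curve from per-source f0 tracks.

    f0_by_rank: list of 1-D arrays, one per melodic source, ordered from
    most to least predominant, all on the same time grid; 0 (or NaN,
    treated as 0) marks frames where a source is inactive.
    Returns one f0 value per frame: the most predominant active source.
    """
    f0_by_rank = [np.nan_to_num(np.asarray(f0, dtype=float)) for f0 in f0_by_rank]
    melody = np.zeros_like(f0_by_rank[0])
    for f0 in f0_by_rank:                      # most predominant first
        unfilled = (melody == 0) & (f0 > 0)    # frames still without a pitch
        melody[unfilled] = f0[unfilled]
    return melody

def type_c_correct(estimate, active_f0s, tol_cents=50.0):
    """Type C scoring for one frame: the estimate counts as correct if it
    lies within tol_cents of ANY active melodic source."""
    for ref in active_f0s:
        if ref > 0 and estimate > 0:
            if abs(1200.0 * np.log2(estimate / ref)) <= tol_cents:
                return True
    return False

# Toy example: the voice is ranked above the guitar solo.
voice  = np.array([0.0, 220.0, 221.0, 0.0, 0.0])
guitar = np.array([330.0, 331.0, 0.0, 440.0, 0.0])
print(type_b_melody([voice, guitar]))          # [330. 220. 221. 440.   0.]
print(type_c_correct(441.0, [220.0, 440.0]))   # True (matches the guitar)
```

Under Type B the higher-ranked source wins wherever two sources overlap; under Type C an estimate matching any active melodic source would be accepted.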

6 Discussions
How can we join forces to create an AME dataset that satisfies (almost) all the requirements?

