Polyphonic Queries A Review of Recent Research by Cory Mckay.

Slides:

Advertisements

Similar presentations

Pseudo-Relevance Feedback For Multimedia Retrieval By Rong Yan, Alexander G. and Rong Jin Mwangi S. Kariuki

Advertisements

Configuration management

1 Evaluation Rong Jin. 2 Evaluation  Evaluation is key to building effective and efficient search engines usually carried out in controlled experiments.

Content-based retrieval of audio Francois Thibault MUMT 614B McGill University.

Music Database Query by Audio Input Zvika Ben-Haim Advisor: Gal Ashour.

Outline Introduction Music Information Retrieval Classification Process Steps Pitch Histograms Multiple Pitch Detection Algorithm Musical Genre Classification.

Service Discrimination and Audit File Reduction for Effective Intrusion Detection by Fernando Godínez (ITESM) In collaboration with Dieter Hutter (DFKI)

Evaluating Search Engine

Toward Semantic Indexing and Retrieval Using Hierarchical Audio Models Wei-Ta Chu, Wen-Huang Cheng, Jane Yung-Jen Hsu and Ja-LingWu Multimedia Systems,

Chapter 11 Beyond Bag of Words. Question Answering n Providing answers instead of ranked lists of documents n Older QA systems generated answers n Current.

Video Table-of-Contents: Construction and Matching Master of Philosophy 3 rd Term Presentation - Presented by Ng Chung Wing.

HMM-BASED PATTERN DETECTION. Outline  Markov Process  Hidden Markov Models Elements Basic Problems Evaluation Optimization Training Implementation 2-D.

Tree structured representation of music for polyphonic music information retrieval David Rizo Departament of Software and Computing Systems University.

Jonah Shifrin, Bryan Pardo, Colin Meek, William Birmingham

DEVON BRYANT CS 525 SEMESTER PROJECT Audio Signal MIDI Transcription.

The Effectiveness Study of Music Information Retrieval Arbee L.P. Chen National Tsing Hua University 2002 ACM International CIKM Conference.

Voice Separation A Local Optimization Approach Voice Separation A Local Optimization Approach Jurgen Kilian Holger H. Hoos Xiaodan Wu Feb

Chapter 1 Program Design

On Comparing Classifiers: Pitfalls to Avoid and Recommended Approach Published by Steven L. Salzberg Presented by Prakash Tilwani MACS 598 April 25 th.

Aparna Kulkarni Nachal Ramasamy Rashmi Havaldar N-grams to Process Hindi Queries.

Information Retrieval in Practice

To quantitatively test the quality of the spell checker, the program was executed on predefined “test beds” of words for numerous trials, ranging from.

L. Padmasree Vamshi Ambati J. Anand Chandulal J. Anand Chandulal M. Sreenivasa Rao M. Sreenivasa Rao Signature Based Duplicate Detection in Digital Libraries.

Chapter 1 Database Systems. Good decisions require good information derived from raw facts Data is managed most efficiently when stored in a database.

JSymbolic and ELVIS Cory McKay Marianopolis College Montreal, Canada.

A Time Based Approach to Musical Pattern Discovery in Polyphonic Music Tamar Berman Graduate School of Library and Information Science University of Illinois.

Educational Software using Audio to Score Alignment Antoine Gomas supervised by Dr. Tim Collins & Pr. Corinne Mailhes 7 th of September, 2007.

Paper by Craig Stuart Sapp 2007 & 2008 Presented by Salehe Erfanian Ebadi QMUL ELE021/ELED021/ELEM021 5 March 2012.

by B. Zadrozny and C. Elkan

JSymbolic Cedar Wingate MUMT 621 Professor Ichiro Fujinaga 22 October 2009.

Fundamentals of Data Analysis Lecture 9 Management of data sets and improving the precision of measurement.

A Markov Random Field Model for Term Dependencies Donald Metzler W. Bruce Croft Present by Chia-Hao Lee.

When Experts Agree: Using Non-Affiliated Experts To Rank Popular Topics Meital Aizen.

Aspects of Music Information Retrieval Will Meurer School of Information University of Texas.

Analysis of Complex Proteomic Datasets Using Scaffold Free Scaffold Viewer can be downloaded at:

Audio Thumbnailing of Popular Music Using Chroma-Based Representations Matt Williamson Chris Scharf Implementation based on: IEEE Transactions on Multimedia,

MUMT611: Music Information Acquisition, Preservation, and Retrieval Presentation on Timbre Similarity Alexandre Savard March 2006.

TEMPLATE DESIGN © Zhiyao Duan 1,2, Lie Lu 1, and Changshui Zhang 2 1. Microsoft Research Asia (MSRA), Beijing, China.2.

Similarity Matrix Processing for Music Structure Analysis Yu Shiu, Hong Jeng C.-C. Jay Kuo ACM Multimedia 2006.

Music Information Retrieval Information Universe Seongmin Lim Dept. of Industrial Engineering Seoul National University.

Polyphonic Transcription Bruno Angeles McGill University - Schulich School of Music MUMT-621 Fall /14.

Cluster-specific Named Entity Transliteration Fei Huang HLT/EMNLP 2005.

Scaling Heterogeneous Databases and Design of DISCO Anthony Tomasic Louiqa Raschid Patrick Valduriez Presented by: Nazia Khatir Texas A&M University.

Chapter 8 Evaluating Search Engine. Evaluation n Evaluation is key to building effective and efficient search engines  Measurement usually carried out.

Melodic Similarity Presenter: Greg Eustace. Overview Defining melody Introduction to melodic similarity and its applications Choosing the level of representation.

Issues in Automatic Musical Genre Classification Cory McKay.

Content-Based MP3 Information Retrieval Chueh-Chih Liu Department of Accounting Information Systems Chihlee Institute of Technology 2005/06/16.

Glencoe Introduction to Multimedia Chapter 8 Audio 1 Section 8.1 Audio in Multimedia Audio plays many roles in multimedia. Effective use in multimedia.

Nick Kwolek Martin Pendergast Stephen Edwards David Duemler.

1 Hidden Markov Model: Overview and Applications in MIR MUMT 611, March 2005 Paul Kolesnik MUMT 611, March 2005 Paul Kolesnik.

VizTree Huyen Dao and Chris Ackermann. Introducing example

Phone-Level Pronunciation Scoring and Assessment for Interactive Language Learning Speech Communication, 2000 Authors: S. M. Witt, S. J. Young Presenter:

BASS TRACK SELECTION IN MIDI FILES AND MULTIMODAL IMPLICATIONS TO MELODY gPRAI Pattern Recognition and Artificial Intelligence Group Computer Music Laboratory.

Tree structured and combined methods for comparing metered polyphonic music Kjell Lëmstrom David Rizo Valero José Manuel Iñesta CMMR’08 May 21, 2008.

Melody Recognition with Learned Edit Distances Amaury Habrard Laboratoire d’Informatique Fondamentale CNRS Université Aix-Marseille José Manuel Iñesta,

ISA Kim Hye mi. Introduction Input Spectrum data (Protein database) Peptide assignment Peptide validation manual validation PeptideProphet.

1 Minimum Bayes-risk Methods in Automatic Speech Recognition Vaibhava Geol And William Byrne IBM ； Johns Hopkins University 2003 by CRC Press LLC 2005/4/26.

A shallow description framework for musical style recognition Pedro J. Ponce de León, Carlos Pérez-Sancho and José Manuel Iñesta Departamento de Lenguajes.

Genre Classification of Music by Tonal Harmony Carlos Pérez-Sancho, David Rizo Departamento de Lenguajes y Sistemas Informáticos, Universidad de Alicante,

1 Tempo Induction and Beat Tracking for Audio Signals MUMT 611, February 2005 Assignment 3 Paul Kolesnik.

A Music Search Engine for Plagiarism Detection

Designing Cross-Language Information Retrieval System using various Techniques of Query Expansion and Indexing for Improved Performance Hello everyone,

Introduction Multimedia initial focus

Introduction to Music Information Retrieval (MIR)

Aspects of Music Information Retrieval

Memory and Melodic Density : A Model for Melody Segmentation

Presentation on Timbre Similarity

Retrieval Performance Evaluation - Measures

Measuring the Similarity of Rhythmic Patterns

Presentation transcript:

Polyphonic Queries A Review of Recent Research by Cory Mckay

Overview Introduction to polyphonic music queries Review of five recent systems Conclusions

Melodic Fragments as Queries Melodic fragments are a useful way of searching electronic music databases Convenient for humans to remember musical phrases Easy to enter musical fragments Monophonic: query by humming Polyphonic: notation software

Search Requirements Want searches to return all occurrences of a given set of notes in a database May also want records that contain similar sets of notes Should be able to deal with incomplete or partially erroneous queries Ideally, want to search music stored in Symbolic formats (MIDI, scores, sketches) Raw audio data

Monophonic Databases Has been some success with special case of monophonic databases Andreas Kornstadt’s Themefinder Roger J. MacNab’s Meledex Searching polyphonic databases is much more difficult

Polyphonic Databases: Problems Notes may begin simultaneously. Makes it impossible to outline an unambiguous sequence of events Multiple voices, with varying roles and relevance to particular queries Hard to deal with both symbolic and raw audio representations: Monophonic: can just transcribe audio Polyphonic: no good transcription system

Current Polyphonic Systems Wide-spread systems rely on meta-data This no good for searches of melodic fragments Currently no widely accepted content- based system A number of papers on topic have recently been published

Wiggins et al. (2002) Designed a new algorithm called SIA(M)ESE For making transposition-invariant queries Matches a query even if there are events in a score being searched that separate musical events in the query Assumes that both queries and database files are in symbolic form and are accurate This limits generality of algorithm No implementation of the algorithm given

Doraisamy & Rüger (2001) Polyphonic queries Partial queries permitted Symbolic queries Symbolic records Pitch and rhythm features

Doraisamy & Rüger (2001) N-grams are produced by converting notes into interval-based representation and grouping intervals into subdivisions of length n using a gliding window Leads to transposition-invariant data N-grams work well with monophonic queries

Doraisamy & Rüger (2001) To deal with potential for simultaneous note onsets in polyphonic music, constructs exhaustive “melodic strings”: Divide piece into overlapping windows of n adjacent onset times Find all possible combinations of onsets within each window

Doraisamy & Rüger (2001) Incorporates rhythmic information as well as intervals into each n-gram window Done by calculating ratio of time differences between adjacent note onsets: Ratio i = (Onset i+2 – Onset i+1 ) / (Onset i+1 – Onset i ) Avoids need to quantize events based on a predetermined beat duration By using onsets only, avoids needing to determine the duration of notes, which can be difficult to detect in raw audio recordings

Doraisamy & Rüger (2001) N-grams converted into text-based representations to allow use of text-based search engines Interval and rhythmic ratio histograms constructed in order to search for patterns in each piece

Doraisamy & Rüger (2001) Tested using a database of 3096 MIDI representations of classical music Studied effects of varying window sizes, bin ranges and query lengths 95% success rate with window sizes of 4 onset times, variable bin ranges and queries involving 50 notes 74% with query lengths of ten notes 65% with queries containing errors

Doraisamy & Rüger (2001) Evaluation: Performs well under ideal conditions Databases or queries containing raw audio not considered Only transposition invariant searches possible Undesirably long query lengths necessary Only classical music records tested Successful in showing the potential utility of n-grams Most recent work uses a variant of this n-gram approach

Doraisamy & Rüger (2002) Monophonic queries Partial queries permitted Symbolic queries Symbolic records Pitch and rhythm features

Doraisamy & Rüger (2002) Focused on monophonic queries because of query by humming N-grams particularly appropriate for error- prone queries resulting from query by humming One or two mistakes only lead to a few incorrect n-grams among a larger number of correct ones

Doraisamy & Rüger (2002) Used same overall design as previous system Includes more sophisticated error models to test effects of query inaccuracies Database expanded to include popular music as well as classical music 80% of the relevant compositions were returned in first 15 hits, on average

Doraisamy & Rüger (2002) Evaluation N-grams are an effective and error-tolerant tool for searching polyphonic music with monophonic queries Improvements still need to be made Queries still symbolic, so true applicability of system to query by humming not tested

Pickens et al. (2002) Polyphonic queries Full-length queries only Audio queries Symbolic records Harmonic features

Pickens et al. (2002) Relies on transcription algorithms to transform audio queries into symbolic form Relies on error tolerance of searches to compensate for transcription errors Two types of polyphonic transcription systems used, including blackboard

Pickens et al. (2002) Transcribed queries and database records analyzed and stored using a harmonic modelling module Characterizes pieces by mapping chords to a probability distribution Breaks into sequences of independent note sets Applies smoothing procedure to sets Markov models created from smoothed sets

Pickens et al. (2002) Performing searches: Scoring function used to compare query models with each model stored in database Dissimilarity scores produced Allows search hits to be ranked

Pickens et al. (2002) Tested with database containing 3150 classical piano pieces Queries consisted of full-length audio recordings On average, searches assigned a rank of between 2 and 6 to the correct database record Moderately successful at matching variations of a piece: on average, 3 of top 5 hits relevant

Pickens et al. (2002) Evaluation Can use polyphonic audio queries Limited by effectiveness of its transcription systems and error tolerance of the query system System only tested on piano music Full-length queries limit usefulness Only one feature used

Song, Bae & Yoon (2002) Monophonic queries Partial queries permitted Audio queries Audio records Melodic features

Song, Bae & Yoon (2002) Intended to be used with query by humming Avoids disadvantages of automated transcription by mapping audio data directly to “mid-level” melody-based feature set description Contrasts with approach of first transcribing audio data and then extracting features from this high- level symbolic representation

Song, Bae & Yoon (2002) “Mid-level” representation produced by processing audio frames using a five- step process

Song, Bae & Yoon (2002) Instead of making definite decision on which notes were present, as a blackboard system would have done, vectors of all possible notes were kept for each audio segment A DP-matching method was used during searches so that potentially error-prone patterns of different lengths could be compared

Song, Bae & Yoon (2002) Tested by attempting to match 176 hummed samples to 92 short extracts (15-20 seconds long) of popular Korean and Western popular songs Resulted in exact matches roughly 43% of the time and a match in the top ten from 69% to 76% of the time (depending on window size) Search time varied from 3 to 14 seconds

Song, Bae & Yoon (2002) Evaluation Performance relatively poor Only monophonic queries possible Only tested using a database of short recordings This was only system using audio data for both queries and database records

Conclusions No truly viable systems produced yet Promising approaches have been proposed Recurring problems: None of systems can deal with both symbolic and audio data None have been tested with both polyphonic and monophonic queries Tend to require long queries to achieve good results Searches do not allow much flexibility (e.g. must be transposition invariant)

Conclusions Difficult to compare systems because they each use different performance evaluation metrics Limited data sets used during testing

Conclusions Possible improvement: use a greater number of feature classes Aside from Doraisamy & Rüger, all systems discussed limited themselves to one feature class Harmonic, melodic, timbral and rhythm- based features could all prove useful Would allow much more flexible searches