Music Database Query by Audio Input Zvika Ben-Haim Advisor: Gal Ashour.


Purpose of the Project
Recorded melody -> Software -> Song name

Presentation Overview
- Demonstration
- Internals
- Results
- Conclusions

Program Demonstration

Inside the Program
Vocal Input -> Pitch Detection / Volume Detection -> Segmentation -> Database Search -> List of Best Matches

And now in Hebrew (translated)
Vocal input -> pitch detection / volume detection -> segmentation -> database search -> list of best matches

Definition of Input
- The input is sung by a human, who does not need to have any knowledge of music.
- The program was optimized for singing using the syllables “da-da-da” or “ti-ti-ti”. All testing was performed on this type of input.

Pitch Detection
- The super-resolution pitch detection algorithm achieves accurate detection values without increasing CPU time, by performing linear interpolation on a low sampling rate recording.
- Detection is performed in a pitch-synchronous fashion (one pitch value for each cycle).

Pitch/Volume Detection

Segmentation (1/3)
Sequence of Pitches and Volumes -> Volume-Based Segmentation -> Pitch-Based Segmentation -> Decision
- Voice: Note Identification -> Sequence of Notes
- Noise: Ignore

Now in Hebrew (translated)
Sequence of pitch and volume values -> primary segmentation (volume-based) -> secondary segmentation (pitch-based) -> decision
- Tone: identify pitch and duration -> sequence of notes (pitch and duration)
- Noise: discard the segment

Segmentation (2/3)
- Volume Segmentation: Possible notes are identified as regions in which the volume is higher than a trigger value.
- Thus, it is important to separate each note by a short quiet period, e.g. by pronouncing “ta-ta-ta” rather than “la-la-la”.
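The volume-based step can be illustrated with a minimal sketch. It assumes the volume track is a list of per-frame values and that a single fixed trigger level is used; the function name `volume_segments` and the frame representation are hypothetical, not taken from the program.

```python
def volume_segments(volumes, trigger):
    """Return (start, end) frame index pairs, end exclusive, for every
    contiguous region whose volume is strictly above the trigger value."""
    segments = []
    start = None  # index where the current loud region began, if any
    for i, volume in enumerate(volumes):
        if volume > trigger and start is None:
            start = i                      # a loud region begins
        elif volume <= trigger and start is not None:
            segments.append((start, i))    # a quiet frame closes the region
            start = None
    if start is not None:                  # recording ended while still loud
        segments.append((start, len(volumes)))
    return segments


# Two candidate notes separated by a short quiet period.
print(volume_segments([0, 5, 6, 0, 0, 7, 8, 9, 0], trigger=1))
```

Without the quiet period between notes (as in “la-la-la”), the two regions would merge into one segment, which is why the slide recommends “ta-ta-ta”.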

Segmentation (3/3)
- Pitch Segmentation: Within each segment, find the longest region in which the pitch is relatively constant.
- Noise Removal: If this region is very short, then the segment is assumed to be noise, and it is ignored.
- Conversion to Notes: The frequency of the note is identified by an iterative averaging technique.
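A rough sketch of these three steps follows. The slides do not define "relatively constant", "very short", or the iterative averaging technique, so everything specific here is an assumption: a half-semitone tolerance measured against the first pitch of the run, a minimum run length of 5 pitch values, and an averaging loop that repeatedly discards outliers and re-averages. All function names are hypothetical.

```python
import math


def semitone_diff(f1, f2):
    """Signed distance from frequency f1 to f2, in semitones."""
    return 12.0 * math.log2(f2 / f1)


def longest_stable_run(pitches, tol=0.5):
    """Longest contiguous run whose pitches all stay within tol semitones
    of the run's first pitch. Returns (start, end), end exclusive."""
    best = (0, 0)
    for start in range(len(pitches)):
        end = start
        while (end < len(pitches)
               and abs(semitone_diff(pitches[start], pitches[end])) <= tol):
            end += 1
        if end - start > best[1] - best[0]:
            best = (start, end)
    return best


def iterative_average(values, tol=0.5, max_iter=10):
    """Assumed scheme: average, drop values further than tol semitones
    from the mean, and repeat until nothing more is dropped."""
    kept = list(values)
    for _ in range(max_iter):
        mean = sum(kept) / len(kept)
        filtered = [v for v in kept if abs(semitone_diff(mean, v)) <= tol]
        if not filtered or len(filtered) == len(kept):
            break
        kept = filtered
    return sum(kept) / len(kept)


def segment_to_note(pitches, min_length=5):
    """Return the note frequency for one segment, or None if it is noise."""
    start, end = longest_stable_run(pitches)
    if end - start < min_length:
        return None  # stable region too short: treat the segment as noise
    return iterative_average(pitches[start:end])


print(segment_to_note([300, 440, 441, 439, 440, 442, 440, 600]))
print(segment_to_note([300, 500, 800]))
```

The first call keeps the six values around 440 Hz and ignores the unstable edges of the segment; the second has no stable region and is rejected as noise.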

Segmentation Example

Database Search
Sequence of Notes -> Convert to relative frequencies and durations -> Find edit distance for each database entry -> Sort by increasing edit cost -> List of Best Matches
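The conversion step can be sketched as below. A relative representation presumably makes the query independent of the key and tempo in which it was sung. The note format (frequency in Hz, duration in seconds), the rounding to whole semitones, and the name `to_relative` are assumptions made for illustration.

```python
import math


def to_relative(notes):
    """notes: list of (frequency_hz, duration_s) pairs.
    Returns one (semitone_interval, duration_ratio) pair per transition
    between consecutive notes."""
    relative = []
    for (f_prev, d_prev), (f_cur, d_cur) in zip(notes, notes[1:]):
        interval = round(12 * math.log2(f_cur / f_prev))  # pitch change
        ratio = d_cur / d_prev                            # relative duration
        relative.append((interval, ratio))
    return relative


melody = [(440.0, 0.5), (880.0, 0.5), (440.0, 1.0)]
# The same melody sung an octave lower and at half the speed.
slower_and_lower = [(220.0, 1.0), (440.0, 1.0), (220.0, 2.0)]
print(to_relative(melody))
print(to_relative(melody) == to_relative(slower_and_lower))
```

Both versions produce the same sequence, so a single database entry can match queries sung in any key and at any tempo.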

Edit Distance (1/3)
- Purpose: Correction of errors in singing and in previous identification steps.
- Mechanism: The edit distance is the minimum cost required to transform one string into another. The following changes can be applied at given costs:
  - Change one character into another
  - Insert one character
  - Delete one character

Edit Distance (2/3)
Example: how to make an elephant become elegant:
elephant -> (replace) -> eleghant -> (delete) -> elegant
Total edit distance is the cost of replacing ‘p’ with ‘g’, plus the cost of deleting ‘h’.
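The standard dynamic-programming computation of a weighted edit distance can be sketched as follows. The slides do not give the cost values used by the program, so the default unit costs and the parameter names are assumptions.

```python
def edit_distance(source, target, sub_cost=1.0, ins_cost=1.0, del_cost=1.0):
    """Minimum total cost of turning source into target using
    substitutions, insertions, and deletions."""
    rows, cols = len(source) + 1, len(target) + 1
    # dist[i][j] = cost of turning source[:i] into target[:j]
    dist = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dist[i][0] = i * del_cost          # delete everything
    for j in range(1, cols):
        dist[0][j] = j * ins_cost          # insert everything
    for i in range(1, rows):
        for j in range(1, cols):
            change = 0.0 if source[i - 1] == target[j - 1] else sub_cost
            dist[i][j] = min(
                dist[i - 1][j - 1] + change,   # keep or substitute
                dist[i - 1][j] + del_cost,     # delete from source
                dist[i][j - 1] + ins_cost,     # insert into source
            )
    return dist[-1][-1]


# One substitution (p -> g) plus one deletion (h).
print(edit_distance("elephant", "elegant"))                # 2.0
print(edit_distance("elephant", "elegant", sub_cost=1.5))  # 2.5
```

The function compares elements only with `==`, so it works on any sequences, for example lists of pitch intervals, and not just on character strings.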

Edit Distance (3/3)
- Algorithms differ by the content of the strings being compared. Three algorithms were checked:
  - Parsons code: Only the direction of pitch change is compared (up, down, or repeat).
  - Frequency similarity: The direction and size of pitch change (e.g., up 3 semitones).
  - Frequency/Duration similarity: Both pitch change and relative duration of notes (e.g., up 3 semitones, and a longer note).
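The difference between the first two representations can be illustrated with a small sketch. It assumes pitches are given as MIDI note numbers (one unit per semitone) and uses the letters U, D, and R for up, down, and repeat; the leading asterisk that Parsons code conventionally starts with is omitted. Function names are hypothetical.

```python
def parsons_code(pitches):
    """Direction of each pitch change: U (up), D (down), R (repeat)."""
    code = []
    for prev, cur in zip(pitches, pitches[1:]):
        if cur > prev:
            code.append("U")
        elif cur < prev:
            code.append("D")
        else:
            code.append("R")
    return "".join(code)


def semitone_intervals(pitches):
    """Direction and size of each pitch change, in semitones."""
    return [cur - prev for prev, cur in zip(pitches, pitches[1:])]


stepwise = [60, 62, 64]   # C, D, E
leaping = [60, 67, 72]    # C, G, C an octave higher

print(parsons_code(stepwise), parsons_code(leaping))
print(semitone_intervals(stepwise), semitone_intervals(leaping))
```

Both melodies collapse to the same Parsons code, while their interval sequences differ. This loss of information is one plausible reason a search that keeps interval sizes can separate tunes that a direction-only search cannot.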

Results

Simulation
- Simulations of the search engine were performed to obtain a larger ensemble, from which a detection probability was calculated.
- Random noise was added to the first few notes of a tune. The noisy tune was then submitted to the search engine.
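The structure of such a simulation can be sketched as a Monte Carlo loop. The slides do not specify the noise model, the number of trials, or how a detection was counted, so all of these are assumptions: noise is a uniform random semitone shift, tunes are lists of MIDI note numbers of equal length, and a trial counts as a detection when the top-ranked tune is the original one. The `nearest_tune` function is a deliberately simple stand-in and is not the edit-distance search described in the presentation.

```python
import random


def perturb(tune, n_noisy, max_shift, rng):
    """Add a random semitone shift to the first n_noisy notes of a tune."""
    noisy = list(tune)
    for i in range(min(n_noisy, len(noisy))):
        noisy[i] += rng.randint(-max_shift, max_shift)
    return noisy


def nearest_tune(query, database):
    """Stand-in search: name of the tune with the smallest summed
    note-by-note pitch difference from the query."""
    def cost(name):
        return sum(abs(a - b) for a, b in zip(query, database[name]))
    return min(database, key=cost)


def detection_probability(database, n_noisy, max_shift, trials=200, seed=0):
    """Fraction of trials in which the noisy tune is still ranked first."""
    rng = random.Random(seed)       # fixed seed for repeatable runs
    names = sorted(database)
    hits = 0
    for _ in range(trials):
        target = rng.choice(names)
        query = perturb(database[target], n_noisy, max_shift, rng)
        if nearest_tune(query, database) == target:
            hits += 1
    return hits / trials


# Hypothetical three-tune database.
tunes = {
    "tune_a": [60, 62, 64, 65, 67, 69],
    "tune_b": [60, 60, 67, 67, 69, 69],
    "tune_c": [67, 65, 64, 62, 60, 60],
}
print(detection_probability(tunes, n_noisy=3, max_shift=0))
print(detection_probability(tunes, n_noisy=3, max_shift=6))
```

Repeating the loop for different noise levels, search functions, or database sizes gives curves of detection probability that can be compared across algorithms.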

Comparison of Search Algorithms

Effect of Database Size

Empirical Test
- Subjects listened to a sample query. Then, they chose a song from the database, and were told to sing it in a similar manner.
- Number of test subjects: 14
- Number of recorded songs: 64
- Number of songs in database: 197

Empirical Results

Conclusions
- Combined frequency/duration search is the most robust search algorithm tested, and outperforms the Parsons code search by a wide margin.
- The program performs better than an average human under the tested conditions.

Summary
- A successful melody search engine has been created.
- Real-time software implementation is possible.
- The new frequency/duration search algorithm was found to be more effective than the existing Parsons code search.

The End