Advanced Multimedia Music Information Retrieval Tamara Berg

Announcements Still missing a few Assignment 1 submissions. Assignment 2 is online – due March 10.

Audio Indexing and Retrieval Motivation. Features for representing audio: – Metadata – Low-level features – High-level audio features. Example usage cases: audio classification, music retrieval.

Howard Leung

Content Based Music Retrieval Extract music descriptions from a database of music documents. Extract a music description from a query music document. Compute the match between the query and database descriptions. Retrieve music documents similar to the query. Casey et al IEEE 2008

MIR tasks H: high specificity – match specific instances of audio content. M: mid-level specificity – match high-level audio features like melody, but do not match audio content. L: low specificity – match global (statistical) properties of the query. Different usage cases require different descriptions and matching schemes. Casey et al IEEE 2008

Metadata Most common method of accessing music. Can be rich and expressive. When catalogues become very large, it is difficult to maintain consistent metadata. Useful for low specificity queries. Casey et al IEEE 2008

Metadata Pandora.com – uses metadata to estimate artist similarity and track similarity and creates personalized radio stations. Human-entered metadata of musical-cultural properties (20–30 minutes of an expert’s time per track – 50 person-years for 1 million tracks). User-contributed metadata repositories (Gracenote, MusicBrainz). Factual metadata (artist, album, year, title, duration). Cultural metadata (mood, emotion, genre, style). Automatic metadata methods – generate descriptions from community metadata automatically, e.g. language analysis to associate noun and verb phrases with musical features (Whitman & Rifkin). Casey et al IEEE 2008

Content features Low level or high level. Want features to be robust to certain changes in the audio signal (why?) – Noise – Volume – Sampling. High-level features will be more robust to changes; low-level features will be less robust. Low-level features are easy to compute; high-level features are difficult.

Low level audio features Low-level measurements of the audio signal that contain information about a musical work. Can be computed periodically (at fixed millisecond-scale intervals) or beat-synchronously. Casey et al IEEE 2008 In text analysis we had words; here we have to come up with our own set of features to compute from the audio signal!
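As a concrete, illustrative starting point, here is a minimal NumPy sketch of the framing step this slide assumes: the signal is cut into short, regularly spaced frames, and one low-level feature vector can then be computed per frame. The frame length, hop size, and test tone are made-up values, not numbers from the lecture.

```python
# Framing sketch (illustrative, not lecture code): split a signal into
# fixed-length analysis frames from which low-level features are computed.
import numpy as np

def frame_signal(x, frame_len=1024, hop=512):
    """Return a 2D array of shape (num_frames, frame_len)."""
    num_frames = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop: i * hop + frame_len] for i in range(num_frames)])

# Example: 1 second of a 440 Hz tone sampled at 22050 Hz (arbitrary choices).
sr = 22050
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 440 * t)
frames = frame_signal(x)
print(frames.shape)  # (42, 1024): one feature vector can be computed per frame
```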

Example Low-Level Audio Features Howard Leung

Zero-Crossing Rate Average number of times the signal crosses the zero amplitude value: $\mathrm{ZCR} = \frac{1}{N-1}\sum_{n=1}^{N-1}\mathbf{1}\{\,x[n]\,x[n-1] < 0\,\}$, where the indicator function is 1 if true and 0 otherwise.
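A minimal sketch of this feature, assuming the standard definition above; the test tone and sampling rate are illustrative, not from the slides.

```python
# Zero-crossing rate of one frame: the fraction of adjacent sample pairs
# whose signs differ.
import numpy as np

def zero_crossing_rate(frame):
    signs = np.sign(frame)
    # Indicator is 1 where the sign changes between consecutive samples, 0 otherwise.
    crossings = np.abs(np.diff(signs)) > 0
    return crossings.mean()

# A 440 Hz tone crosses zero roughly 2 * 440 times per second.
sr = 22050
t = np.arange(sr) / sr
print(zero_crossing_rate(np.sin(2 * np.pi * 440 * t)))  # ~ 880 / 22050 ≈ 0.04
```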

Howard Leung

Frequency Domain Reminder Signals can be decomposed into a weighted sum of sinusoids; how much of each sinusoid is present describes the frequency spectrum of the signal. Li & Drew

Frequency domain features How do we get to the frequency domain? (Figure: a time-domain signal and its frequency spectrum.)

DFT The Discrete Fourier Transform (DFT) of the audio converts it to a frequency representation. DFT analysis occurs in terms of a number of equally spaced ‘bins’; each bin represents a particular frequency range. DFT analysis gives the amount of energy in the audio signal that is present within each bin’s frequency range. The Inverse Discrete Fourier Transform (IDFT) converts from the frequency representation back to the audio signal.
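To make the bin picture concrete, here is a rough NumPy sketch (the course’s own demos use MATLAB; this is not lecture code). It builds a two-tone frame, computes the DFT, reads off the energy per equally spaced bin, and checks that the IDFT recovers the signal. The sampling rate, frame length, and tone frequencies are arbitrary choices.

```python
# DFT bin energies of a short frame, plus a round trip through the inverse DFT.
import numpy as np

sr = 8000                                       # sampling rate (assumed)
t = np.arange(1024) / sr
frame = np.sin(2 * np.pi * 500 * t) + 0.5 * np.sin(2 * np.pi * 1500 * t)

spectrum = np.fft.rfft(frame)                   # DFT of a real signal: bins 0..N/2
energy = np.abs(spectrum) ** 2                  # energy per equally spaced bin
freqs = np.fft.rfftfreq(len(frame), d=1 / sr)   # centre frequency of each bin

# The two strongest bins should sit at 500 Hz and 1500 Hz.
top = np.argsort(energy)[-2:]
print(freqs[top])

# The inverse DFT recovers the original frame (up to numerical precision).
recovered = np.fft.irfft(spectrum, n=len(frame))
print(np.allclose(recovered, frame))            # True
```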

Howard Leung

Filtering Removes frequency components from some part of the spectrum. Low pass filter – removes high frequency components from the input and leaves only low frequencies in the output signal. High pass filter – removes low frequency components from the input and leaves only high frequencies in the output signal. Band pass filter – keeps only a chosen band of frequencies and removes the rest of the spectrum.

How could you do this using the FT and IFT? Compute FT spectrum of input. Zero out the part of the frequency spectrum that you want to filter out. Compute the IFT of this modified spectrum -> output will be input with some frequency components removed.
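A minimal sketch of exactly this recipe, assuming a hard “brick wall” mask in the frequency domain; the cutoff frequency, sampling rate, two-tone test signal, and the helper name fft_lowpass are all made up for illustration.

```python
# FT -> zero out unwanted bins -> IFT: a simple frequency-domain low-pass filter.
import numpy as np

def fft_lowpass(x, sr, cutoff_hz):
    spectrum = np.fft.rfft(x)                   # FT of the input
    freqs = np.fft.rfftfreq(len(x), d=1 / sr)
    spectrum[freqs > cutoff_hz] = 0.0           # zero out the unwanted components
    return np.fft.irfft(spectrum, n=len(x))     # IFT -> frequency-limited output

sr = 8000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 200 * t) + np.sin(2 * np.pi * 3000 * t)
y = fft_lowpass(x, sr, cutoff_hz=1000)          # only the 200 Hz component survives
print(np.max(np.abs(y - np.sin(2 * np.pi * 200 * t))))  # tiny residual: 3000 Hz is gone
```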

How could you do this using the FT and IFT? (Slide sequence: f = input; compute FT(f); zero out some frequency components by point-wise multiplying the spectrum with a 0/1 mask; take the IFT; the output o is a frequency-limited version of the input. Two different masks are shown, each zeroing a different region of the spectrum – what kind of filter is each one?)

Filtering Alternatively you can convolve the input signal with a filter to get a frequency-limited output signal. Convolution (we’ll see this again for images): slide the filter g across the signal f and, at each position, take the weighted sum of the overlapping samples (convolution demo). The slides step through two worked examples: a length-3 filter with entries of 1/3 (what does this filter do?) and a filter with entries 1/4, 1/2, 1/4. In general filters will have a more complex effect on the output.
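For reference, a tiny NumPy sketch of the sliding-and-summing operation described above, using np.convolve with a length-3 averaging filter; the signal values are arbitrary, not the numbers from the slides.

```python
# Slide an averaging filter across a signal: each output averages 3 neighbours.
import numpy as np

f = np.array([2.0, -8.0, 4.0, 7.0, -2.0, 5.0, 1.0])  # signal (made up)
g = np.ones(3) / 3.0                                  # length-3 averaging filter

out = np.convolve(f, g, mode='valid')
print(out)  # smoothed version of f: rapid fluctuations are damped
```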

What is convolution doing?

Relationship Let f = input, F = FT(f), g = filter, G = FT(g). Theorem: convolution in signal space is equivalent to point-wise multiplication in frequency space, i.e. F .* G = FT(f ★ g) and f ★ g = IFT(F .* G).
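A small numerical sanity check of the theorem (not lecture code): it compares a circular convolution computed directly against IFT(F .* G). The random signal and the [0.25, 0.5, 0.25] filter are arbitrary.

```python
# Convolution theorem check: filtering via spectra equals circular convolution.
import numpy as np

f = np.random.randn(64)                  # input signal
g = np.array([0.25, 0.5, 0.25])          # filter

# Point-wise multiplication of spectra (pad g to the signal length first).
g_padded = np.zeros_like(f)
g_padded[: len(g)] = g
via_fft = np.real(np.fft.ifft(np.fft.fft(f) * np.fft.fft(g_padded)))

# The same circular convolution computed directly in signal space.
circular = np.array([sum(f[(n - k) % len(f)] * g[k] for k in range(len(g)))
                     for n in range(len(f))])

print(np.allclose(via_fft, circular))    # True: FT(f ★ g) == FT(f) .* FT(g)
```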

Matlab demo soundFilt/demo.m

Howard Leung

Pitch-Class Profile (PCP) Represents the energy due to each pitch class, integrating the energy in all octaves into a single band. There are 12 equally spaced pitch classes in western tonal music, so there are typically 12 bands in the PCP. How might we calculate this using the DFT?
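One plausible answer, as a rough sketch rather than the method from the paper: compute the DFT energy per bin, map each bin’s frequency to the nearest semitone, and fold all octaves into 12 pitch-class bands. The reference frequency (A4 = 440 Hz), frame length, and test tones are assumptions for illustration.

```python
# 12-bin pitch-class profile from DFT bin energies (illustrative sketch).
import numpy as np

def pitch_class_profile(frame, sr, fmin=27.5):
    spectrum = np.abs(np.fft.rfft(frame)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1 / sr)
    pcp = np.zeros(12)
    for f, e in zip(freqs, spectrum):
        if f < fmin:
            continue                                  # skip DC and very low bins
        semitone = int(round(12 * np.log2(f / 440.0))) % 12  # 0 = A, 1 = A#, ...
        pcp[semitone] += e                            # fold all octaves into one band
    return pcp / (pcp.sum() + 1e-12)

sr = 22050
t = np.arange(4096) / sr
frame = np.sin(2 * np.pi * 440 * t) + np.sin(2 * np.pi * 880 * t)  # A4 + A5
print(np.argmax(pitch_class_profile(frame, sr)))  # 0: both octaves map to pitch class A
```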

Howard Leung

High level music features High level, intuitive information about a piece of music (melody, harmony, etc.). “It is melody that enables us to distinguish one work from another. It is melody that human beings are innately able to reproduce by singing, humming, and whistling. It is melody that makes music memorable: we are likely to recall a tune long after we have forgotten its text.” -Selfridge-Field Intuitive features, but they are hard to extract and remain ongoing areas of research. Casey et al IEEE 2008

Melody & Bass Estimation Melody and bass lines represented as continuous temporal trajectory of fundamental frequency, F0 (a series of musical notes). PreFEst (Predominant-F0 Estimation method – Goto 1999 ) – Estimate the F0 trajectory in mid-high freq range of input -> melody. – Estimate the F0 trajectory in low freq range-> bass. Casey et al IEEE 2008

Chord Recognition A musical performance is assumed to travel through a sequence of states. A Hidden Markov Model (HMM – a probabilistic model well suited to sequences of data, here sequences of chords over time) is used to model these transitions and predict the best chord sequence given a set of observations (PCPs). Transition model – probability of transitioning from one chord to another. Output model – probability of a PCP given a chord. Casey et al IEEE 2008
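To make the decoding step concrete, here is a toy Viterbi sketch, not the system described in the paper: given log transition probabilities between chords and per-frame log likelihoods of each chord (the output-model scores for the observed PCPs), it returns the most probable chord sequence. The two-chord model and all probabilities are invented.

```python
# Viterbi decoding of a chord sequence from an HMM (toy example).
import numpy as np

def viterbi(log_trans, log_obs, log_prior):
    """log_trans: (C, C) chord-to-chord; log_obs: (T, C) chord likelihood per frame."""
    T, C = log_obs.shape
    score = log_prior + log_obs[0]
    back = np.zeros((T, C), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + log_trans              # best previous chord for each current chord
        back[t] = np.argmax(cand, axis=0)
        score = cand[back[t], np.arange(C)] + log_obs[t]
    path = [int(np.argmax(score))]
    for t in range(T - 1, 0, -1):                      # trace the best sequence backwards
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Two hypothetical chords; the model mildly prefers to stay on the same chord.
log_trans = np.log(np.array([[0.8, 0.2], [0.2, 0.8]]))
log_obs = np.log(np.array([[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]]))
print(viterbi(log_trans, log_obs, np.log([0.5, 0.5])))  # [0, 0, 1, 1]: stays, then switches
```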

Chord Recognition

Music Structure Segment into temporal regions with some internal consistency – Beat segmentation – Verse, chorus, bridge – Speech vs music. Uses: – Facilitate audio editing – Improve similarity measurements by removing irrelevant parts or selecting the most representative parts (e.g. for recommender systems). Casey et al IEEE 2008

Music Structure Detect repeated structures and label them as being the same.
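One common way to expose such repetitions (a sketch under the assumption that we already have one feature vector, e.g. a PCP, per beat or segment) is a self-similarity matrix: repeated sections show up as bright off-diagonal stripes. The toy “song” below uses random vectors as stand-ins for real features.

```python
# Self-similarity matrix over per-beat feature vectors (illustrative only).
import numpy as np

def self_similarity(features):
    """features: (T, D) array, one feature vector per time step."""
    norms = np.linalg.norm(features, axis=1, keepdims=True) + 1e-12
    unit = features / norms
    return unit @ unit.T          # cosine similarity between every pair of time steps

# Toy "song": section A, section B, then A repeated.
rng = np.random.default_rng(0)
A, B = rng.random((8, 12)), rng.random((8, 12))
S = self_similarity(np.vstack([A, B, A]))
print(S.shape)                    # (24, 24); the repeated A appears as a bright off-diagonal stripe
```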

Music as vector of features Once again we represent (music) documents as a vector of numbers – Each entry (or set of entries) in this vector is a different feature. To retrieve music documents given a query we can: – Find exact matches – Find the nearest match – Find nearby matches – Train a classifier to recognize a given category (genre, style, etc.).

Audio Similarity We have a description of a music document based on some set of features; now, how do we compare two descriptions? Casey et al IEEE 2008
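A minimal sketch of one common choice, cosine similarity between feature vectors, used here to rank a purely illustrative, randomly generated database against a query description.

```python
# Compare two descriptions with cosine similarity and rank a database by it.
import numpy as np

def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

rng = np.random.default_rng(1)
database = rng.random((1000, 64))           # 1000 tracks, 64-dim descriptions (stand-ins)
query = rng.random(64)

scores = np.array([cosine_similarity(query, d) for d in database])
top5 = np.argsort(scores)[::-1][:5]         # indices of the 5 most similar tracks
print(top5, scores[top5])
```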

Usage examples

Howard Leung

Query by humming Requires high-level features because matches will not be exact. Extract the melody from a dataset of songs. Extract the melody from the hum. Match by comparing the similarities of the melodies (nearby matches).
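As one illustrative way to do the melody comparison (the slides do not prescribe a specific algorithm), here is a small dynamic time warping sketch: it tolerates tempo differences and sloppy timing when matching a hummed pitch sequence against stored melodies. The note sequences are invented.

```python
# Dynamic time warping distance between two melodies (pitch sequences).
import numpy as np

def dtw_distance(a, b):
    """a, b: 1D sequences of melody pitches (e.g. MIDI note numbers)."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

song = [60, 60, 67, 67, 69, 69, 67]      # stored melody
hum = [60, 67, 67, 67, 69, 67]           # same tune, sloppier timing
other = [62, 64, 65, 62, 60, 59]         # an unrelated melody
print(dtw_distance(hum, song) < dtw_distance(hum, other))  # True: the right song is closer
```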

Copyright monitoring Compute fingerprints from database examples Compute fingerprint from query example Find exact matches
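A minimal sketch of the exact-match step, assuming each recording has already been reduced to a hashable fingerprint value (real systems derive robust hashes from the audio spectrum); the fingerprint values and track names below are hypothetical.

```python
# Exact-match fingerprint lookup (illustrative; fingerprints are made up).
fingerprint_db = {
    0xA13F9C02: "Track A",
    0x5B77D1E4: "Track B",
}

def identify(query_fingerprint):
    # Exact matches only: either the fingerprint is in the catalogue or it is not.
    return fingerprint_db.get(query_fingerprint, "no match")

print(identify(0x5B77D1E4))   # "Track B"
print(identify(0xDEADBEEF))   # "no match"
```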

Best performing systems on MIREX 2007 Casey et al IEEE 2008

Music Browsing Musicream – UI for discovering and managing musical pieces. User can select a disc and listen to it. By dragging a disc in the flow, the user can easily pick out other similar pieces (attach similar discs). This interaction allows a user to unexpectedly come across various pieces similar to other pieces the user likes. Link to demo Casey et al IEEE 2008

Music Browsing Musicrainbow – UI for discovering unknown artists. Artists are mapped on a circular rainbow where colors represent different styles of music. Similar artists are mapped near each other. User rotates rainbow by turning a knob. Link to demo Casey et al IEEE 2008

Howard Leung