Semantic Audio Studio Tools and Techniques using MPEG-7. Dr. Michael Casey, Centre for Computational Creativity, Department of Computing, City University, London.

Presentation transcript:

Centre for Computational Creativity Semantic Audio Studio Tools and Techniques using MPEG-7 Dr. Michael Casey Centre for Computational Creativity Department of Computing City University, London

Overview. MPEG-7 Tools: Low Level Audio Descriptors; Statistical Sound Models. (Semantic?) Music Unmixing: Independent Spectrogram Separation. Sound Classification: Automatic label extraction. “Semantic” processing: Segment Similarity; Structure Extraction. Musaics: S-Matrix (Self-Similarity Matrix); C-Matrix (Cross-Similarity Matrix); Segment Replacement Musaics.

Semantic Audio Analysis: Acoustic Features Extraction, Semantic Audio Description

MPEG-7 Audio Descriptors: Header

MPEG-7 Audio Descriptors: Segments

MPEG-7 Audio Descriptors: Descriptor

Some Useful Descriptors for Music Processing: AudioSpectrumEnvelopeD, AudioSpectrumBasisD, AudioSpectrumProjectionD, SoundModelDS, SoundModelStatePathD, SoundModelStateHistogramD

EXAMPLE 1: MUSIC UNMIXING

AudioSpectrumBasisD

AudioSpectrumBasisD (diagram): SVD / ICA Basis Rotation; AudioSpectrumProjectionD; AudioSpectrumBasisD

AudioSpectrumBasisD

AudioSpectrumProjectionD (diagram): SVD / ICA Basis Rotation; AudioSpectrumProjectionD; AudioSpectrumBasisD

AudioSpectrumProjectionD

Outer Product Spectrum Reconstruction: Individual Basis Component

4 Component Reconstruction

10 Component Reconstruction

Music Unmixing. Linear basis projection using SVD and ICA: spectrum subspace separation; fast computation of subspace ICA; full-rate filterbank masking. Blocked ICA functions: subspace reconstruction $Y_j = X V_j V_j^{+}$; cluster subspaces to identify “tracks”; sum masked filterbank output to create audio.
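
The subspace reconstruction step on this slide can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the author's implementation: it uses a plain SVD basis (the ICA rotation and the filterbank masking are omitted), it assumes the spectrogram `X` is stored as frames by frequency bins, and the function name `subspace_layers` and the grouping of basis functions into "tracks" are hypothetical.

```python
import numpy as np

def subspace_layers(X, n_components, groups):
    """Split a spectrogram into subspace layers Y_j = X V_j V_j^+.

    X            : (frames, bins) magnitude spectrogram (assumed layout)
    n_components : number of SVD basis functions to keep
    groups       : list of index lists; each list selects the basis
                   functions that make up one layer (hypothetical grouping)
    """
    # Rows of Vt are spectral basis functions, ordered by singular value.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    V = Vt[:n_components].T                    # (bins, n_components)
    layers = []
    for idx in groups:
        Vj = V[:, idx]                         # basis functions of layer j
        # Project onto the subspace and map back to the spectral domain.
        layers.append(X @ Vj @ np.linalg.pinv(Vj))
    return layers

# Toy data standing in for a real spectrogram.
rng = np.random.default_rng(0)
X = np.abs(rng.normal(size=(200, 64)))
layers = subspace_layers(X, 4, [[0], [1], [2, 3]])
Y = sum(layers)                                # 4-component reconstruction
```

Because the SVD basis is orthonormal, the pseudo-inverse of `V_j` equals its transpose, and the layers for a partition of the kept components add up to the rank-limited reconstruction of `X`.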

Subspace Extraction (figure): Mixture Spectrogram; Independent Spectrogram Subspace Layers (1 Component, 4 Components, 10 Components); Spectral Basis, Time Function, Spectrogram Layer

Music Unmixing Example (Pink Floyd: mono -> 9 subspace tracks)

EXAMPLE 2: AUTOMATIC AUDIO CLASSIFICATION

Sound Model DS and related descriptors (diagram): ContinuousHiddenMarkovModelDS; SoundModelStatePathD; AudioSpectrumBasisD; T(i,j); AudioSpectrumEnvelopeD; AudioSpectrumProjectionD

Sound Recognition using HMMs: Trained HMMs, Sound Database
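
Recognition with a bank of trained HMMs is commonly done by scoring the observed feature sequence against every model and picking the most likely one. The sketch below shows that idea with the forward algorithm in the log domain. It is an assumption-laden illustration rather than the method used in the talk: the diagonal-Gaussian emissions, the dictionary layout of a model, the function names and the toy parameters are all hypothetical.

```python
import numpy as np

def log_gauss(obs, means, variances):
    """Log density of each frame under each state's diagonal Gaussian.

    obs: (T, D); means, variances: (K, D). Returns an array of shape (T, K).
    """
    diff = obs[:, None, :] - means[None, :, :]
    terms = np.log(2 * np.pi * variances)[None] + diff ** 2 / variances[None]
    return -0.5 * terms.sum(axis=2)

def log_likelihood(obs, model):
    """Forward algorithm in the log domain: log p(obs | model)."""
    log_b = log_gauss(obs, model["means"], model["vars"])
    log_A = np.log(model["trans"])
    alpha = np.log(model["start"]) + log_b[0]
    for t in range(1, len(obs)):
        # Sum over previous states in the log domain, then emit frame t.
        alpha = np.logaddexp.reduce(alpha[:, None] + log_A, axis=0) + log_b[t]
    return np.logaddexp.reduce(alpha)

def classify(obs, models):
    """Return the name of the model with the highest likelihood."""
    return max(models, key=lambda name: log_likelihood(obs, models[name]))

def toy_model(means):
    # Two-state model with made-up parameters, for illustration only.
    return {
        "start": np.array([0.5, 0.5]),
        "trans": np.array([[0.9, 0.1], [0.1, 0.9]]),
        "means": np.array(means, dtype=float),
        "vars": np.ones((2, 1)),
    }

models = {"low": toy_model([[0.0], [1.0]]), "high": toy_model([[5.0], [6.0]])}
obs = np.array([[0.1], [0.9], [1.1], [0.0]])
label = classify(obs, models)   # "low" for this toy sequence
```

In practice the observations would be AudioSpectrumProjectionD vectors and the models would be trained per sound class; neither step is shown here.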

MPEG-7: Intelligent Music Browsing

Music Genre Classification. Table columns: Class Name, Num of Files, Num Segments. Classes: Blues, hiphop, Gospel, Country, DrumNBass, Classical, 2Step, Merengue, Reggae, Salsa; Totals. (The counts did not survive extraction.)

Music Genre Classification

Semantic Audio: General Sound Taxonomy

DS: General Audio Classification

EXAMPLE 3: STRUCTURE EXTRACTION

Structure Discovery: Acoustic Features; State-Space Models; Hierarchical Structure Discovery

SoundModelStatePathD: State Path. A simplified representation of spectral dynamics.

SoundModelStateHistogramD (figure; axes: seconds, state index; 0.01 s frames)
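
A state histogram summarises a state path by the relative time spent in each state. The following is a minimal sketch of that computation, assuming the state path is a sequence of integer state indices, one per frame; the function names and the fixed-length segmentation are hypothetical and are not taken from the MPEG-7 reference software.

```python
import numpy as np

def state_histogram(state_path, n_states):
    """Normalised occupancy of each state over a state path."""
    counts = np.bincount(np.asarray(state_path), minlength=n_states)
    return counts / counts.sum()

def segment_histograms(state_path, n_states, seg_len):
    """One histogram per consecutive block of seg_len frames."""
    path = np.asarray(state_path)
    starts = range(0, len(path), seg_len)
    return np.array([state_histogram(path[s:s + seg_len], n_states)
                     for s in starts])

path = [0, 0, 1, 2, 2, 2, 1, 0]          # toy state path, one index per frame
hist = state_histogram(path, n_states=4)
# hist -> [0.375, 0.25, 0.375, 0.0]
H = segment_histograms(path, n_states=4, seg_len=4)
```

With 0.01 s frames, a segment length of 100 frames would give one histogram per second of audio.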

High-Level Structure Discovery

S-Matrix
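
An S-Matrix compares every segment of a piece with every other segment of the same piece. The sketch below builds one from per-segment state histograms. The similarity measure is an assumption: the slides do not name one here, so cosine similarity (the inner product of length-normalised histograms) is used as a stand-in, and `self_similarity` is a hypothetical name.

```python
import numpy as np

def self_similarity(H):
    """Self-similarity matrix of segment histograms.

    H: (n_segments, n_states), one state histogram per segment.
    Returns S with S[i, j] = cosine similarity of segments i and j.
    """
    norms = np.linalg.norm(H, axis=1, keepdims=True)
    U = H / np.maximum(norms, 1e-12)     # guard against empty segments
    return U @ U.T

# Toy histograms: segments 0 and 2 are identical, segment 1 is different.
H = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.0, 1.0],
              [0.5, 0.5, 0.0]])
S = self_similarity(H)
```

Blocks of high similarity away from the main diagonal indicate repeated material, which is what makes the matrix useful for segmentation.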

STRUCTURE EXTRACTION == SEGMENTATION

Structure Discovery: Low level features; High-level Structure; Acoustic Features; State-Space Models; Hierarchical Structure Discovery

Alanis Morissette: Human Segmentation; Machine Segmentation; High-Level Structure Discovery

Cranberries: Human Segmentation; Machine Segmentation; High-Level Structure Discovery

Nirvana: Human Segmentation; Machine Segmentation; High-Level Structure Discovery

High-Level Structure Discovery

EXAMPLE 4: MUSAICS

Musaics (Music Mosaics). C-Matrix: Cross-Song Similarity Matrix; outer product of target and source histograms. Find segments similar to target segment: similarity between all target and database segments; SORT columns of similarity matrix. Replace segments with similar material: segmentation boundaries (beat alignment); replace with “best fit” using DTW on most similar segments. EXAMPLES
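
The matching step described on this slide can be sketched as a cross-similarity matrix followed by a per-column sort. This is an illustration under assumptions, not the original code: histograms are length-normalised before taking products, the matrix is laid out with one column per target segment, and the function names and toy data are hypothetical.

```python
import numpy as np

def unit_rows(H):
    """Scale each histogram (row) to unit length."""
    norms = np.linalg.norm(H, axis=1, keepdims=True)
    return H / np.maximum(norms, 1e-12)

def cross_similarity(source_H, target_H):
    """C-Matrix: C[s, t] = similarity of source segment s and target segment t."""
    return unit_rows(source_H) @ unit_rows(target_H).T

def rank_sources(C):
    """Sort every column: row 0 holds the best source index per target segment."""
    return np.argsort(-C, axis=0)

target_H = np.array([[1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0]])
source_H = np.array([[0.0, 0.9, 0.1],
                     [0.8, 0.2, 0.0],
                     [0.0, 0.0, 1.0]])
C = cross_similarity(source_H, target_H)    # shape (3 sources, 2 targets)
ranking = rank_sources(C)
best = ranking[0]                           # best source for each target segment
```

The top few rows of `ranking` give a shortlist of candidate segments per target segment; choosing the final replacement from that shortlist is a separate alignment step.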

Musaics (flow diagram): Target; Extract MPEG-7; Database; StatePathHistograms; Segment Beats; Match; Replace; Musaic

Musaics

Musaics: New Content by Similarity Replacement. C-Matrix: Cross-Song Similarity Map; 1 Target, Many Sources. Constraints: Preserve Rhythm by Beat Tracking; Preserve Beats by DTW alignment. Bigger Source Database == Better: Greater Number of Accurate Matches.
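
Dynamic time warping (DTW) aligns two sequences of different lengths by finding the cheapest monotonic path through their pairwise distance matrix. The sketch below is a textbook DTW with Euclidean frame distances, offered as an illustration of the "best fit" selection rather than the implementation behind the examples; the function names, the step pattern and the toy sequences are assumptions.

```python
import numpy as np

def dtw(a, b):
    """Return (total cost, alignment path) between sequences a and b.

    a, b: sequences of frames, either 1-D values or (frames, dims) arrays.
    """
    a = np.asarray(a, dtype=float).reshape(len(a), -1)
    b = np.asarray(b, dtype=float).reshape(len(b), -1)
    n, m = len(a), len(b)
    dist = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = dist[i - 1, j - 1] + min(D[i - 1, j - 1],
                                               D[i - 1, j],
                                               D[i, j - 1])
    # Backtrack from the end to recover the warping path.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return D[n, m], path[::-1]

def best_fit(target, candidates):
    """Index of the candidate segment with the lowest DTW cost."""
    costs = [dtw(target, c)[0] for c in candidates]
    return int(np.argmin(costs))

target = [0, 1, 2, 3]
candidates = [[3, 2, 1, 0], [0, 1, 1, 2, 3]]
cost, path = dtw(target, candidates[1])
choice = best_fit(target, candidates)
```

The second candidate is a time-stretched copy of the target, so it aligns at zero cost and is selected; the first is the reversed sequence and aligns poorly.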

Acknowledgements. International Standards Organisation ISO/IEC JTC 1 SC29 WG11 (MPEG); Mitsubishi Electric Research Labs; Massachusetts Institute of Technology, Music Mind Machine Group (formerly Machine Listening Group): Paris Smaragdis, Youngmoo Kim, Brian Whitman, Iroro Orife, John Hershey, Alex Westner, Kevin Wilson; City University, Department of Computing, Centre for Computational Creativity.