Centre for Computational Creativity
Semantic Audio Studio Tools and Techniques using MPEG-7
Dr. Michael Casey
Centre for Computational Creativity, Department of Computing, City University, London
Overview
- MPEG-7 Tools
  - Low Level Audio Descriptors
  - Statistical Sound Models (Semantic ?)
- Music Unmixing
  - Independent Spectrogram Separation
- Sound Classification
  - Automatic label extraction
  - “Semantic” processing
- Segment Similarity, Structure Extraction, Musaics
  - S-Matrix (Self-Similarity Matrix)
  - C-Matrix (Cross-Similarity Matrix)
  - Segment Replacement Musaics
Semantic Audio Analysis
- Acoustic Features
- Extraction
- Semantic Audio Description

MPEG-7 Audio Descriptors: Header

MPEG-7 Audio Descriptors: Segments

MPEG-7 Audio Descriptors: Descriptor

Some Useful Descriptors for Music Processing
- AudioSpectrumEnvelopeD
- AudioSpectrumBasisD
- AudioSpectrumProjectionD
- SoundModelDS
- SoundModelStatePathD
- SoundModelStateHistogramD
EXAMPLE 1: MUSIC UNMIXING

AudioSpectrumBasisD

AudioSpectrumBasisD
- SVD / ICA Basis Rotation
- AudioSpectrumProjectionD
- AudioSpectrumBasisD

AudioSpectrumBasisD

AudioSpectrumProjectionD
- SVD / ICA Basis Rotation
- AudioSpectrumProjectionD
- AudioSpectrumBasisD

AudioSpectrumProjectionD
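The relationship between the spectral basis and its projection named on these slides can be sketched with a plain SVD. This is a minimal illustration only: the random `envelope` matrix stands in for real AudioSpectrumEnvelopeD data, the function name and the choice `k=4` are hypothetical, and the ICA rotation step and any normalisation an MPEG-7 extractor would apply are left out.

```python
import numpy as np

def spectrum_basis_and_projection(envelope, k):
    """envelope: (frames, bands) spectral envelope matrix.

    Returns (basis, projection) using the first k right singular vectors.
    Assumption: SVD only, no ICA rotation and no normalisation.
    """
    # Thin SVD: envelope = U @ diag(s) @ vt
    _, _, vt = np.linalg.svd(envelope, full_matrices=False)
    basis = vt[:k].T                # (bands, k): one spectral basis function per column
    projection = envelope @ basis   # (frames, k): one time function per basis function
    return basis, projection

rng = np.random.default_rng(0)
envelope = rng.random((200, 32))    # stand-in data: 200 frames, 32 frequency bands
basis, projection = spectrum_basis_and_projection(envelope, k=4)
print(basis.shape, projection.shape)
```

The basis columns are orthonormal, so the projection is a reduced-dimension view of each frame rather than a lossless copy of the envelope.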
Outer Product Spectrum Reconstruction: Individual Basis Component

4 Component Reconstruction

10 Component Reconstruction

Music Unmixing
- Linear basis projection using SVD and ICA
  - spectrum subspace separation
  - fast computation of subspace ICA
  - full-rate filterbank masking
- Blocked ICA functions
  - subspace reconstruction: $Y_j = X V_j V_j^{+}$
  - cluster subspaces to identify “tracks”
  - sum masked filterbank output to create audio
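The subspace reconstruction step can be illustrated as follows. The sketch assumes an SVD-only basis with orthonormal columns, in which case the pseudo-inverse of the basis equals its transpose; the random matrix `X` stands in for a real spectrogram, and the function name is hypothetical. It shows that a k-component reconstruction is the sum of k outer-product layers, and that the residual shrinks as components are added (1, 4, and 10 components, as on the reconstruction slides).

```python
import numpy as np

def subspace_layers(X, k):
    """Return the k-column basis V and one spectrogram layer per basis component."""
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    V = vt[:k].T                    # (bands, k), orthonormal columns
    V_pinv = np.linalg.pinv(V)      # (k, bands); equals V.T for an orthonormal basis
    # Layer j: outer product of the time function X @ v_j with the spectral basis row j.
    layers = [np.outer(X @ V[:, j], V_pinv[j]) for j in range(k)]
    return V, layers

rng = np.random.default_rng(1)
X = rng.random((100, 16))           # stand-in spectrogram: frames x bands

errors = {}
for k in (1, 4, 10):
    V, layers = subspace_layers(X, k)
    Y = X @ V @ np.linalg.pinv(V)   # subspace reconstruction
    assert np.allclose(Y, sum(layers))
    errors[k] = np.linalg.norm(X - Y)
print(errors)
```

Grouping the layers into "tracks" and applying them as filterbank masks, as the slide describes, is not shown here.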
Subspace Extraction
- Mixture Spectrogram
- Independent Spectrogram Subspace Layers: 1 Component, 4 Components, 10 Components
- Spectrogram Layer = Spectral Basis, Time Function

Music Unmixing Example (Pink Floyd: mono -> 9 subspace tracks)
EXAMPLE 2: AUTOMATIC AUDIO CLASSIFICATION

Sound Model DS and related descriptors
- ContinuousHiddenMarkovModelDS
- SoundModelStatePathD
- AudioSpectrumBasisD
- T(i,j) x
- AudioSpectrumEnvelopeD
- AudioSpectrumProjectionD

Sound Recognition using HMMs
- Trained HMMs
- Sound Database
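Recognition with a bank of trained HMMs is commonly done by scoring an observation sequence against every model and picking the most likely one. The sketch below assumes diagonal-covariance Gaussian emissions and uses the forward algorithm in the log domain; the two toy models, their class names (`class_a`, `class_b`), and all parameter values are hypothetical and are not taken from the slides.

```python
import numpy as np

def log_gauss(obs, means, variances):
    """Log density of each frame under each state: obs (T, D) -> (T, S)."""
    diff = obs[:, None, :] - means[None, :, :]
    return -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                   + np.sum(np.log(2 * np.pi * variances), axis=1))

def logsumexp(a, axis):
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))

def hmm_loglik(obs, init, trans, means, variances):
    """Forward algorithm in the log domain; returns log p(obs | model)."""
    log_b = log_gauss(obs, means, variances)
    alpha = np.log(init) + log_b[0]
    for t in range(1, len(obs)):
        # alpha[i] + log trans[i, j], summed over the previous state i
        alpha = logsumexp(alpha[:, None] + np.log(trans), axis=0) + log_b[t]
    return logsumexp(alpha, axis=0)

def classify(obs, models):
    """Return the name of the best-scoring model and all scores."""
    scores = {name: hmm_loglik(obs, *params) for name, params in models.items()}
    return max(scores, key=scores.get), scores

init = np.array([0.5, 0.5])
trans = np.array([[0.9, 0.1], [0.1, 0.9]])
var = np.ones((2, 2))
models = {  # hypothetical two-state models over 2-D features
    "class_a": (init, trans, np.array([[0.0, 0.0], [1.0, 1.0]]), var),
    "class_b": (init, trans, np.array([[5.0, 5.0], [6.0, 6.0]]), var),
}
rng = np.random.default_rng(2)
obs = rng.normal(loc=0.0, scale=0.5, size=(50, 2))   # frames near class_a's first state
label, scores = classify(obs, models)
print(label)
```

In the setting of the slides, the observation frames would be AudioSpectrumProjectionD vectors rather than synthetic data.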
MPEG-7: Intelligent Music Browsing
Music Genre Classification
[table: Class Name, Num of Files, Num Segments; numeric values missing]
1) Blues
2) hiphop
3) Gospel
4) Country
5) DrumNBass
6) Classical
7) 2Step
8) Merengue
9) Reggae
10) Salsa
Totals
Music Genre Classification

Semantic Audio: General Sound Taxonomy

DS: General Audio Classification
EXAMPLE 3: STRUCTURE EXTRACTION

Structure Discovery
- Acoustic Features
- State-Space Models
- Hierarchical Structure Discovery

SoundModelStatePathD
- State Path
- A simplified representation of spectral dynamics

SoundModelStateHistogramD
- Axis labels: seconds, state index, 0.01s Frames
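A state histogram can be derived from a state path by counting how often each state is occupied. The sketch below assumes the histogram is a normalised relative frequency and that segments are fixed-length, non-overlapping windows; the function names, the toy path, and the segment length are hypothetical.

```python
import numpy as np

def state_histogram(state_path, num_states):
    """Relative frequency of each state index in a state path."""
    counts = np.bincount(state_path, minlength=num_states).astype(float)
    return counts / counts.sum()

def segment_histograms(state_path, num_states, seg_len):
    """One histogram per non-overlapping segment of seg_len frames."""
    n = len(state_path) // seg_len
    return np.array([
        state_histogram(state_path[i * seg_len:(i + 1) * seg_len], num_states)
        for i in range(n)
    ])

path = np.array([0, 0, 1, 1, 1, 2, 0, 0, 2, 2])   # toy state path, one index per frame
print(state_histogram(path, num_states=4))        # whole-path histogram
print(segment_histograms(path, num_states=4, seg_len=5))
```

With 0.01 s frames, a `seg_len` of 100 would correspond to one-second segments.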
High-Level Structure Discovery

S-Matrix
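A self-similarity matrix compares every segment of a song with every other segment of the same song. The sketch below assumes each segment is described by a state histogram and uses cosine similarity as the measure; the choice of cosine similarity, the function name, and the toy histograms are assumptions made for illustration.

```python
import numpy as np

def self_similarity(hists):
    """hists: (segments, states), each row a non-zero histogram.

    Returns the (segments, segments) matrix of cosine similarities.
    """
    unit = hists / np.linalg.norm(hists, axis=1, keepdims=True)
    return unit @ unit.T

# Toy histograms: segments 0 and 2 are alike, segments 1 and 3 are alike.
hists = np.array([
    [0.8, 0.2, 0.0],
    [0.1, 0.1, 0.8],
    [0.7, 0.3, 0.0],
    [0.0, 0.2, 0.8],
])
S = self_similarity(hists)
print(np.round(S, 2))
```

Blocks of high similarity away from the diagonal indicate repeated material, which is what a segmentation step would then look for.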
STRUCTURE EXTRACTION == SEGMENTATION

Structure Discovery
- Low level features
- High-level Structure
- Acoustic Features
- State-Space Models
- Hierarchical Structure Discovery
High-Level Structure Discovery: Alanis Morissette
- Human Segmentation
- Machine Segmentation
High-Level Structure Discovery: Cranberries
- Human Segmentation
- Machine Segmentation

High-Level Structure Discovery: Nirvana
- Human Segmentation
- Machine Segmentation

High-Level Structure Discovery
EXAMPLE 4: MUSAICS

Musaics (Music Mosaics)
- C-Matrix: Cross-Song Similarity Matrix
  - Outer product of target and source histograms
- Find segments similar to target segment
  - Similarity between all target and database segments
  - SORT columns of similarity matrix
- Replace segments with similar material
  - Segmentation boundaries (beat alignment)
  - Replace with “best fit” using DTW on most similar segments
- EXAMPLES
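The matching step listed above can be sketched as a cross-similarity matrix followed by a sort. The sketch assumes rows index target segments and columns index source (database) segments, and it normalises the histogram products to cosine similarity; the orientation, the normalisation, the function names, and the toy histograms are all assumptions.

```python
import numpy as np

def cross_similarity(target_hists, source_hists):
    """C[i, j] = cosine similarity of target segment i and source segment j."""
    t = target_hists / np.linalg.norm(target_hists, axis=1, keepdims=True)
    s = source_hists / np.linalg.norm(source_hists, axis=1, keepdims=True)
    return t @ s.T

def rank_sources(c_matrix, top_k):
    """For each target segment, source indices sorted from most to least similar."""
    order = np.argsort(-c_matrix, axis=1)
    return order[:, :top_k]

targets = np.array([[0.8, 0.2, 0.0],
                    [0.0, 0.2, 0.8]])
sources = np.array([[0.1, 0.1, 0.8],
                    [0.7, 0.3, 0.0],
                    [0.3, 0.4, 0.3]])
C = cross_similarity(targets, sources)
ranked = rank_sources(C, top_k=2)
print(ranked)
```

The top-ranked candidates per target segment are what a later alignment step would choose between when replacing material.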
Musaics
- Target
- Extract MPEG-7
- Database
- StatePathHistograms
- Segment Beats
- Match
- Replace
- Musaic
Musaics
- New Content by Similarity Replacement
- C-Matrix: Cross-Song Similarity Map
  - 1 Target, Many Sources
- Constraints
  - Preserve Rhythm by Beat Tracking
  - Preserve Beats by DTW alignment
- Bigger Source Database == Better
  - Greater Number of Accurate Matches
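DTW (dynamic time warping) alignment between a target segment and a candidate source segment can be sketched with the textbook dynamic-programming recurrence. The sketch assumes Euclidean frame distance and the three standard step directions; the function name and the toy sequences are hypothetical, and the slides do not state which DTW variant or features were used.

```python
import numpy as np

def dtw(a, b):
    """Align sequences a (n, d) and b (m, d).

    Returns (total_cost, path), where path is a list of (i, j) frame pairs.
    """
    n, m = len(a), len(b)
    dist = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = dist[i - 1, j - 1] + min(
                acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    # Backtrack from the end to recover the warping path.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin([acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return float(acc[n, m]), path[::-1]

target = np.array([[0.0], [1.0], [2.0], [3.0]])
source = np.array([[0.0], [1.0], [1.0], [2.0], [3.0]])   # same shape, one frame held longer
cost, path = dtw(target, source)
print(cost, path)
```

Among several similar candidates, the one with the lowest alignment cost would be a natural "best fit" for replacement.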
Acknowledgements
- International Standards Organisation
  - ISO/IEC JTC 1 SC29 WG11 (MPEG)
- Mitsubishi Electric Research Labs
- Massachusetts Institute of Technology
  - Music Mind Machine Group (formerly Machine Listening Group)
  - Paris Smaragdis, Youngmoo Kim, Brian Whitman
  - Iroro Orife, John Hershey, Alex Westner, Kevin Wilson
- City University
  - Department of Computing
  - Centre for Computational Creativity