
Informatik og Matematisk Modellering / Intelligent Signalbehandling (Informatics and Mathematical Modelling / Intelligent Signal Processing). Kaare Brandt Petersen: Machine Learning on Sound... how hard can it be? Audio Information Seminar.





2. Machine Learning on Sound... how hard can it be?
Audio Information Seminar, Thursday, June 8, 2006
Kaare Brandt Petersen

3. Agenda
- Motivation
- The reason it might be hard:
  - From data to information
  - Features
- The good news:
  - Computer power and machine learning
  - Examples
- Conclusions

4. Motivation
What can we do with audio information?
- News archive: find the grumpy voice in a TV broadcast from a busy street in the Middle East; search in news archives.
- Music: 6 billion friends; navigating the world landscape of music.

5. Data
Sound as perceived by humans and by computers. Annotated clip from 12 Monkeys (movie from 1995), with one track for sound events and one for dialogue:
[Beeps] "There's the television" [Music: violins] [Steps] "It's all right there" "All right there!" "Look. Listen. Kneel. Pray." "Commercials!" [Male voice, indoor]

6. Data
Is the data-to-information translation really necessary? Three ways to query an archive:
1) Query by signal processing [humans learn how computers think], e.g. "ZCR < 198"
2) Query by information [computers learn how humans think], e.g. "happy jazz"
3) Query by example [various approaches]
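A minimal sketch of the first two query styles, in Python with NumPy. The archive layout (a dict of clips, each with a precomputed `zcr` value and a set of `tags`), the clip names and the ZCR unit (zero crossings per second) are assumptions for illustration; the slide does not say what unit its threshold of 198 uses.

```python
import numpy as np

def zero_crossing_rate(x, sr):
    """Zero crossings per second of a mono signal x sampled at sr Hz."""
    signs = np.signbit(x)
    crossings = np.count_nonzero(signs[1:] != signs[:-1])
    return crossings * sr / len(x)

def query_by_signal_processing(archive, max_zcr):
    # "Humans learn how computers think": the query is a threshold on a feature.
    return [name for name, clip in archive.items() if clip["zcr"] < max_zcr]

def query_by_information(archive, words):
    # "Computers learn how humans think": the query uses labels such as "happy jazz",
    # which the system must already have attached to each clip.
    wanted = set(words)
    return [name for name, clip in archive.items() if wanted <= clip["tags"]]

# Hypothetical two-clip archive built from synthetic tones (1 s at 8 kHz).
sr = 8000
t = np.arange(sr) / sr
archive = {
    "low_hum": {"zcr": zero_crossing_rate(np.sin(2 * np.pi * 50 * t + 0.1), sr),
                "tags": {"calm", "ambient"}},
    "bright_tone": {"zcr": zero_crossing_rate(np.sin(2 * np.pi * 3000 * t + 0.1), sr),
                    "tags": {"happy", "jazz"}},
}
print(query_by_signal_processing(archive, max_zcr=198))  # ['low_hum']
print(query_by_information(archive, "happy jazz".split()))  # ['bright_tone']
```

The contrast is the point of the slide: the first query only makes sense to someone who knows what a zero crossing rate is, while the second needs the labels to exist before anyone can search.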

7. Data
Going from 5 million real numbers to "Opera"
Bridging the gap from data to information: constructing sound features the right way.
[Figure labels: Information, Meaning, Context]

8. Features
Many short-time features:
- Zero crossing rate (ZCR)
- Spectral flatness, spectral bandwidth, spectral centroid, spectral rolloff, spectral flux
- Energy
- Mel Frequency Cepstral Coefficients (MFCC) [Foote97, Rabiner93]
- Real Cepstral Coefficients (RCC)
- Linear Prediction Coefficients (LPC)
- Wavelets
- Gammatone filterbanks
- Sone / Bark
- Chroma features
- ...
[Figure: waveform, spectrogram, ZCR, spectral flatness, spectral bandwidth, spectral centroid, MFCC 1, MFCC 2-7 and chroma for the 12 Monkeys sound clip]
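A NumPy sketch of several of the spectral short-time features listed above, computed frame by frame. The frame length, hop, Hann window, 85% rolloff point and the exact definitions (magnitude-weighted centroid and bandwidth, power-spectrum flatness, flux on normalised magnitude spectra) are common choices assumed here; definitions vary between papers, and MFCC, LPC, chroma and the remaining features are not covered.

```python
import numpy as np

def frame_signal(x, frame_len=1024, hop=512):
    """Slice a 1-D signal into overlapping frames, shape (n_frames, frame_len)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def short_time_features(x, sr, frame_len=1024, hop=512, rolloff_pct=0.85):
    """Return a dict of per-frame features, each an array of length n_frames."""
    eps = 1e-12
    frames = frame_signal(x, frame_len, hop)
    mag = np.abs(np.fft.rfft(frames * np.hanning(frame_len), axis=1))  # (n_frames, n_bins)
    power = mag ** 2
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sr)                     # bin centres in Hz
    total = mag.sum(axis=1) + eps

    signs = np.signbit(frames)
    zcr = np.mean(signs[:, 1:] != signs[:, :-1], axis=1)               # crossings per sample
    energy = np.mean(frames ** 2, axis=1)
    centroid = (mag * freqs).sum(axis=1) / total                       # spectral centre of mass, Hz
    bandwidth = np.sqrt((mag * (freqs - centroid[:, None]) ** 2).sum(axis=1) / total)
    # Flatness: geometric mean / arithmetic mean of the power spectrum (higher = more noise-like).
    flatness = np.exp(np.mean(np.log(power + eps), axis=1)) / (np.mean(power, axis=1) + eps)
    # Rolloff: lowest frequency below which rolloff_pct of the magnitude sum lies.
    cumulative = np.cumsum(mag, axis=1)
    rolloff = freqs[np.argmax(cumulative >= rolloff_pct * cumulative[:, -1:], axis=1)]
    # Flux: Euclidean change between consecutive normalised spectra (first frame set to 0).
    norm = mag / total[:, None]
    flux = np.r_[0.0, np.sqrt((np.diff(norm, axis=0) ** 2).sum(axis=1))]
    return {"zcr": zcr, "energy": energy, "centroid": centroid, "bandwidth": bandwidth,
            "flatness": flatness, "rolloff": rolloff, "flux": flux}

sr = 16000
t = np.arange(sr) / sr
feats_tone = short_time_features(np.sin(2 * np.pi * 1000 * t), sr)
feats_noise = short_time_features(np.random.default_rng(0).standard_normal(sr), sr)
print(feats_tone["centroid"].mean(), feats_noise["flatness"].mean())
```

A pure 1 kHz tone should give a centroid close to 1000 Hz and a flatness near zero, while white noise gives a much higher flatness and zero crossing rate.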

9. Features
Aggregating short-time features: an audio clip becomes a data cloud of feature vectors.
- Distribution of values:
  - Basic statistics [Wold96]
  - Histograms and vector quantization [Foote97]
  - Gaussian Mixture Models [Auc02]
  - K-means clustering [Logan01]
  - Anchors by Neural Networks [Beren03]
- Temporal modelling:
  - SVD of e.g. the spectrogram [Gu04]
  - AR coefficients [Meng05]
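A sketch of two of the aggregation strategies, assuming the short-time features arrive as a NumPy array of shape (n_frames, n_dims): basic statistics (mean and variance per dimension) and a per-dimension ("diagonal") autoregressive model fitted by least squares. The diagonal AR variant, the intercept term and the model order are simplifications chosen for illustration, not necessarily the formulation used in [Meng05].

```python
import numpy as np

def basic_stats(features):
    """(n_frames, n_dims) -> mean and variance of every dimension, shape (2 * n_dims,)."""
    return np.concatenate([features.mean(axis=0), features.var(axis=0)])

def diagonal_ar_features(features, order=3):
    """Fit x_t = c + a_1 x_{t-1} + ... + a_P x_{t-P} + e_t to each feature dimension
    separately by least squares. Returns [c, a_1..a_P, residual variance] per
    dimension, concatenated: shape ((order + 2) * n_dims,)."""
    n, d = features.shape
    out = []
    for j in range(d):
        x = features[:, j]
        # Columns: intercept, then x delayed by 1..P frames, aligned with the target x[order:].
        lagged = [x[order - p:n - p] for p in range(1, order + 1)]
        A = np.column_stack([np.ones(n - order)] + lagged)
        y = x[order:]
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        out.append(np.append(coef, np.var(y - A @ coef)))
    return np.concatenate(out)

# Synthetic 2-D feature trajectory: two independent AR(1) processes.
rng = np.random.default_rng(0)
true_a = np.array([0.8, -0.5])
x = np.zeros((5000, 2))
noise = rng.standard_normal((5000, 2))
for t in range(1, len(x)):
    x[t] = true_a * x[t - 1] + noise[t]
print(basic_stats(x).shape)                        # (4,)
print(diagonal_ar_features(x, order=1)[[1, 4]])    # close to [0.8, -0.5]
```

The basic statistics ignore the order of the frames entirely; the AR coefficients describe how each feature evolves over time, which is the "temporal modelling" branch of the slide.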

10. Features
What we are trying to do: from data to information
Data -> Low-level features (ZCR, spectral, MFCC, chroma, Sone/Bark, RCC, LPC, ...) -> High-level features (basic stats, GMM, k-means, anchors, AR coefficients, SVD, HMM, ...) -> Information ("Rough", "Deep", "Sparky", "Broad", "Melancholic", "Majestic", "Jazz", "Rock", ...)

11. Features
Music similarity example:
- "Shape of My Heart", Backstreet Boys, 2000
- "That's the Way It Is", Celine Dion, 2000
- "Cantaloop", Us3, 1993
"The limitations observed in this paper (...) suggests that the usual route to timbre similarity may not be the optimal one" [Auc04]
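Timbre similarity in this line of work typically models each song's cloud of MFCC frames with a probability density and compares the densities. A simplified sketch, assuming one full-covariance Gaussian per song instead of the Gaussian mixture models of [Auc02], with the closed-form symmetrised Kullback-Leibler divergence as the distance; random vectors stand in for real MFCC frames.

```python
import numpy as np

def fit_gaussian(frames):
    """Mean and (slightly regularised) covariance of an (n_frames, n_dims) array."""
    mu = frames.mean(axis=0)
    cov = np.cov(frames, rowvar=False) + 1e-6 * np.eye(frames.shape[1])
    return mu, cov

def kl_gaussian(p, q):
    """KL(p || q) between two multivariate Gaussians given as (mean, covariance)."""
    mu0, s0 = p
    mu1, s1 = q
    s1_inv = np.linalg.inv(s1)
    diff = mu1 - mu0
    _, logdet0 = np.linalg.slogdet(s0)
    _, logdet1 = np.linalg.slogdet(s1)
    return 0.5 * (np.trace(s1_inv @ s0) + diff @ s1_inv @ diff
                  - mu0.size + logdet1 - logdet0)

def timbre_distance(frames_a, frames_b):
    """Symmetrised KL divergence between single-Gaussian models of two songs."""
    ga, gb = fit_gaussian(frames_a), fit_gaussian(frames_b)
    return kl_gaussian(ga, gb) + kl_gaussian(gb, ga)

rng = np.random.default_rng(1)
song_a = rng.standard_normal((2000, 5))
song_b = rng.standard_normal((2000, 5))
song_c = rng.standard_normal((2000, 5)) + 3.0
print(timbre_distance(song_a, song_b) < timbre_distance(song_a, song_c))  # True
```

Such a model discards the order of the frames, so it captures the overall "sound" of a song rather than its rhythm or structure, one reason the quoted paper questions this route.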

12. The bad news
- Sound data is far from the information
- Not all features are useful
- It is not obvious what the information labels should be

13. The good news
- Computer power
- Signal processing: strong development in signal processing and machine learning in general
- Large amounts of data
- Increased interest in sound and music processing

14. Example: Genre estimation
Genre estimation by temporal integration. Peter Ahrendt, Anders Meng [Meng05]
Processing: Sound -> MFCC -> AR
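The first stage of this pipeline, Sound -> MFCC, can be sketched in NumPy as below. The frame size, hop, Hamming window, 26 triangular mel filters, the mel formula 2595 log10(1 + f/700) and the choice of 13 coefficients are common textbook settings assumed for illustration, not the configuration used in [Meng05].

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters with centres equally spaced on the mel scale, shape (n_filters, n_fft//2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, centre, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, centre):          # rising edge
            fb[i - 1, k] = (k - left) / max(centre - left, 1)
        for k in range(centre, right):         # falling edge
            fb[i - 1, k] = (right - k) / max(right - centre, 1)
    return fb

def mfcc(x, sr, n_fft=512, hop=256, n_filters=26, n_coeffs=13):
    """MFCCs of a mono signal, shape (n_frames, n_coeffs)."""
    n_frames = 1 + (len(x) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = x[idx] * np.hamming(n_fft)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2 / n_fft
    log_mel = np.log(power @ mel_filterbank(n_filters, n_fft, sr).T + 1e-10)
    # Orthonormal DCT-II along the filter axis, keeping the first n_coeffs terms.
    n = np.arange(n_filters)
    k = np.arange(n_coeffs)[:, None]
    dct = np.sqrt(2.0 / n_filters) * np.cos(np.pi * k * (2 * n + 1) / (2 * n_filters))
    dct[0] /= np.sqrt(2.0)
    return log_mel @ dct.T

sr = 16000
x = np.random.default_rng(0).standard_normal(sr)   # 1 s of white noise as a stand-in signal
coeffs = mfcc(x, sr)
print(coeffs.shape)  # (61, 13)
```

Because of the log, a change in overall gain only shifts all log-mel energies by the same constant, and the DCT puts that constant entirely into the first coefficient; the remaining coefficients describe the spectral shape.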

15. Example: Genre estimation
Genre estimation by temporal integration + kernel methods. Jeronimo Arenas-Garcia, Tue Lehn-Schiøler, Kaare Brandt Petersen [ArGa06]
Processing: Sound -> MFCC -> AR -> KOPLS (kernel orthonormal partial least squares)
By the way: a data harvesting tool is coming up at ISMIR 2006.

16. Example: Source separation
Spectrogram modelling with sparse NTF2D (non-negative tensor factor double deconvolution). Morten Mørup, Mikkel Schmidt [Mørup06]
- W = time-frequency patterns
- H = time, amplitude, pitch
[Audio: original (mixed) signal and separated sources (harp, flute)]
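Sparse NTF2D models time-frequency patterns that can shift in time and pitch, plus a sparsity constraint, which is beyond a short example. As a much simpler stand-in, the sketch below uses plain non-negative matrix factorisation (NMF) with the Kullback-Leibler multiplicative updates of Lee and Seung: here W holds one spectral template per source and H its activation over time, with no shifts and no sparsity. Each source is then recovered by soft masking of the mixture. The toy "spectrogram" is synthetic.

```python
import numpy as np

def nmf_kl(V, k, n_iter=500, seed=0, eps=1e-12):
    """Factorise a non-negative (n_freq, n_time) matrix V ~ W @ H by minimising
    the generalised KL divergence with multiplicative updates."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], k)) + eps
    H = rng.random((k, V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ (V / (W @ H + eps))) / (W.sum(axis=0)[:, None] + eps)
        W *= ((V / (W @ H + eps)) @ H.T) / (H.sum(axis=1)[None, :] + eps)
    return W, H

def separate(V, W, H, eps=1e-12):
    """Split V into one part per component using the soft masks W[:, j] H[j] / (W H)."""
    WH = W @ H + eps
    return [np.outer(W[:, j], H[j]) / WH * V for j in range(W.shape[1])]

# Toy mixture: two "instruments" with disjoint spectra and different rhythms.
rng = np.random.default_rng(0)
w_low = np.r_[np.ones(5), np.zeros(5)]
w_high = np.r_[np.zeros(5), np.ones(5)]
V = np.outer(w_low, rng.random(50)) + np.outer(w_high, rng.random(50))
W, H = nmf_kl(V, k=2)
parts = separate(V, W, H)
print(np.linalg.norm(V - W @ H) / np.linalg.norm(V))  # relative reconstruction error
```

The masks sum to one at every time-frequency point, so the separated parts always add back up to the mixture; how cleanly they correspond to the instruments depends on how well the factorisation fits.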

17. Example: CNN
Translating a CNN news broadcast. Kasper Jørgensen, Lasse Mølgaard, Lars Kai Hansen [Jorg06]
- Music or speech? Sound -> MFCC, STE, SpF, ZCR -> mean/var
- Speaker change detection: Sound -> MFCC -> VQ
- Speech recognition: Sphinx 4 (Carnegie Mellon)
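The speaker change step (Sound -> MFCC -> VQ) can be illustrated with a vector quantisation distance between the windows on either side of a candidate boundary: train a small k-means codebook on each window and measure how badly each window is coded by the other window's codebook; peaks in this score suggest a change of speaker. This is a hypothetical simplification for illustration, not the method of [Jorg06]; the window length, codebook size and score are assumptions, and random vectors stand in for MFCC frames.

```python
import numpy as np

def kmeans_codebook(X, k, n_iter=20, seed=0):
    """Plain Lloyd k-means; returns a (k, n_dims) codebook."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        labels = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):            # leave empty clusters where they are
                C[j] = X[labels == j].mean(axis=0)
    return C

def vq_distortion(X, C):
    """Mean squared distance from each row of X to its nearest codeword in C."""
    return ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2).min(axis=1).mean()

def change_scores(mfcc, win=100, k=8):
    """Score each candidate boundary t by comparing the windows before and after it."""
    scores = np.zeros(len(mfcc))
    for t in range(win, len(mfcc) - win + 1):
        left, right = mfcc[t - win:t], mfcc[t:t + win]
        c_left, c_right = kmeans_codebook(left, k), kmeans_codebook(right, k)
        # Cross-distortion: how badly each window is coded by the other window's codebook.
        scores[t] = vq_distortion(right, c_left) + vq_distortion(left, c_right)
    return scores

rng = np.random.default_rng(0)
speaker_a = rng.standard_normal((200, 4))
speaker_b = rng.standard_normal((200, 4)) + 4.0
mfcc_frames = np.vstack([speaker_a, speaker_b])
scores = change_scores(mfcc_frames, win=50, k=8)
print(scores.argmax())  # near frame 200, where the synthetic speaker changes
```

Within a single speaker both codebooks describe the same distribution and the cross-distortion stays low; it rises sharply when the two windows straddle a change.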

18. Conclusions
It is hard:
- Sound data is far from the information
- Good features are hard to find
But machine learning is catching up. Examples: genre estimation, source separation, CNN translation.

19. References
[Wold96] Wold, E.; Blum, T.; Keislar, D. & Wheaton, J., "Content-based Classification, Search, and Retrieval of Audio", IEEE Multimedia, vol. 3, 1996.
[Foote97] Foote, J., "Content-based retrieval of music and audio", Multimedia Storage and Archiving Systems II, Proc. of SPIE, vol. 3229, 1997.
[Logan01] Logan and Salomon, "A music similarity function based on signal analysis", ICME 2001.
[Beren03] Berenzweig, Ellis and Lawrence, "Anchor space for classification and similarity measurement of music", ICME 2003.
[Rabiner93] Rabiner, L. & Juang, B.H., "Fundamentals of Speech Recognition", Prentice-Hall, 1993.
[Gu04] Gu, Lu, Cai and Zhang, "Dominant feature vector based audio similarity measure", Proceedings of the Pacific Rim Conference on Multimedia (PCM), 2004.
[Tza02] Tzanetakis and Cook, "Musical Genre Classification of Audio Signals", IEEE Transactions on Speech and Audio Processing, vol. 10, 2002.
[Auc02] Aucouturier and Pachet, "Music Similarity Measures: What's the use?", ISMIR 2002.
[Meng05] Anders Meng, Peter Ahrendt and Jan Larsen, "Improving Music Genre Classification by Short-Time Feature Integration", ICASSP 2005.
[Auc04] Aucouturier and Pachet, "Improving Timbre Similarity: How high is the sky?", JNRSAS, 2004.
[Mørup06] Mørup and Schmidt, "Sparse Non-negative Tensor Factor Double Deconvolution (SNTF2D) for multi channel time-frequency analysis", submitted to JMLR, 2006.
[ArGa06] Arenas-Garcia, Lehn-Schiøler and Petersen, "Reduced Kernel Orthonormal Partial Least Squares", submitted to NIPS 2006.
[Jorg06] Kasper Jørgensen, Lasse Mølgaard and Lars Kai Hansen, "Unsupervised speaker change detection for broadcast news segmentation", EUSIPCO 2006.

