
Audio Information Retrieval using Semantic Similarity
Luke Barrington, Antoni Chan, Douglas Turnbull & Gert Lanckriet
Electrical & Computer Engineering, University of California, San Diego

Query By Example
Query-by-example is a method for retrieving content from databases: given an example, return similar content. For sound effects audio, the retrieved results could have a similar sound, Query by Acoustic Example (QBAE), or a similar meaning, Query by Semantic Example (QBSE). We describe QBSE retrieval and demonstrate that it is both more accurate and more efficient than QBAE.

We experiment on the BBC Sound Effects library, a heterogeneous data set of audio track / caption pairs:
1305 tracks, 3 to 600 seconds long
348-word vocabulary of semantic concepts (words)
Each track has a descriptive caption of up to 13 words

Audio & Text Features
Represent each track's audio as a bag of feature vectors: MFCC features plus 1st and 2nd time deltas, 10,000 feature vectors per minute of audio.
Represent each track's caption as a bag of words: a binary document vector of length 348.

Acoustic Similarity
Each database track, d, is represented as a probability distribution over the audio feature space, approximated as a K-component Gaussian mixture model (GMM). The similarity of a database track to a query track is based on the likelihood of the query's audio features under the database track's distribution. Database tracks are rank-ordered by decreasing likelihood. QBAE complexity grows with the size of the database.

Semantic Solution

Semantic Models
For the i-th word in the vocabulary, w_i, estimate P(a|w_i), a 'word' distribution over the audio feature vector space. Model P(a|w_i) with a Gaussian Mixture Model (GMM), estimated using Expectation Maximization. The training data for word distribution P(a|w_i) is all feature vectors from all tracks labeled with word w_i. The semantic model is the set of 'word' GMM distributions.

Sounds → Semantics
Using the learned 'word' distributions P(a|w_i), compute the posterior probability of word w_i given a track's feature vectors. Assume that any two feature vectors x_m and x_n are conditionally independent given w_i, and estimate the song prior by summing over all words. Normalizing the posteriors of all words, we represent a track as a semantic distribution over the vocabulary terms.

Semantic Similarity
The semantic distributions are points in a semantic space. A natural measure of similarity in this space is the Kullback-Leibler (KL) divergence. Given a query track, QBSE retrieves the database tracks that minimize the KL divergence with the query. The bulk of QBSE computation lies in calculating the semantic distribution for the query track, so complexity grows with the size of the vocabulary.

Results
Mean Av. Prec: QBSE … ± …, QBAE … ± .001

References
Carneiro & Vasconcelos (2005). Formulating semantic image annotation as a supervised learning problem. IEEE CVPR.
Rasiwasia, Vasconcelos & Moreno (2006). Query by Semantic Example. ICIVR.
Slaney (2002). Semantic-audio retrieval. IEEE ICASSP.
Skowronek, McKinney & van de Par (2006). Ground-truth for automatic music mood classification. ISMIR.
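
The QBSE steps described under Sounds → Semantics and Semantic Similarity can be illustrated with a small Python sketch. It is not the authors' implementation, and it rests on several assumptions that do not come from the poster: features are 1-D scalars rather than MFCC vectors, the word prior P(w_i) is uniform, the KL divergence is taken from the query to the database track, and a small `eps` guards against log(0). The function names, the toy word models and the toy tracks are all hypothetical.

```python
import math

def log_gaussian(x, mean, var):
    """Log density of a 1-D Gaussian N(x; mean, var)."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def log_gmm(x, gmm):
    """Log density of a 1-D GMM given as [(weight, mean, var), ...]."""
    terms = [math.log(w) + log_gaussian(x, m, v) for w, m, v in gmm]
    top = max(terms)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in terms))

def semantic_distribution(features, word_gmms):
    """Posterior over words given a track's features (uniform prior assumed)."""
    # Conditional independence: log P(X|w_i) = sum_m log P(x_m|w_i).
    log_lik = [sum(log_gmm(x, gmm) for x in features) for gmm in word_gmms]
    # With a uniform prior, dividing by P(X) = sum_i P(X|w_i) P(w_i)
    # reduces to normalising the likelihoods over the vocabulary.
    top = max(log_lik)
    unnorm = [math.exp(l - top) for l in log_lik]
    total = sum(unnorm)
    return [u / total for u in unnorm]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions; eps is an assumed guard."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0.0)

def rank_by_semantic_similarity(query_features, database, word_gmms):
    """Return database track names ordered by increasing KL divergence."""
    query = semantic_distribution(query_features, word_gmms)
    scores = {name: kl_divergence(query, semantic_distribution(f, word_gmms))
              for name, f in database.items()}
    return sorted(scores, key=scores.get)

# Hypothetical two-word vocabulary: index 0 = "dog", index 1 = "rain".
WORD_GMMS = [
    [(0.5, -0.5, 1.0), (0.5, 0.5, 1.0)],  # toy "dog" word model
    [(1.0, 5.0, 1.0)],                    # toy "rain" word model
]
DATABASE = {"bark": [0.0, 0.2], "storm": [5.1, 4.8]}
QUERY = [0.1, -0.2, 0.3]

ranking = rank_by_semantic_similarity(QUERY, DATABASE, WORD_GMMS)
print(ranking)
```

In a real system the semantic distributions of the database tracks would be computed once and stored, so that a query only requires evaluating its own features under each word model. This is consistent with the poster's remark that QBSE cost grows with the vocabulary size.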