Audio Information Retrieval using Semantic Similarity
Luke Barrington, Antoni Chan, Douglas Turnbull & Gert Lanckriet
Electrical & Computer Engineering, University of California, San Diego
lbarrington@ucsd.edu

Query By Example
Query-by-example is a method for retrieving content from databases: given an example, return similar content. For sound-effects audio, the retrieved results can have a similar sound (Query by Acoustic Example, QBAE) or a similar meaning (Query by Semantic Example, QBSE). We describe QBSE retrieval and demonstrate that it is both more accurate and more efficient than QBAE.

Audio & Text Features
We experiment on the BBC Sound Effects library, a heterogeneous data set of audio track / caption pairs:
- 1,305 tracks, 3 to 600 seconds long
- a 348-word vocabulary of semantic concepts (words)
- each track has a descriptive caption of up to 13 words
Represent each track's audio as a bag of feature vectors: MFCC features plus their 1st and 2nd time deltas, about 10,000 feature vectors per minute of audio. Represent each track's caption as a bag of words: a binary document vector of length 348.

Semantic Models
For the w-th word in the vocabulary, estimate P(a|w), a 'word' distribution over the audio feature vector space. Model P(a|w) with a Gaussian mixture model (GMM), estimated using expectation maximization. The training data for the word distribution P(a|w_i) is all feature vectors from all tracks labeled with word w_i. The semantic model is the set of 'word' GMM distributions.

Sounds → Semantics
Using the learned 'word' distributions P(a|w_i), compute the posterior probability of word w_i given a track. Assuming the feature vectors x_m and x_n are conditionally independent given w_i, the track likelihood factors into a product of per-frame likelihoods. Estimate the track prior by summing over all words. Normalizing the posteriors of all words represents the track as a semantic distribution over the vocabulary terms.

Acoustic Similarity
Each database track d is represented as a probability distribution over the audio feature space, approximated by a K-component Gaussian mixture model (GMM). The similarity of database tracks to a query track is based on the likelihood of the query's audio features under the database track distributions: rank-order the database tracks by decreasing likelihood. QBAE complexity grows with the size of the database.

Semantic Similarity
The semantic distributions are points in a semantic space, and a natural measure of similarity in this space is the Kullback-Leibler (KL) divergence. Given a query track, QBSE retrieves the database tracks that minimize the KL divergence with the query. The bulk of the QBSE computation lies in calculating the semantic distribution for the query track, so complexity grows with the size of the vocabulary.

Results
Mean average precision:
- QBSE: 0.186 ± 0.003
- QBAE: 0.165 ± 0.001

Project page: http://cosmal.ucsd.edu/cal/
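The per-word model estimation described under Semantic Models can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the diagonal covariances, and the component count are assumptions, and scikit-learn's GaussianMixture supplies the EM training.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_word_models(tracks, captions, n_components=4, seed=0):
    """Fit one GMM P(a|w) per vocabulary word (illustrative sketch).

    tracks   : list of (T_i, D) arrays of audio feature vectors.
    captions : list of sets of word indices labelling each track.
    Pools the feature vectors of every track labeled with word w and
    fits a GMM by expectation maximization, as the poster describes.
    """
    vocab = sorted(set().union(*captions))
    models = {}
    for w in vocab:
        # Training data for P(a|w): all frames of all tracks labeled with w.
        pooled = np.vstack([x for x, c in zip(tracks, captions) if w in c])
        models[w] = GaussianMixture(n_components=n_components,
                                    covariance_type="diag",
                                    random_state=seed).fit(pooled)
    return models
```

The returned dictionary, one GMM per word, is the "semantic model" the poster refers to.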
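Acoustic Similarity (QBAE) can be sketched in the same style; the names and the choice of K are again illustrative, and scoring uses the average log-likelihood of the query's frames under each database track's GMM.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_track_gmm(features, k=4, seed=0):
    """Approximate one database track by a K-component GMM over its features."""
    return GaussianMixture(n_components=k, covariance_type="diag",
                           random_state=seed).fit(features)

def qbae_rank(query_features, track_gmms):
    """Rank database tracks by decreasing likelihood of the query's
    feature vectors under each track's GMM (best match first)."""
    scores = [g.score(query_features) for g in track_gmms]  # mean log-likelihood
    return np.argsort(scores)[::-1]
```

Because every database GMM must be evaluated against the query, per-query cost grows linearly with the size of the database, which is the complexity point the poster makes against QBAE.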
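The Sounds → Semantics step, mapping a track to a distribution over vocabulary words, might look like the following sketch. The uniform word prior and the max-subtraction for numerical stability are assumptions; the conditional-independence factorization (summing per-frame log-likelihoods) is the one stated on the poster.

```python
import numpy as np

def semantic_multinomial(features, word_models):
    """Represent a track as a semantic distribution over the vocabulary.

    Under the poster's conditional-independence assumption,
    log P(track|w) = sum_m log P(x_m|w); the posteriors are then
    normalized over all words (a uniform word prior is assumed here).
    """
    words = sorted(word_models)
    loglik = np.array([word_models[w].score_samples(features).sum()
                       for w in words])
    loglik -= loglik.max()     # stabilize before exponentiating
    p = np.exp(loglik)
    return p / p.sum()         # point in the semantic space
```

With thousands of frames per track the summed log-likelihoods can differ by large margins, so this distribution may be very peaked; the poster does not specify any smoothing, so none is applied here.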
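Finally, Semantic Similarity retrieval reduces to comparing semantic multinomials with the KL divergence. KL is asymmetric and the poster does not state which direction is used, so taking KL(query || track) below is an assumption, as is the epsilon clipping for zero entries.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two semantic multinomials (clipped to avoid zeros)."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return float(np.sum(p * np.log(p / q)))

def qbse_rank(query_sem, database_sems):
    """Return database indices ordered by increasing KL divergence
    from the query's semantic distribution (best match first)."""
    return np.argsort([kl_divergence(query_sem, s) for s in database_sems])
```

Since the database tracks' semantic distributions can be precomputed, the per-query work is dominated by computing the query's own semantic distribution, which is why QBSE cost grows with the vocabulary size rather than the database size.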

