1
Thursday, November 13, 2008
ASA 156: Statistical Approaches for Analysis of Music and Speech Audio Signals
AudioDB: Scalable approximate nearest-neighbor search with automatic radius-bounded indexing
Michael A. Casey
Digital Musics, Dartmouth College, Hanover, NH
2
Scalable Similarity
8M tracks in commercial collection
PByte of multimedia data
Require passage-level retrieval (~ 2 bars)
Require scalable nearest-neighbor methods
3
Specificity
Partial track retrieval
Alternate versions: remix, cover, live, album
Task is mid-high specificity
4
Example: remixing
Original Track
Remix 1
Remix 2
Remix 3
5
Audio Shingles
A shingle is defined as the concatenation of l frames of m-dimensional features
Shingles provide contextual information about features
Originally used for Internet search engines: Andrei Z. Broder, Steven C. Glassman, Mark S. Manasse, Geoffrey Zweig, “Syntactic Clustering of the Web”, Computer Networks 29(8-13): 1157-1166 (1997)
Related to N-grams, overlapping sequences of features
Applied to audio domain by Casey and Slaney: Casey, M., Slaney, M., “The Importance of Sequences in Musical Similarity”, in Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), 2006
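The shingle construction described on this slide can be sketched in a few lines. This is a minimal illustration, not audioDB's own code: the function name `make_shingles`, the hop of one frame between successive shingles, and the toy feature values are all assumptions. The unit-norm step follows the later slide stating that shingles are normalized to unit vectors.

```python
import math

def make_shingles(frames, l):
    """Stack each run of l consecutive frames into one unit-norm shingle.

    frames: list of feature vectors, each of length m.
    Returns a list of vectors of length l * m.
    Assumption: successive shingles overlap with a hop of one frame.
    """
    shingles = []
    for start in range(len(frames) - l + 1):
        # Concatenate l frames of m features into one vector of length l * m.
        vec = [x for frame in frames[start:start + l] for x in frame]
        norm = math.sqrt(sum(x * x for x in vec))
        if norm > 0.0:
            # Scale to unit length; an all-zero window is left unchanged.
            vec = [x / norm for x in vec]
        shingles.append(vec)
    return shingles

# Toy input: 5 frames of m = 2 features, shingled with l = 3.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0], [0.0, 2.0]]
shingles = make_shingles(frames, 3)
```

With the slide's values (m = 12 or 20, l = 30 or 40), each shingle would have between 360 and 800 dimensions.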
6
Audio Shingle Similarity
7
a query shingle drawn from a query track {Q}
database of audio tracks indexed by (n)
a database shingle from track n
Shingles are normalized to unit vectors, therefore:
For shingles with M dimensions (M = l·m); m = 12, 20; l = 30, 40
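The equation that followed "therefore" is not present in this transcript. One standard consequence of unit normalization, which may or may not be the one the slide showed, is that squared Euclidean distance and inner product carry the same information. The symbols q (query shingle) and x (database shingle) are assumed here for illustration.

```latex
\lVert \mathbf{q} - \mathbf{x} \rVert^{2}
  = \lVert \mathbf{q} \rVert^{2} + \lVert \mathbf{x} \rVert^{2} - 2\,\mathbf{q}^{\top}\mathbf{x}
  = 2 - 2\,\mathbf{q}^{\top}\mathbf{x},
\qquad 0 \le \lVert \mathbf{q} - \mathbf{x} \rVert^{2} \le 4 .
```

Under this identity, two orthogonal unit shingles sit at squared distance 2, and identical shingles at 0.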
8
AudioDB: Shingle Nearest Neighbor Search
Open source: google: “audioDB”
Management of tracks, sequences, salience
Automatic indexing parameters
OMRAS2, Yahoo!, AWAL, CHARM, more…
Web-services interface (SOAP / JSON)
Implementation of LSH for large N ~ 1B
1-10 ms whole-track retrieval from 1B vectors
10
Whole-track similarity
Often want to know which tracks are similar
Similarity depends on specificity of task
Distortion / filtering / re-encoding (high)
Remix with new audio material (mid)
Cover song: same song, different artist (mid)
11
Whole-track resemblance: radius-bounded search
Compute the number of shingle collisions between two tracks:
12
Whole-track resemblance: radius-bounded search
Compute the number of shingle collisions between two tracks:
Requires a threshold for considering shingles to be related
Need a way to estimate relatedness (threshold) for data set
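The collision-count formula itself is missing from this transcript. A brute-force sketch of one plausible reading follows: count every (query shingle, track shingle) pair whose squared Euclidean distance falls within the threshold. The pair-counting definition, the function names, and the toy vectors are assumptions made for illustration.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def count_collisions(query_shingles, track_shingles, xthresh):
    """Count shingle pairs whose squared distance is at most xthresh.

    Assumption: a 'collision' is any (query, track) shingle pair within
    the radius; the slide's exact definition is not available here.
    """
    return sum(
        1
        for q in query_shingles
        for x in track_shingles
        if squared_distance(q, x) <= xthresh
    )

# Toy unit-norm shingles in 2 dimensions.
query = [[1.0, 0.0], [0.0, 1.0]]
track_a = [[1.0, 0.0], [0.6, 0.8]]    # close to the query shingles
track_b = [[-1.0, 0.0], [0.0, -1.0]]  # far from the query shingles

hits_a = count_collisions(query, track_a, 0.5)
hits_b = count_collisions(query, track_b, 0.5)
```

Ranking tracks by this count gives a whole-track score, and the result clearly depends on the choice of xthresh, which is what the following slides set out to estimate.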
13
Statistical approaches to modeling distance distributions
14
Distribution of minimum distances
Database: 1.4 million shingles.
The left bump is the distribution of minimum distances between 1000 randomly selected query shingles and this database.
The right bump is a small sample (1/98 000 000) of the full histogram of all distances.
15
Radius-bounded retrieval performance: cover song (opus task)
Performance depends critically on xthresh, the collision threshold
Want to estimate xthresh automatically from unlabelled data
16
Order Statistics
Minimum-value distribution is analytic
Estimate the distribution parameters
Substitute into minimum value distribution
Define a threshold in terms of FP rate
This gives an estimate of xthresh
17
Estimating xthresh from unlabelled data
Use theoretical statistics
Null Hypothesis:
H0: shingles are drawn from unrelated tracks
Assume elements i.i.d., normally distributed
M dimensional shingles, d effective degrees of freedom:
Squared distance distribution for H0
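The slide's distribution is not present in this transcript. Under the stated assumption of i.i.d. normal elements, one way to write it is as a scaled chi-squared law. The symbols s (squared distance) and σ² (per-component variance of the difference vector) are introduced here for illustration, and how the slide tied the scale to the shingle dimensionality M is not recoverable.

```latex
s = \lVert \mathbf{q} - \mathbf{x} \rVert^{2} \sim \sigma^{2}\,\chi^{2}_{d},
\qquad
p(s \mid d, \sigma^{2})
  = \frac{s^{\,d/2 - 1}\, e^{-s/(2\sigma^{2})}}{(2\sigma^{2})^{d/2}\,\Gamma(d/2)},
\quad s \ge 0 .
```

Here d plays the role of the effective degrees of freedom: correlated features make d smaller than M.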
18
ML for background distribution
Likelihood for N data points (distances squared)
d = effective degrees of freedom
M = shingle dimensionality
19
Background distribution parameters
Likelihood for N data points (distances squared)
d = effective degrees of freedom
M = shingle dimensionality
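The slides fit the background parameters by maximum likelihood, but the likelihood expression is missing from this transcript. As a simpler stand-in, the sketch below uses moment matching instead of ML, assuming the squared distances follow a scaled chi-squared law σ²·χ²_d with mean dσ² and variance 2dσ⁴. The function name, the σ² parameterization, and the sample values are assumptions.

```python
import statistics

def fit_scaled_chi2(sq_distances):
    """Moment-matching fit of sigma2 * chi-squared(d) to squared distances.

    Uses mean = d * sigma2 and variance = 2 * d * sigma2**2.
    This is a method-of-moments stand-in, not the ML fit named on the slide.
    """
    mean = statistics.fmean(sq_distances)
    var = statistics.pvariance(sq_distances, mu=mean)
    d = 2.0 * mean * mean / var   # effective degrees of freedom
    sigma2 = var / (2.0 * mean)   # scale of each squared component
    return d, sigma2

# Toy squared distances; real input would be a sample of background distances.
d_hat, sigma2_hat = fit_scaled_chi2([1.0, 2.0, 3.0])
```

Moment estimates of this kind are often used to initialize an iterative ML fit.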
20
Minimum value over N samples
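The slide's equation is not present in this transcript. For N independent draws from a common distribution, the distribution of the minimum has a standard closed form, written here with assumed symbols: F and p are the CDF and density of a single squared distance s.

```latex
F_{\min}(s) = \Pr\!\left[\min_{i \le N} s_i \le s\right] = 1 - \bigl[1 - F(s)\bigr]^{N},
\qquad
p_{\min}(s) = N\, p(s)\, \bigl[1 - F(s)\bigr]^{N-1} .
```

This is the sense in which the minimum-value distribution is analytic: once the parameters of F are estimated, F_min follows by substitution.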
21
Minimum value distribution of unrelated shingles
22
Estimate of xthresh, false positive rate
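One way to turn a false positive rate into a threshold is to solve F_min(x) = FP rate for x, where F_min(x) = 1 - (1 - F(x))^N is the CDF of the minimum of N background distances. The sketch below assumes the background CDF F is a scaled chi-squared law σ²·χ²_d; the function names, the parameterization, and the example parameter values are all assumptions, and the bracket used for bisection assumes an FP rate below 0.5.

```python
import math

def chi2_cdf(x, d, sigma2=1.0):
    """CDF of sigma2 * chi-squared(d), using the power series for the
    regularized lower incomplete gamma function (intended for x <= d * sigma2)."""
    if x <= 0.0:
        return 0.0
    a, z = d / 2.0, x / (2.0 * sigma2)
    term = 1.0 / a
    total = term
    for n in range(1, 10000):
        term *= z / (a + n)
        total += term
        if term < total * 1e-16:
            break
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))

def min_cdf(x, d, sigma2, n_samples):
    """P(min of n_samples background distances <= x) = 1 - (1 - F(x))^N."""
    f = chi2_cdf(x, d, sigma2)
    # expm1/log1p keep precision when F(x) is tiny and N is huge.
    return -math.expm1(n_samples * math.log1p(-f))

def estimate_xthresh(d, sigma2, n_samples, fp_rate):
    """Bisect for the x where the minimum-value CDF equals fp_rate.

    Assumes fp_rate < 0.5, so the answer lies below the mean d * sigma2.
    """
    lo, hi = 0.0, d * sigma2
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if min_cdf(mid, d, sigma2, n_samples) < fp_rate:
            lo = mid
        else:
            hi = mid
    return lo

# Hypothetical parameters: d = 20, sigma2 = 0.1, a 1.4M-shingle database, 1% FP rate.
xthresh = estimate_xthresh(20.0, 0.1, 1_400_000, 0.01)
```

Larger databases push the threshold down, because the chance that some unrelated shingle falls inside a fixed radius grows with N.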
23
Unlabelled data experiment
Unlabelled data set
Known to contain:
cover songs (same work, different performer)
Near duplicate recordings (misattribution, encoding)
Estimate background distance distribution
Estimate minimum value distribution
Set xthresh so FP rate is <= 1%
Whole-track retrieval based on shingle collisions
24
Cover song retrieval
25
Scaling
Locality sensitive hashing
Trade-off approximate NN for time complexity
3 to 4 orders of magnitude speed-up
No noticeable degradation in performance
For optimal radius threshold
26
LSH
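The content of this slide is not present in this transcript. For orientation, here is a minimal sketch of one widely used LSH family for Euclidean distance, the random-projection scheme h(v) = floor((a·v + b) / w). Whether audioDB uses exactly this construction is not stated in the slides; the class name, the single-table design, and the parameter values are assumptions.

```python
import math
import random
from collections import defaultdict

class L2HashTable:
    """Single LSH table using k random-projection hashes of bucket width w."""

    def __init__(self, dim, k, w, seed=0):
        rng = random.Random(seed)
        self.w = w
        # One Gaussian projection vector and one uniform offset per hash.
        self.projections = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                            for _ in range(k)]
        self.offsets = [rng.uniform(0.0, w) for _ in range(k)]
        self.buckets = defaultdict(list)

    def key(self, v):
        """Concatenate the k quantized projections into one bucket key."""
        return tuple(
            math.floor((sum(a * x for a, x in zip(proj, v)) + b) / self.w)
            for proj, b in zip(self.projections, self.offsets)
        )

    def insert(self, item_id, v):
        self.buckets[self.key(v)].append((item_id, v))

    def query(self, q, radius_sq):
        """Return ids from q's bucket whose exact squared distance is
        at most radius_sq; candidates outside the bucket are never seen."""
        candidates = self.buckets.get(self.key(q), [])
        return [
            item_id
            for item_id, v in candidates
            if sum((x - y) ** 2 for x, y in zip(q, v)) <= radius_sq
        ]

table = L2HashTable(dim=2, k=4, w=0.5)
table.insert("a", [1.0, 0.0])
table.insert("b", [-1.0, 0.0])
near = table.query([1.0, 0.0], 0.01)
```

The bucket width w has to be chosen relative to the search radius, which is why a radius bound is needed before the index can be built. Practical systems use several tables to reduce missed neighbours.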
27
Remix retrieval via LSH
28
Current deployment
Large commercial collections
AWAL ~ 100,000 tracks
Yahoo! 2M+ tracks, related song classifier
AudioDB: open-source, international consortium of developers
Google: “audioDB”
29
Conclusions
Radius-bounded retrieval model for tracks
Shingles preserve temporal information, high d
Implements mid-to-high specificity search
Optimal radius threshold from order statistics
null hypothesis: shingles are drawn from unrelated tracks
LSH requires radius bound, automatic estimate
Scales to 1B shingles+ using LSH
30
Thanks
Malcolm Slaney, Yahoo! Research Inc.
Christophe Rhodes, Goldsmiths, U. of London
Michela Magas, Goldsmiths, U. of London
Funding: EPSRC: EP/E02274X/1