Song Intersection by Approximate Nearest Neighbours
Michael Casey, Goldsmiths
Malcolm Slaney, Yahoo! Inc.

1 Song Intersection by Approximate Nearest Neighbours
Michael Casey, Goldsmiths
Malcolm Slaney, Yahoo! Inc.

2 Overview
Large Databases: Everywhere!
– 8B web pages
– 50M audio files on web
– 2M songs
Find duplicates with shingles
– Text-based
– LSH: randomized projections
Results
– Best features
– 2018-song subset

3 The Need for Normalization
Recommendations
– Apply one song's rating to another
– Result: better matches
Playlists
– Find matches to user requests
– Remove adult/child music
Search results
– Don't show duplicates

4 Specificity Spectrum
[Diagram labels: Cover songs; Remixes; Look for specific exact matches; Bag of Features model; Our work (nearest neighbor); Fingerprinting; Genre]

5 Remixes of One Title

6 Remix Examples
– Abba, "Gimme Gimme"
– Madonna, "Hung Up"
– Tracy Young Remix of "Hung Up"
– Tracy Young Remix 2 of "Hung Up"

7 How Remix Recognition Works
Algorithm
– Matched filter best (ICASSP 2005 result)
– Nearest neighbor in 360–1200-dimensional space (ill posed?)
Efficient implementation
– Audio shingles
– Like web-duplicate search
– Locality-sensitive hashing
– Probabilistic guarantee
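A minimal sketch of how an audio shingle could be built, assuming a shingle is the concatenation of consecutive feature frames. The slides do not give the frame size or shingle width; the 12-dimensional frames and width of 30 below are assumptions chosen only because they yield the 360 dimensions mentioned above, and `make_shingles` is a hypothetical helper name.

```python
def make_shingles(frames, width):
    """Concatenate each run of `width` consecutive frames into one long vector."""
    shingles = []
    for start in range(len(frames) - width + 1):
        shingle = []
        for frame in frames[start:start + width]:
            shingle.extend(frame)
        shingles.append(shingle)
    return shingles


# Hypothetical input: 40 frames of 12-dimensional features (values are dummies).
frames = [[float(t + d) for d in range(12)] for t in range(40)]
shingles = make_shingles(frames, width=30)

# Number of shingles and the dimension of each one.
print(len(shingles), len(shingles[0]))
```

Sliding the window one frame at a time gives overlapping shingles, in the same spirit as overlapping word windows in text-based duplicate search.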

8 Audio Processing

9 Remix Distance
– N-best matches
– Matched filter (implemented as nearest neighbor)
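The slide does not define the remix distance itself. One plausible reading of "N-best matches", offered here only as an assumption, is the mean of the N smallest shingle-to-shingle Euclidean distances between two songs. The names `euclidean`, `remix_distance`, and `n_best` are illustrative.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def remix_distance(shingles_a, shingles_b, n_best=3):
    """Assumed definition: mean of the n_best smallest cross-song distances."""
    dists = sorted(euclidean(a, b) for a in shingles_a for b in shingles_b)
    best = dists[:n_best]
    if not best:
        raise ValueError("both songs need at least one shingle")
    return sum(best) / len(best)


# Toy 2-D "shingles"; real shingles would have hundreds of dimensions.
song_a = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]]
song_b = [[0.0, 0.0], [1.0, 0.0], [9.0, 9.0]]
print(remix_distance(song_a, song_b, n_best=3))
```

Averaging over several best matches, rather than taking only the single closest pair, makes the score less sensitive to one accidental match.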

10 Choosing r0

11 Hashing
Types of hashes
– String: put "casey" and "cased" in different bins
– Locality sensitive: find nearest neighbors (high-dimensional and probabilistic)
Two nearest-neighbor implementations
– Pair-wise distance computation: 1,000,000,000,000 comparisons in a 2M-song database
– Hash bucket collisions: 1,000,000,000 hash projections
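To illustrate the string-hash case, here is a small sketch using SHA-256 from the Python standard library as a stand-in for "a string hash"; the slide does not name a particular hash function. It shows that two strings differing in a single character get unrelated hash values, which is the opposite of what nearest-neighbor search needs.

```python
import hashlib


def string_hash(word):
    """Return a hex digest; a small change to the input scrambles the output."""
    return hashlib.sha256(word.encode("utf-8")).hexdigest()


def differing_positions(a, b):
    """Count positions at which two equal-length strings differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


# The two words differ in one character, yet their digests share no structure.
print(differing_positions("casey", "cased"))
print(string_hash("casey")[:8], string_hash("cased")[:8])
```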

12 Random Projections
– Random projections estimate distance
– Multiple projections improve estimate
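A sketch of why this works, assuming Gaussian random directions (the slide does not say which distribution is used). For a direction g with independent standard normal entries, the projected gap g·(x − y) has variance equal to the squared distance between x and y, so averaging squared gaps over several projections estimates that distance. The function names are illustrative.

```python
import math
import random


def project(vec, direction):
    """Dot product of a vector with a projection direction."""
    return sum(v * d for v, d in zip(vec, direction))


def estimate_distance(x, y, n_projections, rng):
    """Estimate ||x - y|| from the spread of random 1-D projections."""
    total = 0.0
    for _ in range(n_projections):
        direction = [rng.gauss(0.0, 1.0) for _ in range(len(x))]
        gap = project(x, direction) - project(y, direction)
        total += gap * gap
    return math.sqrt(total / n_projections)


rng = random.Random(0)
x = [1.0] * 20
y = [0.0] * 20
true_distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

# More projections should bring the estimate closer to the true distance.
for n in (1, 10, 100, 1000):
    print(n, round(estimate_distance(x, y, n, rng), 3), "true:", round(true_distance, 3))
```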

13 Locality Sensitive Hashing
– Hash function is a random projection
– No pair-wise computation
– Collisions are nearest neighbors
[Figure label: Distant Vector]
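One common way to turn a random projection into a hash, shown here as an assumption rather than as the authors' implementation, is to project the vector onto a random direction, add a random offset, and quantize the result into bins of a fixed width. Using several projections and joining their bin indices into one key makes it unlikely that a distant vector lands in the same bucket. `lsh_key` and `random_hash_params` are hypothetical names.

```python
import math
import random


def lsh_key(vec, directions, offsets, width):
    """Bucket key: one quantized random projection per direction."""
    key = []
    for direction, offset in zip(directions, offsets):
        projection = sum(v * d for v, d in zip(vec, direction))
        key.append(math.floor((projection + offset) / width))
    return tuple(key)


def random_hash_params(dim, n_projections, width, rng):
    """Draw Gaussian directions and uniform offsets for one hash table."""
    directions = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                  for _ in range(n_projections)]
    offsets = [rng.uniform(0.0, width) for _ in range(n_projections)]
    return directions, offsets


rng = random.Random(0)
directions, offsets = random_hash_params(dim=3, n_projections=4, width=1.0, rng=rng)

query = [0.5, 0.5, 0.5]
distant = [40.0, -25.0, 10.0]
print(lsh_key(query, directions, offsets, 1.0))
print(lsh_key(distant, directions, offsets, 1.0))
```

Because the guarantee is probabilistic, a close pair can still be split by a bin boundary; practical systems usually query several independent hash tables to reduce such misses.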

14 Remix Nearest Neighbour Algorithm 1
1. Extract database audio shingles
2. Eliminate shingles with power below the song's mean power
3. Compute remix distance for all pairs
4. Choose pairs with remix distance < r0
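The four steps above can be sketched as a brute-force loop. Several details are assumptions, since the slides do not specify them: shingle power is taken as the mean squared value, the remix distance is taken as the mean of the `n_best` smallest shingle-to-shingle Euclidean distances, and the song names and toy 2-D shingles are invented for illustration.

```python
import itertools
import math


def power(shingle):
    """Assumed power measure: mean squared value of the shingle."""
    return sum(x * x for x in shingle) / len(shingle)


def drop_quiet_shingles(shingles):
    """Step 2: keep shingles whose power is at least the song's mean power."""
    mean_power = sum(power(s) for s in shingles) / len(shingles)
    return [s for s in shingles if power(s) >= mean_power]


def remix_distance(shingles_a, shingles_b, n_best):
    """Step 3 (assumed form): mean of the n_best smallest cross-song distances."""
    dists = sorted(math.dist(a, b) for a in shingles_a for b in shingles_b)
    best = dists[:n_best]
    return sum(best) / len(best)


def brute_force_remix_pairs(songs, r0, n_best=1):
    """Steps 2-4 over already-extracted shingles (step 1 is the input)."""
    kept = {name: drop_quiet_shingles(sh) for name, sh in songs.items()}
    pairs = []
    for a, b in itertools.combinations(sorted(kept), 2):
        if remix_distance(kept[a], kept[b], n_best) < r0:
            pairs.append((a, b))
    return pairs


# Hypothetical toy database: each song has one near-silent shingle.
songs = {
    "original": [[0.1, 0.1], [3.0, 4.0], [3.0, 4.1]],
    "remix": [[0.0, 0.2], [3.0, 4.05], [3.1, 4.0]],
    "other": [[0.1, 0.0], [-5.0, 5.0], [-5.0, 5.2]],
}
print(brute_force_remix_pairs(songs, r0=0.5))
```

In this toy data the near-silent shingles of all three songs lie within r0 of each other, so skipping step 2 would pair unrelated songs. The loop over all song pairs is also what makes this version expensive on a large database.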

15 Remix Nearest Neighbour Algorithm Revisited
1. Extract database audio shingles
2. Eliminate shingles with power below the song's mean power
3. Hash remaining shingles, bin width = r0
4. Collisions are near neighbour shingles
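A sketch of steps 3 and 4, assuming steps 1 and 2 have already produced the shingles passed in, and assuming the hash is a set of quantized random projections with bin width r0. The quantization scheme, the helper names, and the toy data are assumptions; the slides state only that the hash is a random projection with bin width r0.

```python
import math
import random
from collections import defaultdict


def lsh_key(vec, directions, offsets, width):
    """Bucket key: one quantized projection per direction."""
    return tuple(
        math.floor((sum(v * d for v, d in zip(vec, direction)) + offset) / width)
        for direction, offset in zip(directions, offsets)
    )


def colliding_song_pairs(songs, directions, offsets, r0):
    """Step 3: hash every shingle. Step 4: report songs sharing a bucket."""
    buckets = defaultdict(set)
    for name, shingles in songs.items():
        for shingle in shingles:
            buckets[lsh_key(shingle, directions, offsets, r0)].add(name)
    pairs = set()
    for names in buckets.values():
        ordered = sorted(names)
        for i, a in enumerate(ordered):
            for b in ordered[i + 1:]:
                pairs.add((a, b))
    return sorted(pairs)


# Hypothetical shingles that already passed the power filter.
songs = {
    "original": [[3.0, 4.0], [3.0, 4.1]],
    "remix": [[3.0, 4.05], [3.1, 4.0]],
    "other": [[-5.0, 5.0], [-5.0, 5.2]],
}

rng = random.Random(0)
dim, n_projections, r0 = 2, 4, 0.5
directions = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_projections)]
offsets = [rng.uniform(0.0, r0) for _ in range(n_projections)]
print(colliding_song_pairs(songs, directions, offsets, r0))
```

Each shingle is hashed once, so the cost grows with the number of shingles rather than with the number of shingle pairs. A single table can miss close pairs that straddle a bin boundary, which is one reason the guarantee is only probabilistic.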

16 Method
– Choose 20 query songs
– Each has 3-10 remixes
– 306 Madonna songs
– 2018 Madonna+Miles

17 Results

18 Conclusions
– Remixes are hard, but well-posed
– Brute force distances too expensive
– LSH is 1-2 orders of magnitude faster
– LSH Remix Recognition is accurate

