Song Intersection by Approximate Nearest Neighbours
Michael Casey, Goldsmiths
Malcolm Slaney, Yahoo! Inc.


Overview
Large Databases: Everywhere!
– 8B web pages
– 50M audio files on web
– 2M songs
Find duplicates with shingles
– Text-based
– LSH: randomized projections
Results
– Best features
– 2018 song subset

The Need for Normalization
Recommendations
– Apply one song's rating to another
  -> Better matches
Playlists
– Find matches to user requests
– Remove adult/child music
Search results
– Don't show duplicates

Specificity Spectrum
[Figure labels: Fingerprinting; Look for specific exact matches; Remixes; Cover songs; Our work (nearest neighbor); Bag of Features model; Genre]

Remixes of One Title

Remix Examples
– Abba: Gimme Gimme
– Madonna: Hung Up
– Tracy Young Remix of Hung Up
– Tracy Young Remix 2 of Hung Up

How Remix Recognition Works
Algorithm
– Matched filter best (ICASSP 2005 result)
– Nearest neighbor in 360–1200D space
  Ill posed?
Efficient implementation
– Audio shingles
– Like web-duplicate search
– Locality-sensitive hashing
– Probabilistic guarantee

Audio Processing

Remix Distance
– N-best matches
– Matched filter (implemented as nearest neighbor)

Choosing r0

Hashing
Types of hashes
– String: put "casey" vs "cased" in different bins
– Locality sensitive: find nearest neighbors
  High-dimensional and probabilistic
Two Nearest Neighbor implementations
– Pair-wise distance computation
– 1,000,000,000,000 comparisons in 2M song database
– Hash bucket collisions
– 1,000,000,000 hash projections
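The contrast between the two hash types can be sketched in a few lines. This is a toy illustration, not the authors' code: `string_hash` uses SHA-256 as a stand-in for any conventional string hash, and `grid_hash` is a hypothetical one-dimensional binning function chosen only to show that nearby values tend to share a bin.

```python
import hashlib

def string_hash(s: str) -> str:
    """Conventional hash: a one-character change scatters the output."""
    return hashlib.sha256(s.encode("utf-8")).hexdigest()

def grid_hash(x: float, width: float) -> int:
    """Toy locality-sensitive hash for 1-D values: quantise into bins of `width`."""
    return int(x // width)

# "casey" and "cased" differ by one letter but land in unrelated bins.
casey, cased = string_hash("casey"), string_hash("cased")

# Nearby values share a bin; a distant value does not.
near_a = grid_hash(0.50, 1.0)
near_b = grid_hash(0.52, 1.0)
far = grid_hash(7.30, 1.0)
```

Two values that straddle a bin edge would still be split by a fixed grid, which is one reason practical schemes combine several randomized hashes and offer only a probabilistic guarantee.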

Random Projections
– Random projections estimate distance
– Multiple projections improve estimate
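A minimal sketch of why this works, assuming Gaussian projection vectors (the slides do not say which distribution was used): for a vector g with independent N(0, 1) entries, the projection g·(x − y) has variance ‖x − y‖², so averaging the squared projections over k draws estimates the squared distance. The function name `projected_distance` and the test vectors are illustrative.

```python
import math
import random

def projected_distance(x, y, k, rng):
    """Estimate ||x - y|| from k Gaussian random projections."""
    diff = [a - b for a, b in zip(x, y)]
    total = 0.0
    for _ in range(k):
        # One random direction with i.i.d. N(0, 1) entries.
        g = [rng.gauss(0.0, 1.0) for _ in diff]
        p = sum(gi * di for gi, di in zip(g, diff))
        total += p * p
    # The mean of the squared projections estimates the squared distance.
    return math.sqrt(total / k)

rng = random.Random(0)
x = [float(i) for i in range(20)]
y = [float(i) + 0.5 for i in range(20)]
true_distance = math.dist(x, y)
estimate = projected_distance(x, y, k=2000, rng=rng)
```

With a single projection the estimate is very noisy; the relative error shrinks roughly as 1/sqrt(k) as more projections are averaged.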

Locality Sensitive Hashing
– Hash function is a random projection
– No pair-wise computation
– Collisions are nearest neighbors
[Figure label: Distant Vector]

Remix Nearest Neighbour Algorithm 1
1. Extract database audio shingles
2. Eliminate shingles < song's mean power
3. Compute remix distance for all pairs
4. Choose pairs with remix distance < r0
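The four brute-force steps can be sketched as follows. Several details are assumptions made for illustration: a song is a list of feature frames, a shingle is the concatenation of `width` consecutive frames, power is the mean squared value of a shingle, and plain Euclidean distance stands in for the remix distance, which the slides do not define in text. All function names are hypothetical.

```python
import math

def shingles(frames, width):
    """Step 1: concatenate each run of `width` consecutive frames into one vector."""
    return [sum(frames[i:i + width], []) for i in range(len(frames) - width + 1)]

def power(v):
    """Mean squared value of a shingle (an assumed definition of power)."""
    return sum(a * a for a in v) / len(v)

def loud_shingles(frames, width):
    """Step 2: drop shingles whose power is below the song's mean shingle power."""
    all_sh = shingles(frames, width)
    mean_power = sum(power(s) for s in all_sh) / len(all_sh)
    return [s for s in all_sh if power(s) >= mean_power]

def brute_force_matches(song_a, song_b, width, r0):
    """Steps 3-4: compute all pair-wise distances and keep pairs closer than r0."""
    a = loud_shingles(song_a, width)
    b = loud_shingles(song_b, width)
    return [(i, j)
            for i, sa in enumerate(a)
            for j, sb in enumerate(b)
            if math.dist(sa, sb) < r0]  # Euclidean stand-in for remix distance

# Toy data: song_b is a slightly altered copy of song_a, song_c is unrelated.
song_a = [[0.0, 0.0], [1.0, 1.0], [1.0, 0.9], [0.0, 0.1]]
song_b = [[0.0, 0.0], [1.0, 1.0], [1.0, 1.0], [0.1, 0.0]]
song_c = [[0.0, 0.0], [3.0, 3.0], [3.0, 3.0], [0.0, 0.0]]
matches_ab = brute_force_matches(song_a, song_b, width=2, r0=0.5)
matches_ac = brute_force_matches(song_a, song_c, width=2, r0=0.5)
```

The nested loop in `brute_force_matches` is the pair-wise cost that grows quadratically with the number of shingles.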

Remix Nearest Neighbour Algorithm Revisited
1. Extract database audio shingles
2. Eliminate shingles < song's mean power
3. Hash remaining shingles, bin width = r0
4. Collisions are near-neighbour shingles
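Steps 3 and 4 of the hashed variant might look like the sketch below. Only the bin width r0 comes from the slides; the Gaussian projections, the random offsets, the number of projections per hash (`n_proj`) and the number of hash tables (`n_tables`) are assumptions borrowed from common LSH practice, and the function names are hypothetical.

```python
import math
import random
from collections import defaultdict

def make_hash(dim, r0, n_proj, rng):
    """Build one hash: n_proj random projections, each quantised into bins of width r0."""
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_proj)]
    offsets = [rng.uniform(0.0, r0) for _ in range(n_proj)]

    def h(v):
        return tuple(
            math.floor((sum(g * x for g, x in zip(p, v)) + b) / r0)
            for p, b in zip(planes, offsets)
        )
    return h

def collisions(vectors, r0, n_proj=4, n_tables=8, seed=0):
    """Return index pairs (i, j), i < j, that share a bucket in at least one table."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    pairs = set()
    for _ in range(n_tables):
        h = make_hash(dim, r0, n_proj, rng)
        buckets = defaultdict(list)
        for idx, v in enumerate(vectors):
            buckets[h(v)].append(idx)  # one hash per shingle, no pair-wise distances
        for members in buckets.values():
            pairs.update((i, j) for i in members for j in members if i < j)
    return pairs

# Toy shingles: 0 and 1 are close, 2 is far away, 3 duplicates 0 exactly.
toy_shingles = [
    [1.0, 1.0, 1.0, 0.9],
    [1.0, 1.0, 1.0, 1.0],
    [30.0, 30.0, 30.0, 30.0],
    [1.0, 1.0, 1.0, 0.9],
]
candidate_pairs = collisions(toy_shingles, r0=2.0)
```

Each shingle is hashed once per table, so the work grows linearly with the number of shingles. Collisions are near neighbours only with high probability, so a real system might confirm each candidate pair with an exact distance check.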

Method
– Choose 20 Query Songs
– Each has 3-10 Remixes
– 306 Madonna Songs
– 2018 Madonna+Miles

Results

Conclusions
– Remixes are hard, but well-posed
– Brute force distances too expensive
– LSH is 1-2 orders of magnitude faster
– LSH Remix Recognition is Accurate
