A Music Search Engine Built upon Audio-based and Web-based Similarity Measures
P. Knees, T. Pohle, M. Schedl, G. Widmer
SIGIR 2007
INTRODUCTION
- Basically all existing music search systems use manually assigned, subjective meta-information such as genre or style to index the underlying music collection
  - Explicit manual annotations
  - A small set of meta-data
- Recent approaches
  - Content-based analysis of the audio files
  - Collaborative recommendations
- Incorporate information from different sources
RELATED WORK
- Query-by-example
- Query-by-Humming/Singing (QBHS)
- Operate on MIDI
- Music piece → meta-data
- Cross-media
- Semantic ontology
- Semantic relations
- Crawler on "audio blogs"
- Word sense disambiguation
- Text surrounding the links to audio files
- Last.fm: listening habits and tags
PREPROCESSING THE COLLECTION
- ID3 tags
  - Artist
  - Album
  - Title
- Ignored
  - Speech-only pieces (skits in rap)
  - Intro / Outro
  - Duration below 1 minute
WEB-BASED FEATURES
- Queries to Google
  1. "artist" music
  2. "artist" "album" music review
  3. "artist" "title" music review -lyrics
- For each query, retrieve the 100 top-ranked pages
- Remove HTML tags and stop words in 6 languages
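The three query patterns above can be sketched as a small helper. This is only an illustration: the function name `build_queries` and the use of straight double quotes for exact-phrase matching are assumptions, and the actual retrieval code of the authors is not shown on the slides.

```python
def build_queries(artist, album, title):
    """Return the three Google query strings listed on the slide.

    Hypothetical helper: the quoting style is an assumption.
    """
    return [
        f'"{artist}" music',
        f'"{artist}" "{album}" music review',
        f'"{artist}" "{title}" music review -lyrics',
    ]


queries = build_queries("Radiohead", "OK Computer", "Karma Police")
```

Each track thus yields one artist-level, one album-level, and one title-level query, with `-lyrics` excluding lyrics pages from the title-level results.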
WEB-BASED FEATURES (CONT.)
- Term list of each music piece
  - Remove all terms with $df_{tm} \le 2$
- Global term list
  - Remove all terms that co-occur with fewer than 0.1% of the music pieces
  - Result: 78,000 terms (dimensions)
- $weight(t, m)$: $tf \cdot idf$
  - $N$: number of music pieces
  - $mpf_t$: music piece frequency of term $t$
- Cosine normalization
  - Removes the influence of the length of the pages
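A minimal sketch of the weighting and normalization step follows. The slide only names tf·idf with $N$ and $mpf_t$; the plain form $tf \cdot \log_2(N / mpf_t)$ used here is an assumption, and the paper may well use a damped variant. The function name `tfidf_vectors` and the dictionary layout are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(term_counts):
    """term_counts maps a piece id to a Counter of term frequencies.

    Returns cosine-normalized tf*idf vectors as dicts (term -> weight).
    Assumed weighting: tf * log2(N / mpf_t).
    """
    n_pieces = len(term_counts)
    # mpf[t]: number of music pieces whose term list contains t
    mpf = Counter()
    for counts in term_counts.values():
        mpf.update(counts.keys())

    vectors = {}
    for piece, counts in term_counts.items():
        weights = {t: tf * math.log2(n_pieces / mpf[t])
                   for t, tf in counts.items()}
        # Cosine normalization: scale the vector to unit length
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors[piece] = {t: (w / norm if norm > 0 else 0.0)
                          for t, w in weights.items()}
    return vectors


vecs = tfidf_vectors({
    "a": Counter({"rock": 2, "guitar": 1}),
    "b": Counter({"rock": 1, "piano": 3}),
})
```

In this toy input, "rock" occurs in every piece and therefore receives weight 0, while the remaining term of each piece is scaled to 1 regardless of how often it occurred.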
AUDIO-BASED SIMILARITY
- MFCCs, Gaussian Mixture Model, KL divergence
- Problems
  - Hubs: pieces that are frequently similar to others
  - Outliers: pieces that are never similar to others
  - Triangle inequality: not fulfilled
- The authors' previous work addresses these problems
AUDIO-BASED SIMILARITY (CONT.)
- Always similar (hubs)
  - $n_{dist}(A)$ = distance from $A$ to its $n$-th nearest neighbour
  - $g(A, P_i) = D_{basic}(A, P_i) / n_{dist}(P_i)$, for all $i$
  - Sort $g(A, P_i)$ in ascending order and pick the $n$-th value as $f(A)$
  - $D_{n\text{-}NN\,norm}(A, B) = D_{basic}(A, B) / (f(A) \cdot f(B))$
- Never similar (outliers)
  - Handled like above
- Triangle inequality
  - Sort $D_{basic}(A, P_i)$, for all $i$
  - Interpolate $D_{basic}(A, B)$ into the sorted $D_{basic}(A, P_i)$
  - $D_P(A, B)$ is the rank of $D_{basic}(A, B)$ among the $D_{basic}(A, P_i)$
  - $D_{pv}(A, B) = D_P(A, B) + D_P(B, A)$
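The two corrections can be sketched on a small distance matrix. This is an illustration under stated assumptions, not the authors' implementation: the reference pieces $P_i$ are taken to be all other pieces in the collection, self-distances are excluded from the n-NN normalization, ties in the ranking are broken by index, and the function names `nn_normalize` and `rank_distance` are hypothetical.

```python
def nn_normalize(d_basic, n):
    """D_basic(A, B) / (f(A) * f(B)), following the slide's definitions.

    Assumption: P_i ranges over all pieces except the piece itself.
    """
    size = len(d_basic)

    def others(a):
        return [i for i in range(size) if i != a]

    # n_dist[a]: distance from a to its n-th nearest neighbour
    n_dist = [sorted(d_basic[a][i] for i in others(a))[n - 1]
              for a in range(size)]
    # f[a]: n-th smallest value of g(a, P_i) = D_basic(a, P_i) / n_dist(P_i)
    f = [sorted(d_basic[a][i] / n_dist[i] for i in others(a))[n - 1]
         for a in range(size)]
    return [[d_basic[a][b] / (f[a] * f[b]) for b in range(size)]
            for a in range(size)]


def rank_distance(d_basic):
    """D_pv(A, B) = D_P(A, B) + D_P(B, A), with D_P the rank within row A."""
    size = len(d_basic)
    rank = []
    for a in range(size):
        # sorted() is stable, so ties are broken by piece index
        order = sorted(range(size), key=lambda i: d_basic[a][i])
        position = {piece: r for r, piece in enumerate(order)}
        rank.append([position[b] for b in range(size)])
    return [[rank[a][b] + rank[b][a] for b in range(size)]
            for a in range(size)]


D = [[0, 1, 4],
     [1, 0, 2],
     [4, 2, 0]]
normalized = nn_normalize(D, n=1)
ranked = rank_distance(D)
```

Because $D_P(A, B)$ and $D_P(B, A)$ are summed, the resulting rank-based measure is symmetric even when the individual ranks differ.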
DIMENSIONALITY REDUCTION
- $\chi^2$ test
  - s: the 100 most similar tracks
  - d: the 100 most dissimilar tracks
- Calculate $\chi^2(t, s)$ from the contingency counts

          s    d
    t     A    B
    !t    C    D

- The n terms with the highest value are then joined into a global list
- n determines the resulting dimensionality
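The slide does not spell out the statistic itself. A sketch using the standard $\chi^2$ formula for a 2×2 contingency table, $\chi^2 = N (AD - BC)^2 / ((A+B)(A+C)(B+D)(C+D))$ with $N = A + B + C + D$, is given below; that the paper uses exactly this form is an assumption, and the helper names `contingency` and `chi_square` as well as the representation of documents as term sets are hypothetical.

```python
def contingency(term, similar_docs, dissimilar_docs):
    """Count A, B, C, D for one term; each document is a set of terms."""
    a = sum(1 for doc in similar_docs if term in doc)     # t present, in s
    b = sum(1 for doc in dissimilar_docs if term in doc)  # t present, in d
    c = len(similar_docs) - a                             # t absent, in s
    d = len(dissimilar_docs) - b                          # t absent, in d
    return a, b, c, d


def chi_square(a, b, c, d):
    """Standard chi-square statistic for a 2x2 contingency table."""
    n = a + b + c + d
    denom = (a + b) * (a + c) * (b + d) * (c + d)
    if denom == 0:
        return 0.0  # degenerate table: the term carries no information
    return n * (a * d - b * c) ** 2 / denom


similar = [{"guitar", "riff"}, {"guitar"}, {"riff", "loud"}]
dissimilar = [{"piano"}, {"piano", "calm"}, {"guitar", "calm"}]
counts = contingency("guitar", similar, dissimilar)
score = chi_square(*counts)
```

A term that is equally common in s and d scores 0, so only terms that discriminate between similar and dissimilar tracks end up among the n highest-scoring ones.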
VECTOR ADAPTATION
- Particularly necessary for tracks for which no related information could be retrieved from the web
- Perform a simple smoothing
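The slide does not say how the smoothing is done. One plausible reading, sketched here purely as an assumption, is to replace each track's term vector with a weighted average of its own vector and those of its nearest audio neighbours, with weights that decay with neighbour rank. The Gaussian decay, the parameter `sigma`, and the function name `smooth_vector` are all hypothetical choices for illustration.

```python
import math


def smooth_vector(track, vectors, neighbours, sigma=2.0):
    """Blend a track's term vector with those of its audio neighbours.

    neighbours[track] lists other track ids by increasing audio distance.
    Assumption: Gaussian weights over the neighbour rank (rank 0 = the
    track itself); the actual scheme is not given on the slide.
    """
    ranked = [track] + list(neighbours[track])
    weights = [math.exp(-(r ** 2) / (2 * sigma ** 2))
               for r in range(len(ranked))]
    total = sum(weights)
    dims = len(vectors[track])
    return [sum(w * vectors[t][k] for w, t in zip(weights, ranked)) / total
            for k in range(dims)]


vectors = {"a": [0.0, 0.0], "b": [1.0, 0.0], "c": [1.0, 2.0]}
smoothed = smooth_vector("a", vectors, {"a": ["b", "c"]})
```

In this toy case track "a" has an all-zero vector, as would happen when no web pages were found, and it inherits non-zero weights from its audio neighbours "b" and "c".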
QUERYING THE MUSIC SEARCH ENGINE
- Original query + "music" -site:last.fm
- Google search: the 10 top-most web pages
- Map to the vector space
- Calculate Euclidean distances
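The final ranking step can be sketched as follows, assuming the query has already been mapped to a vector in the same term space as the tracks. The function name `rank_tracks` and the dictionary layout are hypothetical; how the 10 retrieved pages are combined into the query vector is not shown on the slide and is left out here.

```python
import math


def rank_tracks(query_vector, track_vectors):
    """Return track ids ordered by increasing Euclidean distance to the query."""
    def euclidean(u, v):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

    return sorted(track_vectors,
                  key=lambda t: euclidean(query_vector, track_vectors[t]))


tracks = {"x": [1.0, 0.0], "y": [0.0, 1.0], "z": [0.9, 0.1]}
ranking = rank_tracks([1.0, 0.0], tracks)
```

For unit-length vectors, ordering by Euclidean distance gives the same ranking as ordering by decreasing cosine similarity.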
AUDIOSCROBBLER GROUND TRUTH
- Common approach: genre information
  - Several drawbacks
- Web services to access Last.fm data
- Tag information provided by Last.fm
  - Drawbacks
- Using top tags for tracks (227 tags in total)
PERFORMANCE EVALUATION
- Dimensionality reduction
  - Passes the significance test
  - $\chi^2$/50 performs best
  - Random permutation
PERFORMANCE EVALUATION
- Vector adaptation (re-weighting)
  - No significant difference
PERFORMANCE EVALUATION
- Overall
- Precision after 10 documents
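Precision after 10 documents is the fraction of the first 10 returned tracks that are relevant. A minimal sketch follows; treating a track as relevant when it carries the query tag in the Last.fm ground truth is an assumption about how relevance is judged, and the function name `precision_at_k` is hypothetical.

```python
def precision_at_k(ranked, relevant, k=10):
    """Fraction of the first k results that are in the relevant set.

    Divides by k, so a result list shorter than k is penalized.
    """
    top = ranked[:k]
    return sum(1 for item in top if item in relevant) / k


results = list("abcdefghijkl")   # ranked track ids returned for a query
tagged = {"a", "c", "e", "l"}    # tracks carrying the query tag
p10 = precision_at_k(results, tagged)
```

Track "l" is relevant but ranked twelfth, so it does not count towards the value.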
EXAMPLES
- Rock with great riffs
- Punk
- Relaxing music
FUTURE WORK
[Diagram: system pipeline with the stages tracks, ID3 tag, Google search, web-based feature, audio similarity, dimensionality reduction, vector adaptation; and query, Google search, vector space, results]
- Compilations, remixes
- Lyrics
- PLSA
- Indexing documents
- Computational inefficiency
- Ground truth?