A Music Search Engine Built upon Audio-based and Web-based Similarity Measures
P. Knees, T. Pohle, M. Schedl, G. Widmer
SIGIR 2007



INTRODUCTION
Basically all existing music search systems make use of manually assigned subjective meta-information like genre or style to index the underlying music collection.
  Explicit manual annotations
  A small set of meta-data
Recent approaches
  Content-based analysis of the audio files
  Collaborative recommendations
  Incorporate information from different sources

RELATED WORK
Query-by-example
  Query-by-Humming/Singing (QBHS)
  Operate on MIDI
Music piece → Meta-data
Cross-media
Semantic ontology
  Semantic relations
Crawler on “audio blogs”
Word sense disambiguation
  Text surrounding the links to audio files
Last.fm: listening habits & tags

PREPROCESSING THE COLLECTION
ID3 tags
  Artist
  Album
  Title
Ignored
  Speech-only pieces (skits in rap)
  Intro / Outro
  Duration below 1 minute

WEB-BASED FEATURES
Queries to Google
  1. “artist” music
  2. “artist” “album” music review
  3. “artist” “title” music review -lyrics
For each query, retrieve the 100 top-ranked pages
Remove HTML tags and stop words in 6 languages
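The three query patterns can be sketched as a small helper. This is only an illustration: the function name `build_queries` is hypothetical, and the slide does not say how the queries were actually issued or how the result pages were fetched.

```python
def build_queries(artist, album, title):
    """Return the three query strings listed on the slide for one track.

    The artist, album and title values are assumed to come from the
    track's ID3 tags (see the preprocessing slide).
    """
    return [
        f'"{artist}" music',
        f'"{artist}" "{album}" music review',
        f'"{artist}" "{title}" music review -lyrics',
    ]


if __name__ == "__main__":
    for query in build_queries("Radiohead", "OK Computer", "Karma Police"):
        print(query)
```

The `-lyrics` exclusion appears only in the title query, as on the slide.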

WEB-BASED FEATURES (CONT.)
Term list of each music piece
  Remove all terms with $df_{t,m} \le 2$
Global term list
  Remove all terms that co-occur < 0.1%
  Resulting 78,000 terms (dimensions)
$weight(t, m)$: $tf \cdot idf$
  $N$: number of music pieces
  $mpf_t$: music piece frequency
Cosine normalization
  Removes the influence of the length of pages
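A minimal sketch of tf-idf weighting followed by cosine normalization. The slide only states "tf * idf" with N and mpf_t, so the logarithmic form used here, (1 + log2 tf) * log2(N / mpf_t), is an assumption, as are the function name and the input layout.

```python
import math


def tfidf_vectors(term_counts):
    """Compute cosine-normalized tf-idf weights per music piece.

    term_counts maps piece id -> {term: term frequency}.
    The log-scaled tf and idf below are an assumed variant; the slide
    gives only "tf * idf" with N and mpf_t.
    """
    n_pieces = len(term_counts)  # N: number of music pieces

    # mpf_t: number of pieces whose term list contains t
    mpf = {}
    for counts in term_counts.values():
        for term, tf in counts.items():
            if tf > 0:
                mpf[term] = mpf.get(term, 0) + 1

    vectors = {}
    for piece, counts in term_counts.items():
        raw = {
            term: (1 + math.log2(tf)) * math.log2(n_pieces / mpf[term])
            for term, tf in counts.items()
            if tf > 0
        }
        # Cosine normalization: scale each vector to unit length so that
        # long pages do not dominate.
        norm = math.sqrt(sum(w * w for w in raw.values()))
        vectors[piece] = {t: (w / norm if norm > 0 else 0.0) for t, w in raw.items()}
    return vectors
```

With this variant, a term that occurs in every piece gets weight 0, because log2(N / mpf_t) is 0.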

AUDIO-BASED SIMILARITY
MFCCs, Gaussian Mixture Model, KL divergence
Problems
  Hubs: frequently similar to other pieces
  Outliers: never similar to others
  Triangle inequality: not fulfilled
The authors' previous work solves these problems
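To illustrate the kind of distance involved, here is a sketch of a symmetrized KL divergence. It deliberately swaps in a simplification: each track is modelled by a single Gaussian with diagonal covariance over its MFCC frames, not by the Gaussian Mixture Model named on the slide, because the KL divergence between mixtures has no closed form. All names are hypothetical.

```python
import math


def kl_diag_gauss(mean_p, var_p, mean_q, var_q):
    """KL(p || q) for two Gaussians with diagonal covariance.

    Each argument is a list with one entry per MFCC dimension;
    variances must be strictly positive.
    """
    return 0.5 * sum(
        math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
        for mp, vp, mq, vq in zip(mean_p, var_p, mean_q, var_q)
    )


def symmetric_kl(track_a, track_b):
    """Symmetrized divergence; each track is a (means, variances) pair."""
    return kl_diag_gauss(*track_a, *track_b) + kl_diag_gauss(*track_b, *track_a)
```

A value computed this way is symmetric and zero for identical models, but it still does not satisfy the triangle inequality, which is one of the problems listed above.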

AUDIO-BASED SIMILARITY (CONT.)
Always similar: hubs
  $n_{dist}(A)$ = distance from $A$ to its $n$-th nearest neighbour
  $g(A, P_i) = D_{basic}(A, P_i) / n_{dist}(P_i)$, for all $i$
  Sort $g(A, P_i)$ ascending, pick the $n$-th value as $f(A)$
  $D_{n\text{-}NN\,norm}(A, B) = D_{basic}(A, B) / (f(A) \cdot f(B))$
Never similar: outliers
  Like above
Triangle inequality
  Sort $D_{basic}(A, P_i)$, for all $i$
  Interpolate $D_{basic}(A, B)$ into the sorted $D_{basic}(A, P_i)$
  $D_P(A, B)$ is the rank of $D_{basic}(A, B)$ in $D_{basic}(A, P_i)$
  $D_{pv}(A, B) = D_P(A, B) + D_P(B, A)$
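The two corrections on this slide can be sketched over a full distance matrix. The sketch assumes that `D` is a symmetric list of lists holding D_basic, that a piece is excluded from its own neighbour list, that distances between distinct pieces are positive, and that ties in the rank are counted inclusively. None of these details are stated on the slide.

```python
def _others(size, a):
    """Indices of all pieces except a (a piece is not its own neighbour)."""
    return [i for i in range(size) if i != a]


def nn_normalize(D, n):
    """n-NN normalization: D_basic(A, B) / (f(A) * f(B))."""
    size = len(D)
    # n_dist(A): distance from A to its n-th nearest neighbour
    n_dist = [sorted(D[a][i] for i in _others(size, a))[n - 1] for a in range(size)]
    # f(A): n-th smallest value of g(A, P_i) = D_basic(A, P_i) / n_dist(P_i)
    f = [
        sorted(D[a][i] / n_dist[i] for i in _others(size, a))[n - 1]
        for a in range(size)
    ]
    return [[D[a][b] / (f[a] * f[b]) for b in range(size)] for a in range(size)]


def proximity_verification(D):
    """D_pv(A, B) = D_P(A, B) + D_P(B, A), with D_P the rank of B seen from A."""
    size = len(D)

    def rank(a, b):
        # Number of pieces whose distance from a is at most D_basic(a, b)
        return sum(1 for i in _others(size, a) if D[a][i] <= D[a][b])

    return [
        [0 if a == b else rank(a, b) + rank(b, a) for b in range(size)]
        for a in range(size)
    ]
```

Because D_pv adds the rank in both directions, it is symmetric even when one piece is a hub that appears near the top of many other pieces' lists.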

DIMENSIONALITY REDUCTION
$\chi^2$ test
  s: 100 most similar tracks
  d: 100 most dissimilar tracks
  Calculate $\chi^2(t, s)$
  N terms with highest value are then joined into a global list
Contingency table
         s    d
    t    A    B
   !t    C    D
n __ dimensionality
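A sketch of the term-selection step using the standard 2x2 chi-square statistic, chi2 = n(AD - BC)^2 / ((A+B)(A+C)(B+D)(C+D)), where A, B, C, D are the contingency-table cells and n is their sum. The slide does not show the formula, so the use of this standard form is an assumption, as are the function names and the representation of each track as a set of terms.

```python
def chi_square(a, b, c, d):
    """Standard 2x2 chi-square statistic.

    a: tracks in s containing the term      b: tracks in d containing it
    c: tracks in s without the term         d: tracks in d without it
    """
    n = a + b + c + d
    denom = (a + b) * (a + c) * (b + d) * (c + d)
    if denom == 0:
        return 0.0  # term occurs everywhere or nowhere: not discriminative
    return n * (a * d - b * c) ** 2 / denom


def top_terms(similar, dissimilar, n_terms):
    """Rank terms by chi-square between the similar and dissimilar sets.

    similar / dissimilar are lists of term sets, one set per track.
    """
    vocab = set().union(*similar, *dissimilar)
    scores = {}
    for term in vocab:
        a = sum(term in doc for doc in similar)
        b = sum(term in doc for doc in dissimilar)
        scores[term] = chi_square(a, b, len(similar) - a, len(dissimilar) - b)
    # Highest score first; ties broken alphabetically for determinism
    return sorted(scores, key=lambda t: (-scores[t], t))[:n_terms]
```

In the setting of the slide this would be run once per track, with that track's 100 most similar and 100 most dissimilar tracks, and the selected terms would then be merged into the global list.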

VECTOR ADAPTATION
Particularly necessary for tracks for which no related information could be retrieved from the web
Perform a simple smoothing
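The slide does not spell out the smoothing, so the following is only one plausible sketch: each track's term vector is replaced by a weighted average of its own vector and the vectors of its nearest neighbours under the audio similarity, with weights that decay with neighbour rank. The Gaussian decay, the `sigma` parameter and all names are assumptions.

```python
import math


def smooth_vector(track, vectors, audio_neighbours, sigma=2.0):
    """Rank-weighted average of a track's vector and its audio neighbours.

    vectors maps track id -> list of term weights (equal length).
    audio_neighbours lists track ids from most to least similar.
    """
    ranked = [track] + list(audio_neighbours)
    # Rank 0 is the track itself; weights fall off with increasing rank.
    weights = [math.exp(-(rank ** 2) / (2 * sigma ** 2)) for rank in range(len(ranked))]
    total = sum(weights)
    dims = len(vectors[track])
    return [
        sum(w * vectors[t][k] for w, t in zip(weights, ranked)) / total
        for k in range(dims)
    ]
```

A track with an all-zero web vector then inherits terms from tracks that sound similar, which matches the motivation given on the slide.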

QUERYING THE MUSIC SEARCH ENGINE
Original query + “music” -site:last.fm
Google search
Top 10 web pages
Map to vector space
Calculate Euclidean distances
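The query path can be sketched as follows, assuming the query's web pages have already been turned into a term vector in the same space as the tracks. The function names are hypothetical, and applying cosine normalization to both sides before taking Euclidean distances is an assumption carried over from the feature-extraction slide.

```python
import math


def build_web_query(user_query):
    """Extend the user's query as on the slide."""
    return f"{user_query} music -site:last.fm"


def cosine_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def rank_tracks(query_vec, track_vecs):
    """Return track ids ordered by Euclidean distance to the query vector."""
    q = cosine_normalize(query_vec)
    dists = {
        tid: math.dist(q, cosine_normalize(vec)) for tid, vec in track_vecs.items()
    }
    return sorted(dists, key=lambda tid: (dists[tid], tid))
```

For unit-length vectors, ordering by Euclidean distance gives the same ranking as ordering by cosine similarity.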

AUDIOSCROBBLER GROUND TRUTH
Common approach: genre information
  Several drawbacks
Web services to access Last.fm data
Tag information provided by Last.fm
  Drawbacks
  Using top tags for tracks (227 tags in total)

PERFORMANCE EVALUATION
Dimensionality reduction
  Pass significance test
  $\chi^2$/50 best
  Random permutation

PERFORMANCE EVALUATION (CONT.)
Vector adaptation (re-weighting)
  No significant difference

PERFORMANCE EVALUATION (CONT.)
Overall
  Precision after 10 documents
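Precision after 10 documents is the fraction of the first 10 returned tracks that are relevant. In the sketch below, relevance is assumed to mean that a track carries the Last.fm tag used as the query, in line with the ground-truth slide. The function name and the choice to divide by k even when fewer than k tracks are returned are assumptions.

```python
def precision_at_k(ranked_ids, relevant_ids, k=10):
    """Fraction of the top-k ranked tracks that are relevant.

    ranked_ids: track ids in result order.
    relevant_ids: set of track ids considered relevant for the query.
    """
    top = ranked_ids[:k]
    return sum(1 for tid in top if tid in relevant_ids) / k
```

For example, if 2 of the first 10 results carry the query tag, the precision after 10 documents is 0.2.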

EXAMPLES
Rock with great riffs
Punk
Relaxing music


FUTURE WORK
System diagram labels: Dimensionality reduction, tracks, ID3 tag, Web-based feature, Google search, Audio similarity, Vector adaptation, Query, Google search, Vector space, results
Open issues
  Compilation albums, remixes
  Lyrics
  PLSA
  Indexing documents
  Computationally inefficient
  Ground truth?