Distributed Computing Group From Web to Map: Exploring the World of Music Olga Goussevskaia Michael Kuhn Michael Lorenzi Roger Wattenhofer Web Intelligence.

Presentation transcript:

Distributed Computing Group From Web to Map: Exploring the World of Music Olga Goussevskaia Michael Kuhn Michael Lorenzi Roger Wattenhofer Web Intelligence 2008 Sydney, Australia

2 Olga Goussevskaia, ETH Web Intelligence
Music in the old days
Storage media
– Vinyl records
– Compact cassettes
– Compact discs
An album is stored on a single physical storage medium
– Sequence of songs given by album
– Album is typically listened to as a whole
Organization by album

3 Music today
Huge offer, easily available
– File sharing, iTunes, Amazon, etc.
Large collections
– The entire collection is stored on a single electronic storage medium
– Organization by albums (and other lists) is no longer appropriate
Organize by similarity!

4 Overview
Define music similarity
From Perception to Web
– Build a graph of songs
From Web to Map
– Embed the graph into Euclidean space
Application prototype:

5 Music Similarity
Audio content analysis
Metadata analysis
Collaborative filtering
– “People who listen to this song also listen to that song”
Similar or different???

6 From Perception to Web
Data from last.fm (20M users)
– Top-50 lists (290K lists, 1.5M distinct songs)
– Co-occurrence analysis (normalization: $\mathrm{cosine}(s_i, s_j) = n_{ij} / \sqrt{n_i n_j}$)
– $10^{12}$ pairwise similarity values (O(TB)!)
Building a graph G
– Edge weight $w(s_i, s_j) = 1 / \mathrm{cosine}(s_i, s_j)$
– Sparsening: co-occ ≥ 2, $w(s_i, s_j) \geq \text{threshold}$
– $\mathrm{sim}(s_i, s_j) = \mathrm{length}(\mathrm{shortestPath}_G(s_i, s_j))$
– Still n = 430K, m = 6.3M, and ever growing
How to operate on G? (assuming G is sparse: m = O(n log n))
– Shortest path computation cost: O(m + n log n) = O(n log n)
– Memory needed to retrieve one value $\mathrm{sim}(s_i, s_j)$: O(m) = O(n log n)
Order of seconds on a state-of-the-art PC!
Need to store the whole G, even if I only have 50 songs in my collection!
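The graph construction on this slide can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names and the `min_cooccurrence` parameter are hypothetical, the additional threshold on the edge weight is left out, and Dijkstra's algorithm is used for the shortest path because the slide does not name one.

```python
import heapq
from collections import Counter, defaultdict
from itertools import combinations
from math import sqrt


def build_graph(top_lists, min_cooccurrence=2):
    """Build a weighted song graph from user lists via cosine-normalised co-occurrence."""
    song_count = Counter()  # n_i: number of lists that contain song i
    pair_count = Counter()  # n_ij: number of lists that contain both i and j
    for songs in top_lists:
        unique = sorted(set(songs))
        song_count.update(unique)
        pair_count.update(combinations(unique, 2))

    graph = defaultdict(dict)
    for (a, b), n_ab in pair_count.items():
        if n_ab < min_cooccurrence:
            continue  # sparsening: drop pairs that co-occur too rarely
        cosine = n_ab / sqrt(song_count[a] * song_count[b])
        graph[a][b] = graph[b][a] = 1.0 / cosine  # edge weight w = 1 / cosine
    return graph


def sim(graph, source, target):
    """Length of the shortest path between two songs; inf if they are not connected."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            return d
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return float("inf")
```

Every call to `sim` needs the whole adjacency structure in memory and may touch a large part of the graph, which is the cost the slide complains about.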

7 From Web to Map
Embedding: map vertices of G into points in Euclidean space, s.t. the stretch $d_G / d_E$ is “minimized”.
Computation cost of sim(i, j): O(1) time, O(1) memory per item
Embedding algorithms:
– Multidimensional scaling (MDS): $O(d n^2)$
– Spring embedding (Fruchterman-Reingold): $O(n^2 + m)$
– MIS-filtering: $O(n \log^2 \Delta)$
– High-dimensional embedding: $O(n l^2 + l m)$
– Landmark MDS (LMDS): $O(n l d + l^3)$
– Adaptive computation/quality tradeoff
– Suitable for dynamic settings
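To make the notion of stretch concrete, the sketch below compares given graph distances with the Euclidean distances of an already computed embedding. It does not implement any of the embedding algorithms listed above. The function names and the `worst_distortion` summary are assumptions added for illustration, and the code assumes no two songs share identical coordinates.

```python
from math import dist


def embedded_sim(coords, a, b):
    """Similarity lookup after embedding: a single Euclidean distance per query."""
    return dist(coords[a], coords[b])


def stretch(graph_distance, coords):
    """Return d_G / d_E for every pair with a known graph distance."""
    return {
        (a, b): d_g / embedded_sim(coords, a, b)
        for (a, b), d_g in graph_distance.items()
    }


def worst_distortion(graph_distance, coords):
    """Largest factor by which any pair is stretched or shrunk by the embedding."""
    return max(max(s, 1.0 / s) for s in stretch(graph_distance, coords).values())
```

A stretch of 1 means the embedding reproduces the graph distance exactly; values far from 1 mark pairs that the map represents poorly.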

8 Iterative Embedding
Assumption: some links erroneously shortcut certain paths
E[# random edges] = X
Repeat (X / f) times
– Embed G (using e.g. LMDS)
– Remove (from G) fraction f of edges with highest stretch $d_E / d_G$
Example: Kleinberg graph (20x20 grid, f = 0.003)
[Figure panels: spring embedding output; after 6 rounds; after 12 rounds; after 30 rounds]
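The embed-and-prune loop can be sketched as follows. This is an illustration under assumptions, not the authors' code: edges are stored as a hypothetical dictionary mapping a vertex pair to its weight, the embedding routine is passed in as a callable `embed` (LMDS or anything else), and the number of rounds is a plain parameter rather than X / f.

```python
from math import ceil, dist


def prune_high_stretch_edges(edges, coords, fraction):
    """Drop the given fraction of edges with the highest stretch d_E / d_G."""
    def edge_stretch(edge):
        (u, v), d_g = edge
        return dist(coords[u], coords[v]) / d_g

    n_remove = ceil(fraction * len(edges))
    ranked = sorted(edges.items(), key=edge_stretch, reverse=True)
    return dict(ranked[n_remove:])


def iterative_embedding(edges, embed, fraction, rounds):
    """Alternate between embedding the graph and pruning its most stretched edges."""
    for _ in range(rounds):
        coords = embed(edges)
        edges = prune_high_stretch_edges(edges, coords, fraction)
    return edges, embed(edges)
```

An edge that shortcuts a long path ends up much longer in the embedding than its weight suggests, so it ranks first and is removed before the next round.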

9 Evaluation
How well does the resulting map represent music similarity?
Music Taxonomy (
– Control set: 7K songs with genre information
Genre distance $d_S$ = LCA (least common ancestor)
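The slide does not spell out how the least common ancestor is turned into a number. The sketch below assumes one plausible reading: the genre distance is the number of levels one has to climb from either genre, whichever is more, to reach their LCA. The taxonomy is a hypothetical child-to-parent dictionary, and the function names are illustrative.

```python
def ancestors(parent, genre):
    """Path from a genre up to the root, starting with the genre itself."""
    path = [genre]
    while genre in parent:
        genre = parent[genre]
        path.append(genre)
    return path


def genre_distance(parent, a, b):
    """Levels to climb from a or b (whichever is more) to reach their LCA."""
    path_a = ancestors(parent, a)
    path_b = ancestors(parent, b)
    depth_in_b = {g: i for i, g in enumerate(path_b)}
    for steps_a, g in enumerate(path_a):
        if g in depth_in_b:  # first shared ancestor on the way up is the LCA
            return max(steps_a, depth_in_b[g])
    raise ValueError("genres do not share a root")
```

Under this reading, two songs of the same genre have distance 0, and the distance grows as the first shared category moves closer to the root of the taxonomy.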

10 Evaluation: Quality Measures
Distance comparison $Q_L$: average similarity increase as a function of genre distance $d_S$
– Songs that belong to distant genres should be far away in the embedding.
– [Plot axis: avg. similarity of pairs $(s_i, s_j)$ with $d_S(i, j) = h$]
Embedding smoothness $Q_R$: average # of genre re-occurrences on a random line
– Genre transitions in the embedding should be “smooth”.
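The quantity plotted for the distance comparison can be approximated as below. This is a simplified reading of $Q_L$ and not its exact definition: the sketch only averages the embedding distance of all song pairs that share the same genre distance h. The `genre_distance` argument is a hypothetical callable on two song identifiers.

```python
from collections import defaultdict
from itertools import combinations
from math import dist


def mean_distance_by_genre_distance(coords, genre_distance):
    """Group song pairs by genre distance h and average their embedding distance."""
    total = defaultdict(float)
    count = defaultdict(int)
    for a, b in combinations(sorted(coords), 2):
        h = genre_distance(a, b)
        total[h] += dist(coords[a], coords[b])
        count[h] += 1
    return {h: total[h] / count[h] for h in sorted(total)}
```

If the map reflects genre structure, the returned averages should grow with h. The loop visits all pairs, so on a 7K-song control set it would typically be run on a sample.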

11 Evaluation: Iterative Embedding
[Figure panels: LMDS output (430K nodes, 10 dimensions); after 30 rounds, f = 0.5%]

12 Evaluation
Closest neighbors in 10D

13 Applications: Music Explorer
– Web service to query coordinates (current DB with 430K titles)
– Visualization in 2D
– Zoom level according to song popularity
– Playlist generation based on trajectories

14 Playlist generation
Interpolation between start and end point
– Smooth transition from one style to the other
– In reality: 10 dimensions
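A playlist along a trajectory could look like the following sketch. It assumes the simplest possible trajectory, a straight line between the two songs, and picks for each evenly spaced point the closest song not yet used. The function name and the greedy selection rule are assumptions; the slides do not describe the actual procedure.

```python
from math import dist


def playlist(coords, start, end, length):
    """Pick songs closest to evenly spaced points on the segment start -> end."""
    if length < 2:
        raise ValueError("a playlist needs at least a start and an end song")
    p, q = coords[start], coords[end]
    chosen = [start]
    for step in range(1, length - 1):
        t = step / (length - 1)
        # Linear interpolation between the start and end coordinates.
        target = tuple(a + t * (b - a) for a, b in zip(p, q))
        candidates = [s for s in coords if s not in chosen and s != end]
        if not candidates:
            break
        chosen.append(min(candidates, key=lambda s: dist(coords[s], target)))
    chosen.append(end)
    return chosen
```

The same code works unchanged for 10-dimensional coordinates, because `zip` and `math.dist` operate on tuples of any length.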

15 Music in Euclidean Space
Performance
– Similarity computation comes almost for free: O(1) time
– Memory footprint is extremely low: O(1) per song
– All information can be saved in the file; no server connection required
Applications
– Trajectories (playlists, ...)
– Volumes (region of interest, ...)
– Notion of direction
Coordinates are well suited for mobile applications
Coordinates are well suited for similarity-based organization
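Once every song carries its own coordinates, queries such as a region of interest or nearest neighbors need nothing but the local collection. The sketch below is illustrative only; the function names are hypothetical, and a linear scan is used because a personal collection is small.

```python
from math import dist


def songs_in_region(coords, center, radius):
    """Songs whose coordinates fall inside a ball around the given point."""
    return sorted(s for s, p in coords.items() if dist(p, center) <= radius)


def nearest_songs(coords, song, k):
    """The k songs closest to the given one, using only stored coordinates."""
    others = (s for s in coords if s != song)
    return sorted(others, key=lambda s: dist(coords[s], coords[song]))[:k]
```

Neither function consults the song graph, which is why no server connection is required at query time.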

16 Towards a new world of music?
Euclidean representation
– Efficient similarity computation (time and memory)
– No server needed: distributed applications
– Building blocks for new functionalities: trajectories (playlists), volumes (interest regions), notion of direction (browsing)
New scenarios:
– Mobile file sharing
– P2P overlay based on the map
– Innovations at home
– “Play anything hip-hop… not this and not closely related songs… go towards Detroit house, be there in an hour”
– Automatic DJ (collect feedback from mobiles, generate playlists based on guests' regions of interest)

17 Conclusions
Necessary?

18 Thanks for your Attention
Questions?