Recommendation Systems ARGEDOR
Introduction Sample Data Tools Cases
Introduction Recommender systems reduce information overload by estimating relevance. Which artist should I listen to, based on my preferences? What is the best holiday for me and my family? Which movie should I watch? Which web sites will I find interesting? Which book should I buy for my next vacation?
Personalized Recommendation
Collaborative Filtering Collaborative: "Tell me what's popular among my peers"
Content Based Recommendation Content-based: "Show me more of what I've liked"
Knowledge Based Recommendation Knowledge-based: "Tell me what fits based on my needs"
Hybrid Models Hybrid: combinations of various inputs and/or compositions of different mechanisms
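The collaborative idea above ("what's popular among my peers") can be illustrated with a minimal user-based sketch. The toy ratings, the names `cosine` and `recommend`, and the choice of cosine similarity are assumptions for illustration; the slides do not state which similarity measure is used.

```python
from math import sqrt

# Toy ratings: user -> {item: rating}. All values are made up for illustration.
ratings = {
    "alice": {"A": 5, "B": 3, "C": 4},
    "bob":   {"A": 5, "B": 3, "D": 5},
    "carol": {"A": 1, "B": 5, "E": 4},
}

def cosine(u, v):
    """Cosine similarity between two sparse rating vectors."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = sqrt(sum(r * r for r in u.values()))
    norm_v = sqrt(sum(r * r for r in v.values()))
    return dot / (norm_u * norm_v)

def recommend(user, ratings, k=2):
    """Rank unseen items by the similarity-weighted ratings of the k nearest peers."""
    me = ratings[user]
    peers = sorted(
        ((cosine(me, other), name) for name, other in ratings.items() if name != user),
        reverse=True,
    )[:k]
    scores, weights = {}, {}
    for sim, name in peers:
        for item, r in ratings[name].items():
            if item not in me:
                scores[item] = scores.get(item, 0.0) + sim * r
                weights[item] = weights.get(item, 0.0) + sim
    predicted = {i: scores[i] / weights[i] for i in scores if weights[i] > 0}
    return sorted(predicted, key=predicted.get, reverse=True)

print(recommend("alice", ratings))
```

Here "alice" is closest to "bob", so bob's unseen item is ranked ahead of carol's.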
Sample Data: MovieLens 1M dataset Content Data Item movies.dat
MovieID::Title::Genres
1::Toy Story (1995)::Animation|Children's|Comedy
Sometimes content data contains the time of creation for the content.
Sample Data: MovieLens 1M dataset Content Data User users.dat
UserID::Gender::Age::Occupation::Zip-code
1::F::1::10::48067
Since this data is anonymized, it contains no user-name-related information.
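The two `::`-delimited record formats above can be parsed with a few lines of Python. This is a minimal sketch: the function names and dictionary keys are illustrative choices, and age and occupation are kept as the integer codes the dataset stores.

```python
def parse_movie(line):
    """Parse one movies.dat record: MovieID::Title::Genres."""
    movie_id, title, genres = line.rstrip("\n").split("::")
    return {
        "movie_id": int(movie_id),
        "title": title,
        "genres": genres.split("|"),  # genres are pipe-separated
    }

def parse_user(line):
    """Parse one users.dat record: UserID::Gender::Age::Occupation::Zip-code."""
    user_id, gender, age, occupation, zip_code = line.rstrip("\n").split("::")
    return {
        "user_id": int(user_id),
        "gender": gender,
        "age": int(age),                # coded age group, not an age in years
        "occupation": int(occupation),  # coded occupation
        "zip_code": zip_code,           # kept as text to preserve leading zeros
    }

movie = parse_movie("1::Toy Story (1995)::Animation|Children's|Comedy")
user = parse_user("1::F::1::10::48067")
print(movie["title"], movie["genres"])
print(user)
```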
Sample Data: TTNet Music
TTNet Music User Rating Logs
userId,songId,albumId,artistId,timeofaction,ratingValue,channel
, ,286068,546697, :17:49,0.9,SI
– Rating value is a derived value obtained by a formula depending on the user's actions (listened, downloaded, listened before, etc.)
– For the TTNet Music recommendation engine we have approximately 1 million unique user action logs daily.
– Stored on a distributed file system. Used for collaborative filtering.
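The slides do not give the formula behind the derived rating value, so the following is only a sketch of the general idea of turning implicit actions into a score in [0, 1]. The action names, the weights in `ACTION_WEIGHTS`, and the function `derived_rating` are all hypothetical and are not the TTNet formula.

```python
# Hypothetical weights: invented for illustration, NOT the actual TTNet formula.
ACTION_WEIGHTS = {
    "listened": 0.3,
    "listened_before": 0.2,
    "downloaded": 0.5,
}

def derived_rating(actions):
    """Sum the weights of the distinct actions and clamp the result to 1.0."""
    score = sum(ACTION_WEIGHTS.get(action, 0.0) for action in set(actions))
    return min(1.0, score)

print(derived_rating(["listened", "downloaded"]))
```

Repeated actions are counted once here; a real formula might instead reward repeated listens.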
User Profiling
Content data: Age: 18, Gender: F, Occupation:
User's Ratings: Item1: 3, Item2: 5
User profiling enables weighting of similarity metrics
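One way to read "weighting of similarity metrics" is to blend a rating-based similarity with a profile-based one. The sketch below assumes a 1 to 5 rating scale and a blending weight `alpha`; both, along with all function names, are assumptions rather than anything the slides specify.

```python
def profile_similarity(p, q, fields=("age", "gender", "occupation")):
    """Fraction of profile fields that are present and equal in both profiles."""
    matches = sum(1 for f in fields if p.get(f) is not None and p.get(f) == q.get(f))
    return matches / len(fields)

def rating_similarity(r, s, max_diff=4):
    """1 minus the normalized mean absolute difference on co-rated items.

    max_diff=4 assumes ratings on a 1..5 scale (an assumption).
    """
    common = set(r) & set(s)
    if not common:
        return 0.0
    mean_diff = sum(abs(r[i] - s[i]) for i in common) / len(common)
    return 1.0 - mean_diff / max_diff

def weighted_similarity(u, v, alpha=0.7):
    """Blend rating and profile similarity; alpha is a hypothetical weight."""
    return (alpha * rating_similarity(u["ratings"], v["ratings"])
            + (1 - alpha) * profile_similarity(u["profile"], v["profile"]))

u = {"profile": {"age": 18, "gender": "F"}, "ratings": {"Item1": 3, "Item2": 5}}
v = {"profile": {"age": 18, "gender": "M"}, "ratings": {"Item1": 4, "Item2": 5}}
print(weighted_similarity(u, v))
```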
Context Awareness Context: location, time of day, season, mood, weather. Context information is taken into account when generating recommendations.
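A simple way to take context into account is to re-rank already-scored candidates, boosting items whose tags match the current context. This is a sketch of that post-filtering idea only; the item names, tags, `boost` value, and the `rerank` function are hypothetical.

```python
def rerank(candidates, context, boost=0.2):
    """Re-rank (item, base_score, tags) tuples by how many context values they match."""
    active = set(context.values())
    rescored = []
    for item, score, tags in candidates:
        matches = len(active & set(tags))
        rescored.append((score * (1 + boost * matches), item))
    return [item for _, item in sorted(rescored, reverse=True)]

candidates = [
    ("calm_playlist", 0.6, {"evening", "rain"}),
    ("workout_mix", 0.7, {"morning"}),
]
context = {"time_of_day": "evening", "weather": "rain"}
print(rerank(candidates, context))
```

With no context the base scores decide the order; with the evening and rain context the calmer item moves to the top.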
Tools Apache Mahout: open source machine learning library for large-scale applications – Classification (Complementary Naive Bayes classifier, Random Forest decision-tree-based classifier) – Clustering (K-Means, Fuzzy K-Means clustering) – Collaborative filtering: user-based and item-based recommendations.
Tools Apache Hadoop (hadoop.apache.org) Open source framework for distributed storage and processing, including a distributed file system (HDFS). – Large-scale DBMSs run on the Hadoop file system.
Tools Neo4j Open source graph database. – Storage for highly connected data – Fast query response for large-scale databases – Most graph traversal algorithms are implemented. – Instead of scanning the whole database, a query visits only the connected parts. – A collaborative filtering data model is possible with Neo4j. – Used for ARGEDOR's content-based music recommendation projects
Example: Movie Graph DB Relations Both content data and user actions are stored in the graph DB
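The traversal idea (visit only the connected parts of the graph instead of scanning everything) can be sketched with plain Python dictionaries standing in for the graph. This is not Neo4j's API; the toy edges and the name `recommend_by_traversal` are assumptions for illustration.

```python
from collections import Counter

# Toy "user liked movie" edges; the data is made up for illustration.
liked = {
    "u1": {"Toy Story", "Jumanji"},
    "u2": {"Toy Story", "Heat"},
    "u3": {"Heat", "Casino"},
}

# Reverse index: movie -> users who liked it (the other direction of each edge).
liked_by = {}
for user, movies in liked.items():
    for movie in movies:
        liked_by.setdefault(movie, set()).add(user)

def recommend_by_traversal(user):
    """Walk user -> liked movies -> peers -> their other movies, counting paths."""
    seen = liked[user]
    counts = Counter()
    for movie in seen:
        for peer in liked_by[movie] - {user}:
            for other in liked[peer] - seen:
                counts[other] += 1
    return [movie for movie, _ in counts.most_common()]

print(recommend_by_traversal("u1"))
```

Only nodes reachable from "u1" are touched: "u3" and "Casino" are never visited because no path of this shape connects them to "u1".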