MUSIC RECOMMENDATION SYSTEM FOR LAST.FM DATASET


1 MUSIC RECOMMENDATION SYSTEM FOR LAST.FM DATASET

2 Why is a music recommendation system required?

3 What is data mining? Data mining, also called data or knowledge discovery, is the process of analyzing data from different perspectives and summarizing it into useful information.

4 Data mining modelling
Clustering: Items are grouped by their similar specifications. This method considers the similarities of the data items among themselves.
Classification: A very common technique for prediction. It refers to the categorization of data items: unclassified cases are assigned a class label according to other, already classified cases.
Association: A technique that examines the relationships between existing records in the database to determine which events occur together.

5 What is a recommendation engine?
A recommendation system is a system that interprets the data users enter into it and makes recommendations to those users.

6 Recommendation Techniques
Content-based Filtering: The salient features of content that a user previously liked or watched are stored, mostly in databases, and a profile is created for that user. When a recommendation is made, the content whose features are nearest to the feature set in this profile is recommended.
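
The idea of matching content to a stored user profile can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three-dimensional feature vectors, the track names, and the choice of cosine similarity as the "nearest feature" measure are all hypothetical and do not come from the presentation.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_profile(liked_items, features):
    """Average the feature vectors of the items the user liked."""
    vectors = [features[item] for item in liked_items]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def recommend(liked_items, features, top_n=1):
    """Rank items the user has not seen by similarity to the user's profile."""
    profile = build_profile(liked_items, features)
    candidates = [item for item in features if item not in liked_items]
    ranked = sorted(candidates, key=lambda item: cosine(profile, features[item]), reverse=True)
    return ranked[:top_n]


# Hypothetical feature vectors: [rock, pop, electronic]
FEATURES = {
    "track_a": [1.0, 0.0, 0.0],
    "track_b": [0.9, 0.1, 0.0],
    "track_c": [0.0, 0.2, 1.0],
    "track_d": [0.8, 0.3, 0.1],
}

print(recommend(["track_a", "track_b"], FEATURES))
```

A user who liked the two rock-heavy tracks gets the remaining rock-heavy track ranked first, because its feature vector lies closest to the averaged profile.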

7 Recommendation Techniques
Collaborative Filtering: This is the foundation of the "people who like one thing also like similar things" approach. It does not depend on a single user's content-property profile. Instead, recommendations are made by taking into account users who like similar content or users with similar characteristics.

8 Recommendation Techniques
Collaborative Filtering Types
User-based recommendation: This technique finds similar users and recommends items.
Item-based recommendation: The similarity of items is calculated and items are recommended.
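
As a rough illustration of the item-based variant, the sketch below computes item-to-item similarity from the users who share each item and scores unseen items for one user. The preference data, the function names, and the use of cosine similarity are assumptions made for this example only.

```python
import math
from collections import defaultdict


def item_vectors(prefs):
    """Transpose user -> {item: value} into item -> {user: value}."""
    items = defaultdict(dict)
    for user, ratings in prefs.items():
        for item, value in ratings.items():
            items[item][user] = value
    return dict(items)


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def item_based(prefs, user, top_n=3):
    """Score each unseen item by its similarity to the items the user already has."""
    vectors = item_vectors(prefs)
    seen = prefs[user]
    scores = {}
    for candidate in vectors:
        if candidate in seen:
            continue
        scores[candidate] = sum(
            cosine(vectors[candidate], vectors[item]) * value
            for item, value in seen.items()
        )
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Hypothetical user -> {tag: preference} data
PREFS = {
    "u1": {"rock": 5, "indie": 4},
    "u2": {"rock": 4, "indie": 5, "britpop": 4},
    "u3": {"pop": 5, "dance": 4},
    "u4": {"rock": 1, "pop": 4, "dance": 5},
}

print(item_based(PREFS, "u1"))
```

The user-based variant swaps the roles: similarity is computed between rows (users) instead of columns (items), and items are taken from the most similar users.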

9 How is a recommendation engine created?

10 How is a recommendation engine created?
When a recommendation engine is created, the following steps should be carried out: defining the data representation; creating the database or file model structure; pre-processing the data to get the best result.

11 What is Apache Mahout?
It is a Java library of scalable machine-learning algorithms, implemented on top of Apache Hadoop and using the MapReduce paradigm. To use Mahout in a project: download the latest Mahout release (0.8 at the time of the presentation) from the link given on the slide. Extract all the libraries and include them in a new Eclipse (or NetBeans) project as external JAR files. Java 1.6.x or greater is required for installation. Hadoop is not mandatory for creating a recommendation engine.

12 How to use Mahout for recommendation?
Recommendation in Mahout follows these steps: the dataset is converted to a Mahout-compliant format; a compatible recommender component is chosen; similarities are computed from ratings or preferences; the recommendation is evaluated.

13 Recommender job flow The main step doing the heavy lifting in the workflow is the "calculate co-occurrences" step. This step is responsible for doing pairwise comparisons across the entire matrix, looking for commonalities.
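
To make the "calculate co-occurrences" step concrete, here is a small in-memory sketch. It is not Mahout's distributed job: the listening histories, function names, and the simple summed-count scoring are assumptions chosen to show the pairwise comparison on a toy scale.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence(user_items):
    """Count, for each ordered pair of items, how many users have both."""
    counts = defaultdict(int)
    for items in user_items.values():
        for a, b in combinations(sorted(set(items)), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return dict(counts)


def recommend(user_items, user, top_n=3):
    """Score unseen items by their summed co-occurrence with the user's items."""
    counts = cooccurrence(user_items)
    seen = set(user_items[user])
    scores = defaultdict(int)
    for (a, b), n in counts.items():
        if a in seen and b not in seen:
            scores[b] += n
    # Highest score first; ties broken by item name for a stable order.
    return sorted(scores, key=lambda item: (-scores[item], item))[:top_n]


# Hypothetical listening histories
HISTORY = {
    "u1": ["t1", "t2"],
    "u2": ["t1", "t2", "t3"],
    "u3": ["t2", "t3"],
    "u4": ["t1", "t4"],
}

print(recommend(HISTORY, "u1"))
```

The pairwise loop is what makes this step expensive: every user contributes one count for every pair of items in their history, which is why it dominates the workflow on a large matrix.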

14 The background process of recommendation in architecture

15 Graduation Project with Last.fm
Scheduling

16 Graduation Project with Last.fm
Gantt chart

17 Graduation Project with Last.fm
What are the important risks? Big data, time, computer performance, sparsity.

18 Music recommendation project for Last.fm
The «Last.fm Dataset-1K users» dataset is used in the project. It contains information about user properties and about which songs were listened to by which users. The dataset has 2 files: one is the users' profile file and the other contains the users' musical history. There are 1000 users and 19,150,868 lines of musical history belonging to these 1000 users.

19 Music recommendation project for Last.fm
The Last.fm API is used and a new CSV format is created. Although there are 1000 users, files with the desired properties were prepared for only 700 users during the project period because of time constraints. After the files were prepared, they were all saved into database tables for ease of data processing. The tables are: Artists, Tracks, UserTagTrack, Users, TrackTags.

20 Music recommendation project for Last.fm
The collaborative filtering method is used. Two types of segmentation are considered: in the first, recommendations are made within clusters of users grouped by gender, age, and country; in the second, recommendations are made among all users. A user-based recommendation engine is created. JDBC and file data models are used for data representation.

21 Music recommendation project for Last.fm
To build the clusters, Weka is used because of its simplicity. All user characteristics were represented as values (see thesis, pages 33-34).

22 Music recommendation project for Last.fm
Many methods can be used for collaborative filtering: Mean Squared Differences algorithm, Vector Similarity, Pearson Correlation Coefficient.
Strengths and Weaknesses of Collaborative Filtering Method
The Pearson Correlation Similarity algorithm is used for the thesis data model, since it is convenient and gives correct results for huge amounts of data.
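
For reference, the standard textbook form of the Pearson correlation between two users is given below. The notation is an assumption introduced here, not taken from the slides: r_{u,i} is the preference of user u for item i, I_{uv} is the set of items both users have a preference for, and \bar{r}_u is the mean of user u's preferences over I_{uv}.

```latex
\[
w_{u,v} =
\frac{\sum_{i \in I_{uv}} (r_{u,i} - \bar{r}_u)(r_{v,i} - \bar{r}_v)}
     {\sqrt{\sum_{i \in I_{uv}} (r_{u,i} - \bar{r}_u)^2}\,
      \sqrt{\sum_{i \in I_{uv}} (r_{v,i} - \bar{r}_v)^2}}
\]
```

The value lies between -1 and 1. It is undefined when either user's preferences over the shared items have zero variance, so implementations usually treat that case separately.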

23 The functionality of the project system

24 JDBC Model-Database Tables
Artists: artist id, artist name
Tracks: track id, track name, artist id, published year
TrackTags: tag id, tag name
Users: user id, user name, gender, age, country
UserTagTrack: usertagtrack id, user id, track id, tag id, preferences
This is the general (default) database; all files and other databases are created from it.

25 Recommendation Model
PrefUserTag: user id, tag id, sum(preferences)
PrefUserTrack: user id, track id, sum(preferences)
PrefTagTrack: track id, tag id, sum(preferences)
In JDBCDataModel, primary keys must be defined for time efficiency, and the tables should follow the three-column format listed above.
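
One way such a «Pref» table could be derived from UserTagTrack is a grouped sum with a composite primary key. The sketch below uses Python's built-in sqlite3 module purely for illustration. The column names, the numeric ids, and the sample rows are assumptions; the project itself used a JDBC database whose exact schema is not shown in the transcript.

```python
import sqlite3


def build_pref_user_tag(rows):
    """Aggregate (user_id, track_id, tag_id, preferences) rows into PrefUserTag."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE UserTagTrack ("
        " usertagtrack_id INTEGER PRIMARY KEY,"
        " user_id INTEGER, track_id INTEGER, tag_id INTEGER, preferences REAL)"
    )
    conn.executemany(
        "INSERT INTO UserTagTrack (user_id, track_id, tag_id, preferences)"
        " VALUES (?, ?, ?, ?)",
        rows,
    )
    # The composite primary key gives one row per (user, tag) and an index for lookups.
    conn.execute(
        "CREATE TABLE PrefUserTag ("
        " user_id INTEGER, tag_id INTEGER, preference REAL,"
        " PRIMARY KEY (user_id, tag_id))"
    )
    conn.execute(
        "INSERT INTO PrefUserTag"
        " SELECT user_id, tag_id, SUM(preferences)"
        " FROM UserTagTrack GROUP BY user_id, tag_id"
    )
    result = conn.execute(
        "SELECT user_id, tag_id, preference FROM PrefUserTag ORDER BY user_id, tag_id"
    ).fetchall()
    conn.close()
    return result


# Hypothetical rows: (user_id, track_id, tag_id, preferences)
ROWS = [(1, 10, 100, 20), (1, 11, 100, 5), (1, 11, 101, 3), (2, 10, 100, 7)]

print(build_pref_user_tag(ROWS))
```

Because several tracks can carry the same tag, the grouped table has fewer rows than UserTagTrack, which is consistent with the much smaller «Pref» tables described in the slides.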

26 Number of elements in tables
Tables whose names begin with «Pref» are formatted for Mahout's recommendation functions. They contain far less data than the UserTagTrack table.

27 Number of elements in tables
Before the assignment of the primary key
With the primary key, the format is: user id, tag id, sum(preferences)

28 The introduction of system
After the text file is created via the API, a standard line of text looks as follows:
user name, artist name, track name, published year, tags
user_000103, Super Furry Animals, The Undefeated, 2003, indie, britpop, rock, trumpet, pop
This line is represented in the UserTagTrack table as:
usertagtrackid | user id | track id | tag id | preferences
1 | user_000103 | The Undefeated | indie | 20
2 | | | britpop |
3 | | | rock |
4 | | | trumpet |
5 | | | pop |
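
The expansion of one text line into one record per tag might look like the sketch below. It assumes that fields are separated by commas and that no name contains a comma. The function name and the dictionary keys are hypothetical. The preferences column is left out because the transcript does not say how its value is derived.

```python
def parse_line(line):
    """Split 'user, artist, track, year, tag1, tag2, ...' into one record per tag."""
    fields = [field.strip() for field in line.split(",")]
    if len(fields) < 5:
        raise ValueError("expected user, artist, track, year and at least one tag")
    user, artist, track, year, *tags = fields
    return [
        {"user": user, "artist": artist, "track": track, "year": int(year), "tag": tag}
        for tag in tags
    ]


LINE = "user_000103, Super Furry Animals, The Undefeated, 2003, indie, britpop, rock, trumpet, pop"

for row_id, record in enumerate(parse_line(LINE), start=1):
    print(row_id, record["user"], record["track"], record["tag"])
```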

29 The functions used in the recommendation engine
The working principle of user-based recommendation engine:
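
A minimal sketch of this working principle, written in plain Python rather than with Mahout, is shown here: compute a Pearson similarity between the target user and every other user, keep the N most similar users as the neighbourhood, and estimate a preference for each unseen item. The toy preference data, the function names, the exclusion of non-positive similarities, and the similarity-weighted average are all assumptions for illustration and are not taken from the thesis.

```python
import math


def pearson(a, b):
    """Pearson correlation over the items both users have; 0.0 if undefined."""
    common = sorted(a.keys() & b.keys())
    if len(common) < 2:
        return 0.0
    xs = [a[i] for i in common]
    ys = [b[i] for i in common]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return num / den if den else 0.0


def nearest_neighbours(prefs, user, n):
    """Return up to n (similarity, other_user) pairs with positive similarity."""
    sims = [(pearson(prefs[user], prefs[other]), other) for other in prefs if other != user]
    sims = [(s, other) for s, other in sims if s > 0]
    sims.sort(key=lambda pair: (-pair[0], pair[1]))
    return sims[:n]


def recommend(prefs, user, n_neighbours=2, how_many=5):
    """Estimate unseen items as a similarity-weighted average over the neighbourhood."""
    totals, weights = {}, {}
    for sim, other in nearest_neighbours(prefs, user, n_neighbours):
        for item, value in prefs[other].items():
            if item in prefs[user]:
                continue
            totals[item] = totals.get(item, 0.0) + sim * value
            weights[item] = weights.get(item, 0.0) + sim
    estimates = {item: totals[item] / weights[item] for item in totals}
    return sorted(estimates.items(), key=lambda pair: (-pair[1], pair[0]))[:how_many]


# Hypothetical user id -> {item id: preference}
PREFS = {
    1: {101: 5.0, 102: 3.0, 103: 2.5},
    2: {101: 2.0, 102: 2.5, 103: 5.0, 104: 2.0},
    3: {101: 2.5, 104: 4.0, 105: 4.5, 107: 5.0},
    4: {101: 5.0, 103: 3.0, 104: 4.5, 106: 4.0},
    5: {101: 4.0, 102: 3.0, 103: 2.0, 104: 4.0, 105: 3.5, 106: 4.0},
}

for item, estimate in recommend(PREFS, 1):
    print(item, round(estimate, 2))
```

The neighbourhood size is the main tuning knob in this design: a larger neighbourhood lets more users contribute candidate items, at the cost of including less similar users.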

30 Recommendation Results
An unlimited number of results can be obtained via the evaluator program. The thesis contains many pages of results under different conditions.
Table Name: PrefUserTag
Neighbourhood Size: 2
For User Id 5 # Recommendations
Results | Tag Name
RecommendedItem[item:112040, value: ] | missjudy76
RecommendedItem[item:3387, value: ] | my 750 essential songs
RecommendedItem[item:8124, value: ] | lionel richie
RecommendedItem[item:8147, value: ] | leona lewis
RecommendedItem[item:1809, value: ] | better than the original

31 Recommendation Results
Table Name: PrefUserTrack
For User Id 5 # Recommendations
Neighbourhood Size: 2
Results | Track Name
RecommendedItem[item:7064, value:73.0] | Out Of Control
Neighbourhood Size: 7
Results | Track Name
RecommendedItem[item:16570, value:304.5] | When You'Re Gone
RecommendedItem[item:7064, value:73.0] | Out Of Control
RecommendedItem[item:1466, value:9.0] | Aerodynamic
RecommendedItem[item:7170, value:5.0] | Bring Me To Life
RecommendedItem[item:2969, value:5.0] | Number Five With A Bullet

32 How to evaluate results ?
The results of this recommendation engine are evaluated with the most common metrics, precision and recall. Precision is the ratio of relevant items correctly recommended to the number of items recommended. Recall is the ratio of relevant items correctly recommended to the number of items that are relevant to the user.
Confusion matrix:
 | Actual Positive | Actual Negative
Predicted as positive | TP | FP
Predicted as negative | FN | TN
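
Written in terms of the confusion-matrix cells, where TP, FP, and FN are the counts of true positives, false positives, and false negatives, the two definitions above take the standard form:

```latex
\[
\text{Precision} = \frac{TP}{TP + FP},
\qquad
\text{Recall} = \frac{TP}{TP + FN}
\]
```

As a worked example with made-up numbers: if 5 items are recommended, 2 of them are relevant, and the user has 3 relevant items in total, then TP = 2, FP = 3, FN = 1, which gives a precision of 2/5 = 0.4 and a recall of 2/3.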

33 How to evaluate results ?
Precision and recall are provided by RecommenderIRStatsEvaluator in Mahout. Its evaluate function returns the F-measure, precision, and recall of the recommendation engine. Of the parameters passed to this function, the important one is «at», the number of recommendations to consider when evaluating precision, i.e. precision at some integer value.
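
The meaning of the «at» parameter can be illustrated with a small stand-alone function. This is not Mahout's implementation, which also decides per user which items count as relevant and holds them out before recommending. The function name, the item labels, and the relevant set below are hypothetical.

```python
def precision_recall_at(recommended, relevant, at):
    """Precision, recall and F1 over the first `at` recommended items."""
    top = recommended[:at]
    hits = sum(1 for item in top if item in relevant)
    precision = hits / len(top) if top else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical ranked recommendations and the items actually relevant to the user
RECOMMENDED = ["a", "b", "c", "d", "e"]
RELEVANT = {"a", "c", "z"}

for at in (2, 5):
    p, r, f1 = precision_recall_at(RECOMMENDED, RELEVANT, at)
    print(at, round(p, 3), round(r, 3), round(f1, 3))
```

Raising `at` can only increase or keep the number of hits, so recall never decreases, while precision may fall as less certain recommendations are included.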

34 Evaluation Results
Table Name: PrefUserTag
Data Model Structure: User-Tag-Preference
Row-Column Variable Number: # users: 700, # item: 14044
Neighbourhood Size: 2
5 recommendations
Precision:
Recall:
Table Name: PrefUserTrack
Data Model Structure: User-Track-Preference
Row-Column Variable Number: # users: 700, # item:
Neighbourhood Size: 2
5 recommendations
Precision:
Recall:

35 Evaluation Results
Table Name: PrefUserTrack
Data Model Structure: User-Track-Preference
Row-Column Variable Number: # users: 700, # item:
Neighbourhood Size: 3
5 recommendations
Precision:
Recall:

36 The comment of evaluation results
If the neighbourhood size increases, the recommendation engine's results improve because of the way the similarity function works. The user-tag recommendation engine performs better than the user-track recommendation engine because of data size and sparsity. People with similar characteristics also have similar musical tastes. When the neighbourhood size increases, the number of recommended items increases.

37 Self-criticism I
Creating the dataset and the data representation took a long time. A ready-made dataset could have been used instead, which would have saved the project owner time. There is a huge amount of data in the data model. Scanning all the data and making recommendations took a long time because of limited computer capacity, so a more powerful computer would have helped. The out-of-memory error was the most frequently encountered problem while calculating evaluation results, because of the low Java heap space allowed by the operating system or Java version.

38 Self-criticism II
Slowness and memory errors can be addressed with parallel programming. Using a server is another alternative solution to these problems. The User-Track profile results are not good; the recommendation engine's performance for this model could be improved. If computer capacity increases, more data can be used for the recommendation engine.

39 Thank you for listening 

