What is data mining?

Data mining, also called data or knowledge discovery, is the process of analyzing data from different perspectives and summarizing it into useful information.
Data mining modelling

- Clustering: Items are grouped according to their similar characteristics. The method considers the similarities among the data items themselves.
- Classification: A very common prediction technique that categorizes data items. Unclassified cases are assigned a class label based on the cases that are already labelled.
- Association: A technique that examines the relationships among existing records in the database to determine which events occur together.
What is a recommendation engine?

A recommendation system interprets the data that users enter into the system and makes recommendations to those users.
Recommendation Techniques: Content-based Filtering

The salient features of the content that a user previously liked or watched are saved, usually in a database, and a profile is created for the user. When making a recommendation, the system consults this profile and recommends the content whose features are nearest to the stored feature set.

Figure source: https://www.ntt-review.jp/archive_html/200804/images/le1_fig02.gif
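The profile-matching idea can be sketched in a few lines of Python. This is only an illustration: the catalogue, the feature sets, and the choice of Jaccard similarity as the "nearest" measure are assumptions, since the slides do not name a specific distance.

```python
def jaccard(a, b):
    """Jaccard similarity between two feature sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_profile(liked_items, item_features):
    """User profile = union of the features of every item the user liked."""
    profile = set()
    for item in liked_items:
        profile |= item_features[item]
    return profile


def recommend_content_based(liked_items, item_features, how_many=1):
    """Rank items the user has not seen by feature closeness to the profile."""
    profile = build_profile(liked_items, item_features)
    candidates = [i for i in item_features if i not in liked_items]
    candidates.sort(key=lambda i: jaccard(profile, item_features[i]), reverse=True)
    return candidates[:how_many]


# Hypothetical catalogue (assumption): item -> salient features.
ITEM_FEATURES = {
    "track_a": {"rock", "indie", "guitar"},
    "track_b": {"rock", "britpop", "guitar"},
    "track_c": {"jazz", "trumpet"},
    "track_d": {"indie", "britpop", "rock"},
}
```

For example, `recommend_content_based({"track_a"}, ITEM_FEATURES, how_many=2)` ranks the two rock tracks ahead of the jazz track, because they share more features with the profile.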
Recommendation Techniques: Collaborative Filtering

This technique is the foundation of "similar users like similar things" approaches. It does not depend on a single user's content-feature profile. Instead, recommendations are made by considering users who like similar content or who have similar characteristics.
Recommendation Techniques: Collaborative Filtering Types

- User-based recommendation: This technique finds users similar to the target user and recommends the items they liked.
- Item-based recommendation: The similarity between items is calculated, and items similar to those the user already likes are recommended.
How is a recommendation engine created?

When a recommendation engine is created, the following steps should be carried out:

- Define the data representation.
- Create the database or file model structure.
- Pre-process the data to get the best results.
What is Apache Mahout?

Apache Mahout is a Java library of scalable machine-learning algorithms, implemented on top of Apache Hadoop and using the MapReduce paradigm.

To use Mahout in a project:

- Download the latest Mahout release (0.8 when these slides were prepared).
- Extract all the libraries and include them in a new Eclipse or NetBeans project as external JAR files.
- Java 1.6.x or greater is required for installation.
- Hadoop is not mandatory for creating a recommendation engine.
How to use Mahout for recommendation?

Recommendation in Mahout follows these steps:

- The dataset is converted into a Mahout-compliant format.
- A compatible recommender component is chosen.
- Similarities are computed from the ratings or preferences.
- The recommendation is evaluated.
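The first three steps can be illustrated with a small, library-free Python sketch. It is not Mahout's API: the data layout (user -> item -> preference), the cosine similarity, the similarity-weighted average used for scoring, and the sample data are all assumptions chosen to keep the example short.

```python
import math


def cosine_similarity(prefs_a, prefs_b):
    """Cosine similarity between two users' preference vectors."""
    common = set(prefs_a) & set(prefs_b)
    dot = sum(prefs_a[i] * prefs_b[i] for i in common)
    norm_a = math.sqrt(sum(v * v for v in prefs_a.values()))
    norm_b = math.sqrt(sum(v * v for v in prefs_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def nearest_neighbours(user, data, n):
    """The n users most similar to `user`, ignoring zero-similarity users."""
    scored = [(cosine_similarity(data[user], data[o]), o) for o in data if o != user]
    scored.sort(reverse=True)
    return [(sim, other) for sim, other in scored[:n] if sim > 0]


def recommend(user, data, n_neighbours=2, how_many=5):
    """Estimate preferences for unseen items as a similarity-weighted average."""
    scores, weights = {}, {}
    for sim, other in nearest_neighbours(user, data, n_neighbours):
        for item, pref in data[other].items():
            if item in data[user]:
                continue  # only recommend items the user has not rated yet
            scores[item] = scores.get(item, 0.0) + sim * pref
            weights[item] = weights.get(item, 0.0) + sim
    ranked = sorted(((scores[i] / weights[i], i) for i in scores), reverse=True)
    return [(item, round(estimate, 2)) for estimate, item in ranked[:how_many]]


# Hypothetical data model (assumption): user id -> {item id: preference}.
DATA = {
    1: {"a": 5.0, "b": 3.0},
    2: {"a": 5.0, "b": 3.0, "c": 4.0},
    3: {"a": 1.0, "d": 5.0},
    4: {"e": 2.0},
}
```

Calling `recommend(1, DATA, n_neighbours=1)` only considers user 2 and therefore only suggests item `c`; widening the neighbourhood to 2 also brings in item `d` from user 3.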
Recommender job flow

The step that does the heavy lifting in the workflow is the "calculate co-occurrences" step. It performs pairwise comparisons across the entire matrix, looking for commonalities.
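A minimal Python sketch of what a co-occurrence calculation does, assuming the input is a mapping from each user to the items that user interacted with. The function name and sample data are hypothetical, and this in-memory version only illustrates the idea, not Mahout's distributed job.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_matrix(user_items):
    """Count, for every pair of distinct items, how many users have both."""
    counts = defaultdict(int)
    for items in user_items.values():
        # Each unordered pair is counted once per user, stored symmetrically.
        for a, b in combinations(sorted(set(items)), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return dict(counts)


# Hypothetical listening data (assumption): user -> items.
USER_ITEMS = {
    "u1": ["a", "b", "c"],
    "u2": ["a", "b"],
    "u3": ["b", "c"],
}

MATRIX = cooccurrence_matrix(USER_ITEMS)
```

The number of pairs grows quadratically with the number of items per user, which is why this step dominates the running time on large datasets.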
The background process of recommendation in the architecture
Graduation Project with Last.fm

What are the important risks?

- Big data
- Time
- Computer performance
- Sparsity
Music recommendation project for Last.fm

The «Last.fm Dataset-1K users» dataset is used in the project. It contains information about user properties and about which songs were listened to by which users. The dataset consists of two files: one holds the users' profiles and the other holds the users' listening history. There are 1000 users and 19,150,868 lines of listening history belonging to those users.
Music recommendation project for Last.fm

The Last.fm API is used and a new CSV format is created. Although there are 1000 users, files with the desired properties were prepared for only 700 users during the project period because of time constraints. After the files were prepared, they were loaded into database tables for easier data processing. The tables are:

- Artists
- Tracks
- UserTagTrack
- Users
- TrackTags
Music recommendation project for Last.fm

- The collaborative filtering method is used.
- Two types of segmentation are considered: one set of recommendations is made within clusters of users grouped by gender, age, and country; the other is made across all users.
- A user-based recommendation engine is created.
- The JDBC and file data models are used for data representation.
Music recommendation project for Last.fm

Weka is used to build the clusters because of its simplicity. All user characteristics were represented as values (see pages 33-34 of the thesis).
Music recommendation project for Last.fm

Many methods can be used for collaborative filtering:

- Mean Squared Differences Algorithm
- Vector Similarity
- Pearson Correlation Coefficient

Strengths and Weaknesses of Collaborative Filtering Method

The Pearson Correlation Similarity algorithm is used for the thesis data model, because it is convenient and gives accurate results on large amounts of data.
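For reference, the Pearson correlation between two users can be sketched in plain Python as below. The function computes the coefficient over the items both users have rated. Returning 0.0 when the coefficient is undefined (fewer than two common items, or no variance) is an assumption made for this sketch, and the sample users are hypothetical.

```python
import math


def pearson(prefs_a, prefs_b):
    """Pearson correlation over the items rated by both users."""
    common = sorted(set(prefs_a) & set(prefs_b))
    n = len(common)
    if n < 2:
        return 0.0  # assumption: treat "undefined" as no correlation
    xs = [prefs_a[i] for i in common]
    ys = [prefs_b[i] for i in common]
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant rating vector has no defined correlation
    return cov / math.sqrt(var_x * var_y)


# Hypothetical users (assumption): item -> preference.
ALICE = {"a": 1.0, "b": 2.0, "c": 3.0}
BOB = {"a": 2.0, "b": 4.0, "c": 6.0}      # same taste, different scale
CAROL = {"a": 3.0, "b": 2.0, "c": 1.0}    # opposite taste
```

Because the coefficient subtracts each user's mean, Alice and Bob are perfectly correlated even though Bob's values are twice as large, which is one reason the measure suits preference data on differing scales.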
JDBC Model: Database Tables

- Artists: artist id, artist name
- Tracks: track id, track name, artist id, published year
- TrackTags: tag id, tag name
- Users: user id, user name, gender, age, country
- UserTagTrack: usertagtrack id, user id, track id, tag id, preferences

This is the general (default) database; all files and other databases are created from it.
Recommendation Model

In JDBCDataModel, primary keys must be defined for time efficiency. The database format should be:

- PrefUserTag: user id, tag id, sum(preferences)
- PrefUserTrack: user id, track id, sum(preferences)
- PrefTagTrack: track id, tag id, sum(preferences)
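One way to read the sum(preferences) columns is as a group-by over UserTagTrack rows. The Python sketch below shows that reading; it is an assumption about how the Pref tables are derived, and the field names, numeric ids, and sample rows are hypothetical.

```python
from collections import defaultdict


def sum_preferences(rows, key_fields):
    """Group rows by the given key fields and sum their preference values."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[field] for field in key_fields)
        totals[key] += row["preferences"]
    return dict(totals)


# Hypothetical UserTagTrack rows (assumption), using numeric ids.
ROWS = [
    {"user_id": 1, "track_id": 10, "tag_id": 100, "preferences": 20},
    {"user_id": 1, "track_id": 10, "tag_id": 101, "preferences": 20},
    {"user_id": 1, "track_id": 11, "tag_id": 100, "preferences": 5},
    {"user_id": 2, "track_id": 10, "tag_id": 100, "preferences": 3},
]

PREF_USER_TAG = sum_preferences(ROWS, ("user_id", "tag_id"))
PREF_USER_TRACK = sum_preferences(ROWS, ("user_id", "track_id"))
PREF_TAG_TRACK = sum_preferences(ROWS, ("track_id", "tag_id"))
```

Each resulting key pair is unique, so it can serve as the composite primary key of the corresponding Pref table.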
Number of elements in tables

The tables whose names begin with «Pref» are formatted for Mahout's recommendation functions. They contain far less data than the UserTagTrack table.
Number of elements in tables

Before the assignment of a primary key:

With a primary key, the format is shown below:

| user id | tag id | sum(preferences) |
The introduction of the system

After the text file is created via the API, a standard line of text looks as follows:

user name, artist name, track name, published year, tags
user_000103, Super Furry Animals, The Undefeated, 2003, indie, britpop, rock, trumpet, pop

This line is represented in the UserTagTrack table as:

| usertagtrack id | user id | track id | tag id | preferences |
|---|---|---|---|---|
| 1 | user_000103 | The Undefeated | indie | 20 |
| 2 | | | britpop | |
| 3 | | | rock | |
| 4 | | | trumpet | |
| 5 | | | pop | |
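Splitting such a line into one row per tag can be sketched as follows. The function and field names are hypothetical, the naive comma split assumes that artist and track names contain no commas, and the preference value is left out because the slides do not say how it is computed.

```python
def parse_line(line, first_id=1):
    """Turn one 'user, artist, track, year, tag, tag, ...' line into tag rows."""
    fields = [field.strip() for field in line.split(",")]
    user, artist, track, year = fields[:4]
    tags = fields[4:]
    return [
        {
            "usertagtrack_id": first_id + offset,
            "user": user,
            "artist": artist,
            "track": track,
            "year": int(year),
            "tag": tag,
        }
        for offset, tag in enumerate(tags)
    ]


SAMPLE = (
    "user_000103, Super Furry Animals, The Undefeated, 2003, "
    "indie, britpop, rock, trumpet, pop"
)
PARSED = parse_line(SAMPLE)
```

For the sample line this yields five rows, one per tag, all sharing the same user and track.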
The functions used in the recommendation engine

The working principle of the user-based recommendation engine:
Recommendation Results

An unlimited number of results can be obtained with the evaluator program. The thesis reports many results obtained under different conditions.

Table Name: PrefUserTag
Neighbourhood Size: 2
For User Id: 5
# Recommendations:

| Results | Tag Name |
|---|---|
| RecommendedItem[item:112040, value: ] | missjudy76 |
| RecommendedItem[item:3387, value: ] | my 750 essential songs |
| RecommendedItem[item:8124, value: ] | lionel richie |
| RecommendedItem[item:8147, value: ] | leona lewis |
| RecommendedItem[item:1809, value: ] | better than the original |
Recommendation Results

Table Name: PrefUserTrack
Neighbourhood Size: 2
For User Id: 5
# Recommendations:

| Results | Track Name |
|---|---|
| RecommendedItem[item:7064, value:73.0] | Out Of Control |

Neighbourhood Size: 7

| Results | Track Name |
|---|---|
| RecommendedItem[item:16570, value:304.5] | When You'Re Gone |
| RecommendedItem[item:7064, value:73.0] | Out Of Control |
| RecommendedItem[item:1466, value:9.0] | Aerodynamic |
| RecommendedItem[item:7170, value:5.0] | Bring Me To Life |
| RecommendedItem[item:2969, value:5.0] | Number Five With A Bullet |
How to evaluate results?

The results of this recommendation engine are evaluated with the most common metrics, precision and recall. Precision is the ratio of relevant items correctly recommended to the total number of items recommended. Recall is the ratio of relevant items correctly recommended to the total number of items that are relevant to the user.

| | Actual positive | Actual negative |
|---|---|---|
| Predicted as positive | TP | FP |
| Predicted as negative | FN | TN |

In terms of this table, precision = TP / (TP + FP) and recall = TP / (TP + FN).
How to evaluate results?

Precision and recall are provided by Mahout's RecommenderIRStatsEvaluator interface, which is implemented by GenericRecommenderIRStatsEvaluator. Its evaluate function returns statistics that include the F-measure, precision, and recall of the recommendation engine. Among the parameters passed to this function, the important one is «at», the number of recommendations to consider when evaluating precision, that is, "precision at N" for an integer N.
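The meaning of the «at» cutoff can be illustrated with a small Python function. It is a sketch of the general "precision at N" idea rather than of Mahout's evaluator, which also decides by itself which items count as relevant; the function name and sample lists are hypothetical.

```python
def precision_recall_at(recommended, relevant, at):
    """Precision, recall, and F1 over the first `at` recommended items."""
    top = recommended[:at]
    hits = sum(1 for item in top if item in relevant)  # true positives
    precision = hits / len(top) if top else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical ranked recommendations and relevant items (assumption).
RECOMMENDED = ["a", "b", "c", "d", "e"]
RELEVANT = {"a", "c", "x", "y"}

P5, R5, F5 = precision_recall_at(RECOMMENDED, RELEVANT, at=5)
P2, R2, F2 = precision_recall_at(RECOMMENDED, RELEVANT, at=2)
```

With these lists, two of the five recommendations are relevant, so precision at 5 is 2/5 and recall is 2/4; cutting the list at 2 leaves one hit, giving precision 1/2 and recall 1/4.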
Evaluation Results

Table Name: PrefUserTag
Data Model Structure: User-Tag-Preference
Row-Column Variable Number: # users: 700, # items: 14044
Neighbourhood Size: 2
Recommendations: 5
Precision:
Recall:

Table Name: PrefUserTrack
Data Model Structure: User-Track-Preference
Row-Column Variable Number: # users: 700, # items:
Neighbourhood Size: 2
Recommendations: 5
Precision:
Recall:
Evaluation Results

Table Name: PrefUserTrack
Data Model Structure: User-Track-Preference
Row-Column Variable Number: # users: 700, # items:
Neighbourhood Size: 3
Recommendations: 5
Precision:
Recall:
Comments on the evaluation results

- As the neighbourhood size increases, the results of the recommendation engine improve because of the way the similarity function works.
- The user-tag recommendation engine performs better than the user-track recommendation engine because of data size and sparsity.
- People with similar characteristics also have similar musical tastes.
- When the neighbourhood size increases, the number of recommended items increases.
Self-criticism I

- Creating the data set and the data representation took a long time. A ready-made dataset could have been used instead, which would have saved the project extra time.
- The data model contains a huge amount of data. Scanning all of it and making recommendations took a long time because of limited computer capacity, so a more powerful computer would have helped.
- The out-of-memory error was the most frequently encountered problem while calculating the evaluation results, because of the low Java heap space allowed by the operating system or Java version.
Self-criticism II

- Slowness and memory errors can be addressed with parallel programming. Using a server is another alternative solution.
- The User-Track profile results are not good; the performance of the recommendation engine for this model could be improved.
- If the computer capacity increases, more data can be used for the recommendation engine.