
1 Personalisation and Recommendations using Drupal
Keywords:
– Personalisation
– Recommendations
– Scalable machine learning
– Predictions
– Similarity
– Data Mining
– Big Data
– Trend Spotting
– Clustering
Drupal Developer Days Barcelona – Kendra Initiative 2012.06.16

2 Kendra Initiative
Mission
– Foster an Open Distributed Marketplace for Digital Media
EU funded
– P2P-Next http://www.p2p-next.org
– SARACEN (Socially Aware, collaboRative, scAlable Coding mEdia distributioN) http://www.saracen-p2p.eu

3 Deliverables
Kendra Signpost
– Metadata interoperability, mapping and transformation
Smart Filters
– Portable preferences and filters
Kendra Social, Kendra Hub
– Social networking management tools
Standards work
– OpenSocial extension
– Social API
– See the "Abstracting Social Networking functionality in Drupal" sprint
Kendra Match
– Searching and recommendation

4 Components
Drupal Recommender API module
Recommender helper modules
async_command module
Apache Mahout or cloud service
Hadoop cluster (optional)

5 Industry Examples
Amazon
Netflix
Spotify, Pandora
Facebook, LinkedIn
OKCupid
iTunes: Genius; App Store, not so much

6 Machine learning
Collaborative Filtering
– AKA recommender engines
Clustering
Classification

7 Collaborative Filtering
Input: preference data
Output: predictions
Preference =
– w1 = signed integer representing the weight of a uid1-nid1 or uid1-uid2 correlation (affinity)
Prediction =
– w2 = float representing the strength of a uid1-nid1 or uid1-uid2 correlation
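
The input/output relationship on this slide can be illustrated with a minimal sketch. It assumes preferences are stored as (uid, nid) pairs mapped to integer weights and uses a similarity-weighted average over other users; the sample data, the function names and the choice of cosine similarity are illustrative assumptions, not the Recommender API's or Mahout's actual implementation.

```python
from math import sqrt

# Hypothetical preference data: (uid, nid) -> signed integer weight.
preferences = {
    (1, 10): 5, (1, 11): 3, (1, 12): 4,
    (2, 10): 4, (2, 11): 2, (2, 12): 5, (2, 13): 4,
    (3, 10): 1, (3, 11): 5, (3, 13): 2,
}

def ratings_by_user(prefs):
    """Group the flat preference records into {uid: {nid: weight}}."""
    by_user = {}
    for (uid, nid), weight in prefs.items():
        by_user.setdefault(uid, {})[nid] = weight
    return by_user

def cosine(a, b):
    """Cosine similarity over the items both users have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[n] * b[n] for n in shared)
    norm_a = sqrt(sum(a[n] ** 2 for n in shared))
    norm_b = sqrt(sum(b[n] ** 2 for n in shared))
    return dot / (norm_a * norm_b)

def predict(prefs, uid, nid):
    """Predict a float strength for (uid, nid) from similar users' weights."""
    users = ratings_by_user(prefs)
    target = users[uid]
    numerator = denominator = 0.0
    for other, ratings in users.items():
        if other == uid or nid not in ratings:
            continue
        sim = cosine(target, ratings)
        numerator += sim * ratings[nid]
        denominator += abs(sim)
    return numerator / denominator if denominator else None

prediction = predict(preferences, 1, 13)  # float prediction for uid 1, nid 13
```

The integer weights go in, and a float prediction comes out for an item the user has not rated yet; an item nobody has rated yields no prediction at all.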

8 Enter Mahout
Apache Mahout is a scalable machine learning library that supports large data sets.
Launched Spring 2010
Grew from the Apache Lucene project (basis for Apache Solr)
Merged with Taste project

9 Use Cases
Recommendation mining
Clustering
Classification
Frequent itemset mining

10 Out-of-box algorithms
Recommendation
– User-based recommender
– Item-based recommender
– Slope-One recommender
– Distributed Item-Based Collaborative Filtering
– Collaborative Filtering using parallel matrix factorisation
Clustering
– Canopy Clustering
– K-Means Clustering
– Fuzzy K-Means
– Mean Shift Clustering
– Dirichlet Process Clustering
– Latent Dirichlet Allocation
– Spectral Clustering
– Minhash Clustering
Model combination
– Naive Bayes algorithm
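
As an illustration of one algorithm in this list, here is a minimal sketch of weighted Slope One in plain Python. It is not Mahout's implementation; the users, items and ratings are made-up sample data.

```python
# Hypothetical ratings: {user: {item: rating}}.
ratings = {
    "john": {"A": 5, "B": 3, "C": 2},
    "mark": {"A": 3, "B": 4},
    "lucy": {"B": 2, "C": 5},
}

def slope_one_predict(ratings, user, target):
    """Weighted Slope One: predict `user`'s rating for `target`.

    For every item the user has rated, take the average difference
    (target - item) across users who rated both, add it to the user's
    own rating of that item, and weight by how many users contributed.
    """
    numerator = 0.0
    denominator = 0
    for item, rating in ratings[user].items():
        if item == target:
            continue
        diffs = [
            other[target] - other[item]
            for other in ratings.values()
            if target in other and item in other
        ]
        if diffs:
            count = len(diffs)
            numerator += (rating + sum(diffs) / count) * count
            denominator += count
    return numerator / denominator if denominator else None

lucy_a = slope_one_predict(ratings, "lucy", "A")  # 13/3, about 4.33
```

Slope One needs only pairwise rating differences, which is why it is often used as a simple baseline before trying the heavier algorithms.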

11 Hadoop
Provides clustering capabilities
Not trivial to set up
Not yet implemented in Recommender API (issue #1206840)

12 Recommender API
Drupal 7 (alpha) & 6 (beta)
Can run either on the same server as the Apache web server or on a remote server
Java helper program (was PHP)
Uses JDBC and Java Persistence API (JPA)
Drupal helper modules

13 Recommender API helper modules
Browsing History Recommender
OG Similar groups module
Ubercart Products Recommender
Fivestar Recommender
Points Voting Recommender
Flag Recommender

14 Asynchronous operation
Async_command module
– Talks to Mahout
– Typically run via cron
Results are stored directly in Drupal db
– Recommender tables
– Via JDBC

15 Hosting Solutions
Self-hosted: all-in-one (web server, database server, recommender server), which has its pros and cons
Recommender API Cloud Service: looking for beta testers
Amazon Elastic MapReduce (EMR)

16 Installing Mahout
Prerequisites:
– Dedicated VM if possible
– Linux, Mac OS X Leopard 10.5.6 or later, Windows (Cygwin)
– Java JDK 1.6
– Maven 2.0.11 or higher (maven.apache.org)

17 Installing Mahout
Building
– Follow instructions
– https://cwiki.apache.org/MAHOUT/buildingmahout.html
Use maven to build examples

18 Installing Mahout
Testing: Grouplens
– On a single 2GHz server:
  100K ratings (1000 users, 1700 items): 9 minutes.
  1M ratings (6000 users, 4000 items): 12 hours.
  10M ratings (72,000 users, 10,000 items): fuggedaboutit
– Using 6 concurrent 2GHz processing units:
  100K ratings (1000 users, 1700 items): 2 minutes.
  1M ratings (6000 users, 4000 items): 2 hours.
  10M ratings (72,000 users, 10,000 items): 11 days 20 hours.
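
The timings on this slide can be turned into speedup and scaling figures with a little arithmetic. The sketch below uses only the numbers quoted above; the variable and function names are illustrative.

```python
# Timings quoted on the slide, converted to minutes.
timings = {
    "100K": {"single": 9, "six_units": 2},
    "1M": {"single": 12 * 60, "six_units": 2 * 60},
}
UNITS = 6

def speedup(single, parallel):
    """How many times faster the parallel run is."""
    return single / parallel

def efficiency(single, parallel, units=UNITS):
    """Speedup divided by the number of processing units (1.0 is ideal)."""
    return speedup(single, parallel) / units

report = {
    name: (
        speedup(t["single"], t["six_units"]),
        efficiency(t["single"], t["six_units"]),
    )
    for name, t in timings.items()
}

# 10M ratings on six units took 11 days 20 hours; 1M ratings took 2 hours.
ten_m_hours = 11 * 24 + 20
growth_1m_to_10m = ten_m_hours / 2

print(report)            # {'100K': (4.5, 0.75), '1M': (6.0, 1.0)}
print(growth_1m_to_10m)  # 142.0
```

Six units give a 4.5x speedup on 100K ratings and a 6x speedup on 1M ratings, but going from 1M to 10M ratings multiplies the running time by 142 even with six units, so the cost grows much faster than the data.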

19 Installing Recommender API
See http://drupal.org/node/1207634
Configuration
– sites/all/modules/async_command/config.properties should match settings.php
Download and enable async_command
Check /admin/config/search/recommender/admin

20 Usage
Making recommendations
– User-user
– User-item
– Item-item
Predictions/similarity feed back into Drupal
– Blocks
– Views
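
An item-item recommendation of the kind listed on this slide can be sketched from browsing history alone. The example below assumes each node id maps to the set of user ids who viewed it and uses Jaccard similarity; the data, names and choice of metric are illustrative assumptions rather than what the Recommender API computes.

```python
# Hypothetical browsing history: {nid: set of uids who viewed the node}.
views = {
    10: {1, 2, 3},
    11: {2, 3},
    12: {3, 4},
    13: {5},
}

def jaccard(a, b):
    """Share of viewers two items have in common."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def similar_items(views, nid, limit=5):
    """Return up to `limit` (nid, score) pairs, most similar first."""
    scores = [
        (other, jaccard(views[nid], viewers))
        for other, viewers in views.items()
        if other != nid
    ]
    ranked = sorted(
        (pair for pair in scores if pair[1] > 0),
        key=lambda pair: (-pair[1], pair[0]),
    )
    return ranked[:limit]

top = similar_items(views, 10)  # node 11 ranks above node 12
```

A ranked list like this is what a "similar content" block or view would display for the node being viewed.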

21 Case study: Data Mining and Recommendations in SARACEN
SARACEN: http://www.saracen-p2p.eu/
Feedback loop to measure subjective quality of the recommendations
– Limited set of data, small user base
– API provides an initial set of recommended videos
– User can then watch a recommended video
– User’s actions are incorporated into their implicit profile, which feeds back to the recommender API
– Recommender API generates new predictions based on the complete set of implicit profile metadata
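
The feedback loop described on this slide can be sketched in a few lines. Here a simple tag-overlap score stands in for the recommender, which is a deliberate simplification and not how SARACEN or the Recommender API scores videos; the video catalogue, tags and function names are hypothetical.

```python
# Hypothetical catalogue: {video id: set of metadata tags}.
videos = {
    "v1": {"news"},
    "v2": {"sport"},
    "v3": {"sport", "football"},
    "v4": {"music"},
}

def recommend(profile, watched, limit=2):
    """Rank unwatched videos by how well their tags match the profile."""
    scores = {
        vid: sum(profile.get(tag, 0) for tag in tags)
        for vid, tags in videos.items()
        if vid not in watched
    }
    return sorted(scores, key=lambda vid: (-scores[vid], vid))[:limit]

def record_watch(profile, watched, vid):
    """Fold the user's action back into their implicit profile."""
    watched.add(vid)
    for tag in videos[vid]:
        profile[tag] = profile.get(tag, 0) + 1

profile, watched = {}, set()
initial = recommend(profile, watched)    # no profile yet: ['v1', 'v2']
record_watch(profile, watched, "v2")     # the user watches a sport video
updated = recommend(profile, watched)    # ['v3', 'v1']
```

Each watched video changes the profile, so the next set of recommendations shifts towards what the user actually chose.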

22 SARACEN: Prototype

23 Recommender data sources
Explicit data
– SARACEN account data, including location and language
– Linked accounts and profiles, e.g. Facebook user profile, “likes”, connections, metadata
Implicit data
– Activity history recorded during the user’s sessions
– Searches
– Shared content
– Viewed content
– Albums (media containers)
– Content ratings

24 Scalability
Don’t need Hadoop if
– Number of users is orders of magnitude larger than the number of items
– Users browse anonymously most of the time
– Few users log in and need personalised recommendations
– Item churn rate is relatively low

25 Worth Considering
Decreased Transparency
Decreased Serendipity
Sleep deprivation

26 Resources: Recommender API
http://drupal.org/project/recommender
http://recommenderapi.com/cloud
https://cwiki.apache.org/confluence/display/MAHOUT

27 Resources: Mahout
http://mahout.apache.org/
Mahout in Action
– http://www.manning.com/owen/
– ISBN 9781935182689
The Optimality of Naive Bayes, Harry Zhang.
http://aws.amazon.com/elasticmapreduce/

28 Acknowledgements
Socially Aware, collaboRative, scAlable Coding mEdia distributioN (SARACEN)
– http://www.saracen-p2p.eu
– Funded within the European Union’s Seventh Framework Programme (FP7/2007-2013) under grant agreement 248474

29 Questions?
Kendra Initiative
– @kendra
– http://www.kendra.org.uk
– https://github.com/kendrainitiative
Klokie Grossfeld
– @klokie
– klokie@kendra.org.uk
– http://www.linkedin.com/in/klokie
Daniel Harris
– @dahacouk
– daniel@kendra.org.uk
– http://www.linkedin.com/in/dahacouk

30 Thanks
http://barcelona2012.drupaldays.org/abstracting-social-networking-functionality-drupal

