Published by Tiara Duty
Modified about 1 year ago
1 ©MapR Technologies 2013 Which Algorithms Really Matter?
2 ©MapR Technologies 2013 Me, Us. Ted Dunning, Chief Application Architect, MapR. Committer, PMC member: Mahout, Zookeeper, Drill. Bought the beer at the first HUG. MapR distributes more open source components for Hadoop and adds major technology for performance, HA, and industry standard APIs. Info: hash tag #mapr; see @ted_dunning
3 ©MapR Technologies 2013 Which Learning Algorithms Really Matter? The set of algorithms that matter theoretically is different from the ones that matter commercially. Commercial importance often hinges on ease of deployment, robustness against perverse data, and conceptual simplicity. Often, even accuracy can be sacrificed for these other goals. Commercial systems also often live in a highly interacting environment, so off-line evaluations may have only limited applicability. I will describe several commercially important algorithms such as Thompson sampling (aka Bayesian Bandits), result dithering, on-line clustering, and distribution sketches, and will explain what makes these algorithms important in industrial settings.
4 ©MapR Technologies 2013 Topic For Today What is important? What is not? Why? What is the difference from academic research? Some examples
5 ©MapR Technologies 2013 What is Important? Deployable Robust Transparent Skillset and mindset matched? Proportionate
6 ©MapR Technologies 2013 What is Important? Deployable – Clever prototypes don’t count if they can’t be standardized Robust Transparent Skillset and mindset matched? Proportionate
7 ©MapR Technologies 2013 What is Important? Deployable – Clever prototypes don’t count Robust – Mishandling is common Transparent – Will degradation be obvious? Skillset and mindset matched? Proportionate
8 ©MapR Technologies 2013 What is Important? Deployable – Clever prototypes don’t count Robust – Mishandling is common Transparent – Will degradation be obvious? Skillset and mindset matched? – How long will your fancy data scientist enjoy doing standard ops tasks? Proportionate – Where is the highest value per minute of effort?
9 ©MapR Technologies 2013 Academic Goals vs Pragmatics Academic goals – Reproducible – Isolate theoretically important aspects – Work on novel problems Pragmatics – Highest net value – Available data is constantly changing – Diligence and consistency have larger impact than cleverness – Many systems feed themselves, so exploration and exploitation are both important – Engineering constraints on budget and schedule
10 ©MapR Technologies 2013 Example 1: Making Recommendations Better
11 ©MapR Technologies 2013 Recommendation Advances What are the most important algorithmic advances in recommendations over the last 10 years? Cooccurrence analysis? Matrix completion via factorization? Latent factor log-linear models? Temporal dynamics?
12 ©MapR Technologies 2013 The Winner – None of the Above What are the most important algorithmic advances in recommendations over the last 10 years? 1. Result dithering 2. Anti-flood
13 ©MapR Technologies 2013 The Real Issues Exploration Diversity Speed Not the last fraction of a percent
14 ©MapR Technologies 2013 Result Dithering Dithering is used to re-order recommendation results – Re-ordering is done randomly Dithering is guaranteed to make off-line performance worse Dithering also has a near perfect record of making actual performance much better
15 ©MapR Technologies 2013 Result Dithering Dithering is used to re-order recommendation results – Re-ordering is done randomly Dithering is guaranteed to make off-line performance worse Dithering also has a near perfect record of making actual performance much better “Made more difference than any other change”
16 ©MapR Technologies 2013 Simple Dithering Algorithm Generate a synthetic score from log rank plus Gaussian noise Pick the noise scale to provide the desired level of mixing Typically … Oh… use floor(t/T) as the seed
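The dithering recipe on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the speaker's implementation: the function name `dither`, its parameters, and the choice to treat the noise scale as the standard deviation of the Gaussian are all assumptions. The seed is floor(t/T), which keeps the ordering stable within one time window of length T.

```python
import math
import random


def dither(results, noise_scale, t, period):
    """Re-order ranked results by adding Gaussian noise to the log of the rank.

    results     : items ordered best-first (rank 1 first)
    noise_scale : standard deviation of the noise (assumed meaning of epsilon)
    t, period   : current time and window length T; floor(t / T) is the seed
    """
    # Same window -> same seed -> same ordering for repeated requests.
    rng = random.Random(math.floor(t / period))
    scored = [
        (math.log(rank) + rng.gauss(0.0, noise_scale), item)
        for rank, item in enumerate(results, start=1)
    ]
    # Lower synthetic score means earlier position.
    scored.sort(key=lambda pair: pair[0])
    return [item for _, item in scored]


if __name__ == "__main__":
    ranked = list(range(1, 21))
    print(dither(ranked, 0.5, t=100, period=60))
```

Because the noise is added to the log of the rank, items near the top are shuffled only among close neighbours, while items deep in the list can move many positions, which is what lets results from the second page surface occasionally.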
17 ©MapR Technologies 2013 Example … ε = 0.5
18 ©MapR Technologies 2013 Example … ε = log 2 = 0.69
19 ©MapR Technologies 2013 Exploring The Second Page
20 ©MapR Technologies 2013 Lesson 1: Exploration is good
21 ©MapR Technologies 2013 Example 2: Bayesian Bandits
22 ©MapR Technologies 2013 Bayesian Bandits Based on Thompson sampling Very general sequential test Near optimal regret Trades off exploration and exploitation Possibly the best known solution for exploration/exploitation Incredibly simple
23 ©MapR Technologies 2013 Thompson Sampling Select each shell according to the probability that it is the best Probability that it is the best can be computed using posterior But I promised a simple answer
24 ©MapR Technologies 2013 Thompson Sampling – Take 2 Sample θ Pick i to maximize reward Record result from using i
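The three steps above (sample θ, pick the option that maximizes reward, record the result) can be illustrated for the simplest case of binary rewards. This is a sketch under stated assumptions: a Beta(1, 1) prior on each option's success rate, and hypothetical names `BetaBandit` and `simulate` that do not come from the slides.

```python
import random


class BetaBandit:
    """Thompson sampling for binary rewards with a Beta(1, 1) prior (assumed)."""

    def __init__(self, n_arms, rng=None):
        self.wins = [0] * n_arms
        self.losses = [0] * n_arms
        self.rng = rng if rng is not None else random.Random()

    def choose(self):
        # Sample theta_i from each arm's posterior, then pick the largest sample.
        samples = [
            self.rng.betavariate(w + 1, l + 1)
            for w, l in zip(self.wins, self.losses)
        ]
        return max(range(len(samples)), key=samples.__getitem__)

    def record(self, arm, reward):
        # Update the posterior counts for the arm that was used.
        if reward:
            self.wins[arm] += 1
        else:
            self.losses[arm] += 1


def simulate(true_rates, rounds, seed=0):
    """Run the bandit against hypothetical fixed success rates; return pull counts."""
    rng = random.Random(seed)
    bandit = BetaBandit(len(true_rates), rng)
    pulls = [0] * len(true_rates)
    for _ in range(rounds):
        arm = bandit.choose()
        bandit.record(arm, rng.random() < true_rates[arm])
        pulls[arm] += 1
    return pulls


if __name__ == "__main__":
    print(simulate([0.1, 0.2, 0.6], rounds=2000, seed=42))
```

Each option is chosen with exactly the probability that its sampled θ is the largest, so no explicit computation of "probability of being best" is needed.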
25 ©MapR Technologies 2013 Fast Convergence
26 ©MapR Technologies 2013 Thompson Sampling on Ads An Empirical Evaluation of Thompson Sampling - Chapelle and Li, 2011
27 ©MapR Technologies 2013 Bayesian Bandits versus Result Dithering Many useful systems are difficult to frame in fully Bayesian form Thompson sampling cannot be applied without posterior sampling Can still do useful exploration with dithering But better to use Thompson sampling if possible
28 ©MapR Technologies 2013 Lesson 2: Exploration is pretty easy to do and pays big benefits.
29 ©MapR Technologies 2013 Example 3: On-line Clustering
30 ©MapR Technologies 2013 The Problem K-means clustering is useful for feature extraction or compression At scale and at high dimension, the desirable number of clusters increases Very large number of clusters may require more passes through the data Super-linear scaling is generally infeasible
31 ©MapR Technologies 2013 The Solution Sketch-based algorithms produce a sketch of the data Streaming k-means uses adaptive dp-means to produce this sketch in the form of many weighted centroids which approximate the original distribution The size of the sketch grows very slowly with increasing data size Many operations such as clustering are well behaved on sketches References: Michael Shindler, Alex Wong, Adam Meyerson, "Fast and Accurate k-means For Large Datasets"; Brian Kulis, Michael Jordan, "Revisiting k-means: New Algorithms via Bayesian Nonparametrics".
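The idea of a one-pass sketch made of weighted centroids can be illustrated with a deliberately simplified, deterministic variant. This is not the algorithm from the cited papers or any library's implementation: the distance threshold rule, the growth factor of 1.5, and the names `sketch` and `_add` are assumptions chosen for brevity.

```python
import math


def _add(centroids, point, weight, threshold):
    """Merge a weighted point into its nearest centroid, or start a new one."""
    nearest = min(centroids, key=lambda c: math.dist(c[0], point), default=None)
    if nearest is None or math.dist(nearest[0], point) > threshold:
        centroids.append([list(point), weight])
        return
    center, w = nearest
    total = w + weight
    # Weighted mean keeps the centroid at the centre of mass of its points.
    nearest[0] = [(c * w + x * weight) / total for c, x in zip(center, point)]
    nearest[1] = total


def sketch(points, threshold, max_centroids):
    """One pass over points; returns a list of (centroid, weight) pairs."""
    if threshold <= 0 or max_centroids < 1:
        raise ValueError("threshold must be positive and max_centroids >= 1")
    centroids = []
    for p in points:
        _add(centroids, p, 1.0, threshold)
        # Too many centroids: loosen the threshold and re-sketch the sketch.
        while len(centroids) > max_centroids:
            threshold *= 1.5
            old, centroids = centroids, []
            for center, w in old:
                _add(centroids, center, w, threshold)
    return [(tuple(c), w) for c, w in centroids]


if __name__ == "__main__":
    data = [(0, 0), (0.1, 0), (0, 0.1), (10, 10), (10.1, 10), (10, 10.1)]
    print(sketch(data, threshold=1.0, max_centroids=10))
```

The total weight of the sketch always equals the number of points seen, so a later clustering step can treat each weighted centroid as a stand-in for the points it absorbed.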
32 ©MapR Technologies 2013 An Example
33 ©MapR Technologies 2013 An Example
34 ©MapR Technologies 2013 The Cluster Proximity Features Every point can be described by the nearest cluster – 4.3 bits per point in this case – Significant error that can be decreased (to a point) by increasing number of clusters Or by the proximity to the 2 nearest clusters (2 x 4.3 bits + 1 sign bit + 2 proximities) – Error is negligible – Unwinds the data into a simple representation Or we can increase the number of clusters (an n-fold increase adds log n bits per point and decreases error by sqrt(n))
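The bit counts on this slide follow from the fact that naming one of k clusters takes log2(k) bits; 4.3 bits is consistent with about 20 clusters, although the slide does not state the count, so that figure is an inference. The sketch below shows that arithmetic and one possible way to compute the two-nearest-cluster description of a point. The function names are hypothetical.

```python
import math


def bits_per_point(n_clusters):
    """Bits needed to name one cluster out of n_clusters."""
    return math.log2(n_clusters)


def two_nearest(point, centroids):
    """Return ((index, distance), (index, distance)) for the two closest centroids."""
    ranked = sorted(
        range(len(centroids)), key=lambda i: math.dist(point, centroids[i])
    )
    first, second = ranked[0], ranked[1]
    return (
        (first, math.dist(point, centroids[first])),
        (second, math.dist(point, centroids[second])),
    )


if __name__ == "__main__":
    print(round(bits_per_point(20), 1))   # cost of the nearest-cluster id
    print(bits_per_point(40) - bits_per_point(20))  # doubling adds one bit
    print(two_nearest((1, 0), [(0, 0), (3, 0), (10, 0)]))
```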
35 ©MapR Technologies 2013 Diagonalized Cluster Proximity
36 ©MapR Technologies 2013 Lots of Clusters Are Fine
37 ©MapR Technologies 2013 Typical k-means Failure Selecting two seeds here cannot be fixed with Lloyd's algorithm. The result is that these two clusters get glued together.
38 ©MapR Technologies 2013 Streaming k-means Ideas By using a sketch with lots (k log N) of centroids, we avoid pathological cases We still get a very good result if the sketch is created – in one pass – with approximate search In fact, adaptive dp-means works just fine In the end, the sketch can be used for clustering or …
39 ©MapR Technologies 2013 Lesson 3: Sketches make big data small.
40 ©MapR Technologies 2013 Example 4: Search Abuse
41 ©MapR Technologies 2013 Recommendations Alice got an apple and a puppy Charles got a bicycle Alice Charles
42 ©MapR Technologies 2013 Recommendations Alice got an apple and a puppy Charles got a bicycle Bob got an apple Alice Bob Charles
43 ©MapR Technologies 2013 Recommendations What else would Bob like? ? Alice Bob Charles
44 ©MapR Technologies 2013 Log Files Alice Bob Charles Alice Bob Charles Alice
45 ©MapR Technologies 2013 History Matrix: Users by Items Alice Bob Charles ✔✔✔ ✔✔ ✔✔
46 ©MapR Technologies 2013 Co-occurrence Matrix: Items by Items How do you tell which co-occurrences are useful?
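The step from the user-by-item history to the item-by-item co-occurrence matrix can be sketched with the toy data from the earlier slides (Alice, Bob, Charles). This is only an illustration with hypothetical function names; it counts raw co-occurrences and does not attempt the test for which co-occurrences are useful.

```python
from collections import defaultdict
from itertools import permutations


def cooccurrence(history):
    """Count, for each ordered item pair, how many users have both items."""
    counts = defaultdict(lambda: defaultdict(int))
    for items in history.values():
        for a, b in permutations(set(items), 2):
            counts[a][b] += 1
    return counts


def recommend(history, user):
    """Rank unseen items by how often they co-occur with the user's items."""
    counts = cooccurrence(history)
    seen = set(history[user])
    scores = defaultdict(int)
    for item in seen:
        for other, n in counts[item].items():
            if other not in seen:
                scores[other] += n
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda item: (-scores[item], item))


if __name__ == "__main__":
    history = {
        "Alice": ["apple", "puppy"],
        "Bob": ["apple"],
        "Charles": ["bicycle"],
    }
    print(recommend(history, "Bob"))
```

With this history, the only item that co-occurs with Bob's apple is the puppy, which matches the question posed on the "What else would Bob like?" slide.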
47 ©MapR Technologies 2013 Co-occurrence Binary Matrix
48 ©MapR Technologies 2013 Indicator Matrix: Anomalous Co-Occurrence ✔ ✔ Result: The marked row will be added to the indicator field in the item document…
49 ©MapR Technologies 2013 Indicator Matrix ✔ id: t4 title: puppy desc: The sweetest little puppy ever. keywords: puppy, dog, pet indicators: (t1) That one row from the indicator matrix becomes the indicator field in the Solr document used to deploy the recommendation engine. Note: data for the indicator field is added directly to the meta-data for a document in the Solr index. You don't need to create a separate index for the indicators.
50 ©MapR Technologies 2013 Internals of the Recommender Engine
51 ©MapR Technologies 2013 Internals of the Recommender Engine
52 ©MapR Technologies 2013 Looking Inside LucidWorks What to recommend if a new user listened to 2122: Fats Domino & 303: Beatles? Recommendation is “1710 : Chuck Berry” Real-time recommendation query and results: Evaluation
53 ©MapR Technologies 2013 Real-life example
54 ©MapR Technologies 2013 Lesson 4: Recursive search abuse pays. Search can implement recs, which can implement search.
55 ©MapR Technologies 2013 Summary