REAL-TIME RECOMMENDATIONS FOR RETAIL: ARCHITECTURE, ALGORITHMS, AND DESIGN Juliet Hougland and Jonathan Natkins.


1 REAL-TIME RECOMMENDATIONS FOR RETAIL: ARCHITECTURE, ALGORITHMS, AND DESIGN Juliet Hougland and Jonathan Natkins

2 Who Are We? Jonathan Natkins: Field Engineer at WibiData; before that, Cloudera Software Engineer; before that, Vertica Software/Field Engineer. Juliet Hougland: Data Scientist, previously at WibiData; MS in Applied Math; BA in Math-Physics.

3 Recommendations in Retail Personalized versus Non-Personalized

6 Recommender Contexts. Taste History: based on everything you know about a user; interests over months/years. Current Taste: based on a user's immediate history; interests over minutes/hours. Ephemeral: extreme version of current taste; for example, location. Demographic*: similar to taste history, but less subjective; geographic region, age bracket, etc.

7 Why Does Real-Time Matter? Relevancy

8 I am a Special Snowflake Natty

9 Requirements for a Real-Time System. General System Requirements: handle millions of customers/users; support collection and storage of complex data, both static and event-series. Real-Time System Requirements: quickly retrieve subsets of data for a single user; aggregate/derive new, first-class data per user.

10 What is Kiji? The Kiji project is a modular, open-source framework for building real-time applications that collect, store, and analyze entity-centric data. kiji.org, github.com/kijiproject

12 Three Challenges: developing models for use in real time; scoring models in real time; deploying models into a production environment.

13 How Can We Make Real-Time Models? Population interests change slowly Individual interests change quickly

14 How Can We Make Real-Time Models? Population interests change slowly Individual interests change quickly Models don’t need to be retrained frequently

15 How Can We Make Real-Time Models? Population interests change slowly Individual interests change quickly Models don’t need to be retrained frequently Application of a model should be fast

16 A Common Workflow. (1) Train a model over the entire dataset. (2) Save fitted model parameters to a file or another table. (3) Access the model parameters when generating new recommendations based on new data. This is EXPENSIVE.
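The three steps on this slide can be sketched in a few lines. This is a minimal illustration, assuming a toy "most popular items" model and a JSON file as the parameter store; the function names, the file format, and the user "bunsen" are hypothetical and do not come from the talk or from Kiji.

```python
import json
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple


def train(events: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Batch step: count purchases per item over the entire dataset."""
    return dict(Counter(item for _user, item in events))


def save_params(params: Dict[str, int], path: str) -> None:
    """Persist the fitted parameters so scoring does not need to retrain."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(params, f)


def load_params(path: str) -> Dict[str, int]:
    """Read the fitted parameters back at recommendation time."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def recommend_popular(params: Dict[str, int], already_bought: Set[str], n: int = 2) -> List[str]:
    """Rank items by count (ties broken by name) and skip what the user already has."""
    ranked = sorted(params.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _count in ranked if item not in already_bought][:n]
```

The expensive part is `train`, which touches every event; `load_params` and `recommend_popular` only touch the saved parameters and one user's history.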

17 Developing Models. KijiExpress: Scala interface for interacting with Kiji data; uses Scalding for designing complex dataflows. Model Lifecycle: allows analysts and data scientists to break apart a model into phases.

18 Scoring Models in Real-Time Batch isn’t real-time

19 Scoring Models in Real-Time Batch isn’t real-time Number of Users Number of Interactions

20 Scoring Models in Real-Time Batch isn’t real-time Number of Users Number of Interactions A few users with many interactions

21 Scoring Models in Real-Time Batch isn’t real-time Number of Users Number of Interactions A few users with many interactions A lot of users with few interactions

22 Fresheners Compute Lazily Client KijiScoring Server HBase Read a column Get from HBase

23 Fresheners Compute Lazily Client KijiScoring Server HBase Read a column Get from HBase Freshness Policy

24 Fresheners Compute Lazily Client KijiScoring Server HBase Read a column Get from HBase Freshness Policy Yes, return to client

25 Fresheners Compute Lazily NO Client KijiScoring Server HBase Read a column Get from HBase Freshness Policy Scorer

26 Fresheners Compute Lazily Client KijiScoring Server HBase Read a column Get from HBase Freshness Policy Scorer Yes, return to client Write back for next time
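The read path built up across the Freshener slides (read a column, check the freshness policy, run the scorer only if the value is stale, write the result back) can be sketched as follows. This is a hedged illustration, not the KijiScoring API: the `Cell` and `LazyFreshener` names, the in-memory dict standing in for HBase, and the age-based policy are all assumptions.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Cell:
    value: float
    written_at: float  # timestamp of the last write, in seconds


class LazyFreshener:
    """Hypothetical stand-in for the scoring server's read path."""

    def __init__(
        self,
        store: Dict[str, Cell],          # plays the role of the HBase table
        scorer: Callable[[str], float],  # computes a fresh score for a user
        max_age_seconds: float,          # assumed age-based freshness policy
        clock: Callable[[], float] = time.time,
    ) -> None:
        self.store = store
        self.scorer = scorer
        self.max_age_seconds = max_age_seconds
        self.clock = clock

    def read(self, user_id: str) -> float:
        cell = self.store.get(user_id)  # get from the store
        now = self.clock()
        # Freshness policy: is the stored value recent enough?
        if cell is not None and now - cell.written_at <= self.max_age_seconds:
            return cell.value           # yes, return to client
        value = self.scorer(user_id)    # no, run the scorer
        self.store[user_id] = Cell(value, now)  # write back for next time
        return value
```

Because scoring happens only on read, users who never come back cost nothing, which matches the slides' point that most users have few interactions.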

27 Kiji Application Stack

28 Deployment Challenges

29 Kiji Model Repository. Link between application and models. Stores Freshener metadata: FreshnessPolicy, Scorer, attached column, location of trained model. Stores Scorer code: the code repository makes model scoring code available to the application from a central location. New models can be deployed to the Model Repository and made immediately available to the application.

30 Kiji Model Repository

31 Retail Recommendation

32 Types of Recommenders. Recommendation Algorithms: Collaborative Filtering Methods (Memory Based, Model Based); Content Based Methods.

33 Content-Based Recommenders Orange-Nosed Lab Assistant Meeps a lot Build models around entities using features that we think reflect inherent characteristics

34 Content-Based Recommenders safer faster knife

35 Pandora: Content-Based Expertly-Characterized Music

36 Collaborative Filtering. Represent user-item affinities as a sparse matrix. Users ≈ Rows (e.g. Beaker); Items ≈ Columns (e.g. Banana Slicer, Pineapple Slicer).
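A sparse user-item matrix does not need a dense array; one common representation is a nested dictionary that stores only the observed cells. The sketch below is an assumption about how this might look in plain Python, not how Kiji lays out its tables; the function names and the second user "bunsen" are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Matrix = Dict[str, Dict[str, float]]


def build_user_item(events: Iterable[Tuple[str, str, float]]) -> Matrix:
    """Rows are users, columns are items; missing cells are simply absent."""
    matrix: Dict[str, Dict[str, float]] = defaultdict(dict)
    for user, item, rating in events:
        matrix[user][item] = rating
    return {user: dict(items) for user, items in matrix.items()}


def transpose(matrix: Matrix) -> Matrix:
    """Flip to item rows and user columns, the view item-item methods need."""
    flipped: Dict[str, Dict[str, float]] = defaultdict(dict)
    for row, cols in matrix.items():
        for col, value in cols.items():
            flipped[col][row] = value
    return {row: dict(cols) for row, cols in flipped.items()}
```

Storage grows with the number of observed interactions rather than with users times items.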

37 Aspirational Ratings I put in my queue… I actually watch

39 Collaborative Filtering: How It Works. Simple aggregate predictors; Similar Users; Similar Products.

40 Similar Entities. What do we mean by similar? Jaccard Index: a measure of set similarity. Cosine Similarity: the cosine of the angle between two vectors. Pearson Correlation: a statistical measure, similar to cosine (it is the cosine of the mean-centered vectors). Naively, we could compare every entity to every other entity, but that would not scale well with increasing numbers of entities.
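The three similarity measures named on this slide are short enough to write out. The sketch below uses standard textbook definitions over sets and sparse rating dictionaries; the function signatures and the choice to return 0.0 for degenerate inputs are assumptions made for illustration.

```python
import math
from typing import Dict, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by size of the union."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine of the angle between two sparse vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def pearson(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Correlation over co-rated keys: cosine of the mean-centered values."""
    common = sorted(u.keys() & v.keys())
    n = len(common)
    if n < 2:
        return 0.0
    xs = [u[k] for k in common]
    ys = [v[k] for k in common]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    denom = math.sqrt(sum(d * d for d in dx)) * math.sqrt(sum(d * d for d in dy))
    if denom == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(dx, dy)) / denom
```

Jaccard suits binary events such as "bought or not", while cosine and Pearson use the rating values themselves.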

41 Building the Similarity Matrix

42 Collaborative Filtering: Is This Useful? Problem: Too much data! Tracking user preferences and all their events generates huge amounts of data. Problem: Too little data! Dimensions of user-space and item-space are usually very large; more variables make it more difficult to generate user preferences. Problem: Cold start. If you don't know anything about a user, what should you recommend? Problem: More ratings means slower computations. Identifying neighborhoods of entities is expensive.

43 Collaborative Filtering: Why Is It Useful? Because it works. Content-agnostic: all that matters is co-occurrence of events.

44 Amazon: Item-Item Collaborative Filtering. Used for personalized recommendations. Fills screen real estate with related items. Produces specific, but non-creepy, recommendations. Linden, G.; Smith, B.; York, J., "Amazon.com recommendations: item-to-item collaborative filtering," IEEE Internet Computing, vol. 7, no. 1, pp. 76-80, Jan/Feb 2003.

45 Item-Item Collaborative Filtering. Beaker buys a banana slicer. Then: (1) generate a list of candidate items to predict ratings for; (2) predict ratings for the candidate items; (3) select the Top-N items.
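The three steps above can be sketched as one function. This is an illustrative sketch, assuming a precomputed item-to-item similarity table and a similarity-weighted average as the predictor; the talk does not specify the exact predictor, and the function name and data layout are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple

Similarities = Dict[str, Dict[str, float]]  # item -> {neighbor item: similarity}


def recommend_item_item(
    user_ratings: Dict[str, float],
    similar_items: Similarities,
    n: int = 3,
) -> List[Tuple[str, float]]:
    # Step 1: candidates are neighbors of items the user rated, minus those items.
    candidates = {
        neighbor
        for item in user_ratings
        for neighbor in similar_items.get(item, {})
        if neighbor not in user_ratings
    }
    # Step 2: predict each candidate's rating as a similarity-weighted average.
    scored = []
    for candidate in candidates:
        numerator = 0.0
        denominator = 0.0
        for item, rating in user_ratings.items():
            sim = similar_items.get(item, {}).get(candidate, 0.0)
            numerator += sim * rating
            denominator += abs(sim)
        if denominator > 0.0:
            scored.append((numerator / denominator, candidate))
    # Step 3: keep only the top-N predictions.
    return [(candidate, score) for score, candidate in heapq.nlargest(n, scored)]
```

Only items adjacent to the user's own history are ever scored, so the work per request depends on that user's activity rather than on catalog size.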

46 Accessing External Data. The KeyValueStore API enables external data access when applying a model. External data might be: trained model parameters; hierarchical/taxonomic data; geo-lookup. Store external data flexibly: text files, sequence files, Kiji tables, etc. Data access is decoupled from use during execution. If the data doesn't fit in memory, put it in a table.
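The idea that data access is decoupled from use can be shown with a small interface. The sketch below is not Kiji's KeyValueStore API; the `KeyValueReader` protocol, both store classes, and the `region_for` function are hypothetical names chosen to illustrate the pattern.

```python
from typing import Dict, Optional, Protocol


class KeyValueReader(Protocol):
    """Anything that can look up a string value by key."""

    def get(self, key: str) -> Optional[str]: ...


class InMemoryStore:
    """Backed by a dict; fine when the data fits in memory."""

    def __init__(self, data: Dict[str, str]) -> None:
        self._data = dict(data)

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)


class TextFileStore:
    """Backed by a tab-separated file; scans on each lookup, holds nothing in memory."""

    def __init__(self, path: str) -> None:
        self._path = path

    def get(self, key: str) -> Optional[str]:
        with open(self._path, encoding="utf-8") as f:
            for line in f:
                k, _sep, v = line.rstrip("\n").partition("\t")
                if k == key:
                    return v
        return None


def region_for(postal_code: str, geo: KeyValueReader) -> str:
    """Model-side code: it only sees the reader, never the storage behind it."""
    return geo.get(postal_code) or "unknown"
```

Swapping `InMemoryStore` for `TextFileStore` (or, in a real deployment, a table-backed store) requires no change to `region_for`.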

47 How Much Less Work Can We Do? We can choose a predictor that allows us to truncate a sum. There are two ways terms in the sum of our predictor can be small: no rating; small similarity.

49 How Much Less Work Can We Do? We can choose a predictor that allows us to truncate a sum. There are two ways terms in the sum of our predictor can be small: no rating; small similarity. Ignore unrated items.

50 How Much Less Work Can We Do? We can choose a predictor that allows us to truncate a sum. There are two ways terms in the sum of our predictor can be small: no rating; small similarity. Ignore dissimilar items.
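Both shortcuts, ignoring unrated items and ignoring dissimilar items, fit naturally into a weighted-average predictor. The sketch below assumes the candidate's neighbors arrive sorted by descending similarity; the function name, the `k` cap, and the `min_sim` threshold are illustrative assumptions rather than values from the talk.

```python
from typing import Dict, List, Optional, Tuple


def predict_truncated(
    user_ratings: Dict[str, float],
    neighbors: List[Tuple[str, float]],  # (item, similarity), most similar first
    k: int = 20,                         # use at most k rated neighbors
    min_sim: float = 0.1,                # drop terms with small similarity
) -> Optional[float]:
    numerator = 0.0
    denominator = 0.0
    used = 0
    for item, sim in neighbors:
        # Ignore dissimilar items: the list is sorted, so we can stop early.
        if sim < min_sim or used == k:
            break
        rating = user_ratings.get(item)
        # Ignore unrated items: they contribute nothing to the sum.
        if rating is None:
            continue
        numerator += sim * rating
        denominator += sim
        used += 1
    if denominator == 0.0:
        return None  # no usable neighbors, so no prediction
    return numerator / denominator
```

Because the loop stops at the first neighbor below `min_sim`, the cost of one prediction is bounded by the neighbor list's useful prefix rather than by the size of the catalog.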

51 How Much Less Work Can We Do? If we only present a few recommendations, we don't need to predict ratings for all items. Choose the candidate set for rating estimation wisely, or infer it from nearest neighbors.

52 Organizing Data in Item-Item CF

53 Accessing Data During Freshening

54 Want to Know More? The Kiji Project: kiji.org, github.com/kijiproject. Questions about this presentation?

