Large-scale Recommendations in a Dynamic Marketplace. Jay Katukuri, Rajyashree Mukherjee, Tolga Konik, Chu-Cheng Hsieh. LSRS 2013.

Presentation transcript:

Large-scale Recommendations in a Dynamic Marketplace
Jay Katukuri, Rajyashree Mukherjee, Tolga Konik, Chu-Cheng Hsieh

Meet John Doe
John is interested in an item: “iPhone 5 64gb white”. Should we recommend
– “iPhone 5 case”, or
– “iPhone 5s gold”?

Recommendation on e-marketplace
Recommendation “before” purchase: Similar Item Recommendation (SIR)
– iPhone 5S gold
Recommendation “after” purchase: Related Item Recommendation (RIR)
– iPhone 5 case

SIR Example 1

SIR Example 2

Related Item Recommendation
Recommendations for Xbox 360 4GB on Checkout page

Main Idea
Similar Item Clustering (SIC)
– Titles
– Attributes (Price, etc.)
– Images
Recommendation
– SIR: (same cluster)
– RIR: (neighbor clusters)
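The main idea can be illustrated with a minimal sketch, assuming each item already has a cluster and each cluster has a list of neighbor clusters. The item ids, cluster names, and the functions `similar_to` and `related_to` are hypothetical and only mirror the ?similarTo(item) and ?relatedTo(item) requests named in the slides; this is not the deployed implementation.

```python
from collections import defaultdict

# Hypothetical toy data: item id -> cluster, and cluster -> neighbor clusters.
ITEM_CLUSTER = {
    "iphone5-64gb-white": "iphone 5",
    "iphone5-32gb-black": "iphone 5",
    "iphone5-case-red": "iphone 5 case",
    "iphone5-case-blue": "iphone 5 case",
}
RELATED_CLUSTERS = {"iphone 5": ["iphone 5 case"]}


def cluster_members(item_cluster):
    """Invert the item -> cluster map into cluster -> list of items."""
    members = defaultdict(list)
    for item, cluster in item_cluster.items():
        members[cluster].append(item)
    return members


MEMBERS = cluster_members(ITEM_CLUSTER)


def similar_to(item):
    """SIR: other items from the same cluster as the seed item."""
    cluster = ITEM_CLUSTER[item]
    return [other for other in MEMBERS[cluster] if other != item]


def related_to(item):
    """RIR: items from the clusters that neighbor the seed item's cluster."""
    cluster = ITEM_CLUSTER[item]
    return [
        other
        for neighbor in RELATED_CLUSTERS.get(cluster, [])
        for other in MEMBERS[neighbor]
    ]
```

In this toy setup, `similar_to("iphone5-64gb-white")` yields the other iPhone 5 listing, while `related_to` yields the two cases.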

Models
Item clusters
Cluster represented by meaningful keywords
– “clarks women shoe pumps classics”
– “authentic handmade amish quilt”
Cluster-Cluster Relations
– “samsung galaxy s4” related to “samsung galaxy s4 screen protector”
– “wolfgang puck electric pressure cooker” related to “kitchenaid food processor”
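One plausible way to obtain cluster-cluster relations is to count how often items from two clusters appear in the same transaction. The sketch below assumes that reading; the slides only say that transactions feed the related-clusters model, so the basket format, the item ids, and the function `related_cluster_counts` are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def related_cluster_counts(baskets, item_cluster):
    """Count co-occurrences of cluster pairs across transactions.

    baskets: list of lists of item ids bought together (assumed format).
    item_cluster: item id -> cluster name.
    """
    counts = Counter()
    for basket in baskets:
        # Map items to clusters, drop unknown items and duplicate clusters.
        clusters = sorted({item_cluster[i] for i in basket if i in item_cluster})
        for a, b in combinations(clusters, 2):
            counts[(a, b)] += 1
    return counts


# Hypothetical toy data.
ITEM_TO_CLUSTER = {
    "s4": "samsung galaxy s4",
    "s4-protector": "samsung galaxy s4 screen protector",
    "cooker": "wolfgang puck electric pressure cooker",
}
BASKETS = [["s4", "s4-protector"], ["s4", "s4-protector"], ["s4", "cooker"]]
PAIR_COUNTS = related_cluster_counts(BASKETS, ITEM_TO_CLUSTER)
```

A threshold on these counts (or a normalized score) would then decide which pairs count as related; the slides do not state which criterion was used.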

System Architecture - Overview
[Architecture diagram with three stages. Offline Model Generation: Clusters Model Generation, Related Clusters Model Generation; inputs shown: Inventory, Clickstream, Transactions, Conceptual Knowledgebase. The Data Store: Clusters, Cluster-Cluster Relations. Real-time Performance System: Similar Items Recommender (SIR), which answers ?similarTo(item) for a Lost Item with Similar Items, and Related Items Recommender (RIR), which answers ?relatedTo(item) for a Bought Item with Related Items.]

Cluster Generation (offline)

Data on eBay
Item-item co-occurrences on transaction logs
Large Data
– Much bigger data set in both users and inventory than other ecommerce sites.
Scale
– More than 300M listings.
– More than 10M new items every day

Challenges
Global clustering not feasible
Size bias on different categories
Performance

Model Generation - Clusters
1. Select a few keywords to represent “big notions”, e.g. iPhone, Handbags, etc.
– How to select?
2. Clustering by K-means
– How to set K?

Model Generation - Clusters
[Pipeline diagram. Labels: Inventory, Clickstream, Conceptual Knowledgebase (concepts, categories), user queries, items, Query-Recall Generation, query-to-items, Cluster Generation, new clusters, Clusters Model Generation, Data Store (Clusters).]
Problem: Global clustering not feasible
Solution: Partition input data by user queries
Parallel distributed K-Means in Hadoop MapReduce
Dedupe and merge overlapping clusters (100X reduction in size over inventory with over 90% coverage)
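The partition-then-cluster idea can be sketched as follows. This is a sequential, pure-Python illustration, not the Hadoop MapReduce job from the slides: it assumes items are already turned into numeric feature vectors and grouped by the query that recalled them, and it runs a small K-Means independently on each partition. The function names, the first-k initialization, and the toy vectors are assumptions.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=10):
    """Plain Lloyd's algorithm; the first k points serve as initial centroids."""
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        for i, point in enumerate(points):
            assignment[i] = min(
                range(k), key=lambda c: squared_distance(point, centroids[c])
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignment


def cluster_by_partition(partitions, k):
    """Run K-Means separately on each query partition.

    partitions: query string -> list of item feature vectors (assumed format).
    Partitions are independent, which is what makes a parallel run possible.
    """
    return {
        query: kmeans(items, min(k, len(items)))
        for query, items in partitions.items()
    }


# Hypothetical toy partitions with two-dimensional item features.
PARTITIONS = {
    "quilt": [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (10.0, 11.0)],
    "ipod": [(1.0, 1.0)],
}
LABELS = cluster_by_partition(PARTITIONS, k=2)
```

Because no partition needs data from another, each call to `kmeans` could run as its own task, which fits the "parallel distributed" wording above. The dedupe-and-merge step for overlapping clusters is not shown.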

Base Cluster Generation
Base Cluster ≡ Query
Find merge candidates based on query term overlap
– E.g.: “nike airmax tennis shoes” -> “nike airmax”
Score candidates using cosine similarity
– Term weight: TF-IDF in the query space (document = query)
– TF: Query Demand
– IDF: Number of Queries
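A minimal sketch of scoring merge candidates, assuming each query is treated as a document and term weights come from an IDF computed over the query set. The slides do not say how query demand enters the weight; a single per-query demand factor would scale the whole vector and cancel in the cosine, so this sketch keeps only the IDF part. The smoothing `log(1 + N / df)`, the subset test for term overlap, and the threshold are assumptions.

```python
import math
from collections import Counter


def idf_weights(queries):
    """IDF over the query space: every query counts as one document."""
    n = len(queries)
    df = Counter(term for q in queries for term in set(q.split()))
    return {term: math.log(1 + n / count) for term, count in df.items()}


def vectorize(query, idf):
    """Sparse term -> weight vector for one query."""
    return {term: idf[term] for term in set(query.split())}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def merge_candidates(queries, threshold=0.3):
    """Pairs (longer, shorter, score) where the shorter query's terms all
    occur in the longer one and the cosine score reaches the threshold."""
    idf = idf_weights(queries)
    vectors = {q: vectorize(q, idf) for q in queries}
    result = []
    for longer in queries:
        for shorter in queries:
            if longer != shorter and set(shorter.split()) < set(longer.split()):
                score = cosine(vectors[longer], vectors[shorter])
                if score >= threshold:
                    result.append((longer, shorter, score))
    return result


# Hypothetical toy query set.
QUERIES = ["nike airmax tennis shoes", "nike airmax", "amish quilt"]
CANDIDATES = merge_candidates(QUERIES)
```

With this toy set, the only candidate is merging “nike airmax tennis shoes” into “nike airmax”, matching the example on the slide.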

Step 1: base cluster candidates
Method for choosing the “base clusters” (initial states):
– Minimum frequency
– Supply threshold (Enough Inventory)
– Min and max token constraint (Length of queries)
– Heuristic constraints: queries that have only numbers are not allowed, e.g. “10 5”
– Merge similar clusters into one
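The selection rules above can be read as a simple filter over queries. The sketch below assumes each query comes with a demand count (how often it was issued) and a supply count (how many listings it recalls); all threshold values and the function name are hypothetical, since the slides give no numbers.

```python
def is_base_cluster_candidate(
    query,
    demand,
    supply,
    min_demand=10,   # hypothetical minimum query frequency
    min_supply=20,   # hypothetical minimum number of matching listings
    min_tokens=2,    # hypothetical lower bound on query length
    max_tokens=6,    # hypothetical upper bound on query length
):
    """Return True if a query passes the base-cluster selection rules."""
    tokens = query.split()
    if demand < min_demand:
        return False  # too rarely searched
    if supply < min_supply:
        return False  # not enough inventory behind the query
    if not min_tokens <= len(tokens) <= max_tokens:
        return False  # too short or too long
    if all(token.isdigit() for token in tokens):
        return False  # numbers-only queries such as "10 5" are rejected
    return True
```

For example, `is_base_cluster_candidate("10 5", demand=500, supply=500)` is rejected by the numbers-only rule even though it passes the other checks. The final rule, merging similar clusters, is a separate step and is not part of this filter.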

Candidates merge
4.34M base clusters merged into 1.95M
Example:
– phrase(hand,made) phrase(king,s) queen quilt
– phrase(hand,made) phrase(pink,s) quilt
– phrase(hand,made) phrase(prae,owned) queen quilt
– phrase(hand,made) queen quilt
– phrase(hand,made) phrase(prae,owned) quilt
– phrase(hand,made) quilt size twin
– phrase(hand,made) quilt silk
– phrase(hand,made) quilt twin
– phrase(hand,made) phrase(patch,work) quilt
– phrase(hand,made) quilt white
– phrase(hand,made) phrase(king,size) quilt
– phrase(hand,made) phrase(yo,yo,s) quilt
– phrase(hand,made) quilt sale
– phrase(hand,made) quilt red
– phrase(hand,made) quilt

Step 2: K-Means Clustering
[Pipeline diagram. Labels: Transaction Logs, Inventory Logs, Scoring Models, Query to Items Data, Base Cluster Generation, Generate Item Features, K-Means Clustering of Base Clusters, Split Clusters.]

Clusters on Item Signature
Cluster: apple ipod touch 4g clear film protector screen
Cluster: clarks women shoe pumps classics

Recommendation (online)

Performance System
[Diagram, SIR path. Labels: Lost Item, ?similarTo(item), Cluster Assignment, SIR query formation, Item Search (query), Item Selection, SIR Ranking, Similar Items recommendations; Data Store: Clusters, Inventory, Conceptual Knowledgebase.]
[Diagram, RIR path. Labels: Bought Item, ?relatedTo(item), Cluster Assignment, clusters, Cluster-Cluster Relations, related clusters, RIR Query Formation, Item Search (queries), Item Selection, RIR Ranking, Related Items recommendations; Data Store: Clusters, Inventory, Conceptual Knowledgebase.]
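Two of the online steps named in the diagram, cluster assignment and query formation, can be sketched as below. The slides do not describe how either step works, so this uses a deliberately simple assumption: an item is assigned to the cluster whose keyword set shares the most tokens with the item title, and the search query is formed from that cluster's keywords. The cluster ids, keyword sets, and function names are hypothetical.

```python
# Hypothetical cluster store: cluster id -> keyword set (keywords from the slides).
CLUSTERS = {
    "c1": {"apple", "ipod", "touch", "4g", "screen", "protector"},
    "c2": {"clarks", "women", "shoe", "pumps", "classics"},
}


def assign_cluster(title, clusters):
    """Pick the cluster with the largest token overlap, or None if no overlap."""
    tokens = set(title.lower().split())
    best = max(clusters, key=lambda cid: len(tokens & clusters[cid]))
    return best if tokens & clusters[best] else None


def form_query(cluster_id, clusters):
    """Form a search query from a cluster's keywords (sorted for determinism)."""
    return " ".join(sorted(clusters[cluster_id]))
```

For the RIR path, the same `form_query` step would be applied to each related cluster of the assigned cluster rather than to the assigned cluster itself, yielding several search queries instead of one.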

Items in the same cluster

Similar Item Recommendations

Experimental Results
A/B Tests comparing against legacy systems
– SIR legacy system
  Completely online
  Naïve approach of using seed item title as a search query
– RIR legacy system
  Chen, Y. and J.F. Canny, Recommending ephemeral items at web scale, ACM SIGIR 2011
  Collaborative Filtering on stable representations of items
– Significant improvements at a 90% confidence level
  SIR resulted in 38.18% higher user engagement (CTR)
  RIR resulted in 10.5% higher CTR
  Statistically significant improvement in site-wide business metrics from both SIR & RIR
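For readers unfamiliar with how such numbers are typically derived, here is a minimal sketch of a relative CTR lift and a two-proportion z-test. The slides do not say which statistical test was used, so the z-test is an assumption, and the click and view counts in the usage note are made up for illustration only; they are not the experiment's data.

```python
import math


def relative_lift(ctr_control, ctr_treatment):
    """Relative improvement of the treatment CTR over the control CTR."""
    return (ctr_treatment - ctr_control) / ctr_control


def two_proportion_z(clicks_a, views_a, clicks_b, views_b):
    """z statistic for the difference between two click-through rates,
    using the pooled proportion for the standard error."""
    p_a = clicks_a / views_a
    p_b = clicks_b / views_b
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    return (p_b - p_a) / std_err


# Two-sided critical value for a 90% confidence level.
Z_CRITICAL_90 = 1.645
```

As a usage example with invented counts, a control arm with 200 clicks on 10,000 views and a treatment arm with 300 clicks on 10,000 views gives a relative lift of 0.5 (50%) and a z statistic well above `Z_CRITICAL_90`.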

Conclusion
Balance between similarity and quality crucial in driving user engagement and conversion
Clusters of similar items in the inventory
– Local clustering in the coverage set of user queries
Offline models built using Map-Reduce
– Huge input datasets including inventory, clickstream and transactional data
Efficient real-time performance system
Currently deployed on ebay.com

Acknowledgments
Current & Past team members
– Kranthi Chalasani
– Santanu Kolay
– Riyaaz Shaik
– Venkat Sundaranatha

WE’RE HIRING
Chu-Cheng Hsieh