Predictive Discrete Latent Factor Models for Large Incomplete Dyadic Data
Deepak Agarwal, Srujana Merugu, Abhishek Agarwal
Y! Research
MMDS Workshop, Stanford University, 6/25/2008

Agenda
– Motivating applications
– Problem definition
– Classical approaches
– Our approach: PDLF (building local models via co-clustering)
– Enhancing PDLF via factorization
– Discussion

Motivating Applications
– Movie (music) recommendations (Netflix, Y! Music): personalized, based on historical ratings
– Product recommendation (Y! Shopping): top products based on browse behavior
– Online advertising: what ads to show on a page?
– Traffic quality of a publisher: what is the conversion rate?

Data
– Dyad (i, j): row i is a user or web page; column j is a movie, music track, product, or ad
– Response y_ij: ratings, click rates, conversion rates
– Covariates (u_i, v_j) for the row and column entities

Problem Definition
Goal: given training data {y_ij} on observed dyads (i, j), predict the response for unobserved dyads.
Challenges:
– Scalability: large dyadic matrix
– Missing data: only a small fraction of dyads is observed
– Noise: low SNR; data are heterogeneous, but there are strong interactions
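Stated formally (a reconstruction of the slide; the symbol Ω for the observed set is my notation, not from the talk):

```latex
\text{Given } \{(y_{ij},\, x_{ij}) : (i,j) \in \Omega\},\quad
\Omega \subset \{1,\dots,m\}\times\{1,\dots,n\},\quad |\Omega| \ll mn,
\qquad \text{learn } \hat{y}_{ij} = f(x_{ij}, i, j)
\ \text{ to predict } y_{ij} \text{ for } (i,j) \notin \Omega.
```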

Classical Approaches
Supervised learning:
– Non-parametric function estimation
– Random effects: to estimate interactions
Unsupervised learning:
– Co-clustering, low-rank factorization, …
Our main contribution:
– Blend supervised & unsupervised in a model-based way, with scalable fitting

Non-parametric Function Estimation
– E.g., trees, neural nets, boosted trees, kernel learning, …
– These capture the entire structure through covariates
– Dyadic data: covariate-only models show lack of fit; better estimates of the interactions are possible by using information on the dyads themselves

Random Effects Model
– A specific term per observed cell
– Smooth the dyad-specific effects using a prior ("shrinkage"), e.g., Gaussian mixture, Dirichlet process, …
– Main goal is hypothesis testing; not suited to prediction: the prediction for a new cell is based only on the estimated prior
Our approach:
– Co-cluster the matrix; fit a local model in each co-cluster
– Co-clustering is done to obtain the best model fit
(The slide annotates the model with "global" and "dyad-specific" labels; see the sketch below.)
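A plausible reconstruction of the annotated equation (the exponential-family form f and the symbols β, δ_ij are assumptions consistent with the rest of the talk):

```latex
y_{ij} \sim f\!\left(y;\ \underbrace{x_{ij}^{\top}\beta}_{\text{global}}
  \;+\; \underbrace{\delta_{ij}}_{\text{dyad-specific}}\right),
\qquad \delta_{ij} \sim \text{prior (e.g., Gaussian mixture, Dirichlet process)}.
```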

Classic Co-clustering
– Exclusively captures interactions → no covariates included!
– Goal: prediction by matrix approximation
– Scalable: iteratively cluster rows & columns → homogeneous blocks (a squared-error version is sketched below)
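For squared-error loss, the matrix-approximation view can be written as follows (a sketch of the simplest block-constant variant, not necessarily the exact objective on the slide):

```latex
\min_{I(\cdot),\,J(\cdot),\,\mu}\ \sum_{(i,j)\in\Omega}
  \left(y_{ij} - \mu_{I(i),\,J(j)}\right)^{2},
\qquad I:\text{rows}\to\{1,\dots,k\},\quad J:\text{cols}\to\{1,\dots,l\}.
```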

[Figure: co-clustering iterations on a data matrix — raw data, after row clustering, after column clustering; rows are smoothed using the column clusters (reducing variance); iterate until convergence.]

Our Generative Model
– Sparse, flexible approach to learn dyad-specific coefficients: borrow strength across rows and columns
– Capture interactions by co-clustering: a local model in each co-cluster
– Convergence is fast; the procedure is scalable
– Completely model based, easy to generalize
– We consider x_ij = 1 in this talk
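The slide's model equations did not survive the transcript. Reconstructing from the corresponding PDLF paper (Agarwal & Merugu, KDD 2007; treat the exact parameterization as an assumption), the model mixes over k row clusters and l column clusters with an exponential-family density f_ψ, a global coefficient vector β, and a local interaction effect δ_IJ per co-cluster:

```latex
p(y_{ij} \mid x_{ij}) \;=\; \sum_{I=1}^{k}\sum_{J=1}^{l}
  \pi_{I}\,\rho_{J}\, f_{\psi}\!\left(y_{ij};\ \beta^{\top}x_{ij} + \delta_{IJ}\right).
```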

Scalable Model Fitting
– EM algorithm
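As a concrete illustration of the alternating fit, here is a minimal hard-assignment EM sketch for the Gaussian case. This is my own illustration under stated assumptions, not the authors' code: the paper's soft E-step and exponential-family M-steps are replaced by hard reassignment and least squares for simplicity.

```python
import numpy as np

def fit_pdlf_gaussian(y, X, rows, cols, n_rows, n_cols,
                      k=5, l=5, n_iters=20, seed=0):
    """Hard-assignment EM sketch for a Gaussian PDLF model:
    y_ij ~ N(x_ij' beta + delta[I(i), J(j)], sigma^2).

    y    : (n_obs,) responses for the observed dyads (rows[t], cols[t])
    X    : (n_obs, p) dyad-level covariates x_ij
    rows : (n_obs,) row index i of each observation
    cols : (n_obs,) column index j of each observation
    """
    rng = np.random.default_rng(seed)
    row_cl = rng.integers(k, size=n_rows)   # I(i): row-cluster assignments
    col_cl = rng.integers(l, size=n_cols)   # J(j): column-cluster assignments
    beta = np.zeros(X.shape[1])
    delta = np.zeros((k, l))

    for _ in range(n_iters):
        # M-step for beta: least squares after removing the co-cluster effects
        resid = y - delta[row_cl[rows], col_cl[cols]]
        beta, *_ = np.linalg.lstsq(X, resid, rcond=None)

        # M-step for delta: block means of residuals after the covariate effect
        resid = y - X @ beta
        for I in range(k):
            for J in range(l):
                in_block = (row_cl[rows] == I) & (col_cl[cols] == J)
                if in_block.any():
                    delta[I, J] = resid[in_block].mean()

        # Hard E-step: reassign each row, then each column, to its best cluster
        for i in range(n_rows):
            obs = rows == i
            if obs.any():
                sse = [((resid[obs] - delta[I, col_cl[cols[obs]]]) ** 2).sum()
                       for I in range(k)]
                row_cl[i] = int(np.argmin(sse))
        for j in range(n_cols):
            obs = cols == j
            if obs.any():
                sse = [((resid[obs] - delta[row_cl[rows[obs]], J]) ** 2).sum()
                       for J in range(l)]
                col_cl[j] = int(np.argmin(sse))

    return beta, delta, row_cl, col_cl
```

Each iteration amounts to a few sequential passes over the observed dyads (roughly O(|Ω|(k + l)) for the reassignments plus one least-squares fit), which is what makes the procedure scalable to large sparse matrices.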

Simulation on MovieLens
– User–movie ratings
– Covariates: user demographics, genres
– Simulated 200 data sets from the estimated co-cluster structure
– Response assumed Gaussian
[Table: true values of β0, …, β4 and σ² against 95% confidence intervals from the simulations; the numeric entries were garbled in the transcript.]

Regression on MovieLens
– Binary response: rating > 3 labeled positive
– 23 covariates
– Methods compared: PDLF, pure co-clustering, logistic regression
[Figure: comparative results not preserved in the transcript.]

Click Count Data
– Goal: model click activity on publisher pages from IP domains
– Dataset: 47,903 IP domains, 585 web sites, click-count observations
– Two covariates: IP location (country–province) and routing type (e.g., AOL POP, anonymizer, mobile gateway); plus row and column effects
– Model: PDLF based on Poisson distributions, with the number of row and column clusters set to 5
– We thank Nicolas Eddy Mayoraz for discussions and data
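A plausible form of the Poisson variant described here (the row/column effects α_i, γ_j and the log link are my reading of the slide's "row-col effects", not verified against the paper):

```latex
y_{ij} \sim \mathrm{Poisson}(\lambda_{ij}),
\qquad \log \lambda_{ij} \;=\; \beta^{\top}x_{ij} + \alpha_i + \gamma_j + \delta_{I(i),\,J(j)}.
```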

Co-cluster Interactions: Plain Co-clustering
[Figure: interaction grid of IP-domain clusters vs. publisher clusters. IP-domain clusters: non-clicking IPs/US; AOL/unknown; edu/European; Japanese; Korean. Publisher clusters: internet-related (e.g., buydomains); shopping search (e.g., netguide); AOL & Yahoo; smaller publishers (e.g., blogs); smaller portals (e.g., MSN, Netzero).]

Co-cluster Interactions: PDLF
[Figure: interaction grid of IP-domain clusters vs. publisher clusters under PDLF. Publisher clusters: tech companies; ISPs; web media (e.g., usatoday); online portals (MSN, Yahoo).]

Smoothing via Factorization
– Co-cluster sizes vary in PDLF; smoothing across the local models works better (see the sketch below)
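One natural reading of this enhancement (my sketch; the exact factorization used in the talk is an assumption) is to replace the free k × l interaction matrix Δ = (δ_IJ) with a low-rank factorization that ties the local models together:

```latex
\delta_{IJ} \;\approx\; u_I^{\top} v_J,
\qquad U \in \mathbb{R}^{k\times r},\ V \in \mathbb{R}^{l\times r},\ r < \min(k, l),
```

so that estimates for small co-clusters borrow strength from the shared factors instead of being fit in isolation.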

Synthetic Example (Moderately Sparse Data)
[Figure: missing values are shown as dark regions; the model recovers finer interactions.]

Synthetic example (highly sparse data)

MovieLens

Estimating Conversion Rates
– Back to the IP × publisher example; now model conversion rates: Prob(click results in a sale)
– Detecting the important interactions helps in traffic-quality estimation

Summary
– Covariate-only models often fail to capture residual dependence in dyadic data
– Model-based co-clustering is an attractive and scalable approach to estimating interactions
– Factorization of the cluster effects smooths the local models and leads to better performance
– The models are widely applicable in many scenarios

Ongoing Work
– Fast co-clustering through DP mixtures (with Richard Hahn, David Dunson): a few sequential scans over the data; initial results extremely promising
– Model-based hierarchical co-clustering (with Inderjit Dhillon): multi-resolution local models; smoothing by borrowing strength across resolutions