Patterns of Influence in a Recommendation Network
Jure Leskovec (CMU), Ajit Singh (CMU), Jon Kleinberg (Cornell)
School of Computer Science, Carnegie Mellon

Spread of information
- Social networks play a fundamental role in the spread of information and influence
- Viral marketing (word of mouth): an idea suddenly gains widespread popularity
- Example: GMail achieved wide popularity while the only way to obtain an account was through referral
- In blogs, a piece of information spreads rapidly before eventually being picked up by mass media

Information cascades
- Cascades are phenomena in which an action or idea becomes widely adopted due to influence by others
- Traditionally, sociologists studied the diffusion of innovation:
  - Hybrid corn (Ryan and Gross, 1943)
  - Prescription drugs (Coleman et al. 1957)

Cascade formation process
- [Figure: a cascade growing over time steps t1 through t6]
- Time: t1 < t2 < … < tn
- Legend:
  - node that received a recommendation and propagated it forward
  - node that received a recommendation but didn't propagate it

Work on information cascades
- Cascades have also been studied to:
  - Select trendsetters for viral marketing (Kempe et al. 2003, Richardson et al. 2002)
  - Find inoculation targets in epidemiology (Newman 2002)
  - Explain trends in blogspace (Adar and Adamic 2005, Gruhl et al. 2004)
- Since it is hard to obtain reliable data on cascades, previous studies focused primarily on large-scale (coarse) analysis

Our work
- We look at the fine-grained patterns of influence in a large-scale, real recommendation network
- Given a directed who-influences-whom graph:
  - Find cascades
  - And examine their topological structure:
    - What kinds of cascades arise frequently in real life? Are they like trees, stars, or something else?
    - What is the distribution of cascade sizes (all same size / exponential tail / heavy-tailed)?

Roadmap
- The recommendation network dataset
- Proposed method:
  - Identifying cascades
  - Enumerating cascades
  - Counting cascades (approximate graph isomorphism)
- Experimental results:
  - Distribution of cascade sizes
  - Frequent cascade subgraphs
- Conclusion

The data – recommendation network
- Senders and followers of recommendations receive discounts on products (figure labels: 10% credit, 10% off)
- Recommendations are made to any number of people at the time of purchase

The data – recommendations
- For each recommendation we have:
  - sender ID
  - recipient ID
  - recommendation time
  - response (buy / no buy)
  - purchase time
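As a rough illustration of the record listed above, the fields could be held in a small Python data class. The field names are hypothetical, and `product_id` is an added assumption (the slide does not list it, but later steps group recommendations by product).

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Recommendation:
    """One recommendation record; names are illustrative, not the retailer's schema."""
    sender_id: int
    recipient_id: int
    product_id: int                           # assumption: not listed on the slide
    rec_time: datetime                        # when the recommendation was sent
    bought: bool                              # response: buy / no buy
    purchase_time: Optional[datetime] = None  # present only when bought is True


# Made-up example record.
rec = Recommendation(1, 2, 42, datetime(2002, 3, 1), True, datetime(2002, 3, 4))
```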

The data – description
- A large online retailer (June 2001 to May 2003)
- Over a gigabyte in size
- 15,646,121 recommendations
- 3,943,084 distinct customers
- 548,523 products recommended
- 99% of them belong to 4 main product groups:
  - books
  - DVDs
  - music CDs
  - VHS

The data – statistics
- Networks are very sparsely connected (low average degree)
- 9% of DVD purchases are due to recommendations
- Book recommendations are influential

        products  customers  recommendations      edges  purchases  responses
Book     103,161  2,863,977        5,741,611  2,097,809  2,859,096     83,113
DVD       19,829    805,285        8,180,393    962,341    837,300     75,421
Music    393,598    794,148        1,443,847    585,738    721,673     10,576
Video     26,131    239,583          280,270    160,683    165,109      1,376
Full     542,719  3,943,084       15,646,121  3,153,676  4,574,178    170,486

Product recommendation network
- The majority of recommendations cause neither purchases nor propagation
- Notice many star-like patterns
- Many disconnected components

Identifying cascades
- Given a set of recommendations, find cascades
- We use the following approach:
  - Create a separate graph for each product
  - Delete late recommendations:
    - Delete recommendations that happened after the first purchase of the product
    - We get a time-increasing graph
  - Delete no-purchase nodes:
    - We find many star-like patterns with no propagation of influence
    - Delete nodes that did not purchase the product
  - Now connected components correspond to maximal cascades
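The steps above can be sketched in Python for a single product. This is a minimal sketch, not the authors' implementation. It assumes each record is a tuple `(sender, recipient, rec_time, purchase_time_or_None)`, that "late" means a recommendation arriving after the recipient's own first purchase, and that every sender counts as a buyer because recommendations are made at the time of purchase. The function name and record layout are hypothetical.

```python
from collections import defaultdict


def maximal_cascades(recs):
    """Return maximal cascades (sets of customer IDs) for ONE product.

    recs: iterable of (sender, recipient, rec_time, recipient_purchase_time or None).
    """
    recs = list(recs)
    # Assumption: senders recommend when they purchase, so every sender is a buyer.
    buyers = {s for s, _, _, _ in recs}
    first_purchase = {}
    for _, r, _, bought_at in recs:
        if bought_at is not None:
            buyers.add(r)
            first_purchase[r] = min(bought_at, first_purchase.get(r, bought_at))

    # Keep an edge only if the recipient bought, and the recommendation
    # did not arrive after that recipient's first purchase (not "late").
    neighbors = defaultdict(set)
    for s, r, t, _ in recs:
        if r in first_purchase and t <= first_purchase[r]:
            neighbors[s].add(r)
            neighbors[r].add(s)

    # Connected components (ignoring edge direction) are the maximal cascades.
    cascades, seen = [], set()
    for start in buyers:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(neighbors[node] - component)
        seen |= component
        cascades.append(component)
    return cascades


# Made-up toy data: c never buys, and e's recommendation reaches d too late.
toy = [("a", "b", 1, 2), ("a", "c", 1, None), ("b", "d", 2, 5), ("e", "d", 7, 5)]
cascades = maximal_cascades(toy)
```

On the toy data this leaves one cascade containing a, b and d, plus e on its own, since e's only recommendation is discarded as late.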

Cascade enumeration
- Maximal cascades do not reveal the building blocks (local structures) of cascades
- Given a maximal cascade, we want to enumerate all local cascades:
  - For every node we explore the cascade in its neighborhood up to 1, 2, 3, … steps away
  - This way we capture the local structure of the cascade around the node
- [Figure: source node, nodes 1 step away, nodes 2 steps away]
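One way to read the enumeration step is as a bounded breadth-first expansion around each node. The sketch below assumes the cascade is stored as a dictionary `succ` mapping every node to the set of nodes it recommended to, and that exploration follows edges in their forward direction. Both are assumptions, since the slide does not fix a representation or a direction.

```python
def local_cascades(succ, max_steps):
    """Yield (source, steps, edges) for each node and each radius that adds nodes.

    succ: dict mapping every node to the set of its successors.
    edges: frozenset of (u, v) pairs induced on nodes within `steps` hops of source.
    """
    for source in succ:
        reached = {source}
        frontier = {source}
        for steps in range(1, max_steps + 1):
            # Nodes first reached exactly `steps` hops from the source.
            frontier = {v for u in frontier for v in succ.get(u, ())} - reached
            if not frontier:
                break  # the neighborhood stopped growing
            reached |= frontier
            edges = frozenset(
                (u, v) for u in reached for v in succ.get(u, ()) if v in reached
            )
            yield source, steps, edges


# Made-up toy cascade: a recommends to b and c, b recommends to d.
toy_succ = {"a": {"b", "c"}, "b": {"d"}, "c": set(), "d": set()}
found = list(local_cascades(toy_succ, max_steps=3))
```

Each yielded edge set is one local cascade, ready to be counted up to isomorphism.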

Counting cascades (graph isomorphism)
- To count cascades we need to determine whether a new cascade is isomorphic to one already seen:
  - No polynomial-time graph isomorphism algorithm is known, so we resort to an approximate solution
  - Two graphs are isomorphic if there exists a one-to-one node mapping under which mapped nodes have the same neighbors

Graph isomorphism
- Do not compare the graphs directly; instead:
  - For each graph we create a signature
  - A good signature is one where isomorphic graphs have the same signature, but few non-isomorphic graphs share the same signature
  - Compare the graph signatures

Creating a signature
- We propose a multilevel approach
- Complexity (and accuracy) depends on the size of the graph
- Different levels of the signature, from simple (fast/inaccurate) to complex (slow/accurate):
  - Number of nodes, number of edges
  - Sorted in- and out-degree sequence
  - Singular values of the graph adjacency matrix
  - For small graphs (n < 9) we perform an exact isomorphism test

Comparing signatures
- First compare simple signatures
- Compare the graphs with the same simple signature using more and more complicated (expensive/accurate) signatures
- At the end (for small graphs) we perform exact isomorphism resolution
- Since we are interested in building blocks of cascades, which are generally small, the precision for small graphs is more important
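The multilevel comparison can be sketched as follows. This is a simplified illustration rather than the method as implemented by the authors: it covers the two cheapest signature levels (node and edge counts, sorted degree sequences) and a brute-force exact test for small graphs, and it leaves out the singular-value level to stay within the Python standard library. All function names are hypothetical.

```python
from itertools import permutations


def cheap_signature(nodes, edges):
    """Levels 1-2: node count, edge count, sorted in- and out-degree sequences."""
    indeg = {v: 0 for v in nodes}
    outdeg = {v: 0 for v in nodes}
    for u, v in edges:
        outdeg[u] += 1
        indeg[v] += 1
    return (len(indeg), len(edges),
            tuple(sorted(indeg.values())), tuple(sorted(outdeg.values())))


def exactly_isomorphic(nodes_a, edges_a, nodes_b, edges_b):
    """Try every node mapping; only practical for small graphs (slides: n < 9)."""
    nodes_a, nodes_b = list(nodes_a), list(nodes_b)
    if len(nodes_a) != len(nodes_b):
        return False
    target = set(edges_b)
    for perm in permutations(nodes_b):
        mapping = dict(zip(nodes_a, perm))
        if {(mapping[u], mapping[v]) for u, v in edges_a} == target:
            return True
    return False


def same_cascade(nodes_a, edges_a, nodes_b, edges_b, exact_limit=9):
    """Cheap signatures first, then an exact test if the graphs are small."""
    if cheap_signature(nodes_a, edges_a) != cheap_signature(nodes_b, edges_b):
        return False  # different signatures: certainly not isomorphic
    if len(nodes_a) < exact_limit:
        return exactly_isomorphic(nodes_a, edges_a, nodes_b, edges_b)
    return True  # approximate: equal signatures are treated as a match


# Made-up toy graphs: two chains and one star, each with three nodes.
chain1 = (["a", "b", "c"], [("a", "b"), ("b", "c")])
chain2 = (["x", "y", "z"], [("y", "z"), ("x", "y")])
star = (["p", "q", "r"], [("p", "q"), ("p", "r")])
```

Here `same_cascade(*chain1, *chain2)` is true, while the chain and the star are already separated by their out-degree sequences, so the exact test is never reached for that pair.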

Comparing signatures – Example
- Compare simple signature (number of nodes/edges)
- Compare simple signature (degree sequence)
- Compare simple signature (singular values)

Counting subgraphs – related work
- Work on frequent subgraph mining:
  - Apriori-based algorithm (Inokuchi et al. 2000)
  - gSpan (Yan and Han, 2002)
  - Kuramochi and Karypis 2004; Pei, Jiang and Zhang 2005; and many more
- This work mainly focuses on richly labeled undirected graphs (e.g. chemical compounds)
- We are interested in enumerating subgraphs based only on their structure
- We have no labels on nodes and edges
  - So heuristics that prune the search space using node and edge labels cannot be applied

Measuring maximal cascade sizes
- Count how many people are in a single cascade
- We observe a heavy-tailed distribution, which cannot be explained by a simple branching process
- [Plot: cascade size distribution for books; annotations: steep drop-off, very few large cascades]
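To inspect a size distribution like this, one common option is to tabulate the complementary cumulative distribution, i.e. the fraction of cascades with at least a given size. The helper below is a hypothetical sketch and the input numbers are made up, not taken from the dataset.

```python
from collections import Counter


def size_ccdf(sizes):
    """Return [(size, fraction of cascades with at least that size)], ascending."""
    counts = Counter(sizes)
    total = len(sizes)
    remaining = total
    ccdf = []
    for size in sorted(counts):
        ccdf.append((size, remaining / total))
        remaining -= counts[size]  # cascades of this size drop out of the tail
    return ccdf


# Made-up cascade sizes: mostly tiny cascades and a single large one.
toy_sizes = [1, 1, 1, 1, 2, 2, 3, 10]
ccdf = size_ccdf(toy_sizes)
```

Plotted on log-log axes, a power-law tail appears roughly as a straight line, whereas an exponential tail bends downward sharply.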

Cascade sizes for DVDs
- DVD cascades can grow large
- Possibly a product of websites where people sign up to exchange recommendations
- [Plot: cascade size distribution for DVDs; annotations: shallow drop-off (fat tail), a number of large cascades]

Music CD and VHS cascades
- Music and VHS cascades don't grow large
- [Plots: cascade size distributions for music and VHS]

Frequent cascade subgraphs (1)
- General observations:
  - DVDs have the richest cascades (most recommendations, most densely linked)
  - Books have small cascades
  - Music is 3 times larger than video but does not have much variety in cascades

        cascades   different
Book    122,
DVD     289,055    87,614
Music   13,
Video   1,

(cascades: number of all "words"; different: vocabulary size)

Frequent cascade subgraphs (2)
- [subgraph] is the most common cascade subgraph
  - It accounts for ~75% of cascades in books, CDs and VHS, but only 12% of DVD cascades
- [subgraph] is 6 times (1.2 times for DVDs) more frequent than [subgraph]
- For DVDs, [subgraph] is more frequent than [subgraph]
- Chains ([subgraph]) are more frequent than [subgraph]
- [subgraph] is more frequent than a collision ([subgraph]) (but a collision has fewer edges)
- Late split ([subgraph]) is more frequent than [subgraph]

Typical classes of cascades
- No propagation
- Common friends
- Nodes having same friends
- A complicated cascade

Conclusion (1)
- Cascades are a form of collective behavior
- We developed a scalable algorithm for identifying and counting cascades (approximate graph isomorphism)
- We illustrate the existence of cascades and measure their frequencies in a large real-world dataset

Conclusion (2)
- From our experiments we found:
  - Most cascades are small, but large bursts can occur
  - Cascade sizes follow a heavy-tailed distribution
  - The frequency of different cascade subgraphs depends on the product type
  - Cascade frequencies do not simply decrease monotonically for denser subgraphs
    - But reflect more subtle features of the domain in which the recommendations are operating

Thank you! Questions?