Discovering Influential Nodes From Social Trust Network


Discovering Influential Nodes From Social Trust Network
Thesis Proposal by Sabbir Ahmed (ahmedp@uwindsor.ca)
Thesis Committee:
  Chair: Dr. R. Frost [Computer Science]
  Advisor: Dr. C. I. Ezeife [Computer Science]
  Internal Reader: Dr. A. Ngom [Computer Science]
  External Reader: Dr. E. H. Kim [Department of Physics]
University of Windsor, School of Computer Science

Outline
1. Introduction
   Background (SNA, Data Mining)
   Viral Marketing
   Influence Maximization
   Challenges and Thesis Problem
2. Related Work
3. Proposed solution framework
   Thesis Contribution
4. Thesis Plan
5. References

1.1 Data Mining
Data mining is a way of efficiently discovering interesting rules from large databases.
Uses include: increasing revenue, cutting time and costs, decision making, medical diagnosis.
Techniques include:
  Classification (e.g. classify 'Cancer' or 'Not Cancer')
  Association rules (e.g. discover the rule Bread => Milk)
  Clustering (e.g. clusters 'Risky' and 'Not Risky')

1.1 What is a "Social Network"?
A social network is a social structure made up of individuals, organizations, or other entities called "nodes", which are tied (connected) by one or more specific types of interdependency, such as friendship, kinship, common interest, financial exchange, or dislike.

1.1 Modeling Social Network
The structure of a social network is most commonly modeled as a directed or undirected graph G = (V, E), where V is the set of all nodes in the network and E is the set of directed or undirected edges between nodes.
E.g. V = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11}, E = {(1,5), (1,3), (2,3), (2,4), (4,5), (5,7), ..., (10,11)}
[Figure: example graph on nodes 1 to 11]
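
As a rough illustration of the G = (V, E) representation, the sketch below stores a graph as an adjacency map. The function name `build_adjacency` is hypothetical, and only the edges spelled out on the slide are used; the elided edges are left out rather than guessed.

```python
from collections import defaultdict


def build_adjacency(edges, directed=False):
    """Return {node: set_of_neighbours} for a list of (u, v) edges."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v]  # touch v so nodes without outgoing edges still get a key
        if not directed:
            adj[v].add(u)
    return dict(adj)


# Only the edges listed explicitly on the slide (the "..." part is omitted).
edges = [(1, 5), (1, 3), (2, 3), (2, 4), (4, 5), (5, 7), (10, 11)]
undirected = build_adjacency(edges)
directed = build_adjacency(edges, directed=True)
```

In the undirected version each edge is stored in both directions; in the directed version `directed[u]` holds only the nodes that u points to.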

1.1 Types of Social Network Graph
Directed graph
Undirected graph
Signed graph (edges labelled + or -)
Weighted graph

1.1 Social Network Mining

1.2 Viral Marketing
Also known as targeted advertising.
Initiates a chain reaction through the word-of-mouth effect, also known as a diffusion process.
Low investment, maximum gain.

1.2 Social Influence
Social influence: a force that person A (the influencer) exerts on person B to introduce a change in the behavior and/or opinion of B.

1.2 Problem Setting
Given:
  A limited budget B for initial advertising (e.g. giving away free samples of a product)
  Estimates of the influence between individuals
Goal:
  Trigger a large cascade of influence (e.g. further adoptions of a product)
Question:
  Which set of individuals should the budget B be spent on?
  If we can convince only a subset of individuals to adopt a new product or innovation, how should we choose the few key individuals to seed this process?
  Which blogs should one read to be most up to date?

1.2 Terms Used
Active: a node (user) is active if he/she adopts a product or performs an action.
Diffusion Model [M]: a model that describes the diffusion process.
Influence Probability [pu,v]: the probability of user v getting activated given that u activates.
Seed Set: the initially activated set.
Influence Spread [σM(A)]: the number of users activated by seed set A under diffusion model M.
Marginal Gain: the additional number of users activated as a result of adding a node v to A, i.e. σM(A+v) - σM(A).

1.2 Diffusion Models
A diffusion model attempts to describe the entire diffusion process and determines which nodes will be activated due to the influence spreading through the social network.
Examples: the Linear Threshold (LT) model and the Independent Cascade (IC) model.
Background: mathematical sociology.
General Threshold (GT) model: a generalization of the IC and LT models.

1.2 GT Model
At any time, a node u is either active or inactive.
The diffusion process starts with an initial set of active nodes.
Consider an inactive user u and the set S of its active neighbors. In the GT model, the probability that u activates given the active set S is the joint influence probability pu(S) [equation not preserved in this transcript].
pu(S) increases as more of u's neighbors become active.
If pu(S) ≥ θu, we conclude that u activates, where θu is the activation threshold of user u.
θu is chosen uniformly at random from the interval [0,1].
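
The activation test can be sketched in a few lines. The transcript does not preserve the formula for pu(S), so this sketch assumes the joint form used by Goyal et al. 2010, pu(S) = 1 - ∏(1 - pv,u) over the active neighbors v in S; the function names are hypothetical.

```python
def joint_probability(probs):
    """Assumed joint influence: 1 - product of (1 - p_vu) over active neighbours.

    `probs` is an iterable of the individual probabilities p_vu, one per
    active neighbour v of u. This form is an assumption, not taken from
    the slides.
    """
    stay_inactive = 1.0
    for p in probs:
        stay_inactive *= 1.0 - p
    return 1.0 - stay_inactive


def gt_activates(active_neighbour_probs, theta):
    """u activates when p_u(S) reaches its threshold theta_u."""
    return joint_probability(active_neighbour_probs) >= theta
```

Under this assumed form, adding one more active neighbor can never lower pu(S), which matches the slide's remark that the probability increases as more neighbors become active.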

1.2 General Threshold Model - Example
[Figure: step-by-step example of threshold-based activation; legend: inactive node, active node, threshold, joint influence probability. Source: David Kempe's slides]

1.2 Influence Spread Function
Given a social network graph G = (V, E), a diffusion model M, and an initial set of active vertices A ⊆ V, the influence spread of A, denoted σM(A), is the expected number of vertices that become active under the influence of the vertices in A once the diffusion process is over.
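
Because σM(A) is an expectation, it is usually estimated by simulation. The following is a minimal Monte Carlo sketch under a threshold-style model, assuming the joint probability 1 - ∏(1 - pv,u) (not stated in the slides) and uniformly random thresholds. The names `simulate_once` and `estimate_spread`, and the `in_probs` layout, are hypothetical.

```python
import random


def simulate_once(in_probs, seeds, rng):
    """Run one diffusion and return the set of active nodes.

    in_probs[u] maps each in-neighbour v of u to p_vu. Every node must
    appear as a key, even if it has no in-neighbours.
    """
    theta = {u: rng.random() for u in in_probs}  # one threshold per node
    active = set(seeds)
    changed = True
    while changed:  # repeat until no further node activates
        changed = False
        for u in in_probs:
            if u in active:
                continue
            stay_inactive = 1.0
            for v, p in in_probs[u].items():
                if v in active:
                    stay_inactive *= 1.0 - p
            joint = 1.0 - stay_inactive
            if joint > 0.0 and joint >= theta[u]:
                active.add(u)
                changed = True
    return active


def estimate_spread(in_probs, seeds, runs=1000, seed=0):
    """Average number of active nodes over `runs` simulations."""
    rng = random.Random(seed)
    total = sum(len(simulate_once(in_probs, seeds, rng)) for _ in range(runs))
    return total / runs
```

The estimate converges slowly (the error shrinks with the square root of `runs`), which is one reason scalability appears among the challenges listed later in the deck.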

1.2 Influence Maximization
Problem: given a social network G with influence probabilities and a budget k, find a k-node seed set A that maximizes σM(A).

1.2 Other Application Areas
Spread of viruses in computer networks
Contamination prevention in water distribution networks
Social media recommendation
Expert finding
Etc.

1.2 Solving the Problem
Kempe et al. showed that solving the problem exactly is NP-hard.
They also proved that the influence spread function is submodular and monotone.
[Figure: adding a new node S' to the small seed set A = {S1, S2} helps a lot; adding S' to the larger seed set A = {S1, S2, S3, S4} helps very little]

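1.2 Monotone and Submodularity
Submodular: f is submodular if, for all seed sets A ⊆ B and every node v not in B, $f(A \cup \{v\}) - f(A) \ge f(B \cup \{v\}) - f(B)$. The left side is the benefit of adding a node to a small seed set; the right side is the benefit of adding it to a large seed set.
Monotone: a submodular function f is said to be monotone if $f(A \cup \{v\}) \ge f(A)$ for every set A and node v, i.e. adding an element to a set cannot cause f to decrease.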
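
Both properties can be checked by brute force on a tiny ground set. The sketch below does so for a set-coverage function, a standard example of a monotone submodular function; the data in `reach` and all function names are made up for illustration.

```python
from itertools import combinations


def subsets(items):
    """Yield every subset of `items` as a frozenset."""
    items = list(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def is_monotone(f, ground):
    """f(A + v) >= f(A) for every subset A and element v."""
    return all(f(A | {v}) >= f(A) for A in subsets(ground) for v in ground)


def is_submodular(f, ground):
    """Diminishing returns: the gain of v on A is at least its gain on B >= A."""
    all_sets = list(subsets(ground))
    return all(
        f(A | {v}) - f(A) >= f(B | {v}) - f(B)
        for A in all_sets
        for B in all_sets
        if A <= B
        for v in ground
        if v not in B
    )


# Hypothetical data: the set of users each seed can reach.
reach = {"s1": {1, 2, 3}, "s2": {3, 4}, "s3": {4, 5, 6}}


def coverage(seed_set):
    """Number of distinct users reached by the seeds in `seed_set`."""
    return len(set().union(*(reach[s] for s in seed_set)))
```

The check is exponential in the size of the ground set, so it is only a sanity test for toy instances, not a proof technique.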

1.2 Greedy Algorithm
A monotone submodular function can be maximized under a size-k budget with a (1 - 1/e), or about 63%, approximation guarantee using the greedy algorithm (Nemhauser et al. 1978).
That is, if A* is an optimal solution and A is the set returned by the greedy algorithm, then $\sigma_M(A) \ge (1 - 1/e)\,\sigma_M(A^*)$.

1.2 Greedy Algorithm
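
The slide's pseudocode is not preserved in this transcript. A minimal sketch of the standard greedy hill-climbing scheme is given below: at each of k rounds it adds the node with the largest marginal gain. The `spread` argument stands for any estimate of σM; here a toy coverage function with made-up data is used as a stand-in.

```python
def greedy_seed_selection(nodes, spread, k):
    """Pick up to k seeds, each time adding the node with the best marginal gain.

    `spread` is any function mapping a set of nodes to a number
    (for example a Monte Carlo estimate of the influence spread).
    """
    seeds = []
    for _ in range(k):
        current = set(seeds)
        base = spread(current)
        best_node, best_gain = None, float("-inf")
        for v in nodes:
            if v in current:
                continue
            gain = spread(current | {v}) - base  # marginal gain of v
            if gain > best_gain:
                best_node, best_gain = v, gain
        if best_node is None:  # fewer than k nodes available
            break
        seeds.append(best_node)
    return seeds


# Toy stand-in for the spread function: users reached by each candidate seed.
reach = {"a": {1, 2, 3, 4}, "b": {3, 4, 5}, "c": {6}, "d": {1, 2}}


def coverage(seed_set):
    return len(set().union(*(reach[s] for s in seed_set)))


chosen = greedy_seed_selection(list(reach), coverage, k=2)
```

Each round evaluates `spread` once per remaining node, so the cost is about k·|V| spread evaluations; the lazy forward optimization of Leskovec et al. 2007 reduces this by exploiting submodularity.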

1.2 Issues and Challenges
Scalability
Dynamic social graph
Privacy
Considering negative influence in IM: requires a new diffusion model
Learning influence probability: +ve and -ve influence probability
Thesis focus: negative influence in IM and learning +ve and -ve influence probabilities

1.2 Motivation
All previous works consider only positive influence among users.
In real life, a user can also have some degree of negative influence on another user.
If I do not trust someone, I probably will not be influenced by him; in fact, I may be negatively influenced.
Viral marketing differentiates itself because it is based on trust within an individual's close social circle of family, friends, and coworkers.

1.2 Thesis Problem
Given a trust network graph G(V, E) in which every edge (u,v) of E is directed and labelled either positive (trust) or negative (distrust):
We need a diffusion model that considers both +ve and -ve influence.
How can we learn both positive and negative influence from patterns of actions?
How can we extract influential nodes under this new model?

2 Related Work
Domingos & Richardson 2001: finding influential users to maximize the expected lift in profit. No concrete problem formulation. Too many assumptions.
Kempe et al. 2003: defined IM as a discrete optimization problem. Proved it to be NP-hard. Showed that influence spread is monotone and submodular.

2 Related Work
Leskovec et al. 2007: lazy forward optimization. 700 times faster than Greedy.
Chen et al. 2009, 2010: scalability of the greedy algorithm. Proposed heuristic algorithms.
Goyal et al. 2010: learning influence probabilities.
Goyal et al. 2011: Credit Distribution model.

3. Proposed Solution
Recall the General Threshold model: the probability of user u activating given the active set S is pu(S) [equation not preserved in this transcript]. If pu(S) ≥ θu, we conclude that u activates.

3. Trust-General Threshold Model
S+ is the set of active users that node u trusts.
S- is the set of active users that node u distrusts.
pu(S+) is the +ve joint influence probability and pu(S-) is the -ve joint influence probability.
In the T-GT model: if pu(S+) > pu(S-), then u activates.
Note: no threshold θu is needed.
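
The T-GT rule compares two joint probabilities instead of testing one against a threshold. The sketch below assumes the same joint form as in the GT sketch, 1 - ∏(1 - p), for both pu(S+) and pu(S-); that form and the function names are assumptions, since the slides' equations are not preserved.

```python
def joint_probability(probs):
    """Assumed joint influence: 1 - product of (1 - p) over the given neighbours."""
    stay = 1.0
    for p in probs:
        stay *= 1.0 - p
    return 1.0 - stay


def tgt_activates(trusted_probs, distrusted_probs):
    """T-GT rule: u activates only if p_u(S+) is strictly greater than p_u(S-).

    trusted_probs:    influence probabilities of active users that u trusts
    distrusted_probs: influence probabilities of active users that u distrusts
    """
    return joint_probability(trusted_probs) > joint_probability(distrusted_probs)
```

For instance, with trusted probabilities 0.3 and 0.4 the positive side is 1 - 0.7 × 0.6 = 0.58, which beats a single distrusted neighbor at 0.5, so u would activate under these assumptions.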

3. Example
Since pu(S+) < pu(S-), node C will not activate under the T-GT model.

3. Trust Matrix
A matrix denoted TM(u,v).
TM(u,v) is '+ve' if u trusts v and '-ve' if u distrusts v.
Users in a trust network often tend not to declare distrust, so we need a way to estimate the unknown '?' entries.

3. Predict Edge Sign: An ML Formulation
Machine learning formulation by Leskovec et al.: predict the sign of edge (u,v).
Class label: +1 for a positive edge, -1 for a negative edge.
Learning method: logistic regression.

3. Frequent Action Pattern
Positive Frequent Action Pattern Av.u: the number of actions performed by a user u after the same action was performed by a user v whom u trusts.
Negative Frequent Action Pattern A'v.u: the number of actions not performed by a user u after the same action was performed by a user v whom u distrusts.

3. Computing Influence Probabilities
Positive influence probability: [equation not preserved in this transcript]
Negative influence probability: [equation not preserved in this transcript]
where Av is the number of actions performed by v.
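
Since the slide's equations are lost, the following is only one plausible reading: each probability is a pattern count divided by Av, in the style of the frequency-based estimates of Goyal et al. 2010. The ratio form and the function name are assumptions, not the author's confirmed formulas.

```python
def influence_probability(pattern_count, actions_by_v):
    """Assumed estimate: pattern count (A_v.u or A'_v.u) divided by A_v.

    Returns 0.0 when v has performed no actions, to avoid dividing by zero.
    """
    if actions_by_v == 0:
        return 0.0
    return pattern_count / actions_by_v


# Hypothetical counts: v performed 4 actions; u followed on 3 and skipped 1.
p_positive = influence_probability(3, 4)
p_negative = influence_probability(1, 4)
```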

3. Action Sequence Table

3. Action Sequence
The action sequence of an action a, Seq(a), is the sequence of users who performed a, ordered by time. For example, Seq(a) = {u1, u2, u3, u4}.
The sub-sequence of a user u for action a, denoted Seq(a,u), is the sequence of users who performed a before u. For example, Seq(a,u3) = {u1, u2}.
Also, Seq(a) - Seq(a,u3) = {u4} (u3 itself excluded).
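
With Seq(a) stored as a time-ordered list, both operations reduce to list slicing. The helper names below are hypothetical; the example values are the ones from the slide.

```python
def sub_sequence(seq, user):
    """Seq(a, u): users who performed the action before `user`."""
    return seq[: seq.index(user)]


def later_users(seq, user):
    """Users who performed the action after `user` (user itself excluded)."""
    return seq[seq.index(user) + 1 :]


seq_a = ["u1", "u2", "u3", "u4"]
before_u3 = sub_sequence(seq_a, "u3")
after_u3 = later_users(seq_a, "u3")
```

Both helpers raise `ValueError` if the user never performed the action, which a caller would need to handle.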

3. Action Pattern Generator
The complexity of the APG algorithm is $O(A \cdot 2|V|^2)$ in the worst case, where A is the number of actions and |V| is the total number of nodes.

Running Example
[Tables: Action Sequence Table and Trust Matrix]
The algorithm starts with action a from the action sequence table, whose sequence is {u1, u2, u3, u4}.
According to TM, u5 (who did not perform a) does not trust u1, u3, and u4, so the algorithm adds 1 to A'u1.u5, A'u3.u5, and A'u4.u5.

Running Example (Cont.)
[Tables: Action Sequence Table and Trust Matrix]
According to TM, u3 (who did perform a) trusts u1 and u2, so the algorithm adds 1 to Au1.u3 and Au2.u3.
The APG algorithm continues the same process for all actions and returns Au.v, A'u.v, and Av for all u and v.
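
The counting procedure walked through in the running example can be sketched as follows. This is a reconstruction from the example only, not the author's APG listing: the data layout (a dict of time-ordered sequences, a dict of trust signs keyed by (u, v)) and all names are assumptions, and unknown trust entries are simply skipped.

```python
from collections import Counter


def action_pattern_generator(action_sequences, trust, users):
    """Count positive and negative action patterns.

    action_sequences: {action: [users in the order they performed it]}
    trust:            {(u, v): +1 if u trusts v, -1 if u distrusts v}
    Returns (pos, neg, total) where pos[(v, u)] ~ A_v.u,
    neg[(v, u)] ~ A'_v.u and total[v] ~ A_v.
    """
    pos, neg, total = Counter(), Counter(), Counter()
    for seq in action_sequences.values():
        performed = set(seq)
        for i, u in enumerate(seq):
            total[u] += 1
            for v in seq[:i]:  # users who acted before u
                if trust.get((u, v)) == 1:
                    pos[(v, u)] += 1
        for u in users:
            if u in performed:
                continue
            for v in seq:  # u did not follow any of these users
                if trust.get((u, v)) == -1:
                    neg[(v, u)] += 1
    return pos, neg, total


# Data taken from the running example; the (u5, u2) entry is hypothetical.
sequences = {"a": ["u1", "u2", "u3", "u4"]}
trust = {
    ("u5", "u1"): -1, ("u5", "u3"): -1, ("u5", "u4"): -1, ("u5", "u2"): 1,
    ("u3", "u1"): 1, ("u3", "u2"): 1,
}
pos, neg, total = action_pattern_generator(
    sequences, trust, ["u1", "u2", "u3", "u4", "u5"]
)
```

The two nested loops over users for each action give a cost on the order of the number of actions times |V| squared, in line with the worst-case bound quoted for APG.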

Running Example (Cont.)

Running Example (Cont.)
Positive influence probability of user u2 on user u5: [value not preserved in this transcript]
Similarly, negative influence probability of user u2 on user u4: [value not preserved in this transcript]

Influence Maximization under T-GT Model
Unlike in previous works, influence spread is not monotone: adding a node (user) may cause the influence spread to decrease.
Let S be the initially activated seed set and ∂(S) the set of nodes successfully activated by S, so that the influence spread σ(S) is the number of nodes in ∂(S), i.e. |∂(S)|.
Let a node w have a negative influence (due to distrust) on another node u in ∂(S).
Under the T-GT model, adding w to S decreases the probability of u getting activated.
This causes the influence spread of S+w, i.e. σ(S+w), to decrease, as |∂(S)| - 1 < |∂(S)|.

Influence Maximization under T-GT Model
Greedy and Lazy Forward Optimization are not applicable to IM under the T-GT model.
However, we claim that although the influence spread function in the T-GT model is not monotone, it is still submodular. [A formal proof still needs to be provided.]
So IM under T-GT is the optimization of a non-monotone submodular function.
Based on this, we can use the deterministic local search algorithm of Feige et al. [2007], which provides a 1/3 approximation guarantee.
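
A minimal sketch of a local search in the style of Feige et al. is shown below: start from the best single element, add or remove elements while the value improves by a factor of (1 + ε/n²), then return the better of the final set and its complement. It is demonstrated on a graph cut function, a standard non-monotone submodular function, rather than on the T-GT spread (whose submodularity is still only claimed). The names and the toy graph are hypothetical.

```python
def local_search(ground, f, eps=0.1):
    """Local search for a non-negative, non-monotone submodular function f."""
    ground = list(ground)
    n = len(ground)
    if n == 0:
        return set()
    factor = 1.0 + eps / (n * n)
    current = {max(ground, key=lambda v: f({v}))}  # best singleton
    improved = True
    while improved:
        improved = False
        for v in ground:  # try to add an element
            if v not in current and f(current | {v}) > factor * f(current):
                current = current | {v}
                improved = True
                break
        if improved:
            continue
        for v in ground:  # otherwise try to remove one
            if v in current and f(current - {v}) > factor * f(current):
                current = current - {v}
                improved = True
                break
    complement = set(ground) - current
    return current if f(current) >= f(complement) else complement


# Toy objective: number of edges of the path a-b-c-d that are cut by S.
path_edges = [("a", "b"), ("b", "c"), ("c", "d")]


def cut_value(subset):
    return sum(1 for u, v in path_edges if (u in subset) != (v in subset))


best = local_search(["a", "b", "c", "d"], cut_value)
```

The cut function shows why greedy-style reasoning fails here: the full node set has value 0, so adding elements can reduce the objective, just as adding a distrusted seed can under T-GT.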

Solution Framework

Thesis Contribution
First to consider negative influence and trust networks for IM.
Proposed a new diffusion model, T-GT, to incorporate -ve influence.
Proposed a pattern-based approach to compute +ve and -ve influence probabilities.
Showed that standard IM algorithms are not applicable to the T-GT model.

Future Work
Considering dynamic networks.
The approach needs to scale to very large social networks.
A more sophisticated way to assign the threshold θ is needed.
Action-based influence.

4. Proposed Timeline

5. References
Chen, W., Wang, C., and Wang, Y. 2010. Scalable influence maximization for prevalent viral marketing in large-scale social networks. In Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining. KDD '10. ACM, New York, NY, USA, 1029–1038.
Chen, W., Wang, Y., and Yang, S. 2009. Efficient influence maximization in social networks. In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining. KDD '09. ACM, New York, NY, USA, 199–208.
Domingos, P. and Richardson, M. 2001. Mining the network value of customers. In Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining. KDD '01. ACM, New York, NY, USA, 57–66.
Feige, U., Mirrokni, V. S., and Vondrak, J. 2007. Maximizing non-monotone submodular functions. In 48th Annual IEEE Symposium on Foundations of Computer Science. FOCS '07. 461–471.
Goyal, A., Bonchi, F., and Lakshmanan, L. V. 2010. Learning influence probabilities in social networks. In Proceedings of the third ACM international conference on Web search and data mining. WSDM '10. ACM, New York, NY, USA, 241–250.
Goyal, A., Bonchi, F., and Lakshmanan, L. V. S. 2011. A data-based approach to social influence maximization. Proc. VLDB Endow. 5, 73–84.
Kempe, D., Kleinberg, J., and Tardos, E. 2003. Maximizing the spread of influence through a social network. In Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining. KDD '03. ACM, New York, NY, USA, 137–146.
Leskovec, J., Huttenlocher, D., and Kleinberg, J. 2010a. Predicting positive and negative links in online social networks. In Proceedings of the 19th international conference on World wide web. WWW '10. ACM, New York, NY, USA, 641–650.
Leskovec, J., Krause, A., Guestrin, C., Faloutsos, C., VanBriesen, J., and Glance, N. 2007. Cost-effective outbreak detection in networks. In Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining. KDD '07. ACM, New York, NY, USA, 420–429.

Acknowledgment
Some material in this presentation is taken from the slides of:
  Dr. David Kempe [Yahoo Research]
  Amit Goyal [UBC]
  Dr. Jure Leskovec [Stanford University]

Questions?