Discovering Influential Nodes From Social Trust Network

Thesis Proposal by Sabbir Ahmed
Thesis Committee:
Chair: Dr. R. Frost [Computer Science]
Advisor: Dr. C. I. Ezeife [Computer Science]
Internal Reader: Dr. A. Ngom [Computer Science]
External Reader: Dr. E. H. Kim [Department of Physics]
University of Windsor, School of Computer Science

Outline
1. Introduction
   Background (SNA, Data Mining)
   Viral Marketing
   Influence Maximization
   Challenges and Thesis Problem
2. Related Work
3. Proposed solution framework
   Thesis Contribution
4. Thesis Plan
5. References

1.1 Data Mining
Data mining is a way of efficiently discovering interesting rules from large databases.
Benefits and applications:
- Increasing revenue
- Cutting time and costs
- Decision making
- Medical diagnosis
Techniques include:
- Classification (e.g. classify 'Cancer' or 'Not Cancer')
- Association rules (e.g. discover the rule Bread => Milk)
- Clustering (e.g. clusters 'Risky' and 'Not Risky')

1.1 What is a "Social Network"?
A social network is a social structure made up of individuals, organizations, or entities called "nodes", which are tied (connected) by one or more specific types of interdependency, such as friendship, kinship, common interest, financial exchange, or dislike.

1.1 Modeling Social Network
The structure of a social network is most commonly modeled as a directed or undirected graph G=(V,E), where V is the set of all nodes in the network and E is the set of directed or undirected edges between nodes.
E.g. V = {1,2,3,4,5,6,7,8,9,10,11}
E = {(1,5), (1,3), (2,3), (2,4), (4,5), (5,7), (10,11)}
[Figure: example graph drawn on nodes 1 to 11]
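A minimal sketch of how the example graph could be held in memory as an adjacency structure. The helper name `build_adjacency` and the `directed` flag are assumptions made for illustration; they are not part of the original slides.

```python
def build_adjacency(nodes, edges, directed=False):
    """Map every node to the set of nodes it has an edge to."""
    adjacency = {v: set() for v in nodes}
    for u, v in edges:
        adjacency[u].add(v)
        if not directed:
            # An undirected edge is stored in both directions.
            adjacency[v].add(u)
    return adjacency


V = range(1, 12)
E = [(1, 5), (1, 3), (2, 3), (2, 4), (4, 5), (5, 7), (10, 11)]

undirected_graph = build_adjacency(V, E)
directed_graph = build_adjacency(V, E, directed=True)
```

Isolated nodes such as 6, 8, and 9 still get an (empty) entry, so later code can iterate over all of V without special cases.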

1.1 Types of Social Network Graph
- Directed graph
- Undirected graph
- Signed graph (edges labelled + or -)
- Weighted graph

1.1 Social Network Mining

1.2 Viral Marketing
Also known as targeted advertising.
Initiates a chain reaction through the word-of-mouth effect, also known as the diffusion process.
Low investment, maximum gain.

1.2 Social Influence
Social influence is a force that person A (the influencer) exerts on person B to introduce a change in the behavior and/or opinion of B.

1.2 Problem Setting
Given
- A limited budget B for initial advertising (e.g. giving away free samples of a product)
- Estimates of the influence between individuals
Goal
- Trigger a large cascade of influence (e.g. further adoptions of a product)
Question
- Which set of individuals should the budget B be spent on?
- If we can convince only a subset of individuals to adopt a new product or innovation, how should we choose the few key individuals to seed the process?
- Which blogs should one read to be most up to date?

1.2 Terms Used
Active: a node (user) is active if he/she adopts a product or performs an action.
Diffusion Model [M]: a model that describes the diffusion process.
Influence Probability [pu,v]: the probability of user v getting activated given that u activates.
Seed Set: the initially activated set of nodes.
Influence Spread [σM(A)]: the number of users activated by seed set A under diffusion model M.
Marginal Gain: the additional number of users activated as a result of adding a node v to A, i.e. σM(A+v) - σM(A).

1.2 Diffusion Models
A diffusion model attempts to describe the entire diffusion process and determines which nodes will be activated by the influence spreading through the social network.
Examples: the Linear Threshold (LT) model and the Independent Cascade (IC) model, both rooted in mathematical sociology.
General Threshold (GT) Model: an extension of the IC and LT models.

1.2 GT Model
At any time, any node u is either active or inactive.
The diffusion process starts with an initial set of active nodes.
Consider an inactive user u and the set S of its active neighbors.
The GT model defines pu(S), the probability that user u activates given the active set S.
This probability increases as more of u's neighbors get activated.
If pu(S) ≥ θu, we conclude that u activates, where θu is the activation threshold of user u.
θu is chosen uniformly at random from the interval [0,1].
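A minimal sketch of the activation test. It assumes the joint probability takes the independent-influence form pu(S) = 1 - Π(1 - pv,u) over the active neighbors v in S, as used by Goyal et al. (2010); the function names `joint_influence` and `activates` are hypothetical.

```python
import random


def joint_influence(probabilities):
    """Assumed form: p_u(S) = 1 - product of (1 - p_vu) over active neighbours."""
    not_influenced = 1.0
    for p in probabilities:
        not_influenced *= 1.0 - p
    return 1.0 - not_influenced


def activates(probabilities, theta):
    """u activates when the joint influence reaches its threshold."""
    return joint_influence(probabilities) >= theta


# Threshold drawn uniformly at random, as on the slide.
theta_u = random.random()
print(activates([0.2, 0.3], theta_u))
```

Under this assumed form, each additional active neighbor can only raise pu(S), which matches the statement that the probability increases as more neighbors activate.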

1.2 General Threshold Model - Example
[Figure: worked example of the General Threshold Model showing inactive nodes, active nodes, edge weights, thresholds, and joint influence probabilities for nodes u, v, and w. Source: David Kempe's slides]

1.2 Influence Spread function
Given
- a social network graph G=(V,E),
- a diffusion model M, and
- an initial set of active vertices A ⊆ V,
the influence spread of set A, denoted σM(A), is the expected number of vertices that become active under the influence of the vertices in A once the diffusion process is over.
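Because σM(A) is an expectation, it is usually estimated by simulation. The sketch below is one way to do that under the GT model; it assumes the independent-influence form of the joint probability (1 minus the product of 1 - pv,u), and the names `simulate_once`, `estimate_spread`, and `marginal_gain` are hypothetical.

```python
import random


def simulate_once(in_probs, seeds, rng):
    """Run one GT diffusion. in_probs[u] maps each in-neighbour v to p_vu."""
    theta = {u: rng.random() for u in in_probs}  # one threshold per node
    active = set(seeds)
    changed = True
    while changed:
        changed = False
        for u in in_probs:
            if u in active:
                continue
            not_influenced = 1.0
            for v, p in in_probs[u].items():
                if v in active:
                    not_influenced *= 1.0 - p
            joint = 1.0 - not_influenced
            # A node with no influence on it never activates.
            if joint > 0.0 and joint >= theta[u]:
                active.add(u)
                changed = True
    return active


def estimate_spread(in_probs, seeds, runs=1000, seed=0):
    """Monte Carlo estimate of the expected number of active nodes."""
    rng = random.Random(seed)
    total = sum(len(simulate_once(in_probs, seeds, rng)) for _ in range(runs))
    return total / runs


def marginal_gain(in_probs, seeds, v, runs=1000):
    """Estimated sigma(A + v) - sigma(A)."""
    return (estimate_spread(in_probs, set(seeds) | {v}, runs)
            - estimate_spread(in_probs, seeds, runs))


toy = {"a": {}, "b": {"a": 1.0}, "c": {"b": 0.0}}
spread_a = estimate_spread(toy, {"a"})
```

The fixed `seed` only makes the estimate reproducible; a larger `runs` value trades time for a tighter estimate.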

1.2 Influence Maximization
Problem: Given a social network G with influence probabilities and a budget k, find the k-node seed set A that maximizes σM(A).

1.2 Other application areas
- Spread of viruses in computer networks
- Contamination prevention in water distribution networks
- Social media recommendation
- Expert finding
- Etc.

1.2 Solving the problem
Kempe et al. showed that solving the problem exactly is NP-hard.
They also proved that the influence spread function is submodular and monotone.
[Figure: a new node S' is added to two seed sets. With the small seed set A={S1, S2}, adding S' helps a lot; with the larger seed set A={S1, S2, S3, S4}, adding S' helps very little.]

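1.2 Monotone and Submodularity
Submodular: for all seed sets A ⊆ B ⊆ V and every node v not in B, it holds that
f(A ∪ {v}) - f(A) ≥ f(B ∪ {v}) - f(B)
The left-hand side is the benefit of adding a node to a small seed set; the right-hand side is the benefit of adding the same node to a large seed set.
Monotone: a submodular function f is said to be monotone if
f(A ∪ {v}) ≥ f(A)
for every set A and node v, that is, adding an element to a set cannot cause f to decrease.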
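Both properties can be checked by brute force on a small set function. The sketch below does this for a toy coverage function, which stands in for the influence spread; the `reach` data and all function names are hypothetical.

```python
from itertools import combinations


def subsets(ground):
    """Yield every subset of the ground set as a frozenset."""
    items = list(ground)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def is_monotone(f, ground):
    """Adding an element never decreases f."""
    return all(f(A | {v}) >= f(A) for A in subsets(ground) for v in ground)


def is_submodular(f, ground):
    """Gain of v on a small set A is at least its gain on any superset B."""
    all_sets = list(subsets(ground))
    for A in all_sets:
        for B in all_sets:
            if not A <= B:
                continue
            for v in set(ground) - B:
                if f(A | {v}) - f(A) < f(B | {v}) - f(B):
                    return False
    return True


# Toy stand-in for influence spread: users reached by each seed.
reach = {"s1": {1, 2, 3}, "s2": {3, 4}, "s3": {4, 5}}


def coverage(seed_set):
    covered = set()
    for s in seed_set:
        covered |= reach[s]
    return len(covered)


print(is_monotone(coverage, set(reach)), is_submodular(coverage, set(reach)))
```

The check is exponential in the size of the ground set, so it is only useful for sanity-testing a candidate function on tiny inputs.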

1.2 Greedy Algorithm
A monotone submodular function can be maximized with a (1-1/e), or about 63%, approximation guarantee using the greedy algorithm (Nemhauser et al. 1978).
That is, if A* is an optimal solution and A is the set returned by the greedy algorithm, then the greedy algorithm guarantees the following:
σM(A) ≥ (1 - 1/e) · σM(A*)

1.2 Greedy Algorithm
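A minimal sketch of greedy seed selection: in each of k rounds, add the node with the largest marginal gain. The `spread` argument is any function that returns the (estimated) influence spread of a seed set; here a deterministic toy coverage function stands in for it, and all names and data are hypothetical.

```python
def greedy_seed_selection(candidates, spread, k):
    """Pick k seeds, each time adding the node with the largest marginal gain."""
    seeds = set()
    for _ in range(k):
        base = spread(seeds)
        best_node, best_gain = None, float("-inf")
        for v in candidates:
            if v in seeds:
                continue
            gain = spread(seeds | {v}) - base
            if gain > best_gain:  # ties keep the earlier candidate
                best_node, best_gain = v, gain
        if best_node is None:  # fewer than k candidates available
            break
        seeds.add(best_node)
    return seeds


# Toy stand-in for sigma: number of distinct users reached by the seeds.
reach = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5}, "d": {6}}


def toy_spread(seed_set):
    covered = set()
    for s in seed_set:
        covered |= reach[s]
    return len(covered)


chosen = greedy_seed_selection(["a", "b", "c", "d"], toy_spread, k=2)
```

Each round evaluates `spread` once per remaining candidate, which is why the scalability work discussed under Related Work matters when `spread` is itself a costly simulation.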

1.2 Issues and Challenges
- Scalability
- Dynamic social graph
- Privacy
- Considering negative influence in IM (thesis focus)
  - Requires a new diffusion model
  - Learning influence probabilities: +ve and -ve influence probability

1.2 Motivation
All previous works consider only positive influence among users.
In real life a user can also have some degree of negative influence on another user.
If I do not trust someone, I probably will not be influenced by him. In fact, I may be negatively influenced.
Viral marketing differentiates itself because it is based on trust within an individual's close social circle of family, friends, and coworkers.

1.2 Thesis Problem
Given a trust network graph G(V,E) in which every edge (u,v) of E is directed and labelled either positive (trust) or negative (distrust):
- We need a diffusion model that considers both +ve and -ve influence.
- How can we learn both positive and negative influence from patterns of actions?
- How can we extract influential nodes under this new model?

2 Related Work
Domingos & Richardson 2001
- Finding influential users to maximize the expected lift in profit.
- No concrete problem formulation.
- Too many assumptions.
Kempe et al. 2003
- Defined IM as a discrete optimization problem.
- Proved the problem to be NP-hard.
- Showed that influence spread is monotone and submodular.

2 Related Work
Leskovec et al. 2007
- Lazy forward optimization.
- Reported to be 700 times faster than the simple greedy algorithm.
Chen et al. 2009, 2010
- Scalability of the greedy algorithm.
- Proposed heuristic algorithms.
Goyal et al. 2010
- Learning influence probabilities.
Goyal et al. 2011
- Credit Distribution Model.
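The idea behind lazy forward optimization is that, for a submodular function, a node's marginal gain can only shrink as the seed set grows, so stale gains are valid upper bounds and most re-evaluations can be skipped. The sketch below illustrates that idea with a priority queue; it is not the authors' implementation, and the toy coverage function and names are hypothetical.

```python
import heapq


def lazy_greedy(candidates, spread, k):
    """Greedy selection that re-evaluates a gain only when it reaches the top."""
    seeds = set()
    current = spread(seeds)
    # Heap entries: (-gain, tie_breaker, node, seed-set size when gain was computed)
    heap = [(-(spread({v}) - current), i, v, 0) for i, v in enumerate(candidates)]
    heapq.heapify(heap)
    evaluations = len(heap)
    while heap and len(seeds) < k:
        neg_gain, i, v, stamp = heapq.heappop(heap)
        if stamp == len(seeds):
            # Gain is fresh and still the largest, so v is the greedy choice.
            seeds.add(v)
            current += -neg_gain
        else:
            # Stale upper bound: recompute and push back.
            gain = spread(seeds | {v}) - current
            evaluations += 1
            heapq.heappush(heap, (-gain, i, v, len(seeds)))
    return seeds, evaluations


reach = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5}, "d": {6}}


def toy_spread(seed_set):
    covered = set()
    for s in seed_set:
        covered |= reach[s]
    return len(covered)


lazy_seeds, lazy_evals = lazy_greedy(["a", "b", "c", "d"], toy_spread, k=2)
```

On this toy input the lazy variant selects the same seeds as plain greedy while never re-evaluating node "d" in the second round; the saving grows with the number of candidates.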

3. Proposed Solution
Recall the General Threshold Model:
The GT model defines pu(S), the probability of user u activating given the active set S.
If pu(S) ≥ θu, we conclude that u activates.

3. Trust-General Threshold Model
S+ is the set of all active users trusted by node u.
S- is the set of all active users distrusted by node u.
pu(S+) is the +ve joint influence probability and pu(S-) is the -ve joint influence probability.
In the T-GT model we say: if pu(S+) > pu(S-), then u activates.
Note: there is no need for the threshold θu.
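A minimal sketch of the T-GT activation rule. It assumes both joint probabilities use the independent-influence form 1 - Π(1 - p) over the relevant active neighbors; that form and the function names are assumptions, not taken from the slides.

```python
def joint_influence(probabilities):
    """Assumed form: 1 - product of (1 - p) over the given active neighbours."""
    not_influenced = 1.0
    for p in probabilities:
        not_influenced *= 1.0 - p
    return 1.0 - not_influenced


def tgt_activates(trusted_probs, distrusted_probs):
    """u activates only if positive joint influence exceeds negative joint influence."""
    return joint_influence(trusted_probs) > joint_influence(distrusted_probs)


# One trusted active neighbour is enough to activate u ...
before = tgt_activates([0.4], [])
# ... until two distrusted neighbours also become active.
after = tgt_activates([0.4], [0.3, 0.3])
```

The pair `before`/`after` shows how activating more nodes can switch a user off, which is the source of the non-monotone behaviour discussed later in the talk.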

3. Example
Since pu(S+) < pu(S-) holds for u = C, according to the T-GT model node C will not activate.

3. Trust Matrix
A matrix denoted TM(u,v).
TM(u,v) is '+ve' if u trusts v and '-ve' if u distrusts v.
Users in a trust network often tend not to declare distrust, so some entries are unknown ('?').
We need a way to estimate the unknown '?' entries.

3. Predict edge sign: An ML formulation
Machine learning formulation by Leskovec et al.:
- Predict the sign of edge (u,v).
- Class label: +1 for a positive edge, -1 for a negative edge.
- Learning method: logistic regression.

3. Frequent Action Pattern
Positive Frequent Action Pattern Av.u: the number of actions performed by user u after the same actions were performed by a user v trusted by u.
Negative Frequent Action Pattern A'v.u: the number of actions not performed by user u after the same actions were performed by a user v distrusted by u.

3. Computing Influence Probabilities
The positive influence probability and the negative influence probability of v on u are computed from the pattern counts Av.u and A'v.u, where Av is the number of actions performed by v.
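A minimal sketch of turning pattern counts into probabilities. It assumes the simple ratio form, positive probability = Av.u / Av and negative probability = A'v.u / Av, in the spirit of the frequency-based estimates of Goyal et al. (2010); the exact formulas of the thesis may differ, and the function name and sample counts are hypothetical.

```python
def influence_probabilities(pos_counts, neg_counts, actions_by):
    """Assumed ratios: count of (v, u) patterns divided by actions performed by v."""
    positive, negative = {}, {}
    for (v, u), count in pos_counts.items():
        if actions_by.get(v, 0) > 0:
            positive[(v, u)] = count / actions_by[v]
    for (v, u), count in neg_counts.items():
        if actions_by.get(v, 0) > 0:
            negative[(v, u)] = count / actions_by[v]
    return positive, negative


# Hypothetical counts: keys are (influencer v, influenced user u).
pos, neg = influence_probabilities(
    pos_counts={("u1", "u3"): 2},
    neg_counts={("u1", "u5"): 1},
    actions_by={"u1": 4},
)
```

Pairs whose influencer has performed no actions are skipped rather than assigned a probability.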

3. Action Sequence Table

3. Action Sequence
The action sequence of an action a, Seq(a), is the sequence of users who performed action a, ordered by time.
For example: Seq(a) = {u1, u2, u3, u4}
The subsequence of a user u for action a, denoted Seq(a,u), is the sequence of users who performed action a before user u.
For example: Seq(a,u3) = {u1, u2}
Also, the users who performed a after u3 are Seq(a) - Seq(a,u3) - {u3} = {u4}

3. Action Pattern Generator
The complexity of the APG algorithm is O(A * 2|V|^2) in the worst case, where A is the number of actions and |V| is the total number of nodes.

Running Example
[Tables: Action Sequence Table and Trust Matrix]
The algorithm starts with action a from the action sequence table, whose sequence is {u1, u2, u3, u4}.
According to TM, u5 (who did not perform a) does not trust u1, u3, and u4.
So the algorithm adds 1 to A'u1.u5, A'u3.u5, and A'u4.u5.

Running Example (Cont.)
[Tables: Action Sequence Table and Trust Matrix]
According to TM, u3 (who did perform a) trusts u1 and u2.
So the algorithm adds 1 to Au1.u3 and Au2.u3.
The APG algorithm continues the same process for all actions and returns Au.v, A'u.v, and Av for all u and v.
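The counting steps of the running example can be sketched as follows. This is an illustration of the described behaviour, not the APG pseudocode itself; the data layout, the function name, and the trust value assumed for (u5, u2) are hypothetical.

```python
from collections import defaultdict


def action_pattern_generator(sequences, trust, users):
    """sequences: action -> users in time order.
    trust[(u, v)] is +1 if u trusts v and -1 if u distrusts v.
    Returns counts keyed by (v, u), plus the number of actions per user."""
    pos = defaultdict(int)      # A_v.u
    neg = defaultdict(int)      # A'_v.u
    actions_by = defaultdict(int)  # A_v
    for seq in sequences.values():
        performed = set(seq)
        for v in seq:
            actions_by[v] += 1
        for u in users:
            if u in performed:
                # Seq(a, u): users who acted before u.
                for v in seq[:seq.index(u)]:
                    if trust.get((u, v)) == 1:
                        pos[(v, u)] += 1
            else:
                # u did not act although distrusted users did.
                for v in seq:
                    if trust.get((u, v)) == -1:
                        neg[(v, u)] += 1
    return pos, neg, actions_by


users = ["u1", "u2", "u3", "u4", "u5"]
sequences = {"a": ["u1", "u2", "u3", "u4"]}
trust = {("u3", "u1"): 1, ("u3", "u2"): 1,
         ("u5", "u1"): -1, ("u5", "u3"): -1, ("u5", "u4"): -1,
         ("u5", "u2"): 1}  # assumed value, not given on the slide
pos, neg, actions_by = action_pattern_generator(sequences, trust, users)
```

For action a this reproduces the increments described on the two running-example slides.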

Positive influence probability of user u2 on user u5: [equation image not preserved]
Similarly, the negative influence probability of user u2 on user u4: [equation image not preserved]

Influence Maximization under T-GT Model
Unlike in previous works, the influence spread is not monotone: adding a node (or user) may cause the influence spread to decrease.
Let S be the initially activated seed set and let ∂(S) be the set of nodes successfully activated by S, so that the influence spread σ(S) is the number of nodes in ∂(S), i.e. |∂(S)|.
Let a node w have a negative influence (due to distrust) on another node u that is in ∂(S).
According to the T-GT model, adding w to S decreases the probability of u getting activated.
If u then fails to activate, the influence spread of S+w, i.e. σ(S+w), decreases, since |∂(S)| - 1 < |∂(S)|.

Influence Maximization under T-GT Model
Greedy and Lazy Forward Optimization are not applicable to IM under the T-GT model.
However, we claim that although the influence spread function in the T-GT model is not monotone, it is still submodular. [A formal proof still needs to be provided.]
So IM under T-GT is the optimization of a non-monotone submodular function.
Based on this, we can use the deterministic local search algorithm of Feige et al. [2007], which provides a 1/3 approximation guarantee.
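A minimal sketch in the spirit of the deterministic local search of Feige et al.: start from the best single element, add or remove one element at a time while that improves f by more than a factor (1 + ε/n²), then return the better of the final set and its complement. The sketch treats the unconstrained problem (no budget k) and merges the add and remove passes into one loop, which is a simplification. A graph cut function stands in for a non-monotone submodular spread; the graph and all names are hypothetical.

```python
from itertools import combinations


def local_search(ground, f, eps=0.01):
    """Local search for a non-negative, possibly non-monotone submodular f."""
    ground = list(ground)
    n = len(ground)
    factor = 1.0 + eps / (n * n)
    current = {max(ground, key=lambda v: f({v}))}
    improved = True
    while improved:
        improved = False
        for v in ground:
            # Try toggling v: remove it if present, add it otherwise.
            candidate = current - {v} if v in current else current | {v}
            if f(candidate) > factor * f(current):
                current, improved = candidate, True
                break
    complement = set(ground) - current
    return current if f(current) >= f(complement) else complement


EDGES = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 1), (1, 3)]


def cut_value(nodes):
    """Number of edges with exactly one endpoint in the set."""
    return sum((a in nodes) != (b in nodes) for a, b in EDGES)


def brute_force_best(ground, f):
    """Exact optimum by enumeration, for checking tiny instances only."""
    items = list(ground)
    return max(f(set(c)) for r in range(len(items) + 1)
               for c in combinations(items, r))


result = local_search(range(1, 6), cut_value)
optimum = brute_force_best(range(1, 6), cut_value)
```

Applying this to T-GT would additionally require the submodularity proof mentioned above and a way to respect the seed budget, neither of which the sketch addresses.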

Solution Framework

Thesis Contribution
- First to consider negative influence and trust networks for IM.
- Proposed a new diffusion model, the T-GT model, to incorporate -ve influence.
- Proposed a pattern-based approach to compute +ve and -ve influence probabilities.
- Showed that standard IM algorithms are not applicable to the T-GT model.

Future Work
- Considering dynamic networks.
- Scaling to very large social networks.
- A more sophisticated way to assign the threshold θ.
- Action-based influence.

4. Proposed Timeline

5. References
Chen, W., Wang, C., and Wang, Y. 2010. Scalable influence maximization for prevalent viral marketing in large-scale social networks. In Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining. KDD ’10. ACM, New York, NY, USA, 1029–1038.
Chen, W., Wang, Y., and Yang, S. 2009. Efficient influence maximization in social networks. In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining. KDD ’09. ACM, New York, NY, USA, 199–208.
Domingos, P. and Richardson, M. 2001. Mining the network value of customers. In Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining. KDD ’01. ACM, New York, NY, USA, 57–66.
Feige, U., Mirrokni, V. S., and Vondrak, J. 2007. Maximizing non-monotone submodular functions. In 48th Annual IEEE Symposium on Foundations of Computer Science. FOCS ’07. Oct. 2007.
Goyal, A., Bonchi, F., and Lakshmanan, L. V. S. 2010. Learning influence probabilities in social networks. In Proceedings of the third ACM international conference on Web search and data mining. WSDM ’10. ACM, New York, NY, USA, 241–250.
Goyal, A., Bonchi, F., and Lakshmanan, L. V. S. 2011. A data-based approach to social influence maximization. Proc. VLDB Endow. 5, 73–84.
Kempe, D., Kleinberg, J., and Tardos, E. 2003. Maximizing the spread of influence through a social network. In Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining. KDD ’03. ACM, New York, NY, USA, 137–146.
Leskovec, J., Huttenlocher, D., and Kleinberg, J. 2010a. Predicting positive and negative links in online social networks. In Proceedings of the 19th international conference on World wide web. WWW ’10. ACM, New York, NY, USA, 641–650.
Leskovec, J., Krause, A., Guestrin, C., Faloutsos, C., VanBriesen, J., and Glance, N. 2007. Cost-effective outbreak detection in networks. In Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining. KDD ’07. ACM, New York, NY, USA, 420–429.

Acknowledgment
Some material in this presentation is taken from the slides of:
- Dr. David Kempe [Yahoo Research]
- Amit Goyal [UBC]
- Dr. Jure Leskovec [Stanford University]

Questions?
