Paper Presentation Social influence based clustering of heterogeneous information networks Qiwei Bao & Siqi Huang.


1 Paper Presentation Social influence based clustering of heterogeneous information networks Qiwei Bao & Siqi Huang

2 What’s it all about?
• There is growing interest in clustering a social network of people based on both their social relationships and their participation in information networks.
• This paper uses the concept of social influence to improve clustering quality.
• Social influence studies how the impact of people’s activities and opinions propagates to members of a social network via direct and indirect social connections.

3 Keywords
• Graph Clustering
• Heterogeneous Network
• Kernels
• Social Influence

4 Today’s Presentation
Part One:
• Definitions
• Concepts
• Kernels
• Similarity Measurement
Part Two:
• Clustering Algorithm: SI-Clustering
• Parameter-based Optimization
• Experiments
• Conclusions

5 Problem Statement: Two Kinds of Influence
• Model activities, events, and experiences as information networks in addition to the social relationships among people.
• Social influence can propagate through these networks in two ways:
1. Self-influence: people influence one another based solely on the social network.
2. Co-influence: people influence one another through their participation in activity or event networks.

6 Problem Statement: Three Types of Graphs/Networks
• Social Collaboration Network (Social Graph, SG): SG = (U, E)
U: the set of vertices, i.e. the members of the social network (e.g., authors, customers).
E: the set of edges denoting the collaborative relationships between members.
N_SG: the size of U.

7 Problem Statement: Three Types of Graphs/Networks
• Associated Activity Network (Activity Graph, AG_i): AG_i = (V_i, S_i)
V_i: the activity vertices in the i-th associated activity network AG_i.
S_i: the weighted edges representing the similarity between two activity vertices.
N_AG_i: the size of the activity vertex set V_i.
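The two definitions so far, SG = (U, E) and AG_i = (V_i, S_i), can be sketched as small containers. This is only an illustration: the class names, the method names, the edge weight on collaborations, and the toy vertices are all assumptions that do not come from the slides.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SocialGraph:
    """SG = (U, E): members and their collaborative relationships."""
    members: set[str] = field(default_factory=set)               # U
    edges: dict[frozenset, float] = field(default_factory=dict)  # E (weights are assumed)

    def add_collaboration(self, a: str, b: str, weight: float = 1.0) -> None:
        self.members.update((a, b))
        self.edges[frozenset((a, b))] = weight

    @property
    def size(self) -> int:  # N_SG
        return len(self.members)


@dataclass
class ActivityGraph:
    """AG_i = (V_i, S_i): activity vertices and weighted similarity edges."""
    name: str
    activities: set[str] = field(default_factory=set)                 # V_i
    similarity: dict[frozenset, float] = field(default_factory=dict)  # S_i

    def add_similarity(self, a: str, b: str, weight: float) -> None:
        self.activities.update((a, b))
        self.similarity[frozenset((a, b))] = weight

    @property
    def size(self) -> int:  # N_AG_i
        return len(self.activities)


# Toy data, purely hypothetical.
sg = SocialGraph()
sg.add_collaboration("author_a", "author_b", weight=3.0)
sg.add_collaboration("author_b", "author_c")

conferences = ActivityGraph(name="conference")
conferences.add_similarity("KDD", "ICDM", weight=0.8)
```

Undirected edges are keyed by `frozenset` so that (a, b) and (b, a) refer to the same relationship.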

8 Problem Statement: Three Types of Graphs/Networks
• Influence Network (Influence Graph, IG_i)

9 Problem Statement: Heterogeneous Network
When both the self-influence and the co-influence networks are considered, the network as a whole is heterogeneous.

10 Problem Statement: Heterogeneous Network

11 Problem Statement: Social Influence-based graph Clustering (SI-Cluster)
• Given: a social graph, multiple activity graphs, and the corresponding influence graphs.
• Problem: partition the member vertices U into K disjoint clusters U_i.
• A desired clustering result should achieve a good balance between two properties:
(1) Vertices within one cluster should have similar collaborative patterns among themselves and similar interaction patterns with activity networks.
(2) Vertices in different clusters should have dissimilar collaborative patterns and dissimilar interaction patterns with activities.

12 Problem Statement: Social Influence-based graph Clustering (SI-Cluster)
The clustering algorithm should be fast and should scale with the number of influence graphs and the size of the activity graphs.

13 Dataset: DBLP
• The DBLP dataset consists of two types of entities (authors and conferences) and three types of links (co-authorship, author-conference, and conference similarity).

14 Influence-based Similarity Step 1: Heat Diffusion on Social Graph
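The transcript names this step but does not carry the slide's formula, so the following is a generic sketch of heat diffusion on a graph, not necessarily the kernel used in the paper. It assumes the standard heat kernel exp(-αtL) with graph Laplacian L = D - A; the function name and the parameters `alpha`, `t`, and `terms` are hypothetical.

```python
def heat_kernel(adjacency, alpha=0.5, t=1.0, terms=30):
    """Approximate exp(-alpha * t * L) with a truncated Taylor series,
    where L = D - A is the Laplacian of a weighted adjacency matrix.
    ASSUMPTION: this is the textbook heat kernel, not the paper's exact one."""
    n = len(adjacency)
    # Generator M = -alpha * t * (D - A); every row of M sums to zero.
    m = [[alpha * t * adjacency[i][j] for j in range(n)] for i in range(n)]
    for i in range(n):
        m[i][i] = -alpha * t * sum(adjacency[i][j] for j in range(n) if j != i)

    def matmul(x, y):
        return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
                for i in range(n)]

    # Start from the identity (the k = 0 term) and add M^k / k! for k >= 1.
    kernel = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in kernel]
    for k in range(1, terms):
        term = [[v / k for v in row] for row in matmul(term, m)]
        kernel = [[kernel[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return kernel


# Path graph a - b - c: heat placed on a reaches b more strongly than c.
path = [[0, 1, 0],
        [1, 0, 1],
        [0, 1, 0]]
H = heat_kernel(path)
```

Entry `H[i][j]` can be read as the amount of heat that vertex j receives from a unit of heat placed on vertex i after time t, which is one way to turn direct and indirect connections into a similarity score.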

15 Influence-based Similarity Step 2: Compute Self-influence Similarity

16 Influence-based Similarity: Co-influence Kernel on Influence Graph
• Non-propagating heat diffusion kernel H_i for each influence graph IG_i (one hop)

17 Influence-based Similarity Co-influence Kernel on Influence Graph

18 Influence-based Similarity Step 3: Compute Propagating Co-influence Kernel on Influence Graph
[Figure: Philip S. Yu and his co-authors with more than 45 co-publications]

19 Influence-based Similarity Step 4: Partition Activities into Clusters
[Figure: Philip S. Yu and his co-authors with more than 45 co-publications]

20 Influence-based Similarity: Propagate Heat Distribution
Initialize the heat distribution f_ij(0) for each cluster c_ij in each influence graph IG_i.

21 Influence-based Similarity Step 5: Compute Influence Score Based on Co-influence Model

22 Influence-based Similarity Step 6: Compute Co-influence Similarity
[Figure: Philip S. Yu and his co-authors with more than 45 co-publications]

23 Influence-based Similarity Step 6: Compute Co-influence Similarity
Co-influence similarity matrix W_i for each influence graph IG_i
Step 7: Compute Unified Co-influence based Similarity
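The transcript does not show how the per-graph matrices are merged in Step 7. As a hedged sketch, the version below assumes a plain weighted sum of the self-influence matrix and the N co-influence matrices W_i, with one weight per matrix (N + 1 weights in total). The function name, the argument names, and the weighted-sum rule itself are assumptions.

```python
def unified_similarity(self_sim, co_sims, weights):
    """Combine similarity matrices into one unified matrix.

    ASSUMPTION: the combination rule is a plain weighted sum; the slides do
    not show the formula. weights[0] applies to self_sim and weights[i] to
    co_sims[i - 1].
    """
    matrices = [self_sim] + list(co_sims)
    if len(weights) != len(matrices):
        raise ValueError("need one weight per similarity matrix")
    n = len(self_sim)
    return [[sum(w * m[i][j] for w, m in zip(weights, matrices))
             for j in range(n)] for i in range(n)]


# Hypothetical 2-member example with one influence graph (N = 1).
w0 = [[1.0, 0.2], [0.2, 1.0]]   # self-influence similarity
w1 = [[1.0, 0.6], [0.6, 1.0]]   # co-influence similarity
unified = unified_similarity(w0, [w1], weights=[1.0, 1.0])
```

Keeping the weights as an explicit argument leaves room for a later step that adjusts them between clustering iterations.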

24 SI-Clustering Algorithm: What is it?
• Initialization: choose the most centrally located point in a cluster as a centroid, then assign the rest of the points to their closest centroids.
• Iteration: calculate the clustering objective, update the N + 1 weights, and repeat until clustering convergence.

25 SI-Clustering Algorithm Cont. Initialization

26 SI-Clustering Algorithm Cont. Vertex Assignment and Centroid Update
Update each centroid with the most centrally located vertex in its cluster.
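One round of vertex assignment and centroid update can be sketched in the style of k-medoids over a similarity matrix. This is an illustration under assumptions: "most centrally located" is read here as "highest total similarity to the other cluster members", the diagonal of the matrix is assumed to hold the largest value in each row, and the function names and toy matrix are hypothetical.

```python
def assign_vertices(similarity, centroids):
    """Map each vertex index to the centroid it is most similar to."""
    return {v: max(centroids, key=lambda c: similarity[v][c])
            for v in range(len(similarity))}


def update_centroids(similarity, assignment):
    """Per cluster, pick the vertex with the highest total similarity to the
    cluster's members (one reading of 'most centrally located')."""
    clusters = {}
    for vertex, centroid in assignment.items():
        clusters.setdefault(centroid, []).append(vertex)
    return sorted(
        max(members, key=lambda v: sum(similarity[v][u] for u in members))
        for members in clusters.values()
    )


# Hypothetical unified similarity matrix for five members.
sim = [
    [1.0, 0.9, 0.8, 0.1, 0.1],
    [0.9, 1.0, 0.9, 0.1, 0.2],
    [0.8, 0.9, 1.0, 0.2, 0.1],
    [0.1, 0.1, 0.2, 1.0, 0.7],
    [0.1, 0.2, 0.1, 0.7, 1.0],
]
assignment = assign_vertices(sim, centroids=[0, 3])
new_centroids = update_centroids(sim, assignment)
```

In this toy run vertex 1 replaces vertex 0 as centroid of the first cluster because it is the most similar to both of its cluster mates.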

27 SI-Clustering Algorithm Cont. Clustering Objective Function

28 SI-Clustering Algorithm Cont. Clustering Objective Function Cont.

29 SI-Clustering Algorithm Cont. Clustering Objective Function Cont.
• Simplified: (1) cluster assignment, (2) centroid update, (3) weight adjustment
• Note on slide: common to all partitioning clustering algorithms

30 SI-Clustering Algorithm Cont. Parameter-based Optimization

31 SI-Clustering Algorithm Cont. Parameter-based Optimization Cont.

32 SI-Clustering Algorithm Cont. Parameter-based Optimization Cont.

33 SI-Clustering Algorithm Cont. Parameter-based Optimization Cont.

34 SI-Clustering Algorithm Cont. Parameter-based Optimization Cont.

35 SI-Clustering Algorithm Cont. Parameter-based Optimization Cont.

36 SI-Clustering Algorithm Cont. Parameter-based Optimization Cont.

37 SI-Clustering Algorithm Cont. Adaptive Weight Adjustment & Clustering Algorithm
Solving the nonlinear parametric programming problem (NPPP) involves two parts:
(1) find a parameter β with F(β) = 0, which makes the NPPP equivalent to the nonlinear fractional programming problem (NFPP);
(2) given that β, solve a polynomial programming problem in the original variables.
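The two-part procedure has the same shape as the classic Dinkelbach scheme for fractional programs: to maximize f(x)/g(x), search for the β at which F(β) = max over x of f(x) - β·g(x) equals zero. The sketch below applies that generic scheme to a one-variable toy problem over a grid of candidates. It is not the paper's solver; the function name, the toy objective, and the grid are assumptions.

```python
def dinkelbach(f, g, candidates, tol=1e-9, max_iter=100):
    """Maximize f(x) / g(x) over a finite candidate set, assuming g(x) > 0."""
    x = candidates[0]
    beta = f(x) / g(x)
    for _ in range(max_iter):
        # Parametric subproblem: F(beta) = max_x f(x) - beta * g(x).
        x = max(candidates, key=lambda c: f(c) - beta * g(c))
        if f(x) - beta * g(x) < tol:   # F(beta) = 0, so beta is the best ratio
            break
        beta = f(x) / g(x)             # otherwise raise beta and solve again
    return x, beta


# Toy fractional program: maximize (x + 1) / (x^2 + 1) for x in [0, 2].
grid = [i / 1000 for i in range(2001)]
best_x, best_ratio = dinkelbach(lambda x: x + 1, lambda x: x * x + 1, grid)
```

Each pass replaces the ratio objective with a difference that is easier to optimize, which is the sense in which a fractional problem in many variables reduces to a search over the single parameter β.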

38 Evaluation: Datasets
• Amazon product co-purchasing network: 20,000 products; activity graphs: product category graph and customer review graph
• DBLP bibliography data
- Full version: 964,166 authors; activity graphs: Conference and Keyword
- Subset: 100,000 authors; activity graphs: Conference and Keyword

39 Evaluation Cont. Baseline Methods
• Algorithms to be compared: BAGC, SA-Cluster, Inc-Cluster, W-Cluster
• Measures: Density, Entropy, Davies-Bouldin Index (DBI)
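The slide lists the measures without formulas. As a rough sketch, two of them can be computed as follows, using common textbook definitions that may differ in detail from the ones in the paper: density as the fraction of edges that stay inside a cluster, and the Davies-Bouldin Index as the average worst-case ratio of within-cluster scatter to between-centroid distance. Entropy is left out because it depends on attribute definitions the transcript does not give. All names and toy inputs are hypothetical.

```python
def density(edges, clusters):
    """Fraction of edges whose two endpoints fall in the same cluster."""
    label = {v: k for k, members in enumerate(clusters) for v in members}
    inside = sum(1 for u, v in edges if label[u] == label[v])
    return inside / len(edges)


def davies_bouldin(clusters, centroids, distance):
    """Mean over clusters of the worst (scatter_i + scatter_j) / d(c_i, c_j)."""
    scatter = [sum(distance(p, c) for p in members) / len(members)
               for members, c in zip(clusters, centroids)]
    k = len(clusters)
    return sum(
        max((scatter[i] + scatter[j]) / distance(centroids[i], centroids[j])
            for j in range(k) if j != i)
        for i in range(k)
    ) / k


# Toy inputs: a 4-vertex path graph and two 1-D point clusters.
toy_density = density(
    edges=[("a", "b"), ("b", "c"), ("c", "d")],
    clusters=[["a", "b"], ["c", "d"]],
)
toy_dbi = davies_bouldin(
    clusters=[[0, 2], [10, 12]],
    centroids=[1, 11],
    distance=lambda p, q: abs(p - q),
)
```

Under these definitions a higher density and a lower DBI both indicate tighter, better separated clusters.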

40 Evaluation Cont. Cluster quality evaluation
• Dataset: 20,000 Amazon products.
• The number of clusters: K = 40, 60, 80, 100.

41 Evaluation Cont. Cluster quality evaluation Cont.
• Dataset: DBI on DBLP with 100,000 authors.
• The number of clusters: K = 400, 600, 800, 1000.

42 Evaluation Cont. Cluster quality evaluation Cont.
• Dataset: DBI on DBLP with 964,166 authors.
• The number of clusters: K = 4000, 6000, 8000, 10000.

43 Evaluation Cont. Cluster efficiency evaluation

44 Evaluation Cont. Cluster convergence
• Observation: the social weight and the keyword weight both increase with more iterations, while the conference weight decreases.
• Explanation: people who have many publications in the same conferences may still have different research topics, whereas people who have many papers with the same keywords usually share research topics and thus have a higher probability of collaborating as co-authors.

45 Evaluation Cont. Case Study

46 Conclusion
[Diagram: links, entities, static activities, and dynamic activities feed an influence-based model; SI-Clustering computes vertex similarity and updates centroids; a sophisticated nonlinear fractional programming problem is turned into a straightforward nonlinear parametric programming problem.]

47 Conclusion Cont.
• Integrated different types of links, entities, static attributes, and dynamic activities from different networks into a unifying influence-based model.
• Proposed an iterative learning algorithm.
• Transformed a sophisticated nonlinear fractional programming problem of multiple weights into a straightforward nonlinear parametric programming problem of a single variable to speed up the clustering process.

48 Thanks! Q&A?

