Paper Presentation Social influence based clustering of heterogeneous information networks Qiwei Bao & Siqi Huang.

What's it all about?
- There is growing interest in clustering a social network of people based on both their social relationships and their participation in information networks.
- This paper uses the concept of social influence to improve clustering quality.
- Social influence analysis studies how the impact of people's activities and opinions propagates to other members of a social network via direct and indirect social connections.

Keywords: Graph Clustering, Heterogeneous Network, Kernels, Social Influence

Today's Presentation
Part One: Definitions, Concepts, Kernels, Similarity Measurement
Part Two: Clustering Algorithm (SI-Clustering), Parameter-based Optimization, Experiments, Conclusions

Problem Statement: Two Kinds of Influence
- Model activities, events and experiences as information networks, in addition to the social relationships between people.
- Social influence can propagate through these networks in two ways:
  1. Self-influence: people influence one another based solely on the social network.
  2. Co-influence: people influence one another through individuals' participation in activity or event networks.

Problem Statement: Three Types of Graphs/Networks
- Social Collaboration Network (Social Graph, SG): SG = (U, E)
  U: the set of vertices, i.e. members of the social network (e.g., authors, customers).
  E: the set of edges denoting the collaborative relationships between members.
  N_SG: the size of U.

Problem Statement: Three Types of Graphs/Networks
- Associated Activity Network (Activity Graph, AG_i): AG_i = (V_i, S_i)
  V_i: the activity vertices in the i-th associated activity network AG_i.
  S_i: weighted edges representing the similarity between two activity vertices.
  N_AGi: the size of the activity vertex set V_i.
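The two definitions above can be held in memory roughly as follows. This is a minimal sketch: the class names, field names, edge weights and the toy DBLP-style data are illustrative assumptions, not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class SocialGraph:
    """SG = (U, E): members and their collaborative relationships."""
    members: list                                   # U
    edges: dict = field(default_factory=dict)       # E as {(u, v): weight}

    def add_edge(self, u, v, weight=1.0):
        # Store each undirected edge once, under a sorted key.
        # Edge weights are an assumption; the slide only says E holds relationships.
        self.edges[tuple(sorted((u, v)))] = weight

    @property
    def size(self):                                 # N_SG
        return len(self.members)


@dataclass
class ActivityGraph:
    """AG_i = (V_i, S_i): activity vertices and weighted similarity edges."""
    activities: list                                # V_i
    similarity: dict = field(default_factory=dict)  # S_i as {(a, b): similarity}

    @property
    def size(self):                                 # N_AGi
        return len(self.activities)


# Toy data (hypothetical): three authors and two conferences.
sg = SocialGraph(members=["alice", "bob", "carol"])
sg.add_edge("bob", "alice", weight=3)   # e.g. three co-authored papers
sg.add_edge("bob", "carol", weight=1)

ag = ActivityGraph(activities=["KDD", "ICDM"])
ag.similarity[("ICDM", "KDD")] = 0.8    # made-up conference similarity

print(sg.size, ag.size, sg.edges)
```

One activity graph object would exist per activity type, for example one for conferences and one for keywords.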

Problem Statement: Three Types of Graphs/Networks
- Influence Network (Influence Graph, IG_i)

Problem Statement: Heterogeneous Network
When both self-influence and co-influence networks are considered, the network as a whole is heterogeneous.

Problem Statement HETEROGENEOUS NETWORK

Problem Statement: Social Influence-based graph Clustering (SI-Cluster)
- Given: a social graph, multiple activity graphs and the corresponding influence graphs.
- Problem: partition the member vertices U into K disjoint clusters U_i.
- A desired clustering result should achieve a good balance between two properties:
  (1) Vertices within one cluster should have similar collaborative patterns among themselves and similar interaction patterns with activity networks.
  (2) Vertices in different clusters should have dissimilar collaborative patterns and dissimilar interaction patterns with activities.

Problem Statement: Social Influence-based graph Clustering (SI-Cluster)
The clustering algorithm should be fast and should scale with the number of influence graphs and the size of the activity graphs.

Dataset: DBLP
- It consists of two types of entities (authors and conferences) and three types of links (co-authorship, author-conference, and conference similarity).

Influence-based Similarity Step 1: Heat Diffusion on Social Graph
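The kernel formula on this slide did not survive extraction, so the following is only a sketch of the general idea. It assumes the standard graph heat diffusion model f(t) = e^{αtH} f(0), with H taken to be the negative graph Laplacian; the function name, the parameters `alpha` and `t`, and the toy graph are illustrative and may differ from the paper's definition.

```python
def heat_kernel(adjacency, alpha=0.5, t=1.0, terms=30):
    """Approximate e^{alpha * t * H} with a truncated Taylor series.

    H is taken to be the negative graph Laplacian (A - D). This is an
    assumption standing in for the kernel definition missing from the slide.
    """
    n = len(adjacency)
    degree = [sum(row[j] for j in range(n) if j != i)
              for i, row in enumerate(adjacency)]
    # H[i][j] = A[i][j] off the diagonal, H[i][i] = -degree(i)
    h = [[adjacency[i][j] if i != j else -degree[i] for j in range(n)]
         for i in range(n)]
    scaled = [[alpha * t * x for x in row] for row in h]

    def matmul(a, b):
        return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
                for i in range(n)]

    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    result = [row[:] for row in identity]
    term = [row[:] for row in identity]
    for k in range(1, terms):
        # term_k = term_{k-1} * scaled / k, i.e. scaled^k / k!
        term = [[x / k for x in row] for row in matmul(term, scaled)]
        result = [[r + x for r, x in zip(r_row, t_row)]
                  for r_row, t_row in zip(result, term)]
    return result


# Toy social graph: a path 0 - 1 - 2 with unit edge weights.
path = [[0, 1, 0],
        [1, 0, 1],
        [0, 1, 0]]
kernel = heat_kernel(path, alpha=0.5, t=1.0)

# Heat at each vertex after one unit of heat starts at vertex 0.
heat = [kernel[i][0] for i in range(3)]
print(heat)
```

In this toy run the total heat stays at one unit, and the direct neighbour of the source ends up warmer than the vertex two hops away, which is the intuition behind using diffused heat as an influence score.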

Influence-based Similarity Step 2: Compute Self-influence Similarity

Influence-based Similarity: Co-influence Kernel on Influence Graph
- Non-propagating heat diffusion kernel H_i for each influence graph IG_i (one hop)

Influence-based Similarity Co-influence Kernel on Influence Graph

Influence-based Similarity Step 3: Compute Propagating Co-influence Kernel on Influence Graph
(Figure: Philip S. Yu and his co-authors with more than 45 co-publications)

Influence-based Similarity Step 4: Partition Activities into Clusters
(Figure: Philip S. Yu and his co-authors with more than 45 co-publications)

Influence-based Similarity: Propagate Heat Distribution
Initialize the heat distribution f_ij(0) for each cluster c_ij in each influence graph IG_i.

Influence-based Similarity Step 5: Compute Influence Score Based on Co-influence Model

Influence-based Similarity Step 6: Compute Co-influence Similarity
(Figure: Philip S. Yu and his co-authors with more than 45 co-publications)

Influence-based Similarity Step 6: Compute Co-influence Similarity
- Co-influence similarity matrix W_i for each influence graph IG_i
Step 7: Compute Unified Co-influence based Similarity
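The unification formula is not preserved in this transcript. A plausible reading, given that the algorithm later updates N + 1 weights, is a weighted sum of the self-influence similarity matrix and the N co-influence similarity matrices. The sketch below assumes exactly that; the function name and the toy values are hypothetical, and any constraint the paper places on the weights is not enforced here.

```python
def unified_similarity(matrices, weights):
    """Combine N + 1 similarity matrices into one weighted matrix.

    Assumes a plain weighted sum; the slide does not show the exact formula.
    """
    if len(matrices) != len(weights):
        raise ValueError("need exactly one weight per similarity matrix")
    n = len(matrices[0])
    return [[sum(w * m[i][j] for w, m in zip(weights, matrices))
             for j in range(n)]
            for i in range(n)]


# Toy values (made up): two members, one influence graph.
w0 = [[1.0, 0.6], [0.6, 1.0]]   # self-influence similarity
w1 = [[1.0, 0.2], [0.2, 1.0]]   # co-influence similarity W_1 from IG_1
unified = unified_similarity([w0, w1], weights=[0.5, 0.5])
print(unified)
```

With equal weights the off-diagonal entry is simply the average of the two similarities; shifting weight toward one matrix makes that network dominate the clustering.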

SI-Clustering Algorithm: What is it?
- Initialization: take the most centrally located point in a cluster as its centroid and assign the rest of the points to their closest centroids.
- Iteration: calculate the clustering objective, update the N + 1 weights, and repeat until the clustering converges.

SI-Clustering Algorithm Cont. Initialization

SI-Clustering Algorithm Cont. Vertex Assignment and Centroid Update
- Update each centroid with the most centrally located vertex in its cluster.
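One assignment and update round might look like the sketch below. It assumes a precomputed vertex similarity matrix with the largest values on the diagonal, and it reads "most centrally located" as "highest total similarity to the other cluster members". Both function names and the toy matrix are hypothetical.

```python
def assign_vertices(similarity, centroids):
    """Assign every vertex to the centroid it is most similar to."""
    clusters = {c: [] for c in centroids}
    for v in range(len(similarity)):
        best = max(centroids, key=lambda c: similarity[v][c])
        clusters[best].append(v)
    return clusters


def update_centroids(similarity, clusters):
    """Replace each centroid with the most centrally located cluster member."""
    new_centroids = []
    for old, members in clusters.items():
        if not members:
            # Keep the old centroid if nothing was assigned to it.
            new_centroids.append(old)
            continue
        # Centrality is taken as total similarity to the cluster's members.
        new_centroids.append(
            max(members, key=lambda v: sum(similarity[v][u] for u in members)))
    return new_centroids


# Toy similarity matrix (made up): vertices 0-2 and 3-4 form two groups.
S = [
    [1.0, 0.9, 0.8, 0.1, 0.2],
    [0.9, 1.0, 0.7, 0.2, 0.1],
    [0.8, 0.7, 1.0, 0.1, 0.1],
    [0.1, 0.2, 0.1, 1.0, 0.9],
    [0.2, 0.1, 0.1, 0.9, 1.0],
]
clusters = assign_vertices(S, centroids=[1, 3])
centroids = update_centroids(S, clusters)
print(clusters, centroids)
```

Starting from centroids 1 and 3, the first group's centroid moves to vertex 0, which has the highest total similarity to its cluster; a full run would repeat both steps, together with the weight update, until the assignment stops changing.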

SI-Clustering Algorithm Cont. Clustering Objective Function

SI-Clustering Algorithm Cont. Clustering Objective Function Cont.

SI-Clustering Algorithm Cont. Clustering Objective Function Cont.
- Simplified, the objective breaks into three tasks: (1) cluster assignment, (2) centroid update, (3) weight adjustment.
- Tasks (1) and (2) are common to all partitioning clustering algorithms.

SI-Clustering Algorithm Cont. Parameter-based Optimization

SI-Clustering Algorithm Cont. Parameter-based Optimization Cont.

SI-Clustering Algorithm Cont. Adaptive Weight Adjustment & Clustering Algorithm
Solving the nonlinear parametric programming problem (NPPP) involves two parts:
(1) find a parameter β such that F(β) = 0, which makes the NPPP equivalent to the nonlinear fractional programming problem (NFPP);
(2) given that β, solve a polynomial programming problem in the original variables.
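The two-part procedure has the same shape as the classical Dinkelbach iteration for fractional programs, where F(β) = max over x of f(x) - β g(x) and the optimal ratio is the β with F(β) = 0. The sketch below shows that generic iteration on a toy problem. It is not the paper's solver: the inner step is a brute-force search over a finite candidate list instead of polynomial programming, and `f`, `g` and the candidate grid are made up.

```python
def dinkelbach(candidates, f, g, tol=1e-9, max_iter=100):
    """Maximize f(x) / g(x) over candidates, assuming g(x) > 0 everywhere."""
    x = candidates[0]
    beta = f(x) / g(x)
    for _ in range(max_iter):
        # Part (2): for the current beta, solve the parametric problem
        # max f(x) - beta * g(x). Brute force stands in for a real solver.
        x = max(candidates, key=lambda c: f(c) - beta * g(c))
        value = f(x) - beta * g(x)      # this is F(beta)
        if abs(value) < tol:            # Part (1): F(beta) = 0 reached
            break
        beta = f(x) / g(x)
    return x, beta


# Toy fractional objective (made up): a single weight w on a grid in [0, 1].
grid = [i / 10 for i in range(11)]
numerator = lambda w: 1 + 2 * w
denominator = lambda w: 1 + w * w

best_w, best_ratio = dinkelbach(grid, numerator, denominator)
print(best_w, best_ratio)
```

On this grid the iteration stops at w = 0.6, the same point a direct search over the ratio finds. The appeal of the parametric form is that each inner problem has no fraction in it, which is what makes it easier to solve than the original.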

Evaluation: Datasets
- Amazon product co-purchasing network: 20,000 products; activity graphs: product category graph and customer review graph.
- DBLP bibliography data:
  - Full version: 964,166 authors; activity graphs: Conference and Keyword.
  - Subset: 100,000 authors; activity graphs: Conference and Keyword.

Evaluation Cont. Baseline Methods
- Algorithms compared: BAGC, SA-Cluster, Inc-Cluster, W-Cluster
- Measures: Density, Entropy, Davies-Bouldin Index (DBI)
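The slide lists the measures without their formulas. As an illustration of the first one, the sketch below assumes the definition of density commonly used in the graph clustering literature: the fraction of edges whose two endpoints fall in the same cluster. The function name and toy data are hypothetical, and the paper's exact formula may differ.

```python
def density(edges, assignment):
    """Fraction of edges whose two endpoints share a cluster."""
    if not edges:
        return 0.0
    inside = sum(1 for u, v in edges if assignment[u] == assignment[v])
    return inside / len(edges)


# Toy graph (made up): a triangle 0-1-2 joined to the pair 3-4 by one edge.
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4)]
assignment = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1}   # vertex -> cluster id

score = density(edges, assignment)
print(score)
```

Only the bridge edge (2, 3) crosses clusters here, so four of the five edges count as internal. Higher density means more of the collaboration structure is kept inside clusters.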

Evaluation Cont. Cluster quality evaluation
- Dataset: 200,000 Amazon products.
- Number of clusters: K = 40, 60, 80, 100.

Evaluation Cont. Cluster quality evaluation Cont.
- Dataset: DBI on DBLP with 100,000 authors.
- Number of clusters: K = 400, 600, 800, 1000.

Evaluation Cont. Cluster quality evaluation Cont.
- Dataset: DBI on DBLP with 964,166 authors.
- Number of clusters: K = 4000, 6000, 8000, 10000.

Evaluation Cont. Cluster efficiency evaluation

Evaluation Cont. Cluster convergence
- Observation: as the iterations proceed, the social weight and the keyword weight both increase, while the conference weight decreases.
- Explanation: people who publish often in the same conferences may still work on different research topics, whereas people whose papers share many keywords usually work on the same topics and are therefore more likely to collaborate as co-authors.

Evaluation Cont. Case Study

Conclusion
(Diagram; recoverable labels: links, entities, static activities, dynamic activities, influence-based model, SI-Clustering, compute vertex similarity, update centroid, evaluation. A sophisticated nonlinear fractional programming problem is turned into a straightforward nonlinear parametric programming problem.)

Conclusion Cont.
- Integrated different types of links, entities, static attributes and dynamic activities from different networks into a unifying influence-based model.
- Proposed an iterative learning algorithm.
- Transformed a sophisticated nonlinear fractional programming problem of multiple weights into a straightforward nonlinear parametric programming problem of a single variable to speed up the clustering process.

Thanks! Q&A?
