Paper Presentation Social influence based clustering of heterogeneous information networks Qiwei Bao & Siqi Huang.

Presentation transcript:

What’s it all about?
• There is growing interest in clustering a social network of people based on both their social relationships and their participation in information networks.
• This paper uses the concept of social influence to improve clustering quality.
• Social influence studies how the impact of people’s activities and opinions propagates to other members of a social network through direct and indirect social connections.

Keywords
• Graph Clustering
• Heterogeneous Network
• Kernels
• Social Influence

Today’s Presentation
Part One:
• Definitions
• Concepts
• Kernels
• Similarity Measurement
Part Two:
• Clustering Algorithm (SI-Cluster)
• Parameter-based Optimization
• Experiments
• Conclusions

Problem Statement: Two Kinds of Influence
• Model activities, events, and experiences as information networks in addition to the social relationships of people.
• Social influence can propagate through these networks in two ways:
  1. Self-influence: people influence one another based solely on the social network.
  2. Co-influence: people influence one another through their participation in activity or event networks.

Problem Statement: Three Types of Graphs/Networks
• Social Collaboration Network (Social Graph, SG): SG = (U, E)
  - U: the set of vertices, i.e. the members of the social network (e.g., authors, customers).
  - E: the set of edges denoting the collaborative relationships between members.
  - N_SG: the size of U.

Problem Statement: Three Types of Graphs/Networks
• Associated Activity Network (Activity Graph, AG_i): AG_i = (V_i, S_i)
  - V_i: the activity vertices in the i-th associated activity network AG_i.
  - S_i: the weighted edges representing the similarity between two activity vertices.
  - N_AG_i: the size of the activity vertex set V_i.

Problem Statement: Three Types of Graphs/Networks
• Influence Network (Influence Graph, IG_i)

Problem Statement: Heterogeneous Network
• When both the self-influence and the co-influence networks are considered, the network as a whole is heterogeneous.

Problem Statement: Heterogeneous Network

Problem Statement: Social Influence-based graph Clustering (SI-Cluster)
• Given: a social graph, multiple activity graphs, and the corresponding influence graphs.
• Problem: partition the member vertices U into K disjoint clusters U_i.
• A desired clustering result should achieve a good balance between two properties:
  (1) Vertices within one cluster should have similar collaborative patterns among themselves and similar interaction patterns with the activity networks.
  (2) Vertices in different clusters should have dissimilar collaborative patterns and dissimilar interaction patterns with the activities.

Problem Statement: Social Influence-based graph Clustering (SI-Cluster)
• The clustering algorithm should be fast and should scale with the number of influence graphs and the size of the activity graphs.

Dataset: DBLP
• It consists of two types of entities (authors and conferences) and three types of links (co-authorship, author-conference, and conference similarity).

Influence-based Similarity Step 1: Heat Diffusion on Social Graph
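
The formula on this slide did not survive in the transcript. As a rough illustration of heat diffusion on a graph, the sketch below computes the standard Laplacian heat kernel exp(-αtL) with NumPy. The function name `heat_kernel`, the parameters `alpha` and `t`, and the toy path graph are assumptions made for illustration; the paper's own kernel may be defined differently.

```python
import numpy as np

def heat_kernel(adjacency, alpha, t):
    """Return exp(-alpha * t * L) for the combinatorial Laplacian L = D - A.

    Assumes an undirected graph, so the adjacency matrix is symmetric.
    """
    A = np.asarray(adjacency, dtype=float)
    L = np.diag(A.sum(axis=1)) - A
    # L is symmetric, so L = V diag(eigvals) V^T and the matrix
    # exponential only needs the exponential of each eigenvalue.
    eigvals, eigvecs = np.linalg.eigh(L)
    return (eigvecs * np.exp(-alpha * t * eigvals)) @ eigvecs.T

# Toy social graph (hypothetical): a path 0 - 1 - 2 with unit edge weights.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
H = heat_kernel(A, alpha=0.5, t=1.0)

f0 = np.array([1.0, 0.0, 0.0])  # all heat starts at vertex 0
f1 = H @ f0                     # heat distribution after time t
```

In this toy setting the total heat is conserved and decays with distance from the source vertex, which is the intuition behind using diffused heat as an influence score.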

Influence-based Similarity Step 2: Compute Self-influence Similarity

Influence-based Similarity Co-influence Kernel on Influence Graph
• Non-propagating heat diffusion kernel H_i for each influence graph IG_i (one hop)

Influence-based Similarity Co-influence Kernel on Influence Graph

Influence-based Similarity Step 3: Compute Propagating Co-influence Kernel on Influence Graph
Philip S. Yu and his co-authors with more than 45 co-publications

Influence-based Similarity Step 4: Partition Activities into Clusters
Philip S. Yu and his co-authors with more than 45 co-publications

Influence-based Similarity Propagate Heat Distribution
• Initialize the heat distribution f_ij(0) for each cluster c_ij in each influence graph IG_i.

Influence-based Similarity Step 5: Compute Influence Score Based on Co-influence Model

Influence-based Similarity Step 6: Compute Co-influence Similarity
Philip S. Yu and his co-authors with more than 45 co-publications

Influence-based Similarity Step 6: Compute Co-influence Similarity
• Co-influence similarity matrix W_i for each influence graph IG_i
Step 7: Compute Unified Co-influence based Similarity
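
The unification formula is not in the transcript. One plausible reading, given that the algorithm later updates N + 1 weights, is a weighted sum of the self-influence similarity matrix and the N co-influence similarity matrices. The sketch below assumes exactly that; the name `unified_similarity`, the symbol `W0` for the self-influence matrix, and the example weights are hypothetical.

```python
import numpy as np

def unified_similarity(W0, co_influence, weights):
    """Weighted sum of one self-influence matrix and N co-influence matrices.

    `weights` holds N + 1 values: the first for W0, the rest for W_1..W_N.
    """
    mats = [np.asarray(W0, dtype=float)]
    mats += [np.asarray(W, dtype=float) for W in co_influence]
    if len(weights) != len(mats):
        raise ValueError("need one weight per similarity matrix (N + 1 in total)")
    return sum(w * M for w, M in zip(weights, mats))

# Two members, one influence graph (N = 1); all values are made up.
W0 = np.array([[1.0, 0.2], [0.2, 1.0]])   # self-influence similarity
W1 = np.array([[1.0, 0.8], [0.8, 1.0]])   # co-influence similarity
W = unified_similarity(W0, [W1], weights=[0.5, 1.5])
```

Raising the weight of one matrix makes the clustering lean more on that network, which is what the later weight-adjustment step tunes automatically.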

SI-Clustering Algorithm: What is it?
• Initialization: take the most centrally located point in a cluster as its centroid and assign the rest of the points to their closest centroids.
• Clustering objective: calculate the objective and update the N + 1 weights.
• Iterate until clustering convergence.

SI-Clustering Algorithm Cont. Initialization

SI-Clustering Algorithm Cont. Vertex Assignment and Centroid Update
• Update each centroid with the most centrally located vertex in its cluster.
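
The assignment and centroid-update steps resemble a k-medoids loop run on a similarity matrix. The following is a minimal sketch under that assumption, not the authors' implementation: "most centrally located" is interpreted here as the member with the largest total similarity to the other members of its cluster, and all function names are hypothetical.

```python
import numpy as np

def assign_vertices(W, centroids):
    """Assign every vertex to the centroid it is most similar to."""
    return np.argmax(W[:, centroids], axis=1)

def update_centroids(W, labels, centroids):
    """Per cluster, pick the member with the largest within-cluster similarity."""
    updated = []
    for c, old in enumerate(centroids):
        members = np.flatnonzero(labels == c)
        if members.size == 0:      # keep the old centroid if a cluster empties
            updated.append(old)
            continue
        within = W[np.ix_(members, members)].sum(axis=1)
        updated.append(int(members[np.argmax(within)]))
    return updated

def cluster(W, centroids, max_iter=20):
    """Alternate assignment and centroid update until the centroids stop changing."""
    W = np.asarray(W, dtype=float)
    centroids = list(centroids)
    for _ in range(max_iter):
        labels = assign_vertices(W, centroids)
        updated = update_centroids(W, labels, centroids)
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids

# Toy similarity matrix with two obvious groups: {0, 1} and {2, 3}.
W = np.array([[1.0, 0.9, 0.1, 0.1],
              [0.9, 1.0, 0.1, 0.2],
              [0.1, 0.1, 1.0, 0.8],
              [0.1, 0.2, 0.8, 1.0]])
labels, centroids = cluster(W, centroids=[0, 2])
```

The sketch omits the weight-adjustment step, so the similarity matrix stays fixed across iterations.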

SI-Clustering Algorithm Cont. Clustering Objective Function

SI-Clustering Algorithm Cont. Clustering Objective Function Cont.

SI-Clustering Algorithm Cont. Clustering Objective Function Cont.
• Simplified, each iteration has three steps: (1) cluster assignment, (2) centroid update, (3) weight adjustment.
• Steps (1) and (2) are common to all partitioning clustering algorithms.

SI-Clustering Algorithm Cont. Parameter-based Optimization

SI-Clustering Algorithm Cont. Adaptive Weight Adjustment & Clustering Algorithm
Solving the nonlinear parametric programming problem (NPPP) takes two steps:
(1) find a parameter β with F(β) = 0, which makes the NPPP equivalent to the nonlinear fractional programming problem (NFPP);
(2) given that β, solve a polynomial programming problem in the original variables.
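
The two steps above match the general shape of a Dinkelbach-style iteration for fractional programs: to maximize f(x)/g(x), repeatedly solve the parametric problem max f(x) - β·g(x) and update β until F(β) = 0. The sketch below applies that generic scheme to a toy problem over a finite candidate set. It is an illustration under these assumptions, not the paper's weight-update procedure; `dinkelbach`, `f`, `g`, and the candidate values are hypothetical.

```python
def dinkelbach(candidates, f, g, tol=1e-12, max_iter=100):
    """Maximize f(x) / g(x) over `candidates`, assuming g(x) > 0 everywhere."""
    x = candidates[0]
    beta = f(x) / g(x)
    for _ in range(max_iter):
        # Parametric subproblem: maximize f(x) - beta * g(x) for fixed beta.
        x = max(candidates, key=lambda c: f(c) - beta * g(c))
        F_beta = f(x) - beta * g(x)
        if abs(F_beta) < tol:      # F(beta) = 0, so beta is the optimal ratio
            break
        beta = f(x) / g(x)
    return x, beta

# Toy fractional objective: (x + 1) / (x^2 + 1) on a small grid.
grid = [0.0, 0.5, 1.0, 1.5, 2.0]
best_x, best_ratio = dinkelbach(grid, f=lambda x: x + 1, g=lambda x: x * x + 1)
```

On this grid the ratio peaks at x = 0.5 with value 1.2, and the iteration reaches it after one update of β.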

Evaluation: Datasets
• Amazon product co-purchasing network: 20,000 products; activity graphs: product category graph and customer review graph.
• DBLP bibliography data:
  - Full version: 964,166 authors; activity graphs: conference and keyword.
  - Subset: 100,000 authors; activity graphs: conference and keyword.

Evaluation Cont. Baseline Methods
• Algorithms compared:
  - BAGC
  - SA-Cluster
  - Inc-Cluster
  - W-Cluster
• Measures:
  - Density
  - Entropy
  - Davies-Bouldin Index
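
The density formula is missing from the transcript. A common definition in the graph-clustering literature is the fraction of edges whose endpoints fall in the same cluster, and the sketch below assumes that definition. The function name and the toy data are hypothetical.

```python
def density(edges, labels):
    """Fraction of edges whose two endpoints share a cluster label."""
    if not edges:
        return 0.0
    inside = sum(1 for u, v in edges if labels[u] == labels[v])
    return inside / len(edges)

# Four vertices in two clusters; two of the four edges stay inside a cluster.
edges = [(0, 1), (1, 2), (2, 3), (0, 2)]
labels = [0, 0, 1, 1]
d = density(edges, labels)
```

Under this definition a higher density means more of the collaboration structure is kept within clusters.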

Evaluation Cont. Cluster quality evaluation
• Dataset: 200,000 Amazon products.
• Number of clusters: K = 40, 60, 80, 100.

Evaluation Cont. Cluster quality evaluation Cont.
• Dataset: DBI on DBLP with 100,000 authors.
• Number of clusters: K = 400, 600, 800,

Evaluation Cont. Cluster quality evaluation Cont.
• Dataset: DBI on DBLP with 964,166 authors.
• Number of clusters: K = 4000, 6000, 8000,

Evaluation Cont. Cluster efficiency evaluation

Evaluation Cont. Cluster convergence
• Observation: as the iterations proceed, the social weight and the keyword weight both increase, while the conference weight decreases.
• Explanation: people with many publications in the same conferences may still work on different research topics, whereas people with many papers sharing the same keywords usually work on the same topics and are therefore more likely to collaborate as co-authors.

Evaluation Cont. Case Study

Conclusion
[Diagram: links, entities, static and dynamic activities feed an influence-based model; SI-Clustering computes vertex similarity and updates centroids; a sophisticated nonlinear fractional programming problem becomes a straightforward nonlinear parametric programming problem.]

Conclusion Cont.
• Integrated different types of links, entities, static attributes, and dynamic activities from different networks into a unifying influence-based model.
• Proposed an iterative learning algorithm.
• Transformed a sophisticated nonlinear fractional programming problem of multiple weights into a straightforward nonlinear parametric programming problem of a single variable to speed up the clustering process.

Thanks! Q&A?