Communities in Heterogeneous Networks Chapter 4 1 Chapter 4, Community Detection and Mining in Social Media. Lei Tang and Huan Liu, Morgan & Claypool,

Slides:



Advertisements
Similar presentations
Consistent Bipartite Graph Co-Partitioning for High-Order Heterogeneous Co-Clustering Tie-Yan Liu WSM Group, Microsoft Research Asia Joint work.
Advertisements

Community Detection and Graph-based Clustering
Cluster Analysis: Basic Concepts and Algorithms
1 CSE 980: Data Mining Lecture 16: Hierarchical Clustering.
Hierarchical Clustering, DBSCAN The EM Algorithm
Multi-label Relational Neighbor Classification using Social Context Features Xi Wang and Gita Sukthankar Department of EECS University of Central Florida.
Community Detection with Edge Content in Social Media Networks Paper presented by Konstantinos Giannakopoulos.
Data Mining Cluster Analysis: Basic Concepts and Algorithms
Community Detection and Evaluation
Online Social Networks and Media. Graph partitioning The general problem – Input: a graph G=(V,E) edge (u,v) denotes similarity between u and v weighted.
Clustering II CMPUT 466/551 Nilanjan Ray. Mean-shift Clustering Will show slides from:
Social Media Mining Chapter 5 1 Chapter 5, Community Detection and Mining in Social Media. Lei Tang and Huan Liu, Morgan & Claypool, September, 2010.
Data Mining Techniques: Clustering
Nodes, Ties and Influence
3.3 Network-Centric Community Detection
Author: Jie chen and Yousef Saad IEEE transactions of knowledge and data engineering.
10/11/2001Random walks and spectral segmentation1 CSE 291 Fall 2001 Marina Meila and Jianbo Shi: Learning Segmentation by Random Walks/A Random Walks View.
Lecture 21: Spectral Clustering
Lecture 8 Communities Slides modified from Huan Liu, Lei Tang, Nitin Agarwal.
Discovering Overlapping Groups in Social Media Xufei Wang, Lei Tang, Huiji Gao, and Huan Liu Arizona State University.
A General Model for Relational Clustering Bo Long and Zhongfei (Mark) Zhang Computer Science Dept./Watson School SUNY Binghamton Xiaoyun Wu Yahoo! Inc.
© University of Minnesota Data Mining for the Discovery of Ocean Climate Indices 1 CSci 8980: Data Mining (Fall 2002) Vipin Kumar Army High Performance.
Social Position & Social Role Lei Tang 2009/02/13.
Prénom Nom Document Analysis: Data Analysis and Clustering Prof. Rolf Ingold, University of Fribourg Master course, spring semester 2008.
Dimension reduction : PCA and Clustering Slides by Agnieszka Juncker and Chris Workman.
Segmentation Graph-Theoretic Clustering.
What is Cluster Analysis?
Ranking by Odds Ratio A Probability Model Approach let be a Boolean random variable: document d is relevant to query q otherwise Consider document d as.
Clustering with Bregman Divergences Arindam Banerjee, Srujana Merugu, Inderjit S. Dhillon, Joydeep Ghosh Presented by Rohit Gupta CSci 8980: Machine Learning.
Radial Basis Function Networks
Graph-based consensus clustering for class discovery from gene expression data Zhiwen Yum, Hau-San Wong and Hongqiang Wang Bioinformatics, 2007.
COMMUNITIES IN MULTI-MODE NETWORKS 1. Heterogeneous Network Heterogeneous kinds of objects in social media – YouTube Users, tags, videos, ads – Del.icio.us.
Domain decomposition in parallel computing Ashok Srinivasan Florida State University COT 5410 – Spring 2004.
Image Segmentation Rob Atlas Nick Bridle Evan Radkoff.
Lecture 18 Community structures Slides modified from Huan Liu, Lei Tang, Nitin Agarwal.
Data Mining and Machine Learning Lab Network Denoising in Social Media Huiji Gao, Xufei Wang, Jiliang Tang, and Huan Liu Data Mining and Machine Learning.
Community Evolution in Dynamic Multi-Mode Networks Lei Tang, Huan Liu Jianping Zhang Zohreh Nazeri Danesh Zandi & Afshin Rahmany Spring 12SRBIAU, Kurdistan.
Chapter 14: SEGMENTATION BY CLUSTERING 1. 2 Outline Introduction Human Vision & Gestalt Properties Applications – Background Subtraction – Shot Boundary.
CSIE Dept., National Taiwan Univ., Taiwan
DATA MINING LECTURE 13 Pagerank, Absorbing Random Walks Coverage Problems.
Data Mining Practical Machine Learning Tools and Techniques Chapter 4: Algorithms: The Basic Methods Section 4.6: Linear Models Rodney Nielsen Many of.
Andreas Papadopoulos - [DEXA 2015] Clustering Attributed Multi-graphs with Information Ranking 26th International.
Detecting Communities Via Simultaneous Clustering of Graphs and Folksonomies Akshay Java Anupam Joshi Tim Finin University of Maryland, Baltimore County.
1 CSC 594 Topics in AI – Text Mining and Analytics Fall 2015/16 6. Dimensionality Reduction.
SemiBoost : Boosting for Semi-supervised Learning Pavan Kumar Mallapragada, Student Member, IEEE, Rong Jin, Member, IEEE, Anil K. Jain, Fellow, IEEE, and.
About Me Swaroop Butala  MSCS – graduating in Dec 09  Specialization: Systems and Databases  Interests:  Learning new technologies  Application of.
Mining information from social media
CS 590 Term Project Epidemic model on Facebook
Relation Strength-Aware Clustering of Heterogeneous Information Networks with Incomplete Attributes ∗ Source: VLDB.
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ Data Mining: Cluster Analysis This lecture node is modified based on Lecture Notes for Chapter.
Ultra-high dimensional feature selection Yun Li
3.3 Network-Centric Community Detection  Network-Centric Community Detection –consider the global topology of a network. –It aims to partition nodes of.
Scalable Learning of Collective Behavior Based on Sparse Social Dimensions Lei Tang, Huan Liu CIKM ’ 09 Speaker: Hsin-Lan, Wang Date: 2010/02/01.
Example Apply hierarchical clustering with d min to below data where c=3. Nearest neighbor clustering d min d max will form elongated clusters!
Algebraic Techniques for Analysis of Large Discrete-Valued Datasets 
Spectral Clustering Shannon Quinn (with thanks to William Cohen of Carnegie Mellon University, and J. Leskovec, A. Rajaraman, and J. Ullman of Stanford.
Mining Coherent Dense Subgraphs across Multiple Biological Networks Vahid Mirjalili CSE 891.
Document Clustering Based on Non-negative Matrix Factorization
Algorithms and Networks
Community detection in graphs
Data Mining Cluster Analysis: Advanced Concepts and Algorithms
Segmentation Graph-Theoretic Clustering.
Step-By-Step Instructions for Miniproject 2
CSE572, CBS598: Data Mining by H. Liu
Discovering Functional Communities in Social Media
Spectral Clustering Eric Xing Lecture 8, August 13, 2010
3.3 Network-Centric Community Detection
CSE572: Data Mining by H. Liu
Analysis of Large Graphs: Overlapping Communities
Presentation transcript:

Communities in Heterogeneous Networks Chapter 4 1 Chapter 4, Community Detection and Mining in Social Media. Lei Tang and Huan Liu, Morgan & Claypool, September, 2010.

Heterogeneous Networks Heterogeneous Network Heterogeneous Interactions Multi-Dimensional Network Heterogeneous Nodes Multi-Mode Network 2

Multi-Dimensional Networks Communications in social media are multi-dimensional Networks often involve heterogeneous connections – E.g. at YouTube, two users can be connected through friendship connection, communications, subscription/Fans, chatter in comments, etc. a.k.a. multi-relational networks, multiplex networks, labeled graphs 3

Multi-Mode Networks Interactions in social media may involve heterogeneous types of entities Networks involve multiple modes of nodes – Within-mode interaction, between-mode interaction – Different types of interactions between different modes 4

Why Does Heterogeneity Matter Social media introduces heterogeneity It calls for solutions to community detection in heterogeneous networks – Interactions in social media are noisy – Interactions in one mode or one dimension might be too noisy to detect meaningful communities – Not all users are active in all dimensions or with different modes Need integration of interactions at multiple dimensions or modes 5

COMMUNITIES IN MULTI- DIMENSIONAL NETWORKS 6

Communities in Multi-Dimensional Networks A p-dimension network An example of a 3-dimensional network Goal: integrate interactions at multiple dimensions to find reliable community structures 7

A Unified View for Community Partition (from Chapter 3) Latent space models, block models, spectral clustering, and modularity maximization can be unified as 8

Integration Strategies 9

Network Integration Convert a multi-dimensional network into a single- dimensional network Different types of interaction strengthen one actor’s connection The average strength is Spectral clustering with a p-dimensional network becomes 10

Network Integration Example 11

Utility Integration Integration by averaging the utility matrix Equivalent to optimizing average utility function For spectral clustering, Hence, the objective of spectral clustering becomes 12

Utility Integration Example 13

Utility Integration Example Spectral clustering based on utility integration leads to a partition of two communities: {1, 2, 3, 4} and {5, 6, 7, 8, 9} 14

Feature Integration Soft community indicators extracted from each type of interactions are structural features associated with nodes Integration can be done at the feature level A straightforward approach: take the average of structural features Direct feature average is not sensible Need comparable coordinates among different dimensions 15

Problem with Direct Feature Average Two communities: {1, 2, 3, 7, 9} {4, 5, 6, 8} 16

Proper way of Feature Integration Structural features of different dimensions are highly correlated after a certain transformation Multi-dimensional integration can be conducted after we map the structural features into the same coordinates – Find the transformation by maximizing pairwise correlation – Suppose the transformation associated with dimension (i) is – The average of structural features is – The average is shown to be proportional to the top left singular vector of data X by concatenating structural features of each dimension 17

Feature Integration Example The top 2 left singular vectors of X are Two Communities: {1, 2, 3, 4} {5, 6, 7, 8, 9} 18

Partition Integration Combine the community partitions obtained from each type of interaction – a.k.a. cluster ensemble Cluster-based Similarity Partitioning Algorithm (CPSA) – Similarity is 1 is two objects belong to the same group, 0 otherwise – The similarity between nodes is computed as – The entry is essentially the probability that two nodes are assigned into the same community – Then apply similarity-based community detection methods to find clusters 19

CPSA Example Applying spectral clustering to the above matrix results in two communities: {1, 2, 3, 4} and { 5, 6, 7, 8, 9} 20

More Efficient Partition Integration CPSA requires the computation of a dense similarity matrix – Not scalable An alternative approach: Partition Feature Integration – Consider partition information as features – Apply a similar procedure as in feature integration A detailed procedure: – Given partitions of each dimension – Construct a sparse partition feature matrix – Take the top left singular vectors of Y as soft community indicator – Apply k-means to the singular vectors to find community partition 21

Partition Integration Example SVD k-means {1, 2, 3, 4} {5, 6, 7, 8, 9} Y is sparse 22

Comparison of Multi-Dimensional Integration Strategies Network Integration Utility Integration Feature Integration Partition Integration Tuning weights for different types of interactions XXXX Sensitivity to noise YesOKRobustYes Clustering quality badGood OK Computational cost Low HighExpensive 23

COMMUNITIES IN MULTI-MODE NETWORKS 24

Co-clustering on 2-mode Networks Multi-mode networks involve multiple types of entities A 2-mode network is a simple form of multi-mode network – E.g., user-tag network in social media – A.k.a., affiliation network The graph of a 2-mode network is a bipartite – All edges are between users and tags – No edges between users or between tags 25

Adjacency Matrix of 2-Mode Network Each mode represents one type of entity; not necessarily a square matrix 26

Co-Clustering Co-clustering: find communities in two modes simultaneously – a.k.a. biclustering – Output both communities of users and communities of tags for a user- tag network A straightforward Approach: Minimize the cut in the graph The minimum cut is 1; a trivial solution is not desirable Need to consider the size of communities 27

Spectral Co-Clustering Minimize the normalized cut in a bipartite graph – Similar as spectral clustering for undirected graph Compute normalized adjacency matrix Compute the top singular vectors of the normalized adjacency matrix Apply k-means to the joint community indicator Z to obtain communities in user mode and tag mode, respectively 28

Spectral Co-Clustering Example Two communities: { u 1,u 2, u 3, u 4, t 1, t 2, t 3 } { u 5, u 6, u 7, u 8, u 9, t 4, t 5, t 6, t 7 } Two communities: { u 1,u 2, u 3, u 4, t 1, t 2, t 3 } { u 5, u 6, u 7, u 8, u 9, t 4, t 5, t 6, t 7 } k-means 29

Generalization to A Star Structure Spectral co-clustering can be interpreted as a block model approximation to normalized adjacency matrix generalize to a star structure S (1) corresponds to the top left singular vectors of the following matrix S (1) corresponds to the top left singular vectors of the following matrix 30

Generalization to Multi-Mode Networks For a multi-mode network, compute the soft community indicator of each mode one by one It becomes a star structure when looking at one mode vs. other modes Community Detection in Multi-Mode Networks – Normalize interaction matrix – Iteratively update community indicator as the top left singular vectors – Apply k-means to the community indicators to find partitions in each mode 31

Book Available at Morgan & claypool Publishers Amazon If you have any comments, please feel free to contact: Lei Tang, Yahoo! Labs, Huan Liu, ASU 32