MEIKE: Influence-based Communities in Networks

Yao Zhang, Bijaya Adhikari, Steve Jan and B. Aditya Prakash
Department of Computer Science, Virginia Tech
SDM, Houston, April 27, 2017

Outline: Motivation, Problem Definition, Our Proposed Methods, Experiments, Conclusion
Zhang, Adhikari, Jan and Prakash, SDM 2017

Motivation: Communities
Communities are important. Nodes in the same community are cohesive and behave similarly: in biological networks, proteins in a community have the same functionality; in social networks, members share common interests.
[Figures: communities in biological networks; communities in social networks]
How can we explore community structure by factoring in the different roles of nodes during diffusion?

Motivation: Diffusion
How can we explore community structure by factoring in the different roles of nodes during diffusion?
Diffusion is the spread of a contagion over an underlying network. Examples include memes propagating on social networks and flu spreading over population contact networks.
[Figure: Twitter following network]

Motivation: Roles in Diffusion
How can we explore community structure by factoring in the different roles of nodes during diffusion?
Roles in diffusion:
Media nodes: nodes that boost diffusion (“bridges”). They get influenced and influence others.
Kernel nodes: influential nodes (“celebrities”).
Communities:
Kernel communities: kernel nodes on the same topic.
Ordinary communities: each corresponds to a kernel community.
How do we find these communities?
Example (Twitter following network): the members of a kernel community first belong to the same topic and then connect to one another. In addition, elonmusk, spacex, and TeslaMotors have similar connections to CNN and TEDchris.

Motivation: Roles in Diffusion
How can we explore community structure by factoring in the different roles of nodes during diffusion?
Traditional community detection algorithms are useful, but they give only a horizontal view of the community structure.
[Figure: Twitter following network with communities detected by NEWMAN’s algorithm, contrasted with media nodes, kernel communities and ordinary communities]

Challenges
How do we formally define media nodes and kernel communities?
How do we develop effective methods based on the problem formulation?

Defining Media Nodes
Properties of media nodes:
PM1: Upstream effect of diffusion, the capability of getting influenced by other nodes. It is the expected number of infected nodes in S, taken over all possible choices of the seed set (A1, A2, ...). For a given seed set A, the quantity counted is the number of infected nodes in S at the end of the diffusion.

Defining Media Nodes
Properties of media nodes:
PM1: Upstream effect of diffusion, the capability of getting influenced by other nodes: the expected number of infected nodes in S.
PM2: Downstream effect of diffusion, the capability of influencing other nodes: σ(S), the expected number of nodes S can infect. σ(S) is the same objective function as in the influence maximization problem.

Defining Media Nodes
Properties of media nodes:
PM1: Upstream effect of diffusion, the capability of getting influenced by other nodes: the expected number of infected nodes in S.
PM2: Downstream effect of diffusion, the capability of influencing other nodes: the expected number of nodes S can infect.
A media node set S has a high value of both. This is the full stream effect of diffusion, φ(S), which combines the upstream and downstream effects [defining equation not preserved in this transcript].
Media nodes are “bridges”: they get information and influence others. S is a media node set when it has a high full stream effect.
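To make the two properties concrete, here is a minimal Monte Carlo sketch. It is not the authors' code. It assumes the Independent Cascade diffusion model (the slides do not name the model at this point), and for the upstream effect it assumes that the seed set is a single node drawn uniformly at random. The function names and the graph format (a dict mapping u to {v: infection probability}) are hypothetical.

```python
import random


def all_nodes(graph):
    """Collect every node that appears as a source or a target."""
    nodes = set(graph)
    for nbrs in graph.values():
        nodes.update(nbrs)
    return nodes


def simulate_ic(graph, seeds, rng):
    """Run one Independent Cascade simulation and return the infected set.

    graph maps u -> {v: p_uv}, where p_uv is the (assumed) probability
    that u infects v once u itself becomes infected.
    """
    infected = set(seeds)
    frontier = list(seeds)
    while frontier:
        newly = []
        for u in frontier:
            for v, p in graph.get(u, {}).items():
                if v not in infected and rng.random() < p:
                    infected.add(v)
                    newly.append(v)
        frontier = newly
    return infected


def downstream(graph, S, runs=1000, seed=0):
    """Estimate sigma(S): expected number of infected nodes when S is the seed set."""
    rng = random.Random(seed)
    return sum(len(simulate_ic(graph, S, rng)) for _ in range(runs)) / runs


def upstream(graph, S, runs=1000, seed=0):
    """Estimate the expected number of infected nodes inside S.

    Assumption: the seed set is one node chosen uniformly at random,
    so the estimate averages over every possible single-node seed.
    """
    rng = random.Random(seed)
    target = set(S)
    nodes = sorted(all_nodes(graph))
    total = 0.0
    for a in nodes:
        hits = sum(len(simulate_ic(graph, [a], rng) & target) for _ in range(runs))
        total += hits / runs
    return total / len(nodes)
```

On a chain a -> b -> c, node b scores above zero on both measures while a mostly influences and c is mostly influenced, which matches the “bridge” intuition. The sketch counts the seeds themselves as infected, as is conventional for σ(S).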

Defining Kernel and Ordinary Communities
Properties of a kernel community (to be maximized):
PK1: Connectivity among themselves. Nodes in the same kernel community have more connections to one another.
PK2: Similarity w.r.t. media nodes. Nodes in the same kernel community should connect to similar media nodes. PK2 captures how information flows from kernel nodes to media nodes.
Property of an ordinary community: its nodes have more connections to its corresponding kernel community.

Problem Formulation
MEIKECOM: MEdIa and KErnel COMmunity detection
MEIKECOM-Media: find a set M that maximizes the full stream effect φ(M).
MEIKECOM-Kernel: find a kernel community set K = {K1, ..., Kl} that satisfies PK1 and PK2.
MEIKECOM-Ordinary: find the ordinary communities corresponding to the kernel communities.

Hardness of MEIKECOM
MEIKECOM-Media: NP-hard (downstream effect) and #P-hard (upstream effect); neither submodular nor supermodular.
MEIKECOM-Kernel: NP-hard, by reduction from the MAX-CLIQUE problem.

Proposed Methods: MEIKE Overview
Step 1 (MEIKE-Media): find media nodes, using a merge-based approach.
Step 2 (MEIKE-Kernel): find kernel communities, using an iterative approach.
Step 3 (MEIKE-Ordinary): find ordinary communities (leverage the idea …).

Step 1: MEIKE-Media
A merge-based approach:
Successively merge the edges (node pairs) that are unimportant to the full stream effect.
Maintain the overall full stream effect while merging.
The nodes that remain unmerged (the “singleton” nodes) are the ones with the highest full stream effect; these are the media nodes.
[Figure: successive merging yields a merged graph made of merged nodes and unmerged media nodes]

Step 1: MEIKE-Media
Q1: How do we quantify the impact of an edge on the full-stream effect?
Computing φ(S) directly is too expensive, because the full stream effect depends on both the downstream effect and the upstream effect.

Step 1: MEIKE-Media
Q1: How do we quantify the impact of an edge on the full-stream effect?
Computing φ(S) directly is too expensive, so we use a local effect instead.
φ_b(a) is the local effect of edge (a,b) on φ(a), i.e., the contribution of edge (a,b), with weight w_ab, towards φ(a).
We prove that φ_b(a) ∝ u_a w_ab, where u_a is the eigenscore of node a, taken from the eigenvector u with A u = λ1 u.
Once the eigenscores are computed, each local effect φ_b(a) can be computed in constant time. See the paper for details.

Step 1: MEIKE-Media
Q2: How do we maintain the overall full-stream effect based on local effects?
Create a new graph G’ in which edge (a,b) has weight u_a w_ab, since φ_b(a) ∝ u_a w_ab. G’ shows the contributions of edges to the full stream effect and is used for ease of analysis.
Leverage the idea from CoarseNet [Purohit+, KDD2014]: preserve the first eigenvalue λ1 of the new graph G’ when merging edges in G’.
[Figure: edge (a,b) has weight w_ab in the original graph G and weight u_a w_ab in the new graph G’]
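The construction of G’ can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the eigenscores are obtained by plain power iteration on A + I (a shift chosen here only to keep the iteration stable), the weights are assumed non-negative, the graph is assumed connected, and the names leading_eigen and reweight are hypothetical.

```python
def leading_eigen(w, iters=500):
    """Return (lambda_1, u) for the weighted adjacency structure w.

    w maps a -> {b: w_ab}. Power iteration runs on A + I, which has the
    same leading eigenvector as A but does not oscillate on bipartite
    graphs. Assumes non-negative weights and a connected graph.
    """
    nodes = sorted(set(w) | {b for nbrs in w.values() for b in nbrs})
    u = {n: 1.0 for n in nodes}
    for _ in range(iters):
        nxt = {a: u[a] + sum(wab * u[b] for b, wab in w.get(a, {}).items())
               for a in nodes}
        norm = sum(x * x for x in nxt.values()) ** 0.5
        u = {a: x / norm for a, x in nxt.items()}
    # After convergence A u = lambda_1 u, so the entry sums have the same ratio.
    au = {a: sum(wab * u[b] for b, wab in w.get(a, {}).items()) for a in nodes}
    lam = sum(au.values()) / sum(u.values())
    return lam, u


def reweight(w, u):
    """Build G': every edge (a, b) of G gets the new weight u_a * w_ab."""
    return {a: {b: u[a] * wab for b, wab in nbrs.items()}
            for a, nbrs in w.items()}
```

Calling reweight(w, leading_eigen(w)[1]) on a small graph shows the effect: edges that leave low-eigenscore nodes receive small weights in G’, which marks them as candidates for early merging.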

Step 1: MEIKE-Media
Q3: How do we merge so as to maintain the first eigenvalue of G’?
Merge the edges with the smallest change of λ1, following the merge definition from CoarseNet.
Define the change of λ1 after merging edge (a,b) on G’. Computing this change directly for all edges is expensive: the time complexity is O(|E|(|V|+|E|)).
Instead, we use matrix perturbation theory, up to a first-order approximation, to compute it. The time complexity is then linear for all edges.
See the paper for details.
[Figure: example of merging edge (a,b)]
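The slide's merge-specific formula is in the paper and is not reproduced here. The sketch below only illustrates the general first-order perturbation idea it relies on, using a simpler change than a merge: adjusting one symmetric edge weight by a small delta. For a symmetric matrix with unit-length leading eigenvector u, the first-order estimate is Δλ1 ≈ uᵀ(ΔA)u = 2·delta·u_a·u_b. All function names are hypothetical.

```python
def leading_eigen(w, iters=500):
    """Power iteration on A + I; returns (lambda_1, unit-length u).

    w maps a -> {b: w_ab}; assumes a symmetric, connected graph with
    non-negative weights.
    """
    nodes = sorted(set(w) | {b for nbrs in w.values() for b in nbrs})
    u = {n: 1.0 for n in nodes}
    for _ in range(iters):
        nxt = {a: u[a] + sum(wab * u[b] for b, wab in w.get(a, {}).items())
               for a in nodes}
        norm = sum(x * x for x in nxt.values()) ** 0.5
        u = {a: x / norm for a, x in nxt.items()}
    au = {a: sum(wab * u[b] for b, wab in w.get(a, {}).items()) for a in nodes}
    return sum(au.values()) / sum(u.values()), u


def first_order_delta(u, a, b, delta):
    """First-order estimate of the change in lambda_1 when the symmetric
    weight w_ab = w_ba changes by delta: u^T (dA) u = 2 * delta * u_a * u_b.
    Costs constant time per edge once u is known."""
    return 2.0 * delta * u[a] * u[b]


def with_edge_change(w, a, b, delta):
    """Return a copy of w with delta added to both w_ab and w_ba."""
    out = {x: dict(nbrs) for x, nbrs in w.items()}
    out.setdefault(a, {})[b] = out.get(a, {}).get(b, 0.0) + delta
    out.setdefault(b, {})[a] = out.get(b, {}).get(a, 0.0) + delta
    return out
```

Comparing first_order_delta against a full recomputation with leading_eigen(with_edge_change(...)) shows why the approximation is attractive: the estimate needs one eigenvector computation in total, whereas the direct approach recomputes the eigenvalue for every candidate edge.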

Step 1: Running Time
Time complexity: O(|E| log|E| + D(|V| - m)), which is subquadratic.
D: maximum degree; m: number of media nodes.

Step 2: MEIKE-Kernel
Reformulate the MEIKECOM-Kernel problem using a vector representation for each node.
Main idea: each node u has a vector z_u representing its importance to each kernel community.
Iterative heuristic algorithm with pairwise relaxation: change z_u iteratively to increase the objective function.
The algorithm is guaranteed to converge, and the running time is linear for each iteration.
See the paper for details.

Datasets

Dataset       Domain        #Nodes  #Edges
Enron         Emails        156     2,061
MemeTracker   Cascades      851     5,000
Citation      -             8,046   18,322
Google+       Social Media  107K    14M
Twitter       Social Media  456K    8M
Coauthor      Coauthors     0.8M    2M

Coauthor has the ground truth.
Media nodes: researchers who have publications in at least three areas.
Kernel communities: PC members in five research areas.

Baselines
Media nodes:
PMIA: maximizes the downstream effect.
NETSHIELD: maximizes the upstream effect.
HIS and MAXD: role discovery algorithms.
BIGCLAM and CLIQUE: overlaps of communities (media nodes cover multiple areas).
Kernel communities:
NEWMAN, LOUVAIN, D-LOUVAIN and P-LOUVAIN: popular community detection algorithms.
WEBA: celebrity-based community detection algorithm.
BIGCLAM and CLIQUE: overlapping community detection.

Performance of MEIKE-Media
[Figure panels: quality for full stream effects; quality compared to the ground truth on Coauthor]
MEIKE has the best results for full-stream effects.
MEIKE outperforms baselines such as community-based and role-based approaches.

Performance of MEIKE-Kernel
[Figure: quality (F1-score) of kernel communities compared to other competitors on Coauthor]
MEIKE outperforms other community detection algorithms.
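For readers unfamiliar with the metric, the following sketch shows one common way to score detected communities against ground truth with F1. The slide does not say how communities are matched, so the best-match averaging in average_best_f1 is an assumption made for illustration, and both function names are hypothetical.

```python
def f1(detected, truth):
    """F1-score between one detected community and one ground-truth community."""
    detected, truth = set(detected), set(truth)
    overlap = len(detected & truth)
    if overlap == 0:
        return 0.0
    precision = overlap / len(detected)
    recall = overlap / len(truth)
    return 2 * precision * recall / (precision + recall)


def average_best_f1(detected_communities, truth_communities):
    """Average, over ground-truth communities, of the best F1 achieved by
    any detected community (assumed matching rule, not taken from the slides)."""
    if not truth_communities:
        return 0.0
    best = [max((f1(d, t) for d in detected_communities), default=0.0)
            for t in truth_communities]
    return sum(best) / len(best)
```

A detected community that shares two of its three members with a three-member ground-truth community has precision and recall of 2/3 each, so its F1 is also 2/3.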

Case Studies
Case study on a citation network (database area); see more in the paper.
Kernel communities: K1: queries; K2: hashing; K3: logic; K4: optimization.
Media nodes propagate topics, survey existing methods, and ask important open questions. E.g., Cafarella et al., “Data management projects at Google”, SIGMOD 2008, cites popular projects such as Map-Reduce and GFS.
MEIKE finds interesting kernel communities, and intuitive and useful media nodes.

Conclusion
Intuitive novel problem: MEIKECOM
Media nodes: get influenced and influence others.
Kernel communities: groups of celebrities.
Ordinary communities: each corresponds to a kernel community.
Effective and efficient methods:
MEIKECOM-Media: merge-based approach.
MEIKECOM-Kernel: iterative pairwise relaxation approach.
Experiments: intuitive and interesting media nodes and kernel communities.
Different from standard community detection methods, …

Thank you!
Code at: http://people.cs.vt.edu/~yaozhang