Social Media Mining, Chapter 5
Chapter 5, Community Detection and Mining in Social Media. Lei Tang and Huan Liu, Morgan & Claypool, September 2010.

EVOLUTION PATTERNS IN SOCIAL MEDIA

Growth of Facebook Population (figure)

Evolution in Social Media
Social media networks are highly dynamic. Interesting patterns in dynamic networks:
– The probability of a new connection between two nodes decreases as their distance increases
– Many new connections occur as triadic closures
– A dynamic network segments into three regions: singletons, isolated communities with a star structure, and a giant component anchored by a well-connected core region
– Density increases as the network grows
– The average distance between nodes shrinks

Community Evolution
Communities also expand, shrink, or dissolve in dynamic networks. How can we uncover the latent community changes behind dynamic network interactions?

Naïve Approach to Studying Community Evolution
Take snapshots of the network and find communities at each snapshot, clustering each snapshot independently.
Cons:
– Most community detection methods produce locally optimal solutions
– Hard to determine whether an observed change reflects genuine evolution or merely algorithmic randomness

Naïve Approach Example
(Figure: network snapshots at T1, T2, and T3.) There is a sharp change at T2; this approach may report spurious structural changes.

Evolutionary Clustering in Smoothly Evolving Networks
Evolutionary clustering: find a smooth sequence of communities given a series of network snapshots.
Objective function: snapshot cost (CS) + temporal cost (CT)
Take spectral clustering as an example:
– Snapshot cost: how well the community indicator S_t fits the current snapshot
– Temporal cost: how far S_t deviates from the communities of the previous snapshot, where S_t is still a valid solution after an orthogonal transformation
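The cost formulas on this slide did not survive the transcript. As a sketch only, a common formulation from the evolutionary spectral clustering literature (not necessarily the exact one on the slide) uses a normalized adjacency matrix \tilde{A}_t and a community indicator matrix S_t with orthonormal columns:

```latex
\mathrm{cost} = \alpha\,\underbrace{\bigl(k - \operatorname{Tr}(S_t^{\top}\tilde{A}_t S_t)\bigr)}_{\text{snapshot cost } CS}
 \;+\; (1-\alpha)\,\underbrace{\tfrac{1}{2}\bigl\|S_t S_t^{\top} - S_{t-1} S_{t-1}^{\top}\bigr\|_F^{2}}_{\text{temporal cost } CT},
 \qquad S_t^{\top} S_t = I .
```

Because CT depends on S_t only through the projection S_t S_t^T, any orthogonal transformation of S_t remains a valid solution, and minimizing the combined cost is equivalent to spectral clustering on the modified matrix alpha * \tilde{A}_t + (1 - alpha) * S_{t-1} S_{t-1}^T (the "modified graph Laplacian" used in the next slide's example).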

Evolutionary Clustering Example
(Figure: network snapshots at T1, T2, and T3, with the modified matrices for T1 and T2.) We obtain two communities based on spectral clustering with this modified graph Laplacian: {1, 2, 3, 4} and {5, 6, 7, 8, 9}.
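A minimal Python sketch of this procedure (my own illustration under the formulation above, not the book's code): each snapshot is clustered on a matrix that blends the current normalized adjacency with the previous solution's projection S_{t-1} S_{t-1}^T.

```python
# Minimal sketch of evolutionary spectral clustering (illustration only).
# Assumes: snapshots are symmetric numpy adjacency matrices, k communities,
# alpha trades off snapshot quality against temporal smoothness.
import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans

def normalized_adjacency(A):
    """D^{-1/2} A D^{-1/2} for a symmetric adjacency matrix A."""
    d = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    return A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def spectral_embedding(M, k):
    """Eigenvectors of symmetric M for the k largest eigenvalues (community indicator S)."""
    _, vecs = eigh(M)
    return vecs[:, -k:]

def evolutionary_clustering(snapshots, k, alpha=0.8):
    """Cluster each snapshot while staying close to the previous solution."""
    S_prev, labels_per_t = None, []
    for A in snapshots:
        A_norm = normalized_adjacency(A)
        if S_prev is None:
            M = A_norm                                             # first snapshot: plain spectral clustering
        else:
            M = alpha * A_norm + (1 - alpha) * (S_prev @ S_prev.T)  # modified matrix
        S = spectral_embedding(M, k)
        labels_per_t.append(KMeans(n_clusters=k, n_init=10).fit_predict(S))
        S_prev = S
    return labels_per_t
```

Setting alpha = 1 recovers independent clustering per snapshot (the naïve approach); smaller values of alpha enforce more temporal smoothness.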

Segment-based Clustering with Evolving Networks
Independent clustering at each snapshot:
– Does not consider temporal information
– Likely to output spurious evolution patterns
Evolutionary clustering enforces smoothness:
– May fail to capture drastic changes
How can we strike a balance between gradual changes under normal circumstances and drastic changes caused by major events?
Segment-based clustering:
– Community structure remains unchanged within a segment of time
– Changes occur between consecutive segments
Fundamental question: how to detect the change points?

Segment-based Clustering
Segment-based clustering assumes the community structure remains unchanged within a segment of time. GraphScope is one segment-based clustering method:
– If network connections do not change much over time, consecutive network snapshots are grouped into one segment
– If a new network snapshot does not fit the existing segment (i.e., the current community structure induces a high cost on the new snapshot), a change point is introduced and a new segment starts
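A schematic sketch of the segmenting loop (illustration only, not GraphScope itself; `segment_cost` stands in for GraphScope's MDL-based encoding cost and `tolerance` is a hypothetical change-point threshold):

```python
# Schematic segment-based clustering loop (illustration, not GraphScope's actual algorithm).

def segment_stream(snapshots, segment_cost, tolerance):
    """Group consecutive snapshots into segments; start a new segment at change points."""
    segments, current = [], []
    for A in snapshots:
        if not current:
            current.append(A)
            continue
        cost_with = segment_cost(current + [A])                   # cost if A joins the current segment
        cost_alone = segment_cost(current) + segment_cost([A])    # cost if A starts a new segment
        if cost_with - cost_alone > tolerance:                    # current structure fits A poorly
            segments.append(current)                              # close the segment: change point detected
            current = [A]
        else:
            current.append(A)
    if current:
        segments.append(current)
    return segments
```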

GraphScope on Enron Data (figure)

CLASSIFICATION WITH NETWORK DATA

Correlations in a Network
Individual behaviors are correlated in a network environment. Possible explanations: homophily, influence, and confounding factors.

Classification with Network Data
How can we leverage the correlation observed in networks to help predict user attributes or interests?
(Figure: example network; predict the labels of the nodes shown in yellow.)

Collective Classification
Node labels are interdependent; the label of one node cannot be determined in isolation, so collective classification is needed.
Markov assumption: the label of a node depends on the labels of its neighbors.
Collective classification involves three components:
– Local classifier: assign initial labels
– Relational classifier: capture correlations between nodes
– Collective inference: propagate correlations through the network

Collective Classification
Local classifier: used for initial label assignment
– Predicts a label based on node attributes
– Classical classification learning
– Does not use network information
Relational classifier: captures correlations based on label information
– Learns to predict a node's label from the labels and/or attributes of its neighbors
– Network information is used
Collective inference: propagates the correlations
– Apply the relational classifier to each node iteratively
– Iterate until the inconsistency between neighboring labels is minimized
– Network structure substantially affects the final prediction
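A schematic sketch of how the three components fit together (illustration only; `local_clf` and `relational_clf` are hypothetical placeholders rather than any specific library's API):

```python
# Schematic collective classification loop (illustration; the two classifiers are placeholders).

def collective_classification(nodes, neighbors, attributes,
                              local_clf, relational_clf, max_iter=50):
    # 1. Local classifier: initial labels from node attributes only (no network information).
    labels = {v: local_clf.predict(attributes[v]) for v in nodes}
    # 2-3. Relational classifier + collective inference: iterate until labels stop changing.
    for _ in range(max_iter):
        changed = False
        for v in nodes:
            neighbor_labels = [labels[u] for u in neighbors[v]]
            new_label = relational_clf.predict(attributes[v], neighbor_labels)
            changed |= (new_label != labels[v])
            labels[v] = new_label
        if not changed:          # neighboring labels are now consistent
            break
    return labels
```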

Weighted-vote Relational Neighbor (wvRN) Classifier
– No local classifier
– Relational classifier: the prediction for a node is the average of its neighbors' predictions
– Collective inference: repeat the updates until the predictions stabilize

Example of wvRN
Initialization for unlabeled nodes: P(y_i = 1 | N_i) = 0.5
1st iteration:
– For node 3, N_3 = {1, 2, 4}: P(y_1 = 1 | N_1) = 0, P(y_2 = 1 | N_2) = 0, P(y_4 = 1 | N_4) = 0.5, so P(y_3 = 1 | N_3) = 1/3 (0 + 0 + 0.5) = 0.17
– For node 4, N_4 = {1, 3, 5, 6}: P(y_4 = 1 | N_4) = 1/4 (0 + 0.17 + 0.5 + 1) = 0.42
– For node 5, N_5 = {4, 6, 7, 8}: P(y_5 = 1 | N_5) = 1/4 (0.42 + 1 + 1 + 0.5) = 0.73
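A minimal Python sketch of wvRN (my own illustration, not the book's code). The edge list below covers only the neighborhoods spelled out on this slide; the full example graph, including nodes 8 and 9, appears only in the figure, so node 8's neighborhood here is a hypothetical stand-in.

```python
# Minimal sketch of the weighted-vote relational neighbor (wvRN) classifier.

def wvrn(neighbors, labeled, unlabeled, iterations=10):
    """Relaxation labeling: each unlabeled node takes the mean P(y=1) of its neighbors."""
    p = dict(labeled)
    p.update({v: 0.5 for v in unlabeled})          # initialize unlabeled nodes to 0.5
    for _ in range(iterations):
        for v in unlabeled:                        # asynchronous (in-place) updates, as on the slide
            nbrs = neighbors[v]
            p[v] = sum(p[u] for u in nbrs) / len(nbrs)
    return p

# Neighborhoods taken from the slide (partial graph).
neighbors = {
    3: [1, 2, 4],
    4: [1, 3, 5, 6],
    5: [4, 6, 7, 8],
    8: [5],                                        # hypothetical: node 8's true neighborhood is in the figure
}
labeled = {1: 0.0, 2: 0.0, 6: 1.0, 7: 1.0}         # nodes 1, 2 negative; nodes 6, 7 positive
scores = wvrn(neighbors, labeled, unlabeled=[3, 4, 5, 8], iterations=1)
print(scores)   # after one pass: node 3 ~ 0.17, node 4 ~ 0.42, node 5 ~ 0.73
```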

Iterative Result
The predictions stabilize after 5 iterations:
– Nodes 5, 8, 9 are + (P_i > 0.5)
– Node 3 is − (P_i < 0.5)
– Node 4 is in between (P_i = 0.5)

Community-Based Learning
People in social media engage in various relationships:
– Colleagues
– Relatives
– Friends
– Co-travelers
Different relationships have different correlations with user interests, behaviors, and profiles.

Challenges
Social media often comes with a network but no explicit relationship information, or the relationship information is incomplete or not refined enough.
Challenges:
– How to differentiate these heterogeneous relationships in a network?
– How to determine whether a relationship is useful for prediction?

Social Dimensions
Social dimensions:
– Latent dimensions defined by social connections
– Each dimension represents one type of relationship

A Learning Framework Based on Social Dimensions
People involved in the same social dimension are likely to connect to each other, thus forming a community.
Training:
1. Extract meaningful social dimensions from network connectivity via community detection
2. Determine the relevant social dimensions (plus node attributes) through supervised learning
Prediction:
– Apply the model constructed in Step 2 to the social dimensions of unlabeled nodes
– No iterative inference is required

Underlying Assumption
Assumption:
– The label of a node is determined by its social dimensions
– P(y_i | A) = P(y_i | S_i), where A is the network and S_i is node i's social-dimension representation
Community memberships serve as latent features.

Example of the SocioDim Framework
A person is likely to be involved in multiple relationships, so soft clustering is used to extract social dimensions.
(Figure: spectral clustering extracts social dimensions; a support vector machine is trained on them.)
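A minimal Python sketch of this SocioDim pipeline built from those two components (illustration only, not the authors' code): the top d eigenvectors of the normalized adjacency act as soft social dimensions, and a linear SVM is trained on the labeled rows.

```python
# Minimal sketch of the SocioDim pipeline (illustration only).
# Assumes: A is a symmetric numpy adjacency matrix, labeled_idx / y give the
# training nodes and their labels, and d is the number of social dimensions.
import numpy as np
from scipy.linalg import eigh
from sklearn.svm import LinearSVC

def social_dimensions(A, d):
    """Soft community indicators: top-d eigenvectors of the normalized adjacency."""
    deg = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    A_norm = A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    _, vecs = eigh(A_norm)
    return vecs[:, -d:]                        # each column is one latent social dimension

def sociodim_train_predict(A, labeled_idx, y, unlabeled_idx, d=5):
    S = social_dimensions(A, d)                # step 1: extract social dimensions
    clf = LinearSVC().fit(S[labeled_idx], y)   # step 2: discriminative learning on labeled nodes
    return clf.predict(S[unlabeled_idx])       # prediction: no iterative inference needed
```

Node attributes, when available, can simply be concatenated with the social dimensions before training.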

Collective Classification vs. Community-Based Learning
– Computational cost for training: low (collective classification) vs. high (community-based learning)
– Computational cost for prediction: high (collective classification) vs. low (community-based learning)
– Handling heterogeneous relations: community-based learning ✔
– Handling evolving networks: collective classification ✔
– Integrating network information and actor attributes: both ✔

Book Available at Morgan & Claypool Publishers and Amazon
If you have any comments, please feel free to contact: Lei Tang, Yahoo! Labs; Huan Liu, ASU