Detecting Communities Via Simultaneous Clustering of Graphs and Folksonomies Akshay Java Anupam Joshi Tim Finin University of Maryland, Baltimore County.

Presentation transcript:

Detecting Communities Via Simultaneous Clustering of Graphs and Folksonomies. Akshay Java, Anupam Joshi, Tim Finin. University of Maryland, Baltimore County. KDD 2008 Workshop on Web Mining and Web Usage Analysis.

Outline
Introduction
Community Detection
–Clustering Approach
–Spectral Approach
–Co-Clustering
Simultaneous Clustering
Evaluation
Future Work
Conclusions

Social Media: Describes the online technologies and practices that people use to share opinions, insights, experiences, and perspectives and engage with each other. (Wikipedia)

Social Media Graphs: G = (V, E) describes the relationships between different entities (people, documents, etc.). G’ is a tri-partite graph (users, tags, URLs) that expresses how entities ‘tag’ some resource.

What is a Community: A community in the real world is identified in a graph as a set of nodes that have more links within the set than outside it. Examples shown: political blogs, Twitter network, Facebook network.

Community Detection: Clustering Approach. 1. Agglomerative/Hierarchical. Topological overlap: similarity is measured in terms of the number of nodes that both i and j link to (Ravasz et al.).
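The topological-overlap idea on this slide can be sketched as follows. This is a minimal illustration that only counts the nodes both i and j link to, as the slide words it; the toy graph and the function name `common_neighbor_counts` are hypothetical, and any normalization used in the original work is not reproduced here.

```python
def common_neighbor_counts(adj):
    """Return {(i, j): number of nodes that both i and j link to} for i < j.

    adj maps each node to the set of its neighbours (undirected graph).
    """
    nodes = sorted(adj)
    overlap = {}
    for a in range(len(nodes)):
        for b in range(a + 1, len(nodes)):
            i, j = nodes[a], nodes[b]
            overlap[(i, j)] = len(adj[i] & adj[j])
    return overlap


# Hypothetical toy graph: two triangles joined by the edge 2-3.
adj = {
    0: {1, 2},
    1: {0, 2},
    2: {0, 1, 3},
    3: {2, 4, 5},
    4: {3, 5},
    5: {3, 4},
}
overlap = common_neighbor_counts(adj)
```

Pairs inside the same triangle share a neighbour, while pairs from opposite triangles mostly share none, which is the signal an agglomerative method would merge on.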

Community Detection: Clustering Approach. 1. Agglomerative/Hierarchical. 2. Divisive/Partition-based: remove the edges that have the highest edge betweenness centrality (Girvan-Newman algorithm). Example shown: Political Books.
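One divisive step of this kind can be sketched in plain Python. The sketch below computes edge betweenness for an unweighted, undirected graph with a Brandes-style breadth-first accumulation, removes the single highest-scoring edge, and reads off the connected components. The toy graph and all function names are hypothetical; a full Girvan-Newman run would repeat the step and recompute betweenness after each removal.

```python
from collections import deque


def edge_betweenness(adj):
    """Edge betweenness for an unweighted, undirected graph (dict of sets)."""
    score = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        # BFS from s: count shortest paths (sigma) and record predecessors.
        dist = {s: 0}
        sigma = {v: 0.0 for v in adj}
        sigma[s] = 1.0
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate path dependencies from the farthest nodes back to s.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1.0 + delta[w])
                score[frozenset((v, w))] += share
                delta[v] += share
    # Every node pair was counted once from each endpoint.
    return {edge: value / 2.0 for edge, value in score.items()}


def connected_components(adj):
    """Return the connected components of the graph as a list of sets."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            v = stack.pop()
            if v not in comp:
                comp.add(v)
                stack.extend(adj[v] - comp)
        seen |= comp
        comps.append(comp)
    return comps


# Hypothetical toy graph: two triangles joined by the bridge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
scores = edge_betweenness(adj)
top_edge = max(scores, key=scores.get)

# Remove the highest-betweenness edge and inspect the resulting communities.
u, v = tuple(top_edge)
pruned = {n: set(nb) for n, nb in adj.items()}
pruned[u].discard(v)
pruned[v].discard(u)
communities = connected_components(pruned)
```

On this toy graph the bridge carries every shortest path between the two triangles, so it is removed first and the two triangles fall out as communities.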

Community Detection: Spectral Approach. The graph can be partitioned using the eigenspectrum of the Laplacian (Shi and Malik). The eigenvector associated with the second smallest eigenvalue of the graph Laplacian is the Fiedler vector. The graph can be recursively partitioned using the sign of the values in its Fiedler vector. Graph Laplacian: L = D − W. Normalized Cuts: NCut(A, B) = cut(A, B)/assoc(A, V) + cut(A, B)/assoc(B, V), where cut(A, B) is the cost of the edges deleted to disconnect the graph and assoc(B, V) is the total cost of all edges that start from B.
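A single spectral bipartition of this kind can be sketched with NumPy. The sketch assumes a symmetric weight matrix W with no isolated nodes and solves the generalized problem (D − W) y = λ D y through the symmetrically normalized Laplacian, which is the usual way to obtain the normalized-cut relaxation; the toy matrix and the function names are hypothetical.

```python
import numpy as np


def fiedler_bipartition(W):
    """Split a graph by the sign of its (generalized) Fiedler vector.

    W is a symmetric weight matrix; every node must have positive degree.
    """
    W = np.asarray(W, dtype=float)
    d = W.sum(axis=1)
    L = np.diag(d) - W                      # graph Laplacian L = D - W
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L_sym = d_inv_sqrt[:, None] * L * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L_sym)         # eigenvalues in ascending order
    y = d_inv_sqrt * vecs[:, 1]             # second smallest eigenvalue
    return y >= 0, y


def ncut_value(W, mask):
    """NCut(A, B) = cut(A, B)/assoc(A, V) + cut(A, B)/assoc(B, V)."""
    W = np.asarray(W, dtype=float)
    cut = W[mask][:, ~mask].sum()
    return cut / W[mask].sum() + cut / W[~mask].sum()


# Hypothetical toy graph: two triangles joined by the edge 2-3.
W = np.array([
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
], dtype=float)
mask, y = fiedler_bipartition(W)
ncut = ncut_value(W, mask)
```

The overall sign of an eigenvector is arbitrary, so only the grouping matters, not which side is labelled True. Recursive partitioning would apply the same function to each side's submatrix.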

Community Detection: Co-Clustering. Spectral graph bipartitioning: compute the graph Laplacian of the bipartite document-term graph, whose edge weights are taken from the document-by-term matrix (Dhillon et al.).
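The bipartite construction can be sketched as below. It assumes the standard formulation in which documents and terms are the two sides of a bipartite graph, with the document-by-term matrix (called A here, a name chosen for illustration) supplying the edge weights; the toy matrix and function names are hypothetical, and every document and term is assumed to have at least one nonzero entry.

```python
import numpy as np


def bipartite_laplacian(A):
    """Laplacian of the bipartite graph whose adjacency is [[0, A], [A^T, 0]]."""
    A = np.asarray(A, dtype=float)
    n_docs, n_terms = A.shape
    W = np.block([
        [np.zeros((n_docs, n_docs)), A],
        [A.T, np.zeros((n_terms, n_terms))],
    ])
    d = W.sum(axis=1)
    return np.diag(d) - W, d


def co_cluster(A):
    """Assign documents and terms to two clusters by Fiedler-vector sign."""
    L, d = bipartite_laplacian(A)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L_sym = d_inv_sqrt[:, None] * L * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L_sym)
    y = d_inv_sqrt * vecs[:, 1]
    labels = (y >= 0).astype(int)
    n_docs = np.asarray(A).shape[0]
    return labels[:n_docs], labels[n_docs:]


# Hypothetical document-by-term counts: documents 0-1 mostly use terms 0-1,
# documents 2-3 mostly use terms 2-3, with one weak cross link.
A = np.array([
    [3, 2, 0, 0],
    [2, 3, 0, 0],
    [0, 1, 3, 2],
    [0, 0, 2, 3],
], dtype=float)
doc_labels, term_labels = co_cluster(A)
```

Because documents and terms live in the same graph, one cut labels both at once, which is what makes the terms usable as cluster descriptions.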

Social Media Graphs: links between nodes; links between nodes and tags; simultaneous cuts.

Communities in Social Media: A community in the real world is identified in a graph as a set of nodes that have more links within the set than outside it and that share similar tags.

Clustering Tags and Graphs. [Figure: matrix with rows and columns labeled Nodes and Tags; Fiedler vector polarity.] β = 0 is like co-clustering; β = 1 gives equal importance to blog-blog and blog-tag links; β >> 1 behaves like NCut.
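One way to read the role of β is sketched below. The block layout W = [[βC, A], [Aᵀ, 0]], with C the node-node link matrix and A the node-tag matrix, is an assumption inferred from the β description on the slide and not a formula the transcript preserves; the symbol names, toy data and function name are likewise hypothetical. Every node and tag must have positive degree, so β = 0 requires the node-tag graph alone to be connected.

```python
import numpy as np


def simultaneous_cut(C, A, beta=1.0):
    """Bipartition nodes and tags together by Fiedler-vector sign.

    Assumed layout (not taken verbatim from the slides):
        W = [[beta * C, A],
             [A.T,      0]]
    """
    C = np.asarray(C, dtype=float)
    A = np.asarray(A, dtype=float)
    n_nodes, n_tags = A.shape
    W = np.block([
        [beta * C, A],
        [A.T, np.zeros((n_tags, n_tags))],
    ])
    d = W.sum(axis=1)
    L = np.diag(d) - W
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L_sym = d_inv_sqrt[:, None] * L * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L_sym)
    y = d_inv_sqrt * vecs[:, 1]
    labels = (y >= 0).astype(int)
    return labels[:n_nodes], labels[n_nodes:]


# Hypothetical toy data: two linked triangles of blogs, each using its own tag.
C = np.array([
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
], dtype=float)
A = np.array([[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]], dtype=float)
node_labels, tag_labels = simultaneous_cut(C, A, beta=1.0)
```

Under this reading, setting β to 0 drops the link block and leaves a pure node-tag co-clustering, while a very large β makes the tag edges negligible relative to the links.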

Clustering Tags and Graphs. β = 0 is like co-clustering; β = 1 gives equal importance to blog-blog and blog-tag links; β >> 1 behaves like NCut. [Figure panels: Clustering Only Links; Clustering Links + Tags.]

Clustering Tags and Graphs. [Figure panels: Clustering Only Links; Clustering Links + Tags.]

Datasets
Citeseer
–Agents, AI, DB, HCI, IR, ML
–Words used in place of tags
Blog data
–Derived from the WWE/Buzzmetrics dataset
–Tags associated with blogs derived from del.icio.us
–For dimensionality reduction, 100 topics derived from blog homepages using LDA (Latent Dirichlet Allocation)
Pairwise similarity computed
–RBF kernel for Citeseer
–Cosine for blogs
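The two pairwise similarity measures named above can be sketched as follows. The RBF bandwidth parameter `gamma` and the example vectors are assumptions for illustration; the slides do not state which kernel width or feature vectors were used.

```python
import math


def rbf_kernel(x, y, gamma=1.0):
    """RBF (Gaussian) similarity exp(-gamma * ||x - y||^2); gamma is assumed."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def cosine_similarity(x, y):
    """Cosine of the angle between x and y; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        return 0.0
    return dot / (norm_x * norm_y)


# Hypothetical feature vectors, e.g. word counts or topic weights.
same = rbf_kernel([1.0, 2.0], [1.0, 2.0])
near = rbf_kernel([0.0, 0.0], [1.0, 0.0])
parallel = cosine_similarity([1.0, 1.0], [2.0, 2.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```

Either function can fill the weight matrix that the spectral methods take as input, one entry per pair of items.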

Citeseer Data: Accuracy = 36%; Accuracy = 62%. Higher accuracy by adding ‘tag’ information.

Citeseer Data (NCut vs. SimCut): SimCut results in higher intra-cluster similarity and lower inter-cluster similarity.

Citeseer Data (NCut, SimCut, True): constrains cuts based on both link structure and tags.

Blog Data (NCut vs. SimCut): SimCut results in higher intra-cluster similarity and lower inter-cluster similarity.

Blog Data (NCut vs. SimCut, 35 clusters). NCut: few, large clusters with low intra-cluster similarity. SimCut: moderate-size clusters with higher intra-cluster similarity.

Effect of Number of Tags, Clusters (Citeseer): More tags help, to an extent. Lower mutual information if only the graph is used. Mutual information compares clusters to ground truth.

Effect of Number of Tags, Clusters (Blogs): More tags help, to an extent. Lower mutual information if only the graph is used. Mutual information compares clusters to content-based clusters (no tags/graph).
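The mutual-information comparison can be sketched as below. This computes plain mutual information (in nats) between two label assignments over the same items; whether the slides report this raw value or a normalized variant is not stated, so treat the exact scaling as an assumption. The label lists are hypothetical.

```python
import math
from collections import Counter


def mutual_information(labels_a, labels_b):
    """Mutual information (nats) between two clusterings of the same items."""
    n = len(labels_a)
    joint = Counter(zip(labels_a, labels_b))
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    mi = 0.0
    for (a, b), n_ab in joint.items():
        # p(a, b) * log( p(a, b) / (p(a) * p(b)) )
        mi += (n_ab / n) * math.log((n_ab * n) / (count_a[a] * count_b[b]))
    return mi


# Hypothetical labels: a clustering that matches the reference up to renaming,
# and one that is statistically independent of it.
reference = [0, 0, 1, 1]
matching = [1, 1, 0, 0]
unrelated = [0, 1, 0, 1]
mi_matching = mutual_information(reference, matching)
mi_unrelated = mutual_information(reference, unrelated)
```

The measure ignores how clusters are named, so it can compare detected communities against ground-truth classes or against content-based clusters without aligning labels first.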

Future Work
–Evaluating the SimCut algorithm on derived feature types such as named entities, sentiments and opinions, and links to mainstream media
–For a dataset with ground truth, a comparison of graph-based, text-based and graph+tag-based clustering
–Evaluating the effect of varying β

Conclusions
Many social media sites allow users to tag resources
Incorporating folksonomies in community detection can yield better results
SimCut can be easily implemented and relates to NCut with two simultaneous objectives
–Minimize the number of node-node edges being cut
–Minimize the number of node-tag edges being cut
Detected communities can be associated with meaningful, descriptive tags

Thanks!

[Figure: Only Graph vs. SimCut; More Tags.]

Citeseer (Community Size, Similarity)

Blogs (Community Size, Similarity)