Detecting Communities Via Simultaneous Clustering of Graphs and Folksonomies. Akshay Java, Anupam Joshi, Tim Finin. University of Maryland, Baltimore County.


1 Detecting Communities Via Simultaneous Clustering of Graphs and Folksonomies. Akshay Java, Anupam Joshi, Tim Finin. University of Maryland, Baltimore County. KDD 2008 Workshop on Web Mining and Web Usage Analysis.

2 Outline: Introduction; Community Detection (Clustering Approach, Spectral Approach, Co-Clustering); Simultaneous Clustering; Evaluation; Future Work; Conclusions.

4 Social Media: describes the online technologies and practices that people use to share opinions, insights, experiences, and perspectives and engage with each other. (Wikipedia)

5 Social Media Graphs. G = (V,E) describes the relationships between different entities (People, Documents, etc.). G' is a tri-partite graph that expresses how entities 'Tag' some resource. [Figure: tri-partite graph linking Users, Tags, and URLs]

6 What is a Community. A community in the real world is identified in a graph as a set of nodes that have more links within the set than outside it. [Figure panels: Political Blogs, Twitter Network, Facebook Network]
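The definition above can be checked directly by counting edges. The following is a minimal sketch on a made-up toy graph (two triangles joined by one bridge edge); the function name `link_counts` and the data are illustrative assumptions, not taken from the slides.

```python
def link_counts(edges, community):
    """Return (internal, external) edge counts for a candidate node set."""
    members = set(community)
    internal = external = 0
    for u, v in edges:
        endpoints_inside = (u in members) + (v in members)
        if endpoints_inside == 2:
            internal += 1      # both ends inside the set
        elif endpoints_inside == 1:
            external += 1      # edge leaves the set
    return internal, external


# Toy graph (hypothetical): two triangles joined by the bridge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
internal, external = link_counts(edges, {0, 1, 2})
is_community = internal > external
```

Here the set {0, 1, 2} has three internal links and one external link, so it satisfies the definition, while a set such as {2, 3} does not.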

8 Community Detection: Clustering Approach. 1. Agglomerative/Hierarchical. Topological Overlap: similarity between nodes i and j is measured in terms of the number of nodes that both i and j link to. (Ravasz et al.)
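As a rough illustration of topological overlap, the sketch below counts the neighbours two nodes share. The normalisation (adding 1 when the two nodes are directly linked and dividing by the smaller degree) is one commonly cited form and is an assumption here, since the slide gives only the verbal description. The adjacency data is a made-up toy graph.

```python
def topological_overlap(adj, i, j):
    """Shared neighbours of i and j (plus 1 if they are linked),
    divided by the smaller of the two degrees."""
    linked = 1 if j in adj[i] else 0
    shared = len(adj[i] & adj[j])
    smaller_degree = min(len(adj[i]), len(adj[j]))
    if smaller_degree == 0:
        return 0.0
    return (shared + linked) / smaller_degree


# Toy graph (hypothetical): two triangles joined by the edge 2-3.
adj = {
    0: {1, 2},
    1: {0, 2},
    2: {0, 1, 3},
    3: {2, 4, 5},
    4: {3, 5},
    5: {3, 4},
}
same_triangle = topological_overlap(adj, 0, 1)
across_bridge = topological_overlap(adj, 2, 3)
```

An agglomerative method would then merge the pairs with the highest overlap first, so nodes in the same triangle are grouped before the bridge endpoints.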

9 Community Detection: Clustering Approach. 1. Agglomerative/Hierarchical. 2. Divisive/Partition based: remove the edges that have the highest edge betweenness centrality (Girvan-Newman Algorithm). [Figure: Political Books]
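The divisive idea can be sketched in plain Python: compute shortest-path edge betweenness, delete the highest-scoring edge, and repeat until the graph splits. This is a simplified illustration for small unweighted graphs, not the authors' code; the helper names and the toy graph are assumptions.

```python
from collections import deque


def edge_betweenness(adj):
    """Shortest-path edge betweenness for an unweighted, undirected graph."""
    score = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for source in adj:
        # Breadth-first search that counts shortest paths from source.
        dist, paths = {source: 0}, {source: 1}
        preds = {node: [] for node in adj}
        order, queue = [], deque([source])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    paths[v] = 0
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    paths[v] += paths[u]
                    preds[v].append(u)
        # Walk back from the farthest nodes, sharing path credit along edges.
        credit = {node: 0.0 for node in order}
        for v in reversed(order):
            for u in preds[v]:
                share = paths[u] / paths[v] * (1.0 + credit[v])
                score[frozenset((u, v))] += share
                credit[u] += share
    # Every undirected pair was counted once from each endpoint.
    return {edge: value / 2.0 for edge, value in score.items()}


def components(adj):
    """Connected components as a list of node sets."""
    seen, groups = set(), []
    for start in adj:
        if start in seen:
            continue
        group, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node not in group:
                group.add(node)
                stack.extend(adj[node] - group)
        seen |= group
        groups.append(group)
    return groups


def split_once(adj):
    """Remove highest-betweenness edges until one more component appears."""
    adj = {node: set(nbrs) for node, nbrs in adj.items()}
    target = len(components(adj)) + 1
    while len(components(adj)) < target:
        scores = edge_betweenness(adj)
        if not scores:
            break
        u, v = max(scores, key=scores.get)
        adj[u].discard(v)
        adj[v].discard(u)
    return components(adj)


# Toy graph (hypothetical): two triangles joined by the bridge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
scores = edge_betweenness(adj)
communities = split_once(adj)
```

On this toy graph the bridge carries every shortest path between the two triangles, so it is removed first and the two triangles come out as separate communities.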

10 Community Detection: Spectral Approach. The graph can be partitioned using the eigenspectrum of the Laplacian (Shi and Malik). The eigenvector corresponding to the second smallest eigenvalue of the graph Laplacian is the Fiedler vector. The graph can be recursively partitioned using the sign of the values in its Fiedler vector. Graph Laplacian: $L = D - W$. Normalized Cuts: $Ncut(A,B) = \frac{cut(A,B)}{assoc(A,V)} + \frac{cut(A,B)}{assoc(B,V)}$, where $cut(A,B)$ is the cost of edges deleted to disconnect the graph and $assoc(B,V)$ is the total cost of all edges that start from B.
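A minimal NumPy sketch of one spectral bipartition step follows. It assumes a connected graph with no isolated nodes and uses the degree-normalised eigenproblem (D - W) y = λ D y associated with normalized cuts; the function names and the toy weight matrix are illustrative assumptions.

```python
import numpy as np


def fiedler_partition(W):
    """Split a connected graph in two using the sign of the Fiedler vector."""
    W = np.asarray(W, dtype=float)
    degrees = W.sum(axis=1)
    L = np.diag(degrees) - W                      # graph Laplacian L = D - W
    d_inv_sqrt = 1.0 / np.sqrt(degrees)           # assumes every degree > 0
    L_sym = d_inv_sqrt[:, None] * L * d_inv_sqrt[None, :]
    _, vectors = np.linalg.eigh(L_sym)            # eigenvalues in ascending order
    fiedler = d_inv_sqrt * vectors[:, 1]          # second smallest eigenvalue
    return fiedler >= 0


def ncut(W, mask):
    """Normalized cut value of the bipartition given by a boolean mask."""
    W = np.asarray(W, dtype=float)
    cut = W[mask][:, ~mask].sum()                 # weight of deleted edges
    return cut / W[mask].sum() + cut / W[~mask].sum()


# Toy graph (hypothetical): two triangles joined by the bridge 2-3.
W = np.zeros((6, 6))
for u, v in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    W[u, v] = W[v, u] = 1.0
mask = fiedler_partition(W)
value = ncut(W, mask)
```

Applying `fiedler_partition` again to each side would give the recursive partitioning described on the slide.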

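11 Community Detection: Co-Clustering. Spectral graph bipartitioning: compute the graph Laplacian using $L = \begin{bmatrix} D_1 & -A \\ -A^T & D_2 \end{bmatrix}$, where $A$ is the document-by-term matrix and $D_1$, $D_2$ are the diagonal degree matrices of the documents and terms. (Dhillon et al.)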
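The sketch below illustrates bipartite spectral co-clustering on a tiny made-up document-by-term matrix. It relies on the known equivalence between the second eigenvector of the bipartite Laplacian and the second singular vectors of the degree-normalised matrix. Splitting by sign, rather than running k-means on the embedding, is a simplification assumed here, and the data is hypothetical.

```python
import numpy as np


def cocluster(A):
    """Bipartition documents and terms together from a document-by-term matrix."""
    A = np.asarray(A, dtype=float)
    d1_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))    # document degrees (assumed > 0)
    d2_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=0))    # term degrees (assumed > 0)
    A_norm = d1_inv_sqrt[:, None] * A * d2_inv_sqrt[None, :]
    U, _, Vt = np.linalg.svd(A_norm, full_matrices=False)
    # Second left/right singular vectors, rescaled back to the original problem.
    doc_embedding = d1_inv_sqrt * U[:, 1]
    term_embedding = d2_inv_sqrt * Vt[1, :]
    return doc_embedding >= 0, term_embedding >= 0


# Hypothetical counts: documents 0-1 use terms 0-1, documents 2-3 use terms 2-3,
# with one weak cross link (document 1, term 2) keeping the graph connected.
A = [[2, 2, 0, 0],
     [2, 2, 1, 0],
     [0, 0, 2, 2],
     [0, 0, 2, 2]]
doc_side, term_side = cocluster(A)
```

Documents and terms that land on the same side form one co-cluster, so each document group comes with the terms that characterise it.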

13 Social Media Graphs: Links Between Nodes; Links Between Nodes and Tags; Simultaneous Cuts.

14 Communities in Social Media. A community in the real world is identified in a graph as a set of nodes that have more links within the set than outside it and share similar tags.

15 Clustering Tags and Graphs. β = 0 is like co-clustering; β = 1 gives equal importance to blog-blog and blog-tag links; β >> 1 is like NCut. [Figure: combined Nodes/Tags matrix and Fiedler vector polarity]

16 Clustering Tags and Graphs. β = 0 is like co-clustering; β = 1 gives equal importance to blog-blog and blog-tag links; β >> 1 is like NCut. [Figure panels: Clustering Only Links; Clustering Links + Tags]
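One way to read the role of β is as a weight on the node-node block of a combined nodes-plus-tags matrix. The sketch below assumes the block form [[βG, C], [Cᵀ, 0]], which is consistent with the three β regimes listed on the slide but is not spelled out in this transcript. The function name, the toy graph, and the tag assignments are all hypothetical.

```python
import numpy as np


def simcut_bipartition(G, C, beta=1.0):
    """Cut nodes and tags together.

    G: node-by-node link matrix; C: node-by-tag matrix.
    Assumed combined matrix: [[beta * G, C], [C.T, 0]].
    """
    G = np.asarray(G, dtype=float)
    C = np.asarray(C, dtype=float)
    n_nodes, n_tags = C.shape
    W = np.block([[beta * G, C],
                  [C.T, np.zeros((n_tags, n_tags))]])
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))     # assumes no isolated rows
    L_sym = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vectors = np.linalg.eigh(L_sym)            # ascending eigenvalues
    fiedler = d_inv_sqrt * vectors[:, 1]
    side = fiedler >= 0
    return side[:n_nodes], side[n_nodes:]


# Hypothetical example: a 4-node ring, where links alone do not favour either
# of the two balanced splits, plus two tags that do.
G = np.zeros((4, 4))
for u, v in [(0, 1), (1, 2), (2, 3), (3, 0)]:
    G[u, v] = G[v, u] = 1.0
C = np.array([[1, 0],
              [1, 0],
              [0, 1],
              [0, 1]])
node_side, tag_side = simcut_bipartition(G, C, beta=1.0)
```

In this toy case the tag links break the symmetry of the ring, and each tag falls on the same side as the nodes that use it, which is how a detected community picks up descriptive tags.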

17 Clustering Tags and Graphs. [Figure panels: Clustering Only Links; Clustering Links + Tags]

19 Datasets. Citeseer: Agents, AI, DB, HCI, IR, ML; words used in place of tags. Blog data: derived from the WWE/Buzzmetrics dataset; tags associated with blogs derived from del.icio.us; for dimensionality reduction, 100 topics derived from blog homepages using LDA (Latent Dirichlet Allocation). Pairwise similarity computed with an RBF kernel for Citeseer and cosine similarity for blogs.
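For reference, the two pairwise similarity measures named above can be written in a few lines. The kernel width `gamma` is an assumed parameter, since the slides do not give the value used, and the feature vectors in the example are made up.

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between two feature vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0


def rbf_kernel(x, y, gamma=1.0):
    """RBF (Gaussian) kernel exp(-gamma * ||x - y||^2); gamma is an assumption."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


# Hypothetical feature vectors (for example, topic weights for two blogs).
blog_a = [0.7, 0.2, 0.1]
blog_b = [0.6, 0.3, 0.1]
blog_similarity = cosine_similarity(blog_a, blog_b)
paper_similarity = rbf_kernel([0.0, 0.0], [1.0, 1.0], gamma=0.5)
```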

20 Citeseer Data. Accuracy = 36%; Accuracy = 62%. Higher accuracy by adding 'tag' information.

21 Citeseer Data. SimCut results in higher intra-cluster similarity and lower inter-cluster similarity. [Figure panels: NCut, SimCut]
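Intra-cluster and inter-cluster similarity can be compared with a simple average over pairs. The slides do not state exactly how the figures were computed, so the sketch below is one plausible reading, with a made-up similarity matrix.

```python
def mean(values):
    """Arithmetic mean, or 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0


def cluster_similarity(S, labels):
    """Return (mean intra-cluster, mean inter-cluster) pairwise similarity."""
    intra, inter = [], []
    n = len(labels)
    for i in range(n):
        for j in range(i + 1, n):
            if labels[i] == labels[j]:
                intra.append(S[i][j])
            else:
                inter.append(S[i][j])
    return mean(intra), mean(inter)


# Hypothetical pairwise similarities for four items.
S = [[1.0, 0.9, 0.2, 0.1],
     [0.9, 1.0, 0.3, 0.2],
     [0.2, 0.3, 1.0, 0.8],
     [0.1, 0.2, 0.8, 1.0]]
good_intra, good_inter = cluster_similarity(S, [0, 0, 1, 1])
poor_intra, poor_inter = cluster_similarity(S, [0, 1, 0, 1])
```

A better clustering shows a wider gap between the two averages, which is the pattern the slide reports for SimCut relative to NCut.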

22 Citeseer Data. SimCut constrains cuts based on both link structure and tags. [Figure panels: NCut, SimCut, True]

23 Blog Data. SimCut results in higher intra-cluster similarity and lower inter-cluster similarity. [Figure panels: NCut, SimCut]

24 Blog Data, 35 clusters. NCut: few, large clusters with low intra-cluster similarity. SimCut: moderate-size clusters with higher intra-cluster similarity. [Figure panels: NCut, SimCut]

25 Effect of Number of Tags, Clusters: Citeseer. More tags help, to an extent. Mutual information is lower if only the graph is used. Mutual information compares clusters to ground truth.

26 Effect of Number of Tags, Clusters: Blogs. More tags help, to an extent. Mutual information is lower if only the graph is used. Mutual information compares clusters to content-based clusters (no tags/graph).
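Mutual information between two labelings can be computed from their joint label counts. The sketch below uses the plain (unnormalised) definition in nats; whether the reported numbers were normalised is not stated in the slides, so this is an illustrative assumption, and the labelings are made up.

```python
import math
from collections import Counter


def mutual_information(labels_a, labels_b):
    """Mutual information (in nats) between two labelings of the same items."""
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    mi = 0.0
    for (a, b), n_ab in joint.items():
        mi += (n_ab / n) * math.log(n_ab * n / (count_a[a] * count_b[b]))
    return mi


# Hypothetical labelings of four items.
reference = [0, 0, 1, 1]
same_up_to_renaming = mutual_information(reference, [1, 1, 0, 0])
unrelated = mutual_information(reference, [0, 1, 0, 1])
```

The score depends only on how the groups line up, not on the label names, so a clustering that matches the reference up to renaming gets the maximum value while an unrelated one scores zero.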

28 Future Work. Evaluating the SimCut algorithm on derived feature types such as named entities, sentiments and opinions, and links to mainstream media. For a dataset with ground truth, a comparison of graph-based, text-based, and graph+tag-based clustering. Evaluating the effect of varying β.

30 Conclusions. Many social media sites allow users to tag resources. Incorporating folksonomies in community detection can yield better results. SimCut can be easily implemented and relates to NCut with two simultaneous objectives: minimize the number of node-node edges being cut, and minimize the number of node-tag edges being cut. Detected communities can be associated with meaningful, descriptive tags.

31 Thanks!

32 http://ebiquity.umbc.edu http://socialmedia.typepad.com

33 More Tags. [Figure panels: Only Graph, SimCut]

34 Citeseer (Community Size, Similarity)

35 Blogs (Community Size, Similarity)

