On Finding Fine-Granularity User Communities by Profile Decomposition
Seulki Lee, Minsam Ko, Keejun Han, Jae-Gil Lee
Department of Knowledge Service Engineering, KAIST (Korea Advanced Institute of Science and Technology)
{seulki15, minsam.ko,
The 2012 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, August 2012, Kadir Has University, Istanbul, Turkey
Table of Contents: Introduction, DecompClus Algorithm, Evaluation, Related Work, Conclusion
Community Discovery
Community discovery is one of the most popular tasks in social network analysis. Many real-world applications rely on it, for example:
  Advertising to common-interest groups
  Recommending potential collaborators in the workplace
Relationships in Social Networks
A social network is modeled as a huge graph. A node is a user. An edge is a relationship between users.
There are two types of relationships in a social network:
  Explicit relationship: follower/following, friend
  Implicit relationship: unknown, but the users share a similar interest
We focus on the implicit relationship.
Extracting Implicit Relationships
To extract implicit relationships, a user is typically represented by his/her profile, and the similarity between user profiles is measured: similarity between the profiles = implicit relationship.
The form of the profile depends on the social network and the application.
  In DBLP, the profile is the list of papers the user wrote.
  In Twitter, the profile is the list of tweets the user posted.
[Figure: User A's profile compared with User B's profile]
Limitation of a Single Profile
Generally, a user is described by only a single profile, which oversimplifies the user's multiple characteristics. As a result, meaningful communities are lost. For example, although User A and User B share an interest in photography, the overall similarity between the two users is not very high.
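The dilution effect described above can be illustrated with a small worked example. This is only a sketch: the term counts are made up for illustration (the terms echo the slide figures), and plain term counts with cosine similarity stand in for whatever weighting a real system would use.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical term counts: both users share a photography interest,
# but each also has a different, more dominant interest.
photo_a = Counter({"photo": 3, "lens": 2})
photo_b = Counter({"photo": 3, "color": 2})
other_a = Counter({"outdoor": 6, "hiking": 6})
other_b = Counter({"art": 6, "museum": 6})

# A single profile per user mixes all interests together.
whole_a = photo_a + other_a
whole_b = photo_b + other_b

sim_whole = cosine(whole_a, whole_b)  # similarity of the single profiles
sim_photo = cosine(photo_a, photo_b)  # similarity of the photography parts only

print(f"single profiles:   {sim_whole:.3f}")
print(f"photography parts: {sim_photo:.3f}")
```

With these toy numbers the photography parts are far more similar to each other than the single profiles are, which is the gap that profile decomposition is meant to recover.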
DecompClus
We propose DecompClus, a community discovery method based on profile decomposition, which divides a profile into sub-profiles.
  Step 1: Profile decomposition
  Step 2: Sub-profile clustering
[Figure: Profiles, Sub-Profiles, Communities. Profiles such as "outdoor, hiking, …, photo, lens, …" and "art, museum, …, photo, color, …" are decomposed into the sub-profiles "photo, lens, …", "photo, color, …", "outdoor, hiking, …", and "art, museum, …", which are then clustered into communities.]
Overall Procedure of DecompClus
[Figure: overall procedure]
Step 1: Profile Decomposition (1/2)
A network of unit items (e.g., papers or tweets) is constructed for each user's profile.
  A node (item) is represented by a term vector (weight: TF-IDF).
  An edge is weighted by the similarity between its two nodes (cosine similarity).
[Figure: User A's profile as a network of items i1 to i7]
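The item network for one user could be built roughly as follows. This is a minimal sketch under several assumptions: items are already tokenized, TF-IDF uses raw term frequency with idf = log(N / df) computed over that user's own items, and an edge is kept whenever the similarity exceeds a hypothetical `threshold` parameter. The slides do not specify the TF-IDF variant, the corpus used for IDF, or any threshold.

```python
import math
from collections import Counter
from itertools import combinations

def tfidf_vectors(items):
    """items: list of token lists. Returns one {term: weight} dict per item."""
    n = len(items)
    df = Counter(t for tokens in items for t in set(tokens))
    vectors = []
    for tokens in items:
        tf = Counter(tokens)
        # Assumed weighting: raw term frequency times log(N / df).
        vectors.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def build_item_network(items, threshold=0.0):
    """Return {(i, j): similarity} for item pairs whose similarity exceeds threshold."""
    vecs = tfidf_vectors(items)
    edges = {}
    for i, j in combinations(range(len(items)), 2):
        s = cosine(vecs[i], vecs[j])
        if s > threshold:
            edges[(i, j)] = s
    return edges

# Hypothetical profile with four items (e.g., tag lists of four bookmarked papers).
profile = [
    ["photo", "lens", "camera"],
    ["photo", "color", "camera"],
    ["outdoor", "hiking", "trail"],
    ["hiking", "trail", "mountain"],
]
edges = build_item_network(profile)
for (i, j), s in edges.items():
    print(f"item {i} -- item {j}: {s:.3f}")
```

In this toy profile only the two photography items and the two hiking items end up connected, so the network already hints at two sub-profiles.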
Step 1: Profile Decomposition (2/2)
Clustering is performed on the small network. We adopted a clustering algorithm based on modularity optimization, which tries to detect high-modularity partitions of networks [V. D. Blondel et al., 2008]. Each cluster becomes a sub-profile.
[Figure: User A's profile split into User A's sub-profiles]
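To make the idea of modularity-based clustering concrete, the sketch below computes the modularity Q of a partition and greedily merges clusters while Q improves. Note that this is a naive agglomerative procedure chosen for brevity, not the algorithm of Blondel et al. cited above; the function names, node names, and edge weights are all illustrative assumptions.

```python
from itertools import combinations

def modularity(edges, communities):
    """Modularity Q of a weighted undirected graph without self-loops.
    edges: {(u, v): weight}; communities: list of disjoint node sets."""
    m = sum(edges.values())
    if m == 0:
        return 0.0
    label = {n: i for i, c in enumerate(communities) for n in c}
    inside = [0.0] * len(communities)  # edge weight inside each community
    degree = [0.0] * len(communities)  # summed weighted degree per community
    for (u, v), w in edges.items():
        degree[label[u]] += w
        degree[label[v]] += w
        if label[u] == label[v]:
            inside[label[u]] += w
    return sum(inside[i] / m - (degree[i] / (2 * m)) ** 2
               for i in range(len(communities)))

def greedy_modularity_clusters(nodes, edges):
    """Start from singletons; repeatedly apply the merge that raises Q the most."""
    communities = [{n} for n in nodes]
    best_q = modularity(edges, communities)
    while len(communities) > 1:
        best_merge = None
        for i, j in combinations(range(len(communities)), 2):
            merged = [c for k, c in enumerate(communities) if k not in (i, j)]
            merged.append(communities[i] | communities[j])
            q = modularity(edges, merged)
            if q > best_q + 1e-12:
                best_q, best_merge = q, merged
        if best_merge is None:  # no merge improves Q any further
            break
        communities = best_merge
    return communities, best_q

# Hypothetical item network: two tight groups joined by one weak edge.
nodes = ["i1", "i2", "i3", "i4", "i5", "i6"]
edges = {
    ("i1", "i2"): 0.9, ("i1", "i3"): 0.8, ("i2", "i3"): 0.7,
    ("i4", "i5"): 0.9, ("i4", "i6"): 0.8, ("i5", "i6"): 0.7,
    ("i3", "i4"): 0.1,
}
clusters, q = greedy_modularity_clusters(nodes, edges)
print(sorted(sorted(c) for c in clusters), round(q, 3))
```

Each resulting cluster would play the role of one sub-profile. The pairwise search is quadratic in the number of clusters per step, which is tolerable only because a single user's item network is small.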
Step 2: Sub-Profile Clustering (1/2)
A network of sub-profiles is constructed by accumulating the sub-profiles of every user.
  A node (sub-profile) is represented by a term vector (weight: TF-IDF).
  An edge is weighted by the similarity between its two nodes (cosine similarity).
[Figure: network of sub-profiles from Users A, B, C, D, and E; User A contributes two sub-profiles]
Step 2: Sub-Profile Clustering (2/2)
Clustering is performed on the network of sub-profiles. The same clustering method is used to group the sub-profiles. Each cluster now becomes a user community.
A user can belong to multiple communities (e.g., User A is in both C1 and C2).
DecompClus thus discovers an overlapping community structure using a non-overlapping clustering method.
[Figure: Communities C1 and C2. One community groups sub-profiles of Users A, D, and E; the other groups sub-profiles of Users A, B, and C.]
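The step from sub-profile clusters to overlapping user communities amounts to replacing each sub-profile by its owner. The sketch below assumes each sub-profile is identified by a hypothetical (owner, index) pair, and the assignment of users to C1 and C2 is illustrative only.

```python
from collections import defaultdict

# Hypothetical output of sub-profile clustering: each sub-profile sits in
# exactly one cluster (the clustering itself is non-overlapping).
clusters = {
    "C1": [("A", 0), ("D", 0), ("E", 0)],
    "C2": [("A", 1), ("B", 0), ("C", 0)],
}

def user_communities(clusters):
    """Map clusters of sub-profiles to user communities and per-user memberships."""
    members = {cid: sorted({owner for owner, _ in subs})
               for cid, subs in clusters.items()}
    memberships = defaultdict(list)
    for cid, users in members.items():
        for user in users:
            memberships[user].append(cid)
    return members, dict(memberships)

members, memberships = user_communities(clusters)
print(members)
print(memberships)
```

User A owns one sub-profile in each cluster and therefore appears in both communities, even though no sub-profile belongs to more than one cluster.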
Experimental Set-up (1/3)
Evaluation methods
  Quantitative evaluation: verify that DecompClus finds tighter and better-connected communities
    Modularity value
    Intra-similarity
    Inter-similarity
  Qualitative evaluation: explain how the communities found by our method differ semantically from those found by the compared method
    Defining the theme of each community
    Case studies (see the paper)
    Visualization
Experimental Set-up (2/3)
CiteULike: a social bookmarking service for scholarly papers
Dataset
  # of users = 122
  # of articles = 25,089
  # of unique stemmed tags = 16,161
Half of the users have more than one interest.
[Figure: Distribution of users according to their tags. Tag groups: tag like 'social_network%' or 'socialnetwork%'; tag like 'data_mining%' or 'mining%' or 'knowledge_discovery%'; tag like 'recommend%']
Experimental Set-up (3/3)
Implementation
  Gephi library: open-source software for visualizing and analyzing large network graphs
Baseline
  Follows almost the same procedure as DecompClus.
  Uses only one overall profile per user.
[Figure: Baseline maps whole profiles such as "photo, lens, …, outdoor, hiking, …" and "photo, color, …, art, museum, …" directly to communities]
Discovered Communities
# of communities: DecompClus finds more communities than Baseline does.
# of users in a community: the communities discovered by DecompClus generally have more members than those of Baseline, because DecompClus allows a user to belong to multiple communities at the same time.

Baseline
  Community ID    # of users
  Bc1             57
  Bc2             65

DecompClus
  Community ID    # of users
  DC1             80
  DC2             53
  DC3             91
  DC4             84
Quantitative Evaluation
DecompClus achieves better values than Baseline on all three metrics:
  Modularity value: the strength of the division of a network into modules
  Intra-similarity: the average similarity within a community
  Inter-similarity: the average similarity between communities
With DecompClus, the connections between members within a community are denser, whereas the connections between members of different communities are sparser.
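The two similarity metrics can be sketched as plain averages over member pairs. This is an assumption about their exact form, since the slides only describe them in words; the similarity values below are invented, and `sim` is a hypothetical lookup standing in for cosine similarity between members.

```python
from itertools import combinations, product
from statistics import mean

def intra_similarity(community, sim):
    """Average pairwise similarity among the members of one community."""
    pairs = list(combinations(sorted(community), 2))
    return mean(sim(u, v) for u, v in pairs) if pairs else 0.0

def inter_similarity(c1, c2, sim):
    """Average similarity over all cross-community member pairs."""
    pairs = list(product(sorted(c1), sorted(c2)))
    return mean(sim(u, v) for u, v in pairs) if pairs else 0.0

# Hypothetical symmetric similarity table for four members.
table = {
    frozenset(("a", "b")): 0.8, frozenset(("c", "d")): 0.6,
    frozenset(("a", "c")): 0.1, frozenset(("a", "d")): 0.2,
    frozenset(("b", "c")): 0.0, frozenset(("b", "d")): 0.1,
}

def sim(u, v):
    return table[frozenset((u, v))]

c1, c2 = {"a", "b"}, {"c", "d"}
intra_c1 = intra_similarity(c1, sim)
intra_c2 = intra_similarity(c2, sim)
inter = inter_similarity(c1, c2, sim)
print(intra_c1, intra_c2, inter)
```

A good partition shows high intra-similarity and low inter-similarity, as in this toy case.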
Qualitative Evaluation (1/2)
DecompClus preserves the themes defined by Baseline.
DecompClus finds new communities that are not found by Baseline.

Baseline
  ID     Theme
  BC1    Data mining & Recommendation
  BC2    Social Network

DecompClus
  ID     Theme
  DC1    Data mining & Recommendation
  DC2    Semantic Web (newly found)
  DC3    Data mining & Bioinformatics (newly found)
  DC4    Social Network
Qualitative Evaluation (2/2)
In DecompClus, a user's minor interests are not assimilated into his/her major interests, so new communities that consist of users' minor interests can be discovered.
[Figure: Distribution of articles related to "Semantic web"]
[Figure: Distribution of articles related to "Bioinformatics"]
Visualization
Drawn with the ForceAtlas2 layout provided by Gephi.
The community structure produced by DecompClus is more clearly distinguishable.
Related Work (1/2)
Comparison with related areas

  Approach                              # of profiles per user   Mapping in clustering (node : community)   Result
  Non-overlapping community discovery   One profile              1:1                                        A user belongs to one community
  Overlapping community discovery       One profile              1:N                                        A user belongs to multiple communities
  DecompClus                            Multiple sub-profiles    1:1                                        A user belongs to multiple communities
Related Work (2/2)
Non-overlapping community discovery
  Newman's method [Newman and Girvan, 2004]
  Multi-level graph partitioning method [Karypis and Kumar, 1995]
  Attribute augmented graph [Zhou et al., 2006]
  Bayesian generative models [Wang, 2006]
Overlapping community discovery
  CPM (clique percolation method) [Palla et al., 2005]
  Connectedness and local optimality [Goldberg et al., 2010]
  Label propagation [Gregory, 2009]
Conclusion
  A novel concept of profile decomposition, which enables us to detect fine-granularity user communities with implicit relationships
  A new approach to discovering overlapping communities with non-overlapping community discovery algorithms
  We demonstrate, using a real data set, that our algorithm effectively discovers user communities from social media data.
THANK YOU !!
Case Studies: Case 1
Users who become members of multiple communities through profile decomposition. In our data set, there are 99 such users in total (81.1%), like User A.
For example, User A's profile decomposes into:
  User A's sub-profile 1: semantics, semantic web, rdf, ontology, social semantic web, …
  User A's sub-profile 2: user model, recommender, personalization, user profiling, knn, data mining, …
  User A's sub-profile 3: social network analysis, social search, graphs, …
Baseline communities: Bc1 (Data mining & Recommendation), Bc2 (Social network)
DecompClus communities: Dc1 (Data mining & Recommendation), Dc2 (Semantic web), Dc3 (Data mining & Bioinformatics), Dc4 (Social network)
Case Studies: Case 2
Users who become members of the communities newly discovered by DecompClus. There are 9 such users in total (7.3%), like User B.
For example, User B's profile includes:
  User B's sub-profile 1: statistics, cancer, genomics, gene, sequencing, virus, bacteria, database, classification, …
Baseline communities: Bc1 (Data mining & Recommendation), Bc2 (Social network)
DecompClus communities: Dc1 (Data mining & Recommendation), Dc2 (Semantic web), Dc3 (Data mining & Bioinformatics), Dc4 (Social network)