
On Finding Fine-Granularity User Communities by Profile Decomposition Seulki Lee, Minsam Ko, Keejun Han, Jae-Gil Lee Department of Knowledge Service Engineering.


1 On Finding Fine-Granularity User Communities by Profile Decomposition
Seulki Lee, Minsam Ko, Keejun Han, Jae-Gil Lee
Department of Knowledge Service Engineering, KAIST (Korea Advanced Institute of Science and Technology)
{seulki15, minsam.ko, brianhan87}@gmail.com, jaegil@kaist.ac.kr
The 2012 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, 26-29 August 2012, Kadir Has University, Istanbul, Turkey

2 Table of Contents
- Introduction
- DecompClus Algorithm
- Evaluation
- Related Work
- Conclusion

3 Community Discovery
- Community discovery is one of the most popular tasks in social network analysis.
- Many real-world applications rely on community discovery:
  - Advertisement to common-interest groups
  - Recommendation of potential collaborators in workplaces

4 Relationships in Social Networks
- A social network is modeled as a huge graph.
  - A node is a user.
  - An edge is a relationship between users.
- Two types of relationships in a social network:
  - Explicit relationship: follower / following, friend
  - Implicit relationship: unknown, but similar interest (we focus on this relationship)

5 Extracting Implicit Relationships
- To extract implicit relationships, a user is typically represented by his/her profile, and the similarity between user profiles is measured.
- The form of the profile depends on the social network and application.
  - In DBLP, the profile is the list of papers the user wrote.
  - In Twitter, the profile is the list of tweets the user posted.
- Similarity between the profiles = implicit relationship
[Figure: User A's profile and User B's profile]

6 Limitation of a Single Profile
- Generally, a user is described by only a single profile, which oversimplifies the user's multiple characteristics.
- This oversimplification causes meaningful communities to be lost.
- Example: although User A and User B share an interest in photography, the overall similarity between the two users is not very high.
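
The dilution effect described on this slide can be illustrated with a small sketch. The term lists below are hypothetical toy data, not taken from the paper, and raw term counts stand in for the TF-IDF weights used later in the deck. The point is only that a shared interest scores high on its own but low once each user's other interests are mixed into a single profile.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bags of terms (raw term counts)."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical toy profiles: both users like photography, but each has a
# second, larger interest that the other does not share.
a_photo = ["photo", "lens", "camera"]
a_other = ["outdoor", "hiking", "trail", "mountain", "camp", "tent"]
b_photo = ["photo", "lens", "color"]
b_other = ["art", "museum", "painting", "gallery", "sculpture", "exhibit"]

whole_profile_sim = cosine(a_photo + a_other, b_photo + b_other)
photo_only_sim = cosine(a_photo, b_photo)

print(f"whole profiles:           {whole_profile_sim:.3f}")
print(f"photography sub-profiles: {photo_only_sim:.3f}")
```

With these toy lists the photography parts alone are three times as similar as the whole profiles, which is the gap that profile decomposition is meant to recover.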

7 DecompClus
- We propose DecompClus, a community discovery method based on profile decomposition, which divides a profile into sub-profiles.
  - Step 1: Profile decomposition (profiles -> sub-profiles)
  - Step 2: Sub-profile clustering (sub-profiles -> communities)
[Figure: profiles with mixed terms (outdoor, hiking, …; art, museum, …; photo, lens, …; photo, color, …) are split into sub-profiles, which are then grouped into communities]

8 Table of Contents: Introduction, DecompClus Algorithm, Evaluation, Related Work, Conclusion

9 Overall Procedure of DecompClus

10 Step 1: Profile Decomposition (1/2)
- A network of unit items (e.g., papers or tweets) is constructed for each user's profile.
  - A node (item) is represented by a term vector (weight: TF-IDF).
  - An edge is determined by the similarity between two nodes (cosine similarity).
[Figure: User A's profile as a network of items i1 to i7]
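
A minimal sketch of this construction, under stated assumptions: items are already tokenized, TF-IDF is computed as term frequency times log(N/df) within one user's items, and an edge is kept only when the cosine similarity exceeds a `threshold` parameter. The function names, the threshold, and the toy items are illustrative and not taken from the paper.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one {term: weight} dict per item, using tf * log(N / df)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: (c / len(doc)) * math.log(n / df[t])
                        for t, c in tf.items()})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def item_network(docs, threshold=0.0):
    """Weighted edges {(i, j): similarity} between items of one profile."""
    vecs = tfidf_vectors(docs)
    edges = {}
    for i in range(len(vecs)):
        for j in range(i + 1, len(vecs)):
            s = cosine(vecs[i], vecs[j])
            if s > threshold:
                edges[(i, j)] = s
    return edges

# Hypothetical items (e.g., tag lists of bookmarked papers) of one user.
items = [
    ["photo", "lens", "camera"],
    ["photo", "color", "camera"],
    ["outdoor", "hiking", "trail"],
    ["hiking", "trail", "mountain"],
]
edges = item_network(items)
for pair, weight in sorted(edges.items()):
    print(pair, round(weight, 3))
```

In this toy profile only the two photography items and the two hiking items end up connected, so the network already shows the two groups that the clustering step would separate.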

11 Step 1: Profile Decomposition (2/2)
- Clustering is performed on the small network.
  - We adopted a clustering algorithm based on modularity optimization, which tries to detect high-modularity partitions of networks [V. D. Blondel et al., 2008].
- Each cluster becomes a sub-profile.
[Figure: User A's profile is split into User A's sub-profiles]
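
The quantity that such an algorithm tries to maximize can be sketched as follows. This is only the standard modularity score of a given partition of a weighted, undirected graph without self-loops, Q = sum over communities of (internal weight / m) - (total degree / 2m)^2. It is not the optimization procedure of Blondel et al. The function name and the toy graph are assumptions made for illustration.

```python
from collections import defaultdict

def modularity(edges, community_of):
    """Modularity of a partition.

    edges: {(u, v): weight} for an undirected graph without self-loops.
    community_of: {node: community label}.
    """
    m = sum(edges.values())  # total edge weight
    if m == 0:
        return 0.0
    internal = defaultdict(float)      # weight of edges inside each community
    total_degree = defaultdict(float)  # summed weighted degree per community
    for (u, v), w in edges.items():
        total_degree[community_of[u]] += w
        total_degree[community_of[v]] += w
        if community_of[u] == community_of[v]:
            internal[community_of[u]] += w
    return sum(internal[c] / m - (total_degree[c] / (2 * m)) ** 2
               for c in total_degree)

# Toy graph: two triangles joined by a single bridge edge (2, 3).
toy_edges = {(0, 1): 1.0, (1, 2): 1.0, (0, 2): 1.0,
             (3, 4): 1.0, (4, 5): 1.0, (3, 5): 1.0,
             (2, 3): 1.0}
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {node: "all" for node in range(6)}

q_split = modularity(toy_edges, two_groups)
q_single = modularity(toy_edges, one_group)
print(round(q_split, 4), round(q_single, 4))
```

Splitting the toy graph at the bridge scores higher than keeping everything in one group, which is the kind of partition a modularity-based method would prefer.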

12 Step 2: Sub-Profile Clustering (1/2)
- A network of sub-profiles is constructed by accumulating sub-profiles from every user.
  - A node (sub-profile) is represented by a term vector (weight: TF-IDF).
  - An edge is weighted by the similarity between two nodes (cosine similarity).
[Figure: network of sub-profiles from Users A, B, C, D, and E; User A contributes two sub-profiles]

13 Step 2: Sub-Profile Clustering (2/2)
- Clustering is performed on the network of sub-profiles.
  - The same clustering method is used to group sub-profiles.
- Now, each cluster becomes a user community.
- A user can belong to multiple communities (e.g., User A is in C1 and C2).
- DecompClus is a method for discovering overlapping community structure using a non-overlapping clustering method.
[Figure: Communities C1 and C2; one contains sub-profiles of Users A, D, and E, the other sub-profiles of Users A, B, and C, so User A appears in both]
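
How a non-overlapping clustering of sub-profiles yields overlapping user communities can be sketched in a few lines. The sub-profile identifiers, the helper name, and the assignment of users to C1 and C2 are assumptions chosen to mirror the figure, not data from the paper.

```python
from collections import defaultdict

def user_communities(owner_of, cluster_of):
    """Map each community to the set of users owning a sub-profile in it.

    owner_of: {sub_profile_id: user}
    cluster_of: {sub_profile_id: community label}; each sub-profile has
    exactly one label, i.e. the clustering itself is non-overlapping.
    """
    members = defaultdict(set)
    for sub_profile, community in cluster_of.items():
        members[community].add(owner_of[sub_profile])
    return dict(members)

# Hypothetical sub-profiles: User A has two, every other user has one.
owner_of = {"A1": "A", "A2": "A", "B1": "B", "C1": "C", "D1": "D", "E1": "E"}
cluster_of = {"A1": "C1", "D1": "C1", "E1": "C1",
              "A2": "C2", "B1": "C2", "C1": "C2"}

communities = user_communities(owner_of, cluster_of)
for name in sorted(communities):
    print(name, sorted(communities[name]))
```

Each sub-profile sits in exactly one cluster, yet User A ends up in both communities because the overlap comes from the decomposition step rather than from the clustering algorithm.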

14 Overall Procedure of DecompClus

15 Table of Contents: Introduction, DecompClus Algorithm, Evaluation, Related Work, Conclusion

16 Experimental Set-up (1/3)
- Evaluation methods
  - Quantitative evaluation: verify that DecompClus finds more tightly and better connected communities
    - Modularity value
    - Intra-similarity
    - Inter-similarity
  - Qualitative evaluation: explain how the communities found by our method differ semantically from those found by the compared method
    - Defining the theme of each community
    - Case studies (see the paper)
    - Visualization

17 Experimental Set-up (2/3)
- CiteULike
  - Social bookmarking service for scholarly papers
  - http://www.citeulike.org/faq/data.adp
- Dataset
  - # of users = 122
  - # of articles = 25,089
  - # of unique stemmed tags = 16,161
- Half of the users have more than one interest.
[Figure: Distribution of users according to their tags, with the following tag filters]
  - tag like 'social_network%' or 'socialnetwork%'
  - tag like 'data_mining%' or 'mining%' or 'knowledge_discovery%'
  - tag like 'recommend%'

18 Experimental Set-up (3/3)
- Implementation
  - Gephi library: open-source software for visualizing and analyzing large network graphs
- Baseline
  - Follows almost the same procedure, but uses only one overall profile per user.
[Figure: Baseline maps whole profiles (photo, lens, …; outdoor, hiking, …; photo, color, …; art, museum, …) directly to communities]

19 Discovered Communities

Baseline
Community ID    # of users
Bc1             57
Bc2             65

DecompClus
Community ID    # of users
DC1             80
DC2             53
DC3             91
DC4             84

- # of communities: DecompClus finds more communities than Baseline does.
- # of users per community: the communities discovered by DecompClus tend to have more members than those of Baseline, because DecompClus allows a user to belong to multiple communities at the same time.

20 Quantitative Evaluation
- DecompClus achieves better metrics than Baseline.
  - Modularity value: the strength of division of a network into modules
  - Intra-similarity: the average value of similarities within a community
  - Inter-similarity: the average value of similarities between communities
- In DecompClus, the connections between members within a community are denser, whereas the connections between members of different communities are sparser.
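
One plausible reading of the two similarity metrics is sketched below. It assumes that intra-similarity averages the pairwise similarity over all member pairs inside one community and that inter-similarity averages it over all pairs drawn from two different communities. The exact averaging used in the paper may differ, and the node names and similarity values are hypothetical.

```python
from itertools import combinations, product

def intra_similarity(members, sim):
    """Average similarity over all unordered pairs inside one community."""
    pairs = list(combinations(members, 2))
    return sum(sim(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0

def inter_similarity(members_a, members_b, sim):
    """Average similarity over all pairs spanning two communities."""
    pairs = [(a, b) for a, b in product(members_a, members_b) if a != b]
    return sum(sim(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0

# Hypothetical pairwise similarities; unlisted pairs count as 0.
toy_sims = {
    frozenset(("s1", "s2")): 0.9,
    frozenset(("s3", "s4")): 0.8,
    frozenset(("s1", "s3")): 0.1,
    frozenset(("s2", "s4")): 0.1,
}

def sim(a, b):
    return toy_sims.get(frozenset((a, b)), 0.0)

c1, c2 = ["s1", "s2"], ["s3", "s4"]
intra_c1 = intra_similarity(c1, sim)
intra_c2 = intra_similarity(c2, sim)
inter_c1_c2 = inter_similarity(c1, c2, sim)
print(intra_c1, intra_c2, inter_c1_c2)
```

A good partition in the sense of this slide has high intra-similarity and low inter-similarity, as in the toy values above.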

21 Qualitative Evaluation (1/2)

Baseline
ID     Theme
BC1    Data mining & Recommendation
BC2    Social Network

DecompClus
ID     Theme
DC1    Data mining & Recommendation
DC2    Semantic Web (newly found)
DC3    Data mining & Bioinformatics (newly found)
DC4    Social Network

- DecompClus preserves the themes defined by Baseline.
- DecompClus finds new communities that are not found by Baseline.

22 Qualitative Evaluation (2/2)
- In DecompClus, a user's minor interests are not assimilated into his/her major interests, so new communities that consist of users' minor interests can be discovered.
[Figures: Distribution of articles related to "Semantic web"; distribution of articles related to "Bioinformatics"]

23 Visualization
- Layout: ForceAtlas2, provided by Gephi
- The community structure produced by DecompClus is more clearly distinguishable.

24 Table of Contents: Introduction, DecompClus Algorithm, Evaluation, Related Work, Conclusion

25 Related Work (1/2)
- Comparison with related areas

Approach                               # of profiles per user   Mapping in clustering (node : community)   Result
Non-overlapping community discovery    One profile              1:1                                        A user belongs to one community
Overlapping community discovery        One profile              1:N                                        A user belongs to multiple communities
DecompClus                             Multiple sub-profiles    1:1                                        A user belongs to multiple communities

26 Related Work (2/2)
- Non-overlapping community discovery
  - Newman's method [Newman and Girvan, 2004]
  - Multi-level graph partitioning method [Karypis and Kumar, 1995]
  - Attribute augmented graph [Zhou et al., 2006]
  - Bayesian generative models [Wang, 2006]
- Overlapping community discovery
  - CPM (clique percolation method) [Palla et al., 2005]
  - Connectedness and local optimality [Goldberg et al., 2010]
  - Label propagation [Gregory, 2009]

27 Conclusion
- A novel concept of profile decomposition, which enables us to detect fine-granularity user communities with implicit relationships
- A new approach to discovering overlapping communities with non-overlapping community discovery algorithms
- We demonstrate, using a real data set, that our algorithm effectively discovers user communities from social media data.

28 THANK YOU !!

29 Case Studies: Case 1
- Users who become members of multiple communities through profile decomposition
- In our data set, there are 99 users in total (81.1%) like User A.
- Example: User A's profile
  - Sub-profile 1: semantics, semantic web, rdf, ontology, social semantic web, …
  - Sub-profile 2: user model, recommender, personalization, user profiling, knn, data mining, …
  - Sub-profile 3: social network analysis, social search, graphs, …
- Baseline communities: Bc1 (data mining & recommendation), Bc2 (social network)
- DecompClus communities: Dc1 (data mining & recommendation), Dc2 (semantic web), Dc3 (data mining & bioinformatics), Dc4 (social network)

30 Case Studies: Case 2
- Users who become members of the communities newly discovered by DecompClus
- There are 9 users in total (7.3%) like User B.
- Example: User B's profile
  - Sub-profile 1: statistics, cancer, genomics, gene, sequencing, virus, bacteria, database, classification, …
- Baseline communities: Bc1 (data mining & recommendation), Bc2 (social network)
- DecompClus communities: Dc1 (data mining & recommendation), Dc2 (semantic web), Dc3 (data mining & bioinformatics), Dc4 (social network)

