On Finding Fine-Granularity User Communities by Profile Decomposition Seulki Lee, Minsam Ko, Keejun Han, Jae-Gil Lee Department of Knowledge Service Engineering.

Presentation transcript:

On Finding Fine-Granularity User Communities by Profile Decomposition
Seulki Lee, Minsam Ko, Keejun Han, Jae-Gil Lee
Department of Knowledge Service Engineering, KAIST (Korea Advanced Institute of Science and Technology)
{seulki15, minsam.ko,
The 2012 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining
August, 2012, Kadir Has University, Istanbul, Turkey

Table of Contents
- Introduction
- DecompClus Algorithm
- Evaluation
- Related Work
- Conclusion

Community Discovery
- Community discovery is one of the most popular tasks in social network analysis.
- Community discovery has many real-world applications:
  - Advertisement to common-interest groups
  - Recommendation of potential collaborators in workplaces

Relationships in Social Networks
- A social network is modeled as a huge graph.
  - A node is a user.
  - An edge is a relationship between users.
- Two types of relationships in a social network
  - Explicit relationship: follower/following, friend
  - Implicit relationship: unknown, but similar interest (we focus on this relationship)

Extracting Implicit Relationships
- To extract implicit relationships, a user is typically represented by his/her profile, and the similarity between user profiles is measured.
- The form of the profile depends on the social network and application.
  - In DBLP, the profile is a list of papers the user wrote.
  - In Twitter, the profile is a list of tweets the user posted.
- Similarity between the profiles (User A’s profile, User B’s profile) = implicit relationship
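The similarity measurement described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes a profile is reduced to a bag of terms with raw counts as weights, and the two example profiles are hypothetical.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty profile is similar to nothing
    return dot / (norm_a * norm_b)

# Hypothetical profiles: the bag of terms collected from each user's items.
profile_a = Counter(["photo", "lens", "photo", "hiking", "outdoor", "museum", "art"])
profile_b = Counter(["photo", "lens", "color", "photo"])

# The implicit relationship between A and B is the similarity of their profiles.
similarity = cosine(profile_a, profile_b)
```

A higher value suggests a stronger implicit relationship; any cut-off for treating two users as related would be an application-specific choice.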

Limitation of a Single Profile
- Generally, a user is described by only a single profile, which oversimplifies the multiple characteristics of the user.
- This problem results in the loss of meaningful communities.
- Example: although User A and User B share an interest in photography, the overall similarity between the two users is not very high.

DecompClus
- We propose DecompClus, a community discovery method based on profile decomposition, which divides a profile into sub-profiles.
  - Step 1: Profile decomposition (profiles to sub-profiles)
  - Step 2: Sub-profile clustering (sub-profiles to communities)
[Figure: Profiles, Sub-Profiles, Communities. Example term groups: "photo, lens, …", "photo, color, …", "outdoor, hiking, …", "art, museum, …"]

Overall Procedure of DecompClus

Step 1: Profile Decomposition (1/2)
- A network of unit items (e.g., papers or tweets) is constructed for each user’s profile.
  - A node (item) is represented by a term vector (weight: TF-IDF).
  - An edge is weighted by the similarity between two nodes (cosine similarity).
[Figure: User A’s profile as a network of items i1 to i7]
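A minimal sketch of building such an item network for one user is given below. The slide only names TF-IDF and cosine similarity, so the exact IDF formula (log of N over document frequency), the `threshold` parameter, and the four example items are assumptions made for illustration.

```python
import math
from collections import Counter

def tfidf_vectors(items):
    """items: list of token lists. Returns one {term: tf * idf} dict per item."""
    n = len(items)
    df = Counter(term for tokens in items for term in set(tokens))
    vectors = []
    for tokens in items:
        tf = Counter(tokens)
        # Assumed IDF variant: log(N / df); the slides do not specify one.
        vectors.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def item_network(items, threshold=0.0):
    """Weighted edges {(i, j): similarity} between items of one profile."""
    vecs = tfidf_vectors(items)
    edges = {}
    for i in range(len(vecs)):
        for j in range(i + 1, len(vecs)):
            s = cosine(vecs[i], vecs[j])
            if s > threshold:  # hypothetical cut-off; keeps the graph sparse
                edges[(i, j)] = s
    return edges

# Hypothetical items (e.g., tag lists of four bookmarked papers) of one user.
items = [["photo", "lens"], ["photo", "color"],
         ["hiking", "outdoor"], ["outdoor", "trail"]]
edges = item_network(items)
```

In this toy profile only the two photography items and the two outdoor items are connected, which is the structure the subsequent clustering step is meant to pick up.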

Step 1: Profile Decomposition (2/2)
- Clustering is performed on the small network.
  - We adopted a clustering algorithm based on modularity optimization, which tries to detect high-modularity partitions of networks [V. D. Blondel et al., 2008].
- Each cluster becomes a sub-profile.
[Figure: User A’s profile divided into User A’s sub-profiles]
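To give a feel for modularity-based clustering, the sketch below implements only a simplified local-moving phase in the spirit of the cited method: each node greedily joins the neighboring cluster that yields the largest modularity gain. It is not the full algorithm of Blondel et al. (the graph-aggregation phase is omitted), and the function name, the `max_passes` guard, and the toy graph are assumptions.

```python
from collections import defaultdict

def local_moving(edges, max_passes=100):
    """Greedy modularity local moving on a weighted undirected graph.

    edges: {(u, v): weight} without self-loops. Returns {node: cluster label}.
    """
    adj = defaultdict(dict)
    for (u, v), w in edges.items():
        adj[u][v] = w
        adj[v][u] = w
    degree = {u: sum(nbrs.values()) for u, nbrs in adj.items()}
    two_m = sum(degree.values())
    comm = {u: u for u in adj}  # start with every node in its own cluster
    tot = dict(degree)          # total degree per cluster label
    for _ in range(max_passes):
        moved = False
        for u in sorted(adj):
            old = comm[u]
            tot[old] -= degree[u]  # take u out of its current cluster
            links = defaultdict(float)
            for v, w in adj[u].items():
                links[comm[v]] += w  # weight from u into each neighbor cluster
            def gain(c):
                # Proportional to the modularity gain of inserting u into c.
                return links[c] - tot[c] * degree[u] / two_m
            best, best_gain = old, gain(old)
            for c in sorted(links):
                g = gain(c)
                if g > best_gain:
                    best, best_gain = c, g
            comm[u] = best
            tot[best] += degree[u]
            moved = moved or best != old
        if not moved:
            break
    return comm

# Toy item network: two triangles joined by one weak edge.
edges = {(0, 1): 1.0, (0, 2): 1.0, (1, 2): 1.0,
         (3, 4): 1.0, (3, 5): 1.0, (4, 5): 1.0, (2, 3): 0.1}
clusters = local_moving(edges)
```

On this toy graph the two triangles end up in separate clusters; in DecompClus terms, each resulting cluster of items would become one sub-profile.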

Step 2: Sub-Profile Clustering (1/2)
- A network of sub-profiles is constructed by accumulating sub-profiles from every user.
  - A node (sub-profile) is represented by a term vector (weight: TF-IDF).
  - An edge is weighted by the similarity between two nodes (cosine similarity).
[Figure: network of sub-profiles from Users A, B, C, D, and E; User A contributes two sub-profiles]

Step 2: Sub-Profile Clustering (2/2)
- Clustering is performed on the network of sub-profiles.
  - The same clustering method is used to group sub-profiles.
- Now, each cluster becomes a user community.
- A user can belong to multiple communities (e.g., User A is in C1 and C2).
  - DecompClus discovers an overlapping community structure using a non-overlapping clustering method.
[Figure: Community C1 contains sub-profiles of Users A, D, and E; Community C2 contains sub-profiles of Users A, B, and C]
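The mapping from non-overlapping sub-profile clusters back to overlapping user communities can be sketched as below. The sub-profile identifiers, function names, and dictionaries are hypothetical; the membership mirrors the C1/C2 example on this slide.

```python
from collections import defaultdict

def user_communities(owner, cluster):
    """Turn a sub-profile clustering into user communities.

    owner:   {sub_profile_id: user}
    cluster: {sub_profile_id: community label}, one label per sub-profile
    """
    members = defaultdict(set)
    for sub_profile, label in cluster.items():
        members[label].add(owner[sub_profile])
    return dict(members)

def communities_of(user, members):
    """All communities a user belongs to (possibly more than one)."""
    return sorted(label for label, users in members.items() if user in users)

# Hypothetical sub-profile ids; User A contributes two sub-profiles.
owner = {"A1": "A", "A2": "A", "B1": "B", "C1": "C", "D1": "D", "E1": "E"}
cluster = {"A1": "C1", "D1": "C1", "E1": "C1",
           "A2": "C2", "B1": "C2", "C1": "C2"}

members = user_communities(owner, cluster)
a_memberships = communities_of("A", members)
```

Each sub-profile receives exactly one label, yet User A appears in both communities because the two sub-profiles were clustered separately.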

Experimental Set-up (1/3)
- Evaluation methods
  - Quantitative evaluation: verify that DecompClus finds more tightly connected communities
    - Modularity value
    - Intra-similarity
    - Inter-similarity
  - Qualitative evaluation: explain how the communities found by our method and those found by the compared method differ semantically
    - Defining the theme of each community
    - Case studies (see the paper)
    - Visualization

Experimental Set-up (2/3)
- CiteULike: social bookmarking service for scholarly papers
- Dataset
  - # of users = 122
  - # of articles = 25,089
  - # of unique stemmed tags = 16,161
- Half of the users have more than one interest.
[Figure: Distribution of users according to their tags. Tag patterns:
  tag like 'social_network%' or 'socialnetwork%';
  tag like 'data_mining%' or 'mining%' or 'knowledge_discovery%';
  tag like 'recommend%']

Experimental Set-up (3/3)
- Implementation
  - Gephi library: open-source software for visualizing and analyzing large network graphs
- Baseline
  - Follows almost the same procedure.
  - Uses only one overall profile per user.
[Figure: Baseline maps whole profiles ("photo, lens, …", "outdoor, hiking, …", "photo, color, …", "art, museum, …") directly to communities]

Discovered Communities

Baseline
Community ID   # of users
Bc1            57
Bc2            65

DecompClus
Community ID   # of users
DC1            80
DC2            53
DC3            91
DC4            84

- # of communities
  - DecompClus finds more communities than Baseline does.
- # of users in a community
  - The communities discovered by DecompClus have a greater number of members than those of Baseline, because DecompClus allows a user to belong to multiple communities at the same time.

Quantitative Evaluation
- DecompClus achieves better metrics than Baseline.
  - Modularity value: the strength of the division of a network into modules
  - Intra-similarity: the average value of similarities within a community
  - Inter-similarity: the average value of similarities between communities
- In DecompClus, the connections between members within a community are denser, whereas the connections between members of different communities are sparser.
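The three metrics can be sketched as below. The modularity function follows the standard weighted definition (for each community, the fraction of edge weight inside it minus the squared fraction of total degree attached to it). The slides do not say exactly how the similarity averages are taken, so pooling all node pairs into a single intra average and a single inter average is an assumption, as are the function names and the toy network.

```python
def modularity(edges, comm):
    """Weighted modularity Q of a partition.

    edges: {(u, v): weight} without self-loops; comm: {node: community label}.
    """
    two_m = 2 * sum(edges.values())
    inside, tot = {}, {}
    for (u, v), w in edges.items():
        tot[comm[u]] = tot.get(comm[u], 0.0) + w
        tot[comm[v]] = tot.get(comm[v], 0.0) + w
        if comm[u] == comm[v]:
            inside[comm[u]] = inside.get(comm[u], 0.0) + 2 * w
    return sum(inside.get(c, 0.0) / two_m - (tot[c] / two_m) ** 2 for c in tot)

def intra_inter_similarity(sim, comm):
    """Average similarity of same-community pairs and of cross-community pairs.

    sim: {(u, v): similarity} for the node pairs being compared.
    """
    intra, inter = [], []
    for (u, v), s in sim.items():
        (intra if comm[u] == comm[v] else inter).append(s)
    def avg(xs):
        return sum(xs) / len(xs) if xs else 0.0
    return avg(intra), avg(inter)

# Toy network: two tight pairs joined by one weak link.
edges = {(0, 1): 1.0, (2, 3): 1.0, (1, 2): 0.2}
split = {0: "a", 1: "a", 2: "b", 3: "b"}
merged = {0: "a", 1: "a", 2: "a", 3: "a"}

q_split = modularity(edges, split)
q_merged = modularity(edges, merged)
intra, inter = intra_inter_similarity(edges, split)
```

A partition that separates the two tight pairs scores a higher modularity than lumping everything together, and it shows high intra-similarity with low inter-similarity, which is the pattern the slide reports for DecompClus.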

Qualitative Evaluation (1/2)

Baseline
ID    Theme
BC1   Data mining & Recommendation
BC2   Social Network

DecompClus
ID    Theme
DC1   Data mining & Recommendation
DC2   Semantic Web (newly found)
DC3   Data mining & Bioinformatics (newly found)
DC4   Social Network

- DecompClus preserves the themes defined by Baseline.
- DecompClus finds new communities that are not found by Baseline.

Qualitative Evaluation (2/2)
- In DecompClus, a user’s minor interests are not assimilated into his/her major interests, so new communities that consist of users’ minor interests can be discovered.
[Figures: Distribution of articles related to “Semantic web”; Distribution of articles related to “Bioinformatics”]

Visualization
- Drawn with the ForceAtlas2 layout provided by Gephi
- The community structure produced by DecompClus is more clearly distinguishable.

Related Work (1/2)
- Comparison with related areas

Approach                              # of profiles per user   Mapping in clustering (node : community)   Result
Non-overlapping community discovery   One profile              1:1                                        A user belongs to one community
Overlapping community discovery       One profile              1:N                                        A user belongs to multiple communities
DecompClus                            Multiple sub-profiles    1:1                                        A user belongs to multiple communities

Related Work (2/2)
- Non-overlapping community discovery
  - Newman’s method [Newman and Girvan, 2004]
  - Multi-level graph partitioning method [Karypis and Kumar, 1995]
  - Attribute augmented graph [Zhou et al., 2006]
  - Bayesian generative models [Wang, 2006]
- Overlapping community discovery
  - CPM (clique percolation method) [Palla et al., 2005]
  - Connectedness and local optimality [Goldberg et al., 2010]
  - Label propagation [Gregory, 2009]

Conclusion
- A novel concept of profile decomposition, which enables us to detect fine-granularity user communities with implicit relationships
- A new approach to discovering overlapping communities with non-overlapping community discovery algorithms
- We demonstrate, using a real data set, that our algorithm effectively discovers user communities from social media data.

THANK YOU !!

Case Studies: Case 1
- Users who become members of multiple communities through profile decomposition
- In our data set, there are a total of 99 users (81.1%) like User A.
- Example: User A’s profile
  - User A’s sub-profile 1: semantics, semantic web, rdf, ontology, social semantic web, …
  - User A’s sub-profile 2: user model, recommender, personalization, user profiling, knn, data mining, …
  - User A’s sub-profile 3: social network analysis, social search, graphs, …
[Figure: User A among the Baseline communities Bc1 (Data mining & Recommendation) and Bc2 (Social network), and among the DecompClus communities Dc1 (Data mining & Recommendation), Dc2 (Semantic web), Dc3 (Data mining & Bioinformatics), and Dc4 (Social network)]

Case Studies: Case 2
- Users who become members of the communities newly discovered by DecompClus
- There are a total of 9 users (7.3%) like User B.
- Example: User B’s profile
  - User B’s sub-profile 1: statistics, cancer, genomics, gene, sequencing, virus, bacteria, database, classification, …
[Figure: User B among the Baseline communities Bc1 (Data mining & Recommendation) and Bc2 (Social network), and among the DecompClus communities Dc1 (Data mining & Recommendation), Dc2 (Semantic web), Dc3 (Data mining & Bioinformatics), and Dc4 (Social network)]