
Generating and Tracking Communities Based on Implicit Affinities Matthew Smith – BYU Data Mining Lab April 2007.


1 Generating and Tracking Communities Based on Implicit Affinities. Matthew Smith (smitty@byu.edu), BYU Data Mining Lab, April 2007

2 Brigham Young University - Data Mining Lab (http://dml.cs.byu.edu). Outline: Introduction & Motivation; Project (Community Generation: IANs; Social Capital for Community Tracking); Experiments & Observations; Conclusions and Future Work

3 Introduction. Online communities are continually emerging, and many sites are adding a community aspect. Like offline communities, they are complex and dynamic. Examples: USENET (1980), Google Groups, Wikipedia; LinkedIn, Flickr, YouTube, MySpace, Facebook, etc.; medical communities (e.g., DailyStrength, NAAF); political communities; and the blogosphere, which is the focus of our experiments.

4 Motivation: Explicit Links. An Explicit Social Network (ESN) is formed by explicit links such as friends, web links, etc.

5 Motivation: Explicit Links and Implicit Affinities. [Figure: an ESN and an Implicit Affinity Network (IAN); node labels include smoke, cancer, bald.] Applications: medical communities, the blogosphere, etc.

6 Implicit Affinity. Affinity: the overlap of attribute values for any common attribute. Community: a set of individuals characterized by attributes and linked by affinities rather than by explicit relationships.

7 IAN Community Generation. Individuals are nodes characterized by attributes; affinities are edges. Unlike traditional social networks, where links represent explicit relationships, the links in our approach are based strictly on affinities, so connections emerge naturally.
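The construction described above can be sketched in a few lines of Python. This is a minimal sketch, assuming each individual is a dictionary mapping attribute names to sets of values; `people` and `build_ian` are hypothetical names, and an edge is created whenever two individuals share at least one value of a common attribute, with no scoring or thresholding yet.

```python
from itertools import combinations

# Hypothetical toy data: individual -> {attribute: set of values}.
people = {
    "alice": {"companies": {"Microsoft", "Google"}},
    "bob": {"companies": {"Google", "Flickr"}},
    "carol": {"companies": {"IBM"}},
}

def build_ian(people):
    """Link two individuals whenever they share at least one value for a
    common attribute. No explicit relationships are consulted."""
    edges = {}
    for a, b in combinations(sorted(people), 2):
        shared = {}
        # Only attributes both individuals have can produce an affinity.
        for attr in people[a].keys() & people[b].keys():
            overlap = people[a][attr] & people[b][attr]
            if overlap:
                shared[attr] = overlap
        if shared:
            edges[(a, b)] = shared  # edge labeled with the shared values
    return edges

print(build_ian(people))  # {('alice', 'bob'): {'companies': {'Google'}}}
```

Here carol stays an isolated node because her only value (IBM) is not shared; connections appear only where attributes actually overlap.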

8 Affinity Scoring: affinity score for a particular attribute; affinity score for all attributes. [Formulas not preserved in transcript.]
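Because the formulas on this slide were not preserved, the following is a stand-in rather than the presentation's definition. It assumes the per-attribute score is the Jaccard overlap of two individuals' value sets and that the all-attribute score is the mean over their common attributes; both choices keep scores in [0, 1], in line with the 0.5 and 1.0 thresholds used on the later slides. `attribute_affinity` and `overall_affinity` are hypothetical names.

```python
def attribute_affinity(values_i, values_j):
    """ASSUMED per-attribute score: Jaccard overlap |Vi & Vj| / |Vi | Vj|."""
    union = values_i | values_j
    if not union:
        return 0.0
    return len(values_i & values_j) / len(union)

def overall_affinity(person_i, person_j):
    """ASSUMED aggregate: mean per-attribute score over common attributes."""
    common = person_i.keys() & person_j.keys()
    if not common:
        return 0.0
    total = sum(attribute_affinity(person_i[a], person_j[a]) for a in common)
    return total / len(common)

alice = {"companies": {"Microsoft", "Google"}, "topics": {"search"}}
bob = {"companies": {"Google", "Flickr"}, "topics": {"search", "photos"}}
# companies: 1/3, topics: 1/2, mean: 5/12
print(round(overall_affinity(alice, bob), 4))  # 0.4167
```

A weighted mean (for example, weighting attributes by how informative they are) would be an equally plausible aggregate; the plain mean is used only for simplicity.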

9 Affinity Network: Building the IAN

10 Social Capital for Community Tracking. Social capital: the advantage available through connections between individuals within a particular network. Metrics: bonding and bridging.
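The presentation does not spell out its bonding and bridging metrics, so the sketch below uses hypothetical proxies only to make the two ideas concrete: bonding as the edge density among individuals who were already present in the previous snapshot, and bridging as the share of edges that reach a newcomer (the evolution slide links bridging to new bloggers). `social_capital_proxies` and the toy snapshots are assumptions, not the authors' measures.

```python
from math import comb

def social_capital_proxies(prev_nodes, nodes, edges):
    """HYPOTHETICAL proxies for one network snapshot.

    bonding  = density of edges among returning members (present in the
               previous snapshot and in this one);
    bridging = share of this snapshot's edges that touch a newcomer.
    """
    returning = nodes & prev_nodes
    internal = sum(1 for a, b in edges if a in returning and b in returning)
    possible = comb(len(returning), 2)  # possible pairs among returning members
    bonding = internal / possible if possible else 0.0
    touching_new = sum(
        1 for a, b in edges if a not in prev_nodes or b not in prev_nodes
    )
    bridging = touching_new / len(edges) if edges else 0.0
    return bonding, bridging

# Toy snapshots: dave is new in the current snapshot.
prev_nodes = {"alice", "bob", "carol"}
nodes = {"alice", "bob", "carol", "dave"}
edges = [("alice", "bob"), ("bob", "carol"), ("carol", "dave")]
print(social_capital_proxies(prev_nodes, nodes, edges))
```

The two proxies are computed from different edge subsets, so neither is defined as the complement of the other.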

11 Preliminary Experiments & Observations

12 Scobleizer’s Blog List. Robert Scoble (“Scobleizer”): blogger and book author; technical evangelist (formerly with Microsoft). Data set details: Scobleizer’s reading list at Bloglines.com, covering 570 blogs and 2,380 bloggers.

13 Data Set Statistics – Blog posts per day. We observe fewer posts during the weekend (Friday & Saturday). There is a lack of data for all bloggers during the first few days.
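Tallying posts by weekday is one way to surface the weekly cycle noted above. A minimal sketch, assuming each post has already been reduced to a `datetime.date`; the dates and the helper `posts_per_weekday` are hypothetical, not taken from the data set.

```python
from collections import Counter
from datetime import date

# Fixed labels avoid locale-dependent names from strftime("%A").
WEEKDAYS = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")

# Hypothetical post dates (2007-02-23 is a Friday).
post_dates = [date(2007, 2, 23), date(2007, 2, 24), date(2007, 2, 26), date(2007, 2, 26)]

def posts_per_weekday(dates):
    """Count posts by weekday; date.weekday() returns 0 for Monday."""
    return Counter(WEEKDAYS[d.weekday()] for d in dates)

print(posts_per_weekday(post_dates))
```

Which calendar day a post falls on depends on the time zone of its timestamp, so converting all timestamps to one zone before taking the date keeps day boundaries consistent.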

14 Single Attribute: Companies. Motivation: many bloggers talk about various companies and what they are doing. Methodology: whenever a company is mentioned in a blogger’s post, it becomes a feature of that blogger; a static list of 1,914 company names is used as the attributes.
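The methodology above can be sketched as matching each name from a static list against a blogger's posts. This is a minimal sketch, assuming whole-word, case-sensitive matching; the four-name `COMPANIES` excerpt and `company_features` are hypothetical, and the presentation does not say how mentions were detected.

```python
import re

# Hypothetical excerpt of the static company list (the real list had 1,914 names).
COMPANIES = ["Microsoft", "Google", "Yahoo!", "IBM"]

def company_features(posts, companies=COMPANIES):
    """Return the set of company names mentioned in any of a blogger's posts."""
    features = set()
    for name in companies:
        # Lookarounds instead of \b so names ending in punctuation ("Yahoo!")
        # still match, while "IBMers" does not count as a mention of "IBM".
        pattern = re.compile(r"(?<!\w)" + re.escape(name) + r"(?!\w)")
        if any(pattern.search(post) for post in posts):
            features.add(name)
    return features

posts = ["Google bought YouTube today.", "Is Yahoo! next? IBMers think so."]
print(sorted(company_features(posts)))  # ['Google', 'Yahoo!']
```

Plain string matching like this also fires on company names that are ordinary words; entries such as "Where" and "Start" in the feature counts below may reflect that.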

15 Cyclic Feature Usage

16 Power-law Behavior – Features
Company       Count
Microsoft       843
Google          480
YouTube         241
Where           173
IBM             153
Flickr          152
Intel           150
Sony            145
Amazon          140
Technorati      137
Wikipedia       129
MySpace         117
Start           115
Yahoo!          114
Dell            110
Skype           100
eBay             95
BBC              79
Digg             76
Gmail            76
…                 …
Ogle              1
Observations: few companies are mentioned by many; many companies are mentioned by few.
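A quick check of the power-law claim is to fit a line to log(count) against log(rank): an approximately straight line with a negative slope is consistent with, though not proof of, power-law behavior. This sketch uses only the top-20 counts from the slide (the full distribution over all 1,914 names is not in the transcript), and `loglog_slope` is a hypothetical helper using ordinary least squares.

```python
import math

# Top-20 counts transcribed from the slide, in rank order.
counts = [843, 480, 241, 173, 153, 152, 150, 145, 140, 137,
          129, 117, 115, 114, 110, 100, 95, 79, 76, 76]

def loglog_slope(values):
    """Least-squares slope of log(count) vs. log(rank), with ranks from 1."""
    xs = [math.log(rank) for rank in range(1, len(values) + 1)]
    ys = [math.log(v) for v in values]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

print(round(loglog_slope(counts), 2))
```

A fit on only the head of the distribution says little about the long tail; a more careful test would use all counts and a maximum-likelihood estimator rather than a log-log regression.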

17 Blog Community Evolution. Observations: Weekend bonding? Bridging indicates newly used features and new bloggers. Overall bonding is expected, given a static set of features with no decay and a blogosphere full of buzz.

18 Blog-based IAN – Feb. 24. Niche sub-communities exist.

19 Conclusions. Blog posts were cyclic within this community: bloggers posted more during the week and less on weekends, yet, interestingly, bonding occurs during the weekends. Companies were mentioned in a power-law fashion: a few companies are mentioned often, while most are mentioned rarely. Niche sub-communities of bloggers focusing on long-tail companies were identified. The blog-based IAN appears to follow power-law connectivity, like ESNs.

20 Future Work (In Progress). Compare the IAN and ESN of the same community: analyze evolution (social capital vs. density), compare snapshots, and identify and report similarities and differences. Develop hybrid sub-community identification. Experiment on domain-specific communities: medical (patient communities) and political (jump-starting grass-roots campaigns).

21 More Future Work. Refine implicit attribute extraction: allow for dynamic feature extraction, allow features to decay naturally with time, and use LDA to extract “concepts”. Putnam’s puzzle: consider adapting social capital measures to allow for uncorrelated bonding and bridging.
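One way to let features decay naturally with time is exponential decay of each mention's weight. A minimal sketch, assuming each feature keeps the ages (in days) of the mentions that produced it; the 7-day half-life, the 0.25 cutoff, and `decayed_features` are hypothetical choices, not values from the presentation.

```python
def decayed_features(mentions, half_life_days=7.0, min_weight=0.25):
    """mentions: feature -> list of mention ages in days (0 = today).

    Each mention contributes 0.5 ** (age / half_life), so a mention loses
    half its weight every half-life; features whose total weight falls
    below min_weight are dropped from the blogger's profile.
    """
    weights = {
        feature: sum(0.5 ** (age / half_life_days) for age in ages)
        for feature, ages in mentions.items()
    }
    return {f: w for f, w in weights.items() if w >= min_weight}

# Hypothetical mention ages: Google today and a week ago, Ogle four weeks ago.
mentions = {"Google": [0, 7], "Ogle": [28]}
print(decayed_features(mentions))  # {'Google': 1.5}
```

Ogle's single four-week-old mention decays to 0.0625 and is dropped, so the feature set stays dynamic instead of accumulating indefinitely.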

22 Questions?

23 Affinity Score Distribution

24 Blog-based IANs – Filtered by Threshold. Panels: affinity scores ≥ 0.5; affinity score of 1.0.

25 Blog-based IAN – Filtered by Thresholds. Affinity thresholds: score ≥ 0.5, count ≥ 3; period 2/15 – 3/15.
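Filtering by both thresholds amounts to keeping only edges that pass each test. A minimal sketch, assuming each edge carries an affinity score and a count; what "count" measures is not stated on the slide, so treating it as the number of shared features is an assumption, and `filter_edges` is a hypothetical helper.

```python
# Hypothetical edge records: (blogger_a, blogger_b, affinity score, count),
# where "count" is ASSUMED to be the number of shared features.
edges = [
    ("a", "b", 0.8, 5),
    ("a", "c", 0.5, 3),
    ("b", "c", 0.9, 2),  # fails the count threshold
    ("c", "d", 0.3, 7),  # fails the score threshold
]

def filter_edges(edges, min_score=0.5, min_count=3):
    """Keep edges whose score >= min_score and count >= min_count."""
    return [
        (a, b)
        for a, b, score, count in edges
        if score >= min_score and count >= min_count
    ]

print(filter_edges(edges))  # [('a', 'b'), ('a', 'c')]
```

Requiring a minimum count as well as a minimum score discards pairs whose high score rests on very little shared evidence.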

