
Finding Event-Specific Influencers in Dynamic Social Networks. Master's Thesis – Chris Schenk, December 1st, 2010.


1 Finding Event-Specific Influencers in Dynamic Social Networks
  Master's Thesis – Chris Schenk
  December 1st, 2010

2 Outline
  Problem overview
    Influencers, reputation, validation and security
    Summary of analysis methods
  Boulder fire data
    Twitter Data API, formats, collection and data limitations
    Statistics
  Finding event-specific influencers – Rankings
    Stats
    Hyperlink-Induced Topic Search (HITS)
    Context-specific in-degree (original work)
  Conclusions and Future Work

3 Problem Overview
  Influencers: Who is important?
  Reputation: Why are those people important?
  Validation: Can we verify claims made about reputation?
  Security: How to detect manipulation? How do we safely report results?

4 Influencers
  Social dynamics vs online social dynamics
    Social network features
      Search, friends, re-tweets
    Influencers and sheep
    What is meant by influence?
  Understanding the data
    Sampling and baseline statistics
    Similarity measures, clustering
    Semantics, intent (NLP)
    Baseline activity

5 Influencers – Network Structure
  Betweenness/Closeness centrality
  PageRank/TwitterRank/TunkRank
  Local/Global hierarchical clustering
  K-core decomposition
  K-clique percolation
  Nearest Neighbor Networks
  Assortative mixing
  HITS
  Activity Network

6 Twitter Data Stats – Boulder Fire
  Tweets
    First day – September 6th, :00am to September 7th, :00am, Mountain time
    First week – September 6th, :00am to September 13th, :00am, Mountain time
  Social graph
    Five one-day snapshots beginning September 7th, :40pm, Mountain time
  Tweet example
    Article on Twitter's use during #eqnz, #boulderfire, and #sanbrunofire: kate30_CU :29:24+00:00
  Keywords: boulder, boulderfire, fourmilefire, fourmilecanyon, 4milefire

7 Qualitatively Influential Users
  Sixteen users gathered by Jo White
  Used as “ground truth” data for ranking comparison
    epiccolorado | laurasrecipes | HumaneBoulder | fishnette
    suzanbond | CampSteve | ConnectColorado | Org9
    metroseen | palen | sophiabliu | Mediamum
    Tanuku | neadvocate | kate30_CU | BoulderChannel1
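One way the "ranking comparison" against this ground truth could be carried out is to check how many of the top-k users of a ranking appear in the qualitative list. The sketch below is only an illustration under that assumption: the slides do not state which comparison measure was used, and the function name `ground_truth_overlap`, the sample ranking, and the two non-ground-truth usernames are hypothetical.

```python
def ground_truth_overlap(ranking, ground_truth, k):
    """Return the users in the top-k of `ranking` that are in `ground_truth`.

    Usernames are compared case-insensitively (an assumption made here
    because Twitter usernames are not case-sensitive).
    """
    truth = {name.lower() for name in ground_truth}
    return [name for name in ranking[:k] if name.lower() in truth]


# Two usernames taken from the slides; the ranking itself is made up.
ground_truth = ["fishnette", "kate30_CU"]
ranking = ["fishnette", "some_news_bot", "kate30_cu", "another_user"]

found = ground_truth_overlap(ranking, ground_truth, k=3)
precision_at_k = len(found) / 3
```

Here `found` holds the ground-truth users recovered in the top three positions, and `precision_at_k` is the fraction of those positions they occupy.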

8 Twitter API and Data Collection
  Search+Track+REST
    Unique users for a given event
  Profiles
    Periodic collection
  Friends/Followers
    Periodic collection
  Tweets
    One-time collection
  Limitations
    Rate limits, multi-threading
    Improper SQL query

9 Tweet Stats
  Stat | First Day | First Week
  # Tweets (total) | 12,147 | 2,314,700
  # Users | 398 | 13,955
  Avg. Tweets/user
  Med. Tweets/user
  # Hashtags (total) | 7,422 | 756,785
  # Hashtags (unique) | 895 | 66,765
  Avg. Hashtag occurrence
  Med. Hashtag occurrence | 1.0
  # Mentions (total) | 7,877 | 1,224,851
  Avg. Mentions/User
  Med. Mentions/User | 1.0
  # Users mentioning others | 308 (77.39%) | 11,036 (79.08%)

10 Tweet Stats (cont.)
  Stat | First Day | First Week
  # Addressed Msgs. | 2,291 (18.85%) | 368,047 (15.90%)
  # Users addressing msgs. | 227 (57.04%) | 8,404 (60.22%)
  # Re-tweet Msgs. | 3,994 (32.88%) | 504,836 (21.81%)
  # Users re-tweeted (global) | 1,456 | 134,204
  # Users re-tweeted (fire) | 356 (24.45%) | 2,085 (1.55%)
  # URLs (unique) | 4,105 | 1,200,927
  # Source applications | 85 | 1,026
  # Users giving location | 30 (7.53%) | 858 (6.14%)
  # Tweets with location | 172 (1.42%) | 17,093 (0.77%)
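Counts like those above could be derived from raw tweet text roughly as follows. This is a minimal sketch, not the thesis code: it assumes a re-tweet is any tweet containing "RT @user", an addressed message is any tweet starting with "@user", and a mention is any "@user" in the text. The function name `tweet_stats` and the sample tweets are hypothetical. The percentages in the table appear to be relative to total tweets (for messages) or total users (for users), and the sketch follows that reading.

```python
import re

# Assumed conventions (not stated in the slides).
RETWEET = re.compile(r"\bRT\s+@(\w+)", re.IGNORECASE)
MENTION = re.compile(r"@(\w+)")


def percent(part, whole):
    """Percentage rounded to two decimals; 0.0 when the denominator is zero."""
    return round(100.0 * part / whole, 2) if whole else 0.0


def tweet_stats(tweets):
    """Compute basic statistics for a list of (username, text) pairs."""
    users = {user for user, _ in tweets}
    addressed = [(u, t) for u, t in tweets if t.startswith("@")]
    retweets = [(u, t) for u, t in tweets if RETWEET.search(t)]
    retweeted_users = {RETWEET.search(t).group(1).lower() for _, t in retweets}
    return {
        "tweets": len(tweets),
        "users": len(users),
        "addressed_pct": percent(len(addressed), len(tweets)),
        "users_addressing_pct": percent(len({u for u, _ in addressed}), len(users)),
        "retweet_pct": percent(len(retweets), len(tweets)),
        "users_retweeted": len(retweeted_users),
        "mentions": sum(len(MENTION.findall(t)) for _, t in tweets),
    }


# Made-up sample data for illustration only.
sample = [
    ("a", "@b hello"),
    ("a", "RT @c: fire update #boulderfire"),
    ("b", "smoke visible from downtown"),
    ("c", "thanks @a"),
]
stats = tweet_stats(sample)
```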

11 Graph Stats
  Timezone: Mountain | :40: | :40: | :40: | :40: | :10:01
  Users (fire) | 448 | 1,631 | 1,623 | 1,622 | 4,093
  Users (all) | 821,609 | 2,292,929 | 2,295,885 | 2,300,838 | 4,075,573
  Edges (fire) | 3,142 | 25,193 | 25,484 | 25,664 | 87,539
  Edges (all) | 1,510,036 | 5,361,650 | 5,370,451 | 5,372,597 | 30,458,948

12 Location Data – U.S.

13 Location Data – Denver Metro

14 Location Data – Boulder, Longmont, Broomfield

15 User “fishnette” Data - Aggregate Hourly Tweet Counts

16 User “fishnette” Data – Aggregate Monthly Tweet Counts

17 Hashtag Counts

18 Addressed Messages

19 Re-tweets

20 Finding Influencers - Rankings
  Tweets
    Number of tweets
    Username mentions
    Number of re-tweets
  Graph
    In-degree
    HITS
      all users (sorted by frequency)
      active users
      Mentions
      addressed messages (replies)
    Context-specific in-degree
      Global followers count
      Active edges (pre-existing network)
      New Edges

21 Rankings - Number of Tweets

22 Rankings – Username Mentions

23 Rankings – Re-tweets

24 Rankings – In-degree (Followers)

25 Hyperlink-Induced Topic Search (HITS)
  Hubs
    Those that link to many authorities
  Authorities
    Those that are linked to by many hubs
  Process
    Calculate the principal eigenvector of two matrices
      Followers adjacency matrix (authorities)
      Friends adjacency matrix (hubs)
    Iterative
    Rankings by highest value descending in eigenvectors
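The iterative process can be sketched as the standard HITS power iteration over a follower graph. The code below is a generic illustration in plain Python, not the thesis implementation: the edge direction (follower, followed), the fixed iteration count, and the function names `hits` and `normalize` are assumptions made here.

```python
import math


def normalize(scores):
    """Scale a score dictionary to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in scores.values()))
    if norm == 0.0:
        return dict(scores)
    return {k: v / norm for k, v in scores.items()}


def hits(edges, iterations=50):
    """Return (hub, authority) scores for directed (follower, followed) edges."""
    edges = list(edges)
    nodes = {n for edge in edges for n in edge}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority update: sum the hub scores of everyone following the user.
        new_auth = {n: 0.0 for n in nodes}
        for follower, followed in edges:
            new_auth[followed] += hub[follower]
        # Hub update: sum the authority scores of everyone the user follows.
        new_hub = {n: 0.0 for n in nodes}
        for follower, followed in edges:
            new_hub[follower] += new_auth[followed]
        auth = normalize(new_auth)
        hub = normalize(new_hub)
    return hub, auth


# Made-up graph: "a" and "b" both follow "c"; "a" also follows "d".
hub, auth = hits([("a", "c"), ("b", "c"), ("a", "d")])
authority_ranking = sorted(auth, key=lambda n: (-auth[n], n))
```

Sorting the authority scores in descending order gives the kind of ranking described on the slide; restricting `edges` to active users, mentions, or addressed messages would produce the variants listed in the rankings overview.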

26 Rankings – HITS – All Users

27 Rankings – HITS – Active Users

28 Rankings – HITS – Mentions

29 Rankings – HITS – Addressed Msgs.

30 Context-Specific In-Degree Ranking
  Global followers count
    Periodically download user profiles
    Calculate change in followers count for each snapshot
    Rank based on overall change, descending
  Active edges (includes pre-existing edges)
    Periodically download friend/follower lists
    Calculate change in followers count for each snapshot
    Rank based on overall change, descending
  New Edges
    Periodically download friend/follower lists
    Calculate change in followers count for each snapshot
    Do not count edges that existed prior to the start of the event
    Rank based on overall change, descending
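The snapshot-based rankings can be sketched as follows. This is one possible reading of the steps above, not the thesis code: it assumes each snapshot maps a user to the set of their followers, that "active" means the follower also tweeted about the event, and that a baseline snapshot stands in for the edges that existed before the event. All function names and the sample data are hypothetical.

```python
def rank_by_change(first, last):
    """Rank users by the change in a per-user count, descending (ties by name)."""
    users = set(first) | set(last)
    change = {u: last.get(u, 0) - first.get(u, 0) for u in users}
    return sorted(change.items(), key=lambda item: (-item[1], item[0]))


def active_in_degree(followers, active_users):
    """Count, per user, the followers who are themselves active in the event."""
    return {u: len(f & active_users) for u, f in followers.items()}


def new_edge_in_degree(followers, baseline, active_users):
    """Count active followers whose edge did not exist in the baseline snapshot."""
    return {
        u: len((f - baseline.get(u, set())) & active_users)
        for u, f in followers.items()
    }


# Made-up snapshots: user -> set of followers.
snap_first = {"alice": {"bob"}, "bob": set(), "news": {"x1", "x2"}}
snap_last = {
    "alice": {"bob", "carol", "dave"},
    "bob": {"alice"},
    "news": {"x1", "x2", "x3"},
}
active = {"alice", "bob", "carol", "dave", "news"}

# Active edges: change in event-active in-degree between snapshots.
active_ranking = rank_by_change(
    active_in_degree(snap_first, active), active_in_degree(snap_last, active)
)
# New edges: only count edges absent from the baseline (here, the first snapshot).
new_edges = new_edge_in_degree(snap_last, snap_first, active)
```

In this toy data, "news" gains a follower globally but none from active users, so it does not rise in either context-specific ranking.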

31 Rankings – Global Followers Count

32 Rankings – Active Edges

33 Rankings – New Edges

34 Limitations and Modifications
  On-going influence
    Can only measure when a user becomes influential
  Global popularity masking local influence
    User “andrewhyde”
  News and bot activity
    Extra data needed to ignore these users
  Large events
    Data collection limitations
  How important is a de-follow?
    Can identify individual user activity
  Identifying the sheep
    Can equivalently count friends (out-links) created

35 Conclusions
  Notions of influence and interaction are heavily dependent on social network features
    No agreement on definitions
    Influence measured by features not 100% in use
    Or features not used in the same way by everyone
    Composability problem
  HITS ranking no better than global in-degree
  Context-specific in-degree ranking good!
    Needs to be tested on multiple events of varying sizes

36 Future Work
  Understanding “baseline” behavior
    For users active (using keywords) during an event
  Calculate all given statistics for a user (Klout.com?)
    Lots of ways to cut the data
    Composable factors/measures/attributes
  Explaining new links created
    Models for searching, re-tweeting, hashtags, #ff, etc.
  Incorporating blogs, forums, news websites
    Real-time vs not
  Informing algorithms with other techniques
    NLP and more automation
    Qualitative analysis (crowdsourcing?)

37 Thanks! Questions?

38 Reputation
  Definitions?
    Scores
    Composability
  Explicit reputation
    Ratings, votes
  Implicit reputation
    Client
    Server

39 Validation
  Ground truth
    Authorities
    Armies of grad students
    Crowd-sourcing?
  More data
    Cross-referencing
      News websites
      Blogs
      Public health and safety (or other)

40 Security
  Malicious users
    Inflation of reputation
    Sybil attacks
  Reporting
    Audience?
    Anonymization

