1 Diffusion of Information & Innovations in Online Social Networks Krishna Gummadi Networked Systems Research Group Max Planck Institute for Software Systems.

2 My goals and methodology
Goals: understand & build complex systems
– example: online social networks
Methodology: evolve the systems with feedback
– observe deployed systems
– extract insights
– test new designs and architectural principles

3 My research: Enabling the Social Web
Three fundamental trends & challenges in the social Web:
1. User-generated content sharing
– Can we protect the privacy of users sharing personal data?
2. Word-of-mouth based content exchange
– Can we understand & leverage word-of-mouth better?
3. Crowd-sourcing content rating and ranking
– Can we find trustworthy & relevant content sources?

4 Information discovery in online social networks
Discovering information on the Web
– old method: browsing from authoritative sources
– new method: word-of-mouth from friends
Lots of theories & beliefs about viral propagation
– but few are empirically derived or validated at scale!
Large-scale empirical studies have only recently become possible

5 Research problems
Understand the dynamics of propagation
– temporal and spatial patterns of propagation
– role of the social network, social systems, and user influence
For different types of information and innovations
– news, web URLs, conventions, and technology services
With the ultimate goal of enabling better viral campaigns
– consumers: help them get content they would not otherwise receive
– publishers: help them spread their content more effectively

6 Why Twitter?
One of the most popular social media sites
Social links are the primary channel through which information flows
Users can follow anyone they like and receive their public messages, called tweets
Traditional media sources and word-of-mouth coexist
– mainstream media sources (BBC, CNN, DowningStreet)
– celebrities (Oprah Winfrey), politicians (Barack Obama)
– ordinary users (like you and me!)

7 Dataset
Crawled near-complete data from Twitter up to August 2009
– asked Twitter to white-list 58 machines
– crawled user profiles and all tweets ever posted, covering user IDs 0 to 80 million
Gathered 54M users, 2B follow links, and 1.7B tweets
– user profiles include join date, name, location, and time zone
– exact timestamps of tweets are available

8 Studies of information diffusion
How web URLs are discovered in Twitter [IMC '11]
How news spreads in Twitter [ICWSM '11]
The role of offline geography in Twitter [ICWSM '12]
How social conventions emerge in Twitter [ICWSM '12]
– social norms are fundamental to social psychology and social life
– social conventions are like social norms before they become tied to group identity and before deviant behavior is sanctioned

9 Macroscopic analysis: Who passes information to whom
With Fabrício Benevenuto (UFOP), Hamed Haddadi (QMUL), Meeyoung Cha (KAIST)

10 High-level network characteristics
95% of users belong to the largest connected component (LCC)
– 5% were singletons and 0.2% formed 32K smaller components
Low reciprocity (10%)
Power-law node degree distribution with extremely large hubs
– grassroots users have, on average, 37 followers (98% had <200 followers)
– 0.01% of users had >100,000 followers
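Both reciprocity and the LCC share can be computed directly from the follow graph. The minimal sketch below assumes the graph is given as directed (follower, followee) pairs and that the largest connected component is taken over weakly connected components, with edge direction ignored. The slide states neither choice, and the function names and toy graph are hypothetical.

```python
from collections import defaultdict


def reciprocity(edges):
    """Fraction of directed follow links (u, v) whose reverse link (v, u) also exists."""
    edge_set = set(edges)
    if not edge_set:
        return 0.0
    reciprocated = sum(1 for u, v in edge_set if (v, u) in edge_set)
    return reciprocated / len(edge_set)


def largest_component_fraction(nodes, edges):
    """Fraction of nodes in the largest weakly connected component."""
    adjacency = defaultdict(set)
    for u, v in edges:
        # Ignore direction: a follow link connects both endpoints.
        adjacency[u].add(v)
        adjacency[v].add(u)
    seen, largest = set(), 0
    for start in nodes:
        if start in seen:
            continue
        # Iterative depth-first search over one component.
        stack, size = [start], 0
        seen.add(start)
        while stack:
            node = stack.pop()
            size += 1
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        largest = max(largest, size)
    return largest / len(nodes) if nodes else 0.0


# Toy graph: a and b follow each other; e is a singleton.
nodes = ["a", "b", "c", "d", "e"]
edges = [("a", "b"), ("b", "a"), ("b", "c"), ("d", "c")]
print(reciprocity(edges))                        # 0.5
print(largest_component_fraction(nodes, edges))  # 0.8
```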

11 Theory of information flow
Two-step flow of influence by Katz and Lazarsfeld (1940s)
– not all people are equally influential
– a minority of opinion leaders influence everyone else
– mass media influence the opinion leaders, hence the two-step flow

12 Interesting questions
Can we identify the different groups in Twitter?
What fraction of the audience can each group reach?

13 How do we identify different groups?
Grassroots: 51M (98.6%)
Evangelists: 700,000 (1.4%)
Mass media: 8,000 (<0.01%)

14 Major news events studied
Picked six major news topics in 2009
Used keywords to identify relevant tweets
Limited the study to a 2-month period
50-80% grassroots, 18-48% evangelists, <0.1% mass media
All events reached an audience of millions

15 Audience reach: Sufficiency
Sufficiency: audience that can be reached by the top K spreaders
[Figure: spreaders ranked 1, 2, 3 and their audiences]
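Sufficiency, as defined above, is the share of the total audience covered by the union of the top K spreaders' audiences. This is a minimal sketch. It assumes each spreader's audience is available as a set of user IDs and that spreaders are already sorted by rank. The slide does not say how spreaders were ranked, and all names here are hypothetical.

```python
def sufficiency(ranked_audiences, k):
    """Share of the total audience reached by the k top-ranked spreaders.

    ranked_audiences: list of sets of user IDs, rank 1 first (hypothetical layout).
    """
    total = set().union(*ranked_audiences)
    if not total:
        return 0.0
    reached = set().union(*ranked_audiences[:k])
    return len(reached) / len(total)


# Toy example: three spreaders with overlapping audiences (6 users in total).
ranked = [{1, 2, 3, 4}, {3, 4, 5}, {6}]
for k in range(1, 4):
    print(k, round(sufficiency(ranked, k), 3))  # 1 0.667, then 2 0.833, then 3 1.0
```

Because audiences overlap, reach grows sub-additively in K. In the toy data, the second spreader adds only one new user.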

16 Sufficiency test in the Iran election
[Figure: sufficiency curves for mass media, evangelists, and grassroots]

17 Audience reach: Necessity
Necessity: audience that is still reachable after removing the top K spreaders, i.e., audience that would otherwise not be reachable
[Figure: spreaders ranked 1, 2, 3 and their audiences]
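Here is a companion sketch for necessity. It assumes each spreader's audience is a set of user IDs and that spreaders are sorted by rank. The names and data layout are hypothetical. Following the slide's wording, it returns the share of the audience still reachable once the top K spreaders are dropped. It also returns the complement, which is the share reachable only through those K spreaders.

```python
def reach_without_top_k(ranked_audiences, k):
    """Return (still_reachable, only_via_top_k) as shares of the total audience.

    ranked_audiences: list of sets of user IDs, rank 1 first (hypothetical layout).
    """
    total = set().union(*ranked_audiences)
    if not total:
        return 0.0, 0.0
    # Audience of everyone except the k top-ranked spreaders.
    remaining = set().union(*ranked_audiences[k:])
    still = len(remaining) / len(total)
    return still, 1.0 - still


ranked = [{1, 2, 3, 4}, {3, 4, 5}, {6}]
print(reach_without_top_k(ranked, 1))  # users 3, 4, 5, 6 remain reachable
```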

18 Necessity test in the Iran election
[Figure: necessity curves for mass media, evangelists, and grassroots]

19 Audience reach of popular topics
Mass media alone reach the majority of the audience
Evangelists increase the reach considerably
Grassroots play a marginal role

20 Audience reach of non-popular topics
Evangelists consistently reach a large audience
Mass media may not be present
Grassroots play a marginal role
Evangelists need more attention in viral marketing
– existing influence measures fail to appreciate their role

21 Summary of macroscopic analysis
Teased out the roles of mass media, evangelists, and grassroots users in the spread of major and minor events
– mass media are important for spreading popular topics
– evangelists play a crucial role for both popular and non-popular topics
– grassroots play a marginal role in all cases
Studied information spreading patterns across groups
– information flows in all directions, unlike in the two-step flow theory

22 A closer look: Patterns of URL propagation
With Tiago Rodrigues (UFMG), Fabrício Benevenuto (UFOP), Meeyoung Cha (KAIST)

23 Interesting questions
What types of content are discovered by word-of-mouth?
What are the structures of word-of-mouth propagation trees?
How geographically distributed are the propagation trees?

24 Why URLs on Twitter?
Ideal for studying word-of-mouth
– centered around the idea of spreading information
– easy to trace their propagation
208M URLs shared on Twitter from

25 Modeling information cascades
Hierarchical tree model
T | User | Tweet content
[Figure: users A, B, C, D]

26 Modeling information cascades
Hierarchical tree model
T | User | Tweet content
1 | A | Check this:
[Figure: users A, B, C, D; labels: initiator, receiver]

27 Modeling information cascades
Hierarchical tree model
T | User | Tweet content
1 | A | Check this:
2 | B | http://www.example.com/ is interesting
[Figure: users A, B, C, D; labels: initiator, spreader, receiver]

28 Modeling information cascades
Hierarchical tree model
T | User | Tweet content
1 | A | Check this:
2 | B | http://www.example.com/ is interesting
3 | C | Interesting link:
[Figure: users A, B, C, D; labels: initiator, spreader, receiver]

29 Modeling information cascades
Hierarchical tree model
[Figure: users A, B, C, D; labels: initiator, spreader, receiver, audience]

30 Modeling information cascades
Hierarchical tree model
– the URL propagation pattern is a forest
[Figure: several trees over users A to I, each with its own initiator, plus spreaders and receivers]
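Below is a sketch of the hierarchical tree model for a single URL. The slides do not specify how a poster is attached when several of their followees posted the URL earlier. This sketch assumes, as a hypothetical rule, that the parent is the most recent earlier poster among the user's followees. Users with no such followee become initiators, and that is what makes the propagation pattern a forest. The function name and data layout are also hypothetical.

```python
def build_cascade_forest(posts, follows):
    """Attach each poster of one URL to an earlier poster they follow.

    posts: list of (time, user) tuples for a single URL.
    follows: dict user -> set of users they follow.
    Returns dict user -> parent user, with None for initiators (tree roots).
    Hypothetical rule: the parent is the most recent earlier poster among the
    user's followees.
    """
    parent = {}
    earlier = []  # users who already posted, in time order
    for _, user in sorted(posts):
        if user in parent:
            continue  # only a user's first post of the URL counts
        followees = follows.get(user, set())
        candidates = [u for u in earlier if u in followees]
        parent[user] = candidates[-1] if candidates else None
        earlier.append(user)
    return parent


posts = [(1, "A"), (2, "B"), (3, "C"), (4, "E")]
follows = {"B": {"A"}, "C": {"A", "B"}}
print(build_cascade_forest(posts, follows))
# {'A': None, 'B': 'A', 'C': 'B', 'E': None}
```

In the example, A and E start separate trees. Receivers (followers of a poster who never post the URL themselves) are not tracked by this sketch.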

31 Word-of-mouth can help popularize niche content
What URLs are popularly shared on Twitter? Do they come from the popular domains on the Web?

32 Does all content, including content published by unpopular domains, benefit from word-of-mouth?
Word-of-mouth gives all URLs and content (both popular and non-popular) a chance to become popular

33 How large is the largest word-of-mouth cascade?
URL popularity
– most popular: 426,820 spreaders and an audience of 28M users
– average: 3 spreaders and an audience of 843 users
Word-of-mouth can produce extremely large cascades

34 What are the typical structures of propagation trees?
Cascade trees are much wider than they are deep
– 0.1% of the trees have width > 20
– 0.005% of the trees have height > 20
[Figure: example tree over users A, B, C, D]
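Width and height can be read off a parent map (node to parent, None for roots), such as one produced when building cascade trees. This is a minimal sketch. It assumes height counts the edges on the longest root-to-node path and width is the largest number of nodes at any single depth. The slide does not give its exact definitions, and the function name is hypothetical.

```python
from collections import Counter, defaultdict


def tree_shapes(parent):
    """Return {root: (height, width)} for every tree in a cascade forest.

    parent: dict node -> parent node, None for roots; every parent must be a key.
    Assumed definitions: height = edges on the longest root-to-node path,
    width = largest number of nodes at a single depth.
    """
    depth, root = {}, {}
    for node in parent:
        # Walk up until reaching a node with a known depth or a root.
        path, current = [], node
        while current not in depth and parent[current] is not None:
            path.append(current)
            current = parent[current]
        if current not in depth:  # current is an unvisited root
            depth[current], root[current] = 0, current
        # Assign depths back down the path.
        for n in reversed(path):
            depth[n] = depth[parent[n]] + 1
            root[n] = root[parent[n]]
    levels = defaultdict(Counter)  # root -> Counter of nodes per depth
    for node, d in depth.items():
        levels[root[node]][d] += 1
    return {r: (max(c), max(c.values())) for r, c in levels.items()}


forest = {"A": None, "B": "A", "C": "A", "D": "B", "E": None}
print(tree_shapes(forest))  # {'A': (2, 2), 'E': (0, 1)}
```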

35 What are the typical structures of propagation trees?

36 Twitter cascades vs. chain-letter cascades
Reference: D. Liben-Nowell and J. Kleinberg, "Tracing Information Flow on a Global Scale Using Internet Chain-Letter Data," PNAS, 2008
[Figure: chain-letter cascade vs. Twitter cascade]

37 How geographically distributed are the propagation trees?
Users within a short geographical distance have a higher probability of posting the same URL
[Figure: users A, B, C, D]

38 Summary: Patterns of URL propagation
Large-scale analysis of URL propagation in Twitter
– all content has a chance to reach a large audience, and propagation trees on Twitter are wide and shallow (relevant to advertising)
– content is consumed locally (relevant to caching design and recommendation)

39 Microscopic analysis: Understanding the news media landscape in Twitter
With Jisun An (Cambridge Univ.), Meeyoung Cha (KAIST)

40 Interesting questions
Does social interaction help media sources reach a larger audience?
Do users follow diverse media sources?
Does social interaction expose users to diverse media sources?

41 Methodology
Focus on 80 English-language media sources
A total of 14M followers and their connections (1.2B links, 350,000 tweets)
Genre | Example accounts
News (40 sources) | cnnbrk, nytimes, TerryMoran
Technology (13) | BBCClick, mashable
Sports (7) | NBA, nfl
Music (3) | MTV
Politics (5) | nprpolitics
Business (2) | davos
Fashion & Gossip (4) | peoplemag

42 Media exposure

43 Is social interaction helping media publishers reach a larger audience?
Yes: social interaction increases a publisher's audience
On average, audience size increases by a factor of
Examples (rank. account: audience without -> with social interaction):
– 2. nytimes: 1.7M -> 6.7M
– 8. BBCClick: 1.2M -> 12M
– 65. washingtonpost: 30K -> 3.5M
[Figure: nytimes (rank 2, 1.7M) and NASA (rank 55, 120K)]

44 Does a user follow multiple media sources?
Direct subscriptions: 80% of users subscribe to only 2-3 media sources
No: users follow only a limited number of media sources

45 Is social interaction exposing users to multiple media sources?
Direct subscriptions: 80% of users subscribe to only 2-3 media sources
Social interaction: 80% of users hear from up to 27 media sources
Yes: an 8-fold increase in the number of media sources

46 Does a user follow diverse media sources?
Focus on political news
Following multiple media sources does not necessarily imply exposure to diverse opinions

47 Does a user follow diverse media sources?
Manually tagged the political leanings of media sources
– Left-right.org
– ADA (Americans for Democratic Action) score: scale from 0 to 100, where 0 means 'very conservative'
No: out of 10M users, 7M follow media sources from only one side
– left-leaning (62.1%), center (37%), right-leaning (0.9%)
"I like to see diverse media sources"

48 Is social interaction exposing users to diverse media sources?
Yes: users are exposed to diverse opinions through social interaction

49 Estimating closeness
How "close" or "similar" are two media sources?

50 Closeness measure
Closeness(A, B_i): the probability that a random follower of B_i also follows A, i.e., |followers(A) ∩ followers(B_i)| / |followers(B_i)|
Which one is closer to nytimes, Foxnews or washingtonpost?
– Closeness(NYTimes, Foxnews) = 143K/578K = 0.25
– Closeness(NYTimes, washingtonpost) = 250K/404K = 0.62
washingtonpost is closer to nytimes than Foxnews is
Follower overlap (Venn diagram counts):
A | B | Follow A only | Follow both | Follow B only
NYTimes | Foxnews (B_1) | 2,947,635 | 142,951 | 435,222
NYTimes | washingtonpost (B_2) | 2,840,960 | 249,626 | 154,224
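The closeness measure and both worked examples can be reproduced from the follower-overlap counts on this slide. The function names are hypothetical. `closeness` takes follower sets, while `closeness_from_counts` takes the overlap and the count of users who follow only B.

```python
def closeness(followers_a, followers_b):
    """P(a random follower of B also follows A) = |F(A) & F(B)| / |F(B)|."""
    if not followers_b:
        return 0.0
    return len(followers_a & followers_b) / len(followers_b)


def closeness_from_counts(both, only_b):
    """Same measure from Venn-diagram counts: overlap / (overlap + B-only)."""
    return both / (both + only_b)


print(round(closeness_from_counts(142_951, 435_222), 2))  # NYTimes vs. Foxnews: 0.25
print(round(closeness_from_counts(249_626, 154_224), 2))  # NYTimes vs. washingtonpost: 0.62
print(closeness({1, 2, 3}, {2, 3, 4, 5}))                 # 0.5
```

The measure is asymmetric. Swapping A and B divides by NYTimes's follower count instead, and that count is much larger than the follower count of either Foxnews or washingtonpost.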

51 Closeness of political media sources
Picked political media sources
Ranked other political media sources by closeness value
We can automatically infer the political leaning of media sources
Ranking for nprpolitics (Left), from close to distant: nytimes (Left), jdickerson (Left), Nightline (Left), nprscottsimon (Left), GMA (Center), bbcbreaking (Center), foxnews (Right), washtimes (Right)
Second ranking, from close to distant: washingtonpost (Left), foxnews (Right), usnews (Right), bbcbreaking (Center), earlyshow (Left), nytimes (Left), ariannahuff (Left), ObamaNews (Left), nprpolitics (Left)
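Ranking by closeness amounts to scoring every other source against a seed and sorting by score. This minimal sketch uses toy follower sets, and the source names and data are hypothetical. It leaves out the step of inferring a leaning from the resulting ranking (for example, from the known leanings of the closest sources), because the slide does not describe it.

```python
def rank_by_closeness(seed, followers):
    """Rank all other sources by closeness to the seed, closest first.

    followers: dict source name -> set of follower IDs (hypothetical layout).
    Closeness(seed, B) = share of B's followers who also follow the seed.
    """
    seed_followers = followers[seed]
    scores = {
        name: len(seed_followers & fol) / len(fol)
        for name, fol in followers.items()
        if name != seed and fol
    }
    return sorted(scores, key=scores.get, reverse=True)


followers = {
    "seed": {1, 2, 3, 4},
    "x": {1, 2, 9},      # 2 of 3 followers also follow seed
    "y": {3, 8, 9, 10},  # 1 of 4
    "z": {5, 6},         # 0 of 2
}
print(rank_by_closeness("seed", followers))  # ['x', 'y', 'z']
```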

52 Summary: Media landscape in Twitter
Users follow only a limited number of media sources, but they are exposed to 8x more media sources via social interaction
Most users follow political media with a particular bias
We can automatically infer the bias of media sources
– could be used to recommend content from diverse media sources

53 Emergence of social conventions
With Farshad Kooti (MPI-SWS), Meeyoung Cha (KAIST), Winter Mason (Stevens Inst. of Tech.)

54 Interesting questions
How do social conventions arise naturally?
What is the context of their invention?
How do they become widely accepted?
Can we predict their adoption?

55 The retweeting variations
Searched tweets for the syntax of each variation
"Adopter" refers to a user who used the variation at least once
Variation | # of adopters | # of retweets
RT | 1,836K | 53,221K
via | 751K | 5,367K
Retweeting | 50K | 296K
Retweet | 36K | 110K
HT | 8K | 22K
R/T | 5K | 28K
[symbol] | 3K | 18K
Total | 2,059K | 59,065K
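The adopter and retweet counts in the table come from scanning tweets for each variation. The slide does not list the exact syntax that was searched for. The regular expressions below are hypothetical stand-ins that match each variation followed by an @mention, and the [symbol] variation is omitted. The function name is also hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical patterns: each variation followed by an @mention.
PATTERNS = {
    "RT": re.compile(r"\bRT\s*@\w+"),
    "via": re.compile(r"\bvia\s*@\w+", re.IGNORECASE),
    "Retweeting": re.compile(r"\bRetweeting\s*@\w+", re.IGNORECASE),
    "Retweet": re.compile(r"\bRetweet\s*@\w+", re.IGNORECASE),
    "HT": re.compile(r"\bHT\s*@\w+"),
    "R/T": re.compile(r"\bR/T\s*@\w+"),
}


def count_variations(tweets):
    """tweets: iterable of (user, text). Returns {variation: (adopters, retweets)}.

    An adopter is a user who used the variation at least once.
    """
    adopters = defaultdict(set)
    retweets = defaultdict(int)
    for user, text in tweets:
        for name, pattern in PATTERNS.items():
            if pattern.search(text):
                adopters[name].add(user)
                retweets[name] += 1
    return {name: (len(adopters[name]), retweets[name]) for name in retweets}


tweets = [
    ("u1", "RT @bbc big news"),
    ("u1", "RT @cnn more news"),
    ("u2", "great read via @nytimes"),
    ("u3", "HT @mashable cool tool"),
]
print(count_variations(tweets))  # {'RT': (1, 2), 'via': (1, 1), 'HT': (1, 1)}
```

A user who uses several variations counts once for each of them. That would explain why the per-variation adopter counts sum to more than the 2,059K total.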

56 Why the retweeting convention?
Information-sharing channels are explicit in Twitter
Specific to Twitter: exposures happen within the community
Contained in Twitter, hence we can capture all usages

57 What are the very first use cases?
[Timeline of first use: via (Mar'07), HT (Oct'07), Retweet (Nov'07), RT (Jan'08), Retweeting (Jan'08), R/T (Jun'08), [symbol] (Sep'08)]

58 via started from natural usage
"- new Nokia N-Series phones will do Flash, Video and YouTube"

59 HT started from blog communities
"The Age Project: how old do I look? m/21b ( )"

60 The first Twitter-specific variation
"she is in the Boston Globe today, for a Stand up show she's doing tonight. Add the funny lady on Tweeter!"

61 RT was an adaptation to constraints
"LV Fire Department: No major injuries and the fire on the Monte Carlo west wing contained east wing nearly contained."

62 Some start from explicit re:
twitterkeys ★ d

63 Early adopters are more tech-savvy
[Figure: early adopters vs. random users]

64 Early adopters are more innovative
Profile feature | Early adopters | Random users
Has bio | 94% | 25%
Profile pic | 99% | 50%
Changed profile theme | 91% | 40%
Has location | 95% | 36%
Has lists | 57% | 4%
Has URL | 85% | 14%

65 Early adopters are more popular
Much higher number of followers
80% of early adopters are in the top 1% by PageRank

66 Defining the diffusion network
Each adopter is a node in the graph
There is a link from A to B if A was exposed to the variation by B
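Building the diffusion network needs a rule for when A counts as exposed by B. This sketch assumes, although the slide does not say so, that A was exposed by B if A follows B and B first used the variation strictly before A did. The data layout and function name are hypothetical. Each key's set holds the users it links to.

```python
def diffusion_network(first_use, follows):
    """Return {adopter: set of users who exposed them to the variation}.

    first_use: dict adopter -> time of first use of the variation.
    follows: dict user -> set of users they follow.
    Hypothetical exposure rule: A was exposed by B if A follows B and B's
    first use is strictly earlier than A's.
    """
    return {
        a: {b for b in follows.get(a, set()) if b in first_use and first_use[b] < t_a}
        for a, t_a in first_use.items()
    }


first_use = {"A": 1, "B": 2, "C": 3}
follows = {"B": {"A"}, "C": {"A", "B", "Z"}}  # Z never adopted
network = diffusion_network(first_use, follows)
print(sorted(network["C"]))  # ['A', 'B']
```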

67 Diffusion network of the first 500 adopters of Retweet

68 Diffusion network of the first 500 adopters of RT

69 Early adopter network
Average number of exposures: 2.9-6.4
Average clustering coefficient:
Criticality (fraction of users who were exposed only because of the most critical user): 0.5%-4.9%
Early adopters' diffusion networks are dense and clustered; there is no single critical user
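The sketch below computes the two statistics on this slide that have values. It assumes the diffusion network is stored as a mapping from each adopter to the set of users who exposed them. The reading of criticality is an assumption: a user counts as "exposed only because of" C when C is their single exposer, and criticality is the largest such count divided by the number of adopters. The average here also includes adopters with zero exposures, which is another assumption. The clustering coefficient is not computed; a graph library such as NetworkX provides `average_clustering` for that.

```python
from collections import Counter


def exposure_stats(exposers):
    """Return (average number of exposures, criticality) for a diffusion network.

    exposers: dict adopter -> set of users who exposed them.
    Assumed criticality: the largest number of adopters whose single exposer is
    the same user, divided by the number of adopters.
    """
    n = len(exposers)
    if n == 0:
        return 0.0, 0.0
    average = sum(len(s) for s in exposers.values()) / n
    # Count, for each user, the adopters exposed by that user alone.
    sole = Counter(next(iter(s)) for s in exposers.values() if len(s) == 1)
    criticality = max(sole.values(), default=0) / n
    return average, criticality


network = {"A": set(), "B": {"A"}, "C": {"A", "B"}, "D": {"C"}}
print(exposure_stats(network))  # (1.0, 0.25)
```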

70 Conventions had different spread patterns from URLs
URLs' early adopters are not necessarily core users
The diffusion network is not dense and clustered
There are critical users in the process

71

72 Variations have different growth rates
Some variations are growing and some are dying toward the end
Only two variations became dominant: RT and via

73 Wide-spread vs. normal adoptions
Successful variations reached peripheral users
In tune with the two-step flow theory

74 Summary
Conventions emerged in an organic, bottom-up manner
Early adopters are core members of the community: active, tech-savvy, popular, and innovative
Social conventions start spreading through dense and clustered networks, and there is no critical user
When variations became popular, they reached outside the core community

75 Ongoing work: Convention prediction problem
"Given a social network with records of users and their interactions, how reliably can we infer which variant of the convention a user U adopts at time T?"

76 Ongoing work: What features matter for prediction?
Personal features
– join date, in-/out-degrees, geo-location, # of tweets, etc.
Social features
– number of exposures, number of adopter friends
Global features
– date of adoption, which is related to global popularity

77 Preliminary results: Prediction accuracy
Baseline always predicts adoption of the dominant convention
Minimal improvement in prediction accuracy over the baseline

78 Preliminary results: Prediction accuracy without a dominant convention
Baseline predicts adoption with 0.5 accuracy
Improvement in prediction accuracy over the baseline, especially for less popular conventions
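The baselines on the last two slides behave like a majority-class predictor. Always predicting the most frequent variant scores exactly that variant's share of the data. The score is high when one convention dominates and 0.5 when two variants are balanced. This is a minimal sketch with hypothetical labels and a hypothetical function name.

```python
from collections import Counter


def majority_baseline_accuracy(adopted):
    """Accuracy of always predicting the most common variant in `adopted`."""
    if not adopted:
        return 0.0
    top_count = Counter(adopted).most_common(1)[0][1]
    return top_count / len(adopted)


print(majority_baseline_accuracy(["RT"] * 9 + ["via"]))  # 0.9 (one dominant variant)
print(majority_baseline_accuracy(["RT", "via"] * 5))     # 0.5 (two balanced variants)
```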

79 Top-5 predictive features
1. Date of adoption: global feature
2. # of exposures: social feature
3. # of posted URLs: personal feature
4. Join date of adopter: personal feature
5. # of adopter friends: social feature

