Implicit Structure and Dynamics of BlogSpace Eytan Adar, Li Zhang, Lada Adamic, & Rajan Lukose HP Labs, Palo Alto, CA.


1 Implicit Structure and Dynamics of BlogSpace Eytan Adar, Li Zhang, Lada Adamic, & Rajan Lukose HP Labs, Palo Alto, CA

2 Blogs (web logs) contain time-stamped online entries
–Date and time stamps
–List of read blogs
–URL that is being commented on
–Via link

3 Blogs: structure and transmission
Blog use:
–Record real-world and virtual experiences
–Note and discuss things “seen” on the net
Blog structure: blog-to-blog linking
Use + Structure
–Great to track “memes” (catchy ideas)
Patterns of information flow
–How does the popularity of a topic evolve over time?
–Who is getting information from whom?
Ranking algorithms that take advantage of transmission patterns

4 Related Work
Link prediction in social networks:
–Butts, C., Network Inference, Error, and Information (In)Accuracy: A Bayesian Approach, Social Networks, 25(2):103-140.
–Dombroski, M., P. Fischbeck, and K. Carley, An Empirically-Based Model for Network Estimation and Prediction, NAACSOS conference proceedings, Pittsburgh, PA, 2003.
–O’Madadhain, J., P. Smyth, and L. Adamic, Learning Predictive Models for Link Formation, Sunbelt 2005 (hope you were there!)
–Getoor, L., N. Friedman, D. Koller, and B. Taskar, Learning Probabilistic Models of Link Structure, Journal of Machine Learning Research, vol. 3 (2002), pp. 690-707.
–Adamic, L., and E. Adar, Friends and neighbors on the Web, Social Networks, 2003.
–Kleinberg, J., and D. Liben-Nowell, The Link Prediction Problem for Social Networks, in Proceedings of CIKM ’03 (New Orleans, LA, November 2003), ACM Press.
Blog ranking: Technorati, BlogPulse, Daypop…
Blog epidemic tracking:
–Blogdex at MIT Media Lab, Cameron Marlow, Sunbelt 2003
–BlogPulse

5 Intelliseek’s BlogPulse Service for tracking trends in the blogosphere: popular URLs, phrases, people

6 BlogPulse Data
–Analyzed 37,153 blogs
–Differential daily crawls (to find new posts) for May 2003
–Full page crawl for May 18, 2003 to capture blogrolls
–175,712 URLs occurring on > 2 blogs

7 Tracking popularity over time
Blogdex, BlogPulse, etc. track the most popular links/phrases of the day
[Figure: popularity plotted against time, annotated “Slashdot Effect” and “BoingBoing Effect”]

8 Election Map Cartograms Michael Gastner, Cosma Shalizi, and Mark Newman University of Michigan http://www-personal.umich.edu/~mejn/election/

9

10 Tracking popularity over time
[Figure: popularity plotted against time]

11 Clustering information popularity profiles
Sample: May 2003
–Total # of mentions substantial (40)
–URL mentioned for the first time in May

12 K-means clustering
–259 URLs in the sample satisfy criteria
–Take normalized cumulative profiles
–K-means minimizes the sum of squared distances between each profile and its cluster centroid
–4 clusters captured most of the differences
[Figure: normalized cumulative profile, all mentions plotted against day]
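
The clustering step on this slide can be illustrated with a minimal sketch. It assumes each URL is represented by its daily mention counts, turns those into a cumulative profile scaled to end at 1, and runs plain Lloyd-style k-means. The function names, the toy counts, and the choice of squared Euclidean distance are assumptions made for illustration; the slides do not say how the authors implemented or initialized their clustering.

```python
import random

def normalized_cumulative(daily_mentions):
    """Turn daily mention counts into a cumulative profile scaled to end at 1."""
    total = sum(daily_mentions)
    running, profile = 0, []
    for count in daily_mentions:
        running += count
        profile.append(running / total)
    return profile

def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(profiles, k, iterations=50, seed=0):
    """Plain Lloyd's algorithm; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(profiles, k)]
    labels = [0] * len(profiles)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                  for p in profiles]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels

# Made-up daily mention counts (not taken from the BlogPulse data).
toy_counts = [
    [30, 6, 2, 1, 1],  # sharp day-1 peak
    [28, 8, 2, 1, 1],
    [8, 8, 8, 8, 8],   # sustained interest
    [7, 9, 8, 8, 8],
]
toy_profiles = [normalized_cumulative(c) for c in toy_counts]
toy_centroids, toy_labels = kmeans(toy_profiles, k=2)
print(toy_labels)
```

On the toy data, the two sharply peaked URLs end up in one cluster and the two sustained-interest URLs in the other; the real analysis used 4 clusters over 259 URLs.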

13 Different kinds of information have different popularity profiles
[Figure: four panels of popularity profiles, day on the x-axis (5, 10, 15), values from 0 to 1 on the y-axis. Panel 1: Slashdot postings. Panel 2: Front-page news. Panel 3: Major-news site (editorial content), back of the paper. Panel 4: Products, etc.]

14 Popularity profiles
Cluster | Profile | # URLs | Examples
1 | Sharp peak on day 1 followed by fast decay | 38 | Slashdot postings
2 | Day 1 peak followed by decay | 46 | Front page news
3 | Day 2 peak followed by gradual decay | 51 | Editorial content, Sun java release
4 | Sustained interest | 124 | iPod, iTunes, quizzila
[Figure: profiles for clusters 1 to 4, days 2 to 22 on the x-axis, values from 0 to 1 on the y-axis]

15 Micro example: Giant Microbes

16 Microscale Dynamics
What do we need to track specific info ‘epidemics’?
–Timings
–Underlying network
[Figure: blogs b1, b2, b3 along a “Time of infection” axis marked t0 and t1]

17 Microscale Dynamics
Challenges
–Root may be unknown
–Multiple possible paths
–Uncrawled space, alternate media (email, voice)
–No links
[Figure: blogs b1, b2, b3, bn along a “Time of infection” axis marked t0 and t1, with “?” marking unknown connections]

18 Microscale Dynamics: who is getting info from whom
–Via links (< 2% of links, 50% within sample): unambiguous
–Multiple explicit links: which link is more likely?
–No explicit links (70%): which implicit path is more likely?

19 Link Inference
Use machine learning algorithms:
A) Support Vector Machine (SVM)
B) Logistic Regression
What we can use:
–Full text
–Blogs in common
–Links in common
–History of infection
[Figure labels: BoingBoing, WIRED]
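
As a rough illustration of option B, here is a minimal logistic regression trained by batch gradient descent on two pair-level features. Everything concrete in it is an assumption: the two features (fraction of blogroll entries in common, fraction of other links in common), the toy values, the learning rate, and the epoch count. It is a textbook classifier, not the model or feature set the authors used.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_probability(weights, x):
    """weights[0] is the bias; the rest line up with the features in x."""
    z = weights[0] + sum(w * v for w, v in zip(weights[1:], x))
    return sigmoid(z)

def train_logistic(features, labels, learning_rate=0.5, epochs=2000):
    """Fit weights by batch gradient descent on the average log loss."""
    n_features = len(features[0])
    weights = [0.0] * (n_features + 1)
    for _ in range(epochs):
        gradient = [0.0] * (n_features + 1)
        for x, y in zip(features, labels):
            error = predict_probability(weights, x) - y
            gradient[0] += error
            for j, value in enumerate(x):
                gradient[j + 1] += error * value
        weights = [w - learning_rate * g / len(features)
                   for w, g in zip(weights, gradient)]
    return weights

# Hypothetical features per blog pair:
# [fraction of blogroll entries in common, fraction of other links in common]
pair_features = [[0.6, 0.3], [0.5, 0.4], [0.7, 0.2],
                 [0.05, 0.0], [0.0, 0.1], [0.1, 0.05]]
pair_labels = [1, 1, 1, 0, 0, 0]  # 1 = the pair is linked, 0 = not linked

link_weights = train_logistic(pair_features, pair_labels)
print(predict_probability(link_weights, [0.65, 0.3]) > 0.5)
print(predict_probability(link_weights, [0.02, 0.02]) > 0.5)
```

In practice a library implementation with regularization and held-out evaluation would be the natural choice; the hand-rolled loop only keeps the sketch dependency-free.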

20 Percentage of blog pairs sharing at least one link
link type | same day | A after B | A before B
A, B linked (reciprocated) | 17.4% | 24.5% (either order)
A, B linked (unreciprocated) | 10.9% | 22.9% | 17.0%
A, B unlinked | 0.6% | 1.5% | 1.3%

21 Similarity in links between reciprocated, unreciprocated, and non-linked blog pairs

22 Training on positive and negative examples of ‘infection’
[Figure: Positive Example (+): Blog A, Blog B, with T infection (Blog B) > T infection (Blog A). Negative Example (−): Blog A, Blog B. Legend: Infected, Uninfected]
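
One way to read this slide is sketched below. The reading itself is an assumption: a pair counts as a positive example when both blogs mention the URL and B's first mention comes after A's, and as a negative example when exactly one of the two mentions it. The function name `label_pair` and the day-number encoding are hypothetical.

```python
def label_pair(infection_day, blog_a, blog_b):
    """Return +1, -1, or None for an (A, B) pair with respect to one URL.

    infection_day maps blog name -> day the blog first mentioned the URL;
    blogs that never mentioned it are absent from the mapping.
    """
    day_a = infection_day.get(blog_a)
    day_b = infection_day.get(blog_b)
    if day_a is not None and day_b is not None:
        # Both infected: positive only when B was infected after A.
        return 1 if day_b > day_a else None
    if (day_a is None) != (day_b is None):
        # Exactly one of the two is infected: negative example.
        return -1
    return None  # neither blog mentioned the URL; not used for training

example_days = {"a": 1, "b": 3}
print(label_pair(example_days, "a", "b"))  # both infected, B after A
print(label_pair(example_days, "a", "c"))  # only A infected
```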

23 Prediction results
Link inference:
–SVM: 91% accuracy
–Regression: 92% accuracy (blog-blog links most predictive)
Infection inference:
–SVM: 71.5% accuracy, using blog and non-blog link similarity + timing features (A before B)/n_A, (B before A)/n_A, (A same day B)/n_A, …
–Regression: 75% accuracy using only timing features
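
The timing features named on this slide could be computed along the following lines. The sketch assumes that n_A is the number of URLs blog A cited and that each blog is described by a mapping from URL to the day of its first citation; both are assumptions, since the slide gives only the feature names.

```python
def timing_features(days_a, days_b):
    """days_a and days_b map URL -> day on which each blog first cited it."""
    n_a = len(days_a)  # assumed meaning of n_A: number of URLs cited by A
    if n_a == 0:
        return {"a_before_b": 0.0, "b_before_a": 0.0, "same_day": 0.0}
    shared = days_a.keys() & days_b.keys()
    a_before = sum(1 for url in shared if days_a[url] < days_b[url])
    b_before = sum(1 for url in shared if days_b[url] < days_a[url])
    same_day = sum(1 for url in shared if days_a[url] == days_b[url])
    return {
        "a_before_b": a_before / n_a,
        "b_before_a": b_before / n_a,
        "same_day": same_day / n_a,
    }

# Made-up citation days for two blogs.
days_a = {"u1": 1, "u2": 5, "u3": 2, "u4": 7}
days_b = {"u1": 3, "u2": 4, "u3": 2, "u9": 1}
pair_timing = timing_features(days_a, days_b)
print(pair_timing)
```

With 1-day resolution, the same-day count absorbs every case where the ordering cannot be resolved.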

24 Sources of error
–Coarseness and sparseness of timing data (1 day resolution)
–Mirror URLs (actually helps)
–Incomplete crawls
[Figure: blogs A, B, C along a time axis, showing inferred and actual paths and an uncrawled blog or media source]

25 Visualization by Eytan Adar
GUESS tool (build your own, see demo @ 5:30!)
–Using GraphViz (by AT&T) layouts
Simple algorithm
–If single, explicit link exists, draw it (add node if needed)
–Otherwise use ML algorithm
  Pick the most likely explicit link
  Pick the most likely possible link
Tool lets you zoom around space, control threshold, link types, etc.
http://www-idl.hpl.hp.com/blogstuff
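
The edge-selection rule in the simple algorithm can be sketched as follows. The sketch assumes the ML step is available as a scoring function `score(source, target)`; that function, the `links_from` mapping, and the name `choose_source` are hypothetical stand-ins, not part of the GUESS tool.

```python
def choose_source(blog, earlier_infected, links_from, score):
    """Pick where `blog` most plausibly got the URL, and the edge type.

    earlier_infected: blogs that mentioned the URL before `blog`
    links_from: maps a blog to the set of blogs it explicitly links to
    score: hypothetical classifier score for "source infected target"
    """
    linked = [b for b in earlier_infected if b in links_from.get(blog, set())]
    if len(linked) == 1:
        return linked[0], "explicit"      # single explicit link: draw it
    if linked:
        # Several explicit links: keep the most likely one.
        return max(linked, key=lambda b: score(b, blog)), "explicit"
    if earlier_infected:
        # No explicit link: fall back to the most likely possible link.
        return max(earlier_infected, key=lambda b: score(b, blog)), "inferred"
    return None, None                     # no earlier blog: treat as a root

toy_links = {"b3": {"b1"}, "b4": {"b1", "b2"}}
toy_scores = {("b1", "b4"): 0.2, ("b2", "b4"): 0.9,
              ("b1", "b5"): 0.7, ("b2", "b5"): 0.1}

def toy_score(source, target):
    return toy_scores.get((source, target), 0.0)

print(choose_source("b3", ["b1", "b2"], toy_links, toy_score))
print(choose_source("b5", ["b1", "b2"], toy_links, toy_score))
```

A threshold on the score, like the one the tool exposes, could be applied before drawing an inferred edge.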

26 Giant Microbes epidemic visualization
[Figure legend: via link, explicit link, inferred link, blog]

27 iRank
Find early sources of good information using inferred information paths or timing
[Figure: blogs b1, b2, b3, b4, b5, …, bn, with labels “True source” and “Popular site”]

28 iRank Algorithm
–Draw a weighted edge for all pairs of blogs that cite the same URL
–Higher weight for mentions closer together
–Run PageRank
–Control for ‘spam’
[Figure: “Time of infection” axis marked t0 and t1]
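
A minimal sketch of the first three steps follows. Several details are assumptions, not taken from the slide: edges point from the later blog to the earlier one so that rank flows toward early sources, the weight is 1 / (1 + gap in days), same-day pairs are skipped, and PageRank is a standard power iteration with damping 0.85. The spam control step is not shown.

```python
from collections import defaultdict
from itertools import combinations

def build_edges(citations):
    """citations maps URL -> {blog: day of first mention}."""
    weights = defaultdict(float)
    for days in citations.values():
        for a, b in combinations(sorted(days), 2):
            if days[a] == days[b]:
                continue  # same-day mentions give no direction in this sketch
            later, earlier = (a, b) if days[a] > days[b] else (b, a)
            # Assumed weighting: closer mentions give a heavier edge.
            weights[(later, earlier)] += 1.0 / (1 + abs(days[a] - days[b]))
    return weights

def pagerank(weights, damping=0.85, iterations=100):
    """Weighted PageRank by power iteration over the (src, dst) edge weights."""
    nodes = sorted({n for edge in weights for n in edge})
    out_total = defaultdict(float)
    for (src, _), w in weights.items():
        out_total[src] += w
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Rank held by blogs with no outgoing edges is spread evenly.
        dangling = sum(rank[n] for n in nodes if out_total[n] == 0)
        base = (1 - damping) / len(nodes) + damping * dangling / len(nodes)
        new_rank = {n: base for n in nodes}
        for (src, dst), w in weights.items():
            new_rank[dst] += damping * rank[src] * w / out_total[src]
        rank = new_rank
    return rank

# Made-up data: b1 is the first blog to mention both URLs.
toy_citations = {
    "url1": {"b1": 1, "b2": 2, "b3": 4},
    "url2": {"b1": 3, "b3": 5, "b2": 6},
}
toy_rank = pagerank(build_edges(toy_citations))
print(max(toy_rank, key=toy_rank.get))
```

In the toy data, b1 ranks highest because every later mention points back to it.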

29 Do Bloggers Kill Kittens?
02:00 AM Friday Mar. 05, 2004 PST Wired publishes: "Warning: Blogs Can Be Infectious."
7:25 AM Friday Mar. 05, 2004 PST Slashdot posts: "Bloggers' Plagiarism Scientifically Proven"
9:55 AM Friday Mar. 05, 2004 PST Metafilter announces: "A good amount of bloggers are outright thieves."

30 For more info
Information Dynamics Lab @ HP: http://www.hpl.hp.com/research/idl
Blog Epidemic Analyzer: http://www-idl.hpl.hp.com/blogstuff
Eytan, Li, Lada & Rajan:
http://www.hpl.hp.com/research/idl/people/eytan/
http://www.hpl.hp.com/personal/Li_Zhang/
http://www.hpl.hp.com/personal/Lada_Adamic
http://www.hpl.hp.com/research/idl/people/lukose/

31 CNN: Wal-Mart banishes bawdy mags

