
1 Deriving Marketing Intelligence from Online Discussion
Natalie Glance and Matthew Hurst
CMU Information Retrieval Seminar, April 19, 2006
© 2006 Nielsen BuzzMetrics, A VNU business affiliate

2 Overview
 Motivation
 Content Segment: The Blogosphere
   Structural Aspects
   Topical Aspects
 Deriving market intelligence
 Conclusion

3 Motivation
Social media and mobile phone data, e.g.: "The celly 31 is awesome, but the screen is a bit too dim."

Product    Score
Celly 31   4.0
Phony ZA   8.0

Feature    Score
Screen     2.0
Signal     9.0

4 The Blogosphere

5 (image-only slide)

6 Profile Analysis
Hurst, "24 Hours in the Blogosphere," 2006 AAAI Spring Symposium on Computational Approaches to Analysing Weblogs.

7 Hypotheses
 Different hosts attract users with different propensities to disclose profile information (?)
 Blogspot users are more disposed to disclose information (?)
 Different interface implementations perform differently at extracting/encouraging information from users (?)

8 Per Capita: Spaces
(chart: variance in average age, variance in profiles with age, variance in per-capita bloggers)

9 Per Capita: Blogspot

10 The graphical structure of the blogosphere

11 Graphical Structure of the Blogosphere
 Citations between blogs indicate some form of relationship, generally topical.
 A link is certainly evidence of awareness; consequently, reciprocal links are evidence of mutual awareness.
 Mutual awareness suggests some commonality, perhaps common interests.
 The graph of reciprocal links can be considered a social network (see the sketch below).
 Areciprocal links suggest topical relationships, but not social ones.
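As a concrete illustration of the distinction (not from the deck; the edge list is a made-up toy example), a minimal Python sketch that splits a directed citation graph into reciprocal edges, which form the candidate social network, and areciprocal edges:

```python
# Toy directed citation graph: (source blog, cited blog).
citations = {("a", "b"), ("b", "a"), ("a", "c"), ("d", "a")}

# Reciprocal links: both directions present -> evidence of mutual awareness.
reciprocal = {(u, v) for (u, v) in citations if (v, u) in citations}
areciprocal = citations - reciprocal

# Collapse reciprocal pairs into undirected "social" edges.
social_edges = {tuple(sorted(edge)) for edge in reciprocal}

print(social_edges)   # {('a', 'b')}
print(areciprocal)    # {('a', 'c'), ('d', 'a')}
```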

12 (image-only slide)

13 Graph Layout
 Hierarchical force layout. The graph has two types of links: reciprocal and areciprocal.
 Create the set of partitions P, where each partition is a connected component of the reciprocal graph.
 Create a partition graph whose nodes are the members of P and whose edges are formed from areciprocal links between (nodes within) members of P.
 Lay out the partition graph.
 Lay out each partition (see the sketch below).
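A minimal sketch of this two-level layout, assuming networkx (the deck does not say which toolkit was used): partitions are the connected components of the reciprocal subgraph, the partition graph built from cross-partition areciprocal links is laid out first, and then each partition is laid out locally around its center.

```python
import networkx as nx

def hierarchical_layout(directed_citations):
    """Two-level force layout: lay out the partition graph, then each partition."""
    recip = nx.Graph()
    arecip = []
    recip.add_nodes_from(n for edge in directed_citations for n in edge)
    for u, v in directed_citations:
        if (v, u) in directed_citations:
            recip.add_edge(u, v)                 # reciprocal link
        else:
            arecip.append((u, v))                # areciprocal link

    # Partitions P = connected components of the reciprocal graph
    # (blogs with no reciprocal links become singleton partitions).
    parts = list(nx.connected_components(recip))
    part_of = {n: i for i, comp in enumerate(parts) for n in comp}

    # Partition graph: nodes are members of P, edges come from areciprocal
    # links whose endpoints lie in different partitions.
    pg = nx.Graph()
    pg.add_nodes_from(range(len(parts)))
    for u, v in arecip:
        if part_of[u] != part_of[v]:
            pg.add_edge(part_of[u], part_of[v])

    centers = nx.spring_layout(pg, scale=10.0)   # lay out the partition graph
    pos = {}
    for i, comp in enumerate(parts):             # lay out each partition
        local = nx.spring_layout(recip.subgraph(comp), scale=1.0)
        for n, (x, y) in local.items():
            cx, cy = centers[i]
            pos[n] = (cx + x, cy + y)
    return pos

# Usage with the toy edge set from the previous sketch:
print(hierarchical_layout({("a", "b"), ("b", "a"), ("a", "c"), ("d", "a")}))
```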

14 Blogosphere map, r = 2, p = 25 (labeled clusters include Japanese, cooking, and knitting blogs)

15 Blogosphere map, r = 2, p = 1 (labeled blogs: boingboing, michellemalkin, engadget, instapundit, powerline, scoble, crooksandliars, kbcafe/rss, gizmodo)

16 Blogosphere map, r = 3, p = 100 (labeled clusters: technology, social/politics). The English blogosphere is political.

17 Political Blogosphere
L. Adamic and N. Glance, "The Political Blogosphere and the 2004 U.S. Election: Divided They Blog," 2nd Annual Workshop on the Weblogging Ecosystem, Chiba, Japan, 2005.

18 Political Blogs & Readership
 Pew Internet & American Life Project report, January 2005:
   63 million U.S. citizens use the Internet to stay informed about politics (mid-2004, Pew Internet Study)
   9% of Internet users read political blogs preceding the 2004 U.S. Presidential Election
 2004 Presidential Campaign firsts:
   Candidate blogs, e.g. Dean's blogforamerica.com
   Successful grassroots campaign conducted via websites & blogs
   Bloggers credentialed as journalists & invited to nominating conventions

19 Research Goals & Questions
 Are we witnessing a cyberbalkanization of the Internet?
   The linking behavior of blogs may make it easier to read only like-minded bloggers
   On the other hand, bloggers systematically react to and comment on each other's posts, both in agreement and disagreement (Balkin 2004)
 Goal: study the linking behavior & discussion topics of political bloggers
   Measure the degree of interaction between liberal and conservative bloggers
   Find any differences in the structure of the two communities: is one community significantly more "cohesive" than the other?

20 The Greater Political Blogosphere
 Citation graph of the greater political blogosphere
   Front page of each blog crawled in February 2005
   Directed link from blog A to blog B if A links to B
   The method is biased toward blogroll/sidebar links (as opposed to links within posts)
 Results
   91% of links point to blogs of the same persuasion (liberal vs. conservative)
   Conservative blogs show a greater tendency to link: 82% of conservative blogs are linked to at least once and 84% link to at least one other blog, vs. 67% and 74% for liberal blogs
   Average # of links per blog is similar: 13.6 for liberal, 15.1 for conservative
   A higher proportion of liberal blogs are not linked to at all

21 Citations between blogs extracted from posts (Aug 29 – Nov 15, 2004)
A) All citations between A-list blogs in the 2 months preceding the 2004 election
B) Citations between A-list blogs with at least 5 citations in both directions
C) Edges further limited to those exceeding 25 combined citations
Only 15% of the citations bridge communities.

22 Are political blogs echo chambers?
 Performed pairwise comparison of URL citations and phrase usage from blog posts (see the sketch below)
 Link-based similarity measure
   Cosine similarity: cos(A,B) = (v_A · v_B) / (||v_A|| ||v_B||), where v_A is a binary vector whose entries are 1 or 0 depending on whether blog A cites a particular URL
   Average similarity: cos(L,R) = 0.03; cos(R,R) = 0.083; cos(L,L) = 0.087
 Phrase-based similarity measure
   Extracted a set of phrases that are informative with respect to a background model
   Entries in v_A are the TF*IDF weight for each phrase = (# of phrase mentions by the blog) × log[(# of blogs) / (# of blogs citing the phrase)]
   Average similarity: cos(L,R) = 0.10; cos(R,R) = 0.54; cos(L,L) = 0.57
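A small sketch of the two similarity measures, reconstructed from the slide's formulas with toy data (the blog names, URLs, phrases, and counts are made up): binary citation vectors for the link-based measure and TF*IDF-weighted phrase vectors for the phrase-based one.

```python
import math
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Link-based: one binary vector per blog over the set of cited URLs.
urls = ["u1", "u2", "u3", "u4"]
cites = {"blogA": {"u1", "u2"}, "blogB": {"u2", "u3"}}
vA = np.array([1.0 if u in cites["blogA"] else 0.0 for u in urls])
vB = np.array([1.0 if u in cites["blogB"] else 0.0 for u in urls])
print(cosine(vA, vB))                    # 0.5 on this toy data

# Phrase-based: entry = tf * idf = (# mentions by blog) * log(# blogs / # blogs using phrase).
phrases = ["exit poll", "swift boat"]
tf = {"blogA": {"exit poll": 3, "swift boat": 1},
      "blogB": {"exit poll": 1, "swift boat": 2}}
df = {"exit poll": 2, "swift boat": 1}   # number of blogs mentioning each phrase
n_blogs = 3

def tfidf(blog):
    return np.array([tf[blog][p] * math.log(n_blogs / df[p]) for p in phrases])

print(cosine(tfidf("blogA"), tfidf("blogB")))
```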

23 Influence on mainstream media
Notable examples of blogs breaking a story:
1. Swiftvets.com anti-Kerry video
   Bloggers linked to it in late July, keeping the accusations alive
   Kerry responded in late August, bringing mainstream media coverage
2. CBS memos alleging preferential treatment of Pres. Bush during the Vietnam War
   Powerline broke the story on Sep. 9, launching a flurry of discussion
   Dan Rather apologized later in the month
3. "Was Bush Wired?"
   Salon.com asked the question first on Oct. 8, echoed by Wonkette & PoliticalWire.com
   Mainstream media followed up the next day

24 Deriving Market Intelligence
N. Glance, M. Hurst, K. Nigam, M. Siegler, R. Stockton and T. Tomokiyo, "Deriving Marketing Intelligence from Online Discussion," Eleventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2005).

25 Automating Market Research
Brand managers want to know:
 Do consumers prefer my brand to another?
 Which features of my product are most valued?
 What should we change or improve?
 Alert me when a rumor starts to spread!

26 Comparative mentions: Halo 2 (chart; query: 'halo 2')

27 Case Study: PDAs
 Collect online discussion in the target domain (on the order of 10K to 10M posts)
 Classify the discussion into domain-specific topics (brand, feature, price)
 Perform base analyses over combinations of topics: buzz, sentiment/polarity, influencer identification

28 Dell Axim: 11.5% buzz, 3.4 polarity (chart)

29 Interactive analysis
 Top-down approach: drill down from aggregate findings to the drivers of those findings
 A global view of the data is used to determine focus
 Model the parent and child slices
 Use data-driven methods to identify what distinguishes one data set from the other

30 SD card (drill-down chart)

31 Social network analysis for discussion about the Dell Axim

32 Drilling down to sentence level
 Discussion centers on the poor quality of the sound hardware & IR ports
   "It is very sad that the Axim's audio AND Irda output are so sub-par, because otherwise it is a great Pocket PC."
   "Long story made short: the Axim has a considerably inferior audio output than any other Pocket PC we have ever tested."
   "When we tested it we found that there was a problem with the audio output of the Axim."
   "The Dell Axim has a lousy IR transmitter AND a lousy headphone jack."
 Note: these examples are automatically extracted.

33 Technology
 Data Collection:
   Document acquisition and analysis
   Classification (relevance/topic)
 Topical Analysis:
   Topic classification using a hierarchy of topic classifiers operating at the sentence level
   Phrase mining and association
 Intentional Analysis:
   Interpreting sentiment/polarity
   Community analysis
   Aggregate metrics

34 Topical Analysis
 Hierarchy of topics with specific 'dimensions':
   Brand dimension
     Pocket PC: Dell Axim; Toshiba (e740); Palm (Zire, Tungsten)
   Feature dimension
     Components (e.g. Battery)

35 Topical Analysis
 Each topic is a classifier, e.g. a boolean expression with sentence- and/or message-scoped sub-expressions.
 The measured precision of a classifier allows for projection of raw counts.
 The intersection of typed dimensions allows for a basic approach to association, e.g. finding sentences discussing the battery of the Dell Axim (see the sketch below).
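To make these ideas concrete, here is a small illustration of my own (the patterns, precision value, and example sentences are hypothetical): a topic is a boolean expression evaluated per sentence, topics from different dimensions are intersected for association, and raw matched counts are projected using the classifier's measured precision.

```python
import re

# A topic "classifier" as a boolean expression over sentence-scoped tests:
# intersection of a brand topic (Dell Axim) and a feature topic (battery).
def matches_axim_battery(sentence):
    s = sentence.lower()
    brand = bool(re.search(r"\b(dell\s+)?axim\b", s))
    feature = bool(re.search(r"\bbattery\b", s))
    return brand and feature

sentences = [
    "The Axim battery drains far too fast.",
    "Battery life on my Zire is fine.",
    "I love the Axim screen.",
]
raw_count = sum(matches_axim_battery(s) for s in sentences)

# Projection: scale the raw count by the classifier's measured precision
# (a made-up number standing in for a value estimated on labeled data).
precision = 0.85
projected_count = raw_count * precision
print(raw_count, projected_count)   # 1 0.85
```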

36 Polarity: What is it?
 An opinion or evaluation/emotional state with respect to some topic:
   "It is excellent." / "I love it."
 A desirable or undesirable condition:
   "It is broken." (objective, but negative)
 We use a lexical/syntactic approach.
 Cf. related work that treats this as a boolean document-classification task using supervised classifiers.

37-41 Polarity Identification (built up over five slides)
Sentence: This car is really great
POS: DT NN VB RR JJ
Lexical orientation: 0 0 0 0 +
Chunking: BNP BVP BADJP
Interpretation (parsing): Positive
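A minimal sketch of the lexical-orientation step only (the deck's full pipeline also uses POS tagging, chunking, and parsing for the interpretation step); the orientation lexicon and negation handling here are small hypothetical stand-ins.

```python
# Toy lexical-orientation step: look up tokens in a small polarity lexicon
# and flip the sign when a negator precedes the polar term. Lexicon is made up.
ORIENTATION = {"great": +1, "excellent": +1, "awesome": +1,
               "lousy": -1, "broken": -1, "dim": -1, "sub-par": -1}
NEGATORS = {"not", "never", "didn't", "doesn't", "isn't"}

def sentence_polarity(sentence):
    score, negate = 0, False
    for raw in sentence.lower().split():
        tok = raw.strip(".,!?")
        if tok in NEGATORS:
            negate = True
            continue
        val = ORIENTATION.get(tok, 0)
        if val != 0:
            score += -val if negate else val
            negate = False           # negation is consumed by the polar term
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(sentence_polarity("This car is really great"))                   # positive
print(sentence_polarity("The Dell Axim has a lousy IR transmitter."))  # negative
```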

42 Polarity Challenges
 Methodological: "She told me she didn't like it."
 Syntactic: "His cell phone works in some buildings, but in others it doesn't."
 Valence: "I told you I didn't like it," "I heard you didn't like it," "I didn't tell you I liked it," "I didn't hear you liked it": many verbs (tell, hear, say, ...) require semantic/functional information for polarity interpretation.
 Association

43 Polarity Examples

44 Polarity Metric
 A function of counts of polar statements on a topic: f(size, f_topic, f_topic+pos, f_topic+neg) (see the sketch below)
 Use empirical priors to smooth the observed counts (helps when counts are low)
 Use the precision/recall of the system to project true counts and provide error bars (requires labeled data)
 Example: a +/- ratio metric maps the ratio to a 0-10 score
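The slide does not spell out the smoothing or the mapping; as one concrete reading, here is a sketch that adds pseudocounts (an empirical prior) to the positive and negative counts and maps the smoothed positive fraction linearly onto a 0-10 score. The prior values are assumptions.

```python
def polarity_score(n_pos, n_neg, prior_pos=1.0, prior_neg=1.0):
    """Smoothed +/- ratio mapped onto a 0-10 score (assumed formulation).

    prior_pos and prior_neg act as pseudocounts, keeping topics with very few
    polar mentions from swinging to extreme scores.
    """
    pos = n_pos + prior_pos
    neg = n_neg + prior_neg
    frac_pos = pos / (pos + neg)        # smoothed share of positive mentions
    return 10.0 * frac_pos

print(polarity_score(2, 1))      # few mentions -> pulled toward the neutral 5.0
print(polarity_score(200, 100))  # many mentions -> close to the raw 2:1 ratio
```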

45 (image-only slide)

46 Predicting Movie Sales from Blogger Sentiment
G. Mishne and N. Glance, "Predicting Movie Sales from Blogger Sentiment," 2006 AAAI Spring Symposium on Computational Approaches to Analysing Weblogs.

47 Blogger Sentiment and Impact on Sales
 What we know:
   There is a correlation between references to a product in the blogspace and its financial figures
   Tong 2001: movie buzz in Usenet is correlated with sales
   Gruhl et al. 2005: spikes in Amazon book sales follow spikes in blog buzz
 What we want to find out:
   Does taking into account the polarity of the references yield a better correlation?
 Product of choice: movies
 Methodology: compare the correlation of references with sales against the correlation of polar references with sales

48 Experiment
 49 movies
   Budget > $1M
   Released between Feb. and Aug. 2005
 Sales data from IMDB
   "Income per screen" = opening weekend sales / number of screens
 Blog post collection
   References to the movies in a 2-month window
   Identified using the IMDB link plus simple heuristics
 Measure (see the sketch below):
   Pearson's r between income per screen and {references in blogs, positive/polar references in blogs}
   Applied to various context lengths around the reference
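A small sketch of the measure itself with made-up numbers (one row per movie): Pearson's r between income per screen and counts of positive blog references.

```python
import numpy as np

# Toy data: values are invented purely to illustrate the computation.
income_per_screen = np.array([12000.0, 8500.0, 15200.0, 4300.0, 9900.0])
positive_refs     = np.array([   310,    120,     450,     40,    180])

r = np.corrcoef(income_per_screen, positive_refs)[0, 1]
print(f"Pearson's r = {r:.3f}")
```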

49 Results
 For 80% of the movies, r > 0.75 for pre-release positive sentiment
 12% improvement compared with the correlation of movie sales with a simple buzz count (0.542 vs. 0.484)
(chart: income per screen vs. positive references)

50 Conclusion
 The intersection of social media and data/text mining algorithms presents a viable business opportunity, poised to replace traditional forms of market research, social trend analysis, etc.
 Key elements include topic detection and sentiment mining.
 The success of the blogosphere has driven interest in a distinct form of online content that has a long history but is becoming more and more visible.
 The blogosphere itself is a fascinating demonstration of social content and interaction and invites many applications of traditional and novel analysis.

51
 Internships: openings available for this summer (e-mail: Matthew.Hurst@buzzmetrics.com)
 Data set: weblog data for July 2005 (e-mail: Natalie.Glance@buzzmetrics.com)
 3rd Annual Workshop on the Weblogging Ecosystem: http://www.blogpulse.com/www2006-workshop
 1st International Conference on Weblogs and Social Media, March 2007: http://www.icwsm.com (under construction)
 Company info
   Company website: http://nielsenbuzzmetrics.com/
   Blog search: http://www.blogpulse.com/

52 Phrase Finding
 Goal: find key phrases that discriminate between a foreground corpus and a background corpus
 First step: KeyBigramFinder
   Identifies phrases that score high in informativeness and phraseness (see the sketch below)
   Informativeness: a measure of the ability to discriminate foreground from background
   Phraseness: a measure of the collocation of consecutive words
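The slides do not give the scoring formulas; one standard way to compute the two measures (used here as an assumption) is pointwise KL divergence: phraseness compares a bigram's foreground probability with the product of its unigram probabilities, and informativeness compares its foreground probability with its background probability.

```python
import math
from collections import Counter

def bigram_scores(fg_tokens, bg_tokens):
    """Rank bigrams by phraseness + informativeness (pointwise KL, assumed)."""
    fg_uni = Counter(fg_tokens)
    fg_bi = Counter(zip(fg_tokens, fg_tokens[1:]))
    bg_bi = Counter(zip(bg_tokens, bg_tokens[1:]))
    n_fu = len(fg_tokens)
    n_fb = max(len(fg_tokens) - 1, 1)
    n_bb = max(len(bg_tokens) - 1, 1)

    scores = {}
    for (w1, w2), c in fg_bi.items():
        p_fg = c / n_fb                                         # foreground bigram prob.
        p_indep = (fg_uni[w1] / n_fu) * (fg_uni[w2] / n_fu)
        p_bg = (bg_bi[(w1, w2)] + 1) / (n_bb + len(bg_bi) + 1)  # add-one smoothing
        phraseness = p_fg * math.log(p_fg / p_indep)
        informativeness = p_fg * math.log(p_fg / p_bg)
        scores[(w1, w2)] = phraseness + informativeness
    return sorted(scores.items(), key=lambda kv: -kv[1])

fg = "battery life is short battery life is short battery life matters".split()
bg = "the movie was long and the plot was short".split()
print(bigram_scores(fg, bg)[:3])
```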

53 Phrase Finding Pipeline
 Seeded by KeyBigramFinder
 Sample pipeline:
   APrioriPhraseExpander: expands the top N bigrams into longer phrases, adapting the APRIORI algorithm to text and features of text (a rough sketch follows)
   ConstituentFinder: uses contextual evidence to identify noun phrases
 The final list is sorted either by frequency or by informativeness score
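The expander's details are not in the deck; as a rough illustration of the Apriori idea applied to text, this sketch grows candidate phrases from seed bigrams by extending them one token at a time and keeping only extensions that still meet a minimum frequency (support) threshold.

```python
from collections import Counter

def expand_phrases(tokens, seed_bigrams, min_count=2, max_len=5):
    """Apriori-style expansion: an (n+1)-gram is kept only if it is frequent
    and its n-gram prefix was already kept (toy sketch, not the deck's code)."""
    def ngram_counts(n):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    bigram_counts = ngram_counts(2)
    frequent = {bg for bg in seed_bigrams if bigram_counts[bg] >= min_count}
    results = set(frequent)
    for n in range(3, max_len + 1):
        counts = ngram_counts(n)
        frequent = {g for g, c in counts.items()
                    if c >= min_count and g[:-1] in frequent}
        if not frequent:
            break
        results |= frequent
    return results

text = "battery life is short battery life is short battery life matters".split()
print(expand_phrases(text, {("battery", "life")}))
```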

