1 Inferring User Political Preferences from Streaming Communications
Svitlana Volkova (1), Glen Coppersmith (2) and Benjamin Van Durme (1,2)
(1) Center for Language and Speech Processing
(2) Human Language Technology Center of Excellence
ACL 2014, Baltimore

2 Motivation
Personalized, diverse and timely data that can reveal user interests, preferences and opinions.
DemographicsPro – http://www.demographicspro.com/
WolframAlpha Analytics – http://www.wolframalpha.com/facebook/

3 Applications
– Large-scale passive polling and real-time live polling
– Online advertising
– Healthcare analytics
– Personalized recommendation systems and search

4 User Attribute Prediction
Attributes predicted from user communications:
– Political preference: Rao et al., 2010; Conover et al., 2011; Pennacchiotti and Popescu, 2011; Zamal et al., 2012; Cohen and Ruths, 2013; …
– Gender: Garera and Yarowsky, 2009; Rao et al., 2010; Burger et al., 2011; Van Durme, 2012; Zamal et al., 2012; Bergsma and Van Durme, 2013
– Age: Rao et al., 2010; Zamal et al., 2012; Cohen and Ruths, 2013; Nguyen et al., 2011, 2013
– …

5 Existing Approaches
Tweets as a document: ~1K tweets per user.*
Does an average Twitter user produce thousands of tweets?
*Rao et al., 2010; Conover et al., 2011; Pennacchiotti and Popescu, 2011a; Burger et al., 2011; Zamal et al., 2012; Nguyen et al., 2013

6 How Active Are Twitter Users?
http://www.digitalbuzzblog.com/visualizing-twitter-statistics-x100/

7 Real-World Predictions
– Not active users: no or limited content
– Average Twitter users: median = 10 tweets per day
– Active users: 1,000+ tweets
– Private users: no content
(Percentages shown on the slide figure: 10%, 50%, 20%.)

8 Our Approach
1. Take advantage of user local neighborhoods (real-world batch predictions)
2. Incremental, dynamic, real-time predictions (streaming predictions)

9 Our Approach
1. Take advantage of user local neighborhoods (real-world batch predictions)
2. Incremental, dynamic, real-time predictions

10 Attributed Social Network
User local neighborhoods, a.k.a. social circles
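A minimal sketch of an attributed social network, assuming each user node carries its tweets and any labeled attributes, and each edge is typed by one of the neighbor relations named in these slides (friend, follower, @mention, retweet). The class and method names (`AttributedSocialNetwork`, `neighborhood`, `neighborhood_text`) are hypothetical, and taking the first n neighbors of a type is an illustrative simplification, not the authors' neighbor selection procedure.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical edge types, mirroring the neighbor relations named in the slides.
NEIGHBOR_TYPES = ("friend", "follower", "mention", "retweet")


@dataclass
class User:
    user_id: str
    tweets: list[str] = field(default_factory=list)
    # Node attributes, e.g. {"political": "D"}; empty for unlabeled neighbors.
    attributes: dict[str, str] = field(default_factory=dict)


class AttributedSocialNetwork:
    """Users (nodes with tweets and attributes) joined by typed, directed edges."""

    def __init__(self) -> None:
        self.users: dict[str, User] = {}
        # edges[user_id][neighbor_type] -> neighbor ids in insertion order
        self.edges: dict[str, dict[str, list[str]]] = defaultdict(
            lambda: defaultdict(list)
        )

    def add_user(self, user: User) -> None:
        self.users[user.user_id] = user

    def add_edge(self, src: str, dst: str, kind: str) -> None:
        if kind not in NEIGHBOR_TYPES:
            raise ValueError(f"unknown neighbor type: {kind}")
        self.edges[src][kind].append(dst)

    def neighborhood(self, user_id: str, kind: str, n: int) -> list[User]:
        """Local neighborhood (social circle): up to n known neighbors of one type."""
        ids = self.edges[user_id][kind][:n]
        return [self.users[i] for i in ids if i in self.users]

    def neighborhood_text(self, user_id: str, kind: str, n: int, t: int) -> list[str]:
        """Up to t tweets from each of up to n neighbors of the given type."""
        return [tw for u in self.neighborhood(user_id, kind, n) for tw in u.tweets[:t]]
```

For example, after adding users `u1` and `u2` and a friend edge from `u1` to `u2`, `net.neighborhood_text("u1", "friend", n=1, t=1)` returns only the first tweet of `u2`. Keeping edge types separate makes it easy to compare neighborhood types, which is the question the next slides pose.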

11 Twitter Network Data
Code, data and trained models for gender, age and political preference prediction: http://www.cs.jhu.edu/~svitlana/

12 Twitter Social Graph
I. Candidate-Centric: 1,031 users of interest
II. Geo-Centric: 270 users
III. Politically Active*: 371 users
10 to 20 neighbors of each type per user; ~50K nodes, ~60K edges.
What types of neighbors lead to the best attribute prediction for a given user?
*Pennacchiotti and Popescu, 2011; Zamal et al., 2012; Cohen and Ruths, 2013

13 Experiments
Log-linear binary unigram models trained on (I) users vs. (II) neighbors and (III) both.
Evaluate the relative utility of different neighborhood types:
– varying neighborhood size n = [1, 2, 5, 10] and content amount t = [5, 10, 15, 25, 50, 100, 200] tweets per neighbor
– 10-fold cross-validation with 100 random restarts for every (n, t) parameter combination
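The three settings can be sketched as binary logistic regression (one common log-linear model) over binary unigram features. This is a hedged, dependency-free illustration, not the authors' code: the function names, the SGD training loop, the hyperparameters (`epochs`, `lr`, `l2`), the choice to merge user and neighbor unigrams into one feature set for setting (III), and treating each random restart as a reshuffle of the folds with a new seed are all assumptions.

```python
import math
import random

# Parameter grid from the slide: neighborhood size n and tweets per neighbor t.
NEIGHBORHOOD_SIZES = [1, 2, 5, 10]
TWEETS_PER_NEIGHBOR = [5, 10, 15, 25, 50, 100, 200]


def binary_unigrams(tweets):
    """Binary bag of words: a lowercased token is either present or absent."""
    return {tok for tweet in tweets for tok in tweet.lower().split()}


def features(user_tweets, neighbor_tweets, setting):
    """Feature set for setting "user" (I), "neighbors" (II) or "both" (III)."""
    if setting == "user":
        return binary_unigrams(user_tweets)
    if setting == "neighbors":
        return binary_unigrams(neighbor_tweets)
    if setting == "both":
        # Assumption: user and neighbor unigrams share one feature space.
        return binary_unigrams(user_tweets) | binary_unigrams(neighbor_tweets)
    raise ValueError(f"unknown setting: {setting}")


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_log_linear(examples, epochs=20, lr=0.1, l2=0.01, seed=0):
    """Logistic regression trained by SGD on (feature_set, label) pairs.

    Labels are 0 or 1; the L2 penalty is applied lazily, only to the
    weights of features active in the current example.
    """
    rng = random.Random(seed)
    weights, bias = {}, 0.0
    data = list(examples)
    for _ in range(epochs):
        rng.shuffle(data)
        for feats, y in data:
            # Gradient of the log loss with respect to the linear score.
            g = sigmoid(bias + sum(weights.get(f, 0.0) for f in feats)) - y
            bias -= lr * g
            for f in feats:
                w = weights.get(f, 0.0)
                weights[f] = w - lr * (g + l2 * w)
    return weights, bias


def predict_proba(model, feats):
    """Probability of label 1 for a set of active features."""
    weights, bias = model
    return sigmoid(bias + sum(weights.get(f, 0.0) for f in feats))


def k_fold_indices(num_items, k=10, seed=0):
    """Shuffle item indices with the given seed and deal them into k folds."""
    idx = list(range(num_items))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_val_accuracy(examples, k=10, seed=0):
    """k-fold cross-validated accuracy for one random restart (seed)."""
    correct = 0
    for fold in k_fold_indices(len(examples), k, seed):
        held_out = set(fold)
        train = [ex for i, ex in enumerate(examples) if i not in held_out]
        model = train_log_linear(train, seed=seed)
        for i in fold:
            feats, y = examples[i]
            correct += int((predict_proba(model, feats) >= 0.5) == (y == 1))
    return correct / len(examples)
```

Averaging `cross_val_accuracy` over 100 seeds for each (n, t) pair in the grid, with features built from n neighbors and t tweets per neighbor, would mirror the protocol on the slide. A dedicated library implementation would be the practical choice at the scale of ~50K nodes; the pure-Python version only keeps the sketch self-contained.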

14 Neighborhood Comparison
[Figure: accuracy vs. tweets per neighbor, for 1 neighbor and 10 neighbors]

15 Optimizing Twitter API Calls
Cand-Centric Graph: Friend Circle

19 Summary: Batch Real-World Predictions with Limited User Data
– More data is better.
– How to get it? More neighbors per user > additional content from the existing neighbors.
– What kind of data? Follower, friend, @mention, retweet.
– Real-world predictions: users who recently joined Twitter, or whose tweets are not (or only partly) accessible, have no or very limited content!

20 Our Approach
1. Take advantage of user local neighborhoods
2. Incremental, dynamic, real-time predictions (streaming predictions)

21 Iterative Bayesian Predictions
[Figure: predictions updated as new tweets arrive over time]
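One way to read "iterative Bayesian predictions" is a sequential update of the log-odds of a user's political preference as each new tweet arrives, assuming tweets are conditionally independent given the class. The sketch below uses a unigram likelihood model with hypothetical class labels "D" and "R", a hypothetical `word_probs` table of per-class token probabilities, and a fixed probability `unk` for unseen tokens; it illustrates the update rule and is not the authors' implementation.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function: maps log-odds to a probability.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def tweet_llr(tweet, word_probs, unk=1e-3):
    """log P(tweet | D) - log P(tweet | R), treating tokens as independent."""
    llr = 0.0
    for tok in tweet.lower().split():
        llr += math.log(word_probs["D"].get(tok, unk))
        llr -= math.log(word_probs["R"].get(tok, unk))
    return llr


def stream_posteriors(tweets, word_probs, prior_d=0.5):
    """Yield P(D | tweets seen so far) after each incoming tweet.

    Bayes' rule in log-odds form: posterior log-odds equal the prior
    log-odds plus the sum of per-tweet log-likelihood ratios.
    """
    log_odds = math.log(prior_d) - math.log(1.0 - prior_d)
    for tweet in tweets:
        log_odds += tweet_llr(tweet, word_probs)
        yield sigmoid(log_odds)
```

With `word_probs = {"D": {"healthcare": 0.1, "taxes": 0.01}, "R": {"healthcare": 0.01, "taxes": 0.1}}`, the stream `["healthcare", "taxes"]` first raises P(D) to 10/11 and then returns it to 0.5. Because each update only adds one term, the belief can be refreshed per tweet without reprocessing the history.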

22 Cand-Centric Graph: Belief Updates
[Figure: belief updates over time]

23 Cand-Centric Graph: Prediction Time
[Figure: prediction time for 100 users, User Stream vs. User-Neighbor stream, at 75% and 95% confidence]
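Prediction time can be illustrated as the first timestamp at which the posterior for either class reaches a confidence threshold such as 75% or 95%, comparing the user's own stream with the user's stream merged with neighbor streams. A hedged sketch, assuming each tweet has already been scored with a per-tweet log-likelihood ratio `llr` by some model; `merged_stream`, `time_to_confidence` and the class labels "D" and "R" are hypothetical.

```python
import heapq
import math


def sigmoid(z):
    # Numerically stable logistic function: maps log-odds to a probability.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def merged_stream(*streams):
    """Merge timestamp-sorted (timestamp, llr) streams into one sorted stream."""
    return heapq.merge(*streams)


def time_to_confidence(events, threshold, prior_d=0.5):
    """First timestamp at which P(D) or P(R) reaches threshold, else None.

    events: (timestamp, llr) pairs in time order, where llr is a per-tweet
    log-likelihood ratio log P(tweet | D) - log P(tweet | R).
    """
    log_odds = math.log(prior_d) - math.log(1.0 - prior_d)
    for timestamp, llr in events:
        log_odds += llr
        p_d = sigmoid(log_odds)
        if max(p_d, 1.0 - p_d) >= threshold:
            return timestamp
    return None
```

With hypothetical scores `user = [(1, 0.5), (5, 0.5)]` and `neighbor = [(2, 1.0), (3, 1.0)]`, the user stream alone never reaches 75% confidence, while the merged stream reaches 75% at time 2 and 95% at time 5. This is the intuition behind adding neighbor content: extra evidence arrives sooner, so a confident prediction can be made earlier.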

24 Batch vs. Online Performance

25 Summary
– Neighborhood content is useful.*
– Neighborhoods constructed from friends, user mentions and retweets are most effective.
– Signal is distributed in the neighborhood.
– Streaming models > batch models.
*Pennacchiotti and Popescu, 2011a, 2011b; Conover et al., 2011a, 2011b; Golbeck et al., 2011; Zamal et al., 2012

26 Thank you!
Labeled Twitter network data for gender, age and political preference prediction: http://www.cs.jhu.edu/~svitlana/
Code and pre-trained models available upon request: svitlana@jhu.edu
