Topical Authority Detection and Sentiment Analysis on Top Influencers

1 Topical Authority Detection and Sentiment Analysis on Top Influencers
Machine Learning with Large Datasets Course Project (under the guidance of Prof. William W. Cohen) Team Members: Manuel, Shubham and Soumya

2 Outline
  Introduction
  Related Work
  Problem Statement
  Methodology
  Results
  Evaluation plan
  Conclusion

3 Introduction
Topical authority detection in social networks is an active research area.
It is important for recommending relevant feeds to users interested in certain topics.
Challenges: results should not be overly biased towards
  popular authors (such as celebrities)
  generic authorities (such as news channels)
Relatively new users, who may not have existed before an event but post dedicatedly on the topic, should also be considered.

4 Related Work
TwitterRank [2]: authority detection in Twitter using the idea of PageRank
  Leverages topical similarity and link structure between users
  Fails to filter out spammers, or celebrities who are not always influential
Meeyoung Cha et al. [3] find that popular users with high in-degree are not necessarily influential in terms of spawning retweets or mentions
Aditya Pal et al. [1] (considered as the baseline):
  Use clustering to separate influential from non-influential users on Twitter
  Rank users in the influential cluster, considering various important features

5 Problem Statement
Aim:
  Perform authority detection on a collection of topics in Twitter for a time window
  Perform sentiment analysis to determine the influence of top users tweeting on specific topics on their respective communities
Period: June 6th 2010 to June 10th 2010
Topics: Oil Spill, iPhone, World Cup

6 Methodology - User Metrics
M = Mentions
  M1: Number of mentions of other users by the author
  M2: Number of unique users mentioned by the author
  M3: Number of mentions by others of the author
  M4: Number of unique users mentioning the author
G = Graph Characteristics (restricted by the availability of data)
  G1: Number of topically active followers
  G2: Number of topically active friends
  G3: Number of followers tweeting on topic after the author
  G4: Number of friends tweeting on topic before the author
OT = Original tweets
  OT1: Number of original tweets
  OT2: Number of links shared
  OT3: Self-similarity score
  OT4: Number of keyword hashtags used
CT = Conversational tweets
  CT1: Number of conversational tweets
  CT2: Tweets where conversation is initiated by the author
RT = Repeated tweets
  RT1: Number of retweets of others' tweets
  RT2: Number of unique tweets retweeted by other users
  RT3: Number of unique users who retweeted author's tweets
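The mention metrics M1 to M4 above can be illustrated with a minimal sketch. It assumes tweets arrive as (author, text) pairs already filtered to one topic, that any `@handle` in the text counts as a mention, and that handles are compared case-insensitively; the function name `mention_metrics` and the sample tweets are hypothetical, and this is not the project's MapReduce job.

```python
import re
from collections import defaultdict

# Assumption: every "@handle" token in the tweet text is a mention.
MENTION = re.compile(r"@(\w+)")

def mention_metrics(tweets):
    """Compute M1..M4 per user from (author, text) pairs on a single topic."""
    made = defaultdict(list)      # author -> users they mentioned (with repeats)
    received = defaultdict(list)  # user -> authors who mentioned them (with repeats)
    users = set()
    for author, text in tweets:
        author = author.lower()
        users.add(author)
        for user in MENTION.findall(text):
            user = user.lower()
            if user == author:    # ignore self-mentions
                continue
            users.add(user)
            made[author].append(user)
            received[user].append(author)
    return {
        u: {
            "M1": len(made[u]),           # mentions of others by the author
            "M2": len(set(made[u])),      # unique users mentioned by the author
            "M3": len(received[u]),       # mentions of the author by others
            "M4": len(set(received[u])),  # unique users mentioning the author
        }
        for u in users
    }

# Hypothetical sample data, for illustration only.
sample = [
    ("alice", "@bob oil spill update"),
    ("alice", "@bob @carol more on the spill"),
    ("bob", "thanks @alice"),
]
metrics = mention_metrics(sample)
```

On a Hadoop cluster the same counts would typically be produced by emitting one record per (author, mentioned user) pair in the map phase and aggregating per user in the reduce phase.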

7 Methodology - Features Extracted
  Topic Signal (TS)
  Signal Strength (SS)
  Non-Chat Signal (NCS)
  Retweet Impact (RI) - modified
  Mention Impact (MI)
  Information Diffusion (ID)
  Network Score (NS)
  URL Impact (UI)

8 Methodology - Features Formulae

9 Methodology - Steps
Data in Twitter API format -> User Metrics
  MapReduce (using Hadoop on AWS)
Src-follows-Dest edge list -> Adjacency Lists
User Metrics and Adjacency Lists -> Features
Features -> Clusters -> Influential Cluster
  Using Gaussian Mixture Model and Expectation Maximization
Influential Cluster -> Top 20 Influencers
  Using Gaussian Ranking
Sentiment Analysis and Visualization
  Using Liu Hu Lexicon and Gephi
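The ranking step can be sketched as follows. The slides do not spell out the scoring rule, so this assumes "Gaussian Ranking" means scoring each user by the product of per-feature Gaussian CDF values, with each feature's mean and standard deviation estimated over the influential cluster. The function names and the toy feature values are hypothetical.

```python
import math
from statistics import mean, pstdev

def normal_cdf(x, mu, sigma):
    """CDF of N(mu, sigma^2) at x; a constant feature contributes a neutral 1.0."""
    if sigma == 0:
        return 1.0
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def gaussian_rank(features, top_k=20):
    """features: dict mapping user -> list of feature values (equal lengths).

    Returns up to top_k (user, score) pairs, highest score first.
    """
    users = list(features)
    n_feats = len(features[users[0]])
    # Per-feature mean and standard deviation over the cluster.
    stats = []
    for j in range(n_feats):
        column = [features[u][j] for u in users]
        stats.append((mean(column), pstdev(column)))
    scores = {}
    for u in users:
        score = 1.0
        for x, (mu, sigma) in zip(features[u], stats):
            score *= normal_cdf(x, mu, sigma)
        scores[u] = score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]

# Toy feature vectors, for illustration only.
toy = {"a": [3.0, 3.0], "b": [1.0, 1.0], "c": [2.0, 2.0]}
ranking = gaussian_rank(toy)
```

Because each factor lies between 0 and 1, a user needs to score well on every feature to rank highly, which is one way to avoid rewarding accounts that are strong on a single signal such as raw popularity.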

10 Results - Authority Detection
Normalized: sandiebanandie, LATenvironment, latimesgreen, dbiello, mrt, BPOilSpill, NWF, climateprogress, ByronYork, Oil_Spill_News, SwampSchool, BrentSpiner, TPM, USGulfOilSpill, kate_sheppard, Fertic, GulfOilCleanup, msnbcvideo, alabamainsider, jimmybuffett
Not Normalized: LATenvironment, BPOilSpill, NWF, dbiello, GulfOilCleanup, sandiebanandie, NOLAnews, BoycottBP, Alyssa_Milano, USGulfOilSpill, guardianeco, climateprogress, TIME, ByronYork, TelegraphNews, washingtonpost, mrt, BPOilNews, greenforyou, HuffingtonPost

11 Results - Sentiment Analysis
dbiello: Negative Sentiment Influence
LATenvironment: Neutral Sentiment Influence

12 Evaluation - Clustering, Ranking and Authority
We randomly sample users from the “good” and “bad” clusters and ask people how relevant their tweets are to the topic. Using the ratings (1 to 5) that people assign to the top k Twitter users in our ranking, we compute NDCG to compare their relative ordering with ours. In a final survey, we plan to ask people to rank the authoritativeness of the top k users in our ranking, using both anonymized and non-anonymized tweets.
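A minimal sketch of the NDCG computation mentioned above, assuming the survey ratings (1 to 5) are listed in the order our ranking places the users, and using linear gain with a log2 position discount. The exponential-gain variant (2^rating - 1) is a common alternative; the slides do not say which one is intended. The function names are hypothetical.

```python
import math

def dcg(relevances):
    """Discounted cumulative gain with linear gain and log2 discount."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))

def ndcg(relevances, k=None):
    """NDCG of ratings listed in our ranked order, optionally cut off at k."""
    rels = list(relevances)
    ranked = rels[:k] if k else rels
    # Ideal ordering: the same ratings sorted from best to worst.
    ideal = sorted(rels, reverse=True)[:len(ranked)]
    ideal_dcg = dcg(ideal)
    return dcg(ranked) / ideal_dcg if ideal_dcg > 0 else 0.0
```

A value of 1.0 means the survey participants' ratings are already in non-increasing order along our ranking; lower values mean highly rated users were placed too far down.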

13 Evaluation

14 Conclusion While the baseline had more authorities who seemed generic, such as news Twitter accounts, our results show more topical authorities. We have also analyzed the sentiment influence of the top authorities, which can have further applications in formulating better marketing strategies for products and to influence consumers. Further, we plan to include evaluation results in our final report, and also improve upon the features related to the follower-following graph.

16 References
[1] Pal, Aditya, and Scott Counts. "Identifying topical authorities in microblogs." Proceedings of the fourth ACM international conference on Web search and data mining. ACM, 2011.
[2] Weng, Jianshu, et al. "Twitterrank: finding topic-sensitive influential twitterers." Proceedings of the third ACM international conference on Web search and data mining. ACM, 2010.
[3] Cha, Meeyoung, et al. "Measuring User Influence in Twitter: The Million Follower Fallacy." ICWSM (2010): 30.
[4] Yoshida, M., & Yamaguchi, Y. (2015). Interactive Tagging Networks (Following/Followers and Tags on 1 million Twitter Users) [Data set]. Zenodo.
[5] Page, Lawrence, et al. "The PageRank citation ranking: bringing order to the web." (1999).
[6] Bishop, Christopher M. "Pattern recognition." Machine Learning 128 (2006).

17 Baseline Results
NWF, TIME, Huffingtonpost, NOLAnews, Reuters, CBSNews, LATenvironment, kate_sheppard, MotherNatureNet, mparent77772

