Topical search in the Twitter OSN Collaborators: Naveen Sharma, Parantapa Bhattacharya, Niloy Ganguly (IITKGP) Muhammad Bilal Zafar, Krishna Gummadi (MPI-SWS)

Presentation transcript:

Topical search in the Twitter OSN
Saptarshi Ghosh
Collaborators: Naveen Sharma, Parantapa Bhattacharya, Niloy Ganguly (IITKGP); Muhammad Bilal Zafar, Krishna Gummadi (MPI-SWS)

Topical search in Twitter
- Twitter has emerged as an important source of information & real-time news
  - Search for breaking news and trending topics
- Topical search
  - Searching for topical experts
  - Searching for information on specific topics
- Primary requirement: identify the topical expertise of users

Profile of a Twitter user

Example tweets

Prior approaches to find topic experts
- Research studies
  - Pal et al. (WSDM 2011) use 15 features from tweets and the network to identify topical experts
  - Weng et al. (WSDM 2010) use an ML approach
- Application systems
  - Twitter Who To Follow (WTF), Wefollow, …
  - Methodology not fully public, but reported to utilize several features

Prior approaches use features extracted from:
- User profiles
  - Screen-name, bio, …
- Tweets posted by a user
  - Hashtags, others retweeting a given user, …
- Social graph of a user
  - Number of followers, PageRank, …

Problems with prior approaches
- User profiles (screen-name, bio, …)
  - Bio often does not give meaningful information
- Tweets posted by a user
  - Tweets mostly contain day-to-day conversation
- Social graph of a user (number of followers, PageRank)
  - Helps to identify authoritative users, but …
  - Does not provide topical information

We propose …
- Use a completely different feature to infer topics of expertise for an individual Twitter user
- Utilize social annotations
  - How does the Twitter crowd describe a user?
  - Social annotations obtained through Twitter Lists
  - Approach essentially relies on crowdsourcing

Twitter Lists
- Primarily an organizational feature
- Used to organize the people one is following
  - Create a named List, add an optional List description
  - Add related users to the List
  - Tweets posted by these users will be grouped together as a separate stream

How Lists work ?

Using Lists to infer topics for users
- If U is an expert / authority in a certain topic:
  - U is likely to be included in several Lists
  - List names / descriptions provide valuable semantic cues to the topics of expertise of U

Inferring topical attributes of users

Dataset
- Collected Lists of 55 million Twitter users who joined before or in 2009
  - 88 million Lists collected in total
- All studies consider 1.3 million users who are included in 10 or more Lists
- Most List names / descriptions are in English, but a significant fraction are also in French, Portuguese, …

Mining Lists to infer expertise
- Collect Lists containing a given user U
- List names / descriptions collected into a ‘topic document’ for the given user
- Identify U’s topics from the document
  - Ignore domain-specific stopwords
  - Identify nouns and adjectives
  - Unify similar words based on edit-distance, e.g., journalists and jornalistas, politicians and politicos (not unified by stemming)
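The edit-distance unification step could look roughly like the sketch below. It is only an illustration: the function names, the greedy "first close match wins" strategy, and the 0.3 length-relative threshold are assumptions, not details given in the slides.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the classic dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def unify(words, max_ratio=0.3):
    """Map each word to the first earlier word within the distance threshold.

    max_ratio is a hypothetical threshold: two words are unified when their
    edit distance is at most max_ratio times the longer word's length.
    """
    canonical = []
    mapping = {}
    for w in words:
        for c in canonical:
            if edit_distance(w, c) <= max_ratio * max(len(w), len(c)):
                mapping[w] = c
                break
        else:
            # No close canonical form found: w becomes its own canonical form.
            canonical.append(w)
            mapping[w] = w
    return mapping
```

With these assumed settings, the two word pairs from the slide are unified, while unrelated words keep their own form.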

Mining Lists to infer expertise
- Unigrams and bigrams considered as topics
- Extracted from the topic document of U:
  - Topics for user U
  - Frequencies of the topics in the document
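A minimal sketch of extracting unigram and bigram topics with their frequencies from a user's topic document. The stopword set and the tokenizer are placeholders chosen for illustration, and the part-of-speech filtering (nouns and adjectives) mentioned above is left out for brevity.

```python
import re
from collections import Counter

# Hypothetical domain-specific stopwords; the actual list is not given in the slides.
STOPWORDS = {"twitter", "list", "lists", "my", "follow", "people",
             "the", "and", "of", "i"}


def topic_frequencies(list_texts):
    """Count unigram and bigram topics over the names / descriptions of the
    Lists that contain a user (the user's 'topic document')."""
    counts = Counter()
    for text in list_texts:
        tokens = re.findall(r"[a-z0-9]+", text.lower())
        # Unigrams: every non-stopword token.
        counts.update(t for t in tokens if t not in STOPWORDS)
        # Bigrams: adjacent token pairs in which neither token is a stopword.
        counts.update(
            f"{a} {b}"
            for a, b in zip(tokens, tokens[1:])
            if a not in STOPWORDS and b not in STOPWORDS
        )
    return counts
```

For example, `topic_frequencies(["Linux people", "open source software"]).most_common(5)` would list the most frequent candidate topics for that user.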

Topics inferred from Lists (four example users)
- linux, tech, open, software, libre, gnu, computer, developer, ubuntu, unix
- politics, senator, congress, government, republicans, Iowa, gop, conservative
- politics, senate, government, congress, democrats, Missouri, progressive, women
- celebs, actors, famous, movies, comedy, funny, music, hollywood, pop culture

Lists vs. other features
- Profile bio
- Most common words from tweets: love, daily, people, time, GUI, movie, video, life, happy, game, cool
- Most common words from Lists: celeb, actor, famous, movie, stars, comedy, music, Hollywood, pop culture

Lists vs. other features
- Profile bio
- Most common words from tweets: Fallon, happy, love, fun, video, song, game, hope, #fjoln, #fallonmono
- Most common words from Lists: celeb, funny, humor, music, movies, laugh, comics, television, entertainers

Evaluation of inferred topics – 1
- Evaluated through a user survey
  - Evaluator shown the top 30 topics for a chosen user
  - Are the inferred attributes (i) accurate, (ii) informative?
  - Binary response for both queries
- More than 93% of evaluators judged the topics to be both accurate and informative
  - The few negative judgments were a result of subjectivity

Evaluation of inferred topics – 2
- Comparison with topics identified by Twitter WTF
  - Obtained top 20 WTF results for about 200 queries
  - 3495 distinct users
- Topics inferred by us from Lists include the query-topic for 2916 users (83.4%)
- For the rest:
  - Case 1 – inferred topics include semantically very similar words, but not the exact query-word (18%)
  - Case 2 – wrong results by WTF, unrelated to the query (58%)

Comparison with Twitter WTF
- Case 1
  - Restaurant dineLA for query “dining”
    - Inferred topics – food, restaurant, recipes, los angeles
  - Space explorer HubbleHugger77 for query “hubble”
    - Inferred topics – science, tech, space, cosmology, nasa
- Case 2
  - Comedian jimmyfallon for query “astrophysicist”
    - Inferred topics – celebs, comedy, humor, actor
  - Web developer ScreenOrigami for query “origami”
    - Inferred topics – webdesign, html, designers

Who-is-who service
- Developed a Who-is-Who service for Twitter
- Shows a word-cloud of the major topics for a user
- sws.org/who-is-who/
- Inferring Who-is-who in the Twitter Social Network, WOSN 2012 (highest rated paper in the workshop)

Identifying topical experts

Topical experts in Twitter
- 400 million tweets posted daily
- Quality of tweets posted by different users varies widely
  - News, pointless babble, conversational tweets, spam, …
- Challenge: to find topical experts
  - Sources of authoritative information on specific topics

Basic methodology
- Given a query (topic)
- Identify experts on the topic using Lists
  - Discussed earlier
- Rank identified experts w.r.t. expertise on the given topic
  - Need a suitable ranking algorithm
  - Commonly used ranking metrics such as number of followers and PageRank do not consider topic

Ranking experts
- Two components of ranking user U w.r.t. query Q: relevance of U to Q, popularity of U
- Relevance of user to query
  - Cover density ranking between topic document T_U of user U and Q
  - Cover density ranking preferred for short queries
- Popularity of user: number of Lists including the user
- Ranking score: Topic relevance(T_U, Q) × log(#Lists including U)
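The way the two components combine can be sketched as follows. Note that the relevance function here is a deliberately simple stand-in (the share of the topic document taken up by the query terms), not cover density ranking; the function names and the data layout are likewise assumptions made for illustration.

```python
import math


def relevance(topic_counts, query):
    """Stand-in relevance, NOT cover density ranking: the fraction of the
    topic document made up of query terms, or 0 if any query term is absent."""
    terms = query.lower().split()
    total = sum(topic_counts.values())
    if total == 0 or any(topic_counts.get(t, 0) == 0 for t in terms):
        return 0.0
    return sum(topic_counts[t] for t in terms) / total


def expert_score(topic_counts, num_lists, query):
    """Relevance times log(#Lists including the user); assumes num_lists >= 1."""
    return relevance(topic_counts, query) * math.log(num_lists)


def rank_experts(users, query):
    """users maps a name to {'topics': term -> frequency, 'num_lists': int}."""
    scored = [
        (expert_score(u["topics"], u["num_lists"], query), name)
        for name, u in users.items()
    ]
    # Highest score first; users with zero relevance are dropped.
    return [name for score, name in sorted(scored, reverse=True) if score > 0]
```

The logarithm dampens popularity, so a user who appears in ten times as many Lists does not automatically outrank a user whose topic document matches the query more closely.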

Cognos
- Search system for topical experts in Twitter
- Publicly deployed
- Cognos: Crowdsourcing Search for Topic Experts in Microblogs, ACM International SIGIR Conference 2012

Cognos results for “politics”

Cognos results for “stem cell”

Cognos results for “earthquake”

Evaluation of Cognos
- System evaluated ‘in-the-wild’
- People were asked to try the system and give feedback
- Evaluators were students & researchers from the home institutes of the researchers
- Advantage – a wide variety of queries tried
- Disadvantage – subjectivity in relevance judgement

User-evaluation of Cognos

Sample queries for evaluation

Evaluation results
- Overall 2136 relevance judgments over 55 queries
  - 1680 said relevant (78.7%)
- Large amount of subjectivity in evaluations
  - Same result for the same query received both relevant and non-relevant judgments
  - E.g., for query “cloud computing”, Werner Vogels got 4 relevant judgments and 6 non-relevant judgments

Cognos vs Twitter Who-to-follow
- Evaluator shown top 10 results by both systems
  - Result-sets anonymized
  - Evaluator judges which is better / both good / both bad
- Queries chosen by evaluators themselves
  - 27 distinct queries were asked at least twice
  - In total, asked 93 times
- Judgment by majority voting
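Majority voting over the per-query judgments could be implemented along these lines. The vote labels and the rule that equal top counts produce a tie are assumptions; the slides do not say how ties were decided.

```python
from collections import Counter


def majority_verdict(votes):
    """Return the most frequent judgment for one query.

    Returns 'tie' when the two most frequent judgments have equal counts
    (an assumed rule) and None when there are no votes.
    """
    if not votes:
        return None
    counts = Counter(votes).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return "tie"
    return counts[0][0]
```

Applied per query, this reduces several evaluators' judgments to a single verdict such as "cognos", "wtf", or "tie" (hypothetical labels).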

Cognos vs Twitter WTF
- Cognos judged better on 12 queries
  - Computer science, Linux, mac, Apple, ipad, India, internet, windows phone, photography, political journalist
- Twitter WTF judged better on 11 queries
  - Music, Sachin Tendulkar, Angelina Jolie, Harry Potter, metallica, cloud computing, IIT Kharagpur
  - Mostly names of individuals or organizations
- Tie on 4 queries
  - Microsoft, Dell, Kolkata, Sanskrit as an official language

Topical content search

Challenges in topical content search
- Services today are limited to keyword search
  - A search for ‘politics’ returns only tweets which contain the word ‘politics’
  - Knowing which keywords to search for is itself an issue
- Individual tweets are too small to deduce topics
- Scalability: 400M tweets posted per day
- Tweets may contain spam / rumors / phishing URLs

Our approach
- Look at tweets posted by a selected set of topical experts
- Inferring the topic of tweets from the tweeters’ expertise
  - Large fraction of tweets posted by experts are only about day-to-day conversation
- Solution: if multiple experts on a topic tweet about something, it is most likely related to the topic

Sampling Tweets from Experts
- We capture all tweets from 585K topical experts
  - Identified through Lists
  - Expertise in a wide variety of topics
- The experts generate 1.46 million tweets per day
  - 0.268% of all tweets on Twitter, hence scalable
- Trustworthiness
  - Experts not likely to post spam / phishing URLs
  - Less chance of rumors in what is posted by several experts

Methodology at a Glance
- Gather tweets from experts on the given topic
- Group tweets on the same news-story
  - We use a group of hashtags to represent a news-story
- Multi-level clustering (cluster: news-story)
  - Cluster tweets based on the hashtags they contain
  - Cluster hashtags based on co-occurrence
- Rank news-stories by popularity
  - Number of distinct experts tweeting on the story
  - Number of tweets on the story
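The two clustering levels and the popularity ranking can be sketched as below. This is a simplified illustration under stated assumptions: hashtags are merged with union-find when they co-occur in at least `min_cooccurrence` tweets (a hypothetical threshold), and stories are ordered by distinct experts first and tweet count second, since the slides name both signals but not how they are combined.

```python
import re
from collections import defaultdict
from itertools import combinations


def find(parent, x):
    """Union-find lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def rank_stories(tweets, min_cooccurrence=2):
    """tweets: iterable of (expert, text) pairs for one topic.

    Returns a list of stories, most popular first; each story is a dict
    with a set of 'hashtags' and a set of (expert, text) 'tweets'.
    """
    by_tag = defaultdict(list)      # level 1: tweets grouped by hashtag
    pair_counts = defaultdict(int)  # how often two hashtags co-occur
    for expert, text in tweets:
        tags = sorted({t.lower() for t in re.findall(r"#(\w+)", text)})
        for tag in tags:
            by_tag[tag].append((expert, text))
        for a, b in combinations(tags, 2):
            pair_counts[(a, b)] += 1

    # Level 2: merge hashtags that co-occur often enough.
    parent = {tag: tag for tag in by_tag}
    for (a, b), n in pair_counts.items():
        if n >= min_cooccurrence:
            parent[find(parent, a)] = find(parent, b)

    stories = defaultdict(lambda: {"hashtags": set(), "tweets": set()})
    for tag, items in by_tag.items():
        story = stories[find(parent, tag)]
        story["hashtags"].add(tag)
        story["tweets"].update(items)

    # Popularity: distinct experts first, then number of tweets.
    return sorted(
        stories.values(),
        key=lambda s: (len({e for e, _ in s["tweets"]}), len(s["tweets"])),
        reverse=True,
    )
```

Tweets without any hashtag are ignored in this sketch, because a story is represented purely by its group of hashtags.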

Results for the last week on Politics (a popular topic)

- Related tweets are grouped together by common hashtags
- The most popular tweet in the story is shown
- Hashtags which co-occur frequently are grouped together

Our system particularly excels at niche topics.

Evaluation – Relevance
- Evaluated using human feedback
  - Used Amazon Mechanical Turk for user evaluation
  - Evaluated top 10 clusters for 20 topics
- Users have to judge whether the tweet shown is relevant to the given topic
  - Options are Relevant / Not Relevant / Can’t Say

Evaluating Tweet Relevance
- We obtained 3150 judgments
  - 80% of tweets marked relevant by majority judgment
- Non-relevant results primarily due to:
  - Global events that were discussed by experts across all topics, e.g., Hurricane Sandy in the USA
  - Sometimes, the topic is too specific and several experts tweet on a broader topic (e.g., baseball and ESPN Sports Update)

Effect of global events
- Experts on all topics tweeting on #sandy
- Most of these got negative judgments

Diversity of topics in Twitter

Topics in Twitter
- Discovering thousands of experts on diverse topics allows characterizing the Twitter platform as a whole
- On what topics is expert content available in Twitter?
- Popular view – few topics such as politics, sports, music, celebs, …
- We find – lots of niche topics along with the popular ones

Topics in Twitter – major topics to niche ones
- What Twitter is mostly known for
- Wide variety of niche topics

Thank You Contact: