Topical search in Twitter Complex Network Research Group Department of CSE, IIT Kharagpur.


Topical search on Twitter Twitter has emerged as an important source of information & real-time news  Most common searches on Twitter: trending topics and breaking news Topical search:  Identifying topical attributes / expertise of users  Searching for topical experts  Searching for information on specific topics

Prior approaches to find topic experts  Research studies  Pal et al. (WSDM 2011) use 15 features from tweets and the social network to identify topical experts  Weng et al. (WSDM 2010) use an ML approach  Application systems  Twitter Who To Follow (WTF), Wefollow, …  Methodology not fully public, but reported to utilize several features

Prior approaches use features extracted from  User profiles  Screen-name, bio, …  Tweets posted by a user  Hashtags, others retweeting a given user, …  Social graph of a user  #followers, PageRank, …

Problems with prior approaches  User profiles – screen-name, bio, …  Bio often does not give meaningful information  Information in user profiles is mostly unvetted  Tweets posted by a user  Tweets mostly contain day-to-day conversation  Social graph of a user – #followers, PageRank  Does not provide topical information

We propose … Use a different way to infer topics of expertise for an individual Twitter user Utilize social annotations  How does the Twitter crowd describe a user?  Social annotations obtained through Twitter Lists  Approach essentially relies on crowdsourcing

Twitter Lists A feature used to organize the people one is following on Twitter  Create a named list, add an optional List description  Add related users to the List  Tweets posted by these users will be grouped together as a separate stream

How do Lists work?

Using Lists to infer topics for users If U is an expert / authority in a certain topic  U likely to be included in several Lists  List names / descriptions provide valuable semantic cues to the topics of expertise of U

Dataset Collected Lists of 55 million Twitter users who joined before or in 2009  88 million Lists collected in total All studies consider 1.3 million users who are included in 10 or more Lists Most List names / descriptions are in English, but a significant fraction are also in French, Portuguese, …

Inferring topical attributes of users

Mining Lists to infer expertise Collect Lists containing a given user U List names / descriptions collected into a ‘document’ for the given user Identify U’s topics from the document  Handle CamelCase words, case-folding  Ignore domain-specific stopwords  Identify nouns and adjectives  Unify similar words based on edit-distance, e.g., journalists and jornalistas, politicians and politicos (not unified by stemming)

Mining Lists to infer expertise Unigrams and bigrams considered as topics Result: Topics for U along with their frequencies in the document
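The List-mining pipeline above (CamelCase splitting, case-folding, stopword removal, edit-distance unification) can be sketched roughly as follows. The stopword set and the edit-distance threshold of 2 are illustrative assumptions, not the parameters actually used in the study:

```python
import re
from collections import Counter

# Illustrative domain-specific stopwords (assumed, not the study's list)
STOPWORDS = {"twitter", "list", "lists", "people", "my", "the", "of"}

def tokenize(name):
    """Split a List name on delimiters and CamelCase, then case-fold."""
    words = []
    for part in re.split(r"[\s_\-/]+", name):
        words += re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", part)
    return [w.lower() for w in words if w.lower() not in STOPWORDS]

def edit_distance(a, b):
    """Standard Levenshtein distance via a rolling DP array."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1,
                                     prev + (ca != cb))
    return dp[-1]

def infer_topics(list_names):
    """Topic frequencies for a user from the Lists containing them."""
    counts = Counter(w for name in list_names for w in tokenize(name))
    # Merge near-duplicate spellings (e.g. 'journalists' / 'jornalistas',
    # which are 2 edits apart) into the more frequent variant.
    merged = Counter()
    for word, n in counts.most_common():
        target = next((t for t in merged if edit_distance(word, t) <= 2),
                      word)
        merged[target] += n
    return merged
```

A small edit-distance threshold like this can over-merge very short words, so a production system would likely gate it on word length as well.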

Topics inferred from Lists  linux, tech, open, software, libre, gnu, computer, developer, ubuntu, unix  politics, senator, congress, government, republicans, Iowa, gop, conservative  politics, senate, government, congress, democrats, Missouri, progressive, women  celebs, actors, famous, movies, comedy, funny, music, hollywood, pop culture

Lists vs. other features  Most common words from tweets: love, daily, people, time, GUI, movie, video, life, happy, game, cool  Most common words from Lists: celeb, actor, famous, movie, stars, comedy, music, Hollywood, pop culture  Profile bio (shown in slide)

Lists vs. other features  Most common words from tweets: Fallon, happy, love, fun, video, song, game, hope, #fjoln, #fallonmono  Most common words from Lists: celeb, funny, humor, music, movies, laugh, comics, television, entertainers  Profile bio (shown in slide)

Who-is-who service Developed a Who-is-Who service for Twitter Shows a word-cloud of the major topics for a user sws.org/who-is-who/ Inferring Who-is-who in the Twitter Social Network, WOSN 2012 (highest-rated paper in the workshop)

Identifying topical experts

Topical experts in Twitter 400 million tweets posted daily The quality of tweets posted by different users varies widely  News, pointless babble, conversational tweets, spam, … Challenge: to find topical experts  Sources of authoritative information on specific topics

Basic methodology Given a query (topic) Identify experts on the topic using Lists  Discussed earlier Rank identified experts w.r.t. the given topic  Need a ranking algorithm Additional challenge: keeping the system up-to-date in the face of thousands of users joining Twitter daily

Ranking experts Used a ranking scheme based solely on Lists Two components of ranking user U w.r.t. query Q  Relevance of user to query – cover density ranking between the topic document T_U of the user and Q  Popularity of user – number of Lists including the user Cover density ranking is preferred for short queries Score = Relevance(T_U, Q) × log(#Lists including U)
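A minimal sketch of the scoring formula on this slide. The `relevance` function below is a toy stand-in for true cover density ranking (which scores the text spans that cover the query terms), so only the overall shape of the score is faithful to the slide:

```python
import math

def relevance(topic_doc, query):
    """Toy stand-in for cover density ranking: zero unless every
    query term appears in the user's topic document, otherwise the
    mean frequency of the query terms."""
    q_terms = query.lower().split()
    covered = [topic_doc.get(t, 0) for t in q_terms]
    if not all(covered):
        return 0.0
    return sum(covered) / len(q_terms)

def expert_score(topic_doc, query, num_lists):
    """The slide's score: Relevance(T_U, Q) x log(#Lists including U)."""
    return relevance(topic_doc, query) * math.log(num_lists)
```

Note how the log factor dampens raw popularity: a user in 10x more Lists gains only a constant additive boost, so topical relevance dominates the ranking.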

Cognos Search system for topical experts in Twitter Publicly deployed at Cognos: Crowdsourcing Search for Topic Experts in Microblogs, ACM SIGIR 2012

Cognos results for “politics”

Cognos results for “stem cell”

Evaluation of Cognos - 1 Competes favorably with prior research attempts to identify topical experts (Pal et al. [WSDM 2011])

Evaluation of Cognos – 2  Cognos compared with Twitter WTF  Evaluator shown top 10 results by both systems  Result-sets anonymized  Evaluator judges which is better / both good / both bad  Queries chosen by evaluators themselves  27 distinct queries were asked at least twice  In total, asked 93 times  Judgment by majority voting
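The majority-vote aggregation described above can be sketched as follows; the judgment labels are hypothetical placeholders for the four options shown to evaluators:

```python
from collections import Counter

def majority_judgment(votes):
    """Aggregate repeated judgments for one query by majority vote.
    Labels (assumed): 'cognos', 'wtf', 'both good', 'both bad'.
    An exact tie between the top two labels counts as a tie."""
    counts = Counter(votes).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return "tie"
    return counts[0][0]
```

This matches the slide's setup: each of the 27 distinct queries was asked at least twice (93 judgments in total), and the per-query winner is whichever system the majority of its evaluators preferred.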

Cognos vs Twitter WTF  Cognos judged better on 12 queries  Computer science, Linux, mac, Apple, ipad, India, internet, windows phone, photography, political journalist  Twitter WTF judged better on 11 queries  Music, Sachin Tendulkar, Anjelina Jolie, Harry Potter, metallica, cloud computing, IIT Kharagpur  Mostly names of individuals or organizations  Tie on 4 queries  Microsoft, Dell, Kolkata, Sanskrit as an official language

Cognos vs Twitter WTF  Low overlap between top 10 results  … in spite of the same topic being inferred for 83% of experts  Major differences are due to List-based ranking  Top Twitter WTF results – mostly business accounts  Top Cognos results – mostly personal accounts

Keeping system up-to-date Any search / recommendation system on an OSN platform needs to be kept up-to-date  Thousands of new users join every day  Need an efficient way of discovering topical experts Can a brute-force approach be used?  Periodically crawl data (profile, Lists) of all users

Scalability problem  200 million new users joined Twitter during 9 months in 2011  740K new users join daily  Lower-bound estimate: 1480K API calls per day required to crawl their profiles and Lists  Twitter allows only 3.6K API calls per day per IP  480K API calls per day from a whitelisted IP  Plus, 465 million users already on Twitter
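The slide's numbers imply the following back-of-the-envelope arithmetic (assuming one profile call plus one Lists call per new user):

```python
# Crawl budget implied by the slide's figures.
new_users_per_day = 740_000
calls_needed = new_users_per_day * 2          # profile + Lists = 1,480,000/day
calls_per_ip = 3_600                          # standard per-IP rate limit
calls_whitelisted = 480_000                   # whitelisted-IP rate limit

ips_needed = -(-calls_needed // calls_per_ip)  # ceiling division
print(ips_needed)
```

That is over 400 ordinary IPs (or several whitelisted ones) just to keep up with new users, before touching the 465 million existing accounts, which is why brute-force re-crawling is infeasible.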

How many experts in Twitter?  Only 1% listed 10 or more times  Only 0.12% listed 100 or more times  If experts can be identified efficiently, possible to crawl their Lists

Identifying experts efficiently  Hubs – users who follow many experts and add them to Lists  Identified top hubs in social network using HITS  Crawled Lists created by top 1 million hubs  Top 1M hubs listed 4.1M users  2.06M users included in 10 or more Lists (50%)  Discovered 65% of the estimated number of experts listed 100 or more times
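A self-contained sketch of the HITS-style hub scoring used above, assuming a simple adjacency-set representation of the "follows" graph; the study ran this at the scale of the full Twitter graph:

```python
def hits(followers, n_iter=50):
    """Minimal HITS power iteration. `followers` maps each user to the
    set of users they follow. Good hubs follow good authorities
    (experts); good authorities are followed by good hubs."""
    nodes = set(followers) | {v for vs in followers.values() for v in vs}
    hub = {u: 1.0 for u in nodes}
    for _ in range(n_iter):
        # Authority update: sum hub scores of users pointing in.
        auth = {u: 0.0 for u in nodes}
        for u, outs in followers.items():
            for v in outs:
                auth[v] += hub[u]
        norm = sum(a * a for a in auth.values()) ** 0.5 or 1.0
        auth = {u: a / norm for u, a in auth.items()}
        # Hub update: sum authority scores of users pointed to.
        hub = {u: sum(auth[v] for v in followers.get(u, ()))
               for u in nodes}
        norm = sum(h * h for h in hub.values()) ** 0.5 or 1.0
        hub = {u: h / norm for u, h in hub.items()}
    return hub, auth
```

Crawling only the Lists created by the top-ranked hubs is what makes the discovery step tractable: the hubs have already done the work of curating experts.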

Identifying experts efficiently  More than 42% of the users listed by top hubs joined Twitter after 2009  Discovered several popular experts who joined within the duration of the crawl  All experts reported by Pal et al. were discovered  Discovered all of Twitter WTF's top 20 results for 50% of the queries, and 15 or more for 80% of the queries

Topical search in Twitter

Looking for Tweets by Topic Services today are limited to keyword search  Knowing which keywords to search for is itself an issue  Keyword search is not context-aware Tweets are too small to deduce topics Topic analysis of 400M tweets/day is a challenge

Challenges Some tweets are more important than others  Millions of tweets are posted on popular topics  Only some are relevant to the context intended Tweets may contain wrong or misleading info  Twitter has a large population of spammers  Twitter is also a potent source of rumors  Some tweets are outright malicious

Our Approach to the Issues Scalability  We only look at tweets from a small subset of users who are experts on different topics Topic deduction  We map user expertise topics to tweets/hashtags, instead of the other way round Trustworthiness  Our source of tweets is a small subset of users  It is practical to vet their expertise and reputation

Advantages of list-based methodology 600K experts on 36K distinct topics

Topical Diversity of Expert Sample CSCW’14

Popular Topics

Niche Topics

Challenges in Our Approach We assign topics to tweets/hashtags Inferring tweet topics from tweeter expertise  Experts can have multiple topics of expertise  Experts do tweet about topics beyond their expertise Solution: if multiple experts on a subject tweet about something, it is most likely related to that topic.
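The "multiple experts" heuristic above can be sketched as: assign a topic to a hashtag only if enough distinct experts on that topic used it. The threshold of 3 experts is an assumed parameter for illustration:

```python
from collections import defaultdict

def hashtag_topics(expert_tweets, expert_topic, min_experts=3):
    """Map hashtags to topics supported by enough distinct experts.
    expert_tweets: list of (expert_id, set_of_hashtags)
    expert_topic:  expert_id -> topic of expertise
    min_experts:   assumed support threshold, not the deployed value."""
    support = defaultdict(set)          # (hashtag, topic) -> expert ids
    for expert, tags in expert_tweets:
        topic = expert_topic[expert]
        for tag in tags:
            support[(tag, topic)].add(expert)
    assigned = defaultdict(set)         # hashtag -> well-supported topics
    for (tag, topic), experts in support.items():
        if len(experts) >= min_experts:
            assigned[tag].add(topic)
    return dict(assigned)
```

Requiring multiple independent experts filters out both off-topic chatter by a single expert and hashtags an expert uses outside their area of expertise.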

Sampling Tweets from Experts We capture all tweets from 585K topical experts  This is the set we obtained from our previous study  This is about 0.1% of the whole Twitter population The experts generate 1.46 million tweets per day  This is 0.268% of all tweets on Twitter Expertise in diverse topics (36K)  Our topics of expertise are crowdsourced  We will have more topics as more users show interest

Methodology at a Glance Given a topic, we gather tweets from experts We use hashtags to represent subjects Clustering tweets by similar hashtags  A cluster represents information on related subjects Ranking clusters by popularity  Number of unique experts tweeting on the subject  Number of unique tweets on the subject Ranking tweets by authority  Tweets from the highest-ranked user are shown first
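The clustering and ranking steps above can be sketched with a union-find over co-occurring hashtags; this is a simplified illustration, not the deployed system's code:

```python
from collections import defaultdict

def cluster_and_rank(tweets):
    """tweets: list of (expert_id, text, set_of_hashtags).
    Tweets sharing a hashtag (transitively, via co-occurrence) form one
    cluster; clusters are ranked by (#unique experts, #tweets)."""
    parent = {}
    def find(x):                         # union-find with path halving
        while parent.setdefault(x, x) != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    def union(a, b):
        parent[find(a)] = find(b)
    # Hashtags co-occurring in one tweet belong to the same cluster.
    for _, _, tags in tweets:
        tags = list(tags)
        for t in tags[1:]:
            union(tags[0], t)
    clusters = defaultdict(list)
    for tweet in tweets:
        _, _, tags = tweet
        if tags:
            clusters[find(next(iter(tags)))].append(tweet)
    def popularity(group):
        experts = {e for e, _, _ in group}
        return (len(experts), len(group))
    return sorted(clusters.values(), key=popularity, reverse=True)
```

Ranking by unique experts first (rather than raw tweet count) keeps a single prolific account from dominating a cluster's position.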

What-is-happening on Twitter twitter-app.mpi-sws.org/what-is-happening/ Topical search in Microblogs with Cognoscenti, Or: The Wisdom of Crowdsourced Experts,

Results for the last week on Politics (a popular topic)

Related tweets are grouped together by common hashtags. The number of experts tweeting on the subject and the number of tweets on it determine the ranking. The most popular tweet from the most authoritative user represents the group.

Our system especially excels for niche topics.

Evaluation – Relevance We used Amazon Mechanical Turk for user evaluation  We chose to evaluate 20 topics  We picked the top 10 tweets and hashtags  We picked results for all 3 time groups Users had to judge whether the tweet/hashtag was relevant to the given topic  Options were Relevant / Not Relevant / Can’t Say We chose Master workers only Every tweet/hashtag was evaluated by at least 4 users

Evaluating Tweet Relevance We obtained 3150 judgments: 76% Relevant, 22% Not Relevant, 2% Can’t Say 80% of the tweets were marked relevant by majority judgment

Dissecting Negative Judgments iPhone was the topic that received the most negative judgments Experts on iPhone were generally tweeting on the broader topic area (Android, tablets, …) The last-week time group had the most positive results  Scarcity of information led to bad ranking

Evaluating Hashtag Relevance Total of 3200 judgments 62.3% were Relevant  Much lower than for tweets (76% were marked relevant) The relevance of hashtags is very context-sensitive

Perspectival relevance The generic hashtag #sandy is very relevant to the topics in the context of the tweet. Such hashtags got negative judgments when shown without the tweets.

Generic Hashtags Some hashtags are generic, but our service brings out their specificity with respect to the topic. These hashtags received negative judgments when shown without the context of the tweet.

Summary Simple core observation: users curate experts Services  who-is-who (WOSN’12, CCR’12)  whom-to-follow (SIGIR’12)  what-is-happening (in submission)  sample-stream (CIKM’13, CSCW’14)

Complex Network Research Group

Thank You Contact: Complex Network Research Group (CNeRG) CSE, IIT Kharagpur, India