TWinner: Understanding News Queries with Geo-content using Twitter. Satyen Abrol, Latifur Khan, University of Texas at Dallas, Department of Computer Science.

Presentation transcript:

TWinner: Understanding News Queries with Geo-content using Twitter
Satyen Abrol, Latifur Khan
University of Texas at Dallas, Department of Computer Science
GIR ’10
29 April, 2011
Sengyu Rim

Outline
• Introduction
• Related Work
• Twitter as News-wire
• Determining News Intent
• Assigning Weights to Tweets
• Experiments and Results
• Conclusion

Introduction
• Motivations
  – Users find news through search engines
  – The results returned by common search engines differ from what the user expects
    • Non-critical information
    • Unorganized content
  – Search engines need to understand the intent of the user query

Introduction
• Motivation
  – E.g., what event in Korea attracted the most attention in 2002?
  – A naive user searching the news with the keyword “korea” sees results of several kinds:
    • Map: Korea
    • Wiki: Korea
    • News: Korea vs. Italy 2:1
    • Food: Kimchi

Introduction
• Analyze the content of a popular social networking site, Twitter, to determine the intent of the user query
  – Twitter provides popular news topics
  – Twitter provides keywords that may enhance the user query
• TWinner makes two novel contributions to the field of geographic information retrieval
  – Identifying the intent of the user query
  – Adding additional keywords to the query

Introduction
• The architecture of the news intent system TWinner


Related Work
• Identifying and disambiguating the locations of users
  – Natural Language Processing
  – Data Mining
• Establishing the relationship between the location of the news and the news content
  – A model using NLP techniques


Twitter as News-wire
• Twitter
  – Free social networking
  – Micro-blogging service
  – Medium for news updates


Determining News Intent
• Identification of Location
  – Geo-tags the query to a location with a certain confidence
• Frequency-Population Ratio (FPR)
  – FPR remains constant in the absence of a news-making event, irrespective of the location
  – Used to assign a news intent confidence to the query
  – FPR = (α + β) * Nt
    • α: the population density factor
    • β: the location type constant
    • Nt: the number of tweets per minute at that instant
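The FPR formula on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the sample values for α, β and Nt, and the idea of flagging a query when FPR moves away from a quiet-period baseline by more than a relative tolerance are all hypothetical. The slide only states that FPR stays constant when there is no news-making event.

```python
def fpr(alpha: float, beta: float, tweets_per_minute: float) -> float:
    """Frequency-Population Ratio as written on the slide: (alpha + beta) * Nt."""
    return (alpha + beta) * tweets_per_minute


def deviates_from_baseline(current: float, baseline: float,
                           tolerance: float = 0.5) -> bool:
    """Hypothetical check (not from the slides): report whether the current
    FPR differs from a quiet-period baseline by more than `tolerance`,
    measured relative to the baseline."""
    if baseline <= 0:
        raise ValueError("baseline FPR must be positive")
    return abs(current - baseline) / baseline > tolerance


# Toy values, chosen only for illustration.
quiet = fpr(alpha=0.25, beta=0.25, tweets_per_minute=10)
busy = fpr(alpha=0.25, beta=0.25, tweets_per_minute=40)
print(quiet, busy, deviates_from_baseline(busy, quiet))
```

With α and β fixed for a location, FPR scales linearly with the tweet rate, so a burst of tweets about a place shows up directly as a jump in FPR.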

Determining News Intent
• Experiments on determining the effect of geo-type and population density

Determining News Intent
• The drawback of FPR
  – Fails to take into account the geographical relatedness of features
• Modified FPR
  – FPR = Σi δi (αi + βi) * Nt
    • δi: the factor by which geo-location i is related to the primary search query
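A minimal Python sketch of the modified FPR follows. The function name, the tuple layout and the sample numbers are assumptions made for illustration. Because Nt carries no index on the slide, it is treated here as a single tweet rate shared by all related locations.

```python
from typing import Iterable, Tuple


def modified_fpr(locations: Iterable[Tuple[float, float, float]],
                 tweets_per_minute: float) -> float:
    """Modified FPR as written on the slide: sum_i delta_i * (alpha_i + beta_i) * Nt.

    Each entry of `locations` is (delta_i, alpha_i, beta_i), where delta_i is
    the relatedness of geo-location i to the primary search query.
    """
    weighted = sum(delta * (alpha + beta) for delta, alpha, beta in locations)
    return weighted * tweets_per_minute


# Toy input: the primary location (delta = 1.0) and one related location.
related = [(1.0, 0.25, 0.25), (0.5, 0.5, 0.5)]
print(modified_fpr(related, tweets_per_minute=10))
```

With a single location and δ = 1, this reduces to the original formula (α + β) * Nt.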


Assigning Weights to Tweets
• Detecting Spam Messages
  – Spam messages carry little or no relevant information
  – Nature of spam messages
  – A formula tags, to a certain level of confidence, whether a message is spam, using:
    • Np: the number of followers
    • Nq: the number of people the user is following
    • μ: an arbitrary constant
    • Nr: the ratio of the number of tweets containing a reply to the total number of tweets

Assigning Weights to Tweets
• On the basis of user location
  – An experiment was conducted to understand the relation between Twitter messages and the location of the user

Assigning Weights to Tweets
• Using Hyperlinks Mentioned in Tweets
  – 30-50% of general Twitter messages contain a hyperlink to an external website
  – For news Twitter messages, this percentage increases to 70-80%
  – We also make use of this pointer to assign weights to tweets

Assigning Weights to Tweets
• Semantic Similarity
  – Summarize the Twitter messages into a couple of keywords
  – The naïve approach picks k keywords, ignoring semantic similarity
  – The semantic similarity is defined using:
    • M: the total number of articles searched in the New York Times corpus
    • f(x): the number of articles for term x
    • f(y): the number of articles for term y

Assigning Weights to Tweets
• Reassigns the weight of all keywords on the basis of the following formula
  – Wi* = Wi + Σj Sij * Wj
    • Wi*: the new weight of keyword i
    • Wi: the weight without semantic similarity
    • Sij: the semantic similarity derived from the semantic similarity formula
    • Wj: the initial weight of the other words being considered
• Identifies k keywords that are semantically dissimilar but together contribute maximum weight
  – Spq < Sthreshold: the similarity between any two words p and q belonging to the set of k is less than a threshold
  – W1 + W2 + W3 + … + Wk is maximum over all groups satisfying the above condition
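The two steps on this slide can be sketched in Python as follows. Everything here is illustrative: the function names, the dictionary-based similarity table and the toy keywords are assumptions, and the brute-force search over all k-subsets is a stand-in for whatever selection procedure the authors used (it is exponential in the number of candidates, so it suits only small keyword sets). The slide does not say whether the maximized sum uses the original or the reassigned weights, so the example passes the reassigned ones as one possible reading.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Optional, Tuple

Similarity = Dict[FrozenSet[str], float]


def pair_sim(sim: Similarity, a: str, b: str) -> float:
    """Symmetric lookup; pairs missing from the table count as similarity 0."""
    return sim.get(frozenset((a, b)), 0.0)


def reweight(weights: Dict[str, float], sim: Similarity) -> Dict[str, float]:
    """W_i* = W_i + sum over j != i of S_ij * W_j, using the initial weights W_j."""
    return {
        i: w_i + sum(pair_sim(sim, i, j) * w_j
                     for j, w_j in weights.items() if j != i)
        for i, w_i in weights.items()
    }


def select_keywords(weights: Dict[str, float], sim: Similarity, k: int,
                    threshold: float) -> Optional[Tuple[str, ...]]:
    """Among all groups of k keywords whose pairwise similarity is below
    `threshold`, return the one with the largest total weight (None if no
    group qualifies)."""
    best, best_total = None, float("-inf")
    for group in combinations(sorted(weights), k):
        if all(pair_sim(sim, p, q) < threshold
               for p, q in combinations(group, 2)):
            total = sum(weights[w] for w in group)
            if total > best_total:
                best, best_total = group, total
    return best


# Toy data: "a" and "b" are near-synonyms, "c" is unrelated to both.
initial = {"a": 4.0, "b": 3.0, "c": 2.0}
similarity = {frozenset(("a", "b")): 0.5}
new_weights = reweight(initial, similarity)
print(new_weights)
print(select_keywords(new_weights, similarity, k=2, threshold=0.25))
```

In the toy data the two heaviest keywords are too similar to be chosen together, so the selection pairs the heaviest keyword with the unrelated one instead.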


Experiments and Results
• Experiments to test the validity of the hypothesis
  – First: a naïve user is looking for the latest happenings related to the Fort Hood incident on 12th November 2009
  – Second: a naïve user is looking for the latest happenings related to ‘Russia’ on 5th December 2009
  – Third: a naïve user is looking for the latest happenings related to ‘Haiti’ on 18th January

Experiments and Results
• Results

Experiments and Results
• Results show the contrast between the search results produced by the original query and those produced after adding the keywords obtained by TWinner


Conclusion
• We present a system to predict a user’s news intent
  – Takes the location mentioned and the time of the query into consideration
  – Makes use of the social networking site Twitter to understand the relationship between geo-information and the news intent of the query
• Future work
  – Understanding the content of the social media message
  – Sentiment conveyed by the messages
  – Enhancing the accuracy of the system

Thank you!