Summary Presented by : Aishwarya Deep Shukla


You Are Where You Tweet: A Content-Based Approach to Geo-Locating Twitter Users

Problem Definition
- Only 26% of Twitter users list their location (city); the rest do not.
- Just 0.42% used geo-tagging as of 2009 (the timeframe of this study).
- Can we accurately guess a user's location from the tweet text alone?

Introduction
- Twitter users have been slow to adopt per-tweet geo-tagging.
- Knowing a user's geo-location is important for understanding trends.
- Sparse location data limits geo-location applications.
- Can we predict a user's location based purely on the content of the user's tweets?
- Key intuition: specific keywords are more likely to be associated with a particular city or location.

Challenges in Locating the User with Just Tweet Content
- Twitter status updates are noisy.
- Twitter users rely on shorthand and SMS language, which is inconsistent and hard to text-mine.
- Even if location-sensitive terms are isolated from a user's tweets, they can be misleading, because the user may be interested in locations other than their present one.
- A user may be associated with more than one location, e.g. travellers.

Related Work
- Probabilistic language models based on Flickr photo tags (Serdyukov et al.)
- Probabilistic model combining text and visual features (Crandall et al.)
Applications:
- Detecting earthquakes with real-time Twitter data, where each user is treated as a sensor
- Detecting the origin location of news and its diffusion pattern

Data Collection
- Crawl the Twitter public timeline API (random sampling)
- Breadth-first search: crawl social edges
Data:
- 29 million status updates from over 1 million users
- 72% of the profiles have no location information listed
- 7% have bad location information (e.g. "Wonderland")
- 21% have a city name listed in their profile
- The data distribution is representative of the population distribution

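Evaluation Setup
Location estimation problem:
- $S_{tweets}(u)$: the set of tweets by user $u$
- Estimate the probability of user $u$ being in city $i$, $p(i \mid S_{tweets}(u))$, such that the estimated location $l_{est}(u)$ is as close as possible to the user's actual location $l_{act}(u)$
Test data:
- Extract the tweets of all users who have their actual location listed (coordinates) and use them to check the algorithm's accuracy
Metrics:
- Error distance: $ErrDist(u) = d(l_{act}(u), l_{est}(u))$
- Average error distance: $AvgErrDist(U) = \frac{\sum_{u \in U} ErrDist(u)}{|U|}$
- Accuracy: $Accuracy(U) = \frac{|\{u \mid u \in U,\ ErrDist(u) \le 100\}|}{|U|}$, with distances in miles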
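The evaluation metrics can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes locations are (latitude, longitude) pairs, uses the haversine great-circle formula for the distance d, and takes Accuracy to be the fraction of users whose error distance is at most 100 miles (the threshold quoted with the baseline result). All function names are hypothetical.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius, used by the haversine formula


def haversine_miles(a, b):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))


def err_dist(actual, estimated):
    """ErrDist(u): distance between the actual and the estimated location."""
    return haversine_miles(actual, estimated)


def avg_err_dist(pairs):
    """AvgErrDist(U): mean error distance over (actual, estimated) pairs."""
    return sum(err_dist(a, e) for a, e in pairs) / len(pairs)


def accuracy(pairs, threshold_miles=100.0):
    """Accuracy(U): share of users placed within threshold_miles (assumed 100)."""
    hits = sum(1 for a, e in pairs if err_dist(a, e) <= threshold_miles)
    return hits / len(pairs)
```

For example, a user located in Houston but estimated in Dallas counts as a miss under the 100-mile threshold, while still contributing their full error distance to the average.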

Location Determination Algorithm
Step 1: Baseline
- Select a training dataset
- Associate each real location with the frequency of words used at that location
- Run it on the test data to predict location
- Accuracy: 10.12%
Step 2: Identifying local words
- Determine words with high spatial focus
- These words are typically very specific to a place
- Accuracy: 49.8%
Step 3: Optimization
- Smoothing
- Accuracy: 51%

Estimation Algorithm: Baseline Probability
Baseline location estimation:
- Training data of 130,689 users
- Plot their tweets
- Calculate the estimated probability with the formula $p(i \mid S_{words}(u)) = \sum_{w \in S_{words}(u)} p(i \mid w)\, p(w)$
- Test result: only 10.12% of the 5119 test users are within 100 miles of the location estimated this way
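A minimal sketch of this baseline, assuming tweets are already tokenized and that p(i|w) and p(w) are plain maximum-likelihood estimates from word counts (the slides do not say how they are estimated). The function names and the input layout are hypothetical.

```python
from collections import Counter, defaultdict


def train(tweets_by_city):
    """Count word occurrences per city.

    tweets_by_city maps a city name to a list of tokenized tweets.
    Returns (word -> Counter of cities, word -> total count).
    """
    word_city = defaultdict(Counter)
    word_total = Counter()
    for city, tweets in tweets_by_city.items():
        for tokens in tweets:
            for w in tokens:
                word_city[w][city] += 1
                word_total[w] += 1
    return word_city, word_total


def estimate_city(tokens, word_city, word_total):
    """Return the city maximizing sum over w of p(i|w) * p(w), or None if no word is known."""
    n = sum(word_total.values())
    scores = Counter()
    for w in set(tokens):  # S_words(u): the distinct words used by the user
        if w not in word_total:
            continue  # unseen words carry no location evidence
        p_w = word_total[w] / n
        for city, count in word_city[w].items():
            p_city_given_w = count / word_total[w]
            scores[city] += p_city_given_w * p_w
    return scores.most_common(1)[0][0] if scores else None
```

Because every word contributes, frequent but location-neutral words dominate the score, which is consistent with the low baseline accuracy reported here.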

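Optimization: Identify Local Words in Tweets
- Local words have a more compact geographic scope than other words
- Determining spatial focus: model the probability of a word at distance $d$ from its center as $C d^{-\alpha}$, where $C$ is the focus and $\alpha$ is the dispersion
- Determine the focus and dispersion of each word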
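To make the focus and dispersion parameters concrete, the sketch below fits C and α for one word under the model p(d) = C·d^(-α), where d is the distance from the word's center. It assumes the center is already known and uses ordinary least squares in log-log space purely for illustration; this is a swapped-in fitting method, not necessarily the procedure used in the paper. Function names are hypothetical.

```python
import math


def word_prob(focus, dispersion, distance):
    """Model probability of seeing the word at the given distance from its center."""
    return focus * distance ** (-dispersion)


def fit_focus_dispersion(distances, probs):
    """Fit (C, alpha) so that probs ~ C * distances**(-alpha).

    Taking logs gives log p = log C - alpha * log d, a straight line,
    so a simple linear regression recovers both parameters.
    Distances and probabilities must all be strictly positive.
    """
    xs = [math.log(d) for d in distances]
    ys = [math.log(p) for p in probs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope
```

A word with a high focus and a high dispersion is used heavily near its center and drops off quickly with distance, which is the profile expected of a local word.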

Tweet Sparsity
- Large number of "tiny" word distributions: words issued sparingly and from only a few cities
Smoothing approaches:
- State level: aggregate the probability of a word by state
- Lattice based: aggregate by 1 x 1 square degrees
- Model based: spatially focused word model
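The lattice-based idea can be sketched as follows: cities falling in the same 1 x 1 degree cell share probability mass, so a word seen in only one city still gives some evidence for its neighbours. The linear interpolation and its `weight` parameter are assumptions made for illustration, and the smoothed values are not renormalized; the slides do not specify how the cell aggregate is combined with the city-level estimate.

```python
import math
from collections import defaultdict


def lattice_cell(lat, lon):
    """Identify the 1 x 1 degree cell containing a coordinate."""
    return (math.floor(lat), math.floor(lon))


def lattice_smooth(city_probs, city_coords, weight=0.5):
    """Blend each city's p(city | word) with the total for its lattice cell.

    city_probs:  city -> p(city | word) for one word (missing cities count as 0)
    city_coords: city -> (lat, lon) for every candidate city
    weight:      hypothetical interpolation weight given to the cell aggregate
    """
    cell_total = defaultdict(float)
    for city, p in city_probs.items():
        cell_total[lattice_cell(*city_coords[city])] += p

    smoothed = {}
    for city, coords in city_coords.items():
        own = city_probs.get(city, 0.0)
        smoothed[city] = (1 - weight) * own + weight * cell_total[lattice_cell(*coords)]
    return smoothed
```

State-level smoothing follows the same pattern with the cell lookup replaced by a city-to-state mapping.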

Experimental Results
Goals of the test experiments:
- Does classification on spatial distribution help? Yes.
- How much do the different smoothing techniques help?
- What is the impact of the amount of information available about a particular user (number of tweets)?

Estimation Quality: Number of Tweets

Comments
- Novel approach to locating users based on their tweeted text.
- The share of users with geo information has steadily increased to 45% now, more than double what it was when this paper was authored.
- An algorithm that locates users from their tweets in this way could also work for other social media networks and even blogs.
- Isn't there a self-selection bias in who chooses to share location …
- Is it okay to assume overall algorithm accuracy based on the test results?
Suggestions
- Why not use hashtags too and develop a geo-locator based on hashtags?