Earthquake Shakes Twitter User: Analyzing Tweets for Real-Time Event Detection Takehi Sakaki Makoto Okazaki @ymatsuo.

Slides:



Advertisements
Similar presentations
Applications of one-class classification
Advertisements

Pseudo-Relevance Feedback For Multimedia Retrieval By Rong Yan, Alexander G. and Rong Jin Mwangi S. Kariuki
Supervised Learning Techniques over Twitter Data Kleisarchaki Sofia.
VISIT: Virtual Intelligent System for Informing Tourists Kevin Meehan Intelligent Systems Research Centre Supervisors: Dr. Kevin Curran, Dr. Tom Lunney,
CSC 380 Algorithm Project Presentation Spam Detection Algorithms Kyle McCombs Bridget Kelly.
1 Vertically Integrated Seismic Analysis Stuart Russell Computer Science Division, UC Berkeley Nimar Arora, Erik Sudderth, Nick Hay.
A Novel Scheme for Video Similarity Detection Chu-Hong Hoi, Steven March 5, 2003.
HMM-BASED PATTERN DETECTION. Outline  Markov Process  Hidden Markov Models Elements Basic Problems Evaluation Optimization Training Implementation 2-D.
1 Robust Video Stabilization Based on Particle Filter Tracking of Projected Camera Motion (IEEE 2009) Junlan Yang University of Illinois,Chicago.
Reliable Range based Localization and SLAM Joseph Djugash Masters Student Presenting work done by: Sanjiv Singh, George Kantor, Peter Corke and Derek Kurth.
Context-Aware Query Classification Huanhuan Cao 1, Derek Hao Hu 2, Dou Shen 3, Daxin Jiang 4, Jian-Tao Sun 4, Enhong Chen 1 and Qiang Yang 2 1 University.
Evaluating Hypotheses
1 Integration of Background Modeling and Object Tracking Yu-Ting Chen, Chu-Song Chen, Yi-Ping Hung IEEE ICME, 2006.
Today Introduction to MCMC Particle filters and MCMC
Presented by Zeehasham Rasheed
Estimating Congestion in TCP Traffic Stephan Bohacek and Boris Rozovskii University of Southern California Objective: Develop stochastic model of TCP Necessary.
Bayesian Filtering for Location Estimation D. Fox, J. Hightower, L. Liao, D. Schulz, and G. Borriello Presented by: Honggang Zhang.
Lehrstuhl für Informatik 2 Gabriella Kókai: Maschine Learning 1 Evaluating Hypotheses.
Energy-efficient Self-adapting Online Linear Forecasting for Wireless Sensor Network Applications Jai-Jin Lim and Kang G. Shin Real-Time Computing Laboratory,
Towards Detecting Influenza Epidemics by Analyzing Twitter Massages Aron Culotta Jedsada Chartree.
EE513 Audio Signals and Systems Statistical Pattern Classification Kevin D. Donohue Electrical and Computer Engineering University of Kentucky.
Anomaly detection Problem motivation Machine Learning.
Muhammad Moeen YaqoobPage 1 Moment-Matching Trackers for Difficult Targets Muhammad Moeen Yaqoob Supervisor: Professor Richard Vinter.
Review of Statistical Inference Prepared by Vera Tabakova, East Carolina University ECON 4550 Econometrics Memorial University of Newfoundland.
Patterns of significant seismic quiescence in the Pacific Mexican coast A. Muñoz-Diosdado, A. H. Rudolf-Navarro, A. Barrera-Ferrer, F. Angulo-Brown National.
On Sparsity and Drift for Effective Real- time Filtering in Microblogs Date : 2014/05/13 Source : CIKM’13 Advisor : Prof. Jia-Ling, Koh Speaker : Yi-Hsuan.
Bei Pan (Penny), University of Southern California
Automatically Identifying Localizable Queries Center for E-Business Technology Seoul National University Seoul, Korea Nam, Kwang-hyun Intelligent Database.
Economic Cooperation Organization Training Course on “Drought and Desertification” Alanya Facilities, Antalya, TURKEY presented by Ertan TURGU from Turkish.
Mirco Nanni, Roberto Trasarti, Giulio Rossetti, Dino Pedreschi Efficient distributed computation of human mobility aggregates through user mobility profiles.
Ubiquitous Human Computation
SIS Sequential Importance Sampling Advanced Methods In Simulation Winter 2009 Presented by: Chen Bukay, Ella Pemov, Amit Dvash.
Simultaneous Localization and Mapping Presented by Lihan He Apr. 21, 2006.
Bayesian networks Classification, segmentation, time series prediction and more. Website: Twitter:
Tweet Analysis for Real-Time Event Detection and Earthquake Reporting System Development.
Pete Bohman Adam Kunk. What is real-time search? What do you think as a class?
When Experts Agree: Using Non-Affiliated Experts To Rank Popular Topics Meital Aizen.
Online Social Networks and Media Mining 1.
Time Series Data Analysis - I Yaji Sripada. Dept. of Computing Science, University of Aberdeen2 In this lecture you learn What are Time Series? How to.
Ubiquitous Human Computation KAIST KSE Uichin Lee May 11, 2011.
Pete Bohman Adam Kunk. Real-Time Search  Definition: A search mechanism capable of finding information in an online fashion as it is produced. Technology.
Learning Geographical Preferences for Point-of-Interest Recommendation Author(s): Bin Liu Yanjie Fu, Zijun Yao, Hui Xiong [KDD-2013]
Wei Feng , Jiawei Han, Jianyong Wang , Charu Aggarwal , Jianbin Huang
BING: Binarized Normed Gradients for Objectness Estimation at 300fps
By Gianluca Stringhini, Christopher Kruegel and Giovanni Vigna Presented By Awrad Mohammed Ali 1.
A Model for Learning the Semantics of Pictures V. Lavrenko, R. Manmatha, J. Jeon Center for Intelligent Information Retrieval Computer Science Department,
A Passive Approach to Sensor Network Localization Rahul Biswas and Sebastian Thrun International Conference on Intelligent Robots and Systems 2004 Presented.
Stable Multi-Target Tracking in Real-Time Surveillance Video
Date : 2013/03/18 Author : Jeffrey Pound, Alexander K. Hudek, Ihab F. Ilyas, Grant Weddell Source : CIKM’12 Speaker : Er-Gang Liu Advisor : Prof. Jia-Ling.
Intelligent Database Systems Lab N.Y.U.S.T. I. M. Externally growing self-organizing maps and its application to database visualization and exploration.
Dr. Sudharman K. Jayaweera and Amila Kariyapperuma ECE Department University of New Mexico Ankur Sharma Department of ECE Indian Institute of Technology,
1 Clarifying Sensor Anomalies using Social Network feeds * University of Illinois at Urbana Champaign + U.S. Army Research Lab ++ IBM Research, USA Prasanna.
Boosted Particle Filter: Multitarget Detection and Tracking Fayin Li.
Unsupervised Mining of Statistical Temporal Structures in Video Liu ze yuan May 15,2011.
Learning and Acting with Bayes Nets Chapter 20.. Page 2 === A Network and a Training Data.
Unsupervised Auxiliary Visual Words Discovery for Large-Scale Image Object Retrieval Yin-Hsi Kuo1,2, Hsuan-Tien Lin 1, Wen-Huang Cheng 2, Yi-Hsuan Yang.
1 Adaptive Subjective Triggers for Opinionated Document Retrieval (WSDM 09’) Kazuhiro Seki, Kuniaki Uehara Date: 11/02/09 Speaker: Hsu, Yu-Wen Advisor:
TWinner : Understanding News Queries with Geo-content using Twitter Satyen Abrol,Latifur Khan University of Texas at Dallas,Department of Computer Science.
Don’t Follow me : Spam Detection in Twitter January 12, 2011 In-seok An SNU Internet Database Lab. Alex Hai Wang The Pensylvania State University International.
Similarity Measurement and Detection of Video Sequences Chu-Hong HOI Supervisor: Prof. Michael R. LYU Marker: Prof. Yiu Sang MOON 25 April, 2003 Dept.
Zhaoxia Fu, Yan Han Measurement Volume 45, Issue 4, May 2012, Pages 650–655 Reporter: Jing-Siang, Chen.
Jiancang Zhuang Inst. Statist. Math. Detecting spatial variations of earthquake clustering parameters via maximum weighted likelihood.
Online Social Media Sensing 1: Sensing the Physical and Social World
Online Social Networks and Media
Summary Presented by : Aishwarya Deep Shukla
Where did we stop? The Bayes decision rule guarantees an optimal classification… … But it requires the knowledge of P(ci|x) (or p(x|ci) and P(ci)) We.
iSRD Spam Review Detection with Imbalanced Data Distributions
EE513 Audio Signals and Systems
Parametric Methods Berlin Chen, 2005 References:
Overview: Chapter 2 Localization and Tracking
Presentation transcript:

Earthquake Shakes Twitter User: Analyzing Tweets for Real-Time Event Detection Takehi Sakaki Makoto Okazaki @ymatsuo the University of Tokyo

Introduction Event Detection Model Experiments And Evaluation Outline Conclusions Application

Introduction Outline

What’s happening?  Twitter  is one of the most popular microblogging services  has received much attention recently  Microblogging  is a form of blogging  that allows users to send brief text updates  is a form of micromedia  that allows users to send photographs or audio clips  In this research, we focus on an important characteristic real-time nature

Real-time Nature of Microblogging  Twitter users write tweets several times in a single day.  There is a large number of tweets, which results in many reports related to events  We can know how other users are doing in real-time  We can know what happens around other users in real-time. social events parties baseball games presidential campaign disastrous events storms fires traffic jams riots heavy rain-falls earthquakes

Japan Earthquake Shakes Twitter Users... And Beyonce: Earthquakes are one thing you can bet on being covered on Twitter first, because, quite frankly, if the ground is shaking, you’re going to tweet about it before it even registers with the USGS* and long before it gets reported by the media. That seems to be the case again today, as the third earthquake in a week has hit Japan and its surrounding islands, about an hour ago. The first user we can find that tweeted about it was Ricardo Duran of Scottsdale, AZ, who, judging from his Twitter feed, has been traveling the world, arriving in Japan yesterday.  Adam Ostrow, an Editor in Chief at Mashable wrote the possibility to detect earthquakes from tweets in his blog Our motivation we can know earthquake occurrences from tweets =the motivation of our research *USGS : United States Geological Survey

Our Goals  propose an algorithm to detect a target event  do semantic analysis on Tweet  to obtain tweets on the target event precisely  regard Twitter user as a sensor  to detect the target event  to estimate location of the target  produce a probabilistic spatio-temporal model for  event detection  location estimation  propose Earthquake Reporting System using Japanese tweets

Twitter and Earthquakes in Japan a map of earthquake occurrences world wide a map of Twitter user world wide The intersection is regions with many earthquakes and large twitter users.

Twitter and Earthquakes in Japan Other regions: Indonesia, Turkey, Iran, Italy, and Pacific coastal US cities

Event Detection Outline

Event detection algorithms  do semantic analysis on Tweet  to obtain tweets on the target event precisely  regard Twitter user as a sensor  to detect the target event  to estimate location of the target

Semantic Analysis on Tweet  Search tweets including keywords related to a target event  Example: In the case of earthquakes  “shaking”, “earthquake”  Classify tweets into a positive class or a negative class  Example:  “Earthquake right now!!” ---positive  “Someone is shaking hands with my boss” --- negative  Create a classifier

Semantic Analysis on Tweet  Create classifier for tweets  use Support Vector Machine(SVM)  Features (Example: I am in Japan, earthquake right now!)  Statistical features (7 words, the 5 th word) the number of words in a tweet message and the position of the query within a tweet  Keyword features ( I, am, in, Japan, earthquake, right, now) the words in a tweet  Word context features (Japan, right) the words before and after the query word

Tweet as a Sensory Value ・・・ tweets ・・・ Probabilistic model Classifier observation by sensors observation by twitter users target event target object Probabilistic model values Event detection from twitter Object detection in ubiquitous environment the correspondence between tweets processing and sensory data detection

Tweet as a Sensory Value some users posts “earthquake right now!!” some earthquake sensors responses positive value We can apply methods for sensory data detection to tweets processing ・・・ tweets Probabilistic model Classifier observation by sensors observation by twitter users target event target object Probabilistic model values Event detection from twitter Object detection in ubiquitous environment ・・・ search and classify them into positive class detect an earthquake earthquake occurrence

Tweet as a Sensory Value  We make two assumptions to apply methods for observation by sensors  Assumption 1: Each Twitter user is regarded as a sensor  a tweet → a sensor reading  a sensor detects a target event and makes a report probabilistically  Example:  make a tweet about an earthquake occurrence  “earthquake sensor” return a positive value  Assumption 2: Each tweet is associated with a time and location  a time : post time  location : GPS data or location information in user’s profile Processing time information and location information, we can detect target events and estimate location of target events

Model Outline

Probabilistic Model  Why we need probabilistic models?  Sensor values are noisy and sometimes sensors work incorrectly  We cannot judge whether a target event occurred or not from one tweets  We have to calculate the probability of an event occurrence from a series of data  We propose probabilistic models for  event detection from time-series data  location estimation from a series of spatial information

Temporal Model  We must calculate the probability of an event occurrence from multiple sensor values  We examine the actual time-series data to create a temporal model

Temporal Model

 the data fits very well to an exponential function  design the alarm of the target event probabilistically,which was based on an exponential distribution

Temporal Model  the probability of an event occurrence at time t  the false positive ratio of a sensor  the probability of all n sensors returning a false alarm  the probability of event occurrence  sensors at time 0 → sensors at time t  the number of sensors at time t  expected wait time to deliver notification  parameter

Spatial Model  We must calculate the probability distribution of location of a target  We apply Bayes filters to this problem which are often used in location estimation by sensors  Kalman Filers  Particle Filters

Bayesian Filters for Location Estimation  Kalman Filters  are the most widely used variant of Bayes filters  approximate the probability distribution which is virtually identical to a uni-modal Gaussian representation  advantages: the computational efficiency  disadvantages: being limited to accurate sensors or sensors with high update rates

Bayesian Filters for Location Estimation  Particle Filters  represent the probability distribution by sets of samples, or particles  advantages: the ability to represent arbitrary probability densities  particle filters can converge to the true posterior even in non- Gaussian, nonlinear dynamic systems.  disadvantages: the difficulty in applying to high-dimensional estimation problems

Information Diffusion Related to Real-time Events  Proposed spatiotemporal models need to meet one condition that  Sensors are assumed to be independent  We check if information diffusions about target events happen because  if an information diffusion happened among users, Twitter user sensors are not independent. They affect each other

Information Diffusion Related to Real-time Events Nintendo DS Game an earthquakea typhoon Information Flow Networks on Twitter In the case of an earthquakes and a typhoons, very little information diffusion takes place on Twitter, compared to Nintendo DS Game → We assume that Twitter user sensors are independent about earthquakes and typhoons

Experiments And Evaluation Outline

Experiments And Evaluation  We demonstrate performances of  tweet classification  event detection from time-series data → show this results in “application”  location estimation from a series of spatial information

Evaluation of Semantic Analysis  Queries  Earthquake query: “shaking” and “earthquake”  Typhoon query:”typhoon”   Examples to create classifier  597 positive examples

Evaluation of Semantic Analysis  “earthquake” query  “shaking” query FeaturesRecallPrecisionF-Value Statistical87.50%63.64%73.69% Keywords87.50%38.89%53.85% Context50.00%66.67%57.14% All87.50%63.64%73.69% FeaturesRecallPrecisionF-Value Statistical66.67%68.57%67.61% Keywords86.11%57.41%68.89% Context52.78%86.36%68.20% All80.56%65.91%72.50%

Discussions of Semantic Analysis  We obtain highest F-value when we use Statistical features and all features.  Keyword features and Word Context features don’t contribute much to the classification performance  A user becomes surprised and might produce a very short tweet  It’s apparent that the precision is not so high as the recall FeaturesRecallPrecisionF-Value Statistical87.50%63.64%73.69% Keywords87.50%38.89%53.85% Context50.00%66.67%57.14% All87.50%63.64%73.69%

Experiments And Evaluation  We demonstrate performances of  tweet classification  event detection from time-series data → show this results in “application”  location estimation from a series of spatial information

Evaluation of Spatial Estimation  Target events  earthquakes  25 earthquakes from August.2009 to October 2009  typhoons  name: Melor  Baseline methods  weighed average  simply takes the average of latitudes and longitudes  the median  simply takes the median of latitudes and longitudes  We evaluate methods by distances from actual centers  a distance from an actual center is smaller, a method works better

Evaluation of Spatial Estimation Tokyo Osaka actual earthquake center Kyoto estimation by median estimation by particle filter balloon: each tweets color : post time

Evaluation of Spatial Estimation

Earthquakes Average - Particle filters works better than other methods DateActual CenterMedianWeighed AverageKalman FilterParticle Filter mean square errors of latitudes and longitude

Evaluation of Spatial Estimation A typhoon Average - Particle Filters works better than other methods DateActual CenterMedianWeighed AverageKalman FilterParticle Filter mean square errors of latitudes and longitude

Discussions of Experiments  Particle filters performs better than other methods  If the center of a target event is in an oceanic area, it’s more difficult to locate it precisely from tweets  It becomes more difficult to make good estimation in less populated areas

Outline Application

Earthquake Reporting System  Toretter (  Earthquake reporting system using the event detection algorithm  All users can see the detection of past earthquakes  Registered users can receive s of earthquake detection reports Dear Alice, We have just detected an earthquake around Chiba. Please take care. Toretter Alert System

Screenshot of Toretter.com

Earthquake Reporting System  Effectiveness of alerts of this system  Alert s urges users to prepare for the earthquake if they are received by a user shortly before the earthquake actually arrives.  Is it possible to receive the before the earthquake actually arrives?  An earthquake is transmitted through the earth's crust at about 3~7 km/s.  a person has about 20~30 sec before its arrival at a point that is 100 km distant from an actual center

Results of Earthquake Detection In all cases, we sent s before announces of JMA In the earliest cases, we can sent s in 19 sec. DateMagnitudeLocationTime sent time time gap [sec] # tweets within 10 minutes Announce of JMA Aug Tochigi6:58:557:00: :08 Aug Suruga-wan19:22:4819:23: :28 Aug Chiba8:51:168:51: :56 Aug Uraga-oki2:22:492:23: :27 Aug.253.5Fukushima2:21:1522:22: :26 Aug Wakayama17:47:3017:48: :7:53 Aug Suruga-wan20:26:2320:26: :31 Ag Fukushima00:45:5400:46: :51 Sep. 23.3Suruga-wan13:04:4513:05: :10 Sep. 23.6Bungo-suido17:37:5317:38: :43

Experiments And Evaluation  We demonstrate performances of  tweet classification  event detection from time-series data → show this results in “application”  location estimation from a series of spatial information

Results of Earthquake Detection JMA intensity scale2 or more3 or more4 or more Num of earthquakes78253 Detected70(89.7%)24(96.0%)3(100.0%) Promptly detected*53(67.9%)20(80.0%)3(100.0%) Promptly detected: detected in a minutes JMA intensity scale: the original scale of earthquakes by Japan Meteorology Agency Period: Aug.2009 – Sep Tweets analyzed :49,314 tweets Positive tweets : 6291 tweets by 4218 users We detected 96% of earthquakes that were stronger than scale 3 or more during the period.

Outline Conclusions

 We investigated the real-time nature of Twitter for event detection  Semantic analyses were applied to tweets classification  We consider each Twitter user as a sensor and set a problem to detect an event based on sensory observations  Location estimation methods such as Kaman filters and particle filters are used to estimate locations of events  We developed an earthquake reporting system, which is a novel approach to notify people promptly of an earthquake event  We plan to expand our system to detect events of various kinds such as rainbows, traffic jam etc.

 Thank you for your paying attention and tweeting on earthquakes.   Takeshi