Geo-spatial Event Detection in the Twitter Stream. Michael Kaisser, AGT International. Berlin Buzzwords, June 3, 2013.
2 Outline
1. Introduction & Context: Social Media Analysis in a C2 Center
2. The “Avalanche” event detection approach: identify posting “hot spots”; evaluate post clusters with a Machine Learning approach
3. Evaluation
4. Future work
3 Background: Social Data. Social Media continuously creates massive amounts of data, e.g. 500 million tweets each day: ~300 GB raw data. Nature of the data: time-stamped; textual (many languages, lingos and slang; spelling mistakes are rife; only a few words per tweet); links to pictures; links to newspaper articles (more text); sometimes geo-spatial (contains coordinates). Creating real actionable insights from this isn’t an easy problem. This talk gives one specific example of how this can be done.
4 Use case: Urban Management & Public Safety. Cities today are complex and need to be organized. The administration is responsible for keeping the population safe: emergency services, health services, fire fighters, police. Command & Control Center.
5 Urban Management & Public Safety Why is Social Media relevant in this context?
6 Urban Management & Public Safety Why is Social Media relevant in this context? “There's a plane in the Hudson. I'm on the ferry going to pick up the people. Crazy”
7 Urban Management & Public Safety Why is Social Media relevant in this context? “De tering, wat een hel!!! 1,4 miljoen mensen op dat terrein! #loveparade” (Dutch; roughly: “Damn, what a hell!!! 1.4 million people on that site! #loveparade”)
8 Urban Management & Public Safety Why is Social Media relevant in this context? “#Hoboken is on fire. Building above Hoboken Farm Corporation at 300 Washington is all smoked out” Social Media can help create a situational awareness picture.
9 Context: Social Media in a C2 Center
1. Automatic detection of breaking events: detect, classify and display events to the operator (accidents, fires, violence, demonstrations)
2. Monitoring of ongoing situations: improve USAP by focused Social Media Analytics; possibly contact owner of posts for more information
3. Post-incident reporting: automatic report generation; interactive investigation support
16 How is it done? Two-step approach:
1. Identify locations with high tweet activity: collect geo-spatial tweet clusters
2. Evaluate clusters with a Machine Learning approach: do these clusters constitute a real-world event that the tweeters are witnessing first-hand?
Work in progress:
3. Classify events according to type
17 Machine Learning – What is the task? (Figure legend: each marker = one geo-located Social Media post (tweet).)
18 Machine Learning – What is the task? We need to automatically determine which of the tweet clusters (tweets issued close to each other in a short time frame) represent real-world events and which are just random chatter.
Good:
Suspicious package in #GrandCentral #NYC #bomb threat possibility not sure?? http://t.co/VwU7SP3X
Suspicious package found in Grand Central Station... the 456 train..the trains are closed !! [pic]: http://t.co/9YPki4k2
Something happened in the #456 #trainstation in #GrandCentral #NYC http://t.co/GGKvQura
Accident on the #456train in #midtown #NYC http://t.co/fj2mJJmf
vs. Bad:
RT @refinery29: This image of Madeleine Albright playing the drums will be the best thing you'll see today: http://t.co/rGwQ5RdG
«@_PrettyPoison Guess ill fill out more job apps today» make punna fill out some 2!
The Glamour & Glitz at the 2012 Emmy' s that we loved! http://t.co/CiTFszfL
@IszwanieSyahira: i'm happy and i hope u feel the same too. weeeee ~.~
How to prepare yourself for Friday's apocalypse http://cnet.co/lPU
19 Architecture. We look for geo-spatial clusters of tweets (e.g. 3 or more tweets in a 200 m radius, posted within 30 minutes). These become “event candidates”. Event candidates are evaluated with a Machine Learning scheme. We currently use C4.5 decision trees.
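The candidate search on this slide can be sketched as follows. This is a minimal brute-force illustration, not the talk's implementation: the `Tweet` record, the function names, and the seed-based grouping strategy are all assumptions. Only the example thresholds (3 tweets, 200 m, 30 minutes) come from the slide.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


@dataclass(frozen=True)
class Tweet:
    """Hypothetical record for a geo-located tweet."""
    user: str
    text: str
    lat: float
    lon: float
    ts: float  # posting time, seconds since epoch


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two coordinates."""
    p1, p2 = radians(lat1), radians(lat2)
    dphi = p2 - p1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(p1) * cos(p2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def event_candidates(tweets, radius_m=200.0, window_s=1800.0, min_tweets=3):
    """Group tweets around each seed tweet; O(n^2), for illustration only."""
    ordered = sorted(tweets, key=lambda t: t.ts)
    candidates = []
    for i, seed in enumerate(ordered):
        # Tweets posted no later than window_s after the seed and within radius_m of it.
        members = [
            t for t in ordered[i:]
            if t.ts - seed.ts <= window_s
            and haversine_m(seed.lat, seed.lon, t.lat, t.lon) <= radius_m
        ]
        if len(members) < min_tweets:
            continue
        # Skip clusters fully contained in an earlier candidate.
        if any(set(members) <= set(c) for c in candidates):
            continue
        candidates.append(members)
    return candidates
```

A production system at Twitter scale would more plausibly use a spatial index or grid cells instead of pairwise distance checks; the nested scan here only makes the radius and time-window conditions explicit.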
20 Machine Learning - Features. Tweet cluster:
Suspicious package in #GrandCentral #NYC #bomb threat possibility not sure?? http://t.co/VwU7SP3X
Suspicious package found in Grand Central Station... the 456 train..the trains are closed !! [pic]: http://t.co/9YPki4k2
Something happened in the #456 #trainstation in #GrandCentral #NYC http://t.co/GGKvQura
Accident on the #456train in #midtown #NYC http://t.co/fj2mJJmf
21 Scalable Machine Learning … with Weka! (Diagram legend: blue = training, green = runtime.) In offline ML, we train once, but use the predictive model possibly millions of times a day. It’s okay if training isn’t fast as lightning. But during execution every CPU cycle can count.
22 Scalable Machine Learning … with Weka! … which can be optimized further in various ways. See e.g. Nima Asadi, Jimmy Lin, Arjen P. de Vries. Runtime Optimizations for Tree-Based Machine Learning Models. IEEE Transactions on Knowledge and Data Engineering, 2013.
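To illustrate why tree models are cheap at runtime, here is a minimal sketch of decision-tree inference. It is not Weka's API and not the talk's model: the `Leaf`/`Split` types, the feature names, and the thresholds in `TOY_TREE` are invented for illustration. The point is only that, once trained, a prediction costs a handful of comparisons.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class Leaf:
    is_event: bool


@dataclass(frozen=True)
class Split:
    feature: str      # name of the feature to test (hypothetical names below)
    threshold: float  # go "below" if value < threshold, else "at_or_above"
    below: Union["Split", Leaf]
    at_or_above: Union["Split", Leaf]


def predict(node: Union[Split, Leaf], features: Dict[str, float]) -> bool:
    """Walk from the root to a leaf: one comparison per tree level."""
    while isinstance(node, Split):
        if features[node.feature] < node.threshold:
            node = node.below
        else:
            node = node.at_or_above
    return node.is_event


# Toy tree with invented thresholds; a real tree would be learned from labelled clusters.
TOY_TREE = Split(
    "unique_users", 2,
    Leaf(False),
    Split("shared_word_ratio", 0.3, Leaf(False), Leaf(True)),
)
```

For example, `predict(TOY_TREE, {"unique_users": 3, "shared_word_ratio": 0.5})` follows two comparisons to reach a leaf.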
26 Summary (somewhat simplified): If there are several tweets … from roughly the same location, at roughly the same time, from different users, that nevertheless use the same words … chances are good that we have detected an event.
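The signals named in this summary (several tweets, different users, same words) could be turned into cluster features roughly as below. This is a sketch under assumptions: the feature names, the tokenisation, and the "word appears in at least two tweets" definition of shared vocabulary are illustrative choices, not the feature set used in the talk.

```python
import re
from collections import Counter

URL = re.compile(r"https?://\S+")
WORD = re.compile(r"[a-z0-9]+")


def tokens(text):
    """Lowercase the text, drop URLs, and return the set of distinct word tokens."""
    return set(WORD.findall(URL.sub(" ", text.lower())))


def cluster_features(posts):
    """posts: list of (user, text) pairs that belong to one tweet cluster."""
    doc_freq = Counter()  # in how many tweets of the cluster each word occurs
    for _, text in posts:
        doc_freq.update(tokens(text))
    shared = [w for w, n in doc_freq.items() if n >= 2]
    return {
        "tweet_count": len(posts),
        "unique_users": len({user for user, _ in posts}),
        # Fraction of the cluster's vocabulary used by at least two tweets.
        "shared_word_ratio": len(shared) / len(doc_freq) if doc_freq else 0.0,
    }
```

A cluster of unrelated chatter tends to have a low shared-word ratio, while eyewitness reports of one incident repeat the same place names and keywords.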
27 Outlook – work in progress and future work. Derive more coordinates: from shared pictures; from toponyms in posts; use image sharing sites directly. Make use of posts without coordinates and add them to already existing clusters. Explore real-time TF-IDF to get rid of the Kardashians & Beliebers. Evaluate the system with real-world data, because recall numbers are currently somewhat misleading.
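One possible reading of "real-time TF-IDF" is to keep document frequencies updated over the whole incoming stream, so that words everyone is tweeting about (celebrity chatter) receive a low weight inside any single cluster. The sketch below assumes that reading; the `StreamingIdf` class and the smoothed IDF formula are illustrative choices, not something specified in the talk.

```python
import math
from collections import Counter


class StreamingIdf:
    """Document frequencies maintained incrementally over a tweet stream."""

    def __init__(self):
        self.doc_count = 0
        self.doc_freq = Counter()

    def observe(self, words):
        """Update the counts with the words of one incoming tweet."""
        self.doc_count += 1
        self.doc_freq.update(set(words))

    def idf(self, word):
        # Smoothed IDF: add-one smoothing keeps the value finite for unseen words.
        return math.log((1 + self.doc_count) / (1 + self.doc_freq[word])) + 1.0


def tfidf(words, model):
    """Weight each word of one document (e.g. a cluster) by term frequency times IDF."""
    tf = Counter(words)
    return {w: (n / len(words)) * model.idf(w) for w, n in tf.items()}
```

With such weights, a cluster whose shared vocabulary consists only of globally frequent terms scores low, while locally concentrated terms stand out.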
28 Machine Learning – Relevance Feedback (work in progress). Diagram: Machine Learning Model; Users (journalists, C2 operators); Documents (e.g. tweets, post clusters), rated Good or Bad. Users implicitly rate documents by how they interact with them: user performs follow-up actions = relevant; user clicks document away = irrelevant. The system learns to present more relevant documents and can adapt to changing needs over time.
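The implicit rating idea can be sketched as a mapping from logged user interactions to training labels. The action names and the log format below are hypothetical; the slide only states that follow-up actions count as relevant and clicking a document away counts as irrelevant.

```python
# Hypothetical action names; the slide does not list concrete UI actions.
FOLLOW_UP_ACTIONS = {"open_details", "contact_author", "create_report"}
DISMISS_ACTIONS = {"click_away"}


def label_from_interaction(action):
    """Return True (relevant), False (irrelevant), or None (no signal)."""
    if action in FOLLOW_UP_ACTIONS:
        return True
    if action in DISMISS_ACTIONS:
        return False
    return None


def build_training_set(interaction_log):
    """interaction_log: iterable of (document_id, action) pairs in time order."""
    labels = {}
    for doc_id, action in interaction_log:
        label = label_from_interaction(action)
        if label is not None:
            labels[doc_id] = label  # the most recent signal for a document wins
    return labels
```

Periodically retraining the model on labels collected this way is one way the system could adapt to changing needs over time.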
29 Image Analysis of shared pictures (work in progress). Problem: not all tweets contain useful textual information; shared text might be hard to analyze. Solution: ~35% of tweets contain linked images; images provide a wealth of information that can be analyzed (objects, events, persons, coordinates). Example: explosion in an image. Tweet: “OMG!!! OMG!!! http://t.co/maiAgHoh”. Explosion detected with Image Analysis.