Spatio-Temporal-Thematic Analysis of Citizen Sensor Data: Challenges and Experiences

Presentation transcript:

Spatio-Temporal-Thematic Analysis of Citizen Sensor Data: Challenges and Experiences. Meenakshi Nagarajan, Karthik Gomadam, Amit Sheth, Ajith Ranabahu, Raghava Mutharaju and Ashutosh Jadhav. Kno.e.sis, Wright State University. Presented at the conference by Pablo Mendes. Text at:

Micro-blogging platforms (Twitter, FriendFeed, ...) are revolutionizing how unaltered, real-time information is disseminated and consumed. A significant portion of the data is experiential in nature: first-hand observations, experiences and opinions shared via text, images, audio and video (citizens as sensors).

Citizen Sensor Observations: They are a lens into the social perception of an event in any region at any point in time, for example the Mumbai terror attacks, the Iran elections and Obama's health care reform. They present complementary, sometimes lagged viewpoints that evolve over time and with other external stimuli.

What is being said about an event (the theme) is as important as where (spatial) and when (time) it is being said.

Contribution, Presentation Focus: A Web mashup that processes textual citizen sensor observations pertaining to real-world events; takes the three dimensions of space, time and theme into consideration; and extracts local and global social signals and perceptions over time.

Twitris - System Overview: Crawling, Processing, Visualization

Twitris - System Overview: 1. Obtain topically relevant tweets, extract location and time stamp information, store in DB. 2. Process tweet contents, store extracted metadata in DB. 3. User visualization talks to the DB.

Obtaining Citizen-Sensor Observations: 1. Gathering topically relevant data, extracting location and time stamps.

Crawling Tweets Relevant to an Event: Twitter has no explicit topic categorization; community-generated hashtags are the strongest cues. Strategy: start with manually selected keywords (seed), obtain additional hints from Google Insights, then crawl using the keywords and hashtags.

Crawling Tweets Relevant to an Event: Events change and topics of discussion change, so the keywords used for the crawl are updated periodically. Process the crawled tweets, extract the top 1 TFIDF keyword, obtain Google Insights suggestions for it, and continue the crawl.
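
The keyword refresh step described above could be sketched roughly as follows. This is a minimal illustration, not the Twitris implementation: the function names, the choice to treat each crawl batch as one document for the IDF part, and the tokenizer are all assumptions, and the Google Insights lookup is left out entirely.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep hashtags as single tokens (assumed tokenization).
    return re.findall(r"#?\w+", text.lower())


def next_crawl_keyword(batches, known_keywords):
    """Pick the top TF-IDF term of the latest batch that is not yet a crawl keyword.

    batches: crawl batches, oldest first; each batch is a list of tweet strings.
    Assumption: each batch counts as one document when computing IDF.
    Returns None when no term scores above zero (e.g. only one batch so far).
    """
    docs = [Counter(tok for tweet in batch for tok in tokenize(tweet))
            for batch in batches]
    latest = docs[-1]
    n_docs = len(docs)
    best_term, best_score = None, 0.0
    for term, tf in latest.items():
        if term in known_keywords:
            continue  # already used for crawling
        df = sum(1 for doc in docs if term in doc)
        score = tf * math.log(n_docs / df)
        if score > best_score:
            best_term, best_score = term, score
    return best_term
```

The returned term would then be added to the keyword set (and, in the process described on the slide, used to request further suggestions) before the next crawl round.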

Challenges, Limitations of Crawl: Volume and rapidity of change; quality of data is key. Keyword gathering and crawling require supervision. Twitter API restrictions: hourly / daily access limits, and severe limitations on extracting past data (can go back only a few weeks). Large-scale geocode conversions.

Tweet Spatio-Temporal Coordinates: The time stamp is obtained at crawl. Location extraction is non-trivial: Twitter does not store location data, so the tweet location is approximated using the user's location information (note: recently, Twitter added per-tweet geolocations). Geocodes are found for the user's location in the profile using the Google / Yahoo geocoder services. The location is not always provided, and is not always decoded to a location.

Twitris - System Overview: 2. Process tweet contents, store extracted metadata in DB. Extracted time stamps and geo-coordinates are stored in the DB.

Spatio-Temporal Sets of Tweets: The intuition behind the processing of tweets is that events have inherent spatial and temporal biases associated with them, and that this bias dictates the granularity of data processing. Mumbai terror attack: country-level activity every day. Health care reform: US state-level activity per week.

Spatial Temporal Sets: Group observations using spatial and temporal bias cues. E.g., for the Mumbai terror attack, create X sets of tweets per day, where each set (cluster) represents activity in one country. Thematic processing over each set ensures that local, temporal social signals are preserved.
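
The grouping described above could be sketched as follows. This is only an illustration under assumptions: the tweet record layout (`text`, `timestamp`, a region field such as `country`) and the function name `build_sets` are hypothetical, and tweets without a resolved region are simply skipped.

```python
from collections import defaultdict
from datetime import datetime


def build_sets(tweets, region_key="country", period="day"):
    """Group tweet texts into spatio-temporal sets keyed by (region, time bucket).

    tweets: dicts with 'text', 'timestamp' (datetime) and a region field
    (hypothetical layout). period is 'day' or 'week', chosen per event.
    """
    sets = defaultdict(list)
    for tweet in tweets:
        region = tweet.get(region_key)
        if region is None:
            continue  # location could not be resolved for this tweet
        ts = tweet["timestamp"]
        if period == "day":
            bucket = ts.date().isoformat()
        elif period == "week":
            year, week, _ = ts.isocalendar()
            bucket = f"{year}-W{week:02d}"
        else:
            raise ValueError(f"unsupported period: {period}")
        sets[(region, bucket)].append(tweet["text"])
    return dict(sets)
```

For an event with a country-level, daily bias one would call `build_sets(tweets)`; for a state-level, weekly bias something like `build_sets(tweets, region_key="state", period="week")`.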

Processing Tweets: Extracting important event descriptors, i.e. key words or phrases (n-grams). What is a region paying attention to today? Objective: go from volumes of tweets to key descriptors, using three attributes: the thematic, temporal and spatial importance of a descriptor.

Descriptor: Thematic Importance: TFIDF-weighted 3-grams. Amplified if nouns, no stop words. Amplified by the presence of contextual evidence. (Figure: example with the descriptors BIG THREE, GM, CHRYSLER, FORD, GENERAL MOTORS.)
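
A rough sketch of the TFIDF part of this score is given below. Several things here are assumptions rather than details from the slides: n-grams up to length 3 are used, every spatio-temporal set is treated as one document, n-grams containing a stop word are dropped outright (instead of stop-word-free n-grams being amplified), and the noun amplification is omitted because it would need a part-of-speech tagger. The stop-word list is a tiny illustrative one.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (assumption, not the one used in Twitris).
STOP_WORDS = {"the", "a", "an", "of", "in", "is", "to", "and", "for", "on", "at"}


def ngrams(text, max_n=3):
    """Yield all 1..max_n word n-grams of text that contain no stop word."""
    tokens = re.findall(r"\w+", text.lower())
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if not any(tok in STOP_WORDS for tok in gram):
                yield " ".join(gram)


def thematic_scores(sets, focus_key, max_n=3):
    """TF-IDF of n-grams in one set; each set in `sets` is one document."""
    docs = {key: Counter(g for text in texts for g in ngrams(text, max_n))
            for key, texts in sets.items()}
    n_docs = len(docs)
    scores = {}
    for gram, tf in docs[focus_key].items():
        df = sum(1 for doc in docs.values() if gram in doc)
        # Smoothed IDF so that a descriptor present in every set keeps a small weight.
        scores[gram] = tf * math.log((1 + n_docs) / df)
    return scores
```

Here `sets` maps a set key (for example a region and day) to its list of tweet texts, and `focus_key` selects the set whose descriptors are being scored.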

Contextual Evidence: The thematic score of a descriptor (focus word fw) is enhanced in proportion to the PMI-based association strength with, and the thematic score of, its co-occurring descriptors (aw_i). Co-occurring descriptors are obtained from the same spatio-temporal set as the descriptor.
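
One possible reading of this enhancement is sketched below. The additive form (thematic score of fw plus the sum of PMI times thematic score over the aw_i), the clipping of negative PMI to zero, and the representation of each tweet as a set of descriptors are assumptions made for illustration; the slides do not give the exact formula.

```python
import math


def pmi(focus, other, tweets):
    """Pointwise mutual information from tweet-level co-occurrence.

    tweets: list of sets of descriptors, all from one spatio-temporal set.
    Returns 0.0 when the pair never co-occurs (assumption: no evidence, no boost).
    """
    n = len(tweets)
    has_focus = sum(1 for t in tweets if focus in t)
    has_other = sum(1 for t in tweets if other in t)
    both = sum(1 for t in tweets if focus in t and other in t)
    if both == 0:
        return 0.0
    return math.log((both * n) / (has_focus * has_other))


def boost_with_context(focus, thematic, tweets):
    """Enhance the thematic score of `focus` using its co-occurring descriptors."""
    boost = 0.0
    for other, score in thematic.items():
        if other == focus:
            continue
        # Only positive association strength contributes (assumed clipping).
        boost += max(pmi(focus, other, tweets), 0.0) * score
    return thematic[focus] + boost
```

With this form, a descriptor such as "gm" gains weight when it co-occurs more often than chance with a highly scored descriptor such as "big three" in the same set.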

Descriptor: Thematic-Temporal Importance: Certain descriptors always dominate observations, e.g. "terrorism" in the Mumbai terror attack and "healthcare" in the health care reform discussions. To allow less popular but interesting descriptors to surface, we discount the thematic score in proportion to recent popularity.
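
A minimal sketch of such a discount follows. The slides only say that the discount is proportional to recent popularity; measuring popularity as the mean thematic score over the preceding days, the subtractive form, and the weight `alpha` are assumptions.

```python
def temporal_discount(today_score, recent_scores, alpha=0.5):
    """Discount today's thematic score by the descriptor's recent popularity.

    recent_scores: thematic scores of the same descriptor on preceding days
    (assumed popularity measure). alpha is a hypothetical discount weight.
    """
    if not recent_scores:
        return today_score  # descriptor is new, nothing to discount
    recent_popularity = sum(recent_scores) / len(recent_scores)
    return max(today_score - alpha * recent_popularity, 0.0)
```

A descriptor that scored highly on every recent day loses part of its score, while a descriptor that is new today keeps its full thematic score and can therefore move up in the ranking.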

Descriptor: Thematic-Temporal-Spatial Importance: Descriptors that occur all over the world are not as interesting as those local to a region. Discount the thematic-temporal score in proportion to the number of spatial sets (not local) that mention the descriptor.
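
The spatial discount could be sketched in the same spirit. Normalizing the count of non-local sets to a fraction, the multiplicative form, and the weight `beta` are assumptions; the slides state only that the discount grows with the number of non-local spatial sets mentioning the descriptor.

```python
def spatial_discount(tt_score, descriptor, local_region, region_descriptors, beta=0.5):
    """Discount a thematic-temporal score by how widespread the descriptor is.

    region_descriptors: maps each region to the set of descriptors seen there
    in the same time bucket (hypothetical layout). beta in [0, 1] is a
    hypothetical discount weight.
    """
    other_regions = [r for r in region_descriptors if r != local_region]
    if not other_regions:
        return tt_score
    mentions = sum(1 for r in other_regions if descriptor in region_descriptors[r])
    return tt_score * (1.0 - beta * mentions / len(other_regions))
```

A descriptor mentioned in every other region is reduced by the full factor `beta`, while a descriptor seen only locally keeps its thematic-temporal score unchanged.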

TFIDF vs. Spatio-Temporal-Thematic (STT) Scores of Descriptors: Interesting descriptors surface! Other examples are in the paper.

Examples of STT-scored descriptors over 5 days: Mumbai terror attack.

Discussions around Descriptors: For some context, extract the chatter surrounding a descriptor of interest using a clustering approach. What are people saying about a descriptor? (user-click driven) (Figure showing the top X STT-weighted descriptors.)

Clustering Algorithm Overview: For a descriptor of interest, we generate the complementary viewpoints expressed in the data using an information-content-based clustering algorithm. Basic intuition: among the descriptor's associations, select complementary viewpoint hints; initialize clusters with the descriptor of interest and a viewpoint indicator; expand each cluster to add strong associations.
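
The three steps of the intuition above can be illustrated with the following simplified sketch. It is not the information-content-based algorithm from the paper: it swaps in a precomputed pairwise association strength (for instance PMI), and the greedy hint selection, the threshold `min_strength` and the number of viewpoints `k` are all hypothetical choices.

```python
def assoc(strength, a, b):
    # Symmetric lookup of a pairwise association strength; missing pairs count as 0.
    return strength.get(frozenset((a, b)), 0.0)


def viewpoint_clusters(focus, strength, k=2, min_strength=1.0):
    """Build up to k clusters of descriptors around complementary viewpoint hints.

    strength: dict mapping frozenset({a, b}) to an association strength
    (assumed to be precomputed from one spatio-temporal set).
    """
    candidates = {d for pair in strength for d in pair} - {focus}
    ranked = sorted(candidates, key=lambda d: (-assoc(strength, focus, d), d))

    # Step 1: pick hints strongly tied to the focus but weakly tied to each other.
    hints = []
    for d in ranked:
        if assoc(strength, focus, d) < min_strength:
            break
        if all(assoc(strength, d, h) < min_strength for h in hints):
            hints.append(d)
        if len(hints) == k:
            break

    # Step 2: initialize each cluster with the focus and one viewpoint hint.
    clusters = [{focus, h} for h in hints]

    # Step 3: expand each cluster with descriptors strongly associated with its hint.
    for d in ranked:
        if d in hints:
            continue
        for cluster, hint in zip(clusters, hints):
            if assoc(strength, d, hint) >= min_strength:
                cluster.add(d)
    return clusters
```

Each returned cluster is one candidate viewpoint on the focus descriptor; a descriptor may appear in more than one cluster if it is strongly associated with several hints.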

Algorithm Overview: Example for the focus word "Pakistan".

Discussions around Descriptors - Example: Around "Pakistan" on a particular day. US (shades of blue), India (orange), Pakistan (red). Size indicates the STT score. This summarized visualization is only for presentation.

Discussions around Descriptors - Example: Around "G20" in Denmark across 4 days (color). Size indicates the STT scores.

User Interface and Visualizations: Browsing the when, what and where slices of social perceptions behind events.

Data Processed (as of March 2009)
Breakdown of crawled tweets (columns: Event, Total Crawled, Mapped; rows: Financial Crisis, G, Mumbai Incident).
Percentage of tweets by region for the Financial Crisis event:
Country  Percentage
US       76.42
GB       7.24
IN       1.86
CA       1.8
AU       1.63
FR       1.11
BR       1.11
SG       1.01
NL       0.9
IE       0.78

Events in Twitris: WISE 2009: Mumbai Terror Attack, G20. ISWC Challenge 2009: Health Care Reform, Iran Election. New features: integration with news, Wikipedia, and tweets mentioning descriptors. Current explorations and investigations: automated crawl, progressive TFIDF, streaming data analysis.

For more information: Try it on-line: s/socialmedia/

Development Team