Clarifying Sensor Anomalies using Social Network Feeds

Presentation transcript:

1 Clarifying Sensor Anomalies using Social Network Feeds
Prasanna Giridhar*, Tanvir Amin*, Lance Kaplan+, Jemin George+, Raghu Ganti++, Tarek Abdelzaher*
* University of Illinois at Urbana-Champaign   + U.S. Army Research Lab   ++ IBM Research, USA

2 INTRODUCTION
- Explosive growth in the deployment of physical sensors.
- Activities recorded by these sensors often deviate from the norm, for example:
  - Closure of a freeway due to a forest fire.
  - Change in building occupancy due to a shutdown.
- Unusual behavior tends to attract human attention and gets reported socially as well.

3 MOTIVATION
- Several past research works detect events in the physical as well as the social domain.
- Can we use social media as a tool for explaining the underlying cause of sensor anomalies?
- A system for identifying the discriminative social feeds that can be correlated with sensor anomalies.
- The more unusual the event, the higher the probability that it is reported socially.
- Evaluation performed on real-time traffic data.

4 System Work-flow
STEP 1: Initialization of the system
- Continuous stream of tweets, filtered using two parameters:
  - Keywords
  - Location
- Continuous stream of data from physical sensors
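A minimal sketch of the Step 1 tweet filter, assuming each tweet arrives with its text and a geotag, and that the "location" parameter means a fixed radius around a city center (the experiments use 30 miles). The `Tweet` class, `keep_tweet` function, keyword set, and coordinates are hypothetical illustrations, not the authors' code or any Twitter API.

```python
import math
import re
from dataclasses import dataclass

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


@dataclass
class Tweet:
    text: str
    lat: float
    lon: float


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def keep_tweet(tweet, keywords, center, radius_miles=30.0):
    """Hypothetical filter: keep a tweet if it mentions a tracked keyword
    and was posted within radius_miles of the city center."""
    tokens = set(re.findall(r"[a-z0-9]+", tweet.text.lower()))
    if not tokens & keywords:
        return False
    return haversine_miles(tweet.lat, tweet.lon, *center) <= radius_miles


# Illustrative values only: approximate downtown Los Angeles and a toy keyword set.
LA_CENTER = (34.05, -118.24)
KEYWORDS = {"traffic", "accident", "closed"}
```

Tokenizing with a regex rather than `split()` lets "traffic!!!" still match the keyword "traffic".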

5 Detecting Events in Sensors
STEP 2: Identification of sensor anomalies
- Run a black-box anomaly detection algorithm.
- Store attributes for the sensors classified positively (anomalous) by the algorithm.
- Cluster the sensors that provide redundant data.
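The slides treat the detector as a black box and do not say how redundant sensors are clustered. As one hypothetical reading, the sketch below stores the attributes of each positively classified sensor and merges anomalies on the same freeway and direction whose time intervals overlap. The `Anomaly` and `Cluster` fields and the overlap rule are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Anomaly:
    """Attributes stored for a sensor flagged by the black-box detector."""
    sensor_id: int
    freeway: str
    direction: str
    start: int  # minutes since midnight
    end: int


@dataclass
class Cluster:
    freeway: str
    direction: str
    start: int
    end: int
    sensors: list = field(default_factory=list)


def cluster_redundant(anomalies):
    """Assumed rule: anomalies on the same freeway and direction whose time
    intervals overlap are treated as redundant reports of one event."""
    clusters = []
    ordered = sorted(anomalies, key=lambda x: (x.freeway, x.direction, x.start))
    for a in ordered:
        last = clusters[-1] if clusters else None
        if (last is not None and last.freeway == a.freeway
                and last.direction == a.direction and a.start <= last.end):
            # Overlapping interval on the same road: extend the current cluster.
            last.sensors.append(a.sensor_id)
            last.end = max(last.end, a.end)
        else:
            clusters.append(Cluster(a.freeway, a.direction, a.start, a.end, [a.sensor_id]))
    return clusters
```

Sorting by start time first makes this a standard interval merge, so each cluster is built in a single pass.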

8 Discriminative Social Feeds
STEP 3: Identification of discriminative social feeds
- Social feeds often contain keywords describing an event.
- Example keywords: malaysian, airlines, 370

9 Keyword Signatures
Single keyword? Airlines

10 Keyword Signatures
Keyword pair? Malaysian, Airlines

11 Keyword Signatures
Keyword triplet? Malaysia, Airlines, 370 / Malaysia, Airlines, Satellite

12 Keyword Signatures
(Figure: signature profile on the collected Twitter data, showing events per signature and signatures per event for single keywords, keyword pairs, and keyword triplets.)
- Keyword pairs give the ideal 1-to-1 mapping between signatures and events.
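Keyword-pair signatures can be produced by tokenizing each tweet, dropping stopwords, and emitting every unordered pair of the remaining distinct tokens. This is a sketch under assumed preprocessing; the tokenizer regex and the small `STOPWORDS` set are illustrative, not taken from the slides.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical, deliberately tiny stopword list.
STOPWORDS = {"a", "an", "and", "at", "in", "is", "of", "on", "the", "to"}


def keyword_pairs(text):
    """Return the set of unordered keyword pairs (sorted tuples) in one tweet."""
    tokens = sorted({t for t in re.findall(r"[a-z0-9]+", text.lower())
                     if t not in STOPWORDS})
    return set(combinations(tokens, 2))


def count_pairs(tweets):
    """Number of tweets in which each keyword pair occurs."""
    counts = Counter()
    for text in tweets:
        counts.update(keyword_pairs(text))
    return counts
```

Sorting the tokens before pairing makes ("airlines", "malaysian") and ("malaysian", "airlines") the same key, and using a set per tweet counts each pair at most once per tweet.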

13 Possible Approaches
Problem: Given the keyword pairs from the current and past windows, how do we find the most discriminating subset?
- Difference in rate of occurrences: (traffic, jam) occurs 50 times today vs. a past average of 35; (drunk, kills) occurs 12 times today vs. a past average of 0. The routine pair wins (+15 vs. +12) even though the second pair is far more unusual.
- Increase in percentage: (traffic, jam) occurs 1 time today vs. a past average of 0; (drunk, kills) occurs 12 times today vs. a past average of 2. A zero past average makes the increase unbounded, so a single (traffic, jam) mention outranks a +500% rise.
- Overcome these disadvantages using information gain theory.
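The two baselines can be checked directly on the slide's example counts. The function names and the choice to return infinity for a zero past average are assumptions made for this sketch; the slides only name the two approaches.

```python
import math


def rate_difference(today, past_avg):
    """Baseline B1: absolute change in occurrence count."""
    return today - past_avg


def percentage_increase(today, past_avg):
    """Baseline B2: relative change in percent; unbounded when past_avg is 0."""
    if past_avg == 0:
        return math.inf if today > 0 else 0.0
    return 100.0 * (today - past_avg) / past_avg


def rank(counts, score):
    """Order keyword pairs by a baseline score, highest first."""
    return sorted(counts, key=lambda pair: score(*counts[pair]), reverse=True)


# (today, past average) counts taken from the slide's two examples.
B1_EXAMPLE = {("traffic", "jam"): (50, 35), ("drunk", "kills"): (12, 0)}
B2_EXAMPLE = {("traffic", "jam"): (1, 0), ("drunk", "kills"): (12, 2)}
```

Both `rank(B1_EXAMPLE, rate_difference)` and `rank(B2_EXAMPLE, percentage_increase)` put the everyday pair (traffic, jam) first, which is the weakness an information-gain score is meant to remove.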

14 Information Gain Theory and Entropy
- Entropy measures the randomness (uncertainty) of a variable.
- Using the conditional entropy, determine the information a keyword pair gives about an event:
  Information Gain = H(Y) − H(Y|X)
  Y: variable associated with the event; y = 0 (normal), y = 1 (anomalous)
  X: variable associated with the keyword pair; x = 0 (absent), x = 1 (present)
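A worked version of the formula, assuming each tweet in the current and past windows is one sample, labelled Y = 1 if it comes from the current (anomalous) window and X = 1 if it contains the keyword pair. The 2x2 count layout and the function names are assumptions for illustration, not the authors' implementation.

```python
import math


def entropy(counts):
    """Shannon entropy in bits of a discrete distribution given by raw counts."""
    total = sum(counts)
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            h -= p * math.log2(p)
    return h


def information_gain(n11, n10, n01, n00):
    """IG = H(Y) - H(Y|X) from tweet counts n_yx.

    n11: current window (y=1), pair present (x=1)
    n10: current window (y=1), pair absent  (x=0)
    n01: past window    (y=0), pair present (x=1)
    n00: past window    (y=0), pair absent  (x=0)
    """
    total = n11 + n10 + n01 + n00
    h_y = entropy([n11 + n10, n01 + n00])
    # H(Y|X) = sum over x of P(X=x) * H(Y | X=x)
    h_y_given_x = 0.0
    for y1, y0 in ((n11, n01), (n10, n00)):  # x = 1, then x = 0
        n_x = y1 + y0
        if n_x > 0:
            h_y_given_x += (n_x / total) * entropy([y1, y0])
    return h_y - h_y_given_x
```

A pair that appears in every current tweet and no past tweet removes all uncertainty (IG = 1 bit when the windows are balanced), while a pair spread evenly across both windows gives IG = 0.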

15 Rank the Unusual Events
STEP 4: Ranking discriminative events
- Identify the tweets containing discriminative pairs.
- Score each pair by its conditional entropy H(Y|X).
- The lower the entropy value, the higher the discriminating power.
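Step 4 can be sketched by computing H(Y|X) for every keyword pair and sorting in ascending order. Under an assumed per-tweet labelling (Y = 1 for the current window, X = 1 when the tweet contains the pair), every pair shares the same H(Y), so ascending H(Y|X) gives the same order as descending information gain. The count tables are toy numbers, not data from the study.

```python
import math


def entropy(counts):
    """Shannon entropy in bits of a distribution given by raw counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def conditional_entropy(n11, n10, n01, n00):
    """H(Y|X) from tweet counts n_yx (y: 1 current, 0 past; x: 1 pair present, 0 absent)."""
    total = n11 + n10 + n01 + n00
    h = 0.0
    for y1, y0 in ((n11, n01), (n10, n00)):  # x = 1, then x = 0
        if y1 + y0 > 0:
            h += (y1 + y0) / total * entropy([y1, y0])
    return h


def rank_pairs(tables):
    """Most discriminative (lowest H(Y|X)) keyword pairs first."""
    return sorted(tables, key=lambda pair: conditional_entropy(*tables[pair]))


# Toy counts (n11, n10, n01, n00) over 20 tweets: 10 current, 10 past.
TOY_TABLES = {
    ("traffic", "jam"): (5, 5, 5, 5),
    ("drunk", "kills"): (8, 2, 0, 10),
}
```

Here `rank_pairs(TOY_TABLES)` places ("drunk", "kills") ahead of ("traffic", "jam"), because the first pair appears only in the current window.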

16 Mapping Both Events
STEP 5: Matching tweets with sensor anomalies
We align the two data sources using the spatiotemporal properties associated with each event. For example:
- Sensor ID 40456 on I-15 Northbound shows unusual activity.
- Unusual tweet: “SFvSD game tonight, traffic!!!”
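One way to align the two sources is to keep the tweets posted within a distance and time window of the anomalous sensor and order them by discriminative score. The thresholds (`max_miles`, `slack_hours`), the data classes, and all coordinates are hypothetical; the slides give neither the actual matching criteria nor the location of sensor 40456.

```python
import math
from dataclasses import dataclass


@dataclass
class SensorAnomaly:
    sensor_id: int
    lat: float
    lon: float
    start: float  # hours since midnight
    end: float


@dataclass
class ScoredTweet:
    text: str
    lat: float
    lon: float
    time: float   # hours since midnight
    score: float  # e.g. H(Y|X) of its keyword pair; lower = more discriminative


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin(math.radians(lat2 - lat1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * 3958.8 * math.asin(math.sqrt(a))


def match(anomaly, tweets, max_miles=5.0, slack_hours=1.0):
    """Tweets near the sensor in space and time, most discriminative first."""
    hits = [
        t for t in tweets
        if haversine_miles(anomaly.lat, anomaly.lon, t.lat, t.lon) <= max_miles
        and anomaly.start - slack_hours <= t.time <= anomaly.end + slack_hours
    ]
    return sorted(hits, key=lambda t: t.score)
```

The time slack lets a tweet announcing an event shortly before the congestion (such as a game-night post) still match the anomaly.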

17 Output Explanations
STEP 6: Output the matched explanations
- The final step provides the explanations.
- A user interface lets users track unusual events on a per-day basis.

18 EXPERIMENTAL RESULTS
- Twitter feeds collected over 2 weeks (August 19 to September 1, 2013) within a 30-mile radius of three cities in California: Los Angeles, San Francisco, and San Diego.
- Physical sensor data retrieved from PeMS (Caltrans Performance Measurement System): 5-minute reports of flow, speed, occupancy, and delay.

19 EXPERIMENTAL RESULTS
Performance is measured using precision and mean average rank, comparing our information gain approach against baseline approaches.
- Table: Precision using different methods. B1 corresponds to difference in rate of occurrences and B2 to increase in percentage.
- Table: Average position of tweets from the top.
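The slides do not define the two metrics precisely. A plausible sketch, assuming each detected anomaly has a ranked list of candidate tweets and a set of tweets judged to be correct explanations: precision is the fraction of anomalies whose top-ranked tweet is correct, and average position is the mean 1-based rank of the first correct tweet. Both definitions and the function names are assumptions.

```python
def precision_at_1(rankings, correct):
    """Fraction of anomalies whose top-ranked tweet is a correct explanation."""
    hits = sum(1 for ranked, good in zip(rankings, correct)
               if ranked and ranked[0] in good)
    return hits / len(rankings)


def average_position(rankings, correct):
    """Mean 1-based position of the first correct tweet, over anomalies that have one."""
    positions = []
    for ranked, good in zip(rankings, correct):
        for pos, tweet in enumerate(ranked, start=1):
            if tweet in good:
                positions.append(pos)
                break
    return sum(positions) / len(positions) if positions else float("nan")
```

Under these definitions, a lower average position is better; a value below 2 would mean the correct explanation usually sits in the top two tweets.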

20 INTERESTING EVENTS
Sensor anomaly detected:
- Highway I-80 Eastbound in SF
- Landmark: Bay Bridge
- Duration: 4 days

21 INTERESTING EVENTS

22 INTERESTING EVENTS
US-101 blockage due to bomb squad in LA

23 INTERESTING EVENTS
Traffic on I-15 N due to game in SD

24 CONCLUSION
- Abnormal behavior is also recorded in social media.
- A tool to explain these abnormalities.
- Major activities explained with high precision.
- Explanations ranked among the top two tweets.

25 Future Work
- Scalability issues
- Credibility of social feeds
- Geolocalization of tweets

26 THANK YOU Q+A