Tru-Alarm: Trustworthiness Analysis of Sensor Network in Cyber Physical Systems Lu-An Tang, Xiao Yu, Sangkyum Kim, Jiawei Han, Chih-Chieh Hung, Wen-Chih.

Slides:



Advertisements
Similar presentations
Applications of one-class classification
Advertisements

A General Framework for Mining Concept-Drifting Data Streams with Skewed Distributions Jing Gao Wei Fan Jiawei Han Philip S. Yu University of Illinois.
Yinyin Yuan and Chang-Tsun Li Computer Science Department
Ranking Outliers Using Symmetric Neighborhood Relationship Wen Jin, Anthony K.H. Tung, Jiawei Han, and Wei Wang Advances in Knowledge Discovery and Data.
A Hierarchical Multiple Target Tracking Algorithm for Sensor Networks Songhwai Oh and Shankar Sastry EECS, Berkeley Nest Retreat, Jan
Active Learning for Streaming Networked Data Zhilin Yang, Jie Tang, Yutao Zhang Computer Science Department, Tsinghua University.
Imbalanced data David Kauchak CS 451 – Fall 2013.
Data Mining Methodology 1. Why have a Methodology  Don’t want to learn things that aren’t true May not represent any underlying reality ○ Spurious correlation.
FM-BASED INDOOR LOCALIZATION TsungYun 1.
Multidimensional Analysis of Atypical Events in Cyber-Physical Data Lu-An Tang, Xiao Yu, Sangkyum Kim, Jiawei Han, Wen-Chih Peng, Yizhou Sun.
Retrieving k-Nearest Neighboring Trajectories by a Set of Point Locations Lu-An Tang, Yu Zheng, Xing Xie, Jing Yuan, Xiao Yu, Jiawei Han University of.
Patch to the Future: Unsupervised Visual Prediction
Constructing Popular Routes from Uncertain Trajectories Authors of Paper: Ling-Yin Wei (National Chiao Tung University, Hsinchu) Yu Zheng (Microsoft Research.
Constructing Popular Routes from Uncertain Trajectories Ling-Yin Wei 1, Yu Zheng 2, Wen-Chih Peng 1 1 National Chiao Tung University, Taiwan 2 Microsoft.
A Generic Framework for Handling Uncertain Data with Local Correlations Xiang Lian and Lei Chen Department of Computer Science and Engineering The Hong.
Department of Computer Science, University of Maryland, College Park, USA TexPoint fonts used in EMF. Read the TexPoint manual before you delete this box.:
Neighborhood Formation and Anomaly Detection in Bipartite Graphs Jimeng Sun Huiming Qu Deepayan Chakrabarti Christos Faloutsos Speaker: Jimeng Sun.
Report on Intrusion Detection and Data Fusion By Ganesh Godavari.
1 Learning to Detect Objects in Images via a Sparse, Part-Based Representation S. Agarwal, A. Awan and D. Roth IEEE Transactions on Pattern Analysis and.
On Appropriate Assumptions to Mine Data Streams: Analyses and Solutions Jing Gao† Wei Fan‡ Jiawei Han† †University of Illinois at Urbana-Champaign ‡IBM.
Communication-Efficient Distributed Monitoring of Thresholded Counts Ram Keralapura, UC-Davis Graham Cormode, Bell Labs Jai Ramamirtham, Bell Labs.
Margin Based Sample Weighting for Stable Feature Selection Yue Han, Lei Yu State University of New York at Binghamton.
1 A Dynamic Clustering and Scheduling Approach to Energy Saving in Data Collection from Wireless Sensor Networks Chong Liu, Kui Wu and Jian Pei Computer.
Feature Selection and Its Application in Genomic Data Analysis March 9, 2004 Lei Yu Arizona State University.
Query-Based Outlier Detection in Heterogeneous Information Networks Jonathan Kuck 1, Honglei Zhuang 1, Xifeng Yan 2, Hasan Cam 3, Jiawei Han 1 1 University.
TelosCAM: Identifying Burglar Through Networked Sensor-Camera Mates with Privacy Protection Presented by Qixin Wang Shaojie Tang, Xiang-Yang Li, Haitao.
Jacinto C. Nascimento, Member, IEEE, and Jorge S. Marques
Real Time Abnormal Motion Detection in Surveillance Video Nahum Kiryati Tammy Riklin Raviv Yan Ivanchenko Shay Rochel Vision and Image Analysis Laboratory.
Introduction To Wireless Sensor Networks Wireless Sensor Networks A wireless sensor network (WSN) is a wireless network consisting of spatially distributed.
Jinhui Tang †, Shuicheng Yan †, Richang Hong †, Guo-Jun Qi ‡, Tat-Seng Chua † † National University of Singapore ‡ University of Illinois at Urbana-Champaign.
1. Introduction Generally Intrusion Detection Systems (IDSs), as special-purpose devices to detect network anomalies and attacks, are using two approaches.
Extracting Places and Activities from GPS Traces Using Hierarchical Conditional Random Fields Yong-Joong Kim Dept. of Computer Science Yonsei.
Cluster based fact finders Manish Gupta, Yizhou Sun, Jiawei Han Feb 10, 2011.
Efficient Gathering of Correlated Data in Sensor Networks
Genetic Regulatory Network Inference Russell Schwartz Department of Biological Sciences Carnegie Mellon University.
Outlier Detection Using k-Nearest Neighbour Graph Ville Hautamäki, Ismo Kärkkäinen and Pasi Fränti Department of Computer Science University of Joensuu,
Tracking with Unreliable Node Sequences Ziguo Zhong, Ting Zhu, Dan Wang and Tian He Computer Science and Engineering, University of Minnesota Infocom 2009.
Department of Computer Science City University of Hong Kong Department of Computer Science City University of Hong Kong 1 A Statistics-Based Sensor Selection.
Energy-Aware Scheduling with Quality of Surveillance Guarantee in Wireless Sensor Networks Jaehoon Jeong, Sarah Sharafkandi and David H.C. Du Dept. of.
Influence Maximization in Dynamic Social Networks Honglei Zhuang, Yihan Sun, Jie Tang, Jialin Zhang, Xiaoming Sun.
Vladyslav Kolbasin Stable Clustering. Clustering data Clustering is part of exploratory process Standard definition:  Clustering - grouping a set of.
Report on Intrusion Detection and Data Fusion By Ganesh Godavari.
1 CS 391L: Machine Learning: Experimental Evaluation Raymond J. Mooney University of Texas at Austin.
CS 533 Information Retrieval Systems.  Introduction  Connectivity Analysis  Kleinberg’s Algorithm  Problems Encountered  Improved Connectivity Analysis.
Mining Social Network for Personalized Prioritization Language Techonology Institute School of Computer Science Carnegie Mellon University Shinjae.
Exploiting Context Analysis for Combining Multiple Entity Resolution Systems -Ramu Bandaru Zhaoqi Chen Dmitri V.kalashnikov Sharad Mehrotra.
Xiangnan Kong,Philip S. Yu Multi-Label Feature Selection for Graph Classification Department of Computer Science University of Illinois at Chicago.
Page 1 Inferring Relevant Social Networks from Interpersonal Communication Munmun De Choudhury, Winter Mason, Jake Hofman and Duncan Watts WWW ’10 Summarized.
LANGUAGE MODELS FOR RELEVANCE FEEDBACK Lee Won Hee.
Characterizing the Uncertainty of Web Data: Models and Experiences Lorenzo Blanco, Valter Crescenzi, Paolo Merialdo, Paolo Papotti Università degli Studi.
Anant Pradhan PET: A Statistical Model for Popular Events Tracking in Social Communities Cindy Xide Lin, Bo Zhao, Qiaozhu Mei, Jiawei Han (UIUC)
Mohamed Hefeeda 1 School of Computing Science Simon Fraser University, Canada Efficient k-Coverage Algorithms for Wireless Sensor Networks Mohamed Hefeeda.
Secure In-Network Aggregation for Wireless Sensor Networks
Presented By, Shivvasangari Subramani. 1. Introduction 2. Problem Definition 3. Intuition 4. Experiments 5. Real Time Implementation 6. Future Plans 7.
Consensus Extraction from Heterogeneous Detectors to Improve Performance over Network Traffic Anomaly Detection Jing Gao 1, Wei Fan 2, Deepak Turaga 2,
KNN & Naïve Bayes Hongning Wang Today’s lecture Instance-based classifiers – k nearest neighbors – Non-parametric learning algorithm Model-based.
Computer Science 1 Using Clustering Information for Sensor Network Localization Haowen Chan, Mark Luk, and Adrian Perrig Carnegie Mellon University
A Security Framework with Trust Management for Sensor Networks Zhiying Yao, Daeyoung Kim, Insun Lee Information and Communication University (ICU) Kiyoung.
Ensemble Methods Construct a set of classifiers from the training data Predict class label of previously unseen records by aggregating predictions made.
NC-BSI: TASK 3.5: Reduction of False Alarm Rates from Fused Data Problem Statement/Objectives Research Objectives Intelligent fusing of data from hybrid.
Anomaly Detection. Network Intrusion Detection Techniques. Ştefan-Iulian Handra Dept. of Computer Science Polytechnic University of Timișoara June 2010.
Graph Indexing From managing and mining graph data.
Intro. ANN & Fuzzy Systems Lecture 16. Classification (II): Practical Considerations.
Computational methods for inferring cellular networks II Stat 877 Apr 17 th, 2014 Sushmita Roy.
Ariel Fuxman, Panayiotis Tsaparas, Kannan Achan, Rakesh Agrawal (2008) - Akanksha Saxena 1.
KNN & Naïve Bayes Hongning Wang
On-line Detection of Real Time Multimedia Traffic
Liang Chen Advisor: Gagan Agrawal Computer Science & Engineering
On Discovery of Traveling Companions from Streaming Trajectories
Lecture 16. Classification (II): Practical Considerations
Presentation transcript:

Tru-Alarm: Trustworthiness Analysis of Sensor Network in Cyber Physical Systems Lu-An Tang, Xiao Yu, Sangkyum Kim, Jiawei Han, Chih-Chieh Hung, Wen-Chih Peng

Introduction A cyber-physical system (CPS) is a system featuring a tight combination of, and coordination between, the system's computational and physical resources – Definition by NSF CPS = sensor network + data mining module Traffic monitoring system healthcare system battlefield surveillance, etc Major Problem: Data reliability, especially the trustworthiness of anomalies/outliers/alarms

Motivation Example: Motion Detector Battle Network: Deploy sensor network to detect hostile object and actions Problem: Sensors are easily damage or influenced by irrelevant activities – generate false alarms

Problem Definition Given a CPS dataset including both alarming and normal data records, find out the trustworthy alarms – We focus on the trustworthiness tasks for alarming records Formal Definition: Let R = {r(s1, t1), r(s1, t2), . . . r(sm, tn) } be a CPS dataset, be the set of alarm records, given a trustworthy threshold δt, the Tru-Alarm’s task is to find out the trustworthy alarms ra(s, t) with τ(ra) > δt

Challenges in Trustworthiness Analysis Huge size: A typical CPS contains hundreds of sensors and millions of data records Unreliable Data: Buonadonna et.al: 51% of the data are faulty; Szewzyk et.al: 60% of the data are faulty in a deployment in green lake No/Rare Training Sets: it is costly to manually label the large dataset Conflicts of Sensors: Well deployed sensor network has reasonable redundancies (k-coverage). Conflicts among reliable and unreliable sensors

Two Tasks The position of intrusion object is important for alarm trustworthiness analysis However, it is hard to get the accurate positions from noisy data with lots of false alarms Observation: Mutual Enhancement More trustworthiness the alarms are, easier to estimate object locations More accurate the object positions are, better trustworthiness analysis for alarms

Main Framework Object-Alarm Trustworthiness Analysis Framework Estimate intrusion object locations from noisy data (Not accurate: many false positives: ghost objects that do not really exist) Use such objects to verify the alarms – find out the false ones Refine the object locations with trustable alarms

Estimate object locations Some state-of-art works in object tracking Commonly used approach: sampling the object positions in the ranges of alarming sensors

Build up links of objects and alarms Construct a bipartite graph of object (positions) and sensor (records) For each sensor s: the monitored objects Os For each object o: the monitoring sensors So Two kinds of links: Alarm-object Link Normal_record-obj link

Trustworthiness Inference in the Graph The weight of link between object o and alarm ra(si,t): how likely the alarm is caused by such object – the conditional trustworthiness τ(ra(si,t)|o) Estimate τ(ra(si,t)|o): it is determined by the coherence of other sensors’ readings in the same monitoring sensor set of So

Compute object trustworthiness A low τ(ra|o) indicates two possibilities: ra is a false alarm ra is a true alarm, but it is not caused by object o In either case, object o is not likely to be a real one; a real object should cause alarms for all its monitoring sensors o’s trustworthiness τ(o) is the average of all its conditional alarm trustworthiness

Compute alarm trustworthiness Even there is only one real object that causes the alarm, such alarm is still meaningful If an alarm has different conditional trustworthiness with different objects, we will take the maximum one as τ(ra)

Running Example I

Running Example II

Experiment Setup Synthetic a battle field with hundreds of sensors Objects (i.e., tanks and soliders) move across the battlefield Random false alarms added in

Precision and Recall with kNN methods

Thank You Very Much!

More Introduction Related works Trustworthiness Inference Experiments

CPS Sensors for Motion Detection The CPSs are deployed in different scenarios with various types of sensors In the scenario of motion detection, several types of sensors are used We use common sensors in this study, however, the method also works for other types of sensors

Related Works: Spatial Similarity Assumption: The sensors that are spatially close to each other should report the similar readings (Krishnamachari et. al 2004) kNN Approach Setup a neighbor threshold k Judge the alarm trustworthiness by neighboring information Suppose an alarm sensor s has l alarming neighbors in its kNN, if l/k > δt, the alarm is trustworthiness, else it is not

Problem of Spatial Similarity based Approach The edge sensor’s alarms may be ignored Hard to determine k, k is indeed different by sensors S5 is an edge sensor, its 1st and 2nd NN s2 and s3 are normal sensors

Related Works: Temporal Similarity Assumption: The sensors that reports alarms in the same time are likely to report together in the future (Xiao et. al 2007) Train a correlation model from historical data, test the alarms by such model Problem: The noisy data are in a large portion (30% -- 50%) The damaged sensors are likely to report false alarms for a long time Some unreliable sensors and false alarms may have such strong correlations with real alarms

More Introduction Related works Trustworthiness Inference Details Experiments

Estimate coherence of two sensor records I To estimate the coherence score between two sensors’ records rj and ri, the system should take count in both their reading differences and positions ri= f(dist(si, o), Ω(o)),where Ω(o) is object o’s signal strength Acoustic sensor: ri= α*Ω(o)/dist(si, o)2+β Estimate Ωi(o) by ri : Ωi(o) = f-1(dist(sj, o), rj) rj’= f(dist(sj, o), Ωi(o)), -- the expect value of rj from ri

Estimate coherence of two sensor records II Coherence coh(ra(si, t), r(sj, t)) is judged by the difference of the expected reading and real value σ is the standard deviation of monitoring sensor set So If si’ reading is the same as expected value, the coherence score reaches the maximum of 1; if the difference is larger than σ, the score is set to 0

Tru-Alarm Algorithm Initialize the trustworthiness for each object and alarm For each object o, first retrieves its related data records from the object-alarm graph, and computes the conditional alarm trustworthiness The object’s trustworthiness is then computed as the average of its conditional alarm trustworthiness The system groups the conditional alarm trustworthiness by alarm and select the max one as τ(ra)

More Introduction Related works Efficient Trustworthiness Inference Experiments

Efficient Trustworthiness Analysis The time complexity of Tru-Alarm is linear in the number of objects The efficiency will be a problem when there are a large number of objects generated by the sampling algorithm Most objects turn out to be low trustworthy: In the running example, there are 10 objects but only one is trustworthy Can we prune the untrustworthy objects in advance?

Upperbound of τ(o) Let o be an object, So be its monitoring set and Rao be the set of related alarms. τ(o)’s upper-bound τ(o) = |Rao|/|So|

Improved Tru-alarm Algorithm Initialize the trustworthiness for each object and alarm For each object o, first compute its upper-bound, if it is less than δt, then prune it Retrieves o’s related data records from the object-alarm graph, and computes the conditional alarm trustworthiness The object’s trustworthiness is then computed as the average of its conditional alarm trustworthiness Groups the conditional alarm trustworthiness by alarm and select the max one as τ(ra)

Time Cost