2015-5-5 Multidimensional Analysis of Atypical Events in Cyber-Physical Data Lu-An Tang, Xiao Yu, Sangkyum Kim, Jiawei Han, Wen-Chih Peng, Yizhou Sun.

1 2015-5-5 Multidimensional Analysis of Atypical Events in Cyber-Physical Data Lu-An Tang, Xiao Yu, Sangkyum Kim, Jiawei Han, Wen-Chih Peng, Yizhou Sun

2 2015-5-5 2 DAIS UIUC Outline: Introduction, Backgrounds, Model Construction, Query Processing, Experiments

3 Introduction. Cyber-physical system (CPS): integrates physical devices (e.g., sensors, cameras) with cyber components to form a situation-aware analytical system. Many promising applications: traffic observation, intruder/motion detection, battlefield surveillance, remote healthcare. Key task: analyze atypical data with multidimensional information.

4 Motivation Example I. Traffic monitoring system, a typical CPS: thousands of inductive loop sensors, placed every few miles along highways, monitor traffic 24 hours a day, 7 days a week, and report congestion.

5 Motivation Example II. Questions from transportation officers: When does congestion usually happen downtown? Where does congestion happen on weekdays? In the past three months, which road was the most seriously congested, and how did those congestions start? Traditional SQL queries cannot answer these questions.

6 Our Contribution. Transportation officers need results that are summarized, self-organized, and succinct, delivered in a short time. Our goal: construct a data model for atypical data in CPS, and support efficient query processing with that model.

7 Challenges. Massive data: thousands of sensors generate gigabytes, even terabytes, of data. Complex events: an atypical event is a dynamic process influencing multiple spatial regions. How should such an event be represented? This calls for a new measure/model. Effectiveness and efficiency: if the query range is large, many events are involved, and the significant ones must be retrieved in a short time. This calls for a new algorithm.

8 Our Contribution. Introduce techniques to discover atypical events and summarize them as atypical micro-clusters. Integrate similar micro-clusters into macro-clusters to give the big picture. Construct the data model, the atypical cluster forest. Use a guiding algorithm to retrieve the significant clusters efficiently.

9 Outline: Introduction, Backgrounds, Model Construction, Query Processing, Experiments

10 CPS Systems in Traffic Applications. PeMS: collects data on California highways. CarWeb: collects real-time GPS data from cars. Google Traffic: toolkit on Google Maps. CubeView by Shekhar et al.: implements traditional OLAP on traffic data. AITVS: based on CubeView, uses two more distinct views to support investigation. Most focus on SQL-based queries and lack analytical power. They are built on the whole dataset, which incurs huge I/O overhead and dwarfs the atypical data.

11 Other Spatial OLAP Techniques. Spatial Cube by Stefanovic et al.: dimension members are spatially referenced and can be represented on a map. Trajectory Cube by Giannotti et al.: includes temporal, spatial, demographic, and techno-graphic dimensions, with two kinds of measures: spatial and numerical. Flow Cube by Gonzalez et al.: analyzes item flows in RFID applications. These techniques target different objects and cannot be used directly for this problem.

12 Preliminaries. Atypical record: (s, t, f(s,t)), where s is the sensor, t is the reported time, and f(s,t) is the severity measure. Analytical query: Q(W, T, etc.), where W is the spatial region and T is the time period; there may be query conditions on other dimensions. Returning only the total severity is too abstract.
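
As a minimal sketch of these definitions, the record triple and the "total severity" answer to Q(W, T) might look like the following. The class and function names, the integer minute timestamps, and the half-open time interval are assumptions made for illustration; the slides do not specify them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AtypicalRecord:
    sensor: str       # s: sensor id
    time: int         # t: reported time (assumed: minutes from an arbitrary origin)
    severity: float   # f(s, t): severity measure


def total_severity(records, region, t_start, t_end):
    """Answer Q(W, T) with a single number: the summed severity of all
    records whose sensor lies in W and whose time lies in [t_start, t_end)."""
    return sum(
        r.severity
        for r in records
        if r.sensor in region and t_start <= r.time < t_end
    )


# Toy records, invented for illustration only.
records = [
    AtypicalRecord("s1", 0, 5.0),
    AtypicalRecord("s1", 5, 4.0),
    AtypicalRecord("s2", 5, 3.0),
    AtypicalRecord("s3", 60, 2.0),
]

print(total_severity(records, {"s1", "s2"}, 0, 10))  # 12.0
```

A single number like this hides where and when the severity accumulated, which is the sense in which the slide calls it too abstract.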

13 Problem Formulation. Let R be the CPS dataset. The task is to retrieve the atypical events from R, design a measure to represent each event, and integrate the information of multiple events. Process analytical query Q online. We assume the atypical criteria are given and the atypical dataset can be acquired in advance.

14 System Overview

15 Outline: Introduction, Backgrounds, Model Construction, Query Processing, Experiments

16 Atypical Event. Consider an atypical event: congestion in a traffic monitoring system. It starts from a single street segment, expands along the road and influences nearby roads, and may cover hundreds of road segments at full size. The data records in a congestion are spatially close and temporally related.

17 Retrieve the Atypical Event. Scan the dataset, retrieve the atypical records, and group them by a time threshold and a distance threshold. An atypical event is a set of atypical records. Its size is not bounded (or bounded only by the size of dataset R), so it is difficult to represent and integrate. Too detailed: not a good measure.
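
One way to picture the grouping step is as connected components: two atypical records are linked when they are within both thresholds, and each component is one event. The sketch below makes several assumptions that are not in the slides: sensors are reduced to a one-dimensional position (e.g., a mile marker), records are (position, time, severity) tuples, and a simple union-find with a quadratic pair scan is used.

```python
def group_events(records, dist_thr, time_thr):
    """Group (position, time, severity) records into events.
    Records within both thresholds of each other are linked; an event
    is a connected component of linked records."""
    parent = list(range(len(records)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            p1, t1, _ = records[i]
            p2, t2, _ = records[j]
            if abs(p1 - p2) <= dist_thr and abs(t1 - t2) <= time_thr:
                parent[find(i)] = find(j)

    events = {}
    for i, rec in enumerate(records):
        events.setdefault(find(i), []).append(rec)
    return list(events.values())


# Toy data: three records chained along one road, one far away.
records = [(10.0, 0, 5), (10.5, 5, 4), (11.0, 10, 3), (40.0, 5, 6)]
events = group_events(records, dist_thr=1.0, time_thr=5)
print(sorted(len(e) for e in events))  # [1, 3]
```

The first and third records are not directly within the time threshold, but they end up in the same event through the second one, which mirrors how a congestion spreads along a road over time.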

18 Atypical Micro-Cluster. Aggregate the atypical records one dimension at a time: summarize the total severity by sensor (spatial feature) and by time window (temporal feature). The size is bounded by the total number of sensors and time windows, while detailed information is still kept.
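
A minimal sketch of this summarization, assuming an event is a list of (sensor, time, severity) tuples and that time windows are fixed-width buckets of 15 time units (both are assumptions for illustration):

```python
from collections import Counter


def micro_cluster(event, window=15):
    """Summarize an event as two bounded features:
    total severity per sensor and total severity per time window."""
    spatial = Counter()
    temporal = Counter()
    for sensor, t, severity in event:
        spatial[sensor] += severity
        temporal[t // window] += severity  # index of the window containing t
    return spatial, temporal


# Toy event, invented for illustration only.
event = [("s1", 0, 5), ("s1", 5, 5), ("s2", 10, 3), ("s2", 20, 2)]
spatial, temporal = micro_cluster(event)
print(dict(spatial))   # {'s1': 10, 's2': 5}
print(dict(temporal))  # {0: 13, 1: 2}
```

However many records the event contains, each feature can hold at most one entry per sensor or per time window, which is the bound the slide refers to.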

19 Example in Congestion Event

20 Integrate the Micro-Clusters. Each micro-cluster represents an individual event. Atypical events may happen in similar places and at similar times; for example, the 10E highway is congested in evening rush hours on weekdays. For analytical purposes, it is helpful to group similar congestions as a whole. Two sub-problems: which ones to merge, and how to merge them?

21 Similarity Measure for Atypical Clusters. Basic principles: consider similarity on multiple dimensions, where users may specify a preference weight; weight the measure by the data themselves, using severity as the weight (e.g., if sensor s1 reports higher severities in the clusters than s2, then the weight of s1 is higher).
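
The slide states the principles but not the formula. As a stand-in that follows both principles, the sketch below uses a weighted Jaccard overlap per dimension (so high-severity sensors and windows dominate) and blends the two dimensions with a user preference weight. The formula, the function names, and the dictionary layout are assumptions, not the authors' measure.

```python
def weighted_jaccard(a, b):
    """Severity-weighted overlap of two features (dicts item -> severity):
    sum of per-item minima divided by sum of per-item maxima."""
    keys = set(a) | set(b)
    num = sum(min(a.get(k, 0), b.get(k, 0)) for k in keys)
    den = sum(max(a.get(k, 0), b.get(k, 0)) for k in keys)
    return num / den if den else 0.0


def similarity(c1, c2, spatial_weight=0.5):
    """Blend spatial and temporal similarity with a preference weight in [0, 1]."""
    s = weighted_jaccard(c1["spatial"], c2["spatial"])
    t = weighted_jaccard(c1["temporal"], c2["temporal"])
    return spatial_weight * s + (1 - spatial_weight) * t


# Toy clusters, invented for illustration only.
c1 = {"spatial": {"s1": 100, "s2": 20}, "temporal": {"17:00": 80, "18:00": 40}}
c2 = {"spatial": {"s1": 30, "s3": 40}, "temporal": {"17:00": 50, "18:00": 20}}

print(round(similarity(c1, c2), 3))                      # 0.385
print(round(similarity(c1, c2, spatial_weight=1.0), 4))  # 0.1875
```

Raising `spatial_weight` makes the measure favor clusters that share sensors; lowering it favors clusters that share time windows.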

22 Cluster Integration. For two clusters C1 and C2, the system carries out aggregation on the feature of each dimension: for the common items, sum up their severities; keep the non-overlapping items. Example: C1 {s1, 100 min; s2, 20 min}, C2 {s1, 30 min; s3, 40 min}, merged cluster C23 {s1, 130 min; s2, 20 min; s3, 40 min}. The spatial and temporal features are algebraic, so they are efficient to aggregate.
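
The merge rule can be sketched directly on dictionaries that map an item (sensor or time window) to its severity. The function name and the dictionary representation are assumptions; the numbers are the ones from the slide's example.

```python
def merge_feature(f1, f2):
    """Merge two features: sum severities of common items,
    keep non-overlapping items unchanged."""
    merged = dict(f1)
    for item, severity in f2.items():
        merged[item] = merged.get(item, 0) + severity
    return merged


c1 = {"s1": 100, "s2": 20}
c2 = {"s1": 30, "s3": 40}
print(merge_feature(c1, c2))  # {'s1': 130, 's2': 20, 's3': 40}
```

Because the merge is a per-item sum, merging clusters in any order gives the same totals, so a higher-level cluster can be built from already-merged ones without going back to the raw records.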

23 Macro-Clusters. Macro-clusters are generated by merging micro-clusters. Similarities are then computed among the macro-clusters, and even larger ones can be generated.
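
A rough sketch of this repeated merging, written as a greedy agglomerative loop: find the most similar pair, merge it if it clears a threshold, and repeat. The stopping threshold, the spatial-only weighted Jaccard similarity, and all names are assumptions for illustration; the slides do not give the exact procedure.

```python
from itertools import combinations


def sim(a, b):
    """Assumed similarity: severity-weighted Jaccard on one feature."""
    keys = set(a) | set(b)
    den = sum(max(a.get(k, 0), b.get(k, 0)) for k in keys)
    return sum(min(a.get(k, 0), b.get(k, 0)) for k in keys) / den if den else 0.0


def merge(a, b):
    """Sum severities of common items, keep the rest."""
    out = dict(a)
    for k, v in b.items():
        out[k] = out.get(k, 0) + v
    return out


def build_macro_clusters(clusters, threshold):
    """Repeatedly merge the most similar pair until no pair reaches the threshold."""
    clusters = [dict(c) for c in clusters]
    while len(clusters) > 1:
        # Every round compares all pairs: quadratic in the number of clusters.
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: sim(clusters[p[0]], clusters[p[1]]),
        )
        if sim(clusters[i], clusters[j]) < threshold:
            break
        merged = merge(clusters[i], clusters[j])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters


# Toy micro-clusters, invented for illustration only.
micro = [{"s1": 100, "s2": 20}, {"s1": 90, "s2": 30}, {"s7": 50}]
print(build_macro_clusters(micro, threshold=0.5))
# [{'s7': 50}, {'s1': 190, 's2': 50}]
```

The all-pairs comparison in each round is what makes the cost grow quickly with the number of micro-clusters covered by a query.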

24 Clustering Forest. The clusters form a tree hierarchy. Different aggregation paths (preferences on dimensions) form a cluster forest.

25 Outline: Introduction, Backgrounds, Model Construction, Query Processing, Experiments

26 The Efficiency Problem of Online Queries. It is usually not realistic to materialize the entire data forest. Only some intermediate results (i.e., the micro-clusters in lower-level cells) are pre-computed (partial materialization). The time complexity of the cluster integration algorithm is O(n^2), so query efficiency suffers if n is large. The analytical query Q(W, T) usually covers a large region over a long time, so n is indeed large.

27 The Effectiveness Problem. In the result, only a few significant macro-clusters are generated. The rest are trivial ones that cannot be aggregated with others.

28 Pruning-Beforehand Strategy. Filter out the insignificant micro-clusters. However, insignificant micro-clusters may integrate together and generate significant macro-clusters. Can we foretell which micro-clusters will contribute to significant macro-clusters?

29 Red-Zone Guided Clustering. The total severity in a specified region is fast to compute, so: select the regions with high severities (red zones); filter out the micro-clusters located outside those red zones; keep only the ones inside or intersecting red zones (where the significant macro-clusters may be located).
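
The filtering steps above can be sketched as follows. The sketch assumes each sensor belongs to one named region through a lookup table, that micro-clusters are dictionaries from sensor to severity, and that a fixed severity threshold defines a red zone; none of these details come from the slides.

```python
from collections import defaultdict


def red_zone_filter(micro_clusters, region_of, severity_threshold):
    """Return (red_zones, kept): regions whose total severity reaches the
    threshold, and the micro-clusters that touch at least one of them."""
    region_total = defaultdict(float)
    for mc in micro_clusters:
        for sensor, severity in mc.items():
            region_total[region_of[sensor]] += severity

    red_zones = {r for r, total in region_total.items() if total >= severity_threshold}
    kept = [
        mc for mc in micro_clusters
        if any(region_of[sensor] in red_zones for sensor in mc)
    ]
    return red_zones, kept


# Toy data, invented for illustration only.
region_of = {"s1": "A", "s2": "A", "s3": "B", "s4": "C"}
micro = [{"s1": 40}, {"s2": 35}, {"s2": 10, "s3": 5}, {"s4": 20}]

red_zones, kept = red_zone_filter(micro, region_of, severity_threshold=60)
print(sorted(red_zones), len(kept))  # ['A'] 3
```

In this toy input no single micro-cluster reaches the threshold on its own, yet three of them are kept because region A is collectively severe; pruning each micro-cluster by its own severity would have discarded them.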

30 Red-Zone Guided Clustering Example

31 Outline: Introduction, Backgrounds, Model Construction, Query Processing, Experiments

32 Experiment Setup. PeMS datasets from UC Berkeley: 1 year of traffic data from 4,076 loop detectors on 38 freeways in California, 54 GB in total. Hardware: Intel 2200 Dual CPU @ 2.20 GHz and 2.19 GHz, 1.98 GB RAM, Windows XP SP2. All algorithms are implemented in Java.

33 Model Construction. Comparing Atypical Cluster (AC) with Original CubeView (OC) and Modified CubeView (MC): AC is an order of magnitude faster than OC.

34 Query Efficiency. All: do not prune; Pru: prune beforehand; Gui: guided clustering. Gui costs 20% of the time of All and is close to Pru.

35 Query Effectiveness. Ground truth: generated by All. Pru may miss truly significant macro-clusters, but Gui can guarantee the recall.
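
For reference, recall here is the fraction of ground-truth significant macro-clusters that a method also returns. The snippet below is only a sketch of that definition; the cluster ids are hypothetical placeholders and do not correspond to any result in the slides.

```python
def recall(found, truth):
    """Fraction of ground-truth clusters that the method also found."""
    truth = set(truth)
    if not truth:
        return 1.0  # nothing to find, so nothing was missed
    return len(set(found) & truth) / len(truth)


# Hypothetical cluster ids, for illustration only.
truth = {"m1", "m2", "m3", "m4"}          # e.g., what an unpruned run returns
found_by_some_method = {"m1", "m3", "m9"}

print(recall(found_by_some_method, truth))  # 0.5
```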

36 Conclusions. We have investigated the problem of multidimensional analysis of atypical events in CPS. The atypical cluster is designed to represent the event and serve as the measure for the data model. The red-zone algorithm is proposed to retrieve the significant clusters for analytical queries. Performance was evaluated on large real datasets. Thank You Very Much! Any Questions?

