Published by Samantha Houston. Modified over 9 years ago.
1
Assembler: Efficient Discovery of Spatial Co-evolving Patterns in Massive Geo-sensory Data. Sheng QIAN, 2015-08-01, SIGKDD 2015
2
Content
1. Introduction
2. Problem Description
3. The Assembler Method
   Stage I: Detecting Individual Evolutions
   Stage II: SCP Generation
   Time and space complexity
4. Experiment
3
Introduction Spatial Co-evolving Patterns (SCP), e.g., AQI sensors in Beijing
4
Introduction Challenges
1. Interesting evolutions are often flooded by trivial fluctuations.
2. The pattern search space is extremely large.
5
Problem Description Our Interest
6
Problem Description Symbol
$S = \{s_1, s_2, \ldots, s_m\}$: sensors
$l_i$: location of $s_i$
$T = \{t_1, t_2, \ldots, t_n\}$: time domain
7
Problem Description Definitions
8
Definitions
9
Definitions
10
Method: I. Detecting Individual Evolutions Haar Wavelet Transformation
11
Method: I. Detecting Individual Evolutions Haar Wavelet Transformation $c_{ij}$
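A minimal sketch of a Haar decomposition may help make this stage concrete. It assumes the pairwise average/difference convention, (a + b) / 2 and (a - b) / 2, and a series whose length is a power of two; the slides do not say which normalisation Assembler uses, and the function name `haar_transform` is hypothetical.

```python
def haar_transform(series):
    """Full Haar decomposition of a series whose length is a power of two.

    Returns (overall_average, details), where details[j] holds the detail
    coefficients at level j, coarsest level first.
    Assumption: averages and differences are both divided by 2.
    """
    n = len(series)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a positive power of two")

    approx = [float(x) for x in series]
    details = []
    while len(approx) > 1:
        # Pair up neighbouring values at the current resolution.
        pairs = list(zip(approx[0::2], approx[1::2]))
        # Detail coefficient: half the difference within each pair.
        details.append([(a - b) / 2 for a, b in pairs])
        # Approximation: the mean of each pair, passed to the next level.
        approx = [(a + b) / 2 for a, b in pairs]

    details.reverse()  # coarsest level first
    return approx[0], details
```

For the readings `[9, 7, 3, 5]` the sketch returns the overall mean `6.0`, one coarse coefficient `2.0`, and two fine coefficients `1.0` and `-1.0`. Large coefficients at a coarse level indicate a sustained change rather than a short fluctuation.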
12
Method: I. Detecting Individual Evolutions Evolving interval extraction
13
Method: I. Detecting Individual Evolutions Mining Frequent Evolutions Segment-and-group approach
1. Segment: bottom-up
2. Mean Shift: divide segments into groups such that the segments in the same group have similar slopes
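The bottom-up segmentation step can be sketched as follows. This is the generic bottom-up piecewise-linear segmentation scheme, not necessarily the exact variant used in Assembler: the names `fit_error`, `bottom_up_segments`, and `segment_slopes`, and the `max_error` stopping rule, are assumptions made for illustration.

```python
def fit_error(values, start, end):
    """Sum of squared residuals of the least-squares line over values[start..end]."""
    n = end - start + 1
    xs = range(start, end + 1)
    ys = values[start:end + 1]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return sum((y - (mean_y + slope * (x - mean_x))) ** 2 for x, y in zip(xs, ys))


def bottom_up_segments(values, max_error):
    """Greedily merge adjacent segments while the merged fit error stays <= max_error."""
    # Finest segmentation: one segment per pair of consecutive points,
    # with neighbouring segments sharing an endpoint.
    segments = [(i, i + 1) for i in range(len(values) - 1)]
    while len(segments) > 1:
        # Cost of merging each segment with its right-hand neighbour.
        costs = [fit_error(values, segments[i][0], segments[i + 1][1])
                 for i in range(len(segments) - 1)]
        best = min(range(len(costs)), key=costs.__getitem__)
        if costs[best] > max_error:
            break  # the cheapest merge is already too lossy
        segments[best:best + 2] = [(segments[best][0], segments[best + 1][1])]
    return segments


def segment_slopes(values, segments):
    """Endpoint-to-endpoint slope of each segment."""
    return [(values[e] - values[s]) / (e - s) for s, e in segments]
```

On a series that rises and then falls, such as `[0, 1, 2, 3, 2, 1, 0]`, the sketch yields one rising and one falling segment. The resulting slopes are what the grouping step then clusters.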
14
Method: II. SCP Generation The Anti-monotonicity Property
15
Method: II. SCP Generation Find SCP by intersecting matching timestamps
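The intersection idea can be illustrated with plain Python sets. The sketch assumes each sensor's frequent evolution is summarised by the set of timestamps at which it occurs, and that the support of a candidate pattern is the number of timestamps shared by all its sensors; the dictionary layout and the names `common_timestamps` and `is_frequent` are hypothetical.

```python
def common_timestamps(matches, sensors):
    """Timestamps at which the evolutions of all given sensors occur together.

    `matches` maps a sensor id to the set of timestamps matching its evolution.
    """
    if not sensors:
        raise ValueError("need at least one sensor")
    sets = [set(matches[s]) for s in sensors]
    return set.intersection(*sets)


def is_frequent(matches, sensors, theta):
    """True if the candidate pattern is supported by at least `theta` timestamps."""
    return len(common_timestamps(matches, sensors)) >= theta


# Hypothetical matching timestamps for three sensors.
matches = {
    "s1": {1, 4, 7, 9},
    "s2": {4, 7, 9, 12},
    "s3": {4, 9, 15},
}
pair = common_timestamps(matches, ["s1", "s2"])
triple = common_timestamps(matches, ["s1", "s2", "s3"])
```

Intersecting can only remove timestamps, so adding a sensor never increases the support. Under these assumptions this matches the anti-monotonicity idea named on the previous slide: once a candidate falls below θ, its extensions need not be examined.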
16
Method: II. SCP Generation SCP Search Tree
17
Method: II. SCP Generation Neighbor and Parent
18
Method: II. SCP Generation SCP Search Tree
19
Method: II. SCP Generation Algorithm
20
Mining Frequent Evolutions Segment-and-group approach
1. Segment: bottom-up
2. Mean Shift: divide segments into groups such that the segments in the same group have similar slopes
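The grouping step can be sketched as one-dimensional mean shift over segment slopes. The flat (uniform) kernel, the convergence tolerance, and the function name `mean_shift_1d` are assumptions for illustration; `bandwidth` plays the role of the mean shift bandwidth parameter.

```python
def mean_shift_1d(points, bandwidth, tol=1e-6, max_iter=100):
    """Group 1-D points (e.g. segment slopes) with flat-kernel mean shift.

    Returns one integer group label per input point.
    """
    modes = []
    for p in points:
        x = float(p)
        for _ in range(max_iter):
            # Flat kernel: average all points within `bandwidth` of x.
            window = [q for q in points if abs(q - x) <= bandwidth]
            shifted = sum(window) / len(window)
            if abs(shifted - x) < tol:
                break  # x has settled on a local density mode
            x = shifted
        modes.append(x)

    # Points whose modes lie within one bandwidth of an existing
    # group centre receive that group's label.
    labels, centers = [], []
    for m in modes:
        for idx, c in enumerate(centers):
            if abs(m - c) <= bandwidth:
                labels.append(idx)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return labels
```

With slopes `[1.0, 1.1, 0.9, -1.0, -1.05]` and a bandwidth of `0.3`, the rising segments fall into one group and the falling segments into another. A larger bandwidth merges more slopes into the same group.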
21
Method: Discussion Time Complexity
Segment approach: $O(n_e \cdot l_e \cdot l_s) \approx O(m)$, since $l_s$ is small and $n_e \cdot l_e < m$
Mean Shift: $O(n_l \cdot k) \approx O(m)$, where $k$ is the average number of shifting operations
Second Stage: $O(n_G (n |E_G| + n_p^2 n_s))$
$n_G$: the number of connected components in $G$ that have SCPs
$|E_G|$: the number of edges in $G$
$n_p$: the maximum number of SCPs on a connected component
$n_s$: the maximum support of an SCP
22
Method: Discussion Space Complexity
Segment & Mean Shift: nearly linear
Second Stage: $O(n \cdot n_p \cdot n_s)$
23
Method: Discussion Parameter Settings
The minimum support θ: how many occurrences can be considered frequent enough
The distance threshold h: what distance makes two sensors reachable
The change threshold δ: how much change in the reading reflects a significant and unusual behavior
The mean shift bandwidth ω
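To show how the distance threshold h shapes the search, the following sketch links sensors that lie within h of each other and then lists the connected components of the resulting graph. It assumes planar coordinates and Euclidean distance, which the slides do not specify; the function names and sample coordinates are hypothetical.

```python
from math import hypot


def reachability_graph(locations, h):
    """Adjacency lists linking sensors whose Euclidean distance is at most h."""
    n = len(locations)
    adj = {i: [] for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            (x1, y1), (x2, y2) = locations[i], locations[j]
            if hypot(x1 - x2, y1 - y2) <= h:
                adj[i].append(j)
                adj[j].append(i)
    return adj


def connected_components(adj):
    """Connected components of the graph, each as a sorted list of sensor indices."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, comp = [start], []
        seen.add(start)
        while stack:  # iterative depth-first traversal
            node = stack.pop()
            comp.append(node)
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        components.append(sorted(comp))
    return components


# Hypothetical sensor coordinates.
locations = [(0, 0), (1, 0), (5, 5), (5, 6), (20, 20)]
components = connected_components(reachability_graph(locations, 1.5))
```

With h = 1.5 the five sample sensors split into three components, one of which is an isolated sensor. Raising h merges components and enlarges the space of candidate patterns; lowering it does the opposite.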
24
Experiment Dataset
1. Air is an air quality data set. 180 air quality sensors are deployed in 16 cities in northern China (Beijing, Tianjin, and 14 cities in Hebei Province). Each sensor measured the hourly AQI during the period 2013.02.08 – 2014.08.27.
2. Bike is the Citi Bike rental data set for the 332 rental docks in New York. The number of available bikes at each dock was recorded every 30 minutes during 2013.07.01 – 2014.08.30.
3. Syn-Sensor is a collection of 4 synthetic data sets used to evaluate the scalability of Assembler w.r.t. the number of sensors n.
25
Experiment Illumination
26
Illumination
27
Efficiency Study Varying θ and h
28
Experiment Efficiency Study Varying δ and w
29
Experiments Scalability
30
Thank you