G10 Anuj Karpatne Vijay Borra

Presentation transcript:

Finding Sequential Patterns with Time Lags in Spatio-temporal Earth Science Datasets
G10: Anuj Karpatne, Vijay Borra

Outline
- Motivation
- Example of Dataset
- Problem Statement
- First-principle example
- Challenges
- Novelty
- Proposed Approach
- Validation outline and future work

Motivation
1) Increase in average winter temperature in higher latitudes due to global warming -> Increase in pine beetle infestation -> Increased susceptibility to forest fires
2) Deforestation in Indonesian peat lands for palm-oil plantations -> Degradation of soil due to poor soil moisture -> Forest fires
Sources: NASA Earth Observatory, Environmental Protection Agency, National Oceanic and Atmospheric Administration, Natural Resources Canada

An Example Dataset
[Figure: events linked by neighborhood relationships and time lags, leading to active fire]

Problem Definition

Problem Statement: Find sequential patterns in spatio-temporal real-valued events using multiple variables with time lags.

Input:
- Spatio-temporal gridded data with multiple variables
- Event detection schemes for each variable, with parameters and thresholds
- Prevalence threshold measures for a sequential pattern
- Spatial neighborhood relationship
- Statistical parameters of the time lag distribution

Output:
- Patterns of the form [pattern expression not preserved in the transcript], where T denotes the time lag distribution of the event relationship (a vector of time lag values for which the relationship is prevalent)
- Examples: [not preserved in the transcript]

Constraints:
- Events are point processes occurring with time lags, not interval-based
- Chains of events are detected under a 'total orderedness' assumption
- Homogeneous data variables spanning the same range of time
- Event detection algorithms are robust in handling noise and seasonality of time series
- Algorithm is correct and complete

Objective: Minimize computational time

First-Principle Example

Time of occurrence of each event (only the values preserved in the transcript are listed):
- eA (events A1 to A15): A1 = 3, A2 = 4, A3 = 6, A7 = 5, A14 = 2
- eB (events B1 to B10): B1 = 6, B2 = 2, B3 = 5, B6 = 4
- eC (events C1 to C8): C1 = 5, C2 = 6, C3 = 3, C4 = 2, C6 = 4

Challenges
- Exponential size of the search space
- A vector of prevalence thresholds has to be considered for pruning at multiple time lag values

Novelty
Spatio-temporal sequential pattern mining:
- Sequential pattern mining for Boolean events without time lag: Huang et al.; Mohan et al. (Cascade)
- Moving object trajectory mining: Cao et al.; STAR
- Sequential pattern mining for real-valued events with time lag: our approach

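Approach: Computing the Prevalence Index PI(t)

For a pattern of size two:
- For each possible time lag value t:
  - Prevalence Ratio of A at time lag t: [formula not preserved in the transcript]
  - Prevalence Ratio of B at time lag t: [formula not preserved in the transcript]
  - Prevalence Index of the pattern at time lag t: minimum of the Prevalence Ratios of A and B at time lag t

For patterns of size greater than two (symbols not preserved in the transcript; surviving text only):
- "Find ... as the set of events in ... which participate in ..."
- "... minimum of ... and ..."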
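The slide's ratio formulas did not survive in the transcript, so the following is only a minimal sketch under an assumption: the Prevalence Ratio of A at time lag t is taken to be the fraction of events of A that appear in at least one (A, B) pair with lag t, and likewise for B. Only the "minimum of the two ratios" step is stated on the slide. The function name `prevalence_index`, its arguments, and the toy pairs are hypothetical.

```python
from collections import defaultdict


def prevalence_index(pairs, n_a, n_b):
    """Return {lag: PI(lag)} for a size-two pattern.

    pairs: iterable of (a_id, b_id, lag) tuples (hypothetical input layout).
    n_a, n_b: total number of detected events of A and of B.
    Assumption: prevalence ratio = distinct participating events / all events.
    """
    a_seen = defaultdict(set)  # lag -> distinct A events taking part at that lag
    b_seen = defaultdict(set)  # lag -> distinct B events taking part at that lag
    for a_id, b_id, lag in pairs:
        a_seen[lag].add(a_id)
        b_seen[lag].add(b_id)
    # PI(t) is the minimum of the two prevalence ratios, as stated on the slide.
    return {
        lag: min(len(a_seen[lag]) / n_a, len(b_seen[lag]) / n_b)
        for lag in sorted(a_seen)
    }


# Hypothetical toy input, not taken from the slides.
toy_pairs = [("A1", "B1", 0), ("A2", "B1", 0), ("A1", "B2", 1)]
toy_pi = prevalence_index(toy_pairs, n_a=4, n_b=2)
print(toy_pi)
```

Counting distinct events rather than pairs keeps each ratio between 0 and 1, which is what makes a single threshold per time lag meaningful.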

Running Example

Let eAB be the set of pairs of events in (A, B) satisfying:
- Spatial neighborhood condition: events in eAB fall in a spatial neighborhood
- Positive time lag condition: the event in B occurs after the event in A in eAB

The event sets eA, eB and eC are those of the First-Principle Example.

Event pairs in eAB (the transcript preserves the time lag only for A1B1 = 3, A1B3 = 2 and A2B3 = 1):
A1B1, A1B3, A1B7, A2B1, A2B3, A2B7, A3B1, A4B1, A4B3, A4B4, A4B6, A4B10, A6B3, A6B10, A7B10, A8B1, A9B1, A9B3, A9B6, A10B1, A11B3, A12B6, A13B6, A14B6, A14B4, A15B4, A15B10

Event pairs grouped by time lag:
- eAB(0): A3B1, A10B1, A15B4
- eAB(1): A2B3, A2B7, A4B6, A7B10, A9B6, A11B3, A12B6, A13B6
- eAB(2): A1B3, A1B7, A2B1, A4B3, A4B4, A6B3, A8B1, A9B3, A14B6
- eAB(3): A1B1, A4B1, A4B10, A6B10, A9B1, A14B4

Prevalence Index per time lag:
PI(0) = 0.2, PI(1) = 0.5, PI(2) = 0.47, PI(3) = 0.3

Output additional statistical properties of event pair occurrences at the prevalent time lags.
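The two conditions that define eAB can be sketched as a simple nested-loop join. This is an illustration under stated assumptions, not the authors' implementation: the `Event` record, the grid coordinates, and the neighborhood test (Chebyshev distance of at most `radius` cells) are all hypothetical, since the slides do not specify the neighborhood relationship. Lag 0 is kept because the slide lists a group eAB(0). The times of A1, B1, B2, B3 and B6 are the ones shown on the slide; the positions are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    name: str
    row: int   # hypothetical grid row
    col: int   # hypothetical grid column
    time: int  # time of occurrence


def are_neighbors(a, b, radius=1):
    # Assumed neighborhood relation: within `radius` grid cells in each direction.
    return max(abs(a.row - b.row), abs(a.col - b.col)) <= radius


def event_pairs(events_a, events_b, radius=1):
    """Return (a_name, b_name, lag) for neighboring events where B does not precede A."""
    pairs = []
    for a in events_a:
        for b in events_b:
            lag = b.time - a.time
            if lag >= 0 and are_neighbors(a, b, radius):
                pairs.append((a.name, b.name, lag))
    return pairs


events_a = [Event("A1", 0, 0, 3)]
events_b = [
    Event("B1", 0, 1, 6),  # neighbor, 3 steps later: kept
    Event("B2", 1, 1, 2),  # neighbor, but occurs before A1: dropped
    Event("B3", 1, 0, 5),  # neighbor, 2 steps later: kept
    Event("B6", 9, 9, 4),  # later, but far away: dropped
]
toy_eab = event_pairs(events_a, events_b)
print(toy_eab)
```

The nested loop is quadratic in the number of events; a spatial index would be the usual replacement on real gridded data.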

Pruning Strategy
[Figure labels: Lattice, Growing Region]
- If the set of prevalent time lags is empty, remove the candidate pattern from the list of frequent patterns
- Else, prune the prevalent time lags using statistical properties of the event occurrence distributions over the prevalent time lags
- Accept the middle quantile of the prevalent time lag distribution as the pruned time lags
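The slide does not define "middle quantile" precisely. The sketch below assumes one possible reading: weight each prevalent time lag by its number of pattern instances and keep the lags that fall between the lower and upper quartile of that distribution. The function name, the quartile bounds, the threshold, and the instance counts are hypothetical; only the PI values in the toy call are the ones shown in the running example.

```python
import math


def prune_time_lags(pi_by_lag, count_by_lag, threshold, lower_q=0.25, upper_q=0.75):
    """Return the pruned list of prevalent time lags (empty list = discard pattern)."""
    prevalent = sorted(lag for lag, pi in pi_by_lag.items() if pi > threshold)
    if not prevalent:
        return []
    # One sample entry per pattern instance, so frequent lags dominate the quantiles.
    sample = [lag for lag in prevalent for _ in range(count_by_lag.get(lag, 0))]
    if not sample:
        return prevalent
    lo = sample[math.floor(lower_q * (len(sample) - 1))]
    hi = sample[math.ceil(upper_q * (len(sample) - 1))]
    return [lag for lag in prevalent if lo <= lag <= hi]


toy_pi = {0: 0.2, 1: 0.5, 2: 0.47, 3: 0.3}  # PI values from the running example
toy_counts = {0: 3, 1: 2, 2: 10, 3: 2}       # hypothetical instance counts
pruned = prune_time_lags(toy_pi, toy_counts, threshold=0.25)
print(pruned)
```

With these toy counts most instances sit at lag 2, so the quartile window collapses onto that single lag.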

Proposed Approach

For pattern size k from 1 to n (number of variables):
    Generate candidate patterns of size (k+1) from frequent patterns of size k
    For each candidate pattern:
        Compute the Prevalence Index PI(t) at each time lag t
        Generate pattern instance distributions for the time lags with PI(t) > threshold
        If the set of prevalent time lags is empty:
            Discard the candidate pattern
        Else:
            Prune the prevalent time lags using the middle quantile of the distribution
            Add the candidate pattern, with its pruned time lags, to the set of frequent patterns of size (k+1)
end
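The level-wise loop above can be sketched as follows. This is a simplified illustration, not the authors' code: candidates are generated with a sequence join (the suffix of one frequent pattern must equal the prefix of another), each variable is assumed to appear at most once per pattern, the quantile pruning step is omitted, and the PI computation is delegated to a hypothetical callback `lags_for(pattern)` that returns `{lag: PI}`. The toy PI table is invented.

```python
def generate_candidates(frequent):
    """Join size-k sequences into size-(k+1) sequences (assumed join rule)."""
    return sorted(
        {p + (q[-1],) for p in frequent for q in frequent
         if p[1:] == q[:-1] and q[-1] not in p}
    )


def mine_patterns(variables, lags_for, threshold):
    """Return {pattern: prevalent lags} for all patterns of size two or more."""
    frequent = [(v,) for v in variables]  # every single variable is a size-1 pattern
    result = {}
    while frequent:
        next_frequent = []
        for candidate in generate_candidates(frequent):
            pi_by_lag = lags_for(candidate)
            prevalent = [lag for lag, pi in sorted(pi_by_lag.items()) if pi > threshold]
            if prevalent:  # otherwise the candidate is discarded
                result[candidate] = prevalent
                next_frequent.append(candidate)
        frequent = next_frequent
    return result


# Hypothetical PI table standing in for the real PI computation.
toy_table = {
    ("A", "B"): {0: 0.2, 1: 0.5, 2: 0.47},
    ("B", "C"): {1: 0.4},
    ("A", "B", "C"): {1: 0.3},
}
found = mine_patterns(["A", "B", "C"], lambda p: toy_table.get(p, {}), threshold=0.25)
print(found)
```

Because no variable repeats within a pattern, the loop stops by itself once patterns reach the number of variables.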

Properties of PI and the Proposed Approach

Prevalence Index is anti-monotonic:
- Prevalence Ratio is anti-monotonic: an event instance participates in a sequence only if it participates in all the subsequences of the pattern

The approach is complete:
- Prevalence Index is anti-monotonic for each time lag
- The Apriori-based candidate generation technique is complete
- Enumeration of event pairs for each spatial join is complete

The approach is correct:
- A pattern over A and B with time lag distribution T is correct only if:
  - Events in A and B occur in a spatial neighborhood
  - Events in B follow events in A with a time lag distribution T
- The pruning approach is correct

Insight into Real-world Datasets / Future Work

Region of interest: peatland forests in Indonesia (tile h29v08 and tile h29v09)

Types of datasets to be used and their respective event detection algorithms:
- Vegetation Index (EVI)
  - Deforestation: V2delta, Gradual Decrease, Segmentation
  - Forest Fire: KD6 + ID6
- Land Surface Temperature
  - Increase in annual land surface temperature because of fire
- Thermal Anomaly Index
- Precipitation
- Soil Moisture
- Aerosol Information

[Figure panels: EVI for Forest Fire; EVI for Deforestation; Land Surface Temperature (Day); Thermal Anomaly Index]