Spatial-Temporal Data Mining Wei Wang Data Mining Lab Computer Science Department UCLA.

Presentation transcript:

Spatial-Temporal Data Mining Wei Wang Data Mining Lab Computer Science Department UCLA

Outline
Introduction
Active Spatial Data Mining
– Spatial data mining trigger
Temporal Association Rule with Numerical Attributes
– Correlation among object evolutions
Conclusions and Future Work

Introduction
Huge amounts of spatial data are generated every day.
– Earth Observing System
– National Spatial Data Infrastructure
– National Imagery and Mapping Agency
– One-meter resolution data
– Digital earth
Users are usually interested in the hidden information.
– Aggregate information
– Clustering
– Patterns

Introduction Knowledge discovery processes are computationally expensive. Today’s technology advances provide necessary computing power to carry out such complicated processes.

Outline
Introduction
STING+: An approach to active spatial data mining
Temporal association rules with numerical attributes
Conclusions and Future Work

STING+
Since data evolves over time, interesting patterns are likely to emerge or change.
Goal: identify and find the (most) interesting patterns.
Problems:
– Knowledge discovery processes are expensive. It is not feasible to re-process the entire data set after every change.
– Periodically examining the data instead causes long delays, and transient patterns might be missed.
Natural solution: use triggers.

STING+
Traditional database triggers cannot be directly applied:
– The expressive power of traditional database triggers is limited, especially in describing spatial relationships.
– Example: Trigger investigation when the size of any cluster exceeds

STING+ STING+ was designed to introduce and support spatial triggers efficiently. Observation (spatial locality): Only objects added to the shaded area will contribute to the growth of cluster size at this moment

STING+
STING+ Strategy: Monitor only the area occupied by potential clusters and their neighborhoods.
Observation (cumulative effect): at least 4 more objects are needed for the cluster size to reach 20.
STING+ Strategy: Space is organized in a hierarchy so that updates can be suspended at various levels in the hierarchy until the cumulative effect might cause the trigger to be fired.

STING+
– Space is recursively divided into smaller rectangular cells down to a specified granularity and is organized via the inherent pyramid hierarchy.
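The recursive division can be sketched as follows. This is only an illustration under assumptions the slide does not state: the space is a square of side `extent`, every cell is split into four equal quadrants, and level 1 is the single root cell. The function name `cell_path` is hypothetical.

```python
def cell_path(x, y, levels, extent=1.0):
    """Return the (level, row, col) cell containing (x, y) at each level.

    Assumption: level 1 is the root cell covering the whole space, and each
    deeper level halves the cell side (four children per cell).
    """
    if not (0 <= x < extent and 0 <= y < extent):
        raise ValueError("point lies outside the indexed space")
    path = []
    for level in range(1, levels + 1):
        cells_per_side = 2 ** (level - 1)
        side = extent / cells_per_side
        path.append((level, int(y // side), int(x // side)))
    return path


# The same point is covered by one cell per level of the pyramid.
print(cell_path(0.3, 0.8, 3))
```

An update to a point only touches the cells on this path, which is what makes per-cell bookkeeping in the hierarchy cheap.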

STING+
– STING+ decomposes a trigger into a set of sub-triggers associated with individual cells in the hierarchical structure to monitor the cumulative effect of data changes within the cell.
[Figure: a sub-trigger on a cell at one level and a higher-level sub-trigger on a cell at another level]

STING+
– Updates/insertions are suspended at various levels in the hierarchy until such time that the cumulative effect of these insertions might cause the trigger condition to become satisfied.
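A minimal sketch of this suspension idea, not the STING+ implementation: each cell carries a hypothetical `slack`, the number of insertions it may absorb before the change can matter to its parent. The class name `Cell` and the counting rule are assumptions made for illustration.

```python
class Cell:
    """One cell of the hierarchy with a simple counting sub-trigger."""

    def __init__(self, slack, parent=None):
        self.slack = slack      # insertions tolerated before reporting upward
        self.pending = 0        # insertions absorbed since the last report
        self.parent = parent

    def insert(self, n=1):
        """Absorb n insertions; return True if the root sub-trigger fires."""
        self.pending += n
        if self.pending < self.slack:
            return False        # cumulative effect too small: suspend here
        reported, self.pending = self.pending, 0
        if self.parent is None:
            return True         # root reached: re-evaluate the trigger condition
        return self.parent.insert(reported)


root = Cell(slack=4)
leaf = Cell(slack=2, parent=root)
results = [leaf.insert() for _ in range(4)]
print(results)
```

With these numbers only the fourth insertion reaches the root, so the trigger condition is evaluated once instead of four times.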

STING+
[Figure: two levels of the cell hierarchy; annotation: no update of cluster]

STING+
Primitive event: insertion, deletion, update.
Composite event: a set of primitive events.
In general, evaluating a trigger T involves two aspects:
– Find a set of composite events E(s) that may cause the trigger condition C_T to become true.
– Each time some composite event in E(s) occurs, check the status (false or true) of C_T (given that C_T was false previously).
Observation: As a side effect of the occurrence of some composite event, E(s) might also evolve over time.

STING+
STING+ Strategy: Two sets of composite events are considered:
– the set of composite events E(s) that can cause C_T to become true; when one occurs, C_T needs to be re-evaluated
– the set of composite events F(s) that can cause a change to E(s); when one occurs, E(s) needs to be updated
– The sub-triggers are used to monitor composite events in E(s) and F(s) and change accordingly when E(s) and F(s) evolve.

STING+
Observation: Trigger condition C_T is a conjunction of predicates P_1 ∧ P_2 ∧ … ∧ P_n and cannot be true if one predicate is false.
– They can be evaluated in a specific order: the i-th predicate is tested only when all previous (i − 1) predicates are true.
– The evaluation order should be chosen in such a way that the total cost is minimum.
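The slide does not say how the order is chosen. One common heuristic, shown here as a sketch, assumes each predicate has a known evaluation cost and an independent probability of being true, and sorts by cost divided by the probability of failing. The names `order_predicates` and `expected_cost` and the sample numbers are hypothetical.

```python
def order_predicates(predicates):
    """Sort (name, cost, pass_probability) by cost / (1 - pass_probability).

    Assumption: predicates are independent and their costs and pass
    probabilities are known in advance.
    """
    def rank(p):
        _, cost, p_pass = p
        return float("inf") if p_pass >= 1.0 else cost / (1.0 - p_pass)
    return sorted(predicates, key=rank)


def expected_cost(predicates):
    """Expected cost of short-circuit evaluation in the given order."""
    total, reach = 0.0, 1.0
    for _, cost, p_pass in predicates:
        total += reach * cost   # this predicate runs only if all earlier ones passed
        reach *= p_pass
    return total


preds = [("P1", 10.0, 0.9), ("P2", 1.0, 0.5), ("P3", 5.0, 0.2)]
ordered = order_predicates(preds)
print([name for name, _, _ in ordered])
print(expected_cost(preds), expected_cost(ordered))
```

Cheap predicates that usually fail go first, so the expensive ones are reached rarely.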

STING+
The PK-tree is used to index instantiated cells:
– Bound on height
– Bounds on number of children
– Uniqueness for any data set, independent of the order of insertion and deletion
– Solid theoretical foundation
– Fast retrieval and efficient maintenance
Statistical information maintained at each node is used to facilitate the trigger process.
– Sub-trigger

STING+
Comparison with periodic re-examination via STING:
– 200,000 synthetic point objects
– 10,000 insertions/deletions/updates
– If the period is set to fewer than 4000 updates, STING+ consumes fewer CPU cycles.
– With a larger period, significant delays and missed transient patterns can occur, which is not acceptable in many applications.
– No delay and no transient patterns missed with STING+.

Outline
Introduction
STING+: An approach to active spatial data mining
Temporal association rules with numerical attributes
Conclusions and Future Work

Temporal Association Rules
We now consider general databases with evolving numerical attributes.
Interesting patterns exhibited in the data are often numerous and complicated.
– Customer churn: If a customer's phone bill increases by at least $10 each month for six months, then the customer is likely to change long-distance telephone carriers.
– Real estate: People who receive a raise of at least 20% of their salary are likely to move away from big cities.
Such patterns can be represented by association rules of the form X ⇒ Y, which indicates that the occurrences of X and Y are highly correlated.

Temporal Association Rules
Earlier work on association rules mainly focused on binary attributes and intra-transaction relationships.
– E.g., ham ⇒ bread
– Support and strength are two metrics used to qualify interesting rules.
support: the number of instances that follow the rule
– N(ham, bread)
strength: how strong the correlation is

Temporal Association Rules
Consider a set of objects, each of which has a unique ID and a set of time-varying numerical attributes; a sequence of snapshots is taken at some frequency.
– E.g., in an employee database, two attributes are considered: salary and monthly housing expense.
– For a given snapshot, each employee can be mapped to a point in a two-dimensional space.
[Figure: employees plotted as points, with axes salary and monthly housing expense]

Temporal Association Rules
– Given a sequence of snapshots, the trace of an employee can be mapped to a point in a high-dimensional space.
[Figure: salary and monthly housing expense of one employee at Snapshots 1 to 5 along the time axis]

Temporal Association Rules
Temporal association rules represent the correlation among object evolutions.
– (salary: [52000, 56000] → [54000, 58000]) ⇒ (monthly_housing_expense: [1200, 1400] → [1400, 1600])
– Each temporal association rule can also be viewed as an interpretation of a cluster (with a certain shape) of points.
[Figure: clusters of points, with axes salary and monthly_housing_expense]

Temporal Association Rules
Observation: The domain of a numerical attribute might contain a large number of distinct values and might even be continuous.
– E.g., domain(salary) = [50000, 60000].
– Any sub-range can appear in a rule.
– The number of possible rules may be very large, if not infinite.
Strategy: Each attribute domain is quantized into a set of equi-length base intervals.
– The domain of salary could be quantized into base intervals of length $2000.
– Values within the same interval are not distinguished. E.g., $51000 and $51500 are considered the same.
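The quantization step can be sketched in a few lines. The function name `base_interval` is hypothetical, and treating intervals as half-open [low, high) is an assumption made here so that every value falls into exactly one interval.

```python
def base_interval(value, domain_low, width):
    """Return the [low, high) base interval of the given width containing value."""
    index = int((value - domain_low) // width)
    low = domain_low + index * width
    return (low, low + width)


# With domain(salary) starting at 50000 and base intervals of length 2000,
# $51000 and $51500 are not distinguished.
print(base_interval(51000, 50000, 2000))
print(base_interval(51500, 50000, 2000))
print(base_interval(53000, 50000, 2000))
```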

Temporal Association Rules
Attribute evolution:
E_1(salary) = [52000, 54000] → [52000, 54000] → [54000, 56000]
E_2(salary) = [52000, 56000] → [52000, 54000] → [52000, 56000]
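As a sketch of how the two evolutions relate, the code below checks whether an object's trace follows an evolution and whether one evolution is contained in another. The helper names `follows` and `is_subcube`, the closed-interval test, and the sample traces are assumptions for illustration.

```python
def follows(trace, evolution):
    """True if every snapshot value lies in the corresponding interval."""
    return len(trace) == len(evolution) and all(
        low <= value <= high for value, (low, high) in zip(trace, evolution)
    )


def is_subcube(inner, outer):
    """True if each interval of inner is contained in the matching interval of outer."""
    return len(inner) == len(outer) and all(
        o_low <= i_low and i_high <= o_high
        for (i_low, i_high), (o_low, o_high) in zip(inner, outer)
    )


E1 = [(52000, 54000), (52000, 54000), (54000, 56000)]
E2 = [(52000, 56000), (52000, 54000), (52000, 56000)]

print(follows([53000, 53500, 55000], E1), follows([53000, 53500, 55000], E2))
print(follows([55000, 53000, 53000], E1), follows([55000, 53000, 53000], E2))
print(is_subcube(E1, E2))
```

Every trace that follows E_1 also follows E_2, but not the other way round, since each interval of E_1 lies inside the matching interval of E_2.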

Temporal Association Rules
[Figure: evolution space with axes Snapshot 1, Snapshot 2 and Snapshot 3, showing a base cube and the evolution cubes of E_1(salary) and E_2(salary)]

Temporal Association Rules
– The subcube-supercube relationship defines a lattice among all evolution cubes within the evolution space.
– This also holds for the evolution space of more than one attribute (e.g., salary and monthly housing expense).

Temporal Association Rules
Some properties of the metrics enable us to search efficiently through the lattice in a bottom-up manner.
Property of strength: The strength of an evolution cube is less than or equal to the highest strength of its subcubes.
Property of support: The support of an evolution cube is greater than or equal to the support of any of its subcubes.

Temporal Association Rules
Observation: Many valid but trivial rules may exist.
– (salary: [52000, 56000]) ⇒ (monthly_housing_expense: [1200, 1400])
– (salary: [50000, 56000]) ⇒ (monthly_housing_expense: [1200, 1400])
– Both rules have the same support and strength, since no employee's salary is between 50000 and 52000. However, the first rule conveys more precise information.
[Figure: points plotted with axes salary and monthly housing expense]

Temporal Association Rules
Strategy: An interval can be included in a rule only if there is some minimum number of objects whose attribute values fall into that interval.
– The density of each base cube within the evolution cube of a rule has to meet some threshold.
– In the previous example, the second rule can be eliminated.
Property of density: An evolution cube can satisfy the density threshold only when all of its subcubes satisfy the density threshold.
[Figure annotation: min_density = 2]
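The density test can be sketched as follows. To keep the example small it looks at a single attribute (salary) with made-up salaries; the helper names `base_cube` and `is_dense`, the index-range encoding of a cube, and the sample data are all assumptions.

```python
from collections import Counter
from itertools import product


def base_cube(trace, domain_low, width):
    """Map a trace (one value per snapshot) to its tuple of base-interval indices."""
    return tuple(int((v - domain_low) // width) for v in trace)


def is_dense(traces, cube, domain_low, width, min_density):
    """cube: one inclusive (first, last) base-interval index range per snapshot.

    The cube passes only if every base cube inside it holds at least
    min_density objects.
    """
    counts = Counter(base_cube(t, domain_low, width) for t in traces)
    cells = product(*(range(first, last + 1) for first, last in cube))
    return all(counts[cell] >= min_density for cell in cells)


# Hypothetical single-snapshot salaries; nobody earns between 50000 and 52000.
salaries = [(52500,), (53000,), (54500,), (55000,)]
print(is_dense(salaries, [(1, 2)], 50000, 2000, min_density=2))  # [52000, 56000]
print(is_dense(salaries, [(0, 2)], 50000, 2000, min_density=2))  # [50000, 56000]
```

The wider range fails because its first base interval is empty, which is the reason the second rule of the previous example is eliminated.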

Temporal Association Rules
General Model:
– Data set D
– Language L: expresses properties or defines subgroups of the data
– Selection predicate q: evaluates whether a sentence φ ∈ L defines a potentially interesting class of D
– Task: find the set {φ ∈ L | q(D, φ) is true}
If
– a lattice can be formed on the sentences in L and
– a partial order exists on the selection predicate
then the level-wise algorithm can be used to prune the search space efficiently.

Temporal Association Rules
Temporal Association Rule:
– Language L: each sentence φ ∈ L is a temporal association rule.
– The selection predicate q(D, φ) is true iff
support(D, φ) ≥ min_support (q_1) and
strength(D, φ) ≥ min_strength (q_2) and
density(D, φ) ≥ min_density (q_3)
– Task: find the set of temporal association rules that satisfy all three predicates.
Specialization relation <: a lattice on the sentences in L
– subcube/supercube relationship

Temporal Association Rules
Partial order on q_i with respect to <, where φ < ψ means that the evolution cube of φ is a subcube of the evolution cube of ψ:
– support(D, φ) ≤ support(D, ψ) if φ < ψ
– if strength(D, φ) < min_strength for all φ < ψ, then strength(D, ψ) < min_strength
– density(D, φ) ≥ density(D, ψ) if φ < ψ
Level-wise algorithm
– Basic scheme: start from the most special (general) sentences, then evaluate more and more general (special) sentences, excluding those sentences that cannot be interesting given all the information obtained in earlier iterations.
Efficient space pruning
– Starting point
– Random sampling
– Order of predicate evaluation
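A toy sketch of the level-wise scheme, restricted to ranges of adjacent base intervals of one attribute and to a single predicate that can only fail more often as a range grows (such as the density test). The function name `levelwise`, the range encoding and the sample counts are assumptions; this is not the algorithm evaluated in the slides.

```python
def levelwise(n, q):
    """Return all index ranges (i, j), 0 <= i <= j < n, for which q(i, j) holds.

    Assumption: q is anti-monotone, i.e. once q fails on a range it fails
    on every range containing it.
    """
    # Level 1: the most special sentences are the single base intervals.
    level = {(i, i) for i in range(n) if q(i, i)}
    found = set(level)
    while level:
        # A longer range is a candidate only if both of its maximal
        # sub-ranges survived the previous level.
        candidates = {(i, j + 1) for (i, j) in level
                      if j + 1 < n and (i + 1, j + 1) in level}
        level = {(i, j) for (i, j) in candidates if q(i, j)}
        found |= level
    return sorted(found)


counts = [0, 3, 2, 1, 4]          # objects per base interval (made up)
calls = []

def dense(i, j, min_density=2):
    calls.append((i, j))
    return min(counts[i:j + 1]) >= min_density

print(levelwise(len(counts), dense))
print(len(calls))
```

Of the 15 possible ranges only 6 are evaluated here; the rest are excluded because one of their sub-ranges already failed.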

Temporal Association Rules
Efficiency of space pruning
– SR algorithm: after quantization, base intervals are combined as long as their density satisfies the threshold. The original base intervals and the combined intervals are treated as a set of items.
[Figure: experimental setting: objects, 100 snapshots, 5 attributes, 500 rules of length 5, density = 2, support = 5%, strength = 1.4]

Conclusions and Future Work
STING+ was developed to support spatial data mining triggers very efficiently by
– employing the spatial locality property and
– postponing the trigger condition evaluation until the cumulative effect might cause the trigger to be fired.
Temporal association rules were introduced to capture relationships among object evolutions.
Selected ongoing work
– Patterns whose cause and consequence do not happen together: there is a delay before the consequence shows up.
– Patterns involving relationships among objects, e.g., children tend to live farther away from their parents when they grow up.

Conclusions and Future Work
Selected future work
– Data mining over the Internet: data type, networking issues
– Analytical model: classify data mining problems, devise an efficient general approach
– Applications: compiler/programming language, WWW