Spatial Data Mining. CS 697 Assignment 1, February 16, 2010. Pradnya Khutafale, Peter Lucas, and Chris Maio. Advisor: Dr. Wei Ding, Computer Science Department.

1 Spatial Data Mining. CS 697 Assignment 1, February 16, 2010. Pradnya Khutafale, Peter Lucas, and Chris Maio. Advisor: Dr. Wei Ding, Computer Science Department, UMass Boston.

2 Discovery of Climate Indices using Clustering
Principal Investigators:
- Vipin Kumar (University of Minnesota)
- Michael Steinbach (University of Minnesota)
Collaborators:
- Steven Klooster (Cal. State Univ., Monterey Bay)
- Christopher Potter (NASA Ames Research Center)
- Pang-Ning Tan (Michigan State University)

3 Researchers: Department of Computer Science and Engineering
Michael Steinbach, Pang-Ning Tan, Vipin Kumar
- Leading educators in the field of spatial data mining
- Investigating the use of data mining techniques to find interesting spatio-temporal patterns in Earth Science data
- Regarded as leaders in climate index identification and data mining research

4 Researchers: NASA Ames Research Center team members
Chris Potter, Steven Klooster
- Working on cutting-edge computer science methods and technologies for solving complex environmental problems

5 Presentation Outline
- Background (Chris): Climate Change; Earth System Linkages
- Earth Science Data and Climate Indices (Chris)
- Existing Eigenvalue Techniques and Limits (Pete)
- New Clustering Based Methodology (Pete)
- Results and Comparisons (Pradnya)
- Conclusions and Future Research (Pradnya and Pete)

7 Background: Climate Change
IPCC predictions:
- Rise in global temperatures
- Extinctions of plants and animals
- Sea-level rise

8 Background: Climate Change Impacts
- Climate change leads to significant changes in rainfall and soil moisture (drought and flood)
- Agricultural activities (crop growth cycles) and world food supplies are greatly affected by climatic factors (desertification)
- Climate change alters the frequency, intensity, and distribution of natural hazards such as hurricanes and other storms

9 Background: Earth System Linkages
- Ocean, atmosphere, and land processes are highly coupled
- Climate phenomena in one location can affect the climate at a faraway location; these links are known as climate teleconnections
- Understanding climate teleconnections is key to understanding and predicting ecosystem response to climate change

11 Earth Science Data: Time Series Data
- Sea Surface Temperature (SST)
- Sea Level Pressure (SLP)

12 Earth Science Data: Data Acquisition
- Thousands of floats, buoys, and other remote sensing devices throughout the oceans collect enormous amounts of oceanographic data, which is periodically transmitted to shore via satellite (Naval Research Laboratory)

13 Earth Science Data: Preprocessing Required
The spatial and temporal nature of the data poses a number of challenges:
- Noisy
- Cycles of varying lengths and regularity
- Strong seasonal component
- Long-term trends
- Temporal and spatial autocorrelation
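
One way to picture the preprocessing step is removing the seasonal component before any clustering. The sketch below is only an illustration: it assumes monthly data and subtracts each calendar month's mean, which may not be the preprocessing the researchers actually used. The function name `monthly_anomalies` and the synthetic series are hypothetical.

```python
from statistics import mean

def monthly_anomalies(series, period=12):
    """Subtract from each value the mean of all values in the same calendar month."""
    # Mean for each position in the seasonal cycle (e.g. all Januaries).
    month_means = [mean(series[m::period]) for m in range(period)]
    return [x - month_means[i % period] for i, x in enumerate(series)]

# Two years of synthetic monthly data: a repeating seasonal cycle plus a small yearly step.
cycle = [0, 1, 3, 6, 9, 11, 12, 11, 9, 6, 3, 1]
series = [cycle[i % 12] + 0.1 * (i // 12) for i in range(24)]

# After removing the cycle only the year-to-year change remains.
anomalies = monthly_anomalies(series)
```

The seasonal swing of 12 units disappears and what is left is the small year-to-year difference, which is the kind of signal a climate index is meant to capture.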

14 Climate Indices
- Climate indices are time series that summarize the physical behavior of different regions of the ocean and atmosphere
- They distill climate variability at regional or global scale into a single, manageable time series
- Usually based on sea level pressure and sea surface temperature
- Past methods of identification were painstakingly slow and tedious

15 Climate Indices: Climate Index Nino 1+2

17 Climate Indices: El Nino Correlations
- SST of El Nino correlated indices

18 Climate Indices: Detection of Climate Indices
- Earth Scientists have devoted a significant amount of time to discovering climate indices
- Traditional approaches include direct observation of climate phenomena (El Nino)
- Use of linear algebra techniques, including eigenvalue analysis

19 Climate Indices: Eigenvalue Analysis
- Driven by the massive amount of data obtained from satellites and remote sensing devices
- Provides a way to quickly and automatically detect patterns in large amounts of data
Figure: Jason-2 IR satellite image

20 Climate Indices: Eigenvalue Analysis
Eigenvalue techniques include:
- Principal Component Analysis (PCA)
- Singular Value Decomposition (SVD)
Limitations of eigenvalue analysis:
- Weaker signals may be masked by stronger signals
- All discovered signals must be orthogonal to each other, making it difficult to attach a physical interpretation to them

21 Climate Indices: Alternative Clustering Methodology
- Uses data mining techniques and enormous amounts of remote sensing data to find climate indices
- Analysis yields clusters that represent ocean regions with relatively homogeneous behavior
- The centroids of these clusters summarize the behavior of a particular region
- Finding "meaningful" clusters will enable Earth Scientists to better predict changes in the climate system

22 Climate Indices: Benefits of Clustering
- Discovered signals do not need to be orthogonal or statistically independent of one another
- Signals are more easily interpreted
- Weaker signals are more readily detected
- Provides an efficient way to determine the influence of one large set of points (all ocean points) on another large set of points (all land points)

23 Climate Indices: Results of Clustering Methodology
- Candidate indices highly correlated with known indices: a rediscovery of well-known indices that validates the method
- Variants of well-known indices that may be better predictors of land behavior for some regions
- Cluster centroids with medium or low correlation with known indices, which may represent new Earth science phenomena

25 Eigenvalue Techniques: Finding Spatial or Temporal Patterns using SVD Analysis
SVD: Singular Value Decomposition
- Earth Scientists have typically used SVD analysis to identify climate indices
- Goal: to find a new set of attributes that better describes the variability in the data, through dimensionality reduction
- Its operation can be thought of as revealing the internal structure of the data in the way that best explains the variance in the data
Pictured: Karl Pearson, statistician, 1857 – 1936

26 Eigenvalue Techniques: Overview of SVD Analysis
- These techniques are applied to a data set in the form of an m by n data matrix
- m rows (objects)
- n columns (attributes)
- Data matrix: a variation of record data in which all attributes are numeric
Figure: Example of a data matrix

27 Eigenvalue Techniques: Overview of SVD Analysis
- Assume the data objects in a matrix all have the same fixed set of attributes
- Each data object can be thought of as a point, or vector, in multidimensional space
- Each spatial dimension represents a distinct attribute describing the object

28 Simple Example of SVD Analysis
- Intuitive explanations of SVD are hard to find on the web
- Again, SVD is a way to expose the underlying structure of a matrix
Simple example using golf: 3 golfers play 9 holes and par every hole
- How do we predict the score for a player on a given hole?
- Assume two vectors, Player Ability and Hole Difficulty
- Predicted score = Player Ability * Hole Difficulty
- Hole Difficulty is the left singular vector
- Player Ability is the right singular vector
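
The golf example can be sketched in a few lines. This is an illustration under stated assumptions, not the presenters' code: the par values are made up, and the leading singular vectors are found by plain power iteration (a stand-in for a full SVD routine) so the snippet needs only the standard library. The helper name `leading_singular_triple` is hypothetical.

```python
import math

def leading_singular_triple(matrix, iterations=50):
    """Power iteration for the largest singular value and its two singular vectors."""
    rows, cols = len(matrix), len(matrix[0])
    # Uniform unit-length start vector; adequate for non-negative matrices like this one.
    v = [1.0 / math.sqrt(cols)] * cols
    for _ in range(iterations):
        # u = A v, normalised to unit length (left singular vector estimate).
        u = [sum(matrix[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        norm_u = math.sqrt(sum(x * x for x in u))
        u = [x / norm_u for x in u]
        # v = A^T u; its length is the singular value estimate.
        v = [sum(matrix[i][j] * u[i] for i in range(rows)) for j in range(cols)]
        sigma = math.sqrt(sum(x * x for x in v))
        v = [x / sigma for x in v]
    return u, sigma, v

pars = [4, 5, 3, 4, 4, 4, 5, 3, 4]          # hypothetical par for each of 9 holes
scores = [[float(p)] * 3 for p in pars]     # 9 holes x 3 golfers, everyone shoots par

# u plays the role of Hole Difficulty (left), v of Player Ability (right).
u, sigma, v = leading_singular_triple(scores)

# Predicted score = scale * hole difficulty * player ability.
reconstructed = [[sigma * u[i] * v[j] for j in range(3)] for i in range(9)]
```

Because every golfer shoots par, the score matrix has rank 1 and a single pair of singular vectors reproduces it exactly; with real scores, further singular vectors would capture the deviations.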

29 Eigenvalue Techniques: Finding Spatial or Temporal Patterns using SVD Analysis
- Given a data matrix whose rows are time series from various points on the globe, the objective is to discover the strong temporal or spatial patterns in the data
- SVD decomposes a matrix into two sets of patterns: a set of spatial patterns (left singular vectors) and a set of temporal patterns (right singular vectors)
- The temporal patterns can be visualized as regular line plots and the spatial patterns on a spatial grid
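
In symbols, the decomposition described on this slide can be written as follows. The notation is an assumption added for illustration: X is the m by n data matrix (m locations, n time steps), r is its rank, u_k are the left singular vectors (spatial patterns of length m), v_k are the right singular vectors (temporal patterns of length n), and the sigma_k are the singular values that rank the patterns by strength.

```latex
X = U \Sigma V^{\mathsf{T}} = \sum_{k=1}^{r} \sigma_k \, u_k v_k^{\mathsf{T}},
\qquad \sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_r > 0
```

Keeping only the first few terms of the sum gives the strongest patterns, which is the dimensionality reduction mentioned earlier.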

30 Eigenvalue Techniques: Example, Plotting SST (Sea Surface Temperature)
Figure: Temporal pattern of SST (blue) plotted against the NINO4 index (green)
Figure: Strongest spatial pattern of SST

31 Eigenvalue Techniques: Limitations of SVD Analysis
- Only useful for finding a few of the strongest signals
- Smaller patterns in the data may be obscured
- Signals must be orthogonal to each other (uncorrelated)
- May not identify all patterns in the data
- Efficiency can be a concern

33 Clustering Methods: Clustering Based Methodology for the Discovery of Climate Indices
Two key steps for finding climate indices:
1. Find candidate indices using clustering
2. Evaluate these candidate indices for Earth Science significance
Clustering method used for this study: the SNN ("Shared Nearest Neighbor") clustering algorithm

34 Clustering Methods: Finding Candidate Indices Using Clustering
SNN clustering algorithm:
- First finds the nearest neighbors of each data point
- Next, redefines the similarity between pairs of points in terms of how many nearest neighbors the two points share
- Using this definition of similarity, the algorithm identifies core points
- These core points are used to build clusters
- SNN algorithms have time complexity O(n*log(n))
Figure: Graph of the functions n(log n) and n
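
The first three steps above can be sketched as follows. This is a minimal illustration of one common formulation of shared nearest neighbor similarity, not the implementation used in the study: the neighbor search is brute force (quadratic in the number of points), the Euclidean distance stands in for whatever time series similarity the researchers used, and the names `k_nearest`, `snn_similarity`, `core_points`, `eps`, and `min_pts` are assumptions.

```python
import math

def k_nearest(points, k):
    """Return, for each point, the set of indices of its k nearest neighbours."""
    neighbours = []
    for i, p in enumerate(points):
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: math.dist(p, points[j]))
        neighbours.append(set(others[:k]))
    return neighbours

def snn_similarity(points, k):
    """Shared-neighbour count, kept only when each point is in the other's k-NN list."""
    nn = k_nearest(points, k)
    n = len(points)
    sim = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if j in nn[i] and i in nn[j]:
                sim[i][j] = sim[j][i] = len(nn[i] & nn[j])
    return sim

def core_points(sim, eps, min_pts):
    """A point is core if at least min_pts other points have SNN similarity >= eps with it."""
    return [i for i, row in enumerate(sim)
            if sum(1 for s in row if s >= eps) >= min_pts]

# Two well-separated groups of four points each (synthetic data).
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]
sim = snn_similarity(points, k=3)
cores = core_points(sim, eps=2, min_pts=3)
```

Points in the same group share neighbors and get a positive similarity, while points in different groups share none, so clusters built from the core points follow the two groups.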

35 Clustering Methods: Evaluation of Candidate Indices
- Indices must be evaluated in terms of Earth Science significance
- The goal is a numerical measure of the strength of the association between the behavior of a candidate index and land climate
- To evaluate the influence of climate indices on land, the researchers use area-weighted correlation
- Definition: the weighted average of the correlation of the candidate index with all land points, where each weight is based on the area of the land grid point

36 Clustering Methods: Calculating Area-Weighted Correlation
- Step 1: Compute the correlation of the candidate index's time series with the time series associated with each land point
- Step 2: Compute the weighted average of the correlations, where the weight associated with each land point is its area
- The resulting area-weighted correlation ranges from 0 to 1, with 1 being the strongest
General formula for a weighted average: $\bar{M} = \sum_c W_c M_c / \sum_c W_c$, where $W_c$ is the weight of each value and $M_c$ is the value being averaged
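
The two steps can be sketched as below. This is a hedged illustration rather than the researchers' code: it assumes the absolute value of each Pearson correlation is averaged (which is consistent with the stated range of 0 to 1), and the function names, the toy series, and the area values are all hypothetical.

```python
import math
from statistics import mean

def pearson(x, y):
    """Pearson correlation of two equal-length series."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def area_weighted_correlation(index, land_series, areas):
    """Step 1: correlate the index with every land point.
    Step 2: average the absolute correlations, weighted by grid-cell area."""
    total_area = sum(areas)
    return sum(w * abs(pearson(index, s))
               for w, s in zip(areas, land_series)) / total_area

index = [1, 2, 3, 4, 5]                 # candidate index (toy values)
land_series = [
    [2, 4, 6, 8, 10],                   # perfectly correlated with the index
    [5, 4, 3, 2, 1],                    # perfectly anti-correlated
    [1, 0, 0, 0, 1],                    # uncorrelated
]
areas = [1.0, 1.0, 2.0]                 # assumed grid-cell areas

result = area_weighted_correlation(index, land_series, areas)  # (1 + 1 + 0) / 4 = 0.5
```

On a latitude-longitude grid the cell area shrinks toward the poles, so in practice the weights would vary with latitude; here they are simply supplied as inputs.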

37 Clustering Methods: Comparison of Area-Weighted Correlations
- A baseline is developed for comparing the area-weighted correlations of candidate indices
- Figure: Histogram of the area-weighted correlation of 1000 random time series
- No random time series has an area-weighted correlation above 0.1, so 0.1 serves as the baseline for judging whether a candidate index is a good one
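
The baseline idea can be sketched as a small simulation: score many random time series against the land data and take the largest score as the threshold. Everything here is an assumption for illustration, including the synthetic Gaussian land series, the equal areas, the 200 trials, and the helper names, so the resulting threshold is not comparable to the 0.1 reported on the slide.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation of two equal-length series."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def area_weighted_correlation(index, land_series, areas):
    """Area-weighted mean of the absolute correlation with each land point."""
    return sum(w * abs(pearson(index, s))
               for w, s in zip(areas, land_series)) / sum(areas)

def random_baseline(land_series, areas, n_trials, rng):
    """Area-weighted correlation of n_trials random series with the land data."""
    length = len(land_series[0])
    scores = []
    for _ in range(n_trials):
        noise = [rng.gauss(0.0, 1.0) for _ in range(length)]
        scores.append(area_weighted_correlation(noise, land_series, areas))
    return scores

rng = random.Random(0)                                   # fixed seed for repeatability
land_series = [[rng.gauss(0.0, 1.0) for _ in range(120)] for _ in range(20)]
areas = [1.0] * 20                                       # equal areas, for simplicity
scores = random_baseline(land_series, areas, n_trials=200, rng=rng)

# A candidate index scoring above every random series is worth a closer look.
threshold = max(scores)
```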

38 Clustering Methods: Validation of Comparison Baseline
- The figure shows the area-weighted correlations of 11 known indices
- Note that 10 of the 11 indices have an area-weighted correlation above 0.1
- If a candidate index shows an area-weighted correlation above 0.1, investigate it
Figure: Graph of the area-weighted correlation of well-known climate indices

40 Results: SST Based Candidate Indices
- SNN clustering was applied to SST data covering the period from 1958 to 1998
- 107 clusters were obtained
- Cluster centroids were used to categorize the clusters into groups G0, G1, G2, and G3 according to their correlation with known indices

41 Results: 107 Sea Surface Temperature (SST) Clusters
- Find the correlation with known indices such as SOI, NINO1+2, etc.
- Find the area-weighted correlation with land

42 Results: SST Cluster Correlation
Table: Correlation of known indices with SST cluster centroids and SVD components

43 Results: G0, clusters with correlation to known indices >= 0.8
- Very highly correlated
- Rediscovered well-known indices
- Serve to validate the approach
Indices shown: NINO 1+2, NINO 3, NINO 3.4, NINO 4

44 Results: G0, SST Cluster Correlation
Table: Correlation of known indices with SST cluster centroids and SVD components

45 Results: G1, clusters with correlation to known indices from 0.4 to 0.8

46 Results: G1, Cluster 29 vs. El Nino Indices

47 Results: G2, clusters with correlation to known indices from 0.25 to 0.4
- Less correlated
- May represent new Earth science phenomena
- May be a new index

48 Results: Cluster 62 vs. El Nino Indices

49 Results: G3, clusters with correlation to known indices <= 0.25
- Less correlated
- May represent new Earth science phenomena or a weaker version of known phenomena
- New index
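
The four groups amount to a simple threshold rule on each cluster's strongest correlation with any known index. The sketch below assumes the absolute correlation is used and that a value of exactly 0.4 falls into G1; the slides do not say how that shared boundary is handled, and the function name is hypothetical.

```python
def correlation_group(best_corr):
    """Map a cluster's strongest correlation with any known index to G0..G3."""
    r = abs(best_corr)
    if r >= 0.8:
        return "G0"   # rediscovered well-known index
    if r >= 0.4:
        return "G1"   # variant of a known index
    if r > 0.25:
        return "G2"   # weakly related, possibly new
    return "G3"       # possibly new phenomenon or weak version of a known one

# Hypothetical correlations for four clusters.
groups = [correlation_group(r) for r in (0.93, -0.55, 0.30, 0.10)]
```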

50 Results: SLP-Based Candidate Indices
- SLP data over the time period from 1958 to 1998
- Correlation measured as the difference of all pairs of cluster centroids
- Negative correlations are interesting candidates
- 25 clusters found
Figure: 25 Sea Level Pressure based clusters

51 Results: SLP Clusters Pairwise Correlation
Note: Only negative correlation values are shown

52 Comparisons: Comparison with SVD Based Indices
- Correlation of cluster centroids with land temperature
- Correlation of the first 30 SVD components with land temperature

53 Comparisons: SST Clusters, Performance Comparison
Table: Correlation of known indices with SST cluster centroids and SVD components

54 Comparisons: SLP Clusters, Performance Comparison

55 Comparisons: SLP Clusters, Performance Comparison
Table: Area-weighted correlation of known indices with SLP cluster centroids and SVD components

56 Conclusions
- Demonstrated that clustering is a viable alternative to the eigenvalue-based approach for the discovery of climate indices
- Can replicate many well-known climate indices
- Also discovered variants of known indices that may be "better" for some regions
- Some indices may represent new Earth Science phenomena
- No need for discovered indices to be orthogonal
- No need to pre-select the area to analyze

57 Future Work
- Investigation of candidate indices by Earth Scientists
- Investigate whether there are climate indices that cannot be represented by clusters
- Noise elimination and other preprocessing improvements
- Aggregation

58 Questions?

