Published by Kimberly Hall. Modified over 9 years ago.
1
Spatial Data Mining
CS 697 Assignment 1, February 16, 2010
Pradnya Khutafale, Peter Lucas, and Chris Maio
Advisor: Dr. Wei Ding
Computer Science Department, UMass Boston
2
Discovery of Climate Indices using Clustering
Principal Investigators: Vipin Kumar (University of Minnesota), Michael Steinbach (University of Minnesota)
Collaborators: Steven Klooster (Cal. State Univ, Monterey Bay), Christopher Potter (NASA Ames Research Center), Pang-Ning Tan (Michigan State University)
3
Researchers: Department of Computer Science and Engineering
Michael Steinbach, Pang-Ning Tan, Vipin Kumar
- Leading educators in the field of spatial data mining
- Investigating the use of data mining techniques to find interesting spatio-temporal patterns in Earth science data
- Regarded as leaders in the field of climate indices identification and data mining research
4
Researchers: NASA Ames Research Center team members
Chris Potter, Steven Klooster
- Working on cutting-edge computer science methods and technologies for finding solutions to complex environmental problems
5
Presentation Outline
- Background (Chris)
  - Climate Change
  - Earth System Linkages
- Earth Science Data and Climate Indices (Chris)
- Existing Eigenvalue Techniques and Limits (Pete)
- New Clustering Based Methodology (Pete)
- Results and Comparisons (Pradnya)
- Conclusions and Future Research (Pradnya and Pete)
7
Background: Climate Change
IPCC Predictions
- Rise in global temperatures
- Extinctions of plants and animals
- Sea-level rise
8
Background: Climate Change Impacts
- Climate change leads to significant changes in rainfall and soil moisture (drought and flood)
- Agricultural activities (crop growth cycle) and world food supplies are greatly affected by climatic factors (desertification)
- Climate change increases the frequency, intensity, and distribution of natural hazards, such as hurricanes and other storms
9
Background: Earth System Linkages
- Ocean, atmosphere, and land processes are highly coupled
- Climate phenomena in one location can affect the climate at a faraway location; these links are known as climate teleconnections
- Understanding climate “teleconnections” is key to knowing and predicting ecosystem response to climate change
11
Earth Science Data: Time Series Data
- Sea Surface Temperature (SST)
- Sea Level Pressure (SLP)
12
Earth Science Data: Data Acquisition
Thousands of floats, buoys, and other remote sensing devices throughout the oceans collect enormous amounts of oceanographic data, which are periodically transmitted to shore via satellite (Naval Research Laboratory).
13
Earth Science Data: Preprocessing Required
The spatial and temporal nature of the data poses a number of challenges:
- Noisy
- Cycles of varying lengths and regularity
- Strong seasonal component
- Displays long-term trends
- Displays temporal and spatial autocorrelation
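The slides do not say which preprocessing was applied. As one possible illustration of handling the strong seasonal component, the sketch below standardizes each value against the mean and standard deviation of its own calendar month (a "monthly z-score"). The function name, the choice of z-scores, and the data are assumptions made for this example.

```python
from statistics import mean, pstdev

def monthly_zscore(series, period=12):
    """Remove the seasonal cycle: standardize each value against the
    mean and standard deviation of its own calendar month."""
    out = [0.0] * len(series)
    for month in range(period):
        idx = range(month, len(series), period)
        vals = [series[i] for i in idx]
        mu, sigma = mean(vals), pstdev(vals)
        for i in idx:
            # A month with no variation carries no anomaly signal; map it to 0.
            out[i] = (series[i] - mu) / sigma if sigma > 0 else 0.0
    return out

# Hypothetical data: 3 years of one seasonal cycle plus a year-to-year shift.
seasonal = [10, 12, 15, 19, 23, 26, 28, 27, 24, 19, 14, 11]
raw = [s + shift for shift in (-1.0, 0.0, 1.0) for s in seasonal]
anomalies = monthly_zscore(raw)
```

After this step the seasonal cycle is gone and only the year-to-year shift remains, so every month of a given year has the same anomaly value.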
14
Climate Indices
- Climate indices are time series that summarize the physical behavior of different regions of the ocean and atmosphere
- They distill climate variability at a regional or global scale into a single, manageable time series
- Usually based on sea level pressure and sea surface temperature
- Past methods of identifying indices were painstakingly slow and tedious
15
Climate Indices
Climate Index: Nino 1+2
17
Climate Indices: El Nino Correlations
SST of El Nino correlated indices
18
Climate Indices: Detection of Climate Indices
- Earth scientists have devoted a significant amount of time to discovering climate indices
- Traditional approaches include direct observation of climate phenomena (El Nino)
- Use of linear algebra techniques, including eigenvalue analysis
19
Climate Indices: Eigenvalue Analysis
- Driven by the massive amount of data obtained from satellites and remote sensing devices
- Provides a way to quickly and automatically detect patterns in large amounts of data
Image: Jason-2 IR satellite image
20
Climate Indices: Eigenvalue Analysis
Eigenvalue techniques include:
- Principal Component Analysis (PCA)
- Singular Value Decomposition (SVD)
Limitations of eigenvalue analysis:
- Weaker signals may be masked by stronger signals
- All discovered signals must be orthogonal to each other, making it difficult to attach a physical interpretation to them
21
Climate Indices: Alternative Clustering Methodology
- Uses data mining techniques and the enormous amount of remote sensing data to find climate indices
- Analysis yields clusters that represent ocean regions with relatively homogeneous behavior
- Centroids of these areas summarize the behavior of a particular region
- Finding “meaningful” clusters will enable Earth scientists to better predict changes in the climate system
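To make the centroid idea concrete, here is a minimal sketch that averages the time series of all grid points in one cluster, value by value, to obtain a single candidate index. The function name and the numbers are hypothetical; the slides do not give the exact computation.

```python
def cluster_centroid(member_series):
    """Average the member time series value by value."""
    n = len(member_series)
    return [sum(values) / n for values in zip(*member_series)]

# Hypothetical SST anomalies at three grid points in one cluster.
cluster = [
    [0.25, 0.5, -0.25, 0.0],
    [0.75, 0.5, -0.75, 0.0],
    [0.5, 0.5, -0.5, 0.0],
]
candidate_index = cluster_centroid(cluster)
```

The result is one time series of the same length as each member, which is what makes a centroid usable as a candidate climate index.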
22
Climate Indices: Benefits of Clustering
- Discovered signals do not need to be orthogonal or statistically independent of one another
- Signals are more easily interpreted
- Weaker signals are more readily detected
- It provides an efficient way to determine the influence of one large set of points (all ocean points) on another large set of points (all land points)
23
Climate Indices: Results of Clustering Methodology
- Candidate indices highly correlated to known indices, representing rediscovery of well-known indices and validation of the methods
- Variants of well-known indices, which may be better predictors of land behavior for some regions of land
- Cluster centroids that have medium or low correlation with known indices may represent new Earth science phenomena
25
Eigenvalue Techniques: Finding Spatial or Temporal Patterns using SVD Analysis
SVD: Singular Value Decomposition
- Earth scientists have typically used SVD analysis to identify climate indices
- Goal: to find a new set of attributes that better describe the variability in the data, through dimensionality reduction
- Its operation can be thought of as revealing the internal structure of the data in a way that best explains the variance in the data
Image: Karl Pearson, statistician (1857–1936)
26
Eigenvalue Techniques: Overview of SVD Analysis
- These techniques are applied to a data set in the form of a data matrix (m by n)
  - m rows (objects)
  - n columns (attributes)
- Data matrix: a variation of record data in that it consists of all numeric attributes
Figure: example of a data matrix
27
Eigenvalue Techniques: Overview of SVD Analysis
- Assume the data objects in a matrix all have the same fixed set of attributes
- Each data object can be thought of as a point, or vector, in multidimensional space
- Each spatial dimension represents a distinct attribute describing the object
28
Simple Example of SVD Analysis
- Using the web alone, it is hard to find an intuitive explanation of SVD
- Again, SVD is a way to expose the underlying details of a matrix
- Simple example using golf: 3 golfers play 9 holes and par every hole
- How to predict the score for a player on a given hole?
- Assume two vectors, Player Ability and Hole Difficulty
- Predicted score = Player Ability * Hole Difficulty
- Hole Difficulty is the left singular vector
- Player Ability is the right singular vector
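The golf example can be sketched with NumPy. The par values below are hypothetical (the slide does not list them), and the score matrix is assumed to have holes as rows and golfers as columns, which is what makes Hole Difficulty the left singular vector and Player Ability the right one.

```python
import numpy as np

# Hypothetical pars for 9 holes (not taken from the slides).
par = np.array([4, 5, 3, 4, 4, 4, 4, 3, 5], dtype=float)
# Rows = holes, columns = 3 golfers who all shoot par on every hole.
scores = np.outer(par, np.ones(3))

U, s, Vt = np.linalg.svd(scores, full_matrices=False)

# A rank-1 matrix has a single non-zero singular value.
hole_difficulty = U[:, 0]   # left singular vector (one entry per hole)
player_ability = Vt[0, :]   # right singular vector (one entry per player)

# Predicted score = scale * hole difficulty * player ability.
predicted = s[0] * np.outer(hole_difficulty, player_ability)
```

Because every golfer shoots par, one pair of vectors (scaled by the first singular value) reproduces the whole score table; real scores would need further singular vectors to capture the remaining variation.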
29
Eigenvalue Techniques: Finding Spatial or Temporal Patterns using SVD Analysis
- Given a data matrix whose rows consist of time series from various points on the globe, the objective is to discover the strong temporal or spatial patterns in the data
- SVD decomposes a matrix into two sets of patterns, which correspond to a set of spatial patterns (left singular vectors) and a set of temporal patterns (right singular vectors)
- We can plot the temporal patterns as regular line plots and the spatial patterns on a spatial grid to visualize these patterns
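A small synthetic sketch of this decomposition follows. The data are made up for illustration: one temporal oscillation is planted with a different strength at each location, noise is added, and the first pair of singular vectors is then read off as the strongest spatial and temporal pattern. All sizes and names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_points, n_months = 50, 240

# Hypothetical planted signal: one oscillation felt with a different
# strength at each location, plus noise.
temporal = np.sin(2 * np.pi * np.arange(n_months) / 48)
spatial = rng.uniform(0.5, 2.0, size=n_points)
data = np.outer(spatial, temporal) + 0.1 * rng.standard_normal((n_points, n_months))

# Center each row (one time series per location) before decomposing.
data -= data.mean(axis=1, keepdims=True)

U, s, Vt = np.linalg.svd(data, full_matrices=False)
spatial_pattern = U[:, 0]    # one weight per location
temporal_pattern = Vt[0, :]  # one value per time step

# The sign of a singular vector is arbitrary, so compare absolute correlation.
recovered = abs(np.corrcoef(temporal_pattern, temporal)[0, 1])
```

With rows as locations and columns as time steps, `U` holds the spatial patterns and `Vt` the temporal ones; transposing the data matrix would swap the two roles.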
30
Eigenvalue Techniques: Example of Plotting SST (Sea Surface Temperature)
- Figure: temporal pattern of SST (blue) plotted against the NINO4 index (green)
- Figure: strongest spatial pattern of SST
31
Eigenvalue Techniques: Limitations of SVD Analysis
- Only useful for finding a few of the strongest signals
- Smaller patterns in the data may be obscured
- Signals must be orthogonal to each other (uncorrelated)
- May not identify all patterns in the data
- Efficiency can be a concern
33
Clustering Methods: Clustering Based Methodology for the Discovery of Climate Indices
Two key steps for finding climate indices:
1. Find candidate indices using clustering
2. Evaluate these candidate indices for Earth science significance
Clustering method used for this study: the SNN ("Shared Nearest Neighbor") clustering algorithm
34
Clustering Methods: Finding Candidate Indices Using Clustering
SNN clustering algorithm:
- First finds the nearest neighbors of each data point
- Next, redefines the similarity between pairs of points in terms of how many nearest neighbors the two points share
- Using this definition of similarity, the algorithm identifies core points
- These core points are used to build clusters
- SNN algorithms have time complexity O(n*log(n))
Figure: graph of the functions n log n and n
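The first three steps can be sketched in plain Python on one-dimensional points. This is only an illustration under assumptions: the parameter names and thresholds are invented, the neighbor search is brute force (quadratic, not the O(n*log(n)) cited on the slide), and the final step of linking core points into clusters is omitted.

```python
def snn_core_points(points, k=4, min_shared=2, min_strong_links=2):
    """Sketch of the first SNN steps on 1-D points."""
    n = len(points)
    # Step 1: k nearest neighbors of each point (brute force).
    knn = []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: abs(points[i] - points[j]))
        knn.append(set(order[:k]))

    # Step 2: similarity = number of shared nearest neighbors, counted
    # only when the two points appear in each other's neighbor lists.
    def snn_similarity(i, j):
        if i in knn[j] and j in knn[i]:
            return len(knn[i] & knn[j])
        return 0

    # Step 3: a core point has enough strongly similar neighbors.
    core = []
    for i in range(n):
        strong = sum(1 for j in knn[i] if snn_similarity(i, j) >= min_shared)
        if strong >= min_strong_links:
            core.append(i)
    return core

# Hypothetical data: two tight groups and one isolated point.
points = [0.0, 0.1, 0.2, 0.3, 0.4, 5.0, 5.1, 5.2, 5.3, 5.4, 20.0]
core = snn_core_points(points)
```

The isolated point at 20.0 lists members of the second group as its neighbors, but none of them list it back, so its shared-neighbor similarity is zero and it is not a core point.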
35
Clustering Methods: Evaluation of Candidate Indices
- Indices must be evaluated in terms of Earth science significance, meaning the strength of the association between the behavior of a candidate index and land climate
- The goal is to find a numerical measure of the strength of the association between the behavior of an index and land climate
- To evaluate the influence of climate indices on land, the researchers use area-weighted correlation
- Definition: the weighted average of the correlation of the candidate index with all land points, where the weight is based on the area of the land grid point
36
Clustering Methods: Calculating Area-Weighted Correlation
- Step 1: Compute the correlation of the time series of the candidate index with the time series associated with each land point
- Step 2: Compute the weighted average of the correlations, where the weight associated with each land point is its area
- The resulting area-weighted correlation ranges from 0 to 1, with 1 being the strongest
General formula for a weighted average (W.A.): W.A. = sum(Wc * Mc) / sum(Wc), where Wc = weight of each value and Mc = the value to average
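The two steps can be sketched as follows. Two points are assumptions rather than statements from the slides: the stated range of 0 to 1 suggests that the absolute value of each correlation is averaged, and the area of a grid cell is approximated here by the cosine of its latitude. All names and data are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def area_weighted_correlation(index, land_series, latitudes_deg):
    # Assumption: cell area on a regular lat/lon grid scales with cos(latitude).
    weights = [math.cos(math.radians(lat)) for lat in latitudes_deg]
    # Step 1: correlation of the index with each land point
    # (absolute value assumed, so the result lies between 0 and 1).
    corrs = [abs(pearson(index, series)) for series in land_series]
    # Step 2: weighted average of the correlations.
    return sum(w * c for w, c in zip(weights, corrs)) / sum(weights)

# Hypothetical index and three land points at latitudes 0, 60, and 0 degrees.
index = [1, 2, 3, 4, 5]
land = [
    [2, 4, 6, 8, 10],    # perfectly correlated with the index
    [5, 4, 3, 2, 1],     # perfectly anti-correlated
    [1, -1, 0, -1, 1],   # uncorrelated
]
awc = area_weighted_correlation(index, land, [0, 60, 0])
```

Here the weights are 1, 0.5, and 1 and the absolute correlations are 1, 1, and 0, so the weighted average is (1 + 0.5 + 0) / 2.5 = 0.6.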
37
Clustering Methods: Comparison of Area-Weighted Correlations
- Development of a baseline against which to compare the area-weighted correlations of candidate indices
- Figure: histogram of the area-weighted correlation of 1000 random time series
- No random time series has a weighted area correlation (WAC) > 0.1; this value serves as the baseline for judging whether a candidate index is a good one
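The baseline idea can be imitated on synthetic data: score many random time series against the land points and take the largest score as the level a real candidate has to beat. Everything below is hypothetical (grid size, weights, the shared "driver" signal, and the use of absolute correlations), so the numbers it produces say nothing about the 0.1 threshold reported on the slide.

```python
import numpy as np

rng = np.random.default_rng(42)
n_land, n_months = 200, 492  # hypothetical grid; 41 years x 12 months

# Hypothetical land series that share one common driver plus noise.
driver = rng.standard_normal(n_months)
land = 0.5 * driver + rng.standard_normal((n_land, n_months))
weights = rng.uniform(0.2, 1.0, size=n_land)  # stand-in for cell areas

def standardize(x):
    """Zero mean and unit variance along the time axis."""
    return (x - x.mean(axis=-1, keepdims=True)) / x.std(axis=-1, keepdims=True)

z_land = standardize(land)

def area_weighted_corr(index):
    # Pearson correlation with every land point, absolute value assumed.
    corrs = np.abs(z_land @ standardize(index)) / n_months
    return float(np.average(corrs, weights=weights))

# Scores of 1000 random time series; their maximum acts as the baseline.
random_scores = np.array(
    [area_weighted_corr(rng.standard_normal(n_months)) for _ in range(1000)]
)
baseline = random_scores.max()
driver_score = area_weighted_corr(driver)
```

In this toy setting the planted driver scores well above every random series, which mirrors how the presenters use the random baseline to decide whether a candidate index deserves investigation.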
38
Clustering Methods: Validation of Comparison Baseline
- Shown below are the weighted area correlations of 11 known indices
- Note that 10 of the 11 indices have a weighted area correlation > 0.1
- If a candidate index shows a weighted area correlation > 0.1, investigate it
Figure: graph of the weighted area correlation of well-known climate indices
40
Results: SST-Based Candidate Indices
- Used SST data over the time period from 1958 to 1998 and applied SNN clustering
- Obtained 107 clusters
- Cluster centroids were used to categorize clusters into groups G0, G1, G2, and G3, depending on their correlation to known indices
41
Results: 107 Sea Surface Temperature (SST) Clusters
- Find the correlation with known indices such as SOI and NINO1+2
- Find the area-weighted correlation with land
42
Results: SST Cluster Correlation
Correlation of known indices with SST cluster centroids and SVD components
43
Results: G0, clusters with correlation to known indices >= 0.8
- Very highly correlated
- Rediscovered well-known indices
- Serve to validate the approach
Figure labels: NINO 1+2, NINO 3, NINO 3.4, NINO 4
44
Results: G0, SST Cluster Correlation
Correlation of known indices with SST cluster centroids and SVD components
45
Results: G1, clusters with correlation to known indices from 0.4 to 0.8
46
Results: G1, Cluster 29 vs. El Nino Indices
47
Results: G2, clusters with correlation to known indices from 0.25 to 0.4
- Less correlated
- May represent new Earth science phenomena
- May be a new index
48
Results: Cluster 62 vs. El Nino Indices
49
Results: G3, clusters with correlation to known indices <= 0.25
- Less correlated
- May represent new Earth science phenomena or a weaker version of known phenomena
- New index
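The four groups amount to a threshold rule on a cluster centroid's highest correlation with any known index. The sketch below encodes the thresholds given on the slides; how the shared boundary at 0.4 is assigned (here to G1) and the use of a single "best correlation" value per cluster are assumptions of this example.

```python
def correlation_group(best_corr):
    """Map a centroid's best correlation with known indices to G0..G3."""
    if best_corr >= 0.8:
        return "G0"  # rediscovered well-known index
    if best_corr >= 0.4:
        return "G1"  # boundary value 0.4 assigned to G1 by assumption
    if best_corr > 0.25:
        return "G2"  # possibly a new phenomenon
    return "G3"      # new phenomenon or weaker version of a known one

# Hypothetical correlations for a handful of cluster centroids.
examples = [0.85, 0.8, 0.5, 0.3, 0.25, 0.1]
groups = [correlation_group(c) for c in examples]
```

For the example values the rule yields G0, G0, G1, G2, G3, and G3 in that order.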
50
Results: SLP-Based Candidate Indices
- SLP data over the time period from 1958 to 1998
- Correlation measured as the difference of all pairs of cluster centroids
- Negative correlations are interesting candidates
- 25 clusters found
Figure: 25 sea level pressure based clusters
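One possible reading of this slide is that every pair of SLP cluster centroids is compared, and that pairs whose centroids are negatively correlated yield candidate indices formed from the difference of the two centroids. The sketch below follows that reading on made-up data; the construction and all names are assumptions, not the presenters' confirmed procedure.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)
n_months = 120
base = rng.standard_normal(n_months)

# Hypothetical centroids: 0 and 1 move in opposition, 2 is unrelated.
centroids = np.vstack([
    base + 0.3 * rng.standard_normal(n_months),
    -base + 0.3 * rng.standard_normal(n_months),
    rng.standard_normal(n_months),
])

# Pairwise correlation matrix of the centroids.
corr = np.corrcoef(centroids)

# Keep negatively correlated pairs; the candidate is the centroid difference.
candidates = {}
for i, j in combinations(range(len(centroids)), 2):
    if corr[i, j] < 0:
        candidates[(i, j)] = centroids[i] - centroids[j]
```

Each entry of `candidates` is a time series of the same length as the centroids, so it can be scored against land points in the same way as any other candidate index.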
51
Results: SLP Clusters Pairwise Correlation
Note: only negative correlation values are shown
52
Comparisons: Comparison with SVD-Based Indices
- Correlation of cluster centroids with land temperature
- Correlation of the first 30 SVD components with land temperature
53
Comparisons: SST Clusters, Performance Comparison
Correlation of known indices with SST cluster centroids and SVD components
54
Comparisons: SLP Clusters, Performance Comparison
55
Comparisons: SLP Clusters, Performance Comparison
Area-weighted correlation of known indices with SLP cluster centroids and SVD components
56
Conclusions
- Demonstrated that clustering is a viable alternative to the eigenvalue-based approach for the discovery of climate indices
- Can replicate many well-known climate indices
- Have also discovered variants of known indices that may be “better” for some regions
- Some indices may represent new Earth science phenomena
- No need for discovered indices to be orthogonal
- No need to pre-select the area to analyze
57
Future Work
- Investigation of candidate indices by Earth scientists
- Investigate whether there are climate indices that cannot be represented by clusters
- Noise elimination and other preprocessing improvements
- Aggregation
58
Questions?