
1 Columbia University – Advanced Machine Learning & Perception, Fall 2006 Term Project
Nonlinear Dimensionality Reduction and K-Nearest Neighbor Classification Applied to Global Climate Data
Carlos Henrique Ribeiro Lima
New York – Dec/2006

2 Outline
1. Goals
2. Motivation and Dataset
3. Methodology
4. Results
   4.1 Low-Dimensional Manifold
   4.2 KNN on the Low-Dimensional Manifold
5. Conclusions

3 1. Goals
1. Use kernel PCA based on Semidefinite Embedding to identify the low-dimensional, nonlinear manifold of climate datasets → identification of the main modes of spatial variability;
2. Classify on the feature space → predictions on the original space (KNN method).

4 2. Motivation
Dataset of monthly Sea Surface Temperature (SST).
Extreme El Niño events (e.g., 1997) have huge economic and social impacts → need for forecasting models!

5 2. Dataset
Monthly Sea Surface Temperature (SST) data from Jan/1856 to Dec/2005:
1. Latitudinal band: 25°S–25°N;
2. Grid with 599 cells;
3. Training data: Jan/1856 to Dec/1975 = 120 years;
4. Testing set: Jan/1976 to Dec/2005 = 30 years;
5. Input matrix: n = 1440 points, m = 599 dimensions.
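
For concreteness, a minimal sketch of how this split yields the 1440 × 599 input matrix. The file name and in-memory layout are assumptions for illustration, not the author's actual pipeline:

```python
import numpy as np

# Assumed layout: one row per month (Jan/1856-Dec/2005 = 1800 months),
# one column per grid cell in the 25S-25N band (599 cells).
sst = np.load("sst_monthly.npy")  # hypothetical file, shape (1800, 599)
months = np.arange("1856-01", "2006-01", dtype="datetime64[M]")

train = sst[months < np.datetime64("1976-01")]   # 120 years -> (1440, 599)
test = sst[months >= np.datetime64("1976-01")]   # 30 years  -> (360, 599)
assert train.shape == (1440, 599) and test.shape == (360, 599)
```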

6 3. Methodology
1) Semidefinite Embedding (code from K. Q. Weinberger):
   - positive semidefiniteness;
   - inner products centered on the origin;
   - isometry: local distances of the input space are preserved in the feature space.
2) KNN → Euclidean distance.
3) Probabilistic forecasting → skill score (Ranked Probability Score, RPS).
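
A minimal sketch of the Semidefinite Embedding optimization (maximum variance unfolding): maximize trace(K) over Gram matrices K that are positive semidefinite, centered, and isometric on the k-nearest-neighbor graph. This uses cvxpy rather than the code credited on the slide; the solver and neighborhood size are assumptions, and a full SDP on all 1440 points would be expensive, so this is illustrative for small n:

```python
import numpy as np
import cvxpy as cp
from sklearn.neighbors import NearestNeighbors

def sde_embedding(X, n_neighbors=4, n_components=3):
    n = X.shape[0]
    # k-nearest-neighbor graph: these are the local distances to preserve
    _, idx = NearestNeighbors(n_neighbors=n_neighbors + 1).fit(X).kneighbors(X)

    K = cp.Variable((n, n), PSD=True)        # positive semidefiniteness
    constraints = [cp.sum(K) == 0]           # inner products centered on the origin
    for i in range(n):
        for j in idx[i, 1:]:                 # skip self (index 0)
            d2 = np.sum((X[i] - X[j]) ** 2)
            # isometry: preserve squared distances between neighbors
            constraints.append(K[i, i] + K[j, j] - 2 * K[i, j] == d2)

    # "unfold" the manifold by maximizing total variance
    cp.Problem(cp.Maximize(cp.trace(K)), constraints).solve(solver=cp.SCS)

    # top eigenvectors of the learned Gram matrix give the embedding
    w, V = np.linalg.eigh(K.value)
    order = np.argsort(w)[::-1][:n_components]
    return V[:, order] * np.sqrt(np.maximum(w[order], 0))
```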

7 4. Results – Low-Dimensional Manifold [figure]

8 4. Results – Labeling on the feature space [figure]

9 4. Results – Forecasts, Testing Set: KNN method and skill score
E.g., March 1997:
1) Want to predict the class of Niño3 in Dec/1997 → lead time = 9 months;
2) KNN on the feature space (all Marches from 1856 to 1975);
3) Take the classes and weights of the k neighbors;
4) Compute the skill score (a sketch of steps 2–4 follows below).
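
A sketch of steps 2–4, assuming inverse-distance weights and three ordered Niño3 categories (e.g., terciles); those two specifics are illustrative choices, not confirmed by the slides:

```python
import numpy as np

def knn_class_probs(z_query, Z_train, y_train, k=10, n_classes=3):
    """Weighted KNN class probabilities on the low-dimensional manifold.
    Weights are inverse Euclidean distances (an assumed choice)."""
    d = np.linalg.norm(Z_train - z_query, axis=1)
    nn = np.argsort(d)[:k]
    w = 1.0 / (d[nn] + 1e-12)
    probs = np.zeros(n_classes)
    for cls, wt in zip(y_train[nn], w):
        probs[cls] += wt
    return probs / probs.sum()

def rps(probs, obs_class):
    """Ranked Probability Score: sum of squared differences between the
    cumulative forecast and observed distributions over ordered categories."""
    F = np.cumsum(probs)
    O = np.cumsum(np.eye(len(probs))[obs_class])
    return np.sum((F - O) ** 2)
```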

10 4. Results – Forecasts, Testing Set: KNN method and skill score – El Niño events of 1982 and 1997 [figure]

11 5. Conclusions
1. Semidefinite Embedding performs well on the SST data (from high-dimensional down to just 3 dimensions with ~90% of explained variance);
2. The KNN method provides very good classification and forecasts;
3. Need to check sensitivity to changes in some parameters (number of local neighbors in the embedding, number of nearest neighbors k in KNN);
4. Plan to extend to other climate datasets;
5. Try other metrics, multivariate data, etc.
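
The ~90% figure in item 1 corresponds to the share of the Gram matrix's spectrum captured by the top three eigenvalues; a short helper, assuming w holds the eigenvalues of K from the embedding sketch above:

```python
import numpy as np

def explained_variance(w, n_components=3):
    """Fraction of total variance captured by the top eigenvalues of K."""
    w = np.sort(np.maximum(np.asarray(w), 0.0))[::-1]
    return w[:n_components].sum() / w.sum()
```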

