Central limit theorem revisited

Presentation transcript:

Central limit theorem revisited: Throw a die twelve times; the distribution of values is not Gaussian. (Histogram: number of occurrences vs. die value.)

Repeat the twelve die throws twelve times and find the average of each set of twelve. (Histogram: number of occurrences vs. die value.)

What is the distribution of the AVERAGE of the 12 sets? (Histogram: number of occurrences vs. mean die value.)

Same as the last slide, but for 180 sets. (Histogram: number of occurrences vs. the average die value of each set.) Key point: even though the distribution within each set is NOT Gaussian, the averages of the sets have a Gaussian distribution.
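A minimal sketch (not from the slides; the set counts are taken from the example above) that simulates the dice experiment and shows the central limit theorem numerically:

```python
import numpy as np

rng = np.random.default_rng(0)

# One "set" = 12 die throws; repeat for 180 sets, as on the slide.
sets = rng.integers(1, 7, size=(180, 12))   # uniform on {1,...,6}: not Gaussian
set_means = sets.mean(axis=1)               # the average of each set of 12 throws

print(set_means.mean())   # close to 3.5, the expected value of a single throw
print(set_means.std())    # close to 1.71 / sqrt(12), about 0.49
# A histogram of set_means is approximately Gaussian even though each throw is uniform.
```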

TS1 = cos(2t), TS2 = 5*cos(2t): R² = 1, perfectly correlated. (Scatter plot of Time Series 2 vs. Time Series 1.)

TS1 = cos(t), TS2 = 5*cos(2t): R² = 0, no LINEAR correlation. (Scatter plot of Time Series 2 vs. Time Series 1.)

TS1 = cos(t), TS2 = sin(t): R² = 0, no correlation. (Scatter plot of Time Series 2 vs. Time Series 1.)

TS1 = cos(t), TS2 = cos(t − π/4): partially correlated (R² = 0.5). (Scatter plot of Time Series 2 vs. Time Series 1.)

Which linear fit (A, B, or C) minimizes the error in a least squares sense? Hint: what fraction of the variance explained is implied by each fit? For a fit y′ = a₁x′, R² = a₁² · var(x′) / var(y′).

Answer: C. An RMS error of 0.707 implies R² = 0.5; an RMS error of 0.767 implies less of the variance is explained.
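A minimal sketch (with assumed example series, not the slide's data) of the least-squares slope and the variance-explained formula above:

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(0, 4 * np.pi, 200)
x = np.cos(t)                                          # predictor time series
y = 0.7 * np.cos(t) + 0.3 * rng.normal(size=t.size)   # noisy series to be fit

xp, yp = x - x.mean(), y - y.mean()   # anomalies (the primed quantities)
a1 = (xp @ yp) / (xp @ xp)            # least-squares slope of y' = a1 * x'
r2 = a1**2 * xp.var() / yp.var()      # fraction of variance explained by the fit
print(a1, r2)
```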

Buying/selling nuts and bolts: Usually when you buy one nut, you buy one bolt as well. Sometimes you miscount a little bit, but most of the time you go to the store to buy sets of nuts and bolts.

If I ask you to go to the store to get 12 nuts and 10 bolts, you trace out 10 units on the bolt axis and 12 units on the nut axis. (Diagram: 12 nuts, 10 bolts.)

Alternatively, I could ask for 10 sets of nuts and bolts plus 2 extra nuts: trace 10 units on the nut-and-bolt axis and 2 units on the nut-only axis. Advantage: we explain more of the data with a single (rotated) axis.

Some additional considerations: maintaining orthogonality of our axes. Keep the principal axis and make the secondary axis orthogonal to it: 11 units on the nut-and-bolt axis, −1 unit on the bolt-excess axis.

Some additional considerations: Rotate the axes about the mean of the data (x and y). The principal axis is then the axis which explains the maximum fraction of the VARIANCE.

In EOF speak: The Empirical Orthogonal Function (also called the eigenvector) is the definition of the axes. The Principal Component (also called the time series) is the trace (projection) of each purchase (data realization) onto the axis.

What about buying nuts, bolts, and washers?

Subtract the mean from each axis.

Define the principal axis which explains the most variance. It is defined by 1 nut, 1 bolt, 2 washers, NORMALIZED.

The secondary axis is orthogonal to the primary axis and explains the greatest possible fraction of the REMAINING variance.

The third and final axis is orthogonal to the first two; it is therefore uniquely defined by the first two.

How do we represent the data that fall on the primary axis? Regress or correlate each realization of the data against the rotated axis (eigenvector). Data point = [−9 nuts, −9 bolts, −18 washers]. Projection onto the principal axis = [−9, −9, −18] · [0.408, 0.408, 0.816] = −22 units: the dimensional principal component of the first EOF associated with a normalized axis.
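A minimal sketch of the arithmetic on this slide (the purchase is already expressed as an anomaly from the mean):

```python
import numpy as np

axis = np.array([1.0, 1.0, 2.0])
axis = axis / np.linalg.norm(axis)          # normalized: [0.408, 0.408, 0.816]

purchase = np.array([-9.0, -9.0, -18.0])    # nuts, bolts, washers (mean removed)
print(purchase @ axis)                      # about -22 units along the principal axis
```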

Alternatively, we could normalize the PC over all realizations. (Table: for each buyer, the numbers of nuts, bolts, and washers bought, the projection onto the primary axis, and the resulting normalized first PC.)

Now regress the normalized PC against the original nut data. The regression coefficient is the number of nuts sold associated with a one-standard-deviation purchase of the first EOF/PC pair, 6 nuts in this case. Alternatively, we could express the PC's projection onto the nut data as a correlation.

Repeat the regression with bolts and washers and we now know the combination of nuts, bolts, and washers associated with a standard purchase on the primary EOF/axis: [6 nuts, 6 bolts, 12 washers] if the data were perfect. Multiply this standard purchase by the normalized PC and we recover a large percentage of our data.
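A minimal sketch of this whole chain (the purchase numbers here are assumed illustrative values, not the slide's exact table): project every purchase onto the primary axis, normalize the PC, then regress it back onto each variable to get the pattern associated with a one-standard-deviation purchase.

```python
import numpy as np

data = np.array([[14., 16., 32.],   # nuts, bolts, washers per buyer (assumed values)
                 [ 4.,  5.,  9.],
                 [20., 21., 39.],
                 [12., 10., 22.],
                 [ 5.,  6.,  9.]])

anoms = data - data.mean(axis=0)                  # remove the mean of each variable
axis = np.array([1.0, 1.0, 2.0]) / np.sqrt(6.0)   # normalized primary axis

pc = anoms @ axis            # projection of each purchase onto the primary axis
pc_norm = pc / pc.std()      # normalized PC (unit standard deviation)

# Regression of each variable onto the normalized PC: units of that variable per
# one standard deviation of the PC (the dimensional pattern of the first EOF).
pattern = anoms.T @ pc_norm / len(pc_norm)
print(pattern)               # roughly proportional to [1, 1, 2]

reconstruction = np.outer(pc_norm, pattern)       # recovers most of the anomaly data
```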

Again in EOF speak: The axis definitions are the EOFs (eigenvectors, spatial patterns). The trace or projection onto the axes for each data realization is the principal component (time series). Lastly, the fraction of the variance explained along each axis is given by the eigenvalue or singular value (we will get to this soon).

Some thoughts: The 1st EOF answers the question, what linear combination of the data best explains the variance? The 2nd EOF answers, what pattern best explains the REMAINING variance? It is severely constrained by orthogonality. Both the PCs and the EOFs are orthogonal. The direction of an EOF is ambiguous to 180 degrees. EOF/PC pairs are uniquely defined by the data set, not the physics! Key idea: explain the majority of a large data set with a small number of spatial/temporal patterns.

How does this relate to spatial-temporal data? At any instant, we represent a spatial pattern by an Nx by Ny matrix containing the data at each grid point. We could also think of the data at each time as a vector of length Nx * Ny observations; let Nx * Ny = M = the number of observations at each time.

Compose a matrix out of the vector of M observations at each of N observation times. M = number of observations at each time (spatial locations); N = number of time steps in the data set. Columns refer to a set of M spatial observations at a given time; rows correspond to a time series of observations at a given location.
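A minimal sketch (array names and dimensions are assumptions) of building this M x N space-time matrix from a gridded field:

```python
import numpy as np

ntime, nlat, nlon = 100, 30, 60                                      # assumed dimensions
field = np.random.default_rng(2).normal(size=(ntime, nlat, nlon))    # stand-in data

M = nlat * nlon                        # observations per time (spatial locations)
N = ntime                              # number of time steps
X = field.reshape(N, M).T              # M x N: rows = space, columns = time
X = X - X.mean(axis=1, keepdims=True)  # remove the time mean at each location
```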

Singular Value Decomposition (SVD): X = U S Vᵀ, where X is the M x N data matrix; U (M x M) contains the EOFs, the normalized spatial structures (the leading EOF/eigenvector is a vector of length M, normalized in space); S (M x N) is diagonal, and its diagonal entries squared give the amount of variance explained by each PC/EOF pair; and Vᵀ (N x N) contains the principal components, the normalized time series (the first PC is the first time series, normalized in time).
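A minimal sketch continuing the previous block (X, nlat, nlon from that sketch), using the thin SVD:

```python
U, s, Vt = np.linalg.svd(X, full_matrices=False)   # X = U @ diag(s) @ Vt

eofs = U                          # columns: normalized spatial patterns (length M)
pcs = Vt                          # rows: normalized time series (length N)
var_expl = s**2 / np.sum(s**2)    # fraction of variance per EOF/PC pair

leading_eof = eofs[:, 0].reshape(nlat, nlon)   # map of the first EOF
leading_pc = pcs[0, :]                         # its principal component time series
```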

Eigenvectors, eigenvalues, PCs: Eigenvectors explain variance in the spatial (observation) dimension; principal components explain variance in the time (realization of data) dimension. Each eigenvector has a corresponding principal component. The PAIR defines a mode that explains variance. Each eigenvector/PC pair has an associated eigenvalue which relates to how much of the total variance is explained by that mode.
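A minimal sketch (continuing the SVD example above) showing that the same eigenvalues come from the spatial covariance matrix, which is what ties the eigenvector/eigenvalue language to the SVD:

```python
C = X @ X.T / N                        # M x M covariance matrix between spatial points
evals, evecs = np.linalg.eigh(C)       # eigh returns ascending order
evals, evecs = evals[::-1], evecs[:, ::-1]

# Eigenvalues of the covariance matrix equal the squared singular values / N,
# and the leading eigenvectors are the leading EOFs (up to sign).
print(np.allclose(evals[:3], s[:3]**2 / N))
```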

Fabricated example of spatial-temporal data: What spatial and temporal patterns most efficiently describe this data set? Are they orthogonal in space? In time?

(Panels: EOF 1, 60% variance explained, with PC 1; EOF 2, 40% variance explained, with PC 2.)

Eigenvalue spectrum: % variance explained = eigenvalue / (sum of all eigenvalues). (Panels: EOF 1, 60% variance explained, with PC 1; EOF 2, 40% variance explained, with PC 2.)
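A minimal sketch (continuing the SVD example) of the eigenvalue spectrum and the percent variance explained per mode:

```python
eigvals = s**2 / N                        # eigenvalue of each EOF/PC pair
pct_var = 100 * eigvals / eigvals.sum()   # % variance explained = eigenvalue / sum
print(pct_var[:5])                        # spectrum of the first few modes
```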

Another fabricated example of spatial-temporal data: What spatial and temporal patterns most efficiently describe this data set? Are they orthogonal in space? In time?

(Panels: the EOFs and their PCs for this example: EOF 1 with PC 1, EOF 2 with PC 2.)

(Panels: the “data”; EOF 1, 65% variance explained, with PC 1; EOF 2, 35% variance explained, with PC 2.)

Cold tongue index: this map is a composite, not an EOF.

Correlate the cold tongue index time series with SLP at each grid point to get a correlation map
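A minimal sketch (variable names and random stand-in fields are assumptions) of building such a correlation map:

```python
import numpy as np

rng = np.random.default_rng(3)
ntime, nlat, nlon = 100, 30, 60
slp = rng.normal(size=(ntime, nlat, nlon))   # stand-in for SLP anomalies
cold_tongue = rng.normal(size=ntime)         # stand-in for the cold tongue index

slp_anom = slp - slp.mean(axis=0)
idx = (cold_tongue - cold_tongue.mean()) / cold_tongue.std()   # standardized index

# r at each grid point: covariance with the standardized index / local std deviation
corr_map = np.einsum('t,tij->ij', idx, slp_anom) / (ntime * slp_anom.std(axis=0))
```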

EOFs of real data: Northern Hemisphere winter sea level pressure (SLP) anomalies. EOF 1: the AO/NAM, the Arctic Oscillation / Northern Annular Mode (23% of the variance explained). EOF 2: the PNA, the Pacific North America mode (13% explained). EOF 3: non-distinct (10% explained).

(Panels: EOF 1 (AO/NAM), EOF 2 (PNA), and EOF 3 (?), with their time series PC 1 (AO/NAM), PC 2 (PNA), and PC 3 (?).)

Correlation maps vs. regression maps: for the 500 hPa PNA, the correlation map shows the r value of each point with the index, while the geopotential regression map has units of meters per standard deviation of the index. Correlation maps put each point on ‘equal footing’; regression maps show the magnitude of typical variability.
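A minimal sketch, continuing the correlation-map example above: the regression map uses the same covariance but keeps the field's units per one standard deviation of the index.

```python
# Regression map: field units per one standard deviation of the (standardized) index.
reg_map = np.einsum('t,tij->ij', idx, slp_anom) / ntime
# corr_map and reg_map differ only by the local standard deviation of the field.
```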

Regress the PCs of SLP against surface air temperature anomalies: this gives the surface air temperature field associated with a one-standard-deviation NAM or PNA event.

Significance: Each EOF/PC pair comes with an associated eigenvalue. The normalized eigenvalues (each eigenvalue divided by the sum of all of the eigenvalues) tell you the percent of variance explained by that EOF/PC pair. Eigenvalues need to be well separated from each other to be considered distinct modes. (Plot: the first 25 eigenvalues for DJF SLP, eigenvalue vs. EOF/PC number.)

Significance: the North test. North et al. (1982) provide an estimate of the error in estimating eigenvalues; it requires estimating the degrees of freedom (DOF) of the data set. If eigenvalues overlap, those EOFs cannot be considered distinct: any linear combination of overlapping EOFs is an equally viable structure. In the DJF SLP example, these two just barely overlap, so physical intuition is needed to help judge. (Plot: the first 25 eigenvalues for DJF SLP with error bars, an example of overlapping eigenvalues.)
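A minimal sketch of the North et al. (1982) rule of thumb; the effective sample size n_eff is an assumption that must be estimated for real data:

```python
import numpy as np

def north_test(eigvals, n_eff):
    """Eigenvalue error bars (lambda * sqrt(2/n_eff)) and flags for overlapping neighbours."""
    errors = eigvals * np.sqrt(2.0 / n_eff)
    overlaps = (eigvals[:-1] - errors[:-1]) < (eigvals[1:] + errors[1:])
    return errors, overlaps

eigvals = np.array([23.0, 13.0, 10.0, 8.0, 7.5])   # e.g. % variance of leading modes
errors, overlaps = north_test(eigvals, n_eff=60)   # n_eff = assumed effective DOF
print(errors)
print(overlaps)   # True where neighbouring modes cannot be considered distinct
```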

Validity of PCA modes, questions to ask: Is the variance explained more than expected under a null hypothesis (red noise, white noise, etc.)? Do we have an a priori reason for expecting this structure? Does it fit with a physical theory? Are the EOFs sensitive to the choice of spatial domain? Are the EOFs sensitive to the choice of sample? If the data set is subdivided (in time), do you still get the same EOFs?

Practical considerations: EOFs are easy to calculate, difficult to interpret. There are no hard and fast rules; physical intuition is a must. Due to the constraint of orthogonality, EOFs tend to create wave-like structures, even in data sets of pure noise. So pretty… so suggestive… so meaningless. Beware of this.

Practical considerations: EOFs are created using linear methods, so they only capture linear relationships. By nature, EOFs are fixed spatial patterns which vary only in strength and in sign; e.g., the ‘positive’ phase of an EOF looks exactly like the negative phase, just with its sign changed. Many phenomena in the climate system don’t exhibit this kind of symmetry, so EOFs can’t resolve them properly.

EOFs of global SLP, shown as correlation maps.

Regression of the Arctic Oscillation on zonal wind anomalies and temperature. (Cross-section plot; vertical coordinate in pressure and height, zonal wind in m/s.)

(Panels: SLP trend and surface temperature trend (K), 1950-1998: the total trend, the trend explained by the AO, and the trend NOT explained by the AO.)

AO index time series reconstructed from: 1) regression of the surface air temperature pattern associated with the AO onto maps of surface air temperature, and 2) regression of the AO EOF onto maps of SLP.

Rigor

First EOF of SST in the Pacific north of 20°N.

Analysis of stationary wave heat transport: 1st EOF of the seasonal cycle.