Characterizing the Spatiotemporal Variation of Environmental Data I. Principal Component Analysis II. Time Series Analysis Brian K. Eder Air Resources.

Slides:



Advertisements
Similar presentations
Sampling Design, Spatial Allocation, and Proposed Analyses Don Stevens Department of Statistics Oregon State University.
Advertisements

Krishna Rajan Data Dimensionality Reduction: Introduction to Principal Component Analysis Case Study: Multivariate Analysis of Chemistry-Property data.
Air Force Technical Applications Center 1 Subspace Based Three- Component Array Processing Gregory Wagner Nuclear Treaty Monitoring Geophysics Division.
An Introduction to Multivariate Analysis
Factor Analysis Continued
Dimension reduction (1)
1er. Escuela Red ProTIC - Tandil, de Abril, 2006 Principal component analysis (PCA) is a technique that is useful for the compression and classification.
Statistical tools in Climatology René Garreaud
The influence of extra-tropical, atmospheric zonal wave three on the regional variation of Antarctic sea ice Marilyn Raphael UCLA Department of Geography.
Visualizing and Exploring Data Summary statistics for data (mean, median, mode, quartile, variance, skewnes) Distribution of values for single variables.
Lecture 7: Principal component analysis (PCA)
Factor Analysis Research Methods and Statistics. Learning Outcomes At the end of this lecture and with additional reading you will be able to Describe.
Principle Component Analysis What is it? Why use it? –Filter on your data –Gain insight on important processes The PCA Machinery –How to do it –Examples.
A quick introduction to the analysis of questionnaire data John Richardson.
1 Interannual Variability in Stratospheric Ozone Xun Jiang Advisor: Yuk L. Yung Department of Environmental Science and Engineering California Institute.
INTERDECADAL OSCILLATIONS OF THE SOUTH AMERICAN MONSOON AND THEIR RELATIONSHIP WITH SEA SURFACE TEMPERATURE João Paulo Jankowski Saboia Alice Marlene Grimm.
Dr. Michael R. Hyman Factor Analysis. 2 Grouping Variables into Constructs.
Center for Hydrometeorology and Remote Sensing, University of California, Irvine Data Mining Tools-Empirical Orthogonal Functions and Nonlinear Mode Decomposition.
Chapter 2 Dimensionality Reduction. Linear Methods
Principal Components Analysis BMTRY 726 3/27/14. Uses Goal: Explain the variability of a set of variables using a “small” set of linear combinations of.
Extensions of PCA and Related Tools
El Niño-Southern Oscillation in Tropical Column Ozone and A 3.5-year signal in Mid-Latitude Column Ozone Jingqian Wang, 1* Steven Pawson, 2 Baijun Tian,
What is it? Principal Component Analysis (PCA) is a standard tool in multivariate analysis for examining multidimensional data To reveal patterns between.
Principal Component Analysis Bamshad Mobasher DePaul University Bamshad Mobasher DePaul University.
Modulation of eastern North Pacific hurricanes by the Madden-Julian oscillation. (Maloney, E. D., and D. L. Hartmann, 2000: J. Climate, 13, )
Advanced Correlational Analyses D/RS 1013 Factor Analysis.
Principal Component vs. Common Factor. Varimax Rotation Principal Component vs. Maximum Likelihood.
Interpreting Principal Components Simon Mason International Research Institute for Climate Prediction The Earth Institute of Columbia University L i n.
Descriptive Statistics vs. Factor Analysis Descriptive statistics will inform on the prevalence of a phenomenon, among a given population, captured by.
Long-Term Changes in Northern and Southern Annular Modes Part I: Observations Christopher L. Castro AT 750.
Objective Data  The outlined square marks the area of the study arranged in most cases in a coarse 24X24 grid.  Data from the NASA Langley Research Center.
AIRS Radiance and Geophysical Products: Methodology and Validation Mitch Goldberg, Larry McMillin NOAA/NESDIS Walter Wolf, Lihang Zhou, Yanni Qu and M.
Principal Components Analysis. Principal Components Analysis (PCA) A multivariate technique with the central aim of reducing the dimensionality of a multivariate.
Lecture 12 Factor Analysis.
Introduction to Linear Algebra Mark Goldman Emily Mackevicius.
Christina Bonfanti University of Miami- RSMAS MPO 524.
Evaluation of Models-3 CMAQ I. Results from the 2003 Release II. Plans for the 2004 Release Model Evaluation Team Members Prakash Bhave, Robin Dennis,
1 Opposite phases of the Antarctic Oscillation and Relationships with Intraseasonal to Interannual Activity in the Tropics during the Austral Summer (submitted.
Principle Component Analysis and its use in MA clustering Lecture 12.
A signal in the energy due to planetary wave reflection in the upper stratosphere J. M. Castanheira(1), M. Liberato(2), C. DaCamara(3) and J. M. P. Silvestre(1)
3 “Products” of Principle Component Analysis
UBC/UW 2011 Hydrology and Water Resources Symposium Friday, September 30, 2011 DIAGNOSIS OF CHANGING COOL SEASON PRECIPITATION STATISTICS IN THE WESTERN.
Université d’Ottawa / University of Ottawa 2001 Bio 8100s Applied Multivariate Biostatistics L11.1 Lecture 11: Canonical correlation analysis (CANCOR)
FACTOR ANALYSIS.  The basic objective of Factor Analysis is data reduction or structure detection.  The purpose of data reduction is to remove redundant.
Principal Components Analysis ( PCA)
Central limit theorem - go to web applet. Correlation maps vs. regression maps PNA is a time series of fluctuations in 500 mb heights PNA = 0.25 *
Multivariate statistical methods. Multivariate methods multivariate dataset – group of n objects, m variables (as a rule n>m, if possible). confirmation.
Chapter 14 EXPLORATORY FACTOR ANALYSIS. Exploratory Factor Analysis  Statistical technique for dealing with multiple variables  Many variables are reduced.
Dimension reduction (1) Overview PCA Factor Analysis Projection persuit ICA.
Multivariate Time Series Analysis Charles D. Camp MSRI July 18, 2008.
Daiwen Kang 1, Rohit Mathur 2, S. Trivikrama Rao 2 1 Science and Technology Corporation 2 Atmospheric Sciences Modeling Division ARL/NOAA NERL/U.S. EPA.
Factor and Principle Component Analysis
North Atlantic Oscillation and its Influence on Ozone in London
Factor analysis Advanced Quantitative Research Methods
Principal Component Analysis (PCA)
Dimension Reduction via PCA (Principal Component Analysis)
Measuring latent variables
Interpreting Principal Components
Principal Component Analysis
Measuring latent variables
Descriptive Statistics vs. Factor Analysis
Measuring latent variables
Introduction to Statistical Methods for Measuring “Omics” and Field Data PCA, PcoA, distance measure, AMOVA.
X.1 Principal component analysis
PCA of Waimea Wave Climate
Factor Analysis (Principal Components) Output
Principal Component Analysis
Measuring latent variables
Evaluation of Models-3 CMAQ Annual Simulation Brian Eder, Shaocai Yu, Robin Dennis, Alice Gilliland, Steve Howard,
Presentation transcript:

Characterizing the Spatiotemporal Variation of Environmental Data I. Principal Component Analysis II. Time Series Analysis Brian K. Eder Air Resources Laboratory National Oceanic and Atmospheric Administration National Exposure Research Laboratory Environmental Protection Agency RTP, NC 27711

Principal Component Analysis Objective Identify, through data reduction, the characteristic, recurring and independent modes of variation (signals) of a large, noisy data set. Approach Sorts, initially correlated data, into a hierarchy of statistically independent modes of variation (mutually orthogonal linear combinations), which explain successively less and less of the total variation. Utility Facilitates identification, characterization and understanding of the spatiotemporal variation of the data set across a myriad of spatial and temporal scales.

Numerous applications  The spatial and temporal analysis of the Palmer Drought Severity Index over the Southeastern US. (J. of Climatology ‑ 7, pp 31 ‑ 56)  A principal component analysis of SO 4 = precipitation concentrations over the eastern US. (Atmospheric Environment 23, No. 12, pp 2739 ‑ 2750)  A characterization of the spatiotemporal variation of non-urban ozone in the Eastern US. (Atmospheric Environment, 27A, pp )  A climatology of total ozone mapping spectrometer data using rotated principal component analysis. (Journal of Geophysical Research. 104, No. D3, pp )  A climatology of air concentration data from the Clean Air Status and Trends Network (CASTNet). (Atmospheric Environment)

Methodology Spatial Calculate a square, symmetrical correlation matrix R having dimensions j x j, from the original data matrix having dimensions j (e.g. stations, grid cells) x i (e.g. days, weeks). By using R and the Identity matrix I, of the same dimensions, j characteristics roots or eigenvalues ( ) can be derived that satisfythe following polynomial equation: det [ j R j - j I j ] = 0(1)

Methodology Spatial For each root of (1) which is called the characteristic equation, a nonzero vector e can be derived such that: j R j e 1 = j e 1 (2) where e is the characteristic vector (eigenvector) of the correlation matrix R, associated with its corresponding eigenvalue. - The eigenvectors represent the mutually orthogonal linear combinations (modes of variation) of the matrix. - The eigenvalues represent the amount of variation explained by each of the eigenvectors.

Methodology Spatial When the elements of each eigenvector (e) are multiplied by the square root of the associated eigenvalue ( 0.5 ), the principal component (pc) Loading (L) is obtained. L : provides the correlation between the pc and the station (grid cell) L 2 : provides the proportion of variance at an individual station (grid cell) that can be attributed to a particular pc The sum of the squared Loadings indicates (3) the total variance accounted for by the pc, which stated earlier is called the eigenvalue For station j and pc k The pc Loadings can then be spatially mapped onto their respective stations (grid cells) identifying homogeneity or “influence regimes”.

Methodology Spatial By retaining the first few eigenvector-eigenvalue pairs or principal components, a substantial amount of the variation can be explained while ignoring higher-order pcs, which explain successively less of the variance. How many principal components should be retained?? - Scree test - > 1 criteria - Overland-Priesendorfer “Rule N” test - Common Sense

Methodology Spatial Rotation of Retained Principal Components Facilitates spatial interpretation allowing better identification of areas that are homogeneous Oblique Rotation Orthogonal Rotation An orthogonal rotation developed by Kaiser (’58) increases the segregation between principal component loadings which in turn betterdefines a distinct group or cluster of homogeneous stations. Stations (grid cells) are then assigned to the pc (“influence regime”) having the largest pc loading.

Methodology Temporal Having identified influence regimes, we can examine their temporal structure thru calculation of the pc Score The pc Score for time period i on principal component k are weighted, summed values whose magnitudes depend upon the observation O ij for time i at station j and L jk is the loading of station j on component k as seen below: (4) The pc Scores are standardized (mean: 0, std dev: 1)

Methodology Temporal When plotted as a time series, the pc Scores provide excellent insight into the spectrum of temporal variance experienced by each of the influence regimes. This temporal variance can then be examined using: Spectral Density Analysis Correlograms Filters Red and White Noise tests

“ Dynamical Forcing” Related to transport and tropopause height. Sharp late winter, early spring peak. More broad, late summer, early autumn minimum. Strong annual signal (Periodicity = 2  /f)

“Photochemical Forcing” Related to solar insolation. Broad, mid-summer maximum. Sharp, mid-winter minimum. Strong annual signal

“Dynamical Forcing” Related to annual transport and SAO in wind field (peaks at equinoxes) Strong annual, semi-annual and a long term signal.

“Quasi-Biennial Forcing” Related to QBO of tropical winds in the stratosphere. Note peaks in ‘80, ‘82, ’85, ‘87, ‘90 and ’92. Strong QBO signal (~2.5 years)

“Wave Number 5” One of 5 similar patterns found between S. Due to medium scale baroclinic waves associated with Antarctic Polar Jet stream. Note variability. Note trend and semi-annual periodicity.

“El-Nino-Southern Oscillation” During ENSO years of ’82-83, ’87 and ’91-’92, ozone values are very low, while in none ENSO years ozone values are high. Note strong periodicity of ~ 4 years.

“Data Retrieval Artifact” An earlier analysis of TOMS Version 6.0 included a “cross-track” bias related to successive orbital scans of the surface. Note the tremendous “pulse” in the spectral plot. NASA was unaware of this artifact.

Six Homogenous Regions Great Lakes Northeast Mid-Atlantic Southwest South Florida

Daily Time Series Standardized PC scores Summer Peak No pronounced Peak Spring Peak

Daily Time Series Standardized PC scores Early summer peak Late summer peaks

Seasonal Time Series Standardized PC scores Medians over 6 years Cubic Spline Smoother

Seasonal Time Series Standardized PC scores Medians over 6 years Cubic Spline Smoother

Correlograms Standardized PC scores Deseasonalized Weaker persistence Stronger Persistence Lag 1 r = 0.56Lag 1 r = 0.47 Lag 1 r = 0.61Lag 1 r = 0.53 Lag 1 r = 0.70 Lag 1 r = 0.64

Spectral Density Standardized PC scores Deseasonalized “White Noise” “Red Noise”

Spectral Density Standardized PC scores Deseasonalized “White Noise” “Red Noise”

Summary Principal Component Analysis: -allows one to examine the spatial and temporal variability of environmental data across a myriad of scales; -utilization of Kaiser’s orthogonal rotation facilitates identification of “influence regimes” where concentrations exhibit statistically unique and homogenous characteristics. -utilization of time series analyses, including spectral density analysis, facilitates characterization of the “influence regimes”

Summary Principal Component Analysis -is useful in that it: -can provide “weight of evidence” of the regional-scale nature of environmental data, -facilitate understanding of the mechanisms responsible for the data’s unique variability among influence regimes, -designate stations (grid cells) that can be used as indicators for each influence regime, -identify erroneous data or data artifacts that are often missed with other analyses.