Oceanography 569 Oceanographic Data Analysis Laboratory Kathie Kelly Applied Physics Laboratory 515 Ben Hall IR Bldg class web site: faculty.washington.edu/kellyapl/classes/ocean569_.

Presentation transcript:

Oceanography 569 Oceanographic Data Analysis Laboratory. Kathie Kelly, Applied Physics Laboratory, 515 Ben Hall IR Bldg. Class web site: faculty.washington.edu/kellyapl/classes/ocean569_2014/

Exercise 3a: Error Estimates. Known errors for Q, T and H. Need error estimates for Q/(ρ c_p H) and dT/dt.

Exercise 3a: Error Estimates. Terms quite similar: Q/(ρ c_p H) and dT/dt. Compare LHS variance with error variance estimate.

Exercise 3a: Error Estimates
1) Compute the estimated relative error variance for the heating term.
2) Convert to an error variance by multiplying by the variance of the heating term.
3) Compute the error variance for dT/dt (assuming uncorrelated errors).
4) The total error variance is the sum of these terms.
5) Compare with the variance of the LHS.
6) Error variance > LHS variance => other terms are negligible compared with the errors.
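The six steps can be sketched in Python as follows. This is only an illustration under stated assumptions: the relative error variance of the heating term is taken as the sum of the squared relative errors of Q and H (uncorrelated errors, ρ c_p treated as exact), dT/dt is taken to be the LHS and is estimated as a two-point difference, and the function name `error_budget` and all numbers are made up.

```python
import statistics

def error_budget(heating, dTdt, rel_err_q, rel_err_h, sig_t, dt):
    """Follow steps 1-6 for the balance dT/dt = Q/(rho*c_p*H)."""
    # 1) relative error variance of the heating term (errors in Q and H
    #    assumed uncorrelated; rho*c_p treated as exact)
    rel_var = rel_err_q ** 2 + rel_err_h ** 2
    # 2) convert to an error variance by multiplying by the term's variance
    err_heating = rel_var * statistics.pvariance(heating)
    # 3) error variance of a two-point difference, uncorrelated errors in T
    err_dTdt = 2.0 * sig_t ** 2 / dt ** 2
    # 4) total error variance is the sum of the two
    total_err = err_heating + err_dTdt
    # 5) variance of the left-hand side (assumed here to be dT/dt)
    lhs_var = statistics.pvariance(dTdt)
    # 6) True means the other budget terms are lost in the errors
    return total_err, lhs_var, total_err > lhs_var

# Made-up series in K/month, 20% relative errors in Q and H,
# 0.1 K error in T, one-month time step.
heating = [2.0, 1.0, -1.0, -2.0]
dTdt = [1.8, 1.1, -0.9, -2.1]
total_err, lhs_var, errors_dominate = error_budget(
    heating, dTdt, 0.2, 0.2, 0.1, 1.0)
print(total_err, lhs_var, errors_dominate)
```

With these invented numbers the error variance is well below the LHS variance, so step 6 would not apply.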

Exercise 3b: Significance Test. Is the recent global temperature change significantly different from the previous values? [Figure: probability of year-to-year temperature differences]

Exercise 3b: Significance Test. Differences ΔT can be normalized to N(0,1) using the Z-transform. The probability p of a Z score (or lower) can be found from a table, or the probability of a ΔT score can be found from Matlab: p = normcdf(Z,0,1) [Matlab function]. Is the most recent temperature change significantly different from previous values at 5%? (Is the probability of the recent value smaller than 5%?)
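For readers without Matlab, the same test can be sketched with Python's standard library, where `statistics.NormalDist(0, 1).cdf(z)` plays the role of `normcdf(Z,0,1)`. The temperature differences below are invented, the helper name `z_test` is hypothetical, and using the smaller tail as the test probability is an assumption about how the 5% criterion is applied.

```python
from statistics import NormalDist, mean, stdev

def z_test(previous, latest):
    """Z score of the latest difference and probability of that score or lower."""
    z = (latest - mean(previous)) / stdev(previous)
    p = NormalDist(0.0, 1.0).cdf(z)
    return z, p

# Invented year-to-year temperature differences (degrees C).
previous = [0.1, -0.1, 0.2, -0.2, 0.0]
z, p = z_test(previous, 0.4)

# One-sided tail probability of a value at least this extreme.
tail = min(p, 1.0 - p)
significant = tail < 0.05
print(z, p, tail, significant)
```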

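Derivatives and Errors: strategy. The error variance in the derivative dT/dt, estimated as a difference over a time step Δt, is given by $\sigma^2_{dT/dt} = \langle (\epsilon_{t+\Delta t} - \epsilon_t)^2 \rangle / \Delta t^2$, where $\epsilon_t$ is the error in T at time t. How is the error variance affected by the choice of time step Δt? Consider two cases: 1) errors are point-to-point random; 2) errors have time scales long relative to Δt.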

Derivatives and Errors. 1) For errors with little serial correlation, the error variance in the derivative decreases with increasing interval Δt. 2) For errors that are highly correlated (more like a bias), there is little advantage to increasing the interval size, because the error terms nearly cancel.
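A small numerical sketch of the two cases. For a two-point difference with errors of variance σ² and lag correlation ρ(Δt), the error variance of the derivative is 2σ²(1 − ρ(Δt))/Δt². The Gaussian form of ρ used here and the error time scale `tau` are assumptions made for illustration; they are not taken from the slides.

```python
import math

def derivative_error_var(sigma, dt, tau):
    """Error variance of (T(t+dt) - T(t))/dt.

    sigma: standard deviation of the error in T
    tau:   error time scale; tau = 0 means point-to-point random errors
    Assumes a Gaussian error autocorrelation rho = exp(-0.5 * (dt/tau)**2).
    """
    rho = math.exp(-0.5 * (dt / tau) ** 2) if tau > 0 else 0.0
    return 2.0 * sigma ** 2 * (1.0 - rho) / dt ** 2

for dt in (1.0, 2.0, 4.0):
    random_case = derivative_error_var(0.1, dt, tau=0.0)    # case 1
    biased_case = derivative_error_var(0.1, dt, tau=100.0)  # case 2
    print(dt, random_case, biased_case)
```

In case 1 the error variance falls by a factor of 16 when Δt goes from 1 to 4. In case 2 it stays close to σ²/τ² for every Δt, which is the sense in which a longer interval brings little advantage.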

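Analysis of Variance (ANOVA). Evaluate a dynamical or statistical model using observations [mean and periodic signals removed]. Example: linear function. The error is the difference between the model $\hat{y}$ and the observations $y$: $\epsilon = y - \hat{y}$. The data variance NOT explained by the model (error variance) is $\sigma_\epsilon^2 = \langle (y - \hat{y})^2 \rangle$. The fraction of data variance $\sigma^2$ that IS explained by the model is usually called the skill: $S = 1 - \sigma_\epsilon^2/\sigma^2$. Signal-to-noise ratio: explained variance over error variance, $S/(1 - S)$.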

Chi-Squared. A related measure of the goodness of fit of a least-squares estimator is chi-squared ($\chi^2$). For errors that are all about the same size, this is just the error variance divided by the data variance, which is the fraction of variance in the error, so in this case the skill is given by $S = 1 - \chi^2$.

Analysis of Variance: correlation vs. skill What is the relationship between correlation and skill for By definition: In terms of variance: Eliminate y: which is the skill S. This is a special case where amplitude of y is nearly the same as that of x and
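The equations on this slide did not survive extraction. As a hedged illustration of the stated result, the sketch below assumes the model is x and the observations are y = x + ε, with ε uncorrelated with x, and checks numerically that the squared correlation equals the skill. The sine and cosine series are synthetic and the helper names are made up.

```python
import math

def variance(a):
    m = sum(a) / len(a)
    return sum((v - m) ** 2 for v in a) / len(a)

def correlation(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b)) / len(a)
    return cov / math.sqrt(variance(a) * variance(b))

def skill(obs, model):
    """Fraction of observed variance explained: 1 - error variance / data variance."""
    err = [o - m for o, m in zip(obs, model)]
    return 1.0 - variance(err) / variance(obs)

n = 360
x = [math.sin(2 * math.pi * k / n) for k in range(n)]          # model
eps = [0.5 * math.cos(2 * math.pi * k / n) for k in range(n)]  # uncorrelated error
y = [a + b for a, b in zip(x, eps)]                            # observations

rho = correlation(x, y)
S = skill(y, x)
print(rho ** 2, S)
```

Both printed values should agree to rounding error; the equality depends on the error being uncorrelated with x and on x and y having similar amplitude.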

Analysis of Variance for a Model What is the relationship between skill and correlation for In terms of variance: So the skill S is given by

Exercise 4: Lagged correlations. Lag correlate the SSH at one location with SSH at every other location to get this image: [Figure: SSH as a function of longitude and time, with lags Δx and Δt marked]
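A minimal sketch of a lagged correlation between two time series, which would be repeated for every location to build the image. The two sine series stand in for SSH at two longitudes and are synthetic; `lag_correlation` is a hypothetical helper, not the class code.

```python
import math

def correlation(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    var_a = sum((u - ma) ** 2 for u in a)
    var_b = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)

def lag_correlation(a, b, lag):
    """Correlation of a(t) with b(t + lag); lag in samples, may be negative."""
    if lag >= 0:
        return correlation(a[:len(a) - lag], b[lag:])
    return correlation(a[-lag:], b[:len(b) + lag])

# Synthetic signal that reaches location B three time steps after location A.
ssh_a = [math.sin(2 * math.pi * k / 20) for k in range(100)]
ssh_b = [math.sin(2 * math.pi * (k - 3) / 20) for k in range(100)]

lags = range(-5, 6)
best_lag = max(lags, key=lambda lag: lag_correlation(ssh_a, ssh_b, lag))
print(best_lag, lag_correlation(ssh_a, ssh_b, best_lag))
```

The lag of maximum correlation divided into the separation Δx gives an estimate of propagation speed.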

Measuring Similarity. The correlation gives a measure of the similarity of two time series but not their magnitude. [Figure: example of a model that is highly correlated with data but smaller in magnitude] The fraction of variance explained by the model (skill) does not distinguish between correlations and magnitudes. Does any metric describe both?

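Taylor Diagram. To compare both correlations and relative magnitudes (Taylor, 2001): given a model $m$ of an observed value $o$ (means removed), the mean squared error is given by $\epsilon^2 = \langle (m - o)^2 \rangle$, which reduces to
(1) $\epsilon^2 = \sigma_m^2 + \sigma_o^2 - 2\sigma_m\sigma_o\rho$
because $\langle m\,o \rangle = \rho\,\sigma_m\sigma_o$, where $\rho$ is the correlation. Compare with the law of cosines for a triangle:
(2) $c^2 = a^2 + b^2 - 2ab\cos\theta$
[Figure: triangle with sides σ_m, σ_o and ε and angle θ between σ_m and σ_o; legend entry: Truth]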

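Taylor Diagram & Normalized Version. Eqns (1) and (2) match if the sides of the triangle are $\sigma_m$, $\sigma_o$ and $\epsilon$ and we define $\cos\theta = \rho$, the correlation. The triangle shows the error contributions of both correlation and magnitude geometrically. Normalized version: divide through by the variance of the observations $\sigma_o^2$, giving $\epsilon^2/\sigma_o^2 = r^2 + 1 - 2r\rho$, where $r = \sigma_m/\sigma_o$ is the relative magnitude.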
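The normalized quantities that place a model on a Taylor diagram can be computed as in the sketch below, which also checks the law-of-cosines relation numerically. The function name `taylor_stats` and the five-point series are illustrative assumptions; the model here is perfectly correlated with the observations but has half their amplitude.

```python
import math

def taylor_stats(model, obs):
    """Return (r, rho, e): relative magnitude, correlation, normalized error."""
    n = len(obs)
    mean_m, mean_o = sum(model) / n, sum(obs) / n
    dm = [v - mean_m for v in model]   # model anomalies
    do = [v - mean_o for v in obs]     # observed anomalies
    sig_m = math.sqrt(sum(v * v for v in dm) / n)
    sig_o = math.sqrt(sum(v * v for v in do) / n)
    rho = sum(a * b for a, b in zip(dm, do)) / (n * sig_m * sig_o)
    err = math.sqrt(sum((a - b) ** 2 for a, b in zip(dm, do)) / n)
    return sig_m / sig_o, rho, err / sig_o

obs = [1.0, 2.0, 3.0, 4.0, 5.0]
model = [1.5, 2.0, 2.5, 3.0, 3.5]
r, rho, e = taylor_stats(model, obs)

# Normalized law of cosines: e^2 = 1 + r^2 - 2 r rho
law_of_cosines = 1.0 + r ** 2 - 2.0 * r * rho
print(r, rho, e ** 2, law_of_cosines)
```

On the diagram this model would sit at radius r = 0.5 along the ρ = 1 axis, a distance e = 0.5 from the observation point.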

Analysis of Variance for a Model What is the relationship between skill and correlation for In terms of variance: So the skill S is given by where. For large a, S can be negative. (For small a the correlation will be small and the assumption is violated.)
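The slide's own expressions are missing from the transcript, so the following is a sketch under an explicit assumption: the model is a scaled series a·x, the observations are y, both have zero mean, and ρ is their correlation. Expanding the mean squared error ⟨(y − a x)²⟩ = σ_y² − 2aρσ_xσ_y + a²σ_x² and dividing by σ_y² gives S = 2aρ(σ_x/σ_y) − a²(σ_x/σ_y)². The function name `skill_scaled` is hypothetical.

```python
def skill_scaled(a, rho, ratio=1.0):
    """Skill of the model a*x against observations y.

    rho:   correlation between x and y
    ratio: sigma_x / sigma_y
    """
    return 2.0 * a * rho * ratio - (a * ratio) ** 2

for a in (0.5, 1.0, 2.0, 3.0):
    print(a, skill_scaled(a, rho=1.0), skill_scaled(a, rho=0.8))
```

Under this assumption the skill becomes negative once a·σ_x/σ_y exceeds 2ρ, which is one way a large amplitude factor can give negative skill even when the correlation is high.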

Skill: Empirical vs. Model. A good estimator has a skill near 1 (small squared error). A linear regression is an empirical fit to data that minimizes the squared error (least-squares fit). Its skill is positive by design. Skill, or fraction of variance explained, can also be used to evaluate a model d_m(x,t) = f[u(x,t)]. However, models do not optimize the fit to the observations (unless data assimilation is used), so the skill can be negative. For example, a model that predicts the seasonal cycle of ocean temperature with a good match of phase but a large overestimate of amplitude could give a negative skill. Therefore, Taylor diagrams are frequently used to evaluate models.

Lowpass Filter. Smooth data to remove errors. What is the assumption? The filter removes half the power at the specified time (or space) scale. Input to function “butter” in signal processing toolbox: Wn = 2*Δt/half_power, where 2*Δt is twice the grid spacing (the period corresponding to the Nyquist frequency).
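The cutoff arithmetic can be sketched as a small helper. The name `butter_wn` is made up; the formula is the one on the slide, and the range check reflects the fact that the normalized cutoff must lie between 0 and 1 (the Nyquist frequency).

```python
def butter_wn(dt, half_power):
    """Normalized cutoff Wn = 2*dt/half_power for a lowpass Butterworth filter.

    dt:         grid spacing (time or space)
    half_power: scale at which the filter removes half the power, same units
    """
    wn = 2.0 * dt / half_power
    if not 0.0 < wn < 1.0:
        raise ValueError("half_power must be longer than twice the grid spacing")
    return wn

# Daily data smoothed with a 30-day half-power point.
wn = butter_wn(dt=1.0, half_power=30.0)
print(wn)
```

The value returned is what would be passed as the second argument to Matlab's `butter`.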

Other Filters. Highpass filter: removes low frequencies (or large spatial scales). Bandpass filter: removes low and high frequencies (large and small scales).

Filtering and Correlations Lowpass filter: what happens to the integral length scale? How does this affect N*? How does this affect the significance level for correlations?

Linear Algebra Review (2)

Linear Algebra Review (3) B=A\C

Linear Algebra Review (1)

Linear Regression

Linear Regression (cont’d) [Figure: geometric view of the least-squares estimate, with axes x, y, z and labels: observation, estimate, estimate error, best estimate, minimum error]

Linear Regression (cont’d) y = Xβ, β = X\y
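For a straight-line fit, the least-squares coefficients that Matlab's backslash returns can be reproduced by hand. The sketch below solves the 2×2 normal equations in closed form, which is a swapped-in technique chosen for brevity and not necessarily how the backslash operator works internally. The function name `fit_line` and the data are illustrative.

```python
def fit_line(x, y):
    """Least-squares fit of y = b0 + b1*x; returns (b0, b1).

    Equivalent to solving y = X*beta with X = [1, x] in the least-squares sense.
    """
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(v * v for v in x)
    sxy = sum(u * v for u, v in zip(x, y))
    det = n * sxx - sx * sx          # determinant of X'X
    b1 = (n * sxy - sx * sy) / det
    b0 = (sy - b1 * sx) / n
    return b0, b1

x = [0.0, 1.0, 2.0, 3.0]
y = [2.0, 5.0, 8.0, 11.0]           # exactly y = 2 + 3x
b0, b1 = fit_line(x, y)
print(b0, b1)
```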

Linear Regression: limiting the number of variables. Note: fitting data to a curve is a simple form of linear regression in which the variables X are 1, x, x², x³, ... Coefficients are optimized to give the best fit to the data. For each variable X added to the regression, the squared error decreases, because the coefficients also fit random components (“noise”). However, on another set of data the same coefficients will not fit the random components and the fit may be worse. The amount by which the estimator overfits data is sometimes called “artificial skill.” Minimize “artificial skill” by limiting the regression to only significant variables, which are determined using an F test on the variance reduction (skill) of the estimator. We check the estimator by comparing the errors from the regression and the errors from another set of data.
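A sketch of how artificial skill shows up in practice: polynomials of increasing order are fitted to half of a noisy synthetic series (hindcast) and then evaluated on the other half (forecast). Everything here is an assumption made for illustration, including the straight-line truth, the noise level, the interleaved split, and the use of normal equations with Gaussian elimination.

```python
import random

def solve(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def polyfit(x, y, order):
    """Least-squares coefficients for variables 1, x, x^2, ..., x^order."""
    cols = [[xi ** p for xi in x] for p in range(order + 1)]
    a = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    b = [sum(u * v for u, v in zip(ci, y)) for ci in cols]
    return solve(a, b)

def mse(x, y, beta):
    """Mean squared error of the polynomial with coefficients beta."""
    fit = [sum(bp * xi ** p for p, bp in enumerate(beta)) for xi in x]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fi_list(fit))) / len(x)

def fi_list(fit):
    return fit

random.seed(0)
x = [i / 39 for i in range(40)]
y = [1.0 + 2.0 * xi + random.gauss(0.0, 0.5) for xi in x]   # line plus noise
x_fit, y_fit = x[0::2], y[0::2]   # half the data, used for the regression
x_new, y_new = x[1::2], y[1::2]   # other half, used as "new" data

hindcast, forecast = [], []
for order in range(1, 5):
    beta = polyfit(x_fit, y_fit, order)
    hindcast.append(mse(x_fit, y_fit, beta))
    forecast.append(mse(x_new, y_new, beta))
    print(order, hindcast[-1], forecast[-1])
```

The hindcast error cannot increase as variables are added, whereas the forecast error carries no such guarantee; the gap between the two is the artificial skill.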

Significance of Linear Regression. k is the number of additional parameters.

Code for Linear Regression (1)

Code for Linear Regression (2)

Exercise 6: Linear Regression. Create a linear estimator for heat flux.

Linear regression: test each variable (hindcast). Regression on single variables for latent heat flux [use only half the data]. [Figure panels: cosine; air-sea temp.; wind speed; humidity diff.]

Multiple Regression: test combinations of variables (hindcast). Find the variable(s) least correlated with the best single variable (correlated = redundant). Use an F test for evaluating additional variables. [Figure panels: humidity; humidity + wind; humidity + wind + cosine]

Multiple Regression: examine residuals. Plot the residual and check its histogram (nearly normal?). Is there a pattern in the residual? [Figure: residual for humidity + wind + cosine]

Linear regression: test regressions on new data (forecast). Compare hindcast and forecast errors. Do the estimators perform as predicted? Check for patterns in the residual. [Figure: humidity + wind]