Environmental Data Analysis with MatLab Lecture 24: Confidence Limits of Spectra; Bootstraps.


Housekeeping: This is the last lecture. The final presentations are next week. The last homework is due today.

SYLLABUS
Lecture 01  Using MatLab
Lecture 02  Looking At Data
Lecture 03  Probability and Measurement Error
Lecture 04  Multivariate Distributions
Lecture 05  Linear Models
Lecture 06  The Principle of Least Squares
Lecture 07  Prior Information
Lecture 08  Solving Generalized Least Squares Problems
Lecture 09  Fourier Series
Lecture 10  Complex Fourier Series
Lecture 11  Lessons Learned from the Fourier Transform
Lecture 12  Power Spectral Density
Lecture 13  Filter Theory
Lecture 14  Applications of Filters
Lecture 15  Factor Analysis
Lecture 16  Orthogonal Functions
Lecture 17  Covariance and Autocorrelation
Lecture 18  Cross-correlation
Lecture 19  Smoothing, Correlation and Spectra
Lecture 20  Coherence; Tapering and Spectral Analysis
Lecture 21  Interpolation
Lecture 22  Hypothesis Testing
Lecture 23  Hypothesis Testing continued; F-Tests
Lecture 24  Confidence Limits of Spectra, Bootstraps

purpose of the lecture: continue to develop a way to assess the significance of a spectral peak, and develop the Bootstrap Method of determining confidence intervals

Part 1 assessing the confidence level of a spectral peak

what does confidence in a spectral peak mean?

one possibility: an indefinitely long phenomenon
you observe a short time window (it looks “noisy”, with no obvious periodicities)
you compute the p.s.d. and detect a peak
you ask: would this peak still be there if I observed some other time window? or did it arise from random variation?

[Figure: an example time series d(t) and the amplitude spectral density (a.s.d.) of four different time windows; the peak is judged significant: Y N N N.]

[Figure: an example time series d(t) and the amplitude spectral density of four different time windows; the peak is judged significant: Y Y Y Y.]

Null Hypothesis: the spectral peak can be explained by random variation in a time series that consists of nothing but random noise.

Easiest Case to Analyze: a random time series that is
Normally-distributed
uncorrelated
zero mean
with a variance that matches the power of the time series under consideration

So what is the probability density function p(s²) of points in the power spectral density s² of such a time series?

Chain of Logic, Part 1
The time series is Normally-distributed.
The Fourier Transform is a linear function of the time series.
Linear functions of Normally-distributed variables are Normally-distributed, so the Fourier Transform is Normally-distributed, too.
For a complex FT, the real and imaginary parts are individually Normally-distributed.

Chain of Logic, Part 2
The time series has zero mean.
The Fourier Transform is a linear function of the time series.
The mean of a linear function is the function of the mean value, so the mean of the FT is zero.
For a complex FT, the means of the real and imaginary parts are individually zero.

Chain of Logic, Part 3
The time series is uncorrelated.
The Fourier Transform has [GᵀG]⁻¹ proportional to I.
So, by the usual rules of error propagation, the Fourier Transform is uncorrelated, too.
For a complex FT, the real and imaginary parts are uncorrelated.

Chain of Logic, Part 4
The power spectral density is proportional to the sum of squares of the real and imaginary parts of the Fourier Transform.
The sum of squares of two uncorrelated, Normally-distributed variables with zero mean and unit variance is chi-squared distributed with two degrees of freedom.
So, once the p.s.d. is scaled to have unit variance, it is chi-squared distributed with two degrees of freedom.

so s²/c is chi-squared distributed, where c is a yet-to-be-determined scaling factor
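This chi-squared behavior can be checked numerically. The sketch below is in Python rather than the MatLab used in the course, and the series length, number of trials, and frequency index are arbitrary choices for illustration: it generates many realizations of uncorrelated, zero-mean, unit-variance noise and confirms that the scaled power at one frequency has the mean (2) and variance (4) of a chi-squared variable with two degrees of freedom.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1024        # length of each time series
trials = 2000   # number of noise realizations

vals = []
for _ in range(trials):
    d = rng.standard_normal(N)    # uncorrelated, zero-mean, unit-variance noise
    D = np.fft.rfft(d)
    k = 100                        # an interior frequency index
    # for interior frequencies, Re D[k] and Im D[k] each have variance N/2,
    # so dividing the power by N/2 gives a sum of two unit-variance squares
    vals.append((D[k].real**2 + D[k].imag**2) / (N / 2))

vals = np.array(vals)
# chi-squared with 2 degrees of freedom has mean 2 and variance 4
print(vals.mean(), vals.var())
```

The same experiment repeated at other interior frequency indices gives the same distribution, which is why a single scaling factor c suffices for the whole p.s.d.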

in the text, the scaling factor c is derived in terms of:
σd², the variance of the data
Nf, the length of the p.s.d.
Δf, the frequency sampling
ff, the variance of the taper; it adjusts for the effect of tapering

[Figure. Example 1, a completely random time series: A) tapered time series d(i) vs. time t in seconds; B) power spectral density s²(f) vs. frequency f in Hz, with the mean and 95% levels marked.]

[Figure. Example 1, histogram of spectral values: counts vs. power spectral density s²(f), with the mean and 95% levels marked.]

[Figure. Example 2, a random time series consisting of a 5 Hz cosine plus noise: A) tapered time series d(i) vs. time t in seconds; B) power spectral density s²(f) vs. frequency f in Hz, with the mean and 95% levels marked.]

[Figure. Example 2, histogram of spectral values: counts vs. power spectral density s²(f), with the mean, the 95% level, and the peak marked.]

so how confident are we of the peak at 5 Hz? The p.s.d. is predicted to be less than the level of the peak % of the time. But here we must be very careful.

two alternative Null Hypotheses:
1. a peak of the observed amplitude at 5 Hz is caused by random variation
2. a peak of the observed amplitude somewhere in the p.s.d. is caused by random variation

two alternative Null Hypotheses:
1. a peak of the observed amplitude at 5 Hz is caused by random variation
2. a peak of the observed amplitude somewhere in the p.s.d. is caused by random variation — much more likely, since the p.s.d. has many frequency points (513 in this case)

two alternative Null Hypotheses:
1. a peak of the observed amplitude at 5 Hz is caused by random variation — a peak of the observed amplitude or greater occurs only % of the time, so this Null Hypothesis can be rejected to high certainty
2. a peak of the observed amplitude somewhere in the p.s.d. is caused by random variation

two alternative Null Hypotheses:
1. a peak of the observed amplitude at 5 Hz is caused by random variation
2. a peak of the observed amplitude somewhere in the p.s.d. is caused by random variation — such a peak occurs 1 − ( )⁵¹³ = 3% of the time, so this Null Hypothesis can be rejected with acceptable certainty
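The many-frequencies correction is just the complement rule applied across independent frequency bins. A short Python illustration; the single-bin exceedance probability p0 below is an assumed value, chosen only so that the family-wide result comes out near the 3% quoted for this example:

```python
# If one frequency bin exceeds the peak level with probability p0,
# the chance that at least one of Nf independent bins exceeds it
# is 1 - (1 - p0)**Nf.
Nf = 513          # number of frequency points in the p.s.d.
p0 = 0.00006      # hypothetical single-bin exceedance probability (assumed)
p_any = 1 - (1 - p0)**Nf
print(p_any)      # close to 0.03, i.e. about 3%
```

Even a tiny per-bin probability becomes non-negligible when multiplied across hundreds of bins, which is why the second Null Hypothesis can only be rejected with "acceptable", not "high", certainty.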

Part 2 The Bootstrap Method

The Issue: what do you do when you have a statistic that can test a Null Hypothesis, but you don’t know its probability density function?

If you could repeat the experiment many times, you could address the problem empirically:
perform the experiment
calculate the statistic, s
repeat many times
make a histogram of the s’s
normalize the histogram into an empirical p.d.f.

The problem is that it’s not usually possible to repeat an experiment many times over

Bootstrap Method: create approximate repeat datasets by randomly resampling (with duplication) the one existing data set

[Figure: example of resampling — an original data set of six values, a list of random integers in the range 1-6, and the resulting resampled data set.]
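In code, resampling with duplication is a one-liner. A minimal Python sketch using NumPy; the six data values are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
d = np.array([1.2, 3.4, 2.2, 0.9, 4.1, 2.8])  # hypothetical original data set
idx = rng.integers(1, 7, size=6) - 1          # random integers in range 1-6, shifted to 0-based
d_new = d[idx]                                 # resampled data set; duplicates are allowed
print(d_new)
```

Note that some original values will typically appear more than once in the resampled set while others drop out entirely; that is the "duplication" the slides refer to.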

[Figure: interpretation of resampling — the distribution p(d) maps to p′(d) through sampling, duplication, and mixing.]

[Figure: example data d(i) vs. time t, in hours.]
Example: what is p(b), where b is the slope of a linear fit?

This is a good test case, because we know the answer: if the data are Normally-distributed and uncorrelated with variance σd², and given the linear problem d = Gm, where m = [intercept, slope]ᵀ, then the slope is also Normally-distributed, with a variance that is the lower-right element of σd² [GᵀG]⁻¹.
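That analytic variance can be computed directly. A Python sketch (the time axis and σd² = 1 are assumed values for illustration):

```python
import numpy as np

# hypothetical straight-line experiment: N observations on a time axis t
N = 50
t = np.linspace(0, 10, N)
G = np.column_stack([np.ones(N), t])   # linear model d = G m, m = [intercept, slope]

sigma_d2 = 1.0                          # assumed data variance
cov_m = sigma_d2 * np.linalg.inv(G.T @ G)
slope_var = cov_m[1, 1]                 # lower-right element = variance of the slope
print(slope_var)
```

This is the number the bootstrap estimate of p(b) should reproduce, which is what makes the example a useful check on the method.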

create the resampled data set, using a call that returns N random integers from 1 to N

run the usual code for a least-squares fit of a line, and save the slope from each resampled data set
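The procedure described on the last two slides — resample, fit a line, save the slope — can be sketched in full. This is Python rather than the course's MatLab, and the synthetic data (true slope 2, unit-variance noise) are assumed purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)

# synthetic data: straight line with slope 2 plus unit-variance noise (assumed)
N = 50
t = np.linspace(0, 10, N)
d = 1.0 + 2.0 * t + rng.standard_normal(N)

G = np.column_stack([np.ones(N), t])        # linear model d = G m, m = [intercept, slope]
m_full, *_ = np.linalg.lstsq(G, d, rcond=None)

Nboot = 2000
slopes = np.empty(Nboot)
for i in range(Nboot):
    idx = rng.integers(0, N, size=N)        # resample rows, with duplication
    mi, *_ = np.linalg.lstsq(G[idx], d[idx], rcond=None)
    slopes[i] = mi[1]                        # save the slope

lo, hi = np.percentile(slopes, [2.5, 97.5])  # 2.5% and 97.5% bounds
print(lo, hi)
```

A histogram of `slopes`, normalized, is the empirical p(b); the percentile bounds are the 95% confidence interval discussed on the following slides.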

histogram of slopes

the 2.5% and 97.5% bounds are found by integrating p(b) to the cumulative distribution P(b)

[Figure: p(b) vs. slope b, comparing standard error propagation with the bootstrap; the 95% confidence interval is marked.]

a more complicated example: p(r), where r is the ratio of CaO to Na₂O in the second varimax factor of the Atlantic Rock dataset

[Figure: p(r) vs. CaO/Na₂O ratio r, with the mean and 95% confidence interval marked.]

we can use this histogram to write confidence intervals for r: r has a mean of ; there is a 95% probability that r is between and ; and roughly, since p(r) is approximately symmetrical, r = ± (95% confidence)
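The "mean ± half-width" shorthand on this slide follows from the near-symmetry of p(r). A Python sketch of the calculation; the bootstrap sample values here are hypothetical (a Normal distribution is assumed purely to illustrate the arithmetic, not taken from the Atlantic Rock example):

```python
import numpy as np

rng = np.random.default_rng(3)
r = rng.normal(loc=1.5, scale=0.1, size=5000)  # hypothetical bootstrap samples of the ratio r

lo, hi = np.percentile(r, [2.5, 97.5])          # 95% confidence bounds
mid = r.mean()
half = (hi - lo) / 2
# since p(r) is roughly symmetric, report: r = mid ± half (95% confidence)
print(mid, half)
```

For a strongly skewed p(r) the symmetric shorthand would be misleading, and the two percentile bounds should be reported separately instead.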