Environmental Data Analysis with MatLab Lecture 23: Hypothesis Testing continued; F-Tests.


1 Environmental Data Analysis with MatLab Lecture 23: Hypothesis Testing continued; F-Tests

2 SYLLABUS
Lecture 01 Using MatLab
Lecture 02 Looking At Data
Lecture 03 Probability and Measurement Error
Lecture 04 Multivariate Distributions
Lecture 05 Linear Models
Lecture 06 The Principle of Least Squares
Lecture 07 Prior Information
Lecture 08 Solving Generalized Least Squares Problems
Lecture 09 Fourier Series
Lecture 10 Complex Fourier Series
Lecture 11 Lessons Learned from the Fourier Transform
Lecture 12 Power Spectral Density
Lecture 13 Filter Theory
Lecture 14 Applications of Filters
Lecture 15 Factor Analysis
Lecture 16 Orthogonal functions
Lecture 17 Covariance and Autocorrelation
Lecture 18 Cross-correlation
Lecture 19 Smoothing, Correlation and Spectra
Lecture 20 Coherence; Tapering and Spectral Analysis
Lecture 21 Interpolation
Lecture 22 Hypothesis testing
Lecture 23 Hypothesis Testing continued; F-Tests
Lecture 24 Confidence Limits of Spectra, Bootstraps

3 purpose of the lecture continue Hypothesis Testing and apply it to testing the significance of alternative models

4 Review of Last Lecture

5 Steps in Hypothesis Testing

6 Step 1. State a Null Hypothesis: some variation of "the result is due to random variation"

7 Step 2. Focus on a statistic that is unlikely to be large when the Null Hypothesis is true

8 Step 3. Determine the value of statistic for your problem

9 Step 4. Calculate the probability that the observed value or greater would occur if the Null Hypothesis were true

10 Step 5. Reject the Null Hypothesis only if such large values occur less than 5% of the time

11 An example test of a particle size measuring device

12 manufacturer's specs: machine is perfectly calibrated; particle diameters scatter about true value; measurement error is $\sigma_d^2 = 1\ \text{nm}^2$

13 your test of the machine: purchase a batch of 25 test particles, each exactly 100 nm in diameter; measure and tabulate their diameters; repeat with another batch a few weeks later

14 Results of Test 1

15 Results of Test 2

16 Question 1 Is the Calibration Correct? Null Hypothesis The observed deviation of the average particle size from its true value of 100 nm is due to random variation (as contrasted to a bias in the calibration).

17 in our case $Z_{est} = 0.278$ and $0.243$. The key question is: Are these unusually large values for $Z$? $P(|Z| \ge Z_{est}) = 0.780$ and $0.807$. So values of $|Z|$ greater than $Z_{est}$ are very common. The Null Hypothesis cannot be rejected; there is no reason to think the machine is biased
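The probabilities on this slide can be approximately reproduced with a short MatLab sketch. It assumes the Statistics and Machine Learning Toolbox function normcdf is available; the variable names Zest and PZ are illustrative, and the Zest values are the rounded ones quoted on the slide, so the result may differ from the slide in the third decimal place.

```matlab
% Two-sided Z test: probability that |Z| meets or exceeds the observed
% value when the Null Hypothesis (no calibration bias) is true.
Zest = [0.278, 0.243];   % observed |Z| for Test 1 and Test 2 (rounded, from the slide)

% Under the Null Hypothesis, Z is Normal with zero mean and unit variance,
% so the probability of |Z| >= Zest is one minus the area between -Zest and +Zest.
PZ = 1 - (normcdf(Zest, 0, 1) - normcdf(-Zest, 0, 1));

fprintf('P(|Z| >= Zest) = %.3f and %.3f\n', PZ(1), PZ(2));
```

Both probabilities are far above 0.05, which is why the Null Hypothesis is not rejected.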

18 Question 2: Is the variance in spec? Null Hypothesis: The observed deviation of the variance from its true value of $1\ \text{nm}^2$ is due to random variation (as contrasted to the machine being noisier than the specs).

19 the key question is: Are these unusually large values for $\chi^2$? $\chi^2_{est}$ = ? (results of the two tests)

20 In MatLab: $P(\chi^2 \ge \chi^2_{est}) = 0.640$ and $0.499$. So values of $\chi^2$ greater than $\chi^2_{est}$ are very common. The Null Hypothesis cannot be rejected; there is no reason to think the machine is noisy
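A minimal sketch of how such a probability might be computed in MatLab follows. The measured diameters are not preserved in this transcript, so the data vector d below is synthetic and purely hypothetical. The sketch also assumes the Statistics and Machine Learning Toolbox function chi2cdf, and that the statistic has N degrees of freedom because the true diameter is known.

```matlab
% Chi-squared test of the variance, using synthetic stand-in data.
N      = 25;      % particles per batch
dtrue  = 100;     % true particle diameter, nm
sigma2 = 1;       % manufacturer's stated variance, nm^2

rng(1);                                   % fix the seed so the run is repeatable
d = dtrue + sqrt(sigma2) * randn(N, 1);   % hypothetical measurements (not the lecture's data)

% Sum of squared deviations from the known true value, normalized by the
% stated variance. Assumed chi-squared with N degrees of freedom under
% the Null Hypothesis.
chi2est = sum((d - dtrue).^2) / sigma2;

% One-sided test: only a large value suggests a noisier-than-spec machine.
Pchi2 = 1 - chi2cdf(chi2est, N);

fprintf('chi2est = %.2f, P(chi2 >= chi2est) = %.3f\n', chi2est, Pchi2);
```

The Null Hypothesis would be rejected only if Pchi2 fell below 0.05.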

21 End of Review now continuing this scenario …

22 Question 1, revisited Is the Calibration Correct? Null Hypothesis The observed deviation of the average particle size from its true value of 100 nm is due to random variation (as contrasted to a bias in the calibration).

23 suppose the manufacturer had not specified a variance; then you would have to estimate it from the data: $(\sigma_d^2)_{est} = 0.876$ and $0.894\ \text{nm}^2$

24 but then you couldn’t form Z since you need the true variance

25 last lecture, we examined a quantity $t$, defined as the ratio of a Normally-distributed variable and something that has the form of the square root of an estimated variance

26 so we will test t instead of Z

27 in our case $t_{est} = 0.297$ and $0.247$. Are these unusually large values for $t$? $P(|t| \ge t_{est}) = 0.768$ and $0.806$. So values of $|t|$ greater than $t_{est}$ are very common. The Null Hypothesis cannot be rejected; there is no reason to think the machine is biased
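The t-test probabilities can be sketched in MatLab in the same way as the Z test. This assumes the Statistics and Machine Learning Toolbox function tcdf and, as an assumption not stated on the slide, N = 25 degrees of freedom; the names tEst and Pt are illustrative.

```matlab
% Two-sided t test: probability that |t| meets or exceeds the observed
% value when the Null Hypothesis (no calibration bias) is true.
N    = 25;               % assumed degrees of freedom (one per test particle)
tEst = [0.297, 0.247];   % observed |t| for Test 1 and Test 2 (rounded, from the slide)

% One minus the area of the t distribution between -tEst and +tEst.
Pt = 1 - (tcdf(tEst, N) - tcdf(-tEst, N));

fprintf('P(|t| >= tEst) = %.3f and %.3f\n', Pt(1), Pt(2));
```

The results are close to those of the Z test because, with 25 degrees of freedom, the t distribution is already nearly Normal.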

28 Question 3: Has the calibration changed between the two tests? Null Hypothesis: The difference between the means is due to random variation (as contrasted to a change in the calibration). $\bar{d} = 100.055$ and $99.951$ nm

29 since the data are Normal, their means (a linear function of the data) are Normal, and the difference between the means (again a linear function) is Normal

30 if $c = a - b$ then $\sigma_c^2 = \sigma_a^2 + \sigma_b^2$

31 so use a Z test; in our case $Z_{est} = 0.368$

32 using MatLab: $P(|Z| \ge Z_{est}) = 0.712$. Values of $|Z|$ greater than $Z_{est}$ are very common, so the Null Hypothesis cannot be rejected; there is no reason to think the bias of the machine has changed
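The difference-of-means test can be sketched from the numbers quoted above. The sketch assumes the variance of each mean is the stated measurement variance divided by the batch size, and that normcdf (Statistics and Machine Learning Toolbox) is available. Variable names are illustrative, and the means are the rounded slide values, so the last digit may differ from the slide.

```matlab
% Z test for a change in calibration between the two tests.
N      = 25;        % particles per batch
sigma2 = 1;         % stated measurement variance, nm^2
dbar1  = 100.055;   % mean diameter, Test 1 (nm, from the slide)
dbar2  = 99.951;    % mean diameter, Test 2 (nm, from the slide)

% Each mean has variance sigma2/N; the variance of the difference of the
% two means is the sum of their variances.
sigma2c = sigma2/N + sigma2/N;

Zest = abs(dbar1 - dbar2) / sqrt(sigma2c);
PZ   = 1 - (normcdf(Zest, 0, 1) - normcdf(-Zest, 0, 1));

fprintf('Zest = %.3f, P(|Z| >= Zest) = %.3f\n', Zest, PZ);
```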

33 Question 4: Has the variance changed between the two tests? Null Hypothesis: The difference between the variances is due to random variation (as contrasted to a change in the machine's precision). $(\sigma_d^2)_{est} = 0.896$ and $0.974\ \text{nm}^2$

34 last lecture, we examined the distribution of a quantity F, the ratio of variances

35 so use an F test; in our case $F_{est} = 1.110$

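36 [Figure: probability density $p(F)$ versus $F$, with $1/F_{est}$ and $F_{est}$ marked] Whether the top or bottom $\chi^2$ in $F$ is the bigger is irrelevant, since our Null Hypothesis only concerns their being different. Hence we need to evaluate: $P(F \le 1/F_{est}) + P(F \ge F_{est})$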

37 using MatLab: $P(F \le 1/F_{est} \text{ or } F \ge F_{est}) = 0.794$. Values of $F$ greater than $F_{est}$ or less than $1/F_{est}$ are very common, so the Null Hypothesis cannot be rejected; there is no reason to think the noisiness of the machine has changed
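A minimal MatLab sketch of this two-sided F test follows. It assumes the Statistics and Machine Learning Toolbox function fcdf and, as an assumption not stated on the slide, N = 25 degrees of freedom for each variance estimate; the names Fest and PF are illustrative.

```matlab
% Two-sided F test for a change in variance between the two tests.
N    = 25;      % assumed degrees of freedom of each variance estimate
Fest = 1.110;   % observed ratio of variances (from the slide)

% Probability that F falls outside the interval [1/Fest, Fest], i.e.
% P(F <= 1/Fest) + P(F >= Fest), when the Null Hypothesis is true.
PF = 1 - (fcdf(Fest, N, N) - fcdf(1/Fest, N, N));

fprintf('P(F <= 1/Fest or F >= Fest) = %.3f\n', PF);
```

Both tails are counted because the Null Hypothesis concerns only whether the two variances differ, not which one is larger.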

38 Another use of the F-test

39 we often develop two alternative models to describe a phenomenon and want to know which is better

40 However any difference in total error between two models may just be due to random variation

41 Null Hypothesis the difference in total error between two models is due to random variation

42 Example: Linear Fit vs. Cubic Fit? [Figure: data d(i) versus time t (hours), with linear fit and cubic fit]

43 Example: Linear Fit vs. Cubic Fit? [Figure: data d(i) versus time t (hours); panel A) linear fit, panel B) cubic fit] The cubic fit has 14% smaller error, E

44 The cubic fits 14% better, but … The cubic has 4 coefficients, the line only 2, so the error of the cubic will tend to be smaller anyway and furthermore the difference could just be due to random variation

45 Use an F-test. Degrees of freedom on linear fit: $\nu_L = 50 \text{ data} - 2 \text{ coefficients} = 48$. Degrees of freedom on cubic fit: $\nu_C = 50 \text{ data} - 4 \text{ coefficients} = 46$. $F = (E_L/\nu_L)/(E_C/\nu_C) = 1.14$

46 in our case $P(F \le 1/F_{est} \text{ or } F \ge F_{est}) = 0.794$. Values of $F$ greater than $F_{est}$ or less than $1/F_{est}$ are very common, so the Null Hypothesis cannot be rejected

47 in our case $P(F \le 1/F_{est} \text{ or } F \ge F_{est}) = 0.794$. Values of $F$ greater than $F_{est}$ or less than $1/F_{est}$ are very common, so the Null Hypothesis cannot be rejected; there is no reason to think one model is 'really' better than the other
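The model-comparison procedure can be sketched end to end in MatLab. The lecture's dataset is not included in this transcript, so the data below are synthetic (a straight line plus Normal noise) and purely hypothetical. The sketch assumes polyfit and polyval from core MatLab and fcdf from the Statistics and Machine Learning Toolbox.

```matlab
% Compare a linear and a cubic fit with a two-sided F test (synthetic data).
N = 50;                              % number of data
t = linspace(0, 10, N)';             % hypothetical observation times, hours
rng(1);                              % fix the seed so the run is repeatable
d = 1 + 0.5*t + 0.5*randn(N, 1);     % hypothetical data: a line plus noise

pL = polyfit(t, d, 1);               % linear fit, 2 coefficients
pC = polyfit(t, d, 3);               % cubic fit, 4 coefficients
EL = sum((d - polyval(pL, t)).^2);   % total error of the linear fit
EC = sum((d - polyval(pC, t)).^2);   % total error of the cubic fit

nuL = N - 2;                         % degrees of freedom, linear fit
nuC = N - 4;                         % degrees of freedom, cubic fit
Fest = (EL/nuL) / (EC/nuC);          % ratio of error per degree of freedom

% Two-sided probability: F outside the interval spanned by Fest and 1/Fest.
% min/max keeps the interval correctly ordered even when Fest < 1.
Flo = min(Fest, 1/Fest);
Fhi = max(Fest, 1/Fest);
PF  = 1 - (fcdf(Fhi, nuL, nuC) - fcdf(Flo, nuL, nuC));

fprintf('EL = %.2f, EC = %.2f, Fest = %.3f, P = %.3f\n', EL, EC, Fest, PF);
```

The cubic always has a total error no larger than the line, since the line is a special case of the cubic. The F test asks whether the improvement per degree of freedom is larger than random variation would typically produce; the Null Hypothesis is rejected only if PF is below 0.05.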

