GG 313 Lecture 7 Chapter 2: Hypothesis Testing Sept. 13, 2005.


CRUISE: Non-required planning meeting for all involved. This Friday, 1:30 PM, POST 703.

Let’s go over the homework and finish up the last of Chapter 2 first. Homework being returned: Do your calculations in Matlab or Excel and DOCUMENT them. You should be able to determine what a reasonable answer is. If your answer isn’t reasonable, it’s probably wrong! I take off points for unreasonable answers.

Today’s homework: 1) Hawaiian data: What did you get for a correlation coefficient? Are age and distance correlated? (We’ll come back to this data set later.) 2) Chromium: You can use the functions in Matlab, but to be sure you understand them, I strongly suggest that you write your own functions and compare the two. Any bad points? Is nickel correlated? 3) Generate the plots.

Inferences about the Population Mean We estimate the mean of a population by calculating the mean of our sample. Knowing from the central limit theorem that the sample mean estimate should be normally distributed about the population mean, we can get some idea of the precision of our estimate of the population mean. Here’s one place where I found Paul’s notes nearly impossible to understand. Read his notes, and try the following explanation.

If we took many trials, our sample mean, x̄, would have a normal distribution about the population mean, µ. From earlier, the standard deviation of the sample mean is equal to the population standard deviation divided by √n: σ_x̄ = σ/√n (see Eqn ). Thus, the deviation of our sample mean from the population mean has a normal distribution with mean x̄ − µ = 0 and variance σ²/n. Changing to normalized coordinates: z = (x̄ − µ)/(s/√n). We have used s as an approximation to σ. This ASSUMES that n is LARGE (>30).
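The quantities above can be computed directly. This is a minimal Python sketch (the lecture uses Matlab; the sample values here are hypothetical, chosen only to illustrate the calculation):

```python
import math
import statistics

# Hypothetical sample of n > 30 measurements (illustrative values only).
sample = [9.8, 10.1, 10.4, 10.0, 9.9, 10.3, 10.2, 10.0, 9.7, 10.5] * 4  # n = 40

n = len(sample)
xbar = statistics.mean(sample)    # sample mean, our estimate of µ
s = statistics.stdev(sample)      # sample standard deviation, approximates σ
se = s / math.sqrt(n)             # standard deviation of the sample mean, s/√n

# Normalized coordinate of the sample mean relative to a hypothesized µ:
mu = 10.0
z = (xbar - mu) / se
```

With n = 40 the large-sample assumption (n > 30) holds, so using s in place of σ is reasonable.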

We want a z_i that will give us appropriate confidence limits for our estimate of the mean. z_i = 2, for example, is 2 standard deviations from the mean; the probability that the estimate will be within z_i = ±2 is about 95%, so this is the 95% confidence interval. We want to solve for the x_i appropriate for that z_i. For a higher confidence level we would use a larger z_i; z_i = 3, for example, gives the 99.7% confidence interval. Paul calls this special value of z_i "z_α/2", and the special value of x_i he calls E.

The probability that x̄ falls outside the confidence interval ±z_α/2 is α. When n ~ 30 or more, and the population is effectively infinite, we can substitute the sample standard deviation, s_x, for σ. What sample size do we need to be confident that our error will be no larger than E?
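Setting the margin of error E = z_α/2 · s/√n and solving for n gives n = (z_α/2 · s / E)². A small Python sketch (the function name is mine, not from the notes; Matlab users would get z_α/2 from norminv instead):

```python
import math
from statistics import NormalDist

def sample_size_for_margin(s, E, confidence=0.95):
    """Smallest n such that z_{alpha/2} * s / sqrt(n) <= E.
    Solving z*s/sqrt(n) = E for n gives n = (z*s/E)**2."""
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # z_{alpha/2}, e.g. 1.96 for 95%
    return math.ceil((z * s / E) ** 2)

# e.g. s = 1.2 microns and we want the error no larger than E = 0.4 microns:
n = sample_size_for_margin(1.2, 0.4)  # 35 samples needed
```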

Since this statistic is normally distributed, and thus symmetric, the interval −z_α/2 < z < z_α/2 is where z will have its value with probability 1 − α. Plugging in for z: x̄ − z_α/2 · s/√n < µ < x̄ + z_α/2 · s/√n. This is the confidence interval on µ at the 1 − α confidence level, and it is a commonly used statistic for estimation of the population mean. Often the level used is 95%, or about 2σ.

Example: A 30-grain sample of a sediment is obtained, and the mean computed. The mean, x̄, is 10.5 microns, and the standard deviation is s = 1.2 microns. At the 95% confidence level, what is the uncertainty in the mean grain size of the sediment? From the above equation, the 95% confidence interval uses z = 2, thus the uncertainty is ±2(1.2)/√30 ≈ ±0.4 microns. So the mean grain size, based on the above analysis, is 10.5±0.4 microns with 95% confidence.
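The grain-size arithmetic can be checked in a few lines of Python (a sketch; the notes would do this in Matlab or Excel):

```python
import math

# Values from the example: n = 30 grains, mean 10.5 microns, s = 1.2 microns.
n, xbar, s = 30, 10.5, 1.2
z = 2.0                              # z for ~95% confidence, as used in the notes
uncertainty = z * s / math.sqrt(n)   # ≈ 0.44, rounded to ±0.4 microns
```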

What if our sample size is smaller? Above, we insisted that we had a fairly large sample size, effectively guaranteeing that the sample mean is a normally distributed statistic. If we use smaller sample sizes we need to assume that the population we are sampling is close to normally distributed. Then we can use the statistic t = (x̄ − µ)/(s/√n) to estimate the uncertainty in the mean. This is known as the Student t-distribution. The transformation looks identical to the one for z, but σ has been replaced by s. The t-distribution shape depends on the number of degrees of freedom, ν = n − 1. If n is large, then the t-statistics are the same as the z-statistics (normal distribution). There are tables of t values for different combinations of degrees of freedom and confidence level.

Student's distribution arises when (as in nearly all practical statistical work) the population standard deviation is unknown and has to be estimated from the data. We use it at this point to estimate the uncertainty in the population mean. The derivation of the t-distribution was first published in 1908 by William Sealey Gosset, while he worked at a Guinness brewery in Dublin. He was not allowed to publish under his own name, so the paper was written under the pseudonym "Student". To get values of the Student's t-distribution, use the Matlab functions tcdf (cumulative) and tpdf (pdf).

These are Student's t distributions for ν = 1 (lowest curve in the middle) to ν = 6 (highest). The shape does not depend on µ or σ.

What we really want are the values of t that bound the confidence interval we want. So we want to use the cumulative function, tcdf: tcdf(3.078,1) = 0.9. Compare this with the table on the next page. It says that the probability that the value will be less than t = 3.078 is 0.9, given 1 degree of freedom. This allows us to use the Matlab function to reproduce the table values for t. But this is actually backwards from what we really want, so let's try the function tinv(probability, degrees of freedom): tinv(0.9,1) = 3.078. This gives us exactly what we want: the t statistic for a given probability and given degrees of freedom.
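For ν = 1 degree of freedom the t-distribution happens to be the Cauchy distribution, whose CDF has a closed form, so we can verify the tabled value t = 3.078 without Matlab's tcdf/tinv. A Python sketch (these helper names are mine; for general ν you would use tcdf/tinv in Matlab or an equivalent library):

```python
import math

def tcdf_nu1(t):
    """CDF of the t-distribution with nu = 1 (Cauchy): F(t) = 0.5 + atan(t)/pi."""
    return 0.5 + math.atan(t) / math.pi

def tinv_nu1(p):
    """Inverse CDF for nu = 1: t = tan(pi * (p - 0.5))."""
    return math.tan(math.pi * (p - 0.5))

# tcdf(3.078, 1) ≈ 0.9 and tinv(0.9, 1) ≈ 3.078, matching the table.
```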

Probability of exceeding the critical value: This table contains the upper critical values of the Student's t-distribution. The upper critical values are computed using the percent point function. Due to the symmetry of the t-distribution, this table can be used for both 1-sided (lower and upper) and 2-sided tests using the appropriate value of α, the significance level. (The accompanying graph plots a t distribution with 10 degrees of freedom.) This is the table Paul uses in his examples:

Now let's do the example in Paul's notes. We have density samples: ( ). What is the 95% confidence interval on the sample mean? We can get the mean x̄ = 2.28, s = 0.05, n = 7, and degrees of freedom ν = 6 easily. Now we just plug in. The t-value for ν = 6 and α/2 = 0.025 is given by tinv(0.975,6) = 2.447. So the limits are ±2.447 · 0.05/√7 ≈ ±0.046, and µ = 2.28±0.046. In words, the mean density of our samples is 2.28±0.046 gm/cm^3 with a confidence of 95%.
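The same plug-in step in Python (a sketch of the arithmetic only; the t-value 2.447 comes from the table or from Matlab's tinv):

```python
import math

# Example values: xbar = 2.28, s = 0.05, n = 7, so nu = 6 degrees of freedom.
xbar, s, n = 2.28, 0.05, 7
t = 2.447                       # tinv(0.975, 6), from the t table
margin = t * s / math.sqrt(n)   # ≈ 0.046, so µ = 2.28 ± 0.046 gm/cm^3
```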

Try another example now: You have 4 rock sample ages of 14.2, , and 12.1 million years. What is the mean age, and what is the 90% confidence interval for that age?

Chapter 2 Testing Hypotheses In many situations you can give quantitative answers to the question: "Do your data satisfy your hypothesis?" That's what this chapter is about. You cannot PROVE your hypothesis, but you might be able to reject the opposite of your hypothesis. For example, you have two rock samples and hypothesize that they have different densities. The opposite claim, that they have the same density, is the NULL HYPOTHESIS, and you can likely show that it fails within some bounds.

Example: Claim: a particular sandstone has a density of 2.35 gm/cm^3. We are given 50 samples from another outcrop. We hypothesize that if the mean density of these samples is not between 2.25 and 2.45, then the samples are from a different lithological unit. What is the probability of making a wrong decision? There are two possibilities: we could have the mean of our sample fall outside the range we have selected, thus incorrectly rejecting the idea that the samples are from the same sandstone; or we could have the mean fall inside the range and accept the idea, but be incorrect, the sample mean being a poor representative of another unit.

Other examples (hypothesis / null hypothesis):
Alcohol has a harmful effect on reaction time. / Alcohol has no effect.
The date of isochron A is older than isochron B. / Isochron B is older than A.
Strain increases before earthquakes. / Strain does not increase.

Let's do the first case. The null hypothesis is that the sample is from the unit, but the sample mean falls outside the bounds. We can reject the null hypothesis if this would occur in only a small fraction of samples, say 5%. n = 50 and s = σ = 0.42, so the standard deviation of the mean is σ/√n = 0.42/√50 ≈ 0.06.

We now normalize to get the normal scores: z = ±(2.45 − 2.35)/0.06 ≈ ±1.67. We get the area under one tail (eqn ), and multiplying by two to get both tails: p = 0.095 = 9.5%. Thus the probability that we will erroneously say that the sample is not the same as the known formation is 9.5%. Rejecting the null hypothesis when it is actually true is called a Type I error.
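The Type I error probability can be checked numerically. A Python sketch (the notes use a normal table; exact values shift slightly with rounding of the standard error):

```python
import math
from statistics import NormalDist

# Values from the example: claimed mean 2.35, s = 0.42, n = 50,
# acceptance region 2.25 to 2.45.
mu, s, n = 2.35, 0.42, 50
lo, hi = 2.25, 2.45
se = s / math.sqrt(n)          # ≈ 0.06, standard deviation of the mean

z = (hi - mu) / se             # ≈ 1.68
# Two-tailed probability that the sample mean falls outside the limits even
# though the null hypothesis is true (a Type I error):
p_type1 = 2.0 * (1.0 - NormalDist().cdf(z))
# ≈ 0.09, in line with the ~9.5% quoted in the notes
```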

Let's look at the other possibility. Suppose that the true density of our sandstone sample is µ = 2.53, so it is not from the same formation as the 2.35 reference sandstone. What's the probability of having our sample mean density fall between our limits, thus erroneously accepting the null hypothesis? In this case, we want to know the probability of making a Type II error.

Computing the normal scores as above: z1 = (2.25 − 2.53)/0.06 ≈ −4.7 and z2 = (2.45 − 2.53)/0.06 ≈ −1.3. We then use eqn to calculate the probability of the sample mean falling between the two limits: p = P(z2) − P(z1) ≈ 0.092. So we have a 9.2% probability of calling the rocks the same when they really are not.
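The Type II calculation in Python (again a sketch; small differences from the 9.2% in the notes come from rounding the standard error):

```python
import math
from statistics import NormalDist

# Suppose the true mean is 2.53 (a different formation), but we still accept
# the null hypothesis whenever the sample mean falls between 2.25 and 2.45.
mu_true, s, n = 2.53, 0.42, 50
lo, hi = 2.25, 2.45
se = s / math.sqrt(n)

nd = NormalDist()
z1 = (lo - mu_true) / se       # ≈ -4.7
z2 = (hi - mu_true) / se       # ≈ -1.3
# Probability the sample mean lands inside the limits anyway (a Type II error):
p_type2 = nd.cdf(z2) - nd.cdf(z1)
# ≈ 0.09, close to the 9.2% quoted in the notes
```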

There is always the possibility of making an erroneous conclusion, and these methods allow us to quantify the probabilities. There are four possible outcomes in our hypothesis testing, since we can accept or reject the null hypothesis, and the null hypothesis can be true or false:
Accept a true null hypothesis: correct decision.
Reject a true null hypothesis: Type I error.
Accept a false null hypothesis: Type II error.
Reject a false null hypothesis: correct decision.
Type II errors are hard to detect and most desirable to avoid.