Presenter: Shib Sekhar Datta Moderator: M S Bharambe

Slides:



Advertisements
Similar presentations
Introduction to Hypothesis Testing
Advertisements

Introductory Mathematics & Statistics for Business
Overview of Lecture Parametric vs Non-Parametric Statistical Tests.
Lecture 2 ANALYSIS OF VARIANCE: AN INTRODUCTION
Quantitative Methods Lecture 3
You will need Your text Your calculator
HYPOTHESIS TESTING. Purpose The purpose of hypothesis testing is to help the researcher or administrator in reaching a decision concerning a population.
Chapter 7 Hypothesis Testing
6. Statistical Inference: Example: Anorexia study Weight measured before and after period of treatment y i = weight at end – weight at beginning For n=17.
Chapter 8: Introduction to Hypothesis Testing. 2 Hypothesis Testing An inferential procedure that uses sample data to evaluate the credibility of a hypothesis.
STATISTICAL ANALYSIS. Your introduction to statistics should not be like drinking water from a fire hose!!
Inferential Statistics
Statistical Inferences Based on Two Samples
CHAPTER 15: Tests of Significance: The Basics Lecture PowerPoint Slides The Basic Practice of Statistics 6 th Edition Moore / Notz / Fligner.
Testing Hypotheses About Proportions
Multiple Regression and Model Building
Chapter 16 Inferential Statistics
Hypothesis Testing. To define a statistical Test we 1.Choose a statistic (called the test statistic) 2.Divide the range of possible values for the test.
1 COMM 301: Empirical Research in Communication Lecture 15 – Hypothesis Testing Kwan M Lee.
Inferential Statistics & Hypothesis Testing
Chapter 10: Hypothesis Testing
Making Inferences for Associations Between Categorical Variables: Chi Square Chapter 12 Reading Assignment pp ; 485.
EPIDEMIOLOGY AND BIOSTATISTICS DEPT Esimating Population Value with Hypothesis Testing.
Fundamentals of Hypothesis Testing. Identify the Population Assume the population mean TV sets is 3. (Null Hypothesis) REJECT Compute the Sample Mean.
1/55 EF 507 QUANTITATIVE METHODS FOR ECONOMICS AND FINANCE FALL 2008 Chapter 10 Hypothesis Testing.
Chapter 6 Inference for a Mean
Statistics for Managers Using Microsoft Excel, 5e © 2008 Pearson Prentice-Hall, Inc.Chap 9-1 Statistics for Managers Using Microsoft® Excel 5th Edition.
Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 9-1 Chapter 9 Fundamentals of Hypothesis Testing: One-Sample Tests Basic Business Statistics.
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall Statistics for Business and Economics 7 th Edition Chapter 9 Hypothesis Testing: Single.
Chapter 9 Hypothesis Testing.
Statistics for Managers Using Microsoft® Excel 5th Edition
Definitions In statistics, a hypothesis is a claim or statement about a property of a population. A hypothesis test is a standard procedure for testing.
AM Recitation 2/10/11.
Hypothesis Testing:.
McGraw-Hill/IrwinCopyright © 2009 by The McGraw-Hill Companies, Inc. All Rights Reserved. Chapter 9 Hypothesis Testing.
Chapter 10 Hypothesis Testing
Overview Definition Hypothesis
Business Statistics, A First Course (4e) © 2006 Prentice-Hall, Inc. Chap 9-1 Chapter 9 Fundamentals of Hypothesis Testing: One-Sample Tests Business Statistics,
Chapter 8 Hypothesis testing 1. ▪Along with estimation, hypothesis testing is one of the major fields of statistical inference ▪In estimation, we: –don’t.
Fundamentals of Hypothesis Testing: One-Sample Tests
Review of Statistical Inference Prepared by Vera Tabakova, East Carolina University ECON 4550 Econometrics Memorial University of Newfoundland.
10.2 Tests of Significance Use confidence intervals when the goal is to estimate the population parameter If the goal is to.
Chapter 20 Testing hypotheses about proportions
Chapter 12 A Primer for Inferential Statistics What Does Statistically Significant Mean? It’s the probability that an observed difference or association.
1 Chapter 10: Introduction to Inference. 2 Inference Inference is the statistical process by which we use information collected from a sample to infer.
Statistical Inference Statistical Inference involves estimating a population parameter (mean) from a sample that is taken from the population. Inference.
Chapter 9 Three Tests of Significance Winston Jackson and Norine Verberg Methods: Doing Social Research, 4e.
Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc. Chap 8-1 Chapter 8 Fundamentals of Hypothesis Testing: One-Sample Tests Statistics.
McGraw-Hill/Irwin Copyright © 2007 by The McGraw-Hill Companies, Inc. All rights reserved. Chapter 8 Hypothesis Testing.
Lecture 9 Chap 9-1 Chapter 2b Fundamentals of Hypothesis Testing: One-Sample Tests.
Physics 270 – Experimental Physics. Let say we are given a functional relationship between several measured variables Q(x, y, …) x ±  x and x ±  y What.
Fall 2002Biostat Statistical Inference - Confidence Intervals General (1 -  ) Confidence Intervals: a random interval that will include a fixed.
Chap 8-1 Fundamentals of Hypothesis Testing: One-Sample Tests.
© Copyright McGraw-Hill 2004
Chapter 12 Tests of Hypotheses Means 12.1 Tests of Hypotheses 12.2 Significance of Tests 12.3 Tests concerning Means 12.4 Tests concerning Means(unknown.
Tests of Significance We use test to determine whether a “prediction” is “true” or “false”. More precisely, a test of significance gets at the question.
Copyright © 2013 Pearson Education, Inc. Publishing as Prentice Hall Statistics for Business and Economics 8 th Edition Chapter 9 Hypothesis Testing: Single.
Review Statistical inference and test of significance.
Chapter 9 Hypothesis Testing Understanding Basic Statistics Fifth Edition By Brase and Brase Prepared by Jon Booze.
Copyright © 2010, 2007, 2004 Pearson Education, Inc. Chapter 21 More About Tests and Intervals.
Chapter Nine Hypothesis Testing.
HYPOTHESIS TESTING.
Statistics for Managers Using Microsoft® Excel 5th Edition
Unit 5: Hypothesis Testing
Hypothesis Testing Review
Chapter 9 Hypothesis Testing.
Chapter 9 Hypothesis Testing.
Decision Errors and Power
Daniela Stan Raicu School of CTI, DePaul University
Chapter 9 Hypothesis Testing: Single Population
Presentation transcript:

Presenter: Shib Sekhar Datta Moderator: M S Bharambe Test of Significance Presenter: Shib Sekhar Datta Moderator: M S Bharambe

Framework of presentation Introduction to test of significance A B C D E F G I J K L References

Normal curve Because a two-sided test is symmetrical, you can also use a confidence interval to test a two-sided hypothesis. In a two-sided test, C = 1 – α. C confidence level α significance level α /2 α /2

Tests of Significance Is it due to chance, or something else? This kind of questions can be answered using tests of significance. Since many journal articles use such tests, it is a good idea to understand what they mean.

Why should we test significance? We test SAMPLE to draw conclusions about POPULATION If two SAMPLES (group means) are different, can we be certain that POPULATIONS (from which the samples were drawn) are also different? Is the difference obtained TRUE or SPURIOUS? Will another set of samples be also different? What are the chances that the difference obtained is spurious? The above questions can be answered by STAT TEST.

Issues in significance tests Testing many hypotheses One-tailed or two-tailed? A statistically significant result may not be important Sample size If chance model is wrong, test result can be meaningless A significant test does not identify the cause

Four Key Points: Repeated Samples Plotting the means of repeated samples will produce a normal distribution: it will be more peaked than when raw data are plotted (as shown in Figure 9.1)

Four Key Points (cont’d) The larger the sample sizes, the more peaked the distribution and the closer the means of the samples to the population mean (shown in Figure 9.2)

Four Key Points (cont’d) The greater the variability in the population, the greater the variations in the samples When sample sizes are above 100, even if a variable in the population is not normally distributed, the means will be normally distributed when repeated samples are plotted E.g., weight of population of males and females will be bimodal, but if we did repeated samples, the weights would be normally distributed

One- and Two-Tailed Tests If the direction of a relationship is predicted, the appropriate test will be one-tailed If the direction of the relationship is not predicted, use a two-tailed test Example: One tailed: Females are less approving of violence than are males Two-tailed: There is a gender difference in the acceptance of violence [Note: No prediction about which gender is more approving]

Statistical test Hypothesis testing for one sample problem consists of three parts: Null hypothesis (Ho): The unknown average is equal to sample mean II) Alternative hypothesis (Ha): The unknown average is not equal to This is the hypothesis that the researcher would like to validate and it is chosen before collecting the data! There are three possible alternative hypotheses – choose only one! Ha:  one-sided test Ha:  two-sided test Ha:  one-sided test

Five Percent Probability Rejection Area: One- and Two-Tailed Tests

Significance doesn’t rule out 100% Chance When a result is statistically significant, there is a 5% chance of obtaining something as extreme as, or more extreme than your observation, under the null hypothesis. Even if nothing is happening (the null hypothesis is true), you will get 5 significant results in 100 tests, just by chance.

One-tailed vs Two-tailed Checking if the sample average is too big or too small before making an alternative hypothesis is a form of data snooping. A coin is tossed 100 times, and there are 61 heads. The null hypothesis says the coin is fair, so EV = 50, SE = 5, and z = (61 – 50) / 5 = 2.2. If the alternative hypothesis says the coin is biased towards the heads, then the P-value is roughly the area under the normal curve to the right of 2.2: 1.4%.

If the alternative hypothesis is that the coin is biased, but this bias can be either way, then the P-value should be the area to the left of 2.2 and right of 2.2: 0.7%. A one-tailed test is OK if it was known before the experiment that the coin is either fair or biased towards the heads. If a reported test is two-tailed, and you think it should be one-tailed, just double the P-value.

Suppose an investigator use z-test with two-tailed test and alpha at 0 Suppose an investigator use z-test with two-tailed test and alpha at 0.05 (5%) and gets z = 1.85, so P ≈ 6%. Most journals won’t publish this result, since it is not “statistically significant”. The investigator could do a better experiment, but this is hard. A simple way out is to do a one-tailed test.

Was the result important? To compare WISC vocabulary scores for big-city and rural children, investigators took a simple random sample of 2,500 big-city kids, and an independent simple random sample of 2,500 rural kids. Big-city: average = 26, SD = 10. Rural: average = 25, SD = 10. Two-sample z-test. SE for difference ≈ 0.3, z = 1 / 0.3 ≈ 3.3, P = 5 / 10,000. The difference is highly significant, and the investigators use this to support a proposal to pour money into rural schools.

The z-test only tells us that the difference of 1 point is very hard to explain away with chance variation. How important is this difference?

P-value and sample size The P-value of a test depends on the sample size. With a large sample, even a small difference can be “statistically significant”, i.e., hard to explain by the luck of the draw. This doesn’t necessarily make it important. Conversely, an important difference may not be statistically significant if the sample is too small.

The Role of the Chance Model A test of significance asks the question, “Is the difference due to chance?” But the test cannot be done until the word “chance” has been given a precise meaning.

Does the difference prove the point? In an experiment of throwing a die: a die is rolled, and the subject tries to make it show 6 spots. In 720 trials, 6 spots turned up 143 times. If the die is fair, and the subject has no bias, expect 120 sixes, with SD(Hypothetical) 10. z = (143 – 120) / 10 = 2.3, P ≈ 1%.

A test of significance tells you that there is a difference, but can’t tell you the cause of the difference. A test of significance does not check the design of the study.

How to test statistical significance? State Null hypothesis Set alpha (level of significance) Identify the variables to be analyzed Identify the groups to be compared Choose a test Calculate the test statistic If necessary calculate degrees of freedom Find out the P value Compare with alpha Decision on hypothesis fail to reject or reject Calculate the CI of the difference Calculate Power if required

How to interpret P? If P < alpha (0.05), the difference is statistically significant If P > alpha, the difference between groups is not statistically significant / the difference could not be detected. If P> alpha, calculate the power If power < 80% - The difference could not be detected; repeat the study with deficit number of study subjects. If power ≥ 80 % - The difference between groups is not statistically significant.

Statistical Significance The null hypothesis: Dependent and independent variables are statistically unrelated If a relationship between an independent variable and a dependent variable is statistically non-significant Null hypothesis is true Research hypothesis is rejected If a relationship between an independent variable and a dependent variable is statistically significant Null hypothesis is false Research hypothesis is supported

Tests of Significance Hypotheses In a test of significance, we set up two hypotheses. The null hypothesis or H0. The alternative hypothesis or Ha. The null hypothesis (H0)is the statement being tested. Usually we want to show evidence that the null hypothesis is not true. It is often called the “currently held belief” or “statement of no effect” or “statement of no difference.” The alternative hypothesis (Ha) is the statement of what we want to show is true instead of H0. The alternative hypothesis can be one-sided or two-sided, depending on the statement of the question of interest. Hypotheses are always about parameters of populations, never about statistics from samples. It is often helpful to think of null and alternative hypotheses as opposite statements about the parameter of interest.

Tests of Significance Test Statistics A test statistic measures the compatibility between the null hypothesis and the data. An extreme test statistic (far from 0) indicates the data are not compatible with the null hypothesis. A common test statistic (close to 0) indicates the data are compatible with the null hypothesis.

Tests of Significance Significance Level (a) The significance level (a) is the point at which we say the p-value is small enough to reject H0. If the P-value is as small as a or smaller, we reject H0, and we say that the data are statistically significant at level a. Significance levels are related to confidence levels through the rule C = 1 – α Common significance levels (a’s) are 0.10 corresponding to confidence level 90% 0.05 corresponding to confidence level 95% 0.01 corresponding to confidence level 99%

Tests of Significance Steps for Testing a Population Mean (with s known) 1. State the null hypothesis: 2. State the alternative hypothesis: 3. State the level of significance Assume a = 0.05 unless otherwise stated 4. Calculate the test statistic

Steps for Testing a Population Mean (with s known) 5. Find the P-value: For a one or two-sided test: T test Consult t distribution table and get standard value for specific degree of freedom Z test N need to consult table as it is independent of degrees of freedom At 5% level of significance Z=1.96 At 1% level of significance Z =2.58

Steps for Testing a Population Mean (with s known) 6. Reject or fail to reject H0 based on comparison of test statistic If the P-value is less than or equal to a, reject H0. It the P-value is greater than a, fail to reject H0. 7. State your conclusion Your conclusion should reflect your original statement of the hypotheses. Furthermore, your conclusion should be stated in terms of the alternative hypotheses For example, if Ha: μ ≠ μ0 as stated previously If H0 is rejected, “There is significant statistical evidence that the population mean is different than m0.” If H0 is not rejected, “There is not significant statistical evidence that the population mean is different than m0.”

Example : Arsenic “A factory that discharges waste water into the sewage system is required to monitor the arsenic levels in its waste water and report the results to the Environmental Protection Agency (EPA) at regular intervals. Sixty beakers of waste water from the discharge are obtained at randomly chosen times during a certain month. The measurement of arsenic is in nanograms per liter for each beaker of water obtained.” (From Graybill, Iyer and Burdick, Applied Statistics, 1998). Suppose the EPA wants to test if the average arsenic level exceeds 30 nanograms per liter at the 0.05 level of significance.

Making a test of significance Follow these steps: Set up the null hypothesis H0– the hypothesis you want to test. Set up the alternative hypothesis Ha– what we accept if H0 is rejected Compute the value t* of the test statistic. Compute the observed significance level P. This is the probability, calculated assuming that H0 is true, of getting a test statistic as extreme or more extreme than the observed one in the direction of the alternative hypothesis. State a conclusion. You could choose a significance level . If the P-value is less than or equal to , you conclude that the null hypothesis can be rejected at level , otherwise you conclude that the data do not provide enough evidence to reject H0.

Examples of Significance Tests Arsenic Example Information given: 37.6 56.7 5.1 3.7 3.5 15.7 20.7 81.3 37.5 15.4 10.6 8.3 23.2 9.5 7.9 21.1 40.6 35 19.4 38.8 20.9 8.6 59.2 6.2 24 33.8 21.6 15.3 6.6 87.7 4.8 10.7 182.2 17.6 152 63.5 46.9 17.4 26.1 21.5 3.2 45.2 12 128.5 23.5 24.1 36.2 48.9 16.5 33.2 25.6 33.6 12.2 9.9 14.5 30 Sample size: n = 60. Assume it is known that s = 34.

Arsenic Example 1. State the null hypothesis: 2. State the alternative hypothesis: 3. State the level of significance from “exceeds” a = 0.05 for one tailed test

Tests using the z-statistic are called z-tests. Z Test Statistic Tests using the z-statistic are called z-tests.

Observed Significance Level z is the number of SEs an observed value is away from its expected value, based on the null hypothesis. The observed z-statistic is –3. The chance of getting a sample average 3 SEs or more below its expected value is about 1 in 1,000. This is an observed significance level, denoted by P, and referred to as a P-value.

Arsenic Example 4. Calculate the test statistic. 5. Z value is < 1.96 (At 5% level of significance) So decision Evidence are failed to reject the null hypothesis Hence, alternate hypothesis is true

Arsenic Example 7. State the conclusion. There is not significant statistical evidence that the average arsenic level exceeds 30 nanograms per liter at the 0.05 level of significance.

Types of Error Type I Error: When we reject H0 (accept Ha) when in fact H0 is true. Type II Error: When we accept H0 (fail to reject H0) when in fact Ha is true. Truth about the population Ho is true Ha is true Decision based on the sample Reject Ho Type I error Correct decision Accept Ho Type II

Power and Error Significance and Type I Error The significance level α of any fixed level test is the probability of a Type I Error: That is, α is the probability that the test will reject the null hypothesis, H0, when H0 is in fact true. Power and Type II Error The power of a fixed level test against a particular alternative is 1 minus the probability of a Type II Error for that alternative (1 - β).

Power Power: The probability that a fixed level α significance test will reject H0 when a particular alternative value of the parameter is true is called the power of the test to detect that alternative. In other words, power is the probability that the test will reject H0 when the alternative is true (when the null really should be rejected.) Ways To Increase Power: Increase α Increase the sample size – this is what you will typically want to do Decrease σ

Significance Level The observed significance level is the chance of getting a test statistic as extreme as, or more extreme than, the observed one. It is computed on the basis that the null hypothesis is right. The smaller it is, the stronger the evidence against the null.

The common practice of testing hypotheses The common practice of testing hypotheses mixes the reasoning of significance tests and decision rules as follows: State H0 and Ha Think of the problem as a decision problem, so that the probabilities of Type I and Type II errors are relevant. Because of Step 1, Type I errors are more serious. So choose an α (significance level) and consider only tests with probability of Type I error no greater than α. Among these tests, select one that makes the probability of a Type II error as small as possible (that is, power as large as possible.) If this probability is too large, you will have to take a larger sample to reduce the chance of an error.

The t-test The z-test is good when the sample size (number of draws) is large enough. For small samples, the t-test should be used, provided the histogram of the contents is not too different from the normal curve.

The t Distribution: t-Test Groups and Pairs Used often for experimental data t-test used when: Sample size is small (e.g.: < 30) Dependent variable measured at ratio level Random assignment to treatment/control groups Treatment has two levels only Sample statistic normally distributed

t difference between the means = standard error of the difference The t-test represents the ratio between the difference in means between two groups and the standard error of the difference. Thus: difference between the means standard error of the difference t =

Two t-Tests: Between- and Within-Subject Design Between-subjects: Used in an experimental design, with an experimental and a control group, where the groups have been independently established Within-subjects: In these designs the same person is subjected to different treatments and a comparison is made between the two treatments.

Example A weight is known to weigh 70 g. Five measurements were made: 78, 83, 68, 72, 88. The fluctuation is due to chance variation assuming that the calibrated instrument was used for measurement. Do the measurements fluctuate around 70, or do they indicate some bias in the measurement procedure?

Summary for the t-test Assumptions: (a) The data are likely drawn from a normally distributed population. (b) The SD of the population is unknown. (c) The number of observations is small. (d) The histogram of the contents does not look too different from the normal curve.

Two Sample Tests To compare two samples averages, the test statistic has the form except that now these terms apply to difference of sample averages. Need to find the SE for difference.

Summary for the t-test The t-statistic has the same form as the z-statistic: except that the SE is computed on the basis of SD of the data. The degree of freedom is (sample size – 1).

Used for frequencies in discrete categories. The Chi Square Test Used for frequencies in discrete categories. (a) To check if the observed frequencies are like hypothesized frequencies. (b) To check if two variables are independent in the population.

Expected frequency for a cell = (row total X column total)/N The χ2 statistic is: where the sum is taken over all the cell.

Is the die loaded? A gambler is accused of using a loaded die, but he pleads innocent. The results of the last 60 throws are summarised below: Value observed frequency 1 4 2 6 3 17 4 16 5 8 6 9

In our example, the χ2-statistic is Large χ2 values observed frequency is far from expected frequency: evidence against the null hypothesis. Small χ2 values they are close to each other: evidence for the null hypothesis.

What’s the P-value, i.e., under the null hypothesis, what’s the chance of getting a χ2-statistic that is as large as, or larger than the observed statistic? Find out df, Chi square value and find out p

Conclusion A test of significance answers a specific question: How easy is it to explain the difference between the data and what is expected on the null hypothesis, on the basis of chance variation alone? It does not measure the size of a difference or its importance, It will not identify the cause of the difference.

Data type 2. Distribution of data 3. Analysis type (goal) Choosing a stat test…… Data type 2. Distribution of data 3. Analysis type (goal) 4. No. of groups 5. Design

Mean, Mode and Median if are almost equal Kelmogorov Smirnov test Correlation How we know a variable is normally distributed Histogram Mean, Mode and Median if are almost equal Kelmogorov Smirnov test

References Armitadge, Edition Bishweswar Rao, Edition 4, Year 2007.

Thank You