Presentation on theme: "Test of Significance Presenter: Shib Sekhar Datta Moderator: M S Bharambe."— Presentation transcript:
Test of Significance Presenter: Shib Sekhar Datta Moderator: M S Bharambe
Framework of presentation Introduction to test of significance A B C D E F G I J K L References
Normal curve Because a two-sided test is symmetrical, you can also use a confidence interval to test a two-sided hypothesis. α / 2 In a two-sided test, C = 1 – α. C confidence level α significance level
Tests of Significance Is it due to chance, or something else? This kind of questions can be answered using tests of significance. Since many journal articles use such tests, it is a good idea to understand what they mean.
We test SAMPLE to draw conclusions about POPULATION If two SAMPLES (group means) are different, can we be certain that POPULATIONS (from which the samples were drawn) are also different? Is the difference obtained TRUE or SPURIOUS? Will another set of samples be also different? What are the chances that the difference obtained is spurious? The above questions can be answered by STAT TEST. Why should we test significance?
Issues in significance tests Testing many hypotheses One-tailed or two-tailed? A statistically significant result may not be important Sample size If chance model is wrong, test result can be meaningless A significant test does not identify the cause
Four Key Points: Repeated Samples 1.Plotting the means of repeated samples will produce a normal distribution: it will be more peaked than when raw data are plotted (as shown in Figure 9.1)
Four Key Points (contd) 2.The larger the sample sizes, the more peaked the distribution and the closer the means of the samples to the population mean (shown in Figure 9.2)
Four Key Points (contd) 3.The greater the variability in the population, the greater the variations in the samples 4.When sample sizes are above 100, even if a variable in the population is not normally distributed, the means will be normally distributed when repeated samples are plotted –E.g., weight of population of males and females will be bimodal, but if we did repeated samples, the weights would be normally distributed
One- and Two-Tailed Tests If the direction of a relationship is predicted, the appropriate test will be one-tailed –If the direction of the relationship is not predicted, use a two-tailed test Example: –One tailed: Females are less approving of violence than are males –Two-tailed: There is a gender difference in the acceptance of violence [Note: No prediction about which gender is more approving]
Statistical test Hypothesis testing for one sample problem consists of three parts: I)Null hypothesis (H o ): The unknown average is equal to sample mean II) Alternative hypothesis (H a ): The unknown average is not equal to This is the hypothesis that the researcher would like to validate and it is chosen before collecting the data! There are three possible alternative hypotheses – choose only one! 1.H a : one-sided test 2.H a : two-sided test 3.H a : one-sided test
Five Percent Probability Rejection Area: One- and Two-Tailed Tests
Significance doesnt rule out 100% Chance When a result is statistically significant, there is a 5% chance of obtaining something as extreme as, or more extreme than your observation, under the null hypothesis. Even if nothing is happening (the null hypothesis is true), you will get 5 significant results in 100 tests, just by chance.
One-tailed vs Two-tailed Checking if the sample average is too big or too small before making an alternative hypothesis is a form of data snooping. A coin is tossed 100 times, and there are 61 heads. The null hypothesis says the coin is fair, so EV = 50, SE = 5, and z = (61 – 50) / 5 = 2.2. If the alternative hypothesis says the coin is biased towards the heads, then the P-value is roughly the area under the normal curve to the right of 2.2: 1.4%.
If the alternative hypothesis is that the coin is biased, but this bias can be either way, then the P-value should be the area to the left of 2.2 and right of 2.2: 0.7%. A one-tailed test is OK if it was known before the experiment that the coin is either fair or biased towards the heads. If a reported test is two-tailed, and you think it should be one-tailed, just double the P-value.
Suppose an investigator use z-test with two- tailed test and alpha at 0.05 (5%) and gets z = 1.85, so P 6%. Most journals wont publish this result, since it is not statistically significant. The investigator could do a better experiment, but this is hard. A simple way out is to do a one- tailed test.
Was the result important? To compare WISC vocabulary scores for big-city and rural children, investigators took a simple random sample of 2,500 big-city kids, and an independent simple random sample of 2,500 rural kids. Big-city: average = 26, SD = 10. Rural: average = 25, SD = 10. Two-sample z-test. SE for difference 0.3, z = 1 / , P = 5 / 10,000. The difference is highly significant, and the investigators use this to support a proposal to pour money into rural schools.
The z-test only tells us that the difference of 1 point is very hard to explain away with chance variation. How important is this difference?
P-value and sample size The P-value of a test depends on the sample size. With a large sample, even a small difference can be statistically significant, i.e., hard to explain by the luck of the draw. This doesnt necessarily make it important. Conversely, an important difference may not be statistically significant if the sample is too small.
The Role of the Chance Model A test of significance asks the question, Is the difference due to chance? But the test cannot be done until the word chance has been given a precise meaning.
Does the difference prove the point? In an experiment of throwing a die: a die is rolled, and the subject tries to make it show 6 spots. In 720 trials, 6 spots turned up 143 times. If the die is fair, and the subject has no bias, expect 120 sixes, with SD(Hypothetical) 10. z = (143 – 120) / 10 = 2.3, P 1%.
A test of significance tells you that there is a difference, but cant tell you the cause of the difference. A test of significance does not check the design of the study.
How to test statistical significance? State Null hypothesis Set alpha (level of significance) Identify the variables to be analyzed Identify the groups to be compared Choose a test Calculate the test statistic If necessary calculate degrees of freedom Find out the P value Compare with alpha Decision on hypothesis fail to reject or reject Calculate the CI of the difference Calculate Power if required
How to interpret P? If P < alpha (0.05), the difference is statistically significant If P > alpha, the difference between groups is not statistically significant / the difference could not be detected. If P> alpha, calculate the power If power < 80% - The difference could not be detected; repeat the study with deficit number of study subjects. If power 80 % - The difference between groups is not statistically significant.
Statistical Significance The null hypothesis: Dependent and independent variables are statistically unrelated If a relationship between an independent variable and a dependent variable is statistically non-significant –Null hypothesis is true –Research hypothesis is rejected If a relationship between an independent variable and a dependent variable is statistically significant –Null hypothesis is false –Research hypothesis is supported
Tests of Significance Hypotheses –In a test of significance, we set up two hypotheses. The null hypothesis or H 0. The alternative hypothesis or H a. –The null hypothesis (H 0 )is the statement being tested. Usually we want to show evidence that the null hypothesis is not true. It is often called the currently held belief or statement of no effect or statement of no difference. –The alternative hypothesis (H a ) is the statement of what we want to show is true instead of H 0. The alternative hypothesis can be one-sided or two-sided, depending on the statement of the question of interest. –Hypotheses are always about parameters of populations, never about statistics from samples. It is often helpful to think of null and alternative hypotheses as opposite statements about the parameter of interest.
Tests of Significance Test Statistics –A test statistic measures the compatibility between the null hypothesis and the data. An extreme test statistic (far from 0) indicates the data are not compatible with the null hypothesis. A common test statistic (close to 0) indicates the data are compatible with the null hypothesis.
Tests of Significance Significance Level (a) –The significance level (a) is the point at which we say the p-value is small enough to reject H 0. –If the P-value is as small as a or smaller, we reject H 0, and we say that the data are statistically significant at level a. –Significance levels are related to confidence levels through the rule C = 1 – α –Common significance levels (a s ) are 0.10 corresponding to confidence level 90% 0.05 corresponding to confidence level 95% 0.01 corresponding to confidence level 99%
Tests of Significance Steps for Testing a Population Mean (with known) 1. State the null hypothesis: 2. State the alternative hypothesis: 3. State the level of significance Assume a = 0.05 unless otherwise stated 4. Calculate the test statistic
Steps for Testing a Population Mean (with known) 5. Find the P-value: For a one or two-sided test: T test Consult t distribution table and get standard value for specific degree of freedom Z test N need to consult table as it is independent of degrees of freedom –At 5% level of significance Z=1.96 –At 1% level of significance Z =2.58
Steps for Testing a Population Mean (with known) 6. Reject or fail to reject H 0 based on comparison of test statistic If the P-value is less than or equal to a, reject H 0. It the P-value is greater than a, fail to reject H State your conclusion Your conclusion should reflect your original statement of the hypotheses. Furthermore, your conclusion should be stated in terms of the alternative hypotheses For example, if H a : μ μ 0 as stated previously –If H 0 is rejected, There is significant statistical evidence that the population mean is different than m 0. –If H 0 is not rejected, There is not significant statistical evidence that the population mean is different than m 0.
Example : Arsenic –A factory that discharges waste water into the sewage system is required to monitor the arsenic levels in its waste water and report the results to the Environmental Protection Agency (EPA) at regular intervals. Sixty beakers of waste water from the discharge are obtained at randomly chosen times during a certain month. The measurement of arsenic is in nanograms per liter for each beaker of water obtained. (From Graybill, Iyer and Burdick, Applied Statistics, 1998). –Suppose the EPA wants to test if the average arsenic level exceeds 30 nanograms per liter at the 0.05 level of significance.
Making a test of significance Follow these steps: 1.Set up the null hypothesis H 0 – the hypothesis you want to test. 2.Set up the alternative hypothesis H a – what we accept if H 0 is rejected 3.Compute the value t* of the test statistic. 4.Compute the observed significance level P. This is the probability, calculated assuming that H 0 is true, of getting a test statistic as extreme or more extreme than the observed one in the direction of the alternative hypothesis. 5.State a conclusion. You could choose a significance level. If the P-value is less than or equal to, you conclude that the null hypothesis can be rejected at level, otherwise you conclude that the data do not provide enough evidence to reject H 0.
Examples of Significance Tests Arsenic Example –Information given: Assume it is known that s = 34. Sample size: n = 60.
Arsenic Example 1. State the null hypothesis: 2. State the alternative hypothesis: 3. State the level of significance from exceeds a = 0.05 for one tailed test
Z Test Statistic Tests using the z-statistic are called z-tests.
Observed Significance Level z is the number of SEs an observed value is away from its expected value, based on the null hypothesis. The observed z-statistic is –3. The chance of getting a sample average 3 SEs or more below its expected value is about 1 in 1,000. This is an observed significance level, denoted by P, and referred to as a P-value.
Arsenic Example 4. Calculate the test statistic. 5. Z value is < 1.96 (At 5% level of significance) So decision Evidence are failed to reject the null hypothesis Hence, alternate hypothesis is true
Arsenic Example 7. State the conclusion. There is not significant statistical evidence that the average arsenic level exceeds 30 nanograms per liter at the 0.05 level of significance.
Types of Error Type I Error: When we reject H 0 (accept H a ) when in fact H 0 is true. Type II Error: When we accept H 0 (fail to reject H 0 ) when in fact H a is true. Truth about the population H o is trueH a is true Decision based on the sampleReject H o Type I error Correct decision Accept H o Correct decision Type II error
Power and Error Significance and Type I Error –The significance level α of any fixed level test is the probability of a Type I Error: That is, α is the probability that the test will reject the null hypothesis, H 0, when H 0 is in fact true. Power and Type II Error –The power of a fixed level test against a particular alternative is 1 minus the probability of a Type II Error for that alternative (1 - β).
Power Power: The probability that a fixed level α significance test will reject H 0 when a particular alternative value of the parameter is true is called the power of the test to detect that alternative. In other words, power is the probability that the test will reject H 0 when the alternative is true (when the null really should be rejected.) Ways To Increase Power: –Increase α –Increase the sample size – this is what you will typically want to do –Decrease σ
Significance Level The observed significance level is the chance of getting a test statistic as extreme as, or more extreme than, the observed one. It is computed on the basis that the null hypothesis is right. The smaller it is, the stronger the evidence against the null.
The common practice of testing hypotheses The common practice of testing hypotheses mixes the reasoning of significance tests and decision rules as follows: –State H 0 and H a –Think of the problem as a decision problem, so that the probabilities of Type I and Type II errors are relevant. –Because of Step 1, Type I errors are more serious. So choose an α (significance level) and consider only tests with probability of Type I error no greater than α. –Among these tests, select one that makes the probability of a Type II error as small as possible (that is, power as large as possible.) If this probability is too large, you will have to take a larger sample to reduce the chance of an error.
The t-test The z-test is good when the sample size (number of draws) is large enough. For small samples, the t-test should be used, provided the histogram of the contents is not too different from the normal curve.
The t Distribution: t-Test Groups and Pairs Used often for experimental data t-test used when: Sample size is small (e.g.: < 30) Dependent variable measured at ratio level Random assignment to treatment/control groups Treatment has two levels only Sample statistic normally distributed
The t-test represents the ratio between the difference in means between two groups and the standard error of the difference. Thus: difference between the means standard error of the difference t =
Two t-Tests: Between- and Within-Subject Design Between-subjects: Used in an experimental design, with an experimental and a control group, where the groups have been independently established Within-subjects: In these designs the same person is subjected to different treatments and a comparison is made between the two treatments.
Example A weight is known to weigh 70 g. Five measurements were made: 78, 83, 68, 72, 88. The fluctuation is due to chance variation assuming that the calibrated instrument was used for measurement. Do the measurements fluctuate around 70, or do they indicate some bias in the measurement procedure?
Summary for the t-test Assumptions: (a) The data are likely drawn from a normally distributed population. (b) The SD of the population is unknown. (c) The number of observations is small. (d) The histogram of the contents does not look too different from the normal curve.
Two Sample Tests To compare two samples averages, the test statistic has the form except that now these terms apply to difference of sample averages. Need to find the SE for difference.
Summary for the t-test The t-statistic has the same form as the z-statistic: except that the SE is computed on the basis of SD of the data. The degree of freedom is ( sample size – 1 ).
The Chi Square Test Used for frequencies in discrete categories. (a) To check if the observed frequencies are like hypothesized frequencies. (b) To check if two variables are independent in the population.
Expected frequency for a cell = (row total X column total)/N The χ 2 statistic is: where the sum is taken over all the cell.
Is the die loaded? A gambler is accused of using a loaded die, but he pleads innocent. The results of the last 60 throws are summarised below: Value observed frequency
In our example, the χ 2 -statistic is Large χ 2 values observed frequency is far from expected frequency: evidence against the null hypothesis. Small χ2 values they are close to each other: evidence for the null hypothesis.
Whats the P-value, i.e., under the null hypothesis, whats the chance of getting a χ 2 -statistic that is as large as, or larger than the observed statistic? Find out df, Chi square value and find out p
Conclusion A test of significance answers a specific question: How easy is it to explain the difference between the data and what is expected on the null hypothesis, on the basis of chance variation alone? It does not measure the size of a difference or its importance, It will not identify the cause of the difference.
Choosing a stat test…… 1.Data type 2. Distribution of data 3. Analysis type (goal) 4. No. of groups 5. Design
Correlation How we know a variable is normally distributed Histogram Mean, Mode and Median if are almost equal Kelmogorov Smirnov test
References Armitadge, Edition Bishweswar Rao, Edition 4, Year 2007.