Statistical Inference An introduction. Big picture Use a random sample to learn something about a larger population.

1 Statistical Inference An introduction

2 Big picture Use a random sample to learn something about a larger population.

3 Inference for a Single Population Population (?????): unknown characteristics of the population, e.g. population mean (μ) or population proportion (p), i.e. parameters. Sample (data): the sample yields observed data on measured variables. Descriptive statistics (e.g. sample mean), plots and graphs: use descriptive methods to examine data and estimate parameters. Inferential statistics: confidence intervals, hypothesis tests, models.

4 Two main ways to learn about a population Confidence intervals Hypothesis testing

5 Confidence Intervals Allow us to use sample data to estimate a population value, like the true mean or the true proportion, i.e. estimate parameters. Example: How much do college students spend per week on alcoholic beverages?

6 Hypothesis Testing Allows us to use sample data to test a claim about a population, such as testing whether a population proportion or population mean equals some number. Example: Is the true average amount that students spend weekly on alcohol $20?

7 General Idea of Hypothesis Testing Make an initial assumption. Collect evidence (data). Based on the available evidence, decide whether or not the initial assumption is reasonable.

8 Hmm? Let’s illustrate this idea

9 Example: Grade inflation? (Has mean GPA increased since 1990?) Population of 5 million college students Is the average GPA 2.7 ? (Imagine that 2.7 was mean GPA for U.S. college students in 1990) Sample of 100 college students How likely is it that 100 students would have an average GPA as large as 2.9 if the population average was 2.7?

10 Making the Decision It is either likely or unlikely that we would collect the evidence we did given the initial assumption. (Note: “Likely” or “unlikely” is measured by calculating a probability!) If it is likely, then we “do not reject” our initial assumption. There is not enough evidence to do otherwise.

11 Making the Decision (cont’d) If it is unlikely, then: –either our initial assumption is correct and we experienced an unusual event –or our initial assumption is incorrect In statistics, if it is unlikely, we decide to “reject” our initial assumption.

12 Idea of Hypothesis Testing: Criminal Trial Analogy First, state 2 hypotheses, the null hypothesis (“H0”) and the alternative hypothesis (“HA”) – H0: Defendant is not guilty. – HA: Defendant is guilty.

13 An aside: Identification of hypotheses The null hypothesis always represents the status quo, i.e. the hypothesis that requires no change in current behavior. The alternative hypothesis is the conclusion that the researcher is trying to make.

14 Criminal Trial Analogy (continued) Then, collect evidence, such as finger prints, blood spots, hair samples, carpet fibers, shoe prints, ransom notes, handwriting samples, etc. In statistics, the data are the evidence.

15 Criminal Trial Analogy (continued) Then, make an initial assumption. – Defendant is innocent until proven guilty. In statistics, we always assume the null hypothesis is true.

16 Criminal Trial Analogy (continued) Then, make a decision based on the available evidence. –If there is sufficient evidence (“beyond a reasonable doubt”), reject the null hypothesis. (Behave as if defendant is guilty.) –If there is not enough evidence, do not reject the null hypothesis. (Behave as if defendant is not guilty.)

17 Important Point About Errors Neither decision entails proving the null hypothesis or the alternative hypothesis. We merely state there is enough evidence to behave one way or the other. This is also always true in statistics! No matter what decision we make, there is always a chance we made an error. Because there are two possible decisions there are two potential errors that can be made, Type I and II Errors.

18 Hypothesis Testing: Statistical Approach (6 basic steps) 1. Formulate Null (H0) and Alternative (HA) Hypotheses. 2. Determine test criteria: what test procedure will be used and what “evidence” is required to convince. 3. Collect data, summarize data and measure discrepancy from initial assumption. 4. Compute the likelihood that chance variation would produce the observed result, i.e. find the p-value. 5. Make decision and interpret your findings. 6. Quantify significant findings: power analysis, effect size (ES), and confidence intervals.

19 STEP 1: Formulate Null and Alternative Hypotheses (H0 and HA) Null Hypothesis (H0): Opposite of the alternative hypothesis. Statement of “nothing going on”. Statement that reflects “equality” or the “status quo”. IS ASSUMED TO BE THE TRUTH WHEN CONDUCTING THE TEST!

20 STEP 1: Formulate Null and Alternative Hypotheses (H0 and HA) Alternative Hypothesis (HA): Also called the “Research Hypothesis”. Reflects what is believed to be the case or what the researcher “hopes” to show. Statement of “inequality” or change from the “status quo”. States that a relationship or difference exists.

22 Example: Grade inflation? Has mean GPA increased since 1990? H0: μ = 2.70 versus HA: μ > 2.70. The alternative hypothesis reflects the research hypothesis that the mean GPA for college students is greater than it was in 1990.

23 STEP 1: More comments about the null hypothesis (H0) In comparative experiments, the null says the populations do not differ in terms of a given outcome, e.g. in a treatment vs. control experiment it would state there is no “treatment effect”. In studies where we are interested in determining if a “relationship exists”, the null says that there is no association or trend between the factors being studied, e.g. no difference in risk of cancer associated with a potential risk factor such as smoking.

24 STEP 1: More comments about the alternative hypothesis (HA) In comparative experiments, the alternative says the populations DO differ in terms of a given outcome, e.g. in a treatment vs. control experiment it would state there is a “treatment effect”. In studies where we are interested in determining if a “relationship exists”, the alternative states that there is an association or trend between the factors being studied, e.g. there is increased risk of cancer associated with smoking. This is a statement of association, NOT CAUSATION!

25 STEP 2: Determine test criteria a) Choose significance level (α) b) Determine test procedure that will be used

26 STEP 2 (a): Choose significance level Briefly, this “sets the bar” for what it means for our results to be viewed as “unlikely to be produced by chance variation alone”. The smaller we choose α, the more discrepant our observed results have to be from what we would expect to see if the null hypothesis were true, i.e. the weirder our results have to be to convince us to support the alternative.

27 STEP 2 (a): Choose significance level Typically α = .05 is used, which means that if fewer than 1 study in 20 would produce the observed result when the null (H0) is true, we would then be convinced that the assumed null is probably not the case and we would decide in favor of the alternative (HA).
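The “1 study in 20” reading of the significance level can be checked with a small simulation. This is only a sketch under stated assumptions: the null mean of 0, the known SD of 1, the sample size of 30 and the one-sided z-test are all hypothetical choices made for illustration, not part of the slides.

```python
import math
import random
from statistics import NormalDist, fmean

def false_positive_rate(alpha=0.05, n=30, studies=4000, seed=1):
    """Share of simulated studies that reject H0 even though H0 is true."""
    rng = random.Random(seed)
    mu0, sigma = 0.0, 1.0                        # hypothetical null mean and known SD
    critical = NormalDist().inv_cdf(1 - alpha)   # one-sided cutoff for the z statistic
    rejections = 0
    for _ in range(studies):
        # Every study samples from a population where the null really holds.
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (fmean(sample) - mu0) / (sigma / math.sqrt(n))
        if z > critical:
            rejections += 1
    return rejections / studies

rate = false_positive_rate()
print(f"share of studies rejecting a true null: {rate:.3f}")
```

The printed share should land close to the chosen significance level, i.e. roughly 1 study in 20.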

28 STEP 2(b): Determine test procedure What test procedure we use is dictated by: the nature of the question to be answered the data types of the variable(s) involved, i.e. nominal, ordinal, discrete or continuous. distributional properties of the variables involved, e.g. normality for measured numeric quantities.

29 STEP 2(b): Determine test procedure Common test procedures you might be familiar with: t-Test (both one- and two-sample), Chi-square test of independence, Paired t-Test, ANOVA (F-tests), Wilcoxon or Mann-Whitney rank-sum test, and many, many more!

30 STEP 2(b): Determine test procedure This table is probably pretty scary at this point, but I think we will cover most of the tests in this table in this course.

31 STEP 2(b): Determine test procedure Common test procedures you might be familiar with: t-Test (both one- and two-sample), Chi-square test of independence, Paired t-Test, ANOVA (F-tests), Wilcoxon or Mann-Whitney rank-sum test, and many, many more! What is common to all these and other test procedures is that they all in some way quantify, or measure numerically, the discrepancy between what is observed and what we would expect to see if the null hypothesis were true.

32 STEP 2(b): Determine test procedure Many of the tests you learn about in an introductory statistics course use a test statistic of the form: test statistic = (estimate − hypothesized value) / SE(estimate). Such statistics measure how many SE’s your observed estimate is away from what you expected to see, i.e. “think z-score”. Test statistics outside -2 to 2 are “strange”.
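A minimal sketch of the “think z-score” idea follows. The helper name `standardize` is hypothetical, and the check of the -2 to 2 rule of thumb assumes the test statistic is approximately standard normal when the null is true.

```python
from statistics import NormalDist

def standardize(estimate, hypothesized, standard_error):
    """Number of standard errors between the estimate and the hypothesized value."""
    return (estimate - hypothesized) / standard_error

# Chance that a standard normal statistic falls outside -2 to 2 under the null.
std_normal = NormalDist()
prob_outside_2 = 2 * (1 - std_normal.cdf(2.0))

print(f"example statistic: {standardize(10.0, 8.0, 1.0):.2f}")
print(f"P(|Z| > 2) = {prob_outside_2:.4f}")
```

Under that normality assumption, values outside -2 to 2 turn up in under 5% of samples, which is why they count as “strange”.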

33 Example: Grade inflation? (Has mean GPA increased since 1990?) Population of 5 million college students Is the average GPA 2.7 ? (Imagine that 2.7 was mean GPA for U.S. college students in 1990) Sample of 100 college students Mean=2.9 SD =.6 How likely is it that 100 students would have an average GPA as large as 2.9 if the population average was 2.7?
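The question on this slide can be explored by brute force before any formula is used. The sketch below assumes, purely for illustration, that individual GPAs are normally distributed with mean 2.7 and SD 0.6 (a normal model ignores the bounded GPA scale), and counts how often a sample of 100 has a mean of 2.9 or more.

```python
import random
from statistics import fmean

def share_of_large_means(mu0=2.7, sd=0.6, n=100, cutoff=2.9, sims=5000, seed=42):
    """Share of simulated samples whose mean reaches the cutoff when H0 is true."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(sims):
        # Draw one sample of n GPAs from the hypothetical null population.
        sample_mean = fmean(rng.gauss(mu0, sd) for _ in range(n))
        if sample_mean >= cutoff:
            hits += 1
    return hits / sims

share = share_of_large_means()
print(f"share of samples with mean >= 2.9: {share:.4f}")
```

Only a tiny share of simulated samples should reach 2.9, which is the intuition the p-value later makes precise.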

34 Example: Grade Inflation?

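35 Example: Grade Inflation (cont’d) Test Statistic for a Single Population Mean (μ): t = (x̄ − μ0) / (s/√n), where x̄ is the sample mean, s is the sample standard deviation, n is the sample size and μ0 is the hypothesized mean. We can then use a standard normal or t-distribution to find the probability we would get a discrepancy as large or larger than our observed test statistic by chance variation alone.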

36 STEP 3: Collect data and compute test statistic Summarize all variables involved both numerically and graphically Look at distributional shape, as many test procedures require approx. normality for continuous variables. Check for outliers. Compute the test statistic from our observed data.

38 Example: Grade inflation? (Has mean GPA increased since 1990?) [Plot of the sample GPA distribution] How likely are we to obtain a sample mean this large when sampling from a population whose mean is μ = 2.70? The GPA distribution is only slightly left-skewed and looks reasonably normal.

39 Example: Grade Inflation (cont’d) Test Statistic for a Single Population Mean (μ): t = 3.44. Our observed mean is 3.44 SE’s higher than what we would expect if the mean GPA of U.S. college students was still 2.70, as it had been in 1990. This is a very extreme result!!! If the null were true, we would expect such a value to lie in the range from -3.00 to 3.00 about 99.73% of the time, and we are beyond that!
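The arithmetic behind the test statistic can be sketched from the summary values quoted on the slides. Note the assumption: the inputs below are the rounded figures (mean 2.91, SD 0.6, n = 100), so the result comes out near 3.5 rather than the 3.44 reported, presumably because the slide used unrounded data. The 99.73% figure is checked against a standard normal distribution.

```python
import math
from statistics import NormalDist

n = 100
sample_mean = 2.91   # rounded value from the slides (assumption)
sample_sd = 0.6      # rounded value from the slides (assumption)
mu0 = 2.70           # hypothesized mean under H0

standard_error = sample_sd / math.sqrt(n)
t_stat = (sample_mean - mu0) / standard_error

# Share of a standard normal distribution lying between -3 and 3.
within_3 = NormalDist().cdf(3.0) - NormalDist().cdf(-3.0)

print(f"SE = {standard_error:.3f}, t = {t_stat:.2f}")
print(f"P(-3 < Z < 3) = {within_3:.4f}")
```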

40 STEP 4: Find p-value The p-value is the probability that chance variation alone would produce a test statistic value as extreme or more extreme than the one observed, assuming the null hypothesis is true. The underlined part of the definition (“assuming the null hypothesis is true”) means that the probability is calculated assuming the null is true!

41 Example: Grade Inflation (cont’d) p-value calculation and interpretation P(t > 3.44) = .0003. Therefore the probability that chance variation alone would produce a sample mean of 2.9 or higher when sampling from a population whose mean is actually 2.7 is .0003, or 3 out of 10,000! It is highly unlikely that chance variation would produce this result!
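The upper-tail probability can be approximated with the standard normal distribution from Python's standard library. This is a sketch using the normal approximation; a t-distribution with 99 degrees of freedom would give a slightly larger value, and the standard library has no t-distribution to check that with.

```python
from statistics import NormalDist

t_observed = 3.44  # test statistic reported on the slide

# One-sided p-value: chance of a statistic at least this large if H0 is true.
p_value = 1 - NormalDist().cdf(t_observed)

print(f"approximate p-value: {p_value:.4f}")
```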

42 Example: Grade Inflation (cont’d) p-value calculation and interpretation Working under the assumption that the mean GPA of U.S. college students is still μ = 2.70, we have determined that only 3 samples in 10,000 would be expected to produce a sample mean of 2.91 or higher. Thus our observed data (evidence) is very inconsistent with the assumption that the mean GPA today is μ = 2.70, as it was in 1990.

44 Example: Grade Inflation (cont’d) p-value calculation and interpretation We have two ways to think about this outcome: a) The mean GPA is still μ = 2.70 and by dumb luck we have obtained a very rare sample, one which will be produced by chance variation in 3 out of 10,000 random samples. b) Our assumption that the mean GPA is still 2.70 is wrong, and it is actually the case that the mean GPA of U.S. college students has increased from where it was in 1990, which makes the observed result more plausible. Given these two options, (b) is clearly the way to go. Why on earth would we think that we obtained this incredibly rare sample? The observed result is much more plausible when sampling from a population whose mean is greater than 2.70.

45 STEP 5: Make decision and interpret Decision rule based on p-values: If p-value < α we Reject H0 in favor of the alternative. If p-value > α we Fail to Reject H0 and conclude there is insufficient evidence to support the alternative. Note: This is true for ALL statistical tests!
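The decision rule is simple enough to write down directly. The function name `decide` and the returned strings are hypothetical; the default of .05 is the typical significance level mentioned earlier in the slides.

```python
def decide(p_value, alpha=0.05):
    """Apply the p-value decision rule at significance level alpha."""
    if p_value < alpha:
        return "reject H0"
    return "fail to reject H0"

print(decide(0.0003))  # the grade inflation example
print(decide(0.20))    # a result easily produced by chance variation
```

Note that “fail to reject H0” is not the same as proving H0; it only says the evidence is insufficient to support the alternative.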

46 STEP 5: Make decision and interpret Decision: Because our p-value = .0003 < .05 we reject the null hypothesis in favor of the alternative. Interpretation: We conclude that the mean GPA of U.S. college students today is greater than 2.70, which is what it was back in 1990.

47 STEP 5: Make decision and interpret What we have decided is that we have statistically significant evidence (p = .0003) to conclude the mean GPA has increased from what it was in 1990. Follow-up Questions: 1. How much larger is it? 2. Is it a large enough increase that we actually care? 3. Does this result tell us that grades in U.S. colleges and universities are being inflated by professors? 4. Was the sampling method biased in some way? 5. Others?

48 STEP 6: Quantifying significant findings Calculate Effect Size (ES) Construct Confidence Interval (see CI ppt) Calculate Power (see Type I & II Error ppt)

49 Effect Size (ES) - (denoted d) For a single population mean situation the effect size is simply the magnitude of the difference between the observed mean and the hypothesized mean divided by the standard deviation (not the SE), i.e. d = |x̄ − μ0| / s. We will examine other ES estimates for other situations later in the course.

50 Effect Size (ES) Guidelines for ES Interpretation: d = .20 small effect size, d = .40 medium effect size, d = .80 large effect size. Our estimated effect size is d = .344 here, so it is certainly not a large effect.
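A short sketch of the effect size calculation and the guideline lookup follows. It uses the rounded summary values from the slides (mean 2.91, SD 0.6), so it gives about .35 rather than .344; the function names and the wording of the labels are hypothetical, while the cutoffs are the ones listed on this slide.

```python
def effect_size(sample_mean, hypothesized_mean, sd):
    """Difference between observed and hypothesized mean, in SD units (not SE units)."""
    return abs(sample_mean - hypothesized_mean) / sd

def describe(d):
    """Place d relative to the guideline values on the slide (.20, .40, .80)."""
    if d < 0.20:
        return "below small"
    if d < 0.40:
        return "small to medium"
    if d < 0.80:
        return "medium to large"
    return "large"

d = effect_size(2.91, 2.70, 0.6)
print(f"d = {d:.2f} ({describe(d)})")
```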

51 Effect Size (ES) Guidelines for ES Interpretation – another way to interpret the effect size is in terms of the percent overlap or non-overlap of the distribution under the null (GPA’s in 1990) and the distribution under the alternative (GPA’s now with an estimated mean of 2.91). Overlap is highlighted in yellow, non-overlap is white.

52 Effect Size (ES) A d =.34 gives percent non-overlap around 24% for the two GPA distributions.
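The 24% figure can be reproduced approximately if we assume the slide refers to Cohen's U1 non-overlap measure for two normal distributions with equal SDs whose means differ by d standard deviations. That reading is an assumption; the slide does not name the measure it uses.

```python
from statistics import NormalDist

def percent_non_overlap(d):
    """Cohen's U1: share of the combined area of two equal-SD normals not shared."""
    u2 = NormalDist().cdf(d / 2)   # share of one group exceeded by the other's upper half
    return (2 * u2 - 1) / u2

u1 = percent_non_overlap(0.34)
print(f"non-overlap for d = .34: {u1:.1%}")
```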

53 Confidence Interval for μ The basic form for most confidence intervals is: estimate ± (table value) × SE(estimate). Estimate: the estimate of the parameter from our sample, e.g. the sample mean x̄ when estimating μ. Table value: comes either from a standard normal or t-distribution table; for 95% confidence this value is around 2.00. SE(estimate): the estimated standard error of our estimate, which for the mean is given by s/√n.

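54 Confidence Interval (CI) for μ The basic form for most confidence intervals is: estimate ± (table value) × SE(estimate). For a 95% CI for μ we have: 2.91 ± 2.00 × (.60/√100) = 2.91 ± .12, i.e. (2.79, 3.03).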
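The interval can be reproduced in a few lines. This sketch assumes the rounded summary values from the slides (mean 2.91, SD 0.6, n = 100) and the approximate table value of 2.00; the function name is hypothetical.

```python
import math

def confidence_interval(estimate, sd, n, table_value=2.00):
    """Estimate plus or minus table value times the standard error of the mean."""
    margin = table_value * sd / math.sqrt(n)
    return estimate - margin, estimate + margin

lcl, ucl = confidence_interval(2.91, 0.6, 100)
print(f"95% CI for the mean GPA: ({lcl:.2f}, {ucl:.2f})")
```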

55 Confidence Interval (CI) for μ 95% CI for μ is given by (2.79, 3.03). This range of GPA values has a 95% chance of covering the true mean GPA of U.S. college students at the present time. *** The lower confidence limit (LCL) is 2.79, which is only a .09 grade point increase from the mean GPA in 1990, or a 3.33% increase. The upper confidence limit (UCL) is 3.03, which is a fairly sizeable increase: a .33 grade point increase in absolute terms, or 12.2% in percentage terms. *** This is technically not correct but it sounds good. The correct interpretation is that the process of taking a sample and constructing a 95% CI for μ based on the sample will produce intervals that cover the true mean 95% of the time. This particular interval either covers the true mean or it doesn’t; we have no idea which is the case. We are 95% confident that the process of sampling and constructing confidence intervals will produce intervals that cover the true value of the parameter a large percentage of the time (i.e. 95%).
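The “process” interpretation can be illustrated by repeating the sampling many times and counting how often the interval covers the truth. Everything about the simulated population is hypothetical (normal GPAs with a true mean of 2.9 and SD of 0.6); the interval uses the table value of 2.00 from the slides.

```python
import math
import random

def coverage(true_mean=2.9, sd=0.6, n=100, table_value=2.00, reps=2000, seed=7):
    """Share of simulated 95% intervals that contain the true mean."""
    rng = random.Random(seed)
    covered = 0
    for _ in range(reps):
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        mean = sum(sample) / n
        # Sample SD with the usual n - 1 divisor.
        s = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
        margin = table_value * s / math.sqrt(n)
        if mean - margin <= true_mean <= mean + margin:
            covered += 1
    return covered / reps

share_covering = coverage()
print(f"share of intervals covering the true mean: {share_covering:.3f}")
```

Each individual interval either covers 2.9 or it does not; the roughly 95% figure describes the long-run behaviour of the procedure.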

56 Confidence Interval for μ and Effect Size 95% CI for μ is (2.79, 3.03), which in terms of an increase from 2.70 is (.09, .33). If we divide the LCL and UCL of the latter CI by the standard deviation, we have the effect sizes (d). Thus the effect size is somewhere between small and medium in scale… doh! That sure tells me a lot!
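Converting the interval for the increase into effect-size units is a one-line calculation. The sketch assumes the divisor is the sample standard deviation of about 0.6 quoted earlier in the slides.

```python
sd = 0.6                      # sample standard deviation from the slides (rounded)
increase_ci = (0.09, 0.33)    # CI for the increase over the 1990 mean of 2.70

# Divide each limit by the SD to express the increase in effect-size units.
effect_size_ci = tuple(limit / sd for limit in increase_ci)

print(f"effect size between {effect_size_ci[0]:.2f} and {effect_size_ci[1]:.2f}")
```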

