Dr. G. Johnson, www.researchdemystified.org
Inferential Statistics
Research Methods for Public Administrators
Dr. Gail Johnson
Welcome to Inferential Statistics
This is a companion to Sampling Demystified. It could be argued that this chapter should directly follow that one, because if the results are not statistically significant, no further analysis is warranted. But some people find inferential statistics overwhelming, so I saved it for last. There is much that can be done with descriptive data analysis, but it often gets overshadowed by the fancier statistics of regression and inference.
Welcome to Inferential Statistics
Inferential statistics are used when working with data from random samples, when researchers want to draw conclusions about a population based on results from a sample randomly selected from that population. Hence the term “inferential.” The jargon term for this is generalizability.
Inferential Statistics: A Powerful Analytical Tool
Inferential statistics enable researchers to estimate population proportions, estimate population means, estimate sampling error, estimate confidence intervals, and test for statistical significance.
Confidence Revisited
We estimate the population mean or proportion based on the sample survey. Confidence level: the social science standard is 95%, meaning we are 95% certain that the population value falls within a specified range around our estimate. (Strictly speaking, if we repeated the sampling many times, about 95% of the ranges calculated this way would contain the true value.) The width of that range is the precision of the estimate. A 90% confidence level is the lowest level that should be used. In some cases, researchers might want to raise the bar to 99%, to be very, very certain.
Confidence Revisited
Confidence interval: the range within which the true population mean is likely to fall, at the chosen confidence level. The social science standard for the confidence interval is plus or minus 5%. Sampling error is the analogous term when working with proportions, as with survey data. It is sometimes called the margin of error.
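As a sketch of how a confidence interval for a mean is built, the Python below computes mean ± z × (standard deviation / √n). It assumes a large-sample normal approximation (z ≈ 1.96 at 95%); for a small sample like this hypothetical ten-value example, a t critical value (about 2.26 for 9 degrees of freedom at 95%) would be more appropriate and gives a slightly wider interval. The function name and the hours data are illustrative only.

```python
import math
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(sample, confidence=0.95):
    """Confidence interval for a population mean using the normal approximation."""
    n = len(sample)
    center = mean(sample)
    standard_error = stdev(sample) / math.sqrt(n)
    # Two-sided critical value, e.g. about 1.96 for 95% confidence
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return center - z * standard_error, center + z * standard_error

# Hypothetical weekly hours reported by ten sampled employees (illustration only)
hours = [38, 40, 42, 45, 41, 39, 44, 43, 40, 46]
for level in (0.90, 0.95, 0.99):
    low, high = mean_confidence_interval(hours, level)
    print(f"{level:.0%} confidence interval: {low:.1f} to {high:.1f} hours")
```

Raising the confidence level widens the interval: more certainty is bought with less precision.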
Sampling Error: Revisited
Sampling error is most familiar from polling data. Big national surveys typically use a 95 percent confidence level with a margin of error of +/- 3%. That means the researchers are 95% certain that, had they surveyed everyone, the results would be within +/- 3% of the survey results.
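The +/- 3% figure follows from the usual margin-of-error formula for a proportion, z × √(p(1 − p)/n). A minimal Python sketch, assuming simple random sampling and the worst case p = 0.5; the sample sizes are hypothetical, since the slide does not report one, and the function name is illustrative:

```python
import math
from statistics import NormalDist

def margin_of_error(p, n, confidence=0.95):
    """Margin of error for a sample proportion p from a simple random sample of size n."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 at 95%
    return z * math.sqrt(p * (1 - p) / n)

for n in (400, 1000, 4000):  # hypothetical sample sizes
    print(f"n = {n}: +/- {margin_of_error(0.5, n):.1%}")
# n = 1000 gives about +/- 3.1%
```

Because n sits under a square root, quadrupling the sample size only halves the margin of error.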
Sampling Error: Revisited
11/15/09 Poll: Views on Cap and Trade.
There's a proposed system called "cap and trade." The government would issue permits limiting the amount of greenhouse gases companies can put out. Companies that did not use all their permits could sell them to other companies. The idea is that many companies would find ways to put out less greenhouse gases, because that would be cheaper than buying permits. Would you support or oppose this system?
Results: Support 53%, Oppose 42%.
Sampling Error: Revisited
The sampling error is plus or minus 3 percent. If they had surveyed everyone, the researchers are 95% certain that:
The real percentage supporting cap and trade would be between 50% and 56%.
The real percentage opposing cap and trade would be between 39% and 45%.
Sampling Error: Revisited
Sampling error provides a likely range for the true proportion in the population. If the ranges defined by the sampling errors overlap, there is no discernible difference in the views: the result is “too close to call.”
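The overlap check can be written out directly. A small Python sketch using the cap-and-trade figures (53% support, 42% oppose, +/- 3 points); the second comparison uses hypothetical numbers to show a “too close to call” case, and the function names are illustrative. Treat overlap as a rough rule of thumb: a formal test of the difference between two proportions is more precise, and ranges that overlap slightly can still correspond to a statistically significant difference.

```python
def sampling_error_range(estimate, margin):
    """Likely range (low, high) for the true population percentage."""
    return (estimate - margin, estimate + margin)

def ranges_overlap(a, b):
    """True if the two (low, high) ranges share any values."""
    return a[0] <= b[1] and b[0] <= a[1]

support = sampling_error_range(53, 3)  # (50, 56)
oppose = sampling_error_range(42, 3)   # (39, 45)
print(support, oppose, ranges_overlap(support, oppose))  # (50, 56) (39, 45) False

# Hypothetical close race: 48% vs 46% with the same +/- 3 point margin
print(ranges_overlap(sampling_error_range(48, 3), sampling_error_range(46, 3)))  # True
```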
Statistical Significance
When working with random sample data, the big question is: how likely is it that these results are a fairly accurate reflection of the larger population from which the sample was taken? Put another way: are these results just a quirk of chance?
Statistical Significance
Statisticians have provided researchers with analytical techniques to estimate how likely it is that the results seen in an analysis of sample data arose by chance. These techniques are called tests of statistical significance.
Statistical Significance
We do not need to understand calculus in order to interpret tests of statistical significance. We just have to have faith that the statisticians have figured out the correct theories and that computers have been programmed to give correct results. I believe, I believe!
Statistical Significance
The logic will seem familiar. Researchers set a standard for how much risk they are willing to take that the observed results are due to random chance. The social science convention is an alpha level of .05. They then run the statistical significance test, which produces a p value. If the p value is .05 or less, the researchers conclude that there is little probability (5 percent or less) that the results are due to chance.
Another Way to Understand Statistical Significance
If there were really no difference in the population and I took 100 random samples from it, only 5 or fewer out of 100 would produce results like the ones I have gotten. It is unlikely, therefore, that I would have gotten such unusual results by chance alone. I am willing to take the risk that my sample results fairly accurately capture what is true in the larger population from which the sample was selected.
How Much Risk? It All Depends!
The standard is .05 or less, meaning a 5% risk that the results are due to chance, the flip side of a 95% chance of being reasonably accurate (i.e., within sampling error). I could raise the bar and set the standard at .01 or less, meaning a 99% chance of being accurate. I could lower the bar and set the standard at .10, meaning a 90% chance of being accurate.
Statistical Significance: The Logic of Hypothesis Testing
Research hypothesis: women and men earn different salaries.
Null hypothesis: there is no difference between women’s and men’s salaries.
Remember: the null hypothesis is always one of “no difference.”
Steps in the Process
Collect salary data from a random sample of men and women across the U.S. Analyze the data: suppose there is a $5,000 difference. Because I am working with random sample data, I have to determine whether this $5,000 difference is the result of chance. In the jargon: is this difference statistically significant?
Testing for Statistical Significance
Testing against the null hypothesis: what is the probability of getting a $5,000 difference in my sample results if there really is no difference in the population from which the sample was drawn? I set the alpha level at .05. I run the test for statistical significance, which produces a p value.
Testing for Statistical Significance
If the p value is .05 or less, I reject the null hypothesis. This means that the probability of getting a $5,000 difference when there really is no difference in the population is 5% or less. I am willing to take that risk, and therefore I reject the null hypothesis. I conclude that there is a $5,000 difference in salaries between men and women, and that the difference is statistically significant.
Testing for Statistical Significance
If the p value is more than .05, there is too great a chance that the results do not reflect the population. This $5,000 difference might be due to random chance. I would conclude that this salary difference is not statistically significant.
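The question on these slides, how likely a $5,000 gap would be if there were no real difference, can also be answered by simulation. The Python sketch below uses a permutation test, a swapped-in technique rather than the specific test the slides have in mind: if the null hypothesis is true, the “men” and “women” labels are interchangeable, so we shuffle them many times and count how often a gap at least as large as the observed one appears. The salary figures are hypothetical, chosen only so the group means differ by $5,000, and the function names are illustrative.

```python
import random

def mean(values):
    return sum(values) / len(values)

def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=42):
    """Two-sided permutation test for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign labels at random, as the null hypothesis allows
        difference = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if difference >= observed:
            at_least_as_extreme += 1
    # Count the observed arrangement too, so the estimate is never exactly zero
    return (at_least_as_extreme + 1) / (n_permutations + 1)

# Hypothetical salaries (illustration only); the means are $38,000 and $33,000
men = [41_000, 36_000, 39_000, 35_000, 44_000, 37_000, 38_000, 34_000]
women = [33_000, 30_000, 36_000, 31_000, 35_000, 29_000, 37_000, 33_000]
p = permutation_p_value(men, women)
alpha = 0.05
print(f"difference = ${mean(men) - mean(women):,.0f}, p = {p:.3f}")
print("reject the null hypothesis" if p <= alpha else "fail to reject the null hypothesis")
```

The p value here is literally the share of label shuffles that produced a gap at least as large as the one observed, which is the probability the slides describe.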
Remember
A statistical significance test is nothing more than a determination of the probability of getting results like the ones the researchers got by chance alone, assuming the null hypothesis is true.
Common Tests for Statistical Significance
Chi square: nominal and ordinal data.
T-tests: dependent variable (DV) interval/ratio; independent variable (IV) nominal/ordinal with 2 categories.
ANOVA: DV interval/ratio; IV nominal/ordinal with 3+ categories.
F-tests: interval/ratio data.
Statistical Significance
There are 100+ kinds of tests for statistical significance. Good news! They all get interpreted the same way. If researchers set the alpha level at .05, then any p value of .05 or less is statistically significant, and any p value of more than .05 is not statistically significant.
Test for Statistical Significance: Chi Square
Used with crosstabs. Chi square is based on a mathematical formula that compares the actual data with how the data would be expected to look if there were no difference. The bigger the difference, the more likely that the results will be statistically significant.
Chi Square
If there were no difference in attitudes based on gender (our null hypothesis), we would expect the crosstab to look similar to this:

              For  Against
    Men        50       50
    Women      50       50

Chi Square
But what if our respondents actually reported this way:

              For  Against
    Men        75       25
    Women      25       75

Clearly, there is a difference in attitudes based on gender.
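To make the chi square logic concrete, here is a Python sketch of the Pearson chi-square statistic for a 2x2 crosstab, applied to the 75/25 table above. Expected counts come from the row and column totals (which reproduces the 50/50 table), and for a 2x2 table with 1 degree of freedom the p value can be computed with the standard-library identity P(χ² ≥ x) = erfc(√(x/2)). The function name is illustrative, and statistical packages may also apply a continuity correction, which this sketch omits.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square statistic and p value for a 2x2 table (no continuity correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total  # count if no difference
            chi2 += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, P(chi-square >= x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

observed = [[75, 25],   # men: for, against
            [25, 75]]   # women: for, against
chi2, p = chi_square_2x2(observed)
print(f"chi-square = {chi2:.1f}, p = {p:.2g}")  # chi-square = 50.0, p far below .05
```

Each cell contributes (25)²/50 = 12.5, so the statistic is 50; the 50/50 table would give 0 and a p value of 1.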
Example: Gender and Gun Law
Are views on gun permit laws different based on gender? Results: it appears that women are somewhat more likely (89%) than men (77%) to favor a gun permit law. But are these results statistically significant? The computer calculates a p value of .001.
Conclusion?
Example: Gender and Abortion Attitudes
Are views on abortion for any reason different based on gender? 48 percent of men favor abortion for any reason, compared with 49 percent of women. But are these results statistically significant? The computer calculates a p value of .78.
Conclusion?
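Because every test is interpreted the same way, the decision rule fits in a few lines. A minimal Python sketch (the function name is illustrative) applied to the two p values reported in these examples, assuming the conventional alpha of .05:

```python
def decide(p_value, alpha=0.05):
    """Conventional decision rule: p <= alpha means statistically significant."""
    if p_value <= alpha:
        return "reject the null hypothesis (statistically significant)"
    return "fail to reject the null hypothesis (not statistically significant)"

reported = {
    "gender and gun permit law": 0.001,
    "gender and abortion attitudes": 0.78,
}
for question, p in reported.items():
    print(f"{question}: p = {p} -> {decide(p)}")
```

Changing `alpha` to .01 or .10 moves the bar exactly as described earlier; the rule itself stays the same.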
Statistical Significance: T-Tests
Used with means, to compare means.
Single mean: interval/ratio data compared to a known population mean.
Paired means: before-and-after designs.
Independent means: comparing the means of 2 independent groups.
For t-tests, the dependent variable must be interval or ratio level data.
Testing a Hypothesis about a Single Mean
Research hypothesis: there is a difference in average hours worked as compared to “40.”
Null hypothesis: the average is not different from 40.
Results: average number of hours = 42; t-test p value = .000.
Interpretation?
Interpretation Process
In this case, you are comparing the actual result against the assumption that the norm is 40 hours. How likely is it to get an average of 42 hours if the real average in the population is 40? The p value is less than .05: it is very unlikely you would have gotten these results by chance alone, so you reject the null hypothesis. Conclusion: the average number of hours worked is 42, and these results are statistically significant.
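The slide reports only the mean (42) and the p value, so the standard deviation and sample size below are hypothetical. This Python sketch computes the one-sample t statistic, t = (mean − 40) / (s / √n), and, assuming a large sample, approximates the two-sided p value with the normal distribution (exact t-distribution p values need a library such as SciPy). It also shows why software can print “.000”: the p value is rounded to three decimals, so it means p < .0005, not zero. The function name is illustrative.

```python
import math
from statistics import NormalDist

def one_sample_t(sample_mean, sample_sd, n, null_mean):
    """One-sample t statistic with a large-sample (normal) two-sided p value."""
    standard_error = sample_sd / math.sqrt(n)
    t = (sample_mean - null_mean) / standard_error
    p_two_sided = 2 * NormalDist().cdf(-abs(t))  # normal approximation, fine for large n
    return t, p_two_sided

# sample_sd and n are hypothetical; the slide gives only the mean and the p value
t, p = one_sample_t(sample_mean=42, sample_sd=10, n=400, null_mean=40)
print(f"t = {t:.2f}, p = {p:.3f}")  # the rounded p prints as 0.000
```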
Independent T-Test: Gender and Income
Is there a difference in men’s and women’s income? The research hypothesis is that there is a difference in salaries. The null hypothesis is that there is no difference. Technically: gender and income are independent, meaning there is no difference in the population means for these 2 groups.
Independent T-Test: Gender and Income
We collect the data, compare means, and run an independent t-test. Note: this test can only be used with a nominal independent variable with two values, like gender, and an interval/ratio level dependent variable.
Results: mean for men: $38,000; mean for women: $33,000; t-test p value = .001.
Interpretation?
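For two independent groups, the t statistic divides the difference in means by the standard error of that difference. The Python sketch below uses Welch's version, which does not assume equal variances (the default in some packages, such as R's `t.test`, though not necessarily the test behind the slide's result). Only the two means come from the slide; the standard deviations and group sizes are hypothetical, the function name is illustrative, and the p value uses a large-sample normal approximation.

```python
import math
from statistics import NormalDist

def welch_t(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """Welch's t statistic, approximate degrees of freedom, and a normal-approximation p value."""
    var_1 = sd_1 ** 2 / n_1  # squared standard error of each group mean
    var_2 = sd_2 ** 2 / n_2
    t = (mean_1 - mean_2) / math.sqrt(var_1 + var_2)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (var_1 + var_2) ** 2 / (var_1 ** 2 / (n_1 - 1) + var_2 ** 2 / (n_2 - 1))
    p_two_sided = 2 * NormalDist().cdf(-abs(t))  # reasonable when df is large
    return t, df, p_two_sided

# Means from the slide; standard deviations (12,000) and group sizes (200) are hypothetical
t, df, p = welch_t(38_000, 12_000, 200, 33_000, 12_000, 200)
print(f"t = {t:.2f}, df = {df:.0f}, p = {p:.5f}")
```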
F-Tests with Analysis of Variance
Used when researchers have an independent variable with more than 2 categories. Examples:
Religion (Christian, Jewish, Muslim, Buddhist, None)
Marital status (single, married, divorced)
Education (HS, College, Graduate Degree)
Example: Working the Statistical Significance Logic
Is there a difference in income based on whether one has a high school degree or less, some college or a completed bachelor’s degree, or a graduate degree?
Your research hypothesis is?
Your null hypothesis is?
Results: Education and Income
HS or less: $29,225
College: $46,764
Graduate: $62,275
But are these results statistically significant? F-test p value = .001.
Your conclusion?
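Note that “F-test = .001” on this slide is the p value of the F test, not the F statistic itself. As a sketch of where F comes from, the Python below computes the one-way ANOVA F ratio: the between-group mean square divided by the within-group mean square. The income figures are hypothetical, in thousands of dollars, and the function name is illustrative; turning F into a p value requires the F distribution, for example `scipy.stats.f_oneway`, which returns both the statistic and the p value.

```python
from statistics import mean

def one_way_anova_f(*groups):
    """F = mean square between groups / mean square within groups."""
    all_values = [x for group in groups for x in group]
    grand_mean = mean(all_values)
    group_means = [mean(group) for group in groups]
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    ms_between = ss_between / (k - 1)  # df between = k - 1
    ms_within = ss_within / (n - k)    # df within = n - k
    return ms_between / ms_within

# Hypothetical annual incomes in thousands of dollars (illustration only)
hs_or_less = [25, 31, 28, 33, 29]
college = [44, 49, 46, 42, 50]
graduate = [60, 66, 58, 64, 63]
print(f"F = {one_way_anova_f(hs_or_less, college, graduate):.1f}")
```

A large F means the group averages differ by much more than the spread within each group would lead us to expect.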
But There Is Potential for Error
Type I and Type II errors.
Type I error: the null hypothesis is rejected even though it is actually true. “There really is no difference in salaries in the population, but we concluded that there was a statistically significant difference.” In very large samples, even small differences will be found to be statistically significant.
But There Is Potential for Error: At Least a 5% Chance
Type II error: researchers fail to reject the null hypothesis even though it is false. “There really is a difference in salaries in the population, but we concluded there was no statistically significant difference in salaries between men and women.”
No Way to Avoid Error When Working with Random Sample Data
To avoid a Type I error, researchers may want to make it harder to reject the null hypothesis. So they raise the bar and set the alpha level at .01 rather than .05. But by doing so, they increase the likelihood of making a Type II error.
No Way to Avoid Error When Working with Random Sample Data
To avoid a Type II error, researchers may want to make it easier to reject the null hypothesis. So they lower the bar and set the alpha level at .10 rather than .05. But by making it easier to reject the null hypothesis, they increase the likelihood of making a Type I error. Alternatively, they can increase the sample size, which reduces the risk of a Type II error without changing the alpha level.
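The trade-off between the two errors can be seen by simulation. The Python sketch below repeatedly draws two hypothetical groups from normal populations, runs a two-sample z-test (a simplification that assumes the population standard deviation, 1, is known), and records how often the null hypothesis is rejected. When there is no true difference, every rejection is a Type I error; when there is a true difference (an assumed 0.5 standard deviations), every failure to reject is a Type II error. The function name, group size, and effect size are illustrative assumptions, and exact rates vary with the random seed.

```python
import math
import random
from statistics import NormalDist

def rejection_rate(true_difference, alpha, n=30, trials=2000, seed=7):
    """Share of simulated studies that reject H0 with a two-sided two-sample z-test."""
    rng = random.Random(seed)
    z_critical = NormalDist().inv_cdf(1 - alpha / 2)
    standard_error = math.sqrt(2 / n)  # both populations have standard deviation 1
    rejections = 0
    for _ in range(trials):
        group_a = [rng.gauss(0, 1) for _ in range(n)]
        group_b = [rng.gauss(true_difference, 1) for _ in range(n)]
        z = (sum(group_b) / n - sum(group_a) / n) / standard_error
        if abs(z) >= z_critical:
            rejections += 1
    return rejections / trials

for alpha in (0.10, 0.05, 0.01):
    type_i = rejection_rate(0.0, alpha)       # null is true: rejecting is an error
    type_ii = 1 - rejection_rate(0.5, alpha)  # null is false: failing to reject is an error
    print(f"alpha = {alpha:.2f}: Type I rate ~ {type_i:.3f}, Type II rate ~ {type_ii:.3f}")

# A larger sample lowers the Type II rate without changing alpha
print(f"n = 100, alpha = 0.05: Type II rate ~ {1 - rejection_rate(0.5, 0.05, n=100):.3f}")
```

Lowering alpha pushes the Type I rate down and the Type II rate up; only the larger sample improves one without worsening the other.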
Which Error Is Worse? It Depends
Generally, social scientists feel that it is worse to make a Type I error than a Type II error. It is more problematic to conclude there is a difference or an impact when there really isn’t one. For example, concluding that a drug has a statistically significant positive impact when the result is just a Type I error is a problem.
Which One Is Worse? Type I and Type II
As a program manager, you may feel that it is worse to make a Type II error. In this case, the null hypothesis of “no difference” would not be rejected. The risk is that “No statistically significant differences were found” might turn into a conclusion that the program did not work. But technically, all that should be concluded is that the researchers “failed to reject the null hypothesis.” The program may actually make a difference that the researchers failed to detect.
More Statistical Significance Concepts
One-tailed test: used whenever the hypothesis specifies a direction, for example, “Men will earn more than women.” We are concerned with only one tail of the normal curve. It is easier to reject the null hypothesis (provided the result falls in the predicted direction).
More Statistical Significance Concepts
Two-tailed test: used when the research question does not specify a direction, for example, “The salaries of men and women are different.” It is generally the default in statistical software packages, and generally the more “conservative” test: it is harder to reject the null hypothesis.
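The difference between the two is easiest to see with a test statistic that lands near the cutoff. In this Python sketch (the z value of 1.8 is hypothetical, a standard normal reference distribution is assumed, and the function name is illustrative), the one-tailed p value counts only the predicted tail, while the two-tailed p value counts both tails and so is twice as large:

```python
from statistics import NormalDist

def tail_p_values(z):
    """One-tailed p (predicting a positive difference) and two-tailed p for a z statistic."""
    standard_normal = NormalDist()
    one_tailed = 1 - standard_normal.cdf(z)
    two_tailed = 2 * standard_normal.cdf(-abs(z))
    return one_tailed, two_tailed

one_tailed, two_tailed = tail_p_values(1.8)  # hypothetical test result
print(f"one-tailed p = {one_tailed:.3f}, two-tailed p = {two_tailed:.3f}")
# one-tailed p = 0.036, two-tailed p = 0.072
```

At an alpha of .05, the same result would be statistically significant with a one-tailed test but not with a two-tailed test.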
Statistical Significance Does Not Mean Meaningful or Important
Researchers surveyed 3,000 people, selected randomly across the U.S. Of those with a private physician, 87% reported being satisfied, compared with 85% of those with an HMO physician. These results were statistically significant. Are they meaningfully different?
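The slide does not say how the 3,000 respondents split between private and HMO physicians, so the Python sketch below holds the 2-point gap (87% vs 85%) fixed and varies a hypothetical group size. It uses a pooled two-proportion z-test, one standard choice for comparing percentages (a chi-square test on the 2x2 table without continuity correction gives the same p value); the function name is illustrative. The gap never changes; only the p value does, which is why statistical significance by itself says little about whether a difference matters.

```python
import math
from statistics import NormalDist

def two_proportion_p_value(p_1, p_2, n_1, n_2):
    """Two-sided p value from a pooled two-proportion z-test."""
    pooled = (p_1 * n_1 + p_2 * n_2) / (n_1 + n_2)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n_1 + 1 / n_2))
    z = (p_1 - p_2) / standard_error
    return 2 * NormalDist().cdf(-abs(z))

for n_per_group in (500, 1_500, 5_000, 20_000):  # hypothetical group sizes
    p = two_proportion_p_value(0.87, 0.85, n_per_group, n_per_group)
    print(f"{n_per_group:>6} per group: gap = 2 points, p = {p:.4f}")
```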
Statistical Significance Does Not Mean Meaningful or Important
Statistical significance has a narrow meaning and is based on mathematics, although the researchers do decide on the alpha level they will use as the criterion for whether the results are statistically significant. “Meaningful” or “important” is a judgment call. But remember: “significance” is a word owned by statisticians, so use it only when you are talking about tests for statistical significance.
Statistical Significance Does Not Mean
That the results are meaningful or important.
That the relationship is strong or weak.
That design errors have been eliminated.
A p value of .001 rather than .049 is not stronger or better in any sense other than that there is a lower probability the results are due to random chance.
Statistical Significance Does Not Mean
That non-sampling errors have been eliminated. Poorly worded survey questions, error-prone data entry, low response rates, systematic bias in respondents, and so on have to be acknowledged as limitations of the study, even if the results are reported as statistically significant.
Over-attachment to Statistical Significance Tests
“Unfortunately, researchers often place undue emphasis on significance tests…. Perhaps it is because they have spent so much time in courses learning to use significance tests, that many researchers give the tests an undue emphasis in their research.”
Phillip Shively, p. 172
Key Points
Tests for statistical significance assume that the study was designed properly, using a random sample with valid and reliable measures. No amount of statistical wizardry will correct design flaws.
Key Points
When working with random sample data, error is always a possibility. Whether the error is Type I or Type II, absolute certainty is an illusion. It is useful to provide readers with “point estimates,” but these should be presented in the context of the confidence interval, for example: “We are 95% certain that the true mean in the population is within this range.”
Key Points
The emphasis on finding statistical significance can diminish the importance of not finding statistically significant results. Results that are not statistically significant can be important. They can provide evidence that something thought to be a problem may not be. They can provide other researchers with information about what has been tried, so they can try something else.
Key Points
Final word: when working with random sample data, be aware that the results might not be as solid as one hopes. Be mindful of premature certainty. It helps if researchers pull in other similar research to support their findings. If there is a pattern across other studies, then we can have more faith that the results are solid, meaning they fairly accurately reflect the larger population.
Creative Commons
This PowerPoint is meant to be used and shared with attribution. Please provide feedback. If you make changes, please share freely and send me a copy of the changes: Johnsong62@gmail.com. Visit www.creativecommons.org for more information.