2Inferential Statistics With inferential statistics you can do the following:Determine probability of characteristics of population based on the characteristics of your sample. You know what the relationship “looks like” in the sample. You have the actual numbers. Inferential statistics tells you the probability that the same relationship exists in the population.Assess the strength of the relationship between independent and dependent variables. Inferential statistics allows you to determine if the relationship is statistically significant and helps you decide if it is substantially significant (e.g., is the relationship strong enough to matter).
3Inferential Statistics Inferential statistics are used to test hypothesesResearch hypotheses state there is a relationship between two or more variables in your sample.Initially you do NOT assume there is a relationship in the population (i.e., null hypothesis claims there is NO relationship).You compute a statistic that indicates the probability that we can reject the null hypothesis and support the research hypothesis (e.g., the same relationship we see in our sample also exists in the population). If it is 95% probable that the sample relationship exists in the population, we say it is significant at the .05 level (5% chance of making an error, or we say there is also a relationship in the population from which this sample was drawn).
4How Do We Apply What We Learn From Inferential Statistics? BEFORE you use any Intervention, you should determine if there is evidence that it works.For instance, what is the probability that fertilizer will increase crop yield for farmers?BEFORE you work with any group, you may want to know the characteristics of that group.For instance, what proportion of abused women are eventually killed by their partner? What is the probability that “risk” increases after they leave the partner?BEFORE you make recommendations, you want to understand the probabilities of success.What is the probability that those who participate in 4-H will graduate from college? Are they significantly more likely to graduate than those who do not participate?
5Inferential Statistics Can Answer the Following Questions Is the Intervention I am currently using worth my time?Does it work with 5% or 95% of program participants? Are participants more likely than general population to reach goals?What factors are most important when attempting to increase effectiveness of intervention?If I use a second intervention, does it increase success by 10%, 15%, or 60%?Are characteristics of program participants important?Policy Implications - Is it worth the tax payer’s dollars?Is this a Spurious Relationship?Is there a difference between groups receiving intervention and those not receiving itIs it large enough to warrant use of limited resources?(substantial significance)Is it large enough to argue that this intervention “works” in different settings/situations?
6Example of Applying Information Learned From Inferential Statistics You implement an “on-line” program to improve communication amongst young married couples.Using measurement of long term objectives, is it successful?What percent of my respondents stayed married? Howmuch lower is the divorce rate for those who participatedthan for general population?What factors are most important?Does required payment increase success?Do older/younger couples respond moreeffectively to this counseling?Policy Implications - Is it worth Tax Payer’s Dollars?Is this relationship spurious (i.e., proactive individuals aremore likely to seek intervention and also more likely to havegood marriages)?Is the intervention “cost effective”? Is difference bigenough to matter?Is there evidence that this program could work inother settings/situations or with other couples?
8Stating Hypothesis for Regression Null HypothesisThere is not relationship between any of the independent variables and the dependent variableTechnically, all of the slopes are zeroResearch HypothesisThere is a relationship between at least one of the independent variables and the dependent variableTechnically, at least one of the slopes are zeroThis relationship could be positive or negative
9Inferential Statistics First consider bi-variate statisticsPearson CorrelationWhen is it used?When you have a continuous independent variable and a continuous dependent variable.How do you interpret it?When the probability associated with the ___ statistics is .05 or less then you can assume there is a relationship between the dependent and independent variableFor instance you may want to know if the number of hours participants spend in your program is positively related to their scores on school exams* NOTE The Pearson Correlation and Bi-variate Regression are very similar
10Inferential Statistics First consider bi-variate statisticsBi-variate RegressionWhen is it used?When you have a continuous independent variable and a continuous dependent (outcome) variableFor instance, you may want to know if the number of hours participants spend in your program is positively related to their scores on school examsHow do you interpret it?When the probability associated with the F-statistic is .05 or less then you can assume there is a relationship between the dependent and the independent variableNOTE The Pearson Correlation and Bi-variate Regressionare very similar
11Pearson CorrelationConsists of a continuous independent and a continuous dependent variable (i.e., X and Y)A Pearson correlation coefficient is used to estimate the strength of the relationship between X and Y in the populationA Pearson correlation coefficient ranges from -1 to +1The closer to -1 or +1 it is, the stronger the relationship between X and Y, and the lower the probability that we would make a mistake if we claimed there is a relationship between X and Y in the populationA scatter plot can give a visual representation of the relationship between X and YA scatter plot shows all of the data points/plots and their relationship, using an X and Y axisOn the following slide, respondents’ scores on BETA and SAT were plotted so that there is one data point for someone who scored 1000 on the SAT and 12 on the BETA.
12Bi-variate scatterplot showing a strong positive relationship If all of the data points were on the regression line, then the correlation coefficient would be 1. This would indicate that if we know a person’s score on the SAT we can predict their score on the BETA 100% of the time.
13Bi-variate scatterplot showing a strong negative/inverse relationship .The regression line or slope indicates where the data points would be if you could predict Y after knowing X 100% of the time. It is the “predicted” Y.
14Correlation MatrixThe following slide contains a computer generated correlation matrix. A correlation matrix can provide the following information:Strength of the relationship between any two of the variablesThe probability that you would make a mistake if you claimed any two variables are related in the populationAt the top of the correlation matrix, the following information is reported:The mean of each continuous variableThe sample sizeThe standard deviation of each continuous variableThe range of scores for each continuous variable
15Pearson Correlation Coefficient What does it tell us about the strength of the relationship between X and Y?Strength of Relationship r value R2 valuesPerfectStrongModerateWeakNo RelationshipWeakModerateStrongPerfectStrength of relationship (r or R2)The closer to 1 or -1 the R and R2 are, the stronger the relationship. .SignificanceThe stronger the relationship the more likely it is significant.
16Correlation Matrix SR90 = number of men per every 100 women TPOV90 = % of people living in povertyFHH = % of female headed householdsEMPMAL = % of males employedEMPFEM90 = % of females employer
17The first number in the matrix (marked by the maroon textbox) is the correlation coefficient. It indicates the strength of the relationship.The second number is the probability. It must be .05 or less if you are to generalize to the population.There may be a third number in the matrix. It would indicate the sample size.
18Bi-Variate Regression This is a bi-variate regression printout. It focuses specifically on the relationship between two of the variables (e.g. FHH90 and TPOV90) reported in the matrix on the previous slide. Note that the standardized estimate is the same as the correlation coefficient. If you square this number ( ) you would get (the R square). If you square the standardized estimate you always get the R-square. It is the percent increase in you ability to predict Y if you know X. In this example, your ability to predict the poverty rate (TPOV90) in a city increases by 26% if you know the percent of female headed households in that city..The Prob>F is This indicates this relationship is significant.We are more than 99% sure that this relationship exists in the population.
19This is a SAS printout of a Pearson Correlation Matrix This is a SAS printout of a Pearson Correlation Matrix. This matrix reports the relationship between 3 continuous variables (i.e., GPR, grade in school and number of times the student has skipped class).
20Types of Regression Bi-Variate Regression Continuous dependent variableContinuous independent variableRelationship between the two can be negative or inverse, positive, linear or curvilinear.Multiple RegressionYou use multiple independent variables to predict a continuous dependent variableFor instance, you could use number of hours participating in the program, score on attitude index and age to predict success in school (i.e., GPR)A variation – you use one or more continuous independent variables and one categorical variable to predict a dependent variable—The categorical variable can have only two categories (i.e., male or female)For instance, you would use gender, number of hours participating n the program, score on attitude index and age to predict success in school (i.e., GPR)
21Regression – continued Logit RegressionCategorical dependent variable (two categories)You use continuous independent variables to predict the probability of falling into one category or anotherFor instance, how does number of hours studying for exams, age, and number of classes skipped, influence the probability that a student will graduate from high school?Graduation is measured simply as did or did not graduate
25Regression PrintoutThis is a copy of a multiple regression printout and includes a brief explanation of the numbers reported.
26Interpreting a regression printout Pr > F is <.000l indicating we can reject the null hypothesis and at least one independent variable is significantly related.R-Square is indicating that our ability to predict posself (self-esteem score) increases by 51% if we know the value of all of the independent variables.Pr . is less than for grades (abc) and how much like they other students (likestu) indicating that these are the independent variables that are related to posself (self-esteem score).
27Interpreting Parameter Estimates Parameter estimates are how much Y changes for every one unit change in X. From printout on previous slide, we see that for every grade increase (i.e., from C to B) then Posself (self esteem score) increases by .69 or 2/3 of a point.If the independent variable is a dummy variable then we interpret it slightly differently. It is how much Y changes when we go from one category to the other. From printout on previous slide we see that as we go from the category of male (coded 0) to female (coded 1) then Posself decreases by
28Interpreting Parameter Estimates Caution – When you interpret a parameter estimate, you must consider how you measured the X variables. If your parameter estimate is 1 and you measured money in $10,000 increments, then for every $10,000 you spend on your child, their SAT score would increase by 1. If your parameter estimate is 1 and you measure money in dollars, then for every dollar, their SAT score would increase by 1. $10,000 would result in an increase of 10,000 on their SAT.
29Predicting Y Why and when do we predict Y? Explain results to a nonacademic audienceExplain regression in an interesting wayHow do we predict?First generate regression printoutTHEN use the prediction formula – Y=a+b1x1+b2x2+b3x3+…….Where:Y=the intercept or constantb=slope or parameter estimate which tells you the change in Y for every one unit change in XX=value of each independent variable that you select using the codebookY=the predicted value of your dependent variable as predicted by the combination of X variables