Testing hypotheses: Continuous variables

Hypothesis: Lower income → Higher murder rate. Each case (city) is coded High (H) or Low (L) on median income and High (H) or Low (L) on murder rate.

Frequencies:
              High Murder   Low Murder
Low Income         3             1
High Income        2             4

Percentages:
              High Murder   Low Murder
Low Income        75%           25%
High Income       33%           67%
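As an aside (not part of the original slides), here is a minimal sketch of how these frequency and percentage tables could be produced in Python with pandas. The coded cases below are hypothetical, reconstructed only to match the counts in the table above.

```python
import pandas as pd

# Hypothetical coded cases (H = high, L = low), chosen to match the counts above:
# 3 low-income/high-murder, 1 low-income/low-murder,
# 2 high-income/high-murder, 4 high-income/low-murder.
data = pd.DataFrame({
    "income": ["L", "L", "L", "L", "H", "H", "H", "H", "H", "H"],
    "murder": ["H", "H", "H", "L", "H", "H", "L", "L", "L", "L"],
})

# Frequency table: independent variable (income) in rows.
freq = pd.crosstab(data["income"], data["murder"])
print(freq)

# Percentage table: percentage across each category of the independent variable.
pct = pd.crosstab(data["income"], data["murder"], normalize="index") * 100
print(pct.round(0))
```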

Correlation and Regression
Correlation: a measure of the strength of an association (relationship) between continuous variables.
Regression: predicting the value of a continuous dependent variable (Y) based on the value of a continuous independent variable (X).
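A minimal sketch (not from the original slides) of how both statistics could be computed in Python with scipy; the income and murder-rate values are invented for illustration.

```python
import numpy as np
from scipy import stats

# Illustrative data: median income (thousands of dollars) and murder rate per 100,000.
income = np.array([28, 32, 35, 41, 47, 52, 58, 63], dtype=float)
murder = np.array([14.1, 12.6, 11.8, 9.5, 8.0, 6.2, 5.9, 4.1])

# Correlation: strength and direction of the association.
r, p_value = stats.pearsonr(income, murder)

# Regression: predict the murder rate (Y) from median income (X).
fit = stats.linregress(income, murder)
print(f"r = {r:.2f}")
print(f"predicted murder rate at income 45: {fit.intercept + fit.slope * 45:.1f}")
```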

Plot the IV and DV for each case (city) on a "scattergram." Hypothesis: Lower income → Higher murder rate. The slide's figure plots murder rate (Y) against median income (X), with the distribution of cities by median income and by murder rate shown along each axis; two cities are detailed. Analysis comes later.

The correlation statistic, r. Values of r range from –1 to +1.
–1 is a perfect negative association (correlation), meaning that as the scores of one variable increase, the scores of the other variable decrease at exactly the same rate.
+1 is a perfect positive association, meaning that both variables go up or down together, in perfect harmony.
Intermediate values of r (close to zero) indicate weak or no relationship.
An r of exactly zero (never seen in real life) means no relationship – that the variables do not change or "vary" together, except as what might happen through chance alone.
(The slide's figure illustrates these extremes: no relationship, perfect positive relationship, perfect negative relationship.)

Can changes in one variable be predicted by changes in the other? The slide shows two "scattergrams," each with a "cloud" of dots – one with r = +1 and one with r = –1. NOTE: the independent variable (X) is always placed on the horizontal axis; the dependent variable (Y) is always placed on the vertical axis.

Can changes in one variable be predicted by changes in the other? As X changes in value, does Y move correspondingly, either in the same or the opposite direction? In this scattergram there seems to be no connection between X and Y: one cannot predict values of Y from values of X. r = 0.

Can changes in one variable be predicted by changes in the other? Here, as X changes in value by one unit, Y also changes in value by one unit. Knowing the value of X, one can predict the value of Y. X and Y go up and down together, meaning a positive relationship. r = +1.

Can changes in one variable be predicted by changes in the other? Here, as X changes in value by one unit, Y also changes in value by one unit. Knowing the value of X, one can predict the value of Y. X and Y move in opposite directions, meaning a negative relationship. r = –1.
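A short sketch tying the last three slides together: generating data with a perfect positive, a perfect negative, and essentially no relationship, then checking r for each. The data and the use of numpy are my own illustration, not part of the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.arange(20, dtype=float)

perfect_pos = 2 * x + 3              # exact linear increase with X  -> r = +1
perfect_neg = -0.5 * x + 10          # exact linear decrease with X  -> r = -1
unrelated = rng.normal(size=x.size)  # no connection to X            -> r near 0

for label, y in [("perfect positive", perfect_pos),
                 ("perfect negative", perfect_neg),
                 ("no relationship", unrelated)]:
    r = np.corrcoef(x, y)[0, 1]
    print(f"{label}: r = {r:+.2f}")
```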

Computing r using the "line of best fit." To arrive at a value of r, a straight line is placed through the cloud of dots (the actual, "observed" data). This line is placed so that the cumulative distance between itself and the dots is minimized; the smaller this distance, the higher the r. In practice, r is calculated with computers. Paired scores (each X/Y combination) and the means of X and Y are used to compute: a, where the line crosses the Y axis, and b, the slope of the line. When relationships are very strong or very weak, one can estimate the value of r by simply examining the graph.
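A minimal sketch, using made-up paired scores, of the computation the slide describes: the slope b and intercept a of the least-squares line from the X/Y pairs and the means of X and Y.

```python
import numpy as np

def line_of_best_fit(x, y):
    """Least-squares slope (b) and intercept (a) from paired scores."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    x_mean, y_mean = x.mean(), y.mean()

    # Slope: covariation of X and Y relative to the variation in X.
    b = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # Intercept: where the line crosses the Y axis.
    a = y_mean - b * x_mean
    return a, b

# Made-up paired scores for illustration.
a, b = line_of_best_fit([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 8.1, 9.8])
print(f"line of best fit: y = {a:.2f} + {b:.2f}x")
```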

The line of best fit predicts a value for one variable given the value of the other variable. There will be a difference between these estimated values and the actual, known ("observed") values. This difference is called a "residual" or an "error of the estimate." As the error between the known and predicted values decreases – as the dots cluster more tightly around the line – the absolute value of r (whether + or –) increases. (The slide reads two predictions off its line of best fit: if x = .5, y = 2.3; if y = 5, x = 3.4.)
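Continuing the sketch above with the same invented data: the residuals (errors of the estimate) are simply the observed Y values minus the values the fitted line predicts.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Fit the line as in the previous sketch.
b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()

predicted = a + b * x
residuals = y - predicted   # errors of the estimate: observed minus predicted
print(residuals.round(2))
```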

A perfect fit: the line of best fit goes "through" every dot. The slide shows two such scattergrams, one with r = +1.0 (a perfect positive fit) and one with r = –1.0 (a perfect negative fit).

An intermediate fit yields an intermediate value of r. Here there is a moderate cumulative distance between the line of best fit and the "cloud" of dots: r = +.65.

A poor fit yields a low value of r. Here there is a large cumulative distance between the line of best fit and the "cloud" of dots: r = –.19.

R-squared (R²), the coefficient of determination: the proportion of the change in the dependent variable (also known as the "effect" variable) that is accounted for by change in the independent variable (also known as the "predictor" variable). It is obtained by squaring the correlation coefficient (r). "Little" r-squared (r²) depicts the explanatory power of a single independent/predictor variable; "big" R-squared (R²) combines the effects of multiple independent/predictor variables and is the more commonly used statistic.
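A brief sketch with invented data showing two equivalent readings of r² for a single predictor: the squared correlation coefficient, and the proportion of the variation in Y accounted for by the best-fit line.

```python
import numpy as np

# Invented data (income and murder rate, as in the earlier sketch).
x = np.array([28, 32, 35, 41, 47, 52, 58, 63], dtype=float)
y = np.array([14.1, 12.6, 11.8, 9.5, 8.0, 6.2, 5.9, 4.1])

r = np.corrcoef(x, y)[0, 1]
r_squared = r ** 2          # proportion of change in Y accounted for by X

# Equivalent check: 1 - (residual variation / total variation) around the best-fit line.
b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()
ss_res = np.sum((y - (a + b * x)) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
print(round(r_squared, 3), round(1 - ss_res / ss_tot, 3))  # the two agree
```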

How to "read" a scattergram. Hypothesis: Lower income → higher murder rate.
Move along the IV. Do the values of the DV change in a consistent direction?
Look across the IV. Does knowing the value of the IV help you predict the value of the DV?
Place a straight line through the cloud of dots, trying to minimize the overall distance between the line and the dots. Is the line at a pronounced angle?
To the extent that you can answer "yes" to each of these, there is a relationship. Here R = –.6 and R² = .36: change in the IV accounts for thirty-six percent of the change in the DV. A moderate-to-strong relationship, in the hypothesized direction – hypothesis confirmed!

Class exercise. Hypothesis: Height → Weight. Use the data provided (columns: Height (inches), Weight, Age) to build a scattergram.
Be sure that the independent variable is on the X axis, smallest value on the left, largest on the right, just like when graphing any distribution.
Be sure that the dependent variable is on the Y axis, smallest value on the bottom, largest on top.
Place a dot representing each case at the intersection of its values on X and Y.
Place a STRAIGHT line where it minimizes the overall distance between itself and the cloud of dots.
Use this overall distance to estimate a possible value of r, from –1 (perfect negative relationship), to 0 (no relationship), to +1 (perfect positive relationship).
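Since the slide's data table is not reproduced above, here is a sketch of the exercise in Python with made-up height and weight values; matplotlib stands in for graph paper.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical class data (stand-ins for the slide's table).
height = np.array([62, 64, 66, 67, 69, 70, 72, 74], dtype=float)        # inches (X, IV)
weight = np.array([120, 135, 140, 150, 160, 170, 185, 195], dtype=float)  # pounds (Y, DV)

# Straight line of best fit, as in the earlier sketch.
b = np.sum((height - height.mean()) * (weight - weight.mean())) / np.sum((height - height.mean()) ** 2)
a = weight.mean() - b * height.mean()

plt.scatter(height, weight)                    # one dot per case
plt.plot(height, a + b * height, color="red")  # line of best fit
plt.xlabel("Height (inches)")                  # IV on the horizontal axis
plt.ylabel("Weight (pounds)")                  # DV on the vertical axis
plt.show()
```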

Height → Weight: r = .72, r² = .52. A strong relationship.

Age → Weight: r = .35, r² = .12. A weak-to-moderate relationship.

Age → Weight (less extreme cases): r = –.17, r² = .03. A very weak negative relationship.

Age → Height: r = .04, r² = .00. No relationship.

Changing the level of measurement from continuous to categorical: height is recoded as SHORT/TALL and weight is recoded as LIGHT/HEAVY.
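A minimal sketch, with invented values, of one way to recode continuous measurements into two categories (a median split); the SHORT/TALL and LIGHT/HEAVY labels follow the slide.

```python
import numpy as np
import pandas as pd

# Hypothetical continuous measurements.
df = pd.DataFrame({
    "height": [62, 64, 66, 67, 69, 70, 72, 74],
    "weight": [120, 135, 140, 150, 160, 170, 185, 195],
})

# Median split: recode each continuous variable into two categories.
df["height_cat"] = np.where(df["height"] >= df["height"].median(), "TALL", "SHORT")
df["weight_cat"] = np.where(df["weight"] >= df["weight"].median(), "HEAVY", "LIGHT")

# Cross-tabulate the two new categorical variables.
print(pd.crosstab(df["height_cat"], df["weight_cat"]))
```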

Spring '15 p.s.: r = –.26, r² = .07. A weak negative relationship.

Exploring data with r. Why are we so polarized? Could part of the reason be a poor economy? The r statistic can be used to explore such questions, not just for a small group but for the whole country! But unless we go in with a hypothesis, backed by a literature review, it's basically a fishing expedition. Remember that there are lots of variables changing all the time, so finding substantial correlations isn't unusual. Theorizing after the fact is always hazardous. Remember the story about lunar cycles and homicide?

Other correlation techniques.
"Spearman's r" – assesses correlation between two ordinal categorical variables.
Partial correlation – uses a control variable to assess its potential influence on a bivariate (two-variable) relationship when all variables are continuous. It is analogous to using first-order partial tables for categorical variables. Instead of height → weight, is it possible that a variable related to height – age – is the real cause of changes in weight? The slide's diagram contrasts the zero-order correlations (height–weight, height–age, weight–age) with the first-order partial correlation between height and weight, controlling for age.
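A sketch, with invented height, weight, and age values, of Spearman's correlation (via scipy) and a first-order partial correlation computed from the three zero-order correlations using the standard formula.

```python
import numpy as np
from scipy import stats

# Hypothetical values for height, weight, and the control variable age.
height = np.array([62, 64, 66, 67, 69, 70, 72, 74], dtype=float)
weight = np.array([120, 135, 140, 150, 160, 170, 185, 195], dtype=float)
age = np.array([20, 23, 22, 30, 28, 35, 33, 40], dtype=float)

# Spearman's rho: correlation based on ranks (suitable for ordinal data).
rho, p = stats.spearmanr(height, weight)

# First-order partial correlation of height and weight, controlling for age,
# built from the three zero-order correlations.
r_hw = np.corrcoef(height, weight)[0, 1]
r_ha = np.corrcoef(height, age)[0, 1]
r_wa = np.corrcoef(weight, age)[0, 1]
partial = (r_hw - r_ha * r_wa) / np.sqrt((1 - r_ha**2) * (1 - r_wa**2))

print(f"Spearman rho = {rho:.2f}, partial r (controlling for age) = {partial:.2f}")
```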

Some parting thoughts.
If we did not use probability sampling: our results apply only to the cases we coded; accounting for the influence of other variables can be tricky; and r and related statistics are often unimpressive – describing what they mean can be tricky.
If we used probability sampling: our results can be extended to the population. But, since samples are just that – samples – we cannot assume that the statistics a sample yields (e.g., r, R²) hold true for the population. Techniques we'll discuss later allow us to estimate the magnitude of the difference between sample statistics and the corresponding population parameters. This process will also let us interpret our results with far greater clarity and precision than is possible without probability sampling.

Exam preview
1. You will be given a hypothesis and data from a sample. There will be two variables – the dependent variable and the independent variable. Both will be categorical, and each will have two levels (e.g., low/high).
   A. You will build a table containing the frequencies (number of cases), just like we did in class and in this slide show. For consistency, place the categories of the independent variable in rows, just like in the slide shows.
   B. You will build another table with the percentages. Remember to go to one category of the independent variable and percentage it across the dependent variable. Then go to the other category of the independent variable and do the same.
   C. You will analyze the results. Are they consistent with the hypothesis?
2. You will be given the same data as above, broken down by a control variable. It will also be categorical, with two levels.
   A. You will build first-order partial tables, one with frequencies (number of cases), the other with percentages, for each level of the control variable. Remember that these tables will look exactly like the zero-order table. The hypothesis, the independent and dependent variables, and their categories stay the same.
   B. You will be asked whether introducing the control variable affects your assessment of the hypothesized zero-order relationship. This requires that you separately compare the results for each level of the control variable to the zero-order table. Does introducing the control variable tell us anything new?
3. You will be given another hypothesis and data. There will be two variables – the dependent variable and the independent variable. Both are continuous variables.
   A. You will build a scattergram and draw in a line of best fit. Remember that the independent variable must go on the X (horizontal) axis, and the dependent variable must go on the Y (vertical) axis. Also remember that the line of best fit must be a straight line, placed so as to minimize its overall distance from the dots, which represent the cases.
   B. You will estimate the r (correlation coefficient) and state whether the scattergram supports the hypothesis. Be careful! First, is there a relationship between the variables? Second, is it in the same direction (positive or negative) as the hypothesized relationship?