Statistics in Applied Science and Technology Chapter 13, Correlation and Regression Part I, Correlation (Measure of Association)


Key Concepts in This Chapter –Associations between two variables measured at the interval-ratio level: scatter diagram, Pearson’s correlation coefficient; positive correlation, negative correlation, curvilinear relationships; coefficient of determination –Associations between two variables measured at the ordinal level –Associations between two variables measured at the nominal level –Associations between one quantitative variable and one nominal or ordinal variable

Association Between Two Variables Measured at the Interval-Ratio Level: Positive Correlation, Negative Correlation, No Association, Curvilinear Relationship

Positive Correlation Insofar as relatively low values are paired with relatively low values, and relatively high values are paired with relatively high values, the relationship is positive. Example: time studied for test and test scores

Negative Correlation Insofar as relatively low values are paired with relatively high values, and relatively high values are paired with relatively low values, the relationship is negative. Example: life span in years and history of heavy smoking in years

No Association No regularity is apparent among the pairs of observations - no association. Example: IQ score and shoe size

Curvilinear Relationship A relationship that can be described best with a curved line. Example: Physical strength and years of age.

Preview a Relationship - Scatterplots Definition of scatterplots - a graph containing a cluster of dots that represent all pairs of observations. Construction of scatterplots - scale each of the two variables along the horizontal (X) and vertical (Y) axes, and use each pair of observations to locate a dot within the scatterplot.

Is There an Association?

Plot Out the Data: The Scattergram

The Scattergram

Anxiety & Skill Performance Curvilinear Relationship

Measurement of strength of correlation - correlation coefficient r Intuitively: the more closely the dot cluster approximates a straight line, the stronger the relationship will be. A correlation coefficient is a number between -1 and 1 that describes the relationship between variables. The sign of r indicates the type of linear relationship, whether positive or negative.

Measurement of strength of correlation - correlation coefficient r The value of r, without regard to sign, indicates the strength of the linear relationship. Types of correlation coefficient: –Pearson’s correlation coefficient r (appropriate when both variables are normal in distribution) –Spearman’s correlation coefficient $r_s$ (appropriate when one or both variables are not normal in distribution)

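Calculation of Pearson’s r: $r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \, \sum (y - \bar{y})^2}}$ In which: x, y - pair of observations; $\bar{x}$, $\bar{y}$ - means of x and y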
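The calculation of Pearson's r can be sketched in a few lines of Python using the standard deviation-score form of the formula. This is a minimal illustration, not the course's SPSS procedure; the function name `pearson_r` and the hours/scores data are hypothetical and chosen only to mirror the "time studied and test scores" example.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r from sums of deviations about the means."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Numerator: sum of cross-products of deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator terms: sums of squared deviations
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable has zero variance")
    return sxy / sqrt(sxx * syy)

# Hypothetical data: hours studied and test score for five students
hours = [1, 2, 3, 4, 5]
scores = [55, 60, 70, 72, 85]
r = pearson_r(hours, scores)
print(f"r = {r:.3f}")
```

A value of r close to +1 here reflects that the hypothetical scores rise almost linearly with hours studied.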

Coefficient of Determination $r^2$ To obtain $r^2$, simply square the r value. It gives the percentage of the variance in each variable that is explained by knowledge of the variance of the other variable –what percentage of the variance within Y is predicted by the variance within X?

Notes about $r^2$ The coefficient of determination gives the explained variance. Therefore, $1 - r^2$ is the unexplained variance. For example, r = 0.70 gives about 50% explained variance (why?). Always calculate $r^2$ to evaluate the extent of the correlation.
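The "why" in the r = 0.70 example comes down to squaring: 0.70 × 0.70 = 0.49, which is about 50%. The short sketch below tabulates explained and unexplained variance for a few illustrative r values; the function name `explained_variance` and the chosen r values are assumptions made for this example.

```python
def explained_variance(r):
    """Coefficient of determination: the square of the correlation r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    return r ** 2

# Illustrative r values; the sign of r does not affect r squared
for r in (0.30, 0.50, 0.70, -0.70, 0.90):
    r2 = explained_variance(r)
    print(f"r = {r:+.2f}  explained = {r2:.2f}  unexplained = {1 - r2:.2f}")
```

Note how quickly explained variance falls off: a correlation of 0.50 explains only a quarter of the variance.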

Not Causation Only Association Thomas Sowell opinion piece

Smoking & LBP Smoking Low Back Pain r = 0.45

Smoking & LBP Smoking Low Back Pain r = 0.45 ? Low Back Pain Smoking

Smoking & LBP Smoking Low Back Pain r = 0.45 Lifestyle factors (i.e., strength)?

Smoking & LBP Smoking Low Back Pain r = 0.45 Lifestyle choices

How Would I Measure Association Between Variables Measured at the Ordinal Level? Like associations between two variables measured at the interval-ratio level, associations between two variables measured at the ordinal level are of three types: –positive association –negative association –no association

Positive association: Cases tend to be ranked in the same order on both variables. For example, if case A is ranked above case B on one variable, it would also be ranked above case B on the second variable. If “burnout” among elementary school teachers is positively associated with the number of years of service, teacher A, who had more years of service than teacher B, would be ranked higher than teacher B on “burnout”.

Negative association: Cases tend to be ranked in reverse order on the two variables. If case A is ranked above case B on one variable, it would tend to rank below case B on the second variable. If “burnout” among elementary school teachers is negatively associated with the number of years of service, teacher A, who had more years of service than teacher B, would be ranked lower than teacher B on “burnout”.

No Association No regularity is apparent between the two variables - no association.

Preview an Association - Conditional Distribution of Y Table 1. Burnout by Length of Service

Preview an Association - Conditional Distribution of Y Table 1 displays the association between burnout and length of service. Each column shows the pattern of scores on the dependent variable for each score on the independent variable. For example, the first column indicates that, of the 38 teachers whose length of service is low, 20 (52.6%) were low on burnout, 10 (26.3%) were moderate on burnout, and 8 (21.1%) were high on burnout. The second column shows that, of the 32 teachers whose length of service is moderate, 6 (18.8%) were low on burnout, 15 (46.9%) were moderate on burnout, and 11 (34.3%) were high on burnout.

Preview an Association - Conditional Distribution of Y As in Table 1, these “within-column” frequency distributions are called the conditional distributions of Y, since they display the distribution of scores on the dependent variable for each condition (score) of the independent variable. Observing the conditional distributions of Y in Table 1 shows that a positive association exists between burnout and length of service (Why?).
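The conditional distributions of Y can be recomputed from the counts quoted for Table 1. The sketch below uses only the two columns whose counts appear in the text (low and moderate length of service); the high-service column is not given in this transcript and is left out. The names `counts` and `column_percentages` are hypothetical.

```python
# Burnout counts quoted in the text for two columns of Table 1.
# The high length-of-service column is not available in the transcript.
counts = {
    "low service": {"low": 20, "moderate": 10, "high": 8},
    "moderate service": {"low": 6, "moderate": 15, "high": 11},
}

def column_percentages(column):
    """Convert one column of counts into within-column percentages."""
    total = sum(column.values())
    return {level: 100 * n / total for level, n in column.items()}

percentages = {name: column_percentages(col) for name, col in counts.items()}
for name, pct in percentages.items():
    formatted = ", ".join(f"{level} burnout {p:.1f}%" for level, p in pct.items())
    print(f"{name}: {formatted}")
```

Comparing the columns hints at the answer to the "Why?": the share of teachers low on burnout drops from the low-service to the moderate-service column, while the share high on burnout rises. With ordinary rounding, 11 of 32 works out to 34.4% rather than the 34.3% quoted, presumably adjusted so the column totals 100%.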

Measurement of strength of association between two variables measured at Ordinal Level - Gamma (G) Gamma (G) can be calculated by (SPSS will be used): $G = \frac{N_s - N_d}{N_s + N_d}$ Where: $N_s$ - the number of pairs of cases ranked in the same order on both variables. $N_d$ - the number of pairs of cases ranked differently on the two variables. Interpreting G: similarly to $r^2$, G indicates the proportion by which prediction errors are reduced if we predict the order of pairs of cases on one variable from the order of pairs of cases on the other - as opposed to predicting order while ignoring the other variable.
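Although the course relies on SPSS, counting the same-order and different-order pairs by hand shows what gamma measures. The sketch below assumes a table whose rows and columns are both ordered from low to high, and applies it to the two columns of Table 1 that are quoted in the text. Because the high-service column is missing from this transcript, the result describes only that sub-table, not the full Table 1. The function names are hypothetical.

```python
def pair_counts(table):
    """Count pairs ranked the same (ns) and differently (nd).

    `table[i][j]` is the count in row i, column j, with rows and columns
    both ordered from low to high.
    """
    n_rows, n_cols = len(table), len(table[0])
    ns = nd = 0
    for i in range(n_rows):
        for j in range(n_cols):
            for k in range(i + 1, n_rows):      # cells in higher rows
                for m in range(n_cols):
                    if m > j:                   # higher on both variables
                        ns += table[i][j] * table[k][m]
                    elif m < j:                 # higher on one, lower on other
                        nd += table[i][j] * table[k][m]
    return ns, nd

def gamma(table):
    """Gamma = (Ns - Nd) / (Ns + Nd)."""
    ns, nd = pair_counts(table)
    if ns + nd == 0:
        raise ValueError("gamma is undefined when there are no untied pairs")
    return (ns - nd) / (ns + nd)

# Rows: burnout low, moderate, high. Columns: service low, moderate.
burnout_by_service = [
    [20, 6],
    [10, 15],
    [8, 11],
]
ns, nd = pair_counts(burnout_by_service)
g = gamma(burnout_by_service)
print(f"Ns = {ns}, Nd = {nd}, G = {g:.3f}")
```

A positive G for this sub-table agrees with the direction of association read off the conditional distributions.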

How Would I Measure Association Between Variables Measured at the Nominal Level? Associations between two variables measured at the nominal level have no direction, therefore only two types: –Existence of association –No association

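Preview an Association - Conditional Distribution of Y Table 2. Height by Gender

Height   Male        Female      Total
Tall     44 (88%)    8 (16%)     52
Short    6 (12%)     42 (84%)    48
Total    50 (100%)   50 (100%)   100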

Preview an Association - Conditional Distribution of Y Table 2 displays the association between height and gender. Each column shows the pattern of scores on the dependent variable for each score on the independent variable. For example, the first column indicates that, of the 50 people who are male, 44 (88%) were tall, and 6 (12%) were short. The second column shows that, of the 50 people who are female, 8 (16%) were tall, and 42 (84%) were short. Calculating percentages as in Table 2 helps identify the pattern of association. Males tend to be tall and females tend to be short.

Measurement of strength of association between two variables measured at Nominal Level - Phi ($\phi$) and Cramer’s V Phi ($\phi$) is appropriate for a 2 × 2 table. It is defined as: $\phi = \sqrt{\chi^2 / N}$ Where: $\chi^2$ - a test statistic based on expected and observed frequencies. N - total number of observations. For tables larger than 2 × 2, the more general Cramer’s V should be used. It is defined as: $V = \sqrt{\frac{\chi^2}{N \cdot \min(r-1,\, c-1)}}$ Where: $\chi^2$, N are defined as above. min(r-1, c-1) = the smaller of r-1 (# of rows minus 1) and c-1 (# of columns minus 1).
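As a rough check on what SPSS reports, Phi and Cramer's V can be computed from the Table 2 counts quoted in the text (44 tall and 6 short males, 8 tall and 42 short females). The sketch assumes the usual Pearson chi-square without a continuity correction; the function names are hypothetical.

```python
from math import sqrt

def chi_square(table):
    """Pearson chi-square (no continuity correction) and N for a table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected frequency under no association
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2, n

def phi(table):
    """Phi = sqrt(chi-square / N), intended for 2 x 2 tables."""
    chi2, n = chi_square(table)
    return sqrt(chi2 / n)

def cramers_v(table):
    """Cramer's V = sqrt(chi-square / (N * min(r - 1, c - 1)))."""
    chi2, n = chi_square(table)
    k = min(len(table) - 1, len(table[0]) - 1)
    return sqrt(chi2 / (n * k))

# Rows: tall, short. Columns: male, female (counts quoted for Table 2).
height_by_gender = [
    [44, 8],
    [6, 42],
]
print(f"phi = {phi(height_by_gender):.3f}")
print(f"Cramer's V = {cramers_v(height_by_gender):.3f}")
```

For a 2 × 2 table min(r-1, c-1) is 1, so the two measures coincide, which is why V is described as the more general form.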

Phi ($\phi$) and Cramer’s V Calculations of these two measures will be conducted using SPSS (you don’t need to know about the calculation). Values of $\phi$ and Cramer’s V are both between 0.00 and 1.00. A value close to 1.00 indicates a strong association, and a value close to 0.00 indicates a very weak association.

A limitation of Phi ($\phi$) and Cramer’s V A limitation of Phi and Cramer’s V is the absence of a direct and meaningful interpretation for values between the extremes of 0.00 and 1.00. In other words, the values between 0.00 and 1.00 cannot be interpreted as anything other than an index of the relative strength of association.

Another measurement of strength of association between two variables measured at Nominal Level - Lambda ($\lambda$) Lambda is calculated based on $E_1$ (prediction errors made while ignoring the independent variable) and $E_2$ (prediction errors made while taking the independent variable into account).

Interpretation of Lambda ($\lambda$) The value of lambda ranges from 0.00 to 1.00. When multiplied by 100, the value of lambda indicates the percentage reduction in error if we take the independent variable into account - as opposed to ignoring the independent variable - and therefore the strength of the association.
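The error-reduction reading of lambda can be made concrete with the Table 2 counts quoted in the text. The sketch assumes the standard proportional-reduction-in-error form, lambda = (E1 - E2) / E1, which is not spelled out in this transcript, and it always predicts the modal (most frequent) category. The function name is hypothetical.

```python
def lambda_measure(table):
    """Lambda for a table with the dependent variable in rows and the
    independent variable in columns, assuming lambda = (E1 - E2) / E1."""
    row_totals = [sum(row) for row in table]
    n = sum(row_totals)
    # E1: errors when always predicting the overall modal row category
    e1 = n - max(row_totals)
    # E2: errors when predicting the modal row category within each column
    e2 = sum(sum(col) - max(col) for col in zip(*table))
    if e1 == 0:
        raise ValueError("lambda is undefined when E1 is zero")
    return (e1 - e2) / e1

# Rows: tall, short (dependent). Columns: male, female (independent).
height_by_gender = [
    [44, 8],
    [6, 42],
]
lam = lambda_measure(height_by_gender)
print(f"lambda = {lam:.3f}")
```

Here E1 is 48 (predict "tall" for everyone and be wrong for the 48 short people) and E2 is 6 + 8 = 14, so knowing gender removes about 71% of the prediction errors.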

Notes about Lambda ($\lambda$) Lambda ($\lambda$) is asymmetric. This means that the value of $\lambda$ will vary depending on which variable is taken as independent. When one of the row totals is much larger than the others, lambda can take on a value of 0.00 even when other measures of association would not be 0.00 and calculating percentages for the table indicates some association between the variables.

Another Popular Way of Measuring the Strength of Association Between Two Nominal Variables (Chap 12.9) Relative risk is defined as the ratio of the incidence rate for persons exposed to a risk factor to the incidence rate for those not exposed to the risk factor: $\text{Relative Risk (RR)} = \frac{\text{incidence rate among exposed}}{\text{incidence rate among unexposed}}$

A generalized 2 × 2 table (Table 3)

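Relative Risk and Odds Ratio Relative risk is also called the risk ratio, and it can be computed using the frequencies listed in Table 3. Another commonly used measure of strength of association is the odds ratio (OR), which is defined as the ratio of the odds of the outcome among the exposed to the odds of the outcome among the unexposed.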
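Both measures can be sketched from the four cells of a 2 × 2 table. Table 3 itself is not reproduced in this transcript, so the cell labels a, b, c, d below are an assumed layout (a common textbook convention) and the counts are hypothetical.

```python
def relative_risk(a, b, c, d):
    """Risk ratio for an assumed 2 x 2 layout:

                   outcome   no outcome
    exposed           a          b
    unexposed         c          d
    """
    risk_exposed = a / (a + b)      # incidence among exposed
    risk_unexposed = c / (c + d)    # incidence among unexposed
    return risk_exposed / risk_unexposed

def odds_ratio(a, b, c, d):
    """Odds of the outcome among exposed divided by odds among unexposed."""
    return (a / b) / (c / d)

# Hypothetical counts: 30 of 100 exposed and 10 of 100 unexposed
# people develop the outcome.
a, b, c, d = 30, 70, 10, 90
rr = relative_risk(a, b, c, d)
or_value = odds_ratio(a, b, c, d)
print(f"RR = {rr:.2f}, OR = {or_value:.2f}")
```

With these made-up counts the odds ratio is larger than the relative risk; the two tend to be close only when the outcome is rare in both groups.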

How to measure strength of association between one quantitative variable and one nominal (or ordinal) variable? Measuring the strength of association begins with summary statistics such as the mean, median, percentage, etc. The measure of strength of association is created by comparing these summary statistics, for example by subtracting one from the other. The bigger the difference, the stronger the association.

An Example When exploring the relationship between benzene exposure and white blood cell count, we might compute the mean white blood cell counts for the exposed and non-exposed workers and then subtract one mean from the other. The bigger the difference, the stronger the association.
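The benzene example amounts to a difference between two group means. The sketch below shows the arithmetic; the white blood cell counts are invented purely for illustration and are not data from the course, and the function name `mean_difference` is hypothetical.

```python
from statistics import mean

def mean_difference(group_a, group_b):
    """Difference between the means of two groups (a minus b)."""
    return mean(group_a) - mean(group_b)

# Hypothetical white blood cell counts (thousands of cells per microliter)
non_exposed = [6.9, 7.2, 6.5, 7.0, 6.8]
exposed = [5.1, 4.8, 5.5, 4.9, 5.2]

difference = mean_difference(non_exposed, exposed)
print(f"mean non-exposed = {mean(non_exposed):.2f}")
print(f"mean exposed     = {mean(exposed):.2f}")
print(f"difference       = {difference:.2f}")
```

A raw difference like this depends on the units of measurement, so it is best read alongside the spread of the counts within each group.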