Correlation and Regression Basics


1 Correlation and Regression Basics
HMI 7530 - Programming in R
STATISTICS MODULE: Correlation and Regression Basics
Jennifer Lewis Priestley, Ph.D.
Kennesaw State University

2 STATISTICS MODULE
Basic Descriptive Statistics and Confidence Intervals
Basic Visualizations
    Histograms
    Pie Charts
    Bar Charts
    Scatterplots
T-tests
    One Sample
    Paired
    Independent Two Sample
Proportion Testing
ANOVA
Chi Square and Odds
Regression Basics

3 STATISTICS MODULE: Correlation
Choosing a statistical test by variable type:

T-test (one, two or paired sample)
    Dependent variable: quantitative. Independent (predictor) variable: categorical.
    Determines whether a categorical variable (factor) affects the dependent variable; typically used for experimental or planned-change studies.
Chi-square
    Tests whether variables are statistically independent (i.e., are they related or not?).
Correlation/regression analysis
    Tests whether two or more quantitative variables are related (binary is a special case).

4 STATISTICS MODULE: Correlation
Correlation coefficients assess the strength of the linear relationship between two quantitative variables.
The correlation measure ranges from -1 to +1.
A negative correlation means that X and Y are inversely related.
A positive correlation means that X and Y are directly related.
Zero correlation means that X and Y are not linearly related.
A correlation of +1 indicates that X and Y are directly related and that all the points fall on the same straight line.
A correlation of -1 indicates that X and Y are inversely related and that all the points fall on the same straight line.

Plot a scatter diagram of each predictor variable against the dependent variable.
Look for departures from linearity.
Look for extreme data points (outliers).
Examine partial correlations.
Correlation cannot determine causality, but partial correlation can help isolate confounding variables.
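The properties above can be checked with a short R sketch. The data here are illustrative only and do not come from the slides; the third case shows that a correlation near zero rules out a linear relationship, not every relationship.

```r
# Illustrative data only; none of these values come from the slides.
x <- 1:10

r_pos  <- cor(x, 2 * x + 3)     # all points on a rising straight line
r_neg  <- cor(x, -5 * x + 40)   # all points on a falling straight line
r_zero <- cor(x, (x - 5.5)^2)   # strong but purely curved (symmetric) relationship

print(c(rising = r_pos, falling = r_neg, curved = r_zero))
```

The curved case is why the slide recommends plotting the data before trusting the coefficient.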

5 STATISTICS MODULE: Correlation
For those who are interested:
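A minimal sketch of how Pearson's r is computed, assuming the standard definition (the sum of cross-products of deviations divided by the square root of the product of the sums of squared deviations). The function name pearson_r and the data are hypothetical and are not taken from the slides.

```r
# Hypothetical helper: Pearson's r from its textbook definition.
pearson_r <- function(x, y) {
  dx <- x - mean(x)   # deviations of x from its mean
  dy <- y - mean(y)   # deviations of y from its mean
  sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
}

# Illustrative data only.
x <- c(1, 2, 3, 4, 5, 6)
y <- c(9.8, 8.1, 7.4, 5.2, 4.9, 2.7)

r_manual  <- pearson_r(x, y)
r_builtin <- cor(x, y)          # R's built-in Pearson correlation

print(c(manual = r_manual, builtin = r_builtin))
```

The two values should agree up to floating-point rounding.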

6 STATISTICS MODULE: Correlation
Age (yr)    Prices Advertised ($)
1           12995, 10950
2           10495
3           10995
4           6995, 7990
5           8700
6           5990, 4995
9           3200, 2250, 3995
11          2900, 2995
13          1750

Consider the used car data: are these two variables related?
The first step is to plot the data in a scatterplot.
We can then assess the Pearson correlation coefficient (r).
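The two steps above (plot, then compute r) might look like this in R. The data frame name used_cars is hypothetical, and the values are read off the table as it survives in this transcript, assuming each price belongs to the age listed before it; rows may be missing, so the result need not match the figures reported on later slides.

```r
# Values as they survive in this transcript; the original table may have had more rows.
used_cars <- data.frame(
  Age   = c(1, 1, 2, 3, 4, 4, 5, 6, 6, 9, 9, 9, 11, 11, 13),
  Price = c(12995, 10950, 10495, 10995, 6995, 7990, 8700, 5990, 4995,
            3200, 2250, 3995, 2900, 2995, 1750)
)

# Step 1: scatterplot of price against age.
plot(used_cars$Age, used_cars$Price,
     xlab = "Age (yr)", ylab = "Advertised price ($)",
     main = "Used car prices by age")

# Step 2: Pearson correlation coefficient.
r <- cor(used_cars$Age, used_cars$Price)
print(r)
```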

7 STATISTICS MODULE: Correlation
The correlation coefficient is strongly negative (about -0.95, consistent with the R² of 89.37% reported on slide 10). Is this logical? Why or why not?
The line drawn through this scatterplot is called the "best fit" line, because it is the linear function that minimizes the sum of squared vertical distances between the output of the linear function and the observed points.
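A small sketch of what "best fit" means in the least-squares sense: any line other than the fitted one has a larger sum of squared errors. The data and the helper sse are illustrative assumptions, not part of the slides.

```r
# Illustrative data only.
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(11.9, 10.4, 9.1, 8.3, 6.2, 5.8, 3.9, 3.1)

fit <- lm(y ~ x)
b   <- unname(coef(fit))   # b[1] = intercept, b[2] = slope

# Hypothetical helper: sum of squared vertical distances for a candidate line.
sse <- function(intercept, slope) sum((y - (intercept + slope * x))^2)

sse_fit     <- sse(b[1], b[2])         # the least-squares line
sse_shifted <- sse(b[1] + 0.5, b[2])   # same slope, shifted upward
sse_tilted  <- sse(b[1], b[2] + 0.1)   # same intercept, different slope

print(c(fitted = sse_fit, shifted = sse_shifted, tilted = sse_tilted))
```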

8 STATISTICS MODULE: Linear Regression
From the previous slide, the "regression line" has been imposed onto the relationship between Price and Age of car.
The equation of this line takes the general form y = mx + b, where:
y is the dependent variable (Price)
m is the slope of the line
x is the independent variable (Age)
b is the y-intercept.
When we discuss regression models, we rewrite this equation as:
y = b0 + b1x1 + … + bnxn
where b0 is the y-intercept and b1 is the slope for x1 (with a single predictor, the slope of the line).
The slope is also the effect of a one-unit change in x on y.
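As a sketch of the last point, the following fits a model with two predictors and checks that raising x1 by one unit, with x2 held fixed, changes the prediction by exactly b1. The data frame d and its values are illustrative assumptions.

```r
# Illustrative data only.
d <- data.frame(
  x1 = c(1, 2, 3, 4, 5, 6, 7, 8),
  x2 = c(3, 1, 4, 1, 5, 9, 2, 6),
  y  = c(5.1, 6.8, 9.9, 10.7, 14.2, 18.1, 16.0, 19.8)
)

fit <- lm(y ~ x1 + x2, data = d)
b   <- coef(fit)           # named vector: (Intercept), x1, x2

# Predictions that differ only by a one-unit change in x1.
p0 <- predict(fit, newdata = data.frame(x1 = 4, x2 = 3))
p1 <- predict(fit, newdata = data.frame(x1 = 5, x2 = 3))
effect_x1 <- unname(p1 - p0)

print(b)
print(effect_x1)           # equals the x1 coefficient
```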

9 STATISTICS MODULE: Linear Regression
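mod1 <- lm(Price ~ Year, data = cars)  # cars: the used car data (Price, Year), not R's built-in cars dataset
summary(mod1)                          # coefficients, standard errors and R-squared

Results: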

10 STATISTICS MODULE: Linear Regression
From the previous slide, the model equation is presented in the form of the equation of a line:
y = -924x + 12320
From this, we would say:
For every 1 year of a car's age, there is a $924 decrease in the expected price of the car.
Every car "starts" at $12,320 (the intercept, i.e. the expected price at age 0).
If a car is 2 years old, the expected price is $10,472.
The R² value of 0.8937 is interpreted as "89.37% of the variation in the price of the cars can be explained by this linear model, where age is the only predictor".
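The arithmetic on this slide can be reproduced directly from the reported coefficients. The function name predict_price is hypothetical; the intercept (12320), slope (-924) and R² (0.8937) are the values quoted on the slide.

```r
# Hypothetical helper built from the coefficients quoted on the slide.
predict_price <- function(age, intercept = 12320, slope = -924) {
  intercept + slope * age
}

price_new  <- predict_price(0)   # 12320: the intercept
price_2yr  <- predict_price(2)   # 12320 - 2 * 924 = 10472
print(c(age_0 = price_new, age_2 = price_2yr))

# With a single predictor, R-squared is the square of Pearson's r,
# and r takes the sign of the slope.
r_from_r2 <- -sqrt(0.8937)
print(r_from_r2)
```

Predictions far outside the observed ages should be treated with caution: a straight line with a negative slope eventually predicts a negative price.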

