Linear correlation and linear regression + summary of tests Dr. Omar Al Jadaan Assistant Professor – Computer Science & Mathematics.

Presentation transcript:


Recall: Covariance

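Interpreting Covariance
cov(X,Y) > 0: X and Y are positively correlated
cov(X,Y) < 0: X and Y are inversely correlated
cov(X,Y) = 0: X and Y are uncorrelated (no linear association). Independent variables always have zero covariance, but zero covariance does not by itself imply independence.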
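The sign of the covariance can be checked on a few toy samples. The sketch below is only an illustration: the function name and the data are hypothetical, and it assumes the usual sample covariance with an n - 1 denominator. The third sample shows that a covariance of zero only rules out a linear association, since Y is completely determined by X there.

```python
def sample_covariance(xs, ys):
    """Sample covariance: sum((x - mean_x) * (y - mean_y)) / (n - 1)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


# Hypothetical toy data, chosen so the arithmetic is easy to follow.
xs = [-2, -1, 0, 1, 2]
rising = [2 * x + 1 for x in xs]    # Y increases with X
falling = [-3 * x for x in xs]      # Y decreases as X increases
parabola = [x * x for x in xs]      # Y depends on X, but not linearly

cov_rising = sample_covariance(xs, rising)      # positive
cov_falling = sample_covariance(xs, falling)    # negative
cov_parabola = sample_covariance(xs, parabola)  # zero, despite the dependence

print(cov_rising, cov_falling, cov_parabola)
```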

Correlation coefficient
Pearson’s Correlation Coefficient is standardized covariance (unitless): $r = \dfrac{\operatorname{cov}(X,Y)}{\sqrt{\operatorname{var}(X)\,\operatorname{var}(Y)}}$

Correlation
Measures the relative strength of the linear relationship between two variables
Unit-less
Ranges between -1 and 1
The closer to -1, the stronger the negative linear relationship
The closer to 1, the stronger the positive linear relationship
The closer to 0, the weaker any linear relationship, positive or negative

Scatter Plots of Data with Various Correlation Coefficients
[Figure: six scatter plots of Y against X, with panels r = -1, r = -0.6, r = 0, r = +0.3, r = +1, and a second panel with r = 0]

Linear Correlation
[Figure: scatter plots of Y against X contrasting linear relationships with curvilinear relationships]

Linear Correlation
[Figure: scatter plots of Y against X contrasting strong relationships with weak relationships]

Linear Correlation
[Figure: scatter plots of Y against X showing no relationship]

Some calculation formulas… Note: Easier computation formulas:
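The formulas on this slide did not survive in the transcript. As a stand-in, the sketch below uses the standard sum-of-squares form of Pearson's r and its usual computational shortcut, and checks that they agree; whether these match the slide exactly is an assumption, and the function names and data are hypothetical.

```python
import math


def pearson_r_definitional(xs, ys):
    """r = SSxy / sqrt(SSx * SSy), with sums of squared deviations from the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    ss_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return ss_xy / math.sqrt(ss_x * ss_y)


def pearson_r_computational(xs, ys):
    """Same r, using the shortcut SSxy = sum(x*y) - n*mean_x*mean_y (and likewise for SSx, SSy)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    ss_xy = sum(x * y for x, y in zip(xs, ys)) - n * mean_x * mean_y
    ss_x = sum(x * x for x in xs) - n * mean_x ** 2
    ss_y = sum(y * y for y in ys) - n * mean_y ** 2
    return ss_xy / math.sqrt(ss_x * ss_y)


# Hypothetical data: SSxy = 6, SSx = 10, SSy = 6, so r = 6 / sqrt(60).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r_def = pearson_r_definitional(xs, ys)
r_comp = pearson_r_computational(xs, ys)
print(r_def, r_comp)
```

The shortcut form avoids a second pass over the data to subtract the means, which is why it is convenient for hand calculation.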

Sampling distribution of correlation coefficient: $SE(r) = \sqrt{\dfrac{1 - r^2}{n - 2}}$
*Note: like a proportion, the variance of the correlation coefficient depends on the correlation coefficient itself, so substitute in the estimated r.
Under the null hypothesis of zero correlation, the test statistic r / SE(r) follows a t-distribution with n-2 degrees of freedom (since you have to estimate the standard error).
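A minimal sketch of the resulting significance test, assuming the standard error SE(r) = sqrt((1 - r^2) / (n - 2)) and a null hypothesis of zero correlation. The function name and the example values (r = 0.5 from n = 27 pairs) are hypothetical; the p-value lookup is left to a t table or a statistics library.

```python
import math


def correlation_t_statistic(r, n):
    """Return (standard error, t statistic, degrees of freedom) for testing rho = 0."""
    if n < 3:
        raise ValueError("need at least 3 pairs so that n - 2 > 0")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    se = math.sqrt((1.0 - r * r) / (n - 2))  # estimated r is substituted into the SE
    return se, r / se, n - 2


# Hypothetical example: r = 0.5 observed in n = 27 pairs.
se, t_stat, df = correlation_t_statistic(0.5, 27)
print(se, t_stat, df)
```

The statistic is then compared against a t-distribution with `df` degrees of freedom.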

What is “Linear”? Remember this: Y = mX + B? Here m is the slope and B is the intercept.

What’s Slope? A slope of 2 means that every 1-unit change in X yields a 2-unit change in Y.

Simple linear regression
The linear regression model: Love of Math = (intercept) + (slope)*math SAT score
P=.22; not significant

Prediction If you know something about X, this knowledge helps you predict something about Y. (Sound familiar?…sound like conditional probabilities?)

EXAMPLE The distribution of baby weights at Stanford ~ N(3400, ) Your “Best guess” at a random baby’s weight, given no information about the baby, is what? 3400 grams But, what if you have relevant information? Can you make a better guess?

Predictor variable X=gestation time Assume that babies that gestate for longer are born heavier, all other things being equal. Pretend (at least for the purposes of this example) that this relationship is linear. Example: suppose a one-week increase in gestation, on average, leads to a 100-gram increase in birth-weight

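Y depends on X
[Figure: scatter plot of Y = birth-weight (g) against X = gestation time (weeks), with the best-fit line]
Best fit line is chosen such that the sum of the squared (why squared?) distances of the points ($Y_i$’s) from the line is minimized. Or mathematically… (remember max and mins from calculus)… set the derivatives with respect to m and b to zero: $\dfrac{\partial}{\partial m}\sum_i \left(Y_i - (mX_i + b)\right)^2 = 0$ and $\dfrac{\partial}{\partial b}\sum_i \left(Y_i - (mX_i + b)\right)^2 = 0$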
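Setting those derivatives to zero gives the familiar closed-form solution: slope = SSxy / SSx and intercept = mean(Y) - slope * mean(X). The sketch below applies it to three made-up (gestation weeks, birth-weight) points. The data and function names are hypothetical, chosen only so the fitted line comes out at 100 grams per week.

```python
def fit_least_squares(xs, ys):
    """Closed-form simple linear regression: returns (slope m, intercept b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    ss_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    if ss_x == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    slope = ss_xy / ss_x
    return slope, mean_y - slope * mean_x


def sum_squared_errors(xs, ys, slope, intercept):
    """Sum of squared vertical distances of the points from the line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data: gestation time in weeks and birth-weight in grams.
weeks = [20, 30, 40]
grams = [1950, 3100, 3950]

m, b = fit_least_squares(weeks, grams)
best_sse = sum_squared_errors(weeks, grams, m, b)
print(m, b, best_sse)

# Any other line does worse, which is what "best fit" means here.
worse_sse = sum_squared_errors(weeks, grams, m + 1, b)
print(worse_sse > best_sse)
```

Squaring the distances keeps positive and negative errors from cancelling and makes the minimization a smooth calculus problem with a single solution.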

Prediction A new baby is born that had gestated for just 30 weeks. What’s your best guess at the birth-weight? Are you still best off guessing 3400? NO!

At 30 weeks…
[Figure: Y = birth-weight (g) against X = gestation time (weeks), highlighting X = 30 weeks]

At 30 weeks…
[Figure: Y = birth-weight (g) against X = gestation time (weeks), with the point (x, y) = (30, 3000) marked on the line]

At 30 weeks… The babies that gestate for 30 weeks appear to center around a weight of 3000 grams. In Math-Speak… $E(Y \mid X = 30\text{ weeks}) = 3000$ grams. Note the conditional expectation.

But… Note that not every Y-value ($Y_i$) sits on the line. There’s variability: $Y_i$ = (value on the line) + random error$_i$. In fact, babies that gestate for 30 weeks have birth-weights that center at 3000 grams, but vary around 3000 with some variance $\sigma^2$. Approximately what distribution do birth-weights follow? Normal. $Y \mid X = 30\text{ weeks} \sim N(3000, \sigma^2)$

And, if X=20, 30, or 40…
[Figure: Y = birth-weight (g) against X = gestation time (weeks), highlighting X = 20, 30, and 40 weeks]

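If X=20, 30, or 40…
[Figure: Y = baby weights (g) against X = gestation times (weeks), with a normal curve centered on the line at each X]
$Y \mid X = 40\text{ weeks} \sim N(4000, \sigma^2)$
$Y \mid X = 30\text{ weeks} \sim N(3000, \sigma^2)$
$Y \mid X = 20\text{ weeks} \sim N(2000, \sigma^2)$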

Mean values fall on the line
$E(Y \mid X = 40\text{ weeks}) = 4000$
$E(Y \mid X = 30\text{ weeks}) = 3000$
$E(Y \mid X = 20\text{ weeks}) = 2000$
$E(Y \mid X) = \mu_{Y \mid X} = 100\text{ grams/week} \times X\text{ weeks}$

Linear Regression Model
Y’s are modeled… $Y_i = 100 \cdot X_i + \text{random error}_i$
The term $100 \cdot X_i$ is fixed (exactly on the line); the random error follows a normal distribution.
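The model can be simulated to see both pieces at work. In the sketch below the slope of 100 grams per week comes from the slides, while the error standard deviation of 300 grams, the sample size, and the seed are assumptions made purely for illustration (the slides do not give a numeric value for sigma at this point).

```python
import random
import statistics

SLOPE_G_PER_WEEK = 100.0  # from the slides: 100 grams per week of gestation
SIGMA_G = 300.0           # hypothetical error standard deviation, not from the slides


def simulate_birth_weights(weeks, n, rng):
    """Draw n birth-weights: fixed part on the line plus a normal random error."""
    fixed_part = SLOPE_G_PER_WEEK * weeks
    return [fixed_part + rng.gauss(0.0, SIGMA_G) for _ in range(n)]


rng = random.Random(0)  # seeded so the run is reproducible
sample_means = {}
for weeks in (20, 30, 40):
    weights = simulate_birth_weights(weeks, 10_000, rng)
    sample_means[weeks] = statistics.mean(weights)
    # Each sample mean should land near the line: 100 * weeks.
    print(weeks, round(sample_means[weeks]), round(statistics.stdev(weights)))
```

Individual simulated weights scatter around the line, but their averages at each gestation time sit close to 2000, 3000, and 4000 grams, which is the sense in which the mean values fall on the line.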