Linear correlation and linear regression + summary of tests Dr. Omar Al Jadaan Assistant Professor – Computer Science & Mathematics.

Recall: Covariance

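Interpreting Covariance. cov(X,Y) > 0: X and Y are positively correlated. cov(X,Y) < 0: X and Y are inversely correlated. cov(X,Y) = 0: X and Y are uncorrelated (no linear relationship). Independent variables have zero covariance, but zero covariance alone does not guarantee independence.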
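To make the three cases concrete, here is a minimal sketch of a sample covariance in plain Python. The function name `sample_covariance` and the toy data are illustrative assumptions, not part of the original slides. The last data set is a parabola: Y is completely determined by X, yet the covariance comes out as zero, which is why zero covariance on its own should not be read as independence.

```python
def sample_covariance(xs, ys):
    """Sample covariance of two equal-length samples (n - 1 denominator)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


# Toy data, made up for illustration only.
xs = [-2, -1, 0, 1, 2]
rising = [2 * x + 1 for x in xs]   # Y goes up as X goes up
falling = [-3 * x for x in xs]     # Y goes down as X goes up
parabola = [x * x for x in xs]     # Y depends on X, but not linearly

print(sample_covariance(xs, rising))    # positive
print(sample_covariance(xs, falling))   # negative
print(sample_covariance(xs, parabola))  # zero, despite the dependence
```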

Correlation coefficient. Pearson’s Correlation Coefficient is standardized covariance (unitless): $r = \dfrac{\mathrm{cov}(X,Y)}{\sqrt{\mathrm{var}(X)\,\mathrm{var}(Y)}}$

Correlation. Measures the relative strength of the linear relationship between two variables. Unit-less. Ranges between –1 and 1. The closer to –1, the stronger the negative linear relationship. The closer to 1, the stronger the positive linear relationship. The closer to 0, the weaker any linear relationship, positive or negative.
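The properties listed above can be checked numerically. The following is a minimal sketch, assuming the usual definition of Pearson's r as covariance divided by the product of the standard deviations. The function name `pearson_r` and the data are made up for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Made-up data for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]

r = pearson_r(xs, ys)
# Unit-less: measuring X in different units (here, multiplied by 1000)
# leaves r unchanged, unlike the covariance.
r_rescaled = pearson_r([1000 * x for x in xs], ys)
print(r, r_rescaled)
```

A perfectly increasing line such as `pearson_r([1, 2, 3], [2, 4, 6])` sits at the upper end of the range, and a perfectly decreasing line at the lower end.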

Scatter Plots of Data with Various Correlation Coefficients (figure: six scatter plots of Y against X, with panels labeled r = -1, r = -.6, r = 0, r = +.3, r = +1, and r = 0)

Linear Correlation (figure: scatter plots of Y against X contrasting linear relationships with curvilinear relationships)

Linear Correlation (figure: scatter plots of Y against X contrasting strong relationships with weak relationships)

Linear Correlation (figure: scatter plots of Y against X showing no relationship)

Some calculation formulas… Note: Easier computation formulas:
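The formulas on this slide did not survive in the transcript. As an assumption about what is meant, and not a reconstruction of the original slide, one common computational form of the sample correlation coefficient uses sums of squares. The symbols $SS_{xy}$, $SS_x$, and $SS_y$ are introduced here for illustration.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}}
  = \frac{SS_{xy}}{\sqrt{SS_x \, SS_y}},
\qquad
SS_{xy} = \sum_{i=1}^{n} x_i y_i - n\,\bar{x}\,\bar{y},
\quad
SS_x = \sum_{i=1}^{n} x_i^2 - n\,\bar{x}^2,
\quad
SS_y = \sum_{i=1}^{n} y_i^2 - n\,\bar{y}^2 .
```

The $n - 1$ denominators of the covariance and the two variances cancel, which is why only the sums remain.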

Sampling distribution of correlation coefficient: *note, like a proportion, the variance of the correlation coefficient depends on the correlation coefficient itself, so substitute in the estimated r. The standard error is estimated as $\sqrt{(1 - r^2)/(n - 2)}$. Under the null hypothesis of zero correlation, the sample correlation coefficient divided by this standard error follows a T-distribution with n-2 degrees of freedom (since you have to estimate the standard error).
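A minimal sketch of the test statistic this slide points to, assuming the usual estimated standard error sqrt((1 - r^2)/(n - 2)) and a null hypothesis of zero correlation. The function name `correlation_t_statistic` and the example numbers are illustrative assumptions. Turning the statistic into a p-value needs a t-distribution table or a statistics library, which is left out here.

```python
import math


def correlation_t_statistic(r, n):
    """Return (t, degrees of freedom) for testing zero correlation.

    r is the sample correlation coefficient and n the number of (x, y) pairs.
    """
    if n < 3:
        raise ValueError("need at least 3 pairs so that n - 2 > 0")
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    # The standard error depends on r itself, so the estimate is plugged in.
    standard_error = math.sqrt((1 - r * r) / (n - 2))
    return r / standard_error, n - 2


# Hypothetical example: r = 0.6 observed on n = 18 pairs.
t, df = correlation_t_statistic(0.6, 18)
print(t, df)
```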

What is “Linear”? Remember this: Y=mX+B? Here m is the slope and B is the intercept.

What’s Slope? A slope of 2 means that every 1-unit change in X yields a 2-unit change in Y.

Simple linear regression. The linear regression model: Love of Math = 5 + .01*math SAT score, where 5 is the intercept and .01 is the slope (P=.22; not significant).

Prediction If you know something about X, this knowledge helps you predict something about Y. (Sound familiar?…sound like conditional probabilities?)

EXAMPLE: The distribution of baby weights at Stanford ~ N(3400, 360000), that is, mean 3400 grams and variance 360000 (standard deviation 600 grams). Your “best guess” at a random baby’s weight, given no information about the baby, is what? 3400 grams. But what if you have relevant information? Can you make a better guess?

Predictor variable: X=gestation time. Assume that babies that gestate for longer are born heavier, all other things being equal. Pretend (at least for the purposes of this example) that this relationship is linear. Example: suppose a one-week increase in gestation, on average, leads to a 100-gram increase in birth-weight.

Y depends on X. Y=birth-weight (g), X=gestation time (weeks). The best fit line is chosen such that the sum of the squared (why squared?) distances of the points (the $Y_i$’s) from the line is minimized. Or mathematically (remember max and mins from calculus): set the derivatives of $\sum_i \left(Y_i - (mX_i + b)\right)^2$ with respect to $m$ and $b$ equal to 0.
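Setting those derivatives to zero gives the familiar closed-form solution: the slope is the sum of cross-products divided by the sum of squares of X, and the intercept makes the line pass through the point of means. A minimal sketch follows. The function name `fit_line` and the six data points are made up for illustration and are not data from the slides.

```python
def fit_line(xs, ys):
    """Least-squares slope m and intercept b for the line y = m*x + b."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx            # slope that minimizes the squared distances
    b = mean_y - m * mean_x  # the fitted line passes through (mean_x, mean_y)
    return m, b


# Hypothetical birth-weights (g) scattered around 100 g per week of gestation.
weeks = [20, 20, 30, 30, 40, 40]
grams = [1900, 2100, 2950, 3050, 3900, 4100]

m, b = fit_line(weeks, grams)
print(m, b)        # fitted slope and intercept
print(m * 30 + b)  # predicted birth-weight at 30 weeks
```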

Prediction A new baby is born that had gestated for just 30 weeks. What’s your best guess at the birth-weight? Are you still best off guessing 3400? NO!

At 30 weeks… (figure: Y=birth-weight (g) plotted against X=gestation time (weeks), with X=30 and Y=3000 marked on the axes)

At 30 weeks… (x,y)=(30,3000) (figure: Y=birth-weight (g) plotted against X=gestation time (weeks), with the point (30, 3000) marked on the line)

At 30 weeks… The babies that gestate for 30 weeks appear to center around a weight of 3000 grams. In Math-Speak… E(Y/X=30 weeks)=3000 grams. Note the conditional expectation.

But… Note that not every Y-value ($Y_i$) sits on the line. There’s variability. $Y_i$ = 3000 + random error$_i$. In fact, babies that gestate for 30 weeks have birth-weights that center at 3000 grams, but vary around 3000 with some variance $\sigma^2$. Approximately what distribution do birth-weights follow? Normal. Y/X=30 weeks ~ $N(3000, \sigma^2)$

And, if X=20, 30, or 40… (figure: Y=birth-weight (g) plotted against X=gestation time (weeks), with X=20, 30, and 40 marked)

If X=20, 30, or 40… Y/X=40 weeks ~ $N(4000, \sigma^2)$; Y/X=30 weeks ~ $N(3000, \sigma^2)$; Y/X=20 weeks ~ $N(2000, \sigma^2)$ (figure: Y=baby weights (g) plotted against X=gestation times (weeks), with a normal curve drawn at X=20, 30, and 40)

Mean values fall on the line: E(Y/X=40 weeks)=4000; E(Y/X=30 weeks)=3000; E(Y/X=20 weeks)=2000. In general, E(Y/X) = $\mu_{Y/X}$ = 100 grams/week * X weeks.

Linear Regression Model. Y’s are modeled… $Y_i$ = 100*$X_i$ + random error$_i$. The term 100*$X_i$ is fixed (exactly on the line); the random error follows a normal distribution.
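The model can be simulated to see both parts at work: a fixed part that sits on the line and a normally distributed error around it. This is a sketch under stated assumptions. The slope of 100 grams per week comes from the example above, but the error standard deviation `SIGMA = 200.0` is a hypothetical value chosen for illustration (the slides leave the variance unspecified), and `simulate_birth_weight` is an illustrative name.

```python
import random
import statistics

SLOPE = 100.0  # grams per week of gestation, as in the example above
SIGMA = 200.0  # hypothetical error standard deviation in grams (assumption)


def simulate_birth_weight(weeks, rng):
    """Draw one birth-weight: fixed part on the line plus normal random error."""
    fixed_part = SLOPE * weeks
    random_error = rng.gauss(0.0, SIGMA)
    return fixed_part + random_error


rng = random.Random(0)  # seeded so that the run is repeatable
weights_at_30 = [simulate_birth_weight(30, rng) for _ in range(10_000)]

mean_at_30 = statistics.mean(weights_at_30)
sd_at_30 = statistics.stdev(weights_at_30)
print(round(mean_at_30), round(sd_at_30))
```

With many simulated babies at 30 weeks, the sample mean should land close to the 3000 grams on the line and the sample standard deviation close to the assumed `SIGMA`.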
