MA-250 Probability and Statistics


MA-250 Probability and Statistics Nazar Khan PUCIT Lecture 7

Regression For bivariate data, we have studied that the correlation coefficient measures the strength of the linear association between the two variables. Now we want to know how to predict the value of one variable from the other variable. The method for doing this is called the regression method.

Regression Describes how one variable y depends on another variable x. Example: taller men usually weigh more. By how much, on average, does weight increase for a unit increase in height? Data were collected for 471 men aged 18-24 and summarised using a scatter diagram. The scatter diagram itself can be summarised using 5 statistics: the 2 means, the 2 SDs, and the correlation coefficient r.
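As a sketch of how such a summary is computed, the 5 statistics can be obtained from raw data like this (the heights and weights below are made-up illustrative values, not the 471-man dataset from the slide):

```python
from statistics import mean, pstdev

# Hypothetical sample, NOT the slide's 471-man dataset
heights = [64, 66, 68, 70, 72, 74]        # inches
weights = [130, 142, 155, 160, 172, 178]  # pounds

mh, mw = mean(heights), mean(weights)          # the 2 means (point of averages)
sd_h, sd_w = pstdev(heights), pstdev(weights)  # the 2 (population) SDs
# correlation coefficient: average product of values in standard units
r = sum((h - mh) * (w - mw) for h, w in zip(heights, weights)) \
    / (len(heights) * sd_h * sd_w)
# (mh, mw), sd_h, sd_w, and r are the 5-statistic summary of the scatter diagram
```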

The SD line: all points on this line are an equal number of SDs away from the point of averages (µh, µw). Most men who are 1 SD above average in height are less than 1 SD above average in weight. Why?

The Regression Method Most men who are 1 SD above average in height are less than 1 SD above average in weight. Why? Because height and weight are not perfectly correlated (r = 0.4). When x increases by 1 SDx, the associated increase in y is only r × 1 SDy on average. For r = 1, a 1 SDx increase in x would imply a 1 SDy increase in y. For r = 0, a 1 SDx increase in x would imply a 0 SDy increase in y. This is called the regression method for determining the average value of y from a given value of x. In general, when x increases by k SDx, the associated increase in y is r × k SDy.
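A minimal sketch of the regression method. The r = 0.4 is taken from the slide; the means and SDs below are hypothetical summary statistics, chosen only to make the example concrete:

```python
# Hypothetical summary statistics for height (x) and weight (y)
mean_x, sd_x = 70.0, 3.0     # inches (assumed, not from the slide)
mean_y, sd_y = 162.0, 30.0   # pounds (assumed, not from the slide)
r = 0.4                      # correlation from the slide

def regression_estimate(x):
    """Average y for a given x: a k-SD change in x goes with an r*k-SD change in y."""
    k = (x - mean_x) / sd_x           # x expressed in SDs above/below average
    return mean_y + r * k * sd_y      # move only r*k SDs in y, not k SDs

# A man 1 SD above average height (73 in) is estimated to be only
# r = 0.4 SDs above average weight: 162 + 0.4 * 30 = 174 lbs
estimate_73 = regression_estimate(73.0)
```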

On average, the weight of men who are 76 inches tall will be … ?

The Regression Line The regression line for y on x estimates the average value of y corresponding to each value of x.

Determining weight from height [Figure: scatter diagram showing the point of averages (µh, µw), the SD line, and the vertical strip of men with height µh + 1 SD.] The SD line: all points on this line are an equal number of SDs away from the point of averages (µh, µw). We learned in the last lecture that a scatter diagram can be summarised using these 5 statistics. Most men who are 1 SD above average in height are less than 1 SD above average in weight. Why?

Regression Remember, all the analysis so far has been for linear relationships between variables. If the variables are related in a non-linear way, then correlation and regression do not make sense. Linear association between the two variables: correlation and regression make sense. Non-linear association between the two variables: correlation and regression do not make sense.
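A quick check of this point: r can be exactly zero even when y is perfectly (but non-linearly) determined by x, so correlation and the regression line tell you nothing useful in such cases. A small illustrative example:

```python
from statistics import mean, pstdev

# Perfect non-linear dependence: y = x^2 on a symmetric range of x values
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x * x for x in xs]

mx, my = mean(xs), mean(ys)
r = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
    / (len(xs) * pstdev(xs) * pstdev(ys))
# r is 0 here even though y is completely determined by x:
# correlation only measures LINEAR association
```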

[Figure: predicted average weight vs. actual average weight.]

The Regression Fallacy What does this prove? Nothing; this is just the regression effect. In most try-retry scenarios, the bottom group will on average improve and the top group will on average fall back. Why? Chance error: Observation = True value + Chance error. Thinking that the change reflects something real rather than chance error is the regression fallacy.

The Regression Fallacy Your exam score = true score + luck. If you score very low on the first try, you might have been unlucky on some questions (a negative chance error). Next time there is less chance of being equally unlucky again. So, on average, the lower-scoring group will improve. Question: explain the case of a very high score on the first try.
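The regression effect can be demonstrated with a small simulation of a try-retry scenario. The score scale and noise level below are arbitrary assumptions, used only to illustrate the mechanism:

```python
import random
from statistics import mean

random.seed(0)
n = 10_000
# Hypothetical true scores plus independent chance error on each try
true_scores = [random.gauss(70, 10) for _ in range(n)]
try1 = [t + random.gauss(0, 8) for t in true_scores]
try2 = [t + random.gauss(0, 8) for t in true_scores]

by_first_try = sorted(zip(try1, try2))
bottom = by_first_try[: n // 10]     # lowest 10% on the first try
top = by_first_try[-(n // 10):]      # highest 10% on the first try

bottom_gain = mean(b for _, b in bottom) - mean(a for a, _ in bottom)
top_gain = mean(b for _, b in top) - mean(a for a, _ in top)
# bottom_gain > 0: the bottom group improves on the retry
# top_gain < 0: the top group falls back, purely due to chance error
```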

2 Regression Lines Regression of weight on height Regression of height on weight

The r.m.s. error to the regression line Regression error = actual value − predicted value (also called a residual). r.m.s. error = sqrt( (error_1² + error_2² + … + error_n²) / n ). Compare with the SD formula: just as the SD tells you how far typical values are from the average, the r.m.s. error tells you how far typical points are above or below the regression line.

The r.m.s. error to the regression line Generally, the r.m.s. error also follows the 68-95-99.7 rule: about 68% of the points lie within 1 r.m.s. error of the regression line, and about 95% lie within 2.

A quicker r.m.s. formula A quicker alternative formula for the r.m.s. error: r.m.s. error = sqrt(1 − r²) × SDy, where SDy is the SD of the y values.
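The two formulas can be checked against each other on a small made-up dataset (the numbers below are illustrative only):

```python
import math
from statistics import mean, pstdev

# Small illustrative dataset
xs = [62, 64, 66, 68, 70, 72, 74, 76]
ys = [118, 135, 128, 150, 158, 152, 170, 181]

mx, my = mean(xs), mean(ys)
sd_x, sd_y = pstdev(xs), pstdev(ys)
r = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) * sd_x * sd_y)

# Regression line for y on x: slope r*SDy/SDx through the point of averages
def predict(x):
    return my + r * (sd_y / sd_x) * (x - mx)

# Direct formula: root-mean-square of the residuals
residuals = [y - predict(x) for x, y in zip(xs, ys)]
rms_direct = math.sqrt(sum(e * e for e in residuals) / len(residuals))

# Quicker formula from the slide
rms_quick = math.sqrt(1 - r * r) * sd_y
# the two formulas agree (exactly, up to floating-point rounding)
```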

Vertical Strips For oval-shaped scatters, prediction errors will be similar in size in every vertical strip, and in fact all along the regression line.

Normal approximation within a strip In any narrow vertical strip, think of the y values as a new dataset. Compute its mean using the regression method. Its SD is roughly equal to the r.m.s. error to the regression line. This allows you to use the normal approximation!

Normal approximation within a strip

For part (b), use the normal approximation: estimate the new average using the regression method (answer: 71), estimate the new SD using the r.m.s. error approximation (answer: 8), then find the appropriate area under the normal curve.
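Using the slide's answers (strip average 71, strip SD ≈ r.m.s. error 8), the final step looks like this. The cutoff of 80 is a made-up example value, since the original question is not shown:

```python
from statistics import NormalDist

strip_avg = 71.0  # new average from the regression method (slide's answer)
strip_sd = 8.0    # new SD, roughly the r.m.s. error to the regression line (slide's answer)

# Hypothetical question: what fraction of the strip lies below 80?
frac_below_80 = NormalDist(mu=strip_avg, sigma=strip_sd).cdf(80.0)
# z = (80 - 71) / 8 = 1.125, so the area below is roughly 0.87
```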

Summary For variables x and y with correlation coefficient r, a k SD increase in x is associated with an r × k SD increase in y. Plotting these regression estimates of y from x gives the regression line for y on x, which can be used to predict y from x.

Summary Regression effect: in most try-retry scenarios, the bottom group improves and the top group falls back. This is due to chance errors. To think otherwise is known as the regression fallacy. There are 2 regression lines on a scatter diagram: one for y on x and one for x on y.

Summary The r.m.s. error to the regression line measures the accuracy of the regression predictions. It generally follows the 68-95-99.7 rule. For oval-shaped scatter diagrams, data inside a narrow vertical strip can be approximated using a normal curve.