Simple Linear Regression

In many scientific investigations, one wants to find out how one quantity is related to another, for example the distance traveled and the time spent driving, or a person's age and height. Generally, there are two types of relationship between a pair of variables: a deterministic relationship and a probabilistic relationship.

Deterministic Relationship
$$S = S_0 + vt$$
Here S is the distance traveled, $S_0$ is the initial distance, v is the speed, and t is the time traveled. Plotted as distance against time, this is a straight line with intercept $S_0$ and slope v.

Probabilistic Relationship
On many occasions we face a different situation: one variable is related to another, but not exactly, as with age and height. Here we cannot predict a person's height from their age with certainty, as we could predict distance from time in the deterministic case.

Linear Regression
Statistically, the way to characterize a relationship between two variables like the one shown before is to use a linear model:
$$y = a + bx + \varepsilon$$
Here x is called the independent variable, y is called the dependent variable, $\varepsilon$ is the error term, a is the intercept, and b is the slope.
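To make the role of the error term concrete, here is a minimal sketch that generates y values from the linear model. The function name `simulate_linear_data`, the parameter values, and the choice of a normally distributed error are all assumptions made for illustration; the slides do not specify a distribution for the error.

```python
import random


def simulate_linear_data(a, b, xs, noise_sd, seed=0):
    """Return y = a + b*x + error for each x.

    Assumption: the error is drawn from Normal(0, noise_sd).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return [a + b * x + rng.gauss(0.0, noise_sd) for x in xs]


xs = [1, 2, 3, 4, 5]

# With no noise the relationship is deterministic: every point is on the line.
ys_exact = simulate_linear_data(2.0, 0.5, xs, noise_sd=0.0)

# With noise the relationship is probabilistic: points scatter around the line.
ys_noisy = simulate_linear_data(2.0, 0.5, xs, noise_sd=1.0)

print(ys_exact)
print(ys_noisy)
```

Setting `noise_sd` to zero recovers the deterministic case, which is one way to see that the error term is the only difference between the two kinds of relationship.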

Least Squares Line
Given some pairs of data for the independent and dependent variables, we may draw many lines through the scattered points. The least squares line is the line through the points that minimizes the vertical distances between the points and the line, measured as a sum of squares. In other words, the least squares line makes the overall contribution of the error term $\varepsilon$ as small as possible.

Least Squares Method
For notational convenience, the line that fits through the points is often written as
$$\hat{y} = a + bx$$
The linear model we wrote before is
$$y = a + bx + \varepsilon$$
If we use the value on the line, $\hat{y}$, to estimate y, the difference is $(y - \hat{y})$. For points above the line the difference is positive, while for points below the line it is negative.

Error Sum of Squares
For some points the values of $(y - \hat{y})$ are positive (points above the line) and for others they are negative (points below the line). If we add all of these up, the positive and negative values can cancel. Therefore, we square each difference and sum the squares. This sum is called the Error Sum of Squares (SSE):
$$SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
The constants a and b are estimated so that the error sum of squares is minimized, hence the name least squares.
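The cancellation problem and the error sum of squares can be sketched in a few lines. The data values and the helper names `residual_sum` and `sse` are hypothetical, chosen only for illustration; for this particular data the least squares line happens to have intercept 2.2 and slope 0.6.

```python
def residual_sum(a, b, xs, ys):
    """Plain sum of the differences (y - y_hat); signs can cancel."""
    return sum(y - (a + b * x) for x, y in zip(xs, ys))


def sse(a, b, xs, ys):
    """Error sum of squares for the line y_hat = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# Least squares line for this data (a = 2.2, b = 0.6).
print(residual_sum(2.2, 0.6, xs, ys))  # close to zero: the signs cancel
sse_fit = sse(2.2, 0.6, xs, ys)

# Some other candidate line through the same points.
sse_other = sse(2.0, 0.5, xs, ys)

print(sse_fit, sse_other)
```

The plain sum of differences is essentially zero for the fitted line even though no point lies exactly on it, which is why the squared version is used; the other candidate line gives a larger SSE.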

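Estimating Regression Coefficients
If we solve for the regression coefficients a and b by minimizing SSE, the solutions are
$$b = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad a = \bar{y} - b\bar{x}$$
where $x_i$ is the ith value of the independent variable, $y_i$ is the value of the dependent variable corresponding to $x_i$, and $\bar{x}$ and $\bar{y}$ are the mean values of x and y.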
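As a rough sketch, the closed-form least squares solution for the slope and intercept can be computed directly from the means. The function name `fit_line` and the data are hypothetical and serve only to illustrate the arithmetic.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least squares line y_hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of cross-products and sum of squares about the means.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx          # slope
    a = y_bar - b * x_bar  # intercept: the line passes through (x_bar, y_bar)
    return a, b


# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = fit_line(xs, ys)
print(a, b)
```

Note that the sketch does not guard against the case where all x values are equal, in which the denominator is zero and no unique slope exists.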

Interpretation of a and b
The constant a is the intercept, the value of $\hat{y}$ when x = 0. The constant b is the slope, which gives the change in y (the dependent variable) for a change of one unit in x (the independent variable). If b > 0, x and y are positively correlated, meaning y increases as x increases and decreases as x decreases. If b < 0, x and y are negatively correlated, meaning y decreases as x increases.

Correlation Coefficient
Although we now have a regression line to describe the relationship between the dependent variable and the independent variable, the line alone is not enough to characterize the relationship between x and y.

[Figure: two scatter plots of y against x, labeled (1) and (2), each with a fitted line.]

The relationship between x and y in (1) is clearly stronger than that in (2), even though the line in (2) is still the best-fit line. The statistic that characterizes the strength of the relationship is the correlation coefficient r, or its square, $R^2$.
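The point that two data sets can share a best-fit line yet differ in strength of relationship can be sketched as follows. Both data sets are hypothetical and were constructed so that their least squares slopes coincide; the function names `slope` and `correlation` are likewise illustrative, and the correlation is computed with the usual Pearson formula.

```python
import math


def slope(xs, ys):
    """Least squares slope b."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


def correlation(xs, ys):
    """Pearson correlation coefficient r."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: same fitted line, different scatter around it.
xs = [1, 2, 3, 4, 5]
ys_tight = [2.1, 2.8, 4.1, 5.0, 6.0]  # points close to the line
ys_loose = [3.0, 1.0, 5.0, 5.0, 6.0]  # points far from the line

print(slope(xs, ys_tight), slope(xs, ys_loose))
print(correlation(xs, ys_tight), correlation(xs, ys_loose))
```

The slopes agree while the correlation is much closer to 1 for the tightly clustered data, which is the distinction the two graphs are meant to show.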

How Is $R^2$ Calculated?
If we use the mean $\bar{y}$ to represent y, then the error is $(y - \bar{y})$. However, we used $\hat{y}$ to represent y, so the error is reduced to $(y - \hat{y})$. Thus $(\hat{y} - \bar{y})$ is the improvement. This is true for every point in the graph. To measure the total improvement, we would sum the improvements $(\hat{y} - \bar{y})$ over all points. Here we face the same situation as when calculating the variance: positive and negative values cancel. So we square each difference and sum the squared differences over all points.

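R Square
The sum of the squared improvements is the Regression Sum of Squares (SSR), and the sum of the squared deviations of y from its mean is the Total Sum of Squares (SST):
$$SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2, \qquad SST = \sum_{i=1}^{n} (y_i - \bar{y})^2$$
We already calculated SSE (the Error Sum of Squares) while estimating a and b. In fact, the following relationship holds:
$$SST = SSR + SSE$$
R square is then
$$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$$
R square indicates the proportion of the variance in y explained by the regression.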
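The decomposition of the total sum of squares into regression and error parts can be checked numerically. This is a minimal sketch on hypothetical data; the function name `sums_of_squares` is an assumption, and the line is fitted with the usual least squares formulas for slope and intercept.

```python
def sums_of_squares(xs, ys):
    """Return (SST, SSR, SSE) for the least squares line through the data."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    a = y_bar - b * x_bar
    y_hat = [a + b * x for x in xs]
    sst = sum((y - y_bar) ** 2 for y in ys)               # total
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)          # explained by the line
    sse = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))  # left over
    return sst, ssr, sse


# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

sst, ssr, sse = sums_of_squares(xs, ys)
r_squared = ssr / sst

print(sst, ssr, sse)
print(r_squared)
```

For this data the line explains about 60 percent of the variation in y. The identity between the three sums holds for a least squares line with an intercept; it need not hold for an arbitrary line drawn through the points.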

A Simple Linear Regression Example
The following survey data show how much a family spends on food in relation to household income (x = income in thousands of dollars, y = percent of income left after spending on food).
