
1 Multiple Regression Petter Mostad 2005.10.17

2 Review: Simple linear regression. We define a model $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$, where the $\varepsilon_i$ are independent (normally distributed) with equal variance $\sigma^2$. We can then use data to estimate the model parameters and to make statements about their uncertainty.

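3 Multiple regression model: $y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \beta_K x_{Ki} + \varepsilon_i$. The errors $\varepsilon_i$ are independent random (normal) variables with expectation zero and variance $\sigma^2$. The parameters are estimated by minimizing the sum of squared errors, as before.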

4 Choice of independent variables for the model: The candidates are the measured variables that the response variable might depend on. Do we expect the relationship to the response variable to be linear? The explanatory (independent) variables $x_{1i}, x_{2i}, \dots, x_{Ki}$ cannot be linearly related to each other (no variable may be an exact linear combination of the others).

5 Questions asked in connection with regression: What would be a prediction of the dependent variable given new values of the independent variables? How do the various independent variables influence the dependent variable? – A difficult question, as the answer depends on the WHOLE model!

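6 Least squares estimation: The least squares estimates of $\beta_0, \beta_1, \dots, \beta_K$ are the values $b_0, b_1, \dots, b_K$ minimizing $SSE = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_{1i} - \dots - b_K x_{Ki})^2$. They can be computed with formulas similar to, but more complex than, those for simple regression.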
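
A minimal sketch of the least squares computation, assuming the data are held in plain Python lists (the helper name `ols` and the example numbers are hypothetical). It solves the normal equations $(X^\top X)\,b = X^\top y$, one standard way to obtain $b_0, \dots, b_K$; each row of `X` starts with a 1 so that the first coefficient is the constant term.

```python
def ols(X, y):
    """Least squares coefficients b solving the normal equations (X'X) b = X'y.

    Each row of X must start with 1 so that b[0] is the constant term b_0.
    Solved with Gauss-Jordan elimination and partial pivoting (plain Python).
    """
    n, p = len(X), len(X[0])
    # Augmented matrix [X'X | X'y], size p x (p + 1)
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
         + [sum(X[i][r] * y[i] for i in range(n))]
         for r in range(p)]
    for col in range(p):
        # Move the row with the largest pivot into place for numerical stability
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        # Eliminate this column from every other row
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * c for a, c in zip(A[r], A[col])]
    # The left block is now diagonal, so each coefficient is one division
    return [A[r][p] / A[r][r] for r in range(p)]


# Hypothetical data generated exactly from y = 1 + 2*x1 + 3*x2; rows are [1, x1, x2]
X = [[1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 1, 1], [1, 2, 1]]
y = [1, 3, 4, 6, 8]
print(ols(X, y))  # close to [1.0, 2.0, 3.0]
```

Adding another independent variable only means adding a column to each row of `X`; the function itself does not change.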

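7 Explanatory power: Defining $SST = \sum_{i=1}^{n} (y_i - \bar{y})^2$, $SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$ and $SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$, where $\hat{y}_i = b_0 + b_1 x_{1i} + \dots + b_K x_{Ki}$, we get as before $SST = SSR + SSE$. We define the coefficient of determination $R^2 = SSR/SST$, and we also get that $R^2 = 1 - SSE/SST$.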
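
The decomposition of the total sum of squares (SST) into the regression (SSR) and error (SSE) parts can be checked numerically. This sketch, with hypothetical names and data in plain Python, fits a model by solving the normal equations and then computes all three sums. The identity SST = SSR + SSE holds for least squares fits that include a constant term, which is why every row of `X` starts with 1.

```python
def ols(X, y):
    """Least squares coefficients b solving (X'X) b = X'y by Gauss-Jordan elimination."""
    n, p = len(X), len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
         + [sum(X[i][r] * y[i] for i in range(n))]
         for r in range(p)]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))  # partial pivoting
        A[col], A[piv] = A[piv], A[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * c for a, c in zip(A[r], A[col])]
    return [A[r][p] / A[r][r] for r in range(p)]


def sums_of_squares(X, y, b):
    """Return (SST, SSR, SSE) for the fitted coefficients b."""
    y_hat = [sum(bj * xj for bj, xj in zip(b, row)) for row in X]
    y_bar = sum(y) / len(y)
    sst = sum((yi - y_bar) ** 2 for yi in y)
    ssr = sum((fi - y_bar) ** 2 for fi in y_hat)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    return sst, ssr, sse


# Hypothetical data; each row is [1, x1, x2]
X = [[1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 1, 1], [1, 2, 1]]
y = [1.1, 2.8, 4.2, 6.1, 7.9]
sst, ssr, sse = sums_of_squares(X, y, ols(X, y))
print(sst, ssr + sse)                      # equal up to rounding
print("R^2 =", ssr / sst, "=", 1 - sse / sst)
```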

8 Adjusted coefficient of determination: Adding more independent variables will generally increase SSR and decrease SSE (and can never do the opposite). Thus the coefficient of determination $R^2$ will tend to indicate that models with many variables always fit better. To avoid this effect, the adjusted coefficient of determination may be used: $\bar{R}^2 = 1 - \dfrac{SSE/(n-K-1)}{SST/(n-1)}$.
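
A small numerical illustration, using the standard formula $\bar{R}^2 = 1 - \frac{SSE/(n-K-1)}{SST/(n-1)}$ and made-up sums of squares: adding a fifth variable that lowers SSE only slightly raises $R^2$ but lowers the adjusted value. The function name is hypothetical.

```python
def r2_and_adjusted(sse, sst, n, K):
    """Return (R^2, adjusted R^2) for n observations and K independent variables."""
    r2 = 1 - sse / sst
    adj = 1 - (sse / (n - K - 1)) / (sst / (n - 1))
    return r2, adj


# Hypothetical numbers: a fifth variable reduces SSE only from 20.0 to 19.5
print(r2_and_adjusted(sse=20.0, sst=100.0, n=25, K=4))  # R^2 about 0.8, adjusted about 0.76
print(r2_and_adjusted(sse=19.5, sst=100.0, n=25, K=5))  # R^2 about 0.805, adjusted about 0.754
```

The penalty comes from dividing SSE by its degrees of freedom $n-K-1$, which shrink as variables are added.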

9 Drawing inference about the model parameters: Similar to simple regression, the statistic $t = \dfrac{b_j - \beta_j}{s_{b_j}}$ has a t distribution with $n-K-1$ degrees of freedom, where $b_j$ is the least squares estimate of $\beta_j$ and $s_{b_j}$ is its estimated standard deviation. $s_{b_j}$ is computed from SSE and the correlation between the independent variables.
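
The slide does not spell out how $s_{b_j}$ is computed. A standard matrix form is $s_{b_j}^2 = s_e^2\,[(X^\top X)^{-1}]_{jj}$ with $s_e^2 = SSE/(n-K-1)$; the inverse matrix is where the correlation between the independent variables enters. The plain-Python sketch below assumes that formula; the function names and data are hypothetical, and the t values are the statistics for $\beta_j = 0$.

```python
import math


def inverse(M):
    """Inverse of a square matrix by Gauss-Jordan elimination with partial pivoting."""
    p = len(M)
    # Augment M with the identity matrix: [M | I]
    A = [list(map(float, row)) + [float(i == j) for j in range(p)]
         for i, row in enumerate(M)]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        d = A[col][col]
        A[col] = [a / d for a in A[col]]            # scale the pivot row to 1
        for r in range(p):
            if r != col:
                f = A[r][col]
                A[r] = [a - f * c for a, c in zip(A[r], A[col])]
    return [row[p:] for row in A]                   # right block is M^-1


def coefficient_table(X, y):
    """Estimates b_j, standard errors s_bj, and t = b_j / s_bj for each coefficient."""
    n, p = len(X), len(X[0])                        # p = K + 1 (constant included)
    xtx = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)] for r in range(p)]
    xty = [sum(X[i][r] * y[i] for i in range(n)) for r in range(p)]
    inv = inverse(xtx)
    b = [sum(inv[r][c] * xty[c] for c in range(p)) for r in range(p)]
    resid = [y[i] - sum(b[j] * X[i][j] for j in range(p)) for i in range(n)]
    s2 = sum(e * e for e in resid) / (n - p)        # SSE / (n - K - 1)
    se = [math.sqrt(s2 * inv[j][j]) for j in range(p)]
    return b, se, [b[j] / se[j] for j in range(p)]


# Hypothetical data with one independent variable (K = 1); rows are [1, x]
X = [[1, x] for x in [1, 2, 3, 4, 5]]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b, se, t = coefficient_table(X, y)
print(b, se, t)
```

With $K = 1$ this reproduces the familiar simple-regression standard errors, which makes it easy to check.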

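10 Confidence intervals and hypothesis tests: A $100(1-\alpha)\%$ confidence interval for $\beta_j$ becomes $b_j \pm t_{n-K-1,\alpha/2}\, s_{b_j}$. Testing the hypothesis $H_0: \beta_j = 0$ vs $H_1: \beta_j \neq 0$ at significance level $\alpha$: – Reject $H_0$ if $b_j / s_{b_j} > t_{n-K-1,\alpha/2}$ or $b_j / s_{b_j} < -t_{n-K-1,\alpha/2}$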
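
Both the interval and the two-sided test of $\beta_j = 0$ need only $b_j$, $s_{b_j}$ and the critical value $t_{n-K-1,\alpha/2}$. Python's standard library has no t quantile function, so this sketch assumes the critical value is read from a t table and passed in (for example, $t_{20,0.025} \approx 2.086$ for 95% with 20 degrees of freedom). The function names and numbers are hypothetical.

```python
def confidence_interval(b_j, s_bj, t_crit):
    """Interval b_j +/- t_crit * s_bj, with t_crit = t_{n-K-1, alpha/2} from a table."""
    half_width = t_crit * s_bj
    return b_j - half_width, b_j + half_width


def reject_zero(b_j, s_bj, t_crit):
    """Two-sided test of H0: beta_j = 0; True means reject H0."""
    t = b_j / s_bj
    return t > t_crit or t < -t_crit


# Hypothetical estimate with n - K - 1 = 20 degrees of freedom, alpha = 0.05
print(confidence_interval(1.5, 0.5, t_crit=2.086))  # about (0.457, 2.543)
print(reject_zero(1.5, 0.5, t_crit=2.086))          # t = 3.0, so H0 is rejected
```

Note that the test rejects exactly when the confidence interval does not contain 0.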

11 Testing sets of parameters: We can also test the null hypothesis that a specific set of $r$ of the betas are simultaneously zero. The alternative hypothesis is that at least one beta in the set is nonzero. The test statistic is computed by comparing the SSE of the full model with the SSE obtained when the parameters in the set are set to zero ($SSE_R$): $F = \dfrac{(SSE_R - SSE)/r}{SSE/(n-K-1)}$, which under the null hypothesis has an F distribution with $r$ and $n-K-1$ degrees of freedom.
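
The comparison of the two SSE values can be written as the usual partial F statistic, $F = \frac{(SSE_R - SSE)/r}{SSE/(n-K-1)}$, where $SSE_R$ comes from the model without the $r$ tested variables. This sketch uses made-up SSE values and a hypothetical function name; the resulting F is then compared with a table value for $r$ and $n-K-1$ degrees of freedom.

```python
def f_statistic(sse_full, sse_restricted, r, n, K):
    """Partial F statistic for H0: a set of r of the K betas are all zero.

    sse_full: SSE of the model with all K independent variables
    sse_restricted: SSE after setting the r betas in the set to zero
    """
    return ((sse_restricted - sse_full) / r) / (sse_full / (n - K - 1))


# Hypothetical values: n = 30 observations, K = 5 variables, testing r = 2 of them
F = f_statistic(sse_full=40.0, sse_restricted=55.0, r=2, n=30, K=5)
print(F)  # 4.5 up to rounding; compare with an F table for 2 and 24 degrees of freedom
```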

12 Making predictions from the model: As in simple regression, we can use the estimated coefficients to make predictions, $\hat{y} = b_0 + b_1 x_1 + \dots + b_K x_K$. As in simple regression, the uncertainty in the predictions has two sources: – The variance around the regression estimate – The variance of the estimated regression model
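
The two sources of uncertainty are easiest to see in the simple regression case ($K = 1$), where the variance of a new observation at $x_0$ is estimated by $s_e^2 + s_e^2\left(\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_i (x_i - \bar{x})^2}\right)$: the first term is the scatter around the regression line, the second the uncertainty in the estimated line. The sketch below uses hypothetical names and data and restricts itself to $K = 1$ for brevity.

```python
import math


def predict_simple(x, y, x0):
    """Prediction at x0 with two standard errors (simple regression, K = 1).

    Returns (y_hat, se_line, se_new): se_line covers only the uncertainty in the
    estimated line; se_new adds the variance of a new observation around the line.
    """
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    s2 = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    # Variance of the estimated regression model at x0
    var_line = s2 * (1 / n + (x0 - x_bar) ** 2 / sxx)
    # Add the variance around the regression estimate for a new observation
    var_new = s2 + var_line
    return b0 + b1 * x0, math.sqrt(var_line), math.sqrt(var_new)


# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
print(predict_simple(x, y, x0=6))
```

The line term grows as $x_0$ moves away from $\bar{x}$, which is why predictions far outside the observed data are less reliable.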

13 Nonlinear transformations and models: Sometimes a linear model does not fit. Some alternatives: – Make a transformation of the y values – Make a transformation of the x values – Predict y from a combination of transformations of the x values

14 Example 1: When $y = a e^{bx}$, then $\log(y) = \log(a) + bx$. Use the standard formulas on the pairs $(x_1, \log(y_1)), (x_2, \log(y_2)), \dots, (x_n, \log(y_n))$. We get estimates for $\log(a)$ and $b$, and thus for $a$ and $b$.
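
A minimal sketch of this example, assuming the model $y = a e^{bx}$ with the natural logarithm and all $y_i > 0$. The helper names and the noise-free example data are hypothetical; `simple_ls` is just the ordinary simple regression formulas.

```python
import math


def simple_ls(u, v):
    """Intercept and slope of the least squares line v = c0 + c1 * u."""
    n = len(u)
    u_bar, v_bar = sum(u) / n, sum(v) / n
    c1 = (sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v))
          / sum((ui - u_bar) ** 2 for ui in u))
    return v_bar - c1 * u_bar, c1


def fit_exponential(x, y):
    """Fit y = a * exp(b * x) by regressing log(y) on x; every y must be > 0."""
    log_a, b = simple_ls(x, [math.log(yi) for yi in y])
    return math.exp(log_a), b


# Hypothetical noise-free data from y = 2 * exp(0.5 x)
x = [0, 1, 2, 3, 4]
y = [2 * math.exp(0.5 * xi) for xi in x]
print(fit_exponential(x, y))  # close to (2.0, 0.5)
```

With noisy data, this minimizes squared errors in $\log(y)$ rather than in $y$, so the estimates can differ from a direct nonlinear fit.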

15 Example 2: Another natural model may be $y = a x^b$. We get that $\log(y) = \log(a) + b \log(x)$. Use the standard formulas on the pairs $(\log(x_1), \log(y_1)), (\log(x_2), \log(y_2)), \dots, (\log(x_n), \log(y_n))$. Note: in this model (with $b > 0$), the curve goes through $(0,0)$.
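
The same idea on a log-log scale, assuming the model $y = a x^b$ with all $x_i > 0$ and $y_i > 0$ so that both logarithms exist. Function names and example data are hypothetical.

```python
import math


def simple_ls(u, v):
    """Intercept and slope of the least squares line v = c0 + c1 * u."""
    n = len(u)
    u_bar, v_bar = sum(u) / n, sum(v) / n
    c1 = (sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v))
          / sum((ui - u_bar) ** 2 for ui in u))
    return v_bar - c1 * u_bar, c1


def fit_power(x, y):
    """Fit y = a * x**b by regressing log(y) on log(x); all x and y must be > 0."""
    log_a, b = simple_ls([math.log(xi) for xi in x], [math.log(yi) for yi in y])
    return math.exp(log_a), b


# Hypothetical noise-free data from y = 3 * x^1.5
x = [1, 2, 3, 4, 5]
y = [3 * xi ** 1.5 for xi in x]
print(fit_power(x, y))  # close to (3.0, 1.5)
```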

16 Example 3: Assume the data $(x_1, y_1), \dots, (x_n, y_n)$ seem to follow a third-degree polynomial $y = a + bx + cx^2 + dx^3$. We use multiple regression on $(x_1, x_1^2, x_1^3, y_1), (x_2, x_2^2, x_2^3, y_2), \dots$ and get estimates of $a, b, c, d$ in the third-degree polynomial curve.
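
A sketch of the polynomial example as a multiple regression with the three independent variables $x$, $x^2$ and $x^3$. It assumes the ordering $y = a + bx + cx^2 + dx^3$; the helper names and noise-free data are hypothetical, and `ols` solves the normal equations in plain Python.

```python
def ols(X, y):
    """Least squares coefficients b solving (X'X) b = X'y by Gauss-Jordan elimination."""
    n, p = len(X), len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
         + [sum(X[i][r] * y[i] for i in range(n))]
         for r in range(p)]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))  # partial pivoting
        A[col], A[piv] = A[piv], A[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * c for a, c in zip(A[r], A[col])]
    return [A[r][p] / A[r][r] for r in range(p)]


def fit_cubic(x, y):
    """Estimate a, b, c, d in y = a + b x + c x^2 + d x^3 by multiple regression."""
    X = [[1, xi, xi ** 2, xi ** 3] for xi in x]   # columns: constant, x, x^2, x^3
    return ols(X, y)


# Hypothetical noise-free data from y = 1 - 2x + 0.5x^2 + 0.25x^3
x = [-2, -1, 0, 1, 2, 3]
y = [1 - 2 * xi + 0.5 * xi ** 2 + 0.25 * xi ** 3 for xi in x]
print(fit_cubic(x, y))  # close to [1.0, -2.0, 0.5, 0.25]
```

The model is nonlinear in $x$ but still linear in the coefficients, which is what allows ordinary least squares to be used.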

17 Indicator variables: Binary variables (yes/no, male/female, …) can be represented as 1/0 and used as independent variables; the book also calls them dummy variables. When a binary variable $d$ is used directly, as in $y = \beta_0 + \beta_1 x + \beta_2 d + \varepsilon$, it influences only the constant term of the regression. It is also possible to use a binary variable so that it changes both the constant term and the slope of the regression, for example $y = \beta_0 + \beta_1 x + \beta_2 d + \beta_3 d x + \varepsilon$.
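
Both uses of an indicator variable $d$ amount to choosing which columns go into the design rows: $[1, x, d]$ shifts only the constant term, while $[1, x, d, d\,x]$ lets the slope change too. This sketch uses hypothetical helper names and data in which group $d=0$ follows $y = 1 + 2x$ and group $d=1$ follows $y = 4 + 3x$, so the interaction model should recover $\beta_0 = 1$, $\beta_1 = 2$, $\beta_2 = 3$, $\beta_3 = 1$.

```python
def ols(X, y):
    """Least squares coefficients b solving (X'X) b = X'y by Gauss-Jordan elimination."""
    n, p = len(X), len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
         + [sum(X[i][r] * y[i] for i in range(n))]
         for r in range(p)]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))  # partial pivoting
        A[col], A[piv] = A[piv], A[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * c for a, c in zip(A[r], A[col])]
    return [A[r][p] / A[r][r] for r in range(p)]


def design_rows(x, d, interaction):
    """Rows [1, x, d] (constant shift only) or [1, x, d, d*x] (constant and slope shift)."""
    if interaction:
        return [[1, xi, di, di * xi] for xi, di in zip(x, d)]
    return [[1, xi, di] for xi, di in zip(x, d)]


# Hypothetical data: d = 0 follows y = 1 + 2x, d = 1 follows y = 4 + 3x
x = [0, 1, 2, 3, 0, 1, 2, 3]
d = [0, 0, 0, 0, 1, 1, 1, 1]
y = [1 + 2 * xi if di == 0 else 4 + 3 * xi for xi, di in zip(x, d)]
print(ols(design_rows(x, d, interaction=False), y))  # parallel lines: b0, b1, b2
print(ols(design_rows(x, d, interaction=True), y))   # close to [1.0, 2.0, 3.0, 1.0]
```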

18 Part 2: Problem solving with SPSS: 1. Going from a practical problem to a statistically formulated problem 2. Clicking the right places in SPSS 3. Interpreting the results produced by SPSS. Often, several rounds of using SPSS and interpreting the results will be needed.

19 From practical problem to statistical problem: In practice, this is the hardest part. To do it, you need an understanding of the statistical models available and of the kinds of questions they can answer. Practice is key.

20 Example: You want to investigate the cost and effect of a medical procedure X and compare it to a traditional procedure Y. Your data are, for 40 patients: – The type of procedure performed (X or Y) – The cost – The effect, measured by some number eff. How would you analyze this?

21 Key starting point: what questions to ask. Is there a difference in cost between X and Y? Is there a difference in effect between X and Y? What is the relationship between cost and effect? Is this relationship different for X and Y?

22 Example (cont.): You find out that the procedures have been performed by 20 doctors, each doctor performing one X and one Y procedure. Can this help you in your analysis? If we had data only for X patients, what kinds of questions could we answer?

23 Analysis in SPSS: Data input and transformation. The format is always a number of variables (columns) observed for a number of cases (rows). Data can be input manually or from tables (e.g., Excel). The rows must always correspond to "cases" (so in the example with 20 doctors, the data must be moved into 20 rows). Transformation of variables.
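
The restructuring into one row per doctor can be illustrated outside SPSS. This plain-Python sketch assumes a hypothetical record layout of (doctor, procedure, cost) with made-up costs, turns the patient-level rows into one row per doctor, and then forms the X minus Y cost difference for each doctor, which is one natural way to use the pairing.

```python
from collections import defaultdict


def one_row_per_doctor(records):
    """Turn (doctor, procedure, cost) records into {doctor: {"X": cost, "Y": cost}}."""
    rows = defaultdict(dict)
    for doctor, procedure, cost in records:
        rows[doctor][procedure] = cost
    return dict(rows)


# Hypothetical records: two of the 20 doctors, one X and one Y patient each
records = [(1, "X", 120.0), (1, "Y", 100.0), (2, "X", 90.0), (2, "Y", 95.0)]
wide = one_row_per_doctor(records)
diffs = {doctor: row["X"] - row["Y"] for doctor, row in wide.items()}
print(wide)   # {1: {'X': 120.0, 'Y': 100.0}, 2: {'X': 90.0, 'Y': 95.0}}
print(diffs)  # {1: 20.0, 2: -5.0}
```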

24 Analysis in SPSS: Data exploration. ALWAYS start by exploring the data with – descriptive statistics – graphs. From this, you can find out: – Strange observations ("outliers") – Unexpected relationships and effects – Whether your intended analysis model is appropriate

25 Analysis in SPSS: Fitting models, doing tests, and checking the fit. Use "Analyze" and one or more appropriate models to answer the questions you have raised. Sometimes the results indicate that the model needs to be changed. Several types of plots should be used to investigate the fit of the model: – Plotting residuals against the independent and dependent variables is useful
