Trees Example: More than one variable

The residual plot suggests that the linear model is satisfactory. The R-squared value seems quite low, though, so on physical grounds we force the line to pass through the origin.
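In R this step might look as follows (a sketch, assuming the earlier model was a simple regression of Volume on Girth from the built-in trees data; the "- 1" in the formula drops the intercept):

> data(trees)
> fit1=lm(Volume~Girth,data=trees)     # model with an intercept
> fit0=lm(Volume~Girth-1,data=trees)   # forced through the origin
> summary(fit0)                        # note the R-squared
> plot(residuals(fit0)~trees$Girth)    # residual plot against girth

(For no-intercept fits R computes R-squared about zero rather than about the mean of y, which is partly why the value rises.)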

The R-squared value is higher now, but the residual plot is not so random.

We might now ask if we can find a model with both explanatory variables, height and girth. Physical considerations suggest that we should explore the very simple model Volume = b1 × height × (girth)² + ε. This is basically the formula for the volume of a cylinder.

So the fitted equation is: Volume = b̂1 × height × (girth)², with the value of b̂1 read from the R output.
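A sketch of this fit in R, again assuming the trees data (the I() wrapper makes R treat height × girth² as a single constructed predictor):

> fit4=lm(Volume~I(Height*Girth^2)-1,data=trees)   # Model 4, no intercept
> summary(fit4)                                    # gives b1-hat
> plot(residuals(fit4)~fitted(fit4))               # residuals against fitted values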

The residuals are considerably smaller than those from any of the previous models considered. Further graphical analysis fails to reveal any remaining dependence on either of the explanatory variables girth and height. Further analysis also shows that including a constant term in the model does not significantly improve the fit. Model 4 is thus the most satisfactory of the models considered for these data.

However, this is regression “through the origin”, so it may be more satisfactory to rewrite Model 4 as volume / (height × (girth)²) = b1 + ε,

so that b1 can then just be regarded as the mean of the observations of volume / (height × (girth)²); recall that ε is assumed to have location measure (here mean) 0.
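This reading is easy to check in R (a sketch, again assuming the built-in trees data):

> ratio=trees$Volume/(trees$Height*trees$Girth^2)
> mean(ratio)    # the b1-hat of the rewritten model
> lm(ratio~1)    # the same estimate, as an intercept-only fit

The estimate can differ slightly from the Model 4 coefficient, because the two formulations weight the errors differently.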

Compare this with the value of b̂1 found earlier.

Practical Question 2: fit a multiple regression to data on a response y and explanatory variables x1 and x2.

So y = b̂0 + b̂1x1 + b̂2x2 + e, with the coefficient values read from the R output.
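The fit itself comes from lm; a sketch, assuming the data have been entered as vectors y, x1 and x2 (the name multregress matches the commands that follow):

> multregress=lm(y~x1+x2)
> summary(multregress)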

Use

> plot(multregress)                            # standard lm diagnostic plots
or
> plot(cooks.distance(multregress),type="h")   # Cook's distances as spikes

> ynew=c(y,12)
> x1new=c(x1,20)
> x2new=c(x2,100)
> multregressnew=lm(ynew~x1new+x2new)

The added point has a very large influence.

Second Example

> ynew=c(y,40)
> x1new=c(x1,10)
> x2new=c(x2,50)
> multregressnew=lm(ynew~x1new+x2new)
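As before, the influence of the new point can be inspected with (a sketch):

> plot(cooks.distance(multregressnew),type="h")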

Multiple Linear Regression - Matrix Formulation. Let x = (x1, x2, …, xn)′ be an n × 1 column vector and let g(x) be a scalar function of x. Then, by definition, ∂g/∂x = (∂g/∂x1, ∂g/∂x2, …, ∂g/∂xn)′.

For example, let a = (a1, a2, …, an)′ be an n × 1 column vector of constants. It is easy to verify that ∂(a′x)/∂x = a and that, for a symmetric n × n matrix A, ∂(x′Ax)/∂x = 2Ax.

Theory of Multiple Regression. Suppose we have response variables Yi, i = 1, 2, …, n, and k explanatory variables/predictors X1, X2, …, Xk. The model is Yi = b0 + b1xi1 + b2xi2 + … + bkxik + εi, i = 1, 2, …, n. There are k + 2 parameters: b0, b1, b2, …, bk and σ².

In matrix form the model is Y = Xb + ε; X is called the design matrix.

OLS (ordinary least-squares) estimation gives b̂ = (X′X)⁻¹X′Y.

Fitted values are given by Ŷ = Xb̂ = X(X′X)⁻¹X′Y = HY. H = X(X′X)⁻¹X′ is called the “hat matrix” (… it puts the hats on the Y’s).

The error sum of squares, SSRES, is (Y − Ŷ)′(Y − Ŷ) = Y′Y − b̂′X′Y. The estimate of σ² is based on this.

Example: Find a model of the form y = b0 + b1x1 + b2x2 + ε for the data below (seven observations of y, x1 and x2; the x values can be read from the design matrix that follows).

X is called the design matrix; here it has a column of 1s followed by the x1 and x2 values:

    1  3.1  30
    1  3.4  25
    1  3.0  20
X = 1  3.4  30
    1  3.9  40
    1  2.8  25
    1  2.2  30

The model in matrix form is given by Y = Xb + ε. We have already seen that b̂ = (X′X)⁻¹X′Y. Now calculate this for our example.

R can be used to calculate X′X, and the answer is:

          7    21.8    200
X′X =  21.8   69.62    632
        200     632   5950

To input the matrix in R use

> X=matrix(c(1,1,1,1,1,1,1,
+            3.1,3.4,3.0,3.4,3.9,2.8,2.2,
+            30,25,20,30,40,25,30),7,3)   # 7 = number of rows, 3 = number of columns

Notice the command for matrix multiplication:

> t(X)%*%X

The inverse of X′X can also be obtained by using R:

> solve(t(X)%*%X)

We also need to calculate X′Y. Then b̂ = (X′X)⁻¹X′Y.
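Putting the pieces together (a sketch, assuming the seven response values, which do not survive in this transcript, have been entered as a vector Y):

> XtY=t(X)%*%Y                    # X'Y
> betahat=solve(t(X)%*%X)%*%XtY   # (X'X)^(-1) X'Y
> betahat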

Notice that this is the same result as obtained previously using lm in R.

So, as before, y = b̂0 + b̂1x1 + b̂2x2 + e, with the coefficients read from the output.

The “hat matrix” is given by H = X(X′X)⁻¹X′.

The fitted Y values are obtained by Ŷ = HY.
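In R (a sketch, with Y as above):

> H=X%*%solve(t(X)%*%X)%*%t(X)   # hat matrix; symmetric and idempotent (H %*% H equals H)
> Yhat=H%*%Y                     # fitted values, matching fitted(multregress)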

Recall once more that we are looking at the model y = b0 + b1x1 + b2x2 + ε.

Compare these fitted values with those found earlier.

Error Terms and Inference. A useful result is: SSRES/σ² ~ χ² with n − k − 1 degrees of freedom, where n is the number of points and k is the number of explanatory variables; the estimate σ̂² = SSRES/(n − k − 1) is based on this.

In addition we can show that (b̂i − bi)/s.e.(b̂i) ~ t with n − k − 1 degrees of freedom, where s.e.(b̂i) = σ̂ √c(i+1)(i+1) and c(i+1)(i+1) is the (i+1)th diagonal element of (X′X)⁻¹.
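For this example these quantities are easy to compute in R (a sketch; n = 7 and k = 2, so 4 degrees of freedom, and Y is assumed entered as before):

> n=7; k=2
> SSres=sum((Y-Yhat)^2)                      # error sum of squares
> sigma2hat=SSres/(n-k-1)                    # estimate of sigma^2
> se=sqrt(sigma2hat*diag(solve(t(X)%*%X)))   # s.e. of b0-hat, b1-hat, b2-hat
> betahat/se                                 # t statistics on 4 df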

For our example, (X′X)⁻¹ was calculated earlier using solve(t(X)%*%X).

This means that c11 = 6.683 and c22 = 0.7600, with c33 read off in the same way. Note that c11 is associated with b0, c22 with b1 and c33 with b2. We will calculate the standard error for b1: this is σ̂√c22 = σ̂ × √0.7600.

The value of b̂1 and its standard error are as computed above. Now carry out a hypothesis test: H0: b1 = 0 against H1: b1 ≠ 0.

The test statistic is (b̂1 − 0)/s.e.(b̂1), which calculates as 3.55.

t tables using 4 degrees of freedom give a cut-off point of 2.776 for 2.5%.

Since 3.55 > 2.776, we reject H0 in favour of H1: there is evidence at the 5% level that b1 is not zero. The process can be repeated for the other b values and confidence intervals calculated in the usual way. A CI for σ², based on the χ² distribution with 4 degrees of freedom of 4σ̂²/σ², is (4σ̂²/11.14, 4σ̂²/0.4844), i.e. (0.030, 0.695).
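The same cut-offs come from qchisq (a sketch, using sigma2hat from above):

> qchisq(c(0.975,0.025),4)               # 11.14 and 0.4844
> 4*sigma2hat/qchisq(c(0.975,0.025),4)   # the CI for sigma^2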

The sum of squares of the residuals can also be calculated.
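In matrix form, using quantities already computed (a sketch, with Y assumed as before):

> t(Y)%*%Y-t(betahat)%*%t(X)%*%Y   # SSres = Y'Y - bhat'X'Y
> sum(residuals(multregress)^2)    # the same value from the lm fit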