Practical Sheet 6 Solutions

The R data frame whiteside, which deals with gas consumption, is made available in R by

> data(whiteside, package = "MASS")

It records the weekly gas consumption and average external temperature at a house in south-east England during two heating seasons, one before and one after cavity-wall insulation was installed. The variables are:

Variable   Description
Gas        weekly gas consumption (in 1000s of cubic feet)
Temp       average external temperature during the week (degrees Celsius)
Insul      binary factor: Before or After (insulation)

We check whether b and ρ are significantly different from 0. (The significance of b is already clear from the R print-out, so this manual calculation is not normally required.)
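For reference, a minimal sketch of the fit these solutions appear to be based on: the Before subset of whiteside (26 observations, matching the 24 degrees of freedom used below). The object name gasB is our own choice:

> data(whiteside, package = "MASS")
> gasB = lm(Gas ~ Temp, data = whiteside, subset = Insul == "Before")
> summary(gasB)     # the print-out gives b, its standard error, the t value and R-squared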

The value of b is −0.3932. Now carry out a hypothesis test:

H0: b = 0
H1: b ≠ 0

The standard error of b̂ is 0.0196; this is calculated by R and reported in the Std. Error column of the coefficient table.

The test statistic is t = (b̂ − 0) / se(b̂). This calculates as (−0.3932 − 0) / 0.0196 = −20.1.

t tables using 24 degrees of freedom (there are 26 points, so n − 2 = 24) give cut-off points of ±2.064, with 2.5% in each tail.

Since −20.1 is less than −2.064, we reject H0 in favour of H1. There is evidence at the 5% level of a significant (negative) relationship between Gas and Temp. In fact, the t cut-offs associated with significance levels of 1% and 0.1% are ±2.797 and ±3.745, so b is also significant at the 0.1% level (“very highly significant”). This corresponds to the three stars on the R output.
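These cut-off points can be checked in R with qt(); for a two-sided test at level α the upper-tail probability is 1 − α/2:

> qt(0.975, df = 24)     # 5% two-sided: 2.064
> qt(0.995, df = 24)     # 1% two-sided: 2.797
> qt(0.9995, df = 24)    # 0.1% two-sided: 3.745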

We now check the significance of r. The computer output gives R² = 0.9438; r is the square root of this, taking the sign of b, i.e. r = −0.971. It is fairly clear that this will be significantly different from 0, but we test anyway.
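A quick check of this value in R (again assuming the Before subset):

> with(subset(whiteside, Insul == "Before"), cor(Gas, Temp))    # about -0.971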

Let the true correlation coefficient be ρ. We know that, when ρ = 0, t = r√(n − 2) / √(1 − r²) has a t distribution with n − 2 degrees of freedom. In this case the test statistic calculates as t = −0.971 × √24 / √(1 − 0.9438) = −20.1.

H0: ρ = 0, H1: ρ ≠ 0. As seen previously, the cut-off points for the t distribution with 24 degrees of freedom, 2.5% top and bottom, are ±2.064.

The t value of −20.1 implies H0 is rejected (H1 accepted). There is evidence of a non-zero correlation between Gas and Temp.
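R can carry out this test directly; a minimal sketch using cor.test(), under the same Before-subset assumption as above:

> with(subset(whiteside, Insul == "Before"), cor.test(Gas, Temp))

The output reports the t value, 24 degrees of freedom and a very small p-value.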

Fisher’s Transformation
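For reference, the standard form of the transformation: z = atanh(r) = ½ log((1 + r)/(1 − r)) is approximately normally distributed with mean atanh(ρ) and variance 1/(n − 3). This gives an alternative test of H0: ρ = 0 and approximate confidence intervals for ρ. A minimal R sketch along these lines, reusing r = −0.971 and n = 26 from above:

> r = -0.971; n = 26
> z = atanh(r)                      # Fisher's z = 0.5 * log((1 + r) / (1 - r))
> se = 1 / sqrt(n - 3)              # approximate standard error of z
> z / se                            # test statistic for H0: rho = 0; compare with N(0, 1)
> tanh(z + c(-1.96, 1.96) * se)     # approximate 95% confidence interval for rho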

Use of Weighted Least Squares

In fitting models of the form y_i = f(x_i) + ε_i, i = 1, …, n, least squares is optimal under the condition that ε_1, …, ε_n are i.i.d. N(0, σ²), and is a reasonable fitting method when this condition is at least approximately satisfied. (Most importantly, we require here that there should be no significant outliers.)

In the case where we have instead ε_1, …, ε_n independent N(0, σ_i²), it is natural to use weighted least squares: choose f̂ from within the permitted class of functions f to minimise Σ_i w_i (y_i − f̂(x_i))², where we take w_i proportional to 1/σ_i² (clearly only relative weights matter).
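In R, weighted least squares is requested through the weights argument of lm(). A minimal illustrative sketch with simulated data whose error standard deviation grows with x (all names here are our own):

> set.seed(1)
> x = 1:50
> y = 2 + 3 * x + rnorm(50, sd = 0.5 * x)    # sd proportional to x, so variance proportional to x^2
> fitw = lm(y ~ x, weights = 1 / x^2)        # w_i proportional to 1 / sigma_i^2
> summary(fitw)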

Example: Scottish hill races data. These data are made available in R as

> data(hills, package = "MASS")

They give record times (in minutes) in 1984 of 35 Scottish hill races, against distance (miles) and total height climbed (feet). We regard time as the response variable, and seek to model how its conditional distribution depends on the explanatory variables distance and climb.

The R code > pairs(hills) produces pairwise scatterplots of the three variables.

The fitted model is: time = 5.62 × distance + … × (distance)² + … × climb + … × (climb)² + ε
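The call that produced this unweighted fit is not shown; a plausible reconstruction, inferred from the weighted version below (the name model2 comes from the later residual plot, and observation 18 is excluded there too):

> model2 = lm(time ~ -1 + dist + I(dist^2) + climb + I(climb^2), data = hills[-18, ])
> summary(model2)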

For the hill races data, it is natural to assume greater variability in the times for the longer races, with the standard deviation perhaps proportional to the distance (and hence the variance proportional to distance²). We therefore try refitting the quadratic model with weights proportional to 1/distance²:

> model2w = lm(time ~ -1 + dist + I(dist^2) + climb + I(climb^2), data = hills[-18, ], weights = 1/dist^2)

The fitted model is now: time = 4.94 × distance + … × (distance)² + … × climb + … × (climb)² + ε′

Note that the residual summary above is on a “reweighted” scale, and cannot be directly compared with the earlier residual summaries.

While the coefficients here appear to have changed somewhat from those in the earlier, unweighted, fit of Model 2, the fitted model is not really very different.

This is confirmed by the plot of the residuals from the weighted fit against those from the unweighted fit, produced by

> plot(resid(model2w) ~ resid(model2))
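A direct comparison of the two sets of coefficients makes the same point:

> cbind(unweighted = coef(model2), weighted = coef(model2w))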

Resistant Regression

As already observed, least squares fitting is very sensitive to outlying observations. However, there are a large number of resistant fitting techniques available. One such is least trimmed squares: choose f̂ from within the permitted class of functions f to minimise the sum of the k smallest of the squared residuals (y_i − f̂(x_i))², where k is roughly n/2.

Example: phones data. The R dataset phones in the package MASS gives the annual number of phone calls (in millions) in Belgium over the period 1950–1973. Consider the model calls = a + b × year. The following two graphs plot the data and show the result of fitting the model by least squares, then fitting the same model by least trimmed squares.

These graphs are achieved by the following code:

> attach(phones)
> plot(calls ~ year)
> phonesls = lm(calls ~ year)
> abline(phonesls)
> plot(calls ~ year)
> library(MASS)   # lqs() now lives in MASS; the old lqs package was merged into it
> phoneslts = lqs(calls ~ year)
> abline(phoneslts)

The explanation for the outliers is that, for a period of time, the total length of all phone calls in each year (rather than their number) was accidentally recorded.

Nonparametric Regression

Sometimes we simply wish to fit a smooth model without specifying any particular functional form for f. Again, there are very many techniques here. One such is called loess. This constructs the fitted value f̂(x_i) for each observation i by performing a local regression using only those observations with x values in the neighbourhood of x_i (and attaching most weight to the closest observations).

Example: cars data. The R data frame cars (in the standard datasets package) records 50 observations of speed (mph) and stopping distance (ft). These observations were collected in the 1920s! We treat stopping distance as the response variable and seek to model its dependence on speed.

We try to fit a model using loess. Possible R code is:

> data(cars)
> attach(cars)
> plot(cars)
> carslo = loess(dist ~ speed)      # loess() is in the standard stats package (formerly modreg)
> lines(fitted(carslo) ~ speed)

An optional argument span can be increased from its default value of 0.75 to give more smoothing:

> plot(cars)
> carslo2 = loess(dist ~ speed, span = 1)
> lines(fitted(carslo2) ~ speed)