Linear Models Alan Lee Sample presentation for STATS 760.

Contents
– The problem
– Typical data
– Exploratory analysis
– The model
– Estimation and testing
– Diagnostics
– Software
– A worked example

The Problem
To model the relationship between a continuous variable Y and several explanatory variables x1, …, xk. Given values of x1, …, xk, predict the value of Y.

Typical Data
Data on 5000 motor vehicle insurance policies having at least one claim. Variables are
– Y: log(amount of claim)
– x1: sex of policy holder
– x2: age of policy holder
– x3: age of car
– x4: car type (1-20 score, 1 = Toyota Corolla, 20 = Porsche)

Exploratory Analysis
– Plot Y against the other variables
– Scatterplot matrix
– Smooth as necessary
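
In R, a scatterplot matrix and a smoothed scatterplot take one call each. A minimal sketch, using simulated stand-ins for the insurance data (the data frame `claims` and its column names are illustrative, not the actual course data):

```r
# Simulated stand-in for the insurance data (illustrative only)
set.seed(1)
claims <- data.frame(
  logad    = rnorm(100, mean = 7, sd = 1),      # log(amount of claim)
  CARAGE   = sample(0:15, 100, replace = TRUE), # age of car
  PRIMAGEN = sample(18:80, 100, replace = TRUE) # age of policy holder
)

pairs(claims)                                # scatterplot matrix
scatter.smooth(claims$CARAGE, claims$logad)  # Y vs one x, with a loess smooth
```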

Log claims vs car age

The Model
The relationship is modelled using the conditional distribution of Y given the covariates x1, …, xk. Assume the conditional distribution of Y is N(μ, σ²), where μ depends on the covariates.

The Model (2)
If all covariates are "continuous", then
μ = β0 + β1 x1 + … + βk xk.
In addition, all Y's are assumed independent.

Estimation and Testing
– Estimate the β's
– Estimate the error variance σ²
– Test whether the β's are zero
– Check goodness of fit

Least Squares
Estimate the β's by the values that minimize the sum of squares (the least squares estimates, LSEs). The minimizing values are the solution of the normal equations, and the minimum value is the residual sum of squares (RSS). σ² is estimated by RSS/(n − k − 1).
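
The normal equations can be solved directly and checked against lm(). A sketch on simulated data (all names here are illustrative):

```r
# Least squares via the normal equations (X'X) b = X'y, on simulated data
set.seed(1)
n <- 50; k <- 2
X    <- cbind(1, matrix(rnorm(n * k), n, k))  # design matrix with intercept
beta <- c(2, 1, -0.5)
y    <- drop(X %*% beta) + rnorm(n, sd = 0.3)

bhat <- solve(t(X) %*% X, t(X) %*% y)   # solve the normal equations
rss  <- sum((y - drop(X %*% bhat))^2)   # residual sum of squares
s2   <- rss / (n - k - 1)               # estimate of sigma^2

# Agrees with lm()'s QR-based fit:
fit <- lm(y ~ X - 1)
all.equal(as.vector(bhat), unname(coef(fit)))  # TRUE
```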

Goodness of Fit
Goodness of fit is measured by R²: 0 ≤ R² ≤ 1 (why?). R² = 1 if and only if the fit is perfect (the data lie exactly on a plane).
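
R² = 1 − RSS/TSS can be computed by hand and compared with what summary() reports. A sketch on simulated data:

```r
# R^2 by hand vs summary(fit)$r.squared, on simulated data
set.seed(2)
x <- rnorm(40)
y <- 1 + 2 * x + rnorm(40)
fit <- lm(y ~ x)

rss <- sum(resid(fit)^2)        # residual sum of squares
tss <- sum((y - mean(y))^2)     # total sum of squares
r2  <- 1 - rss / tss

all.equal(r2, summary(fit)$r.squared)  # TRUE
```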

Prediction
Y is predicted by ŷ = β̂0 + β̂1 x1 + … + β̂k xk, where the hats indicate the LSEs. Standard errors come in two kinds: one for the mean value of Y at a given set of x's, the other for an individual Y at a particular set of x's.
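
The two kinds of standard error correspond to R's two interval types in predict.lm(): interval = "confidence" for the mean of Y, interval = "prediction" for an individual Y. A sketch on simulated data:

```r
# Confidence interval for the mean vs prediction interval for an individual Y
set.seed(3)
d <- data.frame(x = rnorm(30))
d$y <- 1 + 2 * d$x + rnorm(30)
fit <- lm(y ~ x, data = d)

new <- data.frame(x = 0.5)
predict(fit, newdata = new, interval = "confidence")  # mean of Y at x = 0.5
predict(fit, newdata = new, interval = "prediction")  # individual Y at x = 0.5
```

The prediction interval is always the wider of the two, since it adds the error variance of a single observation to the uncertainty in the fitted mean.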

Interpretation of Coefficients
The LSE for variable xj is the amount we expect Y to increase if xj is increased by one unit, with all the other x's held fixed. The test of βj = 0 is a test that variable j makes no contribution to the fit, given that all the other variables are in the model.

Checking Assumptions (1)
The tools are the residuals, the fitted values and the hat matrix diagonals:
– Fitted values: ŷ = β̂0 + β̂1 x1 + … + β̂k xk, evaluated at each observation
– Residuals: observed y minus fitted ŷ
– Hat matrix diagonals: hii (measure the effect of an observation on its own fitted value)
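
All three diagnostics come straight from a fitted lm object. A sketch on simulated data:

```r
# Residuals, fitted values and hat diagonals from an lm fit
set.seed(4)
d <- data.frame(x = rnorm(25))
d$y <- 1 + d$x + rnorm(25)
fit <- lm(y ~ x, data = d)

head(fitted(fit))     # fitted values
head(resid(fit))      # residuals
head(hatvalues(fit))  # hat matrix diagonals

sum(hatvalues(fit))   # the diagonals sum to the number of coefficients (here 2)
```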

Checking Assumptions (2)
The assumptions are
– Mean linear in the x's (plot residuals vs fitted values; partial residual plots; CERES plots)
– Constant variance (plot squared residuals vs fitted values)
– Independence (time series plot; plot residuals vs preceding residuals)
– Normality / outliers (normal plot)
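
The plot() method for lm objects produces most of these checks in one go (residuals vs fitted, normal Q-Q, scale-location, residuals vs leverage). A sketch on simulated data:

```r
# Standard diagnostic plots for an lm fit
set.seed(5)
d <- data.frame(x = rnorm(40))
d$y <- 1 + d$x + rnorm(40)
fit <- lm(y ~ x, data = d)

par(mfrow = c(2, 2))  # 2x2 grid, one panel per diagnostic plot
plot(fit)
```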

Remedial Action
– Transform variables
– Delete outliers
– Weighted least squares
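
Weighted least squares is available through the weights argument of lm(). A sketch on simulated heteroscedastic data (the weighting scheme 1/x² is illustrative, chosen to match the simulated variance):

```r
# Weighted least squares: error sd grows with x, so weight by 1/x^2
set.seed(6)
x <- runif(50, 1, 10)
y <- 2 + 3 * x + rnorm(50, sd = x)   # variance proportional to x^2

wfit <- lm(y ~ x, weights = 1 / x^2) # weights inversely proportional to variance
coef(wfit)
```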

Software
– SAS: PROC REG, PROC GLM
– S-Plus, R: lm
Usage: lm(model formula, dataframe, weights, …)

Model Formula
Assume k = 3.
– If x1, x2, x3 are all continuous, fit a plane: Y ~ x1 + x2 + x3
– If x1 is categorical (e.g. gender) and x2, x3 are continuous, fit a different plane/curve in x2, x3 for each level of x1:
  Y ~ x1 + x2 + x3 (planes parallel)
  Y ~ x1 + x2 + x3 + x1:x2 + x1:x3 (planes different)
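
The two formulations can be fitted and compared with an F-test via anova(). A sketch on simulated data (variable names are illustrative):

```r
# Parallel-planes model vs separate-planes model for a two-level factor
set.seed(7)
d <- data.frame(
  x1 = factor(sample(c("F", "M"), 60, replace = TRUE)),  # categorical
  x2 = rnorm(60),
  x3 = rnorm(60)
)
d$y <- 1 + 2 * d$x2 - d$x3 + 0.5 * (d$x1 == "M") + rnorm(60)

parallel <- lm(y ~ x1 + x2 + x3, data = d)                    # planes parallel
separate <- lm(y ~ x1 + x2 + x3 + x1:x2 + x1:x3, data = d)    # planes different

anova(parallel, separate)  # F-test of whether the interaction terms are needed
```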

Insurance Example (1)
> cars.lm <- lm(logad ~ poly(CARAGE, 2) + PRIMAGEN + gender)
> summary(cars.lm)

Call:
lm(formula = logad ~ poly(CARAGE, 2) + PRIMAGEN + gender)

(Most numeric values in the output were lost in transcription.)

Residuals:
    Min      1Q  Median      3Q     Max

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)                                   < 2e-16 ***
poly(CARAGE, 2)1                                e-09 ***
poly(CARAGE, 2)2                                e-11 ***
PRIMAGEN                                              **
gender
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: on 4995 degrees of freedom
Multiple R-Squared: , Adjusted R-squared:
F-statistic: on 4 and 4995 DF, p-value: < 2.2e-16

Insurance Example (2) > plot(cars.lm)