Regression and Analysis of Variance: Linear Models in R

What do you know already?

Regression
- Continuous dependent variable
- Continuous independent variable
- Assumptions: normality, independence, constant variance, i.e. $\varepsilon \sim N(0, \sigma^2)$
- Relationship is linear or curvilinear

ANOVA
- Continuous dependent variable
- Discrete independent variable
- Assumptions: normality, independence, constant variance, i.e. $\varepsilon \sim N(0, \sigma^2)$
- Factor level variances are equal
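The assumptions listed above for regression and ANOVA are usually checked on the residuals of a fitted model. The following is a minimal sketch, assuming a one-factor layout; the data are simulated and the names `group`, `resp` and `fit` are hypothetical, not taken from the slides. Independence is not tested here because it depends mainly on how the data were collected.

```r
# Simulated, purely illustrative data: 3 factor levels, 10 observations each.
set.seed(2)
group <- factor(rep(c("low", "mid", "high"), each = 10))
resp  <- rnorm(30, mean = rep(c(10, 12, 15), each = 10), sd = 2)

fit <- lm(resp ~ group)
res <- residuals(fit)

# Normality of the residuals (Shapiro-Wilk test)
shapiro.test(res)

# Equal variances across factor levels (Bartlett's test)
bartlett.test(resp ~ group)

# Graphical checks: residuals vs fitted values, and a normal Q-Q plot
plot(fitted(fit), res, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
qqnorm(res)
qqline(res)
```

Formal tests have little power in small samples, so the plots are often the more informative check.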

Linear Models
Regression and ANOVA (and in fact ANCOVA) are all related mathematically to one another. Exactly the same mathematics is used throughout. The only difference is the type (and number) of independent variables that you are working with. The base assumptions are required for all linear models.
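One way to see that regression, ANOVA and ANCOVA share the same machinery is to fit all three with `lm()`. The sketch below uses simulated data; the variable names `x`, `g` and `y` and the effect sizes are assumptions made up for illustration.

```r
# Simulated, purely illustrative data (not from the slides).
set.seed(1)
n <- 60
x <- runif(n, 0, 10)                              # continuous predictor
g <- factor(rep(c("A", "B", "C"), each = n / 3))  # discrete predictor (factor)
y <- 2 + 0.5 * x + c(A = 0, B = 1, C = 2)[as.character(g)] + rnorm(n)

fit_reg    <- lm(y ~ x)      # regression: continuous predictor only
fit_anova  <- lm(y ~ g)      # one-way ANOVA: factor predictor only
fit_ancova <- lm(y ~ g + x)  # ANCOVA: factor plus continuous covariate

# In every case lm() builds a design matrix and solves least squares;
# a factor simply becomes a set of indicator (dummy) columns.
head(model.matrix(fit_reg))
head(model.matrix(fit_anova))
head(model.matrix(fit_ancova))

# The same anova() call summarises any of the three fits.
anova(fit_ancova)
```

Only the right-hand side of the formula changes between the three fits, which matches the point that the type and number of independent variables is the only difference.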

What procedure are we going to use to analyse linear model data?

Wagga House Prices
A Wagga Wagga real estate agent wishes to use data from 30 recent house sales to predict future selling prices ($000) from land area (m²). The data were collected from internet real estate listings that included both the land size and the listing price. Most of the included listings were for houses with 2 bedrooms, 1 bathroom and 1 garage.

Call:
lm(formula = Price ~ Land, data = dat)

Residuals:
Min 1Q Median 3Q Max

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)
Land                                    e-07 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: on 28 degrees of freedom
Multiple R-squared: , Adjusted R-squared:
F-statistic: on 1 and 28 DF, p-value: 3.141e-07
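Output of this shape is what `summary()` prints for a fitted `lm` object. The following is a minimal sketch of the calls involved, assuming a data frame `dat` with columns `Price` ($000) and `Land` (m²). The values are simulated for illustration and are not the agent's 30 sales, so the numbers printed will not match the slide.

```r
# Simulated stand-in for the house price data (30 rows, invented values).
set.seed(42)
dat <- data.frame(Land = round(runif(30, 400, 1200)))   # land area, m^2
dat$Price <- 150 + 0.2 * dat$Land + rnorm(30, sd = 30)  # price, $000

# Fit the simple linear regression and print the summary table.
dat.lm <- lm(Price ~ Land, data = dat)
summary(dat.lm)

# Predict the price of a hypothetical 800 m^2 block,
# with a 95% prediction interval for a single new sale.
new_house <- data.frame(Land = 800)
pred <- predict(dat.lm, newdata = new_house, interval = "prediction")
pred
```

With 30 observations and two estimated coefficients the residual degrees of freedom are 28, as in the slide.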

anova(dat.lm)
Analysis of Variance Table

Response: Price
          Df Sum Sq Mean Sq F value Pr(>F)
Land                                e-07 ***
Residuals
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
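For a regression with a single predictor, the ANOVA table and the coefficient table carry the same test: the F value equals the square of the slope's t value, and R-squared is the share of the total sum of squares explained by the predictor. The sketch below checks both identities on simulated data; the data frame `dat` and its values are assumptions for illustration only.

```r
# Simulated stand-in for the house price data (30 rows, invented values).
set.seed(42)
dat <- data.frame(Land = round(runif(30, 400, 1200)))   # land area, m^2
dat$Price <- 150 + 0.2 * dat$Land + rnorm(30, sd = 30)  # price, $000
dat.lm <- lm(Price ~ Land, data = dat)

tab <- anova(dat.lm)
tab

# F for Land equals the squared t value of the Land coefficient.
t_land <- coef(summary(dat.lm))["Land", "t value"]
f_land <- tab["Land", "F value"]
all.equal(t_land^2, f_land)

# R-squared equals SS(Land) / SS(Total).
r2 <- tab["Land", "Sum Sq"] / sum(tab[["Sum Sq"]])
all.equal(r2, summary(dat.lm)$r.squared)
```

This is why the p-value in the F-statistic line of the summary and the p-value for the slope refer to the same hypothesis.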

Bottlenose Dolphins
Neonate bottlenose dolphins produce many sounds just after birth. Prior to suckling these sounds intensify, and then, as the neonate prepares to feed, the sounds cease; this silent interval is called the latency period (LP). It is thought that the LP is related to the suckling frequency. A study was conducted to collect information about the length of the LP and the suckling frequency, with the aim of defining this relationship if it exists.
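A question of this kind, whether a relationship exists and what form it takes, can be approached with a correlation test followed by a comparison of a linear and a curvilinear fit. The sketch below is only illustrative: the data are simulated, the column names `LP` and `SuckFreq` are hypothetical, and treating LP as the response is an assumption, since the slides do not say which variable is modelled.

```r
# Simulated, purely illustrative data (not the study's measurements).
set.seed(7)
dolphin <- data.frame(SuckFreq = runif(25, 1, 10))            # hypothetical suckling frequency
dolphin$LP <- 40 - 3 * dolphin$SuckFreq + rnorm(25, sd = 4)   # hypothetical latency period

# Is there any linear association at all?
cor.test(dolphin$SuckFreq, dolphin$LP)

# Straight-line model versus a model with a curvature (quadratic) term.
lin  <- lm(LP ~ SuckFreq, data = dolphin)
quad <- lm(LP ~ SuckFreq + I(SuckFreq^2), data = dolphin)

# F-test for whether the quadratic term improves on the straight line.
cmp <- anova(lin, quad)
cmp
```

A scatter plot of the two variables would normally be inspected before either model is fitted.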

Johne’s Disease
To eliminate Johne’s disease from an infected farm or to prevent transmission, it is essential that susceptible animals are not exposed to an environment contaminated with the bacterium. The bacterium causing Johne’s disease is capable of persisting in the environment for long periods due to the high lipid content in the cell wall and the metabolic inactivity of the organism. Factors that could influence the survival of the bacterium in the soil, including temperature, pH, organic matter, exposure to ultraviolet light and moisture content, were investigated under controlled conditions.

Johne’s Disease continued
This experiment involved trays of contaminated soil randomised to 12 unique treatments, which involved changing the pH, UV light and the moisture content. They are uniquely defined as Treatment 1:12. The treatments were allocated to the trays of soil in a completely randomised design so that each treatment was replicated 5 times. The response, ln(number of bacteria remaining), was measured as an indication of the effectiveness of the treatment. The aim of the experiment is to determine the “best” treatment for removing the bacterium from the soil.
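The design described here, 12 treatments each replicated 5 times on 60 trays, can be sketched as a completely randomised allocation followed by a one-way ANOVA. The response values and treatment means below are invented for illustration, and the names `trays`, `lnCount` and `soil.lm` are hypothetical. Tukey's HSD is used as one common way of comparing treatments; the slides do not state which comparison method was used.

```r
# Completely randomised allocation: 12 treatments x 5 replicates = 60 trays.
set.seed(2024)
trt   <- factor(rep(1:12, each = 5))
trays <- data.frame(Tray = 1:60, Treatment = sample(trt))

# Simulated response: ln(number of organisms remaining); means are invented.
true_means    <- seq(8, 5, length.out = 12)
trays$lnCount <- rnorm(60, mean = true_means[as.integer(trays$Treatment)], sd = 0.5)

# One-way ANOVA fitted as a linear model: 11 treatment df, 48 residual df.
soil.lm <- lm(lnCount ~ Treatment, data = trays)
anova(soil.lm)

# All pairwise treatment comparisons (12 choose 2 = 66 pairs).
tk <- TukeyHSD(aov(lnCount ~ Treatment, data = trays))
head(tk$Treatment)

# Treatments with the lowest mean ln(count) are the candidates for "best".
trt_means <- sort(tapply(trays$lnCount, trays$Treatment, mean))
trt_means[1:3]
```

A lower mean ln(count) indicates fewer organisms remaining, so the ordering of `trt_means` gives a first look at which treatments perform best before the pairwise comparisons are read.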