Published by Lynne Edwards. Modified over 2 years ago.

1
BIO503: Lecture 4 Statistical models in R --- Recap --- Stefan Bentink bentink@jimmy.harvard.edu

3
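Linear Regression Models

$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$

where $y_i$ is the dependent variable, $\beta_0$ the intercept, $\beta_1$ the regression coefficient, $x_i$ the independent variable, and $\varepsilon_i$ the residual error.

Using the method of least squares, we can derive the following estimators:

$\hat{\beta}_1 = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$

Our goal is to test the hypothesis $H_0: \beta_1 = 0$. We can do this with a t test:

$t = \hat{\beta}_1 / \mathrm{se}(\hat{\beta}_1)$

Under the null hypothesis, this follows a t distribution with (n-2) df, because two coefficients are estimated.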
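As a rough check on the least-squares estimators and the t test, the quantities can be computed by hand and compared with lm(). The data below are simulated and all variable names are made up for illustration; this is a sketch under those assumptions, not part of the original lecture.

```r
# Simulated data (an assumption for illustration; not from the lecture)
set.seed(1)
n <- 50
x <- runif(n, 0, 10)
y <- 2 + 0.5 * x + rnorm(n)

# Least-squares estimates computed by hand
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0 <- mean(y) - b1 * mean(x)

# Residual variance uses n - 2 degrees of freedom (two estimated coefficients)
res <- y - (b0 + b1 * x)
s2 <- sum(res^2) / (n - 2)
se.b1 <- sqrt(s2 / sum((x - mean(x))^2))

# t statistic and two-sided p-value for H0: beta1 = 0
t.stat <- b1 / se.b1
p.val <- 2 * pt(abs(t.stat), df = n - 2, lower.tail = FALSE)

# Compare with lm()
fit <- lm(y ~ x)
coef(summary(fit))
c(b0 = b0, b1 = b1, t = t.stat, p = p.val)
```

The row for x in coef(summary(fit)) should agree with the hand-computed slope, t statistic, and p-value.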

4
my.model <- lm(y ~ x)

5
Some important functions
my.model <- lm(y ~ x)
summary(my.model)
anova(my.model)
predict(my.model, new.data)
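A short run-through of these functions, using the built-in cars data as a stand-in (the dataset choice is an assumption; the slide does not name one). Note that the second argument of predict() is expected to be a data frame whose column names match the predictors in the formula.

```r
# 'cars' ships with R (datasets package); used here only as an illustration
my.model <- lm(dist ~ speed, data = cars)

summary(my.model)   # coefficients, standard errors, t tests, R-squared
anova(my.model)     # analysis of variance table

# newdata must be a data frame whose column names match the predictors
new.data <- data.frame(speed = c(10, 15, 20))
pred <- predict(my.model, new.data)
pred

# Prediction intervals are available through the 'interval' argument
predict(my.model, new.data, interval = "prediction")
```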

6
Specifying Models
In R we use a model formula to specify the model we want to fit to our data.
y ~ x                 Simple Linear Regression
y ~ x - 1             Simple Linear Regression without the intercept (line goes through the origin)
y ~ x1 + x2 + x3      Multiple Regression
y ~ x + I(x^2)        Quadratic Regression
log(y) ~ x1 + x2      Multiple Regression of Transformed Variable
For factors A, B:
y ~ A                 1-way ANOVA
y ~ A + B             2-way ANOVA
y ~ A*B               2-way ANOVA + interaction term
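One way to see what a formula means is to look at the design-matrix columns it generates. The sketch below uses model.matrix() on a small made-up data frame; the data and the names d, A, and B are illustrative only.

```r
# Toy data (made up for illustration)
d <- data.frame(
  y = c(1.2, 2.3, 2.9, 4.1, 5.2, 5.8),
  x = 1:6,
  A = factor(c("a", "a", "b", "b", "c", "c")),
  B = factor(c("u", "v", "u", "v", "u", "v"))
)

# model.matrix() shows the design-matrix columns implied by a formula
colnames(model.matrix(y ~ x, d))           # "(Intercept)" "x"
colnames(model.matrix(y ~ x - 1, d))       # "x"
colnames(model.matrix(y ~ x + I(x^2), d))  # "(Intercept)" "x" "I(x^2)"
colnames(model.matrix(y ~ A + B, d))       # main effects only
colnames(model.matrix(y ~ A * B, d))       # main effects plus A:B interaction columns
```

I() is needed in the quadratic model because ^ has a special meaning inside a formula; wrapping the term in I() makes R treat it as ordinary arithmetic.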

7
ANOVA Example
Let's use a different dataset:
> library(MASS)
> data(ChickWeight)
> attach(ChickWeight)
The factor Diet has 4 levels.
> levels(Diet)
> anova(lm(weight ~ Diet, data=ChickWeight))
Analysis of Variance Table

Response: weight
           Df  Sum Sq Mean Sq F value    Pr(>F)
Diet        3  155863   51954   10.81 6.433e-07
Residuals 574 2758693    4806

8
Two-way ANOVA
We can fit a two-way ANOVA:
> anova(lm(weight ~ Diet + Chick, data=ChickWeight))
Analysis of Variance Table

Response: weight
           Df  Sum Sq Mean Sq F value    Pr(>F)
Diet        3  155863   51954 11.5045 2.571e-07
Chick      46  374243    8136  1.8015  0.001359
Residuals 528 2384450    4516

The interpretation of the model output is sequential; read it from the bottom to the top.
The Chick line tests the model weight ~ Diet vs weight ~ Diet + Chick.
The Diet line tests the intercept-only model weight ~ 1 vs weight ~ Diet.
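The sequential reading can be checked by fitting the two nested models and comparing them explicitly. In this sketch the object names m1, m2, tab, and cmp are made up; the F test from anova(m1, m2) should match the Chick row of the sequential table.

```r
# ChickWeight ships with R (datasets package)
m1 <- lm(weight ~ Diet, data = ChickWeight)
m2 <- lm(weight ~ Diet + Chick, data = ChickWeight)

# Sequential table for the larger model
tab <- anova(m2)
tab

# Explicit comparison of the nested models; this F test matches the Chick row
cmp <- anova(m1, m2)
cmp
```

The F value for Diet differs between the one-way and two-way tables (10.81 vs 11.5045) even though its sum of squares is unchanged, because the residual mean square in the denominator comes from the larger model.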

9
Generalized Linear Models
Linear regression models hinge on the assumption that the response variable, given the predictors, follows a Normal distribution. Generalized linear models are able to handle non-Normal response variables and, through a link function, transformations to linearity.

10
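Logistic Regression
When faced with a binary response Y ∈ {0, 1}, we use logistic regression:

$\mathrm{logit}(p) = \beta_0 + \beta_1 x$

where $p = P(Y = 1 \mid x)$ and the logit is

$\mathrm{logit}(p) = \log\left(\frac{p}{1 - p}\right)$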
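The logit maps a probability p to the log-odds log(p / (1 - p)), and its inverse maps a log-odds value back to a probability. A minimal sketch follows; the function names logit and inv.logit are illustrative, and base R already provides the same transformations as qlogis() and plogis().

```r
# Logit and its inverse written out; the function names are illustrative
logit <- function(p) log(p / (1 - p))
inv.logit <- function(eta) 1 / (1 + exp(-eta))

p <- c(0.1, 0.5, 0.9)
eta <- logit(p)
eta                 # log-odds; 0 corresponds to p = 0.5
inv.logit(eta)      # recovers p

# Base R provides the same transformations as qlogis() and plogis()
all.equal(eta, qlogis(p))
all.equal(inv.logit(eta), plogis(eta))
```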

11
Fit the Logistic Regression Model
> anes.logit <- glm(move ~ conc, family=binomial(link=logit), data=anesthetic)
The output summary looks like this:
> summary(anes.logit)
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)    6.469      2.418   2.675  0.00748 **
conc          -5.567      2.044  -2.724  0.00645 **
Estimates of P(Y=1) are given by:
> fitted.values(anes.logit)
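Because the anesthetic data are not part of base R, the sketch below simulates a stand-in data set with the same two columns to show how fitted probabilities and predictions are obtained. The simulated values, the "true" coefficients, and the object names sim, fit, and new.conc are all assumptions made for illustration.

```r
# Simulated stand-in for the anesthetic data (values are made up)
set.seed(42)
conc <- rep(c(0.8, 1.0, 1.2, 1.4, 1.6, 2.5), each = 10)
p.move <- plogis(6.5 - 5.5 * conc)          # assumed "true" probabilities
move <- rbinom(length(conc), size = 1, prob = p.move)
sim <- data.frame(conc = conc, move = move)

fit <- glm(move ~ conc, family = binomial(link = "logit"), data = sim)
coef(summary(fit))

# Fitted P(Y = 1) for the observed rows
head(fitted(fit))

# Predicted probability at new concentrations; type = "response" returns
# probabilities, the default type = "link" returns log-odds
new.conc <- data.frame(conc = c(1.0, 1.5))
p.hat <- predict(fit, new.conc, type = "response")
p.hat

# exp(coefficient) is the odds ratio per unit increase in conc
exp(coef(fit)["conc"])
```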

12
Update models and model selection
Some handy functions to know about:
new.model <- update(old.model, new.formula)
Model selection functions:
drop1, add1, step (stats package, loaded by default)
dropterm, addterm, stepAIC (MASS package)
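A brief sketch of how update() and the stepwise helpers fit together, using the built-in mtcars data. The dataset and the choice of predictors are arbitrary assumptions for illustration, and only the functions that ship with base R are shown.

```r
# mtcars ships with R; the choice of variables is arbitrary
full <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# update(): "." stands for "what was there before"
reduced <- update(full, . ~ . - qsec)      # drop a term
formula(reduced)

# drop1(): effect of removing each single term from the model
drop1(full, test = "F")

# add1(): effect of adding each candidate term to the reduced model
add1(reduced, scope = ~ wt + hp + qsec, test = "F")

# step(): AIC-based stepwise search (trace = 0 silences the log)
chosen <- step(full, trace = 0)
formula(chosen)
```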

13
SURVIVAL ANALYSIS

14
Problem 5 - Survival Analysis
1. Read in the data file aml.txt. This file stores survival data on patients with Acute Myelogenous Leukemia.
2. Compute the Kaplan-Meier estimate for all patients in this data set and draw the corresponding Kaplan-Meier plot. Construct Kaplan-Meier plots grouped by chemotherapy status.
3. Using a log-rank test, test whether the two survival curves (patients on maintenance chemotherapy, patients who are not) are identical.
4. Fit a Cox proportional hazards model to the data set.
5. Plot the survival functions for patients from the different groups.

15
Survival Analysis
library(survival)
Example: aml leukemia data
Kaplan-Meier curve
fit1 <- survfit(Surv(aml$time,aml$status)~1)
summary(fit1)
plot(fit1)
Log-rank test
survdiff(Surv(time, status)~x, data=aml)
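To make the Kaplan-Meier estimate less of a black box, the product-limit formula can be computed by hand and compared with survfit(). This is a sketch that assumes the aml data bundled with the survival package; the helper names ev.times, surv.hand, and surv.fit are made up.

```r
library(survival)

# Kaplan-Meier by hand: at each distinct event time t,
# S(t) = product over event times <= t of (1 - d / n), where
# d = number of events at that time and n = number still at risk
ev.times <- sort(unique(aml$time[aml$status == 1]))
surv.hand <- cumprod(sapply(ev.times, function(t) {
  n.risk <- sum(aml$time >= t)
  d <- sum(aml$time == t & aml$status == 1)
  1 - d / n.risk
}))

# The same estimate from survfit(), evaluated at the event times
fit1 <- survfit(Surv(time, status) ~ 1, data = aml)
surv.fit <- summary(fit1, times = ev.times)$surv

cbind(time = ev.times, by.hand = surv.hand, survfit = surv.fit)
```

Censored observations never trigger a drop in the curve; they only reduce the number at risk at later event times.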

16
Survival analysis
> cp <- coxph(Surv(aml$time, aml$status)~x, data=aml)
> summary(cp)
> plot(survfit(Surv(aml$time,aml$status)~x, data=aml), col=c("red","green"), lwd=2)
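The Cox fit is usually summarised as a hazard ratio, and the grouped plot is easier to read with a legend. The sketch below assumes the aml data bundled with the survival package; the names hr, ci, and km and the axis labels are illustrative.

```r
library(survival)

cp <- coxph(Surv(time, status) ~ x, data = aml)

# Hazard ratio (exp of the coefficient) with a 95% confidence interval
hr <- exp(coef(cp))
ci <- exp(confint(cp))
hr
ci

# Grouped Kaplan-Meier curves with a legend; strata are drawn in the
# order of the factor levels of x
km <- survfit(Surv(time, status) ~ x, data = aml)
plot(km, col = c("red", "green"), lwd = 2,
     xlab = "Time", ylab = "Survival probability")
legend("topright", legend = levels(aml$x),
       col = c("red", "green"), lwd = 2)
```

Writing the formula as Surv(time, status) ~ x with data = aml is equivalent to the aml$time form used on the slide and keeps every variable lookup inside one data frame.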
