1 Some R Basics EPP 245/298 Statistical Analysis of Laboratory Data.

Slides:



Advertisements
Similar presentations
Lecture 10 F-tests in MLR (continued) Coefficients of Determination BMTRY 701 Biostatistical Methods II.
Advertisements

SPH 247 Statistical Analysis of Laboratory Data 1April 2, 2013SPH 247 Statistical Analysis of Laboratory Data.
1 Some R Basics EPP 245/298 Statistical Analysis of Laboratory Data.
SPH 247 Statistical Analysis of Laboratory Data April 2, 2010SPH 247 Statistical Analysis of Laboratory Data1.
5/11/ lecture 71 STATS 330: Lecture 7. 5/11/ lecture 72 Prediction Aims of today’s lecture  Describe how to use the regression model to.
Regression Analysis Using Excel. Econometrics Econometrics is simply the statistical analysis of economic phenomena Here, we just summarize some of the.
Multiple Regression Predicting a response with multiple explanatory variables.
Zinc Data SPH 247 Statistical Analysis of Laboratory Data.
Multiple regression analysis
x y z The data as seen in R [1,] population city manager compensation [2,] [3,] [4,]
1 ANOVA Homework Solutions EPP 245/298 Statistical Analysis of Laboratory Data.
SPH 247 Statistical Analysis of Laboratory Data 1April 23, 2010SPH 247 Statistical Analysis of Laboratory Data.
1 Regression Homework Solutions EPP 245/298 Statistical Analysis of Laboratory Data.
Examining Relationship of Variables  Response (dependent) variable - measures the outcome of a study.  Explanatory (Independent) variable - explains.
Nemours Biomedical Research Statistics April 2, 2009 Tim Bunnell, Ph.D. & Jobayer Hossain, Ph.D. Nemours Bioinformatics Core Facility.
7/2/ Lecture 51 STATS 330: Lecture 5. 7/2/ Lecture 52 Tutorials  These will cover computing details  Held in basement floor tutorial lab,
1 Analysis of Variance (ANOVA) EPP 245 Statistical Analysis of Laboratory Data.
Crime? FBI records violent crime, z x y z [1,] [2,] [3,] [4,] [5,]
1 Regression and Calibration EPP 245 Statistical Analysis of Laboratory Data.
Regression Transformations for Normality and to Simplify Relationships U.S. Coal Mine Production – 2011 Source:
SPH 247 Statistical Analysis of Laboratory Data April 9, 2013SPH 247 Statistical Analysis of Laboratory Data1.
Checking Regression Model Assumptions NBA 2013/14 Player Heights and Weights.
6.1 - One Sample One Sample  Mean μ, Variance σ 2, Proportion π Two Samples Two Samples  Means, Variances, Proportions μ 1 vs. μ 2.
How to plot x-y data and put statistics analysis on GLEON Fellowship Workshop January 14-18, 2013 Sunapee, NH Ari Santoso.
BIOL 582 Lecture Set 19 Matrices, Matrix calculations, Linear models using linear algebra.
PCA Example Air pollution in 41 cities in the USA.
9/14/ Lecture 61 STATS 330: Lecture 6. 9/14/ Lecture 62 Inference for the Regression model Aim of today’s lecture: To discuss how we assess.
MATH 3359 Introduction to Mathematical Modeling Project Multiple Linear Regression Multiple Logistic Regression.
Analysis of Covariance Harry R. Erwin, PhD School of Computing and Technology University of Sunderland.
Lecture 5: SLR Diagnostics (Continued) Correlation Introduction to Multiple Linear Regression BMTRY 701 Biostatistical Methods II.
 Combines linear regression and ANOVA  Can be used to compare g treatments, after controlling for quantitative factor believed to be related to response.
7.1 - Motivation Motivation Correlation / Simple Linear Regression Correlation / Simple Linear Regression Extensions of Simple.
23-1 Analysis of Covariance (Chapter 16) A procedure for comparing treatment means that incorporates information on a quantitative explanatory variable,
Lecture 3: Inference in Simple Linear Regression BMTRY 701 Biostatistical Methods II.
Testing Multiple Means and the Analysis of Variance (§8.1, 8.2, 8.6) Situations where comparing more than two means is important. The approach to testing.
1 Peter Fox Data Analytics – ITWS-4963/ITWS-6965 Week 4b, February 20, 2015 Lab: regression, kNN and K- means results, interpreting and evaluating models.
Regression and Analysis Variance Linear Models in R.
Collaboration and Data Sharing What have I been doing that’s so bad, and how could it be better? August 1 st, 2010.
Lecture 9: ANOVA tables F-tests BMTRY 701 Biostatistical Methods II.
Lecture 8 Simple Linear Regression (cont.). Section Objectives: Statistical model for linear regression Data for simple linear regression Estimation.
Inference for Regression Simple Linear Regression IPS Chapter 10.1 © 2009 W.H. Freeman and Company.
Using R for Marketing Research Dan Toomey 2/23/2015
FACTORS AFFECTING HOUSING PRICES IN SYRACUSE Sample collected from Zillow in January, 2015 Urban Policy Class Exercise - Lecy.
Exercise 1 The standard deviation of measurements at low level for a method for detecting benzene in blood is 52 ng/L. What is the Critical Level if we.
Statistics for Business and Economics 8 th Edition Chapter 11 Simple Regression Copyright © 2013 Pearson Education, Inc. Publishing as Prentice Hall Ch.
STA 286 week 131 Inference for the Regression Coefficient Recall, b 0 and b 1 are the estimates of the slope β 1 and intercept β 0 of population regression.
Tutorial 4 MBP 1010 Kevin Brown. Correlation Review Pearson’s correlation coefficient – Varies between – 1 (perfect negative linear correlation) and 1.
Lecture 7: Multiple Linear Regression Interpretation with different types of predictors BMTRY 701 Biostatistical Methods II.
Lecture 6: Multiple Linear Regression Adjusted Variable Plots BMTRY 701 Biostatistical Methods II.
Determining Factors of GPA Natalie Arndt Allison Mucha MA /6/07.
Lecture 6: Multiple Linear Regression Adjusted Variable Plots BMTRY 701 Biostatistical Methods II.
Lecture 3 Linear Models II Olivier MISSA, Advanced Research Skills.
Linear Models Alan Lee Sample presentation for STATS 760.
Exercise 1 The standard deviation of measurements at low level for a method for detecting benzene in blood is 52 ng/L. What is the Critical Level if we.
EPP 245 Statistical Analysis of Laboratory Data 1April 23, 2010SPH 247 Statistical Analysis of Laboratory Data.
Tutorial 5 Thursday February 14 MBP 1010 Kevin Brown.
The Effect of Race on Wage by Region. To what extent were black males paid less than nonblack males in the same region with the same levels of education.
1 Analysis of Variance (ANOVA) EPP 245/298 Statistical Analysis of Laboratory Data.
1 Analysis of Variance (ANOVA) EPP 245/298 Statistical Analysis of Laboratory Data.
Before the class starts: Login to a computer Read the Data analysis assignment 1 on MyCourses If you use Stata: Start Stata Start a new do file Open the.
WSUG M AY 2012 EViews, S-Plus and R Damian Staszek Bristol Water.
Data Analytics – ITWS-4600/ITWS-6600
Chapter 12 Simple Linear Regression and Correlation
Résolution de l’ex 1 p40 t=c(2:12);N=c(55,90,135,245,403,665,1100,1810,3000,4450,7350) T=data.frame(t,N,y=log(N));T; > T t N y
Correlation and regression
Lecture 12 More Examples for SLR More Examples for MLR 9/19/2018
Console Editeur : myProg.R 1
Chapter 12 Simple Linear Regression and Correlation
Estimating the Variance of the Error Terms
Analysis of Variance (ANOVA)
Presentation transcript:

1 Some R Basics EPP 245/298 Statistical Analysis of Laboratory Data

November 3, 2005EPP 245 Statistical Analysis of Laboratory Data 2 R and Stata R and Stata both have many of the same functions Stata can be run more easily as “point and shoot” Both should be run from command files to document analyses Neither is really harder than the other, but the syntax and overall conception is different

November 3, 2005EPP 245 Statistical Analysis of Laboratory Data 3 Origins S was a statistical and graphics language developed at Bell Labs in the “one letter” days (i.e., the c programming language) R is an implementation of S, as is S-Plus, a commercial statistical package R is free, open source, runs on Windows, OS X, and Linux

November 3, 2005EPP 245 Statistical Analysis of Laboratory Data 4 Why use R? Bioconductor is a project collecting packages for biological data analysis, graphics, and annotation Most of the better methods are only available in Bioconductor or stand-alone packages With some exceptions, commercial microarray analysis packages are not competitive

November 3, 2005EPP 245 Statistical Analysis of Laboratory Data 5 Getting Data into R Many times the most direct method is to edit the data in Excel, Export as a txt file, then import to R using read.delim We will do this two ways for some energy expenditure data Frequently, the data from studies I am involved in arrives in Excel

November 3, 2005EPP 245 Statistical Analysis of Laboratory Data 6 energy package:ISwR R Documentation Energy expenditure Description: The 'energy' data frame has 22 rows and 2 columns. It contains data on the energy expenditure in groups of lean and obese women. Format: This data frame contains the following columns: expend a numeric vector. 24 hour energy expenditure (MJ). stature a factor with levels 'lean' and 'obese'. Source: D.G. Altman (1991), _Practical Statistics for Medical Research_, Table 9.4, Chapman & Hall.

November 3, 2005EPP 245 Statistical Analysis of Laboratory Data 7 > eg <- read.delim("energy1.txt") > eg Obese Lean NA NA NA NA 8.11

November 3, 2005EPP 245 Statistical Analysis of Laboratory Data 8 > class(eg) [1] "data.frame" > t.test(eg$Obese,eg$Lean) Welch Two Sample t-test data: eg$Obese and eg$Lean t = , df = , p-value = alternative hypothesis: true difference in means is not equal to 0 95 percent confidence interval: sample estimates: mean of x mean of y > mean(eg$Obese)-mean(eg$Lean) [1] NA > mean(eg$Obese[1:9])-mean(eg$Lean) [1] >

November 3, 2005EPP 245 Statistical Analysis of Laboratory Data 9 > eg2 <- read.delim("energy2.txt") > eg2 expend stature Obese Obese Obese Obese Obese Obese Obese Obese Obese Lean Lean Lean Lean Lean Lean Lean Lean Lean Lean Lean Lean Lean

November 3, 2005EPP 245 Statistical Analysis of Laboratory Data 10 > class(eg2) [1] "data.frame" > t.test(eg2$expend ~ eg2$stature) Welch Two Sample t-test data: eg2$expend by eg2$stature t = , df = , p-value = alternative hypothesis: true difference in means is not equal to 0 95 percent confidence interval: sample estimates: mean in group Lean mean in group Obese > mean(eg2[eg2[,2]=="Lean",1])-mean(eg2[eg2[,2]=="Obese",1]) [1]

November 3, 2005EPP 245 Statistical Analysis of Laboratory Data 11 > mean(eg2[eg2[,2]=="Lean",1])-mean(eg2[eg2[,2]=="Obese",1]) [1] > tapply(eg2[,1],eg2[,2],mean) Lean Obese > tmp <-tapply(eg2[,1],eg2[,2],mean) > tmp Lean Obese > class(tmp) [1] "array" > dim(tmp) [1] 2 > tmp[1]-tmp[2] Lean

November 3, 2005EPP 245 Statistical Analysis of Laboratory Data 12 Using R for Linear Regression The lm() command is used to do linear regression In many statistical packages, execution of a regression command results in lots of output In R, the lm() command produces a linear models object that contains the results of the linear model

November 3, 2005EPP 245 Statistical Analysis of Laboratory Data 13 Formulas, output and extractors If gene.exp is a response, and rads is a level of radiation to which the cell culture is exposed, then lm(gene.exp ~ rads) computes the regression lmobj <- lm(gene.exp ~ rads) Summary(lmobj) coef, resid(), fitted, …

November 3, 2005EPP 245 Statistical Analysis of Laboratory Data 14 Example Analysis Standard aqueous solutions of fluorescein (in pg/ml) are examined in a fluorescence spectrometer and the intensity (arbitrary units) is recorded What is the relationship of intensity to concentration Use later to infer concentration of labeled analyte

November 3, 2005EPP 245 Statistical Analysis of Laboratory Data 15 > fluor concentration intensity > attach(fluor) > plot(concentration,intensity) > title("Intensity vs. Concentration”)

November 3, 2005EPP 245 Statistical Analysis of Laboratory Data 16

November 3, 2005EPP 245 Statistical Analysis of Laboratory Data 17 > fluor.lm <- lm(intensity ~ concentration) > summary(fluor.lm) Call: lm(formula = intensity ~ concentration) Residuals: Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) ** concentration e-08 *** --- Signif. codes: 0 `***' `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1 Residual standard error: on 5 degrees of freedom Multiple R-Squared: , Adjusted R-squared: F-statistic: 2228 on 1 and 5 DF, p-value: 8.066e-08

November 3, 2005EPP 245 Statistical Analysis of Laboratory Data 18 > fluor.lm <- lm(intensity ~ concentration) > summary(fluor.lm) Call: lm(formula = intensity ~ concentration) Residuals: Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) ** concentration e-08 *** --- Signif. codes: 0 `***' `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1 Residual standard error: on 5 degrees of freedom Multiple R-Squared: , Adjusted R-squared: F-statistic: 2228 on 1 and 5 DF, p-value: 8.066e-08 Formula

November 3, 2005EPP 245 Statistical Analysis of Laboratory Data 19 > fluor.lm <- lm(intensity ~ concentration) > summary(fluor.lm) Call: lm(formula = intensity ~ concentration) Residuals: Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) ** concentration e-08 *** --- Signif. codes: 0 `***' `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1 Residual standard error: on 5 degrees of freedom Multiple R-Squared: , Adjusted R-squared: F-statistic: 2228 on 1 and 5 DF, p-value: 8.066e-08 Residuals

November 3, 2005EPP 245 Statistical Analysis of Laboratory Data 20 > fluor.lm <- lm(intensity ~ concentration) > summary(fluor.lm) Call: lm(formula = intensity ~ concentration) Residuals: Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) ** concentration e-08 *** --- Signif. codes: 0 `***' `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1 Residual standard error: on 5 degrees of freedom Multiple R-Squared: , Adjusted R-squared: F-statistic: 2228 on 1 and 5 DF, p-value: 8.066e-08 Slope coefficient

November 3, 2005EPP 245 Statistical Analysis of Laboratory Data 21 > fluor.lm <- lm(intensity ~ concentration) > summary(fluor.lm) Call: lm(formula = intensity ~ concentration) Residuals: Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) ** concentration e-08 *** --- Signif. codes: 0 `***' `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1 Residual standard error: on 5 degrees of freedom Multiple R-Squared: , Adjusted R-squared: F-statistic: 2228 on 1 and 5 DF, p-value: 8.066e-08 Intercept (intensity at zero concentration)

November 3, 2005EPP 245 Statistical Analysis of Laboratory Data 22 > fluor.lm <- lm(intensity ~ concentration) > summary(fluor.lm) Call: lm(formula = intensity ~ concentration) Residuals: Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) ** concentration e-08 *** --- Signif. codes: 0 `***' `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1 Residual standard error: on 5 degrees of freedom Multiple R-Squared: , Adjusted R-squared: F-statistic: 2228 on 1 and 5 DF, p-value: 8.066e-08 Variability around regression line

November 3, 2005EPP 245 Statistical Analysis of Laboratory Data 23 > fluor.lm <- lm(intensity ~ concentration) > summary(fluor.lm) Call: lm(formula = intensity ~ concentration) Residuals: Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) ** concentration e-08 *** --- Signif. codes: 0 `***' `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1 Residual standard error: on 5 degrees of freedom Multiple R-Squared: , Adjusted R-squared: F-statistic: 2228 on 1 and 5 DF, p-value: 8.066e-08 Test of overall significance of model

November 3, 2005EPP 245 Statistical Analysis of Laboratory Data 24 > plot(concentration,intensity,lw=2) > title("Intensity vs. Concentration") > abline(coef(fluor.lm),lwd=2,col="red") > plot(fitted(fluor.lm),resid(fluor.lm)) > abline(h=0) The first of these plots shows the data points and the regression line. The second shows the residuals vs. fitted values, which is better at detecting nonlinearity

November 3, 2005EPP 245 Statistical Analysis of Laboratory Data 25

November 3, 2005EPP 245 Statistical Analysis of Laboratory Data 26

November 3, 2005EPP 245 Statistical Analysis of Laboratory Data 27 > setwd(“c:/td/class/K30bench/”) > source(“wright.r”) > cor(wright) std.wright mini.wright std.wright mini.wright > wplot1() File wright.r: library(ISwR) data(wright) attach(wright) wplot1 <- function() { plot(std.wright,mini.wright,xlab="Standard Flow Meter", ylab="Mini Flow Meter",lwd=2) title("Mini vs. Standard Peak Flow Meters") wright.lm <- lm(mini.wright ~ std.wright) abline(coef(wright.lm),col="red",lwd=2) }

November 3, 2005EPP 245 Statistical Analysis of Laboratory Data 28

November 3, 2005EPP 245 Statistical Analysis of Laboratory Data 29 red.cell.folate package:ISwR R Documentation Red cell folate data Description: The 'folate' data frame has 22 rows and 2 columns. It contains data on red cell folate levels in patients receiving three different methods of ventilation during anesthesia. Format: This data frame contains the following columns: folate a numeric vector. Folate concentration ($mu$g/l). ventilation a factor with levels 'N2O+O2,24h': 50% nitrous oxide and 50% oxygen, continuously for 24~hours; 'N2O+O2,op': 50% nitrous oxide and 50% oxygen, only during operation; 'O2,24h': no nitrous oxide, but 35-50% oxygen for 24~hours.

November 3, 2005EPP 245 Statistical Analysis of Laboratory Data 30 > data(red.cell.folate) > help(red.cell.folate) > summary(red.cell.folate) folate ventilation Min. :206.0 N2O+O2,24h:8 1st Qu.:249.5 N2O+O2,op :9 Median :274.0 O2,24h :5 Mean : rd Qu.:305.5 Max. :392.0 > attach(red.cell.folate) > plot(folate ~ ventilation)

November 3, 2005EPP 245 Statistical Analysis of Laboratory Data 31 > folate.lm <- lm(folate ~ ventilation) > summary(folate.lm) Call: lm(formula = folate ~ ventilation) Residuals: Min 1Q Median 3Q Max Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) e-14 *** ventilationN2O+O2,op * ventilationO2,24h Signif. codes: 0 `***' `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1 Residual standard error: on 19 degrees of freedom Multiple R-Squared: , Adjusted R-squared: F-statistic: on 2 and 19 DF, p-value:

November 3, 2005EPP 245 Statistical Analysis of Laboratory Data 32 > anova(folate.lm) Analysis of Variance Table Response: folate Df Sum Sq Mean Sq F value Pr(>F) ventilation * Residuals Signif. codes: 0 `***' `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

November 3, 2005EPP 245 Statistical Analysis of Laboratory Data 33 > data(heart.rate) > attach(heart.rate) > heart.rate hr subj time

November 3, 2005EPP 245 Statistical Analysis of Laboratory Data 34 > anova(hr.lm) Analysis of Variance Table Response: hr Df Sum Sq Mean Sq F value Pr(>F) subj e-16 *** time * Residuals Signif. codes: 0 `***' `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

November 3, 2005EPP 245 Statistical Analysis of Laboratory Data 35 Exercises Download R and install Also download BioConductor –Go to BioConductor web page –Get and execute getbioc()… –Make sure you have a net connection and an hour Try to replicate the analyses in the presentation