FISH 397C Winter 2009 Evan Girvetz Basic Statistical Analyses and Contributed Packages in R © R Foundation, from



Basic Statistical Analysis in R
–Correlation: cor.test()
–Linear modeling: lm()
–t-test: t.test()
–ANOVA: aov()
–Chi-squared: chisq.test()
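The slides that follow demonstrate only lm() and aov(). As a minimal sketch of the other three calls, the example below uses a small simulated stand-in for the possum data. The data frame `toy` and all of its values are invented for illustration; only the column names mirror the slides.

```r
# Simulated stand-in data; the values are made up, not real possum measurements.
set.seed(1)
toy <- data.frame(
  totlngth = rnorm(40, mean = 87, sd = 4),
  sex      = factor(rep(c("f", "m"), each = 20)),
  Pop      = factor(rep(c("Vic", "other"), times = 20))
)
toy$taill <- 15 + 0.25 * toy$totlngth + rnorm(40, sd = 1.5)

# Pearson correlation test between two numeric columns
ct <- cor.test(toy$taill, toy$totlngth)

# Two-sample t-test (Welch by default) comparing tail length between sexes
tt <- t.test(taill ~ sex, data = toy)

# Chi-squared test of independence on a two-way table of counts
xt <- chisq.test(table(toy$sex, toy$Pop))

ct$estimate   # correlation coefficient
tt$p.value    # p-value of the t-test
xt$statistic  # chi-squared statistic
```

Each call returns an object of class "htest", so the estimate, statistic, and p-value can be pulled out by name rather than read off the printed output.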

Linear Regression: lm()
> lm(taill ~ totlngth, data = possum)
> taill.lm <- lm(taill ~ totlngth, data = possum)
> summary(taill.lm)
The summary output lists the call, the residual quantiles (Min, 1Q, Median, 3Q, Max), and a coefficient table (Estimate, Std. Error, t value, Pr(>|t|)) in which the intercept is flagged ** and totlngth is flagged ***.
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The output ends with the residual standard error on 41 degrees of freedom, the multiple and adjusted R-squared, and an F-statistic on 1 and 41 DF with p-value 2.883e-05.
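The quantities printed by summary() can also be extracted from the fitted model programmatically. The sketch below assumes a simulated data frame `toy` with 43 rows, chosen only so that the residual degrees of freedom match the 41 shown on the slide. The values themselves are invented and will not reproduce the slide's estimates.

```r
# Simulated stand-in data (43 rows gives 41 residual degrees of freedom).
set.seed(2)
toy <- data.frame(totlngth = rnorm(43, mean = 87, sd = 4))
toy$taill <- 15 + 0.25 * toy$totlngth + rnorm(43, sd = 1.5)

fit <- lm(taill ~ totlngth, data = toy)

coef(fit)                              # intercept and slope
coef_table <- summary(fit)$coefficients # Estimate, Std. Error, t value, Pr(>|t|)
r2 <- summary(fit)$r.squared           # multiple R-squared
resid_df <- df.residual(fit)           # n - 2 for a one-predictor model
```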

ANOVA
> sexTaill.aov <- aov(taill ~ sex, data = possum)
> summary(sexTaill.aov)

ANOVA: interactions
> sexPopTaill.aov <- aov(taill ~ sex + Pop + sex*Pop, data = possum)
> summary(sexPopTaill.aov)
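In R's formula syntax, `sex*Pop` already expands to both main effects plus their interaction, so the formula on this slide fits the same model as the shorter `taill ~ sex*Pop`. A quick way to check this without any data is to compare the expanded term labels:

```r
# Two ways of writing the same two-way model with interaction
f_long  <- taill ~ sex + Pop + sex * Pop
f_short <- taill ~ sex * Pop

# terms() expands a formula; "term.labels" lists the resulting model terms
labels_long  <- attr(terms(f_long),  "term.labels")
labels_short <- attr(terms(f_short), "term.labels")

labels_short                          # "sex" "Pop" "sex:Pop"
identical(labels_long, labels_short)  # TRUE
```

The `sex:Pop` term is the interaction on its own. Writing `sex + Pop + sex:Pop` is a third equivalent spelling.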

Contributed Packages
Go to
–Click on CRAN
–Select a CRAN mirror close to you (e.g. USA-WA)
Click on “packages” to look at descriptions and more information about contributed packages
Go look at the vegan package
–Open up the reference manual link

Contributed Packages
First you must install a package on your machine (and must re-install it when R is updated)
–This can be done from the pull-down menu in the R GUI (this is the easiest)
–Or it can be done using the command install.packages()
–Or the packages can be downloaded and installed manually as .zip files

Contributed Packages
Once a package is installed on your computer, you must load it into an R session each time you open a new session.
–This can be done from the GUI pull-down menu (under Packages)
–Or it can be done from the command line:
> library(vegan)
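The install-once, load-every-session pattern can be scripted. The sketch below is one possible approach, not part of the original slides: it checks which of the two class packages are already installed and leaves the actual install.packages() call commented out so that nothing is downloaded unintentionally.

```r
pkgs <- c("vegan", "Hmisc")

# requireNamespace() returns TRUE if a package is installed and loadable
is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
not_installed <- pkgs[!is_installed]

if (length(not_installed) > 0) {
  message("Not installed: ", paste(not_installed, collapse = ", "))
  # install.packages(not_installed)  # uncomment to install from CRAN
}

# Attach the packages that are available for this session
for (p in pkgs[is_installed]) {
  library(p, character.only = TRUE)
}
```

Because `p` holds the package name as a string, library() needs `character.only = TRUE`.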

Hands-on Exercise
Install the following packages on your machine:
–vegan
–Hmisc
Now load these packages into your R session (and add the code to your script for the class)

Cluster Analysis Example
Select only possums greater than age 5 (rows with a missing age are also kept):
> possum5 <- possum[(possum$age > 5) | (is.na(possum$age)),]
Calculate Jaccard distance matrix:
> possum5.jac <- vegdist(possum5[,6:14], method = "jaccard")
Run cluster analysis on distance matrix:
> possum5.jac.hclust <- hclust(possum5.jac, method = "ward.D") # "ward" was renamed "ward.D" in R 3.1.0

Plotting Cluster Analysis
> plot(possum5.jac.hclust, xlab = "Possum Individuals", sub = "")
This adds rectangles to create k = 4 groups:
> par(lwd = 3, lty = 2)
> rect.hclust(possum5.jac.hclust, k = 4) # add rectangles to show groups
> par(lwd = 1, lty = 1)
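The rectangles only mark the groups on the plot. To get the group membership as data, cutree() can cut the same tree at the same k. The sketch below is self-contained and therefore swaps in the built-in USArrests data and base R's dist() (Euclidean distance) in place of the possum data and vegan's vegdist(). It illustrates the workflow and does not reproduce the slide's result.

```r
# Stand-in data and distance: built-in USArrests, scaled, Euclidean distance
d  <- dist(scale(USArrests))

# Same clustering call as on the slide; "ward.D" is the current name of the
# method that older R versions called "ward"
hc <- hclust(d, method = "ward.D")

# Cut the tree into k = 4 groups, matching rect.hclust(hc, k = 4)
groups <- cutree(hc, k = 4)

table(groups)  # number of observations in each group
head(groups)   # named integer vector: one group label per row of the data
```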

Writing to graphic files
Remember that this plot can be written to a graphics file using the commands:
> png("dendrogram.png", 1500, 1000, pointsize = 30)
> # put code for graphics here
> dev.off()
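A complete open, draw, close sequence might look like the sketch below. It writes to a temporary directory and draws a dendrogram of the built-in USArrests data as a placeholder. Both are assumptions for the sake of a runnable example. In class you would put your own plotting code between png() and dev.off().

```r
# Write to a temporary location so the working directory is left untouched
out_file <- file.path(tempdir(), "dendrogram.png")

# Open the PNG device: width and height in pixels, base text size in points
png(out_file, width = 1500, height = 1000, pointsize = 30)

# Placeholder plot; replace with the plotting code you want to save
plot(hclust(dist(scale(USArrests))), xlab = "States", sub = "")

# Close the device; the file is only complete after this call
dev.off()

file.exists(out_file)
```

Nothing appears on screen while the png device is open, because all drawing goes to the file.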

Adding Error Bars to Graphics
There are many ways to do this.
–Hmisc has capability for this
> library(Hmisc)
> ?errbar

Hands-on Exercise
Create a new data table called hdlngthBySite, with three columns:
–The site number
–The mean hdlngth for each site
–The standard deviation of hdlngth for each site
(Remember you can use aggregate to do this)
Then plot hdlngth vs site (scatter plot is fine)
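One possible way to approach this exercise with aggregate() is sketched below. It uses a simulated data frame `toy` in place of possum (the values are invented), and the column names `hdlngth.mean` and `hdlngth.sd` are chosen to match those used on the following slides.

```r
# Simulated stand-in for the possum data: 7 sites, 6 animals each
set.seed(3)
toy <- data.frame(
  site    = rep(1:7, each = 6),
  hdlngth = rnorm(42, mean = 92, sd = 3)
)

# aggregate() with a function returning two values yields a matrix column
agg <- aggregate(hdlngth ~ site, data = toy,
                 FUN = function(x) c(mean = mean(x), sd = sd(x)))

# Flatten the matrix column into ordinary columns hdlngth.mean and hdlngth.sd
hdlngthBySite <- do.call(data.frame, agg)
names(hdlngthBySite)

# Scatter plot of mean head length by site
plot(hdlngth.mean ~ site, data = hdlngthBySite)
```

Calling aggregate() twice (once with mean, once with sd) and merging the results by site would work equally well.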

Adding Error Bars to Graphics
> ?errbar
> yplus <- hdlngthBySite$hdlngth.mean + hdlngthBySite$hdlngth.sd
> yminus <- hdlngthBySite$hdlngth.mean - hdlngthBySite$hdlngth.sd

Adding Error Bars to Graphics
> plot(hdlngth.mean ~ site, data = hdlngthBySite, ylim = c(80,105))
> errbar(x = hdlngthBySite$site, y = hdlngthBySite$hdlngth.mean, yplus = yplus, yminus = yminus, add = TRUE, ylim = c(80,105))
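If Hmisc is not available, similar bars can be drawn with base R's arrows(), using flat arrowheads at both ends. This is an alternative technique, not the errbar() call from the slide, and the summary table below contains made-up values purely so the example runs on its own.

```r
# Hypothetical summary table; the numbers are invented for illustration
hdlngthBySite <- data.frame(
  site         = 1:4,
  hdlngth.mean = c(93, 90, 94, 92),
  hdlngth.sd   = c(3, 2.5, 4, 3)
)

# Upper and lower ends of the bars: mean plus or minus one standard deviation
yplus  <- hdlngthBySite$hdlngth.mean + hdlngthBySite$hdlngth.sd
yminus <- hdlngthBySite$hdlngth.mean - hdlngthBySite$hdlngth.sd

# Set ylim from the bar ends so no bar is clipped
plot(hdlngth.mean ~ site, data = hdlngthBySite,
     ylim = range(c(yminus, yplus)))

# angle = 90 with code = 3 draws a flat cap at both ends of each segment
arrows(x0 = hdlngthBySite$site, y0 = yminus,
       x1 = hdlngthBySite$site, y1 = yplus,
       angle = 90, code = 3, length = 0.05)
```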