SHOU Haochang ( 寿昊畅 ) Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health July 11th, 2011 Nanjing University, China *Thanks to.

Presentation transcript:

SHOU Haochang ( 寿昊畅 ) Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health July 11th, 2011 Nanjing University, China *Thanks to Prof. Ji and Prof. Ruczinski for some of the lecture materials Lab1: Getting Started with R

Some Facts about R
A system for data analysis and visualization based on the S language.
Open source and open development.
First developed by Robert Gentleman and Ross Ihaka (also known as "R & R") of the Statistics Department of the University of Auckland.
The first version was released in 2000; the latest version is R
Flexible: can interface with C, WinBUGS, Matlab and databases.

Download and Setup
Official website
CRAN (The Comprehensive R Archive Network)
Choose your mirror site, e.g.
Windows users: download and run the R win.exe file.
Mac users: download the R dmg file.

R Studio

Simple Syntax to Begin with
R commands are case sensitive!
Comment with a hash mark (#)
Set working directory
> getwd()
> setwd("C:/Users/shouhermione/Documents/TA/Nanjing/Karen")
Data types: numeric, complex (1+2i), character ('A' / "hello world!"), logical (TRUE/FALSE)
Classes of objects: vector, matrix, list, data frame, function
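The data types and object classes listed on this slide can be inspected with class(). The following is a minimal sketch; the object names (a, A, types) are made up for illustration and are not part of the lecture.

```r
# Basic data types
class(3.14)             # "numeric"
class(1+2i)             # "complex"
class("hello world!")   # "character"
class(TRUE)             # "logical"

# R is case sensitive: a and A are two different objects
a <- 1
A <- 2
a == A                  # FALSE

# Classes of some common objects.
# Note: class() of a plain vector reports its data type (here "integer").
types <- c(vector     = class(1:3),
           matrix     = class(matrix(1:4, nrow = 2))[1],
           list       = class(list(1, "a")),
           data.frame = class(data.frame(x = 1)),
           func       = class(mean))
types
```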

Vector, matrix and array
> x<-1:10
> x
 [1]  1  2  3  4  5  6  7  8  9 10
> w=c(x,0.3,-2.1,5.7)
Other useful functions for creating a vector: seq(), rep()
> y<-matrix(1:6,nrow=2,ncol=3,byrow=FALSE)
> y
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
> y[2,1]
[1] 2
> z<-array(1:9,dim=c(3,3,3))  # 1:9 is recycled to fill all 27 cells
Element-wise arithmetic operators: +, -, *, /, %/%, %%
Other useful functions: summary(), mean(), median(), sd(), sum(), max(), min(), sort(), order()
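As a short illustration of the helpers named on this slide, the sketch below tries seq(), rep(), the integer-division and remainder operators (%/% and %%), and the difference between sort() and order(). The variable names s, r, q, m, v and yy are arbitrary choices for this example.

```r
x <- 1:10

s <- seq(from = 0, to = 1, by = 0.25)   # 0.00 0.25 0.50 0.75 1.00
r <- rep(c(1, 2), times = 3)            # 1 2 1 2 1 2

q <- x %/% 3    # integer division: 0 0 1 1 1 2 2 2 3 3
m <- x %% 3     # remainder:        1 2 0 1 2 0 1 2 0 1

y <- matrix(1:6, nrow = 2, ncol = 3, byrow = FALSE)
yy <- y * y     # element-wise product, not matrix multiplication

v <- c(3, 1, 2)
sort(v)         # sorted values: 1 2 3
order(v)        # positions that would sort v: 2 3 1
```

sort() returns the values themselves, while order() returns indices, which is what is needed to reorder the rows of a matrix or data frame by one of its columns.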

List and Data Frame
A list is an object whose components can be of different classes and dimensions.
> x<-list(gender=c('F','M'),grade=c(98,100,90),undergrad=FALSE)
> x$gender
> x[[1]]
> names(x)
A data frame is a list whose components all have the same length.
> y<-data.frame(gender=c('F','M'),grade=c(98,100),undergrad=c(FALSE,TRUE))
> y$grade   # same as y[,2]
Indexing works as for matrices: y[1,2], y$grade[1]
> nrow(y); ncol(y)
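The difference between the two structures can be checked directly. This sketch reuses the objects x and y from the slide; the object named bad is added here only to show what happens when the components of a data frame have different lengths.

```r
x <- list(gender = c('F', 'M'), grade = c(98, 100, 90), undergrad = FALSE)
lens <- sapply(x, length)          # component lengths differ: 2, 3 and 1

y <- data.frame(gender = c('F', 'M'), grade = c(98, 100),
                undergrad = c(FALSE, TRUE))
is.list(y)                         # TRUE: a data frame is a special kind of list
identical(y$grade, y[, 2])         # TRUE: two ways to extract the same column
identical(y[1, 2], y$grade[1])     # TRUE: matrix-style and list-style indexing agree
dim(y)                             # 2 rows, 3 columns

# Components of unequal length (2 vs 3) are rejected by data.frame()
bad <- try(data.frame(gender = c('F', 'M'), grade = c(98, 100, 90)),
           silent = TRUE)
inherits(bad, "try-error")         # TRUE
```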

Input and Output Data
Read in a data frame
read.table(): ASCII file; read.csv(): CSV file (e.g. exported from Excel)
> dat<-read.csv('osteo.csv', header=TRUE, sep=',')
> dat<-read.table('osteo.txt', header=TRUE, sep=' ')
read.table is not suitable for large matrices with many columns. Use scan() instead.
Output the data
> write.table(dat, 'osteo2.txt', col.names=TRUE, sep='\t')
Save and reload .RData files: save(); load()
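Since the osteo files are not included here, the round trip below uses a small made-up data frame and temporary files as stand-ins. It is only a sketch of how write.table(), read.table(), save() and load() fit together; the column values are invented.

```r
# Hypothetical toy data standing in for the osteo data set
dat <- data.frame(Age = c(61, 70, 58), DPA = c(0.82, 0.65, 0.91))

# Write a tab-separated text file and read it back
txt_file <- tempfile(fileext = ".txt")
write.table(dat, txt_file, col.names = TRUE, row.names = FALSE, sep = "\t")
dat2 <- read.table(txt_file, header = TRUE, sep = "\t")

# Save the object in R's binary format, remove it, then restore it
rdata_file <- tempfile(fileext = ".RData")
save(dat, file = rdata_file)
rm(dat)
load(rdata_file)    # recreates the object 'dat' under its original name
```

Setting row.names = FALSE keeps the row labels out of the file, so the header line and the data lines have the same number of fields.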

Loops
Calculate 4! = ? with 'for' and 'while'
# 'for' loop
s<-1
for(i in 1:4){
  s=s*i
}
print(s)
# 'while' loop
s<-4
j<-4-1
while(j>=1){
  s=s*j
  j=j-1
}
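Both loops can be checked against R's built-in functions. In this sketch the names n, s_for and s_while are introduced for illustration; prod() and factorial() are base R functions.

```r
n <- 4

# 'for' loop: multiply 1 * 2 * ... * n
s_for <- 1
for (i in 1:n) {
  s_for <- s_for * i
}

# 'while' loop: start at n and multiply downwards
s_while <- n
j <- n - 1
while (j >= 1) {
  s_while <- s_while * j
  j <- j - 1
}

# All four values agree: 24
c(s_for, s_while, prod(1:n), factorial(n))
```

For a task like this, the vectorized prod(1:n) is usually preferred in R over an explicit loop.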

Finding Help
If you know the exact name of the function: help(mean), ?mean
If you don't know the name: help.search('mean'), ??mean
help.start()
Go to R's online documentation
Search and post questions on the mailing list
Google!

Graphics in R

Scatter plots, boxplots, histograms, stem-and-leaf plots, QQ plots, images…
> x<-seq(from=0,to=1,length=50)
> w<-2*cos(4*pi*x)  #true value
> e<-rnorm(50,mean=0,sd=.5)  #random errors
> y<-w+e
> plot(x,y,type='l',ylim=c(-3,4))
> lines(x,w,col='blue',lwd=2,lty='dashed')
> legend('topright',legend=c('with noise','true value'),col=c('black','blue'),lty=c('solid','dashed'),lwd=c(1,2))

op<-par(mfrow=c(2,2))
plot(dat$Age, dat$DPA,main='DPA vs. age',xlab='age',ylab='DPA',col='blue')
hist(dat$DPA,main='Histogram of DPA')
boxplot(dat$DPA~dat$Osteo,main='Boxplot of DPA by disease status')
qqnorm(dat$DPA)
qqline(dat$DPA)
par(op)

R Packages
Download and install packages; load the package for use, e.g., library(SemiPar)
Bioconductor: two releases each year, more than 460 packages; statistical tools built in R for high-dimensional genomic data analysis
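One common pattern is to install a package only when it is missing and then load it. The helper use_package() below is hypothetical (it is not part of R or of the lecture); it relies on the base functions requireNamespace(), install.packages() and library().

```r
# Hypothetical helper: install a CRAN package only if it is missing, then load it
use_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)   # needs an internet connection and a CRAN mirror
  }
  library(pkg, character.only = TRUE)
}

# 'stats' ships with R, so this call only loads it and installs nothing
use_package("stats")

# For the package mentioned on the slide one would call:
# use_package("SemiPar")
```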

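Some Useful Sources
An Introduction to R by Venables and Smith
Mailing list
Prof. Ji's website for statistical computing
Statistical Modeling and R Software (统计建模与R软件) by Xue Yi (薛毅)
Capital of Statistics (COS, 统计之都) forum, Renmin University of China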