Introduction à R Olivier Mestre Ecole Nationale de la Météorologie.

Slides:



Advertisements
Similar presentations
An Introduction to R: Logic & Basics. The R language Command line Can be executed within a terminal Within Emacs using ESS (Emacs Speaks Statistics)
Advertisements

Statistical Methods Lynne Stokes Department of Statistical Science Lecture 7: Introduction to SAS Programming Language.
Introduction to Matlab
Slides 2c: Using Spreadsheets for Modeling - Excel Concepts (Updated 1/19/2005) There are several reasons for the popularity of spreadsheets: –Data are.
Using Excel Biostatistics 212 Lecture 4. Housekeeping Questions about Lab 3? –replace vs. recode Final Project Dataset! –“Housekeeping” commands vs. data.
R tutorial g/methods2.2010/R-intro.pdf.
The General Linear Model. The Simple Linear Model Linear Regression.
Part 4 Chapter 13 Linear Regression
A Simple Guide to Using SPSS© for Windows
Log-linear and logistic models
Introduction to Exploratory Descriptive Data Analysis in S-Plus Jagdish S. Gangolly State University of New York at Albany.
Alternative text for elementary statistics –Elementary Concepts –Basic Statistics.
Linear and generalised linear models Purpose of linear models Least-squares solution for linear models Analysis of diagnostics Exponential family and generalised.
How to Use the R Programming Language for Statistical Analyses Part I: An Introduction to R Jennifer Urbano Blackford, Ph.D. Department of Psychiatry Kennedy.
Introduction to R: The Basics Rosales de Veliz L., David S.L., McElhiney D., Price E., & Brooks G. Contributions from Ragan. M., Terzi. F., & Smith. E.
Chapter 1 Number Sense See page 8 for the vocabulary and key concepts of this chapter.
Basic R Programming for Life Science Undergraduate Students Introductory Workshop (Session 1) 1.
MATLAB Lecture One Monday 4 July Matlab Melvyn Sim Department of Decision Sciences NUS Business School
732A44 Programming in R.  Self-studies of the course book  2 Lectures (1 in the beginning, 1 in the end)  Labs (computer). Compulsory submission of.
Data, graphics, and programming in R 28.1, 30.1, Daily:10:00-12:45 & 13:45-16:30 EXCEPT WED 4 th 9:00-11:45 & 12:45-15:30 Teacher: Anna Kuparinen.
REVIEW 2 Exam History of Computers 1. CPU stands for _______________________. a. Counter productive units b. Central processing unit c. Copper.
Introduction to to R Emily Kalah Gade University of Washington Credit to Kristin Siebel for development of much of this PowerPoint.
Arko Barman with modification by C.F. Eick COSC 4335 Data Mining Spring 2015.
Ranjeet Department of Physics & Astrophysics University of Delhi Working with Origin.
MEGN 536 – Computational Biomechanics MATLAB: Getting Started Prof. Anthony J. Petrella Computational Biomechanics Group.
MATLAB An Introduction to MATLAB (Matrix Laboratory) 1.
Data Objects in R Vector1 dimensionAll elements have the same data types Data types: numeric, character logic, factor Matrix2 dimensions Array2 or more.
Piotr Wolski Introduction to R. Topics What is R? Sample session How to install R? Minimum you have to know to work in R Data objects in R and how to.
Installing R CRAN: –(R homepage: –Windows 95 and later  Base –rw2001.exe.
Statistics 1: tests and linear models. How to get started? Exploring data graphically: Scatterplot HistogramBoxplot.
Multivariate Statistics Matrix Algebra I W. M. van der Veld University of Amsterdam.
Recap Script M-file Editor/Debugger Window Cell Mode Chapter 3 “Built in MATLAB Function” Using Built-in Functions Using the HELP Feature Window HELP.
Introduction to R. Why use R Its FREE!!! And powerful, fairly widely used, lots of online posts about it Uses S -> an object oriented programing language.
L – Modelling and Simulating Social Systems with MATLAB Lesson 2 – Statistics and plotting Anders Johansson and Wenjian Yu © ETH.
Introduction to Programming in R Department of Statistical Sciences and Operations Research Computation Seminar Series Speaker: Edward Boone
Recap Sum and Product Functions Matrix Size Function Variance and Standard Deviation Random Numbers Complex Numbers.
Matlab Basics FIN250f: Lecture 3 Spring 2010 Grifths Web Notes.
Math 15 Lecture 9 University of California, Merced Scilab A Short Introduction – No. 3 Today – Quiz #4.
STAT 251 Lab 1. Outline Lab Accounts Introduction to R.
Matlab Screen  Command Window  type commands  Current Directory  View folders and m-files  Workspace  View program variables  Double click on a.
Lecture 26: Reusable Methods: Enviable Sloth. Creating Function M-files User defined functions are stored as M- files To use them, they must be in the.
Quantitative analysis and R – (1) LING115 November 18, 2009.
INTRODUCTION TO MATLAB DAVID COOPER SUMMER Course Layout SundayMondayTuesdayWednesdayThursdayFridaySaturday 67 Intro 89 Scripts 1011 Work
Introduction to MATLAB 1.Basic functions 2.Vectors, matrices, and arithmetic 3.Flow Constructs (Loops, If, etc) 4.Create M-files 5.Plotting.
1 Lecture 5 Post-Graduate Students Advanced Programming (Introduction to MATLAB) Code: ENG 505 Dr. Basheer M. Nasef Computers & Systems Dept.
Engineering Analysis ENG 3420 Fall 2009 Dan C. Marinescu Office: HEC 439 B Office hours: Tu-Th 11:00-12:00.
1 An Introduction to R © 2009 Dan Nettleton. 2 Preliminaries Throughout these slides, red text indicates text that is typed at the R prompt or text that.
R objects  All R entities exist as objects  They can all be operated on as data  We will cover:  Vectors  Factors  Lists  Data frames  Tables 
CS100A, Fall 1998, Lecture 191 CS100A, Fall 1998 Lecture 19, Thursday Nov 05 Matlab Concepts: Matlab arrays Matlab subscripting Matlab plotting.
Lecture 11 Introduction to R and Accessing USGS Data from Web Services Jeffery S. Horsburgh Hydroinformatics Fall 2013 This work was funded by National.
An Introduction to Programming in Matlab Emily Blumenthal
L – Modeling and Simulating Social Systems with MATLAB Lecture 2 – Statistics and plotting © ETH Zürich | Giovanni Luca Ciampaglia,
1-2 What is the Matlab environment? How can you create vectors ? What does the colon : operator do? How does the use of the built-in linspace function.
16BIT IITR Data Collection Module If you have not already done so, download and install R from download.
 Naïve Bayes  Data import – Delimited, Fixed, SAS, SPSS, OBDC  Variable creation & transformation  Recode variables  Factor variables  Missing.
Programming in R Intro, data and programming structures
R programming language
Introduction to R Samal Dharmarathna.
Built-in MATLAB Functions Chapter 3
Modelling and Simulating Social Systems with MATLAB
Naomi Altman Department of Statistics (Based on notes by J. Lee)
MATLAB DENC 2533 ECADD LAB 9.
Use of Mathematics using Technology (Maltlab)
Stat 251 (2009, Summer) Lab 1 TA: Yu, Chi Wai.
Introduction to Matlab
R Course 1st Lecture.
Matlab Training Session 2: Matrix Operations and Relational Operators
Data analysis with R and the tidyverse
Advanced data management
R tutorial
Presentation transcript:

Introduction à R Olivier Mestre Ecole Nationale de la Météorologie

R on the web Main ressource Mirrors example : Toulouse (CICT, UPS)

Characteristics Advantages –Free –Available on UNIX, LINUX, WINDOWS –Dedicated to statistics, graphical capabilities –open source Inconvénients –Bugs (always possible) –Beware memory size in climate applications

Running R Install R : download from website Running R : simply type R Quit R (sniff) : type q( ).

Sessions and programs Interactive sessions Programs : command files source(“prog.R”) Shell UNIX (mode batch) : R CMD BATCH prog.R

Help Full documentation on CRAN Use “search” on CRAN R session : ?command example: ?sd sd : entering a command gives its source code.

Installing “packages” (linux) Package = true reason of R usefulness Export R_LIBS variable in your “.profile” (library directory). Example : install sudoku package (to be done once) install.packages(“sudoku”,dependencies=TRUE) Load sudoku in your session (necessary each time you want to use it). library(sudoku)

Basic commands ls():list objects rm():remove objects rm(list=ls()):remove all objects system(“ls”):call unix function “ls” a=5:assign a<-5:assign (old) a:print a value (interactive) print(a):print a value (program) Beware, R is case sensitive

Special characters #Comments ;Separates two commands on the same line. “ “Strings pi3,

Arithmétics +: addition -: substraction, sign * : multiplication /: division ^: power %/%: integer division %: remainder from integer division

Logical ==: equal to !=: no equal to <: less than >: greater than <=: less than or equal to >=: greater than or equal to is.na(): missing? &: logical AND |: logical OR !: logical NOT

Conversion as.numeric(x)conversion to numeric as.integer(x)conversion to integer as.character(x)conversion to string as.logical(x)conversion to logical as.matrix(x) is.numeric(x),is.integer(x)… gives TRUE if numeric or integer variable, FALSE. Beware : integer is also numeric, while numeric may be different from integer.

Strings Paste function num.fic = 1 n.fic = as.character(numero.fichier) nom.fichier=paste(“fichier”,n.fic,”.txt”,sep=“”) nchar,substr functions a=“abcdefg” nchar(a) substr(a,1,3) substr(a,1,4)=“123456”

Generating numeric(25): 25 zéros (vector) character(25): 25 “” vector of empty strings seq(-4,4,0.1): sequence … 4.0 1:10: idem seq(1,10,1) 1:10-3: Arf! 1:(10-3): Arf! c(5,7,1:3): concatenation rep(1,7): replication

Matrix generating matrix(0,nrow=3,ncol=4) matrix(1:12,nrow=3,ncol=4) matrix(1:12,nrow=3,ncol=4, byrow=TRUE)

Data frames Allow mixing different data types in the same object. All variables must have same length donnee=data.frame(an=1880:2005,TN,TX) donnee=data frame made of 3 vectors of same length : donnee$an, donnee$TN, donnees$TX

Data frames names(donnee): names of the variables in “donnee” attach(donnee) : allows direct call of variables : “an” instead of “donnee$an”. Beware there is no link between those variables in case of modifications detach(donnee) : opposite to attach()

Lists List = “composite” object Useful for function results toto=list(y=an,titi,tata) names(toto) : nom des objets dans toto

Exercise Create a data frame containing two variables : “an” : years 1880, 1881,…, 2000 “bis”: logical, indicating wether year has 365 (FALSE) or 366 days (TRUE) Hint : use reminder of division by 4

Classical functions x may be scalar of vector (in the latter case, the result is a vector). round(x,k) : rounding x (k digits) log(x): natural log log10(x): log base 10 sqrt(x): square root exp(x) sin(x),cos(x),tan(x) asin(x),acos(x),atan(x)

Functions on vectors length(x) : size of x min(x), min(x1,x2) : gives the minimum of x (or x1,x2) max(x) : same as min, for maximum pmin(x1,x2..,xk) : gives k minima of x1, x2… xk pmax(x1,x2,…, xk) : same as pmin for maxima sort(x) : sorting x (if index.return=TRUE, gives also the corresponding indices)

Some statistics sum(x): sum of x elements mean(x): mean of x elements var(x): variance sd(x): standard deviation median(x): médian quantile(x,p): p quantile cor(x,y):correlation between x and y

Indexing, selection x[1]: first element of x x[1:5]: 5 first elements x[c(3,5,6)] : elements 3,5,6 of x z=c(3,5,6) ; x[z] : idem x[x>=5] : elements of x  5 L=x>=5 ; x[L]: idem i=which(x>=5);x[i]: idem

Matrices m[i,]: ith line m[,j]: jth column Selection columns or lines == vectors m1%*%m2: matrix multiplication solve(m): matrix inversion svd(m): SVD

Matrices rbind, cbind allow adding rows (rbind) or columns (cbind) to a vector, matrix or data frame y=rep(1,10);z=seq(1,1) M=cbind(y,z) remove M second column M[,-2] remove rows 4 and 6 M[c(-4,-6), ]

Indexing Frames donnee[donnee$annee<=1940,] subset of “donnee” corresponding to years before 1941 subset(donnee,annee<=1940) idem

Programming if (toto==2) {tata=1} if (toto==2) {tata=1} else {tata=0}

Programming for (i in 1:10) { tata[i]=exp(toto[i])} Beware, “:” has priority, compare 1:10-1 and 1:(10-1) i=1 while (i < 10) {tata[i]=exp(toto[i]) i=i+1}

Functions Definition ect=function(x) {resultat=sqrt(var(x) return(resultat)} Call s=ect(x)

Functions Lists in fuctions moyect=function(x) {s=ect(x) m=mean(x) resultat=list(moyenne=m,ect=s) return(resultat)}

Read data Interactive a=readline(“donner la valeur de a ”) Read ASCII file with header data=read.table(file=“nomfic”,header=TRUE) Write ASCII file (use format and round) write.table(format(round(data,k)),quote=F file=“nomfic”,sep=“ ”,rownames=F)

Saving objects save(a,m,file=“toto.sav”) load(“toto.sav”)

Distributions Normal law, expectation m, sd s dnorm(x,m,s): density de x pnorm(x,m,s): repartition function qnorm(p,m,s): p quantile rnorm(n,m,s): random number generation

Distributions Convention : d=density, p=repartition, q=quantile,r=random unif(,min,max) uniform on [min,max] pt(,df) Student(df) chisq(,df)  ²(df) f(,d1,d2) Fisher(d1,d2) pois(,lambda) Poisson(lambda) Etc..

Figures Open device : x11(): window on screen postscript(file=“fig.eps”): postscript png(file=“fig.png”): PNG Plot () Close device graphics.off(): close windows or finalizes files

Figures Plot() plot(x,y,type=“l”,main=“toto”,…) Parameters type=“l” (line),”p” (point),”h” (vertical line)… main=“title”,xlab=“title x”,ylab=“title y”,sub=“subtitle” –xlim=[a,b],ylim=[c,d]. Beware, R adds 4% to axis. Add xaxs=“i”,yaxs=“i”, for exact setting of axis range –col=“red”colors() for list

Controlling lines and symbols lwd=line width lty=line type (dots, etc…) pch=“+”, “.”,”*” etc.. ?par lists parameters an use

Other plotting functions lines(x,y): adds lines points(x,y): points text(x,y,”texte”) : adds text at x,y abline(a,b): straight line y=ax+b abline(h=y): horizontale line (height=y) abline(v=x): vertical line (position=x) See also : legend Etc…

Infinite capabilities Multiple figures 2D contours 3D (lattice) Maps (package “map”)

Exercise Draw density of N(0,1) –generate x from -5 to +5 (step : 0.01) –Calculate and plot density of normal law N(0,1)

Exercise Central-limit theorem –Generate 1000 vectors of size 12 following uniform law U[0,1] –command hist() : histogram of generated values –Calculate means of the 1000 vectors, and represent their histogram –Same questions for an exponential law of rate 10 Magic!

Exercise Series of daily minimum (TN) and maximum (TX) des TN et TX in Strasbourg –Load ascii file “Q lst” –Calculate and represent in separate figures: series of annual means of TN series of annual means of TX series of summer (JJA) means of TX Point series of annual maxima of TX (“+”) Point series of annual maxima of TN (“+”)

Exercise Series of annual number of frost days –Calculate and plot this series –command summary() : basic statistics on this series –command hist() : histogrma of the annual number of frost days

Tests t.test: Student var.test: Fisher cor.test: correlation tests. Other options: method=“ kendall“ ou method=“spearman“ chisq.test:  ² test

Gaussian linear model lm(y~x): y explained by x lm(y~x 1 +x 2 ): y explained by x1, x2 f=as.factor(f): transforms f into a factor lm(y~f) : one factor ANOVA lm(y~f1+f2) : two factors ANOVA lm(y~x+f): covariance analysis

Formula, interactions ~ : explained by + : additive effects :: interaction * : effects + interactions a * b = a + b + a:b -1: removes intercept

Outputs lm.out=lm(y~x): results in lm.out object summary(lm.out): coefficients, tests, etc.. anova(lm.out): regression sum of squares plot(lm.out): plot diagnosis fitted(lm.out): fitted values residuals(lm.out): residuals predict(lm.out,newdata): prediction for a new data frame

GLM Families : ?family Logistic regression glm.out=glm(y~x, binomial) Poisson régression glm.out=glm(y~x, poisson) Remark: lm(y~x)equivalent to glm(y~x, gaussian)

Outputs summary(lm.out): coefficients, tests, etc.. fitted(lm.out): fitted values residual(lm.out): residuals predict(lm.out,newdata) prediction for a new data frame