Introduction to R: user-friendly and absolutely free

Ho Kim, School of Public Health, SNU

Useful sites
R is free software with powerful tools.
The Comprehensive R Archive Network (CRAN): http://cran.r-project.org/ -> Download R for Windows -> base -> Download R-3.2.2 for Windows
Textbook: Simple R by John Verzani, http://cran.r-project.org/doc/contrib/Verzani-SimpleR.pdf

Features of R
R is free.
R is open-source and runs on UNIX, Windows and Macintosh.
R has an excellent built-in help system.
R has excellent graphing capabilities.
Students can easily migrate to the commercially supported S-Plus program if commercial software is desired.
R's language has a powerful, easy-to-learn syntax with many built-in statistical functions.
The language is easy to extend with user-written functions.
R is a computer programming language. Programmers will find it more familiar than other statistics packages, and for new computer users the next leap to programming will not be so large.
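The following sketch shows what extending R with a user-written function looks like. The function name `cv` (coefficient of variation, sd divided by mean) is a hypothetical example and is not part of base R. It passes `na.rm` through to the built-in `sd()` and `mean()`.

```r
# Hypothetical user-written function: coefficient of variation (sd / mean)
cv <- function(x, na.rm = FALSE) {
  sd(x, na.rm = na.rm) / mean(x, na.rm = na.rm)
}

cv(c(2, 4, 6))                     # sd = 2, mean = 4, so 0.5
cv(c(2, 4, 6, NA), na.rm = TRUE)   # the missing value is dropped first
cv(c(2, 4, 6, NA))                 # NA, because na.rm defaults to FALSE
```

Once defined, `cv()` is called exactly like a built-in function.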

Starting R

Data manipulation
Data input: data types, importing data, exporting data, viewing data, value labels, missing data
Data management: variables, operators, sorting data, merging data, subsetting data
Source: Quick-R, http://www.statmethods.net/

Data types [Data input]
Vectors
a <- c(1, 2, 5.3, 6, -2, 4)                    # numeric vector
b <- c("one", "two", "three")                  # character vector
c <- c(TRUE, TRUE, TRUE, FALSE, TRUE, FALSE)   # logical vector
a[c(2, 4)]                                     # 2nd and 4th elements of vector
Factors
gender <- c(rep("male", 20), rep("female", 30))
gender <- factor(gender)   # R now treats gender as a nominal variable
summary(gender)
Matrices
y <- matrix(1:20, nrow=5, ncol=4)   # generates a 5 x 4 numeric matrix
y[, 4]        # 4th column of matrix
y[3, ]        # 3rd row of matrix
y[2:4, 1:3]   # rows 2, 3, 4 of columns 1, 2, 3
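Two details of these data types often surprise beginners. First, `matrix()` fills its values column by column unless `byrow = TRUE` is given. Second, factor levels are sorted alphabetically by default. The sketch below shows both. The names `y_byrow` and `gender2` are illustrative only.

```r
# matrix() fills column by column unless byrow = TRUE
y <- matrix(1:20, nrow = 5, ncol = 4)
y[3, ]          # 3 8 13 18
y[, 4]          # 16 17 18 19 20
y_byrow <- matrix(1:20, nrow = 5, ncol = 4, byrow = TRUE)
y_byrow[3, ]    # 9 10 11 12

# Factor levels are sorted alphabetically by default ...
gender <- factor(c(rep("male", 20), rep("female", 30)))
levels(gender)  # "female" "male"
# ... but the order can be set explicitly, e.g. to make "male" the first level
gender2 <- factor(gender, levels = c("male", "female"))
levels(gender2) # "male" "female"
table(gender2)  # male 20, female 30
```

Level order matters because it controls the order of tables and plots, and it determines which level is the reference category in models.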

Data types [Data input]
Data frames
d <- c(1, 2, 3, 4)
e <- c("red", "white", "red", NA)
f <- c(TRUE, TRUE, TRUE, FALSE)
mydata <- data.frame(d, e, f)
names(mydata) <- c("ID", "Color", "Passed")   # variable names
Lists
# example of a list with 4 components:
# a string, a numeric vector, a matrix, and a scalar
# (a and y are the vector and matrix created on the previous slide)
w <- list(name="Fred", mynumbers=a, mymatrix=y, age=5.3)
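Creating a data frame or list is only half the job; the other half is getting data back out of it. The sketch below rebuilds the slide's objects so that it runs on its own. It names the columns directly inside `data.frame()`, which is equivalent to assigning `names()` afterwards. It then shows the common ways to access columns and list components.

```r
a <- c(1, 2, 5.3, 6, -2, 4)
y <- matrix(1:20, nrow = 5, ncol = 4)
mydata <- data.frame(ID = c(1, 2, 3, 4),
                     Color = c("red", "white", "red", NA),
                     Passed = c(TRUE, TRUE, TRUE, FALSE))
w <- list(name = "Fred", mynumbers = a, mymatrix = y, age = 5.3)

# Data frame columns are accessed with $ or [[ ]]
mydata$Color
mydata[mydata$Passed, ]        # rows where Passed is TRUE (3 rows)
mydata[, c("ID", "Passed")]    # selected columns only

# List components: $ and [[ ]] return the component itself,
# while [ ] returns a smaller list
w$name                  # "Fred"
w[["mymatrix"]][2, 3]   # row 2, column 3 of y: 12
w["age"]                # a list of length 1 holding 5.3
```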

Importing data [Data input]
From a CSV file
malaria <- read.table("C:\\R_data\\malaria.csv", header=TRUE, sep=",")
From Excel
library(RODBC)
channel <- odbcConnectExcel("C:\\R_data\\malaria.xls")
malaria <- sqlFetch(channel, "mal")   # read the sheet named "mal"
odbcClose(channel)                    # close the connection when done
*odbcConnectExcel is only usable with 32-bit Windows
From a tab-delimited text file
malaria <- read.table("C:\\R_data\\malaria.txt", header=TRUE, sep="\t")
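`read.csv()` is `read.table()` with `header = TRUE` and `sep = ","` already set. The sketch below checks this without touching the disk by using the `text =` argument. The three-row table is invented toy data standing in for malaria.csv, with the columns `subject`, `age` and `mal` used elsewhere in these slides. The last lines assume the C:\R_data folder from the slides and show how `file.path()` builds a path with forward slashes, which R also accepts on Windows.

```r
# Hypothetical toy contents standing in for malaria.csv
csv_text <- "subject,age,mal
1,5,1
2,12,0
3,7,1"

# read.table() with header and sep set explicitly ...
m1 <- read.table(text = csv_text, header = TRUE, sep = ",")
# ... matches read.csv(), whose defaults are header = TRUE, sep = ","
m2 <- read.csv(text = csv_text)
str(m1)

# For a real file, build the path with file.path()
path <- file.path("C:", "R_data", "malaria.csv")   # "C:/R_data/malaria.csv"
# malaria <- read.csv(path)   # uncomment once the file exists
```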

Exporting data [Data input]
To a CSV file (sep="," is required, because write.table() separates fields with spaces by default)
write.table(malaria, "C:\\R_data\\mal01.csv", sep=",", row.names=FALSE)
To a tab-delimited text file
write.table(malaria, "C:\\R_data\\mal02.txt", sep="\t", row.names=FALSE)
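Unless `sep` is given, `write.table()` separates fields with a space, so a file named .csv is not comma-separated by default. The sketch below writes a small hypothetical data frame to temporary files, created with `tempfile()` so nothing is written under C:\R_data. It then reads each file back to check the round trip.

```r
# Hypothetical toy data frame standing in for malaria
malaria <- data.frame(subject = 1:3, age = c(5L, 12L, 7L), mal = c(1L, 0L, 1L))

# write.table() without sep: fields are separated by spaces
f_space <- tempfile(fileext = ".txt")
write.table(malaria, f_space, row.names = FALSE)
readLines(f_space, n = 1)   # header line contains no commas

# write.csv() always uses sep = ","; read it back with read.csv()
f_csv <- tempfile(fileext = ".csv")
write.csv(malaria, f_csv, row.names = FALSE)
back <- read.csv(f_csv)
all.equal(back, malaria)    # compare the round trip with the original

# Tab-delimited round trip: write with sep = "\t", read with read.delim()
f_tab <- tempfile(fileext = ".txt")
write.table(malaria, f_tab, sep = "\t", row.names = FALSE)
back_tab <- read.delim(f_tab)
```

`row.names = FALSE` matters here. Without it, the row names are written as an extra unnamed first column, and that column comes back as a new variable when the file is read in.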

Viewing data
ls()                                # list objects in the working environment
names(malaria)                      # list the variables in malaria
str(malaria)                        # list the structure of malaria
malaria$v1 <- factor(malaria$mal)   # create factor v1 from mal
levels(malaria$v1)                  # list levels of factor v1 in malaria
dim(malaria)                        # dimensions of malaria
class(malaria)                      # class of malaria (numeric, matrix, data.frame, etc.)
malaria                             # print malaria
head(malaria, n=10)                 # print first 10 rows of malaria
tail(malaria, n=5)                  # print last 5 rows of malaria
summary(malaria)                    # summary of each variable
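Because malaria.csv may not be at hand, the sketch below builds a hypothetical six-row data frame with the same column names. It then runs the viewing commands on it, so their return values can be checked. Note that `levels()` only works once `v1` has been created as a factor.

```r
# Hypothetical toy data frame standing in for malaria
malaria <- data.frame(subject = 1:6,
                      age = c(5, 12, 7, 3, 9, 15),
                      mal = c(1, 0, 1, 1, 0, 0))

dim(malaria)      # 6 rows, 3 columns
class(malaria)    # "data.frame"
names(malaria)    # "subject" "age" "mal"

# Turn the 0/1 code into a factor, then list its levels
malaria$v1 <- factor(malaria$mal)
levels(malaria$v1)    # "0" "1"

head(malaria, n = 4)  # first 4 rows
tail(malaria, n = 2)  # last 2 rows
```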

Value labels
# variable v1 is coded 1, 2 or 3
# we want to attach value labels 1=red, 2=blue, 3=green
v1 <- c(1, 1, 1, 2, 2, 3)
v2 <- factor(v1, levels = c(1, 2, 3), labels = c("red", "blue", "green"))
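After labelling, `as.integer()` on the factor returns each value's level position, not its original code. The two happen to agree here only because the codes are 1, 2, 3. The sketch below also uses a hypothetical 10/20 coding to show how to recover the original codes in the general case.

```r
v1 <- c(1, 1, 1, 2, 2, 3)
v2 <- factor(v1, levels = c(1, 2, 3), labels = c("red", "blue", "green"))
table(v2)        # red 3, blue 2, green 1
as.integer(v2)   # 1 1 1 2 2 3 (level positions, equal to the codes here)

# Hypothetical variable coded 10/20: positions and codes now differ
codes <- factor(c(10, 20, 20))
as.integer(codes)                 # 1 2 2 (level positions)
as.numeric(as.character(codes))   # 10 20 20 (original codes)
```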

Missing data
Testing for missing values
y <- c(1, 2, 3, NA)
is.na(y)   # returns a vector (F F F T)
Recoding values to missing
malaria[malaria$age == 99, "age"] <- NA
Excluding missing values from analyses
x <- c(1, 2, NA, 3)
mean(x)                # returns NA
mean(x, na.rm=TRUE)    # returns 2
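The sketch below assumes a convention in which 99 means "unknown". It recodes that value on a plain vector with `x[x == 99] <- NA`, a form that still works when the vector already contains NA. By contrast, indexing a data frame's rows with a logical vector that contains NA can raise an error during assignment. The sketch also shows `complete.cases()` and `na.omit()` on a hypothetical data frame, for finding and dropping incomplete rows.

```r
# Recode an assumed sentinel value (99 = unknown) to NA
age <- c(4, 99, NA, 8)
age[age == 99] <- NA      # the NA in the comparison is harmless here
sum(is.na(age))           # 2
mean(age, na.rm = TRUE)   # 6

# Hypothetical toy data frame with missing values
df <- data.frame(age = c(4, NA, 8), mal = c(1, 0, NA))
complete.cases(df)   # TRUE FALSE FALSE
na.omit(df)          # keeps only the first row
```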

Help
> help(mean)
> ?mean


Variables [Data management]
Recoding variables
# create 2 age categories
malaria$agecat <- ifelse(malaria$age > 7, "student", "baby")
# the same recoding, using attach() to refer to age directly
attach(malaria)
malaria$agecat2[age > 7] <- "student"
malaria$agecat2[age <= 7] <- "baby"
detach(malaria)
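Two base-R alternatives reach the same grouping without `attach()`. `cut()` splits a numeric variable at break points and returns a factor. `with()` evaluates an expression using the data frame's columns without attaching it. The sketch below runs on hypothetical toy ages and checks that all three versions agree.

```r
# Hypothetical toy data standing in for malaria
malaria <- data.frame(subject = 1:5, age = c(3, 7, 8, 12, 5))

# Two groups with ifelse(), as on the slide
malaria$agecat <- ifelse(malaria$age > 7, "student", "baby")

# The same split with cut(): intervals are (-Inf, 7] and (7, Inf]
malaria$agecat3 <- cut(malaria$age, breaks = c(-Inf, 7, Inf),
                       labels = c("baby", "student"))

# with() reads columns without attach()/detach()
malaria$agecat4 <- with(malaria, ifelse(age > 7, "student", "baby"))
```

`cut()` scales more easily to three or more categories: add more break points and labels.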

Operators [Data management]
Comparison operators
==   equals
!=   not equals
<    less than
>    greater than
<=   less than or equals
>=   greater than or equals
Assignment
=    assignment (same as the arrow <-)
Logical operators
&    and
|    or
!    not
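Comparison and logical operators work element by element on vectors, which is what makes them useful for selecting rows. The short sketch below illustrates this on a hypothetical vector `x`. The doubled forms `&&` and `||` are meant for single TRUE/FALSE values, for example inside `if ()`, and are not shown here.

```r
x <- c(1, 5, 9)

x > 2 & x < 8    # FALSE TRUE FALSE: & compares element by element
x < 2 | x > 8    # TRUE FALSE TRUE
!(x == 5)        # TRUE FALSE TRUE, the same as x != 5

# == compares; = and <- assign
n = 3
n == 3           # TRUE
```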

Sorting data [Data management]
# sort by mal
newdata <- malaria[order(malaria$mal), ]
# sort by mal and age
newdata2 <- malaria[order(malaria$mal, malaria$age), ]
# sort by mal (ascending) and age (descending)
newdata3 <- malaria[order(malaria$mal, -malaria$age), ]
Avoid the attach() command when sorting data.
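`order()` returns the row positions in sorted order. Ties on the first key are broken by the next key, and a minus sign reverses a numeric key. The sketch below uses a hypothetical five-row data frame so that the resulting row order can be checked.

```r
# Hypothetical toy data
df <- data.frame(id = 1:5,
                 mal = c(1, 0, 1, 0, 1),
                 age = c(5, 9, 12, 3, 7))

# mal ascending, then age descending within each mal group
sorted <- df[order(df$mal, -df$age), ]
sorted$id        # 2 4 3 5 1

# decreasing = TRUE reverses the whole ordering
df[order(df$age, decreasing = TRUE), "id"]   # 3 2 5 1 4
```

The minus trick works only for numeric columns. For a character column, `-xtfrm(x)` gives the same descending effect.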

Merging data [Data management]
Raw dataset
malaria2 <- read.table("C:\\R_data\\malaria.csv", header=TRUE, sep=",")
Adding rows
extra <- read.table("C:\\R_data\\extra15.csv", header=TRUE, sep=",")
malaria3 <- rbind(malaria2, extra)   # both data frames must have the same variables
Adding columns
region <- read.table("C:\\R_data\\region.csv", header=TRUE, sep=",")
malaria4 <- merge(malaria3, region, by="subject")   # match rows on the key variable subject
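By default `merge()` keeps only the subjects that appear in both tables, so unmatched rows silently disappear. The sketch below uses small hypothetical tables in place of the three CSV files. It shows this behaviour and the `all.x = TRUE` option, which keeps every row of the first table and fills the missing values with NA.

```r
# Hypothetical toy tables standing in for malaria.csv, extra15.csv, region.csv
malaria3 <- data.frame(subject = 1:3, mal = c(1, 0, 1))
extra    <- data.frame(subject = 4L, mal = 0)
region   <- data.frame(subject = c(1L, 2L, 4L),
                       region = c("north", "south", "north"))

# rbind() stacks rows; the column names must match
malaria3 <- rbind(malaria3, extra)      # 4 rows

# merge() keeps only subjects present in both tables by default
inner <- merge(malaria3, region, by = "subject")                # 3 rows
# all.x = TRUE keeps every row of the first table; subject 3 gets NA
left  <- merge(malaria3, region, by = "subject", all.x = TRUE)  # 4 rows
```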

Subsetting data [Data management]
mal.1 <- subset(malaria, mal == 1)
summary(mal.1)
mal.baby <- subset(malaria, mal == 1 & age < 8)
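`subset()` and bracket indexing differ when the condition is NA. `subset()` drops such rows. Plain logical indexing returns a row filled with NA, and wrapping the condition in `which()` restores the `subset()` behaviour. The sketch below shows the difference on a hypothetical data frame with one missing age. It also shows the `select` argument, which keeps only the listed columns.

```r
# Hypothetical toy data with one missing age
malaria <- data.frame(subject = 1:4,
                      mal = c(1, 1, 0, 1),
                      age = c(5, NA, 6, 10))

s1 <- subset(malaria, mal == 1 & age < 8)                    # NA condition: row dropped
s2 <- malaria[malaria$mal == 1 & malaria$age < 8, ]          # NA condition: row of NAs
s3 <- malaria[which(malaria$mal == 1 & malaria$age < 8), ]   # which() drops NA, like subset()

# select keeps only the listed columns
s4 <- subset(malaria, mal == 1, select = c(subject, age))
```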