R – a brief introduction
Johannes Freudenberg
Cincinnati Children’s Hospital Medical Center


Overview
  History of R
  Getting started
  R as a calculator
  Data types
  Missing values
  Subsetting
  Importing/Exporting data
  Plotting and Summarizing data
  Resources

History of R
  Statistical programming language S developed at Bell Labs since 1976 (at the same time as UNIX)
  Intended to interactively support research and data analysis projects
  Exclusively licensed to Insightful (“S-Plus”)
  R: Open source platform similar to S, developed by R. Gentleman and R. Ihaka (U of Auckland, NZ) during the 1990s
  Since 1997: international “R-core” development team
  Updated versions available every couple of months

What R is and what it is not
  R is
    – a programming language
    – a state-of-the-art statistical package
    – an interpreter
    – Open Source
  R is not
    – a database
    – a collection of “black boxes”
    – a spreadsheet software package
    – commercially supported

Getting started
  To obtain and install R on your computer
    1) Go to http://cran.r-project.org/mirrors.html to choose a mirror near you
    2) Click on your favorite operating system (Linux, Mac, or Windows)
    3) Download and install the “base”
  To install additional packages
    1) Start R on your computer
    2) Choose the appropriate item from the “Packages” menu
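Packages can also be installed from the R console instead of the “Packages” menu. The following is a minimal sketch, not part of the original slides: the helper name use_package is hypothetical, and the example uses the "stats" package, which ships with R, so that nothing is downloaded.

```r
# Hypothetical helper (not from the slides): attach a package,
# installing it from CRAN first only if it is not yet available.
use_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)  # needs a network connection and a CRAN mirror
  }
  library(pkg, character.only = TRUE)
}

# "stats" ships with base R, so no download happens here.
use_package("stats")
```

For a package that is not yet installed, the same call would trigger install.packages() once and simply attach the package on later runs.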

R as a calculator
  R can be used as a calculator:
    > 5 + (6 + 7) * pi^2
    [1] 133.3049
    > log(exp(1))
    [1] 1
    > log(1000, 10)
    [1] 3
    > sin(pi/3)^2 + cos(pi/3)^2
    [1] 1
    > Sin(pi/3)^2 + cos(pi/3)^2
    Error: couldn't find function "Sin"
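The printed results above are rounded to seven significant digits, so a value shown as 3 or 1 is not necessarily stored as exactly 3 or 1. A small sketch (the variable names are chosen for illustration) of comparing computed values with a tolerance:

```r
total <- 5 + (6 + 7) * pi^2
log_1000 <- log(1000, base = 10)      # same as log(1000, 10)
trig_sum <- sin(pi/3)^2 + cos(pi/3)^2

# Exact comparison with == can fail because of floating-point rounding;
# all.equal() compares within a small numerical tolerance instead.
log_ok  <- isTRUE(all.equal(log_1000, 3))
trig_ok <- isTRUE(all.equal(trig_sum, 1))

print(total, digits = 10)             # show more digits than the default 7
```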

R as a plotter
  R has many nice and easy-to-use plotting functions
    > plot(cars) *)
    > lines(lowess(cars), col = "Red")
    > lines(c(4, 25), c(4, 25)* …, lty = 2, col = "Blue")
    > legend(5, 118, c("lowess smoother", "linear regression"), lty = 1:2, col = c("Red", "Blue"))
  *) The data give the speed of cars and the distances taken to stop. Note that the data were recorded in the 1920s.
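The slope and intercept of the regression line are missing from the transcript of this slide. Rather than guessing the typed numbers, the sketch below computes them with lm() and draws the fitted line with abline(); this is a substitute technique, not necessarily what the original slide did.

```r
# Fit stopping distance as a linear function of speed
# on the built-in `cars` data set.
fit <- lm(dist ~ speed, data = cars)
coef(fit)                               # intercept and slope of the line

plot(cars)
lines(lowess(cars), col = "red")        # lowess smoother
abline(fit, lty = 2, col = "blue")      # fitted regression line
legend("topleft", c("lowess smoother", "linear regression"),
       lty = 1:2, col = c("red", "blue"))
```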

R as a plotter
    > plot(sin, 0, 2*pi, type = "p", pch = "*", col = 2)
    > plot(table(rpois(1000, 5)), type = "h", col = "red", lwd = 10, main = "rpois(1000,lambda=5)")

Basic (atomic) data types
  Logical
    > x <- T; y <- F
    > x; y
    [1] TRUE
    [1] FALSE
  Numerical
    > a <- 5; b <- sqrt(2)
    > a; b
    [1] 5
    [1] 1.414214
  Character
    > a <- "1"; b <- 1
    > a; b
    [1] "1"
    [1] 1
    > a <- "character"
    > b <- "a"; c <- a
    > a; b; c
    [1] "character"
    [1] "a"
    [1] "character"
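To check which basic type an object has, class() can be used. A short sketch with illustrative variable names:

```r
flag   <- TRUE     # T and F are ordinary variables that can be reassigned;
                   # TRUE and FALSE are reserved words and safer in scripts
number <- 5
text   <- "1"

types <- c(class(flag), class(number), class(text))
types              # "logical" "numeric" "character"

# A character string that looks like a number must be converted
# before it can be used in arithmetic.
converted_sum <- as.numeric(text) + number
converted_sum      # 6
```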

R workspace management
  Your R objects are stored in a workspace
  To list the objects in your workspace (may be a lot):
    > ls()
  To remove objects which you don’t need any more:
    > rm(x, y, a)
  To remove ALL objects in your workspace:
    > rm(list=ls())
  To save your workspace to a file:
    > save.image()
  The default workspace file is ./.RData
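A sketch of a full save-and-restore cycle, assuming the code is run at the top level of an R session. The object names are hypothetical, and the workspace is written to a temporary file rather than the default ./.RData so that nothing in the working directory is overwritten.

```r
speed_kmh <- c(50, 80, 120)   # hypothetical objects for the demo
label <- "demo"

ls()                          # lists the objects defined so far
rm(label)                     # remove one object by name

ws_file <- tempfile(fileext = ".RData")
save.image(file = ws_file)    # save the global workspace to that file

rm(speed_kmh)                 # simulate starting with an empty workspace
load(ws_file)                 # restore the saved objects
speed_kmh
```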

Identifiers (object names)
  Must start with a letter (A-Z or a-z) or a period not followed by a digit
    – R is case sensitive!
    – e.g., mydata is different from MyData
  Can contain letters, digits (0-9), periods (“.”) and, since version 1.9.0, underscores (“_”)
    – Periods have no special meaning (i.e., unlike in C or Java)
  Until recently (before version 1.9.0), underscore “_” had a special meaning!
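These naming rules can be checked from the console. The sketch below uses make.names(), which rewrites a string into a syntactically valid name; a candidate that comes back unchanged was already valid. The candidate names are made up for illustration.

```r
mydata <- 1
MyData <- 2
identical(mydata, MyData)     # FALSE: two distinct objects

candidates <- c("my.data", "my_data", "2fast", "x1")
is_valid <- candidates == make.names(candidates)
setNames(is_valid, candidates)  # "2fast" is not valid: it starts with a digit
```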

Vectors, Matrices, Arrays
  Vector
    – Ordered collection of data of the same data type
    – Examples:
        last names of all students in this class
        mean intensities of all genes on an oligonucleotide microarray
    – In R, a single number is a vector of length 1
  Matrix
    – Rectangular table of data of the same type
    – Example: mean intensities of all genes measured during a microarray experiment
  Array
    – Higher dimensional matrix

Vectors
  Vector: Ordered collection of data of the same data type
    > x <- c(5.2, 1.7, 6.3)
    > log(x)
    [1] 1.6486586 0.5306283 1.8405496
    > y <- 1:5
    > z <- seq(1, 1.4, by = 0.1)
    > y + z
    [1] 2.0 3.1 4.2 5.3 6.4
    > length(y)
    [1] 5
    > mean(y + z)
    [1] 4.2
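Arithmetic on vectors works element by element, and summary functions reduce a vector to a single value. A brief sketch using the same y and z as above (the name total is introduced here for illustration):

```r
y <- 1:5
z <- seq(1, 1.4, by = 0.1)

total <- y + z                # element-wise: 1 + 1.0, 2 + 1.1, ...
length(total)                 # number of elements
sum(total) / length(total)    # the same value that mean(total) returns
range(total)                  # smallest and largest element
```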

Matrices
  Matrix: Rectangular table of data of the same type
    > m <- matrix(1:12, 4, byrow = T); m
         [,1] [,2] [,3]
    [1,]    1    2    3
    [2,]    4    5    6
    [3,]    7    8    9
    [4,]   10   11   12
    > y <- -1:2
    > m.new <- m + y
    > t(m.new)
         [,1] [,2] [,3] [,4]
    [1,]    0    4    8   12
    [2,]    1    5    9   13
    [3,]    2    6   10   14
    > dim(m)
    [1] 4 3
    > dim(t(m.new))
    [1] 3 4
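Adding the length-4 vector y to the 4 x 3 matrix m works because R stores matrices column by column and recycles the shorter vector down each column. A sketch that makes this explicit (the variable same is introduced for illustration):

```r
m <- matrix(1:12, nrow = 4, byrow = TRUE)
y <- -1:2

m.new <- m + y

# Equivalent: add a matrix whose every column is y.
same <- identical(m.new, m + matrix(y, nrow = 4, ncol = 3))
same

m.new[, 1]                    # first column of m (1, 4, 7, 10) plus y
dim(t(m.new))                 # transposing swaps rows and columns
```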

Missing values
  R is designed to handle statistical data and is therefore bound to deal with missing values
  Numbers that are “not available”
    > x <- c(1, 2, 3, NA)
    > x + 3
    [1]  4  5  6 NA
  “Not a number”
    > log(c(0, 1, 2))
    [1]      -Inf 0.0000000 0.6931472
    > 0/0
    [1] NaN
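Because NA propagates through most computations, it usually has to be detected or excluded explicitly. A minimal sketch with illustrative variable names:

```r
x <- c(1, 2, 3, NA)

is.na(x)                          # FALSE FALSE FALSE TRUE
mean_with_na <- mean(x)           # NA: the missing value propagates
mean_no_na <- mean(x, na.rm = TRUE)
mean_no_na                        # 2

# NaN counts as missing for is.na(), but NA is not NaN.
is.na(0/0)                        # TRUE
is.nan(NA)                        # FALSE
```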

Subsetting
  It is often necessary to extract a subset of a vector or matrix
  R offers various neat ways to do that
    > x <- c("a","b","c","d","e","f","g","h")
    > x[1] *)
    > x[3:5]
    > x[-(3:5)]
    > x[c(T, F, T, F, T, F, T, F)]
    > x[x <= "d"]
    > m[,2]
    > m[3,]
  *) Index starts with 1, not with 0!
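The slide does not show what these expressions return. The sketch below repeats them with the results as comments, assuming m is the 4 x 3 matrix from the Matrices slide; the names on the left-hand side are introduced for illustration.

```r
x <- c("a", "b", "c", "d", "e", "f", "g", "h")
m <- matrix(1:12, 4, byrow = TRUE)

first     <- x[1]                    # "a" (indexing starts at 1)
middle    <- x[3:5]                  # "c" "d" "e"
dropped   <- x[-(3:5)]               # "a" "b" "f" "g" "h"
every_2nd <- x[c(TRUE, FALSE)]       # logical index is recycled: "a" "c" "e" "g"
up_to_d   <- x[x <= "d"]             # "a" "b" "c" "d"

second_col <- m[, 2]                 # 2 5 8 11
third_row  <- m[3, ]                 # 7 8 9
```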

Other Objects and Data Types
  Functions
  Factors
  Lists
  Dataframes
  We’ll talk about them later in the course

Importing/Exporting Data
  Importing data
    – R can import data from other applications
    – Packages are available to import microarray data, Excel spreadsheets etc.
    – The easiest way is to import tab delimited files
      > my.data <- read.table("file", sep=",") *)
      > SimpleData <- read.table(file = "…", header = TRUE, quote = "", sep = "\t", comment.char="")
  Exporting data
    – R can also export data in various formats
    – Tab delimited is the most common
      > write.table(x, "filename") *)
  *) make sure to include the path or to first change the working directory
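A sketch of a complete export and re-import of a tab-delimited file. The data frame, its column names (ID, C1, W1) and the file name are made up for illustration, and the file is written to a temporary directory. Note that write.table() separates fields with a space unless sep = "\t" is given.

```r
# Hypothetical data, not the course data set.
expr <- data.frame(ID = c("g1", "g2", "g3"),
                   C1 = c(120.5, 98, 310.25),
                   W1 = c(101, 87.5, 295),
                   stringsAsFactors = FALSE)

out_file <- file.path(tempdir(), "expr.txt")

# Export as tab-delimited text without quotes or row names.
write.table(expr, out_file, sep = "\t", quote = FALSE, row.names = FALSE)

# Import it again; header = TRUE takes the column names from the first line.
expr_back <- read.table(out_file, header = TRUE, sep = "\t", quote = "",
                        comment.char = "", stringsAsFactors = FALSE)
expr_back
```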

Analyzing/Summarizing data
  First, let’s take a look…
    > SimpleData[1:10,]
  Mean, Variance, Standard deviation, etc.
    > mean(SimpleData[,3])
    > mean(log(SimpleData[,3]))
    > var(SimpleData[,4])
    > sd(SimpleData[,3])
    > cor(SimpleData[,3:4])
    > colMeans(SimpleData[3:14])
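SimpleData itself is not included in this transcript, so the sketch below applies the same functions to a small simulated stand-in. The data frame demo and its columns are hypothetical; only the layout (two annotation columns followed by numeric columns) mirrors the indexing used above.

```r
set.seed(1)                                   # make the simulation reproducible
demo <- data.frame(ID   = paste0("g", 1:20),
                   Name = paste0("gene", 1:20),
                   C1   = rlnorm(20, meanlog = 5),
                   W1   = rlnorm(20, meanlog = 5))

demo[1:10, ]                                  # first ten rows

mean_c1 <- mean(demo[, 3])
var_c1  <- var(demo[, 3])
sd_c1   <- sd(demo[, 3])                      # square root of the variance
cor_mat <- cor(demo[, 3:4])                   # 2 x 2 correlation matrix
col_avg <- colMeans(demo[3:4])                # one mean per numeric column
```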

Plotting
  Scatter plot
    > plot(log(SimpleData[,"C1"]), log(SimpleData[,"W1"]), xlab = "channel 1", ylab = "channel 2")
  Histogram
    > hist(log(SimpleData[,7]))
    > hist(log(SimpleData[,7]), nclass = 50, main = "Histogram of W3 (on log scale)")
  Boxplot
    > boxplot(log(SimpleData[,3:14]))
    > boxplot(log(SimpleData[,3:14]), outline = F, boxwex = 0.5, col = 3, main = "Boxplot of SimpleData")
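The three plot types can be tried without the course data. This sketch uses simulated intensities in a hypothetical data frame named sim with columns C1 and W1; it uses the breaks argument of hist() in place of nclass.

```r
set.seed(42)
sim <- data.frame(C1 = rlnorm(200, meanlog = 6),
                  W1 = rlnorm(200, meanlog = 6))

# Scatter plot of the two channels on the log scale
plot(log(sim[, "C1"]), log(sim[, "W1"]),
     xlab = "channel 1", ylab = "channel 2")

# Histogram; hist() also returns the bin counts it has drawn
h <- hist(log(sim[, "W1"]), breaks = 20,
          main = "Histogram of W1 (on log scale)")

# One box per column, outliers hidden
boxplot(log(sim), outline = FALSE, boxwex = 0.5, col = 3,
        main = "Boxplot of simulated data")
```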

Getting help… and quitting
  Getting information about a specific command
    > help(rnorm)
    > ?rnorm
  Finding functions related to a key word
    > help.search("boxplot")
  Starting the R installation help pages
    > help.start()
  Quitting R
    > q()

Resources
  Books
    – Assigned text book
    – For an extended list visit project.org/doc/bib/R-publications.html
  Mailing lists
    – R-help (project.org/mail.html)
    – Bioconductor (g/docs/mailList.html)
    – However, first read the posting guide/general instructions and search archives
  Online documentation
    – R Project documentation
        Manuals
        FAQs
        …
    – Bioconductor documentation
        Vignettes
        Short Courses (shops/)
        …
    – Google
  Personal communication
    – me:
    – Ask other R users

References
  H Chen: R-Programming. programming.ppt
  WN Venables and DM Smith: An Introduction to R
  http://cm.bell-labs.com/cm/ms/departments/sia/S/history.html