LISA Short Course Series Basics of R Lin Zhang Feb. 16, 2015 LISA: Basics of RFeb. 16, 2015.

Slides:



Advertisements
Similar presentations
Introduction to R Brody Sandel. Topics Approaching your analysis Basic structure of R Basic programming Plotting Spatial data.
Advertisements

Associate Collaborator for LISA Department of Statistics, VT
R for Macroecology Aarhus University, Spring 2011.
Training on R For 3 rd and 4 th Year Honours Students, Dept. of Statistics, RU Empowered by Higher Education Quality Enhancement Project (HEQEP) Department.
Statistical Methods Lynne Stokes Department of Statistical Science Lecture 7: Introduction to SAS Programming Language.
MATLAB – What is it? Computing environment / programming language Tool for manipulating matrices Many applications, you just need to get some numbers in.
CART: Classification and Regression Trees Chris Franck LISA Short Course March 26, 2013.
Objectives Understand the software development lifecycle Perform calculations Use decision structures Perform data validation Use logical operators Use.
Tutorial 5: Working with Excel Tables, PivotTables, and PivotCharts
Introduction to GTECH 201 Session 13. What is R? Statistics package A GNU project based on the S language Statistical environment Graphics package Programming.
Ann Arbor ASA ‘Up and Running’ Series: SPSS Prepared by volunteers of the Ann Arbor Chapter of the American Statistical Association, in cooperation with.
LISA Short Course Series Multivariate Analysis in R Liang (Sally) Shan March 3, 2015 LISA: Multivariate Analysis in RMar. 3, 2015.
Lecture 2 LISAM. Statistical software.. LISAM What is LISAM? Social network for Creating personal pages Creating courses  Storing course materials (lectures,
Introduction to MATLAB MECH 300H Spring Starting of MATLAB.
Introduction to SPSS Short Courses Last created (Feb, 2008) Kentaka Aruga.
LISA Short Course Series R Basics
Introduction to Array The fundamental unit of data in any MATLAB program is the array. 1. An array is a collection of data values organized into rows and.
LISA Short Course Series R Statistical Analysis Ning Wang Summer 2013 LISA: R Statistical AnalysisSummer 2013.
Introduction to MATLAB ENGR 1187 MATLAB 1. Programming In The Real World Programming is a powerful tool for solving problems in every day industry settings.
Managing Business Data Lecture 8. Summary of Previous Lecture File Systems  Purpose and Limitations Database systems  Definition, advantages over file.
LISA Short Course Series R Basics Ana Maria Ortega Villa Fall 2013 LISA: R BasicsFall 2013.
Introduction to MATLAB Session 1 Prepared By: Dina El Kholy Ahmed Dalal Statistics Course – Biomedical Department -year 3.
Shuyu Chu Department of Statistics February 17, 2014 Lisa Short Course Series R Statistical Analysis Laboratory for Interdisciplinary Statistical Analysis.
732A44 Programming in R.  Self-studies of the course book  2 Lectures (1 in the beginning, 1 in the end)  Labs (computer). Compulsory submission of.
Hands-on Introduction to R. Outline R : A powerful Platform for Statistical Analysis Why bother learning R ? Data, data, data, I cannot make bricks without.
Data, graphics, and programming in R 28.1, 30.1, Daily:10:00-12:45 & 13:45-16:30 EXCEPT WED 4 th 9:00-11:45 & 12:45-15:30 Teacher: Anna Kuparinen.
1 Experimental Statistics - week 4 Chapter 8: 1-factor ANOVA models Using SAS.
Arko Barman with modification by C.F. Eick COSC 4335 Data Mining Spring 2015.
ELG 3120 Signal and System Analysis 1 Introduction to MATLAB TAs Wei Zhang Ozgur Ekici (Section A)(Section B) ELG 3120 Lab Tutorial 1.
1 Lab of COMP 406 Teaching Assistant: Pei-Yuan Zhou Contact: Lab 1: 12 Sep., 2014 Introduction of Matlab (I)
Introduction to R Lecture 1: Getting Started Andrew Jaffe 8/30/10.
Piotr Wolski Introduction to R. Topics What is R? Sample session How to install R? Minimum you have to know to work in R Data objects in R and how to.
What is MATLAB? MATLAB is one of a number of commercially available, sophisticated mathematical computation tools. Others include Maple Mathematica MathCad.
Vectors and Matrices In MATLAB a vector can be defined as row vector or as a column vector. A vector of length n can be visualized as matrix of size 1xn.
Getting Started with MATLAB 1. Fundamentals of MATLAB 2. Different Windows of MATLAB 1.
10/24/20151 Chapter 2 Review: MATLAB Environment Introduction to MATLAB 7 Engineering 161.
SAS Interactive Matrix Language Computing for Research I Spring 2012 Ramesh.
MATLAB Environment ELEC 206 Computer Applications for Electrical Engineers Dr. Ron Hayne.
What does C store? >>A = [1 2 3] >>B = [1 1] >>[C,D]=meshgrid(A,B) c) a) d) b)
STAT 251 Lab 1. Outline Lab Accounts Introduction to R.
Introduction to R Introductions What is R? RStudio Layout Summary Statistics Your First R Graph 17 September 2014 Sherubtse Training.
Lecture 20: Choosing the Right Tool for the Job. What is MATLAB? MATLAB is one of a number of commercially available, sophisticated mathematical computation.
Comparison of different output options from Stata
Lecture 26: Reusable Methods: Enviable Sloth. Creating Function M-files User defined functions are stored as M- files To use them, they must be in the.
Introduction to R Carol Bult The Jackson Laboratory Functional Genomics (BMB550) Spring 2011.
Mr. Magdi Morsi Statistician Department of Research and Studies, MOH
STAT 534: Statistical Computing Hari Narayanan
Matlab Introduction  Getting Around Matlab  Matrix Operations  Drawing Graphs  Calculating Statistics  (How to read data)
SCRIPTS AND FUNCTIONS DAVID COOPER SUMMER Extensions MATLAB has two main extension types.m for functions and scripts and.mat for variable save files.
Math 252: Math Modeling Eli Goldwyn Introduction to MATLAB.
PHP Tutorial. What is PHP PHP is a server scripting language, and a powerful tool for making dynamic and interactive Web pages.
Lecture 11 Introduction to R and Accessing USGS Data from Web Services Jeffery S. Horsburgh Hydroinformatics Fall 2013 This work was funded by National.
PROGRAMMING USING PYTHON LANGUAGE ASSIGNMENT 1. INSTALLATION OF RASPBERRY NOOB First prepare the SD card provided in the kit by loading an Operating System.
Introduction to R and Data Science Tools in the Microsoft Stack Jamey Johnston.
Basics of R INSTRUCTOR: AMANDA MCGOUGH TUESDAY, MARCH 29, 2016.
Pinellas County Schools
Review > x[-c(1,4,6)] > Y[1:3,2:8] > island.data fishData$weight[1] > fishData[fishData$weight < 20 & fishData$condition.
16BIT IITR Data Collection Module If you have not already done so, download and install R from download.
Working with data in R 2 Fish 552: Lecture 3. Recommended Reading An Introduction to R (R Development Core Team) –
Block 1: Introduction to R
Lecture 2: Introduction to R
Programming in R Intro, data and programming structures
Introduction to R Samal Dharmarathna.
MATLAB DENC 2533 ECADD LAB 9.
Use of Mathematics using Technology (Maltlab)
Vectors and Matrices In MATLAB a vector can be defined as row vector or as a column vector. A vector of length n can be visualized as matrix of size 1xn.
R Course 1st Lecture.
Stat 251 (2009, Summer) Lab 2 TA: Yu, Chi Wai.
Data analysis with R and the tidyverse
Presentation transcript:

LISA Short Course Series Basics of R Lin Zhang Feb. 16, 2015 LISA: Basics of RFeb. 16, 2015

Laboratory for Interdisciplinary Statistical Analysis Collaboration: Visit our website to request personalized statistical advice and assistance with: Designing Experiments Analyzing Data Interpreting Results Grant Proposals Software (R, SAS, JMP, Minitab...) LISA statistical collaborators aim to explain concepts in ways useful for your research. Great advice right now: Meet with LISA before collecting your data. All services are FREE within VT community. We assist with research—not class projects or homework. LISA helps VT researchers benefit from the use of Statistics LISA also offers: Educational Short Courses: Designed to help graduate students apply statistics in their research Walk-In Consulting: for questions <30 min. Available: OSB---Monday-Friday from 1-3 PM; Other locations--- Tuesday-Friday AM & Monday 3-5 PM. (Please refer to our website for details regarding times and locations.)

1. What is R 2. Why use R 3. Installing R in your own computer 4. R studio 5. Data Import 6. Data Structures and Manipulation 7. Exploratory Data Analysis 8. Loops 9. If/Else Statements 10. Data Export Outline LISA: Basics of RFeb. 16, 2015

LISA: Basics of R Summer 2013 LISA: Basics of RFeb. 16, 2015 What is R? R is a powerful, versatile, and free statistical programming software. Scientists, statisticians, analysts and others who are interested in statistical analysis, data visualization, etc. are using R to do so. Data analysis is done in R by using customized or built in scripts and functions written in R language. By using R, you can not only perform the traditional statistical analysis, but also implement some of the most recent cutting-edge statistical methods. R is an open source software. This means that you can download and use R for free, and additionally the source code is open and available for inspection and modification.

Why use R? LISA: Basics of RFeb. 16, R is free and open. 2. R is a language. You learn much more than just point and click. 3. R has excellent tools for graphics and data visualization. 4. R is flexible. You are not restricted to the built-in functions; you can customize and extend those functions to serve your own needs. You can make your analysis your own!

LISA: Basics of RFeb. 16, 2015 How to Obtain R for your own computer? Windows: MacOs X:

LISA: Basics of RFeb. 16, 2015 R Studio The console will display all your results and commands. Workspace and history of commands Available files, generated plots, package management and help

We need to set the working directory. For this we use the function setwd(): dir <- “location” setwd(dir) 1. Comma Separated Values: Use the function read.table mydatacsv<- read.table('prices.csv', sep=',', header=T) mydatacsv<- read.csv('prices.csv', sep=',', header=T) 2. Text File: Use the function read.table: mydatatxt<- read.table('Iris.txt', sep='\t', header=T) Data Import LISA: Basics of RFeb. 16, 2015

Data Structures and Manipulation 1. Object Creation Expression: A command is given, evaluated and the result is printed on the screen. Assignment: Storing the results of expressions. 2. Vectors: The basic data structure in R. (Scalars are vectors of dimension 1). a. Creating sequences: - : command. Creates a sequence incrementing/decrementing by 1 - seq() command. b. Vectors with no pattern. c() function. c. Vectors of characters. Also use c() function with the help of “” d. Repeating values. rep() function. e. Arithmetic with vectors: All basic operations can be performed with vectors. f. Subsets: The basic syntax for subsetting vectors is: vector[index] LISA: Basics of RFeb. 16, 2015

Data Structures and Manipulation LISA: Basics of RFeb. 16, Matrices: Objects in two dimensions. a. Creating Matrices Command: matrix(data, nrow, ncol, byrow). data: list of elements that will fill the matrix. nrow, ncol: number of elements in the rows and the columns respectively. byrow: filling the matrix by row. The default is FALSE. b. Some Matrix Functions dim(): Lists the dimensions of the matrix. cbind(): Creating matrix by putting columns together. rbind(): Creating matrix by putting rows together. diag(d): Creates identity matrix of dimension d.

Data Structures and Manipulation LISA: Basics of RFeb. 16, 2015 c. Some Matrix computations Addition Subtraction Inverse: function solve() Transpose: function t() Element-wise multiplication: * Matrix multiplication: %*% d. Subsets Referencing a cell: matrix[r,c], where r represents the row and c represents the column. Referencing a row: matrix[r,] Referencing a column: matrix[,c]

The data are a random sample of records of re-sales of homes from Feb. 15 to Apr. 30, 1993 from the files maintained by the Albuquerque Board of Realtors. This type of data is collected by multiple listing agencies in many cities and is used by realtors as an information base. Number of cases: 65 Variable Names: PRICE = Selling price ($hundreds) SQFT = Square feet of living space AGE = Age of home (years) NE = Located in northeast sector of city (=1) or not (=0) Practice 1: Prices Data Set (prices.csv) LISA: Basics of RFeb. 16, 2015

Lets review some of the matrix commands we learned previously by applying them to our new dataset. 1. What is the dimension of our dataset? 2. Assign the value of the cell [2,3] to the new variable var1 3. Assign the value of the cell [10,4] to the new variable var2 4. Output the value of each column separately. 5. Assign the values of SQFT to a new variable SQFT1. 6. Output the value of row 15. Practice 1: Prices Data Set (prices.csv) LISA: Basics of RFeb. 16, 2015

Quantitative summary of variable Square feet of living space (SQFT1). We will calculate the Minimum, maximum, mean, variance, median for that variable. mean(SQFT1) var(SQFT1) min(SQFT1) max(SQFT1) median(SQFT1) You can obtain the 5 number summary for the variable by using the command: summary(SQFT1) Exploratory Data Analysis: Summaries LISA: Basics of RFeb. 16, 2015

1.Histogram of SQFT. hist(SQFT1, main="Histogram of Square Feet of Living Space", col="dodgerblue", breaks=10) 2.Boxplot of SQFT boxplot(SQFT1, main="Boxplot of Square Feet of Living Space", col="khaki1", ylab="Square Feet of Living Space”) 3.Boxplot of SQFT by NE boxplot(SQFT1~mydatacsv[,4]) 4.Normal Quantile-Quantile Plot qqnorm(SQFT1, main="Normal QQ Plot Square Feet of Living Space") qqline(SQFT1,col="red") Exploratory Data Analysis: Graphs LISA: Basics of RFeb. 16, 2015 R Colors: mbia.edu/~tzheng/fil es/Rcolor.pdf

This statement allows for code to be executed repeatedly. for(i in 1:n){ statement } For Loops LISA: Basics of RFeb. 16, 2015

This statement allows for code to be executed repeatedly while a condition holds true. while(condition){ statement } While Loops LISA: Basics of RFeb. 16, 2015

if statement - use this statement to execute some code only if a specified condition is true: if(condition){ statement } If/Else Statement LISA: Basics of RFeb. 16, 2015

if...else statement - use this statement to execute some code if the condition is true and another code if the condition is false. if ( condition ) statement else statement2 If/Else Statement LISA: Basics of RFeb. 16, 2015

if...else if....else statement - use this statement to select one of many blocks of code to be executed if (condition){ statement } else{ if (condition2){ statement2 } else { Statement4 } } If/Else Statement LISA: Basics of RFeb. 16, 2015

If you have modified your dataset in R you can export it as a.csv file using the following code: write.csv(mydatacsv,file="mydatacsv.csv") Can also export vectors or other objects that you have created to.csv file: write.csv(vec2,file="vec2.csv") Data Export:.csv LISA: Basics of RFeb. 16, 2015

If you have modified your dataset in R you can export it as a space delimited.txt file using the following code: write.table(mydatacsv,file="mydatatxt.txt", sep=" ") You can export it as a tab delimited.txt file using the following code: write.table(mydatacsv,file="mydatatxt2.txt", sep="\t") Data Export:.txt LISA: Basics of RFeb. 16, 2015

Point and Click at the interface and search. Type ?word at the console and R will search for help pages. Google: How to do in R. UCLA website: Help LISA: Basics of RFeb. 16, 2015

The variable content for each record on the file includes demographic and socioeconomic variables from the Current Population Survey combined with the underlying cause of death mortality outcome and the follow-up time until death for records of the deceased or 11 years of follow-up for those not deceased. The previous information was taken from the reference manual of the dataset, this manual and a complete variable description is uploaded in the course materials folder (Dataset reference manual.pdf). Practice 2: National Longitudinal Mortality Study Dataset LISA: Basics of RFeb. 16, 2015

1. Read into R the dataset pubfileb.csv. 2. Determine the dimensions of the dataset. 3. Extract the variable povpct, income as percent of poverty level (column 35) as a new variable. 4. Extract the variable ms, marital status (column 5) as a new variable. 5. Obtain the minimum, maximum, mean, variance, median for the variable povpct and store them in separate variables. 6. Create a vector with the stored values from Create a histogram of povpct of a different color with 20 breaks. Practice 2-a: National Longitudinal Mortality Study Dataset LISA: Basics of RFeb. 16, 2015

1. Create a boxplot of povpct of a different color. 2. Create a boxplot of povpct by ms with the same color for all boxes. 3. Create a boxplot of povpct by ms with the same color for the first three boxes and another color for the remaining three boxes. 4. Create a normal Q-Q plot for povpct. 5. Using for loops count how many observations are there in a metropolitan area (smsast=1) (col 20) with an age lower than 15 (col 2). 6. Export your extracted variables as a.csv file and the dataset as a tab delimited.txt file. Practice 2-b: National Longitudinal Mortality Study Dataset LISA: Basics of RFeb. 16, 2015

LISA: Basics of RFeb. 16, 2015