Before the class starts: 1) log in to a computer 2) start RStudio 3) download Intro.R from MyCourses 4) open Intro.R in RStudio 5) download "R in Action"

1 Before the class starts:
1) Log in to a computer
2) Start RStudio
3) Download Intro.R from MyCourses
4) Open Intro.R in RStudio
5) Download "R in Action" from Zotero and open it

2 Statistical software: SPSS, Stata, and R
Description
- SPSS and Stata: command-driven statistical program
- R: statistical programming environment that also allows interactive use
Audience
- SPSS: designed for corporate use
- Stata: designed for researchers/scientists
- R: designed to be general
Documentation
- SPSS: explains how to use SPSS
- Stata: explains the analyses
- R: points to original sources
Availability
- SPSS: installed on all Aalto computers?
- Stata: installed on all TUAS computers
- R: installed on all Aalto computers
Cost
- SPSS: Aalto has a site license
- Stata: student version $35
- R: free

3 My take on the software
- I use Stata and R.
- I am more productive with Stata in the tasks that it is designed for (and Stata has excellent documentation).
- R is more flexible, better for data management, and better for making examples.
- People in the DIEM department mainly use SPSS and Stata. Some are moving from SPSS to Stata, but no one moves the other way.
- Students on my courses tend to slightly prefer R because they can install it (legally) on their home computers, and they do just fine with it.
- But R is not the best choice for everyone. You cannot go wrong with Stata.

4 Datasets and command files
Datasets
- Observations on rows, variables on columns
- Stata works with one file at a time; R can work with multiple files at a time
- Manipulated with commands; data files are never edited!
Command files
- A sequence of data manipulation and analysis commands to be applied to the data
- Stores the logic of your analysis
- Should contain a lot of comments where you explain the logic
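
As a minimal sketch of what such a command file might look like in R: the file name survey.csv, the variable names, and the use of 999 as a missing-value code are all hypothetical, and the raw file is written to a temporary directory only to keep the example self-contained.

```r
# analysis.R: a hypothetical command file; all names are illustrative.

# Stand-in for a raw data file. In practice this file already exists
# and is never edited by hand.
raw_path <- file.path(tempdir(), "survey.csv")
write.csv(data.frame(id = 1:4, age = c(32, 45, 25, 999)),
          raw_path, row.names = FALSE)

# Step 1: load the raw data (the file on disk stays untouched).
survey <- read.csv(raw_path)

# Step 2: clean the copy in memory. 999 is assumed to code "missing".
survey$age[survey$age == 999] <- NA

# Step 3: analyse the cleaned copy.
mean_age <- mean(survey$age, na.rm = TRUE)
print(mean_age)
```

Because every change is made by a commented command on the in-memory copy, rerunning the file from the top reproduces the whole analysis from the unedited raw data.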

5 Using the software: Menus vs. Typing commands vs. Command file
Menus
- Good for learning the program
- Good if you do not remember the command for a particular analysis
- (Lack of menus is one of the reasons why R has a steeper learning curve)
Typing commands
- This is normally the fastest way to explore the data and experiment with the analyses
Command file
- Should always be used for the analyses that you want to publish

6 Run Intro.R

7 Overview of the user interface

8 Compile a notebook

9 Introduction to R

10
1. Using the software as calculator
2. Accessing and reading the documentation
3. Creating and running projects as analysis files
4. Loading and manipulating datasets (e.g. merging, sorting, filtering)
5. Basic exploratory data analysis including means, correlations, etc.
6. Basics of graphics
7. Generating data and running simple simulations
8. Creating loops in analysis files and other very basic automation

11 Packages
- Install package "lmtest"
- Load package "lmtest"
- R in Action: 1.4 Packages
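
A hedged sketch of the install-then-load step. The install line is left as a comment because it needs an internet connection and a configured CRAN mirror; the variable name has_lmtest is illustrative only.

```r
# One-time step, run interactively (needs an internet connection):
# install.packages("lmtest")

# Check whether the package is already installed before loading it.
has_lmtest <- requireNamespace("lmtest", quietly = TRUE)

if (has_lmtest) {
  library(lmtest)   # makes the package's functions available this session
} else {
  message("lmtest is not installed; run install.packages(\"lmtest\") first")
}
```

Installing is done once per computer, whereas library() has to be called in every new R session that uses the package.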

12 Using R as calculator
Type these into the console:
    100+2/3      Basic math
    (100+2)/3    You can use round brackets to group operations so that they are carried out first
    5*10^2       The symbol * means multiply, and ^ means "to the power", so this gives 5 times (10 squared), i.e. 500
    1/0          R knows about infinity (and minus infinity)
    0/0          Undefined results take the value NaN ("not a number")
    sqrt(4)      Square root function
Source: https://en.wikibooks.org/wiki/Statistical_Analysis:_an_Introduction_using_R/R/R_as_a_calculator

13 Using the help
- Try looking up the following: regression, lm
- R in Action: 1.3.2 Getting help: read the section and try the examples

14 Help page for lm

19 Working with datasets

21 Accessing built-in datasets
Load the packages "car" and "psych"
List available datasets
    data()
Datasets are accessed by their name
    mtcars
Inspect the dataset
    describe(mtcars)
    scatterplotMatrix(mtcars)

22 Loading CSV files
Load a dataset from the UCLA website
    read.csv("http://www.ats.ucla.edu/stat/data/test.csv")
Store the dataset with a name
    myData <- read.csv("http://www.ats.ucla.edu/stat/data/test.csv")
Print the dataset
    myData
Source: http://www.ats.ucla.edu/stat/r/modules/raw_data.htm

23 Loading CSV files from your computer
- R loads and saves files relative to the working directory
- Download the datasets for Data Analysis Assignment 4 (optional) from MyCourses and unzip the file
- Set your working directory to the directory where you unzipped the files and load the CSV file
    read.csv("Orbis_Export_1.csv")
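
A minimal sketch of how the working directory affects file loading. A temporary directory and a made-up file example.csv stand in for the unzipped assignment folder; both are assumptions made so the example runs anywhere.

```r
# Remember the current working directory so it can be restored later.
old_wd <- getwd()

# Hypothetical data folder: a temporary directory stands in for the
# folder where the assignment files were unzipped.
data_dir <- tempdir()
write.csv(data.frame(firm = c("A", "B"), sales = c(10, 20)),
          file.path(data_dir, "example.csv"), row.names = FALSE)

setwd(data_dir)                    # change the working directory
firms <- read.csv("example.csv")   # a bare file name resolves against it
print(firms)

setwd(old_wd)                      # restore the original directory
```

In RStudio the same change can be made from the Session menu; calling setwd() in the analysis file keeps the step reproducible, at the cost of tying the file to one folder layout.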

25 Setting up
Start a new R Script and copy-paste Listing 4.1 into the file
    manager <- c(1, 2, 3, 4, 5)
    date <- c("10/24/08", "10/28/08", "10/1/08", "10/12/08", "5/1/09")
    country <- c("US", "US", "UK", "UK", "UK")
    gender <- c("M", "F", "F", "M", "F")
    age <- c(32, 45, 25, 39, 99)
    q1 <- c(5, 3, 3, 3, 2)
    q2 <- c(4, 5, 5, 3, 2)
    q3 <- c(5, 2, 5, 4, 1)
    q4 <- c(5, 5, 5, NA, 2)
    q5 <- c(5, 5, 2, NA, 1)
    leadership <- data.frame(manager, date, country, gender, age,
                             q1, q2, q3, q4, q5, stringsAsFactors=FALSE)

26 Selecting cases and variables (subsetting)
A data frame has two dimensions: rows and columns
    leadership[,]
The value on the left side of the comma selects rows; the value on the right side selects columns
    leadership[1,]
    leadership[,1]
    leadership[1,1]
Selecting with names
    leadership[,"date"]
    leadership$date
    leadership$date[1]

27 Creating vectors and selecting with vectors
A vector is a sequence of numbers or strings
    3:5
    c(1,2,4)
    c("gender","age")
Selecting with a vector
    leadership[3:5,]
    leadership[c(1,2,4), c("gender","age")]

28 Comparisons
Comparisons return vectors of TRUE and FALSE
    leadership$age > 40
    leadership$age > 40 & leadership$country == "UK"
Converting from TRUE and FALSE to indices
    which(leadership$age > 40)

29 Selecting cases with comparison
    leadership[leadership$age > 40,]
People often forget the comma after the comparison

30 The subset command
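
A minimal sketch of subset() from base R, using a small made-up data frame (people and its columns are illustrative, not the slide's data). The first argument is the data frame, the second a row condition, and select names the columns to keep.

```r
people <- data.frame(
  gender  = c("M", "F", "F", "M", "F"),
  age     = c(32, 45, 25, 39, 99),
  country = c("US", "US", "UK", "UK", "UK")
)

# Rows where age exceeds 40; keep only the gender and age columns.
older <- subset(people, age > 40, select = c(gender, age))
print(older)

# The same selection written with bracket indexing.
older_brackets <- people[people$age > 40, c("gender", "age")]
```

Inside subset() the column names can be used directly, without the people$ prefix, which tends to make interactive exploration shorter to type.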

31 Manipulating data
Setting an outlier to a missing value
    leadership[leadership$age == 99, "age"] <- NA
The left side of the assignment selects what to update; the right side assigns the new value
Locating observations with missing data
    leadership[is.na(leadership$age),]

32 Creating variables
Assigning to a variable that does not exist yet creates it
    leadership$agecat[leadership$age > 75] <- "Elder"
    leadership$agecat[leadership$age >= 55 & leadership$age <= 75] <- "Middle Aged"
    leadership$agecat[leadership$age < 55] <- "Young"
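
As an alternative sketch (not the slide's method), the same three categories can be built in one step with nested ifelse() calls. The age vector below is made up for illustration and includes a missing value to show that it stays missing.

```r
age <- c(32, 45, 25, 39, NA, 60, 80)

# Conditions are checked from the outside in:
# above 75 is "Elder", otherwise 55 or more is "Middle Aged", else "Young".
agecat <- ifelse(age > 75, "Elder",
                 ifelse(age >= 55, "Middle Aged", "Young"))
print(agecat)
```

Because the inner condition is only reached when age is not above 75, the cutoffs match the three separate assignments on the slide.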

33 Renaming variables
Assign new values to names
    names(leadership)[1] <- "managerID"
A better approach with the reshape package
    library(reshape)
    leadership <- rename(leadership, c(manager="managerID", date="testDate"))
Recreate the leadership dataset after trying these

34 Sorting datasets
Get the order of values
    order(leadership$age)
    order(-leadership$age)
Sort by selecting with the order
    leadership[order(leadership$age),]
If you want to keep the new order, store the result with the same name
    leadership <- leadership[order(leadership$age),]

35 Merging datasets
Merge two datasets (add columns)
    hairColor <- data.frame(gender = c("M","F"), hair = c("Blonde","Brunette"))
    merge(leadership, hairColor, by="gender")
Alternatively, you can use cbind if you know that the data are in the same order and have the same number of rows
Append datasets (add rows)
    rbind(leadership, leadership)
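
A self-contained sketch of the two operations on made-up data (people and hair are illustrative names). It also shows a detail that is easy to miss: merge() matches rows by the key column and, by default, returns them sorted by that key rather than in the original order.

```r
people <- data.frame(id = 1:4, gender = c("M", "F", "F", "M"))
hair   <- data.frame(gender = c("M", "F"),
                     hair   = c("Blonde", "Brunette"))

# Add columns: every row of people gets the hair value for its gender.
merged <- merge(people, hair, by = "gender")
print(merged)

# Add rows: both data frames must have the same column names.
stacked <- rbind(people, people)
print(nrow(stacked))
```

If the original row order matters, sorting the merged result by id afterwards with order() restores it.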

36 Applying what we just went through
Hints:
- Use scale to standardize Math, Science, and English
- Use quantile to calculate grade cutoffs
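
A rough sketch of how the two hints might fit together. The scores are randomly generated stand-ins, not the exercise data, and the grading rule (quintiles of the average standardized score, graded A to F) is an assumption made for illustration.

```r
set.seed(1)  # hypothetical scores; not the data from the exercise
scores <- data.frame(
  Math    = rnorm(20, mean = 500, sd = 100),
  Science = rnorm(20, mean = 80,  sd = 10),
  English = rnorm(20, mean = 25,  sd = 5)
)

# scale() centers each column to mean 0 and rescales it to SD 1,
# so the three subjects become comparable.
z <- scale(scores)
composite <- rowMeans(z)

# quantile() gives the cutoffs for the top 20%, 40%, 60%, and 80%.
cutoffs <- unname(quantile(composite, c(0.8, 0.6, 0.4, 0.2)))

# Start everyone at F and move up for each cutoff that is reached.
grade <- rep("F", length(composite))
grade[composite >= cutoffs[4]] <- "D"
grade[composite >= cutoffs[3]] <- "C"
grade[composite >= cutoffs[2]] <- "B"
grade[composite >= cutoffs[1]] <- "A"
print(table(grade))
```

Standardizing first matters because the subjects are on very different scales; averaging the raw scores would let Math dominate the composite.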

37 Basics of exploratory data analysis

39 Statistical functions

40 Applying functions to data frames
    ?apply
    apply(mtcars, 2, mean)
    apply(mtcars, 2, sd)
    apply(mtcars, 2, quantile)

41 More convenient way to get descriptive statistics using the psych package
    describe(mtcars)
    describeBy(mtcars, group = mtcars$cyl)

42 Frequency tables (7.2)
    table(mtcars$cyl)
    table(mtcars$gear)
    table(mtcars$gear, mtcars$cyl)
    prop.table(table(mtcars$gear, mtcars$cyl))
    prop.table(table(mtcars$gear, mtcars$cyl), 1)
    prop.table(table(mtcars$gear, mtcars$cyl), 2)

43 Correlations (7.3)
    cor(mtcars)
    lowerCor(mtcars)
    corr.test(mtcars)

44 Basics of graphics

45 Plot example (3.1 Working with Graphs)
    plot(mtcars$wt, mtcars$mpg)
    abline(lm(mpg~wt, data = mtcars))
    title("Regression of MPG on Weight")

46 Examples
Browse graph examples at: http://shinyapps.org/apps/RGraphCompendium/index.php

47 Exporting graphics as files
    pdf("myGraph.pdf")
    plot(mtcars$wt, mtcars$mpg)
    abline(lm(mpg~wt, data = mtcars))
    title("Regression of MPG on Weight")
    dev.off()

48 Kernel density plot
    plot(density(mtcars$mpg))

49 Scatter plot matrix
    scatterplotMatrix(mtcars)

50 scatterplotMatrix(mtcars[,1:3])

51 Aggregating and restructuring data

52 Aggregating data
    ?aggregate
    aggregate(mtcars, mtcars$cyl, mean)    # fails: 'by' must be a list
    aggregate(mtcars, list(mtcars$cyl), mean)
    aggregate(mtcars, list(cyl = mtcars$cyl), mean)
    aggregate(mtcars, list(cyl = mtcars$cyl, mtcars$gear), mean)
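
As a supplementary sketch, aggregate() also has a formula interface, which some find easier to read than the list form. The choice of mpg and hp as the variables to summarise is only for illustration.

```r
# Mean mpg and hp for each number of cylinders in the built-in mtcars data.
by_cyl <- aggregate(cbind(mpg, hp) ~ cyl, data = mtcars, FUN = mean)
print(by_cyl)

# The same group mean computed by hand for one group, as a cross-check.
mpg_4cyl <- mean(mtcars$mpg[mtcars$cyl == 4])
```

The left side of the formula lists what to summarise and the right side lists the grouping variables, so adding gear as a second grouping variable would read cbind(mpg, hp) ~ cyl + gear.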

53 Reshaping data using reshape2 package

54 Reshape
    dw <- data.frame(
      id = 1001:1004,
      y_1 = 1:4,
      y_2 = 11:14,
      x_1 = 1:4,
      x_2 = 11:14,
      w = 1:4)
    library(reshape2)
    dm <- melt(dw, measure.vars = c("y_1","y_2","x_1","x_2"))
    ds <- colsplit(dm$variable, pattern="_", names = c("variable", "time"))
    dm <- cbind(dm[,-3], ds)
    dl <- dcast(dm, ... ~ variable)

55 Simple simulations

56 Generating random numbers
Throw ten dice
    sample(1:6, 10, replace = TRUE)
Generate ten standard normal variables (mean = 0, SD = 1)
    rnorm(10)

57 Effects of model misspecification on regression
    x1 <- rnorm(1000)
    x2 <- x1 + rnorm(1000)
    y <- x1 + x2 + rnorm(1000)
    lm(y ~ x1 + x2)
    lm(y ~ x1)
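
A sketch of what to expect from this simulation. Since x2 = x1 + noise, substituting into y = x1 + x2 + error gives y = 2*x1 + (noise + error), so leaving x2 out should push the x1 coefficient from about 1 to about 2. The seed and the names full and short are arbitrary choices for illustration.

```r
set.seed(123)   # arbitrary seed so the run is repeatable
n  <- 1000
x1 <- rnorm(n)
x2 <- x1 + rnorm(n)         # x2 is correlated with x1
y  <- x1 + x2 + rnorm(n)    # true coefficients are 1 and 1

full  <- coef(lm(y ~ x1 + x2))   # correctly specified model
short <- coef(lm(y ~ x1))        # x2 omitted

print(round(full, 2))    # x1 and x2 should both be close to 1
print(round(short, 2))   # x1 should be close to 2
```

The omitted variable's effect is absorbed by x1 because the two predictors are correlated; if x2 were generated independently of x1, the short model's x1 coefficient would stay near 1.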

58 Mean of ten dice
    dice <- sample(1:6, 10, replace = TRUE)
    mean(dice)
    reps <- replicate(10000, {
      dice <- sample(1:6, 10, replace = TRUE)
      mean(dice)
    })
    plot(density(reps))
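
A sketch that compares the simulated means with what theory predicts. A fair die has mean 3.5 and variance 35/12, so the mean of ten dice should have mean 3.5 and standard deviation sqrt(35/12/10), roughly 0.54. The seed and the names theory_mean and theory_sd are illustrative.

```r
set.seed(42)   # arbitrary seed so the run is repeatable

# 10,000 replications of "throw ten dice and take the mean".
reps <- replicate(10000, mean(sample(1:6, 10, replace = TRUE)))

theory_mean <- 3.5                   # expected value of one fair die
theory_sd   <- sqrt(35 / 12 / 10)    # SD of the mean of ten dice

print(c(simulated = mean(reps), theory = theory_mean))
print(c(simulated = sd(reps),   theory = theory_sd))
```

Agreement between the simulated and theoretical values is a quick sanity check that the simulation code does what was intended.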

59 Loops and other basic automation

60 Loops and conditions
    for(counter in 1:10){
      if(counter == 5){
        print("Five")
      } else{
        print("Not five")
      }
    }
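
The loop above handles one value at a time. As a hedged aside, the same labels can be produced without an explicit loop, because comparisons in R work on whole vectors; the names counters and labels are illustrative only.

```r
counters <- 1:10

# ifelse() applies the condition to every element at once and
# returns a vector of the same length.
labels <- ifelse(counters == 5, "Five", "Not five")
print(labels)
```

Explicit loops remain the clearer choice when each step depends on the result of the previous one, which a vectorized call cannot express.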

61 Conclusion

62 Getting started
1. Study R in Action
2. Search for online examples
3. Ask for help online (e.g. course forum)
   1. If you have a problem, it often helps to post your full analysis file or log: https://gist.github.com
4. Online courses
   1. https://www.datacamp.com/courses/free-introduction-to-r

63 http://www.ats.ucla.edu/stat/dae/

