Research methodology R Statistics – Introduction

Dr. Sanna Härkönen, R&D Manager, Bitcomp Oy

Contents
Topic: Contents
15.11. Introduction to R: Basic use of RStudio; basic commands (reading and writing data, using data frames)
17.11. Modeling examples: Example studies with R
21.11. Introduction to group work
Model fitting: Aggregating, plotting, linear regression and its interpretation (1)
Model validation: RMSE, bias, t-test (2)
GIS analysis with R: Rasters, shapefiles -> mapping
25.11. Group presentations: Best practices: using R for data interpretation in scientific reports and studies

R Statistics
Script language
Great for data analysis and statistical computing
Efficient vector and matrix calculations
Advantages:
Any programming tasks, modeling etc.
Versatile packages for environmental analysis, for example data clustering, decision trees, kNN imputation, GIS data analysis, …
Links: Just Google; a huge number of tutorials and code examples are available.

RStudio: code window and console

Tips
R code a <- b + c is the same as a = b + c
Variable names are case sensitive (a is not the same as A)
Running code: select the desired line(s) in the code window and press CTRL + ENTER
Clear the console: CTRL + L
Show previous code lines in the console: up-arrow key
sheet-3.pdf
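
The first two tips can be checked directly in the console. This is a minimal sketch; the object names a1, a2 and A1 are illustrative and not part of the course material.

```r
# Both assignment operators store the same value.
b <- 2
c <- 3            # mirrors the slide; the function c() still works afterwards
a1 <- b + c       # assignment with <-
a2 = b + c        # assignment with =
identical(a1, a2) # TRUE when both objects hold the same value

# Names are case sensitive: a1 and A1 are two separate objects.
A1 <- 10
a1 == A1          # compares the two different objects
```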

1 Basic commands
Reading data in: read.csv()
Checking first lines: head()
Checking summary statistics: summary()
Data frames: data.frame()
Creating a new column in a data frame and calculating its value: my_dataframe$my_new_variable <- my_dataframe$var1 + my_dataframe$var2
Removing a column: my_dataframe$my_new_variable <- NULL
Conditionals: ifelse()
Taking a subset: subset()
Plotting data: plot()
Writing data out: write.csv()
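
The commands above can be tried together on a small made-up data frame. This is only a sketch: the values, the temporary file path, the threshold 25 and the column size_class are assumptions for illustration, while my_dataframe, var1, var2 and my_new_variable follow the placeholder names used on the slide.

```r
# Hypothetical toy data; var1 and var2 are the slide's placeholder names.
my_dataframe <- data.frame(var1 = c(1, 2, 3, 4), var2 = c(10, 20, 30, 40))

# Write the data out and read it back in (temporary file, not a course file).
csv_path <- tempfile(fileext = ".csv")
write.csv(my_dataframe, csv_path, row.names = FALSE)
my_dataframe <- read.csv(csv_path)

head(my_dataframe)     # first lines
summary(my_dataframe)  # summary statistics per column

# New column calculated from existing columns.
my_dataframe$my_new_variable <- my_dataframe$var1 + my_dataframe$var2

# Conditional column: note the lower-case name ifelse().
my_dataframe$size_class <- ifelse(my_dataframe$my_new_variable > 25, "large", "small")

# Subset of rows that meet a condition.
large_rows <- subset(my_dataframe, my_new_variable > 25)

# Scatter plot of two columns.
plot(my_dataframe$var1, my_dataframe$var2)

# Remove the calculated column again.
my_dataframe$my_new_variable <- NULL
```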

Exercise 1
Download "Modeling_data_all.csv" from Wiki
Read the modeling data set into RStudio as an object called A:
A <- read.csv("C:/temp/Modeling_data_all.csv")
Check the first lines of your data set:
head(A)
Check the summary statistics of data set A:
summary(A)

Calculate new variable N in data frame A (number of stems / ha, based on mean diameter D, cm, and total basal area BA, m2/ha):
A$N <- A$BA / (pi * (0.5 * A$D / 100)^2)
Calculate new variable "mean_stem_volume1" in data frame A, based on total volume and N:
A$mean_stem_volume1 <- A$TOTAL_VOLUME / A$N
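
To see what the two formulas do, they can be run on a single made-up stand. This is a sketch under stated assumptions: the values D = 20, BA = 20 and TOTAL_VOLUME = 150 are invented, and TOTAL_VOLUME is assumed to be in m3/ha. The helper names radius_m and tree_basal_area_m2 are illustrative.

```r
# Hypothetical one-row data set standing in for Modeling_data_all.csv.
A <- data.frame(D = 20, BA = 20, TOTAL_VOLUME = 150)

# D is in cm, so the stem radius in metres is 0.5 * D / 100.
radius_m <- 0.5 * A$D / 100

# Cross-sectional area of one mean tree, m2.
tree_basal_area_m2 <- pi * radius_m^2

# Stems per hectare = stand basal area / basal area of one mean tree.
A$N <- A$BA / tree_basal_area_m2

# Mean stem volume = total volume per hectare / stems per hectare.
A$mean_stem_volume1 <- A$TOTAL_VOLUME / A$N

A
```

Splitting the calculation into named steps gives the same N as the one-line formula and makes the unit conversion from cm to m easier to check.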

Calculate new variable mean_stem_volume2 using the Laasasenaho volume function [note: ln in R is log()]
Laasasenaho volume (V, liters) function based on diameter D (cm), Scots pine:
ln(V) = * ln( * D) * D
A$mean_stem_volume2 <- exp( * log( * A$D) * A$D) / 1000
(converted from liters to m3)
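
The note about log() and the unit conversion can be checked separately from the volume function itself. This sketch uses an invented stem volume of 250 liters and does not use the Laasasenaho coefficients.

```r
# In R, log() is the natural logarithm (ln); exp() is its inverse.
ln_v <- log(250)          # hypothetical volume of 250 liters on the ln scale
v_liters <- exp(ln_v)     # back-transform to liters
v_m3 <- v_liters / 1000   # 1 m3 = 1000 liters

# Other bases have to be requested explicitly.
log10(1000)               # base-10 logarithm
log(8, base = 2)          # base-2 logarithm

v_m3
```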

Print summary statistics of your data set:
summary(A)
Check visually how well the two mean stem volumes correlate:
plot(x, y)
Print boxplots showing 1) mean stem volume, 2) total volume and 3) the difference between mean_stem_volume1 and mean_stem_volume2, by tree species class and by site type:
boxplot(x~y)
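
One possible way to fill in plot(x, y) and boxplot(x~y) is sketched below. The data values are invented, the column volume_diff is an illustrative name, and SP_GROUP and FOREST_TYPE are assumed to be the species and site type columns of the data set.

```r
# Hypothetical stand-in for data frame A after the earlier steps.
A <- data.frame(
  SP_GROUP = c(1, 1, 2, 2, 3, 3),
  FOREST_TYPE = c(1, 2, 1, 2, 1, 2),
  mean_stem_volume1 = c(0.20, 0.25, 0.30, 0.35, 0.15, 0.18),
  mean_stem_volume2 = c(0.22, 0.24, 0.33, 0.36, 0.14, 0.20)
)

# Scatter plot of the two volume estimates with a 1:1 reference line.
plot(A$mean_stem_volume1, A$mean_stem_volume2,
     xlab = "mean_stem_volume1 (m3)", ylab = "mean_stem_volume2 (m3)")
abline(0, 1)

# Correlation coefficient as a numeric complement to the plot.
r <- cor(A$mean_stem_volume1, A$mean_stem_volume2)

# Difference between the two estimates, plotted by species and by site type.
A$volume_diff <- A$mean_stem_volume1 - A$mean_stem_volume2
boxplot(volume_diff ~ SP_GROUP, data = A,
        xlab = "SP_GROUP", ylab = "Volume difference (m3)")
boxplot(volume_diff ~ FOREST_TYPE, data = A,
        xlab = "FOREST_TYPE", ylab = "Volume difference (m3)")
```

In the formula boxplot(x~y), the variable on the left is the one being plotted and the variable on the right defines the groups.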

Aggregate the data based on species and site type:
A_agg <- aggregate(A, list(A$SP_GROUP, A$FOREST_TYPE), mean)
Consider how you could use R to interpret your modeling data in the "Material" chapter of a scientific report.
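
A variation of the aggregate() call is sketched below on invented data. It assumes the data set has numeric columns D and TOTAL_VOLUME; naming the grouping list and restricting the call to numeric columns are choices made for this example, not part of the exercise.

```r
# Hypothetical stand-in for the modeling data.
A <- data.frame(
  SP_GROUP = c(1, 1, 2, 2),
  FOREST_TYPE = c(1, 1, 2, 2),
  D = c(20, 24, 16, 18),
  TOTAL_VOLUME = c(150, 190, 90, 110)
)

# Group means of the selected numeric columns by species and site type.
A_agg <- aggregate(
  A[, c("D", "TOTAL_VOLUME")],
  by = list(SP_GROUP = A$SP_GROUP, FOREST_TYPE = A$FOREST_TYPE),
  FUN = mean
)

A_agg
```

Naming the elements of the grouping list gives readable column names instead of the defaults Group.1 and Group.2, and selecting the numeric columns avoids NA results when the data frame also contains text columns.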
