Research methodology R Statistics – Introduction

Dr. Sanna Härkönen, R&D Manager, Bitcomp Oy

Contents
22.11. Topics: Introduction to R; Introduction to group work. Contents: Basic use of RStudio; basic commands (reading and writing data, using data frames)
24.11. Topics: Model fitting; Model validation. Contents: Aggregating, plotting, linear regression and its interpretation; RMSE, bias, t-test
27.11. Topic: GIS analysis with R. Contents: Rasters, shapefiles -> mapping

R Statistics
Script language
Great for data analysis and statistical computing
Efficient vector and matrix calculations
Advantages:
Any programming tasks, modeling etc.
Versatile packages for environmental analysis, for example data clustering, decision trees, kNN imputation, GIS data analysis, …
Machine learning
Links: just Google; a huge number of tutorials and example codes are available.

RStudio: https://www.rstudio.com/
[Screenshot of RStudio showing the code window and the console]

Tips
R code: a <- b + c is the same as a = b + c
Variable names are case sensitive (a is not the same as A)
Running code: mark the desired line(s) in the code window and press CTRL + ENTER
Run until current line: CTRL + ALT + B
Clean the console: CTRL + L
Show the previous code lines in the console: up arrow
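The first two tips can be tried directly in the console. This is a minimal sketch; the values and the name k are made up for illustration.

```r
# Hypothetical values, only for illustration
b <- 2
k <- 3

a <- b + k   # assignment with the arrow operator
A = b + k    # '=' assigns as well; A is a separate object from a

A <- A * 10  # changing A leaves a untouched, because names are case sensitive

print(a)
print(A)
```

After running these lines, a and A hold different values even though they were created from the same expression.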

1 Basic commands
Reading data in: read.csv()
Checking first lines: head()
Checking summary statistics: summary()
Data frames: data.frame()
Creating a new column in a data frame and calculating its value: my_dataframe$my_new_variable <- my_dataframe$var1 + my_dataframe$var2
Removing a column: my_dataframe$my_new_variable <- NULL
Conditionals: ifelse()
Taking a subset: subset()
Plotting data: plot()
Writing data out: write.csv()
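The commands above can be combined into one short session. The sketch below uses a small made-up data frame instead of a real file. The columns id and size_class and the threshold 2 are assumptions for illustration, and the CSV is written to a temporary file so that nothing on disk is overwritten.

```r
# Hypothetical toy data; var1 and var2 mirror the placeholder names above
my_dataframe <- data.frame(
  id   = 1:4,
  var1 = c(1.5, 2.0, 3.5, 4.0),
  var2 = c(10, 20, 30, 40)
)

head(my_dataframe, 2)   # first two rows
summary(my_dataframe)   # min, quartiles, mean and max per column

# New column calculated from two existing columns
my_dataframe$my_new_variable <- my_dataframe$var1 + my_dataframe$var2

# Conditional column: ifelse() is evaluated row by row
my_dataframe$size_class <- ifelse(my_dataframe$var1 > 2, "large", "small")

# Subset: keep only the rows where var1 is greater than 2
large_rows <- subset(my_dataframe, var1 > 2)

# Remove a column by assigning NULL to it
my_dataframe$my_new_variable <- NULL

# Write the data out and read it back in
out_file <- tempfile(fileext = ".csv")
write.csv(my_dataframe, out_file, row.names = FALSE)
read_back <- read.csv(out_file)
head(read_back)
```

Setting row.names = FALSE keeps write.csv() from adding an extra column of row numbers, which would otherwise come back as an unnamed first column when the file is read again.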

Exercise 1
Download "Modeling_data_all.csv" from the Wiki.
Read the modeling data set in RStudio into an object called A:
A <- read.csv("C:/temp/Modeling_data_all.csv")
Check the first lines of your data set:
head(A)
Check summary statistics of data set A:
summary(A)

Calculate a new variable N in data frame A (number of stems per hectare, based on mean diameter D, cm, and total basal area BA, m2/ha). N is the total basal area divided by the basal area of one mean tree, where D / 100 converts the diameter from cm to m:
A$N <- A$BA / (pi * (0.5 * A$D / 100)^2)
Calculate a new variable "mean_stem_volume1" in data frame A, based on total volume and N:
A$mean_stem_volume1 <- A$TOTAL_VOLUME / A$N
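To check the units, the same calculation can be run for one stand. The numbers below are made up and are not taken from Modeling_data_all.csv; total volume is assumed to be in m3/ha.

```r
# Hypothetical stand values, only for checking the arithmetic
D            <- 20    # mean diameter, cm
BA           <- 20    # total basal area, m2/ha
TOTAL_VOLUME <- 200   # total volume, assumed to be m3/ha

# Basal area of one mean tree in m2: radius in metres is 0.5 * D / 100
tree_ba <- pi * (0.5 * D / 100)^2

# Stems per hectare and mean stem volume (m3 per stem)
N <- BA / tree_ba
mean_stem_volume1 <- TOTAL_VOLUME / N

round(tree_ba, 4)            # about 0.0314 m2
round(N)                     # about 637 stems/ha
round(mean_stem_volume1, 3)  # about 0.314 m3
```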

Calculate a new variable mean_stem_volume2 using the Laasasenaho volume function (note: ln in R is log()).
Laasasenaho volume function (V, liters), based on diameter D (cm), for Scots pine:
ln(V) = * ln( * D) * D
A$mean_stem_volume2 <- exp( * log( * A$D) * A$D) / 1000
(divided by 1000 to convert liters to m3)
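The numeric coefficients of the volume function are not given here, so the sketch below takes them as arguments. The additive form ln(V) = a + b * ln(c0 + c1 * D) + e * D is an assumption about the structure of the model. The function name volume_m3 and the argument names a, b, c0, c1 and e are hypothetical, and the values in the example call are placeholders chosen so the result is easy to check by hand, not real coefficients.

```r
# In R, log() is the natural logarithm (ln); log10() is base 10
log(exp(1))   # 1
log10(1000)   # 3

# Hypothetical helper: volume in m3 from diameter D (cm).
# The model form and all argument names are assumptions, see the text above.
volume_m3 <- function(D, a, b, c0, c1, e) {
  ln_v <- a + b * log(c0 + c1 * D) + e * D   # ln of volume in liters
  exp(ln_v) / 1000                           # back-transform, liters -> m3
}

# Placeholder coefficients (NOT real ones): with these values ln(V) = ln(D),
# so V equals D liters, i.e. D / 1000 m3
test_volume <- volume_m3(D = c(10, 20), a = 0, b = 1, c0 = 0, c1 = 1, e = 0)
test_volume
```

With the actual coefficients in place, the same function can be applied to the whole column at once, for example with D = A$D, because log() and exp() are vectorized.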

Print summary statistics of your data set:
summary(A)
Check visually how well the two mean stem volumes correlate with each other:
plot(x, y)
Print boxplots showing 1) mean stem volume, 2) total volume and 3) the difference between mean_stem_volume1 and mean_stem_volume2, by tree species class and site type:
boxplot(x~y)
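The plotting steps could look roughly like the sketch below. It uses a small made-up data frame in place of the real modeling data. The column names SP_GROUP and FOREST_TYPE are assumed to be the species and site type columns, and vol_diff is a hypothetical helper column. The plots go to a temporary PDF so the code also runs outside RStudio.

```r
# Hypothetical toy data standing in for data frame A
A <- data.frame(
  SP_GROUP          = c(1, 1, 2, 2, 1, 2),
  FOREST_TYPE       = c(1, 2, 1, 2, 1, 2),
  mean_stem_volume1 = c(0.10, 0.20, 0.15, 0.30, 0.25, 0.12),
  mean_stem_volume2 = c(0.11, 0.19, 0.17, 0.28, 0.24, 0.13)
)

# Difference between the two mean stem volume estimates
A$vol_diff <- A$mean_stem_volume1 - A$mean_stem_volume2

# Pearson correlation as a numeric check alongside the scatter plot
r <- cor(A$mean_stem_volume1, A$mean_stem_volume2)

plot_file <- tempfile(fileext = ".pdf")
pdf(plot_file)

# Scatter plot of the two estimates, with a 1:1 reference line
plot(A$mean_stem_volume1, A$mean_stem_volume2,
     xlab = "mean_stem_volume1 (m3)", ylab = "mean_stem_volume2 (m3)")
abline(0, 1)

# Boxplots by species group, and by species group and site type together
boxplot(mean_stem_volume1 ~ SP_GROUP, data = A)
bp <- boxplot(vol_diff ~ SP_GROUP + FOREST_TYPE, data = A)

dev.off()
```

In RStudio the pdf() and dev.off() lines can be dropped, in which case the plots appear in the Plots pane.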

Aggregate the data based on species and site type:
A_agg <- aggregate(A, list(A$SP_GROUP, A$FOREST_TYPE), mean)
Consider how you could use R to interpret your modeling data in the "Material" chapter of a scientific report.
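As a sketch of what the aggregation returns, the example below uses made-up values for TOTAL_VOLUME with the grouping columns SP_GROUP and FOREST_TYPE. It also shows the formula interface of aggregate(), which names the grouping columns in the result and averages only the chosen variable. Calling aggregate() on the whole data frame with mean gives NA, with warnings, for any non-numeric columns.

```r
# Hypothetical toy data standing in for data frame A
A <- data.frame(
  SP_GROUP     = c(1, 1, 1, 2, 2, 2),
  FOREST_TYPE  = c(1, 1, 2, 1, 2, 2),
  TOTAL_VOLUME = c(100, 200, 90, 120, 80, 160)
)

# Same call as in the exercise: mean of every column by the two groups;
# the grouping columns in the result are called Group.1 and Group.2
A_agg <- aggregate(A, list(A$SP_GROUP, A$FOREST_TYPE), mean)

# Formula interface: mean TOTAL_VOLUME by species group and site type
A_agg2 <- aggregate(TOTAL_VOLUME ~ SP_GROUP + FOREST_TYPE, data = A, FUN = mean)
A_agg2
```

A table like A_agg2 can be used more or less directly as a summary of the modeling data by species and site type.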
