Research methodology R Statistics – Introduction


Research methodology R Statistics – Introduction
Dr. Sanna Härkönen, R&D Manager, Bitcomp Oy

Contents

15.11.      Introduction to R: basic use of RStudio; basic commands (reading and writing data, using data frames)
17.11.      Modeling examples: example studies with R
21.11.      Introduction to group work; model fitting: aggregating, plotting, linear regression and its interpretation
23.11. (1)  Model validation: RMSE, bias, t-test
23.11. (2)  GIS analysis with R: rasters, shapefiles -> mapping
25.11.      Group presentations; best practices: using R for data interpretation in scientific reports and studies

R Statistics

- Script language
- Great for data analysis and statistical computing
- Efficient vector and matrix calculations
- Advantages:
  - Suitable for any programming task, modeling, etc.
  - Versatile packages for environmental analysis, for example data clustering, decision trees, kNN imputation, GIS data analysis, ...

Links:
- https://www.r-project.org/
- https://cran.r-project.org/doc/contrib/Torfs+Brauer-Short-R-Intro.pdf
- https://www.analyticsvidhya.com/blog/2015/07/guide-data-visualization-r/

Just Google: a huge number of tutorials and code examples is available.

RStudio

[Screenshot of the RStudio interface, with the code window and the console labelled]

Tips

- In R code, a <- b + c is the same as a = b + c
- Variable names are case sensitive (a is not the same as A)
- Running code: select the desired line(s) in the code window and press CTRL + ENTER
- Clear the console: CTRL + L
- Show the previous code lines in the console: up arrow
- Cheat sheet: https://www.rstudio.com/wp-content/uploads/2016/10/r-cheat-sheet-3.pdf
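
The first two tips can be tried directly in the console. This is a minimal sketch with made-up values; the object names (b, d, a1, a2) are illustrative only.

```r
# Made-up values for illustration
b <- 2
d <- 3

a1 <- b + d   # assignment with <-
a2 = b + d    # assignment with =, same effect at the top level

identical(a1, a2)   # TRUE: both hold the value 5

# Names are case sensitive: a and A are two different objects
a <- 1
A <- 100
a == A              # FALSE
```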

1 Basic commands

- Reading data in: read.csv()
- Checking the first lines: head()
- Checking summary statistics: summary()
- Data frames: data.frame()
- Creating a new column in a data frame and calculating its value:
    my_dataframe$my_new_variable <- my_dataframe$var1 + my_dataframe$var2
- Removing a column:
    my_dataframe$my_new_variable <- NULL
- Conditionals: ifelse()
- Taking a subset: subset()
- Plotting data: plot()
- Writing data out: write.csv()
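
The commands above can be strung together as in the following sketch. The data are invented, the column size_class and the objects large_rows, csv_path and reread are hypothetical names, and a temporary file stands in for a real CSV file.

```r
# Hypothetical toy data; var1 and var2 stand in for real columns
my_dataframe <- data.frame(var1 = c(1, 2, 3, 4),
                           var2 = c(10, 20, 30, 40))

# Create a new column from two existing ones
my_dataframe$my_new_variable <- my_dataframe$var1 + my_dataframe$var2

# ifelse() evaluates the condition element by element
my_dataframe$size_class <- ifelse(my_dataframe$var1 > 2, "large", "small")

# subset() keeps only the rows that meet the condition
large_rows <- subset(my_dataframe, var1 > 2)

# Remove a column by assigning NULL to it
my_dataframe$my_new_variable <- NULL

# Write the data out to a temporary file and read it back in
csv_path <- tempfile(fileext = ".csv")
write.csv(my_dataframe, csv_path, row.names = FALSE)
reread <- read.csv(csv_path)

head(reread)      # first lines
summary(reread)   # summary statistics per column
```

Setting row.names = FALSE keeps write.csv() from adding an extra column of row numbers to the file.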

Exercise 1

- Download "Modeling_data_all.csv" from the Wiki
- Read the modeling data set in RStudio into an object called A:
    A <- read.csv("C:/temp/Modeling_data_all.csv")
- Check the first lines of your data set:
    head(A)
- Check summary statistics for data set A:
    summary(A)

- Calculate a new variable N in data frame A (number of stems / ha, based on mean diameter D, cm, and total basal area BA, m2/ha):
    A$N <- A$BA / (pi * (0.5 * A$D / 100)^2)
- Calculate a new variable "mean_stem_volume1" in data frame A, based on total volume and N:
    A$mean_stem_volume1 <- A$TOTAL_VOLUME / A$N
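
The stem number formula divides the stand basal area by the basal area of one mean tree, pi * (D / 200)^2 in m2 when D is given in cm. Below is a small worked sketch with invented stand values. It assumes TOTAL_VOLUME is in m3/ha, and tree_basal_area is a hypothetical helper name.

```r
# Invented example stands (not from the course data set)
A <- data.frame(D = c(20, 30),               # mean diameter, cm
                BA = c(20, 25),              # basal area, m2/ha
                TOTAL_VOLUME = c(150, 250))  # assumed unit: m3/ha

# Basal area of one mean tree in m2: radius = 0.5 * D cm = 0.5 * D / 100 m
tree_basal_area <- pi * (0.5 * A$D / 100)^2

# Stems per hectare and mean stem volume
A$N <- A$BA / tree_basal_area
A$mean_stem_volume1 <- A$TOTAL_VOLUME / A$N

A
```

For the first stand, one tree covers pi * 0.1^2 m2, so N = 20 / (pi * 0.01) = 2000 / pi, about 637 stems per hectare.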

- Calculate a new variable mean_stem_volume2 using the Laasasenaho volume function (note: the natural logarithm ln is log() in R)

Laasasenaho volume (V, liters) function, based on diameter D (cm), for Scots pine:
    ln(V) = -5.39417 + 3.48060 * ln(2 + 1.25 * D) - 0.039884 * D

In R, converted from liters to m3:
    A$mean_stem_volume2 <- exp(-5.39417 + 3.48060 * log(2 + 1.25 * A$D) - 0.039884 * A$D) / 1000
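
One way to keep the volume formula readable is to wrap it in a function. This is a sketch only: the function name laasasenaho_pine_volume_m3 and the example diameters are made up, while the coefficients are the ones given above.

```r
# Hypothetical helper: Scots pine stem volume (m3) from diameter (cm),
# using the coefficients quoted in the exercise
laasasenaho_pine_volume_m3 <- function(d_cm) {
  # log() is the natural logarithm in R
  ln_v <- -5.39417 + 3.48060 * log(2 + 1.25 * d_cm) - 0.039884 * d_cm
  exp(ln_v) / 1000   # exp() gives liters; dividing by 1000 gives m3
}

# Invented example diameters in cm
d <- c(10, 20, 30)
v <- laasasenaho_pine_volume_m3(d)
v

# Applied to a data frame column, the call would look like:
# A$mean_stem_volume2 <- laasasenaho_pine_volume_m3(A$D)
```

Because log() and exp() are vectorized, the function works on a whole column at once.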

- Print summary statistics for your data set:
    summary(A)
- Check visually how well the two mean stem volumes correlate with each other:
    plot(x, y)
- Print boxplots showing 1) mean stem volume, 2) total volume and 3) the difference between mean_stem_volume1 and mean_stem_volume2, by tree species class and by site type:
    boxplot(x ~ y)
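
As a hedged illustration of how x and y might be filled in, the sketch below uses an invented data frame. The column volume_diff and all values are hypothetical; SP_GROUP is assumed to hold the species class.

```r
# Invented values for illustration only
A <- data.frame(
  SP_GROUP = c(1, 1, 2, 2, 1, 2),
  mean_stem_volume1 = c(0.20, 0.35, 0.15, 0.28, 0.42, 0.22),
  mean_stem_volume2 = c(0.22, 0.33, 0.18, 0.25, 0.45, 0.20)
)

# Scatter plot of the two volume estimates, with a 1:1 reference line
plot(A$mean_stem_volume1, A$mean_stem_volume2,
     xlab = "Mean stem volume 1 (m3)",
     ylab = "Mean stem volume 2 (m3)")
abline(0, 1)

# Correlation coefficient as a numeric complement to the plot
r <- cor(A$mean_stem_volume1, A$mean_stem_volume2)

# Difference between the two estimates, grouped by species class
A$volume_diff <- A$mean_stem_volume1 - A$mean_stem_volume2
boxplot(volume_diff ~ SP_GROUP, data = A,
        xlab = "Species group", ylab = "Volume difference (m3)")
```

In boxplot(x ~ y), the variable on the left of ~ is the one being summarized and the variable on the right defines the groups.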

- Aggregate the data based on species and site type:
    A_agg <- aggregate(A, list(A$SP_GROUP, A$FOREST_TYPE), mean)
- Consider how you could use R to interpret your modeling data in the "Material" chapter of a scientific report
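
A variant of the aggregation step, shown here on invented data, uses the formula interface of aggregate() rather than the call above. It averages only the named columns, which avoids the NA values and warnings that mean() may give if the data frame contains non-numeric columns. The values below are hypothetical.

```r
# Invented example data (not the course data set)
A <- data.frame(
  SP_GROUP     = c(1, 1, 2, 2),
  FOREST_TYPE  = c(1, 1, 1, 2),
  D            = c(20, 30, 15, 25),
  TOTAL_VOLUME = c(150, 250, 100, 200)
)

# Mean of D and TOTAL_VOLUME for each species / site type combination
A_agg <- aggregate(cbind(D, TOTAL_VOLUME) ~ SP_GROUP + FOREST_TYPE,
                   data = A, FUN = mean)
A_agg
```

The result has one row per combination of SP_GROUP and FOREST_TYPE that occurs in the data, which is a convenient form for a summary table of the material.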