16BIT IITR Data Collection Module

Introduction to R & RStudio

If you have not already done so, download and install R from https://cran.r-project.org/, then download and install RStudio, a user interface for R.

R is a powerful, free, user-modifiable piece of statistical software that runs equally well on Windows, Mac OS and Linux. It can be used for everything from basic linear regression to complicated Bayesian statistics. It has a large and fast-growing user base and is becoming one of the most widely used pieces of statistical software in academia.

R is a script-based language. This makes it easy to save the steps of your analysis and share them with colleagues: reproducible research at your fingertips! RStudio is an interface for working with R.

Your First Program

Type the following command in your console:
> print("Hello world!")
That should give the following output:
[1] "Hello world!"

Basic Arithmetic Operations

Type the following command in your console:
> 1 + 2;
That should give the following output:
[1] 3

Notes:
Spacing around operators and arguments does not matter.
# is used for comments: everything after it on a line is ignored.
A semicolon is required only to run multiple commands on one line (the command above would run even if it were removed).
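A few more console lines illustrating the notes above on spacing, comments and semicolons. The numbers are arbitrary examples, and the integer-division and modulo operators go slightly beyond the slide.

```r
# Spacing around operators does not change the result
a <- 7+3
b <- 7 + 3
print(a == b)     # prints [1] TRUE

# Everything after # is a comment and is ignored
c1 <- 10 / 4      # ordinary division gives 2.5

# A semicolon separates two commands written on one line
x <- 2; y <- x ^ 3
print(y)          # prints [1] 8

# Other basic arithmetic operators
print(17 %/% 5)   # integer division, prints [1] 3
print(17 %% 5)    # remainder (modulo), prints [1] 2
```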

Basic R Operations

To learn more about a function, type ? before the function name:
> ?print

If the exact function name is not known, use apropos with a substring:
> apropos("mean")
This lists all functions whose names contain "mean" (matching ignores case by default), for example:
".colMeans" ".rowMeans" "colMeans" "kmeans" "mean" "mean.Date" "rowMeans" "weighted.mean"
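apropos() treats its argument as a regular expression, so the search can be narrowed. A small sketch; the exact list returned depends on which packages are attached in your session.

```r
# apropos() takes a character string and treats it as a regular expression
all_mean <- apropos("mean")           # every visible name containing "mean" (any case)
print("weighted.mean" %in% all_mean)  # prints [1] TRUE in a standard session

# Anchor the pattern and turn off case-insensitive matching to narrow the search
starts_mean <- apropos("^mean", ignore.case = FALSE)
print(head(starts_mean))              # names beginning with "mean", e.g. "mean", "mean.Date"
```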

Vectors

A vector is a sequence of data elements of the same basic type:
> Vector1 <- c(1,2,3,4,5,6,7,8,9,10)
A vector can be numeric, character or logical.
Arithmetic operations work element by element, for example:
> Vector1 / Vector1
Aggregate operations are also available for vectors:
> sum(Vector1)
> length(Vector1)
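A short sketch of the vector operations above. mean() and the mixed-type example go slightly beyond the slide, to show what the "same basic type" rule means in practice: c() coerces mixed inputs to one common type.

```r
Vector1 <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

# Arithmetic is applied element by element
ratio   <- Vector1 / Vector1   # ten 1s
doubled <- Vector1 * 2         # 2, 4, ..., 20

# Aggregate operations reduce a vector to a single value
total <- sum(Vector1)          # 55
n     <- length(Vector1)       # 10
avg   <- mean(Vector1)         # 5.5

# Mixing types in c() coerces everything to one common type
mixed <- c(1, "a", TRUE)
print(class(mixed))            # prints [1] "character"
```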

Vectors

A vector can be created in several ways. The simplest is the colon operator:
> Vector <- 1:100
seq creates a sequence:
> seq(from = 0, to = 10, by = 2)
rep creates a vector by repeating an element, or even another vector (here Vector2 is any other vector):
> rep("Hello", 3)
> rep(Vector2, 2)
> rep(Vector2, each = 2)
See the difference between the last two: the first repeats the whole vector, the second repeats each element in place.
Elements of a vector can be accessed (indexed) using:
> Vector[2:4]
> Vector[c(1,3,4)]
> Vector2[-2]   (all elements except the second)
> Vector[Vector >= 4]   (only the elements that satisfy the condition)
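The creation and indexing forms above, side by side. Vector2 is not defined on the slide, so this sketch gives it the hypothetical contents c(1, 2, 3).

```r
Vector  <- 1:100
Vector2 <- c(1, 2, 3)            # hypothetical example vector

evens  <- seq(from = 0, to = 10, by = 2)  # 0 2 4 6 8 10
hello  <- rep("Hello", 3)                 # "Hello" "Hello" "Hello"
times2 <- rep(Vector2, 2)                 # 1 2 3 1 2 3 (whole vector twice)
each2  <- rep(Vector2, each = 2)          # 1 1 2 2 3 3 (each element twice)

# Indexing
v_range <- Vector[2:4]          # 2 3 4
v_pick  <- Vector[c(1, 3, 4)]   # 1 3 4
v_drop  <- Vector2[-2]          # 1 3 (a negative index drops that element)
v_cond  <- Vector[Vector >= 4]  # 4, 5, ..., 100 (logical indexing)
```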

List

A list is a generic vector containing other objects:
> list1 <- list(Vector, 5, "a", "abc")
This is a list of a numeric vector, a number and two character strings (the last three are vectors of a single element).
Members of a list are accessed with double brackets, and the elements inside a member with single brackets:
> list1[[1]]   (the whole vector Vector)
> list1[[1]][1]   (the same as Vector[1], the first member of that vector)
Members of a list can also be named:
> v = list(bob=c(2, 3, 5), john=c("aa", "bb"))
Named members can then be accessed as v[["bob"]] or v$bob.
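A sketch contrasting [[ ]] and [ ] on a list. Vector is given the hypothetical contents 1:5 here, since the slide does not define it.

```r
Vector <- 1:5                        # hypothetical example vector
list1  <- list(Vector, 5, "a", "abc")

whole <- list1[[1]]      # [[ ]] returns the member itself: the vector 1:5
first <- list1[[1]][1]   # then [ ] indexes inside it: 1
sub   <- list1[1]        # a single [ ] returns a smaller list, not the vector

print(class(whole))      # prints [1] "integer"
print(class(sub))        # prints [1] "list"

v <- list(bob = c(2, 3, 5), john = c("aa", "bb"))
by_name   <- v[["bob"]]  # 2 3 5
by_dollar <- v$bob       # same as v[["bob"]]
```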

Matrix

A matrix is a collection of data elements arranged in a two-dimensional rectangular layout:
> A = matrix(c(2, 4, 3, 1, 5, 7), nrow=2, ncol=3, byrow = TRUE)
As with vectors, the data elements must all be of the same basic type (numeric in the example above).
To access all the rows but only the 1st and 3rd columns:
> A[,c(1,3)]
Column names and row names can be set for a matrix (here Matrix1 is a matrix with three rows and three columns):
> colnames(Matrix1) <- c("A","B","C")
> rownames(Matrix1) <- c("a","b","c")
Two matrices A and B can be combined row-wise with rbind(A, B) or column-wise with cbind(A, B) (rbind needs the same number of columns, cbind the same number of rows).
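The matrix operations above in one runnable sketch. Matrix1 is not defined on the slide, so it is given hypothetical contents 1:9 in a 3 x 3 layout to match the three row and column names.

```r
A <- matrix(c(2, 4, 3, 1, 5, 7), nrow = 2, ncol = 3, byrow = TRUE)
# byrow = TRUE fills the matrix row by row:
#      [,1] [,2] [,3]
# [1,]    2    4    3
# [2,]    1    5    7

cols13 <- A[, c(1, 3)]             # all rows, columns 1 and 3

# Hypothetical 3 x 3 matrix (filled column by column by default)
Matrix1 <- matrix(1:9, nrow = 3)
colnames(Matrix1) <- c("A", "B", "C")
rownames(Matrix1) <- c("a", "b", "c")
cell <- Matrix1["b", "C"]          # row "b", column "C": 8

stacked <- rbind(A, A)             # 4 x 3: rows appended
side    <- cbind(A, A)             # 2 x 6: columns appended
```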

Data Frames

A data frame is used for storing data tables and is the main structure for handling data in R. It is similar to a matrix; the difference is that the columns (variables) of a data frame can be of different types.
R has many built-in datasets that can be loaded as data frames:
> data(iris)
The header holds the column names:
> names(iris)
A single cell is addressed by its row and column:
> iris[48, "Petal.Width"]
A data column can be extracted in any of these equivalent ways:
iris[["Sepal.Length"]]
iris$Sepal.Length
iris[[1]]
iris[,"Sepal.Length"]
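A sketch confirming that the four column-extraction forms return the same vector, and showing the mixed column types that distinguish a data frame from a matrix. It uses only the built-in iris dataset.

```r
data(iris)

print(names(iris))
# "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species"

# Columns may have different types: four numeric columns and one factor
print(sapply(iris, class))

cell <- iris[48, "Petal.Width"]   # a single value: row 48, column Petal.Width

# The four ways of extracting a column return the same numeric vector
col_a <- iris[["Sepal.Length"]]
col_b <- iris$Sepal.Length
col_c <- iris[[1]]
col_d <- iris[, "Sepal.Length"]
print(identical(col_a, col_b) && identical(col_b, col_c) && identical(col_c, col_d))
```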

Data Frames

The number of rows and columns can be extracted using:
> nrow(iris)
> ncol(iris)
Data from a CSV file can be loaded as a data frame, here called mydata:
> mydata = read.csv("mydata.csv")
Rows of a data frame can be sampled by sampling row indices (optionally weighted with prob = prob_weights, a vector of one weight per row):
> iris[sample(nrow(iris), size = 10, replace = FALSE), ]
Calling sample directly on a data frame would sample its columns, not its rows.
Various subsets of a data frame can be selected using subset:
> subset(iris, select = c("Sepal.Width", "Sepal.Length", "Petal.Width", "Petal.Length"))
> subset(iris, select = -Species)
Both calls extract the same four columns; the first returns them in the order listed.
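A sketch of row sampling and column selection on iris. The seed and the weights (setosa rows weighted three times as heavily as the others) are hypothetical choices for illustration only.

```r
data(iris)
set.seed(42)   # arbitrary seed so the sample is reproducible

# Sample row indices, then use them to pick rows
rows    <- sample(nrow(iris), size = 10, replace = FALSE)
sampled <- iris[rows, ]

# Optional weighting: one probability weight per row (hypothetical weights)
prob_weights <- ifelse(iris$Species == "setosa", 3, 1)
weighted <- iris[sample(nrow(iris), size = 10, replace = FALSE, prob = prob_weights), ]

a <- subset(iris, select = c("Sepal.Width", "Sepal.Length", "Petal.Width", "Petal.Length"))
b <- subset(iris, select = -Species)
print(names(a))  # columns in the order listed
print(names(b))  # columns in their original order
```

Reordering a with a[names(b)] gives the same values as b, which is what "both extract the same subset" means here.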

16BIT IITR Data Collection Module Using Machine learning Algorithms with R Machine Learning with R  install.packages(“e1071”); install.packages(“caret”);  library(e1071); library(caret);  train_data_int<- sample( 1:nrow(iris), 0.8*nrow(iris));  train_data <- iris[train_data_int,]  test_data <- iris[-train_data_int,]  model <- svm(Species ~., data = train_data) Or x<-subset(train_data,select = - Species) y<-subset(train_data,select = Species) model <- svm(x, y$Species)  summary(model)  save(model, file=”model_svm.saved”) Install.packages to install a package and library to load it Column “Species” is predicate variable and all other are used as causal svm (,, cross=10) for 10 fold cross validation

16BIT IITR Data Collection Module Using Machine learning Algorithms with R Machine Learning with R  load("model_svm.saved")  test_x<-subset(test_data,select = - Species)  test_y<-subset(test_data,select = Species)  pred <- predict(model, test_x)  pred <- as.data.frame(pred);  tab<-table(pred$pred,test_y$Species)  conf<-confusionMatrix(tab)  accuracy_test<-conf$overall[1] as.data.frame to convert pred into data frame Confusion matrix can be used to determine accuracy

Application of Machine Learning with R

A huge number of news articles are collected by a web crawler.
New diseases keep emerging, and information about diseases and epidemic incidents, both old and new, is published in news articles.
Using a classification technique, we can predict whether a news article discusses information related to diseases or not.