
16BIT IITR Data Collection Module

1 Introduction to R & RStudio

If you have not already done so, download and install R from https://cran.r-project.org/ and then download and install the RStudio user interface from http://www.rstudio.com/ide/download/desktop

R is a powerful, free, user-modifiable piece of statistical software that runs equally well on Windows, Mac OS and Linux. It can be used for everything from basic linear regression to complicated Bayesian statistics. It has a large and fast-growing user base and is fast becoming one of the most widely used pieces of statistics software in academia.

R is a script-based language. This makes it easy to save the steps of your analysis and share them with colleagues. Reproducible research at your fingertips!

RStudio is an interface (an integrated development environment) for working with R.

2 Introduction: Your First Program

Type the following command on your console:
> print("Hello world!")
That should give the following output:
[1] "Hello world!"

Basic Arithmetic Operations
Type the following command on your console:
> 1 + 2 # + 20 ;
That should give the following output:
[1] 3

Spacing between terms does not matter. # starts a comment, so everything after it on the line (here "+ 20 ;") is ignored. A semicolon is required only to run multiple commands on one line; this command would run even if it were removed.
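The following script is a small sketch that collects the points above in one place: a comment that swallows "+ 20", semicolons separating commands on one line, and R's basic arithmetic operators. The variable names x, a and b are illustrative only.

```r
# Everything after '#' is a comment, so "+ 20" is ignored and x is 3
x <- 1 + 2 # + 20
print(x)

# A semicolon separates two commands written on the same line
a <- 10; b <- 4

print(a + b)    # addition: 14
print(a - b)    # subtraction: 6
print(a * b)    # multiplication: 40
print(a / b)    # division: 2.5
print(a %/% b)  # integer division: 2
print(a %% b)   # remainder (modulo): 2
print(a ^ 2)    # exponentiation: 100
```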

3 Basic R Operations: Getting Help

To learn more about a function, type ? before the function name:
> ?print

If the exact function name is not known, use apropos() with a substring, given as a quoted string, on your console:
> apropos("mean")

This lists all functions whose names contain "mean", for example ".colMeans", ".rowMeans", "colMeans", "kmeans", "mean", "mean.Date", "rowMeans" and "weighted.mean". The exact list depends on your R version and the packages that are loaded.
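A short sketch of apropos() in a script. Its argument is treated as a regular expression and, by default, matched ignoring case, so an anchor such as "^mean" restricts the search to names that start with "mean". The variable names hits and starts are illustrative.

```r
# ?print (or help("print")) opens the help page; it is left as a comment
# here because it only makes sense in an interactive session.

# All visible functions whose names contain "mean" (case-insensitive by default)
hits <- apropos("mean")
print(hits)

# A regular expression narrows the search: names that start with "mean"
starts <- apropos("^mean")
print(starts)
```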

4 Basic R Operations: Vectors

A vector is a sequence of data elements of the same basic type. A vector can be numeric, character or logical.
> Vector1 <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

Basic arithmetic operations are applied element by element, for example:
> Vector1 / Vector1

Some basic aggregate operations are also available for vectors:
> sum(Vector1)
> length(Vector1)
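A minimal sketch of the vector operations above. It also shows what happens when types are mixed in c(): since a vector holds a single basic type, R coerces every element to the most general type present (here, character). The names doubled, chars, flags and mixed are illustrative.

```r
Vector1 <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

# Arithmetic is element-wise
ratio   <- Vector1 / Vector1   # every element is 1
doubled <- Vector1 * 2         # 2, 4, ..., 20

# Aggregate operations reduce a vector to a single value
total <- sum(Vector1)      # 55
n     <- length(Vector1)   # 10
avg   <- mean(Vector1)     # 5.5

# Character and logical vectors
chars <- c("a", "b", "c")
flags <- c(TRUE, FALSE, TRUE)

# Mixing types coerces everything to character
mixed <- c(1, "a", TRUE)   # "1" "a" "TRUE"
print(class(mixed))        # "character"
```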

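5 Basic R Operations: Creating and Indexing Vectors

Vectors can be created using the following methods. The simplest is the colon operator:
> Vector <- 1:100
seq can be used to create a vector of regularly spaced values:
> seq(from = 0, to = 10, by = 2)
rep can be used to create a vector by repeating an element, or even another vector (Vector2 stands for any existing vector):
> rep("Hello", 3)
> rep(Vector2, 2)
> rep(Vector2, each = 2)
See the difference: rep(Vector2, 2) repeats the whole vector twice, while rep(Vector2, each = 2) repeats each element twice in place.

Elements of a vector can be accessed (indexed) using:
> Vector[2:4]           # elements 2 to 4
> Vector[c(1, 3, 4)]    # elements 1, 3 and 4
> Vector2[-2]           # all elements except the 2nd
> Vector[Vector >= 4]   # elements that satisfy a condition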
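A sketch that makes the commands above concrete. The slide leaves Vector2 unspecified, so it is assumed here to be c(1, 2, 3) purely for illustration, and the logical filter uses Vector >= 98 to keep the result short.

```r
Vector <- 1:100                               # 1, 2, ..., 100
evens  <- seq(from = 0, to = 10, by = 2)      # 0 2 4 6 8 10
hellos <- rep("Hello", 3)                     # "Hello" "Hello" "Hello"

Vector2  <- c(1, 2, 3)                # hypothetical example vector
whole    <- rep(Vector2, 2)           # 1 2 3 1 2 3  (whole vector repeated)
each_rep <- rep(Vector2, each = 2)    # 1 1 2 2 3 3  (each element repeated)

slice   <- Vector[2:4]             # 2 3 4
picked  <- Vector[c(1, 3, 4)]      # 1 3 4
dropped <- Vector2[-2]             # negative index removes element 2: 1 3
big     <- Vector[Vector >= 98]    # logical filter: 98 99 100
print(big)
```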

6 Basic R Operations: Lists

A list is a generic vector containing other objects. The list below holds a numeric vector followed by three vectors of a single element each (a number, a character and a character string):
> list1 <- list(Vector, 5, "a", "abc")

Members of a list are accessed with double brackets, and a further single bracket indexes inside that member:
> list1[[1]]       # the vector Vector
> list1[[1]][1]    # Vector[1], the first element of that vector

Members of a list can also be named:
> v <- list(bob = c(2, 3, 5), john = c("aa", "bb"))
Named members of the list can be accessed as v[["bob"]] or v$bob.
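A runnable sketch of the list examples, assuming Vector is 1:10 (the slide does not fix its contents). It also shows the difference between double and single brackets: [[ ]] returns the member itself, while [ ] returns a smaller list.

```r
Vector <- 1:10   # assumed contents for illustration
list1  <- list(Vector, 5, "a", "abc")

first      <- list1[[1]]      # the vector 1:10 itself
first_elem <- list1[[1]][1]   # first element of that vector: 1
sub_list   <- list1[1]        # single brackets: a list of length 1

v    <- list(bob = c(2, 3, 5), john = c("aa", "bb"))
bob1 <- v[["bob"]]            # c(2, 3, 5)
bob2 <- v$bob                 # same member, accessed by name with $
print(names(v))               # "bob" "john"
```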

7 Basic R Operations: Matrices

A matrix is a collection of data elements arranged in a two-dimensional rectangular layout. The data elements must be of the same basic type, like the numeric vector used here:
> A <- matrix(c(2, 4, 3, 1, 5, 7), nrow = 2, ncol = 3, byrow = TRUE)

Accessing all the rows but only the 1st and 3rd columns:
> A[, c(1, 3)]

Column names and row names can be provided for a matrix. There must be one name per column and one per row, so A (2 rows, 3 columns) gets three column names and two row names:
> colnames(A) <- c("A", "B", "C")
> rownames(A) <- c("a", "b")

Two matrices A and B can be combined row-wise or column-wise using rbind(A, B) or cbind(A, B). rbind needs the same number of columns and cbind the same number of rows.
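A sketch of the matrix operations above. The second matrix B is hypothetical (any 2 x 3 numeric matrix works) and only serves to show the shapes produced by rbind and cbind.

```r
A <- matrix(c(2, 4, 3, 1, 5, 7), nrow = 2, ncol = 3, byrow = TRUE)
# byrow = TRUE fills the matrix row by row:
#      [,1] [,2] [,3]
# [1,]    2    4    3
# [2,]    1    5    7

cols13 <- A[, c(1, 3)]   # all rows, columns 1 and 3

colnames(A) <- c("A", "B", "C")
rownames(A) <- c("a", "b")   # one name per row
val <- A["b", "C"]           # row "b", column "C": 7

B <- matrix(1:6, nrow = 2)   # hypothetical second 2 x 3 matrix
stacked <- rbind(A, B)       # 4 x 3: rows of B below rows of A
side    <- cbind(A, B)       # 2 x 6: columns of B right of columns of A
print(dim(stacked))
print(dim(side))
```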

8 Basic R Operations: Data Frames

A data frame is used for storing data tables and is the usual way of handling data in R. It is similar to a matrix, with one difference: the columns (variables) of a data frame can be of different types. R comes with many built-in data sets that can be loaded as data frames:
> data(iris)

In a data frame, the header holds the column names, each line below it is a data row, and a single value is a cell. The column names are returned by:
> names(iris)

A data column can be extracted in any of these equivalent ways:
> iris[["Sepal.Length"]]
> iris$Sepal.Length
> iris[[1]]
> iris[, "Sepal.Length"]

A single cell is accessed by row and column:
> iris[48, "Petal.Width"]
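A sketch confirming that the four column-extraction forms return the same vector, and showing the one property that separates a data frame from a matrix: each column keeps its own type. The names s1 to s4, cell and types are illustrative.

```r
data(iris)            # built-in data set: 150 rows, 5 columns
cols <- names(iris)   # "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species"

# Four equivalent ways of extracting one column as a vector
s1 <- iris[["Sepal.Length"]]
s2 <- iris$Sepal.Length
s3 <- iris[[1]]
s4 <- iris[, "Sepal.Length"]

cell <- iris[48, "Petal.Width"]   # a single cell: row 48, column Petal.Width

# Unlike a matrix, columns can differ in type
types <- sapply(iris, class)      # four "numeric" columns and one "factor"
print(types)
```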

9 Basic R Operations: Data Frames (continued)

The number of rows and columns can be extracted using:
> nrow(iris)
> ncol(iris)

Data from a CSV file can also be loaded as a data frame "mydata":
> mydata <- read.csv("mydata.csv")

Different rows of a data frame can be sampled by sampling row indices and then indexing the rows. Here prob_weights is an optional vector with one weight per row; omit prob for uniform sampling:
> iris[sample(nrow(iris), size = 10, replace = FALSE, prob = prob_weights), ]

Various subsets of a data frame can be selected using subset(). Both of the following extract the same four columns; only the column order differs:
> subset(iris, select = c("Sepal.Width", "Sepal.Length", "Petal.Width", "Petal.Length"))
> subset(iris, select = -Species)
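A sketch of row sampling and subsetting on iris. The weighting rule (setosa rows twice as likely) is a made-up example of a per-row prob vector, and set.seed is added only so that the random sample can be reproduced. The last line shows that subset() can also filter rows with a logical condition.

```r
data(iris)
dims <- c(nrow(iris), ncol(iris))   # 150 5 (dim(iris) gives the same)

set.seed(42)   # reproducible random sample
rows    <- sample(nrow(iris), size = 10, replace = FALSE)  # 10 distinct row indices
sampled <- iris[rows, ]                                    # 10 x 5 data frame

# Hypothetical weights: one per row, setosa rows twice as likely
row_weights <- ifelse(iris$Species == "setosa", 2, 1)
weighted    <- iris[sample(nrow(iris), size = 10, prob = row_weights), ]

# Same four columns, in a different order
by_name <- subset(iris, select = c("Sepal.Width", "Sepal.Length", "Petal.Width", "Petal.Length"))
by_drop <- subset(iris, select = -Species)

# Rows filtered by a condition, two columns kept
long_petals <- subset(iris, Petal.Length > 6, select = c(Petal.Length, Species))
print(long_petals)
```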

10 Machine Learning with R: Using Machine Learning Algorithms

install.packages installs a package and library loads it:
> install.packages("e1071"); install.packages("caret")
> library(e1071); library(caret)

Split iris randomly into 80% training data and 20% test data (calling set.seed() first, e.g. set.seed(1), makes the split reproducible):
> train_data_int <- sample(1:nrow(iris), 0.8 * nrow(iris))
> train_data <- iris[train_data_int, ]
> test_data <- iris[-train_data_int, ]

Train a support vector machine (SVM). The column Species is the response (target) variable, and all other columns are used as predictors:
> model <- svm(Species ~ ., data = train_data)
Or, with separate predictor and response objects:
> x <- subset(train_data, select = -Species)
> y <- subset(train_data, select = Species)
> model <- svm(x, y$Species)

Inspect and save the model:
> summary(model)
> save(model, file = "model_svm.saved")

Adding cross = 10 to the svm() call, e.g. svm(Species ~ ., data = train_data, cross = 10), performs 10-fold cross-validation on the training data.
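A sketch of the 80/20 split with a fixed seed, followed by an optional SVM fit with 10-fold cross-validation. The fit only runs if e1071 is installed. The fields accuracies and tot.accuracy are, as far as I know, where e1071 stores the per-fold and overall cross-validation accuracy (they are what summary() reports), but treat those names as an assumption to check against the e1071 documentation.

```r
set.seed(123)   # reproducible split
data(iris)

n          <- nrow(iris)
train_idx  <- sample(seq_len(n), size = round(0.8 * n))   # 120 row indices
train_data <- iris[train_idx, ]
test_data  <- iris[-train_idx, ]                          # remaining 30 rows

# Optional: SVM with 10-fold cross-validation, only if e1071 is available
if (requireNamespace("e1071", quietly = TRUE)) {
  model <- e1071::svm(Species ~ ., data = train_data, cross = 10)
  print(model$accuracies)     # assumed field: accuracy (%) of each fold
  print(model$tot.accuracy)   # assumed field: overall CV accuracy (%)
}
```

Cross-validation only uses the training rows, so test_data stays untouched for the final evaluation.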

11 Machine Learning with R: Testing the Model

> load("model_svm.saved")
> test_x <- subset(test_data, select = -Species)
> test_y <- subset(test_data, select = Species)
> pred <- predict(model, test_x)
> pred <- as.data.frame(pred)
> tab <- table(pred$pred, test_y$Species)
> conf <- confusionMatrix(tab)
> accuracy_test <- conf$overall[1]

as.data.frame converts pred into a data frame. The confusion matrix (predicted classes in rows, true classes in columns) can be used to determine accuracy; conf$overall[1] is its "Accuracy" entry.
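The accuracy that caret reports is the share of test rows on the diagonal of the confusion table. The base-R sketch below computes it by hand; the six predictions and true labels are hypothetical, made up only to show the arithmetic.

```r
# Hypothetical true labels and predictions for six test rows
truth <- factor(c("setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"))
pred  <- factor(c("setosa", "setosa", "versicolor", "virginica", "virginica", "virginica"),
                levels = levels(truth))

# Rows = predictions, columns = true classes (same layout as table(pred$pred, test_y$Species))
tab <- table(Predicted = pred, Actual = truth)
print(tab)

# Correct predictions lie on the diagonal: 5 of 6 here
accuracy <- sum(diag(tab)) / sum(tab)
print(accuracy)
```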

12 Application of Machine Learning with R

A web crawler collects a huge number of news articles. New diseases keep on emerging, and information about diseases, including epidemic incidents related to both old and new diseases, is published in news articles. Using a classification technique, we can predict whether or not a news article discusses information related to diseases.

