Data Analysis with R

Many data mining methods are supported in R's core packages or in add-on packages:
– k-means clustering: kmeans() in the stats package
– Decision tree: rpart() in the rpart library
– Nearest neighbour: knn() in the class library
– …
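A minimal sketch of the three functions named above, run on R's built-in iris data rather than on any data set from these slides. R function names are case-sensitive, so the calls are kmeans() and knn() in lowercase. The variable names (features, labels, km, tree) are illustrative only, and reusing the training rows as test rows is done purely to keep the example short.

```r
# Sketch: k-means, a decision tree, and k-nearest neighbours on iris.
library(rpart)   # decision trees
library(class)   # k-nearest neighbours

set.seed(1)                       # k-means starts from random centers
features <- iris[, 1:4]           # four numeric measurements
labels   <- iris$Species          # three-level factor

# k-means clustering (stats package, loaded by default)
km <- kmeans(features, centers = 3)

# Decision tree predicting Species from all other columns
tree <- rpart(Species ~ ., data = iris, method = "class")
tree_pred <- predict(tree, iris, type = "class")

# 5-nearest-neighbour classification; the training set is reused as the
# test set here for illustration only
knn_pred <- knn(train = features, test = features, cl = labels, k = 5)

table(tree_pred, labels)          # confusion matrix for the tree
```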

Additional Libraries and Packages
Libraries
– Come with package installation (core or others)
– library() lists the currently installed packages
– A library must be loaded before use, e.g. library(rpart)
Packages
– Code/libraries developed outside the core packages
– Can be downloaded and installed separately with install.packages("name")
– There are currently 2561 packages at project.org/web/packages/
– E.g. RWeka, an interface to Weka
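The install-then-load workflow can be sketched as follows. The installer is install.packages(), lowercase and plural. The package name rpart is just an example, and the mirror URL is an assumption (any CRAN mirror should work).

```r
pkg <- "rpart"   # example package name; substitute the one you need

# Install only when the package is not already available.
# The mirror URL below is an assumption; any CRAN mirror can be used.
if (!requireNamespace(pkg, quietly = TRUE)) {
  install.packages(pkg, repos = "https://cloud.r-project.org")
}

# library() needs character.only = TRUE when the name is held in a variable
library(pkg, character.only = TRUE)

# Names of all installed packages, the same information library() displays
installed <- rownames(installed.packages())
pkg %in% installed
```

Checking with requireNamespace() first avoids reinstalling a package on every run of a script.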

Common Data Mining Methods
Clustering analysis
– Grouping data objects into different buckets.
– Common methods:
   Distance-based clustering, e.g. k-means
   Density-based clustering, e.g. DBSCAN
   Hierarchical clustering, e.g. agglomerative hierarchical clustering
Classification
– Assigning labels to each data object based on training data.
– Common methods:
   Distance-based classification, e.g. SVM
   Statistics-based classification, e.g. naïve Bayes
   Rule-based classification, e.g. decision tree classification
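Of the clustering methods listed, hierarchical clustering is available in base R without extra packages, so it makes a convenient illustration. The sketch below assumes the built-in iris data and average linkage; both choices are arbitrary and not taken from the slides.

```r
# Bottom-up (agglomerative) hierarchical clustering with the stats package.
features <- scale(iris[, 1:4])        # standardize so no column dominates
d  <- dist(features)                  # pairwise Euclidean distances
hc <- hclust(d, method = "average")   # repeatedly merge the two closest clusters
groups <- cutree(hc, k = 3)           # cut the merge tree into 3 clusters
table(groups)                         # cluster sizes
```

Calling plot(hc) draws the dendrogram, which shows the order in which clusters were merged.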

Cluster Analysis
Finding groups of objects such that the objects in a group are similar (or related) to one another and different from (or unrelated to) the objects in other groups
– Inter-cluster distance: maximized
– Intra-cluster distance: minimized
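For k-means, the two goals can be read off the fitted object as sums of squares. This is a sketch on the built-in iris data, not on the slides' data: tot.withinss measures intra-cluster scatter, betweenss measures inter-cluster scatter, and the two add up to the total sum of squares.

```r
set.seed(42)
features <- iris[, 1:4]
fit <- kmeans(features, centers = 3, nstart = 10)

fit$tot.withinss   # intra-cluster scatter: k-means tries to make this small
fit$betweenss      # inter-cluster scatter: large when clusters are far apart

# The two parts always add up to the total sum of squares of the data
all.equal(fit$totss, fit$tot.withinss + fit$betweenss)
```

Because the total is fixed by the data, minimizing the within-cluster part is the same as maximizing the between-cluster part.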

An Example of k-means Clustering (K=3)
Examples are from Tan, Steinbach, Kumar, Introduction to Data Mining.

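K-means clustering Example
login1% more kmeans.R
# Read the unlabeled data (the file has no header row)
x <- read.csv("../data/cluster.csv", header=FALSE)
# Partition the rows into 2 clusters
fit <- kmeans(x, 2)
# Draw the data, mark the cluster centers, then recolor points by cluster
plot(x, pch=19, xlab=expression(x[1]), ylab=expression(x[2]))
points(fit$centers, pch=19, col="blue", cex=2)
points(x, col=fit$cluster, pch=19)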

> fit
K-means clustering with 2 clusters of sizes 49, 51
Cluster means:
  V1 V2
  [values not preserved]
Clustering vector:
  [values not preserved]
Within cluster sum of squares by cluster:
  [values not preserved]
Available components:
[1] "cluster" "centers" "withinss" "size"
>
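Since cluster.csv is not included here, the same workflow can be tried on synthetic data. The sketch below invents two 2-D blobs of 50 points each (hypothetical data, not the slides' data set) and inspects the components of the fitted object. nstart = 10 is an added choice that reruns k-means from several random starts and keeps the best result.

```r
set.seed(123)
# Two hypothetical 2-D blobs of 50 points each, around (0, 0) and (5, 5)
x <- rbind(
  matrix(rnorm(100, mean = 0), ncol = 2),
  matrix(rnorm(100, mean = 5), ncol = 2)
)
colnames(x) <- c("V1", "V2")

fit <- kmeans(x, centers = 2, nstart = 10)

fit$size            # number of points in each cluster
fit$centers         # one row per cluster, one column per variable
head(fit$cluster)   # cluster index assigned to the first rows of x

# Points colored by cluster, with the centers drawn on top
plot(x, col = fit$cluster, pch = 19)
points(fit$centers, pch = 19, col = "blue", cex = 2)
```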

Classification Tasks

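Support Vector Machine Classification
A distance-based classification method. The core idea is to find the hyperplane that best separates the data from two classes, i.e. the one with the largest margin to the nearest training objects. The class of a new object is determined by which side of the hyperplane it falls on, i.e. by the sign of its signed distance from the hyperplane.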

Binary Classification with Linear Separator
– Red and blue dots represent objects from two classes in the training data
– The line is a linear separator for the two classes
– The objects closest to the hyperplane are the support vectors
– ρ marks the margin in the figure
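The geometry described above can be written down compactly. What follows is the standard linear SVM formulation rather than anything taken from the slides: the weight vector w, the offset b, and the labels y_i in {-1, +1} are notation introduced here, and ρ is assumed to denote the margin.

```latex
\begin{aligned}
\text{separating hyperplane:}\quad & w^\top x + b = 0 \\
\text{decision rule:}\quad & \hat{y}(x) = \operatorname{sign}\!\left(w^\top x + b\right) \\
\text{signed distance of } x_i \text{ to the hyperplane:}\quad & r_i = \frac{y_i\,(w^\top x_i + b)}{\lVert w \rVert} \\
\text{margin, with } y_i\,(w^\top x_i + b) \ge 1 \text{ for all } i:\quad & \rho = \frac{2}{\lVert w \rVert}
\end{aligned}
```

The support vectors are the training objects for which the constraint holds with equality, so maximizing ρ amounts to minimizing the norm of w subject to those constraints.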

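SVM Classification Example
install.packages("e1071")    # one-time install; provides svm()
library(e1071)
train <- read.csv("sonar_train.csv", header=FALSE)
y <- as.factor(train[,61])   # column 61 holds the class label
x <- train[,1:60]            # columns 1 to 60 are the features
fit <- svm(x, y)
# Training error rate
1 - sum(y == predict(fit, x)) / length(y)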

SVM Classification Example
test <- read.csv("sonar_test.csv", header=FALSE)
y_test <- as.factor(test[,61])
x_test <- test[,1:60]
# Test error rate, using the model fitted on the training data
1 - sum(y_test == predict(fit, x_test)) / length(y_test)
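The expression 1 - sum(y == pred) / length(y) used in both SVM examples is the misclassification rate. It can be wrapped in a small helper so the training and test errors are computed the same way. The function name error_rate and the label vectors below are hypothetical, invented only to exercise the function. They are not the sonar data.

```r
# Misclassification rate: share of predictions that differ from the truth.
# Equivalent to 1 - sum(truth == pred) / length(truth).
error_rate <- function(truth, pred) {
  stopifnot(length(truth) == length(pred))
  mean(as.character(truth) != as.character(pred))
}

# Hypothetical labels, made up for illustration
truth <- factor(c("R", "R", "M", "M", "M"))
pred  <- factor(c("R", "M", "M", "M", "R"))

error_rate(truth, pred)                   # 2 of the 5 predictions are wrong
table(predicted = pred, actual = truth)   # confusion matrix
```

With a fitted model this would be called as error_rate(y_test, predict(fit, x_test)). A test error well above the training error suggests the model is overfitting.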

Further references
R
– M. Crawley, Statistics: An Introduction Using R, Wiley
– J. Verzani, simpleR: Using R for Introductory Statistics
– Programming manual: Using R for data mining
– Data Mining with R: Learning with Case Studies, Luís Torgo
Contact Info
– Weijia Xu

Reminder
Start R sessions
– ssh
– sbatch job.Rstudio.training
Get exemplar code
– cp -R /work/00791/xwj/R-0915 ~/