An Introduction to WEKA

Presentation transcript:

An Introduction to WEKA
SEEM 4630
Hello everyone! My name is Rong Yu, and I’m the TA of this course. Today is our first tutorial, so if there is anything you can’t follow, please let me know. Thank you! Today I’ll introduce a powerful tool in the data mining and machine learning area. As you can see, its name is WEKA.

Content
What is WEKA?
The Explorer: Preprocess data, Classification, Clustering, Association Rules, Attribute Selection, Data Visualization
References and Resources
10/04/16

What is WEKA?
Waikato Environment for Knowledge Analysis
It’s a data mining/machine learning tool developed by the Department of Computer Science, University of Waikato, New Zealand.
Weka is also a bird found only on the islands of New Zealand.
So what is WEKA? It is a data mining/machine learning tool developed at the University of Waikato, and it is also the name of a bird found only in New Zealand.

Download and Install WEKA
Website: http://www.cs.waikato.ac.nz/~ml/weka/index.html
Supports multiple platforms (written in Java): Windows, Mac OS X and Linux
WEKA has been installed in the teaching labs in SEEM
This slide gives some information about downloading and installing WEKA. I hope everyone can download and install it. WEKA is also available in the teaching labs.

Main Features
49 data preprocessing tools
76 classification/regression algorithms
8 clustering algorithms
3 algorithms for finding association rules
15 attribute/subset evaluators + 10 search algorithms for feature selection
So why is WEKA a powerful tool? It provides many of the basic methods and algorithms that are frequently used in data analysis.

Main GUI
Three graphical user interfaces:
“The Explorer” (exploratory data analysis)
“The Experimenter” (experimental environment)
“The KnowledgeFlow” (new process model inspired interface)
This is the main GUI of WEKA. As we can see, it contains three parts: the Explorer, the Experimenter and the KnowledgeFlow. Today I will mainly introduce the Explorer, which is used for data analysis.

Content
In the Explorer, we can do almost all of the basic data analysis tasks: preprocessing data, classification, clustering, finding association rules, attribute selection, and data visualization.

Explorer: pre-processing the data
Data can be imported from a file in various formats: ARFF, CSV, C4.5, binary
Data can also be read from a URL or from an SQL database (using JDBC)
Pre-processing tools in WEKA are called “filters”
WEKA contains filters for: discretization, normalization, resampling, attribute selection, transforming and combining attributes, …
So how do we use WEKA? I’ll show you a simple example of data analysis. Before that, I’ll show an example of pre-processing data. WEKA accepts many data file formats, such as ARFF and CSV, and it can also read data from a URL or from an SQL database. The input data usually needs some pre-processing steps before it meets the requirements of our analysis.

WEKA only deals with “flat” files
Flat file in ARFF (Attribute-Relation File Format):
@relation heart-disease-simplified
@attribute age numeric
@attribute sex { female, male}
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes}
@attribute class { present, not_present}
@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,?,no,not_present
...
Before the example, let’s look at the ARFF data file format. An ARFF file is a flat file that you can open with any text editor. It contains three sections. The first is the relation section, tagged with @relation, which names the relation that the data file represents. The second is the attribute section, tagged with @attribute, which defines the attributes that each data instance contains. The third is the data section, tagged with @data, which contains all the instances in the file.

WEKA only deals with “flat” files
Numeric attribute: @attribute age numeric
Nominal attribute: @attribute sex { female, male}
Let’s look at some details of the attribute section. Each row declares one attribute: its name, followed by its type. The type can be numeric or nominal. A numeric attribute, such as age or cholesterol, takes numbers as values. A nominal attribute takes one value from a fixed set, and you must list all the possible values in the declaration, such as { female, male} for sex. In the data section, each row is one instance. It contains six values, in the same order as the attributes in the attribute section.
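
To make the three ARFF sections concrete, here is a minimal Python sketch that reads the heart-disease example from the slide. It is only an illustration of the file layout, not WEKA’s own loader: the function name parse_arff is hypothetical, and it assumes a simplified ARFF subset (no quoted names, no string or date attributes, no sparse rows).

```python
SAMPLE = """\
@relation heart-disease-simplified
@attribute age numeric
@attribute sex { female, male}
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes}
@attribute class { present, not_present}
@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,?,no,not_present
"""


def parse_arff(text):
    """Parse a simplified ARFF subset (hypothetical helper, not WEKA's loader)."""
    relation = None
    attributes = []  # (name, "numeric") or (name, [nominal values])
    rows = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blank lines and comments
            continue
        lower = line.lower()
        if in_data:
            # Data section: one comma-separated instance per line.
            values = [v.strip() for v in line.split(",")]
            row = {}
            for (name, kind), value in zip(attributes, values):
                if value == "?":  # "?" marks a missing value
                    row[name] = None
                elif kind == "numeric":
                    row[name] = float(value)
                else:
                    row[name] = value
            rows.append(row)
        elif lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            _, name, spec = line.split(None, 2)
            if spec.startswith("{"):
                # Nominal attribute: the declaration lists every allowed value.
                kind = [v.strip() for v in spec.strip("{}").split(",")]
            else:
                kind = spec.lower()
            attributes.append((name, kind))
        elif lower.startswith("@data"):
            in_data = True
    return relation, attributes, rows


relation, attributes, rows = parse_arff(SAMPLE)
print(relation)
print([name for name, _ in attributes])
print(rows[3])
```

Note how the fourth instance has "?" for cholesterol, which the sketch stores as None.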

University of Waikato 09/30/12
Next, let me show you an example of pre-processing data.

First, open the Explorer and press the Open file button.

Here we use iris, one of the example datasets supplied with WEKA. After loading the data, we can see some information about it, such as the relation name, the number of instances and the number of attributes. The left window lists the name of each attribute. When we select an attribute, detailed information about it is shown in the right window, such as summary statistics and a graph of its values.

Note that the fifth attribute, class, is actually the class label of this data.

When we press the Visualize All button, we get a graph for every attribute. Before the analysis step, we can inspect these graphs to look for regularities.

Then I’ll show how to use a filter to discretize the data.

Press the Choose button for the filter, and choose the Discretize filter.

Then I choose the attribute we want to discretize.

Note that there are some parameters that control the Discretize filter. We can change them as needed.

In this example, we change useEqualFrequency to true and press OK.

After that, we simply press the Apply button.

And we can see the results: all the data are divided into several intervals. That is the basic procedure for data pre-processing.
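
The idea behind equal-frequency discretization can be sketched in a few lines of Python. This is a simplified illustration, not the algorithm that WEKA’s Discretize filter actually runs: the function names and the sample values are hypothetical, cut points are placed halfway between neighbouring sorted values, and ties are not handled specially.

```python
def equal_frequency_cut_points(values, bins):
    """Return cut points so that each bin holds roughly the same number of values."""
    ordered = sorted(values)
    n = len(ordered)
    cuts = []
    for i in range(1, bins):
        idx = i * n // bins
        # Place the boundary halfway between the two neighbouring values.
        cuts.append((ordered[idx - 1] + ordered[idx]) / 2)
    return cuts


def discretize(value, cuts):
    """Map a numeric value to the index of the interval it falls into."""
    for i, cut in enumerate(cuts):
        if value <= cut:
            return i
    return len(cuts)


# Hypothetical skewed data: one large outlier.
values = [1, 2, 3, 4, 5, 6, 7, 8, 100]
cuts = equal_frequency_cut_points(values, bins=3)
labels = [discretize(v, cuts) for v in values]
print(cuts)
print(labels)
```

With equal-width intervals the outlier 100 would push almost every value into the first interval. With equal-frequency intervals each of the three bins receives three values.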

Explorer: building “classifiers”
Classifiers in WEKA are models for predicting nominal or numeric quantities
Implemented learning schemes include: decision trees and lists, instance-based classifiers, support vector machines, multi-layer perceptrons, logistic regression, Bayes’ nets, …
Next, I’ll show an example of data analysis: classifying this data. To complete this task, we first need to build a classifier. WEKA offers many algorithms for building classifiers. In this case, we use decision trees as an example.

Decision Tree Induction: Training Dataset
This follows an example of Quinlan’s ID3 (Playing Tennis)
I guess that you have not learned about decision trees yet, so let me briefly introduce them with an example. Here is our training data. The task is to use the first four attributes to predict whether a person will buy a computer.

Output: A Decision Tree for “buys_computer”
age? <=30: student? (branches: no, yes); 31..40; >40: credit rating? (branches: fair, excellent)
To solve the problem, we build a tree structure from the attributes. Each node is a question, and each branch corresponds to one possible answer. For example, for a new person, we first check his age. If his age is 20, we then check whether he is a student. If he is, we predict that he will buy a computer. This is how a decision tree works: we can use these questions as a classifier.
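
A decision tree like this one is just a set of nested questions, which the following Python sketch spells out. The branch structure comes from the slide. The leaf labels did not survive in the slide text, so they are an assumption here: they follow the version of this example in Han and Kamber’s textbook (listed in the references). The function name and the string encoding of the attribute values are hypothetical.

```python
def buys_computer(age, student, credit_rating):
    """Walk the tree from the root (age?) down to a leaf ("yes" or "no")."""
    if age <= 30:
        # Branch "<=30": ask whether the person is a student.
        return "yes" if student == "yes" else "no"
    if age <= 40:
        # Branch "31..40": assumed to be a "yes" leaf.
        return "yes"
    # Branch ">40": ask about the credit rating.
    return "yes" if credit_rating == "fair" else "no"


# The example from the notes: a 20-year-old student.
print(buys_computer(20, "yes", "fair"))
```

Classifying a new person means answering at most two questions, no matter how large the training data was.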

Next, I’ll show you how to use WEKA to build a decision tree.

Press the Choose button and choose a classifier.

The classifier we will use here is the J48 classifier.

As with the filters, there are several parameters we can set. In this case, we use all the defaults.

In the test options, there are many parameters we can set. In this case, we only set the percentage split to 66%. That means we use 66% of the instances in the dataset as training data and the remaining 34% as test data to evaluate the performance of the classifier. The other options keep their defaults.
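
As a rough illustration of what a percentage split does, here is a small Python sketch. It is an assumption-laden stand-in, not WEKA’s implementation: the function name and the seed are hypothetical, WEKA’s own shuffling and rounding may differ, and the 150 instances stand for the size of the standard iris dataset.

```python
import random


def percentage_split(instances, train_percent, seed=1):
    """Shuffle the instances, then cut them into a training and a test part."""
    shuffled = list(instances)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    train_size = round(len(shuffled) * train_percent / 100)
    return shuffled[:train_size], shuffled[train_size:]


# 150 placeholder instances, numbered 0..149.
train, test = percentage_split(range(150), 66)
print(len(train), len(test))
```

The classifier is built from the training part only, so the test part measures how well it handles instances it has never seen.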

Then we click Start to run the task.

This is the result of building the decision tree and running the classification. The right window shows the detailed result as text.

The first part is the result of building the decision tree.

The second part is the test result. Of the roughly 50 test instances, only two are misclassified, so this tree is a good classifier.

Maybe the text output is not intuitive enough. In WEKA, we can also visualize the decision tree: right-click the entry in the result list and choose Visualize tree.

Now we can see the decision tree constructed by WEKA. This is the basic procedure for analyzing data with WEKA. Today we learned how to use WEKA for data pre-processing and data analysis tasks. After this tutorial, I hope everyone will try WEKA for themselves. If you have any questions, please email me and I’ll reply as soon as possible. Thank you very much!

Explorer: attribute selection
Panel that can be used to investigate which (subsets of) attributes are the most predictive ones
Attribute selection methods contain two parts:
A search method: best-first, forward selection, random, exhaustive, genetic algorithm, ranking
An evaluation method: correlation-based, wrapper, information gain, chi-squared, …
Very flexible: WEKA allows (almost) arbitrary combinations of these two
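
To show how an evaluation method and a search method fit together, the following Python sketch pairs information gain (the evaluator) with simple ranking (the search). It is a textbook-style illustration rather than WEKA’s code: the function names and the four toy instances are hypothetical, and only nominal attributes are handled.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy of the class minus its expected entropy after splitting on attribute."""
    base = entropy([r[target] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[attribute], []).append(r[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


# Hypothetical toy data: "student" decides the class, "credit_rating" does not.
ROWS = [
    {"student": "yes", "credit_rating": "fair", "buys": "yes"},
    {"student": "yes", "credit_rating": "excellent", "buys": "yes"},
    {"student": "no", "credit_rating": "fair", "buys": "no"},
    {"student": "no", "credit_rating": "excellent", "buys": "no"},
]

# "Ranking" search: score every attribute on its own and sort by score.
ranking = sorted(
    ["student", "credit_rating"],
    key=lambda a: information_gain(ROWS, a, "buys"),
    reverse=True,
)
print(ranking)
```

Swapping in a different scoring function or a different search strategy leaves the other half untouched, which is the flexibility the slide refers to.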

Explorer: data visualization
Visualization is very useful in practice: e.g. it helps to determine the difficulty of the learning problem
WEKA can visualize single attributes (1-d) and pairs of attributes (2-d)
To do: rotating 3-d visualizations (Xgobi-style)
Color-coded class values
“Jitter” option to deal with nominal attributes (and to detect “hidden” data points)
“Zoom-in” function

References and Resources
WEKA website: http://www.cs.waikato.ac.nz/~ml/weka/index.html
WEKA Tutorial:
Machine Learning with WEKA: A presentation demonstrating all graphical user interfaces (GUI) in Weka.
A presentation which explains how to use Weka for exploratory data mining.
WEKA Data Mining Book: Ian H. Witten and Eibe Frank, Data Mining: Practical Machine Learning Tools and Techniques (Second Edition)
WEKA Wiki: http://weka.sourceforge.net/wiki/index.php/Main_Page
Others: Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques, 2nd ed.