An Introduction to WEKA

An Introduction to WEKA
SEEM 4630 Hello everyone! My name is Rong Yu, I’m the TA of this course. Today is our first tutorial. . So if there is any thing you can’t follow in this tutorial, please let me know thank you! Today I’ll introduce a powerful tools in the data mining and machine learning area As you see, It’s name is WEKA.

Content What is WEKA? The Explorer: References and Resources
Preprocess data Classification Clustering Association Rules Attribute Selection Data Visualization References and Resources 2 10/04/16

What is WEKA? Waikato Environment for Knowledge Analysis
It’s a data mining/machine learning tool developed by Department of Computer Science, University of Waikato, New Zealand. Weka is also a bird found only on the islands of New Zealand. So what is WEKA? WEKA is a data mining/machine learning tool developed by xxxx. And also it’s a name of bird found only on the New Zealand. 3 10/04/16

Download and Install WEKA
Website: Support multiple platforms (written in java): Windows, Mac OS X and Linux WEKA has been installed in the teaching labs in SEEM This page is some information about downloading and install WKEA. I hope everyone can go to download and install it. Also, WEKA is provided in the teaching labs. 4 10/04/16

Main Features 49 data preprocessing tools
76 classification/regression algorithms 8 clustering algorithms 3 algorithms for finding association rules 15 attribute/subset evaluators + 10 search algorithms for feature selection So, why WEKA is a powerful tools? It’s provide many basic methods and algorithms that frequently used in data analyze area. 5 10/04/16

Main GUI Three graphical user interfaces
“The Explorer” (exploratory data analysis) “The Experimenter” (experimental environment) “The KnowledgeFlow” (new process model inspired interface) This is main GUI of WEKA, We can see, It contains three part. The explorer, The Experimenter & The KonwledgeFlow. Today I mainly introduce the Explorer, which is for the data analysis. 6 10/04/16

Content What is WEKA? The Explorer: References and Resources
Preprocess data Classification Clustering Association Rules Attribute Selection Data Visualization References and Resources In explorer, we can do almost all the things about the basic data analyze task. Such as xxxxxx 7 10/04/16

Explorer: pre-processing the data
Data can be imported from a file in various formats: ARFF, CSV, C4.5, binary Data can also be read from a URL or from an SQL database (using JDBC) Pre-processing tools in WEKA are called “filters” WEKA contains filters for: Discretization, normalization, resampling, attribute selection, transforming and combining attributes, … So How to user weka? I’ll show you a simple example to do the data analysis. Before that, I’ll show the example of pre-processing data. WEKA can accept many data file formats: such as xxxx, Also it can read xxxxx. For these input data, we always need some pre-processing step to meet our analyze requirement. xxxxxx 8 10/04/16

WEKA only deals with “flat” files
@relation heart-disease-simplified @attribute age numeric @attribute sex { female, male} @attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina} @attribute cholesterol numeric @attribute exercise_induced_angina { no, yes} @attribute class { present, not_present} @data 63,male,typ_angina,233,no,not_present 67,male,asympt,286,yes,present 67,male,asympt,229,yes,present 38,female,non_anginal,?,no,not_present ... Flat file in ARFF (Attribute-Relation File Format) format Before the example, let’s learn the data file formats--arff. arff is a flat files, you can use any texteditor open it. it contains three sections. the first section is relation section, tagged it indicates what the relation this data file represents. the second section is attribute section, tagged it describe the attributes the data contains. it define the attributes contained in the data instance the third section is data section. It contains all the instance of this data file. 9 10/04/16

WEKA only deals with “flat” files
@relation heart-disease-simplified @attribute age numeric @attribute sex { female, male} @attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina} @attribute cholesterol numeric @attribute exercise_induced_angina { no, yes} @attribute class { present, not_present} @data 63,male,typ_angina,233,no,not_present 67,male,asympt,286,yes,present 67,male,asympt,229,yes,present 38,female,non_anginal,?,no,not_present ... numeric attribute nominal attribute let’s see some details in the attributes section, each row is a attribute properties.it include attributes name and the type. such as xxx and after name is the attributes types. it can be numeric and nominal, and if a attribute is nominal, numeric attribute is indicate that the value of attribute is numbers. and for the nominal attribute,the attribute value is one of value of the value set. you should list all the possible values here. such as the xxxx and in the data section, each row is a instance of data, it contains 6 value, the order of attributes is the same as the attribute section. 10 10/04/16

University of Waikato 11 09/30/12
next let me show you a example of pre-processing data. 11 University of Waikato 09/30/12

first open xxxx and press open file button 12 University of Waikato 09/30/12

here we use the example data given by weka iris after loading the data, we can see some information about the data such as the relation name,number of instance and number of attributes, in the left window, there is the name of each attributes, when we choose one attribute, some detailed information of this attributes will shown in the right window. such as some statistic features and the visualize graph. 13 University of Waikato 09/30/12

note that the fifth attributes-- calss 14 University of Waikato 09/30/12

actually it is the class label of this data. 15 University of Waikato 09/30/12

when press visualize all button,
16 University of Waikato 09/30/12

we can get all the visualize graph of each data attribute. before we do the analyze step, we can observe these graph to find some regulations. 17 University of Waikato 09/30/12

Then I’ll show how to use filter to discretize the data. 18 University of Waikato 09/30/12

press choose filter 19 University of Waikato 09/30/12

and I choose 20 University of Waikato 09/30/12

and I choose the discretize filter.

Then I choose the attribute we want to discretize. 23 University of Waikato 09/30/12

not 24 University of Waikato 09/30/12

note there are some parameters can control the discretize filter, we can change it as we need. 25 University of Waikato 09/30/12

in this example we change useEqualFrequency to true. and press ok 26 University of Waikato 09/30/12

after that , we simply press apply button. 30 University of Waikato 09/30/12

and we can see the results. all the data are divided into serval intervals. that is the basic step about the data pre-processing 31 University of Waikato 09/30/12

Explorer: building “classifiers”
Classifiers in WEKA are models for predicting nominal or numeric quantities Implemented learning schemes include: Decision trees and lists, instance-based classifiers, support vector machines, multi-layer perceptrons, logistic regression, Bayes’ nets, … Next, I’ll show a example about the data analyze, do a classification on this data. to complete this task, we should build the classifiers at first. In weka, there are many algorithm to build the classifiers. such as in this case, We use the decision trees as an example. 32 10/04/16

Decision Tree Induction: Training Dataset
This follows an example of Quinlan’s ID3 (Playing Tennis) I guess that you have not learn the decision trees. First, Let me briefly introduce the decision tree by an example: Here is our train data, the task is to use the first four attribute to predict whether the people will buy a computer. 33 10/04/16

Output: A Decision Tree for “buys_computer”
age? overcast student? credit rating? <=30 >40 no yes 31..40 fair excellent To solve the problem, we build a the tree structure from the attribute. each node is a question and each brunch is the operation of different answers. For example, for a new people, first we check his age, if his age is 20, the we check wether he is student or not, if yes ,then we can predict that he will buy the computer. This how is the decision tree works. we can use this questions as a classifier 34 10/04/16

Next I’ll show you how to use weka to build a decision tree. 36 University of Waikato 09/30/12

Press choose button and choose a classifier 37 University of Waikato 09/30/12

The classifier we will use here is J48 claissifer 38 University of Waikato 09/30/12

Like the filters, there are also several parameters we can set, In this case we use all default. 41 University of Waikato 09/30/12

In the test options,there are many parameters we can set. in this case ,we only modify the percentage spilt as 66%. that’s means we use 66% of instances in dataset as training data and 33% as test data to evaluate the performance of this classifier . The other options remain default. 44 University of Waikato 09/30/12

Then we click start to run this task. 50 University of Waikato 09/30/12

And this is the result of building decision tree and classification. The right window show the detailed result as text 51 University of Waikato 09/30/12

First part is the result of building the decision tree. 52 University of Waikato 09/30/12

The second part is test result. We can see that for the 50 test instances, only two test case is wrong. So this tree is a good classifier. 53 University of Waikato 09/30/12

Maybe the previous result is not direct view enough. In weka ,we can also visualize the decision tree. right click the result list. 54 University of Waikato 09/30/12

choose the visualize tree.

and we can see the decision tree constructed by the weka. This is basic procedures to do a data analyze using weka. Today we learn how to use weka to do the data pre-processing and data analyze tasks. After this tutorial I hope everyone can try the weka by yourself. If you have any questions, please me and I’ll reply you as soon as possible.Thank you very much! 56 University of Waikato 09/30/12

Explorer: attribute selection
Panel that can be used to investigate which (subsets of) attributes are the most predictive ones Attribute selection methods contain two parts: A search method: best-first, forward selection, random, exhaustive, genetic algorithm, ranking An evaluation method: correlation-based, wrapper, information gain, chi-squared, … Very flexible: WEKA allows (almost) arbitrary combinations of these two 70 10/04/16

Explorer: data visualization
Visualization very useful in practice: e.g. helps to determine difficulty of the learning problem WEKA can visualize single attributes (1-d) and pairs of attributes (2-d) To do: rotating 3-d visualizations (Xgobi-style) Color-coded class values “Jitter” option to deal with nominal attributes (and to detect “hidden” data points) “Zoom-in” function 79 10/04/16

References and Resources
WEKA website: WEKA Tutorial: Machine Learning with WEKA: A presentation demonstrating all graphical user interfaces (GUI) in Weka. A presentation which explains how to use Weka for exploratory data mining. WEKA Data Mining Book: Ian H. Witten and Eibe Frank, Data Mining: Practical Machine Learning Tools and Techniques (Second Edition) WEKA Wiki: Others: Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques, 2nd ed.

An Introduction to WEKA

Similar presentations

Presentation on theme: "An Introduction to WEKA"— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

An Introduction to WEKA

Similar presentations

Presentation on theme: "An Introduction to WEKA"— Presentation transcript:

Similar presentations

About project

Feedback