
Weka Package
Weka is open source data mining software written in Java. It can be applied to your dataset from the GUI, from the command line, or called from your own Java code. It also provides a variety of tools for preprocessing data and for evaluating the results of learning algorithms on any given dataset.
November 27, 2018

Online Documentation
- Software website
- Book: Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations
- User guide for the Explorer GUI
- User guide for the Experimenter GUI
- ARFF file format
- API documentation
- Collections of datasets
- Weka-related projects

Input: ARFF file format
- The data must be converted to ARFF.
- Required declarations: @RELATION, @ATTRIBUTE and @DATA.
- The @RELATION declaration associates a name with the dataset:
  @RELATION <relation-name>
- The @ATTRIBUTE declaration specifies the name and type of an attribute:
  @ATTRIBUTE <attribute-name> <datatype>
- The data type can be numeric, nominal, string or date.
- The @DATA declaration is a single line denoting the start of the data segment.
- Missing values are represented by ?

Data Example
  @relation weather
  @attribute outlook {sunny, overcast, rainy}
  @attribute temperature real
  @attribute humidity real
  @attribute windy {TRUE, FALSE}
  @attribute play {yes, no}
  @data
  sunny,85,85,FALSE,no
  sunny,80,90,TRUE,no
  overcast,83,86,FALSE,yes
  rainy,70,96,FALSE,yes
  …………
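The ARFF declarations are simple enough to read by hand. The following is a minimal Python sketch, not Weka's own loader, that assumes a dense file with unquoted values. The function name parse_arff is hypothetical, and the last data row was added here only to show how a missing value is handled; it is not part of the original example.

```python
def parse_arff(text):
    """Parse a small, dense ARFF document into (relation, attributes, rows).

    Sketch only: quoted values, sparse rows and date formats are not handled.
    """
    relation = None
    attributes = []   # (name, type); type is a string or a list of nominal labels
    rows = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):   # skip blank lines and comments
            continue
        if in_data:
            values = [v.strip() for v in line.split(",")]
            # "?" marks a missing value; store it as None
            rows.append([None if v == "?" else v for v in values])
            continue
        keyword = line.split(None, 1)[0].lower()
        if keyword == "@relation":
            relation = line.split(None, 1)[1].strip()
        elif keyword == "@attribute":
            _, name, spec = line.split(None, 2)
            spec = spec.strip()
            if spec.startswith("{"):            # nominal: {label, label, ...}
                labels = [s.strip() for s in spec.strip("{}").split(",")]
                attributes.append((name, labels))
            else:                               # numeric, real, string, ...
                attributes.append((name, spec.lower()))
        elif keyword == "@data":
            in_data = True
    return relation, attributes, rows


WEATHER = """\
@relation weather
@attribute outlook {sunny, overcast, rainy}
@attribute temperature real
@attribute humidity real
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}
@data
sunny,85,85,FALSE,no
sunny,80,90,TRUE,no
overcast,83,86,FALSE,yes
rainy,70,96,FALSE,yes
% hypothetical row, added only to illustrate a missing value
rainy,?,80,FALSE,yes
"""

relation, attributes, rows = parse_arff(WEATHER)
print(relation, len(attributes), len(rows))
```

All values are kept as strings here; converting numeric columns is left out to keep the sketch short.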

Weka GUI
Start -> Programs -> Data Mining Tools -> Weka-3-2

Weka GUI - Experimenter
A convenient environment for creating, running, modifying and analyzing experiments.

Defining an Experiment
- Initialize an experiment: click on “new”; the system loads the default parameters.
- Define the dataset to be processed in the dataset panel:
  - Select “use relative paths”.
  - Click on “add new”.

Saving the results of the experiment
- Click on the “CSVResultListener” entry in the Destination panel.
- Click on “outputFile” in the window that pops up.
- Type the name of the output file and click Select.
- The file name is then displayed in the outputFile panel.

Saving the experiment definition
- Select “Save…” at the top of the setup window.
- Type the file name for the experiment definition, with the extension “exp”.
- The experiment can be restored by selecting Open in the setup window and then selecting the file in the dialog window.
- Click Start to run the experiment.

Running an experiment
- Click the Run tab at the top of the window.
- The current experiment performs 10 randomized train and test runs on the Iris dataset, using 66% of the patterns for training and 34% for testing, and using the ZeroR scheme.
- The results of the experiment are saved to the file experiment.txt.
- The results are generated in comma-separated value (CSV) form and can be loaded into a spreadsheet for analysis.
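To make the experiment concrete, here is a rough Python sketch of what 10 randomized 66%/34% train and test runs with ZeroR amount to. It is an illustration under stated assumptions, not Weka's implementation: ZeroR is modeled as "predict the most frequent training class", the label list only mimics the class column of Iris (150 instances, three classes of 50), and the names zero_r and run_experiment are hypothetical.

```python
import random
from collections import Counter


def zero_r(train_labels):
    """ZeroR ignores all attributes and predicts the most frequent training class."""
    return Counter(train_labels).most_common(1)[0][0]


def run_experiment(labels, runs=10, train_fraction=0.66, seed=1):
    """Return the percent correct on the test part of each randomized split."""
    results = []
    for run in range(runs):
        rng = random.Random(seed + run)       # a different shuffle for every run
        shuffled = labels[:]
        rng.shuffle(shuffled)
        cut = round(len(shuffled) * train_fraction)
        train, test = shuffled[:cut], shuffled[cut:]
        prediction = zero_r(train)
        correct = sum(1 for label in test if label == prediction)
        results.append(100.0 * correct / len(test))
    return results


# Class column of a dataset shaped like Iris: 150 instances, 50 per class.
labels = ["setosa"] * 50 + ["versicolor"] * 50 + ["virginica"] * 50
percent_correct = run_experiment(labels)
for run, value in enumerate(percent_correct, start=1):
    print(f"run {run}: {value:.2f}% correct")
```

With 150 instances the split leaves 51 instances for testing. Because the three classes are equally frequent, this sketch can never score above one third correct, which is what makes ZeroR a useful baseline for other schemes.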

Running result

Changing the Experiment Parameters
- Click on the ResultGenerator panel.
- Click on the splitEvaluator entry to display the SplitEvaluator properties.
- Click on the classifier entry (ZeroR) to display the scheme properties.
- Click on the drop-down list for the scheme to select a different scheme.

Adding Additional Schemes
- Additional schemes can be added in the Generator properties panel.
- To begin, change the drop-down list entry from Disabled to Enabled in the Generator properties panel.
- Click Select property and expand splitEvaluator so that the classifier entry is visible in the property list; click Select.
- The scheme name is displayed in the Generator properties panel.
- Now when the experiment is run, results are generated for both schemes.

Adding Additional Datasets
- The scheme(s) may be run on any number of datasets at a time.
- Additional datasets are added by clicking “Add new …” in the Datasets panel.
- Datasets are deleted from the experiment by selecting the dataset and then clicking Delete Selected.

Raw Output
- The output generated by a scheme can be saved to a file and examined later.
- Open the Result Producer window by clicking on the Result Generator panel in the Setup window.
- Click on rawOutput and select the True entry from the drop-down list.
- By default, the output is sent to the file splitEvaluatorOut.zip. The output file can be changed by clicking on the outputFile panel in the window.
- Now when the experiment is run, the result of each processing run is archived.

Instances Result Producer
- Results can also be sent to an Instances Result Listener and then analysed by the Weka Experiment Analyser.
- Click on the result listener portion of the Destination panel and then select Instances Result Listener.
- Then select the output dataset. The dataset extension should be “arff”.
- When this experiment is run, results are generated in “arff” format.

Instances Result Producer
  @relation InstanceResultListener
  @attribute Key_Dataset {iris}
  @attribute Key_Run {1,2,3,4,5,6,7,8,9,10}
  @attribute Key_Scheme {weka.classifiers.ZeroR}
  @attribute Key_Scheme_options {''}
  @attribute Key_Scheme_version_ID {6077547173920530258}
  @attribute Date_time numeric
  @attribute Number_of_instances numeric
  @attribute Number_correct numeric
  @attribute Number_incorrect numeric
  @attribute Number_unclassified numeric
  @attribute Percent_correct numeric
  …
  @data
  iris,1,weka.classifiers.ZeroR,'',6077547173920530258,20010205.1546,51,15,36,0,29.411765,
    70.588235,0,0.446239,0.473777,100,100,81.592363,81.592363,0,1.59985,1.59985,0,0,0,0,0,0,
    0,0,1,31,1,20,0,0,0,?
  iris,2,weka.classifiers.ZeroR,'',6077547173920530258,20010205.1546,51,11,40,0,21.568627,
    78.431373,0,0.451365,0.480492,100,100,83.584631,83.584631,0,1.638914,1.638914,0,0,0,0,0,
    0,0,0,1,31,1,20,0,0,0,?
  iris,3,weka.classifiers.ZeroR,'',6077547173920530258,20010205.1546,51,15,36,0,29.411765,
    0,0,1,35,1,16,0,0,0,?

Experiment Analyzer
- Analyzes the results of experiments that were sent to an Instances Result Listener.
- The example experiment uses 3 schemes, ZeroR, OneR and j48.J48, to classify the Iris data in 10 train and test runs, with 66% of the data used for training and 34% used for testing.

Experiment Analyzer
- Run the experiment.
- In the Analyse tab, click Perform test to generate a comparison of the 3 schemes.
- The percentage correct for each of the 3 schemes is shown in each dataset row.
- The annotation “v” or “*” indicates that a specific result is statistically better (v) or worse (*) than the baseline scheme at the significance level specified.
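One way to picture the “v” and “*” annotations is a paired test on the per-run results of two schemes. The Python sketch below assumes a plain paired t statistic and a two-tailed 0.05 significance level with 10 runs (critical value 2.262 for 9 degrees of freedom). The exact test the Experimenter applies may differ, the function names are hypothetical, and the percent-correct values are invented for illustration.

```python
import math


def paired_t(baseline, other):
    """t statistic of the per-run differences (other minus baseline)."""
    diffs = [o - b for b, o in zip(baseline, other)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)    # sample variance
    if var == 0:
        # All differences identical: no spread to compare against.
        return math.inf if mean > 0 else -math.inf if mean < 0 else 0.0
    return mean / math.sqrt(var / n)


def annotate(baseline, other, critical=2.262):
    """Return "v" if other is significantly better, "*" if worse, "" otherwise.

    critical=2.262 is the two-tailed 0.05 value for 9 degrees of freedom,
    so it only fits experiments with 10 paired runs.
    """
    t = paired_t(baseline, other)
    if t > critical:
        return "v"
    if t < -critical:
        return "*"
    return ""


# Invented percent-correct values for 10 runs of a baseline and another scheme.
baseline_runs = [29.4, 21.6, 29.4, 27.5, 25.5, 31.4, 23.5, 27.5, 29.4, 25.5]
other_runs = [94.1, 92.2, 96.1, 94.1, 90.2, 96.1, 92.2, 94.1, 98.0, 92.2]

print(repr(annotate(baseline_runs, other_runs)))
```

Swapping the two arguments flips the annotation, which mirrors what happens when a different scheme is chosen as the baseline.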

Changing the Baseline Scheme

Summary Test

Ranking Test

Cross-Validation Result Producer
Change from random train and test experiments to cross-validation experiments.
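The difference from random train and test runs is that cross-validation uses every instance for testing exactly once. A minimal Python sketch of the fold bookkeeping follows, assuming 10 folds and no stratification by class; the helper name k_fold_indices is hypothetical and this is not Weka's own code.

```python
import random


def k_fold_indices(n_instances, k=10, seed=1):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]      # k disjoint, near-equal folds
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# 150 instances, as in the Iris dataset used by the experiment.
splits = list(k_fold_indices(150, k=10))
for fold, (train, test) in enumerate(splits, start=1):
    print(f"fold {fold}: {len(train)} training, {len(test)} test instances")
```

Each fold trains on nine tenths of the data and tests on the remaining tenth, so the per-fold results together cover the whole dataset.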

Explorer GUI
- Explorer GUI: apply different preparation, transformation and modeling algorithms to a dataset.
- Experimenter GUI: run different algorithms in batch and compare the results.
- Explorer tabs: Preprocess, Classify, Cluster, Associate, Select attributes, Visualize.

Preprocessing
- Opening files
- Base relation and working relation
  - Base relation: the originally loaded version of the data; it stays unchanged while actions are performed.
  - Working relation: a copy of the base relation; it changes whenever a filter is applied to the data.
- Working with attributes
  - Attributes in base relation
  - Attributes info for base relation

Working with Filters
- The Preprocess section allows filters to be defined.
- Filters transform the data in various ways.
  - E.g. DiscretizeFilter: discretizes a range of numeric attributes in the dataset into nominal attributes.
  - E.g. NominalToBinaryFilter: transforms nominal data into an (n-1)-bit binary encoding (n is the number of categories).
- Multiple filters can be applied to the data.
- Transformation results are saved in the working relation.
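The two filter examples can be illustrated with a small Python sketch. It assumes equal-width binning as the discretization strategy, which is only one simple option and not necessarily what DiscretizeFilter does by default, and it encodes a nominal attribute with n-1 indicator bits by mapping the first category to all zeros. Both function names are hypothetical.

```python
def discretize_equal_width(values, bins=3):
    """Map each number to a bin index 0..bins-1 over the range [min, max]."""
    low, high = min(values), max(values)
    width = (high - low) / bins
    if width == 0:                        # all values equal: a single bin
        return [0] * len(values)
    # The maximum value would land in bin `bins`, so clamp it to the last bin.
    return [min(int((v - low) / width), bins - 1) for v in values]


def nominal_to_binary(values, categories):
    """Encode each value with len(categories) - 1 bits.

    The first category is the reference and becomes all zeros; every other
    category sets exactly one bit.
    """
    return [[1 if v == c else 0 for c in categories[1:]] for v in values]


# Columns taken from the weather data example.
humidity = [85, 90, 86, 96]
outlook = ["sunny", "sunny", "overcast", "rainy"]

humidity_bins = discretize_equal_width(humidity, bins=3)
outlook_bits = nominal_to_binary(outlook, ["sunny", "overcast", "rainy"])
print(humidity_bins)   # [0, 1, 0, 2]
print(outlook_bits)    # [[0, 0], [0, 0], [1, 0], [0, 1]]
```

Applying both functions in sequence to different columns corresponds to the idea of stacking multiple filters on the working relation.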

Classification
- Select a classification algorithm.
- Choose a test option: Use training set, Supplied test set, Cross-validation, Percentage split.
- Click Start to run the classification.

Classification
The classifier output text:
- Run information: gives the relation name, instances, attributes and test mode.
- Classifier model: a textual representation of the model.
- Summary: a list of statistics summarizing the accuracy.
- Detailed accuracy by class: a more detailed per-class breakdown of the prediction accuracy.
- Confusion matrix: shows how many instances have been assigned to each class.
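The summary, per-class and confusion matrix sections are all derived from the same pairs of actual and predicted classes. The Python sketch below shows that relationship under simple assumptions: rows are actual classes, columns are predicted classes, and per-class detail is limited to precision and recall. The function names and the six predictions are invented for illustration.

```python
def confusion_matrix(actual, predicted, classes):
    """Count instances per (actual class, predicted class) pair."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1    # row: actual, column: predicted
    return matrix


def accuracy(matrix):
    """Fraction of instances on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def per_class_stats(matrix, classes):
    """Return {class: (precision, recall)} from a confusion matrix."""
    stats = {}
    for i, c in enumerate(classes):
        true_positives = matrix[i][i]
        predicted_as_c = sum(row[i] for row in matrix)   # column total
        actually_c = sum(matrix[i])                      # row total
        precision = true_positives / predicted_as_c if predicted_as_c else 0.0
        recall = true_positives / actually_c if actually_c else 0.0
        stats[c] = (precision, recall)
    return stats


# Invented outcomes for six test instances of the "play" attribute.
classes = ["yes", "no"]
actual = ["yes", "yes", "yes", "no", "no", "yes"]
predicted = ["yes", "no", "yes", "no", "yes", "yes"]

matrix = confusion_matrix(actual, predicted, classes)
stats = per_class_stats(matrix, classes)
print(matrix)             # [[3, 1], [1, 1]]
print(accuracy(matrix))
print(stats)
```

Reading across a row shows how the instances of one actual class were distributed over the predicted classes, which is the information the confusion matrix section conveys.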

Clustering
- Cluster modes
- Ignoring attributes
- Learning clusters

Associating
- Setting up
- Learning association

Selecting attributes
- Searching and evaluating
- Options
- Performing selection

Visualizing
- Changing the view
- Selecting instances

Visualizing instances

Visualizing output

This tutorial combines the following sources:
- User guide for the Explorer GUI
- Tutorial for the Experimenter GUI
- Machine learning algorithms for Java
- Short tutorial for Weka (1)
- Short tutorial for Weka (2)
- Software for the data mining course

Thank you!