Using Weka

Presentation transcript:

Slide 1: Using Weka

Slide 2: Data Mining Using Weka
- What's data mining?
  - We are overwhelmed with data
  - Data mining is about going from data to information, information that can give you useful predictions
- Examples?
  - You're at the supermarket checkout.
  - You're happy with your bargains … and … the supermarket is happy you've bought some more stuff
  - Say you want a child, but you and your partner can't have one. Can data mining help?
- Data mining vs. machine learning

Slide 3: Data Mining Using Weka
- What's Weka?
  - A bird found only in New Zealand?
- Data mining workbench
  - Waikato Environment for Knowledge Analysis
- Machine learning algorithms for data mining tasks
  - 100+ algorithms for classification
  - 75 for data preprocessing
  - 25 to assist with feature selection
  - 20 for clustering, finding association rules, etc.

Slide 4: Data Mining Using Weka
- What will you learn?
  - Load data into Weka and look at it
  - Use filters to preprocess it
  - Explore it using interactive visualization
  - Apply classification algorithms
  - Interpret the output
  - Understand evaluation methods and their implications
  - Understand various representations for models
  - Explain how popular machine learning algorithms work

Slide 5: Data Mining Using Weka
- What will you learn? (cont.)
  - Be aware of common pitfalls with data mining
  - Use Weka on your own data … and understand what you are doing!

Slide 6: Data Mining Using Weka
- Getting started with Weka
  - Install Weka
  - Explore the "Explorer" interface
  - Explore some datasets
  - Build a classifier
  - Interpret the output
  - Use filters
  - Visualize your data set

Slide 7: Data Mining Using Weka
- Install Weka
  - Download links available on Course Page
- Platform:
  - Windows x86
  - Windows x64
  - Mac OS X
- Version:
  - the latest stable version of Weka
  - datasets for the course

Slide 8: Data Mining Using Weka
- Exercise
  - Install Weka
  - Get datasets along with the installation
  - Load the Weka program
  - Open Explorer
  - Open a dataset (weather.nominal.arff)
  - Look at attributes
  - Edit the dataset
  - Save it if you need to make changes to the dataset

Slide 9: Exploring the Explorer
[Figure labels: command-line interface, graphical interface, performance comparisons]

Slide 10: Exploring the Explorer

Slide 11: Exploring the Explorer
- Open a dataset (weather.nominal.arff)

[Figure labels: the columns are attributes, the rows are instances]

Outlook   Temp  Humidity  Windy  Play
Sunny     Hot   High      False  No
Sunny     Hot   High      True   No
Overcast  Hot   High      False  Yes
Rainy     Mild  High      False  Yes
Rainy     Cool  Normal    False  Yes
Rainy     Cool  Normal    True   No
Overcast  Cool  Normal    True   Yes
Sunny     Mild  High      False  No
Sunny     Cool  Normal    False  Yes
Rainy     Mild  Normal    False  Yes
Sunny     Mild  Normal    True   Yes
Overcast  Mild  High      True   Yes
Overcast  Hot   Normal    False  Yes
Rainy     Mild  High      True   No

Slide 12: Exploring the Explorer
[Figure label: open file weather.nominal.arff]

Slide 13: Exploring the Explorer
[Figure labels: attributes, attribute values]

Slide 14: Exploring the Explorer

[Figure labels: the columns are attributes, the rows are instances]

Outlook   Temp  Humidity  Windy  Play
Sunny     Hot   High      False  No
Sunny     Hot   High      True   No
Overcast  Hot   High      False  Yes
Rainy     Mild  High      False  Yes
Rainy     Cool  Normal    False  Yes
Rainy     Cool  Normal    True   No
Overcast  Cool  Normal    True   Yes
Sunny     Mild  High      False  No
Sunny     Cool  Normal    False  Yes
Rainy     Mild  Normal    False  Yes
Sunny     Mild  Normal    True   Yes
Overcast  Mild  High      True   Yes
Overcast  Hot   Normal    False  Yes
Rainy     Mild  High      True   No

Slide 15: Exploring the Explorer
[Figure labels: attributes, class, attribute values; open file weather.nominal.arff]

Slide 16: Exploring the Explorer
- Classification (sometimes called "supervised learning")
  - Dataset: classified examples
  - "Model" that classifies new examples
- Classified example = instance + class
  - instance: fixed set of features (attribute 1, attribute 2, …, attribute n)
  - each attribute is discrete ("nominal") or continuous ("numeric")
- Class
  - discrete: "classification" problem
  - continuous: "regression" problem

Slide 17: Exploring the Explorer
[Figure labels: attributes, class, attribute values; open file weather.numeric.arff]

Slide 18: Exploring the Explorer
[Figure label: open file glass.arff]

Slide 19: Exploring the Explorer
- Exercise on the classification problem
  - Datasets: weather.nominal, weather.numeric
  - Nominal vs numeric attributes
  - ARFF file format
  - Checking attributes

Slide 20: Exploring the Explorer
- File format
  - ARFF file format
    - Native in Weka
    - More information
  - CSV file format
    - Compatible with Excel and Weka

Slide 21: Exploring the Explorer
- Exercise on file preparation
  - Prepare ARFF file
    - Specialized format
    - Need to follow ARFF syntax
  - CSV file format
    - Comma-separated format
    - Notepad compatible
    - Excel compatible
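To make the ARFF syntax concrete, here is a minimal Python sketch. It is not part of the original slides and is not Weka's own loader. The embedded text is a shortened, illustrative rendering of the weather data with only three instances; the relation name and the lowercase spelling of the values are assumptions. `parse_arff` is a hypothetical helper that only understands `@relation`, `@attribute` and `@data` lines with unquoted, comma-separated values.

```python
WEATHER_ARFF = """\
% Comment lines start with a percent sign
@relation weather

@attribute outlook {sunny, overcast, rainy}
@attribute temperature {hot, mild, cool}
@attribute humidity {high, normal}
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}

@data
sunny,hot,high,FALSE,no
overcast,hot,high,FALSE,yes
rainy,mild,high,FALSE,yes
"""


def parse_arff(text):
    """Parse a simple ARFF document into (relation, attributes, rows).

    attributes is a list of (name, spec) pairs, where spec is a list of
    nominal values or a type keyword string such as "numeric".
    """
    relation, attributes, rows = None, [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comments
        lower = line.lower()
        if in_data:
            rows.append([value.strip() for value in line.split(",")])
        elif lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            _, name, spec = line.split(None, 2)
            if spec.startswith("{"):
                # nominal attribute: allowed values are listed in braces
                spec = [v.strip() for v in spec.strip("{}").split(",")]
            attributes.append((name, spec))
        elif lower.startswith("@data"):
            in_data = True  # everything after this line is instance data
    return relation, attributes, rows


relation, attributes, rows = parse_arff(WEATHER_ARFF)
print(relation)
for name, spec in attributes:
    print(name, spec)
print(len(rows), "instances")
```

The header declares each attribute and its type, and the `@data` section then lists one instance per line, which is why an ARFF file has to follow the syntax exactly while a CSV file does not.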

Slide 22: Exploring the Explorer
- Exercise on file preparation (cont.)
  - ARFF → CSV
    - Easy
    - In Weka Explorer, load the dataset, then use the Save… feature and change the file format to CSV data files
  - CSV → ARFF
    - Easy
    - In Weka Explorer, use the Open File… feature and change the file format to CSV data files
    - Next, use the Save… feature and change the file format to Arff data files
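The conversions above are done through the Explorer's Open File… and Save… dialogs. As a rough illustration of what such a conversion amounts to, the following Python sketch converts in both directions outside Weka. It is a simplification under stated assumptions: `csv_to_arff` and `arff_to_csv` are hypothetical helpers, every column is treated as nominal, and values contain no commas or quotes. Weka's own converters handle more cases, including numeric columns.

```python
import csv
import io

CSV_TEXT = """\
outlook,temperature,humidity,windy,play
sunny,hot,high,FALSE,no
overcast,hot,high,FALSE,yes
rainy,mild,high,FALSE,yes
"""


def csv_to_arff(csv_text, relation):
    """Convert CSV text with a header row to ARFF text (all attributes nominal)."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    rows = [row for row in reader if row]
    lines = [f"@relation {relation}", ""]
    for index, name in enumerate(header):
        # distinct values of this column, in order of first appearance
        values = list(dict.fromkeys(row[index] for row in rows))
        lines.append(f"@attribute {name} {{{','.join(values)}}}")
    lines += ["", "@data"]
    lines += [",".join(row) for row in rows]
    return "\n".join(lines) + "\n"


def arff_to_csv(arff_text):
    """Convert simple ARFF text to CSV text; attribute names become the header."""
    names, data_rows, in_data = [], [], False
    for raw in arff_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comments
        if in_data:
            data_rows.append(line.split(","))
        elif line.lower().startswith("@attribute"):
            names.append(line.split(None, 2)[1])
        elif line.lower().startswith("@data"):
            in_data = True
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(names)
    writer.writerows(data_rows)
    return out.getvalue()


arff_text = csv_to_arff(CSV_TEXT, "weather")
print(arff_text)
print(arff_to_csv(arff_text))
```

Going from CSV to ARFF requires inventing the header (attribute names and types), while going from ARFF to CSV only discards it.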

Slide 23: Building a classifier
- Use J48 to analyze the glass dataset
  - Open file glass.arff
  - Check the available classifiers
  - Choose the J48 decision tree learner (trees>J48)
  - Run it
  - Examine the output
  - Look at the correctly classified instances … and the confusion matrix
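The two numbers highlighted in the last step, correctly classified instances and the confusion matrix, can be sketched in a few lines of Python. The labels and predictions below are made-up illustration data, not J48 output on the glass dataset, and the helper names are hypothetical. Rows of the matrix are actual classes and columns are predicted classes.

```python
from collections import Counter


def confusion_matrix(actual, predicted, labels):
    """Return a matrix where entry [i][j] counts actual labels[i] predicted as labels[j]."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]


def accuracy(actual, predicted):
    """Fraction of instances whose predicted class equals the actual class."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


# Made-up example: six instances of a two-class problem
labels = ["yes", "no"]
actual = ["yes", "yes", "no", "no", "yes", "no"]
predicted = ["yes", "no", "no", "yes", "yes", "no"]

matrix = confusion_matrix(actual, predicted, labels)
for label, row in zip(labels, matrix):
    print(label, row)
print("correctly classified:", accuracy(actual, predicted))
```

The diagonal of the matrix holds the correctly classified instances, so the accuracy is the diagonal sum divided by the total number of instances.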

Slide 24: Building a classifier
- Investigate J48
  - Open the configuration panel
  - Check the More information
  - Examine the options
  - Use an unpruned tree
  - Look at leaf sizes
  - Set minNumObj to 15 to avoid small leaves
  - Visualize tree using right-click menu

Slide 25: Building a classifier
- From C4.5 to J48
  - ID3 (1979)
  - C4.5 (1993)
  - C4.8 (1996)
  - C5.0 (commercial)
  - J48

Slide 26: Building a classifier
- Investigate J48
  - Classifiers in Weka
  - Classifying the glass dataset
  - Interpreting J48 output
  - J48 configuration panel
  - … option: pruned vs unpruned trees
  - … option: avoid small leaves

Slide 27: Using a filter
- Use a filter to remove an attribute (3rd attribute)
  - Open weather.nominal.arff
  - Check the filters
    - supervised vs unsupervised
    - attribute vs instance
  - Choose the unsupervised attribute filter Remove
  - Check the More information; look at the options
  - Set attributeIndices to 3 and click OK (to remove the 3rd attribute)
  - Apply the filter
  - Save the result or press Undo to skip the change
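The effect of the Remove filter with attributeIndices set to 3 can be mimicked on a plain table. This Python sketch is only an illustration of the result, not Weka's implementation; `remove_attributes` is a hypothetical helper, and it takes 1-based positions to match the way the slide counts attributes.

```python
HEADER = ["Outlook", "Temp", "Humidity", "Windy", "Play"]
ROWS = [
    ["Sunny", "Hot", "High", "False", "No"],
    ["Sunny", "Hot", "High", "True", "No"],
    ["Overcast", "Hot", "High", "False", "Yes"],
]


def remove_attributes(header, rows, indices):
    """Drop the attributes at the given 1-based positions from header and rows."""
    drop = {i - 1 for i in indices}  # convert to 0-based column numbers
    keep = [i for i in range(len(header)) if i not in drop]
    new_header = [header[i] for i in keep]
    new_rows = [[row[i] for i in keep] for row in rows]
    return new_header, new_rows


new_header, new_rows = remove_attributes(HEADER, ROWS, [3])
print(new_header)
for row in new_rows:
    print(row)
```

The original table is left untouched and a new one is returned, which loosely corresponds to being able to press Undo in the Explorer.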

Slide 28: Using a filter
- Use Remove button to remove attributes
  - Open weather.nominal.arff
  - Use check boxes and Remove button

Slide 29: Using a filter
- Remove instances where humidity is high
  - Open weather.nominal.arff
  - Supervised or unsupervised?
  - Attribute or instance?
  - Look at them
  - Select RemoveWithValues
  - Set attributeIndex to 3 (3rd attribute)
  - Set nominalIndices to 1 (first value: high)
  - Apply
  - Undo
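What this instance filter does to the weather data can be sketched in Python. This is an illustration of the outcome only, not Weka's code: `remove_with_values` is a hypothetical helper, both indices are 1-based as on the slide, and the declared value order for Humidity (high first) is taken from the slide's remark "first value: high".

```python
HEADER = ["Outlook", "Temp", "Humidity", "Windy", "Play"]
ROWS = [line.split() for line in """\
Sunny Hot High False No
Sunny Hot High True No
Overcast Hot High False Yes
Rainy Mild High False Yes
Rainy Cool Normal False Yes
Rainy Cool Normal True No
Overcast Cool Normal True Yes
Sunny Mild High False No
Sunny Cool Normal False Yes
Rainy Mild Normal False Yes
Sunny Mild Normal True Yes
Overcast Mild High True Yes
Overcast Hot Normal False Yes
Rainy Mild High True No
""".splitlines()]

# Declared order of the nominal values of Humidity (assumed from the slide)
HUMIDITY_VALUES = ["High", "Normal"]


def remove_with_values(rows, attribute_index, declared_values, nominal_indices):
    """Remove rows whose value in the given attribute is one of the selected values.

    attribute_index and nominal_indices are 1-based.
    """
    column = attribute_index - 1
    unwanted = {declared_values[i - 1] for i in nominal_indices}
    return [row for row in rows if row[column] not in unwanted]


remaining = remove_with_values(ROWS, 3, HUMIDITY_VALUES, [1])
print(len(ROWS), "->", len(remaining))
for row in remaining:
    print(row)
```

Seven of the fourteen instances have high humidity, so seven remain after the filter is applied.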

Slide 30: Using a filter
- Fewer attributes, better classification!
  - Open glass.arff
  - Run J48 (trees>J48)
  - Remove Fe
  - Remove all attributes except RI and Mg
  - Look at the decision trees
  - Use right-click menu to visualize decision trees

Slide 31: Using a filter
- Summary
  - Filters in Weka
  - Supervised vs unsupervised, attribute vs instance
  - To find the right one, you need to look
  - Filters can be very powerful
  - Smartly removing attributes can
    - improve performance
    - increase comprehensibility

Slide 32: Visualizing your data
- Using the Visualize panel
  - Open iris.arff
  - Bring up Visualize panel
  - Click one of the plots; examine some instances
  - Set x axis to petalwidth and y axis to petallength
  - Click on Class color to change the color
  - Bars on the right correspond to attributes: click for x axis; right-click for y axis
  - Jitter slider (to see the overlapped instances)
  - Show Select Instance: Rectangle option
  - Submit, Reset, Clear and Save
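The idea behind the Jitter slider can be shown with a small Python sketch: instances with identical coordinates are drawn on top of each other, and adding a little random noise to the plotted positions separates them. This is a generic illustration, not how Weka computes its jitter; the `jitter` helper, the noise amount and the sample points (loosely styled as petalwidth, petallength pairs) are all made up.

```python
import random


def jitter(points, amount, seed=0):
    """Return a copy of points with uniform noise in [-amount, amount] added to x and y."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return [
        (x + rng.uniform(-amount, amount), y + rng.uniform(-amount, amount))
        for x, y in points
    ]


# Three of the four points coincide, so a scatter plot would show only two dots
points = [(0.2, 1.4), (0.2, 1.4), (0.2, 1.4), (1.3, 4.5)]
jittered = jitter(points, amount=0.05)

print("distinct positions before:", len(set(points)))
print("distinct positions after: ", len(set(jittered)))
```

Jitter only changes where points are drawn; the underlying data values stay the same.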

Slide 33: Visualizing your data
- Visualizing classification errors
  - Open iris.arff
  - Run J48 (trees>J48)
  - Visualize classifier errors (from Results list)
  - Plot predictedclass against class
  - Identify errors shown by confusion matrix

Slide 34: Visualizing your data
- Summary
  - Get down and dirty with your data
  - Visualize it
  - Clean it up by deleting outliers
  - Look at classification errors
    - (there's a filter that allows you to add classifications as a new attribute)