An Exercise in Machine Learning

Slides:



Advertisements
Similar presentations
Florida International University COP 4770 Introduction of Weka.
Advertisements

Weka & Rapid Miner Tutorial By Chibuike Muoh. WEKA:: Introduction A collection of open source ML algorithms – pre-processing – classifiers – clustering.
Machine Learning in Practice Lecture 3 Carolyn Penstein Rosé Language Technologies Institute/ Human-Computer Interaction Institute.
WEKA (sumber: Machine Learning with WEKA). What is WEKA? Weka is a collection of machine learning algorithms for data mining tasks. Weka contains.
Decision Tree Rong Jin. Determine Milage Per Gallon.
WEKA Evaluation of WEKA Waikato Environment for Knowledge Analysis Presented By: Manoj Wartikar & Sameer Sagade.
Department of Computer Science, University of Waikato, New Zealand Eibe Frank WEKA: A Machine Learning Toolkit The Explorer Classification and Regression.
March 25, 2004Columbia University1 Machine Learning with Weka Lokesh S. Shrestha.
An Extended Introduction to WEKA. Data Mining Process.
Data Mining: A Closer Look Chapter Data Mining Strategies (p35) Moh!
Introduction to WEKA Aaron 2/13/2009. Contents Introduction to weka Download and install weka Basic use of weka Weka API Survey.
1 Statistical Learning Introduction to Weka Michel Galley Artificial Intelligence class November 2, 2006.
Evaluation of Results (classifiers, and beyond) Biplav Srivastava Sources: [Witten&Frank00] Witten, I.H. and Frank, E. Data Mining - Practical Machine.
1 How to use Weka How to use Weka. 2 WEKA: the software Waikato Environment for Knowledge Analysis Collection of state-of-the-art machine learning algorithms.
CSCI 347 / CS 4206: Data Mining Module 06: Evaluation Topic 07: Cost-Sensitive Measures.
M. Sulaiman Khan Dept. of Computer Science University of Liverpool 2009 COMP527: Data Mining Classification: Evaluation February 23,
1 © Goharian & Grossman 2003 Introduction to Data Mining (CS 422) Fall 2010.
 The Weka The Weka is an well known bird of New Zealand..  W(aikato) E(nvironment) for K(nowlegde) A(nalysis)  Developed by the University of Waikato.
Comparing the Parallel Automatic Composition of Inductive Applications with Stacking Methods Hidenao Abe & Takahira Yamaguchi Shizuoka University, JAPAN.
An Evaluation of A Commercial Data Mining Suite Oracle Data Mining Presented by Emily Davis Supervisor: John Ebden.
Evaluation – next steps
WEKA - Explorer (sumber: WEKA Explorer user Guide for Version 3-5-5)
WEKA and Machine Learning Algorithms. Algorithm Types Classification (supervised) Given -> A set of classified examples “instances” Produce -> A way of.
Appendix: The WEKA Data Mining Software
Weka Project assignment 3
1 1 Slide Evaluation. 2 2 n Interactive decision tree construction Load segmentchallenge.arff; look at dataset Load segmentchallenge.arff; look at dataset.
Data Mining: Classification & Predication Hosam Al-Samarraie, PhD. Centre for Instructional Technology & Multimedia Universiti Sains Malaysia.
Evaluating What’s Been Learned. Cross-Validation Foundation is a simple idea – “ holdout ” – holds out a certain amount for testing and uses rest for.
Weka: a useful tool in data mining and machine learning Team 5 Noha Elsherbiny, Huijun Xiong, and Bhanu Peddi.
Machine Learning with Weka Cornelia Caragea Thanks to Eibe Frank for some of the slides.
The CRISP Data Mining Process. August 28, 2004Data Mining2 The Data Mining Process Business understanding Data evaluation Data preparation Modeling Evaluation.
For ITCS 6265/8265 Fall 2009 TA: Fei Xu UNC Charlotte.
W E K A Waikato Environment for Knowledge Analysis Branko Kavšek MPŠ Jožef StefanNovember 2005.
1 1 Slide Using Weka. 2 2 Slide Data Mining Using Weka n What’s Data Mining? We are overwhelmed with data We are overwhelmed with data Data mining is.
An Investigation of Commercial Data Mining Presented by Emily Davis Supervisor: John Ebden.
Slides for “Data Mining” by I. H. Witten and E. Frank.
Weka – A Machine Learning Toolkit October 2, 2008 Keum-Sung Hwang.
WEKA Machine Learning Toolbox. You can install Weka on your computer from
Weka Just do it Free and Open Source ML Suite Ian Witten & Eibe Frank University of Waikato New Zealand.
Data Mining Practical Machine Learning Tools and Techniques By I. H. Witten, E. Frank and M. A. Hall Chapter 5: Credibility: Evaluating What’s Been Learned.
Introduction to Weka Xingquan (Hill) Zhu Slides copied from Jeffrey Junfeng Pan (UST)
W E K A Waikato Environment for Knowledge Aquisition.
Data Mining Practical Machine Learning Tools and Techniques By I. H. Witten, E. Frank and M. A. Hall DM Finals Study Guide Rodney Nielsen.
An Exercise in Machine Learning
Random Forests Ujjwol Subedi. Introduction What is Random Tree? ◦ Is a tree constructed randomly from a set of possible trees having K random features.
***Classification Model*** Hosam Al-Samarraie, PhD. CITM-USM.
Validation methods.
Machine Learning: Decision Trees Homework 4 assigned courtesy: Geoffrey Hinton, Yann LeCun, Tan, Steinbach, Kumar.
Weka Tutorial. WEKA:: Introduction A collection of open source ML algorithms – pre-processing – classifiers – clustering – association rule Created by.
WEKA's Knowledge Flow Interface Data Mining Knowledge Discovery in Databases ELIE TCHEIMEGNI Department of Computer Science Bowie State University, MD.
Machine Learning in Practice Lecture 9 Carolyn Penstein Rosé Language Technologies Institute/ Human-Computer Interaction Institute.
In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.
Machine Learning in Practice Lecture 9 Carolyn Penstein Rosé Language Technologies Institute/ Human-Computer Interaction Institute.
@relation age sex { female, chest_pain_type { typ_angina, asympt, non_anginal,
Prepared by: Mahmoud Rafeek Al-Farra
Waikato Environment for Knowledge Analysis
WEKA.
Sampath Jayarathna Cal Poly Pomona
Figure 1.1 Rules for the contact lens data.
Weka Package Weka package is open source data mining software written in Java. Weka can be applied to your dataset from the GUI, the command line or called.
Machine Learning with Weka
Tutorial for WEKA Heejun Kim June 19, 2018.
CSCI N317 Computation for Scientific Applications Unit Weka
CS4705 – Natural Language Processing Thursday, September 28
CSE 491/891 Lecture 25 (Mahout).
Lecture 10 – Introduction to Weka
Statistical Learning Introduction to Weka
Practice Project Overview
Data Mining CSCI 307, Spring 2019 Lecture 7
Data Mining CSCI 307, Spring 2019 Lecture 8
Presentation transcript:

An Exercise in Machine Learning http://www.cs.iastate.edu/~cs573x/BBSIlab/2006/ Cornelia Caragea

Outline Machine Learning Software Preparing Data Building Classifiers Interpreting Results

Machine Learning Software Suites (General Purpose) WEKA (Source: Java) MLC++ (Source: C++) SAS List from KDNuggets (Various) Specific Classification: C4.5, SVMlight Association Rule Mining Bayesian Net … Commercial vs. Free

What does WEKA do? Implementation of the state-of-the-art learning algorithm Main strengths in the classification Regression, Association Rules and clustering algorithms Extensible to try new learning schemes Large variety of handy tools (transforming datasets, filters, visualization etc…)

WEKA resources API Documentation, Tutorials, Source code. WEKA mailing list Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations Weka-related Projects: Weka-Parallel - parallel processing for Weka RWeka - linking R and Weka YALE - Yet Another Learning Environment Many others…

Outline Machine Learning Software Preparing Data Building Classifiers Interpreting Results

Preparing Data ARFF Data Format Header – describing the attribute types Data – (instances, examples) comma-separated list

Launching WEKA java -jar weka.jar

Load Dataset into WEKA

Data Filters Useful support for data preprocessing Removing or adding attributes, resampling the dataset, removing examples, etc. Creates stratified cross-validation folds of the given dataset, and class distributions are approximately retained within each fold. Typically split data as 2/3 in training and 1/3 in testing

Data Filters

Outline Machine Learning Software Preparing Data Building Classifiers Interpreting Results

Building Classifiers A classifier model - mapping from dataset attributes to the class (target) attribute. Creation and form differs. Decision Tree and Naïve Bayes Classifiers Which one is the best? No Free Lunch!

Building Classifiers

(1) weka.classifiers.rules.ZeroR Class for building and using a 0-R classifier Majority class classifier Predicts the mean (for a numeric class) or the mode (for a nominal class)

Exercise 1 http://www.cs.iastate.edu/~cs573x/BBSIlab/2006/exercises/ex1.html

(2)weka.classifiers.bayes.NaiveBayes Class for building a Naive Bayes classifier

(3) weka.classifiers.trees.J48 Class for generating a pruned or unpruned C4.5 decision tree

Test Options Percentage Split (2/3 Training; 1/3 Testing) Cross-validation estimating the generalization error based on resampling when limited data; averaged error estimate. stratified 10-fold leave-one-out (Loo)

Outline Machine Learning Software Preparing Data Building Classifiers Interpreting Results

Understanding Output

Decision Tree Output (1)

Decision Tree Output (2)

Exercise 2 http://www.cs.iastate.edu/~cs573x/BBSIlab/2006/exercises/ex2.html

Performance Measures Accuracy & Error rate Confusion matrix – contingency table True Positive rate & False Positive rate (Area under Receiver Operating Characteristic) Precision,Recall & F-Measure Sensitivity & Specificity For more information on these, see uisp09-Evaluation.ppt

Decision Tree Pruning Overcome Over-fitting Pre-pruning and Post-pruning Reduced error pruning Subtree raising with different confidence Comparing tree size and accuracy

Subtree replacement Bottom-up: tree is considered for replacement once all its subtrees have been considered

Subtree Raising Deletes node and redistributes instances Slower than subtree replacement

Exercise 3 http://www.cs.iastate.edu/~cs573x/BBSIlab/2006/exercises/ex3.html