Weka: Just do it
Free and Open Source ML Suite
Ian Witten & Eibe Frank, University of Waikato, New Zealand



Overview
Classifiers, regressors, and clusterers
Multiple evaluation schemes
Bagging and boosting
Feature selection:
– the right features and data are key to successful learning
Experimenter
Visualizer
Text not up to date. They welcome additions.

Learning Tasks
Classification: given examples labelled from a finite domain, generate a procedure for labelling unseen examples.
Regression: given examples labelled with a real value, generate a procedure for labelling unseen examples.
Clustering: from a set of examples, partition the examples into “interesting” groups. What scientists want.

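Data Format
Attributes (in column order): sepallength, sepalwidth, petallength, petalwidth, class
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
Etc.
General form of an attribute declaration: attribute-name, followed by REAL or a list of values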
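To make the row layout concrete, here is a minimal Python sketch that reads the three rows shown on the slide. The attribute names are an assumption based on the standard Iris data set, and `parse_rows` is a hypothetical helper, not part of Weka.

```python
import csv
import io

# Assumption: column order of the standard Iris data set.
ATTRIBUTES = ["sepallength", "sepalwidth", "petallength", "petalwidth", "class"]

# The three data rows shown on the slide.
ROWS = """5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
"""


def parse_rows(text):
    """Parse comma-separated rows: REAL attributes become floats, the class stays a string."""
    instances = []
    for fields in csv.reader(io.StringIO(text)):
        if not fields:
            continue  # skip blank lines
        values = [float(v) for v in fields[:-1]] + [fields[-1]]
        instances.append(dict(zip(ATTRIBUTES, values)))
    return instances


instances = parse_rows(ROWS)
```

Each instance ends up as a dictionary keyed by attribute name, which mirrors the "attribute-name plus type" idea of the data format.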

J48 = Decision Tree
Leaf counts (n/m): n = # of instances under the node, m = number wrong

petalwidth <= 0.6: Iris-setosa (50.0)
petalwidth > 0.6
|   petalwidth <= 1.7
|   |   petallength <= 4.9: Iris-versicolor (48.0/1.0)
|   |   petallength > 4.9
|   |   |   petalwidth <= 1.5: Iris-virginica (3.0)
|   |   |   petalwidth > 1.5: Iris-versicolor (3.0/1.0)
|   petalwidth > 1.7: Iris-virginica (46.0/1.0)
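The tree above can be read as nested if/else tests. The following Python sketch is a hand transcription of the thresholds shown on the slide; the function name `classify_iris` is hypothetical and this is not code generated by Weka.

```python
def classify_iris(petallength, petalwidth):
    """Follow the J48 tree from the slide, one threshold test per level."""
    if petalwidth <= 0.6:
        return "Iris-setosa"
    # petalwidth > 0.6
    if petalwidth <= 1.7:
        if petallength <= 4.9:
            return "Iris-versicolor"
        # petallength > 4.9
        if petalwidth <= 1.5:
            return "Iris-virginica"
        return "Iris-versicolor"
    # petalwidth > 1.7
    return "Iris-virginica"
```

For instance, `classify_iris(1.4, 0.2)` follows the first branch and returns `"Iris-setosa"`.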

Cross-validation
Correctly Classified Instances %
Incorrectly Classified Instances %
Default: 10-fold cross-validation, i.e.
– Split the data into 10 equal-sized pieces
– Train on 9 pieces and test on the remainder
– Do this for all 10 choices of held-out piece and average
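The three steps can be sketched in plain Python. This is an illustration of the idea, not Weka's implementation: `k_fold_indices`, `cross_validate`, and the majority-class learner `train_majority` are hypothetical names, and the sketch assumes at least as many instances as folds.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k near-equal folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(data, labels, train, k=10, seed=0):
    """Train on k-1 folds, test on the held-out fold, and average the accuracies."""
    folds = k_fold_indices(len(data), k, seed)
    accuracies = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(len(data)) if i not in held_out]
        model = train([data[i] for i in train_idx], [labels[i] for i in train_idx])
        correct = sum(model(data[i]) == labels[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / k


def train_majority(xs, ys):
    """Toy learner (assumption for the demo): always predict the most frequent training label."""
    most_common = max(set(ys), key=ys.count)
    return lambda x: most_common


# Toy data: 150 instances, 100 of class "a" and 50 of class "b".
data = list(range(150))
labels = ["a"] * 100 + ["b"] * 50
folds = k_fold_indices(len(data))
mean_accuracy = cross_validate(data, labels, train_majority)
```

Any function that takes training data and labels and returns a predictor can be passed as `train`, so the same loop works for a real classifier.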

J48 Confusion Matrix
Old data set from statistics: 50 of each class

 a  b  c   <-- classified as
           | a = Iris-setosa
           | b = Iris-versicolor
           | c = Iris-virginica
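As a rough sketch of how such a matrix is tallied, the Python below counts (actual, predicted) pairs, with rows for the actual class and columns for the class it was "classified as". The labels and predictions are made-up toy values, not the counts from the slide.

```python
def confusion_matrix(actual, predicted, classes):
    """Rows are actual classes, columns are predicted ("classified as") classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


classes = ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]
# Hypothetical toy labels and predictions, for illustration only.
actual = ["Iris-setosa", "Iris-setosa", "Iris-versicolor",
          "Iris-virginica", "Iris-virginica"]
predicted = ["Iris-setosa", "Iris-versicolor", "Iris-versicolor",
             "Iris-virginica", "Iris-versicolor"]
matrix = confusion_matrix(actual, predicted, classes)
```

Correct predictions land on the diagonal; every off-diagonal count is an error.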

Precision, Recall, and Accuracy
Precision: probability of being correct given your decision (i.e. given that you predicted the class).
– Precision of Iris-setosa is 49/49 = 100%
– Positive predictive value in the medical literature
Recall: probability of correctly identifying a member of the class.
– Recall for Iris-setosa is 49/50 = 98%
– Sensitivity in the medical literature
Accuracy: # right / total = 143/150 ≈ 95%
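The three measures can be computed from a confusion matrix as follows. The matrix in this Python sketch is hypothetical: it was chosen only to be consistent with the figures quoted above (49/49, 49/50, 143/150) and is not the slide's actual matrix.

```python
# Hypothetical matrix: rows = actual class, columns = predicted class,
# in the order setosa, versicolor, virginica.
MATRIX = [
    [49, 1, 0],
    [0, 47, 3],
    [0, 3, 47],
]


def precision(matrix, c):
    """Of everything predicted as class c, the fraction that really is class c."""
    predicted_as_c = sum(row[c] for row in matrix)
    return matrix[c][c] / predicted_as_c if predicted_as_c else 0.0


def recall(matrix, c):
    """Of everything that really is class c, the fraction predicted as class c."""
    actual_c = sum(matrix[c])
    return matrix[c][c] / actual_c if actual_c else 0.0


def accuracy(matrix):
    """Diagonal (correct) count divided by the total count."""
    total = sum(sum(row) for row in matrix)
    return sum(matrix[i][i] for i in range(len(matrix))) / total


setosa_precision = precision(MATRIX, 0)
setosa_recall = recall(MATRIX, 0)
overall_accuracy = accuracy(MATRIX)
```

Precision reads down a column and recall reads across a row, which is why one class can score 100% on the first and 98% on the second.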

Other Evaluation Schemes
Leave-one-out cross-validation
– Cross-validation where the number of folds n = number of training instances
Specific train and test set
– Allows for exact replication
– OK if the train/test sets are large, e.g. in the 10,000 range.
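Leave-one-out can be sketched as a loop that holds out each instance in turn. The Python below uses a one-dimensional nearest-neighbour predictor and made-up petal widths purely for illustration; neither comes from the slides.

```python
def nearest_neighbour_predict(train_x, train_y, x):
    """Predict the label of the training point closest to x (1-D distance)."""
    best = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[best]


def leave_one_out_accuracy(xs, ys):
    """Hold out each instance once, train on the other n-1, and count correct predictions."""
    correct = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if nearest_neighbour_predict(train_x, train_y, xs[i]) == ys[i]:
            correct += 1
    return correct / len(xs)


# Hypothetical petal widths and labels, for illustration only.
xs = [0.2, 0.3, 0.4, 1.3, 1.5, 1.6]
ys = ["setosa"] * 3 + ["versicolor"] * 3
loo_accuracy = leave_one_out_accuracy(xs, ys)
```

Because every instance is tested exactly once, the result does not depend on a random split, at the cost of training n models.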

Bootstrap sampling
Randomly select n instances with replacement from n
Expect about 2/3 of the distinct instances to be chosen for training
– Prob of not chosen = (1-1/n)^n ~ 1/e ≈ 0.368, so about 63% are chosen.
Test on the remainder
Repeat about 30 times and average.
Avoids partition bias
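A small Python simulation can check the "about 2/3" figure. The helper names `bootstrap_split` and `prob_not_chosen`, the choice of n = 1000, and the fixed seed are assumptions made for the sketch.

```python
import math
import random


def bootstrap_split(n, rng):
    """Draw n indices with replacement for training; never-drawn indices form the test set."""
    train = [rng.randrange(n) for _ in range(n)]
    chosen = set(train)
    test = [i for i in range(n) if i not in chosen]
    return train, test


def prob_not_chosen(n):
    """Probability that one given instance is missed by all n draws."""
    return (1 - 1 / n) ** n


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 1000
fractions = []
for _ in range(30):  # repeat about 30 times, as on the slide
    train, test = bootstrap_split(n, rng)
    fractions.append(len(set(train)) / n)
mean_fraction = sum(fractions) / len(fractions)
```

With these settings `mean_fraction` should land close to 1 - 1/e, and `prob_not_chosen(n)` close to 1/e.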