Department of Computer Science, University of Waikato, New Zealand Eibe Frank WEKA: A Machine Learning Toolkit The Explorer Classification and Regression Clustering Association Rules Attribute Selection Data Visualization The Experimenter The Knowledge Flow GUI Conclusions Machine Learning with WEKA
11/10/2015University of Waikato2 WEKA: the bird Copyright: Martin Kramer
11/10/2015University of Waikato3 WEKA: the software Machine learning/data mining software written in Java (distributed under the GNU Public License) Used for research, education, and applications Complements “Data Mining” by Witten & Frank Main features: Comprehensive set of data pre-processing tools, learning algorithms and evaluation methods Graphical user interfaces (incl. data visualization) Environment for comparing learning algorithms
11/10/2015University of Waikato4 WEKA: versions There are several versions of WEKA: WEKA 3.0: “book version” compatible with description in data mining book WEKA 3.2: “GUI version” adds graphical user interfaces (book version is command-line only) WEKA 3.3: “development version” with lots of improvements This talk is based on the latest snapshot of WEKA 3.3 (soon to be WEKA 3.4)
11/10/2015University of age sex { female, chest_pain_type { typ_angina, asympt, non_anginal, cholesterol exercise_induced_angina { no, class { present, 63,male,typ_angina,233,no,not_present 67,male,asympt,286,yes,present 67,male,asympt,229,yes,present 38,female,non_anginal,?,no,not_present... WEKA only deals with “flat” files
11/10/2015University of age sex { female, chest_pain_type { typ_angina, asympt, non_anginal, cholesterol exercise_induced_angina { no, class { present, 63,male,typ_angina,233,no,not_present 67,male,asympt,286,yes,present 67,male,asympt,229,yes,present 38,female,non_anginal,?,no,not_present... WEKA only deals with “flat” files
11/10/2015University of Waikato7
11/10/2015University of Waikato8
11/10/2015University of Waikato9
11/10/2015University of Waikato10 Explorer: pre-processing the data Data can be imported from a file in various formats: ARFF, CSV, C4.5, binary Data can also be read from a URL or from an SQL database (using JDBC) Pre-processing tools in WEKA are called “filters” WEKA contains filters for: Discretization, normalization, resampling, attribute selection, transforming and combining attributes, …
11/10/2015University of Waikato11
11/10/2015University of Waikato12
11/10/2015University of Waikato13
11/10/2015University of Waikato14
11/10/2015University of Waikato15
11/10/2015University of Waikato16
11/10/2015University of Waikato17
11/10/2015University of Waikato18
11/10/2015University of Waikato19
11/10/2015University of Waikato20
11/10/2015University of Waikato21
11/10/2015University of Waikato22
11/10/2015University of Waikato23
11/10/2015University of Waikato24
11/10/2015University of Waikato25
11/10/2015University of Waikato26
11/10/2015University of Waikato27
11/10/2015University of Waikato28
11/10/2015University of Waikato29
11/10/2015University of Waikato30
11/10/2015University of Waikato31
11/10/2015University of Waikato32 Explorer: building “classifiers” Classifiers in WEKA are models for predicting nominal or numeric quantities Implemented learning schemes include: Decision trees and lists, instance-based classifiers, support vector machines, multi-layer perceptrons, logistic regression, Bayes’ nets, … “Meta”-classifiers include: Bagging, boosting, stacking, error-correcting output codes, locally weighted learning, …
11/10/2015University of Waikato33
11/10/2015University of Waikato34
11/10/2015University of Waikato35
11/10/2015University of Waikato36
11/10/2015University of Waikato37
11/10/2015University of Waikato38
11/10/2015University of Waikato39
11/10/2015University of Waikato40
11/10/2015University of Waikato41
11/10/2015University of Waikato42
11/10/2015University of Waikato43
11/10/2015University of Waikato44
11/10/2015University of Waikato45
11/10/2015University of Waikato46
11/10/2015University of Waikato47
11/10/2015University of Waikato48
11/10/2015University of Waikato49
11/10/2015University of Waikato50
11/10/2015University of Waikato51
11/10/2015University of Waikato52
11/10/2015University of Waikato53
11/10/2015University of Waikato54
11/10/2015University of Waikato55
11/10/2015University of Waikato56
11/10/2015University of Waikato57
11/10/2015University of Waikato58
11/10/2015University of Waikato59
11/10/2015University of Waikato60
11/10/2015University of Waikato61
11/10/2015University of Waikato62
11/10/2015University of Waikato63
11/10/2015University of Waikato64
11/10/2015University of Waikato65
11/10/2015University of Waikato66
11/10/2015University of Waikato67
11/10/2015University of Waikato68
11/10/2015University of Waikato69
11/10/2015University of Waikato70
11/10/2015University of Waikato71
11/10/2015University of Waikato72
11/10/2015University of Waikato73
11/10/2015University of Waikato74
11/10/2015University of Waikato75
11/10/2015University of Waikato76
11/10/2015University of Waikato77
11/10/2015University of Waikato78
11/10/2015University of Waikato79
11/10/2015University of Waikato80
11/10/2015University of Waikato81
11/10/2015University of Waikato82
11/10/2015University of Waikato83
11/10/2015University of Waikato84
11/10/2015University of Waikato85
11/10/2015University of Waikato86
11/10/2015University of Waikato87
11/10/2015University of Waikato88
11/10/2015University of Waikato89
11/10/2015University of Waikato90
11/10/2015University of Waikato91
11/10/2015University of Waikato92 Explorer: clustering data WEKA contains “clusterers” for finding groups of similar instances in a dataset Implemented schemes are: k-Means, EM, Cobweb, X-means, FarthestFirst Clusters can be visualized and compared to “true” clusters (if given) Evaluation based on loglikelihood if clustering scheme produces a probability distribution
11/10/2015University of Waikato93
11/10/2015University of Waikato94
11/10/2015University of Waikato95
11/10/2015University of Waikato96
11/10/2015University of Waikato97
11/10/2015University of Waikato98
11/10/2015University of Waikato99
11/10/2015University of Waikato100
11/10/2015University of Waikato101
11/10/2015University of Waikato102
11/10/2015University of Waikato103
11/10/2015University of Waikato104
11/10/2015University of Waikato105
11/10/2015University of Waikato106
11/10/2015University of Waikato107
11/10/2015University of Waikato108 Explorer: finding associations WEKA contains an implementation of the Apriori algorithm for learning association rules Works only with discrete data Can identify statistical dependencies between groups of attributes: milk, butter bread, eggs (with confidence 0.9 and support 2000) Apriori can compute all rules that have a given minimum support and exceed a given confidence
11/10/2015University of Waikato109
11/10/2015University of Waikato110
11/10/2015University of Waikato111
11/10/2015University of Waikato112
11/10/2015University of Waikato113
11/10/2015University of Waikato114
11/10/2015University of Waikato115
11/10/2015University of Waikato116 Explorer: attribute selection Panel that can be used to investigate which (subsets of) attributes are the most predictive ones Attribute selection methods contain two parts: A search method: best-first, forward selection, random, exhaustive, genetic algorithm, ranking An evaluation method: correlation-based, wrapper, information gain, chi-squared, … Very flexible: WEKA allows (almost) arbitrary combinations of these two
11/10/2015University of Waikato117
11/10/2015University of Waikato118
11/10/2015University of Waikato119
11/10/2015University of Waikato120
11/10/2015University of Waikato121
11/10/2015University of Waikato122
11/10/2015University of Waikato123
11/10/2015University of Waikato124
11/10/2015University of Waikato125 Explorer: data visualization Visualization very useful in practice: e.g. helps to determine difficulty of the learning problem WEKA can visualize single attributes (1-d) and pairs of attributes (2-d) To do: rotating 3-d visualizations (Xgobi-style) Color-coded class values “Jitter” option to deal with nominal attributes (and to detect “hidden” data points) “Zoom-in” function
11/10/2015University of Waikato126
11/10/2015University of Waikato127
11/10/2015University of Waikato128
11/10/2015University of Waikato129
11/10/2015University of Waikato130
11/10/2015University of Waikato131
11/10/2015University of Waikato132
11/10/2015University of Waikato133
11/10/2015University of Waikato134
11/10/2015University of Waikato135
11/10/2015University of Waikato136
11/10/2015University of Waikato137
11/10/2015University of Waikato138 Conclusion: try it yourself! WEKA is available at Also has a list of projects based on WEKA WEKA contributors: Abdelaziz Mahoui, Alexander K. Seewald, Ashraf M. Kibriya, Bernhard Pfahringer, Brent Martin, Peter Flach, Eibe Frank,Gabi Schmidberger,Ian H. Witten, J. Lindgren, Janice Boughton, Jason Wells, Len Trigg, Lucio de Souza Coelho, Malcolm Ware, Mark Hall,Remco Bouckaert, Richard Kirkby, Shane Butler, Shane Legg, Stuart Inglis, Sylvain Roy, Tony Voyle, Xin Xu, Yong Wang, Zhihai Wang