Presentation is loading. Please wait.

Presentation is loading. Please wait.

Department of Computer Science, University of Waikato, New Zealand Eibe Frank WEKA: A Machine Learning Toolkit The Explorer Classification and Regression.

Similar presentations


Presentation on theme: "Department of Computer Science, University of Waikato, New Zealand Eibe Frank WEKA: A Machine Learning Toolkit The Explorer Classification and Regression."— Presentation transcript:

1 Department of Computer Science, University of Waikato, New Zealand Eibe Frank WEKA: A Machine Learning Toolkit The Explorer Classification and Regression Clustering Association Rules Attribute Selection Data Visualization The Experimenter The Knowledge Flow GUI Conclusions Machine Learning with WEKA

2 2016/7/10University of Waikato2 WEKA: the bird Copyright: Martin Kramer (mkramer@wxs.nl)

3 2016/7/10University of Waikato3 WEKA: the software Machine learning/data mining software written in Java (distributed under the GNU Public License) Used for research, education, and applications Complements “Data Mining” by Witten & Frank Main features:  Comprehensive set of data pre-processing tools, learning algorithms and evaluation methods  Graphical user interfaces (incl. data visualization)  Environment for comparing learning algorithms

4 2016/7/10University of Waikato4 WEKA: versions There are several versions of WEKA:  WEKA 3.0: “book version” compatible with description in data mining book  WEKA 3.2: “GUI version” adds graphical user interfaces (book version is command-line only)  WEKA 3.3: “development version” with lots of improvements This talk is based on the latest snapshot of WEKA 3.3 (soon to be WEKA 3.4)

5 2016/7/10University of Waikato5 @relation heart-disease-simplified @attribute age numeric @attribute sex { female, male} @attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina} @attribute cholesterol numeric @attribute exercise_induced_angina { no, yes} @attribute class { present, not_present} @data 63,male,typ_angina,233,no,not_present 67,male,asympt,286,yes,present 67,male,asympt,229,yes,present 38,female,non_anginal,?,no,not_present... WEKA only deals with “flat” files

6 2016/7/10University of Waikato6 @relation heart-disease-simplified @attribute age numeric @attribute sex { female, male} @attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina} @attribute cholesterol numeric @attribute exercise_induced_angina { no, yes} @attribute class { present, not_present} @data 63,male,typ_angina,233,no,not_present 67,male,asympt,286,yes,present 67,male,asympt,229,yes,present 38,female,non_anginal,?,no,not_present... WEKA only deals with “flat” files

7 2016/7/10University of Waikato7

8 2016/7/10University of Waikato8

9 2016/7/10University of Waikato9

10 2016/7/10University of Waikato10 Explorer: pre-processing the data Data can be imported from a file in various formats: ARFF, CSV, C4.5, binary Data can also be read from a URL or from an SQL database (using JDBC) Pre-processing tools in WEKA are called “filters” WEKA contains filters for:  Discretization, normalization, resampling, attribute selection, transforming and combining attributes, …

11 2016/7/10University of Waikato11

12 2016/7/10University of Waikato12 1 2

13 2016/7/10University of Waikato13

14 2016/7/10University of Waikato14

15 2016/7/10University of Waikato15

16 2016/7/10University of Waikato16

17 2016/7/10University of Waikato17

18 2016/7/10University of Waikato18

19 2016/7/10University of Waikato19

20 2016/7/10University of Waikato20

21 2016/7/10University of Waikato21

22 2016/7/10University of Waikato22

23 2016/7/10University of Waikato23

24 2016/7/10University of Waikato24

25 2016/7/10University of Waikato25

26 2016/7/10University of Waikato26

27 2016/7/10University of Waikato27

28 2016/7/10University of Waikato28

29 2016/7/10University of Waikato29

30 2016/7/10University of Waikato30

31 2016/7/10University of Waikato31

32 2016/7/10University of Waikato32 Explorer: building “classifiers” Classifiers in WEKA are models for predicting nominal or numeric quantities Implemented learning schemes include:  Decision trees and lists, instance-based classifiers, support vector machines, multi-layer perceptrons, logistic regression, Bayes’ nets, … “Meta”-classifiers include:  Bagging, boosting, stacking, error-correcting output codes, locally weighted learning, …

33 2016/7/10University of Waikato33

34 2016/7/10University of Waikato34 1 2

35 2016/7/10University of Waikato35

36 2016/7/10University of Waikato36

37 2016/7/10University of Waikato37

38 2016/7/10University of Waikato38

39 2016/7/10University of Waikato39

40 2016/7/10University of Waikato40

41 2016/7/10University of Waikato41

42 2016/7/10University of Waikato42

43 2016/7/10University of Waikato43

44 2016/7/10University of Waikato44

45 2016/7/10University of Waikato45

46 2016/7/10University of Waikato46

47 2016/7/10University of Waikato47

48 2016/7/10University of Waikato48

49 2016/7/10University of Waikato49

50 2016/7/10University of Waikato50

51 2016/7/10University of Waikato51

52 2016/7/10University of Waikato52

53 2016/7/10University of Waikato53

54 2016/7/10University of Waikato54

55 2016/7/10University of Waikato55

56 2016/7/10University of Waikato56

57 2016/7/10University of Waikato57

58 2016/7/10University of Waikato58 1 2

59 2016/7/10University of Waikato59

60 2016/7/10University of Waikato60

61 2016/7/10University of Waikato61 Node 0 Node 1 Node 2 Node 3 Node 4 Node 5

62 2016/7/10University of Waikato62 1 2

63 2016/7/10University of Waikato63 2 1

64 2016/7/10University of Waikato64 1 2

65 2016/7/10University of Waikato65 2 3 1

66 2016/7/10University of Waikato66

67 2016/7/10University of Waikato67 1 2

68 2016/7/10University of Waikato68

69 2016/7/10University of Waikato69

70 2016/7/10University of Waikato70

71 2016/7/10University of Waikato71

72 2016/7/10University of Waikato72

73 2016/7/10University of Waikato73 ROC curve

74 2016/7/10University of Waikato74

75 2016/7/10University of Waikato75 Use a numeric attribute as output

76 2016/7/10University of Waikato76

77 2016/7/10University of Waikato77

78 2016/7/10University of Waikato78 1 2 M5 Model trees and rules: Combines a decision tree with linear regression Use M5P to predict petal length of a iris flower

79 2016/7/10University of Waikato79 Generate 3 models 1 2 3 Model 1 Model 2 Model 3

80 2016/7/10University of Waikato80 Linear Model 1 Linear Model 2 Linear Model 3

81 2016/7/10University of Waikato81 right click “Visualize classifier error”

82 2016/7/10University of Waikato82

83 2016/7/10University of Waikato83 Click a data point to show the data window

84 2016/7/10University of Waikato84

85 2016/7/10University of Waikato85

86 2016/7/10University of Waikato86

87 2016/7/10University of Waikato87 Explorer: clustering data WEKA contains “clusterers” for finding groups of similar instances in a dataset Implemented schemes are:  k-Means, EM, Cobweb, X-means, FarthestFirst Clusters can be visualized and compared to “true” clusters (if given) Evaluation based on log likelihood if clustering scheme produces a probability distribution

88 2016/7/10University of Waikato88

89 2016/7/10University of Waikato89 1 2

90 2016/7/10University of Waikato90

91 2016/7/10University of Waikato91

92 2016/7/10University of Waikato92 1 3 2

93 2016/7/10University of Waikato93 Enter 3 clusters

94 2016/7/10University of Waikato94

95 2016/7/10University of Waikato95

96 2016/7/10University of Waikato96

97 2016/7/10University of Waikato97

98 2016/7/10University of Waikato98 Right click

99 2016/7/10University of Waikato99

100 2016/7/10University of Waikato100

101 2016/7/10University of Waikato101 Explorer: finding associations WEKA contains an implementation of the Apriori algorithm for learning association rules  Works only with discrete data Can identify statistical dependencies between groups of attributes:  milk, butter  bread, eggs (with confidence 0.9 and support 2000) Apriori can compute all rules that have a given minimum support and exceed a given confidence

102 2016/7/10University of Waikato102

103 2016/7/10University of Waikato103

104 2016/7/10University of Waikato104 Load vote data set

105 2016/7/10University of Waikato105

106 2016/7/10University of Waikato106

107 2016/7/10University of Waikato107

108 2016/7/10University of Waikato108 Expand the Window

109 2016/7/10University of Waikato109 minimum confidence = 0.9 minimum support = 0.45 Change parameters 2 10 rules

110 2016/7/10University of Waikato110 minimum confidence minimum support Set number of rules=15

111 2016/7/10University of Waikato111 15 rules

112 2016/7/10University of Waikato112 Explorer: attribute selection Panel that can be used to investigate which (subsets of) attributes are the most predictive ones Attribute selection methods contain two parts:  A search method: best-first, forward selection, random, exhaustive, genetic algorithm, ranking  An evaluation method: correlation-based, wrapper, information gain, chi-squared, … Very flexible: WEKA allows (almost) arbitrary combinations of these two

113 2016/7/10University of Waikato113

114 2016/7/10University of Waikato114

115 2016/7/10University of Waikato115

116 2016/7/10University of Waikato116

117 2016/7/10University of Waikato117

118 2016/7/10University of Waikato118

119 2016/7/10University of Waikato119

120 2016/7/10University of Waikato120

121 2016/7/10University of Waikato121 Explorer: data visualization Visualization very useful in practice: e.g. helps to determine difficulty of the learning problem WEKA can visualize single attributes (1D) and pairs of attributes (2D)  To do: rotating 3D visualizations (Xgobi-style) Color-coded class values “Jitter” option to deal with nominal attributes (and to detect “hidden” data points) “Zoom-in” function

122 2016/7/10University of Waikato122 Load Glass data set 1 2

123 2016/7/10University of Waikato123

124 2016/7/10University of Waikato124 1 2 Change PointSize

125 2016/7/10University of Waikato125

126 2016/7/10University of Waikato126 1 2 Change PlotSize

127 2016/7/10University of Waikato127 PlotSize is changed

128 2016/7/10University of Waikato128 Double click to enlarge

129 2016/7/10University of Waikato129

130 2016/7/10University of Waikato130

131 2016/7/10University of Waikato131 Zoom-in 1 2

132 2016/7/10University of Waikato132

133 2016/7/10University of Waikato133

134 2016/7/10University of Waikato134

135 2016/7/10University of Waikato135 Performing experiments Experimenter makes it easy to compare the performance of different learning schemes For classification and regression problems Results can be written into file or database Evaluation options: cross-validation, learning curve, hold-out Can also iterate over different parameter settings Significance-testing built in!

136 2016/7/10University of Waikato136

137 2016/7/10University of Waikato137

138 2016/7/10University of Waikato138 1. Add a few data sets 2. Add a few algorithms

139 2016/7/10University of Waikato139

140 2016/7/10University of Waikato140

141 2016/7/10University of Waikato141 Result message Running status

142 2016/7/10University of Waikato142

143 2016/7/10University of Waikato143

144 2016/7/10University of Waikato144

145 2016/7/10University of Waikato145

146 2016/7/10University of Waikato146

147 2016/7/10University of Waikato147

148 2016/7/10University of Waikato148 The Knowledge Flow GUI New graphical user interface for WEKA Java-Beans-based interface for setting up and running machine learning experiments Data sources, classifiers, etc. are beans and can be connected graphically Data “flows” through components: e.g., “data source” -> “filter” -> “classifier” -> “evaluator” Layouts can be saved and loaded again later

149 2016/7/10University of Waikato149

150 2016/7/10University of Waikato150

151 2016/7/10University of Waikato151

152 2016/7/10University of Waikato152

153 2016/7/10University of Waikato153

154 2016/7/10University of Waikato154

155 2016/7/10University of Waikato155

156 2016/7/10University of Waikato156

157 2016/7/10University of Waikato157

158 2016/7/10University of Waikato158

159 2016/7/10University of Waikato159

160 2016/7/10University of Waikato160

161 2016/7/10University of Waikato161

162 2016/7/10University of Waikato162

163 2016/7/10University of Waikato163

164 2016/7/10University of Waikato164

165 2016/7/10University of Waikato165

166 2016/7/10University of Waikato166

167 2016/7/10University of Waikato167

168 2016/7/10University of Waikato168

169 2016/7/10University of Waikato169 Conclusion: try it yourself! WEKA is available at http://www.cs.waikato.ac.nz/ml/weka  Also has a list of projects based on WEKA  WEKA contributors: Abdelaziz Mahoui, Alexander K. Seewald, Ashraf M. Kibriya, Bernhard Pfahringer, Brent Martin, Peter Flach, Eibe Frank,Gabi Schmidberger,Ian H. Witten, J. Lindgren, Janice Boughton, Jason Wells, Len Trigg, Lucio de Souza Coelho, Malcolm Ware, Mark Hall,Remco Bouckaert, Richard Kirkby, Shane Butler, Shane Legg, Stuart Inglis, Sylvain Roy, Tony Voyle, Xin Xu, Yong Wang, Zhihai Wang


Download ppt "Department of Computer Science, University of Waikato, New Zealand Eibe Frank WEKA: A Machine Learning Toolkit The Explorer Classification and Regression."

Similar presentations


Ads by Google