Presentation is loading. Please wait.

Presentation is loading. Please wait.

Department of Computer Science, University of Waikato, New Zealand Eibe Frank WEKA: A Machine Learning Toolkit The Explorer Classification and Regression.

Similar presentations


Presentation on theme: "Department of Computer Science, University of Waikato, New Zealand Eibe Frank WEKA: A Machine Learning Toolkit The Explorer Classification and Regression."— Presentation transcript:

1 Department of Computer Science, University of Waikato, New Zealand Eibe Frank WEKA: A Machine Learning Toolkit The Explorer Classification and Regression Clustering Association Rules Attribute Selection Data Visualization The Experimenter The Knowledge Flow GUI Conclusions Machine Learning with WEKA

2 11/10/2015University of Waikato2 WEKA: the bird Copyright: Martin Kramer (mkramer@wxs.nl)

3 11/10/2015University of Waikato3 WEKA: the software Machine learning/data mining software written in Java (distributed under the GNU Public License) Used for research, education, and applications Complements “Data Mining” by Witten & Frank Main features:  Comprehensive set of data pre-processing tools, learning algorithms and evaluation methods  Graphical user interfaces (incl. data visualization)  Environment for comparing learning algorithms

4 11/10/2015University of Waikato4 WEKA: versions There are several versions of WEKA:  WEKA 3.0: “book version” compatible with description in data mining book  WEKA 3.2: “GUI version” adds graphical user interfaces (book version is command-line only)  WEKA 3.3: “development version” with lots of improvements This talk is based on the latest snapshot of WEKA 3.3 (soon to be WEKA 3.4)

5 11/10/2015University of Waikato5 @relation heart-disease-simplified @attribute age numeric @attribute sex { female, male} @attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina} @attribute cholesterol numeric @attribute exercise_induced_angina { no, yes} @attribute class { present, not_present} @data 63,male,typ_angina,233,no,not_present 67,male,asympt,286,yes,present 67,male,asympt,229,yes,present 38,female,non_anginal,?,no,not_present... WEKA only deals with “flat” files

6 11/10/2015University of Waikato6 @relation heart-disease-simplified @attribute age numeric @attribute sex { female, male} @attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina} @attribute cholesterol numeric @attribute exercise_induced_angina { no, yes} @attribute class { present, not_present} @data 63,male,typ_angina,233,no,not_present 67,male,asympt,286,yes,present 67,male,asympt,229,yes,present 38,female,non_anginal,?,no,not_present... WEKA only deals with “flat” files

7 11/10/2015University of Waikato7

8 11/10/2015University of Waikato8

9 11/10/2015University of Waikato9

10 11/10/2015University of Waikato10 Explorer: pre-processing the data Data can be imported from a file in various formats: ARFF, CSV, C4.5, binary Data can also be read from a URL or from an SQL database (using JDBC) Pre-processing tools in WEKA are called “filters” WEKA contains filters for:  Discretization, normalization, resampling, attribute selection, transforming and combining attributes, …

11 11/10/2015University of Waikato11

12 11/10/2015University of Waikato12

13 11/10/2015University of Waikato13

14 11/10/2015University of Waikato14

15 11/10/2015University of Waikato15

16 11/10/2015University of Waikato16

17 11/10/2015University of Waikato17

18 11/10/2015University of Waikato18

19 11/10/2015University of Waikato19

20 11/10/2015University of Waikato20

21 11/10/2015University of Waikato21

22 11/10/2015University of Waikato22

23 11/10/2015University of Waikato23

24 11/10/2015University of Waikato24

25 11/10/2015University of Waikato25

26 11/10/2015University of Waikato26

27 11/10/2015University of Waikato27

28 11/10/2015University of Waikato28

29 11/10/2015University of Waikato29

30 11/10/2015University of Waikato30

31 11/10/2015University of Waikato31

32 11/10/2015University of Waikato32 Explorer: building “classifiers” Classifiers in WEKA are models for predicting nominal or numeric quantities Implemented learning schemes include:  Decision trees and lists, instance-based classifiers, support vector machines, multi-layer perceptrons, logistic regression, Bayes’ nets, … “Meta”-classifiers include:  Bagging, boosting, stacking, error-correcting output codes, locally weighted learning, …

33 11/10/2015University of Waikato33

34 11/10/2015University of Waikato34

35 11/10/2015University of Waikato35

36 11/10/2015University of Waikato36

37 11/10/2015University of Waikato37

38 11/10/2015University of Waikato38

39 11/10/2015University of Waikato39

40 11/10/2015University of Waikato40

41 11/10/2015University of Waikato41

42 11/10/2015University of Waikato42

43 11/10/2015University of Waikato43

44 11/10/2015University of Waikato44

45 11/10/2015University of Waikato45

46 11/10/2015University of Waikato46

47 11/10/2015University of Waikato47

48 11/10/2015University of Waikato48

49 11/10/2015University of Waikato49

50 11/10/2015University of Waikato50

51 11/10/2015University of Waikato51

52 11/10/2015University of Waikato52

53 11/10/2015University of Waikato53

54 11/10/2015University of Waikato54

55 11/10/2015University of Waikato55

56 11/10/2015University of Waikato56

57 11/10/2015University of Waikato57

58 11/10/2015University of Waikato58

59 11/10/2015University of Waikato59

60 11/10/2015University of Waikato60

61 11/10/2015University of Waikato61

62 11/10/2015University of Waikato62

63 11/10/2015University of Waikato63

64 11/10/2015University of Waikato64

65 11/10/2015University of Waikato65

66 11/10/2015University of Waikato66

67 11/10/2015University of Waikato67

68 11/10/2015University of Waikato68

69 11/10/2015University of Waikato69

70 11/10/2015University of Waikato70

71 11/10/2015University of Waikato71

72 11/10/2015University of Waikato72

73 11/10/2015University of Waikato73

74 11/10/2015University of Waikato74

75 11/10/2015University of Waikato75

76 11/10/2015University of Waikato76

77 11/10/2015University of Waikato77

78 11/10/2015University of Waikato78

79 11/10/2015University of Waikato79

80 11/10/2015University of Waikato80

81 11/10/2015University of Waikato81

82 11/10/2015University of Waikato82

83 11/10/2015University of Waikato83

84 11/10/2015University of Waikato84

85 11/10/2015University of Waikato85

86 11/10/2015University of Waikato86

87 11/10/2015University of Waikato87

88 11/10/2015University of Waikato88

89 11/10/2015University of Waikato89

90 11/10/2015University of Waikato90

91 11/10/2015University of Waikato91

92 11/10/2015University of Waikato92 Explorer: clustering data WEKA contains “clusterers” for finding groups of similar instances in a dataset Implemented schemes are:  k-Means, EM, Cobweb, X-means, FarthestFirst Clusters can be visualized and compared to “true” clusters (if given) Evaluation based on loglikelihood if clustering scheme produces a probability distribution

93 11/10/2015University of Waikato93

94 11/10/2015University of Waikato94

95 11/10/2015University of Waikato95

96 11/10/2015University of Waikato96

97 11/10/2015University of Waikato97

98 11/10/2015University of Waikato98

99 11/10/2015University of Waikato99

100 11/10/2015University of Waikato100

101 11/10/2015University of Waikato101

102 11/10/2015University of Waikato102

103 11/10/2015University of Waikato103

104 11/10/2015University of Waikato104

105 11/10/2015University of Waikato105

106 11/10/2015University of Waikato106

107 11/10/2015University of Waikato107

108 11/10/2015University of Waikato108 Explorer: finding associations WEKA contains an implementation of the Apriori algorithm for learning association rules  Works only with discrete data Can identify statistical dependencies between groups of attributes:  milk, butter  bread, eggs (with confidence 0.9 and support 2000) Apriori can compute all rules that have a given minimum support and exceed a given confidence

109 11/10/2015University of Waikato109

110 11/10/2015University of Waikato110

111 11/10/2015University of Waikato111

112 11/10/2015University of Waikato112

113 11/10/2015University of Waikato113

114 11/10/2015University of Waikato114

115 11/10/2015University of Waikato115

116 11/10/2015University of Waikato116 Explorer: attribute selection Panel that can be used to investigate which (subsets of) attributes are the most predictive ones Attribute selection methods contain two parts:  A search method: best-first, forward selection, random, exhaustive, genetic algorithm, ranking  An evaluation method: correlation-based, wrapper, information gain, chi-squared, … Very flexible: WEKA allows (almost) arbitrary combinations of these two

117 11/10/2015University of Waikato117

118 11/10/2015University of Waikato118

119 11/10/2015University of Waikato119

120 11/10/2015University of Waikato120

121 11/10/2015University of Waikato121

122 11/10/2015University of Waikato122

123 11/10/2015University of Waikato123

124 11/10/2015University of Waikato124

125 11/10/2015University of Waikato125 Explorer: data visualization Visualization very useful in practice: e.g. helps to determine difficulty of the learning problem WEKA can visualize single attributes (1-d) and pairs of attributes (2-d)  To do: rotating 3-d visualizations (Xgobi-style) Color-coded class values “Jitter” option to deal with nominal attributes (and to detect “hidden” data points) “Zoom-in” function

126 11/10/2015University of Waikato126

127 11/10/2015University of Waikato127

128 11/10/2015University of Waikato128

129 11/10/2015University of Waikato129

130 11/10/2015University of Waikato130

131 11/10/2015University of Waikato131

132 11/10/2015University of Waikato132

133 11/10/2015University of Waikato133

134 11/10/2015University of Waikato134

135 11/10/2015University of Waikato135

136 11/10/2015University of Waikato136

137 11/10/2015University of Waikato137

138 11/10/2015University of Waikato138 Conclusion: try it yourself! WEKA is available at http://www.cs.waikato.ac.nz/ml/weka  Also has a list of projects based on WEKA  WEKA contributors: Abdelaziz Mahoui, Alexander K. Seewald, Ashraf M. Kibriya, Bernhard Pfahringer, Brent Martin, Peter Flach, Eibe Frank,Gabi Schmidberger,Ian H. Witten, J. Lindgren, Janice Boughton, Jason Wells, Len Trigg, Lucio de Souza Coelho, Malcolm Ware, Mark Hall,Remco Bouckaert, Richard Kirkby, Shane Butler, Shane Legg, Stuart Inglis, Sylvain Roy, Tony Voyle, Xin Xu, Yong Wang, Zhihai Wang


Download ppt "Department of Computer Science, University of Waikato, New Zealand Eibe Frank WEKA: A Machine Learning Toolkit The Explorer Classification and Regression."

Similar presentations


Ads by Google