Machine Learning with Weka

1 Machine Learning with Weka
Cornelia Caragea
Thanks to Eibe Frank for some of the slides

2 Outline
- Weka: A Machine Learning Toolkit
  - Implementations of state-of-the-art learning algorithms
  - Main strength is classification
  - Also regression, association rules, and clustering algorithms
  - Extensible, so new learning schemes can be tried
  - Large variety of handy tools (transforming datasets, filters, visualization, etc.)
- Preparing Data
- Building Classifiers

3 WEKA: the software
- Machine learning/data mining software written in Java (distributed under the GNU General Public License)
- Used for research, education, and applications
- Main features:
  - Comprehensive set of data pre-processing tools, learning algorithms, and evaluation methods
  - Graphical user interfaces (incl. data visualization)
  - Environment for comparing learning algorithms
- Collection of machine learning algorithms for data mining tasks, which can be:
  - Applied directly to a dataset
  - Called from your own Java code

4 WEKA: versions
There are several versions of WEKA:
- WEKA “book version”: compatible with the description in the data mining book
- WEKA “GUI version”: adds graphical user interfaces (the book version is command-line only)
- WEKA “development version”: with lots of improvements

5 WEKA: resources
- API documentation, tutorials, source code
- WEKA mailing list
- Book: Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations
- Weka-related projects:
  - Weka-Parallel: parallel processing for Weka
  - RWeka: linking R and Weka
  - YALE: Yet Another Learning Environment
  - Many others…

6 Weka: web site

7 WEKA: launching
    java -jar weka.jar

8 Outline
- Weka: A Machine Learning Toolkit
- Preparing Data
- Building Classifiers

9 WEKA only deals with “flat” files
Flat file in ARFF format:

    @relation heart-disease-simplified

    @attribute age numeric
    @attribute sex { female, male}
    @attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
    @attribute cholesterol numeric
    @attribute exercise_induced_angina { no, yes}
    @attribute class { present, not_present}

    @data
    63,male,typ_angina,233,no,not_present
    67,male,asympt,286,yes,present
    67,male,asympt,229,yes,present
    38,female,non_anginal,?,no,not_present
    ...

ARFF data format:
- Header: describes the attribute types
- Data: the instances (examples), each a comma-separated list of values
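The header/data layout above can be read with very little code. The following is a minimal sketch, assuming only the ARFF subset that appears on the slide (`@relation`, `numeric` and nominal `@attribute` lines, `@data`, and `?` for a missing value). The class `ArffSketch` and its members are hypothetical names for illustration and are not part of WEKA; WEKA ships its own ARFF loader, which handles far more of the format.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not a WEKA class): parses the small ARFF subset shown on the slide.
class ArffSketch {
    // The example file from the slide, without the trailing "..." line.
    static final String SAMPLE = String.join("\n",
        "@relation heart-disease-simplified",
        "@attribute age numeric",
        "@attribute sex { female, male}",
        "@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}",
        "@attribute cholesterol numeric",
        "@attribute exercise_induced_angina { no, yes}",
        "@attribute class { present, not_present}",
        "@data",
        "63,male,typ_angina,233,no,not_present",
        "67,male,asympt,286,yes,present",
        "67,male,asympt,229,yes,present",
        "38,female,non_anginal,?,no,not_present");

    String relation;
    final List<String> attributeNames = new ArrayList<>();
    // One entry per attribute: the allowed values if nominal, null otherwise.
    final List<List<String>> nominalValues = new ArrayList<>();
    final List<String[]> rows = new ArrayList<>();

    static ArffSketch parse(String text) {
        ArffSketch arff = new ArffSketch();
        boolean inData = false;
        for (String raw : text.split("\\R")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("%")) continue; // blank line or ARFF comment
            String lower = line.toLowerCase();
            if (inData) {
                String[] cells = line.split(",");
                for (int i = 0; i < cells.length; i++) cells[i] = cells[i].trim();
                if (cells.length != arff.attributeNames.size()) {
                    throw new IllegalArgumentException("Wrong number of values: " + line);
                }
                arff.rows.add(cells);
            } else if (lower.startsWith("@relation")) {
                arff.relation = line.substring("@relation".length()).trim();
            } else if (lower.startsWith("@attribute")) {
                // Assumes the name and the type are separated by a single space.
                String rest = line.substring("@attribute".length()).trim();
                int split = rest.indexOf(' ');
                arff.attributeNames.add(rest.substring(0, split));
                String type = rest.substring(split).trim();
                if (type.startsWith("{")) {
                    List<String> values = new ArrayList<>();
                    String inner = type.substring(1, type.lastIndexOf('}'));
                    for (String v : inner.split(",")) values.add(v.trim());
                    arff.nominalValues.add(values);
                } else {
                    arff.nominalValues.add(null); // treated as non-nominal in this sketch
                }
            } else if (lower.startsWith("@data")) {
                inData = true;
            }
        }
        return arff;
    }

    boolean isMissing(int row, int attribute) {
        return rows.get(row)[attribute].equals("?");
    }

    public static void main(String[] args) {
        ArffSketch arff = parse(SAMPLE);
        System.out.println(arff.relation + ": " + arff.attributeNames.size()
            + " attributes, " + arff.rows.size() + " instances");
    }
}
```

Quoted values, sparse instances, and the string and date attribute types are deliberately left out to keep the sketch short.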

10 WEKA only deals with “flat” files
The same ARFF file as on the previous slide, with two callouts:
- numeric attribute: declared with the keyword numeric (age, cholesterol)
- nominal attribute: declared with its set of allowed values in braces (sex, chest_pain_type, exercise_induced_angina, class)

11 Explorer: pre-processing the data
- Data can be imported from a file in various formats: ARFF, CSV, binary
- Data can also be read from a URL or from an SQL database (using JDBC)
- Pre-processing tools in WEKA are called “filters”
- WEKA contains filters for discretization, normalization, resampling, attribute selection, …
- Useful support for data preprocessing: removing or adding attributes, resampling the dataset, removing examples, etc.
- Stratified cross-validation folds can be created from the given dataset, with class distributions approximately retained within each fold
- Typically, the data is split 2/3 for training and 1/3 for testing
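To make the idea of stratified folds concrete, here is a minimal sketch, assuming the class labels are available as a list of strings. It shuffles the instances within each class and deals them out to the folds in turn, so every fold receives roughly the same class proportions. With three folds, holding out one fold gives the 2/3 training and 1/3 testing split mentioned above. The class `StratifiedFoldsSketch` and the method `assignFolds` are hypothetical names; this is not how WEKA's own filters are implemented.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical helper (not a WEKA class): assigns each instance to one of k stratified folds.
class StratifiedFoldsSketch {
    /** Returns, for each instance, the index (0..k-1) of the fold it is assigned to. */
    static int[] assignFolds(List<String> labels, int k, long seed) {
        // Group the instance indices by class label.
        Map<String, List<Integer>> byClass = new LinkedHashMap<>();
        for (int i = 0; i < labels.size(); i++) {
            byClass.computeIfAbsent(labels.get(i), c -> new ArrayList<>()).add(i);
        }
        int[] fold = new int[labels.size()];
        Random rng = new Random(seed);
        int next = 0; // keeps counting across classes so fold sizes stay balanced
        for (List<Integer> indices : byClass.values()) {
            Collections.shuffle(indices, rng); // random order within the class
            for (int idx : indices) {
                fold[idx] = next % k;          // deal out round-robin
                next++;
            }
        }
        return fold;
    }

    public static void main(String[] args) {
        // Toy labels for illustration only: six "present", three "not_present".
        List<String> labels = new ArrayList<>();
        for (int i = 0; i < 6; i++) labels.add("present");
        for (int i = 0; i < 3; i++) labels.add("not_present");

        int[] fold = assignFolds(labels, 3, 42L);
        // Use fold 0 as the test set (1/3) and folds 1 and 2 as the training set (2/3).
        for (int i = 0; i < fold.length; i++) {
            System.out.println(labels.get(i) + " -> " + (fold[i] == 0 ? "test" : "train"));
        }
    }
}
```

Because each class is dealt out separately, the per-class counts in any two folds differ by at most one.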

22 Outline
- Weka: A Machine Learning Toolkit
- Preparing Data
- Building Classifiers

23 Explorer: building “classifiers”
- Classifiers in WEKA are models for predicting nominal or numeric quantities
- Implemented learning schemes include: decision trees, support vector machines, perceptrons, neural networks, logistic regression, Bayes nets, …
- “Meta”-classifiers include: bagging, boosting, stacking, …
- A classifier model is a mapping from the dataset attributes to the class (target) attribute; how the model is created and what form it takes differ between learning schemes
- Examples: decision tree and Naïve Bayes classifiers
- Which one is the best? No Free Lunch!
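As an illustration of a classifier as a mapping from attribute values to a class, here is a minimal sketch of a Naïve Bayes classifier for nominal attributes only, with add-one (Laplace) smoothing. The smoothing choice, the class name `NominalNaiveBayesSketch`, and its methods are assumptions made for this example; this is not WEKA's NaiveBayes implementation, which also handles numeric attributes and missing values. The training rows reuse the nominal columns of the four heart-disease instances shown earlier.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper (not a WEKA class): Naive Bayes over nominal attributes.
class NominalNaiveBayesSketch {
    private final Map<String, Integer> classCounts = new LinkedHashMap<>();
    // Key "attributeIndex|value|class" -> number of training rows with that combination.
    // Assumes values do not themselves contain the '|' character.
    private final Map<String, Integer> valueCounts = new HashMap<>();
    private final List<Set<String>> distinctValues = new ArrayList<>();
    private int total = 0;

    /** Each row holds the nominal attribute values followed by the class label in the last cell. */
    void train(List<String[]> rows) {
        for (String[] row : rows) {
            int classIdx = row.length - 1;
            String cls = row[classIdx];
            classCounts.merge(cls, 1, Integer::sum);
            total++;
            for (int a = 0; a < classIdx; a++) {
                while (distinctValues.size() <= a) distinctValues.add(new LinkedHashSet<>());
                distinctValues.get(a).add(row[a]);
                valueCounts.merge(a + "|" + row[a] + "|" + cls, 1, Integer::sum);
            }
        }
    }

    /** Returns the class with the highest smoothed posterior score for the given attribute values. */
    String predict(String[] attributes) {
        String best = null;
        double bestLog = Double.NEGATIVE_INFINITY;
        int numClasses = classCounts.size();
        for (Map.Entry<String, Integer> e : classCounts.entrySet()) {
            String cls = e.getKey();
            int n = e.getValue();
            // Log of the smoothed class prior.
            double logP = Math.log((n + 1.0) / (total + numClasses));
            for (int a = 0; a < attributes.length; a++) {
                int count = valueCounts.getOrDefault(a + "|" + attributes[a] + "|" + cls, 0);
                int numValues = distinctValues.get(a).size();
                // Log of the smoothed conditional probability of this value given the class.
                logP += Math.log((count + 1.0) / (n + numValues));
            }
            if (logP > bestLog) {
                bestLog = logP;
                best = cls;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Columns: sex, chest_pain_type, exercise_induced_angina, class.
        List<String[]> rows = Arrays.asList(
            new String[] {"male", "typ_angina", "no", "not_present"},
            new String[] {"male", "asympt", "yes", "present"},
            new String[] {"male", "asympt", "yes", "present"},
            new String[] {"female", "non_anginal", "no", "not_present"});
        NominalNaiveBayesSketch nb = new NominalNaiveBayesSketch();
        nb.train(rows);
        System.out.println(nb.predict(new String[] {"male", "asympt", "yes"}));
    }
}
```

Four instances are far too few to learn anything meaningful; the point is only the shape of the mapping, which a decision tree would represent quite differently.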

27 Class for building and using a 0-R classifier
- Majority class classifier
- Predicts the mean (for a numeric class) or the mode (for a nominal class)
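The 0-R rule is simple enough to write out directly. The following is a minimal sketch, assuming the class values have already been extracted from the training data; it ignores all other attributes, which is exactly what makes 0-R a useful baseline. `ZeroRSketch`, `mode`, and `mean` are hypothetical names and not WEKA's API.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not a WEKA class): the prediction rule of a 0-R classifier.
class ZeroRSketch {
    /** Nominal class: the most frequent value. Ties go to the value seen first; null if empty. */
    static String mode(List<String> classValues) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String v : classValues) counts.merge(v, 1, Integer::sum);
        String best = null;
        int bestCount = -1;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    /** Numeric class: the arithmetic mean (NaN for an empty array). */
    static double mean(double[] classValues) {
        double sum = 0.0;
        for (double v : classValues) sum += v;
        return sum / classValues.length;
    }

    public static void main(String[] args) {
        // Toy labels for illustration only.
        List<String> labels = Arrays.asList("present", "not_present", "present");
        System.out.println("Always predict: " + mode(labels));

        // The three known cholesterol values from the ARFF example, used here as a numeric target.
        System.out.println("Always predict: " + mean(new double[] {233, 286, 229}));
    }
}
```

Any learning scheme worth using should beat this baseline on held-out data.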

32 Outline
- Machine Learning Software
- Preparing Data
- Building Classifiers

33 To Do
- Try Naïve Bayes and Logistic Regression classifiers on a different Weka dataset
- Use various parameters
- Try Linear regression

