Weka Tutorial. WEKA:: Introduction A collection of open source ML algorithms – pre-processing – classifiers – clustering – association rule Created by.


1 Weka Tutorial

2 WEKA:: Introduction
A collection of open source ML algorithms:
– pre-processing
– classifiers
– clustering
– association rules
Created by researchers at the University of Waikato in New Zealand
Java based

3 WEKA:: Installation
Download the software from http://www.cs.waikato.ac.nz/ml/weka/
– If you are interested in modifying or extending Weka, there is a developer version that includes the source code
Set the Weka environment variables for Java:
– setenv WEKAHOME /usr/local/weka/weka-3-6-1
– setenv CLASSPATH $WEKAHOME/weka.jar:$CLASSPATH
Download some ML data from http://mlearn.ics.uci.edu/MLRepository.html

4 WEKA:: Introduction (contd.)
Routines are implemented as classes and logically arranged in packages
Comes with an extensive GUI
– Weka routines can also be used standalone from the command line, e.g. java weka.classifiers.trees.J48 -t $WEKAHOME/data/iris.arff

5 WEKA:: Interface

6 WEKA:: Data format
Uses flat text files to describe the data
Can work with a wide variety of data files, including its own “.arff” format and the C4.5 file format
Data can be imported from a file in various formats:
– ARFF, CSV, C4.5, binary
Data can also be read from a URL or from an SQL database (using JDBC)

7 WEKA:: ARFF file format
@relation anneal
@attribute carbon numeric
@attribute hardness numeric
@attribute 'enamelability' {'?','1','2','3','4','5'}
@attribute cholesterol numeric
@attribute shape { COIL, SHEET}
@attribute class {'1','2','3','4','5','U'}
@data
'?','C','A',0,60,'T','?','?',0,'?','?','G','?','?','?','?','M','?','?','?','?','?','?','?','?','?','?','?','?','?','?','COIL',2.801,385.1,0,'?','0','?','3'
'?','C','A',0,60,'T','?','?',0,'?','?','G','?','?','?','?','B','Y','?','?','?','Y','?','?','?','?','?','?','?','?','?','SHEET',0.801,255,269,'?','0','?','3'
'?','C','A',0,45,'?','S','?',0,'?','?','D','?','?','?','?','?','?','?','?','?','?','?','?','?','?','?','?','?','?','?','COIL',1.6,610,0,'?','0','?','3'
...
A more thorough description is available here: http://www.cs.waikato.ac.nz/~ml/weka/arff.html
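To make the layout of an ARFF file concrete, here is a minimal sketch of a reader in plain Python. It is not Weka's parser: it assumes single-quoted values, one declaration per line, and no sparse or date attributes, and the function names (parse_arff and its helpers) are made up for this illustration.

```python
import csv


def _split(line):
    """Split one comma-separated line, honouring single-quoted values."""
    return next(csv.reader([line], quotechar="'", skipinitialspace=True))


def _split_attribute(rest):
    """Turn the text after '@attribute' into (name, type)."""
    rest = rest.strip()
    if rest[0] in "'\"":                      # quoted attribute name
        end = rest.index(rest[0], 1)
        name, spec = rest[1:end], rest[end + 1:].strip()
    else:
        name, spec = rest.split(None, 1)
        spec = spec.strip()
    if spec.startswith('{'):                  # nominal: list of allowed values
        return name, [v.strip() for v in _split(spec.strip('{}'))]
    return name, spec.lower()                 # e.g. 'numeric'


def parse_arff(text):
    """Parse a small ARFF document into (relation, attributes, rows)."""
    relation, attributes, rows = None, [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith('%'):  # skip blanks and comments
            continue
        if in_data:
            rows.append(_split(line))
            continue
        keyword = line.split(None, 1)[0].lower()
        if keyword == '@relation':
            relation = line.split(None, 1)[1].strip().strip("'\"")
        elif keyword == '@attribute':
            attributes.append(_split_attribute(line.split(None, 1)[1]))
        elif keyword == '@data':
            in_data = True
    return relation, attributes, rows
```

The header gives each attribute a name and a type (numeric, or a brace-enclosed list of nominal values); everything after @data is one comma-separated record per line, with '?' marking a missing value.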

8 WEKA:: Explorer: Preprocessing
Pre-processing tools in WEKA are called “filters”
WEKA contains filters for:
– Discretization, normalization, resampling, attribute selection, transforming and combining attributes, etc.

9 Annealing dataset: Description
The Annealing dataset is from the UCI repository of datasets. It describes steel being annealed and its various properties. The dataset has 38 attributes, of which 6 are continuous, 3 are integer valued, and the remaining 29 are nominal. It contains missing values and has 798 records in total, spread across 6 major classes. The notion of classes will be explained later, during classification.

10 Data Cleaning: Removing missing values:
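The slide does not say which filter was applied. One common approach, and the one Weka's ReplaceMissingValues filter takes, is to fill numeric attributes with the column mean and nominal attributes with the column mode. The sketch below shows that idea in plain Python; the function name and the numeric_columns parameter are assumptions for illustration, not Weka's API.

```python
from collections import Counter

MISSING = '?'


def replace_missing(rows, numeric_columns):
    """Return a copy of rows with '?' filled in per column.

    Columns listed in numeric_columns get the column mean (and are
    converted to float); all other columns get the most frequent
    non-missing value.
    """
    fill = []
    for col in range(len(rows[0])):
        present = [r[col] for r in rows if r[col] != MISSING]
        if not present:
            fill.append(MISSING)              # nothing to learn from
        elif col in numeric_columns:
            fill.append(sum(float(v) for v in present) / len(present))
        else:
            fill.append(Counter(present).most_common(1)[0][0])

    cleaned = []
    for r in rows:
        new_row = []
        for col, value in enumerate(r):
            if value == MISSING:
                new_row.append(fill[col])
            elif col in numeric_columns:
                new_row.append(float(value))
            else:
                new_row.append(value)
        cleaned.append(new_row)
    return cleaned
```

A column that is missing everywhere is left untouched here, since there is no value to impute from.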

11 Data Cleaning: Removing useless attributes
The attribute count drops from 38 to 32.
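As a rough illustration of what "useless" can mean, the sketch below drops every attribute that never varies, including attributes that are missing in every record. Weka ships a RemoveUseless filter for this kind of cleanup, but the code here is a simplified stand-in with a made-up function name, not a reproduction of that filter.

```python
def drop_constant_columns(names, rows, missing='?'):
    """Drop columns whose non-missing values are all identical (or absent)."""
    keep = [
        col for col in range(len(names))
        if len({r[col] for r in rows if r[col] != missing}) > 1
    ]
    kept_names = [names[col] for col in keep]
    kept_rows = [[r[col] for col in keep] for r in rows]
    return kept_names, kept_rows
```

An attribute with a single observed value carries no information for separating classes, so removing it shrinks the data without losing anything a classifier could use.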

12 Data transformation: Discretizing the attributes
Setting bins to 15 splits each attribute into 15 bins. An attribute range of first-last applies the filter to all attributes.
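A minimal sketch of equal-width binning with 15 bins follows, assuming the values of one numeric attribute are already available as a list of numbers. The function name is hypothetical, and the handling of values that fall exactly on a bin boundary may differ from Weka's Discretize filter.

```python
def equal_width_bins(values, bins=15):
    """Map each numeric value to a bin index in the range 0..bins-1."""
    lo, hi = min(values), max(values)
    if hi == lo:                       # constant column: everything in bin 0
        return [0] * len(values)
    width = (hi - lo) / bins
    # The maximum value would land in bin `bins`, so clamp it to the last bin.
    return [min(int((v - lo) / width), bins - 1) for v in values]
```

Applying the function to every numeric column in turn corresponds to the first-last attribute range.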

13 Data reduction: Supervised attribute selection
Reduces the number of attributes from 32 to 10.
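The slide does not name the evaluator or search method that was used. As one illustration of supervised selection, the sketch below ranks nominal attributes by information gain with respect to the class and keeps the top k. This is a swapped-in technique chosen for brevity, and all function names are assumptions.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(column, labels):
    """Reduction in class entropy after splitting on one nominal column."""
    groups = defaultdict(list)
    for value, label in zip(column, labels):
        groups[value].append(label)
    remainder = sum(len(g) / len(labels) * entropy(g)
                    for g in groups.values())
    return entropy(labels) - remainder


def top_k_attributes(names, rows, labels, k=10):
    """Return the k attribute names with the highest information gain."""
    gains = {
        name: information_gain([r[i] for r in rows], labels)
        for i, name in enumerate(names)
    }
    return sorted(names, key=lambda n: gains[n], reverse=True)[:k]
```

It is called "supervised" because the ranking uses the class labels; an unsupervised filter would look only at the attribute values.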

14 Viewing and understanding the transformed data
This can be done using the ARFF viewer in Weka, which can also save files in other formats such as CSV. Converters from ARFF to CSV and back are available as well. After conversion, the files can easily be imported into MySQL and other databases.
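For a sense of what such a conversion involves, here is a small sketch that writes already-parsed records as CSV text using Python's standard csv module. The function name is hypothetical, and writing '?' as an empty field is an assumption made here so that a database import can treat it as NULL; Weka's own converter may represent missing values differently.

```python
import csv
import io


def rows_to_csv(names, rows, missing='?'):
    """Render attribute names and records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator='\n')
    writer.writerow(names)
    for row in rows:
        # Missing values become empty fields.
        writer.writerow(['' if value == missing else value for value in row])
    return buffer.getvalue()
```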

15 Data is now ready for data mining!

