Published by Mavis Hunt. Modified over 8 years ago.
1
Weka Tutorial
2
WEKA:: Introduction
A collection of open source ML algorithms
– pre-processing
– classifiers
– clustering
– association rules
Created by researchers at the University of Waikato in New Zealand
Java based
3
WEKA:: Installation
Download the software from http://www.cs.waikato.ac.nz/ml/weka/
– If you are interested in modifying or extending Weka, there is a developer version that includes the source code
Set the Weka environment variables for Java
– setenv WEKAHOME /usr/local/weka/weka-3-6-1
– setenv CLASSPATH $WEKAHOME/weka.jar:$CLASSPATH
Download some ML data from http://mlearn.ics.uci.edu/MLRepository.html
4
WEKA:: Introduction (contd.)
Routines are implemented as classes and logically arranged in packages
Comes with an extensive GUI
– Weka routines can also be used standalone via the command line
– E.g. java weka.classifiers.trees.J48 -t $WEKAHOME/data/iris.arff
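As a rough illustration of the command-line usage above, the following Python sketch only assembles the argument list for such a call; it does not launch Java. The helper name `build_weka_command` is hypothetical, and the class name `weka.classifiers.trees.J48` assumes a Weka 3.6 layout like the one named in the installation slide (older releases placed J48 in a different package).

```python
def build_weka_command(classifier, train_file, weka_home):
    """Return the argv list for running a Weka classifier from the command line.

    Hypothetical helper: it only builds the command, it does not run it.
    """
    # weka.jar is assumed to sit directly under the WEKAHOME directory.
    jar = f"{weka_home}/weka.jar"
    # -cp puts weka.jar on the Java classpath; -t names the training file.
    return ["java", "-cp", jar, classifier, "-t", train_file]


weka_home = "/usr/local/weka/weka-3-6-1"
cmd = build_weka_command(
    "weka.classifiers.trees.J48",   # assumed class name for Weka 3.6
    f"{weka_home}/data/iris.arff",
    weka_home,
)
print(" ".join(cmd))
```

Passing the list to `subprocess.run(cmd)` would execute it on a machine where Java and Weka are installed; using `-cp` here is an alternative to exporting `CLASSPATH` beforehand.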
5
WEKA:: Interface
6
WEKA:: Data format
Uses flat text files to describe the data
Can work with a wide variety of data files, including its own “.arff” format and the C4.5 file format
Data can be imported from a file in various formats:
– ARFF, CSV, C4.5, binary
Data can also be read from a URL or from an SQL database (using JDBC)
7
WEKA:: ARFF file format
@relation anneal
@attribute carbon numeric
@attribute hardness numeric
@attribute 'enamelability' {'?','1','2','3','4','5'}
@attribute cholesterol numeric
@attribute shape { COIL, SHEET}
@attribute class {'1','2','3','4','5','U'}
@data
'?','C','A',0,60,'T','?','?',0,'?','?','G','?','?','?','?','M','?','?','?','?','?','?','?','?','?','?','?','?','?','?','COIL',2.801,385.1,0,'?','0','?','3'
'?','C','A',0,60,'T','?','?',0,'?','?','G','?','?','?','?','B','Y','?','?','?','Y','?','?','?','?','?','?','?','?','?','SHEET',0.801,255,269,'?','0','?','3'
'?','C','A',0,45,'?','S','?',0,'?','?','D','?','?','?','?','?','?','?','?','?','?','?','?','?','?','?','?','?','?','?','COIL',1.6,610,0,'?','0','?','3'
...
A more thorough description is available at http://www.cs.waikato.ac.nz/~ml/weka/arff.html
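To make the layout of an ARFF file concrete, here is a minimal Python sketch of a reader for the header and data sections. It is not Weka's own loader: the function name `parse_arff` and the small `SAMPLE` text are made up for illustration, and the parser assumes attribute names without spaces and one instance per line.

```python
import csv


def parse_arff(text):
    """Parse a small ARFF document into (relation, attributes, rows)."""
    relation, attributes, rows = None, [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):      # skip blank lines and comments
            continue
        if in_data:
            # The csv module handles single-quoted values such as 'COIL'.
            rows.append(next(csv.reader([line], quotechar="'",
                                        skipinitialspace=True)))
            continue
        lower = line.lower()
        if lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            _, name, spec = line.split(None, 2)
            name = name.strip("'")
            if spec.startswith("{"):              # nominal: list of allowed values
                values = next(csv.reader([spec.strip("{}")], quotechar="'",
                                         skipinitialspace=True))
                attributes.append((name, [v.strip() for v in values]))
            else:                                 # e.g. numeric
                attributes.append((name, spec.lower()))
        elif lower.startswith("@data"):
            in_data = True
    return relation, attributes, rows


# Made-up sample in the same style as the slide, shortened to three attributes.
SAMPLE = """\
@relation anneal
@attribute carbon numeric
@attribute shape { COIL, SHEET}
@attribute class {'1','2','3','4','5','U'}
@data
0,'COIL','3'
45,'SHEET','U'
'?','COIL','3'
"""

relation, attributes, rows = parse_arff(SAMPLE)
print(relation, attributes, rows)
```

Missing values stay as the string `?`, which is how the slide's data rows mark them.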
8
WEKA:: Explorer: Preprocessing
Pre-processing tools in WEKA are called “filters”
WEKA contains filters for:
– Discretization, normalization, resampling, attribute selection, transforming and combining attributes, etc.
9
Annealing dataset: Description
The Annealing dataset is from the UCI repository of datasets. It contains records describing steel being annealed and its various properties. The dataset has 38 attributes, of which 6 are continuous, 3 are integer valued and the remaining 29 are nominal. It contains missing values and has 798 records in total, spread over 6 major classes. The notion of classes will be explained later during classification.
10
Data Cleaning: Removing missing values
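The slide does not say which filter was applied. One option Weka offers is the ReplaceMissingValues filter, which substitutes means for numeric attributes and modes for nominal ones. The Python sketch below illustrates that idea on a tiny made-up table; the function name `replace_missing` and the sample rows are assumptions, not Weka code.

```python
from collections import Counter

MISSING = "?"


def replace_missing(rows, numeric_columns):
    """Return a copy of rows with '?' replaced by the column mean or mode."""
    fill = []
    for j in range(len(rows[0])):
        present = [r[j] for r in rows if r[j] != MISSING]
        if not present:
            fill.append(MISSING)                  # nothing observed; leave as missing
        elif j in numeric_columns:
            fill.append(sum(float(v) for v in present) / len(present))
        else:
            fill.append(Counter(present).most_common(1)[0][0])

    cleaned = []
    for r in rows:
        new_row = []
        for j, v in enumerate(r):
            if v == MISSING:
                new_row.append(fill[j])
            elif j in numeric_columns:
                new_row.append(float(v))          # numeric columns become floats
            else:
                new_row.append(v)
        cleaned.append(new_row)
    return cleaned


# Made-up rows: column 0 is nominal (shape), column 1 is numeric (thickness).
rows = [["COIL", "2.0"], ["?", "4.0"], ["COIL", "?"], ["SHEET", "6.0"]]
cleaned = replace_missing(rows, numeric_columns={1})
print(cleaned)
```

Dropping every record that contains a missing value is the other obvious reading of "removing", but the sample rows in the ARFF slide are mostly `?`, so that approach would discard most of the data.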
11
Data Cleaning: Removing useless attributes
Earlier 38 attributes, now 32
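A simple notion of a "useless" attribute is one that never varies, for example a column that is entirely missing or holds a single value. Weka's RemoveUseless filter works along these lines, though whether the slide used it is an assumption. The sketch below, with the hypothetical helper `remove_constant_columns` and made-up data, shows the idea.

```python
def remove_constant_columns(header, rows, missing="?"):
    """Drop columns that have at most one distinct non-missing value."""
    keep = [
        j for j in range(len(header))
        if len({r[j] for r in rows if r[j] != missing}) > 1
    ]
    new_header = [header[j] for j in keep]
    new_rows = [[r[j] for j in keep] for r in rows]
    return new_header, new_rows


# Made-up example: 'family' is always missing and 'product' never changes.
header = ["family", "product", "shape", "class"]
rows = [
    ["?", "C", "COIL", "3"],
    ["?", "C", "SHEET", "3"],
    ["?", "C", "COIL", "U"],
]
new_header, new_rows = remove_constant_columns(header, rows)
print(new_header)
print(new_rows)
```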
12
Data transformation: Discretizing the attributes
– A bins value of 15 means each attribute is split into 15 bins
– An attribute range of first-last means all attributes are discretized
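To show what splitting an attribute into 15 bins means, here is a minimal Python sketch of equal-width binning. Equal-width binning is assumed here as the simplest scheme; the slide does not state which binning method was configured, and the function name and sample values are made up.

```python
def equal_width_bins(values, bins=15):
    """Map each numeric value to a bin index in the range 0..bins-1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]        # a constant column fits in one bin
    width = (hi - lo) / bins
    # The maximum value is placed in the last bin rather than a bin of its own.
    return [min(int((v - lo) / width), bins - 1) for v in values]


values = [0.0, 3.0, 7.5, 14.9, 15.0]      # made-up values spanning 0..15
bin_indices = equal_width_bins(values, bins=15)
print(bin_indices)
```

After this step each numeric attribute takes one of at most 15 discrete values, so it can be treated like a nominal attribute.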
13
Data reduction: Supervised attribute selection
Reducing the number of attributes from 32 to 10
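Supervised attribute selection scores each attribute by how well it predicts the class and keeps the best ones. The slide does not name the scoring method, so the Python sketch below assumes information gain as one common criterion and ranks the attributes of a made-up table. All function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(column, labels):
    """Reduction in class entropy after splitting on the attribute's values."""
    groups = defaultdict(list)
    for value, label in zip(column, labels):
        groups[value].append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def select_top_k(header, rows, class_index, k):
    """Return the k attribute names with the highest information gain."""
    labels = [r[class_index] for r in rows]
    scores = {
        name: information_gain([r[j] for r in rows], labels)
        for j, name in enumerate(header) if j != class_index
    }
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Made-up data: 'shape' determines the class, 'steel' carries no information.
header = ["shape", "steel", "class"]
rows = [
    ["COIL", "A", "3"],
    ["COIL", "B", "3"],
    ["SHEET", "A", "U"],
    ["SHEET", "B", "U"],
]
selected, scores = select_top_k(header, rows, class_index=2, k=1)
print(selected, scores)
```

The same ranking applied with k=10 to the 32 remaining attributes would mirror the reduction described on the slide, assuming an information-gain style evaluator was used.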
14
Viewing and understanding the transformed data
This can be done using the ARFF viewer option in Weka. It also allows us to save files in other formats such as CSV. Options for converting ARFF to CSV and vice versa are also available. After this conversion, the files can easily be imported into MySQL and other databases.
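The ARFF-to-CSV conversion mentioned above amounts to turning the attribute names into a header row and copying the data rows. The following Python sketch shows that idea with the standard csv module; it is not Weka's converter, and the function name and sample text are made up.

```python
import csv
import io


def arff_to_csv(arff_text):
    """Convert a simple ARFF document to CSV text with a header row."""
    names, rows, in_data = [], [], False
    for raw in arff_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):      # skip blank lines and comments
            continue
        if in_data:
            rows.append(next(csv.reader([line], quotechar="'",
                                        skipinitialspace=True)))
        elif line.lower().startswith("@attribute"):
            # Assumes attribute names contain no spaces.
            names.append(line.split(None, 2)[1].strip("'"))
        elif line.lower().startswith("@data"):
            in_data = True
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(names)                        # attribute names become the header
    writer.writerows(rows)
    return out.getvalue()


# Made-up two-attribute example.
arff_text = """\
@relation demo
@attribute shape {COIL,SHEET}
@attribute thick numeric
@data
'COIL',2.801
'SHEET',0.801
"""
csv_text = arff_to_csv(arff_text)
print(csv_text)
```

The attribute types are lost in the CSV output, so converting back to ARFF requires declaring or re-inferring them.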
15
Data is now ready for data mining!