Tutorial for WEKA Heejun Kim June 19, 2018.


1 Tutorial for WEKA Heejun Kim June 19, 2018

2 WEKA
A machine learning toolkit developed at the University of Waikato, New Zealand
Comprehensive collection of data processing and modeling algorithms
Java-based implementation
Both a GUI and a programming API are available

3 The workflow of text mining with WEKA
Preparing data
Extracting features (pre-processing)
Building model
Making prediction
Error analysis
Callout: steps that are ideal for WEKA
Callout: neural network, deep learning with some tweaks

4 Installing WEKA
You should have a JRE (1.8 preferred) or a JDK
Download and install WEKA from this link
JRE stands for Java Runtime Environment: it provides the Java virtual machine and libraries needed to run Java-based programs.

5 Preparing Feature Table for WEKA
File format: ARFF (Attribute-Relation File Format), the default, or CSV (comma-delimited text), which may not work with some functionality
Encoding: UTF-8 is recommended
The data should be prepared as a single table
ARFF has a header that describes each feature (binary, numeric, and so on)
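To make the ARFF header structure concrete, here is a minimal sketch that renders a small feature table as ARFF text. The relation name, feature names, and rows are hypothetical and only for illustration; the helper `to_arff` is not part of WEKA.

```python
def to_arff(relation, attributes, rows):
    """Render a feature table as ARFF text.

    attributes: list of (name, kind) pairs, where kind is either the string
    "numeric" or a list of allowed nominal values.
    """
    lines = [f"@relation {relation}", ""]
    for name, kind in attributes:
        if isinstance(kind, str):
            # Numeric (or other built-in) attribute type
            lines.append(f"@attribute {name} {kind}")
        else:
            # Nominal attribute: allowed values listed inside braces
            lines.append(f"@attribute {name} {{{','.join(kind)}}}")
    lines += ["", "@data"]
    for row in rows:
        lines.append(",".join(str(v) for v in row))
    return "\n".join(lines) + "\n"


# Hypothetical feature table: one instance per row, one feature per column
attributes = [
    ("word_count", "numeric"),
    ("has_url", ["yes", "no"]),
    ("label", ["spam", "ham"]),
]
rows = [(120, "yes", "spam"), (45, "no", "ham")]

arff_text = to_arff("toy_messages", attributes, rows)
print(arff_text)
```

Saving `arff_text` to a UTF-8 file with an `.arff` extension gives a file that can be opened from the Explorer.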

6 An example of feature table
[Screenshot of a feature table. Callouts: name of features; instance of data]

7 Launching WEKA Explorer
Select Explorer

8 Opening Data & Checking Distribution
Open file (file format: CSV)
Save the file as ARFF (recommended)
Filter (e.g., normalization and transforming attribute type, if necessary)
Check the distribution of each attribute
Remove attributes (if necessary)
Filter (unsupervised):
normalize: rescales all values to the range 0 to 1
center: shifts values so that the mean is 0 (a full z-transformation, with unit variance as well, is what the standardize filter does)
you can also transform data types (e.g., numeric to binary)
Remove attribute: you can check whether each attribute is useful or not
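The difference between the numeric filters is easier to see on a few numbers. The sketch below is plain Python, not WEKA code, and the sample values are made up; it assumes the population standard deviation for standardizing, which may differ from what WEKA uses internally.

```python
from statistics import mean, pstdev


def normalize(values):
    """Rescale values linearly to the range 0 to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: nothing to rescale
    return [(v - lo) / (hi - lo) for v in values]


def center(values):
    """Shift values so that their mean is 0 (spread is unchanged)."""
    m = mean(values)
    return [v - m for v in values]


def standardize(values):
    """z-transformation: mean 0 and standard deviation 1.

    Assumption: population standard deviation is used here.
    """
    m, s = mean(values), pstdev(values)
    if s == 0:
        return [0.0 for _ in values]
    return [(v - m) / s for v in values]


sample = [2.0, 4.0, 6.0]  # hypothetical attribute values
normalized = normalize(sample)
centered = center(sample)
standardized = standardize(sample)
print(normalized, centered, standardized)
```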

9 Classification
Choose an algorithm and options
Select test options (e.g., cross-validation (CV), independent test set, and output predictions)
Start modeling + prediction
Check performance and prediction result
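As a rough illustration of what the cross-validation test option does, the sketch below splits instances into k folds, trains on k-1 folds, and tests on the held-out fold. It is not WEKA's implementation: the "classifier" is a simple majority-class baseline, the labels are invented, and the folds are not stratified.

```python
import random
from collections import Counter


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def majority_label(labels):
    """Baseline 'model': always predict the most frequent training label."""
    return Counter(labels).most_common(1)[0][0]


def cross_validate(labels, k=10, seed=0):
    """Return the accuracy of the majority baseline under k-fold CV."""
    folds = k_fold_indices(len(labels), k, seed)
    correct = 0
    for test_idx in folds:
        held_out = set(test_idx)
        # Train only on instances outside the held-out fold
        train = [labels[i] for i in range(len(labels)) if i not in held_out]
        guess = majority_label(train)
        correct += sum(1 for i in test_idx if labels[i] == guess)
    return correct / len(labels)


# Hypothetical class labels: 70 "ham" and 30 "spam" instances
labels = ["ham"] * 70 + ["spam"] * 30
folds = k_fold_indices(len(labels), 10)
accuracy = cross_validate(labels, k=10)
print(accuracy)
```

Every instance is used for testing exactly once, so the reported accuracy covers the whole data set without ever testing on training data.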

10 Visualization of Results
Select attribute for each axis
Jitter (avoids overlapping points)
Distribution of values of each attribute
Correctly identified instances (crosses)
Incorrectly identified instances (squares)
Legend (change color)

11 Selecting features
Choose an evaluator
Select a search method
Start feature selection
Result
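The split between an evaluator and a search method can be sketched in a few lines: the evaluator scores attributes, and the search method decides how candidates are ordered or explored. The example below assumes information gain as the evaluator and a plain ranking as the search; it is an illustration in plain Python with invented data, not WEKA's code.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Evaluator: reduction in class entropy after splitting on a nominal feature."""
    total = len(labels)
    remainder = 0.0
    for value in set(feature_values):
        subset = [lab for f, lab in zip(feature_values, labels) if f == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


def rank_features(table, labels):
    """Search method: score every feature once and sort by score, best first."""
    scores = {name: information_gain(col, labels) for name, col in table.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical nominal features and class labels
table = {
    "has_url": ["yes", "yes", "no", "no"],
    "is_long": ["yes", "no", "yes", "no"],
}
labels = ["spam", "spam", "ham", "ham"]

ranking = rank_features(table, labels)
print(ranking)
```

Here `has_url` separates the two classes perfectly and ranks first, while `is_long` carries no information about the class.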

12 Selecting features (evaluator + search method)
J(0,1) 1 0 Courtesy of Andrew NG

13 Selecting features (evaluator + search method)
J(0,1) 1 0 Courtesy of Andrew NG

14 Clustering
Select “iris.arff” data in the Preprocess menu
Choose an algorithm and options (set “numClusters” to 3)
Select cluster mode (set “Classes to clusters evaluation” accordingly)
Start modeling + prediction
Check performance and prediction result
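The idea behind a classes-to-clusters style evaluation can be sketched as follows: the class attribute is ignored during clustering, then each cluster is matched to a class and the instances that disagree are counted as errors. This sketch makes a simplifying assumption (each cluster is mapped to its majority class) and uses invented cluster assignments, so it will not necessarily reproduce WEKA's numbers.

```python
from collections import Counter, defaultdict


def classes_to_clusters(cluster_ids, class_labels):
    """Map each cluster to its majority class and return (mapping, error rate)."""
    members = defaultdict(list)
    for cid, label in zip(cluster_ids, class_labels):
        members[cid].append(label)
    # Simplifying assumption: majority vote per cluster
    mapping = {
        cid: Counter(labs).most_common(1)[0][0] for cid, labs in members.items()
    }
    wrong = sum(
        1 for cid, label in zip(cluster_ids, class_labels) if mapping[cid] != label
    )
    return mapping, wrong / len(class_labels)


# Hypothetical output of a 3-cluster run on nine iris-like instances
cluster_ids = [0, 0, 0, 1, 1, 1, 2, 2, 2]
class_labels = [
    "setosa", "setosa", "setosa",
    "versicolor", "versicolor", "virginica",
    "virginica", "virginica", "versicolor",
]

mapping, error_rate = classes_to_clusters(cluster_ids, class_labels)
print(mapping, error_rate)
```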

15 Resources for further study
Data Mining with WEKA MOOC (by the University of Waikato)
WEKA Wiki
WEKA manual (included in distribution)
Weka Explorer tutorial (by CSU, Sacramento)

16 Any questions?

