Data Mining CSCI 307, Spring 2019 Lecture 7


1 Data Mining CSCI 307, Spring 2019 Lecture 7
Output: Trees WEKA intro

2 Can Use Trees for Numeric Prediction Too
Regression: the process of computing an expression that predicts a numeric quantity
Regression tree: a “decision tree” where each leaf predicts a numeric quantity
  Predicted value is the average value of the training instances that reach the leaf
Model tree: a “regression tree” with linear regression models at the leaf nodes
  Linear patches approximate a continuous function
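To make the leaf rule concrete, here is a minimal sketch of a one-split regression tree in which each leaf predicts the average target value of the training instances that reach it. The toy data, the split threshold, and the function names (fit_stump, predict_stump) are assumptions made for illustration; they are not from the lecture or from WEKA.

```python
from statistics import mean

def fit_stump(xs, ys, threshold):
    """Build a one-split regression tree on a single numeric attribute.

    Each leaf stores the mean target of the training instances routed to it.
    The threshold is supplied by hand here (an assumption); a real learner
    would search for the split that best reduces the error.
    """
    left = [y for x, y in zip(xs, ys) if x <= threshold]
    right = [y for x, y in zip(xs, ys) if x > threshold]
    return {"threshold": threshold, "left": mean(left), "right": mean(right)}

def predict_stump(tree, x):
    """Route x to a leaf and return that leaf's stored average."""
    return tree["left"] if x <= tree["threshold"] else tree["right"]

# Hypothetical toy data: one attribute, one numeric target.
xs = [1, 2, 3, 10, 11, 12]
ys = [5.0, 6.0, 7.0, 20.0, 22.0, 24.0]

tree = fit_stump(xs, ys, threshold=5)
print(predict_stump(tree, 2))   # average of 5.0, 6.0, 7.0
print(predict_stump(tree, 11))  # average of 20.0, 22.0, 24.0
```

Every instance that lands in the same leaf gets the same prediction, which is why a regression tree produces a step-shaped approximation.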

3 Linear Regression for the CPU Data
PRP = -56.1 MYCT MMIN MMAX CACH CHMIN CHMAX

4 Regression Tree for the CPU Data

5 Model Tree for the CPU Data
LM1 PRP = MMAX CHMIN
LM2 PRP = MMIN – 3.99CHMIN CHMAX
LM3 PRP = MMIN
LM4 PRP = MMAX CACH CHMAX
LM5 PRP = 285 – 1.46MYCT CACH -9.39CHMIN
LM6 PRP = MMIN – 2.94CHMIN CHMAX
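The idea behind a model tree, a linear model at each leaf instead of a single average, can be sketched as follows. This is an illustrative example only: it uses a single hypothetical attribute, a hand-picked threshold, and made-up data rather than the CPU data or the LM1 to LM6 models, and the helper names (fit_line, fit_model_stump, predict_model_tree) are assumptions.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def fit_model_stump(xs, ys, threshold):
    """One-split model tree: each leaf holds its own linear model."""
    left = [(x, y) for x, y in zip(xs, ys) if x <= threshold]
    right = [(x, y) for x, y in zip(xs, ys) if x > threshold]
    return {
        "threshold": threshold,
        "left": fit_line(*zip(*left)),
        "right": fit_line(*zip(*right)),
    }

def predict_model_tree(tree, x):
    """Route x to a leaf, then evaluate that leaf's linear model."""
    slope, intercept = tree["left"] if x <= tree["threshold"] else tree["right"]
    return slope * x + intercept

# Hypothetical data sampled from the continuous function y = |x - 3|.
xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [3.0, 2.0, 1.0, 0.0, 1.0, 2.0, 3.0, 4.0]

model_tree = fit_model_stump(xs, ys, threshold=3)
print(predict_model_tree(model_tree, 1.5))
print(predict_model_tree(model_tree, 6.5))
```

Because each leaf fits its own line, two leaves are enough to follow this V-shaped function between the training points, whereas a regression tree with the same split would predict one constant per side.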

6 WEKA Waikato Environment for Knowledge Analysis
On Radius, do this once (make a WEKA folder, copy all the .arff files, copy the weka jar file):
  cd
  mkdir WEKAfiles
  cd WEKAfiles
  cp /usr/local/weka-3-8-1/data/* .
  cp /usr/local/weka-3-8-1/weka.jar weka.jar
To run the WEKA application (cd WEKAfiles first, if not there already):
  java -Xmx1000M -jar weka.jar
To download onto a Windows or Mac computer, visit:

7 WEKA Introduction
An open source collection of many data mining and machine learning algorithms, including:
  pre-processing of data
  classification
  clustering
  association rule extraction
Created by researchers at the University of Waikato in New Zealand.
Java based (also open source).

8 WEKA Main Features
∼49 data preprocessing tools
∼76 classification/regression algorithms
∼8 clustering algorithms
∼15 attribute/subset evaluators + 10 search algorithms for feature selection
∼3 algorithms for finding association rules
3 graphical user interfaces
  “The Explorer” (exploratory data analysis)
  “The Experimenter” (experimental environment)
  “The Knowledge Flow” (new process model inspired interface)

9 WEKA

10 WEKA Application Interface
Explorer: preprocessing, attribute selection, learning, visualization
Experimenter: testing and evaluating machine learning algorithms
Knowledge Flow: visual design of the KDD (Knowledge Discovery from Data / in Databases / with Data mining) process
Simple Command-line: a simple interface for typing commands

11 WEKA Functions and Tools
Preprocessing Filters
Attribute selection
Classification/Regression
Clustering
Association discovery
Visualization

12 WEKA: Pros and Cons
Pros
  Open source, free
  Extensible
  Can be integrated into other Java packages
  GUIs (Graphical User Interfaces)
  Relatively easy to use
  Features
    Run individual experiments, or
    Build KDD phases
Cons
  Lack of proper and adequate documentation
  Systems are updated constantly (Kitchen Sink Syndrome)

