An Exercise in Machine Learning


1 An Exercise in Machine Learning
Cornelia Caragea

2 Outline
- Machine Learning Software
- Preparing Data
- Building Classifiers
- Interpreting Results

3 Machine Learning Software
- Suites (general purpose): WEKA (source: Java), MLC++ (source: C++), SAS, list from KDnuggets (various)
- Specific: classification (C4.5, SVMlight), association rule mining, Bayesian nets, …
- Commercial vs. free

4 What does WEKA do?
- Implements state-of-the-art learning algorithms
- Main strength is classification; also provides regression, association rule, and clustering algorithms
- Extensible, so new learning schemes can be tried
- Large variety of handy tools (transforming datasets, filters, visualization, etc.)

5 WEKA resources
- API documentation, tutorials, source code
- WEKA mailing list
- Book: Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations
- Weka-related projects:
  - Weka-Parallel: parallel processing for Weka
  - RWeka: linking R and Weka
  - YALE: Yet Another Learning Environment
  - Many others…

6 Outline
- Machine Learning Software
- Preparing Data
- Building Classifiers
- Interpreting Results

7 Preparing Data
ARFF data format:
- Header: describes the attributes and their types
- Data: the instances (examples), one per line, as comma-separated values
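To make the header/data split concrete, here is a minimal sketch of a toy ARFF file and a hand-rolled reader for it. The attribute names and values are made up for illustration, and `parse_arff` is a hypothetical helper, not part of WEKA; it only handles unquoted names, `numeric` attributes, and nominal attributes written as `{a, b, c}`.

```python
# Toy ARFF text: a header (@relation, @attribute lines) followed by @data rows.
# The dataset values are illustrative only.
ARFF_TEXT = """\
% Toy weather data
@relation weather

@attribute outlook {sunny, overcast, rainy}
@attribute temperature numeric
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}

@data
sunny,85,FALSE,no
overcast,83,FALSE,yes
rainy,70,TRUE,no
"""


def parse_arff(text):
    """Return (relation, attributes, rows) from simple ARFF text."""
    relation = None
    attributes = []  # (name, kind); kind is "numeric" or a list of nominal labels
    rows = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comments
        if in_data:
            values = [v.strip() for v in line.split(",")]
            row = []
            for (_, kind), value in zip(attributes, values):
                row.append(float(value) if kind == "numeric" else value)
            rows.append(row)
            continue
        lower = line.lower()
        if lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            _, name, kind = line.split(None, 2)
            if kind.startswith("{"):
                kind = [label.strip() for label in kind.strip("{}").split(",")]
            else:
                kind = kind.lower()
            attributes.append((name, kind))
        elif lower.startswith("@data"):
            in_data = True
    return relation, attributes, rows


relation, attributes, rows = parse_arff(ARFF_TEXT)
print(relation)
print(attributes)
print(rows)
```

In practice WEKA reads ARFF files itself; the sketch is only meant to show which part of the file carries types and which part carries instances.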

8 Launching WEKA
java -jar weka.jar

9 Load Dataset into WEKA

10 Data Filters
- Useful support for data preprocessing: removing or adding attributes, resampling the dataset, removing examples, etc.
- A filter can create stratified cross-validation folds of the given dataset, so that the class distribution is approximately retained within each fold.
- Typically, the data is split into 2/3 for training and 1/3 for testing.
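The idea of a 2/3 vs. 1/3 split that keeps class proportions roughly intact can be sketched in a few lines. This is not the WEKA filter itself; `stratified_split` is a hypothetical helper, and the labels are made-up data.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, train_fraction=2 / 3, seed=0):
    """Return (train_idx, test_idx) with class proportions roughly preserved."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)  # shuffle within each class
        cut = round(len(indices) * train_fraction)
        train_idx.extend(indices[:cut])
        test_idx.extend(indices[cut:])
    return sorted(train_idx), sorted(test_idx)


labels = ["yes"] * 9 + ["no"] * 6
train_idx, test_idx = stratified_split(labels)
train_counts = Counter(labels[i] for i in train_idx)
test_counts = Counter(labels[i] for i in test_idx)
print(train_counts, test_counts)
```

Splitting each class separately is what keeps the 9:6 ratio of the full dataset in both parts (6:4 in training, 3:2 in testing).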

11 Data Filters

12 Outline
- Machine Learning Software
- Preparing Data
- Building Classifiers
- Interpreting Results

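13 Building Classifiers
- A classifier model is a mapping from the dataset attributes to the class (target) attribute.
- How a model is created, and the form it takes, differ between learning algorithms.
- Examples: decision tree and Naïve Bayes classifiers.
- Which one is the best? No Free Lunch: no single classifier is best on every problem.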

14 Building Classifiers

15 (1) weka.classifiers.rules.ZeroR
- Class for building and using a 0-R classifier
- Majority class classifier: predicts the mean (for a numeric class) or the mode (for a nominal class)
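The 0-R rule is simple enough to sketch directly. The class below is an illustrative stand-in, not WEKA's implementation; the name `ZeroRSketch` and its methods are assumptions made for this example.

```python
from collections import Counter
from statistics import mean


class ZeroRSketch:
    """Ignores all attributes; predicts the mode (nominal) or mean (numeric) of the class."""

    def fit(self, targets):
        numeric = all(
            isinstance(t, (int, float)) and not isinstance(t, bool) for t in targets
        )
        if numeric:
            self.prediction = mean(targets)
        else:
            self.prediction = Counter(targets).most_common(1)[0][0]
        return self

    def predict(self, n_instances):
        # Every instance receives the same prediction.
        return [self.prediction] * n_instances


nominal_model = ZeroRSketch().fit(["yes", "yes", "no"])
numeric_model = ZeroRSketch().fit([1.0, 2.0, 6.0])
print(nominal_model.predict(2), numeric_model.predict(1))
```

Because it never looks at the attributes, a classifier like this is mainly useful as a baseline that other classifiers should beat.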

16 Exercise 1

17 (2) weka.classifiers.bayes.NaiveBayes
Class for building a Naive Bayes classifier
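As a rough illustration of what a Naive Bayes classifier computes, the sketch below handles nominal attributes only and uses add-one smoothing. Both choices, the function names, and the toy data are assumptions for this example; it does not reproduce WEKA's class.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attribute index, class) -> value counts
    distinct_values = defaultdict(set)   # attribute index -> values seen in training
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            distinct_values[i].add(value)
    return class_counts, value_counts, distinct_values


def predict_naive_bayes(model, row):
    """Return the class with the highest log posterior, assuming independent attributes."""
    class_counts, value_counts, distinct_values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)  # log prior
        for i, value in enumerate(row):
            # Add-one smoothing avoids zero probabilities for unseen combinations.
            numerator = value_counts[(i, label)][value] + 1
            denominator = count + len(distinct_values[i])
            score += math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"),
        ("rainy", "cool"), ("overcast", "hot")]
labels = ["no", "no", "yes", "yes", "yes"]
model = train_naive_bayes(rows, labels)
print(predict_naive_bayes(model, ("sunny", "hot")))
print(predict_naive_bayes(model, ("rainy", "cool")))
```

Working in log space keeps the product of many small probabilities from underflowing.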

18 (3) weka.classifiers.trees.J48
Class for generating a pruned or unpruned C4.5 decision tree
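C4.5 is commonly described as choosing splits by gain ratio, that is, information gain normalised by the entropy of the split itself. The sketch below computes that quantity for one nominal attribute; the function names and the toy values are assumptions, and the sketch leaves out everything else a tree learner does (numeric splits, missing values, recursion, pruning).

```python
import math
from collections import Counter, defaultdict


def entropy(items):
    """Shannon entropy (in bits) of the value distribution in items."""
    total = len(items)
    return -sum((c / total) * math.log2(c / total) for c in Counter(items).values())


def gain_ratio(values, labels):
    """Information gain of splitting on a nominal attribute, divided by split information."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[value].append(label)
    total = len(labels)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    split_info = entropy(values)
    return gain / split_info if split_info > 0 else 0.0


labels = ["yes", "yes", "no", "no"]
perfect = gain_ratio(["a", "a", "b", "b"], labels)     # attribute separates the classes
useless = gain_ratio(["a", "b", "a", "b"], labels)     # attribute tells nothing
print(perfect, useless)
```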

19 Test Options
- Percentage split (2/3 training; 1/3 testing)
- Cross-validation: estimates the generalization error by resampling when data is limited; the error estimate is averaged over the folds
  - stratified
  - 10-fold
  - leave-one-out (LOO)
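The averaging step in cross-validation can be sketched as follows. To stay self-contained, the "classifier" here is just a majority-class predictor, the folds are dealt out without shuffling, and `k` is assumed to be no larger than the number of instances; all names and data are illustrative.

```python
from collections import Counter, defaultdict


def stratified_folds(labels, k):
    """Assign each instance index to one of k folds, dealing each class out round-robin."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(k)]
    position = 0
    for indices in by_class.values():
        for index in indices:
            folds[position % k].append(index)
            position += 1
    return folds


def cross_validated_error(labels, k):
    """Average test error of a majority-class predictor over k stratified folds."""
    errors = []
    for test_idx in stratified_folds(labels, k):
        held_out = set(test_idx)
        train_labels = [label for i, label in enumerate(labels) if i not in held_out]
        majority = Counter(train_labels).most_common(1)[0][0]
        wrong = sum(1 for i in test_idx if labels[i] != majority)
        errors.append(wrong / len(test_idx))
    return sum(errors) / len(errors)


labels = ["yes"] * 6 + ["no"] * 3
three_fold_error = cross_validated_error(labels, 3)
leave_one_out_error = cross_validated_error(labels, len(labels))  # k = n
print(three_fold_error, leave_one_out_error)
```

Leave-one-out is simply the special case where the number of folds equals the number of instances, so each test fold holds a single example.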

20 Outline
- Machine Learning Software
- Preparing Data
- Building Classifiers
- Interpreting Results

21 Understanding Output

22 Decision Tree Output (1)

23 Decision Tree Output (2)

24 Exercise 2

25 Performance Measures
- Accuracy & error rate
- Confusion matrix (contingency table)
- True positive rate & false positive rate (area under the Receiver Operating Characteristic curve)
- Precision, recall & F-measure
- Sensitivity & specificity
For more information on these, see uisp09-Evaluation.ppt
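Most of these measures fall out of the four cells of a two-class confusion matrix. The sketch below shows the arithmetic on made-up predictions; `binary_metrics` is a hypothetical helper, and the area under the ROC curve is not computed because it needs ranked scores rather than hard predictions.

```python
def safe_div(numerator, denominator):
    """Divide, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(actual, predicted, positive):
    """Compute common two-class measures, treating `positive` as the positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    tn = sum(1 for a, p in pairs if a != positive and p != positive)
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)  # same as true positive rate and sensitivity
    return {
        "confusion_matrix": [[tp, fn], [fp, tn]],  # rows: actual, columns: predicted
        "accuracy": safe_div(tp + tn, len(pairs)),
        "error_rate": safe_div(fp + fn, len(pairs)),
        "tp_rate": recall,
        "fp_rate": safe_div(fp, fp + tn),
        "specificity": safe_div(tn, tn + fp),
        "precision": precision,
        "f_measure": safe_div(2 * precision * recall, precision + recall),
    }


actual = ["yes"] * 4 + ["no"] * 6
predicted = ["yes", "yes", "yes", "no", "yes", "no", "no", "no", "no", "no"]
metrics = binary_metrics(actual, predicted, positive="yes")
print(metrics)
```

Note that specificity is one minus the false positive rate, and sensitivity, recall, and true positive rate are three names for the same number.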

26 Decision Tree Pruning
- Overcomes over-fitting
- Pre-pruning and post-pruning
- Reduced error pruning
- Subtree raising with different confidence values
- Comparing tree size and accuracy

27 Subtree Replacement
Bottom-up: a tree is considered for replacement only after all of its subtrees have been considered
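The bottom-up order can be sketched with a small recursive function. This version decides whether to replace a subtree by counting errors on held-out instances, in the spirit of reduced error pruning; that criterion, the nested-dict tree layout, and the function names are all assumptions made for the illustration, not a description of how J48 decides.

```python
def classify(tree, row):
    """Follow attribute tests down to a leaf; fall back to the node's majority label."""
    while "label" not in tree:
        child = tree["children"].get(row[tree["attribute"]])
        if child is None:
            return tree["majority"]
        tree = child
    return tree["label"]


def count_errors(tree, rows, labels):
    return sum(1 for row, label in zip(rows, labels) if classify(tree, row) != label)


def prune(tree, rows, labels):
    """Bottom-up subtree replacement using the held-out instances that reach this node."""
    if "label" in tree:
        return tree
    attribute = tree["attribute"]
    # First consider every subtree, passing down only the instances routed to it.
    for value, child in list(tree["children"].items()):
        subset = [(r, l) for r, l in zip(rows, labels) if r[attribute] == value]
        tree["children"][value] = prune(
            child, [r for r, _ in subset], [l for _, l in subset]
        )
    # Only then consider replacing this node itself with a single leaf.
    leaf = {"label": tree["majority"]}
    if count_errors(leaf, rows, labels) <= count_errors(tree, rows, labels):
        return leaf
    return tree


# Toy tree over rows of (outlook, windy); the "sunny" branch over-fits.
tree = {
    "attribute": 0, "majority": "yes",
    "children": {
        "sunny": {
            "attribute": 1, "majority": "no",
            "children": {"TRUE": {"label": "no"}, "FALSE": {"label": "yes"}},
        },
        "rainy": {"label": "yes"},
    },
}
held_out_rows = [("sunny", "TRUE"), ("sunny", "FALSE"), ("sunny", "FALSE"),
                 ("rainy", "TRUE"), ("rainy", "FALSE")]
held_out_labels = ["no", "no", "no", "yes", "yes"]
pruned = prune(tree, held_out_rows, held_out_labels)
print(pruned)
```

In this toy run the inner "sunny" subtree collapses to a single leaf, while the root is kept because replacing it would increase the held-out error.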

28 Subtree Raising
- Deletes a node and redistributes its instances
- Slower than subtree replacement

29 Exercise 3

