
Slide 1: Evaluation

Slide 2: Be a classifier!
- Interactive decision tree construction:
  - Load segment-challenge.arff; look at the dataset
  - Select UserClassifier (tree classifier)
  - Use the test set segment-test.arff
  - Examine the data visualizer and tree visualizer
  - Plot region-centroid-row vs. intensity-mean
  - Rectangle, Polygon and Polyline selection tools … several selections …
  - Right-click in the Tree visualizer and Accept the tree
- Over to you: how well can you do?
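For readers who prefer to script the setup, here is a minimal sketch of the loading step using the Weka Java API. The UserClassifier itself is interactive, so only dataset loading and inspection are shown; the file name and working directory are assumptions.

```java
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class LoadSegment {
    public static void main(String[] args) throws Exception {
        // Load the ARFF file (assumed to sit in the working directory)
        Instances data = DataSource.read("segment-challenge.arff");
        // By convention the class attribute is the last one
        data.setClassIndex(data.numAttributes() - 1);
        System.out.println("Instances:  " + data.numInstances());
        System.out.println("Attributes: " + data.numAttributes());
        // List attribute names, e.g. region-centroid-row, intensity-mean, ...
        for (int i = 0; i < data.numAttributes(); i++) {
            System.out.println("  " + data.attribute(i).name());
        }
    }
}
```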

Slide 3: Be a classifier!
- Build a tree: what strategy did you use?
- Given enough time, you could produce a "perfect" tree for the dataset, but would it perform well on the test set?

Slide 4: Training and Testing
[Diagram: training data feeds an ML algorithm, which produces a classifier; the classifier is applied to test data to produce evaluation results, and is then deployed ("Deploy!")]

Slide 5: Training and Testing
[Same diagram as the previous slide]
- Basic assumption: training and test sets are produced by independent sampling from an infinite population

Slide 6: Training and Testing
- Use J48 to analyze the segment dataset:
  - Open file segment-challenge.arff
  - Choose the J48 decision tree learner (trees > J48)
  - Supplied test set: segment-test.arff
  - Run it: 96% accuracy
  - Evaluate on the training set: 99% accuracy
  - Evaluate on a percentage split: 95% accuracy
  - Do it again: you get exactly the same result!
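The same supplied-test-set experiment can be run from the Weka Java API; a hedged sketch, with file names as above and the ~96% figure assuming Weka's default J48 settings:

```java
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class SuppliedTestSet {
    public static void main(String[] args) throws Exception {
        Instances train = DataSource.read("segment-challenge.arff");
        Instances test  = DataSource.read("segment-test.arff");
        train.setClassIndex(train.numAttributes() - 1);
        test.setClassIndex(test.numAttributes() - 1);

        J48 tree = new J48();          // default settings, as in the Explorer
        tree.buildClassifier(train);   // train on segment-challenge

        Evaluation eval = new Evaluation(train);
        eval.evaluateModel(tree, test);            // supplied test set
        System.out.printf("Accuracy: %.1f%%%n", eval.pctCorrect()); // ~96%
    }
}
```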

Slide 7: Training and Testing
- Basic assumption: training and test sets are sampled independently from an infinite population
- Just one dataset? Hold some out for testing
- Expect slight variation in results... but Weka produces the same results each time. Why?
  - E.g. J48 on the segment-challenge dataset

Slide 8: Repeated Training and Testing
- Evaluate J48 on segment-challenge:
  - With segment-challenge and J48 (trees > J48)
  - Set the percentage split to 90%
  - Run it: 96.7% accuracy
  - [More options] Repeat with a different random-number seed
  - Use seeds 2, 3, 4, 5, 6, 7, 8, 9, 10
- Results for seeds 1-10: 0.967, 0.940, 0.940, 0.967, 0.953, 0.967, 0.920, 0.947, 0.933, 0.947

Slide 9: Repeated Training and Testing
- Evaluate J48 on segment-challenge
- Accuracies for seeds 1-10: 0.967, 0.940, 0.940, 0.967, 0.953, 0.967, 0.920, 0.947, 0.933, 0.947
- Sample mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
- Variance: $\sigma^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
- Standard deviation: $\sigma = \sqrt{\sigma^2}$
- Here: $\bar{x} = 0.949$, $\sigma = 0.0158$
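These statistics can be reproduced experimentally; a sketch, assuming a 90%/10% split shuffled per seed (the Explorer's percentage split may order instances slightly differently, so exact numbers can vary):

```java
import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class RepeatedHoldout {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("segment-challenge.arff");
        data.setClassIndex(data.numAttributes() - 1);

        double[] acc = new double[10];
        for (int seed = 1; seed <= 10; seed++) {
            // 90% / 10% percentage split, shuffled with the given seed
            Instances rand = new Instances(data);
            rand.randomize(new Random(seed));
            int trainSize = (int) Math.round(rand.numInstances() * 0.9);
            Instances train = new Instances(rand, 0, trainSize);
            Instances test  = new Instances(rand, trainSize,
                                            rand.numInstances() - trainSize);
            J48 tree = new J48();
            tree.buildClassifier(train);
            Evaluation eval = new Evaluation(train);
            eval.evaluateModel(tree, test);
            acc[seed - 1] = eval.pctCorrect() / 100.0;
        }
        // Sample mean and standard deviation (n - 1 in the denominator)
        double mean = 0;
        for (double a : acc) mean += a;
        mean /= acc.length;
        double var = 0;
        for (double a : acc) var += (a - mean) * (a - mean);
        var /= (acc.length - 1);
        System.out.printf("mean = %.3f, sd = %.4f%n", mean, Math.sqrt(var));
    }
}
```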

Slide 10: Repeated Training and Testing
- Basic assumption: training and test sets are sampled independently from an infinite population
- Expect slight variation in results... get it by setting the random-number seed
- Can calculate the mean and standard deviation experimentally

Slide 11: Baseline Accuracy
- Use the diabetes dataset and default holdout
- Open file diabetes.arff
- Test option: Percentage split
- Try these classifiers:
  - trees > J48: 76%
  - bayes > NaiveBayes: 77%
  - lazy > IBk: 73%
  - rules > PART: 74%
- 768 instances (500 negative, 268 positive)
- Always guessing "negative" gives 500/768 = 65%
  - rules > ZeroR: predicts the most likely class! (see the sketch below)
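A minimal sketch of the ZeroR baseline computation; ZeroR ignores all attributes and always predicts the majority class, which is why it lands on 500/768 ≈ 65.1% here:

```java
import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.classifiers.rules.ZeroR;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class Baseline {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("diabetes.arff");
        data.setClassIndex(data.numAttributes() - 1);

        // ZeroR always predicts the majority class ("negative"),
        // so its accuracy on this data is 500/768 = 65.1%
        Classifier baseline = new ZeroR();
        baseline.buildClassifier(data);
        Evaluation eval = new Evaluation(data);
        eval.evaluateModel(baseline, data);
        System.out.printf("ZeroR baseline: %.1f%%%n", eval.pctCorrect());
    }
}
```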

Slide 12: Baseline Accuracy
- Sometimes the baseline is best!
  - Open supermarket.arff and blindly apply:
    - rules > ZeroR: 64%
    - trees > J48: 63%
    - bayes > NaiveBayes: 63%
    - lazy > IBk: 38%
    - rules > PART: 63%
  - The attributes are not informative
- Caution: don't just apply Weka to a dataset; you need to understand what's going on

Slide 13: Baseline Accuracy
- Consider whether differences are significant
- Always try a simple baseline, e.g. rules > ZeroR
- Caution: don't just apply Weka to a dataset; you need to understand what's going on

Slide 14: Cross-Validation
- Can we improve on repeated holdout (i.e. reduce the variance)?
- Cross-validation
- Stratified cross-validation

Slide 15: Cross-Validation
- Repeated holdout: hold out 10% for testing, repeat 10 times

Slide 16: Cross-Validation
- 10-fold cross-validation (sketched in code below):
  - Divide the dataset into 10 parts (folds)
  - Hold out each part in turn
  - Average the results
  - Each data point is used once for testing and 9 times for training
- Stratified cross-validation:
  - Ensure that each fold has the right proportion of each class value
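In code, 10-fold cross-validation is a one-liner with Weka's Evaluation class; a sketch, using diabetes.arff from the earlier baseline slides as an example input:

```java
import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class CrossValidate {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("diabetes.arff");
        data.setClassIndex(data.numAttributes() - 1);

        Evaluation eval = new Evaluation(data);
        // 10-fold cross-validation; Weka stratifies the folds for nominal
        // class attributes, matching the Explorer's default behaviour
        eval.crossValidateModel(new J48(), data, 10, new Random(1));
        System.out.printf("10-fold CV accuracy: %.1f%%%n", eval.pctCorrect());
    }
}
```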

Slide 17: Cross-Validation
- Cross-validation is better than repeated holdout
- Stratified cross-validation is even better
- Practical rule of thumb:
  - Lots of data? Use percentage split
  - Otherwise, use stratified 10-fold cross-validation

Slide 18: Cross-Validation Results
- Is cross-validation really better than repeated holdout?
- Diabetes dataset:
  - Baseline accuracy (rules > ZeroR): 65.1%
  - trees > J48, 10-fold cross-validation: 73.8%
  - ... with different random-number seeds:

      seed:      1     2     3     4     5     6     7     8     9    10
      accuracy: 73.8  75.0  75.5  75.5  74.4  75.6  73.6  74.0  74.5  73.0

Slide 19: Cross-Validation Results

      run:               1     2     3     4     5     6     7     8     9    10
      holdout (10%):    75.3  77.9  80.5  74.0  71.4  70.1  79.2  71.4  80.5  67.5
      cross-validation
      (10-fold):        73.8  75.0  75.5  75.5  74.4  75.6  73.6  74.0  74.5  73.0

- Sample mean $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$, variance $\sigma^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$, standard deviation $\sigma = \sqrt{\sigma^2}$
- Holdout: $\bar{x} = 74.8$, $\sigma = 4.6$
- Cross-validation: $\bar{x} = 74.5$, $\sigma = 0.9$

Slide 20: Cross-Validation Results
- Why 10-fold? E.g. 20-fold gives 75.1%
- Cross-validation really is better than repeated holdout: it reduces the variance of the estimate

Slide 21: Evaluation Methods Exercises

Slide 22: Plan
- Evaluate the performance of machine learning algorithms on classifying Tic-Tac-Toe games

Slide 23: Classification on Tic-Tac-Toe
- Download the Tic-Tac-Toe dataset tic-tac-toe.zip from the Course Page
- Work as a team to evaluate the performance of machine learning algorithms classifying Tic-Tac-Toe games

Slide 24: Evaluation Methods
- Using Training Set (use 100% of the instances to train and the same 100% to test performance)
- 10-fold Cross-Validation
- Split 70% (use 70% of the instances to train and the remaining 30% to test performance)
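All three test options can also be driven from the Weka Java API; a sketch, assuming the extracted file is named Tic-Tac-Toe.arff as on the following slides, shown here with J48:

```java
import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class ThreeTestOptions {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("Tic-Tac-Toe.arff");
        data.setClassIndex(data.numAttributes() - 1);

        // 1. Use training set: train and test on the same 100%
        J48 tree = new J48();
        tree.buildClassifier(data);
        Evaluation onTrain = new Evaluation(data);
        onTrain.evaluateModel(tree, data);
        System.out.printf("Training set: %.1f%%%n", onTrain.pctCorrect());

        // 2. 10-fold cross-validation
        Evaluation cv = new Evaluation(data);
        cv.crossValidateModel(new J48(), data, 10, new Random(1));
        System.out.printf("10-fold CV:   %.1f%%%n", cv.pctCorrect());

        // 3. 70% / 30% percentage split
        Instances rand = new Instances(data);
        rand.randomize(new Random(1));
        int trainSize = (int) Math.round(rand.numInstances() * 0.7);
        Instances train = new Instances(rand, 0, trainSize);
        Instances test  = new Instances(rand, trainSize,
                                        rand.numInstances() - trainSize);
        J48 split = new J48();
        split.buildClassifier(train);
        Evaluation ps = new Evaluation(train);
        ps.evaluateModel(split, test);
        System.out.printf("70%% split:    %.1f%%%n", ps.pctCorrect());
    }
}
```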

Slide 25: Classifiers Being Used
- Decision Tree: trees → J48
- Neural Network: functions → MultilayerPerceptron (trainingTime=50)
- Bayes Network: bayes → NaiveBayes
- Nearest Neighbor: lazy → IBk (k=3)
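A sketch of how these four classifiers, with the slide's parameter settings, might be instantiated via the Weka Java API; setTrainingTime and setKNN are the API setters corresponding to trainingTime and k:

```java
import weka.classifiers.Classifier;
import weka.classifiers.bayes.NaiveBayes;
import weka.classifiers.functions.MultilayerPerceptron;
import weka.classifiers.lazy.IBk;
import weka.classifiers.trees.J48;

public class FourClassifiers {
    public static Classifier[] build() {
        MultilayerPerceptron mlp = new MultilayerPerceptron();
        mlp.setTrainingTime(50);   // number of training epochs, as on the slide

        IBk knn = new IBk();
        knn.setKNN(3);             // k = 3 nearest neighbors

        return new Classifier[] { new J48(), mlp, new NaiveBayes(), knn };
    }

    public static void main(String[] args) {
        for (Classifier c : build()) {
            System.out.println(c.getClass().getSimpleName());
        }
    }
}
```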

Slide 26: Using Weka
- Extract Tic-Tac-Toe.zip into the Weka folder
- Start the Weka program
- Open Tic-Tac-Toe.arff
- Choose Explorer

Slide 27: Using Weka (cont.)
- Click the Classify tab
- Choose the J48 classifier under trees
- Set the Test options to Use training set
- Enable Output predictions under More options
- Click Start to run

Slide 28: Using Weka (cont.)
- Accuracy rate (read from the Classifier output)

Slide 29: Reporting
- Download Tic-tac-toe-report.docx
- Complete the table evaluating the performance of the different learning methods in Q1
- Find the best performer in Q2, Q3, and Q4

