Machine Learning in Practice Lecture 18


Machine Learning in Practice Lecture 18 Carolyn Penstein Rosé Language Technologies Institute/ Human-Computer Interaction Institute

Plan for the Day
Announcements: Questions? Quiz Feedback
Rule Based Learning: Revisit the Tic Tac Toe Problem
Start thinking about Optimization and Tuning

Quiz Feedback Only one person got everything right Were the readings confusing this time? Association Rule Mining vs. Rule Learning

Rule Based Learning

Rules Versus Trees
Tree based learning is divide-and-conquer: decisions are based on what will have the biggest overall effect on "purity" at the leaf nodes.
Rule based learning is separate-and-conquer: it considers only one class at a time (usually starting with the smallest), asking what separates this class from the default class.

Trees vs. Rules J48

Trees vs. Rules J48

Locally Optimal Solutions (foreshadowing...)
[figure contrasting an optimal solution with a locally optimal solution]

Covering Algorithms
Rule based algorithms are called covering algorithms.
Whereas tree based algorithms take all classes into account at the same time, covering algorithms only consider one class at a time.
Rule based algorithms look for a set of conditions that achieve high accuracy on one class at a time.
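To make separate-and-conquer concrete, here is a minimal Python sketch (mine, not the lecture's; names like learn_rules and learn_one_rule are illustrative, and it assumes nominal attributes stored in dicts with a "class" key). It greedily adds attribute=value tests until a rule is accurate enough on the target class, then removes the covered examples and repeats.

def accuracy_of(condition, instances, target_class):
    """Return (accuracy on target_class, covered instances) for a conjunction."""
    covered = [x for x in instances if all(x[a] == v for a, v in condition)]
    if not covered:
        return 0.0, covered
    correct = sum(1 for x in covered if x["class"] == target_class)
    return correct / len(covered), covered

def learn_one_rule(instances, attributes, target_class, min_accuracy=1.0):
    """Greedily add attribute=value tests until the rule is accurate enough."""
    condition, remaining = [], list(attributes)
    covered = instances
    while remaining and accuracy_of(condition, instances, target_class)[0] < min_accuracy:
        best = max(
            ((a, v) for a in remaining for v in {x[a] for x in covered}),
            key=lambda av: accuracy_of(condition + [av], instances, target_class)[0],
        )
        condition.append(best)
        remaining.remove(best[0])
        covered = accuracy_of(condition, instances, target_class)[1]
    return condition

def learn_rules(instances, attributes, target_class):
    """Learn a rule, remove the examples it covers, repeat (one class at a time)."""
    rules, remaining = [], list(instances)
    while any(x["class"] == target_class for x in remaining):
        rule = learn_one_rule(remaining, attributes, target_class)
        still_covered = [x for x in remaining if all(x[a] == v for a, v in rule)]
        if not any(x["class"] == target_class for x in still_covered):
            break  # degenerate rule; stop rather than loop forever in this sketch
        rules.append(rule)
        remaining = [x for x in remaining if x not in still_covered]
    return rules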

Accuracy versus Information Gain
Split 1: [A A A A B B B B B] into [A A A B] and [A B B B B]; Accuracy: 78%, Information: .76
Split 2: [A A A A B B B B B] into [B B B] and [A A A A B B]; Accuracy: 78%, Information: .61
* Note that lower resulting information means higher information gain.
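The numbers above can be re-derived with a few lines of Python; this is my own sketch, with illustrative helper names:

from math import log2

def entropy(subset):
    counts = [subset.count(c) for c in set(subset)]
    total = sum(counts)
    return -sum((n / total) * log2(n / total) for n in counts if n)

def split_stats(subsets):
    # Accuracy of predicting the majority class in each subset, and the
    # weighted average entropy ("information") of the resulting subsets.
    total = sum(len(s) for s in subsets)
    accuracy = sum(max(s.count(c) for c in set(s)) for s in subsets) / total
    information = sum(len(s) / total * entropy(s) for s in subsets)
    return accuracy, information

print(split_stats([list("AAAB"), list("ABBBB")]))   # about (0.78, 0.76)
print(split_stats([list("BBB"), list("AAAABB")]))   # about (0.78, 0.61)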

Accuracy vs Information Gain

Rules Don't Need to be Applied in Order
Rules that predict the same class can be re-ordered without affecting performance.
If rules are treated as un-ordered, rules associated with different classes might match at the same time; in that case you need a tie breaker: maybe rule accuracy, maybe the prior probabilities of each class.
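As a rough illustration (my sketch, not from the lecture), one such tie-breaking policy could look like this:

from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Rule:
    matches: Callable[[dict], bool]   # the rule's condition, tested on an instance
    cls: str                          # the class the rule predicts
    accuracy: float                   # the rule's accuracy on training data

def classify(instance: dict, rules: List[Rule], priors: Dict[str, float]) -> str:
    """Unordered rules: if rules for different classes fire at once, break the
    tie by rule accuracy, then by class prior; default to the majority class."""
    firing = [r for r in rules if r.matches(instance)]
    if not firing:
        return max(priors, key=priors.get)
    best = max(firing, key=lambda r: (r.accuracy, priors[r.cls]))
    return best.cls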

Rule Learning
Note that the rules below for each class consider different subsets of attributes.
Note that two conditions were necessary to most accurately predict yum: rule learning algorithms add conditions to rules until accuracy is high enough.
The more complex a rule becomes, the more likely it is to over-fit.

Learned rules:
If chocolate cake and not vanilla ice cream then yum
If vanilla ice cream then good
If vanilla cake then ok

Training data:
@relation is-yummy
@attribute ice-cream {chocolate, vanilla, coffee, rocky-road, strawberry}
@attribute cake {chocolate, vanilla}
@attribute yummy {yum,good,ok}
@data
chocolate,chocolate,yum
vanilla,chocolate,good
coffee,chocolate,yum
coffee,vanilla,ok
rocky-road,chocolate,yum
strawberry,vanilla,ok

Rule Induction by Pruning Rules from Trees
Rules can be read off of trees, but they will be overly complex.
They can be pruned in a "greedy" fashion using the same principles discussed here.
You might get duplicate rules then, so remove those.
In practice this is very inefficient.

Rules versus Trees
Decision tree learning is a divide and conquer approach: top-down, looking for attributes that achieve useful splits in the data.
Trees can be converted into sets of rules:
If you then Tutor
If not(you) and Imperative then Tutor
If not(you) and not(Imperative) and good then Tutor
If not(good) and WordCount > 2 and not(all-I) then Tutor
If all-I and not(So) then Student
If all-I and So then Tutor
If not(good) and WordCount <= 2 and not(on) then Student
If on then Tutor

Ordered Rules Are More Compact
If you then Tutor
If not(you) and Imperative then Tutor
else if good then Tutor
else if WordCount > 2 then if not(all-I) then Tutor
else if …
If rules are applied in order, then you can use an if-then-else structure.
But then you're back to a tree representation.
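As a hedged sketch of the same idea in code (the boolean arguments stand in for the features named on the slide; the branches past the ellipsis are elided here just as they are above):

def classify_turn(you, imperative, good, word_count, all_i, so):
    # Ordered rules collapse into an if/elif cascade, i.e. a tree again.
    if you:
        return "Tutor"
    elif imperative:
        return "Tutor"
    elif good:
        return "Tutor"
    elif word_count > 2:
        if not all_i:
            return "Tutor"
        ...  # remaining branches elided, as on the slide
    ...      # remaining branches elided, as on the slide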

Advantages of Classification Rules
If a and b then x
If c and d then x
Decision trees can't easily represent disjunctions; sometimes subtrees have to be repeated, and this introduces a greater chance of error.
So rules are a more powerful representation, but more power can lead to more over-fitting!!!
[tree diagram over attributes a, b, c, d with class x]

Advantages of Classification Rules
If a and b then x
If c and d then x
Classification rules express disjunctions more concisely.
Decision lists are meant to be applied in order (so context is assumed), which makes it easy to encode "else" conditions.
[tree diagram over attributes a, b, c, d with class x]
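A small sketch of why this matters (mine, not the lecture's): the two rules state the disjunction directly, while a single tree that tests a first has to repeat the c-and-d subtree under more than one branch.

def rule_classify(a, b, c, d):
    # Two classification rules express the disjunction directly.
    if a and b:
        return "x"
    if c and d:
        return "x"
    return "not x"

def tree_classify(a, b, c, d):
    # An equivalent decision tree rooted at a must duplicate the c/d subtree.
    if a:
        if b:
            return "x"
        if c and d:        # repeated subtree
            return "x"
        return "not x"
    if c and d:            # repeated subtree
        return "x"
    return "not x"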

Rules Versus Trees
Because both algorithms make one selection at a time, they will prefer different choices, since the criteria are different.
Rule learning is more prone to over-fitting.
Rule representations have more power (e.g., disjunctions).
Rule learning algorithms tend to make decisions based on more local information.
Even when Information Gain is used for choosing between options, the set of options considered is different.

Pruning Rules
Just as trees are grown and then pruned, rules are also grown and then pruned.
Rather than one growth stage followed by one pruning stage, you alternate growth and pruning.
With rules, only reduced error pruning is used; trees can be pruned using reduced error pruning or by estimating error on training data using confidence intervals.
Rules only have one pruning operation; trees have two pruning operations.

Rule Learning Manipulations
Pruning paradigms: How would this rule perform over the whole set by itself, versus how would it perform after other rules have fired? Do you start with a default? If so, what is that default?
Pruning rule: remove the condition whose removal improves the performance of the rule the most over a validation set (or remove conditions in reverse order).
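A hedged sketch of that pruning rule (my code, with illustrative names): repeatedly drop the single condition whose removal most improves accuracy on the validation set, and stop when no removal helps.

def rule_accuracy(condition, target_class, validation):
    covered = [x for x in validation if all(x[a] == v for a, v in condition)]
    if not covered:
        return 0.0
    return sum(1 for x in covered if x["class"] == target_class) / len(covered)

def prune_rule(condition, target_class, validation):
    current = list(condition)
    best_acc = rule_accuracy(current, target_class, validation)
    while current:
        # Try removing each condition in turn and keep the best resulting rule.
        candidates = [
            (rule_accuracy(current[:i] + current[i + 1:], target_class, validation), i)
            for i in range(len(current))
        ]
        acc, i = max(candidates)
        if acc <= best_acc:
            break  # no single removal helps on the validation set
        best_acc = acc
        del current[i]
    return current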

Tic Tac Toe

Tic Tac Toe (example board)
O X X
X O O
X O X

Tic Tac Toe: Remember this?
Decision Trees: .67 Kappa
SMO: .96 Kappa
Naïve Bayes: .28 Kappa
O X X
X O O
X O X

Decision Trees

How do you think the rule model would be different? Decision Trees

Rules from JRIP .95 Kappa! * When will it fail?

Optimization

Why Trees and Rules are Sometimes Counterintuitive
All machine learning algorithms are designed to avoid doing an exhaustive search of the vector space.
In order to reduce search time, they make simplifying assumptions that sometimes lead to counter-intuitive results.
We have talked about some variations on basic tree and rule learning; these affect which options are visible at each point in the search.

Locally Optimal Solutions

Why Trees and Rules are Sometimes Counterintuitive
The simplifying assumptions bias the search to favor certain regions of the hypothesis space.
Different algorithms have different biases, so they look at a different subset of solutions.
When this bias leads the algorithm to an optimal or near-optimal solution, it is a useful bias; whether it does depends largely on quirky characteristics of your data set.

Why Trees and Rules are Sometimes Counterintuitive
Simplifying assumptions increase efficiency but may decrease the quality of the derived solutions ("tunnel vision").
Spurious regularities in the data lead to unpredictable results.
Tuning the parameters of an algorithm changes its bias (e.g., binary splits vs. not).
You have to guard against overfitting!

Optimizing Parameter Settings
Use a modified form of cross-validation: within each fold, the data is split into Train, Validation, and Test portions.
Iterate over settings; compare performance over the validation set; pick the optimal setting; then test on the test set.
Still N folds, but each fold has less training data than with standard cross-validation.
Or you can have a hold-out validation set you use for all folds.

Optimizing Parameter Settings
This approach assumes that you want to estimate the generalization you will get from your learning and tuning approach together.
If you just want to know the best performance you can get on *this* set by tuning, you can just use standard cross-validation.
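In scikit-learn terms this is nested cross-validation; the sketch below is my own illustration, not the lecture's Weka setup. The inner GridSearchCV plays the role of the validation folds, and the outer cross_val_score holds out a test fold that tuning never sees.

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Inner loop: pick a parameter setting by cross-validation on the training folds.
tuned_tree = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"min_samples_leaf": [1, 5, 10, 20]},
    cv=5,
)

# Outer loop: estimates how the learner plus its tuning procedure generalizes.
nested_scores = cross_val_score(tuned_tree, X, y, cv=5)
print(nested_scores.mean())

# Plain (non-nested) cross-validation instead answers the narrower question:
# what is the best score reachable on *this* dataset by tuning?
tuned_tree.fit(X, y)
print(tuned_tree.best_score_, tuned_tree.best_params_)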

Take Home Message
Tree Based and Rule Based Learners are similar: rules are readable; both are greedy algorithms; both find a locally optimal solution.
Tree Based and Rule Based Learners are different: information gain versus accuracy; representational power with respect to disjunctions.