CS 478 - Learning Sets of Rules

Presentation transcript:

Learning Sets of Rules

Learning Rules

If (Color = Red) and (Shape = round) then Class is A
If (Color = Blue) and (Size = large) then Class is B
If (Shape = square) then Class is A

Natural and intuitive hypotheses
– Comprehensibility - easy to understand?
Ordered (prioritized) rules - default rule at the bottom; common, but not as easy to comprehend
Unordered rules - theoretically easier to understand, except we must either
– Force consistency, or
– Create a separate unordered list for each output class and use a tie-break scheme when multiple lists are matched
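To make the ordered (prioritized) list concrete, here is a minimal sketch, assuming a simple (conditions, class) rule representation that is not from the slides: rules are tried top to bottom, the first match decides the class, and a default class sits at the bottom.

```python
# Minimal sketch of classifying with an ordered (prioritized) rule list: first match wins.
# The rule representation and the default class are illustrative assumptions.

def matches(conditions, instance):
    """All feature = value conditions of a rule hold for the instance."""
    return all(instance.get(feat) == val for feat, val in conditions.items())

def classify_ordered(rules, instance, default_class="A"):
    """Try rules top to bottom; the first matching rule decides the class."""
    for conditions, label in rules:
        if matches(conditions, instance):
            return label
    return default_class  # default rule at the bottom of the list

rules = [
    ({"Color": "Red", "Shape": "round"}, "A"),
    ({"Color": "Blue", "Size": "large"}, "B"),
    ({"Shape": "square"}, "A"),
]
print(classify_ordered(rules, {"Color": "Blue", "Size": "large", "Shape": "round"}))  # -> B
```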

Sequential Covering Algorithms

There are a number of rule learning algorithms based on different variations of sequential covering – CN2, AQx, etc.
1. Find a "good" rule for the current training set
2. Delete covered instances (or those covered correctly) from the training set
3. Go back to 1 until the training set is empty or until no more "good" rules can be found
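A minimal sketch of that outer loop, assuming a find_good_rule helper (one possibility is sketched after the "Finding Good Rules" slides below) that returns a (conditions, class) rule or None; the names and the example representation are illustrative, not from the slides.

```python
# Minimal sketch of the sequential covering outer loop.
# find_good_rule and the (features_dict, label) example format are illustrative assumptions.

def matches(conditions, instance):
    return all(instance.get(feat) == val for feat, val in conditions.items())

def sequential_covering(training_set, find_good_rule):
    """training_set: list of (features_dict, label) pairs."""
    rules = []
    remaining = list(training_set)
    while remaining:
        rule = find_good_rule(remaining)           # 1. find a "good" rule for the current set
        if rule is None:                           # ... or stop when no more good rules can be found
            break
        conditions, _ = rule
        rules.append(rule)                         # new rule typically added at the bottom of the list
        remaining = [(x, y) for x, y in remaining  # 2. delete the instances the rule covers
                     if not matches(conditions, x)]
    return rules                                   # 3. repeat until the training set is empty
```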

Finding "Good" Rules

How might you quantify a "good" rule?

Finding "Good" Rules

The large majority of instances covered by the rule have the same output class
The rule covers as many instances as possible (general vs. specific rules)
The rule covers enough instances (statistically significant)
Example rules and approaches?
How to find good rules efficiently? A general-to-specific search is common (a greedy sketch follows below)
Continuous features - some type of ranges/discretization
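A rough sketch of that greedy general-to-specific search, under assumptions not in the slides: the score is the relative frequency of the majority class, rules are dicts of feature = value conditions, and min_coverage stands in for a real significance test.

```python
# Rough sketch of greedy general-to-specific rule growing.
# Scoring by relative frequency, the dict rule representation, and min_coverage
# as a stand-in for a significance test are all illustrative assumptions.
from collections import Counter

def matches(conditions, x):
    return all(x.get(f) == v for f, v in conditions.items())

def find_good_rule(examples, min_coverage=3):
    """examples: list of (features_dict, label). Start general (no conditions) and specialize greedily."""
    conditions = {}                                   # empty rule: covers every instance (most general)
    while True:
        covered = [(x, y) for x, y in examples if matches(conditions, x)]
        if len(covered) < min_coverage:
            return None                               # too little coverage to trust the rule
        majority, n_c = Counter(y for _, y in covered).most_common(1)[0]
        best_score, best_literal = n_c / len(covered), None
        for x, _ in covered:                          # candidate feature = value literals to add
            for f, v in x.items():
                if f in conditions:
                    continue
                cand = {**conditions, f: v}
                cov = [(a, b) for a, b in covered if matches(cand, a)]
                if len(cov) < min_coverage:
                    continue
                maj_count = Counter(b for _, b in cov).most_common(1)[0][1]
                if maj_count / len(cov) > best_score:
                    best_score, best_literal = maj_count / len(cov), (f, v)
        if best_literal is None:                      # no specialization improves the rule: stop
            return (conditions, majority)
        conditions[best_literal[0]] = best_literal[1]
```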

Common Rule "Goodness" Approaches

Relative frequency: n_c / n
m-estimate of accuracy (better when n is small): (n_c + m*p) / (n + m), where p is the prior probability of a random instance having the output class of the proposed rule; penalizes rules when n is small
Laplacian, a common special case: (n_c + 1) / (n + |C|) (i.e. m = 1/p with a uniform prior p = 1/|C|)
Entropy - favors rules which cover a large number of examples from a single class, and few from others
– Entropy can be better than relative frequency for subsequent rule induction: for the class distributions R1: (.7, .1, .1, .1) and R2: (.7, 0, .3, 0), relative frequency scores both at .7, but entropy selects R2, which makes for better subsequent specializations during later rule growth
– Empirically, rules of low entropy have higher significance than those chosen by relative frequency, but the Laplacian is often better than entropy
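As a small numeric illustration (not from the slides), these scores can be computed directly from the class counts of the instances a rule covers:

```python
# Illustrative computation of the rule "goodness" scores above from class counts.
import math

def relative_frequency(n_c, n):
    return n_c / n

def m_estimate(n_c, n, p, m):
    """m-estimate of accuracy: (n_c + m*p) / (n + m); p is the prior of the rule's class."""
    return (n_c + m * p) / (n + m)

def laplace(n_c, n, num_classes):
    """Laplacian estimate: the m-estimate with p = 1/|C| and m = |C|."""
    return (n_c + 1) / (n + num_classes)

def entropy(class_counts):
    """Entropy (in bits) of the class distribution a rule covers; lower is better."""
    n = sum(class_counts)
    return -sum((c / n) * math.log2(c / n) for c in class_counts if c > 0)

# R1 covers classes in proportions (.7, .1, .1, .1); R2 in (.7, 0, .3, 0).
print(entropy([7, 1, 1, 1]))  # ~1.36 bits
print(entropy([7, 0, 3, 0]))  # ~0.88 bits -> entropy prefers R2 even though both have .7 relative frequency
```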

Mitchell sorts the rules after the fact and, in each iteration, deletes only the correctly classified instances. We typically add the new rule to the bottom of the list – more on this in a minute.

RIPPER

Insert Rules at Top or Bottom

Typically we would like focused exception rules higher in the list and more general rules lower.
Typical (CN2): delete all instances covered by a rule during learning
– Putting the new rule at the bottom (i.e. early learned rules stay on top) makes sense, since that rule is rated good only after removing all instances covered by previous rules (i.e. it is scored on the instances which get past the earlier rules).
– We should still get exceptions near the top and general rules lower in the list, because an exception can achieve a high score early and thus be added sooner (assuming statistical significance) than a general rule which has to cover more cases. Even though the training set E keeps getting diminished, there should still be enough data to support reasonable general rules later (in fact the general rules should get increasing scores after the true exceptions are removed).
Highest scoring rules: somewhat specific, high accuracy, sufficient coverage
Medium scoring rules: general and specific rules with reasonable accuracy and coverage
Low scoring rules: specific rules with low coverage, and general rules with low accuracy

Rule Order - Continued

If we delete only the instances correctly covered by a rule:
– Putting new rules somewhere in the list other than the bottom could make sense, because we could learn exception rules for those instances not covered by the general rules at the bottom
– This only works if the rule placed at the bottom is truly more general than the later rules (i.e. many novel instances will slide past the more exceptional rules and get covered by the general rules at the bottom)
Sort after learning (Mitchell): proceed with care, because the rules were learned based on specific subsets/orderings of the training set
Other variations are possible, but many could be problematic because there are an exponential number of possible orderings
We can also use unordered lists with consistency constraints or tie-breaking mechanisms

Learning First Order Rules

Inductive Logic Programming
Propositional vs. first-order rules
– First-order rules allow variables
– If Color of object1 = x and Color of object2 = x then Class is A
– More expressive
FOIL - uses a sequential covering approach from general to specific which allows the addition of literals with variables
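As a toy illustration (an assumption, not FOIL itself), the rule above can be read as binding a variable x that must take the same value in both literals, something a fixed feature = value propositional rule cannot express:

```python
# Toy rendering of the first-order rule "If Color(object1) = x and Color(object2) = x then Class is A".
# The shared variable x is bound at match time, so the rule fires for any color the two objects share.
# This illustrates the representation only; it is not the FOIL algorithm.

def first_order_rule(instance):
    x = instance["object1"]["Color"]        # bind the variable x to object1's color
    if instance["object2"]["Color"] == x:   # the second literal must agree with that binding
        return "A"
    return None                             # rule does not fire

example = {"object1": {"Color": "Red"}, "object2": {"Color": "Red"}}
print(first_order_rule(example))  # -> "A", and it would also fire if both objects were Blue, Green, etc.
```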