CS 478 - Learning Rules 1: Learning Sets of Rules

CS 478 - Learning Rules 2: Learning Rules

If (Color = Red) and (Shape = round) then Class is A
If (Color = Blue) and (Size = large) then Class is B
If (Shape = square) then Class is A

Natural and intuitive hypotheses
- Comprehensibility - easy to understand?
Ordered (prioritized) rules - default rule at the bottom; common, but not so easy to comprehend
Unordered rules - theoretically easier to understand, except that one must either
- force consistency, or
- create a separate unordered list for each output class and use a tie-break scheme when multiple lists are matched
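A minimal sketch (illustrative, not from the lecture) of applying an ordered rule list: rules are tried top-down, the first antecedent that fully matches fires, and an explicit default class sits at the bottom. The attribute names, rule set, and default class below are assumptions.

# Ordered (prioritized) rule list: first match wins, default at the bottom.
rules = [
    ({"Color": "Red", "Shape": "round"}, "A"),
    ({"Color": "Blue", "Size": "large"}, "B"),
    ({"Shape": "square"}, "A"),
]
DEFAULT_CLASS = "B"  # assumed default; in practice often the majority class

def classify(instance, rules, default=DEFAULT_CLASS):
    """Return the class of the first rule whose antecedent fully matches the instance."""
    for antecedent, label in rules:
        if all(instance.get(attr) == value for attr, value in antecedent.items()):
            return label
    return default  # no rule matched: fall through to the default

print(classify({"Color": "Red", "Shape": "round", "Size": "small"}, rules))  # -> A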

CS 478 - Learning Rules 3: Sequential Covering Algorithms

There are a number of rule learning algorithms based on different variations of sequential covering - CN2, AQx, etc.
1. Find a "good" rule for the current training set
2. Delete covered instances (or those covered correctly) from the training set
3. Go back to 1 until the training set is empty or until no more "good" rules can be found
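A rough sketch of the three steps above, assuming a learn_one_rule helper (one greedy way to implement such a helper is sketched after the next slide) and rule objects with quality and covers() members; these names are assumptions, not the course's code.

def sequential_covering(instances, learn_one_rule, min_quality=0.0):
    """Steps 1-3: repeatedly learn a rule, then remove the instances it covers."""
    remaining = list(instances)
    rule_set = []
    while remaining:
        rule = learn_one_rule(remaining)              # step 1: find a "good" rule
        if rule is None or rule.quality < min_quality:
            break                                     # step 3: stop when no more "good" rules
        rule_set.append(rule)
        # step 2: delete covered instances (a variant deletes only the correctly covered ones)
        remaining = [x for x in remaining if not rule.covers(x)]
    return rule_set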

CS 478 - Learning Rules 4: Finding "Good" Rules

- The large majority of instances covered by the rule infer the same output class
- The rule covers as many instances as possible (general vs. specific rules)
- The rule covers enough instances (statistically significant)
Example rules and approaches?
How to find good rules efficiently? General-to-specific search is common
Continuous features - some type of ranges/discretization
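A hedged sketch of one common greedy general-to-specific search for a single rule: start from the empty (most general) antecedent and repeatedly add the attribute-value test that most improves a goodness score. The dict-based instance representation, the attributes/target arguments, and the score function are assumptions; any of the measures on the next slide could be plugged in as score. Assumes a non-empty, nominal-valued training set.

def learn_one_rule(instances, attributes, target, score):
    """Greedy general-to-specific search: grow the antecedent one test at a time."""
    antecedent = {}                              # empty antecedent = most general rule
    covered = list(instances)
    improved = True
    while improved:
        improved = False
        best_score, best_test = score(covered, target), None
        for attr in attributes:
            if attr in antecedent:
                continue                         # each attribute tested at most once
            for value in {x[attr] for x in covered}:
                subset = [x for x in covered if x[attr] == value]
                s = score(subset, target)
                if s > best_score:
                    best_score, best_test, improved = s, (attr, value), True
        if improved:
            attr, value = best_test
            antecedent[attr] = value
            covered = [x for x in covered if x[attr] == value]
    classes = [x[target] for x in covered]
    majority = max(set(classes), key=classes.count)  # consequent: majority class of covered set
    return antecedent, majority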

CS 478 - Learning Rules 5: Common Rule "Goodness" Approaches

Relative frequency: n_c / n
m-estimate of accuracy (better when n is small): (n_c + m*p) / (n + m), where p is the prior probability of a random instance having the output class of the proposed rule and m is a weighting constant; penalizes rules with small n
Laplacian is common: (n_c + 1) / (n + |C|)
Entropy - favors rules which cover a large number of examples from a single class, and few from others
- Entropy can be better than relative frequency and improves subsequent rule induction: for R1: (.7, .1, .1, .1) and R2: (.7, .3, 0, 0), entropy selects R2, which makes for better subsequent specializations during later rule growth
- Empirically, rules of low entropy have higher significance than those chosen by relative frequency, but the Laplacian is often better than entropy
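A small sketch of these measures as functions of the class counts among the instances a candidate rule covers (n_c = covered instances of the rule's class, n = total covered). The default value of m and the example class counts below are assumptions chosen to match the slide's numbers.

import math

def relative_frequency(n_c, n):
    return n_c / n

def m_estimate(n_c, n, p, m=2.0):
    # (n_c + m*p) / (n + m): pulls the estimate toward the prior p when n is small
    return (n_c + m * p) / (n + m)

def laplacian(n_c, n, num_classes):
    # (n_c + 1) / (n + |C|)
    return (n_c + 1) / (n + num_classes)

def entropy(class_counts):
    # Lower entropy = coverage concentrated in few classes
    total = sum(class_counts)
    return -sum((c / total) * math.log2(c / total) for c in class_counts if c > 0)

# The slide's example: both distributions have relative frequency .7 for the top class,
# but R2 has lower entropy, so an entropy-based criterion prefers R2.
print(entropy([7, 1, 1, 1]))  # R1: about 1.36 bits
print(entropy([7, 3, 0, 0]))  # R2: about 0.88 bits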

CS 478 - Learning Rules 9: Learning First Order Rules

Inductive Logic Programming
Propositional vs. first-order rules
- First-order rules allow variables in rules
- Example: If Color of object1 = x and Color of object2 = x then Class is A
- More expressive
FOIL - uses a sequential covering approach from general to specific which allows the addition of literals with variables
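An illustrative sketch (not from the slides) of what the variable buys: the rule above fires whenever the two objects share a color, whatever that color is, which would otherwise take one propositional rule per color. The attribute names are assumptions.

# First-order rule: If Color(object1) = x and Color(object2) = x then Class is A.
# Checking coverage = finding a consistent binding for the variable x.

def covers_shared_color(instance):
    """True if some value x can be bound so that both color conditions hold."""
    for x in {instance["object1_color"], instance["object2_color"]}:  # candidate bindings for x
        if instance["object1_color"] == x and instance["object2_color"] == x:
            return True
    return False

print(covers_shared_color({"object1_color": "red", "object2_color": "red"}))   # True  -> Class A
print(covers_shared_color({"object1_color": "red", "object2_color": "blue"}))  # False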

CS 478 - Learning Rules 10: RIPPER