CIS 732: Machine Learning and Pattern Recognition
Lecture 26: Rule Learning and Extraction
Tuesday, November 27, 2001
William H. Hsu
Department of Computing and Information Sciences, Kansas State University
Readings: Sections , Mitchell; Section 21.4, Russell and Norvig; Section 5.4.5, Shavlik and Dietterich (Shavlik and Towell)

Lecture Outline
Readings: Sections , Mitchell; Section 21.4, Russell and Norvig
Suggested Exercises: 10.1, 10.2, Mitchell
This Week's Paper Review (Last One!)
– "An Approach to Combining Explanation-Based and Neural Network Algorithms", Shavlik and Towell
– Due today, 11/30/1999
Sequential Covering Algorithms
– Learning single rules by search
  Beam search
  Alternative covering methods
– Learning rule sets
First-Order Rules
– Learning single first-order rules
– FOIL: learning first-order rule sets

Learning Disjunctive Sets of Rules
Method 1: Rule Extraction from Trees
– Learn a decision tree
– Convert to rules: one rule per root-to-leaf path
– Recall: can post-prune rules (drop preconditions to improve validation set accuracy)
Method 2: Sequential Covering
– Idea: greedily (sequentially) find rules that apply to (cover) instances in D
– Algorithm
  Learn one rule with high accuracy, any coverage
  Remove positive examples (of the target attribute) covered by this rule
  Repeat

Sequential Covering: Algorithm
Algorithm Sequential-Covering (Target-Attribute, Attributes, D, Threshold)
– Learned-Rules ← {}
– New-Rule ← Learn-One-Rule (Target-Attribute, Attributes, D)
– WHILE Performance (New-Rule, D) > Threshold DO
  Learned-Rules += New-Rule                    // add new rule to set
  D.Remove-Covered-By (New-Rule)               // remove examples covered by New-Rule
  New-Rule ← Learn-One-Rule (Target-Attribute, Attributes, D)
– Sort-By-Performance (Learned-Rules, Target-Attribute, D)
– RETURN Learned-Rules
What Does Sequential-Covering Do?
– Learns one rule, New-Rule
– Takes out every example in D to which New-Rule applies (every covered example)
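For concreteness, the outer loop can be sketched in Python roughly as follows. The learn_one_rule and performance callables and the rule's covers() method are hypothetical stand-ins for the routines named above, not part of any standard library.

# Sketch of the Sequential-Covering outer loop; learn_one_rule, performance,
# and the rule's covers() method are hypothetical stand-ins for the routines above.
def sequential_covering(target_attr, attributes, examples, threshold,
                        learn_one_rule, performance):
    full_data = list(examples)
    remaining = list(examples)
    learned_rules = []
    new_rule = learn_one_rule(target_attr, attributes, remaining)
    while performance(new_rule, remaining) > threshold:
        learned_rules.append(new_rule)                          # add new rule to set
        remaining = [e for e in remaining
                     if not new_rule.covers(e)]                 # remove covered examples
        new_rule = learn_one_rule(target_attr, attributes, remaining)
    # Sort-By-Performance: rank the final rule set on the full data set.
    learned_rules.sort(key=lambda r: performance(r, full_data), reverse=True)
    return learned_rules

Note that the final sort evaluates each rule on the full data set, matching the Sort-By-Performance step above.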

Learn-One-Rule: (Beam) Search for Preconditions
Most general rule:
– IF {} THEN Play-Tennis = Yes
Specializations with one precondition:
– IF {Humidity = Normal} THEN Play-Tennis = Yes
– IF {Wind = Strong} THEN Play-Tennis = No
– IF {Wind = Light} THEN Play-Tennis = Yes
– IF {Humidity = High} THEN Play-Tennis = No
– …
Specializations with two preconditions:
– IF {Humidity = Normal, Outlook = Sunny} THEN Play-Tennis = Yes
– IF {Humidity = Normal, Wind = Strong} THEN Play-Tennis = Yes
– IF {Humidity = Normal, Wind = Light} THEN Play-Tennis = Yes
– IF {Humidity = Normal, Outlook = Rain} THEN Play-Tennis = Yes
– …

Learn-One-Rule: Algorithm
Algorithm Sequential-Covering (Target-Attribute, Attributes, D)
– Pos ← D.Positive-Examples()
– Neg ← D.Negative-Examples()
– WHILE NOT Pos.Empty() DO                      // learn new rule
  New-Rule ← Learn-One-Rule (Target-Attribute, Attributes, D)
  Learned-Rules.Add-Rule (New-Rule)
  Pos.Remove-Covered-By (New-Rule)
– RETURN (Learned-Rules)
Algorithm Learn-One-Rule (Target-Attribute, Attributes, D)
– New-Rule ← most general rule possible
– New-Rule-Neg ← Neg
– WHILE NOT New-Rule-Neg.Empty() DO             // specialize New-Rule
  1. Candidate-Literals ← Generate-Candidates ()    // NB: rank by Performance()
  2. Best-Literal ← argmax_{L ∈ Candidate-Literals} Performance (Specialize-Rule (New-Rule, L), Target-Attribute, D)    // all possible new constraints
  3. New-Rule.Add-Precondition (Best-Literal)       // add the best one
  4. New-Rule-Neg ← New-Rule-Neg.Filter-By (New-Rule)
– RETURN (New-Rule)
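A minimal propositional sketch of Learn-One-Rule with beam width 1, assuming examples are dicts of attribute values and a rule is just a dict of preconditions; both are simplifications of the abstract data types above.

# Propositional sketch of Learn-One-Rule (beam width 1): examples are dicts of
# attribute values, a rule is a dict of preconditions implying target_value.
def learn_one_rule(target_attr, attributes, examples, target_value=True):
    def covers(rule, example):
        return all(example.get(a) == v for a, v in rule.items())

    def accuracy(rule):
        covered = [e for e in examples if covers(rule, e)]
        if not covered:
            return 0.0
        return sum(e[target_attr] == target_value for e in covered) / len(covered)

    def covers_negative(rule):
        return any(covers(rule, e) and e[target_attr] != target_value
                   for e in examples)

    preconditions = {}                      # most general rule: empty precondition set
    while covers_negative(preconditions):
        # Candidate literals: (attribute, value) pairs not yet constrained by the rule.
        candidates = {(a, e[a]) for e in examples for a in attributes
                      if a not in preconditions}
        if not candidates:
            break                           # cannot specialize any further
        best_attr, best_val = max(
            candidates, key=lambda c: accuracy({**preconditions, c[0]: c[1]}))
        preconditions[best_attr] = best_val  # add the best precondition
    return preconditions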

Learn-One-Rule: Subtle Issues
How Does Learn-One-Rule Implement Search?
– Effective approach: Learn-One-Rule organizes H in the same general fashion as ID3
– Difference
  Follows only the most promising branch in the tree at each step
  Only one attribute-value pair (versus splitting on all possible values)
– General-to-specific search (depicted in figure)
  Problem: greedy depth-first search is susceptible to local optima
  Solution approach: beam search (rank by performance, always expand the k best)
– Easily generalizes to multi-valued target functions (how?)
Designing an Evaluation Function to Guide Search
– Performance (Rule, Target-Attribute, D)
– Possible choices
  Entropy (i.e., information gain), as for ID3
  Sample accuracy: n_c / n (correct rule predictions / total predictions)
  m-estimate: (n_c + mp) / (n + m), where m = weight, p = prior of rule RHS
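The three candidate evaluation functions can be written directly from these definitions; the count-based interface (n_correct, n_covered, class_counts) is an illustrative simplification of Performance (Rule, Target-Attribute, D).

import math

# The three evaluation functions, written from the definitions above.
def sample_accuracy(n_correct, n_covered):
    """Relative frequency: n_c / n."""
    return n_correct / n_covered if n_covered else 0.0

def m_estimate(n_correct, n_covered, prior, m):
    """(n_c + m * p) / (n + m); p is the prior of the rule's right-hand side."""
    return (n_correct + m * prior) / (n_covered + m)

def entropy(class_counts):
    """Entropy of the class distribution among the examples a rule covers."""
    total = sum(class_counts)
    return -sum((c / total) * math.log2(c / total) for c in class_counts if c > 0)

# Example: a rule covers 8 examples, 6 correctly; target prior 0.5, m = 2.
print(sample_accuracy(6, 8))      # 0.75
print(m_estimate(6, 8, 0.5, 2))   # 0.7
print(entropy([6, 2]))            # about 0.811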

Variants of Rule Learning Programs
Sequential or Simultaneous Covering of Data?
– Sequential: isolate components of the hypothesis (e.g., search for one rule at a time)
– Simultaneous: whole hypothesis at once (e.g., search for a whole tree at a time)
General-to-Specific or Specific-to-General?
– General-to-specific: add preconditions; Find-G
– Specific-to-general: drop preconditions; Find-S
Generate-and-Test or Example-Driven?
– Generate-and-test: search through syntactically legal hypotheses
– Example-driven: Find-S, Candidate-Elimination, Cigol (next time)
Post-Pruning of Rules?
– Recall (Lecture 5): very popular overfitting recovery method
What Statistical Evaluation Method?
– Entropy
– Sample accuracy (aka relative frequency)
– m-estimate of accuracy

First-Order Rules
What Are First-Order Rules?
– Well-formed formulas (WFFs) of first-order predicate calculus (FOPC)
– Sentences of first-order logic (FOL)
– Example (recursive)
  Ancestor (x, y) ← Parent (x, y).
  Ancestor (x, y) ← Parent (x, z) ∧ Ancestor (z, y).
Components of FOPC Formulas: Quick Intro to Terminology
– Constants: e.g., John, Kansas, 42
– Variables: e.g., Name, State, x
– Predicates: e.g., Father-Of, Greater-Than
– Functions: e.g., age, cosine
– Term: constant, variable, or function(term)
– Literals (atoms): Predicate(term) or its negation (e.g., ¬ Greater-Than (age(John), 42))
– Clause: disjunction of literals with implicit universal quantification
– Horn clause: at most one positive literal (H ∨ ¬L1 ∨ ¬L2 ∨ … ∨ ¬Ln)

Learning First-Order Rules
Why Do That?
– Can learn sets of rules such as
  Ancestor (x, y) ← Parent (x, y).
  Ancestor (x, y) ← Parent (x, z) ∧ Ancestor (z, y).
– General-purpose (Turing-complete) programming language: Prolog
  Programs are such sets of rules (Horn clauses)
  Inductive logic programming (next time): a kind of program synthesis
Caveat
– Arbitrary inference using first-order rules is semi-decidable
  Recursively enumerable but not recursive (reduction to the halting problem L_H)
  Compare: resolution theorem proving; arbitrary queries in Prolog
– Generally, may have to restrict power
  Inferential completeness
  Expressive power of Horn clauses
  Learning part

First-Order Rule: Example
Prolog (FOPC) Rule for Classifying Web Pages
– [Slattery, 1997]
– Course (A) ← Has-Word (A, “instructor”), not Has-Word (A, “good”), Link-From (A, B), Has-Word (B, “assign”), not Link-From (B, C).
– Train: 31/31, test: 31/34
How Are Such Rules Used?
– Implement search-based (inferential) programs
– References
  Chapters 1-10, Russell and Norvig
  Online resources at

First-Order Inductive Learning (FOIL): Algorithm
Algorithm FOIL (Target-Predicate, Predicates, D)
– Pos ← D.Filter-By (Target-Predicate)              // examples for which it is true
– Neg ← D.Filter-By (Not (Target-Predicate))        // examples for which it is false
– WHILE NOT Pos.Empty() DO                          // learn new rule
  New-Rule ← Learn-One-First-Order-Rule (Target-Predicate, Predicates, D)
  Learned-Rules.Add-Rule (New-Rule)
  Pos.Remove-Covered-By (New-Rule)
– RETURN (Learned-Rules)
Algorithm Learn-One-First-Order-Rule (Target-Predicate, Predicates, D)
– New-Rule ← the rule that predicts Target-Predicate with no preconditions
– New-Rule-Neg ← Neg
– WHILE NOT New-Rule-Neg.Empty() DO                 // specialize New-Rule
  1. Candidate-Literals ← Generate-Candidates ()    // based on Predicates
  2. Best-Literal ← argmax_{L ∈ Candidate-Literals} FOIL-Gain (L, New-Rule, Target-Predicate, D)    // all possible new literals
  3. New-Rule.Add-Precondition (Best-Literal)       // add the best one
  4. New-Rule-Neg ← New-Rule-Neg.Filter-By (New-Rule)
– RETURN (New-Rule)

Specializing Rules in FOIL
Learning Rule: P(x1, x2, …, xk) ← L1 ∧ L2 ∧ … ∧ Ln.
Candidate Specializations
– Add a new literal to get a more specific Horn clause
– Forms of literal
  Q(v1, v2, …, vr), where at least one of the vi in the created literal must already exist as a variable in the rule
  Equal(xj, xk), where xj and xk are variables already present in the rule
  The negation of either of the above forms of literals
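A rough sketch of how these candidate specializations might be enumerated. The (negated, predicate, args) tuple encoding and the single fresh variable v_new are illustrative choices for this sketch, not FOIL's actual data structures.

from itertools import combinations, product

# Sketch of candidate-literal generation for specializing a rule.
def candidate_literals(rule_vars, predicates):
    """rule_vars: variables already in the rule; predicates: {name: arity}."""
    new_var = "v_new"
    pool = sorted(rule_vars) + [new_var]
    candidates = []
    for name, arity in predicates.items():
        for args in product(pool, repeat=arity):
            if any(a in rule_vars for a in args):       # at least one existing variable
                candidates.append((False, name, args))
                candidates.append((True, name, args))   # ...and its negation
    for xj, xk in combinations(sorted(rule_vars), 2):   # Equal(xj, xk) literals
        candidates.append((False, "Equal", (xj, xk)))
        candidates.append((True, "Equal", (xj, xk)))
    return candidates

# e.g. candidate_literals({"x", "y"}, {"Parent": 2}) yields Parent(x, y),
# Parent(x, v_new), ..., their negations, and Equal(x, y) with its negation.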

Information Gain in FOIL
Function FOIL-Gain (L, R, Target-Predicate, D)
– FOIL-Gain (L, R) ≡ t · (lg (p1 / (p1 + n1)) − lg (p0 / (p0 + n0)))
Where
– L ≡ candidate literal to add to rule R
– p0 ≡ number of positive bindings of R
– n0 ≡ number of negative bindings of R
– p1 ≡ number of positive bindings of R + L
– n1 ≡ number of negative bindings of R + L
– t ≡ number of positive bindings of R also covered by R + L
Note
– −lg (p0 / (p0 + n0)) is the optimal number of bits to indicate the class of a positive binding covered by R
– Compare: entropy (information gain) measure in ID3
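A small Python version of this measure, taking the binding counts directly as arguments (an illustrative interface, not FOIL's internal representation):

import math

# FOIL-Gain written out from the definitions above.
def foil_gain(p0, n0, p1, n1, t):
    if p1 == 0:
        return float("-inf")      # the specialized rule covers no positive bindings
    return t * (math.log2(p1 / (p1 + n1)) - math.log2(p0 / (p0 + n0)))

# Example: R has 10 positive and 10 negative bindings; after adding L there are
# 8 positive and 2 negative bindings, and all 8 positives were already covered by R.
print(foil_gain(10, 10, 8, 2, 8))   # 8 * (log2(0.8) - log2(0.5)) ≈ 5.42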

FOIL: Learning Recursive Rule Sets
Recursive Rules
– So far: ignored the possibility of recursive WFFs
  New literals added to the rule body could refer to the target predicate itself
  i.e., the predicate occurs in the rule head
– Example
  Ancestor (x, y) ← Parent (x, z) ∧ Ancestor (z, y).
  Rule: IF Parent (x, z) ∧ Ancestor (z, y) THEN Ancestor (x, y)
Learning Recursive Rules from Relations
– Given: an appropriate set of training examples
– Can learn using FOIL-based search
  Requirement: Ancestor ∈ Predicates (symbol is a member of the candidate set)
  Recursive rules still have to outscore competing candidates on FOIL-Gain
– NB: how to ensure termination? (well-founded ordering, i.e., no infinite recursion)
– [Quinlan, 1990; Cameron-Jones and Quinlan, 1993]

FOIL: Summary
Extends the Sequential-Covering Algorithm
– Handles the case of learning first-order rules similar to Horn clauses
– Result: more powerful rules for the performance element (automated reasoning)
General-to-Specific Search
– Adds literals (predicates and negations over functions, variables, constants)
– Can learn sets of recursive rules
  Caveat: might learn infinitely recursive rule sets
  Has been shown to successfully induce recursive rules in some cases
Overfitting
– If there is no noise, might keep adding new literals until the rule covers no negative examples
– Solution approach: tradeoff (heuristic evaluation function on rules)
  Accuracy, coverage, complexity
  FOIL-Gain: an MDL function
  Overfitting recovery in FOIL: post-pruning

Terminology
Disjunctive Rules
– Learning single rules by search
  Beam search: type of heuristic search that maintains a constant-width frontier
– Learning rule sets
  Sequential covering versus simultaneous covering
First-Order Rules
– Units of first-order predicate calculus (FOPC): constants, variables, predicates, functions, terms, literals (atoms), well-formed formulas (WFFs, clauses)
– FOPC quantifiers: universal (∀), existential (∃)
– Horn clauses
  Sentences of Prolog (clauses with ≤ 1 positive literal)
  Of the form H ∨ ¬L1 ∨ ¬L2 ∨ … ∨ ¬Ln (implicit ∀); Prolog form: H :- L1, L2, …, Ln.
– FOIL: algorithm for learning Horn clauses (including recursive rule sets)

Summary Points
Learning Rules from Data
Sequential Covering Algorithms
– Learning single rules by search
  Beam search
  Alternative covering methods
– Learning rule sets
First-Order Rules
– Learning single first-order rules
  Representation: first-order Horn clauses
  Extending Sequential-Covering and Learn-One-Rule: variables in rule preconditions
– FOIL: learning first-order rule sets
  Idea: inducing logical rules from observed relations
  Guiding search in FOIL
  Learning recursive rule sets
Next Time: Inductive Logic Programming (ILP)