Machine Learning: Lecture 9

Machine Learning: Lecture 9 Rule Learning / Inductive Logic Programming / Association Rules

Learning Rules One of the most expressive and human-readable representations for learned hypotheses is a set of production rules (if-then rules). Rules can be derived from other representations (e.g., decision trees) or they can be learned directly. Here, we concentrate on the direct method. An important aspect of direct rule-learning algorithms is that they can learn sets of first-order rules, which have much more representational power than the propositional rules that can be derived from decision trees. Rule learning also allows the incorporation of background knowledge into the process. Learning rules is also useful for the data mining task of association rule mining.

Propositional versus First-Order Logic Propositional logic does not include variables and thus cannot express general relations among the values of the attributes. Example 1: in propositional logic, you can write: IF (Father1=Bob) ^ (Name2=Bob) ^ (Female1=True) THEN Daughter1,2=True. This rule applies only to a specific family! Example 2: in first-order logic, you can write: IF Father(y,x) ^ Female(y) THEN Daughter(x,y). This rule (which you cannot write in propositional logic) applies to any family!
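To make the contrast concrete, here is a minimal Python sketch (the family facts, the dictionary encoding, and the function names are illustrative, not from the slides): the first-order rule is parameterized by variables, so one definition covers every family in the knowledge base, while the propositional rule is tied to Bob's family.

```python
# Hypothetical family facts: father_of maps a person to their father;
# is_female records the Female predicate.
father_of = {"Sharon": "Bob", "Tom": "Bob", "Alice": "Joe"}
is_female = {"Sharon": True, "Tom": False, "Alice": True}

# Propositional rule: the constant "Bob" is baked in (cf. Father1=Bob, Name2=Bob
# above), so it only describes Bob's family.
def daughter_bob(child):
    return father_of.get(child) == "Bob" and is_female.get(child, False)

# First-order rule: Daughter(x, y) <- Father(y, x) ^ Female(y); x and y are
# variables, so the same rule applies to any family in the knowledge base.
def daughter(x, y):
    return father_of.get(y) == x and is_female.get(y, False)

print(daughter("Bob", "Sharon"))   # True
print(daughter("Joe", "Alice"))    # True: same rule, different family
print(daughter_bob("Alice"))       # False: the propositional rule misses this case
```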

Learning Propositional versus First-Order Rules Both approaches to learning are useful, as they address different types of learning problems. Like decision trees, feedforward neural nets and IBL systems, propositional rule-learning systems are suited to problems in which no substantial relationship between the values of the different attributes needs to be represented. In first-order learning problems, the hypotheses that must be represented involve relational assertions that can be conveniently expressed using first-order representations such as Horn clauses (H <- L1 ^ … ^ Ln).

Learning Propositional Rules: Sequential Covering Algorithms
Sequential-Covering(Target_attribute, Attributes, Examples, Threshold)
  Learned_rules <-- { }
  Rule <-- Learn-one-rule(Target_attribute, Attributes, Examples)
  While Performance(Rule, Examples) > Threshold, do
    Learned_rules <-- Learned_rules + Rule
    Examples <-- Examples - {examples correctly classified by Rule}
    Rule <-- Learn-one-rule(Target_attribute, Attributes, Examples)
  Learned_rules <-- sort Learned_rules according to Performance over Examples
  Return Learned_rules

Learning Propositional Rules: Sequential Covering Algorithms The algorithm is called a sequential covering algorithm because it sequentially learns a set of rules that together cover the whole set of positive examples. It has the advantage of reducing the problem of learning a disjunctive set of rules to a sequence of simpler problems, each requiring that a single conjunctive rule be learned. The final set of rules is sorted so that the most accurate rules are considered first at classification time. However, because it does not backtrack, this algorithm is not guaranteed to find the smallest or best set of rules ---> Learn-one-rule must be very effective!

Learning Propositional Rules: Learn-one-rule General-to-Specific Search: consider the most general rule (hypothesis), which matches every instance in the training set. Repeat: add the attribute test that most improves rule performance measured over the training set, until the hypothesis reaches an acceptable level of performance. General-to-Specific Beam Search (CN2): rather than considering a single candidate at each search step, keep track of the k best candidates.
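A minimal Python sketch of the two ideas above, under simplifying assumptions (examples encoded as dicts with a boolean target, rule performance measured as the fraction of covered examples that are positive, greedy search with no beam); it is an illustration of the sequential covering / learn-one-rule structure, not Mitchell's exact procedure.

```python
def covers(rule, example):
    """A rule is a dict of attribute: value pre-conditions; {} matches everything."""
    return all(example.get(a) == v for a, v in rule.items())

def performance(rule, examples, target):
    """Fraction of the examples covered by the rule that are positive."""
    covered = [e for e in examples if covers(rule, e)]
    return sum(e[target] for e in covered) / len(covered) if covered else 0.0

def learn_one_rule(attributes, examples, target):
    """Greedy general-to-specific search: start from the most general rule and
    keep adding the single attribute test that most improves performance."""
    rule = {}
    while True:
        candidates = [(a, v) for a in attributes if a not in rule
                      for v in {e[a] for e in examples}]
        if not candidates:
            return rule
        a, v = max(candidates,
                   key=lambda av: performance({**rule, av[0]: av[1]}, examples, target))
        better = {**rule, a: v}
        if performance(better, examples, target) <= performance(rule, examples, target):
            return rule
        rule = better

def sequential_covering(attributes, examples, target, threshold=0.8):
    """Learn rules one at a time, removing the examples each rule classifies correctly."""
    learned, remaining = [], list(examples)
    while remaining:
        rule = learn_one_rule(attributes, remaining, target)
        if not rule or performance(rule, remaining, target) <= threshold:
            break
        learned.append(rule)
        remaining = [e for e in remaining
                     if not (covers(rule, e) and e[target])]  # drop covered positives
    return learned
```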

Comments and Variations regarding the Basic Rule Learning Algorithms Sequential versus simultaneous covering: sequential covering algorithms (e.g., CN2) make a larger number of independent choices than simultaneous covering ones (e.g., ID3). Direction of the search: CN2 uses a general-to-specific search strategy. Other systems (e.g., GOLEM) use a specific-to-general search strategy. General-to-specific search has the advantage of having a single hypothesis from which to start. Generate-then-test versus example-driven: CN2 is a generate-then-test method. Other methods (e.g., AQ, CIGOL) are example-driven. Generate-then-test systems are more robust to noise.

Comments and Variations regarding the Basic Rule Learning Algorithms, Cont'd Post-pruning: pre-conditions can be removed from the rule whenever this leads to improved performance over a set of pruning examples distinct from the training set. Performance measure: different evaluation functions can be used, for example relative frequency (AQ), the m-estimate of accuracy (certain versions of CN2) and entropy (the original CN2).
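The three evaluation measures named above translate directly into code; a brief sketch (the counts and the class prior are the caller's inputs, and m = 2 is just an illustrative default):

```python
import math

def relative_frequency(n_correct, n_covered):
    # AQ-style: fraction of the covered examples the rule classifies correctly.
    return n_correct / n_covered if n_covered else 0.0

def m_estimate(n_correct, n_covered, prior, m=2.0):
    # Smoothed accuracy: shrinks toward the prior class probability when the
    # rule covers few examples (used by certain versions of CN2).
    return (n_correct + m * prior) / (n_covered + m)

def entropy(class_counts):
    # Entropy of the class distribution of the examples the rule covers
    # (original CN2); lower is better.
    total = sum(class_counts)
    probs = [c / total for c in class_counts if c > 0]
    return -sum(p * math.log2(p) for p in probs)
```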

Example: RIPPER (this and the next three slides are borrowed from E. Alpaydin, Lecture Notes for Introduction to Machine Learning, 2004, MIT Press, Chapter 9). There are two kinds of loop in the RIPPER algorithm: the outer loop adds one rule at a time to the rule base; the inner loop adds one condition at a time to the current rule. Conditions are added to the rule to maximize an information gain measure, and they are added until the rule covers no negative example.

RIPPER scales as O(N log² N) in the number of training examples N. DL: description length of the rule base. The description length of a rule base = (the sum of the description lengths of all the rules in the rule base) + (the description length of the instances not covered by the rule base).
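Written as code, the slide's definition of the description length of a rule base is simply the following (a sketch with hypothetical inputs: per-rule description lengths and a per-instance cost for the uncovered exceptions):

```python
def rule_base_description_length(rule_dls, n_uncovered, bits_per_exception):
    # DL(rule base) = sum of the description lengths of its rules
    #               + cost of describing the instances it fails to cover.
    return sum(rule_dls) + n_uncovered * bits_per_exception
```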

Ripper Algorithm In RIPPER, conditions are added to the rule to maximize an information gain measure, where R is the original rule; R' is the candidate rule after adding a condition; N (N') is the number of instances covered by R (R'); N+ (N'+) is the number of true positives in R (R'); and s is the number of true positives in both R and R' (after adding the condition). Conditions are added until the rule covers no negative example. For pruning, a rule value metric is computed from p and n, the number of true and false positives respectively.
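The slide refers to formulas for the gain and the rule value metric; below is a sketch using the forms given in Alpaydın's notes (treat the exact expressions, and the function names, as assumptions of this sketch):

```python
import math

def ripper_gain(s, n_pos, n, n_pos_new, n_new):
    """Gain from specializing rule R into R' by adding one condition.
    n, n_pos:         instances covered / true positives of R
    n_new, n_pos_new: the same quantities for R'
    s:                true positives covered by both R and R'."""
    return s * (math.log2(n_pos_new / n_new) - math.log2(n_pos / n))

def rule_value_metric(p, n):
    """Used when deciding whether to prune conditions from a rule:
    p, n = true and false positives of the rule on the pruning set."""
    return (p - n) / (p + n)
```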

Incorporating Background Knowledge into the Learning Process: Induction as Inverted Deduction Let D be a set of training examples, each of the form <xi, f(xi)>. Then learning is the problem of discovering a hypothesis h such that the classification f(xi) of each training instance xi follows deductively from the hypothesis h, the description of xi, and any other background knowledge B known to the system; that is, we want to find h such that (B ^ h ^ xi) |-- f(xi). Example: xi: Male(Bob), Female(Sharon), Father(Sharon, Bob); f(xi): Child(Bob, Sharon); B: Parent(u,v) <-- Father(u,v). Two hypotheses that satisfy this constraint are h1: Child(u,v) <-- Father(v,u) and h2: Child(u,v) <-- Parent(v,u).
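A small sketch (the tuple encoding of literals and the function names are assumptions made here) that checks (B ^ h ^ xi) |-- f(xi) for both hypotheses by one round of forward chaining:

```python
# Literals as tuples, e.g. ("Father", "Sharon", "Bob") stands for Father(Sharon, Bob).
x_i  = {("Male", "Bob"), ("Female", "Sharon"), ("Father", "Sharon", "Bob")}
f_xi = ("Child", "Bob", "Sharon")

def apply_B(facts):
    # B: Parent(u, v) <- Father(u, v)
    return facts | {("Parent", f[1], f[2]) for f in facts if f[0] == "Father"}

def apply_h1(facts):
    # h1: Child(u, v) <- Father(v, u)
    return facts | {("Child", f[2], f[1]) for f in facts if f[0] == "Father"}

def apply_h2(facts):
    # h2: Child(u, v) <- Parent(v, u)
    return facts | {("Child", f[2], f[1]) for f in facts if f[0] == "Parent"}

print(f_xi in apply_h1(apply_B(x_i)))  # True: (B ^ h1 ^ xi) |-- f(xi)
print(f_xi in apply_h2(apply_B(x_i)))  # True: (B ^ h2 ^ xi) |-- f(xi)
```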

Learning Sets of First-Order Rules: FOIL (Quinlan, 1990) FOIL is similar to the propositional rule-learning approach except for the following: FOIL accommodates first-order rules and thus needs to accommodate variables in the rule pre-conditions. FOIL uses a special performance measure (FOIL-GAIN) which takes into account the different variable bindings. FOIL seeks only rules that predict when the target literal is True (instead of predicting when it is True or when it is False). FOIL performs a simple hill-climbing search rather than a beam search.

Association Rule Mining (borrowed from Stan Matwin's slides) Given I = {i1, ..., im}, a set of items, and D, a set of transactions (a database), where each transaction T is a subset of I (T in 2^I). An association rule is an implication X => Y, where X ⊂ I, Y ⊂ I and X ∩ Y = ∅. The support of an itemset is defined as the proportion of transactions in the data set which contain the itemset. The confidence of a rule is defined as conf(X => Y) = supp(X ∪ Y) / supp(X). An itemset is frequent if its support > θ.
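These definitions translate directly into code; a minimal sketch with transactions represented as Python sets (the names are illustrative):

```python
def support(itemset, transactions):
    """Proportion of transactions that contain every item of the itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(x, y, transactions):
    """conf(X => Y) = supp(X ∪ Y) / supp(X)."""
    return support(set(x) | set(y), transactions) / support(x, transactions)
```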

Itemsets and Association Rules Itemset = set of items; k-itemset = set of k items. Finding association rules in databases: first, find all frequent (or large) itemsets (those with support > minS); then generate the rules that satisfy minimum confidence. Example of an association rule: people who buy a computer also buy financial software (support of 2%; confidence of 60%).

Example Transaction table: five transactions over the items milk, bread, butter, beer (table omitted). Itemset {milk, bread, butter}: support = 1/5 = 0.2. Rule {Bread, Butter} => {Milk}: confidence = 0.2 / 0.2 = 1.

Apriori Algorithm Start with the individual items with large support. In each next step k, use the itemsets from step k-1 to generate the new candidate itemsets Ck, compute the support of each candidate in Ck, and prune the ones that are below the threshold θ. Apriori property: all non-empty subsets of a frequent itemset must be frequent.
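A compact Python sketch of this level-wise search (itemsets as frozensets; the join and pruning steps are simplified relative to the full Apriori candidate generation):

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return all frequent itemsets, found level by level."""
    n = len(transactions)
    supp = lambda iset: sum(iset <= t for t in transactions) / n
    items = {i for t in transactions for i in t}
    # L1: frequent 1-itemsets (individual items with large support).
    levels = [{frozenset([i]) for i in items if supp(frozenset([i])) >= min_support}]
    k = 2
    while levels[-1]:
        prev = levels[-1]
        # Join step: candidate k-itemsets from pairs of frequent (k-1)-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step (Apriori property): every (k-1)-subset must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k - 1))}
        levels.append({c for c in candidates if supp(c) >= min_support})
        k += 1
    return [itemset for level in levels for itemset in level]

# Example call on a tiny hypothetical transaction list:
# apriori([{"milk", "bread"}, {"bread", "butter"}, {"milk", "bread", "butter"}], 0.5)
```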

Apriori Algorithm: Example from Han & Kamber, Data Mining, p. 232
TID    List of item IDs
T100   I1, I2, I5
T200   I2, I4
T300   I2, I3
T400   I1, I2, I4
T500   I1, I3
T600   I2, I3
T700   I1, I3
T800   I1, I2, I3, I5
T900   I1, I2, I3

Apriori Algorithm: Example from Han & Kamber, Data Mining, p. 232 (Cont'd)

From itemsets to association rules For each frequent itemset I, generate all partitions of I into (s, I - s) with s non-empty, and output the rule s => (I - s) iff support_count(I) / support_count(s) >= min_conf (the minimum confidence).
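A sketch of this rule-generation step (assuming a dictionary support_count that maps frozenset itemsets to their counts, e.g. collected during the Apriori pass above):

```python
from itertools import combinations

def rules_from_itemset(itemset, support_count, min_conf):
    """Emit every rule s => (I - s) from frequent itemset I whose confidence,
    support_count[I] / support_count[s], meets the minimum confidence."""
    itemset = frozenset(itemset)
    rules = []
    for r in range(1, len(itemset)):                   # all non-empty proper subsets s
        for s in map(frozenset, combinations(itemset, r)):
            conf = support_count[itemset] / support_count[s]
            if conf >= min_conf:
                rules.append((set(s), set(itemset - s), conf))
    return rules
```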