Cenni di Machine Learning: Decision Tree Learning
Fabio Massimo Zanzotto
Logica per la Programmazione e la Dimostrazione Automatica, University of Rome Tor Vergata

Concept Learning… what?
[Slide figure: pictures of animals and plants, each labelled "Animal!" or "Plant!"; a new, unlabelled picture is marked "?" and has to be classified.]

Quick background on Supervised Machine Learning
[Slide diagram] A Learner receives a Training Set {(x_1, y_1), (x_2, y_2), …, (x_n, y_n)}, where each instance x_i is a point in a feature space and y_i is its label, and produces a Learnt Model. A Classifier then uses the learnt model to assign a label y_i to a new instance x_i.

Learning… What? Concepts/Classes/Sets. How? By using similarities among objects/instances. Where can these similarities be found?

Observation Space: Feature Space
A Feature Space (FS) is a vector space, e.g. FS = R^n.
An instance i is a point in the feature space: x_i ∈ FS.
A classification function is f : FS → T, where T is the set of the target classes.

Observation Space: Feature Space
How can the instances be represented? As a pixel matrix? As a distribution of the colours? Or with features such as:
–has leaves?
–is planted in the ground?
–...
Note: in deciding the feature space, some prior knowledge is used.
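To make this concrete, such features could be encoded in the same Prolog style used in the Prolog Coding slide at the end of this lecture; the attribute and class names below are only illustrative, not part of the original slides.

% Illustrative sketch: boolean features for the plant/animal example,
% written as attribute/2 and example/2 facts.
attribute( has_leaves, [yes, no]).
attribute( planted_in_ground, [yes, no]).

example( plant,  [has_leaves = yes, planted_in_ground = yes]).
example( animal, [has_leaves = no,  planted_in_ground = no]).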

Definition of the classification problem
Learning requires positive and negative examples. Verifying learning algorithms requires test examples. Given a set of instances I and a desired target function Tagger, we need annotators to define training and testing examples.

Annotation is hard work
Annotation procedure:
–definition of the scope: the target classes T
–definition of the set of the instances I
–definition of the annotation procedure
–actual annotation

Annotation problems
Are the target classes well defined (and shared among people)? How is it possible to measure this? Through inter-annotator agreement: instances should be annotated by more than one annotator.

Speeding up the annotation and controlling the results
[Slide diagram: the instances in I are split among Annotator 1, Annotator 2 and Annotator 3, with a shared section annotated by all of them and used to measure inter-annotator agreement.]

After the annotation
We have:
–the behaviour of an oracle, Tagger_oracle : I → T, for all the examples in I

Using the annotated material
Supervised learning:
–I is divided into two halves: I_training, used to train the Tagger; I_testing, used to test the Tagger.
or
–I is divided into n parts, and training-testing is done n times (n-fold cross-validation); in each iteration: n-1 parts are used for training, 1 part is used for testing.
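As a minimal sketch of the first option (the predicate name is illustrative and not part of the slides), the examples can be split into a training part and a testing part in Prolog as follows:

% Illustrative sketch: split a list of examples into a training half and a testing half.
split_half( Examples, Training, Testing) :-
    length( Examples, N),
    H is N // 2,
    length( Training, H),                 % Training is a list of H fresh variables
    append( Training, Testing, Examples). % append/3 then fills Training and Testing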

Evaluation Metrics
Given the oracle Tagger_oracle : I → T, the accuracy of the tagger is:

accuracy = |{ i ∈ I_testing : Tagger(i) = Tagger_oracle(i) }| / |I_testing|

i.e. the fraction of test instances on which the tagger agrees with the oracle.
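A minimal Prolog sketch of this metric (not on the slides), assuming a predicate tagger(Obj, Class) that implements the learnt Tagger and a test set given as a list of example(Class, Obj) terms:

% Illustrative sketch: accuracy = correctly tagged test instances / all test instances.
accuracy( TestExamples, Accuracy) :-
    length( TestExamples, N),
    N > 0,
    findall( ok,
             ( member( example( OracleClass, Obj), TestExamples),
               tagger( Obj, OracleClass) ),      % tagger/2 is an assumed predicate
             Correct),
    length( Correct, C),
    Accuracy is C / N.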

Precision/Recall/F-measure
These are needed for taggers that assign more than one category. Defining:
System = {(i,t) | i ∈ I_testing and t ∈ Tagger(i)}
Oracle = {(i,t) | i ∈ I_testing and t = Tagger_oracle(i)}
the precision and recall of the system are defined as:
Precision = |System ∩ Oracle| / |System|
Recall = |System ∩ Oracle| / |Oracle|
and the F-measure combines them: F1 = 2 · Precision · Recall / (Precision + Recall).
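A minimal Prolog sketch (not on the slides), representing System and Oracle as lists of Instance-Tag pairs and using SWI-Prolog's intersection/3:

% Illustrative sketch: precision and recall of a System against an Oracle.
precision_recall( System, Oracle, Precision, Recall) :-
    intersection( System, Oracle, Common),
    length( Common, C),
    length( System, NS),
    length( Oracle, NO),
    Precision is C / NS,
    Recall is C / NO.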

Categorization: how does it happen (in humans)?
Defining attribute
–studied and tested on concept networks (the further the requested concept is from the defining attribute, the higher the response time)
Based on (recent) examples
Prototype

Categorization in machine learning
Defining attribute
–Decision trees and decision tree learning
Based on (recent) examples
–K-nearest neighbours and lazy learning
Prototype
–Class centroids and the Rocchio formula

Categorization in machine learning
Defining attribute
–Decision trees and decision tree learning

Decision Tree: example
Waiting at the restaurant: given the values of a set of initial attributes, predict whether or not you are going to wait for a table.

Decision Tree: what does it look like?
Patrons?
  none → No
  some → Yes
  full → Hungry?
           no  → No
           yes → Type?
                   French  → Yes
                   Italian → No
                   Thai    → Fri/Sat?
                               no  → No
                               yes → Yes
                   burger  → Yes

The Restaurant Domain: will we wait, or not?
[Slide table: twelve example situations X1–X12, each described by attribute values (Patrons, Hungry, Type, Fri/Sat, among others) together with the target answer: wait (positive) or not (negative).]

Splitting Examples by Testing on Attributes
+ X1, X3, X4, X6, X8, X12 (positive examples)
- X2, X5, X7, X9, X10, X11 (negative examples)

Splitting Examples by Testing on Attributes (cont.)
+ X1, X3, X4, X6, X8, X12 (positive examples)
- X2, X5, X7, X9, X10, X11 (negative examples)
Split on Patrons?
  none: (no positives)     - X7, X11
  some: + X1, X3, X6, X8   (no negatives)
  full: + X4, X12          - X2, X5, X9, X10

Splitting Examples by Testing on Attributes (cont.)
+ X1, X3, X4, X6, X8, X12 (positive examples)
- X2, X5, X7, X9, X10, X11 (negative examples)
Split on Patrons?
  none: - X7, X11                      → leaf No
  some: + X1, X3, X6, X8               → leaf Yes
  full: + X4, X12  - X2, X5, X9, X10   (still mixed)

Splitting Examples by Testing on Attributes (cont.)
+ X1, X3, X4, X6, X8, X12 (positive examples)
- X2, X5, X7, X9, X10, X11 (negative examples)
Split on Patrons?
  none: - X7, X11                      → leaf No
  some: + X1, X3, X6, X8               → leaf Yes
  full: + X4, X12  - X2, X5, X9, X10, split again on Hungry?
          yes: + X4, X12  - X2, X10
          no:  - X5, X9                → leaf No

What Makes a Good Attribute?
Split on Patrons? (a better attribute)
  none: - X7, X11
  some: + X1, X3, X6, X8
  full: + X4, X12   - X2, X5, X9, X10
Split on Type? (not as good an attribute)
  French:  + X1         - X5
  Italian: + X6         - X10
  Thai:    + X3, X12    - X7, X9
  burger:  + X4, X8     - X2, X11
Patrons? leaves two of its three subsets with a single class, while Type? leaves every subset evenly mixed.

Final Decision Tree
Patrons?
  none → No
  some → Yes
  full → Hungry?
           no  → No
           yes → Type?
                   French  → Yes
                   Italian → No
                   Thai    → Fri/Sat?
                               no  → No
                               yes → Yes
                   burger  → Yes
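This tree can also be written as a Prolog term using the tree(Attribute, SubTrees) and leaf(Class) structures of the code shown later in this lecture; the Value : SubTree pairing and the attribute name type are our assumptions, not part of the slides.

% Illustrative sketch of the final tree as a Prolog term.
final_tree(
  tree( pat, [ none : leaf( no),
               some : leaf( yes),
               full : tree( hun, [ no  : leaf( no),
                                   yes : tree( type, [ french  : leaf( yes),
                                                       italian : leaf( no),
                                                       thai    : tree( fri, [ no  : leaf( no),
                                                                              yes : leaf( yes)]),
                                                       burger  : leaf( yes)])])])).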

Decision Tree Learning: the ID3 algorithm

function ID3 (R: a set of attributes, S: a training set, default_value) returns a decision tree;
begin
  if S = ∅ then return default_value;
  else if S consists of records all with the same value v then return the value v;
  else if R = ∅ then return the most frequent value v among the records of S;
  else
    let A be the attribute with the largest Gain(A,S) among the attributes in R;
    let {a_j | j = 1, 2, .., m} be the values of attribute A;
    let {S_j | j = 1, 2, .., m} be the subsets of S consisting respectively of the records with value a_j for A;
    return a tree with root labeled A and arcs labeled a_1, a_2, .., a_m going respectively to the trees
      ID3(R - {A}, S_1), ID3(R - {A}, S_2), ....., ID3(R - {A}, S_m);
end

Attribute selection function: Gain(A,S)
Information Theory:
–each set of messages has an intrinsic information content, related to the probability of each message.
It can be shown that, if M is the set of messages and p is a probability distribution over the messages, the information carried by the set of messages is:

I(M) = - Σ_{m ∈ M} p(m) log2 p(m)

Attribute selection function: Gain(A,S)
Given:
–a set of target classes T
–A: an attribute with values a in A
–S: a set of training instances
Using the notion of information I(M), it is possible to define:

I(S) = - Σ_{t ∈ T} p(t) log2 p(t)                        (information of the class distribution in S)
remainder(A,S) = Σ_{a ∈ A} (|S_a| / |S|) · I(S_a)        (S_a = instances of S with value a for A)
Gain(A,S) = I(S) - remainder(A,S)

Example: Gain(A,S) with a binary classification
Abusing the notation and using maximum likelihood probability estimates:

I(S) = - (p / (p+n)) log2 (p / (p+n)) - (n / (p+n)) log2 (n / (p+n))
Gain(A,S) = I(S) - Σ_{a ∈ A} ((p_a + n_a) / (p + n)) · I(S_a)

where:
–p is the # of positive examples in S
–n is the # of negative examples in S
–p_a is the # of positive examples in S fixing the value of A as a
–n_a is the # of negative examples in S fixing the value of A as a
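As a worked instance of these formulas on the restaurant data (p = n = 6; the per-value counts follow the splits shown earlier), the gains of Patrons? and Type? can be computed as follows; the arithmetic below is ours, not on the slides.

I(S) = - (6/12) log2 (6/12) - (6/12) log2 (6/12) = 1 bit
Gain(Patrons, S) = 1 - [ (2/12)·0 + (4/12)·0 + (6/12)·I(2/6, 4/6) ] ≈ 1 - 0.459 ≈ 0.541 bits
Gain(Type, S)    = 1 - [ (2/12)·1 + (2/12)·1 + (4/12)·1 + (4/12)·1 ] = 0 bits

where I(2/6, 4/6) ≈ 0.918 bits. Patrons? is therefore selected as the root attribute, consistently with the final tree above.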

Prolog Coding

% Attributes and their possible values:
attribute( fri, [yes, no]).
attribute( hun, [yes, no]).
attribute( pat, [none, some, full]).

% Training examples: example( Class, ListOfAttributeValues).
example( yes, [fri = no, hun = yes, pat = some, …]).

% Top level: collect all the examples and all the attributes, then induce the tree.
induce_tree( Tree) :-
    findall( example( Class, Obj), example( Class, Obj), Examples),
    findall( Att, attribute( Att, _), Attributes),
    induce_tree( Attributes, Examples, Tree).

% Recursive case: choose the best attribute, remove it from the candidate list,
% and grow one subtree per value of the attribute.
induce_tree( Attributes, Examples, tree( Attribute, SubTrees)) :-
    choose_attribute( Attributes, Examples, Attribute), !,
    del( Attribute, Attributes, RestAtts),                  % delete Attribute
    attribute( Attribute, Values),
    induce_trees( Attribute, Values, RestAtts, Examples, SubTrees).

% No (useful) attribute left: leaf with the class distribution.
induce_tree( _, Examples, leaf( ExClasses)) :-
    findall( Class, member( example( Class, _), Examples), ExClasses).

% Base cases (in the full program these clauses come before the recursive one):
% no examples left, or all the remaining examples belong to the same class.
induce_tree( _, [], null) :- !.

induce_tree( _, [example( Class, _) | Examples], leaf( Class)) :-
    \+ ( member( example( ClassX, _), Examples),   % no other example
         ClassX \== Class),                        % of a different class
    !.
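The slides do not show choose_attribute/3, induce_trees/5 or del/3, which the clauses above call. The following is a minimal sketch of how they could be completed, assuming SWI-Prolog built-ins (findall/3, keysort/2, sum_list/2, aggregate_all/3) and the Value : SubTree convention for SubTrees; it is our completion, not the lecture's code.

% del( X, List, ListWithoutX): remove one occurrence of X.
del( X, [X | T], T).
del( X, [Y | T], [Y | T1]) :- del( X, T, T1).

% induce_trees( Att, Values, RestAtts, Examples, SubTrees):
% one Value : SubTree pair per value of the chosen attribute,
% built from the examples that carry that value.
induce_trees( _, [], _, _, []).
induce_trees( Att, [Val | Vals], RestAtts, Examples, [Val : Tree | Trees]) :-
    findall( example( Class, Obj),
             ( member( example( Class, Obj), Examples),
               member( Att = Val, Obj) ),
             SubExamples),
    induce_tree( RestAtts, SubExamples, Tree),
    induce_trees( Att, Vals, RestAtts, Examples, Trees).

% choose_attribute( Atts, Examples, Best): pick the attribute with the largest
% Gain(A,S), i.e. the smallest remainder, since I(S) is the same for every A.
% It fails when Atts is empty, so the leaf-with-class-distribution clause fires.
choose_attribute( Atts, Examples, Best) :-
    findall( Rem-Att,
             ( member( Att, Atts), remainder( Att, Examples, Rem) ),
             Pairs),
    keysort( Pairs, [_-Best | _]).

% remainder( Att, Examples, Rem): weighted information of the subsets
% obtained by splitting Examples on Att.
remainder( Att, Examples, Rem) :-
    attribute( Att, Values),
    length( Examples, N),
    findall( R,
             ( member( Val, Values),
               findall( Class,
                        ( member( example( Class, Obj), Examples),
                          member( Att = Val, Obj) ),
                        Classes),
               length( Classes, NV),
               NV > 0,
               info( Classes, I),
               R is NV / N * I ),
             Rs),
    sum_list( Rs, Rem).

% info( Classes, I): information (entropy, in bits) of a list of class labels.
info( Classes, I) :-
    length( Classes, N),
    sort( Classes, Distinct),
    findall( Term,
             ( member( C, Distinct),
               aggregate_all( count, member( C, Classes), NC),
               P is NC / N,
               Term is -(P * log( P) / log( 2)) ),
             Terms),
    sum_list( Terms, I).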