Decision Trees - Intermediate


Some material from Russell and Norvig, Artificial Intelligence: A Modern Approach, 2009.
Villanova University Machine Learning Project

The Inductive Learning Problem
- Extrapolate from a given set of examples to make accurate predictions about future examples.
- Concept learning, or classification: given a set of examples of some concept/class/category, determine whether a given example is an instance of the concept.
- If it is an instance, we call it a positive example; if it is not, a negative example.
- This setting is usually called supervised learning.

Inductive Learning Framework
- The representation must extract from the possible observations a feature vector of relevant features for each example.
- The number of attributes and the values for the attributes are fixed (although values can be continuous).
- Each example is represented as a specific feature vector and is identified as a positive or negative instance.
- Each example can be interpreted as a point in an n-dimensional feature space, where n is the number of attributes.

Hypotheses
- The task of a supervised learning system can be viewed as learning a function which predicts the outcome from the inputs: given a training set of N example pairs (x1, y1), (x2, y2), ..., (xN, yN), where each yi was generated by an unknown function y = f(x), discover a function h that approximates the true function f.
- h is our hypothesis, and learning is the process of finding a good h in the space of possible hypotheses.
- Prefer the simplest hypothesis consistent with the data; there is a tradeoff between fit and generalizability, and between fit and computational complexity. (A minimal sketch follows.)
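As a minimal sketch of "prefer the simplest consistent hypothesis", the function below scans a hypothesis space ordered from simplest to most complex and returns the first hypothesis that agrees with every training pair. The function name and the ordered-iterable representation of the hypothesis space are illustrative assumptions, not part of the original slides.

```python
def simplest_consistent(hypotheses, examples):
    """Return the first hypothesis consistent with all training examples.

    `hypotheses` is assumed to be iterable in order from simplest to most
    complex, so the first consistent h is also the simplest (Occam's razor).
    `examples` is a sequence of (x, y) pairs.
    """
    for h in hypotheses:
        if all(h(x) == y for x, y in examples):
            return h
    return None  # no hypothesis in the space fits the data
```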

Decision Tree Induction
- A very common machine learning and data mining technique, and one of the earliest methods for inductive learning.
- Induction of Decision Trees, J. Ross Quinlan, Machine Learning 1: 81-106, Kluwer Academic Publishers, 1986.
- Given: examples, attributes, a goal (the classes).
- Pick an "important" attribute: one which divides the set cleanly.
- Recur on the subsets not yet cleanly classified. (A sketch of this recursion appears below.)
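The following is a hedged Python sketch of that top-down recursion, with the "importance" measure left abstract (it gets plugged in later, when information gain is defined). The representation of examples as (attribute-dict, class) pairs is an assumption made for illustration.

```python
def induce_tree(examples, attributes, choose_important):
    """Top-down decision-tree induction, Quinlan-style (sketch).

    examples: list of (attribute_dict, class_label) pairs
    attributes: list of attribute names still available for testing
    choose_important: function scoring how cleanly an attribute splits
    """
    labels = [cls for _, cls in examples]
    if len(set(labels)) == 1:          # all examples agree: make a leaf
        return labels[0]
    if not attributes:                 # no tests left: majority-class leaf
        return max(set(labels), key=labels.count)
    best = max(attributes, key=lambda a: choose_important(a, examples))
    remaining = [a for a in attributes if a != best]
    tree = {best: {}}
    for v in {ex[best] for ex, _ in examples}:   # one child per value
        subset = [(ex, cls) for ex, cls in examples if ex[best] == v]
        tree[best][v] = induce_tree(subset, remaining, choose_important)
    return tree
```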

A Restaurant Domain
- Develop a decision tree to model the decision a patron makes when deciding whether or not to wait for a table at a restaurant.
- Two classes: wait, leave.
- Ten attributes: Is an alternative available? Is there a bar in the restaurant? Is it Friday/Saturday? Are we hungry? How full is the restaurant? How expensive? Is it raining? Do we have a reservation? What type of restaurant is it? What is the purported waiting time?
- Training set of 12 examples, out of roughly 7,000 possible cases. (One example, encoded as a feature vector, is sketched below.)
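To make the feature-vector framing concrete, one training example might be encoded as below. The attribute names, value encodings, and the particular values shown are illustrative assumptions, not a row from the actual training table.

```python
# One hypothetical training example: ten attribute values plus its class.
example = ({
    "Alternative":  "Yes",    # alternative restaurant available?
    "Bar":          "No",     # bar in the restaurant?
    "FriSat":       "No",     # is it Friday/Saturday?
    "Hungry":       "Yes",
    "Patrons":      "Full",   # how full is the restaurant?
    "Price":        "$",
    "Raining":      "No",
    "Reservation":  "No",
    "Type":         "Thai",
    "WaitEstimate": "30-60",  # purported waiting time, in minutes
}, "wait")
```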

What Might Your First Question Be?
- Is an alternative available?
- Is there a bar in the restaurant?
- Is it Friday or Saturday?
- Are we hungry?
- How full is the restaurant?
- How expensive?
- Is it raining?
- Do we have a reservation?
- What type of restaurant is it?
- What is the purported waiting time?

A Decision Tree from Introspection
(figure: a hand-built decision tree for the restaurant domain)

A Training Set
(figure: table of the 12 restaurant training examples)

Thinking About It
Looking at these examples, what might you now expect the first question to be? The second?

Tree by Inspection
- You have a copy of this table. Get together in threes and decide on a decision tree.
- Choose a representative to come up and draw your tree on the whiteboard (someone with legible handwriting!).
- What issues came up? How many decisions did your tree have? Was it balanced? How did you decide what to split next?
- How good was it? Did every case get classified correctly? How many decisions would cases take? How many cases ended up at the leaves? Do you think it would generalize?

What Does Your Group's Tree Look Like?

Choosing an Attribute
- Idea: a good attribute splits the examples into subsets that are (ideally) "all positive" or "all negative".
- Which is the better choice, Patrons or Type? Patrons makes the cleaner split; we get two clean categories. Type doesn't gain us anything at all.

Best Attribute
- What's the best attribute to choose? The one with the best information gain.
- If we choose Bar, we have no: 3-, 3+ and yes: 3-, 3+.
- If we choose Hungry, we have no: 4-, 1+ and yes: 1-, 5+.
- Hungry has given us more information about the correct classification, so we want to choose the attribute split which gives us the most useful division of our data. (The calculation is sketched below.)
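A minimal sketch of the calculation, assuming class distributions are given as [negatives, positives] counts. The function names are ours, but the counts below come straight from the slide.

```python
from math import log2

def entropy(counts):
    """Entropy in bits of a class distribution given as a list of counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

def information_gain(parent_counts, child_counts):
    """Parent entropy minus the size-weighted entropy of the children."""
    total = sum(parent_counts)
    remainder = sum(sum(c) / total * entropy(c) for c in child_counts)
    return entropy(parent_counts) - remainder

# Counts from the slide, as [negatives, positives] per branch:
print(information_gain([6, 6], [[3, 3], [3, 3]]))  # Bar:    0.0 bits
print(information_gain([5, 6], [[4, 1], [1, 5]]))  # Hungry: ~0.31 bits
```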

ID3
- A greedy algorithm for decision tree construction, originally developed by Ross Quinlan in the 1980s.
- Top-down construction of the decision tree by recursively selecting the "best attribute" to use at the current node.
- Once an attribute is selected, generate children nodes, one for each possible value of the selected attribute.
- Partition the examples using the possible values of the attribute, and assign the subsets of examples to the appropriate child node.
- Repeat for each child node until all examples associated with a node are either all positive or all negative.
- Weka's J48 is a Java implementation of C4.5, Quinlan's improved successor to ID3. (A sketch of ID3's attribute choice, using the gain function above, follows.)
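ID3 is then just the generic inducer sketched earlier with information gain as the importance measure. This hedged sketch computes the gain of a candidate attribute from the (attribute-dict, class) example representation assumed above.

```python
def gain_of(attribute, examples):
    """Information gain of splitting `examples` on `attribute`.

    Uses the entropy/information_gain helpers sketched earlier;
    `examples` are (attribute_dict, class_label) pairs as before.
    """
    labels = [cls for _, cls in examples]
    classes = sorted(set(labels))
    parent = [labels.count(c) for c in classes]
    children = []
    for v in {ex[attribute] for ex, _ in examples}:
        sub = [cls for ex, cls in examples if ex[attribute] == v]
        children.append([sub.count(c) for c in classes])
    return information_gain(parent, children)

# ID3, assembled from the earlier pieces:
# tree = induce_tree(examples, attributes, choose_important=gain_of)
```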

One Possible Learned Tree
(figure: the tree learned from the 12 examples)
- Substantially simpler than the "true" tree: a more complex hypothesis isn't justified by the small amount of data.
- Note that it is much simpler than the tree induced by introspection earlier, and just as accurate.

How well does it work?
Many case studies have shown that decision trees are at least as accurate as human experts.
- In a study on diagnosing breast cancer, humans correctly classified the examples 65% of the time; the decision tree was 72% correct.
- British Petroleum designed a decision tree for gas-oil separation on offshore oil platforms that replaced an earlier rule-based expert system.
- Cessna designed an airplane flight controller using 90,000 examples and 20 attributes per example.

More on Attribute Splits
- Each node tests one attribute.
- The split does not need to be binary; note the "Outlook" split in the Weka weather data.
- ID3 required nominal attributes; C4.5 extends the approach to numeric attributes, such as humidity. (A sketch of a numeric split follows.)
(figure: tree from running Weka's J48 on weather.numeric.arff)
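A hedged sketch of the standard C4.5-style trick for a numeric attribute: sort the values, try a binary split value <= t at each midpoint between distinct neighbors, and keep the threshold with the highest information gain (reusing the helpers above). The function and variable names are ours.

```python
def best_numeric_split(values, labels):
    """Best binary threshold split (value <= t) for one numeric attribute."""
    pairs = sorted(zip(values, labels))
    classes = sorted(set(labels))
    parent = [labels.count(c) for c in classes]
    best_t, best_gain = None, -1.0
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue                      # no threshold between equal values
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [c for v, c in pairs if v <= t]
        right = [c for v, c in pairs if v > t]
        g = information_gain(parent, [[left.count(c) for c in classes],
                                      [right.count(c) for c in classes]])
        if g > best_gain:
            best_t, best_gain = t, g
    return best_t, best_gain
```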

Pruning
- With enough levels of a decision tree we can always get the leaves to be 100% positive or negative (if there is no inconsistency in the data).
- But if we are down to one or two cases in each leaf, we are probably overfitting.
- It is useful to prune leaves: stop when we reach a certain level, when we reach a small enough leaf, or when our information gain is increasing too slowly.
- If exactly the same attribute values x lead to different classes y, you can't get a perfect tree. (A sketch of these stopping rules, expressed as scikit-learn parameters, follows.)
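As a hedged illustration, scikit-learn's DecisionTreeClassifier exposes parameters that map onto these stopping rules. X and y below are assumed to be a feature matrix and label vector prepared elsewhere, and the specific parameter values are arbitrary.

```python
from sklearn.tree import DecisionTreeClassifier

clf = DecisionTreeClassifier(
    criterion="entropy",          # information-gain-style splits
    max_depth=4,                  # stop at a certain level
    min_samples_leaf=5,           # stop at a small enough leaf
    min_impurity_decrease=0.01,   # stop when the gain grows too slowly
)
# clf.fit(X, y)   # X: feature matrix, y: labels (assumed prepared elsewhere)
```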

Expressiveness
- Decision trees can express any function of the input attributes. E.g., for Boolean functions: truth table row → path to leaf.
- Trivially, there is a consistent decision tree for any training set, with one path to a leaf for each example (unless f is nondeterministic in x), but it probably won't generalize to new examples.
- Prefer to find more compact decision trees. (The sketch below shows the row-per-path construction for XOR.)
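A small illustration of "truth table row → path to leaf" for the Boolean function XOR. The nested-dict tree encoding and the classify helper are illustrative assumptions.

```python
# One path per truth-table row: test A, then test B under each A answer.
xor_tree = {"A": {
    0: {"B": {0: False, 1: True}},
    1: {"B": {0: True, 1: False}},
}}

def classify(tree, example):
    """Walk the tree: at each node, test one attribute and follow the branch."""
    while isinstance(tree, dict):
        attribute = next(iter(tree))
        tree = tree[attribute][example[attribute]]
    return tree

print(classify(xor_tree, {"A": 1, "B": 0}))  # True
```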

BUT!
- Decision tree tests are univariate: one attribute at a time.
- In the XOR-style tree above, we capture the interaction between A and B only by replicating the B question under both A answers.
- That is inefficient if we have many attributes and/or values, and really inefficient if our attributes are real-valued.
- So a decision tree can express a function or model with a complex relationship among attributes, but it may be unusably complicated and inefficient.

Decision Tree Architecture
- Knowledge base: the decision tree itself.
- Performer: a tree walker.
- Critic: the actual outcome in the training case.
- Learner: ID3 or its variants.
- This is an example of a large class of learners that need all of the examples at once in order to learn: batch, not incremental.

Strengths of Decision Trees
- Fast to learn and to use.
- Simple to implement.
- You can look at the tree and see what is going on: relatively "white box".
- Has been empirically validated many times.
- Handles noisy data (with pruning).
- Quinlan's C4.5 and C5.0 are extensions of ID3 that account for unavailable values, continuous attribute value ranges, pruning of decision trees, and rule derivation.

Decision Tree Weaknesses
- Univariate splits/partitioning (one attribute at a time) limits the types of possible trees.
- Large decision trees may be hard to understand.
- Requires fixed-length feature vectors.
- Non-incremental (i.e., a batch method).
- For continuous or real-valued features, requires additional complexity to choose decision points.
- Prone to overfitting.

Summary: Decision Tree Learning
- The model being learned is a tree of nodes; each node is a test of the value of one attribute, and the series of test results for an example leads to classifying that example at a leaf.
- One of the earliest techniques to demonstrate machine learning from examples.
- Widely used in practice; can outperform human experts on many problems.
- Not really suitable for large numbers of attributes and values.