Roberto Battiti, Mauro Brunato. The LION Way: Machine Learning plus Intelligent Optimization. LIONlab, University of Trento, Italy, Feb 2014.

Presentation transcript:

Roberto Battiti, Mauro Brunato. The LION Way: Machine Learning plus Intelligent Optimization. LIONlab, University of Trento, Italy, Feb 2014. intelligent-optimization.org/LIONbook. © Roberto Battiti and Mauro Brunato, 2014, all rights reserved. Can be used and modified for classroom usage, provided that the attribution (link to book website) is kept.

Chap. 6 Rules, decision trees, and forests. “If a tree falls in the forest and there’s no one there to hear it, does it make a sound?”

Rules Rules are a way to condense nuggets of knowledge in a form amenable to human understanding: “customer is wealthy” → “he will buy my product”; “body temperature > 37 degrees Celsius” → “patient is sick.”

Non-contradictory sets of rules If the set of rules gets large, complexities can appear, such as rules leading to contradictory classifications. Automated ways to extract non-contradictory rules from data are therefore precious.

Automated generation of rules Breaking rules into a chain of simple questions is key. The most informative questions are better placed at the beginning of the sequence, leading to a hierarchy of questions. Decision trees are an organized, hierarchical arrangement of decision rules, without contradictions.

Decision trees A decision tree is a set of questions organized in a hierarchical manner and represented graphically as a tree.

Decision trees For an input object, a decision tree estimates an unknown property by asking successive questions about its known properties.

Building decision trees Basic idea: ask the most informative questions first. Recursively construct a hierarchy of questions. At each step, choose the question that leads to the purest subsets.

Building decision trees Recursive step in tree building: after the initial purification by question 2, the same method is applied to the left and right example subsets. Question 3 is sufficient to completely purify the subsets, so no additional recursive call is executed on the pure subsets.
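As a rough illustration (not from the slides), the greedy recursion can be sketched in a few lines of Python. The `questions` list of boolean predicates over an example and the `gain(parent_labels, yes_labels, no_labels)` purity measure are assumed interfaces; the information gain defined a few slides below is one possible choice for `gain`.

```python
from collections import Counter

def build_tree(examples, labels, questions, gain):
    """Greedy recursive tree building: pick the question with the highest
    purity gain, split the examples, and recurse on each subset."""
    if len(set(labels)) == 1:                      # pure subset: stop, make a leaf
        return {"leaf": labels[0]}

    def split_labels(question):
        yes = [l for x, l in zip(examples, labels) if question(x)]
        no = [l for x, l in zip(examples, labels) if not question(x)]
        return yes, no

    # choose the most informative question for this subset
    best = max(questions, key=lambda q: gain(labels, *split_labels(q)))
    yes_idx = [i for i, x in enumerate(examples) if best(x)]
    no_idx = [i for i, x in enumerate(examples) if not best(x)]
    if not yes_idx or not no_idx:                  # no question separates the subset
        return {"leaf": Counter(labels).most_common(1)[0][0]}
    return {
        "question": best,
        "yes": build_tree([examples[i] for i in yes_idx],
                          [labels[i] for i in yes_idx], questions, gain),
        "no": build_tree([examples[i] for i in no_idx],
                         [labels[i] for i in no_idx], questions, gain),
    }
```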

How do we measure purity? We need a quantitative measure of purity. The two widely used measures of the purity of a subset are the information gain and the Gini impurity.

Shannon entropy Suppose we sample from the set associated with an internal node or with a leaf of the tree. We obtain examples of a class y with a probability Pr(y) proportional to the fraction of cases of that class present in the set. The statistical uncertainty in the obtained class is measured by Shannon’s entropy of the label probability distribution: H = − Σ_y Pr(y) log_2 Pr(y).
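As a minimal illustration (not part of the slides), the entropy of a set of labels can be computed directly from the class fractions; the example labels below are made up:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy, in bits, of the empirical label distribution
    of a set of examples (Pr(y) = fraction of cases of class y)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

print(entropy(["sick", "sick", "healthy", "healthy"]))  # 1.0 bit: maximum uncertainty
print(entropy(["sick"] * 4))                            # 0.0 on a pure set (may print as -0.0)
```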

Shannon entropy High entropy (left): events have similar probabilities, so the uncertainty is high. Low entropy (right): events have very different probabilities, so the uncertainty is low because one event collects most of the probability.

Information gain The average reduction in entropy after knowing the answer is the information gain (or mutual information): IG = H(S) − [ (|S_yes|/|S|) · H(S_yes) + (|S_no|/|S|) · H(S_no) ], where S is the current set of examples and S = S_yes ∪ S_no is the split obtained after answering the question. An optimal question maximizes IG, that is, it reduces the average entropy as much as possible.
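A matching sketch of the information gain of a split, reusing the entropy() helper from the previous sketch; the example labels are again invented for illustration:

```python
def information_gain(parent, s_yes, s_no):
    """IG = H(S) - [ |S_yes|/|S| * H(S_yes) + |S_no|/|S| * H(S_no) ],
    reusing entropy() defined in the earlier sketch."""
    n = len(parent)
    remainder = (len(s_yes) / n) * entropy(s_yes) + (len(s_no) / n) * entropy(s_no)
    return entropy(parent) - remainder

# a question that separates the two classes perfectly gains a full bit
print(information_gain(["sick", "sick", "healthy", "healthy"],
                       ["sick", "sick"], ["healthy", "healthy"]))   # 1.0
```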

Gini impurity Suppose that there are m classes, and let f_i be the fraction of items labeled with value i in the set. The Gini impurity is then defined as Gini = Σ_{i=1..m} f_i (1 − f_i) = 1 − Σ_{i=1..m} f_i². It measures how often a randomly chosen element from the set would be incorrectly labeled if it were labeled randomly according to the distribution of labels in the subset: it is zero if the set is pure, and small if a single class gets the largest share of the set.
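An equally small sketch of the Gini impurity, under the same assumption that labels are given as a plain Python list (not part of the slides):

```python
from collections import Counter

def gini_impurity(labels):
    """Gini = 1 - sum_i f_i^2: the probability that a randomly drawn element
    is mislabeled when labels are assigned at random from the subset's
    own label distribution."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

print(gini_impurity(["a", "a", "b", "b"]))   # 0.5: evenly mixed set
print(gini_impurity(["a", "a", "a", "a"]))   # 0.0: pure set
```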

How to pose questions? In general, questions should have a binary outcome. For a categorical variable, the test can be based on the variable having (or not having) a subset of the possible values. For real-valued variables, the test can be on a single variable or on a simple (linear) combination of a subset of variables.
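As a hedged illustration of how such candidate tests might be enumerated, the sketch below assumes that examples are stored as dictionaries keyed by attribute name and covers only the two simplest cases: single-value membership for a categorical attribute and midpoint thresholds for a numeric one (not full subset tests or linear combinations).

```python
def candidate_questions(column, values):
    """Enumerate simple binary tests on one attribute.
    Numeric attribute -> threshold tests at midpoints between observed values;
    categorical attribute -> membership tests on single values.
    Each question maps an example dict to True/False."""
    distinct = sorted(set(values))
    if all(isinstance(v, (int, float)) for v in distinct):
        return [lambda x, t=(a + b) / 2: x[column] <= t
                for a, b in zip(distinct, distinct[1:])]
    return [lambda x, v=v: x[column] == v for v in distinct]

tests = candidate_questions("temperature", [36.5, 37.2, 38.0])
print([t({"temperature": 37.5}) for t in tests])   # [False, True]
```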

Dealing with missing values In real-world data, missing values are abundant. If an instance reaches a node and the question cannot be answered because the data are lacking, one can ideally “split the instance into pieces” and send part of it down each branch, in proportion to the number of training instances going down that branch. When the different pieces of the instance eventually reach the leaves, the corresponding leaf output values can be averaged.

Dealing with missing values Missing nationality information: the data point arriving at the top node is sent to both its left and right child nodes, with different weights depending on the frequency of “YES” and “NO” answers in the training set.
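The fractional-instance idea can be sketched as follows. The Node structure, its n_yes/n_no counts (how many training instances went down each branch) and the dictionary representation of instances are assumptions made for the example, not the book's implementation:

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional

@dataclass
class Node:
    is_leaf: bool = False
    value: Any = None                      # prediction stored at a leaf
    attribute: str = ""                    # attribute tested at an internal node
    test: Optional[Callable[[Any], bool]] = None
    yes_child: Optional["Node"] = None
    no_child: Optional["Node"] = None
    n_yes: int = 0                         # training instances sent down each branch
    n_no: int = 0

def propagate(node, instance, weight=1.0):
    """Send an instance down the tree; if the tested attribute is missing,
    split the instance into weighted pieces, one per branch, in proportion
    to the training instances that went down that branch. Returns a list of
    (leaf_value, weight) pairs to be averaged or voted on."""
    if node.is_leaf:
        return [(node.value, weight)]
    answer = instance.get(node.attribute)          # None when the value is missing
    if answer is None:
        total = node.n_yes + node.n_no
        return (propagate(node.yes_child, instance, weight * node.n_yes / total)
                + propagate(node.no_child, instance, weight * node.n_no / total))
    child = node.yes_child if node.test(answer) else node.no_child
    return propagate(child, instance, weight)

root = Node(attribute="nationality",
            test=lambda v: v == "Italian",
            yes_child=Node(is_leaf=True, value="YES"),
            no_child=Node(is_leaf=True, value="NO"),
            n_yes=7, n_no=3)
print(propagate(root, {}))   # [('YES', 0.7), ('NO', 0.3)]
```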

Decision forests Basic idea: use an ensemble of trees to make decisions in a democratic way. How? Train different trees on different sets of examples, and allow for randomized choices during training. An isolated tree will be rather weak; however, the majority (or weighted average) rule will provide reasonable answers.
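A minimal sketch of such a forest, not from the slides: bootstrap sampling supplies the "different sets of examples", and train_tree(examples, labels) and predict_one(tree, instance) are assumed interfaces for the single-tree learner and predictor (for instance the greedy builder sketched earlier).

```python
import random
from collections import Counter

def train_forest(examples, labels, train_tree, n_trees=100):
    """Train n_trees trees, each on a bootstrap sample (sampling with
    replacement) of the training set."""
    n = len(examples)
    forest = []
    for _ in range(n_trees):
        idx = [random.randrange(n) for _ in range(n)]
        forest.append(train_tree([examples[i] for i in idx],
                                 [labels[i] for i in idx]))
    return forest

def forest_classify(forest, predict_one, instance):
    """Democratic decision: majority vote over the individual tree outputs."""
    votes = Counter(predict_one(tree, instance) for tree in forest)
    return votes.most_common(1)[0][0]
```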

Gist Simple “if-then” rules condense nuggets of information in a way which can be understood by people. To avoid contradictory rules we proceed with a hierarchy of questions (the most informative first), leading to an organized structure of simple successive questions called a decision tree.

Gist 2 Trees can be learned in a greedy and recursive manner: starting from the complete set of examples, pick a test that splits it into two subsets which are as pure as possible, and recurse. The recursive process terminates when the remaining subsets are sufficiently pure.

Gist 3 The abundance of memory and computing power permits training very large numbers of different trees. They can be fruitfully used as decision forests by collecting all outputs and averaging (regression) or voting (classification). They are fast and efficient thanks to their parallelism and the reduced set of tests per data point.