
Learning what questions to ask

• The job is to build a tree that represents a series of questions the classifier will ask of a data instance that is to be classified.
• Each node is a question about the value the instance to be classified has in a particular dimension.
• Discrete data: the fan-out of each node is determined by how many different values that dimension can take on.
• How would the decision tree classify this "Play Tennis?" data instance?

  Outlook | Humidity | Wind | Play Tennis?
  Sunny   | Normal   | Weak | ???

• The training data is used to build the tree.
• How do we decide what question to ask first?
• Remember the curse of dimensionality: there might be just a few dimensions that are important, and the rest could be random.

• What question can I ask about the data that will give me the most information gain, i.e. get me closer to being able to classify?
• This amounts to identifying the most important dimension (the most important question): What is the outlook? How humid is it? How windy is it?

Another statistical approach
• The approach comes out of information theory.
• From Wikipedia: developed by Claude E. Shannon to find fundamental limits on signal processing operations such as compressing data.
• Basically: how much information can I cram into a given signal (how many bits can I encode)?

• It starts with entropy: entropy is a measure of the homogeneity of the data.
• Purely random data (nothing but noise) has maximum entropy.
• Linearly separable data has minimum entropy.
• What does that mean with discrete data?
• Given all instances with a sunny outlook, what if every "low humidity" instance were classified "yes, play tennis" and every "high humidity" instance were classified "no, do not play tennis"? High entropy or low?
• Given all instances with a sunny outlook, what if half were "yes, play tennis" and half "no, don't play", no matter what the humidity? High entropy or low?

• If we are going to measure homogeneity…
• …we want a statistical measure that yields a maximum for an evenly mixed sample and a minimum for a pure one.

• What if a sample were 20% one class and 80% the other?
  log2(0.2) = log(0.2)/log(2) ≈ -2.32
  log2(0.8) ≈ -0.32
  Entropy = -(0.2)(-2.32) - (0.8)(-0.32) ≈ 0.72
• What if it were 80% / 20%? Same entropy, ≈ 0.72.
• What if it were 50% / 50%? Highest entropy: 1.
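
A minimal sketch of that calculation in Python (assuming nothing beyond the standard library); the probabilities are the class fractions from the slide:

    import math

    def entropy(probabilities, base=2):
        """Shannon entropy of a class distribution, given as a list of probabilities."""
        return -sum(p * math.log(p, base) for p in probabilities if p > 0)

    print(entropy([0.2, 0.8]))   # ~0.722 bits
    print(entropy([0.8, 0.2]))   # same split the other way: same entropy
    print(entropy([0.5, 0.5]))   # 1.0, the maximum for two classes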

• The measure extends to more than two classes, not just positive and negative.
• With c classes and base-2 logs, entropy can grow as large as log2(c); setting the log base to the number of classes scales the maximum back to 1.
• From the book: entropy is a measure of the expected encoding length, measured in bits.

Humidity question or Windy question?
• Information gain is simply the expected reduction in entropy caused by partitioning the examples according to an attribute:
  Gain(S, A) = Entropy(S) - Σ_v (|S_v|/|S|) · Entropy(S_v)
• The |S_v|/|S| factor scales the contribution of each answer according to its membership.
• If the entropy of S is 1 and each of the entropies for the answers is also 1, then 1 - 1 = 0: the information gain is zero.
• If the entropy of S is 1 and each of the entropies for the answers is 0, then 1 - 0 = 1: the information gain is 1.
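
The same definition in Python, continuing the entropy sketch above; the record layout (each example as a dict with a 'label' key) is an assumption made for illustration, not something the slides specify:

    from collections import Counter

    def entropy_of(examples):
        """Entropy of the class labels in a list of example dicts."""
        counts = Counter(ex['label'] for ex in examples)
        return entropy([c / len(examples) for c in counts.values()])

    def information_gain(examples, attribute):
        """Expected reduction in entropy from partitioning on `attribute`."""
        remainder = 0.0
        for value in {ex[attribute] for ex in examples}:
            subset = [ex for ex in examples if ex[attribute] == value]
            remainder += len(subset) / len(examples) * entropy_of(subset)
        return entropy_of(examples) - remainder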

• What is the information gain?

• Recursive algorithm: ID3 (Iterative Dichotomizer 3)

  ID3(S, attributes yet to be processed):
      Create a Root node for the tree
      Base cases:
          If all of S have the same class, return the single-node tree Root with that label
          If attributes is empty, return the single-node tree Root labeled with the most common class in S
      Otherwise:
          Find the attribute A with the greatest information gain
          Set A as the decision attribute for Root
          For each value v of A:
              Add a new branch below Root
              Determine S_v, the subset of S whose value for A is v
              If S_v is empty:
                  Add a leaf labeled with the most common class in S
              Else:
                  Add the subtree ID3(S_v, attributes - {A}) below this branch
      Return Root
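
A compact Python sketch of the same recursion, reusing the information_gain helper above; the nested-dict tree representation is an assumption made for illustration:

    from collections import Counter

    def most_common_label(examples):
        return Counter(ex['label'] for ex in examples).most_common(1)[0][0]

    def id3(examples, attributes):
        labels = {ex['label'] for ex in examples}
        if len(labels) == 1:                 # all one class: return a leaf
            return labels.pop()
        if not attributes:                   # no questions left: majority-class leaf
            return most_common_label(examples)
        best = max(attributes, key=lambda a: information_gain(examples, a))
        tree = {best: {}}
        remaining = [a for a in attributes if a != best]
        for value in {ex[best] for ex in examples}:
            subset = [ex for ex in examples if ex[best] == value]
            tree[best][value] = id3(subset, remaining)
        return tree

Note that this sketch only grows branches for attribute values actually seen in the training subset, which is exactly the situation a later slide asks about ("Is there a branch for every answer?").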

• Which attribute next?

• Next attribute?

• Is there a branch for every answer?
• What if no training samples had "overcast" as their outlook?
• Could you classify a new unknown or test instance if it had "overcast" in that dimension?
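
One common way to handle this, sketched against the nested-dict tree from the id3 sketch above; the fall-back to a default label is an assumption for illustration, not a rule from the slides:

    def classify(tree, instance, default=None):
        """Walk the tree; fall back to `default` when a branch is missing."""
        if not isinstance(tree, dict):            # reached a leaf label
            return tree
        attribute = next(iter(tree))              # the question asked at this node
        branches = tree[attribute]
        value = instance.get(attribute)
        if value not in branches:                 # e.g. "Overcast" never appeared in training
            return default                        # typically the most common training class
        return classify(branches[value], instance, default)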

• The tree often perfectly classifies the training data.
• That is not guaranteed, but it is the usual outcome: if you exhaust every dimension as you drill down, the last decision node might still have "impure" answers, and it is simply labeled with the most abundant class.
• For instance, on the cancer data my tree had no leaves deeper than 4 levels.
• The tree basically memorizes the training data. Is this the best policy?
• What if a node that "should" be pure had a single exception?

• Decision boundary
• Sometimes it is better to live with a little error than to try to get perfection.

• From Wikipedia: "In statistics, overfitting occurs when a statistical model describes random error or noise instead of the underlying relationship."

• A Bayesian classifier finds the boundary that minimizes error.
• Trimming the decision tree's leaves has a similar effect, i.e. we don't try to memorize every single training sample.

• You don't know until you know: withhold some of the data from training and use it to test.
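
A sketch of that held-out test, assuming scikit-learn is available; the breast-cancer dataset here simply stands in for "the cancer data" mentioned earlier:

    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

    tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
    print("train accuracy:", tree.score(X_train, y_train))   # typically 1.0: the tree memorized
    print("test accuracy:", tree.score(X_test, y_test))      # noticeably lower on withheld data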

• Stop growing the tree early: set some threshold for allowable entropy.
• Post-pruning: build the full tree, then remove nodes as long as removal improves performance.
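
Both ideas map onto common library knobs; a sketch assuming scikit-learn and continuing the split above (cost-complexity pruning is not the exact procedure on the next slide, but it is the same post-pruning idea):

    # Pre-pruning: refuse to split past a depth or below an impurity-decrease threshold
    early = DecisionTreeClassifier(max_depth=4, min_impurity_decrease=0.01, random_state=0)
    early.fit(X_train, y_train)

    # Post-pruning: grow the full tree, then collapse subtrees via cost-complexity pruning
    pruned = DecisionTreeClassifier(ccp_alpha=0.01, random_state=0)
    pruned.fit(X_train, y_train)

    print("early-stopped:", early.score(X_test, y_test))
    print("post-pruned:  ", pruned.score(X_test, y_test))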

• Remove each decision node in turn and check performance.
• Removing a decision node means removing all the sub-trees below it and assigning the most common class.
• Permanently remove the decision node whose removal caused the greatest increase in accuracy.
• Rinse and repeat.
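
That loop, sketched in Python; accuracy, decision_nodes, and collapse are hypothetical helper names standing in for "score on withheld data", "enumerate internal nodes", and "replace a node's subtree with its most common class":

    def reduced_error_prune(tree, validation_set):
        improved = True
        while improved:
            improved = False
            best_candidate, best_score = None, accuracy(tree, validation_set)
            for node in decision_nodes(tree):
                candidate = collapse(tree, node)          # subtree -> majority-class leaf
                score = accuracy(candidate, validation_set)
                if score > best_score:                    # keep the single best removal
                    best_candidate, best_score = candidate, score
            if best_candidate is not None:                # rinse and repeat until nothing helps
                tree, improved = best_candidate, True
        return tree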


• A series of rules.
• A node could both be present and not be present: imagine a bifurcation where one track has only the first and last "node".

• Bootstrap aggregating (bagging).
• Helps to avoid overfitting.
• Usually applied to decision tree models (though not exclusively).

• A machine learning ensemble meta-algorithm.
• Create a bunch of models by bootstrap sampling the training data.
• Let all the models vote.
(Figure: several trees, each asking questions Q1 through Q4, voting on the answer: "Pick me!")
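
A minimal bagging sketch, assuming NumPy and scikit-learn and continuing the train/test split above; scikit-learn also packages the same idea as BaggingClassifier:

    import numpy as np

    rng = np.random.default_rng(0)
    models = []
    for _ in range(25):
        idx = rng.integers(0, len(X_train), size=len(X_train))    # bootstrap sample (with replacement)
        models.append(DecisionTreeClassifier(random_state=0).fit(X_train[idx], y_train[idx]))

    votes = np.stack([m.predict(X_test) for m in models])          # one row of predictions per model
    majority = (votes.mean(axis=0) > 0.5).astype(int)              # majority vote for 0/1 labels
    print("bagged accuracy:", (majority == y_test).mean())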

• A forest is a bunch of trees.
• Each tree has access to a random subset of the attributes/dimensions.
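
That combination of bagged trees plus random attribute subsets is what scikit-learn's RandomForestClassifier provides (it samples the attribute subset at each split rather than once per tree); a sketch reusing the split from before:

    from sklearn.ensemble import RandomForestClassifier

    forest = RandomForestClassifier(
        n_estimators=100,       # number of bagged trees
        max_features="sqrt",    # random subset of attributes considered at each split
        random_state=0,
    ).fit(X_train, y_train)
    print("forest accuracy:", forest.score(X_test, y_test))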

• A greedy algorithm: it tries to race to an answer.
• It finds the next question that best splits the data into classes by answer.
• Result: short trees are preferred.

• The simplest answer is often the best.
• But does this lead to the best classifier?
• The book has a philosophical discussion about this without resolving the issue.

• Many classifiers simply give an answer with no reason.
• Decision trees are one of the few that provide such insight.
