COMP 2208 Dr. Long Tran-Thanh University of Southampton Decision Trees

Classification
[Diagram: an agent interacts with its environment via perception and behaviour; classification sits in the perception step (categorize inputs), feeding into updating the belief model, updating the decision-making policy, and decision making.]

Recognizing the type of situation you are in right now is a basic agent task: classification.
Robotics: misidentifying a human body as some part of a car on the assembly line would be disastrous.
Military: friend or foe?
Electronic card usage: was it fraud or not?

Last lecture: neural networks. Why more classification methods?
Neural networks are very powerful in theory, and deep learning is a promising direction.
But the technology is still difficult to fully control, and in many cases other techniques are more efficient.
Occam's razor: prefer the simpler model; go for something more complicated only if it's really necessary.
In many real-world problems, data cleaning is the most important step; after that, a simple classification method will do the job.

Classification Algorithm
Bottom up: inspiration from biology (e.g., neural networks).
Top down: inspiration from higher abstraction levels.

Prof or hobo 1?

Prof or hobo 2?

Prof or hobo 3?

Prof or hobo answers Hobo Professor

Back to classification
Different ways to go: Honey? Fired? Evil plan?

Back to classification
Some classification algorithms:
Logistic regression
Support vector machines (SVMs)
Decision trees and their family
…
Easy to understand
(Relatively) easy to implement
Very efficient in many cases

Decision making process
[Diagram: a flowchart of yes/no questions, e.g. "Did it go well?" with Yes and No branches.]

Back to the "Prof or hobo" quiz
What are the clues that allow you to distinguish a prof from a hobo?
The clothes people are wearing
Their eyes
The beard
…
Main idea: check some properties, in some order.

Classification with decision trees
A decision tree takes a series of inputs defining a situation, and outputs a binary decision/classification.
A decision tree spells out an order for checking the properties (attributes) of the situation until we have enough information to decide what's going on.
We use the observable attributes to predict the outcome (or some important hidden or unknown quantity).
Question: what is the optimal (most efficient) order of the attributes?

The importance of the ordering
Think about the "20 questions" game: inefficient questions lead to low performance.
Think about binary search: the optimal strategy always halves the interval.
Decision trees are very simple to produce if we already know the underlying rules.
But what if we don't have the rules, just past examples (experience)?

Our objective
Often we don't know in advance how to classify things, and we want our agent to learn from examples.

Which attribute to start with?
The order of attributes is still very important.
Idea: choose the next attribute whose value reduces the uncertainty about the outcome of the classification the most.
What does it mean to say that something reduces the uncertainty in our knowledge? Reducing uncertainty (in knowledge) = increasing (known) information.
So we should choose the attribute that provides the highest information gain.

Entropy
How do we measure information gain (and how do we define it)?
Answer: borrow concepts from information and coding theory.
Entropy (Shannon, 1948): a measure of the amount of disorder or uncertainty in a system.
A tidy room has low entropy: you can be reasonably certain your keys are on the hook you made for them.
A messy room has high entropy: things are all over the place and your keys could be absolutely anywhere.

Entropy
Classification: input X, output Y. Entropy measures the uncertainty about the outcome.
Entropy (Shannon, 1948):
H(Y) = - Σ_y P(Y=y) * log2(P(Y=y))
Here P(Y=y) is how often Y = y, and -log2(P(Y=y)) is the measure of information (surprise) when Y = y (in bits).
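To make the formula concrete, here is a minimal Python sketch (illustrative, not from the slides) that computes the entropy of a distribution given as a list of outcome probabilities:

```python
import math

def entropy(probs):
    """Shannon entropy H(Y) = -sum_y P(Y=y) * log2(P(Y=y)), in bits.
    Outcomes with probability 0 contribute 0 (the limit of p*log2(p) as p -> 0)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)
```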

Entropy example
Weather:
               Good    OK      Terrible
Birmingham     0.33    0.33    0.33
Southampton    …       …       …
Glasgow        0       0       1

Entropy example
Birmingham     P(x)    log2 P(x)    -P(x) log2 P(x)
Good           0.33    -1.58        0.53
OK             0.33    -1.58        0.53
Terrible       0.33    -1.58        0.53
Sum = 1.58 (bits)

Entropy example
Southampton    P(x)    log2 P(x)    -P(x) log2 P(x)
Good           …       …            …
OK             …       …            …
Terrible       …       …            …
Sum = 1.29 (bits)

Entropy example
Glasgow        P(x)    log2 P(x)    -P(x) log2 P(x)
Good           0       -infinity    0
OK             0       -infinity    0
Terrible       1       0            0
Sum = 0 (bits)
When we are certain, the entropy is 0.
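Using the entropy sketch from above, the Birmingham and Glasgow rows can be checked directly (the Southampton probabilities were not recoverable here, so that row is omitted):

```python
print(entropy([1/3, 1/3, 1/3]))  # Birmingham: ~1.58 bits (maximal uncertainty over 3 outcomes)
print(entropy([0, 0, 1]))        # Glasgow: 0.0 bits (complete certainty)
```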

Conditional entropy
Classification: input X, output Y. Entropy measures the uncertainty of a given state of the system. How do we measure the change?
Conditional entropy:
H(Y | X) = - Σ_{x,y} P(X=x, Y=y) * log2(P(Y=y | X=x))
Here P(X=x, Y=y) is the joint probability and P(Y=y | X=x) is the conditional probability.
H(Y | X) is how much uncertainty would remain about the outcome Y if we knew (for instance) the outcome of attribute X.

Information gain
Information gain:
G(Y, X) = H(Y) - H(Y | X)
H(Y) is the current level of uncertainty (entropy); H(Y | X) is the possible new level of uncertainty (conditional entropy). The difference represents how much the uncertainty would decrease.
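Continuing the Python sketch (helper names are my own, not from the lecture), conditional entropy and information gain can be computed from a table of per-attribute-value outcome counts, using the equivalent form H(Y|X) = Σ_x P(X=x) * H(Y|X=x):

```python
def conditional_entropy(counts_by_x):
    """H(Y|X) = sum_x P(X=x) * H(Y|X=x), where counts_by_x maps each
    value x of the attribute to a list of counts of the outcomes of Y."""
    total = sum(sum(row) for row in counts_by_x.values())
    return sum((sum(row) / total) * entropy([c / sum(row) for c in row])
               for row in counts_by_x.values())

def information_gain(counts_by_x):
    """G(Y, X) = H(Y) - H(Y|X), computed from the same table of counts."""
    y_totals = [sum(col) for col in zip(*counts_by_x.values())]
    n = sum(y_totals)
    return entropy([t / n for t in y_totals]) - conditional_entropy(counts_by_x)
```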

Building a decision tree
Recursive algorithm: split the tree on the attribute with the highest information gain, then repeat.
Stopping conditions:
Don't split if all matching records have the same output value (no point, we know what happens!).
Don't split if all matching records have the same attribute values (no point, we can't distinguish them).
A minimal sketch of this recursion is given below.
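Here is that recursion as a minimal ID3-style sketch (an illustration under my own naming conventions, reusing the entropy and information_gain helpers above, not the lecture's exact algorithm):

```python
def build_tree(examples, attributes):
    """examples: list of (attribute_dict, label) pairs;
    attributes: names of the attributes still available for splitting."""
    labels = [y for _, y in examples]
    classes = sorted(set(labels))
    if len(classes) == 1:      # all matching records have the same output value
        return classes[0]
    if not attributes:         # records are indistinguishable: return majority label
        return max(classes, key=labels.count)

    def gain_of(attr):
        # Tabulate outcome counts per value of this attribute, then score.
        groups = {}
        for x, y in examples:
            groups.setdefault(x[attr], []).append(y)
        return information_gain(
            {v: [ys.count(c) for c in classes] for v, ys in groups.items()})

    best = max(attributes, key=gain_of)   # attribute with highest information gain
    rest = [a for a in attributes if a != best]
    return (best, {v: build_tree([(x, y) for x, y in examples if x[best] == v], rest)
                   for v in {x[best] for x, _ in examples}})
```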

Example: predicting the importance of e-mails
Objective: predict whether the user will read the e-mail.

Example: predicting the importance of e-mails
18 e-mails: 9 read, 9 skipped.
"Thread" attribute:
               Reads     Skips     Row total
new_thread     7 (70%)   3 (30%)   10
follow_up      2 (25%)   6 (75%)   8
What is the information gain if we choose "Thread"?
Calculation steps:
Calculate H(Read)
Calculate H(Read | Thread)
Calculate G(Read, Thread) = H(Read) - H(Read | Thread)

Example: predicting the importance of e-mails
Calculating H(Read):
18 e-mails: 9 read, 9 skipped
P(Read = True) = P(Read = False) = 0.5
H(Read) = -(0.5*log2(0.5) + 0.5*log2(0.5)) = 1 (bit)

Example: predicting the importance of e-mails
Calculating H(Read | Thread)
First compute the specific conditional entropies, then combine them:
Calculate H(Read | Thread = new)
Calculate H(Read | Thread = follow_up)
Calculate H(Read | Thread) = P(new)*H(Read | Thread = new) + P(follow_up)*H(Read | Thread = follow_up)

Example: predicting the importance of e-mails
               Reads     Skips     Row total
new_thread     7 (70%)   3 (30%)   10
follow_up      2 (25%)   6 (75%)   8
P(Read = True | new) = 0.7; P(Read = False | new) = 0.3
H(Read | new) = 0.88
P(Read = True | follow_up) = 0.25; P(Read = False | follow_up) = 0.75
H(Read | follow_up) = 0.81
H(Read | Thread) = 10/18 * 0.88 + 8/18 * 0.81 = 0.85

Example: predicting the importance of e-mails
Calculating G(Read, Thread):
G(Read, Thread) = H(Read) - H(Read | Thread) = 1 - 0.85 = 0.15
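The whole calculation can be reproduced with the information_gain sketch from earlier:

```python
thread_counts = {"new_thread": [7, 3],    # [reads, skips]
                 "follow_up":  [2, 6]}
print(information_gain(thread_counts))    # ~0.15 bits, matching the slide
```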

Example: predicting the importance of e-mails

Advantages of decision trees
Decision trees are able to generate understandable (human-readable) rules.
Once learned, decision trees perform classification very efficiently.
Decision trees are able to handle continuous as well as categorical variables: for a continuous variable, you choose a split threshold based on information gain (see the sketch below).
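For the continuous case, a common convention (sketched here with the earlier information_gain helper; the function and its details are illustrative, not taken from the lecture) is to evaluate the midpoints between consecutive sorted values and keep the threshold with the highest gain:

```python
def best_threshold(values, labels):
    """Return (threshold, gain) for the best binary split of a continuous
    attribute, scored by information gain."""
    pairs = sorted(zip(values, labels))
    classes = sorted(set(labels))
    best_t, best_g = None, -1.0
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue                      # no boundary between equal values
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        below = [y for v, y in pairs if v <= t]
        above = [y for v, y in pairs if v > t]
        gain = information_gain({"<=": [below.count(c) for c in classes],
                                 ">":  [above.count(c) for c in classes]})
        if gain > best_g:
            best_t, best_g = t, gain
    return best_t, best_g
```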