Slide 1: Introduction to Artificial Intelligence (AI)
Computer Science CPSC 502, Lecture 15, Nov 1, 2011
Slide credit: C. Conati, S. Thrun, P. Norvig, Wikipedia

Slide 2: Supervised ML: Formal Specification

Slide 3: Example Classification Data

Slide 4: Today (Nov 1)
- Supervised Machine Learning: Naïve Bayes, Markov Chains, Decision Trees, Regression, Logistic Regression
- Key Concepts: Overfitting, Evaluation

Slide 5: Example Classification Data
We want to classify new examples on the property Action based on the examples' Author, Thread, Length, and Where.

Slide 6: Learning Task
Inductive inference: given a set of examples of f(author, thread, length, where) = {reads, skips}, find a function h(author, thread, length, where) that approximates f.
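Concretely, the training set can be written as labelled records. This is a sketch: the actions for e1–e6 and the author/length values are implied by the split counts on the later slides, while the thread and where values shown here are illustrative assumptions.

```python
# A possible encoding of the six newsgroup examples (e1..e6).
# Actions and author/length values follow the split counts on later
# slides; thread/where values are illustrative assumptions.
examples = [
    {"name": "e1", "author": "known",   "thread": "new", "length": "long",  "where": "home", "action": "skips"},
    {"name": "e2", "author": "unknown", "thread": "new", "length": "short", "where": "work", "action": "reads"},
    {"name": "e3", "author": "unknown", "thread": "old", "length": "long",  "where": "work", "action": "skips"},
    {"name": "e4", "author": "known",   "thread": "old", "length": "long",  "where": "home", "action": "skips"},
    {"name": "e5", "author": "known",   "thread": "new", "length": "short", "where": "home", "action": "reads"},
    {"name": "e6", "author": "known",   "thread": "old", "length": "long",  "where": "work", "action": "skips"},
]

# The target f maps (author, thread, length, where) to an action;
# learning searches for an h that approximates f on unseen examples.
```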

Slide 7: Example Decision Tree
[Figure: a decision tree splitting on author, thread, and length, with leaves labeled skips/reads.]
The non-leaf nodes are labeled with attributes. The leaves of the tree are labeled with classifications. Arcs out of a node labeled with attribute A are labeled with each of the possible values of the attribute.

Slide 8: DT as Classifiers
To classify an example, filter it down the tree:
- For each attribute of the example, follow the branch corresponding to that attribute's value.
- When a leaf is reached, the example is classified with the label of that leaf.
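The filtering procedure can be sketched with a small nested-tuple tree (a sketch; the node encoding and the two-level tree below are illustrative, not the course code or the exact tree in the figure):

```python
# A decision tree as nested tuples: internal node = (attribute, {value: subtree}),
# leaf = class label string.
tree = ("length", {
    "long": "skips",
    "short": ("thread", {
        "new": "reads",
        "old": "skips",
    }),
})

def classify(node, example):
    """Filter an example down the tree until a leaf label is reached."""
    while not isinstance(node, str):          # a string node is a leaf label
        attribute, branches = node
        node = branches[example[attribute]]   # follow the branch for this value
    return node

print(classify(tree, {"length": "short", "thread": "new"}))  # reads
```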

Slide 9: DT as Classifiers (continued)
[Figure: the same decision tree, with examples e1–e6 shown at the leaves they filter down to.]

Slide 10: DT Applications
- DTs are often the first method tried in many areas of industry and commerce when the task involves learning from a data set of examples.
- Main reason: the output is easy for humans to interpret.

Slide 11: Learning Decision Trees
A method for supervised classification (we will assume attributes with finite discrete values):
- Representation is a decision tree.
- Bias is towards simple decision trees.
- Search through the space of decision trees, from simple decision trees to more complex ones.

Slide 12: Trivially Correct Decision Tree
[Figure: a deep tree that splits on author, thread, length, and where until every training example e1–e6 sits in its own leaf.]
PROBLEMS?

Slide 13: Example Decision Tree (2)
[Figure: a one-split tree on length: long → skips (e1, e3, e4, e6); short → reads (e2, e5).]
But this tree also classifies my examples correctly.

Slide 14: Searching for a Good Decision Tree
- The input is: a target attribute for which we want to build a classifier, a set of examples, and a set of attributes.
- Stop if all examples have the same classification (good ending), plus some other stopping conditions for not-so-good endings.
- Otherwise, choose an attribute to split on (a greedy, or myopic, step); for each value of this attribute, build a sub-tree for those examples with this attribute value.
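The greedy procedure above can be sketched recursively (a sketch under assumptions: `choose_attribute` stands in for the attribute-selection step covered later, and a majority vote handles the "not-so-good" ending where attributes run out):

```python
from collections import Counter

def majority(examples, target):
    """Most common class label among the examples."""
    return Counter(e[target] for e in examples).most_common(1)[0][0]

def build_tree(examples, attributes, target, choose_attribute):
    labels = {e[target] for e in examples}
    if len(labels) == 1:                     # all examples agree: good ending
        return labels.pop()
    if not attributes:                       # no attributes left: majority vote
        return majority(examples, target)
    best = choose_attribute(examples, attributes, target)  # greedy/myopic step
    branches = {}
    for value in {e[best] for e in examples}:
        subset = [e for e in examples if e[best] == value]
        remaining = [a for a in attributes if a != best]
        branches[value] = build_tree(subset, remaining, target, choose_attribute)
    return (best, branches)
```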

Slide 15: Building a Decision Tree (Newsgroup Read Domain)
[Figure: the tree being built step by step over e1–e6 (e1-s, e2-r, e3-s, e4-s, e5-r, e6-s). The root splits on author, separating {e1, e5, e4, e6} from {e2, e3}; each branch is then split further on the remaining attributes (thread, length, where).]

Slide 16: Choosing a Good Split
- Goal: try to minimize the depth of the tree.
- Split on attributes that move as far as possible toward an exact classification of the examples.
- An ideal split divides the examples into sets that each have the same classification.
- A bad split leaves about the same proportion of examples in the different classes.

Slide 17: Choosing a Good Attribute (binary case)
[Figure: two candidate binary splits; the "Good" one yields nearly pure subsets, the "Not So Good" one leaves the classes mixed.]

Slide 18: Information Gain (1)
[Figure: equations, shown only as an image in the original slides.]

Slide 19: Information Gain (2): Expected Reduction in Entropy
- Choose the attribute with the highest Gain.
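The formulas on these two slides appear only as images in this transcript. Reconstructed here in the B(q) notation used by the worked examples on the next slides (a reconstruction; p and n are the overall positive/negative counts, d is the number of values of attribute A, and p_k, n_k are the class counts in the k-th branch):

```latex
B(q) = -\bigl( q \log_2 q + (1-q)\log_2 (1-q) \bigr)

\mathrm{Remainder}(A) = \sum_{k=1}^{d} \frac{p_k + n_k}{p+n}\, B\!\left(\frac{p_k}{p_k+n_k}\right)

\mathrm{Gain}(A) = B\!\left(\frac{p}{p+n}\right) - \mathrm{Remainder}(A)
```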

Slide 20: Example: Possible Splits (author)
[Figure: splitting on author. author = known gives e1, e4, e5, e6 (Skips 3, Reads 1); author = unknown gives e2, e3 (Skips 1, Reads 1); overall: Skips 4, Reads 2.]
For the initial training set, B(4/6) = 0.92 bit.
Remainder(author) = 4/6 · B(1/4) + 2/6 · B(1/2) = 2/3 · 0.81 + 1/3 · 1 ≈ 0.87
Gain(author) = 0.92 − 0.87 ≈ 0.05

Slide 21: Example: Possible Splits (length)
[Figure: splitting on length. length = long gives e1, e3, e4, e6 (Skips 4, Reads 0); length = short gives e2, e5 (Skips 0, Reads 2); overall: Skips 4, Reads 2.]
For the initial training set, B(4/6) = 0.92 bit.
Remainder(length) = 4/6 · B(0) + 2/6 · B(1) = 0
Gain(length) = 0.92
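The two worked examples can be checked numerically (a sketch; the attribute partitions follow the class counts on the slides):

```python
import math

def B(q):
    """Entropy of a Boolean variable that is true with probability q."""
    if q in (0.0, 1.0):
        return 0.0
    return -(q * math.log2(q) + (1 - q) * math.log2(1 - q))

def remainder(partitions, total):
    """partitions: list of (reads, skips) counts, one per attribute value."""
    return sum((r + s) / total * B(r / (r + s)) for r, s in partitions)

total = 6
base = B(2 / 6)                                            # 2 reads, 4 skips
gain_author = base - remainder([(1, 3), (1, 1)], total)    # known, unknown
gain_length = base - remainder([(0, 4), (2, 0)], total)    # long, short

print(round(base, 2), round(gain_author, 3), round(gain_length, 2))
# 0.92 0.044 0.92
```

Length wins by a wide margin, which matches the tree on slide 13 splitting on length first.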

Slide 22: Drawback of Information Gain
- Tends to favor attributes with many different values, which can fit the data better than splitting on attributes with fewer values.
- Imagine the extreme case of using "message id-number" in the newsgroup reading example: every example may have a different value on this attribute, so splitting on it would give the highest information gain, even if it is unlikely that this attribute is relevant for the user's reading decision.
- Alternative measures exist (e.g. gain ratio).

Slide 23: Expressiveness of Decision Trees
They can represent any discrete function, and consequently any Boolean function. (How many such functions are there?)

Slide 24: Handling Overfitting
- Overfitting occurs with noise and correlations in the available examples that are not reflected in the data as a whole.
- One technique to handle overfitting: decision tree pruning.
- Statistical techniques evaluate whether the gain from the attribute selected by the splitting technique is large enough to be relevant.
- Generic techniques to test ML algorithms.

Slide 25: How to Learn
- Data: labeled instances, e.g. e-mails marked spam/ham
  - Training set
  - Held-out (validation) set
  - Test set
- Experimentation cycle:
  - Learn the model on the training set
  - Tune it on the held-out set
  - Compute accuracy on the test set
  - Very important: never "peek" at the test set!
- Evaluation
  - Accuracy: fraction of instances predicted correctly
- Overfitting and generalization
  - We want a classifier that does well on test data
  - Overfitting: fitting the training data very closely, but not generalizing well to test data
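The three-way split can be sketched as follows (a sketch; the 60/20/20 proportions and the helper names are assumed choices, not from the slides):

```python
import random

def three_way_split(instances, seed=0, train_frac=0.6, heldout_frac=0.2):
    """Shuffle labelled instances and split into train / held-out / test."""
    data = list(instances)
    random.Random(seed).shuffle(data)
    n = len(data)
    n_train = int(n * train_frac)
    n_heldout = int(n * heldout_frac)
    train = data[:n_train]
    heldout = data[n_train:n_train + n_heldout]
    test = data[n_train + n_heldout:]        # never peeked at until the end
    return train, heldout, test

def accuracy(model, data):
    """Fraction of (x, y) instances the model predicts correctly."""
    return sum(model(x) == y for x, y in data) / len(data)
```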

Slide 26: Assessing Performance of a Learning Algorithm
- Collect a large set of examples.
- Divide them into two sets: a training set and a test set.
- Use the learning algorithm on the training set to generate a hypothesis h.
- Measure the percentage of examples in the test set that are correctly classified by h.
- Repeat with training sets of different sizes and different randomly selected training sets.

Slide 27: Cross-Validation
- Partition the training set into k sets.
- Run the algorithm k times, each time (fold) using one of the k sets as the test set and the rest as the training set.
- Report algorithm performance as the average performance (e.g. accuracy) over the k different folds.
- Useful to select among candidate algorithms/models, e.g. a DT built using information gain vs. some other measure for splitting.
- Once the algorithm/model type is selected via cross-validation, return the model trained on all available data.
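The k-fold procedure above can be sketched as (a sketch; `train_fn` is an assumed callable that fits a classifier on a list of (x, y) pairs, and the interleaved partition is one simple way to form the folds):

```python
def cross_validate(data, k, train_fn):
    """Average accuracy over k folds; each fold serves once as the test set."""
    folds = [data[i::k] for i in range(k)]   # simple interleaved partition
    scores = []
    for i in range(k):
        test_fold = folds[i]
        train_fold = [x for j, f in enumerate(folds) if j != i for x in f]
        model = train_fn(train_fold)
        correct = sum(model(x) == y for x, y in test_fold)
        scores.append(correct / len(test_fold))
    return sum(scores) / k
```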

Slide 28: Interesting Thing to Try
- Repeat cross-validation with training sets of different sizes.
- Report algorithm performance as a function of training set size: a learning curve.

Slide 29: Other Issues in DT Learning
- Attributes with continuous and integer values (e.g. Cost in $):
  - Important because many real-world applications deal with continuous values.
  - Methods exist for finding the split point that gives the highest information gain (e.g. Cost > $50).
  - Still the most expensive part of using DTs in real-world applications.
- Continuous-valued output attribute (e.g. predicted cost in $): Regression Tree
  - Splitting may stop before classifying all examples.
  - Leaves with unclassified examples use a linear function of a subset of the attributes to classify them via linear regression.
  - Tricky part: deciding when to stop splitting and start linear regression.
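Finding the split point with highest information gain for a continuous attribute can be sketched as follows (a sketch; taking candidate thresholds at midpoints between adjacent sorted values is one common convention, not necessarily the course's):

```python
import math

def entropy(labels):
    """Entropy (in bits) of a list of class labels."""
    n = len(labels)
    out = 0.0
    for c in set(labels):
        p = labels.count(c) / n
        out -= p * math.log2(p)
    return out

def best_threshold(values, labels):
    """Return (threshold, gain) maximizing information gain for a numeric attribute."""
    pairs = sorted(zip(values, labels))
    base = entropy(labels)
    best = (None, -1.0)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue                                  # no boundary between equal values
        t = (pairs[i - 1][0] + pairs[i][0]) / 2       # midpoint candidate
        left = [c for v, c in pairs if v <= t]
        right = [c for v, c in pairs if v > t]
        rem = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        if base - rem > best[1]:
            best = (t, base - rem)
    return best
```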

Slide 30: TODO for this Thursday
- Read 7.5, 7.6 and 11.1, 11.2
- Assignment 3, Part 1 due