Supervised Learning I, Cont’d Reading: DH&S, Ch 8.1-8.4.

Administrivia
Machine learning reading group:
- Not part of/related to this class
- We read advanced (current research) papers in the ML field
- Might be of interest; all are welcome
- Meets Fri, 2:00-3:30, FEC325 conf room
- More info:
Lecture notes online

Yesterday & today
Last time:
- Basic ML problem: definitions and such
- Statement of the supervised learning problem
Today:
- HW 1 assigned
- Hypothesis spaces
- Intro to decision trees

Homework 1
Due: Tues, Jan 31
DH&S, problems 8.1, 8.5, 8.6 (a & b), 8.8 (a), 8.11

Review of notation
- Feature (attribute): a single measured property of an example, x_j
- Instance (example): a vector of d feature values, x_i = (x_i1, ..., x_id)
- Label (class): the category y_i assigned to instance x_i
- Feature space: the set of all possible instances
- Training data: the instances X = {x_1, ..., x_N} together with their labels y = {y_1, ..., y_N}

Finally, goals
Now that we have X and y, we have a (mostly) well defined job:
find the function f that most closely approximates the "true" function.
The supervised learning problem:
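In the usual notation, the formal statement looks roughly like the following (a sketch; the symbols f, f-hat, H, and N are standard ML conventions, not necessarily the exact symbols on the original slide):

\text{Given } (X, y) = \{(x_i, y_i)\}_{i=1}^{N} \text{ with } y_i \approx f(x_i) \text{ for some unknown target } f,
\quad \text{find } \hat{f} \in \mathcal{H} \text{ such that } \hat{f} \approx f .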

Goals?
Key questions:
- What candidate functions do we consider?
- What does "most closely approximates" mean?
- How do you find the one you're looking for?
- How do you know you've found the "right" one?

Hypothesis spaces
The "true" f we want is usually called the target concept (also true model, target function, etc.)
The set of all possible f we'll consider is called the hypothesis space, H
NOTE! The target concept is not necessarily part of the hypothesis space!!!
Example hypothesis spaces:
- All linear functions
- Quadratic & higher-order functions
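For concreteness, a sketch of what those two example spaces look like in the usual notation (the parameter symbols w, b, A are generic stand-ins, an assumption rather than the slide's own notation):

\mathcal{H}_{\mathrm{lin}}  = \left\{ f(x) = w^{\top}x + b \;\middle|\; w \in \mathbb{R}^{d},\ b \in \mathbb{R} \right\}
\qquad
\mathcal{H}_{\mathrm{quad}} = \left\{ f(x) = x^{\top}A\,x + w^{\top}x + b \;\middle|\; A \in \mathbb{R}^{d\times d},\ w \in \mathbb{R}^{d},\ b \in \mathbb{R} \right\}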

Visually...
[Figure: the space of all functions on the feature space, with the hypothesis space H drawn as a subset; the target concept might be here (inside H), or it might be here (outside H)...]

More hypothesis spaces
Rules:
if (x.skin=="fur") {
    if (x.liveBirth=="true") {
        return "mammal";
    } else {
        return "marsupial";
    }
} else if (x.skin=="scales") {
    switch (x.color) {
        case ("yellow") { return "coral snake"; }
        case ("black")  { return "mamba snake"; }
        case ("green")  { return "grass snake"; }
    }
} else {
    ...
}

More hypothesis spaces
Decision trees

Finding a good hypothesis
Our job is now: given an X in some feature space and a set of labels y, find the best f we can by searching the hypothesis space H
[Figure: the space of all functions, with the hypothesis space H highlighted as the region actually being searched]

Measuring goodness
What does it mean for a hypothesis to be "as close as possible"?
Could be a lot of things
For the moment, we'll think about accuracy
(Or, with a higher sigma-shock factor...)
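The "sigma-shock" version is presumably the usual empirical accuracy; a sketch, with the indicator notation as an assumption about how the slide wrote it:

\operatorname{acc}(f) \;=\; \frac{1}{N} \sum_{i=1}^{N} \mathbb{1}\!\left[\, f(x_i) = y_i \,\right]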

Aside: Risk & loss functions
This kind of quantity (an average of a loss over examples) is called a risk function
A.k.a. an expected loss function
It is an approximation to the true loss
(Sort of) a measure of distance between the "true" concept and our approximation to it
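A hedged sketch of the two quantities being contrasted here (the loss L and the data distribution D are generic stand-ins, not necessarily the slide's notation): the empirical risk computed on the training set, and the true (expected) risk it approximates.

\hat{R}(f) \;=\; \frac{1}{N} \sum_{i=1}^{N} L\!\left(f(x_i),\, y_i\right)
\;\approx\;
R(f) \;=\; \mathbb{E}_{(x,y)\sim \mathcal{D}}\!\left[\, L\!\left(f(x),\, y\right) \,\right]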

Constructing DTs, intro
Hypothesis space: the set of all trees, with all possible node labelings and all possible leaf labelings
How many are there?
Proposed search procedure:
1. Propose a candidate tree, t_i
2. Evaluate the accuracy of t_i w.r.t. X and y
3. Keep the max-accuracy t_i
4. Go to 1
Will this work?

A more practical alg.
Can't really search all possible trees
Instead, construct a single tree:
- Greedily
- Recursively
- At each step, pick the decision that most improves the current tree

A more practical alg.
DecisionTree buildDecisionTree(X, Y) {
    // Input: instance set X, label set Y
    if (Y.isPure()) {
        // All labels agree: no more splitting needed
        return new LeafNode(Y);
    } else {
        // Pick the attribute that best improves purity...
        Feature a = getBestSplitFeature(X, Y);
        DecisionNode n = new DecisionNode(a);
        // ...partition the data by that attribute's values...
        [X0,...,Xk, Y0,...,Yk] = a.splitData(X, Y);
        // ...and recurse on each partition
        for (i = 0; i <= k; ++i) {
            n.addChild(buildDecisionTree(Xi, Yi));
        }
        return n;
    }
}

A bit of geometric intuition
[Figure: scatter plot of the data with x_1 = petal length on one axis and x_2 = sepal width on the other]

The geometry of DTs
- A decision tree splits the space with a series of axis-orthogonal decision surfaces
- A.k.a. axis-parallel
- Each test is equivalent to a half-space
- Intersecting the half-spaces along a path yields a set of hyper-rectangles (the generalization of rectangles to d-dimensional space)
- In each hyper-rectangle, the DT assigns a constant label
- So a DT is a piecewise-constant approximator over a set of hyper-rectangular regions
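As an illustration of that piecewise-constant, axis-parallel structure, here is a minimal hand-built tree in Java-like code; the feature names, thresholds, and class labels below are made up for illustration, not taken from the lecture's data.

import static java.lang.System.out;

// A tiny hand-built decision tree over two numeric features.
// Each test compares a single feature to a constant (an axis-parallel split),
// so each leaf corresponds to a hyper-rectangle that gets a constant label.
public class TinyTree {
    public static String classify(double petalLength, double sepalWidth) {
        if (petalLength <= 2.5) {           // half-space: petalLength <= 2.5
            return "setosa";                // rectangle: petalLength <= 2.5
        } else if (sepalWidth <= 3.0) {     // half-space: sepalWidth <= 3.0
            return "versicolor";            // rectangle: petalLength > 2.5, sepalWidth <= 3.0
        } else {
            return "virginica";             // rectangle: petalLength > 2.5, sepalWidth > 3.0
        }
    }

    public static void main(String[] args) {
        out.println(classify(1.4, 3.5));    // setosa
        out.println(classify(4.5, 2.8));    // versicolor
    }
}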

Filling out the algorithm
Still need to specify a couple of functions:
- Y.isPure(): determine whether we're done splitting this set
- getBestSplitFeature(X,Y): find the best attribute to split X on, given labels Y
Y.isPure() is the easy (easier, anyway) one; a sketch follows below...
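A minimal sketch of what isPure might look like, assuming labels are stored as a list of strings; the LabelSet class name and its internals are assumptions, not the course's actual code.

import java.util.List;

// A label set is "pure" when every label in it is identical,
// i.e., no further splitting can improve it.
public class LabelSet {
    private final List<String> labels;

    public LabelSet(List<String> labels) { this.labels = labels; }

    public boolean isPure() {
        if (labels.isEmpty()) return true;         // vacuously pure
        String first = labels.get(0);
        for (String y : labels) {
            if (!y.equals(first)) return false;    // found two different labels
        }
        return true;
    }
}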

Splitting criteria
What properties do we want our getBestSplitFeature() function to have?
- Increase the purity of the data: after the split, the new sets should be closer to uniform labeling than before the split
- Want the subsets to have roughly the same purity
- Want the subsets to be as balanced as possible
(A rough sketch of one possible scoring function appears below.)
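A rough sketch of the first property only, using a deliberately simple purity measure (the size-weighted majority-class fraction of each subset). The class names, the data representation, and this particular score are illustrative assumptions; the lecture's actual criterion (entropy / information gain) comes later.

import java.util.*;

// Toy sketch: pick the categorical feature whose split gives the highest
// size-weighted purity, where purity of a subset = fraction of its majority label.
public class SplitChooser {

    // Purity of a list of labels: count of the most common label / total count.
    static double purity(List<String> labels) {
        Map<String, Integer> counts = new HashMap<>();
        for (String y : labels) counts.merge(y, 1, Integer::sum);
        return (double) Collections.max(counts.values()) / labels.size();
    }

    // Score a split: purity of each subset, weighted by the subset's share of the data.
    static double splitScore(List<List<String>> subsets, int total) {
        double score = 0.0;
        for (List<String> s : subsets) {
            if (!s.isEmpty()) score += ((double) s.size() / total) * purity(s);
        }
        return score;
    }

    // X: one row per instance, each row a map featureName -> value (assumed non-empty).
    // Y: the corresponding labels.
    static String getBestSplitFeature(List<Map<String, String>> X, List<String> Y) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String feature : X.get(0).keySet()) {
            // Partition the labels by this feature's value.
            Map<String, List<String>> partition = new HashMap<>();
            for (int i = 0; i < X.size(); i++) {
                partition.computeIfAbsent(X.get(i).get(feature), k -> new ArrayList<>())
                         .add(Y.get(i));
            }
            double score = splitScore(new ArrayList<>(partition.values()), Y.size());
            if (score > bestScore) { bestScore = score; best = feature; }
        }
        return best;
    }
}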