Christoph Eick: Learning Models to Predict and Classify

Learning from Examples
Example of Learning from Examples
 Classification: Is car x a family car?
 Prediction: What is the amount of rainfall tomorrow?
 Knowledge extraction: What do people expect from a family car? What factors are important to predict tomorrow's rainfall?

Noise and Model Complexity
Use the simpler model because it is:
 Simpler to use (lower computational complexity)
 Easier to train (needs fewer examples)
 Less sensitive to noise
 Easier to explain (more interpretable)
 Generalizes better (lower variance; Occam's razor)

Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

Alternative Approach: Regression

Finding Regression Coefficients
How to find w1 and w0? Solve dE/dw1 = 0 and dE/dw0 = 0, then solve the two resulting equations! Group Homework!
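As a sketch of where those two equations lead: for the squared error E = Σ(y − (w1·x + w0))², setting both derivatives to zero gives the familiar closed-form solution w1 = cov(x, y) / var(x) and w0 = mean(y) − w1·mean(x). The function name and the toy data below are illustrative, not from the slides.

```python
import numpy as np

def fit_line(x, y):
    """Closed-form solution of dE/dw1 = 0 and dE/dw0 = 0
    for the squared error E = sum((y - (w1*x + w0))**2)."""
    w1 = np.cov(x, y, bias=True)[0, 1] / np.var(x)  # slope: cov(x, y) / var(x)
    w0 = y.mean() - w1 * x.mean()                   # intercept
    return w1, w0

# toy data generated from y = 2x + 1 with no noise, so the fit is exact
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0
w1, w0 = fit_line(x, y)
print(w1, w0)  # → 2.0 1.0
```

With noisy data the recovered coefficients would only approximate the generating ones, which is where the generalization issues on the following slides come in.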

Model Selection & Generalization
 Learning is an ill-posed problem; the data alone are not sufficient to find a unique solution
 Hence the need for inductive bias: assumptions about the hypothesis class H
 Generalization: how well a model performs on new data
 Overfitting: H is more complex than the concept C or target function f
 Underfitting: H is less complex than C or f

Underfitting and Overfitting
 Underfitting: the model is too simple, so both training and test errors are large
 Overfitting: the model is too complex, so test errors are large although training errors are small
 Complexity of a decision tree := the number of nodes it uses
[Figure: training and test error as a function of the complexity of the used model, with underfitting and overfitting regions marked]
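Both regimes can be reproduced in a few lines. The sketch below uses polynomial degree as a stand-in for model complexity (the slide's decision-tree node count plays the same role): a degree-1 fit underfits a sinusoid, while a degree-9 fit through 10 noisy points drives the training error toward zero by fitting the noise. The target function and noise level are arbitrary illustrations.

```python
import numpy as np

rng = np.random.default_rng(0)
true_f = lambda x: np.sin(2 * np.pi * x)            # underlying target function
x_train = np.linspace(0.0, 1.0, 10)
y_train = true_f(x_train) + rng.normal(0.0, 0.2, x_train.size)  # noisy sample
x_test = np.linspace(0.0, 1.0, 100)

errors = {}
for degree in (1, 3, 9):  # too simple / about right / too complex
    coeffs = np.polyfit(x_train, y_train, degree)
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - true_f(x_test)) ** 2)
    errors[degree] = (train_err, test_err)
    print(f"degree {degree}: training error {train_err:.4f}, test error {test_err:.4f}")
```

The training error can only shrink as the degree grows (each polynomial class contains the simpler ones), while the test error eventually rises again, tracing exactly the U-shaped curve sketched in the figure.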

Generalization Error
 Two errors: the training error, and the testing error, usually called the generalization error (the error on new examples!)
 Typically, the training error is smaller than the generalization error
 Measuring the generalization error is a major challenge in data mining and machine learning ( nets/part3/section-11.html )
 To estimate the generalization error, we need data unseen during training. We could split the data as:
  Training set (50%)
  Validation set (25%), optional, for selecting ML algorithm parameters
  Test (publication) set (25%)
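The 50/25/25 split above can be sketched as follows; the function name and the shuffle-then-slice approach are one reasonable implementation, not prescribed by the slide.

```python
import numpy as np

def three_way_split(X, y, seed=42):
    """Shuffle, then split into 50% training, 25% validation,
    25% test, matching the proportions suggested on the slide."""
    n = len(X)
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)                # shuffle so the split is random
    a, b = n // 2, n // 2 + n // 4          # 50% and 75% cut points
    tr, va, te = idx[:a], idx[a:b], idx[b:]
    return (X[tr], y[tr]), (X[va], y[va]), (X[te], y[te])

X = np.arange(100).reshape(100, 1)
y = np.arange(100)
(X_tr, y_tr), (X_va, y_va), (X_te, y_te) = three_way_split(X, y)
print(len(X_tr), len(X_va), len(X_te))  # → 50 25 25
```

Only the training set is shown to the learner; the validation set picks hyperparameters, and the test set is touched once, at the very end, to report the generalization error.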

Triple Trade-Off
There is a trade-off between three factors (Dietterich, 2003):
1. Complexity of H, c(H)
2. Training set size, N
3. Generalization error, E, on new data
 As N increases, E decreases
 As c(H) increases, E first decreases and then increases (overfitting)
 As c(H) increases, the training error decreases for some time and then stays constant (frequently at 0)

Notes on Overfitting
 Overfitting results in models that are more complex than necessary: after learning the relevant knowledge, they "tend to learn noise"
 More complex models tend to have more complicated decision boundaries and tend to be more sensitive to noise, missing examples, …
 The training error then no longer provides a good estimate of how well the tree will perform on previously unseen records
 We need "new" ways of estimating errors

Thoughts on Fitness Functions for Genetic Programming
1. Just use the squared training error → overfitting
2. Use the squared training error, but restrict model complexity
3. Split the training set into a true training set and a validation set; use the squared error on the validation set as the fitness function
4. Combine 1, 2, and 3 (many combinations exist)
5. Consider model complexity in the fitness function: fitness(model) = error(model) + λ·complexity(model)
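Option 5 can be sketched in a few lines. The λ value and the error/complexity numbers below are arbitrary illustrations; in a genetic-programming run, error would be the model's squared training error and complexity something like its tree size.

```python
def fitness(error, complexity, lam=0.1):
    """Regularized fitness (lower is better): penalize complex models
    so that evolution does not reward fitting the noise.
    lam (λ) controls the trade-off; 0.1 is an arbitrary choice here."""
    return error + lam * complexity

# a slightly worse-fitting but much simpler model can win
simple = fitness(error=0.30, complexity=3)     # 0.30 + 0.1 * 3  = 0.60
complex_ = fitness(error=0.25, complexity=20)  # 0.25 + 0.1 * 20 = 2.25
print(simple < complex_)  # → True
```

Choosing λ is itself a model-selection problem, which circles back to option 3: a validation set is a natural way to pick it.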