PatReco: Model and Feature Selection Alexandros Potamianos Dept of ECE, Tech. Univ. of Crete Fall 2004-2005.

Breakdown of Classification Error
- Bayes error
- Model selection error
- Model estimation error
- Data mismatch error (training-testing)

True statements about Bayes error (valid within statistical significance)
- The Bayes error is ALWAYS smaller than the total (empirical) classification error
- If the model, estimation, and mismatch errors are zero, then the total classification error equals the Bayes error
- The ONLY way to reduce the Bayes error is to add new features to the classifier design

More true statements
- Adding new features can only reduce the Bayes error (this is not true of the total classification error!!!)
- Adding new features will NOT reduce the Bayes error if the new features are (see the simulation sketch below):
  - Very bad at discriminating between classes (feature pdfs overlapping)
  - Highly correlated with existing features
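The correlated-feature case can be checked empirically. Below is a minimal Monte Carlo sketch (not from the original slides; NumPy is assumed) in which a new feature is an almost exact copy of an existing one, so the optimal decision rule, and hence the error, is unchanged:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

y = rng.integers(0, 2, n)           # two equiprobable classes
x1 = rng.normal(2.0 * y, 1.0)       # class means 0 and 2, sigma = 1
x2 = x1 + rng.normal(0.0, 0.1, n)   # new feature: highly correlated with x1

# Optimal rule on x1 alone: threshold at the midpoint of the class means.
err = np.mean((x1 > 1.0) != y)

# Given x1, x2 carries no additional class information, so the optimal
# two-feature rule reduces to the same threshold on x1: same Bayes error.
print(err)  # ~0.159, which is Q(r/2) for r = |2 - 0| / 1 = 2
```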

Gaussian Classification: Bayes Error
For two classes ω1 and ω2 following Gaussian distributions with means μ1, μ2, the same variance σ², and equal priors, the Bayes error is:

P(error) = \frac{1}{\sqrt{2\pi}} \int_{r/2}^{\infty} e^{-u^{2}/2} \, du, \qquad \text{where } r = \frac{|\mu_1 - \mu_2|}{\sigma}
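Numerically, the tail integral above is the Gaussian Q-function, i.e. P(error) = 1 - Φ(r/2), so the Bayes error can be evaluated in one line. A minimal sketch assuming SciPy (not part of the original slides):

```python
from scipy.stats import norm

def gaussian_bayes_error(mu1: float, mu2: float, sigma: float) -> float:
    """Bayes error for two equiprobable Gaussian classes with equal variance."""
    r = abs(mu1 - mu2) / sigma
    return norm.sf(r / 2)  # survival function: integral from r/2 to infinity

print(gaussian_bayes_error(0.0, 2.0, 1.0))  # ~0.1587
```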

Feature Selection
- If we had infinite amounts of data, then: the more features the better!
- However, in practice data are finite: more features → more parameters to train!!!
- Good features are (screened numerically in the sketch below):
  - Uncorrelated
  - Able to discriminate among classes
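Both criteria can be measured directly on training data. A rough sketch (NumPy assumed; fisher_ratio and max_abs_correlation are names introduced here, and the summed-variance denominator is one of several conventions for the Fisher ratio):

```python
import numpy as np

def fisher_ratio(xa: np.ndarray, xb: np.ndarray) -> np.ndarray:
    """Per-feature discriminability: squared mean gap over summed variances."""
    return (xa.mean(axis=0) - xb.mean(axis=0)) ** 2 / (xa.var(axis=0) + xb.var(axis=0))

def max_abs_correlation(x: np.ndarray) -> float:
    """Largest off-diagonal |correlation| between columns (features)."""
    c = np.abs(np.corrcoef(x, rowvar=False))
    np.fill_diagonal(c, 0.0)
    return float(c.max())
```

High Fisher ratios flag discriminative features; a maximum absolute correlation close to 1 flags redundant ones.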

Model Selection
- Model complexity is measured by the number of model parameters, i.e., the number of free parameters that need to be estimated from data
- Overfitting: too many parameters, too little data!!!
- Model selection for Gaussian models (compared numerically in the sketch below):
  - Single Gaussians
  - Mixtures of Gaussians
  - Fixed variance
  - Tied variance
  - Diagonal variance
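The Gaussian variants above correspond to covariance structures that can be compared with an information criterion, which penalizes the parameter count and so guards against overfitting. A sketch assuming scikit-learn (not used in the original course), selecting by BIC; the 'spherical', 'tied', and 'diag' covariance types roughly match the fixed-, tied-, and diagonal-variance options listed:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

X = np.random.default_rng(0).normal(size=(500, 2))  # placeholder data

candidates = [
    GaussianMixture(n_components=k, covariance_type=ct, random_state=0)
    for k in (1, 2, 4)                # single Gaussian vs. mixtures
    for ct in ("full", "tied", "diag", "spherical")
]
best = min(candidates, key=lambda m: m.fit(X).bic(X))  # lower BIC is better
print(best.n_components, best.covariance_type)
```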

Conclusion
- Introducing more features and/or more complex models can only reduce the classification error (if infinite amounts of training data are available)
- In practice: the number of features and the number of model parameters should be a function of the amount of training data available (avoid overfitting!)
- Good features are uncorrelated and discriminative