PatReco: Model and Feature Selection Alexandros Potamianos Dept of ECE, Tech. Univ. of Crete Fall 2004-2005.

Breakdown of Classification Error
- Bayes error
- Model selection error
- Model estimation error
- Data mismatch error (training-testing)

True statements about Bayes error (valid within statistical significance)
- The Bayes error is ALWAYS smaller than the total (empirical) classification error
- If the model, estimation, and mismatch errors are zero, then the total classification error equals the Bayes error
- The ONLY way to reduce the Bayes error is to add new features to the classifier design

More true statements
- Adding new features can only reduce the Bayes error (this is not true of the total classification error!!!)
- Adding new features will NOT reduce the Bayes error if the new features are (see the simulation sketch below):
  - Very bad at discriminating between classes (feature pdfs overlapping)
  - Highly correlated with existing features
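The correlated-feature case can be checked empirically. Below is a minimal Monte Carlo sketch (not from the original slides; NumPy is assumed) in which a new feature is an almost exact copy of an existing one, so the optimal decision rule, and hence the error, is unchanged:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

y = rng.integers(0, 2, n)           # two equiprobable classes
x1 = rng.normal(2.0 * y, 1.0)       # class means 0 and 2, sigma = 1
x2 = x1 + rng.normal(0.0, 0.1, n)   # new feature: highly correlated with x1

# Optimal rule on x1 alone: threshold at the midpoint of the class means.
err = np.mean((x1 > 1.0) != y)

# Given x1, x2 carries no additional class information, so the optimal
# two-feature rule reduces to the same threshold on x1: same Bayes error.
print(err)  # ~0.159, which is Q(r/2) for r = |2 - 0| / 1 = 2
```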

Gaussian Classification: Bayes Error
For two classes ω1 and ω2 following Gaussian distributions with means μ1, μ2, the same variance σ², and equal priors, the Bayes error is:

P(error) = \frac{1}{\sqrt{2\pi}} \int_{r/2}^{\infty} e^{-u^{2}/2} \, du, \qquad \text{where } r = \frac{|\mu_1 - \mu_2|}{\sigma}
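Numerically, the tail integral above is the Gaussian Q-function, i.e. P(error) = 1 - Φ(r/2), so the Bayes error can be evaluated in one line. A minimal sketch assuming SciPy (not part of the original slides):

```python
from scipy.stats import norm

def gaussian_bayes_error(mu1: float, mu2: float, sigma: float) -> float:
    """Bayes error for two equiprobable Gaussian classes with equal variance."""
    r = abs(mu1 - mu2) / sigma
    return norm.sf(r / 2)  # survival function: integral from r/2 to infinity

print(gaussian_bayes_error(0.0, 2.0, 1.0))  # ~0.1587
```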

Feature Selection
- If we had infinite amounts of data, then: the more features the better!
- However, in practice data are finite: more features → more parameters to train!!!
- Good features are (screened numerically in the sketch below):
  - Uncorrelated
  - Able to discriminate among classes
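Both criteria can be measured directly on training data. A rough sketch (NumPy assumed; fisher_ratio and max_abs_correlation are names introduced here, and the summed-variance denominator is one of several conventions for the Fisher ratio):

```python
import numpy as np

def fisher_ratio(xa: np.ndarray, xb: np.ndarray) -> np.ndarray:
    """Per-feature discriminability: squared mean gap over summed variances."""
    return (xa.mean(axis=0) - xb.mean(axis=0)) ** 2 / (xa.var(axis=0) + xb.var(axis=0))

def max_abs_correlation(x: np.ndarray) -> float:
    """Largest off-diagonal |correlation| between columns (features)."""
    c = np.abs(np.corrcoef(x, rowvar=False))
    np.fill_diagonal(c, 0.0)
    return float(c.max())
```

High Fisher ratios flag discriminative features; a maximum absolute correlation close to 1 flags redundant ones.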

Model Selection
- Model complexity is measured by the number of model parameters, i.e., the number of free parameters that need to be estimated from data
- Overfitting: too many parameters, too little data!!!
- Model selection for Gaussian models (compared numerically in the sketch below):
  - Single Gaussians
  - Mixtures of Gaussians
  - Fixed variance
  - Tied variance
  - Diagonal variance
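The Gaussian variants above correspond to covariance structures that can be compared with an information criterion, which penalizes the parameter count and so guards against overfitting. A sketch assuming scikit-learn (not used in the original course), selecting by BIC; the 'spherical', 'tied', and 'diag' covariance types roughly match the fixed-, tied-, and diagonal-variance options listed:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

X = np.random.default_rng(0).normal(size=(500, 2))  # placeholder data

candidates = [
    GaussianMixture(n_components=k, covariance_type=ct, random_state=0)
    for k in (1, 2, 4)                # single Gaussian vs. mixtures
    for ct in ("full", "tied", "diag", "spherical")
]
best = min(candidates, key=lambda m: m.fit(X).bic(X))  # lower BIC is better
print(best.n_components, best.covariance_type)
```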

Conclusion
- Introducing more features and/or more complex models can only reduce the classification error (if infinite amounts of training data are available)
- In practice: the number of features and the number of model parameters should be a function of the amount of training data available (avoid overfitting!)
- Good features are uncorrelated and discriminative