Probability and Statistics in Vision

Probability
Objects not all the same
– Many possible shapes for people, cars, …
– Skin has different colors
Measurements not all the same
– Noise
But some are more probable than others
– Green skin not likely

Probability and Statistics
Approach: probability distribution of expected objects, expected observations
Perform mid- to high-level vision tasks by finding the most likely model consistent with the actual observations
Often we don't know the probability distributions – learn them from statistics of training data

Concrete Example – Skin Color
Suppose you want to find pixels with the color of skin
Step 1: learn the likely distribution of skin colors from (possibly hand-labeled) training data
[Figure: learned probability distribution over color]
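As a concrete illustration of Step 1, here is a minimal sketch that estimates p(color|skin) as a normalized color histogram from hand-labeled skin pixels. The function name, bin count, and the random stand-in data are illustrative assumptions, not part of the slides.

```python
import numpy as np

def learn_color_histogram(skin_pixels, bins=32):
    """Estimate p(color | skin) as a normalized histogram over quantized RGB.

    skin_pixels: (N, 3) array of RGB values in [0, 255] labeled as skin.
    """
    hist, edges = np.histogramdd(
        skin_pixels,
        bins=(bins, bins, bins),
        range=((0, 256), (0, 256), (0, 256)))
    hist /= hist.sum()                  # normalize so the histogram sums to 1
    return hist, edges

# Random stand-in data; real training data would be hand-labeled skin pixels.
skin_pixels = np.random.randint(0, 256, size=(1000, 3))
p_color_given_skin, bin_edges = learn_color_histogram(skin_pixels)
```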

Conditional Probability
This is the probability of observing a given color given that the pixel is skin
Conditional probability: p(color|skin)

Skin Color Identification
Step 2: given a new image, we want to decide whether each pixel corresponds to skin
Classification rule (maximum a posteriori): pixel is skin iff p(skin|color) > p(not skin|color)
But this requires knowing p(skin|color), and we only have p(color|skin)

Bayes’s Rule
“Inverting” a conditional probability: p(B|A) = p(A|B) × p(B) / p(A)
Therefore, p(skin|color) = p(color|skin) × p(skin) / p(color)
p(skin) is the prior – knowledge of the domain
p(skin|color) is the posterior – what we want
p(color) is a normalization term
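A minimal sketch of this inversion for the skin example, assuming the class-conditional histograms p_color_given_skin and p_color_given_notskin (and the prior p_skin) have been learned as above; the helper name and default prior value are hypothetical.

```python
import numpy as np

def posterior_skin(p_color_given_skin, p_color_given_notskin, p_skin=0.1):
    """Return p(skin | color) for every color bin via Bayes' rule."""
    p_notskin = 1.0 - p_skin
    # p(color) is the normalization term: sum over both classes
    p_color = p_color_given_skin * p_skin + p_color_given_notskin * p_notskin
    with np.errstate(invalid="ignore", divide="ignore"):
        post = (p_color_given_skin * p_skin) / p_color
    return np.nan_to_num(post)          # color bins never seen in training -> 0

# Decision rule from the previous slide: a pixel is skin iff
# p(skin|color) > p(not skin|color), i.e. iff the posterior exceeds 0.5.
```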

Priors
p(skin) = prior
– Estimate from training data
– Tunes the “sensitivity” of the skin detector
– Can incorporate even more information: e.g., are skin pixels more likely to be found in certain regions of the image?
With more than one class, priors encode which classes are more likely

Skin Detection Results
[Figure: skin detection results, Jones & Rehg]

Skin Color-Based Face Tracking
[Figure: face tracking results, Birchfield]

Mixture Models
Although single-class models are useful, the real fun is in multiple-class models
p(observation) = Σ_class π_class p_class(observation)
Interpretation: the object has some probability π_class of belonging to each class
Probability of a measurement is a linear combination of the models for the different classes
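The mixture density written as code, with Gaussian components as an assumption (any component density would do); the function name is hypothetical.

```python
import numpy as np
from scipy.stats import multivariate_normal

def mixture_density(x, weights, means, covs):
    """p(x) = sum_class pi_class * p_class(x).

    weights: (K,) mixing weights, means: (K, D), covs: (K, D, D).
    """
    return sum(w * multivariate_normal.pdf(x, mean=m, cov=c)
               for w, m, c in zip(weights, means, covs))
```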

Gaussian Mixture Model
Simplest model for each probability distribution: the Gaussian
– Symmetric: [equation/figure shown on slide]
– Asymmetric: [equation/figure shown on slide]
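A small sketch contrasting the two forms named on the slide, interpreting “symmetric” as a spherical Gaussian (a single variance) and “asymmetric” as one with a full covariance matrix; the numerical values are arbitrary.

```python
import numpy as np
from scipy.stats import multivariate_normal

x = np.array([1.0, 2.0])
mu = np.zeros(2)

# Symmetric (spherical): one variance, circular level sets
p_symmetric = multivariate_normal.pdf(x, mean=mu, cov=0.5 * np.eye(2))

# Asymmetric: full covariance matrix, elongated / tilted level sets
cov = np.array([[2.0, 0.8],
                [0.8, 0.5]])
p_asymmetric = multivariate_normal.pdf(x, mean=mu, cov=cov)
```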

Application: Segmentation
Consider the k-means algorithm
What if we did a “soft” (probabilistic) assignment of points to clusters?
– Each point has a probability α_j of belonging to cluster j
– Each cluster has a probability distribution, e.g. a Gaussian with mean μ and covariance Σ

“Probabilistic k-means”
Changes to the clustering algorithm:
– Use Gaussian probabilities to assign point → cluster weights α_{p,j}
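A sketch of this soft-assignment (“E”) step for a Gaussian mixture, computing the weight α[p, j] of point p for cluster j; the function name and array shapes are assumptions.

```python
import numpy as np
from scipy.stats import multivariate_normal

def e_step(points, weights, means, covs):
    """points: (N, D); returns soft-assignment weights alpha of shape (N, K)."""
    K = len(weights)
    alpha = np.column_stack([
        weights[j] * multivariate_normal.pdf(points, mean=means[j], cov=covs[j])
        for j in range(K)])
    alpha /= alpha.sum(axis=1, keepdims=True)   # normalize per point
    return alpha
```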

“Probabilistic k-means”
Changes to the clustering algorithm:
– Use α_{p,j} to compute a weighted average and covariance for each cluster
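And a sketch of the corresponding update (“M”) step, using α[p, j] as weights for each cluster’s mean and covariance; the small ridge added to each covariance is a numerical-stability assumption, not from the slides.

```python
import numpy as np

def m_step(points, alpha):
    """points: (N, D), alpha: (N, K); returns updated (weights, means, covs)."""
    N, D = points.shape
    Nk = alpha.sum(axis=0)                       # effective points per cluster
    weights = Nk / N
    means = (alpha.T @ points) / Nk[:, None]     # weighted averages
    covs = []
    for j in range(alpha.shape[1]):
        diff = points - means[j]
        cov = (alpha[:, j, None] * diff).T @ diff / Nk[j]
        covs.append(cov + 1e-6 * np.eye(D))      # tiny ridge for stability
    return weights, means, np.array(covs)
```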

Expectation Maximization
This is a special case of the expectation maximization (EM) algorithm
General case: the “missing data” framework
– Have known data (feature vectors) and unknown data (assignment of points to clusters)
– E step: use the known data and the current estimate of the model to estimate the unknown data
– M step: use the current estimate of the complete data to solve for the optimal model
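Putting the two steps together into a minimal EM loop for the Gaussian mixture, reusing the e_step and m_step sketches above; the random initialization and fixed iteration count are arbitrary choices for illustration.

```python
import numpy as np

def fit_gmm(points, K, n_iter=50, seed=0):
    """Fit a K-component Gaussian mixture to points of shape (N, D) with EM."""
    rng = np.random.default_rng(seed)
    N, D = points.shape
    means = points[rng.choice(N, size=K, replace=False)]     # random init
    covs = np.array([np.cov(points.T) + 1e-6 * np.eye(D)] * K)
    weights = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        alpha = e_step(points, weights, means, covs)          # E step
        weights, means, covs = m_step(points, alpha)          # M step
    alpha = e_step(points, weights, means, covs)              # final assignments
    return weights, means, covs, alpha
```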

EM Example Bregler

EM and Robustness
One example of using the generalized EM framework: robustness
Make one category correspond to “outliers”
– Use a noise model if one is known
– If not, assume e.g. uniform noise
– Do not update the outlier parameters in the M step
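One way to realize this, sketched under the uniform-noise assumption: add an extra “outlier” column whose density is constant over the data’s bounding box and whose parameters are never touched in the M step. The mixing weight given to the outlier class is an illustrative choice.

```python
import numpy as np
from scipy.stats import multivariate_normal

def e_step_robust(points, weights, means, covs, outlier_weight=0.1):
    """E step with an extra, fixed uniform 'outlier' component."""
    # Uniform noise density over the bounding box of the data (an assumption)
    volume = np.prod(points.max(axis=0) - points.min(axis=0))
    noise_density = outlier_weight / volume
    cols = [(1.0 - outlier_weight) * weights[j] *
            multivariate_normal.pdf(points, mean=means[j], cov=covs[j])
            for j in range(len(weights))]
    cols.append(np.full(len(points), noise_density))   # outlier column, fixed
    alpha = np.column_stack(cols)
    alpha /= alpha.sum(axis=1, keepdims=True)
    return alpha   # last column: probability that each point is an outlier
```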

Example: Using EM to Fit to Lines
[Figure: good data]

Example: Using EM to Fit to Lines
[Figure: with an outlier]

Example: Using EM to Fit to Lines
[Figure: EM fit; weights of “line” (vs. “noise”)]

Example: Using EM to Fit to Lines
[Figure: EM fit converging to a bad local minimum; weights of “line” (vs. “noise”)]

Example: Using EM to Fit to Lines
[Figure: fitting to multiple lines]
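A rough sketch of the kind of fit shown in these figures: EM for K lines y = a_j x + b_j, where the E step softly assigns points to lines from their residuals and the M step refits each line by weighted least squares. The noise level, initialization, and iteration count are all illustrative assumptions.

```python
import numpy as np

def em_lines(x, y, K=2, sigma=1.0, n_iter=30, seed=0):
    """Fit K lines y = a[j] * x + b[j] to 1-D data (x, y) with EM."""
    rng = np.random.default_rng(seed)
    a, b = rng.normal(size=K), rng.normal(size=K)
    for _ in range(n_iter):
        # E step: weight of point i for line j from a Gaussian residual model
        resid = y[:, None] - (x[:, None] * a[None, :] + b[None, :])
        w = np.exp(-0.5 * (resid / sigma) ** 2)
        w /= w.sum(axis=1, keepdims=True)
        # M step: weighted least squares refit of each line
        for j in range(K):
            sw = np.sqrt(w[:, j])
            A = np.column_stack([x, np.ones_like(x)]) * sw[:, None]
            a[j], b[j] = np.linalg.lstsq(A, y * sw, rcond=None)[0]
    return a, b, w
```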

Example: Using EM to Fit to Lines
[Figure: local minima]

Eliminating Local Minima
Re-run with multiple starting conditions
Evaluate the results based on
– Number of points assigned to each (non-noise) group
– Variance of each group
– How many starting positions converge to each local maximum
With many starting positions, can accommodate many outliers
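A sketch of the restart strategy, reusing the hypothetical fit_gmm and mixture_density helpers above and keeping the run with the highest data log-likelihood; the other criteria from this slide (group sizes, variances, basin counts) could be added as extra checks.

```python
import numpy as np

def best_of_restarts(points, K, n_restarts=10):
    """Run EM from several random initializations; keep the best fit."""
    best, best_ll = None, -np.inf
    for seed in range(n_restarts):
        weights, means, covs, alpha = fit_gmm(points, K, seed=seed)
        ll = np.sum(np.log(mixture_density(points, weights, means, covs)))
        if ll > best_ll:
            best, best_ll = (weights, means, covs), ll
    return best, best_ll
```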

Selecting Number of Clusters
Re-run with different numbers of clusters, look at the total error
Will often see a “knee” in the curve
[Figure: error vs. number of clusters]
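The model-selection heuristic as a sketch: fit for several values of k and record an error measure (negative log-likelihood here, an assumption), then plot it against k and look for the knee. Reuses the fit_gmm and mixture_density sketches above.

```python
import numpy as np

def error_vs_k(points, k_values=range(1, 8)):
    """Return (k values, errors) so the curve can be plotted and inspected."""
    errors = []
    for k in k_values:
        weights, means, covs, _ = fit_gmm(points, k)
        nll = -np.sum(np.log(mixture_density(points, weights, means, covs)))
        errors.append(nll)
    return list(k_values), errors
```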

Overfitting
Why not use many clusters and get low error?
Complex models are bad at filtering noise (with k clusters you can fit k data points exactly)
Complex models have less predictive power
Occam’s razor: entia non multiplicanda sunt praeter necessitatem (“Things should not be multiplied beyond necessity”)

Training / Test Data
One way to see if you have overfitting problems:
– Divide your data into two sets
– Use the first set (the “training set”) to train your model
– Compute the error of the model on the second set of data (the “test set”)
– If the test error is not comparable to the training error, you have overfitting
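A minimal sketch of this check for the mixture models above: hold out part of the data, fit on the rest, and compare the per-point log-likelihoods of the two sets; the split fraction and helper names are assumptions.

```python
import numpy as np

def check_overfitting(points, K, train_frac=0.7, seed=0):
    """Compare training vs. held-out log-likelihood for a K-component fit."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(points))
    n_train = int(train_frac * len(points))
    train, test = points[idx[:n_train]], points[idx[n_train:]]
    weights, means, covs, _ = fit_gmm(train, K)
    ll_train = np.mean(np.log(mixture_density(train, weights, means, covs)))
    ll_test = np.mean(np.log(mixture_density(test, weights, means, covs)))
    return ll_train, ll_test    # a much lower test value suggests overfitting
```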