Image Modeling & Segmentation Aly Farag and Asem Ali Lecture #2.


Intensity Model “Density Estimation”

3 Intensity Models  The histogram of the whole image represents the rate of occurrence of the gray levels over all the classes, i.e., the given (mixed) empirical density  Each class can be described by the histogram of the gray-level occurrences within that class  Intensity models describe the statistical characteristics of each class in the given image  The objective of the intensity model is to estimate the marginal density of each class from the mixed, normalized histogram of gray-level occurrences
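As a small illustration (a sketch assuming NumPy and an 8-bit grayscale image; the helper name normalized_histogram and the two-class toy data are assumptions, not part of the lecture), the mixed normalized histogram that the intensity model starts from can be computed as:

```python
import numpy as np

def normalized_histogram(image, levels=256):
    """Empirical density of gray levels: the mixture of all class marginals."""
    counts = np.bincount(image.ravel(), minlength=levels)
    return counts / counts.sum()

# Toy gray levels drawn from two classes (dark and bright), so the
# histogram is a mixture of the two class marginals.
rng = np.random.default_rng(0)
dark = rng.normal(60, 10, size=5000).astype(int)
bright = rng.normal(180, 15, size=5000).astype(int)
image = np.clip(np.concatenate([dark, bright]), 0, 255)
hist = normalized_histogram(image)
print(hist.sum())  # 1.0 -- a valid (discrete) density
```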

4 Intensity Models  Density estimation can be studied under two primary umbrellas:  Parametric methods and  Nonparametric methods.  Nonparametric methods take the strong stance of letting the data (e.g., pixels' gray levels) represent themselves.  One of the core methods on which nonparametric density estimation approaches are based is the k-nearest neighbors (k-NN) method. These approaches compute the probability at a sample by combining the memorized responses of the k nearest neighbors of this sample in the training data.  Nonparametric methods achieve a good estimate for any input distribution as more data are observed.  Flexible: they can fit almost any data well; no prior knowledge is required.  However, they often have a high computational cost and many parameters that need to be tuned.

5 Nonparametric methods 1  Based on the fact that the probability P that a given sample x falls within a region (window) R is given by P = ∫_R p(x′) dx′  This integral can be approximated either by the product of the value of p(x) with the area/volume V of the region, P ≈ p(x) V, or by the fraction of samples that fall within the region, P ≈ k/n:  if we observe a large number n of fish and count the k whose lengths fall within the range defined by R, then k/n can be used as an estimate of P as n→∞ 1 Chapter 4, Duda, R., Hart, P., Stork, D., Pattern Classification, 2nd ed., John Wiley & Sons.
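A tiny numerical check (a sketch assuming NumPy; the standard-normal density and the region R = [0, 1] are illustrative choices) that the fraction k/n converges to the true probability P:

```python
import math
import numpy as np

rng = np.random.default_rng(0)
a, b = 0.0, 1.0                      # region R = [0, 1]
# True probability that a standard-normal sample falls in [a, b]
p_true = 0.5 * (math.erf(b / math.sqrt(2)) - math.erf(a / math.sqrt(2)))

for n in (100, 10_000, 1_000_000):
    x = rng.standard_normal(n)       # n samples from the underlying density
    k = np.count_nonzero((x >= a) & (x <= b))
    print(n, k / n, "true:", round(p_true, 4))  # k/n -> P as n grows
```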

6 Nonparametric methods  In order to make sure that we get a good estimate of p(x) at each point, we need many data points (instances) for any given region R (or volume V). This can be done in two ways:  We can fix V and take more and more samples in this volume. Then k/n→P; however, we then estimate only a space-averaged value of p(x) over R, not p(x) itself, because p(x) can vary within any region of nonzero volume  Alternatively, we can fix n and let V→0, so that p(x) is constant in that region. However, in practice we have a finite number of training data, so as V→0 the volume eventually becomes so small that it contains no samples: k = 0 and the estimate p(x) = 0, a useless result!  Therefore, V→0 is not feasible; there will always be some variance in k/n and hence some averaging of p(x) within the finite, non-zero volume V. A compromise needs to be found for V so that  It is large enough to contain a sufficient number of samples  It is small enough to justify the assumption that p(x) is constant within the chosen volume/region.

7 Nonparametric methods To make sure that k/n is a good estimate of P and, consequently, that p_n(x*) is a good estimate of p(x*), the following conditions need to be satisfied as n→∞ (with V_n the volume used with n samples and k_n the number of samples falling inside it):  V_n → 0  k_n → ∞  k_n / n → 0

8 Nonparametric methods  There are two ways to ensure these conditions:  Shrink an initial volume V_n as a function of n, e.g., V_n = V_1/√n. Then k_n, the number of samples falling inside V_n, is determined from the training data: Parzen Windows (PW) density estimation  Specify k_n as a function of n, e.g., k_n = √n, and grow V_n until it encloses k_n samples. Then V_n is determined from the training data: K-Nearest Neighbor (KNN) estimation  It can be shown that as n→∞ both KNN and PW approach the true density p(x*), provided that V_n shrinks and k_n grows at appropriate rates with n. (A small sketch of the two strategies follows.)
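A minimal 1-D sketch of the two strategies (assuming NumPy; the choices h = 1/√n, k = √n, and the query point x = 0.5 are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.standard_normal(1000)     # training samples from the true density N(0, 1)
n = len(data)
x = 0.5                              # point where we estimate p(x)

# Parzen-window flavor: fix the volume (window width h), count samples inside.
h = 1.0 / np.sqrt(n)                 # e.g., shrink the window as n grows
k_parzen = np.count_nonzero(np.abs(data - x) <= h / 2)
p_parzen = k_parzen / (n * h)        # p(x) ~ k / (n V), with V = h in 1-D

# kNN flavor: fix k, grow the volume until it encloses k samples.
k = int(np.sqrt(n))                  # e.g., k_n = sqrt(n)
distances = np.sort(np.abs(data - x))
V = 2 * distances[k - 1]             # width of the smallest interval holding k samples
p_knn = k / (n * V)

print(p_parzen, p_knn)               # both near the true N(0,1) density at 0.5 (~0.352)
```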

9 Parzen Windows  The number of samples falling into the specified region is obtained with the help of a windowing function, hence the name Parzen windows. We first assume that R is a d-dimensional hypercube with sides of length h, whose volume is then V = h^d. Then define a window function φ(u), called a kernel function, with φ(u) = 1 if |u_j| ≤ 1/2 for j = 1,…,d and 0 otherwise, so that φ((x − x_i)/h) = 1 exactly when x_i falls inside the hypercube of side h centered at x. The number of samples k that fall into R is then k = Σ_i φ((x − x_i)/h), and the density estimate is p(x) = k/(nV) = (1/n) Σ_i (1/h^d) φ((x − x_i)/h)

10 Parzen Windows Example: Given this image, D = {1, 2, 5, 1, 1, 2, 3, 5, 5, 1.5, 1.5}. For h = 1, compute p(2.5) (a worked sketch follows below): d = V = n = k = p(2.5) =
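A worked sketch of this example (assuming NumPy; here d = 1, so V = h^d = 1, and the hypercube kernel counts the samples within h/2 of the query point):

```python
import numpy as np

D = np.array([1, 2, 5, 1, 1, 2, 3, 5, 5, 1.5, 1.5])  # gray levels of the toy image
h, x = 1.0, 2.5
n, d = len(D), 1
V = h ** d

# Hypercube kernel: phi((x - xi)/h) = 1 when |x - xi| <= h/2, else 0
k = np.count_nonzero(np.abs(x - D) <= h / 2)
p = k / (n * V)
print(n, k, p)   # n = 11, k = 3 (the samples 2, 2, 3), p(2.5) = 3/11 ≈ 0.27
```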

11 Parzen Windows  Now consider ϕ(·) as a general function, typically smooth and continuous, instead of a hypercube. The general expression of p(x) remains unchanged  Then p(x) is an interpolation of ϕ(·) terms, where each ϕ(·) measures how far a given x_i is from x  In practice, the x_i are the training data points, and we estimate p(x) by interpolating the contributions of each sample point x_i based on its distance from x, the point at which we want to estimate the density. The kernel function ϕ(·) turns this distance into a numerical contribution  If ϕ(·) is itself a distribution, then the estimate p_n(x) converges to the true p(x) as n increases. A typical choice for ϕ(·) is the Gaussian  The density p(x) is then estimated simply by a superposition of Gaussians, each centered at a training data instance. The parameter h then plays the role of the width (standard deviation) of each Gaussian
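A minimal sketch of the Gaussian-kernel Parzen estimate (assuming NumPy; the helper name parzen_gaussian is illustrative, and h is used as the standard deviation of each Gaussian bump):

```python
import numpy as np

def parzen_gaussian(x, samples, h):
    """Parzen-window estimate of p(x): average of Gaussians of width h
    centered at each training sample."""
    x = np.atleast_1d(x)[:, None]          # query points as a column
    u = (x - samples[None, :]) / h         # scaled distance to every sample
    bumps = np.exp(-0.5 * u ** 2) / (h * np.sqrt(2 * np.pi))
    return bumps.mean(axis=1)              # superposition, normalized by n

# Usage: estimate the density of the toy image D at a few gray levels
D = np.array([1, 2, 5, 1, 1, 2, 3, 5, 5, 1.5, 1.5])
print(parzen_gaussian([1.0, 2.5, 5.0], D, h=0.5))
```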

12 Image Modeling Homework #2 (due Sept. 1st): Assume you have n samples drawn from the normal distribution N(0,1). Use PW with a Gaussian kernel to estimate this distribution. Try different window widths and numbers of samples, i.e., try to generate a figure similar to the one shown on the slide.
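One possible structure for the experiment (a sketch only, assuming NumPy and Matplotlib and reusing the parzen_gaussian helper sketched above; the grid range and the particular values of h and n are arbitrary choices):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
grid = np.linspace(-4, 4, 200)
true_pdf = np.exp(-0.5 * grid ** 2) / np.sqrt(2 * np.pi)   # true N(0,1) density

fig, axes = plt.subplots(3, 3, figsize=(9, 9), sharex=True, sharey=True)
for row, n in enumerate((10, 100, 1000)):          # numbers of samples
    samples = rng.standard_normal(n)
    for col, h in enumerate((1.0, 0.5, 0.1)):      # window widths
        ax = axes[row, col]
        ax.plot(grid, parzen_gaussian(grid, samples, h), label="estimate")
        ax.plot(grid, true_pdf, linestyle="--", label="true N(0,1)")
        ax.set_title(f"n={n}, h={h}")
axes[0, 0].legend()
plt.tight_layout()
plt.show()
```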

13 Classification using Parzen Windows  In classifiers based on Parzen-window estimation, we estimate the density for each class and classify a test point by the label corresponding to the maximum posterior.  Example: a small set of labeled one-dimensional samples from two classes, c1 and c2 (e.g., 10 → c1, 20 → c1, …); a sketch follows below.
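A minimal sketch of such a classifier (assuming NumPy and reusing the parzen_gaussian helper sketched above; the labeled training values and the window width h below are illustrative assumptions, not the slide's actual example):

```python
import numpy as np

# Hypothetical labeled training data (gray level -> class); values are illustrative.
train = {
    "c1": np.array([10.0, 15.0, 20.0]),
    "c2": np.array([55.0, 60.0, 70.0]),
}
h = 5.0                                    # window width (assumed)

def classify(x, train, h):
    """Pick the class with the largest posterior ~ prior * Parzen likelihood."""
    n_total = sum(len(v) for v in train.values())
    best_label, best_score = None, -np.inf
    for label, samples in train.items():
        prior = len(samples) / n_total                     # P(c) from class frequencies
        likelihood = parzen_gaussian(x, samples, h)[0]     # p(x | c) via Parzen window
        score = prior * likelihood
        if score > best_score:
            best_label, best_score = label, score
    return best_label

print(classify(18.0, train, h))   # expected: c1
print(classify(62.0, train, h))   # expected: c2
```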