Classification and application in Remote Sensing

2 Overview Introduction to the classification problem An application of classification in remote sensing: vegetation classification -band selection -multi-class classification

3 Introduction Make a program that automatically recognizes handwritten digits:

4 Introduction The classification problem: from raw data to decisions; learn from examples and generalize. Given: training examples (x, f(x)) for some unknown function f. Find: a good approximation to f.
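A minimal sketch of this "learn from examples and generalize" loop, on handwritten digits as in the previous slide. The toolkit and classifier choice (scikit-learn, k-nearest neighbours) are assumptions, not the presentation's:

```python
# Sketch: learn f from examples (x, f(x)) and check generalization on unseen data.
# scikit-learn and k-NN are illustrative choices; the slides name no toolkit.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_digits(return_X_y=True)          # x: 8x8 pixel images, y = f(x): digit 0-9
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)   # learn an approximation to f
print("accuracy on unseen examples:", clf.score(X_test, y_test))  # generalization check
```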

5 Examples Handwriting recognition -x: data from pen motion -f(x): letter of the alphabet Disease diagnosis -x: properties of patient (symptoms, lab tests) -f(x): disease (or perhaps a recommended therapy) Face recognition -x: bitmap picture of person's face -f(x): name of person Spam detection -x: message -f(x): spam or not spam

6 Steps for building a classifier -data acquisition / labeling (ground truth) -pre-processing -feature selection / feature extraction -classification (learning/testing) -post-processing -decision

7 Data acquisition Acquiring the data and labeling it; the data are sampled independently at random according to an unknown distribution P(x,y)

8 Pre-processing E.g. image processing: -histogram equalization -filtering -segmentation Data normalization

9 Pre-processing: example

10 Feature selection/extraction This is generally the most important step: conveying the information in the data to the classifier. The number of features -should be high: more information is better -should be low: curse of dimensionality Includes prior knowledge of the problem; in part manual, in part automatic

11 Feature selection/extraction User knowledge Automatic: -PCA: reduce the number of features by decorrelation -look which features give the best classification result
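A small sketch of the PCA route (scikit-learn assumed; data shape and component count are placeholders):

```python
# Sketch: reduce the feature count by decorrelation with PCA.
# X is any (n_samples, n_features) array; keeping 10 components is arbitrary.
import numpy as np
from sklearn.decomposition import PCA

X = np.random.rand(200, 50)        # placeholder data: 200 samples, 50 features
pca = PCA(n_components=10).fit(X)  # keep the 10 directions of largest variance
X_reduced = pca.transform(X)       # decorrelated, lower-dimensional features
print(pca.explained_variance_ratio_.sum())  # fraction of variance retained
```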

12 Feature extraction: example

13 Feature scatterplot [figure: scatterplot of feature 1 vs. feature 2 for three classes (K = 3): Class A, Class B, Class C]

14 Classification Learn from the features and generalize. The learning algorithm analyzes the examples and produces a classifier f. Given a new data point (x,y), the classifier is given x and predicts ŷ = f(x); the loss L(ŷ,y) is then measured. Goal of the learning algorithm: find the f that minimizes the expected loss.
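Written out (a standard formulation; the notation is assumed, not verbatim from the slide), the learning goal is

  f^{*} = \arg\min_{f} \; \mathbb{E}_{(x,y)\sim P(x,y)}\big[\,L(f(x),\,y)\,\big],

i.e. pick the classifier whose loss, averaged over the unknown data distribution, is smallest.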

15 Classification: Bayesian decision theory The fundamental statistical approach to the problem of pattern classification, assuming the decision problem is posed in probabilistic terms: using the posterior probability P(y|x), make the classification (maximum a posteriori, MAP, classification)
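In formulas (standard Bayes rule; the notation is assumed), the MAP classifier picks

  \hat{y} = \arg\max_{y} P(y \mid x) = \arg\max_{y} \frac{p(x \mid y)\,P(y)}{p(x)} = \arg\max_{y} \; p(x \mid y)\,P(y),

since the evidence p(x) does not depend on y.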

16 Classification: density estimation We need to estimate P(y) and p(x|y), the prior and class-conditional probability densities. Using only the data (density estimation) is often not feasible: too little data in too high-dimensional a space. Alternatives: -assume a simple parametric probability model (normal) -non-parametric methods -directly find a discriminant function
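A sketch of the first alternative, a simple parametric (normal) model: fit one Gaussian per class from the data and classify by MAP. Data shapes and class counts are placeholders:

```python
# Sketch: parametric density estimation (one Gaussian per class) + MAP classification.
# X: (n_samples, n_features) training data, y: integer class labels; synthetic here.
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

classes = np.unique(y)
priors = {c: np.mean(y == c) for c in classes}                      # estimate P(y)
models = {c: multivariate_normal(X[y == c].mean(axis=0),            # estimate p(x|y)
                                 np.cov(X[y == c], rowvar=False))
          for c in classes}

def map_classify(x):
    # pick the class maximizing p(x|y) * P(y)
    return max(classes, key=lambda c: models[c].pdf(x) * priors[c])

print(map_classify(np.array([2.5, 2.5])))
```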

17 Example

18 Example

19 Post-processing Include context -e.g. in images, signals Integrate multiple classifiers

20 Decision Minimize risk, considering the cost of misclassification: when unsure, select the class with the minimal cost of error.
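In standard (Duda–Hart) notation, which the slide does not spell out: with λ(α_i | y_j) the cost of taking action α_i when the true class is y_j, the conditional risk is

  R(\alpha_i \mid x) = \sum_{j} \lambda(\alpha_i \mid y_j)\, P(y_j \mid x),

and the decision rule picks the action minimizing R(α_i | x); a confident guess into a costly class can thus lose to a safer one.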

21 No free lunch theorem Don't wait for a "generic" best classifier: averaged over all possible problems, no classifier outperforms any other.

22 Applications in Remote Sensing

23 Remote sensing: acquisition Images are acquired from air or space.

24 Spectral response

25 Spectral response


27 Brugge, Westhoek Hyperspectral sensor: AISA Eagle (July 2004): resolution

28 Labeling

29 Labeling: spectral class means

30 Feature extraction Here: exploratory use. Automatically look for relevant features -which spectral bands (wavelengths) should be measured, and at which spectral resolution (width), for my application -results can be used for classification, sensor design, or interpretation

31 Feature extraction: band selection With a spectral response function:
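The formula itself did not survive the transcript. A common parameterization (an assumption here, not necessarily the presentation's) models band k of the sensor as the spectrum s(λ) weighted by a Gaussian spectral response φ_k centered at wavelength λ_k with width σ_k:

  b_k = \int \varphi_k(\lambda)\, s(\lambda)\, d\lambda, \qquad \varphi_k(\lambda) \propto \exp\!\left(-\frac{(\lambda-\lambda_k)^2}{2\sigma_k^2}\right),

so band selection becomes optimizing the centers λ_k and widths σ_k.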

32 Hypothetical 12-band sensor

33 Class distribution: Normal

34 Class separation criterion -two-class Bhattacharyya bound -multi-class criterion
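For two normal classes (the assumption of slide 33), the Bhattacharyya distance has the closed form

  B = \frac{1}{8}(\mu_2-\mu_1)^{T} \left[\frac{\Sigma_1+\Sigma_2}{2}\right]^{-1} (\mu_2-\mu_1) + \frac{1}{2}\ln \frac{\left|\frac{\Sigma_1+\Sigma_2}{2}\right|}{\sqrt{|\Sigma_1|\,|\Sigma_2|}},

which bounds the two-class Bayes error as P(\text{error}) \le \sqrt{P_1 P_2}\, e^{-B}. A multi-class criterion is commonly built by summing the pairwise bounds over all class pairs; the slide does not state the exact form used here.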

35 Optimization Minimize the criterion. Gradient descent is possible, but local minima prevent it from reaching good optima. Therefore, we use global optimization: simulated annealing.
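A generic simulated-annealing sketch for this minimization. `criterion` stands in for the multi-class separation criterion above (a dummy multimodal function here), and the proposal and cooling choices are illustrative, not the presentation's actual settings:

```python
# Sketch: simulated annealing over band parameters (e.g. centers/widths
# flattened into one vector). `criterion` is a dummy stand-in objective.
import math
import numpy as np

def criterion(params):
    return float(np.sum(np.sin(3 * params) + 0.1 * params ** 2))  # dummy multimodal objective

def simulated_annealing(x0, n_iter=5000, t0=1.0, cooling=0.999, step=0.1, seed=0):
    rng = np.random.default_rng(seed)
    x, fx, t = x0.copy(), criterion(x0), t0
    best, fbest = x.copy(), fx
    for _ in range(n_iter):
        cand = x + rng.normal(0, step, size=x.shape)   # random perturbation
        fc = criterion(cand)
        # accept downhill moves always, uphill moves with probability exp(-Δ/T):
        # this is what lets the search escape local minima
        if fc < fx or rng.random() < math.exp(-(fc - fx) / t):
            x, fx = cand, fc
            if fx < fbest:
                best, fbest = x.copy(), fx
        t *= cooling                                    # cool down gradually
    return best, fbest

print(simulated_annealing(np.zeros(24)))  # e.g. 12 bands x (center, width)
```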


42 Remote sensing: classification

43 Multi-class classification Linear multi-class classifier Combining binary classifiers -one against all: K-1 classifiers -one against one: K(K-1)/2 classifiers

44 Combining linear multi-class classifiers [figure: decision regions for classes A, B, C (K = 3) from pairwise classifiers AB, AC, BC]

45 Combining binary classifiers: maximum voting 4-class example; votes: class 1: 0, class 2: 2, class 3: 1, class 4: 3 (winner)
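A sketch of one-against-one maximum voting (scikit-learn assumed for the base classifiers; any binary classifier would do):

```python
# Sketch: one-against-one combination by maximum voting.
# Train K(K-1)/2 binary classifiers; each casts one vote per test point.
from itertools import combinations
import numpy as np
from sklearn.linear_model import LogisticRegression

def ovo_vote(X_train, y_train, X_test):
    classes = np.unique(y_train)
    votes = np.zeros((len(X_test), len(classes)))
    for a, b in combinations(range(len(classes)), 2):
        mask = np.isin(y_train, [classes[a], classes[b]])
        clf = LogisticRegression().fit(X_train[mask], y_train[mask])
        pred = clf.predict(X_test)
        for k, c in ((a, classes[a]), (b, classes[b])):
            votes[pred == c, k] += 1          # winner of each pairwise duel gets a vote
    return classes[np.argmax(votes, axis=1)]  # class with most votes wins
```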

46 Problem with max voting No probabilities, just class labels -hard classification Probabilities are useful for -spectral unmixing -post-processing

47 Combining binary classifiers: coupling probabilities Look for class probabilities p_i, with r_ij the probability of class ω_i under binary classifier i-j. There are K-1 free parameters and K(K-1)/2 constraints! Hastie and Tibshirani: find approximations -minimizing the Kullback–Leibler distance
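Concretely (from Hastie and Tibshirani's pairwise coupling paper, 1998): the pairwise outputs are modeled as

  \mu_{ij} = \frac{p_i}{p_i + p_j},

and, since the K(K-1)/2 observed r_{ij} generally cannot all be matched with only K-1 free p_i, one finds the p minimizing the weighted Kullback–Leibler divergence

  \ell(p) = \sum_{i<j} n_{ij} \left[ r_{ij} \log\frac{r_{ij}}{\mu_{ij}} + (1-r_{ij}) \log\frac{1-r_{ij}}{1-\mu_{ij}} \right],

where n_{ij} is the number of training examples seen by the i-j classifier; this minimization has a simple iterative solution.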

48 Classification result

49 Single-pixel classes: not wanted

50 Remote sensing: post-processing Use contextual information to "adjust" the classification: look at the classes and probabilities of neighboring pixels and, if necessary, adjust the pixel's class.
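A minimal sketch of such contextual smoothing: a 3x3 majority filter on the label image. (The presentation's actual rule, which also consults the class probabilities, is more refined than this.)

```python
# Sketch: 3x3 majority filter over a label image. A pixel is relabeled to the
# most frequent class among its neighbors; isolated single-pixel classes vanish.
import numpy as np

def majority_filter(labels):
    out = labels.copy()
    h, w = labels.shape
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            window = labels[i - 1:i + 2, j - 1:j + 2].ravel()
            vals, counts = np.unique(window, return_counts=True)
            out[i, j] = vals[np.argmax(counts)]   # most frequent label in the 3x3 window
    return out
```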

51 Post-processed classification result

52 Pixel mixing [figure: a mixed pixel composed of sand, moss, dry grass, and green grass]

53 Pixel mixing

54 Unmixing [figure: abundance maps for the endmembers sand, moss, sparse moss, grass, sparse grass, marram, sparse marram]
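A sketch of linear unmixing with these endmembers: each pixel spectrum is modeled as a non-negative combination of endmember spectra, solved here with non-negative least squares (scipy assumed; a fully constrained version would also force the abundances to sum to one, which this sketch only approximates by normalizing afterwards):

```python
# Sketch: linear spectral unmixing. Columns of E are endmember spectra
# (sand, moss, grass, marram, ...); x is one pixel's spectrum.
# Abundances a >= 0 solve min ||E a - x||; placeholder synthetic data below.
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(0)
E = rng.random((100, 4))             # 100 bands, 4 endmembers
a_true = np.array([0.5, 0.3, 0.2, 0.0])
x = E @ a_true                       # synthetic mixed pixel

a, residual = nnls(E, x)             # non-negative abundance estimates
print(a / a.sum())                   # normalize so abundances sum to one
```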

55 The End