ADAPTIVE HIERARCHICAL CLASSIFICATION WITH LIMITED TRAINING DATA Dissertation Defense of Joseph Troy Morgan. Committee: Dr. Melba Crawford, Dr. J. Wesley Barnes, Dr. Joydeep Ghosh, Dr. John Hasenbein, Dr. Elmira Popova

Overview Introduction Motivation for research Limited quantity of training data Limited quality of training data Output space precision Research contributions

Introduction: classification Assigning labels (L_i) to data (x) Typically use “supervised” methods when possible “Label-specific” probability distributions Predefined vs. unknown labels Mapping the input space into a feature space
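As a concrete illustration of the “label-specific probability distributions” idea, here is a minimal sketch that fits one Gaussian per label and assigns a sample to the label with the highest class-conditional density; the function names are illustrative, not from the dissertation.

```python
import numpy as np
from scipy.stats import multivariate_normal

def fit_class_gaussians(X, y):
    """Estimate a class-conditional Gaussian (mean, covariance) per label L_i."""
    return {label: (X[y == label].mean(axis=0),
                    np.cov(X[y == label], rowvar=False))
            for label in np.unique(y)}

def classify(x, params):
    """Assign x the label whose class-conditional density is highest."""
    return max(params,
               key=lambda L: multivariate_normal.pdf(x, mean=params[L][0],
                                                     cov=params[L][1]))
```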

Binary hierarchical classifier (BHC) framework C terminal nodes and (C-1) internal nodes Feature selection specific to each partition More natural and easier discriminations made first [Figure: BHC tree showing internal nodes and leaf nodes]
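The C-leaves / (C-1)-internal-nodes property follows from each internal node splitting its meta-class into exactly two children. A minimal sketch of that structure (the names are mine, not the dissertation's):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class BHCNode:
    classes: frozenset                  # meta-class (set of labels) at this node
    left: Optional["BHCNode"] = None    # one child meta-class
    right: Optional["BHCNode"] = None   # the other child meta-class

def count_nodes(node):
    """Return (leaves, internal); a tree over C classes yields (C, C - 1)."""
    if node.left is None and node.right is None:
        return 1, 0
    ll, li = count_nodes(node.left)
    rl, ri = count_nodes(node.right)
    return ll + rl, li + ri + 1
```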

Introduction: training/testing Bayesian approach to parameter estimation Labeled data used for both training and testing Sample selection is important Reported results may be “misleading” Real-world selection problems
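Because sample selection drives how “misleading” reported accuracies can be, a proportional (stratified) split is the usual safeguard. This is a generic sketch, not the sampling scheme used in the dissertation:

```python
import numpy as np

def stratified_split(X, y, train_frac=0.5, seed=0):
    """Draw the same fraction from every class so train and test mirror each other."""
    rng = np.random.default_rng(seed)
    train = np.zeros(len(y), dtype=bool)
    for label in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == label))
        train[idx[: int(len(idx) * train_frac)]] = True
    return X[train], y[train], X[~train], y[~train]
```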

Introduction: hyperspectral data

Overview Introduction Motivation for research Limited quantity of training data Limited quality of training data Output space precision Research contributions

Motivation for proposed research Robustness related to the training data Classification dependency on an adequate quantity of training data Dealing with training data that is of poor quality

Overview Introduction Motivation for research Limited quantity of training data Limited quality of training data Output space precision Research contributions

Limited quantity of training data Covariance matrix: d(d+1)/2 parameters in d dimensions Literature: observations ≥ 4 × dimensionality Previous work Parameter stabilization techniques Improving the ratio of training data to dimensionality Sub-sampling and combining schemes
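Written out, the per-class parameter budget behind that rule of thumb is (standard Gaussian bookkeeping, not a formula recovered from the slide):

```latex
n_{\text{params}} \;=\; \underbrace{d}_{\text{mean}} \;+\; \underbrace{\frac{d(d+1)}{2}}_{\text{covariance}},
\qquad n_{\text{obs per class}} \;\gtrsim\; 4d .
```

For example, at d = 200 hyperspectral bands the covariance alone has 20,100 free entries per class, which is why dimensionality reduction must precede estimation when training data are scarce.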

Land cover labels: Bolivar Peninsula

Adaptive Best-Basis BHC Feature space is dependent upon the quantity of data at each split in the BHC Set d = |X|/Threshold at each split Reduce dimensionality by merging highly correlated bands Use “ancestors” in the hierarchy to help generate the best basis

Adaptive BB-BHC algorithm Threshold and correlation measure: [equations not recovered from the original slide; see the sketch below]
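Since the slide's threshold and correlation formulas did not survive extraction, the sketch below only illustrates the stated idea: merge the most correlated adjacent bands until the dimensionality falls to d = |X|/Threshold. The plain correlation coefficient and the merge-by-averaging rule are my assumptions, not the dissertation's exact definitions.

```python
import numpy as np

def adaptive_best_basis(X, threshold=4):
    """Merge highly correlated adjacent bands until dim <= |X| / threshold.

    X: (n_samples, n_bands) data reaching one split of the BHC.
    Correlation measure and merge rule are assumptions.
    """
    bands = [X[:, j].astype(float) for j in range(X.shape[1])]
    target = max(1, len(X) // threshold)          # d = |X| / Threshold
    while len(bands) > target:
        corr = [abs(np.corrcoef(bands[j], bands[j + 1])[0, 1])
                for j in range(len(bands) - 1)]
        j = int(np.argmax(corr))                  # most redundant adjacent pair
        bands[j:j + 2] = [(bands[j] + bands[j + 1]) / 2.0]
    return np.column_stack(bands)
```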

Bolivar Peninsula Acquired Fall ’99 Pixel spatial resolution of 5 m Shoreline changes and sedimentary processes

Adaptive BB-BHC: Bolivar Adaptive BB-BHC retains a high level of accuracy Applicable even at the 75% rate Generally less variability across training data samples

Bolivar Peninsula TD-BHC classified images Sampling Percentage: BB vs Pseudo

Kennedy Space Center Acquired Spring ’96 Pixel resolution of 18 m Merritt Island National Wildlife Refuge > 1,000 plant and 500 animal species

Adaptive BB-BHC: Canaveral Adaptive BB-BHC performs better except at the 1.5% sampling rate, where the dimensionality reduction is too severe

KSC Images Sampling Percentage: BB vs Pseudo (example: TD-BHC)

Overview Introduction Motivation for research Limited quantity of training data Limited quality of training data Output space precision Research contributions

Limited quality of training data Training data not representative of the entire population Detrimental impact has not been demonstrated in the literature Potential solutions unexplored

“Misleading” accuracies Training and testing accuracies look great, but the transferred classifier performs poorly

Limited data problem Results from the combined data are very good (train/test experiments: 1 on 2, 2 on 2, 1,2 on 2)

Where is the problem? The distribution has shifted and the variance has changed

Parameter Updating Methodology Example: 5 classes identified from the “old” area “Reuse” knowledge acquired from previous data Assumptions: applicable/extendable class structure from the old area; projections will still work well at separating the meta-classes in the new area [Figure: BHC tree with splits S1–S4 over classes C1–C5]

Parameter Updating Methodology Compare the relative magnitude of the cluster means to the previous means to identify the “hidden” cluster meta-class labels The old projection from S1 will be used to separate the unlabeled [C1,C3] from [C2,C4,C5] The meta-class distributions will be updated based upon the pseudo-labeled cluster distributions Those pixels identified as [C1,C3] will then be separated based upon the old projection at S2 [Figure: same BHC tree with splits S1–S4 over classes C1–C5]
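A hedged sketch of the matching-and-updating step described above: clusters from the new area are matched to old meta-classes by mean proximity, and the old statistics are blended with the pseudo-labeled cluster statistics. The nearest-mean rule and the blending weight alpha are assumptions, not the dissertation's exact procedure.

```python
import numpy as np

def match_clusters(cluster_means, old_means):
    """Give each unlabeled cluster the old meta-class with the nearest mean."""
    return {c: min(old_means, key=lambda k: np.linalg.norm(mu - old_means[k]))
            for c, mu in cluster_means.items()}

def update_mean(old_mean, cluster_mean, alpha=0.5):
    """Blend old and pseudo-labeled estimates; alpha is an assumed weight."""
    return (1.0 - alpha) * np.asarray(old_mean) + alpha * np.asarray(cluster_mean)
```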

Parameter Updating Results Improved accuracies for Bolivar Peninsula Mixed results for KSC

Confusion Matrix vs Precision Tree

Overview Introduction Motivation for research Limited quantity of training data Limited quality of training data Output space precision Research contributions

Output Space Precision Precision may not be supportable for transferal Oak Hammock, Slash Pine, etc. vs. Trees Need to find an applicable “level” of classes Goal: provide tools for the researcher Use the BHC hierarchy Distance measure Ability to use multiple trees and sub-sampling due to the performance of the Adaptive BB-BHC “Purity” of label: classifier agreement

Separation of the Distributions Compare the old separation vs. the new separation Common distance measure: Bhattacharyya Best-basis approach for limited data quantity
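For reference, the standard Gaussian form of the Bhattacharyya distance between classes with means \mu_1, \mu_2 and covariances \Sigma_1, \Sigma_2 (the textbook definition, not a formula recovered from the slide):

```latex
B \;=\; \frac{1}{8}\,(\mu_1-\mu_2)^{\top}\,\Sigma^{-1}\,(\mu_1-\mu_2)
\;+\; \frac{1}{2}\,\ln\!\frac{\det\Sigma}{\sqrt{\det\Sigma_1\,\det\Sigma_2}},
\qquad \Sigma = \frac{\Sigma_1+\Sigma_2}{2}.
```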

Multiple BHCs TD and BU hierarchies Adaptive BB-BHC allows for sub-sampling Common “Master Tree” Proximity of classes in each BHC is used to build the distance matrix Greedily merge classes Voting method to combine the predictions of the multiple BHCs (sketched below)
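The final combination step is plain majority voting over the labels the individual BHCs assign to each pixel; this tiny sketch, including its first-seen tie-breaking, is illustrative rather than the dissertation's exact rule:

```python
from collections import Counter

def combine_votes(labels_per_classifier):
    """Majority vote over one pixel's labels from several BHCs."""
    return Counter(labels_per_classifier).most_common(1)[0][0]

# e.g. combine_votes(["Trees", "Trees", "Marsh"]) -> "Trees"
```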

Output Space Precision Results Improved accuracies for Bolivar over each individual transferred classifier

Output Space Precision Tools

Overview Introduction Motivation for research Limited quantity of training data Limited quality of training data Output space precision Research contributions

Adaptive Best-Basis BHC Information Recycling Output space scalability Compare “trees” from different samples Master-basis construction Tool for classifier transferal

Future research topics Feature selection Unsupervised clustering necessary Focus on class homogeneity rather than validation Investigate techniques developed in the signal processing community Build a “library” of spectral signatures

Bottom-Up (BU) BHC

Fisher’s linear discriminant Maximize the ratio of between-class (B) to within-class (W) covariance
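The criterion being maximized, in its textbook two-class form, where S_B and S_W are the between- and within-class scatter matrices:

```latex
J(\mathbf{w}) \;=\; \frac{\mathbf{w}^{\top} S_B\, \mathbf{w}}{\mathbf{w}^{\top} S_W\, \mathbf{w}},
\qquad \mathbf{w}^{*} \;\propto\; S_W^{-1}(\mu_1 - \mu_2).
```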

Output Space Precision Tools