Jeremy Bolton, Seniha Yuksel, Paul Gader

Presentation transcript:

Multiple Instance Hidden Markov Model: Application to Landmine Detection in GPR Data. Jeremy Bolton, Seniha Yuksel, Paul Gader. CSI Laboratory, University of Florida.

Highlights: Hidden Markov Models (HMMs) are useful tools for landmine detection in GPR imagery. Explicitly incorporating the Multiple Instance Learning (MIL) paradigm in HMM learning is intuitive and effective. Classification performance is improved when using the MI-HMM over a standard HMM. Results further support the idea that explicitly accounting for the MI scenario may lead to improved learning under class label uncertainty.

Outline: HMMs for landmine detection in GPR (data, feature extraction, training); MIL scenario; MI-HMM; classification results.

HMMs for landmine detection

GPR Data: GPR data form a 3-D image cube with axes Dt (down-track), xt (cross-track), and depth. Subsurface objects are observed as hyperbolas.

GPR Data Feature Extraction: Many features extracted from GPR data measure the occurrence of an “edge”. For the typical HMM algorithm (Gader et al.), preprocessing techniques are used to emphasize edges. Image morphology and structuring elements can be used to extract edges. [Figure panels: image, preprocessed, edge extraction]

4-d Edge Features Edge Extraction

Concept behind the HMM for GPR: Using the extracted features (an observation sequence obtained when scanning from left to right in an image), we will attempt to estimate some hidden states.
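As an illustration of estimating hidden states from an observation sequence, the following is a minimal sketch of Viterbi decoding for a discrete-observation HMM. The slides do not say which decoding procedure is used, so Viterbi is an assumption here, and the two-state toy parameters ("background" and "edge") are hypothetical values chosen only to make the example runnable.

```python
import math


def safe_log(p):
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(obs, init, trans, emit):
    """Most likely hidden-state path for a discrete-observation HMM.

    obs:   list of observation symbol indices
    init:  init[s]     = P(first state = s)
    trans: trans[s][t] = P(next state = t | current state = s)
    emit:  emit[s][o]  = P(symbol o | state s)
    """
    n_states = len(init)
    # score[s] = best log-probability of any state path ending in state s
    score = [safe_log(init[s]) + safe_log(emit[s][obs[0]]) for s in range(n_states)]
    back = []  # back[k][t] = best predecessor of state t at step k + 1
    for o in obs[1:]:
        prev = score
        pointers, score = [], []
        for t in range(n_states):
            best = max(range(n_states), key=lambda s: prev[s] + safe_log(trans[s][t]))
            pointers.append(best)
            score.append(prev[best] + safe_log(trans[best][t]) + safe_log(emit[t][o]))
        back.append(pointers)
    # Trace the best path backwards from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Hypothetical toy model: state 0 = background, state 1 = edge.
init = [0.9, 0.1]
trans = [[0.8, 0.2], [0.3, 0.7]]
emit = [[0.9, 0.1], [0.2, 0.8]]
print(viterbi([0, 0, 1, 1, 0], init, trans, emit))
```

Working in log-probabilities avoids numerical underflow on long scan sequences.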

Concept behind the HMM for GPR

HMM Features: [Figure: current AIM viewer by Smock; panels: image, feature image, rising edge feature, falling edge feature]

Sampling HMM Summary
Feature calculation
  Dimensions: HMMSamp: 2d (it is not always relevant whether a positive or a negative diagonal is observed, only that a diagonal is observed)
  Down-sampling depth: HMMSamp: 4
HMM models
  Number of states: HMMSamp: 4
  Gaussian components per state (fewer total components for probability calculation): HMMSamp: 1 (recent observation)

Training the HMM: Xuping Zhang proposed a Gibbs sampling algorithm for HMM learning. But given one or more images, how do we choose the training sequences? Which sequence(s) do we choose from each image? Many image analysis settings have an inherent problem of class label uncertainty per sequence: each image has a class label, but each image contains multiple instances (samples or sequences). Which of them are truly indicative of the target? With standard training techniques, this translates to identifying the optimal training set within a set of sequences. If an image has N sequences, this is a search over 2^N possible subsets.

Training Sample Selection Heuristic: Currently, an MRF approach (Collins et al.) is used to bound the search to a localized area within the image, rather than searching all sequences within the image. This reduces the search space, but the multiple instance problem still exists.

Multiple Instance Learning

Standard Learning vs. Multiple Instance Learning. Standard supervised learning: optimize some model (or learn a target concept) given training samples and corresponding labels. MIL: learn a target concept given multiple sets of samples and corresponding labels for the sets. Interpretation: learning with uncertain labels (a noisy teacher).

Multiple Instance Learning (MIL). Given: a set of I bags, each labeled + or -. The i-th bag is a set of J_i samples in some feature space. Interpretation of labels: a bag is positive if at least one of its samples exhibits the target concept, and negative if none does. Goal: learn the concept, that is, the characteristic that is common to the positive bags and not observed in the negative bags.

Standard learning doesn’t always fit: GPR Example. Each training sample (feature vector) must have a label. But which ones, and how many, compose the optimal training set? This is an arduous task: there are many feature vectors per image and multiple images; they are difficult to label given GPR echoes, ground truthing errors, etc.; and the label of each vector may not be known. [Figure: EHD feature vector] Is it easy here to label every depth bin as mine or non-mine? So WHICH one(s) do we present to the learning algorithm?

Learning from Bags: In MIL, a label is attached to a set of samples. A bag is a set of samples, and a sample within a bag is called an instance. A bag is labeled positive if and only if at least one of its instances is positive. [Figure: positive bags and negative bags; each bag is an image]

MI Learning: GPR Example. Multiple instance learning: each training bag must have a label. There is no need to label all feature vectors; just identify the images (bags) where targets are present. This implicitly accounts for class label uncertainty. [Figure: EHD feature vector] After producing multiple sets for multiple GPR images, the multiple instance learner will (1) identify the commonalities (common patterns) shared by the positive bags that are not observed in the negative bags, that is, learn the target concept; and (2) given the chosen classifier or model, aid in the optimization of its parameters. Some supervised, semi-supervised, or active learning methods attempt to assign labels to all training samples, with an expert aiding, some criterion satisfied, or some objective optimized. With multiple instance learning, we say, FORGET ABOUT IT. The multiple instance learner will figure it out.
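The bag-labeling rule can be written in a few lines. This is only a sketch of the definition, with hypothetical names; in practice the instance labels are unknown, which is the whole point of MIL, so the function describes how bag labels relate to instance labels rather than something a learner can evaluate directly.

```python
def bag_label(instance_labels):
    """Return True (positive bag) if and only if at least one instance is positive.

    instance_labels: iterable of booleans, one per instance in the bag.
    """
    return any(instance_labels)


# A mine image: most depth bins are background, one contains the target.
print(bag_label([False, False, True, False]))
# A false-alarm image: no instance exhibits the target concept.
print(bag_label([False, False, False]))
```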

Multiple Instance Learning HMM: MI-HMM

MI-HMM: In MI-HMM, instances are sequences. [Figure: negative bags and positive bags of sequences, with the direction of movement indicated] Learning sequences can be applied to GPR as well!

MI-HMM: Assume independence between the bags and the noisy-OR (Pearl) relationship between the sequences within each bag. [Slide showing equations]

MI-HMM learning: Due to the cumbersome nature of the noisy-OR, the parameters of the HMM are learned using Metropolis-Hastings sampling.

Sampling: HMM parameters are sampled from Dirichlet distributions. A new state is accepted or rejected at iteration t + 1 based on the ratio r, where P is the noisy-OR model.

Discrete Observations: Note that since we have chosen a Metropolis-Hastings sampling scheme using Dirichlets, our observations must be discretized.

MI-HMM Summary
Feature calculation
  Dimensions: HMMSamp: 2d; MI-HMM: 2d features are discretized into 16 symbols
  Down-sampling depth: HMMSamp: 4; MI-HMM: 4
HMM models
  Number of states: HMMSamp: 4
  Components per state (fewer total components for probability calculation): HMMSamp: 1 Gaussian; MI-HMM: discrete mixture over 16 symbols
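The slide's own equations are not part of this transcript. For orientation, the standard form of a noisy-OR bag likelihood under independent bags is sketched below; the notation is an assumption and may differ from the slide's. Here B_i is bag i with sequences x_{i1}, ..., x_{iJ_i}, \lambda denotes the HMM parameters, and p_{ij} is the probability, under \lambda, that sequence x_{ij} is a target sequence. How p_{ij} is obtained from the HMM is not specified here.

```latex
% Noisy-OR within a bag: the bag is positive unless every sequence fails to be a target.
P(B_i \text{ is positive} \mid \lambda) = 1 - \prod_{j=1}^{J_i} \bigl(1 - p_{ij}\bigr)

% Independence between bags: the joint label likelihood is a product over bags.
P(\text{all bag labels} \mid \lambda)
  = \prod_{i \in \text{pos}} \Bigl[\, 1 - \prod_{j=1}^{J_i} (1 - p_{ij}) \Bigr]
    \prod_{i \in \text{neg}} \; \prod_{j=1}^{J_i} (1 - p_{ij})
```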
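A minimal sketch of the two ingredients named on this slide: drawing a probability vector from a Dirichlet distribution and the Metropolis-Hastings accept/reject decision. The exact ratio r from the slide is not in the transcript, so the generic Metropolis-Hastings ratio is used here as an assumption. The function names and the `log_q_correction` parameter are hypothetical; the correction is log q(current | proposed) minus log q(proposed | current) and is zero only for a symmetric proposal.

```python
import math
import random


def sample_dirichlet(alpha, rng):
    """Draw from Dirichlet(alpha) by normalizing independent Gamma(alpha_k, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def mh_accept(log_p_current, log_p_proposed, rng, log_q_correction=0.0):
    """Metropolis-Hastings decision: accept with probability min(1, r).

    log r = log P(proposed) - log P(current) + log_q_correction,
    where P would be the (unnormalized) target, e.g. a noisy-OR bag likelihood.
    """
    log_r = log_p_proposed - log_p_current + log_q_correction
    return log_r >= 0 or rng.random() < math.exp(log_r)


rng = random.Random(0)
# Propose a new row of a transition matrix over 4 states (flat Dirichlet, illustrative).
proposal = sample_dirichlet([1.0, 1.0, 1.0, 1.0], rng)
print(proposal, mh_accept(-12.3, -11.8, rng))
```

Working with log-probabilities keeps the ratio stable when likelihoods are products over many bags and sequences.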
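The transcript does not say how the observations are discretized. One simple possibility, shown purely as an assumption, is a uniform 4 x 4 grid over a 2-d feature space, which yields 16 symbols. The feature range [0, 1] and the function name are hypothetical.

```python
def to_symbol(feature, lo=0.0, hi=1.0, bins_per_dim=4):
    """Map a 2-d feature vector to one of bins_per_dim**2 symbols (0..15 by default).

    Values outside [lo, hi] are clamped into the first or last bin.
    """
    def bin_index(v):
        scaled = (v - lo) / (hi - lo) * bins_per_dim
        return min(bins_per_dim - 1, max(0, int(scaled)))

    a, b = feature
    return bin_index(a) * bins_per_dim + bin_index(b)


# Turn a sequence of 2-d edge features into a sequence of discrete symbols.
sequence = [(0.05, 0.10), (0.30, 0.60), (0.95, 0.99)]
print([to_symbol(f) for f in sequence])
```

A learned codebook (for example, nearest-centroid quantization) would be an alternative to a fixed grid; the choice affects how well the 16 symbols cover the observed features.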

Classification Results

MI-HMM vs. Sampling HMM: [Figure: results on Small Millbrook; legend: HMM Samp (12,000), MI-HMM (100)]

What’s the deal with HMM Samp?

Concluding Remarks

Concluding Remarks: Explicitly incorporating the Multiple Instance Learning (MIL) paradigm in HMM learning is intuitive and effective. Classification performance is improved when using the MI-HMM over a standard HMM: it is more effective and efficient. Future work: construct bags without using the MRF heuristic; apply to EMI data (spatial uncertainty).

Back up Slides

MIL Application: Example GPR. [Figure: EHD feature vector] Collaboration: Frigui, Collins, Torrione. Construction of bags: collect 15 EHD feature vectors from the 15 depth bins; mine images = + bags; FA images = - bags. Explain GPR images and target signatures. Given a GPR image, multiple feature vectors are typically calculated, one at each depth bin or image subset. Some feature vectors exhibit the target concept and some do not; which ones exhibit it is uncertain unless an expert labels each feature vector. This is exactly the multiple instance scenario: when optimizing a classifier for landmine detection, we are learning under uncertainty. We know that there is a target in this image, but we do not know which feature vectors contain the target and which do not.

Standard vs. MI Learning: GPR Example. Standard learning: each training sample (feature vector) must have a label. This is an arduous task: there are many feature vectors per image and multiple images; they are difficult to label given GPR echoes, ground truthing errors, etc.; and the label of each vector may not be known. [Figure: EHD feature vector]

Standard vs. MI Learning: GPR Example. Multiple instance learning: each training bag must have a label. There is no need to label all feature vectors; just identify the images (bags) where targets are present. This implicitly accounts for class label uncertainty. [Figure: EHD feature vector] After producing multiple sets for multiple GPR images, the multiple instance learner will (1) identify the commonalities (common patterns) shared by the positive bags that are not observed in the negative bags, that is, learn the target concept; and (2) given the chosen classifier or model, aid in the optimization of its parameters. Some supervised, semi-supervised, or active learning methods attempt to assign labels to all training samples, with an expert aiding, some criterion satisfied, or some objective optimized. With multiple instance learning, we say, FORGET ABOUT IT. The multiple instance learner will figure it out.

Random Set Framework for Multiple Instance Learning

Random Set Brief Random Set

How can we use Random Sets for MIL? Random set for MIL: bags are sets. The idea of finding the commonality of positive bags is inherent in the random set formulation. Sets have either an empty or a non-empty intersection relationship. Find commonality using the intersection operator. The random set's governing functional is based on the intersection operator. Capacity functional: T, a.k.a. the noisy-OR gate (Pearl 1988). It is NOT the case that EACH element is NOT the target concept.

Random Set Functionals: Capacity functionals are used for the intersection calculation. Use the germ and grain model to model the random set. Multiple (J) concepts. Calculate the probability of intersection given X and the germ and grain pairs. Grains are governed by random radii with an assumed cumulative distribution. Random set model parameters: germ, grain.

RSF-MIL: Germ and Grain Model. [Figure legend: positive bags = blue; negative bags = orange; distinct shapes = distinct bags; axis labels x, T]

Multiple Instance Learning with Multiple Concepts

Multiple Concepts: Disjunction or Conjunction? Disjunction: when you have multiple types of concepts; when each instance can indicate the presence of a target. Conjunction: when you have a target type that is composed of multiple (necessary) concepts; when each instance can indicate a concept, but not necessarily the composite target type.

Conjunctive RSF-MIL. Previously developed: disjunctive RSF-MIL (RSF-MIL-d), with a noisy-OR combination across concepts and samples. Conjunctive RSF-MIL (RSF-MIL-c): the standard noisy-OR for one concept j, with a noisy-AND combination across concepts.

Synthetic Data Experiments: The Extreme Conjunct data set requires that a target bag exhibit two distinct concepts rather than one or none. [Table: AUC (AUC when initialized near solution)]
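The difference between the two combinations can be sketched as follows. The equations on this slide are not in the transcript, so this assumes the common forms: noisy-OR as one minus a product of complements, and noisy-AND as a product of per-concept noisy-OR terms. The input `p[j][n]`, the probability that sample n exhibits concept j, is a hypothetical stand-in for whatever the random set model provides.

```python
from math import prod


def noisy_or(probs):
    """Probability that at least one of several independent events occurs."""
    return 1.0 - prod(1.0 - p for p in probs)


def bag_prob_disjunctive(p):
    """RSF-MIL-d style: noisy-OR across all concepts and all samples."""
    return noisy_or(q for row in p for q in row)


def bag_prob_conjunctive(p):
    """RSF-MIL-c style: noisy-OR over samples per concept, then AND across concepts."""
    return prod(noisy_or(row) for row in p)


# Two concepts, two samples; only concept 0 is observed in this bag.
p = [[0.9, 0.1],
     [0.0, 0.0]]
print(bag_prob_disjunctive(p), bag_prob_conjunctive(p))
```

With these inputs the disjunctive score is high because one concept suffices, while the conjunctive score is zero because the second concept is never observed.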

Application to Remote Sensing

Disjunctive Target Concepts. [Diagram: target concept types 1, 2, ..., n each feed a noisy-OR; the noisy-OR outputs are combined by an OR into “Target concept present?”] Using large overlapping bins (gross extraction), the target concept can be encapsulated within one instance; therefore a disjunctive relationship exists.

What if we want features with finer granularity? Fine extraction gives more detail about the image and more shape information, but may lose the disjunctive nature between (multiple) instances. Our features have more granularity, so our concepts may be constituents of a target rather than encapsulating the target concept. Constituent concept 1: top of hyperbola. Constituent concept 2: wings of hyperbola. [Diagram: per-concept noisy-OR outputs combined by an AND into “Target concept present?”]

GPR Experiments. Extensive GPR data set: ~800 targets, ~5,000 non-targets. Experimental design: run RSF-MIL-d (disjunctive) and RSF-MIL-c (conjunctive); compare both feature extraction methods. Gross extraction: bins large enough to encompass the target concept. Fine extraction: non-overlapping bins. Hypothesis: RSF-MIL-d will perform well when using gross extraction, whereas RSF-MIL-c will perform well when using fine extraction.

Experimental Results. Highlights: RSF-MIL-d using gross extraction performed best. RSF-MIL-c performed better than RSF-MIL-d when using fine extraction. Other influencing factors: the optimization methods for RSF-MIL-d and RSF-MIL-c are not the same. [Figure panels: gross extraction, fine extraction]

Future Work: Implement a general form that can learn the disjunction or conjunction relationship from the data. Implement a general form that can learn the number of concepts. Incorporate spatial information. Develop an improved optimization scheme for RSF-MIL-c.

HMM Model Visualization. [Figure: DTXT HMM. Points = Gaussian component means; color = state index (state index 1, 2, 3); panels: falling diagonal, rising diagonal; transition probabilities from state to state (red = high probability); initial probabilities; pattern characterized]

Backup Slides

MIL Example (AHI Imagery). Robust learning tool: MIL tools can learn a target signature with limited or incomplete ground truth. Which spectral signature(s) should we use to train a target model or classifier? Spectral mixing. Background signal. Ground truth not exact.

MI-RVM: Addition of set observations and inference using noisy-OR to an RVM model. Prior on the weight w.

SVM review Classifier structure Optimization

MI-SVM Discussion: The RVM was altered to fit the MIL problem by changing the form of the target variable's posterior to model a noisy-OR gate. The SVM can be altered to fit the MIL problem by changing how the margin is calculated: boost the margin between the bag (rather than the samples) and the decision surface. Look for the MI-separating linear discriminant: there is at least one sample from each bag in the half space.

mi-SVM: Enforce the MI scenario using extra constraints. This is a mixed integer program: it must find both the optimal hyperplane and the optimal labeling set. At least one sample in each positive bag must have a label of 1. All samples in each negative bag must have a label of -1.

Current Applications. Outline: Multiple Instance Learning (MI problem, MI applications); Multiple Instance Learning: Kernel Machines (MI-RVM, MI-SVM); Current Applications (GPR imagery, HSI imagery).

HSI: Target Spectra Learning. Given labeled areas of interest, learn the target signature. Given test areas of interest, classify the set of samples.

Overview of MI-RVM Optimization: two-step optimization.
1) Estimate the optimal w, given the posterior of w. There is no closed-form solution for the parameters of the posterior, so a gradient update method is used. Iterate until convergence, then proceed to step 2.
2) Update the parameter on the prior of w. The distribution on the target variable has no specific parameters. Until system convergence, continue at step 1.
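The constraints described here match the mi-SVM formulation usually attributed to Andrews, Tsochantaridis, and Hofmann. A sketch in that notation follows, on the assumption that this is the formulation the slide refers to. The symbols are not from the transcript: x_i are instances, y_i their unknown labels, B_I the index set of bag I, Y_I the bag label, \xi_i slack variables, and C the usual soft-margin penalty.

```latex
\min_{\{y_i\}} \; \min_{w,\, b,\, \xi} \quad \frac{1}{2}\lVert w \rVert^2 + C \sum_i \xi_i

\text{subject to} \quad
  y_i \bigl( \langle w, x_i \rangle + b \bigr) \ge 1 - \xi_i, \qquad
  \xi_i \ge 0, \qquad y_i \in \{-1, 1\} \quad \forall i,

\sum_{i \in B_I} \frac{y_i + 1}{2} \ge 1 \quad \text{for every bag with } Y_I = 1,
\qquad
y_i = -1 \quad \forall i \in B_I \text{ with } Y_I = -1 .
```

The outer minimization over the integer labels y_i is what makes the problem a mixed integer program.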

1) Optimization of w: Optimize the posterior (Bayes' rule) of w. Update the weights using the Newton-Raphson method.

2) Optimization of Prior: Optimization of the covariance of the prior. Making a large number of assumptions, the diagonal elements of A can be estimated.
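The slide's update equations are not in the transcript. For reference, the generic Newton-Raphson step for maximizing a log-posterior over w has the form below; g and H are assumed symbols for the gradient and Hessian, and D stands for the training data.

```latex
w^{(k+1)} = w^{(k)} - H^{-1} g,
\qquad
g = \nabla_w \log p(w \mid \mathcal{D}) \Big|_{w = w^{(k)}},
\qquad
H = \nabla_w \nabla_w^{\top} \log p(w \mid \mathcal{D}) \Big|_{w = w^{(k)}}
```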

Random Sets: Multiple Instance Learning. Random set framework for multiple instance learning: bags are sets. The idea of finding the commonality of positive bags is inherent in the random set formulation. Find commonality using the intersection operator. The random set's governing functional is based on the intersection operator.

MI Issues. MIL approaches: some approaches are biased to believe that only one sample in each bag caused the target concept; some approaches can only label bags; it is not clear whether anything is gained over supervised approaches.

RSF-MIL: MIL-like. [Figure legend: positive bags = blue; negative bags = orange; distinct shapes = distinct bags; axis labels x, T]

Side Note: Bayesian Networks. Noisy-OR assumption. Bayesian network representation of noisy-OR. Polytree: singly connected DAG.

Side Note: The full Bayesian network may be intractable. Occurrences of causal factors are rare (sparse co-occurrence), so assume a polytree, and assume the result has a Boolean relationship with the causal factors. Absorb I, X and A into one node, governed by the randomness of I. These assumptions greatly simplify the inference calculation. Calculate Z based on probabilities rather than constructing a distribution using X.

Diverse Density (DD): a probabilistic approach. Goal: standard statistics approaches identify areas in a feature space with a high density of target samples and a low density of non-target samples. DD: identify areas in a feature space with a high “density” of samples from EACH of the positive bags (“diverse”) and a low density of samples from negative bags. Identify attributes or characteristics similar to positive bags and dissimilar to negative bags. Assume t is a target characterization, and assume the bags are conditionally independent.

Diverse Density Calculation (Noisy-OR Model) and Optimization: It is NOT the case that EACH element is NOT the target concept.
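A minimal sketch of the noisy-OR diverse density calculation, following the commonly cited form in which an instance matches a candidate concept t with probability exp(-||x - t||^2). The equations on the slide are not in the transcript, so this form, the function names, and the toy bags are assumptions for illustration.

```python
import math


def instance_prob(x, t):
    """P(instance x exhibits concept t), modeled as exp(-squared distance)."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, t)))


def diverse_density(t, positive_bags, negative_bags):
    """Noisy-OR diverse density of candidate concept t.

    Positive bag term: it is NOT the case that EACH instance is NOT the concept.
    Negative bag term: every instance is not the concept.
    """
    dd = 1.0
    for bag in positive_bags:
        dd *= 1.0 - math.prod(1.0 - instance_prob(x, t) for x in bag)
    for bag in negative_bags:
        dd *= math.prod(1.0 - instance_prob(x, t) for x in bag)
    return dd


# Hypothetical 2-d bags: both positive bags have an instance near (0, 0).
positive_bags = [[(0.0, 0.0), (5.0, 5.0)], [(0.1, 0.0), (9.0, 9.0)]]
negative_bags = [[(5.0, 5.0)]]
print(diverse_density((0.0, 0.0), positive_bags, negative_bags))
print(diverse_density((5.0, 5.0), positive_bags, negative_bags))
```

The candidate near (0, 0) scores high because every positive bag has an instance close to it and no negative instance is nearby, while (5, 5) is ruled out by the negative bag.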

Random Set Functionals: Capacity and avoidance functionals, given a germ and grain model with assumed random radii.

When disjunction makes sense. [Diagram: OR into “Target concept present”] Using large overlapping bins, the target concept can be encapsulated within one instance; therefore a disjunctive relationship exists.

Theoretical and Developmental Progress. Previous optimization: did not necessarily promote diverse density. Current optimization: better for context learning and MIL. Previously there was no feature relevance or selection (hypersphere). Improvement: learned weights on each feature dimension are now included. Previous TO DO list: improve existing code; develop joint optimization for context learning and MIL; apply MIL approaches (broad scale); learn similarities between feature sets of mines; aid in training existing algorithms by finding the “best” EHD features for training and testing; construct set-based classifiers?

How do we impose the MI scenario? Diverse Density (Maron et al.). Calculation (noisy-OR model): inherent in the random set formulation. Optimization: a combination of exhaustive search and gradient ascent. It is NOT the case that EACH element is NOT the target concept.

How can we use Random Sets for MIL? Random set for MIL: bags are sets. The idea of finding the commonality of positive bags is inherent in the random set formulation. Sets have either an empty or a non-empty intersection relationship. Find commonality using the intersection operator. The random set's governing functional is based on the intersection operator. Example:
Bags with target: {l,a,e,i,o,p,u,f}, {f,b,a,e,i,z,o,u}, {a,b,c,i,o,u,e,p,f}, {a,f,t,e,i,u,o,d,v}
Bags without target: {s,r,n,m,p,l}, {z,s,w,t,g,n,c}, {f,p,k,r}, {q,x,z,c,v}, {p,l,f}
Intersection of the bags with target: {a,e,i,o,u,f}
Union of the bags without target: {f,s,r,n,m,p,l,z,w,t,g,c,v,q,k,x}
Target concept = intersection \ union = {a,e,i,o,u}
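The letter example can be reproduced directly with set operations. This sketch uses the bags exactly as listed on the slide; the variable names are illustrative.

```python
# Bags with target and bags without target, as listed on the slide.
positive_bags = [set("laeiopuf"), set("fbaeizou"), set("abciouepf"), set("afteiuodv")]
negative_bags = [set("srnmpl"), set("zswtgnc"), set("fpkr"), set("qxzcv"), set("plf")]

# What every positive bag has in common.
common = set.intersection(*positive_bags)
# Everything that appears in at least one negative bag.
seen_in_negatives = set.union(*negative_bags)
# The target concept: common to the positives and never seen in a negative.
target = common - seen_in_negatives

print(sorted(common))
print(sorted(target))
```

The letter f is common to all positive bags but also appears in negative bags, so the set difference removes it and leaves the vowels.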