
Published by Haylie Emmitt. Modified over 2 years ago.

1
Multiple Instance Hidden Markov Model: Application to Landmine Detection in GPR Data
Jeremy Bolton, Seniha Yuksel, Paul Gader
CSI Laboratory, University of Florida

2
2/31 CSI Laboratory 2010
Highlights
Hidden Markov Models (HMMs) are useful tools for landmine detection in ground-penetrating radar (GPR) imagery
Explicitly incorporating the Multiple Instance Learning (MIL) paradigm in HMM learning is intuitive and effective
Classification performance is improved when using the MI-HMM over a standard HMM
Results further support the idea that explicitly accounting for the MI scenario may lead to improved learning under class label uncertainty

3
Outline
I. HMMs for Landmine Detection in GPR
   I. Data
   II. Feature Extraction
   III. Training
II. MIL Scenario
III. MI-HMM
IV. Classification Results

4
HMMs for landmine detection

5
GPR Data
GPR data
– 3d image cube: down-track (Dt), cross-track (xt), depth
– Subsurface objects are observed as hyperbolas

6
GPR Data Feature Extraction
Many features extracted from GPR data measure the occurrence of an “edge”
– For the typical HMM algorithm (Gader et al.), preprocessing techniques are used to emphasize edges
– Image morphology and structuring elements can be used to extract edges
[Figure panels: Image, Preprocessed, Edge Extraction]
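As a rough illustration of edge extraction with image morphology, the sketch below computes a morphological gradient (dilation minus erosion) with a square structuring element. The slide does not say which morphological operator or structuring element is used, so the gradient, the square window, and every function name here are assumptions chosen for illustration.

```python
def _window(img, r, c, radius):
    """Pixel values under a square structuring element, clipped at the image border."""
    rows, cols = len(img), len(img[0])
    return [img[i][j]
            for i in range(max(0, r - radius), min(rows, r + radius + 1))
            for j in range(max(0, c - radius), min(cols, c + radius + 1))]


def dilate(img, radius=1):
    """Grayscale dilation: each pixel becomes the maximum of its window."""
    return [[max(_window(img, r, c, radius)) for c in range(len(img[0]))]
            for r in range(len(img))]


def erode(img, radius=1):
    """Grayscale erosion: each pixel becomes the minimum of its window."""
    return [[min(_window(img, r, c, radius)) for c in range(len(img[0]))]
            for r in range(len(img))]


def morphological_gradient(img, radius=1):
    """Dilation minus erosion; large values mark intensity edges (illustrative choice)."""
    d, e = dilate(img, radius), erode(img, radius)
    return [[d[r][c] - e[r][c] for c in range(len(img[0]))]
            for r in range(len(img))]


# A vertical step edge between columns 2 and 3.
step = [[0, 0, 0, 1, 1, 1] for _ in range(4)]
edges = morphological_gradient(step)
```

Flat regions map to zero and only the two columns adjacent to the step respond, which is the "edge emphasis" behaviour the slide describes.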

7
2d Edge Features
[Figure: Edge Extraction]

8
Concept behind the HMM for GPR
Using the extracted features (an observation sequence obtained by scanning an image from left to right), we attempt to estimate the hidden states
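Estimating hidden states from an observation sequence is commonly done with the Viterbi algorithm. The sketch below is a generic discrete-observation version, not the detector described in the talk; the two-state model and all of its probabilities are hypothetical values chosen only to make the example run.

```python
def viterbi(obs, pi, A, B):
    """Most likely hidden state path for a discrete-observation HMM.

    obs: list of symbol indices; pi[s]: initial probability of state s;
    A[p][s]: transition probability p -> s; B[s][o]: probability of symbol o in state s.
    """
    n = len(pi)
    delta = [pi[s] * B[s][obs[0]] for s in range(n)]
    backpointers = []
    for o in obs[1:]:
        prev = delta
        # Best predecessor for each state at this time step.
        ptr = [max(range(n), key=lambda p: prev[p] * A[p][s]) for s in range(n)]
        delta = [prev[ptr[s]] * A[ptr[s]][s] * B[s][o] for s in range(n)]
        backpointers.append(ptr)
    # Trace back from the best final state.
    state = max(range(n), key=lambda s: delta[s])
    path = [state]
    for ptr in reversed(backpointers):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


# Hypothetical two-state, left-to-right model: state 0 mostly emits symbol 0,
# state 1 mostly emits symbol 1.
pi = [0.9, 0.1]
A = [[0.7, 0.3], [0.0, 1.0]]
B = [[0.9, 0.1], [0.1, 0.9]]
path = viterbi([0, 0, 1, 1], pi, A, B)
```

For long sequences a practical implementation would work with log probabilities to avoid underflow; the plain products are kept here for readability.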

9
Concept behind the HMM for GPR

10
HMM Features
Current AIM viewer by Smock
[Figure panels: Image, Feature Image, Rising Edge Feature, Falling Edge Feature]

11
Sampling HMM Summary
Feature Calculation
– Dimensions (whether a positive or a negative diagonal is observed is not always relevant, only that a diagonal is observed)
  HMMSamp: 2d
– Down sampling depth
  HMMSamp: 4
HMM Models
– Number of States
  HMMSamp: 4
– Gaussian components per state (fewer total components for probability calculation)
  HMMSamp: 1 (recent observation)

12
Training the HMM
Xuping Zhang proposed a Gibbs Sampling algorithm for HMM learning
– But, given one or more images, how do we choose the training sequences?
– Which sequence(s) do we choose from each image?
There is an inherent problem in many image analysis settings due to class label uncertainty per sequence
That is, each image has a class label associated with it, but each image contains multiple samples or sequences. Which sample(s) are truly indicative of the target?
– Using standard training techniques, this translates to identifying the optimal training set within a set of sequences
– If an image has N sequences, this translates to a search over 2^N possibilities
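To make the size of that search concrete, the short sketch below counts and enumerates every candidate training subset of N sequences. The function names are hypothetical, and the count includes the empty subset.

```python
from itertools import combinations


def count_candidate_sets(n_sequences):
    """Number of subsets of n sequences, including the empty set."""
    return 2 ** n_sequences


def enumerate_candidate_sets(sequence_ids):
    """Yield every subset of the given sequence ids, smallest subsets first."""
    ids = list(sequence_ids)
    for size in range(len(ids) + 1):
        for subset in combinations(ids, size):
            yield subset


small = list(enumerate_candidate_sets(range(4)))
large = count_candidate_sets(30)
```

Enumeration is feasible for 4 sequences (16 subsets) but already exceeds a billion subsets for 30 sequences in a single image, which is why an exhaustive search is impractical.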

13
Training Sample Selection Heuristic
Currently, an MRF approach (Collins et al.) is used to bound the search to a localized area within the image rather than searching all sequences within the image.
– Reduces the search space, but the multiple instance problem still exists

14
Multiple Instance Learning

15
Standard Learning vs. Multiple Instance Learning
Standard supervised learning
– Optimize some model (or learn a target concept) given training samples and corresponding labels
MIL
– Learn a target concept given multiple sets of samples and corresponding labels for the sets
– Interpretation: learning with uncertain labels / noisy teacher

16
Multiple Instance Learning (MIL)
Given:
– Set of I bags
– Labeled + or -
– The i-th bag is a set of J_i samples in some feature space
– Interpretation of labels
Goal: learn concept
– What characteristic is common to the positive bags that is not observed in the negative bags?

17
Standard learning doesn’t always fit: GPR Example
Standard Learning
– Each training sample (feature vector) must have a label
– But which ones, and how many, compose the optimal training set?
  Arduous task: many feature vectors per image and multiple images
  Difficult to label given GPR echoes, ground truthing errors, etc.
  Label of each vector may not be known
[Figure label: EHD: Feature Vector]

18
Learning from Bags
In MIL, a label is attached to a set of samples.
A bag is a set of samples.
A sample within a bag is called an instance.
A bag is labeled as positive if and only if at least one of its instances is positive.
[Figure labels: POSITIVE BAGS (each bag is an image), NEGATIVE BAGS (each bag is an image)]
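The bag labeling rule can be written directly as code. The sketch below also enumerates which instance labelings are consistent with a given bag label, which shows where the label uncertainty comes from. Function names are hypothetical.

```python
from itertools import product


def bag_label(instance_labels):
    """A bag is positive if and only if at least one instance is positive."""
    return any(instance_labels)


def consistent_labelings(bag_is_positive, n_instances):
    """All instance labelings that would produce the observed bag label."""
    return [labels
            for labels in product([False, True], repeat=n_instances)
            if bag_label(labels) == bag_is_positive]


positive_options = consistent_labelings(True, 3)
negative_options = consistent_labelings(False, 3)
```

A negative bag pins down every instance label, while a positive bag with J instances leaves 2^J - 1 possible labelings open.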

19
MI Learning: GPR Example
Multiple Instance Learning
– Each training bag must have a label
– No need to label all feature vectors, just identify images (bags) where targets are present
– Implicitly accounts for class label uncertainty
[Figure label: EHD: Feature Vector]

20
Multiple Instance Learning HMM: MI-HMM

21
MI-HMM
In MI-HMM, instances are sequences
[Figure labels: NEGATIVE BAGS, POSITIVE BAGS, Direction of movement]

22
MI-HMM
Assuming independence between the bags and assuming the Noisy-OR (Pearl) relationship between the sequences within each bag
[Equation]
where
[Equation]
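The slide's equations are not preserved in this transcript. As a hedged sketch, the standard noisy-OR form says a bag is positive unless every one of its sequences fails to be a target, and bag independence turns the joint probability into a product over bags. How each per-sequence probability is obtained from the HMM is not stated here, so the inputs below are simply assumed to be given, and the function names are hypothetical.

```python
import math


def noisy_or(instance_probs):
    """P(bag is positive) = 1 - prod_j (1 - p_j) for per-sequence probabilities p_j."""
    all_miss = 1.0
    for p in instance_probs:
        all_miss *= 1.0 - p
    return 1.0 - all_miss


def log_likelihood(bags, labels):
    """Sum of log P(label_i | bag_i), assuming the bags are independent.

    bags: list of lists of per-sequence probabilities (assumed given);
    labels: 1 for a positive bag, 0 for a negative bag.
    """
    total = 0.0
    for probs, label in zip(bags, labels):
        p_pos = noisy_or(probs)
        total += math.log(p_pos if label == 1 else 1.0 - p_pos)
    return total


p_bag = noisy_or([0.5, 0.5])
ll = log_likelihood([[0.5, 0.5], [0.0, 0.5]], [1, 0])
```

One confident sequence is enough to make a bag probability high, while a negative bag is only likely when every sequence in it scores low.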

23
MI-HMM learning
Due to the cumbersome nature of the noisy-OR, the parameters of the HMM are learned using Metropolis-Hastings sampling.

24
Sampling
HMM parameters are sampled from a Dirichlet distribution
A new state is accepted or rejected based on the ratio r at iteration t + 1
[Equation]
where P is the noisy-OR model.
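The exact form of the ratio r is not preserved in this transcript. The sketch below shows one accept/reject step under a simplifying assumption: the proposal is drawn independently from the same Dirichlet that serves as the prior, so prior and proposal terms cancel and r reduces to a ratio of model scores. The `score` callable stands in for the noisy-OR model P and is hypothetical, as are the function names.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw a probability vector from Dirichlet(alpha) via normalized gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def mh_step(current, score, alpha, rng):
    """One Metropolis-Hastings step; returns (state, accepted).

    Assumes an independence proposal equal to the Dirichlet prior, so the
    acceptance ratio is score(proposal) / score(current).
    """
    proposal = sample_dirichlet(alpha, rng)
    current_score = score(current)
    if current_score == 0.0:
        r = float("inf")  # any proposal beats a zero-probability state
    else:
        r = score(proposal) / current_score
    if rng.random() < min(1.0, r):
        return proposal, True
    return current, False


rng = random.Random(0)
start = [0.25, 0.25, 0.25, 0.25]
row = sample_dirichlet([1.0, 1.0, 1.0, 1.0], rng)
flat_state, flat_accepted = mh_step(start, lambda v: 1.0, [1.0] * 4, rng)
kept_state, kept_accepted = mh_step(start, lambda v: 1.0 if v is start else 0.0, [1.0] * 4, rng)
```

In a full sampler this step would be applied in turn to the initial, transition, and emission probability rows, each of which is a probability vector.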

25
Discrete Observations
Note that since we have chosen a Metropolis-Hastings sampling scheme using Dirichlets, our observations must be discretized.

26
MI-HMM Summary
Feature Calculation
– Dimensions
  HMMSamp: 2d
  MI-HMM: 2d features are discretized into 16 symbols
– Down sampling depth
  HMMSamp: 4
  MI-HMM: 4
HMM Models
– Number of States
  HMMSamp: 4
  MI-HMM: 4
– Components per state (fewer total components for probability calculation)
  HMMSamp: 1 Gaussian
  MI-HMM: Discrete mixture over 16 symbols
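The transcript does not say how the 2d features are mapped to 16 symbols. One simple possibility, shown here purely as an assumption, is a uniform 4 x 4 grid over features scaled to [0, 1]; a learned codebook would be an equally plausible choice. The function and argument names are hypothetical.

```python
def quantize(value, levels=4, lo=0.0, hi=1.0):
    """Map a scalar in [lo, hi] to one of `levels` equal-width bins, clamping outliers."""
    index = int((value - lo) / (hi - lo) * levels)
    return min(max(index, 0), levels - 1)


def to_symbol(rising, falling, levels=4):
    """Combine two quantized feature values into one symbol in 0 .. levels**2 - 1."""
    return quantize(rising, levels) * levels + quantize(falling, levels)


symbol = to_symbol(0.3, 0.6)
all_symbols = {to_symbol(r / 8.0, f / 8.0) for r in range(9) for f in range(9)}
```

With 4 levels per dimension the alphabet has exactly 16 symbols, so each HMM state needs a single 16-entry emission probability vector.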

27
Classification Results

28
MI-HMM vs Sampling HMM
Small Millbrook
[Figure legend: HMM Samp (12,000), MI-HMM (100)]

29
What’s the deal with HMM Samp?

30
Concluding Remarks

31
Concluding Remarks
Explicitly incorporating the Multiple Instance Learning (MIL) paradigm in HMM learning is intuitive and effective
Classification performance is improved when using the MI-HMM over a standard HMM
– More effective and efficient
Future Work
– Construct bags without using MRF heuristic
– Apply to EMI data: spatial uncertainty

32
Backup Slides

36
MIL Application: Example GPR
Collaboration: Frigui, Collins, Torrione
Construction of bags
– Collect 15 EHD feature vectors from the 15 depth bins
– Mine images = + bags
– FA images = - bags
[Figure label: EHD: Feature Vector]

37
Standard vs. MI Learning: GPR Example
Standard Learning
– Each training sample (feature vector) must have a label
Arduous task
– Many feature vectors per image and multiple images
– Difficult to label given GPR echoes, ground truthing errors, etc.
– Label of each vector may not be known
[Figure label: EHD: Feature Vector]

38
Standard vs. MI Learning: GPR Example
Multiple Instance Learning
– Each training bag must have a label
– No need to label all feature vectors, just identify images (bags) where targets are present
– Implicitly accounts for class label uncertainty
[Figure label: EHD: Feature Vector]

39
Random Set Framework for Multiple Instance Learning

40
Random Set Brief
[Figure: Random Set]

41
How can we use Random Sets for MIL?
Random set for MIL: bags are sets
– The idea of finding the commonality of positive bags is inherent in the random set formulation
  Sets have either an empty or a non-empty intersection
  Find commonality using the intersection operator
  The governing functional of a random set is based on the intersection operator
– Capacity functional: T
  It is NOT the case that EACH element is NOT the target concept
  A.K.A.: Noisy-OR gate (Pearl 1988)

42
Random Set Functionals
Capacity functionals for intersection calculation
Use germ and grain model to model random set
– Multiple (J) concepts
– Calculate probability of intersection given X and germ and grain pairs: [Equation]
– Grains are governed by random radii with assumed cumulative: [Equation]
Random set model parameters
[Figure labels: Germ, Grain]

43
RSF-MIL: Germ and Grain Model
[Figure legend: positive bags = blue, negative bags = orange, distinct shapes = distinct bags, x = sample, T = target concept]

44
Multiple Instance Learning with Multiple Concepts

45
Multiple Concepts: Disjunction or Conjunction?
Disjunction
– When you have multiple types of concepts
– When each instance can indicate the presence of a target
Conjunction
– When you have a target type that is composed of multiple necessary concepts
– When each instance can indicate a concept, but not necessarily the composite target type

46
Conjunctive RSF-MIL
Previously developed disjunctive RSF-MIL (RSF-MIL-d)
– Noisy-OR combination across concepts and samples
Conjunctive RSF-MIL (RSF-MIL-c)
– Standard noisy-OR for one concept j
– Noisy-AND combination across concepts
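The slide's equations are not preserved, so the sketch below only follows its labels: the disjunctive form applies one noisy-OR over all concepts and samples, while the conjunctive form applies a noisy-OR per concept and then multiplies across concepts as a noisy-AND. The probability table `p[j][i]` (sample i expresses concept j) is assumed to be given, and all names are hypothetical.

```python
def noisy_or(probs):
    """1 - prod(1 - p): at least one of the events occurs."""
    all_miss = 1.0
    for p in probs:
        all_miss *= 1.0 - p
    return 1.0 - all_miss


def disjunctive_bag_prob(p):
    """Noisy-OR across every concept and every sample: any concept suffices."""
    return noisy_or([p_ji for row in p for p_ji in row])


def conjunctive_bag_prob(p):
    """Noisy-OR within each concept, noisy-AND (product) across concepts."""
    result = 1.0
    for row in p:
        result *= noisy_or(row)
    return result


# Two concepts, two samples; concept 1 is never expressed in this bag.
table = [[0.5, 0.5],
         [0.0, 0.0]]
p_disjunctive = disjunctive_bag_prob(table)
p_conjunctive = conjunctive_bag_prob(table)
```

The same bag scores 0.75 under the disjunctive model and 0 under the conjunctive model, because the conjunctive form requires every constituent concept to appear somewhere in the bag.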

47
Synthetic Data Experiments
The Extreme Conjunct data set requires that a target bag exhibit two distinct concepts rather than one or none
[Table: AUC (AUC when initialized near solution)]

48
Application to Remote Sensing

49
Disjunctive Target Concepts
Using large overlapping bins (gross extraction), the target concept can be encapsulated within 1 instance; therefore a disjunctive relationship exists
[Diagram: Target Concept Type 1, Target Concept Type 2, …, Target Concept Type n, each with a noisy-OR over instances, combined by OR into “Target Concept Present?”]

50
What if we want features with finer granularity?
Fine Extraction
– More detail about the image and more shape information, but may lose the disjunctive nature between (multiple) instances
Our features have more granularity, therefore our concepts may be constituents of a target, rather than encapsulating the target concept
[Diagram: Constituent Concept 1 (top of hyperbola) and Constituent Concept 2 (wings of hyperbola), each with a noisy-OR over instances, combined by AND into “Target Concept Present?”]

51
GPR Experiments
Extensive GPR data set
– ~800 targets
– ~5,000 non-targets
Experimental Design
– Run RSF-MIL-d (disjunctive) and RSF-MIL-c (conjunctive)
– Compare both feature extraction methods
  Gross extraction: large enough to encompass target concept
  Fine extraction: non-overlapping bins
Hypothesis
– RSF-MIL-d will perform well when using gross extraction, whereas RSF-MIL-c will perform well when using fine extraction

52
Experimental Results
Highlights
– RSF-MIL-d using gross extraction performed best
– RSF-MIL-c performed better than RSF-MIL-d when using fine extraction
– Other influencing factors: optimization methods for RSF-MIL-d and RSF-MIL-c are not the same
[Figure panels: Gross Extraction, Fine Extraction]

53
Future Work
Implement a general form that can learn a disjunction or conjunction relationship from the data
Implement a general form that can learn the number of concepts
Incorporate spatial information
Develop an improved optimization scheme for RSF-MIL-c

55
HMM Model Visualization
DTXTHMM
[Figure panels: Falling Diagonal, Rising Diagonal]
[Figure legend: points = Gaussian component means; color = state index (state index 1, state index 2, state index 3); initial probabilities; transition probabilities from state to state (red = high probability); pattern characterized]

63
Backup Slides

64
MIL Example (AHI Imagery)
Robust learning tool
– MIL tools can learn target signature with limited or incomplete ground truth
Which spectral signature(s) should we use to train a target model or classifier?
1. Spectral mixing
2. Background signal
3. Ground truth not exact

65
MI-RVM
Addition of set observations and inference using noisy-OR to an RVM model
[Equation]
Prior on the weight w
[Equation]

66
SVM review
Classifier structure
[Equation]
Optimization
[Equation]

67
MI-SVM Discussion
RVM was altered to fit the MIL problem by changing the form of the target variable’s posterior to model a noisy-OR gate.
SVM can be altered to fit the MIL problem by changing how the margin is calculated
– Boost the margin between the bag (rather than samples) and the decision surface
– Look for the MI separating linear discriminant
  There is at least one sample from each bag in the half space

68
mi-SVM
Enforce MI scenario using extra constraints
– At least one sample in each positive bag must have a label of 1.
– All samples in each negative bag must have a label of -1.
Mixed integer program: must find optimal hyperplane and optimal labeling set
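The two labeling constraints can be checked mechanically. The sketch below tests whether a candidate instance labeling is feasible for a set of bag labels; it covers only the integer (labeling) side of the mixed integer program, not the hyperplane optimization, and the function name is hypothetical.

```python
def labeling_is_feasible(bag_labels, instance_labels):
    """Check the MI constraints on a candidate instance labeling.

    bag_labels: list of +1 / -1, one per bag.
    instance_labels: list of lists of +1 / -1, one inner list per bag.
    """
    for bag_label, labels in zip(bag_labels, instance_labels):
        if bag_label == 1 and not any(y == 1 for y in labels):
            return False  # positive bag without a positive instance
        if bag_label == -1 and any(y != -1 for y in labels):
            return False  # negative bag containing a non-negative instance
    return True


ok = labeling_is_feasible([1, -1], [[-1, 1, -1], [-1, -1]])
bad_positive = labeling_is_feasible([1], [[-1, -1]])
bad_negative = labeling_is_feasible([-1], [[-1, 1]])
```

Any search over labelings only needs to consider the instances of positive bags, since the labels inside negative bags are fully determined.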

69
Current Applications
I. Multiple Instance Learning
   I. MI Problem
   II. MI Applications
II. Multiple Instance Learning: Kernel Machines
   I. MI-RVM
   II. MI-SVM
III. Current Applications
   I. GPR imagery
   II. HSI imagery

70
HSI: Target Spectra Learning
Given labeled areas of interest: learn target signature
Given test areas of interest: classify set of samples

71
Overview of MI-RVM Optimization
Two step optimization
1. Estimate optimal w, given posterior of w
   – There is no closed form solution for the parameters of the posterior, so a gradient update method is used
   – Iterate until convergence. Then proceed to step 2.
2. Update parameter on prior of w
   – The distribution on the target variable has no specific parameters.
   – Until system convergence, continue at step 1.

72
1) Optimization of w
Optimize posterior (Bayes’ Rule) of w
[Equation]
Update weights using Newton-Raphson method
[Equation]

73
2) Optimization of Prior
Optimization of covariance of prior
[Equation]
Making a large number of assumptions, diagonal elements of A can be estimated
[Equation]

74
Random Sets: Multiple Instance Learning
Random set framework for multiple instance learning
– Bags are sets
– The idea of finding the commonality of positive bags is inherent in the random set formulation
  Find commonality using the intersection operator
  The governing functional of a random set is based on the intersection operator

75
MI issues
MIL approaches
– Some approaches are biased to believe only one sample in each bag caused the target concept
– Some approaches can only label bags
– It is not clear whether anything is gained over supervised approaches

76
RSF-MIL
MIL-like
[Figure legend: positive bags = blue, negative bags = orange, distinct shapes = distinct bags, x = sample, T = target concept]

77
Side Note: Bayesian Networks
Noisy-OR Assumption
– Bayesian Network representation of Noisy-OR
– Polytree: singly connected DAG

78
Side Note
Full Bayesian network may be intractable
– Occurrences of causal factors are rare (sparse co-occurrence)
  So assume polytree
  So assume the result has a Boolean relationship with the causal factors
– Absorb I, X and A into one node, governed by randomness of I
  These assumptions greatly simplify inference calculation
  Calculate Z based on probabilities rather than constructing a distribution using X

79
Diverse Density (DD)
Probabilistic Approach
– Standard statistics approaches identify areas in a feature space with a high density of target samples and a low density of non-target samples
– DD: identify areas in a feature space with a high “density” of samples from EACH of the positive bags (“diverse”), and a low density of samples from negative bags
– Identify attributes or characteristics similar to positive bags, dissimilar to negative bags
– Assume t is a target characterization
– Goal: [Equation]
– Assuming the bags are conditionally independent: [Equation]

80
Diverse Density
Calculation (Noisy-OR Model):
[Equation]
It is NOT the case that EACH element is NOT the target concept
Optimization
[Equation]
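The equations for this calculation are not preserved in the transcript. The sketch below evaluates a noisy-OR diverse density at a candidate target point t, assuming a Gaussian-like instance model exp(-||x - t||^2), which is a common choice in the diverse density literature but is an assumption here. The candidate search simply tries every instance from the positive bags; all names are hypothetical.

```python
import math


def instance_prob(x, t):
    """Assumed instance model: probability that instance x expresses concept t."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, t)))


def diverse_density(t, positive_bags, negative_bags):
    """Noisy-OR diverse density of t under conditionally independent bags."""
    dd = 1.0
    for bag in positive_bags:
        all_miss = 1.0
        for x in bag:
            all_miss *= 1.0 - instance_prob(x, t)
        dd *= 1.0 - all_miss  # at least one instance in the bag matches t
    for bag in negative_bags:
        for x in bag:
            dd *= 1.0 - instance_prob(x, t)  # no instance may match t
    return dd


def best_candidate(positive_bags, negative_bags):
    """Pick the positive-bag instance with the highest diverse density."""
    candidates = [x for bag in positive_bags for x in bag]
    return max(candidates, key=lambda t: diverse_density(t, positive_bags, negative_bags))


positive = [[(0.0, 0.0), (5.0, 5.0)], [(0.1, 0.0), (9.0, 1.0)]]
negative = [[(5.0, 5.1), (9.0, 1.1)]]
best = best_candidate(positive, negative)
```

Points near (0, 0) score highly because both positive bags have an instance there and the negative bag does not, whereas (5, 5) is penalized by the nearby negative instance.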

82
Random Set Functionals
Capacity and avoidance functionals
– Given a germ and grain model
– Assumed random radii

83
When disjunction makes sense
Using large overlapping bins, the target concept can be encapsulated within 1 instance; therefore a disjunctive relationship exists
[Diagram: instances combined by OR into “Target Concept Present”]

84
Theoretical and Developmental Progress
Previous optimization: did not necessarily promote diverse density
– Current optimization: better for context learning and MIL
Previously no feature relevance or selection (hypersphere)
– Improvement: included learned weights on each feature dimension
Previous TO DO list
Improve existing code
– Develop joint optimization for context learning and MIL
Apply MIL approaches (broad scale)
Learn similarities between feature sets of mines
Aid in training existing algorithms: find “best” EHD features for training / testing
Construct set-based classifiers?

85
How do we impose the MI scenario?: Diverse Density (Maron et al.)
Calculation (Noisy-OR Model):
– Inherent in Random Set formulation
– It is NOT the case that EACH element is NOT the target concept
Optimization
– Combination of exhaustive search and gradient ascent

86
How can we use Random Sets for MIL?
Random set for MIL: bags are sets
– The idea of finding the commonality of positive bags is inherent in the random set formulation
  Sets have either an empty or a non-empty intersection
  Find commonality using the intersection operator
  The governing functional of a random set is based on the intersection operator
Example:
Bags with target: {l,a,e,i,o,p,u,f} {f,b,a,e,i,z,o,u} {a,b,c,i,o,u,e,p,f} {a,f,t,e,i,u,o,d,v}
Bags without target: {s,r,n,m,p,l} {z,s,w,t,g,n,c} {f,p,k,r} {q,x,z,c,v} {p,l,f}
Intersection of bags with target: {a,e,i,o,u,f}
Union of bags without target: {c,f,g,k,l,m,n,p,q,r,s,t,v,w,x,z}
Target concept = intersection \ union = {a,e,i,o,u}
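The letter example can be reproduced with ordinary set operations. This sketch treats each bag as a deterministic set, which is a simplification of the random set formulation; the function name is hypothetical.

```python
def target_concept(bags_with_target, bags_without_target):
    """Elements common to every positive bag that appear in no negative bag."""
    common = set.intersection(*bags_with_target)
    seen_in_negatives = set.union(*bags_without_target)
    return common - seen_in_negatives


bags_with_target = [set("laeiopuf"), set("fbaeizou"),
                    set("abciouepf"), set("afteiuodv")]
bags_without_target = [set("srnmpl"), set("zswtgnc"), set("fpkr"),
                       set("qxzcv"), set("plf")]

common = set.intersection(*bags_with_target)
target = target_concept(bags_with_target, bags_without_target)
```

The element f survives the intersection of the positive bags but is removed because it also appears in bags without the target, leaving the vowels as the target concept.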


