Conjunctive Formulation of the Random Set Framework for Multiple Instance Learning: Application to Remote Sensing. Jeremy Bolton, Paul Gader, CSI Laboratory.


Conjunctive Formulation of the Random Set Framework for Multiple Instance Learning: Application to Remote Sensing. Jeremy Bolton, Paul Gader. CSI Laboratory, University of Florida.

Highlights
Conjunctive forms of Random Sets for Multiple Instance Learning:
– Random Sets can be used to solve the MIL problem when multiple concepts are present
– Previously developed formulations assume a disjunctive relationship between the learned concepts
– The new formulation provides for a conjunctive relationship between concepts; its utility is exhibited on a Ground Penetrating Radar (GPR) data set

Outline
I. Multiple Instance Learning
   I. MI Problem
   II. RSF-MIL
   III. Multiple Target Concepts
II. Experimental Results
   I. GPR Experiments
III. Future Work

Multiple Instance Learning

Standard Learning vs. Multiple Instance Learning
Standard supervised learning
– Optimize some model (or learn a target concept) given training samples and corresponding labels
MIL
– Learn a target concept given multiple sets of samples and corresponding labels for the sets
– Interpretation: learning with uncertain labels / a noisy teacher

Multiple Instance Learning (MIL)
Given:
– A set of I bags, each labeled + or -
– The i-th bag is a set of J_i samples in some feature space
– Interpretation of labels
Goal: learn the concept
– What characteristic is common to the positive bags that is not observed in the negative bags?

Multiple Instance Learning
Traditional classification: each sample is labeled individually, e.g. x_1 label = 1, x_2 label = 1, x_3 label = 0, x_4 label = 0, x_5 label = 1
Multiple instance learning: labels are assigned to bags of samples, e.g. {x_1, x_2, x_3, x_4} label = 1 or {x_1, x_2, x_3, x_4} label = 0
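To make the bag/label distinction concrete, here is a minimal Python sketch (not from the original slides): bags are collections of feature vectors sharing a single label, and under the standard MIL assumption a bag is positive if and only if at least one of its instances is positive.

```python
import numpy as np

# A bag is a collection of instances (feature vectors) that shares a single label.
bags = [
    {"instances": np.random.rand(4, 3), "label": 1},  # positive bag
    {"instances": np.random.rand(4, 3), "label": 0},  # negative bag
]

def bag_label(instance_labels):
    """Standard MIL assumption: a bag is positive iff at least one instance is positive."""
    return int(any(instance_labels))

print(bag_label([0, 0, 1, 0]))  # -> 1 (positive bag)
print(bag_label([0, 0, 0, 0]))  # -> 0 (negative bag)
```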

MIL Application: Example GPR
Collaboration: Frigui, Collins, Torrione
Construction of bags
– Collect 15 EHD feature vectors from the 15 depth bins
– Mine images = positive (+) bags
– False-alarm (FA) images = negative (-) bags
(Figure: GPR image with an EHD feature vector extracted per depth bin)
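A hedged sketch of the bag construction just described, one bag per GPR image built from its 15 depth-bin EHD vectors; extract_ehd and build_bag are hypothetical placeholder names, not functions from the authors' code.

```python
import numpy as np

def extract_ehd(gpr_image, depth_bin):
    """Hypothetical stand-in for the EHD (edge histogram descriptor) feature extractor."""
    return np.random.rand(5)  # placeholder feature vector for one depth bin

def build_bag(gpr_image, is_mine, n_bins=15):
    """One bag per GPR image: 15 EHD feature vectors, one per depth bin.
    Mine images become positive bags, false-alarm (FA) images negative bags."""
    instances = np.stack([extract_ehd(gpr_image, b) for b in range(n_bins)])
    return {"instances": instances, "label": int(is_mine)}
```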

Standard vs. MI Learning: GPR Example
Standard learning
– Each training sample (feature vector) must have a label, an arduous task:
– many feature vectors per image and multiple images
– difficult to label given GPR echoes, ground-truthing errors, etc.
– the label of each vector may not be known
(Figure: GPR image with an EHD feature vector extracted per depth bin)

Standard vs. MI Learning: GPR Example
Multiple instance learning
– Each training bag must have a label
– No need to label all feature vectors; just identify images (bags) where targets are present
– Implicitly accounts for class label uncertainty
(Figure: GPR image with an EHD feature vector extracted per depth bin)

Random Set Framework for Multiple Instance Learning

Random Set Brief
(Slide figure: random set definition)
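For reference, the standard characterization presumably summarized on this slide: a random set is a set-valued random variable, and it is described by its capacity (hitting) functional.

```latex
% Capacity functional of a random (closed) set \Gamma:
% the probability that \Gamma "hits" a test set K.
T_\Gamma(K) \;=\; P(\Gamma \cap K \neq \emptyset), \qquad K \subseteq \mathcal{X}.
```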

How Can We Use Random Sets for MIL?
Random set for MIL: bags are sets (multi-sets)
– The idea of finding the commonality of the positive bags is inherent in the random set formulation
– Sets have either an empty-intersection or a non-empty-intersection relationship
– Find commonality using the intersection operator
– The random set's governing functional is based on the intersection operator
– Capacity functional T: "It is NOT the case that EACH element is NOT the target concept", a.k.a. a noisy-OR gate (Pearl 1988)
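A hedged sketch of the noisy-OR reading of the capacity functional for a bag X_i = {x_i1, ..., x_iJ_i}, where p_ij denotes the probability that instance x_ij belongs to the target concept (the exact RSF-MIL parameterization may differ):

```latex
P(\Gamma \cap X_i \neq \emptyset) \;=\; 1 - \prod_{j=1}^{J_i} \bigl(1 - p_{ij}\bigr)
% "It is NOT the case that EACH element is NOT the target concept."
```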

Random Set Functionals
Capacity functionals for the intersection calculation
Use a germ-and-grain model to model the random set
– Multiple (J) concepts
– Calculate the probability of intersection given X and the germ and grain pairs
– Grains are governed by random radii with an assumed cumulative distribution
(Figure: random set model parameters, germs and grains)
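A plausible form of the germ-and-grain capacity functional consistent with this slide, with germs c_j and grains given by balls of random radius R_j whose CDF is F_j; this is a sketch under those assumptions, not necessarily the exact RSF-MIL expression.

```latex
% Probability that a single instance x is covered by at least one grain:
T_\Gamma(\{x\}) \;=\; 1 - \prod_{j=1}^{J} P\bigl(R_j < \lVert x - c_j \rVert\bigr)
              \;=\; 1 - \prod_{j=1}^{J} F_j\!\bigl(\lVert x - c_j \rVert\bigr).
```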

RSF-MIL: Germ and Grain Model
(Figure: scatter of instances and target concepts; positive bags = blue, negative bags = orange, distinct shapes = distinct bags)

Multiple Instance Learning with Multiple Concepts

Multiple Concepts: Disjunction or Conjunction?
Disjunction
– When you have multiple types of concepts
– When each instance can indicate the presence of a target
Conjunction
– When you have a target type that is composed of multiple (necessary) concepts
– When each instance can indicate a concept, but not necessarily the composite target type

Conjunctive RSF-MIL
Previously developed disjunctive RSF-MIL (RSF-MIL-d): a noisy-OR combination across concepts and samples
Conjunctive RSF-MIL (RSF-MIL-c): the standard noisy-OR for one concept j, with a noisy-AND combination across concepts
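A hedged sketch of the two combinations described above, writing p_ij^(k) for the probability that instance x_ij matches concept k; the exact RSF-MIL-d and RSF-MIL-c expressions may differ in parameterization.

```latex
% Per-concept noisy-OR over the instances of bag X_i:
T_k(X_i) \;=\; 1 - \prod_{j=1}^{J_i}\bigl(1 - p_{ij}^{(k)}\bigr)

% RSF-MIL-d (disjunctive): noisy-OR across concepts and samples
P_d(+\mid X_i) \;=\; 1 - \prod_{k=1}^{K}\prod_{j=1}^{J_i}\bigl(1 - p_{ij}^{(k)}\bigr)

% RSF-MIL-c (conjunctive): noisy-AND across concepts of the per-concept noisy-ORs
P_c(+\mid X_i) \;=\; \prod_{k=1}^{K} T_k(X_i)
```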

Synthetic Data Experiments
The Extreme Conjunct data set requires that a target bag exhibit two distinct concepts rather than one or none
(Results table: AUC, with AUC when initialized near the solution in parentheses)

Application to Remote Sensing

Disjunctive Target Concepts
(Diagram: target concept types 1 through n, each through a noisy-OR, combined by an OR into "Target concept present?")
Using large overlapping bins (gross extraction), the target concept can be encapsulated within one instance; therefore a disjunctive relationship exists

What If We Want Features with Finer Granularity?
Fine extraction
– More detail about the image and more shape information, but may lose the disjunctive nature between (multiple) instances
(Diagram: constituent concept 1 (top of hyperbola) and constituent concept 2 (wings of hyperbola), each through a noisy-OR, combined by an AND into "Target concept present?")
Our features have more granularity, so our concepts may be constituents of a target rather than encapsulating the target concept

GPR Experiments
Extensive GPR data set
– ~800 targets
– ~5,000 non-targets
Experimental design
– Run RSF-MIL-d (disjunctive) and RSF-MIL-c (conjunctive)
– Compare both feature extraction methods
  Gross extraction: bins large enough to encompass the target concept
  Fine extraction: non-overlapping bins
Hypothesis
– RSF-MIL-d will perform well when using gross extraction, whereas RSF-MIL-c will perform well using fine extraction

Experimental Results
Highlights
– RSF-MIL-d using gross extraction performed best
– RSF-MIL-c performed better than RSF-MIL-d when using fine extraction
– Other influencing factors: the optimization methods for RSF-MIL-d and RSF-MIL-c are not the same
(Figures: results for gross extraction and fine extraction)

Future Work
– Implement a general form that can learn a disjunctive or conjunctive relationship from the data
– Implement a general form that can learn the number of concepts
– Incorporate spatial information
– Develop an improved optimization scheme for RSF-MIL-c

Backup Slides

MIL Example (AHI Imagery)
Robust learning tool
– MIL tools can learn a target signature with limited or incomplete ground truth
Which spectral signature(s) should we use to train a target model or classifier?
1. Spectral mixing
2. Background signal
3. Ground truth not exact

MI-RVM
Addition of set observations and inference using a noisy-OR to an RVM model
Prior on the weight w

SVM Review
Classifier structure
Optimization
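The standard soft-margin SVM presumably reviewed here (classifier structure and optimization problem):

```latex
f(x) \;=\; \operatorname{sign}\bigl(\mathbf{w}^\top x + b\bigr)

\min_{\mathbf{w},\,b,\,\xi}\;\; \tfrac{1}{2}\lVert \mathbf{w}\rVert^2 + C\sum_i \xi_i
\quad \text{s.t.}\quad y_i\bigl(\mathbf{w}^\top x_i + b\bigr) \ge 1 - \xi_i,\;\; \xi_i \ge 0.
```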

MI-SVM Discussion
The RVM was altered to fit the MIL problem by changing the form of the target variable's posterior to model a noisy-OR gate.
The SVM can be altered to fit the MIL problem by changing how the margin is calculated
– Boost the margin between the bag (rather than the samples) and the decision surface
– Look for the MI separating linear discriminant: there is at least one sample from each positive bag in the positive half-space
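A sketch of the bag-level margin idea in the spirit of MI-SVM (Andrews et al.), where a positive bag's margin is determined by its most positive instance; the exact slack handling may differ from the authors' slides.

```latex
% Bag-level margin constraints (MI-SVM style):
\max_{j \in \text{bag } I}\bigl(\mathbf{w}^\top x_{Ij} + b\bigr) \;\ge\; 1 - \xi_I
  \quad \text{for positive bags } I,
\qquad
-\bigl(\mathbf{w}^\top x_{Ij} + b\bigr) \;\ge\; 1 - \xi_I \;\;\forall j
  \quad \text{for negative bags } I.
```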

mi-SVM
Enforce the MI scenario using extra constraints
Mixed-integer program: must find the optimal hyperplane and the optimal labeling set
– At least one sample in each positive bag must have a label of 1
– All samples in each negative bag must have a label of -1
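A hedged sketch of the mixed-integer program described above, where the instance labels y_ij in {-1, +1} are themselves optimization variables:

```latex
\min_{\{y_{ij}\}}\;\min_{\mathbf{w},b,\xi}\;\; \tfrac{1}{2}\lVert\mathbf{w}\rVert^2 + C\sum_{i,j}\xi_{ij}
\quad\text{s.t.}\quad y_{ij}\bigl(\mathbf{w}^\top x_{ij}+b\bigr) \ge 1-\xi_{ij},\;\; \xi_{ij}\ge 0,

\sum_{j}\frac{y_{ij}+1}{2} \;\ge\; 1 \;\;\text{for every positive bag } i,
\qquad
y_{ij} = -1 \;\;\text{for all } j \;\text{in every negative bag } i.
```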

Current Applications
I. Multiple Instance Learning
   I. MI Problem
   II. MI Applications
II. Multiple Instance Learning: Kernel Machines
   I. MI-RVM
   II. MI-SVM
III. Current Applications
   I. GPR imagery
   II. HSI imagery

HSI: Target Spectra Learning
Given labeled areas of interest: learn the target signature
Given test areas of interest: classify the set of samples

Overview of MI-RVM Optimization
Two-step optimization
1. Estimate the optimal w, given the posterior of w
   – There is no closed-form solution for the parameters of the posterior, so a gradient update method is used
   – Iterate until convergence, then proceed to step 2
2. Update the parameter on the prior of w
   – The distribution on the target variable has no specific parameters
Until the system converges, continue at step 1
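A schematic Python sketch of the alternating scheme described on this slide; gradient_step_w and update_prior_A are hypothetical placeholders for the actual MI-RVM update rules, which are not reproduced here.

```python
import numpy as np

def optimize_mi_rvm(w, A, gradient_step_w, update_prior_A, tol=1e-6, max_outer=100):
    """Alternate between (1) gradient updates of w under its posterior and
    (2) updates of the prior parameter A, until the whole system converges."""
    for _ in range(max_outer):
        # Step 1: no closed form for the posterior parameters, so iterate gradient updates.
        while True:
            w_new = gradient_step_w(w, A)
            converged = np.max(np.abs(w_new - w)) < tol
            w = w_new
            if converged:
                break
        # Step 2: update the prior's (diagonal) covariance parameter A.
        A_new = update_prior_A(w, A)
        if np.max(np.abs(A_new - A)) < tol:
            return w, A_new
        A = A_new
    return w, A
```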

1) Optimization of w
Optimize the posterior (via Bayes' rule) of w
Update the weights using the Newton-Raphson method
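The generic Newton-Raphson update presumably used for this step, with g the gradient and H the Hessian of the log-posterior of w:

```latex
\mathbf{w}^{(t+1)} \;=\; \mathbf{w}^{(t)} \;-\; H^{-1} g,
\qquad
g = \nabla_{\mathbf{w}} \log p(\mathbf{w}\mid \text{data}),\quad
H = \nabla^{2}_{\mathbf{w}} \log p(\mathbf{w}\mid \text{data}).
```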

2) Optimization of Prior
Optimize the covariance of the prior
Making a large number of assumptions, the diagonal elements of A can be estimated

Random Sets: Multiple Instance Learning
Random set framework for multiple instance learning
– Bags are sets
– The idea of finding the commonality of the positive bags is inherent in the random set formulation
– Find commonality using the intersection operator
– The random set's governing functional is based on the intersection operator

MI Issues
MIL approaches
– Some approaches are biased to believe only one sample in each bag caused the target concept
– Some approaches can only label bags
– It is not clear whether anything is gained over supervised approaches

RSF-MIL
(Figure: MIL-like scatter of instances and target concepts; positive bags = blue, negative bags = orange, distinct shapes = distinct bags)

Side Note: Bayesian Networks
Noisy-OR assumption
– Bayesian network representation of the noisy-OR
– Polytree: a singly connected DAG

Side Note
A full Bayesian network may be intractable
– Occurrences of causal factors are rare (sparse co-occurrence), so assume a polytree
– So assume the result has a Boolean relationship with the causal factors
– Absorb I, X, and A into one node, governed by the randomness of I
These assumptions greatly simplify the inference calculation
– Calculate Z based on probabilities rather than constructing a distribution using X

Diverse Density (DD)
Probabilistic approach
– Goal: standard statistical approaches identify areas in a feature space with a high density of target samples and a low density of non-target samples
– DD: identify areas in a feature space with a high "density" of samples from EACH of the positive bags ("diverse"), and a low density of samples from negative bags
– Identify attributes or characteristics similar to the positive bags and dissimilar to the negative bags
– Assume t is a target characterization
– Goal: maximize the diverse density, assuming the bags are conditionally independent

Diverse Density
Calculation (noisy-OR model): it is NOT the case that EACH element is NOT the target concept
Optimization
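For reference, the diverse density objective under the noisy-OR model of Maron and Lozano-Pérez (1998), which the slide paraphrases as "it is NOT the case that EACH element is NOT the target concept"; the instance-level model shown is the commonly used Gaussian-like form.

```latex
\hat{t} \;=\; \arg\max_{t}\;
  \prod_{i} \Bigl(1 - \prod_{j}\bigl(1 - P(t \mid B^{+}_{ij})\bigr)\Bigr)
  \prod_{i} \prod_{j}\bigl(1 - P(t \mid B^{-}_{ij})\bigr),
\qquad
P(t \mid B_{ij}) \;=\; \exp\bigl(-\lVert B_{ij} - t\rVert^{2}\bigr).
```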

Random Set Brief
(Slide figure: random set definition)

Random Set Functionals
Capacity and avoidance functionals
– Given a germ-and-grain model
– Assumed random radii

When Disjunction Makes Sense
Using large overlapping bins, the target concept can be encapsulated within one instance; therefore a disjunctive relationship exists
(Diagram: instances combined by an OR into "Target concept present")

Theoretical and Developmental Progress
Previous optimization: did not necessarily promote diverse density
Current optimization: better for context learning and MIL
Previously no feature relevance or selection (hypersphere)
– Improvement: included learned weights on each feature dimension
Previous TO DO list
– Improve existing code: develop a joint optimization for context learning and MIL
– Apply MIL approaches (broad scale)
– Learn similarities between feature sets of mines
– Aid in training existing algorithms: find the "best" EHD features for training / testing
– Construct set-based classifiers?

How Do We Impose the MI Scenario? Diverse Density (Maron et al.)
Calculation (noisy-OR model): it is NOT the case that EACH element is NOT the target concept
– Inherent in the random set formulation
Optimization
– A combination of exhaustive search and gradient ascent

How Can We Use Random Sets for MIL?
Random set for MIL: bags are sets
– The idea of finding the commonality of the positive bags is inherent in the random set formulation
– Sets have either an empty-intersection or a non-empty-intersection relationship
– Find commonality using the intersection operator
– The random set's governing functional is based on the intersection operator
Example:
Bags with target: {l,a,e,i,o,p,u,f}, {f,b,a,e,i,z,o,u}, {a,b,c,i,o,u,e,p,f}, {a,f,t,e,i,u,o,d,v}
Bags without target: {s,r,n,m,p,l}, {z,s,w,t,g,n,c}, {f,p,k,r}, {q,x,z,c,v}, {p,l,f}
Intersection of the positive bags: {a,e,i,o,u,f}
Union of the negative bags: {s,r,n,m,p,l,z,w,t,g,c,f,k,q,x,v}
Target concept = intersection \ union = {a,e,i,o,u}
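The example above can be reproduced directly with Python sets (a small sketch, not part of the original slides):

```python
positive_bags = [set("laeiopuf"), set("fbaeizou"), set("abciouepf"), set("afteiuodv")]
negative_bags = [set("srnmpl"), set("zswtgnc"), set("fpkr"), set("qxzcv"), set("plf")]

# Commonality of the positive bags: intersect them all.
common = set.intersection(*positive_bags)   # {'a', 'e', 'i', 'o', 'u', 'f'}

# Anything observed in a negative bag cannot be the target concept.
seen_negative = set.union(*negative_bags)

target_concept = common - seen_negative     # {'a', 'e', 'i', 'o', 'u'}
print(sorted(target_concept))
```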