Multiple Instance Learning

Outline
- Motivation
- Multiple Instance Learning (MIL)
- Diverse Density
  - Single Point Concept
  - Disjunctive Point Concept
- SVM Algorithms for MIL
  - Single Instance Learner (SIL)
  - Sparse MIL
  - mi-SVM
  - MI-SVM
- Results
- Some Thoughts

Part I: Multiple Instance Learning (MIL)

Motivation
It is not always possible to provide labeled data for training. Reasons include:
- Labeling requires substantial human effort
- Labeling requires expensive tests
- Experts may disagree on the labels
- Labeling is not possible at the instance level
Objective: present a learning algorithm that can learn from ambiguously labeled training data.

Multiple Instance Learning (MIL)
In MIL, instead of giving the learner labels for individual examples, the trainer only labels collections of examples, which are called bags (positive bags Bi+ and negative bags Bi-).
- A bag is labeled positive if there is at least one positive example in it.
- A bag is labeled negative if all the examples in it are negative.
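
A minimal sketch of this labeling rule (a hypothetical helper, not from the slides): a bag's label is simply the logical OR of its instance labels, which is exactly what makes the instance labels inside a positive bag ambiguous.

import numpy as np

def bag_label(instance_labels):
    """A bag is positive if at least one instance is positive,
    and negative only if every instance is negative."""
    return int(np.any(np.asarray(instance_labels) == 1))

# The second bag is positive even though we do not know
# which of its instances is the positive one.
print(bag_label([0, 0, 0]))  # 0 (negative bag)
print(bag_label([0, 1, 0]))  # 1 (positive bag)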

Multiple Instance Learning (MIL)
The key challenge in MIL is coping with the ambiguity of not knowing which examples in a positive bag are actually positive and which are not. The MIL model was first formalized by Dietterich et al. to deal with the drug activity prediction problem. Following that, an algorithm called Diverse Density was developed to provide a solution to MIL. Later, the method was extended to deal with real-valued labels instead of binary labels.

Diverse Density
Diverse Density solves the MIL problem by examining the distribution of the instances. It looks for a point in feature space that is close to instances from different positive bags and far from the instances in the negative bags. Such a point represents the concept we would like to learn. Diverse Density is a measure of the intersection of the positive bags minus the union of the negative bags.

Diverse Density – Molecular Example
Suppose the shape of a candidate molecule can be described by a feature vector. If a molecule is labeled positive, then at at least one point along its conformational manifold it took the right shape to fit into the target protein.

Diverse Density – Molecular Example

Noisy-Or for Estimating the Density
It is assumed that the event can only happen if at least one of its causes occurred, and that the probability of any cause failing to trigger the event is independent of the other causes. Under these assumptions, if cause j triggers the event with probability p_j, the probability of the event is 1 - Π_j (1 - p_j).

Diverse Density - Formally
By maximizing the Diverse Density we can find the point of intersection (the desired concept):

argmax_t Π_i Pr(t | Bi+) Π_i Pr(t | Bi-)

where, under the noisy-or model,

Pr(t | Bi+) = 1 - Π_j (1 - Pr(Bij+ ∈ t))  and  Pr(t | Bi-) = Π_j (1 - Pr(Bij- ∈ t)).

Alternatively, one can use the most-likely-cause estimator, which replaces the noisy-or with the single instance in each bag that is most likely to belong to the concept.
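
A minimal sketch of the noisy-or Diverse Density objective in Python (function and variable names are my own; the Gaussian-like instance model follows the single-point concept described on the next slide):

import numpy as np

def pr_instance_in_concept(x, t, s):
    """Pr(instance x belongs to concept t), Gaussian-like with per-feature scaling s."""
    return np.exp(-np.sum((s * (x - t)) ** 2))

def diverse_density(t, s, positive_bags, negative_bags):
    """Noisy-or Diverse Density of a candidate concept t.
    positive_bags / negative_bags are lists of (n_instances, n_features) arrays."""
    dd = 1.0
    for bag in positive_bags:
        p = np.array([pr_instance_in_concept(x, t, s) for x in bag])
        dd *= 1.0 - np.prod(1.0 - p)      # at least one instance fits the concept
    for bag in negative_bags:
        p = np.array([pr_instance_in_concept(x, t, s) for x in bag])
        dd *= np.prod(1.0 - p)            # no instance fits the concept
    return dd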

Single Point Concept
A concept that corresponds to a single point in feature space. Given an underlying true concept t:
- Every positive bag Bi+ has at least one instance that is equal to t corrupted by some Gaussian noise.
- Every negative bag Bi- has no instances that are equal to t corrupted by such noise.
The probability that instance Bij belongs to the concept is modeled as

Pr(Bij ∈ t) = exp(-Σ_k s_k^2 (Bijk - t_k)^2)

where k ranges over the dimensions of the feature space and s is a scaling vector that weights each feature.

Disjunctive Point Concept
More complicated concepts are disjunctions of d single-point concepts. A bag is positive if at least one of its instances is in the concept x_t1, x_t2, ..., or x_td.

Density Surfaces

Part II: SVM Algorithms for MIL

Single Instance Learning MIL (SIL-MIL)
SIL-MIL is the single-instance-learning approach to MIL: it applies the bag's label to all instances in the bag, and a normal SVM is trained on the resulting dataset. It works well for bags that are rich in positive instances.
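
A minimal sketch of SIL, assuming bags are NumPy arrays and bag labels are 0/1 (the data layout and names are illustrative, not from the slides):

import numpy as np
from sklearn.svm import SVC

def train_sil(bags, bag_labels):
    """SIL: copy each bag's label onto all of its instances, then train an ordinary SVM."""
    X = np.vstack(bags)
    y = np.concatenate([[label] * len(bag) for bag, label in zip(bags, bag_labels)])
    clf = SVC(kernel="rbf")
    return clf.fit(X, y)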

Sparse MIL
- All instances from negative bags are genuine negative instances.
- Small positive bags are more informative than large positive bags.
- A bag is represented as the sum of all its instances, normalized by its 1-norm or 2-norm.
- Works well for sparse bags.
A related balanced variant uses a transductive SVM to estimate labels for the unlabeled data from labeled training data. Where does that labeled data come from? Doesn't that contradict the MIL setting?
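
A minimal sketch of the sMIL-style bag representation described above (the function name and the exact normalization switch are my own reading of the slide):

import numpy as np

def bag_feature(bag, norm="l2"):
    """Represent a bag by the sum of its instances, normalized by its 1- or 2-norm."""
    s = np.sum(bag, axis=0)
    denom = np.linalg.norm(s, ord=1 if norm == "l1" else 2)
    return s / denom if denom > 0 else s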

Results
Datasets used:
- AIMed: sparse dataset created from a corpus of protein-protein interactions. Contains 670 positive and 1,040 negative bags.
- CBIR: Content-Based Image Retrieval domain. The task is to categorize images by whether they contain an object of interest.
- MUSK: drug activity dataset. Bags correspond to molecules, while bag instances correspond to three-dimensional conformations of the same molecule.
- TST: text categorization dataset in which MEDLINE articles are represented as bags of overlapping text passages.

Results

mi-SVM
Instance-level classification. Treats the instance labels yi as unobserved hidden variables, subject to the constraint that each positive bag contains at least one positive instance. The goal is to maximize the margin jointly over the unknown instance labels and the separating hyperplane. Suitable for instance classification.
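
A minimal sketch of the usual alternating heuristic for mi-SVM (my own simplified loop, assuming bags are NumPy arrays and bag labels are +1/-1):

import numpy as np
from sklearn.svm import SVC

def mi_svm(bags, bag_labels, n_iter=10):
    """Alternate between imputing instance labels and retraining an SVM."""
    X = np.vstack(bags)
    # Initialize every instance label with its bag's label.
    y = np.concatenate([[bl] * len(b) for b, bl in zip(bags, bag_labels)])
    clf = SVC(kernel="linear")
    for _ in range(n_iter):
        clf.fit(X, y)
        scores = clf.decision_function(X)
        y_new, start = [], 0
        for bag, bl in zip(bags, bag_labels):
            s = scores[start:start + len(bag)]
            if bl == -1:
                labels = -np.ones(len(bag))          # negative bags stay all negative
            else:
                labels = np.where(s > 0, 1, -1)      # impute labels from the current SVM
                if not np.any(labels == 1):          # enforce at least one positive per positive bag
                    labels[np.argmax(s)] = 1
            y_new.append(labels)
            start += len(bag)
        y_new = np.concatenate(y_new)
        if np.array_equal(y_new, y):
            break
        y = y_new
    return clf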

MI-SVM
Bag-level classification. The goal is to maximize the bag margin, which is defined by the "most positive" instance in the case of positive bags and the "least negative" instance in the case of negative bags. Suitable for bag classification.
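
A minimal sketch of the corresponding alternating heuristic for MI-SVM, where each positive bag is represented by its current "witness" (most positive instance); the names and loop structure are illustrative, not from the slides:

import numpy as np
from sklearn.svm import SVC

def mi_svm_bag(bags, bag_labels, n_iter=10):
    """Represent each positive bag by its most positive instance (witness);
    all instances of negative bags are used as negatives."""
    neg_X = np.vstack([b for b, bl in zip(bags, bag_labels) if bl == -1])
    pos_bags = [b for b, bl in zip(bags, bag_labels) if bl == 1]
    witnesses = np.vstack([b.mean(axis=0) for b in pos_bags])  # start from bag centroids
    clf = SVC(kernel="linear")
    for _ in range(n_iter):
        X = np.vstack([witnesses, neg_X])
        y = np.concatenate([np.ones(len(witnesses)), -np.ones(len(neg_X))])
        clf.fit(X, y)
        # Re-select each positive bag's witness as its highest-scoring instance.
        witnesses = np.vstack([b[np.argmax(clf.decision_function(b))] for b in pos_bags])
    return clf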

Results: mi-SVM vs. MI-SVM
Datasets: Corel image data sets and TREC9 document categorization sets.

Some Thoughts
- Can we find multiple positive concepts in a single bag and learn these concepts?
- Do varying sizes of negative bags influence the learning algorithm?
- Can we re-formulate MIL using fuzzy logic?

References
O. Maron and T. Lozano-Pérez, "A framework for multiple-instance learning," Advances in Neural Information Processing Systems, 1998, pp. 570-576.
R. C. Bunescu and R. J. Mooney, "Multiple instance learning for sparse positive bags," Proceedings of the 24th International Conference on Machine Learning, 2007, pp. 105-112.
J. Yang, "Review of multi-instance learning and its applications," 2008.
S. Andrews, I. Tsochantaridis, and T. Hofmann, "Support vector machines for multiple-instance learning," Advances in Neural Information Processing Systems, 2003, pp. 577-584.