Yilin Wang 11/5/2009

Background: The Labeling Problem
Labeling: given an observed data set (X) and a label set (L), infer the labels of the data points.
Most vision problems can be posed as labeling problems:
Stereo matching
Image segmentation
Image restoration

Examples of Labeling Problems: Stereo Matching
For a pixel in image 1, where is the corresponding pixel in image 2?
Label set: differences (disparities) between corresponding pixels.
Picture source: S. Lazebnik

Examples of Labeling Problems: Image Segmentation
To partition an image into multiple disjoint regions.
Label set: region IDs.

Examples of Labeling Problems: Image Restoration
To "compensate for" or "undo" defects which degrade an image.
Label set: restored intensities.

Background: Image Labeling
Given an image, the system should automatically partition it into semantically meaningful areas, each labeled with a specific object class: cow, lawn, plane, sky, building, tree.

Image Labeling Problem
Given: the observed data X = {x_i | i ∈ S} from an input image, where x_i is the data from site i (a pixel or block) in the image site set S, and a pre-defined label set.
Let L = {l_i | i ∈ S} be the corresponding labels at the image sites. We want to find the labeling L that maximizes the conditional probability P(L | X):
L* = arg max_L P(L | X)

Which kinds of information can be used for labeling?
Features from individual sites: intensity, color, texture, …
Interactions with neighboring sites: contextual information.
(Figure: Vegetation; Sky or Building?)

Two types of interactions
Interaction with neighboring labels (spatial smoothness of labels): neighboring sites tend to have similar labels (except at discontinuities).
Interaction with neighboring observed data.
(Figure: Building / Sky)

Information for Image Labeling
Let l_i be the label of site i in the image site set S, and N_i be the neighboring sites of site i.
Three kinds of information for image labeling:
Features from the local site: x_i
Interaction with neighboring labels: {l_i', i' ∈ N_i}
Interaction with neighboring observed data: {x_i', i' ∈ N_i}
Picture source: S. Xiang

Markov Random Fields (MRF)
Markov Random Fields (MRFs) are the most popular models for incorporating local contextual constraints into labeling problems.
Let l_i be the label of site i in the image site set S, and N_i be the neighboring sites of site i.
The set of labels L = {l_i | i ∈ S} is said to be an MRF on S w.r.t. a neighborhood system N iff the following condition (the Markov property) is satisfied:
P(l_i | l_{S-{i}}) = P(l_i | l_{N_i})
MRFs maintain global spatial consistency by considering only relatively local dependencies!

Markov-Gibbs Equivalence
Let l be a realization of L. Then P(l) has an explicit formulation (the Gibbs distribution):
P(l) = Z^{-1} exp(-U(l) / T)
where
U(l) = Σ_k Σ_{c ∈ C_k} V_k(l_c) is the energy function (a sum of clique potential functions),
Z = Σ_l exp(-U(l) / T) is a normalizing factor, called the partition function,
T is a constant,
and a clique c ∈ C_k = {{i, i', i'', …} | i, i', i'', … are neighbors to one another}.
The potential functions represent a priori knowledge of the interactions between labels of neighboring sites.

Auto-Model
With clique potentials of up to two sites, the energy takes the form
U(l) = Σ_{i ∈ S} V_1(l_i) + Σ_{i ∈ S} Σ_{i' ∈ N_i} V_2(l_i, l_i')
When V_1(l_i) = l_i G_i(l_i) and V_2(l_i, l_i') = β_ii' l_i l_i', where G_i(·) are arbitrary functions (or constants) and β_ii' are constants reflecting the pairwise interaction between i and i', the energy is
U(l) = Σ_{i ∈ S} l_i G_i(l_i) + Σ_{i ∈ S} Σ_{i' ∈ N_i} β_ii' l_i l_i'
Such models are called auto-models (Besag 1974).
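To make the auto-model concrete, here is a minimal sketch (my own illustration, not from the slides) of evaluating its energy on a 4-connected image grid, assuming binary labels in {-1, +1}, a single pairwise coefficient beta shared by all neighboring pairs, and single-site terms precomputed into an array:

```python
import numpy as np

def auto_model_energy(labels, unary, beta):
    """Energy of Besag's auto-model on a 4-connected grid.

    labels : (H, W) array of labels l_i (e.g. -1/+1)
    unary  : (H, W) array holding G_i(l_i) * l_i, the single-site terms
    beta   : scalar pairwise coefficient shared by all neighboring pairs
    """
    # Single-site contribution: sum_i G_i(l_i) * l_i (precomputed in `unary`)
    energy = unary.sum()
    # Pairwise contribution: sum over horizontal and vertical neighbor pairs
    energy += beta * (labels[:, :-1] * labels[:, 1:]).sum()   # horizontal pairs
    energy += beta * (labels[:-1, :] * labels[1:, :]).sum()   # vertical pairs
    return energy

# Toy usage: a 3x3 binary label field
labels = np.array([[1, 1, -1], [1, 1, -1], [-1, -1, -1]])
unary = 0.5 * labels          # pretend G_i(.) = 0.5 for every site
print(auto_model_energy(labels, unary, beta=-1.0))
```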

Parameter Estimation
Given the functional form of the auto-model, how do we specify its parameters θ?

Maximum Likelihood Estimation
Given a realization l of an MRF, the maximum likelihood (ML) estimate maximizes the conditional probability P(l | θ) (the likelihood of θ), that is:
θ* = arg max_θ P(l | θ)
By Bayes' rule, P(θ | l) ∝ P(l | θ) P(θ).
The prior P(θ) is assumed to be flat when prior information is totally unavailable; in this case, the MAP estimate reduces to the ML estimate.

Maximum Likelihood Estimation
The likelihood function is in the Gibbs form
P(l | θ) = Z(θ)^{-1} exp(-U(l | θ))
where Z(θ) = Σ_l exp(-U(l | θ)).
However, the computation of Z(θ) is intractable even for moderately sized problems, because there is a combinatorial number of elements in the configuration space of L.
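To see why Z(θ) is intractable, here is a hedged sketch that computes the partition function by brute-force enumeration for a toy field; the chain energy used below is an illustrative assumption, and the number of configurations grows as 2^(number of sites), which is exactly what rules this out for real images:

```python
import itertools
import numpy as np

def partition_function(num_sites, energy_fn, labels=(-1, 1), T=1.0):
    """Brute-force Z = sum over all label configurations of exp(-U(l)/T).

    Enumerates |labels|**num_sites configurations, so it only works for
    toy-sized fields; real images make this sum intractable.
    """
    Z = 0.0
    for config in itertools.product(labels, repeat=num_sites):
        Z += np.exp(-energy_fn(np.array(config)) / T)
    return Z

# Toy example: a 1D chain of 4 sites with a simple smoothness energy
chain_energy = lambda l: -1.0 * np.sum(l[:-1] * l[1:])
print(partition_function(4, chain_energy))   # 2**4 = 16 configurations
```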

Maximum Pseudo-Likelihood
Assumption: the sites are treated as independent given their neighborhoods, so the likelihood is approximated by the pseudo-likelihood
PL(l) = Π_{i ∈ S} P(l_i | l_{N_i}, θ)
Notice that the pseudo-likelihood does not involve the partition function Z.
{α, β} can be obtained by solving max_{α, β} PL(l).
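A minimal sketch (my own illustration, not the slides' derivation) of maximum pseudo-likelihood estimation for a homogeneous binary auto-model, where α and β are shared by all sites, the border is zero-padded, and scipy's general-purpose optimizer stands in for solving the estimating equations:

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_pseudo_likelihood(theta, labels):
    """-log PL(alpha, beta) for a binary auto-model on a 4-connected grid.

    labels : (H, W) array with entries in {-1, +1}
    theta  : (alpha, beta)
    The local conditional P(l_i | l_Ni) is logistic in the sum of the
    neighboring labels, and no partition function is required.
    """
    alpha, beta = theta
    padded = np.pad(labels, 1, mode="constant")              # zero-pad the border
    neigh_sum = (padded[:-2, 1:-1] + padded[2:, 1:-1] +      # up + down
                 padded[1:-1, :-2] + padded[1:-1, 2:])       # left + right
    field = alpha + beta * neigh_sum
    # P(l_i | l_Ni) = sigmoid(2 * l_i * field_i) for l_i in {-1, +1}
    log_p = -np.logaddexp(0.0, -2.0 * labels * field)
    return -log_p.sum()

labels = np.sign(np.random.randn(16, 16))                    # toy "observed" labeling
result = minimize(neg_log_pseudo_likelihood, x0=[0.0, 0.0], args=(labels,))
print("estimated alpha, beta:", result.x)
```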

Inference
Recall that in image labeling we want to find the L that maximizes the posterior. By Bayes' rule:
P(L | X) ∝ P(X | L) P(L)
where the prior probability P(L) is a Gibbs distribution: P(L) ∝ exp(-U(L)).
Let P(X | L) ∝ exp(-U(X | L)); then
P(L | X) ∝ exp(-(U(X | L) + U(L)))
so the posterior energy is the sum of the likelihood energy and the prior energy:
U(L | X) = U(X | L) + U(L)
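As a concrete (and deliberately simple) instance of this decomposition, the sketch below assumes a Gaussian noise likelihood and a pairwise smoothness prior; both choices are illustrative assumptions rather than the slides' specific model:

```python
import numpy as np

def posterior_energy(labels, observed, sigma=1.0, beta=1.0):
    """U(L|X) = U(X|L) + U(L) for a simple image-restoration style model.

    Likelihood energy: Gaussian noise model, sum_i (x_i - l_i)^2 / (2 sigma^2).
    Prior energy: pairwise smoothness over 4-connected neighbors.
    """
    likelihood_energy = ((observed - labels) ** 2).sum() / (2.0 * sigma ** 2)
    prior_energy = beta * ((labels[:, :-1] != labels[:, 1:]).sum() +
                           (labels[:-1, :] != labels[1:, :]).sum())
    return likelihood_energy + prior_energy
```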

MAP-MRF Labeling
Maximizing the posterior probability is equivalent to minimizing the posterior energy:
L* = arg max_L P(L | X) = arg min_L U(L | X)
Steps of MAP labeling: shown in the accompanying figure.
Picture source: S. Xiang

MRF for Image Labeling
Difficulties and disadvantages
Very strict independence assumptions: the interactions among labels are modeled by the prior term P(L) and are independent of the observed data, which prohibits modeling data-dependent interactions between labels.

Conditional Random Fields
Let G = (S, E) be a graph. Then (X, L) is said to be a Conditional Random Field (CRF) if, when conditioned on X, the random variables l_i obey the Markov property with respect to the graph:
P(l_i | X, l_{S-{i}}) = P(l_i | X, l_{N_i})
where S-{i} is the set of all sites in the graph except site i, and N_i is the set of neighbors of site i in G.
Compare with the MRF Markov property: P(l_i | l_{S-{i}}) = P(l_i | l_{N_i}).

CRF
According to the Markov-Gibbs equivalence, P(L | X) is a Gibbs distribution. If only up-to-pairwise clique potentials are nonzero, the posterior probability P(L | X) has the form
P(L | X) = Z^{-1} exp( -Σ_{i ∈ S} V_1(l_i, X) - Σ_{i ∈ S} Σ_{i' ∈ N_i} V_2(l_i, l_i', X) )
where -V_1 and -V_2 are called the association and interaction potentials, respectively, in the CRF literature.

CRF vs. MRF
MRF is a generative model (two steps): infer the likelihood and the prior, then use Bayes' theorem to determine the posterior.
CRF is a discriminative model (one step): directly infer the posterior.

CRF vs. MRF
More differences between the CRF and the MRF:
MRF: P(L | X) ∝ P(X | L) P(L), where label interactions enter only through the prior P(L).
CRF: P(L | X) ∝ exp( -Σ_i V_1(l_i, X) - Σ_i Σ_{i' ∈ N_i} V_2(l_i, l_i', X) ).
In the CRF, both the association and interaction potentials are functions of all the observed data as well as of the labels.

Discriminative Random Fields
The Discriminative Random Field (DRF) is a special type of CRF with two extensions. First, a DRF is defined over 2D lattices (such as the image grid). Second, the unary (association) and pairwise (interaction) potentials are designed using local discriminative classifiers.
Kumar, S. and M. Hebert: 'Discriminative Random Fields: A Discriminative Framework for Contextual Interaction in Classification'. ICCV 2003.

DRF
Formulation of the DRF:
P(L | X) = Z^{-1} exp( Σ_{i ∈ S} A(l_i, X) + Σ_{i ∈ S} Σ_{i' ∈ N_i} I(l_i, l_i', X) )
where A(l_i, X) and I(l_i, l_i', X) are called the association potential and the interaction potential.
Picture source: S. Xiang

Association Potential
A(l_i, X) is modeled using a local discriminative model that outputs the association of site i with class l_i as P(l_i | f_i(X)), where f_i(·) is a linear function that maps a patch centered at site i to a feature vector.
Picture source: S. Srihari

Association Potential
For binary classification (l_i = -1 or 1), the posterior at site i is modeled using a logistic function:
P(l_i = 1 | X) = 1 / (1 + exp(-w^T f_i(X))) = σ(w^T f_i(X))
Since l_i = -1 or 1, the probability can be compactly expressed as:
P(l_i | X) = σ(l_i w^T f_i(X))
Finally, the association potential is defined as:
A(l_i, X) = log σ(l_i w^T f_i(X))
Picture source: S. Srihari
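A small sketch of this association potential; the weight vector w and the feature vector f_i(X) below are made-up placeholders, and the bias is assumed to be folded into the first feature entry:

```python
import numpy as np

def association_potential(l_i, features_i, w):
    """A(l_i, X) = log sigma(l_i * w^T f_i(X)) for binary labels l_i in {-1, +1}.

    features_i : feature vector f_i(X) from a patch centered at site i
                 (assumed to already include a bias entry)
    w          : learned weight vector of the local logistic classifier
    """
    margin = l_i * np.dot(w, features_i)
    # log sigmoid(margin), written stably as -log(1 + exp(-margin))
    return -np.logaddexp(0.0, -margin)

# Toy usage with a hypothetical 5-dimensional feature vector
w = np.array([0.5, -0.2, 0.1, 0.0, 0.3])
f_i = np.array([1.0, 0.4, -1.2, 0.7, 0.05])   # first entry acts as the bias
print(association_potential(+1, f_i, w), association_potential(-1, f_i, w))
```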

Interaction Potential
The interaction potential can be seen as a measure of how the labels at neighboring sites i and i' should interact given the observed image X.
Given the features at the two sites, a pairwise discriminative model is defined as:
P(l_i l_i' = 1 | X) = σ(l_i l_i' v^T μ_ii'(X))
where ψ_i(·) is a function that maps a patch centered at site i to a feature vector, μ_ii'(X) is a new feature vector built from ψ_i(X) and ψ_i'(X), and v are the model parameters.
P(l_i l_i' = 1 | X) is a measure of how likely sites i and i' are to have the same label given the image X.

Interaction Potential
The interaction potential is modeled using a data-dependent term along with a constant smoothing term:
I(l_i, l_i', X) = β ( K l_i l_i' + (1 - K)(2 σ(l_i l_i' v^T μ_ii'(X)) - 1) )
The first term is a data-independent smoothing term, similar to the auto-model.
The second term is a [-1, 1] mapping of the pairwise logistic function, which ensures that both terms have the same range.
Ideally, the data-dependent term acts as a discontinuity-adaptive model that moderates smoothing when the data from the two sites are 'different'.
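A hedged sketch of this two-term interaction potential; how the pairwise feature vector μ_ii'(X) is built from the two patches is left to the caller, and the default values of β and K below are arbitrary:

```python
import numpy as np

def interaction_potential(l_i, l_j, mu_ij, v, beta=1.0, K=0.5):
    """I(l_i, l_j, X) in the DRF, following the two-term form on the slide.

    mu_ij : pairwise feature vector mu_ii'(X) built from the patches at the
            two sites (its construction is an assumption left to the caller)
    v     : parameters of the pairwise logistic model
    K     : in [0, 1], trades off the data-independent smoothing term
            against the data-dependent term
    """
    sigma = 1.0 / (1.0 + np.exp(-l_i * l_j * np.dot(v, mu_ij)))   # pairwise logistic
    data_independent = K * l_i * l_j                              # Ising-like smoothing
    data_dependent = (1.0 - K) * (2.0 * sigma - 1.0)              # mapped to [-1, 1]
    return beta * (data_independent + data_dependent)
```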

Discussion of I(l_i, l_i', X)
(The slide works through a small numerical example comparing two candidate labelings a and b under given feature values.)
If only the first, data-independent smoothing term is considered, the model will never choose labeling b: the result is oversmoothed.
The second, data-dependent term is used to compensate for the effect of the smoothness assumption.

Parameter Estimation
θ = {w, v, β, K}
Maximum likelihood estimation: in the conventional maximum-likelihood approach, the evaluation of Z is an NP-hard problem.
Instead, the partition function Z is approximated by using the pseudo-likelihood:
θ* = arg max_θ Π_{m=1}^{M} Π_{i ∈ S} P(l_i^m | l_{N_i}^m, X^m, θ), subject to 0 ≤ K ≤ 1,
where m indexes the training images and M is the total number of training images.

Inference
Objective function: L* = arg max_L P(L | X)
Iterated Conditional Modes (ICM) algorithm: given an initial label configuration, ICM maximizes the local conditional probabilities iteratively, i.e.,
l_i ← arg max_{l_i} P(l_i | l_{N_i}, X)
ICM yields a local maximum of the posterior and has been shown to give reasonably good results.
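A minimal sketch of ICM for a binary field on a 4-connected grid (my own illustration); the unary likelihood energy and the Potts-style prior weight β are generic assumptions, not the DRF's exact potentials:

```python
import numpy as np

def icm(observed, labels, unary_energy, beta=1.0, num_iters=5):
    """Iterated Conditional Modes for a binary label field on a 4-connected grid.

    observed     : (H, W) observed data X
    labels       : (H, W) initial labeling with entries in {-1, +1}
    unary_energy : function (x_i, l_i) -> likelihood energy at a single site
    The pairwise prior penalizes disagreeing neighbors with weight beta.
    """
    H, W = labels.shape
    labels = labels.copy()
    for _ in range(num_iters):
        for i in range(H):
            for j in range(W):
                best_label, best_energy = labels[i, j], np.inf
                neighbors = [labels[a, b]
                             for a, b in ((i-1, j), (i+1, j), (i, j-1), (i, j+1))
                             if 0 <= a < H and 0 <= b < W]
                for l in (-1, +1):
                    # local posterior energy: likelihood term + prior disagreement term
                    e = unary_energy(observed[i, j], l)
                    e += beta * sum(l != n for n in neighbors)
                    if e < best_energy:
                        best_label, best_energy = l, e
                labels[i, j] = best_label
    return labels

# Toy usage: clean up a noisy binary image with a quadratic likelihood energy
noisy = np.sign(np.random.randn(32, 32))
restored = icm(noisy, noisy.copy(), lambda x, l: (x - l) ** 2, beta=1.0)
```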

Experiment
Task: detecting man-made structures in natural scenes.
Database: Corel (training: 108 images, test: 129 images). Each image was divided into non-overlapping 16×16-pixel blocks.
Compared methods: Logistic, MRF, DRF.

Experiment Results
Detection Rates (DR) and False Positives (FP).
Superscript '-' indicates that no neighborhood data interaction was used. K = 0 indicates the absence of the data-independent term in the interaction potential of the DRF.
The DRF reduces false positives relative to the MRF by more than 48%.

Experiment Results
For a similar detection rate, the DRF has lower false positives.
The detection rate of the DRF is higher than that of the MRF for similar false positives.

Conclusion of DRF
Pros: provides the benefits of discriminative models; demonstrates good performance.
Cons: although the model outperforms traditional MRFs, it is not strong enough to capture long-range correlations among the labels, due to the rigid lattice-based structure which allows only pairwise interactions.

Problem
Local information can be confused when there are large overlaps between different classes (e.g., Sky or Water?).
Solution: utilize global contextual information to improve performance.

Multiscale Conditional Random Field (mCRF)
Considers features at different scales:
Local features (site)
Regional label features (small patch)
Global label features (big patch or the whole image)
The conditional probability P(L | X) is formulated as a product of components over the different scales s:
P(L | X) = Z^{-1} Π_s P_s(L | X)
He, X., R. Zemel, and M. Carreira-Perpinan: 2004, 'Multiscale conditional random fields for image labelling'. IEEE Int. Conf. CVPR.

Local Features The local feature of site i is represented by the outputs of several filters. The aim is to associate the patch with one of a predefined set of labels.

Local Classifier
Here a multilayer perceptron is used as the local classifier. Independently at each site i, the local classifier produces a conditional distribution over the label variable l_i given the filter outputs x_i within an image patch centered on site (pixel) i:
P(l_i | x_i, λ)
where λ are the classifier parameters.
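A sketch of what such a local classifier could look like; the layer sizes, tanh hidden layer, and random weights below are placeholders rather than the paper's trained model:

```python
import numpy as np

def local_classifier(x_i, W1, b1, W2, b2):
    """A one-hidden-layer MLP producing P(l_i | x_i) over the label classes.

    x_i    : filter outputs for the patch centered on site i
    W1, b1 : hidden-layer weights and biases
    W2, b2 : output-layer weights and biases (one row per label class)
    Returns a probability vector over the label set (softmax output).
    """
    hidden = np.tanh(W1 @ x_i + b1)
    scores = W2 @ hidden + b2
    scores -= scores.max()                      # numerical stability
    probs = np.exp(scores)
    return probs / probs.sum()

# Toy usage: 10 filter responses, 20 hidden units, 7 label classes (as in Corel)
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(20, 10)), np.zeros(20)
W2, b2 = rng.normal(size=(7, 20)), np.zeros(7)
print(local_classifier(rng.normal(size=10), W1, b1, W2, b2))
```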

Regional Label Features Encoding a particular constraint between the image and the labels within a region of the image Sample pattern: ground pixels (brown) above water pixels (cyan)

Global Label Features
Operate at a coarser resolution, specifying a common value for a patch of sites in the label field.
Sample pattern: sky pixels (blue) at the top of the image, hippo pixels (red) in the middle, and water pixels (cyan) near the bottom.

Feature Function
Global label features are trained as Restricted Boltzmann Machines (RBMs):
two layers: label sites (L) and features (f);
features and labels are fully inter-connected, with no intra-layer connections;
w_a is the parameter vector connecting hidden global label feature f_a and the label sites L.
The joint distribution of the global label feature model is:
P(L, f) ∝ exp( Σ_a f_a (w_a^T L) )

Feature Function
By marginalizing out the hidden variables f, the global component of the model becomes:
P_G(L) ∝ Π_a (1 + exp(w_a^T L))
Similarly, the regional component of the model can be represented by marginalizing out its hidden regional features.
By multiplicatively combining the component conditional distributions (local, regional, and global):
P(L | X) = Z(X)^{-1} Π_s P_s(L | X)
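A hedged sketch of the multiplicatively combined model in log space; for simplicity, the regional and global features here connect to the whole (flattened) label field rather than to sliding patches, so the weight matrices W_regional and W_global are illustrative simplifications of the paper's construction:

```python
import numpy as np

def log_score(labels_onehot, local_log_probs, W_regional, W_global):
    """Unnormalized log of the multiplicatively combined mCRF conditional.

    labels_onehot   : (num_sites, num_classes) one-hot encoding of L
    local_log_probs : (num_sites, num_classes) log outputs of the local classifiers
    W_regional      : (num_regional_features, num_sites*num_classes) weights
    W_global        : (num_global_features, num_sites*num_classes) weights
    Each hidden feature f_a contributes log(1 + exp(w_a . L)) after being
    marginalized out, mirroring the RBM-style components on the slide.
    """
    flat = labels_onehot.ravel()
    local_term = (labels_onehot * local_log_probs).sum()
    regional_term = np.logaddexp(0.0, W_regional @ flat).sum()
    global_term = np.logaddexp(0.0, W_global @ flat).sum()
    return local_term + regional_term + global_term

# Toy usage: 4 sites, 3 classes, 2 regional and 2 global hidden features
rng = np.random.default_rng(1)
onehot = np.eye(3)[rng.integers(0, 3, size=4)]
print(log_score(onehot, np.log(np.full((4, 3), 1 / 3)),
                rng.normal(size=(2, 12)), rng.normal(size=(2, 12))))
```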

Parameter Estimation and Inference
Parameter estimation: the conditional model is trained discriminatively based on the Conditional Maximum Likelihood (CML) criterion, which maximizes the log conditional likelihood:
θ* = arg max_θ Σ_m log P(L^m | X^m, θ)
Inference: Maximum Posterior Marginals (MPM):
l_i* = arg max_{l_i} P(l_i | X)
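A small sketch of MPM decoding given posterior samples; the sampler that would produce those samples (e.g. a Gibbs sampler over P(L | X)) is assumed to exist elsewhere, and the random samples in the usage lines are just stand-ins:

```python
import numpy as np

def mpm_from_samples(samples, num_classes):
    """Maximum Posterior Marginals from Monte Carlo samples of P(L | X).

    samples : (num_samples, num_sites) integer label fields drawn from the
              posterior (the sampler itself is assumed to be given)
    Returns the per-site label maximizing the empirical marginal.
    """
    num_samples, num_sites = samples.shape
    counts = np.zeros((num_sites, num_classes))
    for s in samples:
        counts[np.arange(num_sites), s] += 1       # accumulate per-site label counts
    return counts.argmax(axis=1)                    # argmax of the marginal at each site

# Toy usage with fake posterior samples over 5 sites and 3 classes
samples = np.random.randint(0, 3, size=(100, 5))
print(mpm_from_samples(samples, num_classes=3))
```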

Experiment Results
Databases: Corel (100 images with 7 labels), Sowerby (104 images with 8 labels).
Compared methods: single classifier (MLP), MRF, mCRF.

Labeling Results

Conclusion of mCRF
Pros: formulates the image labeling problem as a multiscale CRF model; combines local and larger-scale contextual information in a unified framework.
Cons: including additional classifiers operating at different scales in the mCRF framework introduces a large number of model parameters; the model assumes conditional independence of the hidden variables given the label field.

More CRF Models
Hierarchical Conditional Random Field (HCRF):
– S. Kumar and M. Hebert. A hierarchical field framework for unified context-based classification.
– Jordan Reynolds and Kevin Murphy. Figure-ground segmentation using a hierarchical conditional random field.
Tree-Structured Conditional Random Fields (TCRF):
– P. Awasthi, A. Gagrani, and B. Ravindran. Image Modeling using Tree Structured Conditional Random Fields. 2007.

References
Li, S. Z.: 2009, 'Markov Random Field Modeling in Image Analysis'. Springer.
Kumar, S. and M. Hebert: 2003, 'Discriminative Random Fields: A Discriminative Framework for Contextual Interaction in Classification'. In Proc. IEEE International Conference on Computer Vision (ICCV).
He, X., R. Zemel, and M. Carreira-Perpinan: 2004, 'Multiscale Conditional Random Fields for Image Labelling'. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR).

End Thanks!