
Multi-Class Object Localization by Combining Local Contextual Interactions
Carolina Galleguillos, Brian McFee, Serge Belongie, Gert Lanckriet
Computer Science and Engineering Department, Electrical and Computer Engineering Department, University of California, San Diego

Outline Introduction Multi-Class Multi-Kernel Approach Contextual Interaction Experiment & Results Conclusion

Outline Introduction Multi-Class Multi-Kernel Approach Contextual Interaction Experiment & Results Conclusion

Introduction Object localization with contextual cues can greatly improve accuracy over models that use appearance features alone. Context considers information from the area neighboring an object, such as pixel, region, and object interactions.

Introduction In this work, we present a novel framework for object localization that efficiently and effectively combines different levels of interaction. We develop a multiple kernel learning algorithm to integrate appearance features with pixel and region interaction data, resulting in a unified similarity metric optimized for nearest neighbor classification. Object level interactions are modeled by a conditional random field (CRF) to produce the final label prediction.

Outline Introduction Multi-Class Multi-Kernel Approach Contextual Interaction Experiment & Results Conclusion

Multi-Class Multi-Kernel Approach Large Margin Nearest Neighbor Multiple Kernel Extension Spatial Smoothing by Segment Merging Contextual Conditional Random Field

Multi-Class Multi-Kernel Approach In our model, each training image I is partitioned into segments s_i by using ground truth information. Each segment s_i corresponds to exactly one object of class c_i ∈ C, where C is the set of all object labels. These segments are collected into the training set S. For each segment s_i ∈ S, we extract several types of features, where the p-th feature space is characterized by a kernel function k_p and its inner product matrix K^p, with K^p(i, j) = k_p(s_i, s_j).
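As a minimal sketch of building one such inner product matrix, the toy code below computes an RBF kernel matrix over hand-made 2-D "segment features" (the feature vectors and the gamma value are hypothetical, not from the paper):

```python
import numpy as np

def rbf_kernel_matrix(X, gamma=1.0):
    """Kernel (inner product) matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))  # clamp tiny negative round-off

# Toy segment features in one of the p feature spaces (hypothetical values).
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
K = rbf_kernel_matrix(X, gamma=0.5)
```

A valid kernel matrix is symmetric, has unit diagonal for the RBF case, and is positive semidefinite.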

Multi-Class Multi-Kernel Approach From this collection of kernels, we learn a unified similarity metric over the feature spaces and a corresponding embedding function g that maps the training set into the learned space. To provide more representative examples for nearest neighbor prediction, we augment the training set S with additional segments, obtained by running a segmentation algorithm multiple times on the training images [24]. Because ground-truth segmentations are not available at test time, the test image must be segmented automatically.

Multi-Class Multi-Kernel Approach Multiple Kernel Extension – several different features are extracted. Spatial Smoothing by Segment Merging. Contextual Conditional Random Field – predicts the final labeling of each segment. Segments are mapped into a unified space and a soft label prediction is computed.

Multi-Class Multi-Kernel Approach Large Margin Nearest Neighbor Multiple Kernel Extension Spatial Smoothing by Segment Merging Contextual Conditional Random Field

Large Margin Nearest Neighbor Our classification algorithm is based on k-nearest neighbor prediction. We apply the Large Margin Nearest Neighbor (LMNN) algorithm to optimally distort the features for nearest neighbor prediction [35]. Neighbors are selected by using the learned Mahalanobis distance metric W, where W is a positive semidefinite (PSD) matrix: d_W(x_i, x_j) = (x_i - x_j)^T W (x_i - x_j).

Large Margin Nearest Neighbor W is trained so that, for each training segment, neighboring segments (in feature space) with differing labels are pushed away by a large margin. This is achieved by solving a semidefinite program of the form: minimize the distance d_W(x_i, x_j) between similarly labeled neighbors plus c times the sum of slack variables, subject to d_W(x_i, x_l) - d_W(x_i, x_j) >= 1 - ξ_ijl for each dissimilarly labeled x_l, with ξ_ijl >= 0 and W PSD. Here (i, j) index similarly labeled pairs, l indexes dissimilarly labeled points, c is the slack trade-off parameter, and ξ_ijl are slack variables.
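The LMNN margin condition can be checked numerically. Below is a toy sketch with a hand-picked PSD metric W and three points (all values hypothetical): x_j shares x_i's label, x_l does not, and the hinge slack measures any violation of the unit margin.

```python
import numpy as np

def mahalanobis(W, xi, xj):
    """Squared Mahalanobis distance d_W(xi, xj) = (xi - xj)^T W (xi - xj)."""
    d = xi - xj
    return float(d @ W @ d)

# Toy PSD metric and points (hypothetical values for illustration).
W = np.array([[2.0, 0.0], [0.0, 0.5]])
xi = np.array([0.0, 0.0])
xj = np.array([1.0, 0.0])   # similar (same label) -> should stay close
xl = np.array([0.0, 4.0])   # dissimilar -> should be at least a unit margin farther

# Hinge slack for the constraint d(xi, xl) >= d(xi, xj) + 1 - slack.
slack = max(0.0, 1.0 + mahalanobis(W, xi, xj) - mahalanobis(W, xi, xl))
```

With these values the dissimilar point already lies beyond the margin, so the slack is zero.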

Large Margin Nearest Neighbor A linear projection matrix L can be recovered from W by its spectral decomposition W = V Λ V^T, so that W = L^T L with L = Λ^(1/2) V^T, where V contains the eigenvectors of W and Λ is a diagonal matrix containing the eigenvalues.
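The recovery of L from a PSD metric W via the spectral decomposition can be sketched directly in numpy (the metric below is a toy example, not a learned one):

```python
import numpy as np

# A toy PSD metric standing in for one learned by LMNN.
W = np.array([[2.0, 1.0], [1.0, 2.0]])

# Spectral decomposition W = V diag(lam) V^T (eigh handles symmetric matrices).
lam, V = np.linalg.eigh(W)

# Projection L = diag(sqrt(lam)) V^T, so that L^T L reconstructs W.
L = np.diag(np.sqrt(np.maximum(lam, 0.0))) @ V.T
```

The clamp to zero guards against tiny negative eigenvalues from floating-point round-off.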

Large Margin Nearest Neighbor Although the learned projection is linear, the algorithm can be kernelized [28] to effectively learn non-linear feature transformations. After kernelizing the algorithm, each segment s_i can be represented by its corresponding column K_i of the kernel matrix, and a regularization term tr(WK) is introduced. The embedding function then takes the form g(s_i) = M K_i, where M is the projection learned in kernel space.

Multi-Class Multi-Kernel Approach Large Margin Nearest Neighbor Multiple Kernel Extension Spatial Smoothing by Segment Merging Contextual Conditional Random Field

Multiple Kernel Extension To effectively integrate different types of feature descriptions, we learn a linear projection from each kernel's feature space and define the combined distance between two points by summing the distances in each (transformed) space: d(s_i, s_j) = sum_p ||M_p (K_i^p - K_j^p)||^2. The regularization term tr(WK) is similarly extended to the sum over kernels, sum_p tr(W_p K^p). The multiple-kernel embedding function then takes the form g(s_i) = (M_1 K_i^1, ..., M_m K_i^m), stacking the projections from each kernel space.
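A minimal sketch of the combined distance, summing the per-kernel transformed distances (the projection matrices and kernel columns below are hypothetical toy values):

```python
import numpy as np

def combined_distance(Ms, K_cols_i, K_cols_j):
    """Sum of squared distances in each transformed kernel space:
    d(i, j) = sum_p ||M_p (K_i^p - K_j^p)||^2."""
    return sum(float(np.sum((M @ (ki - kj)) ** 2))
               for M, ki, kj in zip(Ms, K_cols_i, K_cols_j))

# Two toy kernel spaces with per-kernel projections (hypothetical values).
Ms = [np.eye(2), 2.0 * np.eye(2)]
Ki = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
Kj = [np.array([0.0, 0.0]), np.array([0.0, 0.0])]
d = combined_distance(Ms, Ki, Kj)
```

Each kernel contributes independently: here the first space contributes 1 and the second, scaled by M_2, contributes 4.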

Multiple Kernel Extension Multiple Kernel LMNN (MKLMNN) algorithm.

Multiple Kernel Extension The probability distribution over the labels for a test segment s_0 is computed by using its k nearest neighbors, weighted according to their distance from g(s_0), where c_j denotes the label of segment s_j. To simplify the process, we restrict each W_p to be diagonal, which can be interpreted as learning a weighting over S in each feature space.
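The distance-weighted soft label distribution can be sketched as follows. The inverse-distance weighting here is one simple choice, not necessarily the paper's exact weighting, and the distances and labels are toy values:

```python
import numpy as np

def soft_labels(dists, labels, classes, k=3):
    """Soft label distribution from the k nearest neighbors,
    weighted inversely by distance to the embedded test segment."""
    order = np.argsort(dists)[:k]           # indices of the k nearest neighbors
    w = 1.0 / (dists[order] + 1e-8)         # closer neighbors get larger weight
    p = np.array([w[labels[order] == c].sum() for c in classes])
    return p / p.sum()                      # normalize to a distribution

# Toy neighbors: distances in the learned space and their labels.
dists = np.array([0.1, 0.2, 5.0, 0.4])
labels = np.array(["cow", "cow", "sky", "grass"])
classes = ["cow", "grass", "sky"]
p = soft_labels(dists, labels, classes, k=3)
```

The distant "sky" segment falls outside the k = 3 neighborhood, so its class receives zero mass.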

Multi-Class Multi-Kernel Approach Large Margin Nearest Neighbor Multiple Kernel Extension Spatial Smoothing by Segment Merging Contextual Conditional Random Field

Spatial Smoothing by Segment Merging Because objects may be represented by multiple segments at test time, some of those segments contain only partial information about the object, resulting in less reliable label predictions. We therefore smooth a segment's label distribution by incorporating information from segments that are likely to come from the same object, resulting in an updated label distribution.

Spatial Smoothing by Segment Merging Using the extra segments, we train an SVM classifier to predict when two segments belong to the same object. By using the ground truth object annotations, we know when a pair of training segments came from the same object. Given two segments s_i and s_j, we compute: pixel and region interaction features; the overlap between segment masks; normalized segment centroids; the number of segments obtained in the segmentation; and the Euclidean distance between the two segment centroids.

Spatial Smoothing by Segment Merging We construct an undirected graph where each vertex is a segment, and edges are added between pairs that the classifier predicts should be merged; each connected component forms a new object segment. The smoothed label distribution is the geometric mean of the segment's distribution and its corresponding object's distribution.
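The two steps of this slide, grouping segments into objects via connected components and smoothing by a geometric mean, can be sketched as follows (the merge edges and distributions are toy values; a real system would take the edges from the SVM's predictions):

```python
import numpy as np

def merge_components(n, merge_edges):
    """Connected components of the segment graph via union-find;
    segments sharing a component id belong to the same merged object."""
    comp = list(range(n))
    def find(a):
        while comp[a] != a:
            comp[a] = comp[comp[a]]   # path compression
            a = comp[a]
        return a
    for a, b in merge_edges:
        comp[find(a)] = find(b)
    return [find(i) for i in range(n)]

def smooth(p_seg, p_obj):
    """Geometric mean of segment and object label distributions, renormalized."""
    q = np.sqrt(p_seg * p_obj)
    return q / q.sum()

comp = merge_components(4, [(0, 1), (2, 3)])          # two merged objects
q = smooth(np.array([0.8, 0.2]), np.array([0.5, 0.5]))  # smoothed distribution
```

The geometric mean pulls an over-confident segment distribution toward the pooled object evidence while keeping a valid probability distribution.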

Multi-Class Multi-Kernel Approach Large Margin Nearest Neighbor Multiple Kernel Extension Spatial Smoothing by Segment Merging Contextual Conditional Random Field

Contextual Conditional Random Field Pixel and region interactions can be described by low-level features, but object interactions require a high-level description, e.g., an object's label. We therefore follow the soft label prediction with a Conditional Random Field (CRF) that encodes high-level object interactions.

Contextual Conditional Random Field We learn potential functions from object co-occurrences, capturing long-distance dependencies between whole regions of the image and across classes. Treating the image as a bag of segments, c represents the vector of labels for the segments in the image; the final label vector is the value of c that maximizes the CRF's conditional probability.
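For a small bag of segments the maximizing label vector can be found by brute force. The sketch below scores each candidate labeling by its per-segment soft labels plus pairwise co-occurrence potentials; the unary and pairwise values are hypothetical and the scoring form is a simplification of the paper's CRF:

```python
import itertools
import numpy as np

def crf_map(unary, A):
    """Brute-force MAP over label vectors c for a bag of segments:
    score(c) = sum_i log unary[i, c_i] + sum_{i<j} log A[c_i, c_j].
    Feasible here only because the toy image has few segments."""
    n, n_cls = unary.shape
    best, best_score = None, -np.inf
    for c in itertools.product(range(n_cls), repeat=n):
        s = sum(np.log(unary[i, c[i]]) for i in range(n))
        s += sum(np.log(A[c[i], c[j]]) for i in range(n) for j in range(i + 1, n))
        if s > best_score:
            best, best_score = c, s
    return best

# Two segments, two classes; class 0 strongly co-occurs with itself.
unary = np.array([[0.6, 0.4], [0.4, 0.6]])
A = np.array([[5.0, 1.0], [1.0, 1.0]])
labels = crf_map(unary, A)
```

The co-occurrence potential overrides the second segment's slight unary preference for class 1, pulling both segments to the frequently co-occurring class.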

Outline Introduction Multi-Class Multi-Kernel Approach Contextual Interaction Experiment & Results Conclusion

Contextual Interactions In this section, we describe the features used to characterize each level of contextual interaction: pixel level interactions, region level interactions, and object level interactions.

Pixel Level Interaction Pixel level interactions can implicitly capture background contextual information as well as information about object boundaries. We use a new type of contextual source, boundary support.

Pixel Level Interaction The boundary support is encoded by computing a histogram over LAB color values of pixels within a fixed distance of the object's boundary. We compute the chi-squared distance between boundary support histograms H_i and H_j, and define the pixel interaction kernel as an exponential function of this distance.
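A minimal sketch of this construction, assuming normalized color histograms and a plain exponential kernel on the chi-squared distance (the exact bandwidth and normalization in the paper may differ):

```python
import numpy as np

def chi2(h1, h2, eps=1e-10):
    """Chi-squared distance between two normalized histograms."""
    return 0.5 * float(np.sum((h1 - h2) ** 2 / (h1 + h2 + eps)))

def pixel_kernel(h1, h2):
    """Exponential kernel on the chi-squared distance (one common choice)."""
    return float(np.exp(-chi2(h1, h2)))

h1 = np.array([0.5, 0.5])   # toy boundary support histograms
h2 = np.array([1.0, 0.0])
```

Identical histograms have distance zero and kernel value one; differing histograms yield a kernel value strictly between zero and one.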

Region Level Interaction By using large windows around an object, known as contextual neighborhoods [7], regions encode probable geometrical configurations, and capture information from neighboring (parts of) objects.

Region Level Interaction The contextual neighborhood is computed by dilating the bounding box around the object with a disk of fixed diameter. We model region interactions by computing the gist [31] of each contextual neighborhood, G_i, and represent them with a kernel over these gist descriptors.

Object Level Interactions To train the object interaction CRF, we derive semantic context from the co-occurrence of objects within each training image. A co-occurrence matrix A is computed, where A(i, j) counts the number of times an object with label c_i appears in a training image together with an object with label c_j. Diagonal entries correspond to the frequency of each object class in the training set.
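Building such a matrix can be sketched as below, assuming each training image is given as the list of class indices present in it (counting presence per image; the paper's exact counting convention may differ):

```python
import numpy as np

def cooccurrence(image_labels, n_classes):
    """A[i, j] counts images in which class i appears together with class j;
    the diagonal counts how often each class appears over the training set."""
    A = np.zeros((n_classes, n_classes), dtype=int)
    for labels in image_labels:
        present = sorted(set(labels))      # count each class once per image
        for i in present:
            A[i, i] += 1
            for j in present:
                if j != i:
                    A[i, j] += 1
    return A

# Toy training set: class indices present in each of three images.
A = cooccurrence([[0, 1], [0], [1, 2]], n_classes=3)
```

The resulting matrix is symmetric by construction, since co-occurrence within an image is an unordered relation.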

Outline Introduction Multi-Class Multi-Kernel Approach Contextual Interaction Experiment & Results Conclusion

Experiments Databases: MSRC and PASCAL 2007. Appearance features: SIFT, self-similarity (SSIM), LAB histogram, and Pyramid of Histograms of Oriented Gradients (PHOG). Context features: GIST and LAB color.

Results Object localization: mean accuracy results.

Result MSRC presents more co-occurrences of object classes per image than PASCAL, providing more information to the object interaction model.

Results Feature combination: learning the optimal embedding.

Results Learned kernel weights.

Results Comparison to other models: MSRC and PASCAL 07.

Outline Introduction Multi-Class Multi-Kernel Approach Contextual Interaction Experiment & Results Conclusion

Conclusion We have introduced a novel framework that efficiently and effectively combines different levels of local context interactions. Our multiple kernel learning algorithm integrates appearance features with pixel and region interaction data. We obtain significant improvement over current state-of-the-art contextual frameworks. By adding another object interaction type, such as spatial context [8], localization accuracy could be improved further.

Thank you!!!