Classifying Images with Visual/Textual Cues By Steven Kappes and Yan Cao.

Motivation: Image search. Building large sets of classified images. Robotics.

Background: Object recognition is unsolved. Deformable shape matching. Classification based on purely textual information. SVMs, PCA.

Image and Textual Feature Based Matching. Goal: a large dataset of labelled images of animals found online. Animals are difficult to classify.

Dataset: Web pages from Google text searches on 10 different animals are used as the data set. Data was also gathered from searches related to monkeys.

Training: Use LDA to discover highly likely words for each topic. Rank images based on their word likelihoods to get a set of 30 exemplars for each topic. Supervision.

Latent Dirichlet Allocation: Nearby words are likely to be relevant to the image. A probabilistic generative model finds 10 latent topics for each category. Discover 50 highly likely words for each topic.

LDA cont.: Assign each image to its most likely topic. Select the top 30 images for each topic as exemplars.

Supervision: Each topic is labelled as relevant or background. Topics are merged to form a single relevant group and a background group. Optional: allow the user to swap incorrectly labelled exemplars.

Testing: Independent voting based on textual information and 3 image feature types: shape, color, and texture. Image feature similarity is computed by nearest-neighbour comparison to features from the positive and negative exemplar groups. Compute the sum of the similarities of the image features matching the positive group.
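The nearest-neighbour vote can be sketched as below. The Euclidean distance and the `1/(1+d)` similarity are illustrative stand-ins, not the paper's exact measures:

```python
# Sketch of the image-feature vote: sum similarities of query features
# whose nearest exemplar feature comes from the positive group.
import numpy as np

def feature_vote(query_feats, pos_feats, neg_feats):
    score = 0.0
    for f in query_feats:
        d_pos = np.linalg.norm(pos_feats - f, axis=1).min()
        d_neg = np.linalg.norm(neg_feats - f, axis=1).min()
        if d_pos < d_neg:                 # nearest neighbour is positive
            score += 1.0 / (1.0 + d_pos)  # similarity of the match
    return score

# Toy usage: both query features match the positive exemplar exactly.
q = np.zeros((2, 3))
score = feature_vote(q, pos_feats=np.zeros((1, 3)),
                     neg_feats=np.ones((1, 3)) * 10)
```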

Textual: Sum the likelihood that the image belongs to the relevant topics of a category. Normalize by the maximal score over all images.

Shape – Geometric Blur: A local appearance descriptor. Apply a spatially varying blur. Robust to small affine distortions.

Shape: Compare geometric blur features and ensure there is a high similarity. Gating of geometric blur features: ensure the local color matches.

Color: Subdivide the image into 9 regions. Compute a normalized color histogram with 8 bins per color channel. Also compute color histograms around geometric blur features with a radius of 30 pixels.
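A minimal NumPy sketch of the region color histogram: the 3×3 grid and 8 bins per channel come from the slide; everything else (value range, concatenation order) is an assumption:

```python
# Color descriptor sketch: 3x3 grid of regions, each described by a
# normalized histogram with 8 bins per RGB channel.
import numpy as np

def color_descriptor(img):  # img: H x W x 3, integer values in [0, 256)
    h, w, _ = img.shape
    feats = []
    for i in range(3):
        for j in range(3):
            region = img[i*h//3:(i+1)*h//3, j*w//3:(j+1)*w//3]
            hist = [np.histogram(region[..., c], bins=8, range=(0, 256))[0]
                    for c in range(3)]
            hist = np.concatenate(hist).astype(float)
            feats.append(hist / hist.sum())    # normalize per region
    return np.concatenate(feats)               # 9 regions * 3 channels * 8 bins

desc = color_descriptor(np.random.randint(0, 256, (90, 90, 3)))
```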

Texture: Compute histograms of the outputs of many different filters.

Voting: A linear combination of the 4 voting features. All votes have equal weight.

Supervision: Supervision helps accuracy for smaller categories. Excluding exemplars selected by LDA can result in worse accuracy.

Results: Always more accurate than Google image search. False positives are often reasonable. Image features greatly improve accuracy over purely textual classification. Multiple features help recognize a wide range of images.

Voting Accuracy

Accuracy

Limitations: Requires light supervision. Because it relies on textual information, it can only be applied in situations where there is associated text.

Names and Faces. Goal: given an input image and an associated caption, detect the face(s) in the image and label them with the correct name(s) detected in the caption. Motivation: build a rich, reasonably accurate collection of labeled faces.

Names and Faces

Names: Extract names found in the captions; identify two or more capitalized words followed by a present-tense verb. Faces: run a face detector. Rectification – use an SVM to detect 5 feature points on the face, then apply an affine transformation.

Face Representation: All faces are resized to 86×86 pixels. The RGB values from each face are concatenated into a long vector. The goal is a space where vectors of the same face are close together and vectors of different faces are far apart.
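The face-to-vector step can be sketched as below; the nearest-neighbour resize is an illustrative stand-in for proper resampling:

```python
# Face representation sketch: resize each face to 86x86 and flatten the
# RGB values into one vector (86 * 86 * 3 = 22188 dimensions).
import numpy as np

def face_vector(face):  # face: H x W x 3 uint8 image
    h, w, _ = face.shape
    ys = np.arange(86) * h // 86          # nearest-neighbour row indices
    xs = np.arange(86) * w // 86          # nearest-neighbour column indices
    resized = face[ys][:, xs]             # 86 x 86 x 3
    return resized.astype(float).ravel()  # length 22188

v = face_vector(np.random.randint(0, 256, (100, 120, 3), dtype=np.uint8))
```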

Background: Kernel PCA (Principal Component Analysis) – discard components that are similar for all faces to reduce dimensionality; in the coordinate system set up by the principal components, the images have the widest distribution. LDA (Linear Discriminant Analysis) – provides a linear space that best separates different faces for discrimination.

Kernel PCA in Names and Faces: Compute a kernel matrix K, where K_ij is the value of the kernel function (a Gaussian kernel here) comparing image i and image j. Due to the huge image set, the N×N kernel matrix would have about 2×10^9 elements, so the Nyström approximation is used to calculate the eigenvectors of K; C is estimated by Ĉ.
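A toy sketch of the Nyström idea: approximate the eigenvectors of a large Gaussian kernel matrix from a random subset of its columns. The sizes, bandwidth, and subset size `m` here are illustrative (the paper's N is far larger):

```python
# Nystrom sketch: eigendecompose a small m x m block W of the kernel
# matrix, then extend its eigenvectors to all N points.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))            # 200 stand-in "face vectors"
m = 20                                   # subsample size

def gauss(A, B, sigma=2.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma**2))

idx = rng.choice(len(X), m, replace=False)
C = gauss(X, X[idx])                     # N x m block of columns of K
W = C[idx]                               # m x m block

vals, vecs = np.linalg.eigh(W)
keep = vals > 1e-10                      # drop numerically zero modes
U = C @ vecs[:, keep] / vals[keep]       # Nystrom-extended eigenvectors
# K is then approximated by U @ diag(vals) @ U.T
```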

LDA in Names and Faces: After applying kernel PCA, the dimensionality of the data points (here, the faces) is sharply reduced, so the covariance matrix of the reduced input vectors is not huge. Project all images into a linear space where different faces are best separated.

Modified k-Means: Step 1: randomly assign each face to a name. Step 2: for the faces assigned to each name, calculate the mean of the image vectors. Step 3: reassign each image to the name whose mean vector is closest to it. Repeat steps 2 and 3 until convergence.
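The three steps can be sketched as below, on toy vectors in place of face representations (the empty-cluster guard is an implementation detail added here, not from the slide):

```python
# Modified k-means sketch: names act as clusters over face vectors.
import numpy as np

rng = np.random.default_rng(1)
faces = rng.normal(size=(30, 4))                  # toy face vectors
n_names = 3

assign = rng.integers(0, n_names, len(faces))     # step 1: random assignment
for _ in range(50):
    # Step 2: mean vector per name (re-seed a cluster if it empties).
    means = np.array([faces[assign == k].mean(axis=0) if (assign == k).any()
                      else faces[rng.integers(len(faces))]
                      for k in range(n_names)])
    # Step 3: reassign each face to the name with the closest mean.
    d = np.linalg.norm(faces[:, None] - means[None], axis=2)
    new = d.argmin(axis=1)
    if (new == assign).all():                     # converged
        break
    assign = new
```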

Prune and Merge: Remove clusters with fewer than three faces. Remove points with low likelihood to get low error rates, where likelihood = P(face is from assigned cluster) / P(face is not from assigned cluster). Merge clusters with small distances between their means.
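The pruning and merging steps can be sketched as follows; the thresholds and the dictionary representation are illustrative assumptions:

```python
# Prune/merge sketch: drop tiny clusters, then merge clusters whose
# means are close.
import numpy as np

def prune_clusters(clusters, min_size=3):
    """clusters: dict name -> (n, d) array of face vectors."""
    return {k: v for k, v in clusters.items() if len(v) >= min_size}

def merge_close(clusters, min_dist=1.0):
    merged = dict(clusters)
    for a in list(clusters):
        for b in list(clusters):
            if a < b and a in merged and b in merged:
                if np.linalg.norm(merged[a].mean(0) - merged[b].mean(0)) < min_dist:
                    merged[a] = np.vstack([merged[a], merged.pop(b)])
    return merged

# Toy usage: "C" is too small; "A" and "B" are near-duplicates.
clusters = {"A": np.zeros((4, 2)), "B": np.zeros((5, 2)) + 0.1,
            "C": np.ones((2, 2))}
pruned = prune_clusters(clusters)
merged = merge_close(pruned)
```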

Example

Evaluation: Fairly good assignment of names to faces using simple models for images and names.

Limitations: Random assignment of faces in k-means. Use of raw RGB pixel values to discriminate between faces of different people.

Bag-of-features for scene categorization: The bag-of-features method represents an image as an orderless collection of local features. It disregards all information about the spatial layout of the features, and is incapable of capturing shape or of segmenting an object from its background.

Spatial Pyramid Matching (review): Compute rough geometric correspondence on a global scale. Repeatedly subdivide the image and compute histograms of local features at increasingly fine resolutions. It is a “locally orderless matching” method that achieves good performance in estimating overall perceptual similarity between images.

Spatial Pyramid Matching (review) Example

Pyramid Matching Mechanism (review): Place a sequence of increasingly coarse grids over the feature space and take a weighted sum of the number of matches that occur at each level of resolution. Two points are said to match if they fall into the same cell of the grid; matches found at finer resolutions are weighted more highly than matches found at coarser resolutions.
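A minimal 1-D sketch of this mechanism, assuming features in [0, 1): compute histogram intersections at each grid level and weight matches new to each level, finer levels higher:

```python
# Pyramid match sketch: intersections at grids of 2^l bins, with matches
# first found at level l weighted by 1 / 2^(levels - l).
import numpy as np

def pyramid_match(x, y, levels=3):
    score, prev = 0.0, 0
    for l in range(levels, -1, -1):          # finest grid first
        hx, _ = np.histogram(x, bins=2**l, range=(0, 1))
        hy, _ = np.histogram(y, bins=2**l, range=(0, 1))
        inter = np.minimum(hx, hy).sum()     # matches at this resolution
        new = inter - prev                   # matches new to this level
        score += new / 2 ** (levels - l)     # finer => higher weight
        prev = inter
    return score
```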

Equation for Kernel
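For reference, the pyramid match kernel described on the previous slide combines the histogram intersections \(\mathcal{I}^{l}\) at levels \(0,\dots,L\) (level \(L\) finest) as:

```latex
\kappa^{L}(X,Y)
  = \mathcal{I}^{L} + \sum_{l=0}^{L-1} \frac{1}{2^{\,L-l}}\left(\mathcal{I}^{l}-\mathcal{I}^{l+1}\right)
  = \frac{1}{2^{L}}\,\mathcal{I}^{0} + \sum_{l=1}^{L} \frac{1}{2^{\,L-l+1}}\,\mathcal{I}^{l}
```

Here \(\mathcal{I}^{l}-\mathcal{I}^{l+1}\) counts the matches first found at level \(l\), so matches at finer resolutions receive higher weight, as stated above.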

Spatial Pyramid Matching in Labeling Images (Preparation stage): Convert images to grayscale. Feature extraction – features are computed on a dense regular grid instead of at interest points; SIFT descriptors of 16×16 pixel patches. Set up a vocabulary (classify features) – k-means merges similar features; k = 200 and k = 400.
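The vocabulary step can be sketched with k-means as below; random vectors stand in for dense SIFT descriptors, and a toy k replaces the slide's 200/400:

```python
# Visual-vocabulary sketch: cluster local descriptors with k-means so
# each descriptor can be quantized to its nearest cluster ("visual word").
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
descriptors = rng.normal(size=(500, 128))   # stand-in 128-D SIFT descriptors

k = 8                                       # toy vocabulary size
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(descriptors)
words = km.predict(descriptors)             # visual-word index per patch
hist = np.bincount(words, minlength=k)      # bag-of-words histogram
```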

Spatial Pyramid Matching in Labeling Images (Matching): Treat each type of feature as a channel. For each channel, apply pyramid matching to get kernel values for each pair of images. Sum the kernel values between images X and Y over all channels to get the final kernel between X and Y.

Spatial Pyramid Matching in Labeling Images (Discriminating): SVM with the one-versus-all rule: a classifier is learned to separate each class from the rest, and a test image is assigned the label of the classifier with the highest response.
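One-versus-all classification with a precomputed kernel can be sketched as below; a linear kernel on toy separable data stands in for the spatial pyramid kernel:

```python
# One-vs-all SVM sketch with a precomputed kernel matrix: one binary
# classifier per class; a test image gets the highest-response label.
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)
y_train = np.repeat([0, 1, 2], 20)
X_train = rng.normal(size=(60, 10)) + y_train[:, None] * 3.0
K_train = X_train @ X_train.T            # stand-in kernel (here: linear)

clf = OneVsRestClassifier(SVC(kernel="precomputed")).fit(K_train, y_train)

y_test = np.repeat([0, 1, 2], 2)
X_test = rng.normal(size=(6, 10)) + y_test[:, None] * 3.0
pred = clf.predict(X_test @ X_train.T)   # kernel between test and train
```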

Spatial Pyramid Matching in Labeling Images

Spatial Pyramid Matching in Labeling Images – Classification Results: the authors’ dataset and the Caltech-101 dataset.

Spatial Pyramid Matching in Labeling Images – Classification Results (continued): the Graz dataset.

Discussion: Simple – uses global cues as indirect evidence about the presence of an object; explicit object models are not necessary. Accurate – achieves an improvement over the “bag-of-features” image representation.

Applications: A stand-alone scene categorizer, or a “context” module integrated into a larger object recognition system.