Using the Forest to see the Trees: A computational model relating features, objects and scenes. Antonio Torralba, CSAIL-MIT. Joint work with Aude Oliva, Kevin Murphy, William Freeman, Monica Castelhano, and John Henderson.

Using the Forest to see the Trees: A computational model relating features, objects and scenes. Antonio Torralba, CSAIL-MIT. Joint work with Aude Oliva, Kevin Murphy, William Freeman, Monica Castelhano, John Henderson.

From objects to scenes. [Diagram: image I → local features L → objects O1, O2, … → scene type S ∈ {street, office, …}; object localization.] Riesenhuber & Poggio (99); Vidal-Naquet & Ullman (03); Serre & Poggio (05); Agarwal & Roth (02); Moghaddam & Pentland (97); Turk & Pentland (91); Heisele et al. (01); Krempp, Geman & Amit (02); Dorko & Schmid (03); Fergus, Perona & Zisserman (03); Fei-Fei, Fergus & Perona (03); Schneiderman & Kanade (00); Lowe (99).

From scenes to objects. [Diagram: image I → local features L and global gist features G; G → scene type S ∈ {street, office, …}; S and L → objects O1, O2, …; object localization.]

The context challenge. What do you think are the hidden objects (1 and 2)? Biederman et al., 82; Bar & Ullman, 93; Palmer, 75.

The context challenge. What do you think are the hidden objects? Answering this question does not require knowing what the objects look like; it is all about context. Chance ~ 1/30000.

From scenes to objects. [Diagram: image I → local features L and global gist features G; G → scene type S ∈ {street, office, …}.]

Scene categorization: Office, Corridor, Street. Oliva & Torralba, IJCV'01; Torralba, Murphy, Freeman, Rubin, CVPR 03.

Place identification: Office 610, Office 615, Draper street, and 59 other places… Scenes are categories, places are instances.

Supervised learning. Training pairs {v_g, Office}, {v_g, Office}, {v_g, Corridor}, {v_g, Street}, … are fed to a classifier.

Supervised learning. Training pairs {v_g, Office}, {v_g, Office}, {v_g, Corridor}, {v_g, Street}, … are fed to a classifier. Which feature vector should represent a whole image?
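As a rough illustration of this supervised-learning step (a minimal sketch, not the authors' code; the gist vectors and scene labels below are random placeholders), an off-the-shelf classifier can be trained directly on 80-dimensional gist vectors:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    gist_vectors = rng.standard_normal((300, 80))                    # one 80-dim gist vector per training image (placeholder)
    labels = rng.choice(["office", "corridor", "street"], size=300)  # placeholder scene labels

    clf = LogisticRegression(max_iter=1000).fit(gist_vectors, labels)
    print(clf.predict(gist_vectors[:5]))                             # predicted scene category for five images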

Global features (gist). First, we propose a set of features that do not encode specific object information. Oliva & Torralba, IJCV'01; Torralba, Murphy, Freeman, Rubin, CVPR 03.

Global features (gist). v = {energy at each orientation and scale} = 6 x 4 dimensions; PCA reduces these measurements to 80 features, the global gist descriptor G. First, we propose a set of features that do not encode specific object information. Oliva & Torralba, IJCV'01; Torralba, Murphy, Freeman, Rubin, CVPR 03.
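One plausible way to compute such orientation-and-scale energies is sketched below; the frequency-domain band masks and the global pooling are assumptions for illustration, not the exact filters of Oliva & Torralba. In the full model the energies are measured over a spatial grid before PCA compresses them to the 80-dimensional gist.

    import numpy as np

    def band_energy(image, n_orient=6, n_scales=4):
        """Mean spectral energy in oriented frequency bands: n_scales x n_orient values."""
        F = np.fft.fftshift(np.fft.fft2(image))
        h, w = image.shape
        fy, fx = np.meshgrid(np.fft.fftshift(np.fft.fftfreq(h)),
                             np.fft.fftshift(np.fft.fftfreq(w)), indexing="ij")
        radius = np.hypot(fx, fy)
        angle = np.arctan2(fy, fx) % np.pi
        feats = []
        for s in range(n_scales):
            lo, hi = 0.5 / 2 ** (s + 1), 0.5 / 2 ** s              # dyadic radial band
            for o in range(n_orient):
                center = o * np.pi / n_orient                       # wedge orientation
                d = np.abs(((angle - center + np.pi / 2) % np.pi) - np.pi / 2)
                mask = (radius >= lo) & (radius < hi) & (d < np.pi / (2 * n_orient))
                feats.append(float(np.mean(np.abs(F[mask]) ** 2)) if mask.any() else 0.0)
        return np.array(feats)                                      # 6 x 4 = 24 raw energies

    features = band_energy(np.random.rand(128, 128))                # example call on a random image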

Example visual gists: two images I and I' for which global features(I) ~ global features(I'). Cf. "Pyramid-Based Texture Analysis/Synthesis", Heeger and Bergen, SIGGRAPH 1995.

Learning to recognize places. Hidden states = location (63 values); observations = v_G^t (80 dimensions). The transition matrix encodes the topology of the environment; the observation model is a mixture of Gaussians centered on prototypes (100 views per place). [Map excerpt: Office 610, Office 617, Corridor 6b, Corridor 6c.] We use annotated sequences for training.
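A minimal sketch of the filtering step this slide describes, assuming a made-up transition matrix, made-up prototype views, and an isotropic Gaussian mixture as the observation model:

    import numpy as np

    def place_filter(gists, prototypes, A, prior, sigma=1.0):
        """gists: (T, D) gist sequence; prototypes: one (K, D) array of stored views per place;
        A: (P, P) row-stochastic transition matrix; prior: (P,) initial belief over places."""
        belief = prior.copy()
        beliefs = []
        for g in gists:
            # observation likelihood: mixture of isotropic Gaussians centered on the prototypes
            lik = np.array([
                np.mean(np.exp(-np.sum((proto - g) ** 2, axis=1) / (2 * sigma ** 2)))
                for proto in prototypes
            ])
            belief = lik * (A.T @ belief)          # predict through the topology, then reweight
            belief /= belief.sum()
            beliefs.append(belief.copy())
        return np.array(beliefs)                    # per-frame posterior over places

    # Example with 3 places (the real system used 63) and 100 stored views per place:
    P, D = 3, 80
    A = np.array([[0.8, 0.2, 0.0], [0.1, 0.8, 0.1], [0.0, 0.2, 0.8]])
    prototypes = [np.random.randn(100, D) + i for i in range(P)]
    posteriors = place_filter(np.random.randn(20, D), prototypes, A, np.ones(P) / P)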

Wearable test-bed v1 Kevin Murphy

Wearable test-bed v2

Place/scene recognition demo

From scenes to objects. [Diagram: image I → local features L and global gist features G; G → scene type S ∈ {street, office, …}; S and L → objects O1, O2, …; object localization.]

Global scene features predict object location. Given a new image, the gist vector v_g selects the image regions likely to contain the target.

Global scene features predict object location. Training set (cars): pairs {v_g, x_1}, {v_g, x_2}, {v_g, x_3}, {v_g, x_4}, … The goal of training is to learn the association between the location of the target and the global scene features.

Global scene features predict object location. [Scatter plots: estimated vs. true vertical location (Y) and estimated vs. true horizontal location (X) of people.]
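To make the idea concrete, here is a hedged sketch with hypothetical data; plain ridge regression stands in for whatever regressor the original system used to map the gist vector to the vertical location of the target.

    import numpy as np
    from sklearn.linear_model import Ridge

    rng = np.random.default_rng(1)
    gists = rng.standard_normal((500, 80))    # gist vector per annotated training image (placeholder)
    true_y = rng.uniform(0, 1, size=500)      # annotated vertical position of the target (placeholder)

    reg = Ridge(alpha=1.0).fit(gists, true_y)
    predicted_y = reg.predict(gists[:3])      # predicted vertical location for new scenes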

The layered structure of scenes: p(x_2 | x_1) vs. p(x). In a display with multiple targets present, the location of one target constrains the 'y' coordinate of the remaining targets, but not the 'x' coordinate.

Global scene features predict object location. Stronger contextual constraints can be obtained using other objects.

Attentional guidance. Local features drive saliency. Saliency models: Koch & Ullman, 85; Wolfe, 94; Itti, Koch, Niebur, 98; Rosenholtz, 99.
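A toy bottom-up saliency map in the spirit of these models (intensity center-surround only; not any of the cited implementations) could be computed as follows:

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def simple_saliency(gray, surround_sigmas=(2, 4, 8)):
        """gray: (H, W) image in [0, 1]; returns a normalized center-surround saliency map."""
        saliency = np.zeros_like(gray)
        center = gaussian_filter(gray, sigma=1)
        for s in surround_sigmas:
            surround = gaussian_filter(gray, sigma=s)
            saliency += np.abs(center - surround)      # center-surround contrast at this scale
        return saliency / saliency.max()

    saliency_map = simple_saliency(np.random.rand(240, 320))   # example on a random image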

Attentional guidance. Local features drive saliency; global features drive a scene prior, modulated by the task. Torralba, 2003; Oliva, Torralba, Castelhano, Henderson, ICIP 2003.

Attentional guidance. Local features drive saliency; global features drive a scene prior; the task selects both the object model and the scene prior.
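One way the two pathways could be combined is sketched below; the Gaussian band over the expected vertical target location and the weighting exponent gamma are assumptions for illustration, not values from the talk.

    import numpy as np

    def guidance_map(saliency, expected_y, sigma_y=0.1, gamma=0.5):
        """saliency: (H, W) map in [0, 1]; expected_y: vertical target location in [0, 1]
        predicted from the global gist features."""
        H, _ = saliency.shape
        rows = np.linspace(0, 1, H)[:, None]
        scene_prior = np.exp(-(rows - expected_y) ** 2 / (2 * sigma_y ** 2))   # task-driven prior over rows
        combined = (saliency ** gamma) * scene_prior                           # saliency modulated by context
        return combined / combined.max()

    g = guidance_map(np.random.rand(240, 320), expected_y=0.6)   # placeholder saliency map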

Comparison of regions of interest. Torralba, 2003; Oliva, Torralba, Castelhano, Henderson, ICIP 2003.

Comparison of regions of interest: saliency predictions (10%, 20%, 30% regions). Torralba, 2003; Oliva, Torralba, Castelhano, Henderson, ICIP 2003.

Comparison of regions of interest: saliency predictions vs. saliency + global scene priors (10%, 20%, 30% regions). Torralba, 2003; Oliva, Torralba, Castelhano, Henderson, ICIP 2003.

Comparison of regions of interest: dots correspond to fixations 1-4; saliency predictions (10%, 20%, 30% regions). Torralba, 2003; Oliva, Torralba, Castelhano, Henderson, ICIP 2003.

Comparison of regions of interest: dots correspond to fixations 1-4; saliency predictions vs. saliency + global scene priors (10%, 20%, 30% regions). Torralba, 2003; Oliva, Torralba, Castelhano, Henderson, ICIP 2003.

Results. [Plots: % of fixations inside the saliency region vs. the contextual region, as a function of fixation number, for scenes without people and scenes with people. Chance level: 33%.]

Task modulation. Local features drive saliency; global features drive a scene prior, modulated by the task. Torralba, 2003; Oliva, Torralba, Castelhano, Henderson, ICIP 2003.

Task modulation: mug search vs. painting search; saliency predictions vs. saliency + global scene priors.

Discussion. From the computational perspective, scene context can be derived from global image properties and predicts where objects are most likely to be. Scene context considerably improves predictions of fixation locations. A complete model of attention guidance in natural scenes requires both saliency and contextual pathways.