The Visual Recognition Machine
Jitendra Malik, University of California at Berkeley


From images to objects
Labeled sets: tiger, grass, etc.

Recognition
Possible for both instances and object classes (Mona Lisa vs. faces, or Beetle vs. cars)
Tolerant to changes in pose and illumination

Three stages
Segmentation: Images → Regions
Association: Regions → Super-regions
Matching: Super-regions → Prototype views


Boundaries of image regions are defined by a number of attributes:
– Brightness/color
– Texture
– Motion
– Stereoscopic depth
– Familiar configuration

Image Segmentation as Graph Partitioning
Build a weighted graph G = (V, E) from the image:
– V: image pixels
– E: connections between pairs of nearby pixels
Partition the graph so that similarity within each group is large and similarity between groups is small: Normalized Cuts [Shi & Malik 97]

Some Terminology for Graph Partitioning How do we bipartition a graph:
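The formulas from this slide are not reproduced in the transcript. As a hedged reconstruction in the notation of Shi and Malik's Normalized Cuts formulation (the symbols A, B and w are assumptions here, since the slide text does not show them), a bipartition of V into disjoint sets A and B is described by two quantities:

```latex
\[
\mathrm{cut}(A,B) \;=\; \sum_{u \in A,\; v \in B} w(u,v),
\qquad
\mathrm{assoc}(A,V) \;=\; \sum_{u \in A,\; t \in V} w(u,t).
\]
```

Here cut(A,B) is the total weight of the edges severed by the partition, and assoc(A,V) is the total connection from the nodes in A to all nodes in the graph.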

Normalized Cut, a measure of dissimilarity
Minimum cut is not appropriate since it favors cutting off small pieces.
Normalized Cut, Ncut:
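The Ncut formula itself is not preserved in this transcript. The sketch below assumes the definition from Shi and Malik's paper, Ncut(A,B) = cut(A,B)/assoc(A,V) + cut(A,B)/assoc(B,V), and uses a small invented graph (two tight triangles joined by a bridge, plus one weakly attached outlier node) to illustrate why minimum cut prefers small pieces while Ncut does not. All node numbers and weights are hypothetical.

```python
def build_weights(edges, n):
    """Symmetric weight matrix from (u, v, w) triples."""
    W = [[0.0] * n for _ in range(n)]
    for u, v, w in edges:
        W[u][v] = W[v][u] = w
    return W


def cut(W, A, B):
    """Total weight of edges running between A and B."""
    return sum(W[u][v] for u in A for v in B)


def assoc(W, A):
    """Total connection from nodes in A to every node in the graph."""
    return sum(W[u][t] for u in A for t in range(len(W)))


def ncut(W, A, B):
    """Normalized cut (assumed definition, see the text above)."""
    c = cut(W, A, B)
    return c / assoc(W, A) + c / assoc(W, B)


# Hypothetical toy graph: triangle {0,1,2}, triangle {3,4,5},
# a bridge 2-3, and an outlier node 6 weakly attached to node 5.
edges = [
    (0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
    (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0),
    (2, 3, 0.5),
    (5, 6, 0.2),
]
W = build_weights(edges, 7)

outlier_split = ([6], [0, 1, 2, 3, 4, 5])
balanced_split = ([0, 1, 2], [3, 4, 5, 6])

cut_outlier = cut(W, *outlier_split)
cut_balanced = cut(W, *balanced_split)
ncut_outlier = ncut(W, *outlier_split)
ncut_balanced = ncut(W, *balanced_split)

print("cut :", cut_outlier, cut_balanced)
print("Ncut:", ncut_outlier, ncut_balanced)
```

On this graph the plain cut value is smaller for the split that isolates the outlier, whereas Ncut is smaller for the balanced split, because isolating a single node makes cut(A,B)/assoc(A,V) close to 1.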

Solving the Normalized Cut problem
Exact discrete solution to Ncut is NP-complete even on a regular grid [Papadimitriou ’97].
Drawing on spectral graph theory, a good approximation can be obtained by solving a generalized eigenvalue problem.

Normalized Cut as a generalized eigenvalue problem
After simplification, we get:
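The equation from this slide is not preserved in the transcript. As a hedged reconstruction following Shi and Malik's derivation, let W be the matrix of edge weights, D the diagonal matrix with D_ii = Σ_j w_ij, and y an indicator vector over the pixels (y and b are symbols assumed from that paper; D and W appear on the following slide):

```latex
\[
\min_{A,B}\,\mathrm{Ncut}(A,B)
\;=\;
\min_{y}\,\frac{y^{T}(D-W)\,y}{y^{T}D\,y}
\quad\text{subject to}\quad
y_i \in \{1,\,-b\},\qquad y^{T}D\,\mathbf{1}=0 .
\]

\[
\text{Relaxing } y \text{ to real values gives}\qquad (D-W)\,y \;=\; \lambda\,D\,y .
\]
```

The eigenvector with the second-smallest eigenvalue is the real-valued approximation to the optimal partition; the smallest eigenvalue is 0, with a constant eigenvector that carries no partition information.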

Computational Aspects
Solving for the generalized eigensystem:
(D-W) is of size N × N, but it is sparse with O(N) nonzero entries, where N is the number of pixels.
It can be solved using the Lanczos algorithm.
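A minimal sketch of this computation, under stated assumptions: the 4-connected neighbourhood, the Gaussian brightness affinity, its `sigma`, and the threshold at zero are illustrative choices not given on the slides, and the function names are hypothetical. The generalized problem (D-W) y = λ D y is solved in the equivalent symmetric form D^(-1/2) W D^(-1/2) z = (1-λ) z with y = D^(-1/2) z, using SciPy's `eigsh`, which wraps ARPACK's implicitly restarted Lanczos method.

```python
import numpy as np
from scipy.sparse import coo_matrix, diags
from scipy.sparse.linalg import eigsh


def grid_affinity(image, sigma=1.0):
    """Sparse N x N affinity matrix for a 2-D grayscale image.

    Each pixel is linked to its 4 neighbours with weight
    exp(-(I_i - I_j)^2 / sigma^2). Both choices are assumptions.
    """
    h, w = image.shape
    idx = np.arange(h * w).reshape(h, w)
    flat = image.ravel().astype(float)
    # Horizontal pairs followed by vertical pairs.
    src = np.concatenate([idx[:, :-1].ravel(), idx[:-1, :].ravel()])
    dst = np.concatenate([idx[:, 1:].ravel(), idx[1:, :].ravel()])
    weights = np.exp(-((flat[src] - flat[dst]) ** 2) / sigma ** 2)
    W = coo_matrix((weights, (src, dst)), shape=(h * w, h * w))
    return (W + W.T).tocsr()  # symmetric, O(N) nonzeros


def ncut_bipartition(W):
    """Boolean label per node from the second generalized eigenvector."""
    d = np.asarray(W.sum(axis=1)).ravel()
    inv_sqrt_d = 1.0 / np.sqrt(d)
    # Symmetric normalized affinity; its largest eigenvalues correspond
    # to the smallest eigenvalues of (D - W) y = lambda D y.
    S = diags(inv_sqrt_d) @ W @ diags(inv_sqrt_d)
    vals, vecs = eigsh(S, k=2, which="LA")
    order = np.argsort(vals)
    z = vecs[:, order[-2]]   # second-largest of S
    y = inv_sqrt_d * z       # back to the generalized eigenvector
    return y > 0             # thresholding at 0 is an assumed rule


# Tiny synthetic image: dark left half, bright right half.
image = np.zeros((6, 10))
image[:, 5:] = 1.0
W = grid_affinity(image, sigma=1.0)
labels = ncut_bipartition(W).reshape(image.shape)
print(labels.astype(int))
```

The sign of an eigenvector is arbitrary, so which half receives `True` can differ between runs; only the grouping is meaningful.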


Association
The number of super-regions of size k in an image with n regions is approximately (4**k)*n/k.
For typical images, this ranges between 1000 and [upper value missing].
Plausibility ordering could reduce the effective number substantially.
Computing time for this stage is negligible.
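To get a feel for how quickly this estimate grows, the formula (4**k)*n/k can be evaluated directly. The region count n = 100 and the range of k below are hypothetical values chosen for illustration; the slide does not state which n and k it considers typical.

```python
def approx_super_regions(n, k):
    """Approximate number of super-regions of size k among n regions,
    using the (4**k) * n / k estimate quoted on the slide."""
    return (4 ** k) * n / k


n_regions = 100  # hypothetical number of regions in one image
for k in range(1, 6):
    print(k, approx_super_regions(n_regions, k))
```

The count grows roughly by a factor of four with each extra region in the group, which is why an ordering by plausibility matters more as k increases.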


Matching
Objects are represented by a set of prototypical views (~10 per object).
For each super-region S, calculate the probability that it is an instance of view V.
Determine the most probable labeling of the image into objects.

Matching super-regions to views
Based on color, texture and shape similarity
Color and texture matching is relatively well understood and fast
Shape matching is difficult because the algorithm should tolerate pose, illumination and intra-category variation
GOAL: small misclassification error with few views.

Core idea
Find corresponding points on the two shapes and use those to deform the prototype into alignment
Allowing this flexibility reduces the number of prototype views needed

MNIST Handwritten Digits

Digit Prototypes

Matching with original and deformed prototypes
[Figure columns: Prototype, Test, Error]

Deforming prototypes using thin plate splines
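As a rough illustration of the deformation step, the sketch below fits a standard 2-D thin plate spline that carries a set of prototype points onto their corresponding points on the test shape, then warps arbitrary points with it. It assumes the correspondences are already known; the function names, the optional `reg` smoothing term and the sample points are hypothetical and are not taken from the slides.

```python
import numpy as np


def _tps_kernel(r2):
    """U(r) = r^2 log r expressed through r^2, with U(0) = 0."""
    out = np.zeros_like(r2)
    mask = r2 > 0
    out[mask] = 0.5 * r2[mask] * np.log(r2[mask])
    return out


def fit_tps(src, dst, reg=0.0):
    """Fit a 2-D thin plate spline mapping src points onto dst points.

    reg > 0 relaxes exact interpolation (assumed optional smoothing).
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    n = src.shape[0]
    d2 = ((src[:, None, :] - src[None, :, :]) ** 2).sum(axis=-1)
    K = _tps_kernel(d2) + reg * np.eye(n)
    P = np.hstack([np.ones((n, 1)), src])  # affine part: 1, x, y
    L = np.zeros((n + 3, n + 3))
    L[:n, :n] = K
    L[:n, n:] = P
    L[n:, :n] = P.T
    rhs = np.zeros((n + 3, 2))
    rhs[:n] = dst
    params = np.linalg.solve(L, rhs)       # n kernel weights + 3 affine rows
    return src, params


def apply_tps(model, pts):
    """Warp arbitrary points with a fitted spline."""
    src, params = model
    pts = np.asarray(pts, dtype=float)
    d2 = ((pts[:, None, :] - src[None, :, :]) ** 2).sum(axis=-1)
    U = _tps_kernel(d2)
    P = np.hstack([np.ones((len(pts), 1)), pts])
    n = len(src)
    return U @ params[:n] + P @ params[n:]


# Hypothetical correspondences: a unit square whose centre is displaced.
prototype_pts = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [0.5, 0.5]], float)
test_pts = prototype_pts.copy()
test_pts[4] = [0.6, 0.55]

model = fit_tps(prototype_pts, test_pts)
warped = apply_tps(model, prototype_pts)
print(np.round(warped, 3))
```

With `reg=0.0` the spline interpolates the correspondences exactly, so the warped prototype points land on the test points; points in between are moved smoothly.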

Only 25 deformable templates needed (instead of 60 K) to get 5% error

COIL Object Database

Computing cost on a Pentium PC
Segmentation: 5 minutes/image
Matching: 0.5 sec/match

Cost on a 10**4 node machine
Segmentation: 0.03 sec/image, which is 30 Hz (video rate)
Matching: 20K matches/sec at full resolution (100 points/shape)
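These figures are consistent with dividing the single-PC costs (5 minutes per image for segmentation, 0.5 sec per match) evenly across 10**4 nodes. The check below assumes ideal linear speedup with no communication overhead, which is an assumption of this sketch rather than something the slides state.

```python
NODES = 10 ** 4
SEGMENTATION_SEC_SINGLE = 5 * 60   # 5 minutes per image on one PC
MATCH_SEC_SINGLE = 0.5             # seconds per match on one PC

# Ideal linear scaling (assumed): every node contributes equally.
segmentation_sec_parallel = SEGMENTATION_SEC_SINGLE / NODES
segmentation_rate_hz = 1.0 / segmentation_sec_parallel
matches_per_sec_parallel = NODES / MATCH_SEC_SINGLE

print(segmentation_sec_parallel)   # seconds per image
print(segmentation_rate_hz)        # images per second, roughly video rate
print(matches_per_sec_parallel)    # matches per second
```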

How many prototype views can one match at 1 Hz?
1K candidate super-regions
Consider only 1% of matches at full resolution (10% pass the color/texture filter, 10% of those pass the low-resolution shape filter)
If half the time is spent in pruning and half in full-resolution matching, 1000 prototype views can be matched at 1 Hz.
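The estimate of 1000 views can be retraced step by step. The sketch below uses only the numbers quoted on the slides (20K full-resolution matches per second, 1K super-regions, two 10% filters, half of the time spent on full-resolution matching); the function and parameter names are hypothetical.

```python
def views_matched_per_cycle(
    matches_per_sec=20_000,     # full-resolution matching throughput
    super_regions=1_000,        # candidate super-regions per image
    color_texture_pass=0.10,    # fraction passing the color/texture filter
    low_res_shape_pass=0.10,    # fraction of those passing the shape filter
    full_res_time_share=0.5,    # share of time spent on full-resolution matching
    rate_hz=1.0,                # one recognition cycle per second
):
    """Number of prototype views that fit into one recognition cycle."""
    full_res_fraction = color_texture_pass * low_res_shape_pass
    full_res_matches_per_view = super_regions * full_res_fraction
    full_res_budget = matches_per_sec * full_res_time_share / rate_hz
    return full_res_budget / full_res_matches_per_view


views = views_matched_per_cycle()
print(round(views))
```

Each prototype view costs about 10 full-resolution matches (1% of 1K super-regions), and half a second of matching at 20K matches/sec provides about 10K such matches, which gives roughly 1000 views.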

What can one do with matching 1000 views a second?
Worst case: 100 object categories (1000 views at ~10 views per object)
Best case depends on how well one can exploit context, hierarchy and hashing.
Cf. humans can recognize [number missing] K objects

Memory requirements
10 K object categories * 10 views/category * 100 * 100 pixels/view * 1 byte/pixel gives us 1 Gigabyte.

Concluding remarks
Speech in 1985 was in the same state as vision in [year missing]. The adoption of Hidden Markov Models led to a decade of research which refined the paradigm for continuous speech recognition.
The proposed three-stage framework for recognition (segmentation, association and matching) could provide the same focus and coherence to vision research, leading to general-purpose object recognition in 10 years.