Learning and Inference in Vision: from Features to Scene Understanding Jonathan Huang, Tomasz Malisiewicz MLD Student Research Symposium, 2009.

1 Learning and Inference in Vision: from Features to Scene Understanding Jonathan Huang, Tomasz Malisiewicz MLD Student Research Symposium, 2009

2 Road Sky Trees Bridge Sign Car

3 Huge datasets PASCAL Visual Object Classes (VOC) Challenge dataset: ~15,000 annotated images, ~35,000 annotated object instances, 20 object classes with segmentations and bounding boxes

4 Huge datasets LabelMe dataset: ~11,845 static images, >100,000 labeled polygons

5 Outline I. Recognizing single object classes (Jon) II. Scene understanding with multiple classes (Tomasz)

6 Recognition task #1: Find all markers

7 Recognition task #2: Find all cats. Object recognition is often hard due to: Geometric Variability

8 Variation within an object class

9 Viewpoint/Scales/Illumination Variability Images from Flickr

10 From Pixels to Visual features. Scene -> (imaging) -> Pixels -> (low level features) -> Features -> (higher level inference) -> “car”

11 Local Visual Features Images are high dimensional! (640 width) * (480 height) = 307,200 pixels. Compute image statistics in a region (e.g., estimate the distribution of image gradient orientations)
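
To make the idea of "image statistics in a region" concrete, here is a minimal sketch of a gradient orientation histogram over a small patch. The function name, the choice of 8 bins, the central-difference gradients, and the magnitude weighting are all assumptions made for illustration, not details from the slides.

```python
import math

def orientation_histogram(patch, num_bins=8):
    """Histogram of gradient orientations over a 2-D grayscale patch.

    patch is a list of rows of numbers. Each interior pixel votes for its
    orientation bin, weighted by its gradient magnitude.
    """
    hist = [0.0] * num_bins
    rows, cols = len(patch), len(patch[0])
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = patch[r][c + 1] - patch[r][c - 1]  # central difference in x
            gy = patch[r + 1][c] - patch[r - 1][c]  # central difference in y
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue  # flat pixels carry no orientation information
            # Unsigned orientation in [0, pi): opposite directions share a bin.
            angle = math.atan2(gy, gx) % math.pi
            bin_index = min(int(angle / math.pi * num_bins), num_bins - 1)
            hist[bin_index] += magnitude
    total = sum(hist)
    return [h / total for h in hist] if total > 0 else hist

# Toy patch whose intensity increases from left to right.
patch = [[float(c) for c in range(6)] for _ in range(6)]
hist = orientation_histogram(patch)
```

The 36 pixel values of the toy patch are summarized by 8 numbers, and because every gradient points along the x axis, all of the mass ends up in the first bin.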

12 Key ideas in feature design Be invariant to stuff you don’t care about… while not being too invariant

13 Object classification Inference: What object class is this? Cow or Horse?? Learning: What does each object class look like? Let’s look at a simpler example first…

14 Document classification analogy John Terry scored on a header to lift Chelsea to a 1-0 victory over Manchester United and extend the Blues’ Premier League lead to 5 points. Chelsea had been frustrated by Manchester United for 76 minutes, but took advantage of a free kick awarded when Darren Fletcher fouled Ashley Cole. Brian Ching scored six minutes into overtime and the Houston Dynamo advanced to Major League Soccer’s Western... In the Senate, where proposals differ substantially from the House-passed measure on issues like a government-run plan and how to pay for coverage, the bill is stalled while budget analysts assess its overall costs. The slim margin in the House — the bill passed with just two votes to spare, and 39 Democrats opposed it — suggests even greater challenges in the Senate, where the majority leader,... ??? Classify each document as sports or politics

15 Bag-of-words models for text classification “Much of the meaning behind written language is preserved even when the ordering of the individual words is lost.” [El-Arini et al.,’09] bag words (Sue Ann)
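
A small sketch of the bag-of-words idea: counting words gives the same representation whether or not the word order is kept. The keyword lists and the overlap-counting `classify` function are hypothetical stand-ins for a learned classifier, chosen only to keep the example short.

```python
from collections import Counter
import re

def bag_of_words(text):
    """Lowercase, split into alphabetic tokens, and count them (order is discarded)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def classify(text, class_vocab):
    """Pick the class whose (hypothetical) keyword list overlaps the bag the most."""
    bag = bag_of_words(text)
    scores = {label: sum(bag[w] for w in words) for label, words in class_vocab.items()}
    return max(scores, key=scores.get)

# Hand-picked keyword lists, purely illustrative.
vocab = {
    "sports": ["scored", "league", "victory", "kick"],
    "politics": ["senate", "bill", "house", "votes"],
}
original = "Chelsea scored on a free kick to extend the Premier League lead"
shuffled = "lead kick to Premier scored a the League on extend free Chelsea"
same_bag = bag_of_words(original) == bag_of_words(shuffled)
label = classify(shuffled, vocab)
```

Here `same_bag` is true and the shuffled sentence is still labeled as sports, which is the point of the quote: much of the signal survives losing the word order.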

16 Document classification analogy but to on Darren awarded Fletcher advanced Ashley lift over to 1-0 scored advantage Major for lead 76 Chelsea Premier to Terry League John Houston the kick Chelsea took United points. free minutes fouled United been frustrated overtime Manchester six a when League a extend victory Ching 5 and to and Western Manchester Brian Cole. Dynamo Soccer’s by a minutes, Blues’ the had header into of scored... the margin how In on majority 39 costs. with measure slim overall — to like opposed suggests challenges pay even substantially stalled government run where the issues votes it the where bill for spare, from bill and a Senate, analysts coverage, in — the Democrats greater differ two proposals budget its House assess while Senate, to in just the leader and the plan passed the is House passed The... ???

17 Document classification analogy but to on Darren awarded Fletcher advanced Ashley lift over to 1-0 scored advantage Major for lead 76 Chelsea Premier to Terry League John Houston the kick Chelsea took United points. free minutes fouled United been frustrated overtime Manchester six a when League a extend victory Ching 5 and to and Western Manchester Brian Cole. Dynamo Soccer’s by a minutes, Blues’ the had header into of scored... the margin how In on majority 39 costs. with measure slim overall — to like opposed suggests challenges pay even substantially stalled government-run where the issues votes it the where bill for spare, from bill and a Senate, analysts coverage, in — the Democrats greater differ two proposals budget its House assess while Senate, to in just the leader and the plan passed the is House-passed The... ???

18

19 Visual words (discretization) Words are discrete, visual features are typically continuous… Discretization via clustering/vector quantization
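
A rough sketch of discretization by clustering: continuous descriptors are clustered with k-means, each cluster center becomes a "visual word", and an image is summarized by how many of its descriptors fall on each word. The 2-D toy descriptors, the farthest-point initialization, and all function names are assumptions for illustration; real descriptors have many more dimensions.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centers):
    """Index of the closest center, i.e., the point's visual word."""
    return min(range(len(centers)), key=lambda k: squared_distance(point, centers[k]))

def kmeans(points, k, iterations=10):
    """Lloyd's algorithm; returns k cluster centers (the codebook)."""
    # Farthest-point initialization: deterministic, and spreads the centers out.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(squared_distance(p, c) for c in centers)))
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centers)].append(p)
        for j, members in enumerate(clusters):
            if members:  # keep the old center if a cluster goes empty
                centers[j] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centers

def bag_of_visual_words(descriptors, codebook):
    """Histogram counting how many descriptors map to each visual word."""
    hist = [0] * len(codebook)
    for d in descriptors:
        hist[nearest(d, codebook)] += 1
    return hist

# Toy 2-D "descriptors" grouped around two well-separated locations.
descriptors = [(0.1 * i, 0.0) for i in range(4)] + [(5.0 + 0.1 * i, 5.0) for i in range(6)]
codebook = kmeans(descriptors, k=2)
histogram = bag_of_visual_words(descriptors, codebook)
```

The resulting histogram is the fixed-length vector that a classifier such as a linear SVM would take as input.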

20 Visual words [Sivic et al., ‘05]

21 Object classification with bag of words [Sivic et al., ‘05]

22 Object classification with bag of words Performance on Caltech 101 dataset with linear SVM on bag-of-word vectors: Faces, Airplanes, Cars [Csurka et al., ‘04]

23 Object Detection problem Detection: Locate all the faces in this image. Classification: Is this a face, or not a face?

24 Face detection via a series of classifications (a.k.a. sliding window brain damage)
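
Detection as a series of classifications can be sketched as follows: slide a fixed-size window over the image and ask a binary classifier about every position. The `mean_brightness` scorer is a placeholder assumption standing in for a trained face classifier, and a real detector would also repeat the scan over several scales.

```python
def sliding_windows(width, height, win, stride):
    """Yield the top-left corner (x, y) of every win x win window inside the image."""
    for y in range(0, height - win + 1, stride):
        for x in range(0, width - win + 1, stride):
            yield x, y

def detect(image, win, stride, classifier, threshold):
    """Run a binary classifier on every window; keep those scoring above threshold."""
    height, width = len(image), len(image[0])
    detections = []
    for x, y in sliding_windows(width, height, win, stride):
        window = [row[x:x + win] for row in image[y:y + win]]
        score = classifier(window)
        if score > threshold:
            detections.append((x, y, score))
    return detections

def mean_brightness(window):
    """Stand-in 'classifier' (an assumption): the score is the average pixel value."""
    values = [v for row in window for v in row]
    return sum(values) / len(values)

# 6x6 toy image with a bright 2x2 blob whose top-left corner is at (x=3, y=2).
image = [[0.0] * 6 for _ in range(6)]
for y in (2, 3):
    for x in (3, 4):
        image[y][x] = 1.0
detections = detect(image, win=2, stride=1, classifier=mean_brightness, threshold=0.9)
num_windows = len(list(sliding_windows(640, 480, 64, 8)))
```

Even at a single scale, a 64x64 window with stride 8 on a 640x480 image already means 3,869 classifier evaluations, which hints at why the brute-force approach is expensive.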

25 False Detection Missed Faces Sliding window detection results

26 The need for…capturing spatial relationships

27 One Approach: Create a more descriptive (complicated) feature. Histograms of Oriented Gradients (HOG) features. Steps shown: Original Image; Estimated Image Gradients (gradient magnitudes, gradient orientations); Subdivided Image cells; Histogrammed gradients in each cell [Dalal & Triggs, ‘06]

28 People Tracking with HOG features better

29 Modeling Spatial Relationships with Deformable Part Based Models Spring-based models: Parts prefer low-energy configurations [Fischler & Elschlager,’73], [Ramanan et al,’07], [Felzenszwalb et al,’05,’09], [Kumar et al, ‘09]

30 Parts Based Model Vertices - Local Appearance Edges - Spatial Relationship Goal: Assign model parts to image regions preserving both local appearance and spatial relationships

31 Parts based models - Inference Problem Inference problem: What is the best scoring assignment f? The score combines a Local Appearance term and a Pairwise Spatial Relationship term. Inference is NP-hard for general graphs. For trees, belief propagation gives an exact solution in polynomial time.
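
The scoring formula from this slide is not in the transcript. A common form, assumed here, is score(f) = sum over parts of an appearance score plus sum over edges of a spatial score. The sketch below runs exact max-sum inference on the simplest tree, a star with part 0 as the root, and checks it against brute-force enumeration. The table layout, the quadratic "spring" cost, and the toy numbers are illustrative assumptions.

```python
from itertools import product

def best_star_assignment(appearance, pairwise):
    """Exact max-sum inference for a star-shaped model (part 0 is the root).

    appearance[i][l]   : local appearance score of placing part i at location l
    pairwise[i][lr][l] : spatial score for part i at l when the root is at lr
    Returns (best score, list of locations, one per part).
    """
    num_parts, num_locs = len(appearance), len(appearance[0])
    best_score, best_assignment = float("-inf"), None
    for lr in range(num_locs):
        score, assignment = appearance[0][lr], [lr]
        # Given the root location, each leaf part can be optimized independently.
        for i in range(1, num_parts):
            l_best = max(range(num_locs),
                         key=lambda l: appearance[i][l] + pairwise[i][lr][l])
            score += appearance[i][l_best] + pairwise[i][lr][l_best]
            assignment.append(l_best)
        if score > best_score:
            best_score, best_assignment = score, assignment
    return best_score, best_assignment

def brute_force(appearance, pairwise):
    """Enumerate every assignment; exponential in the number of parts."""
    num_parts, num_locs = len(appearance), len(appearance[0])
    def total(f):
        return appearance[0][f[0]] + sum(
            appearance[i][f[i]] + pairwise[i][f[0]][f[i]] for i in range(1, num_parts))
    best = max(product(range(num_locs), repeat=num_parts), key=total)
    return total(best), list(best)

# Three parts on a 1-D strip of four locations. Part 1 prefers to sit one step
# left of the root, part 2 one step right (quadratic "spring" penalty).
appearance = [[1, 5, 2, 0], [4, 0, 0, 3], [0, 1, 0, 6]]
offsets = [0, -1, 1]
pairwise = [[[-(l - lr - off) ** 2 for l in range(4)] for lr in range(4)] for off in offsets]
result = best_star_assignment(appearance, pairwise)
```

With P parts and L locations, the star computation costs on the order of P * L^2 operations, while the brute-force search visits L^P assignments. Both agree on this toy problem.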

32 Parts based models - Learning Problem Linear models: the score is linear in the weights of the Local Appearance term and the Pairwise Spatial Relationship term. Learning linear models: Find weight vectors that best separate positive and negative examples (positive examples on one side, negative examples on the other). E.g., a convex max-margin objective [Kumar et al,’09]
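
The objective shown on this slide is not in the transcript. As a hedged illustration only, a generic soft-margin formulation for a linear part-based score could look like the following. The joint feature map \Phi, the slack variables \xi_n, the constant C, and the annotated assignment f_n are notation assumed here, and this is not necessarily the exact objective of [Kumar et al,’09].

```latex
% Linear score: appearance and pairwise terms stacked into one weight vector w
S_w(x, f) = w^\top \Phi(x, f)

% Generic soft-margin (max-margin) learning problem
\min_{w,\; \xi \ge 0} \quad \frac{1}{2}\lVert w \rVert^2 + C \sum_{n} \xi_n
\quad \text{s.t.} \quad
\begin{cases}
w^\top \Phi(x_n, f_n) \ge 1 - \xi_n & \text{for each positive example } (x_n, f_n), \\
w^\top \Phi(x_n, f) \le -1 + \xi_n & \text{for each negative example } x_n \text{ and every assignment } f.
\end{cases}
```

The objective is convex in w and \xi because the score is linear in w, which matches the slide's "convex max-margin objective" label.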

33 Person deformable part model Root filter (8x8 resolution) Part filter (4x4 resolution) Quadratic spatial configuration model [Felzenszwalb et al,’09]

34

35 [Ramanan et al,’09]

36 Outline I. Recognizing single object classes (Jon) II. Scene understanding with multiple classes (Tomasz)

37 Part II: Scene Understanding with Multiple Classes Goal: Predict Many Different Objects in a Single Image Car Fire Hydrant Building Fence Sidewalk Tree

38 Wait... What’s wrong with just learning a different sliding window classifier for each object type in the world?

39 The image as seen from an object detector’s point of view

40 Relationships between objects make recognition possible. Antonio Torralba. The Context Challenge. http://web.mit.edu/torralba/www/carsAndFacesInContext.html

41 Objects as the “Parts” of a Scene Key Challenge in Scene Understanding: Modeling relationships between objects from different categories. Deformable Part Model vs. Scene Model

42 Fixed Extent “Things” vs Free-form “Stuff” Building Fence Sidewalk Car Fire Hydrant Tree Things have a well-defined shape. A part of a car is not a car. Stuff is free-form and mostly defined by color/texture. A part of a building is still a building.

43 3 Types of Scene Models: Pixel-based, Window-based, Segment-based

44 Pixel-based Scene Understanding Produces Segmentation. Works well on “stuff”. Unable to reason about instances. Only limited notion of context. TextonBoost: Joint Appearance, Shape and Context Modeling for Multi-class Object Recognition and Segmentation. Shotton et al. ECCV 2006

45 Pixel-wise Conditional Random Fields (TextonBoost) Inference: y^* = argmax_y p(y|x). Training: Use boosting to learn unary potential. Future Direction: Higher-Order Cliques. TextonBoost: Joint Appearance, Shape and Context Modeling for Multi-class Object Recognition and Segmentation. Shotton et al. ECCV 2006
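
As a sketch of what y^* = argmax_y p(y|x) means for a pixel-wise CRF: with p(y|x) proportional to exp(-energy), the most probable labeling is the one with the lowest energy. The code below uses iterated conditional modes (ICM), a simple local search, on a tiny grid with a Potts smoothness term. ICM, the Potts prior, and the toy costs are assumptions chosen for brevity; this is not the inference procedure of the TextonBoost paper.

```python
def icm(unary, pairwise_weight, sweeps=5):
    """Approximate MAP labeling of a 4-connected grid CRF with a Potts prior.

    unary[r][c][k] is the cost of giving pixel (r, c) label k (lower is better);
    each pair of neighboring pixels with different labels pays pairwise_weight.
    """
    rows, cols, labels = len(unary), len(unary[0]), len(unary[0][0])
    # Start from the best label under the unary term alone.
    y = [[min(range(labels), key=lambda k: unary[r][c][k]) for c in range(cols)]
         for r in range(rows)]
    for _ in range(sweeps):
        changed = False
        for r in range(rows):
            for c in range(cols):
                neighbors = [y[rr][cc]
                             for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                             if 0 <= rr < rows and 0 <= cc < cols]
                def cost(k):
                    return unary[r][c][k] + pairwise_weight * sum(k != n for n in neighbors)
                best = min(range(labels), key=cost)
                if best != y[r][c]:
                    y[r][c], changed = best, True
        if not changed:
            break  # no pixel wants to change: a local optimum
    return y

# 3x3 grid, two labels. Every pixel clearly prefers label 0 except the center,
# whose noisy unary cost weakly prefers label 1.
unary = [[[0.0, 1.0] for _ in range(3)] for _ in range(3)]
unary[1][1] = [0.6, 0.4]
smoothed = icm(unary, pairwise_weight=0.5)
unsmoothed = icm(unary, pairwise_weight=0.0)
```

With the pairwise term switched off the center keeps its noisy label, and with it switched on the neighbors outvote it. This is the "limited notion of context" a pairwise pixel model provides.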

46 Window-based Scene Understanding Often not possible to model “stuff” using windows. Window assumption also questionable for some “things.” Possible to model interactions between object instances. Discriminative models for multi-class object layout. Desai et al. ICCV 2009 Object Recognition by Scene Alignment. Russell et al. NIPS 2007

47 Discriminative models for multi-class object layout. Inference via Greedy Forward Search. Training
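
A generic sketch of greedy forward search over candidate windows: start with nothing selected and repeatedly add the window that most increases the total score, which is each window's own detector score plus its interactions with the windows already chosen. The scores and interaction values are invented toy numbers, and the procedure is a simplified illustration rather than the exact algorithm of Desai et al.

```python
def greedy_forward_search(unary, pairwise):
    """Greedily select candidate windows to maximize
    sum of unary scores + sum of pairwise interactions among selected windows.

    unary[i]       : detector score for candidate window i
    pairwise[i][j] : symmetric interaction score between windows i and j
    """
    selected, remaining = [], set(range(len(unary)))
    while remaining:
        def gain(i):
            # Change in total score if window i is added to the current selection.
            return unary[i] + sum(pairwise[i][j] for j in selected)
        best = max(sorted(remaining), key=gain)
        if gain(best) <= 0:
            break  # no remaining window improves the score
        selected.append(best)
        remaining.remove(best)
    return selected

# Candidate 0: confident car. Candidate 1: overlapping duplicate of the same car.
# Candidate 2: weak person detection standing next to the car.
unary = [2.0, 1.5, -0.5]
pairwise = [
    [0.0, -3.0, 1.0],   # car vs duplicate: strong penalty; car vs person: support
    [-3.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
]
layout = greedy_forward_search(unary, pairwise)
```

The duplicate is suppressed by its negative interaction with the selected car, while the weak person detection is kept because the car supports it. An independent per-class sliding window detector would get both of those decisions wrong.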

48 Window-based results

49 Region-Based Scene Understanding Use a segmentation algorithm to extract stable regions. Use a CRF to label those segments. Problem: Hard to get object-segments. Problem: Inference difficult for fully connected models.

50 Region-Based CRF Training: Bag of Words with Nearest Neighbor classifier; Maximum Likelihood training of pairwise potentials. Spatial Relations. Object Categorization using Co-Occurrence, Location and Appearance. Galleguillos et al. CVPR 2008.
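
One simple statistic that a pairwise co-occurrence potential could be built on is how often two labels appear in the same training image. The sketch below only counts co-occurrences over a made-up set of labeled images; turning counts into CRF potentials by maximum likelihood, as the slide mentions, involves more than this, so treat it as an illustration of the raw signal and not of the paper's training procedure.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(label_sets):
    """Count, for each unordered pair of labels, the images containing both."""
    counts = Counter()
    for labels in label_sets:
        # Sort so each pair is stored under one canonical key (a, b) with a < b.
        for a, b in combinations(sorted(set(labels)), 2):
            counts[(a, b)] += 1
    return counts

# Hypothetical per-image label sets from a training collection.
training = [
    ["car", "road", "sky"],
    ["road", "sky", "tree"],
    ["cow", "grass", "sky"],
]
counts = cooccurrence_counts(training)
```

Pairs such as ("road", "sky") accumulate counts while ("car", "cow") never occurs, so a context model trained on such data would discourage labeling one region "cow" next to a region labeled "car".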

51 Segmentation-Based Results. Panels: Input image, No context, w/ context. Object Categorization using Co-Occurrence, Location and Appearance. Galleguillos et al. CVPR 2008.

52 Model Granularity vs. Object Type
Object Type \ Granularity | Pixels | Windows | Regions
Things (car, cow, person) | :-( | :-) | :-/
Stuff (road, sky, tree) | :-) | :-( | :-)

53 Scene Understanding Recap Rich object-object interactions are important for scene understanding. Different underlying assumptions (pixel vs. window vs. region) are better suited for different types of objects (“stuff” vs. “things”). Many of the techniques for single class object recognition (e.g., part based models) are relevant for scene understanding.

54 Thanks! Image Classification; Sliding Window based Object Detection; Modeling Spatial Relationships between parts; Modeling Spatial Relationships between objects

