
1 Scene Understanding through Transfer Learning. Stephen Gould, Ben Packer, Geremy Heitz, Daphne Koller. DARPA Update, September 11, 2008.

2 Outline. What is Scene Understanding? Scene Understanding Projects: Cascaded Classification Models [NIPS, 2008]; TAS: Things and Stuff [ECCV, 2008]; 3D Context [ECCV Workshop, 2008]; LOOPS [NIPS, 2008]; Hierarchical Learning [UAI, 2008]; Indoor Depth Reconstruction (in progress).

3 What is "Understanding"? Vision subtask (recognition): "Is there an object of type X in this image?" (Airplane? NO. Human? YES. Dog? YES.) Scene understanding: "What is happening in this image?" ("The man is walking the dog.")

4 Computer View of a "Scene": a set of region and scene labels such as SEASIDE, PASTURE, GRASS, SKY.

5 Human View of a "Scene": "The cow is walking through the grass on a pasture by the sea." (A cow, some grass… she's walking.)

6 “Context”

7 What can we do when we have all the components and datasets with ground-truth labels for each?

8 Scene Understanding with CCM: combining outputs such as region labels (SEASIDE, PASTURE, GRASS, SKY), surface geometry (grass = flat, sky = far, foreground = vertical), scene composition (40% grass, 30% sky…), and object counts (1 cow, 2 boats…).

9 Solution: CCMs. I: image; Φ: image features (Φ_D, Φ_S, Φ_Z); Ŷ: output labels (Ŷ_D, Ŷ_S, Ŷ_Z at cascade levels 0, 1, …, L). Features for level ℓ+1 are computed from Φ and the labels of level ℓ.
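To make the cascade concrete, here is a minimal sketch of the level-(ℓ+1) feature construction described above, simplified to a single task and assuming generic scikit-learn-style classifiers; the one-hot context features and the `make_classifier` factory are illustrative stand-ins, not the paper's actual feature set.

```python
import numpy as np

def context_features(labels, num_classes):
    # One simple choice of context feature: a one-hot encoding of the
    # previous level's predicted labels (the paper's features are richer).
    out = np.zeros((len(labels), num_classes))
    out[np.arange(len(labels)), labels] = 1.0
    return out

def train_ccm(phi, y_true, num_classes, make_classifier, num_levels):
    """phi: (n, d) base image features; y_true: (n,) integer labels."""
    models, prev = [], None
    for level in range(num_levels + 1):
        if level == 0:
            inputs = phi                       # level 0 sees image features only
        else:
            # levels 1..L see image features plus the previous level's outputs
            inputs = np.hstack([phi, context_features(prev, num_classes)])
        clf = make_classifier()                # e.g. lambda: LogisticRegression()
        clf.fit(inputs, y_true)
        prev = clf.predict(inputs)
        models.append(clf)
    return models
```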

10 Some Examples: SU-2

11 Why do we think depth can provide context signals?

12 Indoor Detection. Image sensors (cameras) provide high-resolution color and intensity data; range sensors (lasers) provide depth and global contextual information. Improving detection: augment visual information with 3-d features from a range scanner, e.g., a laser.

13 3-d features. Scene representation based on 3-d points and surface normals for every pixel in the image, {X_ij, n_ij}, and a set of dominant planes, {P_k}. Compute 3-d features over candidate windows (in the image plane) by projecting each window into the 3-d scene.
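As a rough illustration (not the paper's exact feature set), the sketch below computes a few window-level 3-d statistics from per-pixel points and normals, with dominant planes represented as hypothetical (unit_normal, offset) pairs.

```python
import numpy as np

def window_3d_features(X, n, planes, box):
    """X, n: (H, W, 3) per-pixel 3-d points and surface normals;
    planes: list of (unit_normal, offset) with unit_normal . x = offset;
    box: (x0, y0, x1, y1) candidate window in the image plane."""
    x0, y0, x1, y1 = box
    pts = X[y0:y1, x0:x1].reshape(-1, 3)        # 3-d points behind the window
    nrm = n[y0:y1, x0:x1].reshape(-1, 3)        # their surface normals

    centroid = pts.mean(axis=0)
    extent = pts.max(axis=0) - pts.min(axis=0)  # physical size of the window
    mean_normal = nrm.mean(axis=0)

    # Distance from the window centroid to the nearest dominant plane.
    dists = [abs(pn @ centroid - d) for pn, d in planes]
    nearest_plane = min(dists) if dists else 0.0

    return np.hstack([centroid, extent, mean_normal, nearest_plane])
```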

14 Example scenes. Figure: detections for mug, cup, monitor, clock, handle, and ski boot, comparing 2-d only vs. with 3-d features.

15 What if we only have detection ground-truth? Can we still do anything?

16 Unsupervised Context - TAS. Stuff-Thing: based on intuitive "relationships": green & textured = no cars; red & boxy = cars nearby; gray & smooth = cars here.

17 The TAS Model. W_i: window; T_i: object presence; S_j: region label; F_j: region features; R_ij: window-region relationship.
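The sketch below shows one way these variables could combine into an unnormalized log-score, with simple table parameters standing in for the learned model; in the actual work the region labels S_j are latent and the model is learned with EM, so this is only an illustration of the factor structure.

```python
import numpy as np

def tas_log_score(T, S, R, det_logit, log_p_S_given_F, log_psi):
    """T: (N,) 0/1 object presence per window; S: (J,) region labels;
    R: (N, J) relationship index for each window-region pair;
    det_logit: (N,) detector evidence per window;
    log_p_S_given_F: (J, K) per-region label log-likelihoods from features F_j;
    log_psi: (2, num_relations, K) compatibility of (T_i, R_ij, S_j)."""
    score = np.sum(T * det_logit)                            # thing evidence
    score += np.sum(log_p_S_given_F[np.arange(len(S)), S])   # stuff regions
    for i in range(len(T)):                                  # thing-stuff links
        for j in range(len(S)):
            score += log_psi[T[i], R[i, j], S[j]]
    return score
```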

18 TAS Results - Satellite

19 What about questions that require more than just object bounding boxes?

20 Finer-grained analysis… Scene: "man wearing a backpack walking a dog." The analysis spans three levels: objects (context: man, dog, backpack), parts (rough layout: e.g., head, torso, legs for the man; head, fore legs, hind legs for the dog), and landmarks (local shape).
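As a minimal illustration of the object / part / landmark decomposition above (the names and structure here are illustrative, not the LOOPS implementation), one could represent a scene like this:

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Part:
    name: str                                   # e.g. "head", "torso", "legs"
    landmarks: List[Tuple[float, float]] = field(default_factory=list)  # 2-d points

@dataclass
class SceneObject:
    name: str                                   # e.g. "man", "dog", "backpack"
    parts: List[Part] = field(default_factory=list)

dog = SceneObject("dog", [Part("head"), Part("torso"),
                          Part("fore legs"), Part("hind legs")])
```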

21 Shape-based Classification. Goal: classify based on shape characteristics ("Is the giraffe … or …?"). Plot: accuracy vs. number of training instances per class, comparing RANDOM, a boosted detector, and GROUND (nearest neighbor on the "true" shape). Goal: close this gap.

22 Classifying Lamps. Plots: accuracy vs. number of training instances per class for NB, BOOSTING, LOOPS, and GROUND on two shape distinctions: wide base (-) vs. thin base (+), and triangular (-) vs. square (+). [Submitted to IJCV]

23 Learning the Shape Model. Problem: with few instances, learned models aren't robust. Figure: a small training set, the mean shape, and the principal components shown at +1 and -1 standard deviations.
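A minimal sketch of the kind of shape model the slide describes: PCA on aligned landmark vectors gives a mean shape plus principal components, and the "+1 std" / "-1 std" shapes are the mean shifted along a component. This assumes pre-aligned shapes and omits the robustness machinery the project adds.

```python
import numpy as np

def learn_shape_model(shapes, num_components=2):
    """shapes: (n_instances, 2 * n_landmarks) aligned landmark coordinates."""
    mean = shapes.mean(axis=0)
    centered = shapes - mean
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    std = s[:num_components] / np.sqrt(len(shapes))   # per-component std dev
    components = vt[:num_components]
    return mean, components, std

def shape_at(mean, components, std, k, sign=+1):
    # The "+1 std" / "-1 std" shapes shown on the slide for component k.
    return mean + sign * std[k] * components[k]
```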

24 Undirected Probabilistic Model. F_data: encourage parameters to explain the data. Divergence: encourage each node's parameters to be similar to its parent's in the hierarchy (e.g., the Elephant and Rhino parameters are tied to the root's), with the divergence weight high or low depending on the edge.
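A minimal sketch of the regularized objective described above, with a simple squared distance standing in for the divergence term (the actual divergence and its per-edge weights are model-specific):

```python
import numpy as np

def hierarchical_objective(theta, theta_parent, data_log_likelihood, lam):
    f_data = data_log_likelihood(theta)                  # explain the data
    divergence = np.sum((theta - theta_parent) ** 2)     # stay near the parent
    return f_data - lam * divergence
```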

25 Does Hierarchy Help? Plot: delta log-loss per instance vs. total number of training instances for regularized max likelihood on mammal pairs (Bison-Rhino, Elephant-Bison, Elephant-Rhino, Giraffe-Bison, Giraffe-Elephant, Giraffe-Rhino, Llama-Bison, Llama-Elephant, Llama-Giraffe, Llama-Rhino). Unregularized max likelihood and shrinkage: much worse, not shown.

26 How can we use this platform for the vision to manipulation transfer task?

27 Indoor Scene Reconstruction Motivation: 3D relationships are essential for scene understanding. Most applications work on monocular input. Goal: Reconstruct the 3D geometry of an indoor scene/workspace from monocular images. Transfer Task: Use object detection to add 3D constraints. Eventually this will help robotic manipulation.

28 Indoor Scene Reconstruction. Method: learn how geometric features appear in images: depth differences between pairs of points, co-linearity/co-planarity of triplets of points, and higher-level structures (corners, objects, etc.). Encode these as "soft" constraints between parts of the scene and use belief propagation to satisfy the constraints (a sketch follows below). Preliminary results: see CURIS presentation.
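A minimal sketch of the "soft constraint" idea, assuming Gaussian (quadratic) potentials: unary terms tie each scene part's depth to a monocular estimate, and pairwise terms encourage predicted depth differences. With these potentials the MAP solution is a linear system, used here as a simple stand-in for the belief propagation step.

```python
import numpy as np

def reconstruct_depths(mono_depth, pairs, pair_diffs, w_unary=1.0, w_pair=4.0):
    """mono_depth: (n,) monocular depth estimates, one per scene part;
    pairs: list of (i, j); pair_diffs: predicted differences d_i - d_j."""
    n = len(mono_depth)
    A = w_unary * np.eye(n)
    b = w_unary * np.asarray(mono_depth, dtype=float)
    for (i, j), diff in zip(pairs, pair_diffs):
        A[i, i] += w_pair
        A[j, j] += w_pair
        A[i, j] -= w_pair
        A[j, i] -= w_pair
        b[i] += w_pair * diff
        b[j] -= w_pair * diff
    return np.linalg.solve(A, b)                # MAP depths under the constraints
```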

