
1 Using the Forest to see the Trees: A computational model relating features, objects and scenes. Antonio Torralba, CSAIL-MIT. Joint work with Aude Oliva, Kevin Murphy, William Freeman, Monica Castelhano, John Henderson.

2 From objects to scenes. [Graphical model: the image I gives local features L; the local features support the objects O1, O2, …; and the objects determine the scene type S ∈ {street, office, …}. The features-to-objects step is object localization.] Object localization: Riesenhuber & Poggio, 99; Vidal-Naquet & Ullman, 03; Serre & Poggio, 05; Agarwal & Roth, 02; Moghaddam & Pentland, 97; Turk & Pentland, 91; Heisele et al., 01; Krempp, Geman & Amit, 02; Dorko & Schmid, 03; Fergus, Perona & Zisserman, 03; Fei-Fei, Fergus & Perona, 03; Schneiderman & Kanade, 00; Lowe, 99.

3 From scenes to objects. [Graphical model: the image I gives local features L and global gist features G; the gist predicts the scene type S ∈ {street, office, …}; the scene and the local features together drive object localization for O1, O2, ….]

4 From scenes to objects. [Same graphical model as slide 3.]

5 The context challenge. What do you think are the hidden objects? [Image with two hidden objects, labeled 1 and 2.] Biederman et al., 82; Bar & Ullman, 93; Palmer, 75.

6 The context challenge. What do you think are the hidden objects? Answering this question does not require knowing what the objects look like. It is all about context. Chance ~ 1/30000.

7 From scenes to objects. [Graphical model: the image I gives local features L and global gist features G; the gist predicts the scene type S ∈ {street, office, …}.]

8 Scene categorization: Office, Corridor, Street. Oliva & Torralba, IJCV'01; Torralba, Murphy, Freeman, Mark, CVPR 03.

9 Place identification: Office 610, Office 615, Draper street, and 59 other places… Scenes are categories, places are instances.

10 Supervised learning. Training pairs {v_g, Office}, {v_g, Office}, {v_g, Corridor}, {v_g, Street}, … are fed to a classifier.

11 Supervised learning. Training pairs {v_g, Office}, {v_g, Office}, {v_g, Corridor}, {v_g, Street}, … are fed to a classifier. Which feature vector for a whole image?
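
A minimal sketch of this supervised setup, assuming each training image is already summarized by a global feature vector v_g (the feature itself is defined on the next slides) and using an off-the-shelf multi-class logistic-regression classifier; the classifier choice and the placeholder data below are illustrative assumptions, not the presenter's actual pipeline.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical training data: one 80-dim global feature vector per image,
# paired with its scene category label.
V_g = np.random.randn(300, 80)
labels = np.random.choice(["office", "corridor", "street"], size=300)

# Fit a multi-class classifier on the {v_g, scene label} pairs.
scene_classifier = LogisticRegression(max_iter=1000).fit(V_g, labels)
print(scene_classifier.predict(V_g[:5]))   # predicted scene categories
```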

12 Global features (gist). First, we propose a set of features that do not encode specific object information. Oliva & Torralba, IJCV'01; Torralba, Murphy, Freeman, Mark, CVPR 03.

13 Global features (gist) G. v = {|v_t|: energy at each orientation and scale} = 6 × 4 dimensions, reduced by PCA to 80 features. First, we propose a set of features that do not encode specific object information. Oliva & Torralba, IJCV'01; Torralba, Murphy, Freeman, Mark, CVPR 03.
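
A hedged sketch of a gist-style descriptor along these lines: Gabor-like oriented band-pass filters at 6 orientations and 4 scales built directly in the Fourier domain, channel energies averaged over a 4×4 grid of image blocks, then PCA down to 80 dimensions. The exact filter shapes and the grid averaging are assumptions not stated on the slide.

```python
import numpy as np

def gabor_like_filter(shape, scale, orientation):
    """Oriented band-pass filter defined in the Fourier domain (an approximation)."""
    h, w = shape
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    radius = np.sqrt(fx ** 2 + fy ** 2)
    angle = np.arctan2(fy, fx)
    center = 0.25 / (2 ** scale)                       # radial center frequency per scale
    radial = np.exp(-((radius - center) ** 2) / (2 * (center / 2) ** 2))
    dtheta = np.angle(np.exp(1j * (angle - orientation)))  # wrapped angular distance
    angular = np.exp(-(dtheta ** 2) / 0.3)
    return radial * angular

def gist_descriptor(img, n_orient=6, n_scale=4, grid=4):
    """Energy of each oriented band-pass channel, averaged over a grid x grid partition."""
    img = img.astype(float)
    F = np.fft.fft2(img)
    h, w = img.shape
    feats = []
    for s in range(n_scale):
        for o in range(n_orient):
            filt = gabor_like_filter(img.shape, s, o * np.pi / n_orient)
            energy = np.abs(np.fft.ifft2(F * filt)) ** 2
            blocks = energy[: h - h % grid, : w - w % grid]
            blocks = blocks.reshape(grid, h // grid, grid, w // grid).mean(axis=(1, 3))
            feats.append(blocks.ravel())
    return np.concatenate(feats)                       # n_scale * n_orient * grid**2 values

def pca_project(X, n_components=80):
    """Project a matrix of descriptors (one per row) onto its top principal components."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T
```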

14 Example visual gists: two images I and I' for which global features(I) ~ global features(I'). Cf. "Pyramid Based Texture Analysis/Synthesis", Heeger and Bergen, SIGGRAPH, 1995.

15 Learning to recognize places. Hidden states = location (63 values). Observations = v_t^G (80 dimensions). The transition matrix encodes the topology of the environment. The observation model is a mixture of Gaussians centered on prototypes (100 views per place). We use annotated sequences for training. Places include Office 610, Office 617, Corridor 6b, Corridor 6c.
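
A minimal sketch of the filtering step behind such a place recognizer, assuming 63 hidden places, a row-stochastic transition matrix encoding the building topology, and an observation model that is an equally weighted mixture of isotropic Gaussians centered on stored prototype gist vectors. The mixture weights, covariances, and training details are assumptions, not the slide's exact model.

```python
import numpy as np

def observation_likelihood(v_g, prototypes, sigma=1.0):
    """p(v_g | place) as an equally weighted mixture of isotropic Gaussians
    centered on that place's prototype gist vectors (shape: n_prototypes x 80)."""
    d2 = ((prototypes - v_g) ** 2).sum(axis=1)
    return np.exp(-d2 / (2 * sigma ** 2)).mean()

def forward_filter(gist_sequence, transition, prototypes_per_place, prior):
    """Online belief over the current place given the gist observations so far.
    transition[i, j] = p(next place = j | current place = i)."""
    belief = prior.copy()
    for v_g in gist_sequence:
        belief = transition.T @ belief                  # predict: move along the topology
        lik = np.array([observation_likelihood(v_g, P) for P in prototypes_per_place])
        belief *= lik                                   # update with the new observation
        belief /= belief.sum()
    return belief
```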

16 Wearable test-bed v1 Kevin Murphy

17 Wearable test-bed v2

18 Place/scene recognition demo

19 From scenes to objects. [Same graphical model as slide 3: the image I gives local features L and global gist features G; the gist predicts the scene type S ∈ {street, office, …}; the scene and local features together drive object localization for O1, O2, ….]

20 Global scene features predict object location. Given a new image, v_g selects the image regions likely to contain the target.

21 Global scene features predict object location. Training set (cars): pairs {v_g, x_1}, {v_g, x_2}, {v_g, x_3}, {v_g, x_4}, … The goal of training is to learn the association between the location of the target and the global scene features.
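
A minimal sketch of learning this association, assuming a simple ridge regression from the gist vector to the normalized target location; the slide does not specify the regression model, so the regressor choice and variable names below are illustrative assumptions.

```python
import numpy as np

def fit_location_regressor(G, XY, lam=1e-2):
    """G: (n_images, 80) gist vectors; XY: (n_images, 2) target centers in [0, 1]^2.
    Returns ridge-regression weights mapping gist (plus bias) to (x, y)."""
    G1 = np.hstack([G, np.ones((len(G), 1))])           # append a bias column
    W = np.linalg.solve(G1.T @ G1 + lam * np.eye(G1.shape[1]), G1.T @ XY)
    return W

def predict_location(W, g):
    """Predicted (x, y) location of the target in a new image with gist g."""
    return np.append(g, 1.0) @ W
```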

22 Global scene features predict object location. [Scatter plots: estimated vs. true Y for the vertical location of people, and estimated vs. true X for the horizontal location of people, both predicted from v_g.]

23 The layered structure of scenes: p(x) vs. p(x_2 | x_1). In a display with multiple targets present, the location of one target constrains the 'y' coordinate of the remaining targets, but not the 'x' coordinate.
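
A toy illustration of this layered-structure idea, assuming the two targets' positions follow a zero-mean Gaussian in which the y coordinates are strongly correlated while the x coordinates are independent; conditioning on the first target then narrows p(y_2) but leaves p(x_2) unchanged. The covariance values are made up for illustration.

```python
import numpy as np

# Covariance over (x1, y1, x2, y2): the y's are correlated, the x's are not.
cov = np.array([[1.0, 0.0, 0.0, 0.0],
                [0.0, 1.0, 0.0, 0.9],
                [0.0, 0.0, 1.0, 0.0],
                [0.0, 0.9, 0.0, 1.0]])

def conditional(cov, obs_idx, obs_val, query_idx):
    """Mean and covariance of the query variables given the observed ones (zero-mean Gaussian)."""
    S_qo = cov[np.ix_(query_idx, obs_idx)]
    S_oo = cov[np.ix_(obs_idx, obs_idx)]
    S_qq = cov[np.ix_(query_idx, query_idx)]
    mean = S_qo @ np.linalg.solve(S_oo, obs_val)
    var = S_qq - S_qo @ np.linalg.solve(S_oo, S_qo.T)
    return mean, var

# Observing target 1 high in the image shifts and narrows p(y2) ...
print(conditional(cov, [1], np.array([1.5]), [3]))   # mean 1.35, variance 0.19
# ... but tells us nothing about x2.
print(conditional(cov, [0], np.array([1.5]), [2]))   # mean 0.0, variance 1.0
```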

24 Global scene features predict object location. Stronger contextual constraints can be obtained using other objects.

25–28 [Image-only slides; no transcript text.]

29 Attentional guidance: local features → saliency. Saliency models: Koch & Ullman, 85; Wolfe, 94; Itti, Koch, Niebur, 98; Rosenholtz, 99.

30 Attentional guidance: local features → saliency; global features → scene prior, modulated by the task. Torralba, 2003; Oliva, Torralba, Castelhano, Henderson, ICIP 2003.

31 Attentional guidance: local features → saliency; global features → scene prior; the task additionally brings in an object model.
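
A hedged sketch of how the two pathways can be combined, assuming the attention map is simply the pointwise product of a local saliency map and a task-dependent contextual prior over image locations, both represented as 2-D arrays on the image grid; the toy maps below are illustrative and not the model's actual outputs.

```python
import numpy as np

def attention_map(saliency, context_prior, eps=1e-8):
    """Combine the saliency and contextual pathways by pointwise product, renormalized."""
    m = saliency * context_prior
    return m / (m.sum() + eps)

# Toy usage: a saliency map peaked near the image center, and a contextual prior
# saying the target (e.g. a pedestrian) lies in a horizontal band lower in the image.
h, w = 60, 80
yy, xx = np.mgrid[0:h, 0:w]
saliency = np.exp(-(((yy - 30) ** 2) + ((xx - 40) ** 2)) / 200.0)
context_prior = np.exp(-((yy - 45) ** 2) / 50.0)
combined = attention_map(saliency, context_prior)
print(np.unravel_index(combined.argmax(), combined.shape))  # peak pulled toward the band
```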

32 Comparison of regions of interest. Torralba, 2003; Oliva, Torralba, Castelhano, Henderson, ICIP 2003.

33 Comparison of regions of interest: saliency predictions (regions covering 10%, 20%, and 30% of the image). Torralba, 2003; Oliva, Torralba, Castelhano, Henderson, ICIP 2003.

34 Comparison of regions of interest: saliency predictions vs. saliency combined with global scene priors (10%, 20%, and 30% regions). Torralba, 2003; Oliva, Torralba, Castelhano, Henderson, ICIP 2003.

35 Comparison of regions of interest: saliency predictions (10%, 20%, and 30% regions); dots correspond to fixations 1–4. Torralba, 2003; Oliva, Torralba, Castelhano, Henderson, ICIP 2003.

36 Comparison of regions of interest: saliency predictions vs. saliency combined with global scene priors (10%, 20%, and 30% regions); dots correspond to fixations 1–4. Torralba, 2003; Oliva, Torralba, Castelhano, Henderson, ICIP 2003.

37 Results: percentage of fixations (fixation numbers 1–4) falling inside the saliency region vs. the contextual region, for scenes without people and for scenes with people. [Plots with the y-axis spanning 50–100%.] Chance level: 33%.

38 Task modulation: local features → saliency; global features → scene prior, modulated by the task. Torralba, 2003; Oliva, Torralba, Castelhano, Henderson, ICIP 2003.

39 Task modulation: mug search vs. painting search; saliency predictions vs. saliency combined with global scene priors.

40 Discussion. From the computational perspective, scene context can be derived from global image properties and used to predict where objects are most likely to be. Scene context considerably improves predictions of fixation locations. A complete model of attention guidance in natural scenes requires both saliency and contextual pathways.

