Learning Spatial Context: Using stuff to find things Geremy Heitz Daphne Koller Stanford University October 13, 2008 ECCV 2008.


1 Learning Spatial Context: Using stuff to find things Geremy Heitz Daphne Koller Stanford University October 13, 2008 ECCV 2008

2 Things vs. Stuff Stuff (n): Material defined by a homogeneous or repetitive pattern of fine-scale properties, but has no specific or distinctive spatial extent or shape. Thing (n): An object with a specific size and shape. From: Forsyth et al. Finding pictures of objects in large collections of images. Object Representation in Computer Vision, 1996.

3 Finding Things Context is key!

4 Outline What is Context? The Things and Stuff (TAS) model Results

5 Satellite Detection Example D(W) = 0.8

6 Error Analysis Typically, false positives are OUT OF CONTEXT while true positives are IN CONTEXT. We need to look outside the bounding box!

7 Types of Context Scene-Thing: from the scene gist, car "likely", keyboard "unlikely" [Torralba et al., LNCS 2005]. Stuff-Stuff: [Gould et al., IJCV 2008]. Thing-Thing: [Rabinovich et al., ICCV 2007].

8 Types of Context Stuff-Thing: based on spatial relationships. Intuition: trees = no cars; houses = cars nearby; road = cars here. "Cars drive on roads", "Cows graze on grass", "Boats sail on water". Goal: unsupervised.

9 Outline What is Context? The Things and Stuff (TAS) model Results

10 Things Detection “candidates” Low detector threshold -> “over-detect” Each candidate has a detector score

11 Things Candidate detections: image window W i + detector score. Boolean R.V. T i, with T i = 1 meaning the candidate is a positive detection.

12 Stuff Coherent image regions: coarse "superpixels". Feature vector F j in R^n, cluster label S j in {1…C}. Stuff model: Naïve Bayes (S j → F j).
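As a concrete illustration of the Naïve Bayes stuff model, the sketch below scores one region's feature vector against each cluster. The Gaussian per-feature likelihood and all names here are our assumption, not the paper's exact parameterization.

```python
import numpy as np

def naive_bayes_log_posterior(f, priors, means, variances):
    """Log-posterior over cluster labels S_j for one region's feature
    vector f under a Gaussian naive Bayes stuff model: each feature
    dimension is assumed independent given the cluster label."""
    # log P(S=c) + sum_d log N(f_d | mu_cd, var_cd)
    log_lik = -0.5 * (np.log(2 * np.pi * variances)
                      + (f - means) ** 2 / variances).sum(axis=1)
    log_post = np.log(priors) + log_lik
    return log_post - np.logaddexp.reduce(log_post)  # normalize

# toy example: 2 clusters, 3 features
priors = np.array([0.5, 0.5])
means = np.array([[0.0, 0.0, 0.0], [5.0, 5.0, 5.0]])
variances = np.ones((2, 3))
f = np.array([4.8, 5.2, 5.0])
post = np.exp(naive_bayes_log_posterior(f, priors, means, variances))
```

A feature vector near (5, 5, 5) lands almost entirely in the second cluster, which is all the model needs to feed an informative S j into the rest of TAS.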

13 Relationships Descriptive relations: "Near", "Above", "In front of", etc. Choose a set R = { r 1 …r K }. R ijk = 1: detection i and region j have relation k. Example: with S 72 = Trees, S 4 = Houses, S 10 = Road, candidate T 1 has R 1,10,in = 1 (it lies in the road region).

14 The TAS Model W i : window (always observed). T i : object presence (supervised in the training set). S j : region label (always hidden). F j : region features (always observed). R ijk : relationship. Plates: i = 1…N candidates, j = 1…J regions, k = 1…K relations.

15 Unrolled Model Candidate windows T 1, T 2, T 3 and image regions S 1 …S 5, with example relationships: R 1,1,left = 1, R 2,1,above = 0, R 3,1,left = 1, R 1,3,near = 0, R 3,3,in = 1.

16 Learning the Parameters Assume we know R. S j is hidden; everything else is observed. Expectation-Maximization: a "contextual clustering". The learned parameters are readily interpretable.
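Stripped of the T/R coupling, the EM step above reduces to ordinary mixture clustering of region features. A minimal sketch assuming isotropic Gaussian clusters — a simplification of the paper's contextual clustering, not the full TAS M-step:

```python
import numpy as np

def em_gaussian_clusters(F, C, iters=50):
    """Plain EM for isotropic Gaussian clusters over region features
    F (J x D): S_j hidden, everything else observed."""
    J, D = F.shape
    # farthest-point initialization of the C cluster means
    idx = [0]
    for _ in range(C - 1):
        d2min = ((F[:, None, :] - F[idx][None]) ** 2).sum(-1).min(1)
        idx.append(int(d2min.argmax()))
    mu = F[idx].astype(float).copy()
    pi = np.full(C, 1.0 / C)
    var = np.full(C, F.var() + 1e-6)
    for _ in range(iters):
        # E-step: responsibilities r_jc = P(S_j = c | F_j)
        d2 = ((F[:, None, :] - mu[None]) ** 2).sum(-1)          # (J, C)
        logr = np.log(pi) - 0.5 * (D * np.log(var) + d2 / var)
        logr -= np.logaddexp.reduce(logr, axis=1, keepdims=True)
        r = np.exp(logr)
        # M-step: re-estimate mixing weights, means, variances
        nc = r.sum(0) + 1e-12
        pi = nc / J
        mu = (r[:, :, None] * F[:, None, :]).sum(0) / nc[:, None]
        d2 = ((F[:, None, :] - mu[None]) ** 2).sum(-1)
        var = (r * d2).sum(0) / (D * nc) + 1e-6
    return r.argmax(1), mu
```

Because each cluster's mean and variance are explicit, the learned parameters can be inspected directly — the "readily interpretable" property the slide refers to.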

17 Learned Satellite Clusters

18 Which Relationships to Use? Rijk = spatial relationship between candidate i and region j Rij1 = candidate in region Rij2 = candidate closer than 2 bounding boxes (BBs) to region Rij3 = candidate closer than 4 BBs to region Rij4 = candidate farther than 8 BBs from region Rij5 = candidate 2BBs left of region Rij6 = candidate 2BBs right of region Rij7 = candidate 2BBs below region Rij8 = candidate more than 2 and less than 4 BBs from region … RijK = candidate near region boundary How do we avoid overfitting?
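The bounding-box-unit relations listed above can be computed directly from box geometry. A sketch with hypothetical names and thresholds of our own choosing (the paper's exact relation definitions may differ):

```python
def relations(candidate, region_center, bb):
    """Boolean spatial relations between a candidate detection and a
    region, with distances measured in bounding-box (BB) widths.
    `candidate` and `region_center` are (x, y) centers; `bb` is the
    candidate's box width."""
    cx, cy = candidate
    rx, ry = region_center
    dist = ((cx - rx) ** 2 + (cy - ry) ** 2) ** 0.5
    return {
        "closer_than_2bb": dist < 2 * bb,
        "closer_than_4bb": dist < 4 * bb,
        "farther_than_8bb": dist > 8 * bb,
        "2bb_left": (rx - cx) > 2 * bb and abs(cy - ry) < bb,
        "2bb_right": (cx - rx) > 2 * bb and abs(cy - ry) < bb,
        "between_2_and_4bb": 2 * bb < dist < 4 * bb,
    }
```

Each entry is one candidate R ijk; with many such relations the model must learn which ones to keep active, which is exactly the overfitting question the slide raises.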

19 Learning the Relationships Intuition: a "detached" R ijk is an inactive relationship. Structural EM iterates: learn parameters, then decide which edge to toggle. Candidate structures are evaluated with ℓ(T | F, W, R), which requires inference but gives better results than using the standard E[ℓ(T, S, F, W, R)].

20 Inference Goal: the posterior over object presences, P(T | F, W, R). Block Gibbs sampling: it is easy to sample the T i 's given the S j 's and vice versa.

21 Outline What is Context? The Things and Stuff (TAS) model Results

22 Base Detector - HOG [Dalal & Triggs, CVPR 2005] HOG detector: feature vector X → SVM classifier.

23 Results - Satellite Prior: Detector Only Posterior: Detections Posterior: Region Labels

24 Results - Satellite [Plot: recall rate vs. false positives per image, Base Detector vs. TAS Model] ~10% improvement in recall at 40 fppi.

25 PASCAL VOC Challenge 2005 Challenge: 2232 images split into {train, val, test}; Cars, Bikes, People, and Motorbikes. 2006 Challenge: 5304 images split into {train, test}; 12 classes, of which we use Cows and Sheep.

26 Base Detector Error Analysis Cows

27 Discovered Context - Bicycles Bicycles Cluster #3

28 TAS Results – Bicycles Examples: discover "true positives", remove "false positives".

29 Results – VOC 2005

30 Results – VOC 2006

31 Conclusions Detectors can benefit from context The TAS model captures an important type of context We can improve any sliding window detector using TAS The TAS model can be interpreted and matches our intuitions We can learn which relationships to use

32 Thank you!

33 Object Detection Task: Find the things Example: Find all the cars in this image Return a “bounding box” for each Evaluation: Maximize true positives Minimize false positives

34 Sliding Window Detection Consider every bounding box: all shifts, all scales, possibly all rotations. Each such window gets a score D(W); detections are local peaks in D(W). Pros: covers the entire image; flexible to allow a variety of D(W)'s. Cons: brute force – can be slow; only considers features inside the box.
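The brute-force search described above can be sketched as follows; the scoring function D and the thresholding-without-peak-finding are placeholders, not the actual detector:

```python
import numpy as np

def sliding_windows(image, box, stride):
    """Enumerate all windows of shape `box` at one scale; a full
    detector repeats this over scales (and possibly rotations)."""
    H, W = image.shape
    h, w = box
    for y in range(0, H - h + 1, stride):
        for x in range(0, W - w + 1, stride):
            yield (x, y, w, h), image[y:y + h, x:x + w]

def detect(image, box, stride, D, threshold):
    """Score every window with D(W) and keep those above threshold.
    (A real detector would then keep only local peaks in D(W).)"""
    return [(win, D(patch))
            for win, patch in sliding_windows(image, box, stride)
            if D(patch) > threshold]

# toy image with one bright 2x2 blob; D = mean intensity in the window
img = np.zeros((8, 8))
img[2:4, 2:4] = 1.0
hits = detect(img, box=(2, 2), stride=1,
              D=lambda patch: patch.mean(), threshold=0.9)
```

Note that D sees only the pixels inside the window — the "only considers features in box" limitation that motivates TAS.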

35 Sliding Window Results PASCAL Visual Object Classes Challenge, Cows 2006. A window W is reported when D(W) > T. Overlap score: score(A,B) = |A∩B| / |A∪B|. score(A,B) > 0.5: TRUE POSITIVE; score(A,B) ≤ 0.5: FALSE POSITIVE. Recall(T) = TP / (TP + FN); Precision(T) = TP / (TP + FP).
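The overlap score and the 0.5 criterion translate directly into code (boxes as (x1, y1, x2, y2) corner tuples; the function names are ours):

```python
def iou(A, B):
    """Overlap score |A∩B| / |A∪B| for boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(A[0], B[0]), max(A[1], B[1])
    ix2, iy2 = min(A[2], B[2]), min(A[3], B[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda b: (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area(A) + area(B) - inter)

def classify(pred, truth):
    """PASCAL criterion: a detection is a true positive iff its
    overlap with the ground-truth box exceeds 0.5."""
    return "TP" if iou(pred, truth) > 0.5 else "FP"
```

Sweeping the detection threshold T and recomputing TP/FP counts at each value traces out the recall and precision curves defined on the slide.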

36 Quantitative Evaluation [Plot: recall rate vs. false positives per image]

37

38 Detections in Context Task: identify all cars in the satellite image. Idea: the surrounding context (houses, road) adds information to the local window detector. [Figure: prior – detector only; posterior – TAS model detections and region labels]

39 Equations

40 Features: Haar wavelets Haar filters and integral image Viola and Jones, ICCV 2001 The average intensity in the block is computed with four sums independently of the block size. BOOSTING!
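A minimal sketch of the integral-image trick: after one pass of cumulative sums, any block sum costs just four array references, regardless of block size.

```python
import numpy as np

def integral_image(img):
    """Padded cumulative-sum table: ii[y, x] = sum of img[:y, :x]."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = img.cumsum(0).cumsum(1)
    return ii

def block_sum(ii, y0, x0, y1, x1):
    """Sum of img[y0:y1, x0:x1] from four integral-image lookups,
    independent of block size -- the basis of fast Haar filters."""
    return ii[y1, x1] - ii[y0, x1] - ii[y1, x0] + ii[y0, x0]

img = np.arange(16).reshape(4, 4)
ii = integral_image(img)
```

A Haar filter response is then just a difference of two or three such block sums, which is why Viola-Jones can evaluate thousands of features per window cheaply.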

41 Features: Edge fragments Weak detector = Match of edge chain(s) from training image to edgemap of test image Opelt, Pinz, Zisserman, ECCV 2006 BOOSTING!

42 Histograms of Oriented Gradients Dalal & Triggs, CVPR 2005; related to SIFT, D. Lowe, ICCV 1999. SVM!
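A minimal sketch of the HOG idea: per-cell orientation histograms only. The full Dalal & Triggs descriptor adds block normalization and gradient interpolation before feeding the vector to a linear SVM.

```python
import numpy as np

def hog_cell_histograms(img, cell=8, bins=9):
    """Gradient-orientation histograms over cell x cell patches,
    weighted by gradient magnitude; unsigned orientation in [0, pi)."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)
    H, W = img.shape
    bin_idx = np.minimum((ang / np.pi * bins).astype(int), bins - 1)
    out = np.zeros((H // cell, W // cell, bins))
    for cy in range(H // cell):
        for cx in range(W // cell):
            sl = np.s_[cy * cell:(cy + 1) * cell,
                       cx * cell:(cx + 1) * cell]
            out[cy, cx] = np.bincount(bin_idx[sl].ravel(),
                                      weights=mag[sl].ravel(),
                                      minlength=bins)
    return out.ravel()  # feature vector X for the classifier

# toy input: a vertical step edge, so all gradient energy is
# horizontal and should land in the first orientation bin
edge = np.zeros((16, 16))
edge[:, 8:] = 1.0
x = hog_cell_histograms(edge)
```

The flattened histogram vector is what the SVM scores; TAS then treats that score as the local evidence for each candidate window.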

