Simultaneous Segmentation and 3D Pose Estimation of Humans or Detection + Segmentation = Tracking? Philip H.S. Torr Pawan Kumar, Pushmeet Kohli, Matt Bray.


1 Simultaneous Segmentation and 3D Pose Estimation of Humans or Detection + Segmentation = Tracking? Philip H.S. Torr, Pawan Kumar, Pushmeet Kohli, Matt Bray (Oxford Brookes University); Andrew Zisserman (Oxford); Arasanathan Thayananthan, Bjorn Stenger, Roberto Cipolla (Cambridge)

2 Algebra • Unifying Conjecture • Tracking = Detection = Recognition • Detection = Segmentation, therefore • Tracking (pose estimation) = Segmentation?

3 Objective: Image, Segmentation, Pose Estimate?? Aim to get a clean segmentation of a human…

4 Developments • ICCV 2003: pose estimation as fast nearest neighbour plus dynamics (inspired by Gavrila and Toyama & Blake) • BMVC 2004: parts-based chamfer to make the space of templates more flexible (a la pictorial structures of Huttenlocher) • CVPR 2005: ObjCut, combining segmentation and detection • ECCV 2006: interpolation of poses using the MVRVM (Agarwal and Triggs) • ECCV 2006: combination of pose estimation and segmentation using graph cuts.

5 Tracking as Detection (Stenger et al., ICCV 2003). Detection has become very efficient, e.g. real-time face detection, pedestrian detection. Example: pedestrian detection [Gavrila & Philomin, 1999]: find a match among a large number of exemplar templates. Issues: number of templates needed; efficient search; robust cost function.

6 Cascaded Classifiers

7 First filter: 19.8% of patches remaining. 1280x1024 image, 11 subsampling levels, 80s. Average number of filters per patch: 6.7

8 Filter 10: 0.74% of patches remaining. 1280x1024 image, 11 subsampling levels, 80s. Average number of filters per patch: 6.7

9 Filter 20: 0.06% of patches remaining. 1280x1024 image, 11 subsampling levels, 80s. Average number of filters per patch: 6.7

10 Filter 30: 0.01% of patches remaining. 1280x1024 image, 11 subsampling levels, 80s. Average number of filters per patch: 6.7

11 Filter 70: 0.007% of patches remaining. 1280x1024 image, 11 subsampling levels, 80s. Average number of filters per patch: 6.7
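The cascade slides report how many patches survive each filter and the average number of filters evaluated per patch. A minimal sketch of that bookkeeping follows; the patches, the three scoring functions and their thresholds are toy assumptions, not the filters used in the talk.

```python
def run_cascade(patches, stages):
    """Apply stages in order; a patch is dropped at the first stage it fails.

    `stages` is a list of (score_function, threshold) pairs (hypothetical).
    Returns the surviving patches and the average number of stages
    evaluated per patch.
    """
    survivors = []
    evaluations = 0
    for patch in patches:
        accepted = True
        for score, threshold in stages:
            evaluations += 1
            if score(patch) < threshold:
                accepted = False  # early rejection: later stages are skipped
                break
        if accepted:
            survivors.append(patch)
    average = evaluations / len(patches) if patches else 0.0
    return survivors, average


# Toy data: each "patch" is a list of three feature values.
patches = [[0, 0, 0], [5, 1, 0], [5, 5, 0], [5, 5, 5]]
stages = [
    (lambda p: p[0], 3),
    (lambda p: p[1], 3),
    (lambda p: p[2], 3),
]
survivors, avg_filters = run_cascade(patches, stages)
print(survivors, avg_filters)
```

Because most patches are rejected by the first few stages, the average number of evaluations per patch stays far below the total number of stages, which is the effect the slides quantify.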

12 Hierarchical Detection. Efficient template matching (Huttenlocher & Olson, Gavrila). Idea: when matching similar objects, speed up the search by forming a template hierarchy found by clustering. Match prototypes first; search a sub-tree only if the prototype's cost is below a threshold.
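The pruning rule on this slide can be sketched as a depth-first search over a template tree. The `Node` class, the L1 cost and the single global threshold below are illustrative assumptions; the talk uses chamfer-based costs, and per-level thresholds would be a natural refinement.

```python
class Node:
    """A prototype template with the (more specific) templates it represents."""

    def __init__(self, template, children=()):
        self.template = template
        self.children = list(children)


def cost(template, observation):
    # Hypothetical matching cost: L1 distance between feature tuples.
    return sum(abs(t - o) for t, o in zip(template, observation))


def tree_search(root, observation, threshold):
    """Return (best leaf template, its cost, number of cost evaluations)."""
    best_template, best_cost = None, float("inf")
    evaluated = 0
    stack = [root]
    while stack:
        node = stack.pop()
        c = cost(node.template, observation)
        evaluated += 1
        if c > threshold:
            continue  # prune: the whole sub-tree is skipped
        if node.children:
            stack.extend(node.children)
        elif c < best_cost:
            best_template, best_cost = node.template, c
    return best_template, best_cost, evaluated


# Toy hierarchy: 7 templates, leaves clustered under two prototypes.
root = Node((5, 5), [
    Node((2, 2), [Node((1, 1)), Node((3, 3))]),
    Node((8, 8), [Node((7, 7)), Node((9, 9))]),
])
best, best_cost, evaluated = tree_search(root, (3, 4), threshold=5)
print(best, best_cost, evaluated)
```

In this toy run the sub-tree under the `(8, 8)` prototype is never visited, so only five of the seven templates are evaluated.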

13 Trees • These search trees are the same as those used for efficient nearest neighbour. • Add a dynamic model and Detection = Tracking = Recognition

14 Evaluation at Multiple Resolutions One traversal of tree per time step

15 Evaluation at Multiple Resolutions Tree: 9000 templates of hand pointing, rigid

16 Templates at Level 1

17 Templates at Level 2

18 Templates at Level 3

19 Comparison with Particle Filters • This method is grid based. No need to render the model online. Like efficient search. Can always use this as a proposal process for a particle filter if need be.

20 Interpolation, MVRVM, ECCV 2006 Code available.

21 Energy being Optimized, link to graph cuts • Combination of: edge term (quickly evaluated using chamfer); interior term (quickly evaluated using integral images) • Note that possible templates are a bit like cuts that we put down; one could think of this whole process as a constrained search for the best graph cut.
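The interior term is said to be quick to evaluate with integral images. As a minimal sketch of why: once a summed-area table is built, the sum over any axis-aligned box takes four lookups. The function names and the 3x3 toy image are assumptions for illustration.

```python
def integral_image(img):
    """Summed-area table with a zero border: ii[y][x] = sum of img[:y][:x]."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def box_sum(ii, top, left, bottom, right):
    """Sum of img over rows top..bottom-1 and columns left..right-1."""
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


# Toy per-pixel scores (e.g. a foreground log-likelihood map).
img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
ii = integral_image(img)
total = box_sum(ii, 0, 0, 3, 3)
inner = box_sum(ii, 1, 1, 3, 3)
print(total, inner)
```

A template interior that is approximated by a few rectangles can then be scored in constant time per rectangle, independent of its area.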

22 Likelihood: Edges. Input Image, Edge Detection; 3D Model, Projected Contours; Robust Edge Matching

23 Chamfer Matching: Input image, Canny edges, Distance transform, Projected Contours
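The pipeline on this slide (edges, distance transform, projected contour) can be sketched in a few lines. This is a toy version under stated assumptions: the edge map is given directly rather than computed with Canny, the distance transform is brute force, and the `trunc` parameter stands in for whatever robust truncation the authors use.

```python
import math


def distance_transform(edges):
    """Brute-force Euclidean distance from every pixel to the nearest edge pixel.

    Assumes the edge map contains at least one edge pixel.
    """
    h, w = len(edges), len(edges[0])
    edge_pts = [(y, x) for y in range(h) for x in range(w) if edges[y][x]]
    return [[min(math.hypot(y - ey, x - ex) for ey, ex in edge_pts)
             for x in range(w)] for y in range(h)]


def chamfer_cost(dt, template_pts, offset, trunc=float("inf")):
    """Mean (optionally truncated) distance-transform value under the template."""
    oy, ox = offset
    total = sum(min(dt[ty + oy][tx + ox], trunc) for ty, tx in template_pts)
    return total / len(template_pts)


# Toy edge map: a vertical edge segment in column 2, rows 1..3.
edges = [[0, 0, 0, 0, 0],
         [0, 0, 1, 0, 0],
         [0, 0, 1, 0, 0],
         [0, 0, 1, 0, 0],
         [0, 0, 0, 0, 0]]
dt = distance_transform(edges)

# Template contour: three points forming a vertical segment.
template_pts = [(0, 0), (1, 0), (2, 0)]
offsets = [(oy, ox) for oy in range(3) for ox in range(5)]
best_offset = min(offsets, key=lambda o: chamfer_cost(dt, template_pts, o))
print(best_offset, chamfer_cost(dt, template_pts, best_offset))
```

The distance transform is computed once per image; scoring each template placement afterwards is just a sum of lookups, which is what makes matching many templates affordable.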

24 Likelihood: Colour. Input Image, Skin Colour Model; 3D Model, Projected Silhouette; Template Matching

25 Template Matching = • Template Matching = constrained search for a cut/segmentation? • Detection = Segmentation?

26 Objective: Image, Segmentation, Pose Estimate?? Aim to get a clean segmentation of a human…

27 MRF for Interactive Image Segmentation, Boykov and Jolly [ICCV 2001]. Energy of the MRF = unary likelihood (data D) + pair-wise terms (contrast term + uniform prior, Potts model). Maximum-a-posteriori (MAP) solution: $x^* = \arg\min_x E(x)$
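The equation image for this slide did not survive extraction. For orientation, a commonly used form of this contrast-sensitive segmentation energy is written out below; the symbols ($\phi$, $\psi$, $K_{ij}$, $\mathcal{N}_i$) are notational assumptions and may differ from those on the original slide.

```latex
E(\mathbf{x}) = \sum_{i \in \mathcal{V}} \Big( \underbrace{\phi(D \mid x_i)}_{\text{unary likelihood}}
  + \sum_{j \in \mathcal{N}_i} \big( \underbrace{\phi(D \mid x_i, x_j)}_{\text{contrast term}}
  + \underbrace{\psi(x_i, x_j)}_{\text{Potts prior}} \big) \Big),
\qquad
\mathbf{x}^{*} = \arg\min_{\mathbf{x}} E(\mathbf{x})

\psi(x_i, x_j) =
\begin{cases}
  K_{ij} & \text{if } x_i \neq x_j \\
  0      & \text{if } x_i = x_j
\end{cases}
```

Here $x_i$ is the binary label (figure or ground) of pixel $i$, $D$ is the image data, and $\mathcal{N}_i$ is the neighbourhood of pixel $i$. Because the pairwise terms only penalise neighbouring pixels that take different labels, the energy is submodular and the MAP labelling can be found exactly with a single s-t min-cut.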

28 However… • This energy formulation rarely provides realistic (target-like) results.

29 Shape-Priors and Segmentation • Combine object detection with segmentation: Obj-Cut, Kumar et al., CVPR ’05; Zhao and Davis, ICCV ’05 • Obj-Cut: shape prior is a Layered Pictorial Structure (LPS); learned exemplars for parts of the LPS model; obtained impressive results. Layer 1 + Layer 2 = LPS model

30 LPS for Detection • Learning: learnt automatically using a set of examples • Detection: tree of chamfers to detect parts, assemble with pictorial structure and belief propagation.

31 Solve via Integer Programming • SDP formulation (Torr 2001, AISTATS) • SOCP formulation (Kumar, Torr & Zisserman, this conference) • LBP (Huttenlocher, many)

32 Obj-Cut: Image; Likelihood Ratio (Colour); Shape Prior: Distance from Θ; Likelihood + Distance from Θ

33 Integrating Shape-Prior in MRFs: Unary potential, Pairwise potential, Labels, Pixels, Prior (Potts model). MRF for segmentation

34 Integrating Shape-Prior in MRFs: Unary potential, Pairwise potential, Pose parameters, Labels, Pixels, Prior (Potts model). Pose-specific MRF

35 Layer 2, Layer 1, Transformations Θ_1, P(Θ_1) = 0.9, Cow Instance. Do we really need accurate models?

36 • Segmentation boundary can be extracted from edges • A rough 3D shape prior is enough for region disambiguation

37 Energy of the Pose-specific MRF. Energy to be minimized: unary term, shape prior, pairwise potential, Potts model. But what should be the value of θ?
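The slide's equation was lost in extraction. A sketch of an energy with the four labelled terms (unary term, shape prior, pairwise contrast potential, Potts model) is given below; the notation is an assumption chosen for illustration, not a transcription of the slide.

```latex
E(\mathbf{x}, \theta) = \sum_{i \in \mathcal{V}} \Big(
    \underbrace{\phi(D \mid x_i)}_{\text{unary term}}
  + \underbrace{\phi(x_i \mid \theta)}_{\text{shape prior}}
  + \sum_{j \in \mathcal{N}_i} \big(
      \underbrace{\phi(D \mid x_i, x_j)}_{\text{pairwise potential}}
    + \underbrace{\psi(x_i, x_j)}_{\text{Potts model}} \big) \Big)

\theta^{*} = \arg\min_{\theta} \, \min_{\mathbf{x}} E(\mathbf{x}, \theta)
```

Here $\phi(x_i \mid \theta)$ is taken to depend on the distance transform of the shape model rendered at pose $\theta$. For any fixed $\theta$ the inner minimisation over $\mathbf{x}$ is an ordinary graph cut, so the question of which $\theta$ to use becomes an outer search over pose in which each evaluation costs one cut.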

38 The different terms of the MRF: original image; likelihood of being foreground given a foreground histogram; Grimson-Stauffer segmentation; shape prior model; shape prior (distance transform); likelihood of being foreground given all the terms; resulting graph-cuts segmentation

39 Can segment multiple views simultaneously

40 Solve via gradient descent • Comparable to level set methods • Could use other approaches (e.g. Objcut) • Need a graph cut per function evaluation
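To make "a graph cut per function evaluation" concrete, here is a toy sketch: a 1D chain MRF whose shape prior is a stick of fixed half-width centred at an integer pose θ, minimised exactly by an s-t min-cut (Edmonds-Karp), with a discrete local descent over θ. The chain model, the cost values, the parameters `mu` and `lam`, and the starting pose are all assumptions made for illustration; this is not the authors' implementation.

```python
from collections import deque


def min_cut_energy(unary_bg, unary_fg, lam):
    """Minimise sum_i U_i(x_i) + lam * [x_i != x_{i+1}] on a chain via min-cut."""
    n = len(unary_bg)
    s, t = n, n + 1
    cap = [dict() for _ in range(n + 2)]

    def add_edge(u, v, c):
        cap[u][v] = cap[u].get(v, 0) + c
        cap[v].setdefault(u, 0)  # residual (reverse) edge

    for i in range(n):
        add_edge(s, i, unary_bg[i])  # cut if pixel i is labelled background
        add_edge(i, t, unary_fg[i])  # cut if pixel i is labelled foreground
        if i + 1 < n:
            add_edge(i, i + 1, lam)
            add_edge(i + 1, i, lam)

    flow = 0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:  # BFS for an augmenting path
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            break
        bottleneck, v = float("inf"), t
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = t
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u
        flow += bottleneck
    # Pixels still reachable from the source are foreground.
    labels = [1 if i in parent else 0 for i in range(n)]
    return flow, labels


def pose_energy(theta, colour_fg, colour_bg, half_width=1, mu=2, lam=1):
    """min_x E(x, theta): one graph cut per evaluation (hypothetical prior)."""
    unary_fg = [c + mu * max(0, abs(i - theta) - half_width)
                for i, c in enumerate(colour_fg)]
    return min_cut_energy(colour_bg, unary_fg, lam)


def descend(theta, n, energy_of):
    """Move to the better neighbouring pose until no neighbour improves."""
    best_e, best_labels = energy_of(theta)
    while True:
        scored = [(energy_of(th), th) for th in (theta - 1, theta + 1)
                  if 0 <= th < n]
        (e, labels), th = min(scored, key=lambda item: item[0][0])
        if e >= best_e:
            return theta, best_e, best_labels
        theta, best_e, best_labels = th, e, labels


# Toy costs: object at pixels 3..5, plus a foreground-coloured distractor at 0.
colour_fg = [1, 4, 4, 1, 1, 1, 4, 4]  # cost of labelling each pixel foreground
colour_bg = [4, 1, 1, 4, 4, 4, 1, 1]  # cost of labelling each pixel background
theta, energy, labels = descend(
    2, len(colour_fg), lambda th: pose_energy(th, colour_fg, colour_bg))
print(theta, energy, labels)
```

Every call to `pose_energy` rebuilds the graph and recomputes the max-flow from scratch, which is the cost the slide points out. The shape prior keeps the distractor pixel out of the segmentation, and the descent is only local, so the result depends on the starting pose (here assumed to come from a detector).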

41 Formulating the Pose Inference Problem

42 But… to compute the MAP of E(x) w.r.t. the pose, the unary terms change at EACH iteration and the max-flow must be recomputed! However… Kohli and Torr showed how dynamic graph cuts can be used to efficiently find MAP solutions for MRFs that change minimally from one time instant to the next: Dynamic Graph Cuts (ICCV05).

43 Dynamic Graph Cuts. Diagram: solving problem P_A for its solution S_A is a computationally expensive operation. When P_B is similar to P_A, the differences between A and B give a simpler problem P_B*, and solving it for S_B is a cheaper operation.

44 Dynamic Image Segmentation Image Flows in n-edges Segmentation Obtained

45 Our Algorithm: first segmentation problem G_a → maximum flow → MAP solution and residual graph (G_r); second segmentation problem G_b; difference between G_a and G_b → updated residual graph G'

46 Dynamic Graph Cut vs Active Cuts • Our method: flow recycling • AC: cut recycling • Both methods: tree recycling

47 Experimental Analysis: MRF consisting of 2×10^5 latent variables connected in a 4-neighborhood. Running time of the dynamic algorithm

48 Segmentation Comparison: Grimson-Stauffer, Bathia04, Our method

49 Face Detector and ObjCut

50 Segmentation

51

52 Conclusion • Combining pose inference and segmentation is worth investigating. • Tracking = Detection • Detection = Segmentation • Tracking = Segmentation. • Segmentation = SFM ??

