Recognizing Human Figures and Actions Greg Mori Simon Fraser University.

1 Recognizing Human Figures and Actions Greg Mori Simon Fraser University

2 Goal Action recognition –Where are the people? –What are they doing? Applications –Image understanding, image retrieval and search –HCI –Surveillance –Computer Graphics

3 Far field: 3-pixel man –Blob tracking Medium field: 30-pixel man –Coarse-level actions Near field: 300-pixel man –Find and track limbs

4 Outline Human figures in motion –Action Recognition Localizing joint positions –Exemplar-based approach –Parts-based approach Motion Synthesis –Novel graphics application

5 Appearance vs. Motion Jackson Pollock Number 21 (detail)

6 Action Recognition Recognize human actions at a distance –Low resolution, noisy data –Moving camera, occlusions –Wide range of actions (including non-periodic)

7 Our Approach Motion-based approach –Classify a novel motion by finding the most similar motion from the training set –Use large amounts of data (“non-parametric”) Related Work –Periodicity analysis Polana & Nelson; Seitz & Dyer; Bobick et al; Cutler & Davis; Collins et al. –Model-free Temporal Templates [Bobick & Davis] Orientation histograms [Freeman et al; Zelnik & Irani] Using MoCap data [Zhao & Nevatia, Ramanan & Forsyth]

8 Gathering action data Tracking –Simple correlation-based tracker

9 Figure-centric Representation Stabilized spatio-temporal volume –No translation information –All motion caused by person’s limbs Good news: indifferent to camera motion Bad news: hard! Good test to see if actions, not just translation, are being captured

10 Remembrance of Things Past “Explain” novel motion sequence by matching to previously seen video clips –For each frame, match based on some temporal extent Challenge: how to compare motions? [Figure: input sequence matched against a database of labeled clips: run, walk left, swing, walk right, jog]

11 How to describe motion? Appearance –Not preserved across different clothing Gradients (spatial, temporal) –Same problem (e.g. contrast reversal) Edges –Unreliable at this scale Optical flow –Explicitly encodes motion –Least affected by appearance –…but too noisy

12 Spatial Motion Descriptor Image frame → optical flow $F_{x,y}$ → components $F_x, F_y$ → half-wave rectified channels $F_x^+, F_x^-, F_y^+, F_y^-$ → blurred $F_x^+, F_x^-, F_y^+, F_y^-$
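The pipeline on this slide can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the flow field is taken as given (two 2-D lists, `flow_x` and `flow_y`), the blur is a simple box filter chosen for brevity (the slide says only "blurred"), and the function names and the `radius` parameter are hypothetical.

```python
def half_wave_rectify(flow_x, flow_y):
    """Split a flow field into four non-negative channels: Fx+, Fx-, Fy+, Fy-."""
    def positive(channel):
        return [[max(v, 0.0) for v in row] for row in channel]

    def negative(channel):
        return [[max(-v, 0.0) for v in row] for row in channel]

    return positive(flow_x), negative(flow_x), positive(flow_y), negative(flow_y)


def box_blur(channel, radius=1):
    """Average each pixel over a (2*radius+1)^2 window, using in-bounds pixels only.

    Assumption: a box filter stands in for whatever blur the original work used.
    """
    height, width = len(channel), len(channel[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            window = [
                channel[j][i]
                for j in range(max(0, y - radius), min(height, y + radius + 1))
                for i in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            out[y][x] = sum(window) / len(window)
    return out


def spatial_motion_descriptor(flow_x, flow_y, radius=1):
    """Return the four blurred, half-wave rectified flow channels."""
    return [box_blur(ch, radius) for ch in half_wave_rectify(flow_x, flow_y)]
```

Rectifying before blurring keeps opposite motions from cancelling: a pixel moving left next to one moving right contributes to two different channels instead of averaging to zero.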

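13 Spatio-temporal Motion Descriptor [Figure: frames of Sequence A and Sequence B over time t, compared within a temporal window w. The A-by-B frame-to-frame similarity matrix is combined with a w-by-w identity matrix I (or a blurry I) to give the A-by-B motion-to-motion similarity matrix.]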
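One way to read the figure is that the motion-to-motion score for frames (i, j) adds up the frame-to-frame scores of the aligned neighbours (i+d, j+d) inside the temporal window, which is what filtering with an identity-matrix kernel does. The sketch below assumes that reading; `frame_sim`, `half_window`, and the function name are hypothetical, and the "blurry I" variant is not implemented.

```python
def motion_similarity(frame_sim, half_window):
    """Aggregate a frame-to-frame similarity matrix along its diagonals.

    frame_sim[i][j] is the similarity of frame i in sequence A to frame j in
    sequence B. The result at (i, j) sums frame_sim over the aligned pairs
    (i + d, j + d) for d in [-half_window, half_window], skipping pairs that
    fall outside either sequence.
    """
    n_a, n_b = len(frame_sim), len(frame_sim[0])
    out = [[0.0] * n_b for _ in range(n_a)]
    for i in range(n_a):
        for j in range(n_b):
            total = 0.0
            for d in range(-half_window, half_window + 1):
                if 0 <= i + d < n_a and 0 <= j + d < n_b:
                    total += frame_sim[i + d][j + d]
            out[i][j] = total
    return out
```

With this aggregation, two frames score highly only when the frames before and after them also match in order, so a single look-alike frame in an otherwise different motion is penalized.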

14 Soccer Real actions, moving camera, poor video 8 classes of actions 4500 frames of labeled data 1-nearest-neighbor classifier
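A 1-nearest-neighbor classifier over a similarity matrix reduces to an argmax per query frame. The sketch below assumes a precomputed matrix `motion_sim` (query frames by database frames) and a list `db_labels` of action labels for the database frames; both names are hypothetical.

```python
def classify_frames(motion_sim, db_labels):
    """Label each query frame with the action of its most similar database frame."""
    labels = []
    for row in motion_sim:
        # Index of the database frame with the highest similarity to this query frame.
        best = max(range(len(row)), key=row.__getitem__)
        labels.append(db_labels[best])
    return labels
```

For example, `classify_frames([[0.1, 0.9, 0.3], [0.8, 0.2, 0.1]], ["run", "walk", "jog"])` labels the first query frame "walk" and the second "run".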

15 Classifying Ballet Actions 16 Actions; 24800 total frames; 51-frame motion descriptor. Men used to classify women and vice versa.

16 Classifying Tennis Actions 6 actions; 4600 frames; 7-frame motion descriptor Woman player used as training, man as testing.

17 Classifying Tennis Red bars show classification results

18 Outline Human figures in motion –Action Recognition Localizing joint positions –Exemplar-based approach –Parts-based approach Motion Synthesis –Novel graphics application

19

20

21 Human Figures in Still Images Detection of humans is possible for stereotypical poses –Standing –Walking –(Viola et al., Poggio et al.) But we want to do more –Wider variety of poses –Localize joint positions

22 Problem

23 Shape Matching For Finding People Database of Exemplars

24 Shape Contexts Deformable template approach –Shapes represented as a collection of edge points Two stages –Fast pruning Quick tests to construct a shortlist of candidate objects Database of known objects could be large –Detailed matching Perform computationally expensive comparisons on only the few shapes in the shortlist Publications –Mori et al., CVPR 2001 –Mori and Malik, CVPR 2003 Featured in New York Times Science section

25 Results: Tracking by Repeated Finding

26 Multiple Exemplars Parts-based approach –Use a combination of keypoints or limbs from different exemplars –Reduces the number of exemplars needed Compute a matching cost for each limb from every exemplar Compute pairwise “consistency” costs for neighbouring limbs Use dynamic programming to find best K configurations
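The cost structure above (a matching cost per limb per exemplar, plus consistency costs between neighbouring limbs) can be minimized with a Viterbi-style dynamic program. The sketch below makes two simplifying assumptions that are not from the slide: the limbs are ordered as a simple chain, and only the single best configuration is returned rather than the best K. The names `unary`, `pairwise`, and `best_configuration` are hypothetical.

```python
def best_configuration(unary, pairwise):
    """Pick one exemplar per limb, minimizing matching plus consistency costs.

    unary[l][e]        : cost of taking limb l from exemplar e
    pairwise(l, p, e)  : consistency cost between limb l-1 taken from exemplar p
                         and limb l taken from exemplar e
    Returns (total_cost, choices) where choices[l] is the exemplar for limb l.
    """
    n_limbs, n_exemplars = len(unary), len(unary[0])
    cost = list(unary[0])          # best cost of a partial chain ending at each exemplar
    back_pointers = []
    for l in range(1, n_limbs):
        new_cost, pointers = [], []
        for e in range(n_exemplars):
            # Best exemplar for the previous limb, given limb l comes from e.
            p = min(range(n_exemplars), key=lambda q: cost[q] + pairwise(l, q, e))
            new_cost.append(cost[p] + pairwise(l, p, e) + unary[l][e])
            pointers.append(p)
        cost = new_cost
        back_pointers.append(pointers)

    # Trace back from the cheapest final state.
    e = min(range(n_exemplars), key=cost.__getitem__)
    total = cost[e]
    choices = [e]
    for pointers in reversed(back_pointers):
        e = pointers[e]
        choices.append(e)
    choices.reverse()
    return total, choices
```

The run time is linear in the number of limbs and quadratic in the number of exemplars, which is why mixing limbs from different exemplars stays tractable.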

27 Combining Exemplars

28 Finding People (II): Parts-based Approach Bottom-up Segmentation as preprocessing Detect half-limbs and torsos Assemble partial configurations –Prune using global constraints Extend partial configurations to full human figures

29 Segmentation for Recognition Window-scanning (e.g. face detection) –O(N M S) Segmentation [figure panels: superpixels, segments] –Support masks for computation of features –Efficiency –Scalability –600K pixels → 300 superpixels, 50 segments –O(N) + O(log(M))

30 Limb/Torso Detectors Learn limb and torso detectors from hand-labeled data Cues: –Contour Average edge strength on boundary –Shape Similarity to rectangle –Shading x,y gradients, blurred –Focus Ratio of high to low frequency energies

31 Assembling Partial Configurations Combinatorial search over sets of limbs and torsos –Configurations of 3 half-limbs plus a torso Prune using global constraints –Proximity –Relative widths –Maximum lengths –Symmetry in colour Complete half-limbs –2 or 3-limbed people Sort partial configurations –Use limb, torso, and segmentation scores Extend final limbs of best configurations

32 Results

33 Rank 3

34 Outline Human figures in motion –Action Recognition Localizing joint positions –Exemplar-based approach –Parts-based approach Motion Synthesis –Novel graphics application

35 “Do as I Do” Motion Synthesis Matching two things: –Motion similarity across sequences –Appearance similarity within sequence Dynamic Programming [Figure: input sequence and synthetic sequence]

36 Smoothness for Synthesis Two terms: –Similarity between input and target frames –Appearance similarity within target frames For input frames {i}, find the best target frames by maximizing a cost function that combines the two terms [equation not preserved in transcript] Optimize using dynamic programming: –N frames in input sequence –M target frames in database
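Since the slide's equation did not survive, the sketch below assumes one plausible form for it: the sum over input frames of the input-to-target similarity, plus a weighted sum of the appearance similarity between consecutively chosen target frames. The names `w_act`, `w_app`, and the weight `alpha` are hypothetical, not the presentation's notation.

```python
def synthesize(w_act, w_app, alpha=1.0):
    """Choose one target frame per input frame by dynamic programming.

    w_act[i][t] : similarity between input frame i and target frame t (N x M)
    w_app[s][t] : appearance similarity between target frames s and t (M x M)
    alpha       : assumed weight on the smoothness term
    Maximizes  sum_i w_act[i][pi_i] + alpha * sum_i w_app[pi_i][pi_{i+1}].
    Returns (best_score, pi) where pi[i] is the target frame for input frame i.
    """
    n, m = len(w_act), len(w_act[0])
    score = list(w_act[0])
    back_pointers = []
    for i in range(1, n):
        new_score, pointers = [], []
        for t in range(m):
            # Best predecessor target frame if input frame i maps to target frame t.
            p = max(range(m), key=lambda q: score[q] + alpha * w_app[q][t])
            new_score.append(score[p] + alpha * w_app[p][t] + w_act[i][t])
            pointers.append(p)
        score = new_score
        back_pointers.append(pointers)

    t = max(range(m), key=score.__getitem__)
    best = score[t]
    pi = [t]
    for pointers in reversed(back_pointers):
        t = pointers[t]
        pi.append(t)
    pi.reverse()
    return best, pi
```

Under this formulation the table has N rows of M states and each state checks M predecessors, so the search costs on the order of N·M² operations. Setting `alpha` to zero recovers independent per-frame matching; larger values favour visually smooth output over faithful motion matching.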

37 “Do as I Do” Synthesis Target Frames Input Sequence Result 3400 Frames

38 “Do as I Say” Synthesis Synthesize given action labels –e.g. video game control [Figure: a sequence of action labels (run, walk left, swing, walk right, jog) and the synthetic sequence generated from them]

39 “Do as I Say” Red box shows when constraint is applied

40 Putting It All Together Can we do a better job of splicing clips together? [Figure: Frame 9, Frame 9½, Frame 10] YES… if we can find the joints!

41 Morphed Transitions

42 8 Transitions

43 Morphed Transitions

44 3 Transitions

45 Actor Replacement Rendering new character into existing footage Algorithm –Track original character –Find matches from new character –Erase original character –Render in new character Need to worry about occlusions

46 Show the impressive video

47 Future Directions Much remains to be done! Action Recognition –Using joint positions, shape: the “morpho-kinetics” of action recognition –Better models of activities Detecting and localizing figures –Combining top-down exemplar methods with bottom-up segmentation methods –Exploiting temporal cues

48 Acknowledgements References –Mori, Belongie, and Malik, “Shape Contexts Enable Efficient Retrieval of Similar Shapes”, CVPR 2001 –Mori and Malik, “Estimating Human Body Configurations using Shape Context Matching”, ECCV 2002 –Efros, Berg, Mori, and Malik, “Recognizing Action at a Distance”, ICCV 2003 –Mori and Malik, “Recognizing Objects in Adversarial Clutter: Breaking a Visual CAPTCHA”, CVPR 2003 –Mori, Ren, Efros, and Malik, “Recovering Human Body Configurations: Combining Segmentation and Recognition”, CVPR 2004 Thank you!

