Published by Lucas Hoover. Modified over 3 years ago.

1
Lectureship: a proposal for advancing computer graphics, imaging and multimedia design at RGU
Robert Gordon University, Aberdeen, 20/6/2008
Fabio Cuzzolin, INRIA Rhone-Alpes

2
Career path
- Masters thesis on gesture recognition at the University of Padova
- Visiting student, ESSRL, Washington University in St. Louis, and at the University of California at Los Angeles (2000)
- Ph.D. thesis on belief functions and uncertainty theory (2001)
- Researcher at Politecnico di Milano with the Image and Sound Processing group (2003-2004)
- Post-doc at the University of California at Los Angeles, UCLA Vision Lab (2004-2006)
- Marie Curie fellow at INRIA Rhone-Alpes

3
Scientific production and collaborations
- currently 4+10 journal papers and 31+8 conference papers
- collaborations with journals: IEEE PAMI, IEEE SMC-B, CVIU, Information Fusion, Int. J. Approximate Reasoning
- PC member for VISAPP, FLAIRS, IMMERSCOM, ISAIM
- collaborations with several groups: SIPTA, Setubal, CMU, Pompeu Fabra, EPFL-IDIAP, UBoston

4
My background research
- Discrete math: linear independence on lattices and matroids
- Uncertainty theory: geometric approach, algebraic analysis, generalized total probability
- Machine learning: manifold learning for dynamical models
- Computer vision: gesture and action recognition, 3D shape analysis and matching, gait ID, pose estimation

5
A multi-layer framework for human motion analysis
- different tasks, integrated in a series of layers
- feedback acts between different layers
- Diagram labels: multiple views, 3D reconstruction, unsupervised body-part segmentation, image data fusion, model fitting (stick-articulated), motion capture, action segmentation, action recognition, identity recognition, surveillance, HMI

6
A multi-layer framework for human motion analysis
- Action and gesture recognition
- Laplacian unsupervised segmentation
- Matching of 3D shapes by embedded orthogonal alignment
- Bilinear models for invariant gait ID
- Manifold learning for dynamical models
- The role of uncertainty measures
- Information fusion for model-free pose estimation

7
HMMs for gesture recognition
- transition matrix A -> gesture dynamics
- state-output matrix C -> collection of hand poses
- Hand poses were represented by size functions (BMVC'97)

8
Gesture classification
- EM is used to learn the HMM parameters from an input sequence
- the new sequence is fed to the learnt gesture models (HMM 1, HMM 2, …, HMM n)
- each model produces a likelihood
- the most likely model is chosen (if above a threshold), OR
- the new model is attributed the label of the closest one (using K-L divergence or other distances)
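
The likelihood-based decision on this slide can be sketched as follows. This is a minimal illustration, not the talk's implementation: it assumes discrete output symbols, already-learnt parameters, and a threshold on the per-symbol log-likelihood. The names `log_likelihood`, `classify`, the toy models and the threshold value are all hypothetical.

```python
import numpy as np

def log_likelihood(obs, pi, A, C):
    """Scaled forward algorithm: log p(obs | HMM) for a discrete-output HMM.

    pi: (n,) initial state distribution
    A:  (n, n) transition matrix, A[i, j] = p(state j at t+1 | state i at t)
    C:  (n, m) state-output matrix, C[i, k] = p(symbol k | state i)
    """
    alpha = pi * C[:, obs[0]]
    scale = alpha.sum()
    log_lik = np.log(scale)
    alpha = alpha / scale
    for symbol in obs[1:]:
        alpha = (alpha @ A) * C[:, symbol]
        scale = alpha.sum()
        log_lik += np.log(scale)
        alpha = alpha / scale
    return log_lik

def classify(obs, models, threshold):
    """Label of the most likely model, or None if its per-symbol
    log-likelihood falls below the (assumed) threshold."""
    scores = {label: log_likelihood(obs, *params)
              for label, params in models.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] / len(obs) >= threshold else None

# Two hypothetical 2-state gesture models over 2 output symbols.
pi = np.array([0.5, 0.5])
A = np.array([[0.9, 0.1], [0.1, 0.9]])
models = {
    "low": (pi, A, np.array([[0.9, 0.1], [0.8, 0.2]])),
    "high": (pi, A, np.array([[0.1, 0.9], [0.2, 0.8]])),
}
label = classify([0, 0, 1, 0, 0], models, threshold=-1.0)
```

Returning `None` stands in for the "below threshold" branch, where the slide's alternative (labelling by the closest model under a distance such as K-L divergence) would take over.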

9
Volumetric action recognition
- 2D approaches: features are extracted from single views -> viewpoint dependence
- volumetric approach: features are extracted from a volumetric reconstruction of the moving body (ICIP'04)

11
Unsupervised coherent 3D segmentation
- to recognize actions we need to extract features
- segmenting moving articulated 3D bodies into parts:
  - along sequences, in a consistent way
  - in an unsupervised fashion
  - robustly, with respect to changes of the topology of the moving body
  - as a building block of a wider motion analysis and capture framework
- ICCV-HM'07, CVPR'08, to submit to IJCV

12
Clustering after Laplacian embedding
- generates a lower-dim, widely separated embedded cloud
- less sensitive to topology changes than other methods
- less computationally expensive than ISOMAP
- local neighborhoods -> stable under articulated motion

13
Algorithm
- K-wise clustering in the embedding space
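
A rough sketch of "embed, then cluster" is given below. Several choices here are assumptions made for illustration only: the graph is a symmetrized k-nearest-neighbour graph, the Laplacian is the unnormalized one, and plain k-means is swapped in for the K-wise clustering step named on the slide. The function names and the toy 1D point chain are hypothetical.

```python
import numpy as np

def laplacian_embedding(points, n_neighbors=8, dim=3):
    """Embed points using eigenvectors of an unnormalized graph Laplacian."""
    n = len(points)
    # Pairwise squared Euclidean distances.
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    # k nearest neighbours of each point (column 0 is the point itself,
    # assuming no duplicate points).
    idx = np.argsort(d2, axis=1)[:, 1:n_neighbors + 1]
    W = np.zeros((n, n))
    rows = np.repeat(np.arange(n), n_neighbors)
    W[rows, idx.ravel()] = 1.0
    W = np.maximum(W, W.T)              # symmetrize the adjacency
    L = np.diag(W.sum(axis=1)) - W      # unnormalized Laplacian
    _, vecs = np.linalg.eigh(L)         # eigenvalues in ascending order
    # Skip the constant eigenvector (assumes a connected graph).
    return vecs[:, 1:dim + 1]

def kmeans(X, seeds, n_iter=50):
    """Plain k-means started from the given seed indices."""
    centers = X[seeds].copy()
    for _ in range(n_iter):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(axis=1)
        for k in range(len(centers)):
            if np.any(labels == k):
                centers[k] = X[labels == k].mean(axis=0)
    return labels

# Toy cloud: 40 points on a line, clustered into two parts.
points = np.arange(40, dtype=float)[:, None]
emb = laplacian_embedding(points, n_neighbors=2, dim=1)
labels = kmeans(emb, seeds=[0, 39])
```

Because the embedding depends only on local neighbourhoods, bending the chain without changing its neighbour graph would leave the embedding, and hence the clusters, unchanged.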

14
Seed propagation along time
- To ensure time consistency, cluster seeds have to be propagated along time
- Old positions of clusters in 3D are added to the new cloud and embedded
- Result: new seeds

15
Results
- Coherent clustering along a sequence
- Handling of topology changes

17
Laplacian matching of dense meshes or voxelsets
- as embeddings are pose-invariant (for articulated bodies)
- they can then be used to match dense shapes by simply aligning their images after embedding
- ICCV '07 – NTRL, ICCV '07 – 3dRR, CVPR '08, submitted to ECCV'08, to submit to PAMI

18
Eigenfunction Histogram assignment
- Algorithm:
  - compute Laplacian embedding of the two shapes
  - find assignment between eigenfunctions of the two shapes
  - this selects a section of the embedding space
  - embeddings are orthogonally aligned there by EM
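
The orthogonal alignment step can be illustrated in its simplest form, under a strong simplifying assumption: point correspondences between the two embeddings are already known, so the EM loop over soft assignments described on the slide is not needed. What remains is the classical orthogonal Procrustes solution via SVD. `orthogonal_align` and the synthetic data are hypothetical.

```python
import numpy as np

def orthogonal_align(X, Y):
    """Orthogonal matrix Q minimizing ||X @ Q - Y||_F,
    where row i of X corresponds to row i of Y."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

# Synthetic check: Y is X transformed by a random orthogonal matrix.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
Q_true, _ = np.linalg.qr(rng.normal(size=(3, 3)))
Y = X @ Q_true
Q = orthogonal_align(X, Y)
```

In an EM setting, a solve of this kind would typically form the M step, with the correspondences re-estimated in the E step.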

19
Results
- Applications: graph matching, protein analysis, motion capture
- To propagate body-part segmentation in time
- Motion field estimation, action segmentation

20
Application: spatio-temporal action segmentation
- problem: segmenting parts of the video(s) containing interesting motions
- global approach: working on the entire sequence (multidimensional volume)
- previous works: object segmentation on the spatio-temporal volume for single frames
- idea: in a multi-camera setup, working on 3D clouds (hulls) + motion fields + time = 7D volume
- outline of an approach: smoothing using message passing + shape detection on the obtained manifold

22
Bilinear models for gait ID
- To recognize the identity of humans from their gait (CVPR '06, book chapter in progress)
- nuisance factors: emotional state, illumination, appearance, view invariance... (literature: randomized trees)
- each motion possesses several labels: action, identity, viewpoint, emotional state, etc.
- bilinear models (Tenenbaum) can be used to separate the influence of style and content (the label to classify)

23
Content classification of unknown style
- given a training set in which persons (content=ID) are seen walking from different viewpoints (style=viewpoint)
- an asymmetric bilinear model can be learned from it through SVD
- when new motions are acquired in which a known person is seen walking from a different viewpoint (unknown style)…
- an iterative EM procedure can be set up to classify the content
- E step -> estimation of p(c|s), the prob. of the content given the current estimate s of the style
- M step -> estimation of the linear map for unknown style s
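
The SVD fitting step can be sketched as below, assuming one mean observation vector per (style, content) pair, stacked style by style into a single matrix. The sketch covers only the training fit and a nearest-reconstruction classifier for a known style; the EM procedure for an unknown style is not reproduced. All names, dimensions and the synthetic data are hypothetical.

```python
import numpy as np

def fit_asymmetric_bilinear(Y, n_styles, J):
    """Fit y^{sc} ~ A[s] @ b^c by truncated SVD.

    Y: (n_styles * K, n_contents) matrix; rows s*K..(s+1)*K-1 hold the
       K-dimensional observations of style s, one column per content.
    J: model dimensionality (number of singular vectors kept).
    """
    U, sing, Vt = np.linalg.svd(Y, full_matrices=False)
    A = (U[:, :J] * sing[:J]).reshape(n_styles, -1, J)  # A[s] is (K, J)
    B = Vt[:J]                                          # column c is b^c
    return A, B

def classify_content(y, A_s, B):
    """Content whose reconstruction under a known style map A_s is closest to y."""
    residuals = np.linalg.norm(A_s @ B - y[:, None], axis=0)
    return int(residuals.argmin())

# Synthetic data generated by an exact bilinear model.
rng = np.random.default_rng(0)
n_styles, K, J, n_contents = 3, 6, 2, 4
A_true = rng.normal(size=(n_styles, K, J))
B_true = rng.normal(size=(J, n_contents))
Y = np.concatenate([A_true[s] @ B_true for s in range(n_styles)], axis=0)
A, B = fit_asymmetric_bilinear(Y, n_styles, J)
```

The fitted `A` and `B` are determined only up to an invertible change of basis, so they need not equal `A_true` and `B_true`; their products reproduce the training observations.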

24
Three-layer model
- Features: projections of the silhouette's contours onto a line through the center
- each sequence is encoded as an HMM
- its C matrix is stacked in a single observation vector
- a bilinear model is learnt from those vectors

25
Results on CMU database
- Mobo database: 25 people performing 4 different walking actions, from 6 cameras
- Three labels: action, ID, view
- Compared performances with baseline algorithm and straight k-NN on sequence HMMs

27
Learning manifolds of dynamical models
- Classify movements represented as dynamical models
- for instance, each image sequence can be mapped to an ARMA or AR linear model
- Motion classification then reduces to finding a suitable distance function in the space of dynamical models
- when some a-priori info is available (training set)...
- we can learn in a supervised fashion the best metric for the classification problem!
- To submit to ECCV'08 – MLVMA Workshop

28
Learning pullback metrics
- many unsupervised algorithms take a dataset as input and map it to an embedded space, but fail to learn a full metric
- consider then a family of diffeomorphisms F between the original space M and a metric space N
- the diffeomorphism F induces on M a pullback metric
- maximizing inverse volume finds the manifold which better interpolates the data (geodesics pass through crowded regions)
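
For reference, the pullback metric mentioned above has a standard definition, written here in notation that is an assumption (the slide gives no symbols beyond F, M and N): g is the metric on N, J_m is the Jacobian of F at m, and G denotes the matrix of g in local coordinates.

```latex
\[
  g^{*}_{m}(u, v) \;=\; g_{F(m)}\bigl(J_m u,\; J_m v\bigr),
  \qquad u, v \in T_m M ,
\]
\[
  G^{*}(m) \;=\; J_m^{\top}\, G\bigl(F(m)\bigr)\, J_m ,
  \qquad
  \mathrm{vol}(m) \;=\; \sqrt{\det G^{*}(m)} .
\]
```

The inverse volume at a data point is the reciprocal of vol(m). Making it large where the data are dense shortens distances there, which is why geodesics of the learnt metric tend to pass through crowded regions.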

29
Space of AR(2) models
- given an input sequence, we can identify the parameters of the linear model which best describes it
- autoregressive models of order 2, AR(2)
- Fisher metric on AR(2)
- Compute the geodesics of the pullback metric on M
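
Identifying the AR(2) parameters of a scalar sequence can be sketched with ordinary least squares, assuming the model y_t = a1*y_(t-1) + a2*y_(t-2) + noise. The estimator choice and the name `fit_ar2` are assumptions; the slides do not say which identification method was used.

```python
import numpy as np

def fit_ar2(y):
    """Least-squares estimate of (a1, a2) in y[t] = a1*y[t-1] + a2*y[t-2] + e[t]."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([y[1:-1], y[:-2]])   # regressors: y[t-1], y[t-2]
    target = y[2:]
    coeffs, *_ = np.linalg.lstsq(X, target, rcond=None)
    return coeffs

# Noise-free sequence generated by a known AR(2) model.
a1, a2 = 1.2, -0.5
y = [1.0, 2.0]
for _ in range(28):
    y.append(a1 * y[-1] + a2 * y[-2])
coeffs = fit_ar2(y)
```

Each sequence is thereby reduced to a point (a1, a2) in a two-dimensional parameter space, which is where a metric such as the Fisher metric would then be defined.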

30
Results on action and ID recognition
- scalar feature, AR(2) and ARMA models

32
Uncertainty measures: intervals, credal sets
- a number of formalisms have been proposed to extend or replace classical probability
- assumption: not enough evidence to determine the actual probability describing the problem
- second-order distributions (Dirichlet), interval probabilities
- credal sets
- Belief functions (Shafer 76): special case of credal sets

33
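Belief functions as random sets
- Probability on a finite set: function $p: 2^\Theta \to [0,1]$ with $p(A) = \sum_{x \in A} m(x)$, where $m: \Theta \to [0,1]$ is a mass function
- Probabilities are additive: if $A \cap B = \emptyset$ then $p(A \cup B) = p(A) + p(B)$
- belief function $b: 2^\Theta \to [0,1]$ with $b(A) = \sum_{B \subseteq A} m(B)$, if $m$ is a mass function on $2^\Theta$ s.t. $m(\emptyset) = 0$ and $\sum_{B \subseteq \Theta} m(B) = 1$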
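
A small sketch of how a belief value follows from a mass function on subsets, using the standard definition of belief as the total mass of the subsets contained in A. The dictionary representation, the name `belief` and the numbers are illustrative assumptions.

```python
def belief(mass, A):
    """b(A): total mass of the focal elements contained in A.

    mass: dict mapping frozenset (non-empty focal element) -> mass, summing to 1.
    """
    A = frozenset(A)
    return sum(v for B, v in mass.items() if B <= A)

# Hypothetical mass function with two focal elements.
mass = {frozenset({"a1"}): 0.7, frozenset({"a1", "a2"}): 0.3}
b_a1 = belief(mass, {"a1"})
b_a2 = belief(mass, {"a2"})
b_a1_a2 = belief(mass, {"a1", "a2"})
```

Here `b_a1 + b_a2` is smaller than `b_a1_a2`, so the belief function is not additive: the mass on {a1, a2} supports the pair without supporting either element alone.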

34
Information fusion by Dempster's rule
- several aggregation or elicitation operators proposed
- original proposal: Dempster's rule
- frame: $\Theta = \{a_1, a_2, a_3, a_4\}$
- $b_1$: $m(\{a_1\}) = 0.7$, $m(\{a_1, a_2\}) = 0.3$
- $b_2$: $m(\Theta) = 0.1$, $m(\{a_2, a_3, a_4\}) = 0.9$
- $b_1 \oplus b_2$:
  - $m(\{a_1\}) = 0.7 \cdot 0.1 / 0.37 = 0.19$
  - $m(\{a_2\}) = 0.3 \cdot 0.9 / 0.37 = 0.73$
  - $m(\{a_1, a_2\}) = 0.3 \cdot 0.1 / 0.37 = 0.08$
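
The worked example on this slide can be checked with a short sketch of Dempster's rule: multiply masses pairwise, assign each product to the intersection of the two focal elements, discard the mass that falls on the empty set, and renormalize. It assumes the blank focal element of b2 is the whole frame {a1, a2, a3, a4}, which is what the slide's arithmetic implies. The dictionary representation and the name `dempster` are illustrative choices.

```python
from itertools import product

def dempster(m1, m2):
    """Combine two mass functions (dict: frozenset -> mass) by Dempster's rule."""
    combined = {}
    conflict = 0.0
    for (B, v), (C, w) in product(m1.items(), m2.items()):
        inter = B & C
        if inter:
            combined[inter] = combined.get(inter, 0.0) + v * w
        else:
            conflict += v * w          # mass falling on the empty set
    if conflict >= 1.0:
        raise ValueError("totally conflicting evidence, rule undefined")
    return {A: v / (1.0 - conflict) for A, v in combined.items()}

theta = frozenset({"a1", "a2", "a3", "a4"})
b1 = {frozenset({"a1"}): 0.7, frozenset({"a1", "a2"}): 0.3}
b2 = {theta: 0.1, frozenset({"a2", "a3", "a4"}): 0.9}
fused = dempster(b1, b2)
```

The only empty intersection is {a1} with {a2, a3, a4}, carrying mass 0.7 * 0.9 = 0.63, so the normalization constant is 1 - 0.63 = 0.37, the denominator used on the slide.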

35
Imprecise classifiers and credal networks
- imprecise classifiers: class estimate is a belief function
- exploit only available evidence, represent ignorance
- Belief networks or credal networks: at each node a belief function or a convex set of probs
- robust version of Bayesian networks

37
Model-free pose estimation
- estimating the pose (internal configuration) of a moving body from the available images (from t=0 to t=T)
- if you do not have an a-priori model of the object...

38
Learning feature-pose maps
- ... learn a map between features and poses directly from the data
- given pose and feature sequences acquired by motion capture: $q_1, \dots, q_T$ and $y_1, \dots, y_T$
- a Gaussian density for each state is set up on the feature space -> approximate feature space
- maps each cluster to the set of training poses $q_k$ with feature $y_k$ inside it

39
Evidential model
- ... and approximate parameter space...
- ... form the evidential model
- MTNS'00, ISIPTA'05, to submit to Information Fusion

40
Results on human body tracking
- comparison of three models: left view only, right view only, both views
- Plot legend: pose estimation yielded by the overall model, estimate associated with the right model, ground truth, left model

41
Conclusions - Research
- Hot topic in computer vision and machine learning: human motion analysis
- Applications: motion capture, surveillance, human-machine interaction, biometric identification
- Different tools from machine learning, robust statistics, differential geometry can be useful
- Several tasks are involved in a hierarchical fashion
- Tasks are not isolated, but interact and generate feedback to help the solution of the others

42
Conclusions - Teaching plans
- machine vision involves notions coming from different branches of pure and applied mathematics: robust statistics, differential geometry, discrete math
- all of them are considered as useful tools to solve real-world problems
- students then have the chance to improve their mathematical background...
- ... and learn at the same time how to develop real products on the ground
- integrated courses can be designed along this line

43
Conclusions - Commercial partnerships
- several opportunities to develop technology transfer activities involving companies
- biometrics: in particular, behavioral (non-controlled) identification
- surveillance: multi-camera human motion detection and classification
- image and video browsing: internet-based content retrieval
- personal links with companies like Honeywell Labs (surveillance), Riya (image googling), MS Research
