
1 Machine learning and imprecise probabilities for computer vision. Fabio Cuzzolin, IDIAP, Martigny, 19/4/2006

2 Myself
- Master's thesis on gesture recognition at the University of Padova
- Visiting student, ESSRL, Washington University in St. Louis
- Ph.D. thesis on the theory of evidence
- Young researcher in Milan with the Image and Sound Processing group
- Post-doc at UCLA in the Vision Lab

3 My research
- Discrete mathematics: linear independence on lattices
- Belief functions and imprecise probabilities: geometric approach, algebraic analysis, combinatorial analysis
- Computer vision: object and body tracking, data association, gesture and action recognition

4 Outline
Computer vision:
- HMMs and size functions for gesture recognition
- Compositional behavior of hidden Markov models
- Volumetric action recognition
- Data association with shape information
- Evidential models for pose estimation
- Bilinear models for view-invariant gaitID
- Riemannian metrics for motion classification
Imprecise probabilities:
- Geometric approach
- Algebraic analysis

5 Approach
Problem: recognizing an example of a known category of gestures from image sequences.
Approach: combination of HMMs (for dynamics) and size functions (for pose representation); continuous hidden Markov models, with the EM algorithm for parameter learning (Moore).

6 Example
- transition matrix A -> gesture dynamics
- state-output matrix C -> collection of hand poses
The gesture is represented as a sequence of transitions between a small set of canonical poses.

7 Size functions
Hand poses are represented through their contours.
[Figure: real image, measuring function, family of lines, size function table]
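
For readers unfamiliar with the tool, this is the standard definition of a size function (Frosini), which the slide's measuring-function/family-of-lines pipeline instantiates; the notation here is mine, not the slide's:

```latex
% Size function of a contour $\mathcal{M}$ under a measuring function
% $\varphi : \mathcal{M} \to \mathbb{R}$ (e.g. distance from a reference line).
% For $x < y$, $\ell_{(\mathcal{M},\varphi)}(x,y)$ counts the connected
% components of the sublevel set $\{\varphi \le y\}$ that contain at least
% one point of $\{\varphi \le x\}$:
\ell_{(\mathcal{M},\varphi)}(x, y) \;=\;
\#\,\bigl\{\, C \in \pi_0(\{ p \in \mathcal{M} : \varphi(p) \le y \})
\;:\; C \cap \{ \varphi \le x \} \neq \emptyset \,\bigr\}
```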

8 Gesture classification
- the EM algorithm is used to learn the parameters of one HMM per gesture class (HMM 1, HMM 2, ..., HMM n) from input feature sequences
- a new sequence is fed to the learnt gesture models
- each model produces a likelihood; the most likely model is chosen (if above a threshold); see the sketch below
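
A minimal sketch of this classification rule, using hmmlearn as a modern stand-in for the original EM implementation; the number of states and the rejection threshold are illustrative assumptions:

```python
# Sketch: max-likelihood gesture classification over per-class HMMs.
import numpy as np
from hmmlearn.hmm import GaussianHMM

def train_models(sequences_per_class, n_states=5):
    """Fit one continuous (Gaussian-output) HMM per gesture class via EM."""
    models = {}
    for label, seqs in sequences_per_class.items():
        X = np.vstack(seqs)                  # stack feature vectors
        lengths = [len(s) for s in seqs]     # sequence boundaries
        m = GaussianHMM(n_components=n_states, covariance_type="diag")
        m.fit(X, lengths)                    # Baum-Welch (EM)
        models[label] = m
    return models

def classify(models, seq, reject_below=-1e4):
    """Score a new feature sequence under every model; pick the most likely."""
    scores = {label: m.score(seq) for label, m in models.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > reject_below else None  # threshold rejection
```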

9 Outline (repeated; next: Compositional behavior of hidden Markov models)

10 Composition of HMMs
Compositional behavior of HMMs: the model of the action of interest is embedded in the overall model.
Clustering: states of the original model are grouped into clusters, and the transition matrix is recomputed accordingly (a sketch follows).
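
The recomputation formula was on the original slide but is lost in the transcript; a plausible reconstruction aggregates transition mass over clusters, weighting each source state by an occupancy distribution (e.g. the stationary distribution) — the weighting is my assumption:

```python
import numpy as np

def cluster_transition_matrix(A, pi, clusters):
    """Reduce an n-state transition matrix A to one over state clusters.

    A        : (n, n) row-stochastic transition matrix
    pi       : (n,) weights for the source states (e.g. stationary distribution)
    clusters : list of lists of state indices, one list per cluster
    """
    k = len(clusters)
    A_red = np.zeros((k, k))
    for a, Ca in enumerate(clusters):
        w = pi[Ca] / pi[Ca].sum()            # weight states within cluster a
        for b, Cb in enumerate(clusters):
            # probability of jumping from cluster a to cluster b
            A_red[a, b] = w @ A[np.ix_(Ca, Cb)].sum(axis=1)
    return A_red                             # rows still sum to 1
```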

11 State clustering
Effect of clustering on HMM topology.
[Figures: cluttered model for the two overlapping motions; reduced model for the "fly" gesture extracted through clustering]

12 Kullback-Leibler comparison
We used the K-L distance to measure the similarity between models extracted from clutter and in its absence.
[Plots: KL distances between "fly" (solid) and "fly" from clutter (dashed); KL distances between "fly" and "cycle"]
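
For reference (not spelled out on the slide), the KL divergence between two HMMs has no closed form and is typically approximated Monte-Carlo style, e.g. in the manner of Juang and Rabiner:

```latex
% Monte-Carlo approximation of the KL divergence between HMMs
% $\lambda_1,\lambda_2$: sample an observation sequence
% $O = (o_1,\dots,o_T)$ from $\lambda_1$ and compare log-likelihoods.
D(\lambda_1 \,\|\, \lambda_2) \;\approx\;
\frac{1}{T}\Bigl[\log P(O \mid \lambda_1) - \log P(O \mid \lambda_2)\Bigr],
\qquad O \sim \lambda_1 .
```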

13 Outline (repeated; next: Volumetric action recognition)

14 Volumetric action recognition
Problem: recognizing the action performed by a person viewed by a number of cameras.
- 2D approaches: features are extracted from single views -> viewpoint dependence
- volumetric approach: features are extracted from a volumetric reconstruction of the moving body

15 Locally linear embedding
3D feature extraction:
- locally linear embedding to find a topological representation of the moving body
- linear discriminant analysis (LDA) to estimate the direction of motion as the direction of maximal separation between the legs
- k-means clustering to separate body parts
(see the sketch below)
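
A minimal scikit-learn sketch of the three steps, assuming the volumetric reconstruction arrives as an (n, 3) point cloud of occupied voxels; parameters, and the identification of which clusters are the legs, are illustrative assumptions, not the slide's:

```python
# Sketch of the 3D feature-extraction pipeline on a voxel point cloud.
import numpy as np
from sklearn.manifold import LocallyLinearEmbedding
from sklearn.cluster import KMeans
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def extract_features(voxels, n_parts=5):
    """voxels: (n, 3) array of occupied-voxel centers for one frame."""
    # 1. LLE: low-dimensional, topology-preserving embedding of the body
    embed = LocallyLinearEmbedding(n_neighbors=12, n_components=2)
    Y = embed.fit_transform(voxels)

    # 2. k-means in the embedded space to separate body parts
    parts = KMeans(n_clusters=n_parts, n_init=10).fit_predict(Y)

    # 3. LDA on the leg clusters: the direction of maximal separation
    #    between the two legs estimates the direction of motion.
    leg_ids = [0, 1]                      # hypothetical labels of the leg clusters
    mask = np.isin(parts, leg_ids)
    lda = LinearDiscriminantAnalysis(n_components=1)
    lda.fit(voxels[mask], parts[mask])
    motion_dir = lda.scalings_[:, 0]      # axis separating the two legs
    return Y, parts, motion_dir / np.linalg.norm(motion_dir)
```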

16 Outline (repeated; next: Data association with shape information)

17 Uncertainty descriptions
A number of formalisms have been proposed to extend or replace classical probability: e.g. possibilities, fuzzy sets, random sets, monotone capacities, gambles, upper and lower previsions.
Theory of evidence (A. Dempster, G. Shafer):
- probabilities are replaced by belief functions
- Bayes' rule is replaced by Dempster's rule
- families of domains allow multiple representations of evidence

18 Belief functions
Probability on a finite set: a function $p: 2^\Theta \to [0,1]$ with $p(A) = \sum_{x \in A} m(x)$, where $m: \Theta \to [0,1]$ is a mass function which meets the normalization constraint $\sum_{x \in \Theta} m(x) = 1$. Probabilities are additive: if $A \cap B = \emptyset$ then $p(A \cup B) = p(A) + p(B)$.
Belief function $s: 2^\Theta \to [0,1]$:
$$ s(A) = \sum_{B \subseteq A} m(B), $$
where $m$ is a mass function on $2^\Theta$ s.t. $\sum_{B \subseteq \Theta} m(B) = 1$. Belief functions are not additive.
[Figure: focal elements $B_1$, $B_2$ contained in a set $A$]

19 Dempster's rule
In the theory of evidence, new information encoded as a belief function is combined with old beliefs in a revision process. Belief functions are combined through Dempster's rule, which operates on intersections of focal elements:
$$ (m_1 \oplus m_2)(A) = \frac{\sum_{A_i \cap B_j = A} m_1(A_i)\, m_2(B_j)}{1 - \sum_{A_i \cap B_j = \emptyset} m_1(A_i)\, m_2(B_j)}, \qquad A \neq \emptyset. $$

20 Example of combination
On the frame $\Theta = \{a_1, a_2, a_3, a_4\}$:
$s_1$: $m(\{a_1\}) = 0.7$, $m(\{a_1, a_2\}) = 0.3$
$s_2$: $m(\Theta) = 0.1$, $m(\{a_2, a_3, a_4\}) = 0.9$
$s_1 \oplus s_2$ (conflicting mass $0.7 \times 0.9 = 0.63$, normalization factor $1 - 0.63 = 0.37$):
$m(\{a_1\}) = 0.7 \times 0.1 / 0.37 = 0.19$
$m(\{a_2\}) = 0.3 \times 0.9 / 0.37 = 0.73$
$m(\{a_1, a_2\}) = 0.3 \times 0.1 / 0.37 = 0.08$
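
A minimal sketch of Dempster's rule that reproduces the slide's numbers; the frozenset encoding is my choice, not the original implementation:

```python
# Sketch: Dempster's rule of combination on a finite frame.
from itertools import product

def dempster(m1, m2):
    """Combine two mass functions (dict: frozenset -> mass)."""
    combined, conflict = {}, 0.0
    for (A, wA), (B, wB) in product(m1.items(), m2.items()):
        inter = A & B
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wA * wB
        else:
            conflict += wA * wB            # mass on empty intersections
    # normalize by the non-conflicting mass (Dempster normalization)
    return {A: w / (1.0 - conflict) for A, w in combined.items()}

theta = frozenset({"a1", "a2", "a3", "a4"})
s1 = {frozenset({"a1"}): 0.7, frozenset({"a1", "a2"}): 0.3}
s2 = {theta: 0.1, frozenset({"a2", "a3", "a4"}): 0.9}
for A, w in dempster(s1, s2).items():
    print(sorted(A), round(w, 2))   # {a1}: 0.19, {a2}: 0.73, {a1,a2}: 0.08
```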

21 JPDA with shape info
- robustness: clutter does not meet the shape constraints
- occlusions: occluded targets can be estimated
- JPDA model: independent targets; shape model: rigid links; fused via Dempster's rule

22 Body tracking
Application: tracking of feature points on a moving human body.

23 Outline (repeated; next: Evidential models for pose estimation)

24 Pose estimation
Estimating the pose (internal configuration) of a moving body from the available images; the salient image measurements are called features.
[Figure: camera viewing a moving body from t=0 to t=T]

25 Model-based estimation
If you have an a-priori model of the object, you can exploit it to help (or drive) the estimation. Example: a kinematic model.

26 Model-free estimation
If you do not have any information about the body, the only way to do inference is to learn a map between features and poses directly from the data. This can be done in a training stage.

27 Collecting training data
A motion capture system provides the 3D locations of markers = the pose.

28 Training data
When the object performs some significant movements in front of the camera, a finite collection of configuration values $q_1, \dots, q_T$ is provided by the motion capture system, while a sequence of features $y_1, \dots, y_T$ is computed from the image(s).

29 Learning feature-pose maps
Hidden Markov models provide a way to build feature-pose maps from the training data:
- a Gaussian density for each state is set up on the feature space -> approximate feature space
- a map is built between each region and the set of training poses $q_k$ whose feature value $y_k$ falls inside it
(a minimal sketch follows)
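
One plausible reading of this construction, sketched with hmmlearn as a stand-in; the state count and the hard assignment of samples to regions are my assumptions:

```python
# Sketch: approximate feature space via HMM states, each region mapped to
# the training poses whose features fall inside it.
import numpy as np
from hmmlearn.hmm import GaussianHMM

def feature_pose_map(Y, Q, n_regions=8):
    """Y: (T, d) feature sequence; Q: (T, p) synchronized mocap poses."""
    hmm = GaussianHMM(n_components=n_regions, covariance_type="diag")
    hmm.fit(Y)                        # one Gaussian density per state = region
    regions = hmm.predict(Y)          # most likely region of each sample
    # map: region index -> the training poses whose features landed there
    return {r: Q[regions == r] for r in range(n_regions)}
```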

30 Evidential model
The approximate feature spaces, together with the approximate parameter space, form a family of compatible frames: the evidential model.

31 Human body tracking
Two experiments, two views: four markers on the right arm; six markers on both legs.

32 Feature extraction
Three steps: original image, color segmentation, bounding box.

33 Performances
Comparison of three models: left view only, right view only, both views.
[Plot: pose estimation yielded by the overall model vs. the estimates associated with the right model, the left model, and the ground truth]

34 Outline (repeated; next: Bilinear models for view-invariant gaitID)

35 GaitID
The problem: recognizing the identity of humans from their gait.
- typical approaches: PCA on image features, HMMs; people typically use silhouette data
- issue: view-invariance; it can be addressed via 3D representations, but 3D tracking is difficult and sensitive

36 Bilinear models
From view-invariance to style invariance: in a dataset of sequences, each motion possesses several labels: action, identity, viewpoint, emotional state, etc. Bilinear models (Tenenbaum) can be used to separate the influence of two of those factors, called style and content (the label to classify):
$$ y^{sc} = A^s b^c $$
where $y^{sc}$ is a k-dimensional training observation with style label $s$ and content label $c$, $b^c$ is a parameter vector representing content, and $A^s$ is a style-specific linear map from the content space onto the observation space.

37 Content classification of unknown style
Consider a training set in which persons (content = ID) are seen walking from different viewpoints (style = viewpoint); an asymmetric bilinear model can be learned from it through the SVD of a stacked observation matrix. When new motions are acquired in which a known person is seen walking from a different viewpoint (unknown style), an iterative EM procedure can be set up to classify the content (identity):
- E step -> estimation of p(c|s), the probability of the content given the current estimate s of the style
- M step -> estimation of the linear map for the unknown style s
(a sketch of the training step follows)
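
A minimal sketch of the SVD training step in the style of Tenenbaum and Freeman's asymmetric bilinear model; the array layout and truncation rank J are illustrative assumptions:

```python
# Sketch: training an asymmetric bilinear model by SVD of the stacked
# observation matrix.
import numpy as np

def train_bilinear(Y, J):
    """Y: (S, C, k) mean observation vectors for S styles x C contents.
    Returns style maps A (S, k, J) and content vectors B (J, C)."""
    S, C, k = Y.shape
    stacked = Y.transpose(0, 2, 1).reshape(S * k, C)  # styles stacked vertically
    U, sv, Vt = np.linalg.svd(stacked, full_matrices=False)
    A = (U[:, :J] * sv[:J]).reshape(S, k, J)          # style-specific linear maps
    B = Vt[:J, :]                                     # content parameter vectors
    return A, B                                       # Y[s, c] ~= A[s] @ B[:, c]
```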

38 Three-layer model
Each sequence is encoded as a Markov model, its C matrix is stacked into an observation vector, and a bilinear model is trained over those vectors.
Feature representation: projection of the contour of the silhouette on a sheaf of lines passing through the center.

39 Mobo database
Mobo database: 25 people performing 4 different walking actions, viewed from 6 cameras. Each sequence has three labels: action, id, view.
We set up four experiments in which one label was chosen as content, another as style, and the remaining one is treated as a nuisance factor:
- content = id, style = view -> view-invariant gaitID
- content = id, style = action -> action-invariant gaitID
- content = action, style = view -> view-invariant action recognition
- content = action, style = id -> style-invariant action recognition

40 Results
Compared performances with a baseline algorithm and straight k-NN on sequence HMMs.

41 Outline (repeated; next: Riemannian metrics for motion classification)

42 Distances between dynamical models
Problem: motion classification.
Approach: representing each movement as a linear dynamical model; for instance, each image sequence can be mapped to an ARMA or AR linear model. Classification then reduces to finding a suitable distance function in the space of dynamical models, which can be plugged into any of the popular classification schemes: k-NN, SVM, etc.

43 Riemannian metrics
Some distances have been proposed: Martin's distance, subspace angles, the gap metric, the Fisher metric. However, it makes no sense to choose a single distance for all possible classification problems. When some a-priori information is available (a training set), we can learn the best metric for the classification problem in a supervised fashion!
Feasible approach: volume minimization of pullback metrics.

44 Learning pullback metrics
Many unsupervised algorithms take a dataset as input and map it to an embedded space, but they fail to learn a full metric. Consider instead a family of diffeomorphisms F between the original space M and a metric space N: each diffeomorphism F induces on M a pullback metric.
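
The slide's diagram is lost in the transcript; for completeness, this is the standard definition of the pullback metric (the notation is mine):

```latex
% Pullback metric: given a diffeomorphism $F: M \to N$ and a Riemannian
% metric $g^N$ on $N$, the pullback $F^* g^N$ on $M$ is defined, for
% tangent vectors $u, v \in T_p M$, through the differential $dF_p$:
(F^* g^N)_p(u, v) \;=\; g^N_{F(p)}\bigl( dF_p\, u,\; dF_p\, v \bigr).
```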

45 Space of AR(2) models
Given an input sequence, we can identify the parameters of the linear model which best describes it. We chose the class of autoregressive models of order 2, AR(2), equipped with the Fisher metric, and compute the geodesics of the pullback metric on M. (A sketch of AR(2) identification follows.)
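
A minimal sketch of AR(2) identification by ordinary least squares; the slide does not specify the identification procedure, so this is the textbook approach, not necessarily the original one:

```python
# Sketch: identify AR(2) parameters from a scalar sequence.
import numpy as np

def fit_ar2(x):
    """Fit x[t] = a1*x[t-1] + a2*x[t-2] + e[t]; return (a1, a2)."""
    x = np.asarray(x, dtype=float)
    X = np.column_stack([x[1:-1], x[:-2]])   # regressors x[t-1], x[t-2]
    y = x[2:]                                # targets x[t]
    (a1, a2), *_ = np.linalg.lstsq(X, y, rcond=None)
    return a1, a2
```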

46 Results
Scalar feature; AR(2) and ARMA models; NN algorithm to classify new sequences.
[Plots: identity recognition; action recognition]

47 Results (2)
Recognition performance of the second-best distance and the optimal pullback metric; the whole dataset is considered, regardless of the view.
[Plots: View 1; View 5]

48 Outline (repeated; next: Imprecise probabilities, geometric approach)

49 Geometric approach to the ToE
Belief functions can be seen as points of a Euclidean space of dimension $2^n - 2$: each subset $A$ corresponds to the $A$-th coordinate $s(A)$.
Belief space: the space of all the belief functions on a given frame; it has the shape of a simplex.
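
A sketch of the construction, following Cuzzolin's geometric approach (the notation here is mine):

```latex
% A belief function $s$ on a frame $\Theta$, $|\Theta| = n$, is identified
% with the vector of its values on proper nonempty subsets:
s \;\longleftrightarrow\;
\bigl( s(A) \bigr)_{\emptyset \neq A \subsetneq \Theta} \;\in\; \mathbb{R}^{2^n - 2},
% and the belief space is the convex closure (a simplex) of the
% "categorical" belief functions $s_A$, each assigning mass 1 to a
% single subset $A$:
\mathcal{B} \;=\; Cl\bigl( \{\, s_A \;:\; \emptyset \neq A \subseteq \Theta \,\} \bigr).
```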

50 Geometry of Dempster's rule
Dempster's rule can be studied in the geometric setup too: it is a geometric operator mapping pairs of points onto another point of the belief space, giving rise to conditional subspaces.

51 Probabilistic approximation
Problem: given a belief function s, finding the best probabilistic approximation of s. This can be solved in the geometric setup, enabling a comparative study of all the proposed probabilistic approximations.
Compositional criterion: the approximation behaves like s when combined through Dempster's rule.

52 Outline (repeated; next: Imprecise probabilities, algebraic analysis)

53 Lattice structure
Families of frames have the algebraic structure of a lattice:
- order relation: the existence of a refining
- each pair of frames admits a minimal refinement and a maximal coarsening
- F is a locally Birkhoff (semimodular with finite length) lattice, bounded below

54 Total belief theorem
A generalization of the total probability theorem: an a-priori constraint and a conditional constraint yield a whole graph of candidate solutions, with connections to combinatorics and linear systems.

