Motion Segmentation from Clustering of Sparse Point Features Using Spatially Constrained Mixture Models
Shrinivas Pundlik
Committee members: Dr. Stan Birchfield (chair), Dr. Adam Hoover, Dr. Ian Walker, Dr. Damon Woodard
Motion Segmentation
Gestalt insight: grouping forms the basis of human perception. Gestalt laws are the factors (cues) that affect the grouping process: similarity, proximity, common motion (common fate), and continuity. Motion segmentation: segmenting images based on common motion, i.e., points moving together are grouped together. Typically, motion segmentation uses common motion + proximity.
Applications of Motion Segmentation
Object detection (e.g., pedestrian detection); tracking (e.g., vehicle tracking); robotics; surveillance; image and video compression; scene reconstruction; video manipulation/editing (video matting, video annotation, motion magnification). Examples: video editing (Criminisi et al., 2006), vehicle tracking (Kanhere et al., 2005), pedestrian detection (Viola et al., 2003).
Previous Work
By approach:
- Motion layer estimation: Wang and Adelson 1994; Ayer and Sawhney 1995; Willis et al. 2003; Xiao and Shah 2005
- Multi-body factorization: Costeira and Kanade 1995; Ke and Kanade 2002; Vidal and Sastry 2003; Yan and Pollefeys 2006; Gruber and Weiss 2006
- Object-level grouping: Sivic et al. 2004; Kanhere et al. 2005
- Miscellaneous: Black and Fleet 1998; Birchfield 1999; Jojic and Frey 2001; Levine and Weiss 2006
By algorithm:
- Expectation maximization: Jojic and Frey 2001; Smith et al. 2004; Kokkinos and Maragos 2004
- Graph cuts: Willis et al. 2003; Xiao and Shah 2005; Criminisi et al. 2006
- Belief propagation: Kumar et al. 2005
- Normalized cuts: Shi and Malik 1998
- Variational methods: Cremers and Soatto 2005; Brox et al. 2005
By nature of data:
- Dense motion: Cremers and Soatto 2005; Brox et al. 2005
- Motion + image cues: Xiao and Shah 2005; Kumar et al. 2005; Criminisi et al. 2006
- Sparse features: Sivic et al. 2004; Rothganger et al. 2004; Kanhere et al. 2005
Challenges: Short Term
Example scene regions: 1. statue, 2. wall, 3. trees, 4. grass, 5. biker, 6. pedestrian. Challenges: computation of motion in the scene; influence of the neighboring motion; number of objects/regions in the scene; initialization of motion parameters; description of complex motions (e.g., articulated human motion).
Challenges: Long Term
Batch processing (a fixed time window) vs. incremental processing; updating the reference frame; maintaining existing groups over time: growing existing regions, splitting, adding new groups (new objects), and deleting invisible groups. (Slide figures plot feature position x against time t for slow, medium, and fast motions under the two processing schemes.)
Objectives
Goals: motion segmentation using sparse point features; automatically determine the number of groups; handle dynamic sequences; real-time performance; handle complex motions. Components: feature tracking (motion computation); motion segmentation in a mixture-model framework (two-frame clustering and long-term maintenance of groups; observed data, parameter estimation, group assignment); motion models (translation, affine, complex models); articulated human motion models.
Overview of the Topics
Feature Tracking: tracking sparse point features for computation of image motion, and its extension to joint feature tracking. S. T. Birchfield and S. J. Pundlik, “Joint Tracking of Features and Edges”, CVPR, 2008.
Motion Segmentation: clustering point features in videos based on their motion and spatial connectivity. S. J. Pundlik and S. T. Birchfield, “Motion Segmentation at Any Speed”, BMVC, 2006. S. J. Pundlik and S. T. Birchfield, “Real Time Motion Segmentation of Sparse Feature Points at Any Speed”, IEEE Trans. on Systems, Man, and Cybernetics, 2008.
Articulated Human Motion Models: learning human walking motion from various poses and view angles for segmentation and pose estimation (special handling of a complex motion model).
Iris Segmentation: texture- and intensity-based segmentation of non-ideal iris images. S. J. Pundlik, D. L. Woodard and S. T. Birchfield, “Non-Ideal Iris Segmentation Using Graph Cuts”, CVPR Workshop on Biometrics, 2008.
Point Features
Popular features: Harris corner feature [Harris and Stephens 1988; Schmid et al. 2000]; Shi-Tomasi feature [Shi and Tomasi 1994]; Förstner corner feature [Förstner 1994]; scale-invariant feature transform (SIFT) [Lowe 2004]; gradient location and orientation histogram (GLOH) [Mikolajczyk and Schmid 2005]; features from accelerated segment test (FAST) [Rosten and Drummond 2005]; speeded-up robust features (SURF) [Bay et al. 2006]; DAISY [Tola et al. 2008]. Point features capture the information content of the image gradients.
Utility of Point Features
Advantages: highly repeatable and extensible (work for a variety of images); efficient to compute (real-time implementations available); processed with local methods (tracking through multiple frames). Tracking multiple point features = sparse optical flow; the sparse point feature tracks yield the image motion.
Tracking Point Features: Lucas-Kanade
Assume constant brightness, which gives the optic flow constraint equation $I_x u + I_y v + I_t = 0$, where $I_x, I_y$ are the image spatial derivatives, $I_t$ is the image temporal derivative, and $\mathbf{u} = (u, v)^T$ is the pixel displacement. Estimate $\mathbf{u}$ by minimizing, over a window $W$ weighted by a convolution kernel $K$,
$E(u, v) = \sum_{\mathbf{x} \in W} K(\mathbf{x}) \left( I_x u + I_y v + I_t \right)^2$.
Differentiating with respect to u and v and setting the derivatives to zero leads to the linear system $Z \mathbf{u} = \mathbf{e}$, where
$Z = \sum K \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix}$ is the gradient covariance matrix and $\mathbf{e} = -\sum K \begin{bmatrix} I_x I_t \\ I_y I_t \end{bmatrix}$.
Iterate using the Newton-Raphson method.
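The window-based solve above can be sketched in a few lines. This is a minimal illustration of the classical Lucas-Kanade step (not the thesis implementation), assuming grayscale NumPy images, a uniform window weighting, and a single integer-located feature:

```python
import numpy as np

def lucas_kanade_step(I1, I2, x, y, w=7):
    """One Newton-Raphson step of Lucas-Kanade at feature (x, y):
    solve Z u = e over a (2w+1) x (2w+1) window."""
    Iy, Ix = np.gradient(I1)                 # spatial derivatives
    It = I2 - I1                             # temporal derivative
    win = np.s_[y - w:y + w + 1, x - w:x + w + 1]
    ix, iy, it = Ix[win].ravel(), Iy[win].ravel(), It[win].ravel()
    Z = np.array([[ix @ ix, ix @ iy],        # gradient covariance matrix
                  [ix @ iy, iy @ iy]])
    e = -np.array([ix @ it, iy @ it])
    return np.linalg.solve(Z, e)             # displacement (u, v)

# Synthetic check: a Gaussian bump translated one pixel to the right.
xx, yy = np.meshgrid(np.arange(64), np.arange(64))
I1 = np.exp(-((xx - 32.0) ** 2 + (yy - 32.0) ** 2) / 50.0)
I2 = np.exp(-((xx - 33.0) ** 2 + (yy - 32.0) ** 2) / 50.0)
u, v = lucas_kanade_step(I1, I2, 32, 32)     # u near 1, v near 0
```

In practice this step is iterated, warping the window by the current estimate, and embedded in a coarse-to-fine pyramid to handle larger displacements.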
Detection of Point Features
Gradient covariance matrix: $Z = \sum K \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix}$, where K is a convolution kernel and $I_x, I_y$ are the image gradients. Good feature: both eigenvalues of Z exceed a threshold. Three cases:
1. Low intensity variation (no feature): two small eigenvalues, e.g., e_max = 5.15, e_min = 3.13.
2. Unidirectional intensity variation (edge feature): one small and one large eigenvalue, e.g., e_max = 1026.9, e_min = 29.9.
3. Bidirectional intensity variation (good feature): two large eigenvalues, e.g., e_max = 1672.44, e_min = 932.4.
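A direct (unoptimized) sketch of this detection rule, assuming a grayscale NumPy image and a uniform kernel K; the synthetic white square reproduces the three cases, with corners as good features, edge midpoints as edges, and flat areas featureless:

```python
import numpy as np

def min_eigenvalue_map(I, w=2):
    """Smaller eigenvalue of the gradient covariance matrix Z at each
    pixel, with Z summed over a (2w+1) x (2w+1) window."""
    Iy, Ix = np.gradient(I)
    H, W = I.shape
    emin = np.zeros((H, W))
    for y in range(w, H - w):
        for x in range(w, W - w):
            ix = Ix[y - w:y + w + 1, x - w:x + w + 1].ravel()
            iy = Iy[y - w:y + w + 1, x - w:x + w + 1].ravel()
            Z = np.array([[ix @ ix, ix @ iy],
                          [ix @ iy, iy @ iy]])
            emin[y, x] = np.linalg.eigvalsh(Z)[0]   # smaller eigenvalue
    return emin

# White square on black background.
I = np.zeros((32, 32))
I[10:22, 10:22] = 1.0
emin = min_eigenvalue_map(I)
corner, edge, flat = emin[10, 10], emin[10, 16], emin[5, 5]
```

Thresholding `emin` then yields the good features; real detectors additionally apply non-maximum suppression and a minimum spacing between features.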
Dense Optical Flow: Horn-Schunck
Horn-Schunck finds global displacement functions u(x, y) and v(x, y) by minimizing
$E = \int \left[ (I_x u + I_y v + I_t)^2 + \alpha^2 \left( \|\nabla u\|^2 + \|\nabla v\|^2 \right) \right] dx \, dy$,
where the first term is the data term (optical flow constraint), the second is the smoothness term, and α is a regularization parameter. Solving with the Euler-Lagrange equations gives $I_x (I_x u + I_y v + I_t) = \alpha^2 \Delta u$ and $I_y (I_x u + I_y v + I_t) = \alpha^2 \Delta v$, where Δ is the Laplacian. Approximating $\Delta u \approx \kappa (\bar{u} - u)$, with $\bar{u}$ the average displacement in the neighborhood and κ a constant, leads to a sparse system.
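The fixed-point iteration implied by the sparse system can be sketched as follows; this is a minimal illustration with a uniform 4-neighbor average for the local means, assuming grayscale NumPy images (not the thesis code):

```python
import numpy as np

def horn_schunck(I1, I2, alpha=1.0, n_iters=200):
    """Iterate u = ubar - Ix (Ix ubar + Iy vbar + It) / (alpha^2 + Ix^2 + Iy^2),
    and likewise for v, where ubar, vbar are 4-neighbor averages."""
    Iy, Ix = np.gradient(I1)
    It = I2 - I1
    u = np.zeros_like(I1)
    v = np.zeros_like(I1)

    def avg(f):                          # 4-neighbor average, replicated border
        p = np.pad(f, 1, mode="edge")
        return (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]) / 4.0

    denom = alpha ** 2 + Ix ** 2 + Iy ** 2
    for _ in range(n_iters):
        ubar, vbar = avg(u), avg(v)
        t = (Ix * ubar + Iy * vbar + It) / denom
        u = ubar - Ix * t
        v = vbar - Iy * t
    return u, v

# Synthetic check: a bright Gaussian blob shifted one pixel to the right.
xx, yy = np.meshgrid(np.arange(64), np.arange(64))
I1 = 100.0 * np.exp(-((xx - 32.0) ** 2 + (yy - 32.0) ** 2) / 50.0)
I2 = 100.0 * np.exp(-((xx - 33.0) ** 2 + (yy - 32.0) ** 2) / 50.0)
u, v = horn_schunck(I1, I2)
```

Where the image gradient is strong the data term pins the flow near the true displacement; in textureless areas the smoothness term fills the flow in by diffusion from the neighbors.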
Need for a Joint Approach
Lucas-Kanade (1981): local method (local smoothing); pixel displacement assumed constant within a small neighborhood; robust under noise; produces sparse optical flow.
Horn-Schunck (1981): global method (global smoothing); pixel displacement is a smooth function over the image domain; sensitive to noise; produces dense optical flow.
Two combinations: use local smoothing to improve dense optical flow (the combined local-global approach of Bruhn et al., 2004), and use global smoothing to improve feature tracking (joint feature tracking).
Joint Lucas-Kanade (JLK)
The joint Lucas-Kanade energy functional over N feature points combines a data term (the optical flow constraint) with a smoothness (regularization) term that ties each feature's displacement to the expected values $(\hat{u}_i, \hat{v}_i)$ computed from neighboring features:
$E_{JLK} = \sum_{i=1}^{N} \left[ \sum_{\mathbf{x} \in W_i} (I_x u_i + I_y v_i + I_t)^2 + \lambda \left( (u_i - \hat{u}_i)^2 + (v_i - \hat{v}_i)^2 \right) \right]$.
Differentiating E_JLK with respect to each (u_i, v_i) gives a 2N x 2N sparse system whose (2i-1)th and (2i)th rows couple feature i to its neighbors; the system is solved using Jacobi iterations.
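A toy version of the Jacobi solve (with the per-feature Z_i and e_i assumed precomputed by ordinary Lucas-Kanade) shows the intended behavior: a poorly textured feature, whose own 2x2 system is nearly singular, is pulled toward the displacement of its well-textured neighbors. The matrices, neighborhoods, and λ below are made up for illustration:

```python
import numpy as np

def joint_lk_jacobi(Z, e, neighbors, lam=0.5, n_iters=100):
    """Jacobi iterations for the coupled system: each feature i solves
    (Z_i + lam I) u_i = e_i + lam u_hat_i, where u_hat_i is the expected
    displacement (here simply the mean of the neighbors' estimates)."""
    N = len(Z)
    U = np.zeros((N, 2))
    for _ in range(n_iters):
        U_new = np.empty_like(U)
        for i in range(N):
            u_hat = U[neighbors[i]].mean(axis=0)
            U_new[i] = np.linalg.solve(Z[i] + lam * np.eye(2),
                                       e[i] + lam * u_hat)
        U = U_new
    return U

# Features 0 and 1 are well textured and observe displacement (1, 0);
# feature 2 is nearly textureless (tiny Z, zero e) but neighbors 0 and 1.
Z = [10 * np.eye(2), 10 * np.eye(2), 0.01 * np.eye(2)]
e = [np.array([10.0, 0.0]), np.array([10.0, 0.0]), np.zeros(2)]
neighbors = [[1, 2], [0, 2], [0, 1]]
U = joint_lk_jacobi(Z, e, neighbors)
```

Standard Lucas-Kanade would leave feature 2 essentially unconstrained; the regularization term propagates the neighbors' motion into it.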
Results of JLK
Improved tracking in regions of low texture and repetitive texture.
Mixture Models Basics
Setup: three bins (components); a sample is drawn, and the challenge is that the only available information is the drawn sample. Posterior probability that the sample came from the Red bin: P(Red|sample) = P(sample|Red) P(Red) / P(sample), where P(sample|Red) is the likelihood of the sample being Red (the measurement: how Red is the drawn sample?) and P(Red) is the prior probability of the Red bin (how big is the Red bin?). Probability of drawing a sample from a mixture of three bins: P(sample) = P(sample|Red)P(Red) + P(sample|Green)P(Green) + P(sample|Blue)P(Blue). A mixture model consists of the likelihoods and priors for all the components.
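The three-bin computation is just Bayes' rule plus the mixture expansion of P(sample); the likelihoods and priors below are made-up numbers for illustration:

```python
# Hypothetical measurements: how "Red/Green/Blue" the drawn sample looks,
# and how big each bin is.
likelihood = {"Red": 0.8, "Green": 0.3, "Blue": 0.1}   # P(sample | bin)
prior      = {"Red": 0.2, "Green": 0.5, "Blue": 0.3}   # bin sizes, sum to 1

# P(sample) = sum over bins of P(sample | bin) P(bin)
evidence = sum(likelihood[b] * prior[b] for b in prior)

# Posterior for each bin: P(bin | sample)
posterior = {b: likelihood[b] * prior[b] / evidence for b in prior}
```

Note how the small Red bin (prior 0.2) still wins the posterior here because the sample looks strongly Red; the prior and the likelihood trade off against each other.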
Mixture Model Example: GMM
Parameters of a Gaussian density: θ_j = {μ_j, σ_j²} (mean and variance). Example: grayscale values modeled with four components θ_1, ..., θ_4. Gaussian density for the i-th pixel x_i conditioned on the parameters of the j-th component:
$p(x_i \mid \theta_j) = \frac{1}{\sqrt{2\pi\sigma_j^2}} \exp\left( -\frac{(x_i - \mu_j)^2}{2\sigma_j^2} \right)$.
Learning Mixture Models
Mixture model defined as $p(x_i \mid \Theta) = \sum_{j=1}^{K} \pi_j \, p(x_i \mid \theta_j)$, where x_i is an observed data point (known), K is the number of components (known), π_j are the mixing weights (unknown), and θ_j are the component density parameters (unknown). Learning the mixture model (parameter estimation) means estimating the mixing weights and the component density parameters. The problem is circular: parameter estimation requires class association (segmentation), and segmentation requires the parameters.
Expectation Maximization
EM: an iterative two-step algorithm for parameter estimation.
1. Initialize: (a) the number of components K, (b) the component density parameters θ for all components, (c) the mixing weights π, (d) a convergence criterion.
2. Repeat until convergence:
E step: for all N data points, compute the likelihood from the component density and estimate the membership weights w (expectation of the likelihood function: segmentation / label assignment).
M step: estimate the mixing weights and the component density parameters (maximize the likelihood function, based on the segmentation).
Convergence: when the likelihood cannot be further maximized (the estimates do not change between successive iterations).
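The loop above can be written compactly for a 1-D Gaussian mixture; this is the generic algorithm on synthetic grayscale-like data, not the thesis's spatially constrained variant, and the quantile initialization is an illustrative choice:

```python
import numpy as np

def em_gmm_1d(x, K=2, n_iters=50):
    """EM for a 1-D Gaussian mixture: the E step computes membership
    weights w[i, j] = P(component j | x_i); the M step re-estimates the
    mixing weights, means, and variances from those weights."""
    mu = np.quantile(x, (np.arange(K) + 0.5) / K)   # spread initial means
    var = np.full(K, x.var())
    pi = np.full(K, 1.0 / K)
    for _ in range(n_iters):
        # E step
        d2 = (x[:, None] - mu[None, :]) ** 2
        lik = np.exp(-d2 / (2 * var)) / np.sqrt(2 * np.pi * var)
        w = lik * pi
        w /= w.sum(axis=1, keepdims=True)
        # M step
        Nj = w.sum(axis=0)
        pi = Nj / len(x)
        mu = (w * x[:, None]).sum(axis=0) / Nj
        var = (w * (x[:, None] - mu[None, :]) ** 2).sum(axis=0) / Nj + 1e-9
    return pi, mu, var

# Two well-separated intensity populations.
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0.0, 0.5, 300), rng.normal(5.0, 0.5, 300)])
pi, mu, var = em_gmm_1d(x)
```

The circularity of the previous slide is visible in the loop: the E step segments given the parameters, the M step re-estimates the parameters given the segmentation.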
Various Mixture Models
- Finite mixture model (FMM): data term only (how closely the data follow the models); one prior for each component (the mixing weights); learned with the EM algorithm.
- Spatially variant finite mixture models, ML-SVFMM and MAP-SVFMM [1]: add a smoothness term (spatial interaction of the data elements); a prior distribution for each data element (label probabilities); neighbors mostly have similar labels (a loose constraint); learned with the EM algorithm.
- Spatially constrained finite mixture model (SCFMM): enforces spatial connectivity of labels; learned with a greedy EM algorithm.
1. S. Sanjay-Gopal and T. Hebert, “Bayesian Pixel Classification Using Spatially Variant Finite Mixtures and Generalized EM Algorithm”, IEEE Trans. on Image Processing, 1998.
Greedy EM (Iterative Region Growing)
Regions are grown iteratively from start locations on a 4-connected grid. Properties of greedy EM: enforces spatial connectivity of labels (SCFMM); automatically determines the number of groups; local initialization of parameters; the primary user-defined parameters are the inclusion criterion and the minimum number of elements in a group.
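A minimal sketch of one growing pass on a 4-connected grid (scalar cell values stand in for the modeled quantity; the tolerance `tol` and `min_size` play the role of the user-defined inclusion criterion and minimum group size):

```python
from collections import deque

def grow_region(values, seed, tol=0.5, min_size=3):
    """Grow a spatially connected region from `seed`, adding 4-neighbors
    whose value stays within `tol` of the region's running mean."""
    H, W = len(values), len(values[0])
    region = {seed}
    total = values[seed[0]][seed[1]]
    frontier = deque([seed])
    while frontier:
        y, x = frontier.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < H and 0 <= nx < W and (ny, nx) not in region \
                    and abs(values[ny][nx] - total / len(region)) <= tol:
                region.add((ny, nx))
                total += values[ny][nx]
                frontier.append((ny, nx))
    return region if len(region) >= min_size else set()

# Two plateaus: growing from (0, 0) floods the 0-valued cells only.
values = [[0, 0, 0, 5, 5],
          [0, 0, 0, 5, 5],
          [0, 0, 0, 5, 5]]
region = grow_region(values, (0, 0))
```

Because a region can only reach cells connected to its seed, the spatial-connectivity constraint is enforced by construction, and repeating the pass from new seeds in unlabeled cells discovers the number of groups automatically.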
Grouping Point Features
Between two frames:
Repeat until all features have been considered:
  Randomly select a seed feature
  Fit a motion model to its neighbors
  Repeat until the group does not change:
    Discard all features except the one nearest the centroid
    Grow the group by recursively including neighboring features with similar motion
    Update the motion model
(The slide illustrates growing a group from a single seed point, with the centroid updated at each iteration.)
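The grow step can be sketched for a translational motion model, where the model is simply the mean flow of the current group; the positions, flows, and thresholds below are made up, and the thesis also fits richer (affine) models:

```python
import numpy as np

def grow_group(pos, flow, seed, radius=1.5, tol=0.25):
    """Grow one group from `seed`: recursively include spatial neighbors
    whose flow vector agrees with the group's (translational) model."""
    group = [seed]
    frontier = [seed]
    while frontier:
        i = frontier.pop()
        model = flow[group].mean(axis=0)          # refit model to the group
        for j in range(len(pos)):
            if j not in group \
                    and np.linalg.norm(pos[j] - pos[i]) <= radius \
                    and np.linalg.norm(flow[j] - model) <= tol:
                group.append(j)
                frontier.append(j)
    return sorted(group)

# Ten collinear features: the left five move right, the right five move left.
pos = np.array([[i, 0.0] for i in range(10)])
flow = np.array([[1.0, 0.0]] * 5 + [[-1.0, 0.0]] * 5)
group = grow_group(pos, flow, seed=0)
```

The growth stops exactly at the motion boundary: feature 5 is a spatial neighbor of feature 4 but its flow disagrees with the group's model.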
Grouping Consistent Features
Input: point features tracked between two frames. Output: groups of point features. Group the point features starting from each of N seed points, then gather the sets of features that are always grouped together (the consistent feature groups).
Grouping Consistent Features
Consistency check: accumulate, over the different seed points, a feature-by-feature co-occurrence matrix (for the features a, b, c, d on the slide, the two per-seed labelings sum to a matrix of counts). Features whose pairwise count equals the number of seed points were grouped together no matter the seed point, and form a consistent feature group. In practice, we use 7 seed points.
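The bookkeeping can be sketched with a co-occurrence count matrix over per-seed labelings; the 4-feature labelings below are made up to mirror the slide's a-d example:

```python
import numpy as np

def consistent_groups(labelings):
    """Count, for every feature pair, how often the two features receive
    the same group label across seed runs; pairs grouped together in
    every run (count == number of seeds) form the consistent groups."""
    L = np.asarray(labelings)            # shape: (num_seeds, num_features)
    S, N = L.shape
    C = np.zeros((N, N), dtype=int)
    for lab in L:
        C += (lab[:, None] == lab[None, :]).astype(int)
    groups, seen = [], set()
    for i in range(N):
        if i not in seen:
            members = [j for j in range(N) if C[i, j] == S]
            groups.append(members)
            seen.update(members)
    return groups

# Two seed runs over features a, b, c, d (indices 0-3): a and b are
# always grouped together; c joins them in only one of the runs.
groups = consistent_groups([[0, 0, 1, 1],
                            [0, 0, 0, 1]])
```

Features that co-occur in only some runs (like c here) end up in their own singleton groups, which the minimum-group-size parameter can then discard.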
Consistent Features: Multiple Groups
Feature groups obtained across the various iterations, and the resulting consistent feature groups.
Maintaining Groups Over Time
Between frame k and frame k + n: track features (some features are lost, newly detected features are added) and find the consistent groups. If a group's motion fails a χ² test, its features are either regrouped or split into multiple groups.
Experimental Results
Sequences: mobile-calendar, freethrow, car-map, robots, statue.
Videos
statue sequence; mobile-calendar sequence.
Results Over Time
Sequences: freethrow, mobile-calendar, statue, car-map, robots, vehicles. The algorithm dynamically determines the number of feature groups.
Comparison with Other Approaches
Algorithm                          Run time (sec/frame)   Max. number of groups
Xiao and Shah (PAMI, 2005)         520                    4
Kumar et al. (ICCV, 2005)          500                    6
Smith et al. (PAMI, 2004)          180                    3
Rothganger et al. (CVPR, 2004)     30                     3
Jojic and Frey (CVPR, 2001)        1                      3
Cremers and Soatto (IJCV, 2005)    40                     4
Our algorithm (TSMC, 2008)         0.16                   8
Effect of Joint Feature Tracking
Input, standard Lucas-Kanade, and joint Lucas-Kanade results.
Articulated Motion Models
Theme: sparse motion alone captures a wealth of information. Purposes of human motion analysis: pedestrian detection/surveillance, action recognition, pose estimation. Traditional approaches use appearance or frame differencing. Objectives: learn articulated human motion models from motion only (no appearance); viewpoint- and scale-invariant detection; varying lighting conditions (day and night sequences); detection in the presence of camera and background motion; pose estimation.
Use of Motion Capture Data
Motion capture (mocap) data in 3D can be used in two ways. Top-down approach: train high-level descriptors (appearance- or motion-based) that describe articulated motion at a global level for detection. Bottom-up approach: learn the motion of individual joints (e.g., the displacement of the limbs, such as the hands and feet, with respect to the body center) from the training data and aggregate the information to detect human motion.
Approach Overview
Training
3D motion capture points; angular viewpoints; walking poses.
Motion Descriptor
Gaussian weight maps for the various means and orientations constitute the motion descriptor; the descriptor bins are arranged spatially with respect to the body center. The bin values of the motion descriptor describe human subjects from various viewpoints and pose configurations (views x poses); a confusion matrix is shown for 64 training descriptors.
Segmentation Results
View-invariant segmentation of articulated motion using the motion descriptor: right profile, left profile, angular, and front views. Also, segmentation of articulated motion in a challenging sequence involving camera and background motion.
Pose Estimation Results
Front view, right-profile view, angular view, and a nighttime sequence.
Videos of Detection and Pose Estimation
Iris Image Segmentation
Non-ideal iris image segmentation using texture and intensity. Ideas: local intensity variations (computed from gradient magnitude and point features) can be used as a texture representation that separates eyelash (textured) from non-eyelash (un-textured) regions; the possible segments based on image intensity are iris, pupil, and background. Eyelash regions have a higher density of point features and higher gradient magnitude; non-eyelash regions have a lower density of point features and lower gradient magnitude. Coarse texture computation thus yields four regions: eyelash, iris, pupil, and background.
Iris Segmentation and Recognition
Iris segmentation pipeline: input iris image, preprocessing (removal of specular reflections), iris segmentation, iris refinement, and the resulting iris mask and iris ellipse. Iris recognition: unwrap and normalize the iris mask; generate an iris signature from the iris mask (using the texture in the iris); compare iris signatures using the Hamming distance.
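The comparison step is the standard fractional Hamming distance between binary signatures, restricted to bits that are valid in both iris masks; the random codes below are placeholders for real signatures, not the thesis's encoder:

```python
import numpy as np

def hamming_distance(code1, code2, mask1, mask2):
    """Fraction of disagreeing bits among the bits valid in both masks."""
    valid = mask1 & mask2
    if valid.sum() == 0:
        return 1.0                       # nothing to compare
    return float(np.count_nonzero((code1 ^ code2) & valid)) / int(valid.sum())

rng = np.random.default_rng(0)
a = rng.integers(0, 2, 512).astype(bool)     # signature of one iris
b = rng.integers(0, 2, 512).astype(bool)     # signature of a different iris
m = np.ones(512, dtype=bool)                 # fully valid masks
d_same = hamming_distance(a, a, m, m)        # identical irises: distance 0
d_diff = hamming_distance(a, b, m, m)        # unrelated irises: near 0.5
```

Unrelated binary codes disagree on about half their bits, so genuine matches cluster near 0 while impostor comparisons cluster near 0.5, and a threshold between them makes the accept/reject decision.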
Image Segmentation Results
Input image, segmentation (iris, pupil, eyelashes, background), and iris mask.
Iris Recognition
Iris recognition using our segmentation algorithm. West Virginia Non-Ideal Database: 1868 images, 467 classes, 4 images/class. West Virginia Off-Axis Database: 584 images, 146 classes, 4 images/class.
Conclusions and Future Work
Motion segmentation based on sparse feature clustering: spatially constrained mixture model and greedy EM algorithm; automatically determines the number of groups; real-time performance; handles long, dynamic sequences and an arbitrary number of feature groups.
Joint feature tracking: incorporates neighboring feature motion; improved performance in areas of low texture or repetitive texture.
Detection of articulated motion: motion-based approach for learning high-level human motion models; segments and tracks human motion under varying pose, scale, and lighting conditions; view-invariant pose estimation.
Iris segmentation: graph-cuts-based dense segmentation using texture and intensity; combines appearance and eye geometry; handles non-ideal iris images with occlusion, illumination changes, and eye rotation.
Future work: integration of motion segmentation, joint feature tracking, and articulated motion segmentation; dense segmentation from the sparse feature groups; handling non-rigid motions, non-textured regions, and occlusions; combining sparse feature groups, discontinuities, and image contours for a novel representation of video.
Questions?