Learning deformable models Yali Amit, University of Chicago Alain Trouvé, CMLA Cachan.
Why modeling? Generative models for object appearance allow us to move from learned objects to online decisions on object configurations. Probability models can be composed. Parameters can be estimated online. Generative models allow us to learn sequentially and still discriminate between objects: sequential learning of new objects, sequential learning of sub-classes. Proper modeling and accounting of invariances allows us to learn with small samples. Large background samples are not necessary.
Modeling object appearance. Object classes are recognized in data modulo strong variations, both geometric and photometric. Variations are modeled as a group action on the data. Data is noisy and sampled discretely. Model object appearance through group actions on a template, which then undergoes some degradation to become the observed data.
As vectors, these images are very far apart. Modulo translation, rotation and contrast, they are identical except for the noise. Lower dimensional parameterization. This structure could not be discovered through direct measurements on the data. (Dictionary world or manifold world)
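The point about distances can be illustrated with a minimal sketch. It is not the authors' code: it assumes cyclic translations and an affine contrast change only (rotation is left out for brevity), and the helper names `shift`, `normalize_contrast` and `dist_modulo_group` are hypothetical.

```python
def shift(image, dy, dx):
    """Cyclically translate a 2D list-of-lists image by (dy, dx)."""
    h, w = len(image), len(image[0])
    return [[image[(y - dy) % h][(x - dx) % w] for x in range(w)]
            for y in range(h)]


def sq_dist(a, b):
    """Squared Euclidean distance between two images of equal size."""
    return sum((pa - pb) ** 2
               for row_a, row_b in zip(a, b)
               for pa, pb in zip(row_a, row_b))


def normalize_contrast(image):
    """Mod out affine contrast changes: zero mean, unit standard deviation."""
    values = [p for row in image for p in row]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    std = var ** 0.5 or 1.0  # a constant image keeps std = 1 to avoid 0/0
    return [[(p - mean) / std for p in row] for row in image]


def dist_modulo_group(a, b):
    """Smallest distance over all cyclic shifts, after contrast normalization."""
    a_n, b_n = normalize_contrast(a), normalize_contrast(b)
    h, w = len(a), len(a[0])
    return min(sq_dist(a_n, shift(b_n, dy, dx))
               for dy in range(h) for dx in range(w))


# Toy template (assumed data) and an observation that is the same shape,
# translated by (1, 1) and with contrast changed to 3 * p + 2.
template = [[0, 0, 0, 0],
            [0, 1, 1, 0],
            [0, 1, 0, 0],
            [0, 0, 0, 0]]
observed = [[3 * p + 2 for p in row] for row in shift(template, 1, 1)]

direct = sq_dist(template, observed)            # large: far apart as vectors
modulo = dist_modulo_group(template, observed)  # essentially zero
print(direct, modulo)
```

The direct distance is large, while the distance taken modulo the assumed group (shifts and contrast) vanishes up to floating-point error.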
Example: handwritten digits. No modeling of contrast → contrast sensitive. One way to avoid modeling a certain variability is to 'mod' it out: binary oriented edges. (Can't add binary images...)
Oriented edge data. [Figure: an original image and its transformation to oriented edges]
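A minimal sketch of how contrast can be modded out with binary oriented edges follows. The edge definition here is an assumption made for illustration (central differences, four orientations, a threshold relative to the strongest gradient in the image); the slides do not specify the detector actually used.

```python
def oriented_edges(image, rel_threshold=0.5):
    """Return 4 binary maps, one per orientation, for a 2D integer image.

    Orientation index: 0 = intensity rises to the right, 1 = rises to the
    left, 2 = rises downward, 3 = rises upward. An edge is switched on where
    the dominant gradient component exceeds rel_threshold times the largest
    gradient in the image, so the output is unchanged by a positive affine
    contrast change a * I + b.
    """
    h, w = len(image), len(image[0])
    grads = {}
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = image[y][x + 1] - image[y][x - 1]
            gy = image[y + 1][x] - image[y - 1][x]
            grads[(y, x)] = (gx, gy)
    edges = [[[0] * w for _ in range(h)] for _ in range(4)]
    max_mag = max((max(abs(gx), abs(gy)) for gx, gy in grads.values()),
                  default=0)
    if max_mag == 0:
        return edges  # flat image: no edges
    for (y, x), (gx, gy) in grads.items():
        if max(abs(gx), abs(gy)) <= rel_threshold * max_mag:
            continue
        if abs(gx) >= abs(gy):
            k = 0 if gx > 0 else 1
        else:
            k = 2 if gy > 0 else 3
        edges[k][y][x] = 1
    return edges


# Toy image (assumed data): a bright 2x2 square on a dark background.
square = [[0] * 6 for _ in range(6)]
for y in (2, 3):
    for x in (2, 3):
        square[y][x] = 10

# The same image after a contrast change I -> 2 * I + 5.
rescaled = [[2 * p + 5 for p in row] for row in square]

edges_a = oriented_edges(square)
edges_b = oriented_edges(rescaled)
n_edges = sum(v for plane in edges_a for row in plane for v in row)
print(edges_a == edges_b, n_edges)
```

Both images produce the same binary edge maps, which is the sense in which this variability no longer needs to be modeled.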
Deforming the data
Simplest background model
Mixture models for the 'micro-world'
Structured library of parts. A mixture of models for local image windows (parts) is used to recode the image data at much lower spatial resolution with little loss of information. A mixture of deformable models (rotations) imposes a geometric structure on this code: it tells us which parts are similar.
Part-based representation. Because parts are structured, not much information is lost at lower resolution. Much invariance is gained. Now estimate Bernoulli mixture models for the object class with the coarse part-based representation. Or estimate a hierarchy of mixture models.
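As a rough illustration of estimating a Bernoulli mixture by EM on binary feature vectors, here is a minimal sketch. The initialisation from evenly spaced training samples, the smoothing constant `eps` and the toy data are all assumptions, not details taken from the slides.

```python
import math


def em_bernoulli_mixture(data, n_components, n_iter=20, eps=1e-3):
    """Fit a mixture of independent Bernoulli vectors to binary data by EM."""
    n, d = len(data), len(data[0])
    weights = [1.0 / n_components] * n_components
    # Assumed initialisation: soften evenly spaced training samples.
    probs = [[0.75 if v else 0.25 for v in data[(k * n) // n_components]]
             for k in range(n_components)]
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each sample.
        resp = []
        for x in data:
            logp = [math.log(weights[k])
                    + sum(math.log(probs[k][j]) if x[j]
                          else math.log(1.0 - probs[k][j])
                          for j in range(d))
                    for k in range(n_components)]
            top = max(logp)
            post = [math.exp(v - top) for v in logp]
            total = sum(post)
            resp.append([p / total for p in post])
        # M-step: responsibility-weighted frequencies, lightly smoothed so
        # that no probability or weight reaches exactly 0 or 1.
        for k in range(n_components):
            nk = sum(r[k] for r in resp)
            weights[k] = (nk + eps) / (n + n_components * eps)
            for j in range(d):
                on = sum(r[k] * x[j] for r, x in zip(resp, data))
                probs[k][j] = (on + eps) / (nk + 2 * eps)
    return weights, probs, resp


# Toy data (assumed): two sub-classes with complementary active features.
data = [[1, 1, 1, 0, 0, 0]] * 10 + [[0, 0, 0, 1, 1, 1]] * 10
weights, probs, resp = em_bernoulli_mixture(data, n_components=2)
print([round(w, 3) for w in weights])
print([[round(p, 3) for p in row] for row in probs])
```

On this toy data each component settles on one of the two sub-classes with roughly equal weight. In the setting of the slides the binary features would be the coarse part codes rather than raw pixels.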
Simple non-linear deformations
Patchwork model: gray levels
Training a POP (patchwork of parts) model. Simple approximation: train each window separately with a full E-step in the EM algorithm. Assume a homogeneous background model outside the window. This works for binary features, but not so well for gray level models. For gray level data: use the current estimates for all other windows, at the optimal instantiation for each training sample, as a background; this gives an iterative optimization of the full likelihood.
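The simple approximation for binary features can be sketched as follows. This is a toy under stated assumptions, not the authors' implementation: a single window, a hidden integer shift with a uniform prior, and a homogeneous Bernoulli background with probability `p_bg`. With such a background, the likelihood of the pixels outside the window only enters through a term that does not depend on the template, so only the log ratio inside the window is needed. All function names are hypothetical.

```python
import math


def window_log_ratio(image, template, p_bg, top, left):
    """Log-likelihood ratio (template vs. background) of the window pixels
    when the template is placed with its corner at (top, left)."""
    total = 0.0
    for i, row in enumerate(template):
        for j, p in enumerate(row):
            if image[top + i][left + j]:
                total += math.log(p / p_bg)
            else:
                total += math.log((1.0 - p) / (1.0 - p_bg))
    return total


def shift_posterior(image, template, p_bg, shifts):
    """Full E-step for one image: posterior over the candidate shifts."""
    scores = [window_log_ratio(image, template, p_bg, t, l) for t, l in shifts]
    top_score = max(scores)
    unnorm = [math.exp(s - top_score) for s in scores]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def em_step(images, template, p_bg, shifts, eps=1e-3):
    """One EM iteration: re-estimate the window template probabilities."""
    th, tw = len(template), len(template[0])
    acc = [[0.0] * tw for _ in range(th)]
    for image in images:
        post = shift_posterior(image, template, p_bg, shifts)
        for (t, l), r in zip(shifts, post):
            for i in range(th):
                for j in range(tw):
                    acc[i][j] += r * image[t + i][l + j]
    n = len(images)
    return [[min(1.0 - eps, max(eps, acc[i][j] / n)) for j in range(tw)]
            for i in range(th)]


def make_image(top, left, size=4):
    """Toy binary image (assumed data): a 2x2 diagonal pattern at (top, left)."""
    image = [[0] * size for _ in range(size)]
    image[top][left] = 1
    image[top + 1][left + 1] = 1
    return image


images = [make_image(0, 0), make_image(1, 1), make_image(2, 0)]
shifts = [(t, l) for t in range(3) for l in range(3)]
template = [[0.7, 0.3], [0.3, 0.7]]  # vague initial guess
for _ in range(5):
    template = em_step(images, template, p_bg=0.1, shifts=shifts)
print([[round(p, 3) for p in row] for row in template])
```

After a few iterations the template concentrates on the diagonal pattern even though it appears at a different location in each training image. The gray level procedure described above, where the other windows serve as background, is not covered by this sketch.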
Training a POP model continued
Mixture models based on parts on coarse grid → POP models for each component based on oriented edge data → For each component of POP model compute mean image modulo shift → Produce gray level POP model from image means.
Conclusion. Importance of modeling variability as a hidden random variable. Estimation of templates and mixtures through EM-type algorithms. Local world – parts, dictionaries with symmetries. Global objects – non-linear deformations. Instead of modeling variability – max over simple subsets of deformations applied to object parts. (Needs formalization.) For object recognition there is rich structure in the subject matter beyond linear operations in function spaces. Distances should not be measured directly in observation space. The 'manifold' is defined through the group action. A wide range of open questions, both theoretical and applied, is waiting to be studied.