9.012 Presentation by Alex Rakhlin March 16, 2001

Computation of pattern invariance in brain-like structures (S. Ullman, S. Soloviev)

The problem of shift invariance
Visual system recognizes familiar objects despite changes in retinal position. Simple transformation, yet no satisfactory and biologically plausible models.

Main approaches
Different initial representations presumably reach a common unified representation at some high level. Two approaches:
Full Replication
Specialized neuronal mechanism dedicated to the detection of a given shape at a given position. Highly redundant. Some variations detect only simple features. Problem: spatial relations are lost.

Main approaches Normalized Representation
Transform image into normalized central representation, common to all retinal positions. Then let the pattern analyzing mechanisms operate on this common representation. Requires a complex, unrealistic network. Does not generalize to other invariances such as rotations. Implies shift invariance for arbitrary novel shapes. Does not account for the main properties of units along the visual pathway; does not account for the role of learning. Other models lie in between these two models.

Psychophysical studies
High degree of position invariance for line drawings of familiar objects (Biederman and Cooper, 1991).
Increase in recognition latencies with discrepancy in size between learned and viewed shapes (Bricolo and Bülthoff, 1993).
Significant decrease in discrimination of novel patterns at nearby locations! (Nazir and O’Regan, 1990)
Extensive training to discriminate between similar novel patterns at one location doesn’t improve performance at a new location! (Dill and Fahle, 1997, etc.)
Shift invariance is not automatic and universal.

Shift invariance by the conjunction of fragments
Uses full replication at the level of object fragments. Stores view-fragments of different complexity as well as the equivalence relations among them.

Shift invariance by conjunction of fragments

Problem of spatial relations
… solved by the use of multiple, overlapping fragments.
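Why overlap helps can be seen in a one-dimensional toy case. The sketch below is an illustration only, not the authors' implementation: the strings, the fragment size, and the helper name `fragments` are all assumptions made for this example. It represents a pattern by the unordered set of its fragments, so position is deliberately discarded.

```python
def fragments(seq, size, step):
    """Return the unordered set of length-`size` fragments of `seq`.

    step == size gives non-overlapping fragments;
    step == 1 gives maximally overlapping fragments.
    """
    return {seq[i:i + size] for i in range(0, len(seq) - size + 1, step)}


# Two hypothetical patterns built from the same pieces in a different order.
pattern_a = "ABCD"
pattern_b = "CDAB"

# Without overlap, both patterns reduce to the same set: the spatial
# relation between "AB" and "CD" is lost.
same_without_overlap = fragments(pattern_a, 2, 2) == fragments(pattern_b, 2, 2)

# With overlap, the fragments that straddle the boundary ("BC" vs "DA")
# tell the two arrangements apart.
same_with_overlap = fragments(pattern_a, 2, 1) == fragments(pattern_b, 2, 1)

print(same_without_overlap, same_with_overlap)
```

The overlapping fragments act as implicit constraints on how neighbouring pieces fit together, which is how a position-free code can still carry spatial relations.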

First domain: line drawings
Grid = n x n
Shapes = connected figures
Parts = connected pairs of line segments

First domain: line drawings
Use a small number of parts. The number of shapes grows exponentially with n^2. The number of possible parts grows polynomially with n. In the 3x3 case, several hundred parts were used for tens of thousands of input shapes. Tested whether the same collection of parts could generate a different shape.
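The two growth rates can be checked on a toy model. The sketch below assumes an n x n grid of points joined by unit horizontal and vertical segments. That grid definition is an assumption of this example, since the slides do not specify it, so the counts are not expected to match the figures quoted for the 3x3 case. `grid_segments` and `connected_pairs` are hypothetical helper names.

```python
from itertools import combinations


def grid_segments(n):
    """Unit segments between horizontally or vertically adjacent grid points."""
    segments = []
    for r in range(n):
        for c in range(n):
            if c + 1 < n:
                segments.append(((r, c), (r, c + 1)))
            if r + 1 < n:
                segments.append(((r, c), (r + 1, c)))
    return segments


def connected_pairs(segments):
    """Parts: pairs of segments that share an endpoint."""
    return [(a, b) for a, b in combinations(segments, 2) if set(a) & set(b)]


for n in range(2, 7):
    segments = grid_segments(n)
    parts = connected_pairs(segments)
    # 2 ** len(segments) counts every subset of segments, so it is only an
    # upper bound on the number of connected figures.
    print(n, len(segments), len(parts), 2 ** len(segments))
```

Under this assumption the number of segments is 2n(n-1), so the bound on shapes is exponential in n^2, while the number of parts stays polynomial in n.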

Second domain: image patches
Transform gray-level images into binary (edge-detecting filter). Start with small micro-patterns of 3x3 patches (512). Next, use 3x3 patches to construct larger micro-patterns of size 5x5 (each contains nine 3x3 patches). Question: are all 5x5 patterns unambiguously determined in terms of the sub-units?
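The two counts on this slide follow from simple enumeration: a binary 3x3 patch has 9 cells, giving 2^9 = 512 patches, and a 5x5 pattern has 3 x 3 = 9 window positions for a 3x3 patch. A minimal sketch, with hypothetical helper names `all_binary_patches` and `sub_patches` and an arbitrary example pattern:

```python
from itertools import product


def all_binary_patches(size=3):
    """Enumerate every binary size x size patch as a tuple of row tuples."""
    patches = []
    for bits in product((0, 1), repeat=size * size):
        rows = tuple(tuple(bits[r * size:(r + 1) * size]) for r in range(size))
        patches.append(rows)
    return patches


def sub_patches(pattern, size=3):
    """List the overlapping size x size windows of a square pattern, row-major."""
    n = len(pattern)
    windows = []
    for r in range(n - size + 1):
        for c in range(n - size + 1):
            windows.append(tuple(tuple(pattern[r + i][c:c + size]) for i in range(size)))
    return windows


# Arbitrary 5x5 binary pattern, used only to count its windows.
example = tuple(tuple((r * c) % 2 for c in range(5)) for r in range(5))

print(len(all_binary_patches()))   # number of distinct 3x3 binary patches
print(len(sub_patches(example)))   # number of 3x3 windows in a 5x5 pattern
```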

Ambiguity of conjunction
[Figure: diagram with elements numbered 1 to 9]
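One way to make the ambiguity concrete is to describe a 5x5 pattern only by which 3x3 patches occur in it, ignoring where they occur. The sketch below assumes that representation is an unordered set, which is a simplification chosen for this example and not necessarily the authors' scheme. The two stripe patterns and the names `sub_patches` and `signature` are hypothetical.

```python
def sub_patches(pattern, size=3):
    """List the overlapping size x size windows of a square pattern, row-major."""
    n = len(pattern)
    return [
        tuple(tuple(pattern[r + i][c:c + size]) for i in range(size))
        for r in range(n - size + 1)
        for c in range(n - size + 1)
    ]


def signature(pattern):
    """Position-free description: the set of 3x3 patches present anywhere."""
    return frozenset(sub_patches(pattern))


# Vertical stripes and the same stripes shifted by one column.
stripes_a = tuple(tuple(c % 2 for c in range(5)) for _ in range(5))
stripes_b = tuple(tuple((c + 1) % 2 for c in range(5)) for _ in range(5))

different_patterns = stripes_a != stripes_b
same_signature = signature(stripes_a) == signature(stripes_b)

# Two distinct patterns that the position-free code cannot tell apart.
print(different_patterns, same_signature)
```

The colliding patterns here differ only by a one-column shift of a periodic texture, so they are also visually similar.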

Building the hierarchy
Construct shift-invariant units for the larger patterns by a convergence of the more elementary sub-units within a region. If a hierarchy is not used, the representation becomes increasingly ambiguous as the size of the pattern increases (e.g. when constructing 7x7 patterns from 3x3 patches directly).

In the hierarchical construction the ambiguity is significantly reduced. Ambiguous patterns are visually quite similar! Degree of overlap is a natural measure. The system will eventually contain units encoding fragments of different size and complexity. These local fragment detectors then converge to a global unit, which responds to the presence of the fragment anywhere in the region.
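The convergence onto a global unit can be sketched as a logical OR over position-specific detectors. This is a minimal illustration under stated assumptions: exact binary matching, an 8x8 region, and the hypothetical names `fragment_at`, `global_unit`, and `place`. The slides do not specify any of these details.

```python
def fragment_at(image, fragment, r, c):
    """Local detector: True if `fragment` matches `image` exactly at (r, c)."""
    size = len(fragment)
    return all(
        image[r + i][c + j] == fragment[i][j]
        for i in range(size)
        for j in range(size)
    )


def global_unit(image, fragment):
    """Global unit: OR over the local detectors at every position in the region."""
    size = len(fragment)
    rows, cols = len(image), len(image[0])
    return any(
        fragment_at(image, fragment, r, c)
        for r in range(rows - size + 1)
        for c in range(cols - size + 1)
    )


def place(fragment, r, c, rows=8, cols=8):
    """Return a blank rows x cols image with `fragment` drawn at (r, c)."""
    image = [[0] * cols for _ in range(rows)]
    for i, row in enumerate(fragment):
        for j, value in enumerate(row):
            image[r + i][c + j] = value
    return image


corner = ((1, 1, 1), (1, 0, 0), (1, 0, 0))
bar = ((0, 1, 0), (0, 1, 0), (0, 1, 0))

top_left = place(corner, 0, 0)
bottom_right = place(corner, 5, 5)

# The same global unit fires wherever the fragment appears in the region.
print(global_unit(top_left, corner), global_unit(bottom_right, corner))
# A unit for a different fragment stays silent.
print(global_unit(top_left, bar))
```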

A novel pattern shown at two locations will activate a similar set of micro-patterns, so the system can generalize immediately. The scheme doesn’t rely on the detailed shape of the object’s bounding contour, so it is tolerant to clutter and occlusion (why?). Learning: if a shape cannot be represented with existing patterns, the system will eventually store additional fragments. These need to be stored at each point, as objects move in the world. This agrees with psychophysics.
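The learning case, where new fragments must be stored separately at each position, can be sketched as a position-indexed memory. This is a hypothetical toy model, not the mechanism described in the source: the class `FragmentMemory`, the `familiarity` score, and the 6x6 images are assumptions made for illustration.

```python
def windows(image, size=3):
    """Yield ((row, col), patch) for every size x size window of `image`."""
    rows, cols = len(image), len(image[0])
    for r in range(rows - size + 1):
        for c in range(cols - size + 1):
            yield (r, c), tuple(tuple(image[r + i][c:c + size]) for i in range(size))


def place(fragment, r, c, rows=6, cols=6):
    """Return a blank rows x cols image with `fragment` drawn at (r, c)."""
    image = [[0] * cols for _ in range(rows)]
    for i, row in enumerate(fragment):
        for j, value in enumerate(row):
            image[r + i][c + j] = value
    return image


class FragmentMemory:
    """Stores fragments per position, so learning at one place does not transfer."""

    def __init__(self):
        self.stored = {}  # (row, col) -> set of patches seen at that position

    def observe(self, image):
        for position, patch in windows(image):
            self.stored.setdefault(position, set()).add(patch)

    def familiarity(self, image):
        """Fraction of windows whose patch was already stored at that position."""
        hits = total = 0
        for position, patch in windows(image):
            total += 1
            hits += patch in self.stored.get(position, set())
        return hits / total


novel = ((1, 0, 1), (0, 1, 0), (1, 1, 0))
at_a = place(novel, 0, 0)
at_b = place(novel, 3, 3)

memory = FragmentMemory()
memory.observe(at_a)
trained = memory.familiarity(at_a)
untrained = memory.familiarity(at_b)   # lower: nothing was stored at the new location
memory.observe(at_b)
after_exposure = memory.familiarity(at_b)
print(trained, untrained, after_exposure)
```

In this toy model, training on a truly novel pattern at one location leaves the other location unfamiliar until the pattern has been seen there too, which mirrors the Dill and Fahle result cited earlier.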

Natural patterns: high orientation preference

Parallels with visual system
Increase in receptive field size along the pathway; selectivity to increasingly complex shapes. High degree of parallelism. Memory-based, using sub-patterns seen in the past. Simple computations (rather than complex internal shifting). Fast computation (rather than lengthy iterative processing).

Extensions
3D
Classification
Equivalence defined between image fragments on the basis of a substitution relation

Conclusions
Outlined an approach to the computation of pattern invariance. Learning-based. Image fragments are building blocks for increasingly complex representations. Many similarities with the visual system. Agrees with the psychophysical studies.

Critique
Rotation invariance? Depth invariance?
Not clear how to implement the learning mechanism on the level of neurons (i.e. how the convergent representation is created)
