Recognition by Parts Visual Recognition Lecture 12 “The whole is equal to the sum of its parts” Euclid.

Main approaches to recognition:
- Pattern recognition
- Invariants
- Alignment
- Part decomposition
- Functional description

Recognize !

“One of the most interesting aspects of the world is that it can be considered to be made up of patterns. A pattern is essentially an arrangement. It is characterized by the order of the elements of which it is made rather than by the intrinsic nature of the elements.” Norbert Wiener

Nonsense Object
- The description reflects the workings of a representational system
- Segmentation at regions of deep concavity
- Parts are described with common volumetric terms
- The manner of segmentation and analysis into components does not depend on our familiarity with the object

Issues
- Why parts? Why partition the shape?
- How does the visual system decompose shapes into parts?
- Are parts chosen arbitrarily by the visual system?
- How are the 3D parts of an object inferred from the 2D projection delivered by the eye?
- Etc.

Between Speech and Object Recognition (OR)
- The number of object categories rivals the number of words that can be identified from speech
- Speech perception: by identification of primitive elements – phonemes
- Small set of primitives (44 in English), each with a handful of attributes
- The representational power derives from combinations of the primitives

OR – The Visual Domain
- Primitives – a modest number of simple geometric components
- Generally convex and volumetric (cylinders, blocks, cones, etc.)
- Segmentation at regions of sharp concavity
- Primitives derive from combinations of a few qualitative characteristics of the edges in the 2D image (straight vs. curved, symmetry, etc.)
- These particular properties of edges are invariant over changes in orientation and can be determined from just a few points on each edge
- Tolerance for variations of viewpoint, occlusion, noise
- The representational power derives from the enormous number of combinations

Count vs. Mass Noun Objects
- Categorization of isolated (unanticipated) objects
- Modeling is limited to concrete entities with specified boundaries
- Mass nouns (water, sand) do not have a simple volumetric description and are identified differently, primarily through surface characteristics (texture, color)

Unexpected Object Recognition
- Is possible (not an obvious conclusion)
- Can be done rapidly
- When viewed from novel orientations
- Under moderate levels of visual noise
- When partially occluded
- When it is a new exemplar of a category

Resulting Constraints
- Access to the mental representation should not depend on absolute judgment of quantitative detail
- The information that is the basis of recognition should be relatively invariant with respect to orientation and modest degradation
- Partial matches should be computable

RBC: Recognition-By-Components
- The contribution: a proposal for a particular vocabulary of components derived from perceptual mechanisms, and an account of how an arrangement of these components can access a representation of an object in memory

Issues
- Stages up to and including the identification of components are assumed to be bottom-up
- It is likely that top-down routes (e.g. from expectancy, object familiarity, scene constraints) will be observed at a number of the stages (e.g. segmentation, component definition, matching); they are omitted in the interests of simplicity
- Matching of the components occurs in parallel
- Partial matches are possible (the degree of match is proportional to the similarity in the components between image and representation)
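The partial-match idea can be made concrete with a small sketch. It assumes, purely for illustration, that an object model is a list of component labels and that the "degree of match" is the fraction of model components also found in the image. The slides do not specify a scoring rule, and the names `match_degree`, `rank_models`, and `MODELS` are hypothetical.

```python
from collections import Counter

# Hypothetical model database: object name -> list of component labels.
MODELS = {
    "cup": ["cylinder", "curved_cylinder"],
    "lamp": ["cone", "cylinder", "brick"],
}

def match_degree(image_geons, model_geons):
    """Fraction of the model's components that also appear in the image.

    Uses multiset overlap, so a model with two cylinders needs two
    cylinders in the image for a full match. This scoring rule is an
    assumption, not something stated in the slides.
    """
    image_counts = Counter(image_geons)
    model_counts = Counter(model_geons)
    shared = sum((image_counts & model_counts).values())
    total = sum(model_counts.values())
    return shared / total if total else 0.0

def rank_models(image_geons, models=MODELS):
    """Score every model independently and return them best-first."""
    scores = {name: match_degree(image_geons, parts) for name, parts in models.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Because each model is scored independently of the others, the scoring loop could run in parallel, which is in the spirit of the parallel matching described above.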

Geons - Units of Representation
- Segmentation into separate regions at points of deep concavity (particularly at cusps where there are discontinuities in curvature)
- Transversality – paired concavities arise whenever convex volumes are joined
- Each segmented region is approximated by one of a possible set of simple components = geons (geometrical ions)
- Can be modeled by generalized cones: the volume swept out by a cross section moving along an axis
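A generalized cone can be sketched as a cross section swept along an axis. The version below is a deliberately restricted illustration: it assumes a circular cross section and a straight axis along z, with only the cross-section size allowed to vary. The function name `sweep_generalized_cone` and its parameters are hypothetical.

```python
import math

def sweep_generalized_cone(radius_at, length=1.0, axis_steps=5, ring_steps=8):
    """Sample surface points of a circular cross section swept along the z axis.

    radius_at(t) returns the cross-section radius at normalized axis
    position t in [0, 1]. A straight axis and a circular cross section
    are simplifying assumptions of this sketch.
    """
    points = []
    for i in range(axis_steps):
        t = i / (axis_steps - 1)
        radius = radius_at(t)
        for j in range(ring_steps):
            angle = 2.0 * math.pi * j / ring_steps
            points.append((radius * math.cos(angle), radius * math.sin(angle), t * length))
    return points

# Constant cross-section size gives a cylinder; linearly shrinking size gives a cone.
cylinder = sweep_generalized_cone(lambda t: 1.0)
cone = sweep_generalized_cone(lambda t: 1.0 - t)
```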

Geons
- Are hypothesized to be simple, typically symmetrical volumes lacking sharp concavities (e.g. blocks, cylinders, spheres)
- Can be differentiated on the basis of perceptual properties in the 2D image that are readily detectable and relatively independent of viewing position and degradation (e.g. good continuation, symmetry)
- Objects can be complex – the units are simple and regular

Relations Among the Geons
- The arrangement of primitives is necessary for representing a particular object
- Different arrangements of the same components can lead to different objects
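One way to see why the arrangement matters is to store relations alongside the components. The sketch below uses a cup and a pail as an illustrative pair built from the same two parts. The relation labels (`side_attached`, `top_attached`) and the class names are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Relation:
    part_a: str
    relation: str   # hypothetical label, e.g. "side_attached"
    part_b: str

@dataclass(frozen=True)
class ObjectModel:
    name: str
    geons: frozenset
    relations: frozenset

PARTS = frozenset({"cylinder", "curved_cylinder"})

# Same two components, different arrangement.
cup = ObjectModel("cup", PARTS,
                  frozenset({Relation("curved_cylinder", "side_attached", "cylinder")}))
pail = ObjectModel("pail", PARTS,
                   frozenset({Relation("curved_cylinder", "top_attached", "cylinder")}))

def same_parts(a, b):
    """True when two models use the same set of components."""
    return a.geons == b.geons

def same_structure(a, b):
    """True only when components and their relations both agree."""
    return same_parts(a, b) and a.relations == b.relations
```

Comparing components alone cannot separate the two models; the relations are what distinguish them.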

Perceptual Basis for RBC
- Certain properties of edges in 2D are taken by the visual system as strong evidence that the 3D edges contain those same properties
- Nonaccidental properties – would only rarely be produced by accidental alignments of viewpoint and object features
- Five nonaccidental properties:
  - Collinearity – a straight edge in the image implies that the edge in the 3D world is also straight
  - Curvilinearity – smoothly curved elements in the image are inferred to arise from smoothly curved features in the 3D world
  - Symmetry – the object projecting the image is also symmetrical
  - Parallelism
  - Cotermination
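Three of these properties have simple geometric tests on 2D image segments. The sketch below is an illustration, not a method from the slides: it assumes segments are pairs of `(x, y)` endpoints with nonzero length, and the tolerance `tol` is an arbitrary absolute threshold. `math.dist` requires Python 3.8 or later.

```python
import math

def direction(segment):
    """Unit direction vector of a segment given as ((x1, y1), (x2, y2))."""
    (x1, y1), (x2, y2) = segment
    length = math.hypot(x2 - x1, y2 - y1)  # assumed nonzero
    return ((x2 - x1) / length, (y2 - y1) / length)

def are_parallel(seg_a, seg_b, tol=1e-3):
    """Parallelism: the 2D cross product of unit directions is sin(angle)."""
    ax, ay = direction(seg_a)
    bx, by = direction(seg_b)
    return abs(ax * by - ay * bx) < tol

def are_collinear(p, q, r, tol=1e-3):
    """Collinearity: three points span a triangle of (near) zero area."""
    twice_area = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return abs(twice_area) < tol

def coterminate(seg_a, seg_b, tol=1e-3):
    """Cotermination: some endpoint of one segment meets an endpoint of the other."""
    return any(math.dist(pa, pb) < tol for pa in seg_a for pb in seg_b)
```

A real system would work with noisy edge detections and scale-relative tolerances; the fixed threshold here only keeps the example short.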

Nonaccidental Properties
- Witkin & Tenenbaum (1983): a surface’s silhouette overrides the perceptual interpretation of the luminance gradient

Penrose Impossible Triangle

Cotermination – accidental alignment of the ends of noncoterminous segments

Muller-Lyer Illusion

Y, arrow, and L vertices allow inference as to the identity of the volume in the image

Generating Geons from Generalized Cones (GC)
- The primitives should be rapidly identifiable and invariant over viewpoint and noise
- Differences among components are based on differences in nonaccidental properties
- Variation over the nonaccidental relations of four attributes of GC generates a set of 36 geons

Geon Set
- The characteristics of the cross section: shape, symmetry, constancy of size along the axis (2 x 3 x 3)
- The shape of the axis (x 2)
- [Figures 6 and/or 7]
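The count of 36 follows directly from the attribute sizes, 2 x 3 x 3 x 2. The enumeration below is a sketch: the slides give only the number of values per attribute, so the value names used here (for example `expanding_then_contracting`) are assumptions based on the usual presentation of the geon attributes.

```python
from itertools import product

# Attribute values are assumed labels; only the counts (2, 3, 3, 2) come from the slide.
CROSS_SECTION_SHAPE = ("straight_edged", "curved_edged")
CROSS_SECTION_SYMMETRY = ("rotational_and_reflective", "reflective", "asymmetric")
CROSS_SECTION_SIZE = ("constant", "expanding", "expanding_then_contracting")
AXIS_SHAPE = ("straight", "curved")

# Every combination of the four attributes is one candidate geon.
GEONS = list(product(CROSS_SECTION_SHAPE, CROSS_SECTION_SYMMETRY,
                     CROSS_SECTION_SIZE, AXIS_SHAPE))

def describe(geon):
    """Human-readable description of one attribute combination."""
    shape, symmetry, size, axis = geon
    return f"{shape} cross section, {symmetry} symmetry, {size} size, {axis} axis"
```

Under these labels a cylinder would correspond to a curved-edged, fully symmetric, constant-size cross section on a straight axis.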

Nonaccidental 2D Contrasts Among Geons
- The values of the 4 attributes can be directly detected as differences in nonaccidental properties, e.g.:
  - Cross-section edges and curvature of the axis – collinearity or curvilinearity
  - Constant vs. expanding size of the cross section – parallelism
- Specification of the above is sufficient to uniquely classify a given arrangement of edges as one of the 36 geons

More Distinctive Nonaccidental Differences
- The arrangement of vertices – a richer description

RBC - Summary
- A specific set of primitives is derived from a small number of independent characteristics of the input
- The perceptual system is designed to represent the free combination of a modest number of primitives based on simple perceptual contrasts
- Geons are uniquely specified from their 2D image properties (so a 3D object-centered reconstruction is not needed)
- The input is mapped onto this modest number of primitives. Then, using a representational system, we can code and access free combinations of these primitives

RBC – General Principles
- A line drawing which represents discontinuities is an efficient description and sufficient for primal access
- Objects are better represented and analyzed by decomposing them into their natural components – parts
- A qualitative description of the components is necessary and sufficient to permit fast access to a database (DB) of object models
- Non-accidental instances of viewpoint-invariant features in the 2D line drawing are sufficient to permit fast access to the qualitative model of a 3D object
- Primal access for visual OR is obtained by matching a description of the spatial structure of the components making up the object to an indexed DB of models in a similar representation

RBC – Computational Hypotheses
- Five specific classes of 2D line groupings are sufficient to access the parts representation
- Segmentation should happen at concavities in the outline of an object
- The geons form an efficient qualitative shape representation for the parts which is suitable for primal access
- The symbolic description for objects and models should include geon labels, aspect ratios, and relative sizes of parts

Implementations
- PARVO – Bergevin and Levine 1988
- OPTICA – Dickinson, Rosenfeld, Pentland 1989
- Munck-Fairwood 1991
- Pentland and Sclaroff 1991
- Raja and Jain 1992

Example - Recovering Geons using Superquadrics
- Lamé curves (1818); superellipse (Piet Hein, 1960)
- Where p is an even positive integer and q is an odd positive integer
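The slide's equation did not survive in this transcript. The standard form of a Lamé curve, given here as a reconstruction rather than as the slide's exact notation, uses a rational exponent m = p/q. The symbols a and b (semi-axis lengths) and m are assumed names.

```latex
\left(\frac{x}{a}\right)^{m} + \left(\frac{y}{b}\right)^{m} = 1,
\qquad m = \frac{p}{q} > 0
```

With p even and q odd, each term is well defined and nonnegative for negative x or y as well, since the odd root exists for negative numbers and the even power removes the sign. The case m = 2 is the ordinary ellipse.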

Superellipse
- From a star shape to a square in the limit
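The change of shape with the exponent can be checked numerically. The sketch below assumes the usual implicit form |x/a|^m + |y/b|^m = 1, which is not preserved in this transcript, and uses absolute values so that any positive real m is accepted. The function names are hypothetical.

```python
def superellipse_value(x, y, a=1.0, b=1.0, m=2.0):
    """Left-hand side of the assumed implicit form |x/a|**m + |y/b|**m."""
    return abs(x / a) ** m + abs(y / b) ** m

def is_inside(x, y, a=1.0, b=1.0, m=2.0):
    """A point is inside or on the curve when the implicit value is at most 1."""
    return superellipse_value(x, y, a, b, m) <= 1.0

# The same two test points against three exponents:
# m < 1 gives a pinched, star-like curve, m = 2 an ellipse,
# and large m approaches the bounding square.
NEAR_CENTER = (0.5, 0.5)
NEAR_CORNER = (0.9, 0.9)
```

For example, `NEAR_CENTER` lies outside the star-like curve for m = 2/3 but inside the circle for m = 2, while `NEAR_CORNER` lies outside the circle but inside the near-square for m = 50.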

Superellipsoid
- A 3D surface is obtained by the spherical product of two 2D curves
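The spherical product can be written out directly. The sketch below follows the commonly used superellipsoid parametrization with size parameters `a1, a2, a3` and shape exponents `eps1, eps2`. The slide's own formulas are not preserved here, so this notation is an assumption, as are the function names.

```python
import math

def signed_power(base, exponent):
    """sign(base) * |base|**exponent, so the surface covers all octants."""
    return math.copysign(abs(base) ** exponent, base)

def superellipsoid_point(eta, omega, a1, a2, a3, eps1, eps2):
    """Surface point from the spherical product of two superellipses.

    eta in [-pi/2, pi/2] sweeps the first curve, omega in [-pi, pi) the second.
    """
    cos_eta, sin_eta = math.cos(eta), math.sin(eta)
    cos_omega, sin_omega = math.cos(omega), math.sin(omega)
    x = a1 * signed_power(cos_eta, eps1) * signed_power(cos_omega, eps2)
    y = a2 * signed_power(cos_eta, eps1) * signed_power(sin_omega, eps2)
    z = a3 * signed_power(sin_eta, eps1)
    return (x, y, z)

def inside_outside(x, y, z, a1, a2, a3, eps1, eps2):
    """Implicit function: < 1 inside, 1 on the surface, > 1 outside."""
    xy_term = abs(x / a1) ** (2.0 / eps2) + abs(y / a2) ** (2.0 / eps2)
    return xy_term ** (eps2 / eps1) + abs(z / a3) ** (2.0 / eps1)
```

Every point produced by `superellipsoid_point` should evaluate to 1 under `inside_outside` up to rounding, which is a convenient consistency check when fitting such models to range data.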

Superquadrics
- Barr (1981) – extension to include superhyperboloids (of one or two pieces) and supertoroids

Superquadrics in General Position
- From world coordinates to SQ-centered coordinates (11 DOF)
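The change of coordinates can be sketched as a rigid transform. The slide gives only the total of 11 degrees of freedom. The breakdown used below (3 sizes, 2 shape exponents, 3 rotation angles, 3 translations) is the usual one for a superellipsoid and is stated here as an assumption, as is the ZYZ Euler-angle convention and every name in the code.

```python
import math

# Assumed breakdown of the 11 degrees of freedom.
PARAMETER_GROUPS = {"size": 3, "shape_exponents": 2, "rotation": 3, "translation": 3}

def rot_z(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def rot_y(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def world_to_local(point, translation, euler_zyz):
    """Map a world point into the superquadric-centered frame.

    The model pose is world = R * local + t, so local = R^T * (world - t).
    """
    phi, theta, psi = euler_zyz
    rotation = matmul(matmul(rot_z(phi), rot_y(theta)), rot_z(psi))
    shifted = [point[i] - translation[i] for i in range(3)]
    # The inverse of a rotation matrix is its transpose.
    return tuple(sum(rotation[k][i] * shifted[k] for k in range(3)) for i in range(3))
```

Once a point is in the local frame, it can be tested against the model's implicit function using the size and shape parameters alone.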

Issues
Domain:
- Suitable mainly for categorization.
Problems:
- Extracting parts from the image is often difficult and unreliable.
- Many objects cannot be distinguished by their part structure alone.
- Metric information is essential in many cases.