1 Visual Recognition Lecturers: Ehud Rivlin, Michael Rudzsky, Ilan Shimshoni Tutor: Michael Rudzsky.

1 Visual Recognition
Lecturers: Ehud Rivlin, Michael Rudzsky, Ilan Shimshoni
Tutor: Michael Rudzsky

3 Outline
• Introduction
• Pattern recognition
• Alignment
• Invariants
• Recognition by parts
• Function and context
• Recognition and cognition
• Applications

4 Motivations for Computer Vision
• “Computational Vision”: modeling biological visual processes
  - [Vision as a challenge]
• “Machine Vision”: performing specific tasks with the aid of visual data
  - [Vision as an engineering tool]
• “Image Understanding”: recovering the structure of a scene (geometry, photometry) from images
  - [Vision as a science]

5 Vision is
• “To see: to know what is where by looking” (Aristotle)
• Vision: an information processing task
• Representation and processing
• Understanding computers is different from understanding computations

6 Mental Rotations
Shepard and Metzler (1971)

8 Understanding an Information Processing (IP) Device
• Computational theory:
  - What is the goal of the computation, and why is it appropriate?
• Representation and algorithm:
  - How can this computational theory be implemented? In particular, what is the representation (input/output) and what is the algorithm for the transformation? (e.g., psychophysics)
• Hardware implementation:
  - How can the representation and algorithm be realized physically? (e.g., neuroanatomy)

9 What is (human) Vision for?
• What kind of information is vision delivering?
• What are the representational issues involved?
• Warrington (1973): patients who had suffered left or right parietal lesions
  - Right-side lesions: given a conventional view, patients could provide the name and semantics; otherwise they failed to recognize the object
  - Left-side lesions: patients were unable to name the object or state its purpose and semantics, but correctly perceived its geometry (in conventional and unconventional views)
• Main job of vision: to derive a representation of shape

11 Vision as Recovery
If we can recover the scene, we can:
• Navigate and avoid obstacles
• Recognize classes of objects
Use the following methodology:
• Computational theory
• Algorithms and data structures
• Implementation

13 Vision processes
Vision processes are divided into three levels:
• Low-level processes (e.g., edge detection, stereo, motion analysis, shape from shading): recovering surfaces from the image, extracting features.
• Mid-level processes (e.g., attention, segmentation, grouping, object-background separation): focusing on the objects.
• High-level processes (e.g., recognition, localization, navigation): building a map of the environment.

15 Marr Paradigm
To understand a visual process, consider the following levels:
(a) The level of computational theory
(b) The level of algorithms and data structures
(c) The level of implementation

17 Problems
• Visual computations are reduced to finding the value of some real quantity
• Success relies on the accuracy of the first or second digit of that quantity
• Instability

19 Unexpected Object Recognition
• “Imagine that you are viewing a random sequence of slides…” (A. Rosenfeld)

20 Conjectures
• For purposes of rapid recognition, humans represent a 3-D object by a set of characteristic views or “aspects”, that is, by a set of commonly occurring 2-D projections.
• To a first approximation, humans describe the image of an object as consisting of a set of “primitive” parts. There are two types of such parts: pieces of regions and pieces of boundaries.
• The properties used by humans to describe parts for purposes of rapid recognition are local property values, or simple combinations of such values…

23 Vision is Hard
• Some pictures are difficult to interpret
• A large part of the brain is devoted to vision
• After 45 years of research in computer vision, we are still nowhere near a solution

24 Why is Vision Hard?
• Ill-definedness
• Ill-posedness
• Intractability

25 Ill-definedness
• The standard scene models used in general recovery tasks are not fully defined:
  - “Piecewise” simple means nothing unless we impose a lower bound on the piece sizes.
  - “Noise” is not easy to model (or to distinguish from the “signal”); it’s not Gaussian!
  - Many easily recognizable object classes do not have simple definitions (A’s, chairs, bushes, dogs, …)

26 Ill-posedness
Recovery problems are usually underconstrained, e.g., ambiguity of illumination/photometry/geometry.
Questionable approaches:
• Add constraints, e.g., smoothness (regularization), description length
Critique: the actual scene may not satisfy the constraints!

27 Shape from Shading
• (p_s, q_s, -1): direction of the light source
• (p, q, -1): normal at the surface point whose image is (x, y)
• Intensity at any image point gives one equation with two unknowns (p, q)
• We need to impose additional constraints
• The same holds for every recovery module
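The one-equation, two-unknowns point can be checked numerically. The sketch below assumes a Lambertian surface, so that intensity is the cosine of the angle between the normal (p, q, -1) and the light direction (p_s, q_s, -1); the function name and the sample values are illustrative and not taken from the slides.

```python
import math

def lambertian_intensity(p, q, ps, qs):
    """Cosine of the angle between the normal (p, q, -1) and the light
    direction (ps, qs, -1), clamped at zero. Lambertian shading is an
    assumption of this sketch."""
    dot = p * ps + q * qs + 1.0
    norm = math.sqrt(p * p + q * q + 1.0) * math.sqrt(ps * ps + qs * qs + 1.0)
    return max(0.0, dot / norm)

# Hypothetical setup: light source along the viewing direction.
ps, qs = 0.0, 0.0

# Three different surface orientations, all with p^2 + q^2 = 1.
candidates = [(1.0, 0.0), (0.0, 1.0), (math.sqrt(0.5), math.sqrt(0.5))]
intensities = [lambertian_intensity(p, q, ps, qs) for p, q in candidates]

for (p, q), value in zip(candidates, intensities):
    print(f"p={p:.3f} q={q:.3f} intensity={value:.4f}")
```

All three orientations produce the same intensity, so a single pixel value cannot single out (p, q) without an additional constraint.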

29 Regularization
• For surface shape: the surface is smooth
• In color processing: reflectance is described with a small number of basis functions (retinal pigments)
• For non-rigid motion: small deviation from rigid motion
• In segmentation: as simple as possible with respect to some complexity measure
• Minimize: […] The equation […] is not enough to recover […] Assume […]
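A common textbook form of such a regularized problem, written in the shape-from-shading notation of slide 27, is sketched below. It is an assumed illustration rather than the lecturers' own formula: I(x, y) denotes the image intensity, R(p, q) a reflectance map, and λ the regularization coefficient. The data term alone, I(x, y) = R(p, q), is one equation in two unknowns; the second term encodes the smoothness assumption.

```latex
% Assumed illustration: I, R and \lambda are not taken from the slide.
\min_{p,\,q} \iint \Big[ \big( I(x,y) - R(p,q) \big)^2
  + \lambda \big( p_x^2 + p_y^2 + q_x^2 + q_y^2 \big) \Big] \, dx \, dy
```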

30 Issues
• The coefficient determines the relative importance of smoothing
• Tends to smooth over discontinuities
• How to pick the coefficient systematically?
• Small values increase sensitivity to noise
Solutions:
- Segment into homogeneous regions and regularize within each region
- Divide the image into boundary and nonboundary points
- etc. (e.g., active vision)
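The effect of the coefficient can be seen on a one-dimensional toy problem. The sketch below is an assumed example, not the lecturers' method: it minimizes sum (u_i - d_i)^2 + lam * sum (u_{i+1} - u_i)^2 by solving the resulting tridiagonal system, and measures how much of a step edge survives for different values of `lam`. The names `regularize_1d` and `lam` are hypothetical.

```python
def regularize_1d(data, lam):
    """Minimize sum (u_i - d_i)^2 + lam * sum (u_{i+1} - u_i)^2.

    The minimizer solves a tridiagonal linear system, handled here with
    the Thomas algorithm (forward elimination, back substitution)."""
    n = len(data)
    if n < 2:
        return [float(v) for v in data]
    off = -lam                       # sub- and super-diagonal entries
    diag = [1.0 + 2.0 * lam] * n     # interior rows
    diag[0] = diag[-1] = 1.0 + lam   # boundary rows have one neighbour
    rhs = [float(v) for v in data]
    for i in range(1, n):            # forward elimination
        w = off / diag[i - 1]
        diag[i] -= w * off
        rhs[i] -= w * rhs[i - 1]
    u = [0.0] * n
    u[-1] = rhs[-1] / diag[-1]
    for i in range(n - 2, -1, -1):   # back substitution
        u[i] = (rhs[i] - off * u[i + 1]) / diag[i]
    return u

# A clean step edge: the discontinuity sits between indices 4 and 5.
step = [0.0] * 5 + [1.0] * 5

for lam in (0.0, 0.1, 10.0):
    u = regularize_1d(step, lam)
    print(f"lam={lam:5.1f}  remaining jump at the edge = {u[5] - u[4]:.3f}")
```

Larger values of `lam` shrink the jump at the edge, which is the "smooths over discontinuities" issue; with `lam` near zero the output simply reproduces the input, noise included.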

31 Intractability
• Recovery and recognition tasks are of combinatorial complexity.
• Parallelism can speed up the early stages of the vision process (e.g., image operations), but little is known about how to speed up the potentially combinatorial stages.
• We are usually forced to solve vision problems in real time with inadequate computational resources; hence we are always cutting corners (modularity, suboptimality).

32 What can be done
Define your domain
• Work in a domain that can be adequate (e.g., specialized)
Pick your problem
• Attempt only partial recovery (“qualitative”) for recognition
• Purposive (functional) vision
Improve your inputs
- Sensory redundancy (multisensor fusion, active vision)
- Processing redundancy (consensus)
Take your time
• Use adequate computational resources

34 Recognition
Associating consistent labels with objects while ignoring variations due to illumination conditions, viewing position, posture (non-rigidities), occlusion, background, etc.

35 Main approaches to recognition:
• Pattern recognition
• Alignment
• Invariants
• Part decomposition
• Functional description

37 Pattern Recognition
• Recognizing objects by their distinguishing features
Method:
• A set of characteristic features is extracted from the image.
• Characteristic features of the same object are clustered together using statistical modeling.
• Recognition is achieved by finding the cluster to which the image features belong.
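The method above can be sketched with the simplest possible statistical model, a nearest-centroid classifier: each object class is summarized by the mean of its training feature vectors, and a new feature vector is assigned to the closest mean. This is a minimal stand-in chosen for illustration; the feature values and class names are made up.

```python
import math

def centroid(vectors):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(vectors)
    return tuple(sum(v[k] for v in vectors) / n for k in range(len(vectors[0])))

def train(samples):
    """samples maps a label to its training feature vectors; returns label -> centroid."""
    return {label: centroid(vectors) for label, vectors in samples.items()}

def classify(model, features):
    """Return the label whose centroid is nearest in Euclidean distance."""
    return min(model, key=lambda label: math.dist(model[label], features))

# Hypothetical two-dimensional features for two object classes.
training = {
    "A": [(0.9, 1.0), (1.1, 1.0), (1.0, 0.8)],
    "B": [(2.0, 2.1), (1.9, 1.8), (2.2, 2.0)],
}
model = train(training)
print(classify(model, (1.05, 0.9)))
print(classify(model, (2.1, 1.9)))
```

The sketch works only as long as the chosen features separate the classes, which is the concern raised on the next slide about variations between images.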

41 Pattern recognition (cont.)
Domain:
• Suitable mainly for 2D objects.
Problems:
• Limited domain.
• Can statistical approaches effectively account for variations between images?

42 Alignment
• Recognizing objects by compensating for variations
Method:
• The stored library of objects contains their shapes and allowed transformations.
• Given an image and an object model, a transformation is sought that makes the model appear identical to the image.
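A minimal sketch of the alignment idea, under strong assumptions: the allowed transformations are 2D similarities (rotation, uniform scale, translation), points are written as complex numbers, and point correspondences are already known. The transformation is recovered from two correspondences and then verified on all points. Function names and data are hypothetical.

```python
def fit_similarity(m0, m1, i0, i1):
    """Similarity z -> a*z + b that maps model points m0, m1 onto image points i0, i1."""
    a = (i1 - i0) / (m1 - m0)   # rotation and scale as one complex factor
    b = i0 - a * m0             # translation
    return a, b

def alignment_error(model, image, a, b):
    """Largest distance between a transformed model point and its image point."""
    return max(abs(a * m + b - w) for m, w in zip(model, image))

# Hypothetical model, and an image of it rotated by 90 degrees, scaled by 2, shifted by 3+1j.
model = [0j, 1 + 0j, 1 + 1j, 0 + 2j]
image_same = [2j * m + (3 + 1j) for m in model]
# A different object that happens to agree on all but one point.
image_other = [3 + 1j, 3 + 3j, 5 + 5j, -1 + 1j]

a, b = fit_similarity(model[0], model[1], image_same[0], image_same[1])
print("same object error:     ", alignment_error(model, image_same, a, b))

a2, b2 = fit_similarity(model[0], model[1], image_other[0], image_other[1])
print("different object error:", alignment_error(model, image_other, a2, b2))
```

In practice the correspondences are unknown, so candidate pairs have to be searched for each library model, which is where the time cost of the approach comes from.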

44 Alignment (cont.)
Domain:
• Suitable mainly for recognition of specific objects.
Problems:
• Complexity: recovering the transformation is time-consuming.
• Indexing: the library is searched serially.
• Non-rigidities are difficult to model.

45 Invariance
• Removing the effect of transformation from the image.
Method:
• Compute a set of measurements that are independent of image variations
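One classical measurement of this kind is the cross-ratio of four collinear points, which is unchanged by projective transformations of the line. The sketch below uses it as an assumed example; the slides do not say which invariants the lecture covers, and the point positions and map coefficients are arbitrary.

```python
def cross_ratio(a, b, c, d):
    """Cross-ratio of four distinct collinear points given by their 1-D coordinates."""
    return ((c - a) * (d - b)) / ((c - b) * (d - a))

def projective_map(x, h):
    """Apply the 1-D projective map x -> (h11*x + h12) / (h21*x + h22)."""
    (h11, h12), (h21, h22) = h
    return (h11 * x + h12) / (h21 * x + h22)

points = [0.0, 1.0, 2.0, 4.0]
h = ((2.0, 1.0), (0.5, 3.0))   # arbitrary non-degenerate map (nonzero determinant)
mapped = [projective_map(x, h) for x in points]

before = cross_ratio(*points)
after = cross_ratio(*mapped)
print(before, after)
```

The individual coordinates change under the map while the cross-ratio stays the same up to rounding, so it can serve as a viewpoint-independent index for planar configurations.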

48 Invariance (cont.)
Domain:
• Suitable mainly for planar objects.
Problems:
• Limited domain: problematic when 3D objects are considered.
• Non-rigidities are difficult to model.

49 Part decomposition
Recognizing objects by their structural descriptions.
Method:
• Given an image, identify basic parts in the image.
• Generate a graph in which the nodes represent the parts and the edges represent the spatial relations between the parts.
• Compare the obtained graph with stored graph representations.
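The graph comparison step can be sketched as a brute-force labeled graph match: two descriptions agree if some renaming of the nodes preserves every part type and every spatial relation. This is an assumed, exhaustive formulation suitable only for a handful of parts; the part types, relation names, and example objects are hypothetical.

```python
from itertools import permutations

def graphs_match(g1, g2):
    """Each graph is (parts, relations): parts maps node -> part type,
    relations maps (node_u, node_v) -> relation name. Returns True if a
    node renaming makes the two graphs identical."""
    parts1, rel1 = g1
    parts2, rel2 = g2
    if len(parts1) != len(parts2) or len(rel1) != len(rel2):
        return False
    ids1 = list(parts1)
    for perm in permutations(parts2):
        mapping = dict(zip(ids1, perm))
        if any(parts1[n] != parts2[mapping[n]] for n in ids1):
            continue   # part types disagree under this renaming
        renamed = {(mapping[u], mapping[v]): r for (u, v), r in rel1.items()}
        if renamed == rel2:
            return True
    return False

# Hypothetical stored model: a slab resting above two sticks.
stored = ({"s": "slab", "l1": "stick", "l2": "stick"},
          {("s", "l1"): "above", ("s", "l2"): "above"})
# Parts extracted from two images, with arbitrary node names.
observed_a = ({1: "stick", 2: "slab", 3: "stick"},
              {(2, 1): "above", (2, 3): "above"})
observed_b = ({1: "stick", 2: "slab", 3: "stick"},
              {(1, 2): "above", (2, 3): "above"})

print(graphs_match(stored, observed_a))
print(graphs_match(stored, observed_b))
```

The first observation matches the stored structure and the second does not, even though both contain the same parts. Note that the match ignores sizes and distances entirely.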

51 Part decomposition (cont.)
Domain:
• Suitable mainly for categorization.
Problems:
• Extracting parts from the image is often difficult and unreliable.
• Many objects cannot be distinguished by their part structure only.
• Metric information is essential in these cases.

53 Functional description
Recognizing objects by their function.
Method:
• Identify parts of the object that have a certain functional use.
• Compare the set of functions extracted from the image with similar sets of stored objects (or with a needed one).
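The comparison step can be sketched with plain sets: an object category is a candidate when every function it requires was found in the image. The function labels and the library below are hypothetical, and extracting the functions from an image is assumed to have been done already.

```python
def matching_objects(observed_functions, library):
    """Names of stored objects whose required functions are all observed."""
    return [name for name, required in library.items()
            if required <= observed_functions]

def can_serve_as(observed_functions, needed_functions):
    """True if the observed object offers every needed function."""
    return needed_functions <= observed_functions

# Hypothetical library: object name -> set of required functions.
library = {
    "chair": {"sittable surface", "stable support", "back support"},
    "stool": {"sittable surface", "stable support"},
    "cup": {"containment", "graspable"},
}

observed = {"sittable surface", "stable support"}
print(matching_objects(observed, library))
print(can_serve_as(observed, {"sittable surface"}))
```

The second function covers the "needed one" case: the question is not what the object is called but whether it can serve a given purpose.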

55 Functional description (cont.)
Domain:
• Easy to apply to man-made objects (design).
Problems:
• Function is often difficult to extract
• Mapping shape to function

56 Recognize!

57 Recognition is harder
• What constitutes an object?
• Categorization vs. identification
• Recognition of entire scenes
• Context

59 Context: The Question
• How does our visual system enable us to gain reliable information about the environment?
• What kind of information can or should a visual system derive from the image?
• Are descriptions of 3-D spatial structure and location the only descriptions that should be produced?
• What is vision for?
• Bottom-up vs. top-down
• Complete and accurate recovery of the scene is not necessary
• Full recovery is not always

60 Purposive Recognition
• Perception for action: formulating questions that are directly related to visual tasks
• Recognition is defined with respect to a given task
• Complete and accurate recovery of the scene is not necessary

62 Applications
• Inspection
• Guiding tools for the blind
• Autonomous machines
• Security systems
• Optical character recognition
• Image databases
• Image stabilizers, video editing

