1 Graphical Models in Vision. Alan L. Yuille. UCLA. Dept. Statistics

2 The Purpose of Vision. “To Know What is Where by Looking”. Aristotle (384-322 BC). Information processing: receive a signal carried by light rays and decode the information it contains. Vision appears deceptively simple, but there is more to Vision than meets the Eye.

3 Ames Room

4 Perspective.

5 What are Humans Ideal for? Clearly humans are not good at determining the size of objects in images – at least for these types of stimuli. But they are good at determining context and taking contextual cues into account – i.e. using perspective cues to estimate depth and making adjustments. What reasoning/statistical tasks are humans ideal for?

6 Brightness of Patterns: Adelson (MIT)

7 Visual Illusions The perceived brightness of a surface, or the perceived length of a line, depends on context, not on basic measurements such as the number of photons that reach the eye or the length of the line in the image.

8 Vision is ill-posed. Vision is ill-posed – the data in the retina is not sufficient to unambiguously determine the visual scene. Vision is possible because we have prior knowledge about visual scenes. Even simple perception is an act of creation.

9 Perception as Inference Helmholtz. 1821-1894. “Perception as Unconscious Inference”.

10 Ball in a Box. (D. Kersten)

11 How Hard is Vision? The Human Brain devotes an enormous amount of resources to vision. (I) Optic nerve is the biggest nerve in the body. (II) Roughly half of the neurons in the cortex are involved in vision (van Essen). If intelligence is proportional to neural activity, then vision requires more intelligence than mathematics or chess.

12 Vision and the Brain

13 Half the Cortex does Vision

14 Vision and Artificial Intelligence The hardness of vision became clearer when the Artificial Intelligence community tried to design computer programs to do vision. In the 1960s, AI researchers thought that vision was “low-level” and easy. Prof. Marvin Minsky (a pioneer of AI) asked a student to solve vision as a summer project.

15 Chess and Face Detection Artificial Intelligence Community preferred Chess to Vision. By the mid-90’s Chess programs could beat the world champion Kasparov. But computers could not find faces in images.

16 Man and Machine. David Marr (1945-1980). Three Levels of explanation: 1. Computational Level / Information Processing. 2. Algorithmic Level. 3. Hardware: neurons versus silicon chips. Claim: Man and Machine are similar at Level 1.

17 Vision: Decoding Images

18 Vision as Probabilistic Inference Represent the World by S. Represent the Image by I. Goal: decode I and infer S. Model image formation by a likelihood function (generative model) P(I|S). Model our knowledge of the world by a prior P(S).

19 Bayes Theorem Bayes’ Theorem states that we should infer the world S from the image I by P(S|I) = P(I|S)P(S)/P(I). Rev. T. Bayes (1702-1761).

20 Bayes to Infer S from I P(I|S) is the likelihood function; P(S) is the prior.
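
A minimal sketch of this decoding rule, using a toy two-hypothesis problem; the scene labels, likelihoods, and prior below are invented for illustration, not values from the slides:

```python
# Sketch of P(S|I) = P(I|S) P(S) / P(I) over a discrete set of scenes.
# All probabilities below are made up for illustration.

def posterior(likelihood, prior):
    """Return P(S|I) for every scene S, given P(I|S) and P(S)."""
    joint = {s: likelihood[s] * prior[s] for s in prior}   # P(I|S) P(S)
    evidence = sum(joint.values())                         # P(I) = sum_S P(I|S) P(S)
    return {s: joint[s] / evidence for s in joint}

likelihood = {"face": 0.30, "background": 0.05}   # P(I|S): how well each scene explains I
prior      = {"face": 0.01, "background": 0.99}   # P(S): faces are rare in images

post = posterior(likelihood, prior)
print(post)                                       # posterior probabilities
print(max(post, key=post.get))                    # MAP decoding of the scene
```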

21 Ambiguity and Complexity of Images. Similar objects give rise to very different images. Different objects can cause similar images.

22 Ideal Observers The Image of a cylinder is consistent with multiple objects and viewpoints. The likelihood is ambiguous (concave or convex). The prior resolves the ambiguity by biasing towards convex objects viewed from above.
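
The convex/concave ambiguity can be made concrete with a tiny numerical sketch (the numbers are invented): when the likelihood barely distinguishes the two interpretations, the posterior simply follows the prior bias towards convex objects viewed from above.

```python
# Illustrative ideal-observer calculation; the probabilities are made up.
likelihood = {"convex": 0.5, "concave": 0.5}   # P(I|S): the image itself is ambiguous
prior      = {"convex": 0.7, "concave": 0.3}   # P(S): bias towards convex, viewed from above

evidence = sum(likelihood[s] * prior[s] for s in prior)             # P(I)
posterior = {s: likelihood[s] * prior[s] / evidence for s in prior}
print(posterior)   # {'convex': 0.7, 'concave': 0.3} -> the prior resolves the ambiguity
```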

23 Influence Graphs and Visual Tasks

24 A Simple Taxonomy of Graphs.

25 Examples of Vision Tasks Visual Inference: (1) Estimating Shape. (2) Segmenting Images. (3) Detecting Faces. (4) Detecting and Reading Text. (5) Parsing the full image – detect and recognize all objects in the image, understand the viewed scene.

26 Segmentation (Level Sets)
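
As a rough illustration (not the full level-set machinery: no curvature regularisation and no re-initialisation), the sketch below evolves a level-set function on a synthetic image using only a two-region, Chan–Vese-style competition term; the test image, step size, and iteration count are arbitrary choices.

```python
import numpy as np

# Synthetic test image: bright square on a dark, noisy background.
rng = np.random.default_rng(0)
I = rng.normal(0.2, 0.05, (64, 64))
I[20:44, 20:44] += 0.6

# Level-set function: positive inside an initial circle, negative outside.
Y, X = np.mgrid[0:64, 0:64]
phi = (20.0 - np.sqrt((X - 32.0)**2 + (Y - 32.0)**2)) / 20.0

dt = 0.5
for _ in range(300):
    inside = phi > 0
    c1 = I[inside].mean()                  # mean intensity inside the contour
    c2 = I[~inside].mean()                 # mean intensity outside the contour
    force = -(I - c1)**2 + (I - c2)**2     # region-competition term only
    phi += dt * force

segmentation = phi > 0                     # final foreground mask
print("foreground pixels:", int(segmentation.sum()))   # roughly 24*24 = 576
```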

27

28 Analysis by Synthesis Invert the generation process to parse the image. Probabilistic Grammars for image generation (week 2).
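
A toy sketch of the analysis-by-synthesis idea (the 1-D "image", generative model, and noise level are invented for illustration): propose candidate scene descriptions, synthesize the image each would produce, and keep the one that best explains the observation under the likelihood.

```python
import numpy as np

def render(position, width=5, length=40):
    """Toy generative model: synthesize a 1-D image of a bar at `position`."""
    image = np.zeros(length)
    image[position:position + width] = 1.0
    return image

rng = np.random.default_rng(1)
true_position = 12
observed = render(true_position) + rng.normal(0.0, 0.3, 40)   # noisy observation

def log_posterior(position, I, sigma=0.3):
    # Gaussian log-likelihood of I given the synthesized image, plus a uniform prior.
    return -np.sum((I - render(position))**2) / (2.0 * sigma**2)

best = max(range(0, 36), key=lambda p: log_posterior(p, observed))
print("inferred position:", best, "| true position:", true_position)
```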

29 Probabilistic Grammars for Images (I) Images are generated by composing visual patterns. (II) Parse an image by decomposing it into patterns.
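
The generation direction can be sketched with a toy stochastic grammar (the production rules, probabilities, and pattern shapes are invented, not the grammar referred to on the slide): sampling from the grammar composes patterns into an image, and parsing would invert the process to recover the derivation.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy grammar (illustrative only):
#   SCENE  -> OBJECT SCENE  (prob 0.6)  |  empty  (prob 0.4)
#   OBJECT -> SQUARE (prob 0.5)  |  BAR (prob 0.5)

def sample_object(canvas):
    r, c = rng.integers(0, 24, size=2)
    if rng.random() < 0.5:
        canvas[r:r + 8, c:c + 8] = 1.0       # draw a SQUARE pattern
        return ("SQUARE", int(r), int(c))
    canvas[r:r + 2, c:c + 16] = 1.0          # draw a BAR pattern
    return ("BAR", int(r), int(c))

def sample_scene():
    canvas = np.zeros((32, 32))
    derivation = []
    while rng.random() < 0.6:                # SCENE -> OBJECT SCENE
        derivation.append(sample_object(canvas))
    return canvas, derivation                # the image and its parse

image, parse = sample_scene()
print("sampled parse:", parse)
```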

30 Generative Models for Patterns Examples of images synthesized from generative models (MCMC).
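
A minimal sketch of MCMC synthesis from a generative model, assuming a simple Ising-style MRF prior rather than the richer models behind the slide's examples; the grid size, coupling strength, and number of sweeps are arbitrary choices.

```python
import numpy as np

# Gibbs sampling from P(I) ∝ exp(beta * sum over neighbouring pixels of I_s * I_t),
# a toy binary-texture model; the parameters below are illustrative.
rng = np.random.default_rng(3)
N, beta, sweeps = 32, 0.8, 100
I = rng.choice([-1, 1], size=(N, N))

for _ in range(sweeps):
    for r in range(N):
        for c in range(N):
            s = (I[(r - 1) % N, c] + I[(r + 1) % N, c]        # sum of the four
                 + I[r, (c - 1) % N] + I[r, (c + 1) % N])      # neighbours (wrap-around)
            p_plus = 1.0 / (1.0 + np.exp(-2.0 * beta * s))     # P(I_rc = +1 | neighbours)
            I[r, c] = 1 if rng.random() < p_plus else -1

print("fraction of +1 pixels:", float((I == 1).mean()))        # synthesized texture statistic
```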

31

32

33 Shape Inference

34 Face and Text Detection.

35 Text Detection

36 Towards Full Image Parsing The image genome project (Zhu). Attempt to determine the grammar for images by interactive parsing of images. Thereby learn the statistical regularities of images – the priors and the representations.

37 Parse graph with horizontal relations

38 Example: street scene

39

40 Database

41 Back to the Brain Top level: compare human performance to Ideal Observers. Explain human perceptual biases (visual illusions) as strategies that are “statistically effective”.

42 Brain Architecture The Bayesian models have interesting analogies to the brain: generative models and analysis by synthesis. Is this consistent with top-down processing? (Kersten’s talk next week).

43 Conclusion Vision is unconscious inference. The Bayesian approach leads to vision as analysis by synthesis -- inverting the image generation process. This requires “sophisticated” priors about the statistics of natural images. This can be formulated mathematically in terms of Probabilistic Grammars for image formation. These grammars can be learnt by analysing the “sophisticated” statistics of natural images.

