Segmentation and Grouping


Segmentation and Grouping
Outline:
Overview
Gestalt laws
The role of recognition processes
Excursion: competition and binocular rivalry
Segmentation as clustering
Fitting and the Hough transform

Credits: major sources of material, including figures and slides, were:
Forsyth, D.A. and Ponce, J., Computer Vision: A Modern Approach, Prentice Hall, 2003.
Peterson, M., "Object recognition processes can and do operate before figure-ground organization," Current Directions in Psychological Science, 1994.
Wilson, H., Spikes, Decisions, and Actions, Oxford University Press, 1999.
Jitendra Malik and various resources on the WWW.

Different Views
Obtain a compact representation from an image, motion sequence, or set of tokens; it should support the application. A broad theory is absent at present.
Grouping (or clustering): collect together tokens that "belong together".
Fitting: associate a model with tokens. Issues: which model? which token goes to which element? how many elements in the model?

General ideas
Tokens: whatever we need to group (pixels, points, surface elements, etc.).
Top-down segmentation: tokens belong together because they lie on the same object.
Bottom-up segmentation: tokens belong together because they are locally coherent.
These two are not mutually exclusive.

Why do these tokens belong together? A disturbing possibility is that they all lie on a sphere -- but then, if we didn't know that the tokens belonged together, where did the sphere come from?

A driving force behind the Gestalt movement is the observation that it isn't enough to think about pictures in terms of separating figure from ground (e.g. foreground and background). This is (partially) because there are too many different possibilities in pictures like this one: is a square with a hole in it the figure? or a white circle? or what?

Basic ideas of grouping in humans
Figure-ground discrimination: grouping can be seen in terms of allocating some elements to a figure and some to ground; an impoverished theory.
Gestalt properties: elements in a collection can have properties that result from their relationships (e.g. the Müller-Lyer effect) -- "Gestaltqualität".
A series of factors affects whether elements should be grouped together: the Gestalt factors.

The famous Müller-Lyer illusion; the point is that the horizontal bar has properties that come only from its membership in a group (it looks shorter in the lower picture, but is actually the same size), and that these properties can't be discounted -- you can't look at this figure, ignore the arrowheads, and thereby make the two bars seem the same size.

Some criteria that tend to cause tokens to be grouped.

More such criteria.

Occlusion cues seem to be very important in grouping. Most people find it hard to read the five numerals in this picture,

but easy in this

The story is in the book (figure 14.7)

Illusory contours; a curious phenomenon where you see an object that appears to be occluding.

Segmentation and Recognition
Early idea (Marr and others): segmentation of the scene into surfaces, essentially bottom-up (the 2½-D sketch); many computer vision systems assume segmentation happens strictly before recognition.
Now: few people still believe that a general solution to bottom-up segmentation of a scene is possible; some think that segmentation processes should provide a set of segmentations among which higher processes somehow choose.

Evidence for Recognition Influencing Figure-Ground Processes (after Mary A. Peterson, 1994). Rubin vase-faces stimulus: you see only one shape at a time, with spontaneous switching.

Reversals of Figure-Ground Organization
a+c: symmetry, enclosure, and relative smallness of area suggest the center is the foreground.
b+d: partial symmetry, relative smallness of area, and interposition suggest the center is the foreground.
a+c (inverted) and b+d (upright) are rotated versions of each other.
Two conditions: "try to see white as foreground" and "try to see black as foreground".
Measurement: how long do subjects see which region as foreground before a reversal?

Results (averaged across the two conditions): overall, subjects perceive the white region as foreground for longer if it is an object in canonical orientation; durations when the black object is foreground get shorter.

Impoverished recognition of, e.g., upside-down faces:

The First Perceived Figure-Ground Organization: low- vs. high-denotative regions (left vs. right in the example), matched for area and convexity, presented for 86 ms and followed by a mask. Results: high-denotative regions are seen as foreground more often when upright than when inverted (76% vs. 61%); this works for presentation times as short as 28 ms.

Combination with Symmetry Effects: symmetry also requires presentation for at least 28 ms; symmetry and recognition seem to get about equal weight in influencing figure-ground organization.

Object Recognition Inputs to the Organization of 3-D Displays: in a stereogram version of the stimuli, disparity can suggest either the high- or the low-denotative region as foreground (cooperative vs. competitive stereograms). Results: in the cooperative case the high-denotative region is seen as foreground ~90% of the time; in the competitive case, ~50%. Note: for random-dot stereograms the high-denotative region has no advantage for becoming foreground.

Peterson’s Model:

Competition and Rivalry: Decisions! Motivation: the ability to decide between alternatives is fundamental. Idea: inhibitory interaction between neuronal populations representing the different alternatives is a plausible candidate mechanism. The simplest such system: a winner-take-all (WTA) network with inputs K1 and K2.

The Naka-Rushton function. A good fit to the steady-state firing rate of neurons in several visual areas (LGN, V1, middle temporal) in response to a visual stimulus of contrast P is given by R(P) = rmax P^N / (P½^N + P^N) for P ≥ 0, and R = 0 otherwise. P½, the "semi-saturation", is the stimulus contrast (intensity) that produces half of the maximum firing rate rmax; N determines the slope of the non-linearity at P½. (Albrecht and Hamilton, 1982)
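A minimal numerical sketch of the Naka-Rushton non-linearity. The parameter values (rmax = 100, P½ = 120, N = 2) are illustrative assumptions chosen to be consistent with the stationary states quoted on the stability slide, not values taken from Albrecht and Hamilton:

```python
import numpy as np

def naka_rushton(P, r_max=100.0, P_half=120.0, N=2):
    """Steady-state firing rate for contrast P; zero for negative net input."""
    P = np.maximum(P, 0.0)
    return r_max * P ** N / (P_half ** N + P ** N)

# At the semi-saturation contrast the response is half the maximum rate:
# naka_rushton(120.0) -> 50.0
```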

Stationary States and Stability
The stationary states for K1 = K2 = 120:
e1 = 50, e2 = 0
e2 = 50, e1 = 0
e1 = e2 = 20
Linear stability analysis (τ = 20 ms):
1) for e1 = 50, e2 = 0: a "stable node"
2) for e1 = e2 = 20: an "unstable saddle"

Matlab Simulation. Behavior for strong identical input (K1 = K2 = K = 120): one unit wins the competition and completely suppresses the other.
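The same behavior can be reproduced with a short Euler integration (a sketch, not the original Matlab code). The inhibitory weight of 3 and the Naka-Rushton parameters (rmax = 100, P½ = 120, N = 2) are assumptions chosen so that the stationary states quoted above (50/0 and 20/20 for K = 120) come out:

```python
def S(P, r_max=100.0, P_half=120.0, N=2):
    """Naka-Rushton non-linearity, rectified for negative net input."""
    P = max(P, 0.0)
    return r_max * P ** N / (P_half ** N + P ** N)

def simulate_wta(K1, K2, w=3.0, tau=20.0, dt=0.5, t_end=2000.0,
                 e1=1.0, e2=0.0):
    """Euler-integrate tau * de_i/dt = -e_i + S(K_i - w * e_j): each unit
    is driven by its own input minus inhibition from the other unit."""
    for _ in range(int(t_end / dt)):
        d1 = (-e1 + S(K1 - w * e2)) / tau
        d2 = (-e2 + S(K2 - w * e1)) / tau
        e1, e2 = e1 + dt * d1, e2 + dt * d2
    return e1, e2

# Strong identical input; a tiny head start for unit 1 decides the competition:
e1, e2 = simulate_wta(120.0, 120.0)
# e1 ends near the stable node at 50, e2 is completely suppressed
```

Starting exactly on the symmetric manifold would leave the system at the unstable saddle (20, 20); any asymmetry grows and one unit wins.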

Binocular Rivalry, Bistable Percepts. Idea: extend the WTA network with a slow adaptation mechanism that models neural adaptation due to a slow hyperpolarizing potassium current. Adaptation acts to increase the semi-saturation of the Naka-Rushton non-linearity. Examples: ambiguous figures; binocular rivalry (left eye vs. right eye).

Matlab simulation of rivalry (adaptation strength β = 1.5).

Discussion of the Rivalry Model
Positives: roughly consistent with anatomy/physiology; offers a parsimonious mechanism for different perceptual switching phenomena -- in a sense it "unifies" different phenomena by explaining them with the same mechanism.
Limitations: provides only a qualitative account; real switching behavior is not so regular and simple (cycles of different durations, temporal asymmetries); in rivalry, competition likely takes place in a hierarchical network rather than in just one stage; the spatial dimension was ignored; so are training effects.

Technique: Shot Boundary Detection
Goal: find the shots in a video sequence; shot boundaries usually produce big differences between succeeding frames.
Strategy: compute interframe distances and declare a boundary wherever these are big.
Possible distances: frame differences, histogram differences, block comparisons, edge differences.
Applications: representations for movies or video sequences; finding shot boundaries; obtaining a "most representative" frame; supporting search.
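The histogram-difference strategy can be sketched as follows. Grey-level frames as numpy arrays, the bin count, and the threshold are illustrative assumptions:

```python
import numpy as np

def grey_histogram(frame, bins=32):
    """Normalized grey-level histogram of one frame."""
    h, _ = np.histogram(frame, bins=bins, range=(0, 256))
    return h / h.sum()

def shot_boundaries(frames, threshold=1.0):
    """Declare a boundary wherever the interframe histogram distance is big."""
    dists = [np.abs(grey_histogram(a) - grey_histogram(b)).sum()
             for a, b in zip(frames, frames[1:])]
    return [i + 1 for i, d in enumerate(dists) if d > threshold]

# Synthetic "video": five dark frames, then five bright ones.
rng = np.random.default_rng(0)
video = [rng.integers(0, 60, size=(64, 64)) for _ in range(5)] + \
        [rng.integers(180, 250, size=(64, 64)) for _ in range(5)]
# shot_boundaries(video) -> [5]
```

Histogram distances ignore where the intensities sit in the frame, which is exactly why they are robust to motion within a shot but fire on cuts.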

Technique: Background Subtraction
If we know what the background looks like, it is easy to identify the "interesting bits".
Applications: a person in an office; tracking cars on a road; surveillance.
Approach: use a moving average to estimate a background image; subtract it from the current frame; pixels with large absolute differences are interesting. Trick: use morphological operations to clean up the resulting pixels.
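The moving-average approach can be sketched as below. The learning rate and threshold are illustrative assumptions, and the morphological clean-up step (e.g. a binary opening from scipy.ndimage) is omitted to keep the sketch dependency-free:

```python
import numpy as np

def subtract_background(frames, alpha=0.05, threshold=40.0):
    """Moving-average background estimate; pixels whose absolute difference
    from the estimate is large are flagged as interesting."""
    background = frames[0].astype(float)
    masks = []
    for frame in frames[1:]:
        diff = np.abs(frame.astype(float) - background)
        masks.append(diff > threshold)                     # foreground mask
        # update the background estimate with a moving average
        background = (1.0 - alpha) * background + alpha * frame
    return masks

# Static scene, then a bright "object" appears:
frames = [np.full((16, 16), 100.0) for _ in range(3)]
person = np.full((16, 16), 100.0)
person[4:8, 4:8] = 200.0
frames.append(person)
masks = subtract_background(frames)
# the last mask flags exactly the 4x4 object; earlier masks are empty
```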

See caption, figure 14.11. a: average image; b: background subtraction with different thresholds; d: background estimated with EM; e: result of background subtraction. An important point here is that background subtraction works quite badly in the presence of high spatial frequencies, because when we estimate the background we are almost always going to smooth it.

Segmentation as clustering
Cluster together the tokens (pixels, points, etc.) that belong together.
Agglomerative clustering: attach the closest token to the cluster it is closest to; repeat.
Divisive clustering: split each cluster along its best boundary; repeat.
Point-cluster distances:
single-link clustering: the distance between clusters is the shortest distance between their elements;
complete-link clustering: the distance between clusters is the longest distance between their elements;
group-average clustering: the distance between clusters is the distance between their averages (fast!).
Dendrograms yield a picture of the output as the clustering process continues.
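The agglomerative, single-link variant can be sketched as follows. This is a brute-force illustration of the definitions (roughly cubic in the number of tokens), not something you would run on real images:

```python
import numpy as np

def single_link_clustering(points, n_clusters):
    """Agglomerative clustering with the single-link distance: repeatedly
    merge the two clusters whose closest elements are closest, until only
    n_clusters remain."""
    pts = np.asarray(points, dtype=float)
    clusters = [[i] for i in range(len(pts))]     # start: one cluster per token
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # single-link: shortest distance between any pair of elements
                d = min(np.linalg.norm(pts[i] - pts[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] += clusters[b]                # merge the closest pair
        del clusters[b]
    return clusters

# Two obvious groups of tokens in the plane:
groups = single_link_clustering([[0, 0], [0, 1], [10, 10], [10, 11]], 2)
# groups -> [[0, 1], [2, 3]]
```

Recording the merge distances as the loop runs would give exactly the dendrogram the slide mentions.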

K-Means
Choose a fixed number of clusters, K. Choose cluster centers and point-cluster allocations to minimize the error; we can't do this by search, because there are too many possible allocations.
Algorithm: alternate two steps --
fix the cluster centers; allocate each point to the closest cluster;
fix the allocation; compute the best cluster centers.
Here x could be any set of features for which we can compute a distance (but be careful about scaling).
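The two alternating steps can be sketched as below. The init parameter and the toy blob data are illustrative choices; for pixels, the rows of x would be feature vectors such as (r, g, b) or (r, g, b, x, y):

```python
import numpy as np

def kmeans(x, k, init, n_iters=20):
    """Alternate the two steps from the slide: fix centres and allocate
    points to the closest one, then fix allocations and refit centres."""
    centres = np.asarray(init, dtype=float)
    for _ in range(n_iters):
        # distances from every point to every centre
        d = np.linalg.norm(x[:, None, :] - centres[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # each centre moves to the mean of its allocated points
        centres = np.array([x[labels == j].mean(axis=0) if (labels == j).any()
                            else centres[j] for j in range(k)])
    return labels, centres

# Two obvious blobs; the initial centres are a deliberate toy choice.
x = np.array([[0., 0.], [0., 1.], [1., 0.], [10., 10.], [10., 11.], [11., 10.]])
labels, centres = kmeans(x, 2, init=[[0., 0.], [10., 10.]])
# labels comes out as [0, 0, 0, 1, 1, 1]
```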

K-means clustering using intensity alone and color alone. Image; clusters on intensity; clusters on color. I gave each pixel the mean intensity or mean color of its cluster -- this is basically just vector-quantizing the image intensities/colors. Notice that there is no requirement that clusters be spatially localized, and they're not.

K-means using color alone, 11 segments. Image; clusters on color.

K-means using color alone, 11 segments.

K-means using colour and position, 20 segments. Here I've represented each pixel as (r, g, b, x, y), which means that segments prefer to be spatially coherent. These are just some of the 20 segments.

Graph-theoretic clustering
Represent the tokens using a weighted graph, and define an affinity matrix of edge weights. Cut up this graph to get subgraphs with strong interior links.

Measuring Affinity
Intensity: aff(x, y) = exp{ -(1/(2 σ_I²)) ||I(x) - I(y)||² }
Distance: aff(x, y) = exp{ -(1/(2 σ_d²)) ||x - y||² }
Texture: aff(x, y) = exp{ -(1/(2 σ_t²)) ||c(x) - c(y)||² }, where c(x) is a vector of filter outputs. A natural thing to do is to square the outputs of a range of different filters at different scales and orientations, smooth the result, and stack these into a vector.
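A sketch of building such an affinity matrix for a handful of tokens. The particular σ values, and combining the feature and distance terms as a product, are illustrative assumptions:

```python
import numpy as np

def affinity_matrix(features, positions, sigma_f=0.2, sigma_d=2.0):
    """aff(i,j) = exp(-||f_i - f_j||^2 / 2 sigma_f^2)
               * exp(-||x_i - x_j||^2 / 2 sigma_d^2)"""
    f = np.asarray(features, dtype=float)
    p = np.asarray(positions, dtype=float)
    df = ((f[:, None, :] - f[None, :, :]) ** 2).sum(-1)  # squared feature distances
    dp = ((p[:, None, :] - p[None, :, :]) ** 2).sum(-1)  # squared image distances
    return np.exp(-df / (2 * sigma_f ** 2)) * np.exp(-dp / (2 * sigma_d ** 2))

# Three tokens: two similar and nearby, one different and far away.
A = affinity_matrix([[0.1], [0.1], [0.9]], [[0, 0], [0, 1], [0, 9]])
# A[0, 1] is large; A[0, 2] is essentially zero
```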

Scale affects affinity. This is figure 14.18.

Normalized cuts
Idea: maximize the within-cluster similarity compared to the across-cluster association.
Write the graph as V, one cluster as A and the other as B. Maximize
assoc(A, A)/assoc(A, V) + assoc(B, B)/assoc(B, V),
i.e. construct A and B such that their within-cluster similarity is high compared to their association with the rest of the graph.
cut(A, B) = sum of the weights of edges between A and B; assoc(A, V) = sum of the weights of edges with at least one end in A.
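Shi and Malik show that a relaxed version of this criterion reduces to the generalized eigenproblem (D - W) y = λ D y. A minimal dense-numpy sketch of a two-way split (illustrative only; real implementations use sparse solvers and search over thresholds):

```python
import numpy as np

def two_way_ncut(W):
    """Approximate the normalized cut: take the generalized eigenvector of
    (D - W) y = lambda * D * y with the second-smallest eigenvalue and
    split the nodes on its sign."""
    d = W.sum(axis=1)
    d_isqrt = 1.0 / np.sqrt(d)
    # symmetrized form: D^{-1/2} (D - W) D^{-1/2}
    L = np.eye(len(W)) - d_isqrt[:, None] * W * d_isqrt[None, :]
    _, vecs = np.linalg.eigh(L)          # eigenvalues come back in ascending order
    y = d_isqrt * vecs[:, 1]             # map back: y = D^{-1/2} v
    return y > 0

# Two tight groups of three nodes with weak links between the groups.
W = np.full((6, 6), 0.01)
W[:3, :3] = 1.0
W[3:, 3:] = 1.0
labels = two_way_ncut(W)
# labels separates {0, 1, 2} from {3, 4, 5}
```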

This is figure 14.23; the caption there explains it all. Figure from "Image and video segmentation: the normalised cut framework," by Shi and Malik, copyright IEEE, 1998.

This is figure 14.24, whose caption gives the story. Figure from "Normalized cuts and image segmentation," Shi and Malik, copyright IEEE, 2000.

A database with human-labeled segmentations is now available (Jitendra Malik).

Fitting
Choose a parametric object or objects to represent a set of tokens. The most interesting case is when the criterion is not local: you can't tell whether a set of points lies on a line by looking only at each point and the next.
Three main questions:
what object represents this set of tokens best?
which of several objects gets which token?
how many objects are there?
(You could read "line" for "object" here, or circle, or ellipse, or...)

Fitting and the Hough Transform
The Hough transform purports to answer all three questions; in practice the answer isn't usually all that much help. We develop it for lines only.
A line is the set of points (x, y) such that x cos θ + y sin θ = d. Different choices of θ and d ≥ 0 give different lines. For any (x, y) there is a one-parameter family of lines through this point, given by d = x cos θ + y sin θ. Each point gets to vote for each line in the family; if there is a line that has lots of votes, that should be the line passing through the points.

Figure 15.1, top half: tokens (left) and votes (right). Note that most points in the vote array are very dark, because they get only one vote.

Mechanics of the Hough transform
Construct an array representing (θ, d). For each point, render the curve (θ, d) into this array, adding one at each cell it passes through.
Difficulties:
How big should the cells be? (Too big, and we cannot distinguish between quite different lines; too small, and noise causes lines to be missed.)
How many lines? Count the peaks in the Hough array.
Who belongs to which line? Tag the votes.
The method is hardly ever satisfactory in practice, because problems with noise and cell size defeat it.
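The voting mechanics above can be sketched for the d = x cos θ + y sin θ parameterization. The array sizes and the d ≥ 0 convention are illustrative choices:

```python
import numpy as np

def hough_lines(points, n_theta=180, n_d=100, d_max=100.0):
    """Accumulate votes in a (theta, d) array: each point votes for its
    one-parameter family of lines d = x cos(theta) + y sin(theta)."""
    votes = np.zeros((n_theta, n_d), dtype=int)
    thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
    for x, y in points:
        d = x * np.cos(thetas) + y * np.sin(thetas)
        ok = (d >= 0) & (d < d_max)                # keep the d >= 0 convention
        cells = (d[ok] / d_max * n_d).astype(int)
        votes[np.nonzero(ok)[0], cells] += 1       # one vote per cell on the curve
    i, j = np.unravel_index(votes.argmax(), votes.shape)
    return thetas[i], (j + 0.5) * d_max / n_d, votes

# Twenty tokens on the horizontal line y = 20:
tokens = [(float(x), 20.0) for x in range(20)]
theta, d, votes = hough_lines(tokens)
# the biggest peak sits near theta = pi/2, d = 20
```

The cell-size difficulty shows up directly here: coarser (θ, d) bins merge quite different lines into one peak, while finer bins let noise scatter the votes so that real lines are missed.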

This is figure 15.1, lower half: tokens (left) and votes (right).

Figure 15.2; the main point is that lots of noise can lead to large peaks in the vote array.

This is the number of votes that the real line of 20 points gets as the noise increases (figure 15.3).

Figure 15.4; as the noise increases in a picture without a line, the number of points in the maximum cell goes up, too.