1 What is focal attention for? The What and Why of perceptual selection  The central function of focal attention is to select  We must select because our capacity to process information is limited  We must select because we need to be able to mark certain aspects of a display and to refer to the marked tokens individually  That’s what this talk is principally about: but first some background

2 The functions of focal attention  A central notion is that of “picking out” or selecting. The usual mechanism that is appealed to in explaining perceptual selection is attention (sometimes called focal attention or selective attention).  Why must we select anyway?  We must select because we can’t process all the information available. This is the resource-limitation reason. ○But in what way (along what dimensions) is it limited? What happens to what is not selected? The “filter theory” has many problems.  We need to select because certain patterns cannot be computed without first marking certain special elements (e.g. in counting)  We need to select in order to track the identity of individual things (e.g., to solve the correspondence problem)  We need to select because of the way relevant information in the world is packaged. This leads to the Binding Problem (later)

3 What is selected?  Whatever the reason for selection, the selection must occur early in vision (in the visual module) and prior to conceptualization.  For resource-limitation reasons, selection must occur before the need for major resources  In the case of the “marking” or individuating, the empirical facts require that vision pick out and individuate without regard for the conceptual category or properties of the individuals  In the case of the property-binding, there are good reasons why selection should be based on individual things (objects)  All these reasons converge on the claim that what is selected is individuals or proto-objects

4 Attention and Selection  Early research concentrated on selective attention as a filter. It assumed that we select what can be described in very low-level terms – i.e., in terms of physical “channels” or based on transducer outputs. But the filter idea was shown to be only approximate – because filters always leaked  It is important that the question of selection be placed in the context of a pre-attentive (modular, nonconceptual, cognitively-impenetrable) stage of vision – otherwise in some sense anything can be “selected” (e.g., being edible, being a genuine Rembrandt painting)

5 Broadbent’s Filter Theory (illustrating the resource-limited account of selection) Broadbent, D. E. (1958). Perception and Communication. London: Pergamon Press. [Diagram labels: Limited Capacity Channel; Effectors; Store of conditional probabilities of past events (in LTM); Filter; Motor planner; Very Short Term Store; Senses; Rehearsal loop]

6 Attention and Selection  The question of the basis for selection has been at the bottom of a lot of controversy in vision science. Some options that have been proposed include:  We select what can be described physically (i.e., by “channels”) – i.e. we select based on transducer outputs  e.g., we select by frequency, color, shape, or location  We select according to what is important to us (e.g., affordances), or according to phenomenological salience  We select what we need to treat as special (selection = “marking”) or what we need to refer to  We select aspects (properties) to which we subsequently attach concepts (this idea will be important later)  It is important that the question of selection be placed in the context of a pre-attentive (modular, nonconceptual, cognitively-impenetrable) stage of vision – otherwise in some sense anything can be “selected” (e.g., being edible, being a genuine Rembrandt painting)

7 What does visual attention select? (What is the basis for selection?)  The most obvious answer to what we select is places. For example, we can select places by moving our eyes so our gaze lands on different places  When places are selected, are they selected automatically?  Must we always move our eyes to change what we attend to? ○ Studies of Covert Attention-Movement: Posner (1980). ○ How does attention switch from one place to another? ○ How does the visual system specify where to move attention to?  If we select places, are there restrictions on those places? e.g., ○ Must those places be filled or can they be empty places? ○ Must they be specifiable in relation to landmark objects?

8 Covert movement of attention Example of an experiment using a cue-validity paradigm for showing that the locus of attention moves without eye movements and for estimating its speed. Posner, M. I. (1980). Orienting of Attention. Quarterly Journal of Experimental Psychology, 32, 3-25.
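As a rough illustration of how a cue-validity effect is usually summarized, the sketch below subtracts the mean reaction time on validly cued trials from the mean on invalidly cued trials. The function name and the reaction times are invented for illustration; they are not data from Posner (1980).

```python
from statistics import mean

def cue_validity_effect(trials):
    """Return mean RT on invalid-cue trials minus mean RT on valid-cue trials.

    `trials` is a list of (cue_was_valid, reaction_time_ms) pairs.
    A positive result means responses were faster at the cued location.
    """
    valid = [rt for is_valid, rt in trials if is_valid]
    invalid = [rt for is_valid, rt in trials if not is_valid]
    return mean(invalid) - mean(valid)

# Hypothetical reaction times (ms), made up for this example only.
example_trials = [(True, 250), (True, 270), (False, 300), (False, 320)]
effect_ms = cue_validity_effect(example_trials)
```

A positive effect with the eyes held fixed is the kind of evidence taken to show that the locus of attention moved without an eye movement.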

9 Extension of Posner’s demonstration of attention switch Does the improved detection in intermediate locations entail that the “spotlight of attention” moves continuously through empty space?

10 But there are empirical reasons why objects are a better basis for attentional selection than location  There is experimental evidence that attention attaches to things rather than places  When attention is exogenously summoned, the appearance of analog movement of focal attention can be explained by a punctate object-based theory of attention-allocation – Sperling & Weichselgartner (1995)

11 Sperling & Weichselgartner (1995) “Episodic” or Quantal Theory of Attention switching Assumes a quantal “shift” in attention in which the spotlight pointed at location -2 is extinguished and, simultaneously, the spotlight at location +2 is turned on. Because extinction and onset take a measurable amount of time, there is a brief period when the spotlights partially illuminate both locations simultaneously.
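The quantal account can be sketched as a cross-fade between two fixed spotlights. The code below is only an illustration under stated assumptions: the linear transition and the 100 ms duration are hypothetical choices, since the slide does not specify the form of the transition function.

```python
def episodic_attention(t_ms, onset_ms=0.0, duration_ms=100.0):
    """Return (weight_at_old_location, weight_at_new_location) at time t_ms.

    Assumption: a linear cross-fade lasting `duration_ms`, starting at
    `onset_ms`. The old spotlight fades out while the new one fades in.
    """
    g = min(1.0, max(0.0, (t_ms - onset_ms) / duration_ms))
    return 1.0 - g, g

def attention_at(location, t_ms, old_location=-2, new_location=2):
    """Attention weight at `location`; only the two spotlights are ever lit."""
    old_w, new_w = episodic_attention(t_ms)
    if location == old_location:
        return old_w
    if location == new_location:
        return new_w
    return 0.0  # nothing sweeps through the space in between
```

Midway through the transition both locations are partially illuminated, yet in this sketch the spotlight itself never passes through an intermediate location such as 0.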

12 This object-based view of attentional selection is at the heart of FINST theory  I propose that there are good reasons on both experimental and conceptual grounds for supposing that attention attaches itself to objects rather than locations

13 In what other ways might our visual information capacity be limited?  There are obviously limitations on the input side of vision that depend on the acuity of the sensors and the range of physical properties to which they respond.  But there is a limitation beyond that of acuity: The perceptual system is limited in what it can individuate and how many of these individuals it can deal with at one time. The capacity to individuate is different from the capacity to discriminate.  Some reason for thinking that individuating is a distinct process

14 The increasingly important role played by ‘Objects’ in studies of visual attention  There is a limitation in visual information processing that is beyond the limitation of acuity and of channel capacity: The perceptual system is limited in what it can individuate and how many of these individuals it can deal with at one time.  The capacity to individuate is different from memory capacity and discrimination capacity.  This notion of individuating and of individuals may be related to Miller’s “chunks”, but it has a special role in vision which we will explore in the next lecture  First some reasons why individuating is a distinct process

15 Visual Indexes ( aka FINSTs)  There is evidence that individuating is a special aspect of vision and the capacity to individuate is different from memory capacity and discrimination capacity.  This notion of individuating and of individuals may be related to Miller’s “chunks”, but it has a special role in vision  In vision there appears to be a limit to how many objects (individuals) can be selected and bound to the arguments of cognitive functions at one time.  There is evidence that we can hold on to 4 objects in visual short term memory (Luck & Vogel, 1997).  There is evidence that Objects (i.e., individual things) may be the basic units of visual attention  FINST Theory (to be described later) claims that there is a mechanism for picking out and referring to (pointing to) primitive visual elements independent of any of their properties and that this mechanism is the essential bridge between nonconceptual and conceptual representation.

16 Picking out is different from discriminating: Pick out the third contour from the left

17 Individuating as a distinct process  Individuating has its own psychometric function: The minimum distance for individuating is much larger than for discriminating.  It may be that in vision our attention is limited in the number of things we can individuate and simultaneously access (more on this later). But how do you determine what counts as a “thing”?  Individuating is a prerequisite for recognition of patterns and other properties defined among a number of individual parts  An example of how we can easily detect patterns if they are defined over a small enough number of parts is subitizing  Another area where the concept of an individual has become important is in cognitive development, where it is clear that babies are sensitive to the numerosity of individual things in a way that is distinct from their perceptual abilities but is limited in its capacity

18 Pick out 3 dots and keep track of them  In a field of identical elements you can select a number of them and move your attention among them (e.g., “move one up” or “move 2 right” etc.) so long as at no time do you have to hold on to more than 4 dots

19 Individuals and patterns  Vision does not recognize patterns by applying templates since the size, shape, retinal location, orientation, and other properties must be abstracted away  A pattern is encoded over time (and often over saccades), therefore the visual system must keep track of the individual parts and merge descriptions of the same part at different times and stages of encoding  Individuating is a prerequisite for recognition of patterns and configural properties defined among a number of individual parts  An example of how we can easily detect patterns if they are defined over a small enough number of parts is subitizing  In order to recognize a pattern, the visual system must pick out individual parts and bind them to the representation being constructed  Examples include what Ullman called “visual routines”  Another area where the concept of an individual has become important is in cognitive development, where it is clear that babies are sensitive to the numerosity of individual things in a way that is distinct from their perceptual abilities but is limited in its capacity

20 Are there collinear items (n>3)?

21 Several objects must be picked out at once in making relational judgments  The same is true for other relational judgments like inside or on-the-same-contour… etc. We must pick out the relevant individual objects first. Respond: Inside-same contour? On-same contour?

22 Subitizing vs Counting How many squares are there? Concentric squares cannot be subitized because individuating them requires the serial operation of curve tracing Subitizing indexed objects is fast, accurate and (relatively) independent of how many items there are. But a prerequisite for subitizing is being able to pick out the relevant individuals. Only the squares on the right can be subitized because picking out concentric items requires serial attention.

23 Signature subitizing phenomena only appear when objects are automatically individuated and indexed Trick, L. M., & Pylyshyn, Z. W. (1994). Why are small and large numbers enumerated differently? A limited capacity preattentive stage in vision. Psychological Review, 101(1), 80-102.

24 Example of subitizing popout and non-popout features (Count Pink vs. Count On-Same-Line)

25 Encoding conjunctions of properties  Experiments showing the special difficulty that vision has in detecting conjunctions of several properties have provided a basis for understanding an important problem in visual analysis

26 How are conjunctions of features detected? Read the vertical line of digits in the following display Under these conditions Conjunction Errors are very frequent

27 Rapid visual search (Treisman) Find the following simple figure in the next slide:

28 This case is easy – and the time is independent of how many nontargets there are – because there is only one red item. This is called a ‘popout’ search

29 This case is also easy – and the time is independent of how many nontargets there are – because there is only one right-leaning item. This is also a ‘popout’ search.

30 Rapid visual search (conjunction) Find the following simple figure in the next slide:

31

32 Serial vs parallel search?  Finding an element that differs from all others in a scene by a single feature – called a single-feature search – is fast, error-free and almost independent of how many nontargets there are;  Finding an object that differs from all others by a conjunction of two or more features (and that shares at least one feature with each object in the scene) – called a conjunction search – is usually slow, error-prone, and is worse the more nontargets there are in the scene*.  These results suggest that in order to find a conjunction, which requires solving the binding problem, attention has to be scanned serially to all objects. * This way of putting it simplifies things. Under certain conditions the serial-parallel distinction breaks down
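The contrast between the two kinds of search can be sketched as idealized reaction-time predictions. This is an illustration only: the base time, the per-item cost, and the assumption that serial search stops as soon as the target is found (so that on average half the items plus one half are inspected) are hypothetical and do not come from the talk.

```python
def predicted_rt(set_size, search_type, base_ms=400.0, per_item_ms=50.0):
    """Idealized target-present reaction time (ms) for a visual search.

    "feature":     popout search, flat across set size.
    "conjunction": serial, self-terminating scan; on average
                   (set_size + 1) / 2 items are inspected.
    All parameter values are illustrative assumptions.
    """
    if search_type == "feature":
        return base_ms
    if search_type == "conjunction":
        return base_ms + per_item_ms * (set_size + 1) / 2
    raise ValueError(f"unknown search type: {search_type!r}")

# Predicted RTs for a few display sizes.
feature_rts = [predicted_rt(n, "feature") for n in (5, 15, 30)]
conjunction_rts = [predicted_rt(n, "conjunction") for n in (5, 15, 30)]
```

The flat curve for feature search and the rising curve for conjunction search are the signature patterns the slide describes; as the footnote notes, real data do not always divide this cleanly.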

33 Single-Feature vs Conjunction-feature search

34 What is attention for? Treisman’s Attention as Glue Hypothesis  The purpose of visual attention is to bind properties together in order to recognize objects  This is called the “binding problem” or the “many properties problem” and it is of considerable interest to philosophers as well as vision scientists  We can recognize not only the presence of “squareness” and “redness” in our field of view, but we can also distinguish between different ways they may be conjoined

35 The role of attention to location in Treisman’s Feature Integration Theory

36 The ‘attention-as-glue’ hypothesis has a corollary: In computing conjunctions of properties, attention must be directed primarily at objects since it is objects that have the conjoined properties  Instead of being like a spotlight beam that can be scanned around a scene, and can be zoomed to cover a larger or smaller area, maybe attention can only be directed towards occupied places – i.e., to visual objects

37 An alternative view of how we solve the binding problem  If we assume that only properties of indexed objects are encoded and stored in Object Files, then properties that belong to the same object are stored in the same Object File, so the binding problem does not arise  This is the Object-Based Attention view exemplified by FINST Theory  The assumption that only properties of indexed objects are encoded raises the problem of what happens to properties of the other (unindexed) objects or unencoded properties in a display. I will return to this conundrum later.

38 FINST Theory postulates a limited number of pointers in early vision that are elicited by causal events in the visual field and that enable vision to refer to things without doing so under a concept or a description

39 Evidence for attentional selection based on Objects  Single Object Advantage: pairs of judgments are faster when both apply to the same perceived object  Entire objects acquire enhanced sensitivity from focal attention to a part of the object  Single-Object advantage occurs even with generalized “objects” defined in feature space  Simultanagnosia and hemispatial neglect show object-based effects  Attention moves with Moving Objects  IOR  Object Files  MOT

40 Single-object superiority even when the shapes are controlled

41 Attention spreads over perceived objects Using a priming method, Egly, Driver & Rafal (1994) showed that the effect of a prime spreads to other parts of the same visual object more than to equally distant parts of different objects. [Figure panel labels: “Spreads to B and not C”; “Spreads to C and not B”]

42 Objecthood endures over time  Several studies have shown that what counts as an object (as the same object) endures over time and over changes in location;  Certain forms of disappearances in time and changes in location preserve objecthood.  This gives what we have been calling a “visual object” a real physical-object character and partly justifies our calling it an “object”.

43 Yantis’s use of the “Ternus Configuration” to demonstrate the early visual effect of objecthood Short time delays result in “element motion” because the middle object is seen to persist as the “same object” so it does not appear to move

44 Long time delays result in “group motion” because the middle object is not perceived as persisting but is perceived as a new object each time it reappears

45 But with long delays, if the disappearance appears to be due to occlusion by an opaque surface, objects appear to endure so the display behaves like the short delay display

46 With long delays if the disappearance is perceived as due to occlusion by an opaque surface the display behaves like the short delay one (if the timing and attention focus is just right!)

47 Inhibition of return appears to be object-based (as well as to some extent location-based)  Inhibition-of-return is thought to help in visual search since it prevents previously visited objects from being revisited  The original study used static objects. Then Tipper, Driver & Weaver (1991) showed that IOR moves with the inhibited object

48 IOR appears to be object-based (it travels with the object that was attended)

49 Most studies showed that IOR is object-based (it travels with the object that was attended)  Some studies (Tipper, Weaver, Jerreat, & Burak, 1994) showed that attention can also be location-based, but in those cases the “location” was well marked by visible context cues – so it may be that locations such as “halfway between object X and Object Y” can be attended  Clinical studies with patients who have attentional deficits show that their deficit is object based (illustrated later)

50 Objects endure despite changes in location; and they carry their history with them! Object File Theory of Kahneman & Treisman Letters are faster to read if they appear in the same box where they appeared initially. Priming travels with the object. According to the theory, when an object first appears, a file is created for it and the properties of the object are encoded and subsequently accessed through this object-file.
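The object-file idea can be pictured as a small data structure in which properties are stored under the object rather than under a location. The sketch below is purely illustrative: the class and method names are hypothetical, and the dictionary key merely stands in for whatever mechanism keeps the file attached to its object.

```python
from dataclasses import dataclass, field

@dataclass
class ObjectFile:
    """A temporary record for one visual object (hypothetical structure)."""
    location: tuple
    properties: dict = field(default_factory=dict)

class Display:
    """Holds one object file per object that has appeared."""

    def __init__(self):
        self.files = {}

    def appear(self, obj_id, location):
        # A file is created when the object first appears.
        self.files[obj_id] = ObjectFile(location)

    def encode(self, obj_id, name, value):
        # Properties are written into the object's own file.
        self.files[obj_id].properties[name] = value

    def move(self, obj_id, new_location):
        # Motion updates the location; the stored history is untouched.
        self.files[obj_id].location = new_location

    def is_primed(self, obj_id, name, value):
        # A later stimulus is primed if it matches what the file holds.
        return self.files[obj_id].properties.get(name) == value

display = Display()
display.appear("box_A", (0, 0))
display.appear("box_B", (10, 0))
display.encode("box_A", "letter", "K")   # preview letter shown in box A
display.move("box_A", (10, 5))           # box A travels to a new place
```

Because the letter is filed under the box and not under the place, the preview benefit follows box A to its new location, which is the pattern the slide describes.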

51 Demo of Object File Experiment

52 Tracking objects not defined by distinct spatial locations and spatial trajectories Blaser, E., Pylyshyn, Z. W., & Holcombe, A. O. (2000). Tracking an object through feature-space. Nature, 408(Nov 9), 196-199.

53 There is also evidence from neuropsychology that is consistent with the object-based view  Neglect  Balint and simultanagnosic patients

54 Visual neglect syndrome is object-based When a right neglect patient is shown a dumbbell that rotates, the patient continues to neglect the object that had been on the right, even though it is now on the left (Behrmann & Tipper, 1999).

55 Simultanagnosic (Balint Syndrome) patients only attend to one object at a time Simultanagnosic patients cannot judge the relative length of two lines, but they can tell that a figure made by connecting the ends of the lines is not a rectangle but a trapezoid (Holmes & Horrax, 1919).

56 Balint patients can only attend to one object at a time even if they are overlapping Luria, 1959

57 The End (for now)

58 Multiple Object Tracking  One of the clearest cases illustrating object-based attention is Multiple Object Tracking  Keeping track of individual objects in a scene requires a mechanism for individuating, selecting, accessing and tracking the identity of individuals over time  These are the functions we have proposed are carried out by the mechanism of visual indexes (FINSTs)  We have been using a variety of methods for studying visual indexing, including subitizing, subset selection for search, and Multiple Object Tracking (MOT).

59 Multiple Object Tracking  In a typical experiment, 8 simple identical objects are presented on a screen and 4 of them are briefly distinguished in some visual manner – usually by flashing them on and off.  After these 4 “targets” have been briefly identified, all objects resume their identical appearance and move randomly. The subjects’ task is to keep track of which ones had earlier been designated as targets.  After a period of 5-10 seconds the motion stops and subjects must indicate, using a mouse, which objects were the targets.  People are very good at this task (80%-98% correct). The question is: How do they do it?
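To put the 80%-98% figure in context, it helps to work out what pure guessing would give in the typical 8-object, 4-target design. The sketch below assumes a subject who simply picks 4 of the 8 objects at random at the end of the trial; the function names are illustrative.

```python
from math import comb

def guess_distribution(n_objects=8, n_targets=4):
    """P(exactly k targets chosen) when n_targets objects are picked at random."""
    total = comb(n_objects, n_targets)
    return {
        k: comb(n_targets, k) * comb(n_objects - n_targets, n_targets - k) / total
        for k in range(n_targets + 1)
    }

def chance_proportion_correct(n_objects=8, n_targets=4):
    """Expected fraction of targets identified by random guessing."""
    dist = guess_distribution(n_objects, n_targets)
    return sum(k * p for k, p in dist.items()) / n_targets

chance = chance_proportion_correct()     # about 0.5
p_all_four = guess_distribution()[4]     # 1 chance in 70
```

Under this guessing assumption a subject would average 50% correct and would get all four targets on only 1 trial in 70, so the reported accuracy is far above chance.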

60 Keep track of the objects that flash

61 How do we do it? What properties of individual objects do we use?

62 Keep track of the objects that flash

63 How do we do it? What properties of individual objects do we use?

64 Explaining Multiple Object Tracking  Basic finding: People (even 5-year-old children) can track 4 to 5 individual objects that have no unique visual properties  How is it done?  Can it be done by keeping track of the only distinctive property of objects – their location?

65 Predicted performance for the serial tracking algorithm as a function of the speed of movement of attention
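A location-based serial tracker of the kind this prediction concerns can be sketched as follows. This is a simplified stand-in, not the model behind the plotted curve: the random-walk motion, the round-robin visiting order, the field size, and all parameter values are assumptions made for illustration.

```python
import random

def simulate_serial_tracking(n_objects=8, n_targets=4, steps=200,
                             speed=1.0, field=100.0, seed=0):
    """Return the fraction of true targets still held at the end of a trial.

    The tracker stores the last known location of each target, visits one
    target per time step, and takes whichever object is now nearest to the
    stored location to be that target.
    """
    rng = random.Random(seed)
    positions = [(rng.uniform(0, field), rng.uniform(0, field))
                 for _ in range(n_objects)]
    tracked = list(range(n_targets))          # objects 0..n_targets-1 are targets
    stored = [positions[i] for i in tracked]  # remembered target locations

    for step in range(steps):
        # Every object takes a bounded random step (assumed motion model).
        positions = [(min(field, max(0.0, x + rng.uniform(-speed, speed))),
                      min(field, max(0.0, y + rng.uniform(-speed, speed))))
                     for x, y in positions]
        # Attention visits a single target on this step.
        i = step % n_targets
        sx, sy = stored[i]
        nearest = min(range(n_objects),
                      key=lambda j: (positions[j][0] - sx) ** 2
                                    + (positions[j][1] - sy) ** 2)
        tracked[i] = nearest
        stored[i] = positions[nearest]

    return len(set(tracked) & set(range(n_targets))) / n_targets
```

Calling the function with larger `speed` values, or with fewer visits per unit of motion, gives a feel for how such a scheme degrades when objects move far between successive visits of attention.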

66 If we are not using and updating objects’ locations, then how are we tracking them?  Our hypothesis, which is independently motivated, is that there are a small number of primitive indexes or pointers, each of which can pick out a particular individual object  The index keeps providing access to the object as the object changes its properties and its location.  The object is not selected by using an encoding of any of its properties. It is picked out nonconceptually, just as the demonstrative “that” does in language.  Nonconceptual selection is selection without classification (without encoding the selected thing as having certain properties or as being a member of a certain category)  Nonconceptual contact with the world is essential in order to ground concepts in causal connections

67 A FINST is a mechanism that: 1. Picks out, and 2. Keeps track of  individual distal elements, and 3. Does so directly (i.e., without mediation of concepts and without appealing to or using any encoded properties of the individuals). Therefore, 4. FINSTs pick out and track individuals as individuals rather than as bearers of certain properties 5. FINSTs do not pick out and track individuals as members of any category: The connection to the world is purely causal and nonconceptual, so there is no “seeing as” relation.  So the visual system (and the person) literally does not know what is being selected and tracked, even though this indexed selection allows further properties of the object in question to be encoded subsequently!

68 Where does this leave the binding problem?  Binding by location – advantages  It’s easy to see how locations might be picked out since they are physically specifiable and are a logical extension of direction of gaze  Location can be specified across modalities  Binding by location – disadvantages  Empty locations do not have causal powers  Empty locations do not have properties  Point locations do not help with the binding problem ○they have to be at least regions ○The boundaries of regions are defined by objects, so objects first have to be selected in any case

69 Objects as the basis for binding  Binding by individual – advantages  Individuals are the focus of properties – in the end we need to bind together properties of a single individual  Binding by individual – disadvantages  It is hard to see how a mechanism can pick out individuals without focusing on their location  How can individuals be tracked without detecting properties unique to that individual?  Philosophers from Strawson to Clark have argued that individuation requires the apparatus of concepts to provide conditions of individuation, so how can individuals be recognized and tracked by early (nonconceptual) vision?

70 Summary of some properties of indexing revealed by recent experiments 1. Targets can be tracked even when they disappear behind an occluder and, under certain conditions, even when all objects disappear from view (Scholl & Pylyshyn, 1999; Keane & Pylyshyn, VSS2003). Demo: MOT with occlusion 2. Properties of targets are not encoded during MOT nor are they used in tracking. Changes in target properties are not even noticed (Scholl, Pylyshyn & Franconeri, 1999; Bahrami, 2003). 3. Not all well-defined clusters of features can be tracked: Only ones that correspond to objects (Scholl, Pylyshyn & Feldman, 2001). Demo: "Rubber band" displays

71 Summary of some properties of indexing revealed by recent experiments 4. Indexes are assigned primarily in an exogenous, automatic, involuntary and data-driven manner. They can also be assigned endogenously (voluntarily) but we believe this happens only by moving focal attention to each target serially (Annan & Pylyshyn, VSS2003). 5. Index maintenance in tracking appears to be non-predictive and non-attentive (Keane & Pylyshyn, VSS2003; Leonard & Pylyshyn, VSS2003). 6. Target-target confusions are much more numerous than target-nontarget confusions. The reason appears to be that nontargets are inhibited, which may prevent them from being swapped with targets (Pylyshyn & Leonard, VSS2003).

72 Summary of some properties of indexing revealed by recent experiments 7. Keeping track of objects as targets is easier than keeping track of their identity (when the latter is provided at the start of the trial by a name or special location)  The poorer recall of object identities is surprising, given that in order to judge an object as a target one needs to trace its identity back to an object that had been visibly distinct at the start of a trial! So why is ID lost? 8. One reason is that target-target confusions are much more numerous than target-nontarget confusions. But why should this be so? 9. One reason may be that nontargets are inhibited, which may prevent them from being swapped with targets. We have shown this is so experimentally. But that leaves a serious puzzle: How can inhibition travel with objects when no indexes are available for tracking?

73 The beginnings of the puzzle of clustering prior to indexing, and what that might mean!  If moving objects are inhibited then inhibition moves along with the objects. How can this be unless they are being tracked? And if they are being tracked there must be at least 8 FINSTs!  This puzzle may signal the need for a kind of individuation that is weaker than the individuation we have discussed so far – a mere clustering, circumscribing, figure-ground distinction without a pointer or access mechanism – i.e. without reference!  It turns out that such a circumscribing-clustering process is needed to fulfill many different functions in early vision. It is needed whenever the correspondence problem arises – whenever visual elements need to be placed in correspondence or paired with other elements. This occurs in computing stereo, apparent motion, and other grouping situations in which the number of elements does not affect ease of pairing (or even results in faster pairing when there are more elements). Correspondence is not computed over continuous visual manifolds but only over some pre-clustered elements.

74 An alternative view of how to solve the Binding Problem  According to the current version of FINST theory, only properties of indexed objects are encoded (conceptualized)  The binding problem never arises because properties are always encoded as properties of an indexed object, and no other properties are encoded at all.  This is in conflict with strong intuitions – namely that we see much more than we conceptualize. So what do we do about the things we “see” but do not conceptualize?  Some philosophers say they are represented nonconceptually.  But what is such a representation like? And what makes it a representation, as opposed to just a biological reaction?  My provisional answer is that such biological reactions (e.g., retinal activity) are not representations at all – they have no truth values and so they cannot misrepresent  This is another hard issue to be deferred to later

75 Puzzles raised by FINST theory and MOT results  If only information about indexed objects is encoded and made available to the cognitive mind, what happens to information about other parts of the visual scene?  There are, after all, only about 4 or 5 indexes and surely we see a lot more of the world than 4 or 5 objects!  This raises the question about whether non-indexed objects are ‘processed’ in any sense at all, and whether they are even represented in some (presumably nonconceptual) way.  Do objects that are not indexed have any effect on the visual system at all?  The mystery of unattended objects  Functional blindness in normal vision

76 Austen Clark (& P. Strawson) and feature placing languages What kind of representation does sensation allow? Ans: Just those in feature-placing languages “The hypothesis that this book offers is that sensation is feature-placing: a pre-linguistic system of mental representation. Mechanisms of spatio-temporal discrimination … serve to pick out or identify the subject-matter of sensory representation. That subject-matter turns out invariably to be some place-time in or around the body of the sentient organism. …the various reasons cited for thinking that sensation is intentional can also be explained on this hypothesis. The ‘aboutness’ of sensation reduces to its spatial character. (p 165)” “…there is a sensory level of identification of place-times that is more primitive than the identification of three-dimensional material objects. Below our conceptual scheme – underneath the streets, so to speak – we find evidence of this more primitive system. The sensory identification of place-times is independent of the identification of objects; one can place features even though one lacks the latter conceptual scheme.”

77 Because our perceptual system can distinguish objects that differ by conjunctions of properties, early vision must not fuse together or lose the object-specificity of properties it detects. In reporting properties early vision must bind them together according to the objects that have those properties

78 Some philosophical morals we can draw from FINST theory  Distinguishing causes and codes ○What causes Object Files to be created vs what is entered into them  Conceptual and nonconceptual contents  Representing and carrying information ○The case of clusters, figure-ground, and correspondence  Can information-carrying properties (e.g., location on the proximal pattern) create clusters without representing locations of features that are clustered?

79 The problem is what to do about the items that were not attended but in some sense had been ‘seen’. Some considerations:  We should not equate ‘attended’ with indexed or selected or with any other information-processing function. To be attended is typically defined in terms of either the task goals (where unattended means unreported) or the perceptual experience.  Forms of inattentional blindness  Non-indexed items may continue to be indexable for a short time after they physically disappear (e.g., occlusions in MOT)  The question is whether this persistence is a form of nonconceptual representation or a mere latency or inertia in the visual mechanism, and that question eventually comes back to whether we must advert to semantical notions in stating the generalizations (Lloyd Morgan’s Canon or Occam’s Razor).

80 Another puzzle: Punctate inhibition of moving objects?  We have recently obtained evidence that nontargets are inhibited (as measured by the rate of detection of small faint probe dots).  There appears to be no inhibition of the empty region through which the nontargets move  The inhibition is spatially local  How can a punctate moving object be inhibited unless the object is being tracked? And how can it be tracked if there are many (n > 5) of them?  But there is some sense in which moving objects must be tracked: e.g., dynamic random-dot stereograms, kinetic depth effect  Maybe Indexing is a two-stage process? 1. Individuate 2. Reference (for accessing)

81 Exp 1: Probe-dot detection (statistically adjusted using regression)

82 Recent experimental results on Inhibition of nontargets Experiment 1: 3 locations

83 Recent experimental results on Inhibition of nontargets Expt 2: 5 locations

84 Exp 2: Showing results when statistically adjusted using regression

85 The effect of doubling the number of nontargets

86 The beginnings of the puzzle of individuating prior to indexing, and what that might mean!  If moving objects are inhibited then inhibition moves along with the objects. How can this be unless they are being tracked? And if they are being tracked there must be at least 8 FINSTs!  This puzzle may signal the need for a kind of individuation that is weaker than the individuation we have discussed so far – a mere clustering, circumscribing, figure-ground distinction without a pointer or access mechanism – i.e. without reference!  It turns out that such a circumscribing-clustering process is needed to fulfill many different functions in early vision. It is needed whenever the correspondence problem arises – whenever visual elements need to be placed in correspondence or paired with other elements. This occurs in stereo, apparent motion, and other situations in which increasing the number of elements does not increase the difficulty of computing correspondences.  Correspondence is not computed over continuous visual manifolds but only over some pre-clustered elements.

87 Example of the correspondence problem for apparent motion The grey disks correspond to the first flash and the black ones to the second flash. Which of the 24 possible matches will the visual system select as the solution to this correspondence problem? What principle does it use? [Figure panels: Curved matches | Linear matches]

88 Here is how it actually looks

89 Why does the apparent motion take the form it does?  The principle appears to be one of minimizing the vector difference between each possible correspondence pair and that of its nearest neighbors (Dawson & Pylyshyn, 1988)  This principle arises from (is justified by) the natural constraints of rigidity and opacity:  In our kind of world most image features arise from distal elements on the surface of opaque rigid objects, i.e., the vast majority of perceived distal elements are on the visible surface of opaque rigid objects  Therefore each distal element is likely to move the same amount and in the same direction as elements near to it (since they are likely to be on the same surface)
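The principle on this slide can be sketched as a brute-force search. This is a minimal illustration under stated assumptions, not the model in Dawson & Pylyshyn (1988): the coordinates, the function names, and the exact cost (the summed length of the difference between each element's motion vector and that of its nearest first-flash neighbor) are all hypothetical choices. With four elements per flash there are 4! = 24 one-to-one matches, as in the earlier example.

```python
from itertools import permutations
import math

def nearest_neighbor(i, points):
    """Index of the point closest to points[i], excluding i itself."""
    others = [j for j in range(len(points)) if j != i]
    return min(others, key=lambda j: math.dist(points[i], points[j]))

def match_cost(first, second, perm):
    """Sum, over elements, of the vector difference between an element's
    motion and the motion of its nearest neighbor in the first flash."""
    vectors = [
        (second[perm[i]][0] - first[i][0], second[perm[i]][1] - first[i][1])
        for i in range(len(first))
    ]
    cost = 0.0
    for i in range(len(first)):
        j = nearest_neighbor(i, first)
        cost += math.hypot(vectors[i][0] - vectors[j][0],
                           vectors[i][1] - vectors[j][1])
    return cost

def best_match(first, second):
    """Try every one-to-one match and return (lowest-cost match, number tried)."""
    candidates = list(permutations(range(len(second))))
    best = min(candidates, key=lambda p: match_cost(first, second, p))
    return best, len(candidates)

# Hypothetical display: four elements that all shift by the same amount,
# as they would on one rigid surface. The second flash is listed in a
# scrambled order, so the correct match has to be recovered by the search.
first = [(0.0, 0.0), (2.0, 0.0), (0.0, 3.0), (3.0, 4.0)]
shifted = [(x + 0.5, y + 0.25) for x, y in first]
second = [shifted[2], shifted[0], shifted[3], shifted[1]]

best, tried = best_match(first, second)
print(tried, best)
```

Exhaustive enumeration is used here only because the display is tiny; it says nothing about how the visual system arrives at its solution.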

90 Views of a dome

91 Structure from Motion Demo Cylinder Kinetic Depth Effect

92 The correspondence problem for biological motion

93 FINSTs and nonconceptual representation (a reprise)  What does the early vision system deliver to the mind preconceptually and preattentively?  What classes and properties can be recognized without the apparatus of concepts?  Causality? Cardinality (of small sets)?  3D object shapes? Shape-from motion? Shape from shading? Shape from contours?  What can be selected in a nonconceptual manner, and how does this help with the problem of connecting vision with the world?

94 Reprise … what are FINSTs?  They are a primitive reference mechanism that refers to individual objects in the world (FINGs?)  Objects are picked out and referred to without using any encoding of their properties, including their location. Picking out objects is prior to encoding their locations!  Indexing is nonconceptual because it does not represent an individual as a member of some conceptual category – not even as being in the category individual or object!  FINSTs serve as visual demonstratives, much like the terms this or that do in language, by picking out and referring to individuals without using their properties.  The central function of FINST indexes is to bind arguments of visual predicates or of motor commands to things in the world to which they must refer. Only predicates with bound arguments can be evaluated.

95 Schema for how FINSTs function in visual-motor control

96 The binding hypothesis of the visual-cognitive bottleneck  Going back to Newell’s binding hypothesis, we are hypothesizing that the bottleneck between vision and cognition is in the number of objects that can be simultaneously bound to the arguments of cognitive routines  Another way to put this is that visual cognition can simultaneously attend to only about 4 objects.  There is direct evidence for the limit of about 4 visual objects in visual working memory (Luck & Vogel, 1997)  This sense of “attend to” means refer to or bind to a mental symbol  This is precisely what is posited by FINST Theory.
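The binding idea can be pictured with a toy data structure. Everything in this sketch is an assumption made for illustration: the class name, the method names, and the fixed capacity of 4 (standing in for "about 4") are hypothetical and are not part of FINST Theory as stated. An index holds a reference to the thing itself rather than a description of it, and a predicate is evaluated only when every one of its arguments is bound.

```python
class IndexPool:
    """Hypothetical pool of FINST-like indexes with a fixed capacity."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        self._bound = {}  # index name -> the thing itself, not its properties

    def grab(self, name, thing):
        """Bind an index to a thing; fail if every index is already in use."""
        if name not in self._bound and len(self._bound) >= self.capacity:
            return False
        self._bound[name] = thing
        return True

    def release(self, name):
        """Free an index so that it can be bound to something else."""
        self._bound.pop(name, None)

    def evaluate(self, predicate, *names):
        """Evaluate a predicate only if all of its arguments are bound."""
        if any(name not in self._bound for name in names):
            return None  # an unbound argument: nothing to evaluate
        return predicate(*(self._bound[name] for name in names))

# Hypothetical scene: five things, each carrying properties that are read
# only when a predicate is evaluated, never when the index is assigned.
scene = [{"x": float(k)} for k in range(5)]

def left_of(a, b):
    return a["x"] < b["x"]

pool = IndexPool()
grabbed = [pool.grab(f"i{k}", thing) for k, thing in enumerate(scene)]
bound_result = pool.evaluate(left_of, "i0", "i1")
unbound_result = pool.evaluate(left_of, "i0", "i4")
print(grabbed, bound_result, unbound_result)
```

The fifth `grab` fails because the pool is full, so a predicate that mentions the fifth thing cannot be evaluated until some other index is released.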

97 Schema for how FINSTs function in Robot Vision

