Attention, selection and the need for perceptual demonstratives An empirically-motivated proposal concerning the nonconceptual link between the perceived world and its conceptual representation Zenon Pylyshyn, Rutgers Center for Cognitive Science

Focal attention: What is it for? Perceptual selection and perceptual demonstratives The principal function of focal attention is to select. But why do we need to select? 1. We must select because our capacity to process information is limited. 2. We also must select because we need to be able to mark certain tokens in the perceived world and to refer to the marked tokens qua individuals (e.g., as in counting things).  Another way to put this is that we need to select in order to refer to things and we need to refer to things whenever we detect relational properties among them (Collinear, Inside, Part-of, Connected-to,...) 3. An important reason for early selection is that it provides a way to group properties appropriately at the earliest (nonconceptual) stages of perception – and thus to help solve the binding problem  That’s what this talk is about: but first some background

Some background …. The early origins and motivation for the view that there is nonconceptual selection … a personal introduction

Why do we need to be able to pick out individuals without concepts? We need to make nonconceptual contact with the world through perception in order to stop the regress of concepts being defined in terms of other concepts which are defined in terms of still other concepts – sometimes called the symbol grounding problem Sensory transduction appears to be the universal, though typically tacit, assumption about how grounding occurs, at least in psychology and artificial intelligence. Yet most concepts cannot be reduced to sensory transduction. My proposal is that nonconceptual selection of individual objects is the primitive basis for all conceptualization and predication  The argument for nonconceptual selection of token objects as the primitive operation is primarily empirical.  I begin with a personal experience in developing a model for reasoning about geometry by drawing a diagram.

Begin by drawing a line….

Now draw a second line….

And draw a third line….

Notice what you have so far…. (noticings are local – you encode what you attend to) There is an intersection of two lines… But which of the two lines you drew are they? There is no way to indicate which individual things are seen again unless there is a way to refer to individual things

Look around some more to see what is there …. Here is another intersection of two lines… Is it the same intersection as the one seen earlier? To be able to tell without a reference to individuals you would have to encode unique properties of the individual lines. Which properties should you encode? [Figure labels: L3, L6]

Keeping track by encoding unique properties of individual items will not work in general No description can keep picking out the same individual when it is changing its location or appearance unpredictably  But a perceptual representation is always changing since it is always built up over time as properties are noticed – so you need a way to find the representation of a particular token element when new properties of that particular token element are noticed Many writers have postulated a “marking” process for computing relational predicates. But where is the “mark” placed? It can’t be placed in the representation, because its purpose is to keep track of which things in the world correspond to which things in the representation (e.g. counting).  People can pick out several individual items even if they are in a field of identical individuals – e.g., pick out a dot in a uniform field of dots so the picking out cannot be done solely by direction of gaze.

Footnote Notice that in the previous example it would not help if you labeled the diagram as you drew it. Why not?  Because to refer to the line with label L1 you would have to be able to think “This is line L1” and you could not think that unless you had a way to think “this” and the label would not help you to do that!  Being able to think “this” is another way to view the very problem I will be concerned with in this talk. You need an independent way to pick out and refer to an individual element – even if it is labeled! (I will also provide evidence that you need to do this for several individuals simultaneously).  This is exactly the point of Kaplan’s and Perry’s claim about the “essential indexical”

The requirements for picking out individual things and keeping track of them reminded me of an early comic book character called “Plastic Man”

Imagine being able to place several of your fingers on things in the world without being able to detect their properties in this way, but being able to refer to those things so you could move your gaze or attention to them. If you could you would possess FINgers of INSTantiation = FINSTs!

Outline of remainder of this talk  Selection: What is selected?  Places vs ‘Objects’ (Posner & analogue attention movement)  Evidence in favor of object-based selection  Selection and demonstrative reference  Multiple selection  FINST Theory and Object Files  Multiple Object Tracking (MOT) and FINST Indexes as direct (non-conceptually-mediated) reference  Selection and the Binding Problem  Implication for philosophical ideas about individuals, tracking and nonconceptual representation

Covert movement of attention Example of an experiment using a cue-validity paradigm for showing that the locus of attention moves without eye movements and for estimating its speed. Posner, M. I. (1980). Orienting of Attention. Quarterly Journal of Experimental Psychology, 32, 3-25.

Extension of Posner’s demonstration of attention switch Does the improved detection in intermediate locations entail that the “spotlight of attention” moves continuously through empty space?

But the enhancement of intermediate locations does not require a continuous analogue movement of attention through empty space  When attention is attracted by an onset event, the appearance of analog movement of focal attention can be explained by a punctate (quantal) theory of attention-switching Sperling & Weichselgartner (1995) – an episodic theory of attention shift  This raises the possibility that in shifting between two objects, attention does not actually move through empty space  Maybe attention is allocated to objects rather than locations?

Sperling & Weichselgartner argued that this analog movement is best explained by a quantal mechanism The theory assumes a quantal jump in attention in which the spotlight pointed at location -2 is extinguished and, simultaneously, the spotlight at location +2 is turned on. Because extinction and onset take a measurable amount of time, there is a brief period when the spotlights partially illuminate both locations simultaneously.

Evidence for Objects as the basis for selection Single Object Advantage: pairs of judgments are faster when both judgments concern the same perceived object Entire objects acquire enhanced sensitivity from the allocation of focal attention to part of the object Single-Object advantage occurs even with generalized “objects” defined in feature space (Blaser & Pylyshyn, 2000) and even when the object is distributed over time-slices (Flombaum & Scholl, 2006) Clinical (brain damage) syndromes such as Simultanagnosia and Hemispatial Neglect show object-based properties Attention moves with Moving Objects  Inhibition of Return (IOR)  Object Files  Multiple Object Tracking MOT (and generalization to movement in feature space)

Single-object superiority even when the shapes are controlled There are a large number of published experiments showing that when several perceptual judgments are made they are faster when they pertain to the same object, even when all other factors are controlled

Attention spreads over perceived objects Using a priming method, Egly, Driver & Rafal (1994) showed that the effect of a prime spreads to other parts of the same visual object rather than to equally distant parts of different objects. [Figure panels: spreads to B and not C; spreads to C and not B]

Objecthood endures over space-time Several studies have shown that what counts as the same object endures over time and location;  Object-specific priming (Kahneman; Scholl), Inhibition of return (Tipper)  Inhibition of return is object-based  Certain forms of disappearance-reappearance preserve objecthood Multiple Object Tracking MOT (Scholl, Keane) Apparent motion (Kolers, Yantis) Tunnel Effect (Michotte, 1953; Flombaum & Scholl, 2006) This identity constancy gives “visual objects” a real physical-object character and is one of the reasons why psychologists refer to them as “objects”.

Objects endure despite changes in location; and they carry their history with them! Object File Theory of Kahneman & Treisman Letters are faster to read if they appear in the same box in which they had appeared initially. Priming travels with the object. According to the theory, when an object first appears, a file is created for it and the properties of the object are encoded and subsequently accessed through this object-file.

Inhibition of return appears to be object-based Inhibition-of-return is thought to help in visual search since it prevents previously visited objects from being revisited. The original study used static objects. Then Tipper, Driver & Weaver (1991) showed that IOR moves with the inhibited object.

IOR appears to be object-based (it travels with the object that was attended)

The same-object effect generalizes to objects not defined by distinct spatial locations Blaser, E., Pylyshyn, Z. W., & Holcombe, A. O. (2000). Tracking an object through feature-space. Nature, 408(Nov 9), 196-199.

There is also evidence from clinical studies supporting object-based selection Hemispatial Neglect Balint and simultanagnosia syndromes

Visual neglect syndrome is object-based When a right neglect patient is shown a dumbbell that rotates, the patient continues to neglect the object that had been on the right, even though it is now on the left (Behrmann & Tipper, 1999).

Simultanagnosic (Balint Syndrome) patients attend to only one object at a time Simultanagnosic patients cannot judge the relative length of two lines, but they can tell that a figure made by connecting the ends of the lines is not a rectangle but a trapezoid (Holmes & Horax, 1919).

Balint patients can only attend to one object at a time even if they are overlapping Luria, 1959

An empirical hypothesis: To select is to refer When we select an object with focal attention we thereby refer to it. Consequently we can, e.g., entertain thoughts about it (“this is red”) or carry out certain actions towards it (e.g., move our gaze to it). But we can select several (n ≤ 4) objects at once, so: We can have demonstrative thoughts about several objects (“this₁ is above this₂”). Having selected several objects we can evaluate predicates over them or move focal attention to them. We can also subitize them or search through them. We can keep track of selected objects if we or they move unpredictably or change their properties.

Pick out 3 dots I will cue and keep track of them. In a field of identical elements you can select several of them and move your attention among them (e.g., “move one up” or “move 2 right” etc.) so long as at no time do you have to hold on to more than 3 or 4 dots

Subset selection for search Burkell, J., & Pylyshyn, Z. W. (1997). Searching through subsets: A test of the visual indexing hypothesis. Spatial Vision, 11(2), 225-258.

Subset search results: Only properties of the subset matter  If the subset is a single-feature search it is fast and parallel  If the subset is a conjunction search set, finding the target takes longer and is a serial search (RT increases with set size) The distance between targets does not matter, so observers don’t seem to be scanning the display looking for the target but can switch their attention directly to the subset items. This finding supports the claim that we have a small number of FINST indexes that can be captured by sudden onsets and can serve to direct focal attention

Individuals and patterns Vision does not recognize patterns by applying templates but rather by decomposing them into parts Recognition-By-Parts (Biederman, 2000) A pattern is encoded over time (and often over different views separated by saccades), so the visual system must keep track of the individual parts and merge descriptions of the same part at different times and stages of encoding  In recognizing a pattern, the visual system must pick out individual parts and bind them to the representation being constructed

Are there collinear items (n>3)?

Several objects must be picked out at once in making relational judgments The same is true for other relational judgments like inside or on-the-same-contour… etc. We must pick out the relevant individual objects first. Respond: Inside-same contour? On-same contour?

When items cannot be individuated, predicates over them cannot be evaluated ● Do these figures contain one or two distinct curves? ● Individuating these curves requires a “curve tracing” operation, so Number_of_curves(C1, C2, …) takes time proportional to the length of the shortest curve.

The figure on the left is one continuous curve, the one on the right is two distinct curves – as shown in color.

Signature ‘subitizing’ phenomena only appear when objects are automatically individuated and indexed Trick, L. M., & Pylyshyn, Z. W. (1994). Why are small and large numbers enumerated differently? A limited capacity preattentive stage in vision. Psychological Review, 101(1), 80-102.

Our principal methodology: Multiple Object Tracking In a typical experiment, 8 simple identical objects are presented on a screen and 4 of them are briefly distinguished in some visual manner – usually by flashing them on and off. After these 4 “targets” have been briefly identified, all objects resume their identical appearance and move randomly. The subjects’ task is to keep track of which ones had earlier been designated as targets. After a period of 5-10 seconds the motion stops and subjects must indicate, using a mouse, which objects were the targets. People (even children) are very good at this task (80%-98% correct). The question is: How do they do it?
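The trial structure just described can be sketched as a small simulation. This is only an illustration of the stimulus side of the task, not the software used in the experiments: the function names, the unit-square screen, the step size and the number of motion steps are all assumptions made for the example. Note that once the motion ends, nothing in the final positions distinguishes targets from nontargets, which is why the observer must have kept track of them throughout.

```python
import random


def run_mot_trial(n_objects=8, n_targets=4, n_steps=300, seed=0):
    """Simulate the stimulus side of one MOT trial (all parameters are assumed values)."""
    rng = random.Random(seed)
    # Identical objects: each one is nothing but a position on a unit-square screen.
    positions = [[rng.random(), rng.random()] for _ in range(n_objects)]
    # The objects that are briefly "flashed" at the start of the trial.
    targets = set(rng.sample(range(n_objects), n_targets))
    for _ in range(n_steps):
        for pos in positions:
            for axis in (0, 1):
                # Small random step, clamped so the object stays on the screen.
                pos[axis] = min(1.0, max(0.0, pos[axis] + rng.uniform(-0.02, 0.02)))
    return positions, targets


def score(response, targets):
    """Proportion of the true targets that the observer picked."""
    return len(set(response) & targets) / len(targets)


positions, targets = run_mot_trial()
perfect = score(targets, targets)      # an observer who tracked every target
guess = score(set(range(4)), targets)  # an observer who picks four objects blindly
```

Here `score` is a hypothetical accuracy measure; the percent-correct figures quoted above come from the experiments, not from this sketch.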

Keep track of the objects that flash

How do we do it? What properties of individual objects do we use?

Explaining Multiple Object Tracking Do we track by storing and updating objects’ locations? Not likely: the possibility that locations of targets are encoded and updated through serial visitation by focal attention was excluded in an early study. This supports the idea that the FINST mechanism automatically keeps track of objects as long as there are 4 or fewer of them (in other words indexes are “sticky”).

Other findings using MOT There have been dozens of studies using MOT with many surprising findings. Here are a few: Tracking performance is not affected if objects continually change their color or shape during a tracking trial (whether the change is synchronous or asynchronous). If objects do change their color or shape the change is not noticed. Tracking is not disrupted if objects disappear briefly but totally behind opaque strips or if they all disappear together. Targets can be selected automatically (by flashing) and also voluntarily. If selected voluntarily they have to be visited serially (while indexes are “dropped off”).

Review: A FINST is a mechanism that: 1. Picks out, and keeps track of individual distal objects 2. It does so directly – without the mediation of concepts and without using any encoded property of the indexed objects  In other words, FINSTs pick out and track objects as individuals rather than as bearers of certain properties 3. Because FINSTs do not pick out and track individuals as members of any category (including the category object), their connection to the world is transparent and nonconceptual. It is not an opaque “selecting as” relation;  Consequently a person may literally not know what he has selected (although indexes do make it possible for properties of the objects to be subsequently encoded into Object Files)  Pace John Campbell (2002, p134) “conscious experience of an object explains how you know the reference of a demonstrative”, we may not know the reference of a (perceptual) demonstrative

More on FINSTs ●A FINST is a numerically limited mechanism for selecting individual visual objects currently in view. It works just the way that a pointer in a computer data structure works: It provides epistemic access to a particular item without representing the item’s location or other properties; ●Although a FINST does not pick out an object in terms of its represented properties, there are properties that cause an index to be assigned (cf Kripke’s distinction between properties that fix a referent vs properties of the referent). There are also properties (maybe different properties) that allow objects to be tracked; ●A FINST is usually captured or grabbed by an object that suddenly appears. But its attachment to particular items can be voluntarily enabled by moving unitary focal attention to the desired objects, thus precipitating the capture of an index
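The pointer analogy can be made concrete with a short sketch. It illustrates the analogy only and is not a model taken from the talk: `DistalObject`, `FinstPool` and the capacity of 4 are hypothetical names and values chosen for the example. The index holds a bare reference to an object and stores none of its properties, so it keeps reaching the same individual when that object's location and appearance change.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity-based equality: two look-alike objects stay distinct
class DistalObject:
    """Hypothetical stand-in for a thing in the world; its properties may change freely."""
    color: str
    position: tuple


@dataclass
class FinstPool:
    """Hypothetical pool of indexes; the limit of 4 is an assumed value."""
    capacity: int = 4
    _indexes: list = field(default_factory=list)

    def grab(self, obj):
        """Attach an index to obj; return its slot number, or None if the pool is full."""
        if len(self._indexes) >= self.capacity:
            return None
        self._indexes.append(obj)  # stores the reference only, no copy of any property
        return len(self._indexes) - 1

    def deref(self, slot):
        """Follow the index to the object it is attached to."""
        return self._indexes[slot]


dots = [DistalObject("black", (i, 0)) for i in range(6)]  # six identical-looking dots
pool = FinstPool()
slots = [pool.grab(d) for d in dots]  # only the first four dots receive an index

# The third dot changes both location and appearance; the index still reaches it.
dots[2].position = (9, 9)
dots[2].color = "red"
same_individual = pool.deref(2) is dots[2]
```

Because `FinstPool` never inspects `color` or `position`, a description-based lookup would be needed to do what `deref` does here, and such a lookup would fail as soon as the properties changed.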

A fundamental problem of perception: Encoding conjunctions of properties Finally this brings me to an important function that FINST indexes provide – a way to solve the ubiquitous binding problem in perception. Since we can distinguish between one combination of properties and another, early vision (sensation?) cannot simply announce the presence of properties for which there are sensors. It must provide additional information that allows the reconstruction of which properties ‘go with’ which. The almost universal assumption about how this is done is that in early vision properties are encoded as being at particular locations: Treisman’s Feature Integration Theory; Strawson’s (and Clark’s) use of Feature Placing Theory

How are conjunctions of features detected? Read the vertical line of digits in the following display Under these conditions Conjunction Errors are very frequent

Rapid visual search (Treisman) Find the following simple figure in the next slides:

This case is easy – and the time is independent of how many nontargets there are – because there is only one red item. This is called a ‘popout’ search.

This case is harder – because it requires both color and orientation. Search time depends on how many nontargets there are. This is conjunction search.

Conjunction search and attention In a conjunction search, all the properties are present, but finding the target requires finding one in which the properties are bound together in the right combination. Experiments suggest that in order to find a conjunction, attention has to be scanned serially to each candidate object. Even though attention (and concepts) are needed in order to compute a conjunction, the earliest automatic stage of vision (the early-vision module) has to preserve the information from which conjunctions can be computed. So it must not fuse the different instances of features. Early vision must do more than report “P1, P2, … present”. How does it preserve the relevant information? By location? Many theories claim early vision reports “P1 at L1, P2 at L2, …”

Treisman’s Attention as Glue Hypothesis  The purpose of focal attention is to bind properties together in order to recognize objects  We can recognize not only the presence of “squareness” and “redness” in our field of view, but we can also distinguish between different ways they may be conjoined  How does attention conjoin features? Ans: by location.

The role of location in Treisman’s Feature Integration Theory

But in encoding properties, early vision can’t just bind them together according to their spatial co-occurrence – even their co-occurrence within some region. That’s because the relevant region depends on the object. So the selection and binding must be according to the objects that have those properties

The problem of binding conjunctions by the location of conjuncts does not work when feature location is not punctate and becomes even more problematic if they are co-located – e.g., if their relation is “inside”

In computing conjunctions of properties attention is directed at objects since it is objects that have conjoined properties Instead of being like a spotlight beam that can be scanned around a scene, and can be zoomed to cover a larger or smaller area, maybe attention can only be directed to occupied places – i.e., to visual objects. A large experimental literature shows that attention is Object-Based. This suggests an alternative view of how the binding problem is solved in early vision – through the prior selection of perceptual objects. But selection does not have to depend only on unitary focal attention. FINSTs allow multiple objects to be selected. An alternative:

Object Files and the binding problem Suppose that only properties of indexed objects are conceptually encoded and that these are stored in object files associated with each object.  Then properties that belong to the same object are stored in the same object file (which may be empty, as they are in MOT).  This automatically solves the binding problem since it connects encoded properties to their visual object  This view comes out of both FINST Theory (Pylyshyn, 1989) and Object File Theory (Kahneman et al., 1992)
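A minimal sketch of why per-object storage settles which properties go together. It is an illustration under stated assumptions, not an implementation from either theory: `feature_report` and `object_files` are hypothetical names, and each object is reduced to a tuple of feature labels. Two scenes that contain exactly the same features produce identical feature-presence reports but different object files.

```python
def feature_report(scene):
    """Report only which features are present somewhere in the scene."""
    return {feature for obj in scene for feature in obj}


def object_files(scene):
    """One file per indexed object; features of the same object share a file."""
    return [frozenset(obj) for obj in scene]


scene_a = [("red", "square"), ("green", "circle")]
scene_b = [("green", "square"), ("red", "circle")]

# A bare "P1, P2, ... present" report cannot tell the two scenes apart.
reports_match = feature_report(scene_a) == feature_report(scene_b)

# Object files can, because the conjunction is recorded per object.
files_match = set(object_files(scene_a)) == set(object_files(scene_b))
```

The sketch uses no location information at all: the binding comes from which file a feature was written to, which is the contrast with location-based accounts drawn above.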

FINSTs and Object Files form the link between the world and its conceptualization

Some open questions We have arrived at the view that only properties of selected (indexed) objects enter into subsequent conceptualization and perception-based thought (i.e., only information in object files is made available to cognition) So what happens to the rest of the visual information? Visual information seems rich and fine-grained while this theory only allows for the properties of 4 or 5 objects to be encoded!  The present view leaves no room for nonconceptual representations whose content corresponds to the content of conscious experience  According to the present view, the only content that nonconceptual representations have is the demonstrative content of indexes that refer to perceptual objects  Question: Why do we need any more than that?

An intriguing possibility…. Maybe the theoretically relevant information we take in is less than (or at least different from) what we experience. This possibility has received attention recently with the discovery of various “blindnesses” (e.g., change-blindness, inattentional blindness, blindsight…) as well as the discovery of independent vision systems (e.g., recognition and motor control). The qualitative content of conscious experience may not play a role in explanations of cognitive processes. Even if unconceptualized information enters into causal processes (e.g., motor control) it may not be represented or made available to the cognitive mind – not even as a nonconceptual representation. For something to be a representation its content must figure in explanations – it must capture generalizations. It must have truth conditions and therefore allow for misrepresentation. It is an empirical question whether current proposals do (e.g., primal sketch, scenarios). cf Devitt: Pylyshyn’s Razor

Vision science has always been deeply ambivalent about the role of conscious experience Isn’t how things appear one of the things that our theories must explain? Answer: There is no a priori ‘must explain’! ● The content of subjective experience is a major type of evidence. But it may turn out not to be the most reliable source for inferring the relevant functional states. It competes with other types of evidence. ● How things appear cannot be taken at face value: it carries substantive theoretical assumptions. It also draws on many levels of processing. It was a serious obstacle to early theories of vision (Kepler). It has been a poor guide in the case of theories of mental imagery (e.g., color mixing, image size, image distances). ‘Reading X off an image’ is an illusion. ● It seems likely that vision science will use evidence of conscious experience the way linguistics uses evidence of grammatical intuitions – only as it is filtered through developing theories. The questions a science is expected to answer cannot be set in advance – they change as the science develops.

What next? This picture leaves many unanswered questions, but it does provide a mechanism for solving the binding problem and also explaining how mental representations could have a nonconceptual connection with objects in the world (something required if mental representations are to connect with actions)

For a copy of these slides see: http://ruccs.rutgers.edu/faculty/pylyshyn/SelectionReference.ppt Or MIT Press Paperback

A new puzzle: individuation without reference? The correspondence problem is often solved without a numerical limit, therefore without the objects being indexed.  Examples include apparent motion and stereovision  Such computations do not seem to be over continuous visual manifolds but over discrete elements  Such discrete elements must therefore be created by a process that clusters features over space and time  Psychologists call the creation of individual elements “individuation”

Structure from Motion Demo Cylinder Kinetic Depth Effect

The correspondence problem for biological motion

Apparent motion of random dots

Another example: Punctate inhibition of moving objects? We have recently obtained evidence that nontargets are inhibited (as measured by the rate of detection of small faint probe dots).  There appears to be no inhibition of the empty region through which the nontargets move  The inhibition is spatially local How can punctate moving objects be inhibited unless they are somehow being tracked? And how can they be tracked if there are many (n > 5) of them? This provides more evidence for individuation without reference: Maybe Indexing is a two-stage process? 1. Individuate (numerically unlimited) 2. Assign a demonstrative reference (limited to ~4 indexes)
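The two-stage suggestion can be sketched as follows. This is a toy illustration, not a model proposed in the talk: the one-dimensional feature positions, the `gap` threshold, and the rule that the first four clusters receive indexes are all assumptions made for the example. Stage 1 forms as many individuals as the input supports; stage 2 hands out a limited number of referring indexes.

```python
def individuate(points, gap=1.0):
    """Stage 1 (assumed): cluster sorted 1-D feature positions; no limit on the count."""
    clusters = []
    for p in sorted(points):
        if clusters and p - clusters[-1][-1] <= gap:
            clusters[-1].append(p)  # close enough to the previous feature: same individual
        else:
            clusters.append([p])    # otherwise start a new individual
    return clusters


def assign_indexes(clusters, limit=4):
    """Stage 2 (assumed): only `limit` clusters receive a referring index."""
    return {slot: cluster for slot, cluster in enumerate(clusters[:limit])}


features = [0.0, 0.4, 3.0, 3.2, 7.0, 10.0, 10.5, 14.0, 18.0]
clusters = individuate(features)    # six individuals are formed
indexed = assign_indexes(clusters)  # only four of them can be referred to
```

On this picture the two unindexed clusters still exist as individuated elements, so processes such as inhibition could in principle apply to them even though no index refers to them.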

Recent experimental results on Inhibition of nontargets

The puzzle of clustering prior to indexing This puzzle of inhibitory tracking may signal the need for a kind of individuation that is a mere clustering, circumscribing, figure-ground distinction without a pointer or access mechanism – i.e. without reference. So long as correspondence can be computed with “local support” it can be done in early vision by cellular co-operative computations that do not require reference. Such computations do not require that the items be bound to some cognitive function.

Different types of mind-world links “I say that vision occurs when the image of the whole hemisphere of the world that is before the eye … is fixed in the reddish white concave surface of the retina. How the image or picture is composed by the visual spirits that reside in the retina and the [optic] nerve, and whether it is made to appear before the soul or the tribunal of the visual faculty by a spirit within the hollows of the brain, or whether the visual faculty, like a magistrate sent by the soul, goes forth from the administrative chamber of the brain into the optic nerve and the retina to meet this image, as though descending to a lower court – I leave to be disputed by [others]. For the armament of the opticians does not take them beyond this first opaque wall encountered within the eye.” (Johannes Kepler, quoted in Lindberg, 1976) This is a particularly insightful passage that indicates Kepler’s appreciation of a difficult contemporary problem of vision (the top-down vs bottom-up distinction) and also shows Kepler’s appreciation of the limits of his methodology.

From geometrical optics to information, and from information to reference Two distinct types of mind-world connections: the nonconceptual connection (causal or informational link) and the semantic connection (satisfaction and reference). The question of how we make the transition from cause to meaning/reference is one of the great mysteries of mind (Brentano’s Problem). I will address a very small corner of that problem – namely, the question of how perception picks out individuals without already having concepts – which I propose as the first nonconceptual step in the mind-world link.
