Part 2: Cognitive Architecture. Representation in perception One of the most important properties of the cognitive architecture – especially in vision.

Part 2: Cognitive Architecture

Representation in perception
● One of the most important properties of the cognitive architecture, especially in vision, is that it determines the FORM that representations take.
● What do we know about the form of perceptual representations, as opposed to their content?
● Does vision make use of cognition, or is it prohibited from using relevant knowledge (i.e., is it encapsulated)?
● Does the organization of cognitive functions (as in Broadbent’s filter theory) follow a fixed structure? Does this constrain the interaction among different aspects of mind?
● See Fodor’s The Modularity of Mind.

The Form and Structure of perceptual representations
● Our subjective impression of what our visual representation is like is seriously unreliable and misleading. We do not experience the form of a representation, only its content: what it is about (or what it represents).
● But the demands of causal scientific explanation are quite different, and they almost always lead us to unfamiliar and counterintuitive conclusions.

This is what our conscious experience suggests goes on in vision…

This is what the demands of explanation suggest must be going on in vision…

Is the problem solely (or primarily) due to how things seem to us in our consciousness? There are also empirical phenomena that suggest that there is an internal display somewhere in our visual system

Perceptual completions Where’s Waldo?

Parts of representations are sensitive to other parts of a representation: there is an interpretive holism
● Examples from vision (“seeing as”):
● It is what you see the figure as that determines behavior, not its physical properties.
● What you see one part as determines what you see another part as.

Is it possible to specify a set of ways of physically presenting a visual stimulus for it to be perceived in a certain way?

Can you think of other ways of presenting a stimulus so it is perceived as e.g., a Necker Cube?

But the evidence against the inner picture is very strong
● While we experience a panoramic world, the evidence is that there is far less information there than meets the eye.
● Although there is “filling in”, there is far less of it than we assume, and it is usually not ‘cognitive’.
● Distinct saccades carry over very little information.
● Most visual phenomena are constrained by our concepts.

Standard view of saccadic integration assumes that retinal images are superposed in a central display

The superposition view fails O'Regan, J. K., & Lévy-Schoen, A. (1983). Integrating visual information from successive fixations: Does trans-saccadic fusion exist? Vision Research, 23(8), 765-768.

Vision consists of many modules with restricted lines of inter-communication

But not every way of creating a contour yields the same result

There may be far less information in each glance than you think: “Change Blindness” (demonstrations: Airplane, Farm, Dinner)

Errors in recall suggest how visual information is encoded
● Errors in relative orientation often take a canonical form.
● Errors in reproducing a 3D image preserve 3D information, but not as conventional pictures.
● Children have very good visual memory, yet often make egregious errors of recall.

Errors in recall suggest how visual information is encoded
● Children more often confuse left-right than rotated forms.
● Errors in imitating actions are another source of evidence.

Ability to manipulate and recall patterns depends on their conceptual, not geometric, complexity
● Difficulty in superimposing shapes depends on how they are conceptualized.
● Look at the first two shapes and superimpose them in your mind; then draw (or select) the shape that is their superposition.

Many studies have shown that memory for shapes is dependent on the conceptual vocabulary available for encoding them e.g., recall of chess positions by beginners and masters

Strong Equivalence and the role of cognitive architecture

The concept of cognitive architecture
● If differences among behaviors (including differences among individuals) are to be attributed to different beliefs or different algorithms, then there must be some common set of basic operations and mechanisms. This is called the Cognitive Architecture.
● The concept of a particular algorithm, or of being “the same algorithm”, is only meaningful if two computers have the same architecture. Algorithm is architecture-relative.
● The architecture is the part of the system that does not change when beliefs change. So it defines the system’s Cognitive Capacity.

Example of a model of the Sternberg task discussed earlier
1. Store the memory set as a list L. Call the list size n.
2. Read the target item (if there is no target item, then quit).
3. Check whether the target is one of the letters in the list L.
4. If it is found in the list, assign response = “yes”; otherwise response = “no”. (That provides the answer, but what about the time?)
5. If response = “yes”, set time = 500 + K * n ± Rand(20 ≤ x ≤ 50).
6. If response = “no”, set time = 800 + K * n ± Rand(20 ≤ x ≤ 50).
7. Print response, print time.
8. Go to 2.
Is this the way people do it? How do you know?

Example of a weakly equivalent model of the Sternberg task
1. Store the memory set as a list L. Call the list size n.
2. Read the target item.
3. Check whether the target is one of the letters in the list L.
4. If it is found in the list, then assign response = “yes”, else response = “no”.
5. If response = “yes”, then set time = 500 + K * n ± Rand(20 ≤ x ≤ 50).
6. If response = “no”, then set time = 800 + K * n ± Rand(20 ≤ x ≤ 50).
7. Print response, print time.
8. Go to 2.
Is this the way people do it? How do you know?
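The numbered model above can be sketched in Python. This is a minimal illustration under stated assumptions, not the lecture's own code. The slide does not give a value for K, so the default `k=40.0` ms per item is hypothetical. The Rand(...) term is read here as an offset of 20 to 50 ms that is added or subtracted at random, which is only one possible interpretation. The names `sternberg_trial`, `response` and `rt` are illustrative.

```python
import random


def sternberg_trial(memory_set, target, k=40.0, rng=None):
    """Simulate one trial of the slide's model; return (response, rt_ms).

    k (ms per list item) is a hypothetical value; the slide leaves K unspecified.
    """
    rng = rng or random.Random()
    n = len(memory_set)                   # step 1: list size
    found = target in memory_set          # step 3: membership check
    response = "yes" if found else "no"   # step 4: the answer
    base = 500.0 if found else 800.0      # steps 5-6: intercepts from the slide
    # Assumption: Rand(...) is a 20-50 ms offset with a random sign.
    noise = rng.choice((-1.0, 1.0)) * rng.uniform(20.0, 50.0)
    return response, base + k * n + noise


if __name__ == "__main__":
    rng = random.Random(0)
    for probe in ("K", "Z"):
        print(probe, sternberg_trial(["B", "K", "R", "T"], probe, rng=rng))
```

The sketch computes the answer in one step and then assigns a latency by formula. It can therefore match the pattern of answers and mean times without any claim that its internal steps are the ones people carry out, which is the question the slide raises.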

Tacit assumptions made in constructing a computational model
But there are many other properties of algorithms that constitute assumptions about the cognitive architecture. One class of properties seems so natural that it goes unquestioned: the control structure.
● Operations are carried out in sequence. No operation can begin until the previous one is completed. This seems so natural that it goes unnoticed as an assumption.
● Another fundamental property that is assumed is that control is passed from one operation to another (e.g., “go to”), as opposed to being grabbed in a “recognize-act” cycle.
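To make the contrast in control structure concrete, here is a small sketch of a "recognize-act" cycle in Python. It is an illustration only: the rule format, the names `run_productions`, `RULES` and `no_answer_yet`, and the use of a dictionary as working memory are all assumptions made for this example. No rule ever passes control to another rule. On each cycle, whichever rule matches the current contents of memory fires.

```python
def run_productions(memory, rules, max_cycles=100):
    """Recognize-act cycle: on each cycle, the first rule whose condition
    matches working memory fires. Stops when no rule matches."""
    fired = []
    for _ in range(max_cycles):
        for name, condition, action in rules:
            if condition(memory):
                action(memory)
                fired.append(name)
                break
        else:
            break  # no rule recognized anything: halt
    return fired


def no_answer_yet(m):
    """True when a target is present and no response has been produced."""
    return "target" in m and "response" not in m


# Hypothetical rules for answering a single memory-scanning probe.
RULES = [
    ("answer-yes",
     lambda m: no_answer_yet(m) and m["target"] in m["list"],
     lambda m: m.update(response="yes")),
    ("answer-no",
     lambda m: no_answer_yet(m) and m["target"] not in m["list"],
     lambda m: m.update(response="no")),
]

if __name__ == "__main__":
    memory = {"list": ["B", "K", "R"], "target": "K"}
    print(run_productions(memory, RULES), memory["response"])
```

In a sequential program with "go to", the order of operations is fixed by the program text. Here the order is determined by what is in memory at each moment, so choosing one scheme or the other is itself an assumption about the architecture.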

More about the computational model and the tacit assumptions it makes
● An important one not yet mentioned concerns the constraints on information processing that the architecture imposes.
● We know that human information processing is limited by the architecture.
● But how is it limited?

Now a short excursion into visual attention

Attention as Selection
We will concentrate on the Selection or Filtering aspects of attention. We will ask:
1. Why do we need to select anyway?
● Because our processing capacity is limited? The Big Question: In what way is it limited? (Miller, 1956)
● We will return to this core question after some preliminaries on the early study of attention as selection and the filter theory.
2. On what basis do we select? Some alternatives:
● We select according to what is important to us (e.g., affordances)
● We select what can be described physically (i.e., “channels”)
● We select based on what can be encoded without accessing LTM
● We “pick out” things to which we subsequently attach concepts: i.e., we pick out objects (or regions?)
3. What happens to what we have not selected? A largely unsolved mystery (though in some cases there are plausible answers).

Broadbent’s Filter Theory
Broadbent, D. E. (1958). Perception and Communication. London: Pergamon Press.
[Figure: diagram of Broadbent’s filter model. Labeled components: Senses, Very Short Term Store, Filter, Limited Capacity Channel, Store of conditional probabilities of past events (in LTM), Motor planner, Effectors, Rehearsal loop]

Problems with the Filter Theory
● The filter “leaks.” Work by Treisman, Lackner, and many others shows that the filter could not be eliminating parts of the input using a physically-defined channel, because the properties on the basis of which the input is filtered require a high level of processing (e.g., determination of meaning). Consequently such information must have gotten through the filter!
● Many solutions to this conundrum have been proposed, ranging from replacing the filter with an attenuator to various complex (and highly incomplete) proposals such as those of Deutsch & Deutsch (1963), Norman (1968), Morton (1969) and Neisser (1967). None of these is satisfactory, but each embodies some ideas that may be part of the story.
● What all these alternatives do is assume that the filter is responsive to top-down expectancy and prediction effects. But the evidence is against this sort of knowledge-based selection as a general property of perception (Pylyshyn, 1999).

Channel Capacity? How to measure it? The Shannon Information Theory measure, which excited psychologists in the 1950s, proved to be very limited and did not help our understanding of attention.

Big Question #1: Why do we need to select information?
Along which dimensions is human information processing capacity limited?
● Channel capacity: Shannon-Hartley Theorem
● Capacity measured in some sort of “chunks” (Miller)
● Capacity measured in terms of the number of arguments that can be simultaneously bound to cognitive routines (Newell)
● To what things in the world can the arguments of visual predicates be bound?

Example of the use of chunking
To recall a string of binary bits, e.g., 110101110101110110101001 (24 bits):
● People can recall a string of about 8 binary digits.
● If they learn a 2:1 binary encoding rule (00 → 0, 01 → 1, 10 → 2, 11 → 3) they can recall about 8 such chunks, or 16 binary bits.
● If they learn a 3:1 chunking rule (the octal number system) they can recall a 24-bit string, etc.
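The recoding rules on this slide can be worked through on the example string. The sketch below is illustrative only (the function name `recode` is an assumption). It groups the bits two or three at a time and replaces each group with a single digit, so the same 24 bits become 12 or 8 items to remember.

```python
def recode(bits, group):
    """Recode a bit string into chunks of `group` bits, one digit per chunk.

    Limited to groups of 1 to 3 bits so that every chunk maps to one digit (0-7).
    """
    if not 1 <= group <= 3:
        raise ValueError("group size must be 1, 2 or 3")
    if len(bits) % group:
        raise ValueError("bit string length must be a multiple of the group size")
    return "".join(
        str(int(bits[i:i + group], 2)) for i in range(0, len(bits), group)
    )


if __name__ == "__main__":
    bits = "110101110101110110101001"        # the 24-bit string from the slide
    for group in (1, 2, 3):
        chunks = recode(bits, group)
        print(group, len(chunks), chunks)    # items to hold in memory
```

With the 3:1 rule the whole string fits into 8 chunks, which is about the span the slide attributes to people. The cost is the extra work of encoding and decoding each chunk.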

Does the evidence support this idea? Memory span can be greatly increased through chunking! Yet chunking has also been used to explain things it cannot explain. It is only explanatory if you have an account of how chunking occurs and what rules in LTM are being used (and what counts as a chunk).

Why can we retain vastly different amounts of ‘information’ just by using a different encoding vocabulary? Answer: The architecture of the cognitive system has the property that it can deal with a fixed maximum number of items, regardless of what the items are. This property can be exploited to get around the bottleneck of the short-term memory. We do this by recoding the input into a smaller number of discrete units, called chunks. There is also evidence that it takes additional time to encode and decode chunks, so the recoding technique is a case of time-capacity tradeoff or what is known in CS as a compute-vs-store tradeoff.

The special Case of Visual Attention What can we attend to and how do we change what we attend to?

What does visual attention select? (What is the basis for selection?)
● If attention is selection, what does visual attention select? An obvious answer is places. We can select places by moving our eyes so our gaze lands on different places.
● When places are selected, are they selected automatically? Must we always move our eyes to change what we attend to?
● Studies of covert attention movement: Posner (1980).
● Can we attend to any other property? Why can’t we select on the basis of color, depth, or the property that some paintings have of having been painted by Da Vinci (a property to which Bernard Berenson was able to attend extremely well)?

How else can visual attention select?
● Can we control the size and shape of the region that is selected, or is selection always punctate and data-driven? Zoom Lens model of spatial attention (Eriksen & St James, 1986).
● We control where attention moves:
● Is this automatic or voluntary?
● How do we know where to direct our attention? How do we specify a location prior to attending to it?
● We need a way to specify where or what prior to attending to it! Keep this conundrum in mind; we will return to it later!
● How narrowly can we focus our attention? Can we make it pick out one out of several objects?
● Are there special conditions under which we are able to pick out individual things? We will return to “attentional resolution”, or the minimum spacing for selecting individual things.

Covert movements of attention Example of an experiment using a cue-validity paradigm for showing that the locus of attention moves without eye movements and for estimating its speed. Posner, M. I. (1980). Orienting of Attention. Quarterly Journal of Experimental Psychology, 32, 3-25.

We can select a shape even when it is intertwined among other similar shapes
Are the green items the same?
On a surprise test at the end, subjects were not able to recognize shapes that had been present but had not been attended, i.e., had not appeared in green (Rock and Gutman, 1981).

Other examples of attentionally induced inhibition
Negative Priming (DeSchepper & Treisman, 1996).
● Is there a figure on the right that is the same as the figure on the left?
● When the figure on the left is one that had appeared as an ignored figure on the right, RT is long and accuracy poor.
● This “negative priming” effect persisted over 200 intervening trials and lasted for a month!

Another negative attention effect: Inattentional Blindness

Inattentional Blindness
● The background task is to report which of the two arms of the + is longer. There is one critical trial per subject, after about 3 or 4 background trials. Another “critical” trial is presented as a divided-attention control.
● 25% of subjects failed to see the square when it was presented in the parafovea (2° from fixation).
● But 65% failed to see it when it was at fixation!
● When the background task cross was made 10% as large, Inattentional Blindness increased from 25% to 66%.
● It is not known whether this IB is due to concentration of attention on the primary task, or whether there is inhibition of outside regions.

In what other ways might our information capacity be limited?
● We have limitations on the input side that depend on the acuity of the sensors and the range of physical properties to which they respond.
● But there is a limitation beyond that of acuity: the perceptual system is limited in what it can individuate and in how many of these individuals it can deal with at one time. The capacity to individuate is different from the capacity to discriminate.
● This notion of individuating and of individuals may be related to Miller’s “chunks”, but it has a special role in vision, which we will explore in the next lecture.
● First, some reasons for thinking that individuating is a distinct process.

Individuating is different from discriminating

Individuating as a distinct process
● Individuating has its own psychometric function: the minimum distance for individuating is much larger than for discriminating.
● It may be that in vision our attention is limited in the number of things we can individuate and simultaneously access (more on this later). But how do you determine what counts as a “thing”?
● Individuating is a prerequisite for recognition of patterns and other properties that hold over several individual things.
● An example of how we can easily detect patterns if they are defined over a small enough number of parts is subitizing.
● Another area where the concept of an individual has become important is cognitive development, where it is clear that babies are sensitive to the numerosity of individual things in a way that is distinct from their perceptual abilities but is limited in its capacity.

Are there collinear items (n>3)?

Several objects must be picked out at once in making relational judgments The same is true for other relational judgments like inside or on-the-same-contour… etc. We must pick out the relevant individual objects first.

What is attention for? Treisman’s Attention as Glue Hypothesis
● The purpose of visual attention is to bind properties together in order to recognize objects.

How are conjunctions of features detected?
● Under these conditions Conjunction Errors are very frequent.
● Read the vertical line of digits in the following brief display.
● What color was the N? What color was the O? What letters were in red?

Rapid visual search (Treisman) Find the following simple figure in the next slide:

Rapid visual search (conjunction) Find the following simple figure in the next slide:

Serial vs parallel search?
● Finding an object that differs from all others in a scene by a single feature (called a single-feature search) is fast, error-free and almost independent of how many nontargets there are.
● Finding an object that differs from all others by a conjunction of two or more features, and that shares at least one feature with each object in the scene (called a conjunction search), is usually slow and error-prone, and gets worse the more nontargets there are in the scene.
● These results suggest that in order to find a conjunction, which requires solving the binding problem, attention has to be scanned serially to each of the objects.
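The contrast between the two kinds of search can be illustrated with a toy simulation. This sketch makes simplifying assumptions that are not on the slide. Single-feature search is modeled as a single parallel step. Conjunction search is modeled as a serial, self-terminating scan in random order, with the target always present. The function names are illustrative.

```python
import random


def inspected_feature(n_items):
    """Single-feature search, modeled as one parallel step
    regardless of display size."""
    return 1


def inspected_conjunction(n_items, rng):
    """Serial self-terminating scan: visit items in random order until the
    target (item 0) is reached; return how many items were visited."""
    order = list(range(n_items))
    rng.shuffle(order)
    return order.index(0) + 1


def mean_inspected(n_items, trials=2000, seed=0):
    """Average number of items visited in conjunction search over many trials."""
    rng = random.Random(seed)
    total = sum(inspected_conjunction(n_items, rng) for _ in range(trials))
    return total / trials


if __name__ == "__main__":
    for n in (5, 9, 17):
        print(n, inspected_feature(n), round(mean_inspected(n), 2))
```

Under these assumptions the expected number of items visited in a conjunction search is (n + 1) / 2 for a display of n items. It therefore grows linearly with display size, while the single-feature count stays flat.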

Attention is required to recognize objects
Attention is primarily directed at Objects
● Instead of being like a spotlight beam that can be scanned around a scene and can be zoomed to cover a larger or smaller area, perhaps attention can only be directed towards occupied places, i.e., to objects.

Evidence that attentional selection is based on Objects
● Single-Object Advantage: pairs of judgments are faster when both apply to the same object
● Entire objects acquire enhanced sensitivity from focal attention to a part of the object
● Single-Object Advantage occurs even with generalized “objects” defined in feature space
● Simultanagnosia and hemispatial neglect show object-based effects
● Studies with Moving Objects: IOR, Object Files, MOT

Single-object advantage even when the shapes are controlled

Attention spreads over perceived objects
Using a priming method, Egly, Driver & Rafal (1994) showed that the effect of a prime spreads to other parts of the same visual object more than to equally distant parts of different objects.
[Figure panels: “Spreads to B and not C” (two panels); “Spreads to C and not B” (two panels)]

“Objects” endure over time & space
● Several studies have shown that what counts as the same object endures over time and over changes in location.
● Certain forms of disappearances in time and changes in location preserve objecthood.
● This gives what we have been calling a “visual object” a real physical-object character and partly justifies our calling it an “object”.

Inhibition of return appears to be object-based
● Recall that inhibition of return (IOR) is the phenomenon whereby an object that has been attended, and from which attention has then moved away, is less likely to attract attention again in a period of 300 ms to 900 ms after it is first attended. The attended item is said to be inhibited.
● This is thought to help in visual search, since it prevents previously visited objects from being revisited.
● The original study used static objects. Tipper, Driver & Weaver (1991) then showed that IOR moves with the inhibited object.

Simultanagnosic (Balint Syndrome) patients can only attend to one object at a time
Simultanagnosic patients cannot judge the relative length of two lines, but they can tell that a figure made by connecting the ends of the lines is not a rectangle but a trapezoid (Holmes & Horax, 1919).

Balint patients can only attend to one object at a time even if the objects overlap (Luria, 1959)

Multiple Object Tracking
● One of the clearest cases illustrating object-based attention is Multiple Object Tracking.
● Keeping track of individual visual objects requires a mechanism for individuating, selecting, accessing and maintaining the identity of individuals over time.
● These are the functions we have proposed are carried out by the mechanism of visual indexes (FINSTs).
● We have been using a variety of methods for studying visual indexing, including subitizing, subset selection for search, and Multiple Object Tracking (MOT).

Multiple Object Tracking In a typical experiment, 8 simple identical objects are presented on a screen and 4 of them are briefly distinguished in some visual manner – usually by flashing them on and off. After these 4 “targets” have been briefly identified, all objects resume their identical appearance and move randomly. The subjects’ task is to keep track of which ones had earlier been designated as targets. After a period of 5-10 seconds the motion stops and subjects must indicate, using a mouse, which objects were the targets. People are very good at this task (80%-98% correct). The question is: How do they do it?
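The structure of such a trial can be sketched in code. The following is only an illustration of the display and the scoring, not of how people track. The unit-square display, the random-walk motion with `max_step=0.02`, and the function names `make_trial`, `step` and `score` are all assumptions made for this example.

```python
import random


def make_trial(n_objects=8, n_targets=4, rng=None):
    """Place identical objects at random positions and designate some as targets."""
    rng = rng or random.Random()
    positions = [(rng.random(), rng.random()) for _ in range(n_objects)]
    targets = set(rng.sample(range(n_objects), n_targets))
    return positions, targets


def step(positions, rng, max_step=0.02):
    """Move every object by a small random offset, clipped to the unit square."""
    def clip(v):
        return min(1.0, max(0.0, v))
    return [
        (clip(x + rng.uniform(-max_step, max_step)),
         clip(y + rng.uniform(-max_step, max_step)))
        for x, y in positions
    ]


def score(selected, targets):
    """Proportion of the true targets that the subject selected."""
    return len(set(selected) & set(targets)) / len(targets)


if __name__ == "__main__":
    rng = random.Random(0)
    positions, targets = make_trial(rng=rng)
    for _ in range(300):                  # the motion phase of the trial
        positions = step(positions, rng)
    print(sorted(targets), score(targets, targets))
```

Nothing in the display distinguishes targets from nontargets once motion begins. The only record of which objects are targets is the index set held from the start of the trial, and that is what the subject has to maintain.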

Keep track of the objects that flash

How do we do it? What properties of individual objects do we use?

These are just a few of the manifestations of object-based attention, an idea that has far-reaching consequences
● How do we keep track of several things at once (e.g., team games, computer games)?
● How do we refer to individual things in the world (e.g., this, that) in our thoughts?
● How can our mental images have spatial properties (with some things farther apart, some things bigger, etc., than others)?
● There is an entire large literature in philosophy dealing with nonconceptual representation.
