Chapter 3: Perception, Recognizing patterns and objects


1 Chapter 3: Perception, Recognizing patterns and objects
Foundations of Cognitive Science Spring 2019

2 How do we recognize objects?
Sensation Perception Object recognition Bottom-up Top-down

3 How do we recognize objects?
Important for:
Applying knowledge to the world
Learning: combining information

4 Origins of Knowledge The study of sensory processes grew out of questions about the origin of human knowledge How do we know anything about the world?

5 The Passive Perceiver
John Locke (1632–1704)
Empiricism: knowledge comes through the senses; the senses are passive
Tabula rasa: the mind begins as a blank slate

6 The Passive Perceiver
Empiricists: knowledge comes through stimuli that excite the senses.
We gain information about the distal stimulus (the real object in the outside world) only through the proximal stimulus (the energies that directly reach our sensory organs).
This is an important distinction: our information about an external object (the distal stimulus) comes through the proximal stimulus to which that distal stimulus gives rise (e.g., the retinal image of an object being viewed). Yet we perceive many qualities of a distal object, such as its distance from us and its constant size and shape, that do not seem to be given in the proximal stimulus (e.g., the retinal image of a 5-foot tree 10 feet away is just as tall as one cast by a 50-foot tree 100 feet away).

7 4.2 Distal and proximal stimuli When this person looks out on the world, the retinal image cast by his hand will be roughly the same size as the image of the car. But one of these images obviously represents a much larger object! Clearly, then, retinal-image size (the proximal stimulus) alone cannot tell us the size of the distant object (the distal stimulus).

8 How do we recognize objects?
Sensation: the awareness of stimuli through our senses
Form perception: the process through which you manage to see the basic shape and size of the object
Object recognition: the process through which you identify what the object is

9 How do we recognize objects?
Physical stimuli Sensation: smell, taste, sight, sound, touch Mental processes Form perception: shape and size Object recognition: identification Visual system

10 How do we recognize objects?
We get information about distal stimuli through proximal stimuli. Perception is built up through learning by association.

11 5.22 The relationship between image size and distance If an object moves to a new distance, the size of the retinal image cast by that object changes. A doubling of the distance reduces the retinal image by half. If the distance is tripled, the retinal image is cut to one-third of its initial size.
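The inverse relationship in the figure above can be sketched numerically. This is a toy pinhole-style calculation, not from the slides; the function name and the fixed `focal_length` standing in for the eye's optics are illustrative assumptions.

```python
def retinal_image_size(object_height, distance, focal_length=1.0):
    """Approximate retinal image height under a simple pinhole model:
    proportional to object height, inversely proportional to distance.
    (focal_length is a stand-in for the eye's fixed optics.)"""
    return focal_length * object_height / distance

# Doubling the distance halves the retinal image:
near = retinal_image_size(object_height=5.0, distance=10.0)
far = retinal_image_size(object_height=5.0, distance=20.0)
print(near / far)  # 2.0

# A 5-foot tree at 10 feet casts the same image as a 50-foot tree at 100 feet,
# which is why image size alone cannot reveal distal size:
print(retinal_image_size(5.0, 10.0) == retinal_image_size(50.0, 100.0))  # True
```

The second check restates the slide's point: two very different distal stimuli can produce identical proximal stimuli.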

12 Visual System Our sensory systems are active receivers of information.
The level of activity involved in shaping and interpreting input increases as we go deeper into the visual system

13 Vision Vision is our primary distance sense.
Its stimulus is light, which varies in intensity and wavelength. Eye structures, like the iris and the lens, control the amount of light entering the eye and form the retinal image. Light, like sound, travels in the form of waves that vary in amplitude and wavelength. Amplitude corresponds to our perception of brightness, and wavelength corresponds to our perception of color. The human visual system is sensitive to only a tiny part of the electromagnetic spectrum. The incoming light is transformed into a proximal stimulus, the retinal image, by several structures of the eye. The cornea and lens focus incoming light (like a camera lens), and the iris governs the amount of incoming light.

14 4.24 The human eye (A) Light enters the eye through the cornea, and the cornea and lens refract the light rays to produce a sharply focused image on the retina. The iris can open or close to control the amount of light that reaches the retina. (B) The retina is made up of three main layers: the rods and cones, which are the photoreceptors; the bipolar cells; and the ganglion cells, whose axons make up the optic nerve. Two other kinds of cells, horizontal cells and amacrine cells, allow for lateral (sideways) interaction. You may have noticed that the retina contains an anatomical oddity: the photoreceptors are at the very back, the bipolar cells are in between, and the ganglion cells are at the top. As a result, light has to pass through the other layers (they’re not opaque, so this is possible) to reach the rods and cones, whose stimulation starts the visual process.

15 Rods and Cones On the retina, the light stimulus is transduced by rods and cones. Rods and cones differ in function. rods: low light intensities, colorless cones: great light intensities, responsible for sensations of color Acuity is greatest in the fovea, where the most cones are located. Once on the retina, the physical stimulus energies are transduced into neural impulses by the visual receptors, the rods and cones. The rods and cones stimulate cells that converge to form the optic nerve, which carries the visual signals to the thalamus and then to the cortex. Rods (located primarily in the periphery of the retina) are the receptors for night vision; they are sensitive to relatively low light intensities and lead to achromatic sensation. Cones (located in the fovea) are the receptors for day vision; they respond to much higher levels of light intensity, have much higher acuity, and lead to chromatic sensations. Rods and cones also contain different photopigments.

16 Omikron/Photo Researchers, Inc.
4.25 Rods and cones (A) Rods and cones are receptor cells at the back of the retina that transmit the neural impulses of vision. In this (colorized) photo, cones appear green and rods appear brown. (B) Distribution of photoreceptors: Cones are most frequent at the fovea, and the number of cones drops off sharply in locations away from the fovea. In contrast, there are no rods at all on the fovea. There are neither rods nor cones at the retina’s blind spot.

17

18 4.27 The blind spot Close your right eye and stare at the picture of the dog. Can you see the cat without moving your eye? Move the book either closer to you or farther away. You should be able to find a position (about 7 inches from your face) where the cat’s picture vanishes when you’re looking at the dog. That’s because, at that distance, the cat’s picture is positioned on the retina such that it falls onto the blind spot. Note, though, that the grid pattern seems continuous. For this sort of regular pattern, your visual system is able to “fill in” the gap created by the blind spot.

19 Visual System
Transduction: distal → proximal
Sensory codes: the nervous system uses them to translate a proximal stimulus into neural impulses
Codes for intensity and quality: the intensity of a stimulus (e.g., its brightness or loudness) is encoded by both the rate of neural firing and the number of neurons that are triggered by the stimulus. The sensory modality (whether a sensation is of a sight or a sound) is encoded by differences in the neural structures that are excited by these stimuli (e.g., optic nerve vs. auditory nerve), as argued by Müller.

20 Sensory Coding Psychological intensity is coded by
rates of neuron firing total number of neurons triggered

21 Sensory Coding Other codes indicate sensory quality Specificity theory
Pattern theory Within a modality, differences in sensory quality (e.g., red versus green or sweet versus sour) are sometimes encoded by the specific neurons being stimulated (specificity theory), but more commonly are encoded by the pattern of firing across a set of neurons (pattern theory).

22 Sensory Coding Specificity theory
Different sensory qualities are signaled by different neurons. describes qualitative differences within a sensory modality

23 4.7 Seeing stars Whether it comes from light or some other source, stimulation of the optic nerve causes the sense of seeing. This is why boxers sometimes “see stars.” The punches they receive cause the head to move abruptly, making the eyeballs press briefly against their eye sockets. This pressure mechanically stimulates the optic nerves and makes the stars appear.

24 Sensory Coding More commonly, sensory coding is best described by pattern theory. Certain sensory qualities arise because of different patterns of activation across a whole set of neurons. An additional consideration for all sensory systems is adaptation, or the gradual decrease in sensory response to unchanging stimuli. This pattern of response can have evolutionary benefits by actively highlighting new information and changes in the physical environment.
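Pattern coding can be sketched as a lookup over whole activation profiles. The three "neurons," their activation values, and the nearest-pattern decoding rule below are all illustrative assumptions; the point is only that the quality is read off the joint pattern, not any single cell.

```python
# Toy pattern code: the joint activation across a set of neurons,
# not any single neuron, signals the sensory quality.
patterns = {
    (0.9, 0.4, 0.1): "red",
    (0.2, 0.9, 0.3): "green",
    (0.1, 0.3, 0.9): "blue",
}

def decode(activation):
    """Return the quality whose stored pattern is closest (squared
    Euclidean distance) to the observed activation profile."""
    def distance(p):
        return sum((a - b) ** 2 for a, b in zip(p, activation))
    return patterns[min(patterns, key=distance)]

print(decode((0.85, 0.45, 0.15)))  # "red"
```

Note that the middle neuron fires at 0.45 for both "red" and "green" inputs here; only the full pattern disambiguates them, which is the contrast with specificity theory.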

25 Visual System The various components of the visual system interact continually. These interactions actively shape and transform the stimulus input.

26 4.26 The visual pathway Information about the left side of the visual world is sent, via the thalamus, to the right visual cortex (at the rear of the head, in the occipital lobe). Information about the right side of the visual world is sent to the left visual cortex. The “cross point” for the neural fibers is called the optic chiasm and is located near the thalamus.

27 Perceiving Shapes Shape perception depends on detector cells.
They respond to certain characteristics of the stimulus, such as curves and straight edges. The optimal input for each cell defines the cell's receptive field. Single-cell recordings suggest that each cell in the visual system has a preferred target; it responds maximally only to an input that has the right location, shape, or position, features that define that cell's receptive field. Cells can thus be tuned to detect particular features, which can be rather general (e.g., vertical lines) or very specific (e.g., faces). Phenomena such as lateral inhibition and feature detection demonstrate that the visual system is not a passive receiver of external stimuli.

28 4.43 Receptive fields on the cat’s visual system Using the setup shown previously, in Figure 4.41, stimuli are presented to various regions of the retina. The data show that different cells show different patterns of responding. For example, parts (A) through (D) show the firing frequency of a particular ganglion cell. (A) This graph shows the baseline firing rate when no stimulus is presented anywhere. (B) The cell’s firing rate goes up when a stimulus is presented in the middle of the cell’s receptive field. (C) In contrast, the cell’s firing rate goes down if a stimulus is presented at the edge of the cell’s receptive field. (D) If a stimulus is presented both to the center of the receptive field and to the edge, the cell’s firing rate does not change from its baseline level. Cells with this pattern of responding are called “center-surround” cells, to highlight their opposite responses to stimulation in the center of the receptive field and the surrounding region.

29

30 Perceiving Shapes Feature detectors
In cats and monkeys, these seem to respond the most when a line or edge of a specific orientation is in view. Other cells assemble these elements in order to detect more complex patterns.

31 Visual System
Dot detectors
Edge detectors: horizontal, vertical, diagonal
Corner detectors
Movement detectors: L to R, R to L…

32 The Active Perceiver Our sensory systems are active receivers of information. The level of activity involved in shaping and interpreting input increases as we go deeper.

33 The Visual Pathway Visual cortex—different cells respond to specific aspects of a stimulus. These different analyses go on in parallel. Cells that analyze forms are doing their work at the same time that other cells are analyzing motion and still others are analyzing color. In the visual cortex, different types of cells might respond to orientation, shape, color, motion, or other more complex dimensions, and they all work at the same time on the same stimuli. Parallel processing allows for greater speed and also allows the analyses going on in different areas of the brain to influence each other. This provides a biological basis for the simultaneous combination of top-down interpretations and bottom-up feature detection described earlier.

34 5.19 The “what” and “where” pathways Information from the primary visual cortex at the back of the head is transmitted to the inferotemporal cortex (the “what” system) and to the posterior parietal cortex (the “where” system).

35 5.18 The visual processing pathways Each box in this figure refers to a specific location within the visual system; the two blue boxes at the left refer to locations outside the cortex; all other boxes refer to locations on the cortex. Notice two key points: First, vision depends on many different brain sites, each performing a specialized type of analysis. Second, the flow of information is complex—so there’s surely no strict sequence of one specific step of analysis followed by another specific step. Instead, everything happens at once and there’s a great deal of back-and-forth communication among the various elements.

36 “What” and “Where”
“What” system: temporal cortex; identification of visual objects
“Where” system: parietal cortex; where a stimulus is located
Brain damage: akinetopsia, agnosia, etc.
Two major cortical pathways carry information in parallel beyond the visual cortex. The “what” system carries information to the temporal cortex and is crucial for the identification of visual objects; the “where” system carries information to the parietal cortex and conveys information about where a stimulus is located. These pathways can be independently disrupted. Patients with lesions in the occipital-temporal pathway show visual agnosia but little disruption in spatial orientation and reaching for objects, whereas patients with lesions in the occipital-parietal pathway show impaired reaching but no problems with identification.

37 The Binding Problem How do we integrate results from different subsystems? binding problem solved in part by neural synchrony attention Synchronized neural firing nervous system’s way of representing different attributes of a single object - The fact that vision depends on multiple specialized subsystems raises a question: How are the separate pieces of information integrated to form a coherent perceptual whole? This question, called the “binding problem,” is currently unresolved, and continues to be a subject of intense research. One proposal is that binding is achieved through the rhythm of cell firing. When cells in different brain areas fire at a particular rate and precisely in time with each other (called neural synchrony), this may signal that all the cells are responding to different aspects of the same object, and so should be bound together.

38 Ambiguous Figures

39 Form Perception
To organize input, the perceiver must analyze the visual scene: segregation of figure and ground. Interpretive steps are logical; they don't contain contradictions and don't depend on coincidence. Our perception is organized in ways the stimulus input is not. Gestalt psychologists argued, therefore, that how we organize, or parse, our perceptions is as important as the inputs themselves, and that we recognize forms as coherent wholes rather than as the sum of their parts. The information needed for this parsing comes not from the stimulus but from the perceiver. Visual parsing typically follows several principles about how forms are typically organized, such as similarity, proximity, and good continuation. Another crucial part of visual organization is separating the form (the figure) from its surroundings (the ground). These principles point again to the active role of the perceiver in interpreting the input, but these interpretations are not random. Instead, they reflect our typical visual experience and basic logical rules.

40 Necker Cube
Two possible interpretations, but only one can be seen at a time. The lines are neutral with regard to the shape's configuration in depth. Your perception is not: you specify an arrangement in depth. One set of visual features, two interpretations.

41 Ambiguous Figures

42 Charles Allan Gilbert, “All is Vanity”

43

44 Erik Johansson, “The Architect, 2015”

45 Ambiguous Figures
Neutral with regard to:
Figure: the depicted object
Ground: the background
Your perception contributes information on how the stimulus is arranged.

46 Recognizing Objects Reversible figures are not special
Many stimuli need interpretation

47 Motion Illusion Dragon Video
Video Full of Illusions Optical Illusion Dance

48 Gestalt Principles Perceptual illusions work because we try to make sense of the world in a systematic manner. “The perceptual whole is often different from the sum of its parts.” “Beyond the information given,” Bruner (1973) Interpretation of stimuli is guided by a set of principles

49 People resolve ambiguity in everyday situations

50 Gestalt Principles Closure Proximity Simplicity

51

52 The analysis of features depends on how the overall figure has been interpreted and organized.

53 Object Recognition
What comes first, the features or your interpretation?
Must start with stimuli → stimuli first
The features you find depend on your interpretation → interpretation first
Parallel processing → neither first; various brain areas influence each other

54

55 Perceptual Constancy We perceive a stable world
despite changes in viewing circumstances that alter the sensory information.
size constancy: even though the retinal image is determined both by the size of the distal object and by the viewing distance
shape constancy: even though the retinal image's shape depends on viewing angle
We correctly perceive an object's size as we approach it, for example, even as the size of its retinal image changes (size constancy); similarly, we achieve shape constancy despite changes in our viewing angle and brightness constancy despite changes in illumination. One hypothesis, developed by Helmholtz, is that the visual system adjusts our perceptions automatically to take into account unchanging patterns in visual scenes, a process called unconscious inference. Size constancy, for example, can be achieved by using cues concerning the distance of the object and accounting for how far away it is when interpreting the size of its retinal image. Shape judgments take cues about our viewing angle into account, and brightness judgments use cues for a surface's orientation relative to the available light sources.
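Helmholtz's unconscious inference for size constancy can be reduced to a toy formula: scale the retinal image by the estimated viewing distance. The function below is an illustrative sketch (the proportionality constant is omitted), not the visual system's actual computation.

```python
def perceived_size(retinal_size, distance):
    """Toy unconscious inference: the inferred distal size is the
    retinal image size scaled by the estimated viewing distance
    (constant of proportionality omitted for simplicity)."""
    return retinal_size * distance

# As an object approaches, its retinal image doubles when the distance
# halves, so the inferred size stays constant:
far_image, near_image = 1.0, 2.0   # retinal sizes at 20 ft and at 10 ft
print(perceived_size(far_image, 20.0) == perceived_size(near_image, 10.0))  # True
```

This also hints at why illusions arise: feed the same calculation a wrong distance cue and the inferred size is wrong too.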

56

57 Unconscious Inference
Figure-ground relationship change Unconscious inference. It takes viewing circumstances (distance, viewing angle, illumination) into account by means of a simple calculation. It can sometimes lead us astray. illusions - Interpretations can be wrong and can be exposed by visual illusions, where we misinterpret the available cues. For example, accounting for the "shadow" cast by an image on the page can result in incorrect brightness judgments.

58

59 Why does the shaded box look longer on the left than right?

60 Size Illusion

61 Brightness Illusion

62

63 Distance Perception Depth perception depends on various depth cues, including binocular disparity monocular (or pictorial) cues interposition and linear perspective motion parallax Perception depends on interpreting cues about relationships like distance—but what are these cues and how do we use them? One important cue is the difference between each eye's view of the world, or binocular disparity; this cue is the basis for three-dimensional (3D) images in movies and television. Other cues require only one eye (monocular); these include interposition, linear perspective, and texture gradients. Motion cues, including motion parallax and optic flow, derive from the changes in the retinal image across the visual field when we are in motion. Why should the visual system use all these cues when they are often redundant? The answer is that different cues become important in different circumstances. Binocular disparity, for instance, is useful only for objects less than approximately 30 feet away from us, and motion parallax works only when we are moving.

64 5.26 Binocular disparity Two images at different distances from the observer will present somewhat different retinal images. In the left eye’s view, these images are close together on the retina; in the right eye’s view, the images are farther apart. This disparity between the views serves as a powerful cue for depth.

65 Role of redundancy of distance information

66 Now let’s turn from form perception, the process through which the basic shape and size of an object are seen, and discuss object recognition, the process through which the object is identified.

67 Object Recognition: A A A a a A
We are able to recognize a huge number of patterns: objects, actions, situations, and variations of each of these, even if information is incomplete. A A a a A A A A A a a A

68 Knowledge can alter our interpretation
The answers 01. Heineken 02. Adidas 03. Toyota 04. British Airways 05. BP 06. Google 07. BMW 08. Vodafone 09. Ford 10. McDonald’s 11. Coca Cola 12. Olympic Games 13. Microsoft 14. IBM 15. Nike 16. Pepsi 17. GE 18. Qantas 19. Nokia 20. Virgin

69

70

71

72 Knowledge can alter our interpretation

73

74

75 Object Recognition 6 circles exercise Draw a chair Describe a chair
A separate seat for one person, typically with a back and four legs.

76 Object Recognition

77 Object Recognition Find the

78

79 Object Recognition Find the

80

81 Object Recognition Find the

82

83 Object Recognition: Features
Much slower at searching for a combination of features
Simple feature detection occurs early in perception
Feature binding is a separate step
Integrative agnosia; TMS

84 Object Recognition: Features
Recognize objects by their parts How do you recognize the parts? Identification of visual features Assemble the features Feature detectors in visual system What is common to an item? Fast and efficient at ID’ing simple features

85 Outline Word superiority effect Top Down Processing
Feature Nets Interactive Activation Model Bottom Up Processing Template Theory Representation & Process Evidence for and against this theory Recognition by Components Theory When perception fails

86 Word Recognition Tachistoscopic presentation Mask Familiarity Recency
Priming Repetition priming Word-Superiority Effect

87 WSE demo
+ DARK XXXXXX * * * _ K or E
or
+ RDAK XXXXXX * * * _ K or E

88 3. Word Superiority Effect
IV: presentation condition word: DARK non-word: RDAK Results: word > non-word word superiority effect: letter identification is better in the context of a word than a non-word or an isolated letter

89 Degree of well-formedness
HZYE or SBNE; FIKE or LAFE
Pronounceability
Probabilities (“Englishness”): how often letter combinations occur

90 Making errors
Systematic errors: irregular patterns are regularized (TPUM → TRUM or DRUM)
Reverse errors are rare
Recognition is guided by knowledge of spelling patterns

91 Feature Nets

92 Feature Nets
Activation level; response threshold
Complex assemblies of neural tissue
Recency and frequency → activation level
Degraded input
Well-formedness: bigram detectors, familiar combinations
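The activation-level and response-threshold machinery can be sketched as a tiny detector class. Everything here is an illustrative assumption: the class name, the numeric values, and the choice of bigrams; the point is that recency and frequency raise a detector's resting activation, so degraded input can still fire well-formed combinations.

```python
class Detector:
    """Minimal feature-net detector: it fires when accumulated
    activation crosses its response threshold. Frequent or recent
    use is modeled as a higher resting activation level."""
    def __init__(self, name, resting=0.0, threshold=1.0):
        self.name = name
        self.activation = resting
        self.threshold = threshold

    def receive(self, amount):
        """Add input activation; return True if the detector fires."""
        self.activation += amount
        return self.activation >= self.threshold

# "TH" is a frequent bigram, so its detector starts closer to threshold
# than a rare bigram like "QK"; the same weak, degraded input fires
# one detector but not the other.
th = Detector("TH", resting=0.6)
qk = Detector("QK", resting=0.1)
weak_input = 0.5
print(th.receive(weak_input))  # True  -- fires despite degraded input
print(qk.receive(weak_input))  # False -- stays below threshold
```

This is also why the net makes the regularization errors described above: a well-primed detector can fire on input that actually matches something else.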

93 Recovery from confusion
Due to priming

94 Ambiguous input Resolved at subsequent levels Context allows you to make better use of what you see

95 Recognition errors CQRN  CORN Biased network Usually unproblematic Not locally represented Distributed knowledge Efficiency vs accuracy

96 Feature Nets How does Feature Net explain WSE?
How does Feature Net explain type of errors? What are the costs/benefits of Feature Net?

97 Descendants of Feature Nets
McClelland & Rumelhart (1981)
Excitatory connections
Includes inhibitory connections
Information flow is not one-way
Similar to visual processing in the nervous system

98 Interactive Activation Model McClelland & Rumelhart (1981)
WORK (word layer): your knowledge, learned through life
K (letter layer): composed of features and used to compose words
| ) _ (feature layer): stimuli coming in from the world
Visual Input

99 The Interactive Activation Model
activation spreads from feature level to letter level to word level activation spreads from word back to letter level How does IAM explain the WSE? (McClelland & Rumelhart, 1981)

100 Interactive Activation Model McClelland & Rumelhart (1981)
Top-down processing: the influence of knowledge on perception (e.g., word level to letter level)
Bottom-up processing: the influence of pure sensory processes on perception (e.g., visual input to feature level)
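The top-down/bottom-up loop can be sketched in a few lines. This is a heavily simplified toy, not the McClelland & Rumelhart model: only two words, one pool of letter units, hand-picked weights, and no inhibition. It shows how word-level feedback boosts a weakly seen letter, which is one way the word superiority effect is explained.

```python
# Toy interactive-activation sketch: bottom-up evidence activates
# letters, letters activate consistent words, and words feed
# activation back down to their letters. All weights are illustrative.
words = {"WORK": ["W", "O", "R", "K"], "WORD": ["W", "O", "R", "D"]}

def settle(letter_evidence, feedback=0.5, steps=3):
    letters = dict(letter_evidence)
    for _ in range(steps):
        # Bottom-up: each word sums the activation of its letters.
        word_act = {w: sum(letters.get(l, 0.0) for l in ls)
                    for w, ls in words.items()}
        # Top-down: each word passes some activation back to its letters.
        for w, ls in words.items():
            for l in ls:
                letters[l] = letters.get(l, 0.0) + feedback * word_act[w] / len(ls)
    return letters

# A weakly seen K embedded in strong W, O, R evidence gets amplified
# by word-level feedback; the same K seen in isolation does not.
in_word = settle({"W": 1.0, "O": 1.0, "R": 1.0, "K": 0.2})
alone = settle({"K": 0.2})
print(in_word["K"] > alone["K"])  # True -- the word-superiority effect
```

The design choice to mirror here is that information flow is not one-way: the word layer changes the letter layer's state on every step.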

101 How do we manage to recognize objects in the midst of this ambiguous information?
Bottom-up processing Stimulus driven influence of pure sensory processes on perception (information in the pictures) Top-down processing Knowledge driven influence of knowledge on perception (experience in the world)

102 Recognition by Components Theories
Hummel & Biederman (1992) Representation: 3D objects are a set of features (i.e., geons) and their relations Process: detect features and relations in the visual field, then match to a list in memory and pick the best match

103 Geons are like the alphabet of object recognition
36 different geons identified Can be identified from many different viewpoints

104 Objects from Geons

105 Relations Among the Geons
Different arrangements of the same component geons can lead to different objects
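An RBC-style structural description can be sketched as parts plus relations. The geon names and relation labels below are illustrative stand-ins; the cup/bucket contrast echoes Biederman's classic example of the same components arranged differently.

```python
# Sketch of RBC structural descriptions: the same two geons in
# different spatial relations describe different objects.
cup    = {"geons": {"cylinder", "curved_tube"},
          "relation": "handle_on_side"}
bucket = {"geons": {"cylinder", "curved_tube"},
          "relation": "handle_on_top"}

print(cup["geons"] == bucket["geons"])        # True  -- same parts
print(cup["relation"] == bucket["relation"])  # False -- different objects
```

This is why the relations are part of the representation, not an afterthought: the part list alone cannot distinguish the two objects.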

106 Object model—object recognition
Feature detectors Edges Curves Vertices Geon detectors Geon assemblies Object model—object recognition

107 Evidence - Biederman (1975)
Participants identified partial pictures. Results: pictures with geons or relations between geons removed were harder to identify than pictures that preserved geons and relations. Two conditions: one where geons and relations are preserved and one where they are not. The total amount of information available for template matching is the same in both; however, only the information that helps with geon matching is masked.

108 Potential Problems with RBC Theory
Geons have no detail – cannot tell the difference between a dog and a wolf Some things are not well-defined by a geon

109 RBC is object-centered: the object has the same parts and relations regardless of the location of the viewer (viewpoint-independent).

110 Template Theory Representation: unanalyzed whole image, ‘snapshots’
NOTE: a picture depends on the position of the viewer (viewer-centered). Process: match the object to templates stored in memory and measure overlap; the best match wins. To navigate the world you would need lots and lots of snapshots labeled with the object, one for each view from which you've experienced the item.

111 Figure 2.16 Anderson: Cognitive Psychology and Its Implications, Sixth Edition Copyright © 2005 by Worth Publishers

112 Examples of a Template Representation

113 Example of the Template Process
Test item What letter is this? Test-J overlap is 10 Test-T overlap is 8 J wins
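The overlap-counting process above can be sketched directly. The pixel grids and letters below are made up for illustration (the slide's J/T example is replaced with tiny I/L grids); the mechanism is the same: count matching "on" pixels and let the best match win.

```python
# Toy template matcher: templates and the test item are small binary
# pixel grids; the template with the most overlapping "on" pixels wins.
templates = {
    "I": ["010",
          "010",
          "010"],
    "L": ["100",
          "100",
          "111"],
}

def overlap(a, b):
    """Count positions where both grids have a '1' pixel."""
    return sum(pa == "1" and pb == "1"
               for ra, rb in zip(a, b)
               for pa, pb in zip(ra, rb))

def recognize(test):
    """Return the template name with the greatest overlap."""
    return max(templates, key=lambda name: overlap(templates[name], test))

test_item = ["010",
             "010",
             "011"]  # an I with one stray pixel
print(recognize(test_item))  # "I"
```

Note how fragile this is: shift the test grid one column over and the overlap with "I" collapses, which is exactly the translation-invariance problem raised on the following slides.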

114 Where else might we use template matching in the real world?

115 Templates are used by Machines
Figure 2.17 Anderson: Cognitive Psychology and Its Implications, Sixth Edition Copyright © 2005 by Worth Publishers

116 Potential Problems with Template Theory
We can identify an object regardless of translation, rotation, color, or size:
Size invariance
Color invariance
Translation invariance
Rotation invariance

117 Potential Problems with Template Theory
Translation: moving to a different part of the visual field Rotation: viewing the object from a different viewpoint

118 Potential Problems with Template Theory

119 Potential Problems with Template Theory
Inefficient (huge number of potential templates): any change in the object's shape, location, size, or orientation produces a new template. Templates could match (close to) perfectly but still not be the same object in the world. You may never have stored a particular template (an upside-down tricycle) but know what it is immediately. This doesn't necessarily mean that the theory is broken, though.

120 Template theory is viewer-centered: the stored representation differs depending on the location of the viewer (viewpoint-dependent).

121 Canonical View of Template Theory
This is an updated version of the template theory process: transform the image to the canonical view, then match the object with templates stored in memory and measure overlap.

122 Evidence for Canonical View: Palmer, Rosch, & Chase (1981)
Part I: participants rated how typical each picture was. Part II: a new group of participants was shown the images and asked to name each as quickly as possible. What would template theory predict about RT with a large number of templates? What are the IVs and DV here? What comparison would speak to the issue of a canonical viewpoint?

123 More Evidence for Canonical View: Palmer, Rosch, & Chase (1981)
Part II: a new group of participants was asked to identify objects. Not all templates are equally useful. Images are perceived, adjusted, and then matched to a template. The better the image matches a template at the outset, the faster the RT.

124 Canonical View of Template Theory
templates (‘snapshots’ of the actual object) are the basis for object recognition the canonical template for each object is preferred & used whenever possible

125 Object vs Viewer centered
Does perception depend on the object or on the location of the viewer with respect to the object? May depend on the task

126 Review Which is viewer centered?
RBC or Template theory What is stored in memory for template theory? What is a canonical view? What’s happening in the figure below?

127 Specific Objects Are faces special? Prosopagnosia:

128

129 Specific Objects Fusiform Face Area (FFA)
Recognition of specific individuals within a category Extremely familiar category Holistic perception Relations among parts

130 Top-Down Influences Driven by knowledge and expectations
Recency and frequency
Words are easier to recognize if you see them as part of a sentence rather than in isolation
Context priming
Object recognition is not a self-contained process

