What can computational models tell us about face processing?
Garrison W. Cottrell
Gary's Unbelievable Research Unit (GURU)
Computer Science and Engineering Department, Institute for Neural Computation, UCSD

Collaborators, past, present, and future: Ralph Adolphs, Luke Barrington, Serge Belongie, Kristin Branson, Tom Busey, Andy Calder, Eric Christiansen, Matthew Dailey, Piotr Dollar, Michael Fleming, AfmZakaria Haque, Janet Hsiao, Carrie Joyce, Brenden Lake, Kang Lee, Joe McCleery, Janet Metcalfe, Jonathan Nelson, Nam Nguyen, Curt Padgett, Angelina Saldivar, Honghao Shan, Maki Sugimoto, Matt Tong, Brian Tran, Keiji Yamada, Lingyun Zhang


And now for something completely different… The CIS goal is to mimic nature for problem solving. My goal is to mimic nature in order to understand nature. In fact, as a cognitive scientist, I am glad when my models make the same mistakes people do, because that means the model is fitting the data better -- so maybe I have a better model! So don't look for a better problem solver here… hopefully, look for some insights into how people process faces.

Why use models to understand thought? Models rush in where theories fear to tread. Models can be manipulated in ways people cannot. Models can be analyzed in ways people cannot.

Models rush in where theories fear to tread. Theories are high-level descriptions of the processes underlying behavior. They are often not explicit about the processes involved, and they are difficult to reason about if no mechanisms are explicit -- they may be too high level to make explicit predictions. Theory formation itself is difficult. Using machine learning techniques, one can often build a working model of a task for which we have no theories or algorithms (e.g., expression recognition). A working model provides an intuition pump for how things might work, especially if it is neurally plausible (e.g., the development of face processing -- Dailey and Cottrell). A working model may make unexpected predictions (e.g., the Interactive Activation Model and SLNT).

Models can be manipulated in ways people cannot. We can see the effects of variations in cortical architecture (e.g., split (hemispheric) vs. non-split models, as in Shillcock and Monaghan's word perception model). We can see the effects of variations in processing resources (e.g., variations in the number of hidden units in Plaut et al.'s models). We can see the effects of variations in environment (e.g., what if our parents were cans, cups, or books instead of humans? That is, is there something special about face expertise versus visual expertise in general? (Sugimoto and Cottrell; Joyce and Cottrell)). We can see variations in behavior due to different kinds of brain damage within a single brain (e.g., Juola and Plunkett; Hinton and Shallice).

Models can be analyzed in ways people cannot. In the following, I specifically refer to neural network models. We can do single-unit recordings. We can selectively ablate and restore parts of the network, even down to the single-unit level, to assess each part's contribution to processing. We can measure the individual connections -- e.g., the receptive and projective fields of a unit. We can measure responses at different layers of processing (e.g., which level accounts for a particular judgment: perceptual, object, or categorization? (Dailey et al., J. Cog. Neuro., 2002)).

How (I like) to build cognitive models. I like to be able to relate them to the brain, so neurally plausible models are preferred -- neural nets. The model should be a working model of the actual task, rather than a cartoon version of it. Of course, the model should nevertheless be simplifying (i.e., constrained to the essential features of the problem at hand): Do we really need to model the (supposed) translation invariance and size invariance of biological perception? As far as I can tell, NO! Then take the model as is and fit the experimental data: 0 fitting parameters is preferred over 1, 2, or 3.

The other way (I like) to build cognitive models. Same as above, except: use them as exploratory models -- in domains where there is little direct data (e.g., no single-cell recordings in infants or undergraduates) -- to suggest what we might find if we could get the data. These can then serve as intuition pumps. Examples: why we might get specialized face processors, and why those face processors get recruited for other tasks.

Outline. Review of our model of face and object processing. Some insights from modeling: What could holistic processing mean? Does a specialized processor for faces need to be innately specified? Why would a face area process BMWs? Some new directions: How do we select where to look next? How is information integrated across saccades?


The Face Processing System: Pixel (Retina) Level -> Gabor Filtering (Perceptual/V1 Level) -> PCA (Object/IT Level) -> Neural Net (Category Level) -> Happy, Sad, Afraid, Angry, Surprised, Disgusted.

The Face Processing System (identity version): the same pipeline with identity outputs -- Bob, Carol, Ted, Alice.

The Face Processing System (faces and objects): the same pipeline, with a feature level, and outputs Bob, Carol, Ted, Cup, Can, Book.

The Face Processing System (spatial frequency version): the PCA stage is split into LSF PCA and HSF PCA (low and high spatial frequencies), with outputs Bob, Carol, Ted, Cup, Can, Book.

The Gabor Filter Layer. Basic feature: the 2-D Gabor wavelet filter (Daugman, 1985). These model the processing in early visual areas: convolve the image with the Gabor filters, take the magnitudes of the responses, and subsample them on a 29x36 grid.
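A minimal sketch of this stage in Python (the 29x36 grid is from the slide; the filter-bank parameters, the sigma choice, and the scipy-based convolution are assumptions, not the original implementation):

```python
import numpy as np
from scipy.signal import convolve2d

def gabor_kernel(freq, theta, sigma, size=31):
    """Complex 2-D Gabor: a plane wave windowed by a Gaussian envelope."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)   # rotate to orientation theta
    envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    return envelope * np.exp(1j * 2 * np.pi * freq * xr)

def gabor_magnitudes(image, freqs, thetas, grid=(29, 36)):
    """Convolve at each scale/orientation, keep magnitudes, subsample."""
    rows = np.linspace(0, image.shape[0] - 1, grid[0]).astype(int)
    cols = np.linspace(0, image.shape[1] - 1, grid[1]).astype(int)
    feats = []
    for f in freqs:
        for t in thetas:
            k = gabor_kernel(f, t, sigma=0.5 / f)
            # Convolve real and imaginary parts, then take the magnitude,
            # which discards phase (as in the model).
            r = convolve2d(image, k.real, mode='same')
            i = convolve2d(image, k.imag, mode='same')
            feats.append(np.hypot(r, i)[np.ix_(rows, cols)].ravel())
    return np.concatenate(feats)
```

With a handful of scales and orientations this yields a feature vector on the order of the 40,600 numbers quoted on the next slide.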

Principal Components Analysis. The Gabor filters give us 40,600 numbers. We use PCA to reduce this to 50 numbers. PCA is like factor analysis: it finds the underlying directions of maximum variance. PCA can be computed in a neural network through a competitive Hebbian learning mechanism, so this is also a biologically plausible processing step. We suggest this leads to representations similar to those in inferior temporal cortex.

How to do PCA with a neural network (Cottrell, Munro & Zipser, 1987; Cottrell & Fleming, 1990; Cottrell & Metcalfe, 1990; O'Toole et al., 1991): a self-organizing network that learns whole-object representations (features, principal components, holons, eigenfaces). The holons (Gestalt layer) receive input from the perceptual layer.
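One way to realize the Hebbian claim is Sanger's generalized Hebbian algorithm, sketched below. This specific update rule is an illustration of Hebbian PCA, not necessarily the rule used in the cited papers:

```python
import numpy as np

def sanger_update(W, x, lr=1e-3):
    """One online step of Sanger's generalized Hebbian algorithm.
    W: n_components x n_inputs; x: one (zero-mean) input vector."""
    y = W @ x                                     # unit activations
    # Hebbian term, minus what earlier units already explain.
    W += lr * (np.outer(y, x) - np.tril(np.outer(y, y)) @ W)
    return W
```

Run over many input presentations, the rows of W converge to the leading principal components, so the holon layer spans the same subspace as standard PCA.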


The Gestalt Layer: Holons (Cottrell, Munro & Zipser, 1987; Cottrell & Fleming, 1990; Cottrell & Metcalfe, 1990; O'Toole et al., 1991): the same self-organizing network; its hidden units are the holons (Gestalt layer), fed by the perceptual layer.

Holons. They act like face cells (Desimone, 1991): the response of single units is strong despite, e.g., occluded eyes; the response drops off with rotation; some fire to my dog's face. A novel representation: distributed templates -- each unit's optimal stimulus is a ghostly looking face (template-like), but many units participate in the representation of a single face (distributed). For this audience: neither exemplars nor prototypes! They explain holistic processing. Why? If stimulated with a partial match, the firing represents votes for this template: units downstream don't know what caused this unit to fire. (More on this later…)

The Final Layer: Classification (Cottrell & Fleming, 1990; Cottrell & Metcalfe, 1990; Padgett & Cottrell, 1996; Dailey & Cottrell, 1999; Dailey et al., 2002). The holistic representation is then used as input to a categorization network trained by supervised learning. Excellent generalization performance demonstrates the sufficiency of the holistic representation for recognition. The holons feed the category units; outputs: Cup, Can, Book, Greeble, Face, Bob, Carol, Ted, Happy, Sad, Afraid, etc.

The Final Layer: Classification. Categories can be at different levels: basic, subordinate. Simple learning rule (~delta rule). It says (mild lie here): add inputs to your weights (synaptic strengths) when you are supposed to be on; subtract them when you are supposed to be off. This makes your weights look like your favorite patterns -- the ones that turn you on. With no hidden units, there is no backpropagation of error. With hidden units, we get task-specific features (most interesting when we use the basic/subordinate distinction).
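The "add inputs when on, subtract when off" description is essentially the delta rule. A minimal sketch (the sigmoid output and learning rate are assumptions):

```python
import numpy as np

def delta_rule_step(W, x, target, lr=0.1):
    """One delta-rule update for a one-layer categorizer (sketch).
    When a unit should be on (target 1) but is not, its weights move
    toward the input; when it should be off, they move away from it."""
    y = 1.0 / (1.0 + np.exp(-(W @ x)))            # sigmoid outputs
    W += lr * np.outer(target - y, x)
    return W
```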

Outline, revisited: next, some insights from modeling -- what could holistic processing mean?

Holistic Processing. Holistic processing refers to a type of processing in which visual stimuli are treated as a whole -- in fact, we are unable to ignore other apparent parts of an image. Face processing, in particular, is thought to be holistic in nature: we are better at recognizing Bob's nose when it is on his face; changing the spacing between the eyes makes the nose look different; and we are unable to ignore conflicting information from other parts of a face. All of these might be summarized as "context influences perception," but the context is obligatory.

Who do you see? Context influences perception.

Same/Different Task.

(Image-only slides: face pairs for the same/different task.)

These look like very different women.

But all that has changed is the height of the eyes, right?

Take the configural processing test! What emotion is being shown in the top half of the image below? Happy, Sad, Afraid, Surprised, Disgusted, or Angry? Now, what do you see? Answer: Sad.

Do holons explain these effects? Recall that they are templates -- each unit's optimal stimulus is a ghostly looking face (template-like). What will happen if there is a partial match? Suppose there is a holon that likes happy faces. The mouth will match, causing this unit to fire. Units downstream have learned to associate this firing with a happy face. They will think the top of the face is happier than it is…

Do holons explain these effects? Clinton/Gore: the outer part of the face votes for Gore. The nose effect: a match at the eyes votes for that template's nose. Expression/identity configural effects: in split faces, the bottom votes for one person and the top for another, but both vote for the WHOLE face… In split expressions, the bottom votes for one expression, the top for another…

Attention to half an image: the input pixel image is Gabor filtered to give a Gabor pattern; the unattended half is attenuated to give the attenuated pattern.
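A minimal sketch of the attenuation manipulation (the array shape, which axis counts as "top," and the 0.5 gain are assumptions):

```python
import numpy as np

def attenuate_half(gabor_grid, attend_top=True, gain=0.5):
    """Attenuate the unattended half of the subsampled Gabor pattern.
    gabor_grid: (rows, cols, n_filters); responses are scaled, not zeroed."""
    out = gabor_grid.copy()
    half = out.shape[0] // 2
    if attend_top:
        out[half:] *= gain                        # damp the bottom half
    else:
        out[:half] *= gain                        # damp the top half
    return out
```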

Composite vs. non-composite facial expressions (Calder et al., 2000): network errors alongside human reaction times (error bars indicate one standard deviation).

Is configural processing of identity and expression independent? Calder et al. (2000) found that adding additional inconsistent information that is not relevant to the task didn't further slow reaction times. E.g., when the task is "who is on the top?", having a different person's face on the bottom hurts your performance, but additionally having a different expression doesn't hurt you any more. Conditions: same identity, different expression; different identity, same expression; different identity, different expression.

(Lack of) interaction between expression and identity: network reaction time (1 - correct output) alongside human reaction time in ms (Cottrell, Branson, and Calder, 2002).

Why does this work? Attenuated inconsistent information at the pixel (retina) level -- or information shifted into a non-configural position -- leads to a weaker representation at the object (IT) level, because the wrong template is only weakly activated: the bottom half doesn't match any template. It therefore has little impact at the category level (Happy, Sad, Afraid; Bob, Carol, Ted).

Configural/holistic processing phenomena accounted for: interference from incorrect information in the other half of the image, and lack of interference from misaligned incorrect information. We have shown this for identity and expression, as well as the lack of interaction between them. Calder suggested from his data that we must have two representations, one for expression and one for identity; but our model has only one representation.

Outline, revisited: next -- does a specialized processor for faces need to be innately specified?

Introduction. The brain appears to devote specialized resources to face processing. The issue: is this innate or learned? Our approach: computational models guided by neuropsychological and experimental data. The model: competing neural networks plus biologically plausible task and input biases. Results: the interaction between face discrimination and low visual acuity leads to networks specializing for face recognition. No innateness necessary!

Step one: a model with parts. Independent networks compete to perform new tasks, and a mediator rewards the winners. The question: what might cause a specialized face processor? (Architecture: stimulus -> feature extraction units -> competing face-processing and object-processing modules -> mediator -> decision.)

Developmental biases in learning. The task: we have a strong need to discriminate between faces, but not between baby bottles (mother's face recognition at 4 days: Pascalis et al., 1995). The input: low spatial frequencies, which tend to be more holistic in nature; infant sensitivity to high spatial frequencies is low at birth (from Banks and Salapatek, 1981).

Neural Network Implementation. The input stimulus image is preprocessed and fed to separate nets in competition; their outputs are mixed by a gate network (multiplicative connections), and more error feedback goes to the winner. One module receives the high spatial frequencies, the other the low spatial frequencies.
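A minimal sketch of the gated competition, assuming linear modules and a softmax gate (the shapes, names, and softmax choice are illustrative, not the original code):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def gated_forward(W_low, W_high, W_gate, x_low, x_high):
    """Forward pass of the two-module gated network (sketch).
    The gate mixes the module outputs multiplicatively."""
    g = softmax(W_gate @ np.concatenate([x_low, x_high]))
    y = g[0] * (W_low @ x_low) + g[1] * (W_high @ x_high)
    return y, g
```

In training, each module's weight update would be scaled by its gate value g[i], which is what lets an early advantage snowball into specialization.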

Experimental methods. Image data: 12 faces, 12 books, 12 cups, 12 soda cans, with five examples each; 8-bit grayscale, cropped and scaled to 64x64 pixels.

Image Preprocessing: Gabor jet filtering (an 8x5 pattern of jets) produces the filter responses (512x5 elements), followed by dimensionality reduction with PCA.

Task Manipulation. Trained networks for two types of task: (1) superordinate four-way classification (book? face?); (2) subordinate classification within one class, with simple classification for the others (book? John?). Task 1 (superordinate) output units: Face, Can, Book, Cup. Task 2 (subordinate) output units: Bob, Carol, Ted, …, Alice, Can, Book, Cup.

Input spatial frequency manipulation. We used two input pattern formats: either each module receives the same full pattern vector, or one module receives the low spatial frequencies and the other receives the high spatial frequencies.
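A minimal sketch of the split-input manipulation. The equal-block-per-scale layout and the zero-filling are assumptions; the slide's residue suggests boundary scales may have been down-weighted rather than zeroed:

```python
import numpy as np

def split_bands(pattern, n_scales=5, low_scales=(0, 1)):
    """Zero out the unused band so both modules keep the same input size.
    Assumes the pattern vector is grouped into equal blocks per scale."""
    blocks = np.array_split(pattern, n_scales)
    low = np.concatenate([b if i in low_scales else np.zeros_like(b)
                          for i, b in enumerate(blocks)])
    high = np.concatenate([np.zeros_like(b) if i in low_scales else b
                           for i, b in enumerate(blocks)])
    return low, high
```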

Measuring specialization: train the network, then record how the gate network's outputs change with each pattern.

Specialization Results: gating unit average weight for each task -- four-way classification (Face, Book, Cup, Can?), book identification (Face, Cup, Can, Book1, Book2, …?), and face identification (Book, Cup, Can, Bob, Carol, Ted, …?) -- comparing Module 1 vs. Module 2 when both receive all frequencies, and the high- vs. low-frequency module under the hi/lo split.

Modeling prosopagnosia. We can damage the specialized network: damage to the high spatial frequency network degrades object classification, while damage to the low spatial frequency network degrades face identification.

Conclusions so far… There is a strong interaction between task and spatial frequency in the degree of specialization for face processing. The model suggests that the infant's low visual acuity, together with the need to discriminate between faces but not between other objects, could lock in a special face processor early in development. => General mechanisms (competition, plus known innate biases) could lead to a specialized face processing module -- no need for an innately specified processor.

Outline, revisited: next -- why would a face area process BMWs?

Are you a perceptual expert? Take the expertise test!** Identify this object with the first name that comes to mind. (**Courtesy of Jim Tanaka, University of Victoria.)

"Car" - not an expert. "2002 BMW 7 Series" - expert!

"Bird" or "Blue Bird" - not an expert. "Indigo Bunting" - expert!

"Face" or "Man" - not an expert. "George Dubya" - expert!

Greeble Experts (Gauthier et al., 1999). Subjects were trained over many hours to recognize individual Greebles. Activation of the FFA increased for Greebles as the training proceeded.

The visual expertise mystery. If the so-called Fusiform Face Area (FFA) is specialized for face processing, then why would it also be used for cars, birds, dogs, or Greebles? Our view: the FFA is an area associated with a process -- fine-level discrimination of homogeneous categories. But the question remains: why would an area that presumably starts as a face area get recruited for these other visual tasks? Surely they don't share features, do they? (Sugimoto & Cottrell, 2001, Proceedings of the Cognitive Science Society.)

Solving the mystery with models. Main idea: there are multiple visual areas that could compete to become the Greeble expert -- basic-level areas and the expert (FFA) area. The expert area must use features that distinguish similar-looking inputs -- that's what makes it an expert. Perhaps these features will be useful for other fine-level discrimination tasks. We will create basic-level models (trained to identify an object's class) and expert-level models (trained to identify individual objects), put them in a race to become Greeble experts, and then deconstruct the winner to see why it won. (Sugimoto & Cottrell, 2001, Proceedings of the Cognitive Science Society.)

Model database. A network that can differentiate faces, books, cups, and cans is a basic-level network. A network that can also differentiate individuals within ONE class (faces, cups, cans, OR books) is an expert.

Model. Pretrain two groups of neural networks on different tasks, then compare their ability to learn a new individual-Greeble classification task. (Experts are pretrained with subordinate outputs -- e.g., Bob, Carol, Ted, plus cup, can, book; non-experts with basic outputs -- face, cup, can, book; both then learn Greeble1, Greeble2, Greeble3 through the same hidden layer.)

Expertise begets expertise. Learning to individuate cups, cans, books, or faces first leads to faster learning of Greebles (you can't try this with kids!). The more expertise, the faster the learning of the new task! Hence, in a competition with the object area, the FFA would win. If our parents were cans, the FCA (Fusiform Can Area) would win. (Plot: amount of training required to become a Greeble expert vs. training time on the first task.)

Entry-level shift: subordinate RT decreases with training (rt = uncertainty of response = 1.0 - max(output)). (Plots: human data, subordinate vs. basic RT; network data across training sessions.)
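The RT proxy from the slide, directly in code:

```python
def network_rt(outputs):
    """Reaction-time proxy: the response uncertainty, 1.0 - max(output).
    A more confident output means a faster simulated RT."""
    return 1.0 - max(outputs)
```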

How do experts learn the task? Expert-level networks must be sensitive to within-class variation: their representations must amplify small differences. Basic-level networks must ignore within-class variation: their representations should reduce differences.

Observing hidden layer representations. PCA of the hidden unit activations lets us reduce the dimensionality (to 2) and plot the representations, so we can observe how tightly clustered stimuli are in a low-dimensional subspace. We expect basic-level networks to separate classes, but not individuals; we expect expert networks to separate classes and individuals.
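A minimal sketch of that visualization step (SVD-based PCA; the variable names are mine):

```python
import numpy as np

def project_hidden_2d(H):
    """Project hidden activations (n_stimuli x n_hidden) onto their
    first two principal components for 2-D plotting."""
    Hc = H - H.mean(axis=0)                       # center each hidden unit
    _, _, Vt = np.linalg.svd(Hc, full_matrices=False)
    return Hc @ Vt[:2].T                          # n_stimuli x 2 coordinates
```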

Subordinate-level training magnifies small differences within object representations. (Panels: 1 epoch, 80 epochs, 1280 epochs; rows: face, basic, and Greeble networks.)

Greeble representations are spread out prior to Greeble training. (Rows: face, basic, and Greeble networks.)

Variability decreases learning time. (Plot: Greeble learning time vs. Greeble variance prior to learning Greebles; r = .)

Examining the network's representations. We want to visualize receptive fields in the network, but the Gabor magnitude representation is non-invertible. We can learn an approximate inverse mapping, however: we used linear regression to find the best linear combination of Gabor magnitude principal components for each image pixel. Projecting each hidden unit's weight vector into image space with the same mapping then visualizes its receptive field.
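A sketch of that inverse mapping, assuming least-squares regression from the PCA'd Gabor magnitudes back to pixels (the shapes and names are assumptions):

```python
import numpy as np

def fit_inverse_map(Z, images):
    """Learn an approximate linear inverse from PCA'd Gabor magnitudes
    to pixels. Z: n_images x n_components; images: n_images x n_pixels,
    with rows aligned. One linear model per pixel, fit jointly."""
    M, *_ = np.linalg.lstsq(Z, images, rcond=None)
    return M                                      # n_components x n_pixels

def receptive_field(w, M, shape):
    """Project a hidden unit's weight vector into image space."""
    return (w @ M).reshape(shape)
```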

Two hidden unit receptive fields (HU 16 and HU 36), after training as a face expert and after further training on Greebles. NOTE: these are not face-specific!

Controlling for the number of classes. We obtained 13 classes from hemera.com: 10 of these are learned at the basic level; 10 faces, each with 8 expressions, make up the expert task; and 3 (lamps, ships, swords) are used for the novel expertise task.

Results: pre-training. The new initial tasks are of similar difficulty (in previous work, the basic-level task was much easier). These are the learning curves for the 10 object classes and the 10 faces.

Results. As before, experts still learned new expert-level tasks faster. (Plot: number of epochs to learn swords after learning faces or objects, vs. number of training epochs on faces or objects.)

Outline, revisited: next, some new directions -- how do we select where to look next, and how is information integrated across saccades?

Issues I haven't addressed…
1. Development: what is the trajectory of the system from infant to adult? How do representations change over development?
2. How do earlier-acquired representations differ from later ones? I.e., what is the representational basis of age-of-acquisition effects?
3. How do representations change based on familiarity?
4. Does the FFA participate in basic-level processing?
5. Dynamics of expertise (eye movements): How do they change with expertise? Are there visual routines for different tasks? How much does the stimulus influence eye movements, i.e., how flexible are the routines? How do we decide where to look next?
6. How are samples integrated across saccades?

How do we decide where to look next? There are both bottom-up and top-down influences: local stimulus complexity ("interestingness"), and task requirements (look for discriminative features). We've looked at at least two ideas: (1) Gabor filter response variance, and (2) mutual information between features and categories. A sketch of the first idea follows.
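A minimal sketch of the variance-based interest operator (the grid layout and the winner-take-all fixation choice are assumptions):

```python
import numpy as np

def interest_map(gabor_grid):
    """At each grid location, take the variance of the Gabor responses
    across scales/orientations; high variance marks locally complex,
    'interesting' points."""
    return gabor_grid.var(axis=-1)                # shape: (rows, cols)

def next_fixation(gabor_grid):
    """Pick the location with the highest interest value."""
    m = interest_map(gabor_grid)
    return np.unravel_index(np.argmax(m), m.shape)
```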

Interest points created using Gabor filter variance.

Where do we look next, #2: mutual information. Ullman et al. (2002) proposed that features of intermediate complexity are best for classification. They used mutual information between patches in images and categories to find patches that were good discriminators (faces vs. non-faces; cars vs. non-cars), and found that medium-sized, medium-resolution patches were best for these tasks. Our question: what features are best for subordinate-level classification tasks that need expertise, like facial identity recognition? We found that traditional features such as eyes, noses, and mouths are informative for identity ONLY in the context of each other, i.e., in a configuration. Conclusion: holistic processing develops because it is good.

Ullman et al. (2002): features of intermediate complexity (size and resolution) are best for classification. These were determined by computing the mutual information between an image patch and the class.

Facial identity classification. Will features that are good for telling faces from objects also be good for identification? We expect that more specific features will be needed for face identification.

Data set. We used 36 frontal images of 6 individuals (6 images each) from FERET (Phillips et al., 1998). The images were aligned, and Gabor filter responses were extracted from rectangular grids.

Patches. Rectangular patches of different centers, sizes, and Gabor filter frequencies were taken from the images.

Corresponding patches. Patches are defined as corresponding when they have the same position, size, and Gabor filter frequency across images. If a "Fred" patch matches the corresponding patch in another image, this is evidence for the "Fredness" of the new image. We can then use some measure of how many Fred patches match, plus a threshold, to decide whether this face is Fred.

Mutual information. How useful the patches were for face identification was measured by mutual information: I(C;F) = H(C) - H(C|F), where C and F are binary variables standing for class and feature: C = 1 when the image is of the individual, and F = 1 when the patch is present in the image.

Implementation. For patch i in image j: C = 1 for the 6 images belonging to the same individual as image j, and C = 0 for the other 30; F = 1 for images in which patch i is present, and F = 0 for the rest. The best threshold for the presence of a patch was found by brute-force search, from -0.9 to 0.9 in steps of 0.1. Mutual information was calculated for every patch in each image; this measures how well each patch predicts the identity of the image. An average was then taken across corresponding patches to measure how well a patch of a given center, size, and frequency predicts face identity.
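A sketch of the MI computation and the brute-force threshold search described above (a plug-in entropy estimate from counts; function names are mine):

```python
import numpy as np

def mutual_information(c, f):
    """I(C;F) = H(C) - H(C|F) for two binary integer arrays,
    one entry per image."""
    def H(p):
        p = p[p > 0]
        return -(p * np.log2(p)).sum()
    h_c = H(np.bincount(c, minlength=2) / len(c))
    h_c_given_f = 0.0
    for v in (0, 1):
        mask = (f == v)
        if mask.any():
            pcf = np.bincount(c[mask], minlength=2) / mask.sum()
            h_c_given_f += mask.mean() * H(pcf)   # weight by P(F=v)
    return h_c - h_c_given_f

def best_threshold_mi(similarities, c):
    """Brute-force the presence threshold, -0.9..0.9 in steps of 0.1
    (as on the slide), keeping the MI-maximizing one."""
    best = (None, -1.0)
    for t in np.arange(-0.9, 1.0, 0.1):
        f = (similarities > t).astype(int)
        mi = mutual_information(c, f)
        if mi > best[1]:
            best = (t, mi)
    return best
```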

Results: best patches. The 6 patches with the highest mutual information. Frequencies 1 to 5 run from the highest Gabor filter frequency to the lowest. The best patches are similar to one another because we do not eliminate redundancy between patches (they are not independent).

Conclusions so far… Against intuition, local features such as eyes and mouths are, by themselves, not very informative for face identity. Local features need to be processed within medium-sized face areas for identification, where they sit in a particular configuration with other features. This may explain why holistic processing has developed for face processing: simply because it is good, or even necessary, for identification.

Integration across saccades. Now, given these patches sampled from an image, what do we do with them? Joyca Lacroix's (2004) Natural Input Memory (NIM) model of recognition memory: at study, sample the image at random points and store the patches; at test, sample the new image at random points and count how many stored patches fall inside a ball of radius R around each new patch. The average of these counts is the recognition score. This is a kernel density estimation model, like the GCM, but the exemplars are patches, not whole images. I.e., the NIM answer to the integration problem is: don't integrate! NIM is a natural partner to our eye movement modeling.
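A minimal sketch of NIM's study and test phases as just described (the Euclidean patch space and uniform kernel are assumptions; R is a free parameter):

```python
import numpy as np

def nim_study(patches):
    """Study phase: simply store the sampled patch vectors."""
    return np.asarray(patches)                    # memory: n_stored x d

def nim_recognition_score(memory, test_patches, radius):
    """Test phase: for each test patch, count stored patches within a
    ball of radius R, then average the counts -- a fixed-bandwidth
    kernel density estimate with a uniform kernel."""
    scores = []
    for p in np.asarray(test_patches):
        d = np.linalg.norm(memory - p, axis=1)
        scores.append((d < radius).sum())
    return float(np.mean(scores))
```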

Implications of the NIM model. What would this mean for expertise? Lots of experience -> lots of fragments -> better discrimination. Familiarity also means lots of fragments, under many lighting conditions, all associated with one name. Augmentation with an interest operator (e.g., looking at high-variance points on the face; see the previous slides and Yamada & Cottrell, 1994) could easily lead to parts-based representations!

Wrap up. We are able to explain a variety of results in face processing: we have a mechanistic way of talking about holistic processing; we show how a specialized area might arise for faces, and why low spatial frequencies (LSF) appear to be important in face processing (specialization model: LSF -> better learning and generalization); and we show why a face area would be recruited to be a Greeble area -- expert-level (fine discrimination) processing leads to highly differentiated features that are useful for other discrimination tasks. And… we have plans to go beyond simple passive recognition models…

END

