Do Expression and Identity Need Separate Representations?


1 Do Expression and Identity Need Separate Representations?
Kristin Branson, Gary Cottrell, Andrew Calder
24th Annual Meeting of the Cognitive Science Society, Friday, August 9, 2002

2 Introduction
Facial identity recognition uses a holistic, or configural, representation [Tanaka & Farah, 1993; Young et al., 1987]. A configural model incorporates information about the relationships between (first-order) features. Two questions follow:
- Does facial expression recognition use a configural representation?
- Are the representations for expression and identity separate?
[Figure: example first-order features and example configural information]

3 Calder’s Behavioral Experiments
Andrew Calder performed behavioral experiments to answer these questions, with the conclusions:
- Expression recognition uses a configural model.
- The representations for expression and identity are separate [Calder et al., 2000].
[Diagram: separate identity representation and expression task, with outputs “Cathy” and “Happy”]

4 Purpose
Our goal is to model Calder’s experiments using the Dailey et al. model of expression recognition, with a single representation for expression and identity. If we obtain similar results, then Calder’s results can be obtained using only one representation for identity and expression.
[Diagram: face representation → neural network, with outputs “Cathy” and “Happy”]

5 Stimuli: Composite Images
Two types of stimuli: (aligned) composite and (misaligned) noncomposite face images. (Aligned) composite images are created by aligning, e.g., the top half of a fearful face with the bottom half of a disgusted face.
[Figure: an aligned composite built from actor “nr” posing “fearful” (top) and “nr” posing “disgusted” (bottom)]

6 Stimuli: Noncomposite Images
(Misaligned) noncomposite images are created by misaligning, e.g., the top half of a fearful face with the bottom half of a disgusted face.
[Figure: a misaligned noncomposite built from actor “nr” posing “fearful” (top) and “nr” posing “disgusted” (bottom)]
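In code, this stimulus construction is a pair of array operations. A minimal NumPy sketch, not the authors’ stimulus code: the image dimensions, the mid-height split, and the 20-pixel misalignment are illustrative assumptions.

```python
import numpy as np

def make_composite(top_face: np.ndarray, bottom_face: np.ndarray) -> np.ndarray:
    """Aligned composite: stack the top half of one expression image on the
    bottom half of another, so the halves read as a single face."""
    assert top_face.shape == bottom_face.shape
    mid = top_face.shape[0] // 2
    return np.vstack([top_face[:mid], bottom_face[mid:]])

def make_noncomposite(top_face: np.ndarray, bottom_face: np.ndarray,
                      shift: int = 20) -> np.ndarray:
    """Misaligned noncomposite: the same two halves, with the bottom half
    offset horizontally so they no longer form one face."""
    mid = top_face.shape[0] // 2
    h, w = top_face.shape
    out = np.zeros((h, w + shift), dtype=top_face.dtype)
    out[:mid, :w] = top_face[:mid]           # top half flush left
    out[mid:, shift:] = bottom_face[mid:]    # bottom half shifted right
    return out

# e.g., top half of "nr" posing fearful over the bottom half of "nr" posing disgusted
fearful = np.random.rand(240, 292)      # stand-ins for real grayscale images
disgusted = np.random.rand(240, 292)
aligned = make_composite(fearful, disgusted)
misaligned = make_noncomposite(fearful, disgusted)
```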

7 Evidence for a Configural Model
Incorrect configural information from the other half of the face is present in composite but not in noncomposite images. If it is harder to identify the expression in one half of a composite than of a noncomposite, then incorrect configural information disrupts expression recognition, demonstrating that a configural model is used. This method parallels the experiments performed to demonstrate that a configural model is used for identity recognition [Young et al., 1987].
[Figure: incorrect configural information present in a composite but not in a noncomposite image]

8 Supporting Independence
Create three different types of composite images:
- Same Identity, Different Expression (SID,DE): contains incorrect expression configural information.
- Different Identity, Same Expression (DID,SE): contains incorrect identity configural information.
- Different Identity, Different Expression (DID,DE): contains both incorrect identity and incorrect expression configural information.

9 Supporting Independence
- The SID,DE and DID,DE composites contain incorrect expression configural information, so expression recognition is disrupted in both. If expression recognition is independent of identity recognition, incorrect identity configural information should not affect expression recognition. Therefore, it should be no harder to recognize expression in the DID,DE than in the SID,DE composites.
- The DID,SE and DID,DE composites contain incorrect identity configural information, so identity recognition is disrupted in both. If identity recognition is independent of expression recognition, incorrect expression configural information should not affect identity recognition. Therefore, it should be no harder to recognize identity in the DID,DE than in the DID,SE composites.

10 Stimuli Preprocessing
Database:
- Images of 10 actors posing 6 expressions from the Ekman & Friesen Pictures of Facial Affect database.
- Six “universal” expressions: Happy, Sad, Angry, Fearful, Surprised, Disgusted.
- Actors are trained to move some of the 44 muscle groups identified by Ekman as displaying the posed expression.
- All expressions were recognizable by at least 70% of those tested on them.
Gabor wavelet filtering:
- The pixel image is convolved with 2D Gabor wavelet filters at 8 orientations and 5 scales [Daugman, 1985], yielding a 40,600-element vector.
- This filtering is similar to that in the striate cortex of cats; it responds strongly to edges and is insensitive to small translations.
Attenuation:
- If the current experiment requires the network to identify the expression in only one half of the face, attention to the other half must be attenuated. This is done by multiplying the elements of the Gabor pattern in that half by a fraction.
Pipeline: input pixel image → Gabor filtering → Gabor pattern → attenuate → attenuated pattern.
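Below is a minimal NumPy/SciPy sketch of these steps. Only the 5 scales, 8 orientations, and attenuation-by-a-fraction come from the slide; the kernel form, scale spacing, sampling stride, and function names are illustrative assumptions (the exact parameters yielding the 40,600-element vector are not specified here).

```python
import numpy as np
from scipy.signal import fftconvolve

def gabor_kernel(scale: float, theta: float, size: int = 31) -> np.ndarray:
    """2D Gabor wavelet: a complex sinusoid under a Gaussian envelope."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    rotated = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x ** 2 + y ** 2) / (2.0 * scale ** 2))
    carrier = np.exp(1j * 2.0 * np.pi * rotated / scale)
    return envelope * carrier

def gabor_pattern(image: np.ndarray, n_scales: int = 5, n_orients: int = 8,
                  stride: int = 8) -> np.ndarray:
    """Convolve the image with every scale/orientation filter, keep response
    magnitudes on a subsampled grid, and flatten into one long vector."""
    channels = []
    for s in range(n_scales):
        scale = 4.0 * 2.0 ** (s / 2.0)            # illustrative scale spacing
        for o in range(n_orients):
            kernel = gabor_kernel(scale, o * np.pi / n_orients)
            response = fftconvolve(image, kernel, mode="same")
            channels.append(np.abs(response[::stride, ::stride]).ravel())
    return np.concatenate(channels)

def attenuate_half(pattern: np.ndarray, grid_shape: tuple,
                   n_channels: int = 40, half: str = "bottom",
                   factor: float = 0.1) -> np.ndarray:
    """Model attenuated attention: multiply the Gabor responses whose grid
    positions fall in one half of the face by a small fraction."""
    grids = pattern.reshape(n_channels, *grid_shape).copy()
    mid = grid_shape[0] // 2
    if half == "bottom":
        grids[:, mid:, :] *= factor
    else:
        grids[:, :mid, :] *= factor
    return grids.ravel()

image = np.random.rand(240, 292)                  # stand-in for a face image
pattern = gabor_pattern(image)
grid = (len(range(0, 240, 8)), len(range(0, 292, 8)))
attenuated = attenuate_half(pattern, grid, half="top", factor=0.1)
```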

11 Model Description
Pipeline: pixel image → preprocessing → PCA → single-layer, feed-forward neural net → network output.
The network has six outputs, one per expression: Happy, Sad, Angry, Fearful, Surprised, Disgusted.
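A hedged sketch of this pipeline. The slide specifies only a PCA projection feeding a single-layer, feed-forward net with six expression outputs; the number of components, learning rate, and softmax output used below are illustrative assumptions.

```python
import numpy as np

EXPRESSIONS = ["Happy", "Sad", "Angry", "Fearful", "Surprised", "Disgusted"]

class PCASoftmaxNet:
    """Gabor pattern -> PCA projection -> single-layer net -> 6 outputs."""

    def __init__(self, n_components: int = 50, lr: float = 0.1):
        self.n_components, self.lr = n_components, lr

    def fit_pca(self, X: np.ndarray) -> np.ndarray:
        """Fit principal components on the training patterns, return projections."""
        self.mean = X.mean(axis=0)
        _, _, vt = np.linalg.svd(X - self.mean, full_matrices=False)
        self.components = vt[: self.n_components]
        return self.project(X)

    def project(self, X: np.ndarray) -> np.ndarray:
        return (X - self.mean) @ self.components.T

    def train_step(self, Z: np.ndarray, y: np.ndarray) -> None:
        """One gradient step of cross-entropy on the single layer."""
        if not hasattr(self, "W"):
            self.W = np.zeros((Z.shape[1], len(EXPRESSIONS)))
            self.b = np.zeros(len(EXPRESSIONS))
        P = self.predict(Z)
        onehot = np.eye(len(EXPRESSIONS))[y]
        self.W -= self.lr * Z.T @ (P - onehot) / len(Z)
        self.b -= self.lr * (P - onehot).mean(axis=0)

    def predict(self, Z: np.ndarray) -> np.ndarray:
        logits = Z @ self.W + self.b
        e = np.exp(logits - logits.max(axis=1, keepdims=True))
        return e / e.sum(axis=1, keepdims=True)    # softmax over expressions
```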

12 Model Training
Training stimuli in all experiments:
- The model is trained on nine actors and tested on the tenth; this is repeated for all ten actors.
- The network is trained on both whole-face and half-face stimuli.
- Training is stopped when network performance on the training set correlates most closely with the human confusion matrix reported by Ekman.
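A sketch of this procedure, reusing the PCASoftmaxNet class from the previous block. The leave-one-actor-out loop and the confusion-matrix stopping rule come from the slide; the epoch budget and the use of Pearson correlation over flattened matrices are assumptions.

```python
import numpy as np

def confusion_matrix(probs: np.ndarray, labels: np.ndarray, k: int = 6) -> np.ndarray:
    """Average network output for each true expression class."""
    return np.stack([probs[labels == c].mean(axis=0) for c in range(k)])

def fit_with_human_stopping(net, Z, y, human_confusion, max_epochs=500):
    """Train, keeping the weights from the epoch whose confusion matrix
    correlates best with the human confusion matrix (Ekman)."""
    best_r, best_W, best_b = -np.inf, None, None
    for _ in range(max_epochs):
        net.train_step(Z, y)
        cm = confusion_matrix(net.predict(Z), y)
        r = np.corrcoef(cm.ravel(), human_confusion.ravel())[0, 1]
        if best_W is None or r > best_r:
            best_r, best_W, best_b = r, net.W.copy(), net.b.copy()
    net.W, net.b = best_W, best_b
    return net

def leave_one_actor_out(patterns, labels, actor_ids, human_confusion):
    """Train on nine actors, test on the tenth, repeated for all ten."""
    accuracies = []
    for held_out in np.unique(actor_ids):
        train, test = actor_ids != held_out, actor_ids == held_out
        net = PCASoftmaxNet()
        Z_train = net.fit_pca(patterns[train])
        net = fit_with_human_stopping(net, Z_train, labels[train], human_confusion)
        preds = net.predict(net.project(patterns[test])).argmax(axis=1)
        accuracies.append((preds == labels[test]).mean())
    return np.mean(accuracies)

# Synthetic demo: 10 actors x 6 expressions, random 100-dim "Gabor patterns".
rng = np.random.default_rng(0)
patterns = rng.normal(size=(60, 100))
labels = np.tile(np.arange(6), 10)
actor_ids = np.repeat(np.arange(10), 6)
human_cm = np.eye(6) * 0.8 + 0.04   # stand-in for Ekman's confusion matrix
print(leave_one_actor_out(patterns, labels, actor_ids, human_cm))
```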

13 Experiment 1: Half-Faces
Our first experiment explored recognition of half-face images: for one half of a composite image to interfere with the other, an expression should be recognizable in both halves alone. How well does our model correspond to Calder’s results with human subjects? The half-face test stimuli are created by zeroing the Gabor outputs in one half (top/bottom) of the image, in the “attenuate” step of the preprocessing, as in the sketch below.
[Figure: half-face stimuli for Happy, Sad, Fearful, Angry, Surprised, and Disgusted, attenuated and centered on the top or bottom half]
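Concretely, the half-face stimuli are the attenuation step from the preprocessing sketch with the fraction set to zero (attenuate_half, pattern, and grid are the assumed names from that sketch):

```python
# Zero out the Gabor responses in the unused half of the face,
# leaving only top-half (or bottom-half) information.
top_only = attenuate_half(pattern, grid, half="bottom", factor=0.0)
bottom_only = attenuate_half(pattern, grid, half="top", factor=0.0)
```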

14 Experiment 1 Results
Fraction of stimuli incorrectly identified (ten trials of ten networks, standard deviation in parentheses):

Expression  Human Top   Network Top  Human Bottom  Network Bottom
Happy       0.20 (.09)  0.40         0.01 (.01)    0.00
Sadness     0.19 (.05)  0.28         0.34 (.08)    —
Fear        0.33 (.08)  —            0.56 (.09)    0.70
Anger       0.28 (.06)  0.29         0.49 (.09)    0.65
Surprise    0.06 (.21)  —            0.33 (.07)    0.21
Disgust     0.62 (.10)  0.20         0.04 (.14)    —

15 Experiment 2: Does Expression Use a Configural Model?
In this experiment, we test the networks on stimuli created from composite and noncomposite images. Composite and noncomposite images are created using the top halves of images of top-biased expressions and the bottom halves of images of bottom-biased expressions. To focus the network’s attention on one half of the face, the other half is attenuated.
[Diagram: composite image → convolve → Gabor response vector → attenuate → composite stimulus; shift → noncomposite stimulus]

16 Experiment 2 Results
[Charts: human reaction time (ms) and network fraction incorrectly identified, composite vs. noncomposite; bars indicate one standard deviation]
The network identified the expression in one half of the noncomposite stimuli more accurately than in the composite stimuli. When identifying the expression in one half of a stimulus, incorrect configural information is present in the other half of a composite stimulus but not of a noncomposite stimulus. Since this incorrect configural information disrupted the network’s expression recognition, the networks use configural information, i.e., a configural model.

17 Experiment 3: Can Incorrect Configural Information Disrupt Identity & Expression Recognition Independently?
In this experiment, we test the networks on stimuli created from SID,DE, DID,DE, and DID,SE composites. Test stimuli are created as in Experiment 2. The network outputs both identity and expression classifications, and results for all three types of stimuli are compared.
[Figure: example Same Identity, Different Expression; Different Identity, Same Expression; and Different Identity, Different Expression composites]

18 Experiment 3 Results
[Charts: human reaction time (ms) and network 1 − average correct output, for the expression and identity decisions on SID/DE, DID/DE, and DID/SE stimuli]
Approximation of reaction time: 1 − average correct output.
- It is no harder for humans or the model to recognize expression in DID,DE than in SID,DE composites.
- It is no harder for humans or the model to recognize identity in DID,DE than in DID,SE composites.
- Incorrect identity configural information does not disrupt expression recognition in humans or the model, and vice versa.
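The reaction-time proxy on this slide has a one-line implementation: average the network’s output at the correct unit and subtract from one, so lower confidence stands in for a slower response. A minimal sketch, with an illustrative function name:

```python
import numpy as np

def approx_reaction_time(outputs: np.ndarray, labels: np.ndarray) -> float:
    """Reaction-time proxy from the slide: 1 - average correct output."""
    correct_outputs = outputs[np.arange(len(labels)), labels]
    return 1.0 - correct_outputs.mean()
```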

19 Only One Representation?
Our model uses one representation, the PCA representation, for both identity and expression. Our results suggest that identity and expression are encoded by different, orthogonal principal components. It is probable that identity and expression representations evolve together to ensure orthogonality.
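One way to probe whether identity and expression load on different principal components is to compare, for each component, the variance between identity means against the variance between expression means. The diagnostic below is our own illustration, not an analysis from the presentation:

```python
import numpy as np

def between_group_variance(z: np.ndarray, groups: np.ndarray) -> float:
    """Variance of the group means along one principal-component coordinate."""
    return np.array([z[groups == g].mean() for g in np.unique(groups)]).var()

def component_roles(Z: np.ndarray, identities: np.ndarray,
                    expressions: np.ndarray):
    """For each component, how much variance lies between identities
    vs. between expressions."""
    id_var = np.array([between_group_variance(Z[:, j], identities)
                       for j in range(Z.shape[1])])
    ex_var = np.array([between_group_variance(Z[:, j], expressions)
                       for j in range(Z.shape[1])])
    # Components where id_var dominates would code identity; components
    # where ex_var dominates would code expression. Disjoint sets of
    # components would support the "orthogonal encodings" reading.
    return id_var, ex_var
```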

20 Conclusions
Our model and humans use similar information in expression and identity recognition:
- Experiment 1 showed that our model found the same expressions top- and bottom-biased as humans did.
- Experiment 2 showed that our model uses configural information for expression recognition.
- Experiment 3 showed that incorrect configural information can disrupt identity and expression recognition independently.
However, because our model uses only one representation for both identity and expression recognition, it is possible to obtain the same results as Calder’s experiments with a single shared representation.

