
1 Decoding Human Face Processing Ankit Awasthi Prof. Harish Karnick

2 Motivation One of the most important goals of computer vision research is to come up with an algorithm that can process face images and classify them into different categories (based on gender, emotion, identity, etc.). Humans are extremely good at these tasks. In order to match human performance, and eventually beat it, it is imperative that we understand how humans do it.

3 Motivation Moreover, similar cognitive processes might be involved in processing other kinds of visual data, or even data from other modalities. Discovering the computational basis of face processing might therefore point to generic cognitive structures.

4 Where does our work fit in? A large number of neurological and psychological experimental findings. Implications for computer vision algorithms. Closing the loop.

5 Neural Networks (~1985) [Diagram: input vector -> hidden layers -> outputs.] Compare the outputs with the correct answer to get an error signal; back-propagate the error signal to get derivatives for learning.
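To make the error/derivative loop above concrete, here is a minimal numpy sketch of one back-propagation step for a single-hidden-layer network with sigmoid units and a squared-error loss (the layer sizes, loss, and learning rate are illustrative assumptions, not the configuration used in this work):

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def backprop_step(x, y, W1, W2, lr=0.1):
        # Forward pass: input vector -> hidden layer -> outputs.
        h = sigmoid(x @ W1)
        o = sigmoid(h @ W2)
        # Compare outputs with the correct answer to get the error signal.
        err = o - y
        # Back-propagate the error signal to get derivatives for learning.
        delta_out = err * o * (1 - o)
        grad_W2 = np.outer(h, delta_out)
        grad_W1 = np.outer(x, (W2 @ delta_out) * h * (1 - h))
        return W1 - lr * grad_W1, W2 - lr * grad_W2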

6 Why Deep Learning? Brains have a deep architecture. Humans organize their ideas hierarchically, through composition of simpler ideas. Insufficiently deep architectures can be exponentially inefficient. Deep architectures facilitate feature and sub-feature sharing.

7 Restricted Boltzmann Machines (RBM) We restrict the connectivity to make learning easier. – Only one layer of hidden units. – No connections between hidden units. The energy of a joint configuration of visible units v and hidden units h is defined as E(v,h) = -\sum_i a_i v_i - \sum_j b_j h_j - \sum_{i,j} v_i w_{ij} h_j (for binary visible units) and E(v,h) = \sum_i \frac{(v_i - a_i)^2}{2\sigma_i^2} - \sum_j b_j h_j - \sum_{i,j} \frac{v_i}{\sigma_i} w_{ij} h_j (for real visible units).
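RBMs with this energy are typically trained with contrastive divergence; the following numpy sketch shows one CD-1 update for binary units (batch shapes, initialization, and the learning rate are illustrative assumptions, not the exact settings used in these experiments):

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def cd1_update(v0, W, a, b, lr=0.01):
        # One contrastive-divergence (CD-1) step for a binary RBM.
        # v0: batch of visible vectors, W: weights, a/b: visible/hidden biases.
        h0_prob = sigmoid(v0 @ W + b)
        h0 = (np.random.rand(*h0_prob.shape) < h0_prob).astype(float)
        v1_prob = sigmoid(h0 @ W.T + a)            # reconstruction of the visible layer
        h1_prob = sigmoid(v1_prob @ W + b)
        # Approximate gradient: positive phase minus negative phase.
        dW = (v0.T @ h0_prob - v1_prob.T @ h1_prob) / len(v0)
        da = (v0 - v1_prob).mean(axis=0)
        db = (h0_prob - h1_prob).mean(axis=0)
        return W + lr * dW, a + lr * da, b + lr * db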

8 Training a deep network
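The original slide is a figure; for a deep belief net the standard recipe is the greedy, layer-wise procedure of Hinton et al. (2006, cited in the references): train an RBM on the data, use its hidden activations as data for the next RBM, and repeat. A schematic sketch, reusing the cd1_update function above (the epoch count and weight initialization are placeholders):

    import numpy as np

    def train_deep_network(data, layer_sizes, epochs=10):
        # Greedy layer-wise pre-training of a stack of RBMs.
        stack, layer_input = [], data
        for n_hidden in layer_sizes:
            n_visible = layer_input.shape[1]
            W = 0.01 * np.random.randn(n_visible, n_hidden)
            a, b = np.zeros(n_visible), np.zeros(n_hidden)
            for _ in range(epochs):
                W, a, b = cd1_update(layer_input, W, a, b)
            stack.append((W, a, b))
            # Hidden probabilities of this layer become the data for the next layer.
            layer_input = 1.0 / (1.0 + np.exp(-(layer_input @ W + b)))
        return stack   # the stack can then be unrolled into an autoencoder and fine-tuned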

9 Sparse DBNs (Lee et al., 2007) In order to have a sparse hidden layer, the average activation of each hidden unit over the training set is constrained to be close to a small target value p. The optimization problem in the learning algorithm then looks like: minimize the usual RBM negative log-likelihood -\sum_l \log \sum_h P(v^{(l)}, h^{(l)}) plus a sparsity penalty \lambda \sum_j \left( p - \frac{1}{m} \sum_l \mathbb{E}[h_j^{(l)} \mid v^{(l)}] \right)^2 on each hidden unit's average activation.
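One common way to implement this constraint (roughly what Lee et al. describe, and what Hinton's practical guide calls a sparsity target) is an extra gradient term that nudges each hidden unit's average activation toward the target; a sketch, assuming the same variable names as the CD-1 sketch above and a hypothetical target value:

    def sparsity_update(v0, W, b, target=0.02, sparsity_lr=0.1):
        # Penalize deviation of each hidden unit's mean activation from the target.
        h_prob = 1.0 / (1.0 + np.exp(-(v0 @ W + b)))
        mean_activation = h_prob.mean(axis=0)      # average over the training batch
        # Move the hidden biases so that the mean activation drifts toward the target.
        return b + sparsity_lr * (target - mean_activation)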

10 Oriented edge detectors using DBNs

11 Important observations about DBNs We found in our experiments that – Fine tuning was important only for the construction of the autoencoder – The final softmax layer can be learned on top of the learned representation with only a marginal loss in accuracy. Fine tuning the autoencoder is important.
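A minimal sketch of the second observation: a softmax (multinomial logistic regression) layer trained on fixed, pre-trained features, with no fine tuning of the layers below (the feature matrix is assumed to come from the pre-trained stack; the learning rate and epoch count are placeholders):

    import numpy as np

    def train_softmax_on_features(features, labels, n_classes, lr=0.1, epochs=100):
        # Learn only a softmax layer on top of fixed, pre-trained features.
        n, d = features.shape
        W = np.zeros((d, n_classes))
        onehot = np.eye(n_classes)[labels]
        for _ in range(epochs):
            logits = features @ W
            logits -= logits.max(axis=1, keepdims=True)      # numerical stability
            probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
            W += lr * features.T @ (onehot - probs) / n      # gradient step on the log-likelihood
        return W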

12 Neural Underpinnings (Sinha et al., 2006) The human visual system appears to devote specialized resources to face perception. The latency of responses to faces in infero-temporal cortex is about 120 ms, suggesting a largely feed-forward computation. Facial identity and emotion might be processed separately – one of the reasons we restricted ourselves to emotion and gender classification.

13 Experiments and Dataset Gender and emotion recognition (happy, neutral) Training images – 300 images of size 50x50 Test images – 98 images of size 50x50

14 Results on Normal Images The same network architecture was used for all experiments (3000->1000->500->200->100) Gender Recognition – 94% Emotion Recognition – 93%
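Tying the sketches above together, the pipeline for this architecture might look roughly as follows (train_images and train_labels are placeholders; the hidden-layer sizes are taken from the slide):

    def extract_features(x, stack):
        # Forward pass through the pre-trained stack (sigmoid units).
        for W, a, b in stack:
            x = 1.0 / (1.0 + np.exp(-(x @ W + b)))
        return x

    stack = train_deep_network(train_images, layer_sizes=[1000, 500, 200, 100])
    top_features = extract_features(train_images, stack)
    softmax_W = train_softmax_on_features(top_features, train_labels, n_classes=2)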

15 Low vs High Spatial Frequency A number of contradictory results. General consensus – Low spatial frequencies are more important than higher spatial frequencies – Hints at the importance of configural information. High-frequency information by itself does not lead to good performance – how do we reconcile this with the everyday recognizability of line drawings? The spatial frequency employed for emotion classification is higher than that employed for gender classification (Deruelle and Fagot, 2004).

16 Experiments We cut off all spatial frequencies above 8 cycles per face (a sketch of this filtering follows below). Two cases each in gender and emotion recognition: 1. A model trained on ‘normal’ images is tested on low spatial frequency images 2. A model trained on low spatial frequency images is tested on low spatial frequency images
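A sketch of the low-pass filtering step, assuming an ideal (sharp) cut-off in the Fourier domain; the slide does not say whether a sharp or a Gaussian filter was actually used:

    import numpy as np

    def low_pass(face, cutoff_cycles=8):
        # Keep only spatial frequencies up to cutoff_cycles per face image.
        f = np.fft.fftshift(np.fft.fft2(face))
        rows, cols = face.shape
        y, x = np.ogrid[:rows, :cols]
        # Radial frequency (in cycles per image) of each Fourier coefficient.
        radius = np.sqrt((y - rows // 2) ** 2 + (x - cols // 2) ** 2)
        f[radius > cutoff_cycles] = 0               # discard the high spatial frequencies
        return np.real(np.fft.ifft2(np.fft.ifftshift(f)))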

17 Results Gender Recognition: 1. Model trained on ‘normal’ images ~ 89% 2. Model trained on LSF images ~ 91% Emotion Recognition: 1. Model trained on ‘normal’ images ~ 87% 2. Model trained on LSF images ~ 90.5%

18 Discussion The decrease in accuracy is small considering the significant reduction in the amount of information. This implies that low spatial frequency information can be used to classify a majority of images. Tests with different spatial frequency cut-offs need to be done to reach a conclusive answer. The importance of HSF is not apparent here because of the simplicity of the task. In some other experiments where we looked at only HSF images, the results weren’t good!

19 Component and Configural Information Facial features are processed holistically in recognition (Sinha et al., 2006) and in emotion recognition (Durand et al., 2007). Configural information affects how individual features are processed. On the other hand, there is evidence that we process face images by matching parts – Thatcher illusion.

20 Thatcher Illusion

21 Experiments Two kinds of experiments: 1. Models trained on ‘normal’ images are tested on the new (manipulated) images 2. The same set of training and test images

22 Results (Gender Classification) Models trained on ‘normal’ images: ~91%, ~80%, ~70%, ~random!! (accuracies for the image-manipulation conditions shown on the slide)

23 Results (Gender Classification) Same training and test images: ~93%, ~85%, ~79% (for the image-manipulation conditions shown on the slide)

24 Results (Emotion Classification) Models trained on ‘normal’ images: ~87%, ~81%, ~87%, ~random!! (for the image-manipulation conditions shown on the slide)

25 Results (Emotion Classification) Same training and test images: ~92%, ~84%, ~82% (for the image-manipulation conditions shown on the slide)

26 Agreement with Human Performance Preliminary results show that humans are – Perfect on the normal images we are using – Error prone when parts are removed (3 out of 20 images on average) Accuracy depends a lot on the exposure time. Properly timed experiments are expected to yield results much closer to the algorithm's.

27 Discussion The importance of key facial features (eyes, mouth) is evident. Eyes/eyebrows are important for gender recognition. When trained on ‘normal’ images the algorithm learns features corresponding to these important parts. In the absence of these parts the algorithm learns to extract other features to increase accuracy.

28 Inversion Effect One of the first findings that hinted at a dedicated face processing pathway. Another indicator of configural processing of face images. Inverted face images take significantly longer to process.

29 Experiments and Results Models trained on ‘normal’ images – the results are “random”!! Training and testing on inverted images is the same as doing it for ‘normal’ images. These results show that the face image processing is not purely part based.

30 Thatcher Illusion

31 Experiment and Results Models trained on ‘normal’ images – Random for both tasks!! Same training and test images – Gender: 92% – Emotion: 91%

32 High Level Features Only a few connections to the previous layer have weights that are very high or very low. To visualize a high-level unit, some of its largest-weighted connections are used to form a linear combination of the previous layer's features. This overlooks the non-linearity in the network from one layer to the next.
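A sketch of that linear-combination visualization (my reading of the slide; stack is the list of (W, a, b) layers from the pre-training sketch above): a second-layer unit is rendered as a weighted sum of the first-layer filters to which it is most strongly connected, which indeed ignores the sigmoid non-linearity between the layers.

    import numpy as np

    def visualize_layer2_unit(stack, unit, top_k=10):
        # Approximate a layer-2 unit's preferred input as a linear combination
        # of the layer-1 filters with the largest-magnitude connection weights.
        W1 = stack[0][0]                  # (input_dim, n_hidden1) first-layer filters
        W2 = stack[1][0]                  # (n_hidden1, n_hidden2)
        weights = W2[:, unit]
        strongest = np.argsort(np.abs(weights))[-top_k:]
        return W1[:, strongest] @ weights[strongest]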

33 Natural Extensions A more exhaustive set of experiments needs to be done to verify our preliminary observations. It would be interesting to compare other models with deep networks. Some of the problems or inconsistencies are due to the lack of translation-invariant features – the best solution is to use a convolutional model: natural regularizer, translational invariance, biologically plausible.

34 Conclusion We have done preliminary investigations with respect to various phenomena. The observed results certainly hint at the cognitive relevance of the model.

35 References Geoffrey E. Hinton, Yee-Whye Teh and Simon Osindero, A Fast Learning Algorithm for Deep Belief Nets, Neural Computation, 18:1527-1554, 2006. Geoffrey E. Hinton, A Practical Guide to Training Restricted Boltzmann Machines, Technical Report, 2010. Dumitru Erhan, Yoshua Bengio, Aaron Courville, and Pascal Vincent, Visualizing Higher-Layer Features of a Deep Network, Technical Report 1341, 2010. Honglak Lee, Roger Grosse, Rajesh Ranganath, Andrew Y. Ng, Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations, ICML 2009. Geoffrey E. Hinton, Learning multiple layers of representation, Trends in Cognitive Sciences, Vol. 11 No. 10, 2007. Honglak Lee, Chaitanya Ekanadham, Andrew Y. Ng, Sparse deep belief net model for visual area V2, NIPS 2007.

36 References Bruno A. Olshausen and David J. Field, Sparse Coding with an Overcomplete Basis Set: A Strategy Employed by V1?, Vision Research, 37:3311-3325, 1997. Karine Durand, Mathieu Gallay, Alix Seigneuric, Fabrice Robichon, Jean-Yves Baudouin, The development of facial emotion recognition: The role of configural information, Journal of Experimental Child Psychology, 2007. Pawan Sinha, Benjamin Balas, Yuri Ostrovsky, Richard Russell, Face Recognition by Humans: Nineteen Results All Computer Vision Researchers Should Know About, Proceedings of the IEEE, 2006. Christian Wallraven, Adrian Schwaninger, Heinrich H. Bülthoff, Learning from humans: Computational modeling of face recognition, Network: Computation in Neural Systems. Christine Deruelle and Joël Fagot, Categorizing facial identities, emotions, and genders: Attention to high and low spatial frequencies by children and adults.

