Latent Variable / Hierarchical Models in Computational Neural Science Ying Nian Wu UCLA Department of Statistics March 30, 2011.

Outline
- Latent variable models in statistics
- Primary visual cortex (V1)
- Modeling and learning in V1
- Layered hierarchical models
Joint work with Song-Chun Zhu and Zhangzhang Si

Latent variable models: hidden variables and observed variables. Learning: from examples. Inference: of the hidden variables given the observed data.

Latent variable models Mixture model Factor analysis

Latent variable models: hidden and observed variables. Learning from examples: maximum likelihood, via EM or gradient methods. Inference / explaining away: E-step / imputation.
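To make the learning/inference loop concrete, here is a minimal sketch (my own, not from the slides) of EM for a two-component Gaussian mixture, the simplest latent variable model listed above; the E-step imputes the hidden component labels and the M-step does maximum likelihood given them. All names and defaults are illustrative.

import numpy as np

def em_gaussian_mixture(y, n_iter=50):
    """y: 1D data array. Returns mixing weight, means, and standard deviations."""
    pi, mu, sigma = 0.5, np.array([y.min(), y.max()]), np.array([y.std(), y.std()])
    for _ in range(n_iter):
        # E-step: posterior probability that each point came from component 1
        d0 = np.exp(-0.5 * ((y - mu[0]) / sigma[0]) ** 2) / sigma[0]
        d1 = np.exp(-0.5 * ((y - mu[1]) / sigma[1]) ** 2) / sigma[1]
        r = pi * d1 / ((1 - pi) * d0 + pi * d1)
        # M-step: maximum likelihood given the imputed responsibilities
        pi = r.mean()
        mu = np.array([np.sum((1 - r) * y) / np.sum(1 - r),
                       np.sum(r * y) / np.sum(r)])
        sigma = np.array([np.sqrt(np.sum((1 - r) * (y - mu[0]) ** 2) / np.sum(1 - r)),
                          np.sqrt(np.sum(r * (y - mu[1]) ** 2) / np.sum(r))])
    return pi, mu, sigma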

Computational neural science. Z: internal representation by neurons (hidden). Y: sensory data from the outside environment (observed). Connection weights link the two. Hierarchical extension: modeling Z by another layer of hidden variables, explaining Y instead of Z. Inference / explaining away.

Visual cortex: layered hierarchical architecture (source: Scientific American, 1999). V1: primary visual cortex, with simple cells and complex cells; bottom-up / top-down connections.

Simple V1 cells (Daugman, 1985). Gabor wavelets: localized sine and cosine waves; translation, rotation, and dilation of the above function.
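As an illustration of what such a wavelet looks like in code, here is a minimal sketch (my own, not from the slides) that builds a Gabor filter as a plane wave under a Gaussian window; all parameter names and defaults are illustrative. Translating, rotating, and rescaling this filter generates a dictionary of V1-like elements.

import numpy as np

def gabor(size=33, wavelength=8.0, theta=0.0, sigma=4.0, phase=0.0):
    """Return a size x size Gabor filter: a plane wave at orientation theta,
    windowed by a Gaussian envelope of width sigma."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    # rotate coordinates so the wave propagates along orientation theta
    xr = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2))
    carrier = np.cos(2.0 * np.pi * xr / wavelength + phase)  # phase=pi/2 gives the sine pair
    g = envelope * carrier
    return g - g.mean()  # zero-mean, so it responds to edges/bars, not to overall brightness

cosine_gabor = gabor(phase=0.0)
sine_gabor = gabor(phase=np.pi / 2)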

image pixels V1 simple cells respond to edges

Complex V1 cells (Riesenhuber and Poggio, 1999). Image pixels → V1 simple cells → V1 complex cells, via local max (or local sum) pooling: larger receptive field, less sensitive to deformation.
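A minimal sketch (my own, following the local-max idea of Riesenhuber and Poggio, 1999) of complex-cell responses as a local maximum over nearby simple-cell responses; the window size is illustrative.

import numpy as np

def local_max_pool(simple_responses, half_window=3):
    """simple_responses: 2D map of simple-cell responses at one orientation.
    Returns a map of the same size where each location takes the max over a
    (2*half_window+1)^2 neighborhood: a larger, deformation-tolerant receptive field."""
    h, w = simple_responses.shape
    pooled = np.zeros_like(simple_responses)
    for i in range(h):
        for j in range(w):
            i0, i1 = max(0, i - half_window), min(h, i + half_window + 1)
            j0, j1 = max(0, j - half_window), min(w, j + half_window + 1)
            pooled[i, j] = simple_responses[i0:i1, j0:j1].max()
    return pooled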

Independent Component Analysis Bell and Sejnowski, 1996 Laplacian/Cauchy

Hyvarinen, 2000

Sparse coding (Olshausen and Field, 1996): Laplacian, Cauchy, or mixture-of-Gaussians sparsity priors on the coefficients.

Sparse coding / variable selection.
Inference: sparsification (non-linear); lasso / basis pursuit / matching pursuit; mode and uncertainty of p(C|I); explaining-away, lateral inhibition.
Learning: a dictionary of representational elements (regressors).
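As one concrete instance of the inference step, here is a minimal matching-pursuit sketch (my own simplification, not the authors' code) for I ≈ Σ_i c_i B_i; subtracting each selected element's contribution from the residual is the explaining-away / lateral-inhibition behavior mentioned above.

import numpy as np

def matching_pursuit(I, B, n_elements=20):
    """I: image as a flat vector; B: (dim, n_basis) dictionary with unit-norm columns.
    Returns selected indices and coefficients."""
    residual = I.astype(float).copy()
    selected, coeffs = [], []
    for _ in range(n_elements):
        responses = B.T @ residual             # correlate every basis element with the residual
        k = int(np.argmax(np.abs(responses)))  # pick the most responsive element
        c = responses[k]
        residual -= c * B[:, k]                # subtract its contribution: explaining-away
        selected.append(k)
        coeffs.append(c)
    return selected, coeffs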

Olshausen and Field, 1996

Restricted Boltzmann Machine (Hinton, Osindero and Teh, 2006). Binary hidden and visible units; P(V|H) and P(H|V) are factorized, so there is no explaining-away.
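To spell out what "factorized, no explaining-away" means, here is a minimal sketch (illustrative names and shapes, my own) of the two conditionals of a binary RBM: each unit in one layer is conditionally independent given the other layer.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def p_h_given_v(v, W, b_h):
    """v: (n_visible,) binary vector; W: (n_visible, n_hidden); b_h: hidden biases.
    Each hidden unit is conditionally independent given v."""
    return sigmoid(v @ W + b_h)

def p_v_given_h(h, W, b_v):
    """Symmetrically, visible units are conditionally independent given h."""
    return sigmoid(h @ W.T + b_v)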

Energy-based model (Teh, Welling, Osindero and Hinton, 2003): features, no explaining-away. Maximum entropy with marginals; exponential family with sufficient statistics (Zhu, Wu, and Mumford, 1997; Wu, Liu, and Zhu, 2000). Markov random field / Gibbs distribution.
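For reference, the maximum-entropy / exponential-family form being alluded to (reconstructed from the cited FRAME line of work, not copied from the slide; signs and notation vary across papers) is

\[
p(I;\lambda) \;=\; \frac{1}{Z(\lambda)}\,\exp\Big\{\sum_{k}\big\langle \lambda_k,\, H_k(I)\big\rangle\Big\}\, q(I),
\]

where the H_k(I) are the sufficient statistics (e.g., marginal histograms of filter responses) whose model expectations are constrained to match the observed marginals, q(I) is a reference distribution, and Z(λ) is the normalizing constant.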

Zhu, Wu, and Mumford, 1997 Wu, Liu, and Zhu, 2000

Visual cortex: layered hierarchical architecture (source: Scientific American, 1999); bottom-up / top-down. What is beyond V1? A hierarchical model?

Hierarchical ICA / energy-based model? Larger features; must introduce nonlinearities; purely bottom-up.

Hierarchical RBM (Hinton, Osindero and Teh, 2006). P(V, H) = P(H) P(V|H), with the prior P(H) in turn modeled by a higher-level RBM P(V', H) (figure: stack of layers I, H, V, V'). Discriminative correction by back-propagation; unfolding, untying, re-learning.

Hierarchical sparse coding. Attributed sparse coding elements: transformation group, topological neighborhood system. Layer above: further coding of the attributes of the selected sparse coding elements.

Active basis model (Wu, Si, Gong, Zhu, 10; Zhu, Guo, Wang, Xu, 05). n-stroke template, n = 40 to 60, box = 100x100.

Active basis model (Wu, Si, Gong, Zhu, 10; Zhu et al., 05; Yuille, Hallinan, Cohen, 92). n-stroke template, n = 40 to 60, box = 100x100.

Simplicity: the simplest AND-OR graph (Pearl, 84; Zhu, Mumford, 06): AND-composition and OR-perturbations (variations) of basis elements. The simplest shape model: average + residual. The simplest modification of the Olshausen-Field model: further sparse coding of the attributes of the sparse coding elements.

Bottom layer: sketch against texture. Only need to pool a marginal q(c) as the null hypothesis: natural images → the explicit q(I) of Zhu, Mumford, 97; this image → the explicit q(I) of Zhu, Wu, Mumford, 97. Maximum entropy (Della Pietra, Della Pietra, Lafferty, 97; Zhu, Wu, Mumford, 97; Jin, S. Geman, 06; Wu, Guo, Zhu, 08). Special case: density substitution (Friedman, 87; Jin, S. Geman, 06): p(C, U) = p(C) p(U|C) = p(C) q(U|C) = p(C) q(U, C) / q(C).
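One step worth making explicit (my own unpacking, assuming (C, U) is a one-to-one reparametrization of the image I so that the densities carry over): dividing the density-substitution identity by q(I) = q(C, U) shows that the texture part cancels and only the sketch coefficients contribute to the evidence,

\[
\frac{p(I)}{q(I)} \;=\; \frac{p(C)\,q(U,C)/q(C)}{q(U,C)} \;=\; \frac{p(C)}{q(C)},
\qquad
\log\frac{p(I)}{q(I)} \;=\; \log p(C) - \log q(C).
\]

This is the log-likelihood ratio score used for detection and classification later in the talk.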

Shared sketch algorithm: maximum likelihood learning. Prototype: shared matching pursuit (closed-form computation). Finding n strokes to sketch M images simultaneously (n = 60, M = 9).
Step 1: two max operations to explain the images by maximum likelihood; no early decision on edge detection.
Step 2: arg-max for inferring the hidden variables.
Step 3: arg-max explains away and thus inhibits (matching pursuit: Mallat, Zhang, 93).
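A heavily simplified sketch of this idea (my own, with illustrative shapes and an inhibition radius that stands in for the real overlap test): at each step the element with the largest total response across the M images is selected, and nearby responses in every image are suppressed so later selections explain the residual.

import numpy as np

def shared_matching_pursuit(response_maps, n_strokes=60, inhibit=3):
    """response_maps: array (M, H, W, K) of filter responses for M images
    (K orientations). Returns the list of selected (orientation, row, col)."""
    maps = [r.copy() for r in response_maps]
    template = []
    for _ in range(n_strokes):
        # Step 1: sum responses over images, pick the element with the largest total score.
        total = sum(maps)                       # (H, W, K)
        i, j, k = np.unravel_index(np.argmax(total), total.shape)
        template.append((k, i, j))
        # Steps 2-3: each image would arg-max its own perturbed copy (omitted here),
        # and that copy inhibits nearby responses so they are not selected again.
        for m in range(len(maps)):
            maps[m][max(0, i - inhibit):i + inhibit + 1,
                    max(0, j - inhibit):j + inhibit + 1, :] = 0.0
    return template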

Cortex-like sum-max maps: maximum likelihood inference.
SUM1 layer: the simple V1 cells of Olshausen, Field, 96. MAX1 layer: the complex V1 cells of Riesenhuber, Poggio, 99.
Bottom-up sum-max scoring (no early edge decision); top-down arg-max sketching; scan over multiple resolutions.
1. Reinterpreting MAX1: the OR-node of an AND-OR graph; MAX stands for ARG-MAX in the max-product algorithm.
2. Sticking to the Olshausen-Field sparse top-down model: the AND-node of the AND-OR graph → active basis; SUM2 layer: "neurons" memorize shapes by sparse connections to the MAX1 layer → hierarchical, recursive AND-OR / SUM-MAX.
Architecture: more top-down than bottom-up. Neurons: more representational than operational (OR-neurons / AND-neurons).

Bottom-up scoring and top-down sketching (figure: SUM1, MAX1, SUM2 layers; bottom-up detection via SUM1 → MAX1 → SUM2, top-down sketching via arg-MAX1). Sparse selective connections as a result of learning; explaining-away in learning but not in inference.
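A minimal sketch (illustrative, not the authors' implementation) of the bottom-up SUM-MAX scoring pass for one template: SUM1 holds Gabor responses, MAX1 takes a local maximum around each selected element to tolerate deformation, and SUM2 adds up that evidence.

import numpy as np

def sum_max_score(sum1, template, shift=3):
    """sum1: (H, W, K) map of simple-cell responses (SUM1 layer).
    template: list of (orientation k, row i, col j) basis elements learned earlier.
    Returns the SUM2 score: a sum of local maxima (MAX1) around each element."""
    H, W, K = sum1.shape
    score = 0.0
    for k, i, j in template:
        i0, i1 = max(0, i - shift), min(H, i + shift + 1)
        j0, j1 = max(0, j - shift), min(W, j + shift + 1)
        score += sum1[i0:i1, j0:j1, k].max()  # MAX1: tolerate local deformation
    return score                              # SUM2: evidence for the whole template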

Scan over multiple resolutions and orientations (rotating template)

Classification based on the log-likelihood ratio score (Freund, Schapire, 95; Viola, Jones, 04).

Adjusting the Active Basis Model by L2-Regularized Logistic Regression, by Ruixun Zhang.
Exponential family model with q(I) negatives → logistic regression for p(class | image), a partial likelihood.
Generative learning without negative examples → basis elements and hidden variables.
Discriminative adjustment with hugely reduced dimensionality → correcting the conditional independence assumption.
L2-regularized logistic regression → re-estimated lambda's, conditional on (1) the selected basis elements and (2) the inferred hidden variables, where (1) and (2) come from generative learning.
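A minimal sketch of this adjustment step (my own, with hypothetical helper names; the feature extraction mirrors the local-max responses used earlier): once generative learning has fixed the basis elements and their inferred locations, the element weights are re-fit discriminatively with an L2 penalty. Because only the selected elements enter as features, the discriminative step works in a hugely reduced dimensionality, as the slide notes.

import numpy as np
from sklearn.linear_model import LogisticRegression

def element_responses(sum1, template, shift=3):
    """MAX1 responses of one image at the template's selected elements."""
    feats = []
    for k, i, j in template:
        patch = sum1[max(0, i - shift):i + shift + 1,
                     max(0, j - shift):j + shift + 1, k]
        feats.append(patch.max())
    return np.array(feats)

def adjust_weights(pos_maps, neg_maps, template, C=1.0):
    """pos_maps/neg_maps: lists of (H, W, K) SUM1 maps. Returns adjusted lambda's."""
    X = np.array([element_responses(m, template) for m in pos_maps + neg_maps])
    y = np.array([1] * len(pos_maps) + [0] * len(neg_maps))
    clf = LogisticRegression(penalty="l2", C=C).fit(X, y)  # L2-regularized fit
    return clf.coef_.ravel()  # re-estimated element weights (lambda's)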

Active basis templates vs. Adaboost templates.
Active basis: arg-max inference and explaining-away, no reweighting; residual images neutralize existing elements; same set of training examples.
Adaboost: no arg-max inference or explaining-away inhibition; reweighted examples neutralize existing classifiers; changing set of examples.
(Figure labels: # of negatives; double # elements / same # elements.)

Mixture model of active basis templates fitted by EM / maximum likelihood with random initialization (MNIST, 500 in total).

Learning active basis models from non-aligned images: EM-type maximum likelihood learning, initialized by single-image learning.

Learning active basis models from non-aligned images.

Hierarchical active basis, by Zhangzhang Si et al.
AND-OR graph: Pearl, 84; Zhu, Mumford, 06.
Compositionality and reusability: Geman, Potter, Chi, 02; L. Zhu, Lin, Huang, Chen, Yuille, 08.
Part-based methods: everyone et al.
Latent SVM: Felzenszwalb, McAllester, Ramanan, 08.
Constellation model: Weber, Welling, Perona, 00.
(Figure labels: low log-likelihood → high log-likelihood.)

Simplicity: the simplest and purest recursive two-layer AND-OR graph; the simplest generalization of the active basis model.

AND-OR graph and SUM-MAX maps: maximum likelihood inference. Cortex-like, related to Riesenhuber, Poggio, 99. Bottom-up sum-max scoring; top-down arg-max sketching.

Hierarchical active basis by Zhangzhang Si et al.

Shape script by composing active basis shape motifs. Representing elementary geometric shapes (shape motifs) by active bases (Si, Wu, 10). Geometry = a sketch that can be parametrized.

Summary
Bottom layer: Olshausen-Field (foreground) + Zhu-Wu-Mumford (background). Maximum entropy tilting (Della Pietra, Della Pietra, Lafferty, 97): white noise → texture (high entropy) → sketch (low and mid entropy), reversing the central-limit-theorem effect of information scaling.
Build up layers: (1) AND-OR, SUM-MAX (top-down arg-MAX); (2) perpetual sparse coding: further coding of the attributes of the current sparse coding elements, with (a) residuals of attributes → continuous OR-nodes and (b) mixture models → discrete OR-nodes.