
Stochastic Sets and Regimes of Mathematical Models of Images Song-Chun Zhu University of California, Los Angeles Tsinghua Sanya Int’l Math Forum, Jan, 2013

Outline
1. Three regimes of image models and stochastic sets:
   high entropy regime: (Gibbs, MRF, FRAME) and Julesz ensembles;
   low entropy regime: Sparseland and bounded subspaces;
   middle entropy regime: stochastic image grammar and its language.
2. Information scaling: the transitions in a continuous entropy spectrum.
3. Spatial, temporal, and causal and-or graphs; demo on joint parsing and query answering.

How do we represent a concept in a computer? Mathematics and logic have been based on deterministic sets (e.g. Cantor, Boole) and their compositions through the "and", "or", and "negation" operators. But the world is fundamentally stochastic! e.g. the set of people who are in Sanya today, or the set of people in Florida who voted for Al Gore in 2000, is impossible to know exactly.

Ref. [1] D. Mumford, The Dawning of the Age of Stochasticity. [2] E. Jaynes, Probability Theory: The Logic of Science. Cambridge University Press, 2003.

Stochastic sets in the image space. The symbol grounding problem in AI: ground abstract symbols in sensory signals. Can we define visual concepts as sets of images/videos? e.g. noun concepts: human face, human figure, vehicle; verb concepts: opening a door, drinking tea. [Diagram: the image space; a point is an image or a video clip.]

1. Stochastic sets in statistical physics

Statistical physics studies macroscopic properties of systems that consist of massive numbers of elements with microscopic interactions, e.g. a tank of insulated gas or a ferro-magnetic material.

A state of the system is specified by the positions of the N elements x^N and their momenta p^N: S = (x^N, p^N). But we only care about some global properties: energy E, volume V, pressure, etc.

Micro-canonical ensemble: Ω(N, E, V) = { S : h(S) = (N, E, V) }.

It took 30 years to transfer this theory to vision. [Figure: an observed image I^obs and synthesized images I^syn ~ Ω(h) as the number k of filters grows: k = 0, 1, 3, 4, 7.] The h_c are histograms of Gabor filter responses. (Zhu, Wu, and Mumford, "Minimax entropy principle and its applications to texture modeling," 1997, 1999, 2000.) We call this the Julesz ensemble.
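The sampling idea above can be sketched in a few lines. This is a minimal toy, not the FRAME algorithm from the papers: I use a single horizontal-derivative filter as a stand-in for a Gabor bank, and a greedy (zero-temperature) variant of single-site Metropolis that only accepts proposals which do not increase the histogram distance. The function names are mine.

```python
import numpy as np

def filter_hist(img, bins):
    """Normalized histogram of horizontal-derivative responses
    (a crude stand-in for a bank of Gabor filters)."""
    resp = np.diff(img, axis=1)
    h, _ = np.histogram(resp, bins=bins)
    return h / h.sum()

def julesz_sample(obs, n_sweeps=5, seed=0):
    """Draw an image whose filter histogram matches that of `obs` by
    single-site random proposals, accepted only when the histogram
    distance does not increase (a greedy Metropolis variant)."""
    rng = np.random.default_rng(seed)
    bins = np.linspace(-1.0, 1.0, 17)
    target = filter_hist(obs, bins)
    syn = rng.random(obs.shape)                    # start from white noise
    err0 = err = np.abs(filter_hist(syn, bins) - target).sum()
    for _ in range(n_sweeps * syn.size):
        i = rng.integers(syn.shape[0])
        j = rng.integers(syn.shape[1])
        old = syn[i, j]
        syn[i, j] = rng.random()                   # propose a new intensity
        new_err = np.abs(filter_hist(syn, bins) - target).sum()
        if new_err > err:                          # reject: histograms moved apart
            syn[i, j] = old
        else:
            err = new_err
    return syn, err0, err
```

With many filters, a proper annealing schedule, and histograms recomputed incrementally, this becomes the kind of MCMC sampler used to produce the synthesized textures on the slide.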

More texture examples of the Julesz ensemble: MCMC samples from the micro-canonical ensemble vs. observed images.

Equivalence of deterministic sets and probabilistic models

Theorem 1. For an infinite (large) image from the texture ensemble, any local patch of the image, given its neighborhood, follows a conditional distribution specified by a FRAME/MRF model.

Theorem 2. As the image lattice goes to infinity, Λ → Z², the Julesz ensemble is the limit of the FRAME model, in the absence of phase transition. (Gibbs 1902; Wu and Zhu, 2000)

Ref. Y. N. Wu and S. C. Zhu, "Equivalence of Julesz Ensemble and FRAME models," Int'l J. Computer Vision, 38(3), July 2000.

2. Lower-dimensional sets or bounded subspaces

[Diagram: subspace 1, subspace 2.] The image is represented by K basis functions ψ chosen from a dictionary, where K is far smaller than the dimension n of the image space. e.g. basis pursuit (Chen and Donoho, 1999), Lasso (Tibshirani, 1995); (yesterday: Ma, Wright, Li).
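To make the "K far smaller than n" idea concrete, here is a minimal matching-pursuit sketch, a greedy cousin of the basis pursuit and Lasso methods cited above (not the slide's own algorithm); function name and interface are mine.

```python
import numpy as np

def matching_pursuit(x, D, k):
    """Greedy sparse coding: approximate x with k columns (basis functions)
    of dictionary D, whose columns are assumed unit-norm.
    Returns the coefficient vector a and the residual r, so x = D @ a + r."""
    a = np.zeros(D.shape[1])
    r = x.astype(float).copy()
    for _ in range(k):
        scores = D.T @ r                 # correlation of residual with each atom
        j = np.argmax(np.abs(scores))    # best-matching atom
        a[j] += scores[j]
        r -= scores[j] * D[:, j]         # remove its contribution
    return a, r
```

When the signal truly lies in a K-dimensional subspace spanned by dictionary atoms, k = K passes drive the residual toward zero; this is the sense in which the image set is a bounded low-dimensional subspace.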

Learning an over-complete basis from natural images: I = Σ_i a_i ψ_i + n (Olshausen and Field). [Figure: learned textons.]

B. Olshausen and D. Field, "Sparse Coding with an Overcomplete Basis Set: A Strategy Employed by V1?" Vision Research, 37, 1997.
S.C. Zhu, C.E. Guo, Y.Z. Wang, and Z.J. Xu, "What are Textons?" Int'l J. of Computer Vision, 62(1/2), 2005.

Examples of low-dimensional sets (Saul and Roweis): sampling the 3D elements under varying lighting directions. [Figure: samples across lighting directions.]

Bigger textons: object templates, but still low-dimensional. Note: the template only represents an object at a fixed view and a fixed configuration. When we allow the sketches to deform locally, the space becomes "swollen"; the elements are almost non-overlapping.

Y.N. Wu, Z.Z. Si, H.F. Gong, and S.C. Zhu, "Learning Active Basis Model for Object Detection and Recognition," IJCV, 2009.

Summary: two regimes of stochastic sets. I call them the implicit vs. explicit sets.

Relations to the psychophysics literature: the struggle over textures vs. textons (Julesz, 1960s-80s). [Plot: response time T vs. number of distractors n.] Textons: coded explicitly.

Textons vs. textures. [Plot: response time T vs. number of distractors n.] Textures: coded up to an equivalence ensemble. In fact the brain is plastic, and textons are learned through experience: e.g. Chinese characters are a texture to you at first, then become textons once you can recognize them.

A second look at the space of images. [Diagram: the image space, with explicit and implicit manifolds.]

3. Stochastic sets by composition: mixing implicit and explicit subspaces. Product:

Examples of learned object templates (Zhangzhang Si). Ref: Si and Zhu, "Learning Hybrid Image Templates for Object Modeling and Detection."

More examples: rich appearance, deformable, but fixed configurations.

Fully unsupervised learning with compositional sparsity: four common templates from 20 images. Hong et al., "Compositional sparsity for learning from natural images," 2013.

Fully unsupervised learning. According to the Chinese painters, the world has only one image!

Isn't this how the Chinese characters were created for objects and scenes? Sparsity, symbolized texture, shape diffeomorphism, compositionality: every topic in this workshop is covered!

4. Stochastic sets by And-Or composition (grammar)

A production rule in a grammar, A ::= aB | a | aBc, can be represented by an And-Or tree: an Or-node A branches into And-nodes A1, A2, A3 (one per alternative), whose children are Or-nodes (B1, B2) and terminal nodes (a1, a2, a3, c). We put the previous templates at the terminal nodes and compose new templates through And-Or operations.

The language of a grammar is the set of valid sentences. [Diagram: a production rule drawn as a tree of Or-nodes, And-nodes, and leaf-nodes A, B, C, a, b, c.] The language is the set of all valid configurations derived from a root node A.
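The "language as a set of configurations" idea can be sketched directly: an Or-node chooses one branch, an And-node concatenates its children's expansions. The rule A ::= aB | a | aBc is taken from the slide; the sub-rules for B (b1 | b2) and the function name are hypothetical, added only to make the example concrete.

```python
def expand(symbol, rules):
    """Enumerate all terminal configurations (the language) derivable
    from `symbol` under an And-Or rule set."""
    if symbol not in rules:              # terminal node: expands to itself
        return [[symbol]]
    configs = []
    for branch in rules[symbol]:         # Or: each branch is one alternative
        partial = [[]]
        for child in branch:             # And: concatenate children's expansions
            partial = [p + c for p in partial for c in expand(child, rules)]
        configs.extend(partial)
    return configs

# A ::= aB | a | aBc, with hypothetical sub-rules B ::= b1 | b2
rules = {"A": [["a", "B"], ["a"], ["a", "B", "c"]],
         "B": [["b1"], ["b2"]]}
```

Here expand("A", rules) yields five configurations, the full language of this tiny grammar; replacing string terminals with image templates gives the set of valid object configurations the slide describes.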

And-Or graph, parse graphs, and configurations. Each category is conceptualized as a grammar whose language defines a set, an "equivalence class", of all the valid configurations of that category.

Unsupervised Learning of AND-OR Templates Si and Zhu, PAMI, to appear

A concrete example on human figures

Templates for the terminal nodes at all levels: the symbols are grounded!

Synthesis (Computer Dream) by sampling the language Rothrock and Zhu, 2011

Local computation is hugely ambiguous. Dynamic programming and re-ranking.

Composing Upper Body

Composing parts in the hierarchy

5. Continuous entropy spectrum. Scaling (zooming out) increases the image entropy (dimensions). Ref: Y.N. Wu, C.E. Guo, and S.C. Zhu, "From Information Scaling of Natural Images to Regimes of Statistical Models," Quarterly of Applied Mathematics, 2007.

Entropy rate (bits/pixel) over viewing distance on natural images, measured by: 1. the entropy of I_x; 2. JPEG; # of DooG bases for reaching 30% MSE.
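The first measurement above (entropy of I_x over distance) is easy to emulate. This is an illustrative sketch under my own simplifications, not the paper's procedure: block averaging stands in for increasing viewing distance, and entropy is estimated from a 32-bin histogram of horizontal derivatives.

```python
import numpy as np

def entropy_rate(img, n_bins=32):
    """Empirical entropy (bits/pixel) of the horizontal derivative I_x."""
    ix = np.diff(img, axis=1)
    h, _ = np.histogram(ix, bins=n_bins)
    p = h / h.sum()
    p = p[p > 0]                      # drop empty bins before taking logs
    return float(-(p * np.log2(p)).sum())

def zoom_out(img, f=2):
    """Block-average downsampling, a crude stand-in for viewing from f times
    farther away."""
    h, w = img.shape
    img = img[:h - h % f, :w - w % f]             # crop to a multiple of f
    return img.reshape(h // f, f, w // f, f).mean(axis=(1, 3))
```

Applying entropy_rate to zoom_out(img) at successive scales traces a curve like the one on the slide; on natural photographs the per-pixel entropy tends to rise as structures shrink toward texture.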

Simulation: regime transitions in scale space. We need a seamless transition between the different regimes of models. [Figure: simulated images at scales 1 through 7.]

Coding efficiency and number of clusters over scales. [Plot: number of clusters found in the low, middle, and high entropy regimes.]

Imperceptibility: the key to transition

Let W be the description of the scene (world), W ~ p(W). Assume a generative model I = g(W).

1. Scene complexity is defined as the entropy of p(W).
2. Imperceptibility is defined as the entropy of the posterior p(W|I).

Imperceptibility = scene complexity - image complexity.

Theorem:
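The theorem statement itself did not survive transcription, but the two definitions on the slide combine as follows; this derivation is my reconstruction, using only the stated assumption that I = g(W) is deterministic (so H(I | W) = 0):

```latex
\text{Imperceptibility}
  = H(W \mid I)
  = H(W) - I(W; I)
  = H(W) - \bigl( H(I) - H(I \mid W) \bigr)
  = \underbrace{H(W)}_{\text{scene complexity}}
    - \underbrace{H(I)}_{\text{image complexity}} .
```

So as zooming out shrinks the image complexity H(I) while the scene complexity H(W) stays fixed, the posterior entropy grows: more of the scene becomes imperceptible, forcing the transition between model regimes.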

6. Spatial, Temporal, Causal AoG: knowledge representation

Temporal-AoG for actions/events (expresses high-order sequences). Ref. M. Pei and S.C. Zhu, "Parsing Video Events with Goal Inference and Intent Prediction," ICCV.

Representing causal concepts by Causal-AoG: spatial, temporal, and causal AoG for knowledge representation.

Summary: a unifying mathematical foundation

Regimes of representations/models, and the tasks they serve:
- Logics (common sense, domain knowledge): reasoning
- Stochastic grammar (partonomy, taxonomy, relations): cognition
- Sparse coding (low-D manifolds, textons): recognition
- Markov/Gibbs fields (hi-D manifolds, textures): coding

Two known grand challenges: symbol grounding, semantic gaps.