Supervisor: Dr. Yasser Fouad. By: Ahmed Badrealldeen


Sum-Product Networks. Supervisor: Dr. Yasser Fouad. By: Ahmed Badrealldeen

Introduction
The key limiting factor in graphical model inference and learning is the complexity of the partition function. We thus ask: what are the general conditions under which the partition function is tractable? The answer leads to a new kind of deep architecture, which we call sum-product networks (SPNs). SPNs are directed acyclic graphs with variables as leaves and sums and products as internal nodes. We show that if an SPN is complete and consistent, it represents the partition function and all marginals of some graphical model, and we give semantics to its nodes.

What is a Sum-Product Network?
* Poon and Domingos, 2011
* Acyclic directed graph of sums and products
* Leaves can be indicator variables or univariate distributions

SPN Architecture
A specific type of deep neural network
– Activation function: product
Advantage:
– Clear semantics and well-understood theory

This Talk: Sum-Product Networks
Compactly represent the partition function using a deep network.
(Diagram: nested sets, with Existing Tractable Models ⊂ Sum-Product Networks ⊂ Graphical Models.)

Learn the optimal way to reuse computation, etc.
(Diagram: the same nested sets, with Existing Tractable Models ⊂ Sum-Product Networks ⊂ Graphical Models.)

Outline
– Sum-product networks (SPNs)
– Learning SPNs
– Experimental results
– Conclusion

Why Is Inference Hard?
Bottleneck: summing out variables.
E.g., the partition function Z = Σ_x Π_k φ_k(x): a sum of exponentially many products.
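To make the bottleneck concrete, here is a minimal sketch (in Python, with a made-up pairwise potential and chain structure, none of which come from the talk) of computing a partition function by brute force; the cost is a sum over all 2^N joint states, each a product of factors:

    # Brute-force partition function of a toy chain Markov network.
    from itertools import product

    def phi(a, b):
        # Arbitrary illustrative pairwise potential over binary variables.
        return 2.0 if a == b else 0.5

    def partition_function(n_vars):
        # Z = sum over all 2^n joint states of the product of chain potentials:
        # exponentially many products.
        Z = 0.0
        for state in product([0, 1], repeat=n_vars):
            p = 1.0
            for i in range(n_vars - 1):
                p *= phi(state[i], state[i + 1])
            Z += p
        return Z

    print(partition_function(10))  # already 1,024 terms; 2^100 is hopeless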

Alternative Representation
X1  X2  P(X)
 1   1  0.4
 1   0  0.2
 0   1  0.1
 0   0  0.3
P(X) = 0.4 × I[X1=1] × I[X2=1] + 0.2 × I[X1=1] × I[X2=0] + 0.1 × I[X1=0] × I[X2=1] + 0.3 × I[X1=0] × I[X2=0]

Shorthand for Indicators
Write x1 for I[X1=1] and x̄1 for I[X1=0], and likewise x2, x̄2. Then, for the same table:
P(X) = 0.4 × x1 × x2 + 0.2 × x1 × x̄2 + 0.1 × x̄1 × x2 + 0.3 × x̄1 × x̄2

Sum Out Variables
Evidence e: X1 = 1
P(e) = 0.4 × x1 × x2 + 0.2 × x1 × x̄2 + 0.1 × x̄1 × x2 + 0.3 × x̄1 × x̄2
Set x1 = 1, x̄1 = 0, x2 = 1, x̄2 = 1
Easy: to sum out a variable, set both of its indicators to 1.
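The indicator trick on these three slides can be written down directly. A small sketch using the table's numbers (the function name S is mine): evaluating the same polynomial with different indicator settings answers full queries, evidence queries, and the partition function.

    # Network polynomial for the two-variable table above.
    # x1, nx1 stand for the indicators x1 = I[X1=1] and x̄1 = I[X1=0]; likewise x2, nx2.
    def S(x1, nx1, x2, nx2):
        return (0.4 * x1 * x2 +
                0.2 * x1 * nx2 +
                0.1 * nx1 * x2 +
                0.3 * nx1 * nx2)

    # Full query P(X1=1, X2=0): matching indicators set to 1, the others to 0.
    print(S(1, 0, 0, 1))   # 0.2

    # Evidence query P(X1=1): sum out X2 by setting both of its indicators to 1.
    print(S(1, 0, 1, 1))   # 0.4 + 0.2 = 0.6

    # Partition function Z: set all indicators to 1.
    print(S(1, 1, 1, 1))   # 1.0, since the table is normalized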

Sum-Product Network: Graphical Representation
(Diagram: a root sum node with weights 0.4, 0.2, 0.1, 0.3 over four product nodes; each product multiplies the appropriate indicator leaves x1, x̄1, x2, x̄2, reproducing the table above.)

But … Exponentially Large
Example: Parity, the uniform distribution over states with an even number of 1's.
(Diagram: the flat sum-of-products SPN over X1 … X5, with 2^(N-1) product nodes, one per even-parity state, and on the order of N·2^(N-1) edges.)
Can we make this more compact?

Use a Deep Network
Example: Parity, the uniform distribution over states with an even number of 1's.
– Induce many hidden layers
– Reuse partial computation

Sum-Product Networks (SPNs)
– Rooted DAG
– Nodes: sum, product, input indicator
– Weights on edges from a sum node to its children
(Diagram: an example SPN over X1, X2 with root weights 0.7 and 0.3, further sum weights 0.4, 0.9, 0.7, 0.2, 0.6, 0.1, 0.3, 0.8, and indicator leaves x1, x̄1, x2, x̄2.)
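As a minimal sketch of such a DAG in code (the classes and the particular weights below are my own simplification for illustration, not the authors' implementation), each node's value is computed bottom-up from the indicator values:

    # Minimal SPN node types: indicator leaves, weighted sum nodes, product nodes.
    class Indicator:
        def __init__(self, var, value):
            self.var, self.value = var, value
        def eval(self, indicators):
            # 'indicators' maps (variable, value) -> 0 or 1
            return indicators[(self.var, self.value)]

    class Sum:
        def __init__(self, children, weights):
            self.children, self.weights = children, weights
        def eval(self, indicators):
            return sum(w * c.eval(indicators)
                       for w, c in zip(self.weights, self.children))

    class Product:
        def __init__(self, children):
            self.children = children
        def eval(self, indicators):
            result = 1.0
            for c in self.children:
                result *= c.eval(indicators)
            return result

    # A tiny SPN over X1, X2 (weights are illustrative, not read off the slide's figure).
    x1, nx1 = Indicator('X1', 1), Indicator('X1', 0)
    x2, nx2 = Indicator('X2', 1), Indicator('X2', 0)
    s1 = Sum([x1, nx1], [0.6, 0.4])
    s2 = Sum([x2, nx2], [0.9, 0.1])
    root = Sum([Product([s1, s2]), Product([nx1, nx2])], [0.7, 0.3])

    # Partition function: all indicators set to 1 (see the next slides).
    all_ones = {('X1', 1): 1, ('X1', 0): 1, ('X2', 1): 1, ('X2', 0): 1}
    print(root.eval(all_ones))  # 1.0, since every sum node's weights sum to 1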

Valid SPN
An SPN S is valid if S(e) = Σ_{X ∈ e} S(X) for all evidence e (the sum runs over the states consistent with e)
Valid ⇒ can compute marginals efficiently
The partition function Z can be computed by setting all indicators to 1

Valid SPN: General Conditions
Theorem: an SPN is valid if it is complete and consistent
– Complete: under a sum node, children cover the same set of variables
– Consistent: under a product node, no variable appears in one child and its negation in another
(Diagram: an incomplete example and an inconsistent example, each with S(e) ≠ Σ_{X ∈ e} S(X).)
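Both conditions can be checked mechanically from each node's scope, the set of variables appearing below it. A rough sketch building on the toy classes above (the helpers are mine; for product nodes it checks decomposability, i.e. disjoint child scopes, a commonly used sufficient condition for consistency):

    # Scope = set of variables appearing below a node.
    def scope(node):
        if isinstance(node, Indicator):
            return {node.var}
        return set().union(*(scope(c) for c in node.children))

    def is_complete(node):
        # Every sum node's children must cover the same set of variables.
        if isinstance(node, Indicator):
            return True
        ok = all(is_complete(c) for c in node.children)
        if isinstance(node, Sum):
            scopes = [scope(c) for c in node.children]
            ok = ok and all(s == scopes[0] for s in scopes)
        return ok

    def is_decomposable(node):
        # Product children have pairwise disjoint scopes (stronger than consistency).
        if isinstance(node, Indicator):
            return True
        ok = all(is_decomposable(c) for c in node.children)
        if isinstance(node, Product):
            seen = set()
            for c in node.children:
                s = scope(c)
                if seen & s:
                    return False
                seen |= s
        return ok

    print(is_complete(root), is_decomposable(root))  # True True for the toy SPN above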

Semantics of Sums and Products
Product → feature → form a feature hierarchy
Sum → mixture (with a hidden variable summed out)
(Diagram: sum node i corresponds to a hidden variable Yi; its edge to child j carries weight wij and plays the role of I[Yi = j], so evaluating the sum sums out Yi.)

Handling Continuous Variables
Sum → integral over input
Simplest case: indicator → Gaussian
An SPN then compactly defines a very large mixture of Gaussians
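A small sketch of the "indicator → Gaussian" idea (all parameters below are made up for illustration): with Gaussian leaves, a product of per-variable sums already encodes exponentially many mixture components while storing only a handful of leaf distributions.

    import math

    def gauss(x, mu, sigma):
        # Univariate Gaussian density.
        return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

    def spn_density(x, leaf_params, weights=(0.5, 0.5)):
        # Product of per-variable sums, each mixing two Gaussian leaves.
        # Expanding the product yields 2^n Gaussian components from only 2n leaves.
        density = 1.0
        for xi, ((mu1, s1), (mu2, s2)) in zip(x, leaf_params):
            density *= weights[0] * gauss(xi, mu1, s1) + weights[1] * gauss(xi, mu2, s2)
        return density

    # 3 variables, 6 Gaussian leaves, implicitly a mixture of 2^3 = 8 Gaussians.
    params = [((-1.0, 1.0), (1.0, 1.0))] * 3
    print(spn_density([0.2, -0.5, 1.3], params))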

SPNs Everywhere: Graphical Models
Existing tractable models, e.g. hierarchical mixture models, thin junction trees, etc.
SPNs can compactly represent many more distributions

SPNs Everywhere: Graphical Models
Inference methods, e.g. the junction-tree algorithm, message passing, …
SPNs can represent, combine, and learn the optimal way

SPNs Everywhere: Graphical Models
SPNs can be more compact by leveraging determinism, context-specific independence, etc.

SPNs Everywhere: Models for Efficient Inference
E.g., arithmetic circuits, AND/OR graphs, case-factor diagrams
SPN: the first approach for learning such models directly from data

SPNs Everywhere: General, Probabilistic Convolutional Networks
– Sum: average-pooling
– Max: max-pooling

SPNs Everywhere: Grammars in Vision and Language
E.g., object detection grammars, probabilistic context-free grammars
– Sum: non-terminal
– Product: production rule

Outline
– Sum-product networks (SPNs)
– Learning SPNs
– Experimental results
– Conclusion

The Challenge
– In principle, weights can be learned by gradient descent
– But … the gradient quickly dilutes
– EM has a similar problem
– Hard EM overcomes this problem (see the sketch below)
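The talk does not spell out the update, but the rough idea of online hard EM for a sum node's weights can be sketched as follows (my paraphrase, not the authors' code): for each training example, pick the best-scoring child of the sum node (the hard, MPE-style E-step), increment that child's count, and renormalize the counts into weights (the M-step).

    # Rough sketch of one online hard-EM step for a single sum node.
    # A real SPN applies this to every sum node on the winning (MPE) subcircuit.
    def hard_em_step(weights, child_values, counts, smoothing=1.0):
        # Hard E-step: pick the child with the highest weighted value.
        scores = [w * v for w, v in zip(weights, child_values)]
        winner = scores.index(max(scores))
        # M-step: increment the winner's count, renormalize counts into weights.
        counts[winner] += 1
        total = sum(counts) + smoothing * len(counts)
        new_weights = [(c + smoothing) / total for c in counts]
        return new_weights, counts

    weights, counts = [0.5, 0.5], [0, 0]
    for child_values in [(0.9, 0.1), (0.8, 0.3), (0.2, 0.7)]:  # made-up child evaluations
        weights, counts = hard_em_step(weights, child_values, counts)
    print(weights)  # the first child wins more often, so its weight grows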

Outline
– Sum-product networks (SPNs)
– Learning SPNs
– Experimental results
– Conclusion

Task: Image Completion
– Very challenging
– Good for evaluating deep models
Methodology:
– Learn a model from training images
– Complete unseen test images
– Measure mean squared errors (see the helper below)
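Completion quality is scored by the mean squared error between the predicted and true pixels of the occluded half. A trivial helper, shown only to make the metric concrete (array names are mine):

    import numpy as np

    def completion_mse(true_half, predicted_half):
        # Mean squared error over the occluded pixels (left half or bottom half).
        true_half = np.asarray(true_half, dtype=float)
        predicted_half = np.asarray(predicted_half, dtype=float)
        return float(np.mean((true_half - predicted_half) ** 2))

    # Example with two made-up 2x2 half-images.
    print(completion_mse([[10, 20], [30, 40]], [[12, 18], [33, 37]]))  # 6.5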

SPN Architecture for Image Completion
(Diagram: a region hierarchy; sums and products alternate from the whole image down through sub-regions to mixtures over individual pixels x.)

Systems
– SPN
– DBM [Salakhutdinov & Hinton, 2010]
– DBN [Hinton & Salakhutdinov, 2006]
– PCA [Turk & Pentland, 1991]
– Nearest neighbor [Hays & Efros, 2007]

Preprocessing for DBN
– Did not work well despite best effort
– Followed Hinton & Salakhutdinov [2006]
– Reduced scale: 64 × 64 → 25 × 25
– Used a much larger dataset: ~120,000 images
Reduced scale → artificially lower errors

Caltech: Mean Squared Errors
                    LEFT    BOTTOM
SPN                 3551    3270
DBM                 9043    9792
DBN                 4778    4492
PCA                 4234    4465
Nearest Neighbor    4887    5505

SPN vs. DBM / DBN
– SPN is an order of magnitude faster
– No elaborate preprocessing or tuning
– Reduced errors by 30-60%
– Learned up to 46 layers

             SPN           DBM / DBN
Learning     2-3 hours     Days
Inference    < 1 second    Minutes or hours

Scatter Plot: Complete Left

Scatter Plot: Complete Bottom

Olivetti: Mean Squared Errors
                    LEFT    BOTTOM
SPN                  942     918
DBM                 1866    2401
DBN                 2386    1931
PCA                 1076    1265
Nearest Neighbor    1527    1793

End-to-End Comparison
General graphical models: Data → approximate model → approximate inference → Performance
Sum-product networks: Data → approximate model → exact inference → Performance
Given the same computation budget, which approach has better performance?

Deep Learning
Stack many layers, e.g.:
– DBN [Hinton & Salakhutdinov, 2006]
– CDBN [Lee et al., 2009]
– DBM [Salakhutdinov & Hinton, 2010]
Potentially much more powerful than shallow architectures [Bengio, 2009]
But …
– Inference is even harder
– Learning requires extensive effort

Conclusion
Sum-product networks (SPNs):
– DAG of sums and products
– Compactly represent the partition function
– Learn many layers of hidden variables
Exact inference: linear time in network size
Deep learning: online hard EM
Substantially outperform the state of the art on image completion