Mixture Density Networks


Mixture Density Networks Qiang Lou

Outline
- Simple Example
- Introduction to MDN
- Analysis of MDN
- Weights Optimization
- Prediction

Simple Example
The inverse problem of x = t + 0.3·sin(2πt). The forward mapping from t to x is single-valued; here we train on the inverse: input x, output t, which is multi-valued for some values of x.
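
A minimal sketch of how training data for this example can be generated, assuming the usual setup (t sampled uniformly on (0, 1), small uniform noise added to x); the noise range is an assumption for illustration:

```python
import numpy as np

# Sample the single-valued forward mapping t -> x, then swap the
# roles of the variables to obtain the multi-valued inverse problem.
rng = np.random.default_rng(0)
t = rng.uniform(0.0, 1.0, size=1000)
noise = rng.uniform(-0.1, 0.1, size=t.shape)  # assumed noise level
x = t + 0.3 * np.sin(2 * np.pi * t) + noise

# Training pairs for the inverse problem: input x, target t.
X, T = x.reshape(-1, 1), t.reshape(-1, 1)
```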

Lessons from the example
The example exposes the problem with conventional neural networks: they cannot represent a multi-valued mapping. Reason: a network trained by least squares approximates f(x, w*) = E[t|x], the conditional average of the correct target values, and for a multi-valued mapping this average can fall between the valid branches, so it is sometimes not a correct solution at all.

Solution: mixture density networks
An MDN overcomes the limitation above by modelling the full conditional density p(t|x) as a linear combination of kernel functions:

p(t|x) = Σ_{k=1..L} α_k(x) φ_k(t|x)

where each kernel is typically a Gaussian:

φ_k(t|x) = exp(−||t − μ_k(x)||² / (2σ_k(x)²)) / ((2π)^(K/2) σ_k(x)^K)

Three sets of parameters, all functions of the input x: mixing coefficients α_k(x), means μ_k(x), and variances σ_k(x)².

How to model the parameters?
Use the outputs z of the conventional NN, with a constraint-enforcing transformation for each parameter group.
Coefficients must be positive and sum to one, so they go through a softmax: α_k = exp(z_k^α) / Σ_l exp(z_l^α)
Variances must be positive, so they go through an exponential: σ_k = exp(z_k^σ)
Means can be directly represented by outputs of the NN: μ_kj = z_kj^μ
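
A minimal sketch of these three transformations, assuming the raw outputs are laid out as L coefficient outputs, then L variance outputs, then K·L mean outputs (this layout is an assumption for illustration):

```python
import numpy as np

def mdn_params(z, L, K):
    """Split raw network outputs z (shape [N, (K + 2) * L]) into the
    constrained mixture parameters."""
    z_alpha, z_sigma, z_mu = np.split(z, [L, 2 * L], axis=1)
    # Softmax: coefficients are positive and sum to one.
    e = np.exp(z_alpha - z_alpha.max(axis=1, keepdims=True))
    alpha = e / e.sum(axis=1, keepdims=True)
    # Exponential: scale parameters stay positive.
    sigma = np.exp(z_sigma)
    # Means are taken directly from the network outputs.
    mu = z_mu.reshape(-1, L, K)
    return alpha, sigma, mu
```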

Basic structure of MDN: a conventional feed-forward network whose output layer supplies the mixture parameters α_k, σ_k, μ_k (figure omitted from transcript).

Weights Optimization
Similar to a conventional NN, the MDN is trained by maximum likelihood, i.e. by minimizing the negative logarithm of the likelihood:

E(w) = −Σ_n ln( Σ_k α_k(x_n) φ_k(t_n|x_n) )

Minimizing E(w) is equivalent to maximizing the likelihood.
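
A sketch of E(w) for Gaussian kernels; the log-sum-exp step is an implementation detail added here for numerical stability rather than something from the slides:

```python
import numpy as np

def mdn_nll(alpha, sigma, mu, t):
    """E(w): negative log-likelihood of targets t (shape [N, K])
    under the mixture, summed over the data set."""
    N, L, K = mu.shape
    sq = ((t[:, None, :] - mu) ** 2).sum(axis=2)               # [N, L]
    log_phi = (-sq / (2 * sigma ** 2)
               - K * np.log(sigma)
               - 0.5 * K * np.log(2 * np.pi))
    log_mix = np.log(alpha) + log_phi                          # [N, L]
    m = log_mix.max(axis=1, keepdims=True)                     # log-sum-exp
    return -(m[:, 0] + np.log(np.exp(log_mix - m).sum(axis=1))).sum()
```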

Weights Optimization
The gradients follow from the chain rule and are propagated back through the network. To start off the algorithm, define the posterior responsibility of component k for a data point (x, t):

π_k(x, t) = α_k φ_k / Σ_l α_l φ_l

The derivatives of E with respect to the raw network outputs are then:

∂E/∂z_k^α = α_k − π_k
∂E/∂z_k^σ = −π_k (||t − μ_k||² / σ_k² − K)
∂E/∂μ_kj = π_k (μ_kj − t_j) / σ_k²
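
A sketch of these gradients computed directly from the responsibilities (in a modern framework, automatic differentiation would produce the same values):

```python
import numpy as np

def mdn_grads(alpha, sigma, mu, t):
    """Derivatives of E(w) w.r.t. the raw outputs z_alpha, z_sigma
    and the means, via the posterior responsibilities pi_k."""
    N, L, K = mu.shape
    sq = ((t[:, None, :] - mu) ** 2).sum(axis=2)               # [N, L]
    phi = np.exp(-sq / (2 * sigma ** 2)) / ((2 * np.pi) ** (K / 2) * sigma ** K)
    pi = alpha * phi
    pi /= pi.sum(axis=1, keepdims=True)                        # responsibilities
    d_z_alpha = alpha - pi                                     # dE/dz_alpha
    d_z_sigma = -pi * (sq / sigma ** 2 - K)                    # dE/dz_sigma
    d_mu = pi[:, :, None] * (mu - t[:, None, :]) / (sigma ** 2)[:, :, None]
    return d_z_alpha, d_z_sigma, d_mu
```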

Prediction
General way: take the conditional average of the target data, E[t|x] = Σ_k α_k(x) μ_k(x).
Accurate way (for multi-valued mappings): take the mean of the most probable component, μ_k with k = arg max_k α_k(x).
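
A sketch of both read-outs:

```python
import numpy as np

def mdn_predict(alpha, mu):
    """Two ways to read a point prediction out of the mixture."""
    # General way: the conditional average E[t|x] = sum_k alpha_k * mu_k.
    cond_mean = (alpha[:, :, None] * mu).sum(axis=1)
    # Accurate way for multi-valued targets: the mean of the most
    # probable component, k = argmax_k alpha_k(x).
    k = alpha.argmax(axis=1)
    most_probable = mu[np.arange(len(k)), k]
    return cond_mean, most_probable
```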

Results of example (figure omitted from transcript).

Problems
The number of outputs of the MDN grows with the mixture size. Assume L models in the mixture and a K-dimensional target, so a conventional NN would have K outputs. The MDN instead needs (K + 2)·L outputs: L mixing coefficients, L variances, and K·L means. For example, K = 1 and L = 3 already require 9 outputs.

Thank you!