CS 2750: Machine Learning Density Estimation

CS 2750: Machine Learning, Density Estimation. Prof. Adriana Kovashka, University of Pittsburgh, March 14, 2016

Midterm exam

Midterm exam, T/F section: number of students answering each question correctly, out of 26 students. Q1: 22, Q2: 26, Q3: 17, Q4: 21, Q5: 23, Q7: 25, Q8: 24.

Parametric Distributions Basic building blocks: parametric densities p(x|θ). Need to determine the parameters θ given an observed data set {x_1, ..., x_N}, as in curve fitting. Slide from Bishop

Binary Variables (1) Coin flipping: heads = 1, tails = 0, with p(x = 1|μ) = μ. Bernoulli Distribution: Bern(x|μ) = μ^x (1 − μ)^(1−x), with E[x] = μ and var[x] = μ(1 − μ). Slide from Bishop

Binary Variables (2) N coin flips: the number of heads m follows the Binomial Distribution, Bin(m|N, μ) = (N choose m) μ^m (1 − μ)^(N−m). Slide from Bishop

Binomial Distribution Slide from Bishop
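
As a small illustration of these two distributions, here is a minimal Python sketch (the values mu = 0.25 and N = 10 are illustrative choices, not taken from the slides):

```python
# Bernoulli and Binomial probabilities for a coin with bias mu.
from math import comb

def bernoulli_pmf(x, mu):
    """P(x | mu) = mu^x * (1 - mu)^(1 - x) for x in {0, 1}."""
    return (mu ** x) * ((1 - mu) ** (1 - x))

def binomial_pmf(m, N, mu):
    """P(m heads in N flips) = C(N, m) * mu^m * (1 - mu)^(N - m)."""
    return comb(N, m) * (mu ** m) * ((1 - mu) ** (N - m))

mu, N = 0.25, 10
print(bernoulli_pmf(1, mu))                                      # P(heads) = 0.25
print([round(binomial_pmf(m, N, mu), 3) for m in range(N + 1)])  # pmf over m = 0..N
```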

Parameter Estimation (1) ML for Bernoulli Given a data set D = {x_1, ..., x_N} of independent flips, the likelihood is p(D|μ) = ∏_n μ^{x_n} (1 − μ)^{1−x_n}. Maximizing the log likelihood gives μ_ML = m/N, where m is the number of observed heads. Slide from Bishop

Parameter Estimation (2) Example: if every observed flip lands heads (e.g., three heads in three flips), then μ_ML = 1. Prediction: all future tosses will land heads up. Overfitting to D. Slide from Bishop
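
A minimal sketch of the ML estimate (the flip sequences below are made-up examples):

```python
# Maximum likelihood for a Bernoulli parameter: mu_ML = (# heads) / (# flips).
def bernoulli_mle(flips):
    """flips: list of 0/1 outcomes; returns the fraction of heads."""
    return sum(flips) / len(flips)

print(bernoulli_mle([1, 0, 1, 1, 0, 1]))  # 0.666...
print(bernoulli_mle([1, 1, 1]))           # 1.0 -> predicts heads forever (overfitting)
```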

Beta Distribution Distribution over μ ∈ [0, 1]: Beta(μ|a, b) = [Γ(a + b) / (Γ(a) Γ(b))] μ^{a−1} (1 − μ)^{b−1}, with E[μ] = a / (a + b). Slide from Bishop

Bayesian Bernoulli The Beta distribution provides the conjugate prior for the Bernoulli distribution: with m observations of x = 1 and l observations of x = 0, the posterior is p(μ|m, l, a, b) ∝ μ^{m+a−1} (1 − μ)^{l+b−1}, again a Beta distribution. Slide from Bishop

Bayesian Bernoulli The posterior hyperparameters a_N = a + m and b_N = b + l can be interpreted as effective numbers of observations of x = 1 and x = 0 (they need not be integers). The posterior distribution can in turn act as a prior as more data are observed.

Bayesian Bernoulli Interpretation? The predictive probability p(x = 1|D) = (m + a) / (m + a + l + b) is the fraction of real and fictitious (prior) observations corresponding to x = 1, where l = N − m is the number of tails.

Prior ∙ Likelihood = Posterior Slide from Bishop
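
A minimal sketch of the conjugate Beta-Bernoulli update (the prior a = b = 2 is an illustrative choice):

```python
# Beta(a, b) prior + m heads, l tails  ->  Beta(a + m, b + l) posterior.
def beta_bernoulli_update(a, b, flips):
    m = sum(flips)                        # number of x = 1 (heads)
    l = len(flips) - m                    # number of x = 0 (tails)
    a_post, b_post = a + m, b + l
    p_heads = a_post / (a_post + b_post)  # predictive p(x = 1 | D)
    return a_post, b_post, p_heads

# Three heads in a row: posterior predictive is ~0.71, not the overconfident ML value 1.0.
print(beta_bernoulli_update(2, 2, [1, 1, 1]))  # (5, 2, 0.714...)
```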

Multinomial Variables 1-of-K coding scheme: x = (0, ..., 0, 1, 0, ..., 0)^T with exactly one element equal to 1. The distribution is p(x|μ) = ∏_k μ_k^{x_k}, where μ_k ≥ 0 and Σ_k μ_k = 1. Slide from Bishop

ML Parameter estimation Given a data set D = {x_1, ..., x_N}, the likelihood is ∏_n ∏_k μ_k^{x_nk}. To ensure Σ_k μ_k = 1, maximize the log likelihood with a Lagrange multiplier λ; the solution is μ_k^{ML} = m_k / N, where m_k = Σ_n x_nk is the count for outcome k. Slide from Bishop
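
A minimal sketch of this estimate (the 1-of-K observations below are made up):

```python
# ML estimate for a multinomial: mu_k = m_k / N, the empirical fraction of outcome k.
import numpy as np

X = np.array([[1, 0, 0],   # each row is one 1-of-K coded observation
              [0, 1, 0],
              [0, 1, 0],
              [0, 0, 1]])
m_k = X.sum(axis=0)        # counts per outcome
mu_ml = m_k / X.shape[0]   # sums to 1 by construction
print(mu_ml)               # [0.25 0.5  0.25]
```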

The Multinomial Distribution Mult(m_1, ..., m_K | μ, N) = [N! / (m_1! ... m_K!)] ∏_k μ_k^{m_k}, the distribution of the counts m_k over N observations. Slide from Bishop

The Dirichlet Distribution Conjugate prior for the multinomial distribution: Dir(μ|α) = [Γ(α_0) / (Γ(α_1) ... Γ(α_K))] ∏_k μ_k^{α_k−1}, where α_0 = Σ_k α_k. Slide from Bishop
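
A minimal sketch of the conjugate Dirichlet update (the α values and counts are illustrative):

```python
# Dir(alpha) prior + multinomial counts m  ->  Dir(alpha + m) posterior.
import numpy as np

alpha = np.array([1.0, 1.0, 1.0])                # symmetric prior
m = np.array([1, 2, 1])                          # observed counts per outcome
alpha_post = alpha + m
posterior_mean = alpha_post / alpha_post.sum()   # smoothed version of m / N
print(alpha_post)        # [2. 3. 2.]
print(posterior_mean)    # approx [0.286 0.429 0.286]
```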

The Gaussian Distribution Univariate: N(x|μ, σ²) = [1 / (2πσ²)^{1/2}] exp(−(x − μ)² / (2σ²)). Multivariate: N(x|μ, Σ) = [1 / ((2π)^{D/2} |Σ|^{1/2})] exp(−½ (x − μ)^T Σ^{−1} (x − μ)). Slide from Bishop
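
A minimal sketch evaluating the multivariate density directly from this formula (the values of μ and Σ are illustrative):

```python
# Multivariate Gaussian density N(x | mu, Sigma).
import numpy as np

def gaussian_pdf(x, mu, Sigma):
    D = len(mu)
    diff = x - mu
    norm = (2 * np.pi) ** (D / 2) * np.sqrt(np.linalg.det(Sigma))
    quad = diff @ np.linalg.solve(Sigma, diff)   # (x - mu)^T Sigma^{-1} (x - mu)
    return np.exp(-0.5 * quad) / norm

mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.5],
                  [0.5, 2.0]])
print(gaussian_pdf(np.array([0.5, -0.5]), mu, Sigma))
```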

The Gaussian Distribution Contours of constant density for a general covariance matrix, a diagonal covariance matrix, and a covariance matrix proportional to the identity matrix. Slide from Bishop

Maximum Likelihood for the Gaussian (1) Given i.i.d. data X = {x_1, ..., x_N}, the log likelihood function is ln p(X|μ, Σ) = −(ND/2) ln(2π) − (N/2) ln|Σ| − ½ Σ_n (x_n − μ)^T Σ^{−1} (x_n − μ). Sufficient statistics: Σ_n x_n and Σ_n x_n x_n^T. Slide from Bishop

Maximum Likelihood for the Gaussian (2) Set the derivative of the log likelihood function to zero and solve to obtain μ_ML = (1/N) Σ_n x_n. Similarly, Σ_ML = (1/N) Σ_n (x_n − μ_ML)(x_n − μ_ML)^T. Slide from Bishop
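
A minimal sketch of these two estimates on synthetic data (the data is just random numbers):

```python
# ML estimates for a Gaussian: sample mean and biased (1/N) sample covariance.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))                    # N x D data matrix
mu_ml = X.mean(axis=0)
Sigma_ml = (X - mu_ml).T @ (X - mu_ml) / len(X)  # divides by N, not N - 1
print(mu_ml)
print(Sigma_ml)
```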

Mixtures of Gaussians (1) Old Faithful data set: a single Gaussian fits the data poorly, while a mixture of two Gaussians captures its two clusters. Slide from Bishop

Mixtures of Gaussians (2) Combine simple models into a complex model: p(x) = Σ_{k=1}^{K} π_k N(x|μ_k, Σ_k), here with K = 3 components, where N(x|μ_k, Σ_k) is the k-th component and π_k its mixing coefficient. Slide from Bishop

Mixtures of Gaussians (3) Slide from Bishop
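
A minimal sketch evaluating such a mixture density (the K = 3 parameter values are illustrative):

```python
# Gaussian mixture density: p(x) = sum_k pi_k * N(x | mu_k, Sigma_k).
import numpy as np
from scipy.stats import multivariate_normal

def gmm_pdf(x, pis, mus, Sigmas):
    """Evaluate the mixture density at a single point x."""
    return sum(pi * multivariate_normal.pdf(x, mean=mu, cov=Sigma)
               for pi, mu, Sigma in zip(pis, mus, Sigmas))

pis = [0.5, 0.3, 0.2]                  # mixing coefficients, sum to 1
mus = [np.zeros(2), np.array([3.0, 3.0]), np.array([-3.0, 2.0])]
Sigmas = [np.eye(2), 0.5 * np.eye(2), np.diag([2.0, 0.5])]
print(gmm_pdf(np.array([1.0, 1.0]), pis, mus, Sigmas))
```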