Chapter Two Probability Distributions: Discrete Variables

Presentation on theme: "Chapter Two Probability Distributions: Discrete Variables"— Presentation transcript:

1 Chapter Two Probability Distributions: Discrete Variables
• Distributions: Relationships
• Binary Variables: Bernoulli, Binomial and Beta
• Multinomial Variables: Generalized Bernoulli and Dirichlet

2 Distributions: Landscape
[Figure: landscape of distributions, grouped by variable type]
• Discrete, binary: Bernoulli, Binomial, Beta
• Discrete, multi-valued: Multinomial, Dirichlet
• Continuous: Gaussian, Wishart, Student's-t, Gamma, Exponential
• Angular: Von Mises
• Uniform

3 Distributions: Relationships
[Figure: diagram of relationships between distributions]
• Discrete, binary
  – Bernoulli: single binary variable
  – Binomial: N samples of Bernoulli; reduces to Bernoulli for N=1
  – Beta: conjugate prior; continuous variable in [0,1]
• Discrete, multi-valued
  – Multinomial: one of K values, represented as a K-dimensional binary vector; reduces to Binomial for K=2
  – Dirichlet: conjugate prior; K random variables in [0,1]
• Continuous
  – Gaussian: reached from the discrete distributions for large N
  – Student's-t: generalization of Gaussian, robust to outliers; infinite mixture of Gaussians
  – Gamma: conjugate prior of univariate Gaussian precision
  – Wishart: conjugate prior of multivariate Gaussian precision matrix
  – Exponential: special case of Gamma
  – Gaussian-Gamma: conjugate prior of univariate Gaussian with unknown mean and precision
  – Gaussian-Wishart: conjugate prior of multivariate Gaussian with unknown mean and precision matrix
• Angular: Von Mises
• Uniform

4 Bernoulli, Binomial and Beta
Binary Variables: Bernoulli, Binomial and Beta

5 Bernoulli Distribution
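• Expresses the distribution of a single binary-valued random variable $x \in \{0,1\}$
• The probability of $x=1$ is denoted by the parameter $\mu$, i.e., $p(x=1|\mu) = \mu$. Therefore $p(x=0|\mu) = 1-\mu$
• The probability distribution has the form $\mathrm{Bern}(x|\mu) = \mu^{x}(1-\mu)^{1-x}$
• Mean is shown to be $E[x] = \mu$
• Variance is $\mathrm{var}[x] = \mu(1-\mu)$
• Likelihood of N observations $D = \{x_1,\ldots,x_N\}$ independently drawn from $p(x|\mu)$ is $p(D|\mu) = \prod_{n=1}^{N} \mu^{x_n}(1-\mu)^{1-x_n}$
• Log-likelihood is $\ln p(D|\mu) = \sum_{n=1}^{N} \{x_n \ln\mu + (1-x_n)\ln(1-\mu)\}$
• Maximum likelihood estimator, obtained by setting the derivative of $\ln p(D|\mu)$ with respect to $\mu$ equal to zero, is $\mu_{ML} = \frac{1}{N}\sum_{n=1}^{N} x_n$
• If the number of observations of $x=1$ is m, then $\mu_{ML} = m/N$
[Figure: portrait of Jacob Bernoulli]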
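The Bernoulli formulas above can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names and the sample data are illustrative assumptions, not part of the slides, and `mu` stands for the Bernoulli parameter.

```python
import math

def bernoulli_pmf(x, mu):
    """Bern(x | mu) = mu**x * (1 - mu)**(1 - x) for x in {0, 1}."""
    return mu**x * (1 - mu)**(1 - x)

def log_likelihood(data, mu):
    """ln p(D | mu) for independent binary observations (needs 0 < mu < 1)."""
    return sum(x * math.log(mu) + (1 - x) * math.log(1 - mu) for x in data)

def mu_ml(data):
    """Maximum likelihood estimate: the fraction of observations equal to 1."""
    return sum(data) / len(data)

# Hypothetical sample: m = 5 ones out of N = 8 observations.
data = [1, 0, 0, 1, 1, 0, 1, 1]
estimate = mu_ml(data)  # m / N
```

Evaluating `log_likelihood(data, estimate)` against any other value of `mu` in (0, 1) should give a larger or equal result, which is what the zero-derivative condition expresses.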

6 Binomial Distribution
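• Related to the Bernoulli distribution
• Expresses the distribution of m, the number of observations for which $x=1$ in N trials (m heads, N-m tails)
• It is proportional to the Bernoulli likelihood $\mu^{m}(1-\mu)^{N-m}$
• Adding up all ways of obtaining m heads gives $\mathrm{Bin}(m|N,\mu) = \binom{N}{m}\mu^{m}(1-\mu)^{N-m}$, where $\binom{N}{m} = \frac{N!}{(N-m)!\,m!}$
• Mean and variance are N times those of the Bernoulli: $E[m] = N\mu$, $\mathrm{var}[m] = N\mu(1-\mu)$
[Figure: histogram of the Binomial distribution for N=10 and $\mu$=0.25]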
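As a rough check of the binomial mean and variance, the probability mass function can be tabulated for the histogram's setting, N=10 and parameter 0.25. This is a sketch with assumed helper names; `math.comb` requires Python 3.8 or later.

```python
from math import comb

def binomial_pmf(m, N, mu):
    """Bin(m | N, mu): probability of m ones in N independent Bernoulli trials."""
    return comb(N, m) * mu**m * (1 - mu)**(N - m)

N, mu = 10, 0.25
pmf = [binomial_pmf(m, N, mu) for m in range(N + 1)]

# Moments computed directly from the tabulated probabilities.
mean = sum(m * p for m, p in enumerate(pmf))
variance = sum((m - mean)**2 * p for m, p in enumerate(pmf))
```

The computed `mean` and `variance` should agree with N times the Bernoulli mean and variance, up to floating-point error.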

7 Bayesian Inference with Beta
• The MLE of $\mu$ in the Bernoulli is the fraction of observations with $x=1$
  – Severely over-fitted for small data sets
  – MLE: $\arg\max_{\mu}\, p(D|\mu)$
• The likelihood function takes products of factors of the form $\mu^{x}(1-\mu)^{1-x}$
• If the prior distribution of $\mu$ is chosen to be proportional to powers of $\mu$ and $1-\mu$, the posterior will have the same functional form as the prior
  – Called conjugacy
• Beta has a form suitable for a prior distribution $p(\mu)$
• posterior $p(\mu|D)$ $\propto$ likelihood $p(D|\mu)$ $\times$ prior $p(\mu)$

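8 Beta Distribution
• Beta distribution: $\mathrm{Beta}(\mu|a,b) = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\,\mu^{a-1}(1-\mu)^{b-1}$
• Where the Gamma function is defined as $\Gamma(x) = \int_{0}^{\infty} u^{x-1}e^{-u}\,du$
• a and b are hyperparameters that control the distribution of the parameter $\mu$
• Mean and variance: $E[\mu] = \frac{a}{a+b}$, $\mathrm{var}[\mu] = \frac{ab}{(a+b)^{2}(a+b+1)}$
[Figure: Beta distribution as a function of $\mu$ for hyperparameter values (a=0.1, b=0.1), (a=1, b=1), (a=2, b=3), (a=8, b=4)]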
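The Beta density and its moments can be sketched with `math.gamma`. The midpoint-rule integrator below is an assumption added for checking the normalization and is not part of the slides; it is only suitable for a, b >= 1, where the density is bounded on [0, 1].

```python
from math import gamma

def beta_pdf(mu, a, b):
    """Beta(mu | a, b) with the Gamma-function normalization constant."""
    return gamma(a + b) / (gamma(a) * gamma(b)) * mu**(a - 1) * (1 - mu)**(b - 1)

def beta_mean(a, b):
    return a / (a + b)

def beta_variance(a, b):
    return a * b / ((a + b)**2 * (a + b + 1))

def integrate(f, steps=10_000):
    """Midpoint-rule integral of f over [0, 1] (illustrative helper)."""
    h = 1.0 / steps
    return h * sum(f((i + 0.5) * h) for i in range(steps))

a, b = 2, 3  # one of the hyperparameter settings plotted on the slide
total = integrate(lambda mu: beta_pdf(mu, a, b))              # should be close to 1
numeric_mean = integrate(lambda mu: mu * beta_pdf(mu, a, b))  # should be close to a / (a + b)
```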

9 Bayesian Inference with Beta
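• posterior $p(\mu|D)$ $\propto$ likelihood $p(D|\mu)$ $\times$ prior $p(\mu)$
• Multiplying the beta prior by the binomial likelihood yields the posterior $p(\mu|m,l,a,b) \propto \mu^{m+a-1}(1-\mu)^{l+b-1}$
  – where m is the number of heads
  – and $l = N - m$ is the number of tails
• It is another beta distribution: $p(\mu|m,l,a,b) = \frac{\Gamma(m+a+l+b)}{\Gamma(m+a)\Gamma(l+b)}\,\mu^{m+a-1}(1-\mu)^{l+b-1}$
  – Effectively increases the value of a by m and b by l
  – As the number of observations increases, the distribution becomes more peaked
[Figure: illustration of one step in the process. Prior with a=2, b=2; likelihood for N=m=1, with x=1; posterior with a=3, b=2]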
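The conjugate update amounts to adding counts to the hyperparameters. A minimal sketch, with an assumed function name, reproducing the illustrated step (prior a=2, b=2 and a single observation x=1):

```python
def beta_posterior(a, b, data):
    """Return the posterior Beta hyperparameters after observing binary data.

    m is the number of ones (heads) and l = N - m the number of zeros (tails).
    """
    m = sum(data)
    l = len(data) - m
    return a + m, b + l

# One step as in the illustration: prior Beta(2, 2), one observation x = 1.
a_post, b_post = beta_posterior(2, 2, [1])  # a increases by m = 1, b by l = 0
```

Because the posterior is again a Beta distribution, it can serve as the prior for the next batch of observations, so data can be processed sequentially.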

10 Predicting Next Trial Outcome
• Need the predictive distribution of x given observed D
  – From the sum and product rules: $p(x=1|D) = \int_{0}^{1} p(x=1|\mu)\,p(\mu|D)\,d\mu = \int_{0}^{1} \mu\,p(\mu|D)\,d\mu = E[\mu|D]$
• The expected value of the posterior distribution can be shown to be $p(x=1|D) = \frac{m+a}{m+a+l+b}$
  – Which is the fraction of observations (both fictitious and real) that correspond to $x=1$
• Maximum likelihood and Bayesian results agree in the limit of infinite observations
  – On average, uncertainty (variance) decreases with observed data
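To see how the Bayesian prediction relates to the maximum likelihood estimate, the posterior mean can be compared with m/N for a small and a large data set. This sketch uses assumed names and hypothetical counts; `a` and `b` are the Beta prior hyperparameters, `m` and `l` the observed numbers of ones and zeros.

```python
def predictive_prob_one(a, b, m, l):
    """p(x = 1 | D): mean of the Beta(a + m, b + l) posterior."""
    return (m + a) / (m + a + l + b)

def mle(m, l):
    """Maximum likelihood estimate m / N with N = m + l."""
    return m / (m + l)

# Small data set: three ones, no zeros. The MLE predicts x = 1 with certainty,
# while the Bayesian prediction is pulled toward the prior mean.
small_bayes = predictive_prob_one(2, 2, 3, 0)
small_mle = mle(3, 0)

# Large data set: the two estimates nearly coincide.
large_bayes = predictive_prob_one(2, 2, 3000, 7000)
large_mle = mle(3000, 7000)
```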

11 Summary
• A single binary variable's distribution is represented by the Bernoulli
• Binomial is related to Bernoulli
  – Expresses the distribution of the number of occurrences of either 1 or 0 in N trials
• Beta distribution is a conjugate prior for Bernoulli
  – Both have the same functional form

12 Multinomial Variables
Generalized Bernoulli and Dirichlet

13 Generalization of Bernoulli
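• Discrete variable that takes one of K values (instead of 2)
• Represent as a 1-of-K scheme
  – Represent x as a K-dimensional vector
  – If x=3 (with K=6), then we represent it as $\mathbf{x} = (0,0,1,0,0,0)^{T}$
  – Such vectors satisfy $\sum_{k=1}^{K} x_k = 1$
• If the probability of $x_k=1$ is denoted $\mu_k$, then the distribution of $\mathbf{x}$ is given by the generalized Bernoulli $p(\mathbf{x}|\boldsymbol{\mu}) = \prod_{k=1}^{K} \mu_k^{x_k}$, where $\boldsymbol{\mu} = (\mu_1,\ldots,\mu_K)^{T}$ with $\mu_k \ge 0$ and $\sum_{k} \mu_k = 1$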
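The 1-of-K scheme and the generalized Bernoulli can be sketched as follows. Function names and the probability vector are hypothetical, chosen only for illustration.

```python
def one_of_k(value, K):
    """Encode a value in 1..K as a K-dimensional binary vector with a single 1."""
    if not 1 <= value <= K:
        raise ValueError("value must lie in 1..K")
    return [1 if k == value else 0 for k in range(1, K + 1)]

def generalized_bernoulli_pmf(x, mu):
    """Product over k of mu_k ** x_k; picks out the probability of the active component."""
    p = 1.0
    for x_k, mu_k in zip(x, mu):
        p *= mu_k ** x_k
    return p

x = one_of_k(3, 6)                     # the slide's example: x = 3 with K = 6
mu = [0.1, 0.2, 0.3, 0.2, 0.1, 0.1]    # hypothetical probabilities summing to 1
p = generalized_bernoulli_pmf(x, mu)   # equals mu[2], the third component
```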

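14 Likelihood Function
• Given a data set D of N independent observations $\mathbf{x}_1,\ldots,\mathbf{x}_N$
• The likelihood function has the form $p(D|\boldsymbol{\mu}) = \prod_{n=1}^{N}\prod_{k=1}^{K} \mu_k^{x_{nk}} = \prod_{k=1}^{K} \mu_k^{\sum_n x_{nk}} = \prod_{k=1}^{K} \mu_k^{m_k}$
• Where $m_k = \sum_{n} x_{nk}$ is the number of observations of $x_k=1$
• The maximum likelihood solution (obtained by setting the derivative of the log-likelihood to zero, subject to $\sum_k \mu_k = 1$) is $\mu_k^{ML} = \frac{m_k}{N}$, which is the fraction of the N observations for which $x_k = 1$
• Example data set D with N = 5, K = 6; observations shown on the slide: $(0,0,1,0,0,0)^{T}$, $(1,0,0,0,0,0)^{T}$, $(0,0,0,0,1,0)^{T}$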
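Computing the counts and the maximum likelihood solution from 1-of-K data can be sketched as below. The slide shows only three of its five observations, so the last two vectors here are hypothetical fillers added to make a complete example; the function names are also assumptions.

```python
def component_counts(data):
    """m_k = sum over n of x_nk: how often each of the K components was active."""
    K = len(data[0])
    return [sum(x[k] for x in data) for k in range(K)]

def multinomial_mu_ml(data):
    """Maximum likelihood estimate mu_k = m_k / N."""
    N = len(data)
    return [m_k / N for m_k in component_counts(data)]

data = [
    (0, 0, 1, 0, 0, 0),
    (1, 0, 0, 0, 0, 0),
    (0, 0, 0, 0, 1, 0),
    (0, 0, 1, 0, 0, 0),  # hypothetical observation, not from the slide
    (1, 0, 0, 0, 0, 0),  # hypothetical observation, not from the slide
]
m = component_counts(data)
mu_hat = multinomial_mu_ml(data)
```

Components that never occur in the data receive an estimate of exactly zero, which is the multi-valued analogue of the over-fitting seen for the Bernoulli MLE on small data sets.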

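15 Generalized Binomial Distribution
• Multinomial distribution: $\mathrm{Mult}(m_1,m_2,\ldots,m_K|\boldsymbol{\mu},N) = \binom{N}{m_1\,m_2\,\ldots\,m_K}\prod_{k=1}^{K}\mu_k^{m_k}$
• Where the normalization coefficient is the number of ways of partitioning N objects into K groups of size $m_1, m_2, \ldots, m_K$
• Given by $\binom{N}{m_1\,m_2\,\ldots\,m_K} = \frac{N!}{m_1!\,m_2!\cdots m_K!}$
• Example data sets shown on the slide:
  – D with N = 5, K = 6: $(0,0,1,0,0,0)^{T}$, $(1,0,0,0,0,0)^{T}$, $(0,0,0,0,1,0)^{T}$
  – D with N = 7, K = 6: $(1,0,0,0,0,0)^{T}$, $(0,1,0,0,0,0)^{T}$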
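The normalization coefficient and the multinomial probability can be sketched with exact integer arithmetic. The counts below are hypothetical (N = 5, K = 6), and the function names are assumptions for illustration.

```python
from math import comb, factorial

def multinomial_coefficient(counts):
    """N! / (m_1! m_2! ... m_K!): ways to partition N objects into groups of the given sizes."""
    coeff = factorial(sum(counts))
    for m_k in counts:
        coeff //= factorial(m_k)  # each division is exact
    return coeff

def multinomial_pmf(counts, mu):
    """Mult(m_1..m_K | mu, N) with N = sum(counts)."""
    p = float(multinomial_coefficient(counts))
    for m_k, mu_k in zip(counts, mu):
        p *= mu_k ** m_k
    return p

counts = [2, 0, 2, 0, 1, 0]            # hypothetical counts with N = 5, K = 6
coeff = multinomial_coefficient(counts)
prob = multinomial_pmf(counts, [1 / 6] * 6)

# For K = 2 the coefficient reduces to the binomial coefficient.
binomial_case = multinomial_coefficient([3, 7])
```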

16 Dirichlet Distribution
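• Family of prior distributions for the parameters $\mu_k$ of the multinomial distribution
• By inspection of the multinomial, the form of the conjugate prior is $p(\boldsymbol{\mu}|\boldsymbol{\alpha}) \propto \prod_{k=1}^{K} \mu_k^{\alpha_k-1}$, where $0 \le \mu_k \le 1$ and $\sum_k \mu_k = 1$
• Normalized form of the Dirichlet distribution: $\mathrm{Dir}(\boldsymbol{\mu}|\boldsymbol{\alpha}) = \frac{\Gamma(\alpha_0)}{\Gamma(\alpha_1)\cdots\Gamma(\alpha_K)}\prod_{k=1}^{K}\mu_k^{\alpha_k-1}$, where $\alpha_0 = \sum_{k=1}^{K}\alpha_k$
[Figure: portrait of Lejeune Dirichlet]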

17 Dirichlet over 3 variables
• Due to the summation constraint $\sum_{k}\mu_k = 1$
  – The distribution over the space of $\{\mu_k\}$ is confined to the simplex of dimensionality K-1
  – For K=3 the simplex is a triangle
[Figure: plots of the Dirichlet distribution over the simplex for various settings of the parameters $\alpha_k$: $\alpha_k = 0.1$, $\alpha_k = 1$, $\alpha_k = 10$]
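One way to get a feel for the simplex constraint is to draw samples. The sketch below uses a standard construction that is not described in the slides: normalizing independent Gamma draws yields a Dirichlet sample. The function name is an assumption; `random.gammavariate` is part of the Python standard library.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet(alpha) sample by normalizing independent Gamma(alpha_k, 1) draws."""
    draws = [rng.gammavariate(a_k, 1.0) for a_k in alpha]
    total = sum(draws)
    return [g / total for g in draws]

rng = random.Random(0)  # fixed seed so runs are repeatable
for alpha_k in (0.1, 1.0, 10.0):  # the three settings plotted for K = 3
    sample = sample_dirichlet([alpha_k] * 3, rng)
    print(alpha_k, [round(s, 3) for s in sample])
```

Every sample has non-negative components that sum to one, so it lies on the simplex. Repeating the draw many times should show samples clustering near the corners for small parameter values and near the centre for large ones.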

18 Dirichlet Posterior Distribution
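• Multiplying the prior by the likelihood: $p(\boldsymbol{\mu}|D,\boldsymbol{\alpha}) \propto p(D|\boldsymbol{\mu})\,p(\boldsymbol{\mu}|\boldsymbol{\alpha}) \propto \prod_{k=1}^{K}\mu_k^{\alpha_k+m_k-1}$
• Which has the form of the Dirichlet distribution: $p(\boldsymbol{\mu}|D,\boldsymbol{\alpha}) = \mathrm{Dir}(\boldsymbol{\mu}|\boldsymbol{\alpha}+\mathbf{m}) = \frac{\Gamma(\alpha_0+N)}{\Gamma(\alpha_1+m_1)\cdots\Gamma(\alpha_K+m_K)}\prod_{k=1}^{K}\mu_k^{\alpha_k+m_k-1}$, where $\mathbf{m} = (m_1,\ldots,m_K)^{T}$ and $\alpha_0 = \sum_k \alpha_k$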
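As with the Beta prior, the Dirichlet update adds the observed counts to the prior parameters. A minimal sketch with assumed names and hypothetical counts follows; the posterior mean formula (each parameter divided by their sum) is a standard Dirichlet property that the slides do not state.

```python
def dirichlet_posterior(alpha, counts):
    """Posterior parameters: alpha_k + m_k for each component k."""
    return [a_k + m_k for a_k, m_k in zip(alpha, counts)]

def dirichlet_mean(alpha):
    """Mean of Dir(alpha): alpha_k / alpha_0 (standard result, not from the slides)."""
    alpha_0 = sum(alpha)
    return [a_k / alpha_0 for a_k in alpha]

prior = [1, 1, 1, 1, 1, 1]     # hypothetical flat prior over K = 6 values
counts = [2, 0, 2, 0, 1, 0]    # hypothetical counts m_k from N = 5 observations
posterior = dirichlet_posterior(prior, counts)
posterior_mean = dirichlet_mean(posterior)
```

Unlike the maximum likelihood estimate, the posterior mean assigns non-zero probability to values that were never observed, because the prior parameters act as fictitious counts.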

19 Summary
• Multinomial is a generalization of Bernoulli
  – Variable takes on one of K values instead of 2
• Conjugate prior of Multinomial is the Dirichlet distribution

