Dirichlet Distribution

Presentation transcript:

Dirichlet Distribution
References:
M. Farrow, MAS3301 Bayesian Statistics, Newcastle University, 2013.
B. A. Frigyik, A. Kapila, and M. R. Gupta, Introduction to the Dirichlet Distribution and Related Processes, University of Washington Department of Electrical Engineering, 2012.
Feb 26, 2015, Hee-Gook Jun

Outline: Bernoulli Distribution, Binomial Distribution, Multinomial Distribution, Beta Distribution, Dirichlet Distribution

Distribution Function vs. Linear Function
Linear function f(x) = ax + b: x is a variable; a and b are constants.
Distribution function: x is a random variable, and a and b play the role of parameters.
Binomial distribution function: X ~ B(n, p), with pmf f(x; n, p).
Gaussian distribution function: X ~ N(μ, σ²), with pdf f(x; μ, σ²); increasing μ shifts the curve, and increasing σ² flattens and widens it.

Bernoulli Distribution
Random variable: X ∈ {0, 1}
Parameter: 0 < p < 1
Sample space (support): x ∈ {0, 1}
pmf: f(x; p) = p^x (1 − p)^(1 − x), i.e., f = p on success (x = 1) and f = 1 − p on failure (x = 0); when p = 0.5 both outcomes have probability 0.5.
cdf: F(x; p) = P(X ≤ x) = 0 for x < 0; 1 − p for 0 ≤ x < 1; 1 for x ≥ 1.
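The pmf and cdf above can be sketched directly in Python; this is a minimal illustration, and the function names are mine, not from the slides.

```python
def bernoulli_pmf(x, p):
    """f(x; p) = p^x * (1 - p)^(1 - x) for x in {0, 1}."""
    return p ** x * (1 - p) ** (1 - x)

def bernoulli_cdf(x, p):
    """F(x; p) = P(X <= x): 0 below 0, 1 - p on [0, 1), 1 from 1 onward."""
    if x < 0:
        return 0.0
    if x < 1:
        return 1 - p
    return 1.0

print(bernoulli_pmf(1, 0.5))   # probability of success, p
print(bernoulli_cdf(0, 0.5))   # P(X <= 0) = 1 - p
```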

Binomial Distribution
Random variable: X = number of successes in n trials
Parameters: n = number of trials; p = success probability in each trial
Sample space: x ∈ {0, …, n}
pmf: f(x; n, p) = P(X = x) = C(n, x) p^x (1 − p)^(n − x)
cdf: F(k; n, p) = P(X ≤ k) = Σ_{i=0..k} C(n, i) p^i (1 − p)^(n − i)
(k is used instead of x in the cdf to avoid confusion; the meaning is the same.)
If X₁, X₂, …, Xₙ ~ Ber(θ), then Y = X₁ + … + Xₙ ~ Bin(n, θ).
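A sketch of the binomial pmf and cdf in Python, using the standard library's math.comb for the binomial coefficient (function names are mine, for illustration):

```python
import math

def binom_pmf(x, n, p):
    """f(x; n, p) = C(n, x) * p^x * (1 - p)^(n - x)."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

def binom_cdf(k, n, p):
    """F(k; n, p) = sum of the pmf over i = 0..k."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

print(binom_pmf(5, 10, 0.5))   # P(exactly 5 successes in 10 fair trials)
print(binom_cdf(10, 10, 0.5))  # summing over the whole support gives 1
```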

Binomial Distribution: the Parameter
The distribution's shape changes with p (see http://www.marin.edu/~npsomas/Normal_Binomial.htm):
p < 0.5: mass concentrated at small x (right-skewed)
p = 0.5: symmetric
p > 0.5: mass concentrated at large x (left-skewed)
The pmf's shape differs for each value of p → p itself can be given a distribution!

Beta Distribution: a Distribution over a Probability
Beta distributions provide a family of conjugate prior distributions in Bayesian inference.
The domain of the beta distribution can be viewed as a probability, and in fact the beta distribution is often used to describe the distribution of a probability value p:
X ~ Bin(n, θ) with Θ ~ Beta(α, β), i.e., likelihood f(x; n, θ) and prior f(θ; α, β).

Parameters of Gaussian Distribution
f(x; μ, σ²) = (1 / (σ√(2π))) e^(−(x − μ)² / (2σ²))
http://ajourneyintodatascience.com/normal-distribution/
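As a sketch, this density can be written out in Python (the function name is mine):

```python
import math

def gaussian_pdf(x, mu, sigma2):
    """f(x; mu, sigma^2) = 1 / (sigma * sqrt(2*pi)) * exp(-(x - mu)^2 / (2*sigma^2))."""
    sigma = math.sqrt(sigma2)
    return math.exp(-(x - mu) ** 2 / (2 * sigma2)) / (sigma * math.sqrt(2 * math.pi))

print(gaussian_pdf(0, 0, 1))   # peak of the standard normal, about 0.3989
```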

Parameters of Binomial Distribution
f(x; n, p) = C(n, x) p^x (1 − p)^(n − x)
http://www.boost.org/doc/libs/1_41_0/libs/math/doc/sf_and_dist/html/math_toolkit/dist/dist_ref/dists/binomial_dist.html

Parameters of Beta Distribution
Θ ~ Beta(α, β)
http://www.mailund.dk/index.php/2009/08/09/

Beta Distribution: Gamma Function and Beta Function
Gamma function: an extension of the factorial function; Γ(n) = (n − 1)! for positive integers n.
Beta function: Beta(a, b) = Γ(a)Γ(b) / Γ(a + b).
Beta density: for Θ ~ Beta(α, β), f(θ; α, β) = [Γ(α + β) / (Γ(α)Γ(β))] θ^(α − 1) (1 − θ)^(β − 1).
The binomial coefficient, after adjusting indices, is a reciprocal beta function: C(n, x) = n! / ((n − x)! x!) = 1 / ((n + 1) Beta(n − x + 1, x + 1)), which links the binomial pmf C(n, x) p^x (1 − p)^(n − x) to the beta density.
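These identities are easy to check numerically with the standard library's math.gamma; a minimal sketch (beta_fn is my own helper name):

```python
import math

def beta_fn(a, b):
    """Beta(a, b) = Gamma(a) * Gamma(b) / Gamma(a + b)."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)

# Gamma extends the factorial: Gamma(n) = (n - 1)! for positive integers.
assert math.gamma(5) == math.factorial(4)

# C(n, x) = 1 / ((n + 1) * Beta(n - x + 1, x + 1))
n, x = 10, 4
print(math.comb(n, x))                               # 210
print(1 / ((n + 1) * beta_fn(n - x + 1, x + 1)))     # ~210, up to rounding
```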

Bayesian Inference
The posterior probability is the consequence of two antecedents: a prior probability and a likelihood function.
Bayes' rule: P(A|B) = P(B|A) P(A) / P(B)
posterior ∝ likelihood × prior: p(θ|x) ∝ p(x|θ) × p(θ)
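A tiny numeric illustration of the rule, with made-up numbers: two hypotheses about a coin, a prior over them, and one observed head.

```python
# Hypothetical example: is the coin fair or biased toward heads?
prior = {"fair": 0.5, "biased": 0.5}          # P(hypothesis)
likelihood = {"fair": 0.5, "biased": 0.9}     # P(heads | hypothesis)

unnormalized = {h: likelihood[h] * prior[h] for h in prior}
evidence = sum(unnormalized.values())          # P(heads), the normalizer
posterior = {h: unnormalized[h] / evidence for h in prior}
print(posterior)   # mass shifts toward "biased" after seeing a head
```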

Bayesian Inference: Intuition
p(θ|x): the posterior (what we want to know)
p(x|θ): the likelihood (what we should know from the data)
p(θ): the prior (background knowledge)
p(θ|x) ∝ p(x|θ) p(θ); if the prior is assumed flat, maximizing the posterior reduces to maximum likelihood.

Conjugate Prior for a Binomial Likelihood
A prior is conjugate when the posterior distribution is in the same family as the prior distribution. The beta distribution is the conjugate prior for a binomial likelihood:
p(θ|x) ∝ p(x|θ) p(θ)
likelihood (binomial): C(n, x) θ^x (1 − θ)^(n − x)
prior (beta): [Γ(α + β) / (Γ(α)Γ(β))] θ^(α − 1) (1 − θ)^(β − 1)
posterior (beta): [Γ(α + β + n) / (Γ(α + x)Γ(β + n − x))] θ^(α + x − 1) (1 − θ)^(β + n − x − 1), i.e., Beta(α + x, β + n − x).
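The conjugate update is just parameter arithmetic, which a short sketch makes concrete (the numbers below are illustrative, not from the slides):

```python
def beta_binomial_update(alpha, beta, x, n):
    """Beta(alpha, beta) prior + x successes in n binomial trials
    -> Beta(alpha + x, beta + n - x) posterior."""
    return alpha + x, beta + n - x

prior_a, prior_b = 2, 2        # mild prior belief that the coin is near fair
x, n = 7, 10                   # observed: 7 successes in 10 trials
post_a, post_b = beta_binomial_update(prior_a, prior_b, x, n)
print(post_a, post_b)                  # Beta(9, 5)
print(post_a / (post_a + post_b))      # posterior mean, 9/14
```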

Binomial Distribution vs. Multinomial Distribution
The multinomial distribution is a generalization of the binomial distribution.
Binomial: f(x; n, p) = C(n, x) p^x (1 − p)^(n − x)
Example (10 coin flips T F T T F F F T F T, with 5 successes): C(10, 5) p^5 (1 − p)^(10 − 5)
Multinomial: f(x₁, …, x_k; n, p₁, …, p_k) = [n! / (x₁! … x_k!)] p₁^x₁ … p_k^x_k
Example (12 arrow outcomes over 4 directions, with counts 5, 4, 2, 1): [12! / (5! 4! 2! 1!)] p₁^5 p₂^4 p₃^2 p₄^1
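A sketch of the multinomial pmf in Python (the function name is mine); note that the arrow example has counts 5 + 4 + 2 + 1 = 12 outcomes, so n = 12 there, and with k = 2 categories the formula reduces to the binomial pmf.

```python
import math

def multinomial_pmf(xs, ps):
    """n! / (x1! ... xk!) * p1^x1 ... pk^xk, with n = sum(xs)."""
    n = sum(xs)
    coef = math.factorial(n)
    for x in xs:
        coef //= math.factorial(x)
    prob = 1.0
    for x, p in zip(xs, ps):
        prob *= p ** x
    return coef * prob

# The arrow example: 12 outcomes over 4 directions, counts 5, 4, 2, 1
# (uniform direction probabilities assumed here for illustration).
print(multinomial_pmf([5, 4, 2, 1], [0.25] * 4))
# With k = 2 it reduces to the binomial pmf:
print(multinomial_pmf([5, 5], [0.5, 0.5]))   # equals C(10, 5) / 2^10
```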

Beta Distribution vs. Dirichlet Distribution
The Dirichlet distribution is a multivariate generalization of the beta distribution. It is very often used as a prior distribution in Bayesian statistics, being the conjugate prior of the multinomial distribution.
Beta: X ~ Bin(n, θ), Θ ~ Beta(α, β)
Dirichlet: (X₁, …, X_k) ~ Multi(n, θ₁, …, θ_k), (Θ₁, …, Θ_k) ~ Dir(α₁, …, α_k)

Beta Distribution vs. Dirichlet Distribution (cont.)
Beta distribution: f(θ; α, β) = [Γ(α + β) / (Γ(α)Γ(β))] θ^(α − 1) (1 − θ)^(β − 1)
Dirichlet distribution: f(θ₁, …, θ_k; α₁, …, α_k) = [Γ(α₁ + … + α_k) / (Γ(α₁) … Γ(α_k))] θ₁^(α₁ − 1) … θ_k^(α_k − 1)
(Figures: Beta and Dirichlet density plots.)
http://www.columbia.edu/~cjd11/charles_dimaggio/DIRE/styled-4/styled-11/code-4/
http://www.52nlp.cn/lda-math-%E8%AE%A4%E8%AF%86betadirichlet%E5%88%86%E5%B8%833/dirichlet-distribution
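As a sketch, the Dirichlet density can be evaluated with the standard library's math.gamma (the function name is mine); note the exponents are αᵢ − 1, mirroring the beta density, and with k = 2 the formula reduces to it.

```python
import math

def dirichlet_pdf(thetas, alphas):
    """f(theta_1..theta_k; alpha_1..alpha_k)
    = Gamma(sum(alphas)) / prod(Gamma(alpha_i)) * prod(theta_i^(alpha_i - 1)),
    where the thetas are nonnegative and sum to 1."""
    norm = math.gamma(sum(alphas))
    for a in alphas:
        norm /= math.gamma(a)
    dens = 1.0
    for t, a in zip(thetas, alphas):
        dens *= t ** (a - 1)
    return norm * dens

# k = 2 recovers the Beta(2, 2) density at theta = 0.3:
print(dirichlet_pdf([0.3, 0.7], [2, 2]))   # about 6 * 0.3 * 0.7 = 1.26
```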